PerfXLab (Beijing) Ltd. was established in 2016 and focuses on research and development of computing software stack technology. We have established deep cooperation with leading domestic computing hardware companies (Huawei, Suiyuan, Haiguang, Ali Pingtouge, etc.) and research institutes to jointly promote the construction of "new infrastructure" for advanced computing technology. In addition, our self-developed PerfXAPI heterogeneous computing software stack is enabling scientific research and innovative applications in industry.
Zhang Xianyi graduated from Beijing University of Technology, received his Ph.D. from the Chinese Academy of Sciences, and did postdoctoral research at UT Austin and MIT. He is the initiator and main maintainer of OpenBLAS, an internationally renowned open source matrix computing project. He received the Second Prize of Science and Technology of the Chinese Computer Society in 2016, the Outstanding Scientific and Technological Achievement Award of the Chinese Academy of Sciences in 2017, and the Best Paper Award from the SIAM Activity Group on Supercomputing in 2020.
Job Description
The Computational Library Perf-Optimization group is responsible for developing and maintaining high-performance fundamental math libraries for deep learning and scientific computing.
Job Responsibilities
Perform instruction-level optimization of various operators; optimize the performance of deep learning algorithms and solvers; participate in the development of AI frameworks and underlying computational libraries.
Job Requirements
Bachelor's degree or above in computer science, electrical engineering, mathematics, automation, or a related major; experience in parallel computing, heterogeneous computing, and computational performance optimization; familiarity with C/C++; more than 1 year of experience, although fresh graduates with solid fundamentals are also welcome.
Job Description
Adaptation, performance tuning, and evaluation on various computing hardware, based on ONNX Runtime and PerfXAPI.
Job Responsibilities
Design large-scale machine learning frameworks for smart processors; develop deep learning model quantization and optimization algorithms for smart processors; carry out in-depth custom development and performance tuning of mainstream deep learning frameworks.
Job Requirements
Proficiency in C/C++/Python and a solid programming foundation; familiarity with basic machine learning algorithms, with experience in ONNX Runtime framework adaptation preferred; good communication, collaboration, and teamwork skills; experience participating in or leading the design and performance tuning of large software frameworks, or experience in open source communities; more than 1 year of relevant work experience.
Job Description
Adaptation, performance tuning, and evaluation on various computing hardware, based on ONNX Runtime and PerfXAPI.
Job Responsibilities
Implement AI algorithms on NPU chips; integrate back-end frameworks such as TVM; develop, verify, and tune the NPU toolchain; participate in NPU architecture design and optimization.
Job Requirements
Bachelor's degree in computer science, electrical engineering, software engineering, or computer technology; proficiency in compiler principles and related algorithms, preferably with programming experience in LLVM, XLA, etc.; proficiency in C/C++ and Python programming; extensive experience with deep learning frameworks (e.g. TensorFlow, PyTorch) and a wide variety of models; good communication, collaboration, and teamwork skills.
Xianyi Zhang [email protected] 13466545921
Junhui Wang [email protected] 13510090675
Building 9, Yard 55, Zique Road, Haidian District, Beijing, China
Room 511, DoBe, Yuelu District, Changsha, China