[Data Science Distinguished Lecture Series] Accelerating Model Training on Ascend Chips: An Industrial System for Profiling, Analysis and Optimization
2025-08-25 Data Science Distinguished Lecture Series
SDS Colloquium Series | |
| Topic | Accelerating Model Training on Ascend Chips: An Industrial System for Profiling, Analysis and Optimization |
| Speaker | Chen TIAN, Professor, School of Computer Science, Nanjing University |
| Host | Yunming XIAO, Assistant Professor, School of Data Science, CUHK-Shenzhen |
| Date | 25 August (Monday), 2025 |
| Time | 2:30 PM - 3:30 PM, Beijing Time |
| Format | Hybrid |
| Venue | Room 111, Zhi Xin Building |
| Zoom Link | https://cuhk-edu-cn.zoom.us/j/92038677688?pwd=BjHAhAz5m4KnOS9nxiBeggQaBBzvOM.1 (Meeting ID: 920 3867 7688; Passcode: 532942) |
| Language | English |
Abstract | |
| Training large-scale deep learning (DL) models is a resource-intensive and time-consuming endeavor, yet optimizing training efficiency poses significant challenges. The sporadic performance fluctuations during long training require advanced profiling capabilities. It is not easy to perform comprehensive and accurate bottleneck analysis amidst numerous influencing factors. Selecting effective optimization strategies without proper guidance further complicates the process. This talk shares our practical insights on optimizing training on Huawei Ascend chips based on three years of experience with 135 typical cases. We propose a systematic optimization system, Hermes, including a lightweight profiling approach, a hierarchical bottleneck analysis framework, and an optimization advisor. Our real-world experiments demonstrate significant acceleration in training for models like PanGu-α, MobileNetV1, and MoE (Mixture of Experts). | |
Biography | |
| Chen Tian is currently a professor and doctoral supervisor at the School of Computer Science, Nanjing University. In 2023, he was selected for the National Science Fund for Distinguished Young Scholars. His research expertise lies in computer networks and distributed systems. He has published more than 100 papers in top academic conferences and renowned international journals in the fields of computer networks and distributed systems, such as SIGCOMM, NSDI, OSDI, FAST, SIGMOD, PPoPP, and EuroSys. He proposed a congestion management concept centered on traffic control for next-generation data center networks, designed a stateful programmable network tester with independent intellectual property rights, led the realization of large-scale parallel acceleration of open-source network simulation software, and served as the rotating chairman of the OpenNetLab international network testbed. | |


