CHEN, Zizhong

Presidential Chair Professor

Education Background

Ph.D. in Computer Science, University of Tennessee

M.S. in Economics, Renmin University of China

B.S. in Mathematics, Beijing Normal University

Research Field
High Performance Computing, GPU Acceleration, and AI Infrastructure
Personal Website
Email
chenzizhong@cuhk.edu.cn
Office
Room 519, Daoyuan Building
Biography

Professor Zizhong Chen is a Presidential Chair Professor at The Chinese University of Hong Kong, Shenzhen. He obtained his Ph.D. in Computer Science from the University of Tennessee in 2006 under the supervision of Professor Jack Dongarra. His research interests include High Performance Computing, GPU Acceleration, and AI Infrastructure. He is ranked #11 worldwide in high performance computing according to https://nebelwelt.net/pubstats/top-authors-sys_hpc.html.

Academic Publications

Selected Publications (with My Students Underlined, Full List at Google Scholar)

SC'24 Jinyang Liu, Jiannan Tian, Shixun Wu, Sheng Di, Boyuan Zhang, Robert Underwood, Yafan Huang, Jiajun Huang, Kai Zhao, Guanpeng Li, Dingwen Tao, Zizhong Chen, Franck Cappello,
High-ratio Scientific Lossy Compression on GPUs with Optimized Multi-level Interpolations,
IEEE/ACM The International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE/ACM SC2024),
Atlanta, Georgia, USA, November 17 - 22, 2024.
SC'24 Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Zizhe Jian, Xin Liang, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur,
hZCC: Accelerating Collective Communication with Co-designed Operation-supported Compression,
IEEE/ACM The International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE/ACM SC2024),
Atlanta, Georgia, USA, November 17 - 22, 2024.
ICS'24 Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao, Xiaoyi Lu, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur,
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters,
Proceedings of the 38th ACM International Conference on Supercomputing,
Kyoto, Japan, June 4 - 7, 2024.
SIGMOD'24 Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Sian Jin, Zizhe Jian, Jiajun Huang, Shixun Wu, Zizhong Chen, and Franck Cappello,
High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation,
The 2024 ACM International Conference on Management of Data,
Santiago, Chile, June 9 - 15, 2024.
IPDPS'24 Zizhe Jian, Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Haiying Xu, Robert Underwood, Jiajun Huang, Shixun Wu, Zizhong Chen, Franck Cappello,
CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction,
Proceedings of the 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS'24),
San Francisco, California, USA, May 27-31, 2024.
IPDPS'24 Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur,
An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression,
Proceedings of the 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS'24),
San Francisco, California, USA, May 27-31, 2024.
IPDPS'23
(Best Paper Award)
Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, and Yibo Zhu,
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs,
Proceedings of the 37th IEEE International Parallel & Distributed Processing Symposium (IPDPS'23),
St. Petersburg, Florida, USA, May 15-19, 2023.
This research and the associated software are currently used to power all of ByteDance's in-house search businesses, including TikTok, Douyin, Xigua Video, Magellan, and Toutiao Search Portal, serving billions of daily active users worldwide.
ICS'23
(Best Paper Finalist)
Jinyang Liu, Sheng Di, Jieyang Chen, Xin Liang, Zizhong Chen, and Franck Cappello,
FZ: A flexible auto-tuned modular error-bounded compression framework for scientific data,
Proceedings of the 37th ACM International Conference on Supercomputing,
Orlando, FL, June 21-23, 2023
ICS'23 Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, and Zizhong Chen,
Anatomy of High-Performance GEMM with Online Fault Tolerance on GPUs,
Proceedings of the 37th ACM International Conference on Supercomputing,
Orlando, FL, June 21-23, 2023
PPoPP'23 Jieyang Chen, Xin Liang, Kai Zhao, Hadi Zamani Sabzi, Laxmi Bhuyan, and Zizhong Chen
Improving Energy Saving of One-sided Matrix Decompositions on CPU-GPU Heterogeneous Systems,
Proceedings of the 28th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'23),
Montreal, Canada, February 25 - March 1, 2023.
TPDS'23 Yujia Zhai, Elisa Giem, Kai Zhao, Jinyang Liu, Jiajun Huang, Bryan Wong, Christian Shelton, and Zizhong Chen
FT-BLAS: A Fault Tolerant High Performance BLAS Implementation on x86 CPUs,
IEEE Transactions on Parallel and Distributed Systems,
Volume: 34, Issue: 12, December, 2023.
SC'22 Jinyang Liu, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, Franck Cappello,
Dynamic Quality Metric Oriented Error Bounded Lossy Compression for Scientific Datasets,
IEEE/ACM The International Conference for High Performance Computing, Networking, Storage and Analysis (IEEE/ACM SC2022),
Dallas, Texas, USA, November 14 - 19, 2022.
IPDPS'22 Yujia Zhai, Mohannad Ibrahim, Yiqin Qiu, Fabian Boemer, Zizhong Chen, Alexey Titov, and Alexander Lyashevsky,
Accelerating Encrypted Computing on Intel GPUs,
Proceedings of the 36th IEEE International Parallel & Distributed Processing Symposium (IPDPS'22),
Lyon, France, May 30 - June 3, 2022.
ICDE'22 Kai Zhao, Sheng Di, Danny Perez, Franck Cappello, and Zizhong Chen,
MDZ: An Efficient Error-bounded Lossy Compressor for Molecular Dynamics Simulations,
Proceedings of the 38th IEEE International Conference on Data Engineering,
Virtual Event, May 9 - 12, 2022.
TPAMI'22 Shuyin Xia, Daowan Peng, Deyu Meng, Changqing Zhang, Guoyin Wang, Elisabeth Giem, Wei Wei, and Zizhong Chen,
Ball k-Means: Fast Adaptive Clustering With No Bounds,
IEEE Transactions on Pattern Analysis and Machine Intelligence,
Volume: 44, Issue: 1, January 2022.
ICDE'21 Kai Zhao, Sheng Di, Maxim Dmitriev, Thierry-Laurent D. Tonellot, Zizhong Chen, and Franck Cappello,
Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation,
Proceedings of the 37th IEEE International Conference on Data Engineering,
Chania, Crete, Greece, Apr 19 - 22, 2021.
TPDS'21 Kai Zhao, Sheng Di, Sihuan Li, Xin Liang, Yujia Zhai, Jieyang Chen, Kaiming Ouyang, Franck Cappello, and Zizhong Chen
FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks,
IEEE Transactions on Parallel and Distributed Systems,
Volume: 32, Issue: 7, July 2021.
SC'21 Sihuan Li, Sheng Di, Kai Zhao, Xin Liang, Zizhong Chen, and Franck Cappello
Resilient error-bounded lossy compressor for data transfer,
Proceedings of the 33rd ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
St. Louis, MO, USA, November 14 - 19, 2021. Acceptance Rate: 23.6% (86/365)
ICS'21 Yujia Zhai, Elisabeth Giem, Quan Fan, Kai Zhao, Jinyang Liu, and Zizhong Chen
FT-BLAS: a high performance BLAS implementation with online fault tolerance,
Proceedings of the 35th ACM International Conference on Supercomputing,
Virtual Event, June, 2021. Acceptance Rate: 24% (38/157)
HPDC'20 Kai Zhao, Sheng Di, Xin Liang, Sihuan Li, Dingwen Tao, Zizhong Chen, and Franck Cappello
Significantly Improving Lossy Compression for HPC Datasets with Second-Order Prediction and Parameter Optimization,
Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing,
Stockholm, Sweden, June 23 - 26, 2020. Acceptance Rate: 22.5% (16/71)
SC'20 Kaiming Ouyang, Min Si, Atsushi Hori, Zizhong Chen, and Pavan Balaji
CAB-MPI: exploring interprocess work-stealing towards balanced MPI communication,
Proceedings of the 32nd ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Virtual Event, November 9-19, 2020. Acceptance Rate: 22% (85/380)
SC'19 Sihuan Li, Hongbo Li, Xin Liang, Jieyang Chen, Elisabeth Giem, Kaiming Ouyang, Kai Zhao, Sheng Di, Franck Cappello, and Zizhong Chen
FT-iSort: Efficient Fault Tolerance for Introsort,
Proceedings of the 31st ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Denver, Colorado, USA, Nov 17 - 22, 2019. Acceptance Rate: 20.9% (72/344).
SC'19 Xin Liang, Sheng Di, Sihuan Li, Dingwen Tao, Bogdan Nicolae, Zizhong Chen, and Franck Cappello
Significantly Improving Lossy Compression Quality based on An Optimized Hybrid Prediction Model,
Proceedings of the 31st ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Denver, Colorado, USA, Nov 17 - 22, 2019. Acceptance Rate: 20.9% (72/344).
ICS'19 Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, and Zizhong Chen
TSM2: Optimizing Tall-and-Skinny Matrix-Matrix Multiplication on GPUs,
Proceedings of the 33rd ACM International Conference on Supercomputing,
Phoenix, AZ, USA, June 26 - 28, 2019. Acceptance Rate: 23.3% (45/193)
ICS'19 Hadi Zamani Sabzi, Yuanlai Liu, Devashree Tripathy, Laxmi Bhuyan, and Zizhong Chen
GreenMM: Energy-Efficient GPU Matrix Multiplication Through Undervolting,
Proceedings of the 33rd ACM International Conference on Supercomputing,
Phoenix, AZ, USA, June 26 - 28, 2019. Acceptance Rate: 23.3% (45/193)
TPDS'19 Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, and Franck Cappello
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection Between SZ and ZFP,
IEEE Transactions on Parallel and Distributed Systems,
Volume: 30, Issue: 8, 2019.
SC'18 Jieyang Chen, Hongbo Li, Sihuan Li, Xin Liang, Panruo Wu, Dingwen Tao, Kaiming Ouyang, Yuanlai Liu, Kai Zhao, Qiang Guan, and Zizhong Chen
Fault Tolerant One-sided Matrix Decompositions on Heterogeneous Systems with GPUs,
Proceedings of the 30th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Dallas, Texas, USA, Nov 11 - 16, 2018. Acceptance Rate: 19.1% (55/288).
HPDC'18 Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, and Franck Cappello
Improving Performance of Iterative Methods by Lossy Checkpointing,
Proceedings of the 27th ACM International Symposium on High-Performance Parallel and Distributed Computing,
Tempe, AZ, United States, June 11-15, 2018. Acceptance Rate: 18.2% (22/121).
HPDC'18 Yuankun Fu, Feng Li, Fengguang Song, and Zizhong Chen
Performance Analysis and Optimization of In-situ Integration of Simulation with Data Analysis: Zipping Applications Up,
Proceedings of the 27th ACM International Symposium on High-Performance Parallel and Distributed Computing,
Tempe, AZ, United States, June 11-15, 2018. Acceptance Rate: 18.2% (22/121).
ICDCS'18 Jieyang Chen, Qiang Guan, Zhao Zhang, Xin Liang, Louis Vernon, Allen Mcpherson, Li-Ta Lo, Zizhong Chen, Patricia Grubel, and James Ahrens
BeeFlow: a Workflow Management System for In situ Processing Across HPC and Cloud Systems,
Proceedings of the 38th IEEE International Conference on Distributed Computing Systems,
Vienna, Austria, July 2-5, 2018. Acceptance Rate: 20.6% (78/378).

IPDPS'18 Hongbo Li, Sihuan Li, Zachary Benavides, Zizhong Chen, and Rajiv Gupta
COMPI: Concolic Testing for MPI Applications,
Proceedings of the 32nd IEEE International Parallel & Distributed Processing Symposium,
Vancouver, British Columbia, Canada, May 21-25, 2018. Acceptance Rate: 24.5% (113/461).
SC'17 Hongbo Li, Zizhong Chen, and Rajiv Gupta
ParaStack: Efficient Hang Detection for MPI Programs at Large Scale,
Proceedings of the 29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Denver, Colorado, USA, Nov 12 - 17, 2017. Acceptance Rate: 18.6% (61/327).
SC'17 Xin Liang, Jieyang Chen, Dingwen Tao, Sihuan Li, Panruo Wu, Hongbo Li, Kaiming Ouyang, Yuanlai Liu, Fengguang Song, and Zizhong Chen
Correcting Soft Errors Online in Fast Fourier Transform,
Proceedings of the 29th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Denver, Colorado, USA, Nov 12 - 17, 2017. Acceptance Rate: 18.6% (61/327).
PPoPP'17 Panruo Wu, Nathan DeBardeleben, Qiang Guan, Sean Blanchard, Jieyang Chen, Dingwen Tao, Xin Liang, Kaiming Ouyang, Sihuan Li, and Zizhong Chen
Silent Data Corruption Resilient Two-sided Matrix Factorizations,
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
Austin, Texas, USA, February 4-8, 2017. Acceptance Rate: 21.9% (29/132).
IPDPS'17 Dingwen Tao, Sheng Di, Zizhong Chen, and Franck Cappello
Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization,
Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium,
Orlando, Florida, USA, May 29 - June 2, 2017. Acceptance Rate: 23%.
SC'16 Jieyang Chen*, Li Tan*, Panruo Wu, Dingwen Tao, Hongbo Li, Xin Liang, Sihuan Li, Rong Ge, Laxmi Bhuyan, and Zizhong Chen
GreenLA: Green Linear Algebra Software for GPU-Accelerated Heterogeneous Computing,
Proceedings of the 28th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Salt Lake City, Utah, USA, Nov 13 - 18, 2016. Acceptance Rate: 18.4% (82/446). *Authors contributed equally.
HPDC'16 Panruo Wu, Qiang Guan, Nathan DeBardeleben, Sean Blanchard, Dingwen Tao, Xin Liang, Jieyang Chen, and Zizhong Chen
Towards Practical Algorithm Based Fault Tolerance in Dense Linear Algebra,
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing,
Kyoto, Japan, May 31 - June 4, 2016. Acceptance Rate: 15.5% (20/129).
HPDC'16 Panruo Wu, Dong Li, Zizhong Chen, Jeffrey S. Vetter, and Sparsh Mittal
Algorithm-Directed Data Placement in Hybrid Memory,
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing,
Kyoto, Japan, May 31 - June 4, 2016. Acceptance Rate: 15.5% (20/129).
HPDC'16 Dingwen Tao, Shuaiwen Leon Song, Sriram Krishnamoorthy, Panruo Wu, Xin Liang, Zheng Eddy Zhang, Darren Kerbyson, and Zizhong Chen
New-Sum: A Novel Online ABFT Scheme For General Iterative Methods,
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing,
Kyoto, Japan, May 31 - June 4, 2016. Acceptance Rate: 15.5% (20/129).
IPDPS'16 Jieyang Chen, Xin Liang, and Zizhong Chen
Online Algorithm-Based Fault Tolerance for Cholesky Decomposition on Heterogeneous Systems with GPUs,
Proceedings of the 30th IEEE International Parallel & Distributed Processing Symposium,
Chicago, Illinois, USA, May 23-27, 2016. Acceptance Rate: 22.98% (114/496).
TACO'16 Li Tan, Zizhong Chen, and Shuaiwen Leon Song
Scalable Energy Efficiency with Resilience for High Performance Computing Systems: A Quantitative Methodology,
ACM Transactions on Architecture and Code Optimization,
Volume 12 Issue 4, January 2016
IPDPS'15 Li Tan, Shuaiwen Song, Panruo Wu, Zizhong Chen, Rong Ge, and Darren Kerbyson
Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing,
Proceedings of the 29th IEEE International Parallel & Distributed Processing Symposium,
Hyderabad, India, May 25-29, 2015. Acceptance Rate: 21.77% (108/496).
TPDS'15 Doug Hakkarinen, Panruo Wu, and Zizhong Chen
Fail-Stop Failure Algorithm-Based Fault Tolerance for Cholesky Decomposition,
IEEE Transactions on Parallel and Distributed Systems,
Volume: 26, Issue: 5, Page 1323-1335, May, 2015.
PARCO'14 Li Tan, Shashank Kothapalli, Longxiang Chen, Omar Hussaini, Ryan Bissiri, and Zizhong Chen
A Survey of Power and Energy Efficient Techniques for High Performance Numerical Linear Algebra Operations,
Parallel Computing,
Vol. 40, No. 10, pp. 559-573, Dec. 2014.
HPDC'14 Panruo Wu and Zizhong Chen
FT-ScaLAPACK: Correcting Soft Errors On-Line for ScaLAPACK Cholesky, QR, and LU Factorization Routines,
Proceedings of the 23rd ACM International Symposium on High-Performance Parallel and Distributed Computing,
Vancouver, Canada, June 23-27, 2014. Acceptance Rate: 16.2% (21/130).
SC'13 Dong Li, Zizhong Chen, Panruo Wu, and Jeffrey Vetter
Rethinking Algorithm-Based Fault Tolerance with a Cooperative Software-Hardware Approach,
Proceedings of the 25th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Denver, CO, November 17-22, 2013. Acceptance Rate: 19.7% (90/457).
TC'13 Doug Hakkarinen and Zizhong Chen
Multi-Level Diskless Checkpointing,
IEEE Transactions on Computers,
Vol. 62, No. 4, Page 772-783, April, 2013.
HPDC'13 Teresa Davies and Zizhong Chen
Correcting Soft Errors Online in LU Factorization,
Proceedings of the 22nd ACM International Symposium on High-Performance Parallel and Distributed Computing,
New York City, NY, USA. June 17-21, 2013. Acceptance Rate: 15.3% (20/131).
PPoPP'13 Zizhong Chen
Online-ABFT: An Online Algorithm Based Fault Tolerance Scheme for Soft Error Detection in Iterative Methods,
Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
Shenzhen, China, February 23-27, 2013. Acceptance Rate: 17.8% (26/146).
ICS'11 Teresa Davies, Christer Karlsson, Hui Liu, Chong Ding, and Zizhong Chen
High Performance Linpack Benchmark: A Fault Tolerant Implementation without Checkpointing,
Proceedings of the 25th ACM International Conference on Supercomputing,
Tucson, Arizona, May 31 - June 4, 2011. Acceptance Rate: 21.7% (35/161).
HPDC'11 Zizhong Chen
Algorithm-Based Recovery for Iterative Methods without Checkpointing,
Proceedings of the 20th ACM International Symposium on High-Performance Parallel and Distributed Computing,
San Jose, California, June 8-11, 2011. Acceptance Rate: 12.9% (22/170).
IPDPS'10 Doug Hakkarinen and Zizhong Chen
Algorithmic Cholesky Factorization Fault Recovery,
Proceedings of the 24th IEEE International Parallel & Distributed Processing Symposium,
Atlanta, GA, USA, April 19-23, 2010. Acceptance Rate: 24.1% (127/527).
SC'09 Zizhong Chen
Optimal Real Number Codes for Fault Tolerant Matrix Operations,
Proceedings of the 21st ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis,
Portland, OR, November 14-20, 2009. Acceptance Rate: 22.6% (59/261).
TC'09 Zizhong Chen and Jack Dongarra
Highly Scalable Self-Healing Algorithms for High Performance Scientific Computing,
IEEE Transactions on Computers,
Vol. 58, No. 11, November, 2009.
TPDS'08 Zizhong Chen and Jack Dongarra
Algorithm-Based Fault Tolerance for Fail-Stop Failures,
IEEE Transactions on Parallel and Distributed Systems,
Vol. 19, No. 12, December, 2008.
SISC'08 Julien Langou, Zizhong Chen, George Bosilca, and Jack Dongarra
Recovery Patterns for Iterative Methods in a Parallel Unstable Environment,
SIAM Journal on Scientific Computing,
Vol. 30, Issue 1, 2008.
IPDPS'06 Zizhong Chen and Jack Dongarra.
Algorithm-Based Checkpoint-Free Fault Tolerance for Parallel Matrix Multiplications on Volatile Resources,
Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium,
Rhodes Island, Greece, April 25-29, 2006.
SIMAX'05 Zizhong Chen and Jack Dongarra.
Condition Numbers of Gaussian Random Matrices,
SIAM Journal on Matrix Analysis and Applications,
Volume 27, Number 3, Page 603-620, 2005.
PPoPP'05 Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julien Langou, Thara Angskun, George Bosilca, and Jack Dongarra.
Fault Tolerant High Performance Computing by a Coding Approach,
Proceedings of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming,
Chicago, Illinois, USA, June 15-17, 2005.
PARCO'03 Zizhong Chen, Jack Dongarra, Piotr Luszczek, and Kenneth Roche.
Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters,
Parallel Computing,
Volume 29, Number 11-12, Page 1723-1743, November-December, 2003.