Postdoctoral & Assistant Researcher

Computer architecture & machine-learning systems.

I am a postdoc at the School of Computer Science and Engineering, Beihang University, where I received my Ph.D. and M.S. in Computer Science under the supervision of Prof. Hailong Yang. I also received my B.S. in Computer Science from Beihang University.

My research lies at the intersection of computer architecture and machine-learning systems. I work on high-performance computing, performance analysis and tuning tools, deep-learning systems, and training & inference systems for large language models — in particular, how the next generation of AI workloads should reshape the hardware and software stacks beneath them. I currently serve as a core member on several national-level projects, including the National Key R&D Program of China and a Key Project of the National Natural Science Foundation of China (NSFC).

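My recent work includes GVARP, which detects performance variance on large-scale heterogeneous systems; SimTrace, a sampling-based tracing framework for large-scale performance analysis; GNNPerf, a performance profiling tool that works across GNN frameworks; and STAD, which applies spatial and temporal trace analysis to identify inefficiencies in parallel programs.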

High-performance computing · Performance profiling & analysis tools · Deep learning systems · LLM training & inference systems

01 News

  • Sep 2025: Organizing a tutorial, "Identifying Software and Hardware Inefficiency at Scale," at IEEE CLUSTER '25 (CCF-B).
  • Aug 2025: Invited talk at HPC China, "Performance Trace Tracking and Analysis Methods for Ultra-Large-Scale Supercomputing Systems" (超大规模超算系统下的性能轨迹追踪与性能分析方法).
  • Jul 2025: Our paper STAD was accepted to IEEE TPDS.
  • Jun 2025: Our paper SimTrace was accepted to ACM TACO.
  • Nov 2024: Our paper GVARP was accepted to SC '24.

02 Research Interests

01
High-performance computing

Systems and runtime optimization for large-scale scientific workloads on supercomputing platforms.

02
Performance profiling & analysis tools

Scalable trace collection, synchronization-aware alignment, and performance variance diagnosis for parallel programs (e.g., SimTrace, STAD, GVARP).

03
Deep learning systems

GNN framework profiling, elastic training, and model serving optimization (e.g., GNNPerf).

04
Training & inference systems for LLMs

Efficient LLM serving and training on heterogeneous hardware.

03 Education

Ph.D., Computer Science, Beihang University (advisors: Prof. Yi Liu & Prof. Hailong Yang)
M.S., Computer Science, Beihang University
B.S., Computer Science, Beihang University

04 Experience

Postdoc & Assistant Researcher, School of Computer Science and Engineering, Beihang University (Jan 2026 – present)

05 Publications

Identifying Performance Inefficiencies of Parallel Program With Spatial and Temporal Trace Analysis

Zhibo Xuan, Xin Sun, Xin You, Hailong Yang, Zhongzhi Luan, Yi Liu, Depei Qian

IEEE Transactions on Parallel and Distributed Systems, 2025 (CCF-A)

· Combines spatial and temporal trace analysis to pinpoint performance inefficiencies in large-scale parallel programs.

SimTrace: Exploiting Spatial and Temporal Sampling for Large-Scale Performance Analysis

Zhibo Xuan, Xin You, Tianyu Feng, Hailong Yang, Zhongzhi Luan, Yi Liu, Depei Qian

ACM Transactions on Architecture and Code Optimization, 2025 (CCF-A)

· A sampling-based tracing framework that drastically reduces overhead while preserving diagnostic fidelity for HPC workloads.

GNNPerf: Towards Effective Performance Profiling and Analysis Across GNN Frameworks

Kejie Ma, Hailong Yang, Zizheng Zhang, Xin You, Zhibo Xuan, Qingxiao Sun, Zhongzhi Luan, Yi Liu, Depei Qian

IPDPS '25, 2025 (CCF-B)

· A cross-framework profiling tool that identifies and compares performance bottlenecks in GNN training pipelines.

Exploiting Synchronization-aware Transformer for Aligning Large-Scale MPI Traces

Zhibo Xuan, Xin You, Hailong Yang, Haoran Kong, Jingqi Chen, Tianyu Feng, Zhongzhi Luan, Yi Liu, Depei Qian

Frontiers of Computer Science, 2025 (CCF-B)

· Leverages synchronization-aware transformers to efficiently align large-scale MPI traces for performance analysis.

GVARP: Detecting Performance Variance on Large-Scale Heterogeneous Systems

Xin You*, Zhibo Xuan*, Hailong Yang, Zhongzhi Luan, Yi Liu, Depei Qian (* Equal contribution)

SC '24, 2024 (CCF-A)

· Lightweight profiler that pinpoints the root causes of run-to-run performance variance on large-scale heterogeneous GPU systems.

06 Awards & Honors

  • 2022: SolverChallenge 2022 Second Prize (SolverChallenge 2022 二等奖)
  • 2022: Kunpeng Zhongzhi Gold Quality Award (鲲鹏众智金质量奖)