About Me
I am a Staff Software Engineer at Google, working on large-scale AI systems that enable efficient Gemini training and serving on TPUs. My research interests include AI Systems, Energy-efficient Computing, and Hardware/Software Co-design.
Before joining Google, I received my Ph.D. from the University of Illinois Urbana-Champaign (UIUC) in 2022, where I was a Google Ph.D. Fellow and a Mavis Future Faculty Fellow. I was advised by Prof. Deming Chen and worked closely with Prof. Wen-mei Hwu and Prof. Jinjun Xiong. I completed my B.S. and M.S. at the University of Electronic Science and Technology of China (UESTC) in Chengdu, China.
News
Jul. 2022
Oct. 2020
Xiaofan Receives 2020 Google Ph.D. Fellowship
Xiaofan was awarded the prestigious Google Ph.D. Fellowship in recognition of exceptional and innovative research, as the only recipient worldwide in the mobile computing area.
Publications
2025
ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training
Yuran Ding, Xinwei Chen, Xiaofan Zhang, Zongwei Zhou.
Conference on Neural Information Processing Systems (NeurIPS) ML for Systems Workshop, Dec. 2025.
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini Team.
arXiv preprint arXiv:2507.06261, July 2025.
Reconfigurable Stream Network Architecture
Chengyue Wang, Xiaofan Zhang, Jason Cong, James C. Hoe.
International Symposium on Computer Architecture (ISCA), Jun. 2025.
Profile-Guided Quantization: a compiler solution to automate quantization for efficient LLM training
Gil Tabak, Clemens JS Schaefer, Xiaofan Zhang, Denali Molitor, Jinliang Wei, Zongwei Zhou, Philip G Hendrix, Mitchelle Rasquinha.
International Symposium on Computer Architecture (ISCA) workshop on Machine Learning for Computer Architecture and Systems (MLArchSys), Jun. 2025.
SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training
Kun Wu*, Jeongmin Brian Park*, Xiaofan Zhang*, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu. (*equal contributors)
Design Automation Conference (DAC), Jun. 2025.
2024
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Lin.
Conference on Neural Information Processing Systems (NeurIPS), Dec. 2024.
New Solutions on LLM Acceleration, Optimization, and Application (Invited)
Yingbing Huang, Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen.
Design Automation Conference (DAC), Jun. 2024.
AutoAI2C: An Automated Hardware Generator for DNN Acceleration on both FPGA and ASIC
Yongan Zhang, Xiaofan Zhang, Pengfei Xu, Yang Zhao, Cong Hao, Deming Chen, Yingyan Lin.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).
Software/Hardware Co-design for LLM and Its Application for Design Verification (Invited)
Jiaxin Wan, Yingbing Huang, Yuhong Li, Hanchen Ye, Jinghua Wang, Xiaofan Zhang, Deming Chen.
Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024.
HomeSGN: A Smarter Home with Novel Rule Mining Enabled by a Scorer-Generator GAN
Zehua Yuan, Junhao Pan, Xiaofan Zhang, Deming Chen.
Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024.
2023
Compilation and Optimizations for Efficient Machine Learning on Embedded Systems
Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen.
Book chapter in Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, Springer Nature.
EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search
Qian Jiang*, Xiaofan Zhang*, Deming Chen, Minh N Do, Raymond A Yeh. (*equal contributors)
International Conference on Machine Learning (ICML) Workshop on Differentiable Almost Everything, Jul. 2023.
2022
Efficient AI Hardware Acceleration
Xiaofan Zhang.
Dissertation, University of Illinois Urbana-Champaign (UIUC).
Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems
Xiaofan Zhang, Yuan Ma, Jinjun Xiong, Wen-mei Hwu, Volodymyr Kindratenko, Deming Chen.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).
Algorithm/Accelerator Co-Design and Co-Search for Edge AI
Xiaofan Zhang, Yuhong Li, Junhao Pan, Deming Chen.
IEEE Transactions on Circuits and Systems II, 2022.
AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang.
arXiv preprint arXiv:2201.08539, Jan. 2022.
2021
F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding
Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li.
Design Automation Conference (DAC), Dec. 2021.
Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs
Junsong Wang, Xiaofan Zhang, Yubo Li, Yonghua Lin.
International Conference on Parallel Processing (ICPP), Aug. 2021.
Efficient Methods for Mapping Neural Machine Translator on FPGAs
Qin Li*, Xiaofan Zhang*, Jinjun Xiong, Wen-mei Hwu, Deming Chen. (*equal contributors)
IEEE Transactions on Parallel and Distributed Systems (TPDS).
Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment
Xiaofan Zhang, Hanchen Ye, Deming Chen.
Conference on Machine Learning and Systems (MLSys) workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench), Apr. 2021.
2020
DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator
Xiaofan Zhang*, Hanchen Ye*, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen. (*equal contributors)
International Conference on Computer Aided Design (ICCAD), Nov. 2020.
Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices (Invited)
Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-mei Hwu, Deming Chen.
ACM Great Lakes Symposium on VLSI (GLSVLSI), Sep. 2020.
HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation
Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen, Deming Chen.
Design Automation Conference (DAC), Jul. 2020.
EDD: Efficient Differentiable DNN architecture and implementation co-search for embedded AI solutions
Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen.
Design Automation Conference (DAC), Jul. 2020.
SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems (DAC'19 Champion Design)
Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen.
Conference on Machine Learning and Systems (MLSys), Mar. 2020.
AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs
Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin.
International Symposium on Field-Programmable Gate Arrays (FPGA), Feb. 2020.
2019
A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices (Best Poster Award)
Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen.
International Conference on Machine Learning (ICML) Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR), Jun. 2019.
FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge
Cong Hao*, Xiaofan Zhang*, Yuhong Li, Sitao Huang, Jinjun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen. (*equal contributors)
Design Automation Conference (DAC), Jun. 2019.
Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs
Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, Deming Chen.
International Symposium on Field-Programmable Gate Arrays (FPGA), Feb. 2019.
Implementing Neural Machine Translation with Bi-Directional GRU and Attention Mechanism on FPGAs Using HLS
Qin Li*, Xiaofan Zhang*, Jinjun Xiong, Wen-mei Hwu, Deming Chen. (*equal contributors)
Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2019.
2018
DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs (Best Paper Award)
Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen.
International Conference on Computer Aided Design (ICCAD), Nov. 2018.
Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA
Junsong Wang, Qiuwen Lou, Xiaofan Zhang, Chao Zhu, Yonghua Lin, Deming Chen.
International Conference on Field-Programmable Logic and Applications (FPL), Aug. 2018.
CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes
Yuhong Li, Xiaofan Zhang, Deming Chen.
Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2018.
Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs
Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, Jinjun Xiong, Deming Chen.
Great Lakes Symposium on VLSI (GLSVLSI), May 2018.
2017
An Energy Efficient Approach for C4.5 Algorithm using OpenCL Design Flow
Hai Peng, Xiaofan Zhang, Letian Huang.
International Conference on Field-Programmable Technology (FPT), Dec. 2017.
Machine Learning on FPGAs to Face the IoT Revolution (Invited)
Xiaofan Zhang*, Anand Ramachandran*, Chuanhao Zhuge*, Di He, Wei Zuo, Zuofu Cheng, Kyle Rupnow, Deming Chen. (*equal contributors)
International Conference on Computer Aided Design (ICCAD), Nov. 2017.
High-Performance Video Content Recognition with Long-term Recurrent Convolutional Network for FPGA
Xiaofan Zhang, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen.
International Conference on Field-Programmable Logic and Applications (FPL), Sep. 2017.
Awards & Fellowships
Google Gold Perfy Award
Google Silver Perfy Award
Google Ph.D. Fellowship
ACM Student Research Competition Winner Award (ICCAD)
Mavis Future Faculty Fellowship (MF3)
Rambus Computer Engineering Fellowship
Sundaram Seshu International Student Fellowship
Service
Peer Reviewer
Journal Reviewer
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
- IEEE Transactions on Circuits and Systems Part II (TCAS-II)
- IEEE Embedded Systems Letters (ESL)
- ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Conference Technical Program Committee / Reviewer
- 2025 International Symposium on Computer Architecture (ISCA) Workshop on Machine Learning for Computer Architecture and Systems (MLArchSys)
- 2025 ACM/IEEE Design Automation Conference (DAC)
- 2024 IEEE International Workshop on LLM-Aided Design (LAD)
- 2023 - 2024 ACM/IEEE International Conference on Computer-Aided Design (ICCAD)
- 2023 ACM/IEEE Supercomputing Conference (SC)
- 2023 Great Lakes Symposium on VLSI (GLSVLSI)
- 2023 Conference on Machine Learning and Systems (MLSys)
- 2016 - 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)
- 2018 - 2022 IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Session Chair / Competition Judge
- Technical Session Chair: 2024 ICCAD: Architectural Mapping
- Technical Session Chair: 2024 ICCAD: Applications and Architectures
- Technical Session Chair: 2023 ICCAD: Sustainable AI Training at the Large and Tiny Scales
- Competition Judge: 2023 ACM Student Research Competition at ICCAD
- Competition Judge: 2023 Ph.D. Forum at FCCM
- Competition Judge: 2022 ACM Student Research Competition at ICCAD
Teaching
- Guest Lecturer: ELEC 515: Embedded Machine Learning: FPGA for AI Inference (Rice University, Fall 2020)
- Guest Lecturer: IEEE Council on Electronic Design Automation (CEDA) Lecture Series
- Head Teaching Assistant: ECE 498 ICC: IoT and Cognitive Computing (UIUC, Spring 2020)
- Teaching Assistant: ECE 498 ICC: IoT and Cognitive Computing (UIUC, Spring 2019)
Topic: Hardware Accelerator Design and Development
Topic: FPGA-based Accelerator Design for AI Inference