Abstract
Training large language models (LLMs) costs a huge number of GPU-hours and US dollars. The high financial and hardware cost is affordable only for giant tech companies. In this talk, I will describe our research results that address this challenge. The first work, CoMERA, is a rank-adaptive tensor-compressed training framework, which has sped up the pre-training of CodeBERT by 2X while reducing the model size by 4.3X. The second work, CoLA, is a memory- and computing-efficient pre-training method that exploits the low-dimensional property of activation functions. This method simultaneously reduces model size, GPU memory and runtime in LLaMA pre-training on the C4 dataset, while achieving the same or even better accuracy than standard pre-training.
Biography
Dr. Zheng Zhang is a Full Professor of Electrical and Computer Engineering at the University of California, Santa Barbara. He received his PhD degree in Electrical Engineering and Computer Science from MIT in 2015. His group currently focuses on efficient LLM pre-training, fine-tuning and on-device learning.