
Lecture Series in AI: “Training Language Models in Academia: Research Questions and Opportunities”

Columbia Engineering · 7,857 views · 4 months ago

ABOUT THE LECTURE
Large language models have emerged as transformative tools in artificial intelligence, demonstrating unprecedented capabilities in understanding and generating human language. While these models have achieved remarkable performance across a wide range of benchmarks and enabled groundbreaking applications, their development has been predominantly concentrated within large technology companies due to substantial computational and proprietary data requirements. In this talk, I will present a vision for how academic research can play a critical role in advancing the open language model ecosystem, particularly by developing smaller yet highly capable models and deepening our fundamental understanding of training practices. Drawing on our research group's recent projects, I will examine key research questions and challenges in both the pre-training and post-training stages. Our work spans small language models (Sheared LLaMA; 1–3B parameters), the state-of-the-art sub-10B model on Chatbot Arena (gemma-2-SimPO), and long-context models supporting up to 512K tokens (ProLong). These examples illustrate how academic research can push the boundaries of model efficiency, capability, and scalability. I will conclude by exploring future directions and highlighting opportunities to shape the development of more accessible and powerful language models.

ABOUT THE SPEAKER
Danqi Chen, Assistant Professor of Computer Science at Princeton University and Associate Director of Princeton Language and Intelligence
