What is this course about?
Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm in which a single general-purpose system addresses a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, a deep understanding of language models is becoming essential for scientists and engineers alike. This course is designed to give students a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses that build an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.
Prerequisites
- Proficiency in Python
The majority of class assignments will be in Python. Unlike in most other AI classes, students will be given minimal scaffolding, and you will write at least an order of magnitude more code than in other classes. Therefore, proficiency in Python and software engineering is paramount.
- Experience with deep learning and systems optimization
A significant part of the course will involve making neural language models run quickly and efficiently on GPUs across multiple machines. We expect students to be highly familiar with PyTorch and to know basic systems concepts such as the memory hierarchy.
- College Calculus, Linear Algebra (e.g., MATH 51, CME 100)
You should be comfortable understanding matrix/vector notation and operations.
- Basic Probability and Statistics (e.g., CS 109 or equivalent)
You should know the basics of probability, Gaussian distributions, mean, standard deviation, etc.
- Machine Learning (e.g., CS221, CS229, CS230, CS124, CS224N)
You should be comfortable with the basics of machine learning and deep learning.
Note that this is a 5-unit, very implementation-heavy class, so please allocate enough time for it.