# Building an LLM from Scratch: CS336 Assignment 1

This post documents my journey through Stanford CS336: Language Models from Scratch, specifically Assignment 1, where we implement core components of a language model.

## Overview

CS336 is Stanford's deep dive into building large language models from first principles. Assignment 1 focuses on:

- Tokenization (BPE implementation)
- Transformer architecture components
- Training loop fundamentals

## Key Takeaways

### 1. Byte-Pair Encoding (BPE)

Before implementing BPE, it helps to review the relevant Python methods. BPE was originally developed for data compression, but it works remarkably well for subword tokenization in NLP. ...
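To make the idea concrete, here is a minimal sketch of one BPE merge step: count adjacent symbol pairs across a word-frequency table, then merge the most frequent pair everywhere. The function names and the word-tuple representation are illustrative choices, not the assignment's required interface.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across all words.

    vocab maps a tuple of symbols (characters here, bytes in the
    assignment) to its corpus frequency.
    """
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` in each word with the merged symbol."""
    merged = pair[0] + pair[1]
    new_vocab = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2  # skip both halves of the merged pair
            else:
                out.append(word[i])
                i += 1
        new_vocab[tuple(out)] = freq
    return new_vocab

# Toy corpus: "low" appears 5 times, "lower" appears twice.
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2}
best = get_pair_counts(vocab).most_common(1)[0][0]
vocab = merge_pair(vocab, best)
```

Training repeats this loop until the vocabulary reaches the target size; a real implementation would also track the merge order so the tokenizer can replay it at encoding time.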

Published on January 3, 2026 · Updated on January 4, 2026 · 555 words