How to Visually Understand the Self-Attention Equation

Background

Throughout this post, we follow the standard mathematical convention that vectors are represented as column vectors, not row vectors. We’ll start by understanding self-attention for a single token, then generalize to the batched matrix form used in practice.

Single Token Self-Attention

Consider a sentence with $N$ tokens: $[t_1, t_2, \cdots, t_i, \cdots, t_N]$. For a single token $t_i$, its embedding is $\mathbf{x}_i \in \mathbb{R}^d$. The question is: how do we compute its output vector $\mathbf{z}_i \in \mathbb{R}^{d_2}$ after applying self-attention? ...
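The single-token view above can be sketched in a few lines of NumPy. This is a minimal illustration under the column-vector convention the post states; the projection names (`W_Q`, `W_K`, `W_V`) and the scaled dot-product form are standard assumptions, not details quoted from the excerpt.

```python
import numpy as np

# Illustrative sketch of self-attention for a single token t_i, with
# embeddings stored as columns (the post's column-vector convention).
# Dimensions N, d, d2 and the weight names W_Q/W_K/W_V are assumptions.
rng = np.random.default_rng(0)
N, d, d2 = 4, 8, 6                     # tokens, input dim, output dim

X = rng.standard_normal((d, N))        # columns are embeddings x_1 .. x_N
W_Q = rng.standard_normal((d2, d))     # query projection
W_K = rng.standard_normal((d2, d))     # key projection
W_V = rng.standard_normal((d2, d))     # value projection

def attend(i):
    """Compute the output vector z_i for token t_i (0-indexed)."""
    q = W_Q @ X[:, i]                  # query for token i, shape (d2,)
    K = W_K @ X                        # keys for all tokens, shape (d2, N)
    V = W_V @ X                        # values for all tokens, shape (d2, N)
    scores = K.T @ q / np.sqrt(d2)     # scaled dot-product scores, shape (N,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()           # softmax over the N tokens
    return V @ weights                 # z_i: weighted sum of value columns

z_i = attend(2)                        # z_i has shape (d2,)
```

Stacking `attend(i)` for all `i` column-by-column gives the batched matrix form the post goes on to derive.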

Published on January 4, 2026 · Updated on January 5, 2026 · 806 words

Building an LLM from Scratch: CS336 Assignment 1

This post documents my journey through Stanford CS336: Language Models from Scratch, specifically Assignment 1, where we implement core components of a language model.

Overview

CS336 is Stanford’s deep dive into building large language models from first principles. Assignment 1 focuses on:

- Tokenization (BPE implementation)
- Transformer architecture components
- Training loop fundamentals

Key Takeaways

1. Byte-Pair Encoding (BPE)

We first need to consider the relevant Python methods. BPE was originally developed for data compression but works remarkably well for subword tokenization in NLP. ...
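The core BPE idea, count adjacent symbol pairs and repeatedly merge the most frequent one, can be sketched in plain Python. This is a toy character-level illustration, not the assignment's byte-level implementation; the helper names and sample vocabulary are assumptions.

```python
from collections import Counter

# Toy sketch of one BPE training step. Words are tuples of symbols,
# mapped to their corpus frequency. (Illustrative only; the assignment
# operates on bytes, but the merge logic is the same idea.)
def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(vocab, pair):
    """Replace every occurrence of `pair` with the fused symbol."""
    merged = pair[0] + pair[1]
    new_vocab = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(word[i])
                i += 1
        new_vocab[tuple(out)] = freq
    return new_vocab

# One training step on a tiny assumed corpus, starting from characters.
vocab = {tuple("low"): 3, tuple("lower"): 2, tuple("lowest"): 1}
counts = get_pair_counts(vocab)
best = counts.most_common(1)[0][0]     # most frequent adjacent pair
vocab = merge_pair(vocab, best)
```

Repeating this step until a target vocabulary size is reached yields the ordered list of merges that defines the tokenizer.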

Published on January 3, 2026 · Updated on January 4, 2026 · 555 words