Building an LLM from Scratch

IN PROGRESS

This article will cover everything you need to know to build and train an LLM from scratch, along with some more advanced and modern techniques for achieving better performance. It assumes a certain level of technical knowledge; if you come from a programming or mathematical background, you should be able to follow along just fine. If you want to build up your knowledge before tackling the material below, here are some soft prerequisites:

  • Linear Algebra
    • LAFF (covers basic linear algebra)
    • ALAFF (covers more advanced topics that give a deeper understanding, which I believe to be very beneficial)
    • [ulaff.net](http://ulaff.net/)
  • Calculus
    • OpenStax Calculus 1-3
    • [openstax.org](https://openstax.org/subjects/math)
  • Python
    • PyTorch
    • [pytorch.org/docs](https://pytorch.org/docs/stable/index.html)

Tokenization

  • Byte Pair Encoding
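
To give a feel for the algorithm up front, here is a minimal sketch of the classic BPE merge loop: repeatedly find the most frequent adjacent symbol pair and merge it into a new symbol. The toy word-frequency table and the `</w>` end-of-word marker are assumptions for illustration; real tokenizers train on far larger corpora and usually operate on bytes.

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every whole-symbol occurrence of the pair into one symbol."""
    bigram = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {bigram.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Toy corpus: words pre-split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}

for _ in range(10):  # learn 10 merges
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print(best)  # the learned merge rules, in order
```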

Embeddings
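
As a quick sketch of what this looks like in PyTorch: an embedding layer is just a learned lookup table mapping each token id to a dense vector. The sizes below are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512            # hypothetical sizes
embedding = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[15, 2948, 391]])  # (batch=1, seq_len=3)
vectors = embedding(token_ids)               # (1, 3, 512), one row per token
print(vectors.shape)
```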

Self-Attention
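
Here is a minimal sketch of causal scaled dot-product self-attention. The projection matrices `w_q`, `w_k`, `w_v` are assumed to be given; in a real model they are learned parameters.

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """x: (batch, seq_len, d_model); w_*: (d_model, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (batch, seq, seq)
    # Causal mask: each position may attend only to itself and earlier positions.
    seq_len = x.size(1)
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention weights sum to 1 per query
    return weights @ v
```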

Multi-Head Self-Attention
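
A rough module-level sketch, assuming PyTorch 2.x for the fused `scaled_dot_product_attention` kernel; details like dropout and attention biases are omitted.

```python
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, num_heads, seq_len, head_dim).
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(b, t, d)  # merge heads back
        return self.out(y)
```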

Transformers
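
Putting the pieces together, here is a sketch of a single pre-norm decoder block in the GPT-style layout. It reuses the `MultiHeadSelfAttention` class from the sketch above, and the 4x MLP width is a common convention rather than a requirement.

```python
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm decoder block: attention and MLP, each with a residual connection."""
    def __init__(self, d_model, num_heads, mlp_ratio=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = MultiHeadSelfAttention(d_model, num_heads)  # from the sketch above
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # residual around attention
        x = x + self.mlp(self.ln2(x))   # residual around the MLP
        return x
```

A full model is essentially a stack of these blocks between the embedding layer and a final projection back to vocabulary logits.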

Positional Encoding
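
A minimal sketch of the original sinusoidal scheme from "Attention Is All You Need", assuming an even `d_model`; the resulting table is simply added to the token embeddings before the first block.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # pe[pos, 2i] = sin(pos / 10000^(2i/d)); pe[pos, 2i+1] = cos(pos / 10000^(2i/d))
    position = torch.arange(seq_len).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # (seq_len, d_model), added to the token embeddings
```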

Training

Dataset
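
As a sketch of how training data is typically served for next-token prediction: slice one long stream of token ids into fixed-length windows, with targets shifted one position to the left. The class and names here are hypothetical.

```python
import torch
from torch.utils.data import Dataset

class TokenDataset(Dataset):
    """Serves (input, target) windows from one long stream of token ids."""
    def __init__(self, token_ids, context_length):
        self.tokens = torch.tensor(token_ids, dtype=torch.long)
        self.context_length = context_length

    def __len__(self):
        return len(self.tokens) - self.context_length

    def __getitem__(self, idx):
        chunk = self.tokens[idx : idx + self.context_length + 1]
        return chunk[:-1], chunk[1:]  # targets are inputs shifted one token left
```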

Gradient Accumulation
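
A minimal sketch of the idea: run several micro-batches, scale each loss down, and only step the optimizer every `accumulation_steps` batches, simulating a larger effective batch than fits in memory. The model, data, and hyperparameters below are stand-ins, not a real training setup.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; in the real pipeline these come from the sections above.
model = nn.Linear(16, 100)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
loader = [(torch.randn(4, 16), torch.randint(0, 100, (4,))) for _ in range(32)]

accumulation_steps = 8  # effective batch = micro-batch size * accumulation_steps

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader):
    loss = loss_fn(model(inputs), targets)
    (loss / accumulation_steps).backward()  # scale so accumulated grads average
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```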

Training Time

Gradient Clipping
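
In PyTorch this is one call between `backward()` and `optimizer.step()`. Slotted into the accumulation loop sketched above, it would look like this; the `max_norm` of 1.0 is a common starting point, not a rule.

```python
import torch

(loss / accumulation_steps).backward()
if (step + 1) % accumulation_steps == 0:
    # Rescale gradients so their global L2 norm is at most max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
```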

Context Length

Advanced Topics

RLHF

RoPE
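
A sketch of the core operation, applied to the query and key tensors inside each attention head: consecutive pairs of dimensions are rotated by a position-dependent angle. This is the interleaved-pair formulation; real implementations also differ in frequency caching and dtype handling.

```python
import torch

def apply_rope(x, base=10000.0):
    """Rotate (q or k) pairwise by position-dependent angles.
    x: (batch, heads, seq_len, head_dim), head_dim assumed even."""
    b, h, t, d = x.shape
    freqs = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)   # (d/2,)
    angles = torch.arange(t, dtype=torch.float32).unsqueeze(1) * freqs  # (t, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]  # split each dimension pair
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # standard 2D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```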

Chain of Thought

nGPT

BUS

Byte Latent Transformer