From the Book - [First edition].
Understanding large language models
Coding attention mechanisms
Implementing a GPT model from scratch to generate text
Pretraining on unlabeled data
Fine-tuning for classification
Fine-tuning to follow instructions
References and further reading
Adding bells and whistles to the training loop
Parameter-efficient fine-tuning with LoRA