2. The Transformer
Today's lab is about training our first simple transformer architecture, first on a single GPU and then on multiple GPUs.
First, we'll train a model on a single GPU to tell tiny stories. Please follow the Jupyter notebook.
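Before experimenting, it helps to see how a model and its data end up on the GPU. The sketch below is a minimal, hypothetical example using PyTorch's built-in `nn.TransformerEncoder` as a stand-in for the notebook's model; the layer sizes are illustrative, not the lab's actual configuration.

```python
import torch
import torch.nn as nn

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny transformer encoder stands in for the lab's model
# (hidden dimension and layer count here are illustrative).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
).to(device)

# Dummy batch of shape (batch, sequence, hidden), created on the same device.
x = torch.randn(8, 16, 64, device=device)
out = model(x)
print(out.shape)
```

The key point is that the model (via `.to(device)`) and every input tensor must live on the same device, or PyTorch raises a device-mismatch error.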
The task is to work through the notebook and then experiment with different model parameters: the learning rate, the model architecture (hidden dimension, number of layers), the batch size, and the number of epochs. Try even scaling the model up to 5x the parameters.
At the end of the notebook, write a short report on how each of these changes affects the model's performance.
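When scaling the model up, it is useful to check that your new configuration really lands near 5x the parameters. A small helper like the one below does the counting; the baseline and scaled-up sizes are hypothetical, chosen only to illustrate the check, and are not the notebook's actual settings.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Count the trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def make_model(d_model: int, num_layers: int) -> nn.Module:
    """Build a toy transformer encoder with the given width and depth."""
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

# Baseline vs. scaled-up configuration (sizes are illustrative).
base = make_model(d_model=64, num_layers=2)
big = make_model(d_model=128, num_layers=5)

ratio = count_params(big) / count_params(base)
print(f"baseline: {count_params(base):,} params")
print(f"scaled:   {count_params(big):,} params (~{ratio:.1f}x)")
```

Because parameter count grows roughly quadratically with hidden dimension and linearly with depth, doubling the width and more than doubling the depth is one way to land near 5x; record the exact counts in your report alongside the resulting loss.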