2. The Transformer
Today's lab is about training our first simple transformer architecture, first on a single GPU and then on multiple GPUs.
First, we'll train a model on a single GPU to tell tiny stories. Please follow the Jupyter notebook.
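Before experimenting, it helps to see how a model and its data end up on the GPU. The sketch below is a minimal, hypothetical example using PyTorch's built-in `nn.TransformerEncoder` as a stand-in for the notebook's model; the layer sizes are illustrative, not the lab's actual configuration.

```python
import torch
import torch.nn as nn

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny transformer encoder stands in for the lab's model
# (hidden dimension and layer count here are illustrative).
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
).to(device)

# Dummy batch of shape (batch, sequence, hidden), created on the same device.
x = torch.randn(8, 16, 64, device=device)
out = model(x)
print(out.shape)
```

The key point is that the model (via `.to(device)`) and every input tensor must live on the same device, or PyTorch raises a device-mismatch error.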
The task is to work through the notebook and then experiment with different model parameters: the learning rate, the model architecture (hidden dimension, number of layers), the batch size, and the number of epochs. Try even scaling the model up to 5x the parameters.
At the end of the notebook, write a short report on how each of these changes affects the model's performance.
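When scaling the model up, it is useful to check that your new configuration really lands near 5x the parameters. A small helper like the one below does the counting; the baseline and scaled-up sizes are hypothetical, chosen only to illustrate the check, and are not the notebook's actual settings.

```python
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Count the trainable parameters of a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def make_model(d_model: int, num_layers: int) -> nn.Module:
    """Build a toy transformer encoder with the given width and depth."""
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

# Baseline vs. scaled-up configuration (sizes are illustrative).
base = make_model(d_model=64, num_layers=2)
big = make_model(d_model=128, num_layers=5)

ratio = count_params(big) / count_params(base)
print(f"baseline: {count_params(base):,} params")
print(f"scaled:   {count_params(big):,} params (~{ratio:.1f}x)")
```

Because parameter count grows roughly quadratically with hidden dimension and linearly with depth, doubling the width and more than doubling the depth is one way to land near 5x; record the exact counts in your report alongside the resulting loss.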