After last week, I was really excited to train my model on the lab’s GPU! Unfortunately, things did not go according to plan. I accessed the computer fine with AnyDesk and downloaded my files to it. The disadvantage of using the lab computer, though, was that I had to be careful not to update, downgrade, or install any packages on it. I was able to get around that with a virtual environment, but when I ran my code inside it, it couldn’t access CUDA. When I tested it with the computer’s own installation, it could access CUDA, but I couldn’t install packages I would need later, and I couldn’t load any of my data because the installed PyTorch version was too outdated. Eventually, the solution we came up with was to train the model on the personal computer of Nathan (my PhD student mentor), which has a GPU. After fixing some minor errors and changing the code to run marginally faster (paradoxically, by taking the data off the GPU), it started training.
Even though it trains faster now, it still takes around half an hour per epoch. At around 61 epochs, with the loss still going down, Nathan sent me the latest saved model. After running it on the test set, I was impressed by its performance. It’s definitely not done training yet, with an average loss of 0.99 per sentence, but the generated sentences are pretty close. They still have an unfortunate tendency to repeat words. For example, given the sentence embedding for “came here with my family this week”, it outputs “came here my my with my my family week”. It’s not great, but considering that it’s never seen that sentence before and can almost replicate it, it’s still impressive. I’m excited to see how well it performs when it’s been trained for longer.
The last part that I’ve been working on is creating something that scores a sentence (as in, how likely an input string is to be a valid sentence). Luckily for me, that’s something other people have already made, so the challenge lies in making it work with the structure of my code. The Transformers library, which I’ve already been using for RoBERTa, also provides a pretrained GPT-2, so I’ve created a function that takes in the RoBERTa tokens and outputs GPT-2 tokens. Then, it’s just a matter of getting the GPT-2 loss for each sentence, where a lower loss means a more plausible sentence. So far, it seems to work, so hopefully that will come in handy for the next part of the project.
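For anyone curious, here is a minimal sketch of how that scoring step could look. It is not my exact code: the function names (`roberta_ids_to_gpt2_ids`, `sentence_loss`, `load_pretrained`) are made up for illustration, and it assumes the simplest possible conversion, which is decoding the RoBERTa tokens back to text and re-tokenizing that text for GPT-2. The checkpoint names `roberta-base` and `gpt2` are also assumptions.

```python
def roberta_ids_to_gpt2_ids(roberta_ids, roberta_tok, gpt2_tok):
    """Convert RoBERTa token IDs to GPT-2 token IDs by going through plain text."""
    # Skip special tokens (<s>, </s>, <pad>) so they don't end up as literal text.
    text = roberta_tok.decode(roberta_ids, skip_special_tokens=True)
    # return_tensors="pt" gives a (1, seq_len) tensor of GPT-2 token IDs.
    return gpt2_tok(text, return_tensors="pt")["input_ids"]


def sentence_loss(gpt2_ids, gpt2_model):
    """Mean per-token cross-entropy under GPT-2 (lower = more plausible)."""
    # Passing labels=input_ids makes the LM head model compute the
    # next-token prediction loss itself (the shift happens internally).
    # When scoring many sentences, wrapping calls in torch.no_grad()
    # avoids keeping gradient buffers around.
    output = gpt2_model(input_ids=gpt2_ids, labels=gpt2_ids)
    return output.loss.item()


def load_pretrained():
    """Load both tokenizers and GPT-2 (downloads the weights on first use)."""
    # Imported here so the two helpers above stay dependency-free.
    from transformers import (
        GPT2LMHeadModel,
        GPT2TokenizerFast,
        RobertaTokenizerFast,
    )

    roberta_tok = RobertaTokenizerFast.from_pretrained("roberta-base")
    gpt2_tok = GPT2TokenizerFast.from_pretrained("gpt2")
    gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    return roberta_tok, gpt2_tok, gpt2_model
```

To try it, call `load_pretrained()` once, tokenize a sentence with the RoBERTa tokenizer, pass its `input_ids` through `roberta_ids_to_gpt2_ids`, and hand the result to `sentence_loss`. One caveat: a sentence that ends up as a single GPT-2 token has no next token to predict, so its loss is not meaningful.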