Week 6

Coming off of last week, I was very excited to find out which version of the model was the “best” one. The checkpoint that had performed the best in my hands-on testing was the one from the 61st epoch, but I suspected it was overfitted, and that was confirmed when we looked at the error on the validation set: the best-performing models were around the 40-45 epoch mark. We selected the 40-epoch model to continue with, since it was the earliest checkpoint with error comparable to the other four lowest-error models.
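(As a hypothetical restatement of that selection rule in code, with made-up names, not the actual evaluation script: among the handful of checkpoints with the lowest validation error, keep the earliest one.)

```python
import numpy as np

def pick_checkpoint(val_errors, n_best=5):
    """Hypothetical selection rule: among the n_best checkpoints with the
    lowest validation error, keep the earliest one (fewest epochs)."""
    best_epochs = np.argsort(val_errors)[:n_best]  # indices of the lowest-error epochs
    return int(best_epochs.min()) + 1              # earliest of those, as a 1-based epoch
```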

With that out of the way, all of the component parts were in place, and I was able to start working on the final generation method. This uses a package called Pyribs, which samples random vectors as inputs and then archives them along with their objective value (how good the sentences are, measured using GPT-2’s loss on the sentences) and their behavior (in this case, how positive or negative the classifier rates them). In theory, this generates a variety of texts! In practice, though, the text it was generating was not good; none of the sentences even looked like sentences. Thinking about it, there are a couple of reasons. For testing, I had implemented it one sentence at a time, which made it very slow. Also, the model isn’t trained to generate good sentences; it’s trained to generate sentences that correspond to embeddings, so entirely random input vectors may simply never decode to good sentences.
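To make the setup concrete, here is a minimal sketch of what such a Pyribs loop could look like, assuming the GridArchive/GaussianEmitter/Optimizer API Pyribs exposed around this time; `decode_and_score` is a hypothetical stand-in for the embedding-to-sentence model, the GPT-2 scorer, and the sentiment classifier, not the project’s actual code.

```python
import numpy as np
from ribs.archives import GridArchive
from ribs.emitters import GaussianEmitter
from ribs.optimizers import Optimizer

EMBED_DIM = 768  # assumed size of the embedding vectors the generator expects

def decode_and_score(solutions):
    """Hypothetical stand-in: decode each vector with the generator, score
    fluency with GPT-2 loss (negated, since Pyribs maximizes), and measure
    sentiment with the classifier (one value per sentence, in [0, 1])."""
    objectives = -np.random.rand(len(solutions))   # placeholder fluency scores
    behaviors = np.random.rand(len(solutions), 1)  # placeholder sentiment values
    return objectives, behaviors

# One behavior dimension: classifier sentiment, binned into 50 cells over [0, 1].
archive = GridArchive([50], [(0.0, 1.0)])

# Emitter that proposes random vectors around a starting point (all zeros here).
emitters = [GaussianEmitter(archive, np.zeros(EMBED_DIM), 0.5, batch_size=32)]
optimizer = Optimizer(archive, emitters)

for _ in range(1000):
    solutions = optimizer.ask()                       # candidate embedding vectors, one per row
    objectives, behaviors = decode_and_score(solutions)
    optimizer.tell(objectives, behaviors)             # archive keeps the best per sentiment bin
```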

The first way of solving this problem, obviously, is speeding up the process by batching it, so that even if good sentences are unlikely, many more attempts can be made per run. To that end, I’ve been planning to convert it to batching, although I’ve hit some minor snags, namely that GPT-2 doesn’t support batching out of the box. Luckily, a RoBERTa model works similarly, and using it also cuts out the need to convert between tokenizers, which hopefully will speed things up further. I still haven’t implemented this, but with this preparation, it should be pretty easy to finish next week.
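As a sketch of what the batched scoring might look like once it switches over, here is one possible version using Hugging Face’s RobertaForMaskedLM; this is my guess at the approach rather than the project’s actual code, and feeding the unmasked inputs back in as labels gives a rough per-sentence fluency number, not a true pseudo-perplexity.

```python
import torch
import torch.nn.functional as F
from transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
scorer = RobertaForMaskedLM.from_pretrained("roberta-base").eval()

@torch.no_grad()
def batched_sentence_scores(sentences):
    """One fluency-ish score per sentence (lower = more natural-looking)."""
    enc = tokenizer(sentences, return_tensors="pt", padding=True)
    input_ids, attention = enc["input_ids"], enc["attention_mask"]
    # Ignore padding positions in the loss by marking them with -100.
    labels = input_ids.masked_fill(attention == 0, -100)
    logits = scorer(input_ids=input_ids, attention_mask=attention).logits
    # Per-token cross-entropy, kept unreduced so each sentence gets its own score.
    per_token = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none", ignore_index=-100
    )
    return (per_token.sum(dim=1) / attention.sum(dim=1)).tolist()
```

The important part is that a whole batch of candidate sentences goes through the tokenizer and the model in a single call, instead of one at a time.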

The second thing that could help would be using the random vectors as noise added to an existing embedding, which I’ve already found works better than totally random vectors for generating. If simply increasing the number of attempts doesn’t help, this is a good next step to make sure the generator can at least reproduce sentences near known embeddings.
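A rough sketch of that idea (the names here are hypothetical, not the project’s code): instead of letting Pyribs search from scratch, candidates are sampled as small perturbations of an embedding that is already known to decode to a real sentence.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_candidates(base_embedding, n_candidates=32, scale=0.1):
    """Perturb a known-good embedding instead of sampling vectors from scratch.

    base_embedding would come from the same encoder the generator was trained
    against; scale controls how far the candidates wander from it.
    """
    noise = rng.normal(0.0, scale, size=(n_candidates, base_embedding.shape[0]))
    return base_embedding + noise
```

In the Pyribs setup above, a similar effect might come from seeding the emitter’s starting point with the known-good embedding and using a small sigma, though the details depend on the emitter.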

A less likely solution would be (unfortunately) to retrain the model, but this time with a weighted loss function that also prioritizes making sentences that make sense, for example by adding the loss from the pretrained RoBERTa model. This could make the generator worse at replicating the embeddings exactly, but it would increase the number of outputs that look like sentences. This would be a total last resort, though, given how computationally expensive retraining is (although now there’s an idea of the approximate number of epochs to shoot for). Hopefully it doesn’t come to that, and I can solve the problem with the first two approaches.
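For completeness, the weighted loss idea would look something like the sketch below; the names and the weight are hypothetical, and `lm_loss` would come from a frozen pretrained scorer such as RoBERTa evaluating the generator’s output.

```python
import torch

LAMBDA_LM = 0.1  # assumed weight between the two goals; would need tuning

def combined_loss(embedding_loss: torch.Tensor, lm_loss: torch.Tensor) -> torch.Tensor:
    # embedding_loss: how far the decoded sentence drifts from the target embedding.
    # lm_loss: how unnatural the decoded sentence looks to the pretrained scorer.
    return embedding_loss + LAMBDA_LM * lm_loss
```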

Written on July 16, 2021