This week has been a mixed bag with the LSTM. Initially, I kind of followed a tutorial for a seq2seq model that used a GRU- ultimately that may have ended up being more complicated, since the embeddings I calculated act similarly to the encoder portion of a traditional seq2seq model, and I had to figure out how to combine the structure of the tutorial with what I want to use the model for. Once that worked, I tried to get it to run more efficiently by adding in batch sizes. What I’ve come to realize while working on this project is that the most frustrating part of it is trying to make sure all the dimensions line up. I want to say that I’m getting better at it, though, and after a few hours of trying to understand the documentation, I succeeded in getting it to run on batches. Finally, I converted the GRU to an LSTM.
The latest thing I’ve been struggling with is the limits of Google Colab. For one thing, it will occasionally deny access to the GPU because I use it so often; for another the GPU has a limited amount of memory and will not let me run it for long periods of time. It turns out that running an LSTM on oer 400,000 sentences takes a significant amount of both time and memory, and so to properly train the LSTM, I need to get a GPU that isn’t Colab-dependent. Luckily, the lab has a GPU I can use, but that relies on me being able to use the GPU on the remote computer. Before I do that, I want to double check that everything is set up efficiently, so that I’m not wasting the time on the GPU (and the according amount of energy). I also want to possibly implement a better way of getting the data from the dataset, on the off chance that that might speed the process up.