After working with the objective function for a while last week, I still had a lot of issues. For one thing, the new model didn’t train unless the learning rate was absurdly high (close to 0.1, which is orders of magnitude larger than it should be). For another, the results weren’t good. When I talked about it with Nathan, he suggested that it was doing so badly because feeding in raw token numbers implies a kind of ordinality that doesn’t exist: token 500 isn’t “more” than token 5 in any meaningful sense. The solution would be to work with one-hot vectors, but the vocabulary was large enough that doing so took a very long time. Eventually I came across PyTorch’s embedding layer, which worked amazingly well. So well, in fact, that when I tested a trained model with pyribs, it actually generated text! Even without using an embedding as a seed, it generated somewhat plausible sentences (admittedly, still within the bounds of what the generator could produce), which was huge progress considering how terrible the other generated text could be.
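To make the one-hot versus embedding point concrete, here is a rough sketch in plain Python (not the actual project code). It assumes a tiny made-up vocabulary and a random lookup table; `vocab_size`, `embedding_dim`, `table`, and the function names are all hypothetical. The idea is that multiplying a one-hot vector by a weight matrix just picks out one row, so an embedding layer such as `torch.nn.Embedding` can skip building the huge one-hot vector and index the row directly, with the table learned during training.

```python
import random

# Hypothetical sizes, chosen only for illustration; a real vocabulary
# would be far larger, which is what makes the one-hot route slow.
vocab_size = 8
embedding_dim = 3

random.seed(0)
# One row of weights per token. In a real embedding layer these values
# are trainable parameters; here they are just random numbers.
table = [[random.uniform(-1, 1) for _ in range(embedding_dim)]
         for _ in range(vocab_size)]


def one_hot(token_id, size):
    """Return a vector of zeros with a single 1.0 at position token_id."""
    vec = [0.0] * size
    vec[token_id] = 1.0
    return vec


def embed_via_one_hot(token_id):
    """Multiply the one-hot vector by the table: O(vocab_size * embedding_dim)."""
    hot = one_hot(token_id, vocab_size)
    return [sum(hot[i] * table[i][j] for i in range(vocab_size))
            for j in range(embedding_dim)]


def embed_via_lookup(token_id):
    """Index the row directly: O(embedding_dim), no one-hot vector needed."""
    return list(table[token_id])


if __name__ == "__main__":
    for token_id in [5, 0, 7]:
        print(token_id, embed_via_lookup(token_id))
```

Both functions return the same vector for a given token, but only the first one scales with the vocabulary size. Neither treats token 7 as "bigger" than token 5: each token simply gets its own row.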
Unfortunately, the other part, a spectrum of sentiment across the generated text, was not as present. Clearly, the classifier still had issues, so I decided to work on building a new one. The results so far have been mixed: the model has actually started generating worse text when paired with the new classifier. I think that if I make the network less simple, things may get better, and I may also have to tweak some settings to get it to work more efficiently. Overall, though, it’s looking more and more likely that the network I make will actually work. Assuming it starts generating better text, the next step might be to add the length of the text as another axis to generate along (although I predict that will make the generated text much, much worse).