So, I took a few days off to play video games instead of overheating my GPU on model training. I’m starting up again now, but first I decided to change my training data a bit.
Mostly I’m annoyed that the NNs are performing so poorly against the smart random player (SRP). So, for the next few rounds, the training data is going to be generated by the SRP playing itself. Specifically, this means the NNs will be studying games where the SRP was defeated. Admittedly, the SRP will also have won those games, since it is playing both sides, but the point is that by studying how the SRP is beaten, the NNs should start winning against it in the tournament.
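
To make the idea concrete, here is a rough sketch of what SRP self-play data generation could look like. It is not my actual pipeline: tic-tac-toe stands in for the real game, the SRP policy (take a win, else block, else move randomly) is an assumed definition of "smart random", and every name in it (`smart_random_move`, `self_play_game`, `generate_training_data`) is made up for illustration. Dropping draws and keeping only the winner's moves is also an assumption about how "studying how the SRP is beaten" might be turned into samples.

```python
import random

# Hypothetical stand-in game: tic-tac-toe on a flat 9-cell board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def smart_random_move(board, me, rng):
    """Assumed SRP policy: win if possible, else block, else pick at random."""
    other = "O" if me == "X" else "X"
    empty = [i for i, cell in enumerate(board) if cell == " "]
    for mark in (me, other):  # first look for my win, then for a block
        for i in empty:
            trial = board[:i] + [mark] + board[i + 1:]
            if winner(trial) == mark:
                return i
    return rng.choice(empty)


def self_play_game(rng):
    """Play SRP against SRP; return ([(board, player, move), ...], winner)."""
    board = [" "] * 9
    moves = []
    player = "X"
    while winner(board) is None and " " in board:
        move = smart_random_move(board, player, rng)
        moves.append((tuple(board), player, move))
        board[move] = player
        player = "O" if player == "X" else "X"
    return moves, winner(board)


def generate_training_data(n_games, seed=0):
    """Collect (board, move) pairs made by the winning side of decisive games."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_games):
        moves, won = self_play_game(rng)
        if won is None:
            continue  # a draw means no SRP was beaten, so skip it (assumption)
        samples.extend((b, m) for b, p, m in moves if p == won)
    return samples
```

Calling `generate_training_data(1000)` would return a list of board/move pairs taken only from the side that beat the SRP, which is the kind of data the NNs would then train on.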