As I wrote previously, I had run into the limit of how much my neural networks could improve by playing against purely random opponents (or even random opponents with shortcuts, such as always taking winning moves when available), because the signal-to-noise ratio of those players was simply too low.
If a Neural Network played one of these random players and won, there was nothing to learn, as the observable patterns would only reinforce what it already knew.
If a Neural Network played one of these random players and lost, there still wasn't anything to learn, as the random player was too erratic for any patterns to be picked up.
So, I adjusted my algorithms to allow for a bit of randomness from the Neural Networks. Now, I have essentially three ways of playing:
- Neural Networks make the best move they see available. This is how they've been playing until now.
- Neural Networks randomly choose between all available moves, weighted by how good they think each move is
- Neural Networks first normalize the move weights by a given amount, and then randomly choose between them. This makes the “good” moves less likely and the “bad” moves more likely, resulting in increased randomness.
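The three strategies above can be sketched roughly like this. This is a minimal illustration, not the actual training code; the function and mode names are my own, and I'm assuming the network produces a positive weight (score) for each legal move:

```python
import random

def choose_move(moves, weights, mode="best", temperature=1.0):
    """Pick a move given the network's weight for each available move.

    Hypothetical mode names, one per strategy:
      "best"     -- greedy: always take the highest-weighted move
      "weighted" -- sample moves in proportion to their raw weights
      "tempered" -- flatten the weights first, then sample; a higher
                    temperature makes "good" moves less likely and
                    "bad" moves more likely
    """
    if mode == "best":
        return max(zip(moves, weights), key=lambda mw: mw[1])[0]
    if mode == "tempered":
        # Raising each weight to a power below 1 pulls the weights
        # toward each other, increasing randomness.
        weights = [w ** (1.0 / temperature) for w in weights]
    # Weighted random choice over all available moves.
    return random.choices(moves, weights=weights, k=1)[0]
```

Because the sampling modes still favor highly weighted moves, the network mostly plays "in character" while occasionally exploring, which is the point of the whole exercise.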
The key here is that, even though they are choosing moves at random, the Neural Networks still adhere, more or less, to the patterns of "good" behavior they've already learned. Sure, they can pull a wild move out of thin air, but they're not very likely to, and this greatly increases the signal strength in the set of "random" matches they play.
The result is that the effectiveness of Larry (the Neural Network I’m currently training) has jumped from 80% against the SRP (where he was previously stuck for several generations) to over 90% in just a few generations. I’m going to see if I can get him over 95% and then try playing against him again to see how he does.
There is, of course, a risk here that he will overtrain and only learn how to beat himself. I'll have to wait and see whether that happens.