After writing yesterday, I ran a few rounds through the evening and overnight (god bless Nvidia; my GTX1030 finished four rounds of training in that time). Unfortunately, I can now confirm that I’ve reached the limit of what I can train with my current method of generating training data.
Some background: Neural networks, as we currently employ them, simply look at data and tease out patterns. High-quality data has a strong signal-to-noise ratio; low-quality data has a large amount of noise, or even anti-signals that look promising at first but end up leading you astray. The ultimate effectiveness of your neural network, then, is determined as much by the quality of your training data as by the size and structure of the network itself.
At first, I was playing my NNs against each other, looking at who won, and using that as the basis for my training data. Unfortunately, this proved extremely slow, as the players were roughly analogous to random number generators. So I bootstrapped the process by playing the Smart Random Player (SRP) against itself.
The main idea here is that, by seeing the SRP lose, my networks would learn how to beat it. Sure, it’s still essentially a random player, so there would be a lot of “noise” mixed in, but I was hoping the strength of the signal would at least give my NNs a leg up.
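The bootstrap loop above can be sketched roughly like this. Everything game-specific here is a hypothetical stand-in (the real SRP, game state, and win condition are not shown in this post); the point is just the shape of the data generation: record positions from SRP self-play, then label each one by whether the side to move went on to win.

```python
import random

random.seed(0)

def smart_random_move(state):
    # Stand-in for the real SRP move selection.
    return random.randint(0, 6)

def self_play_game():
    positions = []          # (state, side_to_move) pairs seen during the game
    state = []              # stand-in game state: the move list so far
    side = 0
    for _ in range(20):     # stand-in for "play until the game ends"
        positions.append((list(state), side))
        state.append(smart_random_move(state))
        side = 1 - side
    winner = random.choice([0, 1])   # stand-in result
    # Label each position 1 if the side to move eventually won, else 0.
    return [(pos, 1 if s == winner else 0) for pos, s in positions]

training_data = [ex for _ in range(50) for ex in self_play_game()]
print(f"generated {len(training_data)} labeled positions")
```

The labels are noisy, exactly as described: a random player sometimes wins from bad positions, so some examples carry the wrong lesson, but on average winning positions get labeled 1 more often than losing ones.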
And, in fact, it did, for a time. My NNs trained and learned and improved dramatically. Some of them now win as often as 60% of the time against the SRP. The problem is that there is a great deal of variability here; a model that wins 60% of one set of 100 games will, after another round of training, win only 40% of the next set.
There are a couple of ways around this… The first, and simplest, would be to increase the size of the training set dramatically. The extra noise should cancel itself out (since it’s essentially random), while the accumulated signal should break through and allow my NNs to continue learning.
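A quick simulation makes the case for bigger sets. Here the “true” strength is fixed at 55% (a made-up figure for illustration), and we compare how widely observed win rates scatter across many 100-game evaluations versus many 10,000-game evaluations:

```python
import random

random.seed(0)

def observed_win_rate(true_p: float, n: int) -> float:
    # One noisy evaluation: n independent games at true win probability true_p.
    wins = sum(1 for _ in range(n) if random.random() < true_p)
    return wins / n

def spread(true_p: float, n: int, trials: int = 200) -> float:
    # Range (max - min) of observed win rates across repeated evaluations.
    rates = [observed_win_rate(true_p, n) for _ in range(trials)]
    return max(rates) - min(rates)

print(f"spread across 100-game sets:    {spread(0.55, 100):.3f}")
print(f"spread across 10,000-game sets: {spread(0.55, 10_000):.3f}")
```

The same averaging logic applies to the training set itself: random mislabels pull in no consistent direction, while the genuine pattern accumulates.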
The second way around this is to go back to my original method: pick the best few NNs, play a series of games, and look at who won. Specifically, I need to pick out the games where the best NNs lost… those are the games they can learn from, and they’ll generate useful training data.
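That selection step might look something like the sketch below. The game itself is again a hypothetical stand-in (here the winner is just a coin flip and the move log is left empty); the real version would record the full game so the losses can be turned into training examples:

```python
import random

random.seed(0)

def play_game(player_a, player_b):
    # Stand-in for the real game loop: returns (winner, move_log).
    winner = random.choice([player_a, player_b])
    moves = []  # the real version would record the move history here
    return winner, moves

def losses_for(best_player, opponent, num_games=100):
    # Keep only the games the strong player lost; those are the
    # instructive ones.
    training_games = []
    for _ in range(num_games):
        winner, moves = play_game(best_player, opponent)
        if winner != best_player:
            training_games.append(moves)
    return training_games

data = losses_for("best_nn", "challenger", num_games=100)
print(f"kept {len(data)} losses out of 100 games")
```

One side effect worth noting: the stronger the best NN gets, the fewer losses each batch of games yields, so this method gets more expensive exactly as the models improve.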