As I watched my networks being trained, I noticed how quickly even the simplest learned to exactly match the training data. With a data set of 100,000 items, this ought not to happen. Then it struck me:
The networks are deterministic.
For a given set of inputs, any given network will generate one exact set of output data. Given a specific board, that network will make the exact same move every time.
This means that in my training sessions involving three networks, there are only three possible starting moves, one per network, repeated over and over. For each of those starting moves, there are only two possible responses, one from each of the other two networks. After that, each game follows a set pattern. My data set of 100,000 items (roughly 12,000 games) is actually a data set of six possible games (around 50 data lines) repeated over and over.
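The collapse is easy to reproduce with a toy sketch. Everything here is a hypothetical stand-in rather than my actual setup: each "network" is just a fixed preference order over nine cells, the names `PLAYERS`, `choose`, and `play` are made up for illustration, and the game simply runs until the board is full, with no win detection.

```python
from itertools import permutations

# Hypothetical stand-ins for three trained networks: each one is a fixed
# preference order over nine cells and always takes its most-preferred
# empty cell. No randomness anywhere.
PLAYERS = {
    "A": [4, 0, 2, 6, 8, 1, 3, 5, 7],
    "B": [0, 4, 8, 2, 6, 1, 3, 5, 7],
    "C": [8, 4, 0, 2, 6, 7, 5, 3, 1],
}

def choose(preferences, taken):
    """Deterministic policy: the same board always yields the same move."""
    return next(cell for cell in preferences if cell not in taken)

def play(first, second):
    """Alternate moves until the board is full (win detection omitted)."""
    moves = []
    for turn in range(9):
        mover = first if turn % 2 == 0 else second
        moves.append(choose(PLAYERS[mover], moves))
    return tuple(moves)

# Every ordered pairing: 3 possible starters x 2 possible opponents = 6.
pairings = list(permutations(PLAYERS, 2))

# Cycle through the pairings for 12,000 games, then count distinct games.
games = [play(*pairings[i % len(pairings)]) for i in range(12_000)]
unique_games = set(games)
print(len(games), "games played,", len(unique_games), "distinct")
# prints: 12000 games played, 6 distinct
```

However many games get played, the set of distinct games can never grow beyond the number of ordered pairings.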
It’s too late for me to fix this tonight, but the solution is simple. For the round-robin tournament of ranked games, I will allow each AI to make the best move it knows how. For the training items, I will introduce a random element; the best moves will be most likely, and the worst moves least likely, but any of them will be possible.
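One way this could look is softmax sampling over the network's move scores. This is only a sketch under assumptions: I am assuming the network's output can be read as one score per legal move, and the names `best_move`, `sample_move`, and `temperature` are placeholders, not anything that exists in my code yet.

```python
import math
import random

def best_move(scores):
    """Greedy choice for ranked games: always the highest-scoring move."""
    return max(scores, key=scores.get)

def sample_move(scores, temperature=1.0, rng=random):
    """Random choice for training games, weighted toward better moves.

    `scores` maps each legal move to the network's score for it (assumed
    format). Every move keeps a nonzero chance of being picked.
    """
    moves = list(scores)
    top = max(scores.values())
    # Subtract the top score before exponentiating to avoid overflow.
    weights = [math.exp((scores[m] - top) / temperature) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

# Example with made-up scores for three legal moves.
scores = {0: 2.0, 4: 1.0, 8: -1.0}
rng = random.Random(0)  # seeded only so the demo is repeatable
counts = {move: 0 for move in scores}
for _ in range(10_000):
    counts[sample_move(scores, rng=rng)] += 1
print("ranked game plays:", best_move(scores))
print("training picks:", counts)
```

A lower `temperature` pushes the sampling toward the greedy choice, and a higher one flattens it toward uniform random play, so it gives a single knob for how varied the training games are.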
I’ll get to work on that tomorrow.