Nothing. Larry, the neural network with a foundation of convolutional layers, has not progressed beyond winning 80% of games against the SRP. I’ve varied the size of the training set (100,000–500,000 games), as well as the ratio of wins to losses (from Larry winning 30% to winning 70% against the SRP).
In theory, a large number of games where Larry loses should let him learn new patterns of behavior that improve his play. In practice, I believe the signal-to-noise ratio is simply too low: the SRP plays far too randomly for Larry to learn anything more from it.
The solution, of course, would be to increase the quality of play.
I could simply use VICTOR, which was developed as part of a master’s thesis at the Vrije Universiteit Amsterdam in the 80s. It has been proven to play flawlessly (leading to an unavoidable win by the first player; the second player may force a tie if the first player makes a single mistake). However, I’d rather not lean on this crutch, as such tools aren’t available for Goban or Go (which I plan on tackling later). I’d rather have my NNs able to learn from themselves.
Instead, I’ll add a bit of randomness to their play, introducing some variation into the moves chosen. I’ll probably add a parameter which controls how likely the NN is to make a random move, with 1 meaning “always make the best move available” and 0 meaning “always choose a move at random.” Then I can drop the SRP entirely and have the neural networks solely responsible for generating the training data. By varying this randomness parameter, I can see what values produce good training data.
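As a sketch of what I have in mind, the selection rule is only a few lines. The function and parameter names here are mine, not anything from the actual codebase, and the scores are hypothetical network outputs:

```python
import random

def choose_move(scores, greed):
    """Pick an index into `scores` (one score per legal move).

    `greed` follows the convention above: 1.0 always takes the
    highest-scoring move, 0.0 always picks uniformly at random,
    and values in between blend the two behaviors.
    """
    if random.random() < greed:
        # Exploit: take the move the network rates highest.
        return max(range(len(scores)), key=lambda i: scores[i])
    # Explore: ignore the scores and pick any legal move.
    return random.randrange(len(scores))

# Hypothetical scores for four legal moves; with greed=1.0 the
# best-rated move (index 2) is always chosen.
assert choose_move([0.1, 0.4, 0.9, 0.2], greed=1.0) == 2
```

This is essentially epsilon-greedy exploration with the parameter inverted, so self-play games at intermediate values should mix strong moves with occasional surprises.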