I’ve been running training sessions for my latest model for roughly two months now, and I’m getting frustrated. Progress is extremely slow; the model is still infantile in its ability to play Gomoku.
Looking back at my adventures with Connect 4, the main things I learned are:
- Smaller training sets are fine, as they are quickly swapped out for higher-quality data as the model learns to play
- Larger / more complex models play better, but the ROI drops off quickly beyond a certain point
Because of that, I’ve been playing a single model against itself in 50-game matches, with one of the players adding a touch of random variability. Even so, each 50-game set takes 3 hours to run. After two months of this training, the model is still frustratingly unable to perform basic actions like making a winning move or blocking mine.
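The self-play setup above can be sketched roughly as follows. This is a minimal illustration, not my actual training code: the move representation, the `epsilon` parameter, and the score dictionary are all assumptions made for the example. The key idea is that one player occasionally picks a random legal move so the two sides don't replay the identical game 50 times.

```python
import random

def pick_move(scores, epsilon=0.0):
    """Pick the highest-scoring legal move; with probability `epsilon`,
    pick a random legal move instead. A nonzero epsilon is the "touch of
    random variability" that keeps self-play games from repeating.

    `scores` maps each legal move to the model's evaluation of it
    (both are hypothetical stand-ins for the real model interface)."""
    if epsilon > 0 and random.random() < epsilon:
        return random.choice(list(scores))
    return max(scores, key=scores.get)

def play_match(play_game, n_games=50, epsilon=0.1):
    """Skeleton of one 50-game match between a model and itself.
    `play_game(epsilon)` is assumed to play out a single game, calling
    pick_move() for each side (epsilon > 0 for the "noisy" player),
    and return the finished game record."""
    return [play_game(epsilon) for _ in range(n_games)]
```

With a deterministic player (`epsilon=0`), `pick_move` just returns the move the model rates highest; raising `epsilon` trades playing strength for game diversity.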
I’ve decided to take a page out of Google’s playbook with their recent AlphaGo Zero implementation. They played two separate instances against each other and learned from every game played, with tremendous results (learning to beat the best human players).
So on my laptop, I’ll set up two separate models and play them against each other. Every round will consist of a pair of games, with each model playing once as black and once as white. The games will be added to a single database that will grow over time. To keep the data manageable, I’m also not going to augment positions by shifting them over the board, which means a lot less training data; sessions should complete more quickly, at the cost of generalization. We’ll see how this goes over the next few weeks.
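The round structure I have in mind could look something like this sketch. The `play_game` callback and the plain-list database are assumptions for illustration (the real database will be persistent storage, not an in-memory list); the point is just the color swap within each round.

```python
def run_round(model_a, model_b, play_game, database):
    """One training round: a pair of games in which each model plays
    black once and white once, so neither side gets a first-move
    advantage over the round as a whole.

    `play_game(black, white)` is a hypothetical callback that plays a
    single game to completion and returns its record; `database` is a
    list standing in for the growing game store."""
    database.append(play_game(model_a, model_b))  # model A as black
    database.append(play_game(model_b, model_a))  # model B as black
    return database
```

Every completed round therefore adds exactly two games to the database, and over many rounds both models accumulate experience from both colors.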