So I tried an experiment: for a few rounds, I generated my training data by taking only the best-performing neural network, playing it against the Smart Random player, and adding to the data set only the games the NN lost.
It was, quite bluntly, an unmitigated disaster.
Every NN got worse after training on this data. Over the course of a few rounds, some of them lost more than 25% of their effectiveness.
I’m going back to playing a set number of games and keeping ALL of them. This should teach my networks new skills (from the games they lost) while also reinforcing what they have already learned (from the games they won).
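The revised collection loop might look something like this: a minimal sketch, where `play_game`, the network name, and the opponent name are hypothetical stand-ins for the project's own code. The point is simply that every game goes into the data set regardless of outcome.

```python
import random

def play_game(network, opponent):
    """Hypothetical stand-in: plays one game and returns
    (game_record, nn_won). The real project would run a full
    match between the NN and the Smart Random player here."""
    record = f"{network}-vs-{opponent}"
    return record, random.random() < 0.5

def collect_training_data(network, opponent, num_games):
    # Play a fixed number of games and keep every one of them,
    # wins and losses alike -- no filtering by outcome.
    data = []
    for _ in range(num_games):
        record, nn_won = play_game(network, opponent)
        data.append((record, nn_won))
    return data

games = collect_training_data("best_nn", "smart_random", 100)
print(len(games))  # always num_games, since nothing is discarded
```

Compared with the failed experiment, the only change is the absence of an `if not nn_won:` filter before the append; the wins stay in as reinforcement signal.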