Reversion to the mean, and a new method

So Larry, the neural network I’m currently training, had previously increased his effectiveness to 90% against the SRP. Unfortunately, after a few dozen more rounds his effectiveness dropped back to an 80% rating.

That’s unfortunate.

On a lark, I started a new model on the side. This one is trained entirely by playing games against me. Since I can’t play hundreds of thousands of games, I can’t afford to wipe out the training data over time; instead, every game I play against it gets added to the current training set. While it’s fun to see this NN slowly learn the basics of the game, it’s also frustrating to see how long it’s taking to improve. After all, I’ve only managed to get through around 200 games against it, whereas the automated models use a set of 100,000 games every training cycle.

It occurred to me that perhaps the problem with the automated models is that I am wiping out their training data every round. After all, when people learn a skill they don’t forget everything they previously learned as they progress; they keep the basics and occasionally revisit them. For instance, musicians start their day with basic drills and warmups.

Perhaps my neural networks would improve more consistently if they did something similar. Perhaps I need to establish a method that allows my NNs to revisit the basics in the same way.

I’m going to try adding to the training set instead of replacing it every round. I originally picked a size of 100,000 games because of how long training takes on a set that size; if the set grows over time, I won’t want to add 100,000 games every time. So instead, I’ll try adding 1,000 games every round.
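
Here is a rough sketch of what that loop might look like. It is not my actual training code: `play_game` is a made-up stand-in that returns a fake game record, and the real training call is left as a comment. The only point is that the set grows by 1,000 games a round instead of being thrown away.

```python
import random

GAMES_PER_ROUND = 1_000  # new games added each round (the old scheme replaced 100,000)


def play_game(rng):
    """Hypothetical stand-in for one automated game; returns a fake record."""
    return {
        "moves": [rng.randrange(9) for _ in range(5)],  # placeholder move list
        "nn_won": rng.random() < 0.8,                    # placeholder outcome
    }


def run_rounds(num_rounds, seed=0):
    """Grow one training set across rounds instead of wiping it each time."""
    rng = random.Random(seed)
    training_set = []
    sizes = []
    for _ in range(num_rounds):
        new_games = [play_game(rng) for _ in range(GAMES_PER_ROUND)]
        training_set.extend(new_games)  # append to the old games, don't replace them
        # The real training step on the whole training_set would go here.
        sizes.append(len(training_set))
    return training_set, sizes
```

One thing this makes obvious is that training time per round will creep up as the set grows, which is why the per-round batch has to be much smaller than 100,000.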

I’ll also try adding only the games that the NN loses. This has two advantages:

  1. As the NN improves in quality, I will add to the training set the advanced techniques that triumph over it.
  2. If the NN focuses too much on advanced tactics and forgets the basics, it will start losing to basic techniques again, and those games will go into the training data set and reinforce the basics.
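
As a sketch, the losses-only filter could be as simple as the following. The record format, with a `result` field holding `"win"`, `"loss"`, or `"draw"` from the NN’s point of view, is an assumption for illustration, and so is the choice to leave draws out.

```python
def select_losses(games):
    """Keep only the games the NN lost (draws are skipped; that is an assumption)."""
    return [g for g in games if g["result"] == "loss"]


def add_round(training_set, games):
    """Append this round's losses to the growing training set; return how many were added."""
    losses = select_losses(games)
    training_set.extend(losses)
    return len(losses)


# Tiny usage example with hand-made records.
training_set = []
round_games = [
    {"id": 1, "result": "win"},
    {"id": 2, "result": "loss"},
    {"id": 3, "result": "draw"},
    {"id": 4, "result": "loss"},
]
added = add_round(training_set, round_games)
```

A side effect worth watching: the better the NN gets, the fewer losses each round produces, so the training set will grow more slowly over time.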

We’ll see how this method works out over a few dozen rounds.

