It’s been about a month since my last update, so first, a refresher.
I’ve been iteratively training a number of Neural Networks to play Connect 4. Every round, I would have them play a hundred thousand games, train them on those games, and then pit them against each other in a Round Robin tournament. The networks (referred to as models) vary in both the number of layers of neurons they employ and the width of those layers.
Unfortunately, some of my models got a bit stuck and ended up with consistently poor performance. Not only that, but the ones that were performing better seemed to hit a plateau and stop improving.
As a result, I decided to have the best performing models play a large number of games (1,000,000), create a brand new set of fresh models, and train those models against this data set for a larger number of training cycles, or epochs (previously I’d been training for 15 epochs every cycle; for this fresh set I went up to 100). Training took up to 90 minutes per epoch in some cases; given the sheer number of models and epochs, it took me almost a month to complete.
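For those curious what one of these training runs looks like, it amounts to a fairly standard supervised loop over the recorded games. Here’s a minimal sketch in PyTorch; this isn’t my exact code, and the board encoding, training targets, and hyperparameters are all illustrative stand-ins:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: `boards` holds encoded Connect 4 positions from the
# 1,000,000 recorded games, `targets` holds the training signal derived from
# each game's outcome. Shapes and encoding are illustrative only.
boards = torch.randn(1_000_000, 6 * 7)    # 6 rows x 7 columns, flattened
targets = torch.randn(1_000_000, 7)       # one score per playable column

def make_model(layers: int, width: int) -> nn.Sequential:
    """Build a fresh fully connected model of the given depth and width."""
    dims = [6 * 7] + [width] * layers
    blocks = []
    for d_in, d_out in zip(dims, dims[1:]):
        blocks += [nn.Linear(d_in, d_out), nn.ReLU()]
    blocks.append(nn.Linear(dims[-1], 7))  # output layer: 7 columns
    return nn.Sequential(*blocks)

model = make_model(layers=3, width=500)
loader = DataLoader(TensorDataset(boards, targets), batch_size=256, shuffle=True)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

for epoch in range(100):                   # 100 epochs instead of the old 15
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```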
The Round Robin results were lackluster at best: every single model won every game in which it played first and lost every game in which it played second. Testing against the smart random player (SRP), however, revealed a bit more variation. The SRP first looks for a winning move of its own, then for a move that blocks the opponent’s winning move, and only if it finds neither does it make a completely random move.
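Since the SRP comes up a lot, here’s a quick sketch of its decision procedure. The helpers `legal_moves`, `apply`, and `is_win` are hypothetical stand-ins for my actual board utilities:

```python
import random

def smart_random_move(board, player, opponent):
    """Pick a move the way the smart random player (SRP) does:
    win if possible, otherwise block, otherwise play randomly."""
    moves = legal_moves(board)  # hypothetical helper: playable columns
    # 1. Take any immediately winning move.
    for move in moves:
        if is_win(apply(board, move, player), player):
            return move
    # 2. Block any immediately winning move for the opponent.
    for move in moves:
        if is_win(apply(board, move, opponent), opponent):
            return move
    # 3. Otherwise, move completely at random.
    return random.choice(moves)
```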
Here is the result of testing the new models against the SRP:
| Layers \ Width | 100 | 250 | 500 | 1000 | 2500 | 5000 |
|---|---|---|---|---|---|---|
| 1 | 344 | 354 | 372 | 358 | 347 | 370 |
| 2 | 330 | 343 | 360 | 349 | 365 | 351 |
| 3 | 375 | 351 | 351 | 360 | 361 | 354 |
| 4 | 334 | 359 | 356 | 324 | 371 | 333 |
| 5 | 340 | 355 | 383 | 367 | 348 | 369 |
Current Performance against SRP in number of games won out of 1,000
The green squares are above average, up to one standard deviation; the blue squares are more than one standard deviation above the average. Likewise, the yellow squares are below average, and the red squares more than one standard deviation below average.
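The bucketing behind those colors is just a mean and standard deviation over the table. Here’s a minimal sketch using the current results above (which flavor of standard deviation I used is an implementation detail; numpy’s default population version is shown):

```python
import numpy as np

# The current results table: rows = layers 1-5, columns = widths 100-5000.
wins = np.array([
    [344, 354, 372, 358, 347, 370],
    [330, 343, 360, 349, 365, 351],
    [375, 351, 351, 360, 361, 354],
    [334, 359, 356, 324, 371, 333],
    [340, 355, 383, 367, 348, 369],
])

mean, std = wins.mean(), wins.std()

def bucket(value: float) -> str:
    """Assign a cell the color used in the tables."""
    if value > mean + std:
        return "blue"    # more than one std dev above average
    if value > mean:
        return "green"   # above average, within one std dev
    if value >= mean - std:
        return "yellow"  # below average, within one std dev
    return "red"         # more than one std dev below average
```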
Since my past posts reported Round Robin results, you can’t directly compare these numbers with those. Here, however, is the SRP data from the final round of the old models for comparison:
| Layers \ Width | 100 | 250 | 500 | 1000 | 2500 | 5000 |
|---|---|---|---|---|---|---|
| 1 | 262 | 328 | 377 | 364 | 365 | 370 |
| 2 | 1 | 58 | 380 | 358 | 377 | 407 |
| 3 | 324 | 330 | 382 | 363 | 388 | 382 |
| 4 | 204 | 350 | 379 | 66 | 355 | 56 |
| 5 | 53 | 351 | 361 | 377 | 364 | 337 |
Prior performance against SRP
You can see the trouble I had: certain models had simply gotten stuck as very poor performers. But if we remove those from the data as outliers, we’re still left with this:
| Layers \ Width | 100 | 250 | 500 | 1000 | 2500 | 5000 |
|---|---|---|---|---|---|---|
| 1 | 262 | 328 | 377 | 364 | 365 | 370 |
| 2 | — | — | 380 | 358 | 377 | 407 |
| 3 | 324 | 330 | 382 | 363 | 388 | 382 |
| 4 | 204 | 350 | 379 | — | 355 | — |
| 5 | — | 351 | 361 | 377 | 364 | 337 |
Prior performance against SRP after removing “stuck” models
To recap the performance:
| | Current | Prior | Prior w/o Stuck Models |
|---|---|---|---|
| Avg | 354 | 302 | 353 |
| Std Dev | 13.0 | 120 | 41 |
Average performance (wins out of 1,000 against the SRP) and standard deviation for the prior and current generations of models
As you can see, restarting from scratch with a new set of models mainly resulted in standardizing the performance.
I’m going to run a few standard rounds with this new set of models and see if their performance improves at all. The prior models peaked at winning around 40% of their games against the SRP and didn’t improve past that point for several rounds, so we’ll see if starting over lets me reach a higher level of performance.