It’s been about a month since my last update, so first, a refresher.
I’ve been iteratively training a number of Neural Networks to play Connect 4. Every round, I would have them play a hundred thousand games, train them on those games, and then pit them against each other in a Round Robin tournament. The networks (referred to as models) vary in both the number of layers of neurons they employ and the width of those layers.
Unfortunately, some of my models got a bit stuck and ended up with consistently poor performance. Not only that, but the ones that were performing better seemed to hit a plateau and stop improving.
As a result, I decided to have the best performing models play a large number of games (1,000,000), create a brand new set of fresh models, and train those models against this data set for a larger number of training cycles, or epochs (previously I’d been training for 15 epochs every cycle; for this fresh set I went up to 100). Training took up to 90 minutes per epoch in some cases; given the sheer number of models and epochs, it took me almost a month to complete.
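For those curious what one of these training runs looks like, it amounts to a fairly standard supervised loop over the recorded games. Here’s a minimal sketch in PyTorch; this isn’t my exact code, and the board encoding, training targets, and hyperparameters are all illustrative stand-ins:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: `boards` holds encoded Connect 4 positions from the
# 1,000,000 recorded games, `targets` holds the training signal derived from
# each game's outcome. Shapes and encoding are illustrative only.
boards = torch.randn(1_000_000, 6 * 7)    # 6 rows x 7 columns, flattened
targets = torch.randn(1_000_000, 7)       # one score per playable column

def make_model(layers: int, width: int) -> nn.Sequential:
    """Build a fresh fully connected model of the given depth and width."""
    dims = [6 * 7] + [width] * layers
    blocks = []
    for d_in, d_out in zip(dims, dims[1:]):
        blocks += [nn.Linear(d_in, d_out), nn.ReLU()]
    blocks.append(nn.Linear(dims[-1], 7))  # output layer: 7 columns
    return nn.Sequential(*blocks)

model = make_model(layers=3, width=500)
loader = DataLoader(TensorDataset(boards, targets), batch_size=256, shuffle=True)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.MSELoss()

for epoch in range(100):                   # 100 epochs instead of the old 15
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```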
The Round Robin results were lackluster at best: every single model won every game in which it played first and lost every game in which it played second. Testing against the smart random player (SRP), however, revealed a bit more variation. The SRP first looks for a winning move of its own, then for a move that blocks the opponent’s winning move, and only if it finds neither does it make a completely random move.
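Since the SRP comes up a lot, here’s a quick sketch of its decision procedure. The helpers `legal_moves`, `apply`, and `is_win` are hypothetical stand-ins for my actual board utilities:

```python
import random

def smart_random_move(board, player, opponent):
    """Pick a move the way the smart random player (SRP) does:
    win if possible, otherwise block, otherwise play randomly."""
    moves = legal_moves(board)  # hypothetical helper: playable columns
    # 1. Take any immediately winning move.
    for move in moves:
        if is_win(apply(board, move, player), player):
            return move
    # 2. Block any immediately winning move for the opponent.
    for move in moves:
        if is_win(apply(board, move, opponent), opponent):
            return move
    # 3. Otherwise, move completely at random.
    return random.choice(moves)
```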
Here is the result of testing the new models against the SRP:
| Layers \ Width | 100 | 250 | 500 | 1000 | 2500 | 5000 |
|---|---|---|---|---|---|---|
| 1 | 344 | 354 | 372 | 358 | 347 | 370 |
| 2 | 330 | 343 | 360 | 349 | 365 | 351 |
| 3 | 375 | 351 | 351 | 360 | 361 | 354 |
| 4 | 334 | 359 | 356 | 324 | 371 | 333 |
| 5 | 340 | 355 | 383 | 367 | 348 | 369 |
Current Performance against SRP in number of games won out of 1,000
The green squares are above average, up to one standard deviation; the blue squares are more than one standard deviation above the average. Likewise, the yellow squares are below average, and the red squares more than one standard deviation below average.
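The bucketing behind those colors is just a mean and standard deviation over the table. Here’s a minimal sketch using the current results above (which flavor of standard deviation I used is an implementation detail; numpy’s default population version is shown):

```python
import numpy as np

# The current results table: rows = layers 1-5, columns = widths 100-5000.
wins = np.array([
    [344, 354, 372, 358, 347, 370],
    [330, 343, 360, 349, 365, 351],
    [375, 351, 351, 360, 361, 354],
    [334, 359, 356, 324, 371, 333],
    [340, 355, 383, 367, 348, 369],
])

mean, std = wins.mean(), wins.std()

def bucket(value: float) -> str:
    """Assign a cell the color used in the tables."""
    if value > mean + std:
        return "blue"    # more than one std dev above average
    if value > mean:
        return "green"   # above average, within one std dev
    if value >= mean - std:
        return "yellow"  # below average, within one std dev
    return "red"         # more than one std dev below average
```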
Since my past posts reported Round Robin results, you can’t directly compare these numbers with those. Here, however, is the SRP data from the final round of the old models for comparison:
| Layers \ Width | 100 | 250 | 500 | 1000 | 2500 | 5000 |
|---|---|---|---|---|---|---|
| 1 | 262 | 328 | 377 | 364 | 365 | 370 |
| 2 | 1 | 58 | 380 | 358 | 377 | 407 |
| 3 | 324 | 330 | 382 | 363 | 388 | 382 |
| 4 | 204 | 350 | 379 | 66 | 355 | 56 |
| 5 | 53 | 351 | 361 | 377 | 364 | 337 |
Prior performance against SRP
You can see the trouble I had: certain models had simply gotten stuck as very poor performers. But if we remove those from the data as outliers, we’re still left with this:
| Layers \ Width | 100 | 250 | 500 | 1000 | 2500 | 5000 |
|---|---|---|---|---|---|---|
| 1 | 262 | 328 | 377 | 364 | 365 | 370 |
| 2 | — | — | 380 | 358 | 377 | 407 |
| 3 | 324 | 330 | 382 | 363 | 388 | 382 |
| 4 | 204 | 350 | 379 | — | 355 | — |
| 5 | — | 351 | 361 | 377 | 364 | 337 |
Prior performance against SRP after removing “stuck” models
To recap the performance:
| | Current | Prior | Prior w/o Stuck Models |
|---|---|---|---|
| Avg | 354 | 302 | 353 |
| Std Dev | 13.0 | 120 | 41 |
Average performance (wins out of 1,000 against the SRP) and standard deviation for the prior and current generations of models
As you can see, restarting from scratch with a new set of models mainly resulted in standardizing the performance.
I’m going to run a few standard rounds with this new set of models and see if their performance improves at all. The prior models peaked at winning around 40% of their games against the SRP and didn’t improve past that point for several rounds, so we’ll see if starting over lets me reach a higher level of performance.