Resetting the Models

It’s been about a month since my last update, so first, a refresher.

I’ve been iteratively training a number of Neural Networks to play Connect 4. Every round, I would have them play a hundred thousand games, study those games to learn from them, and then pit them against each other in a Round Robin-style tournament. The networks (referred to as models) vary in both the number of layers of neurons they employ and the width of those layers.
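
To give a rough idea of what one of these models looks like, here’s a minimal sketch of a fully connected network parameterized by layer count and width. This is illustrative only: I’m assuming Keras, a flattened 6×7 board as input, and one output per column, which may not match the actual encoding or loss I use.

```python
# Hypothetical sketch of the model family: `num_layers` hidden layers of
# `width` neurons each. Framework, input encoding, and loss are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_layers, width):
    """Build a Connect 4 model with num_layers hidden layers of width neurons."""
    model = keras.Sequential()
    # First hidden layer; the 6x7 board is assumed flattened to 42 inputs.
    model.add(layers.Dense(width, activation="relu", input_shape=(42,)))
    for _ in range(num_layers - 1):
        model.add(layers.Dense(width, activation="relu"))
    # One output per column, scored as the probability of being the best move.
    model.add(layers.Dense(7, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model

# The grid of models used in the tournaments: 1-5 layers, widths 100-5000.
models = {(l, w): build_model(l, w)
          for l in range(1, 6)
          for w in (100, 250, 500, 1000, 2500, 5000)}
```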

Unfortunately, some of my models got a bit stuck and ended up with consistently poor performance. Not only that, but the ones that were performing better seemed to hit a plateau and stop improving.

As a result, I decided to have the best-performing models play a large number of games (1,000,000), create a brand new set of fresh models, and train those fresh models against this data set for a larger number of training cycles, or epochs (previously I’d been training for 15 epochs every cycle; for this fresh set I went up to 100). The training took, in some cases, up to 90 minutes per epoch; given the sheer number of models and epochs, it took me almost a month to complete.
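
Here’s a minimal sketch of what that retraining step looks like, reusing the `models` grid from the sketch above and assuming the million games have already been converted into board/move training arrays. The file names and array shapes are placeholders, not my actual pipeline.

```python
# Retrain each fresh model for 100 epochs on the 1,000,000-game data set.
import numpy as np

boards = np.load("training_boards.npy")  # assumed shape: (num_positions, 42)
moves = np.load("training_moves.npy")    # assumed shape: (num_positions, 7), one-hot

for (num_layers, width), model in models.items():
    # 100 epochs per fresh model, versus the 15 per cycle used previously.
    model.fit(boards, moves, epochs=100, batch_size=256, verbose=0)
    model.save(f"connect4_{num_layers}layers_{width}wide.h5")
```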

The Round Robin results were lackluster at best: every single model won every game in which it played first and lost every game in which it played second. Testing against the smart random player (SRP), however, revealed a bit more variation. The smart random player first looks for a winning move, then looks to block the opponent’s winning moves, and then, if neither is found, makes a completely random move.
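
For reference, here’s a self-contained sketch of the SRP’s move selection. The board encoding (a 6×7 grid of 0 for empty, 1 and 2 for the players) and the helper functions are my own illustration, not the actual implementation; only the decision order matters.

```python
import random

ROWS, COLS = 6, 7

def legal_moves(board):
    # A column is playable if its top cell is empty.
    return [c for c in range(COLS) if board[0][c] == 0]

def drop_piece(board, col, player):
    # Return a copy of the board with the piece dropped into the column.
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] == 0:
            new[r][col] = player
            break
    return new

def is_win(board, player):
    # Check every horizontal, vertical, and diagonal run of four.
    for r in range(ROWS):
        for c in range(COLS):
            if board[r][c] != player:
                continue
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == player
                       for rr, cc in cells):
                    return True
    return False

def smart_random_move(board, player):
    opponent = 2 if player == 1 else 1
    legal = legal_moves(board)
    # 1. Take a winning move if one exists.
    for col in legal:
        if is_win(drop_piece(board, col, player), player):
            return col
    # 2. Otherwise block any winning move the opponent has.
    for col in legal:
        if is_win(drop_piece(board, col, opponent), opponent):
            return col
    # 3. Otherwise play a completely random legal move.
    return random.choice(legal)
```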

Here is the result of testing the new models against the SRP:

Layers \ Width    100    250    500    1000   2500   5000
      1           344    354    372     358    347    370
      2           330    343    360     349    365    351
      3           375    351    351     360    361    354
      4           334    359    356     324    371    333
      5           340    355    383     367    348    369

Current Performance against SRP in number of games won out of 1,000

The green squares are above average, up to one standard deviation; the blue squares are more than one standard deviation above the average. Likewise, the yellow squares are below average, and the red squares more than one standard deviation below average.
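
That bucketing is just a comparison against the mean and standard deviation of the grid. Here’s a quick sketch, assuming the results live in a NumPy array with one row per layer count and one column per width:

```python
import numpy as np

# Current wins out of 1,000 against the SRP (rows: 1-5 layers; cols: widths 100-5000).
results = np.array([
    [344, 354, 372, 358, 347, 370],
    [330, 343, 360, 349, 365, 351],
    [375, 351, 351, 360, 361, 354],
    [334, 359, 356, 324, 371, 333],
    [340, 355, 383, 367, 348, 369],
])

mean, std = results.mean(), results.std()

# blue:   more than one standard deviation above the mean
# green:  above the mean, within one standard deviation
# yellow: below the mean, within one standard deviation
# red:    more than one standard deviation below the mean
buckets = [["blue" if v > mean + std else
            "green" if v >= mean else
            "yellow" if v >= mean - std else
            "red" for v in row] for row in results]

print(round(mean, 1), round(std, 1))
print(buckets)
```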

Since I’ve been showing the Round Robin results in the past, you can’t directly compare these results with those in my previous posts. Here, however, is the SRP data from the final round of the old models for comparison:

Layers \ Width    100    250    500    1000   2500   5000
      1           262    328    377     364    365    370
      2             1     58    380     358    377    407
      3           324    330    382     363    388    382
      4           204    350    379      66    355     56
      5            53    351    361     377    364    337

Prior performance against SRP

You can see the trouble I had… certain models had simply gotten stuck as very poor performers. But even if we remove those from the data as outliers, we’re still left with this:

Layers \ Width    100    250    500    1000   2500   5000
      1           262    328    377     364    365    370
      2             -      -    380     358    377    407
      3           324    330    382     363    388    382
      4           204    350    379       -    355      -
      5             -    351    361     377    364    337

Prior performance against SRP after removing “stuck” models

To recap the performance:

               Current    Prior    Prior w/o Stuck Models
Avg                354      302                       353
Std Dev           13.0      120                        41

Average performance and Standard Deviation for the prior and current generation of models

As you can see, restarting from scratch with a new set of models mainly resulted in standardizing the performance.

I’m going to run a few standard training rounds with this new set of models and see if their performance improves at all. The prior models peaked at winning around 40% of their games against the SRP and didn’t pass that point for several rounds, so we’ll see if starting over allows me to reach a higher level of performance.
