Quick update

So, I fixed the determinism problem. When ranking the networks against each other in the tournament, the best move is always chosen. When generating training data, moves are randomized, with the best move being the most likely.
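As a rough illustration of the two modes, here is a minimal sketch. The `choose_move` function, the `{move: score}` mapping, and the softmax weighting are all assumptions for the example; the only thing taken from the above is that tournament play is deterministic and that training play is random with the best move most likely.

```python
import math
import random


def choose_move(scores, training, rng=random):
    """Pick a move from a {move: score} mapping (hypothetical interface)."""
    moves = list(scores)
    if not training:
        # Tournament play: always take the highest-scoring move.
        return max(moves, key=lambda m: scores[m])
    # Training-data generation: sample with softmax weights (an assumed
    # weighting), so the highest-scoring move is the most likely choice
    # but not the only one.
    top = max(scores.values())
    weights = [math.exp(scores[m] - top) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` as `rng` makes a training run reproducible while still exploring more than one line of play.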

I also replaced the two random players with a single new randomized player. This player first looks for a winning move and, if there is none, then looks for a move that blocks a win by its opponent. Only after that does it make a completely random move. With this one change, the new player completely dominated the tournament, not losing a single game (which is expected, since the networks haven’t really been trained yet).
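The win, block, random order can be sketched like this. The game isn't named above, so the example uses a 3x3 tic-tac-toe board as a stand-in; `LINES`, `completes_line`, and `heuristic_move` are hypothetical names, not the actual player code.

```python
import random

# All eight winning lines on a 3x3 board indexed 0..8 (tic-tac-toe stand-in).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def completes_line(board, move, player):
    """True if `player` placing a mark at `move` would complete a line."""
    trial = board[:move] + [player] + board[move + 1:]
    return any(all(trial[i] == player for i in line) for line in LINES)


def heuristic_move(board, player, opponent, rng=random):
    """Win if possible, else block, else play a random legal move.

    `board` is a list of 9 cells, each a player mark or None.
    The board must have at least one empty cell.
    """
    legal = [i for i, cell in enumerate(board) if cell is None]
    # First pass looks for our own winning move, second pass for a move
    # that would let the opponent win (which we then take to block it).
    for candidate in (player, opponent):
        hits = [m for m in legal if completes_line(board, m, candidate)]
        if hits:
            return rng.choice(hits)
    return rng.choice(legal)
```

For example, with `board = ['X', 'X', None, 'O', 'O', None, None, None, None]`, `heuristic_move(board, 'X', 'O')` takes cell 2 to win rather than cell 5 to block.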

I’m now running an overnight training session, so I’ll be able to rank them again in the morning.

