In theory, we could solve any game by building a big game tree, labeling the terminal states as wins, losses, or ties, and then working backwards from that information to identify the minimax strategy. But in practice, game trees get so big so quickly that for all but the most simple games, game trees are too expensive to store in memory and take too long to traverse.
For example, consider the game of connect four, which is normally played on a $7 \times 6$ board. Whereas tic-tac-toe had $5\,478,$ valid board states, connect four has $4\,531\,985\,219\,092$ (source). Implementing a game tree of this size is infeasible – if you’re not convinced, try running a simple “for” loop that loops over the numbers from $1$ to $4\,531\,985\,219\,092.$ If looping over a million numbers takes a couple seconds, then looping over a trillion numbers will take days.
Reduced Search Depth
We can’t build a full game tree for connect four. But what we can do instead is
- reduce the search depth (i.e. build a shortened game tree up to some maximum depth), and
- come up with some kind of heuristic evaluation function to rate how good or bad each leaf node state is.
Then we can apply the minimax strategy in attempt to move in the direction of the best leaf node state.
This procedure for selecting a move can be outlined more explicitly as follows:
- Build a game tree that is $N$ layers deep from the current game state. (This is referred to as a $N$-ply game tree.)
- Use the heuristic evaluation function to assign minimax to the terminal nodes in the game tree.
- Repeatedly propagate those values up to parent nodes using the minimax algorithm.
- Choose your action in accordance with the standard minimax strategy (i.e. choose the move that takes you to the child state with the highest minimax value).
Note that this time, you’ll have to relabel the game tree on every move because the terminal nodes of the tree will change (thereby changing the minimax values of the rest of the tree). But you don’t need to rebuild the full game tree on every move – you can take the existing game tree, prune off nodes that are no longer irrelevant, and grow the additional nodes needed to bring you back to a search depth of $N.$
Heuristic Evaluation Function
Now, let’s talk about the “secret sauce” in this recipe: the heuristic evaluation function. It takes a game state as input, and returns a number between $-1$ and $1$ that represents how strongly we want or do not want to be in any given game state. To write this function we use our human intuition about the game. Here are some rough guidelines:
- If we're 100% confident that a game state is a win or will result in a win, then the function should return $1.$
- If we think that a win is more likely than a loss, then the function should return a decimal between $0$ and $1,$ with higher win probabilities corresponding to higher decimals.
- If we have no idea whether a game state will result in a win or loss, or we think it will result in a tie, then the function should return $0.$
- If we think that a win is less likely than a loss, then the function should return a decimal between $0$ and $-1,$ with lower win probabilities corresponding to higher decimals.
- If we're 100% confident that a game state is a win or will result in a loss, then the function should return $-1.$
For example, here is a simple heuristic function for tic-tac-toe:
- If the game state is a definite win, tie, or loss, then return $1,$ $0,$ or $-1$ respectively.
- Otherwise, count up the number of rows, columns, and diagonals where you occupy two spaces and the third space is empty. Then, subtract the number of rows, columns, and diagonals where your opponent occupies two spaces and the third space is empty. Finally, divide the result by $8$ (which is the total number of rows, columns, and diagonals).
Remember to alternate who goes first in the matchups described below.
- Implement a 9-ply heuristic minimax tic-tac-toe player, i.e. it uses the heuristic evaluation function described above and a search depth of $N=9$ (which happens to be the full tree). Then, run it against the perfect minimax player that you created previously. Every game should result in a tie.
- Implement a 5-ply heuristic minimax tic-tac-toe player. Then, run it against your 9-ply heuristic player, as well as a purely random player. The 5-ply player should do better than the random player but worse than the 9-ply player.
- Develop a heuristic minimax connect four player that uses as many ply as can be computed quickly, and verify that it performs better than a random player.
- Develop another heuristic minimax connect four player that uses better heuristic evaluation function. Then, verify that this second heuristic player is better than your first player. (If you can't figure out how to improve your function, then make the second player's function intentionally worse than that of the first player and verify that it indeed performs worse.)
- Play against your heuristic minimax connect four player yourself.