Minimaxing and pruning with many players

28 June 2001

[ work in progress ]

I describe how I think the classic Minimax algorithm and Alpha-Beta pruning may be extended to any number N of players. The result is not entirely satisfactory, and the exact condition about when to prune when nodes yield equal values needs to be worked out, but otherwise it looks very good to me - it quite mimics the way I play :^) when I don't know the level of my opponents, and it seems to play endgames perfectly.

Content:

How to compare the players' progress
Tactical deepening
Pruning fruitless branches
A proof that we prune well?
What now?

How to compare the players' progress

First we have to extend the way we compare a player's position with its opponents.

A class Position will encapsulate descriptions of the whole current states of the game, including whose turn it is.

I'll use a function StaticScore supposed to value quickly how good some player's development is on the board. For a race for example, it might return how far the player has run plus some weighted estimation of her energy left. Here is a pseudo-definition of it in the Python language mixed with plain English.

class Position:
    def StaticScore(CurrentPosition, Player):
        if Player has definitely won in the CurrentPosition: return +infinity
        else: return some static estimate of the player's development...

At leaves in the search tree, I'll call that function for every player and store the resulting list of scores into an object of class Scores. Note that a player's static score is not required to take other players much into account at this stage - the comparison wil be done outside, once the whole list has been computed.

I'll use the following function Globalimprove to adjust any player's score and take his opponents' scores into account - particularly the score of his most advanced opponent. In two-players games, that would just be about subtracting Player 0's score from Player 1's or the opposite. We usually make do with only one point of view and negate the result when we want the other. With more than two players on the other hand, it is easier to keep all the points of view. I guess at least N-1 such improved values are necessary to know them all.

class Scores:
    # Will hold a List of static scores read by the following function
    def Globalimprove(CurrentScores, Player):
        # Return the player's score minus his strongest opponent's score
        return CurrentScores.List[Player] \
        - max(CurrentScores.List[0:Player-1]+CurrentScores.List[Player+1:N])

Another definition is likely to come first to mind, namely the player's score minus the sum or the average of all his opponents' scores. Many human beings are basing their choice of a move on such a global evaluation of their position. I am afraid it does not show much understanding of multi-player strategy. They see how their position is in front of the whole relative opposition, quite as a block, admittedly with some possible local focus on closer helps and binary oppositions, and would often miss the main point - a very advanced player might be close to win if they do not fight him first instead of trying to reinforce themselves against weaker opponents (who may actually be free to help the very advanced player).

Indeed, human beings often tend to feel desperate. They are afraid of the opposing multi-faceted force as if all their opponents might suddenly focus only on them. Well, their opponents might, but they'd better move on with some confidence, without neglecting dynamic, fragile trust and collaboration based on some implicit but rather clear common objective - not to allow another player to get too strong and win - and without resorting to spoiling, missdirected, desperate actions themselves.

Well, comparing oneself with only the strongest opponent as I advocate might be (slightly) bad. If a player has to choose between two moves equally good against her strongest opponent, why not helping her choice and give an additional small weight to her second strongest opponent, and so on? More investigation is needed, especially about whether the pruning I'll then introduce into the minimax algorithm would remain sound.

The constant +infinity is not only a convenient name for a value bigger than any other value. As in mathematics I want it to remains just the same after subtracting any smaller value from it. My intent is that a player's different ways to victory be just the same. And above all a player's different ways to lose be just the same as well, so that no player spoils the game and makes undue expectations on others' bias towards him for example when the end of the game is at stake. See Mark Mammel's dilemmas for some details.

Tactical deepening

Alright, here is now Minimax, a function in the class Position. It searches throughout the moves available on the current position and the possible answers etc. at the requested depth and returns a list of N values yielding every player's point of view on the position, and a recommended move.

    def Minimax(CurrentPosition, Depth):

        # If we are at a leaf in the search tree
        if Depth == 0 or CurrentPosition.GameOver():

            # Fill in a list with the players' static scores
            scores = Scores()
            scores.List = map(CurrentPosition.StaticScore, range(N))

            # Return a list of improved values for every player, and no recommended move
            return map(scores.Globalimprove, range(N)), none

        # Else, get a list ready to accept the players' first board values
        bestvalues = map(lambda player: +infinity, range(N))
        bestvalues[CurrentPosition.PlayerToMove] = -infinity

        m = 0  # This will count equivalent best moves

        # Go through every move available to the current player
        for move in CurrentPosition.MoveList():

            # Apply the move and value the new position a little less deeply
            newposition = Apply(move, CurrentPosition)
            values, recommendedanswer = newposition.Minimax(Depth-1)

            # Get the result from the current player's point of view
            v = values[CurrentPosition.PlayerToMove]

            # If it is equivalent to our current best candidate move
            if v == bestvalues[CurrentPosition.PlayerToMove]:

                # Prepare to tell that you may select what is worse for each other player
                bestvalues = map(min, bestvalues, values)

                # But actually choose uniformly-randomly among equivalent moves
                m = m+1
                if a number randomly picked from range(m) == 0:
                    recommendedmove = move

            # Else if the result is better than our current best candidate move
            elif v > bestvalues[CurrentPosition.PlayerToMove]:
                bestvalues = values
                m = 1
                recommendedmove = move

        # In the end we got our values and best move
        return bestvalues, recommendedmove

Pruning fruitless branches

We are ready for PruningMinimax, which additionally identifies and cuts off branches in the search that cannot affect the best move and values selected on the parent node. For that purpose, the caller specifies an additional parameter to the function, namely a list of upper bounds. The first call should specify a list filled in with +infinity, resulting in no possible cut-off at the very root level.

    def PruningMinimax(CurrentPosition, Depth, Bounds):

        if Depth == 0 or CurrentPosition.GameOver():
            scores = Scores()
            scores.List = map(CurrentPosition.StaticScore, range(N))
            return map(scores.Globalimprove, range(N)), none

        bestvalues = map(lambda player: +infinity, range(N))
        bestvalues[CurrentPosition.PlayerToMove] = -infinity
        m = 0
        newbounds = Bounds

        for move in CurrentPosition.MoveList():

            newposition = Apply(move, CurrentPosition)
            values, recommendedanswer = newposition.PruningMinimax(Depth-1, newbounds)
            v = values[CurrentPosition.PlayerToMove]

            if v == bestvalues[CurrentPosition.PlayerToMove]:
                bestvalues = map(min, bestvalues, values)     # Unbiasing
                m = m+1
                if a number randomly picked from range(m) == 0:
                    recommendedmove = move

            elif v > bestvalues[CurrentPosition.PlayerToMove]:
                bestvalues = values
                m = 1
                recommendedmove = move

                for player in range(N):
                    if v > Bounds[player] and player != CurrentPosition.PlayerToMove\
                    and some further conditions to work out due to the Unbiasing :
                        return bestvalues, recommendedmove     # Cut-off

                if -v < newbounds[CurrentPosition.PlayerToMove]:
                    newbounds[CurrentPosition.PlayerToMove] = -v  # Narrowing

        return bestvalues, recommendedmove

A proof that we prune well?

[To be continued...]

What now?

Of course StaticScore may need that we mine specific knowledge about the game. I also recommend an iterative deepening and a killer heuristic to speed up the whole search. Now as far as the game-independent setting of minimax and pruned minimax is concerned, the next improving steps would be to:

investigate adding a fraction of the second highest score plus a smaller fraction of the third, etc. when we globally improve the static scores
base the random choice among equivalent moves on a non-uniform distribution derived from expected outcomes (see Daniel E. Loeb: Stable Winning Coalitions in Games of No Chance, MSRI Publications Volume 29, 1996, pp.451-471)
deal with values close to each others and with approximation errors in the static scores and in human judgement - perhaps a value only an epsilon worse than another value should not be completely dismissed together with the other players' values
deal with the opponents' style and strength - up to psychology and poker tricks

[Up home]