An Investigation of an Adaptive Poker Player

 

Game playing has a long research history. Chess has received particular interest culminating in Deep Blue beating Kasparov in 1997, albeit with specialized hardware (Hamilton, 1997) and brute force search. However, although arguably, being a ‘solved game’ chess still receives interest as researchers turn to adaptive learning techniques which allow computers to ‘learn’ to play chess, rather than being ‘told’ how it should play (Kendall, 2001). Adaptive learning was being used for checkers as far back as the 1950’s with Samuel’s seminal work (1959, re-produced in Samuel, 2000). Checkers research would lead to Jonathan Schaeffer developing Chinook, which claimed the world title in 1994 (Schaeffer, 1996). Like Deep Blue, it is arguable if Chinook used AI techniques. Chinook had an opening and ending database. In certain games it was able to play the entire game from these two databases. If this could not be achieved, a form of mini-max search, with alpha-beta pruning was used. Despite Chinook becoming the world champion, the search has continued for an adaptive checkers player. Chellapilla and Fogel’s (Chellapilla, 2000) Anaconda was named due to the strangle hold it placed on its opponent. It is also named Blondie24, this being the name it used when competing in internet games (Fogel, 2001). Anaconda uses an artificial neural network (ANN), with 5000 weights, which are evolved by an evolutionary strategy. The inputs to the ANN are the current board position and it outputs a value which is used in a mini-max search. During the training period, using co-evolution, the program is given no information other than whether it won or lost. Once Anaconda is able to play at a suitable level, it often searches to a depth of 10, but depths of 6 and 8 are also common in play. Anaconda has been available to the delegates at the Congress on Evolutionary Computing (CEC) conference for the past two years (CEC’00, San Diego and CEC’01, Seoul) with Fogel offering a prize of $100 (CEC’00) and $200 (CEC’01) to anybody who could defeat it. The prize remains unclaimed and at the next conference (CEC’02, Hawaii), the prize rises to $300. Poker also has an equally long research history with von Neumann and Morgensten (von Neumann, 1944) experimenting with a simplified, two-player, version of poker. Findler (Findler, 1977) studied poker, over a 20 year period. He also worked on a simplified game, based on 5-card draw poker with no ante and no consideration of betting position due to the computer always playing last. He concluded that dynamic and adaptive algorithms are required for successful play and static mathematical models were unsuccessful and easily beaten. In more recent times three research groups have been researching poker. Jonathan Schaeffer (of Chinook fame) and a number of his students have developed ideas which have led to Loki, which is, arguably, the strongest poker playing program to date. It is still a long way from being able to compete in the World Series of Poker (WSOP), an annual event held in Las Vegas, but initial results are promising. Schaeffer’s work concentrates on two main areas (Billings, 1998a and Schaeffer, 1999). The first research theme makes betting decisions using probabilistic knowledge (Billings, 1999) to determine which action to take (fold, call or raise) given the current game state. Billings et. al. also uses real time simulation of the remainder of the game that allows the program to determine a statistically significant result in the program’s decision making process. Schaeffer’s group also uses opponent modeling (Billings, 1998b). This allows Loki to maintain a model of an opponent and use this information to decide what betting decisions to make. Koller and Pfeffer (Koller, 1997), using their Gala system, allow games of imperfect information to be specified and solved, using a tree based approach. However, due to the size of the trees they state “…we are nowhere close to being able to solve huge games such as full-scale poker, and it is unlikely that we will ever be able to do so.” Luigi Barone and Lyndon While recognise four main types of poker player; Loose, Tight, Passive, and Aggressive. These characteristics are combined to create the four common types of poker players: Loose Passive, Loose Aggressive, Tight Passive and Tight Aggressive players (Barone & While, 1999; 2000). A Loose Aggressive player will overestimate their hand, raising frequently, and their aggressive nature will drive the pot higher, increasing their potential winnings. A Loose Passive player will overestimate their hand, but due to their passive nature will rarely raise, preferring to call and allow other players to increase the pot. A Tight Aggressive player will play to close constraints, participating in only a few hands which they have a high probability of winning. The hands they do play, they will raise frequently to increase the size of the pot. A Tight Passive player will participate in few hands, only considering playing those that they have a high probability of winning. The passive nature implies that they allow other players to drive the pot, raising infrequently themselves. In their first paper Barone and While (Barone, 1998) suggest evolutionary strategies as a way of modelling an adaptive poker player. They use a simple poker variant where each player has two private cards, there are five community cards and one round of betting. This initial work incorporates three main areas of analysis; hand strength, position and risk management. Two types of tables are used, a loose table and a tight table. The work demonstrates how a player that has evolved using evolutionary strategies can adapt its style to the two types of table. In (Barone, 1999) they develop their work by introducing a hypercube which is an n dimensional vector, used to store candidate solutions. The hypercube has one dimension for the betting position (early, middle and late) and another dimension for the risk management (selected from the interval 0..3). At each stage of the game the relevant candidate solutions are selected from the hypercube (e.g. middle betting position and risk management 2) and the decision is made whether to fold, call or raise. To make the decision the hypercube entry holds seven real valued numbers which are used as constants to three functions (fold, call and raise). In effect, the functions lead to a probability of carrying out the relevant action. It i s the seven real values that are evolved depending on whether the player won the hand or not. Barone reports that this poker player improves on the 1998 version. Their 2000 paper (Barone, 2000) extends the dimensions of the hypercube to include four betting rounds (pre-flop, post-flop, post-turn and post-river) and an opponent dimension so that the evolved player can choose which type of player it is up against. The authors report this player out performs a competent static player. Poker, being a game of imperfect information, is interesting as a game for the basis of research. Unlike chess and checkers, poker has some information that is unseen. Poker also contains other unknowns such as the playing styles of the other players who may use bluffing (and double bluffing) during the course of the game. These elements add to the research interest. Unlike complete information games where the techniques to solve the games (computational power allowing) have been known and understood for a long time (such as mini-max search and alpha-beta pruning), games of imperfect information have not received the same sort of analysis and, doing so, could prove relevant to many other areas such as economics, on-line auctions and negotiating.

Related posts:

  1. Ultimately, every tournament player should have three goals when playing poker: ...