Contents

1 Introduction
    1.1 Games
    1.2 History of Poker Games
    1.3 History of Poker Research
2 Overview
3 Modeling Games
    3.1 Players
    3.2 Moves, Actions and Game Tree
    3.3 Payoffs
    3.4 Information Sets
    3.5 Strategies
    3.6 Solutions
    3.7 Formalization
4 Modeling Poker
    4.1 Players
    4.2 Moves, Actions and Game Trees
    4.3 Payoffs and Rake
    4.4 Information Sets
    4.5 Strategies
    4.6 Solutions
    4.7 Cards and the Deck
    4.8 Other Conventions
    4.9 Toy Games
5 Discrete Toy Games
    5.1 3 Card Poker
    5.2 13 Card Poker
    5.3 A Case Study Turn Play
    5.4 A Case Study Turn Play
6 Continuous Toy Games: [0,1] Poker
    6.1 [0,1] Fixed Bet Size
    6.2 [0,1] Chosen Bet Size
    6.3 [0,1] Free Bet Size
7 Botting
8 Appendix
Bibliography
1 Introduction
1.1 Games
Games play a dominant role in mammalian, and especially human, life. Children spend most of their early years playing. Leading theories¹ suggest that games help the brain focus on and master small skill sets in a friendly environment, which ultimately helps it conquer complex tasks. Humans do not stop playing games as grown-ups: games are enjoyed at all ages with family and friends, and increasingly online. Following games fills many a pastime, the largest such event being the Olympic Games, which draw massive worldwide attention.

Games and gaming being what they are, it is little surprise that the academic world takes an interest². However, the study of games from a mathematical standpoint has a surprisingly short history. Perhaps this is because analyzing a game takes some of the magic out of it. Taking a great shot at a goal is much more enjoyable than running numbers on how hard to hit the ball, at what angle, and how to turn the foot to achieve the desired spin. With rising living standards around the globe, more careers in professional gaming are available now than ever. This increases competitiveness, and the mathematical framework for understanding games is only slowly catching up.
Fig. 1 Peak daily online real-money poker players over the period 2003 to 2013. Courtesy of pokerhistory.eu.

[...] US in the 19th century. The modern community card rules, specifically the variant Texas Hold'em which is discussed in this writing, appeared in the early 20th century. By 1967 Texas Hold'em was played in Las Vegas casinos, from where its popularity has grown consistently. Soon after the no-limit variant was introduced for tournaments, the variant took off.

No-limit hold'em features a set of rules with interesting properties. The game is easy to understand, thrilling, and has tremendous strategic depth. As poker is a form of strategic gambling where players play against each other and not against the house, it is one of the few forms of gambling with a possible positive expected value. Unlike other forms of gambling where one cannot win in the long term, such as roulette, lotteries or bingo, professional poker players stand to win money consistently. In this regard poker is nearly unique. There are certain parallels to professional sports betting; there, however, players bet against the bookie, and thus making a constant profit is at odds with the operator's interests.

Before we move on, it should be noted that poker should be thought of as a civilized pastime. There are clear, known rules, a verifiable pseudo-random shuffle and safe money transactions in place. A trusted third-party arbitrator³ handles disputes: the casino in a live setting and the poker room in an online setting. Scenes from Hollywood movies in which players bet cars by placing car keys on the table are far from reality. Unregulated poker play in private clubs, while it exists and is illegal, is a very small part of the poker universe. That is not to say poker is completely safe. Cheating and insolvency of the game host resulting in the loss of player funds have their part in poker history as much as in any other industry.
Starting around 1990, poker for play money was first played online on IRC servers.
³ Implementing poker in a completely trustless fashion on a distributed ledger may soon be a possibility; for instance, the Ethereum project is working towards supplying a Turing-complete blockchain scripting language.
Fig. 2 Selection of software and hardware specialization of professional poker players. TL: Information set visualization for a strategy. TR: Statistical data on a player, as collected by a program, describing the player's frequencies. BL: A professional poker player at work. BR: Replay analysis of a game of poker with statistical data. Compilation of the author.

The first successful poker bots were developed during this period. The first online real-money poker site, Planet Poker, opened for business in early 1997. About a year later the second site, Paradise Poker, launched; it still operates today. Only the name has survived, though: Paradise is just a vertical of Sportingbet plc using third-party poker software. With the Moneymaker Effect⁴ in 2003, the rise of online poker began. Newer and better operators opened their online card rooms. Poker magazines and news outlets popped up, and many books were written. Poker training and coaching websites launched their services, and software companies for statistical data analysis were founded. By 2010 online poker had become so competitive that professional players were using customized hardware and highly specialized software in addition to the software offered by the operator.
⁴ Chris Moneymaker, an accountant at the time, won the World Series of Poker main event in 2003, taking home a life-changing amount of money. He qualified to play in the event through a $39 online satellite tournament. This rags-to-riches story gained enormous public popularity.
2 Overview
This paper gives a quick introduction to game theory, discusses how poker is best understood within its framework, and then applies the ideas in aid of strong poker play. The section Modeling Games outlines how game theory, a modern branch of mathematics, tries to model conflicts of interest. Participants in a game, their possible actions and the outcomes of those actions are formalized. Information sets, game states and strategies are derived, and the extensive and normal forms of a game are presented.
⁵ Street is the shorthand for a draw of cards, a chance event. Modern poker is typically played with four streets.
⁶ Finding the solution to a two-person zero-sum game with finite pure strategy sets can be done by computing the solution to a corresponding LP, which can be solved in polynomial time (section 9).
⁷ Association for the Advancement of Artificial Intelligence
The distinctions between two-player and n-player, constant sum and non-constant sum, complete information and hidden information, cooperative and competitive, as well as one-time and sequential games are made. With the game theory framework available we move on to poker, with its intriguing hidden information property. The necessary notation and conventions are introduced. After the necessity of toy games has been shown, we move to the main part. We solve a number of toy games which are designed to model certain aspects of the full game of poker. The first games are solved algebraically. The later games are translated into LP form and solved by a commercial LP engine. The results are analyzed and extrapolated.
3 Modeling Games
Game theory sometimes justifies its existence by claiming to be an adequate model for conflicts of interest. It can be used to examine economics, tactical warfare, political debates or fights for resources of any kind. These ideas have pushed game-theoretic courses into the human studies. A problem that arises is that the mathematical framework makes many subtle assumptions, some fundamental ones of which do not translate well into the real world. As such, the author's view is that the game-theoretic framework is best applied to games in the narrow sense defined below, and any extrapolations to real-world situations need to be done with great care.

There exist many ways to model a game. In particular, a given game can be modeled differently, and a given model can be extrapolated to many different games. In this part the extensive form and normal form are introduced. These descriptions lack a time component, not in the sense that there is no sequence of events, but in the sense that there is no model for the limited time resources of players, such as reaction times or time to contemplate. As such, these models are useful for games where players have sufficient time to learn, plan and play. These models succeed in describing turn-based games, such as chess or poker, by offering strategic advice. They tend to fail to describe real-time games, such as tennis or Starcraft, in a way that is useful to aid play, although with appropriate abstractions even parts of real-time games can be analyzed with the standard game-theoretic models.

A game features players, who take actions when it is their move. The players' moves form the vertices of the game tree; its root is the start of the game. The players' actions traverse the game tree until they reach a leaf in a finite number of steps. Each leaf is assigned a payoff for each player. A player may not be able to differentiate between her moves. Indistinguishable moves form an information set.
A line is a path from root to leaf, representing one full play of the game.
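The vocabulary above (moves, actions, leaves, lines) can be illustrated with a small data structure. The following is a minimal sketch, not the representation used in this paper; the class `Move`, the function `lines` and the toy tree with its payoffs are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Move:
    """A vertex of the game tree: a decision point or, with no actions, a leaf."""
    player: Optional[int] = None                  # acting player (0 = chance); None at a leaf
    actions: dict = field(default_factory=dict)   # action label -> next Move
    payoff: Optional[tuple] = None                # one entry per participant, leaves only


def lines(move, prefix=()):
    """Yield every line: the action sequence from the root to a leaf, plus its payoff."""
    if not move.actions:                          # leaf reached
        yield prefix, move.payoff
        return
    for action, child in move.actions.items():
        yield from lines(child, prefix + (action,))


# Hypothetical toy tree: participant 1 bets or checks, participant 2 answers a bet.
root = Move(player=1, actions={
    "check": Move(payoff=(0, 0)),
    "bet": Move(player=2, actions={
        "fold": Move(payoff=(1, -1)),
        "call": Move(payoff=(-2, 2)),
    }),
})

all_lines = list(lines(root))
```

Each element of `all_lines` pairs one full play of the game with the payoff at its leaf; the toy tree has three lines.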
3.1 Players
The game's players consist of the participants 1, 2, 3, ... and player 0, a chance player, sometimes referred to as nature.
The participants represent parties or individuals that take part in the game. Each participant is assumed to be fully aware of the game's structure: a participant knows about all players, all existing moves, all possible actions, all payoffs, and all information sets. Additionally, each participant is assumed to act rationally: a participant will choose, given their information, the action with the highest expected payoff (see 3.3).
3.3 Payoffs
Payoffs are the outcome of a game. They describe who wins and, when relevant, how much. The difficulty is: how much of what? Payoffs are denominated in utility units, which are a formalization of the comparable value, interests, and measurable or unmeasurable desires of the participants. Finding a suitable utility unit and assigning it in a consistent manner to the outcomes of the game can be a daunting task. A player may be willing to endure a poor outcome if the outcome for his foe is even worse. Such preferences must not violate rationality requirements. Fortunately, in poker there is an inherent unit of utility: the wager. We make the obvious but explicit assumption that each participant is solely trying to maximize their expected wager units in each game.

The payoff is a function that assigns each leaf a tuple of real values, one per participant. The payoff function is said to be constant sum if the payoffs always sum to a constant. If the constant happens to be zero, the payoff, and the game, are called zero sum.
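The constant sum property can be checked mechanically by summing the payoffs at every leaf. The sketch below assumes the payoff function is given as a dictionary from leaf labels to payoff tuples; the function name `constant_sum` and both example games are hypothetical.

```python
def constant_sum(payoffs, tol=1e-9):
    """Return the constant c if every leaf's payoffs sum to the same c, else None."""
    totals = [sum(z) for z in payoffs.values()]
    c = totals[0]
    return c if all(abs(t - c) <= tol for t in totals) else None


# Hypothetical leaves of a two-participant betting game, in wager units.
no_fee = {"fold": (1, -1), "call, 1 wins": (3, -3), "call, 2 wins": (-3, 3)}

# The same game when a fee of 0.5 is deducted from the winner after a call.
with_fee = {"fold": (1, -1), "call, 1 wins": (2.5, -3), "call, 2 wins": (-3, 2.5)}

zero_sum = constant_sum(no_fee) == 0        # payoffs cancel at every leaf
fee_constant = constant_sum(with_fee)       # None: the total differs between leaves
```

The second game is neither zero sum nor constant sum, because the fee is only deducted at some of the leaves.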
3.5 Strategies
With the framework of the game laid out, we consider how the participants' behavior can be modeled. During game play, each participant chooses an action from his action set at each of the information sets he reaches. However, the participant could note down his chosen action for each information set before the game starts, and the noted actions could be executed automatically. This predetermined choice of all actions is called a pure strategy.

A participant, especially in games of incomplete information, may not always choose the same action at a given information set, in order not to give off information that allows the other participants to narrow down the possible moves within their information sets. A predetermined choice on all information sets with a probability distribution over all possible actions is called a strategy. A strategy is a sufficient notion for describing all possible game play of a participant. The set of all participants' strategies is sufficient to simulate the game. For a given set of participants' strategies, the average payoffs can be determined. A strategy is said to dominate another strategy if its (expected) payoff is always at least that of the dominated strategy, that is, against every combination of the other participants' strategies.
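The difference between a pure strategy and a strategy, and the dominance relation, can be sketched in a few lines. The information set labels, the probabilities and the helper names `sample_action` and `dominates` below are hypothetical and serve only as an illustration.

```python
import random

# A pure strategy fixes one action per information set (labels are hypothetical).
pure = {"holding Ace": "bet", "holding Queen": "check"}

# A strategy assigns a probability distribution over actions at each information set.
mixed = {
    "holding Ace": {"bet": 1.0},
    "holding Queen": {"bet": 1 / 3, "check": 2 / 3},
}


def sample_action(strategy, info_set, rng):
    """Draw one action at an information set according to the strategy."""
    actions = list(strategy[info_set])
    weights = [strategy[info_set][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def dominates(row_a, row_b):
    """True if the payoffs in row_a are at least those in row_b in every column.

    Each row lists one strategy's expected payoff against each opposing strategy.
    """
    return all(a >= b for a, b in zip(row_a, row_b))


rng = random.Random(0)
draws = [sample_action(mixed, "holding Queen", rng) for _ in range(1000)]
bet_share = draws.count("bet") / len(draws)   # close to 1/3 for a large sample
```

A pure strategy is the special case in which every distribution puts probability 1 on a single action, as for "holding Ace" above.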
3.6 Solutions
A solution to a game is an attempt to predict how the participants will act in the game. The more complex a game, the more difficult it is to define a notion of solution that correctly predicts behavior. At the core of all solution concepts lies the following idea. A participant can, given the strategies of all other participants, find a strategy that maximizes her (expected) payoff. If a number of participants are cooperating, whether or not payoffs are transferable, that group of participants can, given the strategies of all other participants, find a set of strategies that maximizes their (expected) payoff. If this process, for all participants or cooperating parties of participants, yields no change in strategy for an initial given set of strategies, the strategies are said to be in equilibrium. Such an equilibrium strategy set has the desirable property of being stable and can thus be seen as a solution. A game may have none or many such equilibria. Their occurrence, structure, computation, approximation, and pruning down to a smaller set or a single solution are among the goals of game theory.

What qualifies as a solution, and the structure of the solution space, depend on the number of players, the repetition of game play, properties of the payoff, and the exchange of value and information among participants before, during or after the game.

Constant Sum or Non-Constant Sum. A constant sum game has a constant total payoff: there exists a $c \in \mathbb{R}$ such that $\sum_{i \in P \setminus \{0\}} Z_i(v) = c$ for all $v \in T$. A game is non-constant sum if there exists no such value.

Two Participants or Three+ Participants. A game has two participants if $P$ has two elements in addition to 0. A game has three or more participants if $P$ has three or more elements in addition to 0.

Preplay Negotiating or Preplay Announcements or No Preplay Communication. Preplay negotiations allow participants to discuss their actions before the game starts. Two participants may form a coalition, agreeing to adhere to specific strategies. This agreement may be enforceable (a contract between two companies in a legal system) or unenforceable (a verbal commitment which can be violated). These negotiations are potentially complex, and may constitute a game themselves. Preplay announcements allow participants to make promises and threats without feedback ("If this happens, I will do that"). No preplay communication is modeled most closely by our notion of a game, where no participant can take actions outside the game tree.

Single Game or Repetitive Game. A game can be played a single time or be repeated many times. For repeated games, the execution of promises and threats, abiding by or defecting from coalitions, and other forms of contract negotiation play an important role.

Outside Payments or No Outside Payments. A game which permits outside payments allows participants to transfer part of their payoff to one another. This allows the formation of cooperations: groups of participants that try to maximize their joint payoff.
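The stability idea behind an equilibrium can be made concrete for two participants. The sketch below restricts attention to pure strategies, which is an assumption made here for brevity, and checks whether either participant gains by deviating unilaterally. The function names and both example games are hypothetical.

```python
def is_equilibrium(payoff_1, payoff_2, i, j):
    """Check whether the pure strategies (i, j) are mutual best responses.

    payoff_1[i][j] and payoff_2[i][j] are the payoffs to participants 1 and 2
    when 1 plays row i and 2 plays column j.
    """
    best_for_1 = all(payoff_1[i][j] >= payoff_1[k][j] for k in range(len(payoff_1)))
    best_for_2 = all(payoff_2[i][j] >= payoff_2[i][k] for k in range(len(payoff_2[i])))
    return best_for_1 and best_for_2


def pure_equilibria(payoff_1, payoff_2):
    """List all pure strategy pairs from which no participant wants to deviate."""
    return [(i, j)
            for i in range(len(payoff_1))
            for j in range(len(payoff_1[0]))
            if is_equilibrium(payoff_1, payoff_2, i, j)]


# Hypothetical non-constant sum coordination game with two pure equilibria.
coord_1 = [[2, 0], [0, 1]]
coord_2 = [[1, 0], [0, 2]]

# Matching pennies: zero sum, with no equilibrium among the pure strategies.
pennies_1 = [[1, -1], [-1, 1]]
pennies_2 = [[-1, 1], [1, -1]]
```

Here `pure_equilibria(coord_1, coord_2)` finds two stable pairs while `pure_equilibria(pennies_1, pennies_2)` finds none, which illustrates that the number of such equilibria varies from game to game.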
3.7 Formalization
A game is described by $\mathcal{G} = (P, G, q, \mathcal{I}, \mathcal{A}, a, x, Z)$:

1. $P = \{0, 1, \dots, n\}$, a set of players.
2. $G = (V, E)$, $v_0 \in V$, $T \subset V$, a finite tree with root $v_0$ as starting move, a subset $T$ of terminal moves and $E: V \setminus T \to V$ actions from move to move.
3. $q: V \setminus T \to P$, a map that assigns a player to each move.
4. $\mathcal{I}$, a partition of $V \setminus T$ into the players' information sets, $\forall I \in \mathcal{I}: I \subset V \setminus T$ while $\bigcup_{I \in \mathcal{I}} I = V \setminus T$, which satisfies $\forall I \in \mathcal{I}\ \forall v, w \in I: q(v) = q(w)$, naturally mapping a player to her information sets.
5. $\mathcal{A} = (A_I)_{I \in \mathcal{I}}$, a partition of all actions into possible actions per information set.
6. $a: E \to A_I \in \mathcal{A}$, a surjective map from edge to an action.
7. $A_v = \{a((v, w)) \mid (v, w) \in E\}$, the action set for a move $v$.
8. $A_I = A_v\ \forall v \in I$, a property: the action set is the same for each move in an information set.
9. $x = \{\Pr(A) \mid A \in A_v,\ q(v) = 0\}$, a family of probability distributions over all action sets of 0.
10. $Z: T \to \mathbb{R}^n$, a payoff to each participant at the terminal moves.

This complete form, and similar forms, are referred to as the extensive form of a game.

A pure strategy $S_i$ for a player $i$ chooses one possible action in each information set:

$$S_i = \{A \in A_I \mid v \in I,\ q(v) = i\} \quad \text{for } i \in P \setminus \{0\}$$

A given strategy match-up $S = (S_1, \dots, S_n)$ yields a payoff to player $i \in P$ of

$$Y_i(S) = \sum_{v \in T} \Pr(v)\, Z_i(v)$$

where $\Pr(v)$ denotes the probability of reaching leaf $v$. Let $H$ denote the set of all directed paths from the root to $v$ as an edge list, that is, all lines that reach leaf $v$. Then $\Pr(v)$ is

$$\Pr(v) = \sum_{h \in H} \prod_{e \in h} \Pr(a(e))$$

where

1. $\Pr(a((v, w))) \in x$ if $q(v) = 0$ is given, and
2. $\Pr(a((v, w))) = 1$ if $q(v) \neq 0$ with $a((v, w)) \in S_{q(v)}$, or
3. $\Pr(a((v, w))) = 0$ if $q(v) \neq 0$ with $a((v, w)) \notin S_{q(v)}$ otherwise.

We can now describe the game from a behavioral perspective, viewing the game in the light of what the players can do. The game is described by $\mathcal{G} = (P, \mathcal{S}, Y)$:

1. $P = \{1, \dots, n\}$, a set of active players.
2. $\mathcal{S} = \{\mathcal{S}_i \mid i \in P\}$, a set of finite sets of pure strategies available to each player.
3. $Y = \{Y_i \mid i \in P\}$, a set of linear payoffs for each player, where $Y_i: \prod_{j \in P} \mathcal{S}_j \to \mathbb{R}$ is the payoff player $i$ receives under a strategy match-up.

This is also referred to as the normal form of a game.

The solution theory for games is broad. A small overview is given below.

A constant sum two participant (2 players and nature) game has a solution with unique payoff (Minimax theorem, section 9).

A non-constant sum game with two participants, enforceable preplay negotiations and outside payments will push the participants to form a coalition. This is also called a cooperative game, as participants can work together. They can achieve the total payoff at a leaf $v \in T$ for which $\sum_i Z_i(v)$ is maximal. Finding all strategy pairs that reach this maximal total payoff is straightforward. However, it is unclear how the maximal total payoff is to be distributed among the participants; typically an arbiter is used to determine this distribution in a fair manner. Fair arbitration schemes are widely studied, e.g. as bargaining problems where under specific axiomatic desiderata a single solution can emerge [5, p 155-162].

A non-constant sum game with two participants and enforceable preplay negotiations that does not allow outside payments still pushes the participants to form a coalition; however, their objective is more complicated than maximizing $\sum_i Z_i(v)$ over $v \in T$. Determining a fair solution can also be seen as a bargaining problem.

A non-constant sum game with two participants without preplay communication allows no strategic agreement to be made. This is also called a non-cooperative game, as participants cannot work together despite potentially possible outside payments. There still exists at least one equilibrium strategy pair [6, p 286-295]. However, an equilibrium pair is a poor notion of solution in this case. A solution with a more applicable definition may not exist [4, p 106-109].
Interestingly, however, if such a game is repeated many times, implicit collusion may occur. Patterns in the choice of strategy can serve as a means of in-game communication strong enough to settle on a solution of the same cooperative game [4, p 110-111].

A game with three+ participants with enforceable preplay negotiations and outside payments allows for the definition of a characteristic function $v: 2^{P \setminus \{0\}} \to \mathbb{R}$ which captures the maximal value a coalition of the participants can achieve. The value of coalition $S$, $v(S)$, is computed by assuming that $P \setminus S$ forms an opposing coalition, in turn reducing the problem to a two participant game. Coalitions which are stable in the sense that no participant in $S$ gains by defecting are called imputations. Under different sets of assumptions different imputations qualify as solutions, e.g. introducing a domination relation on the set of imputations leads to useful results [9].
For other three+ participant games there exist various approaches; a broad overview can be found in [4, p170].
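The payoff $Y_i(S)$ of a pure strategy match-up from section 3.7 can be evaluated by walking the game tree: chance moves weight their children by the given probabilities, and at a participant's move the action selected by her pure strategy is followed with probability 1. The following is a minimal sketch under these assumptions; the tuple encoding of the tree, the toy game and its payoffs are hypothetical.

```python
from fractions import Fraction


def expected_payoff(node, strategies):
    """Expected payoff vector of a pure strategy match-up, by recursion over the tree.

    node is ("leaf", payoffs), ("chance", [(prob, child), ...]) or
    ("move", player, info_set, {action: child}).
    strategies[player][info_set] is the action the pure strategy selects.
    """
    kind = node[0]
    if kind == "leaf":
        return node[1]
    if kind == "chance":                        # q(v) = 0: weight children by x
        totals = None
        for prob, child in node[1]:
            weighted = [prob * z for z in expected_payoff(child, strategies)]
            totals = weighted if totals is None else [t + w for t, w in zip(totals, weighted)]
        return tuple(totals)
    _, player, info_set, children = node        # q(v) != 0: chosen action has Pr = 1
    return expected_payoff(children[strategies[player][info_set]], strategies)


def after_deal(card, wins):
    """Subtree after chance deals participant 1 a card; 2 cannot see the card."""
    sign = 1 if wins else -1
    facing_bet = ("move", 2, "facing bet", {
        "fold": ("leaf", (1, -1)),
        "call": ("leaf", (2 * sign, -2 * sign)),
    })
    return ("move", 1, card, {"check": ("leaf", (sign, -sign)), "bet": facing_bet})


half = Fraction(1, 2)
game = ("chance", [(half, after_deal("high", True)), (half, after_deal("low", False))])

always_bet = {1: {"high": "bet", "low": "bet"}, 2: {"facing bet": "fold"}}
value_bet = {1: {"high": "bet", "low": "check"}, 2: {"facing bet": "call"}}
```

Both moves of participant 2 carry the same information set label "facing bet", so her pure strategy must select the same action at both, as required of an information set.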
4 Modeling Poker
4.1 Players
Poker is played between two to ten participants, which map directly to the player set {1, 2, ...}. Player 0 takes the important role of the shuffle. A shuffle can be modeled to take place either once before any other player makes a move (pick a random permutation of the deck once, see section 4.7) or every time a street is dealt (pick the appropriate number of random cards from those cards which are left). The second methodology, although it appears less clean, has two big advantages. Firstly, the game tree is much smaller, as the order of undrawn cards is irrelevant. Even if we were to restrict permutations to those on n out of 52 cards, where n is the maximum number of known cards for a round, we would still specify cards for branches of the game tree where they are not reached. Secondly, the notion of new cards being drawn at the time they influence the information sets interacts more naturally with the description of behavioral strategies.

That each participant knows the game structure, that is the other players, moves, actions, payoffs and information sets, is a given in poker, as the rules are known and easy to understand. As hinted at earlier, the structure of the game with units of wager allows for satisfying the rationality requirement: each participant simply aims to maximize their (expected) units of wager.
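The two ways of modeling the shuffle can be sketched side by side. This is a minimal illustration assuming a standard 52 card deck, two hole cards per participant and a five card board; the function names are hypothetical, and both functions produce the same distribution over dealt cards.

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]


def deal_by_permutation(rng, n_players=2):
    """Model 1: chance moves once, fixing a full permutation of the deck."""
    deck = DECK[:]
    rng.shuffle(deck)
    hole = [deck[2 * i: 2 * i + 2] for i in range(n_players)]
    board = deck[2 * n_players: 2 * n_players + 5]
    return hole, board


def deal_by_street(rng, n_players=2):
    """Model 2: chance moves at every street, drawing only the cards needed."""
    remaining = DECK[:]

    def draw(k):
        cards = rng.sample(remaining, k)
        for c in cards:
            remaining.remove(c)
        return cards

    hole = [draw(2) for _ in range(n_players)]
    board = draw(3) + draw(1) + draw(1)      # flop, turn, river
    return hole, board
```

In the first model the order of all 52 cards is fixed even though at most nine of them are ever seen in a two participant game, which is the source of the larger game tree mentioned above.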
[...] All (still active) participants have bet the same amount. Only one participant remains active.

At the end of a betting round, if only one participant remains active, payoffs are determined. If two or more participants remain active, 0 acts to initiate the next betting round. If the last betting round, the river, ends with two or more active participants, there is a showdown: the starting hands are revealed, and payoffs are determined based on them.

At each move there is a current bet size. For the first move preflop, the starting bet size is set by the game rules to a fixed amount that a number of participants are forced to place. Without this initial wager, there would be no reason to engage in betting. In hold'em there are two starting bets, called small blind and big blind, forced onto two participants. The current bet size can be altered by the participants' moves. Whenever a new street starts by a move of 0, the current bet size is set to zero.

The actions the participants can take during a round of betting fall into three categories.

Fold. A fold is the action of passing. When folding, a participant discards her hole cards and surrenders. A fold is a terminal action for a participant; it is always her last action. The participant changes from being active to not being active. The payoff for the participant is fixed at this point to the negative of the total amount wagered up to that move.

Call. A call is the action of matching the current bet size. When the current bet size is zero, a call is called a check. When the current bet size is larger than the remaining stack size of the acting participant, prohibiting her from matching the bet size in full, the call action matches the bet size up to the maximum possible, and she is said to be all-in. An all-in participant is usually treated as automatically matching all future bets for free without the need to take the action explicitly.

Raise. A raise is the action of increasing the current bet size. When the current bet size is zero, a raise is called a bet. An action of raising requires a numerical value, either the amount of the increase or the total current bet size; the author prefers the latter notation. The size of the bet increase is subject to a number of restrictions. The minimum increase from zero on each street is given by the game rules. In no-limit hold'em it is equal to the initial big blind. The minimum increase from a non-zero amount is the previous increase. For example, the bet size progression may take the form 2, 4, 6, 12, 18, with increases of 2, 2, 6, 6 respectively after the opening bet of 2. The progressions 2, 4, 5 and 2, 4, 6, 12, 14 are prohibited. The granularity of the increase is specified by the game rules. Typically the minimum currency denomination (1 cent) in an online setting and the smallest chip in a brick-and-mortar casino are used, but other conventions are possible.
Exception to the above: if a participant's stack size is not sufficient for a raise by the minimum increase amount, she has the option to raise all-in. Such an incomplete raise must be matched by other participants to continue to the next street. However, depending on the house rules, it may not constitute a full raise that increased the current bet size, and the minimum increase afterwards differs accordingly.
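The minimum raise rule can be checked against the example progressions given above. The sketch below covers a single postflop street starting from a current bet size of zero; the function name `valid_progression` is hypothetical, and the all-in exception and the bet granularity are deliberately left out.

```python
def valid_progression(bet_sizes, min_bet):
    """Check a sequence of total bet sizes on one street against the minimum raise rule.

    The first bet must be at least min_bet; every later raise must increase the
    current bet size by at least the previous increase.
    All-in exceptions and bet granularity are not modeled here.
    """
    current, last_increase = 0, min_bet
    for size in bet_sizes:
        increase = size - current
        if increase < last_increase:
            return False
        current, last_increase = size, increase
    return True


allowed = valid_progression([2, 4, 6, 12, 18], min_bet=2)
too_small_raise = valid_progression([2, 4, 5], min_bet=2)
shrinking_raise = valid_progression([2, 4, 6, 12, 14], min_bet=2)
```

The first progression passes, while the other two fail at the raises to 5 and to 14, whose increases of 1 and 2 fall below the preceding increases of 2 and 6.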
Hand #113628743330: Hold'em No Limit ($5/$10 USD) [2014/03/21 17:51:39 ET]
Seat #1 is the button
Seat 1: Player1 ($2866.33 in chips)
Seat 3: Player2 ($1898 in chips)
Player1: posts small blind $5
Player2: posts big blind $10
*** HOLE CARDS ***
Dealt to Player1 [6h 2d]
Dealt to Player2 [Js 7s]
Player1: raises $20 to $30
Player2: calls $20
*** FLOP *** [5s 8h 7c]
Player2: checks
Player1: bets $60
Player2: calls $60
*** TURN *** [5s 8h 7c] [Jc]
Player2: checks
Player1: checks
*** RIVER *** [5s 8h 7c Jc] [Qs]
Player2: bets $120
Player1: folds
Uncalled bet ($120) returned to Player2
Player2 collected $178.50 from pot
Player2: doesn't show hand
*** SUMMARY ***
Total pot $180
Rake $1.50
Board [5s 8h 7c Jc Qs]
Seat 1: Player1 (button) (small blind) folded on the River
Seat 3: Player2 (big blind) collected ($178.50)

Fig. 3 A record of a poker game. The game setup is given with the stack sizes in chips and the seating arrangement. Hole cards are dealt by 0 to initiate game play. Subsequently the players take actions up to the river, where only Player2 remains active after Player1 folds. This triggers the payoffs without a comparison of hole cards.
[...] hand is raked. Whenever a flop is seen, a percentage of the pot is taken as rake. At low stakes (pot sizes rarely exceed $60), this can be as high as 5% of the pot size. At high stakes (pot sizes regularly exceed $60), the rake is typically capped at a fraction of a big blind. With this structure, the game is not zero-sum, or even constant-sum. The rake can be modeled as a third player with a deterministic strategy. In this discussion we ignore these effects and model a game without fees, as they would add more complexity than their small size justifies.
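The rake structure described above, a percentage of the pot limited by a cap, can be written as a small function. This is a sketch only: the default rate of 5% and cap of $1.50 are assumptions chosen so that the result matches the hand record in Fig. 3, and actual values depend on the operator and the stake.

```python
def rake(pot, saw_flop, rate=0.05, cap=1.50):
    """Rake taken from a pot: a percentage of the pot, limited by a cap.

    rate and cap are hypothetical defaults, not values taken from any operator.
    """
    if not saw_flop:
        return 0.0
    return min(rate * pot, cap)


pot = 180.00                       # total pot of the hand in Fig. 3
fee = rake(pot, saw_flop=True)     # 5% of the pot would be 9.00, so the cap applies
collected = pot - fee              # amount paid out to the winner
```

With these assumptions the winner collects $178.50 of the $180 pot, so the payoffs of the two participants no longer sum to zero.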
4.5 Strategies
A strategy describes which action a player chooses in each information set. As the other participants' hole cards vary over all possible values within a player's information set, a strategy must be formulated without taking into account the cards of the opponents. A player must choose his actions solely based on his own hole cards and the preceding actions taken by the other players. Strategies for the first one or two preflop actions are simple enough to be written down, as there are only 169 distinct two card starting hands in Hold'em¹² and only a few actions could have been taken by the other players before. Once the flop is dealt, however, the number of distinct hole cards can increase to up to 1176¹³ and the number of potential actions taken grows almost exponentially¹⁴.
⁹ In forms other than Texas Hold'em and Omaha there can be more than one move by 0 which creates hidden information.
¹⁰ Including her own; this may not be a given in games such as bridge.
¹¹ Cards which are held by the player or are on the board cannot be hole cards of someone else.
¹² Due to suit isomorphism, Ac2c and As2s are strategically identical preflop (see 4.7 for notation).
¹³ If no suit isomorphism exists anymore, as on flops with three different suits.
¹⁴ The number of actions per move is almost constant; it only decreases slowly as the total amount bet increases. For example, with two active players, the game trees below the moves reached by the actions (bet b) and (check, bet b) are identical in size.
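The two counts quoted in section 4.5, 169 distinct starting hands and up to 1176 distinct hole card pairs once a flop is known, can be verified by enumeration. The sketch below assumes a standard 52 card deck; the helper `preflop_class` is a hypothetical name for the mapping induced by suit isomorphism.

```python
from itertools import combinations
from math import comb

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]


def preflop_class(card_a, card_b):
    """Map two hole cards to their strategic class before the flop, e.g. 'AKs', 'Q8o', 'TT'."""
    hi, lo = sorted((card_a, card_b), key=lambda c: RANKS.index(c[0]), reverse=True)
    if hi[0] == lo[0]:
        return hi[0] + lo[0]                           # pocket pair
    return hi[0] + lo[0] + ("s" if hi[1] == lo[1] else "o")


classes = {preflop_class(a, b) for a, b in combinations(DECK, 2)}
n_classes = len(classes)            # distinct starting hands up to suit isomorphism
n_after_flop = comb(52 - 3, 2)      # hole card pairs left once three flop cards are known
```

The 169 classes consist of 13 pairs, 78 suited and 78 offsuit combinations of ranks; the 1176 pairs are the two card subsets of the 49 cards not on the flop.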
4.6 Solutions
Ignoring the rake, poker is a zero-sum game. For two players there exists a single value which can be reached by a pair of equilibrium strategies. For more than two players, preplay communication is explicitly prohibited by the rules. Poker operators in fact use software to detect potential synchronization of strategies. Outside payments are of course not allowed either, but that is virtually unenforceable. Implicit collusion poses a problem, though. The game is repeated around 100 times per hour per table online, and the sets of participants at multiple tables can overlap, so that the same participants play several hundred games per hour against or with each other.
Notation      Meaning
**, ATC       two random cards, Any Two Cards
A*, Ax        an Ace of any suit and any other random card
T*            a Ten of any suit and any other random card
Tx            a Ten of any suit and a random card of rank 9 or lower
Q8            a Queen and an Eight of any suits
Q8s           a Queen and an Eight of matching suit, equal to Qs8s, Qh8h, Qc8c, Qd8d
Q8o           a Queen and an Eight of differing suit, equal to Q8 minus Q8s
As*s          the Ace of Spades and any other Spade
5cxc          the 5 of Clubs and any other Club of rank 4 or lower, equal to 5c4c, 5c3c, 5c2c
4c*o          the 4 of Clubs and any other non-Club
*h*h          any two Hearts
KK-99         two cards of equal rank between K and 9
AK-AT, AT+    an Ace and any card of rank K to T
22+           any Pair
T6s+          T9s, T8s, T7s, T6s
Fig. 4 Various shorthands for denoting specific sets of two cards.

The last street, if not the fourth in a given variant of poker, is still referred to as the river. In no-limit hold'em, there is a distinct positioning of participants. With three or more participants, the players {1, 2, 3, ..., n} are ordered with 1 being the small blind, 2 being the big blind and n being the button. Before the game starts, the small blind and big blind place their forced initial wagers. Preflop action is ordered (3, ..., n, 1, 2), postflop¹⁵ action is ordered (1, 2, 3, ..., n). With two participants, the players {1, 2} are 2 as the small blind and button and 1 as the big blind. Preflop action is ordered (2, 1), postflop (1, 2). The research parts of this paper deal with two participants exclusively. For this purpose we let, unless otherwise indicated, P = {0, 1, 2} whenever P is mentioned. 1 denotes the participant who acts last on the last street.
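The positional conventions above translate directly into a function returning the order of action. This is a small sketch; the function name `action_order` and the street labels are hypothetical.

```python
def action_order(n, street):
    """Order in which participants 1..n act, following the conventions above."""
    if n < 2:
        raise ValueError("need at least two participants")
    if street == "preflop":
        if n == 2:
            return [2, 1]                        # small blind (button) acts first
        return list(range(3, n + 1)) + [1, 2]    # the blinds act last
    return list(range(1, n + 1))                 # postflop: small blind first


heads_up_preflop = action_order(2, "preflop")
six_handed_preflop = action_order(6, "preflop")
six_handed_flop = action_order(6, "flop")
```

For six participants the preflop order is 3, 4, 5, 6, 1, 2, while every postflop street runs 1 through 6.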
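[...]
$$\sum_{(h_1, h_2)} \Big[ (s(h_1) + 1)\, \mathbf{1}_{\{t(h_2, s(h_1)) = \text{call},\ h_1 > h_2\}} - (s(h_1) + 1)\, \mathbf{1}_{\{t(h_2, s(h_1)) = \text{call},\ h_1 < h_2\}} + \mathbf{1}_{\{t(h_2, s(h_1)) = \text{fold}\}} \Big]$$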
Figure 5 shows the payoffs $Z = (z, -z)$ for the leaves of the game tree.

z(·, b, ·, call)    h2 = A       h2 = K       h2 = Q
h1 = A              -            b + 1        b + 1
h1 = K              -(b + 1)     -            b + 1
h1 = Q              -(b + 1)     -(b + 1)     -
Fig. 5 Payoff $z$ for 1 when $h_1$ (row) is betting $b$ and $h_2$ (column) is calling. The payoff for $h_2$ folding is always $+1$, irrespective of $h_1, h_2$, as no showdown is reached.

We can make the following observations on dominated strategies:
- For 2, folding the Queen ($z = 1$) dominates calling ($z = b + 1$). For $b = 0$ both options are equal, for $b > 0$ folding is better by $b$. Thus, 2 will always fold a Queen.
- For 2, calling the Ace ($z = -(b + 1)$) strictly dominates folding ($z = 1$). Thus, 2 will always call the Ace.
- For 2, calling against $b = 0$ dominates folding. Thus, 2 will always call $b = 0$ without the Queen.
- For 1, assuming 2 folds all Queens and calls all Aces, betting $b = 0$ with the King ($z = +1$ half the time against the Queen, $z = -1$ the other 50% against the Ace) strictly dominates betting $b > 0$ ($z = +1$ vs. Queen, $z = -(b + 1)$ vs. Ace). Thus, 1 will bet $b = 0$ with the King.
- For 1, betting $b > 0$ with the Ace ($z = +1$ at some frequency of folds $k \in [0, 1]$, $z = b + 1$ for calls $(1 - k)$ of the time) dominates betting $b = 0$ ($z = +1$). Thus 1 will bet some amount $b > 0$ with the Ace.
In other words, 2 should not put in extra money when always losing at showdown with the Queen, but should always put in extra money when always winning with the Ace. 1 should not bet the King against that strategy, as there is no potential gain. 1 should always bet the Ace for more than 0. Eliminating dominated options, the potential solution space simplifies to:
Next we consider the bet sizes 1 can choose.
- For 1, when holding the Queen, betting $b = 0$ ($z = -1$) dominates betting $b > 2$ ($z = -(b + 1)$ at least half the time when called by the Ace, $z = +1$ at most half the time when the King folds, which totals less than $-1$). The Queen will bet $b \in [0, 2]$.
- For 1, when holding the Ace, there exists a $b_A > 0$ such that the average payoff is maximal. If there were two such values $b_{A1}, b_{A2}$, 1 could choose either of them for the same payoff.^19
- If 1 plays $b_Q > 0$ with the Queen and $b_A$ with the Ace, and $b_Q \neq b_A$, 2 will call the King against $b_Q$ and fold against $b_A$. But then $b_A$ does not give the maximal payoff for the Ace of 1; $b_Q$ would be better. Therefore, if the Queen bets, it must bet $b_A$.

Does 1 bet $b_A$ with the Queen? Let $s_1 = (b_A, 0, b_A)$ (1 bets the Queen) and $s_2 = (b_A, 0, 0)$ (1 checks the Queen). Let $t_1 = (\text{call}, b_A \mapsto \text{call}, \text{fold})$ (2 calls the King against $b_A$) and $t_2 = (\text{call}, b_A \mapsto \text{fold}, \text{fold})$ (2 folds the King against $b_A$). We find
\[ y(s_1, t_1) = ((b_A + 1) - (b_A + 1) - (b_A + 1) + 1)/6 \]
\[ y(s_2, t_1) = ((b_A + 1) - 1 - 1 + 1)/6 \]
\[ y(s_1, t_2) = (1 - (b_A + 1) + 1 + 1)/6 \]
\[ y(s_2, t_2) = (1 - 1 - 1 + 1)/6 \]
Here the first three summands are the payoffs of the card match-ups $(A, K)$, $(Q, A)$ and $(Q, K)$, in that order. The fourth and last summand, 1, is the payoff achieved when 2 has the Queen against 1's Ace (payoff $+1$ for $(A, Q)$). The match-ups where 1 has the King ($(K, A)$ and $(K, Q)$) have an average payoff of 0.
19 Both 1 and 2 only have one decision point, so 2 can't force 1 to mix $b_{A1}, b_{A2}$ to achieve the maximum.
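For example, $y(s_1, t_2)$ has 2 folding the King and 1 betting the Queen: $(A, K)$ pays 1, $(Q, A)$ pays $-(b_A + 1)$ and $(Q, K)$ pays 1. This simplifies to
\[ y(s_1, t_1) = -\frac{b_A}{6}, \quad y(s_2, t_1) = \frac{b_A}{6}, \quad y(s_1, t_2) = \frac{2 - b_A}{6}, \quad y(s_2, t_2) = 0. \]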
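The four payoffs can be cross-checked by enumerating the six deals of the 3-card game directly. The following is a minimal sketch under the rules described in this section (blind of 1, a single bet by 1, call or fold by 2); the names `payoff`, `payoff_matrix`, `s1`, `s2`, `t1`, `t2` mirror the text, while the dictionary and function encoding of strategies is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import permutations

RANK = {"A": 3, "K": 2, "Q": 1}  # higher rank wins at showdown


def payoff(bets, calls):
    """Average payoff to player 1 over the six equally likely deals.

    bets  -- dict: card of player 1 -> bet size (0 means check)
    calls -- function (card of player 2, bet size) -> True if 2 calls
    """
    total = Fraction(0)
    for h1, h2 in permutations(RANK, 2):
        b = bets[h1]
        if calls(h2, b):
            sign = 1 if RANK[h1] > RANK[h2] else -1
            total += sign * (b + 1)  # showdown for blind plus bet
        else:
            total += 1               # 2 folds and loses the blind
    return total / 6


def payoff_matrix(b_a):
    """Rows s1, s2 against columns t1, t2 for Ace bet size b_a."""
    b_a = Fraction(b_a)
    s1 = {"A": b_a, "K": 0, "Q": b_a}  # 1 bets the Queen
    s2 = {"A": b_a, "K": 0, "Q": 0}    # 1 checks the Queen

    def t1(card, bet):                 # King calls b_a, Queen folds
        return card in ("A", "K")

    def t2(card, bet):                 # King folds to b_a, calls b = 0
        return card == "A" or (card == "K" and bet == 0)

    return [[payoff(s, t) for t in (t1, t2)] for s in (s1, s2)]


print(payoff_matrix(1))
```

Using exact fractions keeps the comparison with the closed forms $-b_A/6$, $(2 - b_A)/6$, $b_A/6$ and $0$ free of rounding.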
We can see that no pair is in equilibrium. If 1 plays $s_1$, 2 plays $t_1$. If 2 plays $t_1$, 1 plays $s_2$. If 1 plays $s_2$, 2 plays $t_2$. And if 2 plays $t_2$, 1 plays $s_1$, completing a cycle. Translating into poker language: if 1 bluffs, 2 should call in the hopes of catching the bluff. If 2 tries to catch all bluffs, 1 should never bluff. If 1 never bluffs, 2 should never try to catch a bluff. This in turn triggers 1 to always bluff. The solution is a mixed strategy of some bluffs and some calls where no player can unilaterally improve. This means that at equilibrium 1 has the same payoff when bluffing as when not bluffing, and 2 has the same payoff when catching a bluff as when not trying to catch a bluff; otherwise either could change their strategy to achieve a higher payoff. Let 1 play $s_1$ with frequency $u \in [0, 1]$ and 2 play $t_1$ with frequency $k \in [0, 1]$. To satisfy the above constraint it must hold that
\[ y(s_1, ((k, t_1), (1 - k, t_2))) = y(s_2, ((k, t_1), (1 - k, t_2))) \]
\[ y(((u, s_1), (1 - u, s_2)), t_1) = y(((u, s_1), (1 - u, s_2)), t_2) \]
which solves to
\[ u = \frac{b_A}{2 + b_A}, \qquad k = \frac{2 - b_A}{2 + b_A}. \]
All four expressions then take the same value, the game value
\[ y = \frac{b_A (2 - b_A)}{6 (2 + b_A)}. \]
This game value is positive, so 1 has the advantage. To maximize his advantage he chooses $b_A$ to maximize his game value. The game value function has one extremum per arm; the maximum falls into $b_A \in [0, 2]$.

Fig. 6 Game value for Player one plotted against his chosen bet size $b_A$. If 1 bets too small, she misses value with her Aces, as she could put in more money on average with a larger bet. If she bets too big, she gets too few bluffcatch calls from the King, and thus also misses value with her Aces. At $b_A = 0$ there is no value gained for Aces because the bet size is zero; at $b_A = 2$ there is no value gained for Aces as no King must call to prevent profitable bluffs from 1's Queens.
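\[ b_A^* = -2 + \sqrt{8} = 2(\sqrt{2} - 1) \approx 0.83 \]
The solution to the game is $(s^*, t^*)$ with game value $1 - \frac{2}{3}\sqrt{2} \approx 0.0572$.
\[ s_1(A, K, Q) = (2(\sqrt{2} - 1),\ 0,\ 2(\sqrt{2} - 1)) \]
\[ s_2(A, K, Q) = (2(\sqrt{2} - 1),\ 0,\ 0) \]
\[ s^* = \frac{2 - \sqrt{2}}{2}\, s_1 + \frac{\sqrt{2}}{2}\, s_2 \]
\[ t_1((A, b), (K, b), (Q, b)) = (\text{call}, \text{call}, \text{fold}) \]
\[ t_2((A, b), (K, b), (Q, b)) = (\text{call}, \text{fold}, \text{fold}) \]
\[ t^* = \frac{2 - b}{2 + b}\, t_1 + \frac{2b}{2 + b}\, t_2 \quad \text{for } b \in [0, 2), \qquad t^* = t_2 \quad \text{for } b \geq 2 \]
We learn a few poker essentials from this game and its solution.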
- Strong hands perform best by putting in more money; they want to value bet.
- Medium-strength hands perform best by just seeing a showdown.
- Weak hands can perform best by putting in more money; they want to bluff. They counteract the value bets, giving the opponent a tough decision when she holds a medium-strength hand.
- Bet sizing is important; betting too small or too big achieves poor results.
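The mixed solution of the 3-card game can be checked numerically. The sketch below assumes the reduced 2x2 payoffs $-b_A/6$, $b_A/6$, $(2 - b_A)/6$ and $0$ for the strategy pairs $(s_1,t_1)$, $(s_2,t_1)$, $(s_1,t_2)$, $(s_2,t_2)$; the function names and the ternary search are illustrative choices, not the method of the text.

```python
import math


def bluff_frequency(b_a):
    """Frequency u with which 1 plays s1 (bets the Queen)."""
    return b_a / (2 + b_a)


def call_frequency(b_a):
    """Frequency k with which 2 plays t1 (calls the King)."""
    return (2 - b_a) / (2 + b_a)


def game_value(b_a):
    return b_a * (2 - b_a) / (6 * (2 + b_a))


def mixed_value(b_a, u, k):
    """Payoff to 1 when 1 mixes (u, 1-u) over s1, s2 and 2 mixes (k, 1-k) over t1, t2."""
    y11, y21, y12, y22 = -b_a, b_a, 2 - b_a, 0.0  # payoffs times 6
    total = u * k * y11 + (1 - u) * k * y21 + u * (1 - k) * y12 + (1 - u) * (1 - k) * y22
    return total / 6


def best_bet_size(lo=0.0, hi=2.0, steps=100):
    """Ternary search; the game value has a single maximum on [0, 2]."""
    for _ in range(steps):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if game_value(m1) < game_value(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


b_star = best_bet_size()
print(b_star, 2 * (math.sqrt(2) - 1))
print(game_value(b_star), 1 - 2 * math.sqrt(2) / 3)
```

At the indifference frequencies, `mixed_value` no longer depends on the frequency chosen by the other player, which is exactly the equilibrium condition used above.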
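\[ P = \{1, 2\}, \qquad S = \{0, 1\}^{D} \times \{0, 1\}^{\{D \to 0\}}, \qquad T = \{00, 01, 10, 11\}^{D} \]
\[ y(s, t) = \sum_{(h_1, h_2) \in D^2} \Big( \ldots\, \mathbb{1}_{\{\ldots,\ t \in \{11, 10\},\ h_1 > h_2\}}(h_1, h_2) \;\ldots\; \mathbb{1}_{\{\ldots,\ t \in \{11, 10\},\ h_1 < h_2\}}(h_1, h_2) \;\ldots\; \mathbb{1}_{\{\ldots,\ t = 00,\ h_1 > h_2\}}(h_1, h_2) \;\ldots\; \mathbb{1}_{\{\ldots,\ t = 00,\ h_1 < h_2\}}(h_1, h_2) \Big) \]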
$S$ is written in compact form, where $\{D \to 0\}$ is shorthand for the preimage of 0 under the anonymous function $f: D \to \{0, 1\}$ of the first part of the strategy space, that is $\{D \to 0\} = \{d \mid d \in D, f(d) = 0\}$. Only cards which elect not to bet initially face a second decision. Treating the game as $S = T$ and letting $(1, 0)$ and $(1, 1)$ denote the same strategy in $S$ is also possible. $Q \mapsto 1$ as part of $s \in S$ means the card Queen bets. $J \mapsto 00$ means the card Jack checks and then folds to a bet, if one is made. $K \mapsto 10$ as part of $t \in T$ means the card King would call a bet when bet into but not bet itself when checked to.
Fig. 7 The game tree for two decisions for player one. Under each possible deal of 0 this tree hangs with the payoffs depending on the hole cards (w expresses the two cases of who has the stronger hand). After the deal of 0, 1 decides to bet (b = 1) or check (b = 0). Against bets, 2 can fold or call; folding for instance loses the blind of 1. Against checks, 2 can bet or check herself; if she bets, 1 is put to the call or fold decision.
Figure 7 depicts the game tree. For instance, strategy 01 for a hand of 1 follows the branches marked b = 0 then b = 1 of the nodes marked with 1. 01 could also be called check-call, meaning checking with the intention of calling if a bet is made.

We can again make some observations on dominated strategies:
- No player folds an Ace when facing a bet.
- Both players fold the lowest card when facing a bet.
- 2 bets the Ace when checked to.

Some other observations use our understanding from the 3-card game:
- Both players will have strong hands value betting and weak hands bluffing when they make a bet. The number of value bets and the number of bluff bets will be related.
- When players face a bet with a medium-strength hand, they will sometimes call and sometimes fold.
- If 1 bets his strong hands up to L (e.g. up to a King), 2 bets her strong hands at least up to L - 1 (a Queen) when checked to. Thus, if 1 bets all strong hands it can't be in equilibrium, as checking becomes best against 2's best response. We therefore expect some mixing on 1's strong hands.

Consider the implications of the bet size of 1 unit into 2 units:
- A medium-strong hand, a bluffcatcher, needs to win 25% of the time when facing a bet in order to call, as it stands to win 3 if best and lose 1 if worst, compared to folding. As such, we expect the value bet frequency to be 75%.
- A weak hand, a bluff, needs to achieve at least 33% folds to make a bluff worthwhile, as it risks 1 if unsuccessful and wins 2 if successful. Therefore we expect the calling frequency of hands that can beat a bluff to be at most 67%.
- If at most 67% of hands are calling, a hand that has the option to check and see a showdown can only be a value bet if it is at least in the top 33% of hands of the opponent at that point.

If, e.g. in the 12-card case D = {A, ..., 3}, 1 were to check all hands, we would expect 2 to bet A-Q for value and 3 as a bluff, and 1 to call A-8, or something very similar.
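The frequencies quoted for a bet of 1 unit into 2 units follow from simple risk-to-reward arithmetic. A minimal sketch, with the function names `required_equity` and `required_fold_frequency` chosen here for illustration:

```python
from fractions import Fraction


def required_equity(pot, bet):
    """Minimum showdown win frequency for a bluffcatcher to call.

    Calling risks `bet` to win `pot + bet`, so the break-even
    frequency is bet / (pot + 2 * bet).
    """
    return Fraction(bet) / (pot + 2 * bet)


def required_fold_frequency(pot, bet):
    """Minimum fold frequency that makes a pure bluff break even.

    Bluffing risks `bet` to win `pot`, so the break-even
    frequency is bet / (pot + bet).
    """
    return Fraction(bet) / (pot + bet)


pot, bet = 2, 1
print(required_equity(pot, bet))              # bluffcatcher threshold
print(1 - required_equity(pot, bet))          # expected value bet share
print(required_fold_frequency(pot, bet))      # folds a bluff needs
print(1 - required_fold_frequency(pot, bet))  # maximal calling frequency
```

With `pot = 2` and `bet = 1` this reproduces the 25%, 75%, 33% and 67% figures used in the observations above.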
Solving this game by hand is possible, but at this point computer help is potentially faster. Geoffrey J. Gordon published a tool on his website^23 which finds approximate equilibria for exactly this game. He represents the game in sequence form, which makes the size very manageable, and then runs the corresponding LP through an interior point algorithm. In this representation the constraints ($Ax = b$) of the LP are that each participant chooses an admissible action with every card. 1 can choose between bet, check-fold and check-call; 2 between call and fold, or bet and check. Through clever modeling, only exactly one of the admissible choices for each card satisfies the constraints, and therefore only a linear combination of admissible choices is in the feasible set. Since there are only up to n cards and four different actions per player per card, the constraint matrix is rather small.^24 The objective function ($\max c^T x$), naturally, is the payoff of the actions represented by $x$ being played by 1 and 2. We present the (numerically approximate) solutions for the 3-card, 13-card and also the 52-card version. The solutions found are in equilibrium; they may however contain dominated strategies.

3-card

1 action %    bet   check   call^25   fold
A              48      52       100      0
K               0     100        50     50
Q              17      83         0    100

2 action %    call   fold   bet   check
A              100      0   100       0
K               33     67     0     100
Q                0    100    33      67
We see a few familiar patterns. When betting, 1 is bluffing 25% of the time. When checking, 1 is calling 67% of hands that beat a bluff from the Queen. For 2, the patterns are even clearer. Against a bet, 2 calls 67% of hands that beat a bluff. When betting, 2 is bluffing 25% of the time. The open question is: why does 1 split his Aces in the fashion he does? Interestingly enough, according to [1, p164] this 3-card game actually flips at blinds of exactly 1 unit. For blinds of less than 1 unit, while keeping the bet size at 1 unit, 1 checks all Aces. For blinds of more than 1 unit, 1 bets all Aces. In this game then, betting no Aces, some Aces, or all Aces should not make a difference. The game value for the above solution is $-\frac{1}{18}$, an advantage for 2 who acts last. This is exactly the game value for our initial 3-card game at $b = 1$ of $\frac{1}{18}$, here for the person acting last. The cited result above appears to be correct then: 1 gains nothing by being able to bet his Aces and some Queens himself at the given bet sizing of 1 into 2. The fact that the solver finds a bet for 48% of Aces is solely due to the deterministic implementation and starting point. The (exact) solution $(s^*, t^*)$ in our strategy notation is (no bets with the Ace for 1; any number thereof accompanied by a third as many Queens would be co-optimal):
\[ s_1(A, K, Q) = (0, 0, 0, 1, 1, 0) \]
\[ s_2(A, K, Q) = (0, 0, 0, 1, 0, 0) \]
\[ s^* = \frac{1}{3}\, s_1 + \frac{2}{3}\, s_2 \]
\[ t_1(A, K, Q) = (11, 10, 00) \]
\[ t_2(A, K, Q) = (11, 00, 00) \]
\[ t^* = \frac{2}{3}\, t_1 + \frac{1}{3}\, t_2 \]

23 http://www.cs.cmu.edu/~ggordon/poker For Matlab source files append /source.
24 In the given implementation Gordon uses four entries per card in $x$, which leads to a sparse $A$ of size $(8n \times 6n)$. Comparing this to the strategy space, which grows exponentially in $n$ ($O(4^n)$ as given above), shows how useful this modeling method is.
25 Call and fold are the actions check-call and check-fold given that a check was the first action. Not normalizing by the checking frequency makes the results more difficult to interpret. The frequency of the check-call with an Ace is the product of check and call in this notation.
13-card

1 action %    bet   check   call   fold
A              64      36    100      0
K              63      37    100      0
Q              62      38    100      0
J              60      40    100      0
T              55      45    100      0
9              42      58    100      0
8               0     100     76     24
7               0     100     61     39
6               0     100     43     57
5               0     100     27     73
4              25      75     17     83
3              44      56      0    100
2              45      55      0    100

2 action %    call   fold   bet   check
A              100      0   100       0
K              100      0   100       0
Q              100      0   100       0
J              100      0   100       0
T              100      0   100       0
9              100      0   100       0
8               76     24     0     100
7               58     42     0     100
6               40     60     0     100
5               25     75     0     100
4                0    100     0     100
3                0    100   100       0
2                0    100   100       0
When betting, 1 is bluffing 25% of the time again. The value bets are a mix of Ace-Nine with weight gradually decreasing with hand strength. The bluffs are a mix of Four-Two. If any of the bluffs check, they always lose, as 2 bluffs Three-Two against a check. As such, the weight on the bluffs of 1 can be distributed arbitrarily; however bluffing weaker hands dominates bluffing stronger hands. Against a weak opponent that sometimes misses a bluff with a Three, checking a Four and bluffing a Two performs relatively better than checking a Two and bluffing a Four. 1 can improve his bluffing game against non-optimal opponents:

1 action %    bet   check
4               0     100
3              14      86
2             100       0
When checking, 1 is calling 67% of hands that beat a bluff from the Three or Two, that is 67% of hands Four+ that have checked. These calls consist of all hands that beat some value bets of 2, starting at the Ten. They also include all Nines, as those block the value betting Nines of 2. This means that if 1 checks the Nine or Eight-Four and faces a bet, the Nine is losing against five cards (Ten to Ace), but the Eight-Four is losing to six cards (Nine to Ace), all while the bluffs they face are equal. For the calls from Eight to Four there is no difference; these are non-blocking bluffcatchers. Since calling the Eight and folding the Seven dominates playing the other way around, 1 can improve his calling game against non-optimal opponents by shifting calling weight in this fashion:

1 action %    call   fold
8              100      0
7              100      0
6               24     76
5                0    100
4                0    100
When facing a bet, 2 is calling 67% of the time from the perspective of 1's bluff hands Four-Two, and 61% on average. Only at this frequency do the bluffs of 1 have the same payoff for bluffing as for checking, a net gain of 0. The selection of hands which call is similar to 1's hands that call after checking: all strong hands and a random mix of bluffcatcher hands. This again can be improved co-optimally:

2 action %    call   fold
8              100      0
7              100      0
6                0    100
5                0    100
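The quoted frequencies can be recomputed from the solver output. The sketch below copies the bet column of 1 and the call and bet columns of 2 from the 13-card tables; the helper names `bluff_share` and `call_frequency` are illustrative assumptions.

```python
CARDS = ["A", "K", "Q", "J", "T", "9", "8", "7", "6", "5", "4", "3", "2"]

# Percentages copied from the 13-card tables.
P1_BET = dict(zip(CARDS, [64, 63, 62, 60, 55, 42, 0, 0, 0, 0, 25, 44, 45]))
P2_CALL = dict(zip(CARDS, [100] * 6 + [76, 58, 40, 25, 0, 0, 0]))
P2_BET = dict(zip(CARDS, [100] * 6 + [0] * 5 + [100, 100]))


def bluff_share(bet, bluff_cards):
    """Share of all bets that are made with the given bluff cards."""
    return sum(bet[c] for c in bluff_cards) / sum(bet.values())


def call_frequency(call, holding=None):
    """Average calling frequency of 2, optionally excluding the
    card that the bettor holds herself (a blocker)."""
    cards = [c for c in call if c != holding]
    return sum(call[c] for c in cards) / (100 * len(cards))


print(round(bluff_share(P1_BET, ["4", "3", "2"]), 3))  # bluffs of 1
print(round(bluff_share(P2_BET, ["3", "2"]), 3))       # bluffs of 2
print(round(call_frequency(P2_CALL), 3))               # calls on average
print(round(call_frequency(P2_CALL, holding="2"), 3))  # seen from a bluff
```

Excluding the bluffer's own card is what lifts the average calling frequency of about 61% to the roughly 67% that a bluffing hand actually faces.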
When 2 is betting, it is 25% bluffs again. The value hands are Ace-Nine, the bluffs Three-Two. Against 1's checking range, 2's Nine is the best hand over 77% of the time; against 1's checking range of bluffcatchers and better, the Nine is the best hand 71% of the time, just good enough to be a value bet. Since 67% of 1's bluffcatch and better hands are calling, the Nine is ahead, and the Eight is not. With eight mixed strategic options being played, translating the solution into our strategy notation is omitted here.

52-card

For a larger deck such as 52 cards we plot the strategies. In figure 8 we can see how 1 plays. The structure of the solution is the same as for the 13-card game. We also observe the same dominated strategies for calling and bluffing as discussed in the 13-card game, which can be easily fixed. Figure 9 depicts how 2 plays. The structure of the solution is also equivalent to that of the 13-card game, and the same dominated strategy for calling as before pops up, which we can easily swap for a more robust equilibrium solution.

Expanding from 3-card to 13-card to 52-card lets us make the following observations.
- When a bet is made, value and bluff frequency are correlated. The blinds and the bet size determine the risk to reward ratio for the calling player and thereby the frequencies for the person betting.
- Value frequency depends on the number of value hands, which depends on the calling hands of the opponent.
- When there is more than one decision point, strong hands are potentially split before the last one. This is called slow-playing, acting weak with a strong hand. This can be interpreted as protecting other hands that act weak (check as 1), or as denying the other player too many value bets.

Fig. 8 Player one bets around 70% of the time with hands 1-25 and 44-52, for a bluff frequency of 25%. After checking, she always calls with all strong hands 1-26, never with the weak hands 44-52, and sometimes with the hands in between.
Fig. 9 Player two, when facing a bet, always calls with all strong hands 1-25 and never with the weak hands 44-52; this works in tandem with Player one's bets. With the bluffcatcher hands in the middle she calls sometimes. When facing a check, the strong hands 1-25 and half of 26 value bet and the weak hands 45-52 and half of 44 bluff bet; the rest checks for showdown.
\[ \mathrm{EV}(\text{Hero}) = \sum_{\text{card } C} (\text{frequency of } C)\,(\text{EV on } C) \]
Hero's EV if the turn is checked through is about $44.81. If Villain bets the turn for an infinitesimal amount, Hero would raise to a very large amount with AhKh and just shy of two bluff combos (11/6), say AsKs and 83.3% of AcKc. When Hero raises, his EV is $100. When Hero just calls there is no betting on the river and Hero's EV is just $22.73 ($250/11). His total EV against an infinitesimal bet is then (17/6 · $100 + 25/6 · $250/11)/7 = $12475/231 or about $54.00. (If Hero only called the infinitesimal bet, he would achieve the EV of a checked turn, which is lower.)

If Villain bets the turn for an amount b, Hero would raise to a very large amount with the same range as above (flush and 11/6 bluff combos), again making QJ an indifferent call. If b < $125/3 then the remaining AK will call, else they will fold. If they can call, their EV is ($250 - 6b)/11. Hero's total EV is then (17/6 · ($100 + b) + 1{b < $125/3} · 25/6 · ($250 - 6b)/11)/7. At b = $0 we get the same value as above of about $54.00. At b = $125/3 this is about $57.34.

To summarize, Hero achieves the following EVs:

$33.77   Checked down
$44.81   Checked turn
$54.00   Check-raised turn vs infinitesimal bet (= Hero bets the turn himself)
$57.34   Check-raised turn vs $125/3 (AKo indifferent to calling + checking river)
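The check-raise EVs can be reproduced with exact arithmetic. This is a minimal sketch of the formula stated above, assuming 7 combos in Hero's range of which 17/6 raise and 25/6 remain; the constant and function names are illustrative.

```python
from fractions import Fraction

RAISE_COMBOS = Fraction(17, 6)     # flush plus 11/6 bluff combos
CALL_COMBOS = Fraction(25, 6)      # remaining AK combos
TOTAL_COMBOS = 7
CALL_THRESHOLD = Fraction(125, 3)  # above this bet size the remaining AK folds


def hero_ev(b):
    """Hero's EV in dollars when Villain bets b on the turn."""
    b = Fraction(b)
    ev = RAISE_COMBOS * (100 + b)
    if b < CALL_THRESHOLD:
        ev += CALL_COMBOS * (250 - 6 * b) / 11
    return ev / TOTAL_COMBOS


print(hero_ev(0), float(hero_ev(0)))
print(float(hero_ev(Fraction(125, 3))))
```

Because the calling EV ($250 - 6b)/11 reaches zero exactly at b = $125/3, the total EV is continuous at the threshold.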
EV(Hero) =
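\[ y(s, t) = \int_0^1 \int_0^1 \Big( (b + 1)\,\mathrm{sgn}(h_2 - h_1)\,\mathbb{1}_{\{(s(h_1), t(h_2)) = (\text{bet } b,\ \text{call})\}}(h_1, h_2) + (+1)\,\mathrm{sgn}(h_2 - h_1)\,\mathbb{1}_{\{s(h_1) = \text{check}\}}(h_1, h_2) + (+1)\,\mathbb{1}_{\{(s(h_1), t(h_2)) = (\text{bet } b,\ \text{fold})\}}(h_1, h_2) \Big)\, dh_1\, dh_2 \]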
The payoff for the leaves of the game tree is given by $z$, $Z = (z, -z)$:
\[ z((h_1, b), (h_2, a)) = \begin{cases} +1 & b = \text{bet } b,\ a = \text{fold} \quad \text{(Bet - Fold)} \\ +1 & b = \text{check},\ h_1 < h_2 \quad \text{(Check, 1 wins)} \\ -1 & b = \text{check},\ h_1 > h_2 \quad \text{(Check, 2 wins)} \\ b + 1 & b = \text{bet } b,\ a = \text{call},\ h_1 < h_2 \quad \text{(Bet - Call, 1 wins)} \\ -b - 1 & b = \text{bet } b,\ a = \text{call},\ h_1 > h_2 \quad \text{(Bet - Call, 2 wins)} \end{cases} \]
where leaf $((h_1, b), (h_2, a))$ is reached when 1 chose action $b$ and 2 chose action $a$ with hands $h_1, h_2$ dealt by 0, respectively.

We can make the following observations:
- Both player one and player two have only a single decision point: 1 when she has the option to bet, 2 when she faces a bet. At both those points, both have their full distribution $[0, 1]$ of hands available, and the other player can have no information about where one is in that set.
- If 2 folds, the hand she was dealt makes no difference; the payoff is $-1$ to her ($z((h_1, \text{bet } b), (h_2, \text{fold})) = 1\ \forall h_2$).
- If 2 calls, a stronger hand cannot perform worse than a weaker hand. For a given play of 1, $z(\cdot, (j, \text{call})) \leq z(\cdot, (k, \text{call}))$ whenever $j < k$. Thus, if 2 correctly calls some hand $k$, she also calls all $j < k$. As such, there is a unique value $k^*$ for each $b$ splitting 2's distribution into a calling ($< k^*$) and a folding ($> k^*$) region.
- If 1 bets, she will encounter a response of the type characterized by $k^*(b)$, meaning calls from $[0, k^*)$ and folds from $[k^*, 1]$. The payoff for $h_1$ betting can be better than checking if $h_1$ profits $b$ frequently when called, that is $h_1 \ll k^*$, or if $h_1$ profits from folds often enough to make betting better than checking, that is $h_1 \gg k^*$. An $h_1$ close to $k^*$ will perform worse by betting than by checking, since checking and betting have the same payoff against hands worse than $k^*$, but betting has a worse payoff than checking against hands better than $k^*$.
Fig. 10 An example path in the game tree of section 6.1. 0 deals the starting hands. Player one receives a strong hand, 0.2, and decides to bet $b = 2$. Player two receives a medium hand, 0.4, and decides to call against $b = 2$. They reach the payoff $z((0.2, 2), (0.4, \text{call})) = 3$. 1 made a successful value bet, 2 made an unsuccessful bluffcatch call.

With this reasoning we propose three values, $k^*$, $v^*$ and $u^*$, which mark the thresholds at which calling performs better for 2 than folding and where betting performs better for 1 than checking, with $v^* < k^* < u^*$. The corresponding strategies are
\[ s^*(h_1) = \begin{cases} \text{bet } b & \text{if } h_1 \in [0, v^*) \\ \text{check} & \text{if } h_1 \in [v^*, u^*) \\ \text{bet } b & \text{if } h_1 \in [u^*, 1] \end{cases} \tag{1} \]
\[ t^*(h_2) = \begin{cases} \text{call} & \text{if } h_2 \in [0, k^*) \\ \text{fold} & \text{if } h_2 \in [k^*, 1] \end{cases} \tag{2} \]
The payoff of the proposed $v < k < u$ structure for strategies $s$ and $t$ is
\[ y(s, t) = ((0)v + (b + 1)(k - v) + (+1)(1 - k))\,v + ((-1)(v) + (0)(u - v) + (+1)(1 - u))\,(u - v) + ((-b - 1)(k) + (+1)(1 - k))\,(1 - u) \]
\[ = bkv + bku - bv^2 + 2ku - bk - 2k - u^2 + 1. \]
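The expansion of the region-wise payoff into the polynomial can be checked with exact arithmetic. The following sketch evaluates both forms for a few parameter choices; the function names are illustrative assumptions.

```python
from fractions import Fraction as F


def payoff_by_regions(b, v, k, u):
    """Payoff as the sum over 1's value, checking and bluffing regions."""
    value = ((b + 1) * (k - v) + (1 - k)) * v   # 1 holds [0, v)
    check = (-v + (1 - u)) * (u - v)            # 1 holds [v, u)
    bluff = (-(b + 1) * k + (1 - k)) * (1 - u)  # 1 holds [u, 1]
    return value + check + bluff


def payoff_polynomial(b, v, k, u):
    """Expanded form of the same payoff."""
    return b*k*v + b*k*u - b*v**2 + 2*k*u - b*k - 2*k - u**2 + 1


samples = [
    (F(2), F(2, 9), F(4, 9), F(8, 9)),
    (F(1), F(3, 10), F(6, 10), F(9, 10)),
    (F(1, 2), F(1, 5), F(1, 2), F(3, 4)),
]
for b, v, k, u in samples:
    print(payoff_by_regions(b, v, k, u), payoff_polynomial(b, v, k, u))
```

The third sample is not an equilibrium point; it is included only to show that the two expressions agree as polynomials and not merely at the solution.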
Fig. 11 Visualization of the payoff for a strategy match-up characterized by $v < k < u$ for a fixed $b$. The horizontal axis represents hands $[0, 1]$ for 1, the vertical for 2. The areas of intersection are marked as areas of uniform payoffs for their leaves $Z$. E.g. the area marked $y = b + 1$ is the intersection of strong hands $[0, v]$ of 1 with weaker calling hands $[v, k]$ of 2.

In the first equation the first summand is the payoff when 1 holds $v$ and better: on average 0 against hands of 2 of $v$ and better, $(b + 1)$ against hands of 2 between $v$ and $k$, and $+1$ against hands of 2 of $k$ and worse. The second and third summands are the payoffs of 1's hands between $v$ and $u$ and worse than $u$, respectively. See Fig. 11 for a visualization.

We find 1's problem of finding $v$ for given $k^*$ as
\[ v^* = \arg\max_{v \in [0, k^*]} y(k^*, v, u) = \arg\max_{v \in [0, k^*]}\ bk^*v + bk^*u - bv^2 + 2k^*u - bk^* - 2k^* - u^2 + 1. \]
With the second order coefficient being negative there exists exactly one maximum, where
\[ \frac{\partial}{\partial v}\left(v^2(-b) + v(bk^*)\right) = 0, \qquad v^* = \frac{k^*}{2} \in [0, k^*]. \]
The same problem of finding $u$ for a given $k^*$ is
\[ u^* = \arg\max_{u \in [k^*, 1]} y(k^*, v, u) = \arg\max_{u \in [k^*, 1]}\ bk^*v + bk^*u - bv^2 + 2k^*u - bk^* - 2k^* - u^2 + 1. \]
Again, with the second order coefficient being negative there exists exactly one maximum, where
\[ \frac{\partial}{\partial u}\left(u^2(-1) + u(bk^* + 2k^*)\right) = 0, \qquad u^* = \frac{b + 2}{2}\, k^* \in [k^*, 1]. \]
We find 2's problem of finding $k$ for given $v^*$, $u^*$, $b$ as
\[ k^* = \arg\min_{k \in [v^*, u^*]} y(k, v^*, u^*) = v^*\, \mathbb{1}_{\{bv^* + bu^* + 2u^* - b - 2 > 0\}}(v^*, u^*) + u^*\, \mathbb{1}_{\{bv^* + bu^* + 2u^* - b - 2 < 0\}}(v^*, u^*) + \tilde{k}\, \mathbb{1}_{\{bv^* + bu^* + 2u^* - b - 2 = 0\}}(v^*, u^*) \]
where $\tilde{k}$ can be any arbitrary value in the interval. We see that the interesting case is $bv^* + bu^* + 2u^* - b - 2 = 0$. If $b$, $v^*$ and $u^*$ do not satisfy this condition, 2 will either call as many hands as possible (1 has too many bluffs) or as few hands as possible (1 has too few bluffs). If the condition is violated, $k^*$ will actually fall outside the $[v^*, u^*]$ range; for instance for $v^* > 0$, $u^* = 1$ (1 has no bluffs) we find $k^* = \left(1 - \frac{b}{2b + 2}\right) v^* < v^*$,^28 and for $v^* = 0$, $u^* < 1$ (1 has only bluffs) we get $k^* = \frac{b + 2}{2b + 2} + \frac{b}{2b + 2}\, u^* > u^*$.
28 Poker players call a term of the $\frac{b}{2b + 2}$ kind pot odds. If a hand matches a bet $b$ for a total win of $2b + \text{pot size}$, the hand must be winning at showdown at least at the frequency of the fraction $\frac{b}{2b + \text{pot size}}$.
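We conclude that if our $v^* < k^* < u^*$ structure does indeed characterize a solution, $bv^* + bu^* + 2u^* - b - 2 = 0$ must be satisfied. In other terms $v^* = (1 - u^*)\left(\frac{b + 2}{b}\right)$. The size of the bluffing region $(1 - u^*)$ is linearly related in terms of $b$ to the size of the value region $v^*$. This multiplier effect has first been coined with its own character in [1, p 113]. The system
\[ v^* = \frac{k^*}{2}, \qquad u^* = \frac{b + 2}{2}\, k^*, \qquad bv^* + bu^* + 2u^* - b - 2 = 0 \]
solves to
\[ v^* = \frac{b + 2}{b^2 + 5b + 4}, \qquad u^* = \frac{b^2 + 4b + 4}{b^2 + 5b + 4} \in \left[\tfrac{8}{9}, 1\right], \qquad k^* = \frac{2b + 4}{b^2 + 5b + 4} \in [v^*, u^*] \subset [0, 1] \]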
for $b \geq 0$. The last step is to show that $s^*, t^*$ described by $v^*, u^*, k^*$ as in equation 1 are in fact a solution. We have to show that neither 1 nor 2 can unilaterally improve their strategy to achieve a higher payoff. If that is true, the strategies are in equilibrium, and are therefore a solution.

2 tries to improve over $t^*$ when facing $s^*$: 2 should maximize her payoff over $a \in \{\text{call}, \text{fold}\}$ for all $h_2 \in [0, 1]$. As can be readily verified, calling performs best for $h_2 \in [0, v^*]$, folding performs best for $h_2 \in [u^*, 1]$, and both options perform exactly the same for $h_2 \in [v^*, u^*]$. Therefore any $k \in [v^*, u^*]$ performs best possible, and $k^*$ is one such $k$. 2 cannot unilaterally improve $t^*$, although all strategies with arbitrary play on the middle interval offer the same payoff against $s^*$.

1 tries to improve over $s^*$ when facing $t^*$: 1 should maximize her payoff over $b \in \{\text{bet } b, \text{check}\}$ for all $h_1 \in [0, 1]$. As shown in the derivation, betting performs best for $h_1 \in [0, v^*] \cup [u^*, 1]$. For $h_1 \in [v^*, u^*]$ it can be easily verified that checking is indeed best. $s^*$ is best possible against $t^*$ and all deviations (with weight $> 0$ of hands) perform worse. Thus 1 cannot unilaterally improve $s^*$.
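Fig. 12 Optimal strategy bounds $v^*, k^*, u^*$ plotted against $b$. $u^*$ reaches its minimum of $\frac{8}{9}$ at $b = 2$. $v^*, k^* \to 0$ and $u^* \to 1$ for $b \to \infty$.

We already know what the solution to the game looks like for any fixed $b$; as such we must only find the $b \in [0, \infty)$ with maximal payoff for 1 and for 2, respectively. The payoff for $s^*, t^*$ is maximized by 1 and minimized by 2 over $b$. We recall the payoff for strategies $s, t$ of the $v, k, u$ structure as
\[ y(s, t) = bkv + bku - bv^2 + 2ku - bk - 2k - u^2 + 1. \]
\[ b_1^* = \arg\max_{b_1 \in [0, \infty)} y(s^*(b_1), t^*(b_1)) \]
\[ = \arg\max_{b_1 \in [0, \infty)} \Big( b_1(2b_1 + 4)(b_1 + 2) + b_1(2b_1 + 4)(b_1^2 + 4b_1 + 4) - b_1(b_1 + 2)^2 + 2(2b_1 + 4)(b_1^2 + 4b_1 + 4) - (b_1 + 2)(2b_1 + 4)(b_1^2 + 5b_1 + 4) - (b_1^2 + 4b_1 + 4)^2 + (b_1^2 + 5b_1 + 4)^2 \Big) / c(b_1) \]
with the common denominator $c(b_1) = (b_1^2 + 5b_1 + 4)^2$.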
This fourth order fraction has exactly one maximum for $b > 0$, at exactly
\[ b_1^* = 2. \]

Fig. 13 Payoff $y$ for $s^*$ against $t^*$ for bet sizes $b$. $y$ is maximal at $b = 2$ and minimal at $b = 0$ and when $b \to \infty$.
Incidentally $b_1^* = 2$ is the point at which $u^*$ is minimal, that is where the bluffing range $[u^*, 1]$ is largest. This is not a coincidence; we can solve many toy poker games by using the shorthand of finding maximal bluffing ranges. The same problem of finding $b_2^*$ for 2 is
\[ b_2^* = \arg\max_{b_2 \in [0, \infty)} -y(s^*(b_2), t^*(b_2)) = 0. \]
2 prefers that there be no betting, which is the case at $b = 0$ or as $b \to \infty$. This seems intuitively logical, as only 1 gains by putting in extra wagers with strong hands.
If Player one can choose a single possible bet size, the game plays as
\[ b_1^* = 2, \qquad v^* = \frac{2}{9}, \qquad k^* = \frac{4}{9}, \qquad u^* = \frac{8}{9}, \qquad y = \frac{1}{9}. \]
If Player two can choose a single possible bet size, it is $b_2^* = 0$ and the game plays arbitrarily; the players effectively have no options, with $y = 0$.
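The thresholds and the game value for a fixed bet size can be evaluated exactly, and the optimal bet size located by a simple grid search. The sketch below assumes the threshold formulas $v^* = (b+2)/(b^2+5b+4)$, $k^* = 2v^*$ and $u^* = (b+2)v^*$ that follow from the three conditions of the text; the function names and the grid are illustrative.

```python
from fractions import Fraction


def thresholds(b):
    """Return (v*, k*, u*) for bet size b."""
    b = Fraction(b)
    d = b * b + 5 * b + 4
    return (b + 2) / d, (2 * b + 4) / d, (b + 2) ** 2 / d


def game_value(b):
    """Payoff y(s*, t*) to player 1 for bet size b."""
    b = Fraction(b)
    v, k, u = thresholds(b)
    return b*k*v + b*k*u - b*v**2 + 2*k*u - b*k - 2*k - u**2 + 1


# Grid search over b in [0, 10] with step 1/100.
grid = [Fraction(i, 100) for i in range(0, 1001)]
best_b = max(grid, key=game_value)
print(best_b, thresholds(best_b), game_value(best_b))
```

On this grid the value coincides with $b / (b^2 + 5b + 4)$, a compact form that makes the maximum at $b = 2$ easy to see, since its derivative has the numerator $4 - b^2$.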
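\[ y(s, t) = \int_0^1 \int_0^1 \Big( (s(h_1) + 1)\,\mathrm{sgn}(h_2 - h_1)\,\mathbb{1}_{\{t(h_2, s(h_1)) = \text{call}\}}(h_1, h_2) + (+1)\,\mathbb{1}_{\{t(h_2, s(h_1)) = \text{fold}\}}(h_1, h_2) \Big)\, dh_1\, dh_2 \]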
The payoff for the leaves of the game tree is essentially unchanged from section 6.1. Checks are now treated as bets of $b = 0$. Action $a$ is based on the bet size $b$ it faces, but as the arguments of $z$ simply have to mark a path through the game tree, this need not be represented.
\[ z((h_1, b), (h_2, a)) = \begin{cases} 1 & a = \text{fold} \\ b + 1 & a = \text{call},\ h_1 < h_2 \\ -b - 1 & a = \text{call},\ h_1 > h_2 \end{cases} \]
We again start with some observations:
- If 2 calls against $b$, a stronger hand cannot perform worse than a weaker hand. Thus for 2 a strategy of calling $[0, k_b^*)$ and folding $[k_b^*, 1]$ against a size $b$ can weakly dominate all other strategies, and therefore there exists an optimal strategy that is part of the set of strategies of this type.
- If 1 bets $b$, she will encounter a response of the type characterized by $k_b^*$. For each hand $h_1$ the average payoff of $z((h_1, b), \cdot)$ will be maximal over all $b$.
We express the value that $h$ has in betting $b$ as follows:
\[ EV(h, b) = +b \cdot \max(0,\ 1/(b + 1) - h) - b \cdot \min(1/(b + 1),\ h) \]
Now it would not make sense to value bet a hand worse than $c(b) = 1/(b + 1)$, so using $h < 1/(b + 1)$ we get
\[ EV(h, b) = +b \cdot (1/(b + 1) - h) - b \cdot h \]
\[ EV(h, b) = b \cdot (1/(b + 1) - 2h) \]
Let us take a quick look at what this function looks like, for example for $h = 0.2$, so a top 20% hand. Apparently the best bet size is somewhere around 0.5, a half pot bet. Let's solve for it. To find the maximum of the function $EV$ we take the derivative with respect to $b$.
\[ \frac{d}{db} EV(h, b) =: EV'(h, b) = 1/(b + 1)^2 - 2h \]
Then we find its root, that is the value of $b$ where $EV'(h, b) = 0$:
\[ 0 = 1/(b + 1)^2 - 2h \]
\[ \sqrt{2h} = 1/(b + 1) \]
\[ b = 1/\sqrt{2h} - 1 \]

c) Result, Application, Chart

So our optimal bet size with respect to our hand $h$ looks like this. A few key points to notice:
- At h = 0.5 we should bet zero, that means checking behind.
- For hands worse than 0.5 we should not bet (you can try to interpret negative bet sizes).
- At h close to zero we should bet extremely large.

To get a connection to a real poker situation: h = 0.5 means we have the best hand half the time versus bluffcatchers and better. h = 0.1 means we have the best hand 90% of the time. One way to apply this bet sizing theory is to measure how often our hand is good, and translate that into h. A simple way to do it: if our hand has equity x versus a range of bluffcatchers and better, we have a (1 - x) hand h in [0, 1] notation. We call equity vs bluffcatchers and better "lead". A chart then maps lead to the best bet size:
\[ 1/\sqrt{2(1 - x)} - 1 \]
Also useful is the other way around, mapping a bet size to the required lead.

Example: http://weaktight.com/4716760
vs his bluffcatch+ range:
86.5% JsTs
13.5% KK+, 55, AQs, Ac8c, KQs, Kc8c, QJs, Tc8c, 98s, AQo, KQo, QJo
opt size(86.5) = sqrt(1/(2(1 - 0.865))) - 1 = 1/sqrt(2 * 13.5%) - 1 = 1.924 - 1 = 0.92 = 92%
Almost a pot sized bet! Lower it a little bit, since he can also raise his nuts and a few bluffs.
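The bet sizing rule and its inverse are easy to tabulate. A minimal sketch, assuming bet sizes are measured in pots as in the text; the function names `optimal_bet_size`, `bet_size_from_lead` and `required_lead` are illustrative.

```python
import math


def optimal_bet_size(h):
    """Bet size (in pots) maximizing b * (1/(b+1) - 2h) for a hand h in (0, 1]."""
    if h <= 0:
        raise ValueError("h must be positive; the formula diverges at h = 0")
    if h >= 0.5:
        return 0.0  # check behind
    return 1 / math.sqrt(2 * h) - 1


def bet_size_from_lead(x):
    """Map lead x (equity vs bluffcatchers and better) to a bet size."""
    return optimal_bet_size(1 - x)


def required_lead(b):
    """Inverse map: the lead for which b is the optimal bet size."""
    return 1 - 1 / (2 * (b + 1) ** 2)


for lead in (0.5, 0.6, 0.7, 0.8, 0.865, 0.9, 0.95):
    print(lead, round(bet_size_from_lead(lead), 2))
```

The loop prints a small version of the chart described above; the row for a lead of 0.865 corresponds to the worked example.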
7 Botting
When one has the option to check, checking dominates folding. In this text the option of open folding or folding behind will therefore never be considered, even though they are usually part of the strategy space.

Game 1:
Description:
board: KQJT9
stacks: infinite
betting: one bet
no blocking
OOP: {A(a), split(b)}
IP: {A(c), split(d)}, a, b, c, d ∈ N
Solution space (OOP, IP, OOP):
If either player is faced with a bet at any point while holding an Ace, folding is dominated. If IP is faced with a check by OOP, betting an A dominates checking it behind. All other decision points need to be solved for.

OOP can bet himself or check. If he is betting he may select a number of different sizes. If two different sizes s1 and s2 are played, they must yield the same EV, making the use of a single size, say s1, co-optimal. As such, considering only a single bet size s1 is sufficient. All hands that don't bet s1 will check and may then face a bet by IP, which they can call or fold to.

If IP is checked to he may also select a number of different bet sizes for his bets. However, by the same argument, IP can play optimally with only one bet size when checked to: s2.

This yields the following parametrization for the game:
- OOP bets s1 with {A(a1), split(b1)} where a1 ∈ [0, a], b1 ∈ [0, b], s1 ∈ ]0, ∞]
- OOP checks with {A(a - a1), split(b - b1)}
- IP calls s1 with {A(c), split(d1)} where d1 ∈ [0, d]
- IP folds to s1 with {A(0), split(d - d1)}
- IP bets s2 with {A(c), split(d2)} where d2 ∈ [0, d]
- IP checks with {A(0), split(d - d2)}
- OOP calls s2 with {A(a - a1), split(b2)} where b2 ∈ [0, b - b1]
- OOP folds to s2 with {A(0), split(b - b1 - b2)}

Bet, Call           ({A(a1), split(b1)}, {A(c), split(d1)})
Bet, Fold           ({A(a1), split(b1)}, {A(0), split(d - d1)})
Check, Check        ({A(a - a1), split(b - b1)}, {A(0), split(d - d2)})
Check, Bet, Fold    ({A(a - a1), split(b - b1)}, {A(c), split(d2)})
Check, Bet, Call    ({A(a - a1), split(b - b1)}, {A(c), split(d2)})

a1 a a2
8 Summary
8.1 Further development
9 Appendix
Theorem (Minimax). In every finite two participant zero sum game there exists a value $v \in \mathbb{R}$ and a strategy pair $(s, t)$ such that $(s, t)$ has payoffs $v$ and $-v$ respectively and no unilateral improvement in payoff is possible for either player. We call $v$ the value of the game. The strategies $(s, t)$ are said to form an equilibrium pair.

The proof is based on [4, p 408-419].

Proof. Two players, $P = \{1, 2\}$, each having a finite number of pure strategies available, $S = \{s, t\} = \{\{s_1, ..., s_n\}, \{t_1, ..., t_m\}\}$, lead to a finite number of possible payouts $Y_1 = \{y_{ij} \mid i \in \{1, ..., n\}, j \in \{1, ..., m\}\} \in \mathbb{R}^{n \times m}$ to player one. Player two's payoffs are the negative payoffs of player one due to the zero sum nature of the game.
Since the game is symmetric in description, this discussion simultaneously applies to 1 and 2 with perspectives interchanged. Let p1 be a probability distribution over 1s pure strategies.
n
p1 (si ) = 1
$p_1$ fully describes a (mixed) strategy, and we shall refer to $p_1$ as a strategy. Let $w_1$ be the lowest possible payoff for strategy $p_1$, that is, the payoff against a perfect response by 2:

$$w_1(p_1) = \min_{j \in \{1, \dots, m\}} \sum_{i=1}^{n} p_1(s_i)\, y_{ij}$$
The payoff $w_1$ is always achievable for 2 by playing a best pure strategy $t_{j^*}$ with

$$j^* = \arg\min_{j \in \{1, \dots, m\}} \sum_{i=1}^{n} p_1(s_i)\, y_{ij}$$
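The two definitions above can be computed directly. The following is a minimal sketch, assuming the payoff matrix is given as a list of rows `Y[i][j]` and the strategy as a list of probabilities. The function name `worst_case_payoff` and the matching-pennies matrix are illustrative choices, not from the text.

```python
def worst_case_payoff(p1, Y):
    """Return (w1, j_star) for the mixed strategy p1 of player one.

    w1 is the expected payoff against the best pure response of player two,
    and j_star is the index of that response (the first one if several tie).
    """
    n, m = len(Y), len(Y[0])
    # Expected payoff to player one against each pure strategy t_j.
    column_payoffs = [sum(p1[i] * Y[i][j] for i in range(n)) for j in range(m)]
    j_star = min(range(m), key=column_payoffs.__getitem__)
    return column_payoffs[j_star], j_star


# Matching pennies as illustrative data: player one wins 1 on a match.
Y = [[1, -1],
     [-1, 1]]
print(worst_case_payoff([1.0, 0.0], Y))  # a pure strategy is fully exploited
print(worst_case_payoff([0.5, 0.5], Y))  # the uniform mix cannot be exploited
```

Only pure responses are scanned, since a mixed response of player two cannot do better than its best pure component.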
If $w_1$ were achievable only by mixing two or more pure strategies $t_k$, $k \in K$, $K \subseteq \{1, \dots, m\}$, then 2 could improve by increasing the weight on the best option, $j^* = \arg\min_{k \in K} \sum_{i=1}^{n} p_1(s_i)\, y_{ik}$. If all options are equal, any $j \in K$ is a best pure strategy response to $p_1$.

Player one is looking to find a strategy $p_1^*$ such that $w_1(p_1^*)$ is maximal. No matter how well 2 plays against $p_1^*$, 1 gains $w_1(p_1^*)$ or more.

$$p_1^* = \arg\max_{p_1 :\ p_1(s_i) \in [0,1],\ \sum_{i=1}^{n} p_1(s_i) = 1} w_1(p_1)$$
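For intuition, the maximization can be carried out exactly in the special case where player one has only two pure strategies. The sketch below is not the linear-programming route taken in this proof. It uses the fact that $w_1$ is then a concave, piecewise linear function of the single probability $p = p_1(s_1)$, so its maximum lies at $p = 0$, at $p = 1$, or where two of the lines cross. The function name `maximin_two_rows` is hypothetical, and integer (or `Fraction`) payoffs are assumed.

```python
from fractions import Fraction
from itertools import combinations


def maximin_two_rows(Y):
    """Exact maximin strategy for player one with exactly two pure strategies.

    Y has two rows; returns (p, w1) where p is the probability of the first
    row and w1 the payoff guaranteed by mixing (p, 1 - p).
    """
    top, bottom = Y
    m = len(top)

    def w1(p):
        # Worst case over player two's pure responses.
        return min(p * top[j] + (1 - p) * bottom[j] for j in range(m))

    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(m), 2):
        # Line j is bottom[j] + p * (top[j] - bottom[j]); intersect it with line k.
        slope_diff = (top[j] - bottom[j]) - (top[k] - bottom[k])
        if slope_diff != 0:
            p = Fraction(bottom[k] - bottom[j], slope_diff)
            if 0 <= p <= 1:
                candidates.add(p)
    best = max(candidates, key=w1)
    return best, w1(best)


print(maximin_two_rows([[1, -1], [-1, 1]]))  # matching pennies (illustrative)
```

For larger games this enumeration does not scale, which is one reason to reformulate the problem as a linear program.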
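If 1 can achieve at least some $w_1$, then there is a $p_1$ with

$$\sum_{i=1}^{n} p_1(s_i)\, y_{ij} \ge w_1 \quad \forall j \in \{1, \dots, m\}$$

Let the payoffs be restricted to positive values, $y_{ij} > 0\ \forall i, j$. If they are not, they can be shifted by a constant, the absolute value of the lowest payoff, $y_{ij}^{new} = y_{ij}^{old} + |\min_{ij} y_{ij}^{old}|$ (plus any further positive constant to make them strictly positive), without changing the optimal strategies; the value of the game shifts by the same constant. Let

$$q_i = p_1(s_i)/w_1, \quad i = 1, \dots, n$$

Dividing the inequality above by $w_1 > 0$ gives $\sum_{i=1}^{n} q_i\, y_{ij} \ge 1$ for all $j$. This leads to the LP formulation

maximize $(-1)^T q$
subject to $A^T q \ge 1$
and $q \ge 0$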
A feasible point $q = (q_1, \dots, q_n)$ satisfies $\sum_{i=1}^{n} q_i = \sum_{i=1}^{n} p_1(s_i)/w_1 = 1/w_1$ and thus corresponds to a strategy $p_1$ with $w_1(p_1) \ge w_1$, where $p_1(s_i) = w_1 q_i$. Maximizing $(-1)^T q = -\sum_{i=1}^{n} q_i$ is maximizing $1/(\sum_{i=1}^{n} q_i)$ and therefore $w_1$. $A$ is simply $A_{ij} = y_{ij}$. Since LP $\in$ P, this problem can be solved efficiently, e.g. by the simplex algorithm or with off-the-shelf solver software.

Turning to player two, it may not be surprising that his maximum guaranteed payoff problem is the dual of the above LP. Following the same reasoning as above for player two, the definition of $w_1(p_1)$ changes, as two's payoff is $-y$:

$$w_2(p_2) = \max_{i \in \{1, \dots, n\}} \sum_{j=1}^{m} p_2(t_j)\, y_{ij}$$
The remainder of the argument is the same. With $r_j = p_2(t_j)/w_2$, this leads to the LP formulation

minimize $(-1)^T r$
subject to $A r \le 1$
and $r \ge 0$
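The change of variables used for both players can be checked numerically. The following is a minimal sketch under stated assumptions: the helper names `shift_positive`, `to_q` and `to_r` are hypothetical, the matching-pennies matrix is illustrative data, and the shift adds 1 on top of the absolute value of the lowest payoff so that all entries are strictly positive. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def shift_positive(Y):
    """Shift every payoff by one constant so that all entries are strictly positive."""
    lowest = min(min(row) for row in Y)
    shift = abs(lowest) + 1 if lowest <= 0 else 0
    return [[y + shift for y in row] for row in Y], shift


def to_q(p1, A):
    """Player one: w1 is the worst column payoff, q_i = p1(s_i) / w1."""
    w1 = min(sum(p * row[j] for p, row in zip(p1, A)) for j in range(len(A[0])))
    return [p / w1 for p in p1], w1


def to_r(p2, A):
    """Player two: w2 is the worst row payoff, r_j = p2(t_j) / w2."""
    w2 = max(sum(p * y for p, y in zip(p2, row)) for row in A)
    return [p / w2 for p in p2], w2


A, shift = shift_positive([[1, -1], [-1, 1]])
q, w1 = to_q([Fraction(3, 4), Fraction(1, 4)], A)  # a non-optimal strategy for 1
r, w2 = to_r([Fraction(1, 2), Fraction(1, 2)], A)  # the uniform strategy for 2

# q satisfies the column constraints, r satisfies the row constraints.
assert all(sum(q[i] * A[i][j] for i in range(2)) >= 1 for j in range(2))
assert all(sum(r[j] * A[i][j] for j in range(2)) <= 1 for i in range(2))
# What player one guarantees never exceeds what player two concedes.
assert 1 / sum(q) <= 1 / sum(r)
```

Subtracting `shift` from `1 / sum(q)` and `1 / sum(r)` recovers the corresponding payoffs in the original, unshifted game.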
If player one can get at least payoff $1/\sum_i q_i$ and player two can hold him to at most payoff $1/\sum_j r_j$, then $1/\sum_i q_i \le 1/\sum_j r_j$ due to the zero-sum nature of the game. We must now show that they are in fact equal, and that $v = 1/\sum_i q_i = 1/\sum_j r_j$ is the value of the game.

TODO: show that there exists a feasible point which attains $v$.
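One standard way to close this gap is sketched below. It is not the argument of the source text: it assumes the strong duality theorem of linear programming, which is not proved in this document, and it writes both programs in the equivalent form with objective $\mathbf{1}^T q$ and $\mathbf{1}^T r$, with $A_{ij} = y_{ij} > 0$.

```latex
\begin{align*}
\text{(P)}\quad & \min_{q \in \mathbb{R}^n} \ \mathbf{1}^T q
  && \text{s.t. } A^T q \ge \mathbf{1},\ q \ge 0, \\
\text{(D)}\quad & \max_{r \in \mathbb{R}^m} \ \mathbf{1}^T r
  && \text{s.t. } A r \le \mathbf{1},\ r \ge 0.
\end{align*}
% (D) is the LP dual of (P).
% Feasibility: r = 0 is feasible for (D); for (P) take
% q_1 = 1 / \min_{ij} y_{ij} and q_i = 0 otherwise, which works since y_{ij} > 0.
% Strong duality: both programs are feasible, so both attain their optima
% q^* and r^*, and the optimal objective values coincide:
\begin{equation*}
\sum_{i=1}^{n} q_i^* = \sum_{j=1}^{m} r_j^*
\quad\Longrightarrow\quad
v = \frac{1}{\sum_{i} q_i^*} = \frac{1}{\sum_{j} r_j^*}.
\end{equation*}
% The sum is positive because q = 0 violates A^T q \ge 1.
```

Under these assumptions, $p_1^* = v\, q^*$ and $p_2^* = v\, r^*$ would form an equilibrium pair of the shifted game.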
Bibliography
[1] Bill Chen and Jerrod Ankenman. The Mathematics of Poker. ConJelCo, Pittsburgh, PA, 2006.
[2] Howard P. Chudacoff. Children at Play: An American History. NYU Press, New York, NY, 2007.
[3] Michael Johanson. Measuring the size of large no-limit poker games. http://poker.cs.ualberta.ca/publications/2013-techreport-nl-size.pdf. 2013.
[4] R. Duncan Luce and Howard Raiffa. Games and Decisions. Dover, New York, 1957.
[5] J. F. Nash. The bargaining problem. Econometrica (18), 1950.
[6] J. F. Nash. Non-cooperative games. Annals of Mathematics (54), 1951.
[7] Regina M. Milteer, Kenneth R. Ginsburg, and Deborah Ann Mulligan. The importance of play in promoting healthy child development and maintaining strong parent-child bond: Focus on children in poverty. American Academy of Pediatrics, 2011. http://pediatrics.aappublications.org/content/129/1/e204.full.html.
[8] David Sklansky. The Theory of Poker. Two Plus Two, Las Vegas, 1987.
[9] John von Neumann and Oskar Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Oxfordshire, 4th edition, 1944.