Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation | Lex Fridman Podcast #344

2 hours 29 minutes 21 seconds


S1

Speaker 1

00:00

A lot of people were saying like, oh, this whole idea of game theory, it's just nonsense. And if you really want to make money, you got to like look into the other person's eyes and read their soul and figure out what cards they have. But what happened was we played our bot against 4 top heads-up no-limit hold'em poker players. And the bot wasn't trying to adapt to them.

S1

Speaker 1

00:19

It wasn't trying to exploit them. It wasn't trying to do these mind games. It was just trying to approximate the Nash equilibrium and it crushed them.

S2

Speaker 2

00:28

The following is a conversation with Noam Brown, research scientist at FAIR, Facebook AI research group at Meta AI. He co-created the first AI system that achieved superhuman level performance in no limit Texas Hold'em, both heads up and multiplayer. And now recently, he co-created an AI system that can strategically out-negotiate humans using natural language in a popular board game called Diplomacy, which is a war game that emphasizes negotiation.

S2

Speaker 2

00:59

This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Noam Brown. You've been a lead on 3 amazing AI projects.

S2

Speaker 2

01:12

So we got Libratus that solved, or at least achieved human level performance on No Limit Texas Hold'em poker with 2 players, heads up. You got Pluribus that solved No Limit Texas Hold'em poker with 6 players. And just now you have Cicero, these are all names of systems, that solved or achieved human level performance on the game of diplomacy, which for people who don't know, is a popular strategy board game. It was loved by JFK, John F.

S2

Speaker 2

01:44

Kennedy, and Henry Kissinger, and many other big famous people in the decades since. So let's talk about poker and diplomacy today. First, poker. What is the game of No Limit Texas Hold'em?

S2

Speaker 2

01:59

And how is it different from chess?

S1

Speaker 1

02:00

Well, no limit Texas Hold'em poker is the most popular variant of poker in the world. So, you go to a casino, you sit down at the poker table, the game that you're playing is no limit Texas Hold'em. If you watch movies about poker, like Casino Royale or Rounders, the game that they're playing is no limit Texas hold'em poker.

S1

Speaker 1

02:17

Now it's very different from limit hold'em in that you can bet any amount of chips that you want and so the stakes escalate really quickly. You start out with like 1 or 2 dollars in the pot and then by the end of the hand, you've got like a thousand dollars in there maybe.

S2

Speaker 2

02:32

So the option to increase the number very aggressively and very quickly is always there.

S1

Speaker 1

02:36

Right, the no limit aspect is there's no limits to how much you can bet. In limit hold'em, there's like $2 in the pot, you can only bet like $2. But if you got $10,000 in front of you, you're always welcome to put $10,000 into the pot.

S2

Speaker 2

02:50

So I've got a chance to hang out with Phil Hellmuth, who plays all these different variants of poker, and correct me if I'm wrong, but it seems like no limit rewards crazy, versus the other ones reward a more kind of calculated strategy. Or no? Because you're sort of looking at it from an analytic perspective: is strategy also rewarded in no limit Texas hold'em?

S1

Speaker 1

03:15

I think both variants reward strategy, but I think what's different about no limit hold'em is it's much easier to get jumpy. You go in there thinking you're going to play for like $100 or something and suddenly there's like $1,000 in the pot. A lot of people can't handle that.

S2

Speaker 2

03:32

Can you define jumpy?

S1

Speaker 1

03:33

When you're playing poker, you always want to choose the action that's going to maximize your expected value. It's kind of like with investing, right? If you're ever in a situation where the amount of money that's at stake is going to have a material impact on your life, then you're going to play in a more risk-averse style.

S1

Speaker 1

03:51

If you're playing no limit hold'em and somebody makes a huge bet, there might come a point where you're like, this is too much money for me to handle. Like I can't risk this amount. And that's what throws a lot of people off. So that's the big difference, I think, between no limit and limit.

S2

Speaker 2

04:09

What about on the action side, when you're actually making that big bet? That's what I mean by crazy. I was trying to refer to the technical term of crazy, meaning use the big jump in the bet to completely throw off the other person in terms of their ability to reason optimally.

S1

Speaker 1

04:30

I think that's right. I think 1 of the key strategies in poker is to put the other person into an uncomfortable position. And if you're doing that, then you're playing poker well.

S1

Speaker 1

04:40

And there's a lot of opportunities to do that in no limit hold'em. You can have like $50 in there, you throw in a $1,000 bet. And that's sometimes, if you do it right, it puts the other person in a really tough spot. Now, it's also possible that you make huge mistakes that way.

S1

Speaker 1

04:56

And so it's really easy to lose a lot of money in no limit hold'em if you don't know what you're doing. But there's a lot of upside potential too.

S2

Speaker 2

05:02

So when you build systems, AI systems that play these games, we'll talk about poker, we'll talk about diplomacy, are you drawn in in part by the beauty of the game itself, AI aside, or is it to you primarily a fascinating problem set for the AI to solve?

S1

Speaker 1

05:19

I'm drawn in by the beauty of the game. I started playing poker when I was in high school, and the idea to me was that there is a correct, an objectively correct way of playing poker. And if you could figure out what that is, then you're making unlimited money, basically.

S1

Speaker 1

05:38

That's like a really fascinating concept to me. And so I was fascinated by the strategy of poker, even when I was like 16 years old. It wasn't until much later that I actually worked on poker AIs.

S2

Speaker 2

05:48

So there was a sense that you can solve poker, like in the way you can solve chess, for example, or checkers. I believe checkers got solved, right?

S1

Speaker 1

05:57

Yeah, checkers got completely solved.

S2

Speaker 2

05:59

Optimal strategy.

S1

Speaker 1

06:00

Optimal strategy. It's impossible to beat the AI.

S2

Speaker 2

06:01

Yeah, and so in that same way, you could technically solve chess.

S1

Speaker 1

06:05

You could solve chess, you could solve poker.

S2

Speaker 2

06:07

You could solve poker.

S1

Speaker 1

06:09

So this gets into the concept of a Nash equilibrium. Okay, so in any finite two-player zero-sum game, there is an optimal strategy, called a Nash equilibrium, such that if you play it, you are guaranteed to not lose in expectation no matter what your opponent does.

S1

Speaker 1

06:26

And this is kind of a radical concept to a lot of people, but it's true in chess. It's true in poker. It's true in any finite two-player zero-sum game. And to give some intuition for this, you can think of rock, paper, scissors.

S1

Speaker 1

06:38

In rock, paper, scissors, if you randomly choose between throwing rock, paper, and scissors with equal probability, then no matter what your opponent does, you are not going to lose in expectation. You're not going to lose in expectation in the long run. Now, the same is true for poker. There exists some strategy, some really complicated strategy, that if you play that, you are guaranteed to not lose money in the long run.
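As an illustrative aside (not from the conversation), the rock-paper-scissors claim above is easy to check numerically. The payoff matrix and the sample opponent strategies here are standard conventions, not anything from the transcript:

```python
# Payoff matrix for the row player in rock-paper-scissors:
# rows = our throw, columns = the opponent's (0=rock, 1=paper, 2=scissors).
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]

def expected_value(our_mix, opp_mix):
    """Expected payoff of one mixed strategy against another."""
    return sum(
        our_mix[i] * opp_mix[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )

uniform = [1 / 3, 1 / 3, 1 / 3]

# Against any opponent mix, the uniform strategy breaks even in expectation.
for opp in ([1, 0, 0], [0.6, 0.3, 0.1], [0, 0, 1]):
    assert abs(expected_value(uniform, opp)) < 1e-9
```

The same check fails for any non-uniform mix: a predictable strategy leaves money on the table against a best-responding opponent.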

S1

Speaker 1

07:00

And I should say, this is for 2 player poker. 6 player poker is a different story.

S2

Speaker 2

07:03

Yeah, it's a beautiful giant mess. When you say in expectation, you're guaranteed not to lose in expectation. What does in expectation mean?

S1

Speaker 1

07:12

Poker is a very high variance game. So you're gonna have hands where you win, you're gonna have hands with your lose. Even if you're playing the perfect strategy, you can't guarantee that you're gonna win every single hand.

S1

Speaker 1

07:20

But if you play for long enough, then you are guaranteed to at least break even and in practice probably win.

S2

Speaker 2

07:27

So that's in expectation, the size of your stack, generally speaking. Now that doesn't include anything about the fact that you can go broke. It doesn't include any of those kinds of normal real world limitations.

S2

Speaker 2

07:39

You're talking in a theoretical world. What about the 0 sum aspect? How big of a constraint is that? How big of a constraint

S1

Speaker 1

07:46

is finite? So finite's not a huge constraint. So I mean, most games that you play are finite in size.

S1

Speaker 1

07:54

It's also true, actually, that there exists this perfect strategy in many infinite games as well. Technically, the game has to be compact. There are like some edge cases where you don't have a Nash equilibrium in a two-player zero-sum game. So you can think of a game where you're like, you know, if we're playing a game where whoever names the bigger number is the winner, there's no Nash equilibrium to that game.

S1

Speaker 1

08:13

17. 18.

S2

Speaker 2

08:16

You win again. You're good at this.

S1

Speaker 1

08:18

I played a lot of games.

S2

Speaker 2

08:21

Okay, so that's, and then the 0 sum aspect.

S1

Speaker 1

08:24

The zero-sum aspect. So there exist Nash equilibria in non-two-player zero-sum games as well.

S1

Speaker 1

08:30

And by the way, just to clarify what I mean by 2 players, 0 sum, I mean there's 2 players, and whatever 1 player wins, the other player loses. So if we're playing poker and I win $50, that means that you're losing $50. Now, outside of two-player zero-sum games, there still exists Nash equilibria, but they're not as meaningful. Because you can think of a game like Risk.

S1

Speaker 1

08:51

If everybody else on the board decides to team up against you and take you out, there's no perfect strategy you can play that's going to guarantee that you win there. There's just nothing you can do. So outside of two-player zero-sum games, there's no guarantee that you're going to win by playing a Nash equilibrium.

S2

Speaker 2

09:07

Have you ever tried to model in the other aspects of the game, which is like the pleasure you draw from playing the game? And then if you're a professional poker player, if you're exciting, even if you lose, you know, there's the money you would get from the attention, from the sponsors, and all that kind of stuff. That'd be a fun thing to model in. Or does that make it sort of super complex, to include the human factor in its full complexity?

S1

Speaker 1

09:36

I think you bring up a couple good points there. So I think a lot of professional poker players, I mean, they get a huge amount of money, not from actually playing poker, but from the sponsorships and having a personality that people want to tune in and watch. That's a big way to make a name for yourself in poker.

S2

Speaker 2

09:53

I just wonder from an AI perspective, if you create, and we'll talk about this more, maybe AI system that also talks trash and all that kind of stuff, that that becomes part of the function to maximize. So it's not just optimal poker play. Maybe sometimes you want to be chaotic.

S2

Speaker 2

10:10

Maybe sometimes you want to be suboptimal and you lose the chaos. And maybe sometimes you want to be overly aggressive because the audience loves that. That'd be fascinating.

S1

Speaker 1

10:22

I think what you're getting at here is that there's a difference between making an AI that wins a game and an AI that's fun to play

S2

Speaker 2

10:27

with.

S1

Speaker 1

10:28

Yeah. Yeah.

S2

Speaker 2

10:29

Or fun to watch. So those are all different things. Fun to play with and fun to watch.

S1

Speaker 1

10:33

Yeah, and I've heard talks from game designers, people that work on AI for actual recreational games that people play. And they say, yeah, there's a big difference between that and trying to make an AI that actually wins.

S1

Speaker 1

10:46

And you look at a game like Civilization, the way that the AIs play is not optimal for trying to win. They're playing a different game. They're trying to have personalities. They're trying to be fun and engaging and that makes for a better game.

S2

Speaker 2

11:00

Yeah. And we also talk about NPCs. I just talked to Todd Howard, who is the creator of Fallout and the Elder Scrolls series and Starfield, the new game coming out. And the creator of what I think is the greatest game of all time, which is Skyrim and the NPCs there.

S2

Speaker 2

11:15

The AI that governs that whole game is very interesting, but the NPCs also are super interesting. And considering what language models might do to NPCs in an open world RPG role-playing game, it's super exciting.

S1

Speaker 1

11:30

Yeah, honestly, I think this is 1 of the first applications where we're going to see real consumer interaction with large language models. I guess Elder Scrolls VI is in development now. They're probably pretty close to finishing it.

S1

Speaker 1

11:44

But I would not be surprised at all if Elder Scrolls VII was using large language models for their NPCs.

S2

Speaker 2

11:49

No, they're not. I mean, I'm not saying anything. I'm not saying anything.

S1

Speaker 1

11:53

Okay, this is me speculating, not you.

S2

Speaker 2

11:55

No, but they're just releasing the Starfield game. They do 1 game at a time. And so whatever it is, whenever the date is, I don't know what the date is, calm down.

S2

Speaker 2

12:06

But it would be, I don't know, like 2024, 25, 26. So it's actually very possible that would include language models.

S1

Speaker 1

12:14

I was listening to this talk by a gaming executive when I was in grad school, and 1 of the questions that a person in the audience asked is, why are all these games so focused on fighting and killing? And the person responded that it's just so much harder to make an AI that can talk with you and cooperate with you than it is to make an AI that can fight you. And I think once this technology develops further and you can reach a point where not every single line of dialogue has to be scripted, it unlocks a lot of potential for new kinds of games, much more positive interactions that are not so focused on fighting.

S1

Speaker 1

12:50

And I'm really looking forward to that.

S2

Speaker 2

12:51

It might not be positive. It might be just drama. So you'll be in like a call of duty game.

S2

Speaker 2

12:56

Instead of doing the shooting, you'll just be hanging out and arguing with an AI, being passive aggressive, and then you won't be able to sleep that night, and you have to return and continue the argument because you were emotionally hurt. I mean, yeah, I think that's actually an exciting world. Whatever is the drama, the chaos that we love, the push and pull of human connection, I think it's possible to do that in the video game world. And I think you could be messier and make more mistakes in the video game world, which is why it would be a nice place.

S2

Speaker 2

13:28

And also it doesn't have as deep of a real psychological impact because inside video games it's kind of understood that you're not in a real world. So whatever crazy stuff AI does, we have some flexibility to play. Just like with a game of diplomacy, it's a game. This is not real geopolitics, not real war.

S2

Speaker 2

13:49

It's a game. So you can have a little bit of fun, a little bit of chaos. Okay, back to Nash equilibrium. How do we find the Nash equilibrium?

S1

Speaker 1

13:59

All right, so there's different ways to find a Nash equilibrium. So the way that we do it is with this process called self-play. Basically we have this algorithm that starts by playing totally randomly and it learns how to play the game by playing against itself.

S1

Speaker 1

14:15

So it will start playing the game totally randomly, and then if it's playing poker, it'll eventually get to the end of the game and make $50. And then it will review all the decisions that it made along the way and say, what would have happened if I had chosen this other action instead? You know, if I had raised here instead of called, what would the other player have done? And because it's playing against a copy of itself, it's able to do that counterfactual reasoning.

S1

Speaker 1

14:42

So I can say, okay, well, if I took this action and the other person takes this action, and then I take this action. And eventually, I make $150 instead of $50. And so it updates the regret value for that action. Regret is basically like, how much does it regret having not played that action in the past?

S1

Speaker 1

15:00

And when it encounters that same situation again, it's going to pick actions that have higher regret with higher probability. Now, it'll just keep simulating the games this way. It'll keep, you know, accumulating regrets for different situations. And in the long run, if you pick actions that have high regret with higher probability in the correct way, it's proven to converge to a Nash equilibrium.
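The regret-updating loop described here can be sketched in a few lines of Python. This is an illustrative regret-matching self-play loop on rock-paper-scissors, not the actual Libratus code; the payoff matrix, iteration count, and the slightly asymmetric starting regrets are assumptions made just for the demo:

```python
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS  # nothing regretted yet: play uniformly
    return [r / total for r in positive]

def action_values(opp_mix):
    """Expected payoff of each of our pure actions against the opponent's mix."""
    return [
        sum(opp_mix[j] * PAYOFF[a][j] for j in range(ACTIONS))
        for a in range(ACTIONS)
    ]

def train(iterations=50_000):
    # Seed one player with a slight asymmetry so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        for p in range(2):
            values = action_values(strats[1 - p])
            baseline = sum(strats[p][a] * values[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                # "How much do I regret not having played a instead?"
                regrets[p][a] += values[a] - baseline
                strategy_sum[p][a] += strats[p][a]
    # The AVERAGE strategy over all iterations converges to the equilibrium.
    return [[s / iterations for s in row] for row in strategy_sum]

avg = train()  # both players' average mixes approach (1/3, 1/3, 1/3)
```

Note that it's the average strategy over all iterations, not the final one, that is proven to converge to the Nash equilibrium; the per-iteration strategies can keep cycling.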

S2

Speaker 2

15:24

Even for super complex games? Even for imperfect information games?

S1

Speaker 1

15:28

It's true for all games. It's true for chess, it's true for poker, it's particularly useful for poker.

S2

Speaker 2

15:33

So this is the method of counterfactual regret minimization?

S1

Speaker 1

15:36

This is counterfactual regret minimization.

S2

Speaker 2

15:37

Does that have to do with self-play specifically? If you follow this kind of process, self-play or not, will you be able to arrive at an optimal set of actions?

S1

Speaker 1

15:48

So this counterfactual regret minimization is a kind of self-play. It's a principled kind of self-play that's proven to converge to Nash equilibria, even in imperfect information games. Now you can have other forms of self-play, and people use other forms of self-play for perfect information games, where you have more flexibility. The algorithm doesn't have to be as theoretically sound in order to converge in that class of games because it's a simpler setting.

S2

Speaker 2

16:11

Sure, so I kind of, in my brain, the word self-play has mapped to neural networks, but we're speaking something bigger than just neural networks. It could be anything. The self-play mechanism is just the mechanism of a system playing itself.

S1

Speaker 1

16:26

Exactly, yeah. Self-play is not tied specifically to neural nets. It's a kind of reinforcement learning, basically.

S1

Speaker 1

16:31

And I would also say this process of, like, trying to reason, oh, what would the value have been if I had taken this other action instead? This is very similar to how humans learn to play a game like poker, right? Like, you probably played poker before, and with your friends you probably ask, like, oh, would you have called me if I raised there? You know, and that's a person trying to do the same kind of learning from a counterfactual that the AI is doing.

S2

Speaker 2

16:54

Okay, and if you do that at scale, you're gonna be able to learn an optimal policy.

S1

Speaker 1

16:59

Yeah. Now, where the neural nets come in: I said, like, okay, if it's in that situation again, then it will choose the action that has high regret. Now the problem is that poker is such a huge game. I think no limit Texas Hold'em, the version that we were playing, has 10 to the 161 different decision points, which is more than the number of atoms in the universe squared.

S2

Speaker 2

17:18

That's heads up?

S1

Speaker 1

17:18

That's heads up.

S2

Speaker 2

17:19

Yeah. 10 to the 161, you said?

S1

Speaker 1

17:21

Yeah. I mean, it depends on the number of chips that you have, the stacks and everything, but like the version that we were playing was 10 to the 161.

S2

Speaker 2

17:27

Which I assume would be a somewhat simplified version anyway. Because I bet there's some like step function you had for like bets.

S1

Speaker 1

17:36

Oh, no, no, no. I'm saying, like, we played the full game. You can bet whatever amount you want.

S1

Speaker 1

17:39

And the bot maybe was constrained in like what it considered for bet sizes, but the person on the other side could bet whatever they wanted.

S2

Speaker 2

17:45

Yeah, I mean, 161 plus or minus 10 doesn't matter. Yeah.

S1

Speaker 1

17:51

And so the way neural nets help out here is, you don't have to run into the same exact situation because that's never gonna happen again. The odds of you running into the same exact situation are pretty slim. But if you run into a similar situation, then you can generalize from other states that you've been in that kind of look like that 1.

S1

Speaker 1

18:06

And you can say like, well, these other situations, I had high regret for this action. And so maybe I should play that action here as well.

S2

Speaker 2

18:12

Which is the more complex game, chess or poker? Or go or poker? Do you know?

S1

Speaker 1

18:19

That is a controversial question.

S2

Speaker 2

18:21

Okay.

S1

Speaker 1

18:21

I'm gonna-

S2

Speaker 2

18:22

It's like somebody's screaming on Reddit right now. It depends on which subreddit you're on. Is it chess or is it poker?

S1

Speaker 1

18:27

I'm sure, like, David Silver is gonna get really angry at me. Yeah. I'm gonna say poker, actually.

S1

Speaker 1

18:31

And I think for a couple of reasons.

S2

Speaker 2

18:33

They're not here to defend themselves.

S1

Speaker 1

18:36

So first of all, you have the imperfect information aspect. And so it's, we can go into that, but like once you introduce imperfect information, things get much more complicated.

S2

Speaker 2

18:48

So we should say, maybe you can describe what is seen to the players, what is not seen in the game of Texas Hold'em.

S1

Speaker 1

18:57

Yeah, so Texas Hold'em, you get 2 cards face down that only you see. And so that's the hidden information of the game. The other players also all get 2 cards face down that only they see.

S1

Speaker 1

19:08

And so you have to kind of, as you're playing, reason about like, okay, what do they think I have? What do they have? What do they think I think they have? That kind of stuff.

S1

Speaker 1

19:16

And that's kind of where bluffing comes into play, right? Because the fact that you can bluff, the fact that you can bet with a bad hand and still win is because they don't know what your cards are. And that's the key difference between a perfect information game like chess and go and imperfect information games like poker.

S2

Speaker 2

19:36

This is what trash talk looks like. The implied statement is the game I solved is much tougher. But yeah, so when you're playing, I'm just gonna do random questions here.

S2

Speaker 2

19:49

So when you're playing your opponent under imperfect information, is there some degree to which you're trying to estimate the range of hands that they have? Or is that not part of the algorithm? So what are the different approaches to the imperfect information game?

S1

Speaker 1

20:06

So the key thing to understand about why imperfect information makes things difficult is that you have to worry not just about which actions to play, but the probability that you're going to play those actions. So you think about Rock-Paper-Scissors, for example. Rock-Paper-Scissors is an imperfect information game because you don't know what I'm about to throw.

S2

Speaker 2

20:26

I do, but yeah, usually not. Yeah.

S1

Speaker 1

20:29

And so you can't just say like, I'm just going to throw a rock every single time because the other person's going to figure that out and notice a pattern and then suddenly you're going to start losing. And so you don't just have to figure out like which action to play, you have to figure out the probability that you play it. And really importantly, the value of an action depends on the probability that you're going to play it.

S1

Speaker 1

20:47

So if you're playing rock every single time, that value is really low. But if you're never playing rock, you play rock like 1% of the time, then suddenly the other person is probably going to be throwing scissors. And when you throw rock, the value of that action is going to be really high. Now, you take that to poker, what that means is the value of bluffing, for example, if you're the kind of person that never bluffs and you have this reputation as somebody that never bluffs, and suddenly you bluff, there's a really good chance that that bluff is going to work, and you're going to make a lot of money.
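To make the "value depends on how often you play it" point concrete, here is an illustrative sketch (not from the conversation) in rock-paper-scissors: once the opponent has adapted to your announced mix, the value of throwing rock swings from terrible to great depending on how rarely you throw it.

```python
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rows: our throw, cols: theirs
ROCK, PAPER, SCISSORS = 0, 1, 2

def best_response(our_mix):
    """Pure action maximizing the opponent's payoff against our mix (zero-sum)."""
    def opp_value(a):
        return sum(our_mix[i] * -PAYOFF[i][a] for i in range(3))
    return max(range(3), key=opp_value)

def value_of_rock(our_mix):
    """Our payoff for throwing rock once the opponent has adapted to our mix."""
    return PAYOFF[ROCK][best_response(our_mix)]

# Always throwing rock: the opponent switches to paper, so rock is worth -1.
assert value_of_rock([1.0, 0.0, 0.0]) == -1
# Throwing rock only 1% of the time: the opponent plays scissors (to beat our
# frequent paper), so the rare rock is worth +1.
assert value_of_rock([0.01, 0.99, 0.0]) == 1
```

The bluffing analogy is the same: the rarely-played action gains value precisely because the opponent has adapted to everything else you do.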

S1

Speaker 1

21:18

On the other hand, if you've got a reputation, like if they've seen you play for a long time, and they see, oh, you're the kind of person that's bluffing all the time, when you bluff, they're not going to buy it. And they're going to call you down. You're going to lose a lot of money. And finding that balance of how often you should be bluffing is the key challenge of a game of poker.

S1

Speaker 1

21:37

And you contrast that with a game like chess. It doesn't matter if you're opening with the Queen's Gambit 10% of the time or 100% of the time. The value, the expected value, is the same. So that's why we need these algorithms that understand that we have to figure out not just which actions are good, but the probabilities.

S1

Speaker 1

21:57

We need to get the exact probabilities correct. And that's actually why, when we created the bot Libratus, we gave it that name: Libratus means balanced, because the algorithm that we designed was designed to find that right balance of how often it should play each action.

S2

Speaker 2

22:10

The balance of how often. And the key sort of branching is to bluff or not to bluff. Is that a good crude simplification of the major decision in poker?

S1

Speaker 1

22:20

It's a good simplification. I think that's like the main tension, but it's not just how often to bluff or not to bluff. It's like, how often should you bet in general?

S1

Speaker 1

22:29

How often should you, what kind of bet should you make? Should you bet big or should you bet small? And with which hands? And so this is where the idea of a range comes from.

S1

Speaker 1

22:40

Because when you're bluffing with a particular hand in a particular spot, you don't want there to be a pattern for the other person to pick up on. You don't want them to figure out, oh, whenever this person is in this spot, they're always bluffing. And so you have to reason about, okay, would I also bet with a good hand in this spot? You wanna be unpredictable.

S1

Speaker 1

22:59

So you have to think about, what would I do if I had this different set of cards.

S2

Speaker 2

23:04

Is there explicit estimation of, like, a theory of mind that the other person has about you, or is that just an emergent thing that happens?

S1

Speaker 1

23:14

The way that the bots handle it that are really successful, they have an explicit theory of mind. So they're explicitly reasoning about what's the common knowledge belief? What do you think I have?

S1

Speaker 1

23:27

What do I think you have? What do you think I think you have? It's explicitly reasoning about that.

S2

Speaker 2

23:32

Are there multiple levels there? So maybe that's jumping ahead to 6 players, but is there a stickiness to the person? So it's an iterative game, you're playing the same person.

S2

Speaker 2

23:45

There's a stickiness to that, right? You're gathering information as you play. It's not every hand is a new hand. Is there a continuation in terms of estimating what kind of player I'm facing here?

S1

Speaker 1

23:59

That's a good question. So you could approach the game that way. The way that the bots do it, they don't. And the way that humans approach it also, expert human players, the way they approach it is to basically assume that you know my strategy.

S1

Speaker 1

24:13

So I'm going to try to pick a strategy where even if I were to play it for 10,000 hands and you could figure out exactly what it was, you still wouldn't be able to beat it. Basically, what that means is I'm trying to approximate the Nash equilibrium. I'm trying to be perfectly balanced. Because if I'm playing the Nash equilibrium, even if you know what my strategy is, like I said, I'm still unbeatable in expectation.

S1

Speaker 1

24:33

So that's what the bot aims for. And that's actually what a lot of expert poker players aim for as well, to start by playing the Nash equilibrium. And then maybe if they spot weaknesses in the way you're playing, then they can deviate a little bit to take advantage of that.

S2

Speaker 2

24:47

They aim to be unbeatable in expectation. Okay, so who's the greatest poker player of all time and why is it Phil Hellmuth? So this is for Phil.

S2

Speaker 2

24:56

So he's known, at least in part, for maybe playing suboptimally and he still wins a lot. It's a bit chaotic. So maybe can you speak from an AI perspective about the genius of his madness or the madness of his genius? So playing suboptimally, playing chaotically as a way to make you hard to pin down about what your strategy is.

S1

Speaker 1

25:23

So, okay. The thing that I should explain first of all with like Nash equilibrium, it doesn't mean that it's predictable. The whole point of it is that you're trying to be unpredictable.

S1

Speaker 1

25:32

Now I think where somebody like Phil Hellmuth might be really successful is not in being unpredictable, but in being able to take advantage of the other player and figure out where they're being predictable, or in guiding the other player into thinking that you have certain weaknesses and then understanding how they're going to change their behavior. They're gonna deviate from a Nash equilibrium style of play to try to take advantage of those perceived weaknesses, and then he can counter-exploit them. So you kind of get into the mind games there.

S2

Speaker 2

26:02

So you can think about heads-up poker as a dance between 2 agents. I guess, are you playing the cards or are you playing the player?

S1

Speaker 1

26:10

So this gets down to a big argument in the poker community and the academic community. For a long time, there was this debate of what's called GTO, Game Theory Optimal poker, versus exploitative play. And up until about 2017, when we did the Libratus match, I think actually exploitative play had the advantage.

S1

Speaker 1

26:29

A lot of people were saying, like, oh, this whole idea of game theory, it's just nonsense. And if you really want to make money, you got to look into the other person's eyes and read their soul and figure out what cards they have. But what happened was people started adopting the game theory optimal strategy, And they were making good money. And they weren't trying to adapt so much to the other player.

S1

Speaker 1

26:50

They were just trying to play the Nash equilibrium. And then what really solidified it, I think, was the Libratus match, where we played our bot against 4 top heads-up no-limit hold'em poker players. And the bot wasn't trying to adapt to them. It wasn't trying to exploit them.

S1

Speaker 1

27:05

It wasn't trying to do these mind games. It was just trying to approximate the Nash equilibrium. And it crushed them. I think, you know, we were playing for $50, $100 blinds.

S1

Speaker 1

27:17

And over the course of about 120,000 hands, it made close to $2 million.

S2

Speaker 2

27:21

120,000 hands. 120,000 hands. Against humans.

S1

Speaker 1

27:24

Yeah, and this was fake money, to be clear. But there was real money at stake. There was $200,000.

S2

Speaker 2

27:28

First of all, all money is fake, but that's a different conversation. We give it meaning. It's a phenomenon that gets meaning from our complex psychology as a human civilization.

S2

Speaker 2

27:43

It's emerging from the collective intelligence of the human species, but that's not what you mean. You mean like there's literally, you can't buy stuff with it. Okay, can you actually step back and take me through that competition?

S1

Speaker 1

27:55

Yeah, okay, so when I was in grad school, there was this thing called the Annual Computer Poker Competition, where every year, all the different research labs that were working on AI for poker would get together, they would make a bot, and they would play them against each other. And we made a bot that actually won the 2014 and 2016 competitions. And so we decided we're going to take this bot, build on it, and play against real top professional heads-up no-limit Texas Hold'em poker players.

S1

Speaker 1

28:25

So we invited 4 of the world's best players in this specialty, and we challenged them to 120,000 hands of poker over the course of 20 days. And we had $200,000 in prize money at stake, where it would basically be divided among them depending on how well they did relative to each other. So we wanted to have some incentive for them to play their best.

S2

Speaker 2

28:47

Did you have a confidence, 2014, 16, that this is even possible? How much doubt was there?

S1

Speaker 1

28:53

So we did a competition actually in 2015 where we also played against professional poker players and the bot lost by a pretty sizable margin actually. Now there were some big improvements from 2015 to 2017. And so-

S2

Speaker 2

29:06

Can you speak to the improvements? Is it computational in nature? Is it the algorithm, the methods?

S1

Speaker 1

29:11

It was really an algorithmic approach. That was the difference. So in 2015, it was much more focused on trying to come up with a strategy upfront, trying to solve the entire game of poker, and then just have a lookup table where you're saying, oh, I'm in this situation, what's the strategy?

S1

Speaker 1

29:29

The approach that we took in 2017 was much more search-based. It was trying to say, okay, well, let me in real time try to compute a much better strategy than what I had pre-computed by playing against myself during self-play.

S2

Speaker 2

29:42

What is the search space for poker? What are you searching over? What's that look like?

S2

Speaker 2

29:50

There's different actions like raising, calling. Yeah, what are the actions? Is it just a search over actions?

S1

Speaker 1

29:59

So in a game like chess, the search is like, okay, I'm in this chess position and I can move these different pieces and see where things end up. In poker, what you're searching over is the actions that you can take for your hand, the probabilities that you take those actions, and then also the probabilities that you take other actions with other hands that you might have. And that's kind of hard to wrap your head around.

S1

Speaker 1

30:23

Why are you searching over these other hands that you might have and trying to figure out what you would do with those hands? And the idea is, again, you wanna always be balanced and unpredictable. And so if your search algorithm is saying, oh, I want to raise with this hand, well, in order to know whether that's a good action, let's say it's a bluff, let's say you have a bad hand and you're saying, oh, I think I should be betting here with this really bad hand and bluffing.

S1

Speaker 1

30:50

Well, that's only a good action if you're also betting with a strong hand, otherwise it's an obvious bluff.

S2

Speaker 2

30:56

So if your action in some sense maximizes your unpredictability, so that action could be mapped by your opponent to a lot of different hands, then that's a good action.

S1

Speaker 1

31:07

Basically what you want to do is put your opponent into a tough spot. So you want them to always have some doubt, like should I call here, should I fold here? And if you are raising in the appropriate balance between bluffs and good hands, then you're putting them into that tough spot.

S1

Speaker 1

31:21

And so that's what we're trying to do. We're always trying to search for a strategy that would put the opponent into a difficult position.
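The "appropriate balance between bluffs and good hands" has a classic back-of-the-envelope form. As a minimal sketch (a single-street indifference calculation with made-up stakes, not anything Libratus literally computed):

```python
# Sketch of the classic indifference calculation behind balanced betting.
# Illustrative only: real solvers compute this jointly across entire ranges.

def indifferent_bluff_freq(pot: float, bet: float) -> float:
    """Fraction of bets that should be bluffs so that calling a bet of
    `bet` into a pot of `pot` is exactly break-even for the opponent."""
    # The caller risks `bet` to win `pot + bet`. If a fraction q of our
    # bets are bluffs:  q * (pot + bet) - (1 - q) * bet = 0
    # =>  q = bet / (pot + 2 * bet)
    return bet / (pot + 2 * bet)

print(indifferent_bluff_freq(pot=100, bet=100))  # 1/3 of pot-sized bets are bluffs
print(indifferent_bluff_freq(pot=100, bet=50))   # 0.25: smaller bets bluff less often
```

At that mix the opponent is indifferent between calling and folding, which is exactly the "tough spot" being described.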

S2

Speaker 2

31:26

Can you give a metric that you're trying to maximize or minimize? Does this have to do with the regret thing that we're talking about in terms of putting your opponent in a maximally tough spot?

S1

Speaker 1

31:37

Yeah, ultimately what you're trying to maximize is your expected winnings, like your expected value, the amount of money that you're gonna walk away with, assuming that your opponent was playing optimally in response. So you're going to assume that your opponent is also playing as well as possible, a Nash equilibrium approach, because if they're not, then you're just going to make more money. Right?

S1

Speaker 1

31:58

By definition, the Nash equilibrium is the strategy that does the best in expectation. And so if they're deviating from that, then they're gonna lose money. And since it's a 2-player zero-sum game, that means you're gonna make money.
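That zero-sum logic is easy to check in a toy game. A sketch using matching pennies (an assumed 2x2 payoff matrix, nothing like poker's real game tree): the equilibrium mix concedes nothing to a best response, while a predictable deviation bleeds value.

```python
# Toy 2-player zero-sum game (matching pennies); payoffs to the row player.
# Illustrative only: poker's equilibrium lives over a vastly larger game tree.
PAYOFFS = [[1.0, -1.0],
           [-1.0, 1.0]]

def best_response_value(row_strategy):
    """Expected payoff the row player is held to when the column player
    best-responds, i.e. picks the column worst for the row player."""
    n_cols = len(PAYOFFS[0])
    column_values = [
        sum(p * PAYOFFS[i][j] for i, p in enumerate(row_strategy))
        for j in range(n_cols)
    ]
    return min(column_values)

print(best_response_value([0.5, 0.5]))  # 0.0: the equilibrium mix can't be exploited
print(best_response_value([0.8, 0.2]))  # -0.6: a predictable deviation loses money
```

Whatever the deviator loses here, the other player gains, which is the zero-sum point being made above.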

S2

Speaker 2

32:12

So there's not an explicit objective function that maximizes the toughness of the spot they're put in. From a self-play reinforcement learning perspective, you're just trying to maximize winnings, and the rest is implicit.

S1

Speaker 1

32:27

That's right, yeah. So what we're actually trying to maximize is the expected value, given that the opponent is playing optimally in response to us. Now in practice, what that ends up looking like is it's putting the opponent into difficult situations where there's no obvious decision to be made.

S2

Speaker 2

32:41

So the system doesn't know anything about the difficulty of the situation?

S1

Speaker 1

32:46

Not at all, doesn't care.

S2

Speaker 2

32:47

Okay, in my head it was getting excited whenever I was making the other, the opponent sweat. Okay, so you're in 2015, you didn't do as well. So what's the journey from that to a system that in your mind could have a chance?

S1

Speaker 1

33:00

So in 2015, we got beat pretty badly, and we actually learned a lot from that competition. And in particular, what became clear to me is that the way the humans were approaching the game was very different from how the bot was approaching the game. The bot would not be doing search.

S1

Speaker 1

33:17

It would just be trying to compute. It would do like months of self-play. It would just be playing against itself for months, but then when it's actually playing the game, it would just act instantly. And the humans, when they're in a tough spot, they would sit there and think for sometimes even like 5 minutes about whether they're gonna call or fold a hand.

S1

Speaker 1

33:36

And it became clear to me that there's a good chance that that's what's missing from our bot. So I actually did some initial experiments to try to figure out how much of a difference does this actually make? And the difference was huge.

S2

Speaker 2

33:48

As a signal to the human player, how long you took to think?

S1

Speaker 1

33:52

No, no, no, I'm not saying that there were any timing tells. I was saying when the human, like the bot would always act instantly. It wouldn't try to come up with a better strategy in real time over what it had pre-computed during training.

S1

Speaker 1

34:04

Whereas the human, like they have all this intuition about how to play, but they're also in real time leveraging their ability to think, to search, to plan, and coming up with an even better strategy than what their intuition would say. So

S2

Speaker 2

34:17

you're saying that's what you mean by you're doing search also.

S1

Speaker 1

34:21

Yeah.

S2

Speaker 2

34:22

You have an intuition and search on top of that looking for a better solution.

S1

Speaker 1

34:28

Yeah, that's what I mean by search. Instead of acting instantly, a neural net usually gives you a response in like 100 milliseconds or something. It depends on the size of the net.

S1

Speaker 1

34:39

But if you can leverage extra computational resources, you can possibly get a much better outcome. And we did some experiments in small scale versions of poker. And what we found was that if you do a little bit of search, even just a little bit, it was the equivalent of making your pre-computed strategy, like you can kind of think of it as your neural net, a thousand times bigger. With just a little bit of search.

S1

Speaker 1

35:08

And it just like blew away all of the research that we had been working on and trying to like scale up this like pre-computed solution. It was dwarfed by the benefit that we got from search.

S2

Speaker 2

35:19

Can you just linger on what you mean by search here? You're searching over a space of actions for your hand and for other hands. How are you selecting the other hands to search over?

S2

Speaker 2

35:32

Is it randomly?

S1

Speaker 1

35:33

No, it's all the other hands that you could have. So when you're playing No Limit Texas Hold'em, you've got 2 face down cards. And so that's 52 choose 2, 1,326 different combinations.

S1

Speaker 1

35:43

Now that's actually a little bit lower because there's face up cards in the middle and so you can eliminate those as well.

S2

Speaker 2

35:48

But

S1

Speaker 1

35:48

you're looking at around a thousand different possible hands that you can have. And so when the bot's doing search, it's thinking explicitly: there are these thousand different hands that I could have, there are these thousand different hands that you could have. Let me try to figure out what would be a better strategy than what I've pre-computed for these hands and your hands.
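The arithmetic behind those hand counts is easy to verify. A small sketch (standard card encoding; the board cards are a hypothetical flop):

```python
from itertools import combinations

# A standard 52-card deck: 13 ranks x 4 suits.
ranks = "23456789TJQKA"
suits = "cdhs"
deck = [r + s for r in ranks for s in suits]

# Every possible 2-card starting hand: 52 choose 2 = 1,326.
all_hands = list(combinations(deck, 2))
print(len(all_hands))  # 1326

# Face-up board cards (a hypothetical flop) can't be in anyone's hand,
# so the set of possible hands shrinks to 49 choose 2 = 1,176.
board = {"Ah", "Kd", "7c"}
live_hands = [h for h in all_hands if not (set(h) & board)]
print(len(live_hands))  # 1176
```

This `live_hands` list is the "around a thousand different possible hands" the search reasons over for each player.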

S2

Speaker 2

36:07

Okay, so that search, how do you fuse that with what the neural net is telling you or what the trained system is telling you?

S1

Speaker 1

36:19

Yeah, so where the trained system comes in is the value at the end. So you only look so far ahead. You look maybe, you know, 1 round ahead.

S1

Speaker 1

36:31

So if you're on the flop, you're looking to the start of the turn. And at that point, you can use the pre-computed solution to figure out what's the value here of this strategy.
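The shape of that idea can be sketched abstractly. This is not Libratus's actual algorithm (which reasons over whole ranges of hands and mixed strategies); it's a minimal illustration of looking one round ahead and backing off to precomputed leaf values, with made-up numbers:

```python
# Illustrative sketch only, not Libratus's real search: look one round
# ahead, then score the leaves with a precomputed value function instead
# of playing the rest of the hand out.

# Hypothetical precomputed leaf values (our expected winnings) for each
# (our action, opponent response) pair; the numbers are invented.
PRECOMPUTED_VALUE = {
    ("check", "check"): 0.10,
    ("check", "bet"): -0.05,
    ("bet", "fold"): 0.50,
    ("bet", "call"): 0.30,
}

RESPONSES = {"check": ["check", "bet"], "bet": ["fold", "call"]}

def search_one_round(actions):
    """Pick the action whose worst-case (well-played) opponent response
    leaves us with the highest precomputed leaf value."""
    best_action, best_value = None, float("-inf")
    for action in actions:
        value = min(PRECOMPUTED_VALUE[(action, r)] for r in RESPONSES[action])
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value

print(search_one_round(["check", "bet"]))  # ('bet', 0.3)
```

The precomputed table here plays the role of the self-play solution: the search only has to be better than it locally, one round at a time.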

S2

Speaker 2

36:43

Is it of a single action, essentially, in that spot? Are you getting a value, or is it the value of the entire series of actions?

S1

Speaker 1

36:52

Well, it's kind of both, because you're trying to maximize the value for the hand that you have, but in the process, in order to maximize the value of the hand that you have, you have to figure out what would I be doing with all these other hands as

S2

Speaker 2

37:04

well. Okay. But are you in the search always going to the end of the game?

S1

Speaker 1

37:09

In Libratus, we did. So we only used search starting on the turn, and then we searched all the way to the end of the game.

S2

Speaker 2

37:17

The turn, the river. Can you take us through the terminology?

S1

Speaker 1

37:22

Yeah, there's 4 rounds of poker. So there's the pre-flop, the flop, the turn and the river. And so we would start doing search halfway through the game.

S1

Speaker 1

37:30

Now the first half of the game, that was all pre-computed. It would just act instantly. And then when it got to the halfway point, then it would always search to the end of the game. Now we later improved this, so it wouldn't have to search all the way to the end of the game.

S1

Speaker 1

37:41

It would actually search just a few moves ahead. But that came later and that drastically reduced the amount of computational resources that we needed.

S2

Speaker 2

37:51

But the moves, cause you can keep betting on top of each other. That's what you mean by moves. So like that's where you don't just get 1 bet per turn of poker.

S2

Speaker 2

37:59

You can have an arbitrary number of bets, right?

S1

Speaker 1

38:02

Right, I'm trying to think like, I'm gonna bet and then what are you gonna do in response? Are you gonna raise me or are you gonna call? And then if you raise, what should I do?

S1

Speaker 1

38:10

So it's reasoning about that whole process up until the end of the game in the case of Libratus.

S2

Speaker 2

38:15

So for Libratus, what's the most number of re-raises have you ever seen?

S1

Speaker 1

38:21

You probably cap out at like 5 or something because at that point you're basically all in.

S2

Speaker 2

38:26

I mean, is there like interesting patterns like that that you've seen that the game does? Like you'll have like AlphaZero doing way more sacrifices than humans usually do. Is there something like Libratus was constantly re-raising or something like that that you've noticed?

S1

Speaker 1

38:43

There was something really interesting that we observed with Libratus. So humans, when they're playing poker, they usually size their bets relative to the size of the pot. So if the pot has $100 in there, maybe you bet like $75 or somewhere around there, somewhere between like 50 and $100.

S1

Speaker 1

39:01

And with Libratus, we gave it the option to basically bet whatever it wanted. It was actually really easy for us to say, like, oh, if you want, you can bet like 10 times the pot.

S2

Speaker 2

39:09

And

S1

Speaker 1

39:09

we didn't think it would actually do that. It was just like, why not give it the option? And then during the competition, it actually started doing this.

S1

Speaker 1

39:16

And by the way, this was like a very last minute decision on our part to add this option. And so we did not think the bot would do this. And I was actually kind of worried when it did start to do this, like, oh, is this a problem? Humans don't do this.

S1

Speaker 1

39:28

Is it screwing up? But it would put the humans into really difficult spots when it would do that. Because you could imagine you have the second best hand that's possible given the board, and you're thinking, oh, you're in a really great spot here. And suddenly the bot bets $20,000 into a $1,000 pot.

S1

Speaker 1

39:47

And it's basically saying, I have the best hand or I'm bluffing. And you, having the second best hand, now you've got a really tough choice to make. And so the humans would sometimes think like 5 or 10 minutes about, what do you do? Should I call?

S1

Speaker 1

40:01

Should I fold? And when I saw the humans like really struggling with that decision, like that's when I realized like, oh, actually this is maybe a good thing to do after all.

S2

Speaker 2

40:09

And of course the system has no idea that it's, again, like we said, putting them in a tough spot. It's just part of the optimal, the game theory optimal play.

S1

Speaker 1

40:21

Right. From the bot's perspective, it's just doing the thing that's going to make it the most money. And the fact that it's putting the humans in a difficult spot, that's just a side effect of that. And this was, I think, the 1 thing, I mean, there were a few things that the humans walked away from, but this was the number 1 thing that the humans walked away from the competition saying like, we need to start doing this.

S1

Speaker 1

40:43

And now these overbets, what are called overbets, have become really common in high-level poker play.

S2

Speaker 2

40:48

Have you ever talked to somebody like Daniel Negreanu about this? He seems to be a student of the game.

S1

Speaker 1

40:53

I did actually have a conversation with Daniel Negreanu once, yeah. I was visiting the Isle of Man to talk to PokerStars about AI. And Daniel Negreanu was there, and we had dinner together with some other people.

S1

Speaker 1

41:07

And yeah, he was really interested in it. He mentioned that he was excited about learning from these AIs.

S2

Speaker 2

41:14

So he wasn't scared, he was excited.

S1

Speaker 1

41:16

He was excited. And he honestly, he wanted to play against the bot. He thought he had a decent chance of beating it.

S1

Speaker 1

41:23

I think he, you know, this was like several years ago when I think it was like not as clear to everybody that, you know, the AIs were taking over. I think now people recognize that like, if you're playing against a bot, there's like no chance that you have in a game like poker.

S2

Speaker 2

41:38

So consistently the bots will win. The bots win heads up and in other variants too. So multiplayer, 6-player no-limit Texas Hold'em, the bots win?

S1

Speaker 1

41:51

Yeah, that's the case. So I think there is some debate about, like, is it true for every single variant of poker? I think for every single variant of poker, if somebody really put in the effort, they could make an AI that would beat all humans at it.

S1

Speaker 1

42:04

We've focused on the most popular variants. So heads up, no limit Texas Hold'em. And then we followed that up with 6 player poker as well, where we managed to make a bot that beat expert human players. And I think even there now, it's pretty clear that humans don't stand a chance.

S2

Speaker 2

42:20

See, I would love to hook up an AI system that looks at EEG, that actually tries to optimize the toughness of the spot it puts a human in. And I would love to see how different that is from the game theory optimal. So you try to maximize the heart rate of the human player, the freaking out over a long period of time.

S2

Speaker 2

42:42

I wonder if there's going to be different strategies that emerge that are close in terms of effectiveness. Because something tells me you could still achieve superhuman level performance by just making people sweat.

S1

Speaker 1

42:58

I feel like there's a good chance that is the case. Yeah, if it's a decent proxy for score, right? And this is actually common poker wisdom. When they were teaching players, before there were bots, and they were trying to teach people how to play poker, they would say, the key to the game is to put your opponent into difficult spots.

S1

Speaker 1

43:18

It's a good estimate for whether you're making the right decision.

S2

Speaker 2

43:21

So what else can you say about the fundamental role of search in poker? And maybe if you can also relate it to chess and go in these games, What's the role of search to solve in these games?

S1

Speaker 1

43:36

Yeah, I think a lot of people, and this is true for the general public and for the AI community, underestimate the importance of search for these kinds of game AI results. An example of this is TD-Gammon, which came out in 1992.

S1

Speaker 1

43:52

This was the first real instance of a neural net being used in a game AI. It's a landmark achievement. It was actually the inspiration for AlphaZero. And it used search.

S1

Speaker 1

44:01

It used 2-ply search to figure out its next move. You got Deep Blue, there it was very heavily focused on search, looking many, many moves ahead, farther than any human could. And that was key for why it won. And then even with something like AlphaGo, I mean, AlphaGo is commonly hailed as a landmark achievement for neural nets, and it is, but there's also this huge component of search, Monte Carlo tree search to AlphaGo, that was key, absolutely essential for the AI to be able to beat top humans.

S1

Speaker 1

44:35

I think a good example of this is you look at the latest versions of AlphaGo. It was called AlphaZero. And there's this metric called Elo rating, where you can compare different humans and you can compare bots to humans. Now, a top human player is around 3,600 Elo, maybe a little bit higher now.

S1

Speaker 1

44:55

AlphaZero, the strongest version, is around 5,200 Elo. But if you take out the search that's being done at test time, and by the way, what I mean by search is the planning ahead, the thinking of, oh, if I place this stone here and then he does this, and then you look like 5 moves ahead and you see what the board state looks like. That's what I mean by search. If you take out the search that's done during the game, the Elo rating drops to around 3,000.

S1

Speaker 1

45:22

So even today, what, 7 years after AlphaGo, if you take out the Monte Carlo tree search that's being done when playing against the human, the bots are not superhuman. Nobody has made a raw neural net that is superhuman in Go.
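Those Elo gaps translate into win probabilities via the standard logistic Elo formula (the ratings here are the approximate figures from the conversation):

```python
# Standard Elo expected-score formula. Ratings are the rough numbers
# mentioned above: ~5,200 full AlphaZero, ~3,600 top human, ~3,000
# for the search-free policy network.

def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score (roughly, win probability) of player A vs player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

print(elo_expected_score(5200, 3600))  # ~0.9999: full AlphaZero vs top human
print(elo_expected_score(3000, 3600))  # ~0.03: the raw net loses to that human
```

A 1,600-point gap in one direction and a 600-point gap in the other is the difference between near-certain wins and near-certain losses.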

S2

Speaker 2

45:39

That's worth lingering on. That's quite profound. So without search, that just means looking at the next move and saying, this is the best move.

S2

Speaker 2

45:49

So having a function that estimates accurately what the best move is without search.

S1

Speaker 1

45:55

Yeah, and all these bots, they have what's called a policy network where it will tell you, this is what the neural net thinks is the next best move. And it's kind of like the intuition that a human has. The human looks at the board and any Go or Chess master will be able to tell you like, oh, instantly, here's what I think the right move is.

S1

Speaker 1

46:17

And the bot is able to do the same thing. But just like how a human, grandmaster can make a better decision if they have more time to think, when you add on this Monte Carlo tree search, the bot is able to make a better decision.

S2

Speaker 2

46:30

Yeah, I mean, of course a human is doing something like search in their brain, but, and I hesitate to draw a hard line, it's not like Monte Carlo tree search. It's more like sequential language model generation. So it's different, the neural network is doing the searching.

S2

Speaker 2

46:51

I wonder what the human brain is doing in terms of searching. Because you're doing that kind of computation, a human is computing. They have intuition, they have gut, they have a really strong ability, amongst the top players, to estimate what is a good position and what is not without calculating all the details, but they're still doing search in their head, just a different kind of search. Have you ever thought about what the difference is between the search that the human is performing versus what computers are doing?

S1

Speaker 1

47:20

I have thought a lot about that, and I think it's a really important question. So the AI in AlphaGo or any of these Go AIs, they're all doing Monte Carlo tree search, which is a particular kind of search. And it's actually a symbolic, tabular search.

S1

Speaker 1

47:36

It uses the neural net to guide its search, but it isn't actually a full-on neural net. Now, that kind of search is very successful in these kinds of perfect-information board games like chess and Go. But if you take it to a game like poker, for example, it doesn't work. It can't understand the concept of hidden information.

S1

Speaker 1

47:56

It doesn't understand the balance that you have to strike between the amount that you're raising versus the amount that you're calling. And in every 1 of these games, you see a different kind of search. And the human brain is able to plan for all these different games in a very general way. Now, I think that's 1 thing that we're missing from AI today.

S1

Speaker 1

48:14

And I think it's a really important missing piece, the ability to plan and reason more generally across a wide variety of different settings.

S2

Speaker 2

48:23

In a way where the general reasoning makes you better at each 1 of the games, not worse. Yeah,

S1

Speaker 1

48:29

So you can kind of think of it as, neural nets today, transformers, for example, are super general, but they'll output an answer in like 100 milliseconds. And if you tell it, oh, you've got 5 minutes to make a decision, feel free to take more time to make a better decision, it's not gonna know what to do with that.

S1

Speaker 1

48:48

But a human, if you're playing a game like chess, they're gonna give you a very different answer depending on if you say, oh, you've got 100 milliseconds or you've got 5 minutes.

S2

Speaker 2

48:57

Yeah, I mean, people have started using transformer language models in an iterative way that does improve the answer, or the showing-the-work kind of idea?

S1

Speaker 1

49:07

Yeah, they got this thing called chain of thought reasoning. And that's, I think-

S2

Speaker 2

49:11

Super promising, right?

S1

Speaker 1

49:12

Yeah, I think it's a good step in the right direction. I would kind of like say it's similar to Monte Carlo rollouts in a game like chess. There's a kind of search that you can do where you're saying, like, I'm going to roll out my intuition and see, like, without really thinking, you know, what are the better decisions I can make farther down the path?

S1

Speaker 1

49:30

What would I do if I just acted according to intuition for the next 10 moves? And that gets you an improvement, but I think that there's much richer kinds of planning that we could do.
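"Rolling out your intuition" can be sketched in a few lines. This is an invented additive toy game, not chess, and the stochastic policy stands in for a raw net's intuition; the point is only the mechanism of averaging rollouts:

```python
import random

# Toy illustration of a Monte Carlo rollout: score a candidate first move
# by letting a fixed "intuition" policy play out the remaining moves and
# averaging the outcomes. The game (each move adds to a running score)
# and the policy are both invented for the sketch.

def intuition_policy(rng):
    return rng.choice([1, 2])  # intuition's move: add 1 or add 2 to the score

def rollout_value(first_move, depth, samples=500, seed=0):
    """Average final score after `first_move`, with the intuition policy
    playing the remaining `depth - 1` moves, over many sampled rollouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        score = first_move
        for _ in range(depth - 1):
            score += intuition_policy(rng)
        total += score
    return total / samples

# Rollouts rank candidate moves by where they tend to lead farther down
# the path, not just by the policy's one-step output.
print(rollout_value(2, depth=4) > rollout_value(1, depth=4))  # True
```

That's the sense in which chain-of-thought resembles a rollout: you let the model's raw intuition run forward and judge the candidate by where it ends up.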

S2

Speaker 2

49:41

So when Libratus actually beat the poker players, what did that feel like? What was that, I mean, actually on that day, what were you feeling like? Were you nervous?

S2

Speaker 2

49:51

I mean, poker was 1 of the games that you thought was not gonna be solvable because of the human factor. At least in the narratives we tell ourselves, the human factor is so fundamental to the game of poker.

S1

Speaker 1

50:05

Yeah, the Libratus competition was super stressful for me. Also, I mean, I was working on this like basically continuously for a year leading up to the competition. I mean, for me, it became like very clear, like, okay, this is the search technique, this is the approach that we need.

S1

Speaker 1

50:19

And then I spent a year working on this pretty much nonstop.

S2

Speaker 2

50:22

Can we actually get into details, like what programming languages is it written in? What's some interesting implementation details that are fun slash painful?

S1

Speaker 1

50:33

Yeah, so 1 of the interesting things about Libratus is that we had no idea what the bar was to actually beat top humans. We could play against like our prior bots and that kind of gives us some sense of like, are we making progress? Are we going in the right direction?

S1

Speaker 1

50:45

But we had no idea what the bar actually was. And so we threw a huge amount of resources at trying to make the strongest bot possible. So we used C++, it was parallelized. We were using, I think, like a thousand CPUs, maybe more actually.

S1

Speaker 1

51:00

And today that sounds like nothing, but for a grad student back in 2016, that was a huge amount of resources.

S2

Speaker 2

51:06

Well, it's still a lot for any grad student even today. It's still tough to get, or even to allow yourself to think in terms of that scale, at CMU, at MIT, anything like that?

S1

Speaker 1

51:18

Yeah, and we're talking about terabytes of memory. So it was very parallelized, and it had to be very fast too, because the more games that you could simulate, the stronger the bot would be.

S2

Speaker 2

51:30

So is there some like John Carmack style, like efficiencies you had to come up with, like an efficient way to represent the hand, all that kind of stuff?

S1

Speaker 1

51:39

There were all sorts of optimizations that I had to make to try to get this thing to run as fast as possible. Things like, how do you minimize the latency? How do you package things together so that you minimize the amount of communication between the different nodes?

S1

Speaker 1

51:53

How do you optimize the algorithms so that you can try to squeeze out more and more from the game that you're actually playing? All these kinds of different decisions that I had to make.

S2

Speaker 2

52:03

Just a fun question, what IDE did you use for C++?

S1

Speaker 1

52:10

I think I used Visual Studio, actually.

S2

Speaker 2

52:12

Okay, is that still carried through to today?

S1

Speaker 1

52:15

VS Code is what I use today. It seems like it's pretty popular.

S2

Speaker 2

52:18

It's what the community basically converged on. Okay, cool. So you got this super optimized C++ system and then you show up to the day of competition.

S2

Speaker 2

52:28

Yeah. Humans versus machine. How did it feel throughout the day?

S1

Speaker 1

52:34

Super stressful. I mean, I thought going into it that we had like a 50-50 chance, because basically, I thought if they play in a totally normal style, I think we'll squeak out a win. But there's always a chance that they can find some weakness in the bot.

S1

Speaker 1

52:50

And if they do, and we're playing like for 20 days, 120,000 hands of poker, they have a lot of time to find weaknesses in the system. And if they do, we're going to get crushed. And that's actually what happened in the previous competition. The humans, you know, they started out, it wasn't like they were winning from the start, but then they found these weaknesses that they could take advantage of.

S1

Speaker 1

53:07

And for the next, you know, like 10 days, they were just crushing the bot, stealing money from it.

S2

Speaker 2

53:12

What were the weaknesses they found? Like maybe over betting was effective, that kind of stuff. So certain betting strategies worked.

S1

Speaker 1

53:19

What they found is, yeah, over betting, like betting certain amounts, the bot would have a lot of trouble dealing with those sizes. And then also when the bot got into really difficult all-in situations, because it wasn't doing search, it had to clump different hands together and it would treat them identically. And so it wouldn't be able to distinguish, you know, having a king-high flush versus an ace-high flush.

S1

Speaker 1

53:46

And in some situations that really matters a lot. And so they could put the bot into those situations and then the bot would just bleed money.

S2

Speaker 2

53:52

Clever humans. Okay, so I didn't realize it was over 20 days. So what were the humans like over those 20 days?

S2

Speaker 2

54:02

And what was the bot like?

S1

Speaker 1

54:04

So we had set up the competition, like I said, there was $200,000 in prize money and they would get paid a fraction of that depending on how well they did relative to each other.

S2

Speaker 2

54:14

So I

S1

Speaker 1

54:14

was kind of hoping that they wouldn't work together to try to find weaknesses in the bot, but they entered the competition with their number 1 objective being to beat the bot. And they didn't care about individual glory. They were like, we're all going to work as a team to try to take down this bot.

S1

Speaker 1

54:28

So they immediately started comparing notes. What they would do is they would coordinate looking at different parts of the strategy to try to find out weaknesses. And then at the end of the day, we actually sent them a log of all the hands that were played and what cards the bot had on each of those hands.

S2

Speaker 2

54:46

Oh, wow. Yeah. That's gutsy.

S1

Speaker 1

54:49

Yeah, honestly, I'm not sure why we did that in retrospect, but I mean, I'm glad we did it because we ended up winning anyway. But if you've ever played poker before, that is golden information. Usually when you play poker, you see only about a third of the hands go to showdown. And to just hand them all the cards that the bot had on every single hand, that was just a goldmine for them.

S1

Speaker 1

55:11

And so then they would review the hands and try to see, okay, could they find patterns in the bot, weaknesses? And then they would coordinate and study together and try to figure out, okay, this person's going to explore this part of the strategy for weaknesses, this person's going to explore that part of the strategy for weaknesses.

S2

Speaker 2

55:27

It's a kind of psychological warfare, showing them the hands. Yeah. I mean, I'm sure you didn't think of it that way, but doing that means you're confident in the bot's ability to win.

S1

Speaker 1

55:38

Well, that's 1 way of putting it. I wasn't super confident. So, you know, going in, like I said, I think I had like 50-50 odds on us winning.

S1

Speaker 1

55:46

When we announced the competition, the poker community decided to gamble on who would win, and their initial odds against us were like 4 to 1. They were really convinced that the humans were gonna pull out a win. The bot ended up winning for 3 days straight. And even then, after 3 days, the betting odds were still just 50-50.

S1

Speaker 1

56:08

And then at that point, it started to look like the humans were coming back. But poker is a very high-variance game, and I think what happened is they thought that they spotted some weaknesses that weren't actually there. And then around day 8, it was just very clear that they were getting absolutely crushed.

S1

Speaker 1

56:28

And from that point, I mean, for a while there, I was super stressed out thinking like, oh my God, the humans are coming back and they've found weaknesses and now we're just gonna lose the whole thing. But no, it ended up going in the other direction and the bot ended up crushing them in the long run.

S2

Speaker 2

56:42

How did it feel at the end, like as a human being, as a person who loves, appreciates the beauty of the game of poker, and as a person who appreciates the beauty of AI, is there, did you feel a certain kind of way about it?

S1

Speaker 1

56:59

I felt a lot of things, man. I mean, at that point in my life, I had spent 5 years working on this project and it was a huge sense of accomplishment. I mean, to spend 5 years working on something and finally see it succeed, yeah, I wouldn't trade that for anything in the world.

S2

Speaker 2

57:15

Yeah, because that's a real benchmark. It's not like getting some percent accuracy on a data set. This is like real.

S2

Speaker 2

57:24

This is real world. It's just a game, but it's also a game that means a lot to a lot of people. And this is humans doing their best to beat the machine. So this is a real benchmark, unlike anything else.

S1

Speaker 1

57:36

Yeah, and I mean, this is what I had been dreaming about since I was like 16, playing poker, you know, with my friends in high school. The idea that you could find a strategy, you know, approximate the Nash equilibrium, and be able to beat all the poker players in the world with it. So to actually see that come to fruition and be realized, that was kind of magical.

S2

Speaker 2

57:58

Yeah, especially since money is on the line too. It's different than chess in that aspect. That's why you wanna look at betting markets if you want to actually understand what people really think.

S2

Speaker 2

58:11

And in the same sense, poker, it's really high stakes because it's money. And to solve that game, that's an amazing accomplishment. So the leap from that to multi-way six-player poker, how difficult is that jump? And what are some interesting differences between heads-up poker and multi-way poker?

S1

Speaker 1

58:32

Yeah, so I mentioned Nash equilibrium in two-player zero-sum games. If you play that strategy, you are guaranteed to not lose in expectation no matter what your opponent does. Now, once you go to six-player poker, you're no longer playing a two-player zero-sum game.

S1

Speaker 1

58:45

And so there was a lot of debate among the academic community and among the poker community about how well these techniques would extend beyond just two-player heads-up poker. Now, what I had come to realize is that the techniques really would extend to six-player poker. Because even though in theory they don't give you these guarantees outside of two-player zero-sum games, in practice it still gives you a really strong strategy. Now, there were a lot of complications that would come up with six-player poker.
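The two-player zero-sum guarantee being discussed can be made concrete with regret matching, a standard iterative method whose average strategy converges to a Nash equilibrium in such games. This toy rock-paper-scissors sketch is purely illustrative; Libratus and Pluribus used counterfactual regret minimization variants at vastly larger scale:

```python
# Regret matching on rock-paper-scissors: each player plays actions in
# proportion to accumulated positive regret. In a two-player zero-sum
# game, the time-averaged strategy approaches a Nash equilibrium
# (here, uniform 1/3 each). Illustrative sketch only.
import random

random.seed(0)

ACTIONS = 3  # rock, paper, scissors
# PAYOFF[a][b]: payoff to player 0 when player 0 plays a, player 1 plays b
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Play actions in proportion to their positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # uniform when no positive regret

def train(iterations=20000):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        for p in range(2):
            for a in range(ACTIONS):
                strategy_sum[p][a] += strats[p][a]
        a0, a1 = (random.choices(range(ACTIONS), weights=s)[0]
                  for s in strats)
        for a in range(ACTIONS):
            # regret: how much better action a would have done vs. reality
            regrets[0][a] += PAYOFF[a][a1] - PAYOFF[a0][a1]
            regrets[1][a] += -PAYOFF[a0][a] - (-PAYOFF[a0][a1])
    # the *average* strategy is what converges to the equilibrium
    return [[s / iterations for s in row] for row in strategy_sum]

avg0, avg1 = train()
```

After training, each entry of `avg0` and `avg1` is close to 1/3: no matter what the opponent does, this mixed strategy cannot be beaten in expectation, which is exactly the property that made the "read their soul" objection moot in the heads-up match.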

S1

Speaker 1

59:15

Besides the game-theoretic aspect, I mean, for 1, the game is just exponentially larger. So the main thing that allowed us to go from two-player to six-player was the idea of depth-limited search. So I said before, we would do search. We would plan out.

S1

Speaker 1

59:33

The bot would plan out what it's going to do next and for the next several moves. And in Libratus, that search was done extending all the way to the end of the game. So it would have to start from the turn onwards, looking maybe 10 moves ahead, and figure out what it was doing for all those moves. Now, when you get to six-player poker, it can't do that exhaustive search anymore because the game is just way too large.
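The contrast being described can be sketched on a toy game: exhaustive search runs to the end of the game, while depth-limited search stops a few moves ahead and substitutes an estimated value for each cut-off subtree. The subtraction game below is a hypothetical stand-in for poker, and the simple leaf evaluator is an assumption for illustration; Pluribus's actual leaf values came from reasoning about multiple continuation strategies:

```python
# Depth-limited search sketch on a toy subtraction game: players
# alternately take 1 or 2 tokens from a pile of n; whoever takes the
# last token wins. Hypothetical illustration, not the Pluribus code.

def search(n, depth, estimate):
    """Negamax value for the player to move."""
    if n == 0:
        return -1.0          # previous player took the last token: we lost
    if depth == 0:
        return estimate(n)   # depth limit hit: estimate instead of searching on
    return max(-search(n - m, depth - 1, estimate)
               for m in (1, 2) if m <= n)

# Exhaustive search all the way to the end of the game (Libratus-style);
# the depth limit is never reached, so the leaf estimate never fires:
full = search(10, 10**9, lambda n: 0.0)

# Depth-limited search (Pluribus-style): look only 2 moves ahead, then
# consult a leaf evaluator. Here the evaluator encodes the known fact
# that positions with n divisible by 3 are losses for the player to move.
limited = search(10, 2, lambda n: -1.0 if n % 3 == 0 else 1.0)

print(full, limited)  # both conclude n=10 is a win for the player to move
```

With a good leaf evaluator the shallow search reaches the same conclusion as the exhaustive one while exploring far fewer nodes, which is what makes search tractable when the full game is exponentially large.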