Noam Brown: AI vs Humans in Poker and Games of Strategic Negotiation | Lex Fridman Podcast #344

2 hours 29 minutes 21 seconds


S1

Speaker 1

00:00

A lot of people were saying like, oh, this whole idea of game theory, it's just nonsense. And if you really want to make money, you got to like look into the other person's eyes and read their soul and figure out what cards they have. But what happened was we played our bot against 4 top heads-up no-limit hold'em poker players. And the bot wasn't trying to adapt to them.

S1

Speaker 1

00:19

It wasn't trying to exploit them. It wasn't trying to do these mind games. It was just trying to approximate the Nash equilibrium and it crushed them.

S2

Speaker 2

00:28

The following is a conversation with Noam Brown, research scientist at FAIR, Facebook AI research group at Meta AI. He co-created the first AI system that achieved superhuman level performance in no limit Texas Hold'em, both heads up and multiplayer. And now recently, he co-created an AI system that can strategically out-negotiate humans using natural language in a popular board game called Diplomacy, which is a war game that emphasizes negotiation.

S2

Speaker 2

00:59

This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Noam Brown. You've been a lead on 3 amazing AI projects.

S2

Speaker 2

01:12

So we got Libratus that solved, or at least achieved human level performance on No Limit Texas Hold'em poker with 2 players, heads up. You got Pluribus that solved No Limit Texas Hold'em poker with 6 players. And just now you have Cicero, these are all names of systems, that solved or achieved human level performance on the game of diplomacy, which for people who don't know, is a popular strategy board game. It was loved by JFK, John F.

S2

Speaker 2

01:44

Kennedy, and Henry Kissinger, and many other big famous people in the decades since. So let's talk about poker and diplomacy today. First, poker. What is the game of No Limit Texas Hold'em?

S2

Speaker 2

01:59

And how is it different from chess?

S1

Speaker 1

02:00

Well, no limit Texas Hold'em poker is the most popular variant of poker in the world. So, you go to a casino, you sit down at the poker table, the game that you're playing is no limit Texas Hold'em. If you watch movies about poker, like Casino Royale or Rounders, the game that they're playing is no limit Texas hold'em poker.

S1

Speaker 1

02:17

Now it's very different from limit hold'em in that you can bet any amount of chips that you want and so the stakes escalate really quickly. You start out with like 1 or 2 dollars in the pot and then by the end of the hand, you've got like a thousand dollars in there maybe.

S2

Speaker 2

02:32

So the option to increase the number very aggressively and very quickly is always there.

S1

Speaker 1

02:36

Right, the no limit aspect is there's no limits to how much you can bet. In limit hold'em, there's like $2 in the pot, you can only bet like $2. But if you got $10,000 in front of you, you're always welcome to put $10,000 into the pot.

S2

Speaker 2

02:50

So I've got a chance to hang out with Phil Hellmuth, who plays all these different variants of poker, and correct me if I'm wrong, but it seems like no limit rewards crazy, versus the other ones reward a more kind of calculated strategy. Or no? Because you're sort of looking at it from an analytic perspective: is strategy also rewarded in no limit Texas hold'em?

S1

Speaker 1

03:15

I think both variants reward strategy, but I think what's different about no limit hold'em is it's much easier to get jumpy. You go in there thinking you're going to play for like $100 or something and suddenly there's like $1,000 in the pot. A lot of people can't handle that.

S2

Speaker 2

03:32

Can you define jumpy?

S1

Speaker 1

03:33

When you're playing poker, you always want to choose the action that's going to maximize your expected value. It's kind of like with investing, right? If you're ever in a situation where the amount of money that's at stake is going to have a material impact on your life, then you're going to play in a more risk-averse style.

S1

Speaker 1

03:51

If you're playing no limit hold'em and somebody makes a huge bet, there might come a point where you're like, this is too much money for me to handle. Like I can't risk this amount. And that's what throws a lot of people off. So that's the big difference, I think, between no limit and limit.

S2

Speaker 2

04:09

What about on the action side, when you're actually making that big bet? That's what I mean by crazy. I was trying to refer to the technical term of crazy, meaning use the big jump in the bet to completely throw off the other person in terms of their ability to reason optimally.

S1

Speaker 1

04:30

I think that's right. I think 1 of the key strategies in poker is to put the other person into an uncomfortable position. And if you're doing that, then you're playing poker well.

S1

Speaker 1

04:40

And there's a lot of opportunities to do that in no limit hold'em. You can have like $50 in there, you throw in a $1,000 bet. And that's sometimes, if you do it right, it puts the other person in a really tough spot. Now, it's also possible that you make huge mistakes that way.

S1

Speaker 1

04:56

And so it's really easy to lose a lot of money in no limit hold'em if you don't know what you're doing. But there's a lot of upside potential too.

S2

Speaker 2

05:02

So when you build systems, AI systems that play these games, we'll talk about poker, we'll talk about diplomacy, are you drawn in in part by the beauty of the game itself, AI aside, or is it to you primarily a fascinating problem set for the AI to solve?

S1

Speaker 1

05:19

I'm drawn in by the beauty of the game. I started playing poker when I was in high school, and the idea to me was that there is a correct, an objectively correct way of playing poker. And if you could figure out what that is, then you're making unlimited money, basically.

S1

Speaker 1

05:38

That's like a really fascinating concept to me. And so I was fascinated by the strategy of poker, even when I was like 16 years old. It wasn't until much later that I actually worked on poker AIs.

S2

Speaker 2

05:48

So there was a sense that you can solve poker, like in the way you can solve chess, for example, or checkers. I believe checkers got solved, right?

S1

Speaker 1

05:57

Yeah, checkers got completely solved.

S2

Speaker 2

05:59

Optimal strategy.

S1

Speaker 1

06:00

Optimal strategy. It's impossible to beat the AI.

S2

Speaker 2

06:01

Yeah, and so in that same way, you could technically solve chess.

S1

Speaker 1

06:05

You could solve chess, you could solve poker.

S2

Speaker 2

06:07

You could solve poker.

S1

Speaker 1

06:09

So this gets into the concept of a Nash equilibrium. Okay, so in any finite two-player zero-sum game, there is an optimal strategy, called a Nash equilibrium, such that if you play it, you are guaranteed to not lose in expectation no matter what your opponent does.

S1

Speaker 1

06:26

And this is kind of a radical concept to a lot of people, but it's true in chess. It's true in poker. It's true in any finite two-player zero-sum game. And to give some intuition for this, you can think of rock, paper, scissors.

S1

Speaker 1

06:38

In rock, paper, scissors, if you randomly choose between throwing rock, paper, and scissors with equal probability, then no matter what your opponent does, you are not going to lose in expectation. You're not going to lose in expectation in the long run. Now, the same is true for poker. There exists some strategy, some really complicated strategy, that if you play that, you are guaranteed to not lose money in the long run.
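As an illustrative aside (not from the conversation), the rock-paper-scissors claim above is easy to check numerically. The payoff matrix and the sample opponent strategies here are standard conventions, not anything from the transcript:

```python
# Payoff matrix for the row player in rock-paper-scissors:
# rows = our throw, columns = the opponent's (0=rock, 1=paper, 2=scissors).
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]

def expected_value(our_mix, opp_mix):
    """Expected payoff of one mixed strategy against another."""
    return sum(
        our_mix[i] * opp_mix[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )

uniform = [1 / 3, 1 / 3, 1 / 3]

# Against any opponent mix, the uniform strategy breaks even in expectation.
for opp in ([1, 0, 0], [0.6, 0.3, 0.1], [0, 0, 1]):
    assert abs(expected_value(uniform, opp)) < 1e-9
```

The same check fails for any non-uniform mix: a predictable strategy leaves money on the table against a best-responding opponent.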

S1

Speaker 1

07:00

And I should say, this is for 2 player poker. 6 player poker is a different story.

S2

Speaker 2

07:03

Yeah, it's a beautiful giant mess. When you say in expectation, you're guaranteed not to lose in expectation. What does in expectation mean?

S1

Speaker 1

07:12

Poker is a very high variance game. So you're gonna have hands where you win, you're gonna have hands with your lose. Even if you're playing the perfect strategy, you can't guarantee that you're gonna win every single hand.

S1

Speaker 1

07:20

But if you play for long enough, then you are guaranteed to at least break even and in practice probably win.

S2

Speaker 2

07:27

So that's in expectation, the size of your stack, generally speaking. Now that doesn't include anything about the fact that you can go broke. It doesn't include any of those kinds of normal real world limitations.

S2

Speaker 2

07:39

You're talking in a theoretical world. What about the 0 sum aspect? How big of a constraint is that? How big of a constraint

S1

Speaker 1

07:46

is finite? So finite's not a huge constraint. So I mean, most games that you play are finite in size.

S1

Speaker 1

07:54

It's also true, actually, that there exists this perfect strategy in many infinite games as well. Technically, the game has to be compact. There are like some edge cases where you don't have a Nash equilibrium in a two-player zero-sum game. So you can think of a game where you're like, you know, if we're playing a game where whoever names the bigger number is the winner, there's no Nash equilibrium to that game.

S1

Speaker 1

08:13

17. 18.

S2

Speaker 2

08:16

You win again. You're good at this.

S1

Speaker 1

08:18

I played a lot of games.

S2

Speaker 2

08:21

Okay, so that's, and then the 0 sum aspect.

S1

Speaker 1

08:24

The zero-sum aspect. So there exist Nash equilibria in non-two-player zero-sum games as well.

S1

Speaker 1

08:30

And by the way, just to clarify what I mean by 2 players, 0 sum, I mean there's 2 players, and whatever 1 player wins, the other player loses. So if we're playing poker and I win $50, that means that you're losing $50. Now, outside of two-player zero-sum games, there still exists Nash equilibria, but they're not as meaningful. Because you can think of a game like Risk.

S1

Speaker 1

08:51

If everybody else on the board decides to team up against you and take you out, there's no perfect strategy you can play that's going to guarantee that you win there. There's just nothing you can do. So outside of two-player zero-sum games, there's no guarantee that you're going to win by playing a Nash equilibrium.

S2

Speaker 2

09:07

Have you ever tried to model in the other aspects of the game, which is like the pleasure you draw from playing the game? And then if you're a professional poker player, if you're exciting, even if you lose, you know, there's the money you would get from the attention, from the sponsors, and all that kind of stuff. That'd be a fun thing to model in. Or does that make it sort of super complex, to include the human factor in its full complexity?

S1

Speaker 1

09:36

I think you bring up a couple good points there. So I think a lot of professional poker players, I mean, they get a huge amount of money, not from actually playing poker, but from the sponsorships and having a personality that people want to tune in and watch. That's a big way to make a name for yourself in poker.

S2

Speaker 2

09:53

I just wonder from an AI perspective, if you create, and we'll talk about this more, maybe AI system that also talks trash and all that kind of stuff, that that becomes part of the function to maximize. So it's not just optimal poker play. Maybe sometimes you want to be chaotic.

S2

Speaker 2

10:10

Maybe sometimes you want to be suboptimal and you lose the chaos. And maybe sometimes you want to be overly aggressive because the audience loves that. That'd be fascinating.

S1

Speaker 1

10:22

I think what you're getting at here is that there's a difference between making an AI that wins a game and an AI that's fun to play

S2

Speaker 2

10:27

with.

S1

Speaker 1

10:28

Yeah. Yeah.

S2

Speaker 2

10:29

Or fun to watch. So those are all different things. Fun to play with and fun to watch.

S1

Speaker 1

10:33

Yeah, and I've heard talks from game designers, people that work on AI for actual recreational games that people play. And they say, yeah, there's a big difference between that and trying to make an AI that actually wins.

S1

Speaker 1

10:46

And you look at a game like Civilization, the way that the AIs play is not optimal for trying to win. They're playing a different game. They're trying to have personalities. They're trying to be fun and engaging and that makes for a better game.

S2

Speaker 2

11:00

Yeah. And we also talk about NPCs. I just talked to Todd Howard, who is the creator of Fallout and the Elder Scrolls series and Starfield, the new game coming out. And the creator of what I think is the greatest game of all time, which is Skyrim and the NPCs there.

S2

Speaker 2

11:15

The AI that governs that whole game is very interesting, but the NPCs also are super interesting. And considering what language models might do to NPCs in an open world RPG role-playing game, it's super exciting.

S1

Speaker 1

11:30

Yeah, honestly, I think this is 1 of the first applications where we're going to see real consumer interaction with large language models. I guess Elder Scrolls VI is in development now. They're probably pretty close to finishing it.

S1

Speaker 1

11:44

But I would not be surprised at all if Elder Scrolls VII was using large language models for their NPCs.

S2

Speaker 2

11:49

No, they're not. I mean, I'm not saying anything. I'm not saying anything.

S1

Speaker 1

11:53

Okay, this is me speculating, not you.

S2

Speaker 2

11:55

No, but they're just releasing the Starfield game. They do 1 game at a time. And so whatever it is, whenever the date is, I don't know what the date is, calm down.

S2

Speaker 2

12:06

But it would be, I don't know, like 2024, 25, 26. So it's actually very possible that would include language models.

S1

Speaker 1

12:14

I was listening to this talk by a gaming executive when I was in grad school, and 1 of the questions that a person in the audience asked is, why are all these games so focused on fighting and killing? And the person responded that it's just so much harder to make an AI that can talk with you and cooperate with you than it is to make an AI that can fight you. And I think once this technology develops further and you can reach a point where not every single line of dialogue has to be scripted, it unlocks a lot of potential for new kinds of games, much more positive interactions that are not so focused on fighting.

S1

Speaker 1

12:50

And I'm really looking forward to that.

S2

Speaker 2

12:51

It might not be positive. It might be just drama. So you'll be in like a call of duty game.

S2

Speaker 2

12:56

Instead of doing the shooting, you'll just be hanging out and arguing with an AI, being passive aggressive, and then you won't be able to sleep that night, and you have to return and continue the argument because you were emotionally hurt. I mean, yeah, I think that's actually an exciting world. Whatever is the drama, the chaos that we love, the push and pull of human connection, I think it's possible to do that in the video game world. And I think you could be messier and make more mistakes in the video game world, which is why it would be a nice place.

S2

Speaker 2

13:28

And also it doesn't have as deep of a real psychological impact because inside video games it's kind of understood that you're not in a real world. So whatever crazy stuff AI does, we have some flexibility to play. Just like with a game of diplomacy, it's a game. This is not real geopolitics, not real war.

S2

Speaker 2

13:49

It's a game. So you can have a little bit of fun, a little bit of chaos. Okay, back to Nash equilibrium. How do we find the Nash equilibrium?

S1

Speaker 1

13:59

All right, so there's different ways to find a Nash equilibrium. So the way that we do it is with this process called self-play. Basically we have this algorithm that starts by playing totally randomly and it learns how to play the game by playing against itself.

S1

Speaker 1

14:15

So it will start playing the game totally randomly, and then if it's playing poker, it'll eventually get to the end of the game and make $50. And then it will review all the decisions that it made along the way and say, what would have happened if I had chosen this other action instead? You know, if I had raised here instead of called, what would the other player have done? And because it's playing against a copy of itself, it's able to do that counterfactual reasoning.

S1

Speaker 1

14:42

So I can say, okay, well, if I took this action and the other person takes this action, and then I take this action. And eventually, I make $150 instead of $50. And so it updates the regret value for that action. Regret is basically like, how much does it regret having not played that action in the past?

S1

Speaker 1

15:00

And when it encounters that same situation again, it's going to pick actions that have higher regret with higher probability. Now, it'll just keep simulating the games this way. It'll keep, you know, accumulating regrets for different situations. And in the long run, if you pick actions that have high regret with higher probability in the correct way, it's proven to converge to a Nash equilibrium.
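The regret-updating loop described here can be sketched in a few lines of Python. This is an illustrative regret-matching self-play loop on rock-paper-scissors, not the actual Libratus code; the payoff matrix, iteration count, and the slightly asymmetric starting regrets are assumptions made just for the demo:

```python
ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS  # nothing regretted yet: play uniformly
    return [r / total for r in positive]

def action_values(opp_mix):
    """Expected payoff of each of our pure actions against the opponent's mix."""
    return [
        sum(opp_mix[j] * PAYOFF[a][j] for j in range(ACTIONS))
        for a in range(ACTIONS)
    ]

def train(iterations=50_000):
    # Seed one player with a slight asymmetry so the dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        for p in range(2):
            values = action_values(strats[1 - p])
            baseline = sum(strats[p][a] * values[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                # "How much do I regret not having played a instead?"
                regrets[p][a] += values[a] - baseline
                strategy_sum[p][a] += strats[p][a]
    # The AVERAGE strategy over all iterations converges to the equilibrium.
    return [[s / iterations for s in row] for row in strategy_sum]

avg = train()  # both players' average mixes approach (1/3, 1/3, 1/3)
```

Note that it's the average strategy over all iterations, not the final one, that is proven to converge to the Nash equilibrium; the per-iteration strategies can keep cycling.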

S2

Speaker 2

15:24

Even for super complex games? Even for imperfect information games?

S1

Speaker 1

15:28

It's true for all games. It's true for chess, it's true for poker, it's particularly useful for poker.

S2

Speaker 2

15:33

So this is the method of counterfactual regret minimization?

S1

Speaker 1

15:36

This is counterfactual regret minimization.

S2

Speaker 2

15:37

Does that have to do with self-play specifically? If you follow this kind of process, self-play or not, will you be able to arrive at an optimal set of actions?

S1

Speaker 1

15:48

So this counterfactual regret minimization is a kind of self-play. It's a principled kind of self-play that's proven to converge to Nash equilibria, even in imperfect information games. Now you can have other forms of self-play, and people use other forms of self-play for perfect information games, where you have more flexibility. The algorithm doesn't have to be as theoretically sound in order to converge in that class of games because it's a simpler setting.

S2

Speaker 2

16:11

Sure, so I kind of, in my brain, the word self-play has mapped to neural networks, but we're speaking something bigger than just neural networks. It could be anything. The self-play mechanism is just the mechanism of a system playing itself.

S1

Speaker 1

16:26

Exactly, yeah. Self-play is not tied specifically to neural nets. It's a kind of reinforcement learning, basically.

S1

Speaker 1

16:31

And I would also say this process of, like, trying to reason, oh, what would the value have been if I had taken this other action instead? This is very similar to how humans learn to play a game like poker, right? Like, you probably played poker before, and with your friends you probably ask, like, oh, would you have called me if I raised there? You know, and that's a person trying to do the same kind of learning from a counterfactual that the AI is doing.

S2

Speaker 2

16:54

Okay, and if you do that at scale, you're gonna be able to learn an optimal policy.

S1

Speaker 1

16:59

Yeah. Now, where the neural nets come in: I said, like, okay, if it's in that situation again, then it will choose the action that has high regret. Now the problem is that poker is such a huge game. I think no limit Texas Hold'em, the version that we were playing, has 10 to the 161 different decision points, which is more than the number of atoms in the universe squared.

S2

Speaker 2

17:18

That's heads up?

S1

Speaker 1

17:18

That's heads up.

S2

Speaker 2

17:19

Yeah. 10 to the 161, you said?

S1

Speaker 1

17:21

Yeah. I mean, it depends on the number of chips that you have, the stacks and everything, but like the version that we were playing was 10 to the 161.

S2

Speaker 2

17:27

Which I assume would be a somewhat simplified version anyway. Because I bet there's some like step function you had for like bets.

S1

Speaker 1

17:36

Oh, no, no, no. I'm saying, like, we played the full game. You can bet whatever amount you want.

S1

Speaker 1

17:39

And the bot maybe was constrained in like what it considered for bet sizes, but the person on the other side could bet whatever they wanted.

S2

Speaker 2

17:45

Yeah, I mean, 161 plus or minus 10 doesn't matter. Yeah.

S1

Speaker 1

17:51

And so the way neural nets help out here is, you don't have to run into the same exact situation because that's never gonna happen again. The odds of you running into the same exact situation are pretty slim. But if you run into a similar situation, then you can generalize from other states that you've been in that kind of look like that 1.

S1

Speaker 1

18:06

And you can say like, well, these other situations, I had high regret for this action. And so maybe I should play that action here as well.

S2

Speaker 2

18:12

Which is the more complex game, chess or poker? Or go or poker? Do you know?

S1

Speaker 1

18:19

That is a controversial question.

S2

Speaker 2

18:21

Okay.

S1

Speaker 1

18:21

I'm gonna-

S2

Speaker 2

18:22

It's like somebody's screaming on Reddit right now. It depends on which subreddit you're on. Is it chess or is it poker?

S1

Speaker 1

18:27

I'm sure, like, David Silver is gonna get really angry at me. Yeah. I'm gonna say poker, actually.

S1

Speaker 1

18:31

And I think for a couple of reasons.

S2

Speaker 2

18:33

They're not here to defend themselves.

S1

Speaker 1

18:36

So first of all, you have the imperfect information aspect. And so it's, we can go into that, but like once you introduce imperfect information, things get much more complicated.

S2

Speaker 2

18:48

So we should say, maybe you can describe what is seen to the players, what is not seen in the game of Texas Hold'em.

S1

Speaker 1

18:57

Yeah, so Texas Hold'em, you get 2 cards face down that only you see. And so that's the hidden information of the game. The other players also all get 2 cards face down that only they see.

S1

Speaker 1

19:08

And so you have to kind of, as you're playing, reason about like, okay, what do they think I have? What do they have? What do they think I think they have? That kind of stuff.

S1

Speaker 1

19:16

And that's kind of where bluffing comes into play, right? Because the fact that you can bluff, the fact that you can bet with a bad hand and still win is because they don't know what your cards are. And that's the key difference between a perfect information game like chess and go and imperfect information games like poker.

S2

Speaker 2

19:36

This is what trash talk looks like. The implied statement is the game I solved is much tougher. But yeah, so when you're playing, I'm just gonna do random questions here.

S2

Speaker 2

19:49

So when you're playing your opponent under imperfect information, is there some degree to which you're trying to estimate the range of hands that they have? Or is that not part of the algorithm? So what are the different approaches to the imperfect information game?

S1

Speaker 1

20:06

So the key thing to understand about why imperfect information makes things difficult is that you have to worry not just about which actions to play, but the probability that you're going to play those actions. So you think about Rock-Paper-Scissors, for example. Rock-Paper-Scissors is an imperfect information game because you don't know what I'm about to throw.

S2

Speaker 2

20:26

I do, but yeah, usually not. Yeah.

S1

Speaker 1

20:29

And so you can't just say like, I'm just going to throw a rock every single time because the other person's going to figure that out and notice a pattern and then suddenly you're going to start losing. And so you don't just have to figure out like which action to play, you have to figure out the probability that you play it. And really importantly, the value of an action depends on the probability that you're going to play it.

S1

Speaker 1

20:47

So if you're playing rock every single time, that value is really low. But if you're never playing rock, you play rock like 1% of the time, then suddenly the other person is probably going to be throwing scissors. And when you throw rock, the value of that action is going to be really high. Now, you take that to poker, what that means is the value of bluffing, for example, if you're the kind of person that never bluffs and you have this reputation as somebody that never bluffs, and suddenly you bluff, there's a really good chance that that bluff is going to work, and you're going to make a lot of money.
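To make the "value depends on how often you play it" point concrete, here is an illustrative sketch (not from the conversation) in rock-paper-scissors: once the opponent has adapted to your announced mix, the value of throwing rock swings from terrible to great depending on how rarely you throw it.

```python
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rows: our throw, cols: theirs
ROCK, PAPER, SCISSORS = 0, 1, 2

def best_response(our_mix):
    """Pure action maximizing the opponent's payoff against our mix (zero-sum)."""
    def opp_value(a):
        return sum(our_mix[i] * -PAYOFF[i][a] for i in range(3))
    return max(range(3), key=opp_value)

def value_of_rock(our_mix):
    """Our payoff for throwing rock once the opponent has adapted to our mix."""
    return PAYOFF[ROCK][best_response(our_mix)]

# Always throwing rock: the opponent switches to paper, so rock is worth -1.
assert value_of_rock([1.0, 0.0, 0.0]) == -1
# Throwing rock only 1% of the time: the opponent plays scissors (to beat our
# frequent paper), so the rare rock is worth +1.
assert value_of_rock([0.01, 0.99, 0.0]) == 1
```

The bluffing analogy is the same: the rarely-played action gains value precisely because the opponent has adapted to everything else you do.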

S1

Speaker 1

21:18

On the other hand, if you've got a reputation, like if they've seen you play for a long time, and they see, oh, you're the kind of person that's bluffing all the time, when you bluff, they're not going to buy it. And they're going to call you down. You're going to lose a lot of money. And finding that balance of how often you should be bluffing is the key challenge of a game of poker.

S1

Speaker 1

21:37

And you contrast that with a game like chess. It doesn't matter if you're opening with the Queen's Gambit 10% of the time or 100% of the time. The value, the expected value, is the same. So that's why we need these algorithms that understand that we have to figure out not just which actions are good, but the probabilities.

S1

Speaker 1

21:57

We need to get the exact probabilities correct. And that's actually why, when we created the bot Libratus, we gave it that name: Libratus means balanced, because the algorithm that we designed was designed to find that right balance of how often it should play each action.

S2

Speaker 2

22:10

The balance of how often. And the key sort of branching is to bluff or not to bluff. Is that a good crude simplification of the major decision in poker?

S1

Speaker 1

22:20

It's a good simplification. I think that's like the main tension, but it's not just how often to bluff or not to bluff. It's like, how often should you bet in general?

S1

Speaker 1

22:29

How often should you, what kind of bet should you make? Should you bet big or should you bet small? And with which hands? And so this is where the idea of a range comes from.

S1

Speaker 1

22:40

Because when you're bluffing with a particular hand in a particular spot, you don't want there to be a pattern for the other person to pick up on. You don't want them to figure out, oh, whenever this person is in this spot, they're always bluffing. And so you have to reason about, okay, would I also bet with a good hand in this spot? You wanna be unpredictable.

S1

Speaker 1

22:59

So you have to think about, what would I do if I had this different set of cards.

S2

Speaker 2

23:04

Is there explicit estimation of, like, a theory of mind that the other person has about you, or is that just an emergent thing that happens?

S1

Speaker 1

23:14

The way that the bots handle it that are really successful, they have an explicit theory of mind. So they're explicitly reasoning about what's the common knowledge belief? What do you think I have?

S1

Speaker 1

23:27

What do I think you have? What do you think I think you have? It's explicitly reasoning about that.

S2

Speaker 2

23:32

Are there multiple levels there? So maybe that's jumping ahead to 6 players, but is there a stickiness to the person? So it's an iterative game, you're playing the same person.

S2

Speaker 2

23:45

There's a stickiness to that, right? You're gathering information as you play. It's not every hand is a new hand. Is there a continuation in terms of estimating what kind of player I'm facing here?

S1

Speaker 1

23:59

That's a good question. So you could approach the game that way. The way that the bots do it, they don't. And the way that humans approach it also, expert human players, the way they approach it is to basically assume that you know my strategy.

S1

Speaker 1

24:13

So I'm going to try to pick a strategy where even if I were to play it for 10,000 hands and you could figure out exactly what it was, you still wouldn't be able to beat it. Basically, what that means is I'm trying to approximate the Nash equilibrium. I'm trying to be perfectly balanced. Because if I'm playing the Nash equilibrium, even if you know what my strategy is, like I said, I'm still unbeatable in expectation.

S1

Speaker 1

24:33

So that's what the bot aims for. And that's actually what a lot of expert poker players aim for as well, to start by playing the Nash equilibrium. And then maybe if they spot weaknesses in the way you're playing, then they can deviate a little bit to take advantage of that.

S2

Speaker 2

24:47

They aim to be unbeatable in expectation. Okay, so who's the greatest poker player of all time and why is it Phil Hellmuth? So this is for Phil.

S2

Speaker 2

24:56

So he's known, at least in part, for maybe playing suboptimally and he still wins a lot. It's a bit chaotic. So maybe can you speak from an AI perspective about the genius of his madness or the madness of his genius? So playing suboptimally, playing chaotically as a way to make you hard to pin down about what your strategy is.

S1

Speaker 1

25:23

So, okay. The thing that I should explain first of all with like Nash equilibrium, it doesn't mean that it's predictable. The whole point of it is that you're trying to be unpredictable.

S1

Speaker 1

25:32

Now I think where somebody like Phil Hellmuth might be really successful is not in being unpredictable, but in being able to take advantage of the other player and figure out where they're being predictable, or in guiding the other player into thinking that you have certain weaknesses and then understanding how they're going to change their behavior. They're gonna deviate from a Nash equilibrium style of play to try to take advantage of those perceived weaknesses, and then he can counter-exploit them. So you kind of get into the mind games there.

S2

Speaker 2

26:02

So you can think about heads-up poker as a dance between 2 agents. I guess, are you playing the cards or are you playing the player?

S1

Speaker 1

26:10

So this gets down to a big argument in the poker community and the academic community. For a long time, there was this debate of what's called GTO, Game Theory Optimal poker, versus exploitative play. And up until about 2017, when we did the Libratus match, I think actually exploitative play had the advantage.

S1

Speaker 1

26:29

A lot of people were saying, like, oh, this whole idea of game theory, it's just nonsense. And if you really want to make money, you got to look into the other person's eyes and read their soul and figure out what cards they have. But what happened was people started adopting the game theory optimal strategy, And they were making good money. And they weren't trying to adapt so much to the other player.

S1

Speaker 1

26:50

They were just trying to play the Nash equilibrium. And then what really solidified it, I think, was the Libratus match, where we played our bot against 4 top heads-up no-limit hold'em poker players. And the bot wasn't trying to adapt to them. It wasn't trying to exploit them.

S1

Speaker 1

27:05

It wasn't trying to do these mind games. It was just trying to approximate the Nash equilibrium. And it crushed them. I think, you know, we were playing for $50, $100 blinds.

S1

Speaker 1

27:17

And over the course of about 120,000 hands, it made close to $2 million.

S2

Speaker 2

27:21

120,000 hands. 120,000 hands. Against humans.

S1

Speaker 1

27:24

Yeah, and this was fake money, to be clear. But there was real money at stake. There was $200,000.

S2

Speaker 2

27:28

First of all, all money is fake, but that's a different conversation. We give it meaning. It's a phenomenon that gets meaning from our complex psychology as a human civilization.

S2

Speaker 2

27:43

It's emerging from the collective intelligence of the human species, but that's not what you mean. You mean like there's literally, you can't buy stuff with it. Okay, can you actually step back and take me through that competition?

S1

Speaker 1

27:55

Yeah, okay, so when I was in grad school, there was this thing called the Annual Computer Poker Competition, where every year, all the different research labs that were working on AI for poker would get together, they would make a bot, and they would play them against each other. And we made a bot that actually won the 2014 and 2016 competitions. And so we decided we're going to take this bot, build on it, and play against real top professional heads-up no-limit Texas Hold'em poker players.

S1

Speaker 1

28:25

So we invited 4 of the world's best players in this specialty, and we challenged them to 120,000 hands of poker over the course of 20 days. And we had $200,000 in prize money at stake, where it would basically be divided among them depending on how well they did relative to each other. So we wanted to have some incentive for them to play their best.

S2

Speaker 2

28:47

Did you have a confidence, 2014, 16, that this is even possible? How much doubt was there?

S1

Speaker 1

28:53

So we did a competition actually in 2015 where we also played against professional poker players and the bot lost by a pretty sizable margin actually. Now there were some big improvements from 2015 to 2017. And so-

S2

Speaker 2

29:06

Can you speak to the improvements? Is it computational in nature? Is it the algorithm, the methods?

S1

Speaker 1

29:11

It was really an algorithmic approach. That was the difference. So in 2015, it was much more focused on trying to come up with a strategy upfront, trying to solve the entire game of poker, and then just have a lookup table where you're saying, oh, I'm in this situation, what's the strategy?

S1

Speaker 1

29:29

The approach that we took in 2017 was much more search-based. It was trying to say, okay, well, let me in real time try to compute a much better strategy than what I had pre-computed by playing against myself during self-play.

S2

Speaker 2

29:42

What is the search space for poker? What are you searching over? What's that look like?

S2

Speaker 2

29:50

There's different actions like raising, calling. Yeah, what are the actions? Is it just a search over actions?

S1

Speaker 1

29:59

So in a game like chess, the search is like, okay, I'm in this chess position and I can move these different pieces and see where things end up. In poker, what you're searching over is the actions that you can take for your hand, the probabilities that you take those actions, and then also the probabilities that you take other actions with other hands that you might have. And that's kind of hard to wrap your head around.

S1

Speaker 1

30:23

Why are you searching over these other hands that you might have and trying to figure out what you would do with those hands? And the idea is, again, you wanna always be balanced and unpredictable. And so if your search algorithm is saying, oh, I want to raise with this hand, well, in order to know whether that's a good action, let's say it's a bluff, let's say you have a bad hand and you're saying, oh, I think I should be betting here with this really bad hand and bluffing.

S1

Speaker 1

30:50

Well, that's only a good action if you're also betting with a strong hand, otherwise it's an obvious bluff.

S2

Speaker 2

30:56

So if your action in some sense maximizes your unpredictability, so that action could be mapped by your opponent to a lot of different hands, then that's a good action.

S1

Speaker 1

31:07

Basically what you want to do is put your opponent into a tough spot. So you want them to always have some doubt, like should I call here, should I fold here? And if you are raising in the appropriate balance between bluffs and good hands, then you're putting them into that tough spot.

S1

Speaker 1

31:21

And so that's what we're trying to do. We're always trying to search for a strategy that would put the opponent into a difficult position.
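The "appropriate balance between bluffs and good hands" has a classic back-of-the-envelope form. As a minimal sketch (a single-street indifference calculation with made-up stakes, not anything Libratus literally computed):

```python
# Sketch of the classic indifference calculation behind balanced betting.
# Illustrative only: real solvers compute this jointly across entire ranges.

def indifferent_bluff_freq(pot: float, bet: float) -> float:
    """Fraction of bets that should be bluffs so that calling a bet of
    `bet` into a pot of `pot` is exactly break-even for the opponent."""
    # The caller risks `bet` to win `pot + bet`. If a fraction q of our
    # bets are bluffs:  q * (pot + bet) - (1 - q) * bet = 0
    # =>  q = bet / (pot + 2 * bet)
    return bet / (pot + 2 * bet)

print(indifferent_bluff_freq(pot=100, bet=100))  # 1/3 of pot-sized bets are bluffs
print(indifferent_bluff_freq(pot=100, bet=50))   # 0.25: smaller bets bluff less often
```

At that mix the opponent is indifferent between calling and folding, which is exactly the "tough spot" being described.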

S2

Speaker 2

31:26

Can you give a metric that you're trying to maximize or minimize? Does this have to do with the regret thing that we're talking about in terms of putting your opponent in a maximally tough spot?

S1

Speaker 1

31:37

Yeah, ultimately what you're trying to maximize is your expected winnings, like your expected value, the amount of money that you're gonna walk away with, assuming that your opponent was playing optimally in response. So you're going to assume that your opponent is also playing as well as possible, a Nash equilibrium approach, because if they're not, then you're just going to make more money. Right?

S1

Speaker 1

31:58

By definition, the Nash equilibrium is the strategy that does the best in expectation. And so if they're deviating from that, then they're gonna lose money. And since it's a 2-player zero-sum game, that means you're gonna make money.
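That zero-sum logic is easy to check in a toy game. A sketch using matching pennies (an assumed 2x2 payoff matrix, nothing like poker's real game tree): the equilibrium mix concedes nothing to a best response, while a predictable deviation bleeds value.

```python
# Toy 2-player zero-sum game (matching pennies); payoffs to the row player.
# Illustrative only: poker's equilibrium lives over a vastly larger game tree.
PAYOFFS = [[1.0, -1.0],
           [-1.0, 1.0]]

def best_response_value(row_strategy):
    """Expected payoff the row player is held to when the column player
    best-responds, i.e. picks the column worst for the row player."""
    n_cols = len(PAYOFFS[0])
    column_values = [
        sum(p * PAYOFFS[i][j] for i, p in enumerate(row_strategy))
        for j in range(n_cols)
    ]
    return min(column_values)

print(best_response_value([0.5, 0.5]))  # 0.0: the equilibrium mix can't be exploited
print(best_response_value([0.8, 0.2]))  # -0.6: a predictable deviation loses money
```

Whatever the deviator loses here, the other player gains, which is the zero-sum point being made above.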

S2

Speaker 2

32:12

So there's not an explicit objective function that maximizes the toughness of the spot they're put in. From a self-play reinforcement learning perspective, you're just trying to maximize winnings, and the rest is implicit.

S1

Speaker 1

32:27

That's right, yeah. So what we're actually trying to maximize is the expected value, given that the opponent is playing optimally in response to us. Now in practice, what that ends up looking like is it's putting the opponent into difficult situations where there's no obvious decision to be made.

S2

Speaker 2

32:41

So the system doesn't know anything about the difficulty of the situation?

S1

Speaker 1

32:46

Not at all, doesn't care.

S2

Speaker 2

32:47

Okay, in my head it was getting excited whenever I was making the other, the opponent sweat. Okay, so you're in 2015, you didn't do as well. So what's the journey from that to a system that in your mind could have a chance?

S1

Speaker 1

33:00

So in 2015, we got beat pretty badly, and we actually learned a lot from that competition. And in particular, what became clear to me is that the way the humans were approaching the game was very different from how the bot was approaching the game. The bot would not be doing search.

S1

Speaker 1

33:17

It would just be trying to compute. It would do like months of self-play. It would just be playing against itself for months, but then when it's actually playing the game, it would just act instantly. And the humans, when they're in a tough spot, they would sit there and think for sometimes even like 5 minutes about whether they're gonna call or fold a hand.

S1

Speaker 1

33:36

And it became clear to me that there's a good chance that that's what's missing from our bot. So I actually did some initial experiments to try to figure out how much of a difference does this actually make? And the difference was huge.

S2

Speaker 2

33:48

As a signal to the human player, how long you took to think?

S1

Speaker 1

33:52

No, no, no, I'm not saying that there were any timing tells. I was saying when the human, like the bot would always act instantly. It wouldn't try to come up with a better strategy in real time over what it had pre-computed during training.

S1

Speaker 1

34:04

Whereas the human, like they have all this intuition about how to play, but they're also in real time leveraging their ability to think, to search, to plan, and coming up with an even better strategy than what their intuition would say. So

S2

Speaker 2

34:17

you're saying that's what you mean by you're doing search also.

S1

Speaker 1

34:21

Yeah.

S2

Speaker 2

34:22

You have an intuition and search on top of that looking for a better solution.

S1

Speaker 1

34:28

Yeah, that's what I mean by search. Instead of acting instantly, a neural net usually gives you a response in like 100 milliseconds or something. It depends on the size of the net.

S1

Speaker 1

34:39

But if you can leverage extra computational resources, you can possibly get a much better outcome. And we did some experiments in small scale versions of poker. And what we found was that if you do a little bit of search, even just a little bit, it was the equivalent of making your pre-computed strategy, like you can kind of think of it as your neural net, a thousand times bigger. With just a little bit of search.

S1

Speaker 1

35:08

And it just like blew away all of the research that we had been working on and trying to like scale up this like pre-computed solution. It was dwarfed by the benefit that we got from search.

S2

Speaker 2

35:19

Can you just linger on what you mean by search here? You're searching over a space of actions for your hand and for other hands. How are you selecting the other hands to search over?

S2

Speaker 2

35:32

Is it randomly?

S1

Speaker 1

35:33

No, it's all the other hands that you could have. So when you're playing No Limit Texas Hold'em, you've got 2 face down cards. And so that's 52 choose 2, 1,326 different combinations.

S1

Speaker 1

35:43

Now that's actually a little bit lower because there's face up cards in the middle and so you can eliminate those as well.

S2

Speaker 2

35:48

But

S1

Speaker 1

35:48

you're looking at around a thousand different possible hands that you can have. And so when the bot's doing search, it's thinking explicitly: there are these thousand different hands that I could have, there are these thousand different hands that you could have. Let me try to figure out what would be a better strategy than what I've pre-computed for these hands and your hands.
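The arithmetic behind those hand counts is easy to verify. A small sketch (standard card encoding; the board cards are a hypothetical flop):

```python
from itertools import combinations

# A standard 52-card deck: 13 ranks x 4 suits.
ranks = "23456789TJQKA"
suits = "cdhs"
deck = [r + s for r in ranks for s in suits]

# Every possible 2-card starting hand: 52 choose 2 = 1,326.
all_hands = list(combinations(deck, 2))
print(len(all_hands))  # 1326

# Face-up board cards (a hypothetical flop) can't be in anyone's hand,
# so the set of possible hands shrinks to 49 choose 2 = 1,176.
board = {"Ah", "Kd", "7c"}
live_hands = [h for h in all_hands if not (set(h) & board)]
print(len(live_hands))  # 1176
```

This `live_hands` list is the "around a thousand different possible hands" the search reasons over for each player.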

S2

Speaker 2

36:07

Okay, so that search, how do you fuse that with what the neural net is telling you or what the trained system is telling you?

S1

Speaker 1

36:19

Yeah, so where the trained system comes in is the value at the end. So you only look so far ahead. You look maybe, you know, 1 round ahead.

S1

Speaker 1

36:31

So if you're on the flop, you're looking to the start of the turn. And at that point, you can use the pre-computed solution to figure out what's the value here of this strategy.
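The shape of that idea can be sketched abstractly. This is not Libratus's actual algorithm (which reasons over whole ranges of hands and mixed strategies); it's a minimal illustration of looking one round ahead and backing off to precomputed leaf values, with made-up numbers:

```python
# Illustrative sketch only, not Libratus's real search: look one round
# ahead, then score the leaves with a precomputed value function instead
# of playing the rest of the hand out.

# Hypothetical precomputed leaf values (our expected winnings) for each
# (our action, opponent response) pair; the numbers are invented.
PRECOMPUTED_VALUE = {
    ("check", "check"): 0.10,
    ("check", "bet"): -0.05,
    ("bet", "fold"): 0.50,
    ("bet", "call"): 0.30,
}

RESPONSES = {"check": ["check", "bet"], "bet": ["fold", "call"]}

def search_one_round(actions):
    """Pick the action whose worst-case (well-played) opponent response
    leaves us with the highest precomputed leaf value."""
    best_action, best_value = None, float("-inf")
    for action in actions:
        value = min(PRECOMPUTED_VALUE[(action, r)] for r in RESPONSES[action])
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value

print(search_one_round(["check", "bet"]))  # ('bet', 0.3)
```

The precomputed table here plays the role of the self-play solution: the search only has to be better than it locally, one round at a time.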

S2

Speaker 2

36:43

Is it of a single action, essentially, in that spot? Are you getting a value, or is it the value of the entire series of actions?

S1

Speaker 1

36:52

Well, it's kind of both, because you're trying to maximize the value for the hand that you have, but in the process, in order to maximize the value of the hand that you have, you have to figure out what would I be doing with all these other hands as

S2

Speaker 2

37:04

well. Okay. But are you in the search always going to the end of the game?

S1

Speaker 1

37:09

In Libratus, we did. So we only used search starting on the turn, and then we searched all the way to the end of the game.

S2

Speaker 2

37:17

The turn, the river. Can you take us through the terminology?

S1

Speaker 1

37:22

Yeah, there's 4 rounds of poker. So there's the pre-flop, the flop, the turn and the river. And so we would start doing search halfway through the game.

S1

Speaker 1

37:30

Now the first half of the game, that was all pre-computed. It would just act instantly. And then when it got to the halfway point, then it would always search to the end of the game. Now we later improved this, so it wouldn't have to search all the way to the end of the game.

S1

Speaker 1

37:41

It would actually search just a few moves ahead. But that came later and that drastically reduced the amount of computational resources that we needed.

S2

Speaker 2

37:51

But the moves, cause you can keep betting on top of each other. That's what you mean by moves. So like that's where you don't just get 1 bet per turn of poker.

S2

Speaker 2

37:59

You can have an arbitrary number of bets, right?

S1

Speaker 1

38:02

Right, I'm trying to think like, I'm gonna bet and then what are you gonna do in response? Are you gonna raise me or are you gonna call? And then if you raise, what should I do?

S1

Speaker 1

38:10

So it's reasoning about that whole process up until the end of the game in the case of Libratus.

S2

Speaker 2

38:15

So for Libratus, what's the most number of re-raises have you ever seen?

S1

Speaker 1

38:21

You probably cap out at like 5 or something because at that point you're basically all in.

S2

Speaker 2

38:26

I mean, is there like interesting patterns like that that you've seen that the game does? Like you'll have like AlphaZero doing way more sacrifices than humans usually do. Is there something like Libratus was constantly re-raising or something like that that you've noticed?

S1

Speaker 1

38:43

There was something really interesting that we observed with Libratus. So humans, when they're playing poker, they usually size their bets relative to the size of the pot. So if the pot has $100 in there, maybe you bet like $75 or somewhere around there, somewhere between like 50 and $100.

S1

Speaker 1

39:01

And with Libratus, we gave it the option to basically bet whatever it wanted. It was actually really easy for us to say, like, oh, if you want, you can bet like 10 times the pot.

S2

Speaker 2

39:09

And

S1

Speaker 1

39:09

we didn't think it would actually do that. It was just like, why not give it the option? And then during the competition, it actually started doing this.

S1

Speaker 1

39:16

And by the way, this was like a very last minute decision on our part to add this option. And so we did not think the bot would do this. And I was actually kind of worried when it did start to do this, like, oh, is this a problem? Humans don't do this.

S1

Speaker 1

39:28

Is it screwing up? But it would put the humans into really difficult spots when it would do that. Because you could imagine you have the second best hand that's possible given the board, and you're thinking, oh, you're in a really great spot here. And suddenly the bot bets $20,000 into a $1,000 pot.

S1

Speaker 1

39:47

And it's basically saying, I have the best hand or I'm bluffing. And you, having the second best hand, now you've got a really tough choice to make. And so the humans would sometimes think like 5 or 10 minutes about, what do you do? Should I call?

S1

Speaker 1

40:01

Should I fold? And when I saw the humans like really struggling with that decision, like that's when I realized like, oh, actually this is maybe a good thing to do after all.

S2

Speaker 2

40:09

And of course the system has no idea that it's, again, like we said, putting them in a tough spot. It's just part of the optimal, the game theory optimal play.

S1

Speaker 1

40:21

Right. From the bot's perspective, it's just doing the thing that's going to make it the most money. And the fact that it's putting the humans in a difficult spot, that's just a side effect of that. And this was, I think, the 1 thing, I mean, there were a few things that the humans walked away from, but this was the number 1 thing that the humans walked away from the competition saying like, we need to start doing this.

S1

Speaker 1

40:43

And now these overbets, what are called overbets, have become really common in high-level poker play.

S2

Speaker 2

40:48

Have you ever talked to somebody like Daniel Negreanu about this? He seems to be a student of the game.

S1

Speaker 1

40:53

I did actually have a conversation with Daniel Negreanu once, yeah. I was visiting the Isle of Man to talk to PokerStars about AI. And Daniel Negreanu was there, and we had dinner together with some other people.

S1

Speaker 1

41:07

And yeah, he was really interested in it. He mentioned that he was excited about learning from these AIs.

S2

Speaker 2

41:14

So he wasn't scared, he was excited.

S1

Speaker 1

41:16

He was excited. And he honestly, he wanted to play against the bot. He thought he had a decent chance of beating it.

S1

Speaker 1

41:23

I think he, you know, this was like several years ago when I think it was like not as clear to everybody that, you know, the AIs were taking over. I think now people recognize that like, if you're playing against a bot, there's like no chance that you have in a game like poker.

S2

Speaker 2

41:38

So consistently the bots will win. The bots win heads up and in other variants too. So multiplayer, 6-player no-limit Texas Hold'em, the bots win?

S1

Speaker 1

41:51

Yeah, that's the case. So I think there is some debate about, like, is it true for every single variant of poker? I think for every single variant of poker, if somebody really put in the effort, they could make an AI that would beat all humans at it.

S1

Speaker 1

42:04

We've focused on the most popular variants. So heads up, no limit Texas Hold'em. And then we followed that up with 6 player poker as well, where we managed to make a bot that beat expert human players. And I think even there now, it's pretty clear that humans don't stand a chance.

S2

Speaker 2

42:20

See, I would love to hook up an AI system that looks at EEG, that actually tries to optimize the toughness of the spot it puts a human in. And I would love to see how different that is from the game theory optimal. So you try to maximize the heart rate of the human player, the freaking out over a long period of time.

S2

Speaker 2

42:42

I wonder if there's going to be different strategies that emerge that are close in terms of effectiveness. Because something tells me you could still achieve superhuman level performance by just making people sweat.

S1

Speaker 1

42:58

I feel like there's a good chance that is the case. Yeah, if it's a decent proxy for score, right? And this is actually common poker wisdom. When they were teaching players, before there were bots, and they were trying to teach people how to play poker, they would say, the key to the game is to put your opponent into difficult spots.

S1

Speaker 1

43:18

It's a good estimate for whether you're making the right decision.

S2

Speaker 2

43:21

So what else can you say about the fundamental role of search in poker? And maybe if you can also relate it to chess and go in these games, What's the role of search to solve in these games?

S1

Speaker 1

43:36

Yeah, I think a lot of people, and this is true for the general public and for the AI community, underestimate the importance of search for these kinds of game AI results. An example of this is TD-Gammon, which came out in 1992.

S1

Speaker 1

43:52

This was the first real instance of a neural net being used in a game AI. It's a landmark achievement. It was actually the inspiration for AlphaZero. And it used search.

S1

Speaker 1

44:01

It used 2-ply search to figure out its next move. You got Deep Blue, there it was very heavily focused on search, looking many, many moves ahead, farther than any human could. And that was key for why it won. And then even with something like AlphaGo, I mean, AlphaGo is commonly hailed as a landmark achievement for neural nets, and it is, but there's also this huge component of search, Monte Carlo tree search to AlphaGo, that was key, absolutely essential for the AI to be able to beat top humans.

S1

Speaker 1

44:35

I think a good example of this is you look at the latest versions of AlphaGo. It was called AlphaZero. And there's this metric called Elo rating, where you can compare different humans and you can compare bots to humans. Now, a top human player is around 3,600 Elo, maybe a little bit higher now.

S1

Speaker 1

44:55

AlphaZero, the strongest version, is around 5,200 Elo. But if you take out the search that's being done at test time, and by the way, what I mean by search is the planning ahead, the thinking of, oh, if I place this stone here and then he does this, and then you look like 5 moves ahead and you see what the board state looks like. That's what I mean by search. If you take out the search that's done during the game, the Elo rating drops to around 3,000.

S1

Speaker 1

45:22

So even today, what, 7 years after AlphaGo, if you take out the Monte Carlo tree search that's being done when playing against the human, the bots are not superhuman. Nobody has made a raw neural net that is superhuman in Go.
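Those Elo gaps translate into win probabilities via the standard logistic Elo formula (the ratings here are the approximate figures from the conversation):

```python
# Standard Elo expected-score formula. Ratings are the rough numbers
# mentioned above: ~5,200 full AlphaZero, ~3,600 top human, ~3,000
# for the search-free policy network.

def elo_expected_score(r_a: float, r_b: float) -> float:
    """Expected score (roughly, win probability) of player A vs player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

print(elo_expected_score(5200, 3600))  # ~0.9999: full AlphaZero vs top human
print(elo_expected_score(3000, 3600))  # ~0.03: the raw net loses to that human
```

A 1,600-point gap in one direction and a 600-point gap in the other is the difference between near-certain wins and near-certain losses.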

S2

Speaker 2

45:39

That's worth lingering on. That's quite profound. So without search, that just means looking at the next move and saying, this is the best move.

S2

Speaker 2

45:49

So having a function that estimates accurately what the best move is without search.

S1

Speaker 1

45:55

Yeah, and all these bots, they have what's called a policy network where it will tell you, this is what the neural net thinks is the next best move. And it's kind of like the intuition that a human has. The human looks at the board and any Go or Chess master will be able to tell you like, oh, instantly, here's what I think the right move is.

S1

Speaker 1

46:17

And the bot is able to do the same thing. But just like how a human, grandmaster can make a better decision if they have more time to think, when you add on this Monte Carlo tree search, the bot is able to make a better decision.

S2

Speaker 2

46:30

Yeah, I mean, of course a human is doing something like search in their brain, but, and I hesitate to draw a hard line, it's not like Monte Carlo tree search. It's more like sequential language model generation. So it's different, the neural network is doing the searching.

S2

Speaker 2

46:51

I wonder what the human brain is doing in terms of searching. Because you're doing that kind of computation, a human is computing. They have intuition, they have gut, they have a really strong ability, amongst the top players, to estimate what is a good position and what is not without calculating all the details, but they're still doing search in their head, just a different kind of search. Have you ever thought about what the difference is between the search that the human is performing versus what computers are doing?

S1

Speaker 1

47:20

I have thought a lot about that, and I think it's a really important question. So the AI in AlphaGo or any of these Go AIs, they're all doing Monte Carlo tree search, which is a particular kind of search. And it's actually a symbolic, tabular search.

S1

Speaker 1

47:36

It uses the neural net to guide its search, but it isn't actually a full-on neural net. Now, that kind of search is very successful in these kinds of perfect-information board games like chess and Go. But if you take it to a game like poker, for example, it doesn't work. It can't understand the concept of hidden information.

S1

Speaker 1

47:56

It doesn't understand the balance that you have to strike between the amount that you're raising versus the amount that you're calling. And in every 1 of these games, you see a different kind of search. And the human brain is able to plan for all these different games in a very general way. Now, I think that's 1 thing that we're missing from AI today.

S1

Speaker 1

48:14

And I think it's a really important missing piece, the ability to plan and reason more generally across a wide variety of different settings.

S2

Speaker 2

48:23

In a way where the general reasoning makes you better at each 1 of the games, not worse. Yeah,

S1

Speaker 1

48:29

So you can kind of think of it as, neural nets today, transformers, for example, are super general, but they'll output an answer in like 100 milliseconds. And if you tell it, oh, you've got 5 minutes to make a decision, feel free to take more time to make a better decision, it's not gonna know what to do with that.

S1

Speaker 1

48:48

But a human, if you're playing a game like chess, they're gonna give you a very different answer depending on if you say, oh, you've got 100 milliseconds or you've got 5 minutes.

S2

Speaker 2

48:57

Yeah, I mean, people have started using transformer language models in an iterative way that does improve the answer, or the showing-the-work kind of idea?

S1

Speaker 1

49:07

Yeah, they got this thing called chain of thought reasoning. And that's, I think-

S2

Speaker 2

49:11

Super promising, right?

S1

Speaker 1

49:12

Yeah, I think it's a good step in the right direction. I would kind of like say it's similar to Monte Carlo rollouts in a game like chess. There's a kind of search that you can do where you're saying, like, I'm going to roll out my intuition and see, like, without really thinking, you know, what are the better decisions I can make farther down the path?

S1

Speaker 1

49:30

What would I do if I just acted according to intuition for the next 10 moves? And that gets you an improvement, but I think that there's much richer kinds of planning that we could do.
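"Rolling out your intuition" can be sketched in a few lines. This is an invented additive toy game, not chess, and the stochastic policy stands in for a raw net's intuition; the point is only the mechanism of averaging rollouts:

```python
import random

# Toy illustration of a Monte Carlo rollout: score a candidate first move
# by letting a fixed "intuition" policy play out the remaining moves and
# averaging the outcomes. The game (each move adds to a running score)
# and the policy are both invented for the sketch.

def intuition_policy(rng):
    return rng.choice([1, 2])  # intuition's move: add 1 or add 2 to the score

def rollout_value(first_move, depth, samples=500, seed=0):
    """Average final score after `first_move`, with the intuition policy
    playing the remaining `depth - 1` moves, over many sampled rollouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        score = first_move
        for _ in range(depth - 1):
            score += intuition_policy(rng)
        total += score
    return total / samples

# Rollouts rank candidate moves by where they tend to lead farther down
# the path, not just by the policy's one-step output.
print(rollout_value(2, depth=4) > rollout_value(1, depth=4))  # True
```

That's the sense in which chain-of-thought resembles a rollout: you let the model's raw intuition run forward and judge the candidate by where it ends up.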

S2

Speaker 2

49:41

So when Libratus actually beat the poker players, what did that feel like? What was that, I mean, actually on that day, what were you feeling like? Were you nervous?

S2

Speaker 2

49:51

I mean, poker was 1 of the games that you thought was not gonna be solvable because of the human factor. At least in the narratives we tell ourselves, the human factor is so fundamental to the game of poker.

S1

Speaker 1

50:05

Yeah, the Libratus competition was super stressful for me. Also, I mean, I was working on this like basically continuously for a year leading up to the competition. I mean, for me, it became like very clear, like, okay, this is the search technique, this is the approach that we need.

S1

Speaker 1

50:19

And then I spent a year working on this pretty much nonstop.

S2

Speaker 2

50:22

Can we actually get into details, like what programming languages is it written in? What's some interesting implementation details that are fun slash painful?

S1

Speaker 1

50:33

Yeah, so 1 of the interesting things about Libratus is that we had no idea what the bar was to actually beat top humans. We could play against like our prior bots and that kind of gives us some sense of like, are we making progress? Are we going in the right direction?

S1

Speaker 1

50:45

But we had no idea what the bar actually was. And so we threw a huge amount of resources at trying to make the strongest bot possible. So we used C++, it was parallelized. We were using, I think, like a thousand CPUs, maybe more actually.

S1

Speaker 1

51:00

And today that sounds like nothing, but for a grad student back in 2016, that was a huge amount of resources.

S2

Speaker 2

51:06

Well, it's still a lot for any grad student even today. It's still tough to get, or even to allow yourself to think in terms of that scale, at CMU, at MIT, anything like that?

S1

Speaker 1

51:18

Yeah, and we're talking about terabytes of memory. So it was very parallelized, and it had to be very fast too, because the more games that you could simulate, the stronger the bot would be.

S2

Speaker 2

51:30

So is there some like John Carmack style, like efficiencies you had to come up with, like an efficient way to represent the hand, all that kind of stuff?

S1

Speaker 1

51:39

There were all sorts of optimizations that I had to make to try to get this thing to run as fast as possible. Things like, how do you minimize the latency? How do you package things together so that you minimize the amount of communication between the different nodes?

S1

Speaker 1

51:53

How do you optimize the algorithms so that you can try to squeeze out more and more from the game that you're actually playing? All these kinds of different decisions that I had to make.

S2

Speaker 2

52:03

Just a fun question, what IDE did you use for C++?

S1

Speaker 1

52:10

I think I used Visual Studio, actually.

S2

Speaker 2

52:12

Okay, is that still carried through to today?

S1

Speaker 1

52:15

VS Code is what I use today. It seems like it's pretty popular.

S2

Speaker 2

52:18

It's what the community basically converged on. Okay, cool. So you got this super optimized C++ system and then you show up to the day of competition.

S2

Speaker 2

52:28

Yeah. Humans versus machine. How did it feel throughout the day?

S1

Speaker 1

52:34

Super stressful. I mean, I thought going into it that we had like a 50-50 chance, because basically, I thought if they play in a totally normal style, I think we'll squeak out a win. But there's always a chance that they can find some weakness in the bot.

S1

Speaker 1

52:50

And if they do, and we're playing like for 20 days, 120,000 hands of poker, they have a lot of time to find weaknesses in the system. And if they do, we're going to get crushed. And that's actually what happened in the previous competition. The humans, you know, they started out, it wasn't like they were winning from the start, but then they found these weaknesses that they could take advantage of.

S1

Speaker 1

53:07

And for the next, you know, like 10 days, they were just crushing the bot, stealing money from it.

S2

Speaker 2

53:12

What were the weaknesses they found? Like maybe over betting was effective, that kind of stuff. So certain betting strategies worked.

S1

Speaker 1

53:19

What they found is, yeah, over betting, like betting certain amounts, the bot would have a lot of trouble dealing with those sizes. And then also when the bot got into really difficult all-in situations, because it wasn't doing search, it had to clump different hands together and it would treat them identically. And so it wouldn't be able to distinguish, you know, having a king-high flush versus an ace-high flush.

S1

Speaker 1

53:46

And in some situations that really matters a lot. And so they could put the bot into those situations and then the bot would just bleed money.

S2

Speaker 2

53:52

Clever humans. Okay, so I didn't realize it was over 20 days. So what were the humans like over those 20 days?

S2

Speaker 2

54:02

And what was the bot like?

S1

Speaker 1

54:04

So we had set up the competition, like I said, there was $200,000 in prize money and they would get paid a fraction of that depending on how well they did relative to each other.

S2

Speaker 2

54:14

So I

S1

Speaker 1

54:14

was kind of hoping that they wouldn't work together to try to find weaknesses in the bot, but they entered the competition with their number 1 objective being to beat the bot. And they didn't care about individual glory. They were like, we're all going to work as a team to try to take down this bot.

S1

Speaker 1

54:28

So they immediately started comparing notes. What they would do is they would coordinate looking at different parts of the strategy to try to find out weaknesses. And then at the end of the day, we actually sent them a log of all the hands that were played and what cards the bot had on each of those hands.

S2

Speaker 2

54:46

Oh, wow. Yeah. That's gutsy.

S1

Speaker 1

54:49

Yeah, honestly, I'm not sure why we did that in retrospect, but I mean, I'm glad we did it because we ended up winning anyway. But if you've ever played poker before, that is golden information. Usually when you play poker, you see only about a third of the hands go to showdown. And to just hand them all the cards that the bot had on every single hand, that was just a goldmine for them.

S1

Speaker 1

55:11

And so then they would review the hands and try to see, okay, could they find patterns in the bot, weaknesses? And then they would coordinate and study together and try to figure out, okay, this person's going to explore this part of the strategy for weaknesses, this person's going to explore that part of the strategy for weaknesses.

S2

Speaker 2

55:27

It's a kind of psychological warfare, showing them the hands. Yeah. I mean, I'm sure you didn't think of it that way, but doing that means you're confident in the bot's ability to win.

S1

Speaker 1

55:38

Well, that's 1 way of putting it. I wasn't super confident. So, you know, going in, like I said, I think I had like 50-50 odds on us winning.

S1

Speaker 1

55:46

When we announced the competition, the poker community decided to gamble on who would win, and their initial odds against us were like 4 to 1. They were really convinced that the humans were gonna pull out a win. The bot ended up winning for 3 days straight. And even then, after 3 days, the betting odds were still just 50-50.

S1

Speaker 1

56:08

And then at that point, it started to look like the humans were coming back. But poker is a very high-variance game, and I think what happened is they thought that they spotted some weaknesses that weren't actually there. And then around day 8, it was just very clear that they were getting absolutely crushed.

S1

Speaker 1

56:28

And from that point, I mean, for a while there, I was super stressed out thinking like, oh my God, the humans are coming back and they've found weaknesses and now we're just gonna lose the whole thing. But no, it ended up going in the other direction and the bot ended up crushing them in the long run.

S2

Speaker 2

56:42

How did it feel at the end, like as a human being, as a person who loves, appreciates the beauty of the game of poker, and as a person who appreciates the beauty of AI, is there, did you feel a certain kind of way about it?

S1

Speaker 1

56:59

I felt a lot of things, man. I mean, at that point in my life, I had spent 5 years working on this project and it was a huge sense of accomplishment. I mean, to spend 5 years working on something and finally see it succeed, yeah, I wouldn't trade that for anything in the world.

S2

Speaker 2

57:15

Yeah, because that's a real benchmark. It's not like getting some percent accuracy on a data set. This is like real.

S2

Speaker 2

57:24

This is real world. It's just a game, but it's also a game that means a lot to a lot of people. And this is humans doing their best to beat the machine. So this is a real benchmark, unlike anything else.

S1

Speaker 1

57:36

Yeah, and I mean, this is what I had been dreaming about since I was like 16, playing poker, you know, with my friends in high school. The idea that you could find a strategy, you know, approximate the Nash equilibrium, and be able to beat all the poker players in the world with it. So to actually see that come to fruition and be realized, that was kind of magical.

S2

Speaker 2

57:58

Yeah, especially since money is on the line too. It's different than chess in that aspect. That's why you wanna look at betting markets if you want to actually understand what people really think.

S2

Speaker 2

58:11

And in the same sense, poker, it's really high stakes because it's money. And to solve that game, that's an amazing accomplishment. So the leap from that to multi-way six-player poker, how difficult is that jump? And what are some interesting differences between heads-up poker and multi-way poker?

S1

Speaker 1

58:32

Yeah, so I mentioned Nash equilibrium in two-player zero-sum games. If you play that strategy, you are guaranteed to not lose in expectation no matter what your opponent does. Now, once you go to six-player poker, you're no longer playing a two-player zero-sum game.

S1

Speaker 1

58:45

And so there was a lot of debate among the academic community and among the poker community about how well these techniques would extend beyond just two-player heads-up poker. Now, what I had come to realize is that the techniques really would extend to six-player poker. Because even though in theory they don't give you these guarantees outside of two-player zero-sum games, in practice it still gives you a really strong strategy. Now, there were a lot of complications that would come up with six-player poker.
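The two-player zero-sum guarantee being discussed can be made concrete with regret matching, a standard iterative method whose average strategy converges to a Nash equilibrium in such games. This toy rock-paper-scissors sketch is purely illustrative; Libratus and Pluribus used counterfactual regret minimization variants at vastly larger scale:

```python
# Regret matching on rock-paper-scissors: each player plays actions in
# proportion to accumulated positive regret. In a two-player zero-sum
# game, the time-averaged strategy approaches a Nash equilibrium
# (here, uniform 1/3 each). Illustrative sketch only.
import random

random.seed(0)

ACTIONS = 3  # rock, paper, scissors
# PAYOFF[a][b]: payoff to player 0 when player 0 plays a, player 1 plays b
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Play actions in proportion to their positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # uniform when no positive regret

def train(iterations=20000):
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sum = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        for p in range(2):
            for a in range(ACTIONS):
                strategy_sum[p][a] += strats[p][a]
        a0, a1 = (random.choices(range(ACTIONS), weights=s)[0]
                  for s in strats)
        for a in range(ACTIONS):
            # regret: how much better action a would have done vs. reality
            regrets[0][a] += PAYOFF[a][a1] - PAYOFF[a0][a1]
            regrets[1][a] += -PAYOFF[a0][a] - (-PAYOFF[a0][a1])
    # the *average* strategy is what converges to the equilibrium
    return [[s / iterations for s in row] for row in strategy_sum]

avg0, avg1 = train()
```

After training, each entry of `avg0` and `avg1` is close to 1/3: no matter what the opponent does, this mixed strategy cannot be beaten in expectation, which is exactly the property that made the "read their soul" objection moot in the heads-up match.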

S1

Speaker 1

59:15

Besides the game-theoretic aspect, I mean, for 1, the game is just exponentially larger. So the main thing that allowed us to go from two-player to six-player was the idea of depth-limited search. So I said before, we would do search. We would plan out.

S1

Speaker 1

59:33

The bot would plan out what it's going to do next and for the next several moves. And in Libratus, that search was done extending all the way to the end of the game. So it would have to start from the turn onwards, looking maybe 10 moves ahead, and figure out what it was doing for all those moves. Now, when you get to six-player poker, it can't do that exhaustive search anymore because the game is just way too large.
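The contrast being described can be sketched on a toy game: exhaustive search runs to the end of the game, while depth-limited search stops a few moves ahead and substitutes an estimated value for each cut-off subtree. The subtraction game below is a hypothetical stand-in for poker, and the simple leaf evaluator is an assumption for illustration; Pluribus's actual leaf values came from reasoning about multiple continuation strategies:

```python
# Depth-limited search sketch on a toy subtraction game: players
# alternately take 1 or 2 tokens from a pile of n; whoever takes the
# last token wins. Hypothetical illustration, not the Pluribus code.

def search(n, depth, estimate):
    """Negamax value for the player to move."""
    if n == 0:
        return -1.0          # previous player took the last token: we lost
    if depth == 0:
        return estimate(n)   # depth limit hit: estimate instead of searching on
    return max(-search(n - m, depth - 1, estimate)
               for m in (1, 2) if m <= n)

# Exhaustive search all the way to the end of the game (Libratus-style);
# the depth limit is never reached, so the leaf estimate never fires:
full = search(10, 10**9, lambda n: 0.0)

# Depth-limited search (Pluribus-style): look only 2 moves ahead, then
# consult a leaf evaluator. Here the evaluator encodes the known fact
# that positions with n divisible by 3 are losses for the player to move.
limited = search(10, 2, lambda n: -1.0 if n % 3 == 0 else 1.0)

print(full, limited)  # both conclude n=10 is a win for the player to move
```

With a good leaf evaluator the shallow search reaches the same conclusion as the exhaustive one while exploring far fewer nodes, which is what makes search tractable when the full game is exponentially large.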