Speaker 1
00:00
The following is a conversation with Marcus Hutter, senior research scientist at Google DeepMind. Throughout his career of research, including with Juergen Schmidhuber and Shane Legg, he has proposed a lot of interesting ideas in and around the field of artificial general intelligence, including the development of the AIXI model, spelled A-I-X-I, which is a mathematical approach to AGI that incorporates ideas of Kolmogorov complexity, Solomonoff induction, and reinforcement learning. In 2006, Marcus launched the 50,000 euro Hutter Prize for lossless compression of human knowledge. The idea behind this prize is that the ability to compress well is closely related to intelligence.
Speaker 1
00:47
This to me is a profound idea. Specifically, if you can compress the first 100 megabytes or 1 gigabyte of Wikipedia better than your predecessors, your compressor likely has to also be smarter. The intention of this prize is to encourage the development of intelligent compressors as a path to AGI. In conjunction with this podcast release just a few days ago, Marcus announced a 10X increase in several aspects of this prize, including the money, to 500,000 euros.
Speaker 1
01:22
The better your compressor works relative to the previous winners, the higher fraction of that prize money is awarded to you. You can learn more about it if you Google simply, Hutter Prize. I'm a big fan of benchmarks for developing AI systems, and the Hutter Prize may indeed be one that will spark some good ideas for approaches that will make progress on the path of developing AGI systems. This is the Artificial Intelligence Podcast.
Speaker 1
01:50
If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now, and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store.
Speaker 1
02:17
When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Brokerage services are provided by Cash App investing, a subsidiary of Square, and member SIPC. Since Cash App allows you to send and receive money digitally, peer-to-peer, and security in all digital transactions is very important, let me mention the PCI data security standard that Cash App is compliant with.
Speaker 1
02:47
I'm a big fan of standards for safety and security. PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions. Now we just need to do the same for autonomous vehicles and AI systems in general. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you'll get $10 and Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world.
Speaker 1
03:27
And now, here's my conversation with Marcus Hutter.
Speaker 2
03:32
Do you think of the universe as a computer or maybe an information processing system? Let's go with a big question first.
Speaker 3
03:39
Okay, I'll go with the big question first. I think it's a very interesting hypothesis or idea. And I have a background in physics, so I know a little bit about physical theories, the standard model of particle physics and general relativity theory.
Speaker 3
03:54
And they are amazing and describe virtually everything in the universe. And they're all, in a sense, computable theories. I mean, they're very hard to compute. And, you know, it's very elegant, simple theories which describe virtually everything in the universe.
Speaker 3
04:07
So there's a strong indication that somehow the universe is computable. At least, it's a plausible hypothesis.
Speaker 2
04:17
So why do you think, just like you said, general relativity, quantum field theory, why do you think that the laws of physics are so nice and beautiful and simple and compressible? Do you think our universe was designed, or is naturally this way? Are we just focusing on the parts that are especially compressible? Do human minds just enjoy something about that simplicity, and in fact there are other things that are not so compressible?
Speaker 3
04:46
No, I strongly believe, and I'm pretty convinced, that the universe is inherently beautiful, elegant and simple and described by these equations, and we're not just picking that. I mean, if there were some phenomena which cannot be neatly described, scientists would try that, right? And there's biology, which is more messy, but we understand that it's an emergent phenomenon, and these are complex systems, but they still follow the same rules, right, of quantum electrodynamics.
Speaker 3
05:14
All of chemistry follows that and we know that. I mean, we cannot compute everything because we have limited computational resources. No, I think it's not a bias of the humans, but it's objectively simple. I mean, of course, you never know, you know, maybe there's some corners very far out in the universe or super, super tiny below the nucleus of atoms or, well, parallel universes which are not nice and simple, but there's no evidence for that.
Speaker 3
05:40
And we should apply Occam's razor and choose the simplest theory consistent with it. But also, it's a little bit self-referential.
Speaker 2
05:47
So maybe a quick pause, what is Occam's Razor?
Speaker 3
05:50
So Occam's razor says that you should not multiply entities beyond necessity, which, if you translate it to proper English in the scientific context, means that if you have two theories or hypotheses or models which equally well describe the phenomenon you study, or the data, you should choose the simpler one.
Speaker 2
06:13
So that's just the principle?
Speaker 3
06:15
Yes.
Speaker 2
06:15
Or sort of, that's not like a provable law, perhaps. Perhaps we'll kind of discuss it and think about it, but what's the intuition of why the simpler answer is the one that is likelier to be a more correct descriptor of whatever we're talking about?
Speaker 3
06:35
I believe that Occam's razor is probably the most important principle in science. I mean, of course, we need logical deduction and we do experimental design, but science is about understanding the world, finding models of the world, and we can come up with crazy complex models which explain everything but predict nothing. But the simple models seem to have predictive power, and it's a valid question why. And there are two answers to that.
Speaker 3
07:06
You can just accept it, that is the principle of science, and we use this principle and it seems to be successful. We don't know why, but it just happens to be. Or you can try, you know, find another principle which explains Occam's razor. And if we start with the assumption that the world is governed by simple rules, then there's a bias towards simplicity.
Speaker 3
07:31
And applying Occam's razor is the mechanism to finding these rules. And actually, in a more quantitative sense, and we come back to that later in the case of Solomonoff induction, you can rigorously prove that: if you assume that the world is simple, then Occam's razor is the best you can do in a certain sense.
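To make the quantitative version of this concrete, here is a toy two-part-code (MDL) sketch in Python. It is an illustration in the spirit of what is described, not Hutter's formalism; the model description lengths are hand-assigned assumptions, invented for the example.

```python
import math

# Toy two-part-code (MDL) scoring: bits to describe the model itself,
# plus bits to describe the data under the model (-log2 likelihood).
# Occam's razor is the first term; fit to the data is the second.
# The model description lengths below are hand-assigned for illustration.

data = "110110110110110110"

def coin_bits(data, p):
    # -log2 P(data | coin with P(bit=1) = p)
    return sum(-math.log2(p if b == "1" else 1 - p) for b in data)

candidates = [
    # (name, model description bits, data bits given the model)
    ("repeat '110'",      8.0, 0.0),                    # fits the data exactly
    ("fair coin p=1/2",   1.0, coin_bits(data, 0.5)),
    ("biased coin p=2/3", 4.0, coin_bits(data, 2 / 3)),
    ("memorize the data", float(len(data)), 0.0),       # model *is* the data
]

for name, model_bits, data_bits in candidates:
    print(f"{name:20s} total = {model_bits + data_bits:5.1f} bits")
# "repeat '110'" gets the shortest total code: the Occam-preferred model.
```

Equally good fits favor the shorter model, and a better fit can buy back some model complexity, which is the trade-off being described.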
Speaker 2
07:49
So I apologize for the romanticized question, but why do you think, outside of its effectiveness, why do you think we find simplicity so appealing as human beings? Why does E equals MC squared seem so beautiful to us humans?
Speaker 3
08:08
I guess mostly, in general, many things can be explained by an evolutionary argument. And there are some artifacts in humans which are just artifacts and not evolutionarily necessary. But with this beauty and simplicity, I believe at least the core is about, like science, finding regularities in the world, understanding the world, which is necessary for survival.
Speaker 3
08:37
If I look at a bush and I just see noise, and there is a tiger and it eats me, then I'm dead. But if I try to find a pattern... And we know that humans are prone to find more patterns in data than there are, you know, like the Mars face and all these things. But this bias towards finding patterns, even if they are not there, but I mean, it's best of course if they are, yeah? It helps us for survival.
Speaker 2
09:06
Yeah, that's fascinating. I haven't thought really about the, I thought I just loved science, but indeed, in terms of just survival purposes, there is an evolutionary argument for why we find the work of Einstein so beautiful.
Speaker 2
09:24
Maybe a quick small tangent, could you describe what Solomonoff induction is?
Speaker 3
09:24
Yeah, so that's a theory which I claim, and Ray Solomonoff sort of claimed a long time ago, that this solves the big philosophical problem of induction. And I believe the claim is essentially true.
Speaker 3
09:45
And what it does is the following. So, okay, for the picky listener, induction can be interpreted narrowly and widely. Narrowly means inferring models from data. And widely means also then using these models for doing predictions, so prediction is also part of induction.
Speaker 3
10:06
So I'm a little sloppy sort of with the terminology and maybe that comes from Ray Solomonoff, you know, being sloppy, maybe I shouldn't say that. He can't complain anymore. So let me explain a little bit this theory in simple terms. So assume you have a data sequence.
Speaker 3
10:24
Make it very simple. The simplest one, say, 1, 1, 1, 1, 1, and you see 100 ones. What do you think comes next? The natural answer, I'm going to speed up a little bit, the natural answer is of course, you know, 1.
Speaker 3
10:35
And the question is why? Well, we see a pattern there. There's a 1 and we repeat it. And why should it suddenly after 100 ones be different?
Speaker 3
10:45
So what we're looking for is simple explanations or models for the data we have. And now the question is, a model has to be presented in a certain language. In which language do we use? In science, we want formal languages, and we can use mathematics, or we can use programs on a computer.
Speaker 3
11:04
So abstractly on a Turing machine, for instance, or it can be a general-purpose computer. And there are of course lots of models. You can say maybe it's 100 ones and then 100 zeros and then 100 ones; that's a model, right? But there are simpler models.
Speaker 3
11:16
There's the model: print 1 in a loop. That also explains the data. And if you push that to the extreme, you are looking for the shortest program which, if you run this program, reproduces the data you have. It will not stop.
Speaker 3
11:31
It will continue, naturally. And this you take for your prediction. And on the sequence of ones, it's very plausible, right, that print 1 in a loop is the shortest program. We can give some more complex examples, like 1, 2, 3, 4, 5.
Speaker 3
11:45
What comes next? The short program is again, you know, a counter. And so that is, roughly speaking, how Solomonoff induction works. The extra twist is that it can also deal with noisy data.
Speaker 3
11:57
So if you have, for instance, a coin flip, say a biased coin which comes up heads with 60% probability, then it will learn and figure this out, and after a while it predicts, oh, the next coin flip will be heads with probability 60%. So it's the stochastic version of that.
Speaker 2
12:15
But the goal is, the dream is, always the search for the short program.
Speaker 3
12:18
Yes, yeah. Well, in Solomonoff induction, precisely what you do is, so you combine... so looking for the shortest program is like applying Occam's razor, like looking for the simplest theory. There's also Epicurus' principle, which says, if you have multiple hypotheses which equally well describe your data, don't discard any of them.
Speaker 3
12:36
Keep all of them around, you never know. And you can put it together and say, okay, I have a bias towards simplicity, but I don't rule out the larger models. And technically what we do is we weigh the shorter models higher and the longer models lower. And you use a Bayesian technique: you have a prior, which is precisely 2 to the minus the complexity of the program, and you weigh all these hypotheses and take this mixture, and then you also get this stochasticity in.
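Here is a minimal sketch of that weighting, assuming a tiny hand-picked hypothesis class with invented program lengths; the real mixture ranges over all programs on a universal machine.

```python
# A finite toy version of the Solomonoff-style mixture: weigh each model
# by 2^(-program length) times its likelihood on the data seen so far,
# then predict with the weighted average. The "program lengths" are
# invented for illustration.

data = "1" * 100  # we have seen 100 ones

models = [
    # (name, assumed program length in bits, P(next bit = 1) under the model)
    ("print 1 forever", 5, 1.0 - 1e-9),
    ("fair coin",       8, 0.5),
    ("print 0 forever", 5, 1e-9),
]

weights = []
for _, length, p1 in models:
    prior = 2.0 ** -length                 # Occam prior: shorter is likelier
    likelihood = 1.0
    for b in data:
        likelihood *= p1 if b == "1" else 1.0 - p1
    weights.append(prior * likelihood)

z = sum(weights)
p_next = sum(w / z * p1 for w, (_, _, p1) in zip(weights, models))
print(f"P(next bit = 1) ~= {p_next:.6f}")  # ~1: "print 1 forever" dominates
```

After 100 ones, the short deterministic model carries essentially all the posterior weight, which is the Epicurus-plus-Occam combination just described.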
Speaker 2
13:07
Yeah, like many of your ideas, that's just a beautiful idea of weighing based on the simplicity of the program. I love that. That seems to me maybe a very human-centric concept, seems to be a very appealing way of discovering good programs in this world.
Speaker 2
13:25
You've used the term compression quite a bit. I think it's a beautiful idea. We just talked about simplicity, and maybe science, or just all of our intellectual pursuits, is basically the attempt to compress the complexity all around us into something simple. So what does this word mean to you, compression?
Speaker 3
13:50
I essentially have already explained it. So compression means for me, finding short programs for the data or the phenomenon at hand. You could interpret it more widely as finding simple theories which can be mathematical theories or maybe even informal, like just in words.
Speaker 3
14:09
Compression means finding short descriptions, explanations, programs for the data.
Speaker 2
14:15
Do you see science as a kind of our human attempt at compression? So we're speaking more generally, because when you say programs, you're kind of zooming in on a particular sort of, almost like a computer science, artificial intelligence focus, but do you see all of human endeavor as a kind of compression?
Speaker 3
14:34
Well, at least all of science I see as an endeavor of compression, not all of humanity maybe. And well, there are also some other aspects of science like experimental design, right? I mean, we create experiments specifically to get extra knowledge.
Speaker 3
14:49
And this is, that is then part of the decision making process. But once we have the data to understand the data is essentially compression. So I don't see any difference between compression, understanding and prediction.
Speaker 2
15:05
So we're jumping around topics a little bit, but returning back to simplicity, a fascinating concept of Kolmogorov complexity. So in your sense, do most objects in our mathematical universe have high Kolmogorov complexity? And maybe, first of all, what is Kolmogorov complexity?
Speaker 3
15:25
Okay, Kolmogorov complexity is a notion of simplicity or complexity, and it takes the compression view to the extreme. So I explained before that if you have some data sequence, just think about a file on a computer, that's just, you know, a string of bits. And we have data compressors, like we compress big files into zip files with certain compressors. And you can also produce self-extracting archives; that means an executable which, if you run it, reproduces your original file without needing an extra decompressor.
Speaker 3
16:02
It's just the decompressor plus the archive together in one. And now there are better and worse compressors, and you can ask, what is the ultimate compressor? So what is the shortest possible self-extracting archive you could produce for a certain data set, which reproduces the data set? And the length of this is called the Kolmogorov complexity. And arguably, that is the information content in the data set.
Speaker 3
16:27
I mean, if the data set is very redundant or very boring, you can compress it very well, so the information content should be low, and it is low according to this definition.
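Kolmogorov complexity itself is uncomputable, but any off-the-shelf compressor gives an upper bound, which is enough to see the idea. A small sketch:

```python
import os
import zlib

# Kolmogorov complexity is uncomputable, but any real compressor gives an
# upper bound: K(x) <= compressed size + size of the fixed decompressor.
# Compare a redundant string with a random-looking one, in the spirit of
# the self-extracting-archive picture above.

boring = b"ab" * 5000            # highly redundant: low information content
random_ish = os.urandom(10000)   # incompressible with overwhelming probability

for name, x in [("boring", boring), ("random", random_ish)]:
    print(f"{name}: {len(x)} bytes -> {len(zlib.compress(x, 9))} compressed")
```

The redundant string shrinks enormously while the random bytes barely shrink at all, matching the definition: low information content means short description.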
Speaker 2
16:36
So it's the length of the shortest program that summarizes the data?
Speaker 3
16:41
Yes, yeah.
Speaker 2
16:42
And what's your sense of our universe, when we think about the different objects in our universe, concepts or whatever, at every level: do they have high or low Kolmogorov complexity? So what's the hope? Do we have a lot of hope in being able to summarize much of our world?
Speaker 3
17:05
That's a tricky and difficult question. So, as I said before, I believe that the whole universe, based on the evidence we have, is very simple. So it has a very short description.
Speaker 2
17:19
Sorry, to linger on that, the whole universe, what does that mean? Do you mean at the very basic fundamental level, in order to create the universe?
Speaker 3
17:19
Yes, yeah. You need a very short program and you run it.
Speaker 2
17:32
To get the thing going. To get
Speaker 3
17:34
the thing going and then it will reproduce our universe. There's a problem with noise. We can come back to that later possibly.
Speaker 2
17:42
Is noise a problem, or is it a bug or a feature?
Speaker 3
17:42
I would say it makes our life as scientists really, really much harder. I mean, think about it: without noise, we wouldn't need all of the statistics.
Speaker 2
17:55
But then maybe we wouldn't feel like there's a free will. Maybe we need that for the, for the...
Speaker 3
18:01
This is an illusion, that noise can give you free will.
Speaker 2
18:05
At least in
Speaker 3
18:05
that way, it's a feature. But also, if you don't have noise, you have chaotic phenomena, which are effectively like noise. So we can't get away without statistics even then.
Speaker 3
18:15
I mean, think about rolling a die, and forget about quantum mechanics, and you know exactly how you throw it. But I mean, it's still so hard to compute the trajectory that effectively it is best to model it as coming out with a number with probability 1 over 6. But from this sort of philosophical Kolmogorov complexity perspective, if we didn't have noise, then arguably you could describe the whole universe as well as the standard model plus general relativity. I mean, we don't have a theory of everything yet, but sort of assuming we are close to it or have it, yeah?
Speaker 3
18:52
Plus the initial conditions, which may hopefully be simple, and then you just run it and then you would reproduce the universe. But that's spoiled by noise or by chaotic systems or by initial conditions, which may be complex. So now if we don't take the whole universe, we're just a subset, just take planet Earth. Planet Earth cannot be compressed into a couple of equations.
Speaker 3
19:17
This is a hugely complex system.
Speaker 2
19:17
So interesting. So when you look at the whole, the whole thing might be simple, but when you just take a small window, then...
Speaker 3
19:17
It may become complex, and that may be counterintuitive, but there's a very nice analogy: the library of all books.
Speaker 3
19:34
So imagine you have a normal library with interesting books, and you go there: great, lots of information, and quite complex, yeah? So now I create a library which contains all possible books, say, of 500 pages. So the first book just has AAAA over all the pages. The next book is also all A's but ends with B, and so on. I create this library of all books. I can write a super short program which creates this library.
Speaker 3
19:57
So this library which has all books has zero information content. And you take a subset of this library, and suddenly you have a lot of information in there.
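The library-of-all-books point can be made literal in a few lines; book length is shrunk to 3 characters here so the enumeration stays finite:

```python
from itertools import product

# The library of all books, in code: a program that enumerates *every*
# string over an alphabet is a few lines long, so the complete library has
# essentially zero information content. Picking out a particular subset
# (an actual library) is where the information lives.

ALPHABET = "AB"
BOOK_LEN = 3   # stand-in for "500 pages"

def all_books():
    # a tiny program whose output is the whole library
    for chars in product(ALPHABET, repeat=BOOK_LEN):
        yield "".join(chars)

print(list(all_books()))
# ['AAA', 'AAB', ..., 'BBB']: the whole library from a very short program.
# A subset like {'ABA', 'BAB'} needs extra bits just to say which books it contains.
```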
Speaker 2
20:05
So that's fascinating. I think one of the most beautiful mathematical objects that at least today seems to be understudied or under-talked about is cellular automata. What lessons do you draw from, sort of, the Game of Life for cellular automata, where you start with the simple rules, just like you're describing with the universe, and somehow complexity emerges?
Speaker 2
20:26
Do you feel like you have an intuitive grasp on the fascinating behavior of such systems, where, like you said, some chaotic behavior could happen, some complexity could emerge, or it could die out into some very rigid structures? Do you have a sense about cellular automata that somehow transfers maybe to the bigger questions of our universe? Yeah, the cellular automata, and especially
Speaker 3
20:52
Conway's Game of Life, is really great, because the rules are so simple, you can explain it to every child, and even by hand you can simulate a little bit, and you see these beautiful patterns emerge, and people have proven that it's even Turing complete. You can not only use a computer to simulate the Game of Life, but you can also use the Game of Life to simulate any computer. That is truly amazing.
Speaker 3
21:16
And it's the prime example, probably, to demonstrate that very simple rules can lead to very rich phenomena. And people sometimes ask, you know, how can chemistry and biology be so rich? I mean, this can't be based on simple rules. But no, we know quantum electrodynamics describes all of chemistry.
Speaker 3
21:37
And we come back to that later. I claim intelligence can be explained or described in one single equation, this very rich phenomenon. You asked also whether I understand this phenomenon. The answer is probably not.
Speaker 3
21:54
And there's this saying: you never really understand things, you just get used to them. And I think I'm pretty used to cellular automata. So you believe that you understand now why this phenomenon happens. But I'll give you a different example.
Speaker 3
22:09
I didn't play too much with this Conway's Game of Life, but a little bit more with fractals and with the Mandelbrot set and its beautiful patterns. Just look up the Mandelbrot set. And, well, when the computers were really slow and I just had a black and white monitor, I programmed my own programs in assembler to...
Speaker 2
22:29
Assembler, wow. You're legit.
Speaker 3
22:30
To get these vectors on the screen, and I was mesmerized. And much later, so I returned to this every couple of years, and then I tried to understand what is going on.
Speaker 3
22:43
And you can understand a little bit, so I tried to derive the locations. There are these circles and the apple shape. And then you have smaller Mandelbrot sets recursively in this set. And there's a way to mathematically, by solving high order polynomials, to figure out where these centers are and what size they are approximately.
Speaker 3
23:08
And by sort of mathematically approaching this problem, you slowly get a feeling of why things are like they are. And that is, you know, a first step to understanding why this phenomenon is so rich.
Speaker 2
23:27
Do you think it's possible, what's your intuition, do you think it's possible to reverse engineer and find the short program that generated these fractals by looking at the fractals?
Speaker 3
23:39
Well, in principle, yes. I mean, in principle, what you can do is you take any data set, you take these fractals, or whatever data set you have, say a picture of Conway's Game of Life, and you run through all programs. You take programs of size 1, 2, 3, 4, and so on, and run them all in parallel, in so-called dovetailing fashion.
Speaker 3
23:59
Give them computational resources: the first one 50%, the second one half of that, and so on, and let them run. Wait until they halt, give an output, compare it to your data, and if some of these programs produce the correct data, then you stop, and then you already have some program. It may be a long program, because it's faster. And then you continue, and you get shorter and shorter programs until you eventually find the shortest program.
Speaker 3
24:22
The interesting thing is, you can never know whether it's the shortest program, because there could be an even shorter program which is just even slower, and you just have to wait, yeah. But asymptotically, and actually after finite time, you have the shortest program. So this is a theoretical, but completely impractical, way of finding the underlying structure in every data set. And that is what Solomonoff induction does, and Kolmogorov complexity. In practice, of course, we have to approach the problem more intelligently.
Speaker 3
24:53
And then, if you take resource limitations into account, there's, for instance, the field of pseudo-random numbers. These are deterministic sequences, but no algorithm which is fast (fast means runs in polynomial time) can detect that it's actually deterministic. So we can produce random numbers, maybe not that interesting, but just as an example: we can produce complex-looking data, and we can then prove that no fast algorithm can detect the underlying pattern.
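A schematic of the dovetailing search described above. The "programs" here are a hand-made toy list with assigned description lengths, purely to show the scheduling idea; the real search enumerates all programs of a universal machine.

```python
# Schematic dovetailing: run all candidate programs "in parallel" with
# growing step budgets, and keep the shortest one whose output matches the
# data. The programs never halt; the budget is what keeps this safe.

target = "111111"

def ones():          # toy program: "print 1 forever"
    while True:
        yield "1"

def alternate():     # toy program: "print 10 forever"
    while True:
        yield "1"
        yield "0"

programs = [(5, ones), (7, alternate)]   # (description length in bits, program)

def dovetail(programs, target, rounds=8):
    best = None
    runs = [(length, gen(), []) for length, gen in programs]
    for budget in range(1, rounds + 1):      # round k grants k more steps each
        for length, it, out in runs:
            for _ in range(budget):
                out.append(next(it))
            if "".join(out[:len(target)]) == target:
                if best is None or length < best[0]:
                    best = (length, "".join(out[:len(target)]))
    return best

print(dovetail(programs, target))   # (5, '111111'): shortest program found
```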
Speaker 2
25:31
Which is unfortunate. That's a big challenge for our search for simple programs in the space of artificial intelligence, perhaps. Yes, it definitely is
Speaker 3
25:43
for artificial intelligence and it's quite surprising that it's, I can't say easy, I mean physicists worked really hard to find these theories, but apparently it was possible for human minds to find these simple rules in the universe. It could have been different, right?
Speaker 2
25:59
It could have been different. It's awe-inspiring. So let me ask another absurdly big question.
Speaker 2
26:08
What is intelligence in your view?
Speaker 3
26:12
So I have of course a definition.
Speaker 2
26:12
I wasn't sure what you were gonna say, because you could have just as easily said, I have no clue.
Speaker 3
26:12
Which many people would say. But I'm not modest in this question.
Speaker 3
26:28
So the informal version, which I worked out together with Shane Legg, who co-founded DeepMind, is that intelligence measures an agent's ability to perform well in a wide range of environments. So that doesn't sound very impressive, but these words have been very carefully chosen, and there is a mathematical theory behind that, and we come back to that later. And if you look at this definition by itself, it seems like, yeah, okay, but it seems a lot of things are missing. But if you think it through, then you realize that most, and I claim all, of the other traits, at least of rational intelligence, which we usually associate with intelligence, are emergent phenomena from this definition. Like, you know, creativity, memorization, planning, knowledge.
Speaker 3
27:22
You all need that in order to perform well in a wide range of environments. So you don't have to explicitly mention that in a definition.
Speaker 2
27:30
Interesting. So yeah, so consciousness, abstract reasoning, all these kinds of things are just emergent phenomena that help you. Can you say the definition again? So, multiple environments. Did you mention the word goals?
Speaker 3
27:46
No, but we have an alternative definition. Instead of performing well, you can just replace it by goals. So intelligence measures an agent's ability to achieve goals in a wide range of environments.
Speaker 3
27:55
That's more or less equal.
Speaker 2
27:56
But it's interesting, because in there, there's an injection of the word goals. So we want to specify there should be a goal.
Speaker 3
28:03
Yeah, but perform well is sort of, what does it mean? It's the same problem. Yeah, there's a little bit
Speaker 2
28:08
of a gray area, but it's much closer to something that could be formalized. In your view, are humans, where do humans fit into that definition? Are they general intelligence systems that are able to perform in, like how good are they at fulfilling that definition, at performing well in multiple environments?
Speaker 3
28:31
Yeah, that's a big question. I mean, the humans are performing best among all species we know of. Depends,
Speaker 2
28:41
you could say that trees and plants are doing a better job, they'll probably outlast us. Yeah, but they are in a
Speaker 3
28:47
much more narrow environment, right? I mean, you just, you know, have a little bit of air pollution and these trees die, and we can adapt, right? We build houses, we build filters,
Speaker 2
28:57
we do geoengineering, so- The multiple environment part.
Speaker 3
29:01
Yeah, that is very important, yeah. So that distinguishes narrow intelligence from wide intelligence, also in AI research. So let
Speaker 2
29:09
me ask the Alan Turing question. Can machines think? Can machines be intelligent?
Speaker 2
29:16
So in your view, I have to kind of ask, the answer's probably yes, but I want to kind of hear your thoughts on it: can machines be made to fulfill this definition of intelligence, to achieve intelligence?
Speaker 3
29:30
Well, we are sort of getting there, and on a small scale, we are already there. The wide range of environments is still missing, but we have self-driving cars, we have programs that play Go and chess, we have speech recognition. So it's pretty amazing, but you can, you know, these are narrow environments.
Speaker 3
29:49
But if you look at AlphaZero, that was also developed by DeepMind. I mean, DeepMind got famous with AlphaGo, and then came AlphaZero a year later. That was truly amazing. It's a reinforcement learning algorithm which is able, just by self-play, to play chess and then also Go.
Speaker 3
30:08
And I mean, yes, they're both games, but they're quite different games. And you don't feed them the rules of the game. And the most remarkable thing, which is still a mystery to me, is that usually for any decent chess program, I don't know much about Go, you need opening books and endgame tables and so on, and nothing was put in there.
Speaker 2
30:29
Just like- Especially with AlphaZero, the self-play mechanism, starting from scratch, being able to actually learn new strategies, is incredible. Yeah, it
Speaker 3
30:40
rediscovered all these famous openings within four hours by itself. What I was really happy about, I'm a terrible chess player, but I like the Queen's Gambit. And AlphaZero figured out that this is
Speaker 2
30:52
the best opening. Finally, somebody proved you correct.
Speaker 3
31:00
So yes, to answer your question: yes, I believe that general intelligence is possible. And it also, I mean, it depends how you define it. Does AGI, artificial general intelligence, only refer to achieving human level? Is a subhuman level, but quite broad, also general intelligence? So we have to distinguish. Or is it only superhuman intelligence, general artificial intelligence?
Speaker 2
31:24
Is there a test in your mind, like the Turing test in natural language, or some other test, that would impress the heck out of you, that would kind of cross the line of your sense of intelligence within the framework that you said? Well, the Turing test, well,
Speaker 3
31:41
it has been criticized a lot, but I think it's not as bad as some people think. Some people think it's too strong. So it tests not just for a system to be intelligent, but it also has to fake human deception, right, which is, you know, much harder.
Speaker 3
31:59
And on the other hand they say it's too weak, yeah, because it just maybe fakes, you know, emotions or intelligent behavior. It's not real. But I don't think that's the problem or big problem. So if you would pass the Turing test, So a conversation over terminal with a bot for an hour or maybe a day or so, and you can fool a human into not knowing whether this is a human or not, so that's the Turing test.
Speaker 3
32:27
I would be truly impressed. And we have this annual competitions, the Loebner Prize. And I mean, it started with Eliza, that was the first conversational program. And what is it called?
Speaker 3
32:40
The Japanese Mitsuku, or so, that's the winner of the last couple of years. And-
Speaker 2
32:45
It's quite impressive.
Speaker 3
32:45
Yeah, it's quite impressive. And then Google has developed Meena, right? Just recently, that's an open-domain conversational bot.
Speaker 3
32:55
Just a couple of weeks ago, I think.
Speaker 2
32:57
Yeah, I kind of like the metric that the Alexa Prize has proposed. I mean, maybe it's obvious to you, it wasn't to me, of setting sort of a length of a conversation: like, you want the bot to be sufficiently interesting that you'd wanna keep talking to it for like 20 minutes. And that's a surprisingly effective aggregate metric, because really, like, nobody has the patience to be able to talk to a bot that's not interesting and intelligent and witty and is able to go on to different tangents, jump domains, be able to say something interesting to maintain your attention.
Speaker 3
33:36
Maybe many humans will also fail this test.
Speaker 2
33:36
That's, unfortunately, we set, just like with autonomous vehicles, with chatbots, we also set a bar that's way too high to reach.
Speaker 3
33:36
I said the Turing test is not as bad as some people believe, but what is really not useful about the Turing test is that it gives us no guidance on how to develop these systems in the first place.
Speaker 3
34:00
Of course, we can develop them by trial and error and, you know, do whatever, and then run the test and see whether it works or not. But a mathematical definition of intelligence gives us, you know, an objective which we can then analyze by theoretical or computational tools, and maybe even prove how close we are. And we will come back to that later with the AIXI model. So I mentioned compression, right?
Speaker 3
34:31
So in natural language processing, they have achieved amazing results. And one way to test this, of course, is you take the system, you train it, and then you see how well it performs on the task. But a lot of performance measurement is done by so-called perplexity, which is essentially the same as complexity, or compression length. So the NLP community develops new systems, and then they measure the compression length, and then they have rankings and leaderboards, because there's a strong correlation between compressing well and the systems performing well at the task at hand.
Speaker 3
35:07
It's not perfect, but it's good enough for them as an intermediate aim.
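The perplexity-compression link can be shown with any compressor standing in for a language model; a sketch, with zlib as the stand-in:

```python
import zlib

# The perplexity <-> compression link: a model (or compressor) that spends
# H bits per character corresponds to a per-character perplexity of 2^H.
# zlib stands in for a language model here; a real LM's cross-entropy
# plugs into the same formula.

text = ("the quick brown fox jumps over the lazy dog " * 200).encode()

bits_per_char = 8 * len(zlib.compress(text, 9)) / len(text)
print(f"{bits_per_char:.3f} bits/char -> perplexity {2 ** bits_per_char:.2f}")
```

Lower bits per character means better compression, which is exactly lower perplexity; that is the correlation the NLP leaderboards exploit.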
Speaker 2
35:15
So you mean a measure... so this is kind of almost returning to Kolmogorov complexity. So you're saying good compression usually means good intelligence.
Speaker 3
35:15
Yes.
Speaker 2
35:26
So you mentioned you're one of the only people who dared boldly to try to formalize the idea of artificial general intelligence, to have a mathematical framework for intelligence, just like, as we mentioned, termed AIXI, A-I-X-I. So let me ask the basic question: what is AIXI?
Speaker 3
35:54
Okay, so let me first say what it stands for because.
Speaker 2
35:57
What it stands for, actually, that's probably the more basic question. Yeah. The first question
Speaker 3
36:02
is usually how it's pronounced, but finally I put it on the website how it's pronounced, and you figured it out. The name comes from AI, artificial intelligence, and the X-I is the Greek letter xi, which I used for Solomonoff's distribution, for quite stupid reasons, which I'm not willing to repeat here in front of the camera. So it just happened to be more or less arbitrary; I chose this xi.
Speaker 3
36:31
But it also has nice other interpretations. So there are actions and perceptions in this model: an agent has actions and perceptions over time. So this is A index i, X index i: the action at time i, followed by the perception at time i.
Speaker 2
36:49
We'll go with that. I'll edit out the first part. I'm just kidding.
Speaker 3
36:53
I have some more interpretations. So at some point, maybe 5 years ago or 10 years ago, I was in Barcelona, and on a big church there was some text engraved in stone, and the word "així" appeared there
Speaker 2
37:10
a couple of times.
Speaker 3
37:12
I was very surprised and happy about that. And I looked it up, so it is the Catalan language, and it means, with some interpretation, "that's it" or "that's the right thing to do," yeah, eureka.
Speaker 2
37:24
Oh, so it's almost like destined somehow came to you in a dream, so okay.
Speaker 3
37:32
And similarly, there's a Chinese word also written like "aixi" if you transcribe it to Pinyin. And the final one is that it is AI crossed with induction, because that is, and that's going more to the content now... So good old-fashioned AI is more about, you know, planning in a known, deterministic world.
Speaker 3
37:48
And induction is more about, often, you know, IID data and inferring models. And essentially what this AIXI model does is combine these two.
Speaker 2
37:56
And I actually also recently, I think, heard that in Japanese, AI means love. So if you can combine XI somehow with that, I think we can, there might be some interesting ideas there. So I see, let's then take the next step.
Speaker 2
38:12
Can you maybe talk at the big level of what is this mathematical framework.
Speaker 3
38:19
Yeah, so it consists essentially of 2 parts. 1 is the learning and induction and prediction part, and the other 1 is the planning part. So let's come first to the learning, induction, prediction part, which essentially I explained already before.
Speaker 3
38:35
So what we need for any agent to act well is that it can somehow predict what happens. I mean, if you have no idea what your actions do, how can you decide which actions are good or not? So you need to have some model of what your actions affect. So what you do is you have some experience, you build models, like scientists, of your experience, then you hope these models are roughly correct, and then you use these models for prediction.
Speaker 2
39:03
And the model is, sorry to interrupt, and the model is based on your perception of the world, how your actions will affect that world.
Speaker 3
39:10
That's not...
Speaker 2
39:10
So how do you think about the model?
Speaker 3
39:10
That's not the important part, but it is technically important. At this stage, we can just think about predicting, say, stock market data, weather data, or IQ sequences, 1, 2, 3, 4, 5, what comes next, yeah?
Speaker 3
39:24
So, of course, our actions affect what we're doing, but I come back to that in a second.
Speaker 2
39:30
So, and I'll keep just interrupting. So just to draw a line between prediction and planning: what do you mean by prediction in this way? Is it trying to predict the environment without your long-term action in that environment? What is prediction?
Speaker 3
39:49
Okay, if you want to put the actions in now, okay, then let's put in now, yeah.
Speaker 2
39:54
We don't have to put them now. Yeah, yeah. Scratch it, scratch it, dumb question, okay.
Speaker 3
39:58
So the simplest form of prediction is that you just have data which you passively observe, and you want to predict what happens without interfering. As I said, weather forecasting, stock market, IQ sequences, or just anything. And Solomonoff's theory of induction is based on compression.
Speaker 3
40:18
So you look for the shortest program which describes your data sequence, and then you take this program and run it, which reproduces your data sequence by definition, and then you let it continue running, and it will produce some predictions. And you can rigorously prove that for any prediction task, this is essentially the best possible predictor. Of course, if there's a task which is unpredictable, like, you know, fair coin flips, yeah, I cannot predict the next fair coin flip. What Solomonoff does is say, okay, the next head is probably 50%.
Speaker 3
40:51
It's the best you can do. So if something is unpredictable, Solomonoff will also not magically predict it. But if there is some pattern and predictability, then Solomonoff induction will figure that out eventually, and not just eventually, but rather quickly, and you can have proven convergence rates, whatever your data is. So that is pure magic in a sense.
Speaker 3
41:14
What's the catch? Well, the catch is that it's not computable, and we come back to that later. You cannot just implement it, even with Google's resources here, and run it and predict the stock market and become rich. I mean, Ray Solomonoff already tried it at the time.
Speaker 2
41:28
But so the basic task is you're in the environment and you're interacting with the environment to try to learn a model of that environment and the model is in the space of all these programs and your goal is to get a bunch of programs that are simple.
Speaker 3
41:41
And so let's go to the actions now. But actually, good that you asked. Usually I skip this part, although there is also a minor contribution which I did with the action part; I usually sort of just jump to the decision part. So let me explain the action part now.
Speaker 3
41:53
Thanks for asking.
Speaker 2
41:54
Yes.
Speaker 3
41:55
So you have to modify it a little bit: now you're not just predicting a sequence which just comes to you, but you have an observation, then you act somehow, and then you want to predict the next observation based on the past observations and your actions. Then you take the next action; you don't care about predicting it, because you're doing it.
Speaker 3
42:24
You just condition extra on your actions. There's an interesting alternative: that you also try to predict your own actions, if you want.
Speaker 2
42:38
In the past or the future?
Speaker 3
42:38
Your future actions.
Speaker 2
42:39
That's interesting. Wait, let me wrap my head around that. I think my brain just broke.
Speaker 3
42:45
We should maybe discuss that later, after I've explained the AIXI model. That's an interesting variation. But that is a really interesting variation.
Speaker 2
42:51
And a quick comment, I don't know if you want to insert that in here, but in terms of observations, you're looking at the entire big history, the long history of the observations.
Speaker 3
43:03
Exactly, that's very important: the whole history, from birth, sort of, of the agent. And we can come back to that, and also why this is important here. Often, you know, in RL you have MDPs, Markov decision processes, which are much more limiting.
Speaker 3
43:15
Okay, so now we can predict conditioned on actions. So even if we influence the environment. But prediction is not all we want to do, right? We also want to act really in the world. And the question is how to choose the actions.
Speaker 3
43:29
And we don't want to greedily choose the actions, you know, just what is best in the next time step. And first I should say, how do we measure performance? We measure performance by giving the agent reward.
Speaker 3
43:43
That's the so-called reinforcement learning framework. So every time step, you can give it a positive reward, a negative reward, or maybe no reward. It could be very scarce, right? Like if you play chess, just at the end of the game, you give plus 1 for winning or minus 1 for losing.
Speaker 3
43:56
So in the AIXI framework, that's completely sufficient. So occasionally you give a reward signal, and you ask the agent to maximize reward, but not greedily, sort of, you know, the next one, the next one, because that's very bad in the long run if you're greedy. But over the lifetime of the agent. So let's assume the agent lives for m time steps, let's say it dies in, sort of, 100 years sharp; that's just, you know, the simplest model to explain.
Speaker 2
44:19
So it looks at
Speaker 3
44:20
the future reward sum and asks, what is my action sequence, or actually more precisely my policy, which leads in expectation, because I don't know the world, to the maximum reward sum? Let me give you an analogy. In chess, for instance, we know how to play optimally in theory; it's just a minimax strategy.
Speaker 3
44:42
I play the move which seems best to me under the assumption that the opponent plays the move which is best for him, so worst for me, under the assumption that I play, again, the best move. And then you have this expectimax tree to the end of the game, and then you propagate back, and then you get the best possible move. So that is the optimal strategy, which von Neumann already figured out a long time ago, for playing adversarial games. Luckily, or maybe unluckily for the theory, it becomes harder: the world is not always adversarial. It can be, if there are other humans, even cooperative. Or nature, I mean, dead nature, is stochastic, you know; things just happen randomly, or don't care about you. So what you have to take into account is the noise, and not necessarily adversariality.
Speaker 3
45:30
So you replace the minimum on the opponent's side by an expectation, which is general enough to include also adversarial cases. So now instead of a minimax strategy you have an expectimax strategy. So far so good, so that is well known, it's called sequential decision theory. But the question is, on which probability distribution do you base that?
Speaker 3
45:52
If I have the true probability distribution, like, say, I play backgammon, right? There's dice, and there's certain randomness involved. I can calculate probabilities and feed them into the expectimax or the sequential decision tree, and come up with the optimal decision if I have enough compute.
Speaker 3
46:09
What is the probability that the driver in front of me brakes? I don't know. It depends on all kinds of things, and especially in new situations I don't know. So this is this unknown thing about prediction, and that's where Solomonoff comes in.
Speaker 3
46:24
So what you do is, in the sequential decision tree, you just replace the true distribution, which we don't know, by this universal distribution. I didn't explicitly talk about it, but this is used for universal prediction, and you plug it into the sequential decision tree mechanism. And then you get the best of both worlds. You have a long-term planning agent, but it doesn't need to know anything about the world, because the Solomonoff induction part learns.
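A minimal expectimax sketch of the planning part just described. Here `model` is a placeholder environment model supplied for illustration, where AIXI would use the Solomonoff mixture; the names and the toy environment are assumptions, not DeepMind code.

```python
# Minimal expectimax: max over the agent's actions, expectation over next
# outcomes under a model. `model(state, action)` returns a list of
# (probability, reward, next_state) tuples.

def expectimax(state, model, actions, horizon):
    """Expected reward sum of the best policy over `horizon` steps."""
    if horizon == 0:
        return 0.0
    values = []
    for a in actions(state):
        v = sum(p * (r + expectimax(s2, model, actions, horizon - 1))
                for p, r, s2 in model(state, a))
        values.append(v)
    return max(values)

# Toy environment: "safe" pays 0.5 surely; "risky" pays 1.0 with prob 0.6.
model = lambda s, a: ([(1.0, 0.5, s)] if a == "safe"
                      else [(0.6, 1.0, s), (0.4, 0.0, s)])
actions = lambda s: ["safe", "risky"]

print(f"{expectimax('start', model, actions, horizon=3):.2f}")  # 1.80 = 3 * 0.6
```

Replacing the max on the opponent's side of minimax with this expectation is exactly the minimax-to-expectimax move described above.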
Speaker 2
46:51
Can you explicitly try to describe the universal distribution, and how Solomonoff induction plays a role here? I'm trying to understand.
Speaker 3
47:00
So what it does is: in the simplest case, I said, take the shortest program describing your data, run it, have a prediction, which would be deterministic. Yes. Okay, but you should not just take the shortest program, but also consider the longer ones, but give them lower a priori probability.
Speaker 3
47:18
So in the Bayesian framework, you say a priori any distribution, which is a model or a stochastic program, has a certain a priori probability, which is 2 to the minus the length of this program (and why 2 to the minus length, you know, I could explain). So longer programs are punished a priori. And then you multiply it with the so-called likelihood function, which, as the name suggests, is how likely this model is given the data at hand. So if you have a very wrong model, it's very unlikely that this model is true.
Speaker 3
47:54
And so it's a very small number. So even if the model is simple, it gets penalized by that. And what you do is then you take just the sum, or the weighted average, over it. And this gives you a probability distribution.
Speaker 3
48:07
So that's the universal distribution, also called the Solomonoff distribution.
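In standard notation, the mixture just described is usually written like this (one common rendering of it):

```latex
% The universal (Solomonoff) mixture: every (semi)computable model \nu is
% weighed by 2^{-K(\nu)}, where K(\nu) is the length of the shortest
% program computing \nu, and prediction is the posterior-weighted average.
\[
  \xi(x_{1:n}) \;=\; \sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\,\nu(x_{1:n}),
  \qquad
  \xi(x_{n+1}\mid x_{1:n}) \;=\; \frac{\xi(x_{1:n+1})}{\xi(x_{1:n})}.
\]
```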
Speaker 2
48:10
So it's weighed by the simplicity of the program and the likelihood. Yes. It's kind of a nice idea.
Speaker 2
48:17
Yeah. So, okay. And then you said you're planning N or M, I forgot the letter, steps into the future. So how difficult is that problem?
Speaker 2
48:28
What's involved there? Okay, so basic optimization problem, what are we talking about?
Speaker 3
48:31
Yeah, so you have a planning problem up to horizon M, and that's exponential time in the horizon M, which is, I mean, computable but intractable. I mean, even for chess it's already intractable to do that exactly, and, you know, for Go too.
Speaker 2
48:31
But it could also be a discounted kind of framework, or?
Speaker 3
48:49
Yeah. So having a hard horizon, you know, at 100 years, is just for simplicity of discussing the model, and also sometimes the math is simpler. But there are lots of variations. Actually, it's a quite interesting parameter.
Speaker 3
49:03
There's nothing really problematic about it, but it's very interesting. So for instance, you think, no, let's let the parameter M tend to infinity, right? You want an agent which lives forever, right? If you do it naively, you have 2 problems.
Speaker 3
49:17
First, the mathematics breaks down, because you have an infinite reward sum, which may give infinity: getting reward 0.1 every time step gives infinity, and getting reward 1 every time step gives infinity, so they're equally good. Not really what we want. The other problem is that if you have an infinite life, you can be lazy for as long as you want, for 10 years, and then catch up with the same expected reward. And, you know, think about yourself, or maybe some friends or so: if they knew they lived forever, you know, why work hard now? Just enjoy your life and then catch up later. So that's another problem with the infinite horizon.
Speaker 3
49:56
And you mentioned, yes, we can go to discounting. But then the standard discounting is so-called geometric discounting: so a dollar today is worth about as much as, you know, one dollar and five cents tomorrow. So if you do this so-called geometric discounting, you have introduced an effective horizon: the agent is now motivated to look ahead a certain amount of time, effectively.
Speaker 3
50:18
It's like a moving horizon. And for any fixed effective horizon, there is a problem which requires a larger horizon to solve. So if I look ahead, you know, five time steps, I'm a terrible chess player, right? I need to look ahead longer.
Speaker 3
50:34
If I play Go, I probably have to look ahead even longer. So for every horizon, there is a problem which this horizon cannot solve. But I introduced the so-called near-harmonic horizon, where the discount goes down with 1 over t rather than exponentially in t, which produces an agent which effectively looks into the future proportional to its age. So if it's 5 years old, it plans for 5 years.
Speaker 3
50:57
If it's 100 years old, it then plans for 100 years.
Speaker 2
51:00
And it's
Speaker 3
51:00
a little bit similar to humans too, right? I mean, children don't plan ahead very long, but when we become adults, we plan ahead longer. Maybe when we get very old, I mean, we know that we don't live forever, and maybe then our horizon shrinks again.
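A quick numeric sketch of the two discount choices. The transcript says 1 over t; the code below uses a 1/t^2 weight (the near-harmonic choice in Hutter's published work), but any polynomial falloff shows the effect: a horizon that grows with the agent's age, versus a fixed one for geometric discounting.

```python
# Geometric discounting yields a fixed effective lookahead regardless of
# the agent's age; a polynomially decaying discount yields a lookahead
# that grows proportionally with age.

def effective_horizon(weight, t, mass=0.8, tail_len=200000):
    """Steps past time t needed to cover `mass` of the remaining weight."""
    tail = [weight(k) for k in range(t, t + tail_len)]
    need, acc = mass * sum(tail), 0.0
    for i, w in enumerate(tail, start=1):
        acc += w
        if acc >= need:
            return i

geometric = lambda t: 0.95 ** t     # effective horizon stays ~32 steps
harmonic = lambda t: 1.0 / t ** 2   # effective horizon grows like ~4 * t

for age in (10, 100, 1000):
    print(age, effective_horizon(geometric, age), effective_horizon(harmonic, age))
```

At age 10 the harmonic agent looks roughly 40 steps ahead, at age 1000 roughly 4000, while the geometric agent's lookahead never changes; this is the "plans proportional to its age" behavior described above.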
Speaker 2
51:15
So that's really interesting. So adjusting the horizon, is there some mathematical benefit of that? Or is it just nice? I mean, intuitively, empirically, it would probably be a good idea to sort of push the horizon back, extend the horizon as you experience more of the world. But are there some mathematical conclusions here
Speaker 3
51:35
that are beneficial? With Solomonoff, with the prediction part, we have extremely strong finite-time and finite-data results: you have so and so much data, then you lose so and so much.
Speaker 3
51:47
So the theory is really great. With the AIXI model, with the planning part, many results are only asymptotic, which, well...
Speaker 2
51:47
What does asymptotic mean?
Speaker 3
51:47
Asymptotic means you can prove, for instance, that in the long run, if the agent acts long enough, then it performs optimally, or some nice thing happens.
Speaker 3
52:06
So, but you don't know how fast it converges. So it may converge fast, but we're just not able to prove it because of difficult problem, or maybe there's a bug in the model so that it's really that slow. So that is what asymptotic means, sort of eventually, but we don't know how fast. And if I give the agent a fixed horizon M, then I cannot prove asymptotic results, right?
Speaker 3
52:32
So I mean, sort of if it dies in 100 years, then in 100 years it's over, I cannot say eventually. So this is the advantage of the discounting that I can prove asymptotic results.
Speaker 2
52:42
So just to clarify: okay, I've built up a model, we're now in the moment, I have this way of looking several steps ahead. How do I pick what action I will take?
Speaker 3
53:00
It's like with playing chess, right? You do this minimax. In this case here, you do expectimax based on the Solomonoff distribution. You propagate back, and then, well, an action falls out.
Speaker 3
53:12
The action which maximizes the future expected reward under the Solomonoff distribution, and then
Speaker 2
53:16
you just take this action. And then repeat.
Speaker 3
53:19
And then you get a new observation, and you feed in this action and observation, and you repeat.
Speaker 2
53:23
And the reward, so on.
Speaker 3
53:24
Yeah, so the reward too, yeah.
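The interaction loop, as bare scaffolding; `plan` is a random placeholder where AIXI's expectimax over the Solomonoff distribution would go, and the toy environment is an assumption for the example.

```python
import random

# Perceive-act-reward loop: at each step the agent picks an action given
# the *whole* history, the environment returns an observation and a
# (possibly sparse) reward, and the tuple is appended to the history.

def plan(history, actions=("left", "right")):
    return random.choice(actions)              # placeholder policy

def environment(action):
    obs = "ping" if action == "left" else "pong"
    reward = 1.0 if action == "left" else 0.0  # toy reward signal
    return obs, reward

history, total = [], 0.0
for t in range(10):
    a = plan(history)                          # action a_t from full history
    o, r = environment(a)                      # perception x_t = (o_t, r_t)
    history.append((a, o, r))
    total += r
print(f"return over 10 steps: {total}")
```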
Speaker 2
53:26
And then maybe you can even predict your own action. I love that idea. But okay, this big framework, what is it... it's kind of a beautiful mathematical framework to think about artificial general intelligence.
Speaker 2
53:41
What can you... what does it help you intuit about how to build such systems? Or, maybe from another perspective, what does it help us understand about AGI?
Speaker 3
53:56
So when I started in the field, I was always interested in two things. One was AGI, the name didn't exist then, but it was called general AI or strong AI, and the physics theory of everything. So I switched back and forth between computer science and physics quite often.
Speaker 2
54:14
You said the theory of everything.
Speaker 3
54:15
The theory of everything, yeah, just like.
Speaker 2
54:18
It was basically the two biggest problems before all of humanity.
Speaker 3
54:23
Yeah, I can explain, if you want, at some later time, why I'm interested in these two questions. Can I
Speaker 2
54:30
ask you, on a small tangent, if it was one to be solved, which one would you... if an apple fell on your head and there was a brilliant insight and you could arrive at the solution to one, would it be AGI or the theory of everything?
Speaker 3
54:30
Definitely AGI, because once the AGI problem is solved, I can ask the AGI to solve the other problem for me.
Speaker 2
54:30
Yeah, brilliantly put.
Speaker 2
54:57
Okay, so as you were saying about it.
Speaker 3
55:01
Okay, so the reason why I didn't settle... I mean, this thought about, once you have solved AGI, it solves all kinds of other problems, not just the theory of everything problem, but all kinds of more useful problems to humanity, is very appealing to many people. And I had this thought also. But I was quite disappointed with the state of the art of the field of AI.
Speaker 3
55:25
There was some theory about logical reasoning, but I was never convinced that this would fly. And then there were these more heuristic approaches with neural networks, and I didn't like these heuristics. And also, I didn't have any good idea myself. So that's the reason why I toggled back and forth quite some while, and even worked four and a half years in a company developing software, something completely unrelated.
Speaker 3
55:49
But then I had this idea about the AIXI model. And so what it gives you: it gives you a gold standard. So I have proven that this is the most intelligent agent which anybody could "build," in quotation marks, because it's just mathematical and you need infinite compute. Yeah, but this is the limit, and it is completely specified. It's not just a framework; you know, every year tens of frameworks are developed which are just skeletons, and then pieces are missing, and usually these missing pieces, you know, turn out to be really, really difficult. And so this is completely and uniquely defined, and we can analyze that mathematically. And we have also developed some approximations.
Speaker 3
56:37
I can talk about that a little bit later. That would be sort of the top-down approach, like, say, von Neumann's minimax theory: that is the theoretically optimal play of games. And now we need to approximate it, put heuristics in, prune the tree, blah, blah, blah, and so on. So we can do that also with the AIXI model, but for general AI.
Speaker 3
56:55
It can also inspire those, and most researchers go bottom-up, right? They have their systems, they try to make them more general, more intelligent. It can inspire in which direction to go.
Speaker 2
56:55
What do you mean by that?
Speaker 3
57:09
So if you have some choice to make, right? So how should I evaluate my system if I can't do cross-validation? How should I do my learning if my standard regularization doesn't work well? So the answer is always this.
Speaker 3
57:22
We have a system which does everything. That's AIXI. It's just, you know, completely in the ivory tower, completely useless from a practical point of view. But you can look at it and see, ah, yeah, maybe I can take some aspects.
Speaker 3
57:34
Instead of Kolmogorov complexity, let's just take some compressors which have been developed so far. And for the planning, well, we have UCT, which has also been used in Go. And at least it's inspired me a lot to have this formal definition. And if you look at other fields, you know, like, I always come back to physics because I have a physics background.
Speaker 3
57:58
Think about the phenomenon of energy. That was for a long time a mysterious concept, and at some point it was completely formalized, and that really helped a lot. And you can point out a lot of these things which were first mysterious and vague, and then they have been rigorously formalized. Speed and acceleration have been confused, right, until they were formally defined; there was a time like this. And people, you know, often, who don't have any background, still confuse it.
Speaker 3
58:27
So, and this AIXI model, or the intelligence definition, which is sort of the dual to it, we come back to that later, formalizes the notion of intelligence uniquely and rigorously.
Speaker 2
58:38
So in a sense, it serves as kind of the light at the end of the tunnel.
Speaker 3
58:43
Yes, yeah.
Speaker 2
58:44
So, I mean, there's a million questions I could ask here. So maybe, kind of, okay, let's feel around in the dark a little bit. So there have been, here at DeepMind but in general, a lot of breakthrough ideas, just like we've been saying, around reinforcement learning.
Speaker 2
58:59
So how do you see the progress in reinforcement learning as different? Like, which subset of AIXI does it occupy? The current, like you said, maybe the Markov assumption is made quite often in reinforcement learning. There are other assumptions made in order to make the system work. What do you see as the difference, the connection, between reinforcement learning and AIXI?
Speaker 3
59:17
So the major difference is that essentially all other approaches make stronger assumptions.
Speaker 3
59:35
So in reinforcement learning, the Markov assumption is that the next state or next observation only depends on the previous observation, and not the whole history, which makes, of course, the mathematics much easier, rather than dealing with histories. Of course, they profit from it also, because then you have algorithms that run on current computers and do something practically useful. But for general AI, all the assumptions which are made...