Speaker 1
00:00
The following is a conversation with Francois Chollet, his second time on the podcast. He's both a world-class engineer and a philosopher in the realm of deep learning and artificial intelligence. This time, we talk a lot about his paper titled On the Measure of Intelligence that discusses how we might define and measure general intelligence in our computing machinery. Quick summary of the sponsors, Babbel, Masterclass, and Cash App.
Speaker 1
00:29
Click the sponsor links in the description to get a discount and to support this podcast. As a side note, let me say that the serious, rigorous, scientific study of artificial general intelligence is a rare thing. The mainstream machine learning community works on very narrow AI with very narrow benchmarks. This is very good for incremental and sometimes big incremental progress.
Speaker 1
00:53
On the other hand, the renegade AGI community outside the mainstream, you could say, works on approaches that verge on the philosophical and even the literary, without big public benchmarks. Walking the line between the two worlds is a rare breed, but it doesn't have to be. I ran the AGI series at MIT as an attempt to inspire more people to walk this line. DeepMind and OpenAI, for a time, and still on occasion, walk this line.
Speaker 1
01:23
Francois Chollet does as well. I hope to also. It's a beautiful dream to work towards and to make real one day. If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts, follow us on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
Speaker 1
01:41
As usual, I'll do a few minutes of ads now and no ads in the middle. I try to make these interesting, but I give you timestamps so you can skip. But still, please do check out the sponsors by clicking the links in the description. It's the best way to support this podcast.
Speaker 1
01:57
This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to babbel.com and use code Lex to get 3 months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts.
Speaker 1
02:22
Let me read a few lines from the Russian poem Noch, ulitsa, fonar, apteka by Alexander Blok that you'll start to understand if you sign up to Babbel. Noch, ulitsa, fonar, apteka, bessmyslennyy i tusklyy svet. Zhivi eshchyo khot chetvert veka. Vsyo budet tak, iskhoda net. (Night, street, streetlamp, drugstore, a meaningless and dim light. Live another quarter century, all will be the same. There is no way out.)
Speaker 1
02:44
Now, I say that you'll start to understand this poem because Russian starts with a language and ends with vodka. Now, the latter part is definitely not endorsed or provided by Babbel. It will probably lose me this sponsorship, although it hasn't yet. But once you graduate with Babbel, you can enroll in my advanced course of late night Russian conversation over vodka.
Speaker 1
03:09
No app for that yet. So get started by visiting babbel.com and use code Lex to get 3 months free. This show is also sponsored by Masterclass. Sign up at masterclass.com slash lex to get a discount and to support this podcast.
Speaker 1
03:26
When I first heard about Masterclass, I thought it was too good to be true. I still think it's too good to be true. For $180 a year, you get an all-access pass to watch courses from, to list some of my favorites: Chris Hadfield on space exploration, hope to have him on this podcast one day.
Speaker 1
03:43
Neil deGrasse Tyson on scientific thinking and communication, hope to have Neil too. Will Wright, creator of SimCity and The Sims, on game design. Carlos Santana on guitar, Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com slash Lex to get a discount and to support this podcast. This show finally is presented by Cash App, the number 1 finance app in the App Store.
Speaker 1
04:18
When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App allows you to send and receive money digitally, let me mention a surprising fact related to physical money. Of all the currency in the world, roughly 8% of it is actually physical money.
Speaker 1
04:39
The other 92% of the money only exists digitally. And that's only going to increase. So again, if you get Cash App from the App Store or Google Play and use code LEXPODCAST, you get 10 bucks and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now here's my conversation with Francois Chollet.
Speaker 1
05:05
What philosophers, thinkers, or ideas had a big impact on you growing up, and today?
Speaker 2
05:08
So one author that had a big impact on me when I read his books as a teenager was Jean Piaget, a Swiss psychologist who is considered to be the father of developmental psychology. And he has a large body of work about basically how intelligence develops in children.
Speaker 2
05:33
And so it's very old work, like most of it is from the 1930s, 1940s. So it's not quite up to date. It's actually superseded by many newer developments in developmental psychology. But to me it was very interesting, very striking and actually shaped the early ways in which I started to think about the mind and the development of intelligence as a teenager.
Speaker 1
05:56
His actual ideas or the way he thought about it or just the fact that you could think about the developing mind at all?
Speaker 2
06:01
I guess both. Jean Piaget is the author that introduced me to the notion that intelligence and the mind is something that you construct throughout your life, and that children construct it in stages. And I thought that was a very interesting idea, which is, you know, of course, very relevant to AI, to building artificial minds.
Speaker 2
06:23
Another book that I read around the same time that had a big impact on me, and there was actually a little bit of overlap with Piaget as well, and I read it around the same time, is Jeff Hawkins' On Intelligence, which is a classic. And he has this vision of the mind as a multi-scale hierarchy of temporal prediction modules. And these ideas really resonated with me, like the notion of a modular hierarchy of, you know, potentially compression functions or prediction functions.
Speaker 2
07:01
I thought it was really, really interesting. And it really shaped the way I started thinking about how to build minds.
Speaker 1
07:10
The hierarchical nature, which aspect? Also, he's a neuroscientist, so he was thinking about the actual brain, he was basically talking about how our mind works.
Speaker 2
07:20
Yeah, the notion that cognition is prediction was an idea that was kind of new to me at the time and that I really loved at the time. And yeah, and the notion that there are multiple scales of processing in the brain. The hierarchy.
Speaker 1
07:36
Yes. This is before deep learning.
Speaker 2
07:38
These ideas of hierarchies in AI have been around for a long time, even before On Intelligence. I mean, they've been around since the 1980s. And yeah, that was before deep learning, but of course I think these ideas really found their practical implementation in deep learning.
Speaker 1
07:57
What about the memory side of things? I think he was talking about knowledge representation. Do you think about memory a lot?
Speaker 1
08:04
One way you can think of neural networks is as a kind of memory: you're memorizing things, but it doesn't seem to be the kind of memory that's in our brains, or it doesn't have the same rich complexity, long-term nature that's in our brains.
Speaker 2
08:22
Yes. The brain is more of a sparse access memory, so that you can actually retrieve very precisely, like, bits of your experience.
Speaker 1
08:30
The retrieval aspect, you can like introspect, you can ask yourself questions,
Speaker 2
08:35
I guess. Yes, you can program your own memory, and language is actually the tool you use to do that. I think language is a kind of operating system for the mind, and you use language...
Speaker 2
08:47
Well, one of the uses of language is as a query that you run over your own memory. You use words as keys to retrieve specific experiences, specific concepts, specific thoughts. Language is a way you store thoughts, not just in writing in the physical world, but also in your own mind, and it's also how you retrieve them. Like, imagine if you didn't have language: then you would not really have an internally triggered way of retrieving past thoughts. You would have to rely on external experiences.
Speaker 2
09:21
For instance, you see a specific sight, you smell a specific smell, and that brings up memories, but you would not really have a way to deliberately access these memories without language.
Speaker 1
09:32
Well, the interesting thing you mentioned is you can also program the memory. You can change it probably with language. Yeah, using language, yes.
Speaker 1
09:41
Well, let me ask you a Chomsky question, which is, like: first of all, do you think language is fundamental? Like, there are the turtles, and what's at the bottom of the turtles? It can't be turtles all the way down. Is language at the bottom of cognition, of everything? Is language the fundamental aspect of what it means to be a thinking thing?
Speaker 2
10:10
No, I don't think so. I think language is-
Speaker 1
10:13
You disagree with Noam Chomsky?
Speaker 2
10:14
Yes. I think language is a layer on top of cognition. So it is fundamental to cognition in the sense that to use a computing metaphor, I see language as the operating system of the brain, of the human mind. And the operating system, you know, is a layer on top of the computer.
Speaker 2
10:33
The computer exists before the operating system, but the operating system is how you make it truly useful.
Speaker 1
10:39
And the operating system is most likely Windows, not Linux, because language is messy.
Speaker 2
10:45
Yeah, it's messy. And it's pretty difficult to inspect it, introspect it.
Speaker 1
10:53
How do you think about language? Like, we actually use sort of human-interpretable language, but is there something deeper, something closer to, like, logical types of statements? Like, what is the nature of language, do you think?
Speaker 1
11:14
Is there something deeper than, like, the syntactic rules we construct? Is there something that doesn't require utterances or writing and so on?
Speaker 2
11:25
Are you asking about the possibility that there could exist languages for thinking that are not made of words? Yeah. Yeah, I think so.
Speaker 2
11:34
I think so. The mind is layers, right? And language is almost like the outermost, the uppermost layer. But before we think in words, I think we think in terms of motion, in space, and we think in terms of physical actions. And I think babies in particular probably express thoughts in terms of the actions that they've seen or that they can perform.
Speaker 2
12:03
And in terms of the motions of objects in their environment, before they start thinking in terms of words.
Speaker 1
12:10
It's amazing to think about that as the building blocks of language. So like the kinds of actions and ways the baby sees the world as more fundamental than the beautiful Shakespearean language you construct on top of it. And we probably don't have any idea what that looks like, right?
Speaker 1
12:31
Like what, because it's important when trying to engineer it into AI systems.
Speaker 2
12:38
I think visual analogies and motion are a fundamental building block of the mind. And you actually see it reflected in language. Like, language is full of spatial metaphors.
Speaker 2
12:51
And when you think about things, I consider myself very much a visual thinker. You often express your thoughts by doing things like visualizing concepts in a 2D space, or you solve problems by imagining yourself navigating a concept space. I don't know if you have this sort of experience.
Speaker 1
13:17
You said visualize in concept space. So I certainly visualize mathematical concepts, but you mean in concept space? Visually, you're embedding ideas into a three-dimensional space you can explore with your mind, essentially?
Speaker 2
13:38
For me it's more like 2D, but yeah. 2D? Yeah.
Speaker 1
13:41
You're a flatlander. You're, okay. No.
Speaker 1
13:48
I do not. I always have to, before I jump from concept to concept, I have to put it back down. It has to be on paper. I can only travel on 2D paper, not inside my mind.
Speaker 1
14:03
You're able to move inside your mind.
Speaker 2
14:05
But even if you're writing like a paper, for instance, don't you have like a spatial representation of your paper? Like you visualize where ideas lie topologically in relationship to other ideas, kind of like a subway map of the ideas in your paper.
Speaker 1
14:22
Yeah, that's true. I mean, there is, in papers, I don't know about you, but it feels like there's a destination. There's a key idea that you want to arrive at and a lot of it is in the fog and you're trying to kind of, it's almost like, what's that called when you do a path planning search from both directions, from the start and from the end?
Speaker 1
14:52
And then you find, you do like shortest path, but like, you know, in game playing, you do this with like A star from both sides. And you
Speaker 2
15:01
see where they're going to join.
Speaker 1
15:03
Yeah, so you kind of do, at least for me, I think, like, first of all, just exploring from the start, from first principles: what do I know, and what can I start proving from that, right? And then from the destination, you use backtracking: if I want to show some kind of set of ideas, what would it take to show them?
Speaker 1
15:26
And you kind of backtrack. But yeah, I don't think I'm doing all that in my mind though. Like I'm putting it down on paper.
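(As a rough illustration of the bidirectional search idea being discussed here, the following is a minimal sketch in Python of a bidirectional breadth-first search over a small concept graph. The graph, node names, and resulting path are made up for illustration and are not anything from the conversation itself.)

from collections import deque

def bidirectional_search(graph, start, goal):
    """Expand frontiers from both start and goal until they meet, then stitch the path."""
    if start == goal:
        return [start]
    parents_fwd, parents_bwd = {start: None}, {goal: None}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])

    def expand(frontier, parents, other_parents):
        node = frontier.popleft()
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                frontier.append(nxt)
                if nxt in other_parents:  # the two frontiers have met
                    return nxt
        return None

    while frontier_fwd and frontier_bwd:
        meet = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            # Walk back to the start on one side, then forward to the goal on the other.
            path, n = [], meet
            while n is not None:
                path.append(n)
                n = parents_fwd[n]
            path.reverse()
            n = parents_bwd[meet]
            while n is not None:
                path.append(n)
                n = parents_bwd[n]
            return path
    return None

# Toy undirected "concept graph" of ideas in a paper (entirely hypothetical).
concepts = {
    "first principles": ["lemma A", "lemma B"],
    "lemma A": ["first principles", "key idea"],
    "lemma B": ["first principles", "key idea"],
    "key idea": ["lemma A", "lemma B", "main result"],
    "main result": ["key idea"],
}
print(bidirectional_search(concepts, "first principles", "main result"))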
Speaker 2
15:33
Do you use mind maps to organize your ideas? Yeah, I like mind maps. I'm that kind of person.
Speaker 1
15:37
Let's get into this, because I've been so jealous of people. I haven't really tried it.
Speaker 1
15:42
I've been jealous of people that seem to, like, they get like this fire of passion in their eyes because everything starts making sense. It's like Tom Cruise in the movie was like moving stuff around. Some of the most brilliant people I know use mind maps. I haven't tried really.
Speaker 1
15:57
Can you explain what the hell a mind map is?
Speaker 2
16:01
I guess a mind map is a way to take, kind of, the mess inside your mind and just put it on paper so that you gain more control over it. It's a way to organize things on paper, and as a kind of consequence of organizing things on paper, it starts being more organized inside your own mind.
Speaker 1
16:20
So what does that look like? You put, like, do you have an example? Like, what's the first thing you write on paper?
Speaker 1
16:27
What's the second thing you write?
Speaker 2
16:28
I mean, typically you draw a mind map to organize the way you think about a topic. So you would start by writing down the key concept of that topic, like you would write intelligence or something, and then you would start adding associative connections. Like, what do you think about when you think about intelligence?
Speaker 2
16:47
What do you think are the key elements of intelligence? So maybe you would have language, for instance, and you'd have motion. And so you would start drawing nodes with these things. And then you would see, what do you think about when you think about motion?
Speaker 2
16:58
And so on, and you would go like that, like a tree.
Speaker 1
17:01
Is it a tree mostly, or is it a graph too? Like a tree?
Speaker 2
17:05
Oh, it's more of a graph than a tree. And it's not limited to just, you know, writing down words. You can also draw things, and it's not supposed to be purely hierarchical.
Speaker 2
17:19
Right? The point is that once you start writing it down, you can start reorganizing it so that it makes more sense, so that it's connected in a more effective way.
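(To make the graph-versus-tree distinction concrete, here is a minimal sketch, with made-up node names, of a mind map stored as a plain undirected graph rather than a strict hierarchy. Nothing below comes from the conversation; it is just an illustration.)

from collections import defaultdict

class MindMap:
    """A mind map as a plain undirected graph: nodes plus free-form associations."""
    def __init__(self, central_concept):
        self.central = central_concept
        self.edges = defaultdict(set)

    def connect(self, a, b):
        # Any node can link to any other node; nothing forces a tree shape.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def is_tree(self):
        # A connected graph is a tree only if it has exactly n - 1 edges.
        n = len(self.edges)
        m = sum(len(neighbors) for neighbors in self.edges.values()) // 2
        return m == n - 1

m = MindMap("intelligence")
m.connect("intelligence", "language")
m.connect("intelligence", "motion")
m.connect("language", "motion")  # a cross-link like this is what makes it a graph, not a tree
print(m.is_tree())  # False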
Speaker 1
17:30
See, but I'm so OCD that when you just mentioned intelligence and then language and motion, I would start becoming paranoid that the categorization is imperfect, and I would become paralyzed with the mind map.
Speaker 1
17:51
Even though you're just doing associative kinds of connections, there's an implied hierarchy that's emerging, and I would start becoming paranoid that it's not the proper hierarchy. So one way to see mind maps is you're putting thoughts on paper, it's like a stream of consciousness, but then you can also start getting paranoid: well, is this the right hierarchy?
Speaker 2
18:15
Sure, but it's a mind map, it's your mind map. You're free to draw anything you want. You're free to draw any connection you want.
Speaker 2
18:20
And you can just make a different mind map if you think the central node is not the right node.
Speaker 1
18:26
Yeah, I suppose there's a fear of being wrong.
Speaker 2
18:30
If you want to organize your ideas by writing down what you think, which I think is very effective, like, how do you know what you think about something if you don't write it down, right? If you do that in prose, the thing is that it imposes a much more syntactic structure over your ideas, which is not required with a mind map.
Speaker 2
18:51
So a mind map is kind of like a lower-level, more freehand way of organizing your thoughts. And once you've drawn it, then you can start actually voicing your thoughts in terms of, you know, paragraphs.
Speaker 1
19:05
And sometimes... It's a two-dimensional aspect of layout too, right? Yeah. And it's kind of a flower, I guess, you start.
Speaker 1
19:12
There's usually, you want to start with a central concept?
Speaker 2
19:15
Yes. And then
Speaker 1
19:16
you move out.
Speaker 2
19:16
Typically, it ends up more like a subway map. So it ends up more like a graph, a topological graph.
Speaker 1
19:22
Without a root node.
Speaker 2
19:24
Yeah, so like in a subway map, there are some nodes that are more connected than others. And there are some nodes that are more important than others. So there are destinations, but it's not gonna be purely like a tree, for instance.
Speaker 1
19:36
Yeah. It's fascinating to think whether there's something to that about the way our mind thinks. By the way, I just kind of remembered an obvious thing: I have probably thousands of documents in Google Docs at this point that are bullet-point lists, which means you can probably map a mind map to a bullet-point list. It's the same... no, it's not, it's a tree. It's a tree, yeah.
Speaker 1
20:06
So I create trees, but also they don't have the visual element. Like I guess I'm comfortable with the structure. It feels like, the narrowness, the constraints feel more comforting.
Speaker 2
20:18
If you have thousands of documents with your own thoughts in Google Docs, why don't you write some kind of search engine, like maybe a mind map, a piece of software, a mind-mapping software, where you write down a concept and then it gives you sentences or paragraphs from your thousands of Google Docs documents that match this concept.
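(As a rough sketch of the kind of search tool being suggested here, and not anything either speaker has actually built: a minimal keyword-level version using scikit-learn's TF-IDF vectorizer. The document snippets are invented; a more genuinely semantic variant would swap the TF-IDF vectors for sentence embeddings.)

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-ins for paragraphs pulled out of a pile of personal notes (hypothetical).
paragraphs = [
    "Notes on how babies learn by moving objects around.",
    "Bullet points about measuring intelligence as skill-acquisition efficiency.",
    "Thoughts on language as an operating system for the mind.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(paragraphs)

def search(query, top_k=2):
    """Return the stored paragraphs that best match the query concept."""
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    ranked = sorted(zip(scores, paragraphs), reverse=True)
    return [text for score, text in ranked[:top_k] if score > 0]

print(search("intelligence"))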
Speaker 1
20:41
The problem is that, unlike mind maps, it's so deeply rooted in natural language. So it's not semantically searchable, I would say, because the categories are very... you kind of mentioned intelligence, language, and motion. Those are very strong semantic categories.
Speaker 1
21:03
It feels like the mind map forces you to be semantically clear and specific. The bullet-point lists I have are sparse, disparate thoughts that poetically represent a category like motion, as opposed to saying motion. So unfortunately, that's the same problem with the internet. That's why the idea of the semantic web is difficult to realize.
Speaker 1
21:34
Most language on the internet is a giant mess of natural language that's hard to interpret. So, you actually originally brought mind maps up as we were talking about cognition and language. Do you think there's something to mind maps about how our brain actually thinks, reasons about things?
Speaker 2
22:01
It's possible. I think it's reasonable to assume that there is some level of topological processing in the brain, that the brain is very associative in nature. And I also believe that a topological space is a better medium to encode thoughts than a geometric space.
Speaker 2
22:27
So I think...
Speaker 1
22:28
What's the difference between a topological and a geometric space?
Speaker 2
22:31
Well, if you're talking about topologies, then points are either connected or not. So the topology is more like a subway map. And geometry is when you're interested in the distance between things.
Speaker 2
22:43
And in subway maps, you don't really have the concept of distance. You only have the concept of whether there is a train going from station A to station B. And what we do in deep learning is that we're actually dealing with geometric spaces. We are dealing with concept vectors, word vectors, that have a distance between them expressed in terms of a dot product.
Speaker 2
23:06
We are not really building topological models, usually.
Speaker 1
23:10
I think you're absolutely right. Like, distance is of fundamental importance in deep learning. I mean, it's the continuous aspect of it.
Speaker 2
23:19
Yes, because everything is a vector and everything has to be a vector because everything has to be differentiable. If your space is discrete, it's no longer differentiable. You cannot do deep learning in it anymore.
Speaker 2
23:29
Well, you could, but you can only do it by embedding it in a bigger continuous space. So if you do topology in the context of deep learning, you have to do it by embedding your topology in a geometry, right?
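(A small sketch of the contrast being drawn, with made-up vectors and edges: a geometric view where concepts are vectors and what matters is distance, versus a topological view where all you record is which concepts are connected.)

import numpy as np

# Geometric view: concepts are vectors, and what matters is distance (here, cosine similarity).
vectors = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["cat"], vectors["dog"]))  # high: "close" concepts
print(cosine(vectors["cat"], vectors["car"]))  # low: "far" concepts

# Topological view: no distances at all, only whether two concepts are connected,
# like stations on a subway map.
edges = {("cat", "dog"), ("dog", "car")}

def connected(a, b):
    return (a, b) in edges or (b, a) in edges

print(connected("cat", "dog"))  # True
print(connected("cat", "car"))  # False: no direct link, only a path through "dog"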
Speaker 1
23:41
Yeah. Well, let me zoom out for a second. Let's get into your paper, On the Measure of Intelligence.
Speaker 1
23:50
You put it out in 2019? Yes. Okay. Yeah.
Speaker 1
23:54
November. November. Yeah, remember 2019? That was a different time.
Speaker 2
24:01
Yeah, I remember. I still remember. Yeah.
Speaker 1
24:06
It feels like a different world.
Speaker 2
24:09
You could travel, you could actually go outside and see friends.
Speaker 1
24:15
Yeah. Let me ask the most absurd question. I think there's some non-zero probability there'll be a textbook one day, like 200 years from now, on artificial intelligence, or it'll be called just intelligence because humans will already be gone. There'll be your picture with a quote.
Speaker 1
24:35
This is one of the early biological systems that considered the nature of intelligence, and there'll be like a definition of how they thought about intelligence. Which is one of the things you do in your paper on measuring intelligence: to ask, well, what is intelligence, and how to test for intelligence, and so on. So is there a spiffy quote about what is intelligence? What is the definition of intelligence according to Francois Chollet?
Speaker 2
25:06
Yeah, so do you think the superintelligent AIs of the future will want to remember us the way we remember humans from the past? And do you think they won't be ashamed of having a biological origin?
Speaker 1
25:22
No, I think it would be a niche topic. It won't be that interesting, but it'll be like the people that study, in certain contexts, historical civilizations that no longer exist, the Aztecs and so on. That's how it'll be seen. And it'll also be studied in the context of social media; there'll be hashtags about the atrocities committed against human beings when the robots finally got rid of them.
Speaker 1
25:54
Like it was a mistake, it'll be seen as a giant mistake, but ultimately in the name of progress and it created a better world because humans were over-consuming the resources and they were not very rational and were destructive in the end in terms of productivity and putting more love in the world. And so within that context, there'll be a chapter about these biological systems.
Speaker 2
26:17
Seems you have a very detailed vision of that future. You should write a sci-fi novel about it.
Speaker 1
26:22
I'm working on a sci-fi novel currently, yes. Self-published,
Speaker 2
26:28
yeah. The definition of intelligence. So intelligence is the efficiency with which you acquire new skills at tasks that you did not previously know about, that you did not prepare for. Right.
Speaker 2
26:44
So intelligence is not skill itself. It's not what you know, it's not what you can do. It's how well and how efficiently you can learn new things.
Speaker 1
26:54
New things, yes. The idea of newness there seems to be fundamentally important.
Speaker 1
27:01
Yes.
Speaker 2
27:01
So you would see intelligence on display, for instance, whenever you see a human being or, you know, an AI creature adapt to a new environment that it has not seen before, that its creators did not anticipate. When you see adaptation, when you see improvisation, when you see generalization, that's intelligence. In reverse, if you have a system that, when you put it in a slightly new environment, cannot adapt, cannot improvise, cannot deviate from what it's hard-coded to do or what it has been trained to do,
Speaker 2
27:38
That is a system that is not intelligent. There's actually a quote from Einstein that captures this idea, which is, "The measure of intelligence is the ability to change." I like that quote. I think it captures at least part of this idea.
Speaker 1
27:54
You know, there might be something interesting about the difference between your definition and Einstein's. I mean, he's just being Einstein and clever, but the acquisition of the ability to deal with new things versus the ability to just change. What's the difference between those two things?
Speaker 1
28:19
So just change in itself. Do you think there's something to that? Just being able to change.
Speaker 2
28:26
Yes, being able to adapt. So not just change, but change in a certain direction. Being able to adapt yourself to your environment.
Speaker 1
28:37
Whatever the environment is.
Speaker 2
28:38
That's a big part of intelligence, yes. And intelligence is most precisely, you know, how efficiently you're able to adapt, how efficiently you're able to basically master your environment, how efficiently you can acquire new skills. And I
Speaker 3
28:52
think there's a there's a
Speaker 2
28:53
big distinction to be drawn between intelligence, which is a process and the output of that process, which is skill. So for instance, if you have a very smart human programmer that considers the game of chess, and that writes down a static program that can play chess, then the intelligence is the process of developing that program. But the program itself is just encoding the output artifacts of that process.
Speaker 2
29:27
The program itself is not intelligent. And the way you tell it's not intelligent is that if you put it in a different context, you ask it to play Go or something, it's not going to be able to perform well without human involvement. Because the source of intelligence, the entity that is capable of that process is the human programmer. So we should be able to tell a difference between the process and its output.
Speaker 2
29:49
We should not confuse the output and the process. It's the same as, you know, do not confuse a road-building company and one specific road. Because one specific road takes you from point A to point B, but a road-building company can make a path from anywhere to anywhere else.
Speaker 1
30:08
Yeah, that's beautifully put, but to play devil's advocate a little bit, you know, it's possible that there's something more fundamental than us humans. So you kind of drew a distinction between the acquisition of the skill and the skill itself. There could be something, like you could argue the universe is more intelligent.
Speaker 1
30:36
Like the deep, the base intelligence that we should be trying to measure is the thing that created humans. We should be measuring God, or the source of the universe, as opposed to... like, there could be a deeper intelligence. There's always a deeper intelligence, I guess.
Speaker 2
30:56
You can argue that, but that does not take anything away from the fact that humans are intelligent, and you can tell that because they are capable of adaptation and generality. And you see that in particular in the fact that humans are capable of handling situations and tasks that are quite different from anything that any of our evolutionary ancestors has ever encountered. So we are capable of generalizing very much out of distribution, if you consider our evolutionary history as being, in a way, our training data.
Speaker 1
31:33
Of course, evolutionary biologists would argue that we're not going too far out of the distribution. We're like mapping the skills we've learned previously, desperately trying to like jam them into like these new situations.
Speaker 2
31:46
I mean, there's definitely a little bit of that, but it's pretty clear to me that most of the things we do on any given day in our modern civilization are things that are very, very different from what our ancestors a million years ago would have been doing on a given day, and our environment is very different. So I agree that everything we do, we do with cognitive building blocks that we acquired over the course of evolution, right? And that anchors our cognition to a certain context, which is the human condition, very much.
Speaker 2
32:25
But still, our mind is capable of a pretty remarkable degree of generality, far beyond anything we can create in artificial systems today. Like, the degree to which the mind can generalize away from its evolutionary history is much greater than the degree to which a deep learning system today can generalize away from its training data.
Speaker 1
32:50
And like the key point you're making, which I think is quite beautiful, is like we shouldn't measure, if we're talking about measurement, we shouldn't measure the skill. We should measure like the creation of the new skill, the ability to create that new skill. But it's tempting, like, it's weird because the skill is a little bit of a small window into the system.
Speaker 1
33:16
So whenever you have a lot of skills, it's tempting to measure the skills.
Speaker 2
33:21
Yes. I mean, the skill is the only thing you can objectively measure. But yeah, the thing to keep in mind is that when you see skill in a human, it gives you a strong signal that that human is intelligent, because you know they weren't born with that skill, typically. Like, you see a very strong chess player. Maybe you're a very strong chess player yourself.
Speaker 1
33:47
I think you're saying that because I'm Russian and now you're prejudiced. You assume all Russians are good
Speaker 2
33:54
at chess.
Speaker 1
33:55
I'm biased. It's cultural bias.
Speaker 2
34:00
So if you see a very strong chess player, you know they weren't born knowing how to play chess. So they had to acquire that skill with their limited resources, with their limited lifetime. And you know, they did that because they are generally intelligent.
Speaker 2
34:15
And so they may as well have acquired any other skill, you know, they have this potential. And on the other hand, if you see a computer playing chess, you cannot make the same assumptions, because you cannot just assume the computer is generally intelligent. The computer may be born knowing how to play chess, in the sense that it may have been programmed by a human that has understood chess for the computer and that has just encoded the output of that understanding in a static program. And that program is not intelligent.
Speaker 1
34:49
So let's zoom out just for a second and ask, like, what is the goal of the On the Measure of Intelligence paper? Like, what do you hope to achieve with it?
Speaker 2
34:59
So the goal of the paper is to clear up some longstanding misunderstandings about the way we've been conceptualizing intelligence in the AI community and in the way we've been evaluating progress in AI. There's been a lot of progress recently in machine learning, and people are, you know, extrapolating from that progress that we are about to reach general intelligence. And if you want to be able to evaluate these statements, you need to precisely define what you're talking about when you're talking about general intelligence.
Speaker 2
35:35
And you need a formal way, a reliable way, to measure how much intelligence, how much general intelligence, a system possesses. And ideally this measure of intelligence should be actionable. So it should not just describe what intelligence is. It should not just be a binary indicator that tells you this system is intelligent or it isn't.
Speaker 2
36:01
It should be actionable. It should have explanatory power, right? So you could use it as a feedback signal, it would show you the way towards building more intelligent systems.
Speaker 1
36:13
So at the first level, you draw a distinction between two divergent views of intelligence: as we just talked about, intelligence as a collection of task-specific skills versus a general learning ability. So what's the difference between this kind of memorization of skills and a general learning ability? We've talked about it a little bit, but can you try to linger on this topic for a bit?
Speaker 2
36:42
Yeah, so the first part of the paper is an assessment of the different ways we've been thinking about intelligence and the different ways we've been evaluating progress in AI. And the history of cognitive sciences has been shaped by two views of the human mind. And one view is the evolutionary psychology view, in which the mind is a collection of fairly static, special-purpose, ad hoc mechanisms that have been hard-coded by evolution over our history as a species, over a very long time.
Speaker 2
37:22
And early AI researchers, people like Marvin Minsky, for instance, they clearly subscribed to this view. And they saw the mind as a kind of, you know, collection of static programs, similar to the programs they would run on, like, mainframe computers. And in fact, I think they very much understood the mind through the metaphor of the mainframe computer, because that was the tool they were working with. Right.
Speaker 2
37:53
And so you had this collection of very different static programs operating over a database-like memory. And in this picture, learning was not very important. Learning was considered to be just memorization. And in fact, learning is basically not featured in AI textbooks until the 1980s, with the rise of machine learning.
Speaker 1
38:16
It's kind of fun to think about that learning was the outcast, like the weird people working on learning. Like, the mainstream AI world was, I mean, I don't know what the best term is, but non-learning.
Speaker 1
38:34
It was seen as like reasoning would not be learning based.
Speaker 2
38:37
Yes, it was considered that the mind was a collection of programs that were primarily logical in nature, and that all you needed to do to create a mind was to write down these programs, and they would operate over knowledge, which would be stored in some kind of database. And as long as your database encompassed, you know, everything about the world and your logical rules were comprehensive, then you would have a mind.
Speaker 2
39:04
So the other view of the mind is the brain as a sort of blank slate, right? This is a very old idea. You find it in John Locke's writings. This is the tabula rasa.
Speaker 2
39:19
And this is the idea that the mind is some kind of information sponge that starts empty, that starts blank, and that absorbs knowledge and skills from experience. So it's a sponge that reflects the complexity of the world, the complexity of your life experience, essentially; everything you know and everything you can do is a reflection of something you found in the outside world, essentially. So this is an idea that's very old. It was not very popular, for instance, in the 1970s, but it has gained a lot of vitality recently with the rise of connectionism, in particular deep learning.
Speaker 2
40:04
And so today deep learning is the dominant paradigm in AI. And I feel like lots of AI researchers are conceptualizing the mind via a deep learning metaphor. Like, they see the mind as a kind of randomly initialized neural network that starts blank when you're born, and that then gets trained and acquires knowledge and skills via exposure to training data.
Speaker 1
40:30
By the way, it's a small tangent. I feel like people who are thinking about intelligence are not conceptualizing it that way. I actually haven't met too many people who believe that a neural network will be able to reason, who seriously think that, rigorously, because I think it's actually an interesting worldview, and we'll talk about it more, but it's been impressive what neural networks have been able to accomplish.
Speaker 1
41:01
And it's, to me, I don't know, you might disagree, but it's an open question whether scaling size eventually might lead to incredible results that to us mere humans will appear as if they're general.
Speaker 2
41:16
I mean, if you ask people who are seriously thinking about intelligence, they will definitely not say that all you need to do is, like the mind is just a neural network. However, it's actually a view that's very popular, I think, in the deep learning community, that many people are kind of conceptually, you know, intellectually lazy about it.
Speaker 1
41:36
Right, but I guess what I'm saying is exactly that. I haven't met many people, and I think it would be interesting to meet a person who is not intellectually lazy about this particular topic and still believes that neural networks will go all the way. I think Yann LeCun is probably closest to that, with self-supervised learning.
Speaker 2
41:57
There are definitely people who argue that current deep learning techniques are already the way to general artificial intelligence, and that all you need to do is to scale them up to all the available training data. And if you look at the waves that OpenAI's GPT-3 model has made, you see echoes of this idea.
Speaker 1
42:22
So on that topic, GPT-3, similar to GPT-2 actually, has captivated some part of the imagination of the public. There's just a bunch of hype of different kinds. It's, I would say, emergent.
Speaker 1
42:37
It's not artificially manufactured. It's just like people just get excited for some strange reason. And in the case of GPT-3, which is funny, there was, I believe, a couple of months' delay from release to hype. Maybe I'm not historically correct on that, but it feels like there was a little bit of a lack of hype and then there was a phase shift into hype.
Speaker 1
43:04
But nevertheless, there's a bunch of cool applications that seem to captivate the imagination of the public about what this language model, trained in an unsupervised way without any fine-tuning, is able to achieve. So what do you make of that? What are your thoughts about GPT-3?
Speaker 2
43:22
Yeah, so I think what's interesting about GPT-3 is the idea that it may be able to learn new tasks after just being shown a few examples. So I think if it's actually capable of doing that, that's novel and that's very interesting, and that's something we should investigate. That said, I must say, I'm not entirely convinced that we have shown it's capable of doing that.
Speaker 2
43:47
It's very likely given the amount of data that the model is trained on that what it's actually doing is pattern matching a new task you give it with a task that it's been exposed to in its training data. It's just recognizing the task instead of just developing a model of the task. Right.
Speaker 1
44:05
But sorry to interrupt, there's a parallel to what you said before, which is that it's possible to see the prompts GPT-3 is given as a kind of SQL query into this thing that it has learned, similar to what you said before about language being used to query the memory. So is it possible that a neural network is a giant memorization thing, but then if it gets sufficiently giant, it'll memorize sufficiently large amounts of things in the world that intelligence becomes a querying machine?
Speaker 2
44:40
I think it's possible that a significant chunk of intelligence is this giant associative memory. I definitely don't believe that intelligence is just a giant associative memory, but it may well be a big component.
Speaker 1
44:55
So do you think GPT-3, 4, 5, GPT-10 will eventually... like, what do you think, where's the ceiling? Do you think it'll be able to reason? No, that's a bad question.
Speaker 1
45:14
Like what is the ceiling is the better question.
Speaker 2
45:16
Well, how is it going to scale? How good is GPT-N going to be? Yeah.
Speaker 2
45:21
So I believe GPT-N is going to improve on the strength of GPT-2 and 3, which is that it will be able to generate, you know, ever more plausible text in context.
Speaker 1
45:37
Just monotonically increasing performance.
Speaker 2
45:41
Yes. If you train a bigger model on more data, then your text will be increasingly more context-aware and increasingly more plausible, in the same way that GPT-3 is much better at generating plausible text compared to GPT-2. But that said, I don't think just scaling up the model to more transformer layers and more training data is going to address the flaws of GPT-3, which is that it can generate plausible text, but that text is not constrained by anything other than plausibility. So in particular, it's not constrained by factualness or even consistency, which is why it's very easy to get GPT-3 to generate statements that are factually untrue, or to generate statements that are even self-contradictory, right?
Speaker 2
46:32
Because its only goal is plausibility, and it has no other constraints. It's not constrained to be self-consistent, right? And so, for this reason, one thing that I thought was very interesting with GPT-3 is that you can predetermine the answer it will give you by asking the question in a specific way, because it's very responsive to the way you ask the question, since it has no understanding of the content of the question. Right.
Speaker 2
47:03
And if you ask the same question in two different ways that are basically adversarially engineered to produce certain answers, you will get two different answers, two contradictory answers.
Speaker 1
47:15
It's very susceptible to adversarial attacks essentially.
Speaker 2
47:18
Potentially, yes. So in general, the problem with these models, these generative models, is that they are very good at generating plausible text, but that's just not enough. Right.
Speaker 2
47:31
You need... I think one avenue that would be very interesting for making progress is to make it possible to write programs over the latent space that these models operate on. You would rely on these self-supervised models to generate a sort of pool of knowledge and concepts and common sense, and then you would be able to write explicit reasoning programs over it. Because the current problem with GPT-3 is that it can be quite difficult to get it to do what you want it to do. If you want to turn GPT-3 into products, you need to put constraints on it.
Speaker 2
48:14
You need to force it to obey certain rules. So you need a way to program it explicitly.
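(One crude way to picture "putting constraints on" a generative model, purely as an illustration of the idea and not a description of how GPT-3 or any real product works: wrap a generator in an explicit program that rejects outputs violating hand-written rules. The generate_text function below is a stand-in stub, not a real API.)

import random

def generate_text(prompt):
    # Stand-in for a call to a large language model: returns one of a few canned candidates.
    candidates = [
        "The Eiffel Tower is in Paris.",
        "The Eiffel Tower is in Rome.",
        "The Eiffel Tower is in Paris and also not in Paris.",
    ]
    return random.choice(candidates)

def violates_rules(text, banned_phrases):
    # The explicit, hand-written constraints layered on top of the generator.
    return any(phrase in text for phrase in banned_phrases)

def constrained_generate(prompt, banned_phrases, max_tries=10):
    """Keep sampling until the output passes the explicit rules, or give up."""
    for _ in range(max_tries):
        text = generate_text(prompt)
        if not violates_rules(text, banned_phrases):
            return text
    return None

print(constrained_generate("Where is the Eiffel Tower?",
                           banned_phrases=["Rome", "not in Paris"]))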
Speaker 1
48:22
Yeah, so if you look at its ability to do program synthesis, it generates, like you said, something that's plausible.
Speaker 2
48:28
Yeah, so if you try to make it generate programs, it will perform well for any program that it has seen in its training data. But because program space is not interpolative, right, it's not going to be able to generalize to problems it hasn't seen before.
Speaker 1
48:48
Now, do you think... sort of an absurd, but I think useful, I guess, intuition builder: you know, GPT-3 has 175 billion parameters. The human brain has about a thousand times that or more, in terms of number of synapses. Obviously they're very different kinds of things, but there is some degree of similarity.
Speaker 1
49:26
What do you think GPT will look like when it has 100 trillion parameters? Do you think our conversation might be different in nature? Like, because you've criticized GPT-3 very effectively now. Do you think?
Speaker 2
49:45
No, I don't think so. So, to begin with, the bottleneck with scaling up GPT-3, GPT models, generative pre-trained transformer models, is not going to be the size of the model or how long it takes to train it. The bottleneck is going to be the training data, because OpenAI is already training GPT-3 on a crawl of basically the entire web, right?
Speaker 2
50:08
And that's a lot of data. So you could imagine training on more data than that. Like Google could train on more data than that, but it would still be only incrementally more data. And I don't recall exactly how much more data GPT-3 was trained on compared to GPT-2, but it's probably at least like a hundred, maybe even a thousand X.
Speaker 2
50:26
I don't have the exact number. You're not going to be able to train a model on a hundred times more data than what you're already doing.
Speaker 1
50:34
So that's brilliant. So it's not, you know... it's easier to think of compute as a bottleneck and then argue that we can remove that bottleneck, but...
Speaker 2
50:41
We can remove the compute bottleneck. I don't think it's a big problem. If you look at the pace at which we've improved the efficiency of deep learning models in the past few years, I'm not worried about training time bottlenecks or model size bottlenecks.
Speaker 2
50:59
The bottleneck in the case of these generative transformer models is absolutely the training data.
Speaker 1
51:05
What about the quality of the data?
Speaker 2
51:08
So yeah, the quality of the data is an interesting point. The thing is, if you're going to want to use these models in real products, then you want to feed them data that's as high quality, as factual, and, I would say, as unbiased as possible. But, you know, there's not really such a thing as unbiased data in the first place. But you probably don't want to train it on Reddit, for instance. That sounds like a bad plan.
Speaker 2
51:36
So, from my personal experience working with large-scale deep learning models: at some point I was working on a model at Google that was trained on 350 million labeled images. It's an image classification model. That's a lot of images. That's probably most of the publicly available images on the web at the time.
Speaker 2
52:00
And it was a very noisy data set, because the labels were not originally annotated by hand by humans. They were automatically derived from, like, tags on social media, or just keywords on the same page as the image was found, and so on. So it was very noisy. And it turned out that if you train on more of the noisy data, you get an incrementally better model, but you very quickly hit diminishing returns.
Speaker 2
52:35
On the other hand, if you train on a smaller data set with higher-quality annotations, annotations that are actually made by humans, you get a better model. And it also takes less time to train it.
Speaker 1
52:49
Oh yeah, that's fascinating. So, with self-supervised learning, is there a way to get better at doing the automated labeling?
Speaker 2
52:58
Yeah, so you can enrich or refine your labels in an automated way? That's correct.
Speaker 1
53:07
Do you have hope for... I don't know if you're familiar with the idea of the semantic web. The semantic web, just for people who are not familiar, is the idea of being able to convert the internet, or be able to attach semantic meaning to the words on the internet, the sentences, the paragraphs, to be able to convert information on the internet, or some fraction of the internet, into something that's interpretable by machines.
Speaker 1
53:38
That was kind of a dream, I think, of the semantic web papers in the 90s. It's kind of the dream that, you know, the internet is full of rich, exciting information. Even just looking at Wikipedia, we should be able to use that as data for machines.
Speaker 2
53:57
And so far... that information is not really in a format that's available to machines. So no, I don't think the semantic web will ever work, simply because it would be a lot of work, right, to provide that information in structured form.
Speaker 2
54:11
And there is not really any incentive for anyone to do that work. So I think the way forward to make the knowledge on the web available to machines is actually something closer to unsupervised deep learning. Yeah, GPT-3 is actually a bigger step in the direction of making the knowledge of the web available to machines than the semantic web was.
Speaker 1
54:36
Yeah, perhaps in a human-centric sense, it feels like GPT-3 hasn't learned anything that could be used to reason. But that might be just the early days.
Speaker 2
54:52
Yeah, I think that's correct. I think the forms of reasoning that you see it perform are basically just reproducing patterns that it has seen in its training data. So of course, if you're trained on the entire web, then you can produce an illusion of reasoning in many different situations, but it will break down if it's presented with a novel situation.
Speaker 1
55:15
That's the open question between the illusion of reasoning and actual reasoning. Yeah.
Speaker 2
55:18
Yes. The power to adapt to something that is genuinely new. Because the thing is, even if you could train on every bit of data ever generated in the history of humanity, that model would be capable of anticipating many different possible situations, but it remains that the future is going to be something different. Like, for instance, if you train a GPT-3 model on data from the year 2002, for instance, and then use it today, it's going to be missing many things; it's going to be missing many common-sense facts about the world.
Speaker 2
56:02
It's even going to be missing vocabulary and so on.
Speaker 1
56:05
Yeah, it's interesting that GPT-3 doesn't even have, I think, any information about the coronavirus.
Speaker 2
56:13
Yes, which is why, you know, you can tell that a system is intelligent when it's capable of adapting. So intelligence is going to require a small amount of continuous learning. It's also going to require some amount of improvisation.
Speaker 2
56:30
Like, it's not enough to assume that what you're going to be asked to do is something that you've seen before, or something that is a simple interpolation of things you've seen before. Yeah. In fact, that model breaks down even for tasks that look relatively simple from a distance, like L5 self-driving, for instance. Google had a paper a couple of years back showing that something like 30 million different road situations were actually completely insufficient to train a driving model. It wasn't even L2, right?
Speaker 2
57:11
And that's a lot of data. That's a lot more data than the 20 or 30 hours of driving that a human needs to learn to drive, given the knowledge they've already accumulated.
Speaker 1
57:22
Well, let me ask you, on that topic: Elon Musk, Tesla Autopilot, one of the only companies, I believe, that is really pushing for a learning-based approach. Are you skeptical that that kind of network can achieve level 4?
Speaker 2
57:39
L4 is probably achievable, L5 probably not.
Speaker 1
57:44
What's the distinction there? Is L5 completely autonomous, where you can just fall asleep?
Speaker 2
57:49
Yeah, L5 is basically human-level.
Speaker 1
57:52
Well, with driving, we have to be careful saying human-level, because, like... yeah, there are all kinds of drivers. That's the clearest example of, like, you know, cars will most likely be much safer than humans in many situations where humans fail, and it's the vice versa question.
Speaker 2
58:09
I'll tell you, you know, the thing is, the amount of training data you would need to anticipate pretty much every possible situation you'll encounter in the real world is such that it's not entirely unrealistic to think that at some point in the future we'll develop a system that's trained on enough data, especially provided that we can simulate a lot of that data. We don't necessarily need actual cars on the road for everything.
Speaker 2
58:37
But it's a massive effort. And it turns out you can create a system that's much more adaptive, that can generalize much better, if you just add explicit models of the surroundings of the car, and if you use deep learning for what it's good at, which is to provide perceptual information. So in general, deep learning is a way to encode perception and a way to encode intuition, but it is not a good medium for any sort of explicit reasoning.
Speaker 2
59:11
And in AI systems today, strong generalization tends to come from explicit models, tends to come from abstractions in the human mind that are encoded in program form by a human engineer, right? These are the abstractions that can actually generalize, not the sort of weak abstractions that are learned by a neural network.
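(As a sketch of the division of labor being described, learned perception feeding an explicit, human-written decision rule: everything below, including the function names and thresholds, is a hypothetical placeholder rather than any real driving stack.)

from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # e.g. "pedestrian" or "car"
    distance_m: float  # estimated distance ahead, in meters

def perceive(camera_frame):
    # Placeholder for the learned part: a neural network would turn raw pixels
    # into structured detections. Here we just return a canned result.
    return [Detection("pedestrian", 12.0), Detection("car", 40.0)]

def decide(detections, speed_mps):
    # The explicit, human-written model: simple symbolic rules over the
    # perception output, not something learned end to end.
    stopping_distance = speed_mps * 2.0  # crude two-second rule
    for d in detections:
        if d.label == "pedestrian" and d.distance_m < stopping_distance + 5.0:
            return "brake"
    return "continue"

frame = object()  # stand-in for a camera image
print(decide(perceive(frame), speed_mps=10.0))  # "brake"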
Speaker 1
59:34
Yeah, and the question is how much reasoning, how many strong abstractions, are required to solve particular tasks like driving. That's the question. Or human life, existence: how many strong abstractions does existence require? But more specifically on driving.
Speaker 1
59:56
That seems to be a coupled question about.