Sergey Levine: Robotics and Machine Learning | Lex Fridman Podcast #108

1 hour 37 minutes 30 seconds

S1

Speaker 1

01:00:00

Clean in some way, like, for example, clean in the sense that the classes in your multi-class classification problem separate linearly. So they have some kind of good representation, and we call this a feature representation. And for a long time, people were very worried about features in the world of supervised learning, because somebody had to actually build those features. So you couldn't just take an image and plug it into your logistic regression or your SVM or something.

S1

Speaker 1

01:00:22

Someone had to take that image and process it using some handwritten code. And then neural nets came along, and they could actually learn the features. And suddenly, we could apply learning directly to the raw inputs, which was great for images, but it was even more great for all the other fields where people hadn't come up with good features yet.

S1

Speaker 1

01:00:39

And one of those fields was actually reinforcement learning. Because in reinforcement learning, the notion of features, if you don't use neural nets and you have to design your own features, is very, very opaque. It's very hard to imagine. Let's say I'm playing chess or Go.

S1

Speaker 1

01:00:53

What is a feature with which I can represent the value function for Go or even the optimal policy for Go linearly? I don't even know how to start thinking about it. And people tried all sorts of things. They would write down, you know, an expert chess player looks for whether the knight is in the middle of the board or not.

S1

Speaker 1

01:01:09

So that's a feature: is the knight in the middle of the board? And they would write these long lists of kind of arbitrary, made-up stuff. And that was really getting us nowhere.
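
To make the contrast concrete, here is a minimal sketch of the two recipes being described. The feature definitions below are made-up stand-ins for the kind of expert-written heuristics mentioned above, not any actual chess engine's features:

```python
import numpy as np
import torch
import torch.nn as nn

# --- The old recipe: hand-designed features + a linear value function ---
def handcrafted_features(board):          # board: 8x8x12 one-hot piece planes
    knight_plane = board[:, :, 1]         # suppose plane 1 marks knights
    center = knight_plane[2:6, 2:6].sum() # "is the knight in the middle of the board?"
    material = board.sum()                # crude material-count proxy
    return np.array([center, material, 1.0])  # plus a bias term

weights = np.array([0.3, 0.01, 0.0])      # someone has to tune these too

def linear_value(board):
    # The value function is forced to be linear in whatever
    # features the designer happened to write down.
    return weights @ handcrafted_features(board)

# --- The deep RL recipe: learn the features from the raw board ---
value_net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(8 * 8 * 12, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),                    # scalar value estimate
)

board = np.zeros((8, 8, 12), dtype=np.float32)
board[3, 3, 1] = 1.0                      # a knight on a center square
print(linear_value(board))
print(value_net(torch.from_numpy(board).unsqueeze(0)).item())
```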

S2

Speaker 2

01:01:17

But that's a little... chess is a little more accessible than the robotics problem. Absolutely. Right, there are at least experts in the different features for chess. But still, the neural network there, to me, I mean, you put it eloquently and almost made it seem like a natural step to add neural networks, but the fact that neural networks are able to discover features in the control problem, it's very interesting, it's hopeful.

S2

Speaker 2

01:01:46

I'm not sure what to think about it, but it feels hopeful that the control problem has features to be learned. I guess my question is: is it surprising to you how far the deep side of deep reinforcement learning was able to go, what space of problems it has been able to tackle, especially in games with AlphaStar and AlphaZero, and just the representation power there, and in the robotics space? And what is your sense of the limits of this representation power in the control context?

S1

Speaker 1

01:02:24

I think one thing that makes it a little hard to fully answer this question is that in settings where we would like to push these things to the limit, we encounter other bottlenecks. So, like, the reason that I can't get my robot to learn how to, I don't know, do the dishes in the kitchen is not because its neural net is not big enough. It's because when you try to actually do trial-and-error learning, reinforcement learning, directly in the real world, where you have the potential to gather these large, highly varied and complex data sets, you start running into other problems.

S1

Speaker 1

01:03:11

Like, one problem you run into very quickly, and it'll first sound like a very pragmatic problem, but it actually turns out to be a pretty deep scientific problem: take the robot, put it in your kitchen, have it try to learn to do the dishes with trial and error. It'll break all your dishes. And then you'll have no more dishes to clean.

S1

Speaker 1

01:03:26

Now you might think this is a very practical issue, but there's something to this, which is that if you have a person trying to do this, a person will have some degree of common sense. They'll break one dish, and they'll be a little more careful with the next one. And if they break all of them, they're going to go and get more, or something like that. There's all sorts of scaffolding that comes very naturally to us in our learning process.

S1

Speaker 1

01:03:46

Like, you know, if I have to learn something through trial and error, I have the common sense to know that I have to, you know, try multiple times. If I screw something up, I ask for help, or I reset things, or something like that. And all of that is kind of outside of the classic reinforcement learning problem formulation. There are other things that can also be categorized as kind of scaffolding but are very important.

S1

Speaker 1

01:04:07

Like, for example, where do you get your reward function? If I want to learn how to pour a cup of water, well, how do I know if I've done it correctly? Now, that probably requires an entire computer vision system to be built just to determine that. And that seems a little bit inelegant.

S1

Speaker 1

01:04:21

So there are all sorts of things like this that start to come up when we think through what we really need to get reinforcement learning to happen at scale in the real world. And I think that many of these things actually suggest a little bit of a shortcoming in the problem formulation and a few deeper questions that we have to resolve.
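
One common workaround for the missing reward function, sketched here under assumptions: train a success classifier on labeled outcome images and use its predicted success probability as the reward. The tiny network, image size, and task are illustrative, not a specific published system:

```python
import torch
import torch.nn as nn

# A success classifier as a stand-in reward function: given a camera
# image, predict the probability that the task (e.g., water poured
# into the cup) succeeded, and use that probability as the reward.
classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 1),
)

def reward_from_image(image):
    # image: (3, 64, 64) RGB tensor from the robot's camera
    logit = classifier(image.unsqueeze(0))
    return torch.sigmoid(logit).item()   # P(success), used as the reward

# In practice the classifier is first trained on labeled success/failure
# frames (e.g., with nn.BCEWithLogitsLoss) before being used this way.
print(reward_from_image(torch.rand(3, 64, 64)))
```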

S2

Speaker 2

01:04:36

That's really interesting. I talked to David Silver about AlphaZero, and it seems like we haven't hit the limit at all in the context where there are no broken dishes. So in the case of Go, it's really about just scaling compute.

S2

Speaker 2

01:04:55

So again, the bottleneck is the amount of money you're willing to invest in compute, and then maybe the scaffolding around how difficult it is to scale compute. But there, there's no limit. And it's interesting that now we move to the real world, and there's the broken dishes, and the reward function like you mentioned. That's really nice. So how do we push forward there?

S2

Speaker 2

01:05:19

Do you think, there's this kind of sample efficiency question that people bring up, you know, not having to break 100,000 dishes. Is this an algorithm question? Is this a data selection question? What do you think?

S2

Speaker 2

01:05:38

How do we not break too many dishes?

S1

Speaker 1

01:05:41

Yeah. Well, one way we can think about that is that maybe we need to be better at reusing our data, building that iceberg. So perhaps it's too much to hope that you can have a machine that in isolation, in a vacuum, without anything else, can just master complex tasks in minutes the way that people do. But perhaps it also doesn't have to.

S1

Speaker 1

01:06:09

Perhaps what it really needs to do is have an existence, a lifetime, where it does many things, and the previous things that it has done prepare it to do new things more efficiently. The study of these kinds of questions typically falls under categories like multi-task learning or meta-learning, but they all fundamentally deal with the same general theme, which is: use experience from doing other things to learn to do new things efficiently and quickly.
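
A minimal sketch of this "reuse experience across tasks" idea, using sine-wave regression (a standard learning-to-learn toy problem) rather than robotics. The architecture, task family, and step counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_task():
    # Each "task" is a sine wave with random amplitude and phase.
    amp, phase = torch.rand(1) * 4 + 1, torch.rand(1) * 3.14
    return lambda x: amp * torch.sin(x + phase)

trunk = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 1)
opt = torch.optim.Adam(list(trunk.parameters()) + list(head.parameters()), lr=1e-3)

# Multitask pretraining: the trunk sees many tasks, so its features
# must capture what the whole task family has in common.
for step in range(2000):
    f = sample_task()
    x = torch.rand(32, 1) * 10 - 5
    loss = ((head(trunk(x)) - f(x)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# New task, few samples: adapt only a small head, reusing the trunk,
# i.e., previous experience prepares the learner to adapt quickly.
f_new = sample_task()
x_few = torch.rand(10, 1) * 10 - 5
new_head = nn.Linear(64, 1)
head_opt = torch.optim.Adam(new_head.parameters(), lr=1e-2)
for step in range(200):
    loss = ((new_head(trunk(x_few).detach()) - f_new(x_few)) ** 2).mean()
    head_opt.zero_grad(); loss.backward(); head_opt.step()
print(f"few-shot fit error: {loss.item():.3f}")
```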

S2

Speaker 2

01:06:37

So what do you think about, if you just look at one particular case study, Tesla Autopilot, which is quickly approaching a million vehicles on the road, where some percentage of the time, 30, 40% of the time, the car is driven using the computer vision multitask HydraNet, that's what they call it, HydraNet, right? And the other percent of the time it's human controlled. From the human side, how can we use that data?

S2

Speaker 2

01:07:09

What's your sense? So like, what's the signal? Do you have ideas in this autonomous vehicle space, where people can lose their lives? You know, it's a safety-critical environment.

S2

Speaker 2

01:07:21

So how do we use that data?

S1

Speaker 1

01:07:24

So I think that, actually, the kind of problems that come up when we want systems that are reliable and that can understand the limits of their capabilities are very similar to the kind of problems that come up when we're doing off-policy reinforcement learning. As I mentioned before, in off-policy reinforcement learning, the big problem is that you need to know when you can trust the predictions of your model. Because if you're trying to evaluate some pattern of behavior for which your model doesn't give you an accurate prediction, then you shouldn't use that to modify your policy.

S1

Speaker 1

01:07:57

It's actually very similar to the problem that we're faced when we actually then deploy that thing and we want to decide whether we trust it in the moment or not. So perhaps we just need to do a better job of figuring out that part. And that's a very deep research question, of course.

S1

Speaker 1

01:08:10

But it's also a question that a lot of people are working on. So I'm pretty optimistic that we can make some progress on that over the next few years.
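
To make the "knowing when to trust the model" idea concrete, here is a minimal sketch of one common approach: train an ensemble of Q-functions and treat high disagreement between them as a signal not to trust (or act on) the prediction. The network sizes and threshold are illustrative assumptions, not a specific published method:

```python
import torch
import torch.nn as nn

# An ensemble of Q-networks, trained on the same off-policy data with
# different initializations (and, in practice, different minibatches).
def make_q(state_dim, action_dim):
    return nn.Sequential(
        nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )

ensemble = [make_q(state_dim=10, action_dim=4) for _ in range(5)]

def q_with_uncertainty(state, action):
    x = torch.cat([state, action], dim=-1)
    preds = torch.stack([q(x) for q in ensemble])   # (5, 1)
    return preds.mean(), preds.std()

def trusted(state, action, max_std=0.1):
    # High ensemble disagreement ~ the data didn't pin this value down,
    # so don't use the prediction to update the policy (or to act on it).
    _, std = q_with_uncertainty(state, action)
    return std.item() < max_std

state, action = torch.randn(10), torch.randn(4)
mean_q, std_q = q_with_uncertainty(state, action)
print(f"Q={mean_q.item():.3f} ± {std_q.item():.3f}, trusted={trusted(state, action)}")
```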

S2

Speaker 2

01:08:15

What's the role of simulation in reinforcement learning, in deep reinforcement learning? Like, how essential is it? It's been essential for some interesting breakthroughs so far.

S2

Speaker 2

01:08:28

Do you think it's a crutch that we rely on? I mean, again, this connects to our off-policy discussion, but do you think we can ever get rid of simulation? Or do you think simulation will actually take over, and we'll create more and more realistic simulations that will allow us to solve actual real-world problems, like transferring the models we learn in simulation to real-world problems?

S1

Speaker 1

01:08:49

Yeah. I think that simulation is a very pragmatic tool that we can use to get a lot of useful stuff to work right now. But I think that in the long run, we will need to build machines that can learn from real data, because that's the only way that we'll get them to improve perpetually. Because if we can't have our machines learn from real data, if they have to rely on simulated data, eventually the simulator becomes the bottleneck.

S1

Speaker 1

01:09:12

In fact, this is a general thing. If your machine has any bottleneck that is built by humans and that doesn't improve from data, it will eventually be the thing that holds it back. And if you're entirely reliant on your simulator, that'll be the bottleneck. If you're entirely reliant on a manually designed controller, that's going to be the bottleneck.

S1

Speaker 1

01:09:30

So simulation is very useful. It's very pragmatic, but it's not a substitute for being able to utilize real experience. By the way, this is something that I think is quite relevant now, especially in the context of some of the things we've discussed because some of these scaffolding issues that I mentioned, things like the broken dishes and the unknown reward function, like these are not problems that you would ever stumble on when working in a purely simulated kind of environment. But they become very apparent when we try to actually run these things in the real world.

S2

Speaker 2

01:10:03

To throw a brief wrench into our discussion, let me ask, do you think we're living in a simulation?

S1

Speaker 1

01:10:07

Oh, I have no idea.

S2

Speaker 2

01:10:09

Do you think that's a useful thing to even think about, the fundamental physics nature of reality? Or, from another perspective, the reason I think the simulation hypothesis is interesting is to think about how difficult it would be to create sort of a virtual reality game type situation that would be sufficiently convincing to us humans, or sufficiently enjoyable, that we wouldn't want to leave. I mean, that's actually a practical engineering challenge.

S2

Speaker 2

01:10:43

And I personally really enjoy virtual reality, but it's quite far away. I kind of think about what it would take for me to want to spend more time in virtual reality versus the real world. And that's sort of a nice, clean question, because at that point, if I want to live in a virtual reality, that means we're just a few years away from a majority of the population living in a virtual reality, and that's how we create the simulation, right? You don't need to actually simulate the quantum gravity and every aspect of the universe. And that's an interesting question for reinforcement learning too.

S2

Speaker 2

01:11:23

If you want to make sufficiently realistic simulations that blur the difference between the real world and the simulation, then some of the problems we've been talking about kind of go away, if we can create actually interesting, rich simulations.

S1

Speaker 1

01:11:40

It's an interesting question, and I think it casts your previous question in a very interesting light. Because in some ways, asking whether we can... well, the more practical version of this is: can we build simulators that are good enough to train essentially AI systems that will work in the world? And it's kind of interesting to think about what this implies.

S1

Speaker 1

01:12:05

If true, it kind of implies that it's easier to create the universe than it is to create a brain. And put that way, it seems kind of weird.

S2

Speaker 2

01:12:15

The aspect of the simulation most interesting to me is the simulation around the humans. That seems to be a complexity that makes the robotics problem harder. Now I don't know if every robotics person agrees with that notion.

S2

Speaker 2

01:12:31

Just as a quick aside, what are your thoughts about when the human enters the picture of the robotics problem? How does that change the reinforcement learning problem, the learning problem in general?

S1

Speaker 1

01:12:45

Yeah, I think that's a kind of complex question. I guess my hope for a while had been that if we build these robotic learning systems that are multi-task, that utilize lots of prior data, and that learn from their own experience, the bit where they have to interact with people will be perhaps handled in much the same way as all the other bits. If they have prior experience of interacting with people, and they can learn from their own experience of interacting with people for this new task, maybe that'll be enough.

S1

Speaker 1

01:13:17

Now, of course, if it's not enough, there are many other things we can do. And there's quite a bit of research in that area. But I think it's worth a shot to see whether the multi-agent interaction, the ability to understand that other beings in the world have their own goals, intentions, and thoughts, and so on, whether that kind of understanding can emerge automatically from simply learning to do things and maximize utility.

S2

Speaker 2

01:13:44

That information arises from the data. You've said something about gravity, that you don't need to explicitly inject anything into the system, that it can be learned from the data, and gravity is an example of something that can be learned from data, sort of like the physics of the world. What are the limits of what we can learn from data?

S2

Speaker 2

01:14:08

Do you really... do you think we can... so a very simple, clean way to ask it is: do you really think we can learn gravity from just data? The idea, the laws of gravity.

S1

Speaker 1

01:14:19

So, something that I think is a common pitfall when thinking about prior knowledge and learning is to assume that just because we know something, it's better to tell the machine about it rather than have it figure it out on its own. In many cases, things that are important, that affect many of the events the machine will experience, are actually pretty easy to learn. Like, you know, if every time you drop something it falls down, then, yeah, you might get kind of the Newton's version, not Einstein's version, but it'll be pretty good.

S1

Speaker 1

01:14:56

And it will probably be sufficient for you to act rationally in the world because you see the phenomenon all the time. So things that are readily apparent from the data, we might not need to specify those by hand. It might actually be easier to let the machine figure them out.
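
As a toy illustration of how the "Newton's version" of gravity falls out of data: fit a curve to noisy observations of a falling object, and the constant g is recovered. All numbers here are simulated for illustration:

```python
import numpy as np

# Simulated observations of a dropped object: height over time,
# with measurement noise (units: meters, seconds).
rng = np.random.default_rng(0)
t = np.linspace(0, 2, 50)
h0 = 20.0
heights = h0 - 0.5 * 9.81 * t**2 + rng.normal(0, 0.05, t.shape)

# Fit h(t) = a*t^2 + b*t + c; under constant gravity, a = -g/2.
a, b, c = np.polyfit(t, heights, deg=2)
print(f"estimated g = {-2 * a:.2f} m/s^2")   # ~9.81: Newton's version, from data
```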

S2

Speaker 2

01:15:10

It just feels like there might be a space of many local minima, in terms of theories of this world, that we would discover and get stuck on. Newtonian mechanics is not necessarily easy to come by.

S1

Speaker 1

01:15:27

Yeah, and in fact, in some fields of science, human civilization got stuck in exactly these kinds of local optima. For example, if you think about how people tried to figure out biology and medicine: for the longest time, the kinds of rules, the kinds of principles that serve us very well in our day-to-day lives actually served us very poorly in understanding medicine and biology. We had very superstitious and weird ideas about how the body worked until the advent of the modern scientific method.

S1

Speaker 1

01:15:57

So that does seem to be a failing of this approach, but it's also a failing of human intelligence, arguably.

S2

Speaker 2

01:16:04

Maybe a small aside, but, you know, the idea of self-play is fascinating in reinforcement learning: creating a competitive context in which agents can play against each other at the same skill level, thereby increasing each other's skill level. This kind of self-improving mechanism seems exceptionally powerful in the contexts where it can be applied. First of all, is it beautiful to you that this mechanism works as well as it does? And also, can it be generalized to other contexts, like the robotics space, or anything applicable to the real world?

S1

Speaker 1

01:16:43

I think that it's a very interesting idea, but I suspect that the bottleneck to actually generalizing it to the robotic setting is going to be the same as the bottleneck for everything else: we need to be able to build machines that can get better and better through natural interaction with the world. And once we can do that, then they can go out and play, they can play with each other, they can play with people, they can play with the natural environment.

S1

Speaker 1

01:17:12

But before we get there, we've got all these other problems we have to get out of the way.

S2

Speaker 2

01:17:16

So there's no shortcut around that. You have to interact with a natural environment that...

S1

Speaker 1

01:17:20

Well, because in a self-play setting, you still need a mediating mechanism. So the reason that self-play works for a board game is because the rules of that board game mediate the interaction between the agents. So the kind of intelligent behavior that will emerge depends very heavily on the nature of that mediating mechanism.
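
A minimal illustration of that mediating-mechanism point: in the sketch below, two agents adapt to each other with no knowledge beyond the payoff rules of rock-paper-scissors, and those rules alone determine what behavior emerges (here, time-averaged play drifting toward the uniform equilibrium). The learning rule, learning rate, and step count are arbitrary illustrative choices:

```python
import numpy as np

# Rock-paper-scissors payoffs for player A: payoff[i, j] is A's reward
# when A plays action i and B plays action j. These rules are the
# mediating mechanism; they dictate what behavior self-play converges to.
payoff = np.array([[ 0., -1.,  1.],
                   [ 1.,  0., -1.],
                   [-1.,  1.,  0.]])

def strategy(w):
    # Softmax over accumulated payoff estimates (exponential weights).
    p = np.exp(w - w.max())
    return p / p.sum()

wA, wB = np.zeros(3), np.zeros(3)
avgA, avgB, lr, steps = np.zeros(3), np.zeros(3), 0.1, 5000

for _ in range(steps):
    pA, pB = strategy(wA), strategy(wB)
    avgA += pA
    avgB += pB
    wA += lr * (payoff @ pB)      # A improves against B's current mix
    wB += lr * (-payoff.T @ pA)   # B improves against A (zero-sum game)

# Time-averaged play approaches the game's equilibrium: uniform 1/3 each.
print(avgA / steps, avgB / steps)
```

Change the payoff matrix and the emergent behavior changes with it, which is the sense in which the rules of the game, not the learning algorithm, shape what self-play discovers.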

S2

Speaker 2

01:17:39

So on the side of reward functions: coming up with good reward functions seems to be the thing that we associate with general intelligence. Like, human beings seem to value the idea of developing our own reward functions, of arriving at meaning and so on. And yet in reinforcement learning, we often kind of treat the reward as a given.

S2

Speaker 2

01:18:02

What's your sense of how we develop good reward functions?

S1

Speaker 1

01:18:08

Yeah, I think that's a very complicated and very deep question. And you're completely right that, classically in reinforcement learning, this question has kind of been treated as a non-issue: you sort of treat the reward as this external thing that comes from some other bit of your biology, and you kind of don't worry about it. And I do think that that's actually a little bit of a mistake, that we should worry about it.

S1

Speaker 1

01:18:33

And we can approach it in a few different ways. We can approach it, for instance, by thinking of reward as a communication medium. We can say, well, how does a person communicate to a robot what its objective is? You can approach it also as sort of more of an intrinsic motivation medium.

S1

Speaker 1

01:18:47

You could say, can we write down kind of a general objective that leads to good capability? Like, for example, can you write down some objective such that even in the absence of any other task, if you maximize that objective, you'll sort of learn useful things? This is something that has sometimes been called unsupervised reinforcement learning, which I think is a really fascinating area of research, especially today. We've done a bit of work on that recently.

S1

Speaker 1

01:19:12

One of the things we've studied is whether we can have some notion of unsupervised reinforcement learning by means of information-theoretic quantities, like, for instance, minimizing a Bayesian measure of surprise. This is an idea that was pioneered actually in the computational neuroscience community by folks like Karl Friston. And we've done some work recently that shows that you can actually learn pretty interesting skills by essentially behaving in a way that allows you to make accurate predictions about the world. It seems a little circular: do the things that will lead to you getting the right answer for prediction.

S1

Speaker 1

01:19:48

But by doing this, you can sort of discover stable niches in the world. You can discover that if you're playing Tetris, then correctly clearing the rows will let you play Tetris for longer and keep the board nice and clean, which sort of satisfies some desire for order in the world, and as a result gives you some degree of leverage over your domain. So we're exploring that pretty actively.
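
A minimal sketch of that surprise-minimization idea: keep a density model of the states visited so far and reward the agent for landing in states the model finds likely (high log-probability means low surprise). The diagonal Gaussian model here is a deliberately simple stand-in for the richer density models used in actual work:

```python
import numpy as np

class SurpriseMinimizingReward:
    """Intrinsic reward = log-probability of the current state under a
    density model fit to the states visited so far. Staying predictable
    (keeping the Tetris board clean, say) scores well; chaos scores badly."""

    def __init__(self, state_dim):
        self.mean = np.zeros(state_dim)
        self.var = np.ones(state_dim)
        self.n = 0

    def update(self, state):
        # Online (Welford-style) update of running mean/variance of states.
        self.n += 1
        delta = state - self.mean
        self.mean += delta / self.n
        self.var += (delta * (state - self.mean) - self.var) / self.n

    def reward(self, state):
        var = np.maximum(self.var, 1e-6)
        log_prob = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (state - self.mean) ** 2 / var
        )
        return log_prob  # higher = less surprising = more reward

rng = np.random.default_rng(0)
model = SurpriseMinimizingReward(state_dim=4)
for _ in range(100):
    model.update(rng.normal(0, 1, 4))     # states the agent has visited
print(model.reward(np.zeros(4)))          # familiar state: high reward
print(model.reward(np.full(4, 10.0)))     # surprising state: very low reward
```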

S2

Speaker 2

01:20:08

Is there a role for a human notion of curiosity in itself being the reward, sort of discovering new things about the world?

S1

Speaker 1

01:20:19

So one of the things that I'm pretty interested in is whether discovering new things can actually be an emergent property of some other objective that quantifies capability. So new things for the sake of new things might not by itself be the right answer, but perhaps we can figure out an objective for which discovering new things is actually the natural consequence. That's something we're working on right now, but I don't have a clear answer for you there yet; that's still a work in progress.

S2

Speaker 2

01:20:49

You mean just that it's a curious observation to see sort of creative patterns of curiosity on the way to optimize for a particular...

S1

Speaker 1

01:21:00

On the way to optimize for a particular measure of capability.

S2

Speaker 2

01:21:05

Is there ways to understand or anticipate unexpected, unintended consequences of particular reward functions? Sort of anticipate the kind of strategies that might be developed and try to avoid highly detrimental strategies.

S1

Speaker 1

01:21:26

Yeah, so classically, this is something that has been pretty hard in reinforcement learning, because it's difficult for a designer to have good intuition about what a learning algorithm will come up with when they give it some objective. There are ways to mitigate that. One way to mitigate it is to actually define an objective that says: don't do weird stuff.

S1

Speaker 1

01:21:45

You can actually quantify it and say: just don't enter situations that have low probability under the distribution of states you've seen before. It turns out that that's one very good way to do off-policy reinforcement learning. So we can do some things like that.
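
A sketch of that "don't do weird stuff" objective: penalize the reward whenever the agent enters a state with low probability under a density model fit to previously seen states. The Gaussian density, threshold, and penalty size are illustrative assumptions, giving the general flavor rather than a specific algorithm:

```python
import numpy as np

def fit_state_density(dataset_states):
    # Simple density model over previously seen states: a diagonal
    # Gaussian fit to the dataset (a stand-in for something richer,
    # like a VAE or a flow, in real systems).
    mean = dataset_states.mean(axis=0)
    var = dataset_states.var(axis=0) + 1e-6
    def log_prob(s):
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (s - mean) ** 2 / var)
    return log_prob

def penalized_reward(task_reward, state, log_prob, threshold=-10.0, penalty=100.0):
    # "Don't enter situations that have low probability under the
    # distribution of states you've seen before."
    if log_prob(state) < threshold:
        return task_reward - penalty
    return task_reward

dataset = np.random.default_rng(0).normal(0, 1, size=(1000, 4))
log_prob = fit_state_density(dataset)
print(penalized_reward(1.0, np.zeros(4), log_prob))      # in-distribution: 1.0
print(penalized_reward(1.0, np.full(4, 8.0), log_prob))  # weird state: -99.0
```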

S2

Speaker 2

01:22:02

If we slowly venture, speaking about reward functions, into greater and greater levels of intelligence... I mean, Stuart Russell thinks about this, the alignment of AI systems with us humans. So how do we ensure that AGI systems align with us humans? It's kind of a reward function question: specifying the behavior of AI systems such that their success aligns with the broader intended interests of human beings.

S2

Speaker 2

01:22:40

Do you have thoughts on this? Do you have kind of concerns of where reinforcement learning fits into this? Or are you really focused on the current moment of us being quite far away and trying to solve the robotics problem?

S1

Speaker 1

01:22:51

I don't have a great answer to this, but I do think that this is a problem that's important to figure out. For my part, I'm actually a bit more concerned about the other side of this equation: maybe rather than unintended consequences from objectives that are specified too well, I'm actually more worried right now about unintended consequences from objectives that are not optimized well enough, which might become a very pressing problem when we, for instance, try to use these techniques for safety-critical systems like cars and aircraft and so on. I think at some point we'll face the issue of objectives being optimized too well, but right now I think we're more likely to face the issue of them not being optimized well enough.

S2

Speaker 2

01:23:36

But you don't think unintended consequences can arise even when you're far from optimality, sort of like on the path to it?

S1

Speaker 1

01:23:43

Oh no, I think unintended consequences can absolutely arise. It's just, I think right now, the bottleneck for improving reliability, safety, and things like that is more with systems that need to work better, that need to optimize their objective better.

S2

Speaker 2

01:23:58

Do you have thoughts, concerns about existential threats of human-level intelligence? If we put on our hat of looking 10, 20, 100, 500 years from now, do you have concerns about existential threats of AI systems?

S1

Speaker 1

01:24:15

I think there are absolutely existential threats for AI systems, just like there are for any powerful technology. But I think that these kinds of problems can take many forms, and some of those forms will come down to people with nefarious intent. Some of them will come down to AI systems that have some fatal flaws, and some of them will of course come down to AI systems that are too capable in some way.

S1

Speaker 1

01:24:44

But among this set of potential concerns, I would actually be much more concerned about the first two right now, and principally the one with nefarious humans, than I am about the others, because through all of human history, it's actually the nefarious humans that have been the problem, not the nefarious machines. And I think that right now the best that I can do to make sure things go well is to build the best technology I can and also hopefully promote responsible use of that technology.

S2

Speaker 2

01:25:13

Do you think RL systems have something to teach us humans? You said nefarious humans getting us in trouble. I mean, machine learning systems have in some ways revealed to us the ethical flaws in our data.

S2

Speaker 2

01:25:28

In that same kind of way, can reinforcement learning teach us about ourselves? Has it taught something? What have you learned about yourself from trying to build robots and reinforcement learning systems?

S1

Speaker 1

01:25:42

I'm not sure what I've learned about myself, but maybe part of the answer to your question might become a little more apparent once we see more widespread deployment of reinforcement learning for decision-making support in domains like healthcare, education, social media, etc. And I think we will see some interesting stuff emerge there. We will see, for instance, what kinds of behaviors these systems come up with in situations where there is interaction with humans and where they have the possibility of influencing human behavior.

S1

Speaker 1

01:26:18

I think we're not quite there yet, but maybe in the next few years we'll see some interesting stuff come out in that area.

S2

Speaker 2

01:26:23

I hope outside the research space too, because the exciting space where this could be observed is large companies that deal with large data, and I hope there's some transparency there. Because one of the things that's unclear when I look at social networks, and just online, is why an algorithm did something, or whether an algorithm was even involved. And it'd be interesting from a research perspective to observe the results of those algorithms, to open up that data, or to at least be sufficiently transparent about the behavior of these AI systems in the real world.

S2

Speaker 2

01:27:01

What's your sense... I don't know if you've looked at the blog post "The Bitter Lesson" by Rich Sutton, where he looks at sort of the big lesson of research in AI and reinforcement learning: that simple, general methods that leverage computation seem to work well. So basically, don't try to do any kind of fancy algorithms, just wait for computation to get fast. Do you share this kind of intuition?

S1

Speaker 1

01:27:31

I think the high level idea makes a lot of sense. I'm not sure that my takeaway would be that we don't need to work on algorithms. I think that my takeaway would be that we should work on general algorithms.

S1

Speaker 1

01:27:44

Actually, I think that this idea of needing to better automate the acquisition of experience in the real world follows pretty naturally from Rich Sutton's conclusion. If the claim is that automated general methods plus data leads to good results, then it makes sense that we should build general methods, and we should build the kinds of methods that we can deploy and get to go out there and collect their experience autonomously. I think one place where the current state of things falls a little bit short of that is the going out there and collecting the data autonomously, which is easy to do in a simulated board game but very hard to do in the real world.

S2

Speaker 2

01:28:27

Yeah, it keeps coming back to this one problem, right? So your mind is focused there now, in this real world. It just seems scary, this step of collecting the data.

S2

Speaker 2

01:28:41

And it seems unclear to me how we can do it effectively.

S1

Speaker 1

01:28:45

Yeah. Well, you know, there are 7 billion people in the world, and each of them had to do that at some point in their lives.

S2

Speaker 2

01:28:50

And we should leverage that experience that they've all done. We should be able to try to collect that kind of data. Okay, big questions.

S2

Speaker 2

01:29:02

Maybe stepping back through your life: what book or books, technical or fiction or philosophical, had a big impact on the way you saw the world, on the way you thought about the world, your life in general? And maybe, if it's different, what books would you recommend people consider reading on their own intellectual journey? It could be within reinforcement learning, but it could be much bigger.

S1

Speaker 1

01:29:32

I don't know if this is, like, a scientifically particularly meaningful answer, but the honest answer is that I actually found a lot of the work by Isaac Asimov to be very inspiring when I was younger. I don't know if that has anything to do with AI necessarily.

S2

Speaker 2

01:29:50

You don't think it had a ripple effect in your life? Maybe it did.

S1

Speaker 1

01:29:56

But yeah, it was a vision of a future where, first of all, artificial intelligence systems, artificial robotic systems, have kind of a big place, a big role in society, and where we try to imagine the limiting case of technological advancement and how that might play out in our future history. I think that that was in some way influential. I don't really know how, but I would recommend it.

S1

Speaker 1

01:30:34

I mean, if nothing else, you'd be well entertained.

S2

Speaker 2

01:30:36

When did you first, yourself, fall in love with the idea of artificial intelligence, get captivated by this field?

S1

Speaker 1

01:30:45

So my honest answer here is actually that I only really started to think about it as something that I might want to do actually in graduate school pretty late. And a big part of that was that until somewhere around 2009, 2010, it just wasn't really high on my priority list because I didn't think that it was something where we were going to see very substantial advances in my lifetime. Maybe in terms of my career, the time when I really decided I wanted to work on this was when I actually took a seminar course that was taught by Professor Andrew Ng.

S1

Speaker 1

01:31:24

And at that point, I, of course, had a decent understanding of the technical things involved. But one of the things that really resonated with me was when he said, in the opening lecture, something to the effect of: well, he used to have graduate students come to him and talk about how they want to work on AI, and he would kind of chuckle and give them some math problem to deal with. But now he's actually thinking that this is an area where we might see substantial advances in our lifetime.

S1

Speaker 1

01:31:47

And that kind of got me thinking, because, you know, in some abstract sense, yeah, you can kind of imagine that. But in a very real sense, when someone who had been working on that kind of stuff their whole career suddenly says that, yeah, that had some effect on me.

S2

Speaker 2

01:32:03

Yeah, this might be a special moment in the history of the field. That this is where we might see some interesting breakthroughs. So in the space of advice, somebody who's interested in getting started in machine learning or reinforcement learning, what advice would you give to maybe an undergraduate student or maybe even younger, what are the first steps to take and further on, what are the steps to take on that journey?

S1

Speaker 1

01:32:32

So something that I think is important to do is to not be afraid to spend time imagining the kind of outcome that you might like to see. So 1 outcome might be a successful career, a large paycheck or something, or state of the art results in some benchmark. But hopefully, that's not the thing that's the main driving force for somebody.

S1

Speaker 1

01:32:57

But I think that if someone who is a student considering a career in AI takes a little while, sits down, and thinks: what do I really want to see? What do I want to see a machine do? What do I want to see a robot do? What do I want to see a natural language system do?

S1

Speaker 1

01:33:12

Just imagine it, you know, almost like a commercial for a future product, or something that you'd like to see in the world, and then actually sit down and think about the steps that are necessary to get there. And hopefully that thing is not a better number on ImageNet classification. It's probably an actual thing that we can't do today that would be really awesome, whether it's a robot butler or, you know, a really awesome healthcare decision-making support system, whatever it is that you find inspiring.

S1

Speaker 1

01:33:41

And I think that thinking about that and then backtracking from there and imagining the steps needed to get there will actually lead to much better research. It'll lead to rethinking the assumptions. It'll lead to working on the bottlenecks that other people aren't working on.

S2

Speaker 2

01:33:55

And then, naturally, to turn it to you: we've talked about reward functions, and you've just given advice on looking forward, on what kind of change you would like to make in the world. What do you think, ridiculously big question, what do you think is the meaning of life? What is the meaning of your life?

S2

Speaker 2

01:34:13

What gives you fulfillment, purpose, happiness, and meaning?

S1

Speaker 1

01:34:20

That's a very big question.

S2

Speaker 2

01:34:24

What's the reward function under which you are operating?

S1

Speaker 1

01:34:27

Yeah, I think one thing that does give, if not meaning, at least satisfaction, is some degree of confidence that I'm working on a problem that really matters. I feel like it's less important to me to actually solve a problem; it's quite nice to spend my time on things that I believe really matter. And I try pretty hard to look for that.

S2

Speaker 2

01:34:52

I don't know if it's easy to answer this, but if you're successful, what does that look like? What's the big dream? Now, of course, success is built on top of success and you keep going forever, but what is the dream?

S1

Speaker 1

01:35:10

Yeah, so one very concrete thing, or maybe as concrete as it's going to get here, is to see machines that actually get better and better the longer they exist in the world. On the surface, one might even think that that's something we have today, but I think we really don't. I think that there is an unending complexity in the universe, and to date, all of the machines that we've been able to build don't improve up to the limit of that complexity.

S1

Speaker 1

01:35:44

They hit a wall somewhere. Maybe they hit a wall because they're in a simulator that is only a very limited, very pale imitation of the real world, or they hit a wall because they rely on a labeled dataset. But they never hit the wall of running out of stuff to see. So I'd like to build a machine that can go as far as possible in that regard.

S2

Speaker 2

01:36:04

Runs up against the ceiling of the complexity of the universe. Yes. Well, I don't think there's a better way to end it, Sergey.

S2

Speaker 2

01:36:11

Thank you so much, it's a huge honor. I can't wait to see the amazing work that you have yet to publish, and in the education space, in terms of reinforcement learning. Thank you for inspiring the world. Thank you for the great research you do.

S2

Speaker 2

01:36:24

Thank you.

S3

Speaker 3

01:36:25

Thanks for listening to this conversation with Sergey Levine, and thank you to our sponsors, Cash App and ExpressVPN. Please consider supporting this podcast by downloading Cash App and using code LexPodcast, and by signing up at ExpressVPN.com/LexPod. Click all the links, buy all the stuff; it's the best way to support this podcast and the journey I'm on.

S3

Speaker 3

01:36:52

If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or connect with me on Twitter at Lex Fridman, spelled somehow, if you can figure out how, without using the letter E, just F-R-I-D-M-A-N. And now, let me leave you with some words from Salvador Dali: intelligence without ambition is a bird without wings.