1 hour 44 minutes 10 seconds
Speaker 1
00:00
The following is a conversation with Jeremy Howard. He's the founder of fast.ai, a research institute dedicated to making deep learning more accessible. He's also a distinguished research scientist at the University of San Francisco, a former president of Kaggle, as well as a top-ranking competitor there. And in general, he's a successful entrepreneur, educator, researcher, and inspiring personality in the AI community. When someone asks me how to get started with deep learning, fast.ai is one of the top places I point them to.
Speaker 1
00:33
It's free, it's easy to get started with, it's insightful and accessible, and, if I may say so, it has very little of the BS that can sometimes dilute the value of educational content on popular topics like deep learning. fast.ai has a focus on practical application of deep learning and hands-on exploration of the cutting edge that is both incredibly accessible to beginners and useful to experts.
Speaker 1
00:58
This is the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, give it 5 stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with Jeremy Howard.
Speaker 2
01:18
What's the first program you ever wrote?
Speaker 3
01:21
First program I wrote that I remember would be at high school. I did an assignment where I decided to try to find out if there were some better musical scales than the normal 12-tone, 12-interval scale. So I wrote a program on my Commodore 64 in BASIC that searched through other scale sizes to see if it could find one where there were more accurate, you know, harmonies.
Speaker 3
01:51
Like mid-tone? Like you want an exact 3 to 2 ratio, whereas with a 12-interval scale, it's not exactly 3 to 2, for example.
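[A rough sketch of that kind of search, in modern Python rather than the original Commodore 64 BASIC; the range of scale sizes is arbitrary and purely illustrative:]

```python
import math

# Hypothetical reconstruction of the scale search described above (not the original BASIC code):
# for each number of equal divisions of the octave, find the scale step that best
# approximates a just perfect fifth (3:2) and report the error in cents.
target = 3 / 2
for n in range(5, 54):
    best_step = min(range(1, n), key=lambda k: abs(2 ** (k / n) - target))
    err_cents = abs(1200 * math.log2(2 ** (best_step / n) / target))
    print(f"{n:2d} divisions: step {best_step:2d}, error {err_cents:5.2f} cents")
# The familiar 12-interval scale lands about 2 cents away from a pure 3:2.
```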
Speaker 2
02:01
So that's well-tempered,
Speaker 3
02:03
as they say.
Speaker 2
02:05
And BASIC on a Commodore
Speaker 1
02:06
64.
Speaker 2
02:07
Yeah. What was the interest in music from? Or is it just...
Speaker 3
02:10
I did music all my life. So I played saxophone and clarinet and piano and guitar and drums and whatever.
Speaker 2
02:17
So how does that thread go through your life? Where's music today?
Speaker 3
02:24
It's not where I wish it was. I, for various reasons, couldn't really keep it going, particularly because I had a lot of problems with RSI with my fingers. And so I had to kind of like cut back anything that used hands and fingers.
Speaker 3
02:39
I hope one day I'll be able to get back to it, health-wise.
Speaker 2
02:43
So there's a love for music underlying
Speaker 3
02:45
it all. Sure, yeah.
Speaker 2
02:47
What's your favorite instrument?
Speaker 3
02:49
Saxophone. Sax. Baritone saxophone. Well, probably bass saxophone, but they're awkward.
Speaker 2
02:57
Well, I always love it when music is coupled with programming. There's something about a brain that utilizes both that emerges with creative ideas. So you've used and studied quite a few programming languages.
Speaker 2
03:11
Can you give an overview of what you've used? What are the pros and cons of each?
Speaker 3
03:17
My favorite programming environment almost certainly was Microsoft Access back in the earliest days. That was Visual Basic for Applications, which is not a good programming language, but the programming environment was fantastic. The ability to create, you know, user interfaces and tie data and actions to them and create reports and all that, I've never seen anything as good.
Speaker 3
03:46
There's things nowadays like Airtable, which are like small subsets of that, which people love for good reason, but unfortunately nobody's ever achieved anything like that.
Speaker 2
04:01
What is that? If you could pause on that for a second.
Speaker 3
04:03
Oh, Access.
Speaker 2
04:03
Access is it a database
Speaker 3
04:06
program that Microsoft produced, part of Office, and that kind of withered, you know, but basically it lets you in a totally graphical way create tables and relationships and queries and tie them to forms and set up, you know, event handlers and calculations. And it was a very complete, powerful system designed for not massive, scalable things, but for like useful little applications that I loved.
Speaker 2
04:36
So what's the connection between Excel and Access?
Speaker 3
04:40
So very close. So Access kind of was the relational database equivalent, if you like. So people still do a lot of the stuff that should be in Access in Excel.
Speaker 3
04:52
Excel is great as well. But it's just not as rich a programming model as VBA combined with a relational database. And so I've always loved relational databases, but today programming on top of a relational database is just a lot more of a headache. You know, you generally need something that runs some kind of database server, unless you use SQLite, which has its own issues.
Speaker 3
05:24
And often, if you want to get a nice programming model, you'll need to, like, add an ORM on top. And then, I don't know, there's all these pieces tied together, and it's just a lot more awkward than it should be. There are people that are trying to make it easier. So in particular, I think of F#, you know, Don Syme, who, with his team, has done a great job of making something like a database appear in the type system.
Speaker 3
05:51
So you actually get, like, tab completion for fields and tables and stuff like that. Anyway, so that whole VBA Office thing, I guess, was a starting point, which I still miss. And then I got into standard Visual Basic,
Speaker 2
06:07
which- That's interesting just to pause on that for a second. It's interesting that you're connecting programming languages to the ease of management of data. Yeah.
Speaker 2
06:18
So in your use of programming languages, you always had a love and a connection with data.
Speaker 3
06:24
I've always been interested in doing useful things for myself and for others, which generally means getting some data and doing something with it and putting it out there again. So that's been my interest throughout. So I also did a lot of stuff with AppleScript back in the early days.
Speaker 3
06:43
So it's kind of nice being able to get the computer and computers to talk to each other and to do things for you. And then I think that the programming language I most loved then would have been Delphi, which was Object Pascal, created by Anders Hejlsberg, who previously did Turbo Pascal and then went on to create .NET and then went on to create TypeScript. Delphi was amazing because it was a compiled, fast language that was as easy to use as Visual Basic.
Speaker 2
07:20
Delphi, what is it similar to in more modern languages?
Speaker 3
07:27
Visual Basic.
Speaker 2
07:28
Visual Basic.
Speaker 3
07:29
Yeah, but a compiled, fast version. So I'm not sure there's anything quite like it anymore. If you took, like, C# or Java and got rid of the virtual machine and replaced it with something where you could compile to a small, tight binary.
Speaker 3
07:46
I feel like it's where Swift could get to with the new SwiftUI and the cross-platform development going on. Like, that's one of my dreams, that we'll hopefully get back to where Delphi was. There is actually a Free Pascal project nowadays called Lazarus, which is also attempting to kind of recreate Delphi. So they're making good progress.
Speaker 2
08:16
So okay, Delphi, that's one of your favorite programming languages.
Speaker 3
08:20
Or at least programming environments. Again, I'd say Pascal's not a nice language. If you wanted to know specifically about what languages I like, I would definitely pick J as being an amazingly wonderful language.
Speaker 3
08:35
What's J? J, are you aware of APL? I am not,
Speaker 2
08:40
except from doing a little research on the work you've done.
Speaker 3
08:44
OK, so not at all surprising you're not familiar with it, because it's not well known, but it's actually one of the main families of programming languages, going back to the late 50s, early 60s. So there were a couple of major directions. One was the kind of lambda calculus, Alonzo Church direction, which I guess kind of Lisp
Speaker 2
09:07
and
Speaker 3
09:08
Scheme and whatever, which has a history going back to the early days of computing. The second was the kind of imperative direction, you know, ALGOL, Simula, going on to C, C++, and so forth. There was a third, which are called array-oriented languages, which started with a paper by a guy called Ken Iverson, which was actually a math theory paper, not a programming paper.
Speaker 3
09:37
It was called Notation as a Tool for Thought. And it was the development of a new way, a new type of math notation. And the idea is that this math notation was much more flexible, expressive, and also well-defined than traditional math notation, which is none of those things. Math notation is awful.
Speaker 3
09:59
And so he actually turned that into a programming language. And because this was the early 50s, or the, sorry, late 50s, all the names were available. So he called his language a programming language or APL.
Speaker 2
10:10
APL.
Speaker 3
10:11
So APL is an implementation of notation as a tool for thought, by which he means math notation. And Ken and his son went on to do many things, but eventually they actually produced, you know, a new language that was built on top of all the learnings of APL. And that was called J.
Speaker 3
10:30
And J is the most expressive, composable, you know, beautifully designed language I've ever seen.
Speaker 2
10:42
Does it have object oriented components? Does it have that kind of thing?
Speaker 3
10:45
Not really, it's an array oriented language. It's a new, it's the third path.
Speaker 2
10:51
Are you saying
Speaker 3
10:52
array? Array-oriented. Yeah.
Speaker 2
10:53
What does it mean to be array-oriented?
Speaker 3
10:55
So array-oriented means that you generally don't use any loops, but the whole thing is done with kind of an extreme version of broadcasting, if you're familiar with that NumPy slash Python concept. So you do a lot with one line of code. It looks a lot like math notation.
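[To make the broadcasting idea concrete, a minimal NumPy sketch, not from the conversation itself: the per-column normalization and the all-pairs distance matrix are each a single loop-free expression, the style that J and APL take to the extreme:]

```python
import numpy as np

x = np.random.rand(1000, 3)

# Normalize every column in one expression: the (3,) mean and std broadcast
# against the (1000, 3) array with no explicit loop.
xn = (x - x.mean(axis=0)) / x.std(axis=0)

# All-pairs squared distances, again without loops: (1000, 1, 3) minus
# (1, 1000, 3) broadcasts to (1000, 1000, 3), then we sum over the last axis.
d2 = ((xn[:, None, :] - xn[None, :, :]) ** 2).sum(axis=-1)
print(d2.shape)  # (1000, 1000)
```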
Speaker 3
11:17
So it's basically... Highly compact. Mm-hmm. And the idea is that, because you can do so much with one line of code, you very rarely need more than a single screen of code to express your program.
Speaker 3
11:30
And so you can kind of keep it all in your head and you can kind of clearly communicate it. It's interesting that APL created 2 main branches, K and J. J is this kind of like open source niche community of crazy enthusiasts like me. And then the other path, K, was fascinating.
Speaker 3
11:52
It's an astonishingly expensive programming language, which many of the world's most ludicrously rich hedge funds use. So the entire K machine is so small it sits inside level 3 cache on your CPU and it easily wins every benchmark I've ever seen in terms of data processing speed. But you don't come across it very much because it's like $100,000 per CPU to run it. It's like this path of programming languages is just so much, I don't know, so much more powerful in every way than the ones that almost anybody uses every day.
Speaker 2
12:33
So it's all about computation. It's really focusing.
Speaker 3
12:38
It's pretty heavily focused on computation. I mean, so much of programming is data processing by definition. And so there's a lot of things you can do with it, but, yeah, there's not much work being done on making, like, user interface toolkits or whatever.
Speaker 3
12:56
I mean, there's some, but they're not great.
Speaker 2
12:59
At the same time, you've done a lot of stuff with Perl and Python. Yeah. So where does that fit into the picture of J and K and APL?
Speaker 3
13:08
Well, it's just much more pragmatic. Like in the end, you have to end up where the libraries are. Because to me, my focus is on productivity.
Speaker 3
13:21
I just want to get stuff done and solve problems. So Perl was great. I created an email company called FastMail, and Perl was great because, back in the late 90s, early 2000s, it just had a lot of stuff it could do. I still had to write my own monitoring system and my own web framework, my own whatever, because none of that stuff existed.
Speaker 3
13:45
But it was a super flexible language to do that in.
Speaker 2
13:50
And you used Perl for FastMail, you used it as a backend? So everything was written in Perl?
Speaker 3
13:56
Yeah. Yeah. Everything, everything was Perl.
Speaker 2
13:58
Why do you think Perl hasn't succeeded or hasn't dominated the market where Python really takes over a lot of
Speaker 3
14:07
the same tasks? Yeah. Well, I mean, Perl did dominate.
Speaker 3
14:09
It was everything, everywhere. But then the guy that ran Perl, Larry Wall, kind of just didn't put the time in anymore. And no project can be successful if there isn't, you know, particularly 1 that started with a strong leader that loses that strong leadership. So then Python has kind of replaced it.
Speaker 3
14:37
You know, Python is a lot less elegant language in nearly every way, but it has the data science libraries and a lot of them are pretty great. So I kind of use it because it's the best we have, but it's definitely not good enough.
Speaker 2
15:01
But what do you think the future of programming looks like? What do you hope the future of programming looks like? If we zoom in on the computational fields, on data science, on machine learning?
Speaker 3
15:11
I hope Swift is successful. Because the goal of Swift, the way Chris Lattner describes it, is to be infinitely hackable. And that's what I want.
Speaker 3
15:23
I want something where me and the people I do research with and my students can look at and change everything from top to bottom. There's nothing mysterious and magical and inaccessible. Unfortunately, with Python, it's the opposite of that, because Python's so slow, it's extremely unhackable. You get to a point where it's like, okay, from here on down, it's C.
Speaker 3
15:45
So your debugger doesn't work in the same way, your profiler doesn't work in the same way, your build system doesn't work in the same way. It's really not very hackable at all.
Speaker 2
15:53
What's the part you like to be hackable? Is it for the objective of optimizing training of neural networks, inference of neural networks, is it performance of the system? Or is there some non-performance related just
Speaker 3
16:07
creative idea? It's everything. I mean, in the end, I want to be productive as a practitioner.
Speaker 3
16:13
So that means that, so like at the moment, our understanding of deep learning is incredibly primitive. There's very little we understand. Most things don't work very well, even though it works better than anything else out there. There's so many opportunities to make it better.
Speaker 3
16:28
So you look at any domain area, like, I don't know, speech recognition with deep learning or natural language processing classification with deep learning or whatever. Every time I look at an area with deep learning, I always see like, oh, it's terrible. There's lots and lots of obviously stupid ways to do things that need to be fixed. So then I want to be able to jump in there and quickly experiment and make them better.
Speaker 2
16:54
You think the programming language has a role in that? Huge role.
Speaker 3
17:00
Yeah. So currently Python has a big gap in terms of our ability to innovate, particularly around recurrent neural networks and natural language processing, because it's so slow. The actual loop where we actually loop through words, we have to do that whole thing in CUDA C. So we actually can't innovate with the kernel, the heart of that most important algorithm.
Speaker 3
17:31
And it's just a huge problem. And this happens all over the place. So we hit, you know, research limitations. Another example, convolutional neural networks, which are actually the most popular architecture for lots of things, maybe most things in deep learning.
Speaker 3
17:48
We almost certainly should be using sparse convolutional neural networks. But only, like, 2 people are, because to do it you have to rewrite all of that CUDA C-level stuff. And, yeah, researchers and practitioners just don't. So there's just big gaps in what people actually research on and what people actually implement, because of the programming language problem.
Speaker 2
18:13
So you think it's just too difficult to write in CUDA C, and a higher-level programming language like Swift should enable easier fooling around, creative stuff, with RNNs or with sparse convolutional networks. Kind of. Who's at fault?
Speaker 2
18:38
Who's in charge of making it easy for a researcher to play around?
Speaker 3
18:42
I mean, no one's at fault. It's just nobody's got around to it yet. Or it's just, it's hard.
Speaker 3
18:46
Right. And I mean, part of the fault is that we ignored that whole APL kind of direction. Most or nearly everybody did for 60 years, 50 years. But recently people have been starting to reinvent pieces of that and kind of create some interesting new directions in the compiler technology.
Speaker 3
19:07
So the place where that's particularly happening right now is something called MLIR, which is something that, again, Chris Lattner, the Swift guy, is leading. And, yeah, because it's actually not going to be Swift on its own that solves this problem. Because the problem is that currently writing an acceptably fast, you know, GPU program is too complicated, regardless of what language you use. And that's just because you have to deal with the fact that I've got, you know, 10,000 threads and I have to synchronize between them all and I have to put my thing into grid blocks and think about warps and all this stuff.
Speaker 3
19:47
It's just so much boilerplate that to do that well, you have to be a specialist at that. And it's going to be a year's work to, you know, optimize that algorithm in that way. But with things like Tensor Comprehensions and Tile and MLIR and TVM, there's all these various projects which are all about saying, let's let people create domain-specific languages for tensor computations. These are the kinds of things we do generally on the GPU for deep learning.
Speaker 3
20:21
And then have a compiler which can optimize that tensor computation. A lot of this work is actually sitting on top of a project called Halide, which is a mind-blowing project where they came up with such a domain-specific language. In fact, two: one domain-specific language for expressing, this is what my tensor computation is.
Speaker 3
20:43
And another domain-specific language for expressing this is the kind of the way I want you to structure the compilation of that, like do it block by block and do these bits in parallel. And they were able to show how you can compress the amount of code by 10x compared to optimized GPU code and get the same performance. So that's like, so these other things are kind of sitting on top of that kind of research and MLIR is pulling a lot of those best practices together. And now we're starting to see work done on making all of that directly accessible through Swift, so that I could use Swift to kind of write those domain-specific languages.
Speaker 3
21:25
And hopefully we'll get then Swift CUDA kernels written in a very expressive and concise way that looks a bit like J in APL. And then Swift layers on top of that, and then a Swift UI on top of that. And, you know, that'll be so nice if we can get to that point.
Speaker 2
21:42
Now, does it all eventually boil down to CUDA and NVIDIA GPUs?
Speaker 3
21:48
Unfortunately at the moment it does. But 1 of the nice things about MLIR, if AMD ever gets their act together, which they probably won't, is that they or others could write MLIR backends for other GPUs or other tensor computation devices, of which today there are increasing number like Graphcore or Vertex AI or whatever. So yeah, being able to target lots of backends would be another benefit of this.
Speaker 3
22:24
And the market really needs competition because at the moment, NVIDIA is massively overcharging for their kind of enterprise class cards because there is no serious competition because nobody else is doing the software properly.
Speaker 2
22:39
In the cloud there is some competition right?
Speaker 3
22:42
But not really other than TPUs perhaps. TPUs are almost unprogrammable at the moment.
Speaker 2
22:48
So you can't... The TPUs have the same problem, that you can't.
Speaker 3
22:51
It's even worse. So TPUs, Google actually made an explicit decision to make them almost entirely unprogrammable because they felt that there was too much IP in there. And if they gave people direct access to program them, people would learn their secrets.
Speaker 3
23:04
So you can't actually directly program the memory in a TPU. You can't even directly create code that runs on, and that you can look at on, the machine that has the TPU; it all goes through a virtual machine. So all you can really do is this kind of cookie-cutter thing of plugging high-level stuff together, which is just super tedious and annoying and totally unnecessary.
Speaker 2
23:33
So what was the, tell me if you could, the origin story of Fast.ai? What is the motivation, its mission, its dream?
Speaker 3
23:45
So I guess the founding story is heavily tied to my previous startup, which is a company called Enlitic, which was the first company to focus on deep learning for medicine. And I created that because I saw there was a huge opportunity there: there's about a 10x shortage in the number of doctors in the developing world relative to what we need. It was expected to take about 300 years to train enough doctors to meet that gap, but I guessed that maybe if we used deep learning for some of the analytics, we could maybe make it so you don't need as highly trained doctors.
Speaker 2
24:27
For diagnosis?
Speaker 3
24:28
For diagnosis and treatment planning.
Speaker 2
24:29
Where's the biggest benefit, just before we get to fast AI, where's the biggest benefit of AI in medicine that you see today?
Speaker 3
24:39
Not much happening today in terms of like stuff that's actually out there. It's very early, but in terms of the opportunity, It's to take markets like India and China and Indonesia, which have big populations, Africa, small numbers of doctors, and provide diagnostic, particularly treatment planning and triage, kind of on device so that if you do a, you know, test for malaria or tuberculosis or whatever, you immediately get something that even a healthcare worker that's had a month of training can get a very high quality assessment of whether the patient might be at risk and tell, you know, okay, we'll send them off to a hospital. So for example, in Africa, outside of South Africa, there's only 5 pediatric radiologists for the entire continent.
Speaker 3
25:35
So most countries don't have any. So if your kid is sick and they need something diagnosed through medical imaging, the person, even if you're able to get medical imaging done, the person that looks at it will be, you know, a nurse at best. But actually in India, for example, and China, almost no x-rays are read by anybody, by any trained professional, because they don't have enough. So if instead we had an algorithm that could take the most likely high-risk 5% and say triage, basically say, okay, someone needs to look at this.
Speaker 3
26:13
It would massively change the kind of way that what's possible with medicine in the developing world. And remember, they have, increasingly, they have money. They're the developing world, they're not the poor world, they're developing world. So they have the money, so they're building the hospitals, they're getting the diagnostic equipment, but they just, there's no way for a very long time will they be able to have the expertise.
Speaker 2
26:38
Shortage of expertise. Okay, and that's where the deep learning systems can step in and magnify the expertise they do have.
Speaker 3
26:46
Exactly.
Speaker 2
26:46
Yeah. So you do see just to linger it a little bit longer, the interaction, do you still see the human experts still at the core of these systems? Is there something in medicine that could be automated almost completely?
Speaker 3
27:03
I don't see the point of even thinking about that, because we have such a shortage of people. Why would we want to find a way not to use them? Right? Like, we have people. So even from an economic point of view, if you can make them 10x more productive, getting rid of the person doesn't impact your unit economics at all. And it totally ignores the fact that there are things people do better than machines.
Speaker 3
27:28
So it's just, to me, that's not a useful way of framing
Speaker 2
27:33
the problem. I guess, just to clarify, I guess I meant there may be some problems where you can avoid even going to the expert ever, sort of maybe preventative care or some basic stuff, low-hanging fruit, allowing the expert to focus on the things that are really that you know.
Speaker 3
27:51
Well, that's what the triage would do, right? So the triage would say, okay, is
Speaker 1
27:56
99%
Speaker 3
27:58
sure there's nothing here? So that can be done on device, and they can just say, okay, go home. So the experts are being used to look at the stuff which has some chance it's worth looking at, which most things, it's not, you know, it's fine.
Speaker 2
28:16
Why do you think we haven't quite made progress on that yet in terms of the scale of how much AI is applied in the medical field? Oh, there's
Speaker 3
28:27
a lot of reasons. I mean, one is it's pretty new. I only started Enlitic in, like, 2014.
Speaker 3
28:32
And before that, like, it's hard to express to what degree the medical world was not aware of the opportunities here. So I went to RSNA, which is the world's largest radiology conference, and I told everybody I could, you know, like, I'm doing this thing with deep learning, please come and check it out. And no one had any idea what I was talking about and no one had any interest in it. So, like, we've come from absolute zero, which is hard.
Speaker 3
29:05
And then the whole regulatory framework, education system, everything is just set up to think of doctoring in a very different way. So today there is a small number of people who are deep learning practitioners and doctors at the same time. And we're starting to see the first ones come out of their PhD programs. So Zak Kohane over in Boston, Cambridge, has a number of students now who are data science experts, deep learning experts, and actual medical doctors.
Speaker 3
29:46
Quite a few doctors have completed our fast.ai course now and are publishing papers and creating journal reading groups in the American College of Radiology. And, like, it's just starting to happen, but it's going to be a long process. The regulators have to learn how to regulate this. They have to build, you know, guidelines.
Speaker 3
30:08
And then the lawyers at hospitals have to develop a new way of understanding that sometimes it makes sense for data to be, you know, looked at in raw form in large quantities in order to create world-changing results.
Speaker 2
30:26
Yeah, so regulation around data, all that, it sounds, it's probably the hardest problem, but sounds reminiscent of autonomous vehicles as well. Many of the same regulatory challenges, many of the same data challenges.
Speaker 3
30:40
Yeah. I mean, funnily enough, the problem is less the regulation and more the interpretation of that regulation by lawyers in hospitals. So HIPAA was actually designed to, well, the P in HIPAA does not stand for privacy. It stands for portability.
Speaker 3
30:57
It's actually meant to be a way that data can be used. And it was created with lots of gray areas because the idea is that would be more practical and it would help people to use this legislation to actually share data in a more thoughtful way. Unfortunately, it's done the opposite because when a lawyer sees a gray area, they say, oh, if we don't know, we won't get sued, then we can't do it. So HIPAA is not exactly the problem.
Speaker 3
31:26
The problem is more that hospital lawyers are not incented to make bold decisions about data portability.
Speaker 2
31:36
Or even to embrace technology that saves lives. Right. They more want to not get in trouble for embracing that
Speaker 3
31:43
technology. Right. Also, it saves lives in a very abstract way, which is like, oh, we've been able to release these 100,000 anonymized records. I can't point to the specific person whose life that saved.
Speaker 3
31:55
I can say like, oh, we ended up with this paper which found this result, which, you know, diagnosed a thousand more people than we would have otherwise. But it's like, which ones were helped? It's very abstract.
Speaker 2
32:07
And on the counter side of that, you may be able to point to a life that was taken because of something that was...
Speaker 3
32:14
Yeah, or a person whose privacy was violated. It's like, oh, this specific person, you know, was re-identified. Re-identified.
Speaker 2
32:25
Just a fascinating topic. We're jumping around. We'll get back to fast.ai.
Speaker 2
32:29
But on the question of privacy: data is the fuel for so much innovation in deep learning. What's your sense on privacy, whether we're talking about Twitter, Facebook, YouTube, or technologies like those in the medical field that rely on people's data in order to create impact? How do we get that right, respecting people's privacy and yet creating technology that learns from data?
Speaker 3
33:03
One of my areas of focus is on doing more with less data. So most vendors, unfortunately, are strongly incented to find ways to require more data and more computation. So Google and IBM being the most obvious.
Speaker 3
33:24
IBM. Yeah. Sorry. So Watson, you know, so Google and IBM both strongly push the idea that you have to be, you know, that they have more data and more computation and more intelligent people than anybody else.
Speaker 3
33:37
And so you have to trust them to do things because nobody else can do it. And Google's very upfront about this. Like, Jeff Dean has gone out there and given talks and said, our goal is to require a thousand times more computation, but less people. Our goal is to use the people that you have better and the data you have better and the computation you have better. So one of the things that we've discovered, or at least highlighted, is that you very, very, very often don't need much data at all.
Speaker 3
34:13
And so the data you already have in your organization will be enough to get state-of-the-art results. So my starting point around privacy is that a lot of people are looking for ways to share data and aggregate data, but I think often that's unnecessary. They assume that they need more data than they do because they're not familiar with the basics of transfer learning, which is this critical technique for needing orders of magnitude less data.
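[A minimal sketch of that transfer learning point, not code from the interview; the data folder is a placeholder for a small in-house dataset. The idea is to reuse an ImageNet-pretrained backbone and train only a small new head:]

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Placeholder dataset: a folder of images arranged one subfolder per class.
tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_ds = datasets.ImageFolder("data/train", transform=tfms)
train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

model = models.resnet34(weights="IMAGENET1K_V1")  # pretrained=True on older torchvision
for p in model.parameters():                      # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))  # new head for our classes

opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
model.train()
for xb, yb in train_dl:  # even a single pass over a small dataset goes a long way
    opt.zero_grad()
    loss_fn(model(xb), yb).backward()
    opt.step()
```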
Speaker 2
34:41
Is your sense, 1 reason you might want to collect data from everyone is like in the recommender system context, where your individual, Jeremy Howard's individual data is the most useful for providing a product that's impactful for you. So for giving you advertisements, for recommending to you movies, for doing medical diagnosis. Is your sense we can build with a small amount of data, general models that will have a huge impact for most people that we don't need to have data from each individual?
Speaker 3
35:19
On the whole I'd say yes. I mean there are things like you know recommender systems have this cold start problem where you know Jeremy is a new customer we haven't seen him before so we can't recommend him things based on what else he's bought and liked with us. And there's various workarounds to that.
Speaker 3
35:38
Like in a lot of music programs, we'll start out by saying, which of these artists do you like? Which of these albums do you like? Which of these songs do you like? Netflix used to do that.
Speaker 3
35:50
Nowadays, they tend not to. People kind of don't like that because they think, oh, we don't want to bother the user. So you could work around that by having some kind of data sharing where you get my marketing record from Acxiom or whatever and try to guesstimate. To me, the benefit to me and to society of saving me 5 minutes on answering some questions, versus the negative externalities of the privacy issue, doesn't add up.
Speaker 3
36:24
So I think like a lot of the time the places where people are invading our privacy in order to provide convenience is really about just trying to make them more money and, and they move these negative externalities to places that they don't have to pay for them. So, when you actually see regulations appear that actually cause the companies that create these negative externalities to have to pay for it themselves, they say, well, we can't do it anymore. So the cost is actually too high. But for something like medicine, yeah, I mean, the hospital has my, you know, medical imaging, my pathology studies, my medical records.
Speaker 3
37:09
And also, I own my medical data. So I help a startup called doc.ai. One of the things doc.ai does is there's an app that you can connect to, you know, Sutter Health and LabCorp and Walgreens, and download your medical data to your phone and then upload it again at your discretion to share it as you wish. So with that kind of approach, we can share our medical information with the people we want to.
Speaker 2
37:44
Yeah, so control. I mean, really being able to control who you share with and so on. Yeah.
Speaker 2
37:49
So that has a beautiful, interesting tangent, but to return back to the origin story of Fast.AI.
Speaker 3
37:59
Right. So before I started Fast.ai, I spent a year researching where are the biggest opportunities for deep learning. Because I knew from my time at Kaggle in particular that deep learning had kind of hit this threshold point where it was rapidly becoming the state-of-the-art approach in every area that looked at it. And I'd been working with neural nets for over 20 years.
Speaker 3
38:25
I knew that from a theoretical point of view, once it hit that point, it would do that in kind of just about every domain. And so I kind of spent a year researching what are the domains that's going to have the biggest low-hanging fruit in the shortest time period. I picked medicine, but there were so many I could have picked. And so there was a kind of level of frustration for me of like, okay, I'm really glad we've opened up the medical deep learning world and today it's huge, as you know, but we can't do, you know, I can't do everything.
Speaker 3
38:58
I don't even know, like, like in medicine, it took me a really long time to even get a sense of like what kind of problems do medical practitioners solve, what kind of data do they have, who has that data. So I kind of felt like I need to approach this differently if I want to maximize the positive impact of deep learning. Rather than me picking an area and trying to become good at it and building something, I should let people who are already domain experts in those areas and who already have the data do it themselves.
Speaker 2
39:29
So
Speaker 3
39:29
that was the reason for Fast.ai is to basically try and figure out how to get deep learning into the hands of people who could benefit from it and help them to do so in as quick and easy and effective a way as possible.
Speaker 2
39:47
Got it. So sort of empower the domain experts.
Speaker 3
39:50
Yeah. And like partly it's because like, unlike most people in this field, my background is very applied and industrial, like My first job was at McKinsey and Company. I spent 10 years in management consulting. I spent a lot of time with domain experts,
Speaker 2
40:09
you
Speaker 3
40:10
know, so I kind of respect them and appreciate them, and I know that's where the value generation in society is. And so I also know how most of them can't code and most of them don't have the time to invest, you know, 3 years in a graduate degree or whatever. So it's like, how do I upskill those domain experts?
Speaker 3
40:33
I think that would be a super powerful thing, you know, biggest societal impact I could have. So yeah, that was the thinking.
Speaker 2
40:42
So so much of fast AI students and researchers and the things you teach are pragmatically minded, practically minded, figuring out ways how to solve real problems and fast. So from your experience, what's the difference between theory and practice of deep learning?
Speaker 3
41:03
Well, most of the research in the deep learning world is a total waste of time.
Speaker 2
41:09
Right, that's what I was getting at.
Speaker 3
41:11
Yeah. It's a problem in science in general. Scientists need to be published, which means they need to work on things that their peers are extremely familiar with and can recognize an advance in that area. So that means that they all need to work on the same thing.
Speaker 3
41:30
And so it really, and the thing they work on, there's nothing to encourage them to work on things that are practically useful. So you get just a whole lot of research, which is minor advances and stuff that's been very highly studied and has no significant practical impact. Whereas the things that really make a difference, like I mentioned transfer learning, like if we can do better at transfer learning, then it's this like world-changing thing where suddenly like lots more people can do world-class work with less resources and less data. But almost nobody works on that.
Speaker 3
42:08
Or another example, active learning, which is the study of, like, how do we get more out of the human beings in the loop?
Speaker 2
42:15
That's my favorite topic.
Speaker 3
42:17
Yeah, so active learning's great, but there's almost nobody working on it because it's just not a trendy thing right now.
Speaker 2
42:23
You know what, sorry to interrupt, you're saying that nobody's publishing on active learning, but there's people inside companies, anybody who actually has to solve a problem, they're going to innovate on active learning.
Speaker 3
42:39
Yeah. Everybody kind of reinvents active learning when they actually have to work in practice because they start labeling things And they think, gosh, this is taking a long time and it's very expensive. And then they start thinking, well, why am I labeling everything? I'm only, the machine's only making mistakes on those 2 classes.
Speaker 3
42:55
They're the hard ones. Maybe I'll just start labeling those 2 classes. And then you start thinking, well, why did I do that manually? Why can't I just get the system to tell me which things are going to be hardest?
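[That workflow is essentially uncertainty sampling; a minimal sketch with a placeholder classifier and random stand-in data, purely to show the loop: fit on what's labeled, score the unlabeled pool by uncertainty, and send only the most uncertain examples to a human:]

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(50, 10))        # stand-in for the examples labeled so far
y_labeled = rng.integers(0, 2, size=50)
X_unlabeled = rng.normal(size=(5000, 10))    # stand-in for the big unlabeled pool

# 1. Fit on whatever has been labeled so far.
clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# 2. Score the unlabeled pool by predictive uncertainty (entropy here).
probs = clf.predict_proba(X_unlabeled)
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

# 3. Ask a human to label only the most uncertain examples, then repeat.
query_idx = np.argsort(entropy)[-100:]
print("label these next:", query_idx[:10])
```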
Speaker 3
43:04
It's an obvious thing to do, but, yeah, it's just like transfer learning. It's understudied, and the academic world just has no reason to care about practical results. The funny thing is, I've only really ever written one paper. I hate writing papers.
Speaker 3
43:21
And I didn't even write it. It was my colleague, Sebastian Ruder, who actually wrote it. I just did the research for it. But it was basically introducing transfer learning, successful transfer learning, to NLP for the first time.
Speaker 3
43:34
The algorithm is called ULMFiT. And I actually wrote it for the course, for the first day of the course. I wanted to teach people NLP, and I thought I only want to teach people practical stuff, and I think the only practical stuff is transfer learning. And I couldn't find any examples of transfer learning in NLP.
Speaker 3
43:53
So I just did it. And I was shocked to find that as soon as I did it, which, you know, the basic prototype took a couple of days, it smashed the state of the art on one of the most important datasets in a field that I knew nothing about. And I just thought, well, this is ridiculous. And so I spoke to Sebastian about it, and he kindly offered to write up the results.
Speaker 3
44:17
And so it ended up being published in ACL, which is the top computational linguistics conference. So people do actually care once you do it, but I guess it's difficult for, maybe, like, junior researchers. Like, I don't care whether I get citations or papers or whatever. There's nothing in my life that makes that important, which is why I've never actually bothered to write a paper myself. But for people who do, I guess they have to pick the kind of safe option, which is, like, yeah, make a slight improvement on something that everybody's already working on.
Speaker 2
44:55
Yeah. Nobody does anything interesting or succeeds in life with the safe option. Although, I
Speaker 3
45:01
mean, the nice thing is nowadays, everybody is now working on NLP transfer learning because since that time we've had GPT and GPT-2 and BERT and you know, it's like, it's so yeah, once you show that something's possible, everybody jumps in, I guess. So
Speaker 2
45:17
So I hope to be a part of it, and I hope to see more innovation in active learning in the same way. I think transfer learning and active learning
Speaker 3
45:24
are fascinating public open work. I actually helped start a startup called platform.ai, which is really all about active learning. And, yeah, it's been interesting trying to kind of see what research is out there and make the most of it.
Speaker 3
45:37
And there's basically none. So we've had to do all our own research.
Speaker 2
45:40
Once again, and just as you described, can you tell the story of the Stanford competition, DawnBench and FastAI's achievement on it?
Speaker 3
45:51
Sure. So something which I really enjoy is that I basically teach 2 courses a year: Practical Deep Learning for Coders, which is kind of the introductory course, and then Cutting Edge Deep Learning for Coders, which is the kind of research-level course. And while I teach those courses, I basically have a big office at the University of San Francisco, big enough for like 30 people, and I invite anybody, any student who wants to, to come and hang out with me while I build the course. And so generally it's full, and so we have 20 or 30 people in a big office with nothing to do but study deep learning.
Speaker 3
46:33
So it was during one of these times that somebody in the group said, oh, there's a thing called DawnBench that looks interesting. And I was like, what the hell is that? And they said it's some competition to see how quickly you can train a model. Seems kind of not exactly relevant to what we're doing, but it sounds like the kind of thing which you might be interested in.
Speaker 3
46:52
I checked it out and I was like, oh crap, there's only 10 days till it's over. It's pretty much too late. And we're kind of busy trying to teach this course. But we were like, oh, it would make an interesting case study for the course. Like, it's all the stuff we're already doing. Why don't we just put together our current best practices and ideas?
Speaker 3
47:12
So me and I guess about 4 students just decided to give it a go, and we focused on the small one called CIFAR-10, which is little 32 by 32 pixel images. Can
Speaker 2
47:24
you say what dimensions?
Speaker 3
47:25
Yeah, so it's a competition to train a model as fast as possible. It was run by Stanford, and
Speaker 2
47:30
as cheap as possible, too?
Speaker 3
47:32
That's also another one, for as cheap as possible. And there's a couple of categories, ImageNet and CIFAR-10. So ImageNet's this big 1.3-million-image thing that took a couple of days to train.
Speaker 3
47:45
I remember a friend of mine, Pete Warden, who's now at Google. I remember he told me how he trained ImageNet a few years ago, and he basically had this little granny flat out the back that he turned into his ImageNet training center. And he figured, you know, after like a year of work, he figured out how to train it in like 10 days or something. It's like, that was a big job.
Speaker 3
48:08
Well, CIFAR-10 at that time, you could train in a few hours. You know, it was much smaller and easier. So we thought we'd try CIFAR
Speaker 1
48:15
10.
Speaker 3
48:18
And, yeah, I'd really never done that before. Like, things like using more than one GPU at a time was something I tried to avoid. Because to me, it's very against the whole idea of accessibility; you should be able to do things with one GPU.
Speaker 2
48:34
I mean, have you asked in the past before, after having accomplished something, how do I do this faster, much faster?
Speaker 3
48:42
Oh, always. But it's always, for me, it's always, how do I make it much faster on a single GPU that a normal person could afford in their day-to-day life. It's not how could I do it faster by, you know, having a huge data center?
Speaker 3
48:55
Because to me, it's all about, like, as many people should be able to use something as possible without fussing around with infrastructure. So anyways, in this case, it's like, well, we can use 8 GPUs just by renting an AWS machine. So we thought we'd try that. And, yeah, basically using the stuff we were already doing, we were able to get the speed down, within a few days, to, I don't know, a very small number of minutes.
Speaker 3
49:25
I can't remember exactly how many minutes it was, but it might've been like 10 minutes or something. And so, yeah, we found ourselves at the top of the leaderboard easily for both time and money, which really shocked me because the other people competing in this were like Google and Intel and stuff, who I like know a lot more about this stuff than I think we do. So then we were emboldened. We thought, let's try the ImageNet 1 too.
Speaker 3
49:50
I mean, it seemed way out of our league, but our goal was to get under 12 hours. And we did, which was really exciting. But we didn't put anything up on the leaderboard; we were down to, like, 10 hours. But then Google put in something like 5 hours or something.
Speaker 3
50:09
And we're just like, oh, we're so screwed. But we kind of thought, we'll keep trying, you know, if Google can do it. In fact, I mean, Google did it in 5 hours on, like, a TPU pod or something, like a lot of hardware.
Speaker 3
50:24
But we kind of like had a bunch of ideas to try, like a really simple thing was, why are we using these big images? They're like
Speaker 1
50:31
224, 256
Speaker 3
50:33
by 256 pixels. You know, why don't we try smaller ones?
Speaker 2
50:37
And just to elaborate, there's a constraint on the accuracy that your trained model is supposed to achieve.
Speaker 3
50:42
Yeah, you got to achieve 93%, I think it was for ImageNet. Exactly.
Speaker 2
50:49
Which is very tough. So you have to.
Speaker 3
50:50
Yeah, 93 percent. Like they picked a good threshold. It was a little bit higher than what the most commonly used ResNet-50 model could achieve at that time.
Speaker 3
51:03
So yeah, so it's quite a difficult problem to solve. But yeah, we realized if we actually just use 64 by 64 images, it trained a pretty good model. And then we could take that same model and just give it a couple of epochs to learn
Speaker 1
51:19
224
Speaker 3
51:20
by 224 images. And it was basically already trained. Which makes a lot of sense.
Speaker 3
51:25
Like, if you teach somebody, here's what a dog looks like, and you show them low-res versions, and then you say, here's a really clear picture of a dog, they already know what a dog looks like. So, like, we just jumped to the front, and we ended up winning parts of that competition.
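[The trick described above is often called progressive resizing; a rough sketch with a placeholder image folder, not the actual DAWNBench code: do most of the training on cheap small images, then finish with a couple of epochs at full size.]

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

DATA = "data/train"  # placeholder: one subfolder of images per class

def loader(size, batch_size=64):
    # Same dataset, just decoded at a different resolution.
    tfms = transforms.Compose([transforms.Resize((size, size)), transforms.ToTensor()])
    ds = datasets.ImageFolder(DATA, transform=tfms)
    return torch.utils.data.DataLoader(ds, batch_size=batch_size, shuffle=True)

def run_epochs(model, dl, epochs, lr):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for xb, yb in dl:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()

num_classes = len(datasets.ImageFolder(DATA).classes)
model = models.resnet50(weights=None, num_classes=num_classes)
run_epochs(model, loader(64), epochs=10, lr=0.1)   # most of the training on small images
run_epochs(model, loader(224), epochs=2, lr=0.01)  # a short finish at full resolution
```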
Speaker 3
51:46
We actually ended up doing a distributed version over multiple machines a couple of months later and ended up at the top of the leaderboard. We had 18 minutes. ImageNet. Yeah.
Speaker 3
51:56
And it was, and people have just kept on blasting through again and again since then.
Speaker 2
52:02
So what's your view on multi GPU or multiple machine training in general as a way to speed code up? I think it's largely a waste of time. Both multi GPU on a single machine and.
Speaker 3
52:15
Yeah, particularly multi machines. Cause it's just clunky. Multi GPUs is less clunky than it used to be, but to me anything that slows down your iteration speed is a waste of time.
Speaker 3
52:31
So you could maybe do your very last, you know, perfecting of the model on multi GPUs if you need to. But, so for example, I think doing stuff on ImageNet is generally a waste of time. Why test things on 1.3 million images? Most of us don't use 1.3 million images.
Speaker 3
52:50
And we've also done research that shows that doing things on a smaller subset of images gives you the same relative answers anyway. So from a research point of view, why waste that time? So actually, I released a couple of new datasets recently. One is called Imagenette, the French ImageNet, which is a small subset of ImageNet that is designed to be easy to classify.
Speaker 2
53:15
What's, how do you spell Imagenette?
Speaker 3
53:17
It's got an extra T and E at the end, because it's very French.
Speaker 2
53:20
Image, okay.
Speaker 3
53:21
Yeah, and then another one called ImageWoof, which is a subset of ImageNet that only contains dog breeds. And that's a hard one, right? That's a hard one.
Speaker 3
53:33
Yeah. And I've discovered that if you just look at these 2 subsets, you can train things on a single GPU in 10 minutes. And the results you get are directly transferable to ImageNet nearly all the time. And so now I'm starting to see some researchers start to use these
Speaker 2
53:47
much smaller datasets. I so deeply love the way you think, because I think you might have written a blog post saying that sort of going to these big datasets is encouraging people to not think creatively. Absolutely.
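[For reference, that small-dataset workflow is cheap to try; a hedged sketch using the fastai library, with API names as of fastai v2 (older releases use cnn_learner instead of vision_learner):]

```python
from fastai.vision.all import *

# Imagenette: the small, easy-to-classify subset of ImageNet mentioned above.
path = untar_data(URLs.IMAGENETTE_160)
dls = ImageDataLoaders.from_folder(
    path, valid='val', item_tfms=Resize(160),
    batch_tfms=Normalize.from_stats(*imagenet_stats))
learn = vision_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(5)  # minutes on a single consumer GPU; findings tend to carry over to full ImageNet
```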
Speaker 2
54:04
So it sort of constrains you to train on large resources. And because you have these resources, you think more resources will be better. And then, so, like, somehow you kill the creativity. Yeah.
Speaker 3
54:17
And even worse than that, Lex, I keep hearing from people who say, I decided not to get into deep learning because I don't believe it's accessible to people outside of Google to do useful work. So like I see a lot of people make an explicit decision to not learn this incredibly valuable tool, because they've drunk the Google Kool-Aid, which is that only Google's big enough and smart enough to do it. And I just find that so disappointing and it's so wrong.
Speaker 2
54:45
And I think all of the major breakthroughs in AI in the next 20 years will be doable on a single GPU. Like I would say, my sense is all the big sort of... Well, let's
Speaker 3
54:57
put it this way. None of the big breakthroughs of the last 20 years have required multiple GPUs. So, like, batch norm, ReLU, dropout, to demonstrate that there's something to that.
Speaker 3
55:08
Every one of them, none of them has required multiple GPUs.
Speaker 2
55:11
GANs, the original GANs didn't require multiple GPUs.
Speaker 3
55:15
Well, and we've actually recently shown that you don't even need GANs. So we've developed GAN level outcomes without needing GANs. And we can now do it with, again, by using transfer learning, we can do it in a couple of hours on a single GPU.
Speaker 3
55:29
You're just
Speaker 2
55:30
using a generator model, like without the adversarial part?
Speaker 3
55:32
Yeah, so we've found loss functions that work super well without the adversarial part. And then one of our students, a guy called Jason Antic, has created a system called DeOldify, which uses this technique to colorize old black-and-white movies. You can do it on a single GPU, colorize a whole movie in a couple of hours.
Speaker 3
55:52
And 1 of the things that Jason and I did together was we figured out how to add a little bit of GAN at the very end, which it turns out for colorization makes it just a bit brighter and nicer. And then Jason did masses of experiments to figure out exactly how much to do, but it's still all done on his home machine on a single GPU in his lounge room. And like, if you think about like colorizing Hollywood movies, that sounds like something a huge studio would have to do. But he has the world's best results on this.
Speaker 2
56:25
There's this problem of microphones. We're just talking about microphones now. Yeah.
Speaker 2
56:28
It's such a pain in the ass to have these microphones to get good quality audio. And I tried to see if it's possible to plop down a bunch of cheap sensors and reconstruct higher-quality audio from multiple sources. Because right now I haven't seen work where, okay, we can take these inexpensive mics and automatically combine audio from multiple sources to improve the combined audio. Right.
Speaker 2
56:52
People haven't done that. And that feels like a learning problem. Right. So hopefully somebody can.
Speaker 3
56:56
Well, I mean, it's, it's eminently doable and it should have been done by now. I felt, I felt the same way about computational photography 4 years ago. Why are we investing in big lenses when 3 cheap lenses plus actually a little bit of intentional movement, so like hold, you know, like take a few frames, gives you enough information to get excellent sub-pixel resolution, which particularly with deep learning, you would know exactly what you're meant to be looking at.
Speaker 3
57:25
We can totally do the same thing with audio. I think it's madness that it hasn't been done yet.
Speaker 2
57:30
Is there been progress on the photography? Yeah,
Speaker 3
57:33
photography is basically standard now. So the Google Pixel Night Sight, I don't know if you've ever tried it, but it's astonishing. You take a picture in almost pitch black and you get back a very high quality image.
Speaker 3
57:48
And it's not because of the lens. Same stuff with like adding the bokeh to the, you know, the background blurring. It's done computationally.
Speaker 2
57:56
This is the pixel here.
Speaker 3
57:58
Yeah, basically everybody now is doing most of the fanciest stuff on their phones with computational photography. And also, increasingly, people are putting more than one lens on the back of the camera. So the same will happen for audio, for sure.
Speaker 2
58:14
And there's applications in the audio side. If you look at an Alexa type device, most people I've seen, especially I worked at Google before, when you look at noise background removal, you don't think of multiple sources of audio. You don't play with that as much as I would hope people would.
Speaker 2
58:31
But I mean,
Speaker 3
58:31
you can still do it even with one. Like, again, not much work's been done in this area. So we're actually going to be releasing an audio library soon, which hopefully will encourage development of this because it's so underused.
Speaker 3
58:42
The basic approach we used for our super resolution, which Jason uses for DeOldify of generating high quality images, the exact same approach would work for audio. No one's done it yet, but it would be a couple of months work.
Speaker 2
58:56
Okay. Also learning rate in terms of DawnBench. There's some magic on learning rate that you played around with. It's interesting.
Speaker 3
59:05
Yeah. So this is all work that came from a guy called Leslie Smith. Leslie's a researcher who, like us, cares a lot about just the practicalities of training neural networks quickly and accurately. Which you would think is what everybody should care about, but almost nobody does.
Speaker 3
59:25
And he discovered something very interesting, which he calls super-convergence, which is that there are certain networks that, with certain settings of hyperparameters, could suddenly be trained 10 times faster by using a 10 times higher learning rate. Now, no one published that paper because it's not an area of kind of active research in the academic world. No academics recognize this is important. And also, deep learning in academia is not considered an experimental science.
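[Leslie Smith's one-cycle schedule, the practical upshot of that super-convergence result, is now built into PyTorch; a minimal sketch with a toy model and random data, purely to show the shape of the schedule:]

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
total_steps = 200

# One-cycle policy: ramp the learning rate up to a surprisingly high maximum,
# then anneal it back down, once over the whole training run.
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=1.0, total_steps=total_steps)

loss_fn = nn.CrossEntropyLoss()
for step in range(total_steps):
    x = torch.randn(32, 10)             # toy batch
    y = torch.randint(0, 2, (32,))
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    sched.step()                        # the learning rate changes every step
```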