1 hour 27 minutes 41 seconds
Speaker 1
00:00
Welcome to 2020, and welcome to the Deep Learning Lecture Series. Let's start it off today with a quick whirlwind tour of all the exciting things that happened in 2017, 2018, and especially 2019, and the amazing things we're going to see this year in 2020. Also as part of this series, there are going to be a few talks from some of the top people in learning and artificial intelligence. After today, of course. Let's start with the broad view: the celebrations, from the Turing Award to the limitations and the debates, and the exciting growth first.
Speaker 1
00:44
And first, of course, a step back to the quote I've used before. I love it, I'll keep reusing it. AI began not with Alan Turing or McCarthy, but with the ancient wish to forge the gods; a quote from Pamela McCorduck in Machines Who Think.
Speaker 1
01:00
That visualization there shows just 3% of the neurons in our brain, in the thalamocortical system. That magical thing between our ears that allows us all to see and hear and think and reason and hope and dream and fear our eventual mortality. All of that is the thing we wish to understand. That's the dream of artificial intelligence: to understand it, and to recreate versions of it, echoes of it, in the engineering of our intelligent systems.
Speaker 1
01:36
That's the dream. We should never forget in the details I'll talk, the exciting stuff I'll talk about today. That's sort of the reason why this is exciting. This mystery that's our mind.
Speaker 1
01:49
The modern human brain, the modern human as we know and love them today, emerged just about 300,000 years ago. And the Industrial Revolution is about 300 years ago. That's 0.1% of the time since the early modern human, and that's when we've seen a lot of the machinery. The machine was born not in stories but in actuality.
Speaker 1
02:19
The machine was engineered since the Industrial Revolution: the steam engine, the mechanized factory system, the machining tools. That's just 0.1% of the history, that's the 300 years. Now we zoom in to the 60, 70 years since the founder, the father arguably of artificial intelligence, Alan Turing, and the dreams.
Speaker 1
02:39
You know, there's always been a dance in artificial intelligence between the dreams, the mathematical foundations, and when the dreams meet the engineering, the practice, the reality. So Alan Turing said many times that by the year 2000 he was sure the Turing test, in natural language, would be passed. He said: it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. They would be able to converse with each other to sharpen their wits.
Speaker 1
03:13
At some stage, therefore, we should have to expect the machines to take control. A little shout-out to self-play there. So that's the dream, from both the father of the mathematical foundations of artificial intelligence and the father of its dreams. And that dream, even in the early days, was becoming reality.
Speaker 1
03:34
The practice started with the perceptron, often thought of as a single-layer neural network, but what's not as well known is that Frank Rosenblatt was also the developer of the multi-layer perceptron. And that history, zooming through, has amazed our civilization. To me, one of the most inspiring threads is in the world of games. First with the great Garry Kasparov losing to IBM Deep Blue in 1997.
Speaker 1
04:03
Then Lee Sedol losing to AlphaGo in 2016. Seminal moments. And captivating the world through the engineering of actual real-world systems. Robots on four wheels, as we'll talk about today, from Waymo to Tesla to all of the autonomous vehicle companies working in the space. Robots on two legs, captivating the world with what kind of actuation, what kind of manipulation can be achieved.
Speaker 1
04:32
The history of deep learning: from 1943, the initial models from neuroscience, thinking about how to model neural networks mathematically; to the creation, as I said, of the single-layer and the multi-layer perceptron by Frank Rosenblatt in '57 and '62; to the ideas of backpropagation and recurrent neural nets in the 70s and 80s; to convolutional neural networks and LSTMs and bidirectional RNNs in the 80s and 90s; to the birth of the deep learning term and the new wave, the revolution, in 2006; to ImageNet and AlexNet, the seminal moment that captivated the imagination of the AI community with the possibility of what neural networks can do in the image space, with the natural language space closely following in the years after; to the development and popularization of GANs, generative adversarial networks; to AlphaGo and AlphaZero in 2016 and 2017. And, as we'll talk about, the language models, the transformers, in 2017, 2018, and 2019: those last few years have been dominated by the ideas of deep learning in the space of natural language processing. Okay, celebrations.
Speaker 1
05:49
This year, the Turing Award was given for deep learning. This is like deep learning has grown up; we can finally start giving awards. Yann LeCun, Geoffrey Hinton, and Yoshua Bengio received the Turing Award for their conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
Speaker 1
06:09
I would also like to add that perhaps it was also for the popularization in the face of skepticism. Those of you who are a little bit older have known the skepticism that neural networks received throughout the 90s. In the face of that skepticism, continuing to push, believe, and work in this field, and popularizing it, I think is part of the reason these three folks received the award. But of course, the community that contributed to deep learning is much bigger than those three. Many of them might be here today at MIT, and broadly in academia and industry.
Speaker 1
06:48
Looking at the early key figures: Walter Pitts and Warren McCulloch, as I mentioned, for the computational models of neural nets; these ideas that the kind of biological neural networks we have in our brain can be modeled mathematically. And then the engineering of those models into actual physical and conceptual mathematical systems by Frank Rosenblatt: the single layer in '57, the multi-layer in 1962. You could say Frank Rosenblatt is the father of deep learning; the first person to really, in '62, mention the idea of multiple hidden layers in neural networks.
Speaker 1
07:28
As far as I know, and somebody please correct me. But in 1965, a shout-out to the Soviet Union and Ukraine: the person who is considered to be the father of deep learning, Alexey Ivakhnenko, with V.G. Lapa as co-author of that work, created the first learning algorithms for multi-layer perceptrons, with multiple hidden layers. Then the work on backpropagation, automatic differentiation, in 1970.
Speaker 1
07:58
In 1979, convolutional neural networks were first introduced, and John Hopfield looked at recurrent neural networks, what are now called Hopfield networks, a special kind of recurrent neural network. Okay, that's the early birth of deep learning. I want to mention this because it's become a contentious space now that we can celebrate the incredible accomplishments of deep learning. Much like in reinforcement learning, in academia, credit assignment is a big problem.
Speaker 1
08:26
And the embodiment of that, almost to the point of meme, is the great Juergen Schmidhuber. For people who are interested in the contributions of the different people in the deep learning field, I encourage you to read his overview of deep learning in neural networks. It's an overview of all the various people who have contributed besides Yann LeCun, Geoffrey Hinton, and Yoshua Bengio. It's a big, beautiful community.
Speaker 1
08:54
So full of great ideas and full of great people. My hope for this community, given the tension some of you might have seen around this kind of credit assignment problem, is that we have more, not on this slide, but love, there can never be enough love in the world, but also general respect, open-mindedness, collaboration, and credit sharing in the community. Less derision, jealousy, stubbornness, and academic silos, within institutions and within disciplines. Also, 2019 was the first time it became cool to highlight the limits of deep learning.
Speaker 1
09:38
This is an interesting moment in time. Several books and several papers have come out in the past couple of years highlighting that deep learning is not able to do the broad spectrum of tasks that we can imagine an artificial intelligence system being able to do, like common-sense reasoning, like building knowledge bases, and so on. Rodney Brooks said that by 2020, the popular press starts having stories that the era of deep learning is over. And certainly there have been echoes of that through the press, through the Twitter-sphere, and all that kind of world.
Speaker 1
10:19
And I'd like to say that a little skepticism, a little criticism, is always really good for the community, but not too much. It's like a little spice in the soup of progress. Aside from that kind of skepticism, CVPR, ICLR, NeurIPS, all these conferences, their paper submissions have grown year over year. There's been a lot of exciting research, some of which I'd like to cover today.
Speaker 1
10:52
My hope in this space of deep learning growth, celebrations, and limitations for 2020 is that there's both less hype and less anti-hype. Fewer tweets on how there's too much hype in AI, and more solid research. Less criticism and more doing. But again, a little criticism, a little spice, is always good for the recipe. Hybrid research: less contentious, counterproductive debates, and more open-minded interdisciplinary collaboration across neuroscience, cognitive science, computer science, robotics, mathematics, physics, all of these disciplines working together.
Speaker 1
11:38
And the research topics I would love to see more contributions to, as we'll briefly talk about in some domains: reasoning, common-sense reasoning, integrating that into the learning architectures; active learning and lifelong learning; multimodal, multitask learning; open-domain conversation, so expanding the success of natural language to dialogue, to open-domain dialogue and conversation; and then applications. The two most exciting, one of which we'll talk about, are medical and autonomous vehicles. Then algorithmic ethics in all of its forms: fairness, privacy, bias.
Speaker 1
12:15
There's been a lot of exciting research there; I hope that continues: taking responsibility for the flaws in our data and the flaws in our human ethics. And then robotics. In terms of deep learning applied to robotics, I'd love to see continued development in deep reinforcement learning applied to robotics and robot manipulation.
Speaker 1
12:36
By the way, there might be a little bit of time for questions at the end. If you have a really pressing question, you can ask it along the way too. Questions so far? Thank God.
Speaker 1
12:48
Okay. So first, the practical: the deep learning and deep RL frameworks. This has really been a year where the frameworks have matured and converged towards two popular deep learning frameworks that people use: TensorFlow and PyTorch.
Speaker 1
13:08
So TensorFlow 2.0 and PyTorch 1.3 are the most recent versions. And they've converged towards each other, taking the best features and removing the weaknesses from each other. So that competition has been really fruitful, in some sense, for the development of the community. On the TensorFlow side, eager execution, so imperative programming, the way you would normally program in Python, has been fully integrated, made easy to use, and become the default.
Speaker 1
13:38
On the PyTorch side, TorchScript now allows for a graph representation. So what used to be the default mode of operation in TensorFlow, you can now do in PyTorch: have an intermediate representation that's in graph form. On the TensorFlow side, the deep Keras integration and its promotion as the primary citizen, the default way you interact with TensorFlow, allows complete beginners, anybody outside of machine learning, to use TensorFlow with just a few lines of code to train and do inference with a model. That's really exciting.
Speaker 1
14:20
They cleaned up the API, the documentation, and so on. Of course, also maturing TensorFlow.js, the in-browser implementation of TensorFlow; TensorFlow Lite, being able to run TensorFlow on phones, on mobile; and serving, which apparently industry cares a lot about, being able to efficiently serve models in the cloud. PyTorch is catching up with TPU support and experimental versions of PyTorch Mobile, so being able to run on a smartphone, on their side.
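Since the eager-versus-graph distinction comes up for both frameworks above, here is a toy, library-free sketch of the difference. The Node class and names are invented for illustration; this is not TensorFlow or PyTorch API.

```python
import operator

# Graph mode: build a deferred computation now, execute it later.
# Toy sketch of the idea only, not TensorFlow/PyTorch API.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        # Recursively evaluate child nodes; strings are placeholder names
        # looked up in the feed dict, everything else is a constant.
        vals = [i.run(feed) if isinstance(i, Node) else feed.get(i, i)
                for i in self.inputs]
        return self.op(*vals)

# Build the expression (x * 2) + 3 without executing anything yet.
graph = Node(operator.add, Node(operator.mul, "x", 2), 3)
print(graph.run({"x": 5}))   # executes only now: 13

# Eager mode: the same computation runs immediately, line by line.
x = 5
print(x * 2 + 3)             # 13
```

The graph form is what makes whole-program optimization and deployment easier; the eager form is what makes debugging feel like ordinary Python. TorchScript and TensorFlow 2.0 each try to offer both.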
Speaker 1
14:51
It's a tense, exciting competition. And I almost forgot to mention: we have to say goodbye to our favorite Python 2. This is the year, on January 1st, 2020, that support for Python 2, and TensorFlow and PyTorch support for Python 2, finally ended. So goodbye, print statement; goodbye, cruel world. Okay.
Speaker 1
15:15
On the reinforcement learning front, we're kind of in the same space as JavaScript libraries: there are no clear winners coming out. If you're a beginner in the space, the one I recommend is a fork of OpenAI Baselines called Stable Baselines. But there are a lot of exciting ones.
Speaker 1
15:32
Some of them are built closely on TensorFlow; some are built on PyTorch. Of course, from Google, from Facebook, from DeepMind: Dopamine, TF-Agents, and others. Most of these I've used.
Speaker 1
15:48
If you have specific questions, I can answer them. So Stable Baselines is the OpenAI Baselines fork, like I said. It implements a lot of the basic deep RL algorithms: PPO, A2C, everything. Good documentation, and it allows a very simple, minimal, few-lines-of-code application of the basic algorithms to the OpenAI Gym environments. That's the one I recommend.
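Stable Baselines hides the algorithm behind a couple of calls (roughly: construct a model on a Gym environment, call learn, then predict). As a library-free stand-in for that same train-then-act workflow, here is tabular Q-learning on a made-up five-state corridor; everything below is illustrative and none of it is the Stable Baselines API.

```python
import random

random.seed(0)

# Made-up 5-state corridor: start at state 0, reward 1.0 for reaching state 4.
# Actions: 0 = left, 1 = right. step() mimics the Gym convention of
# returning (next_state, reward, done).
def step(state, action):
    nxt = max(0, min(4, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

Q = [[0.0, 0.0] for _ in range(5)]        # Q-table: one row per state
alpha, gamma, eps = 0.5, 0.9, 0.3

for _ in range(500):                      # the "learn" stage
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Standard Q-learning update.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The "predict" stage: the greedy policy should move right in every
# non-terminal state.
policy = [q.index(max(q)) for q in Q[:4]]
print(policy)
```

The library version replaces the Q-table with a neural network and the corridor with a real Gym environment, but the loop structure is the same.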
Speaker 1
16:13
Okay. For the framework world, my hope for 2020 is framework-agnostic research. So one of the things I mentioned is that PyTorch has almost overtaken TensorFlow in popularity in the research world. What I'd love to see is being able to develop an architecture in TensorFlow, or develop it in PyTorch, which you currently can.
Speaker 1
16:36
Then, once you train the model, to be able to easily transfer it to the other: from PyTorch to TensorFlow, from TensorFlow to PyTorch. Currently, it takes three, four, five hours, if you know what you're doing in both frameworks, to do that. It'd be nice if there was a very easy way to do that transfer. Then the maturing of the deep RL frameworks: I'd love to see OpenAI step up, DeepMind step up, and really take some of these frameworks to a maturity that we can all agree on, much like OpenAI Gym has done for the environment world.
Speaker 1
17:06
And continued work on what Keras, and many other wrappers around TensorFlow, started: greater and greater abstractions, allowing machine learning to be used by people outside of the machine learning field. I think the powerful thing about basic vanilla supervised learning is that people in biology and chemistry and neuroscience and physics and astronomy can deal with the huge amounts of data they're working with, without needing to learn many of the details of even Python. So I would love to see greater and greater abstractions which empower scientists outside the field. Okay, natural language processing.
Speaker 1
17:56
2017 and 2018 were when the transformer was developed, and its power was demonstrated especially by BERT, achieving state-of-the-art results on a lot of language benchmarks, from sentence classification to tagging, question answering, and so on. Hundreds of datasets and benchmarks emerged, most of which BERT dominated in 2018. 2019 was sort of the year the transformer really exploded in terms of all the different variations. Again, starting from BERT: XLNet, and it's very cool to use BERT in the name of your new derivative of a transformer.
Speaker 1
18:54
RoBERTa; DistilBERT from Hugging Face; models from Salesforce; GPT-2 from OpenAI, of course; ALBERT; and Megatron from NVIDIA, a huge transformer. A few tools have emerged. So one is Hugging Face, a company and also a repository that has implemented, in both PyTorch and TensorFlow, a lot of these transformer-based natural language models.
Speaker 1
19:19
So that's really exciting; most people here can just use them easily, since those are already pre-trained models. The other exciting thing is that Sebastian Ruder, a great researcher in the field of natural language processing, has put together NLP-progress, which tracks all the different benchmarks for all the different natural language tasks: leaderboards of who's winning where.
Speaker 1
19:42
Okay, I'll mention a few models that stand out from the work this year. Megatron-LM from NVIDIA is basically taking, I believe, the GPT-2 transformer model and putting it on steroids: 8.3 billion versus 1.5 billion parameters. And there's a lot of interesting stuff there, as you would expect from NVIDIA.
Speaker 1
20:07
Of course there's always brilliant research, but also interesting aspects of how to train in a parallel way: model and data parallelism in the training. The first breakthrough result in terms of performance, the model that replaced BERT as king of transformers, is XLNet from CMU and Google Research. They combine the bidirectionality from BERT with the recurrence aspect of Transformer-XL, the relative positional embeddings and the recurrence mechanism of Transformer-XL, to achieve state-of-the-art performance on 20 tasks.
Speaker 1
20:50
ALBERT is a recent addition from Google Research, and it significantly reduces the number of parameters versus BERT by doing parameter sharing across the layers. It has achieved state-of-the-art results on 12 NLP tasks, including the difficult Stanford question answering benchmark, SQuAD 2.0. And they provide an open-source TensorFlow implementation, including a number of ready-to-use pre-trained language models. Okay, another thing for people who are completely new to this field: a bunch of apps popped up, Write With Transformer from Hugging Face being one of them, that allow you to explore the capabilities of these language models.
Speaker 1
21:35
And I think they're quite fascinating from a philosophical point of view. This has actually been at the core of a lot of the tension: how much do these transformers actually understand? They're basically memorizing the statistics of the language, in a self-supervised way, by reading a lot of text. Is that really understanding? A lot of people say no, until it impresses us, and then everybody will say it's obvious.
Speaker 1
22:02
But Write With Transformer is a really powerful way to generate text and reveal how much these models really learn. Before this, yesterday actually, I came up with a bunch of prompts. So on the left is a prompt you give it. The meaning of life, here, for example: "is not what I think it is. It's what I do to make it."
Speaker 1
22:22
And you can do a lot of prompts of this nature. It's very profound. And some of them will be just absurd: they'll make sense statistically, but they'll be absurd, and reveal that the model really doesn't understand the fundamentals of the prompt it's being provided.
Speaker 1
22:39
But at the same time, it's incredible what kind of text it's able to generate. "The limits of deep learning, we're just having fun with this at this point, are still in the process of being figured out." Very true.
Speaker 1
22:55
Had to type this one: "The most important person in the history of deep learning is probably Andrew Ng." I have to agree. So this model knows what it's doing.
Speaker 1
23:07
And I tried to get it to say something nice about me, and that took a lot of attempts. So this is kind of funny. Finally it did one.
Speaker 1
23:17
It said, "Lex Fridman's best quality is that he's smart." I said, finally. But, and nothing ever happens without a "but": "but I think he gets more attention than every Twitter comment ever." That's very true. Okay.
Speaker 1
23:37
A nice way to reveal through this that the models are not able to do any kind of understanding of language is to do prompts that require understanding of concepts and being able to reason with those concepts: common-sense reasoning. A trivial one is arithmetic. Two plus two is: a 3, a 5, a 6, a 7, "the result is a simple equation, 4." And two plus three is: like, it got it right, and then it changes its mind. Okay, two minus two is 7, and so on. You can reveal any kind of reasoning failure.
Speaker 1
24:10
You can do blocks; you can ask it about gravity, all those kinds of things. It shows that it doesn't understand the fundamentals of the concepts being reasoned about. And I'll mention work that takes it beyond, towards that reasoning world, in the next few slides.
Speaker 1
24:28
But I should also mention, with this GPT-2 model, if you remember, about a year ago there was a lot of thinking about this 1.5 billion parameter model from OpenAI. The thought was it might be so powerful that it would be dangerous, in this case used, probably by Russians, for fake news and misinformation. And so the thinking from OpenAI was: when you have an AI system that you're about to release that might turn out to be dangerous, how do you release it?
Speaker 1
25:04
And I think, while it turned out that the GPT-2 model is not quite so dangerous, that humans are in fact more dangerous than AI currently, that thought experiment is very interesting. They released a report on release strategies and the social impacts of language models that didn't get as much attention as I think it should have. And it was a little bit disappointing to me how little people worried about this kind of situation. There was more of an eye-roll about, oh, these language models aren't as smart as we thought they might be.
Speaker 1
25:42
But the reality is, once they are, it's a very interesting thought experiment: how should the process go of companies and experts communicating with each other during such a release? This report thinks through some of those details. My takeaway, from reading the report and from this whole year of that event, is that conversations on this topic are difficult, because we as the public seem to penalize anybody trying to have that conversation. And the model of sharing privately and confidentially between machine learning organizations and experts is not there.
Speaker 1
26:19
There's no incentive or model or history or culture of sharing. Okay. The best paper from ACL, the main conference for language, was on a difficult task. So we talked about language models; now there's the task, taking it a step further, of dialogue: multi-domain, task-oriented dialogue.
Speaker 1
26:45
That's sort of the next challenge for dialogue systems. And they had a few ideas on how to perform dialogue state tracking across domains, achieving state-of-the-art performance on MultiWOZ, which is a challenging, very difficult five-domain, human-to-human dialogue dataset. There are a few ideas there. I should probably hurry up and start skipping stuff.
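At its simplest, dialogue state tracking means accumulating (domain, slot, value) triples as the conversation moves around, so that returning to an earlier domain finds its slots intact. A minimal dict-based sketch, with domains and slots invented for illustration (nothing like the paper's actual model):

```python
# Minimal multi-domain dialogue state tracker: each user turn writes a
# (domain, slot, value) triple; state persists across domain switches.
# Domain and slot names are invented for illustration.
state = {}

def update(state, domain, slot, value):
    state.setdefault(domain, {})[slot] = value
    return state

# The user books a hotel, takes a restaurant tangent, then returns to the hotel.
update(state, "hotel", "area", "centre")
update(state, "restaurant", "food", "italian")
update(state, "hotel", "nights", 3)       # hotel slots from earlier are kept

print(state)
```

The hard part the paper tackles is doing this extraction from raw, messy human utterances across domains, rather than from clean function calls like these.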
Speaker 1
27:13
On common-sense reasoning, which is really interesting: one of the open questions for the deep learning community, the AI community in general, is how we can have hybrid systems, whether of symbolic AI and deep learning, or generally of common-sense reasoning with learning systems. And there have been a few papers in this space. One of my favorites, from Salesforce, builds a dataset where we can start to do question answering while figuring out the concepts being explored in the question and the answer. Here, the question: while eating a hamburger with friends, what are people trying to do?
Speaker 1
27:51
Multiple choice: have fun, tasty, indigestion. The idea that needs to be generated there, and that's where the language model would come in, is that usually a hamburger with friends indicates a good time. So you basically take the question, generate the common-sense concept, and from that determine, in the multiple choice, what's happening, what's the state of affairs in this particular question. Okay, the Alexa Prize, again, hasn't received nearly as much attention as I think it should have, perhaps because there haven't been major breakthroughs. But it's open-domain conversation that all of us, anybody who owns an Alexa, can participate in as a provider of data.
Speaker 1
28:49
And there's been a lot of amazing work from universities across the world on the Alexa Prize in the last couple of years, and a lot of interesting lessons summarized in papers and blog posts. A few lessons from the Alquist team that I particularly like. And this kind of echoes the work of IBM Watson on the Jeopardy challenge: one of the big ones is that machine learning is not an essential tool for effective conversation yet.
Speaker 1
29:18
So machine learning is useful for general chit-chat, for when you fail at deep, meaningful conversation, or for actually understanding the topic you're talking about: classifying intent, finding the entities, detecting the sentiment of the sentences. That's sort of an assistive tool. But the fundamentals of the conversation are the following.
Speaker 1
29:42
So first, you have to break it apart. A conversation, you can think of it as a long dance, and the way you have fun dancing is you break it up into a set of moves and turns and so on, and focus on those; sort of a live-in-the-moment kind of thing. So focus on small parts of the conversation, taken one at a time. Then, also, have a graph: conversation is also all about tangents.
Speaker 1
30:08
So have a graph of topics, and be ready to jump from one context to the other and back. If you look at some of the natural language conversations they publish, it's just all over the place in terms of topics. You jump back and forth, and that's the beauty, the humor, the wit, the fun of conversation. You jump around from topic to topic.
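That "jump to a tangent and come back" behavior can be sketched with nothing more than a stack of topics: each tangent pushes the current topic, and the bot pops its way back afterwards. A toy version (class and topic names are made up):

```python
# Toy topic manager: take a tangent, then return to where you left off.
class TopicManager:
    def __init__(self, topic):
        self.topic = topic
        self.pending = []                # topics to come back to later

    def tangent(self, new_topic):
        self.pending.append(self.topic)  # remember where we were
        self.topic = new_topic

    def back(self):
        if self.pending:
            self.topic = self.pending.pop()
        return self.topic

tm = TopicManager("travel:brazil")
tm.tangent("facts:population")     # user asks a factual question
print(tm.topic)                    # facts:population
print(tm.back())                   # travel:brazil ("anyway, I was saying...")
```

A real system would hang responses and follow-ups off each topic node in the graph, but the stack is the part that makes the return possible at all.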
Speaker 1
30:29
And opinions. One of the things natural language systems don't seem to have much of is opinions. If I've learned anything, one of the simplest ways to convey intelligence is to be very opinionated about something, and confident. That's a really interesting concept. In general, there are just a lot of lessons.
Speaker 1
30:51
Oh, and finally, of course: maximize entertainment, not information. This is true for autonomous vehicles, and it's true for natural language conversation: fun should be part of the objective function. Okay.
Speaker 1
31:03
Lots of lessons to learn there. This is really the Loebner Prize, the Turing test, of our generation. I'm excited to see if anybody's able to solve the Alexa Prize. Again, in the Alexa Prize you're tasked with talking to a bot, and the measure of quality is the same as in the Loebner Prize, just measuring how good that conversation was; but the task is also to try to continue the conversation for 20 minutes.
Speaker 1
31:30
If you try to talk to a bot today, when you have a choice to talk to a bot or go do something else, watch Netflix, you'll probably last less than 10 seconds; you'll be bored. The point is for the conversation to keep you engaged because you're enjoying it so much. And 20 minutes, that's a really nice benchmark for passing the spirit of what the Turing test stood for.
Speaker 1
31:58
Examples here from the Alexa Prize, from the Alquist bot: the difference between two kinds of conversations. So Alquist says, have you been in Brazil? The user says, what is the population of Brazil?
Speaker 1
32:10
Alquist says, it is about 20 million. User says, well, okay. This is what happens a lot, like I mentioned, with multi-domain conversation: once you jump to a new domain, you stay there. Once you switch context, you stay there.
Speaker 1
32:26
The reality is you want to jump back and continue jumping around, like in the second, more successful conversation: have you been in Brazil? What is the population of Brazil? It is around 20 million.
Speaker 1
32:38
Anyway, I was saying: have you been in Brazil? So, jumping back in context; that's how conversation goes, tangent to tangent and back. Quickly: there's been a lot of sequence-to-sequence work using natural language to summarize, with a lot of applications.
Speaker 1
32:57
One of them, from ICLR, that I wanted to highlight, from the Technion, and that I find particularly interesting, is abstract-syntax-tree-based summarization of code: modeling computer code, in this case sadly Java and C#, as trees, as syntax trees, and then operating on those trees to do the summarization in text. Here, an example of a basic power-of-two function, on the bottom right, in Java: the code-to-sequence summarization says "get power of two." That's an exciting possibility for automated documentation of source code.
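To give a tiny flavor of the tree idea, here is Python's built-in ast module used to pull a crude name-based summary out of a function. This is a rule-based toy stand-in: the actual work learns the summary from paths between AST leaves in Java/C#, and does not use rules like these.

```python
import ast

# Parse a small function into an abstract syntax tree, then emit a crude
# "summary" from its name and structure. A toy stand-in for learned
# AST-based summarization; the function itself is made up.
source = """
def get_pow_two(n):
    return 2 ** n
"""

tree = ast.parse(source)
fn = tree.body[0]                        # the FunctionDef node
words = fn.name.split("_")               # ['get', 'pow', 'two']
n_args = len(fn.args.args)

print(" ".join(words))                   # get pow two
print("args:", n_args)                   # args: 1
```

The interesting part of the learned approach is that it produces good summaries even when the function name is unhelpful, precisely because it reads the tree structure rather than just the identifiers.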
Speaker 1
33:41
I thought it was particularly interesting, and the future there is bright. Okay, my hopes for 2020 for natural language processing: that reasoning, common-sense reasoning, becomes a greater and greater part of the transformer-type language model work we've seen in the deep learning world. Extending the context from hundreds or thousands of words to tens of thousands of words; being able to read entire stories and maintain the context, which transformers, again with XLNet and Transformer-XL, are starting to be able to do, but we're still far away from that long-term, lifelong maintenance of context.
Speaker 1
34:19
Dialogue, open-domain dialogue, forever, from Alan Turing to today, the dream of artificial intelligence: being able to pass the Turing test. And then natural language model transformers are self-supervised learning. The dream of Yann LeCun is for these kinds of systems, what were previously called unsupervised but he's now calling self-supervised learning systems, to be able to watch YouTube videos and, from that, start to form representations based on which they can understand the world.
Speaker 1
34:58
So the hope for 2020 and beyond is to be able to transfer some of the success of transformers to the world of visual information, the world of video, for example. Deep RL and self-play. This has been, and continues to be, an exciting time for reinforcement learning in games and robotics. So first, Dota 2 and OpenAI: an exceptionally popular competitive e-sports game that people compete in and win millions of dollars with.
Speaker 1
35:36
So there are a lot of world-class professional players. In 2018, OpenAI Five, this is team play, tried their best at The International and lost, and said they were looking forward to pushing Five to the next level, which they did: in April 2019, they beat the 2018 world champions in 5-on-5 play. The key there was compute, 8 times more training compute; because the per-step compute was already maxed out, the way they achieved the 8x was in time, simply training for longer.
Speaker 1
36:15
So the current version of OpenAI Five, as Jacob will talk about next Friday, has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 real-time months. Again, behind a lot of these game systems is self-play: they play against each other. This is one of the most exciting concepts in deep learning: systems that learn by playing each other and incrementally improving over time.
Speaker 1
36:43
So they start out terrible and get better and better and better, and the agent is always being challenged by a slightly better opponent, thanks to the natural process of self-play. That's a fascinating process. The 2019 version — the last version of OpenAI Five — has a 99.9% win rate versus the 2018 version.
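The self-play curriculum — always training against a recent snapshot of yourself — can be sketched with a deliberately toy model, where an agent's "policy" is just a single skill scalar and matches are decided by a made-up outcome model. Everything here is invented for illustration; it is not OpenAI's setup, only the snapshot-opponent pattern.

```python
import random

# Toy self-play loop: the learner trains against a frozen snapshot of
# itself, and the snapshot is periodically refreshed with the improved
# agent -- so the opponent keeps pace and the curriculum stays challenging.
random.seed(0)

def play(skill_a, skill_b):
    """Return 1 if A wins; a crude logistic-ish toy outcome model."""
    edge = 0.5 + max(-0.4, min(0.4, (skill_a - skill_b) * 0.1))
    return 1 if random.random() < edge else 0

skill = 0.0          # current learner
snapshot = 0.0       # frozen past opponent
for step in range(1, 2001):
    result = play(skill, snapshot)
    # stand-in "update": wins reinforce strongly, losses still teach a bit
    skill += 0.01 if result else 0.005
    if step % 200 == 0:
        snapshot = skill          # refresh opponent with the improved agent

print(round(skill, 2))
```

Real systems keep a whole league of past opponents rather than one snapshot, precisely to avoid the learner overfitting to a single rival.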
Speaker 1
37:03
Okay. Then DeepMind, in parallel, has been using self-play to solve some of these multi-agent games, which is a really difficult space where agents have to collaborate as part of the competition. It's exceptionally difficult from a reinforcement learning perspective. So, from raw pixels, they solved the capture-the-flag game in Quake III Arena.
Speaker 1
37:30
One of the things I love — just as a side note about both OpenAI and DeepMind, and reinforcement learning research in general — is that there will always be one or two paragraphs of philosophy. In this case from DeepMind: billions of people inhabit the planet, each with their own individual goals and actions, but still capable of coming together through teams, organizations, and societies in impressive displays of collective intelligence. This is a setting we call multi-agent learning: many individual agents must act independently, yet learn to interact and cooperate with other agents.
Speaker 1
38:03
This is an immensely difficult problem, because with co-adapting agents, the world is constantly changing. The fact that we — 7 billion people on Earth, people in this room, in families, in villages — can collaborate while being, for the most part, self-interested agents is fascinating. One of my hopes for 2020 is to explore the social behaviors that emerge in reinforcement learning agents and how those are echoed in real human-to-human social systems. Okay, here are some visualizations.
Speaker 1
38:36
The agents automatically figure out the concepts, as you see in other games. Knowing nothing about the rules of the game, about its concepts, strategies, and behaviors, they're able to figure it out. There are t-SNE visualizations of the different states — the important states and concepts in the game that the system discovers — and so on. Skipping ahead: automatic discovery of different behaviors.
Speaker 1
38:59
This happens in all the different games we talk about, from Dota to StarCraft to Quake: the different strategies, which it knows nothing about in advance, it figures out automatically. And the really exciting work in multi-agent RL on the DeepMind side was beating world-class players and achieving Grandmaster level in StarCraft. In December 2018, AlphaStar beat MaNa, one of the world's strongest professional StarCraft players, but that was in a very constrained environment, and it was a single race — I think Protoss.
Speaker 1
39:38
And in October 2019, AlphaStar reached Grandmaster level by doing what we humans do: using a camera view to observe the game, and playing against other humans. So this is not an artificial setup — it goes through the exact same process a human would undertake — and it achieved Grandmaster, which is the highest level.
Speaker 1
39:58
Okay, great. I encourage you to look at their blog posts and videos showing the different strategies that the RL agents are able to figure out. Here's a quote from one of the professional StarCraft players — and we see this with AlphaZero too, in chess: AlphaStar is an intriguing, unorthodox player,
Speaker 1
40:19
one with the reflexes and speed of the best pros, but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that's unimaginably unusual. It really makes you question how much of StarCraft's diverse possibilities pro players have really explored. And that's the really exciting thing about reinforcement learning agents — in chess, in Go, in games, and hopefully in simulated systems in the future: they teach experts, who think they understand the dynamics of a particular game or simulation, new strategies and new behaviors to study.
Speaker 1
41:02
That's one of the exciting applications, almost from a psychology perspective, that I'd love to see reinforcement learning push towards. And on the imperfect-information game side: poker. Noam Brown and colleagues at CMU beat professional players heads-up at no-limit Texas Hold'em, and now at six-player no-limit Texas Hold'em as well. Many of the same results, many of the same approaches: self-play, iterated Monte Carlo methods, and a bunch of ideas around abstraction. There are so many possibilities under imperfect information that you have to form bins — abstractions — in both the action space, in order to reduce it, and the information space.
Speaker 1
41:54
So: the probabilities over all the different hands the opponents could possibly have, and all the hands the betting strategies could possibly represent. And you have to do this kind of coarse planning. They use self-play to generate a coarse blueprint strategy, which in real time they then adjust with Monte Carlo search as they play. And unlike the DeepMind and OpenAI approaches, very minimal compute is required.
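The core no-regret loop behind CFR-style poker solvers can be shown on rock-paper-scissors. This is a sketch of plain regret matching, not Pluribus itself: each player tracks per-action regrets and plays in proportion to positive regret, and the time-averaged strategy converges to the uniform mixed equilibrium — the "perfectly random" mixing Darren Elias describes humans failing to execute.

```python
import random

# Regret matching on rock-paper-scissors (sketch of the idea behind
# CFR-style solvers): play proportionally to accumulated positive regret;
# the AVERAGE strategy over time approaches the mixed-strategy equilibrium.
random.seed(0)
N = 3                                              # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]      # row player's payoff

def strategy_from(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / N] * N

regrets = [[0.0] * N for _ in range(2)]
strategy_sum = [[0.0] * N for _ in range(2)]
for _ in range(20000):
    strats = [strategy_from(r) for r in regrets]
    acts = [random.choices(range(N), weights=s)[0] for s in strats]
    for p in range(2):
        opp = acts[1 - p]
        def util(a, p=p, opp=opp):                 # my payoff if I'd played a
            return PAYOFF[a][opp] if p == 0 else -PAYOFF[opp][a]
        got = util(acts[p])
        for a in range(N):
            regrets[p][a] += util(a) - got         # counterfactual regret
            strategy_sum[p][a] += strats[p][a]

avg = [s / sum(strategy_sum[0]) for s in strategy_sum[0]]
print([round(x, 2) for x in avg])  # close to uniform
```

Real poker solvers run this over a huge abstracted game tree with sampling; the equilibrium logic is the same.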
Speaker 1
42:21
And they're able to beat world-class players. Again, I like getting quotes from the professional players after they've been beaten. So Chris Ferguson, famous World Series of Poker player, said: Pluribus — that's the name of the agent — is a very hard opponent to play against. It's really hard to pin him down on any kind of hand.
Speaker 1
42:45
He's also very good at making thin value bets on the river. He's very good at extracting value out of his good hands — making bets without scaring off the opponent. Darren Elias said: its major strength is its ability to use mixed strategies. That's the same thing that humans try to do.
Speaker 1
43:06
For humans it's a matter of execution — to do this in a perfectly random way, and to do so consistently, most people just can't. Then, in the robotics space, there have been a lot of applications of reinforcement learning. One of the most exciting is manipulation — manipulation sufficient to solve a Rubik's Cube. Again, this is learned through reinforcement learning.
Speaker 1
43:30
Again, because self-play in this context is not possible, they use automatic domain randomization, ADR: they generate progressively more difficult environments for the hand. There's a giraffe head there, you see — there are a lot of perturbations to the system, so they mess with it a lot, and a lot of noise is injected, to teach the hand to manipulate the cube in order to then solve it.
Speaker 1
43:52
The actual solution — figuring out how to go from a particular scrambled state to the solved cube — is a solved problem. This paper and this work are focused on the much more difficult problem of learning to manipulate the cube. It's really exciting. Again, a little philosophy, as you would expect from OpenAI: they have this idea of emergent meta-learning.
Speaker 1
44:19
The idea is that the capacity of the neural network that's learning this manipulation is constrained, while the ADR — the automatic domain randomization — is progressively making the environments harder and harder, so the capacity of the environment to be difficult is unconstrained. Because of that, there's an emergent self-optimization of the neural network to learn general concepts, as opposed to memorizing particular manipulations. My hope in the deep reinforcement learning space for 2020 is the continued application to robotics — legged robotics, but also robotic manipulation.
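A minimal ADR loop might look like the following. The parameter names, step sizes, and the stand-in `evaluate_policy` are all hypothetical — the real system evaluates actual policy rollouts and keeps performance buffers per boundary — but the shape of the algorithm is there: probe a boundary of each randomization range, and widen the range when the policy still succeeds at that edge.

```python
import random

# Sketch of automatic domain randomization: each environment parameter has
# a randomization range that is pushed outward whenever the policy handles
# the current boundary well, so training environments get steadily harder.
random.seed(0)
ranges = {"cube_mass": [0.09, 0.11], "friction": [0.9, 1.1]}  # initial, narrow
STEP = {"cube_mass": 0.01, "friction": 0.05}

def evaluate_policy(env):
    # Stand-in for real rollouts: success gets rarer the further the
    # environment is from nominal (mass 0.1, friction 1.0).
    difficulty = abs(env["cube_mass"] - 0.1) * 5 + abs(env["friction"] - 1.0)
    return random.random() > difficulty

for _ in range(200):
    name = random.choice(list(ranges))
    lo, hi = ranges[name]
    boundary = random.choice([lo, hi])               # probe one edge
    env = {k: random.uniform(*v) for k, v in ranges.items()}
    env[name] = boundary
    if evaluate_policy(env):                         # coped at the edge:
        if boundary == lo:                           # widen range outward
            ranges[name][0] = max(0.0, lo - STEP[name])
        else:
            ranges[name][1] = hi + STEP[name]

print(ranges)
```

The emergent-meta-learning observation is that the policy's capacity is fixed while this environment distribution keeps growing, which forces general strategies rather than memorized ones.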
Speaker 1
45:08
Human behavior: the use of multi-agent self-play, as I've mentioned, to explore naturally emerging social behaviors — constructing simulations of social behavior and seeing what kind of multi-human behavior emerges in a self-play context. I hope there will be something like a reinforcement-learning self-play psychology department one day, where you use reinforcement learning to reverse-engineer and study human behavior. And again, in games, I'm not sure what big challenges remain, but to me at least it's exciting to see learned solutions to games through self-play. Science of deep learning: I would say there have been a lot of really exciting developments here that deserve their own lecture.
Speaker 1
45:59
I'll mention just a few here. From MIT — really from 2018, but it sparked a lot of follow-on interest and work in 2019 — is the idea of the lottery ticket hypothesis. This work showed that small subnetworks within the larger network are the ones doing all the thinking; the same accuracy can be achieved by a small subnetwork from within a neural network. And they have a very simple process for arriving at such a subnetwork: randomly initialize a neural network —
Speaker 1
46:36
that's, I guess, the lottery ticket. Train the network until it converges. This is an iterative process: prune the fraction of the network with low-magnitude weights,
Speaker 1
46:45
reset the weights of the remaining network to the original initialization — the same lottery ticket — and then train the pruned, untrained network again, continuing this iteratively to arrive at a network that's much smaller, using the same original initialization. It's fascinating that within these big networks there's often a much smaller network that can achieve the same accuracy. Now, practically speaking, it's unclear what the big takeaways are, except the inspiring one: there exist architectures that are much more efficient, so there's value in investing time in finding such networks.
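The train / prune / rewind loop can be sketched on a toy linear model — not a deep network, just the procedure itself, with a sparse ground truth so the "winning ticket" weights are easy to see.

```python
import numpy as np

# Iterative magnitude pruning, lottery-ticket style, on a toy regression:
# train, prune the smallest-magnitude surviving weights, rewind survivors
# to their ORIGINAL initialization, and train again.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:4] = [3.0, -2.0, 1.5, -1.0]              # only 4 features matter
y = X @ true_w + 0.1 * rng.normal(size=200)

w_init = rng.normal(size=20)                     # the "lottery ticket" init
mask = np.ones(20)

def train(w, mask, steps=1000, lr=0.05):
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask                 # pruned weights stay frozen
    return w

w = w_init.copy()
for _ in range(3):                               # three prune/rewind rounds
    w = train(w, mask)
    alive = np.flatnonzero(mask)
    k = len(alive) // 3                          # prune ~1/3 of survivors
    prune = alive[np.argsort(np.abs(w[alive]))[:k]]
    mask[prune] = 0.0
    w = w_init.copy()                            # rewind to original init

w = train(w, mask)
mse = float(np.mean((X @ (w * mask) - y) ** 2))
print(int(mask.sum()), "weights survive; final mse", round(mse, 3))
```

The surviving subnetwork (7 of 20 weights here) still fits the data, because magnitude pruning keeps the informative weights — the analogue of the paper's claim that sparse subnetworks trained from the original initialization match full-network accuracy.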
Speaker 1
47:30
Then there are disentangled representations, which again deserve their own lecture. Here the example is a 10-dimensional vector representation, where the goal is for each element of the vector to learn one particular concept about a dataset. The dream of unsupervised learning is to learn compressed representations where everything is disentangled, so you can learn fundamental concepts about the underlying data that carry over from dataset to dataset to dataset.
Speaker 1
48:01
That's disentangled representation. There's theoretical work — the best-paper award at ICML 2019 — showing that this is impossible: disentangled representations cannot be learned without inductive biases. And so the suggestion there is that the biases you use should be made as explicit as possible.
Speaker 1
48:26
The open problem is finding good inductive biases for unsupervised model selection that work across the multiple datasets we're actually interested in. There are a lot more papers, but one of the exciting ones is the double descent idea, extended to the deep neural network context by OpenAI, exploring the phenomenon that as we increase the number of parameters in a neural network, the test error initially decreases, then increases, and then, just as the model becomes able to fit the training set, undergoes a second descent: decrease, increase, decrease. There's a critical moment when the training set is just barely fit perfectly.
Speaker 1
49:09
Okay, and OpenAI shows that this applies not just to model size, but also to training time and dataset size. It's still more of an open problem: why this happens, how to understand it, and how to leverage it in optimizing the training dynamics of neural networks. There are a lot of really interesting theoretical questions there. So my hope for the science of deep learning in 2020 is continued exploration of the fundamentals of model selection, training dynamics, the performance of training in terms of memory and speed, and representation characteristics with respect to architecture characteristics.
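Double descent can be reproduced in a few lines with minimum-norm least squares on random ReLU features — a standard toy setting, not the OpenAI experiments. Test error spikes near the interpolation threshold (number of features ≈ number of training samples), then descends again as the model keeps growing.

```python
import numpy as np

# Toy double-descent demo: random ReLU features + minimum-norm least
# squares. At p = n_train the fit interpolates the noise with a badly
# conditioned feature matrix, so test error peaks; larger p smooths it out.
rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 10
w_true = rng.normal(size=d)

def make(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

Xtr, ytr = make(n_train)
Xte, yte = make(n_test)
V = rng.normal(size=(d, 400))                    # fixed random projection

def feats(X, p):
    return np.maximum(X @ V[:, :p], 0.0)         # p random ReLU features

errors = {}
for p in [5, 20, 40, 100, 400]:                  # 40 = interpolation threshold
    F = feats(Xtr, p)
    w = np.linalg.pinv(F) @ ytr                  # minimum-norm solution
    errors[p] = float(np.mean((feats(Xte, p) @ w - yte) ** 2))
print({p: round(e, 2) for p, e in errors.items()})
```

The same decrease-increase-decrease shape is what the OpenAI paper observes in deep networks along model size, training time, and dataset size.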
Speaker 1
49:47
So, a lot of fundamental work there in understanding neural networks. There are two areas I have whole sections of papers on, which are super exciting. My first love is graphs, so graph neural networks are a really exciting area of deep learning.
Speaker 1
50:05
Graph convolutional neural networks as well — for solving combinatorial problems and for recommendation systems. Any problem that can fundamentally be modeled as a graph can be solved, or at least aided, by neural networks. There's a lot of exciting work there. And Bayesian deep learning, using Bayesian neural networks — that's been an exciting possibility for several years. It's very difficult to train large Bayesian networks, but in the contexts where you can, it's useful: on small datasets, providing uncertainty measurements alongside predictions is an extremely powerful capability of Bayesian neural networks, as is online, incremental learning.
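One graph-convolution propagation step in the style of the Kipf-Welling GCN rule, H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W), can be written out on a tiny hand-made graph. This is a sketch of the rule itself, not any particular library's API.

```python
import numpy as np

# One GCN propagation step on a 4-node graph: add self-loops, normalize
# the adjacency symmetrically, average neighbor features, then transform.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)        # adjacency matrix
A_hat = A + np.eye(4)                            # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
norm = D_inv_sqrt @ A_hat @ D_inv_sqrt           # symmetric normalization

H = rng.normal(size=(4, 8))                      # input node features
W = rng.normal(size=(8, 16)) * 0.1               # learnable layer weights
H_next = np.maximum(norm @ H @ W, 0.0)           # aggregate, transform, ReLU
print(H_next.shape)
```

Stacking a few such layers lets information flow several hops across the graph, which is what makes these networks useful for recommendation and combinatorial problems.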
Speaker 1
50:51
There's a lot of really good papers there. It's exciting. Okay, autonomous vehicles. Oh boy.
Speaker 1
50:58
Let me try to use as few sentences as possible to describe this section. It is one of the most exciting areas of application of AI and learning in the real world today. It is the place where artificial intelligence systems touch the most human beings who know nothing about artificial intelligence. Hundreds of thousands, soon millions, of cars — robots, really — will be interacting with human beings.
Speaker 1
51:30
So this is a really exciting area and a really difficult problem. And there are two approaches. One is Level 2, where the human is fundamentally responsible for supervising the AI system. And Level 4, where — at least in the dream — the AI system is responsible for its actions and the human does not need to be a supervisor.
Speaker 1
51:49
Okay, two companies represent these approaches and are sort of leading the way. Waymo: in October 2018, 10 million miles on road; this year they've reached 20 million miles on road and 10 billion miles in simulation. I got a chance to visit them out in Arizona. They're doing a lot of really exciting work, and they're obsessed with testing.
Speaker 1
52:15
So the kind of testing they're doing is incredible: 20,000 classes of structured tests, putting the system through all kinds of scenarios that the engineers can think of and that appear in the real world. And they've initiated testing on road with real consumers without a safety driver. If you don't know what that means: the car is truly responsible.
Speaker 1
52:40
There's no human to catch it. On the other side, the exciting thing is that there are 700,000 to 800,000 Tesla Autopilot systems. These are human-supervised systems, using a multi-headed, multitask neural network to perceive, predict, and act in this world.
Speaker 1
53:09
So that's a really exciting large-scale, real-world deployment of neural networks. It's a fundamentally deep-learning-based system — unlike Waymo, where deep learning is the icing on the cake. For Tesla, deep learning is the cake: it's at the core of the perception and the action that the system performs.
Speaker 1
53:32
They have to date done an estimated over 2 billion miles, and that continues to grow quickly. I'll briefly mention what I think is a super exciting idea in all real-world applications of machine learning: online, iterative, active learning. Andrej Karpathy, who's the head of Autopilot, calls this the data engine. It's the iterative process of having a neural network perform the task, discovering the edge cases, searching for other similar edge cases, annotating them, retraining the network, and continuously repeating this loop.
Speaker 1
54:09
This is what every single company that's using machine learning seriously is doing. There are very few publications in this space of iterative learning, but this is the fundamental problem of machine learning: it's not to create a brilliant neural network, it's to create a dumb neural network that continuously learns and improves until it's brilliant.
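The data-engine loop can be sketched as follows. Every name here is hypothetical — the "model" is a one-dimensional threshold classifier standing in for a real network, and the labeling rule stands in for human annotators — but the loop structure is the point: deploy, mine low-confidence edge cases from the stream, annotate, retrain, repeat.

```python
import random

# Sketch of an active-learning "data engine" loop with toy stand-ins for
# the model, the data stream, and the human annotators.
random.seed(0)
labeled = [(x, int(x > 0.6)) for x in (random.random() for _ in range(50))]

def train(data):
    # Stand-in "model": threshold halfway between the class means;
    # confidence = distance from the decision boundary.
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: (int(x > thr), abs(x - thr))  # (prediction, confidence)

def mine_edge_cases(model, stream, k=10):
    return sorted(stream, key=lambda x: model(x)[1])[:k]  # least confident

model = train(labeled)
for cycle in range(3):
    stream = [random.random() for _ in range(1000)]     # unlabeled fleet data
    hard = mine_edge_cases(model, stream)               # near-boundary cases
    annotated = [(x, int(x > 0.6)) for x in hard]       # "human" labels
    labeled += annotated
    model = train(labeled)                              # retrain on grown set

print(len(labeled))  # grew by 10 mined edge cases per cycle
```

The key design choice is that annotation effort goes only where the model is uncertain, which is what makes the loop improve faster than labeling random samples.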
Speaker 1
54:28
And that process is especially interesting when you take it outside of single-task learning. Most papers are written on single-task learning: you take some benchmark — in the case of driving, object detection, landmark detection, drivable-area segmentation, trajectory generation — all of those have benchmarks, and you can have separate neural networks for them.
Speaker 1
54:50
That's single-task learning. But combining them — using a single neural network that performs all those tasks together — that's the fascinating challenge, where you're reusing parts of the neural network to learn things that are coupled, and other things that are completely independent, while running the continuous active-learning loop. Inside companies — Tesla and Waymo, and in general — it's exciting that there are actual human beings responsible for these particular tasks.
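A multi-headed, shared-backbone network can be sketched with purely illustrative shapes — this is not Tesla's actual architecture, just the structural idea: one backbone computes a shared representation once, and cheap per-task heads branch off it.

```python
import numpy as np

# Multi-headed network sketch: shared backbone, one lightweight head per
# task. The backbone's representation is reused across coupled tasks.
rng = np.random.default_rng(0)

def layer(n_in, n_out):
    return rng.normal(size=(n_in, n_out)) * (1.0 / np.sqrt(n_in))

backbone = [layer(128, 64), layer(64, 32)]       # shared representation
heads = {                                        # hypothetical task heads
    "object_detection": layer(32, 10),           # e.g. 10 class scores
    "lane_detection": layer(32, 4),              # e.g. 4 lane parameters
    "sign_classification": layer(32, 20),
}

def forward(x):
    h = x
    for W in backbone:                           # shared compute, run once
        h = np.maximum(h @ W, 0.0)
    return {name: h @ W for name, W in heads.items()}

out = forward(rng.normal(size=(1, 128)))
print({k: v.shape for k, v in out.items()})
```

Training alternates or weights per-head losses, which is exactly where the per-task human experts described above come in: each owns a head's loss, data, and edge cases.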
Speaker 1
55:18
They become experts of particular perception tasks, experts of particular planning tasks and so on. And so the job of that expert is both to train the neural network and to discover the edge cases which maximize the improvement of the network. That's where the human expertise comes in a lot. Okay.
Speaker 1
55:35
And there's a lot of debate — it's an open question — about which kind of approach will be successful. One is a fundamentally learning-based approach, as with the Level 2 Tesla Autopilot system, which learns all the different tasks involved in driving; as it gets better and better, less and less human supervision is required.
Speaker 1
55:59
The pro of that approach: camera-based systems have the highest-resolution information, so it's very amenable to learning. But the con is that it requires a huge amount of data, and nobody knows how much data yet. The other con is human psychology — driver behavior: the human must continue to remain vigilant.
Speaker 1
56:21
The Level 4 approach, besides cameras and radar and so on, also leverages LiDAR and maps. The pro is that it's a much more consistent, reliable, explainable system: the accuracy of detection, depth estimation, and recognition of different objects is much higher, with less data. The con is that it's expensive, at least for now.
Speaker 1
56:49
It's also less amenable to learning methods, because there's much less data and it's lower-resolution, and it requires, at least for now, some fallback — whether that's a safety driver or teleoperation. The open question for the deep-learning, Level 2, Tesla Autopilot approach is: how hard is driving? This is actually the open question for most disciplines in artificial intelligence. How difficult is driving?
Speaker 1
57:16
How many edge cases does driving have? Can we learn to generalize over those edge cases without solving the common-sense reasoning problem — without solving the human-level artificial intelligence problem? And that means perception.
Speaker 1
57:30
How hard is perception — detection, intention modeling, modeling the mental states of humans, trajectory prediction? Then the action side: the game-theoretic problem of balancing, as I mentioned, fun and enjoyability with the safety of these systems, because they are life-critical. And human supervision, the vigilance side: how good can Autopilot get before vigilance decrements significantly — before people fall asleep, become distracted, start watching movies, and so on — the things people naturally do?
Speaker 1
58:02
The open question is how good Autopilot can get before that becomes a serious problem, and whether that decrement nullifies the safety benefit of using Autopilot — because the AI system, when the sensors are working well, is perfectly vigilant. The AI is always paying attention. Then there are the open questions for the LiDAR-based, Level 4, Waymo approach.
Speaker 1
58:31
When we have maps, LiDAR, and geo-fenced routes, how difficult is driving? The traditional robotics approach — from the DARPA Challenge to most autonomous vehicle companies today — is to use HD maps and LiDAR, together with GPS, for really accurate localization. Then the perception problem becomes the icing on the cake, because you already have a really good sense of where you are and where the obstacles in the scene are. Perception is then not a safety-critical task, but a task of interpreting the environment further.
Speaker 1
59:05
So it's by its nature already safer — but how difficult, nevertheless, is that problem? If perception is the hard problem, then the LiDAR-based approaches are well positioned. If action is the hard problem, then both Tesla and Waymo have to solve it, and the sensors don't matter there: the difficult problem is the planning, the game theory, the modeling of the mental models and intentions of other human beings — the pedestrians and the cyclists.
Speaker 1
59:42
And then, on the other side, the 10 billion miles of simulation. The open problem, from reinforcement learning and deep learning in general, is how much we can learn from simulation, and how much of that knowledge can transfer to real-world systems. My hope in the autonomous vehicle space, the AI-assisted driving space —