1 hour 27 minutes 41 seconds
Speaker 1
00:00
Welcome to 2020, and welcome to the Deep Learning Lecture Series. Let's start it off today with a quick whirlwind tour of all the exciting things that happened in 2017, 2018, and especially 2019, and the amazing things we're going to see this year in 2020. Also as part of this series, there are going to be a few talks from some of the top people in learning and artificial intelligence. After today, of course. Let's start with the broad view: the celebrations, from the Turing Award to the limitations and the debates, and the exciting growth first.
Speaker 1
00:44
And first, of course, a step back to the quote I've used before. I love it, I'll keep reusing it. AI began not with Alan Turing or McCarthy, but with the ancient wish to forge the gods; a quote from Pamela McCorduck in Machines Who Think.
Speaker 1
01:00
That visualization there shows just 3% of the neurons in our brain, in the thalamocortical system. That magical thing between our ears that allows us all to see and hear and think and reason and hope and dream and fear our eventual mortality. All of that is the thing we wish to understand. That's the dream of artificial intelligence: to understand it, and to recreate versions of it, echoes of it, in the engineering of our intelligent systems.
Speaker 1
01:36
That's the dream. We should never forget in the details I'll talk, the exciting stuff I'll talk about today. That's sort of the reason why this is exciting. This mystery that's our mind.
Speaker 1
01:49
The modern human brain, the modern human as we know and love them today, emerged just about 300,000 years ago. And the Industrial Revolution is about 300 years ago. That's 0.1% of the time since the early modern human, and that's when we've seen a lot of the machinery. The machine was born not in stories but in actuality.
Speaker 1
02:19
The machine was engineered since the Industrial Revolution: the steam engine, the mechanized factory system, the machining tools. That's just 0.1% of the history, that's the 300 years. Now we zoom in to the 60, 70 years since the founder, the father arguably of artificial intelligence, Alan Turing, and the dreams.
Speaker 1
02:39
You know, there's always been a dance in artificial intelligence between the dreams, the mathematical foundations, and when the dreams meet the engineering, the practice, the reality. So Alan Turing said many times that by the year 2000 he was sure the Turing test, in natural language, would be passed. He said: it seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. They would be able to converse with each other to sharpen their wits.
Speaker 1
03:13
At some stage, therefore, we should have to expect the machines to take control. A little shout-out to self-play there. So that's the dream, from both the father of the mathematical foundations of artificial intelligence and the father of its dreams. And that dream, even in the early days, was becoming reality.
Speaker 1
03:34
The practice started with the perceptron, often thought of as a single-layer neural network, but what's not as well known is that Frank Rosenblatt was also the developer of the multi-layer perceptron. And that history, zooming through, has amazed our civilization. To me, one of the most inspiring threads is in the world of games. First with the great Garry Kasparov losing to IBM Deep Blue in 1997.
Speaker 1
04:03
Then Lee Sedol losing to AlphaGo in 2016. Seminal moments. And captivating the world through the engineering of actual real-world systems. Robots on four wheels, as we'll talk about today, from Waymo to Tesla to all of the autonomous vehicle companies working in the space. Robots on two legs, captivating the world with what kind of actuation, what kind of manipulation can be achieved.
Speaker 1
04:32
The history of deep learning: from 1943, the initial models from neuroscience, thinking about how to model neural networks mathematically; to the creation, as I said, of the single-layer and the multi-layer perceptron by Frank Rosenblatt in '57 and '62; to the ideas of backpropagation and recurrent neural nets in the 70s and 80s; to convolutional neural networks and LSTMs and bidirectional RNNs in the 80s and 90s; to the birth of the deep learning term and the new wave, the revolution, in 2006; to ImageNet and AlexNet, the seminal moment that captivated the imagination of the AI community with the possibility of what neural networks can do in the image space, with the natural language space closely following in the years after; to the development and popularization of GANs, generative adversarial networks; to AlphaGo and AlphaZero in 2016 and 2017. And, as we'll talk about, the language models, the transformers, in 2017, 2018, and 2019: those last few years have been dominated by the ideas of deep learning in the space of natural language processing. Okay, celebrations.
Speaker 1
05:49
This year, the Turing Award was given for deep learning. This is like deep learning has grown up; we can finally start giving awards. Yann LeCun, Geoffrey Hinton, and Yoshua Bengio received the Turing Award for their conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing.
Speaker 1
06:09
I would also like to add that perhaps it was also for the popularization in the face of skepticism. Those of you who are a little bit older have known the skepticism that neural networks received throughout the 90s. In the face of that skepticism, continuing to push, believe, and work in this field, and popularizing it, I think is part of the reason these three folks received the award. But of course, the community that contributed to deep learning is much bigger than those three. Many of them might be here today at MIT, and broadly in academia and industry.
Speaker 1
06:48
Looking at the early key figures: Walter Pitts and Warren McCulloch, as I mentioned, for the computational models of neural nets; these ideas that the kind of biological neural networks we have in our brain can be modeled mathematically. And then the engineering of those models into actual physical and conceptual mathematical systems by Frank Rosenblatt: the single layer in '57, the multi-layer in 1962. You could say Frank Rosenblatt is the father of deep learning; the first person to really, in '62, mention the idea of multiple hidden layers in neural networks.
Speaker 1
07:28
As far as I know, and somebody please correct me. But in 1965, a shout-out to the Soviet Union and Ukraine: the person who is considered to be the father of deep learning, Alexey Ivakhnenko, with V.G. Lapa as co-author of that work, created the first learning algorithms for multi-layer perceptrons, with multiple hidden layers. Then the work on backpropagation, automatic differentiation, in 1970.
Speaker 1
07:58
In 1979, convolutional neural networks were first introduced, and John Hopfield looked at recurrent neural networks, what are now called Hopfield networks, a special kind of recurrent neural network. Okay, that's the early birth of deep learning. I want to mention this because it's become a contentious space now that we can celebrate the incredible accomplishments of deep learning. Much like in reinforcement learning, in academia, credit assignment is a big problem.
Speaker 1
08:26
And the embodiment of that, almost to the point of meme, is the great Juergen Schmidhuber. For people who are interested in the contributions of the different people in the deep learning field, I encourage you to read his overview of deep learning in neural networks. It's an overview of all the various people who have contributed besides Yann LeCun, Geoffrey Hinton, and Yoshua Bengio. It's a big, beautiful community.
Speaker 1
08:54
So full of great ideas and full of great people. My hope for this community, given the tension some of you might have seen around this kind of credit assignment problem, is that we have more, not on this slide, but love, there can never be enough love in the world, but also general respect, open-mindedness, collaboration, and credit sharing in the community. Less derision, jealousy, stubbornness, and academic silos, within institutions and within disciplines. Also, 2019 was the first time it became cool to highlight the limits of deep learning.
Speaker 1
09:38
This is an interesting moment in time. Several books and several papers have come out in the past couple of years highlighting that deep learning is not able to do the broad spectrum of tasks that we can imagine an artificial intelligence system being able to do, like common-sense reasoning, like building knowledge bases, and so on. Rodney Brooks said that by 2020, the popular press starts having stories that the era of deep learning is over. And certainly there have been echoes of that through the press, through the Twitter-sphere, and all that kind of world.
Speaker 1
10:19
And I'd like to say that a little skepticism, a little criticism, is always really good for the community, but not too much. It's like a little spice in the soup of progress. Aside from that kind of skepticism, CVPR, ICLR, NeurIPS, all these conferences, their paper submissions have grown year over year. There's been a lot of exciting research, some of which I'd like to cover today.
Speaker 1
10:52
My hope in this space of deep learning growth, celebrations, and limitations for 2020 is that there's both less hype and less anti-hype. Fewer tweets on how there's too much hype in AI, and more solid research. Less criticism and more doing. But again, a little criticism, a little spice, is always good for the recipe. Hybrid research: less contentious, counterproductive debates, and more open-minded interdisciplinary collaboration across neuroscience, cognitive science, computer science, robotics, mathematics, physics, all of these disciplines working together.
Speaker 1
11:38
And the research topics I would love to see more contributions to, as we'll briefly talk about in some domains: reasoning, common-sense reasoning, integrating that into the learning architectures; active learning and lifelong learning; multimodal, multitask learning; open-domain conversation, so expanding the success of natural language to dialogue, to open-domain dialogue and conversation; and then applications. The two most exciting, one of which we'll talk about, are medical and autonomous vehicles. Then algorithmic ethics in all of its forms: fairness, privacy, bias.
Speaker 1
12:15
There's been a lot of exciting research there; I hope that continues: taking responsibility for the flaws in our data and the flaws in our human ethics. And then robotics. In terms of deep learning applied to robotics, I'd love to see continued development in deep reinforcement learning applied to robotics and robot manipulation.
Speaker 1
12:36
By the way, there might be a little bit of time for questions at the end. If you have a really pressing question, you can ask it along the way too. Questions so far? Thank God.
Speaker 1
12:48
Okay. So first, the practical: the deep learning and deep RL frameworks. This has really been a year where the frameworks have matured and converged towards two popular deep learning frameworks that people use: TensorFlow and PyTorch.
Speaker 1
13:08
So TensorFlow 2.0 and PyTorch 1.3 are the most recent versions. And they've converged towards each other, taking the best features and removing the weaknesses from each other. So that competition has been really fruitful, in some sense, for the development of the community. On the TensorFlow side, eager execution, so imperative programming, the way you would normally program in Python, has been fully integrated, made easy to use, and become the default.
Speaker 1
13:38
On the PyTorch side, TorchScript now allows for a graph representation. So what used to be the default mode of operation in TensorFlow, you can now do in PyTorch: have an intermediate representation that's in graph form. On the TensorFlow side, the deep Keras integration and its promotion as the primary citizen, the default way you interact with TensorFlow, allows complete beginners, anybody outside of machine learning, to use TensorFlow with just a few lines of code to train and do inference with a model. That's really exciting.
Speaker 1
14:20
They cleaned up the API, the documentation, and so on. Of course, also maturing TensorFlow.js, the in-browser implementation of TensorFlow; TensorFlow Lite, being able to run TensorFlow on phones, on mobile; and serving, which apparently industry cares a lot about, being able to efficiently serve models in the cloud. PyTorch is catching up with TPU support and experimental versions of PyTorch Mobile, so being able to run on a smartphone, on their side.
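Since the eager-versus-graph distinction comes up for both frameworks above, here is a toy, library-free sketch of the difference. The Node class and names are invented for illustration; this is not TensorFlow or PyTorch API.

```python
import operator

# Graph mode: build a deferred computation now, execute it later.
# Toy sketch of the idea only, not TensorFlow/PyTorch API.
class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self, feed):
        # Recursively evaluate child nodes; strings are placeholder names
        # looked up in the feed dict, everything else is a constant.
        vals = [i.run(feed) if isinstance(i, Node) else feed.get(i, i)
                for i in self.inputs]
        return self.op(*vals)

# Build the expression (x * 2) + 3 without executing anything yet.
graph = Node(operator.add, Node(operator.mul, "x", 2), 3)
print(graph.run({"x": 5}))   # executes only now: 13

# Eager mode: the same computation runs immediately, line by line.
x = 5
print(x * 2 + 3)             # 13
```

The graph form is what makes whole-program optimization and deployment easier; the eager form is what makes debugging feel like ordinary Python. TorchScript and TensorFlow 2.0 each try to offer both.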
Speaker 1
14:51
It's a tense, exciting competition. And I almost forgot to mention: we have to say goodbye to our favorite Python 2. This is the year, on January 1st, 2020, that support for Python 2, and TensorFlow and PyTorch support for Python 2, finally ended. So goodbye, print statement; goodbye, cruel world. Okay.
Speaker 1
15:15
On the reinforcement learning front, we're kind of in the same space as JavaScript libraries: there are no clear winners coming out. If you're a beginner in the space, the one I recommend is a fork of OpenAI Baselines called Stable Baselines. But there are a lot of exciting ones.
Speaker 1
15:32
Some of them are built closely on TensorFlow; some are built on PyTorch. Of course, from Google, from Facebook, from DeepMind: Dopamine, TF-Agents, and others. Most of these I've used.
Speaker 1
15:48
If you have specific questions, I can answer them. So Stable Baselines is the OpenAI Baselines fork, like I said. It implements a lot of the basic deep RL algorithms: PPO, A2C, everything. Good documentation, and it allows a very simple, minimal, few-lines-of-code application of the basic algorithms to the OpenAI Gym environments. That's the one I recommend.
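Stable Baselines hides the algorithm behind a couple of calls (roughly: construct a model on a Gym environment, call learn, then predict). As a library-free stand-in for that same train-then-act workflow, here is tabular Q-learning on a made-up five-state corridor; everything below is illustrative and none of it is the Stable Baselines API.

```python
import random

random.seed(0)

# Made-up 5-state corridor: start at state 0, reward 1.0 for reaching state 4.
# Actions: 0 = left, 1 = right. step() mimics the Gym convention of
# returning (next_state, reward, done).
def step(state, action):
    nxt = max(0, min(4, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

Q = [[0.0, 0.0] for _ in range(5)]        # Q-table: one row per state
alpha, gamma, eps = 0.5, 0.9, 0.3

for _ in range(500):                      # the "learn" stage
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2, r, done = step(s, a)
        # Standard Q-learning update.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The "predict" stage: the greedy policy should move right in every
# non-terminal state.
policy = [q.index(max(q)) for q in Q[:4]]
print(policy)
```

The library version replaces the Q-table with a neural network and the corridor with a real Gym environment, but the loop structure is the same.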
Speaker 1
16:13
Okay. For the framework world, my hope for 2020 is framework-agnostic research. So one of the things I mentioned is that PyTorch has almost overtaken TensorFlow in popularity in the research world. What I'd love to see is being able to develop an architecture in TensorFlow, or develop it in PyTorch, which you currently can.
Speaker 1
16:36
Then, once you train the model, to be able to easily transfer it to the other: from PyTorch to TensorFlow, from TensorFlow to PyTorch. Currently, it takes three, four, five hours, if you know what you're doing in both frameworks, to do that. It'd be nice if there was a very easy way to do that transfer. Then the maturing of the deep RL frameworks: I'd love to see OpenAI step up, DeepMind step up, and really take some of these frameworks to a maturity that we can all agree on, much like OpenAI Gym has done for the environment world.
Speaker 1
17:06
And continued work on what Keras, and many other wrappers around TensorFlow, started: greater and greater abstractions, allowing machine learning to be used by people outside of the machine learning field. I think the powerful thing about basic vanilla supervised learning is that people in biology and chemistry and neuroscience and physics and astronomy can deal with the huge amounts of data they're working with, without needing to learn many of the details of even Python. So I would love to see greater and greater abstractions which empower scientists outside the field. Okay, natural language processing.
Speaker 1
17:56
2017 and 2018 were when the transformer was developed, and its power was demonstrated especially by BERT, achieving state-of-the-art results on a lot of language benchmarks, from sentence classification to tagging, question answering, and so on. Hundreds of datasets and benchmarks emerged, most of which BERT dominated in 2018. 2019 was sort of the year the transformer really exploded in terms of all the different variations. Again, starting from BERT: XLNet, and it's very cool to use BERT in the name of your new derivative of a transformer.
Speaker 1
18:54
RoBERTa; DistilBERT from Hugging Face; models from Salesforce; GPT-2 from OpenAI, of course; ALBERT; and Megatron from NVIDIA, a huge transformer. A few tools have emerged. So one is Hugging Face, a company and also a repository that has implemented, in both PyTorch and TensorFlow, a lot of these transformer-based natural language models.
Speaker 1
19:19
So that's really exciting; most people here can just use them easily, since those are already pre-trained models. The other exciting thing is that Sebastian Ruder, a great researcher in the field of natural language processing, has put together NLP-progress, which tracks all the different benchmarks for all the different natural language tasks: leaderboards of who's winning where.
Speaker 1
19:42
Okay, I'll mention a few models that stand out from the work this year. Megatron-LM from NVIDIA is basically taking, I believe, the GPT-2 transformer model and putting it on steroids: 8.3 billion versus 1.5 billion parameters. And there's a lot of interesting stuff there, as you would expect from NVIDIA.
Speaker 1
20:07
Of course there's always brilliant research, but also interesting aspects of how to train in a parallel way: model and data parallelism in the training. The first breakthrough result in terms of performance, the model that replaced BERT as king of transformers, is XLNet from CMU and Google Research. They combine the bidirectionality from BERT with the recurrence aspect of Transformer-XL, the relative positional embeddings and the recurrence mechanism of Transformer-XL, to achieve state-of-the-art performance on 20 tasks.
Speaker 1
20:50
ALBERT is a recent addition from Google Research, and it significantly reduces the number of parameters versus BERT by doing parameter sharing across the layers. It has achieved state-of-the-art results on 12 NLP tasks, including the difficult Stanford question answering benchmark, SQuAD 2.0. And they provide an open-source TensorFlow implementation, including a number of ready-to-use pre-trained language models. Okay, another thing for people who are completely new to this field: a bunch of apps popped up, Write With Transformer from Hugging Face being one of them, that allow you to explore the capabilities of these language models.
Speaker 1
21:35
And I think they're quite fascinating from a philosophical point of view. This has actually been at the core of a lot of the tension: how much do these transformers actually understand? They're basically memorizing the statistics of the language, in a self-supervised way, by reading a lot of text. Is that really understanding? A lot of people say no, until it impresses us, and then everybody will say it's obvious.
Speaker 1
22:02
But Write With Transformer is a really powerful way to generate text and reveal how much these models really learn. Before this, yesterday actually, I came up with a bunch of prompts. So on the left is a prompt you give it. The meaning of life, here, for example: "is not what I think it is. It's what I do to make it."
Speaker 1
22:22
And you can do a lot of prompts of this nature. It's very profound. And some of them will be just absurd: they'll make sense statistically, but they'll be absurd, and reveal that the model really doesn't understand the fundamentals of the prompt it's being provided.
Speaker 1
22:39
But at the same time, it's incredible what kind of text it's able to generate. "The limits of deep learning, we're just having fun with this at this point, are still in the process of being figured out." Very true.
Speaker 1
22:55
Had to type this one: "The most important person in the history of deep learning is probably Andrew Ng." I have to agree. So this model knows what it's doing.
Speaker 1
23:07
And I tried to get it to say something nice about me, and that took a lot of attempts. So this is kind of funny. Finally it did one.
Speaker 1
23:17
It said, "Lex Fridman's best quality is that he's smart." I said, finally. But, and nothing ever happens without a "but": "but I think he gets more attention than every Twitter comment ever." That's very true. Okay.
Speaker 1
23:37
A nice way to reveal through this that the models are not able to do any kind of understanding of language is to do prompts that require understanding of concepts and being able to reason with those concepts: common-sense reasoning. A trivial one is arithmetic. Two plus two is: a 3, a 5, a 6, a 7, "the result is a simple equation, 4." And two plus three is: like, it got it right, and then it changes its mind. Okay, two minus two is 7, and so on. You can reveal any kind of reasoning failure.
Speaker 1
24:10
You can do blocks; you can ask it about gravity, all those kinds of things. It shows that it doesn't understand the fundamentals of the concepts being reasoned about. And I'll mention work that takes it beyond, towards that reasoning world, in the next few slides.
Speaker 1
24:28
But I should also mention, with this GPT-2 model, if you remember, about a year ago there was a lot of thinking about this 1.5 billion parameter model from OpenAI. The thought was it might be so powerful that it would be dangerous, in this case used, probably by Russians, for fake news and misinformation. And so the thinking from OpenAI was: when you have an AI system that you're about to release that might turn out to be dangerous, how do you release it?
Speaker 1
25:04
And I think, while it turned out that the GPT-2 model is not quite so dangerous, that humans are in fact more dangerous than AI currently, that thought experiment is very interesting. They released a report on release strategies and the social impacts of language models that didn't get as much attention as I think it should have. And it was a little bit disappointing to me how little people worried about this kind of situation. There was more of an eye-roll about, oh, these language models aren't as smart as we thought they might be.
Speaker 1
25:42
But the reality is, once they are, it's a very interesting thought experiment: how should the process go of companies and experts communicating with each other during such a release? This report thinks through some of those details. My takeaway, from reading the report and from this whole year of that event, is that conversations on this topic are difficult, because we as the public seem to penalize anybody trying to have that conversation. And the model of sharing privately and confidentially between machine learning organizations and experts is not there.
Speaker 1
26:19
There's no incentive or model or history or culture of sharing. Okay. The best paper from ACL, the main conference for language, was on a difficult task. So we talked about language models; now there's the task, taking it a step further, of dialogue: multi-domain, task-oriented dialogue.
Speaker 1
26:45
That's sort of the next challenge for dialogue systems. And they had a few ideas on how to perform dialogue state tracking across domains, achieving state-of-the-art performance on MultiWOZ, which is a challenging, very difficult five-domain, human-to-human dialogue dataset. There are a few ideas there. I should probably hurry up and start skipping stuff.
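At its simplest, dialogue state tracking means accumulating (domain, slot, value) triples as the conversation moves around, so that returning to an earlier domain finds its slots intact. A minimal dict-based sketch, with domains and slots invented for illustration (nothing like the paper's actual model):

```python
# Minimal multi-domain dialogue state tracker: each user turn writes a
# (domain, slot, value) triple; state persists across domain switches.
# Domain and slot names are invented for illustration.
state = {}

def update(state, domain, slot, value):
    state.setdefault(domain, {})[slot] = value
    return state

# The user books a hotel, takes a restaurant tangent, then returns to the hotel.
update(state, "hotel", "area", "centre")
update(state, "restaurant", "food", "italian")
update(state, "hotel", "nights", 3)       # hotel slots from earlier are kept

print(state)
```

The hard part the paper tackles is doing this extraction from raw, messy human utterances across domains, rather than from clean function calls like these.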
Speaker 1
27:13
On common-sense reasoning, which is really interesting: one of the open questions for the deep learning community, the AI community in general, is how we can have hybrid systems, whether of symbolic AI and deep learning, or generally of common-sense reasoning with learning systems. And there have been a few papers in this space. One of my favorites, from Salesforce, builds a dataset where we can start to do question answering while figuring out the concepts being explored in the question and the answer. Here, the question: while eating a hamburger with friends, what are people trying to do?
Speaker 1
27:51
Multiple choice: have fun, tasty, indigestion. The idea that needs to be generated there, and that's where the language model would come in, is that usually a hamburger with friends indicates a good time. So you basically take the question, generate the common-sense concept, and from that determine, in the multiple choice, what's happening, what's the state of affairs in this particular question. Okay, the Alexa Prize, again, hasn't received nearly as much attention as I think it should have, perhaps because there haven't been major breakthroughs. But it's open-domain conversation that all of us, anybody who owns an Alexa, can participate in as a provider of data.
Speaker 1
28:49
And there's been a lot of amazing work from universities across the world on the Alexa Prize in the last couple of years, and a lot of interesting lessons summarized in papers and blog posts. A few lessons from the Alquist team that I particularly like. And this kind of echoes the work of IBM Watson on the Jeopardy challenge: one of the big ones is that machine learning is not an essential tool for effective conversation yet.
Speaker 1
29:18
So machine learning is useful for general chit-chat, for when you fail at deep, meaningful conversation, or for actually understanding the topic you're talking about: classifying intent, finding the entities, detecting the sentiment of the sentences. That's sort of an assistive tool. But the fundamentals of the conversation are the following.
Speaker 1
29:42
So first, you have to break it apart. A conversation, you can think of it as a long dance, and the way you have fun dancing is you break it up into a set of moves and turns and so on, and focus on those; sort of a live-in-the-moment kind of thing. So focus on small parts of the conversation, taken one at a time. Then, also, have a graph: conversation is also all about tangents.
Speaker 1
30:08
So have a graph of topics, and be ready to jump from one context to the other and back. If you look at some of the natural language conversations they publish, it's just all over the place in terms of topics. You jump back and forth, and that's the beauty, the humor, the wit, the fun of conversation. You jump around from topic to topic.
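That "jump to a tangent and come back" behavior can be sketched with nothing more than a stack of topics: each tangent pushes the current topic, and the bot pops its way back afterwards. A toy version (class and topic names are made up):

```python
# Toy topic manager: take a tangent, then return to where you left off.
class TopicManager:
    def __init__(self, topic):
        self.topic = topic
        self.pending = []                # topics to come back to later

    def tangent(self, new_topic):
        self.pending.append(self.topic)  # remember where we were
        self.topic = new_topic

    def back(self):
        if self.pending:
            self.topic = self.pending.pop()
        return self.topic

tm = TopicManager("travel:brazil")
tm.tangent("facts:population")     # user asks a factual question
print(tm.topic)                    # facts:population
print(tm.back())                   # travel:brazil ("anyway, I was saying...")
```

A real system would hang responses and follow-ups off each topic node in the graph, but the stack is the part that makes the return possible at all.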
Speaker 1
30:29
And opinions. One of the things natural language systems don't seem to have much of is opinions. If I've learned anything, one of the simplest ways to convey intelligence is to be very opinionated about something, and confident. That's a really interesting concept. In general, there are just a lot of lessons.
Speaker 1
30:51
Oh, and finally, of course: maximize entertainment, not information. This is true for autonomous vehicles, and it's true for natural language conversation: fun should be part of the objective function. Okay.
Speaker 1
31:03
Lots of lessons to learn there. This is really the Loebner Prize, the Turing test, of our generation. I'm excited to see if anybody's able to solve the Alexa Prize. Again, in the Alexa Prize you're tasked with talking to a bot, and the measure of quality is the same as in the Loebner Prize, just measuring how good that conversation was; but the task is also to try to continue the conversation for 20 minutes.
Speaker 1
31:30
If you try to talk to a bot today, when you have a choice to talk to a bot or go do something else, watch Netflix, you'll probably last less than 10 seconds; you'll be bored. The point is for the conversation to keep you engaged because you're enjoying it so much. And 20 minutes, that's a really nice benchmark for passing the spirit of what the Turing test stood for.
Speaker 1
31:58
Examples here from the Alexa Prize, from the Alquist bot: the difference between two kinds of conversations. So Alquist says, have you been in Brazil? The user says, what is the population of Brazil?
Speaker 1
32:10
Alquist says, it is about 20 million. User says, well, okay. This is what happens a lot, like I mentioned, with multi-domain conversation: once you jump to a new domain, you stay there. Once you switch context, you stay there.
Speaker 1
32:26
The reality is you want to jump back and continue jumping around, like in the second, more successful conversation: have you been in Brazil? What is the population of Brazil? It is around 20 million.
Speaker 1
32:38
Anyway, I was saying: have you been in Brazil? So, jumping back in context; that's how conversation goes, tangent to tangent and back. Quickly: there's been a lot of sequence-to-sequence work using natural language to summarize, with a lot of applications.
Speaker 1
32:57
One of them, from ICLR, that I wanted to highlight, from the Technion, and that I find particularly interesting, is abstract-syntax-tree-based summarization of code: modeling computer code, in this case sadly Java and C#, as trees, as syntax trees, and then operating on those trees to do the summarization in text. Here, an example of a basic power-of-two function, on the bottom right, in Java: the code-to-sequence summarization says "get power of two." That's an exciting possibility for automated documentation of source code.
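To give a tiny flavor of the tree idea, here is Python's built-in ast module used to pull a crude name-based summary out of a function. This is a rule-based toy stand-in: the actual work learns the summary from paths between AST leaves in Java/C#, and does not use rules like these.

```python
import ast

# Parse a small function into an abstract syntax tree, then emit a crude
# "summary" from its name and structure. A toy stand-in for learned
# AST-based summarization; the function itself is made up.
source = """
def get_pow_two(n):
    return 2 ** n
"""

tree = ast.parse(source)
fn = tree.body[0]                        # the FunctionDef node
words = fn.name.split("_")               # ['get', 'pow', 'two']
n_args = len(fn.args.args)

print(" ".join(words))                   # get pow two
print("args:", n_args)                   # args: 1
```

The interesting part of the learned approach is that it produces good summaries even when the function name is unhelpful, precisely because it reads the tree structure rather than just the identifiers.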
Speaker 1
33:41
I thought it was particularly interesting, and the future there is bright. Okay, my hopes for 2020 for natural language processing: that reasoning, common-sense reasoning, becomes a greater and greater part of the transformer-type language model work we've seen in the deep learning world. Extending the context from hundreds or thousands of words to tens of thousands of words; being able to read entire stories and maintain the context, which transformers, again with XLNet and Transformer-XL, are starting to be able to do, but we're still far away from that long-term, lifelong maintenance of context.
Speaker 1
34:19
Dialogue, open-domain dialogue, forever, from Alan Turing to today, the dream of artificial intelligence: being able to pass the Turing test. And then natural language model transformers are self-supervised learning. The dream of Yann LeCun is for these kinds of systems, what were previously called unsupervised but he's now calling self-supervised learning systems, to be able to watch YouTube videos and, from that, start to form representations based on which they can understand the world.
Speaker 1
34:58
So the hope for 2020 and beyond is to be able to transfer some of the success of transformers to the world of visual information, the world of video, for example. Deep RL and self-play. This has been, and continues to be, an exciting time for reinforcement learning in games and robotics. So first, Dota 2 and OpenAI: an exceptionally popular competitive e-sports game that people compete in and win millions of dollars with.
Speaker 1
35:36
So there are a lot of world-class professional players. In 2018, OpenAI Five, this is team play, tried their best at The International and lost, and said they were looking forward to pushing Five to the next level, which they did: in April 2019, they beat the 2018 world champions in 5-on-5 play. The key there was compute, 8 times more training compute; because the per-step compute was already maxed out, the way they achieved the 8x was in time, simply training for longer.
Speaker 1
36:15
So the current version of OpenAI Five, as Jacob will talk about next Friday, has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 real-time months. Again, behind a lot of these game systems is self-play: they play against each other. This is one of the most exciting concepts in deep learning: systems that learn by playing each other and incrementally improving over time.
Speaker 1
36:43
So they start out terrible and get better and better and better, and the agent is always being challenged by a slightly better opponent, thanks to the natural process of self-play. That's a fascinating process. The 2019 version — the last version of OpenAI Five — has a 99.9% win rate versus the 2018 version.
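The self-play curriculum — always training against a recent snapshot of yourself — can be sketched with a deliberately toy model, where an agent's "policy" is just a single skill scalar and matches are decided by a made-up outcome model. Everything here is invented for illustration; it is not OpenAI's setup, only the snapshot-opponent pattern.

```python
import random

# Toy self-play loop: the learner trains against a frozen snapshot of
# itself, and the snapshot is periodically refreshed with the improved
# agent -- so the opponent keeps pace and the curriculum stays challenging.
random.seed(0)

def play(skill_a, skill_b):
    """Return 1 if A wins; a crude logistic-ish toy outcome model."""
    edge = 0.5 + max(-0.4, min(0.4, (skill_a - skill_b) * 0.1))
    return 1 if random.random() < edge else 0

skill = 0.0          # current learner
snapshot = 0.0       # frozen past opponent
for step in range(1, 2001):
    result = play(skill, snapshot)
    # stand-in "update": wins reinforce strongly, losses still teach a bit
    skill += 0.01 if result else 0.005
    if step % 200 == 0:
        snapshot = skill          # refresh opponent with the improved agent

print(round(skill, 2))
```

Real systems keep a whole league of past opponents rather than one snapshot, precisely to avoid the learner overfitting to a single rival.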
Speaker 1
37:03
Okay. Then DeepMind, in parallel, has been using self-play to solve some of these multi-agent games, which is a really difficult space where agents have to collaborate as part of the competition. It's exceptionally difficult from a reinforcement learning perspective. So, from raw pixels, they solved the capture-the-flag game in Quake III Arena.
Speaker 1
37:30
One of the things I love — just as a side note about both OpenAI and DeepMind, and reinforcement learning research in general — is that there will always be one or two paragraphs of philosophy. In this case from DeepMind: billions of people inhabit the planet, each with their own individual goals and actions, but still capable of coming together through teams, organizations, and societies in impressive displays of collective intelligence. This is a setting we call multi-agent learning: many individual agents must act independently, yet learn to interact and cooperate with other agents.
Speaker 1
38:03
This is an immensely difficult problem, because with co-adapting agents, the world is constantly changing. The fact that we — 7 billion people on Earth, people in this room, in families, in villages — can collaborate while being, for the most part, self-interested agents is fascinating. One of my hopes for 2020 is to explore the social behaviors that emerge in reinforcement learning agents and how those are echoed in real human-to-human social systems. Okay, here are some visualizations.
Speaker 1
38:36
The agents automatically figure out the concepts, as you see in other games. Knowing nothing about the rules of the game, about its concepts, strategies, and behaviors, they're able to figure it out. There are t-SNE visualizations of the different states — the important states and concepts in the game that the system discovers — and so on. Skipping ahead: automatic discovery of different behaviors.
Speaker 1
38:59
This happens in all the different games we talk about, from Dota to StarCraft to Quake: the different strategies, which it knows nothing about in advance, it figures out automatically. And the really exciting work in multi-agent RL on the DeepMind side was beating world-class players and achieving Grandmaster level in StarCraft. In December 2018, AlphaStar beat MaNa, one of the world's strongest professional StarCraft players, but that was in a very constrained environment, and it was a single race — I think Protoss.
Speaker 1
39:38
And in October 2019, AlphaStar reached Grandmaster level by doing what we humans do: using a camera view to observe the game, and playing against other humans. So this is not an artificial setup — it goes through the exact same process a human would undertake — and it achieved Grandmaster, which is the highest level.
Speaker 1
39:58
Okay, great. I encourage you to look at their blog posts and videos showing the different strategies that the RL agents are able to figure out. Here's a quote from one of the professional StarCraft players — and we see this with AlphaZero too, in chess: AlphaStar is an intriguing, unorthodox player,
Speaker 1
40:19
one with the reflexes and speed of the best pros, but strategies and a style that are entirely its own. The way AlphaStar was trained, with agents competing against each other in a league, has resulted in gameplay that's unimaginably unusual. It really makes you question how much of StarCraft's diverse possibilities pro players have really explored. And that's the really exciting thing about reinforcement learning agents — in chess, in Go, in games, and hopefully in simulated systems in the future: they teach experts, who think they understand the dynamics of a particular game or simulation, new strategies and new behaviors to study.
Speaker 1
41:02
That's one of the exciting applications, almost from a psychology perspective, that I'd love to see reinforcement learning push towards. And on the imperfect-information game side: poker. Noam Brown and colleagues at CMU beat professional players heads-up at no-limit Texas Hold'em, and now at six-player no-limit Texas Hold'em as well. Many of the same results, many of the same approaches: self-play, iterated Monte Carlo methods, and a bunch of ideas around abstraction. There are so many possibilities under imperfect information that you have to form bins — abstractions — in both the action space, in order to reduce it, and the information space.
Speaker 1
41:54
So: the probabilities over all the different hands the opponents could possibly have, and all the hands the betting strategies could possibly represent. And you have to do this kind of coarse planning. They use self-play to generate a coarse blueprint strategy, which in real time they then adjust with Monte Carlo search as they play. And unlike the DeepMind and OpenAI approaches, very minimal compute is required.
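The core no-regret loop behind CFR-style poker solvers can be shown on rock-paper-scissors. This is a sketch of plain regret matching, not Pluribus itself: each player tracks per-action regrets and plays in proportion to positive regret, and the time-averaged strategy converges to the uniform mixed equilibrium — the "perfectly random" mixing Darren Elias describes humans failing to execute.

```python
import random

# Regret matching on rock-paper-scissors (sketch of the idea behind
# CFR-style solvers): play proportionally to accumulated positive regret;
# the AVERAGE strategy over time approaches the mixed-strategy equilibrium.
random.seed(0)
N = 3                                              # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]      # row player's payoff

def strategy_from(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / N] * N

regrets = [[0.0] * N for _ in range(2)]
strategy_sum = [[0.0] * N for _ in range(2)]
for _ in range(20000):
    strats = [strategy_from(r) for r in regrets]
    acts = [random.choices(range(N), weights=s)[0] for s in strats]
    for p in range(2):
        opp = acts[1 - p]
        def util(a, p=p, opp=opp):                 # my payoff if I'd played a
            return PAYOFF[a][opp] if p == 0 else -PAYOFF[opp][a]
        got = util(acts[p])
        for a in range(N):
            regrets[p][a] += util(a) - got         # counterfactual regret
            strategy_sum[p][a] += strats[p][a]

avg = [s / sum(strategy_sum[0]) for s in strategy_sum[0]]
print([round(x, 2) for x in avg])  # close to uniform
```

Real poker solvers run this over a huge abstracted game tree with sampling; the equilibrium logic is the same.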
Speaker 1
42:21
And they're able to beat world-class players. Again, I like getting quotes from the professional players after they've been beaten. So Chris Ferguson, famous World Series of Poker player, said: Pluribus — that's the name of the agent — is a very hard opponent to play against. It's really hard to pin him down on any kind of hand.
Speaker 1
42:45
He's also very good at making thin value bets on the river. He's very good at extracting value out of his good hands — making bets without scaring off the opponent. Darren Elias said: its major strength is its ability to use mixed strategies. That's the same thing that humans try to do.
Speaker 1
43:06
For humans it's a matter of execution — to do this in a perfectly random way, and to do so consistently, most people just can't. Then, in the robotics space, there have been a lot of applications of reinforcement learning. One of the most exciting is manipulation — manipulation sufficient to solve a Rubik's Cube. Again, this is learned through reinforcement learning.
Speaker 1
43:30
Again, because self-play in this context is not possible, they use automatic domain randomization, ADR: they generate progressively more difficult environments for the hand. There's a giraffe head there, you see — there are a lot of perturbations to the system, so they mess with it a lot, and a lot of noise is injected, to teach the hand to manipulate the cube in order to then solve it.
Speaker 1
43:52
The actual solution — figuring out how to go from a particular scrambled state to the solved cube — is a solved problem. This paper and this work are focused on the much more difficult problem of learning to manipulate the cube. It's really exciting. Again, a little philosophy, as you would expect from OpenAI: they have this idea of emergent meta-learning.
Speaker 1
44:19
The idea is that the capacity of the neural network that's learning this manipulation is constrained, while the ADR — the automatic domain randomization — is progressively making the environments harder and harder, so the capacity of the environment to be difficult is unconstrained. Because of that, there's an emergent self-optimization of the neural network to learn general concepts, as opposed to memorizing particular manipulations. My hope in the deep reinforcement learning space for 2020 is the continued application to robotics — legged robotics, but also robotic manipulation.
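A minimal ADR loop might look like the following. The parameter names, step sizes, and the stand-in `evaluate_policy` are all hypothetical — the real system evaluates actual policy rollouts and keeps performance buffers per boundary — but the shape of the algorithm is there: probe a boundary of each randomization range, and widen the range when the policy still succeeds at that edge.

```python
import random

# Sketch of automatic domain randomization: each environment parameter has
# a randomization range that is pushed outward whenever the policy handles
# the current boundary well, so training environments get steadily harder.
random.seed(0)
ranges = {"cube_mass": [0.09, 0.11], "friction": [0.9, 1.1]}  # initial, narrow
STEP = {"cube_mass": 0.01, "friction": 0.05}

def evaluate_policy(env):
    # Stand-in for real rollouts: success gets rarer the further the
    # environment is from nominal (mass 0.1, friction 1.0).
    difficulty = abs(env["cube_mass"] - 0.1) * 5 + abs(env["friction"] - 1.0)
    return random.random() > difficulty

for _ in range(200):
    name = random.choice(list(ranges))
    lo, hi = ranges[name]
    boundary = random.choice([lo, hi])               # probe one edge
    env = {k: random.uniform(*v) for k, v in ranges.items()}
    env[name] = boundary
    if evaluate_policy(env):                         # coped at the edge:
        if boundary == lo:                           # widen range outward
            ranges[name][0] = max(0.0, lo - STEP[name])
        else:
            ranges[name][1] = hi + STEP[name]

print(ranges)
```

The emergent-meta-learning observation is that the policy's capacity is fixed while this environment distribution keeps growing, which forces general strategies rather than memorized ones.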
Speaker 1
45:08
Human behavior: the use of multi-agent self-play, as I've mentioned, to explore naturally emerging social behaviors — constructing simulations of social behavior and seeing what kind of multi-human behavior emerges in a self-play context. I hope there will be something like a reinforcement-learning self-play psychology department one day, where you use reinforcement learning to reverse-engineer and study human behavior. And again, in games, I'm not sure what big challenges remain, but to me at least it's exciting to see learned solutions to games through self-play. Science of deep learning: I would say there have been a lot of really exciting developments here that deserve their own lecture.
Speaker 1
45:59
I'll mention just a few here. From MIT — really from 2018, but it sparked a lot of follow-on interest and work in 2019 — is the idea of the lottery ticket hypothesis. This work showed that small subnetworks within the larger network are the ones doing all the thinking; the same accuracy can be achieved by a small subnetwork from within a neural network. And they have a very simple process for arriving at such a subnetwork: randomly initialize a neural network —
Speaker 1
46:36
that's, I guess, the lottery ticket. Train the network until it converges. This is an iterative process: prune the fraction of the network with low-magnitude weights,
Speaker 1
46:45
reset the weights of the remaining network to the original initialization — the same lottery ticket — and then train the pruned, untrained network again, continuing this iteratively to arrive at a network that's much smaller, using the same original initialization. It's fascinating that within these big networks there's often a much smaller network that can achieve the same accuracy. Now, practically speaking, it's unclear what the big takeaways are, except the inspiring one: there exist architectures that are much more efficient, so there's value in investing time in finding such networks.
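The train / prune / rewind loop can be sketched on a toy linear model — not a deep network, just the procedure itself, with a sparse ground truth so the "winning ticket" weights are easy to see.

```python
import numpy as np

# Iterative magnitude pruning, lottery-ticket style, on a toy regression:
# train, prune the smallest-magnitude surviving weights, rewind survivors
# to their ORIGINAL initialization, and train again.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:4] = [3.0, -2.0, 1.5, -1.0]              # only 4 features matter
y = X @ true_w + 0.1 * rng.normal(size=200)

w_init = rng.normal(size=20)                     # the "lottery ticket" init
mask = np.ones(20)

def train(w, mask, steps=1000, lr=0.05):
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask                 # pruned weights stay frozen
    return w

w = w_init.copy()
for _ in range(3):                               # three prune/rewind rounds
    w = train(w, mask)
    alive = np.flatnonzero(mask)
    k = len(alive) // 3                          # prune ~1/3 of survivors
    prune = alive[np.argsort(np.abs(w[alive]))[:k]]
    mask[prune] = 0.0
    w = w_init.copy()                            # rewind to original init

w = train(w, mask)
mse = float(np.mean((X @ (w * mask) - y) ** 2))
print(int(mask.sum()), "weights survive; final mse", round(mse, 3))
```

The surviving subnetwork (7 of 20 weights here) still fits the data, because magnitude pruning keeps the informative weights — the analogue of the paper's claim that sparse subnetworks trained from the original initialization match full-network accuracy.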
Speaker 1
47:30
Then there are disentangled representations, which again deserve their own lecture. Here the example is a 10-dimensional vector representation, where the goal is for each element of the vector to learn one particular concept about a dataset. The dream of unsupervised learning is to learn compressed representations where everything is disentangled, so you can learn fundamental concepts about the underlying data that carry over from dataset to dataset to dataset.
Speaker 1
48:01
That's disentangled representation. There's theoretical work — the best-paper award at ICML 2019 — showing that this is impossible: disentangled representations cannot be learned without inductive biases. And so the suggestion there is that the biases you use should be made as explicit as possible.
Speaker 1
48:26
The open problem is finding good inductive biases for unsupervised model selection that work across the multiple datasets we're actually interested in. There are a lot more papers, but one of the exciting ones is the double descent idea, extended to the deep neural network context by OpenAI, exploring the phenomenon that as we increase the number of parameters in a neural network, the test error initially decreases, then increases, and then, just as the model becomes able to fit the training set, undergoes a second descent: decrease, increase, decrease. There's a critical moment when the training set is just barely fit perfectly.
Speaker 1
49:09
Okay, and OpenAI shows that this applies not just to model size, but also to training time and dataset size. It's still more of an open problem: why this happens, how to understand it, and how to leverage it in optimizing the training dynamics of neural networks. There are a lot of really interesting theoretical questions there. So my hope for the science of deep learning in 2020 is continued exploration of the fundamentals of model selection, training dynamics, the performance of training in terms of memory and speed, and representation characteristics with respect to architecture characteristics.
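Double descent can be reproduced in a few lines with minimum-norm least squares on random ReLU features — a standard toy setting, not the OpenAI experiments. Test error spikes near the interpolation threshold (number of features ≈ number of training samples), then descends again as the model keeps growing.

```python
import numpy as np

# Toy double-descent demo: random ReLU features + minimum-norm least
# squares. At p = n_train the fit interpolates the noise with a badly
# conditioned feature matrix, so test error peaks; larger p smooths it out.
rng = np.random.default_rng(0)
n_train, n_test, d = 40, 500, 10
w_true = rng.normal(size=d)

def make(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

Xtr, ytr = make(n_train)
Xte, yte = make(n_test)
V = rng.normal(size=(d, 400))                    # fixed random projection

def feats(X, p):
    return np.maximum(X @ V[:, :p], 0.0)         # p random ReLU features

errors = {}
for p in [5, 20, 40, 100, 400]:                  # 40 = interpolation threshold
    F = feats(Xtr, p)
    w = np.linalg.pinv(F) @ ytr                  # minimum-norm solution
    errors[p] = float(np.mean((feats(Xte, p) @ w - yte) ** 2))
print({p: round(e, 2) for p, e in errors.items()})
```

The same decrease-increase-decrease shape is what the OpenAI paper observes in deep networks along model size, training time, and dataset size.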
Speaker 1
49:47
So, a lot of fundamental work there in understanding neural networks. There are two areas I have whole sections of papers on, which are super exciting. My first love is graphs, so graph neural networks are a really exciting area of deep learning.
Speaker 1
50:05
Graph convolutional neural networks as well — for solving combinatorial problems and for recommendation systems. Any problem that can fundamentally be modeled as a graph can be solved, or at least aided, by neural networks. There's a lot of exciting work there. And Bayesian deep learning, using Bayesian neural networks — that's been an exciting possibility for several years. It's very difficult to train large Bayesian networks, but in the contexts where you can, it's useful: on small datasets, providing uncertainty measurements alongside predictions is an extremely powerful capability of Bayesian neural networks, as is online, incremental learning.
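One graph-convolution propagation step in the style of the Kipf-Welling GCN rule, H' = ReLU(D^(-1/2)(A+I)D^(-1/2) H W), can be written out on a tiny hand-made graph. This is a sketch of the rule itself, not any particular library's API.

```python
import numpy as np

# One GCN propagation step on a 4-node graph: add self-loops, normalize
# the adjacency symmetrically, average neighbor features, then transform.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)        # adjacency matrix
A_hat = A + np.eye(4)                            # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
norm = D_inv_sqrt @ A_hat @ D_inv_sqrt           # symmetric normalization

H = rng.normal(size=(4, 8))                      # input node features
W = rng.normal(size=(8, 16)) * 0.1               # learnable layer weights
H_next = np.maximum(norm @ H @ W, 0.0)           # aggregate, transform, ReLU
print(H_next.shape)
```

Stacking a few such layers lets information flow several hops across the graph, which is what makes these networks useful for recommendation and combinatorial problems.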
Speaker 1
50:51
There's a lot of really good papers there. It's exciting. Okay, autonomous vehicles. Oh boy.
Speaker 1
50:58
Let me try to use as few sentences as possible to describe this section. It is one of the most exciting areas of application of AI and learning in the real world today. It is the place where artificial intelligence systems touch the most human beings who know nothing about artificial intelligence. Hundreds of thousands, soon millions, of cars — robots, really — will be interacting with human beings.
Speaker 1
51:30
So this is a really exciting area and a really difficult problem. And there are two approaches. One is Level 2, where the human is fundamentally responsible for supervising the AI system. And Level 4, where — at least in the dream — the AI system is responsible for its actions and the human does not need to be a supervisor.
Speaker 1
51:49
Okay, two companies represent these approaches and are sort of leading the way. Waymo: in October 2018, 10 million miles on road; this year they've reached 20 million miles on road and 10 billion miles in simulation. I got a chance to visit them out in Arizona. They're doing a lot of really exciting work, and they're obsessed with testing.
Speaker 1
52:15
So the kind of testing they're doing is incredible: 20,000 classes of structured tests, putting the system through all kinds of scenarios that the engineers can think of and that appear in the real world. And they've initiated testing on road with real consumers without a safety driver. If you don't know what that means: the car is truly responsible.
Speaker 1
52:40
There's no human to catch it. On the other side, the exciting thing is that there are 700,000 to 800,000 Tesla Autopilot systems. These are human-supervised systems, using a multi-headed, multitask neural network to perceive, predict, and act in this world.
Speaker 1
53:09
So that's a really exciting large-scale, real-world deployment of neural networks. It's a fundamentally deep-learning-based system — unlike Waymo, where deep learning is the icing on the cake. For Tesla, deep learning is the cake: it's at the core of the perception and the action that the system performs.
Speaker 1
53:32
They have to date done an estimated over 2 billion miles, and that continues to grow quickly. I'll briefly mention what I think is a super exciting idea in all real-world applications of machine learning: online, iterative, active learning. Andrej Karpathy, who's the head of Autopilot, calls this the data engine. It's the iterative process of having a neural network perform the task, discovering the edge cases, searching for other similar edge cases, annotating them, retraining the network, and continuously repeating this loop.
Speaker 1
54:09
This is what every single company that's using machine learning seriously is doing. There are very few publications in this space of iterative learning, but this is the fundamental problem of machine learning: it's not to create a brilliant neural network, it's to create a dumb neural network that continuously learns and improves until it's brilliant.
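The data-engine loop can be sketched as follows. Every name here is hypothetical — the "model" is a one-dimensional threshold classifier standing in for a real network, and the labeling rule stands in for human annotators — but the loop structure is the point: deploy, mine low-confidence edge cases from the stream, annotate, retrain, repeat.

```python
import random

# Sketch of an active-learning "data engine" loop with toy stand-ins for
# the model, the data stream, and the human annotators.
random.seed(0)
labeled = [(x, int(x > 0.6)) for x in (random.random() for _ in range(50))]

def train(data):
    # Stand-in "model": threshold halfway between the class means;
    # confidence = distance from the decision boundary.
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: (int(x > thr), abs(x - thr))  # (prediction, confidence)

def mine_edge_cases(model, stream, k=10):
    return sorted(stream, key=lambda x: model(x)[1])[:k]  # least confident

model = train(labeled)
for cycle in range(3):
    stream = [random.random() for _ in range(1000)]     # unlabeled fleet data
    hard = mine_edge_cases(model, stream)               # near-boundary cases
    annotated = [(x, int(x > 0.6)) for x in hard]       # "human" labels
    labeled += annotated
    model = train(labeled)                              # retrain on grown set

print(len(labeled))  # grew by 10 mined edge cases per cycle
```

The key design choice is that annotation effort goes only where the model is uncertain, which is what makes the loop improve faster than labeling random samples.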
Speaker 1
54:28
And that process is especially interesting when you take it outside of single-task learning. Most papers are written on single-task learning: you take some benchmark — in the case of driving, object detection, landmark detection, drivable-area segmentation, trajectory generation — all of those have benchmarks, and you can have separate neural networks for them.
Speaker 1
54:50
That's single-task learning. But combining them — using a single neural network that performs all those tasks together — that's the fascinating challenge, where you're reusing parts of the neural network to learn things that are coupled, and other things that are completely independent, while running the continuous active-learning loop. Inside companies — Tesla and Waymo, and in general — it's exciting that there are actual human beings responsible for these particular tasks.
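A multi-headed, shared-backbone network can be sketched with purely illustrative shapes — this is not Tesla's actual architecture, just the structural idea: one backbone computes a shared representation once, and cheap per-task heads branch off it.

```python
import numpy as np

# Multi-headed network sketch: shared backbone, one lightweight head per
# task. The backbone's representation is reused across coupled tasks.
rng = np.random.default_rng(0)

def layer(n_in, n_out):
    return rng.normal(size=(n_in, n_out)) * (1.0 / np.sqrt(n_in))

backbone = [layer(128, 64), layer(64, 32)]       # shared representation
heads = {                                        # hypothetical task heads
    "object_detection": layer(32, 10),           # e.g. 10 class scores
    "lane_detection": layer(32, 4),              # e.g. 4 lane parameters
    "sign_classification": layer(32, 20),
}

def forward(x):
    h = x
    for W in backbone:                           # shared compute, run once
        h = np.maximum(h @ W, 0.0)
    return {name: h @ W for name, W in heads.items()}

out = forward(rng.normal(size=(1, 128)))
print({k: v.shape for k, v in out.items()})
```

Training alternates or weights per-head losses, which is exactly where the per-task human experts described above come in: each owns a head's loss, data, and edge cases.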
Speaker 1
55:18
They become experts of particular perception tasks, experts of particular planning tasks and so on. And so the job of that expert is both to train the neural network and to discover the edge cases which maximize the improvement of the network. That's where the human expertise comes in a lot. Okay.
Speaker 1
55:35
And there's a lot of debate — it's an open question — about which kind of approach will be successful. One is a fundamentally learning-based approach, as with the Level 2 Tesla Autopilot system, which learns all the different tasks involved in driving; as it gets better and better, less and less human supervision is required.
Speaker 1
55:59
The pro of that approach: camera-based systems have the highest-resolution information, so it's very amenable to learning. But the con is that it requires a huge amount of data, and nobody knows how much data yet. The other con is human psychology — driver behavior: the human must continue to remain vigilant.
Speaker 1
56:21
The Level 4 approach, besides cameras and radar and so on, also leverages LiDAR and maps. The pro is that it's a much more consistent, reliable, explainable system: the accuracy of detection, depth estimation, and recognition of different objects is much higher, with less data. The con is that it's expensive, at least for now.
Speaker 1
56:49
It's also less amenable to learning methods, because there's much less data and it's lower-resolution, and it requires, at least for now, some fallback — whether that's a safety driver or teleoperation. The open question for the deep-learning, Level 2, Tesla Autopilot approach is: how hard is driving? This is actually the open question for most disciplines in artificial intelligence. How difficult is driving?
Speaker 1
57:16
How many edge cases does driving have? Can we learn to generalize over those edge cases without solving the common-sense reasoning problem — without solving the human-level artificial intelligence problem? And that means perception.
Speaker 1
57:30
How hard is perception — detection, intention modeling, modeling the mental states of humans, trajectory prediction? Then the action side: the game-theoretic problem of balancing, as I mentioned, fun and enjoyability with the safety of these systems, because they are life-critical. And human supervision, the vigilance side: how good can Autopilot get before vigilance decrements significantly — before people fall asleep, become distracted, start watching movies, and so on — the things people naturally do?
Speaker 1
58:02
The open question is how good Autopilot can get before that becomes a serious problem, and whether that decrement nullifies the safety benefit of using Autopilot — because the AI system, when the sensors are working well, is perfectly vigilant. The AI is always paying attention. Then there are the open questions for the LiDAR-based, Level 4, Waymo approach.
Speaker 1
58:31
When we have maps, LiDAR, and geo-fenced routes, how difficult is driving? The traditional robotics approach — from the DARPA Challenge to most autonomous vehicle companies today — is to use HD maps and LiDAR, together with GPS, for really accurate localization. Then the perception problem becomes the icing on the cake, because you already have a really good sense of where you are and where the obstacles in the scene are. Perception is then not a safety-critical task, but a task of interpreting the environment further.
Speaker 1
59:05
So it's by its nature already safer — but how difficult, nevertheless, is that problem? If perception is the hard problem, then the LiDAR-based approaches are well positioned. If action is the hard problem, then both Tesla and Waymo have to solve it, and the sensors don't matter there: the difficult problem is the planning, the game theory, the modeling of the mental models and intentions of other human beings — the pedestrians and the cyclists.
Speaker 1
59:42
And then, on the other side, the 10 billion miles of simulation. The open problem, from reinforcement learning and deep learning in general, is how much we can learn from simulation, and how much of that knowledge can transfer to real-world systems. My hope in the autonomous vehicle space, the AI-assisted driving space —