Daphne Koller is a professor of computer science at Stanford University, a co-founder of Coursera with Andrew Ng and Founder and CEO of insitro, a company at the intersection of machine learning and biomedicine.

Support this podcast by signing up with these sponsors:
- Cash App - use code "LexPodcast" and download:
- Cash App (App Store): https://apple.co/2sPrUHe
- Cash App (Google Play): https://bit.ly/2MlvP5w

EPISODE LINKS:
Daphne's Twitter: https://twitter.com/daphnekoller
Daphne's Website: https://ai.stanford.edu/users/koller/index.html
Insitro: http://insitro.com

PODCAST INFO:
Podcast website:
https://lexfridman.com/podcast
Apple Podcasts:
https://apple.co/2lwqZIr
Spotify:
https://spoti.fi/2nEwCF8
RSS:
https://lexfridman.com/feed/podcast/
Full episodes playlist:
https://www.youtube.com/playlist?list=PLrAXtmErZgOdP_8GztsuKi9nrraNbKKp4
Clips playlist:
https://www.youtube.com/playlist?list=PLrAXtmErZgOeciFP3CBCIEElOJeitOr41

OUTLINE:
0:00 - Introduction
2:22 - Will we one day cure all disease?
6:31 - Longevity
10:16 - Role of machine learning in treating diseases
13:05 - A personal journey to medicine
16:25 - Insitro and disease-in-a-dish models
33:25 - What diseases can be helped with disease-in-a-dish approaches?
36:43 - Coursera and education
49:04 - Advice to people interested in AI
50:52 - Beautiful idea in deep learning
55:10 - Uncertainty in AI
58:29 - AGI and AI safety
1:06:52 - Are most people good?
1:09:04 - Meaning of life

CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/LexFridmanPage
- Instagram: https://www.instagram.com/lexfridman
- Medium: https://medium.com/@lexfridman
- Support on Patreon: https://www.patreon.com/lexfridman

The following is a conversation with Daphne Koller, a professor of computer science at Stanford University, a co-founder of Coursera with Andrew Ng, and founder and CEO of insitro, a company at the intersection of machine learning and biomedicine.

We're now in the exciting early days of using the data-driven methods of machine learning to help discover and develop new drugs and treatments at scale.

Daphne and insitro are leading the way on this, with breakthroughs that may ripple through all fields of medicine, including ones most critical for helping with the current coronavirus pandemic.

This conversation was recorded before the COVID-19 outbreak.

For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way.

Stay strong, we're in this together, we'll beat this thing.

This is the Artificial Intelligence Podcast.

If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N.

I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation.

I hope that works for you and doesn't hurt the listening experience.

This show is presented by Cash App, the number one finance app in the App Store.

Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1.

Since Cash App allows you to send and receive money digitally, peer-to-peer, and security in all digital transactions is very important, let me mention the PCI Data Security Standard that Cash App is compliant with.

I'm a big fan of standards for safety and security.

PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions.

Now we just need to do the same for autonomous vehicles and AI systems in general.

So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10 and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.

And now here's my conversation with Daphne Koller.

So you co-founded Coursera and made a huge impact in the global education of AI.

And after five years, in August 2016, you wrote a blog post saying that you're stepping away, and wrote, quote, "It is time for me to turn to another critical challenge, the development of machine learning and its applications to improving human health."

So let me ask two far-out philosophical questions.

One, do you think we'll one day find cures for all major diseases known today?

And two, do you think we'll one day figure out a way to extend the human lifespan, perhaps to the point of immortality?

So one day is a very long time, and I don't like to make predictions of the type "we will never be able to do X," because I think that smacks of hubris.

It assumes that never in the entire eternity of human existence will we be able to solve a problem.

That being said, curing disease is very hard because oftentimes by the time you discover the disease, a lot of damage has already been done.

And so to assume that we would be able to cure disease at that stage assumes that we would come up with ways of basically regenerating entire parts of the human body in the way that actually returns it to its original state.

We've been able to provide treatment for an increasingly large number of diseases, but the number of things that you could actually define to be cures is actually not that large.

So I think that there's a lot of work that would need to happen before one could legitimately say that we have cured even a reasonable number, much less all diseases.

On a scale of 0 to 100, where are we in understanding the fundamental mechanisms of all major diseases?

So from the computer science perspective that you've entered the world of health, how far along are we?

I mean, there are ones where I would say we're maybe not quite at a hundred because biology is really complicated and there's always new things that we uncover that people didn't even realize existed.

But I would say there's diseases where we might be in the 70s or 80s.

And then there are diseases, I would say probably the majority, where we're really close to 0.

Would Alzheimer's and schizophrenia and type 2 diabetes fall closer to 0 or to the 80?

I think Alzheimer's is probably closer to 0 than to 80.

There are hypotheses, but I don't think those hypotheses have as of yet been sufficiently validated that we believe them to be true, and there's an increasing number of people who believe that the traditional hypotheses might not really explain what's going on.

I would also say that Alzheimer's and schizophrenia and even type 2 diabetes are not really one disease.

They're almost certainly a heterogeneous collection of mechanisms that manifest in clinically similar ways.

In the same way, we now understand that breast cancer is really not one disease.

It is a multitude of cellular mechanisms, all of which ultimately translate to uncontrolled proliferation, but it's not one disease.

The same is almost undoubtedly true for those other diseases as well.

And it's that understanding that needs to precede any understanding of the specific mechanisms of any of those other diseases.

Now, in schizophrenia, I would say we're almost certainly closer to 0 than to anything else.

In type 2 diabetes, there are clear mechanisms that are implicated, and that I think have been validated, that have to do with insulin resistance and such, but there are almost certainly many mechanisms there as well that we have not yet understood.

So you've also thought and worked a little bit on the longevity side.

Do you see the disease and longevity as overlapping completely, partially, or not at all as efforts?

Those mechanisms are certainly overlapping.

There's a well-known phenomenon that says that for most diseases, other than childhood diseases, the risk of contracting that disease increases exponentially year on year from the time you're about 40.

So obviously there is a connection between those two things.

That's not to say that they're identical.

There's clearly aging that happens that is not really associated with any specific disease.

And there's also diseases and mechanisms of disease that are not specifically related to aging.

It is a little unfortunate that we get older, and it seems that there's some correlation between the occurrence of diseases and the fact that we get older, and both are quite sad.

I mean, there's processes that happen as cells age that I think are contributing to disease.

Some of those have to do with DNA damage that accumulates as cells divide, where the repair mechanisms don't fully correct for those.

There are accumulations of proteins that are misfolded and potentially aggregate, and those too contribute to disease and contribute to inflammation.

There is a multitude of mechanisms that have been uncovered that are sort of wear and tear at the cellular level and that contribute to disease processes, and I'm sure there are many that we don't yet understand.

On a small tangent, perhaps philosophical, the fact that things get older and the fact that things die is a very powerful feature for the growth of new things.

So in trying to fight disease and trying to fight aging, do you think about the useful fact of our mortality?

Or, if you could be immortal, would you choose to be immortal?

Again, I think immortal is a very long time.

And I don't know that that would necessarily be something that I would want to aspire to, but I think all of us aspire to an increased health span, I would say, which is an increased amount of time where you're healthy and active and feel as you did when you were 20. We're nowhere close to that.

People deteriorate physically and mentally over time, and that is a very sad phenomenon.

So, I think a wonderful aspiration would be if we could all live to, you know, the biblical 120 maybe in perfect health.

I think that would be an amazing goal for us to achieve as a society.

I think that's up for debate, but I think an increased health span is a really worthy goal.

And anyway, on the grand timescale of the age of the universe, it's all pretty short.

So from the perspective, you've done obviously a lot of incredible work in machine learning.

So what role do you think data and machine learning play in this goal of trying to understand diseases and trying to eradicate diseases?

Up until now, I don't think it's played very much of a significant role, because largely the datasets that one really needed to enable a powerful machine learning method haven't really existed.

There's been dribs and drabs and some interesting machine learning that has been applied, I would say machine learning slash data science.

But the last few years are starting to change that.

We now see an increase in some large datasets, but equally importantly, an increase in technologies that are able to produce data at scale.

It's not typically the case that people have deliberately, proactively used those tools for the purpose of generating data for machine learning.

To the extent that those techniques have been used for data production, they've been used to drive scientific discovery.

The machine learning came as a sort of byproduct second stage of, oh, you know, now we have a dataset, let's do machine learning on that rather than a more simplistic data analysis method.

But what we are doing at insitro is actually flipping that around and saying, here's this incredible repertoire of methods that bioengineers and cell biologists have come up with.

Let's see if we can put them together in brand new ways with the goal of creating data sets that machine learning can really be applied on productively to create powerful predictive models that can help us address fundamental problems in human health.

So really, to make data the primary focus and the primary goal, and use the mechanisms of biology and chemistry to create the kinds of data sets that could allow machine learning to benefit the most?

I wouldn't put it in those terms because that says that data is the end goal.

So for us, the end goal is helping address challenges in human health.

And the method that we've elected to do that is to apply machine learning to build predictive models. And machine learning, in my opinion, can only be really successfully applied, especially the more powerful models, if you give it data that is of sufficient scale and sufficient quality.

So how do you create those data sets so as to drive the ability to generate predictive models, which subsequently help improve human health?

So before we dive into the details of that, let me take a step back and ask, when and where was your interest in human health born?

Are there moments, events, perhaps, if I may ask, tragedies in your own life that catalyzed this passion, or was it the broader desire to help humankind?

I mean, my interest in human health actually dates back to the early 2000s, when a lot of my peers in machine learning and I were using datasets that frankly were not very inspiring.

Some of us old-timers still remember the quote-unquote 20 Newsgroups data set, where this was literally a bunch of text from 20 newsgroups, a concept that doesn't really even exist anymore.

And the question was, can you classify which newsgroup a particular bag of words came from?
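
For readers unfamiliar with the task described above, here is a minimal sketch of bag-of-words classification. The two group names and all of the text are toy values chosen for illustration, and the overlap score is a deliberately simple stand-in for the statistical classifiers that were actually used on this kind of data.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each whitespace-separated token."""
    return Counter(text.lower().split())


def overlap_score(bag, reference):
    """Sum, over the words in `bag`, the smaller of the two counts."""
    return sum(min(count, reference[word]) for word, count in bag.items())


def classify(text, group_bags):
    """Return the group whose reference bag overlaps most with the text."""
    bag = bag_of_words(text)
    return max(group_bags, key=lambda group: overlap_score(bag, group_bags[group]))


# Toy training text per group (hypothetical, not taken from the real data set).
training = {
    "sci.space": "the shuttle launch reached orbit and the orbit was stable",
    "rec.autos": "the engine and the brakes on this car need repair",
}
group_bags = {group: bag_of_words(text) for group, text in training.items()}

print(classify("new car engine trouble", group_bags))
```

Word order is discarded entirely, which is what makes it a "bag": only the counts of each word are used to decide which group a post most resembles.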

The data sets at the time on the biology side were much more interesting, both from a technical and also from an aspirational perspective.

They were still pretty small, but they were better than 20 Newsgroups.

I started out, I think, just by wanting to do something that was more, I don't know, societally useful and technically interesting.

Then over time, I became more and more interested in the biology and the human health aspects for themselves.

And I began to work even sometimes on papers that were just in biology without having a significant machine learning component.

I think my interest in drug discovery is partly due to an incident I had when my father sadly passed away about 12 years ago.

He had an autoimmune disease that settled in his lungs.

And the doctors basically said, well, there's only one thing we can do, which is give him prednisone.

At some point, I remember a doctor even came and said, hey, let's do a lung biopsy to figure out which autoimmune disease he has.

And I had friends who were rheumatologists who said, the FDA would never approve prednisone today because the ratio of side effects to benefit is probably not large enough.

Today, we're in a state where there's probably 4 or 5, maybe even more, well, it depends for which autoimmune disease, but there are multiple drugs that can help people with autoimmune disease, many of which didn't exist 12 years ago.

And I think we're at a golden time in some ways in drug discovery, where there's the ability to create drugs that are much more safe and much more effective than we've ever been able to before.

And what's lacking is enough understanding of biology and mechanism to know where to aim that engine.

And I think that's where machine learning can help.

So in 2018, you started and now lead a company, insitro, whose focus, like you mentioned, is drug discovery and the utilization of machine learning for drug discovery.

So you mentioned that, quote, "We're really interested in creating what you might call disease-in-a-dish models, places where diseases are complex, where we really haven't had a good model system, where typical animal models that have been used for years, including testing on mice, just aren't very effective."

So can you try to describe what is an animal model and what is a disease-in-a-dish model?

Sure. So an animal model for disease is effectively what it sounds like. It's oftentimes a mouse where we have introduced some external perturbation that creates the disease, and then we cure that disease.

And the hope is that by doing that we will cure a similar disease in the human.

The problem is that oftentimes the way in which we generate the disease in the animal has nothing to do with how that disease actually comes about in a human.

It's what you might think of as a copy of the phenotype, a copy of the clinical outcome, but the mechanisms are quite different.

And so when we cure the disease in the animal, which in most cases doesn't occur naturally (mice don't get Alzheimer's, they don't get diabetes, they don't get atherosclerosis, they don't get autism or schizophrenia), those cures don't translate over to what happens in the human.

And that's where most drugs fail, just because the findings that we had in the mouse don't translate to a human.

Disease-in-a-dish models are a fairly new approach.

It's been enabled by technologies that have not existed for more than 5 to 10 years.

So for instance, there's the ability for us to take a cell from any one of us, you or me, and revert that, say, skin cell to what's called stem cell status, which is what's called a pluripotent cell that can then be differentiated into different types of cells.

So from that pluripotent cell, one can create a Lex neuron or a Lex cardiomyocyte or a Lex hepatocyte that has your genetics, but the right cell type.

And so if there's a genetic burden of disease that would manifest in that particular cell type, you might be able to see it by looking at those cells and saying, oh, that's what potentially sick cells look like versus healthy cells and understand how, and then explore what kind of interventions might revert the unhealthy looking cell to a healthy cell.

Now, of course, curing cells is not the same as curing people.

And so there's still potentially a translatability gap.

But at least for diseases that are driven, say, by human genetics, and where the human genetics is what drives the cellular phenotype, there is some reason to hope that if we can revert the cells in which the disease begins back to a healthy state, maybe that will also help revert the more global clinical phenotypes.

That step, that backward step, I was reading about it, the Yamanaka factors.

So it's like that reverse step back to stem cells.

Honestly, before that happened, I think very few people would have predicted that to be possible.

Can you maybe elaborate, is it actually possible?

How stable is it? This result was first demonstrated maybe, I don't know, 10 years ago, something like that.

It was much more, I think, finicky and bespoke at the early stages when the discovery was first made.

But at this point it's become almost industrialized.

There are what's called contract research organizations, vendors that will take a sample from a human and revert it back to stem cell status, and it works a very good fraction of the time.

Now, there are people who will ask, I think, good questions.

Is this really, truly a stem cell, or does it remember certain aspects of changes that were made in the human beyond the genetics: its past as a skin cell, or its past in terms of exposures to different environmental factors and so on?

So I think the consensus right now is that these are not always perfect and there are little bits and pieces of memory sometimes, but by and large, these are actually pretty good.

So one of the key things, well maybe you can correct me, but one of the useful things for machine learning is size, scale of data.

How easy is it to do these kinds of reversals to stem cells, and then disease-in-a-dish models, at scale?

So the reversal is not, at this point, something that can be done at the scale of tens of thousands or hundreds of thousands. I think the total number of stem cells in the world, or iPS cells, which is short for induced pluripotent stem cells, is somewhere between 5,000 and 10,000, last I looked.

Now again, that might not count things that exist in this or that academic center, and they may add up to a bit more, but that's about the range.

So you couldn't at this point generate iPS cells from a million people, but maybe you don't need to, because maybe that background is enough, because it can also now be perturbed in different ways.

And some people have done really interesting experiments in, for instance, taking cells from a healthy human and then introducing a mutation that is known to be pathogenic, using one of the other miracle technologies that's emerged in the last decade, which is CRISPR gene editing.

And so you can now look at the healthy cells and the unhealthy cells, the ones with the mutation, and do a one-on-one comparison where everything else is held constant.

And so you could really start to understand specifically what the mutation does at the cellular level.

So the iPS cells are a great starting point, and obviously more diversity is better because you also want to capture ethnic background and how that affects things, but maybe you don't need one from every single patient with every single type of disease, because we have other tools at our disposal.

Well, how much difference is there between people, I mentioned ethnic background, in terms of iPS cells?

It seems like these are magical cells that can create anything. Between different populations, different people, is there a lot of variability between stem cells?

Well, first of all, there's the variability that's driven simply by the fact that genetically we're different.

So a stem cell that's derived from my genotype is gonna be different from a stem cell that's derived from your genotype.

There's also some differences that have more to do with, for whatever reason, some people's stem cells differentiate better than other people's stem cells.

We don't entirely understand why, so there's certainly some differences there as well.

But the fundamental difference, and the one that we really care about and is a positive, is the fact that the genetics are different and therefore recapitulate my disease burden versus your disease burden.

Well, a disease burden is just, if you think, I mean, it's not a well-defined mathematical term, although there are mathematical formulations of it.

If you think about the fact that some of us are more likely to get a certain disease than others because we have more variations in our genome that are causative of the disease, maybe fewer that are protective of the disease.

People have quantified that using what are called polygenic risk scores, which look at all of the variations in an individual person's genome and add them all up in terms of how much risk they confer for a particular disease.

And then they've put people on a spectrum of their disease risk.

And for certain diseases where we've been sufficiently powered to really understand the connection between the many, many small variations that give rise to an increased disease risk, there are some pretty significant differences in terms of the risk between the people, say, at the highest decile of this polygenic risk score and the people at the lowest decile.

Sometimes those differences are a factor of 10 or 12 higher.

So there's definitely a lot that our genetics contributes to disease risk, even if it's not by any stretch the full explanation.
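
To make the "add them all up" idea concrete, here is a minimal sketch of a polygenic risk score as it is commonly computed: a weighted sum of risk-allele counts. The variant names, effect sizes, and genotypes below are made-up illustrative values, not data from any real study.

```python
def polygenic_risk_score(genotype, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1, or 2 copies per variant).

    A positive effect size raises risk and a negative one is protective.
    Variants missing from the genotype are treated as 0 copies.
    """
    return sum(
        effect_sizes[variant] * genotype.get(variant, 0)
        for variant in effect_sizes
    )


# Hypothetical per-variant effect sizes (illustrative only).
effect_sizes = {"variant_a": 0.30, "variant_b": 0.10, "variant_c": -0.20}

# Hypothetical genotypes: number of copies of each variant per person.
people = {
    "person_1": {"variant_a": 2, "variant_b": 1, "variant_c": 0},
    "person_2": {"variant_a": 0, "variant_b": 1, "variant_c": 2},
    "person_3": {"variant_a": 1, "variant_b": 0, "variant_c": 1},
}

scores = {
    name: polygenic_risk_score(genotype, effect_sizes)
    for name, genotype in people.items()
}

# Placing people on a spectrum of risk, lowest score first.
ranked = sorted(scores, key=scores.get)
print(ranked)
```

With thousands of people rather than three, the same ranking is what gets cut into deciles to compare the highest-risk and lowest-risk groups.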

And from a machine learning perspective, there's signal there.

There is definitely signal in the genetics, and there's even more signal, we believe, in looking at the cells that are derived from those different genetics.

Because in principle, you could say all the signal is there at the genetics level, so we don't need to look at the cells.

But our understanding of the biology is so limited at this point that seeing what actually happens at the cellular level is a heck of a lot closer to the human clinical outcome than looking at the genetics directly.

And so we can learn a lot more from it than we could by looking at genetics alone.

So just to get a sense, I don't know if it's easy to do, but what kind of data is useful in this disease in a dish model?

Like what's the source of raw data information?

And also, from my outsider's perspective, sort of biology and cells are squishy things.

So that's another one of those revolutions that have happened in the last 10 years, in that our ability to measure cells very quantitatively has also dramatically increased.

So back when I started doing biology in the late 90s, early 2000s, that was the initial era where we started to measure biology in really quantitative ways using things like microarrays, where you would measure in a single experiment, the activity level, what's called expression level, of every gene in the genome in that sample.

And that ability is what actually allowed us to even understand that there are molecular subtypes of diseases like cancer, where up until that point it's like, oh, you have breast cancer.

But then when we looked at the molecular data, it was clear that there's different subtypes of breast cancer that, at the level of gene activity, look completely different to each other.

So that was the beginning of this process.

Now we have the ability to measure individual cells in terms of their gene activity using what's called single-cell RNA sequencing, which basically sequences the RNA, which is that activity level of different genes, for every gene in the genome.

And you could do that at single-cell levels.

That's an incredibly powerful way of measuring cells.

I mean, you literally count the number of transcripts, so it really turns that squishy thing into something that's digital.
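
As a rough illustration of "counting transcripts," here is a minimal sketch that turns a list of observed transcripts into a per-cell, per-gene count table. It assumes each observation has already been assigned to a cell and a gene; the cell and gene labels are hypothetical, and real single-cell pipelines involve many more steps than this.

```python
from collections import Counter, defaultdict


def count_transcripts(reads):
    """Build a cell -> gene -> count table from (cell, gene) observations."""
    counts = defaultdict(Counter)
    for cell, gene in reads:
        counts[cell][gene] += 1
    return counts


# Hypothetical observations: each tuple is one transcript seen in one cell.
reads = [
    ("cell_1", "GENE_A"),
    ("cell_1", "GENE_A"),
    ("cell_1", "GENE_B"),
    ("cell_2", "GENE_B"),
    ("cell_2", "GENE_B"),
    ("cell_2", "GENE_B"),
]

counts = count_transcripts(reads)
for cell in sorted(counts):
    print(cell, dict(counts[cell]))
```

Each cell ends up as a vector of integers, one per gene, which is the digital representation that downstream machine learning models consume.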

Another tremendous data source that's emerged in the last few years is microscopy, and specifically even super-resolution microscopy, where you could use digital reconstruction to look at subcellular structures, sometimes even things that are below the diffraction limit of light by doing a sophisticated reconstruction.

And again, that gives you tremendous amounts of information at the subcellular level.

There's now more and more ways that amazing scientists out there are developing for getting new types of information from even single cells.

And so that is a way of turning those squishy things into digital data.

So that data set, with machine learning tools, allows you to maybe understand the mechanism of a particular disease.

And if it's possible to describe at a high level, how does that help lead to drug discovery that can help prevent or reverse that mechanism?

So I think there's different ways in which this data could potentially be used.

Some people use it for scientific discovery and say, oh look, we see this phenotype at the cellular level.

So let's try and work our way backwards and think which genes might be involved in pathways that give rise to that.

So that's a very sort of analytical method to sort of work our way backwards using our understanding of known biology.

Some people use it in a somewhat more, you know, forward way.

If that was backward, this would be forward, which is to say, okay, if I perturb this gene, does it show a phenotype that is similar to what I see in disease patients?

And so maybe that gene is actually causal of the disease, so that's a different way.

And then there's what we do, which is basically to take that very large collection of data and use machine learning to uncover the patterns that emerge from it.

So for instance, what are those subtypes that might be similar at the human clinical outcome, but quite distinct when you look at the molecular data.

And then if we can identify such a subtype, are there interventions, which could be a drug or could be a CRISPR gene intervention, that, if you apply them to cells that come from this subtype of the disease, revert the disease state to something that looks more like normal, happy, healthy cells?

And so hopefully if you see that, that gives you a certain hope that that intervention will also have a meaningful clinical benefit to people.

And there's obviously a bunch of things that you would wanna do after that to validate that, but it's a very different and much less hypothesis-driven way of uncovering new potential interventions and might give rise to things that are not the same things that everyone else is already looking at.
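
The idea of uncovering molecular subtypes from data can be sketched with a generic clustering method. The example below uses plain k-means, which is an assumption made for illustration and not a description of insitro's actual methods; the two-gene "expression profiles" and the starting centroids are toy values.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and updating centroids."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old centroid if no point was assigned to it.
            updated.append(mean(members) if members else old)
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Toy profiles: (activity of gene 1, activity of gene 2) for six samples that
# share one clinical label but fall into two molecular groups.
profiles = [(9.0, 1.0), (8.5, 1.5), (9.5, 0.5), (1.0, 8.0), (1.5, 9.0), (0.5, 8.5)]
labels, centroids = k_means(profiles, centroids=[profiles[0], profiles[3]])
print(labels)
```

Once samples are grouped this way, an intervention can be judged by whether it moves treated cells from a disease cluster toward the profile of healthy cells.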

I don't know, just to psychoanalyze my own feeling about our discussion currently:

It's so exciting to talk about fundamentally, well, something that's been turned into a machine learning problem and that can have so much real world impact.

That's kind of exciting, because most of my day is spent with data sets that are, I guess, closer to the newsgroups. So it just feels good to talk about. In fact, I almost don't want to talk to you about machine learning. I want to talk about the fundamentals of the data set, which is an exciting place to be.

It's also what attracts a lot of the people who work at insitro to insitro, because I think all of our machine learning people are outstanding and could go get a job, you know, selling ads online or doing e-commerce or even self-driving cars.

But I think they come to us because they want to work on something that has more of an aspirational nature and can really benefit humanity.

With these approaches, what kinds of diseases do you hope can be helped?

We mentioned Alzheimer's, schizophrenia, type 2 diabetes.

Can you just describe the various kinds of diseases that this approach can help?

Well, we don't know, and I try and be very cautious about making promises about some things.


Daphne Koller: Biomedicine and Machine Learning | Lex Fridman Podcast #93