Dmitry Korkin: Computational Biology of Coronavirus | Lex Fridman Podcast #90

2 hours 9 minutes 1 seconds

S1

Speaker 1

00:00

The following is a conversation with Dmitry Korkin. He's a professor of bioinformatics and computational biology at WPI, Worcester Polytechnic Institute, where he specializes in bioinformatics of complex diseases, computational genomics, systems biology, and biomedical data analytics. I came across Dmitry's work when in February, his group used the viral genome of the COVID-19 to reconstruct the 3D structure of its major viral proteins and their interaction with the human proteins. In effect, creating a structural genomics map of the coronavirus and making this data open and available to researchers everywhere.

S1

Speaker 1

00:40

We talked about the biology of COVID-19, SARS and viruses in general, and how computational methods can help us understand their structure and function in order to develop antiviral drugs and vaccines. This conversation was recorded recently in the time of the coronavirus pandemic. For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way. Stay strong, we're in this together, we'll beat this thing.

S1

Speaker 1

01:09

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST.

S1

Speaker 1

01:30

Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context of the history of money is fascinating. I recommend Ascent of Money as a great book on this history. Debits and credits on ledgers started around 30,000 years ago.

S1

Speaker 1

01:52

The US dollar was created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, was released just over 10 years ago. So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to and just might redefine the nature of money. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10 and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.

S1

Speaker 1

02:28

And now, here's my conversation with Dmitry Korkin.

S2

Speaker 2

02:33

Do you find viruses terrifying or fascinating?

S3

Speaker 3

02:38

When I think about viruses, I mean, I imagine them as those villains that do their work so perfectly well that it is impossible not to be fascinated with them.

S2

Speaker 2

02:57

So what do you imagine when you think about a virus? Do you imagine the individual, sort of these 100 nanometer particle things, or do you imagine the whole pandemic, like society level? When you say the efficiency at which they do their work, do you think of viruses as the millions that occupy a human body or a living organism, society level, like spreading as a pandemic, or do you think of the individual little guy?

S3

Speaker 3

03:26

Yes, I think this is a unique concept that allows you to move from micro scale to the macro scale. All right, so the virus itself, I mean, it's not a living organism.

S3

Speaker 3

03:44

It's a machine to me. It's a machine, but it is perfected in the way that it essentially has a limited number of functions. It needs to do the necessary functions, and it essentially has enough information just to do those functions, as well as the ability to modify itself. So, you know, it's a machine, it's an intelligent machine.

S2

Speaker 2

04:18

So yeah, look, maybe on that point, you're in danger of reducing the power of this thing by calling it a machine, right? But you now mentioned that it's also possibly intelligent. It seems that there are these elements of brilliance that a virus has, of intelligence, of maximizing so many things about its behavior to ensure its survival and its success.

S2

Speaker 2

04:48

So do you see it as intelligent?

S3

Speaker 3

04:49

So, you know, I understand it differently than, you know, the intelligence of humankind or the intelligence of the artificial intelligence mechanisms. I think the intelligence of a virus is in its simplicity.

S3

Speaker 3

05:20

The ability to do so much with so little material and information. But also I think it's interesting, it keeps me thinking, you know, it keeps me wondering whether or not it's also an example of the basic swarm intelligence, where essentially the viruses act as a whole and they're extremely efficient in that.

S2

Speaker 2

06:02

So what do you attribute the incredible simplicity and the efficiency to? Is it the evolutionary process? So maybe another way to ask that, if you look at the next hundred years, are you more worried about the natural pandemics or the engineered pandemics?

S2

Speaker 2

06:20

So how hard is it to build a virus?

S3

Speaker 3

06:23

Yes, it's a very, very interesting question because obviously there's a lot of conversation about, you know, whether we are capable of engineering an even worse virus. I personally expect and am mostly concerned with the naturally occurring viruses simply because we keep seeing that. We keep seeing new strains of influenza emerging, some of them becoming pandemic, we keep seeing new strains of coronaviruses emerging.

S3

Speaker 3

07:07

This is a natural process and I think this is why it's so powerful. You know, if you ask me, I've read papers about scientists trying to study the capacity of modern biotechnology to alter the viruses. But I hope that it won't be our main concern in the near future.

S2

Speaker 2

07:48

What do you mean by hope?

S3

Speaker 3

07:54

Well, you know, if you look back and look at the history of the most dangerous viruses, right, the first thing that comes to mind is smallpox. So right now there is perhaps a handful of places where the strains of this virus are stored. So this is essentially the effort of the whole society to limit the access to those viruses.

S2

Speaker 2

08:32

And you mean in a lab, in a controlled environment, in order to study? And then smallpox is one of the viruses for which, it should be stated, a vaccine has been developed.

S3

Speaker 3

08:43

Yes, yes. And until the 70s, in my opinion, it was perhaps the most dangerous thing that was there.

S2

Speaker 2

08:56

Is that a very different virus than the influenza and the coronaviruses?

S3

Speaker 3

09:04

It is. It is different in several aspects. Biologically, it's a so-called double-stranded DNA virus, but also in the way that it is much more contagious. So the R naught for, so this is the...

S2

Speaker 2

09:30

What's R naught?

S3

Speaker 3

09:31

R naught is essentially the average number of other people that a person infected by the virus can spread it to. And there is still some discussion about the estimates for the current virus.

S3

Speaker 3

10:00

The estimations vary between 1.5 and 3.

S3

Speaker 3

10:07

In case of smallpox, it was 5 to 7. And we're talking about the exponential growth, right? So that's a very big difference.

S3

Speaker 3

10:23

It's not the most contagious one. Measles, for example, is, I think, 15 and up. So, you know, but it's definitely, definitely more contagious than the seasonal flu, than the current coronavirus, or SARS for that matter.
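To make the exponential growth mentioned above concrete, here is a minimal sketch. It assumes the simplest possible model: every infected person infects exactly R naught others in each generation, the population is fully susceptible, and nothing is done to slow the spread. The function name `cases_by_generation` and the midpoint value of 6 for smallpox are illustrative assumptions, not something stated in the conversation.

```python
def cases_by_generation(r0, generations):
    """Return new infections per generation, starting from a single case.

    Simplifying assumption: each case infects exactly r0 others,
    with no immunity and no interventions.
    """
    return [r0 ** g for g in range(generations + 1)]


if __name__ == "__main__":
    # R naught values quoted in the conversation; 6.0 is an assumed
    # midpoint of the "5 to 7" range given for smallpox.
    examples = [
        ("current coronavirus, low estimate", 1.5),
        ("current coronavirus, high estimate", 3.0),
        ("smallpox (assumed midpoint)", 6.0),
        ("measles", 15.0),
    ]
    for label, r0 in examples:
        new_cases = cases_by_generation(r0, 5)[-1]
        print(f"{label}: about {round(new_cases)} new cases in generation 5")
```

Even under these crude assumptions, the gap between an R naught of 3 and one of 6 grows by a factor of two with every generation, which is the "very big difference" described above.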

S2

Speaker 2

10:50

What makes a virus more contagious? I'm sure there's a lot of variables that come into play, but is it that whole discussion of aerosol and like the size of droplets, if it's airborne, or is there some other stuff that's more biology-centered?

S3

Speaker 3

11:04

I mean, there are a lot of components. There are biological components, and there are also social components. The ways in which the virus is spread is definitely one. The ability of the virus to stay on the surfaces, to survive.

S3

Speaker 3

11:29

The ability of the virus to replicate fast, once it's in the cell or whatever, once it's inside the host. And interestingly enough, something that I think we didn't pay that much attention to is the incubation period, where hosts are asymptomatic.

S3

Speaker 3

11:53

And now it turns out that another thing that one really needs to take into account is the percentage of the asymptomatic population, because those people still shed this virus and they still are contagious.

S2

Speaker 2

12:13

I saw there's the Iceland study, which I think is probably the most impressive size-wise, shows 50% asymptomatic for this virus. I also recently learned the swine flu is just the number of people who got infected was in the billions. It was some crazy number.

S2

Speaker 2

12:38

It was like, like 20% of the, 30% of the population, something crazy like that. So the lucky thing there is the fatality rate is low. But the fact that a virus can just take over an entire population so quickly is terrifying.

S3

Speaker 3

12:56

I think, I mean, this is, you know, that's perhaps my favorite example of a butterfly effect. Because it's really, I mean, it's even tinier than a butterfly. And look at, you know, if you think about it, right?

S3

Speaker 3

13:13

So it used to be in those bat species. And perhaps because of a couple of small changes in the viral genome, it first became capable of jumping from bats to humans, and then it became capable of jumping from human to human. So this is, I mean, it's not even the size of a virus, it's the size of several, you know, several atoms or a few atoms. And all of a sudden this change has such a major impact.

S2

Speaker 2

13:58

So is that a mutation on a single virus? So if we talk about the flap of a butterfly wing, like, what's the first flap?

S3

Speaker 3

14:09

Well, I think these are the mutations that made this virus capable of jumping from bat species to humans. Of course, the scientists are still trying to find, I mean, they're still even trying to find who was the first infected, right? The patient zero.

S3

Speaker 3

14:31

The first human. The first human infected, right? I mean, the fact that there are coronaviruses, different strains of coronaviruses in various bat species, I mean, we know that. So we, you know, virologists observe them, they study them.

S3

Speaker 3

14:48

They look at their genomic sequences. They're trying, of course, to understand what makes these viruses jump from bats to humans. Similar to that, in influenza, I think a few years ago there was this, you know, interesting story where several groups of scientists studying influenza virus essentially, you know, made experiments to show that this virus can jump from one species to another, you know, by changing, I think, just a couple of residues. And of course, it was very controversial.

S3

Speaker 3

15:39

I think there was a moratorium on this study for a while, but then the study was released, it was published.

S2

Speaker 2

15:47

So there was a moratorium, because it shows that through engineering it, through modifying it, you can make it jump.

S3

Speaker 3

15:56

Yes. Yeah. I personally think it is important to study this. I mean, we should be informed.

S3

Speaker 3

16:05

We should try to understand as much as possible in order to prevent it.

S2

Speaker 2

16:12

But so then the engineering aspect there is, can't you then just start searching, because there are so many strains of viruses out there? Can't you just search for the ones in bats that are the deadliest from the virologist's perspective and then just try to engineer, try to see how to, but see, there's a nice aspect to it. The really nice thing about engineering viruses, it has the same problem as nuclear weapons, is it's hard for it to not lead to mutual self-destruction.

S2

Speaker 2

16:49

So you can't control a virus, it can't be used as a weapon, right?

S3

Speaker 3

16:53

Yeah, that's why in the beginning I said I'm hopeful, because there definitely need to be regulations introduced. And I mean, as a scientific society, we are in charge of, you know, taking the right actions, making the right decisions. But I think we will benefit tremendously by understanding the mechanisms by which the virus can jump, by which the virus can become more dangerous to humans.

S3

Speaker 3

17:41

Because all these answers would eventually lead to designing better vaccines, hopefully universal vaccines, right? And that would be a triumph of science.

S2

Speaker 2

17:56

So what's the universal vaccine? How universal is universal?

S3

Speaker 3

18:02

Well, I mean, you know, so...

S2

Speaker 2

18:04

What's the dream, I guess, because you kind of mentioned the dream of this.

S3

Speaker 3

18:05

I would be extremely happy if, you know, we designed a vaccine that is able, I mean, I'll give you an example, right? So every year we do a seasonal flu shot.

S3

Speaker 3

18:21

The reason we do it is because we are in an arms race, our vaccines are in an arms race with a constantly changing virus. Now if the next influenza pandemic occurs, most likely this vaccine would not save us. Right? Although it's, you know, the same virus, it might be a different strain.

S3

Speaker 3

18:52

So if we're able to essentially design a vaccine against influenza A virus, no matter what the strain is, no matter which species it jumped from, that would be, I think that would be a huge, huge progress and advancement.

S2

Speaker 2

19:13

You mentioned smallpox until the 70s might have been something that you would be worried the most about. What about these days? Well, we're sitting here in the middle of a COVID-19 pandemic, but nevertheless, what is your biggest worry virus-wise these days?

S2

Speaker 2

19:34

What are you keeping your eye out on?

S3

Speaker 3

19:38

It looks like, based on the past several years of the new viruses emerging, I think we're still dealing with different types of influenza. I mean, it's also the H7N9 avian flu that emerged, I think, a couple of years ago in China. I think the mortality rate was incredible.

S3

Speaker 3

20:13

I mean, it was, you know, I think above 30 percent, you know. So this is huge. I mean, luckily for us, this strain was not pandemic, all right? So it was jumping from birds to humans, but I don't think it was actually transmittable between humans.

S3

Speaker 3

20:35

And this is actually a very interesting question which scientists try to understand, right? So the balance, the delicate balance between the virus being very contagious, right, so efficient in spreading, and the virus being very pathogenic, causing harm and death to its hosts. So it looks like the more pathogenic the virus is, the less contagious it is.

S2

Speaker 2

21:14

Is that a property of biology or what is it?

S3

Speaker 3

21:17

I don't have an answer to that. And I think this is still an open question.

S3

Speaker 3

21:22

But if you look at, with the coronavirus, for example, if you look at the deadlier relative MERS, MERS was never a pandemic virus. But again, the mortality rate from MERS is far above, I think, 20 or 30%.

S2

Speaker 2

21:52

So whatever is making this all happen doesn't want us dead, because it's balancing out nicely. I mean, how do you explain that we're not dead yet? Because there's so many viruses and they're so good at what they do.

S2

Speaker 2

22:11

Why do they keep us alive?

S3

Speaker 3

22:14

I mean, we also have a lot of protection, right? So with the immune system. And so, I mean, we do have ways to fight against those viruses.

S3

Speaker 3

22:31

And I think now we're much better equipped, right, with the discoveries of vaccines, and you know, there are vaccines against the viruses that maybe 200 years ago would wipe us out completely. But because of these vaccines, we are actually capable of eradicating them pretty much fully, as is the case with smallpox.

S2

Speaker 2

22:58

So if we could, can we go to the basics a little bit of the biology of the virus? How does a virus infect the body?

S3

Speaker 3

23:00

So I think there are some key steps that the virus needs to perform.

S3

Speaker 3

23:13

And of course the first one, the viral particle needs to get attached to the host cell. In the case of coronavirus, there is a lot of evidence that it actually interacts in the same way as the SARS coronavirus. So it gets attached to the ACE2 human receptor. And so there is, I mean, as we speak, there is a growing number of papers suggesting it.

S3

Speaker 3

23:46

Moreover, most recent, I think, most recent results suggest that this virus attaches more efficiently to this human receptor than SARS.

S2

Speaker 2

24:00

So just to sort of back off, so there's a family of viruses, the coronaviruses, and SARS, whatever the heck, forgot, this is respiratory, whatever that stands for.

S3

Speaker 3

24:12

So SARS actually stands for the disease that you get: severe acute respiratory syndrome.

S2

Speaker 2

24:20

So SARS is the first strain and then there's MERS.

S3

Speaker 3

24:25

MERS and there is, yes. Scientists actually know more than 3 strains. I mean, so there is the MHV strain, which is considered to be a canonical model, a disease model in mice.

S3

Speaker 3

24:47

And so there is a lot of work done on this virus because it's...

S2

Speaker 2

24:52

But it hasn't jumped to humans yet?

S3

Speaker 3

24:53

No, no. Oh, interesting. Yes.

S2

Speaker 2

24:55

That's fascinating. So, and then you mentioned ACE2. So when you say attach, proteins are involved on both sides.

S3

Speaker 3

25:06

Yes, so we have this infamous spike protein on the surface of the virion particle, and it does look like a spike. And I mean, it's essentially because of this protein that we call the coronavirus coronavirus. So that's what makes the corona on the surface.

S3

Speaker 3

25:29

So this protein, it doesn't act alone. It actually makes three copies and forms a so-called trimer. So this trimer is essentially a functional unit, a single functional unit that starts interacting with the ACE2 receptor. So this is again another protein that now sits on the surface of a human cell, a host cell I would say, and that's essentially the way the virus anchors itself to the host cell. Because then it needs to actually get inside, you know, it fuses its membrane with the host membrane.

S3

Speaker 3

26:24

It releases the key components, it releases its, you know, RNA, and then essentially hijacks the machinery of the cell, because none of the viruses that we know of have ribosomes, the machinery that allows us to print out proteins. So in order to print out proteins that are necessary for the functioning of this virus, it actually needs to hijack the host ribosomes.

S2

Speaker 2

27:00

So a virus is an RNA wrapped in a bunch of proteins, one of which is this functional mechanism of a spike protein that does the attachment.

S3

Speaker 3

27:09

So yeah, so if you look at this virus, there are several basic components, right? So we start with the spike protein. This is not the only surface protein, the protein that lives on the surface of the viral particle.

S3

Speaker 3

27:24

So there is also, perhaps the protein with the highest number of copies, the membrane protein. It essentially forms the envelope of the viral particle and helps to maintain a certain curvature. Then there is another protein called envelope protein or E protein, and it actually occurs in far smaller quantities. And there is still ongoing research into what exactly this protein does.

S3

Speaker 3

28:13

So these are sort of the 3 major surface proteins that make the viral envelope. And when we go inside, then we have another structural protein called nucleocapsid protein. And the purpose of this protein is to protect the viral RNA. So it actually binds to the viral RNA, creates a capsid.

S3

Speaker 3

28:40

And so the rest of the viral information is inside of this RNA. And, you know, if you compare the number of genes or, you know, proteins that are made from these genes, it's significantly higher than that of influenza virus, for example. Influenza virus has, I think, around 8 or 9 proteins, where this one has at least 29.

S2

Speaker 2

29:13

Wow. That has to do with the length of the RNA strand? I mean, what?

S3

Speaker 3

29:17

So, I mean, so it affects the length of the RNA strand, right, so because you essentially need to have sort of the minimum amount of information to encode those genes.

S2

Speaker 2

29:29

How many proteins did you say?

S3

Speaker 3

29:30

Say again. 29. 29 proteins.

S3

Speaker 3

29:34

Yes, so this is something definitely interesting because believe it or not, we've been studying coronaviruses for over 2 decades. We've yet to uncover all functionalities of its proteins.

S2

Speaker 2

29:52

Could we maybe take a small tangent and can you say how one would try to figure out what a function of a particular protein is? So you've mentioned people are still trying to figure out what the function of the envelope protein might be. What's the process?

S3

Speaker 3

30:07

So this is where the research that computational scientists do might be of help. Because, you know, in the past several decades, we actually have collected a pretty decent amount of knowledge about different proteins in different viruses.

S3

Speaker 3

30:34

So what we can actually try to do, and this could be sort of our first lead to a possible function, is this: say we have the genome of the novel coronavirus, and we identify the potential proteins. Then in order to infer the function, we can see whether those proteins are similar to the ones that we already know. Okay? In such a way we can, for example, clearly identify some critical components, such as RNA polymerase or different types of proteases.

S3

Speaker 3

31:19

These are the proteins that essentially clip the protein sequences. And so this works in many cases. However, in some cases you have truly novel proteins. And this is a much more difficult task.

S3

Speaker 3

31:40

Now as a small pause, when you say similar, like what if some parts are different and some parts are similar? Like how do you disentangle that? You know, it's a big question. Of course, you know, what bioinformatics does, it does predictions, right?

S3

Speaker 3

32:00

So those predictions, they have to be validated by experiments.
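The idea of inferring function from similarity to already annotated proteins can be sketched in a few lines. This is only a toy illustration under loud assumptions: real pipelines use alignment tools and curated databases, whereas this sketch compares sets of 3-residue fragments (k-mers) with a Jaccard score. The sequences, the labels, the threshold, and the function names `kmers`, `jaccard`, and `infer_function` are all made up for illustration.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-residue fragments in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def infer_function(query, annotated, k=3, threshold=0.3):
    """Return (label, score) of the most similar annotated protein.

    If nothing scores at or above the threshold, the label is None,
    which corresponds to the "truly novel protein" case.
    """
    best_label, best_score = None, 0.0
    for label, seq in annotated.items():
        score = jaccard(query, seq, k)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


if __name__ == "__main__":
    # Hypothetical toy database; these are not real protein sequences.
    annotated = {
        "polymerase-like": "MKTAYIAKQRQISFVKSHFSRQ",
        "protease-like": "GSHMLEDPVVLQRRDWEN",
    }
    print(infer_function("MKTAYIAKQRQISFVKSHFARQ", annotated))
    print(infer_function("WWWWWWWW", annotated))
```

Whatever label comes back is a prediction, a first lead, and it still has to be validated by experiment.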

S2

Speaker 2

32:05

Functional or structural predictions?

S3

Speaker 3

32:08

Both, I mean, we do structural predictions, we do functional predictions, we do interactions predictions.

S2

Speaker 2

32:14

Oh, so this is interesting. So you just generate a lot of predictions, like reasonable predictions based on structure and function interaction, like you said, and then here you go. That's the power of bioinformatics is data grounded, good predictions of what should happen.

S3

Speaker 3

32:32

So the way I see it, we're helping experimental scientists to streamline their discovery process.

S2

Speaker 2

32:43

And the experimental scientists, is that what a virologist is?

S3

Speaker 3

32:47

So yeah, virology is one of the experimental sciences that focus on viruses. They often work with other experimental scientists, for example, the molecular imaging scientists, right? So the viruses often can be viewed and reconstructed through electron microscopy techniques.

S3

Speaker 3

33:12

So, but these are specialists that are not necessarily virologists. They work with small particles, whether it's viruses or it's an organelle of a human cell, whether it's a complex molecular machinery. So the techniques that are used are very similar in their essence. And so, yeah, typically, and we see it now, the research that is emerging and that is needed often involves collaborations between virologists, biochemists, people from pharmaceutical sciences, computational sciences, so we have to work together.

S2

Speaker 2

34:19

So from my perspective, just to step back, sometimes I look at this stuff, just how much we understand about RNA and DNA, how much we understand about protein, like your work, the amount of proteins that you're exploring, is it surprising to you that we were able, we descendants of apes, were able to figure all of this out? So you're a computer scientist, so for me, from a computer science perspective, I know how to write a Python program, things are clear, but biology is a giant mess, it feels like to me, from an outsider's perspective. How surprising is it, amazing is it, that we're able to figure this stuff out?

S3

Speaker 3

35:04

You know, if you look at the, you know, how computational science and computer science was evolving, right? I think it was just a matter of time that we would approach biology. So we started from, you know, applications to much more fundamental systems, physics, and now we are...

S3

Speaker 3

35:27

Or small chemical compounds. So now we are approaching the more complex biological systems. And I think it's a natural evolution of the computer science, of mathematics.

S2

Speaker 2

35:48

So sure, that's the computer science side. I just meant even in higher levels. So that to me is surprising, that computer science can offer help in this messy world.

S2

Speaker 2

35:57

But I just mean it's incredible that the biologists and the chemists can figure all this out. Or does that just sound ridiculous to you, that of course they would? It just seems like a very complicated set of problems. Like the variety of the kinds of things that could be produced in the body.

S2

Speaker 2

36:16

Just like you said, 29 proteins, I mean, just getting a hang of it so quickly, it just seems impossible to me.

S3

Speaker 3

36:27

I agree, I mean, and I have to say, we are in the very, very beginning of this journey. I mean, we've yet to comprehend, not even try to understand and figure out all the details, but we've yet to comprehend the complexity of the cell.

S2

Speaker 2

36:49

We know that neuroscience is not even at the beginning of understanding the human mind. So where's biology sit in terms of understanding the function, deeply understanding the function of viruses and cells?

S2

Speaker 2

37:10

So sometimes it's easy to say when you talk about function, what you really refer to is perhaps not a deep understanding but more of a understanding sufficient to be able to mess with it using an antiviral, like mess with it chemically to prevent some of its function. Or do you understand the function deeply?

S3

Speaker 3

37:32

I think we are much farther in terms of understanding of the complex genetic disorder, such as cancer, where you have layers of complexity. And we, in my laboratory, we're trying to contribute to that research, but we're also, you know, we're overwhelmed with how many different layers of complexity, different layers of mechanisms that can be hijacked by cancer simultaneously. And so, you know, I think biology in the past 20 years, again, from the perspective of the outsider, because I'm not a biologist, but I think it has advanced tremendously.

S3

Speaker 3

38:18

And one thing where computational scientists and data scientists are now becoming very, very helpful comes from the fact that we are now able to generate a lot of information about the cell, whether it's next generation sequencing or transcriptomics, whether it's live imaging information, whether it is, you know, complex interactions between proteins or between proteins and small molecules such as drugs. We are becoming very efficient in generating this information. And now the next step is to become equally efficient in processing this information and extracting the key knowledge from that.

S2

Speaker 2

39:20

That could then be validated with experiment.

S3

Speaker 3

39:23

Yes. I'm

S2

Speaker 2

39:24

back. So maybe then going all the way back, we were talking, you said the first step is seeing if we can match the new proteins you found in the virus against something we've seen before to figure out its function. And then you also mentioned that, but there could be cases where it's a totally new protein. Is there something bioinformatics can offer when it's a totally new protein?

S3

Speaker 3

39:48

This is where many of the methods, and you're probably aware of the case of machine learning, many of these methods rely on the previous knowledge. Right? So things where we try to do it from scratch are incredibly difficult.

S3

Speaker 3

40:07

You know, something that we call ab initio. And this is, I mean, it's not just the function. I mean, you know, we've yet to have a robust method to predict the structures of these proteins ab initio, by not using any templates of other related proteins.

S2

Speaker 2

40:32

So a protein is a chain of amino acids. It's residues. Residues, yeah.

S2

Speaker 2

40:39

And then somehow, magically, maybe you can tell me, they seem to fold in incredibly weird and complicated 3D shapes. Yes. So, and that's where actually the idea of protein folding, or just not the idea, but the problem of figuring out how the hell they-

S3

Speaker 3

41:00

Yeah, the concept.

S2

Speaker 2

41:01

The concept, yeah. How they fold into those weird shapes comes in. So that's another side of computational work.

S2

Speaker 2

41:09

So can you describe what protein folding from the computational side is, and maybe your thoughts on the Folding@home efforts, where a lot of people know that you can use your machine to do protein folding?

S3

Speaker 3

41:22

So yeah, protein folding is one of those million-dollar prize challenges, right? So the reason for that is we've yet to understand precisely how the protein gets folded so efficiently, to the point that in many cases where you try to unfold it with high temperature, it actually folds back into its original state. All right, so we know a lot about the mechanisms, right?

S3

Speaker 3

41:59

But putting those mechanisms together and making sense of them is a computationally very expensive task.

S2

Speaker 2

42:11

In general, do proteins fold, can they fold in an arbitrarily large number of ways or do they usually fold in a very small number of ways?

S3

Speaker 3

42:19

It's typically, I mean, we tend to think that, you know, there is one sort of canonical fold for a protein, although there are many cases where the protein, you know, upon destabilization, can be folded into a different conformation. And this is especially true when you look at sort of proteins that include more than one structural unit.

S3

Speaker 3

42:45

So those structural units, we call them protein domains. Essentially, a protein domain is a single unit that typically is evolutionarily preserved, that typically carries out a single function, and typically has a very distinct fold, right? The structure, the 3D structure organization. But it turns out that if you look at humans, an average protein in a human cell would have about 2 or 3 such subunits.

S3

Speaker 3

43:19

And then there is how they fold into the sort of, you know, next-level fold, right? So within a subunit there's folding, and then? And then they fold into the larger 3D structure, right?

S2

Speaker 2

43:40

And for all of that, there's some understanding of the basic mechanisms, but not enough to put it together and be able to fold it.

S3

Speaker 3

43:44

We're still, I mean, we're still struggling. We're getting pretty good at folding relatively small proteins, up to a hundred residues. But we're still far away from folding larger proteins.

S3

Speaker 3

44:02

And some of them are notoriously difficult. For example, transmembrane proteins, proteins that sit in the membranes of the cell. They're incredibly important, but they are incredibly difficult to solve.

S2

Speaker 2

44:19

And so basically, there's a lot of degrees of freedom, how it folds, and so it's a combinatorial problem, or it just explodes, there's so many dimensions.

S3

Speaker 3

44:28

Well, it is a combinatorial problem, but that doesn't mean we cannot approach it with something other than brute force. And so machine learning approaches have emerged that try to tackle it.
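
As a rough illustration of why brute force fails here, the sketch below counts conformations under a Levinthal-style simplification. The function names, the three-states-per-residue figure, and the sampling rate are assumptions made for illustration; none of them come from the conversation, and real proteins are not this simple.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def conformation_count(n_residues: int, states_per_residue: int = 3) -> int:
    """Number of chain conformations if every residue independently
    takes one of `states_per_residue` discrete states (an assumption)."""
    return states_per_residue ** n_residues


def log10_years_to_enumerate(
    n_residues: int,
    states_per_residue: int = 3,
    samples_per_second: float = 1e12,  # hypothetical sampling rate
) -> float:
    """log10 of the years needed to visit every conformation once.

    Working in log space avoids overflow for long chains.
    """
    log10_count = n_residues * math.log10(states_per_residue)
    return log10_count - math.log10(samples_per_second) - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    # Chain lengths chosen only to show how fast the count grows.
    for n in (10, 50, 100):
        print(n, conformation_count(n), round(log10_years_to_enumerate(n), 1))
```

The count grows exponentially with chain length, which is why methods that learn from known structures are attractive compared with exhaustive enumeration.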

S2

Speaker 2

44:47

So Folding@home, I don't know how familiar you are with it, but is that using machine learning or is it more brute force?

S3

Speaker 3

44:55

No, so Folding@home, it was originally, and I remember, I mean, it was a long time ago. I was a postdoc and we learned about this game, because it was originally designed as a game. And I took a look at it, and it's interesting, because it's really very transparent, very intuitive.

S3

Speaker 3

45:22

So, and from what I heard, I've yet to introduce it to my son, but kids are actually getting very good at folding the proteins. And it came to me not as a surprise, but actually as a sort of manifestation of our capacity to solve these kinds of problems, when a paper was published in one of these top journals with the co-authors being the actual players of this game. And what happened was that they managed to get better structures than the scientists themselves.

S3

Speaker 3

46:15

Than the scientists themselves. So that was, I mean, it was kind of a profound revelation that problems that are so challenging for computational science may be not that challenging for a human brain.

S2

Speaker 2

46:37

Well, that's a really good, that's a hopeful message always when there's the proof of existence, the existence proof that it's possible, that's really interesting. But it seems, what are the best ways to do protein folding now? So if you look at what DeepMind does with AlphaFold.

S2

Speaker 2

47:02

AlphaFold, yes. So they kind of, that's a learning approach. What's your sense? I mean, your background is in machine learning, but is this a learnable problem?

S2

Speaker 2

47:12

Is this still brute force? Are we in the Garry Kasparov, Deep Blue days, or are we in the AlphaGo, playing-the-game-of-Go days of folding?

S3

Speaker 3

47:24

Well, I think we are advancing in this direction. I mean, if you look, there is a sort of Olympic Games for protein folders called CASP. It's essentially a competition where different teams are given exactly the same protein sequences and they try to predict their structures.

S3

Speaker 3

47:49

Right? And of course there are different subtasks, but in the recent competition AlphaFold was among the top-performing teams, if not the top-performing team. So there is definitely a benefit from the structural data that have been generated in the past several decades. And certainly, we now have the capacity to summarize this data, to generalize this data, and to use those principles in order to predict protein structures.

S2

Speaker 2

48:33

That's one of the really cool things here. Maybe you can comment on it: there seem to be these open data sets of proteins. How did that?

S3

Speaker 3

48:44

Protein Data Bank?

S2

Speaker 2

48:46

Yeah, Protein Data Bank. I mean, is this a recent thing for just the coronavirus? Or has this been a?

S3

Speaker 3

48:53

It's been for many, many years. I believe the first protein data bank was designed on flashcards. So, yes, I mean, this is a great example of the community efforts of everyone contributing.

S3

Speaker 3

49:16

Because every time you solve a protein or a protein complex, this is where you submit it. And the scientists get access to it, scientists get to test it, and we, bioinformaticians, use this information to make predictions.

S2

Speaker 2

49:40

So there's no culture of hoarding discoveries here. I mean, you've released a bunch of proteins that were matching, whatever, we'll talk about the details a little bit, but it's kind of amazing how open the culture here is.

S3

Speaker 3

50:04

It is, and I think this pandemic actually demonstrated the ability of the scientific community to solve this challenge collaboratively. If anything, it actually moved us to a brand new level of collaboration, of the efficiency with which people establish new collaborations, with which scientists offer their help to each other.

S2

Speaker 2

50:44

And publish results too, it's very interesting. There are a few journals that are trying to do a very accelerated review cycle, but there are so many preprints, just posting a paper and getting it out. I think it's fundamentally changing the way we think about papers.

S3

Speaker 3

51:04

Yes, I mean, the way we think about knowledge, I would say. Yes, I completely agree. I think now the knowledge is becoming the core value, not the paper or the journal where this knowledge is published. And again, we are living in times where the idea that the most important value is in the knowledge is becoming really crystallized.

S2

Speaker 2

51:43

So maybe you can comment: what do you think the future of that knowledge sharing looks like? You have this paper that I hope we'll get a chance to talk about a little bit, and it has a really nice abstract and introduction, it has all the usual parts. I mean, it probably took a long time to put together. But is that going to remain? You could have communicated a lot of the fundamental ideas here in a much shorter form that's less traditionally acceptable in the journal context.

S3

Speaker 3

52:02

So, well, you know, the first version we posted was not even on bioRxiv, because bioRxiv back then was essentially overwhelmed with the number of submissions. For our submission, I think it took five or six days just for it to be screened and put online.

S3

Speaker 3

52:43

So essentially we put the first preprint on our website, and it started getting accessed right away. This original preprint was in a much rougher shape than this paper. But we tried, I mean, we honestly tried to be as compact as possible in introducing the information that is necessary to explain our results.

S2

Speaker 2

53:24

So maybe you can dive right in, if it's okay. Sure.

S2

Speaker 2

53:29

So there's a paper called Structural Genomics of SARS-CoV-2. How do you even pronounce it?

S3

Speaker 3

53:35

SARS-CoV-2. CoV-2? Yeah.

S2

Speaker 2

53:39

By the way, CoV-2 is such a terrible name, but it's stuck. Anyway. Yes.

S2

Speaker 2

53:43

SARS-CoV-2 indicates evolutionary conserved functional regions of viral proteins. So this is looking at all kinds of proteins that are part of this novel coronavirus and how they match up against the previous other kinds of coronaviruses. I mean, there are a lot of beautiful figures. There are so many questions I could ask here, but maybe to begin: how do you get started doing this paper?

S2

Speaker 2

54:11

So how do you start to figure out the 3D structure of a novel virus?

S3

Speaker 3

54:16

Yes, so there is actually a little story behind it. The story actually dates back to September of 2019. You probably remember that back then we had another dangerous virus, triple E virus, Eastern Equine Encephalitis virus.

S2

Speaker 2

54:40

Can you maybe linger on it? I have to admit I was sadly completely unaware.

S3

Speaker 3

54:46

So that was actually a virus outbreak that happened in New England only. The danger of this virus was that it actually targeted your brain. So there were deaths from this virus; the main vector was mosquitoes.

S3

Speaker 3

55:12

And obviously fall is the time when you have a lot of them in New England. And, you know, people realized this is actually a very dangerous thing. So it had an impact on the local economy. The schools were closed past 6:00.

S3

Speaker 3

55:37

No activities outside for the kids, because the kids were suffering quite tremendously when infected with this virus.

S2

Speaker 2

55:48

How do I not know about this? Were universities impacted?

S3

Speaker 3

55:52

It was in the news. I mean, Boston was not necessarily impacted to a high degree, but the MetroWest area was, and it actually spread, I think, all the way to New Hampshire and Connecticut.

S2

Speaker 2

56:09

And you mentioned affecting the brain. That's one other comment

S3

Speaker 3

56:12

we should make. So you mentioned ACE2 for the coronavirus. So these viruses kind of attach to something in the body.

S3

Speaker 3

56:23

So it essentially attaches to these proteins in those cells in the body where those proteins are expressed, where they actually have them in abundance.

S2

Speaker 2

56:35

So sometimes that could be in the lungs, that could be in the brain, that could be in some of that.

S3

Speaker 3

56:39

So right now, from what I read, it's the epithelial cells. So essentially the cells that are covering the surface: inside the nasal surfaces, the throat.

S3

Speaker 3

57:05

The lung cells and, I believe, the liver, as well as a couple of other organs where they are actually expressed in abundance. That's for the ACE2 receptors? For the ACE2 receptors.

S2

Speaker 2

57:16

So okay, so back to the story.

S3

Speaker 3

57:19

The outbreak in the fall. So, now, the impact of this virus is significant. However, it's a pretty local problem, to the point that this is something we would call a neglected disease, because it's not big enough to make the drug design companies design a new antiviral or a new vaccine.

S3

Speaker 3

57:52

It's not big enough to generate a lot of grants from the national funding agencies. So does it mean we cannot do anything about it? And so what I did: I taught a bioinformatics class at Worcester Polytechnic Institute, and we are very much a problem-based learning institution. So I thought that would be a perfect project.

S2

Speaker 2

58:31

It's kind of an ongoing case study.

S3

Speaker 3

58:31

So we essentially designed a study where we tried to use bioinformatics to understand as much as possible about this virus. And a very substantial portion of the study was to understand the structures of the proteins, to understand how they interact with each other and with the host proteins, and to try to understand the evolution of this virus.

S3

Speaker 3

59:08

It's obviously a very important question: where it will evolve further, how it happened here. So we did all these projects, and now I'm trying to put them into a paper where all these undergraduate students will be co-authors. But essentially the projects were finished right about mid-December. And a couple of weeks later, I heard about this mysterious new virus that was reported in Wuhan.

S3

Speaker 3

59:51

And immediately I thought that, well, we just did that. Can't we do the same thing with this virus?