Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115

2 hours 10 minutes 5 seconds

🇬🇧 English

S1

Speaker 1

00:00

The following is a conversation with Dileep George, a researcher at the intersection of neuroscience and artificial intelligence, co-founder of Vicarious with Scott Phoenix, and formerly co-founder of Numenta with Jeff Hawkins, who's been on this podcast, and Donna Dubinsky. From his early work on hierarchical temporal memory to recursive cortical networks to today, Dileep has always sought to engineer intelligence that is closely inspired by the human brain. As a side note, I think we understand very little about the fundamental principles underlying the function of the human brain, but the little we do know gives hints that may be more useful for engineering intelligence than any idea in mathematics, computer science, physics, and scientific fields outside of biology. And so the brain is a kind of existence proof that says it's possible, keep at it.

S1

Speaker 1

00:56

I should also say that brain-inspired AI is often overhyped and used as fodder for marketing speak, just as quantum computing is. But I'm not afraid of exploring these sometimes overhyped areas, since where there's smoke, there's sometimes fire. Quick summary of the ads. 3 sponsors, Babbel, Raycon Earbuds, and Masterclass.

S1

Speaker 1

01:19

Please consider supporting this podcast by clicking the special links in the description to get the discount. It really is the best way to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with 5 stars on Apple Podcasts, support on Patreon, or connect with me on Twitter at Lex Fridman. As usual, I'll do a few minutes of ads now, and never any ads in the middle that can break the flow of the conversation.

S1

Speaker 1

01:45

This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to babbel.com and use code Lex to get 3 months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts.

S1

Speaker 1

02:10

Let me read a few lines from the Russian poem Noch, ulitsa, fonar, apteka by Alexander Blok that you'll start to understand if you sign up to Babbel. Noch, ulitsa, fonar, apteka, bessmyslenny i tuskly svet. Zhivi eshchyo khot chetvert veka, vsyo budet tak, iskhoda net. Now, I say that you'll only start to understand this poem because Russian starts with a language and ends with a vodka.

S1

Speaker 1

02:43

The latter part is definitely not endorsed or provided by Babbel and will probably lose me the sponsorship. But once you graduate from Babbel, you can enroll in my advanced course of late night Russian conversation over vodka. I have not yet developed an app for that. It's in progress.

S1

Speaker 1

03:00

So get started by visiting babbel.com and use code Lex to get 3 months free. This show is sponsored by Raycon earbuds. Get them at buyraycon.com slash Lex. They've become my main method of listening to podcasts, audio books, and music when I run, do push-ups and pull-ups, or just living life.

S1

Speaker 1

03:20

In fact, I often listen to Brown Noise with them when I'm thinking deeply about something, it helps me focus. They're super comfortable, pair easily, great sound, great bass, 6 hours of playtime. I've been putting in a lot of miles to get ready for a potential ultra marathon and listening to audiobooks on World War II. The sound is rich and really comes in clear.

S1

Speaker 1

03:45

So again, get them at buyraycon.com slash Lex. This show is sponsored by Masterclass. Sign up at masterclass.com slash Lex to get a discount and to support this podcast. When I first heard about Masterclass, I thought it was too good to be true.

S1

Speaker 1

04:01

I still think it's too good to be true. For 180 bucks a year, you get an all-access pass to watch courses from, to list some of my favorites, Chris Hadfield on space exploration, Neil deGrasse Tyson on scientific thinking and communication, Will Wright, creator of SimCity and The Sims, on game design. Every time I do this read, I really want to play a city builder game. Carlos Santana on guitar, Garry Kasparov on chess, Daniel Negreanu on poker, and many more.

S1

Speaker 1

04:30

Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com to get a discount and to support this podcast. And now here's my conversation with Dileep George.

S2

Speaker 2

04:50

Do you think we need to understand the brain in order to build it?

S3

Speaker 3

04:54

Yes, if you want to build the brain, we definitely need to understand how it works. So Blue Brain, or Henry Markram's project, is trying to build a brain without understanding it, just trying to put details of the brain from neuroscience experiments into a giant simulation by putting in more and more neurons, more and more details. But that is not going to work, because when it doesn't perform as you expect it to, then what do you do?

S3

Speaker 3

05:27

You just keep adding more details. How do you debug it? So unless you understand, unless you have a theory about how the system is supposed to work, how the pieces are supposed to fit together, what they're going to contribute, you can't build it.

S2

Speaker 2

05:42

At the functional level, understand. So can you actually linger on and describe the Blue Brain Project? It's kind of a fascinating principle, an idea to try to simulate the brain.

S2

Speaker 2

05:55

We're talking about the human brain, right?

S3

Speaker 3

05:57

Right, human brains and rat brains or cat brains have lots in common, that the cortex, the neocortex structure is very similar. So initially they were trying to just simulate a cat brain.

S2

Speaker 2

06:14

To understand the nature of evil.

S3

Speaker 3

06:17

To understand the nature of evil. Or, as it happens in most of the simulations, you easily get 1 thing out, which is oscillations. If you simulate a large number of neurons, they oscillate.

S3

Speaker 3

06:32

And you can adjust the parameters and say that, oh, oscillations match the rhythm that we see in the brain, et cetera.

S2

Speaker 2

06:40

But... Oh, I see. So the idea is, is the simulation at the level of individual neurons?

S3

Speaker 3

06:46

Yeah, so the Blue Brain project, the original idea as proposed was you put very detailed biophysical neurons, biophysical models of neurons, and you interconnect them according to the statistics of connections that we have found from real neuroscience experiments, and then turn it on and see what happens. And these neural models are incredibly complicated in themselves, right? Because these neurons are modeled using this idea called Hodgkin-Huxley models, which are about how signals propagate in a cable, and there are active dendrites, all those phenomena, which themselves we don't understand that well.

S3

Speaker 3

07:36

And then we put in connectivity, which is part guesswork, part observed. And of course, if we do not have any theory about how it is supposed to work, we just have to take whatever comes out of it as, okay, this is something interesting.

S2

Speaker 2

07:54

But in your sense, like these models of the way the signal travels along the axons and all the basic models, they're too crude.

S3

Speaker 3

08:04

Oh, well, actually, they are pretty detailed and pretty sophisticated, and they do replicate the neural dynamics. If you take a single neuron, and you try to turn on the different channels, the calcium channels and the different receptors and see what the effect of turning on or off those channels are in the neuron's spike output, people have built pretty sophisticated models of that. And they are, I would say, in the regime of correct.
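
To make the kind of single-neuron model being described concrete, here is a minimal sketch of a single-compartment Hodgkin-Huxley simulation with the standard textbook parameters and simple forward-Euler integration. It only illustrates the idea of modeling channel currents mechanistically; the models Dileep refers to (multi-compartment, active dendrites, many channel types) are far more detailed.

```python
# Minimal single-compartment Hodgkin-Huxley sketch (standard textbook values).
# Purely illustrative; not the detailed multi-compartment models discussed above.
import math

# Membrane and channel parameters (uF/cm^2, mS/cm^2, mV)
C_m, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3
E_Na, E_K, E_L = 50.0, -77.0, -54.387

def alpha_m(V): return 0.1 * (V + 40.0) / (1.0 - math.exp(-(V + 40.0) / 10.0))
def beta_m(V):  return 4.0 * math.exp(-(V + 65.0) / 18.0)
def alpha_h(V): return 0.07 * math.exp(-(V + 65.0) / 20.0)
def beta_h(V):  return 1.0 / (1.0 + math.exp(-(V + 35.0) / 10.0))
def alpha_n(V): return 0.01 * (V + 55.0) / (1.0 - math.exp(-(V + 55.0) / 10.0))
def beta_n(V):  return 0.125 * math.exp(-(V + 65.0) / 80.0)

def simulate(I_ext=10.0, T=50.0, dt=0.01):
    """Return the number of spikes produced by a constant current injection."""
    V, m, h, n = -65.0, 0.05, 0.6, 0.32   # approximate resting state
    spikes, above = 0, False
    for _ in range(int(T / dt)):
        # Channel currents: scaling g_Na or g_K down here is the in-silico
        # analogue of blocking that ion channel in an experiment.
        I_Na = g_Na * m**3 * h * (V - E_Na)
        I_K  = g_K * n**4 * (V - E_K)
        I_L  = g_L * (V - E_L)
        V += dt * (I_ext - I_Na - I_K - I_L) / C_m
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        if V > 0 and not above:   # count upward threshold crossings as spikes
            spikes += 1
        above = V > 0
    return spikes

print(simulate(I_ext=10.0))   # spike count of the intact model
```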

S2

Speaker 2

08:41

Well, see, the correctness, that's interesting, because you mentioned it at several levels. The correctness is measured by looking at some kind of aggregate statistics.

S3

Speaker 3

08:49

It would be more the spiking dynamics of a single neuron.

S2

Speaker 2

08:53

Spiking dynamics of a single neuron,

S3

Speaker 3

08:55

okay. Yeah, and yeah, these models, because they are going to the level of mechanism, right? So they are basically looking at, okay, what is the effect of turning on an ion channel? And you can model that using electric circuits.

S3

Speaker 3

09:11

So it is not just function fitting; people are looking at the mechanism underlying it and putting that in terms of electric circuit theory, signal propagation theory, and modeling that. And so those models are sophisticated, but getting a single neuron's model 99% right still does not tell you how to, you know, it would be the analog of getting a transistor model right and now trying to build a microprocessor. And if you just observe, you know, if you did not understand how a microprocessor works, but you say, oh, I now can model 1 transistor well, and now I will just try to interconnect the transistors according to whatever I could guess from the experiments and try to simulate it, then it is very unlikely that you will produce a functioning microprocessor. When you want to produce a functioning microprocessor, you want to understand Boolean logic, how the gates work, all those things, and then understand how those gates get implemented using transistors.

S2

Speaker 2

10:23

Yeah, there's actually, I remember, this reminds me, there's a paper, maybe you're familiar with it, that I remember going through in a reading group, that approaches a microprocessor from the perspective of a neuroscientist. It basically uses all the tools that we have in neuroscience to try to understand it, as if aliens just showed up to study computers, and to see if those tools could be used to get any kind of sense of how the microprocessor works. I think the takeaway from at least this initial exploration is that we're screwed.

S2

Speaker 2

11:02

There's no way that the tools of neuroscience would be able to get us to anything, like not even Boolean logic. I mean, it's just any aspect of the architecture of the function of the processes involved, the clocks, the timing, all that, you can't figure that out from the tools of neuroscience.

S3

Speaker 3

11:24

Yeah, so I'm very familiar with this particular paper. I think it was called, Can a Neuroscientist Understand a Microprocessor? Something like that.

S3

Speaker 3

11:35

Following the methodology in that paper, even an electrical engineer would not understand microprocessors. So I don't think it is that bad, in the sense that neuroscientists do find valuable things by observing the brain. They do find good insights, but those insights cannot be put together just as a simulation.

S3

Speaker 3

12:04

You have to investigate what are the computational underpinnings of those findings. How do all of them fit together from an information processing perspective? Somebody has to painstakingly put those things together and build hypotheses. So I don't want to diss all of neuroscience by saying, oh, they're not finding anything.

S3

Speaker 3

12:27

No, that paper almost went to the level of saying neuroscientists will never understand it. No, that's not true. I think they do find lots of useful things, but it has to be put together in a computational framework.

S2

Speaker 2

12:39

Yeah, I mean, but you know, just the AI systems will be listening to this podcast a hundred years from now, and they will probably, there's some non-zero probability they'll find your words laughable. They're like, I remember humans thought they understood something about the brain, they were totally clueless. There's a sense about neuroscience that we may be in the very, very early days of understanding the brain.

S2

Speaker 2

13:04

But I mean, that's 1 perspective. In your perspective, how far are we into understanding any aspect of the brain? From the dynamics of individual neuron communication to, in a collective sense, how they're able to store information, transfer information, how intelligence then emerges, all that kind of stuff. Where are we on that timeline?

S3

Speaker 3

13:35

Yeah, so timelines are very, very hard to predict. And you can, of course, be wrong. And you can be wrong on either side.

S3

Speaker 3

13:44

We know that when we look back, the first flight was in 1903.

S3

Speaker 3

13:51

In 1900, there was a New York Times article on flying machines that do not fly. And humans might not fly for another 100 years. That was what that article stated.

S3

Speaker 3

14:04

And so, but no, they flew 3 years after that. So it is, you know, it's very hard to, so.

S2

Speaker 2

14:11

Well, and on that point, one of the Wright brothers, I think 2 years before, said some number, like 50 years, that he had become convinced that it's impossible.

S3

Speaker 3

14:28

Even during their experimentation?

S2

Speaker 2

14:31

Yeah, yeah, yeah. I mean, that's attributed to when, that's like the entrepreneurial battle of like depression of going through just like thinking this is impossible. But there, yeah, there's something, even the person that's in it is not able to see, estimate correctly.

S3

Speaker 3

14:47

Exactly, But I can tell from the point of, you know, objectively, what are the things that we know about the brain and how that can be used to build AI models, which can then go back and inform how the brain works. So my way of understanding the brain would be to basically say, look at the insights neuroscientists have found. Understand that from a computational angle, information processing angle, build models using that.

S3

Speaker 3

15:15

And then building that model which functions, which is a functional model, which is doing the task that we want the model to do. It is not just trying to model a phenomena in the brain. It is trying to do what the brain is trying to do on the whole functional level. And building that model will help you fill in the missing pieces that, you know, biology just gives you the hints and building the model, you know, fills in the rest of the pieces of the puzzle.

S3

Speaker 3

15:44

And then you can go and connect that back to biology and say, okay, now it makes sense that this part of the brain is doing this, or this layer in the cortical circuit is doing this. And then continue this iteratively, because now that will inform new experiments in neuroscience. And of course, building the model and verifying that in the real world will also tell you more about, does the model actually work? And you can refine the model, find better ways of putting these neuroscience insights together.

S3

Speaker 3

16:20

So I would say it is, you know, so neuroscientists alone, just from experimentation, will not be able to build a model of the brain, a functional model of the brain. So there's lots of efforts, which are very impressive efforts in collecting more and more connectivity data from the brain. How are the microcircuits of the brain connected with each other?

S2

Speaker 2

16:45

Those are beautiful, by the way.

S3

Speaker 3

16:47

Those are beautiful. And at the same time, those do not, by themselves, convey the story of how it works. Yeah.

S3

Speaker 3

16:58

And somebody has to understand, okay, why are they connected like that? And what are those things doing? And we do that by building models in AI, using hints from neuroscience and repeat the cycle.

S2

Speaker 2

17:11

So what aspects of the brain are useful in this whole endeavor? Which, by the way, I should say, you're both a neuroscientist and an AI person. I guess the dream is to both understand the brain and to build AGI systems.

S2

Speaker 2

17:27

So it's like an engineer's perspective of trying to understand the brain. So what aspects of the brain, functionally speaking, like you said, do you find interesting?

S3

Speaker 3

17:38

Yeah, quite a lot of things. So 1 is, if you look at the visual cortex, and the visual cortex is a large part of the brain. I forgot the exact fraction, but a huge part of our brain area is occupied by just vision.

S3

Speaker 3

17:59

So the visual cortex is not just a feed-forward cascade of neurons. There are a lot more feedback connections in the brain compared to the feed-forward connections. And it is surprising to what level of detail neuroscientists have actually studied this. If you go into the neuroscience literature and poke around and ask, have they studied what will be the effect of poking a neuron in level IT on level V1?

S3

Speaker 3

18:29

And have they studied that? And you will say, yes, they have studied that.

S2

Speaker 2

18:35

So every possible combination.

S3

Speaker 3

18:38

I mean, it's not a random exploration at all. It's very hypothesis driven, right? They are very, experimental neuroscientists are very, very systematic in how they probe the brain, because experiments are very costly to conduct.

S3

Speaker 3

18:52

They take a lot of preparation. They need a lot of control. So they are very hypothesis-driven in how they probe the brain. And often what I find is that when we have a question in AI about, has anybody probed how lateral connections in the brain work?

S3

Speaker 3

19:10

And when you go and read the literature, yes, people have probed it, and people have probed it very systematically. And they have hypotheses about how those lateral connections are supposedly contributing to visual processing. But of course they haven't built very, very functional detailed models of it.

S2

Speaker 2

19:28

By the way, how do they, in those studies, sorry to interrupt, do they stimulate like a neuron in 1 particular area of the visual cortex and then see how the signal travels, kind of thing?

S3

Speaker 3

19:39

Fascinating, very, very fascinating experiments. So I can give you 1 example I was impressed with. This is, So before going to that, let me give you an overview of how the layers in the cortex are organized.

S3

Speaker 3

19:54

Visual cortex is organized into roughly 4 hierarchical levels. Okay, so V1, V2, V4, IT. And in V1- What happened to V3? Well, yeah, there's another pathway.

S3

Speaker 3

20:06

Okay, so there is this, I'm talking about just object recognition pathway.

S2

Speaker 2

20:10

All right, cool.

S3

Speaker 3

20:11

Okay, and then in V1 itself, so there is a very detailed microcircuit in V1 itself. That is organization within a level itself. The cortical sheet is organized into multiple layers, and there is a columnar structure.

S3

Speaker 3

20:30

This layer-wise and columnar structure is repeated in V1, V2, V4, IT, all of them, right? And the connections between these layers within a level, you know, in V1 itself, there are 6 layers, roughly, and the connections between them, there is a particular structure to them. And now, so 1 example of an experiment people did is when you present a stimulus, which is, let's say, requires separating the foreground from the background of an object. So it's a textured triangle on a textured background.

S3

Speaker 3

21:10

And you can check, does the surface settle first, or does the contour settle first?

S2

Speaker 2

21:19

Settle?

S3

Speaker 3

21:19

Settle in the sense that, so when you finally form the percept of the triangle, you understand where the contours of the triangle are and you also know where the inside of the triangle is, right? That's when you form the final percept. Now, you can ask, what is the dynamics of forming that final percept?

S3

Speaker 3

21:45

Do the neurons first find the edges and converge on where the edges are, and then they find the inner surfaces, or does it go the other way around?

S2

Speaker 2

21:55

The other way around. So what's the answer?

S3

Speaker 3

21:58

In this case, it turns out that it first settles on the edges, it converges on the edge hypothesis first, and then the surfaces are filled in from the edges to the inside.

S2

Speaker 2

22:11

That's fascinating.

S3

Speaker 3

22:12

And the detail to which you can study this, it's amazing that you can actually not only find the temporal dynamics of when this happens, you can also find which layer in V1, which layer is encoding the edges, which layer is encoding the surfaces, which layer is encoding the feedback, which layer is encoding the feedforward, and what's the combination of them that produces the final percept. And these kinds of experiments stand out when you try to explain illusions. 1 example of a favorite illusion of mine is the Kanizsa triangle.

S3

Speaker 3

22:51

I don't know that you are familiar with this 1. So this is an example where it's a triangle, but only the corners of the triangle are shown in the stimulus. So they look like kind of Pac-Man.

S2

Speaker 2

23:06

Oh, the black Pac-Man. Exactly.

S3

Speaker 3

23:08

And then your visual system hallucinates the edges. Yeah. And when you look at it, you will see a faint edge.

S3

Speaker 3

23:17

You can go inside the brain and look, do actually neurons signal the presence of this edge? If they signal, how do they do it? Because they are not receiving anything from the input. The input is black for those neurons, right?

S3

Speaker 3

23:35

So how do they signal it? When does the signaling happen? So if a real contour is present in the input, then the neurons immediately signal, okay, there is an edge here. When it is an illusory edge, it is clearly not in the input.

S3

Speaker 3

23:53

It is coming from the context. So those neurons fire later. And you can say that, okay, it's the feedback connections that are causing them to fire. And they happen later, and you can find the dynamics of them.

S3

Speaker 3

24:09

So these studies are pretty impressive and very detailed.

S2

Speaker 2

24:13

So by the way, just to step back, you said that there may be more feedback connections than feedforward connections. First of all, just for the machine learning folks, I mean, that's crazy that there's all these feedback connections. We often think about, thanks to deep learning, you start to think about the human brain as a kind of feed-forward mechanism. So what the heck are these feedback connections? What's the dynamics?

S2

Speaker 2

24:52

What are we supposed to think about them?

S3

Speaker 3

24:54

Yeah, so this fits into a very beautiful picture about how the brain works, right? So the beautiful picture of how the brain works is that our brain is building a model of the world. So our visual system is building a model of how objects behave in the world.

S3

Speaker 3

25:13

And we are constantly projecting that model back onto the world. So what we are seeing is not just a feed forward thing that just gets interpreted in a feed forward part. We are constantly projecting our expectations onto the world. And what the final percept is a combination of what we project onto the world combined with what the actual sensory input is.

S2

Speaker 2

25:36

Almost like trying to calculate the difference and then trying to interpret the difference.

S3

Speaker 3

25:40

Yeah, I wouldn't put it as calculating the difference. It's more like, what is the best explanation for the input stimulus based on the model of the world I have?

S2

Speaker 2

25:52

Got it, got it. And that's where all the illusions come in. But that's an incredibly efficient process.

S2

Speaker 2

26:00

So the feedback mechanism, it just helps you constantly, yeah, so hallucinate how the world should be based on your world model, and then just looking at if there's novelty, like trying to explain it. Hence, that's why movement, we detect movement really well, there's all these kinds of things. And this is like at all different levels of the cortex you're saying. This happens at the lowest level, at the highest level.

S3

Speaker 3

26:29

Yes, yeah. In fact, feedback connections are more prevalent everywhere in the cortex. And so 1 way to think about it, and there's a lot of evidence for this, is inference.

S3

Speaker 3

26:41

So basically, if you have a model of the world, and when some evidence comes in, what you are doing is inference, right? You are trying to now explain this evidence using your model of the world. And this inference includes projecting your model onto the evidence and taking the evidence back into the model and doing an iterative procedure. And this iterative procedure is what happens using the feed-forward feedback propagation.

S3

Speaker 3

27:13

And feedback affects what you see in the world and it also affects feed-forward propagation. And examples are everywhere. We see these kinds of things everywhere. The idea that there can be multiple competing hypotheses in our model, trying to explain the same evidence, and then you have to kind of make them compete, and 1 hypothesis will explain away the other hypothesis through this competition process.

S2

Speaker 2

27:42

Wait, what? So you have competing models of the world that try to explain, what do you mean by explain away?

S3

Speaker 3

27:49

So this is a classic example in graphical models, probabilistic models. So if you- What are those? Okay.

S2

Speaker 2

28:01

I think it's useful to mention because we'll talk about them more. Yeah, yeah.

S3

Speaker 3

28:05

So neural networks are 1 class of machine learning models. You know, you have a distributed set of nodes, which are called the neurons. You know, each 1 is doing a dot product, and you can approximate any function using this multilevel network of neurons.

S3

Speaker 3

28:22

So that's a class of models which are useful for function approximation. There is another class of models in machine learning called probabilistic graphical models. And you can think of them as, each node in that model is a variable, which is talking about something. You know, it can be a variable representing whether an edge is present in the input or not.

S3

Speaker 3

28:48

And at the top of the network, a node can be representing, is there an object present in the world or not? And then, so It is another way of encoding knowledge. Then once you encode the knowledge, you can do inference in the right way. What is the best way to explain some set of evidence using this model that you encoded.

S3

Speaker 3

29:19

So when you encode the model, you are encoding the relationship between these different variables. How is the edge connected to the model of the object? How is the surface connected to the model of the object? And then, of course, this is a very distributed, complicated model.

S3

Speaker 3

29:35

And inference is how you explain a piece of evidence when a set of stimuli comes in. If somebody tells me there is a 50% probability that there is an edge here in this part of the model, how does that affect my belief on whether I should think that there is a square percept in the image? So this is the process of inference.

S3

Speaker 3

29:56

So 1 example of inference is having this explaining-away effect between multiple causes. So graphical models can be used to represent causality in the world. So let's say, you know, your alarm at home can be triggered by a burglar getting into your house, or it can be triggered by an earthquake. Both can be causes of the alarm going off.

S3

Speaker 3

30:27

So now, you're in your office, you hear the burglar alarm going off, you are heading home, thinking that there's a burglar. But while driving home, if you hear on the radio that there was an earthquake in the vicinity, now your strength of evidence for a burglar getting into your house is diminished. Because now that piece of evidence is explained by the earthquake being present. So if you think about these 2 causes explaining a lower level variable, which is the alarm, now what we're seeing is that there is evidence coming in from below for the alarm being present.

S3

Speaker 3

31:10

And initially, it was flowing to a burglar being present. But now, since there is side evidence for this other cause, it explains away this evidence, and the evidence will now flow to the other cause. These are 2 competing causes trying to explain the same evidence.
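
As a concrete illustration of explaining away, here is a small sketch of that burglar, earthquake, alarm network, with inference done by brute-force enumeration. The probabilities are made up for illustration; the point is only the qualitative effect described above: once the earthquake is observed, the posterior belief in a burglar collapses.

```python
# Explaining away in the classic burglar / earthquake / alarm network,
# computed by brute-force enumeration over the joint distribution.
# All numbers are illustrative, not taken from the conversation.
from itertools import product

P_BURGLAR = 0.001
P_QUAKE = 0.002
# P(alarm = 1 | burglar, earthquake)
P_ALARM = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}

def joint(b, e, a):
    """Joint probability of one full assignment (burglar, earthquake, alarm)."""
    pb = P_BURGLAR if b else 1 - P_BURGLAR
    pe = P_QUAKE if e else 1 - P_QUAKE
    pa = P_ALARM[(b, e)] if a else 1 - P_ALARM[(b, e)]
    return pb * pe * pa

def posterior_burglar(alarm, quake=None):
    """P(burglar = 1 | alarm, and optionally the earthquake report)."""
    num = den = 0.0
    for b, e in product([0, 1], repeat=2):
        if quake is not None and e != quake:
            continue  # condition on the observed earthquake value
        p = joint(b, e, alarm)
        den += p
        if b:
            num += p
    return num / den

print(posterior_burglar(alarm=1))           # ~0.37: the alarm alone points to a burglar
print(posterior_burglar(alarm=1, quake=1))  # ~0.003: the earthquake explains the alarm away
```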

S2

Speaker 2

31:28

And the brain has a similar kind of mechanism for doing so. That's kind of interesting. And that, how's that all encoded in the brain?

S2

Speaker 2

31:38

Like where's the storage of information? Are we talking, just maybe to get it a little bit more specific, Is it in the hardware of the actual connections? Is it in chemical communication? Is it electrical communication?

S2

Speaker 2

31:53

Do we know?

S3

Speaker 3

31:54

So this is a paper that we are bringing out soon. Which 1 is this? This is the cortical microcircuits paper that I sent you a draft of.

S3

Speaker 3

32:03

Of course, a lot of it is still hypothesis. 1 hypothesis is that you can think of a cortical column as encoding a concept. An example of a concept is: is an edge present or not, or is an object present or not. Okay, so you can think of it as a binary variable, a binary random variable, the presence of an edge or not, or the presence of an object or not.

S3

Speaker 3

32:32

So each cortical column can be thought of as representing that 1 concept, 1 variable. And then the connections between these cortical columns are basically encoding the relationship between these random variables. And then there are connections within the cortical column. Each cortical column is implemented using multiple layers of neurons with very, very, very rich structure there.

S3

Speaker 3

32:57

There are thousands of neurons in a cortical column.
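
To make that hypothesis concrete, here is a toy rendering of the idea: each column is a binary variable for one concept, the connections between columns store pairwise compatibilities, and inference picks the joint assignment that best satisfies both the bottom-up evidence and those stored relationships. The concept names and numbers are invented for illustration, not taken from the paper.

```python
# Toy sketch: cortical columns as binary variables, connections as compatibilities.
from itertools import product

COLUMNS = ["edge", "corner", "square"]
# "Connections between columns": how compatible two active concepts are.
COMPAT = {("edge", "corner"): 2.0, ("corner", "square"): 2.0, ("edge", "square"): 1.5}
# Bottom-up evidence (log-odds style scores) for each concept being present.
EVIDENCE = {"edge": 1.0, "corner": 0.2, "square": -0.5}

def score(assignment):
    """Evidence for the active columns plus compatibility of co-active pairs."""
    s = sum(EVIDENCE[c] for c, on in assignment.items() if on)
    s += sum(w for (a, b), w in COMPAT.items() if assignment[a] and assignment[b])
    return s

best = max(
    (dict(zip(COLUMNS, bits)) for bits in product([0, 1], repeat=len(COLUMNS))),
    key=score,
)
print(best)  # the jointly most compatible set of active "columns"
```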

S2

Speaker 2

33:00

But that structure is similar across the different cortical columns.

S3

Speaker 3

33:03

Correct. Correct. Also, these cortical columns connect to a substructure called the thalamus. So all cortical columns pass through this substructure.

S3

Speaker 3

33:14

Our hypothesis is that the connections between the cortical columns implement this, you know, that's where the knowledge is stored about how these different concepts connect to each other. And then the neurons inside this cortical column and in the thalamus, in combination, implement the actual computations needed for inference, which includes explaining away and competing between the different hypotheses. And what is amazing is that neuroscientists have actually done experiments to the tune of showing these things. They might not be putting it in the overall inference framework, but they will show things like, if I poke this higher level neuron, it will inhibit, through this complicated loop through the thalamus, this other column.

S3

Speaker 3

34:09

So they will do such experiments.

S2

Speaker 2

34:11

Do they use terminology of concepts, for example? So, I mean, is it something where, it's easy to anthropomorphize and think about concepts, like you started moving into logic-based kind of reasoning systems, So how would you think of concepts in that kind of way? Or is it a lot messier, a lot more gray area?

S2

Speaker 2

34:40

You know, even more gray, even more messy than the artificial neural network kinds of abstractions?

S3

Speaker 3

34:48

Easiest way to think of it is a variable, right? It's a binary variable, which is showing the presence or absence of something.

S2

Speaker 2

34:55

But I guess what I'm asking is, is that something that we're supposed to think of as human-interpretable?

S3

Speaker 3

35:04

It doesn't need to be. It doesn't need to be human interpretable. There's no need for it to be human interpretable.

S3

Speaker 3

35:09

But it's almost like you will be able to find some interpretation of it because it is connected to the other things that you know about.

S2

Speaker 2

35:22

And the point is it's useful somehow. It's useful as an entity in the graph in connecting to the other entities that are, let's call them concepts. Okay, so by the way, are these the cortical microcircuits?

S3

Speaker 3

35:39

Correct, these are the cortical microcircuits. That's what neuroscientists use to talk about the circuits within a level of the cortex. So you can think of, let's think of a neural network, artificial neural network terms.

S3

Speaker 3

35:54

People talk about the architecture of the, how many layers they build, what is the fan in fan out, et cetera. That is the macro architecture. So, and then within a layer of the neural network, you can, you know, the cortical neural network is much more structured within a level. There's a lot more intricate structure there.

S3

Speaker 3

36:18

But even within an artificial neural network, you can think of feature detection plus pooling as 1 level. And so that is kind of a microcircuit. It's much more complex in the real brain. So within a level, whatever is that circuitry within a column of the cortex and between the layers of the cortex, that's the microcircuitry.

S2

Speaker 2

36:40

I love that terminology. Machine learning people don't use the circuit terminology, but they should. It's nice.

S2

Speaker 2

36:47

So, okay. Okay, so that's the cortical microcircuit. So what's interesting about, what can we say, what does the paper that you're working on propose about the ideas around these cortical microcircuits?

S3

Speaker 3

37:04

So this is a fully functional model for the microcircuits of the visual cortex.

S2

Speaker 2

37:10

So the paper focuses on your idea and our discussions now is focusing on vision. Yeah. The visual cortex.

S2

Speaker 2

37:17

Okay. Yeah. So this is a model, this is a full model. This is how vision works.

S3

Speaker 3

37:22

But this is a model of, yeah. Okay, so let me step back a bit. So we looked at neuroscience for insights on how to build a vision model.

S3

Speaker 3

37:35

And we synthesized all those insights into a computational model. This is called the recursive cortical network model that we used for breaking CAPTCHAs. And we are using the same model for robotic picking and tracking of objects.

S2

Speaker 2

37:50

And that again is a vision system.

S3

Speaker 3

37:52

That's a vision system. Computer vision system. That's a computer vision system.

S2

Speaker 2

37:55

Takes in images and outputs what?

S3

Speaker 3

37:59

On 1 side it outputs the class of the image and also segments the image. And you can also ask it further queries. Where is the edge of the object?

S3

Speaker 3

38:08

Where is the interior of the object? So it's a model that you build to answer multiple questions. So you're not trying to build a model for just classification or just segmentation, et cetera. It's a joint model that can do multiple things.

S3

Speaker 3

38:25

So that's the model that we built using insights from neuroscience. And some of those insights are, what is the role of feedback connections? What is the role of lateral connections? So all those things went into the model.

S3

Speaker 3

38:38

The model actually uses feedback connections.

S2

Speaker 2

38:40

All these ideas from your side. Yeah. So what the heck is a recursive cortical network?

S2

Speaker 2

38:46

Like what are the architecture approaches, interesting aspects here, which is essentially a brain inspired approach to a computer vision?

S3

Speaker 3

38:56

Yeah, so there are multiple layers to this question. Again, go from the very, very top and then zoom in. So 1 important thing, constraint that went into the model is that you should not think of vision as something in isolation.

S3

Speaker 3

39:12

We should not think of perception as a pre-processor for cognition. Perception and cognition are interconnected. So you should not think of 1 problem in separation from the other problem. So that means if you finally want to have a system that understands concepts about the world, and can learn a very conceptual model of the world, and can reason and connect to language, all of those things, you need to think all the way through and make sure that your perception system is compatible with your cognition system and language system and all of them.

S3

Speaker 3

39:48

And 1 aspect of that is top-down controllability.

S1

Speaker 1

39:53

What does that mean?

S3

Speaker 3

39:54

So that means, you know, so think of, you know, you can close your eyes and think about the details of 1 object, right? I can zoom in further and further. Think of the bottle in front of me.

S3

Speaker 3

40:09

Now, you can think about what the cap of that bottle looks like. You can think about what's the texture on the cap of that bottle. You can think about what will happen if something hits it. So you can manipulate your visual knowledge in cognition-driven ways.

S2

Speaker 2

40:31

Yes.

S3

Speaker 3

40:32

And so this top-down controllability and being able to simulate scenarios in the world.

S2

Speaker 2

40:40

So you're not just a passive player in this perception game. You can control it. You have imagination.

S3

Speaker 3

40:49

Correct, correct. So basically having a generative network, which is a model, and it is not just some arbitrary generative network. It has to be built in a way that it is controllable top down.

S3

Speaker 3

41:03

It is not just trying to generate a whole picture at once. You know, it's not trying to generate photorealistic things of the world. You know, you don't have good photorealistic models of the world, human brains do not have. If I, for example, ask you the question, what is the color of the letter E in the Google logo?

S3

Speaker 3

41:22

You have no idea.

S2

Speaker 2

41:23

No idea.

S3

Speaker 3

41:23

Although you have seen it millions of times. I've seen it. Or not millions of times, hundreds of times.

S3

Speaker 3

41:28

Yeah. So it's not, our model is not photorealistic. But it has other properties: we can manipulate it. You can think about filling in a different color in that logo.

S3

Speaker 3

41:39

You can think about expanding the letter E. You can imagine the consequence of actions that you have never performed. So these are the kinds of characteristics the generative model needs to have. So this is 1 constraint that went into our model.

S3

Speaker 3

41:55

So this is, when you read just the perception side of the paper, it is not obvious that this was a constraint that went into the model, this top-down controllability of the generative model.

S2

Speaker 2

42:06

So what does top-down controllability in a model look like? It's a really interesting concept, fascinating concept. What is that?

S2

Speaker 2

42:15

Is it the recursiveness that gives you that? Or how do you do it?

S3

Speaker 3

42:20

Quite a few things. It's like, what does the model factor, or factorize?

S3

Speaker 3

42:25

You know, what is the model representing as different pieces in the puzzle? Like, you know, So in the RCN network, it thinks of the world, so what I say, the background of an image is modeled separately from the foreground of the image. So the objects are separate from the background, they are different entities.

S2

Speaker 2

42:45

So there's a kind of segmentation that's built in fundamentally to

S3

Speaker 3

42:48

the structure. And then even that object is composed of parts. And also another 1 is the shape of the object is differently modeled from the texture of the object.

S2

Speaker 2

43:01

Got it. So there's like these, you know who Francois Chollet is? Yeah. So he developed this IQ-test type of thing, the ARC challenge. And it's kind of cool that there's these concepts, priors, that he defines that you bring to the table in order to be able to reason about basic shapes and things in an IQ test.

S2

Speaker 2

43:28

So here you're making it quite explicit that here are the things that you should be, these are like distinct things that you should be able to model in this.

S3

Speaker 3

43:40

Keep in mind that you can derive this from much more general principles. It doesn't, You don't need to explicitly put it as, oh, objects versus foreground versus background, the surface versus texture. No, these are derivable from more fundamental principles of how, what's the property of continuity of natural signals?

S2

Speaker 2

44:02

What's the property of continuity of natural signals? Yeah. By the way, that sounds very poetic.

S2

Speaker 2

44:07

But yeah, so you're saying there's some low-level properties from which emerges the idea that shapes should be different than, like there should be parts of an object, there should be, I mean, kind of like Francois, I mean, there's objectness, there's all these things that it's kind of crazy that we humans, I guess, evolved to have because it's useful for us to perceive the world.

S3

Speaker 3

44:32

Correct, correct. And it derives mostly from the properties of natural signals.

S2

Speaker 2

44:38

Natural signals. So natural signals are the kind of things we'll perceive in the natural world. I don't know why that sounds so beautiful, natural signals, yeah.

S3

Speaker 3

44:48

As opposed to a QR code, right? Which is an artificial signal that we created. Humans are not very good at classifying QR codes.

S3

Speaker 3

44:55

We are very good at saying something is a cat or a dog, but not very good at, you know, where computers are very good at classifying QR codes. So our visual system is tuned for natural signals. And there are fundamental assumptions in the architecture that are derived from natural signals properties.

S2

Speaker 2

45:15

I wonder, when you take hallucinogenic drugs, does that go into natural, or is that closer to a QR code? That's a whole-

S3

Speaker 3

45:23

It's still natural.

S1

Speaker 1

45:23

It's still natural?

S3

Speaker 3

45:24

Yeah, because it is still operating using our brains.

S2

Speaker 2

45:27

By the way, on that topic, I mean, I haven't been following. I think they're becoming legalized in certain, I can't wait until they become legalized to a degree that you, like vision science researchers could study it. Yeah.

S2

Speaker 2

45:41

Just like through medical, chemical ways, modify it. There could be ethical concerns, but that's another way to study the brain, to be able to chemically modify it. There's probably a very long way to go to figure out how to do it ethically.

S3

Speaker 3

45:59

Yeah, but I think there are studies on that already. Yeah, I think so. Because it's not unethical to give it to rats.

S2

Speaker 2

46:08

Oh, that's true, that's true. There's a lot of drugged up rats out there. Okay, cool, sorry, sorry.

S2

Speaker 2

46:16

So there's these low level things from natural signals that...

S3

Speaker 3

46:26

From which these properties will emerge. Yes. But it is still a very hard problem on how to encode that.

S3

Speaker 3

46:37

So you mentioned the priors Francois wanted to encode in the abstract reasoning challenge, but it is not straightforward how to encode those priors. So some of those challenges, like the object recognition and completion challenges, are things that we purely use our visual system to do. It looks like abstract reasoning, but it is purely an output of the vision system. For example, completing the corners of that Kanizsa triangle, completing the lines of that Kanizsa triangle.

S3

Speaker 3

47:07

It's a purely visual system property. There is no abstract reasoning involved. It uses all these priors, but it is stored in our visual system in a particular way that is amenable to inference. And that is 1 of the things that we tackled in the, you know, basically saying, okay, these are the prior knowledge, which will be derived from the world.

S3

Speaker 3

47:29

But then how is that prior knowledge represented in the model such that inference, when some piece of evidence comes in, can be done very efficiently and in a very distributed way. Because there are so many ways of representing knowledge which is not amenable to very quick inference, you know, quick lookups. And so that's 1 core part of what we tackled in the RCN model. How do you encode visual knowledge to do very quick inference?

S3

Speaker 3

48:02

And yeah.

S2

Speaker 2

48:02

Can you maybe comment on, so folks listening to this in general may be familiar with different kinds of architectures of neural networks. What are we talking about with the RCN? What does the architecture look like?

S2

Speaker 2

48:17

What are the different components? Is it close to neural networks? Is it far away from neural networks? What does it look like?

S3

Speaker 3

48:22

Yeah, so you can think of the delta between the model and a convolutional neural network, if people are familiar with convolutional neural networks. So convolutional neural networks have this feed-forward processing cascade, which is called feature detectors and pooling. And that is repeated in the hierarchy, in a multi-level system.

S3

Speaker 3

48:44

And If you want an intuitive idea of what is happening, feature detectors are detecting interesting co-occurrences in the input. It can be a line, a corner, an eye, or a piece of texture, et cetera. And the pooling neurons are doing some local transformation of that and making it invariant to local transformations. So this is what the structure of convolutional neural network is.

S3

Speaker 3

49:12

Recursive cortical network has a similar structure when you look at just the feedforward pathway. But in addition to that, it is also structured in a way that it is generative. So that again, it can run it backward and combine the forward with the backward. Another aspect that it has is it has lateral connections.

S3

Speaker 3

49:34

These lateral connections, which is between, so if you have an edge here and an edge here, it has connections between these edges. It is not just feed-forward connections, it is something between these edges, between the nodes representing these edges, which is to enforce compatibility between them. So otherwise, what will happen

S1

Speaker 1

49:53

is that- Constraints?

S3

Speaker 3

49:54

It's a constraint. It's basically, if you do just feature detection followed by pooling, then your transformations in different parts of the visual field are not coordinated. And so you will create jagged, when you generate from the model, you will create jagged things and uncoordinated transformations.

S3

Speaker 3

50:17

So these lateral connections are enforcing the transformations.
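
For machine learning readers, here is a very rough sketch of the feedforward skeleton being described: "feature detection" as a logical AND over a local template and "pooling" as an OR over nearby positions, which gives local translation invariance. The lateral connections of the RCN, which coordinate which pooled positions are chosen so that neighboring features transform together, are only mentioned in a comment; this illustrates the vocabulary, not the actual model.

```python
# Schematic "feature detection + pooling" in the sense used above.
# Binary image, binary features; purely illustrative, not the RCN itself.
import numpy as np

def detect_feature(image, template):
    """Feature detector: does this co-occurrence (template) appear at each location?"""
    H, W = image.shape
    h, w = template.shape
    out = np.zeros((H - h + 1, W - w + 1), dtype=bool)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.all(image[i:i + h, j:j + w] >= template)
    return out

def pool(feature_map, k=2):
    """Pooling: OR over a small neighbourhood, giving local translation invariance.
    In the RCN, lateral connections would additionally constrain which position
    within each pool is chosen, so neighbouring features transform together."""
    H, W = feature_map.shape
    out = np.zeros((H // k, W // k), dtype=bool)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = feature_map[i * k:(i + 1) * k, j * k:(j + 1) * k].any()
    return out

image = np.zeros((8, 8), dtype=int)
image[2, 2:5] = 1                       # a short horizontal edge
template = np.array([[1, 1, 1]])        # "horizontal edge" feature detector
print(pool(detect_feature(image, template)).astype(int))
```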

S2

Speaker 2

50:21

Is the whole thing still differentiable? No. Okay.

S3

Speaker 3

50:26

No. It's not trained using backprop.

S2

Speaker 2

50:30

Okay, that's really important. So there's this feed forward, there's feedback mechanisms, there's some interesting connectivity things, it's still layered?

S3

Speaker 3

50:39

Yes, there are multiple layers.

S2

Speaker 2

50:39

Multiple layers. Okay, very, very interesting.

S2

Speaker 2

50:45

And yeah, okay, so the interconnections between adjacent nodes, so connections across, serve as constraints that keep the thing stable. Correct. Okay, so what else?

S3

Speaker 3

50:58

And then there's this idea of doing inference. A neural network does not do inference on the fly. So an example of why this inference is important is, you know, so 1 of the first applications that we showed in the paper was to crack text-based CAPTCHAs.

S2

Speaker 2

51:17

What are CAPTCHAs, by the way? Yeah.

S2

Speaker 2

51:20

By the way, 1 of the most awesome, like, people don't use this term anymore, it's human computation, I think. I love this term. The guy who created CAPTCHAs, I think, came up with this term.

S3

Speaker 3

51:30

Yeah.

S2

Speaker 2

51:31

I love it. Anyway. Yeah.

S2

Speaker 2

51:34

What are CAPTCHAs?

S3

Speaker 3

51:35

So CAPTCHAs are those strings that you fill in when you're, you know, if you're opening a new account in Google, they show you a picture. You know, usually it used to be a set of garbled letters that you have to kind of figure out, what is that string of characters, and type it in. And the reason CAPTCHAs exist is because, you know, Google or Twitter do not want automatic creation of accounts. You can use a computer to create millions of accounts and use that for nefarious purposes.

S3

Speaker 3

52:10

So you want to make sure that, to the extent possible, the interaction that their system is having is with a human. So it's called a human interaction proof. A CAPTCHA is a human interaction proof. So a CAPTCHA is, by design, something that is easy for humans to solve, but hard for computers.

S3

Speaker 3

52:30

Hard for robots, yeah. So, text-based CAPTCHAs were the ones which were prevalent around 2014, because at that time, text-based CAPTCHAs were hard for computers to crack. Even now, they are, actually, in the sense that an arbitrary text-based CAPTCHA will be unsolvable even now. But with the techniques that we have developed, you can quickly develop a mechanism that solves the CAPTCHA.

S2

Speaker 2

52:58

They've probably gotten a lot harder too. People have been getting cleverer and cleverer at generating these text CAPTCHAs.

S3

Speaker 3

53:05

Correct, correct.

S2

Speaker 2

53:06

So okay, so one of the things you've tested on is these kinds of CAPTCHAs in 2014, 15, that kind of stuff.

S2

Speaker 2

53:13

So what, I mean, by the way, why CAPTCHAs? Why?

S3

Speaker 3

53:18

Yeah, yeah. Even now, I would say CAPTCHA is a very, very good challenge problem if you want to understand how human perception works and if you want to build systems that work like the human brain. And I wouldn't say CAPTCHA is a solved problem.

S3

Speaker 3

53:34

We have cracked the fundamental defense of CAPTCHAs, but it is not solved in the way that humans solve it. So I can give an example. I can take a five-year-old child who has just learned characters and show them any new CAPTCHA that we create. They will be able to solve it.

S3

Speaker 3

53:55

I can show you pretty much any new CAPTCHA from any new website. You'll be able to solve it without getting any training examples from that particular style of CAPTCHA.

S2

Speaker 2

54:05

You're assuming I'm human, yeah.

S3

Speaker 3

54:07

Yes, yeah, that's right. So if you are human, otherwise I will be able to figure that out using this 1.

S2

Speaker 2

54:16

This whole podcast is just a Turing test. A long Turing test. Anyway, I'm sorry.

S2

Speaker 2

54:21

So yeah, so humans can figure it out with very few examples.

S3

Speaker 3

54:26

Or no training examples. Like, no training examples from that particular style of CAPTCHA. And so, you know, even now this is unreachable for current deep learning systems.

S3

Speaker 3

54:38

So basically, I don't think a system exists where you can basically say, train on whatever you want, and then now say, hey, I will show you a new CAPTCHA which I did not show you in the training setup. Will the system be able to solve it? That still doesn't exist.

S3

Speaker 3

54:54

So that is the magic of human perception. And Doug Hofstadter put this very beautifully in 1 of his talks. The central problem in AI is: what is the letter A? If you can build a system that reliably can detect all the variations of the letter A.

S3

Speaker 3

55:16

You don't even need to go to the...

S2

Speaker 2

55:19

The B and the C.

S3

Speaker 3

55:20

Yeah, you don't even need to go to the B and the C or the strings of characters. And so that is the spirit with which we tackle that problem.

S2

Speaker 2

55:28

What does he mean by that? I mean, is it like without training examples, try to figure out the fundamental elements that make up the letter A in all of its forms?

S3

Speaker 3

55:43

In all of its forms. It can be, A can be made with 2 humans standing, leaning against each other, holding hands. And it can be made of leaves.

S3

Speaker 3

55:51

It can be-

S2

Speaker 2

55:52

You might have to understand everything about this world in order to understand the letter A. Yeah. So it's common sense reasoning, essentially.

S3

Speaker 3

56:00

Right. So to finally, to really solve, finally to say that you have solved CAPTCHA, you have to solve the whole problem. So.

S2

Speaker 2

56:11

Yeah, okay, so how does this kind of the RCN architecture help us to get, do a better job of that kind of thing?

S3

Speaker 3

56:18

Yeah, so as I mentioned, 1 of the important things was being able to do inference, being able to dynamically do inference.

S2

Speaker 2

56:27

Can you clarify what you mean? Because you said like neural networks don't do inference. Yeah.

S2

Speaker 2

56:33

So what do you mean by inference in this context then?

S3

Speaker 3

56:36

So, okay, so in CAPTCHAs, what they do to confuse people is to make these characters crowd together.

S3

Speaker 3

56:53

So if you put an R and an N together, it will start looking like an M. And so locally, there is very strong evidence for it being some incorrect character. But globally, the only explanation that fits together is something that is different from what you can find locally. So this is inference.

S3

Speaker 3

57:18

You are basically taking local evidence and putting it in the global context, and often coming to a conclusion that conflicts with the local information.

S2

Speaker 2

57:30

So actually, so you mean inference in the way it's used when you talk about reasoning, for example, as opposed to inference with artificial neural networks, which is a single pass through the network. Correct. Okay.

S2

Speaker 2

57:45

So like you're basically doing some basic forms of reasoning, like integration of like how local things fit into the global picture.

S3

Speaker 3

57:54

Right, and things like explaining away come into this because you are explaining that piece of evidence as something else, because globally that's the only thing that makes sense. So now, you can amortize this inference. In a neural network, if you want to do this, you can brute-force it. You can just show it all combinations of things that you want your reasoning to work over, and you can just train the hell out of that neural network, and it will look like it is doing inference on the fly, but it is really just doing amortized inference.

S3

Speaker 3

58:34

It is because you have shown it a lot of these combinations during training time. So what you want to do is be able to do dynamic inference rather than just being able to show all those combinations in the training time. And that's something we emphasized in the model.
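
A toy contrast of the two regimes being distinguished here: amortized inference is a fixed input-to-answer mapping baked in at training time, while dynamic inference scores candidate explanations against a generative model at test time, so new context (for example, a changed prior) can be used immediately without retraining. The hypotheses, likelihood, and priors below are invented purely for illustration.

```python
# Dynamic inference as test-time search over explanations, scored by a toy
# generative model. An amortized (feedforward) network would instead map the
# observation straight to an answer and could not absorb the new prior below
# without being retrained on such combinations.
import math

def likelihood(observation, hypothesis):
    """Toy generative model: how well does this hypothesis explain the observation?"""
    return math.exp(-(observation - hypothesis) ** 2)

def dynamic_inference(observation, prior):
    """Posterior over hypotheses, computed on the fly from prior x likelihood."""
    scores = {h: p * likelihood(observation, h) for h, p in prior.items()}
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

print(dynamic_inference(2.5, {1: 0.6, 2: 0.2, 3: 0.2}))  # one context
print(dynamic_inference(2.5, {1: 0.1, 2: 0.1, 3: 0.8}))  # new context, no retraining
```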

S2

Speaker 2

58:51

What does it mean, dynamic inference? Is that, that has to do with the feedback thing? Yes.

S2

Speaker 2

58:56

Like what is dynamic? I'm trying to visualize what dynamic inference would be in this case. Like, what is it doing with the input? It's shown the input the first time.

S2

Speaker 2

59:07

And it's like, what's changing temporally? What's the dynamics of this inference process?

S3

Speaker 3

59:15

So you can think of it as you have at the top of the model, the characters that you are trained on, they are the causes. You're trying to explain the pixels using the characters as the causes. The characters are the things that cause the pixels.

S2

Speaker 2

59:32

Yeah, so there's this causality thing. So the reason you mentioned causality, I guess, is because there's a temporal aspect to this whole thing.

S3

Speaker 3

59:40

In this particular case, the temporal aspect is not important. It is more like when, if I turn the character on, the pixels will turn on. Yeah, it will be after this a little bit, but

S2

Speaker 2

59:51

yeah. So it's a causality in the sense of like a logic causality, like hence inference, okay.

S3

Speaker 3

59:57

The dynamics is that even though