Speaker 1
00:00
So today we have Nate Derbinsky. He's a professor at Northeastern University working on various aspects of computational agents that exhibit human-level intelligence. Please give Nate a warm welcome. Thank you.
Speaker 2
00:18
Thanks a lot and thanks for having me here. So the title that was on the page was Cognitive Modeling. I'll kind of get there, but I wanted to put it in context.
Speaker 2
00:28
So the bigger theme here is I want to talk about what's called cognitive architecture. And if you've never heard about that before, that's great. And I wanted to contextualize that as, how is that 1 approach to get us to AGI? And I say what my view of AGI is, and put up a whole bunch of TV and movie characters that I grew up with that inspire me.
Speaker 2
00:52
That will lead us into what is this thing called cognitive architecture. It's a whole research field that crosses neuroscience, psychology, cognitive science, and all the way into AI. So I'll try to give you kind of the historical big picture view of it, what some of the actual systems are out there that might be of interest to you. And then we'll kind of zoom in on 1 of them that I've done a good amount of work with called SOAR.
Speaker 2
01:14
And what I'll try to do is tell a story, a research story of how we started with kind of a core research question. We look to how humans operate, understood that phenomenon, and then took it and saw really interesting results from it. And so at the end, if this field is of interest, there's a few pointers for you to go read more and go experience more of cognitive architecture. So just a rough definition of AGI, given this is an AGI class.
Speaker 2
01:46
Depending on the direction that you're coming from, it might be kind of understanding intelligence, or it might be developing intelligent systems that operate at the level of human intelligence. The typical differences between this and other sorts of, maybe, AI or machine learning systems: we want systems that are gonna persist for a long period of time.
Speaker 2
02:07
We want them robust to different conditions. We want them learning over time. And here's the crux of it, working on different tasks. And in a lot of cases, tasks they didn't know were coming ahead of time.
Speaker 2
02:21
I got into this because I clearly watched too much TV and too many movies, and then I looked back at this and I realized I think I'm covering the 70s, 80s, 90s, noughts I guess it is, and today. And so this is what I wanted out of AI, and this is what I wanted to work with. And then there's the reality that we have today. So, who's watched Knight Rider, for instance?
Speaker 2
02:51
I don't think that exists yet, but maybe we're getting there. And in particular, for fun, during the Amazon sale day, I got myself an Alexa, and I could just see myself at some point saying, hey Alexa, please write me an rsync script to sync my class. And if you have an Alexa, you probably know the following phrase, which just always hurts me inside: sorry, I don't know that 1. Which is okay, right?
Speaker 2
03:20
That's, a lot of people have no idea what I'm asking, let alone how to do that. So what I want Alexa to respond with after that is, do you have time to teach me? And to provide some sort of interface by which, back and forth, we can kind of talk through this. We aren't there yet, to say the least, but I'll talk later about some work on a system called Rosie that's working in that direction.
Speaker 2
03:46
We're starting to see some ideas about being able to teach systems how to work. So folks who are in this field I think generally fall into these 3 categories. They're just curious. They want to learn new things, generate knowledge, work on hard problems.
Speaker 2
04:04
Great. I think there are folks who are in that middle cognitive modeling realm. And so I'll use this term a lot. It's really understanding how humans think, how humans operate, human intelligence at multiple levels.
Speaker 2
04:19
And if you can do that, 1, there's just knowledge in and of itself of how we operate, but there's a lot of really important applications that you can think of, if we were able to not only understand, but predict how humans would respond and react in various tasks. Medicine is an easy 1. There's some work in HCI or HRI, I'll get to later, where if you can predict how humans would respond to a task, you can iterate tightly and develop better interfaces.
Speaker 2
04:50
It's already being used in the realm of simulation and in defense industries. I happen to fall into the latter group, or the bottom group, which is systems development, which is to say just the desire to build systems that are working on tasks that current AI and machine learning can't operate on. And I think when you're working at this level, or on any system that nobody's really achieved before, what do you do? You kind of look to the examples that you have, which in this case, the only 1 that we know of is humans, right?
Speaker 2
05:25
Irrespective of your motivation, when you have kind of an intent that you want to achieve in your research, you kind of let that drive your approach. And so I often show my AI students this. The Turing test you might have heard of, or variants of it that have come before, these were folks who were trying to create systems that acted in a certain way, that acted intelligently. And the kind of line that they drew, the benchmark that they used was to say, let's make systems that operate like humans do.
Speaker 2
05:58
Cognitive modelers will fit up into this top point here to say it's not enough to act that way, but by some definition of thinking, we want the system to do what humans do, or at least be able to make predictions about it. So that might be things like, what errors would the human make on this task? Or how long would it take them to perform this task? Or what emotion would be produced in this task?
Speaker 2
06:23
There are folks who are still thinking about how the computer is operating, but trying to apply kind of rational rules to it. So a logician, for instance, would say, if you have A, and A gives you B, and B gives you C, then A should definitely give you C. That's just what's rational. And so there are folks operating in that direction.
Speaker 2
06:44
And then if you go to intro AI class anywhere around the country, particularly Berkeley, because they have graphics designers that I get to steal from, the benchmark would be what the system produces in terms of action, and the benchmark is some sort of optimal rational bound. Irrespective of where you work in this space, there's kind of a common output that arrives when you research these areas, which is you can learn individual bits and pieces, and it can be hard to bring them together to build a system that either predicts or acts on different tasks. So this is part of the transfer learning problem but it's also part of having distinct theories that are hard to combine together. So I'm gonna give an example that comes out of cognitive modeling, or perhaps 3 examples.
Speaker 2
07:38
So if you were in a HCI class or some intro psychology classes, 1 of the first things you learn about is Fitts' Law, which provides you the ability to predict the difficulty level of basically human pointing from where they start to a particular place. And it turns out that you can learn some parameters and model this based upon just the distance from where you are to the target and the size of the target. So both moving a long distance will take a while, but also if you're aiming for a very small point, that can take longer than if there's a large area that you just kind of have to get yourself to. And so this is held true for many humans.
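To make that concrete, here is a minimal sketch of Fitts' law in one common formulation; the function name and the values of a and b are illustrative, and in practice you would fit those parameters per person from data.

    import math

    def fitts_movement_time(distance, width, a=0.1, b=0.15):
        """Predicted pointing time (seconds): a + b * log2(2 * distance / width).

        distance: how far the pointer has to travel
        width:    size of the target along the axis of motion
        a, b:     per-person parameters fit from data (values here are made up)
        """
        return a + b * math.log2(2 * distance / width)

    # Farther targets and smaller targets both raise the predicted time.
    print(fitts_movement_time(distance=300, width=20))
    print(fitts_movement_time(distance=300, width=5))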
Speaker 2
08:18
So let's say we've learned this, and then we move on to the next task, and we learn about what's called the power law of practice, which has been shown to hold in a number of different tasks. What I'm showing here is 1 of them, where you're going to draw a line through a sequential set of circles, starting at 1, going to 2, and so forth, not making a mistake, or at least trying not to, and trying to do this as fast as possible. And so for a particular person, we would fit the A, B, and C parameters and we'd see a power law. So as you perform this task more, you're gonna see a decrease in the amount of reaction time required to complete the task.
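As a rough sketch of that model, using the A, B, and C parameters mentioned above (the numeric values below are made up, not fit to any data):

    def power_law_reaction_time(trial, A=0.3, B=2.0, C=0.5):
        """Predicted reaction time on the nth trial: RT = A + B * n^(-C).

        A: asymptotic floor, B: initial slowdown, C: learning rate.
        """
        return A + B * trial ** (-C)

    # Reaction time falls quickly at first, then flattens out with practice.
    print([round(power_law_reaction_time(n), 2) for n in (1, 2, 5, 10, 50)])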
Speaker 2
08:59
Great, we've learned 2 things about humans. Let's add some more in. So for those who might have done some reinforcement learning, TD learning is 1 of those approaches, temporal difference learning, that's had some evidence of similar sorts of processes in the dopamine centers of the brain. And it basically says in a sequential learning task, you perform the task, you get some sort of reward, how are you going to kind of update your representation of what to do in the future, such as to maximize expectation of future reward.
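As a concrete sketch, a one-step temporal-difference update over a tabular value function might look like this; the learning rate, discount, and state names are illustrative, not tied to any particular model of the dopamine system.

    from collections import defaultdict

    values = defaultdict(float)    # V(s): estimated future reward from state s
    alpha, gamma = 0.1, 0.95       # learning rate and discount factor (made-up values)

    def td_update(state, reward, next_state):
        """Move V(state) toward the reward actually received plus the discounted
        estimate of what follows; the gap between the two is the TD error."""
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error

    td_update("saw-cue", reward=0.0, next_state="got-juice")
    td_update("got-juice", reward=1.0, next_state="end")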
Speaker 2
09:29
And there are various models of how that changes over time, and you can build up functions that allow you to form better and better and better given trial and error. Great, so we've learned 3 interesting models here that hold true over multiple people, multiple tasks, And so my question is, if we take these together and add them together, how do we start to understand a task as quote unquote simple as chess? Which is to say, we could ask questions, how long would it take for a person to play? What mistakes would they make?
Speaker 2
10:06
After they played a few games, how would they adapt themselves? Or, if we wanted to develop a system that ended up being good at chess, or at least learning to become better at chess. My question is, there doesn't seem to be a clear way to take these very, very individual theories and kind of smash them together and get a reasonable answer of how to play chess, or how do humans play chess. And so, the gentleman in this slide is Allen Newell, 1 of the founders of AI, who did incredible work in psychology and other fields.
Speaker 2
10:43
He gave a series of lectures at Harvard in 1987 and they were published in 1990 called the Unified Theories of Cognition. And his argument to the psychology community at that point was the argument on the prior slide. They had many individual studies, many individual results. And so the question was, how do you bring them together to gain this overall theory?
Speaker 2
11:04
How do you make forward progress? And so his proposal was unified theories of cognition, which became known as cognitive architecture, which is to say, to bring together your core assumptions, your core beliefs of what are the fixed mechanisms and processes that intelligent agents would use across tasks. So the representations, the learning mechanisms, the memory systems, bring them together, implement them in a theory, and use that across tasks. And the core idea is that when you actually have to implement this and see how it's going to work across different tasks, the interconnections between these different processes and representations would add constraints.
Speaker 2
11:51
And over time, the constraints would start limiting the design space of what is necessary and what is possible in terms of building intelligent systems. And So the overall goal from there was to understand and exhibit human level intelligence using these cognitive architectures. A natural question to ask is, okay, so we've gone from a methodology of science that we understand how to operate in. We make a hypothesis, we construct a study, we gather our data, we evaluate that data, and we falsify or we do not falsify the original hypothesis.
Speaker 2
12:29
And we can do that over and over again, and we know that we're making forward progress scientifically. If I've now taken that model and changed it into: I have a piece of software and it's representing my theories, and to some extent I can configure that software in different ways to work on different tasks. How do I know that I'm making progress? And so there's a philosophy of science, due to Lakatos, that addresses this.
Speaker 2
12:53
And it's kind of shown pictorially here, where you start with your core: what your beliefs are about what is necessary for achieving the goal that you have. And around that you'll have kind of ephemeral hypotheses and assumptions that over time may grow and shrink. And so you're trying out different things, trying out different things. And if an assumption is around there long enough, it becomes part of that core.
Speaker 2
13:20
And so as you work on more tasks and learn more, either by your work or by data coming in from someone else, the core is growing larger and larger. You've got more constraints and you've made more progress. And so what I wanted to look at were in this community, what are some of the core assumptions that are driving forward scientific progress? So 1 of them actually came out of those lectures that are referred to as Newell's Time Scales of Human Action.
Speaker 2
13:49
And so off on the left, the left 2 columns are both time units, just expressed somewhat differently. Second from the left being maybe more useful to a lot of us in understanding daily life. 1 step over from there would be kind of at what level processes are occurring. So the lowest 3 are down at kind of the substrate, the neuronal level.
Speaker 2
14:12
We're building up to deliberate tasks that occur in the brain and tasks that are operating on the order of 10 seconds. Some of these might occur in the psychology laboratory, but probably a step up to minutes and hours. And then above that really becomes interactions between agents over time. And so if we start with that, the things to take away is that the hypothesis is that regularities will occur at these different time scales and that they're useful.
Speaker 2
14:41
And so those who operate at that lowest time scale might be considering neuroscience, cognitive neuroscience. When you shift up to the next couple levels, what we would think about in terms of the areas of science that deal with that would be psychology and cognitive science, and then we shift up a level and we're talking about sociology and economics and the interplay between agents over time. And so what we'll find with cognitive architecture is that most of them will tend to sit at the deliberate-act level. We're trying to take knowledge of a situation and make a single decision. And then sequences of decisions over time will build to tasks, and tasks over time will build to more interesting phenomena.
Speaker 2
15:23
I'm actually going to show that that isn't strictly true, that there are folks working in this field that actually do operate 1 level below. Some other assumptions. So this is Herb Simon receiving the Nobel Prize in Economics and part of what he received that award for was an idea of bounded rationality. So In various fields we tend to model humans as rational.
Speaker 2
15:49
And his argument was, let's consider that human beings are operating under various kinds of constraints. And so to model the rationality with respect to and bounded by how complex the problem is that they're working on, how big is that search space that they have to conquer, cognitive limitations, so speed of operations, amount of memory, short-term as well as long-term, as well as other aspects of our computing infrastructure that are going to keep us from being able to arbitrarily solve complex problems, as well as how much time is available to make that decision. And so This is actually a phrase that came out of his speech when he received the Nobel Prize. Decision makers can satisfice either by finding optimum solutions for a simplified world, which is to say, take your big problem, simplify it in some way, and then solve that, Or by finding satisfactory solutions for a more realistic world.
Speaker 2
16:48
Take the world in all its complexity, take the problem in all its complexity, and try to find something that works. Neither approach in general dominates the other, and both have continued to coexist. And so what you're actually going to see throughout the cognitive architecture community is this understanding that some problems you're not gonna be able to get an optimal solution to if you consider, for instance, a bounded amount of computation, bounded time, the need to be reactive to a changing environment, these sorts of issues. And so in some sense, we can decompose problems that come up over and over again into simpler problems, solve those near optimally or optimally, fix those in, optimize those, but for more general problems we might have to satisfice.
Speaker 2
17:36
There's also the idea of the symbol system hypothesis. So this is Allen Newell and Herb Simon, there considering how a computer could play the game of chess. So the physical symbol system talks about the idea of taking something, some signal, abstractly referred to as a symbol, combining them in some ways to form expressions, and then having operations that produce new expressions. The hypothesis is that such a symbol system is necessary and sufficient for intelligent systems.
Speaker 2
18:09
A very weak way of talking about it is the claim that there's nothing unique about the neuronal infrastructure that we have. But if we got the software right, we could implement it in the bits, bytes, RAM, and processor that make up modern computers. That's kind of the weakest way to look at this, that we can do it with silicon and not carbon. Stronger way that this used to be looked at was more of a logical standpoint, which is to say if we can encode rules of logic, these tend to line up if we think intuitively of planning and problem-solving.
Speaker 2
18:47
And if we can just get that right, and get enough facts in there and enough rules in there, then that's what we need for intelligence, and eventually we can get to the point of intelligence. And that was a starting point that lasted for a while. I think by now most folks in this field would agree that that's necessary to be able to operate logically, but that there are going to be representations and processes that will benefit from non-symbolic representation. So particularly perceptual processing, visual, auditory, and processing things in a more standard machine learning sort of way, as well as taking advantage of statistical representations.
Speaker 2
19:36
So we're getting closer to actually looking at cognitive architectures. I did want to go back to the idea that different researchers are coming with different research foci. And we'll start off with kind of the lowest level and understanding biological modeling. So Leabra and Spaun both try to model different degrees of low-level details, parameters, firing rates, connectivities between different kinds of levels of neuronal representations.
Speaker 2
20:09
They build that up and then they try to build tasks above that layer, but always being very cautious about being true to human biological processes. And a layer above there would be psychological modeling, which is to say trying to build systems that are true in some sense to areas of the brain, interactions in the brain, and being able to predict errors that humans make and timing produced by the human mind. And so there I'll talk a little bit about ACT-R. This final level down here: these are systems that are focused mainly on producing functional systems that exhibit really cool artifacts and solve really cool problems.
Speaker 2
20:56
And so I'll spend most of the time talking about Soar, but I want to point out a relative newcomer in the game called Sigma. So to talk about Spaun a little bit, we'll see if the sound works in here. I'm going to let the creator take this 1, or not. See how the AV system likes this.
Speaker 2
21:27
There we go.
Speaker 3
21:31
My name is Chris Eliasmith and I'm the director of the Centre for Theoretical Neuroscience at the University of Waterloo. And I'm actually jointly appointed between philosophy and engineering. The philosophy allows me to consider general conceptual issues about how the mind works.
Speaker 3
21:44
But of course, if I want to make claims about how the mind works, I have to understand also how the brain works. And this is where engineering plays a critical role. Engineering allows me to write down equations and very precise descriptions which we can test by building actual models. 1 model that we built recently is called the Spaun model.
Speaker 3
22:00
This model, Spaun, has about 2 and a half million individual neurons that are simulated in it. And the input to the model is an eye, and the output from the model is the movement of an arm. So essentially it can see images of numbers and then do something like categorize them, in which case it would just draw the number that it sees, or it can actually try to reproduce the style of the number that it's looking at. So for instance, if it sees a loopy 2, or a 2 with a big loop on the bottom, it can actually reproduce that particular style of 2.
Speaker 3
22:27
On the medical side, we all know that we have cognitive challenges that show up as we get older. And we can try to address those challenges by simulating the aging process with these kinds of models. Another potential area of impact is on artificial intelligence. A lot of work in artificial intelligence attempts to build agents that are extremely good at 1 task, for instance, playing chess.
Speaker 3
22:45
What's special about Spaun is that it's quite good at many different tasks. And this adds the additional challenge of trying to figure out how to coordinate the flow of information through different parts of the model. Something that animals seem to be very good at.
Speaker 2
23:02
So I'll provide a pointer at the end. He's got a really cool book called How to Build a Brain. And if you Google him, you can Google Spaun, you can find a toolkit where you can kind of construct circuits that will approximate functions that you're interested in, connect them together, set certain properties that you would want at a low level, and build them up and actually work on tasks at the level of vision and robotic actuation.
Speaker 2
23:28
So that's a really cool system. As we move into architectures that are sitting above that biological level, I wanted to give you kind of an overall sense of what they're going to look like, what a prototypical architecture is going to look like. So they're going to have some ability to have perception. The modalities typically are more digital symbolic, but they will, depending on the architecture, be able to handle vision, audition, and various sensory inputs.
Speaker 2
24:02
These will get represented in some sort of short-term memory, whatever the state's representation for the particular system is. It's typical to have a representation of the knowledge of what tasks can be performed, when they should be performed, how they should be controlled. And so these are typically both actions that take place internally that manage the internal state of the system and perform internal computations, but also about external actuation. And external might be a digital system, a game AI, but it might also be some sort of robotic actuation in the real world.
Speaker 2
24:39
There's typically some sort of mechanism by which to select from the available actions in a particular situation. There's typically some way to augment this procedural information, which is to say, learn about new actions, possibly modify existing ones. There's typically some semblance of what's called declarative memory. So whereas procedural, at least in humans, if I asked you to describe how to ride a bike, you might be able to say get on the seat and pedal, but in terms of keeping your balance there, you'd have a pretty hard time describing it declaratively.
Speaker 2
25:18
So that's kind of the procedural side, the implicit representation of knowledge, whereas declarative would include facts, geography, math, but it could also include experiences that the agent has had, a more episodic representation of declarative memory. And they'll typically have some way of learning this information, augmenting it over time. And then finally, some way of taking actions in the world. And they'll all have some sort of cycle, which is perception comes in, knowledge that the agent has is brought to bear on that, an action is selected, knowledge that knows to condition on that action will act accordingly, both with internal processes as well as eventually to take action, and then rinse and repeat.
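A toy sketch of that cycle as code; none of this is any particular architecture's API, it's just to show the shape of perceive, propose, decide, act repeated over and over.

    def run_agent(steps=5):
        """Many primitive decision cycles; long-term behavior emerges from the sequence."""
        working_memory = {}
        history = []
        for t in range(steps):
            # 1. Perception lands in short-term (working) memory.
            working_memory["percept"] = f"input-{t}"
            # 2. Procedural knowledge proposes actions that match the current state.
            proposals = ["wait", "respond-to-" + working_memory["percept"]]
            # 3. A decision procedure commits to exactly 1 action.
            action = max(proposals, key=len)
            # 4. The action updates internal state and/or drives external actuation.
            working_memory["last-action"] = action
            history.append(action)
        return history

    print(run_agent())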
Speaker 2
26:02
So when we talk about an agent in an AI system, in this context, that would be the fixed representation, which is whatever architecture we're talking about, plus a set of knowledge that is typically specific to the task but might be more general. So oftentimes these systems could incorporate a more general knowledge base of facts, of linguistic facts, of geographic facts. Let's take Wikipedia and let's just stick it in the brain of the system. That will be more task-general, but then there's also whatever it is that you're doing right now, and how you should proceed in that.
Speaker 2
26:37
And then it's typical to see this processing cycle, and going back to the prior assumption, the idea is that these primitive cycles allow for the agent to be reactive to its environment. So if new things come in that it has to react to, if the lion's sitting over there, I better run and maybe not do my calculus homework, right? So as long as this cycle is going, I'm reactive, but at the same time, if multiple actions are taken over time, I'm able to get complex behavior over the long term. So this is the ACT-R cognitive architecture.
Speaker 2
27:13
It has many of the kind of core pieces that I talked about before. Let's see if the, is the mouse, yes, mouse is useful up there. So we have the procedural module here. The short-term memory is going to be these buffers that are on the outside.
Speaker 2
27:31
The procedural memory is encoded as what are called production rules, or if-then rules. If this is the state of my short-term memory, this is what I think should happen as a result. You have a selection of the appropriate rule to fire and an execution. You're seeing associated parts of the brain being represented here.
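For the flavor of it, a production rule can be read as a condition-action pair over buffer contents; this toy sketch is not ACT-R syntax, just an illustration with made-up buffer names.

    # Toy buffers: named slots holding attribute-value pairs.
    buffers = {"goal": {"task": "add", "a": 2, "b": 3}, "retrieval": {}}

    def condition(bufs):
        # IF the goal is an addition and nothing has been retrieved yet...
        return bufs["goal"].get("task") == "add" and not bufs["retrieval"]

    def action(bufs):
        # ...THEN place the result in the retrieval buffer.
        bufs["retrieval"]["sum"] = bufs["goal"]["a"] + bufs["goal"]["b"]

    if condition(buffers):   # match and select the rule, then fire it
        action(buffers)
    print(buffers["retrieval"])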
Speaker 2
27:54
A cool thing that has been done over time in the ACT-R community is to make predictions about brain areas and then perform fMRI studies, gather that data, and correlate it. So when you use the system you will get predictions about things like timing of operations, errors that will occur, probabilities that something is learned, but you also get predictions about, to the degree that they can, kind of brain areas that are going to light up. And if you want to use it, it's actively being developed at Carnegie Mellon. To the left is John Anderson, who developed this cognitive architecture 30-ish years ago.
Speaker 2
28:38
And until about the last 5 years, he was the primary researcher and developer behind it, with Christian. And then recently, he's decided to spend more time on cognitive tutoring systems. And so Christian has become the primary developer. There is an annual ACT-R workshop.
Speaker 2
28:57
There's a summer school, which, if you're thinking about modeling a particular task, you can kind of bring your task to them, bring your data, they teach you how to use the system, and try to get that study going right there on the spot. To give you a sense of what kinds of tasks this could be applied to, this is representative of a certain class of tasks, certainly not the only 1. Let's try this again. I think PowerPoint's gonna want a restart every time.
Speaker 2
29:28
Okay, So we're getting predictions about basically where the eye is going to move. What you're not seeing is it's actually processing things like text and colors and making predictions about what to do and how to represent the information and how to process the graph as a whole. I had alluded to this earlier. There's work by Bonnie John, very similar, so making predictions about how humans would use computer interfaces.
Speaker 2
29:54
And at the time she got hired away by IBM. And so they wanted the ability to have software that you can put in front of software designers, and when they think they have a good interface, press a button. This model of human cognition would try to perform the tasks that it had been told to do and make predictions about how long it would take. And so you can have this tight feedback loop from designers saying, here's how good your particular interface is.
Speaker 2
30:20
So ACT-R as a whole is very prevalent in this community. I went to their webpage and counted up just the papers that they knew about; it was over 1,100 papers over time. If you're interested in it, the main distribution is in Lisp, but many people have used this and wanted to apply it to systems that need a little more processing power.
Speaker 2
30:42
So the NRL has a Java port of it that they use in robotics. The Air Force Research Lab in Dayton has implemented it in Erlang for parallel processing of large declarative knowledge bases. They're trying to do service-oriented architectures with it, CUDA, because they want what it has to say, they don't want to wait around for it to have to figure that stuff out. So that's the 2 minutes about ACT-R.
Speaker 2
31:11
Sigma is a relative newcomer, and it's developed out at the University of Southern California by a man named Paul Rosenbloom, and I'll mention him in a couple minutes because he was 1 of the prime developers of SOAR at Carnegie Mellon. So he knows a lot about how SOAR works and he's worked on it over the years. And I think originally, I'm gonna speak for him and he'll probably say I was wrong, I think originally it was kind of a mental exercise of: can I reproduce SOAR using a uniform substrate?
Speaker 2
31:42
I'll talk about SOAR in a little bit. It's 30 years of research code. If anybody's dealt with research code, it's 30 years of C and C++ with dozens of graduate students over time. It's not pretty at all.
Speaker 2
31:57
And theoretically, it's got these boxes sitting out here. And so he re-implemented the core functionality of SOAR all using factor graphs and message passing algorithms under the hood. He got to that point and then said there's nothing stopping me from going further. And so now it can do all sorts of modern machine learning, vision, optimization sort of things that would take some time in any other architecture to be able to integrate well.
Speaker 2
32:25
So it's been an interesting experience. It's now going to be the basis for the Virtual Human project out at the Institute for Creative Technology, it's an institute associated with the University of Southern California. For him, until recently, he couldn't get your hands on it, but in the last couple years he's done some tutorials on it, He's got a public release with documentation. So that's something interesting to keep an eye on.
Speaker 2
32:51
But I'm going to spend all the remaining time on the SOAR Cognitive Architecture. And so you see, it looks quite a bit like the prototypical architecture. And I'll give a sense, again, about how this all operates. Give a sense of the people involved.
Speaker 2
33:05
We already talked about Allen Newell, so both John Laird, who is my advisor, and Paul Rosenbloom were students of Allen Newell. John's thesis project was related to the chunking mechanism in SOAR, which learns new rules based upon sub-goal reasoning. So he finished that, I believe, the year I was born. And so he's 1 of the few researchers you'll find who's still actively working on their thesis project.
Speaker 2
33:39
Beyond that, I think about 10 years ago, he founded Soar Technology, which is a company up in Ann Arbor, Michigan. While it's called Soar Technology, it doesn't do exclusively SOAR, but that's a part of the portfolio. General intelligent-systems stuff, a lot of defense-related work. So, some notes on what's gonna make SOAR different from the other architectures that fall into this kind of functional architecture category.
Speaker 2
34:06
A big thing is a focus on efficiency. So John wants to be able to run SOAR on just about anything. We just got, on the SOAR mailing list, a request to run it on a real-time processor. And our answer, while we had never done it before, was: it'll probably work.
Speaker 2
34:25
Every release, there's timing tests. And what we always look at is, in a bunch of different domains, for a bunch of different reasons that relate to human processing, there's this magic number that comes out, which is 50 milliseconds. Which is to say, in terms of responding to tasks, if you're above that time humans will sense a delay, and you don't want that to happen. Now, if we're working in a robotics task and you're dramatically above that 50 milliseconds, you just fell off the curb, or worse, you just hit somebody in a car, right? So we're trying to keep that as low as possible, and for most agents, it doesn't even register.
Speaker 2
35:02
It's below 1 millisecond, fractions of millisecond. But I'll come back to this, because a lot of the work that I was doing was computer science, AI, and a lot of efficient algorithms and data structures. And 50 milliseconds was that very high upper bound. It's also 1 of the projects that has a public distribution.
Speaker 2
35:20
You can get it on all sorts of operating systems. We use something called SWIG that allows you to interface with it in a bunch of different languages. You kind of describe the interface once and you are able to basically generate bindings for a bunch of different platforms. The core is C++.
Speaker 2
35:38
There was a team at Soar Tech that said, we don't like C++, it gets messy. So they actually did a port over to pure Java in case that appeals to you. There's an annual SOAR workshop that takes place in Ann Arbor, typically it's free. You can go there, get a SOAR tutorial, and talk to folks who are working on SOAR.
Speaker 2
35:57
And it's fun, I've been there every year but 1 in the last decade. It's just fun to see the people around the world that are using the system in all sorts of interesting ways. To give you a sense of the diversity of the applications, 1 of the first was R1-Soar, which was back in the days when it was an actual challenge to build a computer, which is to say that your choice of certain components would have radical implications for other parts of the computer. So it wasn't just the Dell website where you just say, I want this much RAM, I want this much CPU.
Speaker 2
36:28
There was a lot of thinking that went behind it and then physical labor that went to construct your computer. And so it was making that process a lot better. There are folks that apply it to natural language processing. SOAR 7 was the core of the Virtual Humans Project for a long time.
Speaker 2
36:44
HCI tasks. TacAir-Soar was 1 of the largest rule-based systems: tens of thousands of rules, running over 48 hours. It was a very large-scale simulation, a defense simulation.
Speaker 2
36:56
Lots of games it's been applied to for various reasons. And then in the last few years, porting it onto mobile robotics platforms. This is Edwin Olson's Splinterbot, an early version of it that went on to win the MAGIC competition. Then I went on to put Soar on the web.
Speaker 2
37:16
And if, after this talk, you're really interested in the dice game that I'm going to talk about, you can actually go to the iOS App Store and download it. It's called Michigan Liar's Dice. It's free. You don't have to pay for it.
Speaker 2
37:28
But you can actually play Liar's Dice with SOAR. And you can set the difficulty level. It's pretty good. It beats me on a regular basis. I want to give you a couple of other applications that feel really weird, and are really cool.
Speaker 2
37:46
The first 1 is out of Georgia Tech. Go PowerPoint.
Speaker 3
37:55
Yes.
Speaker 4
38:00
In which human participants can engage in collaborative movement improvisation with each other and virtual dance partners. This interaction creates a hybrid space in which virtual and corporeal bodies meet. The line between human and non-human is blurred, spurring participants to examine their relationship with technology.
Speaker 4
38:22
The LuminAI installation ultimately examines how humans and machines can co-create experiences, and it does so in a playful environment. The dome creates a social space that encourages human-human interaction and collective dance experiences, allowing participants to creatively explore movement while having fun. The development of LuminAI has been a hybrid exploration in the art forms of theater and dance, as well as research in artificial intelligence and cognitive science. LuminAI draws on inspiration from the ancient art form of shadow theater.
Speaker 4
39:00
The original two-dimensional version of the installation led to the conceptualization of the dome and the liminal space, with human silhouettes and virtual characters meeting to dance together on the projection surface. Rather than relying on a pre-authored library of movement responses, the virtual dancer learns its partner's movements and utilizes Viewpoints movement theory to systematically reason about them in order to improvisationally choose a movement response. Viewpoints theory is based in dance and theater, and analyzes the performance along the dimensions of tempo, duration, repetition, kinesthetic response, shape, spatial relationships, gesture, architecture, and movement topography. The virtual dancer is able to use several different strategies to respond to human movements.
Speaker 4
39:54
These include mimicry of the movement, transformation of the movement along Viewpoints dimensions, recalling a similar or complementary movement from memory in terms of Viewpoints dimensions, and applying action-response patterns that the agent has learned while dancing with its human partner.
Speaker 3
40:13
The reason we did this is that it's part of a larger effort in our lab for understanding the relationship between computation, cognition, and creativity, where a large amount of our efforts go into understanding human creativity and how we make things together, how we're creative together, as a way to help us understand how we can build co-creative AI that serves the same purpose, where it can be a colleague and collaborate with us and create things with us.
Speaker 2
40:47
So Brian was a graduate student in John Laird's lab as well. Before I start this, I alluded to this earlier where we're getting closer to Rosie saying, Can you teach me? So let me give you some introduction to this.
Speaker 2
41:02
In the lower left, you're seeing the view of a Kinect camera onto a flat surface. There's a robotic arm, mainly 3D printed parts, few servos. Above that, you're seeing an interpretation of the scene. We're giving it associations of the 4 areas with semantic titles, like 1 is the table, 1 is the garbage, just semantic terms for areas.
Speaker 2
41:30
But other than that, the agent doesn't actually know all that much. And it's going to operate in 2 modalities. 1 is we'll call it natural language, natural-ish language, a restricted subset of English, as well as some quote, unquote, pointing. So you're going to see some mouse pointers in the upper left saying, I'm talking about this.
Speaker 2
41:51
And this is just a way to indicate location. And so starting off, we're going to say things like, pick up the blue block. And it's going to be like, I don't know what blue is. What is blue?
Speaker 2
42:01
We say, oh, well, that's a color. OK. So go get the green thing. What's green?
Speaker 2
42:09
Oh, it's a color. OK. Move the blue thing to a particular location. Where's that?
Speaker 2
42:14
Point it. OK. What is moving? Really, it has to start from the beginning.
Speaker 2
42:19
And it's described, and it said, OK, now you've finished. And once we got to that point, now I can say, move the green thing over here. And it's got everything that it needs to be able to then reproduce the task given new parameters. And it's learned that ability.
Speaker 2
42:33
So let me give it a little bit of time. So you can look a little bit at the top left in terms of the pointers. You're going to see some text commands being entered. So what kind of attribute is blue?
Speaker 2
42:55
We're going to say it's a color. And so that can map it then to a particular sensor modality. This is green, so the pointing. What kind of thing is green?
Speaker 2
43:04
OK, color. So now it knows how to understand blue and green as colors with respect to the visual scene. Move rectangle to the table. What is rectangle?
Speaker 2
43:17
OK, now I can map that onto understanding parts of the world. Is this the blue rectangle? So the arm is actually pointing itself to get confirmation from the instructor. And then we're trying to understand, in general, when you say move something, what is the goal of this operation?
Speaker 2
43:34
And so then it also has a declarative representation of the idea of this task, not only that it completed it. Then it can look back on having completed the task and understand what were the steps that led to achieving a particular goal. So in order to move it, you're gonna have to pick it up. It knows which 1 the blue thing is.
Speaker 2
44:00
Great. Now put it in the table. So that's a particular location. At this point we can say, you're done.
Speaker 2
44:12
You have accomplished: move the blue rectangle to the table. So now it can understand what that very simple kind of process is like and associate that with the verb to move. And now we can say, move the green object to the garbage. And without any further interaction, based on everything it learned up to that point, it can successfully complete that task. So this is the work of Shiwali Mohan and others in the SOAR group at the University of Michigan on the Rosie project.
Speaker 2
44:49
And they're extending this to playing games and learning the rules of games through text-based descriptions and multimodal experience. So, to build up to a research story in SOAR, I wanted to give you a sense of how research occurs in the group. And so there's this back and forth that occurs over time: there's this piece of software called SOAR, and we wanna make this thing better and give it new capabilities, and so all our agents are gonna become better. And we always have to keep in mind, and you'll see this as I go further, that it has to be useful to a wide variety of agents, it has to be task independent, and it has to be efficient.
Speaker 2
45:25
For us to do anything in the architecture, all of those have to hold true. So we do something cool in the architecture, and then we say, OK, let's solve a cool problem. So let's build some agents to do this. And so this ends up testing what are the limitations, what are the issues that arise in a particular mechanism, as well as integration with others.
Speaker 2
45:45
And we get to solve interesting problems, we usually find there was something missing, and then we can go back to the architecture and rinse and repeat. Just to give you an idea, again, how SOAR works. So the working memory is actually a directed connected graph. The perception is just a subset of that graph, and so there's going to be symbolic representations of most of the world.
Speaker 2
46:06
There is a visual subsystem in which you can provide a scene graph, I'm just not showing it here. Actions are also a subset of that graph, and so the procedural knowledge, which again is production rules, can read sections of the input, modify sections of the output, as well as arbitrary parts of the graph, to take actions. So the decision procedure says, of all the things that I know to do, and I've kind of ranked them according to various preferences, what single thing should I do? Semantic memory is for facts.
Speaker 2
46:36
There's episodic memory. The agent is always actually storing every experience it's ever had over time in episodic memory, and it has the ability to get back to that. And so the similar cycle we saw before, we get input in this perception called the input link. Rules are going to fire all in parallel and say here's everything I know about the situation, here's all the things I could do.
Speaker 2
46:57
Decision procedure says here's what we're going to do. Based upon the selected operator, All sorts of things could happen with respect to memories providing input, rules firing to perform computations, and as well as potentially output in the world. And remember, agent reactivity is required. We want the system to be able to react to things in the world at a very quick pace.
Speaker 2
47:24
So anything that happens in this cycle, at max, the overall cycle has to be under 50 milliseconds. And so that's going to be a constraint we hold ourselves to. And so the story I'll be telling is how we got to a point where we started actually forgetting things. And we're an architecture that doesn't want to be like humans.
Speaker 2
47:42
We want to create cool systems. But what we realized was something that we do, there's probably some benefit to it. And we actually put it into our system, and it led to good outputs. So here's the research path I'm going to walk down.
Speaker 2
47:56
We had just a simple problem, which was: we have these memory systems, and sometimes they're going to get a cue that could relate to multiple memories. And the question is, if you have a fixed mechanism, what should you return in a task-independent way? Which 1 of these many memories should you return? That was our question.
Speaker 2
48:15
And we looked to some human data on this, something called the rational analysis of memory done by John Anderson, and realized that in human language, there are recency and frequency effects that maybe would be useful. And so we actually did an analysis, found that not only does this occur, but it's useful in what are called word sense disambiguation tasks. And I'll get to that, what that means in a second. Developed some algorithms to scale this really well.
Speaker 2
48:42
And it turned out to work out well not only in the original task, but when we looked to 2 other completely different ones, the same underlying mechanism ended up producing some really interesting outputs. So let me talk about word sense disambiguation real quick. This is a core problem in natural language processing if you haven't heard of it before. Let's say we have an agent, and for some reason it needs to understand the verb to run.
Speaker 2
49:06
Looks to its memory and finds that it could run in the park, it could be running a fever, it could run an election, it could run a program. And the question is, what should a task-independent memory mechanism return if all you've been given is the verb to run? And so the rational analysis of memory looked through multiple text corpora. And what they found was, If a particular word had been used recently, it's very likely to be reused again.
Speaker 2
49:36
And if it hadn't been used recently, there's going to be this decay effect. In the expression here, the t is the time since each use, and it's going to sum those decaying traces. So what it looks like: if time is going to the right, activation higher is better. As you get these individual usages, you get these little boosts, and then eventually it decays down.
Speaker 2
49:59
And So if we had just 1 usage of a word, the red would be what the decay would look like. And so the core problem here is, if we're at a particular point and we want to select between the blue thing or the red thing, blue would have a higher activation. And so maybe that's useful. This is how things are modeled with human memory, but is it useful in general for tasks?
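The quantity being plotted is usually called base-level activation. As a sketch, under the standard formulation it is the log of a sum of decaying traces; the decay rate d = 0.5 below is a conventional default, not a value from the talk.

    import math

    def base_level_activation(use_times, now, d=0.5):
        """Recency and frequency in 1 number: ln of the sum of decayed traces,
        where each use at time t contributes (now - t) ** (-d)."""
        return math.log(sum((now - t) ** (-d) for t in use_times))

    # A memory used twice recently beats 1 used once long ago.
    print(base_level_activation([90.0, 95.0], now=100.0))
    print(base_level_activation([1.0], now=100.0))

A task-independent retrieval mechanism can then simply return the candidate memory with the highest activation, which is how the recency and frequency bias enters the word sense disambiguation experiment described next.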
Speaker 2
50:22
And so we looked at common corpora used in word sense disambiguation and just said, well, what if we just go through a corpus twice and just use prior answers? I asked the question, what is the sense of this word? I took a guess. I got the right answer. And I used that recency and frequency information in my task-independent memory.
Speaker 2
50:42
Would that be useful? And somewhat of a surprise, but maybe somewhat not of a surprise, it actually performed really well across multiple corpora. So we said, okay, this seems like a reasonable mechanism. Let's look at implementing this efficiently in the architecture.
Speaker 2
51:00
And the problem was this term right here said, for every memory, for every time step, you're having to decay everything. That doesn't sound like a recipe for efficiency if you're talking about lots and lots of knowledge over long periods of time. So we made use of a nice approximation that Petrov had come up with to approximate tail effects. So, accesses that happened long, long ago, we could basically approximate their effect on the overall sum.
Speaker 2
51:31
So we had a fixed set of values. And what we basically said is, since these are always decreasing, and all we care about is relative order, let's just only recompute when someone gets a new value. So it's a guess. It's a heuristic, an approximation.
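For a sense of what that optimization looks like, here is a sketch in the spirit of Petrov's hybrid approximation: keep the most recent access times exactly, summarize the older tail in closed form, and only recompute an item's activation when it is actually touched. The function name, the number of retained accesses, and the parameter values are all illustrative.

    import math

    def approx_activation(recent_ages, n_total, oldest_age, d=0.5):
        """Exact sum over the k most recent uses (recent_ages, as ages since use),
        plus a closed-form estimate for the n_total - k older uses spread
        between the kth-most-recent age and the oldest age."""
        k = len(recent_ages)
        total = sum(age ** (-d) for age in recent_ages)
        if n_total > k:
            t_k, t_n = max(recent_ages), oldest_age
            total += (n_total - k) * (t_n ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (t_n - t_k))
        return math.log(total)

    # 10 total uses, but we only remember the ages of the 3 most recent ones.
    print(approx_activation(recent_ages=[2.0, 5.0, 9.0], n_total=10, oldest_age=200.0))

Since activations only decrease between accesses and retrieval only cares about relative order, recomputing a value only when its memory is touched is the heuristic described above; it can occasionally change the ordering, which is why it is an approximation rather than the exact calculation.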
Speaker 2
51:48
But we looked at how this worked on the same set of corpora. And in terms of query time, if we made these approximations well under our 50 millisecond, the effect on task performance was negligible. In fact, on a couple of these, it got ever so slightly better in terms of accuracy. And actually, if we looked at the individual decisions that were being made, making these sorts of approximations were leading to up to 90%-- sorry, at least 90% of the decisions being made were identical to having done the true full calculation.
Speaker 2
52:25
So we said, this is great. And we implemented this and worked really well. And then we started working on what seemed like completely unrelated problems. 1 was in mobile robotics.
Speaker 2
52:37
We had a mobile robot I'll show a picture of in a little while roaming around the halls, performing all sorts of tasks. And what we were finding was If you have a system that's remembering everything in your short-term memory, and your short-term memory gets really, really big, I don't know about you, my short-term memory feels really, really small. I would love it to be big. But if you make your memory really big, and you try to remember something, you're now having to pull lots and lots and lots of information into your short-term memory.
Speaker 2
53:04
So the system was actually getting slower simply because it had a lot of short-term memory, a representation of the overall map it was looking at. So that's the large working memory problem. Liar's Dice is a game you play with dice. We were doing an RL-based system on this, reinforcement learning.
Speaker 2
53:22
And it turned out it's a really, really big value function. We were having to store lots of data. And we didn't know which stuff we had to keep around to keep the performance up. So we had a hypothesis that forgetting was actually going to be a beneficial thing, that maybe the problem we have with our memory is that we really, really dislike this forgetting thing.
Speaker 2
53:45
Maybe it's actually useful. And so we experimented with the following policy. We said, let's forget a memory if, 1, it's not predicted to be useful by this base-level activation: we haven't used it recently.
Speaker 2
53:59
We haven't used it frequently. Maybe it's not worth it. That and we felt confident that we could approximately reconstruct it if we absolutely had to. And if those 2 things held, we could forget something.
Speaker 2
54:13
So it's this same basic algorithm, but instead of ranking them, it's: if we set a threshold for base-level activation, finding when it is that a memory is going to pass that threshold, and trying to forget based upon that in a way that's efficient, that isn't going to scale really, really poorly. So we were able to come up with an efficient way to implement this using an approximation that, for most memories, ended up being exactly correct relative to the original. I'm happy to go over details of this if anybody's interested later. But it ended up being a fairly close approximation.
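As a sketch of what "finding when a memory is going to pass that threshold" can mean: with a single access the crossing time has a closed form, and with several accesses you can search for it numerically. The threshold and decay values here are made up.

    import math

    def single_use_forget_time(access_time, threshold=-2.0, d=0.5):
        """With 1 access, activation is ln((now - access_time) ** -d); solve for
        the time at which it equals the threshold."""
        return access_time + math.exp(-threshold / d)

    def crossing_time(use_times, threshold=-2.0, d=0.5, horizon=1e6):
        """Bisection search for when the summed, decaying activation drops below
        the threshold (None if it never does within the horizon)."""
        def activation(now):
            return math.log(sum((now - t) ** (-d) for t in use_times))
        lo, hi = max(use_times) + 1e-6, max(use_times) + horizon
        if activation(hi) > threshold:
            return None
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if activation(mid) > threshold else (lo, mid)
        return hi

    print(single_use_forget_time(access_time=0.0))   # about 54.6 time units later
    print(crossing_time([0.0, 10.0, 20.0]))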
Speaker 2
54:54
1 that, as compared to a completely accurate search for the value, ended up being somewhere between 15 to 20 times faster. And so then we looked at our mobile robot here. Oh sorry, let me get this back. Our little robot's actually going around; that's the third floor of the computer science building at the University of Michigan. He's going around. He's building a map.
Speaker 2
55:19
And again, the idea was this map is getting too big. So here was the basic idea. As the robot's going around, it's going to need this map information about rooms. The color there is describing kind of the strength of the memory.
Speaker 2
55:31
And as it gets farther and farther away and it hasn't used part of the map for planning or other purposes, basically make it decay away so that by the time it gets to the bottom, it's forgotten about the top. But we had the belief that we could reconstruct portions of that map if necessary. And so the hypothesis was this would take care of our speed problems. And so what we looked at was, here's our 50 millisecond threshold.
Speaker 2
55:56
If we do no forgetting whatsoever, bad things were happening over time. So just 3,600 seconds, this isn't a very long time. We're passing that threshold. This is dangerous for the robot.
Speaker 2
56:09
If we implemented task-specific, basically, cleanup rules, which is really hard to get right, that basically solved the problem. When we looked at our general forgetting mechanism that we're using in other places, at an appropriate level of decay, we were actually doing better than hand-tuned rules. So this was kind of a surprise win for us. The other task seems totally unrelated.
Speaker 2
56:31
It's a dice game. You cover your dice. You make bids about what are under other people's cups. This is played in Pirates of the Caribbean when they're on the boat in the second movie and bidding for lives of service.
Speaker 2
56:43
Honestly, this is a game we love to play in the University of Michigan lab. And so we're like, could Soar play this? And so we built a system that could learn to play this game rather well with reinforcement learning. And so the basic idea was, in a particular state of the game, Soar would have options of actions to perform.
Speaker 2
57:01
It could construct estimates of their associated value. It would choose 1 of those, and depending on the outcome, something good happened, you might update that value. And the big problem was that the size of the state space, the number of possible states and actions, just is enormous. And so memory was blowing up.
Speaker 2
57:20
And so what we said, similar sort of hypothesis, if we decay away these estimates that we could probably reconstruct and we haven't used in a while, are things going to get better? And so if we don't forget at all, 40,000 games isn't a whole lot when it comes to reinforcement learning. We were up at 2 gigs. We wanted to put this on an iPhone.
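A hedged sketch of the combination being described: tabular value estimates updated by reinforcement learning, with entries dropped once they have gone untouched long enough that their activation falls below a threshold, on the theory that they can be re-learned if the situation recurs. The update rule shown is ordinary Q-learning, and the names, threshold, and decay values are illustrative rather than Soar's actual mechanism.

    import math

    q_values, last_access = {}, {}
    alpha, gamma, d, threshold = 0.1, 0.95, 0.5, -3.0

    def q_update(state, action, reward, next_state, next_actions, now):
        key = (state, action)
        best_next = max((q_values.get((next_state, a), 0.0) for a in next_actions), default=0.0)
        old = q_values.get(key, 0.0)
        q_values[key] = old + alpha * (reward + gamma * best_next - old)
        last_access[key] = now

    def forget_stale_entries(now):
        """Drop estimates whose single-trace activation has decayed below threshold;
        forgotten entries silently default back to 0.0 if ever needed again."""
        for key, t in list(last_access.items()):
            if math.log(max(now - t, 1e-6) ** (-d)) < threshold:
                del q_values[key], last_access[key]

    q_update("2-fives-bid", "challenge", reward=1.0, next_state="end", next_actions=[], now=0.0)
    forget_stale_entries(now=10_000.0)
    print(len(q_values))   # the stale entry has been forgotten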
Speaker 2
57:42
That wasn't going to work so well. There had been prior work that had used a similar approach. They were down at 400 or 500 megs. The iPhone's not going to be happy, but it'll work.
Speaker 2
57:56
So that gave us some hope. And we implemented our system. OK, we're somewhere in the middle. We can fit on the iPhone, a very good iPhone, maybe an iPad.
Speaker 2
58:06
The question was, though, 1, efficiency. Yeah, we fit under our 50 milliseconds. But 2, how does the system actually perform when you start forgetting stuff? Can it learn to play well?
Speaker 2
58:18
And so y-axis here, you're seeing competency. You play 1,000 games. How many do you win? So the bottom here, 500, that's flipping a coin, whether or not you're going to win.
Speaker 2
58:30
If we do no forgetting whatsoever, this is a pretty good system. The prior work, while keeping the memory low, was also suffering with respect to how well it was playing the game. The kind of cool result was that the system that basically more than halved the memory requirement was still performing at the level of no forgetting whatsoever. So, just to bring back why I went through this story: we had a problem.
Speaker 2
59:00
We looked to our example of human-level AI, which is humans themselves. We took an idea, it turned out to be beneficial, we found efficient implementations, and then found it was useful in other parts of the architecture and other tasks that didn't seem to relate whatsoever. But if you download SOAR right now, you would gain access to all these mechanisms for whatever task you wanted to perform. Just to give some sense in the field of cognitive architecture what some of the open issues are, I think this is true in a lot of fields in AI, but integration of systems over time.
Speaker 2
59:33
The goal was that you wouldn't have all these theories and so you could just kind of build over time, particularly when folks are working on different architectures, that becomes hard. But also when you have very different initial starting points, that can still be an issue. Transfer learning is an issue. We're building into the space of multimodal representations, which is to say not only abstract symbolic, but also visual.
Speaker 2
59:56
Wouldn't it be nice if we had auditory and other senses?