The following is a conversation with Rajat Monga.

He's an engineering director at Google, leading the TensorFlow team.

TensorFlow is an open source library at the center of much of the work going on in the world in deep learning, both the cutting edge research and the large scale application of learning based approaches.

But it's quickly becoming much more than a software library.

It's now an ecosystem of tools for the deployment of machine learning in the cloud, on the phone, in the browser, on both generic and specialized hardware, TPU, GPU, and so on.

Plus, there's a big emphasis on growing a passionate community of developers.

Rajat, Jeff Dean, and a large team of engineers at Google Brain are working to define the future of machine learning with TensorFlow 2.0, which is now in alpha.

I think the decision to open source TensorFlow was a definitive moment in the tech industry.

It showed that open innovation can be successful, and it inspired many companies to open source their code, to publish, and in general to engage in the open exchange of ideas.

This conversation is part of the Artificial Intelligence Podcast.

If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D.

And now, here's my conversation with Rajat Monga.

You were involved with Google Brain since its start in 2011 with Jeff Dean.

It started with DistBelief, the proprietary machine learning library, and turned into TensorFlow in 2014, the open source library.

So what were the early days of Google Brain like?

How do you even proceed forward when there are so many possibilities before you?

It was interesting back then, you know, when I started, or when we were even just talking about it, the idea of deep learning was interesting and intriguing in some ways.

It hadn't yet taken off, but it held some promise.

It had shown some very promising and early results.

I think the idea where Andrew and Jeff had started was, what if we can take this, what people are doing in research and scale it to what Google has in terms of the compute power.

And also put that kind of data together, what does it mean?

And so far the results have been if you scale the compute, scale the data, it does better and would that work?

And so that was the first year or two, can we prove that out, right?

And with DistBelief, when we started the first year, we got some early wins, which is always great.

What were the wins where you thought, there's some promise to this, this is gonna be good?

I think there were two early wins. One was speech, where we collaborated very closely with the speech research team, who were also getting interested in this.

And the other one was on images where we, you know, the cat paper as we call it, that was covered by a lot of folks.

And the birth of Google Brain was around neural networks.

That was, so it was deep learning from the very beginning.

So what would, in terms of scale, what was the sort of dream of what this could become?

Were there echoes of this open source TensorFlow community that might be brought in?

Was there a sense of like, machine learning is now gonna be at the core of the entire company, is going to grow into that direction?

Yeah, I think, so that was interesting, and like if I think back to 2012 or 2011, the first thing was can we scale it, and in a year or so we had started scaling it to hundreds and thousands of machines.

In fact, we had some runs even going to 10,000 machines.

In terms of machine learning at Google, the good thing was Google's been doing machine learning for a long time.

But as we scaled this up, we showed that yes, that was possible, and it was going to impact lots of things.

Like we started seeing real products wanting to use this.

There were image things that Photos came out of, and then many other products as well.

As we went into that a couple of years, externally also academia started to, there was lots of push on, okay, deep learning's interesting, we should be doing more, and so on.

And so by 2014, we were looking at, okay, this is a big thing, it's gonna grow.

And not just internally, externally as well.

Yes, maybe Google's ahead of where everybody is, but there's a lot to do.

So a lot of this started to make sense and come together.

So the decision to open source, I was just chatting with Chris Lattner about this, the decision to go open source with TensorFlow, I would say, for me personally, seems to be one of the big seminal moments in all of software engineering ever.

I think that's when a large company like Google decides to take a large project that many lawyers might argue has a lot of IP, just decide to go open source with it, and in so doing, lead the entire world in saying, you know what, open innovation is a pretty powerful thing, and it's okay to do.

That was, I mean, that's an incredible moment in time.

So do you remember those discussions happening?

I would say, I think, so the initial idea came from Jeff, who was a big proponent of this.

One was, research-wise, we were a research group.

We were putting all our research out there.

If you wanted to, we were building on others' research and we wanted to push the state of the art forward.

And part of that was to share the research.

That's how I think deep learning and machine learning has really grown so fast.

So the next step was, okay, now, would software help with that?

And it seemed like there were a few existing libraries out there, Theano being one, Torch being another, and a few others.

But they were all done by academia, and so the level was significantly different.

The other one was from a software perspective, Google had done lots of software that we used internally, and we published papers.

Often there was an open source project that came out of that, that somebody else picked up that paper and implemented, and they were very successful.

Back then it was like, okay, there's Hadoop, which has come off of tech that we've built.

We know the tech we've built is way better for a number of different reasons.

And turns out we have Google Cloud and we are now not really providing our tech, but we are saying, okay, we have Bigtable, which is the original thing.

We are going to now provide HBase APIs on top of that, which isn't as good, but that's what everybody's used to.

So it was like, can we make something that is better and really just provide it?

That helps the community in lots of ways, but also helps push a good standard forward.

There's a TensorFlow open source library.

And how does the fact that you can use so many of the resources that Google provides and the Cloud fit into that strategy?

So TensorFlow itself is open, and you can use it anywhere.

And we want to make sure that continues to be the case.

On Google Cloud we do make sure that there's lots of integrations with everything else and we want to make sure that it works really really well there.

Can you tell me the history and the timeline of TensorFlow project in terms of major design decisions, so like the open source decision, but really what to include and not?

There's this incredible ecosystem that I'd like to talk about.

There's all these parts, but could you give some sample moments that defined what TensorFlow eventually became through its, I don't know if you're allowed to say history when it's just, but in deep learning everything moves so fast that just a few years is already history.

So looking back, we were building TensorFlow, I guess we open sourced it in

We started on it in summer of 2014, I guess.

And somewhere like 3 to 6, late 2014, by then we had decided that, okay, there's a high likelihood we'll open source it.

So we started thinking about that and making sure we're heading down that path.

By that point, we had seen a few, lots of different use cases at Google.

So there were things like, okay, yes, you want to run it at large scale in the data center.

Yes, we need to support different kinds of hardware, we had GPUs at that point, we had our first TPU at that point, or it was about to come out, you know, roughly around that time.

We had started to push on mobile, so we were running models on mobile.

At that point, people were customizing code, so we wanted to make sure TensorFlow could support that as well, so that that sort of became part of that overall design.

When you say mobile, you mean like pretty complicated algorithms running on the phone?

So when you have a model that you deploy on the phone and run it the right way.

So already at that time there were ideas of running machine learning on the phone.

We already had a couple of products that were doing that by then.

And in those cases, we had basically customized handcrafted code or some internal libraries that we were using.

So I was actually at Google during this time in a parallel, I guess, universe, but we were using Theano and Caffe.

Was there some degree to which you were balancing, like trying to see what Caffe was offering people, trying to see what Theano was offering, that you want to make sure you're delivering on whatever that is, perhaps the Python part of things, maybe, did that influence any design decisions?

Totally. So when we built DistBelief, some of that was in parallel with some of these libraries coming up.

But we were building DistBelief focused on our internal thing because our systems were very different.

By the time we got to this, we looked at a number of libraries that were out there.

Theano, and there were folks in the group who had experience with Torch, with Lua.

There were folks here who had seen Caffe.

I mean, actually, Yangqing was here as well.

In fact, yeah, we did discuss ideas around, okay, should we have a graph or not?

And so supporting all these together was definitely, you know, one of the key decisions that we wanted.

We had seen limitations in our prior DistBelief things.

A few of them were just in terms of research was moving so fast, we wanted the flexibility, the hardware was changing fast, we expected that to change, so those were probably two things.

And yeah, I think the flexibility in terms of being able to express all kinds of crazy things was definitely a big one then.

So what, the graph decisions, so that was moving towards TensorFlow

There's more, by default, it'll be eager execution.

So sort of hiding the graph a little bit because it's less intuitive in terms of the way people develop and so on.

What was that discussion like in terms of using graphs?

It seemed, it's kind of the Theano way, did it seem the obvious choice?

Where it came from was, our DistBelief had a graph-like thing as well.

A much more, it wasn't a general graph, it was more like a straight line thing.

More like what you might think of Caffe, I guess, in that sense.

But the graph was, and we always cared about the production stuff.

Like even with DistBelief, we were deploying a whole bunch of stuff in production.

So, graph did come from that when we thought of, okay, should we do that in Python?

And we experimented with some ideas where it looked a lot simpler to use, but not having a graph meant, okay, how do you deploy now?

So that was probably what tilted the balance for us and eventually we ended up with a graph.

And I guess the question there is, did you, I mean, so production seems to be the really good thing to focus on, but did you even anticipate the other side of it where there could be, what is it, what are the numbers?

I mean, was that even a possibility in your mind that it would be as popular as it became?

So I think we did see a need for this a lot from the research perspective and like early days of deep learning in some ways.

million, No, I don't think I imagined this number then.

It seemed like there's a potential future where lots more people would be doing this and how do we enable that?

I would say this kind of growth, I probably started seeing somewhat after the open sourcing where it was like, okay, you know, deep learning is actually growing way faster for a lot of different reasons.

And we are in just the right place to push on that and leverage that and deliver on lots of things that people want.

Like how, you know, this incredible amount of attention from a global population of developers, what, how did the project start changing?

I don't even actually remember during those times.

I know looking now, there's really good documentation, there's an ecosystem of tools, there's a community, there's a YouTube channel now, right?

I think we called it 0.6 or 5, something like that.

I think we've gone through a few things there.

When we started out, when we first came out, people loved the documentation we had, because it was just a huge step up from everything else, because all of those were academic projects, people who, you know, don't think about documentation.

I think what that changed was, instead of deep learning being a research thing, some people who were just developers could now suddenly take this out and do some interesting things with it, right?

Who had no clue what machine learning was before then.

And that, I think, really changed how things started to scale up in some ways and pushed on it.

Over the next few months, as we looked at, you know, how do we stabilize things?

As we look at not just researchers, now we want stability, people want to deploy things.

And there are certain needs from that perspective.

And so again, documentation comes up, designs, more kinds of things to put that together.

And so that was exciting to get that to a stage where more and more enterprises wanted to buy in and really get behind that.

And I think post 1.0 and you know, with the next few releases, that enterprise adoption also started to take off.

I would say between the initial release and 1.0, it was, okay, researchers, of course, then a lot of hobbyists and early interest, people excited about this who started to get on board, and then over the 1.x thing, lots of enterprises.

I imagine anything that's below 1.0, gets pressure to be, enterprise probably wants something that's stable.

And do you have a sense now that TensorFlow is stable? Like, it feels like deep learning in general is an extremely dynamic field, so much is changing.

Do you have a sense of stability at the helm of it?

I mean, I know you're in the midst of it, but.

Yeah, I think in the midst of it, it's often easy to forget what an enterprise wants and what some of the people on that side want.

There are still people running models that are 3 years old, 4 years old.

So Inception is still used by tons of people.

Even ResNet-50 is, what, a couple of years old now or more?

But there are tons of people who use that, and they're fine.

They don't need the last couple of bits of performance or quality.

They want some stability in things that just work.

And so there is value in providing that with that kind of stability and making it really simpler because that allows a lot more people to access it.

And then there's the research crowd which wants, okay, they want to do these crazy things exactly like you're saying, right?

And not just deep learning in the straight up models that used to be there.

They want RNNs, and even RNNs are maybe old, there are transformers now, and now it needs to combine with RL and GANs and so on.

So there's definitely that area that, like the boundary that's shifting and pushing the state of the art.

But I think there's more and more of the past that's much more stable and even stuff that was 2, 3 years old is very, very usable by lots of people.

So I imagine, maybe you can correct me if I'm wrong, one of the biggest use cases is essentially taking something like ResNet-50 and doing some kind of transfer learning on a very particular problem that you have.

It's basically probably what the majority of the world does.

And you want to make that as easy as possible.

So I would say, for the hobbyist perspective, that's the most common case, right?

In fact, the apps on phones and stuff that you'll see, the early ones, that's the most common case.

I would say there are a couple of reasons for that.

What enterprises want is that is part of it, but that's not the big thing.

Enterprises really have data that they want to make predictions on.

Often what the people who were doing ML used to do was just regression models: linear regression, logistic regression, linear models, or maybe gradient boosted trees and so on.

Some of them still benefit from deep learning, but they weren't that, that's the bread and butter, like the structured data and so on.

So depending on the audience you look at, they're a little bit different.

And they just have, I mean, the best of enterprise probably just has a very large data set, where deep learning can probably shine.

And then I think the other piece that they want, again, with 2.0, the developer summit we put together, is the whole TensorFlow Extended piece, which is the entire pipeline.

They care about stability across doing their entire thing.

They want simplicity across the entire thing.

I don't need to just train a model, I need to do that every day again, over and over again.

I wonder to which degree you have a role in, I don't know, so I teach a course on deep learning, I have people like lawyers come up to me and say, you know, say, when is machine learning gonna enter legal, the legal realm?

The same thing in all kinds of disciplines, immigration, insurance.

Often when I see what it boils down to is these companies are often a little bit old school in the way they organize the data.

Do you also find yourself being in the role of an evangelist for like, let's get, organize your data folks and then you'll get the big benefit of TensorFlow.

Do you get those, have those conversations?

Yeah, yeah, I get all kinds of questions there from, okay, what do I need to make this work, right?

I already use this linear model, why would this help?

I don't have enough data, let's say, you know, or I want to use machine learning, but I have no clue where to start.

So it varies, all the way to the experts who want very specific things, so it's interesting.

It boils down to oftentimes digitizing data.

So whatever you want automated, whatever data you want to make predictions based on, you have to make sure that it's in an organized form.

Like within the TensorFlow ecosystem, there's now, you're providing more and more datasets and more and more pre-trained models.

Are you finding yourself also the organizer of datasets?

Yes, I think with TensorFlow Datasets that we just released, that's definitely come up where people want these data sets.

Can we organize them and can we make that easier?

The other related thing I would say is I often tell people, you know what, don't think of the fanciest thing, the newest model that you see.

Make something very basic work, and then you can improve it.

There's just lots of things you can do with it.

One of the big things that made TensorFlow even more accessible was the appearance, whenever that happened, of Keras.

The Keras standard, sort of outside of TensorFlow.

I think it was Keras on top of Theano at first only, and then Keras became on top of TensorFlow.

Do you know when Keras chose to also add TensorFlow as a backend, was it just the community that drove that initially?

Do you know if there were discussions, conversations?

Yeah, so Francois started the Keras project before he was at Google.

I don't remember if that was after TensorFlow was created or way before.

And then at some point when TensorFlow started becoming popular, there were enough similarities that he decided to create this interface and put TensorFlow as a backend.

I believe that might still have been before he joined Google.

He decided on his own and thought that was interesting and relevant to the community.

In fact, I didn't find out about him being at Google until a few months after he was here.

He was working on some research ideas and doing Keras as his nights-and-weekends project.

So he wasn't like part of the TensorFlow.

He joined research and he was doing some amazing research.

And at some point we realized, oh, he's doing this good stuff.

Rajat Monga: TensorFlow | Lex Fridman Podcast #22