3 hours 15 minutes 50 seconds
🇬🇧 English
Speaker 1
01:00:00
Of how to come up with an efficient algorithm. And sometimes the more efficient algorithm is not so much more complex than the inefficient one. But that's an art, and it's not always the case. In the general case, the more performant the algorithm, the more complex it's going to be.
Speaker 1
01:00:18
There's a kind of trade off.
Speaker 2
01:00:20
The simpler algorithms are also the ones that people invent first, because when you're looking for a solution, you look at the simplest way to get there first. And so if there is a simple solution, even if it's not the best solution, not the fastest or the most memory-efficient or whatever... and simple is fairly subjective, but mathematicians have also thought about what a good definition of simple is in the case of algorithms... the simpler solutions tend to be easier to follow for other programmers who haven't made a study of a particular field.
Speaker 2
01:01:09
And when I started with Python, I was a good programmer in general. I knew sort of basic data structures. I knew the C language pretty well. But there were many areas where I was only somewhat familiar with the state of the art.
Speaker 2
01:01:30
And so I picked, in many cases, the simplest way I could solve a particular subproblem. Because when you're designing and implementing a language, you have many hundreds of little problems to solve. And you have to have solutions for every one of them before you can sort of say, I've invented a programming language.
Speaker 1
01:01:56
First of all, so CPython, what kind of things does it do? It's an interpreter, it takes in this readable language that we talked about, that is Python. What is it supposed to do?
Speaker 2
01:02:08
The interpreter basically, it's sort of a recipe for understanding recipes. So instead of a recipe that says, bake me a cake, we have a recipe for, well, given the text of a program, how do we run that program? And that is sort of the recipe for building a computer.
Speaker 1
01:02:34
The recipe for the baker and the chef. What are the algorithmically tricky things that happen to be low-hanging fruit that could be improved on? Maybe throw out the history of Python, but also now.
Speaker 1
01:02:48
How is it possible that in 3.11, in the year 2022, it's still possible to get such a big performance improvement?
Speaker 2
01:02:57
We focused on a few areas where we still felt there was low-hanging fruit. The biggest one is actually the interpreter itself.
Speaker 2
01:03:11
And this has to do with details of how Python is defined. So I don't know if the fisherman is going to follow this story.
Speaker 1
01:03:21
He already jumped off the boat. He's bored. He's stupid.
Speaker 2
01:03:25
Yeah. Python is actually, even though it's always called an interpreted language, there's also a compiler in there. It just doesn't compile to machine code. It compiles to bytecode, which is sort of code for an imaginary computer that is called the Python interpreter.
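The compile-to-bytecode step he describes can be seen directly with the standard `dis` module. A small sketch (the exact opcode names vary between Python versions, e.g. `BINARY_ADD` before 3.11 and `BINARY_OP` after, so this only inspects them rather than hard-coding any):

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode the compiler produced for this function:
# the instructions for the "imaginary computer" that is the interpreter.
for instr in dis.get_instructions(add):
    print(instr.opname, instr.argrepr)
```

Running this prints a short list of instructions ending in a return, which is the code the interpreter actually executes.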
Speaker 2
01:03:45
So it's compiling
Speaker 1
01:03:46
code that is more easily digestible by the interpreter or is digestible at all?
Speaker 2
01:03:51
It is the code that is digested by the interpreter. That's the compiler. We tweaked very minor bits of the compiler.
Speaker 2
01:03:57
Almost all the work was done in the interpreter, because when you have a program, you compile it once and then you run the code a whole bunch of times. Or maybe there's one function in the code that gets run many times. Now, I know that people who know this field are expecting me to at some point say, we built a just-in-time compiler. Actually, we didn't. We just made the interpreter a little more efficient.
Speaker 1
01:04:31
What's a just-in-time compiler? That is
Speaker 2
01:04:35
a thing from the Java world, although it's now applied to almost all programming languages, especially interpreted ones.
Speaker 1
01:04:44
So you see the compiler inside Python, not like a just-in-time compiler, but it's a compiler that creates bytecode that is then fed to the interpreter. And the compiler, was there something interesting to say about the compiler? It's interesting that you haven't changed that, tweaked that at all, or much.
Speaker 1
01:05:02
We changed some
Speaker 2
01:05:03
parts of the bytecode but not very much. And so we only had to change the parts of the compiler where we decided that the breakdown of a Python program in bytecode instructions had to be slightly different. But that didn't gain us the performance improvements.
Speaker 2
01:05:28
The performance improvements were like making the interpreter faster in part by sort of removing the fat from some internal data structures used by the interpreter. But the key idea is an adaptive specializing interpreter.
Speaker 1
01:05:49
Let's go. What is adaptive about it? What is specialized about it?
Speaker 2
01:05:53
Well, let me first talk about the specializing part, because the adaptive part is sort of a second-order effect, but they're both important. So bytecode is a bunch of machine instructions, but for an imaginary machine. But the machine can do things like call a function, add 2 numbers, print a value.
Speaker 2
01:06:18
Those are sort of typical instructions in Python. And if we take the example of adding 2 numbers, actually, in Python, the language, there's no such thing as adding 2 numbers. The compiler doesn't know that you're adding 2 numbers. You might as well be adding 2 strings or 2 lists or 2 instances of some user-defined class that happens to implement this operator called add.
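The operator overloading he describes looks like this in user code; `Money` is just an illustrative class, not something from the conversation:

```python
# A user-defined class can implement the + operator by defining __add__.
# The compiler emits the same generic "add" bytecode whether the operands
# turn out to be ints, strings, lists, or instances like these.
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

total = Money(250) + Money(199)
print(total.cents)  # 449
```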
Speaker 2
01:06:52
That's a very interesting and fairly powerful mathematical concept. It's mostly a user interface trick, because it means that a certain category of functions can be written using a single symbol, the plus sign, and a bunch of other functions can be written using another single symbol, the multiply sign. So if we take addition: the way the add bytecode was traditionally executed in Python is pointers, pointers, and more pointers. So first we have 2 objects.
Speaker 2
01:07:34
An object is basically a pointer to a bunch of memory that contains more pointers.
Speaker 1
01:07:39
Pointers all the way down.
Speaker 2
01:07:41
Well, not quite, but there are a lot of them. So, to simplify a bit: we look up in one of the objects what the type of that object is, and whether that object type defines an add operation. And so you can imagine that there is sort of a type integer that knows how to add itself to another integer.
Speaker 2
01:08:07
And there is a type floating point number that knows how to add itself to another floating point number. And the integers and floating point numbers are sort of important, I think, mostly historically, because in the first computers, the same bit pattern, when interpreted as a floating point number, had a very different value than when interpreted as an integer.
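The bit-pattern point can be demonstrated with the standard `struct` module: the same 8 bytes mean 1.0 when read as an IEEE double, and a completely unrelated number when read as a 64-bit integer.

```python
import struct

# Pack the float 1.0 into its raw 8-byte representation,
# then reinterpret those same bytes both ways.
bits = struct.pack("<d", 1.0)
print(struct.unpack("<d", bits)[0])  # 1.0
print(struct.unpack("<q", bits)[0])  # 4607182418800017408
```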
Speaker 1
01:08:34
Can I ask a dumb question here?
Speaker 2
01:08:36
Please do.
Speaker 1
01:08:36
Given the basics of int and float and add, who carries the knowledge of how to add 2 integers? Is it the integer? It's the type integer versus?
Speaker 2
01:08:47
It's the type integer and the type float.
Speaker 1
01:08:50
What about the operator? Does the operator just exist as a platonic form possessed by the integer?
Speaker 2
01:08:59
The operator is more like... It's an index in a list of functions that the integer type defines. And so the integer type is really a collection of functions.
Speaker 2
01:09:18
And there is an add function, and there is a multiply function, and there are like 30 other functions for other operations. There's a power function, for example. And you can imagine that in memory there is a distinct slot for the add operation. Let's say the add operation is the first operation of a type, and the multiply is the second operation of a type. So now we take the integer type and we take the floating point type. In both cases, the add operation is the first slot and multiply is the second slot.
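The slot table he describes can be sketched in plain Python. The dicts below are only an illustration of the idea; CPython's real slots live in C structs, not dictionaries.

```python
# A toy model of the "slot table": each type is a table of functions,
# and the generic add looks up the "add" slot on the left operand's type.
int_type = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
str_type = {"add": lambda a, b: a + b}

def generic_add(a, b):
    # Find the operand's type, then dispatch through its slot table.
    table = int_type if isinstance(a, int) else str_type
    return table["add"](a, b)

print(generic_add(2, 3))        # 5
print(generic_add("ab", "cd"))  # abcd
```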
Speaker 2
01:09:56
But each slot contains a function, and the functions are different, because the add-two-integers function interprets the bit patterns as integers. The add-two-floats function interprets the same bit pattern as a floating point number. And then there is the string data type, which again interprets the bit pattern as the address of a sequence of characters. There are lots of lies in that story, but that's sort
Speaker 1
01:10:35
of a basic idea. I can tell, I can tell the fake news and the fabrication going on here at the table. But where's the optimization?
Speaker 1
01:10:44
Is it on the operators? Is it different inside the integer?
Speaker 2
01:10:47
The optimization is the observation that in a particular line of code... so now you write your little Python program, and you write a function, and that function takes a bunch of inputs, and at some point it adds 2 of the inputs together. Now, I bet you, even if you call your function a thousand times, all those calls are likely going to be about integers. Because maybe your program is all about integers.
Speaker 2
01:11:24
Or maybe on that particular line of code where there's that plus operator, every time the program hits that line, the variables A and B that are being added together happen to be strings. And so what we do is, instead of having this single bytecode that says: here's an add operation, and the implementation of add is fully generic. It looks at the object; from the object it looks at the type; then it takes the type and looks up the function pointer; then it calls the function. Now the function has to look at the other argument and has to double-check that the other argument has the right type. And then there's a bunch of error checking before it can actually just go ahead and add the 2 bit patterns in the right way.
Speaker 2
01:12:16
What we do is, every time we execute an add instruction like that, we keep a little note of, in the end, after we hit the code that did the addition for a particular type, what type was it? And then after a few times through that code, if it's the same type all the time, we say: oh, this add operation, even though it's the generic add operation, might as well be the add-integer operation. And the add-integer operation is much more efficient, because it just says: assume that A and B are integers, do the addition right there inline, and produce the result. And the big lie here is that in Python, even if you have great evidence that in the past it was always 2 integers that you were adding, at some point in the future that same line of code could still be hit with 2 floating points, or 2 strings, or maybe a string and an integer.
Speaker 1
01:13:35
It's not a great lie. That's just the fact of life.
Speaker 2
01:13:39
I didn't account for what should happen in that case in the way I told the story. There is some accounting for that. And so what we actually have to do is when we have the add integer operation, we still have to check, are the 2 arguments in fact integers?
Speaker 2
01:14:01
We applied some tricks to make those checks efficient. And we know statistically that the outcome is almost always, yes, they are both integers. And so we quickly make that check and then we proceed with the sort of add integer operation. And then there is a fallback mechanism where we say, oops, 1 of them wasn't an integer.
Speaker 2
01:14:27
Now we're going to pretend that it was just the fully generic add operation. We wasted a few cycles believing it was going to be 2 integers, and then we had to back up, but we didn't waste that much time. And statistically, most of the time... basically, we were sort of hoping that most of the time we guess right, because if it turns out that we guessed wrong too often, or we didn't have a good guess at all, things might actually end up running a little slower. So someone armed with this knowledge and a copy of the implementation could easily construct a counterexample, where they say: oh, I have a program, and now it runs 5 times as slow in Python 3.11 as it did in Python 3.10.
Speaker 2
01:15:22
But that's a very unrealistic program. That's just like an extreme fluke.
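The guard-plus-fallback mechanism described above can be sketched as a toy in Python itself. Everything here (`make_adaptive_add`, the `threshold` parameter, the counters) is invented for illustration; CPython does this inside the interpreter loop in C, not with wrapper functions.

```python
# A toy adaptive-specializing dispatcher: after observing integer operands
# a few times, switch to a fast int-only path protected by a cheap guard,
# and "deoptimize" back to the generic path if the guard ever fails.
def make_adaptive_add(threshold=3):
    state = {"int_hits": 0, "specialized": False}

    def add(a, b):
        if state["specialized"]:
            if type(a) is int and type(b) is int:  # cheap guard check
                return a + b                       # fast specialized path
            state["specialized"] = False           # guess was wrong: fall back
        result = a + b                             # fully generic path
        if type(a) is int and type(b) is int:
            state["int_hits"] += 1
            if state["int_hits"] >= threshold:
                state["specialized"] = True        # specialize for ints
        return result

    return add

add = make_adaptive_add()
for _ in range(5):
    add(1, 2)         # warms up; specializes after a few int calls
print(add(3, 4))      # 7, via the fast path
print(add("a", "b"))  # ab: guard fails, deoptimizes, generic path used
```

The "weather tomorrow is the weather today" heuristic mentioned later is exactly this: assume the types seen so far will keep showing up, and pay a small penalty when they don't.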
Speaker 1
01:15:29
It's a fun reverse engineering task though. Oh yeah. So there's a...
Speaker 2
01:15:34
You can
Speaker 1
01:15:34
do that stuff. People like fun, yes. So there's some, presumably, heuristic of what defines a momentum of saying, you know, you seem to be working, adding 2 integers, not 2 generic types.
Speaker 1
01:15:51
So how do you figure out that heuristic? I think that the heuristic is actually, we assume that the weather tomorrow is gonna be the same as the weather today. So you don't need 2 days of the weather? No.
Speaker 2
01:16:05
That is already so much better than guessing randomly.
Speaker 1
01:16:10
So how do you find this idea? Hey, I wonder if instead of adding 2 generic types, we start assuming that the weather tomorrow is the same as the weather today, where do you find the idea for that? Because that ultimately, for you to do that, you have to kind of understand how people are using the language, right?
Speaker 2
01:16:34
Python is not the first language to do a thing like this. This is a fairly well known trick, especially from other interpreted languages that had reason to be sped up. We occasionally look at papers about HHVM, which is Facebook's efficient compiler for PHP.
Speaker 2
01:16:57
There are tricks known from the JVM And sometimes it just comes from academia.
Speaker 1
01:17:04
So the trick here is that the type itself doesn't, the variable doesn't know what type it is. So this is not a statically typed language where you can afford to have a shortcut to saying it's ints.
Speaker 2
01:17:17
This is a trick that is especially important for interpreted languages with dynamic typing, because if the compiler could read in the source that these x and y we're adding are integers, the compiler could just insert a single add machine instruction, that hardware instruction that exists on every CPU, and ditto for floats. But because in Python you don't generally declare the types of your variables, you don't even declare the existence of your variables, they just spring into existence when you first assign them.
Speaker 2
01:18:01
Which is really cool, and sort of helps those beginners, because there is less bookkeeping they have to learn before they can start playing around with code. But it makes the interpretation of the code less efficient, and so we're sort of trying to make the interpretation more efficient without losing the super dynamic nature of the language. That's always the challenge.
Speaker 1
01:18:32
3.5 got the PEP 484 type hints. What is type hinting, and are the hints used by the interpreter, or are they just syntactic sugar?
Speaker 2
01:18:44
So type hints are an optional mechanism that people can use, and they're especially popular with larger companies that have very large code bases written in Python.
Speaker 1
01:18:58
Do you think of it as almost like documentation saying these 2 variables
Speaker 2
01:19:02
are this type? More than documentation. I mean, so it is a sub-language of Python where you can express the types of variables.
Speaker 2
01:19:13
So here is a variable and it's an integer. And here's an argument to this function and it's a string and here is a function that returns a list of strings.
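The three kinds of annotations he lists look like this in code (using the modern built-in generic syntax, available since Python 3.9):

```python
# A variable that's an integer:
count: int = 0

# An argument that's a string, and a function that returns a list of strings:
def first_words(lines: list[str]) -> list[str]:
    return [line.split()[0] for line in lines]

def length(word: str) -> int:
    return len(word)

total: int = length("hello")
```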
Speaker 1
01:19:22
But that's not checked when you run the code.
Speaker 2
01:19:24
But exactly. There is a separate piece of software, called a static type checker, that reads all your source code without executing it, and thinks long and hard about what, from just reading the code, that code might be doing, and double-checks whether that makes sense if you take the types as annotated into account.
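A quick way to see that the interpreter ignores the hints: the function below is annotated as taking and returning an `int`, yet passing a string works fine at run time. A static checker such as mypy would flag the call; the interpreter does not.

```python
def double(x: int) -> int:
    return x * 2

# The annotation says int, but the interpreter never checks it:
print(double("ha"))  # haha -- runs fine despite violating the hint
```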
Speaker 1
01:19:51
So this is something you're supposed to run as you develop.
Speaker 2
01:19:54
It's like a linter. Yeah, that's definitely a development tool, but the type annotations currently are not used for speeding up the interpreter. And there are a number of reasons.
Speaker 2
01:20:09
Many people don't use them. Even when they do use them, they sometimes contain lies, where the static type checker says everything's fine. I cannot prove that this integer is ever not an integer, but at runtime somehow someone manages to violate that assumption. And the interpreter ends up doing just fine.
Speaker 2
01:20:36
If we started enforcing type annotations in Python, many Python programs would no longer work. And some Python programs wouldn't even be possible, because they're too dynamic. And so we made a choice of not using the annotations. There is a possible future where, eventually, 3, 4, 5 releases in the future, we could start using those annotations to provide hints. Because we can still say, well, the source code leads us to believe that these x and y are both integers, and so we can generate an add-integer instruction, but we can still have a fallback that says: oh, if somehow the code at runtime provided something else, maybe it provided 2 decimal numbers, we can still use that generic add operation as a fallback.
Speaker 2
01:21:40
But we're not there.
Speaker 1
01:21:41
Is there currently a mechanism or do you see something like that where you can almost add like an assert inside a function that says please check that my type hints are actually mapping to reality. Sort of like insert
Speaker 2
01:21:58
manual static typing. There are third party libraries that are in that business.
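A minimal sketch of how such a third-party library might enforce hints at call time, handling only plain-class annotations. The `enforce` decorator is an invented name for illustration, not a real library's API; real libraries in this space handle far more of the typing language.

```python
import functools
import inspect

def enforce(func):
    """Toy decorator: check plain-class argument annotations when called."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            ann = func.__annotations__.get(name)
            # Only handle simple cases like `int` or `str`:
            if isinstance(ann, type) and not isinstance(value, ann):
                raise TypeError(f"{name} should be {ann.__name__}")
        return func(*args, **kwargs)

    return wrapper

@enforce
def scale(x: int) -> int:
    return x * 2

print(scale(3))      # 6
# scale("ha") would now raise TypeError instead of returning "haha"
```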
Speaker 1
01:22:05
It's possible to do that kind of thing? It's possible for a third party library to take a hint and enforce it? Well, yes.
Speaker 1
01:22:12
It seems like
Speaker 2
01:22:13
a tricky thing. Well, what we actually do, and I think this is a fairly unique feature in Python, is that the type hints can be introspected at runtime, while the program is running. I mean, Python is a very introspectable language.
Speaker 2
01:22:32
You can look at a variable and ask yourself what is the type of this variable and if that variable happens to refer to a function you can ask what are the arguments to the function. And nowadays you can also ask what are the type annotations for the function.
Speaker 1
01:22:50
So the type annotations are there inside the variable as it's at runtime.
Speaker 2
01:22:55
They're mostly associated with the function object, not with each individual variable, but you can sort of map from the arguments to the variables.
Speaker 1
01:23:05
And that's what a third party library can help with. Exactly.
Speaker 2
01:23:08
And the problem with that is that all that extra runtime type checking is going to slow your code down instead of speeding it up.
Speaker 1
01:23:17
I think, to reference this sales-pitchy blog post that says 75% of developers' time is spent on debugging, I would say that in some cases that might be okay. It might be okay to pay the cost of performance for the catching of the
Speaker 2
01:23:34
types, the type errors. And in most cases, doing it statically before you ship your code to production is more efficient than doing it at runtime piecemeal. Yeah.
Speaker 1
01:23:50
Can you tell me about MYPY, MyPy project? What is it? What's the mission?
Speaker 1
01:23:59
And in general, What is the future of static typing in Python?
Speaker 2
01:24:04
Well, so MyPy was started by a Finnish developer, Jukka Lehtosalo.
Speaker 1
01:24:11
So many cool things out of Finland, I gotta say. Just that part of the world.
Speaker 2
01:24:15
I guess people have nothing better to do in those long, cold winters. I don't know. I think Jukka lived in England when he invented that stuff, actually.
Speaker 2
01:24:26
But MyPy is the original static type checker for Python. And the type annotations that were introduced with PEP 484 were sort of developed together with the static type checker. And in fact, Jukka had first invented a different syntax that wasn't quite compatible with Python. And Jukka and I sort of met at a Python conference in, I think, 2013.
Speaker 2
01:24:59
And we sort of came up with a compromise syntax that would not require any changes to Python and that would let MyPy sort of be an add-on static type checker for Python.
Speaker 1
01:25:15
Just out of curiosity, was it like double colon or something? What was he proposing that would break Python?
Speaker 2
01:25:21
I think he was using angular brackets for types like in C++ or Java generics.
Speaker 1
01:25:28
Yeah, you can't use angular brackets in Python. It would be too tricky
Speaker 2
01:25:33
for template. Well, the key thing is that we already had a syntax for annotations, we just didn't know what to use them for yet. So type annotations were just the sort of most logical thing to use that existing dummy syntax for.
Speaker 2
01:25:54
But there was no syntax for defining generics directly, syntactically, in the language. MyPy literally meant "my version of Python," where "my" refers to Jukka. He had a parser that translated MyPy into Python by doing the type checks and then removing the annotations and all the angular brackets from the positions where he was using them.
Speaker 1
01:26:30
But a pre-processor model doesn't work very well with the typical workflow of Python development projects. That's funny. I mean, that could have been another major split if it became successful.
Speaker 1
01:26:42
Like if you watch TypeScript versus JavaScript, it's like a split in the community over types, right? That seems to be stabilizing now.
Speaker 2
01:26:53
It's not necessarily a split. There are certainly plenty of people who don't use TypeScript, but just use the original JavaScript notation, just like there are many people in the Python world who don't use type annotations and don't use static type checkers.
Speaker 1
01:27:11
No, I know, but there is a bit of a split between TypeScript and old school JavaScript, ES, whatever.
Speaker 2
01:27:17
Well, in the JavaScript world, transpilers are sort of the standard way of working anyway, which is why TypeScript being a transpiler itself is not a big deal.
Speaker 1
01:27:28
And transpilers, for people who don't know, it's exactly the thing you said with MyPy, it's the code, I guess you call it pre-processing code that translates from 1 language to the other. And that's part of the culture, part of the workflow of the JavaScript community, so.
Speaker 2
01:27:43
That's right. At the same time, an interesting development in the JavaScript slash TypeScript world at the moment is that there is a proposal under consideration, it's only a stage 1 proposal, that proposes to add a feature to JavaScript where just like Python, it will ignore certain syntax when running the JavaScript code. And what it ignores is more or less a superset of the TypeScript annotation syntax.
Speaker 2
01:28:20
Interesting. So that would mean that eventually, if you wanted to, you could take TypeScript and shove it directly into a JavaScript interpreter without translation. The interesting thing in the JavaScript world, at least the web browser world, is that the web browsers have changed how they deploy, and they update their JavaScript engines much more quickly than they used to in the early days. And so there's much less of a need for translation in JavaScript itself, because most browsers just support the most recent version of ECMAScript. Just on a tangent of a tangent, do you see, if you were
Speaker 1
01:29:08
to recommend somebody use a thing, would you recommend TypeScript or JavaScript? I would recommend TypeScript. Just because of the strictness of the typing.
Speaker 1
01:29:19
It's an enormously helpful extra tool that helps you sort of
Speaker 2
01:29:26
keep your head straight about what your code is actually doing. I mean it helps with editing your code, it helps with ensuring that your code is not too incorrect. And it's actually quite compatible with JavaScript, never mind this syntactic sort of hack that is still years in the future.
Speaker 2
01:29:52
But any library that is written in pure JavaScript can still be used from TypeScript programs. And also the other way around, you can write a library in TypeScript and then export it in a form that is totally consumable by JavaScript. That sort of compatibility is sort of the key to the success of TypeScript.
Speaker 1
01:30:17
Yeah, just to look at it, it's almost like a biological system that's evolving. It's fascinating to see JavaScript evolve the way it does.
Speaker 2
01:30:24
Well, maybe we should consider that biological systems are just engineering systems too, right? Yes. Just very advanced, with more history.
Speaker 1
01:30:35
But it's almost like the most visceral in the JavaScript world because there's just so much code written in JavaScript that for its history was messy. If you're talking about bugs per line of code, I just feel like JavaScript eats the cake or whatever the terminology is. It beats Python by a lot in terms of number of bugs, meaning like way more bugs in JavaScript.
Speaker 1
01:31:01
And then obviously the browsers are developed. I mean, just there's so much active development, it feels a lot more like evolution, where a bunch of stuff is born and dies, and there's experimentation and debates, versus Python, there's more, all that stuff is happening, but there's just a longer history of stable, working, giant software systems written in Python versus JavaScript is just a giant, beautiful, I would say, mess of code.
Speaker 2
01:31:31
It's a very different culture and to some extent differences in culture are random, but to some extent the differences have to do with the environment. And the fact that JavaScript is primarily the language for developing web applications, especially the client side. And the fact that it's basically the only language for developing web applications makes that community sort of just have a different nature than the community of other languages.
Speaker 1
01:32:10
Plus the graphical component and the fact that they're deploying it on all kinds of shapes of screens and devices and all that kind of stuff, it just creates a beautiful chaos. Anyway, back to MyPy. So what, okay, you met, you talked about a syntax that could work.
Speaker 1
01:32:29
Where does it currently stand? What's the future of static typing in Python?
Speaker 2
01:32:34
It is still controversial, but it is much more accepted than when MyPy and PEP 484 were young.
Speaker 1
01:32:43
What's the connection between PEP 484 type hints and MyPy?
Speaker 2
01:32:48
MyPy was the original static type checker, so MyPy quickly evolved from Jukka's own variant of Python to a static type checker for Python. And sort of PEP 484... that was a very productive year, where many hundreds of messages were exchanged debating the merits of every aspect of that PEP.
Speaker 2
01:33:19
And so MyPy is a static type checker for Python. It is itself written in Python. Most additional static typing features that we introduced in the time since 3.6 were also prototyped through MyPy. MyPy being an open source project with a very small number of maintainers was successful enough that people said this static type checking stuff for Python is actually worth an investment for our company.
Speaker 2
01:33:57
But somehow they chose not to support making MyPy faster, say, or adding new features to MyPy. Instead, Google and Facebook, and later Microsoft, each developed their own static type checker. I think Facebook was one of the first. They decided that they wanted to use the same technology that they had successfully used for HHVM, because they had a bunch of compiler writers and static type checking experts who had written the HHVM compiler, and it was a big success within the company.
Speaker 2
01:34:45
And they had done it in a certain way. They wrote a big, highly parallel application in an obscure language named OCaml, which is apparently very good for writing static type checkers.
Speaker 1
01:35:01
Interesting. I have a lot of questions about how to write a static typechecker then. That's very confusing.
Speaker 2
01:35:07
Facebook wrote their version, and they worked on it in secret for about a year, and then they came clean and went open source. Google in the meantime was developing something called PyType, which was mostly interesting because, as you may have heard, they have one gigantic monorepo. So all the code is checked into a single repository.
Speaker 2
01:35:35
Facebook has a different approach. So Facebook developed Pyre, which was written in OCaml and worked well with Facebook's development workflow. Google developed something they called PyType, which was itself written in Python, and it was meant to fit well with the static type checking needs of Google's gigantic monorepo.
Speaker 1
01:36:05
So Google has it in one giant... got it. So just to clarify, this static type checker philosophically is a thing that's supposed to exist outside of the language itself. And it's just a workflow, like a debugger, for the programmers.
Speaker 1
01:36:19
It's
Speaker 2
01:36:20
a linter.
Speaker 1
01:36:21
For people who don't know, a linter, maybe you can correct me, but it's a thing that runs through the code continuously, pre-processing to find issues based on style, documentation. I mean there's all kinds of linters, right? You can check that.
Speaker 1
01:36:37
What usual things does a linter do? Maybe check that you don't have too many characters in a single line.
Speaker 2
01:36:45
Linters often do static analysis where they try to point out things that are likely mistakes but not incorrect according to the language specification. Like maybe you have a variable that you never use. For the compiler, that is valid.
Speaker 2
01:37:04
You might be planning to use it in a future version of the code, and the compiler might just optimize it out. But the compiler is not going to tell you: hey, you're never using this variable. A linter will tell you: that variable is not used. Maybe there's a typo somewhere else, where you meant to use it but accidentally used something else. There are a number of common scenarios like that, and a linter is often a big collection of little heuristics where, by looking at the combination of how your code is laid out, maybe how it's indented, maybe the comment structure, but also just things like definitions of names and uses of names, it'll tell you likely things that are wrong.
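An unused-variable check like the one described here can be sketched in a few lines with the standard `ast` module. This is a toy heuristic for illustration, not how a real linter is built (it ignores function arguments, nested scopes, and many other cases):

```python
import ast

def unused_locals(source):
    """Toy linter check: names assigned in a function but never read."""
    tree = ast.parse(source)
    warnings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            assigned = {n.id for n in ast.walk(node)
                        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
            used = {n.id for n in ast.walk(node)
                    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
            warnings += sorted(assigned - used)
    return warnings

code = "def f():\n    x = 1\n    y = 2\n    return y\n"
print(unused_locals(code))  # ['x'] -- x is assigned but never read
```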
Speaker 2
01:37:53
It'll tell you likely things that are wrong. And in some cases, linters are really style checkers. For Python, there are a number of linters that check things like, do you use the PEP 8 recommended naming scheme for your functions and classes and variables? Because like classes start with an uppercase and the rest starts with a lowercase.
Speaker 2
01:38:19
There are differences like that. And so the linter can tell you: hey, you have a class whose first letter is not an uppercase letter. And I just find that annoying; if I wanted that to be an uppercase letter, I would have typed an uppercase letter. But other people find it very comforting that if the linter is no longer complaining about their code, they have followed all the style rules.
Speaker 2
01:38:45
Maybe it's a fast way for a new developer joining a team to
Speaker 1
01:38:48
learn the style rules, right?
Speaker 2
01:38:50
Yeah, there's definitely that. But the best use of a linter is probably not so much to sort of enforce team uniformity, but to actually help developers catch bugs that the compilers for whatever reason don't catch. And there is lots of that in Python.
Speaker 2
01:39:12
And so a static type checker focuses on a particular aspect of linting. I mean, MyPy doesn't care how you name your classes and variables, but it is meticulous about cases where you said it was an integer here and you're passing a string there. It will tell you: hey, that string is not an integer, so something's wrong. Either you were incorrect when you said it was an integer, or you're incorrect when you're passing it a string. If this is a race of static type checkers, is somebody winning?
Speaker 1
01:39:49
As you said, it's interesting that the companies didn't choose to invest in the centralized development of MyPy. Is there a future for MyPy? Will one of the companies win out, so that everybody uses, like, pytype, whatever Google's is called?
Speaker 2
01:40:10
Well, Microsoft is hoping that Microsoft's horse in that race, called Pyright, is going to win. Pyright, like R-I-G-H-T? Correct, yeah. All my word processors tend to autocorrect that to Pyrite, the name of, I don't know what it is, some kind of semi-precious mineral.
Speaker 2
01:40:35
Oh, right.
Speaker 1
01:40:37
I love it. Okay, so, okay, that's the Microsoft hope, but, okay, so let me ask the question a different way. Is there going to be ever a future where the static type checker gets integrated into the language?
Speaker 2
01:40:53
Nobody is currently excited about doing any work towards that. That doesn't mean that 5 or 10 years from now the situation isn't different. At the moment all the static type checkers still evolve at a much higher speed than Python and its annotation syntax evolve.
Speaker 2
01:41:22
You get a new release of Python once a year. Those are the only times that you can introduce new annotation syntax. And there are always people who invent new annotation syntax that they're trying to push. And worse, once we've all agreed that we are going to put some new syntax in, we can never take it back.
Speaker 2
01:41:48
At least, deprecating an existing feature takes many releases, because you have to assume that people started using it as soon as we announced it. And then you can't take it away from them right away. You have to start telling them, well, this will go away, but we're not going to call it an error yet; then later it becomes a warning, and then eventually, three releases in the future, maybe we remove it. On the other hand, the typical static type checker still has a release, like, every month, every two months, certainly many times a year.
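That phased removal follows a pattern Python's own warnings module supports; here's a small sketch with made-up function names:

```python
import warnings

def new_way(x):
    return x * 2

def old_way(x):
    """Deprecated: still works, but announces its coming removal."""
    warnings.warn(
        "old_way() is deprecated and will be removed in a future release; "
        "use new_way() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_way(x)

# Capture the warning so we can show it was emitted:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_way(21)

print(result)                       # 42
print(caught[0].category.__name__)  # DeprecationWarning
```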
Speaker 2
01:42:30
Some type checkers also include a bunch of experimental ideas that aren't official standard Python syntax yet. The static type checkers also just get better at discovering things that are unspecified by the language, but that sort of could make sense. And so each static type checker actually has its sort of strong and weak points.
Speaker 1
01:43:00
So it's cool, it's like a laboratory of experiments. Yeah. Microsoft, Google and all, and you get to see.
Speaker 2
01:43:04
And you see that everywhere, right? Because there's not 1 single JavaScript engine either. There is 1 in Chrome, there is 1 in Safari, there is 1 in Firefox.
Speaker 1
01:43:17
But that said, you said there's no interest. I think there is a lot of interest in type hinting, right? In PEP 484.
Speaker 1
01:43:26
Actually, like, how many people use that? Do you have a sense? How many people use, because it's optional, it's just sugar.
Speaker 2
01:43:32
I can't put a number on it, but from the number of packages that do interesting things with it at runtime, and the fact that there are now, like, three or four very mature type checkers that each have their segment of the market, and, oh, then there is PyCharm, which has a sort of more heuristic-based type checker that also supports the same syntax. My assumption is that many, many people developing Python software professionally, for some kind of production situation, are using a static type checker.
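One standard-library example of a package doing interesting things with annotations at runtime is dataclasses, which reads the field annotations to generate the constructor, repr, and equality methods:

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Point:
    x: int  # these annotations drive the generated __init__, __repr__, __eq__
    y: int

p = Point(3, 4)
print(p)                      # Point(x=3, y=4)
print(get_type_hints(Point))  # {'x': <class 'int'>, 'y': <class 'int'>}
```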
Speaker 2
01:44:15
Especially anybody who has a continuous integration cycle probably has, as one of the steps in the testing routine that happens for basically every commit, running a static type checker. In most cases, that will be MyPy. So I think it's a pretty popular topic.
Speaker 1
01:44:42
According to this web page, 20-30 percent of Python 3 codebases are using type hints. Wow. I wonder how they measured that.
Speaker 1
01:44:53
Did they just scan all of GitHub? Yeah, that's what it looks like. Yeah. They did a quick, not all of it, but like a random sampling.
Speaker 1
01:45:02
So you mentioned PyCharm. Let me ask you the big subjective question. What's the best IDE for Python? And you're extremely biased now that you're with Microsoft.
Speaker 1
01:45:17
Is it PyCharm, VS Code, Vim or Emacs?
Speaker 2
01:45:21
Historically, I actually started out with using Vim, but when it was still called vi. For a very long time, I think from the early eighties to, I'd say, two years ago, I was an Emacs user. Nice. Between, I'd say, 2013 and 2018, I dabbled with PyCharm, mostly because it had a couple of features. I mean, PyCharm is like driving an 18-wheeler truck, whereas Emacs is more like driving your comfortable Toyota car that you've had for 100,000 miles, and you know what every little rattle of the car means.
Speaker 2
01:46:19
I was very comfortable in Emacs, but there were certain things it couldn't do. It wasn't very good at, at least the way I had configured it, I didn't have very good tooling in Emacs for finding the definition of a function. When I was at Dropbox, exploring a five-million-line Python code base, I was just grepping all that code for, where is there a class FooBar? Well, it turns out that if you grep all five million lines of code, there are many classes with the same name.
Speaker 2
01:46:57
And so PyCharm, once you fired it up and once it had indexed your repository, was very helpful. But as soon as I had to edit code, I would jump back to Emacs and do all my editing there, because I could type much faster and switch between files, when I knew which file I wanted, much quicker. And I never really got used to the whole PyCharm user interface.
Speaker 1
01:47:26
Yeah, I feel torn in that same kind of way because I've used PyCharm off and on exactly in that same way. And I feel like I'm just being an old grumpy man for not learning how to quickly switch between files and all that kind of stuff. I feel like that has to do with shortcuts, that has to do with, I mean, you just have to get accustomed, just like with touch typing.
Speaker 2
01:47:46
Yeah, you have to just want to learn that. I mean, if you don't need it much.
Speaker 1
01:47:51
You don't need touch typing either. You can type with two fingers just fine in the short term, but in the long term your life will become better, psychologically and productivity-wise, if you learn how to type with ten fingers.
Speaker 2
01:48:04
If you do a lot of keyboard input.
Speaker 1
01:48:06
Everyone does emails and stuff, right? Like, you look at the next 20, 30 years of your life; you have to anticipate where technology is going. Do you wanna invest in handwriting notes?
Speaker 1
01:48:19
Probably not. More and more people are doing typing versus handwriting notes. So you can anticipate that. So there's no reason to actually practice handwriting.
Speaker 1
01:48:28
There's more reason to practice typing. You can actually estimate, back to the spreadsheet, the number of paragraphs, sentences, or words you write for the rest of your life.
Speaker 2
01:48:43
There you go again with the spreadsheet of my life, huh?
Speaker 1
01:48:46
Yes, all of that is not actual, like converted to a spreadsheet, but it's a gut feeling. Like I have the same kind of gut feeling about books. I've almost exclusively switched to Kindle now, for ebook readers, even though I still love, and probably always will, the smell, the feel of a physical book.
Speaker 1
01:49:05
And the reason I switched to Kindle is, like, all right, well, this is really where things are heading; the future is going to be digital in terms of consuming books and content of that nature. So you should let your brain get accustomed to that experience. In that same way, it feels like I should probably at some point very soon switch entirely to PyCharm or VS Code (I think PyCharm is the most sophisticated, featureful Python IDE), like I'm not allowed to use anything else for Python than this IDE, or VS Code, it doesn't matter, but walk away from Emacs for this particular application.
Speaker 1
01:49:48
So I think I'm limiting myself in the same way that using 2 fingers for typing is limiting myself. This is a therapy session, I'm not even asking questions.
Speaker 2
01:50:00
But I'm sure a lot of people are thinking
Speaker 1
01:50:01
this way, right? I'm not going to
Speaker 2
01:50:01
stop you. I think that everybody has to decide for themselves which one they want to invest more time in. I actually ended up giving VS Code a very tentative try when I started out at Microsoft, and really liking it.
Speaker 2
01:50:24
And it sort of, it took me a while before I realized why that was. And I think, and actually the founders of VS Code may not necessarily agree with me on this, but to me, VS Code is in a sense the spiritual successor of Emacs. Because, as you probably know as an old Emacs hacker, the key part of Emacs is that it's mostly written in Lisp. New versions of Emacs usually update all the Lisp packages and add new Lisp packages, and, oh yeah, there's also some very obscure thing improved in the part that's not written in Lisp, but that's usually not why you would upgrade to a new version of Emacs.
Speaker 2
01:51:21
There's a core implementation that can read a file, put bits on the screen, and manage memory and buffers. And then what makes it an editor full of features is all the Lisp packages, and of course the design of how the Lisp packages interact with each other and with that base layer, the core immutable engine. But almost everything in that core engine, in Emacs's case, can still be overridden or replaced.
Speaker 1
01:52:00
And
Speaker 2
01:52:02
so VS Code has a similar architecture where there is like a base engine that you have no control over. I mean it's open source but nobody except the people who work on that part changes it much. And it has a sort of a package manager and a whole series of interfaces for packages and an additional series of conventions for how packages should interact with the lower layers and with each other.
Speaker 2
01:52:40
And powerful primitive operations that let you move the cursor around or select pieces of text or delete pieces of text or interact with the keyboard and the mouse and whatever peripherals you have. And so the sort of the extreme extensibility and the package ecosystem that you see in VS Code is a mirror of very similar architectural features in Emacs.
Speaker 1
01:53:14
Well, I have to give it a serious try, because as far as the hype and the excitement in the general programming community go, VS Code seems to dominate. The interesting thing about PyCharm and, what is it, PhpStorm, which are these JetBrains-specific IDEs that are designed for one programming language. It's interesting when an IDE is specialized, right?
Speaker 2
01:53:41
They're usually actually just specializations of IntelliJ, because underneath it's all the same editing engine with a different veneer on top. Whereas in VS Code, many things you do require loading third-party extensions. In PyCharm, it is possible to have third-party extensions, but it is a struggle to create one.
Speaker 2
01:54:14
Yes, and
Speaker 1
01:54:15
it's not part of the culture, all that kind of stuff.
Speaker 2
01:54:17
Yeah, I remember, that might've been five years ago or so, we were trying to get some better MyPy integration into PyCharm, because MyPy is sort of Python tooling, and PyCharm had its own type-checking heuristic thing that we wanted to replace with something based on MyPy, because that was what we were using in the company. And for the guy who was writing that PyCharm extension, it was really a struggle to find documentation, get the development workflow going, debug his code, and all that. So that was not a pleasant experience.
Speaker 1
01:55:05
Let me talk to you about parallelism. In your post titled Reasoning About asyncio.Semaphore, you talk about a fast food restaurant in Silicon Valley that has only one table. Is this a real thing?
Speaker 1
01:55:18
I just wanted to ask you about that. Is that just like a metaphor you're using or is that an actual restaurant in Silicon Valley?
Speaker 2
01:55:25
It was a metaphor, of course.
Speaker 1
01:55:27
I can imagine such a restaurant. So for people who haven't, you should read the thing, but it was an idea of a restaurant where there's only one table, and you show up one at a time, and your meal is prepared just for you. And I actually looked it up, and there are restaurants like this throughout the world.
Speaker 1
01:55:47
And it just seems like a fascinating idea. You stand in line, you show up, there's 1 table. They ask you all kinds of questions, they cook just for you. That's fascinating.
Speaker 2
01:55:58
It sounds like you'd find places like that in Tokyo. It sounds like a very Japanese thing. Or in the Bay Area, there are pop-up places that probably more or less work like that.
Speaker 2
01:56:08
I've never eaten at such a place.
Speaker 1
01:56:10
The fascinating thing is you propose it's fast food. This is all for a burger.
Speaker 2
01:56:14
It was one of my rare, sort of, more literary or poetic moments, where I thought, I'll just open with a crazy example to catch your attention, and the rest is very dry stuff about locks and semaphores and how a semaphore is a generalization of a lock.
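A sketch of that generalization in asyncio terms (the restaurant framing below is an illustration of the post's metaphor, not code from the post): a Semaphore(1) admits one task at a time, exactly like a lock, while a larger count admits that many tasks concurrently.

```python
import asyncio

async def dine(name: str, tables: asyncio.Semaphore, seated: list, peak: list):
    # Each diner must acquire a "table" before eating.
    async with tables:
        seated.append(name)
        peak[0] = max(peak[0], len(seated))
        await asyncio.sleep(0.01)  # time spent eating
        seated.remove(name)

async def service(n_tables: int) -> int:
    tables = asyncio.Semaphore(n_tables)  # Semaphore(1) behaves like a lock
    seated, peak = [], [0]
    await asyncio.gather(*(dine(f"guest{i}", tables, seated, peak) for i in range(5)))
    return peak[0]  # most guests ever seated at once

print(asyncio.run(service(1)))  # 1: one table, guests eat strictly one at a time
print(asyncio.run(service(3)))  # 3: three tables, up to three eat concurrently
```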
Speaker 1
01:56:34
Well, it was very poetic and well delivered and it actually made me wonder if it's real or not because you don't make that explicit and it feels like it could be true. And in fact, I wouldn't be surprised if somebody listens to this and knows exactly a restaurant like this in Silicon Valley. Anyway, can we step back and can you just talk about parallelism, concurrency, threading, asynchronous, all of these different terms?
Speaker 1
01:56:59
What is it, at sort of a high philosophical level? The fisherman is back in the boat?
Speaker 2
01:57:04
Well, the idea is if the fisherman has 2 fishing rods, since fishing is mostly a matter of waiting for a fish to nibble,
Speaker 1
01:57:15
Well,
Speaker 2
01:57:15
It depends on how you do it, actually. But if you're doing the style of fishing where you throw it out and then let it sit for a while until maybe you see a nibble, one fisherman can easily run two or three or four fishing rods. And so, as long as you can afford the equipment, you can catch four times as many fish with a small investment in four fishing rods.
Speaker 2
01:57:41
And so, since your time is fixed, you sort of say you have all Saturday to go fishing, if you can catch four times as many fish, you have a much higher productivity.
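In asyncio terms, each rod is a task whose waiting overlaps with the others'. A sketch with made-up names, showing four 0.1-second waits completing in roughly 0.1 seconds rather than 0.4:

```python
import asyncio
import time

async def fish(rod: int) -> str:
    await asyncio.sleep(0.1)  # waiting for a nibble needs no CPU
    return f"fish from rod {rod}"

async def go_fishing():
    start = time.perf_counter()
    # All four waits run concurrently on one event loop.
    catches = await asyncio.gather(*(fish(r) for r in range(1, 5)))
    return catches, time.perf_counter() - start

catches, elapsed = asyncio.run(go_fishing())
print(len(catches))       # 4
print(f"{elapsed:.1f}s")  # roughly 0.1s, not the 0.4s a sequential loop would take
```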
Speaker 1
01:57:52
And that's actually, I think, how deep-sea fishing is done. You could just have a rod and you put it in a holder, so that you can have many rods. What, is there an interesting difference between parallelism and concurrency and asynchrony?
Speaker 1
01:58:06
Is one a subset of the other, to you? Like, how do you think about these terms?
Speaker 2
01:58:10
In the computer world, there is a big difference. When people are talking about parallelism, like a parallel computer, that's usually really several complete CPUs that are tied together and share something like memory or an I/O bus. Concurrency can be a much more abstract concept, where you have the illusion that things happen simultaneously, but what the computer actually does is spend a little time running this program for a while, then it spends some time running that program for a while, and then spends some time on
Speaker 1
01:59:00
the third program for a while. So parallelism is the reality, and concurrency is part reality, part illusion.
Speaker 2
01:59:08
Yeah, parallelism typically implies that there are multiple copies of the hardware.
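The time-slicing he describes can be made visible with cooperative tasks on a single event loop: each task does one step and hands control back, so three "programs" interleave on one CPU. This demonstrates concurrency; parallelism would need multiple CPUs, for example via multiprocessing.

```python
import asyncio

trace = []

async def program(name: str):
    for step in range(3):
        trace.append(f"{name}{step}")
        await asyncio.sleep(0)  # yield: let the loop run another program for a while

async def run_all():
    await asyncio.gather(program("A"), program("B"), program("C"))

asyncio.run(run_all())
print(trace)  # ['A0', 'B0', 'C0', 'A1', 'B1', 'C1', 'A2', 'B2', 'C2']
```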
Speaker 1
01:59:15
You write that implementing synchronization primitives is hard in that blog post. And you talk about locks and semaphores. Why is it hard to implement synchronization primitives?
Speaker 1
01:59:27
Because at the conscious level, our brains are not
Speaker 2
01:59:32
trained to sort of keep track of multiple things at the same time. Like obviously you can walk and chew gum at the same time because they're both activities that require only a little bit of your conscious activity. But try balancing your checkbook and watching a murder mystery on TV.
Speaker 2
01:59:57
You'll mix up the digits or...