3 hours 15 minutes 50 seconds
🇬🇧 English
Speaker 1
01:00:00
Of how to come up with an efficient algorithm. And sometimes the more efficient algorithm is not so much more complex than the inefficient one. But that's an art, and it's not always the case. In the general case, the more performant the algorithm, the more complex it's going to be.
Speaker 1
01:00:18
There's a kind of trade off.
Speaker 2
01:00:20
The simpler algorithms are also the ones that people invent first, because when you're looking for a solution, you look at the simplest way to get there first. And so if there is a simple solution, even if it's not the best solution, not the fastest or the most memory-efficient or whatever... and simple is fairly subjective, but mathematicians have also thought about what a good definition of simple is in the case of algorithms... the simpler solutions tend to be easier to follow for other programmers who haven't made a study of a particular field.
Speaker 2
01:01:09
And when I started with Python, I was a good programmer in general. I knew sort of basic data structures. I knew the C language pretty well. But there were many areas where I was only somewhat familiar with the state of the art.
Speaker 2
01:01:30
And so I picked, in many cases, the simplest way I could solve a particular subproblem. Because when you're designing and implementing a language, you have many hundreds of little problems to solve. And you have to have solutions for every one of them before you can sort of say, I've invented a programming language.
Speaker 1
01:01:56
First of all, so CPython, what kind of things does it do? It's an interpreter, it takes in this readable language that we talked about, that is Python. What is it supposed to do?
Speaker 2
01:02:08
The interpreter basically, it's sort of a recipe for understanding recipes. So instead of a recipe that says, bake me a cake, we have a recipe for, well, given the text of a program, how do we run that program? And that is sort of the recipe for building a computer.
Speaker 1
01:02:34
The recipe for the baker and the chef. What are the algorithmically tricky things that happen to be low-hanging fruit that could be improved on? Maybe throw out the history of Python, but also now.
Speaker 1
01:02:48
How is it possible that in 3.11, in the year 2022, it's still possible to get such a big performance improvement?
Speaker 2
01:02:57
We focused on a few areas where we still felt there was low-hanging fruit. The biggest one is actually the interpreter itself.
Speaker 2
01:03:11
And this has to do with details of how Python is defined. So I don't know if the fisherman is going to follow this story.
Speaker 1
01:03:21
He already jumped off the boat. He's bored. He's stupid.
Speaker 2
01:03:25
Yeah. Python is actually, even though it's always called an interpreted language, there's also a compiler in there. It just doesn't compile to machine code. It compiles to bytecode, which is sort of code for an imaginary computer that is called the Python interpreter.
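The compile-to-bytecode step he describes can be seen directly with the standard `dis` module. A small sketch (the exact opcode names vary between Python versions, e.g. `BINARY_ADD` before 3.11 and `BINARY_OP` after, so this only inspects them rather than hard-coding any):

```python
import dis

def add(a, b):
    return a + b

# Show the bytecode the compiler produced for this function:
# the instructions for the "imaginary computer" that is the interpreter.
for instr in dis.get_instructions(add):
    print(instr.opname, instr.argrepr)
```

Running this prints a short list of instructions ending in a return, which is the code the interpreter actually executes.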
Speaker 2
01:03:45
So it's compiling
Speaker 1
01:03:46
code that is more easily digestible by the interpreter or is digestible at all?
Speaker 2
01:03:51
It is the code that is digested by the interpreter. That's the compiler. We tweaked very minor bits of the compiler.
Speaker 2
01:03:57
Almost all the work was done in the interpreter, because when you have a program, you compile it once and then you run the code a whole bunch of times. Or maybe there's one function in the code that gets run many times. Now, I know that people who know this field are expecting me to at some point say, we built a just-in-time compiler. Actually, we didn't. We just made the interpreter a little more efficient.
Speaker 1
01:04:31
What's a just-in-time compiler? That is
Speaker 2
01:04:35
a thing from the Java world, although it's now applied to almost all programming languages, especially interpreted ones.
Speaker 1
01:04:44
So you see the compiler inside Python, not like a just-in-time compiler, but it's a compiler that creates bytecode that is then fed to the interpreter. And the compiler, was there something interesting to say about the compiler? It's interesting that you haven't changed that, tweaked that at all, or much.
Speaker 1
01:05:02
We changed some
Speaker 2
01:05:03
parts of the bytecode but not very much. And so we only had to change the parts of the compiler where we decided that the breakdown of a Python program in bytecode instructions had to be slightly different. But that didn't gain us the performance improvements.
Speaker 2
01:05:28
The performance improvements were like making the interpreter faster in part by sort of removing the fat from some internal data structures used by the interpreter. But the key idea is an adaptive specializing interpreter.
Speaker 1
01:05:49
Let's go. What is adaptive about it? What is specialized about it?
Speaker 2
01:05:53
Well, let me first talk about the specializing part, because the adaptive part is sort of a second-order effect, but they're both important. So bytecode is a bunch of machine instructions, but for an imaginary machine. But the machine can do things like call a function, add 2 numbers, print a value.
Speaker 2
01:06:18
Those are sort of typical instructions in Python. And if we take the example of adding 2 numbers, actually, in Python, the language, there's no such thing as adding 2 numbers. The compiler doesn't know that you're adding 2 numbers. You might as well be adding 2 strings or 2 lists or 2 instances of some user-defined class that happens to implement this operator called add.
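The operator overloading he describes looks like this in user code; `Money` is just an illustrative class, not something from the conversation:

```python
# A user-defined class can implement the + operator by defining __add__.
# The compiler emits the same generic "add" bytecode whether the operands
# turn out to be ints, strings, lists, or instances like these.
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        return Money(self.cents + other.cents)

total = Money(250) + Money(199)
print(total.cents)  # 449
```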
Speaker 2
01:06:52
That's a very interesting and fairly powerful mathematical concept. It's mostly a user interface trick, because it means that a certain category of functions can be written using a single symbol, the plus sign, and a bunch of other functions can be written using another single symbol, the multiply sign. So if we take addition: the way the add bytecode was traditionally executed in Python is pointers, pointers, and more pointers. So first we have 2 objects.
Speaker 2
01:07:34
An object is basically a pointer to a bunch of memory that contains more pointers.
Speaker 1
01:07:39
Pointers all the way down.
Speaker 2
01:07:41
Well, not quite, but there are a lot of them. So, to simplify a bit: we look up in one of the objects what the type of that object is, and whether that object type defines an add operation. And so you can imagine that there is sort of a type integer that knows how to add itself to another integer.
Speaker 2
01:08:07
And there is a type floating point number that knows how to add itself to another floating point number. And the integers and floating point numbers are sort of important, I think, mostly historically, because in the first computers, the same bit pattern, when interpreted as a floating point number, had a very different value than when interpreted as an integer.
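The bit-pattern point can be demonstrated with the standard `struct` module: the same 8 bytes mean 1.0 when read as an IEEE double, and a completely unrelated number when read as a 64-bit integer.

```python
import struct

# Pack the float 1.0 into its raw 8-byte representation,
# then reinterpret those same bytes both ways.
bits = struct.pack("<d", 1.0)
print(struct.unpack("<d", bits)[0])  # 1.0
print(struct.unpack("<q", bits)[0])  # 4607182418800017408
```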
Speaker 1
01:08:34
Can I ask a dumb question here?
Speaker 2
01:08:36
Please do.
Speaker 1
01:08:36
Given the basics of int and float and add, who carries the knowledge of how to add 2 integers? Is it the integer? It's the type integer versus?
Speaker 2
01:08:47
It's the type integer and the type float.
Speaker 1
01:08:50
What about the operator? Does the operator just exist as a platonic form possessed by the integer?
Speaker 2
01:08:59
The operator is more like... It's an index in a list of functions that the integer type defines. And so the integer type is really a collection of functions.
Speaker 2
01:09:18
And there is an add function, and there is a multiply function, and there are like 30 other functions for other operations. There's a power function, for example. And you can imagine that in memory there is a distinct slot for the add operation. Let's say the add operation is the first operation of a type, and the multiply is the second operation of a type. So now we take the integer type and we take the floating point type. In both cases, the add operation is the first slot and multiply is the second slot.
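The slot table he describes can be sketched in plain Python. The dicts below are only an illustration of the idea; CPython's real slots live in C structs, not dictionaries.

```python
# A toy model of the "slot table": each type is a table of functions,
# and the generic add looks up the "add" slot on the left operand's type.
int_type = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
str_type = {"add": lambda a, b: a + b}

def generic_add(a, b):
    # Find the operand's type, then dispatch through its slot table.
    table = int_type if isinstance(a, int) else str_type
    return table["add"](a, b)

print(generic_add(2, 3))        # 5
print(generic_add("ab", "cd"))  # abcd
```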
Speaker 2
01:09:56
But each slot contains a function, and the functions are different, because the add-two-integers function interprets the bit patterns as integers. The add-two-floats function interprets the same bit pattern as a floating point number. And then there is the string data type, which again interprets the bit pattern as the address of a sequence of characters. There are lots of lies in that story, but that's sort
Speaker 1
01:10:35
of a basic idea. I can tell, I can tell the fake news and the fabrication going on here at the table. But where's the optimization?
Speaker 1
01:10:44
Is it on the operators? Is it different inside the integer?
Speaker 2
01:10:47
The optimization is the observation that in a particular line of code... so now you write your little Python program, and you write a function, and that function takes a bunch of inputs, and at some point it adds 2 of the inputs together. Now, I bet you, even if you call your function a thousand times, all those calls are likely going to be about integers. Because maybe your program is all about integers.
Speaker 2
01:11:24
Or maybe on that particular line of code where there's that plus operator, every time the program hits that line, the variables A and B that are being added together happen to be strings. And so what we do is, instead of having this single bytecode that says: here's an add operation, and the implementation of add is fully generic. It looks at the object; from the object it looks at the type; then it takes the type and looks up the function pointer; then it calls the function. Now the function has to look at the other argument and has to double-check that the other argument has the right type. And then there's a bunch of error checking before it can actually just go ahead and add the 2 bit patterns in the right way.
Speaker 2
01:12:16
What we do is, every time we execute an add instruction like that, we keep a little note of, in the end, after we hit the code that did the addition for a particular type, what type was it? And then after a few times through that code, if it's the same type all the time, we say: oh, this add operation, even though it's the generic add operation, might as well be the add-integer operation. And the add-integer operation is much more efficient, because it just says: assume that A and B are integers, do the addition right there inline, and produce the result. And the big lie here is that in Python, even if you have great evidence that in the past it was always 2 integers that you were adding, at some point in the future that same line of code could still be hit with 2 floating points, or 2 strings, or maybe a string and an integer.
Speaker 1
01:13:35
It's not a great lie. That's just the fact of life.
Speaker 2
01:13:39
I didn't account for what should happen in that case in the way I told the story. There is some accounting for that. And so what we actually have to do is when we have the add integer operation, we still have to check, are the 2 arguments in fact integers?
Speaker 2
01:14:01
We applied some tricks to make those checks efficient. And we know statistically that the outcome is almost always, yes, they are both integers. And so we quickly make that check and then we proceed with the sort of add integer operation. And then there is a fallback mechanism where we say, oops, 1 of them wasn't an integer.
Speaker 2
01:14:27
Now we're going to pretend that it was just the fully generic add operation. We wasted a few cycles believing it was going to be 2 integers, and then we had to back up, but we didn't waste that much time. And statistically, most of the time... basically, we were sort of hoping that most of the time we guess right, because if it turns out that we guessed wrong too often, or we didn't have a good guess at all, things might actually end up running a little slower. So someone armed with this knowledge and a copy of the implementation could easily construct a counterexample, where they say: oh, I have a program, and now it runs 5 times as slow in Python 3.11 as it did in Python 3.10.
Speaker 2
01:15:22
But that's a very unrealistic program. That's just like an extreme fluke.
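The guard-plus-fallback mechanism described above can be sketched as a toy in Python itself. Everything here (`make_adaptive_add`, the `threshold` parameter, the counters) is invented for illustration; CPython does this inside the interpreter loop in C, not with wrapper functions.

```python
# A toy adaptive-specializing dispatcher: after observing integer operands
# a few times, switch to a fast int-only path protected by a cheap guard,
# and "deoptimize" back to the generic path if the guard ever fails.
def make_adaptive_add(threshold=3):
    state = {"int_hits": 0, "specialized": False}

    def add(a, b):
        if state["specialized"]:
            if type(a) is int and type(b) is int:  # cheap guard check
                return a + b                       # fast specialized path
            state["specialized"] = False           # guess was wrong: fall back
        result = a + b                             # fully generic path
        if type(a) is int and type(b) is int:
            state["int_hits"] += 1
            if state["int_hits"] >= threshold:
                state["specialized"] = True        # specialize for ints
        return result

    return add

add = make_adaptive_add()
for _ in range(5):
    add(1, 2)         # warms up; specializes after a few int calls
print(add(3, 4))      # 7, via the fast path
print(add("a", "b"))  # ab: guard fails, deoptimizes, generic path used
```

The "weather tomorrow is the weather today" heuristic mentioned later is exactly this: assume the types seen so far will keep showing up, and pay a small penalty when they don't.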
Speaker 1
01:15:29
It's a fun reverse engineering task though. Oh yeah. So there's a...
Speaker 2
01:15:34
You can
Speaker 1
01:15:34
do that stuff. People like fun, yes. So there's some, presumably, heuristic of what defines a momentum of saying, you know, you seem to be working, adding 2 integers, not 2 generic types.
Speaker 1
01:15:51
So how do you figure out that heuristic? I think that the heuristic is actually, we assume that the weather tomorrow is gonna be the same as the weather today. So you don't need 2 days of the weather? No.
Speaker 2
01:16:05
That is already so much better than guessing randomly.
Speaker 1
01:16:10
So how do you find this idea? Hey, I wonder if instead of adding 2 generic types, we start assuming that the weather tomorrow is the same as the weather today, where do you find the idea for that? Because that ultimately, for you to do that, you have to kind of understand how people are using the language, right?
Speaker 2
01:16:34
Python is not the first language to do a thing like this. This is a fairly well known trick, especially from other interpreted languages that had reason to be sped up. We occasionally look at papers about HHVM, which is Facebook's efficient compiler for PHP.
Speaker 2
01:16:57
There are tricks known from the JVM And sometimes it just comes from academia.
Speaker 1
01:17:04
So the trick here is that the type itself doesn't, the variable doesn't know what type it is. So this is not a statically typed language where you can afford to have a shortcut to saying it's ints.
Speaker 2
01:17:17
This is a trick that is especially important for interpreted languages with dynamic typing, because if the compiler could read in the source that these x and y we're adding are integers, the compiler could just insert a single add machine instruction, that hardware instruction that exists on every CPU, and ditto for floats. But because in Python you don't generally declare the types of your variables, you don't even declare the existence of your variables, they just spring into existence when you first assign them.
Speaker 2
01:18:01
Which is really cool, and sort of helps those beginners, because there is less bookkeeping they have to learn before they can start playing around with code. But it makes the interpretation of the code less efficient, and so we're sort of trying to make the interpretation more efficient without losing the super dynamic nature of the language. That's always the challenge.
Speaker 1
01:18:32
3.5 got the PEP 484 type hints. What is type hinting, and are the hints used by the interpreter, or are they just syntactic sugar?
Speaker 2
01:18:44
So type hints are an optional mechanism that people can use, and they're especially popular with larger companies that have very large code bases written in Python.
Speaker 1
01:18:58
Do you think of it as almost like documentation saying these 2 variables
Speaker 2
01:19:02
are this type? More than documentation. I mean, so it is a sub-language of Python where you can express the types of variables.
Speaker 2
01:19:13
So here is a variable and it's an integer. And here's an argument to this function and it's a string and here is a function that returns a list of strings.
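The three kinds of annotations he lists look like this in code (using the modern built-in generic syntax, available since Python 3.9):

```python
# A variable that's an integer:
count: int = 0

# An argument that's a string, and a function that returns a list of strings:
def first_words(lines: list[str]) -> list[str]:
    return [line.split()[0] for line in lines]

def length(word: str) -> int:
    return len(word)

total: int = length("hello")
```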
Speaker 1
01:19:22
But that's not checked when you run the code.
Speaker 2
01:19:24
But exactly. There is a separate piece of software, called a static type checker, that reads all your source code without executing it, and thinks long and hard about what, from just reading the code, that code might be doing, and double-checks whether that makes sense if you take the types as annotated into account.
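A quick way to see that the interpreter ignores the hints: the function below is annotated as taking and returning an `int`, yet passing a string works fine at run time. A static checker such as mypy would flag the call; the interpreter does not.

```python
def double(x: int) -> int:
    return x * 2

# The annotation says int, but the interpreter never checks it:
print(double("ha"))  # haha -- runs fine despite violating the hint
```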
Speaker 1
01:19:51
So this is something you're supposed to run as you develop.
Speaker 2
01:19:54
It's like a linter. Yeah, that's definitely a development tool, but the type annotations currently are not used for speeding up the interpreter. And there are a number of reasons.
Speaker 2
01:20:09
Many people don't use them. Even when they do use them, they sometimes contain lies, where the static type checker says everything's fine. I cannot prove that this integer is ever not an integer, but at runtime somehow someone manages to violate that assumption. And the interpreter ends up doing just fine.
Speaker 2
01:20:36
If we started enforcing type annotations in Python, many Python programs would no longer work. And some Python programs wouldn't even be possible, because they're too dynamic. And so we made a choice of not using the annotations. There is a possible future where, eventually, 3, 4, 5 releases in the future, we could start using those annotations to provide hints. Because we can still say, well, the source code leads us to believe that these x and y are both integers, and so we can generate an add-integer instruction, but we can still have a fallback that says: oh, if somehow the code at runtime provided something else, maybe it provided 2 decimal numbers, we can still use that generic add operation as a fallback.
Speaker 2
01:21:40
But we're not there.
Speaker 1
01:21:41
Is there currently a mechanism or do you see something like that where you can almost add like an assert inside a function that says please check that my type hints are actually mapping to reality. Sort of like insert
Speaker 2
01:21:58
manual static typing. There are third party libraries that are in that business.
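A minimal sketch of how such a third-party library might enforce hints at call time, handling only plain-class annotations. The `enforce` decorator is an invented name for illustration, not a real library's API; real libraries in this space handle far more of the typing language.

```python
import functools
import inspect

def enforce(func):
    """Toy decorator: check plain-class argument annotations when called."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            ann = func.__annotations__.get(name)
            # Only handle simple cases like `int` or `str`:
            if isinstance(ann, type) and not isinstance(value, ann):
                raise TypeError(f"{name} should be {ann.__name__}")
        return func(*args, **kwargs)

    return wrapper

@enforce
def scale(x: int) -> int:
    return x * 2

print(scale(3))      # 6
# scale("ha") would now raise TypeError instead of returning "haha"
```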
Speaker 1
01:22:05
It's possible to do that kind of thing? It's possible for a third party library to take a hint and enforce it? Well, yes.
Speaker 1
01:22:12
It seems like
Speaker 2
01:22:13
a tricky thing. Well, what we actually do, and I think this is a fairly unique feature in Python, is that the type hints can be introspected at runtime, while the program is running. I mean, Python is a very introspectable language.
Speaker 2
01:22:32
You can look at a variable and ask yourself what is the type of this variable and if that variable happens to refer to a function you can ask what are the arguments to the function. And nowadays you can also ask what are the type annotations for the function.
Speaker 1
01:22:50
So the type annotations are there inside the variable as it's at runtime.
Speaker 2
01:22:55
They're mostly associated with the function object, not with each individual variable, but you can sort of map from the arguments to the variables.
Speaker 1
01:23:05
And that's what a third party library can help with. Exactly.
Speaker 2
01:23:08
And the problem with that is that all that extra runtime type checking is going to slow your code down instead of speeding it up.
Speaker 1
01:23:17
I think, to reference this sales-pitchy blog post that says 75% of developers' time is spent on debugging, I would say that in some cases that might be okay. It might be okay to pay the cost of performance for the catching of the
Speaker 2
01:23:34
types, the type errors. And in most cases, doing it statically before you ship your code to production is more efficient than doing it at runtime piecemeal. Yeah.
Speaker 1
01:23:50
Can you tell me about MYPY, MyPy project? What is it? What's the mission?
Speaker 1
01:23:59
And in general, What is the future of static typing in Python?
Speaker 2
01:24:04
Well, so MyPy was started by a Finnish developer, Jukka Lehtosalo.
Speaker 1
01:24:11
So many cool things out of Finland, I gotta say. Just that part of the world.
Speaker 2
01:24:15
I guess people have nothing better to do in those long, cold winters. I don't know. I think Jukka lived in England when he invented that stuff, actually.
Speaker 2
01:24:26
But MyPy is the original static type checker for Python. And the type annotations that were introduced with PEP 484 were sort of developed together with the static type checker. And in fact, Jukka had first invented a different syntax that wasn't quite compatible with Python. And Jukka and I sort of met at a Python conference in, I think, 2013.
Speaker 2
01:24:59
And we sort of came up with a compromise syntax that would not require any changes to Python and that would let MyPy sort of be an add-on static type checker for Python.
Speaker 1
01:25:15
Just out of curiosity, was it like double colon or something? What was he proposing that would break Python?
Speaker 2
01:25:21
I think he was using angular brackets for types like in C++ or Java generics.
Speaker 1
01:25:28
Yeah, you can't use angular brackets in Python. It would be too tricky
Speaker 2
01:25:33
for template. Well, the key thing is that we already had a syntax for annotations, we just didn't know what to use them for yet. So type annotations were just the sort of most logical thing to use that existing dummy syntax for.
Speaker 2
01:25:54
But there was no syntax for defining generics directly, syntactically, in the language. MyPy literally meant "my version of Python," where "my" refers to Jukka. He had a parser that translated MyPy into Python by doing the type checks and then removing the annotations and all the angular brackets from the positions where he was using them.
Speaker 1
01:26:30
But a pre-processor model doesn't work very well with the typical workflow of Python development projects. That's funny. I mean, that could have been another major split if it became successful.
Speaker 1
01:26:42
Like if you watch TypeScript versus JavaScript, it's like a split in the community over types, right? That seems to be stabilizing now.
Speaker 2
01:26:53
It's not necessarily a split. There are certainly plenty of people who don't use TypeScript, but just use the original JavaScript notation, just like there are many people in the Python world who don't use type annotations and don't use static type checkers.
Speaker 1
01:27:11
No, I know, but there is a bit of a split between TypeScript and old school JavaScript, ES, whatever.
Speaker 2
01:27:17
Well, in the JavaScript world, transpilers are sort of the standard way of working anyway, which is why TypeScript being a transpiler itself is not a big deal.
Speaker 1
01:27:28
And transpilers, for people who don't know, it's exactly the thing you said with MyPy, it's the code, I guess you call it pre-processing code that translates from 1 language to the other. And that's part of the culture, part of the workflow of the JavaScript community, so.
Speaker 2
01:27:43
That's right. At the same time, an interesting development in the JavaScript slash TypeScript world at the moment is that there is a proposal under consideration, it's only a stage 1 proposal, that proposes to add a feature to JavaScript where just like Python, it will ignore certain syntax when running the JavaScript code. And what it ignores is more or less a superset of the TypeScript annotation syntax.
Speaker 2
01:28:20
Interesting. So that would mean that eventually, if you wanted to, you could take TypeScript and shove it directly into a JavaScript interpreter without translation. The interesting thing in the JavaScript world, at least the web browser world, is that the web browsers have changed how they deploy, and they update their JavaScript engines much more quickly than they used to in the early days. And so there's much less of a need for translation in JavaScript itself, because most browsers just support the most recent version of ECMAScript. Just on a tangent of a tangent, do you see, if you were
Speaker 1
01:29:08
to recommend somebody use a thing, would you recommend TypeScript or JavaScript? I would recommend TypeScript. Just because of the strictness of the typing.
Speaker 1
01:29:19
It's an enormously helpful extra tool that helps you sort of
Speaker 2
01:29:26
keep your head straight about what your code is actually doing. I mean it helps with editing your code, it helps with ensuring that your code is not too incorrect. And it's actually quite compatible with JavaScript, never mind this syntactic sort of hack that is still years in the future.
Speaker 2
01:29:52
But any library that is written in pure JavaScript can still be used from TypeScript programs. And also the other way around, you can write a library in TypeScript and then export it in a form that is totally consumable by JavaScript. That sort of compatibility is sort of the key to the success of TypeScript.
Speaker 1
01:30:17
Yeah, just to look at it, it's almost like a biological system that's evolving. It's fascinating to see JavaScript evolve the way it does.
Speaker 2
01:30:24
Well, maybe we should consider that biological systems are just engineering systems too, right? Yes. Just very advanced, with more history.
Speaker 1
01:30:35
But it's almost like the most visceral in the JavaScript world because there's just so much code written in JavaScript that for its history was messy. If you're talking about bugs per line of code, I just feel like JavaScript eats the cake or whatever the terminology is. It beats Python by a lot in terms of number of bugs, meaning like way more bugs in JavaScript.
Speaker 1
01:31:01
And then obviously the browsers are developed. I mean, just there's so much active development, it feels a lot more like evolution, where a bunch of stuff is born and dies, and there's experimentation and debates, versus Python, there's more, all that stuff is happening, but there's just a longer history of stable, working, giant software systems written in Python versus JavaScript is just a giant, beautiful, I would say, mess of code.
Speaker 2
01:31:31
It's a very different culture and to some extent differences in culture are random, but to some extent the differences have to do with the environment. And the fact that JavaScript is primarily the language for developing web applications, especially the client side. And the fact that it's basically the only language for developing web applications makes that community sort of just have a different nature than the community of other languages.
Speaker 1
01:32:10
Plus the graphical component and the fact that they're deploying it on all kinds of shapes of screens and devices and all that kind of stuff, it just creates a beautiful chaos. Anyway, back to MyPy. So what, okay, you met, you talked about a syntax that could work.
Speaker 1
01:32:29
Where does it currently stand? What's the future of static typing in Python?
Speaker 2
01:32:34
It is still controversial, but it is much more accepted than when MyPy and PEP 484 were young.
Speaker 1
01:32:43
What's the connection between PEP 484 type hints and MyPy?
Speaker 2
01:32:48
MyPy was the original static type checker, so MyPy quickly evolved from Jukka's own variant of Python to a static type checker for Python. And sort of PEP 484... that was a very productive year, where many hundreds of messages were exchanged debating the merits of every aspect of that PEP.
Speaker 2
01:33:19
And so MyPy is a static type checker for Python. It is itself written in Python. Most additional static typing features that we introduced in the time since 3.6 were also prototyped through MyPy. MyPy being an open source project with a very small number of maintainers was successful enough that people said this static type checking stuff for Python is actually worth an investment for our company.
Speaker 2
01:33:57
But somehow they chose not to support making MyPy faster, say, or adding new features to MyPy. Instead, Google and Facebook, and later Microsoft, each developed their own static type checker. I think Facebook was one of the first. They decided that they wanted to use the same technology that they had successfully used for HHVM, because they had a bunch of compiler writers and static type checking experts who had written the HHVM compiler, and it was a big success within the company.
Speaker 2
01:34:45
And they had done it in a certain way. They wrote a big, highly parallel application in an obscure language named OCaml, which is apparently very good for writing static type checkers.
Speaker 1
01:35:01
Interesting. I have a lot of questions about how to write a static typechecker then. That's very confusing.
Speaker 2
01:35:07
Facebook wrote their version, and they worked on it in secret for about a year, and then they came clean and went open source. Google in the meantime was developing something called PyType, which was mostly interesting because, as you may have heard, they have one gigantic monorepo. So all the code is checked into a single repository.
Speaker 2
01:35:35
Facebook has a different approach. So Facebook developed Pyre, which was written in OCaml and worked well with Facebook's development workflow. Google developed something they called PyType, which was itself written in Python, and it was meant to fit well with the static type checking needs of Google's gigantic monorepo.
Speaker 1
01:36:05
So Google has it in one giant... got it. So just to clarify, this static type checker philosophically is a thing that's supposed to exist outside of the language itself. And it's just a workflow, like a debugger, for the programmers.
Speaker 1
01:36:19
It's
Speaker 2
01:36:20
a linter.
Speaker 1
01:36:21
For people who don't know, a linter, maybe you can correct me, but it's a thing that runs through the code continuously, pre-processing to find issues based on style, documentation. I mean there's all kinds of linters, right? You can check that.
Speaker 1
01:36:37
What usual things does a linter do? Maybe check that you don't have too many characters in a single line.
Speaker 2
01:36:45
Linters often do static analysis where they try to point out things that are likely mistakes but not incorrect according to the language specification. Like maybe you have a variable that you never use. For the compiler, that is valid.
Speaker 2
01:37:04
You might be planning to use it in a future version of the code, and the compiler might just optimize it out. But the compiler is not going to tell you: hey, you're never using this variable. A linter will tell you: that variable is not used. Maybe there's a typo somewhere else, where you meant to use it but accidentally used something else. There are a number of common scenarios like that, and a linter is often a big collection of little heuristics where, by looking at the combination of how your code is laid out, maybe how it's indented, maybe the comment structure, but also just things like definitions of names and uses of names, it'll tell you likely things that are wrong.
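An unused-variable check like the one described here can be sketched in a few lines with the standard `ast` module. This is a toy heuristic for illustration, not how a real linter is built (it ignores function arguments, nested scopes, and many other cases):

```python
import ast

def unused_locals(source):
    """Toy linter check: names assigned in a function but never read."""
    tree = ast.parse(source)
    warnings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            assigned = {n.id for n in ast.walk(node)
                        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
            used = {n.id for n in ast.walk(node)
                    if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
            warnings += sorted(assigned - used)
    return warnings

code = "def f():\n    x = 1\n    y = 2\n    return y\n"
print(unused_locals(code))  # ['x'] -- x is assigned but never read
```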
Speaker 2
01:37:53
It'll tell you likely things that are wrong. And in some cases, linters are really style checkers. For Python, there are a number of linters that check things like, do you use the PEP 8 recommended naming scheme for your functions and classes and variables? Because like classes start with an uppercase and the rest starts with a lowercase.
Speaker 2
01:38:19
There are differences like that. And so the linter can tell you: hey, you have a class whose first letter is not an uppercase letter. And I just find that annoying; if I wanted that to be an uppercase letter, I would have typed an uppercase letter. But other people find it very comforting that if the linter is no longer complaining about their code, they have followed all the style rules.
Speaker 2
01:38:45
Maybe it's a fast way for a new developer joining a team to
Speaker 1
01:38:48
learn the style rules, right?
Speaker 2
01:38:50
Yeah, there's definitely that. But the best use of a linter is probably not so much to sort of enforce team uniformity, but to actually help developers catch bugs that the compilers for whatever reason don't catch. And there is lots of that in Python.
Speaker 2
01:39:12
And so a static type checker focuses on a particular aspect of linting. I mean, MyPy doesn't care how you name your classes and variables, but it is meticulous about cases where you said it was an integer here and you're passing a string there. It will tell you: hey, that string is not an integer, so something's wrong. Either you were incorrect when you said it was an integer, or you're incorrect when you're passing it a string. If this is a race of static type checkers, is somebody winning?
Speaker 1
01:39:49
As you said, it's interesting that the companies didn't choose to invest in the centralized development of MyPy. Is there a future for MyPy? Will one of the companies win out, so that everybody uses, like, pytype, whatever Google's is called?
Speaker 2
01:40:10
Well, Microsoft is hoping that Microsoft's horse in that race, called Pyright, is going to win. Pyright, like R-I-G-H-T? Correct, yeah. All my word processors tend to autocorrect that to Pyrite, the name of, I don't know what it is, some kind of semi-precious mineral.
Speaker 2
01:40:35
Oh, right.
Speaker 1
01:40:37
I love it. Okay, so, okay, that's the Microsoft hope, but, okay, so let me ask the question a different way. Is there going to be ever a future where the static type checker gets integrated into the language?
Speaker 2
01:40:53
Nobody is currently excited about doing any work towards that. That doesn't mean that 5 or 10 years from now the situation isn't different. At the moment all the static type checkers still evolve at a much higher speed than Python and its annotation syntax evolve.
Speaker 2
01:41:22
You get a new release of Python once a year. Those are the only times that you can introduce new annotation syntax. And there are always people who invent new annotation syntax that they're trying to push. And worse, once we've all agreed that we are going to put some new syntax in, we can never take it back.
Speaker 2
01:41:48
At least, deprecating an existing feature takes many releases, because you have to assume that people started using it as soon as we announced it. And then you can't take it away from them right away. You have to start telling them, well, this will go away, but we're not going to call it an error yet; then later it becomes a warning, and then eventually, three releases in the future, maybe we remove it. On the other hand, the typical static type checker still has a release, like, every month, every two months, certainly many times a year.
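That phased removal follows a pattern Python's own warnings module supports; here's a small sketch with made-up function names:

```python
import warnings

def new_way(x):
    return x * 2

def old_way(x):
    """Deprecated: still works, but announces its coming removal."""
    warnings.warn(
        "old_way() is deprecated and will be removed in a future release; "
        "use new_way() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_way(x)

# Capture the warning so we can show it was emitted:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_way(21)

print(result)                       # 42
print(caught[0].category.__name__)  # DeprecationWarning
```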
Speaker 2
01:42:30
Some type checkers also include a bunch of experimental ideas that aren't official standard Python syntax yet. The static type checkers also just get better at discovering things that are unspecified by the language, but that sort of could make sense. And so each static type checker actually has its sort of strong and weak points.
Speaker 1
01:43:00
So it's cool, it's like a laboratory of experiments. Yeah. Microsoft, Google and all, and you get to see.
Speaker 2
01:43:04
And you see that everywhere, right? Because there's not 1 single JavaScript engine either. There is 1 in Chrome, there is 1 in Safari, there is 1 in Firefox.
Speaker 1
01:43:17
But that said, you said there's no interest. I think there is a lot of interest in type hinting, right? In PEP 484.
Speaker 1
01:43:26
Actually, like, how many people use that? Do you have a sense? How many people use, because it's optional, it's just sugar.
Speaker 2
01:43:32
I can't put a number on it, but from the number of packages that do interesting things with it at runtime, and the fact that there are now, like, three or four very mature type checkers that each have their segment of the market, and, oh, then there is PyCharm, which has a sort of more heuristic-based type checker that also supports the same syntax. My assumption is that many, many people developing Python software professionally, for some kind of production situation, are using a static type checker.
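One standard-library example of a package doing interesting things with annotations at runtime is dataclasses, which reads the field annotations to generate the constructor, repr, and equality methods:

```python
from dataclasses import dataclass
from typing import get_type_hints

@dataclass
class Point:
    x: int  # these annotations drive the generated __init__, __repr__, __eq__
    y: int

p = Point(3, 4)
print(p)                      # Point(x=3, y=4)
print(get_type_hints(Point))  # {'x': <class 'int'>, 'y': <class 'int'>}
```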
Speaker 2
01:44:15
Especially anybody who has a continuous integration cycle probably has, as one of the steps in the testing routine that happens for basically every commit, running a static type checker. In most cases, that will be MyPy. So I think it's a pretty popular topic.
Speaker 1
01:44:42
According to this web page, 20-30 percent of Python 3 codebases are using type hints. Wow. I wonder how they measured that.
Speaker 1
01:44:53
Did they just scan all of GitHub? Yeah, that's what it looks like. Yeah. They did a quick, not all of it, but like a random sampling.
Speaker 1
01:45:02
So you mentioned PyCharm. Let me ask you the big subjective question. What's the best IDE for Python? And you're extremely biased now that you're with Microsoft.
Speaker 1
01:45:17
Is it PyCharm, VS Code, Vim or Emacs?
Speaker 2
01:45:21
Historically, I actually started out with using Vim, but when it was still called vi. For a very long time, I think from the early eighties to, I'd say, two years ago, I was an Emacs user. Nice. Between, I'd say, 2013 and 2018, I dabbled with PyCharm, mostly because it had a couple of features. I mean, PyCharm is like driving an 18-wheeler truck, whereas Emacs is more like driving your comfortable Toyota car that you've had for 100,000 miles, and you know what every little rattle of the car means.
Speaker 2
01:46:19
I was very comfortable in Emacs, but there were certain things it couldn't do. It wasn't very good at, at least the way I had configured it, I didn't have very good tooling in Emacs for finding the definition of a function. When I was at Dropbox, exploring a five-million-line Python code base, I was just grepping all that code for, where is there a class FooBar? Well, it turns out that if you grep all five million lines of code, there are many classes with the same name.
Speaker 2
01:46:57
And so PyCharm, once you fired it up and once it had indexed your repository, was very helpful. But as soon as I had to edit code, I would jump back to Emacs and do all my editing there, because I could type much faster and switch between files, when I knew which file I wanted, much quicker. And I never really got used to the whole PyCharm user interface.
Speaker 1
01:47:26
Yeah, I feel torn in that same kind of way because I've used PyCharm off and on exactly in that same way. And I feel like I'm just being an old grumpy man for not learning how to quickly switch between files and all that kind of stuff. I feel like that has to do with shortcuts, that has to do with, I mean, you just have to get accustomed, just like with touch typing.
Speaker 2
01:47:46
Yeah, you have to just want to learn that. I mean, if you don't need it much.
Speaker 1
01:47:51
You don't need touch typing either. You can type with two fingers just fine in the short term, but in the long term your life will become better, psychologically and productivity-wise, if you learn how to type with ten fingers.
Speaker 2
01:48:04
If you do a lot of keyboard input.
Speaker 1
01:48:06
Everyone does emails and stuff, right? Like, you look at the next 20, 30 years of your life; you have to anticipate where technology is going. Do you wanna invest in handwriting notes?
Speaker 1
01:48:19
Probably not. More and more people are doing typing versus handwriting notes. So you can anticipate that. So there's no reason to actually practice handwriting.
Speaker 1
01:48:28
There's more reason to practice typing. You can actually estimate, back to the spreadsheet, the number of paragraphs, sentences, or words you write for the rest of your life.
Speaker 2
01:48:43
There you go again with the spreadsheet of my life, huh?
Speaker 1
01:48:46
Yes, all of that is not actual, like converted to a spreadsheet, but it's a gut feeling. Like I have the same kind of gut feeling about books. I've almost exclusively switched to Kindle now, for ebook readers, even though I still love, and probably always will, the smell, the feel of a physical book.
Speaker 1
01:49:05
And the reason I switched to Kindle is, like, all right, well, this is really where things are heading; the future is going to be digital in terms of consuming books and content of that nature. So you should let your brain get accustomed to that experience. In that same way, it feels like I should probably at some point very soon switch entirely to PyCharm or VS Code (I think PyCharm is the most sophisticated, featureful Python IDE), like I'm not allowed to use anything else for Python than this IDE, or VS Code, it doesn't matter, but walk away from Emacs for this particular application.
Speaker 1
01:49:48
So I think I'm limiting myself in the same way that using 2 fingers for typing is limiting myself. This is a therapy session, I'm not even asking questions.
Speaker 2
01:50:00
But I'm sure a lot of people are thinking
Speaker 1
01:50:01
this way, right? I'm not going to
Speaker 2
01:50:01
stop you. I think that everybody has to decide for themselves which one they want to invest more time in. I actually ended up giving VS Code a very tentative try when I started out at Microsoft, and really liking it.
Speaker 2
01:50:24
And it sort of, it took me a while before I realized why that was. And I think, and actually the founders of VS Code may not necessarily agree with me on this, but to me, VS Code is in a sense the spiritual successor of Emacs. Because, as you probably know as an old Emacs hacker, the key part of Emacs is that it's mostly written in Lisp. New versions of Emacs usually update all the Lisp packages and add new Lisp packages, and, oh yeah, there's also some very obscure thing improved in the part that's not written in Lisp, but that's usually not why you would upgrade to a new version of Emacs.
Speaker 2
01:51:21
There's a core implementation that can read a file, put bits on the screen, and manage memory and buffers. And then what makes it an editor full of features is all the Lisp packages, and of course the design of how the Lisp packages interact with each other and with that base layer, the core immutable engine. But almost everything in that core engine, in Emacs's case, can still be overridden or replaced.
Speaker 1
01:52:00
And
Speaker 2
01:52:02
so VS Code has a similar architecture where there is like a base engine that you have no control over. I mean it's open source but nobody except the people who work on that part changes it much. And it has a sort of a package manager and a whole series of interfaces for packages and an additional series of conventions for how packages should interact with the lower layers and with each other.
Speaker 2
01:52:40
And powerful primitive operations that let you move the cursor around or select pieces of text or delete pieces of text or interact with the keyboard and the mouse and whatever peripherals you have. And so the sort of the extreme extensibility and the package ecosystem that you see in VS Code is a mirror of very similar architectural features in Emacs.
Speaker 1
01:53:14
Well, I have to give it a serious try, because as far as the hype and the excitement in the general programming community go, VS Code seems to dominate. The interesting thing about PyCharm and, what is it, PhpStorm, which are these JetBrains-specific IDEs that are designed for one programming language. It's interesting when an IDE is specialized, right?
Speaker 2
01:53:41
They're usually actually just specializations of IntelliJ, because underneath it's all the same editing engine with a different veneer on top. Whereas in VS Code, many things you do require loading third-party extensions. In PyCharm, it is possible to have third-party extensions, but it is a struggle to create one.
Speaker 2
01:54:14
Yes, and
Speaker 1
01:54:15
it's not part of the culture, all that kind of stuff.
Speaker 2
01:54:17
Yeah, I remember, that might've been five years ago or so, we were trying to get some better MyPy integration into PyCharm, because MyPy is sort of Python tooling, and PyCharm had its own type-checking heuristic thing that we wanted to replace with something based on MyPy, because that was what we were using in the company. And for the guy who was writing that PyCharm extension, it was really a struggle to find documentation, get the development workflow going, debug his code, and all that. So that was not a pleasant experience.
Speaker 1
01:55:05
Let me talk to you about parallelism. In your post titled Reasoning About asyncio.Semaphore, you talk about a fast food restaurant in Silicon Valley that has only one table. Is this a real thing?
Speaker 1
01:55:18
I just wanted to ask you about that. Is that just like a metaphor you're using or is that an actual restaurant in Silicon Valley?
Speaker 2
01:55:25
It was a metaphor, of course.
Speaker 1
01:55:27
I can imagine such a restaurant. So for people who haven't, you should read the thing, but it was an idea of a restaurant where there's only one table, and you show up one at a time, and your meal is prepared just for you. And I actually looked it up, and there are restaurants like this throughout the world.
Speaker 1
01:55:47
And it just seems like a fascinating idea. You stand in line, you show up, there's 1 table. They ask you all kinds of questions, they cook just for you. That's fascinating.
Speaker 2
01:55:58
It sounds like you'd find places like that in Tokyo. It sounds like a very Japanese thing. Or in the Bay Area, there are pop-up places that probably more or less work like that.
Speaker 2
01:56:08
I've never eaten at such a place.
Speaker 1
01:56:10
The fascinating thing is you propose it's fast food. This is all for a burger.
Speaker 2
01:56:14
It was one of my rare, sort of, more literary or poetic moments, where I thought, I'll just open with a crazy example to catch your attention, and the rest is very dry stuff about locks and semaphores and how a semaphore is a generalization of a lock.
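A sketch of that generalization in asyncio terms (the restaurant framing below is an illustration of the post's metaphor, not code from the post): a Semaphore(1) admits one task at a time, exactly like a lock, while a larger count admits that many tasks concurrently.

```python
import asyncio

async def dine(name: str, tables: asyncio.Semaphore, seated: list, peak: list):
    # Each diner must acquire a "table" before eating.
    async with tables:
        seated.append(name)
        peak[0] = max(peak[0], len(seated))
        await asyncio.sleep(0.01)  # time spent eating
        seated.remove(name)

async def service(n_tables: int) -> int:
    tables = asyncio.Semaphore(n_tables)  # Semaphore(1) behaves like a lock
    seated, peak = [], [0]
    await asyncio.gather(*(dine(f"guest{i}", tables, seated, peak) for i in range(5)))
    return peak[0]  # most guests ever seated at once

print(asyncio.run(service(1)))  # 1: one table, guests eat strictly one at a time
print(asyncio.run(service(3)))  # 3: three tables, up to three eat concurrently
```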
Speaker 1
01:56:34
Well, it was very poetic and well delivered and it actually made me wonder if it's real or not because you don't make that explicit and it feels like it could be true. And in fact, I wouldn't be surprised if somebody listens to this and knows exactly a restaurant like this in Silicon Valley. Anyway, can we step back and can you just talk about parallelism, concurrency, threading, asynchronous, all of these different terms?
Speaker 1
01:56:59
What is it, at sort of a high philosophical level? The fisherman is back in the boat?
Speaker 2
01:57:04
Well, the idea is if the fisherman has 2 fishing rods, since fishing is mostly a matter of waiting for a fish to nibble,
Speaker 1
01:57:15
Well,
Speaker 2
01:57:15
It depends on how you do it, actually. But if you're doing the style of fishing where you throw it out and then let it sit for a while until maybe you see a nibble, one fisherman can easily run two or three or four fishing rods. And so, as long as you can afford the equipment, you can catch four times as many fish with a small investment in four fishing rods.
Speaker 2
01:57:41
And so, since your time is fixed, you sort of say you have all Saturday to go fishing, if you can catch four times as many fish, you have a much higher productivity.
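In asyncio terms, each rod is a task whose waiting overlaps with the others'. A sketch with made-up names, showing four 0.1-second waits completing in roughly 0.1 seconds rather than 0.4:

```python
import asyncio
import time

async def fish(rod: int) -> str:
    await asyncio.sleep(0.1)  # waiting for a nibble needs no CPU
    return f"fish from rod {rod}"

async def go_fishing():
    start = time.perf_counter()
    # All four waits run concurrently on one event loop.
    catches = await asyncio.gather(*(fish(r) for r in range(1, 5)))
    return catches, time.perf_counter() - start

catches, elapsed = asyncio.run(go_fishing())
print(len(catches))       # 4
print(f"{elapsed:.1f}s")  # roughly 0.1s, not the 0.4s a sequential loop would take
```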
Speaker 1
01:57:52
And that's actually, I think, how deep-sea fishing is done. You could just have a rod and you put it in a holder, so that you can have many rods. What, is there an interesting difference between parallelism and concurrency and asynchrony?
Speaker 1
01:58:06
Is one a subset of the other, to you? Like, how do you think about these terms?
Speaker 2
01:58:10
In the computer world, there is a big difference. When people are talking about parallelism, like a parallel computer, that's usually really several complete CPUs that are tied together and share something like memory or an I/O bus. Concurrency can be a much more abstract concept, where you have the illusion that things happen simultaneously, but what the computer actually does is spend a little time running this program for a while, then it spends some time running that program for a while, and then spends some time on
Speaker 1
01:59:00
the third program for a while. So parallelism is the reality, and concurrency is part reality, part illusion.
Speaker 2
01:59:08
Yeah, parallelism typically implies that there are multiple copies of the hardware.
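The time-slicing he describes can be made visible with cooperative tasks on a single event loop: each task does one step and hands control back, so three "programs" interleave on one CPU. This demonstrates concurrency; parallelism would need multiple CPUs, for example via multiprocessing.

```python
import asyncio

trace = []

async def program(name: str):
    for step in range(3):
        trace.append(f"{name}{step}")
        await asyncio.sleep(0)  # yield: let the loop run another program for a while

async def run_all():
    await asyncio.gather(program("A"), program("B"), program("C"))

asyncio.run(run_all())
print(trace)  # ['A0', 'B0', 'C0', 'A1', 'B1', 'C1', 'A2', 'B2', 'C2']
```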
Speaker 1
01:59:15
You write that implementing synchronization primitives is hard in that blog post. And you talk about locks and semaphores. Why is it hard to implement synchronization primitives?
Speaker 1
01:59:27
Because at the conscious level, our brains are not
Speaker 2
01:59:32
trained to sort of keep track of multiple things at the same time. Like obviously you can walk and chew gum at the same time because they're both activities that require only a little bit of your conscious activity. But try balancing your checkbook and watching a murder mystery on TV.
Speaker 2
01:59:57
You'll mix up the digits or...