Fermat's Library Cofounders João Batalha and Luís Batalha

1 hour 13 minutes 40 seconds

🇬🇧 English

S1

Speaker 1

00:00

You guys are brothers, right? Yeah, we are.

S2

Speaker 2

00:02

Yeah, okay.

S3

Speaker 3

00:03

He's the older one; I'm two years younger.

S1

Speaker 1

00:06

Okay, and what made you want to start Fermat's Library?

S3

Speaker 3

00:11

So, just for the people that don't know what it is, Fermat's is a platform for annotating papers. If you want to picture it, imagine a PDF viewer in your browser with annotations on the side that support LaTeX and Markdown. And so you can add annotations to the parts of a paper that you think are particularly tough to understand, or where you think you could add more content.

S3

Speaker 3

00:36

It's something that we'd already been doing. The four of us who started Fermat's all have a technical background. And so after college, we kept on reading papers. And every once in a while, we had this internal journal club where we would read a paper and present it to the others.

S3

Speaker 3

00:56

So I remember, for instance, a few years back, presenting the Bitcoin paper to Luís and Mika, who don't have a CS background. And so for the Bitcoin paper, you might have to go into, OK, what's a hash function? What's public key encryption? And so we were already doing this.

S3

Speaker 3

01:14

And we knew that you also have this behavior offline in places like universities. And so we wanted to take that experience and bring it online. And we thought there was a lot of content that you end up producing while you're trying to read a paper, which can be the most dense piece of content that a human can read sometimes, right? The language can be incredibly spartan, and sometimes there's a step in some paper that they say, oh, this should be obvious, but then you look at it, and it's like, okay, I don't get it.

S3

Speaker 3

01:47

And so we knew that there was a lot of content there that you end up producing while trying to understand a paper and we wanted to bring that online.

S1

Speaker 1

01:56

Because, Luís, you were in physics before.

S2

Speaker 2

01:58

I studied physics together with Mika, and João and Tymor went to MIT. Tymor studied economics and you studied CS. So a lot of the papers are around physics, math, economics, biology, CS, right?

S1

Speaker 1

02:16

Yeah, because you kind of solved the cold start by just annotating papers yourselves.

S2

Speaker 2

02:20

Exactly. Right.

S1

Speaker 1

02:21

And now it's more about getting the author in there.

S2

Speaker 2

02:24

Exactly. That was kind of the growth hack.

S3

Speaker 3

02:26

Our first paper was the Bitcoin paper.

S1

Speaker 1

02:30

And it's still the most commented, right?

S3

Speaker 3

02:31

Yeah. That one has a good number of comments. It has been there the longest, and a bunch of news sites have pointed back to it.

S1

Speaker 1

02:40

Oh, okay.

S3

Speaker 3

02:41

It's like, okay, if you want to read it, go to the annotated version. But we had a few cool people comment there.

S2

Speaker 2

02:50

Lawrence Lessig commented on the Bitcoin paper.

S3

Speaker 3

02:53

A bunch of people from the Bitcoin community commented there. Yeah.

S3

Speaker 3

02:59

But the larger goal with Fermat is to try to move things in the right direction, meaning move science towards what people call open science. And so that encompasses a number of things from open data, which means just sharing the data that you've used for publishing whatever research you might be publishing. And you want to share that and make that easily accessible to people so that if they want to replicate the results that you got or use it in their own research, they have an easy time doing that. So that's open data.

S3

Speaker 3

03:31

You also have publishing the code or the algorithms that you used and making those more easily available to people. There's also open publishing, which means publishing in journals that are not behind paywalls. So there are a lot of things within open science. We want to push things in that direction and also try to build a platform that makes it easier for people to collaborate.

S3

Speaker 3

04:03

And we think scientists could be collaborating remotely a lot more than they are nowadays. Or that's at least the way we see it. But it's starting to change. Take, for example, the Erdős paper.

S2

Speaker 2

04:25

Yeah. I think this is actually a trend. We're seeing more and more people collaborating online around papers. So for instance, there is this famous example around a problem called the Erdős discrepancy problem.

S2

Speaker 2

04:38

It's a famous problem that was posed about 80 years ago by Paul Erdős, the famous mathematician. And Terence Tao, the Fields Medalist, was trying to solve it. He wrote on his blog that he was trying a certain approach to the problem. And then there was this guy from Germany who just wrote a comment there, about the size of a tweet.

S2

Speaker 2

05:02

And he said that the Erdős problem had a Sudoku-like flavor, and that some of the machinery that they were using to solve the Sudoku problem could be used there. And that was actually the key to cracking the problem. And they ended up publishing a solution to the Erdős discrepancy problem, which was probably one of the biggest milestones in number theory in 2016. And that was all thanks to a comment on his blog and to the fact that they were collaborating online around solving that problem, which was also a Polymath problem.

S2

Speaker 2

05:36

The Polymath Project was started by another Fields Medalist, Tim Gowers. It was actually a social experiment to see if it was possible to collaborate on and solve math problems online. And yeah, they were able to solve it, thanks to that comment.

S3

Speaker 3

05:57

Because you look at GitHub, and then you think of the impact that GitHub has had on open source. Open source, of course, existed long before GitHub. But it has really allowed a lot more people to come in and be able to get into open source and start contributing.

S3

Speaker 3

06:16

And there are a number of other really interesting platforms. Right. You have Wikipedia just for more general knowledge or you have Stack Overflow for just programmers helping each other. And we think that there could be something similar to that.

S3

Speaker 3

06:31

But for science in general. Right.

S1

Speaker 1

06:33

Well, did you listen to the Rogan episode with Peter Attia?

S3

Speaker 3

06:37

No.

S2

Speaker 2

06:38

Parts of it. Mika listened to that.

S1

Speaker 1

06:40

Yeah, that was a really good one. And he talks about, I don't know if they're talking about arXiv in particular, around publishing papers, but he talks about having full-time staff.

S3

Speaker 3

06:50

Oh yeah.

S1

Speaker 1

06:51

Just scrubbing the data looking for interesting information coming out. And again, like in the context of Stack Overflow. That's the place where like programmers find specific answers to problems.

S1

Speaker 1

07:04

Whereas with arXiv, like, good luck. Yeah. Good luck finding that stuff. Yeah.

S1

Speaker 1

07:09

And so have you guys thought about addressing like just discoverability in the context of particular fields?

S3

Speaker 3

07:17

It's a really tough problem. For instance, paper recommendations are really hard to do.

S1

Speaker 1

07:24

Because you're just doing one a week right now. Yeah. In addition to the browser extension.

S3

Speaker 3

07:29

And we also have our tool that is used internally at universities and research groups by people who are reading papers together and adding annotations. But for now, we have the weekly journal. So we release a paper every week that we select and we annotate it, or somebody in the community annotates it.

S3

Speaker 3

07:49

And then we have the arXiv extension that adds a bunch of features on top of arXiv, like BibTeX extraction, reference extraction, and comments. And eventually, a recommendation engine that makes it easier to discover papers that are relevant to you is something we definitely want to add to our arXiv extension. But it's a tough problem.

S2

Speaker 2

08:17

It is. Yeah, initially we started Fermat's, as João said, as a journal club. And then we saw that people liked the commenting interface and liked reading the annotations.

S2

Speaker 2

08:30

So now we are starting to expand and turn Fermat's into more of a platform. And that's why we decided to do the arXiv Chrome extension. Because arXiv, for people that don't know what it is, is basically a place where papers live in the form of preprints before they go to journals. So they are like drafts before they go to journals.

S2

Speaker 2

08:55

And what we did is we built a Chrome extension that basically lets people see the whole commenting interface on arXiv papers. And so you don't have to go to another website. You're just reading arXiv papers, and you see the comments on the side if you have the Chrome extension installed.

S1

Speaker 1

09:12

Well, and a lot of these papers don't even have comments on the page.

S3

Speaker 3

09:14

They don't.

S1

Speaker 1

09:15

Like, best case, you're emailing the author?

S3

Speaker 3

09:17

Exactly. Yeah.

S1

Speaker 1

09:18

Yeah. They don't have.

S2

Speaker 2

09:20

So what arXiv does is basically just host papers. That's the core functionality of arXiv. And so one of the things that we noticed is that, especially for areas like machine learning and deep learning, arXiv is super important.

S2

Speaker 2

09:36

Because new papers are coming out at such a high rate that people don't wait for papers to go to journals before they start building on them and using the stuff that other people discover. So all the papers are published on arXiv. And so you need a way to distinguish good-quality work from bad work if you are reading a paper on arXiv, say something about machine learning, that hasn't been peer reviewed. And I think that's why the Librarian extension is so important in fields such as machine learning.

S1

Speaker 1

10:13

So does the Librarian Extension have a rating mechanism as well? Like, how do you distinguish good from bad work?

S2

Speaker 2

10:19

Right now, it's only through the comments. But we are actually thinking about implementing some sort of rating system for papers.

S3

Speaker 3

10:28

And we've been thinking about that for a while now. We're probably going to run a few surveys with our audience, because you could do it in a number of ways. Obviously there are likes and dislikes, or upvotes and downvotes. So you could just have a holistic rating for the whole paper.

S3

Speaker 3

10:49

You could also imagine rating it on a number of different aspects of the paper. It could be about, OK, how big is their data set if they're using some data set? Or what do you think about their methods? So you could have a more complex rating system.

S3

Speaker 3

11:05

And so we've been thinking about that a lot. And we're just trying to figure out what makes the most sense there. But we would definitely love to add that to the Chrome extension for arXiv.

S1

Speaker 1

11:20

Yeah, so how do you think the collaboration plays out then? Because I understand how, you know, say for instance, you know, you're a physicist, you start commenting on someone else's paper, you start a discussion that creates a new project, right? Do you think you'll go further than there?

S1

Speaker 1

11:35

Like, are we talking about like forking and that kind of stuff?

S3

Speaker 3

11:39

Yeah, I think there are a lot of things that you could do once you have a platform that has more people in it and they're doing more stuff in it. And so that's why we've been growing Fermat's with a goal far in the future where we are a much broader platform. But right now we're focused mostly on solving problems that people have today.

S3

Speaker 3

12:09

And actually, our arXiv extension was largely inspired by the survey that the arXiv guys did. They surveyed, I don't know how many people, but the people that use arXiv, and then published a paper where they describe the problems that those people reported while using arXiv and the features that they most wanted to see. And then the arXiv folks just said, hey, we're just going to be the platform to build upon. And we're not going to do all of these things that people would like us to do. But here it is.

S3

Speaker 3

12:45

This is what people want to see. If there's anybody else that wants to work on this, here are the results of the survey. And since then, they've actually done a pretty great job of building an API and wanting to become more of a platform. And so there are a lot of ways that we envision that you could have collaboration around science.

S3

Speaker 3

13:06

And so, yeah, like forking a paper or forking some type of research. Or data. Exactly, or data. There's a lot of things that you could do there.

S3

Speaker 3

13:17

It's not something that we're focused on right now. Right now, we're just trying to solve these problems that people have pointed out and create a place where people can just post comments and discuss around a paper.

S2

Speaker 2

13:29

An example of the problems that people mentioned was reference extraction. So if you go to a PDF, at the bottom of the paper you have the references that they used. And most of the time, when people want to look up the references, they have to copy the text in the PDF, put it into Google, and try to find the link to the paper.

S2

Speaker 2

13:49

And one of the things that we did with our Chrome extension is we made that easy. They just click on a button in the Chrome extension, and then they see a list of references with links to the papers. So that was one of the features that was most requested by the arXiv users. And our idea was, initially, we wanted really to convince people to install the Chrome extension.
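Editor's note: to make the reference-extraction idea concrete, here is a minimal Python sketch. It is not how the Chrome extension works, which the speakers do not describe. It assumes, purely for illustration, that the reference section has already been pulled out of the PDF as plain text, that entries start with "[n]", and that some entries cite an arXiv identifier in the "arXiv:NNNN.NNNNN" form. The names `split_references`, `link_for`, and the sample entries are made up.

```python
import re
from typing import List, Optional

# Assumed citation style: new-style arXiv identifiers such as "arXiv:1234.56789".
ARXIV_ID = re.compile(r"arXiv:(\d{4}\.\d{4,5})", re.IGNORECASE)


def split_references(text: str) -> List[str]:
    """Split a reference section into entries, assuming each starts with "[n]"."""
    parts = re.split(r"\s*\[\d+\]\s*", text)
    # Collapse line breaks inside an entry and drop empty fragments.
    return [" ".join(part.split()) for part in parts if part.strip()]


def link_for(entry: str) -> Optional[str]:
    """Return an arXiv abstract URL if the entry cites an arXiv ID, else None."""
    match = ARXIV_ID.search(entry)
    if match is None:
        return None
    return f"https://arxiv.org/abs/{match.group(1)}"


# Made-up reference section used only to exercise the functions above.
SAMPLE = """
[1] A. Author. A made-up title. arXiv:1234.56789, 2017.
[2] B. Writer. Another made-up title.
    Journal of Examples, 2015.
"""

references = [(entry, link_for(entry)) for entry in split_references(SAMPLE)]
for entry, link in references:
    print(link or "no direct link found", "|", entry)
```

Entries without an arXiv identifier come back with no link here; a real tool would need some other lookup for those, which is outside what this sketch attempts.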

S2

Speaker 2

14:08

And so let's solve the hair on fire problems that they are describing here. And then once we have people using the Chrome extension, and then we can expand into open collaboration around papers since they're already there. Yeah, so that was the growth.

S1

Speaker 1

14:23

Do you guys know of anyone working on publishing negative results? This is something that I've been fascinated with. And basically the problem is that as an academic, you're not incentivized to publish negative results because you want to publish things that have high impact so you can get a job or a tenure position or just get people to even care about your work, right?

S1

Speaker 1

14:45

And so they don't publish. Do you know anyone like working on that?

S3

Speaker 3

14:50

I know of researchers that are studying that a lot, but unfortunately that's a very large problem, and people are becoming more aware of it. And along with negative results, you also have people doing a lot of research into p-value hacking.

S3

Speaker 3

15:09

Yeah.

S1

Speaker 1

15:11

Would you explain that?

S3

Speaker 3

15:12

Yeah, so the p-value is essentially a standard that people use in order to know if the results that you have obtained out of some experiment that you've run are worthy of being published. And that has worked fine for the most part until now, or I mean, that's arguable. But people are looking into it and thinking, OK, should we do things differently? And should we be much stricter with what's considered the gold standard for publishing?
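Editor's note: for readers who have not met p-values before, the sketch below shows one standard way to compute one, an exact one-sided binomial test, using only the Python standard library. The experiment (60 heads in 100 coin flips) and the function name `binomial_p_value` are made up for illustration and do not come from the interview. The p-value is the probability of seeing a result at least this extreme if the null hypothesis (here, a fair coin) were true.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, null_rate: float) -> float:
    """One-sided exact p-value: P(X >= successes) for X ~ Binomial(trials, null_rate)."""
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical experiment: 60 heads in 100 flips, null hypothesis "the coin is fair".
p = binomial_p_value(60, 100, 0.5)

# 0.05 is a widely used convention for "significant", not a law of nature.
significant = p < 0.05
print(f"p-value: {p:.4f}, significant at 0.05: {significant}")
```

P-value hacking, loosely, is running many such tests or slicing the data many ways and reporting only the ones that fall under the threshold, which is why a single small p-value says less than it appears to.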

S3

Speaker 3

15:45

And we've thought of doing things there with Fermat, so that if you're looking at a paper, to have an idea, OK, how relevant is this paper? This is more specific for certain areas, like if you're talking about medicine or biology, where that is really important, like the statistical significance of the results that you're presenting. That's all, right? That's the most important thing.

S3

Speaker 3

16:12

And so we've thought of doing something with Fermat there, either via some API where you could send us the DOI of a paper and we would send you some information regarding the p-value or something, or with a Chrome extension where you'd see that information displayed very prominently saying, hey, there might be some p-value hacking here or this is very solid research. Because there is a very big problem and people are realizing how prevalent it is, especially in things like economics and biology. Nutrition.

S1

Speaker 1

16:51

I mean it came about, I was just talking to a friend who's doing a PhD at Cambridge in bio.

S2

Speaker 2

16:57

That's a big thing.

S1

Speaker 1

16:58

Yeah, and only by attending a conference in the States did he realize that there was someone in Australia working on the exact same problem as him concurrently, and they're failing at the same types of experiments. But because they don't publish them, no one knows the results, no one knows the methods. And essentially these, you know, traveling salesman type problems that people are so excited about quantum for, like trying all these permutations, are happening at a smaller scale, but no one's publishing anything.

S3

Speaker 3

17:30

So

S1

Speaker 1

17:30

like the progress isn't happening.

S3

Speaker 3

17:32

Yeah, and part of it is just the way research is done. You come into it and you're usually trying to find some correlation. You're trying to find some trend in the data, and you're usually going to have that bias. You're trying to find some correlation and publish that.

S3

Speaker 3

17:51

And so, yeah, you might need to change things dramatically in order to get people to start publishing negative results, which could be incredibly useful for other researchers. But there are a bunch of people working on that.

S3

Speaker 3

18:11

There's this researcher at Stanford. I'm forgetting his name. It's John. And then I forget his last name.

S3

Speaker 3

18:17

But he actually just went on this podcast, EconTalk. And he talked.

S1

Speaker 1

18:20

Oh, really? I love EconTalk.

S3

Speaker 3

18:21

Yeah. So you should listen to that podcast. And actually, Tymor has been talking to that professor. I think he's a professor at Stanford.

S3

Speaker 3

18:30

And yeah, he has analyzed this subject more, but more relating to economics, I believe. But yeah, he's found that a lot of the things that we're talking about here are prevalent in economics too. Cool.

S1

Speaker 1

18:46

Let's go into the Twitter questions. So we have a ton of questions. You guys are very popular on Twitter.

S1

Speaker 1

18:50

So congrats on your great following. Let's see. Let's start with something broad. Tanner Goblinstein asks, what are the most interesting papers you've read in the past couple of years that are not widely known?

S3

Speaker 3

19:08

That's interesting. I end up reading all sorts of papers from different areas.

S1

Speaker 1

19:16

How do you get the papers, actually?

S3

Speaker 3

19:18

It's just like a random walk.

S2

Speaker 2

19:19

Really? Yeah.

S2

Speaker 2

19:22

The random walk. It's funny.

S3

Speaker 3

19:23

Yeah. Or sometimes, for instance, a few months ago I got a Fitbit to track my sleep. And so I wanted to read papers about sleep.

S3

Speaker 3

19:38

And so that just got me into a random walk around research on sleep. And then I found a bunch of interesting things. I ended up annotating a paper about a big study in Finland on the association between sleep and mortality. There are a bunch of really interesting things that I learned from there.

S3

Speaker 3

19:58

For instance, if you sleep less than seven hours, that's associated with higher mortality. But if you sleep more than eight hours, that is also associated with higher mortality. Really?

S1

Speaker 1

20:09

Yeah. So have you changed your life based on that?

S3

Speaker 3

20:12

Yeah. Well, I try. I was usually more on the end of not sleeping enough. But there was also another thing from that research: apparently sleep quality doesn't matter as much, at least for mortality, which is kind of counterintuitive.

S3

Speaker 3

20:27

But it seems that sleep quality is very closely related to the amount of sleep that you're getting.

S3

Speaker 3

20:34

So seven hours of okay sleep versus seven hours of great sleep, that's kind of hard to distinguish.

S1

Speaker 1

20:42

Seriously? So you could sleep on an airplane your whole life and live as long?

S3

Speaker 3

20:45

Yeah, yeah, apparently. Maybe your life will be a little bit more miserable. So it's hard sometimes to pick favorites, but there is one, for instance, that is also kind of random. It's a paper published in the 90s about Simpson's paradox and the hot hand phenomenon in basketball.

S3

Speaker 3

21:08

So the hot hand phenomenon in basketball is the idea that, OK, because a player just made a field goal, they have a higher chance of making the next one. And so there's this researcher who in the 90s looked at a data set from the Celtics to see if that was true for free throws. Before that, they had asked students at Stanford and Cornell, like 100 students, whether they thought that, OK, if a player just made the first free throw, do they have a higher chance of making the second one or not?

S3

Speaker 3

21:46

And something like 68 of the 100 students who were asked agreed. They thought that that was true. And these are people from Stanford and Cornell. And so then they looked at this.

S3

Speaker 3

22:00

And what they found back in the 90s was that actually that seemed not to be the case, right? You're not more likely to make your second free throw if you made the first one. But what they found is that you're just more likely to make it on your second one.

S1

Speaker 1

22:21

Objectively. Significantly. Yeah. OK.

S3

Speaker 3

22:24

And so this was done in the 90s with, like, I don't know how many free throws, but maybe like 5,000. They looked at some data from the Celtics.

S1

Speaker 1

22:31

Just across the Celtics.

S3

Speaker 3

22:32

Yeah. And then I went and got a data set from Kaggle with like 600,000 free throws. Free throw shots? And I reran the same algorithms that they ran for the study in the 90s and then looked at what the results were.

S3

Speaker 3

22:49

And yeah, the pattern is pretty clear that on their second free throw, they're just significantly better at it, regardless of their first one. It doesn't matter if they made their first one or if they missed. And then that paper tried to explain why people think that there is a hot hand phenomenon. And that is related to Simpson's paradox, which, for people that don't know what it is, really kind of changed my world view a little bit once I learned more about it.

S3

Speaker 3

23:32

Basically, what it says is that you can get two valid conclusions out of the same data, depending on how you split it. So an example is that between 2000 and 2013, the median wage for high school dropouts in the US dropped. For high school graduates, it also dropped. For people with an undergrad degree, it dropped.

S3

Speaker 3

24:04

And for people with a graduate degree or higher, it also dropped. So across the board, for all of those segments, the median wage dropped. But in aggregate, it went up. And so you look at it, and it's like, OK, what's going on here?

S3

Speaker 3

24:22

And it turns out that what happened is that a lot more people got a degree. So they just shifted towards higher education. So that's why, on average, you get it going up. And then for each one of these segments, it goes down.

S3

Speaker 3

24:38

And so Simpson's paradox is that depending on how you cut the data, you might get different results, and each of them could be valid. In this case, it's pretty easy to understand what the right way to look at the data is. But in some other cases, it's not clear whether or not you should include some variable and cut the data in a different way.
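Editor's note: the wage example can be reproduced with a few lines of arithmetic. The Python sketch below uses invented head counts and wages, not the real US figures, and it uses averages rather than medians to keep the arithmetic simple. It shows every education group's wage falling while the overall average rises, because workers shift toward the better-paid groups.

```python
# Hypothetical figures: (number of workers, average wage in $1000s) per education level.
WAGES = {
    2000: {
        "dropout": (40, 30),
        "high school": (40, 40),
        "undergrad": (15, 60),
        "graduate": (5, 80),
    },
    2013: {
        "dropout": (10, 28),
        "high school": (30, 38),
        "undergrad": (40, 57),
        "graduate": (20, 76),
    },
}


def overall_average(groups: dict) -> float:
    """Average wage across all workers, weighting each group by its head count."""
    total_workers = sum(count for count, _ in groups.values())
    total_pay = sum(count * wage for count, wage in groups.values())
    return total_pay / total_workers


# Within every single group, the 2013 wage is lower than the 2000 wage.
every_group_dropped = all(
    WAGES[2013][group][1] < WAGES[2000][group][1] for group in WAGES[2000]
)

overall_2000 = overall_average(WAGES[2000])
overall_2013 = overall_average(WAGES[2013])

print("every group dropped:", every_group_dropped)
print("overall average:", overall_2000, "->", overall_2013)
```

The only thing that changes between the two views is the weighting: the per-group comparison holds education fixed, while the aggregate also reflects how many people sit in each group.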

S3

Speaker 3

24:58

And so, relating it back to this basketball issue, the results were different depending on whether you looked player by player or at the aggregate. Once you collapse it all into the same table, you get different results than when you look at it player by player.

S3

Speaker 3

25:14

I forget exactly the way it went, but if you collapse it, it might have been that you didn't see the hot hand phenomenon, but if you looked at it player by player, you saw it.

S3

Speaker 3

25:30

And so they're arguing that that's why people have the idea. That's why you get like 68 students out of 100 saying that they believe in the hot hand phenomenon. Yeah, yeah, yeah.
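Editor's note: the speaker says he does not remember which way the discrepancy ran in the study, so the Python sketch below does not try to reproduce it. It only shows, with invented counts for two hypothetical players, how a per-player view and a pooled view of the same free-throw pairs can disagree. Here neither player is more likely to make the second shot after making the first, yet the pooled table suggests a hot hand, simply because the stronger shooter contributes most of the made first shots.

```python
from collections import Counter

# Hypothetical counts of (first shot made?, second shot made?) pairs per player.
SHOTS = {
    "strong shooter": Counter(
        {(True, True): 64, (True, False): 16, (False, True): 16, (False, False): 4}
    ),
    "weak shooter": Counter(
        {(True, True): 25, (True, False): 25, (False, True): 25, (False, False): 25}
    ),
}


def second_shot_rate(counts: Counter, first_made: bool) -> float:
    """Share of second shots made, given the outcome of the first shot."""
    made = counts[(first_made, True)]
    missed = counts[(first_made, False)]
    return made / (made + missed)


# Player-by-player view: (rate after a make, rate after a miss).
per_player = {
    name: (second_shot_rate(counts, True), second_shot_rate(counts, False))
    for name, counts in SHOTS.items()
}

# Aggregate view: add all players' counts into one table before computing rates.
pooled = sum(SHOTS.values(), Counter())
pooled_rates = (second_shot_rate(pooled, True), second_shot_rate(pooled, False))

for name, (after_make, after_miss) in per_player.items():
    print(f"{name}: after make {after_make:.2f}, after miss {after_miss:.2f}")
print(f"pooled: after make {pooled_rates[0]:.2f}, after miss {pooled_rates[1]:.2f}")
```

With real data the same two computations would be run on each player's actual free-throw pairs; only the contents of `SHOTS` are assumptions here.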

S3

Speaker 3

25:43

And so, yeah, so some of the papers, like that's really random.

S1

Speaker 1

25:46

It's funny, you're getting these little tidbits of trivia. Yeah. Has it been relevant to you in terms of physics?

S1

Speaker 1

25:54

I mean, you're basically working on software now, right?

S2

Speaker 2

25:58

Yeah. But I also end up discovering really cool physics papers. So for instance, my two favorite papers are actually written by Freeman Dyson. One of them is when he proposed the concept of a Dyson sphere.

S2

Speaker 2

26:14

It's just one page. And he basically explained how an advanced civilization would need more energy than the energy that we can generate on Earth. So we would have to go to a star and build a cap around the star to extract the energy of the star. But it's funny because with really simple math and physics equations, he was able to derive, OK, is this sphere stable?

S2

Speaker 2

26:39

Is it going to heat indefinitely? And so it's a really interesting paper. And the other one that I really like is about Feynman's derivation of the Schrödinger equation, also written by Freeman Dyson. And it just shows, you know, Feynman's intuition about quantum mechanics.

S2

Speaker 2

27:00

And it's also really simple and easy to read, even if you don't have a physics background. But one of the things that I noticed from trying to find and annotate all these papers was that in the 60s, and all through the 20th century, the papers for all these discoveries were mostly one or two pages. And yeah, it's so funny. And also fairly simple to read.

S2

Speaker 2

27:32

The discovery of the neutron, it's maybe just one column. The discovery of the positron, the Dyson sphere paper, they're really, really short papers and fairly accessible.

S1

Speaker 1

27:48

Why do you think they've gotten so long? Is it sort of like, you know, David Foster Wallace citing a million things because he doesn't have confidence?

S2

Speaker 2

27:55

I think it's also a consequence of a field developing. You just have, you know, more complex questions, and so it's harder to write.

S3

Speaker 3

28:07

I think they're also a little bit more detailed as to the methodology, and the format of papers has gotten a little bit more formal, in the sense that people follow a very specific format. And I think that has added on to it. But yeah, nowadays, like the gravitational wave paper that we annotated, that's what, like 15 pages?

S2

Speaker 2

28:29

Maybe. It would be interesting to analyze the constraints in terms of size that the journals were imposing 50 or 60 years ago compared to what they are doing now, and whether they were forcing people to write shorter papers back then. Not sure.

S2

Speaker 2

28:49

But I mean, if the discovery of the positron paper was published today, I bet it wouldn't be just a single column. Yeah.

S1

Speaker 1

28:59

Well, are they intended to be more reproducible now?

S2

Speaker 2

29:03

Good question. Maybe. Maybe.

S2

Speaker 2

29:07

Yeah, I think. Or maybe it's just more complex problems that they are tackling now. It might be the case.

S3

Speaker 3

29:16

Yeah. It's definitely not going back, it seems. You don't really see a trend anywhere of shorter papers. But yeah, it's interesting.

S3

Speaker 3

29:26

You go back to the 60s and 50s and it was pretty nuts.

S1

Speaker 1

29:29

Man, glory days. Yeah. All right, cool.

S1

Speaker 1

29:32

So let's go to another question. Polaris 7 asks, what are the necessary ingredients of good and impactful science writing?

S2

Speaker 2

29:42

This is also a good question. I don't think that I'm qualified to answer, or like I haven't published that many papers to know that. But one of the things that we noticed, or at least I noticed from reading papers, is that sometimes it's not the discovery paper that is the most impactful paper.

S2

Speaker 2

30:05

So for instance, I remember when quantum electrodynamics was discovered, there were three guys working on that problem: Feynman, Schwinger, and Tomonaga. And they were sort of working independently on that problem and publishing papers on quantum electrodynamics. And the most impactful paper was actually published by Freeman Dyson, who at the time took the time to analyze all the work and kind of unify the work of Feynman, Tomonaga, and Schwinger. He wrote a paper that helped other researchers understand what quantum electrodynamics was back then and really helped spread their work.

S2

Speaker 2

30:56

So it was actually the most impactful paper.

S1

Speaker 1

30:59

So in other words, clear writing. Exactly.

S3

Speaker 3

31:02

Yeah. Clear writing. Yeah. It's also, I mean, the question here is impactful scientific writing.

S3

Speaker 3

31:08

And so you have, of course, writing papers. And then you also have scientific writing in the sense of explaining some concept to a more general audience. And I think it's the same there, where you want to make it clear and you want to make it accessible. But for instance, even with something like the Bitcoin paper, I mean, I studied cryptography in college, and even so it took me a few reads through it to actually get it.

S3

Speaker 3

31:41

And it's a beautiful paper, but the language is very spartan and you have to read every sentence. And so it can be very challenging to approach. And I think you always benefit if you can make it as clear and accessible as possible. Because you never know the audience that is going to end up reading your paper.

S3

Speaker 3

32:06

Of course you can expect other people in your field are going to read it, but sometimes things can be useful, especially like interactions between math and physics. Things can be useful in different fields. And so I think it's always beneficial for science if you try to make it as accessible

S2

Speaker 2

32:23

as possible. What does impact mean? Yeah, well, that's

S1

Speaker 1

32:27

a question as well. Yeah. Did you see that 1 from Adam?

S1

Speaker 1

32:31

Adam Baybut asks basically the metrics for value add.

S3

Speaker 3

32:35

Yeah, exactly.

S2

Speaker 2

32:36

What does impact mean? If it's the number of citations that you get or just the number of people that learn about a certain subject because of a paper. So in that way, a review paper can have a really big impact compared to a discovery paper.

S2

Speaker 2

32:54

And so it's one of the problems that we also think about a lot. These metrics, and what are the incentives in science, and what makes people, you know, want to publish a paper or, you know, why should people worry about clarifying a paper and making it understandable to as many people as possible. Do they have the incentives to do that? How can you create incentives to do that?

S2

Speaker 2

33:23

And then sometimes, if the metric is just number of citations, sometimes it's not aligned to making the paper understandable and comprehensible to a large audience.

S1

Speaker 1

33:34

I mean, is that a question that you guys have to tackle? Because, you know, on one hand you want to illuminate these papers that people could potentially learn from. Then on the other hand you're running a site with content, right? And you want things that are going to capture attention. So I saw you have the Charlie Munger post on there.

S1

Speaker 1

33:53

Right.

S2

Speaker 2

33:54

Mika annotated the Charlie Munger paper. Other co-founder.

S1

Speaker 1

33:59

So it's like a squarely non-technical paper. But Charlie Munger has millions of fans across

S2

Speaker 2

34:05

the world.

S3

Speaker 3

34:05

So you

S1

Speaker 1

34:05

kind of have to balance those 2 things.

S3

Speaker 3

34:08

Yeah, and yeah, it's not easy. And citations are definitely a proxy, right? If the paper is getting cited a lot, it has some sort of importance, but it's definitely not perfect.

S3

Speaker 3

34:21

And if you look at the most cited papers in these different fields, you might be surprised that they might not be the ones that you expected them to be. I certainly remember looking at the most cited papers in computer science, and they're definitely very impactful, but I remember reading through those 10, and some of them I'd never heard about before. And so, yeah, sometimes very important concepts or discoveries, well, this is more specific to certain fields, never really get published in one paper that then gets a ton of citations. That knowledge gets spread in some other way.

S3

Speaker 3

35:01

And so, yeah, citations are not perfect. But I wouldn't say that we have a great answer for what's a better proxy and how you should go about it. And I don't think anybody really has a better answer right now, or not that we've heard about, but yeah, it's an interesting problem.

S3

Speaker 3

35:24

We'll see what people start using in the future. Because yeah, you could measure impact by how many people are talking about it on social media.

S2

Speaker 2

35:35

How many blog posts are written about this paper. Or if you have code, if you have a public repo, how many forks do you have on your repo?

S3

Speaker 3

35:47

Yeah, or like for certain. And then it depends field by field, right? So if you take bio, then bio papers can be used very directly, say, in industry, right?

S3

Speaker 3

36:02

You can publish a paper about a drug, and then that can be used worldwide and save lives. So for that field, maybe there are a bunch of other metrics that you could use to calculate the impact of a paper. But for the more traditional sciences, like physics and math, yeah, it's hard. Okay.

S1

Speaker 1

36:26

Question up top: Arsalan Yarvesi asks, and it's basically about working in public and the speed of publishing. They say, since scientific papers usually go through scrutiny and evaluation before getting published, how do you cope with not being always updated and up to speed in a world with daily news and contributions? This kind of relates to what we were talking about before in relation to people publishing to the arXiv before they really test it out.

S1

Speaker 1

36:55

Where do you guys fall in that dynamic of like publishing as soon as possible? Like with something like machine learning where things are just getting put out all the time versus going through a peer review for getting something out.

S3

Speaker 3

37:09

And this kind of loops back into peer review, which is a whole world unto itself that people are talking a lot about. For us, generally, or say for a weekly journal, we generally are not publishing the most recent research.

S1

Speaker 1

37:27

And

S3

Speaker 3

37:28

there is definitely, like sometimes, there's a lot of us having to catch up to even, I remember annotating a paper about like, this machine learning algorithm to play one-on-one poker.

S1

Speaker 1

37:43

And

S3

Speaker 3

37:43

this was, like, out of my league. I had to go spend a good amount of time there researching it and also figuring out, okay, how relevant is this? Because, you know, I'm not in the field, so it's hard for me to gauge, okay, what's the impact of this paper? So yeah, sometimes it takes us a lot of reading up before we can actually say, okay, this is worth publicizing, it's worth our stamp of approval, and say, hey, you should read this.

S3

Speaker 3

38:16

I think you'll like it. And it can take a while sometimes. But in the future, looping back to peer review, that's also something where I think the system does not seem to be perfect for the way things work nowadays. And we would love to see, either via Fermat or some other platform, someone try to tackle that and try to do something to make peer review a better system or to change it significantly.

S3

Speaker 3

38:50

I think there's a lot of work left to be done there, which can have a very significant impact in science. One of the most important aspects of science is just, OK, having a very skeptical mindset, looking at it with a very critical eye, and seeing, OK, is this something that we can build upon? Is this something that we're going to add to our foundations to build more science upon? And so that's a very important aspect of science and I think it's not perfect and could be better.

S1

Speaker 1

39:26

So Anvil Rotterdam asks, Have you ever thought about building a tool for annotating books? Something like what Patrick Collison was talking about in this thread where he basically says I'd pay a lot more for books if I could see the highlights, annotations, and marginalia of friends or people I follow.

S2

Speaker 2

39:42

Yeah. No, I think it's actually a really, really good question. And we have a friend, Jess Riedel, from the Perimeter Institute. He's a researcher there who wrote about this on his blog.

S2

Speaker 2

39:58

And I think that besides annotating academic papers, it also makes total sense to annotate books. And especially kind of introductory books about science. And he gives this example of a book that is used by thousands of students to learn classical mechanics called Goldstein. And there is a section on that book where they talk about this transformation called the Legendre transform.

S2

Speaker 2

40:31

And he does a bad job at explaining what it is. But apart from that section, the rest of the book is awesome. It's really nice if you want to learn classical mechanics. But if I want to write a book that does a better job at explaining the Legendre transformation, it has to be net better than the Goldstein book so that anyone will adopt that book.

S2

Speaker 2

40:55

Otherwise, people just keep using the Goldstein book. So it would make sense for books to be annotated and also be open source so that in that sense, you would just commit a new chapter, a new explanation for that, and keep all the other chapters, and then just change that bit instead of having to write a new book and then convince people to adopt your book just because of that. So I think it makes total sense to do that.

S3

Speaker 3

41:24

More introductory. No, and we've thought about that, the type of things that you could do. If you had some platform where you could have books that kept being updated, and you could have, okay, this is the standard for learning calculus, where this is constantly being updated, you're adding exercises to it, people are forking it, and if you need more information about this, you're not understanding it, you could deep dive into it and you have a bunch of additional content that is attached to it.

S3

Speaker 3

41:57

Really feels like something that should exist. And we've thought about it, about doing something with Fermat for that. Yeah, it's just, it's so many things.

S1

Speaker 1

42:09

Just in terms of copyright, are there massive issues there? Or is that possible?

S3

Speaker 3

42:16

I think you might be facing some of the same challenges that Wikipedia is facing, to an extent. Then yeah, I think it would depend a lot on the format that is used. I do think, for something like this, you'd probably benefit from having some editor or like a team of editors to curate and to see, okay, should we add this?

S3

Speaker 3

42:46

Should we not? To an extent, to be a curating voice. In terms of copyright, yeah, you could run into some issues there.

S2

Speaker 2

42:53

Well, some of these, especially the classic books on electromagnetism are like...

S1

Speaker 1

42:58

They're out of copyright, yeah.

S2

Speaker 2

42:59

Yeah. A lot of these are... Yeah, I

S1

Speaker 1

43:00

mean, my impression was that these are maybe even like current books coming out, like popular fiction even, as annotated by X famous person. So, I mean, maybe if they gave away their notes for free and they were just a layer on top, then you're good. But if you wanted to resell your own version of the book.

S3

Speaker 3

43:22

Yeah, that's interesting. There's also some legislation. Well, there's fair use, where you can use a piece of content if you're adding on to it.

S3

Speaker 3

43:37

This is why you can have a video on YouTube with a snippet from a movie if you're reviewing it. There's some precedent there for doing this type of thing. But yeah, but for more general books, I also agree that it would be amazing. Because we were just talking about this.

S3

Speaker 3

43:56

We've talked about this for a while now. Because you read a book, and the purpose of that book is not only for you to absorb all the knowledge that is there, but it's also to get you thinking about what's being talked about in the book. And then you might reach some other conclusion. You might go on a tangent.

S3

Speaker 3

44:15

And when you're reading it, that knowledge might never be shared with anybody else. You might just read it yourself and you think, okay, this just made me think about something else. And it would be really, like there's a lot of knowledge that is being lost. And it would be great if you could capture it in some way.

S1

Speaker 1

44:34

The Amazon Kindle highlights site is one of the saddest things I've ever seen. Yeah. Have you ever done that?

S3

Speaker 3

44:40

We have Kindles but we haven't explored them.

S1

Speaker 1

44:42

Oh yeah, so there's a whole web interface for looking at all of your highlights across all of your Kindle books.

S3

Speaker 3

44:48

It's not good. So do you use it for anything?

S1

Speaker 1

44:51

I mean, sometimes I go back. So like the best way that I've found for me personally to retain is to buy the audio book and go through a book a couple times. And then my retention goes way up.

S1

Speaker 1

45:02

But occasionally I'll be just like, what was that passage in, you know, whatever book. And I'll go back onto Amazon and you can like dig. It's from Amazon. Yeah.

S1

Speaker 1

45:11

And you can dig through your highlights from your Kindle.

S2

Speaker 2

45:13

I think I've seen like a startup that does that in a better way.

S1

Speaker 1

45:17

Kind of

S2

Speaker 2

45:17

pulls all your highlights and organizes them. You know, the Kindle highlights.

S3

Speaker 3

45:23

Mad Fientist. I remember looking into this. But what I've started doing is, well, I also use Kindle, and I usually don't write annotations or highlight there. But if I'm reading a physical book, whereas before maybe I would never write anything, now I try to write a lot more there. And then at some point, if I have time, I try to go through the books, see where I wrote things, and then write that in some notebook.

S3

Speaker 3

45:56

Because just going through that exercise of looking at what you highlighted can be very helpful.

S1

Speaker 1

46:06

Yeah, I mean, I was an English major in college, so I've forgotten more books than a lot of people ever read in college. Yeah. And one of my professors actually recommended this, which is basically take a 5 by 7 index card.

S1

Speaker 1

46:21

And as you're reading the book, you're making little notes, right? You're like, all right, this character does this, or like this is an important point. And then at the end, you basically write a paragraph to your future self, describing your memories of the book and what happens and like important ideas and that can really like trigger it

S2

Speaker 2

46:38

for you

S1

Speaker 1

46:39

to retain but past that like I don't know.

S3

Speaker 3

46:41

Yeah. No, but I remember in school, back in Portugal, we all have to read this epic poem called the Lusíadas. And it was written by a poet back in the day. And it's about the Portuguese going from Portugal all the way to India.

S2

Speaker 2

46:57

The Portuguese discoveries.

S3

Speaker 3

46:58

The Portuguese discoveries. And so I remember we had a version. You had the original version, which is pretty thick.

S3

Speaker 3

47:05

And then we also had the version that had annotations on the side for each verse, or not for all of them, but for a lot of them. And that made such a big difference, right? Because you're reading in this, in old Portuguese, which by itself is already hard to tell. I mean, he's using, he's making references that you have no clue about.

S2

Speaker 2

47:26

So much historical context in every word, almost.

S3

Speaker 3

47:31

It was like- The names of all, like India was not called India. So like, there's, everything is different. And you're reading it through, the first time you go, it sounds great, it rhymes, but you don't understand a lot of the context behind it.

S3

Speaker 3

47:48

And if you go through it and you read through it and then on the side you have all this rich content that really only adds on to your experience and makes it much more memorable. You can map it out in your mind and create much more connections. It really enriches your experience. And of course, you have this because in this case, this is an epic poem that everybody has to read.

S3

Speaker 3

48:11

And so there is a large incentive to publishing the annotated version of this book that is no longer under copyright. And so there you can have those type of things. But for a lot of more recent books, I think you could benefit a lot from having that to some extent. Where if you're reading through these few pages and you love what the author is talking about here, you wanna dig deeper into this topic that he's talking about right now, there should be some place where you could do that.

S3

Speaker 3

48:42

But yeah, it's just nobody has actually built this. I mean,

S1

Speaker 1

48:46

I think that like defaults toward the blogosphere for most people. They just like some people summarize books and like write Amazon reviews. Yeah.

S3

Speaker 3

48:56

But then the thing there is that sometimes that content does exist, but being able to find it easily, having that at your fingertips, can make the whole difference. Right? Even if, yeah, maybe you could spend a minute searching on Google and you'll find the content you're looking for.

S3

Speaker 3

49:16

But if it was right there, you could just click and it would pop up and you'd see it then. It would be much more likely that you would end up reading that content. Those type of things make a big difference, being right there.

S2

Speaker 2

49:27

Dave

S1

Speaker 1

49:27

Do you find that annotations sometimes are best done by someone who is not the author of a paper?

S3

Speaker 3

49:35

What's interesting is that the authors of the paper oftentimes are not going to know where people are going to struggle understanding the paper. I remember when I was annotating the Ethereum white paper, written by Vitalik, I went through it and then I emailed him, and he's super quick to reply, and he replied back with some of the questions that he gets the most about Ethereum.

S1

Speaker 1

50:08

Makes sense.

S3

Speaker 3

50:09

But when you're writing it, you have no clue. For you, you've worked it out in your mind. Some steps you might skip, because you just have internalized them so much.

S3

Speaker 3

50:18

So you only get, you only know where people are going to struggle once you put it out there and you start getting questions. And so, yeah, so sometimes the authors are not the best.

S2

Speaker 2

50:30

Every time we talk with an author, I think it's easier for them to answer questions about their papers than to annotate the paper. But then if you have another person annotating a paper, I think it's easier for them. But yeah, with the authors, we see that a lot.

S2

Speaker 2

50:46

Just ask me questions, I'll answer them. But sometimes I don't know how to enhance or add content to my own paper.

S1

Speaker 1

50:53

Yeah. You guys could provide that service for sure. You could like reverse engineer clear papers.

S2

Speaker 2

51:00

Yeah.

S1

Speaker 1

51:01

It's kind of worth noting that this is a side project for you. I mean, I have so many questions about how you go about building this thing that's definitely consuming a lot of your time. I mean, it has to, right? Between finding and reading papers, making all those graphics and tweets and stuff that you guys do. How do you find that balance? Like what's your whole philosophy around this?

S3

Speaker 3

51:29

Yeah, so it definitely takes its time. It is something that we actively tried to do after college and while we were working. Reading papers and staying up to date, it's something that we tried to do anyway. And so we were already looking into research before as just something that we would enjoy.

S3

Speaker 3

51:54

And then we found it good to have some sort of peer pressure amongst ourselves to present papers to each other, right? Because that really forces you to understand something well, right? I think it was Feynman. He has some quote where you don't understand something until you can explain it to some freshman in college.

S3

Speaker 3

52:17

And so that's very true. And so we tried to do that amongst each other. And so then we got to Fermat and we thought, okay, maybe we can bring this online. And so we were already spending a healthy amount of time doing this type of stuff.

S3

Speaker 3

52:34

But with Fermat, like the first version of Fermat, we kind of built it over the weekend and we tried to just put it out there as fast as possible. And then it's mostly late at night, I'll be trying to fix bugs. People on Hacker News don't seem to think that it's a side project.

S3

Speaker 3

52:56

And they're pretty harsh on it. So yeah, so there are definitely bugs. And sorry about that. We try to fix them when we have time.

S3

Speaker 3

53:05

Yeah, but it definitely takes its time. But

S2

Speaker 2

53:07

it's, I think, also something that all of us really like doing. And I mean, I start looking at Wikipedia articles about quantum computing and then I spend 3 hours clicking on articles and articles and articles. And then I've found like 5 papers to annotate.

S2

Speaker 2

53:24

And I've produced like 10 or 15 tweets. So it's something that we really enjoy doing.

S1

Speaker 1

53:30

Yeah.

S2

Speaker 2

53:31

And so it's it's you know, I

S1

Speaker 1

53:33

think that that's the real genius of it. Right. It's like basically figuring out a way to turn your, I mean, if you have the desire, turn your what would be your hobby anyway

S2

Speaker 2

53:42

into this little exercise project.

S3

Speaker 3

53:43

And having a forcing function, Because this type of thing is really easy to let go, right? Because sometimes, yes, sometimes you might not feel like understanding a paper to the point where you could annotate it. It takes a while to get a good grip, especially if it's not an area that you're super familiar with.

S1

Speaker 1

54:04

And

S3

Speaker 3

54:04

so it's not, yeah, that's definitely not the type of effort that you do on a Saturday night, right? Unless you have a forcing function where you know that within a couple of weeks you're going to be putting this out to a lot of people.

S1

Speaker 1

54:19

That's my favorite part of the podcast. Like with the software stuff, it's pretty easy for me to just like it could be anyone in the room and we can do a podcast. But when we do physics ones or any or math or something, I'm just like, oh, I have to take a couple of days just reading.

S1

Speaker 1

54:33

I'm not, obviously I couldn't even become an expert if I dedicated a week to it, but I want to be conversant to a certain extent and that part's fun.

S3

Speaker 3

54:41

Yeah, same with us. You definitely feel the pressure when you're writing these annotations. Yeah.

S3

Speaker 3

54:47

Because people will call you out on it. And they'll be like, OK, this is wrong. Or you missed this. And so when you're writing it, you want to be really careful, make sure that what you're saying is correct.

S3

Speaker 3

54:58

And you know that you might have somebody that actually, a college kid or whoever, that is reading through that paper and then is going to use your annotation to help him understand. So you have the responsibility, we feel that responsibility towards those people to do a good job at it. And when we put an annotation, we want to stand by it, and we want it to be of quality.

S2

Speaker 2

55:26

And it's funny. It's like the more you annotate a paper, this is like a circle. And the more you annotate a paper, the more people there are that are at the edge of starting to understand what the paper is about.

S2

Speaker 2

55:39

So you start getting more and more questions, because the circle expands. And then you just have more people that are starting to understand this topic about number theory or physics or whatever. So you get more and more questions about the paper. And then when do you stop explaining a certain concept?

S2

Speaker 2

55:55

So it's like you want to annotate a paper about number theory. OK, do you have to explain what a prime number is, for instance? Or do you have to explain what a rational number is?

S2

Speaker 2

56:06

So it's really interesting once you start thinking about that, like how deep do you go.

S1

Speaker 1

56:14

And you've got to be careful about those YouTube videos then. Because if you get discovered on YouTube as an explainer series, good luck.

S3

Speaker 3

56:21

People will start asking. Yeah, yeah, yeah. No, no.

S3

Speaker 3

56:24

We've done a few of those, but yeah.

S2

Speaker 2

56:26

We've annotated a paper that was, I think, a proof of the irrationality of the square root of 2. And then there was this, I think it was a 14-year-old kid from Russia, who because of that paper came up with an alternative proof for that. And he sent us that proof.

S2

Speaker 2

56:50

And I read the proof, and it was apparently, you know. It was legit? Yeah. And I told him to submit that to a journal, a math journal.

S2

Speaker 2

56:59

And I think he did it. I haven't heard back from him, but we should reach out to him to see if he actually was able to publish it. So it's also nice to see how we can inspire people sometimes to do these types of things. And I also think, especially with Twitter, one of the things that we learned is that learning something, learning a concept or learning a fact, is really, really addictive.

S2

Speaker 2

57:25

And we see that on Twitter almost every day. People come back and we have hundreds of thousands of users that read our tweets. And I think that's why people really like when they have a good teacher and when they can go to a class and really learn something. I think the problem is that usually that requires a lot of effort from people.

S2

Speaker 2

57:51

You either have to go to a class or you have to read a book to learn something. And I think what we're able to do with our Twitter account was to provide that same feeling, acquiring a quantum of knowledge, but at the cost of reading a tweet, which is really easy for the reader. Sometimes it's really hard to make those tweets. It requires a lot of reading and thinking, how can you explain something with just these characters and an image maybe.

S2

Speaker 2

58:19

But once you get to that, and once you're able to teach someone a fact or something, people really like that. And I think it's something that there should be more people exploring that on Twitter. It's a very particular medium.

S3

Speaker 3

58:37

But there's a lot of people that are attracted by that. You might not, yeah, you might not. A few years ago, I would have been very surprised, but now you have all of these scientific, be it explainers, but you have people that have millions of followers and what they're following for is for scientific content or they just want to learn.

S3

Speaker 3

58:59

And so that's something very uplifting that we've learned, that there's a lot of people out there that want to learn. I think

S1

Speaker 1

59:06

it's too easy to get down on those people. They're just like, oh, you know, this is like base, it's fun facts or whatever.

S3

Speaker 3

59:11

But like

S1

Speaker 1

59:12

at the end of the day, like that's good. People are excited to learn. They want to learn.

S1

Speaker 1

59:17

And then you extrapolate it out a little bit more and you look at someone like Dan Carlin doing the Hardcore History podcast. I think if you had objectively written that down, like, all right, I'm going to produce 25 hours of content about the Khans and people are going to be into it, I would have told you no fucking way. And then you look at it and it's like millions and millions and millions of downloads.

S3

Speaker 3

59:39

Yeah. That's pretty cool. There's some things you look at and it really catches you by surprise. I mean this is parallel but it's like Wikipedia for instance.

S3

Speaker 3

59:49

If somebody had pitched Wikipedia to me before Wikipedia existed,

S1

Speaker 1

59:54

I

S3

Speaker 3

59:54

would have never guessed that would be possible. Yeah. Because how are you going to do this?