Fermat's Library is a platform for annotating papers: users add annotations, which support LaTeX and Markdown, alongside PDFs. The founders, who have technical backgrounds, started Fermat's Library to bring their offline journal club online, after seeing how much explanatory material readers produce while working through dense scientific papers. The platform aims to promote open science by encouraging the sharing of data and code and publishing in non-paywalled journals. The founders also hope to improve the discoverability and recommendation of relevant papers. The team has built a browser extension, Librarian, for the preprint repository arXiv, which lets users view and add comments directly on papers. The extension is particularly useful in areas like machine learning, where papers appear on arXiv at a rapid rate.

- The Librarian extension matters in machine learning because readers need a way to tell good work from bad in arXiv papers that haven't been peer reviewed
- The team is considering a rating system for papers in Librarian, potentially through likes/dislikes or by rating different aspects of the paper
- The focus right now is on solving problems that users have reported, such as reference extraction
- Collaboration around science could include forking papers or data, but it is not currently a focus
- Negative results in academic publishing are not incentivized, but there are researchers studying this field
- P-value hacking is a recognized problem, and the team has thought about flagging possible p-value hacking on papers through an API or the Chrome extension
- There is a need for more publication of negative results to aid progress in research
- There are researchers, such as John at Stanford, who have analyzed the lack of negative result publication in economics
- The interviewee reads a variety of papers, often through a random walk or based on specific interests such as sleep research.

- Sleeping less than 7 hours or more than 8 hours is associated with higher mortality.
- Sleep quality appears to matter less than sleep duration, at least for mortality.
- The hot hand phenomenon in basketball, the belief that a player is more likely to make a shot after a successful one, was not supported by free-throw data.
- Simpson's paradox shows how different conclusions can be drawn from the same data depending on how it is split.
- Papers in the past were shorter and simpler, but modern papers have become longer and more detailed.
- The reasons for longer papers may include the development of more complex questions and the need for reproducibility.
- There is no trend towards shorter papers in current scientific writing.

- Sometimes the most impactful scientific papers are not the ones that make the initial discovery.
- Clear and accessible writing is important in scientific papers to help spread knowledge and make it understandable to a larger audience.
- Metrics for measuring the impact of a paper, such as citations, may not always align with making the paper accessible.
- The balance between illuminating papers and capturing attention on a content platform is challenging.
- Different fields may require different metrics to measure the impact of a paper.
- The process of peer review could be improved to better adapt to the current scientific landscape.
- There is potential for building a tool for annotating books to share highlights and annotations with others.

The researcher discusses the idea of annotating and open-sourcing books, particularly introductory science books, to improve explanations and capture lost knowledge. The challenges of copyright and the need for curation are mentioned. The importance of easily accessible annotations and the limitations of current methods, such as Kindle highlights, are highlighted. The benefits of annotations in understanding complex texts and historical context are emphasized. The researcher also mentions the value of annotations from readers rather than the authors themselves.

- The speaker recalls annotating the Ethereum white paper and discussing it with Vitalik, the author.
- Authors often struggle to anticipate where readers will have difficulties understanding their work.
- The speaker and their team enjoy researching and presenting papers to each other, which led to the creation of Fermat's Library.
- Building Fermat's Library is time-consuming, but the team finds it enjoyable and fulfilling.
- The team feels a responsibility to provide accurate and high-quality annotations for readers.
- The process of annotation leads to more questions and a deeper understanding of the paper.
- The team's Twitter account provides a way for people to learn and acquire knowledge in a convenient manner.
- Many people are interested in learning and there is a demand for scientific content on platforms like Twitter.
- It is important to appreciate and encourage people's enthusiasm for learning.

- The speaker discusses how projects like Wikipedia and Stack Overflow rely on people's goodwill to contribute and help others.
- They believe that humans have untapped reservoirs of goodwill that can be leveraged more effectively.
- The speaker mentions the linear growth of projects like arXiv, emphasizing that some things take time to grow and make an impact.
- They mention that profitability and sustainability were not initially a concern for their project, as it was a side project with limited resources.
- The speaker discusses the possibility of building platforms like theirs for profit or non-profit.
- They mention the trend of open journals being built on top of arXiv and the potential for these journals to gain reputation and challenge the paywall system.
- The speaker suggests that convincing renowned scientists to publish in open journals could help shift the equilibrium and make them as popular as traditional journals.

- The mission is to convince famous mathematicians and others to publish on open journals.
- Open journals are important for young researchers trying to get positions in competitive fields.
- Big names endorsing open journals will increase their reputation.
- Access to research is a problem outside of prestigious institutions like MIT.
- Well-funded universities in Portugal still struggle to afford journal subscriptions.
- Contributions can be made by annotating papers on Fermat's Library, spreading the word, and uploading papers.
- Donations are accepted, with most costs going towards server expenses.

João Batalha - https://twitter.com/joao_batalha - and Luís Batalha - https://twitter.com/luismbat - are cofounders of Fermat’s Library - https://fermatslibrary.com/

Fermat’s Library is a platform for annotating papers. Each week they send out a paper annotated by their community. Some recent papers were Birds and Frogs by Freeman Dyson - https://fermatslibrary.com/s/birds-and-frogs - and Von Neumann's First Computer Program by Donald Knuth - https://fermatslibrary.com/s/von-neumanns-first-computer-program

They’ve also built a Chrome extension called Librarian - https://fermatslibrary.com/librarian - for the arXiv, which allows you to get direct links to references, do BibTeX extraction, and make comments on papers.

You can find them at https://fermatslibrary.com/

Read the transcript at https://blog.ycombinator.com/fermats-library-annotating-academic-papers-every-week

Okay, and what made you want to start Fermat's Library?

So, just for the people that don't know what it is, Fermat's is a platform for annotating papers.

And so if you want to think about it, you imagine a PDF viewer in your browser, and then you have annotations on the side that support LaTeX and Markdown.

And so you can add annotations in parts of papers that you think are particularly tough to understand or you think you could add more content there.

The four of us that started Fermat's, we all have a technical background.

And so after college, we kept on reading papers.

And every once in a while, we had this internal journal club where we would read a paper and present it to the others.

So I remember, for instance, a few years back, presenting the Bitcoin paper to Luís and Mika, who don't have a CS background.

And so you kind of have to go into, for instance, for the Bitcoin, you might have to go into, OK, what's a hash function?

And we knew that you also have this behavior offline in places like universities.

And so we wanted to take that experience and bring it online.

And we thought there was a lot of content that you end up producing while you're trying to read a paper, which can be the most dense piece of content that a human can read sometimes, right?

The language can be incredibly spartan, and sometimes there's a step in some paper that they say, oh, this should be obvious, but then you look at it, and it's like, okay, I don't get it.

And so we knew that there was a lot of content there that you end up producing while trying to understand a paper and we wanted to bring that online.

I studied physics together with Mika, and João and Taimo went to MIT.

Taimo studied economics and you studied CS.

So a lot of the papers are around physics, math, economics, biology, CS, right?

Yeah, because that was, you kind of like solved the cold start by just annotating yourself.

And now it's more about getting the author in there.

Our first paper was the Bitcoin paper.

It has been there for the longest, and a bunch of news sites have pointed back to it.

It's like, okay, if you wanna read it, go to the annotated version.

But we had a few cool people comment there.

Lawrence Lessig commented on the Bitcoin paper.

A bunch of people from the Bitcoin community. Exactly.

But the larger goal with Fermat is to try to move things in the right direction, meaning move science towards what people call open science.

And so that encompasses a number of things from open data, which means just sharing the data that you've used for publishing whatever research you might be publishing.

And you want to share that and make that easily accessible to people so that if they want to replicate the results that you got or use it in their own research, they have an easy time doing that.

You also have just publishing the code that you used or the algorithms that you've used and making those more easily available to people.

There's also open publishing, which means just publishing in journals that are not behind paywalls.

So there's a lot of things that are within open science, all of those.

And then there's also, so we want to push things in that direction and also try to build a platform that makes it easier for people to collaborate.

And we think there are a lot of things that could be happening nowadays where people could be collaborating, scientists could be collaborating remotely a lot more than they are.

But it's starting to change.

We're seeing more and more people collaborating online around papers.

So for instance, there is this famous example around a problem called the Erdős discrepancy problem.

And this is a famous problem that was posed about 80 years ago by Paul Erdős, the famous mathematician.

And Terence Tao, the Fields Medalist, was trying to solve the problem.

And he put it on his blog that he was trying a certain approach to solve the problem.

And then there was this guy from Germany that just wrote a comment there, like the size of a tweet.

And he said that the Erdős problem had a Sudoku-like flavor, and that some of the machinery that they were using to solve the Sudoku problem could be used there.

And that was actually the key to crack the problem.

And they ended up publishing a solution to the Erdős discrepancy problem, which was probably one of the biggest milestones in number theory in 2016.

And that was all thanks to a comment on his blog and to the fact that they were collaborating online around solving that problem, which was also a polymath problem.

The Polymath Project was a project started by another Fields Medalist, Tim Gowers.

It was actually a social experiment to see if it was possible to solve math problems by collaborating online.

And yeah, and they were able to solve it, thanks to that comment.

Because you kind of see, right, you look at GitHub, and then you think of the impact that GitHub has had for open source.

Open source, of course, existed much before GitHub.

But it has really allowed a lot more people to come in and be able to get into open source and start contributing.

And there are a number of other really interesting platforms.

You have Wikipedia just for more general knowledge or you have Stack Overflow for just programmers helping each other.

And we think that there could be something similar to that.

Well, did you listen to the Rogan episode with Peter Attia?

And he talks about, I don't know if they're talking about arXiv in particular, around publishing papers, but he talks about having full-time staff just scrubbing the data, looking for interesting information coming out.

And again, like in the context of Stack Overflow.

That's the place where like programmers find specific answers to problems.

Whereas with arXiv, like, good luck.

And so have you guys thought about addressing like just discoverability in the context of particular fields?

Like, for instance, paper recommendations, it's really hard to.

Because you're just doing one a week right now.

And we also have our tool that is used internally at universities and research groups by people who are reading papers together and adding annotations.

So we release a paper every week that we select and we annotate it, or somebody in the community annotates it.

And then we have the arXiv extension that adds a bunch of features on top of arXiv, like BibTeX extraction, reference extraction, and comments.

And eventually, definitely, like a recommendation engine and making it easier to discover papers that are relevant to you, that's something we definitely want to add onto our arXiv extension.

Yeah, initially we started Fermat's, as João said, as a journal club.

And then we saw that people liked the interface, the commenting interface, and liked reading the annotations.

So now we are starting to expand and turn Fermat's into more of a platform.

And that's why we decided to do the arXiv Chrome extension.

Because arXiv, for people that don't know what it is, it's basically a place where papers live before they go to journals, in the form of preprints.

So they are like drafts before they go to journals.

And what we did is we built a Chrome extension that basically allows people to see the whole commenting interface on arXiv papers.

And so you don't have to go to another website.

You're just reading arXiv papers, and you see the comments on the side if you have the Chrome extension installed.

Well, and a lot of these papers don't even have comments on the page.

Like, best case, you're emailing the author?

So what arXiv does, it's basically they just host papers.

That's the core functionality of arXiv.

And so one of the things that we noticed is that, especially for areas like machine learning and deep learning, arXiv is super important.

Because the new papers are coming out at such a high rate that people don't wait for the papers to go to journals before they start working on top of them and using the stuff that other people discover.

So all the papers are published on arXiv.

And so you need a way to distinguish good quality work from bad work if you are reading a paper on arXiv that hasn't been peer reviewed, say something about machine learning.

And I think that's why the Librarian Extension is so important in fields such as machine learning.

So does the Librarian Extension have a rating mechanism as well?

Like, how do you distinguish good from bad work?

Right now, it's only through the comments.

But we are actually thinking about implementing some sort of rating system for papers.

And we've been thinking about that for a while now.

We're probably going to run a few surveys with our audience, because you could do it in a number of ways. For rating a paper, obviously there's likes or dislikes, or upvotes and downvotes.

So you could either just have a holistic rating for the whole paper.

You could also imagine rating it on a number of different aspects of the paper.

It could be about, OK, how big is their data set if they're using some data set?

Or what do you think about their methods?

So you could have a more complex rating system.

And so we've been thinking about that a lot.

And we're just trying to figure out what makes the most sense there.

But that's also definitely something we would love to add to the Chrome extension for arXiv.

Yeah, so how do you think the collaboration plays out then?

Because I understand how, you know, say for instance, you know, you're a physicist, you start commenting on someone else's paper, you start a discussion that creates a new project, right?

Do you think you'll go further than there?

Like, are we talking about like forking and that kind of stuff?

Yeah, I think there's a lot of things that you could do once you have a platform that has more people in it and they're doing more stuff in it.

And so that's why the way we've been growing Fermat is with a goal far in the future where we are a much broader platform.

But right now we're focused mostly on solving problems that people have nowadays.

And actually, we were largely inspired for our arXiv extension by the survey that the arXiv guys did. I don't know how many people, but they surveyed the people that use arXiv and then published a paper where they describe the problems that those people reported while using arXiv and the features that they most wanted to see.

And then the arXiv folks just said, hey, we're just going to be the platform to build upon.

And we're not going to do all of these things that people would like us to do.

If there's anybody else that wants to work on this, here are the results of the survey.

And since then, they've actually done a pretty great job of building an API and wanting to become more of a platform.

And so there's a lot of ways that we envision that you could have collaboration around science.

And so, yeah, like forking a paper or forking some type of research.

There's a lot of things that you could do there.

It's not something that we're focused on right now.

Right now, we're just trying to solve these problems that people have pointed out and create a place where people can just post comments and discuss around a paper.

An example of the problems that people mentioned was like, for instance, reference extraction.

So if you go to a PDF, you have at the bottom of the paper, you have the references that they used.

And most of the time, when people want to search the references, they have to copy the text in the PDF, put it on Google, and try to find the link to the paper.

And one of the things that we did with our Chrome extension is we fixed that.

They just click on a button in the Chrome extension, and then they see a list of references with links to the papers.

So that was one of the features that was most requested by the arXiv users.
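
The manual workflow described above (copy a reference, paste it into a search engine) can be sketched in a few lines. This is only an illustration and not Librarian's actual code: the `split_references` and `search_link` helpers, the `[n]` numbering convention, the Google search URL, and the sample reference text are all assumptions made up for this example. Extracting references from a real PDF is considerably harder.

```python
import re
from urllib.parse import quote_plus

def split_references(text):
    """Split a reference section into entries that start with [1], [2], ...

    Assumes numbered references; other citation styles would need other rules.
    """
    parts = re.split(r"\[\d+\]", text)
    # Collapse line breaks and extra spaces inside each entry.
    return [" ".join(part.split()) for part in parts if part.strip()]

def search_link(reference):
    """Build a web-search URL for one reference string."""
    return "https://www.google.com/search?q=" + quote_plus(reference)

# Placeholder reference list, not taken from any real paper.
sample = """
[1] A. Author and B. Author. An example title. Journal of
    Placeholders, 2001.
[2] C. Author. Another example title. 2005.
"""

refs = split_references(sample)
links = [search_link(ref) for ref in refs]
for ref, link in zip(refs, links):
    print(ref)
    print("  ", link)
```

A real tool would more likely resolve each entry to the paper itself through a metadata service rather than a general web search, but the shape of the problem is the same: find where each reference starts, clean it up, and turn it into a link.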

And our idea was, initially, we wanted really to convince people to install the Chrome extension.

And so let's solve the hair on fire problems that they are describing here.

And then once we have people using the Chrome extension, we can expand into open collaboration around papers since they're already there.

Do you guys know of anyone working on publishing negative results?

This is something that I've been fascinated with.

And basically the problem is that as an academic, you're not incentivized to publish negative results because you want to publish things that have high impact so you can get a job or a tenure position or just get people to even care about your work, right?

I know of researchers that are studying that field a lot, but unfortunately, for some of these things, that's a very large problem, and people are becoming more aware of it.

And with that, you have negative results.

You also have people doing a lot of research into p-value hacking.

Yeah, so p-value, it's essentially a standard that people use in order to know if the results that you have obtained out of some experiment that you've run are worthy of being published.

And that has worked fine for the most part until now, or, I mean, that's arguable. But people are looking into it and thinking, OK, should we do things differently?

And should we be much stricter with what's considered the gold standard for publishing?
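
To make the p-value discussion concrete, here is a minimal sketch. It is not anything Fermat's Library has built: the coin-flip setup, the function names, and the 0.05 threshold are assumptions chosen for illustration. The first function computes an exact two-sided p-value for a fair coin. The second shows why trying many analyses until one crosses the threshold is a problem: with enough independent tries, a "significant" result becomes likely even when there is no real effect.

```python
from math import comb

def binomial_p_value(heads, flips):
    """Two-sided exact p-value under a fair coin.

    Probability of seeing a head count at least as far from flips / 2
    as the observed one, if the coin really is fair.
    """
    distance = abs(heads - flips / 2)
    extreme = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= distance
    )
    return extreme / 2 ** flips

def chance_of_false_positive(tests, alpha=0.05):
    """Chance that at least one of `tests` independent tests of a true
    null hypothesis falls below alpha purely by luck."""
    return 1 - (1 - alpha) ** tests

print(binomial_p_value(60, 100))     # 60 heads in 100 flips
print(chance_of_false_positive(1))   # a single pre-planned test
print(chance_of_false_positive(20))  # twenty tries at the same data
```

Sixty heads in a hundred flips gives a p-value of about 0.057, just above the usual 0.05 cutoff. With twenty independent tries, the chance of at least one spurious result below 0.05 is about 64%, which is why a single reported p-value says little unless you know how many analyses were attempted.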

And we've thought of doing things there with Fermat, so that if you're looking at a paper, to have an idea, OK, how relevant is this paper?

This is more specific for certain areas, like if you're talking about medicine or biology, where that is really important, like the statistical significance of the results that you're presenting.

And so we've thought of doing something with Fermat there, either via some API where you could send us the DOI of a paper and we would send you some information regarding the p-value or something, or with a Chrome extension where you'd see that information displayed very prominently saying, hey, there might be some p-value hacking here or this is very solid research.

Because there is a very big problem and people are realizing how prevalent it is, especially in things like economics and biology.

I mean it came about, I was just talking to a friend who's doing a PhD at Cambridge in bio.

Yeah, and only by attending a conference in the States did he realize that there was someone in Australia working on the exact same problem as him concurrently, and they're failing at the same types of experiments. But because they don't publish them, no one knows the results, no one knows the methods, and essentially these, you know, traveling salesman type problems that people are so excited about quantum for, like trying all these permutations, are happening at a smaller scale, but no one's publishing anything.

Yeah, and part of it is just the way research is done and you come into it and you're trying to find some correlation usually.

You will be trying to find some trend in the data and you are going to usually have that bias.

You're trying to find some correlation and publish that.

And so, yeah, you might need to change things dramatically in order to get people to start publishing negative results, which could be incredibly useful for other researchers.

But there are a bunch of people working on that.

But he actually just went on this podcast, EconTalk.

And actually, Taimur has been talking to that professor.

And yeah, and he has analyzed more this subject, but more relating to economics, I believe.

But yeah, he's found a lot of the things that we're talking about here, they're prevalent also in economics.

Tanner Goblinstein asks, what are the most interesting papers you've read in the past couple of years that are not widely known?

We end up, like I end up reading all sorts of papers from different areas.

Or sometimes you'll think, for instance, a few months ago I got like a Fitbit to track like my sleep.

And so I wanted to read papers about sleep.

And so that just got me into a random walk around, like research around sleep.

And then I found a bunch of interesting things.

I ended up annotating a paper about a big study in Finland on the association between sleep and mortality.

There are a bunch of really interesting things that I learned from there.

For instance, if you sleep less than 7 hours, that's associated with higher mortality.

But if you sleep more than 8 hours, that is also associated with higher mortality.

So have you changed your life based on that?

Well, no, I was usually more on the end of not sleeping enough.

But there was also another thing from that research that apparently sleep quality doesn't matter as much, at least for mortality, which is kind of counterintuitive.

But it seems that sleep quality is very closely related to the amount of sleep that you're getting.

So like 7 hours of like okay sleep versus 7 hours of great sleep that's kind of hard to distinguish.

So you could like sleep on an airplane your whole life and live as long?

Maybe your life will be a little bit more miserable.

So it's hard sometimes to pick the favorites, but there is one, for instance, that is also kind of random, but it's a paper published in the 90s about Simpson's paradox and the hot hand phenomenon in basketball.

So the hot hand phenomenon in basketball is you think that, OK, because they just made a field goal, the next one they have a higher chance of making it.

And so there's this researcher that in the 90s looked at a data set from the Celtics to see if, for free throws, that was true.

And so before, they had asked students at Stanford and Cornell, like 100 students, if they thought that, OK, if they just made the first free throw, for the second one, did they have a higher chance of making it or not?

And there was something like 68 of the 100 students that were asked that agreed.

And these are people from Stanford and Cornell.

And so what they found back in the 90s, what they found was that actually that seemed not to be the case, right?

That on your second free throw, you're not more likely to make it if you made the first one.

But what they found is that you're just more likely to make it on your second one.

And so this was done in the 90s with, like, I don't know how many free throws, but maybe like 5,000.

They looked at some data from the Celtics.

Then I went and got a data set from Kaggle with like 600,000 free throws.

And I reran the same algorithms that they ran for the study in the 90s and then looked at what the results were.

And yeah, and so the pattern is pretty clear that just on their second free throw, they're just significantly better at it, regardless of their first one.

And yeah, it doesn't matter if they made their first one or if they missed.
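
The comparison described above can be sketched as follows. This is neither the original study's code nor the Kaggle data: the shot pairs are simulated, and the 70% and 78% make rates are made-up values chosen so that the second shot is better on average but independent of the first. The `free_throw_rates` helper is likewise hypothetical.

```python
import random

def free_throw_rates(pairs):
    """Summarize two-shot trips given as (first_made, second_made) booleans."""
    n = len(pairs)
    after_make = [second for first, second in pairs if first]
    after_miss = [second for first, second in pairs if not first]
    return {
        "first": sum(first for first, _ in pairs) / n,
        "second": sum(second for _, second in pairs) / n,
        "second_after_make": sum(after_make) / len(after_make),
        "second_after_miss": sum(after_miss) / len(after_miss),
    }

# Simulated trips: the second shot ignores the outcome of the first.
rng = random.Random(0)
pairs = [(rng.random() < 0.70, rng.random() < 0.78) for _ in range(50_000)]

rates = free_throw_rates(pairs)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

A hot hand would show up as `second_after_make` being clearly higher than `second_after_miss`. In data like this, the two are nearly equal while `second` is higher than `first`, which is the pattern the speaker describes.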

Yeah, and then that paper kind of tried to explain why people think that there is a hot hand phenomenon.

And that is related to Simpson's paradox, which, for people that don't know what Simpson's paradox is, really kind of changed my world view a little bit once I learned more about it.

But basically, what it says is that you can get two valid conclusions out of the same data, depending on how you split it.

So an example is, for instance, that between 2000 and 2013, the median wage for high school dropouts in the US dropped.

For high school graduates, it also dropped.

For people with an undergrad degree, it dropped.

And for people with a graduate degree or higher, it also dropped.

So across the board, for all of those segments, the median wage dropped.

And so you look at it, and it's like, OK, what's going on here?
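
One way a pattern like this can arise is a shift in group sizes. The sketch below uses invented shares and wages, not the real US figures, and uses averages rather than medians to keep the arithmetic simple. Every group's wage falls, yet the overall figure rises because more workers have moved into the higher-paid groups.

```python
# (share of workers in %, average wage) per group; all numbers are invented.
year_2000 = {"dropout": (20, 30_000), "high school": (40, 40_000),
             "undergrad": (30, 60_000), "graduate": (10, 80_000)}
year_2013 = {"dropout": (10, 29_000), "high school": (30, 39_000),
             "undergrad": (40, 58_000), "graduate": (20, 78_000)}

def overall_average(groups):
    """Share-weighted average wage across all groups (shares sum to 100)."""
    return sum(share * wage for share, wage in groups.values()) // 100

for name in year_2000:
    before, after = year_2000[name][1], year_2013[name][1]
    print(f"{name}: {before} -> {after}")  # every group falls

print("overall:", overall_average(year_2000), "->", overall_average(year_2013))
# overall: 48000 -> 53400
```

Both readings of the data are valid: wages fell within each education level, and wages rose overall. Which one matters depends on the question being asked.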


Fermat's Library Cofounders João Batalha and Luís Batalha