How Scientific Incentives Stalled the Fight Against Antibiotic Resistance, and How We Can Fix It
Peptide-DB: A Million-Peptide Database to Accelerate Science
MAXWELL TABARROK
DEC 13, 2024
Back in July, Macroscience announced an open RFP for short papers on “negative metascience”, diagnosing places where the infrastructure for science has broken down, and how we might do better.
We’re publishing the first of these — from Maxwell Tabarrok on peptides and antibiotic resistance — today. Enjoy!
Introduction
For all of human history until the past 100 years, infectious diseases have been our deadliest foe. Even during the roaring 1920s, nearly one in a hundred Americans would die of an infectious disease every year. To put that into context, the US infectious disease death rate was 10x lower during the height of the COVID-19 pandemic in 2021. The glorious relief we enjoy from the ancient specter of deadly disease is due in large part to the development of antibiotic treatments like penicillin.
But this relief may soon be coming to an end. If nothing is done, antibiotic resistance promises a return to the historical norm of frequent death from infectious disease. As humans use more antibiotics, we are inadvertently running the world's largest selective breeding program for bacteria which can survive our onslaught of drugs. Already by the late 1960s, 80% of cases of Staphylococcus aureus, a common and notorious bacterial pathogen, were resistant to penicillin. Since then, we have discovered many more powerful antibiotic drugs, but our use of the drugs is growing rapidly, while our discovery rate is stagnating at best.
As a result, antibiotic resistance is spreading. Today, certain forms of Staphylococcus aureus, like MRSA, are resistant to even our most powerful antibiotics, and these infections cause around 20,000 deaths every year in the US.
The most promising solution to antibiotic resistance comes from dragon blood.
Komodo dragons, native to a few small islands in Indonesia, are the world’s largest lizards. They eat carrion and live in swamps, and their saliva hosts many of the world’s most stubborn and infectious bacteria. But Komodos almost never get infected. Even when they have open wounds, Komodo dragons can trudge happily along through rotting corpses and mud without a worry.
Their resilience is due to an arsenal of chemicals in their blood called antimicrobial peptides. These peptides are short sequences of amino acids, the building blocks of proteins. These chemical chains glom onto negatively charged bacteria (but not neutrally charged animal cells) and force open holes in the membrane, killing the infectious bacterium. Humans have peptides too, and we use them for everything from regulating blood sugar with insulin to fighting infections.
Peptides are especially promising candidates against antibiotic-resistant pathogens for two reasons. One is that they are easy to program and synthesize. Their properties and structure are the result of chaining amino acids together in a line, so it's easy to work with them computationally and apply machine learning and bioinformatics. The second reason is that peptides are resistant to resistance. Researchers can use them to target much more fundamental properties of bacteria, whereas antibiotics target particular molecular pathways that are often closed off by a single, small mutation. For example, bacterial membranes are almost universally negatively charged; this is a feature of bacterial physiology that is not easily mutated away. Peptides which use this negative charge to seek out and destroy invading bacteria are therefore difficult to evade, even after generations of intensive selection pressure from being targeted.
Even though peptides are short, usually less than 50 amino acids, the combinatorial space of peptide sequences is vast. It's difficult to search through this space for peptides that are effective against the resistant superbugs which threaten to return us to the medieval world of deadly infections. However, searching for these peptides is a well-defined problem with easy-to-measure inputs and outputs. The fundamental research problem is perfectly poised to benefit from rapid advances in computation. The cutting edge of research in this field involves building machine learning models to predict which sequences of amino acids will be bio-active against certain pathogens, similar to DeepMind's AlphaFold, then developing those peptides and testing the model's predictions.
But progress in this field is slower than we need it to be to meet the challenge of antibiotic resistance. This isn't just due to inherent difficulties in the science, though of course those do exist. Progress towards antimicrobial peptides is slowed by scattered, poorly maintained, and small datasets of peptide sequences paired with experimentally verified properties. Machine learning thrives on big data, but the largest database of peptides only has a few thousand experimentally validated sequences and only tracks three or four chemical properties, like antimicrobial activity and host toxicity, and even these properties are often measured in ways that are difficult to compare across sources.
Most importantly, there is almost zero negative data in these sources. Scientists test hundreds or thousands of peptides to find one which is active against some pathogen, and then they publish a paper about the one which succeeded. That success might go into the database, but all of the preceding failures are kept in the file drawer, even though they are, at current margins, far more valuable for machine learning models than one more success data point.
Making a better dataset is feasible and desirable, but no actor in science today has the incentives to do it. Open datasets are a public good, so private research organizations will tend to underinvest in them. The non-pecuniary rewards in academia, like publications and prestige, are pointed towards splashy results in big journals, not toward foundational infrastructure like a dataset.
This problem is solvable with an investment in public data production. A massive, standardized, and detailed dataset of one million peptide sequences and their antimicrobial properties (or lack thereof) would accelerate progress towards new drugs that can kill antibiotic-resistant pathogens. This would replicate the success of projects like the Protein Structure Initiative and the Human Genome Project and put us on track to defeat drug-resistant diseases before they roll back the clock on the medical progress of the past century.
What Are Peptides, and How Do They Work?
Proteins are the machinery of biology: they constitute the motors, factories, and control surfaces of cellular life. Some proteins are incredibly complex, like motor proteins made of thousands of amino acids.
Peptides are a particular kind of protein. They are short and simple without many moving parts. Instead of using intricate and specialized binding sites like larger proteins, peptides just use thousands of copies of themselves and preferential chemical attractions to perform various tasks in the human body, like regulating blood sugar or pain sensitivity.
Antimicrobial peptides are peptides whose specific purpose is killing pathogens that are invading the body. These are subjects of active research in microbiology. Our body employs lots of antimicrobial peptides naturally. Peptides like the defensins or LL-37 are most frequently found on our skin or in our mouths and noses as the first line of defense against all of the pathogens we come into contact with.
Much is still unknown about exactly how peptides work and how they find their targets, but antimicrobial peptides tend to have a positive charge and two different surfaces along their structure that either attract or repel water. This attracts them to pathogenic bacteria, which have negatively charged membranes. Then, the hydrophobic and hydrophilic surfaces of the peptide interact with the membrane to drill holes in it, and the cell collapses and dies. Lower concentrations of peptide may not kill the invading pathogens, but they will slow down the pathogens' metabolic processes, giving the rest of our immune system a head start.
Eukaryotic cells, including normal human cells, build their membranes from different lipids, which leaves them much closer to neutrally charged and not as vulnerable to the attacks that peptides make on cell membranes. Peptides can also distinguish gram-positive from gram-negative bacteria, preferentially attaching to cell envelopes that are thin and single-layered or thick and multilayered. This specificity is important because it can help preserve non-pathogenic, beneficial bacteria while still attacking invaders.
None of this targeting is perfect. Peptides are sent out millions at a time and, since their effect grows stronger as their concentration on a cell increases, small differences in chemical preference lead to big differences in activity. Some of our own cells will bump into these peptides by chance and potentially be affected, but hundreds of times more peptides will be reliably drawn to features like the negative charge and particular molecules on bacterial cell walls. This is similar to how traditional antibiotics work: there is some degree of targeting, but a heavy dose of antibiotics will still harm beneficial bacteria and human cells. That tradeoff is often worth it to fight off a deadly disease.
Peptides have two big advantages over antibiotics. The first advantage is resistance to resistance. Antibiotics often target very narrow biochemical pathways in a bacterium's metabolism or particular proteins found in the cytoplasm of pathogens, whereas peptides target general properties of a bacterium's entire membrane, like charge or lipid composition. This gives antibiotics a slight advantage in specificity, but it also makes them easier to resist. Changing one residue in a target protein is a lot easier than changing the electric charge over the entire bacterial surface. This general targeting has allowed antimicrobial peptides to remain effective first defenses against pathogens for millions of years without changing much.
The second advantage of peptides is that they are easy to synthesize and mass manufacture. Biology has done most of the heavy lifting for us here. Proteins are so versatile and fundamental to so many biological processes that nearly every cell has completely general purpose protein factories. We can take single-celled organisms that are simple and easy to grow, like yeast, insert the right DNA instructions, add sugar, and the yeast will start pumping out copies of the desired protein. There are dozens of companies that will synthesize custom proteins on demand for reasonable prices. By rapidly synthesizing and testing hundreds of different peptides, you can screen for effective and non-toxic treatments and scale them up in six or seven days. This is a stark contrast to small molecule antibiotic manufacturing, where figuring out how to synthesize a particular chemical can take years of trial and error, and making that synthesis efficient can take even longer.
The broad-spectrum chemical warfare and ease of mass manufacture of antimicrobial peptides make them a promising avenue for combating antibiotic-resistant pathogens. Their ability to disrupt fundamental properties of bacterial cells, rather than specific molecular pathways, suggests that peptide-based treatments could remain effective for longer than traditional antibiotics, and the ease of synthesis means that new treatments can be made in weeks instead of years when the need does arise.
The Frontier of Research
Peptides have verified effects on the toughest antibiotic-resistant infections including MRSA, on viral infections like HIV, on fungal infections, and even on cancer. But they still aren’t common on pharmacy shelves or in hospital treatment. Some current clinical trials will change this, but the main barrier is still in the fundamental research.
Peptides are chains in which each link is one of 20 amino acids, so the combinatorial space of possible peptides is incomprehensibly massive: even a 20-residue peptide has 20^20, or about 10^26, possible sequences. We have mapped a tiny fraction of this space. Only a few thousand peptides are registered in databases, and there are even fewer with all the important information on not only antimicrobial activity, but also specific targeting and host cell toxicity. Much of the research on peptides has started by indexing naturally occurring peptides, which takes advantage of evolution's exploration of this combinatorial space over billions of years, but that catalog is still nowhere close to comprehensive.
The frontier of research in this field uses machine learning to explore the vast space of possible peptides and filter it down to the most promising candidates, similar to DeepMind's AlphaFold, which used machine learning to improve the prediction of a protein's 3D structure from the sequence of amino acids that make it up. Machine learning models of peptides also make predictions from the amino acid sequence, but they target the medical properties of the peptides directly, rather than just their 3D structure. Prediction may also be more tractable for peptides than for full proteins because peptides are so much shorter.
Based on a database of a few thousand peptide sequences, researchers have used machine learning techniques to predict brand-new peptides that are active against MRSA, HIV, or cancer, often with greater activity than naturally occurring analogs. One way they did this is by splicing, shuffling, and combining existing sequences into new ones. Other approaches apply successive filters to the database and then combine the properties of the filtered sequences into a new peptide. Both of these approaches created peptides with high degrees of activity against multi-drug-resistant pathogens like Staphylococcus aureus.
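As a rough illustration of what sequence-based prediction looks like in practice, here is a minimal sketch in Python: peptides are featurized by their amino acid composition and a simple classifier is fit to activity labels. The sequences, labels, and feature choice below are toy placeholders, not the data or methods from the studies above; real models use richer sequence representations and far more data.

```python
# Minimal sketch of sequence-to-activity prediction. The sequences, labels,
# and features below are toy placeholders, not real measurements or the
# methods used in the studies described above.
from collections import Counter

import numpy as np
from sklearn.linear_model import LogisticRegression

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition_features(seq: str) -> np.ndarray:
    """Featurize a peptide as the fraction of each amino acid it contains."""
    counts = Counter(seq)
    return np.array([counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS])

# Hypothetical labels: 1 = antimicrobial activity observed, 0 = inactive.
peptides = [
    "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK",  # cationic, lysine-rich
    "GIGKFLHSAKKFGKAFVGEIMNS",
    "AAAAGGGGSSSSTTTT",                        # neutral, featureless
    "DDEEDDEESSGGTT",                          # negatively charged
]
labels = [1, 1, 0, 0]

X = np.stack([composition_features(p) for p in peptides])
y = np.array(labels)

# With a million-row dataset you would hold out test data and tune the model;
# with this toy data we just fit to show the shape of the pipeline.
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X)[:, 1])  # predicted probability of activity
```

Even this crude setup makes the point that follows: with only a handful of labeled sequences, and especially without negative examples, a model like this has almost nothing to generalize from.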
All of this research is very promising, but it's still moving slowly because of one main constraint: data.
The Problem
Machine learning needs data. DeepMind's AlphaGo trained on 30 million moves from human games and orders of magnitude more from games it played against itself. The largest language models are trained on at least 60 terabytes of text. AlphaFold was trained on just over 100,000 3D protein structures from the Protein Data Bank.
The data available for antimicrobial peptides is nowhere near these benchmarks. Some databases contain a few thousand peptides each, but they are scattered, unstandardized, incomplete, and often duplicative. Data on a few thousand peptide sequences and a scattershot view of their biological properties is simply not sufficient to get accurate machine learning predictions for a system as complex as protein-chemical reactions. For example, the APD3 database is small, with just under 4,000 sequences, but it is among the most tightly curated and detailed. However, most of its sequences come from frogs and other amphibians, due to path-dependent discovery of peptides in that taxon. Another database, CAMPR4, has on the order of 20,000 sequences, but around half are "predicted" or synthetic peptides that may lack experimental validation, and its entries carry less information about source and activity. The formatting of each of these sources is different, so it's not easy to put all the sequences into one model. More inconsistencies and idiosyncrasies stack up across the dozens of other datasets available.
There is even less negative training data; that is, data on all the amino acid sequences without interesting, publishable properties. In current machine learning research, labs will test dozens or even hundreds of peptide sequences for activity against certain pathogens, but they usually only publish and upload the sequences that worked. Training a model without this data makes it extremely difficult to avoid false positive predictions. Since most data currently available is "positive" (peptides that do have antimicrobial properties), negative data is especially valuable.
Expanding the dataset of peptides and including negative observations is feasible and desirable, but no one in science has the incentive to do it. Open datasets are a public good: anyone can costlessly copy-paste a dataset, so it is difficult and often socially wasteful to put it behind a paywall. Therefore, we can't rely on private pharmaceutical companies to invest sufficiently in this kind of open data infrastructure. Even if they did, they would fight hard to keep this data a trade secret. This would help firms recoup their investment, but it would prevent other firms and scientists from using the data, undercutting the reason it was so valuable in the first place.
Non-monetary rewards like publications and prestige point towards splashy results in big journals, not toward foundational infrastructure like an open dataset. Scientists are often altruistic, openly sharing datasets and tools they developed for their own use. In the field of antimicrobial peptides, researchers host open peptide databases and prediction tools, free for anyone to use. They are motivated by a genuine desire to see progress in the field, but genuine desire doesn't pay for all of the equipment and labor required to scale these databases up to ML-efficient size.
The most common funding mechanisms for researchers in this field reinforce the shortfall in data infrastructure investment. Project-based grants, like the NIH’s R01, are focused on specific research questions or outcomes. These grants usually have relatively short timelines (e.g., 3-5 years) and emphasize novel findings and publications as key metrics of success.
This emphasis on short-term, project-based grants stems from a desire for measurable outcomes, accountability, and novelty. University tenure committees and academics themselves heavily weigh high-impact publications and grant funding. Building infrastructure, while valuable to the scientific community, typically generates fewer publications, is often seen as less prestigious or less interesting, and has more spillover benefits that aren't credited. NIH program officers also want clear metrics of their impact, and their higher-ups need to convince Congress that they aren't wasting billions of dollars, which they do by holding funding decisions accountable to those metrics. Accountability is easier with smaller projects that have a shorter gap between investment and return. Mistakes are less damaging when the funding amounts are small and more of the responsibility for funding decisions lies outside of the NIH, in expert external review panels. Another important metric targeted by the NIH is novelty. The NIH's remit from Congress explicitly prizes novel research and results. Internal and external calls for the NIH to pursue more "high-risk, high-reward" research reinforce this preference for discrete projects with novel designs over expansions of already established scientific techniques.
The million-peptide database project is not a high-risk, high-reward experiment or a counterintuitive result that can turn into a highly cited paper or patent. Instead, it's a massive scale-up of established procedures for synthesizing and testing peptides that will be more expensive and time-consuming than a project-based grant and have a less legible connection to the metrics of success tracked by academics, the NIH, and Congress.
The Solution: A Million-Peptide Database
The data problem facing peptide research is solvable with targeted investments in data infrastructure. We can make a million-peptide database.
There are no significant scientific barriers to generating a 1,000x or 10,000x larger peptide dataset. Several high-throughput testing methods have been successfully demonstrated, with some screening as many as 800,000 peptide sequences and nearly doubling the number of unique antimicrobial peptides reported in publicly available databases. These methods will need to be scaled up, not only by testing more peptides, but also by testing them against different bacteria, checking for human toxicity, and measuring other chemical properties. But scaling is an infrastructure problem, not a scientific one.
This strategy of targeted data infrastructure investments has three successful precedents: PubChem, the Human Genome Project, and the Protein Data Bank (PDB).
The NIH's PubChem is a database of 118 million small-molecule chemical compounds that contains nearly 300 million biological tests of their activity, e.g. their toxicity or activity against bacteria. The project began in the early 2000s and was first released in 2004. More than the peptide database proposed here, PubChem is about aggregation and standardization rather than direct data creation: it combined existing databases and invited academics to add new molecules to the collection. This was still incredibly useful to the chemistry research community. With a budget of $3 million a year, PubChem exceeded the size of the leading private molecule database, from Advanced Chemistry Development, by around 10,000x and made the data free. PubChem is credited with supporting a renaissance in machine learning for chemistry.
Another success is the Human Genome Project. This 13-year effort began in the early 1990s and cost about $3.8 billion. Unlike PubChem, the Human Genome Project couldn’t rely on collating existing data, and had to industrialize DNA sequencing to get through the 3 billion base pairs of human DNA in time. Over the course of the project, the per-base cost of DNA sequencing plummeted by ~100,000-fold. By 2011, sequencing machines could read about 250 billion bases in a week, compared to 25,000 in 1990 and 5 million in 2000. Before the HGP, gene therapies were less than 1% of clinical trials; today they comprise more than 16%, all building off the data infrastructure foundation laid by the project.
Perhaps the closest analog to the million-peptide database proposal is the Protein Data Bank, a database of around 150,000 complex proteins and their 3D structures. This open database began as a project of the Department of Energy's Brookhaven National Laboratory in the early '70s and has evolved into an international scientific collaboration. The PDB is like PubChem in that it has become the primary repository for protein structure discoveries, but it is also like the Human Genome Project in that it was paired with a large data generation program: the Protein Structure Initiative (PSI). The Protein Structure Initiative was a $764 million project funded by the U.S. National Institute of General Medical Sciences between 2000 and 2015. The PSI developed high-throughput methods for protein structure determination and contributed thousands of unique protein structures to the database. By 2006, PSI centers were responsible for about two-thirds of worldwide structural genomics output. The hundreds of thousands of detailed 3D protein structures in the databank were the essential training data behind the success of AlphaFold.
These projects cut against the NIH’s structural incentives for smaller, shorter, investigator-led grants, but they still succeeded. PubChem was housed within the National Library of Medicine, which already had a mandate for data infrastructure, and received dedicated funding through the NIH Common Fund rather than competing with R01s. It also managed some of the drawbacks of data infrastructure projects in legibility and credit assignment by creating clear metrics of success around database usage, downloads, and a formal citation mechanism for database entries. Similarly, the Protein Structure Initiative was funded through the National Center for Research Resources, another NIH division with an explicit focus on research infrastructure.
The Human Genome Project overcame its barriers through a strong presidential endorsement and dedicated Congressional funding that bypassed normal NIH processes. It sustained this political momentum by developing clear technical milestones, like cost per base pair, that could be evaluated without relying on traditional academic metrics.
Here's how a scientific funder like the NIH can adapt the success of the Protein Data Bank, the Protein Structure Initiative, PubChem, and the Human Genome Project to create a million-peptide database:
Like PubChem, start by merging and standardizing existing peptide datasets, and open them to all. This alone would be a big help for machine learning in peptide research. A researcher today who wants to use all available peptide data in their model has to collect dozens of files, interpret poorly documented variables, and filter everything into a standardized format. Hundreds of researchers are currently duplicating all of this work for their projects. Thousands of hours of their time could be saved if the NIH or NSF paid to organize this data once and for all and opened the results to all interested researchers. Setting a Schelling point for all future data additions would also help keep the data standardized as the dataset grows.
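To make that aggregation step concrete, here is a minimal sketch of what the standardization work involves, assuming hypothetical export formats. The column names, units, and values below are illustrative only; real APD3 or CAMPR4 files differ and would need their own parsers and unit conversions.

```python
# Minimal sketch of merging heterogeneous peptide datasets into one schema.
# The raw "exports" below are hypothetical stand-ins, not real database dumps.
import pandas as pd

# Hypothetical raw exports with inconsistent column names and units.
apd3_like = pd.DataFrame({
    "Sequence": ["GIGKFLHSAKKFGKAFVGEIMNS"],
    "Target organism": ["E. coli"],
    "MIC (ug/mL)": [8.0],
})
campr4_like = pd.DataFrame({
    "seq": ["KWKLFKKIEK"],
    "organism": ["S. aureus"],
    "activity_um": [4.0],        # reported in micromolar, not ug/mL
    "validation": ["predicted"], # flag: not experimentally verified
})

def standardize(df: pd.DataFrame, colmap: dict, source: str) -> pd.DataFrame:
    """Rename columns to a shared schema and tag each row with its source."""
    out = df.rename(columns=colmap)
    out["source"] = source
    return out

SCHEMA = ["sequence", "target", "mic", "mic_unit", "validated", "source"]

a = standardize(apd3_like,
                {"Sequence": "sequence", "Target organism": "target",
                 "MIC (ug/mL)": "mic"}, source="APD3")
a["mic_unit"], a["validated"] = "ug/mL", True

b = standardize(campr4_like,
                {"seq": "sequence", "organism": "target",
                 "activity_um": "mic"}, source="CAMPR4")
b["mic_unit"] = "uM"
b["validated"] = b.pop("validation").eq("experimental")

merged = pd.concat([a, b], ignore_index=True)[SCHEMA]
merged = merged.drop_duplicates(subset=["sequence", "target", "source"])
print(merged)
```

Doing this cleaning once, centrally, with a documented schema is exactly the Schelling point described above: every later contribution can be mapped into the same columns instead of each lab repeating the work.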
Collecting existing data won't be nearly enough to get to a million-peptide database. The next step, following the Protein Structure Initiative and the Human Genome Project, is to industrialize peptide testing. Mass-produced protein synthesis and testing are already well-established techniques in the field, so this project won't need the kind of 100,000x technological advance that the HGP required. A scientific funding organization like the NIH only needs to support scaling up these existing techniques. Researchers can already test tens or hundreds of thousands of peptides simultaneously.
Industrializing peptide testing is more complicated than the demonstrations in individual research papers, because we need to screen for many variables beyond the single measure of antimicrobial activity that the research projects above report. We want to know about each peptide's activity against a broad range of bacteria, viruses, fungi, and cancer cells; about its effects on benign human cells and beneficial bacteria, so it doesn't do too much collateral damage; and about the peptides that failed to have any interesting effects, so our machine learning models know what to avoid. For peptide testing to reach the scale needed by machine learning models, it needs funding beyond the resources available for a single paper.
This effort requires a purpose-made grant from a scientific funding agency like the NIH or the NSF, not a standard PI-led research project. The focus here should not be papers, citations, or prestige; just data. With a grant like this, a million-peptide database is achievable well below the budget and timeline standard set by the Protein Structure Initiative and the Human Genome Project.
Retail custom proteins cost $5-$10 per amino acid. At an average peptide length of 20 amino acids, that's $100 to $200 per peptide. That cost is just for the synthesis, not all of the time and labor required for testing, so a reasonable upper bound on the cost of a million-peptide database is $350 million. Even this large upper bound is likely justified by the potential impact of antimicrobial peptides. The direct treatment costs for just six drug-resistant infections are around $4.6 billion annually in the US, with a far greater cost coming from excess mortality and damaged health.
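The arithmetic behind that upper bound, sketched below. The retail synthesis prices come from the paragraph above; the testing allowance is a placeholder of my own to show how the total could plausibly reach $350 million, not a figure from the article.

```python
# Rough cost arithmetic behind the $350 million upper bound.
peptides = 1_000_000
avg_length = 20                              # amino acids per peptide
cost_per_aa_low, cost_per_aa_high = 5, 10    # retail dollars per amino acid

synthesis_low = peptides * avg_length * cost_per_aa_low    # $100M
synthesis_high = peptides * avg_length * cost_per_aa_high  # $200M
testing_allowance = 150_000_000  # assumed allowance for labor and assays

print(f"Retail synthesis alone: ${synthesis_low / 1e6:.0f}M-${synthesis_high / 1e6:.0f}M")
print(f"Upper bound with testing allowance: ${(synthesis_high + testing_allowance) / 1e6:.0f}M")
```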
The actual cost is likely considerably less than this $350 million upper bound. Performing protein synthesis in house and in bulk rather than buying retail can greatly reduce costs. Additionally, these prices are for the highest-quality resin synthesis. High-throughput methods like SPOT synthesis can cost less than 1% as much per peptide and allow researchers to synthesize thousands of peptides at once. Clinical use of the tested peptides would probably require retesting them with more expensive, higher-purity methods, but you'd only need to retest the few most promising candidates. For the purpose of supplying millions of data points to a machine learning model, the purity of these high-throughput methods is more than sufficient.
Other methods use mass-produced DNA plasmids to induce bacteria like E. coli to produce peptides on long chains attached to their membranes, which, if the peptides are antimicrobial, end up killing the host cell. Researchers can then blend up all of the E. coli and check which of the DNA plasmids copied themselves and which did not. The plasmids that didn't reproduce are the ones which encoded antimicrobial peptides and prevented their host bacteria from multiplying. This method allowed University of Texas researchers to test 800,000 peptides at once, at a cost significantly lower than any other high-throughput testing method. The downside is that you never get to isolate the actual peptide from the bacterial culture, which limits the types of tests you can run. But scaling up this process could easily generate hundreds of thousands of peptide candidates with some verified antimicrobial activity that can then move on to more detailed tests.
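Here is a simplified sketch of how hits get called in this kind of pooled depletion screen: plasmids whose sequencing counts crash after the peptides are expressed are flagged as antimicrobial candidates. The counts, threshold, and analysis below are generic and made up for illustration; they are not the University of Texas group's actual pipeline.

```python
# Simplified sketch of hit-calling in a pooled depletion screen: plasmids
# whose counts collapse after induction likely encoded antimicrobial peptides
# that killed their host cells. Counts and threshold are made up.
import math

# Hypothetical sequencing read counts per plasmid, before and after induction.
counts_before = {"peptide_A": 5200, "peptide_B": 4800, "peptide_C": 5100}
counts_after = {"peptide_A": 4900, "peptide_B": 35, "peptide_C": 5300}

total_before = sum(counts_before.values())
total_after = sum(counts_after.values())

def log2_fold_change(before: int, after: int, pseudocount: float = 1.0) -> float:
    """Log2 ratio of normalized after/before frequencies, with a pseudocount."""
    freq_before = (before + pseudocount) / total_before
    freq_after = (after + pseudocount) / total_after
    return math.log2(freq_after / freq_before)

DEPLETION_THRESHOLD = -3.0  # arbitrary cutoff for "strongly depleted"

for name in counts_before:
    lfc = log2_fold_change(counts_before[name], counts_after[name])
    verdict = "candidate antimicrobial" if lfc < DEPLETION_THRESHOLD else "no effect"
    print(f"{name}: log2 fold change {lfc:+.2f} -> {verdict}")
```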
The time required to build a million-peptide database is also reasonable, perhaps less than five years. A single researcher can synthesize 400 peptides on a 20×20 cm cellulose sheet in 6 days using SPOT synthesis and can probably test them for antimicrobial activity, human toxicity, and other traits in another week. With an automated pipetting machine, the yield increases to 6,000-8,000 peptides in the same six days. A rate of 8,000 peptides synthesized and tested every two weeks would get to a million peptides in about 1,750 days, just under five years. Most importantly, almost all of these processes are highly parallelizable, so scaling up the number of peptides you want to test doesn't necessarily increase the amount of time it takes, as long as you can set up another researcher or pipetting machine working in parallel.
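The timeline arithmetic, sketched under the stated assumptions of 8,000 peptides per two-week synthesize-and-test cycle per automated line; real schedules would overlap batches and vary in throughput.

```python
# Back-of-the-envelope timeline implied by the throughput figures above.
# Assumptions: 8,000 peptides per two-week cycle per line, cycles run
# back to back with no overlap between synthesis and testing of batches.
target_peptides = 1_000_000
peptides_per_cycle = 8_000
days_per_cycle = 14  # ~6 days of SPOT synthesis plus ~1 week of testing

cycles = target_peptides / peptides_per_cycle   # 125 cycles
days_single_line = cycles * days_per_cycle      # 1,750 days
print(f"{cycles:.0f} cycles, {days_single_line:.0f} days "
      f"(~{days_single_line / 365:.1f} years) on a single line")

# Parallel synthesis/testing lines divide the calendar time roughly linearly.
for n_lines in (2, 4, 10):
    print(f"{n_lines:2d} parallel lines -> ~{days_single_line / n_lines / 365:.1f} years")
```

Under these assumptions, ten parallel lines would bring the calendar time under six months, which is why parallelization matters more than raw per-line speed.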
The failure of standard scientific incentives to fund the creation of a peptide database is solvable. A single concentrated effort over several years would lay the foundation for a machine learning renaissance in antimicrobial peptide research, as PubChem, the HGP, and the PDB did for their respective fields.
Conclusion
The specter of infectious disease that haunted humanity for millennia is threatening to return. Our century-long respite from the constant threat of deadly infections is at risk as antibiotic resistance spreads. Already, antibiotic-resistant infections claim over 1.2 million lives annually worldwide. Peptides, in dragon blood and human spit, have been nature's first line of defense against these infections for millions of years. We can learn from and improve upon nature's example, making new, effective treatments for some of the world's deadliest and most intransigent diseases.
More than simply preserving the 20th-century safety that antibiotics created, peptides can exceed the effectiveness and versatility of antibiotics. Peptides are just short proteins, and proteins are the machinery of all living things. Peptides can thus help fight not only bacterial infections, as antibiotics do, but also viral infections, fungal infections, and cancer. Peptides are also programmable and easy to manufacture. Once we figure out how the properties of a peptide change as we substitute different amino acid building blocks, we will be able to design, test, and mass-manufacture new treatments within weeks, rather than the decades it takes for new antibiotics to come to market.
The path towards this future is clear. Machine learning prediction on the sequence of amino acids is a promising and tractable way to advance our understanding and control over the properties of antimicrobial peptides. The most difficult scientific bottlenecks with this strategy have been crossed; all we need now is scale.
That means we need data. The existing data infrastructure for antimicrobial peptides is tiny and scattered: a few thousand sequences with a couple of useful biological assays, spread across dozens of data providers. No one in science today has the incentive to create this data. Pharma companies can't make money from it, and researchers can't get any splashy publications out of it. So researchers are duplicating expensive legwork collating and cleaning this data, and even then the results fall short, because there simply isn't enough information to take full advantage of the machine learning approach.
Scientific funding organizations like the NIH or the NSF can fix this problem. The scientific knowledge required to massively scale the data we have on antimicrobial peptides is well established and ready to go. It wouldn't be too expensive or take too long to get a clean dataset of a million peptides or more, with detailed information on their activity against the most important resistant pathogens and their toxicity to human cells. This is well within the scale of successful projects that these organizations have funded in the past, like PubChem, the HGP, and the PDB.
We can meet this challenge and solve it quickly if we target our resources towards building open data infrastructure that thousands of research projects will use. Let’s not wait while antibiotic-resistant pathogens get stronger.
https://www.macroscience.org/p/how-scientific-incentives-stalled