Machine Learning Discovers Hundreds of Thousands of Unknown Antiviral Proteins in Bacteria — A 'Treasure Trove' for Biotech
Two research teams have used machine learning algorithms to discover hundreds of thousands of previously unknown antiviral proteins in bacterial genomes, revealing that scientists have been "massively underestimating" the diversity of bacterial immune systems. The findings could inspire a new generation of molecular tools for genetic engineering.
Published in Science, the studies estimate that 1.5% of the genes in a bacterial genome encode antiviral immunity proteins, three times previous estimates. More than 85% of the predicted protein families had never before been associated with immune function.
One team, led by Aude Bernheim at the Pasteur Institute in Paris, trained deep-learning models on protein and genomic data to predict antiviral systems. Laboratory experiments on E. coli and Streptomyces albus confirmed 12 previously unknown "antiphage" defense systems.
A separate team at MIT, led by Michael Laub, developed a tool called DefensePredictor that analyzed 17,000 bacterial genomes. In tests on 69 E. coli strains, it identified 624 defense-related proteins, more than 100 of which were previously unknown. Lab experiments confirmed defense activity in 42 cases.
Previously discovered bacterial antiviral systems include CRISPR-Cas, the basis of CRISPR-Cas9 gene editing, and restriction enzymes, both of which have been repurposed into revolutionary biotechnology tools. "There's a hope that maybe there's a next generation of molecular tools that would come from some of these new systems," said Laub.
"This is a treasure trove for any biochemist," said José Antonio Escudero, a microbiologist at Spain's CSIC National Center for Biotechnology.