T. M. Murali: Decoding pandemic potential: The intersection of genomics and machine learning
The following story was written in April 2024 by Dhruv Shah in ENGL 4824: Science Writing as part of a collaboration between the English department and the Center for Communicating Science.
In 2020, as the world dealt with the devastating impact of COVID-19, we found ourselves facing an unsettling truth — the virus had been lurking in animals long before it made its deadly leap to humans. Imagine how the outcome might have been different if, from the moment we detected the presence of the virus in bats, we had been able to recognize its pandemic potential. Millions of lives could have been spared, and the course of history might have been dramatically altered.
This scenario isn't a work of fiction; it's a reminder of the urgent need for proactive measures in pandemic prevention.
In a recent interview with a pioneering pandemic prevention researcher, T. M. Murali, I dove into the critical questions surrounding viral mutations, host identification, and the role of cutting-edge technology in safeguarding against future global health crises.
Murali, whose journey in academia has taken him from the halls of prestigious institutions like the Indian Institute of Technology, Madras to the research labs of Brown University and Boston University before landing at Virginia Tech, brings a wealth of expertise to the field of pandemic prevention. His knowledge of computer science, systems biology, host-pathogen protein interactions, and whole-genome gene function prediction has allowed him to make major contributions to the field. Much of this work concerns how cells communicate and how specific genes in our bodies can make us more or less susceptible to illness.
“If you think of viruses, they are typically found in many animal species, they circulate in animals for millions or hundreds of thousands of years; [in that time] they can change and mutate,” said Murali. “Sometimes what can happen is a human being can come in contact with an infected animal, and if the virus has the right mutations it can start infecting the human as well. Those infections can stay between one or two humans and die out, or there could be situations where it could start an endemic or a pandemic, like COVID-19.”
Understanding the process by which a virus transitions from animals to humans and subsequently triggers widespread illness is a daunting task. What exactly drives this transition, and what genetic changes are necessary for a virus to establish itself within the human population and spark a pandemic? These are the questions at the heart of pandemic prevention research being undertaken by scientists like Murali. They are dedicated to unraveling the mysteries of viral mutations so that they can identify the specific genetic sequences that enable viruses to cause widespread disease outbreaks.
“Let’s say I give you a string of letters that makes up the sequence of a viral protein,” Murali explained. “Maybe that sequence comes from some protein in HIV, maybe it comes from some coronavirus that infects a bat. If I just give you an arbitrary sequence, can you predict which host it affects? No. That is the machine learning question.”
Murali and his team have made impressive progress in crafting a dependable machine learning model to tackle this host prediction problem, he said, especially for animal species where data is plentiful. The model was developed in two main steps. First, they conducted what they call “unsupervised training” of a large language model: segments of a sequence are masked before the sequence is fed to the model, which is then trained to predict the missing segments.
“Imagine you feed the model a sentence with some words intentionally missing,” Murali explained. “Our objective is to train the model with enough data to the point that it is able to accurately predict what those words are.” This idea is used widely in machine learning models of natural languages, he said.
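To make the masking idea concrete, here is a minimal Python sketch of how one training example might be prepared. It is an illustration only, not the team's code: the mask symbol, the 15 percent masking fraction (a common choice in language-model training), the function name, and the example protein string are all assumptions made for this sketch.

```python
import random

MASK = "#"  # hypothetical mask symbol; chosen because it is not an amino-acid letter


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of positions and record the letters that were hidden.

    Returns the masked string and a dict mapping position -> original letter.
    A model would be trained to recover those letters from the masked string.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_masked = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))

    letters = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = letters[pos]
        letters[pos] = MASK
    return "".join(letters), targets


# Made-up amino-acid string, used purely for illustration
example = "MKTAYIAKQRQISFVKSHFSRQ"
masked, targets = mask_sequence(example)
print(masked)
print(targets)
```

The masked string is what the model sees, and the recorded letters are the answers it is scored against. No host labels are needed at this stage, which is why the step can draw on large collections of unlabeled sequences.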
In the next step, the model undergoes a fine-tuning phase in which it tackles the actual supervised classification task: learning from sequences whose hosts are known in order to predict the host of a new sequence. Because of the ample amount of data available on sequences of viruses known to infect humans, Murali said, his team has been able to achieve remarkable accuracy in predicting potential human hosts for various viruses using their model. Their work highlights the role machine learning can play in advancing our ability to forecast and combat infectious diseases.
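The shape of a supervised host-classification task can be shown with a far simpler stand-in than a fine-tuned language model. The Python sketch below is a toy, not Murali's method: it counts short overlapping fragments (k-mers) in each labeled sequence, builds one profile per host, and assigns a new sequence to the host whose profile it most resembles. The sequences, the host labels `host_A` and `host_B`, and all function names are invented for illustration.

```python
import math
from collections import Counter


def kmer_counts(sequence, k=2):
    """Count every overlapping fragment of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine(a, b):
    """Cosine similarity between two fragment-count profiles."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def train(labeled_sequences, k=2):
    """Build one combined fragment profile per host from labeled examples."""
    profiles = {}
    for sequence, host in labeled_sequences:
        profiles.setdefault(host, Counter()).update(kmer_counts(sequence, k))
    return profiles


def predict_host(sequence, profiles, k=2):
    """Return the host whose profile is most similar to the new sequence."""
    counts = kmer_counts(sequence, k)
    return max(profiles, key=lambda host: cosine(counts, profiles[host]))


# Invented training data: (sequence, known host) pairs
training = [
    ("AAAAGAAAAG", "host_A"),
    ("AAGAAAAGAA", "host_A"),
    ("CCCTCCCTCC", "host_B"),
    ("CTCCCTCCCC", "host_B"),
]
profiles = train(training)
prediction = predict_host("AAAGAAAA", profiles)
print(prediction)
```

A real system would replace the fragment counts with representations learned during the masked-prediction step, but the workflow is the same: labeled examples in, a host prediction out.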
Picture a world in which such machine learning models enable us to swiftly identify and assess the potential threat posed by emerging viruses. With a highly accurate classifier at our disposal, we could analyze viral sequences extracted from animal samples or from wastewater and quickly determine their likelihood of causing harm to humans and initiating a pandemic. When dangerous viral sequences are found, public health officials could implement targeted monitoring and containment measures, stopping these viruses before they have a chance to wreak havoc on human populations.
The goal of Murali’s research in pandemic prevention is to stay one step ahead of infectious diseases by leveraging the power of science and technology to anticipate and mitigate future threats. By decoding the genetic signatures of viruses and harnessing the predictive capabilities of machine learning, he said, we can bolster our defenses against potential pandemics, safeguarding the health and well-being of communities around the globe.
More information regarding Murali’s research can be found on his personal website: https://bioinformatics.cs.vt.edu/~murali/.