Vasu Gatne: Coding against the next pandemic
The following story was written in Fall 2025 by Maggie Sheridan in ENGL 4824: Science Writing as part of a collaboration among the English department, the Center for Communicating Science, and the U.S. National Science Foundation COMPASS Center. The COMPASS Center is tackling the grand challenge of uncovering the genetic, molecular, cellular, and chemical rules of life underlying virus-host interactions through community-based and ethically grounded research. It is one of four Predictive Intelligence for Pandemic Prevention (PIPP) centers funded by the National Science Foundation.
When the world shut down in 2020, it wasn’t just a global health crisis, but a wake-up call. Within months, a virus that began in animals had spread across continents, upending economies and daily life. Scientists estimate that nearly 60 percent of human infectious diseases originate in animals, and that number is only climbing. But what if we could predict the next pandemic before it happens?
That’s the question Vasu Gatne, a master’s degree student in computer science at Virginia Tech, is helping to answer. Today, Gatne works with the NSF COMPASS Center (U.S. National Science Foundation Center for Community Empowering Pandemic Prediction and Prevention from Atoms to Societies) and a team of bioinformatics researchers and virologists to develop ways to use artificial intelligence to identify viral mutations. Their results could allow researchers to find the animal-borne viruses most likely to infect humans, potentially stopping future pandemics before they start.
Born and raised in Woodbridge, Virginia, Gatne came to Virginia Tech as an undergraduate computer science major.
“I started doing research my senior year,” she recalls. “I really liked my lab, and I wanted to stick with it for my master’s.” What began as a curiosity about algorithms evolved into a mission to use code for global good.
Unlike many of her peers focused on purely technical innovations, such as optimizing algorithms or designing new computing architectures, Gatne wanted to see machine learning applied to real-world challenges.
“Most CS [computer science] research is about developing algorithms, but I wanted to apply those algorithms to actually help people,” she explains. Gatne was determined to make a difference and assist those who are most at risk.
Gatne’s project combines machine learning and virology, a field where data meets disease. Her team focuses on zoonotic viruses, which are pathogens that originate in animals and can spread to humans, causing diseases such as Ebola, HIV, and COVID-19.
Gatne’s lab, which includes Ph.D. students developing new models, fine-tuning them, and creating the datasets needed for training, is refining parameters and expanding the datasets to improve accuracy. Under the mentorship of T. M. Murali, director of the COMPASS Center and computer science faculty member at Virginia Tech, Gatne works with a team of four researchers at Virginia Tech as well as at Cornell University and the University of Michigan. The team has been working on this project since last January and is getting closer to an end result this year.
The COMPASS Center’s larger effort to develop machine learning models for studying viruses requires high-quality training data, including data on which mutations in a virus enable it to "jump" to a new species. Much of this information is spread across the scientific literature, so biologists must manually review thousands of research papers, a painstaking process.
“Right now, this process takes a really long time,” Gatne explains. “But with recent advances in large language models, like OpenAI’s and others, we think we can automate parts of it.” This means using AI [artificial intelligence] to scan and summarize massive amounts of scientific data, reducing the time it takes for researchers to identify risky mutations.
Gatne and others on the Virginia Tech team use large language models to extract information from scientific literature. They aim to train these models to detect which viral protein mutations might make animal viruses capable of infecting humans. To start, they’re testing their system on influenza A, a well-documented virus with a rich dataset.
“It’s a really good starting point, and eventually, we want to use this on viruses that don’t have complete databases,” says Gatne. So far, their results have shown promise. The AI has successfully flagged certain known mutations as “high risk,” showing that the system can recognize meaningful biological patterns.
However, Gatne emphasizes that teaching AI to read scientific literature isn’t as simple as it sounds. Early experiments using prompt engineering led to one of AI’s classic pitfalls: hallucination.
“It would just make up facts and even fake citations,” Gatne admits with a laugh. “So we realized we needed fine-tuning.”
The team shifted to a retrieval-augmented generation (RAG) approach, which feeds the large language model verified publications and requires it to cite actual sources. However, the system still struggles with recall, meaning it sometimes fails to retrieve key details or detect subtle mutations that human experts might not notice. “AI is kind of a black box,” she explains. “We don’t always know how it’s making decisions. So it’s important to test it to make sure the results are trustworthy.”
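To make the ideas of retrieval and recall concrete, here is a minimal Python sketch, not the team's actual system. The passages and mutation labels are invented placeholders, the keyword-overlap retriever and the pattern-matching extractor are simple stand-ins for a real search index and a large language model, and every function name is hypothetical. It shows how a mutation goes unreported when the retrieval step never surfaces the paper that mentions it.

```python
import re

# Invented placeholder passages standing in for verified publications.
# The text and mutation labels are made up and are not real findings.
PASSAGES = {
    "paper-1": "Substitution A100T in the polymerase increased replication in mammalian cells.",
    "paper-2": "Field sampling methods for wild bird populations are described.",
    "paper-3": "The receptor-binding changes G200S and K300R altered host cell binding.",
}

# Matches labels written as letter, position, letter (for example A100T).
MUTATION = re.compile(r"\b[A-Z]\d+[A-Z]\b")


def tokens(text):
    """Lowercase word set used for a crude keyword-overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, passages, k=2):
    """Return the ids of the k passages sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(passages, key=lambda pid: len(q & tokens(passages[pid])), reverse=True)
    return ranked[:k]


def extract_with_sources(passage_ids, passages):
    """Stand-in for the LLM step: map each mutation label to the passage citing it."""
    return {m: pid for pid in passage_ids for m in MUTATION.findall(passages[pid])}


def recall(found, expected):
    """Fraction of the expert-listed mutations that the system recovered."""
    return len(set(found) & set(expected)) / len(expected)


# Hypothetical expert-curated list used as the reference answer.
expert_list = {"A100T", "G200S", "K300R"}
query = "mutation mammalian cells replication"

narrow = extract_with_sources(retrieve(query, PASSAGES, k=1), PASSAGES)
wide = extract_with_sources(retrieve(query, PASSAGES, k=3), PASSAGES)
print(narrow, recall(narrow, expert_list))
print(wide, recall(wide, expert_list))
```

Because every reported mutation carries the id of the passage it came from, a reviewer can check each claim against its source. Retrieving more passages raises recall in this toy case, at the cost of giving the model more text to read.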
They also built an evaluation interface, a web page Gatne created that allows virologists to log in and evaluate the reasoning provided by LLMs for their retrieval of specific mutations from the scientific literature. They weigh in on whether the reasoning makes a valuable scientific statement and is clear, concise, and well written.
“We’re not biologists or virologists,” she explains. “So we have experts look at each of the data sets we input and provide us feedback on whether the LLM’s results look right or not.” She says that she will then “need to be able to parse the outputs that they give me, and that can be a challenge if the system does not communicate exactly what I need.”
Beyond the algorithms, Gatne’s work demonstrates the growing importance of collaboration between computer scientists and life scientists. When these groups work in tandem, they give the scientific community the research it needs to support the health of our country and minimize the negative effects of viruses.
If successful, Gatne’s research could transform how global health organizations track emerging diseases. Instead of waiting for outbreaks to happen, researchers could scan viral genomes in wildlife populations and receive AI-assisted alerts about high-risk mutations.
“It’ll help biologists quickly and accurately identify which sequences have zoonotic potential,” she says.
The possible implications of her work extend beyond research labs. Public health agencies could use such models to guide vaccine development, prioritize field studies, or even preempt outbreaks. In particular, people with autoimmune conditions or those in high-exposure professions could benefit greatly from this early detection.
“This project will lead the way into bigger projects that are going to be used by the National Science Foundation and other agencies, which is really cool,” says Gatne. She recognizes the potential long-term significance of her work in influencing large-scale public health research and pandemic prevention.
Looking ahead, Gatne hopes to push her system further.
“It’d be really cool to feed it a protein sequence that no one has seen before,” she says, “and have it identify potential mutations that could cause cross-species transmission.”
For Gatne, this potential encapsulates what drew her to this work in the first place: not just developing smarter machines, but using them to make a safer world. And in a world where viruses evolve faster than we can react, Gatne is helping humanity learn to see the next one coming.