Amartya Dutta: Paving the way for new therapies using AI-assisted drug repurposing
The following story was written in Fall 2025 by Kylor Chou in ENGL 4824: Science Writing as part of a collaboration among the English department, the Center for Communicating Science, and the U.S. National Science Foundation COMPASS Center. The COMPASS Center is tackling the grand challenge of uncovering the genetic, molecular, cellular, and chemical rules of life underlying virus-host interactions through community-based and ethically grounded research. It is one of four Predictive Intelligence for Pandemic Prevention (PIPP) centers funded by the National Science Foundation.
When Amartya Dutta first saw his AI model accurately describe a man walking a dog, not just recognizing the man and the dog individually, he realized he was on to something bigger. This observation sparked a journey that bridges computer vision and computational biology, two fields that rarely overlap but that can be used to meet a common goal: helping machines make sense of complexity.
Born in India and now a Ph.D. student in the Department of Computer Science at Virginia Tech, Dutta has long been fascinated by the endless possibilities of artificial intelligence.
“My inspiration was literally [Marvel’s Iron Man] Tony Stark. I wanted to build the next ‘Jarvis,’” says Dutta, recalling his early work with augmented reality and real-world object detection apps.
His inspiration, coupled with his background in computer science, would lead him to work in computer vision or, more specifically, scene graph generation: a field that focuses on teaching AI not just to identify objects in an image, but to identify the relationship between those objects as well.
“I love images. Wherever there’s an image, I actually very much enjoy it,” Dutta says. “I can visually see my predictions and my models working.”
At its core, scene graph generation, or SGG, aims to recognize the interactions between objects. The training of AI to recognize the relationship between the objects in an image allows for more accurate reasoning, Dutta says.
Typical scene graph generation models are fully trained for their task. However, Dutta’s approach relies on a pretrained vision language model that must accurately process unfamiliar word and image pairs the first time. By reducing training time, this approach also reduces use of computer resources.
During his master’s program, Dutta explored how different design choices could influence performance, including organizing objects in different sets to improve results or determining how using different model structures can shape accuracy.
Computer vision can also be useful in a different research area: drug repurposing. Through his work with the U.S. NSF COMPASS Center (National Science Foundation Center for Community Empowering Pandemic Prediction and Prevention from Atoms to Societies), Dutta collaborates with biologists and virologists to explore how machine learning can be used to identify existing drugs that can treat new or emerging diseases.
“It takes almost 10 to 12 years for a new drug to be available in the market,” Dutta explains. “Because the development of a new drug is such a costly and time-intensive process, if we come across a virus for which we don’t have a treatment ready, it would take a lot of time and resources to develop a drug.”
His research in drug repurposing explores the ability of AI models to predict the relationship between viruses, proteins, and drugs in hopes of accelerating treatment and reducing the number of lives lost to new diseases.
Drug repurposing research can use protein language models.
“There are models that can be trained to understand the language of proteins like we would interpret human grammar,” explains Dutta. For instance, a protein language model might read a protein sequence like a sentence, with amino acids functioning like individual letters within the sentence, Dutta says.
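To make the analogy concrete, here is a minimal sketch of how a protein sequence might be split into letter-like tokens before a model reads it. It is an illustration only, not Dutta's pipeline or any particular model's tokenizer: the `tokenize` function, the ID numbering, and the short example sequence are all assumptions made up for this sketch.

```python
# The 20 standard amino acids, written as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: each amino-acid letter gets an integer ID.
TOKEN_ID = {letter: index for index, letter in enumerate(AMINO_ACIDS)}


def tokenize(sequence):
    """Turn a protein sequence into a list of integer token IDs, one per amino acid."""
    return [TOKEN_ID[letter] for letter in sequence.upper()]


# A made-up fragment used purely for illustration, not a real protein of interest.
example_sequence = "MKTAYIAK"
tokens = tokenize(example_sequence)
print(tokens)
```

Each number stands in for one amino acid, much as a language model sees characters or word pieces rather than raw text.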
This research project doesn’t come without its own challenges. One is that Dutta, a computer scientist, is working with biologists.
“It took me a few months just to understand what people are talking about. The last time I studied biology was probably in my tenth standard,” Dutta says, referring to the school year for 15- and 16-year-olds in India.
Another challenge is that Dutta has very little data with which to train AI models. Although there are 202 drugs and 104 viruses in a dataset he has access to, only 1,016 drug–virus pairs are publicly verified, which is very few compared with the 21,008 combinations that are theoretically possible, he explains.
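The arithmetic behind that gap is simple to check. The short sketch below uses only the figures quoted above; the variable names are chosen here for illustration.

```python
# Figures quoted in the article for the dataset Dutta has access to.
num_drugs = 202
num_viruses = 104
verified_pairs = 1016

# Every drug could in principle be paired with every virus.
possible_pairs = num_drugs * num_viruses

# Share of all possible pairs that have been publicly verified.
coverage = verified_pairs / possible_pairs

print(possible_pairs)
print(f"{coverage:.1%}")
```

In other words, fewer than one in twenty of the possible drug–virus pairs comes with a verified label for a model to learn from.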
For SGG, drug repurposing, and other uses of AI, Dutta points out that there are ethical considerations that must be weighed. For instance, AI technology has the potential to extend to autonomous driving and to military capabilities.
“In terms of military-level use, there would certainly be safety concerns,” he says. “Does this give too much power to someone? Or from a robotaxi level, is this safe enough to be deployed on the streets?”
The stakes only get higher with drug repurposing research. Dutta acknowledges that freely sharing predictions about drug-disease interactions can have unintended consequences, leading to a misinterpretation of drug use. Some models can predict not just which drugs to use, but also which drugs not to use.
“It’s a difficult decision to make, whether or not you should make that knowledge available to the public or only allow access with due permission and clearance,” says Dutta.
Once these ethical roadblocks are cleared, Dutta looks toward creating foundation models, which are large, general-purpose systems that can be adapted for many scientific purposes.
“The goal is to develop models which are rich in information and can be reused for other downstream tasks,” he explains. “With these systems, you won’t have to train something else from scratch again.”
Whether in code or proteins, Dutta’s goal remains the same. Teaching machines to accurately understand and interpret data will allow artificial intelligence systems to be better at assisting humanity.