CAIA turns to federated learning, AI agents for cancer research
The Cancer AI Alliance launched a federated learning platform and will pilot Asta DataVoyager, an Ai2 data analysis AI agent, to ask questions of datasets in plain language.
The Cancer AI Alliance, a group of four cancer centers, launched a federated learning platform for cancer research on Oct. 1.
The Fred Hutchinson Cancer Center serves as the coordinating center for CAIA. The other cancer centers in CAIA include the Dana-Farber Cancer Institute, Memorial Sloan Kettering Cancer Center and the Sidney Kimmel Comprehensive Cancer Center and Whiting School of Engineering at Johns Hopkins.
Federated learning is a decentralized approach to machine learning: models train on data where it resides, and only the resulting model updates are shared, which helps protect the privacy of individual records.
CAIA, launched in October 2024, aims to use the federated learning platform to train AI models while maintaining data security and privacy and adhering to regulatory and ethical standards. CAIA built an AI laboratory for cancer research that focuses on data from the four cancer centers, said Brian Bot, executive director of the strategic coordinating center at CAIA and director of AI and data center partnerships at Fred Hutch.
The joint development of the federated learning platform by the four cancer centers led to a unified technical, legal and governance structure, according to CAIA.
"For the Cancer AI Alliance, what we are aspiring to is to build the world's most comprehensive AI laboratory," Bot told Healthtech Analytics. "And what that means is building a platform and a set of data resources that are federated across these alliance members and provide them, importantly, the compute power to build some of these models."
CAIA's financial supporters and tech partners include Amazon Web Services, Deloitte, the nonprofit Allen Institute for AI (Ai2), Google, Microsoft, Nvidia and Slalom. Working with the alliance and drawing on capital from the alliance's tech partners enabled the cancer centers to advance care while maintaining data security and integrity standards, noted Anaeze Offodile, MD, chief strategy officer of Memorial Sloan Kettering Cancer Center, in a news release.
By uniting, the cancer centers can address cancer research challenges together rather than in silos and study patterns across more diverse populations and rare cancers.
"This could serve as a model for every cancer center in the country to join CAIA in this collaborative effort to unlock innovation in cancer care," Vasan Yegnasubramanian, M.D., Ph.D., professor of oncology, pathology, and radiation oncology and molecular radiation sciences at the Johns Hopkins Kimmel Cancer Center and director of inHealth Precision Medicine at Johns Hopkins Medicine, said in a statement.
CAIA's federated learning technology connects each cancer center to a centralized orchestration layer. This architecture sends AI models to each cancer center's secure data, where they learn locally and produce a learning summary. The cancer centers gain insights from training the model on their deidentified data. The summaries, not the underlying patient data, are then aggregated centrally to make the AI models stronger, enabling the centers to uncover patterns.
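The train-locally, aggregate-centrally flow described above can be illustrated with a minimal sketch of federated averaging, a common federated learning technique. CAIA has not said which aggregation method its platform uses, so this is an assumption, and every name here (`SiteUpdate`, `local_train`, `aggregate`, `federated_round`) is hypothetical. The toy model is a one-variable linear fit; the point is only that raw records stay in `local_train` and the coordinator sees weights and sample counts.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (feature x, outcome y); toy stand-in for site data


@dataclass
class SiteUpdate:
    """Learning summary a site returns: model weights and local sample count."""
    weights: List[float]
    n_samples: int


def local_train(weights: Sequence[float], data: Sequence[Record],
                lr: float = 0.1, epochs: int = 20) -> SiteUpdate:
    """Fit y ~ w0 + w1 * x by gradient descent using one site's data only."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return SiteUpdate([w0, w1], n)


def aggregate(updates: Sequence[SiteUpdate]) -> List[float]:
    """Central step: average site weights, weighted by each site's sample count."""
    total = sum(u.n_samples for u in updates)
    dim = len(updates[0].weights)
    return [sum(u.weights[i] * u.n_samples for u in updates) / total
            for i in range(dim)]


def federated_round(global_weights: Sequence[float],
                    site_datasets: Sequence[Sequence[Record]]) -> List[float]:
    """One round: each site trains locally, then only the summaries are combined."""
    updates = [local_train(global_weights, data) for data in site_datasets]
    return aggregate(updates)


if __name__ == "__main__":
    # Two hypothetical sites whose records follow y = 2x + 1.
    site_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    site_b = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
    weights = [0.0, 0.0]
    for _ in range(200):
        weights = federated_round(weights, [site_a, site_b])
    print(weights)
```

Weighting by sample count keeps a small site from pulling the shared model as hard as a large one; a production system would typically layer secure transport and further privacy protections on top of this.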
The alliance launched eight projects across its cancer centers that address persistent challenges in oncology, including identifying novel biomarkers and analyzing rare cancer trends. The deidentified data allows cancer centers to perform modeling and analysis on a diverse and representative group of 1 million patients.
Here are some projects that CAIA partners are undertaking.
Ai2 launches natural language AI agent for cancer research
Ai2, an AI research institute founded by the late Microsoft cofounder Paul G. Allen, launched Asta, an open ecosystem and suite of tools for AI agents, on Aug. 26.
As part of the Asta ecosystem, Ai2 introduced Asta DataVoyager on Oct. 1. It's a data-driven discovery and analysis platform that lets researchers query structured files in plain language and returns clear, explainable answers. The nonprofit developed Asta DataVoyager along with the Fred Hutchinson Cancer Center. Their goal was to pull out insights from massive, siloed patient datasets while still protecting privacy.
On its federated platform, CAIA has prototyped a federated instance of Asta DataVoyager.
"In Asta DataVoyager, a scientist could point to their scientific datasets and ask any questions to churn insights from the dataset, and the system would then ... statistically validate questions the scientist is asking for," Bodhisattwa Prasad Majumder, research scientist at Ai2, told Healthtech Analytics. "It will produce a scientific report, which will entail assumptions, programmatic analysis, statistical analysis of that question and eventually an answer to that question."
Asta DataVoyager is a "communication vehicle to talk to the data" and will write programs to draw insight from the data, he added.
CAIA is using Asta DataVoyager as part of an ongoing study focused on lung cancer. It compares treatments and outcomes across centers, drawing on factors like time to surgery with neoadjuvant chemo-immunotherapy and the impact of adding immunotherapy after definitive radiation.
Fred Hutch pilots Asta DataVoyager
The eight scientific projects CAIA announced include a collaboration between Fred Hutch and Ai2 to extend the use of DataVoyager to enable analysis in a federated manner, Bot said.
Ai2 and Fred Hutch paired Asta DataVoyager with a biostatistician and compared their ability to answer a specific type of question. The joint project will also explore the validity of the tool moving forward, according to Bot.
"I think this initial research project is going to be really important for us to better understand the differences between how a reasoning machine and a human biostatistician go about answering a question," Bot said.
"There's lots of nuance in what is right or wrong, but we feel understanding the differences between how a biostatistician and/or a reasoning engine would answer certain questions is really important," Bot added. "And so launching a collaboration with a group like Ai2 just made a ton of sense. And for us, we feel like that's the only way that we will, as a community, get to a point where we would feel comfortable putting these tools in the hands of a clinician or a researcher to answer really sensitive or important questions."
A biostatistician at Fred Hutch noted the speed advantage when accessing data through Asta DataVoyager, Majumder said.
"As soon as you get access to the real data through this federated platform, it would have taken some time for the biostatistician to sit and look at the data, write code, whereas in DataVoyager, it's purely a natural language interface," Majumder explained. "You could literally type in a natural language question, and in a minute, it'll generate the whole program that would run in a federated manner through this federated platform. It will basically combine insights from multiple cancer centers' data and give you an aggregation of that result, which is pretty amazing."
For instance, Ai2 worked with Fred Hutch to compare the chances of survival under two cancer treatments using CAIA's version of DataVoyager. The biostatistician used natural language to plot the graph by asking follow-up questions. The process only took two or three minutes, Majumder said.
DataVoyager will also search for new patterns in oncology research that human scientists have overlooked, he added.
Bot, who has performed survival analysis in cancer clinical trials at the Mayo Clinic, further noted that trial questions sometimes get lost in the literature.
The AI agent will allow scientists or biostatisticians to create new hypotheses and validate them in the data.
"The tool will never sort of publish a scientific paper, but it can still flag interesting insights in the data, which are then worthy of looking into by the doctors or the scientists who are using it," Majumder said.
Ensuring trustworthy results from AI and data security
Asta DataVoyager enables trustworthy analysis and data security by allowing teams to fully control their data, according to Ai2.
"With each individual cancer center, their patient data never leaves their edge node," Bot said. "And so it's a way to increase security, reduce the likelihood of privacy leaks, while still being able to answer questions across the alliance."
The Ai2 platform adds a layer of transparency by asking a biostatistician to review the generated code for errors before it runs, Majumder explained. Running the program through a federated platform further prevents real patient data from being exposed. Statisticians can also download the code separately from DataVoyager.
"Right now, we have deidentified patient data existing in each one of the edge nodes," Bot added. "And so, while still sensitive information, there is a level of rigor that has already gone into deidentifying the individual patient data that resides within each cancer center."
A privacy preservation knob in Asta DataVoyager allows Fred Hutch and the other cancer centers to ensure both privacy and accuracy in the data models, according to Bot. For example, the CAIA federated platform will not provide an answer that includes fewer than five patients, he noted.
"There are bins of information, levels of granularity, that the framework will not let you get to in order to safeguard patient privacy," Bot said.
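The minimum-count rule Bot describes resembles small-cell suppression, a common disclosure-control technique. The sketch below is one way such a check could work, not CAIA's implementation: the names `MIN_CELL_SIZE` and `federated_counts` are hypothetical, and it assumes the threshold applies to the combined count across centers, which the article does not specify.

```python
from typing import Dict, List, Optional

# Threshold of five patients comes from the article; the name is hypothetical.
MIN_CELL_SIZE = 5


def federated_counts(
    site_counts: List[Dict[str, int]],
    min_cell_size: int = MIN_CELL_SIZE,
) -> Dict[str, Optional[int]]:
    """Sum per-site patient counts by bin and suppress any bin below the threshold.

    Each site contributes only aggregate counts, never patient-level rows.
    A suppressed bin is reported as None instead of its true count.
    """
    totals: Dict[str, int] = {}
    for counts in site_counts:
        for bin_name, n in counts.items():
            totals[bin_name] = totals.get(bin_name, 0) + n
    return {
        bin_name: (n if n >= min_cell_size else None)
        for bin_name, n in totals.items()
    }


if __name__ == "__main__":
    # Hypothetical counts from two centers for the same query.
    center_1 = {"stage I": 3, "stage IV": 1}
    center_2 = {"stage I": 4, "stage IV": 2}
    print(federated_counts([center_1, center_2]))
```

In this example the combined "stage I" bin has seven patients and is returned, while "stage IV" has only three and is withheld. A stricter design could also suppress at each site before aggregation.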
Looking ahead at CAIA
The partnership between Fred Hutch and Ai2 on Asta DataVoyager will allow researchers to "understand how a reasoning machine or AI model like what they have developed there can be customized and/or responsibly extended or deployed eventually in an oncology setting," Bot said.
Asta DataVoyager allows scientists to examine data at all four cancer centers that are part of CAIA, and this capability could expand.
"We have ambitions to grow this much beyond just the four founding centers, such that you can start to look at those more rare cases, not as individuals but as smaller cohorts that you just otherwise would not be able to have access to when you are just Fred Hutch or just Johns Hopkins or just Memorial Sloan Kettering," Bot said.
Brian T. Horowitz started covering health IT news in 2010 and the tech beat overall in 1996.