AI in 2026 offers the promise of solving our biggest scientific questions in medicine and, if we're to believe the hype, doing so cheaply and with minimal experimental effort. The reality, as anyone working in research knows, is more complicated — and more interesting.
In my work at the Allison Institute, I've seen multi-omic spatial profiling emerge as a necessity, particularly in immunotherapy research. Platforms like CODEX, Lunaphore Comet, and CosMx are essential for understanding the tumor microenvironment, yet they remain the primary bottleneck: they are expensive and time-consuming, ultimately limiting how many samples or patients a study can include.
Machine learning is undoubtedly at its best when integrating massive, disparate datasets. We have seen an explosion of publications in this area, particularly in cancer research, where multi-omics output peaked in 2024 and 2025. Yet, despite the promise, it remains unclear how much novel insight is actually being generated versus how much is simply being catalogued.
Recently, machine learning approaches have promised to draw molecular profiling data directly from standard H&E images — predicting spatial transcriptomics from histology slides already abundant in clinical archives. If accurate at scale, this would change the entire field. But it raises a critical question: what is lost in translation, and are those the most important data points?
The Market Has Noticed
The acquisitions and consolidation happening in this space tell you everything — what's being competed for isn't the instruments, it's the patient data those instruments generate. Whoever controls the largest, highest-quality multimodal biological datasets controls the foundation on which every model will be trained. The money understands this even when the science press hasn't quite caught up.
The Benchmark Problem
Take, for example, the high-profile work led by Vivek Natarajan at Google. Models like Med-PaLM 2 and AMIE have reached expert-level scores on medical benchmarks — a genuine achievement. But benchmark performance and real-world utility are different questions. Their impact still depends on the quality and representativeness of the data they were trained on. A model trained on images from well-resourced academic centers may perform very differently on samples from institutions without the same tissue processing standards. The benchmark is only as meaningful as the biology underneath it.
New tools, like Cellformatica — which arose from work in Garry Nolan's lab — seek to provide an "immunologist in a box," fusing disparate datasets into something navigable. But tools alone aren't the solution. Someone with functional knowledge of both the biology and the computation must still serve as the bridge.
The Interface Problem
Collaboration between laboratory researchers and their computational counterparts is often beset by a classic difficulty, one I have witnessed firsthand at this interface:
One side says: "They just aren't giving us any data."
The other says: "They won't tell us what data they need."
This is not a failure of intelligence on either side. It is a structural problem, and no amount of tooling resolves it on its own. What is needed are truly symbiotic partnerships built around the most pressing biological questions, rather than computational approaches deployed in search of a use case to justify a heavy prior investment.
The Executive Reality
AI models need data to learn and improve, and reliable scientific data remains the currency. The scientific and medical communities are enthusiastic about supercharging research with these tools — the harder questions are which tools, deployed how, governed by whom, and toward what end. AI may replace many of us someday. That day is not today. For now, it seems mostly to find more work for all of us. In times of economic uncertainty, that may be a blessing in and of itself.