Dear Aventine Readers,
As AI has become wizard-like in its ability to find patterns in vast amounts of data, it was only a matter of time before its power was directed at making predictions about our health. So far such systems have been operating in the research phase, making predictions about historical data for which the outcomes are already known. One model had a 79 percent accuracy rate in predicting a person's chances of heart failure within ten years. Now, before the year is over, the first active patients will take part in a clinical trial to determine whether that same AI model can detect heart disease and, subsequently, whether it can predict patients' chances of experiencing heart failure over the coming year. We look at this state-of-the-art form of predictive medicine and what it will take to make it mainstream.
Also in this issue: scientists building embryo-like structures without sperm or eggs, AI labs hiring bankers and consultants to train their models on professional work, and a new breed of super-targeted radioactive cancer drugs.
Thanks for reading,
Danielle Mattoon
Executive Director, Aventine
The Promise of Predictive AI in Medicine
As 2025 draws to a close, hundreds of patients of one of the UK’s largest hospital groups will be part of a test to determine whether AI can deliver on a potentially life-changing promise.
This is the first clinical trial of an AI system called AIRE, built to predict heart failure up to a decade in advance based solely on a standard electrocardiogram, or ECG, the routine medical test in which electrodes measure the heart's electrical activity. If the trial, taking place in hospitals across Imperial College Healthcare NHS Trust and Chelsea and Westminster Hospital NHS Foundation Trust, shows over the coming year that the tests are reliable, it will mark a significant advance over existing cardiovascular risk prediction tools.
While AIRE is at the vanguard of predictive AI in medicine, it is far from alone. As AI has become more powerful in recent years, the aim of using it to make predictions about human health has become a sort of holy grail for medical researchers, inspiring dozens of efforts to bring such tools to market. Train a neural network on millions of patient records, and it could find subtle predictive signals that traditional epidemiology misses, allowing physicians to intervene earlier and — potentially — to prevent illness entirely.
Yet for medical AI to move from research lab to everyday use, it must check all kinds of boxes: be accurate across diverse patient populations, be sufficiently validated that clinicians trust its recommendations, avoid bias that could worsen health disparities and win the necessary regulatory approvals. Being wrong, or even just a little off, could have dire consequences.
“Humans are complex, disease is complex,” said Eric Oermann, a neurosurgeon at NYU Langone Health who also develops medical artificial intelligence systems. “Because life's complex, [making predictions about patient outcomes] can be very, very hard, especially over long [time] horizons.”
Pattern recognition in heart rhythms
AIRE illustrates both the potential and complexity of predictive AI in healthcare. Built on a neural network architecture traditionally used for image recognition and trained on a dataset including nearly 1.2 million ECGs, it analyzes the waveforms of electrocardiograms to assess future cardiovascular risk. In initial experiments using historical data, it correctly identified patients' 10-year risk of death in 78 percent of cases, and predicted future heart failure with 79 percent accuracy, outperforming traditional risk prediction tools. The upcoming trial with real patients will initially determine whether the model can identify current heart disease, and then assess its ability to predict one-year outcomes.
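AIRE's internals aren't spelled out here, but the general shape of such a system, a convolutional network that reads raw ECG waveforms and emits a single risk score, can be sketched in a few lines. Everything below (the layer sizes, the 12-lead input, the sigmoid risk head) is an illustrative assumption, not a description of AIRE itself.

```python
# Illustrative sketch only: a small 1D convolutional network that maps a
# raw 12-lead ECG (shape: leads x samples) to a single risk probability.
# Layer sizes, lead count and recording length are assumptions, not AIRE's.
import torch
import torch.nn as nn

class ECGRiskNet(nn.Module):
    def __init__(self, n_leads: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=15, stride=2, padding=7),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=15, stride=2, padding=7),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        )
        self.head = nn.Linear(64, 1)   # one logit: risk of the event

    def forward(self, ecg: torch.Tensor) -> torch.Tensor:
        x = self.features(ecg).squeeze(-1)
        return torch.sigmoid(self.head(x))

# A batch of 4 synthetic "ECGs" stands in for real recordings.
model = ECGRiskNet()
risk = model(torch.randn(4, 12, 5000))
print(risk.shape)  # torch.Size([4, 1]): one risk score per recording
```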
Unlike large language models, which can generate very different outputs from the same input data — a feature that contributes to AI hallucinations — AIRE is deterministic, meaning it produces consistent outputs for identical inputs. That consistency aligns it much more closely with how traditional diagnostic tools behave.
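That behavior is easy to pin down in code: a network with no stochastic layers is just a fixed function, so two passes over the same input match exactly. A minimal sketch with a toy stand-in model:

```python
# Determinism in miniature: this toy network has no stochastic layers,
# so the same ECG always yields the same score (unlike an LLM sampling
# tokens at nonzero temperature). All shapes are illustrative.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv1d(12, 8, 15, 2, 7), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                    nn.Linear(8, 1), nn.Sigmoid())
net.eval()                        # inference mode: no training-time noise
ecg = torch.randn(1, 12, 5000)
with torch.no_grad():
    a, b = net(ecg), net(ecg)     # two passes over the identical input
assert torch.equal(a, b)          # deterministic: outputs match exactly
```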
But AIRE also offers a glimpse at the more unexpected connections that AI can make. In addition to being able to anticipate heart conditions, the system showed promise in predicting Type 2 diabetes up to 10 years in advance, before changes would appear in a patient's blood sugar levels. It showed similar promise in predicting chronic kidney disease. The big takeaway: AI is capable of identifying patterns that humans may struggle to spot or never think of. "One of the paradigm shifts in medical research guided by AI is that we ideally shouldn't be anchored to any prior beliefs," said Arunashis Sau, a cardiologist and AI researcher at Imperial College London who is leading the development of AIRE. "Because everything we think we know may prevent us from finding new things."
Discovering unexpected predictive relationships is clearly valuable. But it raises questions about how those predictions should be used. The connection between an ECG and heart health is clear; the connection between ECG results and diabetes or kidney disease, less so. And without thoroughly understanding that connection, how should doctors use such information? “That is a controversial area,” said Sau. “Some [clinicians] would say that if it works, it works. Others wouldn't buy into that.”
Currently, the team at AIRE can attribute patterns in ECGs to certain risks at the population level, but can't point to features in a single ECG and explain how they contribute to an individual's risk profile. As AI prediction systems become more complex, understanding how a system arrives at its predictions about the patient will get even harder: “There is a very clear paradigm in AI where the more complex [a model] is, the better it performs, but the less explainable it is,” said Sau.
Predicting all disease everywhere all at once
This summer, we got a glimpse of just how advanced things could get. Researchers at the European Molecular Biology Laboratory in Cambridge unveiled DELPHI-2M, a generative AI system that represents perhaps the most ambitious health prediction effort to date. Built on an architecture similar to the large language models that underpin ChatGPT, DELPHI-2M was trained on anonymized medical records from 400,000 UK Biobank participants with the aim of predicting a patient’s likelihood of developing any of 1,258 diseases over the next 20 years.
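DELPHI-2M's actual tokenization and training are more involved, but the core idea, treating a medical history as a sequence of (age, diagnosis) events and training a GPT-style transformer to predict the next event, can be sketched roughly as follows. The dimensions, the additive age embedding and every name here are assumptions for illustration, not the published design.

```python
# Rough sketch of a GPT-style model over medical event sequences.
# Each event is a diagnosis code plus the age at which it occurred;
# the model learns to predict the next diagnosis token.
import torch
import torch.nn as nn

N_DISEASES = 1258  # one token per disease, matching DELPHI-2M's label space

class HealthTrajectoryModel(nn.Module):
    def __init__(self, d_model: int = 128, n_layers: int = 4):
        super().__init__()
        self.token_emb = nn.Embedding(N_DISEASES, d_model)
        self.age_proj = nn.Linear(1, d_model)   # continuous age -> embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.next_event = nn.Linear(d_model, N_DISEASES)

    def forward(self, codes: torch.Tensor, ages: torch.Tensor) -> torch.Tensor:
        x = self.token_emb(codes) + self.age_proj(ages.unsqueeze(-1))
        # Causal mask so each position only attends to earlier events.
        n = codes.size(1)
        mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.next_event(h)  # logits over the next diagnosis

model = HealthTrajectoryModel()
codes = torch.randint(0, N_DISEASES, (2, 10))  # 2 patients, 10 events each
ages = torch.rand(2, 10) * 80                  # ages in years
print(model(codes, ages).shape)                # torch.Size([2, 10, 1258])
```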
In early testing, DELPHI-2M's predictions matched or exceeded existing single-disease risk estimation tools for most conditions. And when tested on health data from 1.9 million people in the Danish National Patient Registry — a database spanning 40 years — the model maintained most of its predictive accuracy, suggesting that it is highly transferable across health care systems.
"We can, at any point in somebody's life, say, 'OK, what's your likely rate of heart attack in the next year,'" said Tom Fitzgerald, a lead researcher on the project. "And you can do that across all diseases."
This breadth is unprecedented. Yet the model's probabilistic transformer architecture, the same technology that makes ChatGPT powerful but unpredictable, raises questions about its consistency. Unlike AIRE's deterministic outputs, DELPHI-2M can generate different predictions from identical inputs, which could complicate both validation and regulatory approval.
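One common way to get stable numbers out of a stochastic generative model (a generic technique, not necessarily how DELPHI-2M is evaluated) is Monte Carlo aggregation: pin the random seed, sample many possible futures and report the frequency of the outcome of interest.

```python
# Monte Carlo risk estimation from a stochastic model. The distribution
# below is a random toy stand-in; in practice it would come from the
# model's predicted next-event probabilities for one patient.
import torch

torch.manual_seed(0)  # fix the RNG so validation runs are repeatable
probs = torch.softmax(torch.randn(1258), dim=-1)      # toy distribution
samples = torch.multinomial(probs, 1000, replacement=True)
target = 42  # hypothetical disease code of interest
print((samples == target).float().mean())  # estimated probability
```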
The current system also has limits. It struggles to make accurate predictions about diseases with unpredictable external causes, as well as about rare congenital conditions. And while it outperforms many other risk prediction tools, it performs worse than current models for diabetes forecasting. Finally, because UK Biobank participants skew healthier and more middle-class than the national average, the model tends toward optimistic predictions. The team hopes to overcome some of these issues by training the model on more diverse data sets.
AI in the back office
While systems like AIRE and DELPHI-2M aim directly at clinical decision-making, other systems operate in medicine's back office.
NYUtron, developed by Oermann and his colleagues at NYU Langone Health, uses a language model architecture to analyze physicians' notes and predict administrative outcomes, including length of hospital stay, readmission risk and the chances of insurance claim denial. Trained on 7.25 million clinical notes from 387,000 patients, the model predicts these metrics with 79 to 95 percent accuracy, performing 5 to 15 percent better than traditional prediction methods.
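NYUtron itself is a BERT-style model pretrained on clinical text; as a rough illustration of the pattern, here is what scoring a discharge note for readmission risk with an off-the-shelf encoder might look like. The model name, the two-class label scheme and the sample note are stand-ins, not NYUtron's.

```python
# Minimal sketch of a BERT-style classifier flagging 30-day readmission
# risk from a discharge note. "bert-base-uncased" is a generic stand-in
# encoder; its freshly initialized head would still need fine-tuning.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # classes: readmitted / not

note = "Patient discharged in stable condition after CHF exacerbation."
inputs = tok(note, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))  # [p(no readmission), p(readmission)]
```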
Every night, the model makes an updated set of predictions based on the latest changes to patients' electronic health records across NYU Langone’s hospitals to help clinicians and managers understand issues such as bed allocation and resource planning. "Running a hospital is kind of like running a hotel," Oermann explained. "But you don't know how long [people] are going to stay."
The decision to focus on administration rather than direct patient care is deliberate. "The regulatory burden is pretty light, if not nonexistent," Oermann said. That allows for faster iteration without FDA oversight. The first version of NYUtron was described in Nature in 2023, and the team is already preparing to publish details of its third iteration later this year. A startup Oermann founded is commercializing the technology for sale to other health care providers.
Yet the administrative-versus-clinical distinction may not be quite as clear-cut as it appears. AI software is exempt from FDA regulation if its purpose is solely administrative support of a health care facility. But if administrative tools become more advanced and ingrained, and begin to, say, directly affect discharge decisions, the boundary between logistical decision support and clinical decision-making could become blurred.
Fitzgerald thinks that it will be five to ten years before tools like DELPHI-2M reach clinical deployment. Simpler, more focused systems like AIRE may arrive sooner. And just as GPT models can now process both images and text, future medical AI will likely incorporate a wider range of data, such as genetic sequences, medical imaging or information from wearables. This introduces additional privacy and consent challenges, but also points toward increasingly sophisticated systems.
In the meantime, these technologies will likely find a home in hospital administration and population health management, predicting aggregate risks, patient volumes, and resource needs rather than individual clinical outcomes. "The fundamental act of medicine is a predictive problem," Oermann observed. In that sense, these AI-powered systems represent a natural evolution in health care, whatever form they take.
Listen To Our Podcast
Learn about the past, present and future of artificial intelligence on our latest podcast, Humans vs Machines with Gary Marcus.
Advances That Matter
Scientists are building embryo-like structures without sperm or eggs. An emerging field of synthetic biology is assembling stem cells — the body’s unspecialized cells that can develop into any tissue — into clusters that organize themselves into structures resembling early human embryos. MIT Technology Review reports that researchers at Israel’s Weizmann Institute of Science, Caltech, Rockefeller University and the University of Cambridge have created synthetic human embryos with features such as beating hearts, neural folds and cells destined to develop into a placenta. Startups, too, are working on the idea, with concepts that include generating endless supplies of blood-forming cells for transfusions, or — plunging fully into the realm of horror sci-fi — growing headless bodies for organ donation. But the work raises enormous ethical questions. In most countries, natural embryos can’t be grown in the lab for more than 14 days. Synthetic embryos fall outside that rule, and one company, Renewal Bio, has said it lets some grow for at least 28 days. At 40 days, you’d expect to see the first signs of eyes and limbs. Some scientists have proposed genetically disabling brain development to sidestep ethical constraints, a fix that creates its own moral dilemmas. The technology remains unreliable for now, but progress is accelerating, making the ethical debate increasingly urgent.
AI trained by white-shoe workers is coming for professional work. Large language models are so good at producing compelling documents because they’ve been trained on vast quantities of text from across the internet. But much of what happens inside banks, law firms, and consultancy practices relies on proprietary knowledge locked inside companies — and inside the heads of the people who work there. Now, AI labs are starting to extract that expertise directly. We mentioned in a recent newsletter that OpenAI has hired more than 100 former investment bankers from firms like JPMorgan Chase, Morgan Stanley and Goldman Sachs to teach its models how to build financial models. The recruiting is being handled by Mercor, a startup that has also contracted around 150 ex-consultants from McKinsey, Bain, and BCG to train AI systems on entry-level consulting tasks. Its job board lists openings for all sorts of professionals, including lawyers, physicians, actuaries and financial advisers. Building models that can actually perform such work would give companies like OpenAI and Anthropic a huge commercial edge. If this sort of hiring seems a little odd to you, analyst Benedict Evans notes that, “people were [once] surprised that Google paid people to drive down every street in the developed world to feed Google Maps.” The training itself takes place in so-called AI gyms: simulated environments that mimic software such as Excel or Salesforce, where models learn by watching professionals work before practicing on their own. And in case you’re wondering how seriously this is being taken, The Information reported over the summer that Anthropic is considering spending $1 billion on such gyms over the next year.
A new breed of super-targeted radioactive cancer drugs is emerging. Radiopharmaceuticals combine cell-destroying radioactivity with molecular “homing” compounds that deliver that radiation directly to cancer cells, ideally killing tumors while sparing healthy tissue. A few are already on the market: the FDA has approved Lutathera, for a rare intestinal cancer, and Pluvicto, for metastatic prostate cancer. Their commercial success, Science reports, has kicked off a wave of innovation aimed at creating far more powerful versions. Researchers are now experimenting with different radioisotopes to make the drugs more potent, developing new targeting molecules for greater precision, and redesigning how isotopes and delivery systems are packaged to reduce side effects. The goal is to treat a wider range of diseases, from blood cancers like leukemia to solid tumors such as those in the colon. Perhaps the biggest challenge is supply. Producing medical-grade radioisotopes is difficult, demand generated by existing drugs has already strained the supply pipeline, and a reliance on Russia, a major producer of the materials, complicates things further. Strict safety, infrastructure and training requirements will also make scaling up harder than for conventional pharmaceuticals, and many radiopharmaceuticals must be manufactured and used quickly before they lose potency. Still, hundreds of new radiopharmaceuticals are now in clinical trials, and many could win approval within the next five years, bringing about a quiet revolution in how we target and destroy cancer.