While mammograms are the gold standard in breast cancer screening, there is ongoing debate over when and how often they should be performed. On the one hand, advocates point to the ability to save lives: Women in their 60s and 70s who have mammograms, for example, have a 33 percent lower risk of dying than those who don’t. On the other hand, critics argue that false positives are costly and potentially traumatic: According to a meta-analysis of three randomized trials, mammography results in a 19 percent over-diagnosis rate.
Even with some lives saved, and some overtreatment and overscreening, current guidelines remain one-size-fits-all: women aged 45 to 54 should get mammograms every year. Although personalized screening has long been considered the answer, tools that can make use of the vast amounts of data involved are lacking.
This led researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and the Jameel Clinic for Machine Learning and Health to ask whether machine learning could be used to provide personalized screening.
The result is Tempo, a system for creating risk-based screening guidelines. Using an AI-based risk model that looks at who was screened and when they were diagnosed, Tempo recommends that a patient return for a mammogram at a specific time point in the future, such as six months or three years. The same Tempo policy can be easily adapted to a wide range of possible screening preferences without training new policies, which lets clinicians choose their preferred trade-off between early detection and screening cost.
The model was trained on a large screening mammography dataset from Massachusetts General Hospital (MGH) and tested on held-out patients from MGH as well as external datasets from Emory, Karolinska in Sweden, and Chang Gung Memorial Hospital. Using the team’s previously developed risk-assessment algorithm Mirai, Tempo achieved better early detection than annual screening while requiring 25 percent fewer mammograms overall at Karolinska. At MGH, it recommended one mammogram a year and achieved a simulated early-detection benefit of four and a half months.
“By tailoring the screening to the patient’s individual risk, we can improve patient outcomes, reduce overtreatment, and eliminate health disparities,” says Adam Yala, a PhD student in electrical engineering and computer science, MIT CSAIL affiliate, and lead researcher on a paper describing Tempo published Jan. 13 in Nature Medicine. “Given the massive scale of breast cancer screening, with tens of millions of women getting mammograms every year, improvements to our guidelines are immensely important.”
Early uses of AI in medicine date back to the 1960s, and the Dendral experiments are often credited with starting the field. In that work, researchers built what is considered the first expert system, software that automated the decision-making and problem-solving behavior of organic chemists. Sixty years on, deep medicine has significantly advanced drug diagnostics, predictive medicine, and patient care.
“Current guidelines divide the population into a few large groups, like younger or older than 55, and recommend the same screening frequency to all the members of a cohort. The development of AI-based risk models that operate over raw patient data give us an opportunity to transform screening, giving more frequent screens to those who need it and sparing the rest,” says Yala. “A key aspect of these models is that their predictions can evolve over time as a patient’s raw data changes, suggesting that screening policies need to be attuned to changes in risk and be optimized over long periods of patient data.”
Tempo uses reinforcement learning, a machine learning method widely known for its success in games like chess and Go, to develop a “policy” that predicts a follow-up recommendation for each patient.
The training data contained information about a patient’s risk only at the time points when they had a mammogram (at age 50 or 55, for example). Because the researchers needed risk assessments at intermediate points, they designed their algorithm to learn a patient’s risk at unobserved time points from their observed screenings, with the estimate evolving as new mammograms of the patient became available.
The team first trained a neural network to predict future risk assessments from previous ones. This model estimates patient risk at unobserved time points and makes it possible to simulate risk-based screening policies. They then trained the policy (also a neural network) on the retrospective training set to maximize the reward (for example, a combination of early detection and screening cost). The end result is a recommendation for when to return for the next screen, ranging from six months to three years in the future, in six-month increments; the standard is only one or two years.
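To make the idea of simulating a risk-based policy more concrete, here is a minimal Python sketch. It is not Tempo’s implementation: the learned policy is replaced by a hand-written threshold rule, the risk model by an arbitrary function, and the reward is assumed to be a simple linear trade-off. The names `toy_policy`, `simulate_screens`, `reward`, and all threshold values are hypothetical.

```python
from typing import Callable, List

# Follow-up options mentioned in the article: 6 to 36 months, in 6-month steps.
FOLLOW_UP_MONTHS = [6, 12, 18, 24, 30, 36]


def toy_policy(risk: float) -> int:
    """Hypothetical stand-in for a learned policy: higher risk, shorter interval.

    The thresholds are illustrative assumptions, not values from Tempo.
    """
    if risk >= 0.10:
        return 6
    if risk >= 0.05:
        return 12
    if risk >= 0.02:
        return 24
    return 36


def simulate_screens(
    risk_at: Callable[[int], float],
    policy: Callable[[float], int],
    horizon_months: int,
) -> List[int]:
    """Roll a policy forward from month 0 and return the months with a screen."""
    month, screens = 0, []
    while month <= horizon_months:
        screens.append(month)
        # The policy looks at the (estimated) risk at this visit
        # and chooses how long to wait until the next one.
        month += policy(risk_at(month))
    return screens


def reward(early_detection_months: float, n_screens: int, cost_weight: float) -> float:
    """Assumed linear trade-off: early-detection benefit minus weighted screening volume."""
    return early_detection_months - cost_weight * n_screens


# Assumed risk trajectory: low risk for two years, then elevated.
def example_risk(month: int) -> float:
    return 0.01 if month < 24 else 0.12


schedule = simulate_screens(example_risk, toy_policy, horizon_months=48)
print(schedule)  # [0, 36, 42, 48]
```

Changing `cost_weight` in `reward` is one simple way to express a different preference between early detection and screening volume; how Tempo itself encodes such preferences is not specified in this article.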
Say Patient A comes in for a first mammogram and is eventually diagnosed at Year Four. Nothing is found at Year Two, so they don’t return for another two years, and at Year Four they receive a diagnosis. That leaves a two-year gap since the last screen, during which a tumor could have grown.
With Tempo, the advice at the first mammogram, Year Zero, might have been to return in two years. Then, at Year Two, it might have seen that the risk was high and recommended that the patient return in six months, at which point, in the best case, the cancer would be detected. The model dynamically adjusts the patient’s screening frequency based on how the risk profile changes.
Tempo uses a simple metric for early detection, which assumes that cancer can be caught up to 18 months in advance. Tempo outperformed current guidelines under other settings of this assumption (six months, 12 months), but none of these assumptions is perfect, because a tumor’s early-detection potential depends on its characteristics. The researchers suggest that follow-up work using tumor growth models could address this issue.
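One plausible reading of such a window-based metric can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper’s exact definition: the function name `early_detection_benefit` is hypothetical, and the sketch assumes that any screen falling inside the window before the observed diagnosis would have caught the cancer.

```python
from typing import Sequence


def early_detection_benefit(
    screen_months: Sequence[int],
    diagnosis_month: int,
    window_months: int = 18,
) -> int:
    """Months gained by the earliest screen inside the assumed detectability window.

    Assumption: the cancer is detectable at any screen that falls within
    `window_months` before the observed diagnosis. Returns 0 if no screen
    falls inside that window.
    """
    earliest_detectable = diagnosis_month - window_months
    hits = [m for m in screen_months if earliest_detectable <= m < diagnosis_month]
    return diagnosis_month - min(hits) if hits else 0


# Patient A from the example above, diagnosed at Year Four (month 48).
biennial = [0, 24, 48]  # screens every two years
adaptive = [0, 24, 30]  # return in two years, then in six months

print(early_detection_benefit(biennial, 48))      # 0
print(early_detection_benefit(adaptive, 48))      # 18
print(early_detection_benefit(adaptive, 48, 12))  # 0
```

The last line shows why the choice of window matters: with a 12-month window, the screen at month 30 falls outside the period in which the cancer is assumed to be detectable, so the same schedule earns no credit.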
In addition, the screening-cost metric, which counts the total screening volume recommended by Tempo, does not give a full picture of the overall future cost, because it does not explicitly quantify false-positive risks or other screening harms.
There are many possible directions for improving personalized screening algorithms. One, according to the researchers, is to refine the metrics used to estimate early detection and screening costs from retrospective data, which would yield more precise guidelines. Tempo could also be adapted to include different types of screening recommendations, such as MRI or mammograms, and future work could model the costs and benefits of each separately. With better screening policies, it may become feasible to recalculate the earliest and latest age at which screening is still cost-effective for a patient.
“Our framework is flexible and can be readily utilized for other diseases, other forms of risk models, and other definitions of early detection benefit or screening cost. We expect the utility of Tempo to continue to improve as risk models and outcome metrics are further refined. We’re excited to work with hospital partners to prospectively study this technology and help us further improve personalized cancer screening,” says Yala.
Yala collaborated on the Tempo paper with MIT PhD student Peter G. Mikhael, Karolinska University Hospital’s Fredrik Strand, Chang Gung Memorial Hospital’s Gigin Lin, Chang Gung University’s Yung-Liang Wan, Emory University’s Siddharth Satuluru, Georgia Tech’s Thomas Kim, Emory University’s Hari Trivedi, Mayo Clinic’s Imon Banerjee, Judy Gichoya
The research was supported by Susan G. Komen, the Breast Cancer Research Foundation, Quanta Computing, an anonymous foundation, the MIT Jameel Clinic, a Chang Gung Medical Foundation grant, and a Stockholm Läns Landsting HMT grant.