External Data and Cancer: the New Frontier That Could Transform Clinical Trials
In cancer research, one of the trickiest challenges is also among the least visible: truly understanding whether a new therapy works. Not on average, but for those who matter most—the subgroups of patients for whom a treatment can mean the difference between progress and failure.
In recent years, with the rise of targeted therapies and biomarkers, it has become increasingly clear that drugs do not work the same way for everyone. Yet many clinical trials are still designed to measure “average” effects, risking the loss of crucial information. This is where a new methodological proposal comes into play—one that could change the rules of the game.
An international team of researchers, including Sandra Fortini (Department of Decision Sciences, Bocconi University), together with B. Ren (McLean Hospital, Belmont, Massachusetts), F. Ferrari (Merck & Co), S. Ventz (University of Minnesota), and L. Trippa (Harvard School of Public Health), has developed an innovative approach that leverages external data to improve the statistical power of clinical trials. The work, published in Biometrika, addresses a central issue: how to avoid false negatives when treatment effects are concentrated in specific subgroups.
The hidden problem in clinical trials
In randomized trials—the gold standard of medical experimentation—the goal is to determine whether a treatment works by comparing it to a control. But when benefits are unevenly distributed across patients, there is a risk of detecting nothing at all.
As the authors note,
“false negative results are likely to occur when the treatment effects concentrate in a subpopulation.”
In other words, a drug may be highly effective for some patients, while appearing useless if one looks only at the average. The problem is compounded by practical constraints: recruiting more patients to analyze subgroups is often unfeasible, especially in rare diseases or advanced oncology.
The solution: making better use of existing data
The researchers’ proposal is to integrate external data into clinical trials—data drawn from previous studies or electronic health records. This integration makes it possible to “strengthen” statistical analysis without increasing the number of enrolled patients. But it is not just a matter of quantity—it is also a matter of method.
The authors introduce a new permutation-based statistical test that leverages external data while maintaining rigorous standards of reliability:
“The permutation test leverages the available external data to increase power.”
In practical terms, the method increases the probability of detecting real effects without raising the risk of false positives.
The method
At first glance, this may seem like a technical issue, but the core idea is more intuitive than it appears: testing whether observed results are truly due to the treatment or could arise by chance.
Imagine having data from a clinical trial: some patients received the experimental therapy, others the standard one, and each has a certain outcome. The key question is whether the difference between the two groups is real. The proposed method does something conceptually simple: it generates many “alternative” versions of the same experiment, where treatments are randomly reassigned to patients while keeping their outcomes fixed. If similar differences appear even in these random scenarios, the treatment is likely to have no real effect. If, instead, the observed effect is much stronger than what appears in these simulated scenarios, the signal is credible.
Crucially, each simulated scenario does not rely only on trial data, but also incorporates information from previous studies or clinical databases. It is like having a benchmark: knowing how similar patients behave in other contexts helps determine whether a result is plausible or surprising.
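The permutation logic described above can be sketched in a few lines of Python. This is an illustrative toy version, not the authors' published procedure: treatment labels are randomly reshuffled while outcomes stay fixed, and the observed difference is compared against these chance scenarios. The `external_controls` argument is a hypothetical stand-in for outside data, which here is simply pooled with the trial controls when computing the test statistic—a deliberate simplification of the more careful integration developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(treated, control, external_controls=None, n_perm=10_000):
    """Toy two-sample permutation test on the difference in means.

    `external_controls` is a hypothetical illustration of external data:
    it is pooled with the trial controls inside the test statistic.
    This is a simplification, not the method from the Biometrika paper.
    """
    def stat(t, c):
        if external_controls is not None:
            c = np.concatenate([c, external_controls])
        return t.mean() - c.mean()

    observed = stat(treated, control)

    # Reassign treatment labels at random, keeping outcomes fixed
    pooled = np.concatenate([treated, control])
    n_t = len(treated)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if stat(perm[:n_t], perm[n_t:]) >= observed:
            count += 1
    # p-value: how often chance alone matches or beats the observed effect
    return (count + 1) / (n_perm + 1)

# Toy data: the effect is concentrated in half of the treated patients,
# mimicking a benefit confined to a subgroup
treated = np.concatenate([rng.normal(2.0, 1, 15), rng.normal(0, 1, 15)])
control = rng.normal(0, 1, 30)
p = permutation_test(treated, control)
print(f"p-value: {p:.4f}")
```

If the observed difference is rarely matched by the shuffled versions, the p-value is small and the signal is credible—exactly the intuition in the paragraphs above.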
A delicate balance
One of the most innovative aspects of the study lies in this balance. Integrating external data is risky: populations may differ, data may be incomplete, and hidden biases may exist. Yet the proposed method ensures that the rate of false positives remains controlled even in complex scenarios:
“the control of false positives does not require any assumption on the external data.”
This means the test works even when external data are not perfectly comparable—a highly realistic condition in clinical practice.
What the results show
Simulations and real-world applications, including an analysis of glioblastoma trials, confirm the method’s potential. The use of external data increases statistical power (i.e., the ability to detect real effects) compared to traditional methods, without compromising error control.
In particular, when external data are consistent with trial data, the benefits are substantial. When discrepancies exist, the method remains robust, avoiding false positives. This is crucial: many alternative approaches fail precisely when data are imperfect.
Faster trials, better decisions
The practical implications are significant. Integrating external data could:
- accelerate the development of new drugs
- reduce trial costs
- improve decisions about which therapies to pursue
In a context where time is often critical—especially in oncology—these advantages can translate into lives saved.
Moreover, the method paves the way for increasingly personalized medicine, where treatments are evaluated not only for the general population but for specific patient subgroups.
Beyond statistics: toward a new data culture
This study also signals a broader cultural shift: data are no longer confined within individual studies but become shared resources to be integrated and leveraged. The challenge now is twofold: improving the quality and accessibility of external data, and persuading regulators and pharmaceutical companies to adopt these new tools.