A New Theory to Investigate the Hidden Links Between Phenomena
Understanding how much two phenomena depend on each other is one of the most fascinating and difficult challenges in statistics. From Pearson's correlation, developed in the late 19th century, to the algorithms that power artificial intelligence today, scholars have created dozens of tools to measure the dependence between variables. But each method has its limitations: some capture only linear relationships, while others become unreliable when the data grow complex.
Emanuele Borgonovo and Giuseppe Savaré (Department of Decision Sciences and BIDSA, Bocconi), together with Alessio Figalli (ETH Zürich), Promit Ghosal (University of Chicago), and Elmar Plischke (HZDR, Germany), have attempted to bring order to this maze. Their study, "Convexity and Measures of Statistical Association," published in the Journal of the Royal Statistical Society, proposes a general theory that unifies many of the existing measures into a single coherent framework.
A common language for dependence
The underlying idea is as simple as it is powerful: measures of dependence arise from very different formulas, developed in different contexts, but they all seek to capture how much two variables influence each other. Borgonovo and colleagues show that seemingly unrelated tools, from Csiszár divergences to kernel-based methods and optimal transport, can be traced back to a single coherent mathematical structure, which helps us understand when and why certain measures work better than others.
This insight gives shape to a common language that helps bring classical statistics, information theory, and machine learning into dialogue.
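To get a concrete feel for why such a common language matters, consider a small illustration (ours, not the paper's): on the same data, a purely linear measure such as Pearson's correlation can report almost no association, while a nonlinear measure, here a rough histogram-based estimate of mutual information, detects a strong link. The sketch below, in Python, assumes nothing beyond NumPy and SciPy.

```python
# Illustrative sketch (not the paper's method): different dependence
# measures can tell very different stories about the same data.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
y = x**2 + 0.05 * rng.normal(size=x.size)   # Y is almost a function of X

r, _ = pearsonr(x, y)      # linear correlation: close to 0 here
rho, _ = spearmanr(x, y)   # rank correlation: also close to 0 here

def binned_mutual_information(x, y, bins=20):
    """Rough histogram-based estimate of mutual information (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

print(f"Pearson r    = {r:+.3f}")     # near zero despite strong dependence
print(f"Spearman rho = {rho:+.3f}")   # near zero as well
print(f"binned MI    = {binned_mutual_information(x, y):.3f}")  # clearly positive
```

Which of these numbers should one trust, and when? That is exactly the kind of question a unifying framework is meant to answer.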
Convexity: a way to navigate data
At the heart of this new theory is convexity, a mathematical concept that the authors place at the foundation of the entire construction. To get an idea, think of a wide, regular valley: if we draw an imaginary line between two points on its slopes, that line always stays above the ground. There are no hidden dips or sudden bumps. In statistics, a 'convex' measure behaves in the same way: it varies in a controlled, predictable manner, without abrupt changes of direction.
As the authors explain, thanks to convexity, information behaves naturally: it grows when details are added and decreases only when details are lost, never the other way around.
In other words, if the observation of a phenomenon is simplified, the number describing the relationship between the variables cannot suddenly become larger. A natural rule, translated into the precise language of mathematics.
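The following toy example, again ours and not taken from the paper, shows this monotonicity at work in the familiar case of mutual information: if the observation of X is simplified by merging its levels, the estimated dependence with Y can only stay the same or shrink (the data-processing inequality).

```python
# Toy illustration: coarsening the observation of X can never increase
# its measured dependence with Y (data-processing inequality).
import numpy as np

rng = np.random.default_rng(1)
x = rng.integers(0, 8, size=100_000)                   # X observed on 8 levels
y = (x // 2 + rng.integers(0, 2, size=x.size)) % 4     # Y depends on X, with noise

def mutual_information(a, b):
    """Plug-in mutual information (in nats) for discrete samples."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)                        # contingency table of counts
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)
    pb = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (pa @ pb)[mask])))

x_coarse = x // 2   # simplify the observation: merge the 8 levels into 4

print(f"I(X; Y)        = {mutual_information(x, y):.3f}")
print(f"I(coarse X; Y) = {mutual_information(x_coarse, y):.3f}  (never larger)")
```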
From theory to concrete data
Once the theoretical framework has been defined, the authors go further and show how to apply it in practice. They propose two estimation procedures: one based on nearest neighbors (a typical machine learning technique) and another that draws on a method developed by Pearson in 1905, reinterpreted in light of the new theory. The authors demonstrate that Pearson's estimators become asymptotically unbiased for a large family of association measures, answering a long-standing theoretical question.
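To give a rough sense of what such estimators look like, here is a minimal sketch of the two flavours, built from generic textbook constructions rather than the authors' exact procedures: a binned estimate in the spirit of Pearson's 1905 correlation ratio, and a nearest-neighbour-style variant that averages Y over the sample points closest in X.

```python
# Minimal sketch of the two estimation flavours (generic constructions,
# not the authors' estimators), targeting the correlation ratio
# eta^2 = Var(E[Y | X]) / Var(Y).
import numpy as np

def eta2_binned(x, y, n_bins=20):
    """Binned plug-in estimate: partition X into quantile cells and compare
    the spread of the within-cell means of Y to the total variance of Y."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    cells = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    between = 0.0
    for c in range(n_bins):
        in_cell = cells == c
        if in_cell.any():
            between += in_cell.mean() * (y[in_cell].mean() - y.mean()) ** 2
    return float(between / y.var())

def eta2_knn(x, y, k=25):
    """Nearest-neighbour flavour: estimate E[Y | X = x_i] by averaging Y over
    a window of k consecutive points around x_i in the X-sorted sample."""
    ys = y[np.argsort(x)]
    n = ys.size
    cond = np.empty(n)
    for i in range(n):
        lo = max(0, min(i - k // 2, n - k))
        cond[i] = ys[lo:lo + k].mean()
    return float(cond.var() / y.var())

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 5000)
y = np.sin(3 * x) + 0.3 * rng.normal(size=x.size)

print("eta^2, binned estimate:", round(eta2_binned(x, y), 3))
print("eta^2, k-NN estimate  :", round(eta2_knn(x, y), 3))
```

Both sketches target the same quantity; the theoretical question the paper addresses is how such plug-in constructions behave, in the limit of large samples, across a whole family of association measures.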
The framework also allows the authors to derive a central limit theorem, which makes it possible to quantify the uncertainty of the estimates and to construct more reliable statistical tests, an important step for applications ranging from economics to biology to the safety of artificial intelligence models.
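As a hedged illustration of what such results enable in practice, here is a simple independence test built around a generic association measure (distance correlation, used purely as a stand-in); the paper's own tests rely on the authors' central limit theorem rather than on permutations.

```python
# Sketch of an independence test around a generic association measure.
# Distance correlation is a stand-in here; it is not the paper's statistic.
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (biased, V-statistic version) for 1-D data."""
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()
    return float(np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean())))

def independence_pvalue(x, y, n_perm=200, seed=0):
    """Permutation p-value for H0: X and Y are independent."""
    rng = np.random.default_rng(seed)
    observed = distance_correlation(x, y)
    null = [distance_correlation(x, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(v >= observed for v in null)) / (1 + n_perm)

rng = np.random.default_rng(4)
x = rng.normal(size=300)
y_dep = np.abs(x) + 0.5 * rng.normal(size=x.size)   # genuinely depends on X
y_ind = rng.normal(size=x.size)                      # independent of X

print("p-value, dependent pair  :", independence_pvalue(x, y_dep))
print("p-value, independent pair:", independence_pvalue(x, y_ind))
```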
A compass for data science
In their conclusions, Borgonovo, Savaré, and their colleagues write that convexity “ensures minimality, non-negativity and monotonicity, making the measure maximum when Y is determined by X and minimum when Y is independent of X.” In simple terms: it offers a compass for carefully choosing the right way to measure the link between data, without falling into paradoxes or inconsistencies. And this class of measures, while not exhausting all possibilities, opens up new avenues of research into the geometric foundations of statistical dependence.
In an era in which relationships between variables—economic, climatic, or biological—are at the heart of every prediction, this research proposes a mathematics of consistency, capable of determining how much one thing really depends on another.