
Generating Better Images (and Ideas)
Imagine using an artificial intelligence model to generate a realistic human face. Each image comprises millions of pixels, yet human faces, however different, share the same basic features: two eyes, symmetry, familiar proportions. It is precisely in this latent structure that the study Memorization and Generalization in Generative Diffusion under the Manifold Hypothesis, published in the Journal of Statistical Mechanics: Theory and Experiment by Beatrice Achilli, Carlo Lucibello, Marc Mézard, and Enrico Ventura of Bocconi University and Luca Ambrogioni (Radboud University), finds the key to understanding when and why a generative model really works.
It's not just about aesthetics. This type of AI, called a generative diffusion model (DM), is now used to generate images, music, and even genetic sequences or medical scans. But how well does a model really understand what it generates? Is it capable of producing new, coherent examples? And when does it become a mere parrot, regurgitating examples that were already part of the training set?
Modern Hopfield networks
To study the phenomenon of memorization in generative models, i.e., the undesirable flip side of generalization, the authors exploit a connection between diffusion models and associative memory networks, a class of models inspired by neurobiology and introduced by John Hopfield in 1982, a contribution that earned him the Nobel Prize in Physics in 2024. Analyzing a modern reinterpretation of the Hopfield network, whose storage capacity grows exponentially with network size rather than linearly as in the original model, allowed the authors to characterize and mitigate the problem of memorization in diffusion-based generative models.
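To make the connection concrete, here is a minimal numpy sketch of the softmax-based retrieval rule that characterizes modern (dense) Hopfield networks. It is an illustrative formulation in the spirit of the models the authors analyze, not their exact construction, and all names and parameter values in it are invented for the example.

```python
import numpy as np

def modern_hopfield_update(patterns, query, beta=4.0):
    """One retrieval step of a modern (dense) Hopfield network.

    patterns: (num_patterns, dim) array of stored memories
    query:    (dim,) noisy probe to clean up
    beta:     inverse temperature; larger beta -> sharper retrieval
    """
    # Similarity of the query to every stored pattern
    scores = patterns @ query
    # Softmax attention over the memories; this exponential weighting
    # is what lets the network store exponentially many patterns
    weights = np.exp(beta * (scores - scores.max()))
    weights /= weights.sum()
    # New state: a convex combination of the stored patterns
    return weights @ patterns

# Toy demo: store random patterns, then recover one from a noisy copy
rng = np.random.default_rng(0)
memories = rng.standard_normal((50, 128))
probe = memories[7] + 0.5 * rng.standard_normal(128)
retrieved = modern_hopfield_update(memories, probe)
print(np.corrcoef(retrieved, memories[7])[0, 1])  # close to 1.0
```

In a diffusion model, the denoiser plays an analogous role: it pulls a noisy state toward the data it was trained on, which is exactly how memorization can creep in.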
Beneath the surface
The study starts from a well-known hypothesis in machine learning: complex data is not scattered randomly, but lies on a sort of hidden surface, a low-dimensional ‘manifold’. Think of human faces: each photo may contain millions of pixels, but the meaningful variations (expression, lighting, age...) move along a much simpler surface. The same holds for voice recordings, medical images, or scientific data: they are high-dimensional only in appearance.
The researchers use a theoretical model called the Hidden Manifold Model (HMM) to capture this hidden structure. According to Carlo Lucibello, assistant professor of computer science, “the real data we use in generative models is not random. It is constrained by laws, rules, and structures, and it is precisely these that make learning more effective.”
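As a rough illustration of the idea, not the paper's exact setup, the following numpy sketch generates synthetic data in the spirit of the HMM: every point lives in a high-dimensional visible space but is a fixed nonlinear function of just a few latent coordinates. The function names, the tanh nonlinearity, and the dimensions are assumptions chosen for the example.

```python
import numpy as np

def sample_hidden_manifold(n_samples, visible_dim, latent_dim, seed=0):
    """Draw synthetic data from a Hidden-Manifold-style process:
    high-dimensional points driven by few latent degrees of freedom."""
    rng = np.random.default_rng(seed)
    # Fixed random projection from latent space to visible space
    F = rng.standard_normal((visible_dim, latent_dim))
    # Latent coordinates: the true, low-dimensional degrees of freedom
    z = rng.standard_normal((n_samples, latent_dim))
    # The nonlinearity folds the flat latent space into a curved manifold
    return np.tanh(z @ F.T / np.sqrt(latent_dim))

X = sample_hidden_manifold(n_samples=1000, visible_dim=784, latent_dim=8)
# Quick check: the covariance spectrum is dominated by a handful of
# directions, betraying the low intrinsic dimension of the data
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(eigvals[::-1][:12])
```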
The right time to stop
Even networks that “overfit”, learning the data so well that they memorize it, are not a lost cause. One of the study's central findings concerns the optimal time to stop the generation process in order to maximize generalization. In diffusion models, each new image is generated by progressively removing noise from a cloud of random points. The longer the process runs, the closer the image gets to examples seen during training. Surprisingly, though, the best point to stop is not at the end of the process but earlier: at the stage when the model is starting to memorize yet is still producing new variants. “Maximum generalization,” explains Marc Mézard, full professor of theoretical physics and Fondazione Romeo ed Enrica Invernizzi Chair in Computer Science, “occurs during the memorization phase. It's a paradox: generalization is best when the model is just starting to copy.”
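This trade-off shows up even in a toy experiment. The sketch below is a simplification rather than the authors' procedure: it denoises a random point using the exact score of a Gaussian-smoothed empirical distribution, a stand-in for a perfectly memorizing network. Run to the end, the sample lands almost on top of a training point; stopped early, it is data-like but new. The schedule and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.standard_normal((20, 2))          # tiny training set in 2-D

def denoised_mean(x, sigma):
    """Posterior mean E[x0 | x] under the empirical training distribution
    smoothed with Gaussian noise of scale sigma (exact for this toy)."""
    d2 = ((train - x) ** 2).sum(axis=1)
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))  # stable softmax weights
    w /= w.sum()
    return w @ train

def generate(sigma_stop, n_steps=100, sigma_start=3.0):
    """Anneal the noise level from sigma_start down to sigma_stop,
    nudging the sample toward the data distribution at each level."""
    sigmas = np.geomspace(sigma_start, sigma_stop, n_steps)
    x = sigma_start * rng.standard_normal(2)   # start from pure noise
    for s in sigmas:
        # Move toward the posterior mean, then re-inject noise
        # at the current level (simple annealed denoising)
        x = denoised_mean(x, s) + 0.5 * s * rng.standard_normal(2)
    return x

# Run to the very end: the sample sits almost on a training point
x_full = generate(sigma_stop=0.01)
# Stop earlier: the sample is consistent with the data but still novel
x_early = generate(sigma_stop=0.3)
for x in (x_full, x_early):
    print(np.min(np.linalg.norm(train - x, axis=1)))
```

The distance to the nearest training point shrinks toward zero as the process runs to completion, which is the memorization regime the authors describe; stopping earlier trades a little fidelity for genuine novelty.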
A curse (partly) avoided
Another myth the study debunks is the curse of dimensionality: the belief that as the dimension of the data grows, more and more examples are needed to learn. The study shows that this "curse" concerns the latent dimension, not the visible one: if the latent manifold is low-dimensional, far less data is needed.
This is particularly useful for applications where data is rare or expensive to obtain: images of cancer cells, MRIs, chemical simulations... “In these cases,” Lucibello explains, “you don't need a disproportionate amount of data. You need to understand the underlying structure well.”
Why it matters
The work is theoretical and sophisticated, using tools from statistical physics and spin glass theory to model the behavior of diffusion models. But its implications are practical: it helps design more efficient models, select data more wisely, and optimize generation times.
In a context where generative AI is becoming ever more widespread, from medicine to the creative industries, knowing when to stop can make all the difference in producing images, sounds, or simulations that are not only beautiful to look at, but also intelligent, flexible, and sustainable.