
Generating Better Images (and Ideas)
Imagine using an artificial intelligence model to generate a realistic human face. Each image comprises millions of pixels, yet human faces, however different, share the same basic features: two eyes, symmetry, familiar proportions. It is precisely in this latent structure that the study Memorization and Generalization in Generative Diffusion under the Manifold Hypothesis, published in the Journal of Statistical Mechanics: Theory and Experiment by Beatrice Achilli, Carlo Lucibello, Marc Mézard, and Enrico Ventura of Bocconi University and Luca Ambrogioni (Radboud University), finds the key to understanding when and why a generative model really works.
It's not just about aesthetics. This type of AI, called a generative diffusion model (DM), is now used to generate images, music, and even genetic sequences or medical scans. But how well does a model really understand what it generates? Is it capable of producing new, coherent examples? And when does it become a mere parrot, regurgitating examples that were already part of the training set?
Modern Hopfield networks
To study the phenomenon of memorization in generative models, i.e., the undesirable flip side of generalization, the authors exploit a connection between diffusion models and associative memory networks, a class of models inspired by neurobiology and introduced by John Hopfield in 1982, a contribution that earned him the Nobel Prize in Physics in 2024. Analyzing a modern reinterpretation of the Hopfield network, whose storage capacity grows exponentially with network size rather than linearly as in the original model, allowed the authors to characterize and mitigate the problem of memorization in diffusion-based generative models.
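To make the connection concrete, here is a minimal numpy sketch of the softmax-based retrieval rule that characterizes modern (dense) Hopfield networks. It is an illustrative formulation in the spirit of the models the authors analyze, not their exact construction, and all names and parameter values in it are invented for the example.

```python
import numpy as np

def modern_hopfield_update(patterns, query, beta=4.0):
    """One retrieval step of a modern (dense) Hopfield network.

    patterns: (num_patterns, dim) array of stored memories
    query:    (dim,) noisy probe to clean up
    beta:     inverse temperature; larger beta -> sharper retrieval
    """
    # Similarity of the query to every stored pattern
    scores = patterns @ query
    # Softmax attention over the memories; this exponential weighting
    # is what lets the network store exponentially many patterns
    weights = np.exp(beta * (scores - scores.max()))
    weights /= weights.sum()
    # New state: a convex combination of the stored patterns
    return weights @ patterns

# Toy demo: store random patterns, then recover one from a noisy copy
rng = np.random.default_rng(0)
memories = rng.standard_normal((50, 128))
probe = memories[7] + 0.5 * rng.standard_normal(128)
retrieved = modern_hopfield_update(memories, probe)
print(np.corrcoef(retrieved, memories[7])[0, 1])  # close to 1.0
```

In a diffusion model, the denoiser plays an analogous role: it pulls a noisy state toward the data it was trained on, which is exactly how memorization can creep in.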
Beneath the surface
The study starts from a well-known hypothesis in machine learning: complex data is not scattered randomly, but lies on a sort of hidden surface, a low-dimensional ‘manifold’. Think of human faces: each photo may contain millions of pixels, but the meaningful variations (expression, lighting, age...) move along a much simpler surface. The same holds for voice recordings, medical images, or scientific data: they are high-dimensional only in appearance.
The researchers use a theoretical model called the Hidden Manifold Model (HMM) to capture this hidden structure. According to Carlo Lucibello, assistant professor of computer science, “the real data we use in generative models is not random. It is constrained by laws, rules, and structures, and it is precisely these that make learning more effective.”
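As a rough illustration of the idea, not the paper's exact setup, the following numpy sketch generates synthetic data in the spirit of the HMM: every point lives in a high-dimensional visible space but is a fixed nonlinear function of just a few latent coordinates. The function names, the tanh nonlinearity, and the dimensions are assumptions chosen for the example.

```python
import numpy as np

def sample_hidden_manifold(n_samples, visible_dim, latent_dim, seed=0):
    """Draw synthetic data from a Hidden-Manifold-style process:
    high-dimensional points driven by few latent degrees of freedom."""
    rng = np.random.default_rng(seed)
    # Fixed random projection from latent space to visible space
    F = rng.standard_normal((visible_dim, latent_dim))
    # Latent coordinates: the true, low-dimensional degrees of freedom
    z = rng.standard_normal((n_samples, latent_dim))
    # The nonlinearity folds the flat latent space into a curved manifold
    return np.tanh(z @ F.T / np.sqrt(latent_dim))

X = sample_hidden_manifold(n_samples=1000, visible_dim=784, latent_dim=8)
# Quick check: the covariance spectrum is dominated by a handful of
# directions, betraying the low intrinsic dimension of the data
eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))
print(eigvals[::-1][:12])
```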
The right time to stop
Even networks that “overfit”, learning the data so well that they memorize it, are not a lost cause. One of the study's central findings concerns the optimal time to stop the generation process in order to maximize generalization. In diffusion models, each new image is generated by progressively removing noise from a cloud of random points. The longer the process runs, the closer the image gets to examples seen during training. Surprisingly, though, the best point to stop is not at the end of the process but earlier: at the stage when the model is starting to memorize yet is still producing new variants. “Maximum generalization,” explains Marc Mézard, full professor of theoretical physics and Fondazione Romeo ed Enrica Invernizzi Chair in Computer Science, “occurs during the memorization phase. It's a paradox: generalization is best when the model is just starting to copy.”
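This trade-off shows up even in a toy experiment. The sketch below is a simplification rather than the authors' procedure: it denoises a random point using the exact score of a Gaussian-smoothed empirical distribution, a stand-in for a perfectly memorizing network. Run to the end, the sample lands almost on top of a training point; stopped early, it is data-like but new. The schedule and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.standard_normal((20, 2))          # tiny training set in 2-D

def denoised_mean(x, sigma):
    """Posterior mean E[x0 | x] under the empirical training distribution
    smoothed with Gaussian noise of scale sigma (exact for this toy)."""
    d2 = ((train - x) ** 2).sum(axis=1)
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))  # stable softmax weights
    w /= w.sum()
    return w @ train

def generate(sigma_stop, n_steps=100, sigma_start=3.0):
    """Anneal the noise level from sigma_start down to sigma_stop,
    nudging the sample toward the data distribution at each level."""
    sigmas = np.geomspace(sigma_start, sigma_stop, n_steps)
    x = sigma_start * rng.standard_normal(2)   # start from pure noise
    for s in sigmas:
        # Move toward the posterior mean, then re-inject noise
        # at the current level (simple annealed denoising)
        x = denoised_mean(x, s) + 0.5 * s * rng.standard_normal(2)
    return x

# Run to the very end: the sample sits almost on a training point
x_full = generate(sigma_stop=0.01)
# Stop earlier: the sample is consistent with the data but still novel
x_early = generate(sigma_stop=0.3)
for x in (x_full, x_early):
    print(np.min(np.linalg.norm(train - x, axis=1)))
```

The distance to the nearest training point shrinks toward zero as the process runs to completion, which is the memorization regime the authors describe; stopping earlier trades a little fidelity for genuine novelty.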
A curse (partly) avoided
Another myth the study debunks is the curse of dimensionality: the belief that as the dimension of the data grows, more and more examples are needed to learn. The study shows that this "curse" concerns the latent dimension, not the visible one: if the latent manifold is low-dimensional, far less data is needed.
This is particularly useful for applications where data is rare or expensive to obtain: images of cancer cells, MRIs, chemical simulations... “In these cases,” Lucibello explains, “you don't need a disproportionate amount of data. You need to understand the underlying structure well.”
Why it matters
The work is theoretical and sophisticated, using tools from statistical physics and spin glass theory to model the behavior of diffusion models. But its implications are practical: it helps design more efficient models, select data more wisely, and optimize generation times.
In a context where generative AI is becoming ever more widespread, from medicine to the creative industries, knowing when to stop can make all the difference in producing images, sounds, or simulations that are not only beautiful to look at, but also intelligent, flexible, and sustainable.