How physics can improve image-generating AI
TODAY’S BEST image-generating artificial-intelligence (AI) models are remarkable. Ask OpenAI’s DALL-E 3, or its counterparts Midjourney and Stable Diffusion, to draw a penguin sipping on a vodka martini on the French Riviera and they will do so with aplomb. Ask them to replicate it in the style of Rembrandt or Caravaggio and they will speedily oblige.
These abilities all depend on a family of physics-inspired algorithms known as diffusion models. For now, they reign supreme. But that may not always be the case. A team of physicists and computer scientists at the Massachusetts Institute of Technology (MIT) has been taking inspiration from the laws of nature to come up with a series of increasingly sophisticated algorithms that can generate higher-quality images faster, and with smaller training data sets, than diffusion models.
Diffusion models mimic the maths of the physical process of diffusion, the flow of particles from areas of high to low concentration as they are randomly jostled about in space. The MIT team’s more advanced algorithms make use, instead, of the equations of electromagnetism—and may one day even use the mathematics that govern the forces at play in the atomic nucleus. This work suggests that computer scientists have barely scratched the surface of how generative algorithms can work. A new school of AI art is emerging.
The goal of a good generative algorithm is, ultimately, to create bespoke images from scratch. One growing class of models, to which DALL-E 3 and its counterparts belong, does this by taking vast data sets of training images and distorting them, pixel by pixel, until they are indistinguishable from visual static. As the patterns underlying these distortions are identified, the algorithm can run combinations of them in reverse, allowing entirely new images to be born out of nothing more than background noise.
This means that mathematical ways of inducing distortion are in high demand. Enter diffusion. Imagine, for simplicity’s sake, a monochrome picture consisting of a single pixel. To a computer scientist, that picture can be represented by a point on a single axis running from white to black. Or, in other words, as a point in one-dimensional space. For every pixel that is added to the picture, the number of dimensions increases by one—and the picture is now represented by a point in multidimensional space. If random noise is added to that point (by changing the colour of the image’s constituent pixels), it will then move randomly in its multidimensional space—a process mathematically identical to that of a particle undergoing diffusion.
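That random walk is simple enough to sketch in a few lines of Python. This is a toy illustration of the forward "noising" process described above, not the implementation of any particular model; the step count and noise scale are arbitrary choices for the sake of the example.

```python
import numpy as np

# Toy forward diffusion: treat a picture as a point in N-dimensional
# space (one dimension per pixel) and jostle it with random noise.
rng = np.random.default_rng(0)

def diffuse(image, steps=1000, noise_scale=0.02):
    """Repeatedly add small Gaussian perturbations to every pixel.

    After enough steps the result is indistinguishable from static.
    A diffusion model is trained to run this walk in reverse,
    turning static back into a plausible picture.
    """
    point = image.astype(float).copy()
    for _ in range(steps):
        point += rng.normal(0.0, noise_scale, size=point.shape)
    return point

picture = np.zeros(64)   # a 64-pixel monochrome "image", all white
static = diffuse(picture)
```

After 1,000 steps the pixel values have wandered far from their starting point, with a spread that grows with the square root of the number of steps, exactly as for a diffusing particle.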
The fact that mimicking such simple physical processes has had such profound computational benefits caught the attention of Max Tegmark, a physicist at MIT, and Tommi Jaakkola, a computer scientist there, and their graduate students, Yilun Xu and Ziming Liu, in 2022. Together, they set out to explore whether models trained on more complex processes might do an even better job of image generation. They started by toying with the physics of electrically charged particles. Unlike in standard diffusion, the journeys of charged particles are not truly random. Repelled and attracted by their neighbours, they are governed instead by the electric field in which they exist.
It’s electrifying
This behaviour can be emulated in the way that noise is added to a digital image. Recall that images can be represented as points in a multidimensional space defined by the colours of each pixel. If these points are treated like particles with identical electric charge, they ought to repel one another, drifting apart until the system reaches electrostatic equilibrium. Or, in other words, each image will change in response to every other image until all have been sufficiently distorted.
It turns out that a machine-learning model trained to reverse this process can have considerable advantages. This is because the distorting noise is not merely random, as in diffusion models, but carries additional information about the training data. That makes for a more efficient algorithm. Mr Xu, Mr Liu and their colleagues then published a preprint outlining this new class of models. They called them “Poisson flow generative models” (PFGMs), named for Poisson’s equation, which describes the electric field created by static electrical charges. Judged by industry standards, PFGMs generate images of equal or better quality than state-of-the-art diffusion models, while being less error-prone and requiring between ten and 20 times fewer computational steps.
The researchers were not done yet. They also turned their attention to Coulomb’s law, the equation that governs the strength of the electric field which exists between two charges (and from which Poisson’s equation can be derived). The researchers found that changing the number of dimensions in which Coulomb’s law operates has implications for a PFGM’s behaviour. Fewer dimensions result in models that require more data to train but that need fewer parameters, make fewer errors and produce more consistent images. More dimensions result in models that require less data to train but are bulkier, more error-prone and less consistent.
In a subsequent preprint, the team called this broader family of electrostatic models PFGM++. They also made a surprising discovery. When the number of dimensions in the equations is taken to infinity, the distortion algorithm behaves like a standard diffusion model. This means that PFGM++ folds all the current physics-inspired generative models into one family.
Still more complex distortion mechanisms beckon. The next target for Messrs Xu and Liu is the weak interaction, which, alongside electromagnetism, gravity and the strong interaction, is a fundamental force of nature. (Imperceptible at human scales, it is responsible for certain types of radioactive decay.) Conveniently, its equations are almost identical to those used in the PFGM++ family of models.
The weak force, however, has special properties that the electromagnetic force does not. For one thing, it does not need to conserve the number of particles. Pairs of particles can mutually annihilate, and new ones can pop into being. If this physics is translated into an algorithm, it may unlock new behaviour: compressing data with record efficiency, for example, or offering applications in cell biology where objects multiply or die out. How well it can draw a penguin, though, remains to be seen. ■