China’s AI firms are cleverly innovating around chip bans


TODAY’S TOP artificial-intelligence (AI) models rely on large numbers of cutting-edge processors known as graphics processing units (GPUs). Most Western companies have no trouble acquiring them. Llama 3, the newest model from Meta, a social-media giant, was trained on 16,000 H100 GPUs from Nvidia, an American chipmaker. Meta plans to stockpile 600,000 more before year’s end. xAI, a startup backed by Elon Musk, has built a data centre in Memphis powered by 100,000 H100s. And though OpenAI, the other big model-maker, is tight-lipped about its GPU stash, it had its latest processors hand-delivered by Jensen Huang, Nvidia’s boss, in April.

This kind of access is a distant dream for most Chinese tech firms. Since October 2022 America has blocked the sale of high-performance processors to China. Some Chinese firms are rumoured to be turning to the black market to get their hands on these coveted chips. But the majority have shifted their focus to making the most of limited resources. Their results are giving Western firms food for thought.

Among the innovators is DeepSeek, a Chinese startup based in Hangzhou. Its latest model, DeepSeek-v2.5, launched in early September, holds its own against leading open-source models on coding challenges as well as tasks in both English and Chinese. These gains are not down to size: DeepSeek is said to have just over 10,000 of Nvidia’s older GPUs—a big number for a Chinese firm, but small by the standards of its American competitors.

DeepSeek makes up for this shortfall in several ways. The first is architectural: its model is built from many smaller networks, each best suited to a different kind of problem. This “mixture of experts” approach lets the model delegate each task to the right network, cutting the computation needed for each query. Though DeepSeek’s model has 236bn “parameters” (the numerical values, tuned during training, that encode what it has learned), it activates fewer than a tenth of them for each new chunk of information it processes. The model also compresses incoming data before processing them, which helps it handle large inputs more efficiently.
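The routing idea is easy to see in miniature. The Python sketch below shows top-k gating of the sort mixture-of-experts models use; the layer sizes, the number of experts and the single-matrix “experts” are illustrative stand-ins, not DeepSeek’s actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2  # hidden size, expert count, experts used per token

# Each "expert" is a small network; a single weight matrix stands in for one here.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)  # learned gating weights

def moe_layer(x):
    """Send token vector x to its top-k experts and mix their outputs."""
    scores = x @ router                  # one affinity score per expert
    top = np.argsort(scores)[-TOP_K:]    # indices of the k best-matched experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                 # softmax over the chosen experts only
    # Only TOP_K of the N_EXPERTS weight matrices are touched for this token,
    # so compute per token scales with k, not with the total parameter count.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

print(moe_layer(rng.standard_normal(D)).shape)  # (16,)
```

Because only the chosen experts’ weights are used, the work done per token grows with k rather than with the total parameter count, which is how a 236bn-parameter model can run as if it were far smaller.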

DeepSeek is not alone in finding creative solutions to a GPU shortage. MiniCPM, an open-source model developed by Tsinghua University and ModelBest, an AI startup, comes in versions with 2.4bn and 1.2bn parameters. Despite its small size, MiniCPM performs comparably on language-related tasks to large language models (LLMs) with between 7bn and 13bn parameters. Like DeepSeek’s model, it combines a mixture-of-experts approach with input compression. Like other small models, however, MiniCPM may perform poorly outside its specific field of training.

MiniCPM’s tiny size makes it well-suited for personal devices. In August its creators released a version of the model for mobile phones, which supports multiple languages and works with various types of data, from text and images to audio.
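Back-of-envelope arithmetic shows why parameter count matters so much on a handset. The Python sketch below estimates storage for the weights alone at different numerical precisions; the figures are illustrative and ignore activations, the key-value cache and runtime overhead.

```python
# Rough weight-storage estimate for language models of different sizes.
# Illustrative only: real on-device memory use also includes activations,
# the key-value cache and runtime overhead.
def weights_gb(params_bn: float, bits_per_weight: int) -> float:
    return params_bn * 1e9 * bits_per_weight / 8 / 1e9

for params in (1.2, 2.4, 7.0, 13.0):
    print(f"{params:>4}bn params: {weights_gb(params, 16):5.1f} GB at 16-bit, "
          f"{weights_gb(params, 4):4.1f} GB at 4-bit")
```

At 4-bit precision a 2.4bn-parameter model needs roughly 1.2GB for its weights, comfortably within a modern phone’s memory; a 13bn-parameter one, at some 6.5GB, is a far tighter squeeze.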

Similar approaches are being tried elsewhere. FlashAttention-3, an algorithm developed by researchers from Together.ai, Meta and Nvidia, speeds up the training and running of LLMs by tailoring its design to Nvidia’s H100 GPUs. JEST, another algorithm, released in July by Google DeepMind, is first trained on a small quantity of high-quality data before being let loose on larger, lower-quality data sets. The company claims this approach is 13 times faster and ten times more efficient than other methods. Researchers at Microsoft, which backs OpenAI, have also released a small language model called Phi-3 mini with around 4bn parameters.
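FlashAttention-3’s real gains come from matching the algorithm’s memory movement to the H100’s hardware, which cannot be reproduced in a few lines. What can is the underlying “online softmax” trick that lets attention be computed block by block without ever holding the full score matrix in memory. The Python sketch below, with illustrative sizes, computes exact attention for a single query this way and checks it against the ordinary full-matrix calculation.

```python
import numpy as np

def blockwise_attention(q, K, V, block=128):
    """Exact attention for one query, scanning keys and values in blocks.

    A running maximum and running sum ("online softmax") reproduce
    softmax(q @ K.T / sqrt(d)) @ V without ever materialising all the
    scores at once -- the memory-saving idea behind FlashAttention.
    """
    m, s = -np.inf, 0.0            # running max and softmax denominator
    acc = np.zeros(V.shape[1])     # running weighted sum of value vectors
    for i in range(0, len(K), block):
        scores = K[i:i + block] @ q / np.sqrt(len(q))
        m_new = max(m, scores.max())
        scale = np.exp(m - m_new)  # rescale everything accumulated so far
        w = np.exp(scores - m_new)
        s = s * scale + w.sum()
        acc = acc * scale + w @ V[i:i + block]
        m = m_new
    return acc / s

rng = np.random.default_rng(1)
q = rng.standard_normal(32)
K, V = rng.standard_normal((1000, 32)), rng.standard_normal((1000, 32))
p = np.exp(K @ q / np.sqrt(32)); p /= p.sum()  # ordinary full-matrix softmax
assert np.allclose(blockwise_attention(q, K, V), p @ V)
```

Processing keys and values in blocks keeps the working set small enough to live in a GPU’s fast on-chip memory, which is where the speed-up comes from in practice.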

For Chinese firms, unlike those in the West, doing more with less is not optional. But this may be no bad thing. After all, says Nathan Benaich of Air Street Capital, an AI investment fund, “The scarcity mindset definitely incentivises efficiency gains.”
