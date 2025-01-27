A Chinese AI vendor's new large language model is making technology vendors in the U.S. rethink the development and training of generative AI reasoning models.

On Jan. 20, DeepSeek introduced its first generation of reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.

Since its release, DeepSeek's AI assistant has taken the top spot from OpenAI’s ChatGPT as the most downloaded free app on iOS.

The new LLM's immediate worldwide popularity sent AI chipmakers’ stocks, particularly that of AI chip giant Nvidia, plummeting on Monday as tech investors lost confidence in U.S. AI vendors. Nvidia lost 17% of its value that day, wiping out $589 billion of its market capitalization, while the tech-heavy NASDAQ composite dropped 3%.
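As a rough check on the scale of that drop, the two reported figures imply a market capitalization before the sell-off. The sketch below is simple arithmetic on the rounded numbers in this article; the function name is illustrative, and the result is an estimate, not a reported value.

```python
def implied_market_cap(value_lost_billion, drop_fraction):
    """Back out the before/after market cap implied by a loss and a percentage drop.

    Assumes both inputs are the rounded figures quoted in the article,
    so the outputs are approximations only.
    """
    before = value_lost_billion / drop_fraction
    after = before - value_lost_billion
    return before, after


# Figures from the article: a 17% drop that erased $589 billion.
before, after = implied_market_cap(589.0, 0.17)
print(f"Implied value before: ${before / 1000:.2f} trillion")
print(f"Implied value after:  ${after / 1000:.2f} trillion")
```

On these rounded inputs, Nvidia's implied value was roughly $3.5 trillion before the drop and roughly $2.9 trillion after it.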

DeepSeek-R1-Zero is a model trained with reinforcement learning, a type of machine learning that trains an AI system to perform a desired action by rewarding that action and penalizing undesired ones.
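The basic mechanism of learning from a reward signal can be illustrated with a toy example. The sketch below is a minimal two-action learner, not DeepSeek's training method; the action names, reward values and parameters are all hypothetical and chosen only for illustration.

```python
import random


def train_policy(steps=500, epsilon=0.1, learning_rate=0.1, seed=0):
    """Toy reinforcement learning loop: estimate action values from rewards alone."""
    rng = random.Random(seed)
    actions = ["undesired", "desired"]
    # Hypothetical reward signal: +1 for the desired action, -1 for the other.
    rewards = {"desired": 1.0, "undesired": -1.0}
    values = {action: 0.0 for action in actions}

    for _ in range(steps):
        if rng.random() < epsilon:
            # Occasionally explore a random action.
            action = rng.choice(actions)
        else:
            # Otherwise pick the action with the highest estimated value.
            action = max(actions, key=values.get)
        reward = rewards[action]
        # Nudge the estimate for the chosen action toward the observed reward.
        values[action] += learning_rate * (reward - values[action])

    return values


values = train_policy()
best_action = max(values, key=values.get)
print(best_action, values)
```

The learner is never told which action is correct. It tries actions, observes the reward or penalty, and shifts its preference toward the action that pays off.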

DeepSeek-R1 is a version of DeepSeek-R1-Zero that addresses the earlier model's problems with readability and language mixing, according to the AI startup.

Reasoning and open source

DeepSeek-R1 is comparable to OpenAI o1 models in performing reasoning tasks, the startup said. Both models are open source and come in six parameter sizes: 1.5B, 7B, 8B, 14B, 32B and 70B.

The open source release continues the interplay between open source and closed source models. Meta’s Llama family of open models has become widely popular as enterprises look to fine-tune models to use with their own private data, and that popularity has spawned increasing demand for open source generative AI systems.

Founded in 2023, DeepSeek innovated out of necessity. It had to work around the infrastructure problem created for Chinese companies by the U.S. government’s restrictions on Chinese access to top AI chips.

Given the hardware restrictions, DeepSeek's achievement is impressive, said Gartner analyst Arun Chandrasekaran. The startup inexpensively built an open source model that performs well on reasoning compared with established models from big AI vendors.

"The conventional thinking was that LLMs are getting commoditized, so the future is building more reasoning models," Chandrasekaran said.

In line with that trend, Google in December introduced Gemini 2.0, which included reasoning capabilities. The models in the OpenAI o1 series have also been trained with reinforcement learning to perform complex reasoning.

Despite prominent vendors introducing reasoning models, it was expected that few vendors could build that class of models, Chandrasekaran said.

"Nobody saw a Chinese company actually coming up with a … reasoning model," he said. "That in itself is really noteworthy."

DeepSeek's ability to use various models and techniques to turn any LLM into a reasoning model is also innovative, said Futurum Group analyst Nick Patience.

The excitement about DeepSeek also comes from a need for AI models to consume less power and cost less to run, said Mark Beccue, an analyst with Enterprise Strategy Group, now part of Omdia.

DeepSeek says it trained its latest model for two months at a cost of less than $6 million. By comparison, the cost to train OpenAI’s biggest model, GPT-4, was about $63 million, excluding employee salaries.

"Models must become cheaper to run and they must become more accurate in order for GenAI to scale for enterprise," Beccue said. "In terms of running cheaper, model makers, the chip makers and other hardware manufacturers and data center players all know this and are working towards that goal. It's up to the model makers to deliver more accurate AI responses."

Constraints and innovations

But some observers are skeptical that the startup, which originated as a hedge fund firm, performed training and inferencing of its model as cheaply as it claims, Chandrasekaran said.

"One theory is that constraints often create innovations," he said, adding that DeepSeek’s lack of access to modern, expensive GPUs could have forced the vendor to create an innovative technology without incurring their cost. "The other way to think about [DeepSeek] is that we don't know the infrastructure that it is trained on."

DeepSeek is not the only AI vendor or technology company in China that could turn limitations into innovation, Patience said.

"When you put these constraints on a country so large with much understanding of how to build electronics ... you could see them eventually getting to the stage where they're going to be building their own GPU competitors," Patience said. "That's a long way out, but I suspect that will happen."