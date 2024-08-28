AI hardware startup Cerebras Systems' new AI inference tool could challenge Nvidia's GPU offerings, but the vendor faces many hurdles in winning over enterprises.

On Tuesday, the AI vendor introduced Cerebras Inference, a new product that delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B. Cerebras Inference is faster than Nvidia's GPU-based hyperscale cloud, Cerebras said.

It is powered by Cerebras' Wafer-Scale Engine and costs less than GPU-based offerings, the AI vendor said.

Change in the market

Cerebras Inference reflects a shift in the generative AI market, according to Arun Chandrasekaran, an analyst at Gartner. In the initial stage of the generative AI hype, there was a lot of emphasis on training. Now, the market is shifting toward the cost and performance of inferencing, he said.

"It is also a sign that AI use cases are starting to proliferate and expand into the enterprise," Chandrasekaran said. "Which is why the innovation is not just happening in the training aspect of it. It's happening in the inferencing aspect of it."

As GenAI use cases grow in the enterprise, the performance of inferencing is becoming more important, providing an opportunity for vendors such as Cerebras, Chandrasekaran said.

However, the opportunity also extends to emerging specialized cloud providers that build their own chips and offer open source models on top of them. Therefore, while Cerebras can differentiate itself on performance and might be able to outperform even Nvidia, it will also have to compete against hyperscalers such as Microsoft, AWS and Google, and specialized inferencing providers such as Groq, which recently raised $640 million.