Generative AI vendors xAI and Perplexity released new models and products to challenge mainstream vendors.

Amid controversy surrounding its Grok AI chatbot making a series of antisemitic comments, xAI released Grok 4 on Wednesday night.

During a livestream on X, xAI's founder and X owner, Elon Musk, said the model can perform at a postgraduate level in mathematics, chemistry and linguistics, based on tests such as the AI benchmark Humanity's Last Exam.

"With respect to academic questions, Grok 4 is better than a PhD level in every subject, with no exception," Musk said during the livestream.

He added that while the multimodal generative AI model has not yet discovered new technologies, it could do so later this year or by 2026.

"AI is advancing faster than any human," Musk said.

Meanwhile, upstart AI search vendor Perplexity released an AI browser.

Examining Grok 4

The new model has reasoning and problem-solving capabilities and uses DeepSearch to access factual information from the web, including the X platform. DeepSearch is a tool for web-based analysis that helps with complex queries requiring multiple steps.

Grok 4 can process text and image inputs and has a new voice called Eve. The model can also perform multiple tasks simultaneously and is agentic, meaning it can use one or multiple agents to carry out functions.

It has a 256k context window and comes in standard and Heavy versions. Standard costs $30 per month, and Heavy costs $300 per month. The standard version performs single-agent tasks, while the Heavy version is multi-agent.

The release of Grok 4 comes only a few months after Grok 3 was released earlier this year, and days after Grok produced a slew of antisemitic responses.

While Grok 4 shows the progress xAI is making in foundation models, the uproar over the model overshadowed the latest version's technical capabilities, said Arun Chandrasekaran, an analyst with Gartner.

"They have solid research and technical capabilities," Chandrasekaran said.

Also, the benchmarks that xAI cites seem accurate, but enterprises should not base their decisions about models on benchmarks, said Bradley Shimmin, an analyst with Futurum Group.

"It is very much a guidepost, at best," Shimmin said. "It tells us that Grok 4 aligns with other frontier-scale models."

He added that the Grok models have been in line with other frontier models for some time, but the update with Grok 4 shows that xAI has been trying to improve the model's ability to exceed other models on Humanity's Last Exam.

Safe and responsible AI

Despite the advancement, xAI needs to focus on responsible and safe AI, according to many tech observers.

"They need to focus more on guardrails," Chandrasekaran said.

XAI should concentrate more on safety and ensure that safety mechanisms are layered throughout the entire process of training and releasing a model, including the handling of prompt inputs.

"Particularly in the case of Grok, it's more about the recency," Chandrasekaran said.

This is because Grok seems to be taking context from content on the X social media platform, which is known for its sometimes virulent and uncensored arguments about politics and culture.

"They need to have a better filtering way from the context because otherwise the model could be very easily baited and biased from the recency of the inputs that are coming from X."

In response to the comments the Grok chatbot made about the Holocaust and its false statements about "white genocide" in South Africa, xAI blamed a programming error.

But for some, the model's offensive hallucinations go beyond an error made by a computer system.

"This is just the latest instance in which [Musk's] work and reputation are bound up with antisemitism," said Michael Bennett, associate vice chancellor for data science and artificial intelligence strategy at the University of Illinois Chicago. "For the industry, it's just a clear indicator that there's still a lot of work to be done to get these models to produce useful, unbiased and socially acceptable responses. For his enterprises, it's a further datapoint suggesting that his antisemitism perhaps is not a one-off."

Permissiveness in the industry

The model's responses also signal an attitude of laxness in the AI industry that has cropped up over the last year, said Kashyap Kompella, CEO of RPA2AI Research.

"The Grok incident is a sharp reminder that unfettered AI is a bad idea," Kompella said. "Grok's shenanigans expose the challenges of letting out AI chatbots unsupervised. We are ignoring and underinvesting in AI governance and guardrails. If there is a silver lining, this incident should wake up the AI industry to take AI governance seriously."

Taking AI governance seriously is especially important because these tools and technologies reach beyond the bounds of the U.S. and its traditions of free speech, Bennett said.

"For technologies that enable speech that reaches a broader audience ... the norms that we ought to be targeting to get the technology to align with, must necessarily be broad as well."

The lack of governance could also affect xAI's ability to attract enterprise customers.

"Model safety and responsible AI is a critical evaluation factor for a lot of enterprises; it's an area where xAI needs to make a lot of progress if they want to be a serious enterprise contender," Chandrasekaran said.