
Why enterprise AI is moving toward customization

By Lev Craig

Agents, assistants, copilots -- the language of AI is increasingly focused on autonomy, a theme that ran through many of the talks at this year's EmTech AI conference, hosted by MIT Technology Review.

But for business users, the most promising developments aren't flashy general agents. The most tangible progress is coming from a different direction: customized AI systems designed for real-world workflows.

These tools don't promise general intelligence, and that's the point. Whereas foundation models like OpenAI's GPT series or Anthropic's Claude are trained on broad internet data and can handle a wide range of downstream tasks, specialist systems are built to handle defined domains.

"If you're using a foundation model, that's absolutely great if you want it to give you an answer for a poem that you're writing or ... a recipe," said Eleanor Lightbody, CEO of legal tech company Luminance, in an interview with Informa TechTarget. "When you're working with lawyers ... you need to have a system that is accurate every single time. And, if it's not accurate, it tells the human when it doesn't know the answer."

The limits of current agentic systems

Both the capabilities and limitations of today's general AI agents were on display during the session "AI Agents Transforming Digital Interaction," led by Reiichiro Nakano, a research scientist at OpenAI, and Yash Kumar, an engineer on OpenAI's new product explorations team.

In that session, the two demoed Operator, an AI agent that performs tasks by navigating the internet in a human-like manner through clicking, scrolling and typing. The interface is polished, and the demo tasks focused on everyday activities: listing conference speakers and finding them on LinkedIn, buying basketball tickets on StubHub, or ordering groceries via Instacart.

But the performance was slow and uncertain. OpenAI ran the three tasks simultaneously -- nominally to show that Operator could handle multiple requests, but the parallelism also had the effect of masking the plodding pace of each process.

More striking was the tool's constant need for confirmation. Operator asks for user approval before proceeding with minor actions, coming off a bit like an anxious intern looking for signoff. This can feel especially obtrusive when the agent is handling tasks that are trivially easy for humans, like navigating StubHub and Instacart.

In fairness, that friction is there by design. Kumar and Nakano framed Operator's interruptions as a safety measure, a way to prevent mistakes and unintended actions. "We trained the model to ask for confirmations every time it tries to do something that might have an effect on the real world," Nakano said in a Q&A following the presentation.
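
For readers curious what that gate looks like in practice, here is a minimal, hypothetical sketch of confirmation gating in an agent loop. It is not OpenAI's implementation: the class and function names are invented for illustration, and the idea is simply that any action flagged as having a real-world side effect pauses for explicit human approval.

```python
# Illustrative sketch (not OpenAI's code): actions flagged as having
# real-world side effects are paused until a human approves them.
from dataclasses import dataclass

@dataclass
class AgentAction:
    description: str        # e.g. "Click 'Place order' on Instacart"
    has_side_effect: bool   # True if the step changes something outside the browser

def run_with_confirmation(actions: list[AgentAction]) -> None:
    for action in actions:
        if action.has_side_effect:
            answer = input(f"Agent wants to: {action.description}. Approve? [y/N] ")
            if answer.strip().lower() != "y":
                print(f"Skipped: {action.description}")
                continue
        print(f"Executing: {action.description}")
        # ... perform the actual click/scroll/type step here ...

if __name__ == "__main__":
    run_with_confirmation([
        AgentAction("Search StubHub for basketball tickets", has_side_effect=False),
        AgentAction("Purchase two tickets with the saved card", has_side_effect=True),
    ])
```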

But the tradeoff between oversight and usability is clear: The more checks are required, the less independently capable an agent really is. And today's systems still struggle when it comes to reasoning and long-term planning, including on some simple tasks.

Earlier this year, for example, Platformer's Casey Newton tested Operator on a grocery purchasing task of his own, noting that it launched by searching for milk in Des Moines, Iowa, rather than asking for his location or grocery list. "The process was painstaking and inefficient in a way that personally made me laugh, but I imagine might drive others insane," he wrote.

At EmTech AI, OpenAI's clearest Operator success story was, in fact, a fairly anodyne enterprise customer service use case: automatically retrieving ChatGPT receipts from internal support systems.

"The model just fires up Intracom, which is what we use for our customer support, looks at it and figures out, 'I need to send the invoice,'" Kumar said in the presentation. Then, "[the agent] goes to Stripe, clicks the invoice button and it's done."

It's a useful business automation, but not exactly science fiction. Today's agents still get tripped up by multistep workflows -- Kumar described Operator's ideal task length as 10 to 15 minutes -- and when they do act, their decisions must be scrutinized for accuracy, bias and security.

Domain-specific AI shows practical value for enterprises

These limitations of general agents are one reason why many companies are instead seeking out constrained, customized AI systems. Those kinds of narrowly scoped tools, built not for general reasoning but for getting a specific job done, are already proving useful.

Some task-specific AI tools have been derided as "ChatGPT wrappers": simple interfaces tacked onto existing models. But when executed well, a domain-specific AI tool is useful precisely because it does more than repackage; it tailors a general-purpose model or technique to the structure and needs of a particular profession.

"We need models to be contextualized," said Margarida Garcia, vice president of operations at AI coding startup Poolside, in her session, "Innovations in Generative Coding." "The more context, the more relevance there is in those models for the environments that they're being deployed into, the better ... the solutions, the better the answers, the better the outputs."

Anyone can access a large language model (LLM) like ChatGPT or Claude. But adding AI capabilities to other types of tools -- with the right context windows, trust layers and constraints to produce effective results -- can make a big difference in usability. And many domain-specific tools go far beyond UI tweaks, incorporating proprietary models, orchestration layers and hybrid architectures that blend generative and non-generative machine learning techniques.

How Luminance is using AI in law

The Luminance platform is designed to augment and accelerate legal workflows, automating tasks like contract negotiation and compliance checks. Rather than relying on a single LLM, Luminance incorporates proprietary, open source and foundation models, as well as classical machine learning techniques.

"We basically have a model leaderboard ... where we're always looking at the different tasks, how accurate [the models] are, both with precision and recall," Lightbody said. "If it's not over 95%, we're not going to use it."

Likewise, instead of using an LLM to reason through tasks, Luminance uses an orchestration layer that selects the right model or tool. "Large language models aren't very good at choosing what to use when," Lightbody said.
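
Purely as an illustration of the concept, not Luminance's actual architecture, an orchestration layer can be as simple as a deterministic lookup that routes each task type to a registered specialist tool instead of asking an LLM to choose. The task names and handler functions here are hypothetical stand-ins.

```python
# Conceptual routing layer: each task type maps to a registered handler,
# so tool selection is deterministic rather than delegated to an LLM.
# All task names and handlers below are made-up stand-ins.

def classify_clause(text: str) -> str:
    return "change-of-control clause"     # stand-in for a classical classifier

def check_compliance(text: str) -> str:
    return "no GDPR issues found"         # stand-in for a rules/ML hybrid check

def summarize(text: str) -> str:
    return "summary: ..."                 # stand-in for a generative model call

ROUTES = {
    "clause_classification": classify_clause,
    "compliance_check": check_compliance,
    "summarization": summarize,
}

def orchestrate(task_type: str, document_text: str) -> str:
    handler = ROUTES.get(task_type)
    if handler is None:
        raise ValueError(f"No tool registered for task type: {task_type}")
    return handler(document_text)

print(orchestrate("compliance_check", "This agreement is governed by ..."))
```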

Overall, the platform emphasizes traceability, observability and control -- important considerations in a high-stakes, highly regulated field like law.

"We have to provide evidence of every decision we make," said Graham Sills, Luminance's co-founder and director of AI, in an interview with Informa TechTarget. "We often go by the rule of, we have to find evidence in one of your documents if we're going to make a decision. That's really important. ... How do we keep it grounded in something that's going to build trust?"

The nature of legal work -- highly standardized, procedural and repetitive -- also makes it well suited to this approach. "Lawyers are very process driven," Sills said. "They have all their processes mapped out. They all have to follow the same process. So it's almost custom made for agents."

The fact that legal professionals tend to have a backlog of repetitive, tedious tasks that they'd like to offload doesn't hurt, either. "Often, the best applications of AI are the really boring bits that people don't want to do," Sills said. "If something is exciting, then you don't have to build AI for it."

How Poolside is using AI in software development

Poolside is taking a similarly targeted approach to software engineering. The company builds coding foundation models from scratch, which can then be fine-tuned on an enterprise's codebase and documentation and integrated with editors like Visual Studio Code and IntelliJ.

Most general-purpose LLMs can write code, but they weren't necessarily designed for it. Rather, they're capable of producing code because code is present in the enormous corpora of general internet data they were trained on.

"Natural language offers models the ability to communicate," Garcia said in her session. "[LLMs] write back at you in the same way you write back at them. [But] the reality is, there's a lot of things that natural language alone doesn't solve."

Rather than rely solely on public code from open source repositories, Poolside supplements its training data with synthetic data generated through an internal simulator capable of running code. That approach -- which the company calls "reinforcement learning from code execution feedback" -- is effective due to the nature of coding as a domain.

"The good thing about code, and the good thing that code as a data set offers, is that it's a deterministic environment," Garcia said. "It's binary. We know whether code computes or it doesn't when we put it through the machine. So it's much easier to generate synthetic [data]."

For human developers, writing software involves repeated cycles of testing, revising and refactoring. That back-and-forth quality makes coding an especially promising use case for reinforcement learning, where a model learns through trial and error by receiving feedback about whether its action succeeds in achieving a preset goal.
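
To make the mechanism concrete, here is a minimal, hypothetical illustration of reward from execution feedback: a candidate solution is run against tests, and the pass/fail outcome becomes the training signal. This is not Poolside's pipeline; the candidate generator is omitted and only the reward step is sketched.

```python
# Illustrative reward step for "reinforcement learning from code execution
# feedback": run a candidate solution against tests and use pass/fail as the
# reward. The candidate generator itself is omitted; this is not Poolside's code.
import subprocess
import sys
import tempfile
import textwrap

def execution_reward(candidate_code: str, test_code: str) -> float:
    """Return 1.0 if the candidate plus its tests run cleanly, else 0.0."""
    program = candidate_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True)
    return 1.0 if result.returncode == 0 else 0.0

candidate = textwrap.dedent("""
    def add(a, b):
        return a + b
""")
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

print(execution_reward(candidate, tests))  # 1.0 -> positive training signal
```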

"Writing code is so iterative," Garcia said. "You have to see if it runs. You have to come back to it, you have to improve it. And this is why ... reinforcement learning, especially in code, is the way to go."

Today, Poolside's models can assist with writing, debugging and testing code across multiple files, especially in enterprise environments where more general models might struggle to understand context. But Garcia was also upfront about the limits of current AI systems. These models typically do well on tasks on the order of a few hours of developer work, but they're not designed for full system builds or for totally automating the development lifecycle.

"[AI] needs a lot more assistance from devs than, I think, what a lot of people still think about," she said.

Lev Craig covers AI and machine learning as the site editor for SearchEnterpriseAI. Craig graduated from Harvard University with a bachelor's degree in English and has previously written about enterprise IT, software development and cybersecurity.

08 May 2025
