The hidden costs of AI: What leaders must budget for
AI implementation costs extend beyond model fees. Hidden expenses such as data preparation, governance, security and talent significantly affect ROI and require careful planning.
Businesses have integrated AI into the workplace. Employees use tools such as ChatGPT, Gemini and Claude for help writing emails, formatting letters, sprucing up content and even coding. Now, agentic AI is entering the mix.
Enterprise software companies charge for many of these capabilities, and IT managers need to take a closer look at how these AI tools are affecting their metrics and ROI. When evaluating an AI investment, IT leaders must expand their focus beyond model access, infrastructure and productivity gains. They need to consider the true cost of integration, governance, security and ongoing operational overhead.
Integration costs often exceed model costs
A prototype often feels production-ready: a chatbot is built, data is attached and it returns answers, giving the impression it can go live with minor adjustments. But what works as a prototype often doesn't work at scale.
The transition to agentic AI introduces costs different from those seen with traditional SaaS models, according to Sumit Johar, CIO of BlackLine, a financial software platform.
Prototypes run on low-cost or free tiers. Large-scale deployment creates additional costs -- such as token costs and licensing fees -- which can be surprising, he said.
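The gap shows up in simple arithmetic. The sketch below is a back-of-the-envelope estimate in Python; the per-token prices and usage figures are hypothetical assumptions for illustration, not any vendor's actual rates.

```python
# Hypothetical per-token pricing -- not any vendor's actual rates.
PRICE_PER_1K_INPUT = 0.003   # assumed $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.015  # assumed $ per 1,000 output tokens

def monthly_token_cost(users: int, queries_per_user_day: int,
                       input_tokens: int, output_tokens: int,
                       days: int = 30) -> float:
    """Estimate monthly model spend for a chat-style workload."""
    queries = users * queries_per_user_day * days
    cost_in = queries * input_tokens / 1000 * PRICE_PER_1K_INPUT
    cost_out = queries * output_tokens / 1000 * PRICE_PER_1K_OUTPUT
    return cost_in + cost_out

# The same assistant, as a 20-user pilot vs. a 5,000-user rollout.
print(f"Pilot:   ${monthly_token_cost(20, 10, 1500, 500):,.2f}/month")
print(f"Rollout: ${monthly_token_cost(5000, 10, 1500, 500):,.2f}/month")
```

Under these assumptions, the pilot runs about $72 a month while the rollout runs about $18,000 -- the usage pattern is identical, only the scale changed.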
The data layer can also increase costs. Traditional SaaS systems rely on business logic to interpret data. But AI requires context. Building integrations to allow the system to pull context from other systems increases cost. Data must be AI-ready, meaning it's clean, noncontradictory and respects existing permission structures.
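In practice, "respects existing permission structures" often means filtering retrieved context against the requesting user's entitlements before it ever reaches the model. A minimal sketch, assuming a hypothetical document store that carries ACL groups over from the source system:

```python
from dataclasses import dataclass

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # ACL carried over from the source system

def build_context(query_hits: list[Document], user_groups: set[str],
                  max_docs: int = 5) -> str:
    """Keep only documents the requesting user is entitled to see."""
    permitted = [doc for doc in query_hits
                 if doc.allowed_groups & user_groups]
    return "\n\n".join(doc.text for doc in permitted[:max_docs])
```

Without a step like this, the model can answer from documents the asking user could never open directly.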
AI agents must be trained on their role and boundaries, which is another ongoing cost.
But because AI is not perfect, a human must be kept in the loop to monitor and approve critical agent judgments and protect security and compliance, Johar said.
"AI technology may be new, but the core implementation challenges are not," said Marcelo Lorenzetti, founder and chief AI officer of Savvylex, an AI platform for legal professionals.
The hidden costs of AI projects often become apparent during the transition from pilot to production. At this point, the environment shifts from low-demand to operationally intensive settings, introducing complexities and costs across multiple dimensions, Lorenzetti said.
Areas where expenses may increase include the following:
- Compute scarcity and escalating GPU pricing.
- Inference costs driven up by context bloat and retries.
- Network and data transfer.
- Compliance and audit.
- Security risk mitigation.
- Energy and labor.
Data preparation is expensive and ongoing
Data infrastructure and readiness costs now consume about $60 out of every $100 spent on AI projects, said Kunal Agarwal, co-founder and CEO of Unravel Data, a data observability and FinOps platform.
"LLM/model costs are certainly rising, but they are visible. Everyone's watching those," Agarwal said. "What's being missed is everything around the model: The data pipeline feeding it. The pre-processing jobs. The retraining loops that no one schedules properly. That's where the majority of the money is being spent, and no one is getting alerted to it."
Several factors can contribute to those numbers, he said, including the following:
- Reprocessing data already processed.
- Feature pipelines running on stale logic.
- Oversized clusters for small jobs.
- Experiment runs that never got cleaned up.
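Several of these patterns leave fingerprints in ordinary job metadata. A minimal audit sketch; the field names and thresholds are hypothetical, not any specific platform's schema:

```python
def flag_waste(jobs: list[dict]) -> list[str]:
    """Flag oversized clusters and repeated reprocessing of the same input."""
    findings = []
    seen_inputs: dict[str, str] = {}
    for job in jobs:
        # Oversized cluster: many nodes allocated for very little data.
        if job["nodes"] >= 20 and job["input_gb"] < 5:
            findings.append(f"{job['id']}: {job['nodes']} nodes for "
                            f"{job['input_gb']} GB of input")
        # Reprocessing: the same input dataset handled by multiple jobs.
        path = job["input_path"]
        if path in seen_inputs:
            findings.append(f"{job['id']}: reprocesses {path}, "
                            f"already handled by {seen_inputs[path]}")
        else:
            seen_inputs[path] = job["id"]
    return findings
```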
If organizations don't manage this AI-driven data explosion effectively, the costs and inefficiencies will stop innovation in its tracks, Agarwal said.
"Organizations have fiduciary duties, and if the investment becomes larger than the revenue, then the music stops," Agarwal said.
Governance, compliance and legal costs
While agentic AI software increases execution speed, it also creates a governance paradox: The faster an organization can move, the more critical traditional orchestration, logging and accountability become. The need for automation, monitoring and systems of record grows accordingly, driving up IT organizations' overhead.
"I do think it opens up a world of compliance and governance questions that perhaps weren't there before," said Phil Christianson, chief product officer at Xurrent, a service and operations management platform.
One new issue is the risk of employees bringing their own agents to work. Employees can now grant personal access tokens to tools such as Claude, enabling them to interact with enterprise APIs. This creates a massive blind spot for IT, as actions appear to come from a human but occur at machine speed.
In one instance, an employee gave Claude access to an internal system, and the agent sent thousands of requests in seconds, causing the company's ID to be blocked after the traffic was flagged as a DDoS attack, Christianson said.
While organizations can't prevent employees from using these tools, they can look for ways to bring those capabilities into enterprise software, where activity is monitored, logged and approved, Christianson said.
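One concrete control is to route agent credentials through a gateway that throttles machine-speed bursts before they hit backend APIs. A minimal token-bucket sketch, with all rate limits hypothetical:

```python
import time

class TokenBucket:
    """Throttle a client's calls to a human-plausible rate."""
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec   # steady-state allowance
        self.capacity = burst      # short bursts tolerated
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # reject or queue instead of hammering the backend

# One bucket per credential: an agent bursting thousands of calls gets
# throttled and logged instead of tripping the vendor's DDoS defenses.
buckets: dict[str, TokenBucket] = {}

def check(client_id: str) -> bool:
    bucket = buckets.setdefault(client_id,
                                TokenBucket(rate_per_sec=2, burst=10))
    return bucket.allow()
```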
Ongoing operational costs
When calculating the true cost of enterprise-grade AI deployment, an organization must consider both ends -- the front end consisting of governance, security and ethical guardrails; and the back end, covering data cleanup, continuous maintenance and organizational adoption, said Ram Palaniappan, CTO of TEKsystems Global Services, a tech services and talent management provider.
Unlike legacy systems that are handed off to support after deployment, AI technology requires continuous fine-tuning, data quality monitoring and efficiency improvements. Maintaining an open architecture approach lets organizations "plug out and plug in" new models as they evolve, Palaniappan said.
As AI tools collapse traditional silos, the new full-stack expectation requires engineers to think horizontally across the entire software development lifecycle, moving from narrow, deep focus to end-to-end orchestration, Palaniappan said. Organizations must be prepared to upskill their teams into this new mindset.
As the agentic AI layer matures, organizations must plan for future requirements such as agent-to-agent protocols and specialized governance to monitor autonomous decision-making.
The greatest hidden cost is often the failure of human adoption, Palaniappan said.
"If the human-in-the-loop workflow is not clearly defined and the workforce is not upskilled, the technology investment yields zero ROI," Palaniappan said.
Security and risk management overhead
Many organizations view AI technology as a plug-and-play tool, said Dirk Schrader, vice president of security research and field CISO, EMEA, at Netwrix Corp., a cybersecurity software company. In reality, when AI -- a probabilistic engine -- is introduced into processes designed for predictable outputs, existing controls and ownership models begin to break down. Organizations must establish new workflows, clearer accountability, stronger validations and human oversight at the right points. Without these changes, AI introduces friction, exceptions and risk rather than value, Schrader said.
"If you don't govern the use of AI, if you don't have policies in place, if you don't have a clear vision of what you want to do with AI, you're basically giving carte blanche to make use of it in any shape or form -- whatever the employee has in mind -- introducing more cost, making exhaustive use of the model with no return on investment," Schrader said.
Organizations must be specific about what they want to accomplish with AI. This involves defining the business processes that will be enabled by AI, the data that will flow through those processes, who will own and access them, and how they all work together to achieve the expected output.
While AI may not create new security flaws, it can highlight existing weaknesses. It operates at a greater speed than a human but without the discretion to protect sensitive data.
"AI can help you improve. But before it improves your organization, it will simply start making those weaknesses you already have impossible to ignore," Schrader said.
Talent and organizational costs
Despite AI's problem-solving abilities, there will always be a need to keep humans in the loop. On top of the initial costs to bring current employees up to speed on the new system, organizations may need to retrain existing talent to take on additional tasks or expand the team with new hires. New roles may include the following:
- Machine learning engineers for developing algorithms, training models and deploying AI systems.
- Data engineers to build infrastructure and maintain data pipelines.
- AI researchers to focus on specialties such as natural language processing and reinforcement learning.
- Machine learning operations engineers responsible for monitoring and maintenance of the models.
- AI solutions architects to design the AI system and integrate it with the existing business infrastructure.
Organizations may also consider adding personnel in ethics and governance roles, such as a compliance specialist, to ensure regulatory compliance and address data privacy issues.
Cloud and infrastructure cost creep
When implementing new enterprise software, organizations may encounter unexpected costs due to token consumption, infrastructure bottlenecks and data quality issues.
In traditional software, costs are predictable -- you run a process, you get a result. In agentic workflows, prompts are broken into subtasks handled by multiple AI agents, each making its own calls to the model. A single user query can trigger a cascade of model interactions, each of which consumes tokens. Because the number of steps is unpredictable, bills can be far larger than expected, said Hugo Huang, public cloud alliance director at Canonical, the publisher and creator of Ubuntu.
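A small simulation makes that unpredictability concrete. In this hypothetical sketch, each agent call may spawn sub-agent calls, so the same query produces a different token total on every run:

```python
import random

def run_agent(task_tokens: int, depth: int = 0, max_depth: int = 3) -> int:
    """Simulate one agent call that may decompose its task into subtasks.
    Returns total tokens consumed by the whole cascade."""
    total = task_tokens
    if depth < max_depth:
        # Assume an agent spawns zero to three sub-agent calls.
        for _ in range(random.randint(0, 3)):
            total += run_agent(task_tokens // 2, depth + 1, max_depth)
    return total

# Five runs of the same 2,000-token user query.
samples = [run_agent(2000) for _ in range(5)]
print("Tokens consumed per identical query:", samples)
# Totals range anywhere from 2,000 (no fan-out) to 16,250 (full fan-out).
```

A deterministic process would cost the same every time; here the fan-out itself is a model decision, which is exactly why bills drift.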
Running the new software on existing hardware can also increase costs. Huang outlined some infrastructure bottlenecks, including the following:
- CPU. Von Neumann architecture excels at sequential logic, which is crucial for AI system organization; however, the memory wall creates a bottleneck for running large language models.
- GPU. GPUs are built for parallel tasks but rely on shared memory that all processors must access. This shared memory can become a bottleneck, increasing cost and latency.
- TPU. Instead of shuffling data back and forth to shared memory, TPUs pass data directly from one processing step to the next. But organizations need specialized engineering talent to work with the sophisticated XLA compiler and systolic array architecture.
- XPUs, such as NPUs, DPUs, IPUs and LPUs. Each has its own strengths, and costs rise when a workload falls outside its comfort zone.
- Legacy systems. Decades-old databases and documentation weren't designed to work with modern AI. Connecting them requires expensive engineering work that is rarely budgeted upfront.
Overall, the quality of the data matters more than the model's sophistication, Huang said. A state-of-the-art model fed poor or incomplete data will produce poor results.
"With agentic AI, the bill is larger than expected by design," Huang said. "Token consumption scales in ways most executives never see coming -- and by the time they notice, the budget is already gone."
Julie Hanson is a freelance writer who has reported on local news across Massachusetts and New Hampshire.