Red Hat distinguished engineer unpacks 'The Agentic Paradox'
Agentic AI has been making waves on the technology conference circuit over the past few months as more enterprises make use of AI agents.
An Omdia survey of 400 IT and cybersecurity professionals revealed that 66% of organizations are actively using agentic AI, with 82% of respondents classifying AI agents as a top or high priority for their organization.
Industry leaders are responding: Google debuted Agentic Data Cloud, a data management platform built to support the scale of AI agent workloads, at its Google Cloud Next conference in April. In May, Dell introduced Deskside Agentic AI, its on-premises AI agent sandbox, at Dell Technologies World.
As enterprises' agentic AI deployments scale, however, rising token costs can become a budgetary burden. This gives rise to the agentic paradox, a concept described in a blog post by Steve Watt, a distinguished engineer and vice president of the Office of the CTO at Red Hat.
"I think the simplest way to put it is: You'd be crazy not to start with frontier AI models," Watt explained in a recent episode of IT Ops Query. "But because of the unit economics, eventually, as your company achieves a certain amount of scale, you'd be crazy to stay on them."
The agentic paradox, which Watt noted might be more accurately referred to as an inference paradox, mirrors the cloud paradox described in a 2021 article by Sarah Wang and Martin Casado of Andreessen Horowitz.
"Essentially, what they suggest there is to basically have a hybrid strategy and move some of your cloud costs -- and in this case, inference costs -- to a place that you can control the costs, which would be self-managed," Watt said.
Open-weight models are a key component of this hybrid strategy, according to Watt.
Unlike frontier models, such as Google Gemini and Anthropic Claude, open-weight models can be hosted and run on an organization's own infrastructure. This removes the per-token fees charged by a third-party provider, which can increase sharply as an organization scales its use of AI agents, although the organization then carries the infrastructure cost itself.
However, research has shown that frontier models are still the first choice for many users. "Closed models dominate, with on average 80% of monthly LLM tokens using closed models despite much higher prices … and only modest performance advantages," according to a Georgia Tech Scheller College of Business research paper.
Though open-weight models offer potential cost savings and comparable performance, making the shift to self-hosting these AI models can pose challenges for some organizations. Session panelists at this year's Red Hat Summit discussed their experiences migrating to self-hosted models and the obstacles they encountered.
One such obstacle? Most enterprises don't currently have the expertise in-house for this sort of migration, according to Watt. "I think the incentives are there. But to actually be successful, you'll need the skills or be able to outsource or bring in the skills," he said.
At its 2026 summit, Red Hat rolled out version 3.4 of its Red Hat AI platform, which the vendor claimed can help enterprises in their transition to self-hosted AI. Updates included built-in observability and security controls that support Model-as-a-Service, which delivers AI models as an internally managed platform service.
Watch this episode of IT Ops Query for more on the agentic paradox and Red Hat's current work to further the reality of self-hosted AI.
Kate Murray is a managing editor with Informa TechTarget's Infrastructure editorial team. She joined the company as an associate managing editor of e-products in 2020.
Beth Pariseau: Hello from Atlanta. I'm here at Red Hat Summit with Steve Watt, a distinguished engineer and vice president of the Office of the CTO at Red Hat, which includes Red Hat Research and Emerging Technologies.
Prior to joining Red Hat, Steve was the founder of the Hadoop Business and Hadoop Chief Technologist at HP, and a software architect and master inventor at IBM Emerging Technologies. Prior to IBM, Steve worked for a number of consumer-facing software startups in the USA and his native South Africa.
This week, Steve published a blog post titled 'The agentic paradox.' In it, he wrote, 'The fastest path to increase the velocity of your business processes is to use powerful frontier AI models. However, as adoption scales, this strategy becomes unsustainable.'
So, thank you for joining me, Steve.
Steve Watt: Yeah, my pleasure. I'm glad to be here.
Pariseau: So, what exactly is 'The Agentic Paradox'?
Watt: I think the simplest way to put it is you'd be crazy not to start with frontier AI models. But because of the unit economics, you'd be -- eventually, as your company achieves a certain amount of scale -- you'd be crazy to stay on them.
Pariseau: Why is that?
Watt: Well, essentially, you'd lose control. And so, essentially, the best way to describe it is like -- I'd start with why you'd want to start there, right? I think it's pretty well established that Anthropic and OpenAI have the best model performance today. So, if you're starting a new company, you're gonna get the best yield using those services.
At some point though, you'd start hitting diminishing returns as to the token cost, and you'd lose the ability to manage the token cost because it's basically a third-party SaaS service that you're using.
And so, that's where it runs into the same paradox that was first established -- I mean, I'm essentially highlighting the same paradox called the cloud paradox from an article from Andreessen Horowitz. And essentially, you know, what they suggest there is to basically have a hybrid strategy and move some of your cloud costs -- and in this case, inference costs -- to a place that you can control the costs, which would be, you know, self-managed.
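The unit economics behind this argument can be illustrated with a back-of-the-envelope breakeven calculation. The Python sketch below is not from the interview or from Watt's blog post; the function names and every price in it are hypothetical placeholders, and it assumes a simple model in which hosted inference is billed purely per token while self-managed inference has a fixed monthly bill plus a small marginal cost.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million: float) -> float:
    """Pay-per-token cost: grows linearly with usage."""
    return tokens_per_month / 1_000_000 * price_per_million


def monthly_self_managed_cost(tokens_per_month: float, fixed_cost: float,
                              marginal_per_million: float) -> float:
    """Self-managed cost: fixed infrastructure bill plus a marginal cost per token."""
    return fixed_cost + tokens_per_month / 1_000_000 * marginal_per_million


def breakeven_tokens(price_per_million: float, fixed_cost: float,
                     marginal_per_million: float) -> float:
    """Monthly token volume above which self-managed becomes cheaper."""
    if price_per_million <= marginal_per_million:
        raise ValueError("self-managed never breaks even at these prices")
    return fixed_cost / (price_per_million - marginal_per_million) * 1_000_000


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
API_PRICE = 10.0        # dollars per million tokens on a hosted frontier model
FIXED_COST = 20_000.0   # dollars per month for self-managed hardware and staff
MARGINAL = 2.0          # dollars per million tokens for power and upkeep

volume = breakeven_tokens(API_PRICE, FIXED_COST, MARGINAL)
print(f"Breakeven at {volume:,.0f} tokens per month")
```

Below the breakeven volume the hosted service is cheaper, which matches the advice to start there; above it, each additional token widens the gap in favor of self-managed inference.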
Pariseau: So, why is it the agentic paradox? Does it apply differently to agents versus AI in general?
Watt: No, I think another way it could be said is inference, an inference paradox. I would just say that I think the term agentic is a little more relatable, where that seems to be the overwhelming use case for how people are using inference today.
Pariseau: And, you also mentioned in the blog, reinforcement learning and open-weight models will lead to this kind of hybrid world. So, what is reinforcement learning, and how is it going to lead to a hybrid architecture?
Watt: That's such a great question. I'll just start with open-weight models, which are a wonderful, probably second step in this journey. I'd say the first step, finding a cheaper inference as a service that delivers the same value, is probably the first thing people will look to, to drive down cost.
But as you become self-managed, you're going to need to have your own model. And then there's a bunch of free models out there on Hugging Face, which are these open-weight models. It's a pretty fast-moving space. So, the companies, the model providers, that are providing the best ones are changing, you know, every month. But they're just getting better and better.
But you're able to leverage, you know, they've taken on the cost of pretraining and producing these giant models, and so you literally can just use them to close that last mile. And then there are these different strategies of closing the last mile to get them to work for the agents you've built.
So essentially, the idea is, OK, you've built some -- technology just automates our business problems. So, you've taken a business process, you've used Claude, or something to that effect, to build an agent around it so it's automated. Your next goal would be to, well, how can I get off Claude onto an open-weight model? That open-weight model may work out of the box. It may not. If it doesn't, then you need a strategy to close that last mile.
And that's where reinforcement learning comes in, where you can post-train the model with your own data and then use reinforcement learning. It's a post-training strategy where you create a scoring system so that the model is able to do different improvement runs. And across those, it can basically gauge its efficacy and then pick winning strategies for how it routes through the model weights to get to, like, the outcomes you want.
So you could say, you know, 'My agent needs to be effective at this thing,' and it will use that reinforcement learning to back into the most successful strategy to deliver that business process.
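The score-and-improve loop described here can be illustrated with the simplest reinforcement learning setup, a multi-armed bandit. The Python sketch below is a toy, not how LLM post-training works in practice: real post-training uses the scores to update model weights, whereas this example only learns which of a few fixed response strategies earns the highest score. The scoring rule, the strategy names and the task are all hypothetical.

```python
import random
from typing import Callable, Dict, List


def score(output: str, required_terms: List[str]) -> float:
    """Hypothetical reward: fraction of required terms found in the output."""
    text = output.lower()
    return sum(term in text for term in required_terms) / len(required_terms)


def train(strategies: Dict[str, Callable[[str], str]], task: str,
          required_terms: List[str], runs: int = 50,
          epsilon: float = 0.2, seed: int = 0) -> Dict[str, float]:
    """Epsilon-greedy bandit: estimate each strategy's value from scored runs."""
    rng = random.Random(seed)
    names = list(strategies)
    value = {name: 0.0 for name in names}  # running average reward per strategy
    pulls = {name: 0 for name in names}    # how often each strategy was tried

    for step in range(runs):
        if step < len(names):
            choice = names[step]                         # try each strategy once
        elif rng.random() < epsilon:
            choice = rng.choice(names)                   # explore
        else:
            choice = max(names, key=value.__getitem__)   # exploit the current best
        reward = score(strategies[choice](task), required_terms)
        pulls[choice] += 1
        value[choice] += (reward - value[choice]) / pulls[choice]
    return value


# Three stand-in "strategies" for a hypothetical procurement agent.
strategies = {
    "terse": lambda task: "Approved.",
    "cites_policy": lambda task: f"Approved under policy 7 for: {task}",
    "cites_policy_and_budget": lambda task: f"Approved under policy 7, budget checked, for: {task}",
}
values = train(strategies, "laptop purchase", ["policy", "budget"])
best = max(values, key=values.get)
print(best, values)
```

The design point is the separation of concerns: the scoring function encodes what "effective" means for the business process, and the learning loop only needs those scores to find the winning behavior.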
Pariseau: Do you think most enterprises have the expertise in-house to go through that process?
Watt: Not today. Yeah. And I think this is part of, like, the broader journey everyone's gonna have to go on, which is -- and it's a challenge. And I think it's not too dissimilar from the public cloud challenge, right? Of, like, managing your enterprise infrastructure.
You typically need a strong incentive to not have someone else do it for you, right? And I think that that's what we'll see with the skills as the incentives emerge, where cost is a significant one, right? Where we're seeing some reports of 'By Q2, they've burned through their entire cloud spend for the year.' That's a problem.
And so, I think the incentives are there. But to actually be successful, you'll need the skills or be able to outsource or bring in the skills. There's a lot of different companies, startups at this point, building 'reinforcement learning gyms' is what they call it. But you think of it like an environment or a harness, which you can choose to -- so that you can lower the amount of expertise you need in-house.
Pariseau: OK. So, I also understand that, you know, Red Hat doesn't see it all as self-managed or public cloud. It's a hybrid situation. But, and I guess this is sort of a time-honored question with hybrid, how do you have a hybrid cloud AI strategy without getting the worst of both worlds instead of the best, right? The costs of public cloud, but the complexity of the private side. I mean, you know, is that potentially compounding the issue?
Watt: Yeah. I think the ultimate hybrid strategy is to get the best of both worlds. I think if something's not working for you on-premises or self-managed, you would tend to keep it working where it is working well for you until you can find a compelling alternative.
And I think the best -- one of the things I liked about our CEO's keynote this morning was he talked about our own transformational journey. So, I'll talk a little bit about what's going on in my own research team, right, which is we are building agents to automate our entire research process from how ideas get incubated and approved. You know, anybody can work on anything. But if, you know, like, say you need -- the difference between needing one person and 20 people? Twenty people is going to require some management approval that it's sufficiently prioritized.
So, we're building agents around this whole process from prototyping to graduation into the product. We're primarily starting initially, like, on Opus. But the moment we've finished the full agentic SDLC process, what we're doing from then is moving as much of it as possible to open-weight models.
And that's, like, a journey everyone, I think, will go to. Like that's a research flavor, but, you know, there could be a procurement flavor, or something to that effect. And our arbiter will be, like, 'Can we close that last mile, or can't we?' And if we can't, it's gonna stay on Anthropic. If it can, it'll be self-managed.
Pariseau: Has there been any significant increase in the number of enterprise customers that are running AI agents in production, or is that still something people are kind of grappling with?
Watt: Well, I think -- so, I'm a research leader, so my insight into, like, you know, our customer base is not, I would say, as well informed as, say, like, our business units. But I can tell you what I'm seeing in the industry and open source. And primarily, you know, coding agents -- like that is most of the spend around agentic, is around coding agents. And so that is taking off like wildfire.
I think the industry is still on the sort of arc of they're using the frontier models to build agents and then they're having to hit this inflection point of like, 'OK, this strategy isn't working for me for X, Y, or Z.' And then they're gonna come back to self-managed. You know? It's just sort of the nature.
We went through this with hybrid cloud, right? Where at some point people were, like, public cloud or just putting all my eggs in one basket, you know? Maybe because it's for disaster recovery, or compliance or for many different reasons. There's sort of this tail effect where they come back to self-managed. And the benefit for us is that the AI industry is moving so fast. With hybrid cloud, that took years to happen. And here, it's literally happening in months.
Pariseau: So, within your team, as you're doing this reinforcement learning and developing, getting that last mile closed, how do you also account for, you know, the sort of 'scary stories' that we've heard about agents just sort of going off the rails? Things like Claude Mythos that break out of their sandbox or just do things that they were never prompted to do. You know, how do you keep these things reliable?
Watt: Well, as we often hear from security people, nothing's ever secure. We can just make it more secure. And so, we have -- I have two teams or two groups in my broader org. One is Research, one is Emerging Tech. They're both working on sandboxing. And so, this is -- and then we're also working across a number of different open source projects.
So, OpenClaw has really taken off quite recently. We have one of the few OpenClaw maintainers in our team, Sally O'Malley. She just published a new open source project called Tank OS. This is just an upstream project. It's not a product yet.
But one of the ways that we're trying to secure these agents is we start with a bootable, immutable operating system. So, bootc-based. That's a technology that's been around for a while. We have it in our products today. But that creates a -- puts OpenClaw in a bootable image that allows, enables, OpenClaw to run rootless.
So, it doesn't have root permissions to the operating system. The operating system's immutable, but -- so, it can't mutate the operating system and change certain things, but at the same time, it also doesn't have the permissions to do so anyway. And so that's like the first step.
We have OpenShell that we're collaborating with Nvidia on. That's sort of a richer environment. And then you start drifting into the Guardrails story and egress. So basically, like, when they're -- the agent -- if you look at the different sort of threat vectors, one is what it can do to the operating system and your files and your databases. So, Tank OS is sort of moving in that vector. OpenShell helps with that as well.
But then there's also like toolchain calling, you know, what it's trying to do by reaching out over the internet. And then you can very carefully, with policies, control different egress points and what it's allowed to do and what not to do. But I think it's just gonna follow the same security arc there. It's a fun space. There's a lot going on there.
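Policy-controlled egress of the kind mentioned here can be illustrated with a small allowlist check. The Python sketch below is a generic example and does not reflect how Tank OS, OpenShell or any Red Hat product implements egress control; the host names, the policy table and the `egress_allowed` function are hypothetical.

```python
from typing import Dict, Set
from urllib.parse import urlsplit

# Hypothetical policy: which HTTP methods an agent may use against which hosts.
ALLOWED_EGRESS: Dict[str, Set[str]] = {
    "api.internal.example.com": {"GET", "POST"},
    "docs.example.com": {"GET"},
}


def egress_allowed(method: str, url: str,
                   policy: Dict[str, Set[str]] = ALLOWED_EGRESS) -> bool:
    """Return True only if the policy explicitly allows this outbound call."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False                       # deny plaintext and unknown schemes
    host = (parts.hostname or "").lower()
    return method.upper() in policy.get(host, set())  # default deny


print(egress_allowed("GET", "https://docs.example.com/guide"))
print(egress_allowed("POST", "https://docs.example.com/guide"))
print(egress_allowed("GET", "https://unknown.example.net/"))
```

The check is default-deny: a tool call goes out only when both the destination host and the method appear in the policy, so an agent that tries something it was never meant to do is blocked rather than trusted.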
Pariseau: So, within your team, what kinds of wins have you seen so far with these open-weight models and with operationalizing AI agents? Have you gained productivity? Have there been, you know, any breakthroughs?
Watt: Well, one is my favorite agent is the one that I replace myself with, which is -- it's a little weird because it's, you know, sometimes feels a bit like you're training the person you're outsourcing your job to, you know? But it has allowed me to think on other -- I think -- spend more time thinking of other things.
And so, like, the example is we have a fairly constant stream of new researchers and engineers into my team. And we have a pretty well-established playbook, which the more senior engineers understand well, but the newer folks that come to the group don't understand so well.
And this playbook is, as new ideas are pitched, I always ask the same set of questions. And it's not a super good use of my time, especially because it's a fixed list of questions. So, I've got a set of agents. And then basically, if they've got a new idea, a research prototype they want to work on, they can basically go and work with the agent, and the agent will work with them.
So, it's not like a judge in that it's like pass/fail. It helps them shape their idea into something that impacts our addressable market. There's a product outlet for it. They can clearly articulate why Red Hat would care, why our customers would care about this thing. And that has been a huge win. And that's sort of like our first step in that.
And then obviously, the coding agents. I would put it this way: We're not completely finished with the shift to the agentic SDLC, but when it is complete, which I'd say in a month or two, our velocity will have gone from months to build a prototype and ship it into commercialization to days.
Pariseau: And have you quantified the cost savings from using open-weight models versus the hosted frontier models?
Watt: Well, right now we're still in the first phase where we're building everything on Opus. And then so, it's that second phase if you're on the 'Maybe we can do that in the second,' what worked, what didn't, but yeah, exactly.
Pariseau: Any early lessons learned that you would advise customers to, you know, be aware of?
Watt: So, I think one thing is having a good taxonomy about how to talk about the different patterns and approaches on this. And so, there's two emerging patterns that we've seen. One is what I call the copilot pattern and one is the factory pattern.
And so, the copilot is where you're building software the way you always have, submitting PRs, having those PRs reviewed, but the difference is that you're using a coding agent to help you create those PRs.
The factory pattern -- and well, the first thing with the copilot pattern is you're mutating source code. So, the factory pattern is you don't mutate source code, you mutate specs and tests. And you lifecycle maintain those. And ultimately, you say, 'Hey, read the specs and make sure it passes all the tests' and produce -- it's like something coming through an assembly line in a factory and then it spits out at the other end. And if there's something wrong with it, you throw it in the trash and you go fix the factory assembly line.
That and just understanding that these are different patterns, and copilot works really well with products and solutions that are deployed already today, brownfield. The factory is more of a greenfield approach. And so, we're understanding that and being able to articulate, and they have different harnesses and different approaches. And so, having a clear vernacular around that, I think, is important.
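The factory pattern described in this answer can be sketched as a loop in which the spec and the tests are the maintained artifacts and generated code is disposable. The Python example below is an illustrative sketch only, not Red Hat's tooling: the `generate` callable stands in for a coding agent, and the spec, the tests and the stub generator are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int, int], int]
TestCase = Tuple[Tuple[int, int], int]  # ((inputs), expected output)


def passes_all(candidate: Candidate, tests: List[TestCase]) -> bool:
    """A build is acceptable only if it passes every maintained test."""
    return all(candidate(*args) == expected for args, expected in tests)


def run_factory(spec: str, tests: List[TestCase],
                generate: Callable[[str, int], Candidate],
                max_attempts: int = 3) -> Optional[Candidate]:
    """Regenerate from the spec until a build passes; never hand-patch a build."""
    for attempt in range(max_attempts):
        candidate = generate(spec, attempt)
        if passes_all(candidate, tests):
            return candidate
        # A failing build is thrown away; the fix belongs in the spec or tests.
    return None


def stub_agent(spec: str, attempt: int) -> Candidate:
    """Stand-in for a coding agent: the first build is wrong, the second is right."""
    if attempt == 0:
        return lambda a, b: a - b
    return lambda a, b: a + b


SPEC = "add(a, b) returns the sum of two integers"
TESTS: List[TestCase] = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]

built = run_factory(SPEC, TESTS, stub_agent)
print(built(2, 3) if built else "no passing build")
```

In the copilot pattern, by contrast, a person would edit the failing function directly and submit the change for review; here, nothing downstream of the spec and tests is ever edited by hand.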
Pariseau: OK. And so, what is next? You focus on emerging technologies, whether it's within Red Hat Research or in the wider market. How do you see things evolving with agentic AI?
Watt: Yeah, this is a great question. So, I've been talking a little bit about how we're innovating our process, but our process basically builds prototypes. And so, what are the prototypes we're working on?
So, vLLM Omni is the ability to add multimodality to self-managed models. So, vLLM is the inference server project, so you can take these open-weight models or closed-weight. It's just for your self-managed models. But they're typically the historical autoregressive, you know, 'what's the token, predict the next token' type model architectures.
And vLLM Omni allows -- and it's just text to audio, video, image -- and to bring that into your self-managed models. It's an emerging area, so we're busy, you know, contributing to the project, hardening it, sort of coming in on the ground floor so we can bring those capabilities to our self-managed offerings that we offer our customers.
So, vLLM Omni, I think, is quite an interesting piece. We're also looking at connecting it into, like, llm-d because it's a multiphase-type step.
That's also tied to diffusion models. And so, diffusion models are used a lot today around, like, image generation. So, the autoregressive sort of predicts the next token, where a diffusion model basically starts with, like, a noisy, fuzzy picture and then takes multiple passes, refining it to get to a clearer and clearer outcome until it's sort of like what you asked it to build. There's a way of using diffusion models for media but also with text. And so that's a space that we're trying to figure out how we can bring those into self-managed.
But lots, you know, like, accelerator enablement is another area where we're trying to light up as many accelerators as possible. And I think that is a fascinating area because I think there's all kinds of interesting geopolitical dynamics around availability of power and the ability to build new data centers. That means that taking the latest accelerators that are being built and diffusing them evenly geographically is not always possible.
Pariseau: OK, so, I mean, this is a space that is moving so fast that predictions are nearly impossible, but I'm gonna ask anyway. Let's say, not a year, but maybe six months out. What would you say is the best-case scenario for enterprise AI, and what would be the worst-case scenario in your mind?
Watt: OK, let's start with the good news first. I think the good news is that the best-case scenario is that we'll have our business processes velocity at, like, 10x. And I think that will free us up to do other things and build new capabilities. I think, also, I would say across a bunch of different modalities.
I would say, like, maybe a simplistic way of thinking about it is the same way that, like, ChatGPT is impacting our daily lives, you know, and giving us like hyperspecific, hypercontextual, in some cases situational and disposable advice, we'll be able to bring that into our business. And I think that will revolutionize the way we operate our businesses and the types of value that businesses bring.
I think there's an aspect of that where we'll also see a whole new class of new agent- or AI-native startups that are just -- and you're starting to see some examples of this, you know, the two people that produce $500 million in revenue, or something to that effect -- they're hyperproductive because they really know how to unlock this technology.
I think the worst-case scenario, like a dystopian one, is the gap between the frontier models and the open-weight models isn't able to be meaningfully closed.
The reason this is dystopian is it would create a sort of disintermediation across the entire industry, where the hyperscalers would be building their own chips, their own servers, everything, and everybody would just be buying from one or two players, right? And I don't think that's super good for anybody, unless you're the monopoly itself. And so, I think there are shared incentives for the different industry ecosystems to work together to make self-managed models a reality.
Pariseau: And you mentioned changing the value that businesses bring. Can you say more about that? What do you specifically think of?
Watt: Yeah, so I think I would say, like, what I was referring to in that specific example is, like, how fast they could bring value, right?
So, it would typically take them, I don't know, like if you look at State Farm responding to an insurance claim -- this isn't an actual customer scenario, just to disclaim. But, like, for example, you know, it takes you several weeks. You know, you've got to deal with the adjuster, upload the pictures. And then several weeks later, the check may arrive, you know, it might not. You know?
And I think you're going from, you're increasing the automation, to actually go from end to end, from weeks down to days. And I think that's a real measurable impact that the business is able to deliver to a customer. So, it's a velocity, I think, maybe a different way of putting that.
Pariseau: And you mentioned it, you know, about replacing yourself with an agent. I think, for a lot of white-collar workers, that is the fear that they are training their replacements with AI agents. What's your view on that? I mean, you know, do you think that that's where things are headed? Are people gonna have to, you know, change the way that they think about work?
Watt: Yes, I think they will have to change the way they think about work. I find, as humans, when we reason about the future, we do tend to reason about the future linearly, where the future, often, innovation is often exponential and our imagination is somewhat limited.
I find, personally, like I have to spend a lot of time in my job imagining what the future is and trying to build that reality. The way I'm thinking about this right now is there's so many variables that would impact what that future looks like. It is hard to imagine right now.
And what I, my personal perspective, and what I say to my team is, look, you know, whatever, however this plays out, you want as much time as possible to adjust to that future reality. The best way to do that is if you're on the crest of the wave, you know?
And so, we are building out this future, but at the same time, because we're on the frontier there, it gives us a little bit more time to stop and be reflective about, you know, should we push the technology this way? Should we push it that way? What are the different implications of that?
So, I wish I had a clear answer to you. But I think it's just, we're just trying to stay on the forefront, which is challenging in itself, and then make educated decisions.
Pariseau: Right. So, I know I've really peppered you with questions here, but is there anything I haven't thought to ask you about that you want to mention?
Watt: I think there's a couple of projects that, I think, are really germane and exciting and a sort of broader vision.
So, like one, we're just going back to the agentic paradox and the shift to being able to empower organizations to have more control. vLLM Semantic Router is a project that Red Hat created, and where it becomes important is you're building your applications to basically leverage an inference endpoint, which right now is a third-party, you know, model as a service.
Semantic Router allows you to basically replace that endpoint with a proxy, and it's an open source project that provides inference routing. And so, you can put all kinds, you know, policies, guardrails, static routing, dynamic routing, semantic. You know, it's called Semantic Router because you can actually look at the body of the text and understand whether you should send it to a physics model, or a history model, or something to that effect.
But this is not a product at Red Hat, but it's a place where we explore new ideas around token economics and inference routing. And what we learn from that is making its way into our products. So, I think Semantic Router is one that's particularly interesting.
Another thing is the work that we're doing in Helion, which is to be able to create a platform abstraction across all these different new accelerators that are coming onto the market. I think it's hard to separate AI from geopolitical dynamics. And I think China will have a hand in shaping the world's future. They have the leading open-weight models, they have a huge host of accelerator startups.
So, I think it's not always possible, depending on where you live, given sovereign constraints, to leverage everything from every different geography. But the ability to create a simple way in open source to be able to light up and support different accelerators and make sure that everybody's able to sort of stay on the same page, with the meritocracy of ideas, I think it's useful.
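The inference routing that Watt attributes to Semantic Router earlier in this answer can be illustrated with a toy proxy that inspects the prompt and picks a model. The Python sketch below does not use or reproduce the vLLM Semantic Router API; it assumes a naive keyword-overlap rule in place of a real semantic classifier, and the model names and keyword sets are hypothetical.

```python
import re
from typing import Dict, Set

# Hypothetical routing table: model name -> keywords that suggest its domain.
ROUTES: Dict[str, Set[str]] = {
    "physics-model": {"quantum", "velocity", "gravity", "energy"},
    "history-model": {"empire", "revolution", "treaty", "century"},
}
DEFAULT_MODEL = "general-model"  # fallback when no domain matches


def route(prompt: str) -> str:
    """Pick the model whose keywords overlap most with the prompt text."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    best_model, best_overlap = DEFAULT_MODEL, 0
    for model, keywords in ROUTES.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_model, best_overlap = model, overlap
    return best_model


print(route("What is the escape velocity given Earth's gravity?"))
print(route("Which treaty ended the war in that century?"))
print(route("Write a haiku about coffee"))
```

Because applications talk only to the proxy endpoint, the routing table can change, for example to send more traffic to a self-managed open-weight model, without touching application code.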
Pariseau: There's a lot to think about. Well, I appreciate you sharing your insights on it with us, and thank you very much for taking the time out.
Watt: Yeah, my pleasure.
Pariseau: And thank you for watching.