
AWS Kiro 'user error' reflects common AI coding review gap

Even internal AWS Kiro users haven't always peer-reviewed AI code output, as evidenced by a reported December outage that overshadowed recent product updates.

AWS Kiro was at the center of an eyebrow-raising report last week as it advanced its AI coding features.

The agentic IDE, which emerged last year as a significant, spec-driven alternative to GitHub's coding agents, added new features on Feb. 18 that bolster its support for existing applications. Late the next day, the Financial Times published a report, citing multiple anonymous company sources, that internal users of AWS Kiro and Amazon Q had triggered at least two outages in production services over the past few months.

An anonymous source quoted by FT said that a Kiro coding agent "determined that the best course of action was to 'delete and recreate the environment,'" according to the report.

On Friday, Feb. 20, Amazon issued a statement disputing aspects of the FT report, saying there had been just one outage in AWS Cost Explorer in one of its 39 global regions in December, linked to "user error -- specifically misconfigured access controls."

The Amazon statement continued,

The issue stemmed from a misconfigured role -- the same issue that could occur with any developer tool (AI powered or not) or manual action. We did not receive any customer inquiries regarding the interruption. We implemented numerous safeguards to prevent this from happening again -- not because the event had a big impact (it didn't), but because we insist on learning from our operational experience to improve our security and resilience. Additional safeguards include mandatory peer review for production access.

Industry reacts to AWS Kiro report

Amazon neither confirmed nor denied the FT report's claim that a Kiro agent had deleted a production environment in its statement, but some developers and analysts expressed surprise that the tool had sufficient access to carry out such an action.


"Engineers should generally not have the ability to run commands in production regardless, so this feels like a check in on how much access [AWS] engineers have to production," said Kyler Middleton, principal software engineer at healthcare tech company Veradigm. "AI or not, that likely shouldn't have that much access without peer review."

This isn't the first time an AWS AI coding tool has suffered an impactful misconfiguration issue. Last year, a malicious prompt injection was found in the Amazon Q extension for VS Code, the result of "an inappropriately scoped GitHub token in the project repo's CodeBuild configuration," according to an AWS statement.

AWS isn't alone in finding its AI coding tools making unwanted headlines: Replit's vibe coding agent went awry for another early adopter last July. This month, the OpenClaw AI assistant was at the center of multiple high-profile cybersecurity industry discussions, including a software supply chain security attack.

Moreover, according to recent market research, AWS engineers are also far from alone in not always peer-reviewing AI-generated code. A January report based on a survey of 1,100 developers by SonarSource, which markets the SonarQube code-scanning tool, found that while 96% of respondents don't trust that AI code is functionally correct, just 48% always check AI-generated or assisted code before committing it.

"I have seen some clients (not AWS specifically) say they are struggling with what they call 'lazy engineering,'" said James Andersen, an analyst at Moor Insights & Strategy. "Developers are under strain to deliver faster, and they end up giving the AI a bit too much latitude. ... This is leading to code that may pass a unit test, but it's not great when it goes into integration or production.

"On some level, we have to chalk this up to the collective learning process of a new and different technology. But in general, I would expect anyone using these tools at least to respect the standard human operating procedures, peer reviews and testing plans."


For another analyst, the AWS Kiro incident points to a broader concern about potential flaws in emerging AI-driven safeguards for AI tools.

"As developers and vibe coders are both learning, AI guardrails are suggestions rather than hard boundaries. We need traditional testing as well as AI-based testing to ensure AI isn’t breaking more than it’s fixing," said Andrew Cornwall, an analyst at Forrester Research. "The volume of AI-generated code may overwhelm traditional testers. They’ll turn to AI assistance to keep up, meaning we’re likely to see more outages where 'the AI broke it.' AIs will hallucinate. Businesses need to make sure their processes account for that."

AWS Kiro updates expand past greenfield apps

In the meantime, AI coding tools continue to proliferate and develop, including AWS Kiro. Both Andersen and Cornwall said they found the AWS Kiro feature updates on Feb. 18 significant. These included two new ways for users to work with spec-driven development, a practice that emphasizes defining a software system's structure and behavior before code is written.

A requirements-first workflow starts with a desired behavior and outcome in mind, while a design-first workflow, supported in AWS Kiro as of last week, begins with a technical vision, such as building a new feature into an existing app, according to a company blog post. A new Bugfix spec makes it easier to fix specific issues with coding agents without making broad changes to the surrounding code.

"This is a big deal," Andersen said. "Spec-based development is really good when there is a blank sheet of paper, but not always when you need to fix something that is already there. These new features help bring spec-based capabilities to a more surgical level, so you can fix and modify existing apps without the AI breaking or touching things it should not."

Requirements that change after the first demo "have been a sticking point with Kiro, which insisted on requirements first, then design, then tasks," according to Cornwall.

"The Kiro announcement addresses some of the ways developers work today, but doesn't flex as much as the real world does," Cornwall added. "Kiro will need to have a full two-way update of requirements and design to meet the needs of modern developers."

Beth Pariseau, a senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism. Have a tip? Email her.
