Guest Post

Reproducibility: Is your AI project doomed?

AI governance focuses on mitigating risks from generative AI's unpredictability. Key strategies include reproducibility, explainability and comprehensive controls.

AI is a top technology priority in 2026, according to 62% of the nearly 3,000 digital trust professionals who responded to ISACA's 2026 "Tech Trends and Priorities Pulse Poll." However, 75% of all respondents say they are, at best, only somewhat prepared to manage generative AI risks with appropriate governance, policies and training.

AI governance should be centered on mitigating the potential for harm -- e.g., financial (business), economic (policy) and social (people) -- from AI deployments. The key to a comprehensive enterprise AI governance landscape is to identify and assess the full range of harms and then create controls that appropriately address them.

One of those harms could emerge from AI -- specifically, generative AI built on large language models (LLMs) -- that provides different responses to the same prompt. The result is agentic AI and broader automation processes that produce inconsistent, unpredictable outputs each time they run, with consequent harms. This also undermines auditability, because the results of LLM-based agentic AI decisioning or automation are not reproducible.

That said, is your LLM-based automation project doomed? Let's find out.

First-generation AI governance

Regulatory pressures have driven most current AI governance toward ensuring compliance with privacy and security standards. The privacy angle makes sense because AI requires massive amounts of data for training, and much of this data contains or can be linked back to personal information. The security angle also makes sense given the new attack surfaces, vulnerabilities and systemic risk in AI platforms.

Second-generation AI governance

Then there are emerging drivers of AI governance. These include ethics, fairness, accountability and transparency -- the foundation of good governance -- as well as data quality and environmental sustainability, which considers the effects of AI data centers on energy consumption. There are also growing regulatory requirements for operational risk management and resilience, such as in banking, where regulatory expectations for Canadian financial institutions are set out in OSFI's E-21 guideline.

Nascent AI governance: Ensuring reproducibility

Generative AI does not provide consistent outputs for the same prompt. Run a prompt once and get one answer. Run the prompt again, and a different answer emerges. This characteristic of generative AI is great for creative endeavors, but it is a problem for business processes that depend on reproducibility, which is critical for auditability.

Reproducibility issues with generative AI stem from many factors. Understanding how generative AI works is fundamental for designing effective governance, risk management and audit processes at the deployment stage of an AI project.

How LLMs are created and how prompts work

Consider the following issues a governance professional should be aware of at the usage stage of an AI deployment:

  • Veracity.

  • Model and data drift.

  • Prompt interpretation.

  • Neural network path.

  • Tunable transformer models.

LLM creation starts with a massive collection of data sets that span all types of data and represent a variety of topics. From a data quality perspective, while it's possible to check for duplicates, formatting and relevance, it's difficult to verify the veracity of such broad data swaths. This is the first red flag for governance -- veracity. In practice, there have been many published examples of AI returning false responses.

The data is then decomposed into tokens, which represent words, word fragments and punctuation. These tokens are used to train a neural network architecture, which is typically a transformer model. This is tuned before deployment to behave in a specific way. New data can be added to the neural network over time, which dynamically changes the model and thus its performance and outputs. This introduces a second governance red flag -- model and data drift. In practice, this partly explains changes in AI responses for the same prompt over time.
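To make tokenization concrete, here is a deliberately simplified sketch. Production LLMs use subword schemes such as byte-pair encoding rather than the whole-word split shown here, so treat this as an illustration of the decomposition step, not a real tokenizer:

```python
import re

def toy_tokenize(text):
    """Split text into word and punctuation tokens.

    Real LLM tokenizers operate on subword units learned from data
    (e.g., byte-pair encoding); this regex split only illustrates
    the idea that text is decomposed before training or prompting.
    """
    return re.findall(r"\w+|[^\w\s]", text)

print(toy_tokenize("AI governance matters, doesn't it?"))
```

Each token is then mapped to a numeric ID before being fed to the model; the vocabulary of IDs is fixed when the model is trained, which is one reason adding new data changes model behavior.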

From a user perspective, it all starts with a prompt. The prompt is decomposed into tokens and translated in a way that enables the AI to simulate human understanding of the prompt. This is a third red flag for governance -- prompt interpretation. In practice, the ambiguity of language used in the prompt can result in unexpected responses.

Given the sheer scale of the data sets, the neural network uses its understanding of the prompt to formulate a response by compiling a set of next-best tokens or token sequences in a coherent way. This is a fourth red flag for governance -- the neural network path. In practice, users experience this issue when asking AI to try again or to propose a different answer to a prompt.
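The "next-best token" step is where much of the nondeterminism enters: the model produces a score for every candidate token, and one is *sampled* from the resulting probability distribution rather than always taking the top score. The sketch below, with made-up tokens and scores, shows how the sampling temperature sharpens or flattens that distribution:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw model scores into a probability distribution.

    Low temperature concentrates probability on the highest-scoring
    token (near-deterministic output); higher temperature flattens
    the distribution, making different tokens likely on each run.
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens, logits, temperature, rng=random):
    """Sample one token according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical candidate tokens and scores for one decoding step.
tokens = ["approved", "denied", "pending"]
logits = [2.0, 1.5, 0.5]

print(softmax_with_temperature(logits, 0.05))  # near-certain top token
print(softmax_with_temperature(logits, 1.0))   # real chance of alternatives
```

This is why rerunning the same prompt, or asking the model to "try again," can yield a different but still plausible path through the candidate tokens.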

The challenges continue because the transformer models are tunable. Some are accessible through an API, while others can be tuned only by the vendor. This tunability presents a fifth red flag for governance. In practice, the "temperature" setting in API calls produces results that range from more fact-based to more creative. It's crucial for end users to understand how this parameter -- or group of parameters -- is configured, especially if they can't directly modify it themselves.

From this basic explanation, the lack of repeatability becomes easier to understand and thus to accept. The question is what, if anything, can be done to enhance reproducibility.

What to do about reproducibility

There's not a whole lot one can do about the nondeterministic, unpredictable nature of an LLM. However, while not fail-safe, it's possible to minimize inconsistent LLM outcomes for AI-based process automation and some agentic AI.

The following are more end-user controllable approaches:

  • Ensure the prompt is simple and unambiguous. Perform regular prompt hygiene, such as setting the context and defining the role you want AI to play. This makes it more likely the system selects a path through the neural network that is most related to the desired outcome.

  • Use the same model for the same query -- e.g., if you're using GPT-4.1, use it consistently.

  • Test the prompt response if the model is upgraded in any way -- check vendor upgrade notices -- to determine how, if at all, the response characteristics change.

  • Ensure a human in the loop to interpret and review a sampling of responses and decisions.
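The controls above can be checked with a simple rerun harness: submit the same prompt several times, with the model and parameters held constant, and measure how consistent the responses are. This is a sketch; `query_llm` is a hypothetical stand-in for whatever client call your deployment actually uses:

```python
from collections import Counter

def consistency_report(query_llm, prompt, runs=10):
    """Rerun one prompt and summarize response consistency.

    `query_llm` is a hypothetical callable wrapping your actual LLM
    client call, with the model version pinned and parameters fixed.
    Returns the number of distinct responses and the share of runs
    that matched the most common one.
    """
    responses = [query_llm(prompt) for _ in range(runs)]
    counts = Counter(responses)
    most_common, top_count = counts.most_common(1)[0]
    return {
        "distinct_responses": len(counts),
        "most_common": most_common,
        "consistency": top_count / runs,
    }

# Stubbed example: a perfectly consistent "model" for demonstration.
print(consistency_report(lambda prompt: "approved", "Assess claim", runs=5))
```

A human reviewer can then focus on the runs that diverge from the most common response, which supports the human-in-the-loop sampling described above.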

The following is less end-user controllable and generally needs specialist technical assistance:

  • Set the prompt response type to the most fact-based setting, versus a creative setting.
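In OpenAI-style chat completion APIs, this tuning is typically exposed through request parameters. The parameter names and supported values below vary by vendor and API version, so treat this fragment as an assumption to verify against your provider's documentation rather than a definitive configuration:

```python
# Illustrative request parameters for an OpenAI-style chat completion
# call. Names and semantics vary by vendor; verify against your
# provider's API reference before relying on them.
request_params = {
    "model": "gpt-4.1",   # pin one model version per query
    "temperature": 0,     # most fact-based, least creative setting
    "top_p": 1,           # avoid truncating the sampling pool separately
    "seed": 42,           # best-effort determinism, where supported
}

print(request_params)
```

Note that even at temperature 0 with a fixed seed, most vendors describe determinism as best-effort rather than guaranteed, which is why the end-user controls above still matter.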

Taking steps like these helps move toward ensuring explainable AI outcomes, a key attribute of an auditable AI system. Explainable AI is a field concerned with ensuring that AI is understandable to humans and that its outcomes make sense.

Conclusion

If you're concerned about the state of your AI governance, one approach is to move forward systematically. Start with policy and intent, identify the major risks related to your project's specific goals, assess and define appropriate controls with associated roles and responsibilities, and define and document suitable processes. If your major risks are privacy, security and ethics, materials in the public domain are available to help develop suitable AI frameworks, policies, processes and other governance artifacts.

If the goal of your AI initiative is automation, you need to take greater care from a governance perspective to ensure that the nondeterministic nature of LLMs is addressed.

While it's possible to manage LLM unpredictability, it's not recommended that such automation be embedded in your critical operations just yet. That approach is doomed to fail because regulatory reproducibility requirements are unforgiving.

It's not all doom and gloom, though. Apply automation and agentic AI where the potential effects of LLM unpredictability are less risky. Understanding how LLMs work is useful for a governance team because it helps ensure an appropriate level of controls that facilitate some form of outcome reproducibility, hopefully to the satisfaction of audits.

Guy Pearce has an academic background in business, computer science, economics, and the natural and built environments. He has served in senior strategic leadership and governance roles in both the private and public sectors. He leads digital transformations involving IT and data, excelling in building sustainable enterprise capabilities that enable value creation. An industry thought leader with more than 100 published articles, Pearce was awarded the 2019 ISACA® Michael Cangemi Best Author award for contributions to IT governance. He consults on IT and data to business and government.
