How to build your first agentic AI system

Agentic AI systems can add value for many businesses -- but only if they're built properly. Development teams looking to build agentic AI can use this guide to get started.

Business software increasingly involves agentic systems that can reason, plan and adapt autonomously. By following established design patterns and practices, developers can build agents that not only respond to prompts but also actively solve complex problems.

Consider a traditional e-commerce system where a customer service ticket arrives in the middle of the night: "Our entire payment system is down, and customers can't complete their purchases!" This message could remain in a queue until human agents address it in the morning.

In contrast, agentic AI systems can immediately act, categorizing the urgency of the ticket, searching knowledge bases for similar incidents and even attempting automated diagnostics.

Agentic AI systems are a viable tool for many business functions. However, their complexity makes them daunting to build. To properly build agentic AI systems, development teams must consider the following seven aspects, each with its own intricacies:

  • Architecture foundations.
  • Planning capabilities.
  • Tool integrations.
  • Memory management and context preservation.
  • Error handling and recovery.
  • Evaluation and performance monitoring.
  • Development best practices.

1. Architecture foundations

When building agentic AI systems, architectural choices shape an agent's capabilities and operational constraints. Developers need to consider large language models (LLMs), coding languages, possible frameworks and other miscellaneous requirements.

Language model

Language models are fundamental to agentic AI systems because agents need to understand goals expressed in human language. When an agent receives a prompt to resolve a customer billing issue, it must parse the intent, understand the domain context and divide that high-level goal into actionable steps. The choice of language model is use case dependent, as it fundamentally shapes an agent's capabilities, costs and operational characteristics.

Coding language

Developers need to choose a coding language when building agentic systems. Python is a popular option due to its extensive machine learning ecosystem. However, JavaScript and TypeScript also work well for web-integrated agents, and some developers use Go for its performance advantages in high-throughput systems.

Framework

Agentic AI systems sometimes require an AI agent framework, which consists of prebuilt software libraries and architectural patterns that facilitate the complex orchestration of agent behavior. This spares developers from building everything from scratch.

Framework selection depends on an agentic system's complexity requirements. Consider the following options:

  • LangGraph. Helpful for stateful workflows with sophisticated branching logic and built-in persistence (see the sketch after this list).
  • CrewAI. Simplifies multi-agent coordination by organizing agents into role-based teams that collaborate on shared objectives.
  • AutoGen. Handles conversational multi-agent interactions and works directly with LLM APIs.
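
For example, a minimal LangGraph sketch might wire a ticket-triage node to a resolution node. Treat this as illustrative only -- the framework's API evolves across versions, and the node logic here is a hypothetical stub:

from typing import TypedDict
from langgraph.graph import StateGraph, END

class TicketState(TypedDict):
    ticket: str
    resolution: str

def triage(state: TicketState) -> TicketState:
    # Classify urgency here, typically with an LLM call.
    return state

def resolve(state: TicketState) -> TicketState:
    # Attempt automated diagnostics, or hand off to a human.
    return {**state, "resolution": "escalated to on-call engineer"}

graph = StateGraph(TicketState)
graph.add_node("triage", triage)
graph.add_node("resolve", resolve)
graph.set_entry_point("triage")
graph.add_edge("triage", "resolve")
graph.add_edge("resolve", END)

app = graph.compile()
result = app.invoke({"ticket": "Payment system down", "resolution": ""})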

Other requirements

Agentic systems also have infrastructure requirements beyond those of traditional applications, including the following:

  • Vector databases. Store and retrieve contextual memory. Unlike traditional databases that match exact keywords, vector databases find conceptually related information, so an agent working on payment processing errors can learn lessons from billing system failures (see the sketch after this list).
  • Message queues. Handle asynchronous operations when agents need to coordinate multiple long-running tasks or communicate with external systems that are slow or unavailable.
  • Monitoring systems. Track agentic workflows, capturing metrics like response times, error rates, reasoning chains, tool use and decision quality so developers can assess agents' efficacy.
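
The following self-contained sketch shows the idea behind vector retrieval, using cosine similarity over hypothetical embedding vectors. A production system would generate embeddings with a model and delegate the search to a vector database:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Higher values mean the two pieces of text are more conceptually related.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings; real ones come from an embedding model.
past_incidents = {
    "billing system failure: stale cache after deploy": np.array([0.9, 0.1, 0.3]),
    "payment gateway timeout under load": np.array([0.8, 0.2, 0.4]),
    "password reset email not delivered": np.array([0.1, 0.9, 0.2]),
}

query = np.array([0.85, 0.15, 0.35])  # embedding of "payment processing errors"

# Rank stored incidents by conceptual similarity, not keyword overlap.
ranked = sorted(past_incidents.items(),
                key=lambda item: cosine_similarity(query, item[1]),
                reverse=True)
print(ranked[0][0])  # the most conceptually related past incident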

2. Planning capabilities

The heart of any agentic system is its ability to break down complex problems into manageable tasks and adapt when initial approaches fail. Two components developers need to build are the agent decision loop and the planning function.

The agent decision loop

Most effective agentic systems implement some variation of a plan-execute-evaluate cycle. This loop forms the foundation of autonomous behavior.

The pseudocode structure looks like the following:

MAIN_AGENT_LOOP:
  iterations = 0
  WHILE goal_not_achieved AND iterations < MAX_LIMIT:
    iterations = iterations + 1
    current_state = ANALYZE_SITUATION(memory, available_tools)
    next_action = PLAN_ACTION(goal, current_state)
    result = EXECUTE_ACTION(next_action)
    evaluation = EVALUATE_RESULT(result, goal)

    IF evaluation.success:
      RETURN result
    ELSE:
      memory.ADD(action, result, evaluation)
      UPDATE_STRATEGY(evaluation.feedback)

  // If we reach here, escalate
  ESCALATE_TO_HUMAN("Goal not achieved within limits")
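
In Python, the same loop might look like the following minimal sketch. The three phase functions are supplied by the caller -- in practice they would wrap LLM calls and tool invocations:

from typing import Any, Callable

def run_agent(goal: str,
              plan: Callable[[str, list], str],
              execute: Callable[[str], Any],
              evaluate: Callable[[Any, str], dict],
              max_iterations: int = 10) -> Any:
    """Generic plan-execute-evaluate loop with an iteration cap."""
    memory: list = []
    for _ in range(max_iterations):
        action = plan(goal, memory)        # decide the next step from goal and history
        result = execute(action)           # run a tool call or an LLM call
        verdict = evaluate(result, goal)   # e.g., {"success": bool, "feedback": str}
        if verdict["success"]:
            return result
        memory.append((action, result, verdict["feedback"]))  # learn from the failure
    raise RuntimeError("Goal not achieved within limits; escalate to a human")

Keeping the loop generic like this makes it easy to swap in different planners or evaluators without touching the control flow.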

The planning function

Effective planning starts with understanding the current environment and the desired actions that lead toward a goal.

Agents need to assess available information, including identifying any knowledge gaps. Then they need to sequence actions logically. In a sense, the planning process becomes a dialogue between the agent and its language model. The system reasons through options before deciding on specific actions.

For the customer support ticket example, the pseudocode for a simple planning function looks like the following:

PLANNING_FUNCTION(goal, current_state, memory, available_tools):
  // Analyze the goal
  goal_components = DECOMPOSE_GOAL(goal)
  success_criteria = DEFINE_SUCCESS_METRICS(goal)

  // Assess current situation
  known_facts = EXTRACT_FACTS(current_state, memory)
  available_actions = LIST_POSSIBLE_ACTIONS(available_tools)
  constraints = IDENTIFY_CONSTRAINTS(time, cost, permissions)

  // Find gaps
  missing_info = COMPARE(goal_components, known_facts)
  blocking_issues = IDENTIFY_BLOCKERS(goal, constraints)

  // Generate action plan
  IF missing_info.critical:
    RETURN PLAN_INFO_GATHERING(missing_info, available_tools)
  ELSE IF blocking_issues.exist:
    RETURN PLAN_ISSUE_RESOLUTION(blocking_issues)
  ELSE:
    RETURN PLAN_DIRECT_ACTION(goal, available_actions)

Alternatively, the planning function can be thought of as working through the following four steps for a given goal (a sketch of the gap analysis follows the list):

Goal: Resolve customer complaint about billing error.

  1. Analyze the goal. Identify the billing issue, determine the cause, fix the error and communicate a resolution.
  2. Assess the current situation. The customer's message, access to the billing system and the knowledge base are available.
  3. Find gaps. Identify missing customer account details and the specific billing period affected.
  4. Generate an action plan. Gather account information, then investigate billing records to determine the fix.
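
The gap analysis in step 3 reduces to simple set logic, as this minimal sketch shows; the field names are hypothetical:

def find_gaps(required: set, known: set) -> set:
    # Information the plan still needs before direct action is possible.
    return required - known

required_facts = {"account_id", "billing_period", "charge_amount"}
known_facts = {"charge_amount"}  # extracted from the customer's message

missing = find_gaps(required_facts, known_facts)
if missing:
    plan = [f"gather {fact}" for fact in sorted(missing)]  # info-gathering plan
else:
    plan = ["investigate billing records", "apply fix"]    # direct-action plan

print(plan)  # ['gather account_id', 'gather billing_period']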

3. Tool integrations

Tools transform agents from conversational interfaces into active participants in a business or technical ecosystem. The key is to design tool interfaces with both flexibility and safety, finding a balance between useful actions and potentially destructive ones.

The following shows a simplified high-level pattern for tool execution, which should be resilient, secure and well-governed:

TOOL_EXECUTION_PATTERN:
  // Validate inputs
  validated_params = VALIDATE_PARAMETERS(tool_params, schema)
  IF validation_failed:
    RETURN error_with_guidance

  // Check permissions
  IF NOT has_permission(agent_id, tool_name, resource):
    RETURN permission_denied

  // Execute with retry logic
  FOR attempt = 1 TO max_retries:
    TRY:
      result = EXECUTE_TOOL(tool_name, validated_params)
      LOG_SUCCESS(tool_name, params, result, attempt, agent_id, ticket_id)
      RETURN result
    CATCH retryable_error:
      LOG_RETRY(tool_name, error, attempt, agent_id, ticket_id)
      WAIT(exponential_backoff(attempt))
    CATCH fatal_error:
      LOG_FAILURE(tool_name, error, attempt, agent_id, ticket_id)
      RETURN error_result

  RETURN max_retries_exceeded

There are a few important points to consider with tool integration, such as the following:

  • Validating inputs before execution should return actionable feedback (error_with_guidance), not a generic failure. LLMs make this kind of adaptive error handling much easier than before. This matters because LLM-based agents operate autonomously, often without hardcoded rules, and decide what to do next based on the results of their actions. If an agent encounters a failure, it can interpret that failure and adjust its approach.
  • Checking permissions means implementing agent scoping. For example, some agents might only read metadata, while others can write to a CRM. (Both validation and scoping are sketched after this list.)
  • Implement a structured retry pattern that differentiates between retryable and fatal errors. This distinction is essential both for observability and for letting LLM agents decide how to respond.
  • Logging is structured with context, which can be used for analysis, optimization and tracing.
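
To make the validation and permission steps concrete, here is a minimal Python sketch; the schema format and permission table are hypothetical simplifications:

# Hypothetical schema: required parameter names mapped to expected types.
REFUND_SCHEMA = {"order_id": str, "amount": float}

# Hypothetical scoping table: which tools each agent may call.
PERMISSIONS = {"support_agent": {"lookup_order", "issue_refund"}}

def validate_parameters(params: dict, schema: dict) -> list:
    """Return actionable guidance strings, not a generic failure."""
    problems = []
    for name, expected in schema.items():
        if name not in params:
            problems.append(f"missing parameter '{name}' ({expected.__name__})")
        elif not isinstance(params[name], expected):
            problems.append(f"'{name}' should be {expected.__name__}, "
                            f"got {type(params[name]).__name__}")
    return problems

def has_permission(agent_id: str, tool_name: str) -> bool:
    # Agent scoping: deny by default, allow only what is listed.
    return tool_name in PERMISSIONS.get(agent_id, set())

print(validate_parameters({"order_id": "A-1001"}, REFUND_SCHEMA))
# ["missing parameter 'amount' (float)"] -- guidance the agent can act on
print(has_permission("support_agent", "issue_refund"))  # True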

4. Memory management and context preservation

As they work through often complex problems, agentic systems accumulate substantial context. To be effective, agents need to remember previous interactions, learn from past decisions and maintain some coherence across workflows.

Short-term memory captures the immediate context of current tasks, such as recent actions, intermediate results and active goals. This information should be readily accessible in the agent's working memory because it influences immediate decisions.

In contrast, long-term memory stores patterns, successful strategies and essential facts that apply across multiple tasks. Vector databases are used for this type of storage, enabling agents to access historical context based on semantic similarity rather than exact matches.

The following shows a simplified memory management system:

MEMORY_MANAGEMENT_SYSTEM:
  // Working memory (immediate context)
  working_memory = CircularBuffer(max_size=10)
  current_goal = None
  active_context = {}

  // Long-term storage (semantic & symbolic)
  vector_store = VectorDatabase()      // stores embeddings + metadata
  tag_index = TagIndex()               // supports tag-based lookup

  FUNCTION add_experience(action, result, outcome):
    experience = {
      action: action,
      result: result,
      outcome: outcome,
      timestamp: current_time(),
      success: outcome.success,
      tags: extract_tags(action, result, outcome),         // semantic + symbolic tags
      embedding: embed_text(action + result + outcome.summary) // vector representation
    }

    // Add to working memory
    working_memory.append(experience)

    // Archive significant experiences to long-term storage immediately
    IF experience_is_significant(experience):
      vector_store.store(experience.embedding, metadata=experience)
      tag_index.index(experience.tags, metadata=experience)

    // When working memory is full, archive the oldest entry before evicting it
    IF working_memory.full():
      archived = working_memory.pop_oldest()
      vector_store.store(archived.embedding, metadata=archived)
      tag_index.index(archived.tags, metadata=archived)

  FUNCTION recall_similar_experiences(current_situation):
    query_embedding = embed_text(describe_situation(current_situation))
    query_tags = extract_tags(current_situation)

    // Hybrid search: vector + tag relevance
    similar_by_vector = vector_store.similarity_search(query_embedding, limit=3)
    similar_by_tags = tag_index.tag_search(query_tags, limit=3)

    RETURN merge_and_rank(similar_by_vector, similar_by_tags)

Here are some key points to note regarding memory management:

  • Memories are enriched with tags and embeddings. This enables both symbolic (tag-based) and semantic (vector-based) recall from the database. Tags could represent concepts such as user intent, business domain, tool category or error type. They make memory more searchable, interpretable and auditable.
  • The significance filter (experience_is_significant) ensures that only essential experiences are promoted from working memory to long-term storage. This mimics human memory -- not every step is worth remembering forever. Developers can define significance in many ways; for example, success/failure, novelty, confidence shifts or user feedback.
  • The recall_similar_experiences function should use hybrid retrieval, combining tag-based lookups with semantic similarity. This lets agents base their decisions on experience -- a core capability for adaptive, reflective behavior. The merge_and_rank step lets developers tune results based on recency, similarity or confidence (a sketch follows this list).
  • Memory systems support both execution and reflection. Agents can consider previous similar occurrences or what kinds of actions can succeed in achieving a specific goal. LLMs excel at using structured context to improve performance, generate explanations or justify escalations.
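
A merge_and_rank step might combine the two result lists with a weighted score, as in this minimal sketch; the weights are arbitrary assumptions to tune per application:

def merge_and_rank(by_vector: list, by_tags: list,
                   vector_weight: float = 0.7, tag_weight: float = 0.3) -> list:
    """Combine vector-similarity and tag-match scores into one ranking."""
    combined = {}
    for memory_id, score in by_vector:
        combined[memory_id] = combined.get(memory_id, 0.0) + vector_weight * score
    for memory_id, score in by_tags:
        combined[memory_id] = combined.get(memory_id, 0.0) + tag_weight * score
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical results as (memory_id, relevance score in [0, 1]) pairs.
print(merge_and_rank([("m1", 0.9), ("m2", 0.6)], [("m2", 1.0), ("m3", 0.8)]))
# ['m2', 'm1', 'm3'] -- m2 ranks first because both retrieval paths surfaced it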

5. Error handling and recovery

Real-world deployment means confronting a wide range of potential failures, including network timeouts, API quotas, unexpected data formats and edge cases that were never encountered in development and testing. Robust agents handle these gracefully rather than crashing or producing incorrect results.

When implementing retry logic for issues such as these, distinguish between temporary issues and fundamental problems. An agent might retry a failed database query several times, but it shouldn't attempt to connect to a nonexistent tool.

To prevent agents from getting stuck in expensive loops, use circuit breaker patterns. Set maximum iteration counts, cost thresholds and time limits that force escalation to human operators when automated handling fails.

The following pseudocode highlights a circuit breaker pattern:

RESILIENT_EXECUTION_PATTERN:
  max_retries = 3
  cost_limit = 10.0
  current_cost = 0.0

  FUNCTION execute_with_recovery(action, context):
    last_error = None

    FOR attempt = 1 TO max_retries:
      // Check cost limits before proceeding
      IF current_cost > cost_limit:
        ESCALATE_TO_HUMAN("Cost limit exceeded")
        RETURN None

      TRY:
        result = EXECUTE_ACTION(action, context)
        current_cost = current_cost + ACTION_COST(action)  // track spend against the limit
        RETURN result

      CATCH RetryableError as e:
        last_error = e
        wait_time = 2^attempt  // Exponential backoff
        SLEEP(wait_time)

      CATCH FatalError as e:
        ESCALATE_TO_HUMAN("Fatal error: " + e.message)
        RETURN None

    // All retries exhausted
    ESCALATE_TO_HUMAN("Max retries exceeded: " + last_error.message)
    RETURN None

This example includes a cost limit, represented as a budget or quota. This could be in dollars, API credits, token usage or even latency thresholds. Cost limits are especially important when agents interact with metered services or production-critical systems, as runaway retries can lead to cascading failures and increased costs.
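
In Python, the circuit breaker pattern might look like this minimal sketch; the error classes and cost figures are hypothetical placeholders:

import time

class RetryableError(Exception): pass
class FatalError(Exception): pass

MAX_RETRIES = 3
COST_LIMIT = 10.0  # hypothetical budget in dollars, credits or tokens

def execute_with_recovery(action, cost_per_call: float = 1.0):
    """Run an action with exponential backoff and a hard cost ceiling."""
    spent = 0.0
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        if spent + cost_per_call > COST_LIMIT:
            raise FatalError("Cost limit exceeded; escalate to a human")
        spent += cost_per_call  # track spend before each attempt
        try:
            return action()
        except RetryableError as err:
            last_error = err
            time.sleep(2 ** attempt)  # exponential backoff: 2s, 4s, 8s
        except FatalError:
            raise  # no point retrying; escalate immediately
    raise FatalError(f"Max retries exceeded: {last_error}")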

6. Evaluation and performance monitoring

Measuring agent performance requires new metrics beyond traditional software measures such as response time and error rates. Those measures remain important, but agentic systems also need metrics that reflect their autonomous decision-making capabilities. Consider the following KPIs for agent performance:

  • Task completion rate. Measure the percentage of objectives that agents complete successfully without human intervention.
  • Action efficiency. Track how many steps agents take to complete typical tasks. While some complexity is unavoidable, dramatic increases in step count might indicate errors in reasoning.
  • Decision quality. Review completed tasks regularly and have domain experts evaluate whether agents selected appropriate strategies and drew reasonable conclusions.

Track these metrics using dashboards and visualizations, and maintain comprehensive logs of agent decisions for post-incident analysis and audit purposes.
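
As a starting point, the first two KPIs can be computed directly from structured task logs, as in this minimal sketch; the record fields are hypothetical:

# Hypothetical task records emitted by the agent's logging layer.
task_log = [
    {"task_id": 1, "completed": True, "escalated": False, "steps": 4},
    {"task_id": 2, "completed": True, "escalated": True, "steps": 9},
    {"task_id": 3, "completed": False, "escalated": True, "steps": 12},
]

# Task completion rate: finished successfully with no human intervention.
autonomous = [t for t in task_log if t["completed"] and not t["escalated"]]
completion_rate = len(autonomous) / len(task_log)

# Action efficiency: average steps per task; watch for upward drift over time.
avg_steps = sum(t["steps"] for t in task_log) / len(task_log)

print(f"completion rate: {completion_rate:.0%}, average steps: {avg_steps:.1f}")
# completion rate: 33%, average steps: 8.3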

7. Development best practices

Start with well-defined problems where success criteria are clear and measurable. After all, a simple agent that works provides more value than a sophisticated system that fails unpredictably. Build confidence in AI deployments by incrementally increasing complexity.

Design for observability from the very start. Every agent decision, tool call and reasoning step should be logged, traceable and auditable. Include enough context in logs to understand why agents made specific choices, not just what actions they took.

Test with ambiguity. Agents will encounter badly formed requests, partial failures, conflicting information and even malicious input. Build test suites that cover these scenarios before deployment.
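
For example, a pytest-style sketch might assert that a hypothetical handle_request entry point degrades gracefully on malformed input instead of crashing:

import pytest

def handle_request(payload: dict) -> dict:
    """Hypothetical agent entry point: never crash on bad input."""
    if not isinstance(payload.get("message"), str) or not payload["message"].strip():
        return {"status": "clarification_needed",
                "reply": "Could you describe the issue in more detail?"}
    return {"status": "accepted", "reply": "Working on it."}

@pytest.mark.parametrize("bad_payload", [
    {},                  # missing message entirely
    {"message": ""},     # empty message
    {"message": "   "},  # whitespace only
    {"message": None},   # wrong type
])
def test_malformed_requests_degrade_gracefully(bad_payload):
    result = handle_request(bad_payload)
    assert result["status"] == "clarification_needed"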

Lastly, maintain human oversight. All agents have limitations, so clear escalation paths prevent minor issues from growing into major incidents.

Donald Farmer is a data strategist with 30-plus years of experience, including as a product team leader at Microsoft and Qlik. He advises global clients on data, analytics, AI and innovation strategy, with expertise spanning from tech giants to startups.
