Agentic AI Concepts

This document explains the core concepts behind Amelia's agentic architecture for engineers new to agentic AI systems.

What is an "Agent"?

An agent is an LLM given a specific role, tools, and instructions. Unlike a chatbot that simply responds to queries, an agent can take actions: executing commands, writing files, and making decisions.

Each agent has:

  • System prompt: Defines its persona and capabilities
  • Tools: Access to specific operations (shell commands, file operations)
  • Input/output schemas: Structured communication with defined types

Agent Quality Framework

Agent quality is measured across four pillars established in industry research:

| Pillar | Definition | How Amelia Addresses It |
|---|---|---|
| Effectiveness | Does the agent achieve the goal? | Human approval gates validate plans before execution |
| Efficiency | Tokens, latency, steps to completion | Token usage tracking, review iteration limits |
| Robustness | Graceful handling of failures | Retry configuration, structured error types |
| Safety | Operation within ethical/security bounds | 4-layer shell security, path traversal protection |

Traditional QA fails for agents because failures are subtle degradations, not crashes. The system returns 200 OK with wrong answers. This is why observability and trajectory evaluation are architectural requirements, not afterthoughts.

Amelia's Agents

Architect (amelia/agents/architect.py)

Role: Analyzes issues, designs solutions, creates implementation plans.

| Property | Description |
|---|---|
| Input | Issue (id, title, description) + optional Design |
| Output | PlanOutput (markdown plan, goal, key files) |
| Key Feature | Generates rich markdown plans for agentic execution |

The Architect examines an issue and produces a detailed Markdown implementation plan with phases, tasks, and steps. This plan is presented for human approval before execution.

Plan Structure:

  • Goal: Clear description of what needs to be accomplished
  • Phases: Logical groupings of related work
  • Tasks: Discrete units of work within each phase
  • Steps: Specific actions with code blocks, commands, and success criteria

The Architect follows TDD principles when applicable: write test first, verify it fails, implement, verify it passes.

Developer (amelia/agents/developer.py)

Role: Executes code changes through autonomous tool-calling.

| Capability | Description |
|---|---|
| Agentic execution | Autonomous LLM decides which tools to call and when |
| Streaming events | Real-time updates: tool_call, tool_result, thinking, result |
| Session continuity | Maintains context across the execution session |

The Developer uses an agentic execution model where the LLM autonomously decides what actions to take based on the goal and context. It has access to tools for shell commands, file operations, and reading files.

Streaming Events:

  • thinking: Agent is analyzing the situation
  • tool_call: Agent is invoking a tool (shell command, file write, etc.)
  • tool_result: Result from tool execution
  • result: Final output when execution completes
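
A sketch of how a consumer might process this event stream (the event shapes and `fake_execution` generator are simplified stand-ins, not Amelia's actual types):

```python
import asyncio

async def fake_execution():
    """Hypothetical event stream yielding (type, payload) pairs."""
    events = [
        ("thinking", "Reading existing tests"),
        ("tool_call", "bash: pytest"),
        ("tool_result", "1 passed"),
        ("result", "Done"),
    ]
    for event in events:
        yield event

async def consume(stream):
    final = None
    async for etype, payload in stream:
        if etype == "result":
            final = payload                    # final output ends the stream
        else:
            print(f"[{etype}] {payload}")      # real-time progress update
    return final

result = asyncio.run(consume(fake_execution()))
```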

Reviewer (amelia/agents/reviewer.py)

Role: Reviews code changes, provides feedback, approves or requests fixes.

The Reviewer uses agentic execution to auto-detect technologies, load appropriate review skills, and fetch the diff via git. It produces structured feedback with issues categorized by severity (critical, major, minor).

| Output | Description |
|---|---|
| approved | Boolean: whether changes are acceptable |
| issues | List of issues with severity (critical, major, minor) and descriptions |
| summary | High-level summary of the review findings |

The Reviewer examines code changes and either approves them or provides feedback. If changes are not approved, the Developer receives the feedback and attempts fixes. This review-fix loop continues until the changes are approved or the maximum number of iterations (max_review_iterations) is reached.
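
The review-fix loop can be sketched as follows (a simplified illustration with toy stand-ins for the Developer and Reviewer, not Amelia's implementation):

```python
def review_fix_loop(develop, review, max_review_iterations=3):
    """Iterate develop -> review until approved or the iteration limit is hit."""
    feedback = None
    for iteration in range(1, max_review_iterations + 1):
        changes = develop(feedback)           # Developer applies feedback (if any)
        approved, feedback = review(changes)  # Reviewer approves or gives feedback
        if approved:
            return True, iteration
    return False, max_review_iterations

# Toy agents: the reviewer approves once the changes mention tests.
def develop(feedback):
    return "code" if feedback is None else "code tested"

def review(changes):
    return ("tested" in changes), "add tests"

approved, iterations = review_fix_loop(develop, review)
```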

Evaluator (amelia/agents/evaluator.py)

Role: Evaluates review feedback against the actual codebase.

The Evaluator examines each piece of review feedback in context and categorizes it using a decision matrix:

| Decision | Meaning |
|---|---|
| IMPLEMENT | Feedback is valid; apply the fix |
| REJECT | Feedback is wrong or inapplicable |
| DEFER | Not relevant to the current task |
| CLARIFY | Need more information before acting |

Returns an EvaluationResult with categorized items. This prevents the Developer from blindly applying all review suggestions, filtering out false positives and deferring out-of-scope work.
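
The filtering step might look like this sketch (the Decision enum and item shapes are illustrative, not Amelia's actual EvaluationResult types):

```python
from enum import Enum

class Decision(Enum):
    IMPLEMENT = "implement"
    REJECT = "reject"
    DEFER = "defer"
    CLARIFY = "clarify"

def approved_items(evaluated):
    """Keep only feedback the Evaluator judged valid and in scope."""
    return [item for item, decision in evaluated if decision is Decision.IMPLEMENT]

evaluated = [
    ("missing null check", Decision.IMPLEMENT),
    ("rename module", Decision.DEFER),           # out of scope for this task
    ("bug in unrelated file", Decision.REJECT),  # false positive
]
items = approved_items(evaluated)
```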

Oracle (amelia/agents/oracle.py)

Role: Expert consultation agent.

The Oracle takes a problem statement combined with codebase context and provides advice via agentic LLM execution. It acts as an on-demand expert that other agents or the user can consult for architectural guidance, debugging help, or design decisions. Returns an OracleConsultation record.

Brainstormer (amelia/agents/brainstormer.py)

Role: Chat-based design session service.

The Brainstormer facilitates interactive Q&A design sessions with artifacts. It uses a restricted write_design_doc tool that only produces markdown output. Sessions persist and can be handed off to implementation workflows, bridging the gap between ideation and execution.

Plan Validator (amelia/agents/plan_validator.py)

Role: Pipeline node that validates plan structure.

The Plan Validator is not a traditional agent — it is a pipeline node that validates plan structure using regex-based extraction of goal, key_files, and total_tasks from the Architect's markdown plan. Returns a PlanValidationResult. This replaced an earlier LLM-based extraction approach for determinism and speed.
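
A minimal sketch of regex-based plan extraction, assuming a hypothetical plan format (the headings and patterns below are illustrative, not Amelia's actual ones):

```python
import re

PLAN = """\
## Goal
Add user authentication tests

## Key Files
- tests/test_auth.py

### Task 1: Write failing test
### Task 2: Implement auth check
"""

def validate_plan(markdown: str) -> dict:
    """Deterministic extraction of plan metadata via regex (no LLM call)."""
    goal = re.search(r"## Goal\n(.+)", markdown)
    key_files = re.findall(r"^- (\S+)$", markdown, flags=re.MULTILINE)
    total_tasks = len(re.findall(r"^### Task \d+", markdown, flags=re.MULTILINE))
    return {
        "goal": goal.group(1).strip() if goal else None,
        "key_files": key_files,
        "total_tasks": total_tasks,
    }

result = validate_plan(PLAN)
```

Because the extraction is pure string matching, it is deterministic and fast, at the cost of requiring plans to follow the expected structure.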

Tracker Factory (amelia/trackers/factory.py)

Role: Creates the appropriate tracker based on profile configuration.

The create_tracker() factory function returns a BaseTracker implementation (Jira, GitHub, or Noop) based on the profile's tracker setting.
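
A sketch of what such a factory might look like (the tracker classes and registry here are illustrative, not Amelia's implementations):

```python
class NoopTracker:
    """Fallback tracker: the user supplies issue details manually."""
    def get_issue(self, issue_id):
        return {"id": issue_id, "title": "(manual input)", "description": ""}

class GithubTracker:
    def get_issue(self, issue_id):
        ...  # would shell out to the gh CLI in a real implementation

_TRACKERS = {"github": GithubTracker, "none": NoopTracker}

def create_tracker(tracker_setting: str):
    """Map the profile's tracker setting to a concrete implementation."""
    try:
        return _TRACKERS[tracker_setting]()
    except KeyError:
        raise ValueError(f"Unknown tracker: {tracker_setting}")

tracker = create_tracker("none")
```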

What is "Orchestration"?

Orchestration coordinates multiple agents through a workflow. Rather than one monolithic AI call, orchestration breaks work into specialized steps with clear handoffs.

Amelia uses LangGraph's StateGraph for orchestration:

  • Nodes: Individual agent calls (architect, developer, reviewer)
  • Edges: Transitions between nodes
  • Conditional edges: Decision points (approved? review passed?)
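
The node/edge/conditional-edge pattern can be illustrated with a hand-rolled stand-in (this is not LangGraph's API, just a simplified model of the same idea):

```python
# Nodes mutate state; edge functions inspect state and pick the next node.
def architect(state):
    state["plan"] = "markdown plan"
    return state

def developer(state):
    state["done"] = True
    return state

def approval_gate(state):
    # Conditional edge: only proceed to the developer if the plan is approved.
    return "developer" if state["approved"] else "end"

NODES = {"architect": architect, "developer": developer}
EDGES = {"architect": approval_gate, "developer": lambda s: "end"}

def run(state, node="architect"):
    while node != "end":
        state = NODES[node](state)
        node = EDGES[node](state)   # transition decided by the edge function
    return state

final = run({"approved": True})
```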

State Machine

ExecutionState tracks everything:

  • Current profile and issue
  • Generated plan (goal + markdown)
  • Tool calls and results (with reducers for streaming)
  • Approval status
  • Review results
  • driver_session_id: For session continuity
  • review_iteration: Current iteration in review-fix loop
  • review_pass / max_review_passes: Track multi-pass review cycles
  • agentic_status: Current execution status
  • total_tasks: Number of tasks extracted from the plan
  • current_task_index: Which task is being worked on (0-indexed)
  • task_review_iteration: Review attempts for the current task
  • evaluation_result: Output from the Evaluator agent
  • approved_items: Review items approved for implementation
  • external_plan: Plan provided externally (bypasses Architect)

Task-Based Execution

For complex implementations, plans are broken into discrete tasks that are executed and reviewed individually.

How It Works

  1. Plan Parsing: The plan_validator_node extracts task count from the plan markdown
  2. Per-Task Execution: Developer works on one task at a time (current_task_index)
  3. Per-Task Review: Each task has its own review cycle (task_review_iteration resets per task)
  4. Commit on Approval: When a task passes review, changes are committed before moving to the next task
  5. Progression: next_task_node advances to the next task until all tasks complete
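
The steps above can be sketched as a loop (toy callables stand in for the real execute, review, and commit nodes):

```python
def run_tasks(total_tasks, execute, review, commit, max_iters=3):
    """Per-task execute/review/commit loop, mirroring the steps above."""
    for current_task_index in range(total_tasks):
        for task_review_iteration in range(max_iters):
            changes = execute(current_task_index)
            if review(changes):
                commit(current_task_index)   # commit on approval
                break                        # advance to the next task
        else:
            return current_task_index        # task failed after max iterations
    return total_tasks                       # all tasks complete

commits = []
done = run_tasks(
    total_tasks=2,
    execute=lambda i: f"task-{i}",
    review=lambda changes: True,   # toy reviewer approves everything
    commit=commits.append,
)
```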

Benefits

  • Incremental commits: Each task's changes are committed separately
  • Focused review: Reviewer evaluates one task at a time
  • Failure isolation: If a task fails after max iterations, previous tasks are preserved
  • Progress visibility: Clear tracking of which task is being worked on

State Fields

| Field | Purpose |
|---|---|
| total_tasks | Number of tasks in the plan (None = legacy mode) |
| current_task_index | Which task is being worked on (0-indexed) |
| task_review_iteration | Review attempts for current task (resets per task) |

Tool Use

Agents don't just generate text; they call tools. This is what makes them "agentic."

How Tool Calls Work

  1. Agent receives goal/context
  2. Agent decides which tool to call with what parameters
  3. Tool executes and returns result
  4. Agent decides next action based on result
  5. Repeat until goal is achieved
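
This loop can be sketched generically (the decide function stands in for the LLM; here its decisions are scripted for illustration):

```python
def agentic_loop(decide, tools, goal, max_steps=10):
    """Generic tool-calling loop: the model decides, tools execute, repeat."""
    context = [goal]
    for _ in range(max_steps):
        action = decide(context)              # model picks the next action
        if action["tool"] == "finish":
            return action["args"]             # goal achieved
        result = tools[action["tool"]](**action["args"])
        context.append(result)                # feed result back to the model
    raise RuntimeError("step limit reached")

# Scripted stand-in for the LLM's decisions.
script = iter([
    {"tool": "glob", "args": {"pattern": "tests/test_*.py"}},
    {"tool": "finish", "args": {"status": "done"}},
])
out = agentic_loop(
    decide=lambda ctx: next(script),
    tools={"glob": lambda pattern: ["tests/test_api.py"]},
    goal="Add user authentication tests",
)
```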

Example Flow

Developer receives goal: "Add user authentication tests"

  1. Calls glob(pattern="tests/test_*.py"); result: ["tests/test_api.py", "tests/test_utils.py"]
  2. Calls read_file(path="tests/test_api.py"); result: the file contents
  3. Calls write_file(path="tests/test_auth.py", content="..."); result: "File created successfully"
  4. Calls bash(command="pytest tests/test_auth.py"); result: "1 passed"
  5. Marks execution complete

Available Tools

The Developer agent uses FilesystemBackend, which provides these tools:

| Tool | Purpose |
|---|---|
| read_file | Read file contents |
| write_file | Create or overwrite files |
| edit_file | Apply targeted edits to existing files |
| glob | Find files by pattern matching |
| grep | Search file contents with regex |
| bash | Execute shell commands |

Unified Streaming with StreamEvent

Amelia uses a unified StreamEvent type for real-time streaming across all drivers. Regardless of whether you use the API driver, Claude driver, or Codex driver, tool execution progress is communicated through the same event format.

StreamEventType (the event categories):

| Type | Description |
|---|---|
| CLAUDE_THINKING | Agent is analyzing the situation and planning |
| CLAUDE_TOOL_CALL | Agent is invoking a tool with specific parameters |
| CLAUDE_TOOL_RESULT | Result returned from tool execution |
| AGENT_OUTPUT | Final output when agent completes execution |

StreamEvent contains:

  • id: Unique event identifier
  • type: One of the StreamEventType values
  • content: Event payload (text content, result, etc.)
  • timestamp: When the event occurred
  • agent: Which agent produced the event (developer, reviewer, etc.)
  • workflow_id: Links the event to its workflow
  • tool_name: Name of tool being called (for tool events)
  • tool_input: Input parameters for tool calls

Driver Conversion

Each driver converts its native message types to StreamEvent:

```python
# Claude CLI driver converts SDK messages via convert_to_stream_event()
stream_event = convert_to_stream_event(sdk_message, agent="developer", workflow_id="...")

# The UI and logging systems consume StreamEvent uniformly
await emit_event(stream_event)
```

This abstraction allows the dashboard and logging systems to display real-time progress identically regardless of which driver is executing the work.

Sandbox Execution

When sandbox mode is enabled, agents execute inside isolated containers rather than on the host machine. This provides a security boundary between the AI agent's actions and the host environment.

How It Works

The ContainerDriver delegates execution to a sandboxed worker process running inside a container. Two sandbox providers are supported:

| Provider | Description |
|---|---|
| Docker | Local container isolation using Docker |
| Daytona | Cloud-based sandbox environments |

LLM Proxy

API keys never enter the sandbox. Instead, the host runs an LLM proxy that the sandboxed worker connects to for model access. This keeps credentials on the host side of the security boundary while allowing the agent to make LLM calls from within the container.

Host                          Sandbox Container
┌──────────────┐              ┌──────────────────┐
│ LLM Proxy    │◄────────────►│ Worker Process   │
│ (holds keys) │   HTTP       │ (no API keys)    │
└──────────────┘              └──────────────────┘
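
The key-injection idea can be sketched as follows (a conceptual stand-in; the real proxy speaks HTTP across the container boundary, and all names here are hypothetical):

```python
# Placeholder credential that exists only on the host side of the boundary.
HOST_API_KEY = "host-only-secret"

def proxy_request(worker_request: dict) -> dict:
    """Forward a worker's LLM call upstream, attaching credentials host-side."""
    upstream = dict(worker_request)
    upstream["headers"] = {"Authorization": f"Bearer {HOST_API_KEY}"}
    return upstream  # would be sent to the real LLM API over HTTPS

# The worker's request carries no key; the proxy adds it before forwarding.
request = {"model": "some-model", "messages": [{"role": "user", "content": "hi"}]}
forwarded = proxy_request(request)
```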

Compatibility

Only the api driver works in sandbox mode. The claude and codex CLI drivers require local CLI installations and authentication that are not available inside the container.

The Driver Abstraction

Drivers abstract how Amelia communicates with LLMs. This separation enables flexibility across different environments.

Why Multiple Drivers?

| Driver | Use Case | Requirements |
|---|---|---|
| api | Direct API calls via DeepAgents + LangChain. Only driver that works in sandbox mode | OPENROUTER_API_KEY env var |
| claude | Claude CLI wrapper, policy-compliant | claude CLI installed and authenticated |
| codex | OpenAI Codex CLI wrapper | codex CLI installed and authenticated |

Driver Interface

All drivers implement a simple prompt-based interface:

```python
from typing import AsyncIterator, Protocol

class DriverInterface(Protocol):
    async def prompt(
        self,
        prompt: str,
        system_prompt: str | None = None,
        session_id: str | None = None,
    ) -> str:
        """Send prompt and get response."""
        ...

    async def prompt_agentic(
        self,
        prompt: str,
        system_prompt: str | None = None,
        session_id: str | None = None,
    ) -> AsyncIterator[ApiStreamEvent]:
        """Stream agentic execution events."""
        ...
```

Why This Matters

Some environments prohibit direct API calls due to data retention policies. The CLI drivers wrap existing approved tools (claude and codex CLIs) that:

  • Inherit SSO authentication
  • Comply with data policies
  • Use existing security approvals

Users can switch between drivers without code changes; just update the profile.

The Tracker Abstraction

Trackers provide pluggable backends for fetching issues.

| Tracker | Source | Requirements |
|---|---|---|
| jira | Jira issues | JIRA_BASE_URL, JIRA_EMAIL, JIRA_API_TOKEN |
| github | GitHub issues | gh CLI authenticated (gh auth login) |
| none | Manual input | None |

All implement the BaseTracker protocol:

```python
from typing import Protocol

class BaseTracker(Protocol):
    def get_issue(self, issue_id: str) -> Issue:
        """Fetch issue details by ID."""
        ...
```

This abstraction means Amelia works with any issue source without changing the core orchestration logic.

Key Takeaways

  1. Agents are specialized: Architect, Developer, Reviewer, Evaluator, Oracle, Brainstormer, and Plan Validator each have focused roles with defined input/output contracts
  2. Trajectory is truth: Full execution trace persisted for debugging, not just final outputs
  3. Human-in-the-loop: Approval gates at critical decision points prevent runaway execution
  4. Defense in depth: Multiple security layers (metacharacters → blocklist → patterns → allowlist)
  5. Abstractions enable flexibility: Drivers and trackers adapt to organizational constraints without code changes
  6. Observability by design: Events, correlation IDs, and token tracking from day one
  7. Iterative refinement: Developer ↔ Reviewer loop with configurable iteration limits