Agentic AI Concepts

This document explains the core concepts behind Amelia's agentic architecture for engineers new to agentic AI systems.

What is an "Agent"?

An agent is an LLM given a specific role, tools, and instructions. Unlike a chatbot that simply responds to queries, an agent can take actions: executing commands, writing files, and making decisions.

Each agent has:

  • System prompt: Defines its persona and capabilities
  • Tools: Access to specific operations (shell commands, file operations)
  • Input/output schemas: Structured communication with defined types

Agent Quality Framework

Agent quality is measured across four pillars established in industry research:

| Pillar | Definition | How Amelia Addresses It |
|---|---|---|
| Effectiveness | Does the agent achieve the goal? | Human approval gates validate plans before execution |
| Efficiency | Tokens, latency, steps to completion | Token usage tracking, review iteration limits |
| Robustness | Graceful handling of failures | Retry configuration, structured error types |
| Safety | Operation within ethical/security bounds | 4-layer shell security, path traversal protection |

Traditional QA fails for agents because failures are subtle degradations, not crashes. The system returns 200 OK with wrong answers. This is why observability and trajectory evaluation are architectural requirements, not afterthoughts.

Amelia's Agents

Architect (amelia/agents/architect.py)

Role: Analyzes issues, designs solutions, creates implementation plans.

| Property | Description |
|---|---|
| Input | Issue (id, title, description) + optional Design |
| Output | PlanOutput (markdown plan, goal, key files) |
| Key Feature | Generates rich markdown plans for agentic execution |

The Architect examines an issue and produces a detailed Markdown implementation plan with phases, tasks, and steps. This plan is presented for human approval before execution.

Plan Structure:

  • Goal: Clear description of what needs to be accomplished
  • Phases: Logical groupings of related work
  • Tasks: Discrete units of work within each phase
  • Steps: Specific actions with code blocks, commands, and success criteria

The Architect follows TDD principles when applicable: write test first, verify it fails, implement, verify it passes.

Developer (amelia/agents/developer.py)

Role: Executes code changes through autonomous tool-calling.

| Capability | Description |
|---|---|
| Agentic execution | Autonomous LLM decides which tools to call and when |
| Streaming events | Real-time updates: tool_call, tool_result, thinking, result |
| Session continuity | Maintains context across the execution session |

The Developer uses an agentic execution model where the LLM autonomously decides what actions to take based on the goal and context. It has access to tools for shell commands, file operations, and reading files.

Streaming Events:

  • thinking: Agent is analyzing the situation
  • tool_call: Agent is invoking a tool (shell command, file write, etc.)
  • tool_result: Result from tool execution
  • result: Final output when execution completes
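
A sketch of how a consumer might process this event stream (the event shapes and `fake_execution` generator are simplified stand-ins, not Amelia's actual types):

```python
import asyncio

async def fake_execution():
    """Hypothetical event stream yielding (type, payload) pairs."""
    events = [
        ("thinking", "Reading existing tests"),
        ("tool_call", "bash: pytest"),
        ("tool_result", "1 passed"),
        ("result", "Done"),
    ]
    for event in events:
        yield event

async def consume(stream):
    final = None
    async for etype, payload in stream:
        if etype == "result":
            final = payload                    # final output ends the stream
        else:
            print(f"[{etype}] {payload}")      # real-time progress update
    return final

result = asyncio.run(consume(fake_execution()))
```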

Reviewer (amelia/agents/reviewer.py)

Role: Reviews code changes, provides feedback, approves or requests fixes.

The Reviewer uses agentic execution to auto-detect technologies, load appropriate review skills, and fetch the diff via git. It produces structured feedback with issues categorized by severity (critical, major, minor).

| Output | Description |
|---|---|
| approved | Boolean: whether changes are acceptable |
| issues | List of issues with severity (critical, major, minor) and descriptions |
| summary | High-level summary of the review findings |

The Reviewer examines code changes and either approves them or provides feedback. If changes are not approved, the Developer receives the feedback and attempts fixes. This review-fix loop continues until the changes are approved or the maximum number of iterations (max_review_iterations) is reached.
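
The review-fix loop can be sketched as follows (a simplified illustration with toy stand-ins for the Developer and Reviewer, not Amelia's implementation):

```python
def review_fix_loop(develop, review, max_review_iterations=3):
    """Iterate develop -> review until approved or the iteration limit is hit."""
    feedback = None
    for iteration in range(1, max_review_iterations + 1):
        changes = develop(feedback)           # Developer applies feedback (if any)
        approved, feedback = review(changes)  # Reviewer approves or gives feedback
        if approved:
            return True, iteration
    return False, max_review_iterations

# Toy agents: the reviewer approves once the changes mention tests.
def develop(feedback):
    return "code" if feedback is None else "code tested"

def review(changes):
    return ("tested" in changes), "add tests"

approved, iterations = review_fix_loop(develop, review)
```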

Evaluator (amelia/agents/evaluator.py)

Role: Evaluates review feedback against the actual codebase.

The Evaluator examines each piece of review feedback in context and categorizes it using a decision matrix:

| Decision | Meaning |
|---|---|
| IMPLEMENT | Feedback is valid; apply the fix |
| REJECT | Feedback is wrong or inapplicable |
| DEFER | Not relevant to the current task |
| CLARIFY | Need more information before acting |

Returns an EvaluationResult with categorized items. This prevents the Developer from blindly applying all review suggestions, filtering out false positives and deferring out-of-scope work.
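
The filtering step might look like this sketch (the Decision enum and item shapes are illustrative, not Amelia's actual EvaluationResult types):

```python
from enum import Enum

class Decision(Enum):
    IMPLEMENT = "implement"
    REJECT = "reject"
    DEFER = "defer"
    CLARIFY = "clarify"

def approved_items(evaluated):
    """Keep only feedback the Evaluator judged valid and in scope."""
    return [item for item, decision in evaluated if decision is Decision.IMPLEMENT]

evaluated = [
    ("missing null check", Decision.IMPLEMENT),
    ("rename module", Decision.DEFER),           # out of scope for this task
    ("bug in unrelated file", Decision.REJECT),  # false positive
]
items = approved_items(evaluated)
```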

Oracle (amelia/agents/oracle.py)

Role: Expert consultation agent.

The Oracle takes a problem statement combined with codebase context and provides advice via agentic LLM execution. It acts as an on-demand expert that other agents or the user can consult for architectural guidance, debugging help, or design decisions. Returns an OracleConsultation record.

Brainstormer (amelia/agents/brainstormer.py)

Role: Chat-based design session service.

The Brainstormer facilitates interactive Q&A design sessions with artifacts. It uses a restricted write_design_doc tool that only produces markdown output. Sessions persist and can be handed off to implementation workflows, bridging the gap between ideation and execution.

Plan Validator (amelia/agents/plan_validator.py)

Role: Pipeline node that validates plan structure.

The Plan Validator is not a traditional agent — it is a pipeline node that validates plan structure using regex-based extraction of goal, key_files, and total_tasks from the Architect's markdown plan. Returns a PlanValidationResult. This replaced an earlier LLM-based extraction approach for determinism and speed.
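
A minimal sketch of regex-based plan extraction, assuming a hypothetical plan format (the headings and patterns below are illustrative, not Amelia's actual ones):

```python
import re

PLAN = """\
## Goal
Add user authentication tests

## Key Files
- tests/test_auth.py

### Task 1: Write failing test
### Task 2: Implement auth check
"""

def validate_plan(markdown: str) -> dict:
    """Deterministic extraction of plan metadata via regex (no LLM call)."""
    goal = re.search(r"## Goal\n(.+)", markdown)
    key_files = re.findall(r"^- (\S+)$", markdown, flags=re.MULTILINE)
    total_tasks = len(re.findall(r"^### Task \d+", markdown, flags=re.MULTILINE))
    return {
        "goal": goal.group(1).strip() if goal else None,
        "key_files": key_files,
        "total_tasks": total_tasks,
    }

result = validate_plan(PLAN)
```

Because the extraction is pure string matching, it is deterministic and fast, at the cost of requiring plans to follow the expected structure.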

Tracker Factory (amelia/trackers/factory.py)

Role: Creates the appropriate tracker based on profile configuration.

The create_tracker() factory function returns a BaseTracker implementation (Jira, GitHub, or Noop) based on the profile's tracker setting.
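
A sketch of what such a factory might look like (the tracker classes and registry here are illustrative, not Amelia's implementations):

```python
class NoopTracker:
    """Fallback tracker: the user supplies issue details manually."""
    def get_issue(self, issue_id):
        return {"id": issue_id, "title": "(manual input)", "description": ""}

class GithubTracker:
    def get_issue(self, issue_id):
        ...  # would shell out to the gh CLI in a real implementation

_TRACKERS = {"github": GithubTracker, "none": NoopTracker}

def create_tracker(tracker_setting: str):
    """Map the profile's tracker setting to a concrete implementation."""
    try:
        return _TRACKERS[tracker_setting]()
    except KeyError:
        raise ValueError(f"Unknown tracker: {tracker_setting}")

tracker = create_tracker("none")
```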

What is "Orchestration"?

Orchestration coordinates multiple agents through a workflow. Rather than one monolithic AI call, orchestration breaks work into specialized steps with clear handoffs.

Amelia uses LangGraph's StateGraph for orchestration:

  • Nodes: Individual agent calls (architect, developer, reviewer)
  • Edges: Transitions between nodes
  • Conditional edges: Decision points (approved? review passed?)
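
The node/edge/conditional-edge pattern can be illustrated with a hand-rolled stand-in (this is not LangGraph's API, just a simplified model of the same idea):

```python
# Nodes mutate state; edge functions inspect state and pick the next node.
def architect(state):
    state["plan"] = "markdown plan"
    return state

def developer(state):
    state["done"] = True
    return state

def approval_gate(state):
    # Conditional edge: only proceed to the developer if the plan is approved.
    return "developer" if state["approved"] else "end"

NODES = {"architect": architect, "developer": developer}
EDGES = {"architect": approval_gate, "developer": lambda s: "end"}

def run(state, node="architect"):
    while node != "end":
        state = NODES[node](state)
        node = EDGES[node](state)   # transition decided by the edge function
    return state

final = run({"approved": True})
```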

State Machine

ExecutionState tracks everything:

  • Current profile and issue
  • Generated plan (goal + markdown)
  • Tool calls and results (with reducers for streaming)
  • Approval status
  • Review results
  • driver_session_id: For session continuity
  • review_iteration: Current iteration in review-fix loop
  • review_pass / max_review_passes: Track multi-pass review cycles
  • agentic_status: Current execution status
  • total_tasks: Number of tasks extracted from the plan
  • current_task_index: Which task is being worked on (0-indexed)
  • task_review_iteration: Review attempts for the current task
  • evaluation_result: Output from the Evaluator agent
  • approved_items: Review items approved for implementation
  • external_plan: Plan provided externally (bypasses Architect)

Task-Based Execution

For complex implementations, plans are broken into discrete tasks that are executed and reviewed individually.

How It Works

  1. Plan Parsing: The plan_validator_node extracts task count from the plan markdown
  2. Per-Task Execution: Developer works on one task at a time (current_task_index)
  3. Per-Task Review: Each task has its own review cycle (task_review_iteration resets per task)
  4. Commit on Approval: When a task passes review, changes are committed before moving to the next task
  5. Progression: next_task_node advances to the next task until all tasks complete
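
The steps above can be sketched as a loop (toy callables stand in for the real execute, review, and commit nodes):

```python
def run_tasks(total_tasks, execute, review, commit, max_iters=3):
    """Per-task execute/review/commit loop, mirroring the steps above."""
    for current_task_index in range(total_tasks):
        for task_review_iteration in range(max_iters):
            changes = execute(current_task_index)
            if review(changes):
                commit(current_task_index)   # commit on approval
                break                        # advance to the next task
        else:
            return current_task_index        # task failed after max iterations
    return total_tasks                       # all tasks complete

commits = []
done = run_tasks(
    total_tasks=2,
    execute=lambda i: f"task-{i}",
    review=lambda changes: True,   # toy reviewer approves everything
    commit=commits.append,
)
```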

Benefits

  • Incremental commits: Each task's changes are committed separately
  • Focused review: Reviewer evaluates one task at a time
  • Failure isolation: If a task fails after max iterations, previous tasks are preserved
  • Progress visibility: Clear tracking of which task is being worked on

State Fields

| Field | Purpose |
|---|---|
| total_tasks | Number of tasks in the plan (None = legacy mode) |
| current_task_index | Which task is being worked on (0-indexed) |
| task_review_iteration | Review attempts for current task (resets per task) |

Tool Use

Agents don't just generate text; they call tools. This is what makes them "agentic."

How Tool Calls Work

  1. Agent receives goal/context
  2. Agent decides which tool to call with what parameters
  3. Tool executes and returns result
  4. Agent decides next action based on result
  5. Repeat until goal is achieved
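
This loop can be sketched generically (the decide function stands in for the LLM; here its decisions are scripted for illustration):

```python
def agentic_loop(decide, tools, goal, max_steps=10):
    """Generic tool-calling loop: the model decides, tools execute, repeat."""
    context = [goal]
    for _ in range(max_steps):
        action = decide(context)              # model picks the next action
        if action["tool"] == "finish":
            return action["args"]             # goal achieved
        result = tools[action["tool"]](**action["args"])
        context.append(result)                # feed result back to the model
    raise RuntimeError("step limit reached")

# Scripted stand-in for the LLM's decisions.
script = iter([
    {"tool": "glob", "args": {"pattern": "tests/test_*.py"}},
    {"tool": "finish", "args": {"status": "done"}},
])
out = agentic_loop(
    decide=lambda ctx: next(script),
    tools={"glob": lambda pattern: ["tests/test_api.py"]},
    goal="Add user authentication tests",
)
```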

Example Flow

Developer receives goal: "Add user authentication tests"

  1. Calls glob(pattern="tests/test_*.py"); result: ["tests/test_api.py", "tests/test_utils.py"]
  2. Calls read_file(path="tests/test_api.py"); result: the file contents
  3. Calls write_file(path="tests/test_auth.py", content="..."); result: "File created successfully"
  4. Calls bash(command="pytest tests/test_auth.py"); result: "1 passed"
  5. Marks execution complete

Available Tools

The Developer agent uses FilesystemBackend, which provides these tools:

| Tool | Purpose |
|---|---|
| read_file | Read file contents |
| write_file | Create or overwrite files |
| edit_file | Apply targeted edits to existing files |
| glob | Find files by pattern matching |
| grep | Search file contents with regex |
| bash | Execute shell commands |

Unified Streaming with StreamEvent

Amelia uses a unified StreamEvent type for real-time streaming across all drivers. Regardless of whether you use the API driver, Claude driver, or Codex driver, tool execution progress is communicated through the same event format.

StreamEventType (the event categories):

| Type | Description |
|---|---|
| CLAUDE_THINKING | Agent is analyzing the situation and planning |
| CLAUDE_TOOL_CALL | Agent is invoking a tool with specific parameters |
| CLAUDE_TOOL_RESULT | Result returned from tool execution |
| AGENT_OUTPUT | Final output when agent completes execution |

StreamEvent contains:

  • id: Unique event identifier
  • type: One of the StreamEventType values
  • content: Event payload (text content, result, etc.)
  • timestamp: When the event occurred
  • agent: Which agent produced the event (developer, reviewer, etc.)
  • workflow_id: Links the event to its workflow
  • tool_name: Name of tool being called (for tool events)
  • tool_input: Input parameters for tool calls

Driver Conversion

Each driver converts its native message types to StreamEvent:

```python
# Claude CLI driver converts SDK messages via convert_to_stream_event()
stream_event = convert_to_stream_event(sdk_message, agent="developer", workflow_id="...")

# The UI and logging systems consume StreamEvent uniformly
await emit_event(stream_event)
```

This abstraction allows the dashboard and logging systems to display real-time progress identically regardless of which driver is executing the work.

Sandbox Execution

When sandbox mode is enabled, agents execute inside isolated containers rather than on the host machine. This provides a security boundary between the AI agent's actions and the host environment.

How It Works

The ContainerDriver delegates execution to a sandboxed worker process running inside a container. Two sandbox providers are supported:

| Provider | Description |
|---|---|
| Docker | Local container isolation using Docker |
| Daytona | Cloud-based sandbox environments |

LLM Proxy

API keys never enter the sandbox. Instead, the host runs an LLM proxy that the sandboxed worker connects to for model access. This keeps credentials on the host side of the security boundary while allowing the agent to make LLM calls from within the container.

Host                          Sandbox Container
┌──────────────┐              ┌──────────────────┐
│ LLM Proxy    │◄────────────►│ Worker Process   │
│ (holds keys) │   HTTP       │ (no API keys)    │
└──────────────┘              └──────────────────┘
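
The key-injection idea can be sketched as follows (a conceptual stand-in; the real proxy speaks HTTP across the container boundary, and all names here are hypothetical):

```python
# Placeholder credential that exists only on the host side of the boundary.
HOST_API_KEY = "host-only-secret"

def proxy_request(worker_request: dict) -> dict:
    """Forward a worker's LLM call upstream, attaching credentials host-side."""
    upstream = dict(worker_request)
    upstream["headers"] = {"Authorization": f"Bearer {HOST_API_KEY}"}
    return upstream  # would be sent to the real LLM API over HTTPS

# The worker's request carries no key; the proxy adds it before forwarding.
request = {"model": "some-model", "messages": [{"role": "user", "content": "hi"}]}
forwarded = proxy_request(request)
```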

Compatibility

Only the api driver works in sandbox mode. The claude and codex CLI drivers require local CLI installations and authentication that are not available inside the container.

The Driver Abstraction

Drivers abstract how Amelia communicates with LLMs. This separation enables flexibility across different environments.

Why Multiple Drivers?

| Driver | Use Case | Requirements |
|---|---|---|
| api | Direct API calls via DeepAgents + LangChain. Only driver that works in sandbox mode | OPENROUTER_API_KEY env var |
| claude | Claude CLI wrapper, policy-compliant | claude CLI installed and authenticated |
| codex | OpenAI Codex CLI wrapper | codex CLI installed and authenticated |

Driver Interface

All drivers implement a simple prompt-based interface:

```python
from typing import AsyncIterator, Protocol

class DriverInterface(Protocol):
    async def prompt(
        self,
        prompt: str,
        system_prompt: str | None = None,
        session_id: str | None = None,
    ) -> str:
        """Send prompt and get response."""
        ...

    async def prompt_agentic(
        self,
        prompt: str,
        system_prompt: str | None = None,
        session_id: str | None = None,
    ) -> AsyncIterator[ApiStreamEvent]:
        """Stream agentic execution events."""
        ...
```

Why This Matters

Some environments prohibit direct API calls due to data retention policies. The CLI drivers wrap existing approved tools (claude and codex CLIs) that:

  • Inherit SSO authentication
  • Comply with data policies
  • Use existing security approvals

Users can switch between drivers without code changes; just update the profile.

The Tracker Abstraction

Trackers provide pluggable backends for fetching issues.

| Tracker | Source | Requirements |
|---|---|---|
| jira | Jira issues | JIRA_BASE_URL, JIRA_EMAIL, JIRA_API_TOKEN |
| github | GitHub issues | gh CLI authenticated (gh auth login) |
| none | Manual input | None |

All implement the BaseTracker protocol:

```python
from typing import Protocol

class BaseTracker(Protocol):
    def get_issue(self, issue_id: str) -> Issue:
        """Fetch issue details by ID."""
        ...
```

This abstraction means Amelia works with any issue source without changing the core orchestration logic.

Key Takeaways

  1. Agents are specialized: Architect, Developer, Reviewer, Evaluator, Oracle, Brainstormer, and Plan Validator each have focused roles with defined input/output contracts
  2. Trajectory is truth: Full execution trace persisted for debugging, not just final outputs
  3. Human-in-the-loop: Approval gates at critical decision points prevent runaway execution
  4. Defense in depth: Multiple security layers (metacharacters → blocklist → patterns → allowlist)
  5. Abstractions enable flexibility: Drivers and trackers adapt to organizational constraints without code changes
  6. Observability by design: Events, correlation IDs, and token tracking from day one
  7. Iterative refinement: Developer ↔ Reviewer loop with configurable iteration limits