Amelia Roadmap

Goal: Agent orchestration for software development. Workflow control from issue to PR with security, observability, and automated quality gates.

Architecture: 12-Factor Agents methodology.

Track Progress: GitHub Project Board

Design Principles

From Anthropic's research on effective agent harnesses:

  1. Model Improvement as Tailwind - Prefer prompts over code, delegation over hardcoding
  2. Structured Handoffs - Explicit state transfer for cross-session work
  3. One Feature at a Time - Focused work with clear completion criteria
  4. Verify Before Declaring Done - Agents test as humans would
  5. Incremental Accountability - Every change is committed, logged, recoverable
  6. Environment as Truth - Git history is the source of truth, not agent memory

Completed

Core Orchestration

LangGraph state machine coordinating specialized agents with human approval gates.

  • Agent orchestration (Architect → Developer → Reviewer loop)
  • Human approval gates before execution
  • Multi-driver support (API via OpenRouter, Claude CLI, Codex CLI)
  • GitHub and Jira issue tracker integrations (read-only)
  • Phased execution for Developer agent with per-task context isolation (#188)
  • Per-agent driver configuration (#279)
  • Multiple workflow pipelines: implementation and review (#260)

Web Dashboard

Local web interface for workflow visibility, approvals, and real-time updates.

  • FastAPI server with PostgreSQL persistence (#308)
  • REST API for workflow lifecycle (create, approve, reject, cancel, resume, replan)
  • React dashboard with XyFlow workflow visualization
  • WebSocket real-time events with auto-reconnect and backfill
  • Event bus with subscription filtering and trace-level logging (#227)
  • Quick Shot modal for launching workflows (#248)
  • External plan import with file picker and async validation (#346, #448)
  • Agent prompt configuration UI (#182)
  • Token usage, cost, and duration tracking in dashboard (#176)
  • Database-backed configuration replacing split YAML/ServerConfig (#307)

Knowledge Library

RAG infrastructure for framework documentation, white papers, and specifications.

  • Document ingestion (PDF, Markdown) (#433, #436)
  • Semantic search with source citations and tag filtering
  • Dashboard UI with search, document management, and upload
  • API endpoints for document CRUD and search

Next: DOCX/HTML ingestion, chat Q&A interface for document queries (see Phase 1 dependencies)

Oracle Consulting System

Foundation for agents to query external knowledge sources with codebase context (#280).

  • Oracle agent with agentic consultation via driver.execute_agentic()
  • FileBundler for gathering codebase files via glob patterns (git-aware, respects .gitignore)
  • Token estimation with tiktoken (cl100k_base) for context management
  • OracleConsultation state model with session metrics, cost tracking, and outcome recording
  • POST /api/oracle/consult endpoint with path traversal prevention
  • 6 WebSocket event types (EventDomain.ORACLE): started, thinking, tool call, tool result, completed, failed
  • Pipeline state integration via oracle_consultations append-only reducer
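
The path traversal prevention mentioned for the consult endpoint can be illustrated with a minimal sketch. This is not Amelia's actual implementation: the function name `resolve_inside` and the error message are hypothetical, and it assumes Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and reject anything that escapes root.

    Hypothetical helper for illustration only.
    """
    root = root.resolve()
    # Joining an absolute user_path discards root, so the check below also
    # catches inputs such as "/etc/passwd".
    candidate = (root / user_path).resolve()
    # True only when candidate is root itself or lies beneath it.
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes working directory: {user_path!r}")
    return candidate
```

Resolving before comparing matters: a plain string prefix check would accept `../` segments and symlinks that point outside the working directory.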

Spec Builder — Brainstorming

Document-assisted design tool for synthesizing specifications from research and requirements (#204).

  • Brainstorming chat interface with streaming (SpecBuilderPage dashboard + BrainstormService backend)
  • Session management with persistence (create, list, delete, resume)
  • Artifact generation from brainstorming sessions (design documents)
  • Handoff from brainstorming to implementation pipeline via Design state
  • Token usage and cost tracking per message
  • Tool execution visualization (tool calls, results, reasoning blocks)
  • Interactive ask_user_question cards for structured user input during brainstorming (#489)

Plan Validation

Automated plan quality enforcement with feedback loops.

  • Plan validation feedback loop: automatic re-validation after architect revisions (#493)

Parallel Execution Foundation

Concurrent workflows with resource management.

  • Concurrent workflows with configurable max_concurrent limit (default: 5)
  • One-workflow-per-worktree constraint with path-based locking
  • Batch workflow API (POST /workflows/start-batch) for starting multiple queued workflows
  • Queue-and-execute pattern: queue_workflow() → start_batch_workflows()
  • Background asyncio task execution with proper state tracking
  • DevContainer sandbox for isolated agent execution (#408, #411)
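
As a rough illustration of the two constraints above (a max_concurrent limit and one workflow per worktree), the sketch below uses an asyncio semaphore and a set of busy paths. The class `WorkflowRunner`, its methods, and the `demo` helper are hypothetical names, not Amelia's API. A real implementation would also need to normalize worktree paths before comparing them.

```python
import asyncio


class WorkflowRunner:
    """Hypothetical runner: caps concurrency and allows one workflow per worktree."""

    def __init__(self, max_concurrent: int = 5) -> None:
        self._slots = asyncio.Semaphore(max_concurrent)
        self._busy_worktrees: set[str] = set()

    async def run(self, worktree: str, job):
        # Path-based lock: refuse a second workflow in the same worktree.
        if worktree in self._busy_worktrees:
            raise RuntimeError(f"worktree already in use: {worktree}")
        self._busy_worktrees.add(worktree)
        try:
            async with self._slots:  # wait here until a concurrency slot frees up
                return await job()
        finally:
            self._busy_worktrees.discard(worktree)


async def demo() -> tuple[list[str], int]:
    """Start four workflows with a limit of two and report the peak concurrency."""
    runner = WorkflowRunner(max_concurrent=2)
    running = peak = 0

    async def job() -> str:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for real agent work
        running -= 1
        return "done"

    results = await asyncio.gather(
        *(runner.run(f"/tmp/worktree-{i}", job) for i in range(4))
    )
    return list(results), peak
```

Calling `asyncio.run(demo())` runs all four jobs to completion while never executing more than two at once.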

Phase 1: Agent Integration & Context [In Progress]

Complete Oracle and Spec Builder integration into the agent workflow, plus context management improvements.

  • Oracle dashboard UI for consultation requests and real-time event streaming
  • Tool registration in driver abstraction for agent-initiated consultations
  • Architect integration: Oracle consultation during plan() for library knowledge
  • Reviewer integration: Oracle consultation during agentic_review() for pattern validation
  • Automated context window management (#229)
  • Automatic code pre-fetching before Architect planning:
    • Existing tests related to modified files
    • Similar features via semantic search
    • Recent commits touching related code
    • CI pipeline status

Dependencies: Knowledge Library + RLM processing for full capability


Phase 2: RLM Document Processing [Planned]

Recursive Language Model (RLM) processing for documents exceeding context limits. Based on work by Zhang, Khattab, and Kraska (MIT CSAIL).

  • Hybrid mode: direct injection (<16K tokens) vs RLM processing (>16K or complex tasks)
  • Structured tools: search_pattern, get_section, chunk_by_size, chunk_by_structure, query_subset, summarize
  • request_capability() for signaling tool gaps → dashboard + analytics
  • Session-scoped caching by hash(tool + params)
  • Extended OracleConsultation model: tools_used, recursive_calls, capability_requests
  • New events: ORACLE_RLM_TOOL_CALL, ORACLE_CAPABILITY_REQUESTED
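
The hybrid-mode routing and session-scoped caching bullets could look roughly like the sketch below. The names `choose_mode` and `SessionToolCache` are hypothetical, SHA-256 over canonical JSON is only one possible reading of hash(tool + params), and treating a document of exactly 16K tokens as direct injection is an assumption, since the roadmap leaves that boundary open.

```python
import hashlib
import json

DIRECT_INJECTION_LIMIT = 16_000  # tokens; threshold from the hybrid-mode bullet


def choose_mode(token_count: int, complex_task: bool = False) -> str:
    """Return "direct" for small documents, "rlm" for large or complex ones."""
    # Assumption: exactly 16K tokens still counts as direct injection.
    if complex_task or token_count > DIRECT_INJECTION_LIMIT:
        return "rlm"
    return "direct"


class SessionToolCache:
    """Hypothetical per-session cache keyed by a hash of tool name and params."""

    def __init__(self) -> None:
        self._results: dict[str, object] = {}
        self.hits = 0

    @staticmethod
    def key(tool: str, params: dict) -> str:
        # Sorting keys makes equal parameter sets hash identically.
        payload = json.dumps({"tool": tool, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, tool: str, params: dict, run):
        """Return the cached result, or call run(**params) once and store it."""
        cache_key = self.key(tool, params)
        if cache_key in self._results:
            self.hits += 1
            return self._results[cache_key]
        result = run(**params)
        self._results[cache_key] = result
        return result
```

Creating one `SessionToolCache` per consultation keeps results from leaking between sessions, which is what "session-scoped" suggests.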

Dependencies: Knowledge Library (provides documents)


Phase 3: Quality Gates & Verification [Planned]

Automated verification before code reaches human reviewers.

Foundation in place:

  • Configurable iteration limits (max_retries, max_iterations) with auto-halt
  • Evaluator agent with decision matrix (IMPLEMENT, REJECT, DEFER, CLARIFY)
  • Pre-push hook running lint, typecheck, test, and dashboard build
  • Retry with exponential backoff for transient failures (RetryConfig, _run_workflow_with_retry)
  • Error compaction/summarization before LLM re-invocation (review feedback currently fed verbatim)
  • Pre-review gates integrated into agent workflow (not just git hooks)
  • Evaluation CI/CD integration (#230)
  • Specialized parallel reviewers (Security, Performance, Accessibility) (#68)
  • Reviewer agent benchmark framework (#8)
  • Self-reflection protocol: Developer self-reviews before Reviewer
  • Security scan integration (SAST tools)
  • Browser automation (Playwright) for E2E verification
  • Configurable coverage thresholds with regression tracking
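
For readers unfamiliar with the retry behaviour listed above, here is a minimal sketch of exponential backoff with a retry cap. It does not reproduce RetryConfig or _run_workflow_with_retry. `BackoffPolicy`, `TransientError`, and `run_with_retry` are hypothetical names, and the default delays are illustrative.

```python
import asyncio
from dataclasses import dataclass


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


@dataclass(frozen=True)
class BackoffPolicy:
    max_retries: int = 3
    base_delay: float = 1.0  # seconds
    max_delay: float = 30.0  # seconds

    def delay(self, attempt: int) -> float:
        # attempt 0 waits base_delay, then the wait doubles, capped at max_delay.
        return min(self.base_delay * 2 ** attempt, self.max_delay)


async def run_with_retry(job, policy: BackoffPolicy, sleep=asyncio.sleep):
    """Await job(), retrying on TransientError until retries are exhausted."""
    for attempt in range(policy.max_retries + 1):
        try:
            return await job()
        except TransientError:
            if attempt == policy.max_retries:
                raise  # halt: the retry budget is spent
            await sleep(policy.delay(attempt))
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Production code would usually add random jitter to the delay so that parallel workflows do not retry in lockstep.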


Phase 4: Parallel Execution — Advanced Patterns [Planned]

Advanced concurrency patterns building on the parallel execution foundation.

  • DAG-aware task scheduling within workflows
  • Resource management (LLM rate limiting, compute allocation)
  • Sectioning pattern: Parallel independent subtasks
  • Voting pattern: Run high-stakes tasks multiple times for consensus
  • Fire-and-forget completion notifications (currently WebSocket/dashboard only)
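
One way to picture DAG-aware scheduling is to group tasks into "waves" whose members have no unmet dependencies and can therefore run in parallel. The sketch below uses the standard-library `graphlib` module (Python 3.9+). The function name `execution_waves` and the task names in the usage note are hypothetical, and the roadmap does not say which scheduling approach Amelia will adopt.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all dependencies done.

    dependencies maps each task to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave complete to unlock successors
    return waves
```

For a plan where `api` and `ui` both depend on `schema`, and `e2e` depends on both, this yields three waves: `schema` first, then `api` and `ui` together, then `e2e`.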

Phase 5: Bidirectional Tracker Sync [Planned]

Full issue lifecycle management from CLI (#64).

  • Create, update, transition, and close issues via CLI
  • Label, milestone, and related-issue management
  • Bidirectional sync with conflict resolution

Phase 6: Pull Request Lifecycle [Planned]

PR management from creation through merge (#66).

  • Generate PRs from task metadata with auto-assigned reviewers
  • Fetch and address review comments with fixup commits
  • Monitor CI status and auto-merge when approved
  • Automatic branch cleanup post-merge

Phase 7: Chat Integration & Notifications [Planned]

Slack/Discord interface for async workflow management (#61).

  • Approval buttons in chat
  • Configurable notification verbosity and quiet hours
  • Thread-per-workflow isolation
  • request_human_input tool for agent-initiated questions (promotes F7 from partial to complete)
  • Mobile pairing API for Volant iOS app (#265)

Phase 8: Observability & Tooling [Planned]

Infrastructure for monitoring, debugging, and extending agent capabilities.

  • Distributed tracing with OTel-compatible spans (#232)
  • Observability metrics foundation (latency, throughput, error rates) (#234)
  • Tool registry for dynamic tool discovery and registration (#233)
  • Read-only DeepAgent mode via technical tool restriction (#357)

Phase 9: Continuous Improvement [Planned]

Metrics and feedback for agent quality improvement (#63).

Foundation in place:

  • Success/failure tracking with success_rate in usage summaries
  • Token usage and cost tracking per agent, model, and workflow (15+ models supported)
  • Costs dashboard with trend charts, success rate badges, and model breakdown
  • Reviewer structured output with severity classification (Critical/Major/Minor)
  • Project-specific knowledge base (idioms, pitfalls, decisions)
  • Prompt A/B testing with benchmark suite
  • Pre-merge evaluation CI for agent quality
  • LLM-as-Judge for nuanced quality assessment


Phase 10: Debate Mode [Planned]

Multi-agent deliberation for design decisions (#202).

  • Moderator assigns perspectives to debater agents
  • Parallel debate rounds with convergence detection
  • Human checkpoints for guidance injection
  • Synthesis documents with recommendations and confidence levels

Phase 11: Security & Authorization [Planned]

Defense-in-depth security with per-agent permissions (#228, #231).

Foundation in place:

  • Path traversal prevention in worktree and file operations
  • allowed_tools parameter on execute_agentic() for per-agent tool restriction (#356)
  • Deterministic guardrails blocking high-risk operations
  • Per-agent tool allowlists (Architect: read-only, Developer: write, Reviewer: read+comment)
  • Configurable risk levels per profile (Prototype/Demo/Production) (#219)
  • Reasoning-based defenses with optional guard model
  • Tool-call-level audit logging with agent identity
  • MCP security: explicit allowlists, collision detection, taint tracking
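
The per-agent allowlists and tool-call audit logging above can be sketched as a deny-by-default lookup. Everything here is illustrative: the tool names, the `ALLOWED_TOOLS` table, and the `authorize` function are hypothetical and only mirror the read-only / write / read+comment split described in the list.

```python
# Hypothetical tool names grouped by the roles named in the roadmap.
ALLOWED_TOOLS: dict[str, frozenset[str]] = {
    "architect": frozenset({"read_file", "search"}),  # read-only
    "developer": frozenset({"read_file", "search", "write_file", "run_command"}),
    "reviewer": frozenset({"read_file", "search", "add_comment"}),  # read + comment
}

# In-memory stand-in for an audit sink; each entry records the agent identity.
audit_log: list[dict] = []


def authorize(agent: str, tool: str) -> bool:
    """Return True if agent may call tool; log every decision either way."""
    # Unknown agents fall back to an empty allowlist, so they are denied.
    allowed = tool in ALLOWED_TOOLS.get(agent, frozenset())
    audit_log.append({"agent": agent, "tool": tool, "allowed": allowed})
    return allowed
```

Because the check is a plain set lookup, it is deterministic and cannot be talked around by prompt content, which is the point of a guardrail that sits outside the model.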


Phase 12: Cloud Deployment [Planned]

Parallel execution on cloud infrastructure.

  • Multiple workflows running in parallel on AWS
  • Thin CLI client for submitting and monitoring
  • OAuth-based authentication with GitHub

Phase 13: Capitalization Tracking [Planned]

Engineering work attribution for financial reporting (OPEX vs CAPEX) (#70).

  • Initiative resolution from Jira Epics or GitHub Projects
  • Hours estimation from workflow timestamps
  • OPEX vs CAPEX classification
  • CLI and dashboard reporting with audit trails

12-Factor Compliance Summary

| Factor | Status | Primary Phases | Notes |
|---|---|---|---|
| F1: Natural Language → Tool Calls | 🟡 Partial | Core Orchestration | Evaluator/Knowledge use Pydantic schemas; core agents (Architect/Developer/Reviewer) use free-form agentic mode |
| F2: Own Your Prompts | ✅ Complete | Core Orchestration | Full versioning system with DB persistence, PromptResolver, workflow audit linking |
| F3: Own Your Context Window | 🚧 In Progress | Phases 1, 2 | Task sectioning exists; no token counting or context budgeting |
| F4: Tools = Structured Outputs | 🟡 Partial | Core Orchestration, Phase 8 | Canonical ToolName registry; tool schemas owned by driver frameworks, not Amelia |
| F5: Unified State | ✅ Complete | Web Dashboard | Frozen BasePipelineState → ImplementationState hierarchy with LangGraph reducers |
| F6: Launch/Pause/Resume | ✅ Complete | Web Dashboard | 8 REST endpoints, LangGraph interrupt_before, PostgreSQL checkpointing |
| F7: Contact Humans with Tools | 🟡 Partial | Phases 7, 10, 11 | human_approval_node exists; agents cannot initiate contact mid-execution; no outbound notifications |
| F8: Own Your Control Flow | ✅ Complete | Core Orchestration | Custom routing functions with business logic, two pipeline graphs |
| F9: Compact Errors | 🟡 Partial | Phases 3, 8 | Exponential backoff retry exists; no error compaction for LLM context |
| F10: Small Focused Agents | ✅ Complete | Core Orchestration | 4 narrow agents, step limits, per-task fresh sessions |
| F11: Trigger from Anywhere | 🟡 Partial | Web Dashboard, Phase 7 | CLI + REST + Dashboard; no inbound webhooks or event-driven triggers |
| F12: Stateless Reducer | ✅ Complete | Core Orchestration | Frozen Pydantic state, operator.add reducers, pure node functions |
| F13: Pre-fetch Context | 🟡 Partial | Phase 1 | Issue/commit/design/prompts pre-fetched; codebase context via runtime exploration |

Current: 6 Complete, 1 In Progress, 6 Partial
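
The F12 note (frozen state, operator.add reducers, pure node functions) describes a pattern that can be shown in a few lines. The sketch below substitutes a standard-library frozen dataclass for Pydantic and a hand-written `apply_update` for LangGraph's reducer machinery, so it illustrates the idea rather than Amelia's code. `PipelineState`, `REDUCERS`, `apply_update`, and `review_node` are hypothetical names.

```python
import operator
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineState:
    """Immutable state; fields cannot be reassigned after construction."""

    status: str = "pending"
    events: tuple[str, ...] = ()


# Per-field reducers: append-only fields combine old and new with operator.add,
# every other field is simply overwritten by the update.
REDUCERS = {"events": operator.add}


def apply_update(state: PipelineState, update: dict) -> PipelineState:
    """Fold a node's partial update into a new state object."""
    merged = {}
    for name, value in update.items():
        reducer = REDUCERS.get(name)
        merged[name] = reducer(getattr(state, name), value) if reducer else value
    return replace(state, **merged)


def review_node(state: PipelineState) -> dict:
    """Pure node: reads state, returns an update, mutates nothing."""
    return {"status": "reviewed", "events": ("review_completed",)}
```

Since each node only returns a partial update and the old state is never modified, any checkpointed state can be replayed through the same nodes and produce the same result.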

