Case Study March 2026

Building a Persistent Autonomous AI Agent

AEGIS is a production system that runs 24/7 on Cloudflare's edge — routing across 6+ models, managing its own memory, shipping code through governed pipelines, and operating autonomously with real safety constraints. This is the story of building it.

v2.10.35+ 230+ versions shipped

48+ MCP + internal tools

500+ Autonomous tasks executed

51 Active repos managed

The Problem

AI Assistants Are Stateless, Passive, and Fragile

Most AI integrations follow the same pattern: user sends message, model responds, context is lost. Every conversation starts from zero. The model can't remember what happened yesterday, can't act on its own initiative, and has no concept of ongoing work.

I needed something fundamentally different — an AI system that:

Accumulates knowledge across conversations and acts on it
Operates autonomously between interactions — monitoring, analyzing, shipping
Routes across models based on task complexity, not a fixed provider
Governs its own actions with real safety constraints
Runs continuously at near-zero cost on edge infrastructure

Not a chatbot. Not a wrapper. A persistent cognitive system that thinks like a co-founder.

Architecture

Edge-Native, Multi-Model, Zero Origin Servers

AEGIS runs entirely on Cloudflare's edge platform. No containers, no origin servers, no cold starts. The entire system is TypeScript end to end, deployed as a single Worker with D1 for persistence and Vectorize for semantic memory.

9-Tier Model Router

Every request is classified by Workers AI (free), evaluated for complexity, and routed to the cheapest model that can handle it. The nine tiers are direct responses, Workers AI, Groq, Cerebras mid, Cerebras reasoning, GPT-OSS 120B, Claude Sonnet, Claude Opus, and a 4-model composite pipeline.

Workers AI · Groq · Cerebras · GPT-OSS · Claude
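
The "cheapest model that can handle it" rule can be sketched roughly as follows. This is a minimal illustration, not the AEGIS router: the tier names follow the list above, but the `relativeCost` and `maxComplexity` values, the 1 to 9 complexity scale, and the `pickTier` function are all assumptions made for the example.

```typescript
// Hypothetical tier table. Names mirror the article; the cost and capability
// numbers are placeholders chosen only to make the example concrete.
interface Tier {
  name: string;
  relativeCost: number;  // lower is cheaper (illustrative units)
  maxComplexity: number; // highest complexity score this tier is trusted with
}

const TIERS: Tier[] = [
  { name: "direct", relativeCost: 0, maxComplexity: 1 },
  { name: "workers-ai", relativeCost: 1, maxComplexity: 2 },
  { name: "groq", relativeCost: 2, maxComplexity: 3 },
  { name: "cerebras-mid", relativeCost: 3, maxComplexity: 4 },
  { name: "cerebras-reasoning", relativeCost: 4, maxComplexity: 5 },
  { name: "gpt-oss-120b", relativeCost: 5, maxComplexity: 6 },
  { name: "claude-sonnet", relativeCost: 6, maxComplexity: 7 },
  { name: "claude-opus", relativeCost: 7, maxComplexity: 8 },
  { name: "composite", relativeCost: 8, maxComplexity: 9 },
];

// Pick the cheapest tier whose capability covers the request.
// If nothing covers it, fall back to the most capable tier.
function pickTier(complexity: number, tiers: Tier[] = TIERS): Tier {
  if (tiers.length === 0) {
    throw new Error("no tiers configured");
  }
  const capable = tiers.filter((t) => t.maxComplexity >= complexity);
  if (capable.length === 0) {
    return tiers.reduce((best, t) => (t.maxComplexity > best.maxComplexity ? t : best));
  }
  return capable.reduce((best, t) => (t.relativeCost < best.relativeCost ? t : best));
}
```

In this sketch the classifier's only job is to produce the complexity score; keeping tier selection as a pure function makes the routing decision easy to unit test and to tune from procedural feedback.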

6 Memory Subsystems

Semantic memory via Vectorize (BGE-base-en-v1.5, 768-dim). Episodic memory in D1. Procedural learning that improves routing. Persona matrix (20 operator observations). Cross-Repo Intelligence Exchange (CRIX) for pattern sharing. Graph tier for relationship-aware retrieval.

Vectorize · D1 · RRF Fusion
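
The RRF tag refers to Reciprocal Rank Fusion, a standard way to merge ranked lists from different retrievers. The sketch below shows the textbook formula (each document scores the sum of 1 / (k + rank) over the lists it appears in). How AEGIS actually combines Vectorize and D1 results is not described in this article, so the function name, the input shape, and the conventional default k = 60 are assumptions.

```typescript
// Reciprocal Rank Fusion over any number of ranked ID lists.
// Each input list is ordered best-first; ranks are 1-based.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // conventional smoothing constant; an assumption here
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Example: one list from vector search, one from a keyword or SQL lookup.
const vectorHits = ["a", "b", "c"];
const keywordHits = ["b", "d", "a"];
const fused = reciprocalRankFusion([vectorHits, keywordHits]);
```

Because RRF only looks at ranks, it does not need the similarity scores from the two sources to be on a comparable scale.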

20+ Scheduled Tasks

Hourly cron fires a phased task pipeline: escalation, issue watching, morning briefing, memory consolidation, heartbeat monitoring, product health, goal execution, content generation, memory reflection, curiosity cycles, dreaming, cost monitoring, behavioral detectors, D1 backups, and PRISM synthesis.

Heartbeat · Goals · Dreaming
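
One way a single hourly cron can drive tasks with different cadences is to decide, on each tick, which phases are due. The sketch below assumes a simple "every N hours at offset H" schedule; the phase names come from the list above, but the cadences, offsets, and the `duePhases` helper are hypothetical.

```typescript
// Hypothetical schedule entry: run every `everyHours` hours, aligned to `offsetHour` (UTC).
interface PhaseSchedule {
  name: string;
  everyHours: number;
  offsetHour: number;
}

// Illustrative cadences only; the real schedule is not given in the article.
const SCHEDULE: PhaseSchedule[] = [
  { name: "heartbeat", everyHours: 1, offsetHour: 0 },
  { name: "goal-execution", everyHours: 6, offsetHour: 0 },
  { name: "morning-briefing", everyHours: 24, offsetHour: 13 },
  { name: "dreaming", everyHours: 24, offsetHour: 3 },
];

// Return the names of the phases due on this tick, in pipeline order.
function duePhases(schedule: PhaseSchedule[], utcHour: number): string[] {
  return schedule
    .filter((p) => utcHour % p.everyHours === p.offsetHour % p.everyHours)
    .map((p) => p.name);
}
```

A scheduled handler would call something like `duePhases(SCHEDULE, new Date().getUTCHours())` and then run each returned phase in order.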

3-Layer Safety

Shell hooks block destructive operations. CLI constraints prevent interactive prompts. Mission briefs scope each task. On top of these three layers, governance caps limit tasks to 5 per repo and 20 active in total. Authority levels: proposed, auto_safe, operator.

Hooks · Governance · Authority
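
As a rough illustration of the first layer, a hook can test each shell command against a deny-list before it runs. The patterns and the `isDestructive` name below are assumptions for the example; the article does not list which operations the real hooks block.

```typescript
// Hypothetical deny-list of destructive shell operations.
const DESTRUCTIVE_PATTERNS: RegExp[] = [
  /\brm\s+-\w*r\w*f|\brm\s+-\w*f\w*r/, // rm -rf, rm -fr and similar flag orders
  /\bgit\s+push\b.*--force/,           // force pushes (including --force-with-lease)
  /\bgit\s+reset\s+--hard\b/,          // discarding local history
  /\bdrop\s+(table|database)\b/i,      // destructive SQL
];

// True when the command matches any deny-list pattern and should be blocked.
function isDestructive(command: string): boolean {
  return DESTRUCTIVE_PATTERNS.some((pattern) => pattern.test(command));
}
```

A deny-list like this is deliberately conservative and easy to audit; it is a backstop, not a substitute for scoping the task correctly in the mission brief.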

Cognitive Systems

Subsystems Running Under the Hood

Beyond the model router and memory, AEGIS runs a stack of specialized subsystems that handle synthesis, grounding, event intelligence, and autonomous publishing.

PRISM

Nightly synthesis

Pattern Synthesis Daemon. Discovers cross-domain connections between memory facts and surfaces emergent insights. Four adversarial epistemic gates block circular references, parrot responses, self-deception, and gap signals before anything reaches long-term memory.

Grounding Layer

Pre/post dispatch

Anti-hallucination passes on every dispatch. Entity grounding fanout verifies named claims, fabrication detector flags invented facts, semantic sanhedrin runs contradiction detection, and gap signal escalation surfaces knowledge holes rather than guessing.

ARGUS

Webhook intelligence

Ingests GA4, Stripe, and GitHub events and runs heartbeat pattern detection: CI failure clusters, payment anomalies, usage droughts, deploy gaps. Routes signals to CTO/CISO agent consultation before any action is taken.

Nexus

Cross-repo awareness

Maintains contract awareness across all 51 managed repos. Pre-commit guard blocks commits that would collide with in-flight work in consumer repos — catching cross-repo conflicts before they become integration problems.

Content Orchestration

Autonomous publishing

Autonomous Bluesky posting, blog dispatch pipeline, and video brief API for Stackmotion. AEGIS writes the brief, Stackmotion renders and uploads to YouTube. Fully automated content-to-publication flow.

MindSpring

Conversation search

Semantic search across the full conversation history. The Claude Code Stop hook pushes session transcripts into a searchable notebook on session end, making every coding session retrievable by semantic query.

The Pipeline

From GitHub Issue to Merged PR — Autonomously

This is the core innovation. AEGIS doesn't just answer questions — it ships code through a governed pipeline that mirrors how a senior engineer works.

Issue Detection

GitHub issues labeled aegis are detected by the issue watcher. Label-to-category routing then assigns an authority level: for example, the tests label maps to auto_safe and the feature label maps to proposed.
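
The label routing can be pictured as a small lookup table. In the sketch below only the aegis trigger label and the tests and feature mappings come from the article; the `routeIssue` function, the "uncategorized" fallback, and the choice to default unknown labels to proposed are assumptions.

```typescript
type Authority = "proposed" | "auto_safe" | "operator";

const TRIGGER_LABEL = "aegis";

// Only these two mappings are stated in the article; others would be added here.
const LABEL_AUTHORITY = new Map<string, Authority>([
  ["tests", "auto_safe"],
  ["feature", "proposed"],
]);

// Returns null when the issue is not addressed to AEGIS at all.
function routeIssue(labels: string[]): { category: string; authority: Authority } | null {
  if (!labels.some((label) => label === TRIGGER_LABEL)) {
    return null;
  }
  for (const label of labels) {
    const authority = LABEL_AUTHORITY.get(label);
    if (authority !== undefined) {
      return { category: label, authority };
    }
  }
  // Assumption: unknown categories take the most restrictive path (human approval).
  return { category: "uncategorized", authority: "proposed" };
}
```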

Task Creation

cc_tasks are created in D1 with governance checks: per-repo caps (5), active task limit (20), duplicate detection. Proposed tasks require human approval.
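
The governance checks amount to an admission test that runs before a task row is written. The sketch below uses the caps stated above (5 per repo, 20 active); the `ActiveTask` shape, the `admitTask` function, and duplicate detection by normalized title are assumptions, since the article does not say how duplicates are recognized.

```typescript
interface ActiveTask {
  repo: string;
  title: string;
}

type Admission =
  | { ok: true }
  | { ok: false; reason: "global_cap" | "repo_cap" | "duplicate" };

const PER_REPO_CAP = 5;
const TOTAL_ACTIVE_CAP = 20;

// Assumed duplicate key: case-insensitive title with collapsed whitespace.
function normalizeTitle(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, " ");
}

// Decide whether a candidate task may be queued, given the currently active tasks.
function admitTask(candidate: ActiveTask, active: ActiveTask[]): Admission {
  if (active.length >= TOTAL_ACTIVE_CAP) {
    return { ok: false, reason: "global_cap" };
  }
  const sameRepo = active.filter((t) => t.repo === candidate.repo);
  if (sameRepo.length >= PER_REPO_CAP) {
    return { ok: false, reason: "repo_cap" };
  }
  const key = normalizeTitle(candidate.title);
  if (sameRepo.some((t) => normalizeTitle(t.title) === key)) {
    return { ok: false, reason: "duplicate" };
  }
  return { ok: true };
}
```

Returning a reason rather than a bare boolean makes rejected tasks easy to log and surface in a digest.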

Headless Execution

The taskrunner dequeues the next task and launches a headless Claude Code session with a scoped mission brief, active safety hooks, and branch-per-task isolation.

PR + Review

Completed work is committed to auto/{category}/{task-id}, a PR is opened, and Codex runs an automated review. Critical findings get needs-fix labels. Clean PRs get codex-reviewed.
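
The branch convention and the review labels are simple enough to express as two small helpers. The branch format and the two label names come from the text above; the `Severity` values and the assumption that a PR with only non-critical findings counts as clean are illustrative guesses.

```typescript
// Branch-per-task naming: auto/{category}/{task-id}
function branchName(category: string, taskId: string): string {
  return `auto/${category}/${taskId}`;
}

// Hypothetical severity scale for automated review findings.
type Severity = "critical" | "major" | "minor";

// Assumption: only critical findings block; anything else is labeled as reviewed.
function reviewLabel(findings: Severity[]): "needs-fix" | "codex-reviewed" {
  return findings.some((f) => f === "critical") ? "needs-fix" : "codex-reviewed";
}
```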

Session Digest

Every completed task posts a session digest that feeds the dreaming cycle — what was changed, what was learned, what's still open. The system learns from its own work.

Battle Tested

8 Production Incidents, 0 Data Loss

AEGIS has been running in production since March 2026. Here are four of the incidents that shaped the system's resilience:

WARN

.replace() Crash Loop

BizOps used fragile query.replace() for SQL sanitization, causing malformed MCP SSE responses. AEGIS's router called .trim() on null Groq responses without guards. Two-service fix across BizOps validation.ts and AEGIS router.ts + evaluator.ts.

Fix: null guards on all external response boundaries
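
A null guard at an external boundary can be as small as the helper below. The `ProviderReply` shape and the function names are hypothetical; the point is that `.trim()` is only ever called on a value that has been checked to be a string.

```typescript
// Normalize anything that came from outside the process into a trimmed string.
function safeText(value: unknown, fallback = ""): string {
  return typeof value === "string" ? value.trim() : fallback;
}

// Hypothetical shape of a model provider response.
interface ProviderReply {
  content?: string | null;
}

// Returns usable text, or null so the caller can retry or fall through to another tier.
function extractReply(reply: ProviderReply | null | undefined): string | null {
  const text = safeText(reply?.content);
  return text.length > 0 ? text : null;
}
```

Returning null for empty or missing content keeps the "no answer" case explicit, so the caller has to handle it instead of crashing later.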

ALERT

Goal Cadence Runaway

Goal execution hit 28-38 runs per day instead of the expected 4-6. The touchGoal timestamp wasn't being updated on failure paths, causing the same goals to re-fire every cycle.

Fix: moved touchGoal to finally block, ensuring update on all paths
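
The shape of that fix is the classic try/finally pattern: the timestamp update runs whether the goal succeeds or throws. The sketch below is synchronous for brevity and uses a hypothetical in-memory `GoalClock` in place of the real D1 update; an async version would have the same structure with `await` inside the `try`.

```typescript
// Hypothetical stand-in for the persisted "last run" timestamps.
interface GoalClock {
  lastRun: Map<string, number>;
}

// Run a goal and always record the attempt, even when execution throws.
function runGoal(goalId: string, execute: () => void, clock: GoalClock, now: number): boolean {
  try {
    execute();
    return true;
  } catch {
    return false; // failure is reported, but must not skip the timestamp update
  } finally {
    clock.lastRun.set(goalId, now); // the touchGoal equivalent
  }
}

// A goal is due only if it has never run or its interval has elapsed.
function isDue(goalId: string, clock: GoalClock, now: number, intervalMs: number): boolean {
  const last = clock.lastRun.get(goalId);
  return last === undefined || now - last >= intervalMs;
}
```

With the update in `finally`, a goal that fails is not immediately due again on the next cycle, which is the runaway described above.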

WARN

Composite Executor Parameter Dropping

The 4-model composite pipeline was silently dropping tool schemas between the gather and orchestrate phases. Single-subtask queries bypassed synthesis entirely. BizOps mutations were being routed to the wrong model.

Fix: 5-part restructure — gather gets original query, orchestrator sees schemas, synthesis gets raw data, fast-path for single subtask, bizops_mutate routed to GPT-OSS

INFO

Duplicate Email Storm

Heartbeat and escalation both fired on the same cron tick, each sending overlapping alerts about stale agenda items. Users received near-identical emails 1 minute apart.

Fix: escalation returns StaleHighItem[] instead of sending own email; heartbeat folds them into a single consolidated report
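
The consolidation can be sketched as a split between a function that only collects stale items and a single place that formats and sends the report. `StaleHighItem` is named in the fix above, but its fields, the `AgendaItem` shape, the 3-day threshold, and both function names are assumptions.

```typescript
// Hypothetical shapes; only the StaleHighItem name comes from the article.
interface AgendaItem {
  id: string;
  title: string;
  priority: "high" | "normal";
  daysStale: number;
}

interface StaleHighItem {
  id: string;
  title: string;
  daysStale: number;
}

// Escalation no longer sends email; it only returns what it found.
function collectStaleHighItems(items: AgendaItem[], thresholdDays = 3): StaleHighItem[] {
  return items
    .filter((item) => item.priority === "high" && item.daysStale >= thresholdDays)
    .map(({ id, title, daysStale }) => ({ id, title, daysStale }));
}

// Heartbeat owns the single outgoing report and folds escalation results into it.
function buildHeartbeatReport(healthLines: string[], stale: StaleHighItem[]): string {
  const lines = healthLines.slice();
  if (stale.length > 0) {
    lines.push("Stale high-priority items:");
    for (const item of stale) {
      lines.push(`- ${item.title} (${item.daysStale}d)`);
    }
  }
  return lines.join("\n");
}
```

Having exactly one sender per cron tick removes the race between the two tasks by construction.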

Autonomous Cognition

The Dreaming Cycle

Once per day, AEGIS enters a dreaming cycle — an async reflection over the full day's conversation threads, task completions, and memory state, powered by Workers AI (free tier). This is where the system processes what happened, extracts facts, queues tasks, and evolves its persona. PRISM then runs a second synthesis pass to find cross-domain patterns across everything consolidated that day.

Phase 1

Memory Consolidation

Scans recent conversations for important facts, decisions, and patterns. Records to semantic memory. Deduplicates against existing knowledge.

Phase 2

Self-Improvement Analysis

Analyzes its own performance — routing accuracy, task success rates, memory recall quality. Proposes improvements as GitHub issues with category routing.

Phase 3

Task Triage

Reviews open issues across 20+ repos. Promotes stray work items to properly categorized issues. Proposes task queue entries for the taskrunner.

Phase 4

Persona Extraction

Maintains a 20-observation persona matrix across 6 dimensions. Surfaces operator preferences and communication patterns in every prompt via split-recall.

Outcomes

What It Actually Delivered

~$0

Infrastructure cost per month. Cloudflare free tier for Workers, D1, Vectorize. Only pay for Claude API calls on complex tasks.

500+

Autonomous tasks completed — docs, tests, research, bugfixes, refactors — with zero manual intervention after approval.

51

Repositories actively managed. Safety hooks, CLAUDE.md standards, and ADF manifests propagated across the ecosystem.

<50ms

Global edge response time. No origin server. No cold starts. 300+ Cloudflare locations.

230+

Versions shipped since launch. From v1.0 (basic chat) to v2.10.35+, adding MCPA evaluation, ARGUS intelligence, PRISM synthesis, the Grounding layer, Nexus, and 6 memory subsystems along the way.

0

Data loss incidents across 8 production outages. Safety hooks and governance caps caught every destructive action.

Stack

What It Runs On

Cloudflare Workers: Runtime + API + Cron

D1 (SQLite): Persistence + State

Vectorize: Semantic Memory (768-dim)

Workers AI: Classification (Free)

Claude API: Complex Reasoning

Groq: Fast Inference

Resend: Transactional Email

GitHub API: Issues + PRs + Trees

MCP (SSE): Tool Protocol

TypeScript: End to End

Explore AEGIS

The system is live and running right now. Check the health endpoint, read the technical blog, or browse the source.
