Amy

Architecture

The complete picture of how Amy's backend is built, why each piece was chosen, and where the seams are.

Goals (in priority order)

  1. Backend developer experience is the product. Adding a new client surface (mobile, web, partner integration, AI agent) should feel like "install SDK, call function." If it doesn't, the backend has failed.
  2. AI-agent-readable end-to-end. Docs, errors, and schemas are all designed to be consumed cleanly by LLMs. This is now a first-class integration vector, not an afterthought.
  3. Durable agent runtime. A turn (the 2-7 minute multi-agent reasoning pipeline) survives worker restarts, Anthropic 5xxs, and code deploys mid-flight.
  4. Cloudflare-native. Stick with the stack we're already paying for and understand.
  5. No premature scale, no premature features. Beta is 10-100 users; design for that and clearly document where scale will bite.

Non-goals for v1

  • Multi-tenancy / organizations (one user per account).
  • Outbound webhooks for partners (we design the seam, don't build it).
  • BYOK / self-hosted (probably never).
  • Real-time multi-device collaboration.

Five principles

  1. The schema is the contract. A single Zod schema → OpenAPI → SDK → docs. One source of truth, everything else generated.
  2. Resources, not RPC. POST /v1/turns to start a turn, not POST /v1/createTurn.
  3. Every async thing is a Workflow. Turns, lab parsing, future scheduled check-ins all run as Cloudflare Workflows for durable, resumable, observable execution.
  4. Every write is idempotent. The Idempotency-Key header is honored on every POST/PATCH; results are cached in KV for 24h.
  5. Every response carries a request_id. Logs and errors include it. Tracing is free if you wire it up from day one.
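A minimal sketch of principle 4. It assumes a store with the same `get` / `put(key, value, { expirationTtl })` shape as a Workers KV namespace, and it scopes keys per user. `IdemStore`, `withIdempotency`, the `idem:` key prefix, and the "don't cache 5xx" rule are hypothetical choices made for illustration. An in-memory store stands in for KV so the example runs anywhere.

```typescript
// Minimal shape of the KV operations used here (mirrors Workers KV get/put).
interface IdemStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

interface CachedResponse {
  status: number;
  body: unknown;
}

const IDEMPOTENCY_TTL_SECONDS = 24 * 60 * 60; // 24h, per principle 4

// Runs `handler` at most once per (user, Idempotency-Key) and replays the cached result afterwards.
// Not race-safe: two concurrent first requests can both miss, since KV has no compare-and-set.
async function withIdempotency(
  store: IdemStore,
  userId: string,
  idempotencyKey: string | null,
  handler: () => Promise<CachedResponse>,
): Promise<CachedResponse & { replayed: boolean }> {
  if (!idempotencyKey) {
    return { ...(await handler()), replayed: false };
  }
  const cacheKey = `idem:${userId}:${idempotencyKey}`; // hypothetical key pattern
  const hit = await store.get(cacheKey);
  if (hit !== null) {
    return { ...(JSON.parse(hit) as CachedResponse), replayed: true };
  }
  const result = await handler();
  // Assumed policy: skip caching 5xx so a transient server failure can be retried with the same key.
  if (result.status < 500) {
    await store.put(cacheKey, JSON.stringify(result), { expirationTtl: IDEMPOTENCY_TTL_SECONDS });
  }
  return { ...result, replayed: false };
}

// In-memory stand-in for a KV namespace (ignores TTL).
class MemoryStore implements IdemStore {
  private data = new Map<string, string>();
  async get(key: string) {
    return this.data.get(key) ?? null;
  }
  async put(key: string, value: string) {
    this.data.set(key, value);
  }
}
```

Suppose the same user sends the same key twice. The second call returns the first response with `replayed: true`, and the handler does not run again.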

System diagram

            CLI (Bun)   Mobile (RN)   Web   Partners/AI Agents
                │           │          │          │
                └───────────┴────┬─────┴──────────┘
                                 │  HTTPS · SSE · (WebSocket later)
                                 ▼
                 ┌─────────────────────────────────────┐
                 │  @amy/sdk-{ts,py,swift}             │
                 │  Generated from OpenAPI; identical  │
                 │  ergonomics across languages.       │
                 └────────────────┬────────────────────┘
                                  ▼
                 ┌─────────────────────────────────────┐
                 │  api.amy.health                     │
                 │  Cloudflare Worker · Hono ·         │
                 │  @hono/zod-openapi                  │
                 │                                     │
                 │  /v1/turns       /v1/sources        │
                 │  /v1/data        /v1/labs           │
                 │  /v1/memory      /v1/me             │
                 │  /webhooks/terra                    │
                 │  /openapi.json   /llms.txt          │
                 └──┬────────┬──────────┬──────────────┘
                    │        │          │
                    ▼        ▼          ▼
               ┌────────┐  ┌──────┐  ┌──────────┐
               │ D1     │  │ R2   │  │ KV       │
               │ rel.   │  │ blob │  │ idemp    │
               │ data   │  │      │  │ stream   │
               └────────┘  └──────┘  └──────────┘
                    │
                    │  step state read/written
                    │
            ┌───────┴──────────┐      ┌──────────────────────┐
            │ CF Workflows     │ ←──→ │ CF Queues            │
            │ · TurnWorkflow   │      │ · terra-events       │
            │ · LabParse       │      │ · workflow-dispatch  │
            └──────┬───────────┘      └──────────────────────┘
                   │
                   ▼
            Anthropic · Terra · OpenRouter · Clerk

The five layers

Layer 1: Contract (the highest-leverage piece)

A single Zod schema package (packages/contracts/) defines every request, response, and event. Every route imports from it; the OpenAPI spec is auto-generated; the TypeScript SDK is auto-generated; the docs embed the spec.

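// packages/contracts/src/turns.ts
import { z } from "zod";
// MessageSchema and TurnResult are defined elsewhere in packages/contracts (not shown).

export const TurnCreate = z.object({
  messages: z.array(MessageSchema),
  stream: z.boolean().default(true),
});

export const Turn = z.object({
  id: z.string().startsWith("turn_"),
  status: z.enum(["queued", "running", "completed", "failed"]),
  created_at: z.string().datetime(),
  result: TurnResult.optional(),
});

// apps/api/src/routes/turns.ts: the Hono route consumes the contract
import { OpenAPIHono, createRoute } from "@hono/zod-openapi";
import { TurnCreate, Turn } from "@amy/contracts";

// TURN_WORKFLOW is an assumed name for the TurnWorkflow binding in the wrangler config.
const app = new OpenAPIHono<{ Bindings: { TURN_WORKFLOW: Workflow } }>();

const createTurn = createRoute({
  method: "post",
  path: "/v1/turns",
  request: { body: { content: { "application/json": { schema: TurnCreate } } } },
  responses: {
    201: { description: "Turn queued", content: { "application/json": { schema: Turn } } },
  },
});

app.openapi(createTurn, async (c) => {
  const body = c.req.valid("json"); // already validated against TurnCreate
  const id = `turn_${crypto.randomUUID()}`;
  // Hand the turn to the durable workflow; this request returns immediately.
  await c.env.TURN_WORKFLOW.create({ id, params: body });
  return c.json({ id, status: "queued" as const, created_at: new Date().toISOString() }, 201);
});

// any client (CLI, mobile, web): fully typed
import { Amy } from "@amy/sdk";
const amy = new Amy({ apiKey: process.env.AMY_API_KEY });

const turn = await amy.turns.create({ messages: [...] });
for await (const event of amy.turns.stream(turn.id)) {
  console.log(event.type, event);
}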

Why this matters: Once the contract is in place, adding a new client is a 30-minute job. Without it, every client re-defines the same types and drifts.

Layer 2: API surface

Resource-oriented REST. Every resource follows the same shape: POST to create, GET /:id to read, GET to list, PATCH /:id to update, DELETE /:id to remove. Long-running creates return 202 Accepted with a status URL; everything else is 200/201/204.

POST   /v1/turns                       Start a turn → 201 { id, status: "queued" }
GET    /v1/turns/:id                   Status + result
GET    /v1/turns/:id/events            SSE stream of turn events
GET    /v1/turns                       List (cursor-paginated)

GET    /v1/me                          Current user
PATCH  /v1/me                          Update profile

GET    /v1/sources                     List connected wearables
POST   /v1/sources/terra/connect       → { widget_url } for OAuth
DELETE /v1/sources/:provider           Disconnect

POST   /v1/labs                        Upload PDF (multipart) → 202 { id }
GET    /v1/labs                        List uploads
GET    /v1/labs/:id                    Status + parsed biomarkers

GET    /v1/data/sync?cursor=...        Delta sync (for offline clients)
GET    /v1/data/biomarkers             Timeseries query
GET    /v1/data/summaries/:date        Daily summary

GET    /v1/memory                      Facts Amy remembers
POST   /v1/memory                      Add a fact
DELETE /v1/memory/:id                  Remove a fact

POST   /webhooks/terra                 Ingest (HMAC-verified)
POST   /v1/auth/cli/start              Device flow start
POST   /v1/auth/cli/approve            Device flow approve

GET    /openapi.json                   Live OpenAPI 3.1
GET    /llms.txt                       AI-agent index
GET    /healthz                        Liveness

Conventions:

  • Resource URIs are plural nouns.
  • IDs are typed prefixes: turn_…, lab_…, src_…, mem_…. Easy to grep, hard to confuse.
  • Cursor pagination on all list endpoints: ?cursor=…&limit=… → { data: [...], next_cursor: "..." }.
  • Idempotency on all writes: Idempotency-Key: <client-uuid> header.
  • Errors follow one shape:
    { "error": { "code": "turn_not_found", "message": "...", "request_id": "req_...", "docs_url": "https://docs.amy.health/concepts/errors#turn_not_found" } }
  • Versioning via URL path (/v1/). When v2 ships, both versions run side-by-side, and v1 is deprecated with at least 6 months' notice.
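Here is a sketch of two of the conventions above: the shared error envelope and cursor pagination on a list endpoint. It makes several assumptions:

- The cursor is an opaque base64 blob wrapping the last ID returned (keyset pagination).
- IDs sort in creation order, for example time-ordered IDs. This doc does not say how IDs are generated.

`apiError`, `paginate`, `encodeCursor`, and `decodeCursor` are hypothetical helpers. In D1 the slice would be a `WHERE id > ? ORDER BY id LIMIT ?` query rather than an array scan.

```typescript
// Error envelope shared by every endpoint (shape from the conventions above).
interface ApiError {
  error: { code: string; message: string; request_id: string; docs_url: string };
}

function apiError(code: string, message: string, requestId: string): ApiError {
  return {
    error: {
      code,
      message,
      request_id: requestId,
      docs_url: `https://docs.amy.health/concepts/errors#${code}`,
    },
  };
}

interface Page<T> {
  data: T[];
  next_cursor: string | null;
}

// Opaque to clients; wrapping the id in JSON leaves room for extra fields later.
function encodeCursor(lastId: string): string {
  return btoa(JSON.stringify({ after: lastId }));
}

function decodeCursor(cursor: string): string {
  return (JSON.parse(atob(cursor)) as { after: string }).after;
}

// `rows` must already be sorted by id (assumed to reflect creation order).
function paginate<T extends { id: string }>(rows: T[], cursor: string | null, limit: number): Page<T> {
  const size = Math.max(1, limit);
  const after = cursor ? decodeCursor(cursor) : null;
  const start = after === null ? 0 : rows.findIndex((r) => r.id > after);
  const slice = start === -1 ? [] : rows.slice(start, start + size);
  const hasMore = start !== -1 && start + size < rows.length;
  return {
    data: slice,
    next_cursor: hasMore ? encodeCursor(slice[slice.length - 1].id) : null,
  };
}
```

Keyset cursors stay stable when new rows are inserted between page fetches, which offset-based pagination does not.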

See API reference for the full schema of every endpoint.

Layer 3: Compute model

The Worker handles everything that fits in <5s of wall time. Anything longer is a Workflow.

TurnWorkflow

A turn is decomposed into discrete, retry-safe steps. Each step's output persists in the workflow's durable state: if step 5 fails on an Anthropic 5xx, the workflow resumes at step 5 and steps 1-4 don't replay.

step 1   classify_vagueness     Sonnet         ~3s
step 2   route                  Sonnet         ~2s
step 3   rephrase_per_agent     Sonnet         ~2s
step 4   run_supporting_agents  Opus, parallel ~30–90s each
step 5   run_main_agent         Opus           ~30–120s
step 6   reflection             Sonnet         ~5s
step 7   validation_gates       deterministic + Critic  ~10s
step 8   synthesis              Opus, streams  ~20s
step 9   memory_extraction      Sonnet         ~3s
step 10  finalize               write Turn row, fire turn.completed event
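The resume-without-replay behavior can be modeled in a few lines. This is a toy in-memory model, not the Workflows API. In Cloudflare Workflows the equivalent is `await step.do("name", async () => ...)` inside a `WorkflowEntrypoint`, where the runtime persists each step's return value. `StepState`, `runStep`, `runTurn`, the step outputs, and the simulated failure are all hypothetical.

```typescript
// Persisted step outputs, keyed by step name (the Workflows runtime keeps this durably).
type StepState = Map<string, unknown>;

// Runs `fn` only if no output is recorded for `name`; otherwise returns the recorded output.
async function runStep<T>(state: StepState, name: string, fn: () => Promise<T>): Promise<T> {
  if (state.has(name)) {
    return state.get(name) as T;
  }
  const output = await fn();
  state.set(name, output); // recorded only on success, so a throwing step re-runs next attempt
  return output;
}

// A two-step slice of the turn pipeline; `calls` counts how often each step body actually ran.
async function runTurn(
  state: StepState,
  calls: Record<string, number>,
  failMainAgent: boolean,
): Promise<string> {
  const route = await runStep(state, "route", async () => {
    calls.route = (calls.route ?? 0) + 1;
    return "sleep_agent";
  });
  return runStep(state, "run_main_agent", async () => {
    calls.run_main_agent = (calls.run_main_agent ?? 0) + 1;
    if (failMainAgent) throw new Error("simulated Anthropic 5xx");
    return `answer from ${route}`;
  });
}
```

When the turn is retried after the simulated 5xx, `route` is read from state instead of running again. Only `run_main_agent` re-executes.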

Free wins from Workflows:

  • Retry/resume on failure.
  • Observability: every step's input/output is visible in the CF dashboard.
  • Replay: re-run a turn with the same inputs for debugging.
  • Sleep + waitForEvent: paves the way for human-in-the-loop pauses ("Amy paused for your confirmation").

See Internals: Agent orchestration for the full step list and validation gate spec.

Streaming (the hardest call)

We want the "watch Amy think" UX from the CLI to work on every client.

v1 choice: SSE with KV-buffered events.

  • Each workflow step writes events to KV: stream:{turn_id}:{seq} and bumps a cursor counter.
  • GET /v1/turns/:id/events is an SSE Worker that polls KV every 250ms and forwards new events to the client.
  • Supports the Last-Event-ID header for resume after disconnect.
  • Works on curl, browser EventSource, React Native (with react-native-event-source), Swift URLSession, anything.
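A sketch of the SSE Worker's core logic follows. It assumes events live under `stream:{turn_id}:{seq}` with `seq` starting at 1, and that `seq` doubles as the SSE `id` field. Under that assumption, a reconnecting client's `Last-Event-ID` (which browsers' `EventSource` sends automatically) maps directly to the next KV key to read. `TurnEvent`, `sseFrame`, and `keysToFetch` are hypothetical. The KV reads and the 250ms poll loop are omitted.

```typescript
interface TurnEvent {
  type: string; // e.g. "turn.completed"; the full catalog lives in Concepts: Streaming
  payload: unknown;
}

// Formats one event as an SSE frame. JSON.stringify never emits raw newlines,
// so a single `data:` line is enough.
function sseFrame(seq: number, event: TurnEvent): string {
  return `id: ${seq}\nevent: ${event.type}\ndata: ${JSON.stringify(event.payload)}\n\n`;
}

// Given the client's Last-Event-ID (if any) and the latest seq the workflow has written,
// returns the KV keys still to send, in order.
function keysToFetch(turnId: string, lastEventId: string | null, latestSeq: number): string[] {
  const parsed = lastEventId === null ? 0 : Number.parseInt(lastEventId, 10);
  const from = Number.isNaN(parsed) ? 1 : parsed + 1;
  const keys: string[] = [];
  for (let seq = from; seq <= latestSeq; seq++) {
    keys.push(`stream:${turnId}:${seq}`);
  }
  return keys;
}
```

On each poll tick, the Worker would read the cursor counter, fetch the keys returned by `keysToFetch`, write one `sseFrame` per event, and remember the last `seq` it sent.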

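Cost: up to ~250ms of added event latency from the poll interval. This is negligible for LLM streaming, where tokens arrive in chunks anyway. Caveat: Workers KV is eventually consistent, so a write can take 60 seconds or more to become visible in other locations. KV also allows at most one write per second to the same key. Together, these can add more delay than the poll interval alone, especially for the cursor counter and when the SSE Worker and the workflow run in different locations.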

Upgrade path: swap the KV poll for a Durable Object per active turn that brokers events over WebSocket. The HTTP surface stays identical; clients don't notice the change.

See Concepts: Streaming for the full event type catalog and reconnect protocol.

Layer 4: Storage

Data                                                 Store                    Why
Users, sources, turns, biomarkers, daily summaries   D1 (SQLite)              Transactional, indexed, cheap
Lab PDFs, future audio recordings, exports           R2                       S3-compatible (Terra likes this), zero egress fees
Idempotency keys, stream event buffer                KV                       Short TTL, edge-cached
Long-term agent memory (facts)                       D1 + Vectorize (later)   Start relational; add vector retrieval when fuzzy lookups appear
Workflow step state                                  Workflow runtime         Managed by CF

See Internals: Storage for the D1 schema, R2 layout, KV key patterns, and migration story.

Layer 5: Developer experience

This is the layer that makes everything else worth it.

Repo layout (Bun workspaces, free monorepo)

amy/
├── apps/
│   ├── api/            ← the Cloudflare Worker (was cloud/)
│   ├── cli/            ← the existing CLI (was src/)
│   ├── docs/           ← Fumadocs site → docs.amy.health
│   ├── mobile/         ← (later) React Native app
│   └── web/            ← (later) marketing + dashboard
├── packages/
│   ├── contracts/      ← Zod schemas, the source of truth
│   ├── sdk-ts/         ← generated TS SDK (npm: @amy/sdk)
│   ├── sdk-py/         ← (later) PyPI: amy-sdk
│   ├── agents/         ← the runTurn pipeline (used by apps/api)
│   └── eval/           ← offline evals on agents/
└── tooling/
    ├── openapi-gen/    ← script: routes → openapi.json
    └── llms-txt-gen/   ← script: docs → llms.txt

The CLI keeps working throughout. The first migration step just extracts packages/contracts/ from the duplicated schemas: a pure refactor with no behavior change.

SDKs

Language     How it's built                                             When it ships
TypeScript   Hono's hc() client over fetch, typed from the API routes   Day one
Python       Fern free tier from OpenAPI                                When the first Python user asks
Swift        Fern or Stainless                                          When the native iOS app starts

Every SDK ships with:

  • Fully typed methods (matching the contract package).
  • Automatic retries with exponential backoff for transient failures.
  • Auto-generated Idempotency-Key on writes (UUIDv4).
  • An async iterator for streaming endpoints.
  • Typed error classes with stable code fields.
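This sketch shows how the retry and idempotency features above fit together. One Idempotency-Key is generated per logical call and reused on every retry, so a retried POST cannot create a second turn. Several pieces are assumptions rather than the generated SDK's actual API:

- `postWithRetry` and the injected `Send` type are hypothetical.
- The set of retryable statuses is an assumption.
- The full-jitter backoff formula is an assumption.

```typescript
// One HTTP attempt; injected so the sketch stays transport-agnostic.
type Send = (headers: Record<string, string>) => Promise<{ status: number; body: unknown }>;

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number;
  newKey: () => string; // e.g. () => crypto.randomUUID()
}

// Assumed set of transient statuses worth retrying.
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function postWithRetry(send: Send, opts: RetryOptions): Promise<{ status: number; body: unknown }> {
  // Generated once, outside the loop: every retry carries the same key, so the server dedupes.
  const headers = { "Idempotency-Key": opts.newKey() };
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    if (attempt > 0) {
      // Exponential backoff with full jitter: random delay in [0, base * 2^(attempt-1)).
      await sleep(Math.random() * opts.baseDelayMs * 2 ** (attempt - 1));
    }
    try {
      const res = await send(headers);
      if (!RETRYABLE.has(res.status)) return res; // success or non-transient error: return as-is
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // network failure: treated as transient
    }
  }
  throw lastError;
}
```

A 4xx other than 429 comes back immediately for the typed-error layer to handle. Only transient failures consume retry attempts.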

See SDK reference.

Docs site

Fumadocs (open-source, Next.js, MDX) served at docs.amy.health. Sections mirror this directory:

  1. Getting Started: 5 minutes to your first turn.
  2. Concepts: turns, streaming, memory, webhooks, errors.
  3. Guides: ordered how-to articles.
  4. Recipes: end-to-end builds, including the mobile app.
  5. API Reference: an embedded Scalar component fed from /openapi.json.
  6. SDK Reference: auto-generated from SDK source.

The AI-agent surface (the differentiator):

  • GET /llms.txt: follows the llmstxt.org standard and indexes every doc page with a one-line description.
  • GET /llms-full.txt: all docs concatenated, for one-shot context loading.
  • Every docs page is available as raw markdown at <url>.md.
  • Every API error includes a docs_url pointing to the relevant page.
  • The OpenAPI spec includes rich descriptions, examples, and per-language x-codeSamples for every endpoint.

When Claude Code (or any agent) integrates with Amy: it fetches llms.txt, picks the relevant pages, fetches them as .md, and writes code. No HTML parsing, no scraping, no guessing.
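A sketch of the output step of tooling/llms-txt-gen follows. It uses the llmstxt.org layout: an H1, a blockquote summary, then H2 sections listing `- [title](url): description`. The `DocPage` shape is an assumption, as is linking each page's `.md` variant. A real generator would presumably read titles and descriptions from each MDX page's frontmatter.

```typescript
interface DocPage {
  section: string; // e.g. "Concepts"
  title: string;
  url: string; // canonical HTML URL of the page
  description: string; // one line
}

// Renders pages into an llms.txt index, grouping by section in first-seen order.
function renderLlmsTxt(siteTitle: string, summary: string, pages: DocPage[]): string {
  const sections = new Map<string, DocPage[]>();
  for (const page of pages) {
    const list = sections.get(page.section) ?? [];
    list.push(page);
    sections.set(page.section, list); // re-setting a key keeps its original position
  }
  const lines = [`# ${siteTitle}`, "", `> ${summary}`];
  sections.forEach((list, section) => {
    lines.push("", `## ${section}`, "");
    for (const p of list) {
      // Point agents at the raw-markdown variant (<url>.md) rather than the HTML page.
      lines.push(`- [${p.title}](${p.url}.md): ${p.description}`);
    }
  });
  return lines.join("\n") + "\n";
}
```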

Local dev loop

bun dev              # wrangler dev (api) + docs site + cli watch — all in one
bun run test         # vitest across all packages (bare "bun test" would invoke Bun's built-in runner instead)
bun openapi          # regenerate openapi.json from routes
bun sdk:gen          # regenerate the TS SDK from the spec
bun docs:dev         # docs site hot reload

Edit a route → SDK types update in <2s → CLI/docs reflect it. That's the inner loop we're building toward.

Trade-offs (explicit)

Decision          Choice                 Cost
Schema language   Zod                    Not as expressive as TypeSpec for API design, but already familiar and used everywhere in the codebase
Initial SDK       hc() for TS only       Other languages delayed until the first ask
Streaming         SSE + KV poll          ~250ms latency vs WebSocket; trivial upgrade path
Orchestration     CF Workflows           Deeper Cloudflare lock-in (we were already deep)
Monorepo          Bun workspaces         More files to navigate; mitigated by a README per package
Docs site         Self-hosted Fumadocs   More setup than hosted Mintlify; zero recurring cost
Auth              Clerk                  Vendor dependency; mitigated by hiding it behind our own /v1/me
Versioning        URL path (/v1/)        Less flexible than header-based; simpler to reason about
Adapters          Terra-first            Locked to Terra's coverage; in exchange, skips the per-vendor OAuth dance

Migration sketch (current state → v1)

You're closer than it looks. The order matters; steps 1-3 are pure refactors that don't change behavior but unlock everything else.

  1. Carve out packages/contracts/ from the duplicated src/data/schema.ts and cloud/src/schema.ts. Both sides import from it. No behavior change.
  2. Adopt @hono/zod-openapi in the existing Worker. Refactor routes to declare their schemas; auto-generate /openapi.json.
  3. Move the CLI client to hc() from manual fetch. Delete the bespoke wrapper in src/cloud/client.ts.
  4. Build TurnWorkflow. Wrap the existing runTurn from src/orchestrator/index.ts; step-decompose; persist state. Add POST /v1/turns and GET /v1/turns/:id/events.
  5. Restructure into a monorepo: src/ → apps/cli/, cloud/ → apps/api/, using Bun workspaces.
  6. Stand up apps/docs/ with Fumadocs. Hook Scalar to /openapi.json. Write the llms.txt generator.
  7. Deploy api.amy.health and docs.amy.health. First real-world v1 use.

After (1-3) the codebase is dramatically easier to work with even before (4) lands. After (4) the first mobile screen is buildable.

What we'll revisit at scale

These are seams to watch as the system grows. Each has a documented upgrade path; none requires a rewrite.

  • D1 database size limit (10 GB per database). Past ~10k active users, shard by user_id range or move biomarker timeseries to ClickHouse or Tinybird.
  • KV-based streaming. Past ~100 concurrent active turns, the 250ms poll wastes reads. Cut over to Durable Object + WebSocket. The HTTP surface stays identical.
  • Workflow step counts. Cloudflare caps the number of steps per workflow instance (see the current Workflows limits page for the exact figure). If a turn balloons past ~30 steps, decompose it into sub-workflows.
  • Synthesis prompt size. The synthesis prompt currently includes the full Fact Sheet; eventually, use Vectorize for selective retrieval.
  • Eval infrastructure. packages/eval/ needs to grow into a real regression harness against frozen agent traces. This is critical before any agent change ships.
  • Multi-region. Workers are already edge-distributed; D1 is single-region. When global latency starts to matter, evaluate D1 read replicas (in beta) or partition by user region.
