Workload Architecture

Overview

The daemon evolves from a webhook-driven pipeline runner into a remote skill execution service. Thin clients (CLI, CI runners, other services) dial in to submit workloads. The daemon acts as a trusted execution boundary — it holds secrets, knowledge base access, and privileged context that clients never see.

Core Concepts

Workload

A workload is the full lifecycle container: auth context, secrets, KB references, one or more tasks, and their results. It replaces the concept of a "session" (reserved for agent/runtime use).

interface Workload {
  id: string;
  client_id: string;
  scopes: string[];                  // what this workload can do
  secrets: string[];                 // secret namespaces (e.g. ["github", "linear"])
  kb: string[];                      // knowledge sources (e.g. ["security-policy", "style-guide"])
  context: Record<string, unknown>;  // accumulated state across tasks
  ttl: number;                       // workload expiry (seconds)
  created_at: string;
  status: "active" | "completed" | "expired" | "cancelled";
}

A workload is opened by an authenticated client, accumulates context across multiple task submissions, and is eventually closed or expires.
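
The expiry rule can be sketched as a pure function. This is a minimal sketch, assuming created_at is an ISO 8601 timestamp and that only an active workload can expire; the helper name effectiveStatus is hypothetical and not part of the daemon.

```typescript
type WorkloadStatus = "active" | "completed" | "expired" | "cancelled";

// Subset of the Workload interface that matters for expiry.
interface WorkloadTiming {
  ttl: number;            // seconds, as in the Workload interface
  created_at: string;     // assumption: ISO 8601 timestamp
  status: WorkloadStatus;
}

// Returns the status the workload should have at time `now` (ms since epoch).
// Terminal states are left untouched; only an active workload can expire.
function effectiveStatus(w: WorkloadTiming, now: number): WorkloadStatus {
  if (w.status !== "active") return w.status;
  const expiresAt = Date.parse(w.created_at) + w.ttl * 1000;
  return now >= expiresAt ? "expired" : "active";
}
```

Keeping the check pure (time passed in as an argument) makes it easy to run both on request handling and in a periodic sweep.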

Invocation

An invocation is a single unit of work within a workload. The agent and/or skill fields name what handles it; the payload is opaque to the transport layer — only the agent/skill interprets it.

interface InvocationInput {
  workload_id: string;               // the workload this invocation belongs to
  agent?: string;                    // agent persona to use (optional)
  skill?: string;                    // skill procedure to follow (optional)
  payload: unknown;                  // agent/skill-specific, opaque to transport
  options?: {
    runtime?: string;                // override default runtime
    model?: string;                  // override default model
    priority?: number;
    timeout?: number;
    redact?: string[];               // additional fields to strip from output
    ephemeral?: boolean;             // don't persist result (sensitive workload)
  };
}

At least one of agent or skill is required. When both are set, the agent's SOUL is composed with the skill's body to form the system prompt.
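
A minimal sketch of that rule and of prompt composition follows. The function names, the SOUL-first ordering, and the blank-line separator are assumptions made for illustration; the source only states that the two parts are composed.

```typescript
interface InvocationTarget {
  agent?: string;
  skill?: string;
}

// Returns an error message when neither agent nor skill is set, otherwise null.
function validateTarget(input: InvocationTarget): string | null {
  if (!input.agent && !input.skill) {
    return "at least one of agent or skill is required";
  }
  return null;
}

// Composes the system prompt from whichever parts are present.
// Assumption: SOUL first, then the skill body, separated by a blank line.
function composeSystemPrompt(soul?: string, skillBody?: string): string {
  return [soul, skillBody]
    .filter((part): part is string => typeof part === "string" && part.trim() !== "")
    .map((part) => part.trim())
    .join("\n\n");
}
```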

Agent and Skill Dispatch

The daemon resolves agent and skill against discovered registries (agents/<name>/SOUL.md, skills/<name>/SKILL.md). New agents or skills never require API changes — they only require a new directory under the corresponding tree.
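
Resolution from a name to a registry file could look like the sketch below. The naming rule (lowercase letters, digits, hyphens) is an assumption added here so that a client-supplied name cannot escape the registry directory; registryPath is a hypothetical helper.

```typescript
type RegistryKind = "agent" | "skill";

// Assumed naming rule; it also rejects path traversal such as "../secrets".
const NAME_PATTERN = /^[a-z0-9][a-z0-9-]*$/;

// Maps an agent or skill name to the file the daemon would load.
function registryPath(kind: RegistryKind, name: string): string {
  if (!NAME_PATTERN.test(name)) {
    throw new Error(`invalid ${kind} name: ${name}`);
  }
  return kind === "agent" ? `agents/${name}/SOUL.md` : `skills/${name}/SKILL.md`;
}
```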

Security Model

Trusted Execution Boundary

Client (untrusted)                 Daemon (trusted)
──────────────────                 ────────────────
authenticate ─────────────────────→ create workload
                                    ↓ attach scopes, KB refs, secret namespaces
 
submit task (kind + payload) ─────→ hydrate from workload:
                                      • inject secrets (API keys, tokens)
                                      • attach KB context (docs, policies)
                                      • merge workload history (prior outputs)
                                    ↓
                                    execute skill
                                    ↓
                                    sanitize output (redact secrets, PII)
                                    ↓
stream result ←────────────────────← deliver sanitized result

The developer submits intent ("review this PR against our security policy"). The daemon knows where the policy lives, has the credentials to fetch it, and returns findings without leaking either.

Authentication

Every HTTP request and every WebSocket upgrade must present a bearer token.

# bento.yaml
auth:
  tokens:
    - id: cli-prod
      secret: "hsk_..."
      scopes: ["pipelines:*", "agents:*"]
    - id: ci-runner
      secret: "hsk_..."
      scopes: ["pipelines:review"]
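
Matching a presented token against this configuration might be sketched as follows. The TokenEntry shape mirrors the YAML above; authenticate and constantTimeEqual are hypothetical helpers, and the hand-rolled comparison is only illustrative (a real deployment would more likely use a vetted primitive from the runtime's crypto library).

```typescript
interface TokenEntry {
  id: string;
  secret: string;
  scopes: string[];
}

// Compares two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Returns the matching token entry for an Authorization header value, or null.
function authenticate(header: string | undefined, tokens: TokenEntry[]): TokenEntry | null {
  const match = /^Bearer (.+)$/.exec(header ?? "");
  if (!match) return null;
  const presented = match[1] ?? "";
  let found: TokenEntry | null = null;
  for (const entry of tokens) {
    // Check every entry so timing does not reveal which one matched.
    if (constantTimeEqual(entry.secret, presented)) found = entry;
  }
  return found;
}
```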

Scopes control which skills, secrets, and KB sources a client can access within its workloads.
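
A scope check consistent with the examples above ("pipelines:*", "pipelines:review") could look like this. The wildcard semantics are an assumption: a trailing ":*" is taken to cover every scope under that prefix, and nothing else is treated as a pattern.

```typescript
// Returns true when a single granted scope covers the required scope.
function scopeAllows(granted: string, required: string): boolean {
  if (granted === required) return true;
  if (granted.endsWith(":*")) {
    const prefix = granted.slice(0, -1); // "pipelines:*" -> "pipelines:"
    return required.startsWith(prefix);
  }
  return false;
}

// Returns true when any of the client's scopes covers the required scope.
function hasScope(grantedScopes: string[], required: string): boolean {
  return grantedScopes.some((granted) => scopeAllows(granted, required));
}
```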

Secret Isolation

Secrets are stored in the daemon's secret backend, referenced by namespace — never by value in the API. A workload declares which namespaces it has access to (bounded by the client's scopes). Skills resolve secrets at execution time (e.g. secrets.get("github.token")).

If a skill output accidentally contains a secret value, the sanitizer strips it before streaming to the client.
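
The simplest form of such a sanitizer is exact-match replacement of known secret values, sketched below. This is an illustration, not the daemon's implementation: redactSecrets and the "[REDACTED]" placeholder are assumptions, and exact matching will not catch encoded or transformed copies of a secret (for example base64).

```typescript
// Replaces every occurrence of each known secret value with a placeholder.
function redactSecrets(
  output: string,
  secretValues: string[],
  placeholder = "[REDACTED]",
): string {
  // Longest first, so a secret that contains a shorter one is removed whole.
  const ordered = secretValues
    .filter((value) => value.length > 0)
    .sort((a, b) => b.length - a.length);
  let result = output;
  for (const value of ordered) {
    result = result.split(value).join(placeholder);
  }
  return result;
}
```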

Ephemeral Execution

Tasks marked ephemeral: true leave no trace in the event store: results are delivered over the WebSocket and then discarded. This mode is intended for workloads involving company secrets that should not persist.

API Surface

The daemon exposes three interfaces on the same Hono server and port. All share the same bearer token auth middleware.

CLI scope

The bento CLI is scoped to system lifecycle management of the daemon process — start, stop, restart, status, logs, install/uninstall (systemd), and queue ops. It is not a transport for domain operations. Workload submission, skill invocation, and agent operations are exclusively exposed through the daemon's network transports (REST, WebSocket, MCP) below.

REST (human / automation clients)

POST   /workloads                       → open workload
DELETE /workloads/:id                   → close/teardown workload

POST   /workloads/:id/tasks             → submit task within workload
GET    /workloads/:id/tasks/:task_id    → poll task status/result
DELETE /workloads/:id/tasks/:task_id    → cancel task

GET    /health                          → liveness check

WebSocket (real-time streaming)

WS     /workloads/:id/stream       → real-time progress + results for all tasks

MCP (agent-to-agent interface)

GET    /mcp                        → SSE transport for MCP protocol
POST   /mcp                        → MCP message endpoint

The daemon exposes an MCP server at /mcp via SSE transport. Any MCP-capable client (Claude, pi, Codex, or any agent runtime) gets workload submission as standard tool calls — no custom SDK required.

Tools:

Tool	Description
`create_workload`	Open a workload with scopes, KB refs, TTL
`submit_invocation`	Submit an invocation to an agent and/or skill
`get_invocation`	Poll invocation status and result
`cancel_invocation`	Cancel a running invocation
`list_invocations`	List invocations in a workload
`close_workload`	Teardown a workload
`list_workloads`	List active workloads for the authenticated client

Resources:

URI Pattern	Description
`kb://{source}`	Read KB material (scoped by workload)
`workload://{id}`	Workload state and accumulated context
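
Parsing the two URI patterns could be sketched as below. The ResourceRef type and parseResourceUri are hypothetical, and the sketch assumes a single path segment after the scheme with no query or fragment.

```typescript
type ResourceRef =
  | { kind: "kb"; source: string }
  | { kind: "workload"; id: string };

// Parses kb://{source} and workload://{id}; returns null for anything else.
function parseResourceUri(uri: string): ResourceRef | null {
  const match = /^(kb|workload):\/\/([^/?#]+)$/.exec(uri);
  if (!match) return null;
  const scheme = match[1];
  const value = match[2];
  if (value === undefined) return null;
  return scheme === "kb"
    ? { kind: "kb", source: value }
    : { kind: "workload", id: value };
}
```

Scope checks would still apply after parsing, since a syntactically valid kb:// URI may name a source the workload is not allowed to read.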

Secret injection and output sanitization apply identically — the MCP layer never exposes secret values in tool results or resources.

This also solves agent-to-agent composition: an agent running inside the daemon can submit a task to another skill on the same daemon via MCP tools, with the security boundary enforced at the workload level.

The WebSocket stream (WS /workloads/:id/stream) delivers structured JSON frames, one per message:

{"type":"progress","task_id":"...","stage":"clone","pct":30}
{"type":"log","task_id":"...","level":"info","msg":"running agent: reviewer"}
{"type":"progress","task_id":"...","stage":"review","pct":75}
{"type":"result","task_id":"...","status":"completed","output":{...}}
{"type":"error","task_id":"...","code":"AGENT_FAILED","msg":"..."}
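
On the client side, these frames map naturally onto a discriminated union. The sketch below infers field types from the sample frames above, so treat them as assumptions; parseFrame is a hypothetical helper that rejects anything it does not recognize.

```typescript
type Frame =
  | { type: "progress"; task_id: string; stage: string; pct: number }
  | { type: "log"; task_id: string; level: string; msg: string }
  | { type: "result"; task_id: string; status: string; output: unknown }
  | { type: "error"; task_id: string; code: string; msg: string };

// Parses one WebSocket message into a typed frame, or returns null.
function parseFrame(raw: string): Frame | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const { type, task_id, stage, pct, level, msg, status, code, output } =
    value as Record<string, unknown>;
  if (typeof task_id !== "string") return null;

  if (type === "progress" && typeof stage === "string" && typeof pct === "number") {
    return { type: "progress", task_id, stage, pct };
  }
  if (type === "log" && typeof level === "string" && typeof msg === "string") {
    return { type: "log", task_id, level, msg };
  }
  if (type === "result" && typeof status === "string") {
    return { type: "result", task_id, status, output };
  }
  if (type === "error" && typeof code === "string" && typeof msg === "string") {
    return { type: "error", task_id, code, msg };
  }
  return null;
}
```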

If a client disconnects and reconnects, it can re-attach to the stream. Missed events are buffered and replayed on reconnect (bounded by a configurable buffer size).
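
One way to bound the replay buffer is a fixed-capacity queue keyed by sequence number, sketched below. The sequence numbers and the ReplayBuffer class are assumptions for illustration; the source does not say how a reconnecting client identifies the last event it saw.

```typescript
interface Buffered<T> {
  seq: number;
  event: T;
}

// Keeps the most recent `capacity` events so a reconnecting client can catch up.
class ReplayBuffer<T> {
  private readonly capacity: number;
  private events: Buffered<T>[] = [];
  private nextSeq = 1;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Stores an event, evicting the oldest when full. Returns its sequence number.
  push(event: T): number {
    const seq = this.nextSeq++;
    this.events.push({ seq, event });
    if (this.events.length > this.capacity) this.events.shift();
    return seq;
  }

  // Events newer than the last sequence number the client reports having seen.
  since(lastSeen: number): Buffered<T>[] {
    return this.events.filter((entry) => entry.seq > lastSeen);
  }
}
```

With this shape, events older than the buffer window are simply lost, so a client that has been away too long would need to poll task status over REST instead.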

Webhook Ingress (unchanged)

The existing webhook routes (/webhooks/github, /webhooks/linear, etc.) remain on the same server. Incoming webhooks create internal workloads with preconfigured scopes from the pipeline config. Externally, they behave identically to today.

Interface Summary

:7890
├── /workloads/**          REST — human/CLI clients
├── /workloads/:id/stream  WS  — real-time streaming
├── /mcp                   SSE — agent-to-agent (MCP protocol)
├── /webhooks/**           REST — external event ingress
└── /health                REST — liveness

Single port, single auth layer, three consumption patterns.

Execution Flow

1. Client authenticates (bearer token)
2. Client opens workload → POST /workloads
   - Daemon validates scopes, attaches available secrets + KB
   - Returns workload ID
 
3. Client opens stream → WS /workloads/:id/stream
 
4. Client submits task → POST /workloads/:id/tasks
   - Daemon validates the requested agent/skill against scopes
   - Enqueues via BullMQ
   - Returns task ID
 
5. Worker picks up job:
   a. Hydrate context (workload secrets + KB + accumulated context)
   b. Resolve agent/skill → skill runner
   c. Execute skill
   d. Stream progress events → WebSocket manager → client
   e. Sanitize output (redact secrets/PII)
   f. Deliver result frame
   g. Update workload context with result (for subsequent tasks)
 
6. Client submits more tasks or closes workload
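
Step 5 can be sketched as a single synchronous function with the queue and WebSocket manager stubbed out. All names here (HydratedInput, Skill, runTask, the emit callback) are hypothetical, and the real worker would be asynchronous and driven by BullMQ; the sketch only shows the ordering of hydrate, execute, sanitize, deliver, and context update.

```typescript
interface HydratedInput {
  payload: unknown;
  secrets: Record<string, string>;   // resolved values, daemon-side only
  kb: Record<string, string>;
  context: Record<string, unknown>;
}
interface WorkloadState {
  secrets: Record<string, string>;
  kb: Record<string, string>;
  context: Record<string, unknown>;
}
interface OutFrame {
  type: string;
  task_id: string;
  [key: string]: unknown;
}
type Skill = (input: HydratedInput) => string;

// Exact-match redaction of known secret values (illustrative only).
function sanitize(text: string, secrets: Record<string, string>): string {
  let result = text;
  for (const value of Object.values(secrets)) {
    if (value) result = result.split(value).join("[REDACTED]");
  }
  return result;
}

function runTask(
  taskId: string,
  payload: unknown,
  workload: WorkloadState,
  skill: Skill,
  emit: (frame: OutFrame) => void,
): void {
  // a. Hydrate: the skill sees secrets and KB, the client never does.
  const input: HydratedInput = { payload, ...workload };
  let output: string;
  try {
    // c/d. Execute and report progress.
    emit({ type: "progress", task_id: taskId, stage: "execute", pct: 0 });
    output = skill(input);
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    emit({ type: "error", task_id: taskId, code: "AGENT_FAILED", msg: sanitize(msg, workload.secrets) });
    return;
  }
  // e/f. Sanitize before anything leaves the daemon, then deliver.
  const clean = sanitize(output, workload.secrets);
  emit({ type: "result", task_id: taskId, status: "completed", output: clean });
  // g. Make the sanitized result available to subsequent tasks.
  workload.context[taskId] = clean;
}
```

Storing the sanitized output (rather than the raw one) in the workload context is a design choice made in this sketch; the source does not specify which form is retained.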

Knowledge Base

KB sources are referenced by name in the workload. During hydration, the daemon resolves the reference and injects content into the skill's context. Clients never download KB material directly.
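
Because the KB format is still open, the hydration step is best sketched against an abstract loader. In the sketch below, KbLoader, hydrateKb, and the allowed-set check are assumptions; any of the candidate backends could sit behind the loader function.

```typescript
// Backend-agnostic loader: returns the content for a source name, or undefined.
type KbLoader = (name: string) => string | undefined;

// Resolves the workload's KB references into content for the skill's context.
function hydrateKb(
  refs: string[],
  allowed: Set<string>,   // sources permitted by the client's scopes
  load: KbLoader,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  for (const name of refs) {
    if (!allowed.has(name)) {
      throw new Error(`KB source not permitted: ${name}`);
    }
    const content = load(name);
    if (content === undefined) {
      throw new Error(`KB source not found: ${name}`);
    }
    resolved[name] = content;
  }
  return resolved;
}
```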

KB format is TBD — candidates:

Markdown files on disk (simplest)
Embeddings in a vector store (semantic retrieval)
References to external systems (Notion, Confluence, etc.)

Technology Decisions

Hono (HTTP framework)

Evaluated Hono, Elysia, h3, and raw Bun.serve(). Hono wins decisively:

MCP SDK ships Hono as a direct dependency (not peer, not optional) with an official WebStandardStreamableHTTPServerTransport and a Hono example. Mounting MCP at /mcp is three lines.
Zero transitive dependencies — the npm package is a single module.
First-class Bun adapter (hono/bun) using Web Standard Request/Response.
Middleware chaining for auth, CORS, error handling — critical as the API surface grows.
WebSocket support via hono/ws.

Elysia has only a community MCP plugin (v0.1.1, single maintainer). h3 has no MCP support. Raw Bun.serve() means reimplementing routing/middleware.

Implementation Plan

Phase 1 — Hono Server ✅

Migrated Bun.serve() routing to Hono. Replaced WebhookServer class with Server class backed by a Hono app. All existing routes and tests preserved. Old webhook.ts removed.

Phase 2 — Workload Lifecycle

Implement workload creation, persistence, and teardown. Add POST /workloads and DELETE /workloads/:id. Workloads stored in SQLite alongside events.

Phase 3 — Task Submission + Execution

Add POST /workloads/:id/tasks and GET /workloads/:id/tasks/:task_id. Wire task submission through BullMQ to the existing pipeline/skill execution machinery.

Phase 4 — WebSocket Streaming

Implement WS /workloads/:id/stream with progress frames. BullMQ job progress events bridge to WebSocket connections. Handle reconnect + replay.

Phase 5 — Secret Store + Sanitization

Implement secret backend (start with encrypted SQLite). Add output sanitizer that strips secret values before delivery. Wire into the hydration/execution flow.

Phase 6 — KB Integration

Implement KB source resolution and injection during hydration. Start with markdown files on disk, extend to other backends as needed.

To Evaluate

Sandbox Runtime

anthropic-experimental/sandbox-runtime — Anthropic's sandboxed execution environment. Potentially relevant for:

Task execution isolation — run workload tasks as OS-level sandboxed processes rather than bare processes, enforcing the trusted execution boundary at the OS level
Secret containment — secrets injected into a sandbox can't leak to the host or other workloads
Ephemeral execution — sandbox teardown guarantees no residual state for ephemeral: true tasks
Multi-tenant safety — workloads from different clients run in isolated sandboxes

Needs evaluation: runtime overhead, Bun compatibility, how it composes with BullMQ workers, whether it supports streaming output (needed for WebSocket progress frames).

Open Questions

Secret backend — encrypted SQLite to start, or integrate Vault/SOPS from day one?
KB format — markdown files, vector store, or external system references?
Workload persistence — survive daemon restarts, or in-memory with TTL?
Multi-tenancy — single daemon serving multiple orgs, or one daemon per org?
Rate limiting — per-client, per-workload, or both?
Sandbox runtime — evaluate Anthropic's sandbox-runtime for task execution isolation (see above)