Workload Architecture
Overview
The daemon evolves from a webhook-driven pipeline runner into a remote skill execution service. Thin clients (CLI, CI runners, other services) dial in to submit workloads. The daemon acts as a trusted execution boundary — it holds secrets, knowledge base access, and privileged context that clients never see.
Core Concepts
Workload
A workload is the full lifecycle container: auth context, secrets, KB references, one or more tasks, and their results. It replaces the concept of a "session"; that term is now reserved for agent/runtime use.
interface Workload {
  id: string;
  client_id: string;
  scopes: string[]; // what this workload can do
  secrets: string[]; // secret namespaces (e.g. ["github", "linear"])
  kb: string[]; // knowledge sources (e.g. ["security-policy", "style-guide"])
  context: Record<string, unknown>; // accumulated state across tasks
  ttl: number; // workload expiry (seconds)
  created_at: string;
  status: "active" | "completed" | "expired" | "cancelled";
}

A workload is opened by an authenticated client, accumulates context across multiple task submissions, and is eventually closed or expires.
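The expiry rule can be illustrated with a minimal sketch. It assumes created_at is an ISO 8601 timestamp and that ttl counts seconds from creation; the design does not state either, and the helper name effectiveStatus is hypothetical.

```typescript
// Hypothetical helper: derive the effective status of a workload at a given time.
// Assumes created_at is an ISO 8601 string and ttl is seconds since creation.
type WorkloadStatus = "active" | "completed" | "expired" | "cancelled";

interface WorkloadTiming {
  created_at: string;
  ttl: number;
  status: WorkloadStatus;
}

function effectiveStatus(w: WorkloadTiming, now: Date = new Date()): WorkloadStatus {
  // Terminal states never change, even after the TTL elapses.
  if (w.status !== "active") return w.status;
  const expiresAt = Date.parse(w.created_at) + w.ttl * 1000;
  return now.getTime() >= expiresAt ? "expired" : "active";
}
```

Computing the status on read, rather than with a timer per workload, is one possible design; it keeps expiry correct across daemon restarts if workloads are persisted.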
Invocation
An invocation is a single unit of work within a workload (the REST and WebSocket interfaces call it a task). The agent and/or skill fields name what handles it. The payload is opaque to the transport layer; only the agent or skill interprets it.
interface InvocationInput {
  workload_id: string; // the workload this invocation belongs to
  agent?: string; // agent persona to use (optional)
  skill?: string; // skill procedure to follow (optional)
  payload: unknown; // agent/skill-specific, opaque to transport
  options?: {
    runtime?: string; // override default runtime
    model?: string; // override default model
    priority?: number;
    timeout?: number;
    redact?: string[]; // additional fields to strip from output
    ephemeral?: boolean; // don't persist result (sensitive workload)
  };
}

At least one of agent or skill is required. When both are set, the agent's SOUL is composed with the skill's body to form the system prompt.
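A minimal sketch of the "at least one of agent or skill" rule and of prompt composition. The function names are hypothetical, and the join format (SOUL first, then the skill body, separated by a blank line) is an assumption; the design only says the two are composed.

```typescript
// Hypothetical validation and prompt composition for an invocation.
// The ordering and the blank-line separator are assumptions, not part of the design.
interface Dispatch {
  agent?: string;
  skill?: string;
}

function assertDispatchable(input: Dispatch): void {
  if (!input.agent && !input.skill) {
    throw new Error("invocation requires at least one of 'agent' or 'skill'");
  }
}

function composeSystemPrompt(soul?: string, skillBody?: string): string {
  // Keep whichever parts are present; the agent's SOUL comes first when both are.
  const parts = [soul, skillBody].filter(
    (p): p is string => typeof p === "string" && p.trim() !== "",
  );
  if (parts.length === 0) throw new Error("nothing to compose");
  return parts.map((p) => p.trim()).join("\n\n");
}
```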
Agent and Skill Dispatch
The daemon resolves agent and skill against discovered registries (agents/<name>/SOUL.md, skills/<name>/SKILL.md). New agents or skills never require API changes — they only require a new directory under the corresponding tree.
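As an illustration, resolution against the two directory trees might look like the sketch below. The name pattern, the function names, and the idea of passing the discovered paths as a set are all assumptions made for the example.

```typescript
// Hypothetical resolver: map an agent or skill name to its definition file.
// The name pattern is an assumption, used here to reject path traversal.
type EntryType = "agent" | "skill";

const NAME_PATTERN = /^[a-z0-9][a-z0-9_-]*$/;

function definitionPath(type: EntryType, name: string): string {
  if (!NAME_PATTERN.test(name)) {
    throw new Error(`invalid ${type} name: ${JSON.stringify(name)}`);
  }
  return type === "agent" ? `agents/${name}/SOUL.md` : `skills/${name}/SKILL.md`;
}

// Only names found during discovery resolve; anything else is an error.
function resolveDefinition(
  type: EntryType,
  name: string,
  discovered: ReadonlySet<string>,
): string {
  const path = definitionPath(type, name);
  if (!discovered.has(path)) throw new Error(`unknown ${type}: ${name}`);
  return path;
}
```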
Security Model
Trusted Execution Boundary
Client (untrusted)                       Daemon (trusted)
──────────────────                       ────────────────
authenticate ──────────────────────────→ create workload
                                           ↓ attach scopes, KB refs, secret namespaces
submit task (agent/skill + payload) ───→ hydrate from workload:
                                           • inject secrets (API keys, tokens)
                                           • attach KB context (docs, policies)
                                           • merge workload history (prior outputs)
                                           ↓
                                         execute skill
                                           ↓
                                         sanitize output (redact secrets, PII)
                                           ↓
stream result ←────────────────────────← deliver sanitized result

The developer submits intent ("review this PR against our security policy"). The daemon knows where the policy lives, has the credentials to fetch it, and returns findings without leaking either.
Authentication
Bearer token auth on every HTTP request and WebSocket upgrade.
# bento.yaml
auth:
  tokens:
    - id: cli-prod
      secret: "hsk_..."
      scopes: ["pipelines:*", "agents:*"]
    - id: ci-runner
      secret: "hsk_..."
      scopes: ["pipelines:review"]

Scopes control which skills, secrets, and KB sources a client can access within its workloads.
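A minimal sketch of how a scope check could work for the token config above. It assumes scopes have the form resource:action and that * matches any action within one resource; the exact matching rules are not specified in this design, and scopeAllows is a hypothetical name.

```typescript
// Hypothetical scope check. Assumes scopes look like "<resource>:<action>"
// and that "*" matches any action within a resource, as in "pipelines:*".
function scopeAllows(granted: string[], required: string): boolean {
  const [reqResource, reqAction] = required.split(":");
  return granted.some((scope) => {
    const [resource, action] = scope.split(":");
    return resource === reqResource && (action === "*" || action === reqAction);
  });
}
```

Under these assumptions the cli-prod token could run any pipeline, while ci-runner would be limited to pipelines:review.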
Secret Isolation
Secrets are stored in the daemon's secret backend, referenced by namespace — never by value in the API. A workload declares which namespaces it has access to (bounded by the client's scopes). Skills resolve secrets at execution time (e.g. secrets.get("github.token")).
If a skill output accidentally contains a secret value, the sanitizer strips it before streaming to the client.
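One way the sanitizer could work is a literal-value scrub over the output before it is streamed. The sketch below assumes the daemon can list the secret values that were injected for the workload; the function name and the [REDACTED] marker are hypothetical.

```typescript
// Hypothetical sanitizer: replace known secret values wherever they occur
// in string values of the output. The "[REDACTED]" marker is an assumption.
function sanitize(output: unknown, secretValues: string[]): unknown {
  // Longest first, so a secret that contains another secret is redacted whole.
  const secrets = secretValues
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);

  const scrub = (text: string): string =>
    secrets.reduce((acc, s) => acc.split(s).join("[REDACTED]"), text);

  const walk = (value: unknown): unknown => {
    if (typeof value === "string") return scrub(value);
    if (Array.isArray(value)) return value.map((item) => walk(item));
    if (value !== null && typeof value === "object") {
      const source = value as Record<string, unknown>;
      const cleaned: Record<string, unknown> = {};
      for (const key of Object.keys(source)) cleaned[key] = walk(source[key]);
      return cleaned;
    }
    return value; // numbers, booleans, null, and undefined pass through
  };

  return walk(output);
}
```

A literal match like this would not catch encoded or transformed secrets (base64, URL-encoded, truncated), so it is best treated as a safety net rather than the only control.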
Ephemeral Execution
Tasks marked ephemeral: true leave no trace in the event store. Results are delivered over the WebSocket and then discarded. This mode is intended for workloads involving company secrets that shouldn't persist.
API Surface
The daemon exposes three interfaces on the same Hono server and port. All share the same bearer token auth middleware.
CLI scope
The bento CLI is scoped to system lifecycle management of the daemon process — start, stop, restart, status, logs, install/uninstall (systemd), and queue ops. It is not a transport for domain operations. Workload submission, skill invocation, and agent operations are exclusively exposed through the daemon's network transports (REST, WebSocket, MCP) below.
REST (human / automation clients)
POST   /workloads                  → open workload
DELETE /workloads/:id              → close/teardown workload
POST   /workloads/:id/tasks        → submit task within workload
GET    /workloads/:id/tasks/:id    → poll task status/result
DELETE /workloads/:id/tasks/:id    → cancel task
GET    /health                     → liveness check

WebSocket (real-time streaming)
WS     /workloads/:id/stream       → real-time progress + results for all tasks

MCP (agent-to-agent interface)
GET    /mcp                        → SSE transport for MCP protocol
POST   /mcp                        → MCP message endpoint

The daemon exposes an MCP server at /mcp via SSE transport. Any MCP-capable client (Claude, pi, Codex, or any agent runtime) gets workload submission as standard tool calls, with no custom SDK required.
| Tool | Description |
|---|---|
| create_workload | Open a workload with scopes, KB refs, TTL |
| submit_invocation | Submit an invocation to an agent and/or skill |
| get_invocation | Poll invocation status and result |
| cancel_invocation | Cancel a running invocation |
| list_invocations | List invocations in a workload |
| close_workload | Teardown a workload |
| list_workloads | List active workloads for the authenticated client |

| URI Pattern | Description |
|---|---|
| kb://{source} | Read KB material (scoped by workload) |
| workload://{id} | Workload state and accumulated context |
Secret injection and output sanitization apply identically — the MCP layer never exposes secret values in tool results or resources.
This also solves agent-to-agent composition: an agent running inside the daemon can submit a task to another skill on the same daemon via MCP tools, with the security boundary enforced at the workload level.
The WebSocket delivers structured frames:
{"type":"progress","task_id":"...","stage":"clone","pct":30}
{"type":"log","task_id":"...","level":"info","msg":"running agent: reviewer"}
{"type":"progress","task_id":"...","stage":"review","pct":75}
{"type":"result","task_id":"...","status":"completed","output":{...}}
{"type":"error","task_id":"...","code":"AGENT_FAILED","msg":"..."}

If a client disconnects and reconnects, it can re-attach to the stream. Missed events are buffered and replayed on reconnect (bounded by a configurable buffer size).
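The frame shapes and the bounded replay behaviour could be sketched as follows. The Frame union mirrors the example frames; the per-frame sequence number and the "replay everything after the last seen sequence" contract are assumptions, since the design does not say how a client identifies what it missed.

```typescript
// Hypothetical bounded replay buffer for stream frames.
// Sequence numbers and the since() contract are assumptions, not part of the design.
type Frame =
  | { type: "progress"; task_id: string; stage: string; pct: number }
  | { type: "log"; task_id: string; level: string; msg: string }
  | { type: "result"; task_id: string; status: string; output: unknown }
  | { type: "error"; task_id: string; code: string; msg: string };

class ReplayBuffer {
  private readonly capacity: number;
  private entries: { seq: number; frame: Frame }[] = [];
  private nextSeq = 1;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Store a frame and return the sequence number assigned to it.
  push(frame: Frame): number {
    const seq = this.nextSeq++;
    this.entries.push({ seq, frame });
    // Drop the oldest frame once the buffer exceeds its capacity.
    if (this.entries.length > this.capacity) this.entries.shift();
    return seq;
  }

  // Frames a reconnecting client missed, given the last sequence number it saw.
  since(lastSeenSeq: number): Frame[] {
    return this.entries.filter((e) => e.seq > lastSeenSeq).map((e) => e.frame);
  }
}
```

With a bounded buffer, a client that was disconnected for too long receives only the most recent frames; polling the task result over REST would cover that gap.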
Webhook Ingress (unchanged)
The existing webhook routes (/webhooks/github, /webhooks/linear, etc.) remain on the same server. Incoming webhooks create internal workloads with preconfigured scopes from the pipeline config. Externally, they behave identically to today.
Interface Summary
:7890
├── /workloads/**             REST: human/CLI clients
├── /workloads/:id/stream     WS: real-time streaming
├── /mcp                      SSE: agent-to-agent (MCP protocol)
├── /webhooks/**              REST: external event ingress
└── /health                   REST: liveness

Single port, single auth layer, three consumption patterns.
Execution Flow
1. Client authenticates (bearer token)
2. Client opens workload → POST /workloads
   - Daemon validates scopes, attaches available secrets + KB
   - Returns workload ID
3. Client opens stream → WS /workloads/:id/stream
4. Client submits task → POST /workloads/:id/tasks
   - Daemon validates the requested agent/skill against scopes
   - Enqueues via BullMQ
   - Returns task ID
5. Worker picks up job:
   a. Hydrate context (workload secrets + KB + accumulated context)
   b. Resolve agent/skill → runner
   c. Execute skill
   d. Stream progress events → WebSocket manager → client
   e. Sanitize output (redact secrets/PII)
   f. Deliver result frame
   g. Update workload context with result (for subsequent tasks)
6. Client submits more tasks or closes workload

Knowledge Base
KB sources are referenced by name in the workload. During hydration, the daemon resolves the reference and injects content into the skill's context. Clients never download KB material directly.
KB format is TBD — candidates:
- Markdown files on disk (simplest)
- Embeddings in a vector store (semantic retrieval)
- References to external systems (Notion, Confluence, etc.)
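Whichever format is chosen, hydration could sit behind a single resolver interface so that backends can be swapped later. The sketch below is one possible shape: KbResolver, StaticKb, and hydrateKb are hypothetical names, and the in-memory map stands in for the "markdown files on disk" candidate.

```typescript
// Hypothetical resolver interface; none of these names come from the design.
// Synchronous for brevity; a vector store or external system would likely be async.
interface KbResolver {
  resolve(source: string): string;
}

// In-memory stand-in for the "markdown files on disk" candidate.
class StaticKb implements KbResolver {
  private readonly docs: Map<string, string>;

  constructor(docs: Record<string, string>) {
    this.docs = new Map(Object.entries(docs));
  }

  resolve(source: string): string {
    const doc = this.docs.get(source);
    if (doc === undefined) throw new Error(`unknown KB source: ${source}`);
    return doc;
  }
}

// Resolve only the sources the workload declares, keyed by source name.
function hydrateKb(kb: string[], resolver: KbResolver): Record<string, string> {
  const context: Record<string, string> = {};
  for (const source of kb) context[source] = resolver.resolve(source);
  return context;
}
```

Because only the names listed in the workload's kb field are resolved, a skill never sees KB material outside what the workload was granted.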
Technology Decisions
Hono (HTTP framework)
Evaluated Hono, Elysia, h3, and raw Bun.serve(). Hono wins decisively:
- MCP SDK ships Hono as a direct dependency (not peer, not optional) with an official WebStandardStreamableHTTPServerTransport and a Hono example. Mounting MCP at /mcp is three lines.
- Zero transitive dependencies: the npm package is a single module.
- First-class Bun adapter (hono/bun) using Web Standard Request/Response.
- Middleware chaining for auth, CORS, error handling, critical as the API surface grows.
- WebSocket support via hono/ws.
- Elysia has only a community MCP plugin (v0.1.1, single maintainer). h3 has no MCP support. Raw Bun.serve() means reimplementing routing/middleware.
Implementation Plan
Phase 1 — Hono Server ✅
Migrated Bun.serve() routing to Hono. Replaced WebhookServer class with Server class backed by a Hono app. All existing routes and tests preserved. Old webhook.ts removed.
Phase 2 — Workload Lifecycle
Implement workload creation, persistence, and teardown. Add POST /workloads and DELETE /workloads/:id. Workloads stored in SQLite alongside events.
Phase 3 — Task Submission + Execution
Add POST /workloads/:id/tasks and GET /workloads/:id/tasks/:id. Wire task submission through BullMQ to the existing pipeline/skill execution machinery.
Phase 4 — WebSocket Streaming
Implement WS /workloads/:id/stream with progress frames. BullMQ job progress events bridge to WebSocket connections. Handle reconnect + replay.
Phase 5 — Secret Store + Sanitization
Implement secret backend (start with encrypted SQLite). Add output sanitizer that strips secret values before delivery. Wire into the hydration/execution flow.
Phase 6 — KB Integration
Implement KB source resolution and injection during hydration. Start with markdown files on disk, extend to other backends as needed.
To Evaluate
Sandbox Runtime
anthropic-experimental/sandbox-runtime — Anthropic's sandboxed execution environment. Potentially relevant for:
- Task execution isolation: run workload tasks in sandboxed containers rather than bare processes, enforcing the trusted execution boundary at the OS level
- Secret containment: secrets injected into a sandbox can't leak to the host or other workloads
- Ephemeral execution: sandbox teardown guarantees no residual state for ephemeral: true tasks
- Multi-tenant safety: workloads from different clients run in isolated sandboxes
Needs evaluation: runtime overhead, Bun compatibility, how it composes with BullMQ workers, whether it supports streaming output (needed for WebSocket progress frames).
Open Questions
- Secret backend — encrypted SQLite to start, or integrate Vault/SOPS from day one?
- KB format — markdown files, vector store, or external system references?
- Workload persistence — survive daemon restarts, or in-memory with TTL?
- Multi-tenancy — single daemon serving multiple orgs, or one daemon per org?
- Rate limiting — per-client, per-workload, or both?
- Sandbox runtime — evaluate Anthropic's sandbox-runtime for task execution isolation (see above)

