Prompt Composition

How bento assembles the prompt an agent receives — which file or mechanism owns which part, and which message channel each part lands in.

Every agent invocation runs against a composed prompt. It is not written in one place: it is layered from operator-authored files and bento-injected blocks, then split across two channels — the system message and the user message. Keeping the layers separate is what lets one agent serve multiple pipelines without its instructions contradicting the task.

Who controls each layer

Three parties contribute, and the dividing line is control, not file format:

The runtime: the agent CLI (claude, pi, codex) ships its own base system prompt and tool set. bento does not author this; it is the floor everything else sits on.
bento: injects the operating brief, the notes protocol, and the per-run fact blocks. These are derived from what bento itself provides (the sandbox, the checked-out workspace, the notes mechanism), so only bento can assert them. They are not operator-editable.
The operator: authors the persona, the procedure, and the task. This is the configurable surface.

The layers

Layer	Source	Channel	Controlled by	Varies per
Runtime harness	the agent CLI	system	runtime	runtime
Operating brief	bento-injected	system	bento	nothing — stable
Notes protocol	bento-injected (`<notes-protocol>`)	system	bento	nothing — stable
Persona	`agents/<name>/SOUL.md`	system	operator	agent
Procedure	`skills/<name>/SKILL.md`	system	operator	skill
Task	`pipelines.<name>.prompt` (`bento.yaml`)	user	operator	pipeline + event
Per-run facts	bento-injected (`<context-sources>`, `<prior-runs>`, `<notes>`)	user	bento	run

Channels: system vs user

The model API has two message roles, and they are not interchangeable.

System carries identity and standing rules, which are stable across every turn of a run. Prompt caching matches on a prefix, so stable content placed here can be cached: a 24-turn run re-sends the system prompt 24 times but pays full price for it only once.
User carries the task and its per-run data. This content necessarily varies, so it must sit after the cached prefix; placed earlier, it would invalidate the cache for everything that follows it.
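The arithmetic behind the caching claim can be sketched as follows. The token count and both multipliers are assumed values chosen for illustration; real cache-write and cache-read rates depend on the model provider, and `systemPromptCost` is a hypothetical helper, not part of bento.

```typescript
// Illustrative arithmetic only; every number passed in below is an assumption.
interface CacheRates {
  writeMultiplier: number // cost of the first, cache-filling send, relative to uncached input
  readMultiplier: number // cost of each later cached read, relative to uncached input
}

// Cost, in units of "uncached input tokens", of sending the same
// system prompt on every turn of a run.
function systemPromptCost(tokens: number, turns: number, rates?: CacheRates): number {
  if (turns <= 0) return 0
  if (!rates) return tokens * turns // no caching: full price on every turn
  return tokens * rates.writeMultiplier + tokens * (turns - 1) * rates.readMultiplier
}

// A 4000-token system prompt over a 24-turn run (assumed sizes and rates).
const uncachedCost = systemPromptCost(4000, 24)
const cachedCost = systemPromptCost(4000, 24, { writeMultiplier: 1, readMultiplier: 0.1 })
```

Under these assumed rates the cached run costs the equivalent of about 13,200 uncached tokens instead of 96,000. The saving only holds while the system prompt stays identical from turn to turn, which is why per-run data is kept out of it.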

This is the recurring split: a stable contract goes in the system channel; the matching per-run fact goes in the user channel.

The operating brief (stable rule about the sandbox) is a system fragment; `<context-sources>` (this run's specific commit SHA) is a user fragment.
The notes protocol (the constant convention for recording durable signals) is a system fragment; the accumulated `<notes>` (what prior runs actually recorded) is a user fragment.

Within the system channel the order is operating brief → notes protocol → persona → procedure. The bento-injected blocks (operating brief, notes protocol) are byte-identical across every agent, so leading with them lets all agents share that cached prefix.
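A minimal sketch of why that ordering matters. The placeholder strings, the second agent ("Triager"), and both helper functions are hypothetical; a character prefix stands in here for the token prefix a real cache would match on.

```typescript
// Hypothetical sketch: placeholder strings stand in for the real bento-injected blocks.
const OPERATING_BRIEF = '<operating brief: stable sandbox rules>'
const NOTES_PROTOCOL = '<notes-protocol>stable convention</notes-protocol>'

// Order within the system channel: bento-injected blocks first,
// then the operator-authored persona and procedure.
function buildSystemPrompt(persona: string, procedure: string): string {
  return [OPERATING_BRIEF, NOTES_PROTOCOL, persona, procedure].join('\n\n')
}

// Number of leading characters two prompts have in common.
function sharedPrefixLength(a: string, b: string): number {
  let i = 0
  while (i < a.length && i < b.length && a[i] === b[i]) i++
  return i
}

const reviewerPrompt = buildSystemPrompt('You are Reviewer.', 'Whole-PR procedure.')
const triagerPrompt = buildSystemPrompt('You are Triager.', 'Issue-triage procedure.')

// The bento-injected blocks form a prefix common to both agents.
const bentoPrefix = [OPERATING_BRIEF, NOTES_PROTOCOL].join('\n\n')
const sharedLength = sharedPrefixLength(reviewerPrompt, triagerPrompt)
```

Here `sharedLength` is at least `bentoPrefix.length`. If the persona came first instead, the two prompts would diverge within their opening sentence and share almost nothing.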

The composition rule

Each layer is independent of the layers it composes with.

SOUL.md never names a skill, an output format, or a trigger type.
SKILL.md never assumes a particular agent's voice.
Neither names a pipeline.

A layer that violates this cannot be reused. If SOUL.md hard-codes "you are invoked when someone leaves a review comment", any pipeline that is not a review-comment pipeline inherits a system prompt that lies about the task.
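One way to catch such violations early is a simple lint over the persona text. The sketch below is a heuristic under stated assumptions: `findScenarioLeaks` is a hypothetical helper, not something the source says bento provides, and the operator supplies the list of skill, pipeline, and output names to look for.

```typescript
// Hypothetical lint; a heuristic check, not a documented bento feature.
interface LintFinding {
  term: string
  line: number // 1-indexed line in the persona text
}

// Flag persona lines that mention any known skill, pipeline, or output name.
// Matching is a case-insensitive substring test, so a name that is also an
// ordinary word (or part of one) will produce false positives.
function findScenarioLeaks(personaText: string, forbiddenTerms: string[]): LintFinding[] {
  const findings: LintFinding[] = []
  personaText.split('\n').forEach((lineText, index) => {
    const lowered = lineText.toLowerCase()
    for (const term of forbiddenTerms) {
      if (lowered.indexOf(term.toLowerCase()) !== -1) {
        findings.push({ term, line: index + 1 })
      }
    }
  })
  return findings
}

const forbidden = ['pr-comment-review', 'review_comment_reply']
const cleanPersona = 'You are Reviewer.\nYou read code critically and cite specifics.'
const leakyPersona = 'You are Reviewer.\nYou are invoked via the pr-comment-review skill.'

const cleanFindings = findScenarioLeaks(cleanPersona, forbidden)
const leakyFindings = findScenarioLeaks(leakyPersona, forbidden)
```

The clean persona yields no findings; the leaky one is flagged on its second line. A substring check cannot detect a persona that describes a trigger in its own words, so it complements review rather than replacing it.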

How it is assembled

assemblePrompt (services/daemon/src/pipelines/prompt-assembly.ts) does not hard-code which content goes to which channel. Each contributing block is a fragment that carries its own channel tag:

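```typescript
interface PromptFragment {
  // Channel this fragment is delivered on; set where the fragment is built.
  channel: 'system' | 'user'
  // Rendered text of the block.
  text: string
}
```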

compose then groups fragments by channel and joins each — it makes no placement decision. Moving a block between channels is a one-line retag where the fragment is built, not a rewrite of the assembler.
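A minimal sketch of that grouping step, assuming fragments are joined in the order they are supplied and separated by a blank line. The name `composeSketch`, the `ComposedPrompt` shape, and the separator are illustrative assumptions; the actual code in prompt-assembly.ts may differ.

```typescript
// Hypothetical sketch; not the real compose from prompt-assembly.ts.
type Channel = 'system' | 'user'

interface PromptFragment {
  channel: Channel
  text: string
}

// Assumed output shape: one joined string per channel.
interface ComposedPrompt {
  system: string
  user: string
}

// Group fragments by their own channel tag, preserving input order,
// then join each group. No placement decision is made here.
function composeSketch(fragments: PromptFragment[], separator = '\n\n'): ComposedPrompt {
  const byChannel: Record<Channel, string[]> = { system: [], user: [] }
  for (const fragment of fragments) {
    byChannel[fragment.channel].push(fragment.text)
  }
  return {
    system: byChannel.system.join(separator),
    user: byChannel.user.join(separator),
  }
}

const composedPrompt = composeSketch([
  { channel: 'system', text: 'operating brief' },
  { channel: 'user', text: 'task' },
  { channel: 'system', text: 'persona' },
])
```

Retagging the `'task'` fragment as `'system'` would move it to the other channel without touching `composeSketch`, which is the property the assembler is designed around.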

Worked example: the reviewer

The reviewer agent is used by more than one pipeline — a whole-PR merge-readiness review and a single-comment reply. One persona, two procedures:

agents/reviewer/SOUL.md — "You are Reviewer. You read code critically, cite specifics, do not hedge, do not apologise for the author. Read-only." Voice and standards only. Nothing about comments, diff hunks, or output formats.
skills/review/SKILL.md — whole-PR procedure: read the checked-out repo, assess merge readiness, post a GitHub review with inline comments, end with a verdict.
skills/pr-comment-review/SKILL.md — comment procedure: evaluate the comment against HEAD, form a position, emit the structured reply that output: review_comment_reply consumes.
pipeline prompt: "Review PR #24 … produce APPROVE / REQUEST_CHANGES / HOLD."

The skill differentiates whole-PR review from comment-reply. Because the persona is scenario-free, one reviewer agent serves both — no need for a separate agent per pipeline.
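The sharing can be sketched as two pipeline entries that resolve to the same persona file and different procedure files. The `PipelineSketch` shape, its field names, and the placeholder prompts are assumptions for illustration; only the path conventions `agents/<name>/SOUL.md` and `skills/<name>/SKILL.md` come from the layer table.

```typescript
// Hypothetical shape: the source documents the file paths, not this config type.
interface PipelineSketch {
  agent: string
  skill: string
  prompt: string
}

// Resolve the operator-authored files a pipeline draws on, using the
// path conventions agents/<name>/SOUL.md and skills/<name>/SKILL.md.
function resolveLayerFiles(pipeline: PipelineSketch): { persona: string; procedure: string } {
  return {
    persona: `agents/${pipeline.agent}/SOUL.md`,
    procedure: `skills/${pipeline.skill}/SKILL.md`,
  }
}

const wholePrPipeline: PipelineSketch = {
  agent: 'reviewer',
  skill: 'review',
  prompt: '<whole-PR task>',
}
const commentReplyPipeline: PipelineSketch = {
  agent: 'reviewer',
  skill: 'pr-comment-review',
  prompt: '<comment-reply task>',
}

const wholePrFiles = resolveLayerFiles(wholePrPipeline)
const commentReplyFiles = resolveLayerFiles(commentReplyPipeline)
```

Both pipelines resolve to `agents/reviewer/SOUL.md` for the persona and differ only in the skill file and the task prompt.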

Why output contracts belong to the skill

A structured-JSON output requirement exists because a pipeline declares output: review_comment_reply — it is a property of that procedure, not of the agent's character. It belongs in the skill, not in SOUL.md. If the output contract lived in the persona, every pipeline using that agent would inherit it, including the ones that want a plain prose verdict.

The test for any instruction: if it would still be true for the same agent running a different kind of task, it is persona — put it in SOUL.md. If it changes with the task, it is procedure — put it in the skill, or in the pipeline prompt.