first commit

2026-02-28 23:01:30 +08:00
commit 3956ee4806
415 changed files with 74538 additions and 0 deletions
--- a/content/reference/prompt-caching.md
+++ b/content/reference/prompt-caching.md
@@ -0,0 +1,185 @@
+---
+title: "Prompt Caching"
+summary: "Prompt caching knobs, merge order, provider behavior, and tuning patterns"
+read_when:
+  - You want to reduce prompt token costs with cache retention
+  - You need per-agent cache behavior in multi-agent setups
+  - You are tuning heartbeat and cache-ttl pruning together
+---
+
+# Prompt caching
+
+Prompt caching means the model provider can reuse unchanged prompt prefixes (usually system/developer instructions and other stable context) across turns instead of re-processing them every time. The first matching request writes cache tokens (`cacheWrite`), and later matching requests can read them back (`cacheRead`).
+
+Why this matters: lower token cost, faster responses, and more predictable performance for long-running sessions. Without caching, repeated prompts pay the full prompt cost on every turn even when most input did not change.
+
+This page covers all cache-related knobs that affect prompt reuse and token cost.
+
+For Anthropic pricing details, see:
+[https://docs.anthropic.com/docs/build-with-claude/prompt-caching](https://docs.anthropic.com/docs/build-with-claude/prompt-caching)
+
+## Primary knobs
+
+### `cacheRetention` (model and per-agent)
+
+Set cache retention on model params:
+
+```yaml
+agents:
+  defaults:
+    models:
+      "anthropic/claude-opus-4-6":
+        params:
+          cacheRetention: "short" # none | short | long
+```
+
+Per-agent override:
+
+```yaml
+agents:
+  list:
+    - id: "alerts"
+      params:
+        cacheRetention: "none"
+```
+
+Config merge order:
+
+1. `agents.defaults.models["provider/model"].params`
+2. `agents.list[].params` (matching agent id; overrides by key)
+
+### Legacy `cacheControlTtl`
+
+Legacy values are still accepted and mapped:
+
+- `5m` -> `short`
+- `1h` -> `long`
+
+Prefer `cacheRetention` for new config.
+
+### `contextPruning.mode: "cache-ttl"`
+
+Prunes old tool-result context after cache TTL windows so post-idle requests do not re-cache oversized history.
+
+```yaml
+agents:
+  defaults:
+    contextPruning:
+      mode: "cache-ttl"
+      ttl: "1h"
+```
+
+See [Session Pruning](/concepts/session-pruning) for full behavior.
+
+### Heartbeat keep-warm
+
+Heartbeat can keep cache windows warm and reduce repeated cache writes after idle gaps.
+
+```yaml
+agents:
+  defaults:
+    heartbeat:
+      every: "55m"
+```
+
+Per-agent heartbeat is supported at `agents.list[].heartbeat`.
+
+## Provider behavior
+
+### Anthropic (direct API)
+
+- `cacheRetention` is supported.
+- With Anthropic API-key auth profiles, OpenClaw seeds `cacheRetention: "short"` for Anthropic model refs when unset.
+
+### Amazon Bedrock
+
+- Anthropic Claude model refs (`amazon-bedrock/*anthropic.claude*`) support explicit `cacheRetention` pass-through.
+- Non-Anthropic Bedrock models are forced to `cacheRetention: "none"` at runtime.
+
+### OpenRouter Anthropic models
+
+For `openrouter/anthropic/*` model refs, OpenClaw injects Anthropic `cache_control` on system/developer prompt blocks to improve prompt-cache reuse.
+
+### Other providers
+
+If the provider does not support this cache mode, `cacheRetention` has no effect.
+
+## Tuning patterns
+
+### Mixed traffic (recommended default)
+
+Keep a long-lived baseline on your main agent, disable caching on bursty notifier agents:
+
+```yaml
+agents:
+  defaults:
+    model:
+      primary: "anthropic/claude-opus-4-6"
+    models:
+      "anthropic/claude-opus-4-6":
+        params:
+          cacheRetention: "long"
+  list:
+    - id: "research"
+      default: true
+      heartbeat:
+        every: "55m"
+    - id: "alerts"
+      params:
+        cacheRetention: "none"
+```
+
+### Cost-first baseline
+
+- Set baseline `cacheRetention: "short"`.
+- Enable `contextPruning.mode: "cache-ttl"`.
+- Keep heartbeat below your TTL only for agents that benefit from warm caches.
+
+## Cache diagnostics
+
+OpenClaw exposes dedicated cache-trace diagnostics for embedded agent runs.
+
+### `diagnostics.cacheTrace` config
+
+```yaml
+diagnostics:
+  cacheTrace:
+    enabled: true
+    filePath: "~/.openclaw/logs/cache-trace.jsonl" # optional
+    includeMessages: false # default true
+    includePrompt: false # default true
+    includeSystem: false # default true
+```
+
+Defaults:
+
+- `filePath`: `$OPENCLAW_STATE_DIR/logs/cache-trace.jsonl`
+- `includeMessages`: `true`
+- `includePrompt`: `true`
+- `includeSystem`: `true`
+
+### Env toggles (one-off debugging)
+
+- `OPENCLAW_CACHE_TRACE=1` enables cache tracing.
+- `OPENCLAW_CACHE_TRACE_FILE=/path/to/cache-trace.jsonl` overrides output path.
+- `OPENCLAW_CACHE_TRACE_MESSAGES=0|1` toggles full message payload capture.
+- `OPENCLAW_CACHE_TRACE_PROMPT=0|1` toggles prompt text capture.
+- `OPENCLAW_CACHE_TRACE_SYSTEM=0|1` toggles system prompt capture.
+
+### What to inspect
+
+- Cache trace events are JSONL and include staged snapshots like `session:loaded`, `prompt:before`, `stream:context`, and `session:after`.
+- Per-turn cache token impact is visible in normal usage surfaces via `cacheRead` and `cacheWrite` (for example `/usage full` and session usage summaries).
+
+## Quick troubleshooting
+
+- High `cacheWrite` on most turns: check for volatile system-prompt inputs and verify model/provider supports your cache settings.
+- No effect from `cacheRetention`: confirm model key matches `agents.defaults.models["provider/model"]`.
+- Bedrock Nova/Mistral requests with cache settings: expected runtime force to `none`.
+
+Related docs:
+
+- [Anthropic](/providers/anthropic)
+- [Token Use and Costs](/reference/token-use)
+- [Session Pruning](/concepts/session-pruning)
+- [Gateway Configuration Reference](/gateway/configuration-reference)