first commit

This commit is contained in:
2026-02-28 23:01:30 +08:00
commit 3956ee4806
415 changed files with 74538 additions and 0 deletions

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,158 @@
---
summary: "Holy grail refactor plan for one unified runtime streaming pipeline across main, subagent, and ACP"
owner: "onutc"
status: "draft"
last_updated: "2026-02-25"
title: "Unified Runtime Streaming Refactor Plan"
---
# Unified Runtime Streaming Refactor Plan
## Objective
Deliver one shared streaming pipeline for `main`, `subagent`, and `acp` so all runtimes get identical coalescing, chunking, delivery ordering, and crash recovery behavior.
## Why this exists
- Current behavior is split across multiple runtime-specific shaping paths.
- Formatting/coalescing bugs can be fixed in one path but remain in others.
- Delivery consistency, duplicate suppression, and recovery semantics are harder to reason about.
## Target architecture
Single pipeline, runtime-specific adapters:
1. Runtime adapters emit canonical events only.
2. Shared stream assembler coalesces and finalizes text/tool/status events.
3. Shared channel projector applies channel-specific chunking/formatting once.
4. Shared delivery ledger enforces idempotent send/replay semantics.
5. Outbound channel adapter executes sends and records delivery checkpoints.
Canonical event contract:
- `turn_started`
- `text_delta`
- `block_final`
- `tool_started`
- `tool_finished`
- `status`
- `turn_completed`
- `turn_failed`
- `turn_cancelled`
## Workstreams
### 1) Canonical streaming contract
- Define strict event schema + validation in core.
- Add adapter contract tests to guarantee each runtime emits compatible events.
- Reject malformed runtime events early and surface structured diagnostics.
### 2) Shared stream processor
- Replace runtime-specific coalescer/projector logic with one processor.
- Processor owns text delta buffering, idle flush, max-chunk splitting, and completion flush.
- Move ACP/main/subagent config resolution into one helper to prevent drift.
### 3) Shared channel projection
- Keep channel adapters dumb: accept finalized blocks and send.
- Move Discord-specific chunking quirks to channel projector only.
- Keep pipeline channel-agnostic before projection.
### 4) Delivery ledger + replay
- Add per-turn/per-chunk delivery IDs.
- Record checkpoints before and after physical send.
- On restart, replay pending chunks idempotently and avoid duplicates.
### 5) Migration and cutover
- Phase 1: shadow mode (new pipeline computes output but old path sends; compare).
- Phase 2: runtime-by-runtime cutover (`acp`, then `subagent`, then `main` or reverse by risk).
- Phase 3: delete legacy runtime-specific streaming code.
## Non-goals
- No changes to ACP policy/permissions model in this refactor.
- No channel-specific feature expansion outside projection compatibility fixes.
- No transport/backend redesign (acpx plugin contract remains as-is unless needed for event parity).
## Risks and mitigations
- Risk: behavioral regressions in existing main/subagent paths.
Mitigation: shadow mode diffing + adapter contract tests + channel e2e tests.
- Risk: duplicate sends during crash recovery.
Mitigation: durable delivery IDs + idempotent replay in delivery adapter.
- Risk: runtime adapters diverge again.
Mitigation: required shared contract test suite for all adapters.
## Acceptance criteria
- All runtimes pass shared streaming contract tests.
- Discord ACP/main/subagent produce equivalent spacing/chunking behavior for tiny deltas.
- Crash/restart replay sends no duplicate chunk for the same delivery ID.
- Legacy ACP projector/coalescer path is removed.
- Streaming config resolution is shared and runtime-independent.

View File

@@ -0,0 +1,396 @@
---
summary: "Plan: isolate browser act:evaluate from Playwright queue using CDP, with end-to-end deadlines and safer ref resolution"
read_when:
- Working on browser `act:evaluate` timeout, abort, or queue blocking issues
- Planning CDP based isolation for evaluate execution
owner: "openclaw"
status: "draft"
last_updated: "2026-02-10"
title: "Browser Evaluate CDP Refactor"
---
# Browser Evaluate CDP Refactor Plan
## Context
`act:evaluate` executes user provided JavaScript in the page. Today it runs via Playwright
(`page.evaluate` or `locator.evaluate`). Playwright serializes CDP commands per page, so a
stuck or long running evaluate can block the page command queue and make every later action
on that tab look "stuck".
PR #13498 adds a pragmatic safety net (bounded evaluate, abort propagation, and best-effort
recovery). This document describes a larger refactor that makes `act:evaluate` inherently
isolated from Playwright so a stuck evaluate cannot wedge normal Playwright operations.
## Goals
- `act:evaluate` cannot permanently block later browser actions on the same tab.
- Timeouts are single source of truth end to end so a caller can rely on a budget.
- Abort and timeout are treated the same way across HTTP and in-process dispatch.
- Element targeting for evaluate is supported without switching everything off Playwright.
- Maintain backward compatibility for existing callers and payloads.
## Non-goals
- Replace all browser actions (click, type, wait, etc.) with CDP implementations.
- Remove the existing safety net introduced in PR #13498 (it remains a useful fallback).
- Introduce new unsafe capabilities beyond the existing `browser.evaluateEnabled` gate.
- Add process isolation (worker process/thread) for evaluate. If we still see hard to recover
stuck states after this refactor, that is a follow-up idea.
## Current Architecture (Why It Gets Stuck)
At a high level:
- Callers send `act:evaluate` to the browser control service.
- The route handler calls into Playwright to execute the JavaScript.
- Playwright serializes page commands, so an evaluate that never finishes blocks the queue.
- A stuck queue means later click/type/wait operations on the tab can appear to hang.
## Proposed Architecture
### 1. Deadline Propagation
Introduce a single budget concept and derive everything from it:
- Caller sets `timeoutMs` (or a deadline in the future).
- The outer request timeout, route handler logic, and the execution budget inside the page
all use the same budget, with small headroom where needed for serialization overhead.
- Abort is propagated as an `AbortSignal` everywhere so cancellation is consistent.
Implementation direction:
- Add a small helper (for example `createBudget({ timeoutMs, signal })`) that returns:
- `signal`: the linked AbortSignal
- `deadlineAtMs`: absolute deadline
- `remainingMs()`: remaining budget for child operations
- Use this helper in:
- `src/browser/client-fetch.ts` (HTTP and in-process dispatch)
- `src/node-host/runner.ts` (proxy path)
- browser action implementations (Playwright and CDP)
### 2. Separate Evaluate Engine (CDP Path)
Add a CDP based evaluate implementation that does not share Playwright's per page command
queue. The key property is that the evaluate transport is a separate WebSocket connection
and a separate CDP session attached to the target.
Implementation direction:
- New module, for example `src/browser/cdp-evaluate.ts`, that:
- Connects to the configured CDP endpoint (browser level socket).
- Uses `Target.attachToTarget({ targetId, flatten: true })` to get a `sessionId`.
- Runs either:
- `Runtime.evaluate` for page level evaluate, or
- `DOM.resolveNode` plus `Runtime.callFunctionOn` for element evaluate.
- On timeout or abort:
- Sends `Runtime.terminateExecution` best-effort for the session.
- Closes the WebSocket and returns a clear error.
Notes:
- This still executes JavaScript in the page, so termination can have side effects. The win
is that it does not wedge the Playwright queue, and it is cancelable at the transport
layer by killing the CDP session.
### 3. Ref Story (Element Targeting Without A Full Rewrite)
The hard part is element targeting. CDP needs a DOM handle or `backendDOMNodeId`, while
today most browser actions use Playwright locators based on refs from snapshots.
Recommended approach: keep existing refs, but attach an optional CDP resolvable id.
#### 3.1 Extend Stored Ref Info
Extend the stored role ref metadata to optionally include a CDP id:
- Today: `{ role, name, nth }`
- Proposed: `{ role, name, nth, backendDOMNodeId?: number }`
This keeps all existing Playwright based actions working and allows CDP evaluate to accept
the same `ref` value when the `backendDOMNodeId` is available.
#### 3.2 Populate backendDOMNodeId At Snapshot Time
When producing a role snapshot:
1. Generate the existing role ref map as today (role, name, nth).
2. Fetch the AX tree via CDP (`Accessibility.getFullAXTree`) and compute a parallel map of
`(role, name, nth) -> backendDOMNodeId` using the same duplicate handling rules.
3. Merge the id back into the stored ref info for the current tab.
If mapping fails for a ref, leave `backendDOMNodeId` undefined. This makes the feature
best-effort and safe to roll out.
#### 3.3 Evaluate Behavior With Ref
In `act:evaluate`:
- If `ref` is present and has `backendDOMNodeId`, run element evaluate via CDP.
- If `ref` is present but has no `backendDOMNodeId`, fall back to the Playwright path (with
the safety net).
Optional escape hatch:
- Extend the request shape to accept `backendDOMNodeId` directly for advanced callers (and
for debugging), while keeping `ref` as the primary interface.
### 4. Keep A Last Resort Recovery Path
Even with CDP evaluate, there are other ways to wedge a tab or a connection. Keep the
existing recovery mechanisms (terminate execution + disconnect Playwright) as a last resort
for:
- legacy callers
- environments where CDP attach is blocked
- unexpected Playwright edge cases
## Implementation Plan (Single Iteration)
### Deliverables
- A CDP based evaluate engine that runs outside the Playwright per-page command queue.
- A single end-to-end timeout/abort budget used consistently by callers and handlers.
- Ref metadata that can optionally carry `backendDOMNodeId` for element evaluate.
- `act:evaluate` prefers the CDP engine when possible and falls back to Playwright when not.
- Tests that prove a stuck evaluate does not wedge later actions.
- Logs/metrics that make failures and fallbacks visible.
### Implementation Checklist
1. Add a shared "budget" helper to link `timeoutMs` + upstream `AbortSignal` into:
- a single `AbortSignal`
- an absolute deadline
- a `remainingMs()` helper for downstream operations
2. Update all caller paths to use that helper so `timeoutMs` means the same thing everywhere:
- `src/browser/client-fetch.ts` (HTTP and in-process dispatch)
- `src/node-host/runner.ts` (node proxy path)
- CLI wrappers that call `/act` (add `--timeout-ms` to `browser evaluate`)
3. Implement `src/browser/cdp-evaluate.ts`:
- connect to the browser-level CDP socket
- `Target.attachToTarget` to get a `sessionId`
- run `Runtime.evaluate` for page evaluate
- run `DOM.resolveNode` + `Runtime.callFunctionOn` for element evaluate
- on timeout/abort: best-effort `Runtime.terminateExecution` then close the socket
4. Extend stored role ref metadata to optionally include `backendDOMNodeId`:
- keep existing `{ role, name, nth }` behavior for Playwright actions
- add `backendDOMNodeId?: number` for CDP element targeting
5. Populate `backendDOMNodeId` during snapshot creation (best-effort):
- fetch AX tree via CDP (`Accessibility.getFullAXTree`)
- compute `(role, name, nth) -> backendDOMNodeId` and merge into the stored ref map
- if mapping is ambiguous or missing, leave the id undefined
6. Update `act:evaluate` routing:
- if no `ref`: always use CDP evaluate
- if `ref` resolves to a `backendDOMNodeId`: use CDP element evaluate
- otherwise: fall back to Playwright evaluate (still bounded and abortable)
7. Keep the existing "last resort" recovery path as a fallback, not the default path.
8. Add tests:
- stuck evaluate times out within budget and the next click/type succeeds
- abort cancels evaluate (client disconnect or timeout) and unblocks subsequent actions
- mapping failures cleanly fall back to Playwright
9. Add observability:
- evaluate duration and timeout counters
- terminateExecution usage
- fallback rate (CDP -> Playwright) and reasons
### Acceptance Criteria
- A deliberately hung `act:evaluate` returns within the caller budget and does not wedge the
tab for later actions.
- `timeoutMs` behaves consistently across CLI, agent tool, node proxy, and in-process calls.
- If `ref` can be mapped to `backendDOMNodeId`, element evaluate uses CDP; otherwise the
fallback path is still bounded and recoverable.
## Testing Plan
- Unit tests:
- `(role, name, nth)` matching logic between role refs and AX tree nodes.
- Budget helper behavior (headroom, remaining time math).
- Integration tests:
- CDP evaluate timeout returns within budget and does not block the next action.
- Abort cancels evaluate and triggers termination best-effort.
- Contract tests:
- Ensure `BrowserActRequest` and `BrowserActResponse` remain compatible.
## Risks And Mitigations
- Mapping is imperfect:
- Mitigation: best-effort mapping, fallback to Playwright evaluate, and add debug tooling.
- `Runtime.terminateExecution` has side effects:
- Mitigation: only use on timeout/abort and document the behavior in errors.
- Extra overhead:
- Mitigation: only fetch AX tree when snapshots are requested, cache per target, and keep
CDP session short lived.
- Extension relay limitations:
- Mitigation: use browser level attach APIs when per page sockets are not available, and
keep the current Playwright path as fallback.
## Open Questions
- Should the new engine be configurable as `playwright`, `cdp`, or `auto`?
- Do we want to expose a new "nodeRef" format for advanced users, or keep `ref` only?
- How should frame snapshots and selector scoped snapshots participate in AX mapping?

View File

@@ -0,0 +1,70 @@
---
last_updated: "2026-01-05"
owner: openclaw
status: complete
summary: 加固 cron.add 输入处理,对齐 schema改进 cron UI/智能体工具
title: Cron Add 加固
x-i18n:
generated_at: "2026-02-03T07:47:26Z"
model: claude-opus-4-5
provider: pi
source_hash: d7e469674bd9435b846757ea0d5dc8f174eaa8533917fc013b1ef4f82859496d
source_path: experiments/plans/cron-add-hardening.md
workflow: 15
---
# Cron Add 加固 & Schema 对齐
## 背景
最近的 Gateway 网关日志显示重复的 `cron.add` 失败,参数无效(缺少 `sessionTarget``wakeMode``payload`,以及格式错误的 `schedule`。这表明至少有一个客户端可能是智能体工具调用路径正在发送包装的或部分指定的任务负载。另外TypeScript 中的 cron 提供商枚举、Gateway 网关 schema、CLI 标志和 UI 表单类型之间存在漂移,加上 `cron.status` 的 UI 不匹配(期望 `jobCount` 而 Gateway 网关返回 `jobs`)。
## 目标
- 通过规范化常见的包装负载并推断缺失的 `kind` 字段来停止 `cron.add` INVALID_REQUEST 垃圾。
- 在 Gateway 网关 schema、cron 类型、CLI 文档和 UI 表单之间对齐 cron 提供商列表。
- 使智能体 cron 工具 schema 明确,以便 LLM 生成正确的任务负载。
- 修复 Control UI cron 状态任务计数显示。
- 添加测试以覆盖规范化和工具行为。
## 非目标
- 更改 cron 调度语义或任务执行行为。
- 添加新的调度类型或 cron 表达式解析。
- 除了必要的字段修复外,不大改 cron 的 UI/UX。
## 发现(当前差距)
- Gateway 网关中的 `CronPayloadSchema` 排除了 `signal` + `imessage`,而 TS 类型包含它们。
- Control UI CronStatus 期望 `jobCount`,但 Gateway 网关返回 `jobs`
- 智能体 cron 工具 schema 允许任意 `job` 对象,导致格式错误的输入。
- Gateway 网关严格验证 `cron.add` 而不进行规范化,因此包装的负载会失败。
## 变更内容
- `cron.add``cron.update` 现在规范化常见的包装形式并推断缺失的 `kind` 字段。
- 智能体 cron 工具 schema 与 Gateway 网关 schema 匹配,减少无效负载。
- 提供商枚举在 Gateway 网关、CLI、UI 和 macOS 选择器之间对齐。
- Control UI 使用 Gateway 网关的 `jobs` 计数字段显示状态。
## 当前行为
- **规范化:**包装的 `data`/`job` 负载被解包;`schedule.kind``payload.kind` 在安全时被推断。
- **默认值:**当缺失时,为 `wakeMode``sessionTarget` 应用安全默认值。
- **提供商:**Discord/Slack/Signal/iMessage 现在在 CLI/UI 中一致显示。
参见 [Cron 任务](/automation/cron-jobs) 了解规范化的形式和示例。
## 验证
- 观察 Gateway 网关日志中 `cron.add` INVALID_REQUEST 错误是否减少。
- 确认 Control UI cron 状态在刷新后显示任务计数。
## 可选后续工作
- 手动 Control UI 冒烟测试:为每个提供商添加一个 cron 任务 + 验证状态任务计数。
## 开放问题
- `cron.add` 是否应该接受来自客户端的显式 `state`(当前被 schema 禁止)?
- 我们是否应该允许 `webchat` 作为显式投递提供商(当前在投递解析中被过滤)?

View File

@@ -0,0 +1,45 @@
---
read_when:
- 查看历史 Telegram 允许列表更改
summary: Telegram 允许列表加固:前缀 + 空白规范化
title: Telegram 允许列表加固
x-i18n:
generated_at: "2026-02-03T07:47:16Z"
model: claude-opus-4-5
provider: pi
source_hash: a2eca5fcc85376948cfe1b6044f1a8bc69c7f0eb94d1ceafedc1e507ba544162
source_path: experiments/plans/group-policy-hardening.md
workflow: 15
---
# Telegram 允许列表加固
**日期**2026-01-05
**状态**:已完成
**PR**#216
## 摘要
Telegram 允许列表现在不区分大小写地接受 `telegram:``tg:` 前缀,并容忍意外的空白。这使入站允许列表检查与出站发送规范化保持一致。
## 更改内容
- 前缀 `telegram:``tg:` 被同等对待(不区分大小写)。
- 允许列表条目会被修剪;空条目会被忽略。
## 示例
以下所有形式都被接受为同一 ID
- `telegram:123456`
- `TG:123456`
- `tg:123456`
## 为什么重要
从日志或聊天 ID 复制/粘贴通常会包含前缀和空白。规范化可避免在决定是否在私信或群组中响应时出现误判。
## 相关文档
- [群聊](/channels/groups)
- [Telegram 提供商](/channels/telegram)

View File

@@ -0,0 +1,121 @@
---
last_updated: "2026-01-19"
owner: openclaw
status: draft
summary: 计划:添加 OpenResponses /v1/responses 端点并干净地弃用 chat completions
title: OpenResponses Gateway 网关计划
x-i18n:
generated_at: "2026-02-03T07:47:33Z"
model: claude-opus-4-5
provider: pi
source_hash: 71a22c48397507d1648b40766a3153e420c54f2a2d5186d07e51eb3d12e4636a
source_path: experiments/plans/openresponses-gateway.md
workflow: 15
---
# OpenResponses Gateway 网关集成计划
## 背景
OpenClaw Gateway 网关目前在 `/v1/chat/completions` 暴露了一个最小的 OpenAI 兼容 Chat Completions 端点(参见 [OpenAI Chat Completions](/gateway/openai-http-api))。
Open Responses 是基于 OpenAI Responses API 的开放推理标准。它专为智能体工作流设计使用基于项目的输入加语义流式事件。OpenResponses 规范定义的是 `/v1/responses`,而不是 `/v1/chat/completions`
## 目标
- 添加一个遵循 OpenResponses 语义的 `/v1/responses` 端点。
- 保留 Chat Completions 作为兼容层,易于禁用并最终移除。
- 使用隔离的、可复用的 schema 标准化验证和解析。
## 非目标
- 第一阶段完全实现 OpenResponses 功能(图片、文件、托管工具)。
- 替换内部智能体执行逻辑或工具编排。
- 在第一阶段更改现有的 `/v1/chat/completions` 行为。
## 研究摘要
来源OpenResponses OpenAPI、OpenResponses 规范网站和 Hugging Face 博客文章。
提取的关键点:
- `POST /v1/responses` 接受 `CreateResponseBody` 字段,如 `model``input`(字符串或 `ItemParam[]`)、`instructions``tools``tool_choice``stream``max_output_tokens``max_tool_calls`
- `ItemParam` 是以下类型的可区分联合:
- 具有角色 `system``developer``user``assistant``message`
- `function_call``function_call_output`
- `reasoning`
- `item_reference`
- 成功响应返回带有 `object: "response"``status``output` 项的 `ResponseResource`
- 流式传输使用语义事件,如:
- `response.created``response.in_progress``response.completed``response.failed`
- `response.output_item.added``response.output_item.done`
- `response.content_part.added``response.content_part.done`
- `response.output_text.delta``response.output_text.done`
- 规范要求:
- `Content-Type: text/event-stream`
- `event:` 必须匹配 JSON `type` 字段
- 终止事件必须是字面量 `[DONE]`
- Reasoning 项可能暴露 `content``encrypted_content``summary`
- HF 示例在请求中包含 `OpenResponses-Version: latest`(可选头部)。
## 提议的架构
- 添加 `src/gateway/open-responses.schema.ts`,仅包含 Zod schema无 gateway 导入)。
- 添加 `src/gateway/openresponses-http.ts`(或 `open-responses-http.ts`)用于 `/v1/responses`
- 保持 `src/gateway/openai-http.ts` 不变,作为遗留兼容适配器。
- 添加配置 `gateway.http.endpoints.responses.enabled`(默认 `false`)。
- 保持 `gateway.http.endpoints.chatCompletions.enabled` 独立;允许两个端点分别切换。
- 当 Chat Completions 启用时发出启动警告,以表明其遗留状态。
## Chat Completions 弃用路径
- 保持严格的模块边界responses 和 chat completions 之间不共享 schema 类型。
- 通过配置使 Chat Completions 成为可选,这样无需代码更改即可禁用。
- 一旦 `/v1/responses` 稳定,更新文档将 Chat Completions 标记为遗留。
- 可选的未来步骤:将 Chat Completions 请求映射到 Responses 处理器,以便更简单地移除。
## 第一阶段支持子集
- 接受 `input` 为字符串或带有消息角色和 `function_call_output``ItemParam[]`
- 将 system 和 developer 消息提取到 `extraSystemPrompt` 中。
- 使用最近的 `user``function_call_output` 作为智能体运行的当前消息。
- 对不支持的内容部分(图片/文件)返回 `invalid_request_error` 拒绝。
- 返回带有 `output_text` 内容的单个助手消息。
- 返回带有零值的 `usage`,直到 token 计数接入。
## 验证策略(无 SDK
- 为以下支持子集实现 Zod schema
- `CreateResponseBody`
- `ItemParam` + 消息内容部分联合
- `ResponseResource`
- Gateway 网关使用的流式事件形状
- 将 schema 保存在单个隔离模块中,以避免漂移并允许未来代码生成。
## 流式实现(第一阶段)
- 带有 `event:``data:` 的 SSE 行。
- 所需序列(最小可行):
- `response.created`
- `response.output_item.added`
- `response.content_part.added`
- `response.output_text.delta`(根据需要重复)
- `response.output_text.done`
- `response.content_part.done`
- `response.completed`
- `[DONE]`
## 测试和验证计划
-`/v1/responses` 添加端到端覆盖:
- 需要认证
- 非流式响应形状
- 流式事件顺序和 `[DONE]`
- 使用头部和 `user` 的会话路由
- 保持 `src/gateway/openai-http.e2e.test.ts` 不变。
- 手动:用 `stream: true` curl `/v1/responses` 并验证事件顺序和终止 `[DONE]`
## 文档更新(后续)
-`/v1/responses` 使用和示例添加新文档页面。
- 更新 `/gateway/openai-http-api`,添加遗留说明和指向 `/v1/responses` 的指针。

View File

@@ -0,0 +1,317 @@
---
summary: "Production plan for reliable interactive process supervision (PTY + non-PTY) with explicit ownership, unified lifecycle, and deterministic cleanup"
read_when:
- Working on exec/process lifecycle ownership and cleanup
- Debugging PTY and non-PTY supervision behavior
owner: "openclaw"
status: "in-progress"
last_updated: "2026-02-15"
title: "PTY and Process Supervision Plan"
---
# PTY and Process Supervision Plan
## 1. Problem and goal
We need one reliable lifecycle for long-running command execution across:
- `exec` foreground runs
- `exec` background runs
- `process` follow up actions (`poll`, `log`, `send-keys`, `paste`, `submit`, `kill`, `remove`)
- CLI agent runner subprocesses
The goal is not just to support PTY. The goal is predictable ownership, cancellation, timeout, and cleanup with no unsafe process matching heuristics.
## 2. Scope and boundaries
- Keep implementation internal in `src/process/supervisor`.
- Do not create a new package for this.
- Keep current behavior compatibility where practical.
- Do not broaden scope to terminal replay or tmux style session persistence.
## 3. Implemented in this branch
### Supervisor baseline already present
- Supervisor module is in place under `src/process/supervisor/*`.
- Exec runtime and CLI runner are already routed through supervisor spawn and wait.
- Registry finalization is idempotent.
### This pass completed
1. Explicit PTY command contract
- `SpawnInput` is now a discriminated union in `src/process/supervisor/types.ts`.
- PTY runs require `ptyCommand` instead of reusing generic `argv`.
- Supervisor no longer rebuilds PTY command strings from argv joins in `src/process/supervisor/supervisor.ts`.
- Exec runtime now passes `ptyCommand` directly in `src/agents/bash-tools.exec-runtime.ts`.
2. Process layer type decoupling
- Supervisor types no longer import `SessionStdin` from agents.
- Process local stdin contract lives in `src/process/supervisor/types.ts` (`ManagedRunStdin`).
- Adapters now depend only on process level types:
- `src/process/supervisor/adapters/child.ts`
- `src/process/supervisor/adapters/pty.ts`
3. Process tool lifecycle ownership improvement
- `src/agents/bash-tools.process.ts` now requests cancellation through supervisor first.
- `process kill/remove` now use process-tree fallback termination when supervisor lookup misses.
- `remove` keeps deterministic remove behavior by dropping running session entries immediately after termination is requested.
4. Single source watchdog defaults
- Added shared defaults in `src/agents/cli-watchdog-defaults.ts`.
- `src/agents/cli-backends.ts` consumes the shared defaults.
- `src/agents/cli-runner/reliability.ts` consumes the same shared defaults.
5. Dead helper cleanup
- Removed unused `killSession` helper path from `src/agents/bash-tools.shared.ts`.
6. Direct supervisor path tests added
- Added `src/agents/bash-tools.process.supervisor.test.ts` to cover kill and remove routing through supervisor cancellation.
7. Reliability gap fixes completed
- `src/agents/bash-tools.process.ts` now falls back to real OS-level process termination when supervisor lookup misses.
- `src/process/supervisor/adapters/child.ts` now uses process-tree termination semantics for default cancel/timeout kill paths.
- Added shared process-tree utility in `src/process/kill-tree.ts`.
8. PTY contract edge-case coverage added
- Added `src/process/supervisor/supervisor.pty-command.test.ts` for verbatim PTY command forwarding and empty-command rejection.
- Added `src/process/supervisor/adapters/child.test.ts` for process-tree kill behavior in child adapter cancellation.
## 4. Remaining gaps and decisions
### Reliability status
The two required reliability gaps for this pass are now closed:
- `process kill/remove` now has a real OS termination fallback when supervisor lookup misses.
- child cancel/timeout now uses process-tree kill semantics for default kill path.
- Regression tests were added for both behaviors.
### Durability and startup reconciliation
Restart behavior is now explicitly defined as in-memory lifecycle only.
- `reconcileOrphans()` remains a no-op in `src/process/supervisor/supervisor.ts` by design.
- Active runs are not recovered after process restart.
- This boundary is intentional for this implementation pass to avoid partial persistence risks.
### Maintainability follow-ups
1. `runExecProcess` in `src/agents/bash-tools.exec-runtime.ts` still handles multiple responsibilities and can be split into focused helpers in a follow-up.
## 5. Implementation plan
The implementation pass for required reliability and contract items is complete.
Completed:
- `process kill/remove` fallback real termination
- process-tree cancellation for child adapter default kill path
- regression tests for fallback kill and child adapter kill path
- PTY command edge-case tests under explicit `ptyCommand`
- explicit in-memory restart boundary with `reconcileOrphans()` no-op by design
Optional follow-up:
- split `runExecProcess` into focused helpers with no behavior drift
## 6. File map
### Process supervisor
- `src/process/supervisor/types.ts` updated with discriminated spawn input and process local stdin contract.
- `src/process/supervisor/supervisor.ts` updated to use explicit `ptyCommand`.
- `src/process/supervisor/adapters/child.ts` and `src/process/supervisor/adapters/pty.ts` decoupled from agent types.
- `src/process/supervisor/registry.ts` idempotent finalize unchanged and retained.
### Exec and process integration
- `src/agents/bash-tools.exec-runtime.ts` updated to pass PTY command explicitly and keep fallback path.
- `src/agents/bash-tools.process.ts` updated to cancel via supervisor with real process-tree fallback termination.
- `src/agents/bash-tools.shared.ts` removed direct kill helper path.
### CLI reliability
- `src/agents/cli-watchdog-defaults.ts` added as shared baseline.
- `src/agents/cli-backends.ts` and `src/agents/cli-runner/reliability.ts` now consume same defaults.
## 7. Validation run in this pass
Unit tests:
- `pnpm vitest src/process/supervisor/registry.test.ts`
- `pnpm vitest src/process/supervisor/supervisor.test.ts`
- `pnpm vitest src/process/supervisor/supervisor.pty-command.test.ts`
- `pnpm vitest src/process/supervisor/adapters/child.test.ts`
- `pnpm vitest src/agents/cli-backends.test.ts`
- `pnpm vitest src/agents/bash-tools.exec.pty-cleanup.test.ts`
- `pnpm vitest src/agents/bash-tools.process.poll-timeout.test.ts`
- `pnpm vitest src/agents/bash-tools.process.supervisor.test.ts`
- `pnpm vitest src/process/exec.test.ts`
E2E targets:
- `pnpm vitest src/agents/cli-runner.test.ts`
- `pnpm vitest run src/agents/bash-tools.exec.pty-fallback.test.ts src/agents/bash-tools.exec.background-abort.test.ts src/agents/bash-tools.process.send-keys.test.ts`
Typecheck note:
- Use `pnpm build` (and `pnpm check` for full lint/docs gate) in this repo. Older notes that mention `pnpm tsgo` are obsolete.
## 8. Operational guarantees preserved
- Exec env hardening behavior is unchanged.
- Approval and allowlist flow is unchanged.
- Output sanitization and output caps are unchanged.
- PTY adapter still guarantees wait settlement on forced kill and listener disposal.
## 9. Definition of done
1. Supervisor is lifecycle owner for managed runs.
2. PTY spawn uses explicit command contract with no argv reconstruction.
3. Process layer has no type dependency on agent layer for supervisor stdin contracts.
4. Watchdog defaults are single source.
5. Targeted unit and e2e tests remain green.
6. Restart durability boundary is explicitly documented or fully implemented.
## 10. Summary
The branch now has a coherent and safer supervision shape:
- explicit PTY contract
- cleaner process layering
- supervisor driven cancellation path for process operations
- real fallback termination when supervisor lookup misses
- process-tree cancellation for child-run default kill paths
- unified watchdog defaults
- explicit in-memory restart boundary (no orphan reconciliation across restart in this pass)

View File

@@ -0,0 +1,324 @@
---
summary: "Channel agnostic session binding architecture and iteration 1 delivery scope"
read_when:
- Refactoring channel-agnostic session routing and bindings
- Investigating duplicate, stale, or missing session delivery across channels
owner: "onutc"
status: "in-progress"
last_updated: "2026-02-21"
title: "Session Binding Channel Agnostic Plan"
---
# Session Binding Channel Agnostic Plan
## Overview
This document defines the long term channel agnostic session binding model and the concrete scope for the next implementation iteration.
Goal:
- make subagent bound session routing a core capability
- keep channel specific behavior in adapters
- avoid regressions in normal Discord behavior
## Why this exists
Current behavior mixes:
- completion content policy
- destination routing policy
- Discord specific details
This caused edge cases such as:
- duplicate main and thread delivery under concurrent runs
- stale token usage on reused binding managers
- missing activity accounting for webhook sends
## Iteration 1 scope
This iteration is intentionally limited.
### 1. Add channel agnostic core interfaces
Add core types and service interfaces for bindings and routing.
Proposed core types:
```ts
export type BindingTargetKind = "subagent" | "session";
export type BindingStatus = "active" | "ending" | "ended";
export type ConversationRef = {
channel: string;
accountId: string;
conversationId: string;
parentConversationId?: string;
};
export type SessionBindingRecord = {
bindingId: string;
targetSessionKey: string;
targetKind: BindingTargetKind;
conversation: ConversationRef;
status: BindingStatus;
boundAt: number;
expiresAt?: number;
metadata?: Record<string, unknown>;
};
```
Core service contract:
```ts
export interface SessionBindingService {
bind(input: {
targetSessionKey: string;
targetKind: BindingTargetKind;
conversation: ConversationRef;
metadata?: Record<string, unknown>;
ttlMs?: number;
}): Promise<SessionBindingRecord>;
listBySession(targetSessionKey: string): SessionBindingRecord[];
resolveByConversation(ref: ConversationRef): SessionBindingRecord | null;
touch(bindingId: string, at?: number): void;
unbind(input: {
bindingId?: string;
targetSessionKey?: string;
reason: string;
}): Promise<SessionBindingRecord[]>;
}
```
### 2. Add one core delivery router for subagent completions
Add a single destination resolution path for completion events.
Router contract:
```ts
export interface BoundDeliveryRouter {
resolveDestination(input: {
eventKind: "task_completion";
targetSessionKey: string;
requester?: ConversationRef;
failClosed: boolean;
}): {
binding: SessionBindingRecord | null;
mode: "bound" | "fallback";
reason: string;
};
}
```
For this iteration:
- only `task_completion` is routed through this new path
- existing paths for other event kinds remain as-is
### 3. Keep Discord as adapter
Discord remains the first adapter implementation.
Adapter responsibilities:
- create/reuse thread conversations
- send bound messages via webhook or channel send
- validate thread state (archived/deleted)
- map adapter metadata (webhook identity, thread ids)
### 4. Fix currently known correctness issues
Required in this iteration:
- refresh token usage when reusing existing thread binding manager
- record outbound activity for webhook based Discord sends
- stop implicit main channel fallback when a bound thread destination is selected for session mode completion
### 5. Preserve current runtime safety defaults
No behavior change for users with thread bound spawn disabled.
Defaults stay:
- `channels.discord.threadBindings.spawnSubagentSessions = false`
Result:
- normal Discord users stay on current behavior
- new core path affects only bound session completion routing where enabled
## Not in iteration 1
Explicitly deferred:
- ACP binding targets (`targetKind: "acp"`)
- new channel adapters beyond Discord
- global replacement of all delivery paths (`spawn_ack`, future `subagent_message`)
- protocol level changes
- store migration/versioning redesign for all binding persistence
Notes on ACP:
- interface design keeps room for ACP
- ACP implementation is not started in this iteration
## Routing invariants
These invariants are mandatory for iteration 1.
- destination selection and content generation are separate steps
- if session mode completion resolves to an active bound destination, delivery must target that destination
- no hidden reroute from bound destination to main channel
- fallback behavior must be explicit and observable
## Compatibility and rollout
Compatibility target:
- no regression for users with thread bound spawning off
- no change to non-Discord channels in this iteration
Rollout:
1. Land interfaces and router behind current feature gates.
2. Route Discord completion mode bound deliveries through router.
3. Keep legacy path for non-bound flows.
4. Verify with targeted tests and canary runtime logs.
## Tests required in iteration 1
Unit and integration coverage required:
- manager token rotation uses latest token after manager reuse
- webhook sends update channel activity timestamps
- two active bound sessions in same requester channel do not duplicate to main channel
- completion for bound session mode run resolves to thread destination only
- disabled spawn flag keeps legacy behavior unchanged
## Proposed implementation files
Core:
- `src/infra/outbound/session-binding-service.ts` (new)
- `src/infra/outbound/bound-delivery-router.ts` (new)
- `src/agents/subagent-announce.ts` (completion destination resolution integration)
Discord adapter and runtime:
- `src/discord/monitor/thread-bindings.manager.ts`
- `src/discord/monitor/reply-delivery.ts`
- `src/discord/send.outbound.ts`
Tests:
- `src/discord/monitor/provider*.test.ts`
- `src/discord/monitor/reply-delivery.test.ts`
- `src/agents/subagent-announce.format.test.ts`
## Done criteria for iteration 1
- core interfaces exist and are wired for completion routing
- correctness fixes above are merged with tests
- no main and thread duplicate completion delivery in session mode bound runs
- no behavior change for disabled bound spawn deployments
- ACP remains explicitly deferred