chore(plans): add input-history and model-selection plans

plans/input-history.md:
- Implementation plan for shell-style up/down arrow message history
  in SimpleInput, deriving history from session log conversation data
- Covers prop threading, history derivation, navigation state,
  keybinding details, modal parity, and test cases

plans/model-selection.md:
- Three-phase plan for model visibility and control: display current
  model, model picker at spawn, mid-session model switching via Zellij

plans/PLAN-tool-result-display.md:
- Updates to tool result display plan (pre-existing changes)

plans/subagent-visibility.md:
- Updates to subagent visibility plan (pre-existing changes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
teernisse
2026-03-06 14:51:28 -05:00
parent abbede923d
commit c5b1fb3a80
4 changed files with 765 additions and 89 deletions


@@ -20,6 +20,8 @@ Add the ability to view tool call results (diffs, bash output, file contents) di
- Copy-to-clipboard functionality
- Virtual scrolling / performance optimization
- Editor integration (clicking paths to open files)
- Accessibility (keyboard navigation, focus management, ARIA labels — deferred to v2)
- Lazy-fetch API for tool results (consider for v2 if payload size becomes an issue)
---
@@ -61,44 +63,46 @@ Add the ability to view tool call results (diffs, bash output, file contents) di
- **AC-1:** Tool calls render as expandable elements showing tool name and summary
- **AC-2:** Clicking a collapsed tool call expands to show its result
- **AC-3:** Clicking an expanded tool call collapses it
- **AC-4:** Tool results in the most recent assistant message are expanded by default
- **AC-5:** When a new assistant message arrives, previous tool results collapse
- **AC-6:** Edit and Write tool diffs remain expanded regardless of message age
- **AC-7:** Tool calls without results display as non-expandable with muted styling
- **AC-4:** In active sessions, tool results in the most recent assistant message are expanded by default
- **AC-5:** When a new assistant message arrives, previous non-diff tool results collapse unless the user has manually toggled them in that message
- **AC-6:** Edit and Write results remain expanded regardless of message age or session status (even if Write only has confirmation text)
- **AC-7:** In completed sessions, all non-diff tool results start collapsed
- **AC-8:** Tool calls without results display as non-expandable with muted styling; in active sessions, pending tool calls show a spinner to distinguish in-progress from permanently missing
### Diff Rendering
- **AC-8:** Edit/Write results display structuredPatch data as syntax-highlighted diff
- **AC-9:** Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
- **AC-10:** Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
- **AC-11:** Full file path displays above each diff block
- **AC-12:** Diff context lines use structuredPatch as-is (no recomputation)
- **AC-9:** Edit/Write results display structuredPatch data as syntax-highlighted diff; falls back to raw content text if structuredPatch is malformed or absent
- **AC-10:** Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
- **AC-11:** Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
- **AC-12:** Full file path displays above each diff block
- **AC-13:** Diff context lines use structuredPatch as-is (no recomputation)
### Other Tool Types
- **AC-13:** Bash results display stdout in monospace, stderr separately if present
- **AC-14:** Read results display file content with syntax highlighting based on file extension
- **AC-15:** Grep/Glob results display file list with match counts
- **AC-16:** WebFetch results display URL and response summary
- **AC-14:** Bash results display stdout in monospace, stderr separately if present
- **AC-15:** Bash output with ANSI escape codes renders as colored HTML (via ansi_up)
- **AC-16:** Read results display file content with syntax highlighting based on file extension
- **AC-17:** Grep/Glob results display file list with match counts
- **AC-18:** Unknown tools (WebFetch, Task, etc.) use GenericResult fallback showing raw content
### Truncation
- **AC-17:** Long outputs truncate at thresholds matching Claude Code behavior
- **AC-18:** Truncated outputs show "Show full output (N lines)" link
- **AC-19:** Clicking "Show full output" opens a dedicated lightweight modal
- **AC-20:** Modal displays full content with syntax highlighting, scrollable
- **AC-19:** Long outputs truncate at configurable line/character thresholds (defaults tuned to approximate Claude Code behavior)
- **AC-20:** Truncated outputs show "Show full output (N lines)" link
- **AC-21:** Clicking "Show full output" opens a dedicated lightweight modal
- **AC-22:** Modal displays full content with syntax highlighting, scrollable
### Error States
- **AC-21:** Failed tool calls display with red-tinted background
- **AC-22:** Error content (stderr, error messages) is clearly distinguishable from success content
- **AC-23:** is_error flag from tool_result determines error state
- **AC-23:** Failed tool calls display with red-tinted background
- **AC-24:** Error content (stderr, error messages) is clearly distinguishable from success content
- **AC-25:** is_error flag from tool_result determines error state
### API Contract
- **AC-24:** /api/conversation response includes tool results nested in tool_calls
- **AC-25:** Each tool_call has: name, id, input, result (when available)
- **AC-26:** Result structure varies by tool type (documented in IMP-SERVER)
- **AC-26:** /api/conversation response includes tool results nested in tool_calls
- **AC-27:** Each tool_call has: name, id, input, result (when available)
- **AC-28:** All tool results conform to a normalized envelope: `{ kind, status, content, is_error }` with tool-specific fields nested in `content`
---
@@ -130,6 +134,23 @@ Full output can be thousands of lines. Inline expansion would:
A modal provides a focused reading experience without disrupting conversation layout.
### Why a Normalized Result Contract
Raw `toolUseResult` shapes vary wildly by tool type — Edit has `structuredPatch`, Bash has `stdout`/`stderr`, Glob has `filenames`. Passing these raw to the frontend means every renderer must know the exact JSONL format, and adding Codex support (v2) would require duplicating all that branching.
Instead, the server normalizes each result into a stable envelope:
```python
{
"kind": "diff" | "bash" | "file_content" | "file_list" | "generic",
"status": "success" | "error" | "pending",
"is_error": bool,
"content": { ... } # tool-specific fields, documented per kind
}
```
The frontend switches on `kind` (5 cases) rather than tool name (unbounded). This also gives us a clean seam for the `result_mode` query parameter if payload size becomes an issue later.
### Component Structure
```
@@ -157,7 +178,7 @@ FullOutputModal (new, top-level)
### IMP-SERVER: Parse and Attach Tool Results
**Fulfills:** AC-24, AC-25, AC-26
**Fulfills:** AC-26, AC-27, AC-28
**Location:** `amc_server/mixins/conversation.py`
@@ -167,38 +188,43 @@ Two-pass parsing:
1. First pass: Scan all entries, build map of `tool_use_id` → `toolUseResult`
2. Second pass: Parse messages as before, but when encountering `tool_use`, lookup and attach result
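A minimal sketch of the two-pass approach, under assumptions the plan does not confirm: each JSONL entry is taken to be a dict with `message.content` holding a list of blocks, `tool_result` blocks carrying `tool_use_id`, and the raw `toolUseResult` sitting at the top level of the same entry. The names `attach_tool_results` and `_content_blocks` are hypothetical and are not the real `_parse_claude_conversation`.

```python
from typing import Any, Iterable


def _content_blocks(entry: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the dict-shaped content blocks of an entry (assumed layout)."""
    content = entry.get("message", {}).get("content", [])
    if not isinstance(content, list):
        return []  # plain-string content has no tool blocks
    return [block for block in content if isinstance(block, dict)]


def attach_tool_results(entries: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Index results by tool_use_id, then attach them to matching tool calls."""
    entries = list(entries)

    # Pass 1: build the tool_use_id -> raw toolUseResult map.
    results: dict[str, Any] = {}
    for entry in entries:
        raw = entry.get("toolUseResult")
        if raw is None:
            continue
        for block in _content_blocks(entry):
            if block.get("type") == "tool_result":
                results[block["tool_use_id"]] = raw

    # Pass 2: collect tool calls, attaching a result only when one exists.
    tool_calls = []
    for entry in entries:
        for block in _content_blocks(entry):
            if block.get("type") != "tool_use":
                continue
            call = {
                "name": block["name"],
                "id": block["id"],
                "input": block.get("input", {}),
            }
            if block["id"] in results:
                call["result"] = results[block["id"]]
            tool_calls.append(call)
    return tool_calls
```

A tool call whose id never appears in the map is returned without a `result` key, which matches the "handle missing results gracefully" deliverable.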
**Tool call schema after change:**
**API query parameter:** `/api/conversation?result_mode=full` (default). Future option: `result_mode=preview` to return truncated previews and reduce payload size without an API-breaking change.
**Normalization step:** After looking up the raw `toolUseResult`, the server normalizes it into the stable envelope before attaching:
```python
{
"name": "Edit",
"id": "toolu_abc123",
"input": {"file_path": "...", "old_string": "...", "new_string": "..."},
"result": {
"content": "The file has been updated successfully.",
"kind": "diff",
"status": "success",
"is_error": False,
"structuredPatch": [...],
"filePath": "...",
# ... other fields from toolUseResult
"content": {
"structuredPatch": [...],
"filePath": "...",
"text": "The file has been updated successfully."
}
}
}
```
**Result Structure by Tool Type:**
**Normalized `kind` mapping:**
| Tool | Result Fields |
|------|---------------|
| Edit | `structuredPatch`, `filePath`, `oldString`, `newString` |
| Write | `filePath`, content confirmation |
| Read | `file`, `type`, content in `content` field |
| Bash | `stdout`, `stderr`, `interrupted` |
| Glob | `filenames`, `numFiles`, `truncated` |
| Grep | `content`, `filenames`, `numFiles`, `numLines` |
| kind | Source Tools | `content` Fields |
|------|-------------|-----------------|
| `diff` | Edit, Write | `structuredPatch`, `filePath`, `text` |
| `bash` | Bash | `stdout`, `stderr`, `interrupted` |
| `file_content` | Read | `file`, `type`, `text` |
| `file_list` | Glob, Grep | `filenames`, `numFiles`, `truncated`, `numLines` |
| `generic` | All others | `text` (raw content string) |
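The mapping table can be sketched as a small normalizer. This is an illustration under stated assumptions, not the planned implementation: `normalize_tool_result`, `KIND_BY_TOOL`, `FIELDS_BY_KIND`, and the separate `text` argument (the confirmation or raw content string) are hypothetical names. The `pending` status is left out because the plan returns tool calls without a result when none is found.

```python
from typing import Any, Optional

# Tool name -> kind, following the mapping table; anything else is "generic".
KIND_BY_TOOL = {
    "Edit": "diff",
    "Write": "diff",
    "Bash": "bash",
    "Read": "file_content",
    "Glob": "file_list",
    "Grep": "file_list",
}

# Raw toolUseResult fields copied into `content` for each kind.
FIELDS_BY_KIND = {
    "diff": ("structuredPatch", "filePath"),
    "bash": ("stdout", "stderr", "interrupted"),
    "file_content": ("file", "type"),
    "file_list": ("filenames", "numFiles", "truncated", "numLines"),
    "generic": (),
}

# Kinds whose `content` also carries a `text` field in the table.
TEXT_KINDS = {"diff", "file_content", "generic"}


def normalize_tool_result(
    tool_name: str,
    raw: dict[str, Any],
    text: Optional[str] = None,
    is_error: bool = False,
) -> dict[str, Any]:
    """Wrap a raw result in the { kind, status, is_error, content } envelope."""
    kind = KIND_BY_TOOL.get(tool_name, "generic")
    # Copy only the documented fields; unknown raw fields are dropped.
    content = {key: raw[key] for key in FIELDS_BY_KIND[kind] if key in raw}
    if kind in TEXT_KINDS and text is not None:
        content["text"] = text
    return {
        "kind": kind,
        "status": "error" if is_error else "success",
        "is_error": is_error,
        "content": content,
    }
```

Keeping the two lookup tables as data means a future Codex normalizer could reuse the envelope construction and supply only its own mapping.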
---
### IMP-TOOLCALL: Expandable Tool Call Component
**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7
**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8
**Location:** `dashboard/lib/markdown.js` (refactor `renderToolCalls`)
@@ -213,16 +239,21 @@ Renders a single tool call with:
**State Management:**
Track expanded state per message. When new assistant message arrives:
Track two sets per message: `autoExpanded` (system-controlled) and `userToggled` (manual clicks).
When new assistant message arrives:
- Compare latest assistant message ID to stored ID
- If different, reset expanded set to empty
- If different, reset `autoExpanded` to empty for previous messages
- `userToggled` entries are never reset — user intent is preserved
- Edit/Write tools bypass this logic (always expanded via CSS/logic)
Expand/collapse logic: a tool call is expanded if it is in `userToggled` (explicit click) OR in `autoExpanded` (latest message) OR is Edit/Write kind.
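The rule above can be sketched as a small state holder. It is written in Python only to stay consistent with the server examples; the real component would live in the dashboard JavaScript. `ExpansionState` and its methods are hypothetical names, the sets are keyed by tool call id rather than tracked per message, and Edit/Write is assumed to correspond to `kind == "diff"`.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class ExpansionState:
    latest_message_id: Optional[str] = None
    auto_expanded: set = field(default_factory=set)  # system-controlled
    user_toggled: set = field(default_factory=set)   # manual clicks, never reset

    def on_assistant_message(self, message_id: str, tool_call_ids: Iterable[str]) -> None:
        """On a new assistant message, auto-expand only that message's tool calls."""
        if message_id != self.latest_message_id:
            self.latest_message_id = message_id
            # Replacing the set drops auto-expansion for previous messages.
            self.auto_expanded = set(tool_call_ids)

    def is_expanded(self, call_id: str, kind: str) -> bool:
        """Expanded if user-toggled, auto-expanded, or an Edit/Write (diff) result."""
        return (
            call_id in self.user_toggled
            or call_id in self.auto_expanded
            or kind == "diff"
        )
```

Because `user_toggled` is never cleared, a result the user opened in an older message stays open after newer messages arrive.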
---
### IMP-DIFF: Diff Rendering Component
**Fulfills:** AC-8, AC-9, AC-10, AC-11, AC-12
**Fulfills:** AC-9, AC-10, AC-11, AC-12, AC-13
**Location:** `dashboard/lib/markdown.js` (new function `renderDiff`)
@@ -234,12 +265,13 @@ hljs.registerLanguage('diff', langDiff);
**Diff Renderer:**
1. Convert `structuredPatch` array to unified diff text:
1. If `structuredPatch` is present and valid, convert to unified diff text:
- Each hunk: `@@ -oldStart,oldLines +newStart,newLines @@`
- Followed by hunk.lines array
2. Syntax highlight with hljs diff language
3. Sanitize with DOMPurify before rendering
4. Wrap in container with file path header
2. If `structuredPatch` is missing or malformed, fall back to raw `content.text` in a monospace block
3. Syntax highlight with hljs diff language
4. Sanitize with DOMPurify before rendering
5. Wrap in container with file path header
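Steps 1 and 2 (conversion with fallback) can be sketched as follows. The sketch is in Python for consistency with the server examples, while the actual `renderDiff` would be JavaScript; the helper names are hypothetical. It assumes each hunk is a mapping with `oldStart`, `oldLines`, `newStart`, `newLines`, and a `lines` list, and treats an empty patch as absent. Highlighting and sanitization are out of scope here.

```python
from typing import Any, Optional


def patch_to_unified_diff(structured_patch: Any) -> Optional[str]:
    """Return unified diff text, or None if the patch is missing or malformed."""
    if not isinstance(structured_patch, list) or not structured_patch:
        return None
    out = []
    for hunk in structured_patch:
        try:
            header = "@@ -{oldStart},{oldLines} +{newStart},{newLines} @@".format(**hunk)
            lines = hunk["lines"]
        except (KeyError, TypeError):
            return None  # missing field or non-mapping hunk
        if not isinstance(lines, list):
            return None
        out.append(header)
        out.extend(str(line) for line in lines)
    return "\n".join(out)


def diff_body(content: dict) -> tuple:
    """Return (text, is_diff): diff text if available, else raw content.text."""
    diff = patch_to_unified_diff(content.get("structuredPatch"))
    if diff is not None:
        return diff, True
    return content.get("text", ""), False
```

The `is_diff` flag lets the caller decide whether to apply diff highlighting or render a plain monospace block.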
**CSS styling:**
- Container: dark border, rounded corners
@@ -252,22 +284,33 @@ hljs.registerLanguage('diff', langDiff);
### IMP-BASH: Bash Output Component
**Fulfills:** AC-13, AC-21, AC-22
**Fulfills:** AC-14, AC-15, AC-23, AC-24
**Location:** `dashboard/lib/markdown.js` (new function `renderBashResult`)
Renders:
- `stdout` in monospace pre block
**ANSI-to-HTML conversion:**
```javascript
// ansi_up v6+ ships a named export (v5 used a default export); pin a version to be safe
import { AnsiUp } from 'https://esm.sh/ansi_up';
const ansi = new AnsiUp();
const html = ansi.ansi_to_html(bashOutput);
```
The `ansi_up` library (zero dependencies, ~8KB) converts ANSI escape codes to styled HTML spans, preserving colored test output, progress indicators, and error highlighting from CLI tools.
**Renders:**
- `stdout` in monospace pre block with ANSI colors preserved
- `stderr` in separate block with error styling (if present)
- "Command interrupted" notice (if interrupted flag)
**Sanitization order (CRITICAL):** First convert ANSI to HTML via ansi_up, THEN sanitize with DOMPurify. Sanitizing before conversion would strip escape codes; sanitizing after preserves the styled spans while preventing XSS.
Error state: `is_error` or presence of stderr triggers error styling (red tint, left border).
---
### IMP-TRUNCATE: Output Truncation
**Fulfills:** AC-17, AC-18
**Fulfills:** AC-19, AC-20
**Truncation Thresholds (match Claude Code):**
@@ -289,7 +332,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
### IMP-MODAL: Full Output Modal
**Fulfills:** AC-19, AC-20
**Fulfills:** AC-21, AC-22
**Location:** `dashboard/components/FullOutputModal.js` (new file)
@@ -305,7 +348,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
### IMP-ERROR: Error State Styling
**Fulfills:** AC-21, AC-22, AC-23
**Fulfills:** AC-23, AC-24, AC-25
**Styling:**
- Tool call header: red-tinted background when `result.is_error`
@@ -331,17 +374,19 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
---
### Slice 2: Server-Side Tool Result Parsing
### Slice 2: Server-Side Tool Result Parsing and Normalization
**Goal:** API returns tool results nested in tool_calls
**Goal:** API returns normalized tool results nested in tool_calls
**Deliverables:**
1. Two-pass parsing in `_parse_claude_conversation`
2. Tool results attached with `id` field
3. Unit tests for result attachment
4. Handle missing results gracefully (return tool_call without result)
2. Normalization layer: raw `toolUseResult` → `{ kind, status, is_error, content }` envelope
3. Tool results attached with `id` field
4. Unit tests for result attachment and normalization per tool type
5. Handle missing results gracefully (return tool_call without result)
6. Support `result_mode=full` query parameter (only mode for now, but wired up for future `preview`)
**Exit Criteria:** AC-24, AC-25, AC-26 pass
**Exit Criteria:** AC-26, AC-27, AC-28 pass
---
@@ -356,7 +401,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
4. Collapse on new assistant message
5. Keep Edit/Write always expanded
**Exit Criteria:** AC-1 through AC-7 pass
**Exit Criteria:** AC-1 through AC-8 pass
---
@@ -370,7 +415,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
3. VS Code dark theme styling
4. Full file path header
**Exit Criteria:** AC-8 through AC-12 pass
**Exit Criteria:** AC-9 through AC-13 pass
---
@@ -379,12 +424,13 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
**Goal:** Bash, Read, Glob, Grep render appropriately
**Deliverables:**
1. `renderBashResult` with stdout/stderr separation
2. `renderFileContent` for Read
3. `renderFileList` for Glob/Grep
4. Generic fallback for unknown tools
1. Import and configure `ansi_up` for ANSI-to-HTML conversion
2. `renderBashResult` with stdout/stderr separation and ANSI color preservation
3. `renderFileContent` for Read
4. `renderFileList` for Glob/Grep
5. `GenericResult` fallback for unknown tools (WebFetch, Task, etc.)
**Exit Criteria:** AC-13 through AC-16 pass
**Exit Criteria:** AC-14 through AC-18 pass
---
@@ -398,7 +444,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
3. `FullOutputModal` component
4. Syntax highlighting in modal
**Exit Criteria:** AC-17 through AC-20 pass
**Exit Criteria:** AC-19 through AC-22 pass
---
@@ -412,15 +458,16 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
3. Test with interrupted sessions
4. Cross-browser testing
**Exit Criteria:** AC-21 through AC-23 pass, feature complete
**Exit Criteria:** AC-23 through AC-25 pass, feature complete
---
## Open Questions
1. **Exact Claude Code truncation thresholds** — need to verify against Claude Code source or experiment
1. ~~**Exact Claude Code truncation thresholds**~~ → **Resolved:** using reasonable defaults with a note to tune via testing. AC-19 updated.
2. **Performance with 100+ tool calls** — monitor after ship, optimize if needed
3. **Codex support timeline** — when should we prioritize v2?
3. **Codex support timeline** — when should we prioritize v2? The normalized `kind` contract makes this easier: add Codex normalizers without touching renderers.
4. ~~**Lazy-fetch for large payloads**~~ → **Resolved:** `result_mode` query parameter wired into API contract. Only `full` implemented in v1; `preview` deferred.
---