chore(plans): add input-history and model-selection plans

plans/input-history.md:
- Implementation plan for shell-style up/down arrow message history in
  SimpleInput, deriving history from session log conversation data
- Covers prop threading, history derivation, navigation state,
  keybinding details, modal parity, and test cases

plans/model-selection.md:
- Three-phase plan for model visibility and control: display current
  model, model picker at spawn, mid-session model switching via Zellij

plans/PLAN-tool-result-display.md:
- Updates to tool result display plan (pre-existing changes)

plans/subagent-visibility.md:
- Updates to subagent visibility plan (pre-existing changes)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 14:51:28 -05:00
parent abbede923d
commit c5b1fb3a80
4 changed files with 765 additions and 89 deletions
--- a/plans/PLAN-tool-result-display.md
+++ b/plans/PLAN-tool-result-display.md
@@ -20,6 +20,8 @@ Add the ability to view tool call results (diffs, bash output, file contents) di
 - Copy-to-clipboard functionality
 - Virtual scrolling / performance optimization
 - Editor integration (clicking paths to open files)
+- Accessibility (keyboard navigation, focus management, ARIA labels — deferred to v2)
+- Lazy-fetch API for tool results (consider for v2 if payload size becomes an issue)

 ---

@@ -61,44 +63,46 @@ Add the ability to view tool call results (diffs, bash output, file contents) di
 - **AC-1:** Tool calls render as expandable elements showing tool name and summary
 - **AC-2:** Clicking a collapsed tool call expands to show its result
 - **AC-3:** Clicking an expanded tool call collapses it
- **AC-4:** Tool results in the most recent assistant message are expanded by default
- **AC-5:** When a new assistant message arrives, previous tool results collapse
- **AC-6:** Edit and Write tool diffs remain expanded regardless of message age
- **AC-7:** Tool calls without results display as non-expandable with muted styling
+- **AC-4:** In active sessions, tool results in the most recent assistant message are expanded by default
+- **AC-5:** When a new assistant message arrives, previous non-diff tool results collapse unless the user has manually toggled them in that message
+- **AC-6:** Edit and Write results remain expanded regardless of message age or session status (even if Write only has confirmation text)
+- **AC-7:** In completed sessions, all non-diff tool results start collapsed
+- **AC-8:** Tool calls without results display as non-expandable with muted styling; in active sessions, pending tool calls show a spinner to distinguish in-progress from permanently missing

 ### Diff Rendering

- **AC-8:** Edit/Write results display structuredPatch data as syntax-highlighted diff
- **AC-9:** Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
- **AC-10:** Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
- **AC-11:** Full file path displays above each diff block
- **AC-12:** Diff context lines use structuredPatch as-is (no recomputation)
+- **AC-9:** Edit/Write results display structuredPatch data as syntax-highlighted diff; falls back to raw content text if structuredPatch is malformed or absent
+- **AC-10:** Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
+- **AC-11:** Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
+- **AC-12:** Full file path displays above each diff block
+- **AC-13:** Diff context lines use structuredPatch as-is (no recomputation)

 ### Other Tool Types

- **AC-13:** Bash results display stdout in monospace, stderr separately if present
- **AC-14:** Read results display file content with syntax highlighting based on file extension
- **AC-15:** Grep/Glob results display file list with match counts
- **AC-16:** WebFetch results display URL and response summary
+- **AC-14:** Bash results display stdout in monospace, stderr separately if present
+- **AC-15:** Bash output with ANSI escape codes renders as colored HTML (via ansi_up)
+- **AC-16:** Read results display file content with syntax highlighting based on file extension
+- **AC-17:** Grep/Glob results display file list with match counts
+- **AC-18:** Unknown tools (WebFetch, Task, etc.) use GenericResult fallback showing raw content

 ### Truncation

- **AC-17:** Long outputs truncate at thresholds matching Claude Code behavior
- **AC-18:** Truncated outputs show "Show full output (N lines)" link
- **AC-19:** Clicking "Show full output" opens a dedicated lightweight modal
- **AC-20:** Modal displays full content with syntax highlighting, scrollable
+- **AC-19:** Long outputs truncate at configurable line/character thresholds (defaults tuned to approximate Claude Code behavior)
+- **AC-20:** Truncated outputs show "Show full output (N lines)" link
+- **AC-21:** Clicking "Show full output" opens a dedicated lightweight modal
+- **AC-22:** Modal displays full content with syntax highlighting, scrollable

 ### Error States

- **AC-21:** Failed tool calls display with red-tinted background
- **AC-22:** Error content (stderr, error messages) is clearly distinguishable from success content
- **AC-23:** is_error flag from tool_result determines error state
+- **AC-23:** Failed tool calls display with red-tinted background
+- **AC-24:** Error content (stderr, error messages) is clearly distinguishable from success content
+- **AC-25:** is_error flag from tool_result determines error state

 ### API Contract

- **AC-24:** /api/conversation response includes tool results nested in tool_calls
- **AC-25:** Each tool_call has: name, id, input, result (when available)
- **AC-26:** Result structure varies by tool type (documented in IMP-SERVER)
+- **AC-26:** /api/conversation response includes tool results nested in tool_calls
+- **AC-27:** Each tool_call has: name, id, input, result (when available)
+- **AC-28:** All tool results conform to a normalized envelope: `{ kind, status, is_error, content }` with tool-specific fields nested in `content`

 ---

@@ -130,6 +134,23 @@ Full output can be thousands of lines. Inline expansion would:

 A modal provides a focused reading experience without disrupting conversation layout.

+### Why a Normalized Result Contract
+
+Raw `toolUseResult` shapes vary wildly by tool type — Edit has `structuredPatch`, Bash has `stdout`/`stderr`, Glob has `filenames`. Passing these raw to the frontend means every renderer must know the exact JSONL format, and adding Codex support (v2) would require duplicating all that branching.
+
+Instead, the server normalizes each result into a stable envelope:
+
+```python
+{
+    "kind": "diff" | "bash" | "file_content" | "file_list" | "generic",
+    "status": "success" | "error" | "pending",
+    "is_error": bool,
+    "content": { ... }  # tool-specific fields, documented per kind
+}
+```
+
+The frontend switches on `kind` (5 cases) rather than tool name (unbounded). This also gives us a clean seam for the `result_mode` query parameter if payload size becomes an issue later.
+
 ### Component Structure

 ```
@@ -157,7 +178,7 @@ FullOutputModal (new, top-level)

 ### IMP-SERVER: Parse and Attach Tool Results

-**Fulfills:** AC-24, AC-25, AC-26
+**Fulfills:** AC-26, AC-27, AC-28

 **Location:** `amc_server/mixins/conversation.py`

@@ -167,38 +188,43 @@ Two-pass parsing:
 1. First pass: Scan all entries, build map of `tool_use_id` → `toolUseResult`
 2. Second pass: Parse messages as before, but when encountering `tool_use`, lookup and attach result
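
Review note: the two passes described above could look roughly like the sketch below. It is not the code in `amc_server/mixins/conversation.py`. The function name `attach_tool_results` is hypothetical, and the JSONL entry shape (a `toolUseResult` key on the entry that carries the matching `tool_result` block, and `tool_use` blocks inside assistant `message.content`) is an assumption about the session log format, not something this diff specifies.

```python
import json


def attach_tool_results(jsonl_lines):
    """Two-pass sketch: index results by tool_use_id, then attach them to tool calls."""
    entries = [json.loads(line) for line in jsonl_lines if line.strip()]

    # Pass 1: build the tool_use_id -> toolUseResult map.
    # Assumption: the raw result sits on the same entry as the tool_result block.
    results = {}
    for entry in entries:
        raw = entry.get("toolUseResult")
        if raw is None:
            continue
        for block in entry.get("message", {}).get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_result":
                results[block["tool_use_id"]] = raw

    # Pass 2: walk assistant messages and attach a result when one was indexed.
    tool_calls = []
    for entry in entries:
        if entry.get("type") != "assistant":
            continue
        for block in entry.get("message", {}).get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_use":
                call = {
                    "name": block["name"],
                    "id": block["id"],
                    "input": block.get("input", {}),
                }
                if block["id"] in results:
                    call["result"] = results[block["id"]]
                # A missing result is not an error: the call is returned without one.
                tool_calls.append(call)
    return tool_calls
```

The first pass is needed because a result is logged after the `tool_use` it answers, so a single forward scan would reach the tool call before its result has been seen.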

-**Tool call schema after change:**
+**API query parameter:** `/api/conversation?result_mode=full` (default). Future option: `result_mode=preview` to return truncated previews and reduce payload size without an API-breaking change.
+
+**Normalization step:** After looking up the raw `toolUseResult`, the server normalizes it into the stable envelope before attaching:
+
 ```python
 {
    "name": "Edit",
    "id": "toolu_abc123",
    "input": {"file_path": "...", "old_string": "...", "new_string": "..."},
    "result": {
-        "content": "The file has been updated successfully.",
+        "kind": "diff",
+        "status": "success",
        "is_error": False,
-        "structuredPatch": [...],
-        "filePath": "...",
-        # ... other fields from toolUseResult
+        "content": {
+            "structuredPatch": [...],
+            "filePath": "...",
+            "text": "The file has been updated successfully."
+        }
    }
 }
 ```

-**Result Structure by Tool Type:**
+**Normalized `kind` mapping:**

-| Tool | Result Fields |
-|------|---------------|
-| Edit | `structuredPatch`, `filePath`, `oldString`, `newString` |
-| Write | `filePath`, content confirmation |
-| Read | `file`, `type`, content in `content` field |
-| Bash | `stdout`, `stderr`, `interrupted` |
-| Glob | `filenames`, `numFiles`, `truncated` |
-| Grep | `content`, `filenames`, `numFiles`, `numLines` |
+| kind | Source Tools | `content` Fields |
+|------|-------------|-----------------|
+| `diff` | Edit, Write | `structuredPatch`, `filePath`, `text` |
+| `bash` | Bash | `stdout`, `stderr`, `interrupted` |
+| `file_content` | Read | `file`, `type`, `text` |
+| `file_list` | Glob, Grep | `filenames`, `numFiles`, `truncated`, `numLines` |
+| `generic` | All others | `text` (raw content string) |

 ---
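
Review note: one way to read the `kind` mapping table above is as a pair of lookup tables plus a small function, sketched below. The names `KIND_BY_TOOL`, `CONTENT_FIELDS`, and `normalize_tool_result` are hypothetical, and passing the tool_result text in as a separate `text` argument is an assumption; the plan only says that `text` ends up inside `content`. The `pending` status is left out here because the plan returns tool calls without results as-is.

```python
# Tool name -> normalized kind; anything not listed falls back to "generic".
KIND_BY_TOOL = {
    "Edit": "diff",
    "Write": "diff",
    "Bash": "bash",
    "Read": "file_content",
    "Glob": "file_list",
    "Grep": "file_list",
}

# Raw toolUseResult fields copied into `content`, per kind (from the table above).
CONTENT_FIELDS = {
    "diff": ("structuredPatch", "filePath"),
    "bash": ("stdout", "stderr", "interrupted"),
    "file_content": ("file", "type"),
    "file_list": ("filenames", "numFiles", "truncated", "numLines"),
    "generic": (),
}

# Kinds whose `content` carries a `text` field according to the table.
TEXT_KINDS = {"diff", "file_content", "generic"}


def normalize_tool_result(tool_name, raw, text="", is_error=False):
    """Map a raw toolUseResult onto the { kind, status, is_error, content } envelope."""
    kind = KIND_BY_TOOL.get(tool_name, "generic")
    if not isinstance(raw, dict):
        raw = {}  # assumption: non-dict results contribute only their text
    # Copy only the documented fields; extras such as oldString are dropped.
    content = {key: raw[key] for key in CONTENT_FIELDS[kind] if key in raw}
    if kind in TEXT_KINDS:
        content["text"] = text
    return {
        "kind": kind,
        "status": "error" if is_error else "success",
        "is_error": is_error,
        "content": content,
    }
```

Keeping the mapping in data rather than in branches means a later Codex normalizer would only need its own tool-name table, which matches the plan's goal of leaving the renderers untouched.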

 ### IMP-TOOLCALL: Expandable Tool Call Component

-**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7
+**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8

 **Location:** `dashboard/lib/markdown.js` (refactor `renderToolCalls`)

@@ -213,16 +239,21 @@ Renders a single tool call with:

 **State Management:**

-Track expanded state per message. When new assistant message arrives:
+Track two sets per message: `autoExpanded` (system-controlled) and `userToggled` (manual clicks).
+
+When new assistant message arrives:
 - Compare latest assistant message ID to stored ID
- If different, reset expanded set to empty
+- If different, reset `autoExpanded` to empty for previous messages
+- `userToggled` entries are never reset — user intent is preserved
 - Edit/Write tools bypass this logic (always expanded via CSS/logic)

+Expand/collapse logic: a tool call is expanded if it is in `userToggled` (explicit click) OR in `autoExpanded` (latest message) OR is Edit/Write kind.
+
 ---

 ### IMP-DIFF: Diff Rendering Component

-**Fulfills:** AC-8, AC-9, AC-10, AC-11, AC-12
+**Fulfills:** AC-9, AC-10, AC-11, AC-12, AC-13

 **Location:** `dashboard/lib/markdown.js` (new function `renderDiff`)

@@ -234,12 +265,13 @@ hljs.registerLanguage('diff', langDiff);

 **Diff Renderer:**

-1. Convert `structuredPatch` array to unified diff text:
+1. If `structuredPatch` is present and valid, convert to unified diff text:
   - Each hunk: `@@ -oldStart,oldLines +newStart,newLines @@`
   - Followed by hunk.lines array
-2. Syntax highlight with hljs diff language
-3. Sanitize with DOMPurify before rendering
-4. Wrap in container with file path header
+2. If `structuredPatch` is missing or malformed, fall back to raw `content.text` in a monospace block
+3. Syntax highlight with hljs diff language
+4. Sanitize with DOMPurify before rendering
+5. Wrap in container with file path header

 **CSS styling:**
 - Container: dark border, rounded corners
@@ -252,22 +284,33 @@ hljs.registerLanguage('diff', langDiff);

 ### IMP-BASH: Bash Output Component

-**Fulfills:** AC-13, AC-21, AC-22
+**Fulfills:** AC-14, AC-15, AC-23, AC-24

 **Location:** `dashboard/lib/markdown.js` (new function `renderBashResult`)

-Renders:
- `stdout` in monospace pre block
+**ANSI-to-HTML conversion:**
+```javascript
+import { AnsiUp } from 'https://esm.sh/ansi_up';
+const ansi = new AnsiUp();
+const html = ansi.ansi_to_html(bashOutput);
+```
+
+The `ansi_up` library (zero dependencies, ~8KB) converts ANSI escape codes to styled HTML spans, preserving colored test output, progress indicators, and error highlighting from CLI tools.
+
+**Renders:**
+- `stdout` in monospace pre block with ANSI colors preserved
 - `stderr` in separate block with error styling (if present)
 - "Command interrupted" notice (if interrupted flag)

+**Sanitization order (CRITICAL):** First convert ANSI to HTML via ansi_up, THEN sanitize with DOMPurify. Sanitizing before conversion would strip escape codes; sanitizing after preserves the styled spans while preventing XSS.
+
 Error state: `is_error` or presence of stderr triggers error styling (red tint, left border).

 ---

 ### IMP-TRUNCATE: Output Truncation

-**Fulfills:** AC-17, AC-18
+**Fulfills:** AC-19, AC-20

 **Truncation Thresholds (match Claude Code):**

@@ -289,7 +332,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r

 ### IMP-MODAL: Full Output Modal

-**Fulfills:** AC-19, AC-20
+**Fulfills:** AC-21, AC-22

 **Location:** `dashboard/components/FullOutputModal.js` (new file)

@@ -305,7 +348,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r

 ### IMP-ERROR: Error State Styling

-**Fulfills:** AC-21, AC-22, AC-23
+**Fulfills:** AC-23, AC-24, AC-25

 **Styling:**
 - Tool call header: red-tinted background when `result.is_error`
@@ -331,17 +374,19 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r

 ---

-### Slice 2: Server-Side Tool Result Parsing
+### Slice 2: Server-Side Tool Result Parsing and Normalization

-**Goal:** API returns tool results nested in tool_calls
+**Goal:** API returns normalized tool results nested in tool_calls

 **Deliverables:**
 1. Two-pass parsing in `_parse_claude_conversation`
-2. Tool results attached with `id` field
-3. Unit tests for result attachment
-4. Handle missing results gracefully (return tool_call without result)
+2. Normalization layer: raw `toolUseResult` → `{ kind, status, is_error, content }` envelope
+3. Tool results attached with `id` field
+4. Unit tests for result attachment and normalization per tool type
+5. Handle missing results gracefully (return tool_call without result)
+6. Support `result_mode=full` query parameter (only mode for now, but wired up for future `preview`)

-**Exit Criteria:** AC-24, AC-25, AC-26 pass
+**Exit Criteria:** AC-26, AC-27, AC-28 pass

 ---

@@ -356,7 +401,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
 4. Collapse on new assistant message
 5. Keep Edit/Write always expanded

-**Exit Criteria:** AC-1 through AC-7 pass
+**Exit Criteria:** AC-1 through AC-8 pass

 ---

@@ -370,7 +415,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
 3. VS Code dark theme styling
 4. Full file path header

-**Exit Criteria:** AC-8 through AC-12 pass
+**Exit Criteria:** AC-9 through AC-13 pass

 ---

@@ -379,12 +424,13 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
 **Goal:** Bash, Read, Glob, Grep render appropriately

 **Deliverables:**
-1. `renderBashResult` with stdout/stderr separation
-2. `renderFileContent` for Read
-3. `renderFileList` for Glob/Grep
-4. Generic fallback for unknown tools
+1. Import and configure `ansi_up` for ANSI-to-HTML conversion
+2. `renderBashResult` with stdout/stderr separation and ANSI color preservation
+3. `renderFileContent` for Read
+4. `renderFileList` for Glob/Grep
+5. `GenericResult` fallback for unknown tools (WebFetch, Task, etc.)

-**Exit Criteria:** AC-13 through AC-16 pass
+**Exit Criteria:** AC-14 through AC-18 pass

 ---

@@ -398,7 +444,7 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
 3. `FullOutputModal` component
 4. Syntax highlighting in modal

-**Exit Criteria:** AC-17 through AC-20 pass
+**Exit Criteria:** AC-19 through AC-22 pass

 ---

@@ -412,15 +458,16 @@ Takes content string, returns `{ text, truncated, totalLines }`. If truncated, r
 3. Test with interrupted sessions
 4. Cross-browser testing

-**Exit Criteria:** AC-21 through AC-23 pass, feature complete
+**Exit Criteria:** AC-23 through AC-25 pass, feature complete

 ---

 ## Open Questions

-1. **Exact Claude Code truncation thresholds** — need to verify against Claude Code source or experiment
+1. ~~**Exact Claude Code truncation thresholds**~~ — **Resolved:** using reasonable defaults with a note to tune via testing. AC-19 updated.
 2. **Performance with 100+ tool calls** — monitor after ship, optimize if needed
-3. **Codex support timeline** — when should we prioritize v2?
+3. **Codex support timeline** — when should we prioritize v2? The normalized `kind` contract makes this easier: add Codex normalizers without touching renderers.
+4. ~~**Lazy-fetch for large payloads**~~ — **Resolved:** `result_mode` query parameter wired into API contract. Only `full` implemented in v1; `preview` deferred.

 ---