chore(plans): update implementation plans

plans/PLAN-tool-result-display.md:
- Add comprehensive plan for displaying tool results inline in conversation view, including truncation strategies and expand/collapse UI patterns

plans/subagent-visibility.md:
- Mark completed phases and update remaining work items
- Reflects current state of subagent tracking implementation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 00:48:55 -05:00
parent 781e74cda2
commit fb9d4e5b9f
2 changed files with 485 additions and 1609 deletions
--- a/plans/PLAN-tool-result-display.md
+++ b/plans/PLAN-tool-result-display.md
@@ -0,0 +1,456 @@
+# Plan: Tool Result Display in AMC Dashboard
+
+> **Status:** Draft — awaiting review and mockup phase
+> **Author:** Claude + Taylor
+> **Created:** 2026-02-27
+
+## Summary
+
+Add the ability to view tool call results (diffs, bash output, file contents) directly in the AMC dashboard conversation view. Currently, users see that a tool was called but cannot see what it did. This feature brings Claude Code's result visibility to the multi-agent dashboard.
+
+### Goals
+
+1. **See code changes as they happen** — diffs from Edit/Write tools always visible
+2. **Debug agent behavior** — inspect Bash output, Read content, search results
+3. **Match Claude Code UX** — familiar expand/collapse behavior with latest results expanded
+
+### Non-Goals (v1)
+
+- Codex agent support (different JSONL format — deferred to v2)
+- Copy-to-clipboard functionality
+- Virtual scrolling / performance optimization
+- Editor integration (clicking paths to open files)
+
+---
+
+## User Workflows
+
+### Workflow 1: Watching an Active Session
+
+1. User opens a session card showing an active Claude agent
+2. Agent calls Edit tool to modify a file
+3. User immediately sees the diff expanded below the tool call pill
+4. Agent calls Bash to run tests
+5. User sees the Bash output expanded; the earlier Edit diff also stays expanded (diffs always stay open)
+6. Agent sends a text message explaining results
+7. Bash output collapses (a new assistant message arrived); the Edit diff stays expanded
+
+### Workflow 2: Reviewing a Completed Session
+
+1. User opens a completed session to review what the agent did
+2. All tool calls are collapsed by default (no "latest" assistant message)
+3. Exception: Edit/Write diffs are still expanded
+4. User clicks a Bash tool call to see what command ran and its output
+5. User clicks "Show full output" when output is truncated
+6. Lightweight modal opens with full scrollable content
+7. User closes modal and continues reviewing
+
+### Workflow 3: Debugging a Failed Tool Call
+
+1. Agent runs a Bash command that fails
+2. Tool result block shows with red-tinted background
+3. stderr content is visible, clearly marked as error
+4. User can see what went wrong without leaving the dashboard
+
+---
+
+## Acceptance Criteria
+
+### Display Behavior
+
+- **AC-1:** Tool calls render as expandable elements showing tool name and summary
+- **AC-2:** Clicking a collapsed tool call expands to show its result
+- **AC-3:** Clicking an expanded tool call collapses it
+- **AC-4:** Tool results in the most recent assistant message are expanded by default
+- **AC-5:** When a new assistant message arrives, previous tool results collapse
+- **AC-6:** Edit and Write tool diffs remain expanded regardless of message age
+- **AC-7:** Tool calls without results display as non-expandable with muted styling
+
+### Diff Rendering
+
+- **AC-8:** Edit/Write results display `structuredPatch` data as a syntax-highlighted diff
+- **AC-9:** Diff additions render with a VS Code dark theme green background (`rgba(46, 160, 67, 0.15)`)
+- **AC-10:** Diff deletions render with a VS Code dark theme red background (`rgba(248, 81, 73, 0.15)`)
+- **AC-11:** Full file path displays above each diff block
+- **AC-12:** Diff context lines come from `structuredPatch` as-is (no recomputation)
+
+### Other Tool Types
+
+- **AC-13:** Bash results display stdout in monospace, stderr separately if present
+- **AC-14:** Read results display file content with syntax highlighting based on file extension
+- **AC-15:** Grep/Glob results display file list with match counts
+- **AC-16:** WebFetch results display URL and response summary
+
+### Truncation
+
+- **AC-17:** Long outputs truncate at thresholds matching Claude Code behavior
+- **AC-18:** Truncated outputs show "Show full output (N lines)" link
+- **AC-19:** Clicking "Show full output" opens a dedicated lightweight modal
+- **AC-20:** Modal displays full content with syntax highlighting, scrollable
+
+### Error States
+
+- **AC-21:** Failed tool calls display with red-tinted background
+- **AC-22:** Error content (stderr, error messages) is clearly distinguishable from success content
+- **AC-23:** The `is_error` flag on the `tool_result` block determines the error state
+
+### API Contract
+
+- **AC-24:** The `/api/conversation` response includes tool results nested in `tool_calls`
+- **AC-25:** Each tool call has `name`, `id`, `input`, and `result` (when available)
+- **AC-26:** Result structure varies by tool type (documented in IMP-SERVER)
+
+---
+
+## Architecture
+
+### Why Two-Pass JSONL Parsing
+
+The Claude Code JSONL stores `tool_use` and `tool_result` as separate entries linked by `tool_use_id`. To nest results inside tool calls for the API response, the server must:
+
+1. First pass: build a map of `tool_use_id` → result (the `tool_result` block's `content` and `is_error`, merged with the entry's `toolUseResult`)
+2. Second pass: parse messages, attaching results to matching tool calls
+
+This costs one extra pass over the entries but keeps the API contract simple. Alternatives considered:
+- **Streaming/incremental:** More complex, doesn't help since we need full conversation anyway
+- **Client-side joining:** Shifts complexity to frontend, increases payload size
+
+### Why Render Everything, Not Virtual Scroll
+
+Sessions typically have 20-80 tool calls (the 55 sessions sampled in the appendix average about 38). Modern browsers handle hundreds of DOM elements efficiently. Virtual scrolling adds significant complexity (measuring, windowing, scroll position management) for marginal benefit.
+
+Decision: Ship simple, measure real-world performance, and optimize if render times exceed 100 ms.
+
+### Why Dedicated Modal Over Inline Expansion
+
+Full output can be thousands of lines. Inline expansion would:
+- Push other content out of view
+- Make scrolling confusing
+- Lose context of surrounding conversation
+
+A modal provides a focused reading experience without disrupting conversation layout.
+
+### Component Structure
+
+```
+MessageBubble
+├── Content (text)
+├── Thinking (existing)
+└── ToolCallList (new)
+    └── ToolCallItem (repeated)
+        ├── Header (pill: chevron, name, summary, status)
+        └── ResultContent (conditional)
+            ├── DiffResult (for Edit/Write)
+            ├── BashResult (for Bash)
+            ├── FileListResult (for Glob/Grep)
+            └── GenericResult (fallback)
+
+FullOutputModal (new, top-level)
+├── Header (tool name, file path)
+├── Content (full output, scrollable)
+└── CloseButton
+```
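A minimal sketch of how `ResultContent` might choose a renderer, written in Python for brevity (the dashboard itself is JavaScript). The function `pick_result_renderer` and the `RENDERERS` table are hypothetical; the renderer names come from the tree above. A tool call without a `result` gets no renderer, matching the muted, non-expandable state in AC-7. Read and WebFetch fall through to `GenericResult` here because the tree lists no dedicated renderer for them.

```python
from typing import Optional

# Hypothetical mapping from tool name to the renderer components in the tree.
RENDERERS = {
    "Edit": "DiffResult",
    "Write": "DiffResult",
    "Bash": "BashResult",
    "Glob": "FileListResult",
    "Grep": "FileListResult",
}


def pick_result_renderer(tool_call: dict) -> Optional[str]:
    """Return the renderer name for a tool call, or None if it has no result."""
    if tool_call.get("result") is None:
        # No result: the header renders muted and non-expandable (AC-7).
        return None
    # Unknown tools use the generic fallback.
    return RENDERERS.get(tool_call.get("name", ""), "GenericResult")


print(pick_result_renderer({"name": "Grep", "result": {"numFiles": 3}}))  # FileListResult
print(pick_result_renderer({"name": "WebFetch", "result": {}}))           # GenericResult
print(pick_result_renderer({"name": "Edit"}))                             # None
```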
+
+---
+
+## Implementation Specifications
+
+### IMP-SERVER: Parse and Attach Tool Results
+
+**Fulfills:** AC-24, AC-25, AC-26
+
+**Location:** `amc_server/mixins/conversation.py`
+
+**Changes to `_parse_claude_conversation`:**
+
+Two-pass parsing:
+1. First pass: scan all entries and build a map of `tool_use_id` → result (`tool_result` `content` and `is_error`, merged with `toolUseResult`)
+2. Second pass: parse messages as before, but when encountering a `tool_use` block, look up and attach its result
+
+**Tool call schema after change:**
+```python
+{
+    "name": "Edit",
+    "id": "toolu_abc123",
+    "input": {"file_path": "...", "old_string": "...", "new_string": "..."},
+    "result": {
+        "content": "The file has been updated successfully.",
+        "is_error": False,
+        "structuredPatch": [...],
+        "filePath": "...",
+        # ... other fields from toolUseResult
+    }
+}
+```
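A minimal sketch of the two-pass join, not the existing `_parse_claude_conversation` code. It assumes each JSONL line is one entry shaped like the examples in the appendix: `tool_use` blocks inside assistant `message.content`, `tool_result` blocks inside user `message.content`, and structured data in a sibling `toolUseResult` field. It also assumes one `tool_result` block per user entry whenever `toolUseResult` is present, and it returns only the tool calls rather than the full message list. The function name `attach_tool_results` and the merge order (block fields first, `toolUseResult` fields layered on top) are hypothetical choices.

```python
import json
from typing import Iterable


def attach_tool_results(lines: Iterable[str]) -> list:
    """Return tool calls from assistant entries, with results nested under "result"."""
    entries = [json.loads(line) for line in lines if line.strip()]

    # Pass 1: map tool_use_id -> result built from the tool_result block
    # plus the entry-level toolUseResult payload.
    results = {}
    for entry in entries:
        content = entry.get("message", {}).get("content", [])
        if entry.get("type") != "user" or not isinstance(content, list):
            continue  # plain-text user prompts carry string content
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_result":
                continue
            result = {
                "content": block.get("content"),
                "is_error": bool(block.get("is_error", False)),
            }
            extra = entry.get("toolUseResult")
            if isinstance(extra, dict):
                result.update(extra)  # toolUseResult fields win on key clashes
            results[block["tool_use_id"]] = result

    # Pass 2: collect tool_use blocks and attach results by id.
    tool_calls = []
    for entry in entries:
        content = entry.get("message", {}).get("content", [])
        if entry.get("type") != "assistant" or not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_use":
                continue
            call = {"name": block["name"], "id": block["id"], "input": block.get("input", {})}
            if block["id"] in results:
                call["result"] = results[block["id"]]  # left out when missing
            tool_calls.append(call)
    return tool_calls


sample = [
    '{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "toolu_abc", "name": "Bash", "input": {"command": "ls"}}]}}',
    '{"type": "user", "message": {"content": [{"type": "tool_result", "tool_use_id": "toolu_abc", "content": "a.txt"}]}, "toolUseResult": {"stdout": "a.txt", "stderr": "", "interrupted": false}}',
]
print(attach_tool_results(sample)[0]["result"]["stdout"])  # a.txt
```

Leaving `result` off entirely when no match exists keeps the "handle missing results gracefully" case from Slice 2 trivial for the client to detect.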
+
+**Result Structure by Tool Type:**
+
+| Tool | Result Fields |
+|------|---------------|
+| Edit | `structuredPatch`, `filePath`, `oldString`, `newString` |
+| Write | `filePath`, content confirmation |
+| Read | `file`, `type`, content in `content` field |
+| Bash | `stdout`, `stderr`, `interrupted` |
+| Glob | `filenames`, `numFiles`, `truncated` |
+| Grep | `content`, `filenames`, `numFiles`, `numLines` |
+
+---
+
+### IMP-TOOLCALL: Expandable Tool Call Component
+
+**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7
+
+**Location:** `dashboard/lib/markdown.js` (refactor `renderToolCalls`)
+
+**New function: `ToolCallItem`**
+
+Renders a single tool call with:
+- Chevron for expand/collapse (when result exists and not Edit/Write)
+- Tool name (bold, colored)
+- Summary (from existing `getToolSummary`)
+- Status icon (checkmark or X)
+- Result content (when expanded)
+
+**State Management:**
+
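+Track expanded state per message. Tool calls in the latest assistant message start expanded (AC-4). When a new assistant message arrives:
+- Compare the latest assistant message ID to the stored ID
+- If different, store the new ID and reset the expanded set to empty, collapsing older results (AC-5)
+- Edit/Write tools bypass this logic and always render expanded (AC-6)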
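One way to express these rules, sketched in Python for brevity. The class `ExpandState` and its `overrides` map (tool call ID to a user-chosen bool) are hypothetical; the plan only specifies comparing message IDs and resetting on change. Defaults are recomputed from the latest message ID on every check, so only user clicks need to be stored.

```python
ALWAYS_EXPANDED = {"Edit", "Write"}


class ExpandState:
    """Hypothetical holder for expand/collapse state across renders."""

    def __init__(self):
        self.latest_id = None   # ID of the most recent assistant message
        self.overrides = {}     # tool call id -> bool, set by user clicks

    def sync(self, latest_assistant_id):
        # A new assistant message clears user toggles, so older results collapse (AC-5).
        if latest_assistant_id != self.latest_id:
            self.latest_id = latest_assistant_id
            self.overrides.clear()

    def is_expanded(self, tool_call, message_id):
        if "result" not in tool_call:
            return False                          # AC-7: nothing to expand
        if tool_call["name"] in ALWAYS_EXPANDED:
            return True                           # AC-6: diffs stay open
        default = message_id == self.latest_id    # AC-4: latest message open
        return self.overrides.get(tool_call["id"], default)

    def toggle(self, tool_call, message_id):
        # AC-2 / AC-3: a click flips whatever is currently shown.
        self.overrides[tool_call["id"]] = not self.is_expanded(tool_call, message_id)


state = ExpandState()
state.sync("msg_1")
bash = {"name": "Bash", "id": "toolu_1", "result": {"stdout": "ok"}}
edit = {"name": "Edit", "id": "toolu_2", "result": {"structuredPatch": []}}
print(state.is_expanded(bash, "msg_1"))  # True: latest message
state.sync("msg_2")
print(state.is_expanded(bash, "msg_1"))  # False: collapsed by the new message
print(state.is_expanded(edit, "msg_1"))  # True: Edit stays expanded
```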
+
+---
+
+### IMP-DIFF: Diff Rendering Component
+
+**Fulfills:** AC-8, AC-9, AC-10, AC-11, AC-12
+
+**Location:** `dashboard/lib/markdown.js` (new function `renderDiff`)
+
+**Add diff language to highlight.js:**
+```javascript
+import langDiff from 'https://esm.sh/highlight.js@11.11.1/lib/languages/diff';
+hljs.registerLanguage('diff', langDiff);
+```
+
+**Diff Renderer:**
+
+1. Convert `structuredPatch` array to unified diff text:
+   - Each hunk: `@@ -oldStart,oldLines +newStart,newLines @@`
+   - Followed by hunk.lines array
+2. Syntax highlight with hljs diff language
+3. Sanitize with DOMPurify before rendering
+4. Wrap in container with file path header
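Step 1 can be sketched as follows, in Python for brevity. It assumes each hunk has the `oldStart`, `oldLines`, `newStart`, `newLines`, and `lines` keys used in the header format above, and that every entry in `lines` already carries its leading space, `+`, or `-` marker (as in jsdiff's `structuredPatch` output). The function name `structured_patch_to_unified` is hypothetical. No `---`/`+++` file header is emitted, since the file path is shown separately (AC-11).

```python
def structured_patch_to_unified(hunks: list) -> str:
    """Join structuredPatch hunks into unified diff text for the hljs diff grammar."""
    out = []
    for hunk in hunks:
        out.append(
            f"@@ -{hunk['oldStart']},{hunk['oldLines']} "
            f"+{hunk['newStart']},{hunk['newLines']} @@"
        )
        # Lines already start with ' ', '+', or '-'; keep them as-is (AC-12).
        out.extend(hunk["lines"])
    return "\n".join(out)


patch = [{"oldStart": 3, "oldLines": 2, "newStart": 3, "newLines": 2,
          "lines": [" def greet():", "-    print('hi')", "+    print('hello')"]}]
print(structured_patch_to_unified(patch))
```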
+
+**CSS styling:**
+- Container: dark border, rounded corners
+- Header: muted background, monospace font, full file path
+- Content: monospace, horizontal scroll
+- Additions (hljs `.hljs-addition` spans): `background: rgba(46, 160, 67, 0.15)`
+- Deletions (hljs `.hljs-deletion` spans): `background: rgba(248, 81, 73, 0.15)`
+
+---
+
+### IMP-BASH: Bash Output Component
+
+**Fulfills:** AC-13, AC-21, AC-22
+
+**Location:** `dashboard/lib/markdown.js` (new function `renderBashResult`)
+
+Renders:
+- `stdout` in monospace pre block
+- `stderr` in separate block with error styling (if present)
+- "Command interrupted" notice (if interrupted flag)
+
+Error state: `is_error` or presence of stderr triggers error styling (red tint, left border).
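A minimal sketch of the block layout and error rule described above, in Python for brevity. It returns hypothetical `(kind, text)` sections instead of HTML so the logic is easy to see; the function name `bash_sections` is an assumption, while the `stdout`, `stderr`, `interrupted`, and `is_error` fields come from the IMP-SERVER result table and AC-23.

```python
def bash_sections(result: dict) -> tuple:
    """Return (error_state, sections) for a Bash result; section kinds are hypothetical."""
    sections = []
    if result.get("stdout"):
        sections.append(("stdout", result["stdout"]))      # monospace pre block
    stderr = result.get("stderr") or ""
    if stderr:
        sections.append(("stderr", stderr))                # separate, error-styled block
    if result.get("interrupted"):
        sections.append(("notice", "Command interrupted"))
    # Rule from the plan: is_error or any stderr turns on error styling.
    error_state = bool(result.get("is_error")) or bool(stderr)
    return error_state, sections


print(bash_sections({"stdout": "ok\n", "stderr": "", "interrupted": False, "is_error": False}))
```

Many commands write warnings or progress to stderr and still succeed, so it may be worth checking during Slice 7 whether stderr alone should trigger the red tint or only style its own block.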
+
+---
+
+### IMP-TRUNCATE: Output Truncation
+
+**Fulfills:** AC-17, AC-18
+
+**Truncation Thresholds (provisional; intended to match Claude Code):**
+
+| Tool Type | Max Lines | Max Chars |
+|-----------|-----------|-----------|
+| Bash stdout | 100 | 10000 |
+| Bash stderr | 50 | 5000 |
+| Read content | 500 | 50000 |
+| Grep matches | 100 | 10000 |
+| Glob files | 100 | 5000 |
+
+**Note:** These thresholds still need verification against Claude Code behavior and may need adjustment after testing.
+
+**Truncation Helper:**
+
+Takes a content string and returns `{ text, truncated, totalLines }`. When `truncated` is true, the result renderer shows a "Show full output (N lines)" link, where N is `totalLines`.
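A minimal sketch of the helper, in Python for brevity. Limits are passed per call so the provisional table values can change without touching the helper; the names `truncate_output` and `TRUNCATION_LIMITS` (and its keys) are hypothetical. It applies the line cap first, then the character cap, and reports truncation whenever the shown text differs from the full text.

```python
# Provisional (max_lines, max_chars) limits from the table above; unverified.
TRUNCATION_LIMITS = {
    "bash_stdout": (100, 10_000),
    "bash_stderr": (50, 5_000),
    "read_content": (500, 50_000),
    "grep_matches": (100, 10_000),
    "glob_files": (100, 5_000),
}


def truncate_output(content: str, max_lines: int, max_chars: int) -> dict:
    """Return {"text", "truncated", "totalLines"}, applying the line cap then the char cap."""
    lines = content.splitlines()
    full = "\n".join(lines)
    text = "\n".join(lines[:max_lines])[:max_chars]
    return {"text": text, "truncated": text != full, "totalLines": len(lines)}


out = truncate_output("\n".join(f"line {i}" for i in range(250)), *TRUNCATION_LIMITS["bash_stdout"])
if out["truncated"]:
    print(f"Show full output ({out['totalLines']} lines)")  # Show full output (250 lines)
```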
+
+---
+
+### IMP-MODAL: Full Output Modal
+
+**Fulfills:** AC-19, AC-20
+
+**Location:** `dashboard/components/FullOutputModal.js` (new file)
+
+**Structure:**
+- Overlay (click to close)
+- Modal container (click does NOT close)
+- Header: title (tool name + file path), close button
+- Content: scrollable pre/code block with syntax highlighting
+
+**Integration:** Modal state lives at the App or ChatMessages level (to be decided). The "Show full output" link sets that state with the full content and its metadata (tool name, file path).
+
+---
+
+### IMP-ERROR: Error State Styling
+
+**Fulfills:** AC-21, AC-22, AC-23
+
+**Styling:**
+- Tool call header: red-tinted background when `result.is_error`
+- Status icon: red X instead of green checkmark
+- Bash stderr: red text, italic, distinct from stdout
+- Overall: left border accent in error color
+
+---
+
+## Rollout Slices
+
+### Slice 1: Design Mockups (Pre-Implementation)
+
+**Goal:** Validate visual design before building
+
+**Deliverables:**
+1. Create `/mockups` test route with static data
+2. Implement 3-4 design variants (card-based, minimal, etc.)
+3. Use real tool result data from session JSONL
+4. User reviews and selects preferred design
+
+**Exit Criteria:** Design direction locked
+
+---
+
+### Slice 2: Server-Side Tool Result Parsing
+
+**Goal:** API returns tool results nested in tool_calls
+
+**Deliverables:**
+1. Two-pass parsing in `_parse_claude_conversation`
+2. Each tool call carries its `id` and, when available, a nested `result`
+3. Unit tests for result attachment
+4. Handle missing results gracefully (return tool_call without result)
+
+**Exit Criteria:** AC-24, AC-25, AC-26 pass
+
+---
+
+### Slice 3: Basic Expand/Collapse UI
+
+**Goal:** Tool calls are expandable, show raw result content
+
+**Deliverables:**
+1. Refactor `renderToolCalls` to `ToolCallList` component
+2. Implement expand/collapse with chevron
+3. Track expanded state per message
+4. Collapse on new assistant message
+5. Keep Edit/Write always expanded
+
+**Exit Criteria:** AC-1 through AC-7 pass
+
+---
+
+### Slice 4: Diff Rendering
+
+**Goal:** Edit/Write show beautiful diffs
+
+**Deliverables:**
+1. Add diff language to highlight.js
+2. Implement `renderDiff` function
+3. VS Code dark theme styling
+4. Full file path header
+
+**Exit Criteria:** AC-8 through AC-12 pass
+
+---
+
+### Slice 5: Other Tool Types
+
+**Goal:** Bash, Read, Glob, Grep render appropriately
+
+**Deliverables:**
+1. `renderBashResult` with stdout/stderr separation
+2. `renderFileContent` for Read
+3. `renderFileList` for Glob/Grep
+4. Generic fallback for unknown tools
+
+**Exit Criteria:** AC-13 through AC-16 pass
+
+---
+
+### Slice 6: Truncation and Modal
+
+**Goal:** Long outputs truncate with modal expansion
+
+**Deliverables:**
+1. Truncation helper with Claude Code thresholds
+2. "Show full output" link
+3. `FullOutputModal` component
+4. Syntax highlighting in modal
+
+**Exit Criteria:** AC-17 through AC-20 pass
+
+---
+
+### Slice 7: Error States and Polish
+
+**Goal:** Failed tools visually distinct, edge cases handled
+
+**Deliverables:**
+1. Error state styling (red tint)
+2. Muted styling for missing results
+3. Test with interrupted sessions
+4. Cross-browser testing
+
+**Exit Criteria:** AC-21 through AC-23 pass, feature complete
+
+---
+
+## Open Questions
+
+1. **Exact Claude Code truncation thresholds** — verify against the Claude Code source or by experiment
+2. **Performance with 100+ tool calls** — monitor after ship, optimize if needed
+3. **Codex support timeline** — when should we prioritize v2?
+
+---
+
+## Appendix: Research Findings
+
+### Claude Code JSONL Format
+
+Tool calls and results are stored as separate entries:
+
+```jsonc
+// Assistant sends tool_use
+{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "toolu_abc", "name": "Edit", "input": {...}}]}}
+
+// Result in separate user entry
+{"type": "user", "message": {"content": [{"type": "tool_result", "tool_use_id": "toolu_abc", "content": "Success"}]}, "toolUseResult": {...}}
+```
+
+The `toolUseResult` object contains rich structured data varying by tool type.
+
+### Missing Results Statistics
+
+Across 55 sessions with 2,063 tool calls:
+- 11 missing results (0.5%)
+- Affected tools: Edit (4), Read (2), Bash (1), others (4 combined)
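Counts like these can be recomputed from raw session files with a small script. The sketch below (Python; `count_missing_results` is a hypothetical name) counts `tool_use` IDs that never receive a matching `tool_result`, grouped by tool name, assuming the entry shapes shown in the JSONL format section above.

```python
import json
from collections import Counter
from typing import Iterable


def count_missing_results(lines: Iterable[str]) -> Counter:
    """Count tool_use blocks with no matching tool_result, keyed by tool name."""
    uses = {}         # tool_use id -> tool name
    answered = set()  # tool_use ids that received a tool_result
    for line in lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        content = entry.get("message", {}).get("content", [])
        if not isinstance(content, list):
            continue  # string content cannot hold tool blocks
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                uses[block["id"]] = block["name"]
            elif block.get("type") == "tool_result":
                answered.add(block["tool_use_id"])
    return Counter(name for uid, name in uses.items() if uid not in answered)
```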
+
+### Interrupt Handling
+
+User interrupts create a separate user message:
+```json
+{"type": "user", "message": {"content": [{"type": "text", "text": "[Request interrupted by user for tool use]"}]}}
+```
+
+Tool results for completed tools are still present; the interrupt message indicates the turn ended early.
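A minimal sketch of detecting the interrupt marker, in Python for brevity. Matching on the `[Request interrupted by user` prefix is an assumption based only on the single example above; other wordings may exist. The function name `is_interrupt_entry` is hypothetical. It lets a parser tell the marker apart from a genuine user prompt, for example to label the preceding turn as ended early.

```python
INTERRUPT_PREFIX = "[Request interrupted by user"  # assumed from the sample above


def is_interrupt_entry(entry: dict) -> bool:
    """True if a user entry is the interrupt marker rather than a real prompt."""
    if entry.get("type") != "user":
        return False
    content = entry.get("message", {}).get("content", [])
    if isinstance(content, str):
        return content.startswith(INTERRUPT_PREFIX)
    return any(
        block.get("type") == "text" and block.get("text", "").startswith(INTERRUPT_PREFIX)
        for block in content
        if isinstance(block, dict)
    )
```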