amc/plans/PLAN-tool-result-display.md
teernisse fb9d4e5b9f chore(plans): update implementation plans
plans/PLAN-tool-result-display.md:
- Add comprehensive plan for displaying tool results inline in
  conversation view, including truncation strategies and expand/collapse
  UI patterns

plans/subagent-visibility.md:
- Mark completed phases and update remaining work items
- Reflects current state of subagent tracking implementation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 00:49:02 -05:00


# Plan: Tool Result Display in AMC Dashboard
> **Status:** Draft — awaiting review and mockup phase
> **Author:** Claude + Taylor
> **Created:** 2026-02-27
## Summary
Add the ability to view tool call results (diffs, bash output, file contents) directly in the AMC dashboard conversation view. Currently, users see that a tool was called but cannot see what it did. This feature brings Claude Code's result visibility to the multi-agent dashboard.
### Goals
1. **See code changes as they happen** — diffs from Edit/Write tools always visible
2. **Debug agent behavior** — inspect Bash output, Read content, search results
3. **Match Claude Code UX** — familiar expand/collapse behavior with latest results expanded
### Non-Goals (v1)
- Codex agent support (different JSONL format — deferred to v2)
- Copy-to-clipboard functionality
- Virtual scrolling / performance optimization
- Editor integration (clicking paths to open files)
---
## User Workflows
### Workflow 1: Watching an Active Session
1. User opens a session card showing an active Claude agent
2. Agent calls Edit tool to modify a file
3. User immediately sees the diff expanded below the tool call pill
4. Agent calls Bash to run tests
5. User sees the Bash output expanded; the earlier Edit diff stays expanded because diffs never auto-collapse
6. Agent sends a text message explaining results
7. Bash output collapses (new assistant message arrived), Edit diff stays expanded
### Workflow 2: Reviewing a Completed Session
1. User opens a completed session to review what the agent did
2. All tool calls are collapsed by default (no "latest" assistant message)
3. Exception: Edit/Write diffs are still expanded
4. User clicks a Bash tool call to see what command ran and its output
5. User clicks "Show full output" when output is truncated
6. Lightweight modal opens with full scrollable content
7. User closes modal and continues reviewing
### Workflow 3: Debugging a Failed Tool Call
1. Agent runs a Bash command that fails
2. Tool result block shows with red-tinted background
3. stderr content is visible, clearly marked as error
4. User can see what went wrong without leaving the dashboard
---
## Acceptance Criteria
### Display Behavior
- **AC-1:** Tool calls render as expandable elements showing tool name and summary
- **AC-2:** Clicking a collapsed tool call expands to show its result
- **AC-3:** Clicking an expanded tool call collapses it
- **AC-4:** Tool results in the most recent assistant message are expanded by default
- **AC-5:** When a new assistant message arrives, previous tool results collapse
- **AC-6:** Edit and Write tool diffs remain expanded regardless of message age
- **AC-7:** Tool calls without results display as non-expandable with muted styling
### Diff Rendering
- **AC-8:** Edit/Write results display structuredPatch data as syntax-highlighted diff
- **AC-9:** Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
- **AC-10:** Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
- **AC-11:** Full file path displays above each diff block
- **AC-12:** Diff context lines use structuredPatch as-is (no recomputation)
### Other Tool Types
- **AC-13:** Bash results display stdout in monospace, stderr separately if present
- **AC-14:** Read results display file content with syntax highlighting based on file extension
- **AC-15:** Grep/Glob results display file list with match counts
- **AC-16:** WebFetch results display URL and response summary
### Truncation
- **AC-17:** Long outputs truncate at thresholds matching Claude Code behavior
- **AC-18:** Truncated outputs show "Show full output (N lines)" link
- **AC-19:** Clicking "Show full output" opens a dedicated lightweight modal
- **AC-20:** Modal displays full content with syntax highlighting, scrollable
### Error States
- **AC-21:** Failed tool calls display with red-tinted background
- **AC-22:** Error content (stderr, error messages) is clearly distinguishable from success content
- **AC-23:** is_error flag from tool_result determines error state
### API Contract
- **AC-24:** /api/conversation response includes tool results nested in tool_calls
- **AC-25:** Each tool_call has: name, id, input, result (when available)
- **AC-26:** Result structure varies by tool type (documented in IMP-SERVER)
---
## Architecture
### Why Two-Pass JSONL Parsing
The Claude Code JSONL stores tool_use and tool_result as separate entries linked by tool_use_id. To nest results inside tool_calls for the API response, the server must:
1. First pass: Build a map of tool_use_id → toolUseResult
2. Second pass: Parse messages, attaching results to matching tool_calls
This adds parsing overhead but keeps the API contract simple. Alternatives considered:
- **Streaming/incremental:** More complex, with no benefit since the full conversation is needed anyway
- **Client-side joining:** Shifts complexity to frontend, increases payload size
### Why Render Everything, Not Virtual Scroll
Sessions typically have 20-80 tool calls. Modern browsers handle hundreds of DOM elements efficiently. Virtual scrolling adds significant complexity (measuring, windowing, scroll position management) for marginal benefit.
Decision: Ship simple, measure real-world performance, and optimize only if render times above 100 ms are observed.
### Why Dedicated Modal Over Inline Expansion
Full output can be thousands of lines. Inline expansion would:
- Push other content out of view
- Make scrolling confusing
- Lose context of surrounding conversation
A modal provides a focused reading experience without disrupting conversation layout.
### Component Structure
```
MessageBubble
├── Content (text)
├── Thinking (existing)
└── ToolCallList (new)
    └── ToolCallItem (repeated)
        ├── Header (pill: chevron, name, summary, status)
        └── ResultContent (conditional)
            ├── DiffResult (for Edit/Write)
            ├── BashResult (for Bash)
            ├── FileListResult (for Glob/Grep)
            └── GenericResult (fallback)

FullOutputModal (new, top-level)
├── Header (tool name, file path)
├── Content (full output, scrollable)
└── CloseButton
```
---
## Implementation Specifications
### IMP-SERVER: Parse and Attach Tool Results
**Fulfills:** AC-24, AC-25, AC-26
**Location:** `amc_server/mixins/conversation.py`
**Changes to `_parse_claude_conversation`:**
Two-pass parsing:
1. First pass: Scan all entries, build map of `tool_use_id` → `toolUseResult`
2. Second pass: Parse messages as before, but when encountering `tool_use`, lookup and attach result
**Tool call schema after change:**
```python
{
    "name": "Edit",
    "id": "toolu_abc123",
    "input": {"file_path": "...", "old_string": "...", "new_string": "..."},
    "result": {
        "content": "The file has been updated successfully.",
        "is_error": False,
        "structuredPatch": [...],
        "filePath": "...",
        # ... other fields from toolUseResult
    }
}
```
**Result Structure by Tool Type:**
| Tool | Result Fields |
|------|---------------|
| Edit | `structuredPatch`, `filePath`, `oldString`, `newString` |
| Write | `filePath`, content confirmation |
| Read | `file`, `type`, content in `content` field |
| Bash | `stdout`, `stderr`, `interrupted` |
| Glob | `filenames`, `numFiles`, `truncated` |
| Grep | `content`, `filenames`, `numFiles`, `numLines` |
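
The two passes described above might look roughly like the following sketch. `attach_tool_results` and `_blocks` are hypothetical standalone helpers, not the actual body of `_parse_claude_conversation`. The sketch assumes one `toolUseResult` per user entry, assumes `is_error` sits on the `tool_result` block, and lets `toolUseResult` fields win when keys overlap. A tool call with no matching result is returned without a `result` key.

```python
import json
from typing import Any, Dict, Iterable, List


def _blocks(entry: Dict[str, Any], block_type: str) -> List[Dict[str, Any]]:
    """Return content blocks of one type; message content may be a plain string."""
    content = entry.get("message", {}).get("content", [])
    if not isinstance(content, list):
        return []
    return [b for b in content if isinstance(b, dict) and b.get("type") == block_type]


def attach_tool_results(jsonl_lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Hypothetical helper: nest each tool result inside its tool call."""
    entries = [json.loads(line) for line in jsonl_lines if line.strip()]

    # First pass: build the map tool_use_id -> result.
    results: Dict[str, Dict[str, Any]] = {}
    for entry in entries:
        if entry.get("type") != "user":
            continue
        rich = entry.get("toolUseResult")
        for block in _blocks(entry, "tool_result"):
            tool_use_id = block.get("tool_use_id")
            if tool_use_id is None:
                continue
            # Assumption: toolUseResult (when it is an object) belongs to this block.
            result = dict(rich) if isinstance(rich, dict) else {}
            result.setdefault("content", block.get("content"))
            result["is_error"] = bool(block.get("is_error", False))
            results[tool_use_id] = result

    # Second pass: collect tool calls, attaching a result where one exists.
    tool_calls: List[Dict[str, Any]] = []
    for entry in entries:
        if entry.get("type") != "assistant":
            continue
        for block in _blocks(entry, "tool_use"):
            call = {
                "name": block.get("name"),
                "id": block.get("id"),
                "input": block.get("input", {}),
            }
            if call["id"] in results:
                call["result"] = results[call["id"]]
            tool_calls.append(call)
    return tool_calls
```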
---
### IMP-TOOLCALL: Expandable Tool Call Component
**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7
**Location:** `dashboard/lib/markdown.js` (refactor `renderToolCalls`)
**New function: `ToolCallItem`**
Renders a single tool call with:
- Chevron for expand/collapse (when result exists and not Edit/Write)
- Tool name (bold, colored)
- Summary (from existing `getToolSummary`)
- Status icon (checkmark or X)
- Result content (when expanded)
**State Management:**
Track expanded state per message. When a new assistant message arrives:
- Compare latest assistant message ID to stored ID
- If different, reset expanded set to empty
- Edit/Write tools bypass this logic (always expanded via CSS/logic)
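
One possible reading of these rules, sketched in Python (the plan's server-side language) purely to pin down the logic; the dashboard version would live in JavaScript. `ExpandState` and its method names are hypothetical. The sketch assumes the stored set holds tool call IDs the user has clicked, so a click inverts the default and the reset restores defaults.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

ALWAYS_EXPANDED = frozenset({"Edit", "Write"})


@dataclass
class ExpandState:
    """Hypothetical state holder for expand/collapse decisions."""

    latest_message_id: Optional[str] = None
    toggled: Set[str] = field(default_factory=set)  # tool call ids the user clicked

    def on_latest_message(self, message_id: str) -> None:
        """Reset manual toggles when a different assistant message becomes latest."""
        if message_id != self.latest_message_id:
            self.latest_message_id = message_id
            self.toggled.clear()

    def toggle(self, tool_call_id: str) -> None:
        """Flip a tool call away from, or back to, its default state."""
        self.toggled.symmetric_difference_update({tool_call_id})

    def is_expanded(
        self, tool_name: str, tool_call_id: str, message_id: str, has_result: bool
    ) -> bool:
        if not has_result:
            return False  # AC-7: nothing to expand
        if tool_name in ALWAYS_EXPANDED:
            return True  # AC-6: diffs stay expanded regardless of message age
        default = message_id == self.latest_message_id  # AC-4
        return default != (tool_call_id in self.toggled)  # a click inverts the default
```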
---
### IMP-DIFF: Diff Rendering Component
**Fulfills:** AC-8, AC-9, AC-10, AC-11, AC-12
**Location:** `dashboard/lib/markdown.js` (new function `renderDiff`)
**Add diff language to highlight.js:**
```javascript
import langDiff from 'https://esm.sh/highlight.js@11.11.1/lib/languages/diff';
hljs.registerLanguage('diff', langDiff);
```
**Diff Renderer:**
1. Convert `structuredPatch` array to unified diff text:
   - Each hunk: `@@ -oldStart,oldLines +newStart,newLines @@`
   - Followed by hunk.lines array
2. Syntax highlight with hljs diff language
3. Sanitize with DOMPurify before rendering
4. Wrap in container with file path header
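
Step 1 can be sketched as follows, in Python for consistency with the server-side example, although `renderDiff` itself would be JavaScript. `structured_patch_to_unified` is a hypothetical name, and the sketch assumes each entry in `lines` already carries its leading ` `, `+`, or `-` marker. Highlighting and sanitizing (steps 2 and 3) are out of scope here.

```python
from typing import Any, Dict, List


def structured_patch_to_unified(hunks: List[Dict[str, Any]]) -> str:
    """Hypothetical helper: join structuredPatch hunks into unified diff text."""
    out: List[str] = []
    for hunk in hunks:
        # Hunk header in the form given in step 1.
        out.append(
            f"@@ -{hunk['oldStart']},{hunk['oldLines']} "
            f"+{hunk['newStart']},{hunk['newLines']} @@"
        )
        # Assumption: each line already starts with ' ', '+', or '-'.
        out.extend(hunk["lines"])
    return "\n".join(out)
```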
**CSS styling:**
- Container: dark border, rounded corners
- Header: muted background, monospace font, full file path
- Content: monospace, horizontal scroll
- Additions: `background: rgba(46, 160, 67, 0.15)`
- Deletions: `background: rgba(248, 81, 73, 0.15)`
---
### IMP-BASH: Bash Output Component
**Fulfills:** AC-13, AC-21, AC-22
**Location:** `dashboard/lib/markdown.js` (new function `renderBashResult`)
Renders:
- `stdout` in monospace pre block
- `stderr` in separate block with error styling (if present)
- "Command interrupted" notice (if interrupted flag)
Error state: `is_error` or presence of stderr triggers error styling (red tint, left border).
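
The selection logic above, separated from any markup, could be sketched like this. It is written in Python to match the server-side example; the real `renderBashResult` would be JavaScript. `bash_result_sections` and the section labels are hypothetical, and the field names come from the Bash row of the result table.

```python
from typing import Any, Dict, List, Tuple


def bash_result_sections(result: Dict[str, Any]) -> Tuple[List[Tuple[str, str]], bool]:
    """Hypothetical helper: return (sections to render, error flag) for a Bash result."""
    sections: List[Tuple[str, str]] = []
    stdout = result.get("stdout") or ""
    stderr = result.get("stderr") or ""
    if stdout:
        sections.append(("stdout", stdout))
    if stderr:
        sections.append(("stderr", stderr))  # rendered in its own block
    if result.get("interrupted"):
        sections.append(("notice", "Command interrupted"))
    # Rule from this section: is_error or any stderr content triggers error styling.
    is_error = bool(result.get("is_error")) or bool(stderr)
    return sections, is_error
```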
---
### IMP-TRUNCATE: Output Truncation
**Fulfills:** AC-17, AC-18
**Truncation Thresholds (match Claude Code):**
| Tool Type | Max Lines | Max Chars |
|-----------|-----------|-----------|
| Bash stdout | 100 | 10000 |
| Bash stderr | 50 | 5000 |
| Read content | 500 | 50000 |
| Grep matches | 100 | 10000 |
| Glob files | 100 | 5000 |
**Note:** These thresholds need verification against Claude Code behavior. May require adjustment based on testing.
**Truncation Helper:**
Takes content string, returns `{ text, truncated, totalLines }`. If truncated, result renderers show "Show full output (N lines)" link.
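
A minimal sketch of such a helper, in Python for consistency with the server-side example (the dashboard version would be JavaScript). `truncate_output` and the `THRESHOLDS` key names are hypothetical; the numbers are copied from the table above and remain unverified, as noted. The sketch cuts by line count first and then by character count.

```python
from typing import Dict, Tuple, Union

# (max_lines, max_chars) per the table above; the plan flags these as unverified.
THRESHOLDS: Dict[str, Tuple[int, int]] = {
    "bash_stdout": (100, 10000),
    "bash_stderr": (50, 5000),
    "read_content": (500, 50000),
    "grep_matches": (100, 10000),
    "glob_files": (100, 5000),
}


def truncate_output(
    content: str, max_lines: int, max_chars: int
) -> Dict[str, Union[str, bool, int]]:
    """Hypothetical helper returning text, truncated, and totalLines."""
    # keepends=True keeps line terminators, so joining all lines reproduces content.
    lines = content.splitlines(keepends=True)
    text = "".join(lines[:max_lines])[:max_chars]
    return {
        "text": text,
        "truncated": len(text) < len(content),
        "totalLines": len(lines),
    }
```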
---
### IMP-MODAL: Full Output Modal
**Fulfills:** AC-19, AC-20
**Location:** `dashboard/components/FullOutputModal.js` (new file)
**Structure:**
- Overlay (click to close)
- Modal container (click does NOT close)
- Header: title (tool name + file path), close button
- Content: scrollable pre/code block with syntax highlighting
**Integration:** Modal state managed at App level or ChatMessages level. "Show full output" link sets state with content + metadata.
---
### IMP-ERROR: Error State Styling
**Fulfills:** AC-21, AC-22, AC-23
**Styling:**
- Tool call header: red-tinted background when `result.is_error`
- Status icon: red X instead of green checkmark
- Bash stderr: red text, italic, distinct from stdout
- Overall: left border accent in error color
---
## Rollout Slices
### Slice 1: Design Mockups (Pre-Implementation)
**Goal:** Validate visual design before building
**Deliverables:**
1. Create `/mockups` test route with static data
2. Implement 3-4 design variants (card-based, minimal, etc.)
3. Use real tool result data from session JSONL
4. User reviews and selects preferred design
**Exit Criteria:** Design direction locked
---
### Slice 2: Server-Side Tool Result Parsing
**Goal:** API returns tool results nested in tool_calls
**Deliverables:**
1. Two-pass parsing in `_parse_claude_conversation`
2. Each tool call carries its `id` field with the matching result attached
3. Unit tests for result attachment
4. Handle missing results gracefully (return tool_call without result)
**Exit Criteria:** AC-24, AC-25, AC-26 pass
---
### Slice 3: Basic Expand/Collapse UI
**Goal:** Tool calls are expandable, show raw result content
**Deliverables:**
1. Refactor `renderToolCalls` to `ToolCallList` component
2. Implement expand/collapse with chevron
3. Track expanded state per message
4. Collapse on new assistant message
5. Keep Edit/Write always expanded
**Exit Criteria:** AC-1 through AC-7 pass
---
### Slice 4: Diff Rendering
**Goal:** Edit/Write results render as syntax-highlighted diffs
**Deliverables:**
1. Add diff language to highlight.js
2. Implement `renderDiff` function
3. VS Code dark theme styling
4. Full file path header
**Exit Criteria:** AC-8 through AC-12 pass
---
### Slice 5: Other Tool Types
**Goal:** Bash, Read, Glob, Grep render appropriately
**Deliverables:**
1. `renderBashResult` with stdout/stderr separation
2. `renderFileContent` for Read
3. `renderFileList` for Glob/Grep
4. Generic fallback for unknown tools
**Exit Criteria:** AC-13 through AC-16 pass
---
### Slice 6: Truncation and Modal
**Goal:** Long outputs truncate with modal expansion
**Deliverables:**
1. Truncation helper with Claude Code thresholds
2. "Show full output" link
3. `FullOutputModal` component
4. Syntax highlighting in modal
**Exit Criteria:** AC-17 through AC-20 pass
---
### Slice 7: Error States and Polish
**Goal:** Failed tools visually distinct, edge cases handled
**Deliverables:**
1. Error state styling (red tint)
2. Muted styling for missing results
3. Test with interrupted sessions
4. Cross-browser testing
**Exit Criteria:** AC-21 through AC-23 pass, feature complete
---
## Open Questions
1. **Exact Claude Code truncation thresholds** — need to verify against Claude Code source or experiment
2. **Performance with 100+ tool calls** — monitor after ship, optimize if needed
3. **Codex support timeline** — when should we prioritize v2?
---
## Appendix: Research Findings
### Claude Code JSONL Format
Tool calls and results are stored as separate entries:
```json
// Assistant sends tool_use
{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "toolu_abc", "name": "Edit", "input": {...}}]}}
// Result in separate user entry
{"type": "user", "message": {"content": [{"type": "tool_result", "tool_use_id": "toolu_abc", "content": "Success"}]}, "toolUseResult": {...}}
```
The `toolUseResult` object contains rich structured data varying by tool type.
### Missing Results Statistics
Across 55 sessions with 2,063 tool calls:
- 11 missing results (0.5%)
- Affected tools: Edit (4), Read (2), Bash (1), others
### Interrupt Handling
User interrupts create a separate user message:
```json
{"type": "user", "message": {"content": [{"type": "text", "text": "[Request interrupted by user for tool use]"}]}}
```
Tool results for completed tools are still present; the interrupt message indicates the turn ended early.
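
A parser could recognize such entries with a check along these lines. `is_interrupt_entry` and `INTERRUPT_PREFIX` are hypothetical names, and matching on a text prefix rather than the exact string shown above is an assumption that would need checking against real session data.

```python
from typing import Any, Dict

# Assumption: interrupt markers share this prefix; only one variant is shown above.
INTERRUPT_PREFIX = "[Request interrupted by user"


def is_interrupt_entry(entry: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if a user entry carries an interrupt marker."""
    if entry.get("type") != "user":
        return False
    content = entry.get("message", {}).get("content", [])
    if not isinstance(content, list):
        return False
    return any(
        isinstance(block, dict)
        and block.get("type") == "text"
        and str(block.get("text", "")).startswith(INTERRUPT_PREFIX)
        for block in content
    )
```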