# Plan: Tool Result Display in AMC Dashboard

> **Status:** Draft, awaiting review and mockup phase
> **Author:** Claude + Taylor
> **Created:** 2026-02-27

## Summary

Add the ability to view tool call results (diffs, bash output, file contents) directly in the AMC dashboard conversation view. Currently, users can see that a tool was called but not what it did. This feature brings Claude Code's result visibility to the multi-agent dashboard.

### Goals

1. **See code changes as they happen:** diffs from Edit/Write tools are always visible
2. **Debug agent behavior:** inspect Bash output, Read content, search results
3. **Match Claude Code UX:** familiar expand/collapse behavior with latest results expanded

### Non-Goals (v1)

- Codex agent support (different JSONL format; deferred to v2)
- Copy-to-clipboard functionality
- Virtual scrolling / performance optimization
- Editor integration (clicking paths to open files)

---

## User Workflows

### Workflow 1: Watching an Active Session

1. User opens a session card showing an active Claude agent
2. Agent calls the Edit tool to modify a file
3. User immediately sees the diff expanded below the tool call pill
4. Agent calls Bash to run tests
5. User sees the Bash output expanded; the previous Edit diff stays expanded because diffs never collapse
6. Agent sends a text message explaining the results
7. Bash output collapses (a new assistant message arrived); the Edit diff stays expanded

### Workflow 2: Reviewing a Completed Session

1. User opens a completed session to review what the agent did
2. All tool calls are collapsed by default (there is no "latest" assistant message)
3. Exception: Edit/Write diffs are still expanded
4. User clicks a Bash tool call to see what command ran and its output
5. User clicks "Show full output" when the output is truncated
6. A lightweight modal opens with the full scrollable content
7. User closes the modal and continues reviewing

### Workflow 3: Debugging a Failed Tool Call

1. Agent runs a Bash command that fails
2. Tool result block shows with a red-tinted background
3. stderr content is visible and clearly marked as an error
4. User can see what went wrong without leaving the dashboard

---

## Acceptance Criteria

### Display Behavior

- **AC-1:** Tool calls render as expandable elements showing tool name and summary
- **AC-2:** Clicking a collapsed tool call expands to show its result
- **AC-3:** Clicking an expanded tool call collapses it
- **AC-4:** Tool results in the most recent assistant message are expanded by default
- **AC-5:** When a new assistant message arrives, previous tool results collapse
- **AC-6:** Edit and Write tool diffs remain expanded regardless of message age
- **AC-7:** Tool calls without results display as non-expandable with muted styling

### Diff Rendering

- **AC-8:** Edit/Write results display `structuredPatch` data as syntax-highlighted diff
- **AC-9:** Diff additions render with VS Code dark theme green background (`rgba(46, 160, 67, 0.15)`)
- **AC-10:** Diff deletions render with VS Code dark theme red background (`rgba(248, 81, 73, 0.15)`)
- **AC-11:** Full file path displays above each diff block
- **AC-12:** Diff context lines use `structuredPatch` as-is (no recomputation)

### Other Tool Types

- **AC-13:** Bash results display stdout in monospace, stderr separately if present
- **AC-14:** Read results display file content with syntax highlighting based on file extension
- **AC-15:** Grep/Glob results display file list with match counts
- **AC-16:** WebFetch results display URL and response summary

### Truncation

- **AC-17:** Long outputs truncate at thresholds matching Claude Code behavior
- **AC-18:** Truncated outputs show "Show full output (N lines)" link
- **AC-19:** Clicking "Show full output" opens a dedicated lightweight modal
- **AC-20:** Modal displays full content with syntax highlighting, scrollable

### Error States

- **AC-21:** Failed tool calls display with red-tinted background
- **AC-22:** Error content (stderr, error messages) is clearly distinguishable from success content
- **AC-23:** `is_error` flag from `tool_result` determines error state

### API Contract

- **AC-24:** `/api/conversation` response includes tool results nested in `tool_calls`
- **AC-25:** Each `tool_call` has: `name`, `id`, `input`, `result` (when available)
- **AC-26:** Result structure varies by tool type (documented in IMP-SERVER)

---

## Architecture

### Why Two-Pass JSONL Parsing

The Claude Code JSONL stores `tool_use` and `tool_result` as separate entries linked by `tool_use_id`. To nest results inside `tool_calls` for the API response, the server must:

1. First pass: Build a map of `tool_use_id` → `toolUseResult`
2. Second pass: Parse messages, attaching results to matching `tool_calls`

This adds parsing overhead but keeps the API contract simple. Alternatives considered:
- **Streaming/incremental:** More complex, and doesn't help since we need the full conversation anyway
- **Client-side joining:** Shifts complexity to the frontend and increases payload size

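The two passes can be sketched as follows. This is only an illustration of the join logic, not the server code: the real change lives in the Python `_parse_claude_conversation`, and JavaScript is used here just to keep this plan's sketches in one language. The helper name `attachToolResults`, the flat return value, and the way `toolUseResult` is merged with the `tool_result` block are all assumptions.

```javascript
// Hypothetical helper: `entries` is an array of already-parsed JSONL objects.
// Returns a flat list of tool calls with `result` attached when one exists.
function attachToolResults(entries) {
  // First pass: build a map of tool_use_id -> result data.
  const resultsById = new Map();
  for (const entry of entries) {
    if (entry.type !== 'user') continue;
    const content = entry.message?.content;
    if (!Array.isArray(content)) continue;
    for (const block of content) {
      if (block.type !== 'tool_result') continue;
      // Assumption: toolUseResult is only merged when it is a plain object.
      const raw = entry.toolUseResult;
      const extra = raw && typeof raw === 'object' && !Array.isArray(raw) ? raw : {};
      resultsById.set(block.tool_use_id, {
        ...extra,
        content: block.content,
        is_error: block.is_error === true,
      });
    }
  }

  // Second pass: walk assistant entries and attach results to tool_use blocks.
  const toolCalls = [];
  for (const entry of entries) {
    if (entry.type !== 'assistant') continue;
    const content = entry.message?.content;
    if (!Array.isArray(content)) continue;
    for (const block of content) {
      if (block.type !== 'tool_use') continue;
      const call = { name: block.name, id: block.id, input: block.input };
      // Missing results are tolerated: the call is returned without `result`.
      if (resultsById.has(block.id)) call.result = resultsById.get(block.id);
      toolCalls.push(call);
    }
  }
  return toolCalls;
}
```

Because the map is complete before the second pass starts, the join does not depend on a result appearing at any particular position relative to its call.
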
### Why Render Everything, Not Virtual Scroll

Sessions typically have 20-80 tool calls. Modern browsers handle hundreds of DOM elements efficiently. Virtual scrolling adds significant complexity (measuring, windowing, scroll position management) for marginal benefit.

Decision: Ship simple, measure real-world performance, optimize if >100ms render times observed.

### Why Dedicated Modal Over Inline Expansion

Full output can be thousands of lines. Inline expansion would:
- Push other content out of view
- Make scrolling confusing
- Lose context of surrounding conversation

A modal provides a focused reading experience without disrupting conversation layout.

### Component Structure

```
MessageBubble
├── Content (text)
├── Thinking (existing)
└── ToolCallList (new)
    └── ToolCallItem (repeated)
        ├── Header (pill: chevron, name, summary, status)
        └── ResultContent (conditional)
            ├── DiffResult (for Edit/Write)
            ├── BashResult (for Bash)
            ├── FileListResult (for Glob/Grep)
            └── GenericResult (fallback)

FullOutputModal (new, top-level)
├── Header (tool name, file path)
├── Content (full output, scrollable)
└── CloseButton
```

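The `ResultContent` branch of the tree implies a dispatch from tool name to result component. A minimal sketch, assuming a hypothetical `pickResultRenderer` helper that returns the component name as a string (a real implementation would more likely return the component itself):

```javascript
// Hypothetical dispatch mirroring the ResultContent children in the tree.
// Tools not listed there (for example Read) fall through to GenericResult.
function pickResultRenderer(toolName) {
  switch (toolName) {
    case 'Edit':
    case 'Write':
      return 'DiffResult';
    case 'Bash':
      return 'BashResult';
    case 'Glob':
    case 'Grep':
      return 'FileListResult';
    default:
      return 'GenericResult';
  }
}
```
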

---

## Implementation Specifications

### IMP-SERVER: Parse and Attach Tool Results

**Fulfills:** AC-24, AC-25, AC-26

**Location:** `amc_server/mixins/conversation.py`

**Changes to `_parse_claude_conversation`:**

Two-pass parsing:
1. First pass: Scan all entries, build map of `tool_use_id` → `toolUseResult`
2. Second pass: Parse messages as before, but when encountering `tool_use`, look up and attach the result

**Tool call schema after change:**
```python
{
    "name": "Edit",
    "id": "toolu_abc123",
    "input": {"file_path": "...", "old_string": "...", "new_string": "..."},
    "result": {
        "content": "The file has been updated successfully.",
        "is_error": False,
        "structuredPatch": [...],
        "filePath": "...",
        # ... other fields from toolUseResult
    }
}
```

**Result Structure by Tool Type:**

| Tool | Result Fields |
|------|---------------|
| Edit | `structuredPatch`, `filePath`, `oldString`, `newString` |
| Write | `filePath`, content confirmation |
| Read | `file`, `type`, content in `content` field |
| Bash | `stdout`, `stderr`, `interrupted` |
| Glob | `filenames`, `numFiles`, `truncated` |
| Grep | `content`, `filenames`, `numFiles`, `numLines` |

---

### IMP-TOOLCALL: Expandable Tool Call Component

**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7

**Location:** `dashboard/lib/markdown.js` (refactor `renderToolCalls`)

**New function: `ToolCallItem`**

Renders a single tool call with:
- Chevron for expand/collapse (when a result exists and the tool is not Edit/Write)
- Tool name (bold, colored)
- Summary (from existing `getToolSummary`)
- Status icon (checkmark or X)
- Result content (when expanded)

**State Management:**

Track expanded state per message. When a new assistant message arrives:
- Compare the latest assistant message ID to the stored ID
- If different, reset the expanded set to empty
- Edit/Write tools bypass this logic (always expanded via CSS/logic)

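One possible reading of these rules, sketched as pure functions. The names `syncExpandState` and `isExpanded` and the state shape are hypothetical, and the sketch stores user toggles rather than a literal "expanded set" so that a result in the latest message can be collapsed by a click as well as expanded:

```javascript
// Tools whose results never collapse (AC-6).
const ALWAYS_EXPANDED = new Set(['Edit', 'Write']);

// state: { latestMessageId, toggled } where `toggled` is a Set of tool call
// ids the user has clicked since the latest assistant message arrived.
function syncExpandState(state, latestMessageId) {
  if (state.latestMessageId === latestMessageId) return state;
  // A new assistant message arrived: forget all manual toggles (AC-5).
  return { latestMessageId, toggled: new Set() };
}

function isExpanded(state, messageId, toolCall) {
  if (!toolCall.result) return false;                   // AC-7: nothing to show
  if (ALWAYS_EXPANDED.has(toolCall.name)) return true;  // AC-6: diffs stay open
  // AC-4: results in the latest assistant message default to open.
  const defaultOpen = messageId === state.latestMessageId;
  // AC-2 / AC-3: a click flips whatever the default is.
  return state.toggled.has(toolCall.id) ? !defaultOpen : defaultOpen;
}
```

For a completed session, `latestMessageId` could simply be `null`, which leaves everything except Edit/Write collapsed by default, as in Workflow 2.
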

---

### IMP-DIFF: Diff Rendering Component

**Fulfills:** AC-8, AC-9, AC-10, AC-11, AC-12

**Location:** `dashboard/lib/markdown.js` (new function `renderDiff`)

**Add diff language to highlight.js:**
```javascript
import langDiff from 'https://esm.sh/highlight.js@11.11.1/lib/languages/diff';
hljs.registerLanguage('diff', langDiff);
```

**Diff Renderer:**

1. Convert `structuredPatch` array to unified diff text:
   - Each hunk: `@@ -oldStart,oldLines +newStart,newLines @@`
   - Followed by the `hunk.lines` array
2. Syntax highlight with hljs diff language
3. Sanitize with DOMPurify before rendering
4. Wrap in container with file path header

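Step 1 can be sketched as below. The function name `patchToUnifiedDiff` is hypothetical, and the sketch assumes each entry in `hunk.lines` already carries its leading space, `+`, or `-` marker; highlighting, sanitizing, and the file path header (steps 2 to 4) are left out.

```javascript
// Convert a structuredPatch array (a list of hunks) into unified diff text.
// Assumes each hunk has oldStart, oldLines, newStart, newLines, and a `lines`
// array whose entries are already prefixed with ' ', '+' or '-'.
function patchToUnifiedDiff(structuredPatch) {
  const out = [];
  for (const hunk of structuredPatch ?? []) {
    out.push(`@@ -${hunk.oldStart},${hunk.oldLines} +${hunk.newStart},${hunk.newLines} @@`);
    for (const line of hunk.lines ?? []) out.push(line);
  }
  return out.join('\n');
}
```

The hunks are emitted exactly as stored, so context lines are never recomputed, in line with AC-12.
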
**CSS styling:**
- Container: dark border, rounded corners
- Header: muted background, monospace font, full file path
- Content: monospace, horizontal scroll
- Additions: `background: rgba(46, 160, 67, 0.15)`
- Deletions: `background: rgba(248, 81, 73, 0.15)`

---

### IMP-BASH: Bash Output Component

**Fulfills:** AC-13, AC-21, AC-22

**Location:** `dashboard/lib/markdown.js` (new function `renderBashResult`)

Renders:
- `stdout` in monospace pre block
- `stderr` in separate block with error styling (if present)
- "Command interrupted" notice (if interrupted flag)

Error state: `is_error` or presence of stderr triggers error styling (red tint, left border).

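The decision logic behind `renderBashResult` might look like the sketch below. It produces a plain description of what to render rather than markup; the helper name `describeBashResult` and the `{ isError, blocks }` shape are assumptions made for illustration.

```javascript
// Hypothetical helper: decide which blocks a Bash result needs and whether
// the error styling applies. Markup generation is left to the renderer.
function describeBashResult(result) {
  const stdout = result.stdout ?? '';
  const stderr = result.stderr ?? '';
  const blocks = [];
  if (stdout) blocks.push({ kind: 'stdout', text: stdout });
  if (stderr) blocks.push({ kind: 'stderr', text: stderr });
  if (result.interrupted) blocks.push({ kind: 'notice', text: 'Command interrupted' });
  return {
    // Error state: is_error flag or any stderr content.
    isError: result.is_error === true || stderr.length > 0,
    blocks,
  };
}
```
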

---

### IMP-TRUNCATE: Output Truncation

**Fulfills:** AC-17, AC-18

**Truncation Thresholds (match Claude Code):**

| Tool Type | Max Lines | Max Chars |
|-----------|-----------|-----------|
| Bash stdout | 100 | 10000 |
| Bash stderr | 50 | 5000 |
| Read content | 500 | 50000 |
| Grep matches | 100 | 10000 |
| Glob files | 100 | 5000 |

**Note:** These thresholds need verification against Claude Code behavior. May require adjustment based on testing.

**Truncation Helper:**

Takes content string, returns `{ text, truncated, totalLines }`. If truncated, result renderers show "Show full output (N lines)" link.

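A minimal sketch of such a helper, assuming it receives the thresholds as arguments. The name `truncateOutput` and the parameter order are hypothetical; the plan only fixes the return shape.

```javascript
// Truncate `content` to at most `maxLines` lines and `maxChars` characters.
// Returns { text, truncated, totalLines } as described in the plan.
function truncateOutput(content, maxLines, maxChars) {
  const lines = content.split('\n');
  const totalLines = lines.length;
  // Apply the line limit first, then the character limit.
  let text = lines.slice(0, maxLines).join('\n');
  if (text.length > maxChars) text = text.slice(0, maxChars);
  // `text` is always a prefix of `content`, so a shorter length means truncation.
  return { text, truncated: text.length < content.length, totalLines };
}
```

A Bash renderer would call `truncateOutput(result.stdout, 100, 10000)` using the draft thresholds from the table, which are still unverified.
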

---

### IMP-MODAL: Full Output Modal

**Fulfills:** AC-19, AC-20

**Location:** `dashboard/components/FullOutputModal.js` (new file)

**Structure:**
- Overlay (click to close)
- Modal container (click does NOT close)
- Header: title (tool name + file path), close button
- Content: scrollable pre/code block with syntax highlighting

**Integration:** Modal state managed at App level or ChatMessages level. "Show full output" link sets state with content + metadata.

---

### IMP-ERROR: Error State Styling

**Fulfills:** AC-21, AC-22, AC-23

**Styling:**
- Tool call header: red-tinted background when `result.is_error`
- Status icon: red X instead of green checkmark
- Bash stderr: red text, italic, distinct from stdout
- Overall: left border accent in error color

---

## Rollout Slices

### Slice 1: Design Mockups (Pre-Implementation)

**Goal:** Validate visual design before building

**Deliverables:**
1. Create `/mockups` test route with static data
2. Implement 3-4 design variants (card-based, minimal, etc.)
3. Use real tool result data from session JSONL
4. User reviews and selects preferred design

**Exit Criteria:** Design direction locked

---

### Slice 2: Server-Side Tool Result Parsing

**Goal:** API returns tool results nested in `tool_calls`

**Deliverables:**
1. Two-pass parsing in `_parse_claude_conversation`
2. Tool results attached with `id` field
3. Unit tests for result attachment
4. Handle missing results gracefully (return tool_call without result)

**Exit Criteria:** AC-24, AC-25, AC-26 pass

---

### Slice 3: Basic Expand/Collapse UI

**Goal:** Tool calls are expandable, show raw result content

**Deliverables:**
1. Refactor `renderToolCalls` to `ToolCallList` component
2. Implement expand/collapse with chevron
3. Track expanded state per message
4. Collapse on new assistant message
5. Keep Edit/Write always expanded

**Exit Criteria:** AC-1 through AC-7 pass

---

### Slice 4: Diff Rendering

**Goal:** Edit/Write show beautiful diffs

**Deliverables:**
1. Add diff language to highlight.js
2. Implement `renderDiff` function
3. VS Code dark theme styling
4. Full file path header

**Exit Criteria:** AC-8 through AC-12 pass

---

### Slice 5: Other Tool Types

**Goal:** Bash, Read, Glob, Grep render appropriately

**Deliverables:**
1. `renderBashResult` with stdout/stderr separation
2. `renderFileContent` for Read
3. `renderFileList` for Glob/Grep
4. Generic fallback for unknown tools

**Exit Criteria:** AC-13 through AC-16 pass

---

### Slice 6: Truncation and Modal

**Goal:** Long outputs truncate with modal expansion

**Deliverables:**
1. Truncation helper with Claude Code thresholds
2. "Show full output" link
3. `FullOutputModal` component
4. Syntax highlighting in modal

**Exit Criteria:** AC-17 through AC-20 pass

---

### Slice 7: Error States and Polish

**Goal:** Failed tools visually distinct, edge cases handled

**Deliverables:**
1. Error state styling (red tint)
2. Muted styling for missing results
3. Test with interrupted sessions
4. Cross-browser testing

**Exit Criteria:** AC-21 through AC-23 pass, feature complete

---

## Open Questions

1. **Exact Claude Code truncation thresholds:** need to verify against Claude Code source or by experiment
2. **Performance with 100+ tool calls:** monitor after ship, optimize if needed
3. **Codex support timeline:** when should we prioritize v2?

---

## Appendix: Research Findings

### Claude Code JSONL Format

Tool calls and results are stored as separate entries:

```json
// Assistant sends tool_use
{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "toolu_abc", "name": "Edit", "input": {...}}]}}

// Result in separate user entry
{"type": "user", "message": {"content": [{"type": "tool_result", "tool_use_id": "toolu_abc", "content": "Success"}]}, "toolUseResult": {...}}
```

The `toolUseResult` object contains rich structured data varying by tool type.

### Missing Results Statistics

Across 55 sessions with 2,063 tool calls:
- 11 missing results (0.5%)
- Affected tools: Edit (4), Read (2), Bash (1), others

### Interrupt Handling

User interrupts create a separate user message:
```json
{"type": "user", "message": {"content": [{"type": "text", "text": "[Request interrupted by user for tool use]"}]}}
```

Tool results for completed tools are still present; the interrupt message indicates the turn ended early.

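Should the parser or UI need to recognize this marker, a check along these lines would do. It is a sketch only: `isInterruptEntry` is a hypothetical name, and it matches just the exact text observed above, so any other interrupt wording would need to be added after checking real session data.

```javascript
// Exact marker text observed in session JSONL (see the example above).
const INTERRUPT_TEXT = '[Request interrupted by user for tool use]';

// Returns true when a parsed JSONL entry is the user-interrupt marker.
function isInterruptEntry(entry) {
  if (entry.type !== 'user') return false;
  const content = entry.message?.content;
  if (!Array.isArray(content)) return false;
  return content.some(
    (block) => block.type === 'text' && block.text === INTERRUPT_TEXT
  );
}
```
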