chore(plans): update implementation plans
plans/PLAN-tool-result-display.md:
- Add comprehensive plan for displaying tool results inline in conversation view, including truncation strategies and expand/collapse UI patterns

plans/subagent-visibility.md:
- Mark completed phases and update remaining work items
- Reflects current state of subagent tracking implementation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
456
plans/PLAN-tool-result-display.md
Normal file
@@ -0,0 +1,456 @@
# Plan: Tool Result Display in AMC Dashboard

> **Status:** Draft, awaiting review and mockup phase
> **Author:** Claude + Taylor
> **Created:** 2026-02-27

## Summary

Add the ability to view tool call results (diffs, bash output, file contents) directly in the AMC dashboard conversation view. Currently, users see that a tool was called but cannot see what it did. This feature brings Claude Code's result visibility to the multi-agent dashboard.

### Goals

1. **See code changes as they happen:** diffs from Edit/Write tools are always visible
2. **Debug agent behavior:** inspect Bash output, Read content, search results
3. **Match Claude Code UX:** familiar expand/collapse behavior with latest results expanded

### Non-Goals (v1)

- Codex agent support (different JSONL format; deferred to v2)
- Copy-to-clipboard functionality
- Virtual scrolling / performance optimization
- Editor integration (clicking paths to open files)

---

## User Workflows

### Workflow 1: Watching an Active Session

1. User opens a session card showing an active Claude agent
2. Agent calls the Edit tool to modify a file
3. User immediately sees the diff expanded below the tool call pill
4. Agent calls Bash to run tests
5. User sees the Bash output expanded; the previous Edit diff stays expanded because diffs never collapse
6. Agent sends a text message explaining the results
7. Bash output collapses because a new assistant message arrived; the Edit diff stays expanded

### Workflow 2: Reviewing a Completed Session

1. User opens a completed session to review what the agent did
2. All tool calls are collapsed by default (there is no "latest" assistant message)
3. Exception: Edit/Write diffs are still expanded
4. User clicks a Bash tool call to see what command ran and its output
5. User clicks "Show full output" when output is truncated
6. Lightweight modal opens with full scrollable content
7. User closes the modal and continues reviewing

### Workflow 3: Debugging a Failed Tool Call

1. Agent runs a Bash command that fails
2. Tool result block renders with a red-tinted background
3. stderr content is visible and clearly marked as an error
4. User can see what went wrong without leaving the dashboard

---

## Acceptance Criteria

### Display Behavior

- **AC-1:** Tool calls render as expandable elements showing tool name and summary
- **AC-2:** Clicking a collapsed tool call expands to show its result
- **AC-3:** Clicking an expanded tool call collapses it
- **AC-4:** Tool results in the most recent assistant message are expanded by default
- **AC-5:** When a new assistant message arrives, previous tool results collapse
- **AC-6:** Edit and Write tool diffs remain expanded regardless of message age
- **AC-7:** Tool calls without results display as non-expandable with muted styling

### Diff Rendering

- **AC-8:** Edit/Write results display structuredPatch data as syntax-highlighted diff
- **AC-9:** Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
- **AC-10:** Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
- **AC-11:** Full file path displays above each diff block
- **AC-12:** Diff context lines use structuredPatch as-is (no recomputation)

### Other Tool Types

- **AC-13:** Bash results display stdout in monospace, stderr separately if present
- **AC-14:** Read results display file content with syntax highlighting based on file extension
- **AC-15:** Grep/Glob results display file list with match counts
- **AC-16:** WebFetch results display URL and response summary

### Truncation

- **AC-17:** Long outputs truncate at thresholds matching Claude Code behavior
- **AC-18:** Truncated outputs show "Show full output (N lines)" link
- **AC-19:** Clicking "Show full output" opens a dedicated lightweight modal
- **AC-20:** Modal displays full content with syntax highlighting, scrollable

### Error States

- **AC-21:** Failed tool calls display with red-tinted background
- **AC-22:** Error content (stderr, error messages) is clearly distinguishable from success content
- **AC-23:** is_error flag from tool_result determines error state

### API Contract

- **AC-24:** /api/conversation response includes tool results nested in tool_calls
- **AC-25:** Each tool_call has: name, id, input, result (when available)
- **AC-26:** Result structure varies by tool type (documented in IMP-SERVER)

---

## Architecture

### Why Two-Pass JSONL Parsing

The Claude Code JSONL stores tool_use and tool_result as separate entries linked by tool_use_id. To nest results inside tool_calls for the API response, the server must:

1. First pass: Build a map of tool_use_id → toolUseResult
2. Second pass: Parse messages, attaching results to matching tool_calls

This adds parsing overhead but keeps the API contract simple. Alternatives considered:

- **Streaming/incremental:** More complex, and no help since we need the full conversation anyway
- **Client-side joining:** Shifts complexity to the frontend and increases payload size

### Why Render Everything, Not Virtual Scroll

Sessions typically have 20-80 tool calls. Modern browsers handle hundreds of DOM elements efficiently. Virtual scrolling adds significant complexity (measuring, windowing, scroll position management) for marginal benefit.

Decision: Ship simple, measure real-world performance, and optimize if render times exceed 100 ms.

### Why Dedicated Modal Over Inline Expansion

Full output can be thousands of lines. Inline expansion would:

- Push other content out of view
- Make scrolling confusing
- Lose context of surrounding conversation

A modal provides a focused reading experience without disrupting conversation layout.

### Component Structure

```
MessageBubble
├── Content (text)
├── Thinking (existing)
└── ToolCallList (new)
    └── ToolCallItem (repeated)
        ├── Header (pill: chevron, name, summary, status)
        └── ResultContent (conditional)
            ├── DiffResult (for Edit/Write)
            ├── BashResult (for Bash)
            ├── FileListResult (for Glob/Grep)
            └── GenericResult (fallback)

FullOutputModal (new, top-level)
├── Header (tool name, file path)
├── Content (full output, scrollable)
└── CloseButton
```

---

## Implementation Specifications

### IMP-SERVER: Parse and Attach Tool Results

**Fulfills:** AC-24, AC-25, AC-26

**Location:** `amc_server/mixins/conversation.py`

**Changes to `_parse_claude_conversation`:**

Two-pass parsing:

1. First pass: Scan all entries, build a map of `tool_use_id` → `toolUseResult`
2. Second pass: Parse messages as before, but when encountering `tool_use`, look up and attach the result
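The two passes could look roughly like the sketch below. It assumes the JSONL entry shapes shown in the appendix (a `tool_result` block carrying `tool_use_id`, with `toolUseResult` on the same entry). The function name `attach_tool_results`, the flat list it returns, and the way `content` and `is_error` are merged into the result are illustrative assumptions, not the existing `_parse_claude_conversation` code.

```python
import json


def attach_tool_results(jsonl_lines):
    """Hypothetical two-pass parse: returns tool calls with results nested."""
    entries = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip partially written lines

    # First pass: build a map of tool_use_id -> result.
    results = {}
    for entry in entries:
        content = (entry.get("message") or {}).get("content")
        if entry.get("type") != "user" or not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_result":
                continue
            extra = entry.get("toolUseResult")
            # Assumption: structured fields win; the block's text fills `content`
            # only when toolUseResult does not already define it.
            result = dict(extra) if isinstance(extra, dict) else {}
            result.setdefault("content", block.get("content"))
            result["is_error"] = bool(block.get("is_error", False))
            results[block.get("tool_use_id")] = result

    # Second pass: collect tool_use blocks and attach matching results.
    tool_calls = []
    for entry in entries:
        content = (entry.get("message") or {}).get("content")
        if entry.get("type") != "assistant" or not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_use":
                continue
            call = {
                "name": block.get("name"),
                "id": block.get("id"),
                "input": block.get("input", {}),
            }
            if call["id"] in results:
                call["result"] = results[call["id"]]
            # Missing results: the call is returned without a "result" key.
            tool_calls.append(call)
    return tool_calls
```

A call whose result never arrived simply comes back without a `result` key, which is what lets the frontend render it as non-expandable (AC-7).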

**Tool call schema after change:**

```python
{
    "name": "Edit",
    "id": "toolu_abc123",
    "input": {"file_path": "...", "old_string": "...", "new_string": "..."},
    "result": {
        "content": "The file has been updated successfully.",
        "is_error": False,
        "structuredPatch": [...],
        "filePath": "...",
        # ... other fields from toolUseResult
    }
}
```

**Result Structure by Tool Type:**

| Tool | Result Fields |
|------|---------------|
| Edit | `structuredPatch`, `filePath`, `oldString`, `newString` |
| Write | `filePath`, content confirmation |
| Read | `file`, `type`, content in `content` field |
| Bash | `stdout`, `stderr`, `interrupted` |
| Glob | `filenames`, `numFiles`, `truncated` |
| Grep | `content`, `filenames`, `numFiles`, `numLines` |

---

### IMP-TOOLCALL: Expandable Tool Call Component

**Fulfills:** AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7

**Location:** `dashboard/lib/markdown.js` (refactor `renderToolCalls`)

**New function: `ToolCallItem`**

Renders a single tool call with:

- Chevron for expand/collapse (when result exists and not Edit/Write)
- Tool name (bold, colored)
- Summary (from existing `getToolSummary`)
- Status icon (checkmark or X)
- Result content (when expanded)

**State Management:**

Track expanded state per message. When new assistant message arrives:

- Compare latest assistant message ID to stored ID
- If different, reset expanded set to empty
- Edit/Write tools bypass this logic (always expanded via CSS/logic)
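The expansion rules above can be sketched as a small state object. This is written in Python only to stay consistent with the server example; the dashboard itself would implement the equivalent in JavaScript. The class name `ExpansionState` and the choice to store user toggles as overrides of the default (rather than a plain set of expanded IDs) are assumptions made for illustration.

```python
ALWAYS_EXPANDED = {"Edit", "Write"}  # diffs never collapse (AC-6)


class ExpansionState:
    """Hypothetical tracker deciding whether a tool call renders expanded."""

    def __init__(self):
        self.latest_assistant_id = None  # None models a completed session
        self.toggled = set()  # tool call ids the user flipped from the default

    def on_latest_assistant(self, message_id):
        # A different latest message resets all user toggles (AC-5).
        if message_id != self.latest_assistant_id:
            self.latest_assistant_id = message_id
            self.toggled.clear()

    def toggle(self, tool_call_id):
        # Clicking flips the state: add if absent, remove if present.
        self.toggled ^= {tool_call_id}

    def is_expanded(self, tool_call, message_id):
        if tool_call.get("result") is None:
            return False  # no result: non-expandable (AC-7)
        if tool_call["name"] in ALWAYS_EXPANDED:
            return True
        # Default: expanded only inside the latest assistant message (AC-4).
        default = message_id == self.latest_assistant_id
        return default != (tool_call["id"] in self.toggled)
```

Storing overrides instead of an "expanded" set lets one click collapse a result that is expanded by default (AC-3) without a second data structure.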

---

### IMP-DIFF: Diff Rendering Component

**Fulfills:** AC-8, AC-9, AC-10, AC-11, AC-12

**Location:** `dashboard/lib/markdown.js` (new function `renderDiff`)

**Add diff language to highlight.js:**

```javascript
import langDiff from 'https://esm.sh/highlight.js@11.11.1/lib/languages/diff';
hljs.registerLanguage('diff', langDiff);
```

**Diff Renderer:**

1. Convert `structuredPatch` array to unified diff text:
   - Each hunk: `@@ -oldStart,oldLines +newStart,newLines @@`
   - Followed by hunk.lines array
2. Syntax highlight with hljs diff language
3. Sanitize with DOMPurify before rendering
4. Wrap in container with file path header
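Step 1 of the renderer is a plain text transformation, sketched below in Python for consistency with the server example (the real `renderDiff` would live in `dashboard/lib/markdown.js`). The function name `patch_to_unified` is hypothetical, and the sketch assumes each entry in `hunk["lines"]` already carries its leading space, `+`, or `-` marker, so lines are emitted as-is without recomputation (AC-12).

```python
def patch_to_unified(structured_patch):
    """Hypothetical helper: turn a structuredPatch array into unified diff text."""
    out = []
    for hunk in structured_patch:
        # Hunk header in the form the plan specifies.
        out.append(
            "@@ -{oldStart},{oldLines} +{newStart},{newLines} @@".format(**hunk)
        )
        # Assumption: lines are already prefixed with ' ', '+', or '-'.
        out.extend(hunk["lines"])
    return "\n".join(out)
```

The resulting text is what would be handed to the diff highlighter in step 2; highlighting and sanitizing are deliberately left out of this sketch.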

**CSS styling:**

- Container: dark border, rounded corners
- Header: muted background, monospace font, full file path
- Content: monospace, horizontal scroll
- Additions: `background: rgba(46, 160, 67, 0.15)`
- Deletions: `background: rgba(248, 81, 73, 0.15)`

---

### IMP-BASH: Bash Output Component

**Fulfills:** AC-13, AC-21, AC-22

**Location:** `dashboard/lib/markdown.js` (new function `renderBashResult`)

Renders:

- `stdout` in monospace pre block
- `stderr` in separate block with error styling (if present)
- "Command interrupted" notice (if interrupted flag)

Error state: `is_error` or presence of stderr triggers error styling (red tint, left border).
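As a minimal sketch of that decision logic, the helper below reduces a Bash result to the pieces the renderer needs. It is written in Python for consistency with the server example, the name `bash_view_model` and the returned keys are hypothetical, and treating whitespace-only stderr as absent is an assumption.

```python
def bash_view_model(result):
    """Hypothetical helper: decide which Bash sections to render and how."""
    stdout = result.get("stdout") or ""
    stderr = result.get("stderr") or ""
    has_stderr = bool(stderr.strip())  # assumption: blank stderr counts as absent
    return {
        "stdout": stdout,
        "stderr": stderr if has_stderr else None,
        "interrupted": bool(result.get("interrupted")),
        # Error styling when is_error is set or stderr has content.
        "error": bool(result.get("is_error")) or has_stderr,
    }
```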

---

### IMP-TRUNCATE: Output Truncation

**Fulfills:** AC-17, AC-18

**Truncation Thresholds (match Claude Code):**

| Tool Type | Max Lines | Max Chars |
|-----------|-----------|-----------|
| Bash stdout | 100 | 10000 |
| Bash stderr | 50 | 5000 |
| Read content | 500 | 50000 |
| Grep matches | 100 | 10000 |
| Glob files | 100 | 5000 |

**Note:** These thresholds need verification against Claude Code behavior. May require adjustment based on testing.

**Truncation Helper:**

Takes content string, returns `{ text, truncated, totalLines }`. If truncated, result renderers show "Show full output (N lines)" link.
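A minimal sketch of such a helper follows, in Python for consistency with the server example (the dashboard version would be JavaScript). The name `truncate_output` and the order of checks (line limit first, then character limit) are assumptions; the limits are passed in rather than hard-coded because the thresholds above are still unverified.

```python
def truncate_output(content, max_lines, max_chars):
    """Hypothetical helper: cap output by line count, then by character count."""
    lines = content.splitlines()
    total_lines = len(lines)
    text = "\n".join(lines[:max_lines])
    truncated = total_lines > max_lines
    if len(text) > max_chars:
        text = text[:max_chars]
        truncated = True
    return {"text": text, "truncated": truncated, "totalLines": total_lines}
```

For example, `truncate_output(stdout, 100, 10000)` would apply the Bash stdout row of the table, and `totalLines` supplies the N in "Show full output (N lines)".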

---

### IMP-MODAL: Full Output Modal

**Fulfills:** AC-19, AC-20

**Location:** `dashboard/components/FullOutputModal.js` (new file)

**Structure:**

- Overlay (click to close)
- Modal container (click does NOT close)
- Header: title (tool name + file path), close button
- Content: scrollable pre/code block with syntax highlighting

**Integration:** Modal state managed at App level or ChatMessages level. "Show full output" link sets state with content + metadata.

---

### IMP-ERROR: Error State Styling

**Fulfills:** AC-21, AC-22, AC-23

**Styling:**

- Tool call header: red-tinted background when `result.is_error`
- Status icon: red X instead of green checkmark
- Bash stderr: red text, italic, distinct from stdout
- Overall: left border accent in error color

---

## Rollout Slices

### Slice 1: Design Mockups (Pre-Implementation)

**Goal:** Validate visual design before building

**Deliverables:**

1. Create `/mockups` test route with static data
2. Implement 3-4 design variants (card-based, minimal, etc.)
3. Use real tool result data from session JSONL
4. User reviews and selects preferred design

**Exit Criteria:** Design direction locked

---

### Slice 2: Server-Side Tool Result Parsing

**Goal:** API returns tool results nested in tool_calls

**Deliverables:**

1. Two-pass parsing in `_parse_claude_conversation`
2. Tool calls carry an `id` field with the matching result attached
3. Unit tests for result attachment
4. Handle missing results gracefully (return tool_call without result)

**Exit Criteria:** AC-24, AC-25, AC-26 pass

---

### Slice 3: Basic Expand/Collapse UI

**Goal:** Tool calls are expandable, show raw result content

**Deliverables:**

1. Refactor `renderToolCalls` to `ToolCallList` component
2. Implement expand/collapse with chevron
3. Track expanded state per message
4. Collapse on new assistant message
5. Keep Edit/Write always expanded

**Exit Criteria:** AC-1 through AC-7 pass

---

### Slice 4: Diff Rendering

**Goal:** Edit/Write results render as syntax-highlighted diffs

**Deliverables:**

1. Add diff language to highlight.js
2. Implement `renderDiff` function
3. VS Code dark theme styling
4. Full file path header

**Exit Criteria:** AC-8 through AC-12 pass

---

### Slice 5: Other Tool Types

**Goal:** Bash, Read, Glob, Grep render appropriately

**Deliverables:**

1. `renderBashResult` with stdout/stderr separation
2. `renderFileContent` for Read
3. `renderFileList` for Glob/Grep
4. Generic fallback for unknown tools

**Exit Criteria:** AC-13 through AC-16 pass

---

### Slice 6: Truncation and Modal

**Goal:** Long outputs truncate with modal expansion

**Deliverables:**

1. Truncation helper with Claude Code thresholds
2. "Show full output" link
3. `FullOutputModal` component
4. Syntax highlighting in modal

**Exit Criteria:** AC-17 through AC-20 pass

---

### Slice 7: Error States and Polish

**Goal:** Failed tools visually distinct, edge cases handled

**Deliverables:**

1. Error state styling (red tint)
2. Muted styling for missing results
3. Test with interrupted sessions
4. Cross-browser testing

**Exit Criteria:** AC-21 through AC-23 pass, feature complete

---

## Open Questions

1. **Exact Claude Code truncation thresholds:** need to verify against Claude Code source or experiment
2. **Performance with 100+ tool calls:** monitor after ship, optimize if needed
3. **Codex support timeline:** when should we prioritize v2?

---

## Appendix: Research Findings

### Claude Code JSONL Format

Tool calls and results are stored as separate entries:

```json
// Assistant sends tool_use
{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "toolu_abc", "name": "Edit", "input": {...}}]}}

// Result in separate user entry
{"type": "user", "message": {"content": [{"type": "tool_result", "tool_use_id": "toolu_abc", "content": "Success"}]}, "toolUseResult": {...}}
```

The `toolUseResult` object contains rich structured data varying by tool type.

### Missing Results Statistics

Across 55 sessions with 2,063 tool calls:

- 11 missing results (0.5%)
- Affected tools: Edit (4), Read (2), Bash (1), others

### Interrupt Handling

User interrupts create a separate user message:

```json
{"type": "user", "message": {"content": [{"type": "text", "text": "[Request interrupted by user for tool use]"}]}}
```

Tool results for completed tools are still present; the interrupt message indicates the turn ended early.
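If the server needs to recognize these entries (for example, to explain why a trailing tool call has no result), a check along the following lines may be enough. The helper name `is_interrupt_entry` is hypothetical, and matching on a text prefix rather than the full string is an assumption in case the wording varies between interrupt types.

```python
INTERRUPT_PREFIX = "[Request interrupted by user"  # assumption: prefix match


def is_interrupt_entry(entry):
    """Hypothetical check: is this JSONL entry a user-interrupt marker?"""
    if entry.get("type") != "user":
        return False
    content = (entry.get("message") or {}).get("content")
    if not isinstance(content, list):
        return False
    return any(
        isinstance(block, dict)
        and block.get("type") == "text"
        and str(block.get("text", "")).startswith(INTERRUPT_PREFIX)
        for block in content
    )
```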