Plan: Tool Result Display in AMC Dashboard
Status: Draft (awaiting review and mockup phase)
Author: Claude + Taylor
Created: 2026-02-27
Summary
Add the ability to view tool call results (diffs, bash output, file contents) directly in the AMC dashboard conversation view. Currently, users see that a tool was called but cannot see what it did. This feature brings Claude Code's result visibility to the multi-agent dashboard.
Goals
- See code changes as they happen — diffs from Edit/Write tools always visible
- Debug agent behavior — inspect Bash output, Read content, search results
- Match Claude Code UX — familiar expand/collapse behavior with latest results expanded
Non-Goals (v1)
- Codex agent support (different JSONL format — deferred to v2)
- Copy-to-clipboard functionality
- Virtual scrolling / performance optimization
- Editor integration (clicking paths to open files)
- Accessibility (keyboard navigation, focus management, ARIA labels — deferred to v2)
- Lazy-fetch API for tool results (consider for v2 if payload size becomes an issue)
User Workflows
Workflow 1: Watching an Active Session
- User opens a session card showing an active Claude agent
- Agent calls Edit tool to modify a file
- User immediately sees the diff expanded below the tool call pill
- Agent calls Bash to run tests
- User sees the Bash output expanded; the earlier Edit diff also stays expanded (diffs are always shown)
- Agent sends a text message explaining results
- Bash output collapses because a newer assistant message arrived; the Edit diff stays expanded
Workflow 2: Reviewing a Completed Session
- User opens a completed session to review what the agent did
- All tool calls are collapsed by default (no "latest" assistant message)
- Exception: Edit/Write diffs are still expanded
- User clicks a Bash tool call to see what command ran and its output
- User clicks "Show full output" when output is truncated
- Lightweight modal opens with full scrollable content
- User closes modal and continues reviewing
Workflow 3: Debugging a Failed Tool Call
- Agent runs a Bash command that fails
- Tool result block shows with red-tinted background
- stderr content is visible, clearly marked as error
- User can see what went wrong without leaving the dashboard
Acceptance Criteria
Display Behavior
- AC-1: Tool calls render as expandable elements showing tool name and summary
- AC-2: Clicking a collapsed tool call expands to show its result
- AC-3: Clicking an expanded tool call collapses it
- AC-4: In active sessions, tool results in the most recent assistant message are expanded by default
- AC-5: When a new assistant message arrives, previous non-diff tool results collapse unless the user has manually toggled them in that message
- AC-6: Edit and Write results remain expanded regardless of message age or session status (even if Write only has confirmation text)
- AC-7: In completed sessions, all non-diff tool results start collapsed
- AC-8: Tool calls without results display as non-expandable with muted styling; in active sessions, pending tool calls show a spinner to distinguish in-progress from permanently missing
Diff Rendering
- AC-9: Edit/Write results display structuredPatch data as syntax-highlighted diff; falls back to raw content text if structuredPatch is malformed or absent
- AC-10: Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
- AC-11: Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
- AC-12: Full file path displays above each diff block
- AC-13: Diff context lines use structuredPatch as-is (no recomputation)
Other Tool Types
- AC-14: Bash results display stdout in monospace, stderr separately if present
- AC-15: Bash output with ANSI escape codes renders as colored HTML (via ansi_up)
- AC-16: Read results display file content with syntax highlighting based on file extension
- AC-17: Grep/Glob results display file list with match counts
- AC-18: Unknown tools (WebFetch, Task, etc.) use GenericResult fallback showing raw content
Truncation
- AC-19: Long outputs truncate at configurable line/character thresholds (defaults tuned to approximate Claude Code behavior)
- AC-20: Truncated outputs show "Show full output (N lines)" link
- AC-21: Clicking "Show full output" opens a dedicated lightweight modal
- AC-22: Modal displays full content with syntax highlighting, scrollable
Error States
- AC-23: Failed tool calls display with red-tinted background
- AC-24: Error content (stderr, error messages) is clearly distinguishable from success content
- AC-25: is_error flag from tool_result determines error state
API Contract
- AC-26: /api/conversation response includes tool results nested in tool_calls
- AC-27: Each tool_call has: name, id, input, result (when available)
- AC-28: All tool results conform to a normalized envelope { kind, status, content, is_error }, with tool-specific fields nested in content
Architecture
Why Two-Pass JSONL Parsing
The Claude Code JSONL stores tool_use and tool_result as separate entries linked by tool_use_id. To nest results inside tool_calls for the API response, the server must:
- First pass: Build a map of tool_use_id → toolUseResult
- Second pass: Parse messages, attaching results to matching tool_calls
This adds parsing overhead but keeps the API contract simple. Alternatives considered:
- Streaming/incremental: More complex, doesn't help since we need full conversation anyway
- Client-side joining: Shifts complexity to frontend, increases payload size
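The following is a minimal Python sketch of the two-pass join. It assumes each JSONL line has already been decoded into a dict shaped like the examples in the Appendix. The helper name attach_tool_results is hypothetical. It returns a flat list of tool calls rather than full messages. It also assumes one tool_result per user entry, because toolUseResult sits at the entry level. The real logic would live in _parse_claude_conversation.

```python
from typing import Any


def attach_tool_results(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Two-pass join of tool_result entries onto the tool_use calls they answer.

    Hypothetical helper for illustration: returns a flat list of tool calls
    (name, id, input, and result when one was found) instead of full messages,
    and leaves normalization to a later step.
    """
    # First pass: map tool_use_id -> raw result data from the user entry.
    results: dict[str, dict[str, Any]] = {}
    for entry in entries:
        if entry.get("type") != "user":
            continue
        content = entry.get("message", {}).get("content")
        if not isinstance(content, list):
            continue  # plain-text prompts carry no tool results
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_result":
                results[block["tool_use_id"]] = {
                    "toolUseResult": entry.get("toolUseResult"),
                    "content": block.get("content"),
                    "is_error": bool(block.get("is_error", False)),
                }

    # Second pass: walk assistant entries and attach any matching result.
    tool_calls: list[dict[str, Any]] = []
    for entry in entries:
        if entry.get("type") != "assistant":
            continue
        content = entry.get("message", {}).get("content")
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_use":
                continue
            call = {"name": block["name"], "id": block["id"], "input": block.get("input", {})}
            if block["id"] in results:
                call["result"] = results[block["id"]]
            # A call with no matching result is returned without a "result" key.
            tool_calls.append(call)
    return tool_calls
```

The first pass finishes before any tool_use is visited, so the join does not depend on entry order. A call whose result never arrived comes back without a result key.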
Why Render Everything, Not Virtual Scroll
Sessions typically have 20-80 tool calls. Modern browsers handle hundreds of DOM elements efficiently. Virtual scrolling adds significant complexity (measuring, windowing, scroll position management) for marginal benefit.
Decision: Ship simple, measure real-world performance, optimize if >100ms render times observed.
Why Dedicated Modal Over Inline Expansion
Full output can be thousands of lines. Inline expansion would:
- Push other content out of view
- Make scrolling confusing
- Lose context of surrounding conversation
A modal provides a focused reading experience without disrupting conversation layout.
Why a Normalized Result Contract
Raw toolUseResult shapes vary wildly by tool type — Edit has structuredPatch, Bash has stdout/stderr, Glob has filenames. Passing these raw to the frontend means every renderer must know the exact JSONL format, and adding Codex support (v2) would require duplicating all that branching.
Instead, the server normalizes each result into a stable envelope:
```
{
  "kind": "diff" | "bash" | "file_content" | "file_list" | "generic",
  "status": "success" | "error" | "pending",
  "is_error": bool,
  "content": { ... }  # tool-specific fields, documented per kind
}
```
The frontend switches on kind (5 cases) rather than tool name (unbounded). This also gives us a clean seam for the result_mode query parameter if payload size becomes an issue later.
Component Structure
```
MessageBubble
├── Content (text)
├── Thinking (existing)
└── ToolCallList (new)
    └── ToolCallItem (repeated)
        ├── Header (pill: chevron, name, summary, status)
        └── ResultContent (conditional)
            ├── DiffResult (for Edit/Write)
            ├── BashResult (for Bash)
            ├── FileListResult (for Glob/Grep)
            └── GenericResult (fallback)

FullOutputModal (new, top-level)
├── Header (tool name, file path)
├── Content (full output, scrollable)
└── CloseButton
```
Implementation Specifications
IMP-SERVER: Parse and Attach Tool Results
Fulfills: AC-26, AC-27, AC-28
Location: amc_server/mixins/conversation.py
Changes to _parse_claude_conversation:
Two-pass parsing:
- First pass: Scan all entries and build a map of tool_use_id → toolUseResult
- Second pass: Parse messages as before, but when encountering a tool_use, look up and attach its result
API query parameter: /api/conversation?result_mode=full (default). Future option: result_mode=preview to return truncated previews and reduce payload size without an API-breaking change.
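Here is one way to read the parameter, sketched in Python with only the standard library. The helper name is an assumption. So is the choice to reject unknown modes rather than silently serving full. Rejecting lets the handler answer with an error instead of a payload the client did not ask for.

```python
from urllib.parse import parse_qs, urlparse

# Only "full" exists in v1; "preview" is reserved for a later version.
SUPPORTED_RESULT_MODES = {"full"}


def parse_result_mode(path: str) -> str:
    """Read result_mode from a request path, defaulting to "full".

    Hypothetical helper: raises ValueError for modes the server does not
    implement yet, so the caller can turn it into an error response.
    """
    query = parse_qs(urlparse(path).query)  # blank values are dropped
    mode = query.get("result_mode", ["full"])[0]
    if mode not in SUPPORTED_RESULT_MODES:
        raise ValueError(f"unsupported result_mode: {mode!r}")
    return mode
```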
Normalization step: After looking up the raw toolUseResult, the server normalizes it into the stable envelope before attaching:
```python
{
    "name": "Edit",
    "id": "toolu_abc123",
    "input": {"file_path": "...", "old_string": "...", "new_string": "..."},
    "result": {
        "kind": "diff",
        "status": "success",
        "is_error": False,
        "content": {
            "structuredPatch": [...],
            "filePath": "...",
            "text": "The file has been updated successfully."
        }
    }
}
```
Normalized kind mapping:
| kind | Source Tools | content Fields |
|---|---|---|
| diff | Edit, Write | structuredPatch, filePath, text |
| bash | Bash | stdout, stderr, interrupted |
| file_content | Read | file, type, text |
| file_list | Glob, Grep | filenames, numFiles, truncated, numLines |
| generic | All others | text (raw content string) |
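Below is a Python sketch of the normalization step, driven by the kind mapping table. It rests on a few assumptions:

- text is the tool_result content, already flattened to a string.
- A toolUseResult that is not a dict falls back to the generic kind.
- Only success and error statuses are produced; pending is not covered.

All names here are hypothetical.

```python
from typing import Any, Optional

# Tool name -> normalized kind, per the kind mapping table; others are "generic".
KIND_BY_TOOL = {
    "Edit": "diff", "Write": "diff",
    "Bash": "bash",
    "Read": "file_content",
    "Glob": "file_list", "Grep": "file_list",
}

# Structured fields copied out of toolUseResult for each kind.
FIELDS_BY_KIND = {
    "diff": ("structuredPatch", "filePath"),
    "bash": ("stdout", "stderr", "interrupted"),
    "file_content": ("file", "type"),
    "file_list": ("filenames", "numFiles", "truncated", "numLines"),
}

# Kinds whose content also carries the tool_result text.
KINDS_WITH_TEXT = {"diff", "file_content", "generic"}


def normalize_result(tool_name: str, raw: Any, text: Optional[str], is_error: bool) -> dict[str, Any]:
    """Build the { kind, status, is_error, content } envelope for one result."""
    kind = KIND_BY_TOOL.get(tool_name, "generic")
    if kind != "generic" and not isinstance(raw, dict):
        kind = "generic"  # assumption: unstructured toolUseResult falls back to text
    content: dict[str, Any] = {}
    if kind != "generic":
        content = {name: raw.get(name) for name in FIELDS_BY_KIND[kind]}
    if kind in KINDS_WITH_TEXT:
        content["text"] = text or ""
    return {
        "kind": kind,
        "status": "error" if is_error else "success",
        "is_error": is_error,
        "content": content,
    }
```

Adding Codex support would then mean mapping its raw results into the same envelope, with the renderers left unchanged.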
IMP-TOOLCALL: Expandable Tool Call Component
Fulfills: AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7, AC-8
Location: dashboard/lib/markdown.js (refactor renderToolCalls)
New function: ToolCallItem
Renders a single tool call with:
- Chevron for expand/collapse (when result exists and not Edit/Write)
- Tool name (bold, colored)
- Summary (from existing getToolSummary)
- Status icon (checkmark or X)
- Result content (when expanded)
State Management:
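Track two pieces of state per message: autoExpanded (tool call IDs expanded by the system) and userToggled (tool call ID → expanded or collapsed, as last chosen by the user).
When a new assistant message arrives:
- Compare the latest assistant message ID to the stored ID
- If different, reset autoExpanded to empty for previous messages
- userToggled entries are never reset, so user intent is preserved
- Edit/Write tool calls bypass this logic (always expanded)
Expand/collapse logic: an Edit/Write call (kind diff) is always expanded. Any other tool call follows its userToggled entry if it has one. Otherwise it is expanded only if it is in autoExpanded (latest message). Storing the user's choice, rather than just the fact of a click, lets a click collapse an auto-expanded call (AC-3). In completed sessions autoExpanded stays empty, so non-diff results start collapsed (AC-7).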
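The rules above are modeled below as a small Python class, for illustration only. The real state lives in the dashboard JavaScript, and ExpandState and its methods are hypothetical. State is keyed by tool call ID instead of per message, which is equivalent when IDs are unique. The user's choice is stored as a boolean rather than set membership, so clicking an auto-expanded call can collapse it (AC-3).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExpandState:
    """Hypothetical model of the expand/collapse rules for tool calls."""

    auto_expanded: set[str] = field(default_factory=set)  # latest message only
    user_toggled: dict[str, bool] = field(default_factory=dict)  # never reset
    latest_message_id: Optional[str] = None

    def on_assistant_message(self, message_id: str, tool_call_ids: list[str], session_active: bool) -> None:
        if message_id != self.latest_message_id:
            # A newer assistant message: earlier messages lose auto-expansion.
            self.latest_message_id = message_id
            self.auto_expanded = set()
        if session_active:
            # Only active sessions auto-expand the latest message's results.
            self.auto_expanded.update(tool_call_ids)

    def is_expanded(self, tool_call_id: str, kind: str) -> bool:
        if kind == "diff":
            return True  # Edit/Write results are always expanded
        if tool_call_id in self.user_toggled:
            return self.user_toggled[tool_call_id]  # explicit user choice wins
        return tool_call_id in self.auto_expanded

    def toggle(self, tool_call_id: str, kind: str) -> None:
        if kind != "diff":  # Edit/Write have no chevron, so clicks are ignored
            self.user_toggled[tool_call_id] = not self.is_expanded(tool_call_id, kind)
```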
IMP-DIFF: Diff Rendering Component
Fulfills: AC-9, AC-10, AC-11, AC-12, AC-13
Location: dashboard/lib/markdown.js (new function renderDiff)
Add diff language to highlight.js:
```javascript
// Assumes hljs is the highlight.js instance markdown.js already imports.
import langDiff from 'https://esm.sh/highlight.js@11.11.1/lib/languages/diff';
hljs.registerLanguage('diff', langDiff);
```
Diff Renderer:
- If structuredPatch is present and valid, convert it to unified diff text:
  - Each hunk: a header line @@ -oldStart,oldLines +newStart,newLines @@
  - Followed by the hunk.lines array
- If structuredPatch is missing or malformed, fall back to raw content.text in a monospace block
- Syntax highlight with the hljs diff language
- Sanitize with DOMPurify before rendering
- Wrap in container with file path header
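The hunk-to-text conversion is sketched below in Python for illustration; renderDiff itself is JavaScript. The sketch assumes each entry in hunk.lines already carries its leading ' ', '-' or '+' marker, as in the structuredPatch format produced by the jsdiff library. Confirm that against real session data. A None return signals the fallback to content.text. The function names are hypothetical.

```python
from typing import Any, Optional


def structured_patch_to_unified(patch: Any) -> Optional[str]:
    """Return unified diff text for a structuredPatch, or None if unusable."""
    if not isinstance(patch, list) or not patch:
        return None  # absent or empty: caller falls back to raw text
    out: list[str] = []
    for hunk in patch:
        try:
            header = "@@ -{},{} +{},{} @@".format(
                int(hunk["oldStart"]), int(hunk["oldLines"]),
                int(hunk["newStart"]), int(hunk["newLines"]))
            lines = hunk["lines"]
        except (KeyError, TypeError, ValueError):
            return None  # malformed hunk
        if not isinstance(lines, list) or not all(isinstance(line, str) for line in lines):
            return None
        out.append(header)
        out.extend(lines)  # lines keep their ' ', '-', '+' prefixes
    return "\n".join(out)


def diff_text_for(content: dict[str, Any]) -> str:
    """Prefer the structured patch; otherwise use the raw content text."""
    return structured_patch_to_unified(content.get("structuredPatch")) or content.get("text") or ""
```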
CSS styling:
- Container: dark border, rounded corners
- Header: muted background, monospace font, full file path
- Content: monospace, horizontal scroll
- Additions: background: rgba(46, 160, 67, 0.15)
- Deletions: background: rgba(248, 81, 73, 0.15)
IMP-BASH: Bash Output Component
Fulfills: AC-14, AC-15, AC-23, AC-24
Location: dashboard/lib/markdown.js (new function renderBashResult)
ANSI-to-HTML conversion:
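```javascript
// ansi_up v6+ is an ES module that exposes AnsiUp as a named export;
// verify the import form against whichever version is pinned.
import { AnsiUp } from 'https://esm.sh/ansi_up';

const ansi = new AnsiUp();
const html = ansi.ansi_to_html(bashOutput); // bashOutput: raw stdout/stderr string
```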
The ansi_up library (zero dependencies, ~8KB) converts ANSI escape codes to styled HTML spans, preserving colored test output, progress indicators, and error highlighting from CLI tools.
Renders:
- stdout in a monospace pre block with ANSI colors preserved
- stderr in a separate block with error styling (if present)
- "Command interrupted" notice (if the interrupted flag is set)
Sanitization order (CRITICAL): First convert ANSI to HTML via ansi_up, THEN sanitize with DOMPurify. Sanitizing only before conversion would leave the markup that ansi_up generates unchecked. Sanitizing the final HTML preserves the styled spans while still preventing XSS.
Error state: is_error or presence of stderr triggers error styling (red tint, left border).
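The sketch below covers which blocks renderBashResult shows and when error styling applies, written as a Python view model. The function name is hypothetical. The ANSI conversion and DOMPurify pass described above are deliberately left out.

```python
from typing import Any


def bash_view(content: dict[str, Any], is_error: bool) -> dict[str, Any]:
    """Hypothetical view model: ordered blocks plus the error-styling flag."""
    stdout = content.get("stdout") or ""
    stderr = content.get("stderr") or ""
    blocks: list[tuple[str, str]] = []
    if stdout:
        blocks.append(("stdout", stdout))  # monospace pre, ANSI colors kept
    if stderr:
        blocks.append(("stderr", stderr))  # separate block, error styling
    if content.get("interrupted"):
        blocks.append(("notice", "Command interrupted"))
    # is_error or any stderr output turns on the red tint and left border.
    return {"error": bool(is_error or stderr), "blocks": blocks}
```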
IMP-TRUNCATE: Output Truncation
Fulfills: AC-19, AC-20
Truncation Thresholds (defaults intended to approximate Claude Code):
| Tool Type | Max Lines | Max Chars |
|---|---|---|
| Bash stdout | 100 | 10000 |
| Bash stderr | 50 | 5000 |
| Read content | 500 | 50000 |
| Grep matches | 100 | 10000 |
| Glob files | 100 | 5000 |
Note: These thresholds need verification against Claude Code behavior. May require adjustment based on testing.
Truncation Helper:
Takes a content string and the limits for its tool type, and returns { text, truncated, totalLines }. When truncated is true, the result renderer shows a "Show full output (N lines)" link, where N is totalLines.
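Here is a Python sketch of the truncation helper, using the thresholds from the table. The threshold keys and the function name are hypothetical. Lines are cut first, then characters, and untruncated content is returned unchanged.

```python
from typing import TypedDict


class TruncatedOutput(TypedDict):
    text: str
    truncated: bool
    totalLines: int


# (max_lines, max_chars) defaults from the threshold table; keys are hypothetical.
THRESHOLDS = {
    "bash_stdout": (100, 10_000),
    "bash_stderr": (50, 5_000),
    "read": (500, 50_000),
    "grep": (100, 10_000),
    "glob": (100, 5_000),
}


def truncate_output(content: str, max_lines: int, max_chars: int) -> TruncatedOutput:
    lines = content.splitlines()
    head = "\n".join(lines[:max_lines])  # apply the line limit first
    truncated = len(lines) > max_lines or len(head) > max_chars
    if not truncated:
        return {"text": content, "truncated": False, "totalLines": len(lines)}
    # Then the character limit; totalLines feeds "Show full output (N lines)".
    return {"text": head[:max_chars], "truncated": True, "totalLines": len(lines)}
```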
IMP-MODAL: Full Output Modal
Fulfills: AC-21, AC-22
Location: dashboard/components/FullOutputModal.js (new file)
Structure:
- Overlay (click to close)
- Modal container (click does NOT close)
- Header: title (tool name + file path), close button
- Content: scrollable pre/code block with syntax highlighting
Integration: Modal state managed at App level or ChatMessages level. "Show full output" link sets state with content + metadata.
IMP-ERROR: Error State Styling
Fulfills: AC-23, AC-24, AC-25
Styling:
- Tool call header: red-tinted background when result.is_error is true
- Status icon: red X instead of green checkmark
- Bash stderr: red text, italic, distinct from stdout
- Overall: left border accent in error color
Rollout Slices
Slice 1: Design Mockups (Pre-Implementation)
Goal: Validate visual design before building
Deliverables:
- Create a /mockups test route with static data
- Implement 3-4 design variants (card-based, minimal, etc.)
- Use real tool result data from session JSONL
- User reviews and selects preferred design
Exit Criteria: Design direction locked
Slice 2: Server-Side Tool Result Parsing and Normalization
Goal: API returns normalized tool results nested in tool_calls
Deliverables:
- Two-pass parsing in _parse_claude_conversation
- Normalization layer: raw toolUseResult → { kind, status, is_error, content } envelope
- Tool results attached with id field
- Unit tests for result attachment and normalization per tool type
- Handle missing results gracefully (return tool_call without result)
- Support the result_mode=full query parameter (the only mode for now, but wired up for a future preview mode)
Exit Criteria: AC-26, AC-27, AC-28 pass
Slice 3: Basic Expand/Collapse UI
Goal: Tool calls are expandable, show raw result content
Deliverables:
- Refactor renderToolCalls into a ToolCallList component
- Implement expand/collapse with chevron
- Track expanded state per message
- Collapse on new assistant message
- Keep Edit/Write always expanded
Exit Criteria: AC-1 through AC-8 pass
Slice 4: Diff Rendering
Goal: Edit/Write show beautiful diffs
Deliverables:
- Add diff language to highlight.js
- Implement renderDiff function
- VS Code dark theme styling
- Full file path header
Exit Criteria: AC-9 through AC-13 pass
Slice 5: Other Tool Types
Goal: Bash, Read, Glob, Grep render appropriately
Deliverables:
- Import and configure ansi_up for ANSI-to-HTML conversion
- renderBashResult with stdout/stderr separation and ANSI color preservation
- renderFileContent for Read
- renderFileList for Glob/Grep
- GenericResult fallback for unknown tools (WebFetch, Task, etc.)
Exit Criteria: AC-14 through AC-18 pass
Slice 6: Truncation and Modal
Goal: Long outputs truncate with modal expansion
Deliverables:
- Truncation helper with Claude Code thresholds
- "Show full output" link
- FullOutputModal component
- Syntax highlighting in modal
Exit Criteria: AC-19 through AC-22 pass
Slice 7: Error States and Polish
Goal: Failed tools visually distinct, edge cases handled
Deliverables:
- Error state styling (red tint)
- Muted styling for missing results
- Test with interrupted sessions
- Cross-browser testing
Exit Criteria: AC-23 through AC-25 pass, feature complete
Open Questions
- Exact Claude Code truncation thresholds (resolved): using reasonable defaults with a note to tune via testing. AC-19 updated.
- Performance with 100+ tool calls: monitor after ship, optimize if needed
- Codex support timeline: when should we prioritize v2? The normalized kind contract makes this easier: add Codex normalizers without touching renderers.
- Lazy-fetch for large payloads (resolved): result_mode query parameter wired into API contract. Only full implemented in v1; preview deferred.
Appendix: Research Findings
Claude Code JSONL Format
Tool calls and results are stored as separate entries:
```
// Assistant sends tool_use
{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "toolu_abc", "name": "Edit", "input": {...}}]}}
// Result in separate user entry
{"type": "user", "message": {"content": [{"type": "tool_result", "tool_use_id": "toolu_abc", "content": "Success"}]}, "toolUseResult": {...}}
```
The toolUseResult object contains rich structured data varying by tool type.
Missing Results Statistics
Across 55 sessions with 2,063 tool calls:
- 11 missing results (0.5%)
- Affected tools: Edit (4), Read (2), Bash (1), others
Interrupt Handling
User interrupts create a separate user message:
```json
{"type": "user", "message": {"content": [{"type": "text", "text": "[Request interrupted by user for tool use]"}]}}
```
Tool results for completed tools are still present; the interrupt message indicates the turn ended early.
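A Python sketch for recognizing the interrupt entry follows, for example so the UI can tell that a turn ended early. It matches on the prefix "[Request interrupted by user" rather than the full string, an assumption meant to tolerate wording variants. The helper name is hypothetical.

```python
from typing import Any

# Assumed prefix; the observed message is "[Request interrupted by user for tool use]".
INTERRUPT_MARKER = "[Request interrupted by user"


def is_interrupt_entry(entry: dict[str, Any]) -> bool:
    """True if a decoded JSONL entry is the synthetic user-interrupt message."""
    if entry.get("type") != "user":
        return False
    content = entry.get("message", {}).get("content")
    if isinstance(content, str):
        return content.startswith(INTERRUPT_MARKER)
    if isinstance(content, list):
        return any(
            isinstance(block, dict)
            and block.get("type") == "text"
            and str(block.get("text", "")).startswith(INTERRUPT_MARKER)
            for block in content
        )
    return False
```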