teernisse fb9d4e5b9f chore(plans): update implementation plans

plans/PLAN-tool-result-display.md:
- Add comprehensive plan for displaying tool results inline in
  conversation view, including truncation strategies and expand/collapse
  UI patterns

plans/subagent-visibility.md:
- Mark completed phases and update remaining work items
- Reflects current state of subagent tracking implementation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-28 00:49:02 -05:00

Plan: Tool Result Display in AMC Dashboard

Status: Draft, awaiting review and mockup phase
Author: Claude + Taylor
Created: 2026-02-27

Summary

Add the ability to view tool call results (diffs, bash output, file contents) directly in the AMC dashboard conversation view. Currently, users see that a tool was called but cannot see what it did. This feature brings Claude Code's result visibility to the multi-agent dashboard.

Goals

See code changes as they happen — diffs from Edit/Write tools always visible
Debug agent behavior — inspect Bash output, Read content, search results
Match Claude Code UX — familiar expand/collapse behavior with latest results expanded

Non-Goals (v1)

Codex agent support (different JSONL format — deferred to v2)
Copy-to-clipboard functionality
Virtual scrolling / performance optimization
Editor integration (clicking paths to open files)

User Workflows

Workflow 1: Watching an Active Session

User opens a session card showing an active Claude agent
Agent calls Edit tool to modify a file
User immediately sees the diff expanded below the tool call pill
Agent calls Bash to run tests
User sees the Bash output expanded; the previous Edit diff stays expanded (diffs never auto-collapse)
Agent sends a text message explaining results
Bash output collapses (new assistant message arrived), Edit diff stays expanded

Workflow 2: Reviewing a Completed Session

User opens a completed session to review what the agent did
All tool calls are collapsed by default (no "latest" assistant message)
Exception: Edit/Write diffs are still expanded
User clicks a Bash tool call to see what command ran and its output
User clicks "Show full output" when output is truncated
Lightweight modal opens with full scrollable content
User closes modal and continues reviewing

Workflow 3: Debugging a Failed Tool Call

Agent runs a Bash command that fails
Tool result block shows with red-tinted background
stderr content is visible, clearly marked as error
User can see what went wrong without leaving the dashboard

Acceptance Criteria

Display Behavior

AC-1: Tool calls render as expandable elements showing tool name and summary
AC-2: Clicking a collapsed tool call expands to show its result
AC-3: Clicking an expanded tool call collapses it
AC-4: Tool results in the most recent assistant message are expanded by default
AC-5: When a new assistant message arrives, previous tool results collapse
AC-6: Edit and Write tool diffs remain expanded regardless of message age
AC-7: Tool calls without results display as non-expandable with muted styling

Diff Rendering

AC-8: Edit/Write results display structuredPatch data as syntax-highlighted diff
AC-9: Diff additions render with VS Code dark theme green background (rgba(46, 160, 67, 0.15))
AC-10: Diff deletions render with VS Code dark theme red background (rgba(248, 81, 73, 0.15))
AC-11: Full file path displays above each diff block
AC-12: Diff context lines use structuredPatch as-is (no recomputation)

Other Tool Types

AC-13: Bash results display stdout in monospace, stderr separately if present
AC-14: Read results display file content with syntax highlighting based on file extension
AC-15: Grep/Glob results display file list with match counts
AC-16: WebFetch results display URL and response summary

Truncation

AC-17: Long outputs truncate at thresholds matching Claude Code behavior
AC-18: Truncated outputs show "Show full output (N lines)" link
AC-19: Clicking "Show full output" opens a dedicated lightweight modal
AC-20: Modal displays full content with syntax highlighting, scrollable

Error States

AC-21: Failed tool calls display with red-tinted background
AC-22: Error content (stderr, error messages) is clearly distinguishable from success content
AC-23: is_error flag from tool_result determines error state

API Contract

AC-24: /api/conversation response includes tool results nested in tool_calls
AC-25: Each tool_call has: name, id, input, result (when available)
AC-26: Result structure varies by tool type (documented in IMP-SERVER)

Architecture

Why Two-Pass JSONL Parsing

The Claude Code JSONL stores tool_use and tool_result as separate entries linked by tool_use_id. To nest results inside tool_calls for the API response, the server must:

First pass: Build a map of tool_use_id → toolUseResult
Second pass: Parse messages, attaching results to matching tool_calls

This adds parsing overhead but keeps the API contract simple. Alternatives considered:

Streaming/incremental: More complex, doesn't help since we need full conversation anyway
Client-side joining: Shifts complexity to frontend, increases payload size

Why Render Everything, Not Virtual Scroll

Sessions typically have 20-80 tool calls. Modern browsers handle hundreds of DOM elements efficiently. Virtual scrolling adds significant complexity (measuring, windowing, scroll position management) for marginal benefit.

Decision: Ship the simple version, measure real-world performance, and optimize only if render times above 100 ms are observed.

Why Dedicated Modal Over Inline Expansion

Full output can be thousands of lines. Inline expansion would:

Push other content out of view
Make scrolling confusing
Lose context of surrounding conversation

A modal provides a focused reading experience without disrupting conversation layout.

Component Structure

MessageBubble
├── Content (text)
├── Thinking (existing)
└── ToolCallList (new)
    └── ToolCallItem (repeated)
        ├── Header (pill: chevron, name, summary, status)
        └── ResultContent (conditional)
            ├── DiffResult (for Edit/Write)
            ├── BashResult (for Bash)
            ├── FileListResult (for Glob/Grep)
            └── GenericResult (fallback)

FullOutputModal (new, top-level)
├── Header (tool name, file path)
├── Content (full output, scrollable)
└── CloseButton
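
The ResultContent branch in the tree above amounts to a lookup from tool name to result component. A minimal sketch of that selection follows, written in Python (the server's language); the dashboard version would be JavaScript. The function name `pick_result_component` is hypothetical, and the component names are taken from the tree. Note that Read is not listed in the tree, so in this sketch it falls through to GenericResult.

```python
from typing import Any, Optional

# Tool name -> result component, as listed in the component tree.
RESULT_COMPONENTS = {
    "Edit": "DiffResult",
    "Write": "DiffResult",
    "Bash": "BashResult",
    "Glob": "FileListResult",
    "Grep": "FileListResult",
}


def pick_result_component(tool_call: dict[str, Any]) -> Optional[str]:
    """Return the component name for a tool call, or None if it has no result."""
    if tool_call.get("result") is None:
        # AC-7: calls without results are non-expandable, so nothing is rendered.
        return None
    # Unknown tools use the generic fallback.
    return RESULT_COMPONENTS.get(tool_call.get("name"), "GenericResult")
```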

Implementation Specifications

IMP-SERVER: Parse and Attach Tool Results

Fulfills: AC-24, AC-25, AC-26

Location: amc_server/mixins/conversation.py

Changes to _parse_claude_conversation:

Two-pass parsing:

First pass: Scan all entries, build map of tool_use_id → toolUseResult
Second pass: Parse messages as before, but when encountering tool_use, lookup and attach result
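
The two passes could look roughly like the sketch below. It is not the existing `_parse_claude_conversation` code: the helper names (`load_entries`, `index_tool_results`, `attach_tool_results`) are hypothetical, and it returns a flat list of tool calls rather than full messages. It also assumes that `toolUseResult` is merged with the plain `tool_result` fields, and that a structured `content` field wins over the plain one when both exist.

```python
import json
from typing import Any, Iterable, Iterator


def load_entries(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Parse JSONL lines into dicts, skipping blank or malformed lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict):
            entries.append(entry)
    return entries


def _content_blocks(entry: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield the dict content blocks of an entry's message, if it has any."""
    message = entry.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if isinstance(content, list):
        for block in content:
            if isinstance(block, dict):
                yield block


def index_tool_results(entries: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """First pass: build a map of tool_use_id -> result."""
    results = {}
    for entry in entries:
        if entry.get("type") != "user":
            continue
        structured = entry.get("toolUseResult")
        for block in _content_blocks(entry):
            if block.get("type") != "tool_result" or "tool_use_id" not in block:
                continue
            # Assumption: start from the structured fields, then fill in the
            # plain tool_result fields without overwriting a structured "content".
            result = dict(structured) if isinstance(structured, dict) else {}
            result.setdefault("content", block.get("content"))
            result["is_error"] = bool(block.get("is_error", False))
            results[block["tool_use_id"]] = result
    return results


def attach_tool_results(
    entries: list[dict[str, Any]], results: dict[str, dict[str, Any]]
) -> list[dict[str, Any]]:
    """Second pass: collect tool calls, attaching a result where one exists."""
    tool_calls = []
    for entry in entries:
        if entry.get("type") != "assistant":
            continue
        for block in _content_blocks(entry):
            if block.get("type") != "tool_use":
                continue
            call = {
                "name": block.get("name"),
                "id": block.get("id"),
                "input": block.get("input", {}),
            }
            # Missing results are tolerated: the call is returned without "result".
            if block.get("id") in results:
                call["result"] = results[block["id"]]
            tool_calls.append(call)
    return tool_calls
```

Because the entries are iterated twice, the sketch materializes them as a list first instead of streaming the file.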

Tool call schema after change:

{
    "name": "Edit",
    "id": "toolu_abc123",
    "input": {"file_path": "...", "old_string": "...", "new_string": "..."},
    "result": {
        "content": "The file has been updated successfully.",
        "is_error": False,
        "structuredPatch": [...],
        "filePath": "...",
        # ... other fields from toolUseResult
    }
}

Result Structure by Tool Type:

Tool	Result Fields
Edit	`structuredPatch`, `filePath`, `oldString`, `newString`
Write	`filePath`, content confirmation
Read	`file`, `type`, content in `content` field
Bash	`stdout`, `stderr`, `interrupted`
Glob	`filenames`, `numFiles`, `truncated`
Grep	`content`, `filenames`, `numFiles`, `numLines`

IMP-TOOLCALL: Expandable Tool Call Component

Fulfills: AC-1, AC-2, AC-3, AC-4, AC-5, AC-6, AC-7

Location: dashboard/lib/markdown.js (refactor renderToolCalls)

New function: ToolCallItem

Renders a single tool call with:

Chevron for expand/collapse (when result exists and not Edit/Write)
Tool name (bold, colored)
Summary (from existing getToolSummary)
Status icon (checkmark or X)
Result content (when expanded)

State Management:

Track expanded state per message. When new assistant message arrives:

Compare latest assistant message ID to stored ID
If different, reset expanded set to empty
Edit/Write tools bypass this logic (always expanded via CSS/logic)
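
One way to read these rules is sketched below in Python purely to show the logic; the real state would live in the JavaScript dashboard. `ExpandState` and its method names are hypothetical. The sketch also makes an assumption the plan does not spell out: instead of an "expanded set", it stores the tool calls the user has toggled away from their default, so that a default-expanded result in the latest message can be collapsed by hand.

```python
from typing import Optional

# Tools whose results stay expanded regardless of message age (AC-6).
ALWAYS_EXPANDED_TOOLS = {"Edit", "Write"}


class ExpandState:
    """Tracks which tool results are expanded (hypothetical helper)."""

    def __init__(self) -> None:
        self.latest_assistant_id: Optional[str] = None
        # IDs of tool calls the user has flipped away from their default state.
        self.toggled: set[str] = set()

    def on_latest_assistant_message(self, message_id: str) -> None:
        """Reset user toggles when a different assistant message becomes the latest."""
        if message_id != self.latest_assistant_id:
            self.latest_assistant_id = message_id
            self.toggled.clear()

    def toggle(self, tool_id: str) -> None:
        """Flip a tool call between its default and non-default state."""
        self.toggled ^= {tool_id}

    def is_expanded(
        self, tool_name: str, tool_id: str, message_id: str, has_result: bool
    ) -> bool:
        if not has_result:
            return False  # AC-7: nothing to expand
        if tool_name in ALWAYS_EXPANDED_TOOLS:
            return True  # AC-6: diffs bypass the collapse logic
        expanded_by_default = message_id == self.latest_assistant_id  # AC-4
        # A user toggle inverts the default (AC-2, AC-3).
        return expanded_by_default != (tool_id in self.toggled)
```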

IMP-DIFF: Diff Rendering Component

Fulfills: AC-8, AC-9, AC-10, AC-11, AC-12

Location: dashboard/lib/markdown.js (new function renderDiff)

Add diff language to highlight.js:

import langDiff from 'https://esm.sh/highlight.js@11.11.1/lib/languages/diff';
hljs.registerLanguage('diff', langDiff);

Diff Renderer:

Convert structuredPatch array to unified diff text:
- Each hunk: @@ -oldStart,oldLines +newStart,newLines @@
- Followed by hunk.lines array
Syntax highlight with hljs diff language
Sanitize with DOMPurify before rendering
Wrap in container with file path header

CSS styling:

Container: dark border, rounded corners
Header: muted background, monospace font, full file path
Content: monospace, horizontal scroll
Additions: background: rgba(46, 160, 67, 0.15)
Deletions: background: rgba(248, 81, 73, 0.15)
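
The conversion step under Diff Renderer is small enough to sketch. The version below is in Python only to illustrate the transformation; the planned `renderDiff` lives in `dashboard/lib/markdown.js`. The function names are hypothetical. The hunk field names come from the description above, and the sketch assumes each entry in `hunk["lines"]` already carries its leading " ", "+", or "-" marker.

```python
from typing import Any


def structured_patch_to_diff_text(structured_patch: list[dict[str, Any]]) -> str:
    """Join hunks into unified diff text: one @@ header per hunk, then its lines."""
    out: list[str] = []
    for hunk in structured_patch:
        out.append(
            f"@@ -{hunk['oldStart']},{hunk['oldLines']} "
            f"+{hunk['newStart']},{hunk['newLines']} @@"
        )
        # Assumption: lines are used as-is, prefix included (AC-12: no recomputation).
        out.extend(hunk["lines"])
    return "\n".join(out)


def line_kind(diff_line: str) -> str:
    """Classify a diff line so a renderer can pick the matching background."""
    if diff_line.startswith("@@"):
        return "hunk"
    if diff_line.startswith("+"):
        return "addition"
    if diff_line.startswith("-"):
        return "deletion"
    return "context"
```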

IMP-BASH: Bash Output Component

Fulfills: AC-13, AC-21, AC-22

Location: dashboard/lib/markdown.js (new function renderBashResult)

Renders:

stdout in monospace pre block
stderr in separate block with error styling (if present)
"Command interrupted" notice (if interrupted flag)

Error state: is_error or presence of stderr triggers error styling (red tint, left border).
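
As an illustration of the rule above, the display fields for a Bash result could be derived as follows. This is a Python sketch of the logic only, and `bash_view` is a hypothetical name; the field names `stdout`, `stderr`, `interrupted`, and `is_error` are the ones listed in this plan.

```python
from typing import Any


def bash_view(result: dict[str, Any]) -> dict[str, Any]:
    """Derive display fields for a Bash result (hypothetical view-model)."""
    stdout = result.get("stdout") or ""
    stderr = result.get("stderr") or ""
    has_stderr = bool(stderr.strip())
    return {
        "stdout": stdout,
        "stderr": stderr,
        # Only render the separate stderr block when it has visible content.
        "show_stderr": has_stderr,
        "interrupted": bool(result.get("interrupted", False)),
        # Rule as written in this section: is_error OR presence of stderr.
        "error": bool(result.get("is_error", False)) or has_stderr,
    }
```

One design point to settle during implementation: many commands write warnings or progress to stderr while still succeeding, so the "presence of stderr" half of the rule may flag successful runs as errors. AC-23 names `is_error` alone as the determinant.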

IMP-TRUNCATE: Output Truncation

Fulfills: AC-17, AC-18

Truncation Thresholds (match Claude Code):

Tool Type	Max Lines	Max Chars
Bash stdout	100	10000
Bash stderr	50	5000
Read content	500	50000
Grep matches	100	10000
Glob files	100	5000

Note: These thresholds need verification against Claude Code behavior. May require adjustment based on testing.

Truncation Helper:

Takes content string, returns { text, truncated, totalLines }. If truncated, result renderers show "Show full output (N lines)" link.
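
A minimal sketch of such a helper, in Python for illustration (the dashboard version would be JavaScript). The name `truncate_output` and the `THRESHOLDS` keys are hypothetical; the limits are copied from the table above and remain unverified, as noted. The sketch applies the line limit first and the character limit second, which is an assumption about ordering.

```python
from typing import Any

# (max_lines, max_chars) per output type, copied from the thresholds table.
# The plan marks these values as unverified against Claude Code.
THRESHOLDS = {
    "bash_stdout": (100, 10_000),
    "bash_stderr": (50, 5_000),
    "read_content": (500, 50_000),
    "grep_matches": (100, 10_000),
    "glob_files": (100, 5_000),
}


def truncate_output(content: str, max_lines: int, max_chars: int) -> dict[str, Any]:
    """Return {text, truncated, totalLines} for a block of tool output."""
    lines = content.splitlines()
    # Line limit first: keep at most max_lines lines.
    text = "\n".join(lines[:max_lines])
    truncated = len(lines) > max_lines
    # Character limit second: cut the remaining text if it is still too long.
    if len(text) > max_chars:
        text = text[:max_chars]
        truncated = True
    return {"text": text, "truncated": truncated, "totalLines": len(lines)}
```

A renderer would call it as `truncate_output(stdout, *THRESHOLDS["bash_stdout"])` and show the "Show full output (N lines)" link with `totalLines` when `truncated` is true.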

IMP-MODAL: Full Output Modal

Fulfills: AC-19, AC-20

Location: dashboard/components/FullOutputModal.js (new file)

Structure:

Overlay (click to close)
Modal container (click does NOT close)
Header: title (tool name + file path), close button
Content: scrollable pre/code block with syntax highlighting

Integration: Modal state managed at App level or ChatMessages level. "Show full output" link sets state with content + metadata.

IMP-ERROR: Error State Styling

Fulfills: AC-21, AC-22, AC-23

Styling:

Tool call header: red-tinted background when result.is_error
Status icon: red X instead of green checkmark
Bash stderr: red text, italic, distinct from stdout
Overall: left border accent in error color

Rollout Slices

Slice 1: Design Mockups (Pre-Implementation)

Goal: Validate visual design before building

Deliverables:

Create /mockups test route with static data
Implement 3-4 design variants (card-based, minimal, etc.)
Use real tool result data from session JSONL
User reviews and selects preferred design

Exit Criteria: Design direction locked

Slice 2: Server-Side Tool Result Parsing

Goal: API returns tool results nested in tool_calls

Deliverables:

Two-pass parsing in _parse_claude_conversation
Tool results attached to tool_calls, each carrying its id field
Unit tests for result attachment
Handle missing results gracefully (return tool_call without result)

Exit Criteria: AC-24, AC-25, AC-26 pass

Slice 3: Basic Expand/Collapse UI

Goal: Tool calls are expandable, show raw result content

Deliverables:

Refactor renderToolCalls to ToolCallList component
Implement expand/collapse with chevron
Track expanded state per message
Collapse on new assistant message
Keep Edit/Write always expanded

Exit Criteria: AC-1 through AC-7 pass

Slice 4: Diff Rendering

Goal: Edit/Write show beautiful diffs

Deliverables:

Add diff language to highlight.js
Implement renderDiff function
VS Code dark theme styling
Full file path header

Exit Criteria: AC-8 through AC-12 pass

Slice 5: Other Tool Types

Goal: Bash, Read, Glob, Grep render appropriately

Deliverables:

renderBashResult with stdout/stderr separation
renderFileContent for Read
renderFileList for Glob/Grep
Generic fallback for unknown tools

Exit Criteria: AC-13 through AC-16 pass

Slice 6: Truncation and Modal

Goal: Long outputs truncate with modal expansion

Deliverables:

Truncation helper with Claude Code thresholds
"Show full output" link
FullOutputModal component
Syntax highlighting in modal

Exit Criteria: AC-17 through AC-20 pass

Slice 7: Error States and Polish

Goal: Failed tools visually distinct, edge cases handled

Deliverables:

Error state styling (red tint)
Muted styling for missing results
Test with interrupted sessions
Cross-browser testing

Exit Criteria: AC-21 through AC-23 pass, feature complete

Open Questions

Exact Claude Code truncation thresholds — need to verify against Claude Code source or experiment
Performance with 100+ tool calls — monitor after ship, optimize if needed
Codex support timeline — when should we prioritize v2?

Appendix: Research Findings

Claude Code JSONL Format

Tool calls and results are stored as separate entries:

// Assistant sends tool_use
{"type": "assistant", "message": {"content": [{"type": "tool_use", "id": "toolu_abc", "name": "Edit", "input": {...}}]}}

// Result in separate user entry
{"type": "user", "message": {"content": [{"type": "tool_result", "tool_use_id": "toolu_abc", "content": "Success"}]}, "toolUseResult": {...}}

The toolUseResult object contains rich structured data varying by tool type.

Missing Results Statistics

Across 55 sessions with 2,063 tool calls:

11 missing results (0.5%)
Affected tools: Edit (4), Read (2), Bash (1), others
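
The plan does not say how these figures were gathered. One way a per-session count could be reproduced is sketched below, assuming entries have already been parsed from JSONL into dicts; `count_missing_results` is a hypothetical helper.

```python
from typing import Any, Iterable


def count_missing_results(entries: Iterable[dict[str, Any]]) -> tuple[int, int]:
    """Return (total tool calls, tool calls with no matching tool_result)."""
    used: set[str] = set()
    answered: set[str] = set()
    for entry in entries:
        message = entry.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use" and "id" in block:
                used.add(block["id"])
            elif block.get("type") == "tool_result" and "tool_use_id" in block:
                answered.add(block["tool_use_id"])
    # A call is "missing" when no tool_result references its id.
    return len(used), len(used - answered)
```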

Interrupt Handling

User interrupts create a separate user message:

{"type": "user", "message": {"content": [{"type": "text", "text": "[Request interrupted by user for tool use]"}]}}

Tool results for completed tools are still present; the interrupt message indicates the turn ended early.
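
Detecting that marker is a simple check on the entry shape shown above. A minimal sketch, assuming the marker text is matched exactly as it appears in the example; `is_interrupt_entry` is a hypothetical helper name.

```python
from typing import Any

# Marker text as it appears in the example entry above.
INTERRUPT_TEXT = "[Request interrupted by user for tool use]"


def is_interrupt_entry(entry: dict[str, Any]) -> bool:
    """True if a user entry contains the interrupt marker as a text block."""
    if entry.get("type") != "user":
        return False
    message = entry.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, list):
        return False
    return any(
        isinstance(block, dict)
        and block.get("type") == "text"
        and block.get("text") == INTERRUPT_TEXT
        for block in content
    )
```

A parser could use this to tell "no result because the turn was interrupted" apart from other missing results when choosing the muted styling.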
