Rewrite session discovery to be filesystem-first, addressing the widespread
bug where Claude Code's sessions-index.json files are unreliable (87 MB of
unindexed sessions, 17% loss rate across all projects).
Architecture: Three-tier metadata lookup
Tier 1 - Index validation (instant):
- Parse sessions-index.json into Map<sessionId, IndexEntry>
- Validate entry.modified against actual file stat.mtimeMs
- Use 1s tolerance to account for ISO string → filesystem mtime rounding
- Trust content fields only (messageCount, summary, firstPrompt)
- Timestamps always come from fs.stat, never from index
Tier 2 - Persistent cache hit (instant):
- Check MetadataCache by (filePath, mtimeMs, size)
- If match, use cached metadata
- Survives server restarts
Tier 3 - Full JSONL parse (~5-50ms/file):
- Call extractSessionMetadata() with shared parser helpers
- Cache result for future lookups
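The three tiers above can be read as a single fall-through decision per file. The following is a minimal sketch under stated assumptions, not the actual implementation: resolveMetadata, the Tier type, and the interface shapes are hypothetical names chosen for illustration, and the parse callback stands in for extractSessionMetadata().

```typescript
// Hypothetical shapes; only the field names come from the commit message.
interface Metadata { messageCount: number; summary: string; firstPrompt: string }
interface IndexEntry extends Metadata { modified: string } // ISO string from the index
interface CacheEntry extends Metadata { mtimeMs: number; size: number }
interface FileStat { mtimeMs: number; size: number }

type Tier = "index" | "cache" | "parse";

const MTIME_TOLERANCE_MS = 1000; // the 1s tolerance described for Tier 1

function resolveMetadata(
  stat: FileStat,
  indexEntry: IndexEntry | undefined,
  cacheEntry: CacheEntry | undefined,
  parse: () => Metadata, // stand-in for the Tier 3 JSONL parse
): { tier: Tier; metadata: Metadata } {
  // Tier 1: trust index content fields only if its timestamp matches the file.
  if (indexEntry) {
    const indexed = Date.parse(indexEntry.modified);
    if (!Number.isNaN(indexed) && Math.abs(indexed - stat.mtimeMs) <= MTIME_TOLERANCE_MS) {
      const { messageCount, summary, firstPrompt } = indexEntry;
      return { tier: "index", metadata: { messageCount, summary, firstPrompt } };
    }
  }
  // Tier 2: the persistent cache must match both mtimeMs and size exactly.
  if (cacheEntry && cacheEntry.mtimeMs === stat.mtimeMs && cacheEntry.size === stat.size) {
    const { messageCount, summary, firstPrompt } = cacheEntry;
    return { tier: "cache", metadata: { messageCount, summary, firstPrompt } };
  }
  // Tier 3: fall back to parsing the file.
  return { tier: "parse", metadata: parse() };
}
```

Timestamps are deliberately absent from the returned metadata: in this design they always come from the stat result, never from the index or cache.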
Key correctness guarantees:
- All .jsonl files appear regardless of index state
- SessionEntry timestamps always from fs.stat (list ordering never stale)
- Message counts exact (shared helpers ensure parser parity)
- Duration computed from JSONL timestamps, not index
Performance:
- Bounded concurrency: 32 concurrent operations per project
- mapWithLimit() prevents file handle exhaustion
- Warm start <1s (stat all files, in-memory lookups)
- Cold start ~3-5s for 3,103 files (stat + parse phases)
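For readers unfamiliar with bounded concurrency, a helper like mapWithLimit() is commonly built as a small worker pool. The signature below is an assumption, since the commit message names the helper but does not show it; treat this as one plausible shape rather than the shipped code.

```typescript
// Sketch of a bounded-concurrency map: at most `limit` calls to fn are in flight.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // next unclaimed index, shared by all workers

  async function worker(): Promise<void> {
    // Claiming an index is synchronous, so two workers never take the same one.
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as the input, regardless of completion order
}
```

With a limit of 32, no more than 32 stat or read calls are pending at once, which is what keeps the process under the file-descriptor ceiling.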
TOCTOU handling:
- Files that disappear between readdir and stat: silently skipped
- Files that disappear between stat and read: silently skipped
- File actively being written: partial parse handled gracefully
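The "silently skipped" cases usually come down to treating ENOENT as an expected outcome rather than an error. The sketch below uses synchronous fs calls for brevity; readIfPresent and isMissingFileError are hypothetical names, and the real discovery code presumably uses the async equivalents.

```typescript
import { readFileSync, statSync } from "node:fs";

// True for the "file vanished" error that Node reports as code ENOENT.
function isMissingFileError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "ENOENT";
}

// Returns null when the file disappears before or between stat and read;
// any other failure (permissions, I/O) is rethrown so it is not hidden.
function readIfPresent(
  filePath: string,
): { mtimeMs: number; size: number; content: string } | null {
  try {
    const stat = statSync(filePath);
    const content = readFileSync(filePath, "utf8");
    return { mtimeMs: stat.mtimeMs, size: stat.size, content };
  } catch (err) {
    if (isMissingFileError(err)) return null;
    throw err;
  }
}
```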
Include PRD document that drove this implementation with detailed
requirements, edge cases, and verification plan.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduce MetadataCache class in metadata-cache.ts that persists extracted
session metadata to ~/.cache/session-viewer/metadata.json for fast warm
starts across server restarts.
Key features:
- Invalidation keyed on (mtimeMs, size): If either changes, entry is
  re-extracted via Tier 3 parsing. This catches both content changes
  and file truncation/corruption.
- Dirty-flag write-behind: Only writes to disk when entries have changed,
  coalescing multiple discovery passes into a single write operation.
- Atomic writes: Uses temp file + rename pattern to prevent corruption
  from crashes during write. Safe for concurrent server restarts.
- Stale entry pruning: Removes entries for files that no longer exist
  on disk during the save operation.
- Graceful degradation: Missing or corrupt cache file triggers fallback
  to Tier 3 extraction for all files (cache rebuilt on next save).
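The invalidation, dirty-flag, and pruning behaviors listed above interact in a way that is easier to see in code. This is an in-memory sketch only: MetadataCacheSketch and its method signatures are assumptions, and the injected write callback marks the spot where the real class would do its temp file + rename.

```typescript
interface CachedMetadata { mtimeMs: number; size: number; messageCount: number }

class MetadataCacheSketch {
  private entries = new Map<string, CachedMetadata>();
  private dirty = false;

  // A hit requires both mtimeMs and size to match; otherwise the caller re-extracts.
  get(filePath: string, mtimeMs: number, size: number): CachedMetadata | undefined {
    const entry = this.entries.get(filePath);
    return entry && entry.mtimeMs === mtimeMs && entry.size === size ? entry : undefined;
  }

  set(filePath: string, entry: CachedMetadata): void {
    this.entries.set(filePath, entry);
    this.dirty = true;
  }

  // Prunes entries for files that no longer exist, then writes only if
  // something changed. Returns whether a write happened.
  save(existingPaths: ReadonlySet<string>, write: (json: string) => void): boolean {
    Array.from(this.entries.keys()).forEach((filePath) => {
      if (!existingPaths.has(filePath)) {
        this.entries.delete(filePath);
        this.dirty = true;
      }
    });
    if (!this.dirty) return false;
    const entries: Record<string, CachedMetadata> = {};
    this.entries.forEach((value, key) => { entries[key] = value; });
    write(JSON.stringify({ version: 1, entries }));
    this.dirty = false;
    return true;
  }
}
```

Calling save() twice in a row without an intervening set() performs a single write, which is the coalescing effect the dirty flag is meant to provide.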
Cache file format:
  {
    "version": 1,
    "entries": {
      "/path/to/session.jsonl": {
        "mtimeMs": 1234567890,
        "size": 12345,
        "messageCount": 42,
        "firstPrompt": "...",
        "summary": "...",
        "firstTimestamp": "...",
        "lastTimestamp": "..."
      }
    }
  }
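The format above maps directly onto a pair of TypeScript interfaces, and graceful degradation amounts to returning an empty cache whenever the file cannot be trusted. parseCacheFile is a hypothetical helper name, and treating an unknown version as a cache miss is an assumption of this sketch rather than documented behavior.

```typescript
interface CacheEntry {
  mtimeMs: number;
  size: number;
  messageCount: number;
  firstPrompt: string;
  summary: string;
  firstTimestamp: string;
  lastTimestamp: string;
}

interface CacheFile { version: 1; entries: Record<string, CacheEntry> }

// Never throws: corrupt JSON, a wrong version, or a malformed entries field
// all yield an empty cache, so every file falls through to Tier 3.
function parseCacheFile(text: string): CacheFile {
  const empty: CacheFile = { version: 1, entries: {} };
  try {
    const parsed: unknown = JSON.parse(text);
    if (typeof parsed !== "object" || parsed === null) return empty;
    const file = parsed as { version?: unknown; entries?: unknown };
    if (file.version !== 1) return empty;
    if (typeof file.entries !== "object" || file.entries === null || Array.isArray(file.entries)) {
      return empty;
    }
    // Individual entries are not validated here; a stricter loader could check each field.
    return { version: 1, entries: file.entries as Record<string, CacheEntry> };
  } catch {
    return empty;
  }
}
```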
Test coverage includes:
- Cache hit/miss/invalidation behavior
- Dirty flag triggers write only when entries changed
- Concurrent save coalescing
- Stale entry pruning on save
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduce extractSessionMetadata() in a new session-metadata.ts module
that extracts only what the list view needs from JSONL files:
- messageCount: Uses shared countMessagesForLine() for exact parity
- firstPrompt: First non-system-reminder user message, truncated to 200 chars
- summary: Last type="summary" line's summary field
- firstTimestamp/lastTimestamp: For duration computation
Design goals:
- Parser parity: Uses forEachJsonlLine() and countMessagesForLine() from
  session-parser.ts, ensuring list counts always match detail-view counts
- No string building: Avoids JSON.stringify and markdown processing
- 2-3x faster than full parse: Only captures metadata, skips content
- Graceful degradation: Handles malformed lines identically to full parser
This is the Tier 3 data source for JSONL-first session discovery. When
neither the sessions-index.json nor the persistent cache has valid data,
this function extracts fresh metadata from the file.
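As a rough illustration of the single-pass extraction described above, the sketch below collects firstPrompt, summary, and the two timestamps. The JSONL line shape (type, timestamp, summary, message.content) and the "<system-reminder>" prefix check are assumptions made for this example; message counting is omitted because the real function delegates it to countMessagesForLine().

```typescript
interface SessionMetadataSketch {
  firstPrompt: string;
  summary: string;
  firstTimestamp: string;
  lastTimestamp: string;
}

// Assumed line shape; every field is optional because lines vary by type.
interface JsonlLine {
  type?: unknown;
  timestamp?: unknown;
  summary?: unknown;
  message?: { content?: unknown } | null;
}

function extractMetadataSketch(content: string): SessionMetadataSketch {
  const meta: SessionMetadataSketch = {
    firstPrompt: "", summary: "", firstTimestamp: "", lastTimestamp: "",
  };
  for (const rawLine of content.split("\n")) {
    if (rawLine.trim() === "") continue;
    let parsed: unknown;
    try { parsed = JSON.parse(rawLine); } catch { continue; } // skip malformed lines
    if (typeof parsed !== "object" || parsed === null) continue;
    const line = parsed as JsonlLine;

    if (typeof line.timestamp === "string") {
      if (meta.firstTimestamp === "") meta.firstTimestamp = line.timestamp;
      meta.lastTimestamp = line.timestamp;
    }
    // The last summary line wins.
    if (line.type === "summary" && typeof line.summary === "string") meta.summary = line.summary;

    const text = line.message?.content;
    if (meta.firstPrompt === "" && line.type === "user" && typeof text === "string"
        && !text.startsWith("<system-reminder>")) {
      meta.firstPrompt = text.slice(0, 200); // truncated to 200 chars
    }
  }
  return meta;
}
```

Duration then follows as Date.parse(lastTimestamp) - Date.parse(firstTimestamp) whenever both fields are non-empty.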
Test coverage includes:
- messageCount matches parseSessionContent().length on sample fixtures
- Duration extraction from JSONL timestamps
- firstPrompt extraction skips system-reminder content
- Empty files return zero counts and empty strings
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduce three shared helpers in session-parser.ts that both the full
parser and the lightweight metadata extractor can use:
- forEachJsonlLine(content, onLine): Iterates JSONL lines with consistent
  malformed-line handling. Skips invalid JSON lines identically to how
  parseSessionContent handles them. Returns parse error count for diagnostics.
- countMessagesForLine(parsed): Returns the number of messages a single
  JSONL line expands into, using the same classification rules as the
  full parser. User arrays expand tool_result and text blocks; assistant
  arrays expand thinking, text, and tool_use.
- classifyLine(parsed): Classifies a parsed line into one of 8 types
  (user, assistant, system, progress, summary, file_snapshot, queue, other).
The internal extractMessages() function now uses these shared helpers,
ensuring no behavior change while enabling the upcoming metadata extraction
service to reuse the same logic. This guarantees list counts can never drift
from detail-view counts, regardless of future parser changes.
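To make the shared-helper idea concrete, here is a simplified sketch of the iteration and counting helpers. The names carry a Sketch suffix because the bodies are guesses: string content is assumed to count as one message, and only user and assistant lines are counted, whereas the real countMessagesForLine() also covers the other line types.

```typescript
type ContentBlock = { type?: string } | null;
interface ParsedLine { type?: string; message?: { content?: string | ContentBlock[] } }

// Calls onLine for each parseable JSON object line; returns the number of
// lines that failed to parse (for diagnostics).
function forEachJsonlLineSketch(
  content: string,
  onLine: (parsed: ParsedLine, lineIndex: number) => void,
): number {
  let parseErrors = 0;
  const lines = content.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const raw = lines[i].trim();
    if (raw === "") continue;
    let parsed: unknown;
    try { parsed = JSON.parse(raw); } catch { parseErrors++; continue; }
    if (typeof parsed === "object" && parsed !== null) onLine(parsed as ParsedLine, i);
  }
  return parseErrors;
}

const USER_BLOCKS = new Set(["tool_result", "text"]);
const ASSISTANT_BLOCKS = new Set(["thinking", "text", "tool_use"]);

// Counts how many messages one line expands into (user/assistant only here).
function countMessagesSketch(parsed: ParsedLine): number {
  if (parsed.type !== "user" && parsed.type !== "assistant") return 0;
  const content = parsed.message?.content;
  if (typeof content === "string") return 1;
  if (!Array.isArray(content)) return 0;
  const allowed = parsed.type === "user" ? USER_BLOCKS : ASSISTANT_BLOCKS;
  return content.filter(
    (b) => b != null && typeof b.type === "string" && allowed.has(b.type),
  ).length;
}
```

Because both the full parser and the metadata extractor would call the same two functions, a change to the expansion rules lands in both code paths at once.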
Test coverage includes:
- Malformed line handling parity with full parser
- Parse error counting for truncated/corrupted files
- countMessagesForLine output matches extractMessages().length
- Edge cases: empty files, progress events, array content expansion
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Test fixture updates:
- Add toolUseId fields (toolu_read1, toolu_edit1) to tool_use blocks
- Add parentToolUseID-linked progress events for read and edit tools
- Add orphaned SessionStart progress event (no parent)
- Update tool_result references to match new toolUseId values
- Add bash_progress and mcp_progress subtypes for subtype derivation
session-parser tests (7 new):
- toolUseId extraction from tool_use blocks with and without id field
- parentToolUseId and progressSubtype extraction from hook_progress
- Subtype derivation for bash_progress, mcp_progress, agent_progress
- Fallback to "hook" for unknown data types
- Undefined parentToolUseId when field is absent
progress-grouper tests (7 new):
- Partition parented progress into toolProgress map
- Remove parented progress from filtered messages array
- Keep orphaned progress (no parentToolUseId) in main stream
- Keep progress with invalid parentToolUseId (no matching tool_call)
- Empty input handling
- Sort each group by rawIndex
- Multiple tool_call parents tracked independently
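The grouping rules these tests pin down can be summarized in one function. The sketch below is reconstructed from the test descriptions alone: groupProgress and the Message shape are hypothetical, while the field and category names (rawIndex, toolUseId, parentToolUseId, tool_call, hook_progress) are taken from the surrounding commit messages.

```typescript
interface Message {
  rawIndex: number;
  category: string;
  toolUseId?: string;
  parentToolUseId?: string;
}

function groupProgress(
  messages: readonly Message[],
): { messages: Message[]; toolProgress: Map<string, Message[]> } {
  // First pass: collect the ids of every tool_call so parents can be validated.
  const toolCallIds = new Set<string>();
  for (const m of messages) {
    if (m.category === "tool_call" && m.toolUseId !== undefined) toolCallIds.add(m.toolUseId);
  }

  const kept: Message[] = [];
  const toolProgress = new Map<string, Message[]>();
  for (const m of messages) {
    const parent = m.parentToolUseId;
    if (m.category === "hook_progress" && parent !== undefined && toolCallIds.has(parent)) {
      // Parented progress moves out of the main stream into its parent's group.
      const group = toolProgress.get(parent) ?? [];
      group.push(m);
      toolProgress.set(parent, group);
    } else {
      // Orphans and progress with an unknown parent stay in the main stream.
      kept.push(m);
    }
  }
  toolProgress.forEach((group) => group.sort((a, b) => a.rawIndex - b.rawIndex));
  return { messages: kept, toolProgress };
}
```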
agent-progress-parser tests (full suite):
- Parse user text events with prompt/agentId metadata extraction
- Parse tool_use blocks into AgentToolCall events
- Parse tool_result blocks with content extraction
- Parse text content as text_response with line counting
- Handle multiple content blocks in single turn
- Post-pass tool_result→tool_call linking (sourceTool, language)
- Empty input and malformed JSON → raw_content fallback
- stripLineNumbers for cat -n prefixed output
- summarizeToolCall for Read, Grep, Glob, Bash, Task, WarpGrep, etc.
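For context on what stripLineNumbers has to undo: cat -n prefixes each line with a right-aligned number followed by a tab. A minimal sketch follows, assuming exactly that prefix format; the real helper may accept other separators, and the Sketch suffix marks this as illustrative.

```typescript
// Removes a leading "<spaces><digits><tab>" prefix from every line.
// Lines without such a prefix are left untouched. Caveat: a genuine line
// that itself begins with digits and a tab would also be stripped.
function stripLineNumbersSketch(text: string): string {
  return text
    .split("\n")
    .map((line) => line.replace(/^\s*\d+\t/, ""))
    .join("\n");
}
```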
ProgressBadge component tests:
- Collapsed state shows pill counts, hides content
- Expanded state shows all event content via markdown
- Subtype counting accuracy
- Agent-only events route to AgentProgressView
AgentProgressView component tests:
- Prompt banner rendering with truncation
- Agent ID and turn count display
- Summary rows with timestamps and tool names
- Click-to-expand drill-down content
html-exporter tests (8 new):
- Collapsible rendering for thinking, tool_call, tool_result
- Toggle button and JavaScript inclusion
- Non-collapsible messages lack collapse attributes
- Diff content detection and highlighting
- Progress badge rendering with toolProgress data
filters tests (2 new):
- hook_progress included/excluded by category toggle
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Export a new countSensitiveMessages() function that returns how many
messages in an array contain at least one sensitive pattern match.
Checks both content and toolInput fields, counting each message at
most once regardless of how many matches it contains.
Tests verify zero counts for clean messages, correct counting with
mixed sensitive/clean messages, and the single-count-per-message
invariant when multiple secrets appear in one message.
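The single-count-per-message invariant falls out naturally from filtering messages rather than summing matches. In this sketch the two patterns are placeholders chosen for illustration, not the module's real pattern list, and the function carries a Sketch suffix for the same reason.

```typescript
interface Message { content: string; toolInput?: string }

// Placeholder patterns (no /g flag, so RegExp.test keeps no state between calls).
const SAMPLE_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/,
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
];

// Counts messages with at least one match in content or toolInput.
function countSensitiveMessagesSketch(
  messages: readonly Message[],
  patterns: readonly RegExp[] = SAMPLE_PATTERNS,
): number {
  return messages.filter((m) =>
    patterns.some(
      (p) => p.test(m.content) || (m.toolInput !== undefined && p.test(m.toolInput)),
    ),
  ).length;
}
```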
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Category toggles and the auto-redact checkbox now survive page
reloads. On mount, useFilters reads from localStorage keys
session-viewer:enabledCategories and session-viewer:autoRedact,
falling back to defaults when storage is empty, corrupted, or
contains invalid category names. Each state change writes back
to localStorage in a useEffect.
Tests cover round-trip persistence, invalid data recovery, corrupted
JSON fallback, and the boolean coercion for auto-redact.
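The fallback rules can be isolated from React and from localStorage itself by parsing the raw stored string in a pure function. The sketch below assumes the categories are stored as a JSON array of strings and that auto-redact is stored as the string "true" or "false"; both function names are hypothetical.

```typescript
// raw is what localStorage.getItem() returned: a string, or null when unset.
function loadEnabledCategories(
  raw: string | null,
  validCategories: ReadonlySet<string>,
  defaults: readonly string[],
): string[] {
  if (raw === null) return defaults.slice();
  try {
    const parsed: unknown = JSON.parse(raw);
    if (!Array.isArray(parsed)) return defaults.slice();
    // Any unknown category name invalidates the whole stored value.
    const allValid = parsed.every((c) => typeof c === "string" && validCategories.has(c));
    return allValid ? (parsed as string[]) : defaults.slice();
  } catch {
    return defaults.slice(); // corrupted JSON
  }
}

// Anything other than the exact string "true" is treated as false.
function loadAutoRedact(raw: string | null): boolean {
  return raw === "true";
}
```

Inside the hook, the raw values would come from localStorage.getItem("session-viewer:enabledCategories") and localStorage.getItem("session-viewer:autoRedact").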
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change the default-hidden message categories from [thinking,
hook_progress] to [tool_result, system_message, hook_progress,
file_snapshot]. This hides the verbose machine-oriented categories
by default while keeping thinking blocks visible, since they contain
useful reasoning context that users typically want to see.
Also rename the "summary" category label from "Summaries" to
"Compactions" to better reflect what Claude's summary messages
actually represent (context-window compaction artifacts).
Tests updated to match the new defaults: the filter test now
asserts that tool_result, system_message, hook_progress, and
file_snapshot are all excluded, producing 5 visible messages
instead of the previous 7.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace hardcoded absolute paths in test assertions with dynamically
constructed paths matching the temp directory. This makes tests portable
across environments where path.resolve() produces different results.
Add test verifying that absolute paths pointing outside the projects
directory (e.g. /etc/shadow.jsonl) are rejected by the discovery filter.
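A containment check of this kind is often written with path.resolve and path.relative. The sketch below is one way to express it and is not necessarily how the discovery filter does so; isInsideDir is a hypothetical name.

```typescript
import * as path from "node:path";

// True only when candidate resolves to a location strictly inside baseDir.
function isInsideDir(baseDir: string, candidate: string): boolean {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, candidate); // an absolute candidate ignores base
  const rel = path.relative(base, target);
  if (rel === "" || path.isAbsolute(rel)) return false; // base itself, or another drive
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}
```

Building test paths from a temp directory, as this commit does, keeps assertions like these independent of what path.resolve() returns on a given machine.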
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Visual overhaul of exported HTML to match the new client dark design:
- Replace category-specific CSS classes with inline border/dot/text styles
  from a CATEGORY_STYLES map matching client-side colors
- Add message header layout with category dot, label, and timestamp
- Add Inter font family, refined prose typography, and proper code styling
- Add print-friendly media query
- Redesign redacted divider with SVG eye-slash icon and red accent
- Add SVG icons to session header metadata (project, date, message count)
- Fix singular/plural for '1 message' vs 'N messages'
Performance: Skip markdown parsing for hook_progress, tool_result, and
file_snapshot categories (structured data). Render as preformatted text
instead, avoiding expensive marked.parse() on large JSON blobs (~300ms each).
Replace local escapeHtml with shared/escape-html module. Add formatTimestamp
helper. Add cast safety comment for marked.parse() sync usage.
Update test to verify singular message count ('1 message' not '1 messages').
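The markdown skip and the singular/plural fix are both small enough to sketch. Here renderMarkdown is an injected stand-in for marked.parse(), and renderBody, formatMessageCount, and this local escapeHtml are illustrative names rather than the exporter's actual functions.

```typescript
// Categories whose content is structured data, rendered as preformatted text.
const PREFORMATTED_CATEGORIES = new Set(["hook_progress", "tool_result", "file_snapshot"]);

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Skips the markdown renderer entirely for structured categories.
function renderBody(
  category: string,
  content: string,
  renderMarkdown: (markdown: string) => string,
): string {
  if (PREFORMATTED_CATEGORIES.has(category)) {
    return "<pre>" + escapeHtml(content) + "</pre>";
  }
  return renderMarkdown(content);
}

function formatMessageCount(count: number): string {
  return count + " " + (count === 1 ? "message" : "messages");
}
```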
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The keyword pre-filter used case-sensitive string matching for all patterns,
but several regex patterns use the /i flag (e.g. generic_api_key). As a result,
inputs like 'ApiKey = "secret"' failed the keyword check for 'api_key', so
the regex never ran and the secret was not redacted.
Changes:
- Add caseInsensitive parameter to hasKeyword() that lowercases both content
  and keywords before comparison
- Detect /i flag on pattern regex and pass it through automatically
- Narrow IP address keywords from ["."] to ["0.", "1.", ..., "9."] to reduce
  false-positive regex invocations on content containing periods
- Fix email regex character class [A-Z|a-z] → [A-Za-z] (the pipe was literal)
- Add clarifying comment on url_with_creds pattern
- Add test cases for mixed-case and UPPER_CASE key assignments
- Relax SECRET_KEY test assertion to accept either redaction label
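The interaction between the keyword pre-filter and the /i flag is easiest to see side by side. In the sketch below, hasKeyword follows the parameter described in this commit, while the Pattern shape, mightMatch, and the sample regex are assumptions made for illustration.

```typescript
interface Pattern { name: string; regex: RegExp; keywords: string[] }

// Cheap substring check that decides whether the regex is worth running.
function hasKeyword(
  content: string,
  keywords: readonly string[],
  caseInsensitive = false,
): boolean {
  if (caseInsensitive) {
    const lowered = content.toLowerCase();
    return keywords.some((k) => lowered.indexOf(k.toLowerCase()) !== -1);
  }
  return keywords.some((k) => content.indexOf(k) !== -1);
}

// The /i flag on the regex decides how the keywords are compared.
function mightMatch(content: string, pattern: Pattern): boolean {
  return hasKeyword(content, pattern.keywords, pattern.regex.ignoreCase);
}

// Digit-plus-dot keywords: plain prose with periods no longer triggers the IP regex.
const IP_KEYWORDS = ["0.", "1.", "2.", "3.", "4.", "5.", "6.", "7.", "8.", "9."];
```

The email character-class fix is a separate issue: inside brackets the pipe is literal, so /[A-Z|a-z]/ also matches "|", whereas /[A-Za-z]/ does not.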
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Security: Reject session paths containing '..' traversal segments or
non-.jsonl extensions before resolving them. This prevents a malicious
sessions-index.json from tricking the viewer into reading arbitrary files.
Performance: Process all project directories concurrently with Promise.all
instead of sequentially awaiting each one. Each directory's stat + readFile
is independent I/O that benefits from parallelism.
Add test case verifying that traversal paths and non-JSONL paths are rejected
while valid paths pass through.
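The rejection rule described here can be expressed as a pure string check that runs before any path resolution. This is a sketch of the idea, with isAcceptableSessionPath as a hypothetical name; it treats ".." as a traversal only when it is an entire path segment.

```typescript
// Rejects paths with a ".." segment or without a .jsonl extension.
// Both "/" and "\" are treated as separators so the check is platform-neutral.
function isAcceptableSessionPath(sessionPath: string): boolean {
  const segments = sessionPath.split(/[\\/]/);
  if (segments.some((segment) => segment === "..")) return false;
  return sessionPath.endsWith(".jsonl");
}
```

Checking before resolving matters because resolution normalizes ".." away, after which the traversal is no longer visible in the string.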
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>