Cover the core byte-level JSONL parser with targeted unit tests and a
fuzz harness to prevent regressions and catch panics on malformed input.

Unit tests:
- TestParseFile_UserMessages: verifies user message counting and
  ProjectPath extraction from cwd field
- TestParseFile_AssistantDedup: confirms message-ID-based deduplication
  where the last entry wins (handles edits/retries)
- TestParseFile_TimeRange: validates StartTime/EndTime tracking across
  out-of-order timestamps
- TestParseFile_SystemDuration: tests turn_duration aggregation from
  system entries (durationMs -> DurationSecs conversion)
- TestParseFile_EmptyFile: ensures zero stats without errors on empty input
- TestParseFile_MalformedLines: confirms graceful skip of unparseable
  lines without aborting the entire file
- TestParseFile_CacheTokens: validates extraction of cache_read,
  cache_creation_5m, and cache_creation_1h token fields
- TestExtractTopLevelType: table-driven tests for the byte-level type
  extractor covering user, assistant, system, nested-type-ignored,
  unknown, no-type, and empty cases
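
For reference, the last-entry-wins rule that TestParseFile_AssistantDedup
exercises can be pictured roughly as below. This is a minimal sketch, not
the parser's code: the entry and usage types and the dedupLastWins name
are hypothetical stand-ins, and keeping first-seen order is an assumption.

```go
package main

import "fmt"

// usage and entry are hypothetical stand-ins for the parser's real types.
type usage struct {
	InputTokens  int
	OutputTokens int
}

type entry struct {
	MessageID string
	Usage     usage
}

// dedupLastWins keeps one entry per message ID. A later entry with the
// same ID overwrites the earlier one; first-seen order of IDs is kept.
func dedupLastWins(entries []entry) []entry {
	index := make(map[string]int, len(entries))
	out := make([]entry, 0, len(entries))
	for _, e := range entries {
		if i, seen := index[e.MessageID]; seen {
			out[i] = e // replace the earlier copy in place
			continue
		}
		index[e.MessageID] = len(out)
		out = append(out, e)
	}
	return out
}

func main() {
	entries := []entry{
		{MessageID: "msg_1", Usage: usage{InputTokens: 10, OutputTokens: 1}},
		{MessageID: "msg_2", Usage: usage{InputTokens: 5, OutputTokens: 7}},
		{MessageID: "msg_1", Usage: usage{InputTokens: 10, OutputTokens: 42}},
	}
	for _, e := range dedupLastWins(entries) {
		fmt.Println(e.MessageID, e.Usage.OutputTokens)
	}
}
```

Counting only the surviving entry per ID avoids double-counting tokens
when the same message is written to the session file more than once.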

Fuzz test:
- FuzzExtractTopLevelType: seeds the corpus with realistic lines plus
  edge cases (unterminated strings, non-string type values, empty
  input). Asserts that the extractor never panics and returns only a
  known type string or the empty string.

The tests use a writeSession helper that creates a temporary JSONL file
for each test, which keeps tests isolated and makes cleanup automatic
via t.TempDir().

Implement the bottom of the data pipeline, the discovery and parsing of
Claude Code session files:
- source/types.go: Raw JSON deserialization types (RawEntry,
  RawMessage, RawUsage, CacheCreation) matching the Claude Code
  JSONL schema. DiscoveredFile carries file metadata including
  decoded project name, session ID, and subagent relationship info.
- source/scanner.go: ScanDir walks ~/.claude/projects/ to discover
  all .jsonl session files. It detects subagent files by the
  <project>/<session>/subagents/agent-<id>.jsonl path pattern and
  links them to their parent sessions. decodeProjectName reverses
  Claude Code's path-encoding convention (the /-separated segments
  of a path are joined with hyphens) by scanning for known parent
  markers (projects, repos, src, code, workspace, dev) and taking
  what follows the last marker as the project name.
- source/parser.go: ParseFile processes a single JSONL session file.
  It uses a hybrid parsing strategy for performance:
  * "user" and "system" entries: byte-level field extraction for
    timestamps, cwd, and turn_duration, which avoids the allocations
    of a full JSON unmarshal. extractTopLevelType tracks brace depth
    and string boundaries to find only the top-level "type" field
    and exits early about 400 bytes in, so the per-line cost stays
    O(1) regardless of line length.
  * "assistant" entries: full JSON unmarshal to extract token usage,
    model name, and cost data.
  Deduplicates API calls by message.id (keeping the last entry per
  ID, which holds the final billed usage). Computes a per-model cost
  breakdown using config.CalculateCost and aggregates the cache hit
  rate.
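
To illustrate the byte-level approach, here is a minimal sketch of a
top-level "type" extractor that tracks brace depth and string
boundaries. It is not the code in source/parser.go: the helper names
stringEnd and knownTypeValue, the set of known types (user, assistant,
system), and applying the ~400-byte limit as a hard cap are assumptions
made for the example.

```go
package main

import "fmt"

// maxScan caps the bytes inspected per line. Treating the "~400 bytes"
// figure as a hard cap is an assumption of this sketch.
const maxScan = 400

// extractTopLevelType returns the value of the top-level "type" key if
// it is a known string, and "" otherwise. It never unmarshals the line.
func extractTopLevelType(line []byte) string {
	if len(line) > maxScan {
		line = line[:maxScan]
	}
	depth, topIsObject, expectKey := 0, false, false
	for i := 0; i < len(line); i++ {
		switch c := line[i]; c {
		case '{', '[':
			depth++
			if depth == 1 {
				topIsObject = c == '{'
				expectKey = topIsObject
			}
		case '}', ']':
			depth--
		case ',':
			if depth == 1 && topIsObject {
				expectKey = true // the next string at depth 1 is a key
			}
		case '"':
			end := stringEnd(line, i)
			if end < 0 {
				return "" // unterminated string
			}
			if depth == 1 && expectKey {
				expectKey = false
				if string(line[i+1:end]) == "type" {
					return knownTypeValue(line, end+1)
				}
			}
			i = end // skip the string body so its braces are ignored
		}
	}
	return ""
}

// stringEnd returns the index of the quote closing the string that
// opens at start, honoring backslash escapes, or -1 if there is none.
func stringEnd(b []byte, start int) int {
	for i := start + 1; i < len(b); i++ {
		switch b[i] {
		case '\\':
			i++ // skip the escaped byte
		case '"':
			return i
		}
	}
	return -1
}

// knownTypeValue reads `: "value"` starting at i and returns the value
// only if it is one of the assumed known types.
func knownTypeValue(b []byte, i int) string {
	skip := func() {
		for i < len(b) && (b[i] == ' ' || b[i] == '\t' || b[i] == '\r' || b[i] == '\n') {
			i++
		}
	}
	skip()
	if i >= len(b) || b[i] != ':' {
		return ""
	}
	i++
	skip()
	if i >= len(b) || b[i] != '"' {
		return "" // non-string value such as 42 or null
	}
	end := stringEnd(b, i)
	if end < 0 {
		return ""
	}
	switch v := string(b[i+1 : end]); v {
	case "user", "assistant", "system":
		return v
	}
	return ""
}

func main() {
	for _, l := range []string{
		`{"type":"user","cwd":"/tmp/demo"}`,
		`{"message":{"type":"message"},"type":"assistant"}`,
		`{"message":{"type":"user"}}`,
		`{"type":42}`,
		`{"type":"us`,
	} {
		fmt.Printf("%q\n", extractTopLevelType([]byte(l)))
	}
}
```

Skipping whole string bodies is what keeps braces and "type" text inside
message content from being mistaken for structure.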
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>