5 Commits

Author SHA1 Message Date
teernisse
24454247a3 feat: add data pipeline with parallel loading, aggregation, and cache integration
Implement the pipeline layer that orchestrates discovery, parsing,
caching, and aggregation:

- pipeline/loader.go: Load() discovers session files via ScanDir,
  optionally filters out subagent files, then parses all files in
  parallel using a bounded worker pool sized to GOMAXPROCS. Workers
  read from a pre-filled channel (no contention on dispatch) and
  report progress via an atomic counter and callback. LoadResult
  tracks total files, parsed files, parse errors, and file errors.

- pipeline/aggregator.go: Five aggregation functions, all operating
  on time-filtered session slices:

  * Aggregate: computes SummaryStats across all sessions — total
    tokens (5 types), estimated cost, cache savings (summed per-model
    via config.CalculateCacheSavings), cache hit rate, and per-active-
    day rates (cost, tokens, sessions, prompts, minutes).

  * AggregateDays: groups sessions by local calendar date, sorted
    most-recent-first.

  * AggregateModels: groups by normalized model name with share
    percentages, sorted by cost descending.

  * AggregateProjects: groups by project name, sorted by cost.

  * AggregateHourly: distributes prompt/session/token counts across
    24 hour buckets (attributed to session start hour).

  Also provides FilterByTime, FilterByProject, FilterByModel with
  case-insensitive substring matching.

- pipeline/incremental.go: LoadWithCache() implements the incremental
  loading strategy — compares discovered files against the cache's
  file_tracker (mtime_ns + size), loads unchanged sessions from
  SQLite, and only reparses files that changed. Reparsed results
  are immediately saved back to cache. CacheDir/CachePath follow
  XDG_CACHE_HOME convention (~/.cache/cburn/metrics.db).

- pipeline/bench_test.go: Benchmarks for ScanDir, ParseFile (worst-
  case largest file), full Load, and LoadWithCache to measure the
  incremental cache speedup.
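
The worker-pool shape described for Load() above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's code: loadAll, loadResult, and fakeParse are hypothetical names, and the real LoadResult also tracks file errors.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strings"
	"sync"
	"sync/atomic"
)

// loadResult mirrors some of the counters the commit lists for LoadResult.
// The field names are assumptions made for this sketch.
type loadResult struct {
	TotalFiles  int
	ParsedFiles int
	ParseErrors int
}

// fakeParse is a hypothetical stand-in for the real parser: it fails for
// any path that starts with "bad".
func fakeParse(path string) error {
	if strings.HasPrefix(path, "bad") {
		return errors.New("malformed session file: " + path)
	}
	return nil
}

// loadAll parses every path with at most GOMAXPROCS workers.
// progress may be nil; when set it is called from worker goroutines,
// so it must be safe for concurrent use.
func loadAll(paths []string, parse func(string) error, progress func(done, total int)) loadResult {
	// Pre-fill a buffered channel and close it, so workers only ever receive.
	work := make(chan string, len(paths))
	for _, p := range paths {
		work <- p
	}
	close(work)

	workers := runtime.GOMAXPROCS(0)
	if workers > len(paths) {
		workers = len(paths)
	}

	var parsed, failed, done int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range work {
				if err := parse(p); err != nil {
					atomic.AddInt64(&failed, 1)
				} else {
					atomic.AddInt64(&parsed, 1)
				}
				n := atomic.AddInt64(&done, 1)
				if progress != nil {
					progress(int(n), len(paths))
				}
			}
		}()
	}
	wg.Wait()

	return loadResult{TotalFiles: len(paths), ParsedFiles: int(parsed), ParseErrors: int(failed)}
}

func main() {
	res := loadAll([]string{"a.jsonl", "b.jsonl", "bad.jsonl"}, fakeParse, nil)
	fmt.Println(res.TotalFiles, res.ParsedFiles, res.ParseErrors)
}
```

Because the channel is filled and closed before any worker starts, there is no producer goroutine to coordinate with; workers simply drain the channel until it is empty.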

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:01:56 -05:00
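
The change-detection step that commit 24454247a3 describes for LoadWithCache() (compare mtime_ns and size against the tracker, reparse only what differs) might look roughly like this. The names fileStamp and partition are hypothetical, and the maps stand in for the results of a directory scan and a file_tracker query.

```go
package main

import (
	"fmt"
	"sort"
)

// fileStamp is the (mtime_ns, size) pair used for change detection.
// In real code it could be filled from os.FileInfo via
// info.ModTime().UnixNano() and info.Size().
type fileStamp struct {
	MtimeNs int64
	Size    int64
}

// partition splits discovered files into those whose stamp matches the
// tracked stamp (safe to load from cache) and those that are new or
// changed (must be reparsed). Results are sorted for stable output.
func partition(discovered, tracked map[string]fileStamp) (unchanged, changed []string) {
	for path, st := range discovered {
		if prev, ok := tracked[path]; ok && prev == st {
			unchanged = append(unchanged, path)
		} else {
			changed = append(changed, path)
		}
	}
	sort.Strings(unchanged)
	sort.Strings(changed)
	return unchanged, changed
}

func main() {
	tracked := map[string]fileStamp{
		"a.jsonl": {MtimeNs: 100, Size: 10},
		"b.jsonl": {MtimeNs: 200, Size: 20},
	}
	discovered := map[string]fileStamp{
		"a.jsonl": {MtimeNs: 100, Size: 10}, // same stamp: cached
		"b.jsonl": {MtimeNs: 250, Size: 25}, // modified: reparse
		"c.jsonl": {MtimeNs: 300, Size: 30}, // new file: reparse
	}
	unchanged, changed := partition(discovered, tracked)
	fmt.Println("from cache:", unchanged)
	fmt.Println("reparse:", changed)
}
```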
teernisse
0e9091f56e feat: add SQLite cache store for incremental session persistence
Implement the caching layer that enables fast subsequent runs by
persisting parsed session data in SQLite:

- store/schema.go: DDL for three tables — sessions (primary metrics
  with file_mtime_ns/file_size for change detection), session_models
  (per-model breakdown, FK cascade on delete), and file_tracker
  (path -> mtime+size mapping for cache invalidation). Indexes on
  start_time and project for efficient time-range and filter queries.

- store/cache.go: Cache struct wrapping database/sql with WAL mode
  and synchronous=normal for concurrent read safety and write
  performance. Key operations:

  * Open: creates the cache directory, opens/creates the database,
    and ensures the schema is applied (idempotent via IF NOT EXISTS).

  * GetTrackedFiles: returns the mtime/size map used by the pipeline
    to determine which files need reparsing.

  * SaveSession: transactional upsert of session stats + model
    breakdown + file tracker entry. Uses INSERT OR REPLACE to handle
    both new files and files that changed since last parse.

  * LoadAllSessions: batch-loads all cached sessions with a two-pass
    strategy — first loads session rows, then batch-loads model data
    with an index map for O(1) join, avoiding N+1 queries.

  Uses modernc.org/sqlite (pure-Go, no CGO) for zero-dependency
  cross-platform builds.
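
The two-pass join described for LoadAllSessions above can be illustrated without a database. In this sketch the two slices stand in for the results of the two queries; session, modelRow, and joinModels are hypothetical names, not the store package's actual types.

```go
package main

import "fmt"

// modelRow stands in for one row of the session_models table.
type modelRow struct {
	SessionID string
	Model     string
	Cost      float64
}

// session stands in for one row of the sessions table plus its breakdown.
type session struct {
	ID     string
	Models []modelRow
}

// joinModels attaches each model row to its session in a single pass.
// The index map gives O(1) lookup per row, so two queries suffice
// instead of one query per session.
func joinModels(sessions []session, rows []modelRow) {
	index := make(map[string]int, len(sessions))
	for i, s := range sessions {
		index[s.ID] = i
	}
	for _, r := range rows {
		if i, ok := index[r.SessionID]; ok {
			sessions[i].Models = append(sessions[i].Models, r)
		}
	}
}

func main() {
	sessions := []session{{ID: "s1"}, {ID: "s2"}}
	rows := []modelRow{
		{SessionID: "s1", Model: "claude-opus-4-5", Cost: 1.5},
		{SessionID: "s2", Model: "claude-sonnet-4-5", Cost: 0.4},
		{SessionID: "s1", Model: "claude-haiku-4-5", Cost: 0.1},
	}
	joinModels(sessions, rows)
	for _, s := range sessions {
		fmt.Println(s.ID, len(s.Models))
	}
}
```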

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:01:40 -05:00
teernisse
ad484a2a6f feat: add JSONL source layer with directory scanner and byte-level parser
Implement the bottom of the data pipeline — discovery and parsing of
Claude Code session files:

- source/types.go: Raw JSON deserialization types (RawEntry,
  RawMessage, RawUsage, CacheCreation) matching the Claude Code
  JSONL schema. DiscoveredFile carries file metadata including
  decoded project name, session ID, and subagent relationship info.

- source/scanner.go: ScanDir walks ~/.claude/projects/ to discover
  all .jsonl session files. Detects subagent files by the
  <project>/<session>/subagents/agent-<id>.jsonl path pattern and
  links them to parent sessions. decodeProjectName reverses Claude
  Code's path-encoding convention (the "/" separators of the original
  path are replaced with hyphens) by scanning for known parent markers
  (projects, repos, src, code, workspace, dev) and extracting the
  project name after the last marker.

- source/parser.go: ParseFile processes a single JSONL session file.
  Uses a hybrid parsing strategy for performance:

  * "user" and "system" entries: byte-level field extraction for
    timestamps, cwd, and turn_duration (avoids JSON allocation).
    extractTopLevelType tracks brace depth and string boundaries to
    find only the top-level "type" field and exits early, about 400
    bytes in, so the per-line cost stays roughly constant regardless
    of line length.

  * "assistant" entries: full JSON unmarshal to extract token usage,
    model name, and cost data.

  Deduplicates API calls by message.id (keeping the last entry per
  ID, which holds the final billed usage). Computes per-model cost
  breakdown using config.CalculateCost and aggregates cache hit rate.
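
The idea behind extractTopLevelType (track nesting depth and string boundaries, accept only a "type" key at depth 1) can be sketched as below. This is an illustrative reconstruction, not the parser's actual code: topLevelType and skipSpace are hypothetical names, and the sketch omits the byte cap that the commit mentions for early exit.

```go
package main

import "fmt"

// skipSpace returns the index of the first non-blank byte at or after i.
func skipSpace(b []byte, i int) int {
	for i < len(b) && (b[i] == ' ' || b[i] == '\t') {
		i++
	}
	return i
}

// endOfString returns the index of the closing quote of a JSON string
// whose content starts at i, or -1 if the string is unterminated.
func endOfString(b []byte, i int) int {
	for i < len(b) && b[i] != '"' {
		if b[i] == '\\' {
			i++ // skip the escaped character
		}
		i++
	}
	if i >= len(b) {
		return -1
	}
	return i
}

// topLevelType returns the string value of the top-level "type" key of a
// single-line JSON object, or "" if there is none. Nested "type" keys and
// occurrences inside string values are ignored.
func topLevelType(line []byte) string {
	depth := 0
	for i := 0; i < len(line); i++ {
		switch line[i] {
		case '{', '[':
			depth++
		case '}', ']':
			depth--
		case '"':
			end := endOfString(line, i+1)
			if end < 0 {
				return ""
			}
			str := line[i+1 : end]
			i = end // resume after the closing quote
			if depth != 1 || string(str) != "type" {
				continue
			}
			// A key is followed by a colon; a value is not.
			k := skipSpace(line, end+1)
			if k >= len(line) || line[k] != ':' {
				continue
			}
			k = skipSpace(line, k+1)
			if k >= len(line) || line[k] != '"' {
				return ""
			}
			valEnd := endOfString(line, k+1)
			if valEnd < 0 {
				return ""
			}
			return string(line[k+1 : valEnd])
		}
	}
	return ""
}

func main() {
	line := []byte(`{"message":{"type":"text"},"type":"assistant"}`)
	fmt.Println(topLevelType(line))
}
```

The function returns as soon as the top-level key is found, so when "type" appears near the start of a line, the rest of a long line is never scanned.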

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:01:27 -05:00
teernisse
8984d5062d feat: add configuration system with pricing tables and plan detection
Implement the configuration layer that supports the entire cost
estimation pipeline:

- config/config.go: TOML-based config at ~/.config/cburn/config.toml
  (XDG-compliant) with sections for general preferences, Admin API
  credentials, budget tracking, appearance, and per-model pricing
  overrides. Supports Load/Save with sensible defaults (30-day
  window, subagents included, Flexoki Dark theme). Admin API key
  resolution checks ANTHROPIC_ADMIN_KEY env var first, falling back
  to the config file.

- config/pricing.go: Hardcoded pricing table for 9 Claude model
  variants (Opus 4/4.1/4.5/4.6, Sonnet 4/4.5/4.6, Haiku 3.5/4.5)
  with per-million-token rates across 5 billing dimensions: input,
  output, cache_write_5m, cache_write_1h, cache_read, plus long-
  context overrides (>200K tokens). NormalizeModelName strips date
  suffixes (e.g., "claude-opus-4-5-20251101" -> "claude-opus-4-5").
  CalculateCost and CalculateCacheSavings compute per-call USD costs
  by multiplying token counts against the pricing table.

- config/plan.go: DetectPlan reads ~/.claude/.claude.json to
  determine the billing plan type. Maps "stripe_subscription" to
  the Max plan ($200/mo ceiling), everything else to Pro ($100/mo).
  Used by the budget tab for plan-relative spend visualization.
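
Two pieces of the pricing logic above lend themselves to a short sketch: stripping a date suffix from a model name, and multiplying token counts by per-million-token rates. The function names, the eight-digit suffix pattern, and the rates struct are assumptions for illustration. The rates in main are placeholders, not real prices, and only two of the five billing dimensions are shown.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateSuffix matches a trailing "-YYYYMMDD" segment. That the suffix is
// always eight digits is an assumption based on the commit's example.
var dateSuffix = regexp.MustCompile(`-\d{8}$`)

// normalizeModelName drops a trailing date suffix, if present.
func normalizeModelName(name string) string {
	return dateSuffix.ReplaceAllString(name, "")
}

// rates holds USD prices per million tokens for two billing dimensions.
type rates struct {
	Input  float64
	Output float64
}

// callCost converts token counts to USD using per-million-token rates.
func callCost(r rates, inputTokens, outputTokens int64) float64 {
	return float64(inputTokens)/1e6*r.Input + float64(outputTokens)/1e6*r.Output
}

func main() {
	fmt.Println(normalizeModelName("claude-opus-4-5-20251101"))

	placeholder := rates{Input: 10, Output: 50} // placeholder values, not real prices
	fmt.Printf("%.2f\n", callCost(placeholder, 1000000, 500000))
}
```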

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:01:11 -05:00
teernisse
cfbbb9d6db feat: add domain model types for session metrics and statistics
Define the core data structures that flow through the entire pipeline:

- model/session.go: SessionStats (per-session aggregates including
  token counts across 5 categories — input, output, cache_write_5m,
  cache_write_1h, cache_read), APICall (deduplicated by message.id,
  keyed to the final billed usage), and ModelUsage (per-model
  breakdown within a session). Tracks subagent relationships via
  IsSubagent/ParentSession fields.

- model/metrics.go: Higher-order aggregate types — SummaryStats
  (top-level totals with per-active-day rates for cost, tokens,
  sessions, and minutes), DailyStats/HourlyStats/WeeklyStats
  (time-bucketed views), ModelStats (cross-session model comparison
  with share percentages), ProjectStats (per-project ranking), and
  PeriodComparison (current vs previous period for delta display).

- model/budget.go: BudgetStats with plan ceiling, custom budget,
  burn rate, and projected monthly spend for the budget tab.
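
As one reading of the "per-active-day rates" on SummaryStats, the sketch below divides total cost by the number of distinct calendar dates that have at least one session. That definition of an active day is an assumption, as are the names session, costPerActiveDay, and exampleSessions. Start times are assumed to be already converted to the user's local zone.

```go
package main

import (
	"fmt"
	"time"
)

// session carries only the fields this sketch needs.
type session struct {
	Start time.Time
	Cost  float64
}

// costPerActiveDay returns total cost divided by the number of distinct
// calendar dates on which a session started. It returns 0 for no sessions.
func costPerActiveDay(sessions []session) float64 {
	days := make(map[string]struct{})
	var total float64
	for _, s := range sessions {
		// The date is taken in the timestamp's own location.
		days[s.Start.Format("2006-01-02")] = struct{}{}
		total += s.Cost
	}
	if len(days) == 0 {
		return 0
	}
	return total / float64(len(days))
}

// exampleSessions returns three sessions spread over two dates.
func exampleSessions() []session {
	return []session{
		{Start: time.Date(2026, time.February, 18, 9, 0, 0, 0, time.UTC), Cost: 1.0},
		{Start: time.Date(2026, time.February, 18, 21, 0, 0, 0, time.UTC), Cost: 2.0},
		{Start: time.Date(2026, time.February, 19, 10, 0, 0, 0, time.UTC), Cost: 3.0},
	}
}

func main() {
	fmt.Printf("%.2f\n", costPerActiveDay(exampleSessions()))
}
```

Dividing by active days rather than by the length of the time window keeps idle days from diluting the rate.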

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 13:00:57 -05:00