4 Commits

Author SHA1 Message Date
teernisse
8fddd50193 Implement JSONL-first session discovery with tiered lookup
Rewrite session discovery to be filesystem-first, addressing the widespread
bug where Claude Code's sessions-index.json files are unreliable (87 MB of
unindexed sessions, 17% loss rate across all projects).

Architecture: Three-tier metadata lookup

Tier 1 - Index validation (instant):
  - Parse sessions-index.json into Map<sessionId, IndexEntry>
  - Validate entry.modified against actual file stat.mtimeMs
  - Use 1s tolerance to account for ISO string → filesystem mtime rounding
  - Trust content fields only (messageCount, summary, firstPrompt)
  - Timestamps always come from fs.stat, never from index

Tier 2 - Persistent cache hit (instant):
  - Check MetadataCache by (filePath, mtimeMs, size)
  - If match, use cached metadata
  - Survives server restarts

Tier 3 - Full JSONL parse (~5-50ms/file):
  - Call extractSessionMetadata() with shared parser helpers
  - Cache result for future lookups

Key correctness guarantees:
- All .jsonl files appear regardless of index state
- SessionEntry timestamps always from fs.stat (list ordering never stale)
- Message counts exact (shared helpers ensure parser parity)
- Duration computed from JSONL timestamps, not index

Performance:
- Bounded concurrency: 32 concurrent operations per project
- mapWithLimit() prevents file handle exhaustion
- Warm start <1s (stat all files, in-memory lookups)
- Cold start ~3-5s for 3,103 files (stat + parse phases)

TOCTOU handling:
- Files that disappear between readdir and stat: silently skipped
- Files that disappear between stat and read: silently skipped
- File actively being written: partial parse handled gracefully

Include PRD document that drove this implementation with detailed
requirements, edge cases, and verification plan.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 00:53:20 -05:00
69857fa825 Fix session discovery tests to use dynamic paths and add containment test
Replace hardcoded absolute paths in test assertions with dynamically
constructed paths matching the temp directory. This makes tests portable
across environments where path.resolve() produces different results.

Add test verifying that absolute paths pointing outside the projects
directory (e.g. /etc/shadow.jsonl) are rejected by the discovery filter.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 01:10:55 -05:00
eb8001dbf1 Harden session discovery with path validation and parallel I/O
Security: Reject session paths containing '..' traversal segments or
non-.jsonl extensions before resolving them. This prevents a malicious
sessions-index.json from tricking the viewer into reading arbitrary files.

Performance: Process all project directories concurrently with Promise.all
instead of sequentially awaiting each one. Each directory's stat + readFile
is independent I/O that benefits from parallelism.

Add test case verifying that traversal paths and non-JSONL paths are rejected
while valid paths pass through.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 01:08:57 -05:00
a1d54e84c7 Add test suite: unit tests for parser, discovery, redactor, exporter, and filters
Comprehensive test coverage for all server services and shared modules:

tests/unit/session-parser.test.ts (16 tests):
- Parses every message type: user (string + array content), assistant
  (text, thinking, tool_use blocks), progress/hook events (from data
  field), file-history-snapshot, summary (from summary field)
- Verifies system metadata (turn_duration) and queue-operation lines
  are silently skipped
- Detects <system-reminder> tags and reclassifies as system_message
- Resilience: skips malformed JSONL lines, returns empty for empty
  files, preserves UUIDs from source
- Integration: parses full sample-session.jsonl fixture verifying all
  9 categories are represented, handles edge-cases.jsonl with corrupt
  lines

tests/unit/session-discovery.test.ts (6 tests):
- Discovers sessions from {version, entries} index format
- Handles legacy raw array format
- Gracefully returns empty for missing directories and corrupt JSON
- Aggregates across multiple project directories
- Uses fullPath from index entries when available

tests/unit/sensitive-redactor.test.ts (40+ tests):
- Tier 1 secrets: AWS access keys (AKIA/ASIA), Bedrock keys, GitHub
  PATs (ghp_, github_pat_, ghu_, ghs_), GitLab (glpat-, glrt-),
  OpenAI (sk-proj-, legacy sk-), Anthropic (api03, admin01),
  HuggingFace, Perplexity, Stripe (sk_live/test/prod, rk_*), Slack
  (bot token, webhook URL), SendGrid, GCP, Heroku, npm, PyPI, Sentry,
  JWT, PEM private keys (RSA + generic), generic API key assignments
- Tier 2 PII: home directory paths (Linux/macOS/Windows), connection
  strings (PostgreSQL/MongoDB/Redis), URLs with embedded credentials,
  email addresses, IPv4 addresses, Bearer tokens, env var secrets
- False positive resistance: normal code, markdown, short strings,
  non-home file paths, version numbers
- Allowlists: example.com/test.com emails, noreply@anthropic.com,
  127.0.0.1, 0.0.0.0, RFC 5737 documentation IPs
- Edge cases: empty strings, multiple secrets, category tracking
- redactMessage: preserves uuid/category/timestamp, redacts content
  and toolInput, leaves toolName unchanged, doesn't mutate original

tests/unit/html-exporter.test.ts (8 tests):
- Valid DOCTYPE HTML output with no external URL references
- Includes visible messages, excludes filtered and redacted ones
- Inserts redacted dividers at correct positions
- Renders markdown to HTML with syntax highlighting CSS
- Includes session metadata header

tests/unit/filters.test.ts (12 tests):
- Category filtering: include/exclude by enabled set
- Default state: thinking and hook_progress hidden
- All-on/all-off edge cases
- Redacted UUID exclusion
- Search match counting with empty query edge case
- Preset validation (conversation, debug)
- Category count computation with and without redactions

src/client/components/SessionList.test.tsx (7 tests):
- Two-phase navigation: project list → session list → back
- Auto-select project when selectedId matches
- Loading and empty states
- onSelect callback on session click

tests/fixtures/:
- sample-session.jsonl: representative session with all message types
- edge-cases.jsonl: corrupt lines interspersed with valid messages
- sessions-index.json: sample index file for discovery tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 22:57:02 -05:00