3 Commits

Author SHA1 Message Date
6681f07fc0 Add countSensitiveMessages for pre-scan sensitive content detection
Export a new countSensitiveMessages() function that returns how many
messages in an array contain at least one sensitive pattern match.
Checks both content and toolInput fields, counting each message at
most once regardless of how many matches it contains.

Tests verify zero counts for clean messages, correct counting with
mixed sensitive/clean messages, and the single-count-per-message
invariant when multiple secrets appear in one message.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 13:35:15 -05:00
0e5a36f0d1 Fix sensitive redactor keyword matching for case-insensitive patterns
The keyword pre-filter used case-sensitive string matching for all patterns,
but several regex patterns use the /i flag (e.g. generic_api_key). This meant
inputs like 'ApiKey = "secret"' would skip the keyword check for 'api_key'
and miss the redaction entirely.

Changes:
- Add caseInsensitive parameter to hasKeyword() that lowercases both content
  and keywords before comparison
- Detect /i flag on pattern regex and pass it through automatically
- Narrow IP address keywords from ["."] to ["0.", "1.", ..., "9."] to reduce
  false-positive regex invocations on content containing periods
- Fix email regex character class [A-Z|a-z] → [A-Za-z] (the pipe was literal)
- Add clarifying comment on url_with_creds pattern
- Add test cases for mixed-case and UPPER_CASE key assignments
- Relax SECRET_KEY test assertion to accept either redaction label

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 01:09:11 -05:00
a1d54e84c7 Add test suite: unit tests for parser, discovery, redactor, exporter, and filters
Comprehensive test coverage for all server services and shared modules:

tests/unit/session-parser.test.ts (16 tests):
- Parses every message type: user (string + array content), assistant
  (text, thinking, tool_use blocks), progress/hook events (from data
  field), file-history-snapshot, summary (from summary field)
- Verifies system metadata (turn_duration) and queue-operation lines
  are silently skipped
- Detects <system-reminder> tags and reclassifies as system_message
- Resilience: skips malformed JSONL lines, returns empty for empty
  files, preserves UUIDs from source
- Integration: parses full sample-session.jsonl fixture verifying all
  9 categories are represented, handles edge-cases.jsonl with corrupt
  lines

tests/unit/session-discovery.test.ts (6 tests):
- Discovers sessions from {version, entries} index format
- Handles legacy raw array format
- Gracefully returns empty for missing directories and corrupt JSON
- Aggregates across multiple project directories
- Uses fullPath from index entries when available

tests/unit/sensitive-redactor.test.ts (40+ tests):
- Tier 1 secrets: AWS access keys (AKIA/ASIA), Bedrock keys, GitHub
  PATs (ghp_, github_pat_, ghu_, ghs_), GitLab (glpat-, glrt-),
  OpenAI (sk-proj-, legacy sk-), Anthropic (api03, admin01),
  HuggingFace, Perplexity, Stripe (sk_live/test/prod, rk_*), Slack
  (bot token, webhook URL), SendGrid, GCP, Heroku, npm, PyPI, Sentry,
  JWT, PEM private keys (RSA + generic), generic API key assignments
- Tier 2 PII: home directory paths (Linux/macOS/Windows), connection
  strings (PostgreSQL/MongoDB/Redis), URLs with embedded credentials,
  email addresses, IPv4 addresses, Bearer tokens, env var secrets
- False positive resistance: normal code, markdown, short strings,
  non-home file paths, version numbers
- Allowlists: example.com/test.com emails, noreply@anthropic.com,
  127.0.0.1, 0.0.0.0, RFC 5737 documentation IPs
- Edge cases: empty strings, multiple secrets, category tracking
- redactMessage: preserves uuid/category/timestamp, redacts content
  and toolInput, leaves toolName unchanged, doesn't mutate original

tests/unit/html-exporter.test.ts (8 tests):
- Valid DOCTYPE HTML output with no external URL references
- Includes visible messages, excludes filtered and redacted ones
- Inserts redacted dividers at correct positions
- Renders markdown to HTML with syntax highlighting CSS
- Includes session metadata header

tests/unit/filters.test.ts (12 tests):
- Category filtering: include/exclude by enabled set
- Default state: thinking and hook_progress hidden
- All-on/all-off edge cases
- Redacted UUID exclusion
- Search match counting with empty query edge case
- Preset validation (conversation, debug)
- Category count computation with and without redactions

src/client/components/SessionList.test.tsx (7 tests):
- Two-phase navigation: project list → session list → back
- Auto-select project when selectedId matches
- Loading and empty states
- onSelect callback on session click

tests/fixtures/:
- sample-session.jsonl: representative session with all message types
- edge-cases.jsonl: corrupt lines interspersed with valid messages
- sessions-index.json: sample index file for discovery tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 22:57:02 -05:00