Commit Graph

6 Commits

Author SHA1 Message Date
6681f07fc0 Add countSensitiveMessages for pre-scan sensitive content detection
Export a new countSensitiveMessages() function that returns how many
messages in an array contain at least one sensitive pattern match.
Checks both content and toolInput fields, counting each message at
most once regardless of how many matches it contains.

Tests verify zero counts for clean messages, correct counting with
mixed sensitive/clean messages, and the single-count-per-message
invariant when multiple secrets appear in one message.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 13:35:15 -05:00
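
The counting rule described in commit 6681f07fc0 (check both content and toolInput, count each message at most once) might look roughly like the sketch below. The `MessageLike` shape, the `SAMPLE_PATTERNS` list, and the `hasSensitiveMatch` helper are assumptions made for illustration; the real function presumably reuses the redactor's own pattern set.

```typescript
// Hypothetical, trimmed-down message shape; the real ParsedMessage has more fields.
interface MessageLike {
  content: string;
  toolInput?: string;
}

// Stand-in patterns for illustration only (no g flag, so .test() is stateless).
const SAMPLE_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/,    // AWS-style access key ID
  /ghp_[A-Za-z0-9]{36}/, // GitHub-style personal access token
];

// True if any sample pattern matches the given text.
function hasSensitiveMatch(text: string | undefined): boolean {
  if (!text) return false;
  return SAMPLE_PATTERNS.some((re) => re.test(text));
}

// Counts messages with at least one match in content or toolInput.
// A message adds at most 1 to the total, however many secrets it holds.
function countSensitiveMessages(messages: MessageLike[]): number {
  let count = 0;
  for (const msg of messages) {
    if (hasSensitiveMatch(msg.content) || hasSensitiveMatch(msg.toolInput)) {
      count += 1;
    }
  }
  return count;
}
```

Because the result is a plain number, a caller could use it as a cheap pre-scan before deciding whether to run the full redaction pass.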
54f909c80c Revise default hidden categories to reduce noise in session view
Change the default-hidden message categories from [thinking,
hook_progress] to [tool_result, system_message, hook_progress,
file_snapshot]. This hides the verbose machine-oriented categories
by default while keeping thinking blocks visible, since they contain
useful reasoning context that users typically want to see.

Also rename the "summary" category label from "Summaries" to
"Compactions" to better reflect what Claude's summary messages
actually represent (context-window compaction artifacts).

Tests updated to match the new defaults: the filter test now
asserts that tool_result, system_message, hook_progress, and
file_snapshot are all excluded, producing 5 visible messages
instead of the previous 7.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 10:42:21 -05:00
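
The revised defaults from commit 54f909c80c can be illustrated with a small filter. The category names and the hidden list come from the commit messages; the `filterVisible` helper and the `CategorizedMessage` shape are hypothetical, and the label map is deliberately partial because only the "Compactions" label is stated in the log.

```typescript
type MessageCategory =
  | "user_message" | "assistant_text" | "thinking" | "tool_call"
  | "tool_result" | "system_message" | "hook_progress"
  | "file_snapshot" | "summary";

// New defaults: hide the verbose machine-oriented categories, keep thinking visible.
const DEFAULT_HIDDEN_CATEGORIES: MessageCategory[] = [
  "tool_result", "system_message", "hook_progress", "file_snapshot",
];

// Only the renamed label is known from the log; other labels are omitted.
const CATEGORY_LABELS: Partial<Record<MessageCategory, string>> = {
  summary: "Compactions",
};

// Hypothetical minimal message shape for this sketch.
interface CategorizedMessage {
  uuid: string;
  category: MessageCategory;
}

// Hypothetical helper: drops every message whose category is hidden.
function filterVisible<T extends CategorizedMessage>(
  messages: T[],
  hidden: MessageCategory[] = DEFAULT_HIDDEN_CATEGORIES,
): T[] {
  const hiddenSet = new Set<MessageCategory>(hidden);
  return messages.filter((m) => !hiddenSet.has(m.category));
}
```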
4b13e7eeb9 Add session duration computation to discovery pipeline
Extend SessionEntry with an optional duration field (milliseconds)
computed from the delta between created and modified timestamps.
The computeDuration helper handles missing or invalid dates gracefully,
returning 0 for any edge case. This enables downstream UI to show
how long each session lasted without additional API calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 09:25:44 -05:00
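
A minimal sketch of the duration helper from commit 4b13e7eeb9, assuming the timestamps arrive as date strings that `Date.parse` can read. The parameter types and the choice to treat a negative delta as an edge case are assumptions; the log only states that the result is in milliseconds and that edge cases yield 0.

```typescript
// Returns the session length in milliseconds, or 0 when it cannot be computed.
function computeDuration(created?: string, modified?: string): number {
  // Missing timestamps: nothing to compute.
  if (!created || !modified) return 0;

  const start = Date.parse(created);
  const end = Date.parse(modified);

  // Date.parse returns NaN for strings it cannot interpret.
  if (Number.isNaN(start) || Number.isNaN(end)) return 0;

  // Assumption: a modified time earlier than created is also an edge case.
  const delta = end - start;
  return delta > 0 ? delta : 0;
}
```

Returning 0 rather than `undefined` keeps downstream formatting code free of null checks, at the cost of not distinguishing "unknown" from "instant".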
0e5a36f0d1 Fix sensitive redactor keyword matching for case-insensitive patterns
The keyword pre-filter used case-sensitive string matching for all patterns,
but several regex patterns use the /i flag (e.g. generic_api_key). As a result,
inputs like 'ApiKey = "secret"' failed the keyword check for 'api_key', so the
regex never ran and the secret was not redacted.

Changes:
- Add caseInsensitive parameter to hasKeyword() that lowercases both content
  and keywords before comparison
- Detect /i flag on pattern regex and pass it through automatically
- Narrow IP address keywords from ["."] to ["0.", "1.", ..., "9."] to reduce
  false-positive regex invocations on content containing periods
- Fix email regex character class [A-Z|a-z] → [A-Za-z] (the pipe was literal)
- Add clarifying comment on url_with_creds pattern
- Add test cases for mixed-case and UPPER_CASE key assignments
- Relax SECRET_KEY test assertion to accept either redaction label

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 01:09:11 -05:00
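
Two of the fixes in commit 0e5a36f0d1 are easy to show in isolation. The sketch below assumes a `hasKeyword(content, keywords, caseInsensitive)` signature and a hypothetical `PatternLike` shape; the keyword `"apikey"` and the simplified top-level-domain regexes are made up for the example and are not taken from the real pattern list.

```typescript
// Keyword pre-filter: cheap substring check that gates the expensive regex.
function hasKeyword(
  content: string,
  keywords: string[],
  caseInsensitive = false,
): boolean {
  // Lowercase both sides only when the pattern itself ignores case.
  const haystack = caseInsensitive ? content.toLowerCase() : content;
  return keywords.some((kw) =>
    haystack.includes(caseInsensitive ? kw.toLowerCase() : kw),
  );
}

// Hypothetical pattern shape for this sketch.
interface PatternLike {
  regex: RegExp;
  keywords: string[];
}

// Detects the /i flag on the regex and passes it through automatically.
function passesPreFilter(content: string, pattern: PatternLike): boolean {
  return hasKeyword(content, pattern.keywords, pattern.regex.ignoreCase);
}

// Inside a character class the pipe is a literal, not alternation:
const buggyTld = /^[A-Z|a-z]{2,}$/; // also accepts "|"
const fixedTld = /^[A-Za-z]{2,}$/;  // letters only
```

With a case-sensitive check, `'ApiKey = "secret"'` never matches the keyword `"apikey"`, so a pattern carrying the `/i` flag would be skipped before its regex had a chance to run.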
8e713b9c50 Extract escapeHtml into shared module for reuse across client and server
The same HTML entity escaping logic was duplicated in three places:
MessageBubble.tsx, html-exporter.ts, and markdown.ts. Consolidate into
a single shared/escape-html.ts with a single-pass regex+lookup implementation
instead of five chained .replace() calls.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 01:08:38 -05:00
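
A single-pass regex plus lookup, as described in commit 8e713b9c50, could be written along these lines. The exact set of five escaped characters and the entity spellings (for example `&#39;` for the apostrophe) are assumptions; the log only says that five chained `.replace()` calls were replaced.

```typescript
// Lookup table mapping each special character to its HTML entity.
const HTML_ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// One regex scan over the input; each match is swapped via the table.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch] ?? ch);
}
```

A single pass also sidesteps the ordering concern of chained replacements, where escaping `&` after `<` would corrupt the `&lt;` that was just produced.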
c4e15bf082 Add shared type definitions and sensitive content redactor
Shared module consumed by both the Express server and the React client:

types.ts:
- ParsedMessage: the normalized message unit (uuid, category, content,
  toolName, toolInput, timestamp, rawIndex) that the parser emits and
  every downstream consumer (viewer, filter, export) operates on
- MessageCategory: 9-value union covering user_message, assistant_text,
  thinking, tool_call, tool_result, system_message, hook_progress,
  file_snapshot, and summary
- SessionEntry / SessionListResponse / SessionDetailResponse / ExportRequest:
  API contract types for the sessions list, session detail, and HTML
  export endpoints
- ALL_CATEGORIES, CATEGORY_LABELS, DEFAULT_HIDDEN_CATEGORIES: constants
  for the filter panel UI and presets (thinking + hook_progress hidden
  by default)

sensitive-redactor.ts:
- 34 regex patterns derived from gitleaks production config, organized
  into Tier 1 (known secret formats: AWS, GitHub, GitLab, OpenAI,
  Anthropic, HuggingFace, Perplexity, Stripe, Slack, SendGrid, Twilio,
  GCP, Azure AD, Heroku, npm, PyPI, Sentry, JWT, PEM private keys,
  generic API key assignments) and Tier 2 (PII/system info: home
  directory paths, connection strings, URLs with credentials, email
  addresses, IPv4 addresses, Bearer tokens, env var secret assignments)
- Keyword pre-filtering: each pattern declares keywords that must appear
  in the text before the expensive regex is evaluated, following the
  gitleaks performance optimization approach
- False-positive allowlists: example/test email domains, localhost/
  documentation IPs (RFC 5737), noreply@anthropic.com
- Pure functions: redactSensitiveContent returns {sanitized, count,
  categories}, redactString returns just the string, redactMessage
  returns a new ParsedMessage with content and toolInput redacted

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 22:55:48 -05:00
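
The keyword pre-filtering and the `{sanitized, count, categories}` result described in commit c4e15bf082 can be sketched as follows. Everything beyond those three field names is an assumption: the `SensitivePattern` shape, the two sample patterns, the `[REDACTED:<id>]` label format, and the choice to report categories as a list of pattern ids.

```typescript
// Hypothetical pattern record: regex plus the keywords that gate it.
interface SensitivePattern {
  id: string;
  regex: RegExp; // carries the g flag so replace() handles every match
  keywords: string[];
}

interface RedactionResult {
  sanitized: string;
  count: number;
  categories: string[]; // assumption: ids of the patterns that matched
}

// Two stand-in patterns; the real module is described as having 34.
const PATTERNS: SensitivePattern[] = [
  { id: "aws_access_key", regex: /\bAKIA[0-9A-Z]{16}\b/g, keywords: ["AKIA"] },
  { id: "bearer_token", regex: /Bearer\s+[A-Za-z0-9._~+\/-]{20,}/g, keywords: ["Bearer"] },
];

// Pure function: returns a new string and leaves the input untouched.
function redactSensitiveContent(text: string): RedactionResult {
  let sanitized = text;
  let count = 0;
  const categories: string[] = [];

  for (const pattern of PATTERNS) {
    // Cheap substring check first; skip the regex when no keyword is present.
    if (!pattern.keywords.some((kw) => sanitized.includes(kw))) continue;

    let hits = 0;
    sanitized = sanitized.replace(pattern.regex, () => {
      hits += 1;
      return `[REDACTED:${pattern.id}]`;
    });

    if (hits > 0) {
      count += hits;
      categories.push(pattern.id);
    }
  }
  return { sanitized, count, categories };
}

// Convenience wrapper that returns just the sanitized string.
function redactString(text: string): string {
  return redactSensitiveContent(text).sanitized;
}
```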