diff --git a/docs/prd-jsonl-first-discovery.md b/docs/prd-jsonl-first-discovery.md new file mode 100644 index 0000000..a1b6618 --- /dev/null +++ b/docs/prd-jsonl-first-discovery.md @@ -0,0 +1,313 @@ +# PRD: JSONL-First Session Discovery + +## Status: Ready for Implementation + +## Context + +Session viewer relies exclusively on `sessions-index.json` files that Claude Code maintains. These indexes are unreliable — a known, widespread bug with multiple open GitHub issues ([#22030](https://github.com/anthropics/claude-code/issues/22030), [#21610](https://github.com/anthropics/claude-code/issues/21610), [#18619](https://github.com/anthropics/claude-code/issues/18619), [#22114](https://github.com/anthropics/claude-code/issues/22114)). + +### Root cause + +Claude Code updates `sessions-index.json` only at session end. If a session crashes, is killed, or is abandoned, the JSONL file is written but the index is never updated. Multiple concurrent Claude instances can also corrupt the index (last-write-wins on a single JSON file). There is no reindex command and no background repair process. + +### Impact on this system + +- **542 unindexed JSONL files** across all projects (87 MB total) +- **48 unindexed in last 7 days** (30.8 MB) +- **13 projects** have JSONL session files but no index at all +- **Zero sessions from today** (Feb 4, 2026) appear in any index +- **3,103 total JSONL files** vs **2,563 indexed entries** = 17% loss rate + +### Key insight + +The `.jsonl` files are the source of truth. The index is an unreliable convenience cache. The session viewer must treat it that way. + +## Requirements + +### Must have + +1. **All sessions with a `.jsonl` file must appear in the session list**, regardless of whether they're in `sessions-index.json` +2. **Exact message counts** — no estimates, no approximations. Contract: Tier 3 extraction MUST reuse the same line-classification logic as `parseSessionContent` (shared helper), so list counts cannot drift from detail parsing. 
+3. **Performance**: Warm start (cache exists, few changes) must complete under 1 second. Cold start (no cache) is acceptable up to 5 seconds for first request +4. **Correctness over speed** — never show stale metadata if the file has been modified +5. **Zero config** — works out of the box with no setup or external dependencies + +### Should have + +6. Session `summary` extracted from the last `type="summary"` line in the JSONL +7. Session `firstPrompt` extracted from the first non-system-reminder user message +8. Session `duration` MUST be derivable without relying on `sessions-index.json` — extract first and last timestamps from JSONL when index is missing or stale +9. Persistent metadata cache survives server restarts + +### Won't have (this iteration) + +- Real-time push updates (sessions appearing in UI without refresh) +- Background file watcher daemon +- Integration with `cass` as a search/indexing backend +- Rebuilding Claude Code's `sessions-index.json` + +## Technical Design + +### Architecture: Filesystem-primary with tiered metadata lookup + +``` +discoverSessions() + | + +-- For each project directory under ~/.claude/projects/: + | | + | +-- fs.readdir() --> list all *.jsonl files + | +-- Read sessions-index.json (optional, used as pre-populated cache) + | | + | +-- Batch stat all .jsonl files (bounded concurrency) + | | Files that disappeared between readdir and stat are silently skipped (TOCTOU race) + | | + | +-- For each .jsonl file: + | | | + | | +-- Tier 1: Check index + | | | Entry exists AND normalize(index.modified) matches stat mtime? + | | | --> Use index content data (messageCount, summary, firstPrompt) + | | | --> Use stat-derived timestamps for created/modified (always) + | | | + | | +-- Tier 2: Check persistent metadata cache + | | | path + mtimeMs + size match? 
+ | | | --> Use cached metadata (fast path) + | | | + | | +-- Tier 3: Extract metadata from JSONL content + | | Read file, lightweight parse using shared line iterator + counting helper + | | --> Cache result for future lookups + | | + | +-- Collect SessionEntry[] for this project + | + +-- Merge all projects + +-- Sort by modified (descending) — always stat-derived, never index-derived + +-- Async: persist metadata cache to disk (if dirty) +``` + +### Tier explanation + +| Tier | Source | Speed | When used | Trusts from source | +|------|--------|-------|-----------|--------------------| +| 1 | `sessions-index.json` | Instant (in-memory lookup) | Index exists, entry present, `normalize(modified)` matches actual file mtime | `messageCount`, `summary`, `firstPrompt` only. Timestamps always from stat. | +| 2 | Persistent metadata cache | Instant (in-memory lookup) | Index missing/stale, but file hasn't changed since last extraction (mtimeMs + size match) | All cached fields | +| 3 | JSONL file parse | ~5-50ms/file | New or modified file, not in any cache | Extracted fresh | + +Tier 1 reuses Claude's index when it's valid — no wasted work. The index `modified` field (ISO string) is normalized to milliseconds and compared against the real file `stat.mtimeMs`. If the index is missing or corrupt, discovery continues with Tier 2 and 3 without error. Even when Tier 1 is valid, `created` and `modified` timestamps on the `SessionEntry` always come from `fs.stat` — the index is a content cache only. + +### Tier 1: Index validation details + +The actual `sessions-index.json` format has `created` and `modified` as ISO strings, not a `fileMtime` field. Tier 1 validation must: + +1. Map JSONL filename to sessionId: `sessionId := path.basename(jsonlFile, '.jsonl')` +2. Look up `sessionId` in the index `Map` +3. Compare `new Date(entry.modified).getTime()` against `stat.mtimeMs` — reject if they differ by more than 1000ms (accounts for ISO string → filesystem mtime rounding) +4. 
If the index entry has no `modified` field, skip Tier 1 (fall through to Tier 2) +5. When Tier 1 is valid, trust only content fields (`messageCount`, `summary`, `firstPrompt`). The `created`/`modified` on the resulting `SessionEntry` must come from `stat.birthtimeMs`/`stat.mtimeMs` respectively — this ensures list ordering is never stale even within the 1s mtime tolerance window. + +### Shared line-iteration and counting (parser parity contract) + +The biggest correctness risk in this design is duplicating any JSONL processing logic. The real parser in `session-parser.ts` has non-trivial expansion rules: + +- User array content: expands `tool_result` and `text` blocks into separate messages +- `system-reminder` detection reclassifies user `text` blocks as `system_message` +- Assistant array content: `thinking`, `text`, and `tool_use` each become separate messages +- `progress`, `file-history-snapshot`, `summary` → 1 message each +- `system`, `queue-operation` → 0 (skipped) + +It also has error-handling behavior: malformed/truncated JSON lines are skipped (common when sessions crash mid-write). If the metadata extractor and the full parser handle malformed lines differently, counts will drift. + +Rather than reimplementing any of these rules, extract shared helpers at two levels: + +```typescript +// In session-parser.ts (or a shared module): + +// Level 1: Line iteration with consistent error handling +// Splits content by newlines, JSON.parse each, skips malformed lines identically +// to how parseSessionContent handles them. Returns parse error count for diagnostics. 
+export function forEachJsonlLine(
+  content: string,
+  onLine: (parsed: RawLine, lineIndex: number) => void
+): { parseErrors: number }
+
+// Level 2: Classification and counting (called per parsed line)
+export function countMessagesForLine(parsed: RawLine): number
+export function classifyLine(parsed: RawLine): LineClassification
+```
+
+Both `extractSessionMetadata()` and `parseSessionContent()` use `forEachJsonlLine()` for iteration, ensuring identical malformed-line handling. Both use `countMessagesForLine()` for counting. This two-level sharing guarantees that list counts can never drift from detail-view counts, regardless of future parser changes or edge cases in error handling.
+
+### Metadata extraction (Tier 3)
+
+A lightweight `extractSessionMetadata()` function reads the JSONL file and extracts only what the list view needs, without building full message content strings:
+
+```typescript
+export function extractSessionMetadata(content: string): SessionMetadata
+```
+
+Implementation:
+
+1. Iterate lines via `forEachJsonlLine(content, ...)` — the shared iterator with identical malformed-line handling as the main parser
+2. Call `countMessagesForLine(parsed)` per line — the shared helper that uses the **same classification rules** as `parseSessionContent` in `session-parser.ts`
+3. Extract `firstPrompt`: content of the first user message that isn't a `system-reminder`, truncated to 200 characters
+4. Extract `summary`: the `summary` field from the last `type="summary"` line
+5. Capture first and last `timestamp` fields for duration computation
+
+No string building, no `JSON.stringify`, no markdown processing — just counting, timestamp capture, and first-match extraction. This is exact (matches `parseSessionContent().length` via shared helpers) but 2-3x faster than full parsing.
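The extraction steps above can be sketched in TypeScript. This is an illustrative sketch only: the `RawLine` field shapes and the two inline stand-in helpers are simplified assumptions, since the real `forEachJsonlLine` and `countMessagesForLine` (with full array-content expansion and `system-reminder` reclassification) live in `session-parser.ts`.

```typescript
// Sketch of extractSessionMetadata. The RawLine shape and the stand-in helpers
// below are assumptions for illustration; the real shared helpers have richer
// classification rules.
type RawLine = {
  type?: string;
  timestamp?: string;
  summary?: string;
  message?: { role?: string; content?: unknown };
};

interface SessionMetadata {
  messageCount: number;
  firstPrompt: string;
  summary: string;
  firstTimestamp?: string;
  lastTimestamp?: string;
}

// Stand-in for the shared iterator: JSON.parse each line, skip malformed ones.
function forEachJsonlLine(
  content: string,
  onLine: (parsed: RawLine, lineIndex: number) => void
): { parseErrors: number } {
  let parseErrors = 0;
  content.split("\n").forEach((line, i) => {
    if (!line.trim()) return;
    try {
      onLine(JSON.parse(line) as RawLine, i);
    } catch {
      parseErrors++; // malformed or truncated line: skipped, counted for diagnostics
    }
  });
  return { parseErrors };
}

// Stand-in counter: one message per user/assistant/summary line. The real
// countMessagesForLine also expands array content blocks into separate messages.
function countMessagesForLine(parsed: RawLine): number {
  return parsed.type === "user" || parsed.type === "assistant" || parsed.type === "summary" ? 1 : 0;
}

export function extractSessionMetadata(content: string): SessionMetadata {
  const meta: SessionMetadata = { messageCount: 0, firstPrompt: "", summary: "" };
  forEachJsonlLine(content, (parsed) => {
    meta.messageCount += countMessagesForLine(parsed);
    const c = parsed.message?.content;
    if (!meta.firstPrompt && parsed.type === "user" && typeof c === "string" && !c.includes("system-reminder")) {
      meta.firstPrompt = c.slice(0, 200); // first real user prompt, truncated
    }
    if (parsed.type === "summary" && parsed.summary) {
      meta.summary = parsed.summary; // last summary line wins
    }
    if (parsed.timestamp) {
      meta.firstTimestamp ??= parsed.timestamp; // first timestamp seen
      meta.lastTimestamp = parsed.timestamp; // last timestamp seen, for duration
    }
  });
  return meta;
}
```

A single pass, no string building: counting, first-match capture, and timestamp bookkeeping only.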
+### Persistent metadata cache
+
+**Location:** `~/.cache/session-viewer/metadata.json`
+
+```typescript
+interface CacheFile {
+  version: 1;
+  entries: Record<string, CacheEntry>;
+}
+```
+
+Behavior:
+- Loaded once on first `discoverSessions()` call
+- Entries validated by `(mtimeMs, size)` — if either changes, entry is re-extracted via Tier 3
+- Written to disk asynchronously using a dirty-flag write-behind strategy: only when cache has new/updated entries, coalescing multiple discovery passes, non-blocking
+- Flush any pending write on process exit (`SIGTERM`, `SIGINT`) and graceful server shutdown — prevents losing cache updates when the server stops before the async write fires
+- Corrupt or missing cache file triggers graceful fallback (all files go through Tier 3, cache rebuilt)
+- Atomic writes: write to temp file, then rename (prevents corruption from crashes during write)
+- Stale entries (file no longer exists on disk) are pruned on save
+
+### Concurrency model
+
+Cold start with 3,103 files requires bounded parallelism to avoid file-handle exhaustion and IO thrash, while still meeting the <5s target:
+
+- **Stat phase**: Batch all `fs.stat()` calls with concurrency limit (e.g., 64). This classifies each file into Tier 1/2 (cache hit) or Tier 3 (needs parse). Files that fail stat (ENOENT from deletion race, EACCES) are silently skipped with a debug log.
+- **Parse phase**: Process Tier-3 misses with bounded concurrency (e.g., 8). Each parse reads + iterates via shared `forEachJsonlLine()` + shared counter. With max file size 4.5MB, each parse is ~5-50ms.
+- Use a simple async work queue (e.g., `p-limit` or hand-rolled semaphore). No worker threads needed for this IO-bound workload.
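The hand-rolled work queue mentioned in the last bullet can be sketched as below: workers synchronously claim the next index before awaiting, so results keep input order and at most `limit` operations are in flight (the name `mapWithLimit` is illustrative).

```typescript
// Minimal bounded-concurrency map: at most `limit` tasks run at once.
// Each worker claims an index synchronously before awaiting, so no item
// is processed twice and results[i] always corresponds to items[i].
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let nextIndex = 0;

  async function worker(): Promise<void> {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await fn(items[i]);
    }
  }

  // Spawn at most `limit` workers; fewer if there are fewer items.
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

The stat phase would call this with a limit of 64 and the parse phase with 8; swapping in `p-limit` gives the same semantics as a dependency.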
+ +### Performance expectations + +| Scenario | Estimated time | +|----------|---------------| +| Cold start (no cache, no index) | ~3-5s for 3,103 files (~500MB), bounded concurrency: stat@64, parse@8 | +| Warm start (cache exists, few changes) | ~300-500ms (stat all files at bounded concurrency, in-memory lookups) | +| Incremental (cache + few new sessions) | ~500ms + ~50ms per new file | +| Subsequent API calls within 30s TTL | <1ms (in-memory session list cache) | + +### Existing infrastructure leveraged + +- **30-second in-memory cache** in `sessions.ts` (`getCachedSessions()`) — unchanged, provides the fast path for repeated API calls +- **`?refresh=1` query parameter** — forces cache invalidation, unchanged +- **Concurrent request deduplication** via `cachePromise` pattern — unchanged +- **Security validations** — path traversal rejection, containment checks, `.jsonl` extension enforcement — applied to filesystem-discovered files identically + +## Implementation scope + +### Checkpoints + +#### CP0 — Parser parity foundations +- Extract `forEachJsonlLine()` shared line iterator from existing parser +- Extract `countMessagesForLine()` and `classifyLine()` shared helpers +- Refactor `extractMessages()` to use these internally (no behavior change to parseSessionContent) +- Tests verify identical behavior on malformed/truncated lines + +#### CP1 — Filesystem-first correctness +- All `.jsonl` sessions appear even with missing/corrupt index +- `extractSessionMetadata()` uses shared line iterator + counting helpers; exact counts verified by tests +- Stat-derived `created`/`modified` are the single source for SessionEntry timestamps and list ordering +- Duration computed from JSONL timestamps, not index +- TOCTOU races (readdir/stat, stat/read) handled gracefully — disappeared files silently skipped + +#### CP2 — Persistent cache +- Atomic writes with dirty-flag write-behind; prune stale entries +- Invalidation keyed on `(mtimeMs, size)` +- Flush pending writes on 
process exit / server shutdown + +#### CP3 — Index fast path (Tier 1) +- Parse index into Map; normalize `modified` ISO → ms; validate against stat mtime with 1s tolerance +- sessionId mapping: `basename(file, '.jsonl')` +- Tier 1 trusts content fields only; timestamps always from stat + +#### CP4 — Performance hardening +- Bounded concurrency for stat + parse phases +- Warm start <1s verified on real dataset + +### Modified files + +**`src/server/services/session-parser.ts`** + +1. Extract `forEachJsonlLine(content, onLine): { parseErrors: number }` — shared line iterator with consistent malformed-line handling +2. Extract `countMessagesForLine(parsed: RawLine): number` — shared counting helper +3. Extract `classifyLine(parsed: RawLine): LineClassification` — shared classification +4. Refactor `extractMessages()` to use these shared helpers internally (no behavior change to parseSessionContent) + +**`src/server/services/session-discovery.ts`** + +1. Add `extractSessionMetadata(content: string): SessionMetadata` — lightweight JSONL metadata extractor using shared line iterator + counting helper +2. Add `MetadataCache` class — persistent cache with load/get/set/save, dirty-flag write-behind, shutdown flush +3. Rewrite per-project discovery loop — filesystem-first, tiered metadata lookup with bounded concurrency +4. Read `sessions-index.json` as optimization only — parse into `Map`, normalize `modified` to ms, validate against stat mtime before trusting +5. 
Register shutdown hooks for cache flush on `SIGTERM`/`SIGINT` + +### Unchanged files + +- `src/server/routes/sessions.ts` — existing caching layer works as-is +- `src/shared/types.ts` — `SessionEntry` type already has `duration?: number` +- All client components — no changes needed + +### New tests + +- Unit test: `forEachJsonlLine()` skips malformed lines identically to how `parseSessionContent` handles them +- Unit test: `forEachJsonlLine()` reports parse error count for truncated/corrupted lines +- Unit test: `countMessagesForLine()` matches actual `extractMessages()` output length on sample lines +- Unit test: `extractSessionMetadata()` output matches `parseSessionContent().length` on sample fixtures (including malformed/truncated lines) +- Unit test: Duration extracted from JSONL timestamps matches expected values +- Unit test: SessionEntry `created`/`modified` always come from stat, even when Tier 1 index data is trusted +- Unit test: Tier 1 validation rejects stale index entries (mtime mismatch beyond 1s tolerance) +- Unit test: Tier 1 handles missing `modified` field gracefully (falls through to Tier 2) +- Unit test: Discovery works with no `sessions-index.json` present +- Unit test: Discovery silently skips files that disappear between readdir and stat (TOCTOU) +- Unit test: Cache hit/miss/invalidation behavior (mtimeMs + size) +- Unit test: Cache dirty-flag only triggers write when entries changed + +## Edge cases + +| Scenario | Behavior | +|----------|----------| +| File actively being written | mtime changes between stat and read. Next discovery pass re-extracts. Partial JSONL handled gracefully (malformed lines skipped via shared `forEachJsonlLine`, same behavior as real parser). | +| Deleted session files | File in cache but gone from disk. Entry silently dropped, pruned from cache on next save. | +| File disappears between readdir and stat | TOCTOU race. Stat failure (ENOENT) silently skipped with debug log. 
| +| File disappears between stat and read | Read failure silently skipped; file excluded from results. Next pass re-discovers if it reappears. | +| Index entry with wrong mtime | Tier 1 validation rejects it (>1s tolerance). Falls through to Tier 2/3. | +| Index entry with no `modified` field | Tier 1 skips it. Falls through to Tier 2/3. | +| Index `modified` in seconds vs milliseconds | Normalization handles both ISO strings and numeric timestamps. | +| Cache file locked or unwritable | Extraction still works, just doesn't persist. Warning logged to stderr. | +| Very large files | 4.5MB max observed. Tier 3 parse ~50ms. Acceptable. | +| Concurrent server restarts | Cache writes are atomic (temp file + rename). | +| Server killed before async cache write | Shutdown hooks flush pending writes on SIGTERM/SIGINT. Hard kills (SIGKILL) may lose updates — acceptable, cache rebuilt on next cold start. | +| Empty JSONL files | Returns `messageCount: 0`, empty `firstPrompt`, `summary`, and timestamps. Duration: 0. | +| Projects with no index file | Discovery proceeds normally via Tier 2/3. Common case (13 projects). | +| Non-JSONL files in project dirs | Filtered out by `.jsonl` extension check in `readdir` results. | +| File handle exhaustion | Bounded concurrency (stat@64, parse@8) prevents opening thousands of handles. | +| Future parser changes (new message types) | Shared line iterator + counting helper in session-parser.ts means Tier 3 automatically stays in sync. | +| Malformed JSONL lines (crash mid-write) | Shared `forEachJsonlLine()` skips identically in both metadata extraction and full parsing — no count drift. | + +## Verification plan + +1. Start dev server, confirm today's sessions appear immediately in the session list +2. Compare message counts for indexed sessions: Tier 1 data vs Tier 3 extraction (should match) +3. Verify duration is shown for sessions that have no index entry (JSONL-only sessions) +4. 
Delete a `sessions-index.json`, refresh — verify all sessions for that project still appear with correct counts and durations
+5. Run existing test suite: `npm test`
+6. Run new unit tests for shared line iterator, counting helper, `extractSessionMetadata()`, and `MetadataCache`
+7. Verify `created`/`modified` in session list come from stat, not index (compare with `ls -l` output)
+8. Verify cold start performance: delete `~/.cache/session-viewer/metadata.json`, time the first API request
+9. Verify warm start performance: time a subsequent server start with cache in place
+10. Verify cache dirty-flag: repeated refreshes with no file changes should not write cache to disk
+11. Kill server with SIGTERM, restart — verify cache was flushed (no full re-parse on restart)
diff --git a/src/server/services/session-discovery.ts b/src/server/services/session-discovery.ts
index 8189137..6cff1bb 100644
--- a/src/server/services/session-discovery.ts
+++ b/src/server/services/session-discovery.ts
@@ -2,6 +2,54 @@ import fs from "fs/promises";
 import path from "path";
 import os from "os";
 import type { SessionEntry } from "../../shared/types.js";
+import { extractSessionMetadata } from "./session-metadata.js";
+import { MetadataCache } from "./metadata-cache.js";
+import type { CacheEntry } from "./metadata-cache.js";
+
+const CLAUDE_PROJECTS_DIR = path.join(os.homedir(), ".claude", "projects");
+const FILE_CONCURRENCY = 32;
+
+let cache: MetadataCache | null = null;
+let cacheLoaded = false;
+
+export function setCache(c: MetadataCache | null): void {
+  cache = c;
+  cacheLoaded = c !== null;
+}
+
+async function ensureCache(): Promise<MetadataCache> {
+  if (!cache) {
+    cache = new MetadataCache();
+  }
+  if (!cacheLoaded) {
+    await cache.load();
+    cacheLoaded = true;
+  }
+  return cache;
+}
+
+async function mapWithLimit<T, R>(
+  items: T[],
+  limit: number,
+  fn: (item: T) => Promise<R>
+): Promise<R[]> {
+  const results: R[] = new Array(items.length);
+  let nextIndex = 0;
+
+  async function worker():
Promise { + while (nextIndex < items.length) { + const i = nextIndex++; + results[i] = await fn(items[i]); + } + } + + const workers = Array.from( + { length: Math.min(limit, items.length) }, + () => worker() + ); + await Promise.all(workers); + return results; +} interface IndexEntry { sessionId: string; @@ -14,12 +62,14 @@ interface IndexEntry { projectPath?: string; } -const CLAUDE_PROJECTS_DIR = path.join(os.homedir(), ".claude", "projects"); +const MTIME_TOLERANCE_MS = 1000; export async function discoverSessions( projectsDir: string = CLAUDE_PROJECTS_DIR ): Promise { const sessions: SessionEntry[] = []; + const metadataCache = await ensureCache(); + const discoveredPaths = new Set(); let projectDirs: string[]; try { @@ -28,63 +78,152 @@ export async function discoverSessions( return sessions; } - // Parallel I/O: stat + readFile for all project dirs concurrently const results = await Promise.all( projectDirs.map(async (projectDir) => { const projectPath = path.join(projectsDir, projectDir); const entries: SessionEntry[] = []; - let stat; + let dirStat; try { - stat = await fs.stat(projectPath); + dirStat = await fs.stat(projectPath); } catch { return entries; } - if (!stat.isDirectory()) return entries; + if (!dirStat.isDirectory()) return entries; - const indexPath = path.join(projectPath, "sessions-index.json"); + let files: string[]; try { - const content = await fs.readFile(indexPath, "utf-8"); - const parsed = JSON.parse(content); - - // Handle both formats: raw array or { version, entries: [...] } - const rawEntries: IndexEntry[] = Array.isArray(parsed) - ? parsed - : parsed.entries ?? []; - - for (const entry of rawEntries) { - const sessionPath = - entry.fullPath || - path.join(projectPath, `${entry.sessionId}.jsonl`); - - // Validate: reject paths with traversal segments or non-JSONL extensions. - // Check the raw path for ".." before resolving (resolve normalizes them away). 
- if (sessionPath.includes("..") || !sessionPath.endsWith(".jsonl")) { - continue; - } - const resolved = path.resolve(sessionPath); - - // Containment check: reject paths that escape the projects directory. - // A corrupted or malicious index could set fullPath to an arbitrary - // absolute path like "/etc/shadow.jsonl". - if (!resolved.startsWith(projectsDir + path.sep) && resolved !== projectsDir) { - continue; - } - - entries.push({ - id: entry.sessionId, - summary: entry.summary || "", - firstPrompt: entry.firstPrompt || "", - project: projectDir, - created: entry.created || "", - modified: entry.modified || "", - messageCount: entry.messageCount || 0, - path: resolved, - duration: computeDuration(entry.created, entry.modified), - }); - } + files = await fs.readdir(projectPath); } catch { - // Missing or corrupt index - skip + return entries; + } + + const jsonlFiles = files.filter((f) => f.endsWith(".jsonl")); + + // Tier 1: Load sessions-index.json for this project + const indexMap = await loadProjectIndex(projectPath); + + const fileResults = await mapWithLimit( + jsonlFiles, + FILE_CONCURRENCY, + async (filename) => { + const filePath = path.join(projectPath, filename); + + // Security: reject traversal + if (filename.includes("..")) return null; + + const resolved = path.resolve(filePath); + if ( + !resolved.startsWith(projectsDir + path.sep) && + resolved !== projectsDir + ) { + return null; + } + + let fileStat; + try { + fileStat = await fs.stat(resolved); + } catch { + return null; + } + + discoveredPaths.add(resolved); + + const sessionId = path.basename(filename, ".jsonl"); + + // Tier 1: Check index + const indexEntry = indexMap.get(sessionId); + if (indexEntry?.modified) { + const indexMtimeMs = new Date(indexEntry.modified).getTime(); + if ( + !isNaN(indexMtimeMs) && + Math.abs(indexMtimeMs - fileStat.mtimeMs) <= MTIME_TOLERANCE_MS + ) { + const duration = computeDuration( + indexEntry.created, + indexEntry.modified + ); + return { + id: 
sessionId, + project: projectDir, + path: resolved, + created: new Date(fileStat.birthtimeMs).toISOString(), + modified: new Date(fileStat.mtimeMs).toISOString(), + messageCount: indexEntry.messageCount || 0, + firstPrompt: indexEntry.firstPrompt || "", + summary: indexEntry.summary || "", + duration: duration > 0 ? duration : undefined, + } satisfies SessionEntry; + } + } + + // Tier 2: Check metadata cache + const cached = metadataCache.get( + resolved, + fileStat.mtimeMs, + fileStat.size + ); + if (cached) { + const duration = computeDuration( + cached.firstTimestamp, + cached.lastTimestamp + ); + return { + id: sessionId, + project: projectDir, + path: resolved, + created: new Date(fileStat.birthtimeMs).toISOString(), + modified: new Date(fileStat.mtimeMs).toISOString(), + messageCount: cached.messageCount, + firstPrompt: cached.firstPrompt, + summary: cached.summary, + duration: duration > 0 ? duration : undefined, + } satisfies SessionEntry; + } + + // Tier 3: Full parse + let content: string; + try { + content = await fs.readFile(resolved, "utf-8"); + } catch { + return null; + } + + const metadata = extractSessionMetadata(content); + + // Update cache + const cacheEntry: CacheEntry = { + mtimeMs: fileStat.mtimeMs, + size: fileStat.size, + messageCount: metadata.messageCount, + firstPrompt: metadata.firstPrompt, + summary: metadata.summary, + firstTimestamp: metadata.firstTimestamp, + lastTimestamp: metadata.lastTimestamp, + }; + metadataCache.set(resolved, cacheEntry); + + const duration = computeDuration( + metadata.firstTimestamp, + metadata.lastTimestamp + ); + + return { + id: sessionId, + project: projectDir, + path: resolved, + created: new Date(fileStat.birthtimeMs).toISOString(), + modified: new Date(fileStat.mtimeMs).toISOString(), + messageCount: metadata.messageCount, + firstPrompt: metadata.firstPrompt, + summary: metadata.summary, + duration: duration > 0 ? 
duration : undefined, + } satisfies SessionEntry; + } + ); + + for (const entry of fileResults) { + if (entry) entries.push(entry); } return entries; @@ -101,14 +240,47 @@ export async function discoverSessions( return dateB - dateA; }); + // Fire-and-forget cache save + metadataCache.save(discoveredPaths).catch(() => { + // Cache write failure is non-fatal + }); + return sessions; } -function computeDuration(created?: string, modified?: string): number { - if (!created || !modified) return 0; - const createdMs = new Date(created).getTime(); - const modifiedMs = new Date(modified).getTime(); - if (isNaN(createdMs) || isNaN(modifiedMs)) return 0; - const diff = modifiedMs - createdMs; +async function loadProjectIndex( + projectPath: string +): Promise> { + const indexMap = new Map(); + const indexPath = path.join(projectPath, "sessions-index.json"); + + try { + const raw = await fs.readFile(indexPath, "utf-8"); + const parsed = JSON.parse(raw); + const rawEntries: IndexEntry[] = Array.isArray(parsed) + ? parsed + : parsed.entries ?? []; + + for (const entry of rawEntries) { + if (entry.sessionId) { + indexMap.set(entry.sessionId, entry); + } + } + } catch { + // Missing or corrupt index — continue without Tier 1 + } + + return indexMap; +} + +function computeDuration( + firstTimestamp?: string, + lastTimestamp?: string +): number { + if (!firstTimestamp || !lastTimestamp) return 0; + const firstMs = new Date(firstTimestamp).getTime(); + const lastMs = new Date(lastTimestamp).getTime(); + if (isNaN(firstMs) || isNaN(lastMs)) return 0; + const diff = lastMs - firstMs; return diff > 0 ? 
diff : 0; } diff --git a/tests/unit/session-discovery.test.ts b/tests/unit/session-discovery.test.ts index cc2f828..27489d8 100644 --- a/tests/unit/session-discovery.test.ts +++ b/tests/unit/session-discovery.test.ts @@ -1,70 +1,122 @@ -import { describe, it, expect } from "vitest"; -import { discoverSessions } from "../../src/server/services/session-discovery.js"; +import { describe, it, expect, beforeEach } from "vitest"; +import { discoverSessions, setCache } from "../../src/server/services/session-discovery.js"; +import { MetadataCache } from "../../src/server/services/metadata-cache.js"; import path from "path"; import fs from "fs/promises"; import os from "os"; -/** Helper to write a sessions-index.json in the real { version, entries } format */ -function makeIndex(entries: Record[]) { +function makeJsonlContent(lines: Record[]): string { + return lines.map((l) => JSON.stringify(l)).join("\n"); +} + +function makeIndex(entries: Record[]): string { return JSON.stringify({ version: 1, entries }); } -describe("session-discovery", () => { - it("discovers sessions from { version, entries } format", async () => { - const tmpDir = path.join(os.tmpdir(), `sv-test-${Date.now()}`); - const projectDir = path.join(tmpDir, "test-project"); - await fs.mkdir(projectDir, { recursive: true }); +async function makeTmpProject( + suffix: string +): Promise<{ tmpDir: string; projectDir: string; cachePath: string; cleanup: () => Promise }> { + const tmpDir = path.join(os.tmpdir(), `sv-test-${suffix}-${Date.now()}`); + const projectDir = path.join(tmpDir, "test-project"); + const cachePath = path.join(tmpDir, ".cache", "metadata.json"); + await fs.mkdir(projectDir, { recursive: true }); + return { + tmpDir, + projectDir, + cachePath, + cleanup: () => fs.rm(tmpDir, { recursive: true }), + }; +} - const sessionPath = path.join(projectDir, "sess-001.jsonl"); - await fs.writeFile( - path.join(projectDir, "sessions-index.json"), - makeIndex([ - { - sessionId: "sess-001", - fullPath: 
sessionPath, - summary: "Test session", - firstPrompt: "Hello", - created: "2025-10-15T10:00:00Z", - modified: "2025-10-15T11:00:00Z", - messageCount: 5, +describe("session-discovery", () => { + beforeEach(() => { + // Reset global cache between tests to prevent cross-contamination + setCache(new MetadataCache(path.join(os.tmpdir(), `sv-cache-${Date.now()}.json`))); + }); + + it("discovers sessions from .jsonl files without index", async () => { + const { tmpDir, projectDir, cleanup } = await makeTmpProject("no-index"); + + const content = makeJsonlContent([ + { + type: "user", + message: { role: "user", content: "Hello world" }, + uuid: "u-1", + timestamp: "2025-10-15T10:00:00Z", + }, + { + type: "assistant", + message: { + role: "assistant", + content: [{ type: "text", text: "Hi there" }], }, - ]) - ); + uuid: "a-1", + timestamp: "2025-10-15T10:01:00Z", + }, + ]); + + await fs.writeFile(path.join(projectDir, "sess-001.jsonl"), content); const sessions = await discoverSessions(tmpDir); expect(sessions).toHaveLength(1); expect(sessions[0].id).toBe("sess-001"); - expect(sessions[0].summary).toBe("Test session"); expect(sessions[0].project).toBe("test-project"); - expect(sessions[0].messageCount).toBe(5); - expect(sessions[0].path).toBe(sessionPath); + expect(sessions[0].messageCount).toBe(2); + expect(sessions[0].firstPrompt).toBe("Hello world"); + expect(sessions[0].path).toBe(path.join(projectDir, "sess-001.jsonl")); - await fs.rm(tmpDir, { recursive: true }); + await cleanup(); }); - it("also handles legacy raw array format", async () => { - const tmpDir = path.join(os.tmpdir(), `sv-test-legacy-${Date.now()}`); - const projectDir = path.join(tmpDir, "legacy-project"); - await fs.mkdir(projectDir, { recursive: true }); + it("timestamps come from stat, not JSONL content", async () => { + const { tmpDir, projectDir, cleanup } = await makeTmpProject("stat-times"); - // Raw array (not wrapped in { version, entries }) - await fs.writeFile( - path.join(projectDir, 
"sessions-index.json"),
-      JSON.stringify([
-        {
-          sessionId: "legacy-001",
-          summary: "Legacy format",
-          created: "2025-10-15T10:00:00Z",
-          modified: "2025-10-15T11:00:00Z",
-        },
-      ])
-    );
+    const content = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "Hello" },
+        uuid: "u-1",
+        timestamp: "2020-01-01T00:00:00Z",
+      },
+    ]);
+
+    const filePath = path.join(projectDir, "sess-stat.jsonl");
+    await fs.writeFile(filePath, content);
 
     const sessions = await discoverSessions(tmpDir);
     expect(sessions).toHaveLength(1);
-    expect(sessions[0].id).toBe("legacy-001");
 
-    await fs.rm(tmpDir, { recursive: true });
+    // created and modified should be from stat (recent), not from the 2020 timestamp
+    const createdDate = new Date(sessions[0].created);
+    const now = new Date();
+    const diffMs = now.getTime() - createdDate.getTime();
+    expect(diffMs).toBeLessThan(60_000); // within last minute
+
+    await cleanup();
+  });
+
+  it("silently skips files deleted between readdir and stat", async () => {
+    const { tmpDir, projectDir, cleanup } = await makeTmpProject("toctou");
+
+    // Write a session, discover will find it
+    const content = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "Survives" },
+        uuid: "u-1",
+      },
+    ]);
+    await fs.writeFile(path.join(projectDir, "survivor.jsonl"), content);
+
+    // Write and immediately delete another
+    await fs.writeFile(path.join(projectDir, "ghost.jsonl"), content);
+    await fs.unlink(path.join(projectDir, "ghost.jsonl"));
+
+    const sessions = await discoverSessions(tmpDir);
+    expect(sessions).toHaveLength(1);
+    expect(sessions[0].id).toBe("survivor");
+
+    await cleanup();
   });
 
   it("handles missing projects directory gracefully", async () => {
@@ -72,21 +124,6 @@ describe("session-discovery", () => {
     expect(sessions).toEqual([]);
   });
 
-  it("handles corrupt index files gracefully", async () => {
-    const tmpDir = path.join(os.tmpdir(), `sv-test-corrupt-${Date.now()}`);
-    const projectDir = path.join(tmpDir, "corrupt-project");
-    await fs.mkdir(projectDir, { recursive: true });
-    await fs.writeFile(
-      path.join(projectDir, "sessions-index.json"),
-      "not valid json {"
-    );
-
-    const sessions = await discoverSessions(tmpDir);
-    expect(sessions).toEqual([]);
-
-    await fs.rm(tmpDir, { recursive: true });
-  });
-
   it("aggregates across multiple project directories", async () => {
     const tmpDir = path.join(os.tmpdir(), `sv-test-multi-${Date.now()}`);
     const proj1 = path.join(tmpDir, "project-a");
@@ -94,14 +131,25 @@ describe("session-discovery", () => {
     await fs.mkdir(proj1, { recursive: true });
     await fs.mkdir(proj2, { recursive: true });
 
-    await fs.writeFile(
-      path.join(proj1, "sessions-index.json"),
-      makeIndex([{ sessionId: "a-001", created: "2025-01-01T00:00:00Z", modified: "2025-01-01T00:00:00Z" }])
-    );
-    await fs.writeFile(
-      path.join(proj2, "sessions-index.json"),
-      makeIndex([{ sessionId: "b-001", created: "2025-01-02T00:00:00Z", modified: "2025-01-02T00:00:00Z" }])
-    );
+    const contentA = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "Project A" },
+        uuid: "u-a",
+        timestamp: "2025-01-01T00:00:00Z",
+      },
+    ]);
+    const contentB = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "Project B" },
+        uuid: "u-b",
+        timestamp: "2025-01-02T00:00:00Z",
+      },
+    ]);
+
+    await fs.writeFile(path.join(proj1, "a-001.jsonl"), contentA);
+    await fs.writeFile(path.join(proj2, "b-001.jsonl"), contentB);
 
     const sessions = await discoverSessions(tmpDir);
     expect(sessions).toHaveLength(2);
@@ -112,93 +160,299 @@ describe("session-discovery", () => {
     await fs.rm(tmpDir, { recursive: true });
   });
 
-  it("rejects paths with traversal segments", async () => {
-    const tmpDir = path.join(os.tmpdir(), `sv-test-traversal-${Date.now()}`);
-    const projectDir = path.join(tmpDir, "traversal-project");
-    await fs.mkdir(projectDir, { recursive: true });
+  it("ignores non-.jsonl files in project directories", async () => {
+    const { tmpDir, projectDir, cleanup } = await makeTmpProject("filter-ext");
 
-    const goodPath = path.join(projectDir, "good-001.jsonl");
+    const content = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "Hello" },
+        uuid: "u-1",
+      },
+    ]);
+
+    await fs.writeFile(path.join(projectDir, "session.jsonl"), content);
     await fs.writeFile(
       path.join(projectDir, "sessions-index.json"),
-      makeIndex([
-        {
-          sessionId: "evil-001",
-          fullPath: "/home/ubuntu/../../../etc/passwd",
-          created: "2025-10-15T10:00:00Z",
-          modified: "2025-10-15T11:00:00Z",
-        },
-        {
-          sessionId: "evil-002",
-          fullPath: "/home/ubuntu/sessions/not-a-jsonl.txt",
-          created: "2025-10-15T10:00:00Z",
-          modified: "2025-10-15T11:00:00Z",
-        },
-        {
-          sessionId: "good-001",
-          fullPath: goodPath,
-          created: "2025-10-15T10:00:00Z",
-          modified: "2025-10-15T11:00:00Z",
-        },
-      ])
+      '{"version":1,"entries":[]}'
     );
+    await fs.writeFile(path.join(projectDir, "notes.txt"), "notes");
 
     const sessions = await discoverSessions(tmpDir);
     expect(sessions).toHaveLength(1);
-    expect(sessions[0].id).toBe("good-001");
+    expect(sessions[0].id).toBe("session");
 
-    await fs.rm(tmpDir, { recursive: true });
+    await cleanup();
   });
 
-  it("rejects absolute paths outside the projects directory", async () => {
-    const tmpDir = path.join(os.tmpdir(), `sv-test-containment-${Date.now()}`);
-    const projectDir = path.join(tmpDir, "contained-project");
-    await fs.mkdir(projectDir, { recursive: true });
+  it("duration computed from JSONL timestamps", async () => {
+    const { tmpDir, projectDir, cleanup } = await makeTmpProject("duration");
 
-    await fs.writeFile(
-      path.join(projectDir, "sessions-index.json"),
-      makeIndex([
-        {
-          sessionId: "escaped-001",
-          fullPath: "/etc/shadow.jsonl",
-          created: "2025-10-15T10:00:00Z",
-          modified: "2025-10-15T11:00:00Z",
+    const content = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "Start" },
+        uuid: "u-1",
+        timestamp: "2025-10-15T10:00:00Z",
+      },
+      {
+        type: "assistant",
+        message: {
+          role: "assistant",
+          content: [{ type: "text", text: "End" }],
         },
-        {
-          sessionId: "escaped-002",
-          fullPath: "/tmp/other-dir/secret.jsonl",
-          created: "2025-10-15T10:00:00Z",
-          modified: "2025-10-15T11:00:00Z",
-        },
-      ])
-    );
+        uuid: "a-1",
+        timestamp: "2025-10-15T10:30:00Z",
+      },
+    ]);
+
+    await fs.writeFile(path.join(projectDir, "sess-dur.jsonl"), content);
 
     const sessions = await discoverSessions(tmpDir);
-    expect(sessions).toHaveLength(0);
+    expect(sessions).toHaveLength(1);
+    // 30 minutes = 1800000 ms
+    expect(sessions[0].duration).toBe(1_800_000);
 
-    await fs.rm(tmpDir, { recursive: true });
+    await cleanup();
   });
 
-  it("uses fullPath from index entry", async () => {
-    const tmpDir = path.join(os.tmpdir(), `sv-test-fp-${Date.now()}`);
-    const projectDir = path.join(tmpDir, "fp-project");
-    await fs.mkdir(projectDir, { recursive: true });
+  it("handles empty .jsonl files", async () => {
+    const { tmpDir, projectDir, cleanup } = await makeTmpProject("empty");
 
-    const sessionPath = path.join(projectDir, "fp-001.jsonl");
-    await fs.writeFile(
-      path.join(projectDir, "sessions-index.json"),
-      makeIndex([
-        {
-          sessionId: "fp-001",
-          fullPath: sessionPath,
-          created: "2025-10-15T10:00:00Z",
-          modified: "2025-10-15T11:00:00Z",
-        },
-      ])
-    );
+    await fs.writeFile(path.join(projectDir, "empty.jsonl"), "");
 
     const sessions = await discoverSessions(tmpDir);
-    expect(sessions[0].path).toBe(sessionPath);
+    expect(sessions).toHaveLength(1);
+    expect(sessions[0].id).toBe("empty");
+    expect(sessions[0].messageCount).toBe(0);
+    expect(sessions[0].firstPrompt).toBe("");
 
-    await fs.rm(tmpDir, { recursive: true });
+    await cleanup();
+  });
+
+  it("sorts by modified descending", async () => {
+    const { tmpDir, projectDir, cleanup } = await makeTmpProject("sort");
+
+    const content1 = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "First" },
+        uuid: "u-1",
+      },
+    ]);
+    const content2 = makeJsonlContent([
+      {
+        type: "user",
+        message: { role: "user", content: "Second" },
+        uuid: "u-2",
+      },
+    ]);
+
+    await fs.writeFile(path.join(projectDir, "older.jsonl"), content1);
+    // Small delay to ensure different mtime
+    await new Promise((r) => setTimeout(r, 50));
+    await fs.writeFile(path.join(projectDir, "newer.jsonl"), content2);
+
+    const sessions = await discoverSessions(tmpDir);
+    expect(sessions).toHaveLength(2);
+    expect(sessions[0].id).toBe("newer");
+    expect(sessions[1].id).toBe("older");
+
+    await cleanup();
+  });
+
+  describe("Tier 1 index validation", () => {
+    it("uses index data when modified matches stat mtime within 1s", async () => {
+      const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-hit");
+
+      const content = makeJsonlContent([
+        {
+          type: "user",
+          message: { role: "user", content: "Hello" },
+          uuid: "u-1",
+          timestamp: "2025-10-15T10:00:00Z",
+        },
+      ]);
+      const filePath = path.join(projectDir, "sess-idx.jsonl");
+      await fs.writeFile(filePath, content);
+
+      // Get the actual mtime from the file
+      const stat = await fs.stat(filePath);
+      const mtimeIso = new Date(stat.mtimeMs).toISOString();
+
+      // Write an index with the matching modified timestamp and different metadata
+      await fs.writeFile(
+        path.join(projectDir, "sessions-index.json"),
+        makeIndex([
+          {
+            sessionId: "sess-idx",
+            summary: "Index summary",
+            firstPrompt: "Index prompt",
+            messageCount: 99,
+            modified: mtimeIso,
+            created: "2025-10-15T09:00:00Z",
+          },
+        ])
+      );
+
+      const sessions = await discoverSessions(tmpDir);
+      expect(sessions).toHaveLength(1);
+      // Should use index data (Tier 1 hit)
+      expect(sessions[0].messageCount).toBe(99);
+      expect(sessions[0].summary).toBe("Index summary");
+      expect(sessions[0].firstPrompt).toBe("Index prompt");
+
+      await cleanup();
+    });
+
+    it("rejects index data when mtime mismatch > 1s", async () => {
+      const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-miss");
+
+      const content = makeJsonlContent([
+        {
+          type: "user",
+          message: { role: "user", content: "Real content" },
+          uuid: "u-1",
+          timestamp: "2025-10-15T10:00:00Z",
+        },
+      ]);
+      await fs.writeFile(path.join(projectDir, "sess-stale.jsonl"), content);
+
+      // Write an index with a very old modified timestamp (stale)
+      await fs.writeFile(
+        path.join(projectDir, "sessions-index.json"),
+        makeIndex([
+          {
+            sessionId: "sess-stale",
+            summary: "Stale index summary",
+            firstPrompt: "Stale prompt",
+            messageCount: 99,
+            modified: "2020-01-01T00:00:00Z",
+            created: "2020-01-01T00:00:00Z",
+          },
+        ])
+      );
+
+      const sessions = await discoverSessions(tmpDir);
+      expect(sessions).toHaveLength(1);
+      // Should NOT use index data (Tier 1 miss) — falls through to Tier 3
+      expect(sessions[0].messageCount).toBe(1); // Actual parse count
+      expect(sessions[0].firstPrompt).toBe("Real content");
+
+      await cleanup();
+    });
+
+    it("skips Tier 1 when entry has no modified field", async () => {
+      const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-no-mod");
+
+      const content = makeJsonlContent([
+        {
+          type: "user",
+          message: { role: "user", content: "Real content" },
+          uuid: "u-1",
+        },
+      ]);
+      await fs.writeFile(path.join(projectDir, "sess-nomod.jsonl"), content);
+
+      await fs.writeFile(
+        path.join(projectDir, "sessions-index.json"),
+        makeIndex([
+          {
+            sessionId: "sess-nomod",
+            summary: "Index summary",
+            messageCount: 99,
+            // No modified field
+          },
+        ])
+      );
+
+      const sessions = await discoverSessions(tmpDir);
+      expect(sessions).toHaveLength(1);
+      // Falls through to Tier 3 parse
+      expect(sessions[0].messageCount).toBe(1);
+
+      await cleanup();
+    });
+
+    it("handles missing sessions-index.json", async () => {
+      const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-missing");
+
+      const content = makeJsonlContent([
+        {
+          type: "user",
+          message: { role: "user", content: "No index" },
+          uuid: "u-1",
+        },
+      ]);
+      await fs.writeFile(path.join(projectDir, "sess-noindex.jsonl"), content);
+
+      const sessions = await discoverSessions(tmpDir);
+      expect(sessions).toHaveLength(1);
+      expect(sessions[0].firstPrompt).toBe("No index");
+
+      await cleanup();
+    });
+
+    it("handles corrupt sessions-index.json", async () => {
+      const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-corrupt");
+
+      const content = makeJsonlContent([
+        {
+          type: "user",
+          message: { role: "user", content: "Corrupt index" },
+          uuid: "u-1",
+        },
+      ]);
+      await fs.writeFile(path.join(projectDir, "sess-corrupt.jsonl"), content);
+      await fs.writeFile(
        path.join(projectDir, "sessions-index.json"),
        "not valid json {"
      );
+
+      const sessions = await discoverSessions(tmpDir);
+      expect(sessions).toHaveLength(1);
+      expect(sessions[0].firstPrompt).toBe("Corrupt index");
+
+      await cleanup();
+    });
+
+    it("timestamps always from stat even on Tier 1 hit", async () => {
+      const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-stat-ts");
+
+      const content = makeJsonlContent([
+        {
+          type: "user",
+          message: { role: "user", content: "Hello" },
+          uuid: "u-1",
+        },
+      ]);
+      const filePath = path.join(projectDir, "sess-ts.jsonl");
+      await fs.writeFile(filePath, content);
+
+      const stat = await fs.stat(filePath);
+      const mtimeIso = new Date(stat.mtimeMs).toISOString();
+
+      await fs.writeFile(
+        path.join(projectDir, "sessions-index.json"),
+        makeIndex([
+          {
+            sessionId: "sess-ts",
+            messageCount: 1,
+            modified: mtimeIso,
+            created: "1990-01-01T00:00:00Z",
+          },
+        ])
+      );
+
+      const sessions = await discoverSessions(tmpDir);
+      expect(sessions).toHaveLength(1);
+
+      // created/modified should be from stat (recent), not from index's 1990 date
+      const createdDate = new Date(sessions[0].created);
+      const now = new Date();
+      expect(now.getTime() - createdDate.getTime()).toBeLessThan(60_000);
+
+      await cleanup();
+    });
+  });
 });