Implement JSONL-first session discovery with tiered lookup

Rewrite session discovery to be filesystem-first, addressing the widespread
bug where Claude Code's sessions-index.json files are unreliable (87 MB of
unindexed sessions, 17% loss rate across all projects).

Architecture: Three-tier metadata lookup

Tier 1 - Index validation (instant):
  - Parse sessions-index.json into Map<sessionId, IndexEntry>
  - Validate entry.modified against actual file stat.mtimeMs
  - Use 1s tolerance to account for ISO string → filesystem mtime rounding
  - Trust content fields only (messageCount, summary, firstPrompt)
  - Timestamps always come from fs.stat, never from index

Tier 2 - Persistent cache hit (instant):
  - Check MetadataCache by (filePath, mtimeMs, size)
  - If match, use cached metadata
  - Survives server restarts

Tier 3 - Full JSONL parse (~5-50ms/file):
  - Call extractSessionMetadata() with shared parser helpers
  - Cache result for future lookups

Key correctness guarantees:
- All .jsonl files appear regardless of index state
- SessionEntry timestamps always from fs.stat (list ordering never stale)
- Message counts exact (shared helpers ensure parser parity)
- Duration computed from JSONL timestamps, not index

Performance:
- Bounded concurrency: 32 concurrent operations per project
- mapWithLimit() prevents file handle exhaustion
- Warm start <1s (stat all files, in-memory lookups)
- Cold start ~3-5s for 3,103 files (stat + parse phases)

TOCTOU handling:
- Files that disappear between readdir and stat: silently skipped
- Files that disappear between stat and read: silently skipped
- File actively being written: partial parse handled gracefully

Include the PRD document that drove this implementation, with detailed
requirements, edge cases, and a verification plan.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: teernisse
Date: 2026-02-28 00:53:05 -05:00
Parent: f15a1b1b58
Commit: 8fddd50193
3 changed files with 926 additions and 187 deletions


@@ -0,0 +1,313 @@
# PRD: JSONL-First Session Discovery
## Status: Ready for Implementation
## Context
Session viewer relies exclusively on `sessions-index.json` files that Claude Code maintains. These indexes are unreliable — a known, widespread bug with multiple open GitHub issues ([#22030](https://github.com/anthropics/claude-code/issues/22030), [#21610](https://github.com/anthropics/claude-code/issues/21610), [#18619](https://github.com/anthropics/claude-code/issues/18619), [#22114](https://github.com/anthropics/claude-code/issues/22114)).
### Root cause
Claude Code updates `sessions-index.json` only at session end. If a session crashes, is killed, or is abandoned, the JSONL file is written but the index is never updated. Multiple concurrent Claude instances can also corrupt the index (last-write-wins on a single JSON file). There is no reindex command and no background repair process.
### Impact on this system
- **542 unindexed JSONL files** across all projects (87 MB total)
- **48 unindexed in last 7 days** (30.8 MB)
- **13 projects** have JSONL session files but no index at all
- **Zero sessions from today** (Feb 4, 2026) appear in any index
- **3,103 total JSONL files** vs **2,563 indexed entries** = 17% loss rate
### Key insight
The `.jsonl` files are the source of truth. The index is an unreliable convenience cache. The session viewer must treat it that way.
## Requirements
### Must have
1. **All sessions with a `.jsonl` file must appear in the session list**, regardless of whether they're in `sessions-index.json`
2. **Exact message counts** — no estimates, no approximations. Contract: Tier 3 extraction MUST reuse the same line-classification logic as `parseSessionContent` (shared helper), so list counts cannot drift from detail parsing.
3. **Performance**: Warm start (cache exists, few changes) must complete under 1 second. Cold start (no cache) is acceptable up to 5 seconds for first request
4. **Correctness over speed** — never show stale metadata if the file has been modified
5. **Zero config** — works out of the box with no setup or external dependencies
### Should have
6. Session `summary` extracted from the last `type="summary"` line in the JSONL
7. Session `firstPrompt` extracted from the first non-system-reminder user message
8. Session `duration` MUST be derivable without relying on `sessions-index.json` — extract first and last timestamps from JSONL when index is missing or stale
9. Persistent metadata cache survives server restarts
### Won't have (this iteration)
- Real-time push updates (sessions appearing in UI without refresh)
- Background file watcher daemon
- Integration with `cass` as a search/indexing backend
- Rebuilding Claude Code's `sessions-index.json`
## Technical Design
### Architecture: Filesystem-primary with tiered metadata lookup
```
discoverSessions()
|
+-- For each project directory under ~/.claude/projects/:
| |
| +-- fs.readdir() --> list all *.jsonl files
| +-- Read sessions-index.json (optional, used as pre-populated cache)
| |
| +-- Batch stat all .jsonl files (bounded concurrency)
| | Files that disappeared between readdir and stat are silently skipped (TOCTOU race)
| |
| +-- For each .jsonl file:
| | |
| | +-- Tier 1: Check index
| | | Entry exists AND normalize(index.modified) matches stat mtime?
| | | --> Use index content data (messageCount, summary, firstPrompt)
| | | --> Use stat-derived timestamps for created/modified (always)
| | |
| | +-- Tier 2: Check persistent metadata cache
| | | path + mtimeMs + size match?
| | | --> Use cached metadata (fast path)
| | |
| | +-- Tier 3: Extract metadata from JSONL content
| | Read file, lightweight parse using shared line iterator + counting helper
| | --> Cache result for future lookups
| |
| +-- Collect SessionEntry[] for this project
|
+-- Merge all projects
+-- Sort by modified (descending) — always stat-derived, never index-derived
+-- Async: persist metadata cache to disk (if dirty)
```
### Tier explanation
| Tier | Source | Speed | When used | Trusts from source |
|------|--------|-------|-----------|--------------------|
| 1 | `sessions-index.json` | Instant (in-memory lookup) | Index exists, entry present, `normalize(modified)` matches actual file mtime | `messageCount`, `summary`, `firstPrompt` only. Timestamps always from stat. |
| 2 | Persistent metadata cache | Instant (in-memory lookup) | Index missing/stale, but file hasn't changed since last extraction (mtimeMs + size match) | All cached fields |
| 3 | JSONL file parse | ~5-50ms/file | New or modified file, not in any cache | Extracted fresh |
Tier 1 reuses Claude's index when it's valid — no wasted work. The index `modified` field (ISO string) is normalized to milliseconds and compared against the real file `stat.mtimeMs`. If the index is missing or corrupt, discovery continues with Tier 2 and 3 without error. Even when Tier 1 is valid, `created` and `modified` timestamps on the `SessionEntry` always come from `fs.stat` — the index is a content cache only.
### Tier 1: Index validation details
The actual `sessions-index.json` format has `created` and `modified` as ISO strings, not a `fileMtime` field. Tier 1 validation must:
1. Map JSONL filename to sessionId: `sessionId := path.basename(jsonlFile, '.jsonl')`
2. Look up `sessionId` in the index `Map<string, IndexEntry>`
3. Compare `new Date(entry.modified).getTime()` against `stat.mtimeMs` — reject if they differ by more than 1000ms (accounts for ISO string → filesystem mtime rounding)
4. If the index entry has no `modified` field, skip Tier 1 (fall through to Tier 2)
5. When Tier 1 is valid, trust only content fields (`messageCount`, `summary`, `firstPrompt`). The `created`/`modified` on the resulting `SessionEntry` must come from `stat.birthtimeMs`/`stat.mtimeMs` respectively — this ensures list ordering is never stale even within the 1s mtime tolerance window.
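The five validation steps above can be condensed into a small predicate. This is a minimal sketch: `IndexEntry` and the 1000ms tolerance mirror the design, but the function name is illustrative, not taken from the implementation.

```typescript
const MTIME_TOLERANCE_MS = 1000;

interface IndexEntry {
  sessionId: string;
  modified?: string; // ISO string
  messageCount?: number;
  summary?: string;
  firstPrompt?: string;
}

// Returns true when the index entry can be trusted for content fields.
// Even then, timestamps on the resulting SessionEntry come from fs.stat.
export function isIndexEntryFresh(
  entry: IndexEntry | undefined,
  statMtimeMs: number
): boolean {
  if (!entry?.modified) return false; // step 4: no modified field, fall to Tier 2
  const indexMtimeMs = new Date(entry.modified).getTime();
  if (Number.isNaN(indexMtimeMs)) return false; // unparseable ISO string
  // step 3: reject if the index drifts more than 1s from the real mtime
  return Math.abs(indexMtimeMs - statMtimeMs) <= MTIME_TOLERANCE_MS;
}
```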
### Shared line-iteration and counting (parser parity contract)
The biggest correctness risk in this design is duplicating any JSONL processing logic. The real parser in `session-parser.ts` has non-trivial expansion rules:
- User array content: expands `tool_result` and `text` blocks into separate messages
- `system-reminder` detection reclassifies user `text` blocks as `system_message`
- Assistant array content: `thinking`, `text`, and `tool_use` each become separate messages
- `progress`, `file-history-snapshot`, `summary` → 1 message each
- `system`, `queue-operation` → 0 (skipped)
It also has error-handling behavior: malformed/truncated JSON lines are skipped (common when sessions crash mid-write). If the metadata extractor and the full parser handle malformed lines differently, counts will drift.
Rather than reimplementing any of these rules, extract shared helpers at two levels:
```typescript
// In session-parser.ts (or a shared module):
// Level 1: Line iteration with consistent error handling
// Splits content by newlines, JSON.parse each, skips malformed lines identically
// to how parseSessionContent handles them. Returns parse error count for diagnostics.
export function forEachJsonlLine(
content: string,
onLine: (parsed: RawLine, lineIndex: number) => void
): { parseErrors: number }
// Level 2: Classification and counting (called per parsed line)
export function countMessagesForLine(parsed: RawLine): number
export function classifyLine(parsed: RawLine): LineClassification
```
Both `extractSessionMetadata()` and `parseSessionContent()` use `forEachJsonlLine()` for iteration, ensuring identical malformed-line handling. Both use `countMessagesForLine()` for counting. This two-level sharing guarantees that list counts can never drift from detail-view counts, regardless of future parser changes or edge cases in error handling.
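A minimal implementation of the Level-1 iterator could look like the following. This is a sketch under the assumption that malformed lines are simply skipped and counted; the real helper must match `parseSessionContent`'s error handling exactly.

```typescript
// Stand-in for the parser's raw line type.
type RawLine = Record<string, unknown>;

// Iterates JSONL content line by line, skipping blank and malformed lines
// identically for every caller, so message counts can never drift between
// the metadata extractor and the full parser.
export function forEachJsonlLine(
  content: string,
  onLine: (parsed: RawLine, lineIndex: number) => void
): { parseErrors: number } {
  let parseErrors = 0;
  const lines = content.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (!line) continue; // blank lines are not parse errors
    try {
      onLine(JSON.parse(line) as RawLine, i);
    } catch {
      parseErrors++; // truncated/corrupt line, e.g. from a crashed session
    }
  }
  return { parseErrors };
}
```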
### Metadata extraction (Tier 3)
A lightweight `extractSessionMetadata()` function reads the JSONL file and extracts only what the list view needs, without building full message content strings:
```typescript
export function extractSessionMetadata(content: string): SessionMetadata
```
Implementation:
1. Iterate lines via `forEachJsonlLine(content, ...)` — the shared iterator with identical malformed-line handling as the main parser
2. Call `countMessagesForLine(parsed)` per line — the shared helper that uses the **same classification rules** as `parseSessionContent` in `session-parser.ts`
3. Extract `firstPrompt`: content of the first user message that isn't a `<system-reminder>`, truncated to 200 characters
4. Extract `summary`: the `summary` field from the last `type="summary"` line
5. Capture first and last `timestamp` fields for duration computation
No string building, no `JSON.stringify`, no markdown processing — just counting, timestamp capture, and first-match extraction. This is exact (matches `parseSessionContent().length` via shared helpers) but 2-3x faster than full parsing.
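The extraction steps above can be sketched as follows. To keep the sketch self-contained, the shared helpers are replaced with simplified inline stand-ins (in the real code they are imported from `session-parser.ts`), and the first-prompt check handles only string content, not the array-expansion rules.

```typescript
interface SessionMetadata {
  messageCount: number;
  firstPrompt: string;
  summary: string;
  firstTimestamp: string;
  lastTimestamp: string;
}

type RawLine = {
  type?: string;
  summary?: string;
  timestamp?: string;
  message?: { content?: unknown };
};

// Simplified stand-in; the real helper lives in session-parser.ts and
// implements the full classification rules described above.
function countMessagesForLine(parsed: RawLine): number {
  return parsed.type === "system" ? 0 : 1;
}

export function extractSessionMetadata(content: string): SessionMetadata {
  const meta: SessionMetadata = {
    messageCount: 0, firstPrompt: "", summary: "",
    firstTimestamp: "", lastTimestamp: "",
  };
  for (const raw of content.split("\n")) {
    const line = raw.trim();
    if (!line) continue;
    let parsed: RawLine;
    try { parsed = JSON.parse(line); } catch { continue; } // shared malformed-line handling
    meta.messageCount += countMessagesForLine(parsed);
    if (parsed.timestamp) {
      if (!meta.firstTimestamp) meta.firstTimestamp = parsed.timestamp;
      meta.lastTimestamp = parsed.timestamp; // last wins, for duration
    }
    if (parsed.type === "summary" && parsed.summary) {
      meta.summary = parsed.summary; // last summary line wins
    }
    const msgContent = parsed.message?.content;
    if (!meta.firstPrompt && parsed.type === "user" &&
        typeof msgContent === "string" &&
        !msgContent.startsWith("<system-reminder>")) {
      meta.firstPrompt = msgContent.slice(0, 200); // truncate to 200 chars
    }
  }
  return meta;
}
```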
### Persistent metadata cache
**Location:** `~/.cache/session-viewer/metadata.json`
```typescript
interface CacheFile {
version: 1;
entries: Record<string, { // keyed by absolute file path
mtimeMs: number;
size: number;
messageCount: number;
firstPrompt: string;
summary: string;
created: string; // ISO string from file birthtime
modified: string; // ISO string from file mtime
firstTimestamp: string; // ISO from first JSONL line with timestamp
lastTimestamp: string; // ISO from last JSONL line with timestamp
}>;
}
```
Behavior:
- Loaded once on first `discoverSessions()` call
- Entries validated by `(mtimeMs, size)` — if either changes, entry is re-extracted via Tier 3
- Written to disk asynchronously using a dirty-flag write-behind strategy: only when cache has new/updated entries, coalescing multiple discovery passes, non-blocking
- Flush any pending write on process exit (`SIGTERM`, `SIGINT`) and graceful server shutdown — prevents losing cache updates when the server stops before the async write fires
- Corrupt or missing cache file triggers graceful fallback (all files go through Tier 3, cache rebuilt)
- Atomic writes: write to temp file, then rename (prevents corruption from crashes during write)
- Stale entries (file no longer exists on disk) are pruned on save
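The atomic-write and shutdown-flush behaviors above can be sketched as two small helpers. Function names are illustrative; the exit-code choice on signal is an assumption.

```typescript
import fs from "fs/promises";
import path from "path";

// Atomic save: write to a temp file, then rename. rename() is atomic on the
// same filesystem, so a crash mid-write never leaves a corrupt cache file.
export async function saveCacheAtomically(
  cachePath: string,
  data: unknown
): Promise<void> {
  await fs.mkdir(path.dirname(cachePath), { recursive: true });
  const tmpPath = `${cachePath}.tmp-${process.pid}`;
  await fs.writeFile(tmpPath, JSON.stringify(data), "utf-8");
  await fs.rename(tmpPath, cachePath);
}

// Shutdown flush: persist any dirty entries before the process exits on
// SIGTERM/SIGINT. Hard kills (SIGKILL) cannot be intercepted; the cache is
// simply rebuilt on the next cold start.
export function registerFlushOnExit(flush: () => Promise<void>): void {
  for (const sig of ["SIGTERM", "SIGINT"] as const) {
    process.once(sig, () => {
      flush().finally(() => process.exit(0));
    });
  }
}
```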
### Concurrency model
Cold start with 3,103 files requires bounded parallelism to avoid file-handle exhaustion and IO thrash, while still meeting the <5s target:
- **Stat phase**: Batch all `fs.stat()` calls with concurrency limit (e.g., 64). This classifies each file into Tier 1/2 (cache hit) or Tier 3 (needs parse). Files that fail stat (ENOENT from deletion race, EACCES) are silently skipped with a debug log.
- **Parse phase**: Process Tier-3 misses with bounded concurrency (e.g., 8). Each parse reads + iterates via shared `forEachJsonlLine()` + shared counter. With max file size 4.5MB, each parse is ~5-50ms.
- Use a simple async work queue (e.g., `p-limit` or hand-rolled semaphore). No worker threads needed for this IO-bound workload.
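The hand-rolled work queue needs only a shared cursor over the items; this mirrors the `mapWithLimit()` helper added in `session-discovery.ts`:

```typescript
// Runs fn over items with at most `limit` promises in flight, preserving
// result order. Workers pull from a shared cursor, so there is no
// handle-exhausting fan-out of thousands of concurrent fs operations.
export async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker())
  );
  return results;
}
```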
### Performance expectations
| Scenario | Estimated time |
|----------|---------------|
| Cold start (no cache, no index) | ~3-5s for 3,103 files (~500MB), bounded concurrency: stat@64, parse@8 |
| Warm start (cache exists, few changes) | ~300-500ms (stat all files at bounded concurrency, in-memory lookups) |
| Incremental (cache + few new sessions) | ~500ms + ~50ms per new file |
| Subsequent API calls within 30s TTL | <1ms (in-memory session list cache) |
### Existing infrastructure leveraged
- **30-second in-memory cache** in `sessions.ts` (`getCachedSessions()`) — unchanged, provides the fast path for repeated API calls
- **`?refresh=1` query parameter** — forces cache invalidation, unchanged
- **Concurrent request deduplication** via `cachePromise` pattern — unchanged
- **Security validations** — path traversal rejection, containment checks, `.jsonl` extension enforcement — applied to filesystem-discovered files identically
## Implementation scope
### Checkpoints
#### CP0 — Parser parity foundations
- Extract `forEachJsonlLine()` shared line iterator from existing parser
- Extract `countMessagesForLine()` and `classifyLine()` shared helpers
- Refactor `extractMessages()` to use these internally (no behavior change to parseSessionContent)
- Tests verify identical behavior on malformed/truncated lines
#### CP1 — Filesystem-first correctness
- All `.jsonl` sessions appear even with missing/corrupt index
- `extractSessionMetadata()` uses shared line iterator + counting helpers; exact counts verified by tests
- Stat-derived `created`/`modified` are the single source for SessionEntry timestamps and list ordering
- Duration computed from JSONL timestamps, not index
- TOCTOU races (readdir/stat, stat/read) handled gracefully — disappeared files silently skipped
#### CP2 — Persistent cache
- Atomic writes with dirty-flag write-behind; prune stale entries
- Invalidation keyed on `(mtimeMs, size)`
- Flush pending writes on process exit / server shutdown
#### CP3 — Index fast path (Tier 1)
- Parse index into Map; normalize `modified` ISO → ms; validate against stat mtime with 1s tolerance
- sessionId mapping: `basename(file, '.jsonl')`
- Tier 1 trusts content fields only; timestamps always from stat
#### CP4 — Performance hardening
- Bounded concurrency for stat + parse phases
- Warm start <1s verified on real dataset
### Modified files
**`src/server/services/session-parser.ts`**
1. Extract `forEachJsonlLine(content, onLine): { parseErrors: number }` — shared line iterator with consistent malformed-line handling
2. Extract `countMessagesForLine(parsed: RawLine): number` — shared counting helper
3. Extract `classifyLine(parsed: RawLine): LineClassification` — shared classification
4. Refactor `extractMessages()` to use these shared helpers internally (no behavior change to parseSessionContent)
**`src/server/services/session-discovery.ts`**
1. Add `extractSessionMetadata(content: string): SessionMetadata` — lightweight JSONL metadata extractor using shared line iterator + counting helper
2. Add `MetadataCache` class — persistent cache with load/get/set/save, dirty-flag write-behind, shutdown flush
3. Rewrite per-project discovery loop — filesystem-first, tiered metadata lookup with bounded concurrency
4. Read `sessions-index.json` as optimization only — parse into `Map<sessionId, IndexEntry>`, normalize `modified` to ms, validate against stat mtime before trusting
5. Register shutdown hooks for cache flush on `SIGTERM`/`SIGINT`
### Unchanged files
- `src/server/routes/sessions.ts` — existing caching layer works as-is
- `src/shared/types.ts` — `SessionEntry` type already has `duration?: number`
- All client components — no changes needed
### New tests
- Unit test: `forEachJsonlLine()` skips malformed lines identically to how `parseSessionContent` handles them
- Unit test: `forEachJsonlLine()` reports parse error count for truncated/corrupted lines
- Unit test: `countMessagesForLine()` matches actual `extractMessages()` output length on sample lines
- Unit test: `extractSessionMetadata()` output matches `parseSessionContent().length` on sample fixtures (including malformed/truncated lines)
- Unit test: Duration extracted from JSONL timestamps matches expected values
- Unit test: SessionEntry `created`/`modified` always come from stat, even when Tier 1 index data is trusted
- Unit test: Tier 1 validation rejects stale index entries (mtime mismatch beyond 1s tolerance)
- Unit test: Tier 1 handles missing `modified` field gracefully (falls through to Tier 2)
- Unit test: Discovery works with no `sessions-index.json` present
- Unit test: Discovery silently skips files that disappear between readdir and stat (TOCTOU)
- Unit test: Cache hit/miss/invalidation behavior (mtimeMs + size)
- Unit test: Cache dirty-flag only triggers write when entries changed
## Edge cases
| Scenario | Behavior |
|----------|----------|
| File actively being written | mtime changes between stat and read. Next discovery pass re-extracts. Partial JSONL handled gracefully (malformed lines skipped via shared `forEachJsonlLine`, same behavior as real parser). |
| Deleted session files | File in cache but gone from disk. Entry silently dropped, pruned from cache on next save. |
| File disappears between readdir and stat | TOCTOU race. Stat failure (ENOENT) silently skipped with debug log. |
| File disappears between stat and read | Read failure silently skipped; file excluded from results. Next pass re-discovers if it reappears. |
| Index entry with wrong mtime | Tier 1 validation rejects it (>1s tolerance). Falls through to Tier 2/3. |
| Index entry with no `modified` field | Tier 1 skips it. Falls through to Tier 2/3. |
| Index `modified` in seconds vs milliseconds | Normalization handles both ISO strings and numeric timestamps. |
| Cache file locked or unwritable | Extraction still works, just doesn't persist. Warning logged to stderr. |
| Very large files | 4.5MB max observed. Tier 3 parse ~50ms. Acceptable. |
| Concurrent server restarts | Cache writes are atomic (temp file + rename). |
| Server killed before async cache write | Shutdown hooks flush pending writes on SIGTERM/SIGINT. Hard kills (SIGKILL) may lose updates — acceptable, cache rebuilt on next cold start. |
| Empty JSONL files | Returns `messageCount: 0`, empty `firstPrompt`, `summary`, and timestamps. Duration: 0. |
| Projects with no index file | Discovery proceeds normally via Tier 2/3. Common case (13 projects). |
| Non-JSONL files in project dirs | Filtered out by `.jsonl` extension check in `readdir` results. |
| File handle exhaustion | Bounded concurrency (stat@64, parse@8) prevents opening thousands of handles. |
| Future parser changes (new message types) | Shared line iterator + counting helper in session-parser.ts means Tier 3 automatically stays in sync. |
| Malformed JSONL lines (crash mid-write) | Shared `forEachJsonlLine()` skips identically in both metadata extraction and full parsing — no count drift. |
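The seconds-vs-milliseconds normalization row above could be handled in one helper. A sketch only: the `1e12` cutoff for distinguishing epoch seconds from milliseconds is an assumed heuristic, not taken from the implementation.

```typescript
// Normalizes an index `modified` value to epoch milliseconds.
// Accepts ISO-8601 strings, epoch milliseconds, and epoch seconds
// (numbers below 1e12 are treated as seconds -- an assumed heuristic,
// valid for dates between 1970 and ~33658 AD in ms terms).
export function normalizeModifiedMs(
  value: string | number | undefined
): number | null {
  if (value === undefined) return null;
  if (typeof value === "number") {
    const ms = value < 1e12 ? value * 1000 : value;
    return Number.isFinite(ms) ? ms : null;
  }
  const ms = new Date(value).getTime();
  return Number.isNaN(ms) ? null : ms;
}
```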
## Verification plan
1. Start dev server, confirm today's sessions appear immediately in the session list
2. Compare message counts for indexed sessions: Tier 1 data vs Tier 3 extraction (should match)
3. Verify duration is shown for sessions that have no index entry (JSONL-only sessions)
4. Delete a `sessions-index.json`, refresh — verify all sessions for that project still appear with correct counts and durations
5. Run existing test suite: `npm test`
6. Run new unit tests for shared line iterator, counting helper, `extractSessionMetadata()`, and `MetadataCache`
7. Verify `created`/`modified` in session list come from stat, not index (compare with `ls -l` output)
8. Verify cold start performance: delete `~/.cache/session-viewer/metadata.json`, time the first API request
9. Verify warm start performance: time a subsequent server start with cache in place
10. Verify cache dirty-flag: repeated refreshes with no file changes should not write cache to disk
11. Kill server with SIGTERM, restart — verify cache was flushed (no full re-parse on restart)


@@ -2,6 +2,54 @@ import fs from "fs/promises";
import path from "path";
import os from "os";
import type { SessionEntry } from "../../shared/types.js";
import { extractSessionMetadata } from "./session-metadata.js";
import { MetadataCache } from "./metadata-cache.js";
import type { CacheEntry } from "./metadata-cache.js";
const CLAUDE_PROJECTS_DIR = path.join(os.homedir(), ".claude", "projects");
const FILE_CONCURRENCY = 32;
let cache: MetadataCache | null = null;
let cacheLoaded = false;
export function setCache(c: MetadataCache | null): void {
cache = c;
cacheLoaded = c !== null;
}
async function ensureCache(): Promise<MetadataCache> {
if (!cache) {
cache = new MetadataCache();
}
if (!cacheLoaded) {
await cache.load();
cacheLoaded = true;
}
return cache;
}
async function mapWithLimit<T, R>(
items: T[],
limit: number,
fn: (item: T) => Promise<R>
): Promise<R[]> {
const results: R[] = new Array(items.length);
let nextIndex = 0;
async function worker(): Promise<void> {
while (nextIndex < items.length) {
const i = nextIndex++;
results[i] = await fn(items[i]);
}
}
const workers = Array.from(
{ length: Math.min(limit, items.length) },
() => worker()
);
await Promise.all(workers);
return results;
}
interface IndexEntry {
sessionId: string;
@@ -14,12 +62,14 @@ interface IndexEntry {
projectPath?: string;
}
const CLAUDE_PROJECTS_DIR = path.join(os.homedir(), ".claude", "projects");
const MTIME_TOLERANCE_MS = 1000;
export async function discoverSessions(
projectsDir: string = CLAUDE_PROJECTS_DIR
): Promise<SessionEntry[]> {
const sessions: SessionEntry[] = [];
const metadataCache = await ensureCache();
const discoveredPaths = new Set<string>();
let projectDirs: string[];
try {
@@ -28,63 +78,152 @@ export async function discoverSessions(
return sessions;
}
// Parallel I/O: stat + readFile for all project dirs concurrently
const results = await Promise.all(
projectDirs.map(async (projectDir) => {
const projectPath = path.join(projectsDir, projectDir);
const entries: SessionEntry[] = [];
let stat;
let dirStat;
try {
stat = await fs.stat(projectPath);
dirStat = await fs.stat(projectPath);
} catch {
return entries;
}
if (!stat.isDirectory()) return entries;
if (!dirStat.isDirectory()) return entries;
const indexPath = path.join(projectPath, "sessions-index.json");
let files: string[];
try {
const content = await fs.readFile(indexPath, "utf-8");
const parsed = JSON.parse(content);
// Handle both formats: raw array or { version, entries: [...] }
const rawEntries: IndexEntry[] = Array.isArray(parsed)
? parsed
: parsed.entries ?? [];
for (const entry of rawEntries) {
const sessionPath =
entry.fullPath ||
path.join(projectPath, `${entry.sessionId}.jsonl`);
// Validate: reject paths with traversal segments or non-JSONL extensions.
// Check the raw path for ".." before resolving (resolve normalizes them away).
if (sessionPath.includes("..") || !sessionPath.endsWith(".jsonl")) {
continue;
}
const resolved = path.resolve(sessionPath);
// Containment check: reject paths that escape the projects directory.
// A corrupted or malicious index could set fullPath to an arbitrary
// absolute path like "/etc/shadow.jsonl".
if (!resolved.startsWith(projectsDir + path.sep) && resolved !== projectsDir) {
continue;
}
entries.push({
id: entry.sessionId,
summary: entry.summary || "",
firstPrompt: entry.firstPrompt || "",
project: projectDir,
created: entry.created || "",
modified: entry.modified || "",
messageCount: entry.messageCount || 0,
path: resolved,
duration: computeDuration(entry.created, entry.modified),
});
}
files = await fs.readdir(projectPath);
} catch {
// Missing or corrupt index - skip
return entries;
}
const jsonlFiles = files.filter((f) => f.endsWith(".jsonl"));
// Tier 1: Load sessions-index.json for this project
const indexMap = await loadProjectIndex(projectPath);
const fileResults = await mapWithLimit(
jsonlFiles,
FILE_CONCURRENCY,
async (filename) => {
const filePath = path.join(projectPath, filename);
// Security: reject traversal
if (filename.includes("..")) return null;
const resolved = path.resolve(filePath);
if (
!resolved.startsWith(projectsDir + path.sep) &&
resolved !== projectsDir
) {
return null;
}
let fileStat;
try {
fileStat = await fs.stat(resolved);
} catch {
return null;
}
discoveredPaths.add(resolved);
const sessionId = path.basename(filename, ".jsonl");
// Tier 1: Check index
const indexEntry = indexMap.get(sessionId);
if (indexEntry?.modified) {
const indexMtimeMs = new Date(indexEntry.modified).getTime();
if (
!isNaN(indexMtimeMs) &&
Math.abs(indexMtimeMs - fileStat.mtimeMs) <= MTIME_TOLERANCE_MS
) {
const duration = computeDuration(
indexEntry.created,
indexEntry.modified
);
return {
id: sessionId,
project: projectDir,
path: resolved,
created: new Date(fileStat.birthtimeMs).toISOString(),
modified: new Date(fileStat.mtimeMs).toISOString(),
messageCount: indexEntry.messageCount || 0,
firstPrompt: indexEntry.firstPrompt || "",
summary: indexEntry.summary || "",
duration: duration > 0 ? duration : undefined,
} satisfies SessionEntry;
}
}
// Tier 2: Check metadata cache
const cached = metadataCache.get(
resolved,
fileStat.mtimeMs,
fileStat.size
);
if (cached) {
const duration = computeDuration(
cached.firstTimestamp,
cached.lastTimestamp
);
return {
id: sessionId,
project: projectDir,
path: resolved,
created: new Date(fileStat.birthtimeMs).toISOString(),
modified: new Date(fileStat.mtimeMs).toISOString(),
messageCount: cached.messageCount,
firstPrompt: cached.firstPrompt,
summary: cached.summary,
duration: duration > 0 ? duration : undefined,
} satisfies SessionEntry;
}
// Tier 3: Full parse
let content: string;
try {
content = await fs.readFile(resolved, "utf-8");
} catch {
return null;
}
const metadata = extractSessionMetadata(content);
// Update cache
const cacheEntry: CacheEntry = {
mtimeMs: fileStat.mtimeMs,
size: fileStat.size,
messageCount: metadata.messageCount,
firstPrompt: metadata.firstPrompt,
summary: metadata.summary,
firstTimestamp: metadata.firstTimestamp,
lastTimestamp: metadata.lastTimestamp,
};
metadataCache.set(resolved, cacheEntry);
const duration = computeDuration(
metadata.firstTimestamp,
metadata.lastTimestamp
);
return {
id: sessionId,
project: projectDir,
path: resolved,
created: new Date(fileStat.birthtimeMs).toISOString(),
modified: new Date(fileStat.mtimeMs).toISOString(),
messageCount: metadata.messageCount,
firstPrompt: metadata.firstPrompt,
summary: metadata.summary,
duration: duration > 0 ? duration : undefined,
} satisfies SessionEntry;
}
);
for (const entry of fileResults) {
if (entry) entries.push(entry);
}
return entries;
@@ -101,14 +240,47 @@ export async function discoverSessions(
return dateB - dateA;
});
// Fire-and-forget cache save
metadataCache.save(discoveredPaths).catch(() => {
// Cache write failure is non-fatal
});
return sessions;
}
function computeDuration(created?: string, modified?: string): number {
if (!created || !modified) return 0;
const createdMs = new Date(created).getTime();
const modifiedMs = new Date(modified).getTime();
if (isNaN(createdMs) || isNaN(modifiedMs)) return 0;
const diff = modifiedMs - createdMs;
async function loadProjectIndex(
projectPath: string
): Promise<Map<string, IndexEntry>> {
const indexMap = new Map<string, IndexEntry>();
const indexPath = path.join(projectPath, "sessions-index.json");
try {
const raw = await fs.readFile(indexPath, "utf-8");
const parsed = JSON.parse(raw);
const rawEntries: IndexEntry[] = Array.isArray(parsed)
? parsed
: parsed.entries ?? [];
for (const entry of rawEntries) {
if (entry.sessionId) {
indexMap.set(entry.sessionId, entry);
}
}
} catch {
// Missing or corrupt index — continue without Tier 1
}
return indexMap;
}
function computeDuration(
firstTimestamp?: string,
lastTimestamp?: string
): number {
if (!firstTimestamp || !lastTimestamp) return 0;
const firstMs = new Date(firstTimestamp).getTime();
const lastMs = new Date(lastTimestamp).getTime();
if (isNaN(firstMs) || isNaN(lastMs)) return 0;
const diff = lastMs - firstMs;
return diff > 0 ? diff : 0;
}


@@ -1,70 +1,122 @@
import { describe, it, expect } from "vitest";
import { discoverSessions } from "../../src/server/services/session-discovery.js";
import { describe, it, expect, beforeEach } from "vitest";
import { discoverSessions, setCache } from "../../src/server/services/session-discovery.js";
import { MetadataCache } from "../../src/server/services/metadata-cache.js";
import path from "path";
import fs from "fs/promises";
import os from "os";
/** Helper to write a sessions-index.json in the real { version, entries } format */
function makeIndex(entries: Record<string, unknown>[]) {
function makeJsonlContent(lines: Record<string, unknown>[]): string {
return lines.map((l) => JSON.stringify(l)).join("\n");
}
function makeIndex(entries: Record<string, unknown>[]): string {
return JSON.stringify({ version: 1, entries });
}
describe("session-discovery", () => {
it("discovers sessions from { version, entries } format", async () => {
const tmpDir = path.join(os.tmpdir(), `sv-test-${Date.now()}`);
const projectDir = path.join(tmpDir, "test-project");
await fs.mkdir(projectDir, { recursive: true });
async function makeTmpProject(
suffix: string
): Promise<{ tmpDir: string; projectDir: string; cachePath: string; cleanup: () => Promise<void> }> {
const tmpDir = path.join(os.tmpdir(), `sv-test-${suffix}-${Date.now()}`);
const projectDir = path.join(tmpDir, "test-project");
const cachePath = path.join(tmpDir, ".cache", "metadata.json");
await fs.mkdir(projectDir, { recursive: true });
return {
tmpDir,
projectDir,
cachePath,
cleanup: () => fs.rm(tmpDir, { recursive: true }),
};
}
const sessionPath = path.join(projectDir, "sess-001.jsonl");
await fs.writeFile(
path.join(projectDir, "sessions-index.json"),
makeIndex([
{
sessionId: "sess-001",
fullPath: sessionPath,
summary: "Test session",
firstPrompt: "Hello",
created: "2025-10-15T10:00:00Z",
modified: "2025-10-15T11:00:00Z",
messageCount: 5,
describe("session-discovery", () => {
beforeEach(() => {
// Reset global cache between tests to prevent cross-contamination
setCache(new MetadataCache(path.join(os.tmpdir(), `sv-cache-${Date.now()}.json`)));
});
it("discovers sessions from .jsonl files without index", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("no-index");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Hello world" },
uuid: "u-1",
timestamp: "2025-10-15T10:00:00Z",
},
{
type: "assistant",
message: {
role: "assistant",
content: [{ type: "text", text: "Hi there" }],
},
])
);
uuid: "a-1",
timestamp: "2025-10-15T10:01:00Z",
},
]);
await fs.writeFile(path.join(projectDir, "sess-001.jsonl"), content);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
expect(sessions[0].id).toBe("sess-001");
expect(sessions[0].summary).toBe("Test session");
expect(sessions[0].project).toBe("test-project");
expect(sessions[0].messageCount).toBe(5);
expect(sessions[0].path).toBe(sessionPath);
expect(sessions[0].messageCount).toBe(2);
expect(sessions[0].firstPrompt).toBe("Hello world");
expect(sessions[0].path).toBe(path.join(projectDir, "sess-001.jsonl"));
await cleanup();
});
it("timestamps come from stat, not JSONL content", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("stat-times");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Hello" },
uuid: "u-1",
timestamp: "2020-01-01T00:00:00Z",
},
]);
const filePath = path.join(projectDir, "sess-stat.jsonl");
await fs.writeFile(filePath, content);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
// created and modified should be from stat (recent), not from the 2020 timestamp
const createdDate = new Date(sessions[0].created);
const now = new Date();
const diffMs = now.getTime() - createdDate.getTime();
expect(diffMs).toBeLessThan(60_000); // within last minute
await cleanup();
});
it("silently skips files deleted between readdir and stat", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("toctou");
// Write a session, discover will find it
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Survives" },
uuid: "u-1",
},
]);
await fs.writeFile(path.join(projectDir, "survivor.jsonl"), content);
// Write and immediately delete another
await fs.writeFile(path.join(projectDir, "ghost.jsonl"), content);
await fs.unlink(path.join(projectDir, "ghost.jsonl"));
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
expect(sessions[0].id).toBe("survivor");
await cleanup();
});
it("handles missing projects directory gracefully", async () => {
const sessions = await discoverSessions(
path.join(os.tmpdir(), `sv-missing-${Date.now()}`)
);
expect(sessions).toEqual([]);
});
it("aggregates across multiple project directories", async () => {
const tmpDir = path.join(os.tmpdir(), `sv-test-multi-${Date.now()}`);
const proj1 = path.join(tmpDir, "project-a");
const proj2 = path.join(tmpDir, "project-b");
await fs.mkdir(proj1, { recursive: true });
await fs.mkdir(proj2, { recursive: true });
const contentA = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Project A" },
uuid: "u-a",
timestamp: "2025-01-01T00:00:00Z",
},
]);
const contentB = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Project B" },
uuid: "u-b",
timestamp: "2025-01-02T00:00:00Z",
},
]);
await fs.writeFile(path.join(proj1, "a-001.jsonl"), contentA);
await fs.writeFile(path.join(proj2, "b-001.jsonl"), contentB);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(2);
const ids = sessions.map((s) => s.id).sort();
expect(ids).toEqual(["a-001", "b-001"]);
await fs.rm(tmpDir, { recursive: true });
});
it("ignores non-.jsonl files in project directories", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("filter-ext");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Hello" },
uuid: "u-1",
},
]);
await fs.writeFile(path.join(projectDir, "session.jsonl"), content);
await fs.writeFile(
path.join(projectDir, "sessions-index.json"),
'{"version":1,"entries":[]}'
);
await fs.writeFile(path.join(projectDir, "notes.txt"), "notes");
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
expect(sessions[0].id).toBe("session");
await cleanup();
});
it("duration computed from JSONL timestamps", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("duration");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Start" },
uuid: "u-1",
timestamp: "2025-10-15T10:00:00Z",
},
{
type: "assistant",
message: {
role: "assistant",
content: [{ type: "text", text: "End" }],
},
uuid: "a-1",
timestamp: "2025-10-15T10:30:00Z",
},
]);
await fs.writeFile(path.join(projectDir, "sess-dur.jsonl"), content);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
// 30 minutes = 1800000 ms
expect(sessions[0].duration).toBe(1_800_000);
await cleanup();
});
it("handles empty .jsonl files", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("empty");
await fs.writeFile(path.join(projectDir, "empty.jsonl"), "");
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
expect(sessions[0].id).toBe("empty");
expect(sessions[0].messageCount).toBe(0);
expect(sessions[0].firstPrompt).toBe("");
await cleanup();
});
it("sorts by modified descending", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("sort");
const content1 = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "First" },
uuid: "u-1",
},
]);
const content2 = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Second" },
uuid: "u-2",
},
]);
await fs.writeFile(path.join(projectDir, "older.jsonl"), content1);
// Small delay to ensure different mtime
await new Promise((r) => setTimeout(r, 50));
await fs.writeFile(path.join(projectDir, "newer.jsonl"), content2);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(2);
expect(sessions[0].id).toBe("newer");
expect(sessions[1].id).toBe("older");
await cleanup();
});
describe("Tier 1 index validation", () => {
it("uses index data when modified matches stat mtime within 1s", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-hit");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Hello" },
uuid: "u-1",
timestamp: "2025-10-15T10:00:00Z",
},
]);
const filePath = path.join(projectDir, "sess-idx.jsonl");
await fs.writeFile(filePath, content);
// Get the actual mtime from the file
const stat = await fs.stat(filePath);
const mtimeIso = new Date(stat.mtimeMs).toISOString();
// Write an index with the matching modified timestamp and different metadata
await fs.writeFile(
path.join(projectDir, "sessions-index.json"),
makeIndex([
{
sessionId: "sess-idx",
summary: "Index summary",
firstPrompt: "Index prompt",
messageCount: 99,
modified: mtimeIso,
created: "2025-10-15T09:00:00Z",
},
])
);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
// Should use index data (Tier 1 hit)
expect(sessions[0].messageCount).toBe(99);
expect(sessions[0].summary).toBe("Index summary");
expect(sessions[0].firstPrompt).toBe("Index prompt");
await cleanup();
});
it("rejects index data when mtime mismatch > 1s", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-miss");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Real content" },
uuid: "u-1",
timestamp: "2025-10-15T10:00:00Z",
},
]);
await fs.writeFile(path.join(projectDir, "sess-stale.jsonl"), content);
// Write an index with a very old modified timestamp (stale)
await fs.writeFile(
path.join(projectDir, "sessions-index.json"),
makeIndex([
{
sessionId: "sess-stale",
summary: "Stale index summary",
firstPrompt: "Stale prompt",
messageCount: 99,
modified: "2020-01-01T00:00:00Z",
created: "2020-01-01T00:00:00Z",
},
])
);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
// Should NOT use index data (Tier 1 miss) — falls through to Tier 3
expect(sessions[0].messageCount).toBe(1); // Actual parse count
expect(sessions[0].firstPrompt).toBe("Real content");
await cleanup();
});
it("skips Tier 1 when entry has no modified field", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-no-mod");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Real content" },
uuid: "u-1",
},
]);
await fs.writeFile(path.join(projectDir, "sess-nomod.jsonl"), content);
await fs.writeFile(
path.join(projectDir, "sessions-index.json"),
makeIndex([
{
sessionId: "sess-nomod",
summary: "Index summary",
messageCount: 99,
// No modified field
},
])
);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
// Falls through to Tier 3 parse
expect(sessions[0].messageCount).toBe(1);
await cleanup();
});
it("handles missing sessions-index.json", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-missing");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "No index" },
uuid: "u-1",
},
]);
await fs.writeFile(path.join(projectDir, "sess-noindex.jsonl"), content);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
expect(sessions[0].firstPrompt).toBe("No index");
await cleanup();
});
it("handles corrupt sessions-index.json", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-corrupt");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Corrupt index" },
uuid: "u-1",
},
]);
await fs.writeFile(path.join(projectDir, "sess-corrupt.jsonl"), content);
await fs.writeFile(
path.join(projectDir, "sessions-index.json"),
"not valid json {"
);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
expect(sessions[0].firstPrompt).toBe("Corrupt index");
await cleanup();
});
it("timestamps always from stat even on Tier 1 hit", async () => {
const { tmpDir, projectDir, cleanup } = await makeTmpProject("tier1-stat-ts");
const content = makeJsonlContent([
{
type: "user",
message: { role: "user", content: "Hello" },
uuid: "u-1",
},
]);
const filePath = path.join(projectDir, "sess-ts.jsonl");
await fs.writeFile(filePath, content);
const stat = await fs.stat(filePath);
const mtimeIso = new Date(stat.mtimeMs).toISOString();
await fs.writeFile(
path.join(projectDir, "sessions-index.json"),
makeIndex([
{
sessionId: "sess-ts",
messageCount: 1,
modified: mtimeIso,
created: "1990-01-01T00:00:00Z",
},
])
);
const sessions = await discoverSessions(tmpDir);
expect(sessions).toHaveLength(1);
// created/modified should be from stat (recent), not from index's 1990 date
const createdDate = new Date(sessions[0].created);
const now = new Date();
expect(now.getTime() - createdDate.getTime()).toBeLessThan(60_000);
await cleanup();
});
});
});