1. **Make immutable identity usable now (`--author-id`)**

   Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.

   ```diff
   @@ Phase 1: `lore notes` Command / Work Chunk 1A
    pub struct NoteListFilters<'a> {
   +    pub author_id: Option<i64>, // immutable identity filter
   @@
   -    pub author: Option<&'a str>, // case-insensitive match via COLLATE NOCASE
   +    pub author: Option<&'a str>, // display-name filter
   +    // If both author and author_id are provided, apply both (AND) for precision.
    }
   @@ Filter mappings:
   +- `author_id`: `n.author_id = ?` (exact immutable identity)
    - `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`
   @@ Phase 1 / Work Chunk 1B (CLI)
   +/// Filter by immutable author id
   +#[arg(long = "author-id", help_heading = "Filters")]
   +pub author_id: Option<i64>,
   @@ Phase 2 / Work Chunk 2F
   +Add `--author-id` support to `lore search` filtering for note documents.
   @@ Phase 1 / Work Chunk 1E
   +CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
   +ON notes(project_id, author_id, created_at DESC, id DESC)
   +WHERE is_system = 0 AND author_id IS NOT NULL;
   ```

2. **Fix document staleness on username changes**

   Why: The current plan says username changes are “not semantic,” but note documents include the username in their content and title. After a rename, those documents therefore go stale and become inconsistent with the notes table.

   ```diff
   @@ Work Chunk 0D: Immutable Author Identity Capture
   -Assert: changed_semantics = false (username change is not a semantic change for documents)
   +Assert: changed_semantics = true (username affects note document content/title)
   @@ Work Chunk 0A: semantic-change detection
   -old_body != body || old_note_type != note_type || ...
   +old_body != body || old_note_type != note_type || ...
   +    || old_author_username != author_username
   @@ Work Chunk 2C: Note Document Extractor header
    author: @{author}
   +author_id: {author_id}
   ```

3. **Replace the `last_seen_at` sweep marker with a monotonic `sync_run_id`**

   Why: Timestamp markers are vulnerable to clock skew and concurrent runs. Monotonic run IDs give a deterministic ordering and are safer.

   ```diff
   @@ Phase 0: Stable Note Identity
   +### Work Chunk 0E: Monotonic Run Marker
   +Add `sync_runs` table and `notes.last_seen_run_id`.
   +Ingest assigns one run_id per sync transaction.
   +Upsert sets `last_seen_run_id = current_run_id`.
   +Sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).
   @@ Work Chunk 0C
   -fetch_complete + last_seen_at-based sweep
   +fetch_complete + run_id-based sweep
   ```

4. **Materialize the stale-note set once during the sweep**

   Why: The current set-based SQL still evaluates the stale-note subquery three times. Materializing it once avoids the repeated work and guarantees that all three deletes operate on the identical set.

   ```diff
   @@ Work Chunk 0B: Immediate Deletion Propagation
   -DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
   -DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
   -DELETE FROM notes WHERE ...;
   +CREATE TEMP TABLE _stale_note_ids AS
   +  SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
   +DELETE FROM documents
   +  WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
   +DELETE FROM dirty_sources
   +  WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
   +DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
   +DROP TABLE _stale_note_ids;
   ```

5. **Move historical note backfill out of the migration into a resumable runtime job**

   Why: A data-heavy migration can block startup and is harder to resume or recover on large databases.

   ```diff
   @@ Work Chunk 2H
   -Backfill Existing Notes After Upgrade (Migration 024)
   +Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
   @@
   -Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
   +Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
   @@
   -INSERT INTO dirty_sources ... SELECT ... FROM notes ...
   +Introduce batched backfill API:
   +`enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
   +invoked from `generate-docs`/`sync` until complete, resumable across runs.
   ```

6. **Add a streaming path for large `jsonl`/`csv` note exports**

   Why: The current `query_notes` materializes the full result set in memory. Streaming bounds memory use and lets output begin before the whole query has been read.

   ```diff
   @@ Work Chunk 1A
   +Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.
   @@ Work Chunk 1C
   -print_list_notes_jsonl(&result)
   -print_list_notes_csv(&result)
   +print_list_notes_jsonl_stream(config, filters)
   +print_list_notes_csv_stream(config, filters)
   +(table/json keep counted buffered path)
   ```

7. **Add an index for path-centric note queries**

   Why: Queries that combine `--path` with project and date filters are a stated hot path. The currently proposed indexes do not fully cover them.

   ```diff
   @@ Work Chunk 1E: Composite Query Index
   +CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
   +ON notes(project_id, position_new_path, created_at DESC, id DESC)
   +WHERE is_system = 0 AND position_new_path IS NOT NULL;
   ```

8. **Add property/invariant tests, not only example-based tests**

   Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration. Randomized invariants will catch subtle regressions that fixed examples miss.

   ```diff
   @@ Verification Checklist
   +Add property tests (proptest):
   +- stable local IDs across randomized re-sync orderings
   +- no orphan `documents(source_type='note')` after randomized deletions/sweeps
   +- partial-fetch runs never reduce note count
   +- repeated full rebuild converges (fixed-point idempotence)
   ```

These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scale behavior, and long-term maintainability.
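For item 1, the following is one way the filter mapping could combine `--author-id` and `--author`. It is a minimal std-only Rust sketch, not the plan's implementation. The names `AuthorFilters`, `Param`, `author_clauses` and `where_sql` are hypothetical, and `author_id` is assumed to be an `i64`. The sketch shows both filters being ANDed when present and the leading `@` being stripped from the display-name filter.

```rust
/// Hypothetical bind-parameter type; the real query layer may bind differently.
#[derive(Debug, Clone, PartialEq)]
pub enum Param {
    Int(i64),
    Text(String),
}

/// Hypothetical subset of `NoteListFilters`, reduced to the two author filters.
pub struct AuthorFilters<'a> {
    pub author_id: Option<i64>,
    pub author: Option<&'a str>,
}

/// Returns WHERE-clause fragments and their bind parameters in matching order.
/// When both filters are present they are ANDed together.
pub fn author_clauses(f: &AuthorFilters<'_>) -> (Vec<String>, Vec<Param>) {
    let mut clauses = Vec::new();
    let mut params = Vec::new();
    if let Some(id) = f.author_id {
        // Exact match on the immutable identity.
        clauses.push("n.author_id = ?".to_string());
        params.push(Param::Int(id));
    }
    if let Some(name) = f.author {
        // Strip a leading `@` so `@alice` and `alice` behave the same.
        let name = name.strip_prefix('@').unwrap_or(name);
        clauses.push("n.author_username = ? COLLATE NOCASE".to_string());
        params.push(Param::Text(name.to_string()));
    }
    (clauses, params)
}

/// Joins fragments into a WHERE clause; returns an empty string when no filter applies.
pub fn where_sql(clauses: &[String]) -> String {
    if clauses.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", clauses.join(" AND "))
    }
}
```

Keeping clauses and parameters in two parallel vectors keeps placeholder order and bind order aligned as more filters are added.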
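Items 3 and 4 together define the sweep rule. Upsert stamps each note with the current run id. A complete fetch then deletes the notes in the discussion whose stamp is older than the current run, and the stale set is computed once and reused for every delete. The sketch below models the tables as in-memory collections instead of SQLite so it runs with std only. `Store`, `upsert` and `sweep` are hypothetical names, and the run id is assumed to be a `u64` drawn from `sync_runs`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the three tables the sweep touches.
#[derive(Default)]
pub struct Store {
    /// note id -> (discussion_id, is_system, last_seen_run_id)
    pub notes: HashMap<i64, (i64, bool, u64)>,
    /// source_id of each `documents` row with source_type = 'note'
    pub note_documents: HashSet<i64>,
    /// source_id of each `dirty_sources` row with source_type = 'note'
    pub dirty_notes: HashSet<i64>,
}

impl Store {
    /// Upsert stamps the note with the run that just saw it.
    pub fn upsert(&mut self, id: i64, discussion_id: i64, is_system: bool, run_id: u64) {
        self.notes.insert(id, (discussion_id, is_system, run_id));
    }

    /// Sweeps one discussion and returns how many notes were removed.
    /// Does nothing unless the fetch was complete. The stale set is computed
    /// once and reused for all three deletes, like `_stale_note_ids`.
    pub fn sweep(&mut self, discussion_id: i64, current_run_id: u64, fetch_complete: bool) -> usize {
        if !fetch_complete {
            return 0;
        }
        let stale: Vec<(i64, bool)> = self
            .notes
            .iter()
            .filter(|(_, v)| v.0 == discussion_id && v.2 < current_run_id)
            .map(|(id, v)| (*id, v.1))
            .collect();
        for &(id, is_system) in &stale {
            // Only user notes have documents/dirty rows, as in the SQL version.
            if !is_system {
                self.note_documents.remove(&id);
                self.dirty_notes.remove(&id);
            }
            self.notes.remove(&id);
        }
        stale.len()
    }
}
```

Comparing run ids never depends on wall-clock time. A partial fetch leaves some stamps behind, but the `fetch_complete` guard keeps those notes from being deleted.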
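For item 5, one way `enqueue_missing_note_documents` could stay resumable is keyset pagination over note ids with a persisted cursor. The plan names the function and `BackfillProgress` but not their internals. The progress fields (`enqueued`, `cursor`, `done`), the `Backfill` struct and the in-memory tables below are therefore assumptions. In SQLite, the cursor would presumably live in a small state row so that a restarted `sync` or `generate-docs` continues where the previous one stopped.

```rust
use std::collections::{BTreeMap, HashSet};

/// Progress report; these fields are assumptions, not the plan's definition.
#[derive(Debug, PartialEq)]
pub struct BackfillProgress {
    pub enqueued: usize,
    pub cursor: i64,
    pub done: bool,
}

/// Hypothetical in-memory model of the tables the backfill reads and writes.
pub struct Backfill {
    pub notes: BTreeMap<i64, bool>, // note id -> is_system
    pub documents: HashSet<i64>,    // note ids that already have a document
    pub dirty: HashSet<i64>,        // note ids queued in dirty_sources
    pub cursor: i64,                // last note id processed (persisted in practice)
}

impl Backfill {
    /// Processes at most `batch_size` notes with id > cursor (keyset pagination)
    /// and enqueues non-system notes that have no document and are not queued yet.
    pub fn enqueue_missing_note_documents(&mut self, batch_size: usize) -> BackfillProgress {
        let batch: Vec<(i64, bool)> = self
            .notes
            .range(self.cursor + 1..)
            .take(batch_size)
            .map(|(&id, &is_system)| (id, is_system))
            .collect();
        let mut enqueued = 0;
        for &(id, is_system) in &batch {
            // `insert` returns false if the id was already queued, so re-runs are idempotent.
            if !is_system && !self.documents.contains(&id) && self.dirty.insert(id) {
                enqueued += 1;
            }
        }
        if let Some(&(last, _)) = batch.last() {
            self.cursor = last;
        }
        // A short batch means the id range is exhausted.
        BackfillProgress { enqueued, cursor: self.cursor, done: batch.len() < batch_size }
    }
}
```

Keyset pagination (`id > cursor`) keeps the cost of each batch flat, whereas an `OFFSET` would rescan the rows already processed.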
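For item 6, the sketch below shows the shape of the streaming path. Rows are pulled one at a time and handed to a callback that writes a CSV line immediately, so the full result set is never held in memory. It is std-only Rust, and an iterator stands in for a forward-only SQLite cursor. `NoteRow` and `csv_field` are hypothetical. The signatures of `query_notes_stream` and `print_list_notes_csv_stream` are simplified assumptions, since the plan's versions take `conn`/`config` and `filters`.

```rust
use std::io::{self, Write};

/// Hypothetical row shape; real note rows carry more columns.
pub struct NoteRow<'a> {
    pub id: i64,
    pub author_username: &'a str,
    pub body: &'a str,
}

/// Quotes a field (RFC 4180 style) when it contains a comma, quote, or newline.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Simplified `query_notes_stream`: hands each row to `row_handler` as it
/// arrives and returns the number of rows processed.
pub fn query_notes_stream<'a, I, F>(rows: I, mut row_handler: F) -> io::Result<usize>
where
    I: IntoIterator<Item = NoteRow<'a>>,
    F: FnMut(&NoteRow<'a>) -> io::Result<()>,
{
    let mut n = 0;
    for row in rows {
        row_handler(&row)?;
        n += 1;
    }
    Ok(n)
}

/// CSV printer on the streaming path: header first, then one line per row.
pub fn print_list_notes_csv_stream<'a, W: Write>(
    out: &mut W,
    rows: impl IntoIterator<Item = NoteRow<'a>>,
) -> io::Result<usize> {
    writeln!(out, "id,author,body")?;
    query_notes_stream(rows, |r| {
        writeln!(out, "{},{},{}", r.id, csv_field(r.author_username), csv_field(r.body))
    })
}
```

Passing a buffered writer such as `io::BufWriter` around stdout would keep the per-row writes cheap. The table and JSON outputs stay on the buffered path, because they need the total count before printing.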
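For item 8, the plan calls for `proptest`. As a dependency-free illustration of the same idea, the sketch below uses a small linear congruential generator to drive randomized scenarios, then asserts two of the listed invariants: local ids stay stable across shuffled re-syncs, and partial-fetch runs never reduce the note count. `Lcg`, `Model` and `check_invariants` are hypothetical. The model assumes that notes are keyed by GitLab note id and that the sweep is run-id-based, as in item 3. A real suite would use proptest strategies and shrinking instead.

```rust
use std::collections::HashMap;

/// Tiny deterministic PRNG (64-bit LCG, Knuth's MMIX constants); a stand-in
/// for proptest's strategies so the check needs no external crates.
pub struct Lcg(pub u64);

impl Lcg {
    pub fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Hypothetical model: GitLab note id -> (local id, last_seen_run_id).
#[derive(Default)]
pub struct Model {
    next_local: i64,
    pub notes: HashMap<i64, (i64, u64)>,
}

impl Model {
    /// Upsert keeps an existing local id; only unseen GitLab ids get a new one.
    pub fn upsert(&mut self, gitlab_id: i64, run: u64) {
        let next = &mut self.next_local;
        let entry = self.notes.entry(gitlab_id).or_insert_with(|| {
            *next += 1;
            (*next, run)
        });
        entry.1 = run;
    }

    /// Run-id sweep; a partial fetch never deletes anything.
    pub fn sweep(&mut self, run: u64, fetch_complete: bool) {
        if fetch_complete {
            self.notes.retain(|_, v| v.1 >= run);
        }
    }
}

/// Runs `cases` random scenarios, panicking if an invariant breaks.
pub fn check_invariants(seed: u64, cases: usize) -> usize {
    let mut rng = Lcg(seed);
    for _ in 0..cases {
        let mut m = Model::default();
        let n = 1 + (rng.next() % 20) as i64;
        let mut ids: Vec<i64> = (1..=n).collect();
        for &g in &ids {
            m.upsert(g, 1);
        }
        let before: HashMap<i64, i64> = m.notes.iter().map(|(g, v)| (*g, v.0)).collect();

        // Full re-sync: shuffle (Fisher-Yates), drop a random tail, then sweep.
        for i in (1..ids.len()).rev() {
            let j = (rng.next() % (i as u64 + 1)) as usize;
            ids.swap(i, j);
        }
        let keep = 1 + (rng.next() as usize) % ids.len();
        ids.truncate(keep);
        for &g in &ids {
            m.upsert(g, 2);
        }
        m.sweep(2, true);
        assert_eq!(m.notes.len(), keep);
        for (g, v) in &m.notes {
            assert_eq!(before[g], v.0, "local id changed for gitlab id {g}");
        }

        // Partial fetch: only a random prefix is seen; nothing may disappear.
        let count = m.notes.len();
        let seen = (rng.next() as usize) % (ids.len() + 1);
        for &g in &ids[..seen] {
            m.upsert(g, 3);
        }
        m.sweep(3, false);
        assert_eq!(m.notes.len(), count, "partial fetch changed note count");
    }
    cases
}
```

The orphan-document and fixed-point checks from the list could be added to the same loop once the model also tracks documents.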