docs: add per-note search PRD and user journey documentation

Per-note search PRD: Comprehensive product requirements for evolving the search system from document-level to note-level granularity. Includes 6 rounds of iterative feedback refining scope, ranking strategy, migration path, and robot mode integration.

User journeys: Detailed walkthrough of 8 primary user workflows covering issue triage, MR review lookup, code archaeology, expert discovery, sync pipeline operation, and agent integration patterns.
2026-02-11 16:00:34 -05:00
parent cd25cf61ca
commit 125938fba6
8 changed files with 4103 additions and 0 deletions
--- a/docs/prd-per-note-search.feedback-6.md
+++ b/docs/prd-per-note-search.feedback-6.md
@@ -0,0 +1,131 @@
+1. **Make immutable identity usable now (`--author-id`)**
+Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.
+
+```diff
+@@ Phase 1: `lore notes` Command / Work Chunk 1A
+ pub struct NoteListFilters<'a> {
++    pub author_id: Option<i64>,         // immutable identity filter
+@@
+-    pub author: Option<&'a str>,          // case-insensitive match via COLLATE NOCASE
++    pub author: Option<&'a str>,          // display-name filter
++    // If both author and author_id are provided, apply both (AND) for precision.
+ }
+@@
+ Filter mappings:
++ - `author_id`: `n.author_id = ?` (exact immutable identity)
+  - `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`
+@@ Phase 1 / Work Chunk 1B (CLI)
+ /// Filter by immutable author id
+ #[arg(long = "author-id", help_heading = "Filters")]
+ pub author_id: Option<i64>,
+@@ Phase 2 / Work Chunk 2F
+ Add `--author-id` support to `lore search` filtering for note documents.
+@@ Phase 1 / Work Chunk 1E
+ CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
+ ON notes(project_id, author_id, created_at DESC, id DESC)
+ WHERE is_system = 0 AND author_id IS NOT NULL;
+```
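A minimal sketch of how the two author filters could compose into a WHERE clause, assuming positional `?` parameters as in the filter mappings above. `Param`, `build_author_clauses`, and `where_sql` are hypothetical helpers, and the struct keeps only the two author fields of `NoteListFilters`.

```rust
/// Hypothetical bound-parameter type; a real implementation would use the
/// SQLite driver's own value type.
#[derive(Debug, Clone, PartialEq)]
pub enum Param {
    Int(i64),
    Text(String),
}

/// Only the two author fields of the proposed `NoteListFilters`.
pub struct NoteListFilters<'a> {
    pub author_id: Option<i64>,  // immutable identity filter
    pub author: Option<&'a str>, // display-name filter
}

/// Hypothetical helper: returns WHERE-clause fragments plus their bound
/// parameters, in order. If both filters are set, both fragments are emitted.
pub fn build_author_clauses(f: &NoteListFilters<'_>) -> (Vec<String>, Vec<Param>) {
    let mut clauses = Vec::new();
    let mut params = Vec::new();
    if let Some(id) = f.author_id {
        // Exact match on the immutable author id.
        clauses.push("n.author_id = ?".to_string());
        params.push(Param::Int(id));
    }
    if let Some(name) = f.author {
        // Strip a leading '@' so "@alice" and "alice" are equivalent;
        // case-insensitivity is left to SQLite's COLLATE NOCASE.
        let name = name.strip_prefix('@').unwrap_or(name);
        clauses.push("n.author_username = ? COLLATE NOCASE".to_string());
        params.push(Param::Text(name.to_string()));
    }
    (clauses, params)
}

/// Hypothetical helper: joins fragments with AND (the precision behavior the
/// struct comment asks for); returns an empty string when no filter is set.
pub fn where_sql(clauses: &[String]) -> String {
    if clauses.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", clauses.join(" AND "))
    }
}
```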
+
+2. **Fix document staleness on username changes**
+Why: The current plan says username changes are “not semantic,” but note documents include the username in their content and title, so those documents go stale and become inconsistent.
+
+```diff
+@@ Work Chunk 0D: Immutable Author Identity Capture
+- Assert: changed_semantics = false (username change is not a semantic change for documents)
++ Assert: changed_semantics = true (username affects note document content/title)
+@@ Work Chunk 0A: semantic-change detection
+- old_body != body || old_note_type != note_type || ...
++ old_body != body || old_note_type != note_type || ...
++ || old_author_username != author_username
+@@ Work Chunk 2C: Note Document Extractor header
+     author: @{author}
++    author_id: {author_id}
+```
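A small sketch of why the username has to count as a semantic change: the rendered header embeds it, so a rename alone changes the document. `NoteFields`, `render_author_header`, and the exact header format are assumptions modeled on the Work Chunk 2C lines above; the fields the plan elides with `...` are omitted.

```rust
/// Hypothetical subset of the note fields that feed the note document.
/// Fields the plan elides with `...` are omitted here as well.
#[derive(Debug, Clone, PartialEq)]
pub struct NoteFields {
    pub body: String,
    pub note_type: Option<String>,
    pub author_username: String,
}

/// Hypothetical header renderer; the format is assumed from the
/// `author: @{author}` / `author_id: {author_id}` lines in Work Chunk 2C.
pub fn render_author_header(n: &NoteFields, author_id: i64) -> String {
    format!("author: @{}\nauthor_id: {}", n.author_username, author_id)
}

/// Semantic-change check with the username term added: any field that is
/// rendered into the document must be compared here.
pub fn changed_semantics(old: &NoteFields, new: &NoteFields) -> bool {
    old.body != new.body
        || old.note_type != new.note_type
        || old.author_username != new.author_username
}
```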
+
+3. **Replace `last_seen_at` sweep marker with monotonic `sync_run_id`**
+Why: Timestamp markers are vulnerable to clock skew and concurrent runs; run IDs are deterministic and safer.
+
+```diff
+@@ Phase 0: Stable Note Identity
+ ### Work Chunk 0E: Monotonic Run Marker
+ Add `sync_runs` table and `notes.last_seen_run_id`.
+ Ingest assigns one run_id per sync transaction.
+ Upsert sets `last_seen_run_id = current_run_id`.
+ Sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).
+@@ Work Chunk 0C
+- fetch_complete + last_seen_at-based sweep
++ fetch_complete + run_id-based sweep
+```
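A minimal in-memory model of the run-id sweep, assuming run ids come from an auto-incrementing key on `sync_runs`. `RunCounter` and `Discussion` are hypothetical stand-ins for that table and for the `notes` rows of one discussion. No wall-clock time is read anywhere, which is the property that removes the clock-skew risk.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for inserting into `sync_runs` and reading back its
/// auto-incremented id: strictly increasing, independent of the clock.
pub struct RunCounter {
    last: u64,
}

impl RunCounter {
    pub fn new() -> Self {
        RunCounter { last: 0 }
    }
    pub fn next_run_id(&mut self) -> u64 {
        self.last += 1;
        self.last
    }
}

/// Hypothetical: the `notes` rows of one discussion, note id -> last_seen_run_id.
pub struct Discussion {
    pub notes: BTreeMap<i64, u64>,
}

impl Discussion {
    /// Upsert: every fetched note gets `last_seen_run_id = run_id`.
    pub fn upsert_all(&mut self, fetched: &[i64], run_id: u64) {
        for &id in fetched {
            self.notes.insert(id, run_id);
        }
    }

    /// Sweep: only when the fetch was complete, delete notes whose
    /// `last_seen_run_id < run_id`. Returns the deleted ids.
    pub fn sweep(&mut self, run_id: u64, fetch_complete: bool) -> Vec<i64> {
        if !fetch_complete {
            return Vec::new(); // a partial fetch must never delete
        }
        let stale: Vec<i64> = self
            .notes
            .iter()
            .filter(|&(_, &seen)| seen < run_id)
            .map(|(&id, _)| id)
            .collect();
        for id in &stale {
            self.notes.remove(id);
        }
        stale
    }
}
```

A partial fetch leaves older notes in place; only a complete fetch removes notes whose `last_seen_run_id` is below the current run.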
+
+4. **Materialize stale-note set once during sweep**
+Why: The current set-based SQL still re-runs the stale subquery three times; materializing it once improves performance and guarantees that all three deletions act on the identical set of notes.
+
+```diff
+@@ Work Chunk 0B: Immediate Deletion Propagation
+- DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
+- DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
+- DELETE FROM notes WHERE ...;
++ CREATE TEMP TABLE _stale_note_ids AS
++ SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
++ DELETE FROM documents
++  WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
++ DELETE FROM dirty_sources
++  WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
++ DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
++ DROP TABLE _stale_note_ids;
+```
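The same materialize-once pattern, sketched in memory: the stale set is computed a single time and every deletion reads from that snapshot. `Store` and `Note` are hypothetical stand-ins, scoped to one discussion, for the `notes`, `documents(source_type='note')`, and `dirty_sources(source_type='note')` rows. System notes are only removed from `notes`, matching the `is_system=0` filters in the SQL.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical subset of a `notes` row.
#[derive(Debug, Clone)]
pub struct Note {
    pub id: i64,
    pub is_system: bool,
    pub last_seen_run_id: u64,
}

/// Hypothetical: one discussion's rows across the three affected tables.
pub struct Store {
    pub notes: BTreeMap<i64, Note>,
    pub documents: BTreeSet<i64>, // note ids with documents(source_type='note')
    pub dirty: BTreeSet<i64>,     // note ids in dirty_sources(source_type='note')
}

impl Store {
    /// Computes the stale set once (the TEMP TABLE step), then applies all
    /// three deletions from that same snapshot. Returns how many notes were removed.
    pub fn sweep(&mut self, run_id: u64) -> usize {
        let stale: Vec<(i64, bool)> = self
            .notes
            .values()
            .filter(|n| n.last_seen_run_id < run_id)
            .map(|n| (n.id, n.is_system))
            .collect();
        for &(id, is_system) in &stale {
            if !is_system {
                // Mirrors the `WHERE is_system=0` filter on the first two DELETEs.
                self.documents.remove(&id);
                self.dirty.remove(&id);
            }
        }
        for &(id, _) in &stale {
            self.notes.remove(&id);
        }
        stale.len()
    }
}
```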
+
+5. **Move historical note backfill out of migration into resumable runtime job**
+Why: A data-heavy migration can block startup and is hard to resume or recover on large databases.
+
+```diff
+@@ Work Chunk 2H
+- Backfill Existing Notes After Upgrade (Migration 024)
++ Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
+@@
+- Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
++ Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
+@@
+- INSERT INTO dirty_sources ... SELECT ... FROM notes ...
++ Introduce batched backfill API:
++ `enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
++ invoked from `generate-docs`/`sync` until complete, resumable across runs.
+```
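A minimal sketch of a resumable, keyset-paginated backfill, assuming the cursor is persisted between runs (for example in a small state table) and that note ids are positive. The plan names `enqueue_missing_note_documents` and `BackfillProgress` but not their internals, so the fields and `BackfillState` below are hypothetical.

```rust
use std::collections::BTreeSet;

/// Result of one batch. The plan names this type; the fields are assumed.
#[derive(Debug, PartialEq)]
pub struct BackfillProgress {
    pub enqueued: usize, // notes newly added to the dirty queue in this batch
    pub done: bool,      // true once the scan has passed the last note
}

/// Hypothetical in-memory stand-in for the tables the backfill touches.
pub struct BackfillState {
    pub note_ids: Vec<i64>,          // non-system note ids, sorted ascending
    pub has_document: BTreeSet<i64>, // note ids that already have a document
    pub dirty: BTreeSet<i64>,        // dirty queue; a set, so re-enqueueing is a no-op
    pub cursor: i64,                 // last scanned id; persisted so runs can resume
}

impl BackfillState {
    /// Scans up to `batch_size` notes past the cursor, enqueues those lacking a
    /// document, and advances the cursor. Calling it again after an interruption
    /// continues where the last completed batch stopped.
    pub fn enqueue_missing_note_documents(&mut self, batch_size: usize) -> BackfillProgress {
        assert!(batch_size > 0, "batch_size must be positive");
        // SQL equivalent: SELECT id FROM notes WHERE id > ?cursor ORDER BY id LIMIT ?batch
        let batch: Vec<i64> = self
            .note_ids
            .iter()
            .copied()
            .filter(|&id| id > self.cursor)
            .take(batch_size)
            .collect();
        let mut enqueued = 0;
        for &id in &batch {
            if !self.has_document.contains(&id) && self.dirty.insert(id) {
                enqueued += 1;
            }
        }
        if let Some(&last) = batch.last() {
            self.cursor = last;
        }
        BackfillProgress { enqueued, done: batch.len() < batch_size }
    }
}
```

Each call does a bounded amount of work, so `generate-docs`/`sync` can invoke it in a loop until `done` and simply pick up from the saved cursor after a crash.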
+
+6. **Add streaming path for large `jsonl`/`csv` note exports**
+Why: The current `query_notes` materializes the full result set in memory; streaming keeps memory bounded on large exports and emits the first rows sooner.
+
+```diff
+@@ Work Chunk 1A
+ Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.
+@@ Work Chunk 1C
+- print_list_notes_jsonl(&result)
+- print_list_notes_csv(&result)
++ print_list_notes_jsonl_stream(config, filters)
++ print_list_notes_csv_stream(config, filters)
++ (table/json keep counted buffered path)
+```
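A std-only sketch of the forward-only path, assuming rows arrive from an iterator that stands in for a SQLite cursor. The plan's `query_notes_stream(conn, filters, row_handler)` and `print_list_notes_jsonl_stream(config, filters)` take connection and config arguments; here those are replaced by `rows` and an output writer so the example runs without a database. `NoteRow` and `json_str` are hypothetical, and a real implementation would more likely use a JSON library than this hand-rolled escaper.

```rust
use std::io::{self, Write};

/// Hypothetical row shape; the real `lore notes` row has more columns.
pub struct NoteRow {
    pub id: i64,
    pub author: String,
    pub body: String,
}

/// Minimal JSON string escaper (std-only stand-in for a JSON library).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Forward-only iteration: each row goes to `row_handler` as soon as it is
/// produced, so no result vector is built. Returns the number of rows handled.
pub fn query_notes_stream<I, F>(rows: I, mut row_handler: F) -> io::Result<usize>
where
    I: IntoIterator<Item = NoteRow>,
    F: FnMut(&NoteRow) -> io::Result<()>,
{
    let mut n = 0;
    for row in rows {
        row_handler(&row)?;
        n += 1;
    }
    Ok(n)
}

/// Writes one JSON object per line while rows are still being read.
pub fn print_list_notes_jsonl_stream<I, W>(rows: I, out: &mut W) -> io::Result<usize>
where
    I: IntoIterator<Item = NoteRow>,
    W: Write,
{
    query_notes_stream(rows, |r| {
        writeln!(
            out,
            "{{\"id\":{},\"author\":{},\"body\":{}}}",
            r.id,
            json_str(&r.author),
            json_str(&r.body)
        )
    })
}
```

Memory use stays flat in the number of rows because each line is written before the next row is pulled; as the plan notes, table and JSON output keep the buffered path.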
+
+7. **Add index for path-centric note queries**
+Why: `--path` queries combined with project/date filters are a stated hot path and are not fully covered by the currently proposed indexes.
+
+```diff
+@@ Work Chunk 1E: Composite Query Index
+ CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
+ ON notes(project_id, position_new_path, created_at DESC, id DESC)
+ WHERE is_system = 0 AND position_new_path IS NOT NULL;
+```
+
+8. **Add property/invariant tests (not only examples)**
+Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration; randomized invariant checks are more likely than hand-picked examples to catch subtle regressions.
+
+```diff
+@@ Verification Checklist
+ Add property tests (proptest):
+ - stable local IDs across randomized re-sync orderings
+ - no orphan `documents(source_type='note')` after randomized deletions/sweeps
+ - partial-fetch runs never reduce note count
+ - repeated full rebuild converges (fixed-point idempotence)
+```
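A std-only approximation of two of these properties. It substitutes a tiny xorshift generator for proptest (so there is no shrinking) and runs against a hypothetical in-memory model of sync and sweep. It checks that partial-fetch runs never reduce the note count and that no note document outlives its note; in a real suite the model would be replaced by the actual ingest and sweep code.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// xorshift64 PRNG: a std-only substitute for proptest's generators
/// (no shrinking, no persisted failure seeds).
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Hypothetical model: note id -> last_seen_run_id, plus note ids with documents.
#[derive(Default)]
struct Model {
    notes: BTreeMap<u64, u64>,
    docs: BTreeSet<u64>,
}

impl Model {
    fn sync(&mut self, fetched: &BTreeSet<u64>, run: u64, fetch_complete: bool) {
        for &id in fetched {
            self.notes.insert(id, run);
            self.docs.insert(id); // document (re)generated for each fetched note
        }
        if fetch_complete {
            let stale: Vec<u64> = self
                .notes
                .iter()
                .filter(|&(_, &seen)| seen < run)
                .map(|(&id, _)| id)
                .collect();
            for id in stale {
                self.notes.remove(&id);
                self.docs.remove(&id); // deletion propagates to the document
            }
        }
    }
}

/// Runs `cases` random 20-run sync histories and checks two invariants.
pub fn check_sweep_invariants(seed: u64, cases: usize) -> Result<(), String> {
    let mut rng = Rng(seed | 1); // xorshift state must be non-zero
    for case in 0..cases {
        let mut m = Model::default();
        for run in 1..=20u64 {
            let count = rng.below(8);
            let fetched: BTreeSet<u64> = (0..count).map(|_| rng.below(16)).collect();
            let complete = rng.below(2) == 0;
            let before = m.notes.len();
            m.sync(&fetched, run, complete);
            if !complete && m.notes.len() < before {
                return Err(format!("case {}, run {}: partial fetch reduced note count", case, run));
            }
            if m.docs.iter().any(|id| !m.notes.contains_key(id)) {
                return Err(format!("case {}, run {}: orphan note document", case, run));
            }
        }
    }
    Ok(())
}
```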
+
+These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scale behavior, and long-term maintainability.