docs: add per-note search PRD and user journey documentation

Per-note search PRD: Comprehensive product requirements for evolving
the search system from document-level to note-level granularity.
Includes 6 rounds of iterative feedback refining scope, ranking
strategy, migration path, and robot mode integration.

User journeys: Detailed walkthrough of 8 primary user workflows
covering issue triage, MR review lookup, code archaeology, expert
discovery, sync pipeline operation, and agent integration patterns.
teernisse
2026-02-11 16:00:34 -05:00
parent cd25cf61ca
commit 125938fba6
8 changed files with 4103 additions and 0 deletions

@@ -0,0 +1,131 @@
1. **Make immutable identity usable now (`--author-id`)**
Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.
```diff
@@ Phase 1: `lore notes` Command / Work Chunk 1A
pub struct NoteListFilters<'a> {
+ pub author_id: Option<i64>, // immutable identity filter
@@
- pub author: Option<&'a str>, // case-insensitive match via COLLATE NOCASE
+ pub author: Option<&'a str>, // username filter (mutable), case-insensitive via COLLATE NOCASE
+ // If both author and author_id are provided, apply both (AND) for precision.
}
@@
Filter mappings:
+ - `author_id`: `n.author_id = ?` (exact immutable identity)
- `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`
@@ Phase 1 / Work Chunk 1B (CLI)
+ /// Filter by immutable author id
+ #[arg(long = "author-id", help_heading = "Filters")]
+ pub author_id: Option<i64>,
@@ Phase 2 / Work Chunk 2F
+ Add `--author-id` support to `lore search` filtering for note documents.
@@ Phase 1 / Work Chunk 1E
+ CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
+ ON notes(project_id, author_id, created_at DESC, id DESC)
+ WHERE is_system = 0 AND author_id IS NOT NULL;
```
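To make the AND semantics concrete, here is a minimal sketch of how the two filters might compose into a WHERE fragment. The struct is a reduced stand-in for the plan's `NoteListFilters`; `Param` and `build_author_clause` are hypothetical names introduced only for illustration, and a real implementation would bind parameters through the database driver.

```rust
/// Reduced, hypothetical stand-in for the plan's `NoteListFilters`.
#[derive(Default)]
pub struct NoteListFilters<'a> {
    pub author_id: Option<i64>,  // immutable identity filter
    pub author: Option<&'a str>, // username filter, case-insensitive
}

/// A bound SQL parameter (illustrative only; not a driver type).
#[derive(Debug, PartialEq)]
pub enum Param {
    Int(i64),
    Text(String),
}

/// Builds the author-related WHERE fragment and its parameters.
/// When both filters are present they are combined with AND.
pub fn build_author_clause(filters: &NoteListFilters<'_>) -> (String, Vec<Param>) {
    let mut clauses: Vec<&str> = Vec::new();
    let mut params = Vec::new();

    if let Some(id) = filters.author_id {
        clauses.push("n.author_id = ?");
        params.push(Param::Int(id));
    }
    if let Some(name) = filters.author {
        // Strip one leading '@' so `@alice` and `alice` behave the same.
        let name = name.strip_prefix('@').unwrap_or(name);
        clauses.push("n.author_username = ? COLLATE NOCASE");
        params.push(Param::Text(name.to_string()));
    }
    (clauses.join(" AND "), params)
}
```

With only `author_id` set, the fragment is a single equality on the immutable column, which is the case the proposed `idx_notes_project_author_id_created` index targets.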
2. **Fix document staleness on username changes**
Why: The current plan says username changes are “not semantic,” but note documents include the username in their content and title, so existing documents become stale and inconsistent after a rename.
```diff
@@ Work Chunk 0D: Immutable Author Identity Capture
- Assert: changed_semantics = false (username change is not a semantic change for documents)
+ Assert: changed_semantics = true (username affects note document content/title)
@@ Work Chunk 0A: semantic-change detection
old_body != body || old_note_type != note_type || ...
+ || old_author_username != author_username
@@ Work Chunk 2C: Note Document Extractor header
author: @{author}
+ author_id: {author_id}
```
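A minimal sketch of the revised comparison, assuming a snapshot struct that holds the fields feeding a note document. `NoteSnapshot` and its exact field set are hypothetical; the other fields elided by `...` in the plan are left out here rather than guessed.

```rust
/// Hypothetical subset of the fields that feed a note document.
#[derive(Clone, Debug, PartialEq)]
pub struct NoteSnapshot {
    pub body: String,
    pub note_type: Option<String>,
    pub author_username: String,
}

/// Returns true when the note document must be regenerated.
/// A username change counts, because the username appears in the
/// document content and title.
pub fn changed_semantics(old: &NoteSnapshot, new: &NoteSnapshot) -> bool {
    old.body != new.body
        || old.note_type != new.note_type
        || old.author_username != new.author_username
}
```

A rename with an unchanged body now marks the note dirty, so the document header is rebuilt with the new `@{author}` value.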
3. **Replace `last_seen_at` sweep marker with monotonic `sync_run_id`**
Why: Timestamp markers are vulnerable to clock skew and concurrent runs; run IDs are deterministic and safer.
```diff
@@ Phase 0: Stable Note Identity
+ ### Work Chunk 0E: Monotonic Run Marker
+ Add `sync_runs` table and `notes.last_seen_run_id`.
+ Ingest assigns one run_id per sync transaction.
+ Upsert sets `last_seen_run_id = current_run_id`.
+ Sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).
@@ Work Chunk 0C
- fetch_complete + last_seen_at-based sweep
+ fetch_complete + run_id-based sweep
```
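The run-marker sweep can be sketched in memory as follows. This is a simplified model of one discussion's notes, not the SQL the plan would ship: `Store`, `begin_run`, `upsert`, and `sweep` are hypothetical names, and the map stands in for the `notes` table keyed by note id.

```rust
use std::collections::HashMap;

/// In-memory stand-in for one discussion's rows in `notes`:
/// note id -> last_seen_run_id.
#[derive(Default)]
pub struct Store {
    notes: HashMap<i64, u64>,
    last_run_id: u64,
}

impl Store {
    /// Allocates the next monotonic run id (one per sync transaction).
    pub fn begin_run(&mut self) -> u64 {
        self.last_run_id += 1;
        self.last_run_id
    }

    /// Upsert marks the note as seen in the current run.
    pub fn upsert(&mut self, note_id: i64, run_id: u64) {
        self.notes.insert(note_id, run_id);
    }

    /// Deletes notes with `last_seen_run_id < run_id`, but only when the
    /// fetch was complete. Returns the number of notes removed.
    pub fn sweep(&mut self, run_id: u64, fetch_complete: bool) -> usize {
        if !fetch_complete {
            return 0; // a partial fetch proves nothing about absent notes
        }
        let before = self.notes.len();
        self.notes.retain(|_, seen| *seen >= run_id);
        before - self.notes.len()
    }

    pub fn contains(&self, note_id: i64) -> bool {
        self.notes.contains_key(&note_id)
    }
}
```

Because the comparison uses an integer allocated by the database rather than a wall-clock value, the sweep result does not depend on the host clock.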
4. **Materialize stale-note set once during sweep**
Why: The current set-based SQL still re-runs the stale-note subquery three times; materializing it once improves performance and guarantees that all three statements operate on the identical deletion set.
```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
- DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM notes WHERE ...;
+ CREATE TEMP TABLE _stale_note_ids AS
+ SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
+ DELETE FROM documents
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM dirty_sources
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
+ DROP TABLE _stale_note_ids;
```
5. **Move historical note backfill out of migration into resumable runtime job**
Why: A data-heavy migration can block startup and is harder to resume or recover on large DBs.
```diff
@@ Work Chunk 2H
- Backfill Existing Notes After Upgrade (Migration 024)
+ Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
@@
- Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
+ Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
@@
- INSERT INTO dirty_sources ... SELECT ... FROM notes ...
+ Introduce batched backfill API:
+ `enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
+ invoked from `generate-docs`/`sync` until complete, resumable across runs.
```
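One way the batched backfill could behave, sketched against an in-memory model. The `Db` struct and its three sets are hypothetical stand-ins for `notes`, `documents`, and `dirty_sources`; unlike the signature above, this version takes the database as an explicit argument and ignores the `is_system` filter for brevity.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
pub struct BackfillProgress {
    pub enqueued: usize,  // notes queued by this call
    pub remaining: usize, // notes still missing after this call
}

/// Hypothetical in-memory model of the relevant tables.
#[derive(Default)]
pub struct Db {
    pub note_ids: BTreeSet<i64>,   // notes
    pub documented: BTreeSet<i64>, // documents(source_type='note')
    pub dirty: BTreeSet<i64>,      // dirty_sources(source_type='note')
}

/// Enqueues up to `batch_size` notes that have neither a document nor a
/// pending dirty entry. No cursor is stored: the remaining work is derived
/// from current state, so an interrupted backfill resumes on the next call.
pub fn enqueue_missing_note_documents(db: &mut Db, batch_size: usize) -> BackfillProgress {
    let missing: Vec<i64> = db
        .note_ids
        .iter()
        .copied()
        .filter(|id| !db.documented.contains(id) && !db.dirty.contains(id))
        .collect();
    let batch = &missing[..missing.len().min(batch_size)];
    db.dirty.extend(batch.iter().copied());
    BackfillProgress {
        enqueued: batch.len(),
        remaining: missing.len() - batch.len(),
    }
}
```

A caller such as `generate-docs` would loop until `remaining` reaches zero; calling again after completion is a no-op.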
6. **Add streaming path for large `jsonl`/`csv` note exports**
Why: The current `query_notes` materializes the full result set in memory; streaming rows as they are read keeps memory bounded on large exports and reduces time to first output.
```diff
@@ Work Chunk 1A
+ Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.
@@ Work Chunk 1C
- print_list_notes_jsonl(&result)
- print_list_notes_csv(&result)
+ print_list_notes_jsonl_stream(config, filters)
+ print_list_notes_csv_stream(config, filters)
+ (table/json keep counted buffered path)
```
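A rough sketch of the row-handler shape for the CSV path. The signatures here differ from the ones proposed above: an iterator stands in for the database cursor and a generic writer stands in for stdout, so `NoteRow`, `csv_field`, and the parameter lists are assumptions made to keep the example self-contained.

```rust
use std::io::{self, Write};

/// Hypothetical reduced row type.
pub struct NoteRow {
    pub id: i64,
    pub author_username: String,
    pub body: String,
}

/// Forward-only iteration: invokes `row_handler` once per row and never
/// holds more than one row at a time. Returns the number of rows handled.
pub fn query_notes_stream<I, F>(rows: I, mut row_handler: F) -> io::Result<usize>
where
    I: IntoIterator<Item = NoteRow>,
    F: FnMut(&NoteRow) -> io::Result<()>,
{
    let mut count = 0;
    for row in rows {
        row_handler(&row)?;
        count += 1;
    }
    Ok(count)
}

/// Quotes a field when it contains a delimiter, quote, or line break.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Writes a header, then one CSV line per row as it arrives.
pub fn print_list_notes_csv_stream<I, W>(rows: I, out: &mut W) -> io::Result<usize>
where
    I: IntoIterator<Item = NoteRow>,
    W: Write,
{
    writeln!(out, "id,author_username,body")?;
    query_notes_stream(rows, |row| {
        writeln!(
            out,
            "{},{},{}",
            row.id,
            csv_field(&row.author_username),
            csv_field(&row.body)
        )
    })
}
```

The total count is only known after the last row, which is one reason the table and json formats can reasonably stay on the buffered path.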
7. **Add index for path-centric note queries**
Why: `--path` + project/date queries are a stated hot path and not fully covered by current proposed indexes.
```diff
@@ Work Chunk 1E: Composite Query Index
+ CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
+ ON notes(project_id, position_new_path, created_at DESC, id DESC)
+ WHERE is_system = 0 AND position_new_path IS NOT NULL;
```
8. **Add property/invariant tests (not only examples)**
Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration; randomized invariants will catch subtle regressions.
```diff
@@ Verification Checklist
+ Add property tests (proptest):
+ - stable local IDs across randomized re-sync orderings
+ - no orphan `documents(source_type='note')` after randomized deletions/sweeps
+ - partial-fetch runs never reduce note count
+ - repeated full rebuild converges (fixed-point idempotence)
```
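As an illustration of what such an invariant check looks like, the sketch below exercises the partial-fetch invariant with randomized runs. It deliberately uses a small hand-rolled generator instead of proptest so it stays dependency-free; `Lcg`, `sync_run`, and `check_partial_fetch_never_reduces` are hypothetical names, and the in-memory map models a single discussion rather than the real schema.

```rust
use std::collections::HashMap;

/// Tiny deterministic generator (a stand-in for proptest's strategies).
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// One simulated sync run: upsert fetched notes, sweep only if complete.
fn sync_run(notes: &mut HashMap<i64, u64>, run_id: u64, fetched: &[i64], fetch_complete: bool) {
    for &id in fetched {
        notes.insert(id, run_id);
    }
    if fetch_complete {
        notes.retain(|_, seen| *seen >= run_id);
    }
}

/// Runs `runs` randomized syncs and checks two invariants:
/// a partial fetch never reduces the note count, and a complete fetch
/// leaves exactly the fetched notes.
pub fn check_partial_fetch_never_reduces(seed: u64, runs: u64) -> bool {
    let mut rng = Lcg(seed);
    let mut notes = HashMap::new();
    for run_id in 1..=runs {
        // Random subset of note ids 0..20 returned by this fetch.
        let fetched: Vec<i64> = (0..20).filter(|_| rng.next_u64() % 2 == 0).collect();
        let fetch_complete = rng.next_u64() % 3 == 0;
        let before = notes.len();
        sync_run(&mut notes, run_id, &fetched, fetch_complete);
        if !fetch_complete && notes.len() < before {
            return false;
        }
        if fetch_complete && notes.len() != fetched.len() {
            return false;
        }
    }
    true
}
```

In the real suite, proptest would replace the hand-rolled generator and add shrinking, so a failing sequence of runs is reduced to a minimal counterexample.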
These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scalability, and long-term maintainability.