docs: add per-note search PRD and user journey documentation

Per-note search PRD: Comprehensive product requirements for evolving the search system from document-level to note-level granularity. Includes 6 rounds of iterative feedback refining scope, ranking strategy, migration path, and robot mode integration.

User journeys: Detailed walkthrough of 8 primary user workflows covering issue triage, MR review lookup, code archaeology, expert discovery, sync pipeline operation, and agent integration patterns.
131
docs/prd-per-note-search.feedback-6.md
Normal file
@@ -0,0 +1,131 @@
1. **Make immutable identity usable now (`--author-id`)**
Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.

```diff
@@ Phase 1: `lore notes` Command / Work Chunk 1A
 pub struct NoteListFilters<'a> {
+    pub author_id: Option<i64>, // immutable identity filter
@@
-    pub author: Option<&'a str>, // case-insensitive match via COLLATE NOCASE
+    pub author: Option<&'a str>, // display-name filter
+    // If both author and author_id are provided, apply both (AND) for precision.
 }
@@
 Filter mappings:
+ - `author_id`: `n.author_id = ?` (exact immutable identity)
  - `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`
@@ Phase 1 / Work Chunk 1B (CLI)
+    /// Filter by immutable author id
+    #[arg(long = "author-id", help_heading = "Filters")]
+    pub author_id: Option<i64>,
@@ Phase 2 / Work Chunk 2F
+ Add `--author-id` support to `lore search` filtering for note documents.
@@ Phase 1 / Work Chunk 1E
+ CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
+   ON notes(project_id, author_id, created_at DESC, id DESC)
+   WHERE is_system = 0 AND author_id IS NOT NULL;
```
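
The AND semantics proposed for combining `author` and `author_id` can be sketched in memory as follows. This is a minimal illustration, not the plan's query builder: the `Note` and `AuthorFilters` types and the `matches` function are hypothetical stand-ins, and the real filter would be expressed as SQL predicates on `notes`.

```rust
/// Hypothetical in-memory stand-in for a row of the `notes` table.
#[derive(Debug, Clone, PartialEq)]
pub struct Note {
    pub id: i64,
    pub author_id: Option<i64>,
    pub author_username: String,
}

/// Hypothetical holder for the two author fields proposed for `NoteListFilters`.
#[derive(Debug, Default)]
pub struct AuthorFilters<'a> {
    pub author: Option<&'a str>,
    pub author_id: Option<i64>,
}

/// Returns true when the note passes every filter that is set (AND semantics).
pub fn matches(note: &Note, filters: &AuthorFilters) -> bool {
    let username_ok = filters.author.map_or(true, |author| {
        // Strip a leading '@' and compare case-insensitively, approximating
        // `n.author_username = ? COLLATE NOCASE` (NOCASE folds ASCII only).
        let wanted = author.strip_prefix('@').unwrap_or(author);
        note.author_username.eq_ignore_ascii_case(wanted)
    });
    // `n.author_id = ?`: a note without a captured id never matches.
    let id_ok = filters
        .author_id
        .map_or(true, |id| note.author_id == Some(id));
    username_ok && id_ok
}
```

With this shape, a note whose author was renamed still matches on `author_id` alone, while a stale username filter no longer does, which is the longitudinal-analysis gap the item describes.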

2. **Fix document staleness on username changes**
Why: Current plan says username changes are “not semantic,” but note documents include username in content/title, so docs go stale/inconsistent.

```diff
@@ Work Chunk 0D: Immutable Author Identity Capture
- Assert: changed_semantics = false (username change is not a semantic change for documents)
+ Assert: changed_semantics = true (username affects note document content/title)
@@ Work Chunk 0A: semantic-change detection
- old_body != body || old_note_type != note_type || ...
+ old_body != body || old_note_type != note_type || ...
+ || old_author_username != author_username
@@ Work Chunk 2C: Note Document Extractor header
  author: @{author}
+ author_id: {author_id}
```
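
The revised semantic-change check can be sketched as a pure function. This is an assumption-laden illustration: the `NoteFields` struct is hypothetical, and it carries only the fields named in the diff, since the remaining comparisons are elided (`...`) in the plan.

```rust
/// Hypothetical subset of the note fields relevant to document generation.
/// The plan compares more fields than these; they are elided in the source.
#[derive(Debug, Clone, PartialEq)]
pub struct NoteFields {
    pub body: String,
    pub note_type: Option<String>,
    pub author_username: String,
    /// Bookkeeping only; assumed not to be rendered into the document.
    pub updated_at: i64,
}

/// True when a re-synced note needs its document regenerated.
pub fn changed_semantics(old: &NoteFields, new: &NoteFields) -> bool {
    old.body != new.body
        || old.note_type != new.note_type
        // The username appears in the document header, so a rename
        // must also mark the document dirty.
        || old.author_username != new.author_username
}
```

A rename alone now returns `true`, while a change confined to bookkeeping fields such as the hypothetical `updated_at` still returns `false`.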

3. **Replace `last_seen_at` sweep marker with monotonic `sync_run_id`**
Why: Timestamp markers are vulnerable to clock skew and concurrent runs; run IDs are deterministic and safer.

```diff
@@ Phase 0: Stable Note Identity
+ ### Work Chunk 0E: Monotonic Run Marker
+ Add `sync_runs` table and `notes.last_seen_run_id`.
+ Ingest assigns one run_id per sync transaction.
+ Upsert sets `last_seen_run_id = current_run_id`.
+ Sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).
@@ Work Chunk 0C
- fetch_complete + last_seen_at-based sweep
+ fetch_complete + run_id-based sweep
```
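
The upsert-then-sweep rule above can be modeled in a few lines. This is a sketch under stated assumptions: the `Discussion` type is a hypothetical in-memory stand-in for the notes of one discussion, keyed by note id, and run ids are assumed to increase by at least one per sync.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory model of one discussion: note id -> last_seen_run_id.
pub struct Discussion {
    pub notes: HashMap<i64, i64>,
}

impl Discussion {
    /// Upsert every fetched note, stamping it with the current run id.
    pub fn upsert(&mut self, fetched_ids: &[i64], run_id: i64) {
        for &id in fetched_ids {
            self.notes.insert(id, run_id);
        }
    }

    /// Remove notes not seen in this run, but only after a complete fetch.
    /// Returns the number of notes swept.
    pub fn sweep(&mut self, run_id: i64, fetch_complete: bool) -> usize {
        if !fetch_complete {
            // A partial fetch proves nothing about the notes it did not return.
            return 0;
        }
        let before = self.notes.len();
        // Keep a note only if `last_seen_run_id < current_run_id` is false.
        self.notes.retain(|_, last_seen| *last_seen >= run_id);
        before - self.notes.len()
    }
}
```

Because the comparison uses integers assigned by the database rather than wall-clock time, the outcome does not depend on the clock of the machine running the sync.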

4. **Materialize stale-note set once during sweep**
Why: Current set-based SQL still re-runs the stale subquery 3 times; materializing once improves performance and guarantees an identical deletion set across all three statements.

```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
- DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM notes WHERE ...;
+ CREATE TEMP TABLE _stale_note_ids AS
+   SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
+ DELETE FROM documents
+   WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM dirty_sources
+   WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
+ DROP TABLE _stale_note_ids;
```
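
The same materialize-once pattern, sketched in memory: the stale set is computed a single time and then reused for all three deletions. The `Store` and `NoteMeta` types are hypothetical, and the `discussion_id = ?` scoping from the SQL is omitted for brevity.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical per-note metadata needed by the sweep.
pub struct NoteMeta {
    pub last_seen_run_id: i64,
    pub is_system: bool,
}

/// Hypothetical in-memory stand-in for the three tables touched by the sweep.
pub struct Store {
    pub notes: HashMap<i64, NoteMeta>,
    /// Note ids that have a generated document.
    pub documents: HashSet<i64>,
    /// Note ids queued for document regeneration.
    pub dirty_sources: HashSet<i64>,
}

impl Store {
    /// Deletes stale notes and their dependent rows; returns notes removed.
    pub fn sweep(&mut self, current_run_id: i64) -> usize {
        // Materialize the stale set once (the TEMP TABLE step).
        let stale: Vec<(i64, bool)> = self
            .notes
            .iter()
            .filter(|(_, meta)| meta.last_seen_run_id < current_run_id)
            .map(|(id, meta)| (*id, meta.is_system))
            .collect();

        for &(id, is_system) in &stale {
            // Only non-system notes have documents or dirty-queue entries.
            if !is_system {
                self.documents.remove(&id);
                self.dirty_sources.remove(&id);
            }
            self.notes.remove(&id);
        }
        stale.len()
    }
}
```

Every deletion reads from the same `stale` vector, so the three targets cannot disagree about which notes were removed.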

5. **Move historical note backfill out of migration into resumable runtime job**
Why: Data-heavy migration can block startup and is harder to resume/recover on large DBs.

```diff
@@ Work Chunk 2H
- Backfill Existing Notes After Upgrade (Migration 024)
+ Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
@@
- Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
+ Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
@@
- INSERT INTO dirty_sources ... SELECT ... FROM notes ...
+ Introduce batched backfill API:
+   `enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
+   invoked from `generate-docs`/`sync` until complete, resumable across runs.
```
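
One way the batched API could behave is sketched below. Everything beyond the function name and return type is an assumption: the `Db` struct is a hypothetical in-memory stand-in passed explicitly, and the fields of `BackfillProgress` are invented for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical progress report; the plan does not specify its fields.
#[derive(Debug, PartialEq)]
pub struct BackfillProgress {
    pub enqueued: usize,
    pub complete: bool,
}

/// Hypothetical in-memory stand-in for the relevant tables.
pub struct Db {
    /// Ids of non-system notes.
    pub note_ids: BTreeSet<i64>,
    /// Notes that already have a document.
    pub documented: BTreeSet<i64>,
    /// Notes queued in `dirty_sources`.
    pub dirty: BTreeSet<i64>,
}

/// Enqueues up to `batch_size` notes that have no document and are not queued.
pub fn enqueue_missing_note_documents(db: &mut Db, batch_size: usize) -> BackfillProgress {
    // "Missing" is recomputed from current state on every call, so an
    // interrupted backfill resumes where it stopped without a cursor.
    let batch: Vec<i64> = db
        .note_ids
        .iter()
        .copied()
        .filter(|id| !db.documented.contains(id) && !db.dirty.contains(id))
        .take(batch_size)
        .collect();
    db.dirty.extend(batch.iter().copied());
    BackfillProgress {
        enqueued: batch.len(),
        // A short batch means nothing is left to enqueue.
        complete: batch.len() < batch_size,
    }
}
```

A caller such as `generate-docs` would loop until `complete` is true. Because no cursor is persisted, a crash between batches loses no work and repeats none.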

6. **Add streaming path for large `jsonl`/`csv` note exports**
Why: Current `query_notes` materializes the full result set in memory; streaming improves scalability and latency.

```diff
@@ Work Chunk 1A
+ Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.
@@ Work Chunk 1C
- print_list_notes_jsonl(&result)
- print_list_notes_csv(&result)
+ print_list_notes_jsonl_stream(config, filters)
+ print_list_notes_csv_stream(config, filters)
+ (table/json keep counted buffered path)
```
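
The row-handler shape can be sketched with a plain iterator standing in for the database cursor. The signatures here are assumptions and differ from the plan's: `rows` replaces `conn` and `filters`, the writer is passed explicitly, and `NoteRow` and `csv_field` are hypothetical.

```rust
use std::io::{self, Write};

/// Hypothetical row shape; the plan's actual row type has more columns.
pub struct NoteRow {
    pub id: i64,
    pub author: String,
    pub body: String,
}

/// Forward-only iteration: each row is handed to `row_handler`, then dropped,
/// so memory use does not grow with the size of the result set.
pub fn query_notes_stream<I, F>(rows: I, mut row_handler: F) -> io::Result<usize>
where
    I: Iterator<Item = NoteRow>,
    F: FnMut(&NoteRow) -> io::Result<()>,
{
    let mut count = 0;
    for row in rows {
        row_handler(&row)?;
        count += 1;
    }
    Ok(count)
}

/// Quotes a CSV field when it contains a delimiter, quote, or line break.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Writes a header and then one CSV record per row as rows arrive.
pub fn print_list_notes_csv_stream<I, W>(rows: I, out: &mut W) -> io::Result<usize>
where
    I: Iterator<Item = NoteRow>,
    W: Write,
{
    writeln!(out, "id,author,body")?;
    query_notes_stream(rows, |row| {
        writeln!(out, "{},{},{}", row.id, csv_field(&row.author), csv_field(&row.body))
    })
}
```

The total row count is known only after the last row has been written, which is consistent with the plan keeping the buffered path for the `table` and `json` formats that report a count.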

7. **Add index for path-centric note queries**
Why: `--path` + project/date queries are a stated hot path and not fully covered by current proposed indexes.

```diff
@@ Work Chunk 1E: Composite Query Index
+ CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
+   ON notes(project_id, position_new_path, created_at DESC, id DESC)
+   WHERE is_system = 0 AND position_new_path IS NOT NULL;
```

8. **Add property/invariant tests (not only examples)**
Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration; randomized invariants will catch subtle regressions.

```diff
@@ Verification Checklist
+ Add property tests (proptest):
+ - stable local IDs across randomized re-sync orderings
+ - no orphan `documents(source_type='note')` after randomized deletions/sweeps
+ - partial-fetch runs never reduce note count
+ - repeated full rebuild converges (fixed-point idempotence)
```
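
To show the shape of one such invariant without pulling in a crate, the sketch below checks "partial-fetch runs never reduce note count" against a toy model using a hand-rolled xorshift generator. It is not a `proptest` test: a real one would use that crate's strategies and shrinking. The `sync_run` model and all names here are hypothetical.

```rust
use std::collections::HashMap;

/// Tiny deterministic xorshift64 generator so the sketch needs no crates.
pub struct XorShift(pub u64);

impl XorShift {
    pub fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Applies one sync run to `notes` (id -> last_seen_run_id) and sweeps
/// unseen notes only when the fetch was complete.
pub fn sync_run(
    notes: &mut HashMap<u64, u64>,
    fetched: &[u64],
    run_id: u64,
    fetch_complete: bool,
) {
    for &id in fetched {
        notes.insert(id, run_id);
    }
    if fetch_complete {
        notes.retain(|_, last_seen| *last_seen >= run_id);
    }
}

/// Runs `runs` randomized syncs and reports whether the invariant
/// "a partial-fetch run never reduces the note count" held throughout.
pub fn partial_fetch_never_shrinks(seed: u64, runs: u64) -> bool {
    let mut rng = XorShift(seed.max(1)); // xorshift state must be non-zero
    let mut notes = HashMap::new();
    for run_id in 1..=runs {
        let fetch_complete = rng.next_u64() % 2 == 0;
        // Each of eight candidate note ids is fetched with probability 1/2.
        let fetched: Vec<u64> = (0..8).filter(|_| rng.next_u64() % 2 == 0).collect();
        let before = notes.len();
        sync_run(&mut notes, &fetched, run_id, fetch_complete);
        if !fetch_complete && notes.len() < before {
            return false;
        }
    }
    true
}
```

The other three invariants in the checklist would follow the same pattern: generate a random sequence of operations, apply it to the system under test, and assert a condition that must hold after every step.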

These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scale behavior, and long-term maintainability.