Per-note search PRD: Comprehensive product requirements for evolving the search system from document-level to note-level granularity. Includes 6 rounds of iterative feedback refining scope, ranking strategy, migration path, and robot mode integration. User journeys: Detailed walkthrough of 8 primary user workflows covering issue triage, MR review lookup, code archaeology, expert discovery, sync pipeline operation, and agent integration patterns.
131 lines
5.6 KiB
Markdown
1. **Make immutable identity usable now (`--author-id`)**
Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.

```diff
@@ Phase 1: `lore notes` Command / Work Chunk 1A
pub struct NoteListFilters<'a> {
+ pub author_id: Option<i64>, // immutable identity filter
@@
- pub author: Option<&'a str>, // case-insensitive match via COLLATE NOCASE
+ pub author: Option<&'a str>, // display-name filter
+ // If both author and author_id are provided, apply both (AND) for precision.
}
@@
Filter mappings:
+ - `author_id`: `n.author_id = ?` (exact immutable identity)
- `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`
@@ Phase 1 / Work Chunk 1B (CLI)
+ /// Filter by immutable author id
+ #[arg(long = "author-id", help_heading = "Filters")]
+ pub author_id: Option<i64>,
@@ Phase 2 / Work Chunk 2F
+ Add `--author-id` support to `lore search` filtering for note documents.
@@ Phase 1 / Work Chunk 1E
+ CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
+ ON notes(project_id, author_id, created_at DESC, id DESC)
+ WHERE is_system = 0 AND author_id IS NOT NULL;
```

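The AND semantics for the two author filters could look roughly like the sketch below. It is an illustration only: the struct is a trimmed-down stand-in for `NoteListFilters`, and `Param` and `build_author_clauses` are hypothetical names that do not appear in the plan.

```rust
/// Trimmed-down stand-in for the plan's `NoteListFilters` (only the two author fields).
#[derive(Default)]
pub struct NoteListFilters<'a> {
    pub author_id: Option<i64>,
    pub author: Option<&'a str>,
}

/// Hypothetical bound-parameter type; the real DB layer will have its own.
#[derive(Debug, PartialEq)]
pub enum Param {
    Int(i64),
    Text(String),
}

/// Returns the WHERE fragments (to be joined with AND) and their parameters.
/// When both filters are set, both fragments are emitted, so both must match.
pub fn build_author_clauses(filters: &NoteListFilters<'_>) -> (Vec<&'static str>, Vec<Param>) {
    let mut clauses = Vec::new();
    let mut params = Vec::new();

    if let Some(id) = filters.author_id {
        clauses.push("n.author_id = ?");
        params.push(Param::Int(id));
    }
    if let Some(name) = filters.author {
        // Strip a single leading `@`, as the filter mapping describes.
        let name = name.strip_prefix('@').unwrap_or(name);
        clauses.push("n.author_username = ? COLLATE NOCASE");
        params.push(Param::Text(name.to_string()));
    }
    (clauses, params)
}
```

Passing both `author_id: Some(42)` and `author: Some("@Alice")` yields two fragments and two parameters, in that order.
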
2. **Fix document staleness on username changes**
Why: Current plan says username changes are “not semantic,” but note documents include username in content/title, so docs go stale/inconsistent.

```diff
@@ Work Chunk 0D: Immutable Author Identity Capture
- Assert: changed_semantics = false (username change is not a semantic change for documents)
+ Assert: changed_semantics = true (username affects note document content/title)
@@ Work Chunk 0A: semantic-change detection
- old_body != body || old_note_type != note_type || ...
+ old_body != body || old_note_type != note_type || ...
+ || old_author_username != author_username
@@ Work Chunk 2C: Note Document Extractor header
author: @{author}
+ author_id: {author_id}
```

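As a minimal sketch of the revised check, assuming a hypothetical `NoteSnapshot` struct that carries only the three fields named in the diff (the plan's elided `...` fields are deliberately left out):

```rust
/// Hypothetical snapshot of the fields that feed a note document.
/// Only the fields named in the diff are modeled here.
#[derive(Clone, Debug)]
pub struct NoteSnapshot {
    pub body: String,
    pub note_type: Option<String>,
    pub author_username: String,
}

/// True when the stored note and the incoming note differ in a way that
/// should trigger document regeneration. A username change now counts.
pub fn changed_semantics(old: &NoteSnapshot, new: &NoteSnapshot) -> bool {
    old.body != new.body
        || old.note_type != new.note_type
        || old.author_username != new.author_username
}
```

With this shape, a note whose only change is a renamed author is marked dirty and its document is rebuilt with the new username.
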
3. **Replace `last_seen_at` sweep marker with monotonic `sync_run_id`**
Why: Timestamp markers are vulnerable to clock skew and concurrent runs; run IDs are deterministic and safer.

```diff
@@ Phase 0: Stable Note Identity
+ ### Work Chunk 0E: Monotonic Run Marker
+ Add `sync_runs` table and `notes.last_seen_run_id`.
+ Ingest assigns one run_id per sync transaction.
+ Upsert sets `last_seen_run_id = current_run_id`.
+ Sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).
@@ Work Chunk 0C
- fetch_complete + last_seen_at-based sweep
+ fetch_complete + run_id-based sweep
```

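The run-marker rule can be illustrated with an in-memory model. This is a sketch under simplifying assumptions: `NoteStore` is a hypothetical stand-in for the `sync_runs` table plus `notes.last_seen_run_id`, and the per-discussion scoping of the real sweep is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for `sync_runs` + `notes.last_seen_run_id`.
#[derive(Default)]
pub struct NoteStore {
    last_run_id: u64,
    last_seen: HashMap<i64, u64>, // note id -> last_seen_run_id
}

impl NoteStore {
    /// Allocates the next run id; ids only ever increase.
    pub fn begin_run(&mut self) -> u64 {
        self.last_run_id += 1;
        self.last_run_id
    }

    /// Upsert marks the note as seen in the given run.
    pub fn upsert(&mut self, note_id: i64, run_id: u64) {
        self.last_seen.insert(note_id, run_id);
    }

    /// Removes notes with `last_seen_run_id < current_run_id`, but only when
    /// the fetch was complete. Returns the number of notes removed.
    pub fn sweep(&mut self, current_run_id: u64, fetch_complete: bool) -> usize {
        if !fetch_complete {
            return 0;
        }
        let before = self.last_seen.len();
        self.last_seen.retain(|_, seen| *seen >= current_run_id);
        before - self.last_seen.len()
    }

    /// Sorted ids of the notes currently stored.
    pub fn note_ids(&self) -> Vec<i64> {
        let mut ids: Vec<i64> = self.last_seen.keys().copied().collect();
        ids.sort();
        ids
    }
}
```

Because the comparison uses integers allocated by the store itself, the result does not depend on wall-clock time.
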
4. **Materialize stale-note set once during sweep**
Why: Current set-based SQL still re-runs the stale subquery 3 times; materializing once improves performance and guarantees identical deletion set.

```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
- DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM notes WHERE ...;
+ CREATE TEMP TABLE _stale_note_ids AS
+ SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
+ DELETE FROM documents
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM dirty_sources
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
+ DROP TABLE _stale_note_ids;
```

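The same idea, expressed over in-memory collections rather than SQL, might look like the following sketch. `Db`, `Note`, and `sweep_discussion` are hypothetical names; the point is only that the stale set is computed once and then drives all three deletions.

```rust
use std::collections::{HashMap, HashSet};

pub struct Note {
    pub discussion_id: i64,
    pub last_seen_run_id: u64,
    pub is_system: bool,
}

/// Hypothetical in-memory stand-in for the three tables touched by the sweep.
pub struct Db {
    pub notes: HashMap<i64, Note>,
    pub documents: HashSet<i64>,     // source ids of note documents
    pub dirty_sources: HashSet<i64>, // source ids of queued note regenerations
}

/// Sweeps one discussion and returns how many notes were removed.
pub fn sweep_discussion(db: &mut Db, discussion_id: i64, current_run_id: u64) -> usize {
    // Materialize (id, is_system) once, mirroring the `_stale_note_ids` temp table.
    let stale: Vec<(i64, bool)> = db
        .notes
        .iter()
        .filter(|(_, n)| n.discussion_id == discussion_id && n.last_seen_run_id < current_run_id)
        .map(|(id, n)| (*id, n.is_system))
        .collect();

    for (id, is_system) in &stale {
        // Only non-system notes have documents and dirty-queue entries.
        if !*is_system {
            db.documents.remove(id);
            db.dirty_sources.remove(id);
        }
        db.notes.remove(id);
    }
    stale.len()
}
```
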
5. **Move historical note backfill out of migration into resumable runtime job**
Why: Data-heavy migration can block startup and is harder to resume/recover on large DBs.

```diff
@@ Work Chunk 2H
- Backfill Existing Notes After Upgrade (Migration 024)
+ Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
@@
- Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
+ Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
@@
- INSERT INTO dirty_sources ... SELECT ... FROM notes ...
+ Introduce batched backfill API:
+ `enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
+ invoked from `generate-docs`/`sync` until complete, resumable across runs.
```

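One possible shape for the batched API is sketched below. The function name and return type come from the diff above; everything else is an assumption for illustration: the explicit `state` parameter, the `BackfillState` struct, and the `enqueued`/`remaining` fields of `BackfillProgress`.

```rust
use std::collections::BTreeSet;

/// Assumed fields; the plan names the type but not its contents.
#[derive(Debug, PartialEq)]
pub struct BackfillProgress {
    pub enqueued: usize,
    pub remaining: usize,
}

/// Hypothetical in-memory stand-in for the notes, documents, and dirty_sources tables.
pub struct BackfillState {
    pub note_ids: BTreeSet<i64>,   // non-system notes
    pub documented: BTreeSet<i64>, // notes that already have a document
    pub dirty: BTreeSet<i64>,      // notes already queued for generation
}

/// Enqueues up to `batch_size` notes that have neither a document nor a queue entry.
/// Progress lives entirely in the data, so an interrupted run resumes where it stopped.
pub fn enqueue_missing_note_documents(
    state: &mut BackfillState,
    batch_size: usize,
) -> BackfillProgress {
    let missing: Vec<i64> = state
        .note_ids
        .iter()
        .copied()
        .filter(|id| !state.documented.contains(id) && !state.dirty.contains(id))
        .collect();

    let batch: Vec<i64> = missing.iter().copied().take(batch_size).collect();
    state.dirty.extend(batch.iter().copied());

    BackfillProgress {
        enqueued: batch.len(),
        remaining: missing.len() - batch.len(),
    }
}
```

A caller such as `generate-docs` would loop until `remaining` reaches zero; calling again after that is a no-op.
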
6. **Add streaming path for large `jsonl`/`csv` note exports**
Why: Current `query_notes` materializes full result set in memory; streaming improves scalability and latency.

```diff
@@ Work Chunk 1A
+ Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.
@@ Work Chunk 1C
- print_list_notes_jsonl(&result)
- print_list_notes_csv(&result)
+ print_list_notes_jsonl_stream(config, filters)
+ print_list_notes_csv_stream(config, filters)
+ (table/json keep counted buffered path)
```

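The callback-driven shape could be sketched as follows. To stay self-contained, this version takes any row iterator in place of `conn` and `filters`, so its signature differs from the one proposed above; `NoteRow`, `write_notes_csv_stream`, and `csv_field` are hypothetical names.

```rust
use std::io::{self, Write};

/// Hypothetical row type with just enough columns for the example.
pub struct NoteRow {
    pub id: i64,
    pub author: String,
    pub body: String,
}

/// Forward-only iteration: each row is handed to `row_handler` and then dropped,
/// so memory use does not grow with the size of the result set.
pub fn query_notes_stream<I, F>(rows: I, mut row_handler: F) -> io::Result<usize>
where
    I: IntoIterator<Item = NoteRow>,
    F: FnMut(&NoteRow) -> io::Result<()>,
{
    let mut count = 0;
    for row in rows {
        row_handler(&row)?;
        count += 1;
    }
    Ok(count)
}

/// Quotes a CSV field when it contains a comma, quote, or line break.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Writes a header and then one CSV line per row as rows arrive.
pub fn write_notes_csv_stream<I, W>(rows: I, out: &mut W) -> io::Result<usize>
where
    I: IntoIterator<Item = NoteRow>,
    W: Write,
{
    writeln!(out, "id,author,body")?;
    query_notes_stream(rows, |row| {
        writeln!(out, "{},{},{}", row.id, csv_field(&row.author), csv_field(&row.body))
    })
}
```

The returned count covers data rows only; the header line is not counted.
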
7. **Add index for path-centric note queries**
Why: `--path` + project/date queries are a stated hot path and not fully covered by current proposed indexes.

```diff
@@ Work Chunk 1E: Composite Query Index
+ CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
+ ON notes(project_id, position_new_path, created_at DESC, id DESC)
+ WHERE is_system = 0 AND position_new_path IS NOT NULL;
```

8. **Add property/invariant tests (not only examples)**
Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration; randomized invariants will catch subtle regressions.

```diff
@@ Verification Checklist
+ Add property tests (proptest):
+ - stable local IDs across randomized re-sync orderings
+ - no orphan `documents(source_type='note')` after randomized deletions/sweeps
+ - partial-fetch runs never reduce note count
+ - repeated full rebuild converges (fixed-point idempotence)
```

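To illustrate one of these invariants ("partial-fetch runs never reduce note count"), here is a standard-library-only sketch. It is not proptest: a small xorshift generator stands in for proptest's input generation so the example has no dependencies, and `Model` is a hypothetical in-memory model of the run-id sweep rather than the real ingestion code.

```rust
use std::collections::HashMap;

/// Minimal xorshift64 generator; a stand-in for proptest's generators.
pub struct XorShift(u64);

impl XorShift {
    pub fn new(seed: u64) -> Self {
        XorShift(seed.max(1)) // the state must be non-zero
    }
    pub fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical model: note id -> last_seen_run_id, swept only on complete fetches.
#[derive(Default)]
pub struct Model {
    pub run_id: u64,
    pub notes: HashMap<u64, u64>,
}

impl Model {
    pub fn sync(&mut self, seen: &[u64], fetch_complete: bool) {
        self.run_id += 1;
        let run = self.run_id;
        for id in seen {
            self.notes.insert(*id, run);
        }
        if fetch_complete {
            self.notes.retain(|_, last| *last >= run);
        }
    }
}

/// Runs `runs` randomized syncs and reports whether the invariant held throughout.
pub fn partial_fetch_never_shrinks(seed: u64, runs: usize) -> bool {
    let mut rng = XorShift::new(seed);
    let mut model = Model::default();
    for _ in 0..runs {
        // Each of 8 candidate note ids is seen with probability about 1/2.
        let seen: Vec<u64> = (0..8).filter(|_| rng.next() % 2 == 0).collect();
        let fetch_complete = rng.next() % 3 == 0;
        let before = model.notes.len();
        model.sync(&seen, fetch_complete);
        if !fetch_complete && model.notes.len() < before {
            return false;
        }
    }
    true
}
```

A proptest version would express the same check as a property over generated sync sequences and would additionally shrink failing cases.
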
These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scale behavior, and long-term maintainability.