Per-note search PRD: Comprehensive product requirements for evolving the search system from document-level to note-level granularity. Includes 6 rounds of iterative feedback refining scope, ranking strategy, migration path, and robot mode integration. User journeys: Detailed walkthrough of 8 primary user workflows covering issue triage, MR review lookup, code archaeology, expert discovery, sync pipeline operation, and agent integration patterns.
174 lines
7.2 KiB
Markdown
174 lines
7.2 KiB
Markdown
Highest-impact gaps I see in the current plan:
|
|
|
|
1. `for-issue` / `for-mr` filtering is ambiguous across projects and can return incorrect rows.
|
|
2. `lore notes` has no pagination contract, so large exports and deterministic resumption are weak.
|
|
3. Migration `022` is high-risk (table rebuild + FTS + junction tables) without explicit integrity gates.
|
|
4. Note-doc freshness is incomplete for upstream note deletions and parent metadata changes (labels/title).
|
|
|
|
Below are my best revisions, each with rationale and a git-diff-style plan edit.
|
|
|
|
---
|
|
|
|
1. **Add gated rollout + rollback controls**
|
|
Rationale: You can still “ship together” while reducing blast radius. This makes recovery fast if note-doc generation causes DB/embedding pressure.
|
|
|
|
```diff
|
|
@@ ## Design
|
|
-Two phases, shipped together as one feature:
|
|
+Two phases, shipped together as one feature, but with runtime gates:
|
|
+
|
|
+- `feature.notes_cli` (Phase 1 surface)
|
|
+- `feature.note_documents` (Phase 2 indexing/extraction path)
|
|
+
|
|
+Rollout order:
|
|
+1) Enable `notes_cli`
|
|
+2) Run note-doc backfill in bounded batches
|
|
+3) Enable `note_documents` for continuous updates
|
|
+
|
|
+Rollback:
|
|
+- Disabling `feature.note_documents` stops new note-doc generation without affecting issue/MR/discussion docs.
|
|
```
|
|
|
|
2. **Add keyset pagination + deterministic ordering**
|
|
Rationale: Needed for year-long reviewer analysis and reliable “continue where I left off” behavior under concurrent updates.
|
|
|
|
```diff
|
|
@@ pub struct NoteListFilters<'a> {
|
|
pub limit: usize,
|
|
+ pub cursor: Option<&'a str>, // keyset token "<sort_ms>:<id>"
|
|
+ pub include_total_count: bool, // avoid COUNT(*) in hot paths
|
|
@@
|
|
- pub sort: &'a str, // "created" (default) | "updated"
|
|
+ pub sort: &'a str, // "created" | "updated"
|
|
@@ query_notes SQL
|
|
-ORDER BY {sort_column} {order}
|
|
+ORDER BY {sort_column} {order}, n.id {order}
|
|
LIMIT ?
|
|
```
|
|
|
|
3. **Make `for-issue` / `for-mr` project-scoped**
|
|
Rationale: IIDs are not globally unique. Requiring project avoids false positives and hard-to-debug cross-project leakage.
|
|
|
|
```diff
|
|
@@ pub struct NotesArgs {
|
|
- #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr")]
|
|
+ #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr", requires = "project")]
|
|
pub for_issue: Option<i64>,
|
|
@@
|
|
- #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue")]
|
|
+ #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue", requires = "project")]
|
|
pub for_mr: Option<i64>,
|
|
```
|
|
|
|
4. **Upgrade path filtering semantics**
|
|
Rationale: Review comments often reference renames/moves. Restricting to `position_new_path` misses relevant notes.
|
|
|
|
```diff
|
|
@@ pub struct NotesArgs {
|
|
- /// Filter by file path (trailing / for prefix match)
|
|
+ /// Filter by file path
|
|
#[arg(long, help_heading = "Filters")]
|
|
pub path: Option<String>,
|
|
+ /// Path mode: exact|prefix|glob
|
|
+ #[arg(long = "path-mode", value_parser = ["exact","prefix","glob"], default_value = "exact", help_heading = "Filters")]
|
|
+ pub path_mode: String,
|
|
+ /// Match against old path as well as new path
|
|
+ #[arg(long = "match-old-path", help_heading = "Filters")]
|
|
+ pub match_old_path: bool,
|
|
@@ query_notes filter mappings
|
|
-- `path` ... n.position_new_path ...
|
|
+- `path` applies to `n.position_new_path` and optionally `n.position_old_path`.
|
|
+- `glob` mode translates `*`/`?` to SQL LIKE with escaping.
|
|
```
|
|
|
|
5. **Add explicit performance indexes (new migration)**
|
|
Rationale: `notes` becomes a first-class query surface; without indexes, filters degrade quickly at 10k+ note scale.
|
|
|
|
```diff
|
|
@@ ## Phase 1: `lore notes` Command
|
|
+### Work Chunk 1E: Query Performance Indexes
|
|
+**Files:** `migrations/023_notes_query_indexes.sql`, `src/core/db.rs`
|
|
+
|
|
+Add indexes:
|
|
+- `notes(project_id, created_at DESC, id DESC)`
|
|
+- `notes(author_username, created_at DESC, id DESC) WHERE is_system = 0`
|
|
+- `notes(discussion_id)`
|
|
+- `notes(position_new_path)`
|
|
+- `notes(position_old_path)`
|
|
+- `discussions(issue_id)`
|
|
+- `discussions(merge_request_id)`
|
|
```
|
|
|
|
6. **Harden migration 022 with transactional integrity checks**
|
|
Rationale: This is the riskiest part of the plan. Add hard fail-fast checks so corruption cannot silently pass.
|
|
|
|
```diff
|
|
@@ ### Work Chunk 2A: Schema Migration (022)
|
|
+Migration safety requirements:
|
|
+- Execute in a single `BEGIN IMMEDIATE ... COMMIT` transaction.
|
|
+- Capture and compare pre/post row counts for `documents`, `document_labels`, `document_paths`, `dirty_sources`.
|
|
+- Run `PRAGMA foreign_key_check` and abort on any violation.
|
|
+- Run `PRAGMA integrity_check` and abort on non-`ok`.
|
|
+- Rebuild FTS and assert `documents_fts` rowcount equals `documents` rowcount.
|
|
```
|
|
|
|
7. **Add note deletion + parent-change propagation**
|
|
Rationale: Current plan handles create/update ingestion but not all staleness paths. Without this, note documents drift.
|
|
|
|
```diff
|
|
@@ ## Phase 2: Per-Note Documents
|
|
+### Work Chunk 2G: Freshness Propagation
|
|
+**Files:** `src/ingestion/discussions.rs`, `src/ingestion/mr_discussions.rs`, `src/documents/regenerator.rs`
|
|
+
|
|
+Rules:
|
|
+- If a previously stored note is missing from upstream payload, delete local note row and enqueue `(note, id)` for document deletion.
|
|
+- When parent issue/MR title or labels change, enqueue descendant note docs dirty (notes inherit parent metadata).
|
|
+- Keep idempotent behavior for repeated syncs.
|
|
```
|
|
|
|
8. **Separate FTS coverage from embedding coverage**
|
|
Rationale: Biggest cost/perf risk is embeddings. Index all notes in FTS, but embed selectively with policy knobs.
|
|
|
|
```diff
|
|
@@ ## Estimated Document Volume Impact
|
|
-FTS5 handles this comfortably. Embedding generation time scales linearly (~4x increase).
|
|
+FTS5 handles this comfortably. Embedding generation is policy-controlled:
|
|
+- FTS: index all non-system note docs
|
|
+- Embeddings default: only notes with body length >= 40 chars (configurable)
|
|
+- Add config: `documents.note_embeddings.min_chars`, `documents.note_embeddings.enabled`
|
|
+- Prioritize unresolved DiffNotes before other notes during embedding backfill
|
|
```
|
|
|
|
9. **Bring structured reviewer profiling into scope (not narrative reporting)**
|
|
Rationale: This directly serves the stated use case and makes the feature compelling immediately.
|
|
|
|
```diff
|
|
@@ ## Non-Goals
|
|
-- Adding a "reviewer profile" report command (that's a downstream use case built on this infrastructure)
|
|
+- Generating free-form narrative reviewer reports.
|
|
+ A structured profiling command is in scope.
|
|
+
|
|
+## Phase 3: Structured Reviewer Profiling
|
|
+Add `lore notes profile --author <user> --since <window>` returning:
|
|
+- top commented paths
|
|
+- top parent labels
|
|
+- unresolved-comment ratio
|
|
+- note-type distribution
|
|
+- median comment length
|
|
```
|
|
|
|
10. **Add operational SLOs + robot-mode status for note pipeline**
|
|
Rationale: Reliability improves when regressions are observable, not inferred from failures.
|
|
|
|
```diff
|
|
@@ ## Verification Checklist
|
|
+Operational checks:
|
|
+- `lore -J stats` includes per-`source_type` document counts (including `note`)
|
|
+- Add queue lag metrics: oldest dirty note age, retry backlog size
|
|
+- Add extraction error breakdown by `source_type`
|
|
+- Add smoke assertion: disabling `feature.note_documents` leaves other source regeneration unaffected
|
|
```
|
|
|
|
---
|
|
|
|
If you want, I can produce a single consolidated revised PRD draft (fully merged text, not just diffs) as the next step. |