Highest-impact gaps I see in the current plan: 1. `for-issue` / `for-mr` filtering is ambiguous across projects and can return incorrect rows. 2. `lore notes` has no pagination contract, so large exports and deterministic resumption are weak. 3. Migration `022` is high-risk (table rebuild + FTS + junction tables) without explicit integrity gates. 4. Note-doc freshness is incomplete for upstream note deletions and parent metadata changes (labels/title). Below are my best revisions, each with rationale and a git-diff-style plan edit. --- 1. **Add gated rollout + rollback controls** Rationale: You can still “ship together” while reducing blast radius. This makes recovery fast if note-doc generation causes DB/embedding pressure. ```diff @@ ## Design -Two phases, shipped together as one feature: +Two phases, shipped together as one feature, but with runtime gates: + +- `feature.notes_cli` (Phase 1 surface) +- `feature.note_documents` (Phase 2 indexing/extraction path) + +Rollout order: +1) Enable `notes_cli` +2) Run note-doc backfill in bounded batches +3) Enable `note_documents` for continuous updates + +Rollback: +- Disabling `feature.note_documents` stops new note-doc generation without affecting issue/MR/discussion docs. ``` 2. **Add keyset pagination + deterministic ordering** Rationale: Needed for year-long reviewer analysis and reliable “continue where I left off” behavior under concurrent updates. ```diff @@ pub struct NoteListFilters<'a> { pub limit: usize, + pub cursor: Option<&'a str>, // keyset token ":" + pub include_total_count: bool, // avoid COUNT(*) in hot paths @@ - pub sort: &'a str, // "created" (default) | "updated" + pub sort: &'a str, // "created" | "updated" @@ query_notes SQL -ORDER BY {sort_column} {order} +ORDER BY {sort_column} {order}, n.id {order} LIMIT ? ``` 3. **Make `for-issue` / `for-mr` project-scoped** Rationale: IIDs are not globally unique. Requiring project avoids false positives and hard-to-debug cross-project leakage. ```diff @@ pub struct NotesArgs { - #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr")] + #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr", requires = "project")] pub for_issue: Option, @@ - #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue")] + #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue", requires = "project")] pub for_mr: Option, ``` 4. **Upgrade path filtering semantics** Rationale: Review comments often reference renames/moves. Restricting to `position_new_path` misses relevant notes. ```diff @@ pub struct NotesArgs { - /// Filter by file path (trailing / for prefix match) + /// Filter by file path #[arg(long, help_heading = "Filters")] pub path: Option, + /// Path mode: exact|prefix|glob + #[arg(long = "path-mode", value_parser = ["exact","prefix","glob"], default_value = "exact", help_heading = "Filters")] + pub path_mode: String, + /// Match against old path as well as new path + #[arg(long = "match-old-path", help_heading = "Filters")] + pub match_old_path: bool, @@ query_notes filter mappings -- `path` ... n.position_new_path ... +- `path` applies to `n.position_new_path` and optionally `n.position_old_path`. +- `glob` mode translates `*`/`?` to SQL LIKE with escaping. ``` 5. **Add explicit performance indexes (new migration)** Rationale: `notes` becomes a first-class query surface; without indexes, filters degrade quickly at 10k+ note scale. ```diff @@ ## Phase 1: `lore notes` Command +### Work Chunk 1E: Query Performance Indexes +**Files:** `migrations/023_notes_query_indexes.sql`, `src/core/db.rs` + +Add indexes: +- `notes(project_id, created_at DESC, id DESC)` +- `notes(author_username, created_at DESC, id DESC) WHERE is_system = 0` +- `notes(discussion_id)` +- `notes(position_new_path)` +- `notes(position_old_path)` +- `discussions(issue_id)` +- `discussions(merge_request_id)` ``` 6. **Harden migration 022 with transactional integrity checks** Rationale: This is the riskiest part of the plan. Add hard fail-fast checks so corruption cannot silently pass. ```diff @@ ### Work Chunk 2A: Schema Migration (022) +Migration safety requirements: +- Execute in a single `BEGIN IMMEDIATE ... COMMIT` transaction. +- Capture and compare pre/post row counts for `documents`, `document_labels`, `document_paths`, `dirty_sources`. +- Run `PRAGMA foreign_key_check` and abort on any violation. +- Run `PRAGMA integrity_check` and abort on non-`ok`. +- Rebuild FTS and assert `documents_fts` rowcount equals `documents` rowcount. ``` 7. **Add note deletion + parent-change propagation** Rationale: Current plan handles create/update ingestion but not all staleness paths. Without this, note documents drift. ```diff @@ ## Phase 2: Per-Note Documents +### Work Chunk 2G: Freshness Propagation +**Files:** `src/ingestion/discussions.rs`, `src/ingestion/mr_discussions.rs`, `src/documents/regenerator.rs` + +Rules: +- If a previously stored note is missing from upstream payload, delete local note row and enqueue `(note, id)` for document deletion. +- When parent issue/MR title or labels change, enqueue descendant note docs dirty (notes inherit parent metadata). +- Keep idempotent behavior for repeated syncs. ``` 8. **Separate FTS coverage from embedding coverage** Rationale: Biggest cost/perf risk is embeddings. Index all notes in FTS, but embed selectively with policy knobs. ```diff @@ ## Estimated Document Volume Impact -FTS5 handles this comfortably. Embedding generation time scales linearly (~4x increase). +FTS5 handles this comfortably. Embedding generation is policy-controlled: +- FTS: index all non-system note docs +- Embeddings default: only notes with body length >= 40 chars (configurable) +- Add config: `documents.note_embeddings.min_chars`, `documents.note_embeddings.enabled` +- Prioritize unresolved DiffNotes before other notes during embedding backfill ``` 9. **Bring structured reviewer profiling into scope (not narrative reporting)** Rationale: This directly serves the stated use case and makes the feature compelling immediately. ```diff @@ ## Non-Goals -- Adding a "reviewer profile" report command (that's a downstream use case built on this infrastructure) +- Generating free-form narrative reviewer reports. + A structured profiling command is in scope. + +## Phase 3: Structured Reviewer Profiling +Add `lore notes profile --author --since ` returning: +- top commented paths +- top parent labels +- unresolved-comment ratio +- note-type distribution +- median comment length ``` 10. **Add operational SLOs + robot-mode status for note pipeline** Rationale: Reliability improves when regressions are observable, not inferred from failures. ```diff @@ ## Verification Checklist +Operational checks: +- `lore -J stats` includes per-`source_type` document counts (including `note`) +- Add queue lag metrics: oldest dirty note age, retry backlog size +- Add extraction error breakdown by `source_type` +- Add smoke assertion: disabling `feature.note_documents` leaves other source regeneration unaffected ``` --- If you want, I can produce a single consolidated revised PRD draft (fully merged text, not just diffs) as the next step.