who.rs: Add INDEXED BY idx_notes_diffnote_path_created to all DiffNote query paths (expert, expert_details, reviews, path probes, suffix_probe). SQLite planner was choosing idx_notes_system (106K rows, 38%) over the partial index (26K rows, 9.3%) when LIKE predicates are present. Measured: expert 1561ms->59ms (26x), reviews ~1200ms->16ms (75x). stats.rs: Replace 12+ sequential COUNT(*) queries with conditional aggregates (SUM(CASE WHEN...)) and use FTS5 shadow table (documents_fts_docsize) instead of virtual table for counting. Measured: warm 109ms->65ms (1.68x).
174 lines
7.2 KiB
Markdown
174 lines
7.2 KiB
Markdown
Highest-impact gaps I see in the current plan:
|
|
|
|
1. `for-issue` / `for-mr` filtering is ambiguous across projects and can return incorrect rows.
|
|
2. `lore notes` has no pagination contract, so large exports and deterministic resumption are weak.
|
|
3. Migration `022` is high-risk (table rebuild + FTS + junction tables) without explicit integrity gates.
|
|
4. Note-doc freshness is incomplete for upstream note deletions and parent metadata changes (labels/title).
|
|
|
|
Below are my best revisions, each with rationale and a git-diff-style plan edit.
|
|
|
|
---
|
|
|
|
1. **Add gated rollout + rollback controls**
|
|
Rationale: You can still “ship together” while reducing blast radius. This makes recovery fast if note-doc generation causes DB/embedding pressure.
|
|
|
|
```diff
|
|
@@ ## Design
|
|
-Two phases, shipped together as one feature:
|
|
+Two phases, shipped together as one feature, but with runtime gates:
|
|
+
|
|
+- `feature.notes_cli` (Phase 1 surface)
|
|
+- `feature.note_documents` (Phase 2 indexing/extraction path)
|
|
+
|
|
+Rollout order:
|
|
+1) Enable `notes_cli`
|
|
+2) Run note-doc backfill in bounded batches
|
|
+3) Enable `note_documents` for continuous updates
|
|
+
|
|
+Rollback:
|
|
+- Disabling `feature.note_documents` stops new note-doc generation without affecting issue/MR/discussion docs.
|
|
```
|
|
|
|
2. **Add keyset pagination + deterministic ordering**
|
|
Rationale: Needed for year-long reviewer analysis and reliable “continue where I left off” behavior under concurrent updates.
|
|
|
|
```diff
|
|
@@ pub struct NoteListFilters<'a> {
|
|
pub limit: usize,
|
|
+ pub cursor: Option<&'a str>, // keyset token "<sort_ms>:<id>"
|
|
+ pub include_total_count: bool, // avoid COUNT(*) in hot paths
|
|
@@
|
|
- pub sort: &'a str, // "created" (default) | "updated"
|
|
+ pub sort: &'a str, // "created" | "updated"
|
|
@@ query_notes SQL
|
|
-ORDER BY {sort_column} {order}
|
|
+ORDER BY {sort_column} {order}, n.id {order}
|
|
LIMIT ?
|
|
```
|
|
|
|
3. **Make `for-issue` / `for-mr` project-scoped**
|
|
Rationale: IIDs are not globally unique. Requiring project avoids false positives and hard-to-debug cross-project leakage.
|
|
|
|
```diff
|
|
@@ pub struct NotesArgs {
|
|
- #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr")]
|
|
+ #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr", requires = "project")]
|
|
pub for_issue: Option<i64>,
|
|
@@
|
|
- #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue")]
|
|
+ #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue", requires = "project")]
|
|
pub for_mr: Option<i64>,
|
|
```
|
|
|
|
4. **Upgrade path filtering semantics**
|
|
Rationale: Review comments often reference renames/moves. Restricting to `position_new_path` misses relevant notes.
|
|
|
|
```diff
|
|
@@ pub struct NotesArgs {
|
|
- /// Filter by file path (trailing / for prefix match)
|
|
+ /// Filter by file path
|
|
#[arg(long, help_heading = "Filters")]
|
|
pub path: Option<String>,
|
|
+ /// Path mode: exact|prefix|glob
|
|
+ #[arg(long = "path-mode", value_parser = ["exact","prefix","glob"], default_value = "exact", help_heading = "Filters")]
|
|
+ pub path_mode: String,
|
|
+ /// Match against old path as well as new path
|
|
+ #[arg(long = "match-old-path", help_heading = "Filters")]
|
|
+ pub match_old_path: bool,
|
|
@@ query_notes filter mappings
|
|
-- `path` ... n.position_new_path ...
|
|
+- `path` applies to `n.position_new_path` and optionally `n.position_old_path`.
|
|
+- `glob` mode translates `*`/`?` to SQL LIKE with escaping.
|
|
```
|
|
|
|
5. **Add explicit performance indexes (new migration)**
|
|
Rationale: `notes` becomes a first-class query surface; without indexes, filters degrade quickly at 10k+ note scale.
|
|
|
|
```diff
|
|
@@ ## Phase 1: `lore notes` Command
|
|
+### Work Chunk 1E: Query Performance Indexes
|
|
+**Files:** `migrations/023_notes_query_indexes.sql`, `src/core/db.rs`
|
|
+
|
|
+Add indexes:
|
|
+- `notes(project_id, created_at DESC, id DESC)`
|
|
+- `notes(author_username, created_at DESC, id DESC) WHERE is_system = 0`
|
|
+- `notes(discussion_id)`
|
|
+- `notes(position_new_path)`
|
|
+- `notes(position_old_path)`
|
|
+- `discussions(issue_id)`
|
|
+- `discussions(merge_request_id)`
|
|
```
|
|
|
|
6. **Harden migration 022 with transactional integrity checks**
|
|
Rationale: This is the riskiest part of the plan. Add hard fail-fast checks so corruption cannot silently pass.
|
|
|
|
```diff
|
|
@@ ### Work Chunk 2A: Schema Migration (022)
|
|
+Migration safety requirements:
|
|
+- Execute in a single `BEGIN IMMEDIATE ... COMMIT` transaction.
|
|
+- Capture and compare pre/post row counts for `documents`, `document_labels`, `document_paths`, `dirty_sources`.
|
|
+- Run `PRAGMA foreign_key_check` and abort on any violation.
|
|
+- Run `PRAGMA integrity_check` and abort on non-`ok`.
|
|
+- Rebuild FTS and assert `documents_fts` rowcount equals `documents` rowcount.
|
|
```
|
|
|
|
7. **Add note deletion + parent-change propagation**
|
|
Rationale: Current plan handles create/update ingestion but not all staleness paths. Without this, note documents drift.
|
|
|
|
```diff
|
|
@@ ## Phase 2: Per-Note Documents
|
|
+### Work Chunk 2G: Freshness Propagation
|
|
+**Files:** `src/ingestion/discussions.rs`, `src/ingestion/mr_discussions.rs`, `src/documents/regenerator.rs`
|
|
+
|
|
+Rules:
|
|
+- If a previously stored note is missing from upstream payload, delete local note row and enqueue `(note, id)` for document deletion.
|
|
+- When parent issue/MR title or labels change, enqueue descendant note docs dirty (notes inherit parent metadata).
|
|
+- Keep idempotent behavior for repeated syncs.
|
|
```
|
|
|
|
8. **Separate FTS coverage from embedding coverage**
|
|
Rationale: Biggest cost/perf risk is embeddings. Index all notes in FTS, but embed selectively with policy knobs.
|
|
|
|
```diff
|
|
@@ ## Estimated Document Volume Impact
|
|
-FTS5 handles this comfortably. Embedding generation time scales linearly (~4x increase).
|
|
+FTS5 handles this comfortably. Embedding generation is policy-controlled:
|
|
+- FTS: index all non-system note docs
|
|
+- Embeddings default: only notes with body length >= 40 chars (configurable)
|
|
+- Add config: `documents.note_embeddings.min_chars`, `documents.note_embeddings.enabled`
|
|
+- Prioritize unresolved DiffNotes before other notes during embedding backfill
|
|
```
|
|
|
|
9. **Bring structured reviewer profiling into scope (not narrative reporting)**
|
|
Rationale: This directly serves the stated use case and makes the feature compelling immediately.
|
|
|
|
```diff
|
|
@@ ## Non-Goals
|
|
-- Adding a "reviewer profile" report command (that's a downstream use case built on this infrastructure)
|
|
+- Generating free-form narrative reviewer reports.
|
|
+ A structured profiling command is in scope.
|
|
+
|
|
+## Phase 3: Structured Reviewer Profiling
|
|
+Add `lore notes profile --author <user> --since <window>` returning:
|
|
+- top commented paths
|
|
+- top parent labels
|
|
+- unresolved-comment ratio
|
|
+- note-type distribution
|
|
+- median comment length
|
|
```
|
|
|
|
10. **Add operational SLOs + robot-mode status for note pipeline**
|
|
Rationale: Reliability improves when regressions are observable, not inferred from failures.
|
|
|
|
```diff
|
|
@@ ## Verification Checklist
|
|
+Operational checks:
|
|
+- `lore -J stats` includes per-`source_type` document counts (including `note`)
|
|
+- Add queue lag metrics: oldest dirty note age, retry backlog size
|
|
+- Add extraction error breakdown by `source_type`
|
|
+- Add smoke assertion: disabling `feature.note_documents` leaves other source regeneration unaffected
|
|
```
|
|
|
|
---
|
|
|
|
If you want, I can produce a single consolidated revised PRD draft (fully merged text, not just diffs) as the next step. |