- `who.rs`: Add `INDEXED BY idx_notes_diffnote_path_created` to all DiffNote query paths (expert, expert_details, reviews, path probes, suffix_probe). The SQLite planner was choosing `idx_notes_system` (106K rows, 38%) over the partial index (26K rows, 9.3%) when `LIKE` predicates are present. Measured: expert 1561ms -> 59ms (26x), reviews ~1200ms -> 16ms (75x).
- `stats.rs`: Replace 12+ sequential `COUNT(*)` queries with conditional aggregates (`SUM(CASE WHEN ...)`) and use the FTS5 shadow table (`documents_fts_docsize`) instead of the virtual table for counting. Measured: warm 109ms -> 65ms (1.68x).
1. **Canonical note identity for documents: use `notes.gitlab_id` as `source_id`**

Why this is better: the current plan still couples document identity to local row IDs. Even with upsert+sweep, local IDs are a storage artifact and can be reused in edge cases. Using GitLab note IDs as canonical document IDs makes regeneration, backfill, and deletion propagation more stable and portable.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 0: Stable Note Identity
-Phase 2 depends on `notes.id` as the `source_id` for note documents.
+Phase 2 uses `notes.gitlab_id` as the `source_id` for note documents.
+`notes.id` remains an internal relational key only.

@@ Work Chunk 0A
 pub struct NoteUpsertOutcome {
     pub local_note_id: i64,
+    pub document_source_id: i64, // notes.gitlab_id
     pub changed_semantics: bool,
 }

@@ Work Chunk 2D
 if !note.is_system && outcome.changed_semantics {
-    dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
+    dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.document_source_id)?;
 }

@@ Work Chunk 2E
-SELECT 'note', n.id, ?1
+SELECT 'note', n.gitlab_id, ?1

@@ Work Chunk 2H
-ON d.source_type = 'note' AND d.source_id = n.id
+ON d.source_type = 'note' AND d.source_id = n.gitlab_id
```

2. **Prevent false deletions on partial/incomplete syncs**

Why this is better: sweep-based deletion is correct only when a discussion’s notes were fully fetched. If a page fails mid-fetch, current logic can incorrectly delete valid notes. Add an explicit “fetch complete” guard before the sweep.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 0
+### Work Chunk 0C: Sweep Safety Guard (Partial Fetch Protection)
+
+Only run the stale-note sweep when note pagination completed successfully for that discussion.
+If the fetch is partial/interrupted, skip the sweep and keep prior notes intact.
+
+#### Tests to Write First
+#[test]
+fn test_partial_fetch_does_not_sweep_notes() { /* ... */ }
+
+#[test]
+fn test_complete_fetch_runs_sweep_notes() { /* ... */ }
+
+#### Implementation
+if discussion_fetch_complete {
+    sweep_stale_issue_notes(...)?;
+} else {
+    tracing::warn!("Skipping stale sweep for discussion {} due to partial fetch", discussion_gitlab_id);
+}
```

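The guard can be sketched in isolation. Below is a minimal, self-contained model of the completeness check, assuming a paginated fetcher whose pages can fail partway; `decide_sweep`, `SweepDecision`, and the page shape are illustrative names, not part of the actual codebase:

```rust
#[derive(Debug, PartialEq)]
enum SweepDecision {
    Sweep,
    SkipPartialFetch,
}

/// A page fetch either yields note IDs or fails partway through.
type PageResult = Result<Vec<i64>, String>;

/// Walk the paginated results; the sweep is allowed only if every page
/// was fetched successfully. On the first error we stop and report the
/// fetch as partial, so previously stored notes are kept intact.
fn decide_sweep(pages: &[PageResult]) -> (Vec<i64>, SweepDecision) {
    let mut seen = Vec::new();
    for page in pages {
        match page {
            Ok(ids) => seen.extend_from_slice(ids),
            Err(_) => return (seen, SweepDecision::SkipPartialFetch),
        }
    }
    (seen, SweepDecision::Sweep)
}

fn main() {
    let complete: Vec<PageResult> = vec![Ok(vec![1, 2]), Ok(vec![3])];
    let partial: Vec<PageResult> = vec![Ok(vec![1, 2]), Err("timeout".into())];

    assert_eq!(decide_sweep(&complete).1, SweepDecision::Sweep);
    assert_eq!(decide_sweep(&partial).1, SweepDecision::SkipPartialFetch);
    println!("ok");
}
```

The key property: any failed page downgrades the whole discussion to "partial", so the sweep never runs against an incomplete picture of the thread.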
3. **Make deletion propagation set-based (not per-note loop)**

Why this is better: the current per-note DELETE loop is O(N) statements and gets slow on large threads. A temp-table/CTE set-based delete is faster, simpler to reason about, and remains atomic.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 0B Implementation
-for note_id in stale_note_ids {
-    conn.execute("DELETE FROM documents WHERE source_type = 'note' AND source_id = ?", [note_id])?;
-    conn.execute("DELETE FROM dirty_sources WHERE source_type = 'note' AND source_id = ?", [note_id])?;
-}
+CREATE TEMP TABLE _stale_note_source_ids(source_id INTEGER PRIMARY KEY) WITHOUT ROWID;
+INSERT INTO _stale_note_source_ids
+SELECT gitlab_id
+FROM notes
+WHERE discussion_id = ? AND last_seen_at < ? AND is_system = 0;
+
+DELETE FROM notes
+WHERE discussion_id = ? AND last_seen_at < ?;
+
+DELETE FROM documents
+WHERE source_type = 'note'
+  AND source_id IN (SELECT source_id FROM _stale_note_source_ids);
+
+DELETE FROM dirty_sources
+WHERE source_type = 'note'
+  AND source_id IN (SELECT source_id FROM _stale_note_source_ids);
+
+DROP TABLE _stale_note_source_ids;
```

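To make the set semantics concrete, here is an in-memory analogue of that sweep: compute the stale id set once, then delete from each "table" in a single pass instead of issuing one DELETE per note. The `Note`/`Document` structs and field names are illustrative stand-ins for the schema, not actual code:

```rust
use std::collections::HashSet;

struct Note { gitlab_id: i64, last_seen_at: i64, is_system: bool }
struct Document { source_id: i64 }

fn sweep(notes: &mut Vec<Note>, documents: &mut Vec<Document>, cutoff: i64) {
    // Mirror of the temp-table INSERT ... SELECT: stale, non-system note ids.
    let stale: HashSet<i64> = notes
        .iter()
        .filter(|n| n.last_seen_at < cutoff && !n.is_system)
        .map(|n| n.gitlab_id)
        .collect();
    // Mirror of DELETE FROM notes WHERE last_seen_at < cutoff.
    notes.retain(|n| n.last_seen_at >= cutoff);
    // Mirror of the set-based DELETE FROM documents / dirty_sources.
    documents.retain(|d| !stale.contains(&d.source_id));
}

fn main() {
    let mut notes = vec![
        Note { gitlab_id: 10, last_seen_at: 5, is_system: false },
        Note { gitlab_id: 11, last_seen_at: 9, is_system: false },
        Note { gitlab_id: 12, last_seen_at: 5, is_system: true },
    ];
    let mut documents = vec![Document { source_id: 10 }, Document { source_id: 11 }];
    sweep(&mut notes, &mut documents, 8);
    assert_eq!(notes.len(), 1);          // only the recently-seen note survives
    assert_eq!(documents.len(), 1);      // and only its document survives
    assert_eq!(documents[0].source_id, 11);
    println!("ok");
}
```

Note the ordering: the stale set must be captured before the notes are deleted, exactly as the temp table is populated before the `DELETE FROM notes` in the SQL version.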
4. **Fix project-scoping and time-window semantics in `lore notes`**

Why this is better: the plan currently contradicts itself: clap’s `requires = "project"` blocks use of `defaultProject`, while the query layer says the default fallback is allowed. Also, `since`/`until` parsing should use one shared “now” to avoid subtle drift and inverted windows.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 1B NotesArgs
-#[arg(long = "for-issue", ..., requires = "project")]
+#[arg(long = "for-issue", ...)]
 pub for_issue: Option<i64>,

-#[arg(long = "for-mr", ..., requires = "project")]
+#[arg(long = "for-mr", ...)]
 pub for_mr: Option<i64>,

@@ Work Chunk 1A Query Notes
-- `since`: `parse_since(since_str)` then `n.created_at >= ?`
-- `until`: `parse_since(until_str)` then `n.created_at <= ?`
+- Parse `since` and `until` with a single anchored `now_ms` captured once per command.
+- If the user supplies `YYYY-MM-DD` for `--until`, interpret it as end-of-day (23:59:59.999 UTC).
+- Validate `since <= until` after both parse with the same anchor.
```

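The three bullets above can be sketched with std only. This is a simplified model, not the real parser: it assumes epoch-millisecond bounds and a hypothetical `<N>d` relative syntax in place of the actual `parse_since` grammar, but it shows the shared anchor, the end-of-day widening, and the post-parse validation:

```rust
const MS_PER_DAY: i64 = 86_400_000;

/// Parse a bound that is either a relative "<N>d" offset from the shared
/// anchor, or an absolute epoch-ms value. `end_of_day` widens a date-style
/// upper bound to the last millisecond (23:59:59.999 UTC) of its day.
fn parse_bound(spec: &str, now_ms: i64, end_of_day: bool) -> Option<i64> {
    let base = if let Some(days) = spec.strip_suffix('d') {
        now_ms - days.parse::<i64>().ok()? * MS_PER_DAY
    } else {
        spec.parse::<i64>().ok()?
    };
    // Snap a date-only upper bound to the end of its UTC day.
    Some(if end_of_day {
        (base / MS_PER_DAY) * MS_PER_DAY + MS_PER_DAY - 1
    } else {
        base
    })
}

/// Both bounds share one `now_ms` anchor, then the window is validated.
fn resolve_window(since: &str, until: &str, now_ms: i64) -> Result<(i64, i64), String> {
    let lo = parse_bound(since, now_ms, false).ok_or("bad --since")?;
    let hi = parse_bound(until, now_ms, true).ok_or("bad --until")?;
    if lo <= hi { Ok((lo, hi)) } else { Err("since is after until".into()) }
}

fn main() {
    let now_ms = 1_700_000_000_000; // captured once per command
    let (lo, hi) = resolve_window("7d", "0d", now_ms).unwrap();
    assert!(lo < hi);
    // Inverted window is rejected instead of silently returning zero rows.
    assert!(resolve_window("0d", "7d", now_ms).is_err());
    println!("ok");
}
```

Capturing `now_ms` once means `--since 7d --until 0d` cannot drift into an empty or inverted window when the two bounds happen to parse on opposite sides of a millisecond boundary.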
5. **Add an analytics mode (not a profile command): `lore notes --aggregate`**

Why this is better: this directly supports the stated use case (review patterns) without introducing the rejected “profile report” command. It keeps scope narrow and reuses existing filters.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 1
+### Work Chunk 1F: Aggregation Mode for Notes Listing
+
+Add optional aggregation on top of `lore notes`:
+- `--aggregate author|note_type|path|resolution`
+- `--top N` (default 20)
+
+Behavior:
+- Reuses all existing filters (`--since`, `--project`, `--for-mr`, etc.)
+- Returns grouped counts (+ percentage of filtered corpus)
+- Works in table/json/jsonl/csv
+
+Non-goal alignment:
+- This is not a narrative “reviewer profile” command.
+- It is a query primitive for downstream analysis.
```

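The grouped-counts-plus-percentage shape can be sketched as a pure function over already-filtered notes. The `Note` struct, field name, and `aggregate_by_author` are illustrative; in practice this would be a `GROUP BY` pushed into the query layer:

```rust
use std::collections::HashMap;

struct Note { author: String }

/// Group filtered notes by author, returning (author, count, percentage
/// of the filtered corpus), sorted by count descending and cut to --top N.
fn aggregate_by_author(notes: &[Note], top: usize) -> Vec<(String, usize, f64)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for n in notes {
        *counts.entry(n.author.as_str()).or_insert(0) += 1;
    }
    let total = notes.len().max(1) as f64;
    let mut rows: Vec<(String, usize, f64)> = counts
        .into_iter()
        .map(|(k, c)| (k.to_string(), c, 100.0 * c as f64 / total))
        .collect();
    // Count descending, then name ascending for stable table output.
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    rows.truncate(top);
    rows
}

fn main() {
    let notes: Vec<Note> = ["alice", "alice", "bob", "alice", "carol"]
        .iter()
        .map(|a| Note { author: a.to_string() })
        .collect();
    let rows = aggregate_by_author(&notes, 2);
    assert_eq!(rows[0], ("alice".to_string(), 3, 60.0));
    assert_eq!(rows.len(), 2); // --top 2 cuts bob/carol ties at one row
    println!("{:?}", rows);
}
```

Because the aggregation runs after the existing filters, every flag (`--since`, `--for-mr`, ...) composes with it for free, which is the whole point of keeping this a query primitive rather than a report.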
6. **Prevent note backfill from starving other document regeneration**

Why this is better: after migration/backfill, note dirty entries can dominate the queue and delay issue/MR/discussion updates. Add source-type fairness to regenerator scheduling.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 2D
+#### Scheduling Revision
+Process dirty sources with weighted fairness instead of strict FIFO:
+- issue: 3
+- merge_request: 3
+- discussion: 2
+- note: 1
+
+Implementation sketch:
+- fetch the next batch by source_type buckets
+- interleave according to weights
+- preserve retry semantics per source
+
+#### Tests to Write First
+#[test]
+fn test_note_backfill_does_not_starve_issue_and_mr_regeneration() { /* ... */ }
```

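The interleave step can be sketched as weighted round-robin over per-source_type queues. The weights follow the Scheduling Revision above; the function name, queue shape, and ids are illustrative:

```rust
use std::collections::VecDeque;

/// Drain up to `batch` items from the buckets in weighted rounds: each
/// round, a source type may emit at most `weight` items, so a flood of
/// dirty notes can never crowd out issues/MRs entirely.
fn interleave(
    buckets: &mut [(&'static str, usize, VecDeque<i64>)],
    batch: usize,
) -> Vec<(&'static str, i64)> {
    let mut out = Vec::new();
    while out.len() < batch && buckets.iter().any(|(_, _, q)| !q.is_empty()) {
        for (kind, weight, queue) in buckets.iter_mut() {
            for _ in 0..*weight {
                if out.len() == batch { break; }
                if let Some(id) = queue.pop_front() {
                    out.push((*kind, id));
                }
            }
        }
    }
    out
}

fn main() {
    let mut buckets = vec![
        ("issue", 3, VecDeque::from(vec![1, 2, 3, 4])),
        ("note", 1, VecDeque::from((100..200).collect::<Vec<i64>>())),
    ];
    let first = interleave(&mut buckets, 8);
    // Even with 100 dirty notes queued, all 4 issues land in the first batch.
    let issues = first.iter().filter(|(k, _)| *k == "issue").count();
    assert_eq!(issues, 4);
    println!("{:?}", first);
}
```

Because empty buckets simply contribute nothing to a round, the scheduler degrades gracefully to plain FIFO when only one source type is dirty, and per-source retry state can live inside each queue untouched.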
7. **Harden migration 023: remove invalid SQL assertions and move integrity checks to tests**

Why this is better: `RAISE(ABORT, ...)` is only valid inside trigger bodies; it is not legal in a standalone `SELECT`. Keep the migration SQL minimal/portable and enforce invariants in migration tests.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 2A Migration SQL
--- Step 10: Integrity verification
-SELECT CASE
-  WHEN ... THEN RAISE(ABORT, '...')
-END;
+-- Step 10 removed from the SQL migration.
+-- Integrity verification is enforced in migration tests:
+-- 1) pre/post row-count equality
+-- 2) `PRAGMA foreign_key_check` returns no rows
+-- 3) documents_fts row count matches documents row count after rebuild

@@ Work Chunk 2A Tests
+#[test]
+fn test_migration_023_integrity_checks_pass() {
+    // pre/post counts, foreign_key_check empty, fts parity
+}
```

These 7 revisions improve correctness under failure, reduce churn risk, improve large-sync performance, and make the feature materially more useful for reviewer-analysis workflows without reintroducing any rejected recommendations.