1. **Canonical note identity for documents: use `notes.gitlab_id` as `source_id`** Why this is better: the current plan still couples document identity to local row IDs. Even with upsert+sweep, local IDs are a storage artifact and can be reused in edge cases. Using GitLab note IDs as canonical document IDs makes regeneration, backfill, and deletion propagation more stable and portable. ```diff --- a/PRD.md +++ b/PRD.md @@ Phase 0: Stable Note Identity -Phase 2 depends on `notes.id` as the `source_id` for note documents. +Phase 2 uses `notes.gitlab_id` as the `source_id` for note documents. +`notes.id` remains an internal relational key only. @@ Work Chunk 0A pub struct NoteUpsertOutcome { pub local_note_id: i64, + pub document_source_id: i64, // notes.gitlab_id pub changed_semantics: bool, } @@ Work Chunk 2D -if !note.is_system && outcome.changed_semantics { - dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?; +if !note.is_system && outcome.changed_semantics { + dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.document_source_id)?; } @@ Work Chunk 2E -SELECT 'note', n.id, ?1 +SELECT 'note', n.gitlab_id, ?1 @@ Work Chunk 2H -ON d.source_type = 'note' AND d.source_id = n.id +ON d.source_type = 'note' AND d.source_id = n.gitlab_id ``` 2. **Prevent false deletions on partial/incomplete syncs** Why this is better: sweep-based deletion is correct only when a discussion’s notes were fully fetched. If a page fails mid-fetch, current logic can incorrectly delete valid notes. Add an explicit “fetch complete” guard before sweep. ```diff --- a/PRD.md +++ b/PRD.md @@ Phase 0 +### Work Chunk 0C: Sweep Safety Guard (Partial Fetch Protection) + +Only run stale-note sweep when note pagination completed successfully for that discussion. +If fetch is partial/interrupted, skip sweep and keep prior notes intact. +#### Tests to Write First +#[test] +fn test_partial_fetch_does_not_sweep_notes() { /* ... */ } + +#[test] +fn test_complete_fetch_runs_sweep_notes() { /* ... */ } +#### Implementation +if discussion_fetch_complete { + sweep_stale_issue_notes(...)?; +} else { + tracing::warn!("Skipping stale sweep for discussion {} due to partial fetch", discussion_gitlab_id); +} ``` 3. **Make deletion propagation set-based (not per-note loop)** Why this is better: the current per-note DELETE loop is O(N) statements and gets slow on large threads. A temp-table/CTE set-based delete is faster, simpler to reason about, and remains atomic. ```diff --- a/PRD.md +++ b/PRD.md @@ Work Chunk 0B Implementation - for note_id in stale_note_ids { - conn.execute("DELETE FROM documents WHERE source_type = 'note' AND source_id = ?", [note_id])?; - conn.execute("DELETE FROM dirty_sources WHERE source_type = 'note' AND source_id = ?", [note_id])?; - } + CREATE TEMP TABLE _stale_note_source_ids(source_id INTEGER PRIMARY KEY) WITHOUT ROWID; + INSERT INTO _stale_note_source_ids + SELECT gitlab_id + FROM notes + WHERE discussion_id = ? AND last_seen_at < ? AND is_system = 0; + + DELETE FROM notes + WHERE discussion_id = ? AND last_seen_at < ?; + + DELETE FROM documents + WHERE source_type = 'note' + AND source_id IN (SELECT source_id FROM _stale_note_source_ids); + + DELETE FROM dirty_sources + WHERE source_type = 'note' + AND source_id IN (SELECT source_id FROM _stale_note_source_ids); + + DROP TABLE _stale_note_source_ids; ``` 4. **Fix project-scoping and time-window semantics in `lore notes`** Why this is better: the plan currently has a contradiction: clap `requires = "project"` blocks use of `defaultProject`, while query layer says default fallback is allowed. Also, `since/until` parsing should use one shared “now” to avoid subtle drift and inverted windows. ```diff --- a/PRD.md +++ b/PRD.md @@ Work Chunk 1B NotesArgs -#[arg(long = "for-issue", ..., requires = "project")] +#[arg(long = "for-issue", ...)] pub for_issue: Option; -#[arg(long = "for-mr", ..., requires = "project")] +#[arg(long = "for-mr", ...)] pub for_mr: Option; @@ Work Chunk 1A Query Notes -- `since`: `parse_since(since_str)` then `n.created_at >= ?` -- `until`: `parse_since(until_str)` then `n.created_at <= ?` +- Parse `since` and `until` with a single anchored `now_ms` captured once per command. +- If user supplies `YYYY-MM-DD` for `--until`, interpret as end-of-day (23:59:59.999 UTC). +- Validate `since <= until` after both parse with same anchor. ``` 5. **Add an analytics mode (not a profile command): `lore notes --aggregate`** Why this is better: this directly supports the stated use case (review patterns) without introducing the rejected “profile report” command. It keeps scope narrow and reuses existing filters. ```diff --- a/PRD.md +++ b/PRD.md @@ Phase 1 +### Work Chunk 1F: Aggregation Mode for Notes Listing + +Add optional aggregation on top of `lore notes`: +- `--aggregate author|note_type|path|resolution` +- `--top N` (default 20) + +Behavior: +- Reuses all existing filters (`--since`, `--project`, `--for-mr`, etc.) +- Returns grouped counts (+ percentage of filtered corpus) +- Works in table/json/jsonl/csv + +Non-goal alignment: +- This is not a narrative “reviewer profile” command. +- It is a query primitive for downstream analysis. ``` 6. **Prevent note backfill from starving other document regeneration** Why this is better: after migration/backfill, note dirty entries can dominate the queue and delay issue/MR/discussion updates. Add source-type fairness in regenerator scheduling. ```diff --- a/PRD.md +++ b/PRD.md @@ Work Chunk 2D +#### Scheduling Revision +Process dirty sources with weighted fairness instead of strict FIFO: +- issue: 3 +- merge_request: 3 +- discussion: 2 +- note: 1 + +Implementation sketch: +- fetch next batch by source_type buckets +- interleave according to weights +- preserve retry semantics per source +#### Tests to Write First +#[test] +fn test_note_backfill_does_not_starve_issue_and_mr_regeneration() { /* ... */ } ``` 7. **Harden migration 023: remove invalid SQL assertions and move integrity checks to tests** Why this is better: `RAISE(ABORT, ...)` in standalone `SELECT` is not valid SQLite usage outside triggers/check expressions. Keep migration SQL minimal/portable and enforce invariants in migration tests. ```diff --- a/PRD.md +++ b/PRD.md @@ Work Chunk 2A Migration SQL --- Step 10: Integrity verification -SELECT CASE - WHEN ... THEN RAISE(ABORT, '...') -END; +-- Step 10 removed from SQL migration. +-- Integrity verification is enforced in migration tests: +-- 1) pre/post row-count equality +-- 2) `PRAGMA foreign_key_check` is empty +-- 3) documents_fts row count matches documents row count after rebuild @@ Work Chunk 2A Tests +#[test] +fn test_migration_023_integrity_checks_pass() { + // pre/post counts, foreign_key_check empty, fts parity +} ``` These 7 revisions improve correctness under failure, reduce churn risk, improve large-sync performance, and make the feature materially more useful for reviewer-analysis workflows without reintroducing any rejected recommendations.