Strong plan overall. The biggest gaps I’d fix are around sync-health correctness, idempotency/integrity under repeated ingests, deleted-entity lifecycle, and reducing schema drift risk without heavy reflection machinery. I avoided everything in your `## Rejected Recommendations` section. **1. Add Sync Health Semantics (not just age)** Time freshness alone can mislead after partial/failed syncs. Agents need to know whether data is both recent and complete. ```diff @@ ## Freshness Metadata & Staleness Guards (Cross-Cutting, Change A/F/G) - pub freshness_state: String, // "fresh" | "stale" | "unknown" + pub freshness_state: String, // "fresh" | "stale" | "unknown" + pub sync_status: String, // "ok" | "partial" | "failed" | "never" + pub last_successful_sync_run_id: Option, + pub last_attempted_sync_run_id: Option, @@ -#[arg(long, help_heading = "Freshness")] -pub require_fresh: Option, +#[arg(long, help_heading = "Freshness")] +pub require_fresh: Option, +#[arg(long, help_heading = "Freshness")] +pub require_sync_ok: bool, ``` Rationale: this prevents false confidence when one project is fresh-by-time but latest sync actually failed or was partial. --- **2. Add `--require-complete` Guard for Missing Required IDs** You already expose `meta.incomplete_rows`; add a hard gate for automation. ```diff @@ ## Count Semantics (Cross-Cutting Convention) `incomplete_rows` is computed via a dedicated COUNT query... +Add CLI guard: +`--require-complete` fails with exit code 19 when `meta.incomplete_rows > 0`. +Suggested action: `lore sync --full`. ``` Rationale: agents can fail fast instead of silently acting on partial datasets. --- **3. Strengthen Ingestion Idempotency + Referential Integrity for Notes** You added natural-key uniqueness for discussions; do the same for notes and enforce parent integrity at DB level. ```diff @@ ## Supporting Indexes (Cross-Cutting, Change D) CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id ON discussions(project_id, gitlab_discussion_id); +CREATE UNIQUE INDEX IF NOT EXISTS idx_notes_project_gitlab_id + ON notes(project_id, gitlab_id); + +-- Referential integrity +-- notes.discussion_id REFERENCES discussions(id) +-- notes.project_id REFERENCES projects(id) ``` Rationale: repeated syncs and retries won’t duplicate notes, and orphaned rows can’t accumulate. --- **4. Add Deleted/Tombstoned Entity Lifecycle** Current plan excludes null IDs but doesn’t define behavior when GitLab entities are deleted after sync. ```diff @@ ## Contract Invariants (NEW) +### Deletion Lifecycle Invariant +1. Notes/discussions deleted upstream are tombstoned locally (`deleted_at`), not hard-deleted. +2. All list/show commands exclude tombstoned rows by default. +3. Optional flag `--include-deleted` exposes tombstoned rows for audit/debug. ``` Rationale: preserves auditability, prevents ghost actions on deleted objects, and avoids destructive resync behavior. --- **5. Expand Discussions Payload for Rename Accuracy + Better Triage** `--path-side old` is great, but output currently only returns `position_new_*`. ```diff @@ ## Change 3: Add Standalone `discussions` List Command pub position_new_path: Option, pub position_new_line: Option, + pub position_old_path: Option, + pub position_old_line: Option, + pub last_author: Option, + pub participant_usernames: Vec, ``` Rationale: for renamed/deleted files, agents need old and new coordinates to act confidently; participants/last_author improve thread routing and prioritization. --- **6. Add SQLite Busy Handling + Retry Policy** Read transactions + concurrent sync writes can still produce `SQLITE_BUSY` under load. ```diff @@ ## Count Semantics (Cross-Cutting Convention) **Snapshot consistency**: All three queries ... inside a single read transaction ... +**Busy handling**: set `PRAGMA busy_timeout` (e.g. 5000ms) and retry transient +`SQLITE_BUSY` errors up to 3 times with jittered backoff for read commands. ``` Rationale: improves reliability in real multi-agent usage without changing semantics. --- **7. Make Field Definitions Single-Source (Lightweight Drift Prevention)** You rejected full schema generation from code; a lower-cost middle ground is shared field manifests used by both docs and `--fields` validation. ```diff @@ ## Change 7: Fix Robot-Docs Response Schemas +#### 7h. Single-source field manifests (no reflection) +Define per-command field constants (e.g. `NOTES_FIELDS`, `DISCUSSIONS_FIELDS`) +used by: +1) `--fields` validation/filtering +2) `--fields minimal` expansion +3) `robot-docs` schema rendering ``` Rationale: cuts drift risk materially while staying much simpler than reflection/snapshot infra. --- **8. De-duplicate and Upgrade Test Strategy Around Concurrency** There are duplicated tests across Change 2 and Change 3; add explicit race tests where sync writes happen between list subqueries to prove tx consistency. ```diff @@ ## Tests -**Test 6**: `--project-id` scopes IID resolution directly -**Test 7**: `--path-side old` matches renamed file discussions -**Test 8**: `--path-side either` matches both old and new paths +Move shared discussion-filter tests to a single section under Change 3. +Add concurrency tests: +1) count/page/incomplete consistency under concurrent sync writes +2) show discussion+notes snapshot consistency under concurrent writes ``` Rationale: less maintenance noise, better coverage of your highest-risk correctness path. --- If you want, I can also produce a single consolidated patch block that rewrites your plan text end-to-end with these edits applied in-place.