**Critical Gaps In Current Plan** 1. `dirty_sources` scoping is based on `id`, but `dirty_sources` has no `id` column and uses `(source_type, source_id)` UPSERT semantics. 2. Plan assumes a new `dependent_queue` with `status`, but current code uses `pending_dependent_fetches` (delete-on-complete), so queue-scoping design conflicts with existing invariants. 3. Constraint 6 says all remote fetches happen before any content writes, but the proposed surgical flow fetches discussions/events/diffs after ingest writes. 4. `sync_runs` is already an existing table and already used by `SyncRunRecorder`; the plan currently treats it like a new table. **Best Revisions** 1. **Fix dirty-source scoping to match real schema (queued-at watermark, not `id` high-water).** Why this is better: This removes a correctness bug and makes same-entity re-ingest deterministic under UPSERT behavior. ```diff @@ Design Constraints -2. Dirty queue scoping: ... capture MAX(id) FROM dirty_sources ... run_generate_docs_for_dirty_ids ... +2. Dirty queue scoping: `dirty_sources` is keyed by `(source_type, source_id)` and updated via UPSERT. + Surgical scoping MUST use: + 1) a run-level `run_dirty_floor_ms` captured before surgical ingest, and + 2) explicit touched source keys from ingest (`(source_type, source_id)`). + Surgical docs MUST call a scoped API (e.g. `run_generate_docs_for_sources`) and MUST NOT drain global dirty queue. @@ Step 9a -pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result +pub fn run_generate_docs_for_sources(config: &Config, sources: &[(SourceType, i64)]) -> Result ``` 2. **Bypass shared dependent queue in surgical mode; run dependents inline per target.** Why this is better: Avoids queue migration churn, avoids run-scope conflicts with existing unique constraints, and removes orphan-job hygiene complexity entirely. ```diff @@ Design Constraints -4. Dependent queue scoping: ... scope_run_id indexed column on dependent_queue ... +4. Surgical dependent execution: surgical mode MUST bypass `pending_dependent_fetches`. + Dependents (resource_events, mr_closes_issues, mr_diffs) run inline for targeted entities only. + Global queue remains for normal sync only. @@ Design Constraints -14. Queue failure hygiene: ... pending scoped jobs ... terminalized to aborted ... +14. Surgical failure hygiene: surgical mode MUST leave no queue artifacts because it does not enqueue dependent jobs. @@ Step 9b / 9c / Step 13 -Implement scoped drain helpers and enqueue_job scope_run_id plumbing +Replace with direct per-entity helpers in ingestion layer: + - sync_issue_resource_events_direct(...) + - sync_mr_resource_events_direct(...) + - sync_mr_closes_issues_direct(...) + - sync_mr_diffs_direct(...) ``` 3. **Clarify atomicity contract to “primary-entity atomicity” (remove contradiction).** Why this is better: Keeps strong zero-write guarantees for missing IIDs while matching practical staged pipeline behavior. ```diff @@ Design Constraints -6. Preflight-then-commit (content-plane): All remote fetches happen BEFORE any writes to content tables ... +6. Primary-entity atomicity: all requested issue/MR payload fetches complete before first content write. + If any primary IID fetch fails, primary ingest does zero content writes. + Dependent stages (discussions/events/diffs/closes) are post-ingest and best-effort, with structured per-stage failure reporting. ``` 4. **Extend existing `sync_runs` schema instead of redefining it.** Why this is better: Preserves compatibility with current `SyncRunRecorder`, `sync_status`, and existing historical data. ```diff @@ Step 8a -Add `sync_runs` table migration (CREATE TABLE sync_runs ...) +Add migration 027 to extend existing `sync_runs` table: + - ADD COLUMN mode TEXT NULL -- 'standard' | 'surgical' + - ADD COLUMN phase TEXT NULL -- preflight|ingest|dependents|docs|embed|done|failed + - ADD COLUMN surgical_summary_json TEXT NULL +Reuse `SyncRunRecorder` row lifecycle; do not introduce a parallel run-ledger model. ``` 5. **Strengthen TOCTOU stale protection for equal timestamps.** Why this is better: Prevents regressions when `updated_at` is equal but a fresher local fetch already happened. ```diff @@ Design Constraints -13. ... If local `updated_at` is newer than preflight payload `updated_at`, skip ... +13. ... Skip stale when: + a) local.updated_at > payload.updated_at, OR + b) local.updated_at == payload.updated_at AND local.last_seen_at > preflight_started_at_ms. + This prevents equal-timestamp regressions under concurrent sync. @@ Step 1f tests +Add test: `equal_updated_at_but_newer_last_seen_is_skipped`. ``` 6. **Shrink lock window further: release `sync` lock before embed; use dedicated embed lock.** Why this is better: Prevents long embedding from blocking unrelated syncs and avoids concurrent embed writers. ```diff @@ Design Constraints -11. Lock ... held through all mutation stages. +11. Lock ... held through ingest/dependents/docs only. + Release `AppLock("sync")` before embed. + Embed stage uses `AppLock("embed")` for single-flight embedding writes. @@ Step 9 -Embed runs inside the same sync lock window +Embed runs after sync lock release, under dedicated embed lock ``` 7. **Add the missing `sync-runs` robot read path (the plan references it but doesn’t define it).** Why this is better: Makes durable run-state actually useful for recovery automation and observability. ```diff @@ Step 14 (new) +## Step 14a: Add `sync-runs` read command + +CLI: + lore --robot sync-runs --limit 20 + lore --robot sync-runs --run-id + lore --robot sync-runs --state failed + +Robot response fields: + run_id, mode, phase, status, started_at, finished_at, counters, failures, suggested_retry_command ``` 8. **Add URL-native surgical targets (`--issue-url`, `--mr-url`) with project inference.** Why this is better: Much more agent-friendly and reduces project-resolution errors from copy/paste workflows. ```diff @@ CLI Interface lore sync --issue 123 --issue 456 -p myproject +lore sync --issue-url https://gitlab.example.com/group/proj/-/issues/123 +lore sync --mr-url https://gitlab.example.com/group/proj/-/merge_requests/789 @@ Step 2 +Add repeatable flags: + --issue-url + --mr-url +Parse URL into (project_path, iid). If all targets are URL-derived and same project, `-p` is optional. +If mixed projects are provided in one command, reject with clear error. ``` If you want, I can produce a single consolidated patched version of your plan (iteration 5 draft) with these revisions already merged.