feat(cli): implement 'lore file-history' command (bd-z94)

Adds file-history command showing which MRs touched a file, with: - Rename chain resolution via BFS (resolve_rename_chain from bd-1yx) - DiffNote discussion snippets with --discussions flag - --merged filter, --no-follow-renames, -n limit - Human output with styled MR list and rename chain display - Robot JSON output with {ok, data, meta} envelope - Autocorrect registry and robot-docs manifest entry - Fixes pre-existing --no-status missing from sync autocorrect registry
2026-02-17 11:43:42 -05:00
parent 491dc52864
commit a1bca10408
30 changed files with 4758 additions and 3380 deletions
--- a/docs/plan-surgical-sync.feedback-5.md
+++ b/docs/plan-surgical-sync.feedback-5.md
@@ -0,0 +1,130 @@
+**Critical Gaps In Current Plan**
+1. `dirty_sources` scoping is based on `id`, but `dirty_sources` has no `id` column and uses `(source_type, source_id)` UPSERT semantics.
+2. Plan assumes a new `dependent_queue` with `status`, but current code uses `pending_dependent_fetches` (delete-on-complete), so queue-scoping design conflicts with existing invariants.
+3. Constraint 6 says all remote fetches happen before any content writes, but the proposed surgical flow fetches discussions/events/diffs after ingest writes.
+4. `sync_runs` is already an existing table and already used by `SyncRunRecorder`; the plan currently treats it like a new table.
+
+**Best Revisions**
+
+1. **Fix dirty-source scoping to match real schema (queued-at watermark, not `id` high-water).**  
+Why this is better: This removes a correctness bug and makes same-entity re-ingest deterministic under UPSERT behavior.
+
+```diff
+@@ Design Constraints
+-2. Dirty queue scoping: ... capture MAX(id) FROM dirty_sources ... run_generate_docs_for_dirty_ids ...
+2. Dirty queue scoping: `dirty_sources` is keyed by `(source_type, source_id)` and updated via UPSERT.
+   Surgical scoping MUST use:
+   1) a run-level `run_dirty_floor_ms` captured before surgical ingest, and
+   2) explicit touched source keys from ingest (`(source_type, source_id)`).
+   Surgical docs MUST call a scoped API (e.g. `run_generate_docs_for_sources`) and MUST NOT drain global dirty queue.
+@@ Step 9a
+-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<GenerateDocsResult>
+pub fn run_generate_docs_for_sources(config: &Config, sources: &[(SourceType, i64)]) -> Result<GenerateDocsResult>
+```
+
+2. **Bypass shared dependent queue in surgical mode; run dependents inline per target.**  
+Why this is better: Avoids queue migration churn, avoids run-scope conflicts with existing unique constraints, and removes orphan-job hygiene complexity entirely.
+
+```diff
+@@ Design Constraints
+-4. Dependent queue scoping: ... scope_run_id indexed column on dependent_queue ...
+4. Surgical dependent execution: surgical mode MUST bypass `pending_dependent_fetches`.
+   Dependents (resource_events, mr_closes_issues, mr_diffs) run inline for targeted entities only.
+   Global queue remains for normal sync only.
+@@ Design Constraints
+-14. Queue failure hygiene: ... pending scoped jobs ... terminalized to aborted ...
+14. Surgical failure hygiene: surgical mode MUST leave no queue artifacts because it does not enqueue dependent jobs.
+@@ Step 9b / 9c / Step 13
+-Implement scoped drain helpers and enqueue_job scope_run_id plumbing
+Replace with direct per-entity helpers in ingestion layer:
+  - sync_issue_resource_events_direct(...)
+  - sync_mr_resource_events_direct(...)
+  - sync_mr_closes_issues_direct(...)
+  - sync_mr_diffs_direct(...)
+```
+
+3. **Clarify atomicity contract to “primary-entity atomicity” (remove contradiction).**  
+Why this is better: Keeps strong zero-write guarantees for missing IIDs while matching practical staged pipeline behavior.
+
+```diff
+@@ Design Constraints
+-6. Preflight-then-commit (content-plane): All remote fetches happen BEFORE any writes to content tables ...
+6. Primary-entity atomicity: all requested issue/MR payload fetches complete before first content write.
+   If any primary IID fetch fails, primary ingest does zero content writes.
+   Dependent stages (discussions/events/diffs/closes) are post-ingest and best-effort, with structured per-stage failure reporting.
+```
+
+4. **Extend existing `sync_runs` schema instead of redefining it.**  
+Why this is better: Preserves compatibility with current `SyncRunRecorder`, `sync_status`, and existing historical data.
+
+```diff
+@@ Step 8a
+-Add `sync_runs` table migration (CREATE TABLE sync_runs ...)
+Add migration 027 to extend existing `sync_runs` table:
+  - ADD COLUMN mode TEXT NULL            -- 'standard' | 'surgical'
+  - ADD COLUMN phase TEXT NULL           -- preflight|ingest|dependents|docs|embed|done|failed
+  - ADD COLUMN surgical_summary_json TEXT NULL
+Reuse `SyncRunRecorder` row lifecycle; do not introduce a parallel run-ledger model.
+```
+
+5. **Strengthen TOCTOU stale protection for equal timestamps.**  
+Why this is better: Prevents regressions when `updated_at` is equal but a fresher local fetch already happened.
+
+```diff
+@@ Design Constraints
+-13. ... If local `updated_at` is newer than preflight payload `updated_at`, skip ...
+13. ... Skip stale when:
+    a) local.updated_at > payload.updated_at, OR
+    b) local.updated_at == payload.updated_at AND local.last_seen_at > preflight_started_at_ms.
+    This prevents equal-timestamp regressions under concurrent sync.
+@@ Step 1f tests
+Add test: `equal_updated_at_but_newer_last_seen_is_skipped`.
+```
+
+6. **Shrink lock window further: release `sync` lock before embed; use dedicated embed lock.**  
+Why this is better: Prevents long embedding from blocking unrelated syncs and avoids concurrent embed writers.
+
+```diff
+@@ Design Constraints
+-11. Lock ... held through all mutation stages.
+11. Lock ... held through ingest/dependents/docs only.
+    Release `AppLock("sync")` before embed.
+    Embed stage uses `AppLock("embed")` for single-flight embedding writes.
+@@ Step 9
+-Embed runs inside the same sync lock window
+Embed runs after sync lock release, under dedicated embed lock
+```
+
+7. **Add the missing `sync-runs` robot read path (the plan references it but doesn’t define it).**  
+Why this is better: Makes durable run-state actually useful for recovery automation and observability.
+
+```diff
+@@ Step 14 (new)
+## Step 14a: Add `sync-runs` read command
+
+CLI:
+  lore --robot sync-runs --limit 20
+  lore --robot sync-runs --run-id <id>
+  lore --robot sync-runs --state failed
+
+Robot response fields:
+  run_id, mode, phase, status, started_at, finished_at, counters, failures, suggested_retry_command
+```
+
+8. **Add URL-native surgical targets (`--issue-url`, `--mr-url`) with project inference.**  
+Why this is better: Much more agent-friendly and reduces project-resolution errors from copy/paste workflows.
+
+```diff
+@@ CLI Interface
+ lore sync --issue 123 --issue 456 -p myproject
+lore sync --issue-url https://gitlab.example.com/group/proj/-/issues/123
+lore sync --mr-url https://gitlab.example.com/group/proj/-/merge_requests/789
+@@ Step 2
+Add repeatable flags:
+  --issue-url <url>
+  --mr-url <url>
+Parse URL into (project_path, iid). If all targets are URL-derived and same project, `-p` is optional.
+If mixed projects are provided in one command, reject with clear error.
+```
+
+If you want, I can produce a single consolidated patched version of your plan (iteration 5 draft) with these revisions already merged.