feat(cli): implement 'lore file-history' command (bd-z94)
Adds file-history command showing which MRs touched a file, with:
- Rename chain resolution via BFS (resolve_rename_chain from bd-1yx)
- DiffNote discussion snippets with --discussions flag
- --merged filter, --no-follow-renames, -n limit
- Human output with styled MR list and rename chain display
- Robot JSON output with {ok, data, meta} envelope
- Autocorrect registry and robot-docs manifest entry
- Fixes pre-existing --no-status missing from sync autocorrect registry
This commit is contained in:
212
docs/plan-surgical-sync.feedback-4.md
Normal file
212
docs/plan-surgical-sync.feedback-4.md
Normal file
@@ -0,0 +1,212 @@
|
||||
1. **Resolve the current contract contradictions (`preflight-only`, `dry-run`, `sync_runs`)**
|
||||
|
||||
Why this improves the plan:
|
||||
- Right now constraints conflict: “zero DB writes before commit” vs inserting `sync_runs` during preflight.
|
||||
- This ambiguity will cause implementation drift and flaky acceptance tests.
|
||||
- Splitting control-plane writes from content-plane writes keeps safety guarantees strict while preserving observability.
|
||||
|
||||
```diff
|
||||
@@ ## Design Constraints
|
||||
-6. **Preflight-then-commit**: All remote fetches happen BEFORE any DB writes. If any IID fetch fails (404, network error), the entire operation aborts with zero DB mutations.
|
||||
+6. **Preflight-then-commit (content-plane)**: All remote fetches happen BEFORE any writes to content tables (`issues`, `merge_requests`, `discussions`, `resource_events`, `documents`, `embeddings`).
|
||||
+7. **Control-plane exception**: `sync_runs` / `sync_run_entities` writes are allowed during preflight for observability and crash diagnostics.
|
||||
@@
|
||||
-11. **Preflight-only mode**: `--preflight-only` validates remote entity existence and permissions with zero DB writes.
|
||||
+11. **Preflight-only mode**: `--preflight-only` performs zero content writes; control-plane run-ledger writes are allowed.
|
||||
@@ ### For me to evaluate (functional):
|
||||
-24. **Preflight-only mode** ... no DB mutations beyond the sync_runs ledger entry
|
||||
+24. **Preflight-only mode** ... no content DB mutations; only run-ledger rows may be written
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
2. **Add stale-write protection to avoid TOCTOU regressions during unlocked preflight**
|
||||
|
||||
Why this improves the plan:
|
||||
- You intentionally preflight without lock; that’s good for throughput but introduces race risk.
|
||||
- Without a guard, a slower surgical run can overwrite newer data ingested by a concurrent normal sync.
|
||||
- This is a correctness bug under contention, not a nice-to-have.
|
||||
|
||||
```diff
|
||||
@@ ## Design Constraints
|
||||
+12. **Stale-write protection**: Surgical ingest MUST NOT overwrite fresher local rows. If local `updated_at` is newer than the preflight payload’s `updated_at`, skip that entity and record `skipped_stale`.
|
||||
@@ ## Step 7: Create `src/ingestion/surgical.rs`
|
||||
- let labels_created = process_single_issue(conn, config, project_id, issue)?;
|
||||
+ // Skip stale payloads to avoid TOCTOU overwrite after unlocked preflight.
|
||||
+ if is_local_newer_issue(conn, project_id, issue.iid, issue.updated_at)? {
|
||||
+ result.skipped_stale += 1;
|
||||
+ return Ok(result);
|
||||
+ }
|
||||
+ let labels_created = process_single_issue(conn, config, project_id, issue)?;
|
||||
@@
|
||||
+// same guard for MR path
|
||||
@@ ## Step 15: Update `SyncResult`
|
||||
+ /// Entities skipped because local row was newer than preflight payload
|
||||
+ pub skipped_stale: usize,
|
||||
@@ ### Edge cases to verify:
|
||||
+38. **TOCTOU safety**: if a normal sync updates entity after preflight but before ingest, surgical run skips stale payload (no overwrite)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
3. **Make dirty-source scoping exact (do not capture pre-existing rows for same entity)**
|
||||
|
||||
Why this improves the plan:
|
||||
- Current “query dirty rows by `source_id` after ingest” can accidentally include older dirty rows for the same entity.
|
||||
- That silently violates strict run scoping and can delete unrelated backlog rows.
|
||||
- You can fix this without adding `origin_run_id` to `dirty_sources` (which you already rejected).
|
||||
|
||||
```diff
|
||||
@@ ## Step 7: Create `src/ingestion/surgical.rs`
|
||||
- // Collect dirty_source rows for this entity
|
||||
- let mut stmt = conn.prepare(
|
||||
- "SELECT id FROM dirty_sources WHERE source_type = 'issue' AND source_id = ?1"
|
||||
- )?;
|
||||
+ // Capture only rows inserted by THIS call using high-water mark.
|
||||
+ let before_dirty_id: i64 = conn.query_row(
|
||||
+ "SELECT COALESCE(MAX(id), 0) FROM dirty_sources",
|
||||
+ [], |r| r.get(0),
|
||||
+ )?;
|
||||
+ // ... call process_single_issue ...
|
||||
+ let mut stmt = conn.prepare(
|
||||
+ "SELECT id FROM dirty_sources
|
||||
+ WHERE id > ?1 AND source_type = 'issue' AND source_id = ?2"
|
||||
+ )?;
|
||||
@@
|
||||
+ // same pattern for MR
|
||||
@@ ### 1d. Scoping invariant tests
|
||||
+#[test]
|
||||
+fn surgical_docs_scope_ignores_preexisting_dirty_rows_for_same_entity() {
|
||||
+ // pre-insert dirty row for iid=7, then surgical ingest iid=7
|
||||
+ // assert result.dirty_source_ids only contains newly inserted rows
|
||||
+}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
4. **Fix embed-stage leakage when `--no-docs` is used in surgical mode**
|
||||
|
||||
Why this improves the plan:
|
||||
- Current design can run global embed even when docs stage is skipped, which may embed unrelated backlog docs.
|
||||
- That breaks the surgical “scope only this run” promise.
|
||||
- This is both correctness and operator-trust critical.
|
||||
|
||||
```diff
|
||||
@@ ## Step 9: Create `run_sync_surgical`
|
||||
- if !options.no_embed {
|
||||
+ // Surgical embed only runs when surgical docs actually regenerated docs in this run.
|
||||
+ if !options.no_embed && !options.no_docs && result.documents_regenerated > 0 {
|
||||
@@ ## Step 4: Wire new fields in `handle_sync_cmd`
|
||||
+ if options.is_surgical() && options.no_docs && !options.no_embed {
|
||||
+ return Err(Box::new(LoreError::Other(
|
||||
+ "In surgical mode, --no-docs requires --no-embed (to preserve scoping guarantees)".to_string()
|
||||
+ )));
|
||||
+ }
|
||||
@@ ### For me to evaluate
|
||||
+39. **No embed leakage**: `sync --issue X --no-docs` never embeds unrelated unembedded docs
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
5. **Add queue-failure hygiene so scoped jobs do not leak forever**
|
||||
|
||||
Why this improves the plan:
|
||||
- Scoped drains prevent accidental processing, but failed runs can strand pending jobs permanently.
|
||||
- You need explicit terminalization (`aborted`) and optional replay mechanics.
|
||||
- Otherwise queue bloat and confusing diagnostics accumulate.
|
||||
|
||||
```diff
|
||||
@@ ## Step 8a: Add `sync_runs` table migration
|
||||
+ALTER TABLE dependent_queue ADD COLUMN aborted_reason TEXT;
|
||||
+-- status domain now includes: pending, claimed, done, failed, aborted
|
||||
@@ ## Step 9: run_sync_surgical failure paths
|
||||
+// On run failure/cancel:
|
||||
+conn.execute(
|
||||
+ "UPDATE dependent_queue
|
||||
+ SET status='aborted', aborted_reason=?1
|
||||
+ WHERE project_id=?2 AND scope_run_id=?3 AND status='pending'",
|
||||
+ rusqlite::params![failure_summary, project_id, run_id],
|
||||
+)?;
|
||||
@@ ## Acceptance Criteria
|
||||
+40. **No stranded scoped jobs**: failed surgical runs leave no `pending` rows for their `scope_run_id`
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
6. **Persist per-entity lifecycle (`sync_run_entities`) for real observability and deterministic retry**
|
||||
|
||||
Why this improves the plan:
|
||||
- `sync_runs` alone gives aggregate counters but not which IID failed at which stage.
|
||||
- Per-entity records make retries deterministic and robot output far more useful.
|
||||
- This is the missing piece for your stated “deterministic retry decisions.”
|
||||
|
||||
```diff
|
||||
@@ ## Step 8a: Add `sync_runs` table migration
|
||||
+CREATE TABLE IF NOT EXISTS sync_run_entities (
|
||||
+ id INTEGER PRIMARY KEY,
|
||||
+ run_id TEXT NOT NULL REFERENCES sync_runs(run_id),
|
||||
+ entity_type TEXT NOT NULL CHECK(entity_type IN ('issue','merge_request')),
|
||||
+ iid INTEGER NOT NULL,
|
||||
+ stage TEXT NOT NULL,
|
||||
+ status TEXT NOT NULL CHECK(status IN ('ok','failed','skipped_stale')),
|
||||
+ error_code TEXT,
|
||||
+ error_message TEXT,
|
||||
+ updated_at INTEGER NOT NULL
|
||||
+);
|
||||
+CREATE INDEX IF NOT EXISTS idx_sync_run_entities_run ON sync_run_entities(run_id, entity_type, iid);
|
||||
@@ ## Step 15: Update `SyncResult`
|
||||
+ pub failed_iids: Vec<(String, u64)>,
|
||||
+ pub skipped_stale_iids: Vec<(String, u64)>,
|
||||
@@ ## CLI Interface
|
||||
+lore --robot sync-runs --run-id <id>
|
||||
+lore --robot sync-runs --run-id <id> --retry-failed
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
7. **Use explicit error type for surgical preflight failures (not `LoreError::Other`)**
|
||||
|
||||
Why this improves the plan:
|
||||
- `Other(String)` loses machine semantics, weakens robot mode, and leads to bad exit-code behavior.
|
||||
- A typed error preserves structured failures and enables actionable recovery commands.
|
||||
|
||||
```diff
|
||||
@@ ## Step 9: run_sync_surgical
|
||||
- return Err(LoreError::Other(
|
||||
- format!("Surgical preflight failed for {} of {} IIDs: {}", ...)
|
||||
- ).into());
|
||||
+ return Err(LoreError::SurgicalPreflightFailed {
|
||||
+ run_id: run_id.to_string(),
|
||||
+ total: total_items,
|
||||
+ failures: preflight.failures.clone(),
|
||||
+ }.into());
|
||||
@@ ## Step 15: Update `SyncResult`
|
||||
+ /// Machine-actionable error summary for robot mode
|
||||
+ pub error_code: Option<String>,
|
||||
@@ ## Acceptance Criteria
|
||||
+41. **Typed failure**: preflight failures serialize structured errors (not generic `Other`) with machine-usable codes/actions
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
8. **Strengthen tests for rollback, contention, and stale-skip guarantees**
|
||||
|
||||
Why this improves the plan:
|
||||
- Current tests cover many happy-paths and scoping invariants, but key race/rollback behaviors are still under-tested.
|
||||
- These are exactly where regressions will appear first in production.
|
||||
|
||||
```diff
|
||||
@@ ## Step 1: TDD — Write Failing Tests First
|
||||
+### 1f. Transactional rollback + TOCTOU tests
|
||||
+1. `preflight_success_then_ingest_failure_rolls_back_all_content_writes`
|
||||
+2. `stale_payload_is_skipped_when_local_updated_at_is_newer`
|
||||
+3. `failed_run_aborts_pending_scoped_jobs`
|
||||
+4. `surgical_no_docs_requires_no_embed`
|
||||
@@ ### Automated scoping invariants
|
||||
-38. **Scoped queue/docs invariants are enforced by automated tests**
|
||||
+42. **Rollback and race invariants are enforced by automated tests** (no partial writes on ingest failure, no stale overwrite)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
These eight revisions keep your core approach intact, avoid your explicitly rejected ideas, and close the biggest correctness/operability gaps before implementation.
|
||||
Reference in New Issue
Block a user