Files
gitlore/docs/plan-surgical-sync.feedback-3.md
teernisse a1bca10408 feat(cli): implement 'lore file-history' command (bd-z94)
Adds file-history command showing which MRs touched a file, with:
- Rename chain resolution via BFS (resolve_rename_chain from bd-1yx)
- DiffNote discussion snippets with --discussions flag
- --merged filter, --no-follow-renames, -n limit
- Human output with styled MR list and rename chain display
- Robot JSON output with {ok, data, meta} envelope
- Autocorrect registry and robot-docs manifest entry
- Fixes pre-existing --no-status missing from sync autocorrect registry
2026-02-17 12:57:56 -05:00

169 lines
7.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Below are the strongest **new** revisions Id make (excluding everything in your rejected list), with rationale and plan-level diffs.
### 1. Add a durable run ledger (`sync_runs`) with phase state
This makes surgical sync crash-resumable, auditable, and safer under Ctrl+C. Right now `run_id` is mostly ephemeral; persisting phase state removes ambiguity about what completed.
```diff
@@ Design Constraints
+9. **Durable run state**: Surgical sync MUST persist a `sync_runs` row keyed by `run_id`
+ with phase transitions (`preflight`, `ingest`, `dependents`, `docs`, `embed`, `done`, `failed`).
+ This is required for crash recovery, observability, and deterministic retries.
@@ Step 9: Create `run_sync_surgical`
+Before Stage 0, insert `sync_runs(run_id, project_id, mode='surgical', requested_counts, started_at)`.
+After each stage, update `sync_runs.phase`, counters, and `last_error` if present.
+On success/failure, set terminal state (`done`/`failed`) and `finished_at`.
```
### 2. Add `--preflight-only` (network validation without writes)
`--dry-run` is intentionally zero-network, so it cannot validate IIDs. `--preflight-only` is high-value for agents: verifies existence/permissions quickly with no DB mutation.
```diff
@@ CLI Interface
lore sync --dry-run --issue 123 -p myproject
+lore sync --preflight-only --issue 123 -p myproject
@@ Step 2: Add `--issue`, `--mr`, `-p` to `SyncArgs`
+ /// Validate remote entities and auth without any DB writes
+ #[arg(long, default_value_t = false)]
+ pub preflight_only: bool,
@@ Step 10: Add branch in `run_sync`
+if options.preflight_only && options.is_surgical() {
+ return run_sync_surgical_preflight_only(config, &options, run_id, signal).await;
+}
```
### 3. Preflight should aggregate all missing/failed IIDs, not fail-fast
Fail-fast causes repeated reruns. Aggregating errors gives one-shot correction and better robot automation.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-/// Returns the fetched payloads. If ANY fetch fails, the entire operation should abort.
+/// Returns fetched payloads plus per-IID failures; caller aborts writes if failures exist.
pub async fn preflight_fetch(...) -> Result<PreflightResult> {
@@
#[derive(Debug, Default)]
pub struct PreflightResult {
pub issues: Vec<GitLabIssue>,
pub merge_requests: Vec<GitLabMergeRequest>,
+ pub failures: Vec<EntityFailure>, // stage="fetch"
}
@@ Step 9: Create `run_sync_surgical`
-let preflight = preflight_fetch(...).await?;
+let preflight = preflight_fetch(...).await?;
+if !preflight.failures.is_empty() {
+ result.entity_failures = preflight.failures;
+ return Err(LoreError::Other("Surgical preflight failed for one or more IIDs".into()).into());
+}
```
### 4. Stop filtering scoped queue drains with raw `json_extract` scans
`json_extract(payload_json, '$.scope_run_id')` in hot drain queries will degrade as queue grows. Use indexed scope metadata.
```diff
@@ Step 9b: Implement scoped drain helpers
-// claim query adds:
-// AND json_extract(payload_json, '$.scope_run_id') = ?
+// Add migration:
+// 1) Add `scope_run_id` generated/stored column derived from payload_json (or explicit column)
+// 2) Create index on (project_id, job_type, scope_run_id, status, id)
+// Scoped drains filter by indexed `scope_run_id`, not full-table JSON extraction.
```
### 5. Replace `dirty_source_ids` collection-by-query with explicit run scoping
Current approach can accidentally include prior dirty rows for same source and can duplicate work. Tag dirty rows with `origin_run_id` and consume by run.
```diff
@@ Design Constraints
-2. **Dirty queue scoping**: ... MUST call ... `run_generate_docs_for_dirty_ids`
+2. **Dirty queue scoping**: Surgical sync MUST scope docs by `origin_run_id` on `dirty_sources`
+ (or equivalent exact run marker) and MUST NOT drain unrelated dirty rows.
@@ Step 7: `SurgicalIngestResult`
- pub dirty_source_ids: Vec<i64>,
+ pub origin_run_id: String,
@@ Step 9a: Implement `run_generate_docs_for_dirty_ids`
-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<...>
+pub fn run_generate_docs_for_run_id(config: &Config, run_id: &str) -> Result<...>
```
### 6. Enforce transaction safety at the type boundary
`unchecked_transaction()` + `&Connection` signatures is fragile. Accept `&Transaction` for ingest internals and use `TransactionBehavior::Immediate` for deterministic lock behavior.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-pub fn ingest_issue_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_issue_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-pub fn ingest_mr_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_mr_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-let tx = conn.unchecked_transaction()?;
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Immediate)?;
```
### 7. Acquire sync lock only for mutation phases, not remote preflight
This materially reduces lock contention and keeps normal sync throughput higher, while still guaranteeing mutation serialization.
```diff
@@ Design Constraints
+10. **Lock window minimization**: Preflight fetch runs without sync lock; lock is acquired immediately
+ before first DB mutation and held through all mutation stages.
@@ Step 9: Create `run_sync_surgical`
-// ── Acquire sync lock ──
-...
-// ── Stage 0: Preflight fetch ──
+// ── Stage 0: Preflight fetch (no lock, no writes) ──
...
+// ── Acquire sync lock just before Stage 1 mutation ──
```
### 8. Add explicit transient retry policy beyond 429
Client already handles rate limits; surgical reliability improves a lot if 5xx/timeouts are retried with bounded backoff.
```diff
@@ Design Constraints
+11. **Transient retry policy**: Preflight and dependent remote fetches MUST retry boundedly on
+ timeout/5xx with jittered backoff; permanent errors (404/401/403) fail immediately.
@@ Step 5: Add `get_issue_by_iid` / `get_mr_by_iid`
+Document retry behavior for transient transport/server failures.
```
### 9. Tighten automated tests around scoping invariants
You already list manual checks; these should be enforced in unit/integration tests to prevent regressions.
```diff
@@ Step 1: TDD — Write Failing Tests First
+### 1d. New invariants tests
+- `surgical_docs_scope_ignores_preexisting_dirty_rows`
+- `scoped_queue_drain_ignores_orphaned_jobs`
+- `preflight_aggregates_multiple_missing_iids`
+- `preflight_only_performs_zero_writes`
+- `dry_run_performs_zero_network_calls`
+- `lock_window_does_not_block_during_preflight`
@@ Acceptance Criteria
+32. Scoped queue/docs invariants are covered by automated tests (not manual-only verification).
```
### 10. Make robot-mode surgical output first-class
For agent workflows, include full stage telemetry and actionable recovery commands.
```diff
@@ Step 15: Update `SyncResult` for robot mode structured output
+ /// Per-stage elapsed ms for deterministic performance tracking
+ pub stage_timings_ms: std::collections::BTreeMap<String, u64>,
+ /// Suggested recovery commands (robot ergonomics)
+ pub recovery_actions: Vec<String>,
@@ Step 14: Update `robot-docs` manifest
+Document surgical-specific error codes and `actions` schema for automated recovery.
```
If you want, I can now produce a fully rewritten **iteration 3** plan that merges these into your current structure end-to-end.