feat(surgical-sync): add per-IID surgical sync pipeline with preflight validation

Add the ability to sync specific issues or merge requests by IID without
running a full incremental sync. This enables fast, targeted data refresh
for individual entities — useful for agent workflows, debugging, and
real-time investigation of specific issues or MRs.

Architecture:
- New CLI flags: --issue <IID> and --mr <IID> (repeatable, up to 100 total)
  scoped to a single project via -p/--project
- Preflight phase validates that all IIDs exist on GitLab before any DB writes,
  with TOCTOU-aware soft verification at ingest time
- 6-stage pipeline: preflight -> fetch -> ingest -> dependents -> docs -> embed
- Each stage is cancellation-aware via ShutdownSignal
- Dedicated SyncRunRecorder extensions track surgical-specific counters
  (issues_fetched, mrs_ingested, docs_regenerated, etc.)
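
The stage ordering and cancellation checks described above can be sketched roughly as follows. This is only an illustration: `Stage`, `ShutdownFlag`, and `run_pipeline` are hypothetical names, and `ShutdownFlag` is an assumed stand-in for the project's ShutdownSignal, whose real API is not shown in this commit view.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// The six stages named in the commit message, in execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Preflight,
    Fetch,
    Ingest,
    Dependents,
    Docs,
    Embed,
}

const STAGES: [Stage; 6] = [
    Stage::Preflight,
    Stage::Fetch,
    Stage::Ingest,
    Stage::Dependents,
    Stage::Docs,
    Stage::Embed,
];

/// Hypothetical stand-in for ShutdownSignal (assumption, not the real type).
struct ShutdownFlag(AtomicBool);

impl ShutdownFlag {
    fn new() -> Self {
        Self(AtomicBool::new(false))
    }

    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Runs the stages in order, checking for cancellation before each one.
/// Returns the stages that completed and whether the run was cancelled.
fn run_pipeline(
    signal: &ShutdownFlag,
    mut run_stage: impl FnMut(Stage),
) -> (Vec<Stage>, bool) {
    let mut completed = Vec::new();
    for stage in STAGES {
        if signal.is_cancelled() {
            return (completed, true);
        }
        run_stage(stage);
        completed.push(stage);
    }
    (completed, false)
}
```

Checking the flag between stages rather than inside them keeps the sketch short; a real stage would presumably also poll the signal during long-running work.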

New modules:
- src/ingestion/surgical.rs: Core surgical fetch/ingest/dependent logic
  with preflight_fetch(), ingest_issue_by_iid(), ingest_mr_by_iid(),
  and fetch_dependents_for_{issue,mr}()
- src/cli/commands/sync_surgical.rs: Full CLI orchestrator with progress
  spinners, human/robot output, and cancellation handling
- src/embedding/pipeline.rs: embed_documents_by_ids() for scoped embedding
- src/documents/regenerator.rs: regenerate_dirty_documents_for_sources()
  for scoped document regeneration

Database changes:
- Migration 027: Extends sync_runs with mode, phase, surgical_iids_json,
  per-entity counters, and cancelled_at column
- New indexes: idx_sync_runs_mode_started, idx_sync_runs_status_phase_started

GitLab client:
- get_issue_by_iid() and get_mr_by_iid() single-entity fetch methods
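
As a rough sketch of what single-entity fetches target, the GitLab REST API exposes one issue at `/projects/:id/issues/:issue_iid` and one merge request at `/projects/:id/merge_requests/:merge_request_iid`, where `:id` may be a URL-encoded project path. The helpers below (`encode_project`, `issue_path`, `mr_path`) are hypothetical and only build the request paths; they are not the client methods added in this commit.

```rust
/// Percent-encodes a project path (e.g. "group/repo") for use as the
/// `:id` segment of a GitLab API URL. Hypothetical helper.
fn encode_project(path: &str) -> String {
    let mut out = String::new();
    for b in path.bytes() {
        match b {
            // Unreserved characters pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            // Everything else (including '/') is percent-encoded.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Request path for a single issue, addressed by project and IID.
fn issue_path(project: &str, iid: u64) -> String {
    format!("/api/v4/projects/{}/issues/{}", encode_project(project), iid)
}

/// Request path for a single merge request, addressed by project and IID.
fn mr_path(project: &str, iid: u64) -> String {
    format!(
        "/api/v4/projects/{}/merge_requests/{}",
        encode_project(project),
        iid
    )
}
```

For example, `issue_path("group/repo", 42)` yields `/api/v4/projects/group%2Frepo/issues/42`.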

Error handling:
- New SurgicalPreflightFailed error variant with entity_type, iid, project,
  and reason fields. Shares exit code 6 with GitLabNotFound.
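
A minimal sketch of how such a variant might share an exit code with GitLabNotFound. The enum name `SyncError`, the field types, the `resource` field, and the message format are all assumptions for illustration; only the variant names, the four field names, and exit code 6 come from the description above.

```rust
use std::fmt;

/// Hypothetical error enum; the project's real error type is not shown here.
#[derive(Debug)]
enum SyncError {
    GitLabNotFound {
        resource: String, // assumed field
    },
    SurgicalPreflightFailed {
        entity_type: String,
        iid: u64,
        project: String,
        reason: String,
    },
}

impl SyncError {
    /// Both "not found" style failures map to the same process exit code.
    fn exit_code(&self) -> i32 {
        match self {
            SyncError::GitLabNotFound { .. }
            | SyncError::SurgicalPreflightFailed { .. } => 6,
        }
    }
}

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyncError::GitLabNotFound { resource } => {
                write!(f, "not found on GitLab: {}", resource)
            }
            SyncError::SurgicalPreflightFailed {
                entity_type,
                iid,
                project,
                reason,
            } => write!(
                f,
                "preflight failed for {} {} in {}: {}",
                entity_type, iid, project, reason
            ),
        }
    }
}
```

Carrying the entity type, IID, and project in the variant lets a caller report exactly which requested item failed preflight without parsing a message string.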

Includes comprehensive test coverage:
- 645 lines of surgical ingestion tests (wiremock-based)
- 184 lines of scoped embedding tests
- 85 lines of scoped regeneration tests
- 113 lines of GitLab client single-entity tests
- 236 lines of sync_run surgical column/counter tests
- Unit tests for SyncOptions, error codes, and CLI validation
teernisse
2026-02-18 16:27:59 -05:00
parent ea6e45e43f
commit 9ec1344945
25 changed files with 3354 additions and 37 deletions


@@ -84,6 +84,60 @@ pub fn regenerate_dirty_documents(
    Ok(result)
}

#[derive(Debug, Default)]
pub struct RegenerateForSourcesResult {
    pub regenerated: usize,
    pub unchanged: usize,
    pub errored: usize,
    pub document_ids: Vec<i64>,
}

pub fn regenerate_dirty_documents_for_sources(
    conn: &Connection,
    source_keys: &[(SourceType, i64)],
) -> Result<RegenerateForSourcesResult> {
    let mut result = RegenerateForSourcesResult::default();
    let mut cache = ParentMetadataCache::new();

    for &(source_type, source_id) in source_keys {
        match regenerate_one(conn, source_type, source_id, &mut cache) {
            Ok(changed) => {
                if changed {
                    result.regenerated += 1;
                } else {
                    result.unchanged += 1;
                }
                clear_dirty(conn, source_type, source_id)?;

                // Try to collect the document_id if a document exists
                if let Ok(doc_id) = get_document_id(conn, source_type, source_id) {
                    result.document_ids.push(doc_id);
                }
            }
            Err(e) => {
                warn!(
                    source_type = %source_type,
                    source_id,
                    error = %e,
                    "Failed to regenerate document for source"
                );
                record_dirty_error(conn, source_type, source_id, &e.to_string())?;
                result.errored += 1;
            }
        }
    }

    debug!(
        regenerated = result.regenerated,
        unchanged = result.unchanged,
        errored = result.errored,
        document_ids = result.document_ids.len(),
        "Scoped document regeneration complete"
    );

    Ok(result)
}

fn regenerate_one(
    conn: &Connection,
    source_type: SourceType,