feat(surgical-sync): add per-IID surgical sync pipeline with preflight validation

Add the ability to sync specific issues or merge requests by IID without
running a full incremental sync. This enables fast, targeted data refresh
for individual entities, which is useful for agent workflows, debugging,
and real-time investigation of specific issues or MRs.

Architecture:
- New CLI flags: --issue <IID> and --mr <IID> (repeatable, up to 100 total)
  scoped to a single project via -p/--project
- Preflight phase validates all IIDs exist on GitLab before any DB writes,
  with TOCTOU-aware soft verification at ingest time
- 6-stage pipeline: preflight -> fetch -> ingest -> dependents -> docs -> embed
- Each stage is cancellation-aware via ShutdownSignal
- Dedicated SyncRunRecorder extensions track surgical-specific counters
  (issues_fetched, mrs_ingested, docs_regenerated, etc.)
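
The stage ordering, the cancellation check, and the 100-IID cap described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the `ShutdownSignal` shown here is a hypothetical stand-in built on an `AtomicBool`, and `Stage`, `Outcome`, `MAX_SURGICAL_IIDS`, `validate_iid_count()`, and `run_pipeline()` are names invented for the example.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the real `ShutdownSignal` (assumption).
#[derive(Clone, Default)]
pub struct ShutdownSignal(Arc<AtomicBool>);

impl ShutdownSignal {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// The six stages named in the commit message, in execution order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Preflight,
    Fetch,
    Ingest,
    Dependents,
    Docs,
    Embed,
}

pub const STAGES: [Stage; 6] = [
    Stage::Preflight,
    Stage::Fetch,
    Stage::Ingest,
    Stage::Dependents,
    Stage::Docs,
    Stage::Embed,
];

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Completed,
    /// Cancellation was observed before this stage started.
    CancelledBefore(Stage),
}

/// Cap taken from the commit message: up to 100 IIDs in total.
pub const MAX_SURGICAL_IIDS: usize = 100;

/// Rejects a request whose combined --issue and --mr count exceeds the cap.
pub fn validate_iid_count(issues: &[u64], mrs: &[u64]) -> Result<(), String> {
    let total = issues.len() + mrs.len();
    if total > MAX_SURGICAL_IIDS {
        return Err(format!(
            "too many IIDs: {total} given, at most {MAX_SURGICAL_IIDS} allowed"
        ));
    }
    Ok(())
}

/// Runs the stages in order, checking the signal before each one.
pub fn run_pipeline(signal: &ShutdownSignal, mut run_stage: impl FnMut(Stage)) -> Outcome {
    for stage in STAGES.iter().copied() {
        if signal.is_cancelled() {
            return Outcome::CancelledBefore(stage);
        }
        run_stage(stage);
    }
    Outcome::Completed
}
```

Checking the signal at stage boundaries keeps each stage's own logic free of cancellation plumbing. The real pipeline may also check inside long-running stages.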

New modules:
- src/ingestion/surgical.rs: Core surgical fetch/ingest/dependent logic
  with preflight_fetch(), ingest_issue_by_iid(), ingest_mr_by_iid(),
  and fetch_dependents_for_{issue,mr}()
- src/cli/commands/sync_surgical.rs: Full CLI orchestrator with progress
  spinners, human/robot output, and cancellation handling
- src/embedding/pipeline.rs: embed_documents_by_ids() for scoped embedding
- src/documents/regenerator.rs: regenerate_dirty_documents_for_sources()
  for scoped document regeneration

Database changes:
- Migration 027: Extends sync_runs with mode, phase, surgical_iids_json,
  per-entity counters, and cancelled_at column
- New indexes: idx_sync_runs_mode_started, idx_sync_runs_status_phase_started

GitLab client:
- get_issue_by_iid() and get_mr_by_iid() single-entity fetch methods

Error handling:
- New SurgicalPreflightFailed error variant with entity_type, iid, project,
  and reason fields. Shares exit code 6 with GitLabNotFound.
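
One way the new variant and its shared exit code might look is sketched below. The enum name `SyncError`, the field types, the `resource` payload on `GitLabNotFound`, and the message wording are all assumptions made for illustration. Only the variant names, the four field names, and the exit code 6 come from this commit message.

```rust
use std::fmt;

/// Hypothetical error enum; the real type in the codebase may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SyncError {
    /// Payload shape is an assumption.
    GitLabNotFound { resource: String },
    /// Field names follow the commit message; field types are assumptions.
    SurgicalPreflightFailed {
        entity_type: String,
        iid: u64,
        project: String,
        reason: String,
    },
}

impl SyncError {
    /// Both variants map to the same process exit code.
    pub fn exit_code(&self) -> i32 {
        match self {
            SyncError::GitLabNotFound { .. }
            | SyncError::SurgicalPreflightFailed { .. } => 6,
        }
    }
}

impl fmt::Display for SyncError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SyncError::GitLabNotFound { resource } => {
                write!(f, "GitLab resource not found: {resource}")
            }
            SyncError::SurgicalPreflightFailed {
                entity_type,
                iid,
                project,
                reason,
            } => write!(
                f,
                "surgical preflight failed for {entity_type} {iid} in {project}: {reason}"
            ),
        }
    }
}

impl std::error::Error for SyncError {}
```

Sharing the exit code lets callers that only inspect the process status treat a failed preflight like any other "not found" result, while the structured fields remain available for human or robot output.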

Includes comprehensive test coverage:
- 645 lines of surgical ingestion tests (wiremock-based)
- 184 lines of scoped embedding tests
- 85 lines of scoped regeneration tests
- 113 lines of GitLab client single-entity tests
- 236 lines of sync_run surgical column/counter tests
- Unit tests for SyncOptions, error codes, and CLI validation

@@ -84,6 +84,60 @@ pub fn regenerate_dirty_documents(
    Ok(result)
}

#[derive(Debug, Default)]
pub struct RegenerateForSourcesResult {
    pub regenerated: usize,
    pub unchanged: usize,
    pub errored: usize,
    pub document_ids: Vec<i64>,
}

pub fn regenerate_dirty_documents_for_sources(
    conn: &Connection,
    source_keys: &[(SourceType, i64)],
) -> Result<RegenerateForSourcesResult> {
    let mut result = RegenerateForSourcesResult::default();
    let mut cache = ParentMetadataCache::new();

    for &(source_type, source_id) in source_keys {
        match regenerate_one(conn, source_type, source_id, &mut cache) {
            Ok(changed) => {
                if changed {
                    result.regenerated += 1;
                } else {
                    result.unchanged += 1;
                }
                clear_dirty(conn, source_type, source_id)?;

                // Try to collect the document_id if a document exists
                if let Ok(doc_id) = get_document_id(conn, source_type, source_id) {
                    result.document_ids.push(doc_id);
                }
            }
            Err(e) => {
                warn!(
                    source_type = %source_type,
                    source_id,
                    error = %e,
                    "Failed to regenerate document for source"
                );
                record_dirty_error(conn, source_type, source_id, &e.to_string())?;
                result.errored += 1;
            }
        }
    }

    debug!(
        regenerated = result.regenerated,
        unchanged = result.unchanged,
        errored = result.errored,
        document_ids = result.document_ids.len(),
        "Scoped document regeneration complete"
    );

    Ok(result)
}

fn regenerate_one(
    conn: &Connection,
    source_type: SourceType,
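
The loop in the hunk above isolates failures per source: one source that fails to regenerate is counted and logged, and the rest of the batch still runs. A self-contained model of that tallying pattern is sketched below. It is not the real function: the database connection, `SourceType`, and the dirty-queue helpers are replaced by closures, the name `regenerate_for_sources()` and the `Tally` struct are hypothetical, and the `?` propagation of database errors from `clear_dirty()` and `record_dirty_error()` is left out.

```rust
/// Mirrors the counters in `RegenerateForSourcesResult` (simplified model).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Tally {
    pub regenerated: usize,
    pub unchanged: usize,
    pub errored: usize,
    pub document_ids: Vec<i64>,
}

/// Processes every source id, counting outcomes instead of stopping
/// at the first regeneration failure.
///
/// `regenerate_one` returns Ok(true) if the document changed, Ok(false) if
/// it was already up to date, and Err on failure. `lookup_document_id`
/// returns the document id when a document exists for the source.
pub fn regenerate_for_sources<R, L>(
    source_ids: &[i64],
    mut regenerate_one: R,
    mut lookup_document_id: L,
) -> Tally
where
    R: FnMut(i64) -> Result<bool, String>,
    L: FnMut(i64) -> Option<i64>,
{
    let mut tally = Tally::default();
    for &source_id in source_ids {
        match regenerate_one(source_id) {
            Ok(changed) => {
                if changed {
                    tally.regenerated += 1;
                } else {
                    tally.unchanged += 1;
                }
                // A missing document is not an error; the id is simply skipped.
                if let Some(doc_id) = lookup_document_id(source_id) {
                    tally.document_ids.push(doc_id);
                }
            }
            // The real code also logs the failure and records it on the dirty row.
            Err(_) => tally.errored += 1,
        }
    }
    tally
}
```

The collected `document_ids` are what make the later stage scoped: a caller can pass them to a function such as `embed_documents_by_ids()` so that only the documents touched by this run are re-embedded.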