feat(surgical-sync): add per-IID surgical sync pipeline with preflight validation

Add the ability to sync specific issues or merge requests by IID without
running a full incremental sync. This enables fast, targeted data refresh
for individual entities — useful for agent workflows, debugging, and
real-time investigation of specific issues or MRs.

Architecture:
- New CLI flags: --issue <IID> and --mr <IID> (repeatable, up to 100 total)
  scoped to a single project via -p/--project
- Preflight phase validates all IIDs exist on GitLab before any DB writes,
  with TOCTOU-aware soft verification at ingest time
- 6-stage pipeline: preflight -> fetch -> ingest -> dependents -> docs -> embed
- Each stage is cancellation-aware via ShutdownSignal
- Dedicated SyncRunRecorder extensions track surgical-specific counters
  (issues_fetched, mrs_ingested, docs_regenerated, etc.)

New modules:
- src/ingestion/surgical.rs: Core surgical fetch/ingest/dependent logic
  with preflight_fetch(), ingest_issue_by_iid(), ingest_mr_by_iid(),
  and fetch_dependents_for_{issue,mr}()
- src/cli/commands/sync_surgical.rs: Full CLI orchestrator with progress
  spinners, human/robot output, and cancellation handling
- src/embedding/pipeline.rs: embed_documents_by_ids() for scoped embedding
- src/documents/regenerator.rs: regenerate_dirty_documents_for_sources()
  for scoped document regeneration

Database changes:
- Migration 027: Extends sync_runs with mode, phase, surgical_iids_json,
  per-entity counters, and cancelled_at column
- New indexes: idx_sync_runs_mode_started, idx_sync_runs_status_phase_started

GitLab client:
- get_issue_by_iid() and get_mr_by_iid() single-entity fetch methods

Error handling:
- New SurgicalPreflightFailed error variant with entity_type, iid, project,
  and reason fields. Shares exit code 6 with GitLabNotFound.

Includes comprehensive test coverage:
- 645 lines of surgical ingestion tests (wiremock-based)
- 184 lines of scoped embedding tests
- 85 lines of scoped regeneration tests
- 113 lines of GitLab client single-entity tests
- 236 lines of sync_run surgical column/counter tests
- Unit tests for SyncOptions, error codes, and CLI validation
This commit is contained in:
teernisse
2026-02-18 16:27:59 -05:00
parent ea6e45e43f
commit 9ec1344945
25 changed files with 3354 additions and 37 deletions

View File

@@ -4,7 +4,7 @@ pub mod progress;
pub mod render;
pub mod robot;
use clap::{Parser, Subcommand};
use clap::{Args, Parser, Subcommand};
use std::io::IsTerminal;
#[derive(Parser)]
@@ -298,6 +298,15 @@ pub enum Commands {
lore cron uninstall # Remove cron job")]
Cron(CronArgs),
/// Manage stored GitLab token
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore token set # Interactive token entry + validation
lore token set --token glpat-xxx # Non-interactive token storage
echo glpat-xxx | lore token set # Pipe token from stdin
lore token show # Show token (masked)
lore token show --unmask # Show full token")]
Token(TokenArgs),
#[command(hide = true)]
List {
#[arg(value_parser = ["issues", "mrs"])]
@@ -798,7 +807,9 @@ pub struct GenerateDocsArgs {
lore sync --no-embed # Skip embedding step
lore sync --no-status # Skip work-item status enrichment
lore sync --full --force # Full re-sync, override stale lock
lore sync --dry-run # Preview what would change")]
lore sync --dry-run # Preview what would change
lore sync --issue 42 -p group/repo # Surgically sync one issue
lore sync --mr 10 --mr 20 -p g/r # Surgically sync two MRs")]
pub struct SyncArgs {
/// Reset cursors, fetch everything
#[arg(long, overrides_with = "no_full")]
@@ -848,6 +859,22 @@ pub struct SyncArgs {
/// Acquire file lock before syncing (skip if another sync is running)
#[arg(long)]
pub lock: bool,
/// Surgically sync specific issues by IID (repeatable, must be positive)
#[arg(long, value_parser = clap::value_parser!(u64).range(1..), action = clap::ArgAction::Append)]
pub issue: Vec<u64>,
/// Surgically sync specific merge requests by IID (repeatable, must be positive)
#[arg(long, value_parser = clap::value_parser!(u64).range(1..), action = clap::ArgAction::Append)]
pub mr: Vec<u64>,
/// Scope to a single project (required when --issue or --mr is used)
#[arg(short = 'p', long)]
pub project: Option<String>,
/// Validate remote entities exist without DB writes (preflight only)
#[arg(long)]
pub preflight_only: bool,
}
#[derive(Parser)]
@@ -973,15 +1000,14 @@ pub struct WhoArgs {
#[arg(short = 'p', long, help_heading = "Filters")]
pub project: Option<String>,
/// Maximum results per section (1..=500, bounded for output safety)
/// Maximum results per section (1..=500); omit for unlimited
#[arg(
short = 'n',
long = "limit",
default_value = "20",
value_parser = clap::value_parser!(u16).range(1..=500),
help_heading = "Output"
)]
pub limit: u16,
pub limit: Option<u16>,
/// Select output fields (comma-separated, or 'minimal' preset; varies by mode)
#[arg(long, help_heading = "Output", value_delimiter = ',')]
@@ -1128,3 +1154,26 @@ pub enum CronAction {
/// Show current cron configuration
Status,
}
#[derive(Args)]
pub struct TokenArgs {
#[command(subcommand)]
pub action: TokenAction,
}
#[derive(Subcommand)]
pub enum TokenAction {
/// Store a GitLab token in the config file
Set {
/// Token value (reads from stdin if omitted in non-interactive mode)
#[arg(long)]
token: Option<String>,
},
/// Show the current token (masked by default)
Show {
/// Show the full unmasked token
#[arg(long)]
unmask: bool,
},
}