feat(ingestion): Implement MR sync with parallel discussion prefetch

Adds complete merge request ingestion pipeline with a novel two-phase
discussion sync strategy optimized for throughput.

New modules:
- merge_requests.rs: MR upsert with labels/assignees/reviewers handling,
  stale MR cleanup, and watermark-based incremental sync
- mr_discussions.rs: Parallel prefetch strategy for MR discussions

Two-phase MR discussion sync:
1. PREFETCH PHASE: Spawn concurrent tasks to fetch discussions for
   multiple MRs simultaneously (configurable concurrency, default 8).
   Transform and validate in parallel, storing results in memory.
2. WRITE PHASE: Serial database writes to avoid lock contention.
   Each MR's discussions written in a single transaction, with
   proper stale discussion cleanup.

This approach achieves ~4-8x throughput vs serial fetching while
maintaining database consistency. Transform errors are tracked per-MR
to prevent partial writes from corrupting watermarks.

Orchestrator updates:
- ingest_merge_requests(): Coordinates MR fetch -> discussion sync flow
- Progress callbacks emit MR-specific events for UI feedback
- Respects --full flag to reset discussion watermarks for full resync

The prefetch strategy is critical for MRs which typically have more
discussions than issues, and where API latency dominates sync time.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Taylor Eernisse
2026-01-26 22:45:48 -05:00
parent d33f24c91b
commit cd44e516e3
6 changed files with 1458 additions and 26 deletions

View File

@@ -148,12 +148,11 @@ fn passes_cursor_filter_with_ts(gitlab_id: i64, issue_ts: i64, cursor: &SyncCurs
return false;
}
if issue_ts == cursor_ts {
if let Some(cursor_id) = cursor.tie_breaker_id {
if gitlab_id <= cursor_id {
return false;
}
}
if issue_ts == cursor_ts
&& let Some(cursor_id) = cursor.tie_breaker_id
&& gitlab_id <= cursor_id
{
return false;
}
true
@@ -219,6 +218,7 @@ fn process_single_issue(
}
/// Inner function that performs all DB operations within a transaction.
#[allow(clippy::too_many_arguments)]
fn process_issue_in_transaction(
tx: &Transaction<'_>,
config: &Config,
@@ -366,7 +366,11 @@ fn link_issue_label_tx(tx: &Transaction<'_>, issue_id: i64, label_id: i64) -> Re
}
/// Upsert a milestone within a transaction, returning its local ID.
fn upsert_milestone_tx(tx: &Transaction<'_>, project_id: i64, milestone: &MilestoneRow) -> Result<i64> {
fn upsert_milestone_tx(
tx: &Transaction<'_>,
project_id: i64,
milestone: &MilestoneRow,
) -> Result<i64> {
tx.execute(
"INSERT INTO milestones (gitlab_id, project_id, iid, title, description, state, due_date, web_url)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8)