perf(ingestion): implement prefetch pattern for issue discussions

Issue discussion sync was ~10x slower than MR discussion sync because it
used a fully sequential pattern: fetch one issue's discussions, write to
DB, repeat. MR sync already used a prefetch pattern with concurrent HTTP
requests followed by sequential DB writes.

This commit brings issue discussion sync to parity with MRs:

Architecture (prefetch pattern):
  1. HTTP phase: Concurrent fetches via `join_all()` with batch size
     controlled by `dependent_concurrency` config (default 8)
  2. Transform phase: Normalize discussions and notes during prefetch
  3. DB phase: Sequential writes with proper transaction boundaries

Changes:
  - gitlab/client.rs: Add `fetch_all_issue_discussions()` to mirror
    the existing MR pattern for API consistency
  - discussions.rs: Replace `ingest_issue_discussions()` with:
    * `prefetch_issue_discussions()` - async HTTP fetch + transform
    * `write_prefetched_issue_discussions()` - sync DB writes
    * New structs: `PrefetchedIssueDiscussions`, `PrefetchedDiscussion`
  - orchestrator.rs: Update `sync_discussions_sequential()` to use
    concurrent prefetch for each batch instead of sequential calls
  - surgical.rs: Update single-issue surgical sync to use new functions
  - mod.rs: Update public exports

Expected improvement: 5-10x speedup on issue discussion sync (from ~50s
to ~5-10s for large projects) due to concurrent HTTP round-trips.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
teernisse
2026-03-02 14:13:51 -05:00
parent b67bb8754c
commit 5fd1ce6905
5 changed files with 132 additions and 119 deletions

View File

@@ -576,6 +576,23 @@ impl GitLabClient {
Ok(discussions)
}
pub async fn fetch_all_issue_discussions(
&self,
gitlab_project_id: i64,
issue_iid: i64,
) -> Result<Vec<GitLabDiscussion>> {
use futures::StreamExt;
let mut discussions = Vec::new();
let mut stream = self.paginate_issue_discussions(gitlab_project_id, issue_iid);
while let Some(result) = stream.next().await {
discussions.push(result?);
}
Ok(discussions)
}
}
impl GitLabClient {