feat(ingestion): Implement cursor-based incremental sync from GitLab

Syncs only the issues and discussions that changed since the last run, keeping GitLab API calls to a minimum.

src/ingestion/issues.rs - Issue sync logic:
- Cursor-based incremental sync using updated_at timestamp
- Fetches only issues modified since last sync
- Configurable cursor rewind for overlap safety (default 2s)
- Batched database writes with transaction wrapping
- Upserts issues, labels, milestones, and assignees
- Maintains issue_labels and issue_assignees junction tables
- Returns IngestIssuesResult with counts and issues needing discussion sync
- Identifies issues where discussion count changed
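
The cursor rewind described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the code in src/ingestion/issues.rs: `SyncCursor`, `rewound_bound`, and `advance` are hypothetical names, and timestamps are simplified to Unix seconds.

```rust
use std::time::Duration;

/// Hypothetical cursor: the newest `updated_at` seen so far, in Unix seconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SyncCursor {
    pub updated_at_secs: u64,
}

/// Lower bound for the next fetch: the cursor moved back by `rewind`, so that
/// issues updated within the overlap window are fetched again.
/// Returns `None` when there is no cursor yet (first sync fetches everything).
pub fn rewound_bound(cursor: Option<SyncCursor>, rewind: Duration) -> Option<u64> {
    cursor.map(|c| c.updated_at_secs.saturating_sub(rewind.as_secs()))
}

/// Advance the cursor to the newest timestamp seen, never moving it backwards.
/// Re-fetched rows from the overlap window therefore cannot regress the cursor.
pub fn advance(cursor: Option<SyncCursor>, seen_updated_at_secs: u64) -> SyncCursor {
    match cursor {
        Some(c) if c.updated_at_secs >= seen_updated_at_secs => c,
        _ => SyncCursor { updated_at_secs: seen_updated_at_secs },
    }
}
```

Because the overlap window re-fetches some rows, this scheme relies on the writes being idempotent, which is what the upserts listed above provide.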

src/ingestion/discussions.rs - Discussion sync logic:
- Fetches discussions for issues that need sync
- Compares remote discussion count against stored count to detect changes
- Batched note insertion with raw payload preservation
- Updates discussion metadata (resolved state, note counts)
- Tracks sync state per discussion to enable incremental updates
- Returns IngestDiscussionsResult with fetched/skipped counts
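
The count comparison that drives the fetched/skipped split might look like the sketch below. `DiscussionSyncDecision` and `decide` are hypothetical names introduced for illustration, and treating a missing stored count as "fetch" is an assumption, not something the commit states.

```rust
/// Hypothetical outcome of the change check for one issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiscussionSyncDecision {
    Fetch,
    Skip,
}

/// Decide whether an issue's discussions need to be fetched.
/// `stored_count` is `None` when the issue has never been synced
/// (assumed here to always require a fetch).
pub fn decide(remote_count: u64, stored_count: Option<u64>) -> DiscussionSyncDecision {
    match stored_count {
        // Counts match: nothing was added or removed, so skip the API call.
        Some(stored) if stored == remote_count => DiscussionSyncDecision::Skip,
        // Counts differ, or there is no stored state yet.
        _ => DiscussionSyncDecision::Fetch,
    }
}
```

A count-only check cannot see edits that leave the count unchanged, so a real implementation may need an additional signal for that case.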

src/ingestion/orchestrator.rs - Sync coordination:
- Two-phase sync: issues first, then discussions
- Progress callback support for CLI progress bars
- ProgressEvent enum for fine-grained status updates:
  - IssueFetch, IssueProcess, DiscussionFetch, DiscussionSkip
- Acquires sync lock before starting
- Updates sync watermark on successful completion
- Leaves the sync watermark unchanged on partial failure
- Returns IngestProjectResult with detailed statistics
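
The two-phase flow and the watermark rule could be sketched as follows. Only the `ProgressEvent` variant names come from this commit; the unit variants (no payloads), the `run_two_phase` function, the `String` error type, and the closure-based phases are assumptions made to keep the example self-contained. Lock acquisition is omitted.

```rust
/// Variant names as listed in the commit message; payloads are omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProgressEvent {
    IssueFetch,
    IssueProcess,
    DiscussionFetch,
    DiscussionSkip,
}

/// Hypothetical orchestrator: run issues, then discussions, and move the
/// watermark only if both phases succeed.
pub fn run_two_phase(
    sync_issues: impl FnOnce() -> Result<(), String>,
    sync_discussions: impl FnOnce() -> Result<(), String>,
    watermark: &mut Option<u64>,
    candidate_watermark: u64,
    mut on_progress: impl FnMut(ProgressEvent),
) -> Result<(), String> {
    // Phase 1: issues. An error returns early and leaves the watermark alone.
    on_progress(ProgressEvent::IssueFetch);
    sync_issues()?;
    on_progress(ProgressEvent::IssueProcess);

    // Phase 2: discussions for the issues flagged in phase 1.
    on_progress(ProgressEvent::DiscussionFetch);
    sync_discussions()?;

    // Both phases succeeded, so it is safe to advance the watermark.
    *watermark = Some(candidate_watermark);
    Ok(())
}
```

Leaving the watermark untouched on failure means the next run starts from the same point and repeats the failed range rather than silently skipping it.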

The architecture supports future additions:
- Merge request ingestion (parallel to issues)
- Full-text search indexing hooks
- Vector embedding pipeline integration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Taylor Eernisse
2026-01-26 11:28:34 -05:00
parent dd5eb04953
commit cd60350c6d
4 changed files with 1153 additions and 0 deletions

src/ingestion/mod.rs Normal file

@@ -0,0 +1,15 @@
//! Data ingestion modules for GitLab resources.
//!
//! This module handles fetching and storing issues, discussions, and notes
//! from GitLab with cursor-based incremental sync.
pub mod discussions;
pub mod issues;
pub mod orchestrator;
pub use discussions::{IngestDiscussionsResult, ingest_issue_discussions};
pub use issues::{IngestIssuesResult, IssueForDiscussionSync, ingest_issues};
pub use orchestrator::{
    IngestProjectResult, ProgressCallback, ProgressEvent, ingest_project_issues,
    ingest_project_issues_with_progress,
};