Commit Graph

3 Commits

Author SHA1 Message Date
Taylor Eernisse
cd44e516e3 feat(ingestion): Implement MR sync with parallel discussion prefetch
Adds complete merge request ingestion pipeline with a novel two-phase
discussion sync strategy optimized for throughput.

New modules:
- merge_requests.rs: MR upsert with labels/assignees/reviewers handling,
  stale MR cleanup, and watermark-based incremental sync
- mr_discussions.rs: Parallel prefetch strategy for MR discussions

Two-phase MR discussion sync:
1. PREFETCH PHASE: Spawn concurrent tasks to fetch discussions for
   multiple MRs simultaneously (configurable concurrency, default 8).
   Transform and validate in parallel, storing results in memory.
2. WRITE PHASE: Serial database writes to avoid lock contention.
   Each MR's discussions written in a single transaction, with
   proper stale discussion cleanup.

This approach achieves ~4-8x throughput vs serial fetching while
maintaining database consistency. Transform errors are tracked per-MR
to prevent partial writes from corrupting watermarks.

Orchestrator updates:
- ingest_merge_requests(): Coordinates MR fetch -> discussion sync flow
- Progress callbacks emit MR-specific events for UI feedback
- Respects --full flag to reset discussion watermarks for full resync

The prefetch strategy is critical for MRs which typically have more
discussions than issues, and where API latency dominates sync time.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 22:45:48 -05:00
Taylor Eernisse
d9d749ac57 fix(discussion): Make NormalizedDiscussion polymorphic for MR support
This is a P0 fix from the CP1-CP2 alignment audit. The original
NormalizedDiscussion struct had issue_id as a non-optional i64 and
hardcoded noteable_type to "Issue", making it incompatible with merge
request discussions even though the database schema already supports
both via nullable columns and a CHECK constraint.

Changes:
- Add NoteableRef enum with Issue(i64) and MergeRequest(i64) variants
  to provide compile-time safety against mixing up issue vs MR IDs
- Change NormalizedDiscussion.issue_id from i64 to Option<i64>
- Add NormalizedDiscussion.merge_request_id: Option<i64>
- Update transform_discussion() signature to take NoteableRef instead
  of local_issue_id, deriving issue_id/merge_request_id/noteable_type
  from the enum variant
- Update upsert_discussion() SQL to include merge_request_id column
  (now 12 parameters instead of 11)
- Export NoteableRef from transformers module
- Add test for MergeRequest discussion transformation
- Update all existing tests to use NoteableRef::Issue(id)

The database schema (migration 002) was forward-thinking and already
supports both issue_id and merge_request_id as nullable columns with
a CHECK constraint. This change prepares the application layer for
CP2 merge request support without requiring any migrations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 17:00:49 -05:00
Taylor Eernisse
cd60350c6d feat(ingestion): Implement cursor-based incremental sync from GitLab
Provides efficient data synchronization with minimal API calls.

src/ingestion/issues.rs - Issue sync logic:
- Cursor-based incremental sync using updated_at timestamp
- Fetches only issues modified since last sync
- Configurable cursor rewind for overlap safety (default 2s)
- Batched database writes with transaction wrapping
- Upserts issues, labels, milestones, and assignees
- Maintains issue_labels and issue_assignees junction tables
- Returns IngestIssuesResult with counts and issues needing discussion sync
- Identifies issues where discussion count changed

src/ingestion/discussions.rs - Discussion sync logic:
- Fetches discussions for issues that need sync
- Compares discussion count vs stored to detect changes
- Batched note insertion with raw payload preservation
- Updates discussion metadata (resolved state, note counts)
- Tracks sync state per discussion to enable incremental updates
- Returns IngestDiscussionsResult with fetched/skipped counts

src/ingestion/orchestrator.rs - Sync coordination:
- Two-phase sync: issues first, then discussions
- Progress callback support for CLI progress bars
- ProgressEvent enum for fine-grained status updates:
  - IssueFetch, IssueProcess, DiscussionFetch, DiscussionSkip
- Acquires sync lock before starting
- Updates sync watermark on successful completion
- Handles partial failures gracefully (watermark not updated)
- Returns IngestProjectResult with detailed statistics

The architecture supports future additions:
- Merge request ingestion (parallel to issues)
- Full-text search indexing hooks
- Vector embedding pipeline integration

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 11:28:34 -05:00