Commit Graph

14 Commits

Author SHA1 Message Date
Taylor Eernisse
ee5c5f9645 perf: Eliminate double serialization, add SQLite tuning, optimize hot paths
11 isomorphic performance fixes from deep audit (no behavior changes):

- Eliminate double serialization: store_payload now accepts pre-serialized
  bytes (&[u8]) instead of re-serializing from serde_json::Value. Uses
  Cow<[u8]> for zero-copy when compression is disabled.
- Add SQLite cache_size (64MB) and mmap_size (256MB) pragmas
- Replace SELECT-then-INSERT label upserts with INSERT...ON CONFLICT
  RETURNING in both issues.rs and merge_requests.rs
- Replace INSERT + SELECT milestone upsert with RETURNING
- Use prepare_cached for 5 hot-path queries in extractor.rs
- Optimize compute_list_hash: index-sort + incremental SHA-256 instead
  of clone+sort+join+hash
- Pre-allocate embedding float-to-bytes buffer with Vec::with_capacity
- Replace RandomState::new() in rand_jitter with atomic counter XOR nanos
- Remove redundant per-note payload storage (discussion payload contains
  all notes already)
- Change transform_issue to accept &GitLabIssue (avoids full struct clone)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 08:12:37 -05:00
Taylor Eernisse
f5b4a765b7 perf: Configurable rate limit, 429 auto-retry, concurrent project ingestion
The sync pipeline was bottlenecked at 10 req/s (hardcoded) with
sequential project processing and no retry on rate limiting. These
changes target 3-5x throughput improvement.

Rate limit configuration:
- Add requestsPerSecond to SyncConfig (default 30.0, was hardcoded 10)
- Pass configured rate through to GitLabClient::new from ingest
- Floor rate at 0.1 rps in RateLimiter::new to prevent panic on
  Duration::from_secs_f64(1.0 / 0.0) — now reachable via user config

429 auto-retry:
- Both request() and request_with_headers() retry up to 3 times on
  HTTP 429, respecting the retry-after header (default 60s)
- Extract parse_retry_after helper, reused by handle_response fallback
- After exhausting retries, the 429 error propagates as before
- Improved JSON decode errors now include a response body preview

Concurrent project ingestion:
- Derive Clone on GitLabClient (cheap: shares Arc<Mutex<RateLimiter>>
  and reqwest::Client which is already Arc-backed)
- Restructure project loop to use futures::stream::buffer_unordered
  with primary_concurrency (default 4) as the parallelism bound
- Each project gets its own SQLite connection (WAL mode + busy_timeout
  handles concurrent writes)
- Add show_spinner field to IngestDisplay to separate the per-project
  spinner from the sync-level stage spinner
- Error aggregation defers failures: all successful projects get their
  summaries printed and results counted before returning the first error
- Bump dependentConcurrency default from 2 to 8 for discussion prefetch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:37:06 -05:00
Taylor Eernisse
a92e176bb6 fix(events): Handle nullable label and milestone in resource events
GitLab returns null for the label/milestone fields on resource_label_events
and resource_milestone_events when the referenced label or milestone has
been deleted. This caused deserialization failures during sync.

- Add migration 012 to recreate both event tables with nullable
  label_name, milestone_title, and milestone_id columns (SQLite
  requires table recreation to alter NOT NULL constraints)
- Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone
  to Option<> in the Rust types
- Update upsert functions to pass through None values correctly
- Add tests for null label and null milestone deserialization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:36:17 -05:00
Taylor Eernisse
deafa88af5 perf: Concurrent resource event fetching, remove unnecessary async
client.rs:
- fetch_all_resource_events() now uses tokio::try_join!() to fire all
  three API requests (state, label, milestone events) concurrently
  instead of awaiting each sequentially. For entities with many events,
  this reduces wall-clock time by up to ~3x since the three independent
  HTTP round-trips overlap.

main.rs:
- Removed async from handle_issues() and handle_mrs(). These functions
  perform only synchronous database queries and formatting; they never
  await anything. Removing the async annotation avoids the overhead of
  an unnecessary Future state machine and makes the sync nature of
  these code paths explicit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 14:09:44 -05:00
Taylor Eernisse
a50fc78823 style: Apply cargo fmt and clippy fixes across codebase
Automated formatting and lint corrections from parallel agent work:

- cargo fmt: import reordering (alphabetical), line wrapping to respect
  max width, trailing comma normalization, destructuring alignment,
  function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
  i64::from() instead of `as i64` casts, .clamp() instead of
  .max().min() chains, let-chain refactors (if-let with &&),
  #[allow(clippy::too_many_arguments)] and
  #[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace

No behavioral changes. All existing tests pass unmodified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 13:01:59 -05:00
Taylor Eernisse
e73d2907dc feat(client): Add Resource Events API endpoints with generic paginated fetcher
Extends GitLabClient with methods for fetching resource events from
GitLab's per-entity API endpoints. Adds a new impl block containing:

- fetch_all_pages<T>: Generic paginated collector that handles
  x-next-page header parsing with fallback to page-size heuristics.
  Uses per_page=100 and respects the existing rate limiter via
  request_with_headers. Terminates when: (a) x-next-page header is
  absent/stale, (b) response is empty, or (c) page is not full.

- Six typed endpoint methods:
  - fetch_issue_state_events / fetch_mr_state_events
  - fetch_issue_label_events / fetch_mr_label_events
  - fetch_issue_milestone_events / fetch_mr_milestone_events

- fetch_all_resource_events: Convenience method that fetches all three
  event types for an entity (issue or merge_request) in sequence,
  returning a tuple of (state, label, milestone) event vectors.
  Routes to issue or MR endpoints based on entity_type string.

All methods follow the existing client patterns: path formatting with
gitlab_project_id and iid, error propagation via Result, and rate
limiter integration through the shared request_with_headers path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:07:19 -05:00
Taylor Eernisse
92ff255909 feat(types): Add GitLab Resource Event serde types with deserialization tests
Adds six new types for deserializing responses from GitLab's three
Resource Events API endpoints (state, label, milestone):

- GitLabStateEvent: State transitions with optional user, source_commit,
  and source_merge_request reference
- GitLabLabelEvent: Label add/remove events with nested GitLabLabelRef
- GitLabMilestoneEvent: Milestone assignment changes with nested
  GitLabMilestoneRef
- GitLabMergeRequestRef: Lightweight MR reference (iid, title, web_url)
- GitLabLabelRef: Label metadata (id, name, color, description)
- GitLabMilestoneRef: Milestone metadata (id, iid, title)

All types derive Deserialize + Serialize and use Option<T> for nullable
fields (user, source_commit, color, description) to match GitLab's API
contract where these fields may be null.

Includes 8 new test cases covering:
- State events with/without user, with/without source_merge_request
- Label events for add and remove actions, including null color handling
- Milestone event deserialization
- Standalone ref type deserialization (MR, label, milestone)

Uses r##"..."## raw string delimiters where JSON contains hex color
codes (#FF0000) that would conflict with r#"..."# delimiters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:06:56 -05:00
Taylor Eernisse
d31d5292f2 fix(gitlab): Improve pagination heuristics and fix rate limiter lock contention
Two targeted fixes to the GitLab API client:

1. Pagination: When the x-next-page header is missing but the current
   page returned a full page of results, heuristically advance to the
   next page instead of stopping. This fixes silent data truncation
   observed with certain GitLab instances that omit pagination headers
   on intermediate pages. The existing early-exit on empty or partial
   pages remains as the termination condition.

2. Rate limiter: Refactor the async acquire() method into a synchronous
   check_delay() that computes the required sleep duration and updates
   last_request time while holding the mutex, then releases the lock
   before sleeping. This eliminates holding the Mutex<RateLimiter>
   across an await point, which previously could block other request
   tasks unnecessarily during the sleep interval.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:46:05 -05:00
Taylor Eernisse
390f8a9288 refactor(core): Centralize timestamp parsing in core::time
Duplicate ISO 8601 timestamp parsing functions existed in both
discussion.rs and merge_request.rs transformers. This extracts
iso_to_ms_strict() and iso_to_ms_opt_strict() into core::time
as the single source of truth, and updates both transformer
modules to use the shared implementations.

Also removes the private now_ms() from merge_request.rs in
favor of the existing core::time::now_ms(), and replaces the
local parse_timestamp_opt() in discussion.rs with the public
iso_to_ms() from core::time.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 08:41:34 -05:00
Taylor Eernisse
d33f24c91b feat(transformers): Add MR transformer and polymorphic discussion support
Introduces NormalizedMergeRequest transformer and updates discussion
normalization to handle both issue and MR discussions polymorphically.

New transformers:
- NormalizedMergeRequest: Transforms API MergeRequest to database row,
  extracting labels/assignees/reviewers into separate collections for
  junction table insertion. Handles draft detection, detailed_merge_status
  preference over deprecated merge_status, and merge_user over merged_by.

Discussion transformer updates:
- NormalizedDiscussion now takes noteable_type ("Issue" | "MergeRequest")
  and noteable_id for polymorphic FK binding
- normalize_discussions_for_issue(): Convenience wrapper for issues
- normalize_discussions_for_mr(): Convenience wrapper for MRs
- DiffNote position fields (type, line_range, SHA triplet) now extracted
  from API position object for code review context

Design decisions:
- Transformer returns (normalized_item, labels, assignees, reviewers)
  tuple for efficient batch insertion without re-querying
- Timestamps converted to ms epoch for SQLite storage consistency
- Optional fields use map() chains for clean null handling

The polymorphic discussion approach allows reusing the same discussions
and notes tables for both issues and MRs, with noteable_type + FK
determining the parent relationship.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 22:45:29 -05:00
Taylor Eernisse
cc8c489fd2 feat(gitlab): Add MR and MR discussion API endpoints to client
Extends GitLabClient with endpoints for fetching merge requests and
their discussions, following the same patterns established for issues.

New methods:
- fetch_merge_requests(): Paginated MR listing with cursor support,
  using updated_after filter for incremental sync. Uses 'all' scope
  to include MRs where user is author/assignee/reviewer.
- fetch_merge_requests_single_page(): Single page variant for callers
  managing their own pagination (used by parallel prefetch)
- fetch_mr_discussions(): Paginated discussion listing for a single MR,
  returns full discussion trees with notes

API design notes:
- Uses keyset pagination (order_by=updated_at, keyset=true) for
  consistent results during sync operations
- MR endpoint uses /merge_requests (not /mrs) per GitLab API naming
- Discussion endpoint matches issue pattern for consistency
- Per_page defaults to 100 (GitLab max) for efficiency

The fetch_merge_requests_single_page method enables the parallel
prefetch strategy used in mr_discussions.rs, where multiple MRs'
discussions are fetched concurrently during the sweep phase.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 22:45:13 -05:00
Taylor Eernisse
a18908c377 feat(gitlab): Add MergeRequest and related types for API deserialization
Extends GitLab type definitions with comprehensive merge request support,
matching the API response structure for /projects/:id/merge_requests.

New types:
- MergeRequest: Full MR metadata including draft status, branch info,
  detailed_merge_status, merge_user (modern API fields replacing
  deprecated alternatives), and references for cross-project support
- MrReviewer: Reviewer user info (MR-specific, distinct from assignees)
- MrAssignee: Assignee user info with consistent structure
- MrDiscussion: MR discussion wrapper for polymorphic handling
- DiffNotePosition: Rich position data for code review comments with
  line ranges and SHA triplet for commit context

Design decisions:
- Use Option<T> for all nullable API fields to handle partial responses
- Include deprecated fields (merged_by, merge_status) alongside modern
  alternatives for backward compatibility with older GitLab instances
- DiffNotePosition uses Option for all fields since different position
  types (text/image/file) populate different subsets

These types enable type-safe deserialization of GitLab MR API responses
with full coverage of the fields needed for CP2 ingestion.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 22:44:58 -05:00
Taylor Eernisse
d9d749ac57 fix(discussion): Make NormalizedDiscussion polymorphic for MR support
This is a P0 fix from the CP1-CP2 alignment audit. The original
NormalizedDiscussion struct had issue_id as a non-optional i64 and
hardcoded noteable_type to "Issue", making it incompatible with merge
request discussions even though the database schema already supports
both via nullable columns and a CHECK constraint.

Changes:
- Add NoteableRef enum with Issue(i64) and MergeRequest(i64) variants
  to provide compile-time safety against mixing up issue vs MR IDs
- Change NormalizedDiscussion.issue_id from i64 to Option<i64>
- Add NormalizedDiscussion.merge_request_id: Option<i64>
- Update transform_discussion() signature to take NoteableRef instead
  of local_issue_id, deriving issue_id/merge_request_id/noteable_type
  from the enum variant
- Update upsert_discussion() SQL to include merge_request_id column
  (now 12 parameters instead of 11)
- Export NoteableRef from transformers module
- Add test for MergeRequest discussion transformation
- Update all existing tests to use NoteableRef::Issue(id)

The database schema (migration 002) was forward-thinking and already
supports both issue_id and merge_request_id as nullable columns with
a CHECK constraint. This change prepares the application layer for
CP2 merge request support without requiring any migrations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 17:00:49 -05:00
Taylor Eernisse
dd5eb04953 feat(gitlab): Implement GitLab REST API client and type definitions
Provides a typed interface to the GitLab API with pagination support.

src/gitlab/types.rs - API response type definitions:
- GitLabIssue: Full issue payload with author, assignees, labels
- GitLabDiscussion: Discussion thread with notes array
- GitLabNote: Individual note with author, timestamps, body
- GitLabAuthor/GitLabUser: User information with avatar URLs
- GitLabProject: Project metadata from /api/v4/projects
- GitLabVersion: GitLab instance version from /api/v4/version
- GitLabNotePosition: Line-level position for diff notes
- All types derive Deserialize for JSON parsing

src/gitlab/client.rs - HTTP client with authentication:
- Bearer token authentication from config
- Base URL configuration for self-hosted instances
- Paginated iteration via keyset or offset pagination
- Automatic Link header parsing for next page URLs
- Per-page limit control (default 100)
- Methods: get_user(), get_version(), get_project()
- Async stream for issues: list_issues_paginated()
- Async stream for discussions: list_issue_discussions_paginated()
- Respects GitLab rate limiting via response headers

src/gitlab/transformers/ - API to database mapping:

transformers/issue.rs - Issue transformation:
- Maps GitLabIssue to IssueRow for database insert
- Extracts milestone ID and due date
- Normalizes author/assignee usernames
- Preserves label IDs for junction table
- Returns IssueWithMetadata including label/assignee lists

transformers/discussion.rs - Discussion transformation:
- Maps GitLabDiscussion to NormalizedDiscussion
- Extracts thread metadata (resolvable, resolved)
- Flattens notes to NormalizedNote with foreign keys
- Handles system notes vs user notes
- Preserves note position for diff discussions

transformers/mod.rs - Re-exports all transformer types

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 11:28:21 -05:00