Previously, drain_resource_events, drain_mr_closes_issues, and
drain_mr_diffs each opened a transaction only for the job-complete +
watermark update, but the store operation ran outside that transaction.
If the process crashed between the store and the watermark update, data
would be persisted without the watermark advancing, causing silent
duplicates on the next sync.
Now each drain function opens the transaction before the store call and
commits it only after both the store and the watermark update succeed.
On error, the transaction is explicitly dropped so the connection is
not left in a half-committed state.
Also:
- store_resource_events no longer manages its own transaction; the caller
passes in a connection (which is actually the transaction)
- upsert_mr_file_changes wraps DELETE + INSERT in a transaction internally
- reset_discussion_watermarks now also clears diffs_synced_for_updated_at
- Orchestrator error span now includes closes_issues_failed + mr_diffs_failed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries)
- Add fetch_mr_file_changes config option and --no-file-changes CLI flag
- Add GitLab MR diffs API fetch pipeline with watermark-based sync
- Create migration 020 for diffs_synced_for_updated_at watermark column
- Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL:
DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers
- Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END)
- Add insert_file_change test helper, 8 new who tests, all 397 tests pass
- Also includes: list performance migration 019, autocorrect module, README updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The orchestrator already accepted a ShutdownSignal but only checked it
between phases (after all issues fetched, before discussions). The inner
loops in ingest_issues() and ingest_merge_requests() consumed entire
paginated streams without checking for cancellation.
On a large initial sync (thousands of issues/MRs), Ctrl+C could be
unresponsive for minutes while the current entity type finished draining.
Now both functions accept &ShutdownSignal and check is_cancelled() at
the top of each iteration, breaking out promptly and committing the
cursor for whatever was already processed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three hardening improvements to the ingestion orchestrator:
- Replace .unwrap_or(0) with ? on COUNT(*) queries for total_issues
and total_mrs. These are simple aggregate queries that should never
fail, but if they do (e.g. table missing after failed migration),
propagating the error gives an actionable message instead of silently
reporting 0 items.
- Wrap store_closes_issues_refs in a SAVEPOINT with proper
ROLLBACK/RELEASE. Previously, a failure mid-loop (e.g. on the 5th of
10 close-issue references) would leave partial refs committed. Now
the entire batch is atomic.
- Replace silent catch-all (_ => {}) arms in enqueue_resource_events
and update_resource_event_watermark with explicit warnings for
unknown entity_type values. Makes debugging easier when new entity
types are added but the match arms aren't updated.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes to the sync pipeline:
1. Atomic watermarks: wrap complete_job + update_watermark in a single
SQLite transaction so crash between them can't leave partial state.
2. Concurrent drain loops: prefetch HTTP requests via join_all (batch
size = dependent_concurrency), then write serially to DB. Reduces
~9K sequential requests from ~19 min to ~2.4 min.
3. Graceful shutdown: install Ctrl+C handler via ShutdownSignal
(Arc<AtomicBool>), thread through orchestrator/CLI, release locked
jobs on interrupt, record sync_run as "failed".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests
(Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark
for incremental sync, and the missing idx_label_events_label index.
The MR transformer and ingestion pipeline now populate commit SHAs during
sync. The orchestrator uses watermark-based filtering for closes_issues
jobs instead of re-enqueuing all MRs every sync.
The Phase B PRD is updated to match the actual codebase: corrected
migration numbering (011-015), documented nullable label/milestone
fields (migration 012), watermark patterns (013), observability
infrastructure (014), simplified source_method values, and updated
entity_references schema to match implementation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extends the MR ingestion pipeline to populate the entity_references table
from multiple sources:
1. Resource state events (extract_refs_from_state_events):
Called after draining the resource_events queue for both issues and MRs.
Extracts "closes" relationships from the structured API data.
2. System notes (extract_refs_from_system_notes):
Called during MR ingestion to parse "mentioned in" and "closed by"
patterns from discussion note bodies.
3. MR closes_issues API (new):
- enqueue_mr_closes_issues_jobs(): Queues jobs for all MRs
- drain_mr_closes_issues(): Fetches closes_issues for each MR
- Records cross-references with source_method='closes_issues_api'
New progress events:
- ClosesIssuesFetchStarted { total }
- ClosesIssueFetched { current, total }
- ClosesIssuesFetchComplete { fetched, failed }
New result fields on IngestMrProjectResult:
- closes_issues_fetched: Count of successful fetches
- closes_issues_failed: Count of failed fetches
The pipeline now comprehensively builds the relationship graph between
issues and MRs, enabling queries like "what will close this issue?"
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add end-to-end observability to the sync and ingest pipelines:
Sync command:
- Generate UUID-based run_id for each sync invocation, propagated through
all child spans for log correlation across stages
- Accept MetricsLayer reference to extract hierarchical StageTiming data
after pipeline completion for robot-mode performance output
- Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle)
- Wrap entire sync execution in a root tracing span with run_id field
Ingest command:
- Wrap run_ingest in an instrumented root span with run_id and resource_type
- Add project path prefix to discussion progress bars for multi-project clarity
- Reset resource_events_synced_for_updated_at on --full re-sync
Sync status:
- Expand from single last_run to configurable recent runs list (default 10)
- Parse and expose StageTiming metrics from stored metrics_json
- Add run_id, total_items_processed, total_errors to SyncRunInfo
- Add mr_count to DataSummary for complete entity coverage
Orchestrator:
- Add #[instrument] with structured fields to issue and MR ingestion functions
- Record items_processed, items_skipped, errors on span close for MetricsLayer
- Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete)
- Pass project_id through to drain_resource_events for scoped job claiming
Document regenerator and embedding pipeline:
- Add #[instrument] spans with items_processed, items_skipped, errors fields
- Record final counts on span close for metrics extraction
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two hardening changes to the dependent queue and orchestrator:
- dependent_queue::fail_job now propagates the rusqlite error via ?
instead of silently falling back to 0 attempts when the job row is
missing. A missing job is a real bug that should surface, not be
masked by unwrap_or(0) which would cause infinite retries at the
base backoff interval.
- orchestrator::enqueue_resource_events_for_entity_type replaces
format!-based SQL ("SELECT {id_col} FROM {table}") with separate
hardcoded queries per entity type. While the original values were
not user-controlled, hardcoded SQL is clearer about intent and
eliminates a class of injection risk entirely.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
events_db.rs:
- Removed internal savepoints from upsert_state_events,
upsert_label_events, and upsert_milestone_events. Each function
previously created its own savepoint, making it impossible for
callers to wrap all three in a single atomic transaction.
- Changed signatures from &mut Connection to &Connection, since
savepoints are no longer created internally. This makes the
functions compatible with rusqlite::Transaction (which derefs to
Connection), allowing callers to pass a transaction directly.
orchestrator.rs:
- Deleted the three store_*_events_tx() functions (store_state_events_tx,
store_label_events_tx, store_milestone_events_tx) which were
hand-duplicated copies of the events_db upsert functions, created as
a workaround for the &mut Connection requirement. Now that events_db
accepts &Connection, store_resource_events() calls the canonical
upsert functions directly through the unchecked_transaction.
- Replaced the max-iterations guard in drain_resource_events() with a
HashSet-based deduplication of job IDs. The old guard used an
arbitrary 2x multiplier on total_pending which could either terminate
too early (if many retries were legitimate) or too late. The new
approach precisely prevents reprocessing the same job within a single
drain run, which is the actual invariant we need.
Net effect: ~133 lines of duplicated SQL removed, single source of
truth for event upsert logic, and callers control transaction scope.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three bugs fixed:
1. Early return in orchestrator when no discussions needed sync also
skipped resource event enqueue+drain. On incremental syncs (the most
common case), resource events were never fetched. Restructured to use
if/else instead of early return so Step 4 always executes.
2. Ingest command JSON and human-readable output silently dropped
resource_events_fetched/failed counts. Added to IngestJsonData and
print_ingest_summary.
3. Progress bar reuse after finish_and_clear caused indicatif to silently
ignore subsequent set_position/set_length calls. Added reset() call
before reconfiguring the bar for resource events.
Also removed stale comment referencing "unsafe" that didn't reflect
the actual unchecked_transaction approach.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrate resource event fetching as Step 4 of both issue and MR
ingestion, gated behind the fetch_resource_events config flag.
Orchestrator changes:
- Add ProgressEvent variants: ResourceEventsFetchStarted,
ResourceEventFetched, ResourceEventsFetchComplete
- Add resource_events_fetched/failed fields to IngestProjectResult
and IngestMrProjectResult
- New enqueue_resource_events_for_entity_type() queries all
issues/MRs for a project and enqueues resource_events jobs via
the dependent queue (INSERT OR IGNORE for idempotency)
- New drain_resource_events() claims jobs in batches, fetches
state/label/milestone events from GitLab API, stores them
atomically via unchecked_transaction, and handles failures
with exponential backoff via fail_job()
- Max-iterations guard prevents infinite retry loops within a
single drain run
- New store_resource_events() + per-type _tx helpers write events
using prepared statements inside a single transaction
- DrainResult struct tracks fetched/failed counts
CLI ingest changes:
- IngestResult gains resource_events_fetched/failed fields
- Progress bar repurposed for resource event fetch phase
(reuses discussion bar with updated template)
- Accumulates event counts from both issue and MR ingestion
CLI sync changes:
- SyncResult gains resource_events_fetched/failed fields
- Accumulates counts from both ingest stages
- print_sync() conditionally displays event counts
- Structured logging includes event counts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds complete merge request ingestion pipeline with a novel two-phase
discussion sync strategy optimized for throughput.
New modules:
- merge_requests.rs: MR upsert with labels/assignees/reviewers handling,
stale MR cleanup, and watermark-based incremental sync
- mr_discussions.rs: Parallel prefetch strategy for MR discussions
Two-phase MR discussion sync:
1. PREFETCH PHASE: Spawn concurrent tasks to fetch discussions for
multiple MRs simultaneously (configurable concurrency, default 8).
Transform and validate in parallel, storing results in memory.
2. WRITE PHASE: Serial database writes to avoid lock contention.
Each MR's discussions written in a single transaction, with
proper stale discussion cleanup.
This approach achieves ~4-8x throughput vs serial fetching while
maintaining database consistency. Transform errors are tracked per-MR
to prevent partial writes from corrupting watermarks.
Orchestrator updates:
- ingest_merge_requests(): Coordinates MR fetch -> discussion sync flow
- Progress callbacks emit MR-specific events for UI feedback
- Respects --full flag to reset discussion watermarks for full resync
The prefetch strategy is critical for MRs which typically have more
discussions than issues, and where API latency dominates sync time.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>