Add foreign key constraint on discussions.merge_request_id to prevent orphaned
discussions when MRs are deleted. SQLite doesn't support ALTER TABLE ADD CONSTRAINT,
so this migration recreates the table with:
1. New table with FK: REFERENCES merge_requests(id) ON DELETE CASCADE
2. Data copy with FK validation (only copies rows with valid MR references)
3. Table swap (DROP old, RENAME new)
4. Full index recreation (all 10 indexes from migrations 002-022)
The migration also includes a CHECK constraint ensuring mutual exclusivity:
- Issue discussions have issue_id NOT NULL and merge_request_id NULL
- MR discussions have merge_request_id NOT NULL and issue_id NULL
Also fixes run_migrations() to properly propagate query errors instead of
silently returning unwrap_or defaults, improving error diagnostics.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the ability to sync specific issues or merge requests by IID without
running a full incremental sync. This enables fast, targeted data refresh
for individual entities — useful for agent workflows, debugging, and
real-time investigation of specific issues or MRs.
Architecture:
- New CLI flags: --issue <IID> and --mr <IID> (repeatable, up to 100 total)
scoped to a single project via -p/--project
- Preflight phase validates all IIDs exist on GitLab before any DB writes,
with TOCTOU-aware soft verification at ingest time
- 6-stage pipeline: preflight -> fetch -> ingest -> dependents -> docs -> embed
- Each stage is cancellation-aware via ShutdownSignal
- Dedicated SyncRunRecorder extensions track surgical-specific counters
(issues_fetched, mrs_ingested, docs_regenerated, etc.)
New modules:
- src/ingestion/surgical.rs: Core surgical fetch/ingest/dependent logic
with preflight_fetch(), ingest_issue_by_iid(), ingest_mr_by_iid(),
and fetch_dependents_for_{issue,mr}()
- src/cli/commands/sync_surgical.rs: Full CLI orchestrator with progress
spinners, human/robot output, and cancellation handling
- src/embedding/pipeline.rs: embed_documents_by_ids() for scoped embedding
- src/documents/regenerator.rs: regenerate_dirty_documents_for_sources()
for scoped document regeneration
Database changes:
- Migration 027: Extends sync_runs with mode, phase, surgical_iids_json,
per-entity counters, and cancelled_at column
- New indexes: idx_sync_runs_mode_started, idx_sync_runs_status_phase_started
GitLab client:
- get_issue_by_iid() and get_mr_by_iid() single-entity fetch methods
Error handling:
- New SurgicalPreflightFailed error variant with entity_type, iid, project,
and reason fields. Shares exit code 6 with GitLabNotFound.
Includes comprehensive test coverage:
- 645 lines of surgical ingestion tests (wiremock-based)
- 184 lines of scoped embedding tests
- 85 lines of scoped regeneration tests
- 113 lines of GitLab client single-entity tests
- 236 lines of sync_run surgical column/counter tests
- Unit tests for SyncOptions, error codes, and CLI validation
Add references_full, user_notes_count, merge_requests_count computed
fields to show issue. Add closed_at and confidential columns via
migration 023.
Closes: bd-2g50
Add a "Phase 1.5" status enrichment step to the issue ingestion pipeline
that fetches work item statuses via the GitLab GraphQL API after the
standard REST API ingestion completes.
Schema changes (migration 021):
- Add status_name, status_category, status_color, status_icon_name, and
status_synced_at columns to the issues table (all nullable)
Ingestion pipeline changes:
- New `enrich_issue_statuses_txn()` function that applies fetched
statuses in a single transaction with two phases: clear stale statuses
for issues that no longer have a status widget, then apply new/updated
statuses from the GraphQL response
- ProgressEvent variants for status enrichment (complete/skipped)
- IngestProjectResult tracks enrichment metrics (seen, enriched, cleared,
without_widget, partial_error_count, enrichment_mode, errors)
- Robot mode JSON output includes per-project status enrichment details
Configuration:
- New `sync.fetchWorkItemStatus` config option (defaults true) to disable
GraphQL status enrichment on instances without Premium/Ultimate
- `LoreError::GitLabAuthFailed` now treated as permanent API error so
status enrichment auth failures don't trigger retries
Also removes the unnecessary nested SAVEPOINT in store_closes_issues_refs
(already runs within the orchestrator's transaction context).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration 019 created performance indexes but never recorded itself
in the schema_version table. Without this row the migration runner
considers the schema outdated and would attempt to re-apply. Adds
the standard INSERT INTO schema_version for version 19.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries)
- Add fetch_mr_file_changes config option and --no-file-changes CLI flag
- Add GitLab MR diffs API fetch pipeline with watermark-based sync
- Create migration 020 for diffs_synced_for_updated_at watermark column
- Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL:
DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers
- Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END)
- Add insert_file_change test helper, 8 new who tests, all 397 tests pass
- Also includes: list performance migration 019, autocorrect module, README updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three correctness bugs found via peer code review:
1. TimelineEvent PartialEq/Ord omitted entity_type — issue #42 and MR #42
with the same timestamp and event_type were treated as equal. In a
BTreeSet or dedup, one would silently be dropped. Added entity_type to
both PartialEq and Ord comparisons.
2. discussions.rs: store_payload() was called outside the transaction
(on bare conn) while upsert_discussion/notes were inside. A crash
between them left orphaned payload rows. Moved store_payload inside
the unchecked_transaction block, matching mr_discussions.rs pattern.
3. Migration 017 created idx_issue_assignees_username(username, issue_id)
but migration 005 already created the same index name with just
(username). SQLite's IF NOT EXISTS silently skipped the composite
version on every existing database. New migration 018 drops and
recreates the index with correct composite columns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add `lore who` command with 5 query modes answering collaboration questions
using existing DB data (280K notes, 210K discussions, 33K DiffNotes):
- Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring)
- Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions)
- Active: what discussions need attention (unresolved resolvable, global/project-scoped)
- Overlap: who else is touching these files (dual author+reviewer role tracking)
- Reviews: what review patterns does a person have (prefix-based category extraction)
Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with
validation, robot JSON output with input+resolved_input reproducibility, human terminal
output, and 20 unit tests. All quality gates pass.
Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e,
bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests
(Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark
for incremental sync, and the missing idx_label_events_label index.
The MR transformer and ingestion pipeline now populate commit SHAs during
sync. The orchestrator uses watermark-based filtering for closes_issues
jobs instead of re-enqueuing all MRs every sync.
The Phase B PRD is updated to match the actual codebase: corrected
migration numbering (011-015), documented nullable label/milestone
fields (migration 012), watermark patterns (013), observability
infrastructure (014), simplified source_method values, and updated
entity_references schema to match implementation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce the foundational observability layer for the sync pipeline:
- MetricsLayer: Custom tracing subscriber layer that captures span timing
and structured fields, materializing them into a hierarchical
Vec<StageTiming> tree for robot-mode performance data output
- logging: Dual-layer subscriber infrastructure with configurable stderr
verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily
rotation and configurable retention (default 30 days)
- SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs
table (start -> succeed|fail), with correlation IDs and aggregate counts
- LoggingConfig: New config section for log_dir, retention_days, and
file_logging toggle
- get_log_dir(): Path helper for log directory resolution
- is_permanent_api_error(): Distinguish retryable vs permanent API failures
(only 404 is truly permanent; 403/auth errors may be environmental)
Database changes:
- Migration 013: Add resource_events_synced_for_updated_at watermark columns
to issues and merge_requests tables for incremental resource event sync
- Migration 014: Enrich sync_runs with run_id correlation ID, aggregate
counts (total_items_processed, total_errors), and run_id index
- Wrap file-based migrations in savepoints for rollback safety
Dependencies: Add uuid (run_id generation), tracing-appender (file logging)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
GitLab returns null for the label/milestone fields on resource_label_events
and resource_milestone_events when the referenced label or milestone has
been deleted. This caused deserialization failures during sync.
- Add migration 012 to recreate both event tables with nullable
label_name, milestone_title, and milestone_id columns (SQLite
requires table recreation to alter NOT NULL constraints)
- Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone
to Option<> in the Rust types
- Update upsert functions to pass through None values correctly
- Add tests for null label and null milestone deserialization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces five new tables that power temporal queries (timeline,
file-history, trace) via GitLab Resource Events APIs:
- resource_state_events: State transitions (opened/closed/reopened/merged/locked)
with actor tracking, source commit, and source MR references
- resource_label_events: Label add/remove history per entity
- resource_milestone_events: Milestone assignment changes per entity
- entity_references: Cross-reference table (Gate 2 prep) linking
source/target entity pairs with reference type and discovery method
- pending_dependent_fetches: Generic job queue for resource_events,
mr_closes_issues, and mr_diffs with exponential backoff retry
All event tables enforce entity exclusivity via CHECK constraints
(exactly one of issue_id or merge_request_id must be non-NULL).
Deduplication handled via UNIQUE indexes on (gitlab_id, project_id).
FK cascades ensure cleanup when parent entities are removed.
The dependent fetch queue uses a UNIQUE constraint on
(project_id, entity_type, entity_iid, job_type) for idempotent
enqueue, with partial indexes optimizing claim and retry queries.
Registered as migration 011 in the embedded MIGRATIONS array in db.rs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add chunk_max_bytes and chunk_count columns to embedding_metadata to
support config drift detection and adaptive dedup sizing. Includes a
partial index on sentinel rows (chunk_index=0) to accelerate the drift
detection and max-chunk queries.
Also exports LATEST_SCHEMA_VERSION as a public constant derived from
the MIGRATIONS array length, replacing the previously hardcoded magic
number in the health check.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three new migrations establish the search infrastructure:
- 007_documents: Creates the `documents` table as the central search
unit. Each document is a rendered text blob derived from an issue,
MR, or discussion. Includes `dirty_queue` table for tracking which
entities need document regeneration after ingestion changes.
- 008_fts5: Creates FTS5 virtual table `documents_fts` with content
sync triggers. Uses `unicode61` tokenizer with `remove_diacritics=2`
for broad language support. Automatic insert/update/delete triggers
keep the FTS index synchronized with the documents table.
- 009_embeddings: Creates `embeddings` table for storing vector
chunks produced by Ollama. Uses `doc_id * 1000 + chunk_index`
rowid encoding to support multi-chunk documents while enabling
efficient doc-level deduplication in vector search results.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces comprehensive database schema for merge request ingestion
(CP2), designed with forward compatibility for future features.
New tables:
- merge_requests: Core MR metadata with draft status, branch info,
detailed_merge_status (modern API field), and sync health telemetry
columns for debuggability
- mr_labels: Junction table linking MRs to shared labels table
- mr_assignees: MR assignee usernames (same pattern as issues)
- mr_reviewers: MR-specific reviewer tracking (not applicable to issues)
Additional indexes:
- discussions: Add merge_request_id and resolved status indexes
- notes: Add composite indexes for DiffNote file/line queries
DiffNote position enhancements:
- position_type: 'text' | 'image' | 'file' for diff comment semantics
- position_line_range_start/end: Multi-line comment range support
- position_base_sha/start_sha/head_sha: Commit context for diff notes
The schema captures CP3-ready fields (head_sha, references_short/full,
SHA triplet) at zero additional API cost, preparing for file-context
and cross-project reference features.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a comprehensive relational schema for storing GitLab data
with full audit trail and raw payload preservation.
Migration 001_initial.sql establishes core metadata tables:
- projects: Tracked GitLab projects with paths and namespace
- sync_watermarks: Cursor-based incremental sync state per project
- schema_migrations: Migration tracking with checksums for integrity
Migration 002_issues.sql creates the issues data model:
- issues: Core issue data with timestamps, author, state, counts
- labels: Project-specific label definitions with colors/descriptions
- issue_labels: Many-to-many junction for issue-label relationships
- milestones: Project milestones with state and due dates
- discussions: Threaded discussions linked to issues/MRs
- notes: Individual notes within discussions with full metadata
- raw_payloads: Compressed original API responses keyed by entity
Migration 003_indexes.sql adds performance indexes:
- Covering indexes for common query patterns (state, updated_at)
- Composite indexes for filtered queries (project + state)
Migration 004_discussions_payload.sql extends discussions:
- Adds raw_payload column for discussion-level API preservation
- Enables debugging and data recovery from original responses
Migration 005_assignees_milestone_duedate.sql completes the model:
- issue_assignees: Many-to-many for multiple assignees per issue
- Adds milestone_id, due_date columns to issues table
- Indexes for assignee and milestone filtering
Schema supports both incremental sync and full historical queries.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>