Introduce the foundational observability layer for the sync pipeline:
- MetricsLayer: Custom tracing subscriber layer that captures span timing
and structured fields, materializing them into a hierarchical
Vec<StageTiming> tree for robot-mode performance data output
- logging: Dual-layer subscriber infrastructure with configurable stderr
verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily
rotation and configurable retention (default 30 days)
- SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs
table (start -> succeed|fail), with correlation IDs and aggregate counts
- LoggingConfig: New config section for log_dir, retention_days, and
file_logging toggle
- get_log_dir(): Path helper for log directory resolution
- is_permanent_api_error(): Distinguish retryable vs permanent API failures
(only 404 is truly permanent; 403/auth errors may be environmental)
Database changes:
- Migration 013: Add resource_events_synced_for_updated_at watermark columns
to issues and merge_requests tables for incremental resource event sync
- Migration 014: Enrich sync_runs with run_id correlation ID, aggregate
counts (total_items_processed, total_errors), and run_id index
- Wrap file-based migrations in savepoints for rollback safety
Dependencies: Add uuid (run_id generation), tracing-appender (file logging)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
GitLab returns null for the label/milestone fields on resource_label_events
and resource_milestone_events when the referenced label or milestone has
been deleted. This caused deserialization failures during sync.
- Add migration 012 to recreate both event tables with nullable
label_name, milestone_title, and milestone_id columns (SQLite
requires table recreation to alter NOT NULL constraints)
- Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone
to Option<> in the Rust types
- Update upsert functions to pass through None values correctly
- Add tests for null label and null milestone deserialization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces five new tables that power temporal queries (timeline,
file-history, trace) via GitLab Resource Events APIs:
- resource_state_events: State transitions (opened/closed/reopened/merged/locked)
with actor tracking, source commit, and source MR references
- resource_label_events: Label add/remove history per entity
- resource_milestone_events: Milestone assignment changes per entity
- entity_references: Cross-reference table (Gate 2 prep) linking
source/target entity pairs with reference type and discovery method
- pending_dependent_fetches: Generic job queue for resource_events,
mr_closes_issues, and mr_diffs with exponential backoff retry
All event tables enforce entity exclusivity via CHECK constraints
(exactly one of issue_id or merge_request_id must be non-NULL).
Deduplication handled via UNIQUE indexes on (gitlab_id, project_id).
FK cascades ensure cleanup when parent entities are removed.
The dependent fetch queue uses a UNIQUE constraint on
(project_id, entity_type, entity_iid, job_type) for idempotent
enqueue, with partial indexes optimizing claim and retry queries.
Registered as migration 011 in the embedded MIGRATIONS array in db.rs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add chunk_max_bytes and chunk_count columns to embedding_metadata to
support config drift detection and adaptive dedup sizing. Includes a
partial index on sentinel rows (chunk_index=0) to accelerate the drift
detection and max-chunk queries.
Also exports LATEST_SCHEMA_VERSION as a public constant derived from
the MIGRATIONS array length, replacing the previously hardcoded magic
number in the health check.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three new migrations establish the search infrastructure:
- 007_documents: Creates the `documents` table as the central search
unit. Each document is a rendered text blob derived from an issue,
MR, or discussion. Includes `dirty_queue` table for tracking which
entities need document regeneration after ingestion changes.
- 008_fts5: Creates FTS5 virtual table `documents_fts` with content
sync triggers. Uses `unicode61` tokenizer with `remove_diacritics=2`
for broad language support. Automatic insert/update/delete triggers
keep the FTS index synchronized with the documents table.
- 009_embeddings: Creates `embeddings` table for storing vector
chunks produced by Ollama. Uses `doc_id * 1000 + chunk_index`
rowid encoding to support multi-chunk documents while enabling
efficient doc-level deduplication in vector search results.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces comprehensive database schema for merge request ingestion
(CP2), designed with forward compatibility for future features.
New tables:
- merge_requests: Core MR metadata with draft status, branch info,
detailed_merge_status (modern API field), and sync health telemetry
columns for debuggability
- mr_labels: Junction table linking MRs to shared labels table
- mr_assignees: MR assignee usernames (same pattern as issues)
- mr_reviewers: MR-specific reviewer tracking (not applicable to issues)
Additional indexes:
- discussions: Add merge_request_id and resolved status indexes
- notes: Add composite indexes for DiffNote file/line queries
DiffNote position enhancements:
- position_type: 'text' | 'image' | 'file' for diff comment semantics
- position_line_range_start/end: Multi-line comment range support
- position_base_sha/start_sha/head_sha: Commit context for diff notes
The schema captures CP3-ready fields (head_sha, references_short/full,
SHA triplet) at zero additional API cost, preparing for file-context
and cross-project reference features.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a comprehensive relational schema for storing GitLab data
with full audit trail and raw payload preservation.
Migration 001_initial.sql establishes core metadata tables:
- projects: Tracked GitLab projects with paths and namespace
- sync_watermarks: Cursor-based incremental sync state per project
- schema_migrations: Migration tracking with checksums for integrity
Migration 002_issues.sql creates the issues data model:
- issues: Core issue data with timestamps, author, state, counts
- labels: Project-specific label definitions with colors/descriptions
- issue_labels: Many-to-many junction for issue-label relationships
- milestones: Project milestones with state and due dates
- discussions: Threaded discussions linked to issues/MRs
- notes: Individual notes within discussions with full metadata
- raw_payloads: Compressed original API responses keyed by entity
Migration 003_indexes.sql adds performance indexes:
- Covering indexes for common query patterns (state, updated_at)
- Composite indexes for filtered queries (project + state)
Migration 004_discussions_payload.sql extends discussions:
- Adds raw_payload column for discussion-level API preservation
- Enables debugging and data recovery from original responses
Migration 005_assignees_milestone_duedate.sql completes the model:
- issue_assignees: Many-to-many for multiple assignees per issue
- Adds milestone_id, due_date columns to issues table
- Indexes for assignee and milestone filtering
Schema supports both incremental sync and full historical queries.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>