Three new migrations establish the search infrastructure:
- 007_documents: Creates the `documents` table as the central search
unit. Each document is a rendered text blob derived from an issue,
MR, or discussion. Also adds a `dirty_queue` table that tracks which
entities need their documents regenerated after ingestion changes.
- 008_fts5: Creates the FTS5 virtual table `documents_fts`, which
insert/update/delete triggers on `documents` keep synchronized
automatically. Uses the Unicode-aware `unicode61` tokenizer with
`remove_diacritics=2`, so accented and unaccented forms match (e.g.
"café" matches "cafe").
- 009_embeddings: Creates the `embeddings` table, which stores one
vector per document chunk (embeddings produced by Ollama). Rowids are
encoded as `doc_id * 1000 + chunk_index`, so a document can span up to
1000 chunks and its owning document is `rowid / 1000` under integer
division. This makes doc-level deduplication of vector search results
cheap.
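A minimal Python sketch of the rowid scheme and the doc-level dedup it
enables. The helper names (`encode_rowid`, `decode_rowid`,
`dedupe_by_doc`) are hypothetical, and search hits are assumed to arrive
as (rowid, distance) pairs sorted by ascending distance.

```python
# Hypothetical helpers for the `doc_id * 1000 + chunk_index` rowid scheme.
CHUNK_STRIDE = 1000  # assumption: chunk_index always stays within 0..999


def encode_rowid(doc_id: int, chunk_index: int) -> int:
    """Pack a document id and a chunk index into a single integer rowid."""
    if not 0 <= chunk_index < CHUNK_STRIDE:
        raise ValueError(f"chunk_index {chunk_index} does not fit in the stride")
    return doc_id * CHUNK_STRIDE + chunk_index


def decode_rowid(rowid: int) -> tuple[int, int]:
    """Recover (doc_id, chunk_index) from an encoded rowid."""
    doc_id, chunk_index = divmod(rowid, CHUNK_STRIDE)
    return doc_id, chunk_index


def dedupe_by_doc(hits: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Collapse chunk-level hits to one (doc_id, distance) entry per document.

    Assumes `hits` are (rowid, distance) pairs sorted by ascending distance,
    so the first hit seen for a document is its closest chunk.
    """
    seen: set[int] = set()
    results: list[tuple[int, float]] = []
    for rowid, distance in hits:
        doc_id, _chunk = decode_rowid(rowid)
        if doc_id not in seen:
            seen.add(doc_id)
            results.append((doc_id, distance))
    return results


hits = [(encode_rowid(42, 3), 0.11), (encode_rowid(7, 0), 0.15), (encode_rowid(42, 0), 0.20)]
print(dedupe_by_doc(hits))  # [(42, 0.11), (7, 0.15)]
```

Integer division recovers the owning document without a join, which is
why the dedup pass is cheap. The trade-off is the hard 1000-chunk
ceiling per document.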

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Introduces the database schema for merge request ingestion (CP2),
designed to be forward compatible with later checkpoints.
New tables:
- merge_requests: Core MR metadata with draft status, branch info,
detailed_merge_status (the API field that supersedes the deprecated
merge_status), and sync-health telemetry columns to aid debugging
- mr_labels: Junction table linking MRs to the shared labels table
- mr_assignees: MR assignee usernames (same pattern as issue_assignees)
- mr_reviewers: MR-specific reviewer tracking (not applicable to issues)
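A minimal sketch of the junction-table pattern behind mr_labels, using
an in-memory SQLite database. The column sets are hypothetical
simplifications of the real migrations. The point is that labels live
in one shared table and MRs only reference them by id.

```python
import sqlite3

# Minimal, hypothetical column sets; the real migrations define more columns.
SCHEMA = """
CREATE TABLE labels (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    name       TEXT NOT NULL,
    UNIQUE (project_id, name)
);
CREATE TABLE merge_requests (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    iid        INTEGER NOT NULL,
    title      TEXT NOT NULL
);
-- Junction table: one row per (MR, label) pair, reusing the shared labels table.
CREATE TABLE mr_labels (
    merge_request_id INTEGER NOT NULL REFERENCES merge_requests(id) ON DELETE CASCADE,
    label_id         INTEGER NOT NULL REFERENCES labels(id) ON DELETE CASCADE,
    PRIMARY KEY (merge_request_id, label_id)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def labels_for_mr(conn: sqlite3.Connection, mr_id: int) -> list[str]:
    """Return the label names attached to one merge request, sorted by name."""
    rows = conn.execute(
        """
        SELECT l.name
        FROM mr_labels AS ml
        JOIN labels AS l ON l.id = ml.label_id
        WHERE ml.merge_request_id = ?
        ORDER BY l.name
        """,
        (mr_id,),
    ).fetchall()
    return [name for (name,) in rows]


conn = open_db()
conn.executemany(
    "INSERT INTO labels (id, project_id, name) VALUES (?, ?, ?)",
    [(1, 10, "bug"), (2, 10, "backend")],
)
conn.execute("INSERT INTO merge_requests (id, project_id, iid, title) VALUES (1, 10, 5, 'Fix sync')")
conn.executemany("INSERT INTO mr_labels (merge_request_id, label_id) VALUES (?, ?)", [(1, 1), (1, 2)])
print(labels_for_mr(conn, 1))  # ['backend', 'bug']
```

With `ON DELETE CASCADE`, deleting an MR removes its junction rows but
leaves the shared labels intact for issues and other MRs.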
Additional indexes:
- discussions: Add merge_request_id and resolved status indexes
- notes: Add composite indexes for DiffNote file/line queries
DiffNote position enhancements:
- position_type: 'text' | 'image' | 'file', indicating what a diff
comment is attached to (text lines, an image, or a whole file)
- position_line_range_start/end: Multi-line comment range support
- position_base_sha/start_sha/head_sha: Commit context for diff notes
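A minimal sketch of a DiffNote file/line query that honours multi-line
ranges. Only position_type and position_line_range_start/end come from
the description above. `position_new_path`, `position_new_line`, the
index name, and the assumption that the range columns hold new-file
line numbers are all hypothetical.

```python
import sqlite3

# Hypothetical subset of the notes table.
SCHEMA = """
CREATE TABLE notes (
    id                        INTEGER PRIMARY KEY,
    body                      TEXT NOT NULL,
    position_type             TEXT CHECK (position_type IN ('text', 'image', 'file')),
    position_new_path         TEXT,     -- hypothetical: file the note is attached to
    position_new_line         INTEGER,  -- hypothetical: single-line anchor
    position_line_range_start INTEGER,  -- assumed to hold new-file line numbers
    position_line_range_end   INTEGER
);
-- Composite index: equality on the path narrows lookups to one file's notes.
CREATE INDEX idx_notes_new_path_line ON notes (position_new_path, position_new_line);
"""


def notes_touching_line(conn: sqlite3.Connection, path: str, line: int) -> list[int]:
    """Return ids of notes on `path` anchored at `line` or whose range covers it."""
    rows = conn.execute(
        """
        SELECT id FROM notes
        WHERE position_new_path = ?
          AND (position_new_line = ?
               OR (position_line_range_start <= ? AND position_line_range_end >= ?))
        ORDER BY id
        """,
        (path, line, line, line),
    ).fetchall()
    return [note_id for (note_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO notes VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "single-line comment", "text", "src/sync.rs", 10, None, None),
        (2, "multi-line comment", "text", "src/sync.rs", 14, 12, 14),
        (3, "other file", "text", "src/db.rs", 13, None, None),
    ],
)
print(notes_touching_line(conn, "src/sync.rs", 13))  # [2]
```

Line 13 is not any note's anchor line, but it falls inside note 2's
12..14 range. Without the range columns, a query by line alone would
miss that note.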
The schema captures CP3-ready fields (head_sha, references_short/full,
and the base/start/head SHA triplet) at no additional API cost, since
they already arrive in the responses being ingested. This prepares for
file-context and cross-project reference features.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements a relational schema for storing GitLab data, with a full
audit trail and preservation of the raw API payloads.
Migration 001_initial.sql establishes core metadata tables:
- projects: Tracked GitLab projects with paths and namespace
- sync_watermarks: Cursor-based incremental sync state per project
- schema_migrations: Migration tracking with checksums for integrity
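A minimal sketch of checksum-verified migrations against SQLite. The
column names, SHA-256 as the hash, and the `apply_migrations` helper
are assumptions. The description only says schema_migrations tracks
migrations with checksums for integrity.

```python
import hashlib
import sqlite3


def checksum(sql: str) -> str:
    """SHA-256 of the migration text (the hash choice is an assumption)."""
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def apply_migrations(conn: sqlite3.Connection, migrations: list[tuple[str, str]]) -> list[str]:
    """Apply (version, sql) pairs in order and return the versions newly applied.

    Already-applied versions are skipped, but their stored checksum must
    still match the current file contents, so edited migrations are caught.
    """
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS schema_migrations (
            version    TEXT PRIMARY KEY,
            checksum   TEXT NOT NULL,
            applied_at TEXT NOT NULL DEFAULT (datetime('now'))
        )
        """
    )
    applied = dict(conn.execute("SELECT version, checksum FROM schema_migrations"))
    newly_applied = []
    for version, sql in migrations:
        digest = checksum(sql)
        if version in applied:
            if applied[version] != digest:
                raise RuntimeError(f"migration {version} changed after it was applied")
            continue
        # Simplification: executescript() commits any pending transaction before
        # running, so the script and the bookkeeping row are not atomic together.
        conn.executescript(sql)
        conn.execute(
            "INSERT INTO schema_migrations (version, checksum) VALUES (?, ?)",
            (version, digest),
        )
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
migrations = [("001_initial", "CREATE TABLE projects (id INTEGER PRIMARY KEY, path TEXT NOT NULL);")]
print(apply_migrations(conn, migrations))  # ['001_initial']
print(apply_migrations(conn, migrations))  # []
```

A second run is a no-op. Editing an already-applied migration raises an
error instead of letting the live schema silently drift from the files.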
Migration 002_issues.sql creates the issues data model:
- issues: Core issue data with timestamps, author, state, counts
- labels: Project-specific label definitions with colors/descriptions
- issue_labels: Many-to-many junction for issue-label relationships
- milestones: Project milestones with state and due dates
- discussions: Threaded discussions linked to issues/MRs
- notes: Individual notes within discussions with full metadata
- raw_payloads: Compressed original API responses keyed by entity
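A minimal sketch of compressed raw-payload storage keyed by entity. The
description does not name a codec or key layout. zlib, the
`(entity_type, entity_id)` key, and the helper names are assumptions
chosen for illustration.

```python
import json
import sqlite3
import zlib
from typing import Optional

# Hypothetical layout: the description only says payloads are compressed and
# keyed by entity; zlib and these column names are assumptions.
SCHEMA = """
CREATE TABLE raw_payloads (
    entity_type TEXT NOT NULL,     -- e.g. 'issue' or 'discussion'
    entity_id   INTEGER NOT NULL,  -- GitLab id of that entity
    payload     BLOB NOT NULL,     -- zlib-compressed JSON
    PRIMARY KEY (entity_type, entity_id)
);
"""


def store_payload(conn: sqlite3.Connection, entity_type: str, entity_id: int, payload: dict) -> None:
    """Compress and upsert the original API response for one entity."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO raw_payloads (entity_type, entity_id, payload) VALUES (?, ?, ?)",
        (entity_type, entity_id, zlib.compress(raw)),
    )


def load_payload(conn: sqlite3.Connection, entity_type: str, entity_id: int) -> Optional[dict]:
    """Decompress and parse a stored payload, or return None if absent."""
    row = conn.execute(
        "SELECT payload FROM raw_payloads WHERE entity_type = ? AND entity_id = ?",
        (entity_type, entity_id),
    ).fetchone()
    if row is None:
        return None
    return json.loads(zlib.decompress(row[0]).decode("utf-8"))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_payload(conn, "issue", 101, {"iid": 7, "title": "Sync stalls", "labels": ["bug"]})
print(load_payload(conn, "issue", 101))  # {'iid': 7, 'title': 'Sync stalls', 'labels': ['bug']}
```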
Migration 003_indexes.sql adds performance indexes:
- Covering indexes for common query patterns (state, updated_at)
- Composite indexes for filtered queries (project + state)
Migration 004_discussions_payload.sql extends discussions:
- Adds a raw_payload column preserving each discussion's original API
response
- Enables debugging and data recovery from original responses
Migration 005_assignees_milestone_duedate.sql completes the model:
- issue_assignees: Many-to-many junction supporting multiple assignees
per issue
- Adds milestone_id and due_date columns to the issues table
- Indexes for assignee and milestone filtering
Schema supports both incremental sync and full historical queries.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>