Migration 019 created performance indexes but never recorded itself
in the schema_version table. Without this row the migration runner
considers the schema outdated and would attempt to re-apply. Adds
the standard INSERT INTO schema_version for version 19.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries)
- Add fetch_mr_file_changes config option and --no-file-changes CLI flag
- Add GitLab MR diffs API fetch pipeline with watermark-based sync
- Create migration 020 for diffs_synced_for_updated_at watermark column
- Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL:
DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers
- Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END)
- Add insert_file_change test helper, 8 new who tests, all 397 tests pass
- Also includes: list performance migration 019, autocorrect module, README updates
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three correctness bugs found via peer code review:
1. TimelineEvent PartialEq/Ord omitted entity_type — issue #42 and MR #42
with the same timestamp and event_type were treated as equal. In a
BTreeSet or dedup, one would silently be dropped. Added entity_type to
both PartialEq and Ord comparisons.
2. discussions.rs: store_payload() was called outside the transaction
(on bare conn) while upsert_discussion/notes were inside. A crash
between them left orphaned payload rows. Moved store_payload inside
the unchecked_transaction block, matching mr_discussions.rs pattern.
3. Migration 017 created idx_issue_assignees_username(username, issue_id)
but migration 005 already created the same index name with just
(username). SQLite's IF NOT EXISTS silently skipped the composite
version on every existing database. New migration 018 drops and
recreates the index with correct composite columns.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add `lore who` command with 5 query modes answering collaboration questions
using existing DB data (280K notes, 210K discussions, 33K DiffNotes):
- Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring)
- Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions)
- Active: what discussions need attention (unresolved resolvable, global/project-scoped)
- Overlap: who else is touching these files (dual author+reviewer role tracking)
- Reviews: what review patterns does a person have (prefix-based category extraction)
Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with
validation, robot JSON output with input+resolved_input reproducibility, human terminal
output, and 20 unit tests. All quality gates pass.
Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e,
bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests
(Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark
for incremental sync, and the missing idx_label_events_label index.
The MR transformer and ingestion pipeline now populate commit SHAs during
sync. The orchestrator uses watermark-based filtering for closes_issues
jobs instead of re-enqueuing all MRs every sync.
The Phase B PRD is updated to match the actual codebase: corrected
migration numbering (011-015), documented nullable label/milestone
fields (migration 012), watermark patterns (013), observability
infrastructure (014), simplified source_method values, and updated
entity_references schema to match implementation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce the foundational observability layer for the sync pipeline:
- MetricsLayer: Custom tracing subscriber layer that captures span timing
and structured fields, materializing them into a hierarchical
Vec<StageTiming> tree for robot-mode performance data output
- logging: Dual-layer subscriber infrastructure with configurable stderr
verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily
rotation and configurable retention (default 30 days)
- SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs
table (start -> succeed|fail), with correlation IDs and aggregate counts
- LoggingConfig: New config section for log_dir, retention_days, and
file_logging toggle
- get_log_dir(): Path helper for log directory resolution
- is_permanent_api_error(): Distinguish retryable vs permanent API failures
(only 404 is truly permanent; 403/auth errors may be environmental)
Database changes:
- Migration 013: Add resource_events_synced_for_updated_at watermark columns
to issues and merge_requests tables for incremental resource event sync
- Migration 014: Enrich sync_runs with run_id correlation ID, aggregate
counts (total_items_processed, total_errors), and run_id index
- Wrap file-based migrations in savepoints for rollback safety
Dependencies: Add uuid (run_id generation), tracing-appender (file logging)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
GitLab returns null for the label/milestone fields on resource_label_events
and resource_milestone_events when the referenced label or milestone has
been deleted. This caused deserialization failures during sync.
- Add migration 012 to recreate both event tables with nullable
label_name, milestone_title, and milestone_id columns (SQLite
requires table recreation to alter NOT NULL constraints)
- Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone
to Option<> in the Rust types
- Update upsert functions to pass through None values correctly
- Add tests for null label and null milestone deserialization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces five new tables that power temporal queries (timeline,
file-history, trace) via GitLab Resource Events APIs:
- resource_state_events: State transitions (opened/closed/reopened/merged/locked)
with actor tracking, source commit, and source MR references
- resource_label_events: Label add/remove history per entity
- resource_milestone_events: Milestone assignment changes per entity
- entity_references: Cross-reference table (Gate 2 prep) linking
source/target entity pairs with reference type and discovery method
- pending_dependent_fetches: Generic job queue for resource_events,
mr_closes_issues, and mr_diffs with exponential backoff retry
All event tables enforce entity exclusivity via CHECK constraints
(exactly one of issue_id or merge_request_id must be non-NULL).
Deduplication handled via UNIQUE indexes on (gitlab_id, project_id).
FK cascades ensure cleanup when parent entities are removed.
The dependent fetch queue uses a UNIQUE constraint on
(project_id, entity_type, entity_iid, job_type) for idempotent
enqueue, with partial indexes optimizing claim and retry queries.
Registered as migration 011 in the embedded MIGRATIONS array in db.rs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add chunk_max_bytes and chunk_count columns to embedding_metadata to
support config drift detection and adaptive dedup sizing. Includes a
partial index on sentinel rows (chunk_index=0) to accelerate the drift
detection and max-chunk queries.
Also exports LATEST_SCHEMA_VERSION as a public constant derived from
the MIGRATIONS array length, replacing the previously hardcoded magic
number in the health check.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three new migrations establish the search infrastructure:
- 007_documents: Creates the `documents` table as the central search
unit. Each document is a rendered text blob derived from an issue,
MR, or discussion. Includes `dirty_queue` table for tracking which
entities need document regeneration after ingestion changes.
- 008_fts5: Creates FTS5 virtual table `documents_fts` with content
sync triggers. Uses `unicode61` tokenizer with `remove_diacritics=2`
for broad language support. Automatic insert/update/delete triggers
keep the FTS index synchronized with the documents table.
- 009_embeddings: Creates `embeddings` table for storing vector
chunks produced by Ollama. Uses `doc_id * 1000 + chunk_index`
rowid encoding to support multi-chunk documents while enabling
efficient doc-level deduplication in vector search results.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces comprehensive database schema for merge request ingestion
(CP2), designed with forward compatibility for future features.
New tables:
- merge_requests: Core MR metadata with draft status, branch info,
detailed_merge_status (modern API field), and sync health telemetry
columns for debuggability
- mr_labels: Junction table linking MRs to shared labels table
- mr_assignees: MR assignee usernames (same pattern as issues)
- mr_reviewers: MR-specific reviewer tracking (not applicable to issues)
Additional indexes:
- discussions: Add merge_request_id and resolved status indexes
- notes: Add composite indexes for DiffNote file/line queries
DiffNote position enhancements:
- position_type: 'text' | 'image' | 'file' for diff comment semantics
- position_line_range_start/end: Multi-line comment range support
- position_base_sha/start_sha/head_sha: Commit context for diff notes
The schema captures CP3-ready fields (head_sha, references_short/full,
SHA triplet) at zero additional API cost, preparing for file-context
and cross-project reference features.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a comprehensive relational schema for storing GitLab data
with full audit trail and raw payload preservation.
Migration 001_initial.sql establishes core metadata tables:
- projects: Tracked GitLab projects with paths and namespace
- sync_watermarks: Cursor-based incremental sync state per project
- schema_migrations: Migration tracking with checksums for integrity
Migration 002_issues.sql creates the issues data model:
- issues: Core issue data with timestamps, author, state, counts
- labels: Project-specific label definitions with colors/descriptions
- issue_labels: Many-to-many junction for issue-label relationships
- milestones: Project milestones with state and due dates
- discussions: Threaded discussions linked to issues/MRs
- notes: Individual notes within discussions with full metadata
- raw_payloads: Compressed original API responses keyed by entity
Migration 003_indexes.sql adds performance indexes:
- Covering indexes for common query patterns (state, updated_at)
- Composite indexes for filtered queries (project + state)
Migration 004_discussions_payload.sql extends discussions:
- Adds raw_payload column for discussion-level API preservation
- Enables debugging and data recovery from original responses
Migration 005_assignees_milestone_duedate.sql completes the model:
- issue_assignees: Many-to-many for multiple assignees per issue
- Adds milestone_id, due_date columns to issues table
- Indexes for assignee and milestone filtering
Schema supports both incremental sync and full historical queries.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>