gitlore

Author	SHA1	Message	Date
Taylor Eernisse	233eb546af	feat: Add commit SHAs, closes_issues watermark, and PRD alignment Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests (Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark for incremental sync, and the missing idx_label_events_label index. The MR transformer and ingestion pipeline now populate commit SHAs during sync. The orchestrator uses watermark-based filtering for closes_issues jobs instead of re-enqueuing all MRs every sync. The Phase B PRD is updated to match the actual codebase: corrected migration numbering (011-015), documented nullable label/milestone fields (migration 012), watermark patterns (013), observability infrastructure (014), simplified source_method values, and updated entity_references schema to match implementation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 15:29:51 -05:00
teernisse	329c8f4539	feat(observability): Add metrics, logging, and sync-run core modules Introduce the foundational observability layer for the sync pipeline: - MetricsLayer: Custom tracing subscriber layer that captures span timing and structured fields, materializing them into a hierarchical Vec<StageTiming> tree for robot-mode performance data output - logging: Dual-layer subscriber infrastructure with configurable stderr verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily rotation and configurable retention (default 30 days) - SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs table (start -> succeed|fail), with correlation IDs and aggregate counts - LoggingConfig: New config section for log_dir, retention_days, and file_logging toggle - get_log_dir(): Path helper for log directory resolution - is_permanent_api_error(): Distinguish retryable vs permanent API failures (only 404 is truly permanent; 403/auth errors may be environmental) Database changes: - Migration 013: Add resource_events_synced_for_updated_at watermark columns to issues and merge_requests tables for incremental resource event sync - Migration 014: Enrich sync_runs with run_id correlation ID, aggregate counts (total_items_processed, total_errors), and run_id index - Wrap file-based migrations in savepoints for rollback safety Dependencies: Add uuid (run_id generation), tracing-appender (file logging) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:38:29 -05:00
Taylor Eernisse	a92e176bb6	fix(events): Handle nullable label and milestone in resource events GitLab returns null for the label/milestone fields on resource_label_events and resource_milestone_events when the referenced label or milestone has been deleted. This caused deserialization failures during sync. - Add migration 012 to recreate both event tables with nullable label_name, milestone_title, and milestone_id columns (SQLite requires table recreation to alter NOT NULL constraints) - Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone to Option<> in the Rust types - Update upsert functions to pass through None values correctly - Add tests for null label and null milestone deserialization Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:36:17 -05:00
Taylor Eernisse	ce5cd9c95d	feat(schema): Add migration 011 for resource events, entity references, and dependent fetch queue Introduces five new tables that power temporal queries (timeline, file-history, trace) via GitLab Resource Events APIs: - resource_state_events: State transitions (opened/closed/reopened/merged/locked) with actor tracking, source commit, and source MR references - resource_label_events: Label add/remove history per entity - resource_milestone_events: Milestone assignment changes per entity - entity_references: Cross-reference table (Gate 2 prep) linking source/target entity pairs with reference type and discovery method - pending_dependent_fetches: Generic job queue for resource_events, mr_closes_issues, and mr_diffs with exponential backoff retry All event tables enforce entity exclusivity via CHECK constraints (exactly one of issue_id or merge_request_id must be non-NULL). Deduplication handled via UNIQUE indexes on (gitlab_id, project_id). FK cascades ensure cleanup when parent entities are removed. The dependent fetch queue uses a UNIQUE constraint on (project_id, entity_type, entity_iid, job_type) for idempotent enqueue, with partial indexes optimizing claim and retry queries. Registered as migration 011 in the embedded MIGRATIONS array in db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:06:43 -05:00
Taylor Eernisse	2a52594a60	feat(db): Add migration 010 for chunk config tracking columns Add chunk_max_bytes and chunk_count columns to embedding_metadata to support config drift detection and adaptive dedup sizing. Includes a partial index on sentinel rows (chunk_index=0) to accelerate the drift detection and max-chunk queries. Also exports LATEST_SCHEMA_VERSION as a public constant derived from the MIGRATIONS array length, replacing the previously hardcoded magic number in the health check. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:34:48 -05:00
Taylor Eernisse	4270603da4	feat(db): Add migrations for documents, FTS5, and embeddings Three new migrations establish the search infrastructure: - 007_documents: Creates the `documents` table as the central search unit. Each document is a rendered text blob derived from an issue, MR, or discussion. Includes `dirty_queue` table for tracking which entities need document regeneration after ingestion changes. - 008_fts5: Creates FTS5 virtual table `documents_fts` with content sync triggers. Uses `unicode61` tokenizer with `remove_diacritics=2` for broad language support. Automatic insert/update/delete triggers keep the FTS index synchronized with the documents table. - 009_embeddings: Creates `embeddings` table for storing vector chunks produced by Ollama. Uses `doc_id * 1000 + chunk_index` rowid encoding to support multi-chunk documents while enabling efficient doc-level deduplication in vector search results. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:41 -05:00
Taylor Eernisse	39a71d8b85	feat(db): Add schema migration v6 for merge request support Introduces comprehensive database schema for merge request ingestion (CP2), designed with forward compatibility for future features. New tables: - merge_requests: Core MR metadata with draft status, branch info, detailed_merge_status (modern API field), and sync health telemetry columns for debuggability - mr_labels: Junction table linking MRs to shared labels table - mr_assignees: MR assignee usernames (same pattern as issues) - mr_reviewers: MR-specific reviewer tracking (not applicable to issues) Additional indexes: - discussions: Add merge_request_id and resolved status indexes - notes: Add composite indexes for DiffNote file/line queries DiffNote position enhancements: - position_type: 'text' | 'image' | 'file' for diff comment semantics - position_line_range_start/end: Multi-line comment range support - position_base_sha/start_sha/head_sha: Commit context for diff notes The schema captures CP3-ready fields (head_sha, references_short/full, SHA triplet) at zero additional API cost, preparing for file-context and cross-project reference features. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:44:37 -05:00
Taylor Eernisse	d15f457a58	feat(db): Add SQLite database migrations for GitLab data model Implements a comprehensive relational schema for storing GitLab data with full audit trail and raw payload preservation. Migration 001_initial.sql establishes core metadata tables: - projects: Tracked GitLab projects with paths and namespace - sync_watermarks: Cursor-based incremental sync state per project - schema_migrations: Migration tracking with checksums for integrity Migration 002_issues.sql creates the issues data model: - issues: Core issue data with timestamps, author, state, counts - labels: Project-specific label definitions with colors/descriptions - issue_labels: Many-to-many junction for issue-label relationships - milestones: Project milestones with state and due dates - discussions: Threaded discussions linked to issues/MRs - notes: Individual notes within discussions with full metadata - raw_payloads: Compressed original API responses keyed by entity Migration 003_indexes.sql adds performance indexes: - Covering indexes for common query patterns (state, updated_at) - Composite indexes for filtered queries (project + state) Migration 004_discussions_payload.sql extends discussions: - Adds raw_payload column for discussion-level API preservation - Enables debugging and data recovery from original responses Migration 005_assignees_milestone_duedate.sql completes the model: - issue_assignees: Many-to-many for multiple assignees per issue - Adds milestone_id, due_date columns to issues table - Indexes for assignee and milestone filtering Schema supports both incremental sync and full historical queries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:27:51 -05:00

8 Commits
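
Commit 4270603da4 describes the `embeddings` rowid as `doc_id * 1000 + chunk_index`. The following is a minimal sketch of that arithmetic, not the repository's actual code: the names `CHUNKS_PER_DOC`, `encode_rowid`, and `decode_rowid` are hypothetical, and the range checks (non-negative `doc_id`, `chunk_index` below 1000) are assumptions implied by the encoding rather than rules stated in the commit message.

```rust
// Hypothetical constant: the multiplier from the commit message (doc_id * 1000 + chunk_index).
const CHUNKS_PER_DOC: i64 = 1000;

/// Pack a document id and chunk index into a single rowid.
/// Returns None when the inputs would make the encoding ambiguous or overflow i64.
/// (The bounds are an assumption; the commit message only gives the formula.)
fn encode_rowid(doc_id: i64, chunk_index: i64) -> Option<i64> {
    if doc_id < 0 || !(0..CHUNKS_PER_DOC).contains(&chunk_index) {
        return None;
    }
    doc_id.checked_mul(CHUNKS_PER_DOC)?.checked_add(chunk_index)
}

/// Recover (doc_id, chunk_index) from a non-negative rowid.
fn decode_rowid(rowid: i64) -> (i64, i64) {
    (rowid / CHUNKS_PER_DOC, rowid % CHUNKS_PER_DOC)
}

fn main() {
    // Chunk 7 of document 42.
    let rowid = encode_rowid(42, 7).expect("inputs are in range");
    let (doc_id, chunk_index) = decode_rowid(rowid);
    println!("rowid={rowid} doc_id={doc_id} chunk_index={chunk_index}");

    // An out-of-range chunk index is rejected instead of colliding with the next document.
    assert!(encode_rowid(42, 1000).is_none());
}
```

Because integer division by 1000 maps every chunk of a document back to the same `doc_id`, vector search hits can be grouped per document without a join, which matches the doc-level deduplication the commit message mentions.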