gitlore

Author	SHA1	Message	Date
teernisse	7fdeafa330	feat(db): add migration 028 for discussions.merge_request_id FK constraint Add foreign key constraint on discussions.merge_request_id to prevent orphaned discussions when MRs are deleted. SQLite doesn't support ALTER TABLE ADD CONSTRAINT, so this migration recreates the table with: 1. New table with FK: REFERENCES merge_requests(id) ON DELETE CASCADE 2. Data copy with FK validation (only copies rows with valid MR references) 3. Table swap (DROP old, RENAME new) 4. Full index recreation (all 10 indexes from migrations 002-022) The migration also includes a CHECK constraint ensuring mutual exclusivity: - Issue discussions have issue_id NOT NULL and merge_request_id NULL - MR discussions have merge_request_id NOT NULL and issue_id NULL Also fixes run_migrations() to properly propagate query errors instead of silently returning unwrap_or defaults, improving error diagnostics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:01 -05:00
teernisse	71d07c28d8	fix(migrations): add schema_version inserts to migrations 022-027 Defense-in-depth: The migration framework already handles missing inserts via INSERT OR REPLACE (db.rs:174), but adding explicit inserts to .sql files ensures consistency and makes migrations self-documenting. Migrations affected: - 022_notes_query_index - 024_note_documents - 025_note_dirty_backfill - 026_scoring_indexes - 027_surgical_sync_runs	2026-02-21 09:20:18 -05:00
teernisse	9ec1344945	feat(surgical-sync): add per-IID surgical sync pipeline with preflight validation Add the ability to sync specific issues or merge requests by IID without running a full incremental sync. This enables fast, targeted data refresh for individual entities — useful for agent workflows, debugging, and real-time investigation of specific issues or MRs. Architecture: - New CLI flags: --issue <IID> and --mr <IID> (repeatable, up to 100 total) scoped to a single project via -p/--project - Preflight phase validates all IIDs exist on GitLab before any DB writes, with TOCTOU-aware soft verification at ingest time - 6-stage pipeline: preflight -> fetch -> ingest -> dependents -> docs -> embed - Each stage is cancellation-aware via ShutdownSignal - Dedicated SyncRunRecorder extensions track surgical-specific counters (issues_fetched, mrs_ingested, docs_regenerated, etc.) New modules: - src/ingestion/surgical.rs: Core surgical fetch/ingest/dependent logic with preflight_fetch(), ingest_issue_by_iid(), ingest_mr_by_iid(), and fetch_dependents_for_{issue,mr}() - src/cli/commands/sync_surgical.rs: Full CLI orchestrator with progress spinners, human/robot output, and cancellation handling - src/embedding/pipeline.rs: embed_documents_by_ids() for scoped embedding - src/documents/regenerator.rs: regenerate_dirty_documents_for_sources() for scoped document regeneration Database changes: - Migration 027: Extends sync_runs with mode, phase, surgical_iids_json, per-entity counters, and cancelled_at column - New indexes: idx_sync_runs_mode_started, idx_sync_runs_status_phase_started GitLab client: - get_issue_by_iid() and get_mr_by_iid() single-entity fetch methods Error handling: - New SurgicalPreflightFailed error variant with entity_type, iid, project, and reason fields. Shares exit code 6 with GitLabNotFound. Includes comprehensive test coverage: - 645 lines of surgical ingestion tests (wiremock-based) - 184 lines of scoped embedding tests - 85 lines of scoped regeneration tests - 113 lines of GitLab client single-entity tests - 236 lines of sync_run surgical column/counter tests - Unit tests for SyncOptions, error codes, and CLI validation	2026-02-18 16:28:21 -05:00
teernisse	94c8613420	feat(bd-226s): implement time-decay expert scoring model Replace flat-weight expertise scoring with exponential half-life decay, split reviewer signals (participated vs assigned-only), dual-path rename awareness, and new CLI flags (--as-of, --explain-score, --include-bots, --all-history). Changes: - ScoringConfig: 8 new fields with validation (config.rs) - half_life_decay() and normalize_query_path() pure functions (who.rs) - CTE-based SQL with dual-path matching, mr_activity, reviewer_participation (who.rs) - Rust-side decay aggregation with deterministic f64 ordering (who.rs) - Path resolution probes check old_path columns (who.rs) - Migration 026: 5 new indexes for dual-path and reviewer participation - Default --since changed from 6m to 24m - 31 new tests (example-based + invariant), 621 total who tests passing - Autocorrect registry updated with new flags Closes: bd-226s, bd-2w1p, bd-1soz, bd-18dn, bd-2ao4, bd-2yu5, bd-1b50, bd-1hoq, bd-1h3f, bd-13q8, bd-11mg, bd-1vti, bd-1j5o	2026-02-12 15:44:55 -05:00
teernisse	83cd16c918	feat: implement per-note search and document pipeline - Add SourceType::Note with extract_note_document() and ParentMetadataCache - Migration 022: composite indexes for notes queries + author_id column - Migration 024: table rebuild adding 'note' to CHECK constraints, defense triggers - Migration 025: backfill existing non-system notes into dirty queue - Add lore notes CLI command with 17 filter options (author, path, resolution, etc.) - Support table/json/jsonl/csv output formats with field selection - Wire note dirty tracking through discussion and MR discussion ingestion - Fix test_migration_024_preserves_existing_data off-by-one (tested wrong migration) - Fix upsert_document_inner returning false for label/path-only changes	2026-02-12 13:31:24 -05:00
teernisse	b29c382583	feat(bd-2g50): fill data gaps in issue detail view Add references_full, user_notes_count, merge_requests_count computed fields to show issue. Add closed_at and confidential columns via migration 023. Closes: bd-2g50	2026-02-12 11:59:44 -05:00
Taylor Eernisse	6b75697638	feat(ingestion): enrich issues with work item status from GraphQL API Add a "Phase 1.5" status enrichment step to the issue ingestion pipeline that fetches work item statuses via the GitLab GraphQL API after the standard REST API ingestion completes. Schema changes (migration 021): - Add status_name, status_category, status_color, status_icon_name, and status_synced_at columns to the issues table (all nullable) Ingestion pipeline changes: - New `enrich_issue_statuses_txn()` function that applies fetched statuses in a single transaction with two phases: clear stale statuses for issues that no longer have a status widget, then apply new/updated statuses from the GraphQL response - ProgressEvent variants for status enrichment (complete/skipped) - IngestProjectResult tracks enrichment metrics (seen, enriched, cleared, without_widget, partial_error_count, enrichment_mode, errors) - Robot mode JSON output includes per-project status enrichment details Configuration: - New `sync.fetchWorkItemStatus` config option (defaults true) to disable GraphQL status enrichment on instances without Premium/Ultimate - `LoreError::GitLabAuthFailed` now treated as permanent API error so status enrichment auth failures don't trigger retries Also removes the unnecessary nested SAVEPOINT in store_closes_issues_refs (already runs within the orchestrator's transaction context). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 08:09:21 -05:00
Taylor Eernisse	7dd86d5433	fix(db): add missing schema_version insert to migration 019 Migration 019 created performance indexes but never recorded itself in the schema_version table. Without this row the migration runner considers the schema outdated and would attempt to re-apply. Adds the standard INSERT INTO schema_version for version 19. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 14:33:13 -05:00
Taylor Eernisse	95b7183add	feat(who): expand expert + overlap queries with mr_file_changes and mr_reviewers Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries) - Add fetch_mr_file_changes config option and --no-file-changes CLI flag - Add GitLab MR diffs API fetch pipeline with watermark-based sync - Create migration 020 for diffs_synced_for_updated_at watermark column - Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL: DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers - Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END) - Add insert_file_change test helper, 8 new who tests, all 397 tests pass - Also includes: list performance migration 019, autocorrect module, README updates Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 13:35:14 -05:00
Taylor Eernisse	121a634653	fix: critical data integrity — timeline dedup, discussion atomicity, index collision Three correctness bugs found via peer code review: 1. TimelineEvent PartialEq/Ord omitted entity_type — issue #42 and MR #42 with the same timestamp and event_type were treated as equal. In a BTreeSet or dedup, one would silently be dropped. Added entity_type to both PartialEq and Ord comparisons. 2. discussions.rs: store_payload() was called outside the transaction (on bare conn) while upsert_discussion/notes were inside. A crash between them left orphaned payload rows. Moved store_payload inside the unchecked_transaction block, matching mr_discussions.rs pattern. 3. Migration 017 created idx_issue_assignees_username(username, issue_id) but migration 005 already created the same index name with just (username). SQLite's IF NOT EXISTS silently skipped the composite version on every existing database. New migration 018 drops and recreates the index with correct composite columns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:54:59 -05:00
Taylor Eernisse	f267578aab	feat: implement lore who — people intelligence commands (5 modes) Add `lore who` command with 5 query modes answering collaboration questions using existing DB data (280K notes, 210K discussions, 33K DiffNotes): - Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring) - Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions) - Active: what discussions need attention (unresolved resolvable, global/project-scoped) - Overlap: who else is touching these files (dual author+reviewer role tracking) - Reviews: what review patterns does a person have (prefix-based category extraction) Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with validation, robot JSON output with input+resolved_input reproducibility, human terminal output, and 20 unit tests. All quality gates pass. Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e, bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 23:11:14 -05:00
Taylor Eernisse	3767c33c28	feat: Implement Gate 3 timeline pipeline and Gate 4 migration scaffolding Complete 5 beads for the Phase B temporal intelligence feature: - bd-1oo: Register migration 015 (commit SHAs, closes watermark) and create migration 016 (mr_file_changes table with 4 indexes for Gate 4 file-history) - bd-20e: Define TimelineEvent model with 9 event type variants, EntityRef, ExpandedEntityRef, UnresolvedRef, and TimelineResult types. Ord impl for chronological sorting with stable tiebreak. - bd-32q: Implement timeline seed phase - FTS5 keyword search to entity IDs with discussion-to-parent resolution, entity dedup, and evidence note extraction with snippet truncation. - bd-ypa: Implement timeline expand phase - BFS cross-reference expansion over entity_references with bidirectional traversal, depth limiting, mention filtering, provenance tracking, and unresolved reference collection. - bd-3as: Implement timeline event collection - gathers Created, StateChanged, LabelAdded/Removed, MilestoneSet/Removed, Merged, and NoteEvidence events. Merged dedup (state=merged -> Merged variant only). NULL label/milestone fallbacks. Chronological interleaving with since filter and limit. 38 new tests, all 445 tests pass. All quality gates clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 16:54:28 -05:00
Taylor Eernisse	233eb546af	feat: Add commit SHAs, closes_issues watermark, and PRD alignment Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests (Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark for incremental sync, and the missing idx_label_events_label index. The MR transformer and ingestion pipeline now populate commit SHAs during sync. The orchestrator uses watermark-based filtering for closes_issues jobs instead of re-enqueuing all MRs every sync. The Phase B PRD is updated to match the actual codebase: corrected migration numbering (011-015), documented nullable label/milestone fields (migration 012), watermark patterns (013), observability infrastructure (014), simplified source_method values, and updated entity_references schema to match implementation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 15:29:51 -05:00
teernisse	329c8f4539	feat(observability): Add metrics, logging, and sync-run core modules Introduce the foundational observability layer for the sync pipeline: - MetricsLayer: Custom tracing subscriber layer that captures span timing and structured fields, materializing them into a hierarchical Vec<StageTiming> tree for robot-mode performance data output - logging: Dual-layer subscriber infrastructure with configurable stderr verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily rotation and configurable retention (default 30 days) - SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs table (start -> succeed\|fail), with correlation IDs and aggregate counts - LoggingConfig: New config section for log_dir, retention_days, and file_logging toggle - get_log_dir(): Path helper for log directory resolution - is_permanent_api_error(): Distinguish retryable vs permanent API failures (only 404 is truly permanent; 403/auth errors may be environmental) Database changes: - Migration 013: Add resource_events_synced_for_updated_at watermark columns to issues and merge_requests tables for incremental resource event sync - Migration 014: Enrich sync_runs with run_id correlation ID, aggregate counts (total_items_processed, total_errors), and run_id index - Wrap file-based migrations in savepoints for rollback safety Dependencies: Add uuid (run_id generation), tracing-appender (file logging) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:38:29 -05:00
Taylor Eernisse	a92e176bb6	fix(events): Handle nullable label and milestone in resource events GitLab returns null for the label/milestone fields on resource_label_events and resource_milestone_events when the referenced label or milestone has been deleted. This caused deserialization failures during sync. - Add migration 012 to recreate both event tables with nullable label_name, milestone_title, and milestone_id columns (SQLite requires table recreation to alter NOT NULL constraints) - Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone to Option<> in the Rust types - Update upsert functions to pass through None values correctly - Add tests for null label and null milestone deserialization Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:36:17 -05:00
Taylor Eernisse	ce5cd9c95d	feat(schema): Add migration 011 for resource events, entity references, and dependent fetch queue Introduces five new tables that power temporal queries (timeline, file-history, trace) via GitLab Resource Events APIs: - resource_state_events: State transitions (opened/closed/reopened/merged/locked) with actor tracking, source commit, and source MR references - resource_label_events: Label add/remove history per entity - resource_milestone_events: Milestone assignment changes per entity - entity_references: Cross-reference table (Gate 2 prep) linking source/target entity pairs with reference type and discovery method - pending_dependent_fetches: Generic job queue for resource_events, mr_closes_issues, and mr_diffs with exponential backoff retry All event tables enforce entity exclusivity via CHECK constraints (exactly one of issue_id or merge_request_id must be non-NULL). Deduplication handled via UNIQUE indexes on (gitlab_id, project_id). FK cascades ensure cleanup when parent entities are removed. The dependent fetch queue uses a UNIQUE constraint on (project_id, entity_type, entity_iid, job_type) for idempotent enqueue, with partial indexes optimizing claim and retry queries. Registered as migration 011 in the embedded MIGRATIONS array in db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:06:43 -05:00
Taylor Eernisse	2a52594a60	feat(db): Add migration 010 for chunk config tracking columns Add chunk_max_bytes and chunk_count columns to embedding_metadata to support config drift detection and adaptive dedup sizing. Includes a partial index on sentinel rows (chunk_index=0) to accelerate the drift detection and max-chunk queries. Also exports LATEST_SCHEMA_VERSION as a public constant derived from the MIGRATIONS array length, replacing the previously hardcoded magic number in the health check. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:34:48 -05:00
Taylor Eernisse	4270603da4	feat(db): Add migrations for documents, FTS5, and embeddings Three new migrations establish the search infrastructure: - 007_documents: Creates the `documents` table as the central search unit. Each document is a rendered text blob derived from an issue, MR, or discussion. Includes `dirty_queue` table for tracking which entities need document regeneration after ingestion changes. - 008_fts5: Creates FTS5 virtual table `documents_fts` with content sync triggers. Uses `unicode61` tokenizer with `remove_diacritics=2` for broad language support. Automatic insert/update/delete triggers keep the FTS index synchronized with the documents table. - 009_embeddings: Creates `embeddings` table for storing vector chunks produced by Ollama. Uses `doc_id * 1000 + chunk_index` rowid encoding to support multi-chunk documents while enabling efficient doc-level deduplication in vector search results. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:41 -05:00
Taylor Eernisse	39a71d8b85	feat(db): Add schema migration v6 for merge request support Introduces comprehensive database schema for merge request ingestion (CP2), designed with forward compatibility for future features. New tables: - merge_requests: Core MR metadata with draft status, branch info, detailed_merge_status (modern API field), and sync health telemetry columns for debuggability - mr_labels: Junction table linking MRs to shared labels table - mr_assignees: MR assignee usernames (same pattern as issues) - mr_reviewers: MR-specific reviewer tracking (not applicable to issues) Additional indexes: - discussions: Add merge_request_id and resolved status indexes - notes: Add composite indexes for DiffNote file/line queries DiffNote position enhancements: - position_type: 'text' \| 'image' \| 'file' for diff comment semantics - position_line_range_start/end: Multi-line comment range support - position_base_sha/start_sha/head_sha: Commit context for diff notes The schema captures CP3-ready fields (head_sha, references_short/full, SHA triplet) at zero additional API cost, preparing for file-context and cross-project reference features. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:44:37 -05:00
Taylor Eernisse	d15f457a58	feat(db): Add SQLite database migrations for GitLab data model Implements a comprehensive relational schema for storing GitLab data with full audit trail and raw payload preservation. Migration 001_initial.sql establishes core metadata tables: - projects: Tracked GitLab projects with paths and namespace - sync_watermarks: Cursor-based incremental sync state per project - schema_migrations: Migration tracking with checksums for integrity Migration 002_issues.sql creates the issues data model: - issues: Core issue data with timestamps, author, state, counts - labels: Project-specific label definitions with colors/descriptions - issue_labels: Many-to-many junction for issue-label relationships - milestones: Project milestones with state and due dates - discussions: Threaded discussions linked to issues/MRs - notes: Individual notes within discussions with full metadata - raw_payloads: Compressed original API responses keyed by entity Migration 003_indexes.sql adds performance indexes: - Covering indexes for common query patterns (state, updated_at) - Composite indexes for filtered queries (project + state) Migration 004_discussions_payload.sql extends discussions: - Adds raw_payload column for discussion-level API preservation - Enables debugging and data recovery from original responses Migration 005_assignees_milestone_duedate.sql completes the model: - issue_assignees: Many-to-many for multiple assignees per issue - Adds milestone_id, due_date columns to issues table - Indexes for assignee and milestone filtering Schema supports both incremental sync and full historical queries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:27:51 -05:00

20 Commits