gitlore

Author	SHA1	Message	Date
teernisse	a0519a4d0d	feat(surgical-sync): add per-IID surgical sync pipeline Implement lore sync --issue <IID> --mr <IID> -p <project> for on-demand sync of specific entities without running the full project-wide pipeline. Completes in seconds by fetching only targeted entities, their discussions, resource events, and dependent data, then scoping doc regeneration and embedding to only affected documents. Pipeline stages: PREFLIGHT -> TOCTOU -> INGEST -> DEPENDENTS -> DOCS -> EMBED New files: - src/ingestion/surgical.rs: TOCTOU guard, preflight fetch, per-entity ingest - src/ingestion/surgical_tests.rs: 17 unit/wiremock tests - src/cli/commands/sync_surgical.rs: 719-line orchestrator - src/embedding/pipeline_tests.rs: scoped embedding tests - src/gitlab/client_tests.rs: get_by_iid wiremock tests - migrations/027_surgical_sync_runs.sql: 12 surgical columns + indexes Key changes: - SyncOptions: issue_iids, mr_iids, project, preflight_only fields - SyncResult: surgical_mode, surgical_iids, entity_results fields - SyncRunRecorder: surgical lifecycle methods (set_surgical_metadata, etc) - GitLabClient: get_issue_by_iid, get_mr_by_iid - Scoped docs: regenerate_dirty_documents_for_sources - Scoped embed: embed_documents_by_ids - run_sync dispatches to run_sync_surgical when is_surgical() - robot-docs updated with surgical sync schema + workflows - All 1019 tests pass, clippy clean Closes: bd-1sc6, bd-tiux, bd-159p, bd-1lja, bd-hs6j, bd-1elx, bd-arka, bd-3sez, bd-wcja, bd-kanh, bd-1i4i, bd-3bec	2026-02-18 15:39:14 -05:00
Taylor Eernisse	dc49f5209e	feat(gitlab): add GraphQL client with adaptive pagination and work item status types Introduce a reusable GraphQL client (`src/gitlab/graphql.rs`) that handles GitLab's GraphQL API with full error handling for auth failures, rate limiting, and partial errors. Key capabilities: - Adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL complexity limits without hardcoding a single safe page size - Paginated issue status fetching via the workItems GraphQL query - Graceful detection of unsupported instances (missing GraphQL endpoint or forbidden auth) so ingestion continues without status data - Retry-After header parsing via the `httpdate` crate for rate limit compliance Also adds `WorkItemStatus` type to `gitlab::types` with name, category, color, and icon_name fields (all optional except name) with comprehensive deserialization tests covering all system statuses (TO_DO, IN_PROGRESS, DONE, CANCELED) and edge cases (null category, unknown future values). The `GitLabClient` gains a `graphql_client()` factory method for ergonomic access from the ingestion pipeline. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 08:08:53 -05:00
Taylor Eernisse	95b7183add	feat(who): expand expert + overlap queries with mr_file_changes and mr_reviewers Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries) - Add fetch_mr_file_changes config option and --no-file-changes CLI flag - Add GitLab MR diffs API fetch pipeline with watermark-based sync - Create migration 020 for diffs_synced_for_updated_at watermark column - Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL: DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers - Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END) - Add insert_file_change test helper, 8 new who tests, all 397 tests pass - Also includes: list performance migration 019, autocorrect module, README updates Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 13:35:14 -05:00
Taylor Eernisse	e6b880cbcb	fix: prevent panics in robot-mode JSON output and arithmetic paths Peer code review found multiple panic-reachable paths: 1. serde_json::to_string().unwrap() in 4 robot-mode output functions (who.rs, main.rs x3). If serialization ever failed (e.g., NaN from edge-case division), the CLI would panic with an unhelpful stack trace. Replaced with unwrap_or_else that emits a structured JSON error fallback. 2. encode_rowid() in chunk_ids.rs used unchecked multiplication (document_id * 1000). On extreme document IDs this could silently wrap in release mode, causing embedding rowid collisions. Now uses checked_mul + checked_add with a diagnostic panic message. 3. HTTP response body truncation at byte index 500 in client.rs could split a multi-byte UTF-8 character, causing a panic. Now uses floor_char_boundary(500) for safe truncation. 4. who.rs reviews mode: SQL used `m.author_username != ?1` which silently dropped MRs with NULL author_username (SQL NULL != anything = NULL). Changed to `(m.author_username IS NULL OR m.author_username != ?1)` to match the pattern already used in expert mode. 5. handle_auth_test hardcoded exit code 5 for all errors regardless of type. Config not found (20), token not set (4), and network errors (8) all incorrectly returned 5. Now uses e.exit_code() from the actual LoreError, with proper suggestion hints in human mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:55:20 -05:00
Taylor Eernisse	db750e4fc5	fix: Graceful HTTP client fallbacks and overflow protection HTTP client initialization (embedding/ollama.rs, gitlab/client.rs): - Replace expect/panic with unwrap_or_else fallback to default Client - Log warning when configured client fails to build - Prevents crash on TLS/system configuration issues Doctor command (cli/commands/doctor.rs): - Handle reqwest Client::builder() failure in Ollama health check - Return Warning status with descriptive message instead of panicking - Ensures doctor command remains operational even with HTTP issues These changes improve resilience when running in unusual environments (containers with limited TLS, restrictive network policies, etc.) without affecting normal operation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:40 -05:00
Taylor Eernisse	26cf13248d	feat(gitlab): Add MR closes_issues API endpoint and GitLabIssueRef type Extends the GitLab client to fetch the list of issues that an MR will close when merged, using the /projects/:id/merge_requests/:iid/closes_issues endpoint. New type: - GitLabIssueRef: Lightweight issue reference with id, iid, project_id, title, state, and web_url. Used for the closes_issues response which returns a list of issue summaries rather than full GitLabIssue objects. New client method: - fetch_mr_closes_issues(gitlab_project_id, iid): Returns Vec<GitLabIssueRef> for all issues that the MR's description/commits indicate will be closed. This enables building the entity_references table from API data in addition to parsing system notes, providing more reliable cross-reference discovery. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:30 -05:00
Taylor Eernisse	925ec9f574	fix: Retry loop safety, doctor model matching, regenerator robustness Three defensive improvements from peer code review: Replace unreachable!() in GitLab client retry loops: Both request() and request_with_headers() had unreachable!() after their for loops. While the logic was sound (the final iteration always reaches the return/break), any refactor to the loop condition would turn this into a runtime panic. Restructured both to store last_response with explicit break, making the control flow self-documenting and the .expect() message useful if ever violated. Doctor model name comparison asymmetry: Ollama model names were stripped of their tag (:latest, :v1.5) for comparison, but the configured model name was compared as-is. A config value like "nomic-embed-text:v1.5" would never match. Now strips the tag from both sides before comparing. Regenerator savepoint cleanup and progress accuracy: - upsert_document's error path did ROLLBACK TO but never RELEASE, leaving a dangling savepoint that could nest on the next call. Added RELEASE after rollback so the connection is clean. - estimated_total for progress reporting was computed once at start but the dirty queue can grow during processing. Now recounts each loop iteration with max() so the progress fraction never goes backwards. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:54 -05:00
teernisse	86a51cddef	fix: Project-scoped job claiming, structured rate-limit logging, RRF total_cmp Targeted fixes across multiple subsystems: dependent_queue: - Add project_id parameter to claim_jobs() for project-scoped job claiming, preventing cross-project job theft during concurrent multi-project ingestion - Add project_id parameter to count_pending_jobs() with optional scoping (None returns global counts, Some(pid) returns per-project counts) gitlab/client: - Downgrade rate-limit log from warn to info (429s are expected operational behavior, not warnings) and add structured fields (path, status_code) for better log filtering and aggregation gitlab/transformers/discussion: - Add tracing::warn on invalid timestamp parse instead of silent fallback to epoch 0, making data quality issues visible in logs ingestion/merge_requests: - Remove duplicate doc comment on upsert_label_tx search/rrf: - Replace partial_cmp().unwrap_or() with total_cmp() for f64 sorting, eliminating the NaN edge case entirely (total_cmp treats NaN consistently) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:39:13 -05:00
Taylor Eernisse	ee5c5f9645	perf: Eliminate double serialization, add SQLite tuning, optimize hot paths 11 isomorphic performance fixes from deep audit (no behavior changes): - Eliminate double serialization: store_payload now accepts pre-serialized bytes (&[u8]) instead of re-serializing from serde_json::Value. Uses Cow<[u8]> for zero-copy when compression is disabled. - Add SQLite cache_size (64MB) and mmap_size (256MB) pragmas - Replace SELECT-then-INSERT label upserts with INSERT...ON CONFLICT RETURNING in both issues.rs and merge_requests.rs - Replace INSERT + SELECT milestone upsert with RETURNING - Use prepare_cached for 5 hot-path queries in extractor.rs - Optimize compute_list_hash: index-sort + incremental SHA-256 instead of clone+sort+join+hash - Pre-allocate embedding float-to-bytes buffer with Vec::with_capacity - Replace RandomState::new() in rand_jitter with atomic counter XOR nanos - Remove redundant per-note payload storage (discussion payload contains all notes already) - Change transform_issue to accept &GitLabIssue (avoids full struct clone) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 08:12:37 -05:00
Taylor Eernisse	f5b4a765b7	perf: Configurable rate limit, 429 auto-retry, concurrent project ingestion The sync pipeline was bottlenecked at 10 req/s (hardcoded) with sequential project processing and no retry on rate limiting. These changes target 3-5x throughput improvement. Rate limit configuration: - Add requestsPerSecond to SyncConfig (default 30.0, was hardcoded 10) - Pass configured rate through to GitLabClient::new from ingest - Floor rate at 0.1 rps in RateLimiter::new to prevent panic on Duration::from_secs_f64(1.0 / 0.0) — now reachable via user config 429 auto-retry: - Both request() and request_with_headers() retry up to 3 times on HTTP 429, respecting the retry-after header (default 60s) - Extract parse_retry_after helper, reused by handle_response fallback - After exhausting retries, the 429 error propagates as before - Improved JSON decode errors now include a response body preview Concurrent project ingestion: - Derive Clone on GitLabClient (cheap: shares Arc<Mutex<RateLimiter>> and reqwest::Client which is already Arc-backed) - Restructure project loop to use futures::stream::buffer_unordered with primary_concurrency (default 4) as the parallelism bound - Each project gets its own SQLite connection (WAL mode + busy_timeout handles concurrent writes) - Add show_spinner field to IngestDisplay to separate the per-project spinner from the sync-level stage spinner - Error aggregation defers failures: all successful projects get their summaries printed and results counted before returning the first error - Bump dependentConcurrency default from 2 to 8 for discussion prefetch Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:37:06 -05:00
Taylor Eernisse	deafa88af5	perf: Concurrent resource event fetching, remove unnecessary async client.rs: - fetch_all_resource_events() now uses tokio::try_join!() to fire all three API requests (state, label, milestone events) concurrently instead of awaiting each sequentially. For entities with many events, this reduces wall-clock time by up to ~3x since the three independent HTTP round-trips overlap. main.rs: - Removed async from handle_issues() and handle_mrs(). These functions perform only synchronous database queries and formatting; they never await anything. Removing the async annotation avoids the overhead of an unnecessary Future state machine and makes the sync nature of these code paths explicit. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:09:44 -05:00
Taylor Eernisse	a50fc78823	style: Apply cargo fmt and clippy fixes across codebase Automated formatting and lint corrections from parallel agent work: - cargo fmt: import reordering (alphabetical), line wrapping to respect max width, trailing comma normalization, destructuring alignment, function signature reformatting, match arm formatting - clippy (pedantic): Range::contains() instead of manual comparisons, i64::from() instead of `as i64` casts, .clamp() instead of .max().min() chains, let-chain refactors (if-let with &&), #[allow(clippy::too_many_arguments)] and #[allow(clippy::field_reassign_with_default)] where warranted - Removed trailing blank lines and extra whitespace No behavioral changes. All existing tests pass unmodified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:59 -05:00
Taylor Eernisse	e73d2907dc	feat(client): Add Resource Events API endpoints with generic paginated fetcher Extends GitLabClient with methods for fetching resource events from GitLab's per-entity API endpoints. Adds a new impl block containing: - fetch_all_pages<T>: Generic paginated collector that handles x-next-page header parsing with fallback to page-size heuristics. Uses per_page=100 and respects the existing rate limiter via request_with_headers. Terminates when: (a) x-next-page header is absent/stale, (b) response is empty, or (c) page is not full. - Six typed endpoint methods: - fetch_issue_state_events / fetch_mr_state_events - fetch_issue_label_events / fetch_mr_label_events - fetch_issue_milestone_events / fetch_mr_milestone_events - fetch_all_resource_events: Convenience method that fetches all three event types for an entity (issue or merge_request) in sequence, returning a tuple of (state, label, milestone) event vectors. Routes to issue or MR endpoints based on entity_type string. All methods follow the existing client patterns: path formatting with gitlab_project_id and iid, error propagation via Result, and rate limiter integration through the shared request_with_headers path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:19 -05:00
Taylor Eernisse	d31d5292f2	fix(gitlab): Improve pagination heuristics and fix rate limiter lock contention Two targeted fixes to the GitLab API client: 1. Pagination: When the x-next-page header is missing but the current page returned a full page of results, heuristically advance to the next page instead of stopping. This fixes silent data truncation observed with certain GitLab instances that omit pagination headers on intermediate pages. The existing early-exit on empty or partial pages remains as the termination condition. 2. Rate limiter: Refactor the async acquire() method into a synchronous check_delay() that computes the required sleep duration and updates last_request time while holding the mutex, then releases the lock before sleeping. This eliminates holding the Mutex<RateLimiter> across an await point, which previously could block other request tasks unnecessarily during the sleep interval. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:05 -05:00
Taylor Eernisse	cc8c489fd2	feat(gitlab): Add MR and MR discussion API endpoints to client Extends GitLabClient with endpoints for fetching merge requests and their discussions, following the same patterns established for issues. New methods: - fetch_merge_requests(): Paginated MR listing with cursor support, using updated_after filter for incremental sync. Uses 'all' scope to include MRs where user is author/assignee/reviewer. - fetch_merge_requests_single_page(): Single page variant for callers managing their own pagination (used by parallel prefetch) - fetch_mr_discussions(): Paginated discussion listing for a single MR, returns full discussion trees with notes API design notes: - Uses keyset pagination (order_by=updated_at, keyset=true) for consistent results during sync operations - MR endpoint uses /merge_requests (not /mrs) per GitLab API naming - Discussion endpoint matches issue pattern for consistency - Per_page defaults to 100 (GitLab max) for efficiency The fetch_merge_requests_single_page method enables the parallel prefetch strategy used in mr_discussions.rs, where multiple MRs' discussions are fetched concurrently during the sweep phase. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:45:13 -05:00
Taylor Eernisse	dd5eb04953	feat(gitlab): Implement GitLab REST API client and type definitions Provides a typed interface to the GitLab API with pagination support. src/gitlab/types.rs - API response type definitions: - GitLabIssue: Full issue payload with author, assignees, labels - GitLabDiscussion: Discussion thread with notes array - GitLabNote: Individual note with author, timestamps, body - GitLabAuthor/GitLabUser: User information with avatar URLs - GitLabProject: Project metadata from /api/v4/projects - GitLabVersion: GitLab instance version from /api/v4/version - GitLabNotePosition: Line-level position for diff notes - All types derive Deserialize for JSON parsing src/gitlab/client.rs - HTTP client with authentication: - Bearer token authentication from config - Base URL configuration for self-hosted instances - Paginated iteration via keyset or offset pagination - Automatic Link header parsing for next page URLs - Per-page limit control (default 100) - Methods: get_user(), get_version(), get_project() - Async stream for issues: list_issues_paginated() - Async stream for discussions: list_issue_discussions_paginated() - Respects GitLab rate limiting via response headers src/gitlab/transformers/ - API to database mapping: transformers/issue.rs - Issue transformation: - Maps GitLabIssue to IssueRow for database insert - Extracts milestone ID and due date - Normalizes author/assignee usernames - Preserves label IDs for junction table - Returns IssueWithMetadata including label/assignee lists transformers/discussion.rs - Discussion transformation: - Maps GitLabDiscussion to NormalizedDiscussion - Extracts thread metadata (resolvable, resolved) - Flattens notes to NormalizedNote with foreign keys - Handles system notes vs user notes - Preserves note position for diff discussions transformers/mod.rs - Re-exports all transformer types Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:28:21 -05:00

16 Commits