gitlore

Author	SHA1	Message	Date
Taylor Eernisse	ab43bbd2db	feat: Add dry-run mode to ingest, sync, and stats commands Enables preview of operations without making changes, useful for understanding what would happen before committing to a full sync. Ingest dry-run (--dry-run flag): - Shows resource type, sync mode (full vs incremental), project list - Per-project info: existing count, has_cursor, last_synced timestamp - No GitLab API calls, no database writes Sync dry-run (--dry-run flag): - Preview all four stages: issues ingest, MRs ingest, docs, embed - Shows which stages would run vs be skipped (--no-docs, --no-embed) - Per-project breakdown for both entity types Stats repair dry-run (--dry-run flag): - Shows what would be repaired without executing repairs - "would fix" vs "fixed" indicator in terminal output - dry_run: true field in JSON response Implementation details: - DryRunPreview struct captures project-level sync state - SyncDryRunResult aggregates previews for all sync stages - Terminal output uses yellow styling for "would" actions - JSON output includes dry_run: true at top level Flag handling: - --dry-run and --no-dry-run pair for explicit control - Defaults to false (normal operation) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:22 -05:00
Taylor Eernisse	784fe79b80	feat(show): Enrich issue detail with assignees, milestones, and closing MRs Issue detail now includes: - assignees: List of assigned usernames from issue_assignees table - due_date: Issue due date when set - milestone: Milestone title when assigned - closing_merge_requests: MRs that will close this issue when merged Closing MR detection: - Queries entity_references table for 'closes' reference type - Shows MR iid, title, state (with color coding) in terminal output - Full MR metadata included in JSON output Human-readable output: - "Assignees:" line shows comma-separated @usernames - "Development:" section lists closing MRs with state indicator - Green for merged, cyan for opened, red for closed JSON output: - New fields: assignees, due_date, milestone, closing_merge_requests - closing_merge_requests array contains iid, title, state, web_url Test coverage: - get_issue_assignees: empty, single, multiple (alphabetical order) - get_closing_mrs: empty, single, ignores 'mentioned' references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:02 -05:00
Taylor Eernisse	db750e4fc5	fix: Graceful HTTP client fallbacks and overflow protection HTTP client initialization (embedding/ollama.rs, gitlab/client.rs): - Replace expect/panic with unwrap_or_else fallback to default Client - Log warning when configured client fails to build - Prevents crash on TLS/system configuration issues Doctor command (cli/commands/doctor.rs): - Handle reqwest Client::builder() failure in Ollama health check - Return Warning status with descriptive message instead of panicking - Ensures doctor command remains operational even with HTTP issues These changes improve resilience when running in unusual environments (containers with limited TLS, restrictive network policies, etc.) without affecting normal operation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:40 -05:00
Taylor Eernisse	72f1cafdcf	perf: Optimize SQL queries and reduce allocations in hot paths Change detection queries (embedding/change_detector.rs): - Replace triple-EXISTS subquery pattern with LEFT JOIN + NULL check - SQLite now scans embedding_metadata once instead of three times - Semantically identical: returns docs needing embedding when no embedding exists, hash changed, or config mismatch Count queries (cli/commands/count.rs): - Consolidate 3 separate COUNT queries for issues into single query using conditional aggregation (CASE WHEN state = 'x' THEN 1) - Same optimization for MRs: 5 queries reduced to 1 Search filter queries (search/filters.rs): - Replace N separate EXISTS clauses for label filtering with single IN() clause with COUNT/GROUP BY HAVING pattern - For multi-label AND queries, this reduces N subqueries to 1 FTS tokenization (search/fts.rs): - Replace collect-into-Vec-then-join pattern with direct String building - Pre-allocate capacity hint for result string Discussion truncation (documents/truncation.rs): - Calculate total length without allocating concatenated string first - Only allocate full string when we know it fits within limit Embedding pipeline (embedding/pipeline.rs): - Add Vec::with_capacity hints for chunk work and cleared_docs hashset - Reduces reallocations during embedding batch processing Backoff calculation (core/backoff.rs): - Replace unchecked addition with saturating_add to prevent overflow - Add test case verifying overflow protection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:28 -05:00
Taylor Eernisse	9c04b7fb1b	chore(beads): Update issue tracker metadata Syncs .beads/issues.jsonl and last-touched timestamp with current project state. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:44 -05:00
Taylor Eernisse	dd2869fd98	test: Remove redundant comments from test files Applies the same doc comment cleanup to test files: - Removes test module headers (//! lines) - Removes obvious test function comments - Retains comments explaining non-obvious test scenarios Test names should be descriptive enough to convey intent without additional comments. Complex test setup or assertions that need explanation retain their comments. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:39 -05:00
Taylor Eernisse	65583ed5d6	refactor: Remove redundant doc comments throughout codebase Removes module-level doc comments (//! lines) and excessive inline doc comments that were duplicating information already evident from: - Function/struct names (self-documenting code) - Type signatures (the what is clear from types) - Implementation context (the how is clear from code) Affected modules: - cli/* - Removed command descriptions duplicating clap help text - core/* - Removed module headers and obvious function docs - documents/* - Removed extractor/regenerator/truncation docs - embedding/* - Removed pipeline and chunking docs - gitlab/* - Removed client and transformer docs (kept type definitions) - ingestion/* - Removed orchestrator and ingestion docs - search/* - Removed FTS and vector search docs Philosophy: Code should be self-documenting. Comments should explain "why" (business decisions, non-obvious constraints) not "what" (which the code itself shows). This change reduces noise and maintenance burden while keeping the codebase just as understandable. Retains comments for: - Non-obvious business logic - Important safety invariants - Complex algorithm explanations - Public API boundaries where generated docs matter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:32 -05:00
Taylor Eernisse	976ad92ef0	test(gitlab): Add GitLabIssueRef deserialization tests Adds test coverage for the new GitLabIssueRef type used by the MR closes_issues API endpoint: - deserializes_gitlab_issue_ref: Single object with all fields - deserializes_gitlab_issue_ref_array: Array of refs (typical API response) Validates that cross-project references (different project_id values) deserialize correctly, which is important for cross-project close links. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:47 -05:00
Taylor Eernisse	a76dc8089e	feat(orchestrator): Integrate closes_issues fetching and cross-ref extraction Extends the MR ingestion pipeline to populate the entity_references table from multiple sources: 1. Resource state events (extract_refs_from_state_events): Called after draining the resource_events queue for both issues and MRs. Extracts "closes" relationships from the structured API data. 2. System notes (extract_refs_from_system_notes): Called during MR ingestion to parse "mentioned in" and "closed by" patterns from discussion note bodies. 3. MR closes_issues API (new): - enqueue_mr_closes_issues_jobs(): Queues jobs for all MRs - drain_mr_closes_issues(): Fetches closes_issues for each MR - Records cross-references with source_method='closes_issues_api' New progress events: - ClosesIssuesFetchStarted { total } - ClosesIssueFetched { current, total } - ClosesIssuesFetchComplete { fetched, failed } New result fields on IngestMrProjectResult: - closes_issues_fetched: Count of successful fetches - closes_issues_failed: Count of failed fetches The pipeline now comprehensively builds the relationship graph between issues and MRs, enabling queries like "what will close this issue?" Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:40 -05:00
Taylor Eernisse	26cf13248d	feat(gitlab): Add MR closes_issues API endpoint and GitLabIssueRef type Extends the GitLab client to fetch the list of issues that an MR will close when merged, using the /projects/:id/merge_requests/:iid/closes_issues endpoint. New type: - GitLabIssueRef: Lightweight issue reference with id, iid, project_id, title, state, and web_url. Used for the closes_issues response which returns a list of issue summaries rather than full GitLabIssue objects. New client method: - fetch_mr_closes_issues(gitlab_project_id, iid): Returns Vec<GitLabIssueRef> for all issues that the MR's description/commits indicate will be closed. This enables building the entity_references table from API data in addition to parsing system notes, providing more reliable cross-reference discovery. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:30 -05:00
Taylor Eernisse	a2e26454dc	build: Add regex dependency for cross-reference parsing The note_parser module requires regex for extracting "mentioned in" and "closed by" patterns from GitLab system notes. The regex crate provides: - LazyLock-compatible lazy compilation (Regex::new at first use) - Named capture groups for clean field extraction - Efficient iteration over all matches via captures_iter() Version 1.x is the current stable release with good compile times. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:21 -05:00
Taylor Eernisse	f748570d4d	feat(core): Add cross-reference extraction infrastructure Introduces two new modules for extracting and storing entity cross-references from GitLab data: note_parser.rs: - Parses system notes for "mentioned in" and "closed by" patterns - Extracts cross-project references (group/project#42, group/project!123) - Uses lazy-compiled regexes for performance - Handles both issue (#) and MR (!) sigils - Provides extract_refs_from_system_notes() for batch processing references.rs: - Extracts refs from resource_state_events table (API-sourced closes links) - Provides insert_entity_reference() for storing discovered references - Includes resolution helpers: resolve_issue_local_id, resolve_mr_local_id, resolve_project_path for converting iids to internal IDs - Enables cross-project reference resolution These modules power the entity_references table, enabling features like "find all MRs that close this issue" and "find all issues mentioned in this MR". Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:13 -05:00
Taylor Eernisse	0b6b168043	chore(beads): Update issue tracker metadata Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 15:02:17 -05:00
Taylor Eernisse	1d003aeac2	fix(sync): Replace text-only progress with animated bars for docs/embed stages Stages 3 (generate-docs) and 4 (embed) reported progress by appending "(N/M)" text to the stage spinner message, while stages 1-2 (ingest) used dedicated indicatif progress bars with animated [====> ] rendering registered with the global MultiProgress. This visual inconsistency was introduced when progress callbacks were wired through in `266ed78`. Replace the spinner.set_message() callbacks with proper ProgressBar instances that match the ingest stage pattern: - Create a bar-style ProgressBar registered via multi().add() - Use the same template/progress_chars as the ingest discussion bars - Lazy-init the tick via AtomicBool to avoid showing the bar before the first callback fires (matching how ingest enables ticks only at DiscussionSyncStarted) - Update set_length on every callback for the docs stage, since the regenerator's estimated_total can grow if new dirty items are queued during processing (using .max() internally) - Clean up both the sub-bar and stage spinner on completion/error Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 15:02:13 -05:00
Taylor Eernisse	925ec9f574	fix: Retry loop safety, doctor model matching, regenerator robustness Three defensive improvements from peer code review: Replace unreachable!() in GitLab client retry loops: Both request() and request_with_headers() had unreachable!() after their for loops. While the logic was sound (the final iteration always reaches the return/break), any refactor to the loop condition would turn this into a runtime panic. Restructured both to store last_response with explicit break, making the control flow self-documenting and the .expect() message useful if ever violated. Doctor model name comparison asymmetry: Ollama model names were stripped of their tag (:latest, :v1.5) for comparison, but the configured model name was compared as-is. A config value like "nomic-embed-text:v1.5" would never match. Now strips the tag from both sides before comparing. Regenerator savepoint cleanup and progress accuracy: - upsert_document's error path did ROLLBACK TO but never RELEASE, leaving a dangling savepoint that could nest on the next call. Added RELEASE after rollback so the connection is clean. - estimated_total for progress reporting was computed once at start but the dirty queue can grow during processing. Now recounts each loop iteration with max() so the progress fraction never goes backwards. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:54 -05:00
Taylor Eernisse	1fdc6d03cc	fix: Savepoint leak in embedding pipeline, atomic fail_job, RRF dedup Three correctness fixes found during peer code review: Embedding pipeline savepoint leak (HIGH severity): The SAVEPOINT embed_page / RELEASE embed_page pattern had ~10 `?` propagation points between them. Any error from record_embedding_error, clear_document_embeddings, or store_embedding would exit the function without rolling back, leaving the SQLite connection in a broken transactional state and causing cascading failures for the rest of the session. Fixed by extracting page processing into `embed_page()` and wrapping with explicit rollback-on-error handling. Dependent queue fail_job race (MEDIUM severity): fail_job performed a SELECT followed by a separate UPDATE on the attempts counter without a transaction. Under concurrent lock reclamation, the attempts value could be read stale. Replaced with a single atomic UPDATE that increments attempts and computes exponential backoff entirely in SQL, also halving DB round-trips. Added explicit error when the job no longer exists. RRF duplicate document score inflation (MEDIUM severity): If a retriever returned the same document_id multiple times, the RRF score accumulated multiple rank contributions while the rank only recorded the first occurrence. Moved the score accumulation inside the `if is_none` guard so only the first occurrence per list contributes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:38 -05:00
Taylor Eernisse	266ed78e73	feat(sync): Wire progress callbacks through sync pipeline stages The sync command's stage spinners now show real-time aggregate progress for each pipeline phase instead of static "syncing..." messages. - Add `progress_callback` parameter to `run_embed` and `run_generate_docs` so callers can receive `(processed, total)` updates - Add `stage_bar` parameter to `run_ingest` for aggregate progress across concurrently-ingested projects using shared AtomicUsize counters - Update `stage_spinner` to use `{prefix}` for the `[N/M]` label, allowing `{msg}` to be updated independently with progress details - Thread `ProgressBar` clones into each concurrent project task so per-entity progress (fetch, discussions, events) is reflected on the aggregate spinner - Pass `None` for progress callbacks at standalone CLI entry points (handle_ingest, handle_generate_docs, handle_embed) to preserve existing behavior when commands are run outside of sync Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:21 -05:00
teernisse	a65ea2f56f	chore(beads): Add observability and orchestrator issues to tracker Add new beads for MR orchestrator integration, sync run observability, metrics collection, logging infrastructure, and CLI verbosity controls. Update last-touched timestamp. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:39:34 -05:00
teernisse	38da7ca47b	docs: Add observability PRD and sync pipeline explorer visualization - prd-observability.md: Product requirements document for the sync pipeline observability system, covering structured logging, metrics collection, sync run tracking, and robot-mode performance output - gitlore-sync-explorer.html: Self-contained interactive HTML visualization for exploring sync pipeline stage timings and data flow Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:39:22 -05:00
teernisse	86a51cddef	fix: Project-scoped job claiming, structured rate-limit logging, RRF total_cmp Targeted fixes across multiple subsystems: dependent_queue: - Add project_id parameter to claim_jobs() for project-scoped job claiming, preventing cross-project job theft during concurrent multi-project ingestion - Add project_id parameter to count_pending_jobs() with optional scoping (None returns global counts, Some(pid) returns per-project counts) gitlab/client: - Downgrade rate-limit log from warn to info (429s are expected operational behavior, not warnings) and add structured fields (path, status_code) for better log filtering and aggregation gitlab/transformers/discussion: - Add tracing::warn on invalid timestamp parse instead of silent fallback to epoch 0, making data quality issues visible in logs ingestion/merge_requests: - Remove duplicate doc comment on upsert_label_tx search/rrf: - Replace partial_cmp().unwrap_or() with total_cmp() for f64 sorting, eliminating the NaN edge case entirely (total_cmp treats NaN consistently) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:39:13 -05:00
teernisse	f6d19a9467	feat(sync): Instrument pipeline with tracing spans, run_id correlation, and metrics Add end-to-end observability to the sync and ingest pipelines: Sync command: - Generate UUID-based run_id for each sync invocation, propagated through all child spans for log correlation across stages - Accept MetricsLayer reference to extract hierarchical StageTiming data after pipeline completion for robot-mode performance output - Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle) - Wrap entire sync execution in a root tracing span with run_id field Ingest command: - Wrap run_ingest in an instrumented root span with run_id and resource_type - Add project path prefix to discussion progress bars for multi-project clarity - Reset resource_events_synced_for_updated_at on --full re-sync Sync status: - Expand from single last_run to configurable recent runs list (default 10) - Parse and expose StageTiming metrics from stored metrics_json - Add run_id, total_items_processed, total_errors to SyncRunInfo - Add mr_count to DataSummary for complete entity coverage Orchestrator: - Add #[instrument] with structured fields to issue and MR ingestion functions - Record items_processed, items_skipped, errors on span close for MetricsLayer - Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete) - Pass project_id through to drain_resource_events for scoped job claiming Document regenerator and embedding pipeline: - Add #[instrument] spans with items_processed, items_skipped, errors fields - Record final counts on span close for metrics extraction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:39:00 -05:00
teernisse	362503d3bf	feat(cli): Add verbosity controls, JSON log format, and triple-layer subscriber Overhaul the CLI logging infrastructure for production observability: CLI flags: - Add -v/-vv/-vvv (--verbose) for progressive stderr verbosity control: 0=INFO, 1=DEBUG app, 2=DEBUG all, 3+=TRACE - Add --log-format text|json for structured stderr output in automation - Existing -q/--quiet overrides verbosity for silent operation Subscriber architecture (main.rs): - Replace single-layer subscriber with triple-layer setup: 1. stderr layer: human-readable or JSON, filtered by -v flags 2. file layer: always-on JSON to daily-rotated logs (lore.YYYY-MM-DD.log) 3. MetricsLayer: captures span timing for robot-mode performance payloads - Parse CLI before subscriber init so verbosity is known at setup time - Load LoggingConfig early (with graceful fallback for pre-init commands) - Clean up old log files before subscriber init to avoid holding deleted handles - Hold WorkerGuard at function scope to ensure flush on exit Doctor command: - Add logging health check: validates log directory exists, reports file count and total size, warns on missing or inaccessible log directory Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:38:43 -05:00
teernisse	329c8f4539	feat(observability): Add metrics, logging, and sync-run core modules Introduce the foundational observability layer for the sync pipeline: - MetricsLayer: Custom tracing subscriber layer that captures span timing and structured fields, materializing them into a hierarchical Vec<StageTiming> tree for robot-mode performance data output - logging: Dual-layer subscriber infrastructure with configurable stderr verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily rotation and configurable retention (default 30 days) - SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs table (start -> succeed|fail), with correlation IDs and aggregate counts - LoggingConfig: New config section for log_dir, retention_days, and file_logging toggle - get_log_dir(): Path helper for log directory resolution - is_permanent_api_error(): Distinguish retryable vs permanent API failures (only 404 is truly permanent; 403/auth errors may be environmental) Database changes: - Migration 013: Add resource_events_synced_for_updated_at watermark columns to issues and merge_requests tables for incremental resource event sync - Migration 014: Enrich sync_runs with run_id correlation ID, aggregate counts (total_items_processed, total_errors), and run_id index - Wrap file-based migrations in savepoints for rollback safety Dependencies: Add uuid (run_id generation), tracing-appender (file logging) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:38:29 -05:00
Taylor Eernisse	ee5c5f9645	perf: Eliminate double serialization, add SQLite tuning, optimize hot paths 11 isomorphic performance fixes from deep audit (no behavior changes): - Eliminate double serialization: store_payload now accepts pre-serialized bytes (&[u8]) instead of re-serializing from serde_json::Value. Uses Cow<[u8]> for zero-copy when compression is disabled. - Add SQLite cache_size (64MB) and mmap_size (256MB) pragmas - Replace SELECT-then-INSERT label upserts with INSERT...ON CONFLICT RETURNING in both issues.rs and merge_requests.rs - Replace INSERT + SELECT milestone upsert with RETURNING - Use prepare_cached for 5 hot-path queries in extractor.rs - Optimize compute_list_hash: index-sort + incremental SHA-256 instead of clone+sort+join+hash - Pre-allocate embedding float-to-bytes buffer with Vec::with_capacity - Replace RandomState::new() in rand_jitter with atomic counter XOR nanos - Remove redundant per-note payload storage (discussion payload contains all notes already) - Change transform_issue to accept &GitLabIssue (avoids full struct clone) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 08:12:37 -05:00
Taylor Eernisse	f5b4a765b7	perf: Configurable rate limit, 429 auto-retry, concurrent project ingestion The sync pipeline was bottlenecked at 10 req/s (hardcoded) with sequential project processing and no retry on rate limiting. These changes target 3-5x throughput improvement. Rate limit configuration: - Add requestsPerSecond to SyncConfig (default 30.0, was hardcoded 10) - Pass configured rate through to GitLabClient::new from ingest - Floor rate at 0.1 rps in RateLimiter::new to prevent panic on Duration::from_secs_f64(1.0 / 0.0) — now reachable via user config 429 auto-retry: - Both request() and request_with_headers() retry up to 3 times on HTTP 429, respecting the retry-after header (default 60s) - Extract parse_retry_after helper, reused by handle_response fallback - After exhausting retries, the 429 error propagates as before - Improved JSON decode errors now include a response body preview Concurrent project ingestion: - Derive Clone on GitLabClient (cheap: shares Arc<Mutex<RateLimiter>> and reqwest::Client which is already Arc-backed) - Restructure project loop to use futures::stream::buffer_unordered with primary_concurrency (default 4) as the parallelism bound - Each project gets its own SQLite connection (WAL mode + busy_timeout handles concurrent writes) - Add show_spinner field to IngestDisplay to separate the per-project spinner from the sync-level stage spinner - Error aggregation defers failures: all successful projects get their summaries printed and results counted before returning the first error - Bump dependentConcurrency default from 2 to 8 for discussion prefetch Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:37:06 -05:00
Taylor Eernisse	4ee99c1677	fix: Propagate queue errors, eliminate format!-based SQL construction Two hardening changes to the dependent queue and orchestrator: - dependent_queue::fail_job now propagates the rusqlite error via ? instead of silently falling back to 0 attempts when the job row is missing. A missing job is a real bug that should surface, not be masked by unwrap_or(0) which would cause infinite retries at the base backoff interval. - orchestrator::enqueue_resource_events_for_entity_type replaces format!-based SQL ("SELECT {id_col} FROM {table}") with separate hardcoded queries per entity type. While the original values were not user-controlled, hardcoded SQL is clearer about intent and eliminates a class of injection risk entirely. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:36:45 -05:00
Taylor Eernisse	c35f485e0e	refactor(cli): Replace tracing-indicatif with shared MultiProgress tracing-indicatif pulled in vt100, arrayvec, and its own indicatif integration layer. Replace it with a minimal SuspendingWriter that coordinates tracing output with progress bars via a global LazyLock MultiProgress. - Add src/cli/progress.rs: shared MultiProgress singleton via LazyLock and a SuspendingWriter that suspends bars before writing log lines, preventing interleaving/flicker - Wire all progress bar creation through multi().add() in sync and ingest commands - Replace IndicatifLayer in main.rs with SuspendingWriter for tracing-subscriber's fmt layer - Remove tracing-indicatif from Cargo.toml (drops vt100 and arrayvec transitive deps) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:36:31 -05:00
Taylor Eernisse	a92e176bb6	fix(events): Handle nullable label and milestone in resource events GitLab returns null for the label/milestone fields on resource_label_events and resource_milestone_events when the referenced label or milestone has been deleted. This caused deserialization failures during sync. - Add migration 012 to recreate both event tables with nullable label_name, milestone_title, and milestone_id columns (SQLite requires table recreation to alter NOT NULL constraints) - Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone to Option<> in the Rust types - Update upsert functions to pass through None values correctly - Add tests for null label and null milestone deserialization Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:36:17 -05:00
Taylor Eernisse	deafa88af5	perf: Concurrent resource event fetching, remove unnecessary async client.rs: - fetch_all_resource_events() now uses tokio::try_join!() to fire all three API requests (state, label, milestone events) concurrently instead of awaiting each sequentially. For entities with many events, this reduces wall-clock time by up to ~3x since the three independent HTTP round-trips overlap. main.rs: - Removed async from handle_issues() and handle_mrs(). These functions perform only synchronous database queries and formatting; they never await anything. Removing the async annotation avoids the overhead of an unnecessary Future state machine and makes the sync nature of these code paths explicit. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:09:44 -05:00
Taylor Eernisse	880ad1d3fa	refactor(events): Lift transaction control to callers, eliminate duplicated store functions events_db.rs: - Removed internal savepoints from upsert_state_events, upsert_label_events, and upsert_milestone_events. Each function previously created its own savepoint, making it impossible for callers to wrap all three in a single atomic transaction. - Changed signatures from &mut Connection to &Connection, since savepoints are no longer created internally. This makes the functions compatible with rusqlite::Transaction (which derefs to Connection), allowing callers to pass a transaction directly. orchestrator.rs: - Deleted the three store_*_events_tx() functions (store_state_events_tx, store_label_events_tx, store_milestone_events_tx) which were hand-duplicated copies of the events_db upsert functions, created as a workaround for the &mut Connection requirement. Now that events_db accepts &Connection, store_resource_events() calls the canonical upsert functions directly through the unchecked_transaction. - Replaced the max-iterations guard in drain_resource_events() with a HashSet-based deduplication of job IDs. The old guard used an arbitrary 2x multiplier on total_pending which could either terminate too early (if many retries were legitimate) or too late. The new approach precisely prevents reprocessing the same job within a single drain run, which is the actual invariant we need. Net effect: ~133 lines of duplicated SQL removed, single source of truth for event upsert logic, and callers control transaction scope. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:09:35 -05:00
Taylor Eernisse	4c0123426a	fix: Content hash now computed after truncation, atomic job claiming Two bug fixes: 1. extractor.rs: The content hash was computed on the pre-truncation content, meaning the hash stored in the document didn't correspond to the actual stored (truncated) content. This would cause change detection to miss updates when content changed only within the truncated portion. Hash is now computed after truncate_hard_cap() so it always matches the persisted content. 2. dependent_queue.rs: claim_jobs() had a TOCTOU race between the SELECT that found available jobs and the UPDATE that locked them. Under concurrent callers, two drain runs could claim the same job. Replaced with a single UPDATE ... RETURNING statement that atomically selects and locks jobs in one operation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 14:09:22 -05:00
Taylor Eernisse	bb75a9d228	fix(events): Resource events now run on incremental syncs, fix output and progress bar Three bugs fixed: 1. Early return in orchestrator when no discussions needed sync also skipped resource event enqueue+drain. On incremental syncs (the most common case), resource events were never fetched. Restructured to use if/else instead of early return so Step 4 always executes. 2. Ingest command JSON and human-readable output silently dropped resource_events_fetched/failed counts. Added to IngestJsonData and print_ingest_summary. 3. Progress bar reuse after finish_and_clear caused indicatif to silently ignore subsequent set_position/set_length calls. Added reset() call before reconfiguring the bar for resource events. Also removed stale comment referencing "unsafe" that didn't reflect the actual unchecked_transaction approach. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:06:35 -05:00
Taylor Eernisse	2bcd8db0e9	feat(events): Wire resource event fetching into sync pipeline (bd-1ep) Integrate resource event fetching as Step 4 of both issue and MR ingestion, gated behind the fetch_resource_events config flag. Orchestrator changes: - Add ProgressEvent variants: ResourceEventsFetchStarted, ResourceEventFetched, ResourceEventsFetchComplete - Add resource_events_fetched/failed fields to IngestProjectResult and IngestMrProjectResult - New enqueue_resource_events_for_entity_type() queries all issues/MRs for a project and enqueues resource_events jobs via the dependent queue (INSERT OR IGNORE for idempotency) - New drain_resource_events() claims jobs in batches, fetches state/label/milestone events from GitLab API, stores them atomically via unchecked_transaction, and handles failures with exponential backoff via fail_job() - Max-iterations guard prevents infinite retry loops within a single drain run - New store_resource_events() + per-type _tx helpers write events using prepared statements inside a single transaction - DrainResult struct tracks fetched/failed counts CLI ingest changes: - IngestResult gains resource_events_fetched/failed fields - Progress bar repurposed for resource event fetch phase (reuses discussion bar with updated template) - Accumulates event counts from both issue and MR ingestion CLI sync changes: - SyncResult gains resource_events_fetched/failed fields - Accumulates counts from both ingest stages - print_sync() conditionally displays event counts - Structured logging includes event counts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:02:15 -05:00
Taylor Eernisse	a50fc78823	style: Apply cargo fmt and clippy fixes across codebase Automated formatting and lint corrections from parallel agent work: - cargo fmt: import reordering (alphabetical), line wrapping to respect max width, trailing comma normalization, destructuring alignment, function signature reformatting, match arm formatting - clippy (pedantic): Range::contains() instead of manual comparisons, i64::from() instead of `as i64` casts, .clamp() instead of .max().min() chains, let-chain refactors (if-let with &&), #[allow(clippy::too_many_arguments)] and #[allow(clippy::field_reassign_with_default)] where warranted - Removed trailing blank lines and extra whitespace No behavioral changes. All existing tests pass unmodified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:59 -05:00
Taylor Eernisse	ff94f24702	chore(beads): Update issue tracker state for Gate 1 completions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:46 -05:00
Taylor Eernisse	5c521491b7	chore(beads): Update issue tracker state for Gate 1 completions Closes bd-hu3, bd-2e8, bd-2fm, bd-sqw, bd-1uc, bd-tir, bd-3sh, bd-1m8. All Gate 1 resource events infrastructure beads except bd-1ep (pipeline wiring) are now complete. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:23 -05:00
Taylor Eernisse	0236ef2776	feat(stats): Extend --check with event FK integrity and queue health diagnostics Adds two new categories of integrity checks to 'lore stats --check': Event FK integrity (3 queries): - Detects orphaned resource_state_events where issue_id or merge_request_id points to a non-existent parent entity - Same check for resource_label_events and resource_milestone_events - Under normal CASCADE operation these should always be zero; non-zero indicates manual DB edits, bugs, or partial migration state Queue health diagnostics: - pending_dependent_fetches counts: pending, failed, and stuck (locked) - queue_stuck_locks: Jobs with locked_at set (potential worker crashes) - queue_max_attempts: Highest retry count across all jobs (signals permanently failing jobs when > 3) New IntegrityResult fields: orphan_state_events, orphan_label_events, orphan_milestone_events, queue_stuck_locks, queue_max_attempts. New QueueStats fields: pending_dependent_fetches, pending_dependent_fetches_failed, pending_dependent_fetches_stuck. Human output shows colored PASS/WARN/FAIL indicators: - Red "!" for orphaned events (integrity failure) - Yellow "!" for stuck locks and high retry counts (warnings) - Dependent fetch queue line only shown when non-zero All new queries are guarded by table_exists() checks for graceful degradation on databases without migration 011 applied. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:15 -05:00
Taylor Eernisse	12811683ca	feat(cli): Add 'lore count events' command with human and robot output Extends the count command to support "events" as an entity type, displaying resource event counts broken down by event type (state, label, milestone) and entity type (issue, merge request). New functions in count.rs: - run_count_events: Creates DB connection and delegates to events_db::count_events for the actual queries - print_event_count: Human-readable table with aligned columns showing per-type breakdowns and row/column totals - print_event_count_json: Structured JSON matching the robot mode contract with ok/data envelope and per-type issue/mr/total counts JSON output structure: {"ok":true,"data":{"state_events":{"issue":N,"merge_request":N, "total":N},"label_events":{...},"milestone_events":{...},"total":N}} Updated exports in commands/mod.rs to expose the three new public functions (run_count_events, print_event_count, print_event_count_json). The "events" branch in handle_count (main.rs, committed earlier) routes to these functions before the existing entity type dispatcher. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:01 -05:00
Taylor Eernisse	724be4d265	feat(queue): Add generic dependent fetch queue with exponential backoff New module src/core/dependent_queue.rs provides job queue operations against the pending_dependent_fetches table. Designed for second-pass fetches that depend on primary entity ingestion (resource events, MR close references, MR file diffs). Queue operations: - enqueue_job: Idempotent INSERT OR IGNORE keyed on the UNIQUE (project_id, entity_type, entity_iid, job_type) constraint. Returns bool indicating whether the row was actually inserted. - claim_jobs: Two-phase claim — SELECT available jobs (unlocked, past retry window) then UPDATE locked_at in batch. Orders by enqueued_at ASC for FIFO processing within a job type. - complete_job: DELETE the row on successful processing. - fail_job: Increments attempts, calculates exponential backoff (30s * 2^(attempts-1), capped at 480s), sets next_retry_at, clears locked_at, and records the error message. Reads current attempts via query with unwrap_or(0) fallback for robustness. - reclaim_stale_locks: Clears locked_at on jobs locked longer than a configurable threshold, recovering from worker crashes. - count_pending_jobs: GROUP BY job_type aggregation for progress reporting and stats display. Registers both events_db and dependent_queue in src/core/mod.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:48 -05:00
Taylor Eernisse	c34ed3007e	feat(db): Add event upsert functions and count queries in events_db module New module src/core/events_db.rs provides database operations for resource events: - upsert_state_events: Batch INSERT OR REPLACE for state change events, keyed on UNIQUE(gitlab_id, project_id). Wraps in a savepoint for atomicity per entity batch. Maps GitLabStateEvent fields including optional user, source_commit, and source_merge_request_iid. - upsert_label_events: Same pattern for label add/remove events, extracting label.name for denormalized storage. - upsert_milestone_events: Same pattern for milestone assignment events, storing both milestone.title and milestone.id. All three upsert functions: - Take &mut Connection (required for savepoint creation) - Use prepare_cached for statement reuse across batch iterations - Convert ISO timestamps via iso_to_ms_strict for ms-epoch storage - Propagate rusqlite errors via the #[from] LoreError::Database path - Return the count of events processed Supporting functions: - resolve_entity_ids: Maps entity_type string to (issue_id, MR_id) pair with exactly-one-non-NULL invariant matching the CHECK constraints - count_events: Queries all three event tables with conditional COUNT aggregations, returning EventCounts struct. Uses unwrap_or((0, 0)) for graceful degradation when tables don't exist (pre-migration 011). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:34 -05:00
Taylor Eernisse	e73d2907dc	feat(client): Add Resource Events API endpoints with generic paginated fetcher Extends GitLabClient with methods for fetching resource events from GitLab's per-entity API endpoints. Adds a new impl block containing: - fetch_all_pages<T>: Generic paginated collector that handles x-next-page header parsing with fallback to page-size heuristics. Uses per_page=100 and respects the existing rate limiter via request_with_headers. Terminates when: (a) x-next-page header is absent/stale, (b) response is empty, or (c) page is not full. - Six typed endpoint methods: - fetch_issue_state_events / fetch_mr_state_events - fetch_issue_label_events / fetch_mr_label_events - fetch_issue_milestone_events / fetch_mr_milestone_events - fetch_all_resource_events: Convenience method that fetches all three event types for an entity (issue or merge_request) in sequence, returning a tuple of (state, label, milestone) event vectors. Routes to issue or MR endpoints based on entity_type string. All methods follow the existing client patterns: path formatting with gitlab_project_id and iid, error propagation via Result, and rate limiter integration through the shared request_with_headers path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:19 -05:00
Taylor Eernisse	9d4755521f	feat(config): Add fetchResourceEvents config flag with --no-events CLI override Adds a new boolean field to SyncConfig that controls whether resource event fetching is performed during sync: - SyncConfig.fetch_resource_events: defaults to true via serde default_true helper, serialized as "fetchResourceEvents" in JSON - SyncArgs.no_events: --no-events CLI flag that overrides the config value to false when present - SyncOptions.no_events: propagates the flag through the sync pipeline - handle_sync_cmd: mutates loaded config when --no-events is set, ensuring the flag takes effect regardless of config file contents This follows the existing pattern established by --no-embed and --no-docs flags, where CLI flags override config file defaults. The config is loaded as mutable specifically to support this override. Also adds "events" to the count command's entity type value_parser, enabling `lore count events` (implementation in a separate commit). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:06 -05:00
Taylor Eernisse	92ff255909	feat(types): Add GitLab Resource Event serde types with deserialization tests Adds six new types for deserializing responses from GitLab's three Resource Events API endpoints (state, label, milestone): - GitLabStateEvent: State transitions with optional user, source_commit, and source_merge_request reference - GitLabLabelEvent: Label add/remove events with nested GitLabLabelRef - GitLabMilestoneEvent: Milestone assignment changes with nested GitLabMilestoneRef - GitLabMergeRequestRef: Lightweight MR reference (iid, title, web_url) - GitLabLabelRef: Label metadata (id, name, color, description) - GitLabMilestoneRef: Milestone metadata (id, iid, title) All types derive Deserialize + Serialize and use Option<T> for nullable fields (user, source_commit, color, description) to match GitLab's API contract where these fields may be null. Includes 8 new test cases covering: - State events with/without user, with/without source_merge_request - Label events for add and remove actions, including null color handling - Milestone event deserialization - Standalone ref type deserialization (MR, label, milestone) Uses r##"..."## raw string delimiters where JSON contains hex color codes (#FF0000) that would conflict with r#"..."# delimiters. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:06:56 -05:00
Taylor Eernisse	ce5cd9c95d	feat(schema): Add migration 011 for resource events, entity references, and dependent fetch queue Introduces five new tables that power temporal queries (timeline, file-history, trace) via GitLab Resource Events APIs: - resource_state_events: State transitions (opened/closed/reopened/merged/locked) with actor tracking, source commit, and source MR references - resource_label_events: Label add/remove history per entity - resource_milestone_events: Milestone assignment changes per entity - entity_references: Cross-reference table (Gate 2 prep) linking source/target entity pairs with reference type and discovery method - pending_dependent_fetches: Generic job queue for resource_events, mr_closes_issues, and mr_diffs with exponential backoff retry All event tables enforce entity exclusivity via CHECK constraints (exactly one of issue_id or merge_request_id must be non-NULL). Deduplication handled via UNIQUE indexes on (gitlab_id, project_id). FK cascades ensure cleanup when parent entities are removed. The dependent fetch queue uses a UNIQUE constraint on (project_id, entity_type, entity_iid, job_type) for idempotent enqueue, with partial indexes optimizing claim and retry queries. Registered as migration 011 in the embedded MIGRATIONS array in db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:06:43 -05:00
Taylor Eernisse	549a0646d7	chore: Add test-runner agent, agent-swarm-launcher skill, review artifacts, and beads updates - .claude/agents/test-runner.md: New Claude Code agent definition for running cargo test suites and analyzing results, configured with haiku model for fast execution. - skills/agent-swarm-launcher/: New skill for bootstrapping coordinated multi-agent workflows with AGENTS.md reconnaissance, Agent Mail coordination, and beads task tracking. - api-review.html, phase-a-review.html: Self-contained HTML review artifacts for API audit and Phase A search pipeline review. - .beads/issues.jsonl, .beads/last-touched: Updated issue tracker state reflecting current project work items. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:36:05 -05:00
Taylor Eernisse	a417640faa	docs: Overhaul AGENTS.md, update README, add pipeline spec and Phase B plan AGENTS.md: Comprehensive rewrite adding file deletion safeguards, destructive git command protocol, Rust toolchain conventions, code editing discipline rules, compiler check requirements, TDD mandate, MCP Agent Mail coordination protocol, beads/bv/ubs/ast-grep/cass tool documentation, and session completion workflow. README.md: Document NO_COLOR/CLICOLOR env vars, --since 1m duration, project resolution cascading match logic, lore health and robot-docs commands, exit codes 17 (not found) and 18 (ambiguous match), --color/--quiet global flags, dirty_sources and pending_discussion_fetches tables, and version command git hash output. docs/embedding-pipeline-hardening.md: Detailed spec covering the three problems from the chunk size reduction (broken --full wiring, mixed chunk sizes in vector space, static dedup multiplier) with decision records, implementation plan, and acceptance criteria. docs/phase-b-temporal-intelligence.md: Draft planning document for transforming gitlore from a search engine into a temporal code intelligence system by ingesting structured event data from GitLab. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:51 -05:00
Taylor Eernisse	f560e6bc00	test(embedding): Add regression tests for pipeline hardening bugs Three targeted regression tests covering bugs fixed in the embedding pipeline hardening: - overflow_doc_with_error_sentinel_not_re_detected_as_pending: verifies that documents skipped for producing too many chunks have their sentinel error recorded in embedding_metadata and are NOT returned by find_pending_documents or count_pending_documents on subsequent runs (prevents infinite re-processing loop). - count_and_find_pending_agree: exercises four states (empty DB, new document, fully-embedded document, config-drifted document) and asserts that count_pending_documents and find_pending_documents produce consistent results across all of them. - full_embed_delete_is_atomic: confirms the --full flag's two DELETE statements (embedding_metadata + embeddings) execute atomically within a transaction. Also updates test DB creation to apply migration 010. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:34 -05:00
Taylor Eernisse	aebbe6b795	feat(cli): Wire --full flag for embed, add sync stage spinners - Add --full / --no-full flag pair to EmbedArgs with overrides_with semantics matching the existing flag pattern. When active, atomically DELETEs all embedding_metadata and embeddings before re-embedding. - Thread the full flag through run_embed -> run_sync so that 'lore sync --full' triggers a complete re-embed alongside the full re-ingest it already performed. - Add indicatif spinners to sync stages with dynamic stage numbering that adjusts when --no-docs or --no-embed skip stages. Spinners are hidden in robot mode. - Update robot-docs manifest to advertise the new --full flag on the embed command. - Replace hardcoded schema version 9 in health check with the LATEST_SCHEMA_VERSION constant from db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:22 -05:00
Taylor Eernisse	7d07f95d4c	fix(embedding): Harden pipeline against chunk overflow, config drift, and partial failures Reduces CHUNK_MAX_BYTES from 32KB to 6KB and CHUNK_OVERLAP_CHARS from 500 to 200 to stay within nomic-embed-text's 8,192-token context window. This commit addresses all downstream consequences of that reduction: - Config drift detection: find_pending_documents and count_pending_documents now take model_name and compare chunk_max_bytes, model, and dims against stored metadata. Documents embedded with stale config are automatically re-queued. - Overflow guard: documents producing >= CHUNK_ROWID_MULTIPLIER chunks are skipped with a sentinel error recorded in embedding_metadata, preventing both rowid collision and infinite re-processing loops. - Deferred clearing: old embeddings are no longer cleared before attempting new ones. clear_document_embeddings is deferred until the first successful chunk embedding, so if all chunks fail the document retains its previous embeddings rather than losing all data. - Savepoints: each page of DB writes is wrapped in a SQLite savepoint so a crash mid-page rolls back atomically instead of leaving partial state (cleared embeddings with no replacements). - Per-chunk retry on context overflow: when a batch fails with a context-length error, each chunk is retried individually so one oversized chunk doesn't poison the entire batch. - Adaptive dedup in vector search: replaces the static 3x over-fetch multiplier with a dynamic one based on actual max chunks per document (using the new chunk_count column with a fallback COUNT query for pre-migration data). Also replaces partial_cmp with total_cmp for f64 distance sorting. - Stores chunk_max_bytes and chunk_count (on sentinel rows) in embedding_metadata to support config drift detection and adaptive dedup without runtime queries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:08 -05:00
Taylor Eernisse	2a52594a60	feat(db): Add migration 010 for chunk config tracking columns Add chunk_max_bytes and chunk_count columns to embedding_metadata to support config drift detection and adaptive dedup sizing. Includes a partial index on sentinel rows (chunk_index=0) to accelerate the drift detection and max-chunk queries. Also exports LATEST_SCHEMA_VERSION as a public constant derived from the MIGRATIONS array length, replacing the previously hardcoded magic number in the health check. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:34:48 -05:00
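
The retry schedule described for fail_job in commit 724be4d265 (30s * 2^(attempts-1), capped at 480s) can be sketched as below. This is a minimal illustration, not the code from src/core/dependent_queue.rs: the function name `backoff_secs`, its signature, and the assumption that `attempts` is the already-incremented count are all hypothetical.

```rust
/// Hypothetical helper illustrating the backoff formula from the commit
/// message: 30s * 2^(attempts - 1), capped at 480s.
/// Assumption: `attempts` is the count AFTER the failure was recorded,
/// so the first failure passes attempts = 1.
fn backoff_secs(attempts: u32) -> u64 {
    const BASE_SECS: u64 = 30;
    const CAP_SECS: u64 = 480;
    // saturating_sub keeps attempts = 0 from underflowing; the .min(16)
    // bounds the shift so the intermediate value cannot overflow u64.
    let exponent = attempts.saturating_sub(1).min(16);
    (BASE_SECS << exponent).min(CAP_SECS)
}

fn main() {
    // Print the delay for the first few failures of a single job.
    for attempts in 1..=6 {
        println!("attempt {attempts}: retry in {}s", backoff_secs(attempts));
    }
}
```

Under these assumptions the delays double from 30s and reach the 480s cap at the fifth failure, after which they stay flat.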


254 Commits