gitlore

Author	SHA1	Message	Date
Taylor Eernisse	e6b880cbcb	fix: prevent panics in robot-mode JSON output and arithmetic paths Peer code review found multiple panic-reachable paths: 1. serde_json::to_string().unwrap() in 4 robot-mode output functions (who.rs, main.rs x3). If serialization ever failed (e.g., NaN from edge-case division), the CLI would panic with an unhelpful stack trace. Replaced with unwrap_or_else that emits a structured JSON error fallback. 2. encode_rowid() in chunk_ids.rs used unchecked multiplication (document_id * 1000). On extreme document IDs this could silently wrap in release mode, causing embedding rowid collisions. Now uses checked_mul + checked_add with a diagnostic panic message. 3. HTTP response body truncation at byte index 500 in client.rs could split a multi-byte UTF-8 character, causing a panic. Now uses floor_char_boundary(500) for safe truncation. 4. who.rs reviews mode: SQL used `m.author_username != ?1` which silently dropped MRs with NULL author_username (SQL NULL != anything = NULL). Changed to `(m.author_username IS NULL OR m.author_username != ?1)` to match the pattern already used in expert mode. 5. handle_auth_test hardcoded exit code 5 for all errors regardless of type. Config not found (20), token not set (4), and network errors (8) all incorrectly returned 5. Now uses e.exit_code() from the actual LoreError, with proper suggestion hints in human mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:55:20 -05:00
Taylor Eernisse	f267578aab	feat: implement lore who — people intelligence commands (5 modes) Add `lore who` command with 5 query modes answering collaboration questions using existing DB data (280K notes, 210K discussions, 33K DiffNotes): - Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring) - Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions) - Active: what discussions need attention (unresolved resolvable, global/project-scoped) - Overlap: who else is touching these files (dual author+reviewer role tracking) - Reviews: what review patterns does a person have (prefix-based category extraction) Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with validation, robot JSON output with input+resolved_input reproducibility, human terminal output, and 20 unit tests. All quality gates pass. Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e, bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 23:11:14 -05:00
Taylor Eernisse	cf6d27435a	feat(robot): add elapsed_ms timing, --fields support, and actionable error actions Robot mode consistency improvements across all command output: Timing: - Every robot JSON response now includes meta.elapsed_ms measuring wall-clock time from command start to serialization. Agents can use this to detect slow queries and tune --limit or --project filters. Field selection (--fields): - print_list_issues_json and print_list_mrs_json accept an optional fields slice that prunes each item in the response array to only the requested keys. A "minimal" preset expands to [iid, title, state, updated_at_iso] for token-efficient agent scans. - filter_fields and expand_fields_preset live in the new src/cli/robot.rs module alongside RobotMeta. Actionable error recovery: - LoreError gains an actions() method returning concrete shell commands an agent can execute to recover (e.g. "ollama serve" for OllamaUnavailable, "lore init" for ConfigNotFound). - RobotError now serializes an "actions" array (empty array omitted) so agents can parse and offer one-click fixes. Envelope consistency: - show issue/MR JSON responses now use the standard {"ok":true,"data":...,"meta":...} envelope instead of bare data, matching all other commands. Files: src/cli/robot.rs (new), src/core/error.rs, src/cli/commands/{count,embed,generate_docs,ingest,list,show,stats,sync_status}.rs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 23:46:48 -05:00
Taylor Eernisse	c2036c64e9	feat(embed): docs_embedded tracking, buffer reuse, retry hardening Embedding pipeline improvements building on the concurrent batching foundation: - Track docs_embedded vs chunks_embedded separately. A document counts as embedded only when ALL its chunks succeed, giving accurate progress reporting. The sync command reads docs_embedded for its document count. - Reuse a single Vec<u8> buffer (embed_buf) across all store_embedding calls instead of allocating per chunk. Eliminates ~3KB allocation per 768-dim embedding. - Detect and record errors when Ollama silently returns fewer embeddings than inputs (batch mismatch). Previously these dropped chunks were invisible. - Improve retry error messages: distinguish "retry returned unexpected result" (wrong dims/count) from "retry request failed" (network error) instead of generic "chunk too large" message. - Convert all hot-path SQL from conn.execute() to prepare_cached() for statement cache reuse (clear_document_embeddings, store_embedding, record_embedding_error). - Record embedding_metadata errors for empty documents so they don't appear as perpetually pending on subsequent runs. - Accept concurrency parameter (configurable via config.embedding.concurrency) instead of hardcoded EMBED_CONCURRENCY=2. - Add schema version pre-flight check in embed command to fail fast with actionable error instead of cryptic SQL errors. - Fix --retry-failed to use DELETE instead of UPDATE. UPDATE clears last_error but the row still matches config params in the LEFT JOIN, making the doc permanently invisible to find_pending_documents. DELETE removes the row entirely so the LEFT JOIN returns NULL. Regression test added (old_update_approach_leaves_doc_invisible). - Add chunking forward-progress guard: after floor_char_boundary() rounds backward, ensure start advances by at least one full character to prevent infinite loops on multi-byte sequences (box-drawing chars, smart quotes). Test cases cover the exact patterns that caused production hangs on document 18526. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:42:08 -05:00
Taylor Eernisse	39cb0cb087	feat(embed): concurrent batching, UTF-8 safe chunking, right-sized chunks Three fixes to the embedding pipeline: 1. Concurrent HTTP batching: fire EMBED_CONCURRENCY (2) Ollama requests in parallel via join_all, then write results serially to SQLite. ~2x throughput improvement on GPU-bound workloads. 2. UTF-8 boundary safety: all computed byte offsets in split_into_chunks (paragraph/sentence/word break finders + overlap advance) now use floor_char_boundary() to prevent panics on multi-byte characters like smart quotes and non-breaking spaces. 3. CHUNK_MAX_BYTES reduced from 6000 to 1500 to fit nomic-embed-text's actual 2048-token context window, eliminating context-length retry storms that were causing 10x slowdowns. Also threads ShutdownSignal through embed pipeline for graceful Ctrl+C. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 14:48:34 -05:00
Taylor Eernisse	1c45725cba	fix(sync): pass options.full through to generate-docs stage The sync pipeline was hardcoding `false` for the `full` parameter when calling run_generate_docs, so `lore sync --full` would re-ingest all entities but then only regenerate documents for newly-dirtied ones. Entities loaded before migration 007 (which introduced the dirty_sources system) were never marked dirty and thus never got documents generated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 11:42:11 -05:00
Taylor Eernisse	405e5370dc	feat(sync): concurrent drains, atomic watermarks, graceful Ctrl+C shutdown Three fixes to the sync pipeline: 1. Atomic watermarks: wrap complete_job + update_watermark in a single SQLite transaction so crash between them can't leave partial state. 2. Concurrent drain loops: prefetch HTTP requests via join_all (batch size = dependent_concurrency), then write serially to DB. Reduces ~9K sequential requests from ~19 min to ~2.4 min. 3. Graceful shutdown: install Ctrl+C handler via ShutdownSignal (Arc<AtomicBool>), thread through orchestrator/CLI, release locked jobs on interrupt, record sync_run as "failed". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 11:22:04 -05:00
Taylor Eernisse	32783080f1	fix(timeline): report true total_events in robot JSON meta The robot JSON envelope's meta.total_events field was incorrectly reporting events.len() (the post-limit count), making it identical to meta.showing. This defeated the purpose of having both fields. Changes across the pipeline to fix this: - collect_events now returns (Vec<TimelineEvent>, usize) where the second element is the total event count before truncation - TimelineResult gains a total_events_before_limit field (serde-skipped) so the value flows cleanly from collect through to the renderer - main.rs passes the real total instead of the events.len() workaround Additional cleanup in this pass: - Derive PartialEq/Eq/PartialOrd/Ord on TimelineEventType, replacing the hand-rolled event_type_discriminant() function. Variant declaration order now defines sort tiebreak, documented in a doc comment. - Validate --since input with a proper LoreError::Other instead of silently treating invalid values as None - Fix ANSI-aware tag column padding with console::pad_str (colored tags like "[merged]" were misaligned because ANSI escapes consumed width) - Remove dead print_timeline_json and infer_max_depth functions that were superseded by print_timeline_json_with_meta Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 09:35:02 -05:00
Taylor Eernisse	69df8a5603	feat(timeline): wire up lore timeline command with human + robot renderers Complete Gate 3 by implementing the final three beads: - bd-2f2: Human output renderer with colored event tags, entity refs, evidence snippets, and expansion summary footer - bd-dty: Robot JSON output with {ok,data,meta} envelope, ISO timestamps, nested via provenance, and per-event-type details objects - bd-1nf: CLI wiring with TimelineArgs (9 flags), Commands::Timeline variant, handle_timeline handler, VALID_COMMANDS entry, and robot-docs manifest with temporal_intelligence workflow All 7 Gate 3 children now closed. Pipeline: SEED -> HYDRATE -> EXPAND -> COLLECT -> RENDER fully operational. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 08:49:48 -05:00
Taylor Eernisse	5d1586b88e	feat(show): Display full discussion content without truncation Remove artificial length limits from `lore show` output to display complete descriptions and discussion threads. Previously, descriptions were truncated to 500 characters and discussion notes to 300 characters, which cut off important context when reviewing issues and MRs. Users often need the full content to understand the complete discussion history. Changes: - Remove truncate() helper function and its 2 unit tests - Pass description and note bodies directly to wrap_text() - Affects both print_show_issue() and print_show_mr() The wrap_text() function continues to handle line wrapping for readability at the configured widths (76/72/68 chars depending on nesting level). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:46:29 -05:00
Taylor Eernisse	c730b0ec54	feat(cli): Improve help text, error handling, and add fuzzy command suggestions CLI help improvements (cli/mod.rs): - Add descriptive help text to all global flags (-c, --robot, -J, etc.) - Add descriptions to all subcommands (Issues, Mrs, Sync, etc.) - Add --no-quiet flag for explicit quiet override - Shell completions now shows installation instructions for each shell - Optional subcommand: running bare 'lore' shows help in terminal mode, robot-docs in robot mode Structured clap error handling (main.rs): - Early robot mode detection before parsing (env + args) - JSON error output for parse failures in robot mode - Semantic error codes: UNKNOWN_COMMAND, UNKNOWN_FLAG, MISSING_REQUIRED, INVALID_VALUE, ARGUMENT_CONFLICT, etc. - Fuzzy command suggestion using Jaro-Winkler similarity (>0.7 threshold) - Help/version requests handled normally (exit 0, not error) Robot-docs enhancements (main.rs): - Document deprecated command aliases (list issues -> issues, etc.) - Document clap error codes for programmatic error handling - Include completions command in manifest - Update flag documentation to show short forms (-n, -s, -p, etc.) Dependencies: - Add strsim 0.11 for Jaro-Winkler fuzzy matching Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:38 -05:00
Taylor Eernisse	ab43bbd2db	feat: Add dry-run mode to ingest, sync, and stats commands Enables preview of operations without making changes, useful for understanding what would happen before committing to a full sync. Ingest dry-run (--dry-run flag): - Shows resource type, sync mode (full vs incremental), project list - Per-project info: existing count, has_cursor, last_synced timestamp - No GitLab API calls, no database writes Sync dry-run (--dry-run flag): - Preview all four stages: issues ingest, MRs ingest, docs, embed - Shows which stages would run vs be skipped (--no-docs, --no-embed) - Per-project breakdown for both entity types Stats repair dry-run (--dry-run flag): - Shows what would be repaired without executing repairs - "would fix" vs "fixed" indicator in terminal output - dry_run: true field in JSON response Implementation details: - DryRunPreview struct captures project-level sync state - SyncDryRunResult aggregates previews for all sync stages - Terminal output uses yellow styling for "would" actions - JSON output includes dry_run: true at top level Flag handling: - --dry-run and --no-dry-run pair for explicit control - Defaults to false (normal operation) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:22 -05:00
Taylor Eernisse	784fe79b80	feat(show): Enrich issue detail with assignees, milestones, and closing MRs Issue detail now includes: - assignees: List of assigned usernames from issue_assignees table - due_date: Issue due date when set - milestone: Milestone title when assigned - closing_merge_requests: MRs that will close this issue when merged Closing MR detection: - Queries entity_references table for 'closes' reference type - Shows MR iid, title, state (with color coding) in terminal output - Full MR metadata included in JSON output Human-readable output: - "Assignees:" line shows comma-separated @usernames - "Development:" section lists closing MRs with state indicator - Green for merged, cyan for opened, red for closed JSON output: - New fields: assignees, due_date, milestone, closing_merge_requests - closing_merge_requests array contains iid, title, state, web_url Test coverage: - get_issue_assignees: empty, single, multiple (alphabetical order) - get_closing_mrs: empty, single, ignores 'mentioned' references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:02 -05:00
Taylor Eernisse	db750e4fc5	fix: Graceful HTTP client fallbacks and overflow protection HTTP client initialization (embedding/ollama.rs, gitlab/client.rs): - Replace expect/panic with unwrap_or_else fallback to default Client - Log warning when configured client fails to build - Prevents crash on TLS/system configuration issues Doctor command (cli/commands/doctor.rs): - Handle reqwest Client::builder() failure in Ollama health check - Return Warning status with descriptive message instead of panicking - Ensures doctor command remains operational even with HTTP issues These changes improve resilience when running in unusual environments (containers with limited TLS, restrictive network policies, etc.) without affecting normal operation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:40 -05:00
Taylor Eernisse	72f1cafdcf	perf: Optimize SQL queries and reduce allocations in hot paths Change detection queries (embedding/change_detector.rs): - Replace triple-EXISTS subquery pattern with LEFT JOIN + NULL check - SQLite now scans embedding_metadata once instead of three times - Semantically identical: returns docs needing embedding when no embedding exists, hash changed, or config mismatch Count queries (cli/commands/count.rs): - Consolidate 3 separate COUNT queries for issues into single query using conditional aggregation (CASE WHEN state = 'x' THEN 1) - Same optimization for MRs: 5 queries reduced to 1 Search filter queries (search/filters.rs): - Replace N separate EXISTS clauses for label filtering with single IN() clause with COUNT/GROUP BY HAVING pattern - For multi-label AND queries, this reduces N subqueries to 1 FTS tokenization (search/fts.rs): - Replace collect-into-Vec-then-join pattern with direct String building - Pre-allocate capacity hint for result string Discussion truncation (documents/truncation.rs): - Calculate total length without allocating concatenated string first - Only allocate full string when we know it fits within limit Embedding pipeline (embedding/pipeline.rs): - Add Vec::with_capacity hints for chunk work and cleared_docs hashset - Reduces reallocations during embedding batch processing Backoff calculation (core/backoff.rs): - Replace unchecked addition with saturating_add to prevent overflow - Add test case verifying overflow protection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:28 -05:00
Taylor Eernisse	65583ed5d6	refactor: Remove redundant doc comments throughout codebase Removes module-level doc comments (//! lines) and excessive inline doc comments that were duplicating information already evident from: - Function/struct names (self-documenting code) - Type signatures (the what is clear from types) - Implementation context (the how is clear from code) Affected modules: - cli/* - Removed command descriptions duplicating clap help text - core/* - Removed module headers and obvious function docs - documents/* - Removed extractor/regenerator/truncation docs - embedding/* - Removed pipeline and chunking docs - gitlab/* - Removed client and transformer docs (kept type definitions) - ingestion/* - Removed orchestrator and ingestion docs - search/* - Removed FTS and vector search docs Philosophy: Code should be self-documenting. Comments should explain "why" (business decisions, non-obvious constraints) not "what" (which the code itself shows). This change reduces noise and maintenance burden while keeping the codebase just as understandable. Retains comments for: - Non-obvious business logic - Important safety invariants - Complex algorithm explanations - Public API boundaries where generated docs matter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:32 -05:00
Taylor Eernisse	1d003aeac2	fix(sync): Replace text-only progress with animated bars for docs/embed stages Stages 3 (generate-docs) and 4 (embed) reported progress by appending "(N/M)" text to the stage spinner message, while stages 1-2 (ingest) used dedicated indicatif progress bars with animated [====> ] rendering registered with the global MultiProgress. This visual inconsistency was introduced when progress callbacks were wired through in `266ed78`. Replace the spinner.set_message() callbacks with proper ProgressBar instances that match the ingest stage pattern: - Create a bar-style ProgressBar registered via multi().add() - Use the same template/progress_chars as the ingest discussion bars - Lazy-init the tick via AtomicBool to avoid showing the bar before the first callback fires (matching how ingest enables ticks only at DiscussionSyncStarted) - Update set_length on every callback for the docs stage, since the regenerator's estimated_total can grow if new dirty items are queued during processing (using .max() internally) - Clean up both the sub-bar and stage spinner on completion/error Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 15:02:13 -05:00
Taylor Eernisse	925ec9f574	fix: Retry loop safety, doctor model matching, regenerator robustness Three defensive improvements from peer code review: Replace unreachable!() in GitLab client retry loops: Both request() and request_with_headers() had unreachable!() after their for loops. While the logic was sound (the final iteration always reaches the return/break), any refactor to the loop condition would turn this into a runtime panic. Restructured both to store last_response with explicit break, making the control flow self-documenting and the .expect() message useful if ever violated. Doctor model name comparison asymmetry: Ollama model names were stripped of their tag (:latest, :v1.5) for comparison, but the configured model name was compared as-is. A config value like "nomic-embed-text:v1.5" would never match. Now strips the tag from both sides before comparing. Regenerator savepoint cleanup and progress accuracy: - upsert_document's error path did ROLLBACK TO but never RELEASE, leaving a dangling savepoint that could nest on the next call. Added RELEASE after rollback so the connection is clean. - estimated_total for progress reporting was computed once at start but the dirty queue can grow during processing. Now recounts each loop iteration with max() so the progress fraction never goes backwards. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:54 -05:00
Taylor Eernisse	266ed78e73	feat(sync): Wire progress callbacks through sync pipeline stages The sync command's stage spinners now show real-time aggregate progress for each pipeline phase instead of static "syncing..." messages. - Add `progress_callback` parameter to `run_embed` and `run_generate_docs` so callers can receive `(processed, total)` updates - Add `stage_bar` parameter to `run_ingest` for aggregate progress across concurrently-ingested projects using shared AtomicUsize counters - Update `stage_spinner` to use `{prefix}` for the `[N/M]` label, allowing `{msg}` to be updated independently with progress details - Thread `ProgressBar` clones into each concurrent project task so per-entity progress (fetch, discussions, events) is reflected on the aggregate spinner - Pass `None` for progress callbacks at standalone CLI entry points (handle_ingest, handle_generate_docs, handle_embed) to preserve existing behavior when commands are run outside of sync Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:21 -05:00
teernisse	f6d19a9467	feat(sync): Instrument pipeline with tracing spans, run_id correlation, and metrics Add end-to-end observability to the sync and ingest pipelines: Sync command: - Generate UUID-based run_id for each sync invocation, propagated through all child spans for log correlation across stages - Accept MetricsLayer reference to extract hierarchical StageTiming data after pipeline completion for robot-mode performance output - Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle) - Wrap entire sync execution in a root tracing span with run_id field Ingest command: - Wrap run_ingest in an instrumented root span with run_id and resource_type - Add project path prefix to discussion progress bars for multi-project clarity - Reset resource_events_synced_for_updated_at on --full re-sync Sync status: - Expand from single last_run to configurable recent runs list (default 10) - Parse and expose StageTiming metrics from stored metrics_json - Add run_id, total_items_processed, total_errors to SyncRunInfo - Add mr_count to DataSummary for complete entity coverage Orchestrator: - Add #[instrument] with structured fields to issue and MR ingestion functions - Record items_processed, items_skipped, errors on span close for MetricsLayer - Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete) - Pass project_id through to drain_resource_events for scoped job claiming Document regenerator and embedding pipeline: - Add #[instrument] spans with items_processed, items_skipped, errors fields - Record final counts on span close for metrics extraction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:39:00 -05:00
teernisse	362503d3bf	feat(cli): Add verbosity controls, JSON log format, and triple-layer subscriber Overhaul the CLI logging infrastructure for production observability: CLI flags: - Add -v/-vv/-vvv (--verbose) for progressive stderr verbosity control: 0=INFO, 1=DEBUG app, 2=DEBUG all, 3+=TRACE - Add --log-format text|json for structured stderr output in automation - Existing -q/--quiet overrides verbosity for silent operation Subscriber architecture (main.rs): - Replace single-layer subscriber with triple-layer setup: 1. stderr layer: human-readable or JSON, filtered by -v flags 2. file layer: always-on JSON to daily-rotated logs (lore.YYYY-MM-DD.log) 3. MetricsLayer: captures span timing for robot-mode performance payloads - Parse CLI before subscriber init so verbosity is known at setup time - Load LoggingConfig early (with graceful fallback for pre-init commands) - Clean up old log files before subscriber init to avoid holding deleted handles - Hold WorkerGuard at function scope to ensure flush on exit Doctor command: - Add logging health check: validates log directory exists, reports file count and total size, warns on missing or inaccessible log directory Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 13:38:43 -05:00
Taylor Eernisse	f5b4a765b7	perf: Configurable rate limit, 429 auto-retry, concurrent project ingestion The sync pipeline was bottlenecked at 10 req/s (hardcoded) with sequential project processing and no retry on rate limiting. These changes target 3-5x throughput improvement. Rate limit configuration: - Add requestsPerSecond to SyncConfig (default 30.0, was hardcoded 10) - Pass configured rate through to GitLabClient::new from ingest - Floor rate at 0.1 rps in RateLimiter::new to prevent panic on Duration::from_secs_f64(1.0 / 0.0) — now reachable via user config 429 auto-retry: - Both request() and request_with_headers() retry up to 3 times on HTTP 429, respecting the retry-after header (default 60s) - Extract parse_retry_after helper, reused by handle_response fallback - After exhausting retries, the 429 error propagates as before - Improved JSON decode errors now include a response body preview Concurrent project ingestion: - Derive Clone on GitLabClient (cheap: shares Arc<Mutex<RateLimiter>> and reqwest::Client which is already Arc-backed) - Restructure project loop to use futures::stream::buffer_unordered with primary_concurrency (default 4) as the parallelism bound - Each project gets its own SQLite connection (WAL mode + busy_timeout handles concurrent writes) - Add show_spinner field to IngestDisplay to separate the per-project spinner from the sync-level stage spinner - Error aggregation defers failures: all successful projects get their summaries printed and results counted before returning the first error - Bump dependentConcurrency default from 2 to 8 for discussion prefetch Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:37:06 -05:00
Taylor Eernisse	c35f485e0e	refactor(cli): Replace tracing-indicatif with shared MultiProgress tracing-indicatif pulled in vt100, arrayvec, and its own indicatif integration layer. Replace it with a minimal SuspendingWriter that coordinates tracing output with progress bars via a global LazyLock MultiProgress. - Add src/cli/progress.rs: shared MultiProgress singleton via LazyLock and a SuspendingWriter that suspends bars before writing log lines, preventing interleaving/flicker - Wire all progress bar creation through multi().add() in sync and ingest commands - Replace IndicatifLayer in main.rs with SuspendingWriter for tracing-subscriber's fmt layer - Remove tracing-indicatif from Cargo.toml (drops vt100 and arrayvec transitive deps) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 17:36:31 -05:00
Taylor Eernisse	bb75a9d228	fix(events): Resource events now run on incremental syncs, fix output and progress bar Three bugs fixed: 1. Early return in orchestrator when no discussions needed sync also skipped resource event enqueue+drain. On incremental syncs (the most common case), resource events were never fetched. Restructured to use if/else instead of early return so Step 4 always executes. 2. Ingest command JSON and human-readable output silently dropped resource_events_fetched/failed counts. Added to IngestJsonData and print_ingest_summary. 3. Progress bar reuse after finish_and_clear caused indicatif to silently ignore subsequent set_position/set_length calls. Added reset() call before reconfiguring the bar for resource events. Also removed stale comment referencing "unsafe" that didn't reflect the actual unchecked_transaction approach. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:06:35 -05:00
Taylor Eernisse	2bcd8db0e9	feat(events): Wire resource event fetching into sync pipeline (bd-1ep) Integrate resource event fetching as Step 4 of both issue and MR ingestion, gated behind the fetch_resource_events config flag. Orchestrator changes: - Add ProgressEvent variants: ResourceEventsFetchStarted, ResourceEventFetched, ResourceEventsFetchComplete - Add resource_events_fetched/failed fields to IngestProjectResult and IngestMrProjectResult - New enqueue_resource_events_for_entity_type() queries all issues/MRs for a project and enqueues resource_events jobs via the dependent queue (INSERT OR IGNORE for idempotency) - New drain_resource_events() claims jobs in batches, fetches state/label/milestone events from GitLab API, stores them atomically via unchecked_transaction, and handles failures with exponential backoff via fail_job() - Max-iterations guard prevents infinite retry loops within a single drain run - New store_resource_events() + per-type _tx helpers write events using prepared statements inside a single transaction - DrainResult struct tracks fetched/failed counts CLI ingest changes: - IngestResult gains resource_events_fetched/failed fields - Progress bar repurposed for resource event fetch phase (reuses discussion bar with updated template) - Accumulates event counts from both issue and MR ingestion CLI sync changes: - SyncResult gains resource_events_fetched/failed fields - Accumulates counts from both ingest stages - print_sync() conditionally displays event counts - Structured logging includes event counts Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:02:15 -05:00
Taylor Eernisse	a50fc78823	style: Apply cargo fmt and clippy fixes across codebase Automated formatting and lint corrections from parallel agent work: - cargo fmt: import reordering (alphabetical), line wrapping to respect max width, trailing comma normalization, destructuring alignment, function signature reformatting, match arm formatting - clippy (pedantic): Range::contains() instead of manual comparisons, i64::from() instead of `as i64` casts, .clamp() instead of .max().min() chains, let-chain refactors (if-let with &&), #[allow(clippy::too_many_arguments)] and #[allow(clippy::field_reassign_with_default)] where warranted - Removed trailing blank lines and extra whitespace No behavioral changes. All existing tests pass unmodified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:59 -05:00
Taylor Eernisse	0236ef2776	feat(stats): Extend --check with event FK integrity and queue health diagnostics Adds two new categories of integrity checks to 'lore stats --check': Event FK integrity (3 queries): - Detects orphaned resource_state_events where issue_id or merge_request_id points to a non-existent parent entity - Same check for resource_label_events and resource_milestone_events - Under normal CASCADE operation these should always be zero; non-zero indicates manual DB edits, bugs, or partial migration state Queue health diagnostics: - pending_dependent_fetches counts: pending, failed, and stuck (locked) - queue_stuck_locks: Jobs with locked_at set (potential worker crashes) - queue_max_attempts: Highest retry count across all jobs (signals permanently failing jobs when > 3) New IntegrityResult fields: orphan_state_events, orphan_label_events, orphan_milestone_events, queue_stuck_locks, queue_max_attempts. New QueueStats fields: pending_dependent_fetches, pending_dependent_fetches_failed, pending_dependent_fetches_stuck. Human output shows colored PASS/WARN/FAIL indicators: - Red "!" for orphaned events (integrity failure) - Yellow "!" for stuck locks and high retry counts (warnings) - Dependent fetch queue line only shown when non-zero All new queries are guarded by table_exists() checks for graceful degradation on databases without migration 011 applied. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:15 -05:00
Taylor Eernisse	12811683ca	feat(cli): Add 'lore count events' command with human and robot output Extends the count command to support "events" as an entity type, displaying resource event counts broken down by event type (state, label, milestone) and entity type (issue, merge request). New functions in count.rs: - run_count_events: Creates DB connection and delegates to events_db::count_events for the actual queries - print_event_count: Human-readable table with aligned columns showing per-type breakdowns and row/column totals - print_event_count_json: Structured JSON matching the robot mode contract with ok/data envelope and per-type issue/mr/total counts JSON output structure: {"ok":true,"data":{"state_events":{"issue":N,"merge_request":N, "total":N},"label_events":{...},"milestone_events":{...},"total":N}} Updated exports in commands/mod.rs to expose the three new public functions (run_count_events, print_event_count, print_event_count_json). The "events" branch in handle_count (main.rs, committed earlier) routes to these functions before the existing entity type dispatcher. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:01 -05:00
Taylor Eernisse	9d4755521f	feat(config): Add fetchResourceEvents config flag with --no-events CLI override Adds a new boolean field to SyncConfig that controls whether resource event fetching is performed during sync: - SyncConfig.fetch_resource_events: defaults to true via serde default_true helper, serialized as "fetchResourceEvents" in JSON - SyncArgs.no_events: --no-events CLI flag that overrides the config value to false when present - SyncOptions.no_events: propagates the flag through the sync pipeline - handle_sync_cmd: mutates loaded config when --no-events is set, ensuring the flag takes effect regardless of config file contents This follows the existing pattern established by --no-embed and --no-docs flags, where CLI flags override config file defaults. The config is loaded as mutable specifically to support this override. Also adds "events" to the count command's entity type value_parser, enabling `lore count events` (implementation in a separate commit). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:06 -05:00
Taylor Eernisse	aebbe6b795	feat(cli): Wire --full flag for embed, add sync stage spinners - Add --full / --no-full flag pair to EmbedArgs with overrides_with semantics matching the existing flag pattern. When active, atomically DELETEs all embedding_metadata and embeddings before re-embedding. - Thread the full flag through run_embed -> run_sync so that 'lore sync --full' triggers a complete re-embed alongside the full re-ingest it already performed. - Add indicatif spinners to sync stages with dynamic stage numbering that adjusts when --no-docs or --no-embed skip stages. Spinners are hidden in robot mode. - Update robot-docs manifest to advertise the new --full flag on the embed command. - Replace hardcoded schema version 9 in health check with the LATEST_SCHEMA_VERSION constant from db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:22 -05:00
Taylor Eernisse	667f70e177	refactor(commands): Add IngestDisplay, resolve_project, and color-aware tables Ingest: - Introduce IngestDisplay struct with show_progress/show_text booleans to decouple progress bars from text output. Replaces the robot_mode bool parameter with explicit display control, enabling sync to show progress without duplicating summary text (progress_only mode). - Use resolve_project() for --project filtering instead of LIKE queries, providing proper error messages for ambiguous or missing projects. List: - Add colored_cell() helper that checks console::colors_enabled() before applying comfy-table foreground colors, bridging the gap between the console and comfy-table crates for --color flag support. - Use resolve_project() for project filtering (exact ID match). - Improve since filter to return explicit errors instead of silently ignoring invalid values. - Improve format_relative_time for proper singular/plural forms. Search: - Validate --after/--updated-after with explicit error messages. - Handle optional title field (Option<String>) in HydratedRow. Show: - Use resolve_project() for project disambiguation. Sync: - Thread robot_mode via SyncOptions for IngestDisplay selection. - Use IngestDisplay::progress_only() in interactive sync mode. GenerateDocs: - Use resolve_project() for --project filtering. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:54:36 -05:00
Taylor Eernisse	daf5a73019	feat(cli): Add search, stats, embed, sync, health, and robot-docs commands Extends the CLI with six new commands that complete the search pipeline: - lore search <QUERY>: Hybrid search with mode selection (lexical, hybrid, semantic), rich filtering (--type, --author, --project, --label, --path, --after, --updated-after), result limits, and optional explain mode showing RRF score breakdowns. Safe FTS mode sanitizes user input; raw mode passes through for power users. - lore stats: Document and index statistics with optional --check for integrity verification and --repair to fix inconsistencies (orphaned documents, missing FTS entries, stale dirty queue items). - lore embed: Generate vector embeddings via Ollama. Supports --retry-failed to re-attempt previously failed embeddings. - lore generate-docs: Drain the dirty queue to regenerate documents. --full seeds all entities for complete rebuild. --project scopes to a single project. - lore sync: Full pipeline orchestration (ingest issues + MRs, generate-docs, embed) with --no-embed and --no-docs flags for partial runs. Reports per-stage results and total elapsed time. - lore health: Quick pre-flight check (config exists, DB exists, schema current). Returns exit code 1 if unhealthy. Designed for agent pre-flight scripts. - lore robot-docs: Machine-readable command manifest for agent self-discovery. Returns all commands, flags, examples, exit codes, and recommended workflows as structured JSON. Also enhances lore init with --gitlab-url, --token-env-var, and --projects flags for fully non-interactive robot-mode initialization. Fixes init's force/non-interactive precedence logic and adds JSON output for robot mode. Updates all command files for the GiError -> LoreError rename. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:10 -05:00
Taylor Eernisse	8fe5feda7e	fix(ingestion): Move counter increments after transaction commit Ingestion counters (discussions_upserted, notes_upserted, discussions_fetched, diffnotes_count) were incremented before tx.commit(), meaning a failed commit would report inflated metrics. Counters now increment only after successful commit so reported numbers accurately reflect persisted state. Also simplifies the stale-removal guard in issue discussions: the received_first_response flag was unnecessary since an empty seen_discussion_ids list is safe to pass to remove_stale -- if there were no discussions, stale removal correctly sweeps all previously-stored discussions. The two separate code paths (empty vs populated) are collapsed into a single branch. Derives Default on IngestResult to eliminate verbose zero-init. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:42:11 -05:00
Taylor Eernisse	753ff46bb4	fix(cli): Correct project filtering and GROUP_CONCAT delimiter Two SQL correctness issues fixed: 1. Project filter used LIKE '%term%' which caused partial matches (e.g. filtering for "foo" matched "group/foobar"). Now uses exact match OR suffix match after '/' so "foo" matches "group/foo" but not "group/foobar". 2. GROUP_CONCAT used comma as delimiter for labels and assignees, which broke parsing when label names themselves contained commas. Switched to ASCII unit separator (0x1F) which cannot appear in GitLab entity names. Also adds a guard for negative time deltas in format_relative_time to handle clock skew gracefully instead of panicking. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:41:56 -05:00
teernisse	55b895a2eb	Update name to gitlore instead of gitlab-inbox	2026-01-28 15:49:14 -05:00
Taylor Eernisse	8ddc974b89	feat(cli): Add MR support to list/show/count/ingest commands Extends all data commands to support merge requests alongside issues, with consistent patterns and JSON output for robot mode. List command (gi list mrs): - MR-specific columns: branches, draft status, reviewers - Filters: --state (opened|merged|closed|locked|all), --draft, --no-draft, --reviewer, --target-branch, --source-branch - Discussion count with unresolved indicator (e.g., "5/2!") - JSON output includes full MR metadata Show command (gi show mr <iid>): - MR details with branches, assignees, reviewers, merge status - DiffNote positions showing file:line for code review comments - Full description and discussion bodies (no truncation in JSON) - --json flag for structured output with ISO timestamps Count command (gi count mrs): - MR counting with optional --type filter for discussions/notes - JSON output with breakdown by state Ingest command (gi ingest --type mrs): - Full MR sync with discussion prefetch - Progress output shows MR-specific metrics (diffnotes count) - JSON summary with comprehensive sync statistics All commands respect global --robot mode for auto-JSON output. The pattern "gi list mrs --json | jq '.mrs[] | .iid'" now works for scripted MR processing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:46:59 -05:00
Taylor Eernisse	4abbe2a226	fix(ingest): Reset discussion watermarks when --full flag is used This is a P1 fix from the CP1-CP2 alignment audit. The --full flag was designed to enable complete data re-synchronization, but it only reset sync_cursors for issues—it failed to reset the per-issue discussions_synced_for_updated_at watermark. The result was an inconsistent state: issues would be re-fetched from GitLab (because sync_cursors were cleared), but their discussions would NOT be re-synced (because the watermark comparison prevented it). This was a subtle bug because the watermark check uses: WHERE updated_at > COALESCE(discussions_synced_for_updated_at, 0) When discussions_synced_for_updated_at is already set to the issue's updated_at, the comparison fails and discussions are skipped. Fix: Before clearing sync_cursors, set discussions_synced_for_updated_at to NULL for all issues in the project. This makes COALESCE return 0, ensuring all issues become eligible for discussion sync. The ordering is important: watermarks must be reset BEFORE cursors to ensure the full sync behaves consistently. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 17:01:04 -05:00
Taylor Eernisse	8fb890c528	feat(cli): Implement complete command-line interface Provides a user-friendly CLI for all GitLab Inbox operations. src/cli/mod.rs - Clap command definitions: - Global --config flag for alternate config path - Subcommands: init, auth-test, doctor, version, backup, reset, migrate, sync-status, ingest, list, count, show - Ingest supports --type (issues/merge_requests), --project filter, --force lock override, --full resync - List supports rich filtering: --state, --author, --assignee, --label, --milestone, --since, --due-before, --has-due-date - List supports --sort (updated/created/iid), --order (asc/desc) - List supports --open to launch browser, --json for scripting src/cli/commands/ - Command implementations: init.rs: Interactive configuration wizard - Prompts for GitLab URL, token env var, projects to track - Creates config file and initializes database - Supports --force overwrite and --non-interactive mode auth_test.rs: Verify GitLab authentication - Calls /api/v4/user to validate token - Displays username and GitLab instance URL doctor.rs: Environment health check - Validates config file exists and parses correctly - Checks database connectivity and migration state - Verifies GitLab authentication - Reports token environment variable status - Supports --json output for CI integration ingest.rs: Data synchronization from GitLab - Acquires sync lock with stale detection - Shows progress bars for issues and discussions - Reports sync statistics on completion - Supports --full flag to reset cursors and refetch all data list.rs: Query local database - Formatted table output with comfy-table - Filters build dynamic SQL with parameterized queries - Username filters normalize @ prefix automatically - --open flag uses 'open' crate for cross-platform browser launch - --json outputs array of issue objects show.rs: Detailed entity view - Displays issue metadata in structured format - Shows full description with markdown - Lists labels, assignees, milestone - Shows discussion threads with notes count.rs: Entity statistics - Counts issues, discussions, or notes - Supports --type filter for discussions/notes sync_status.rs: Display sync watermarks - Shows last sync time per project - Displays cursor positions for debugging src/main.rs - Application entry point: - Initializes tracing subscriber with env-filter - Parses CLI arguments via clap - Dispatches to appropriate command handler - Consistent error formatting for all failure modes src/lib.rs - Library entry point: - Exports cli, core, gitlab, ingestion modules - Re-exports Config, GiError, Result for convenience Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:28:52 -05:00
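
The two rules described in commit 753ff46bb4 (exact-or-suffix project matching, and the 0x1F delimiter for GROUP_CONCAT) can be illustrated in plain Rust. This is a minimal sketch of the matching and splitting logic only, not the project's SQL or its actual code; the names `project_matches` and `split_concat` are hypothetical.

```rust
/// ASCII unit separator (0x1F), the delimiter the commit message says
/// replaced the comma in GROUP_CONCAT.
const UNIT_SEP: char = '\u{1F}';

/// Hypothetical helper: true when `filter` equals the full project path,
/// or when the path ends with "/<filter>". A bare substring match, which
/// is what LIKE '%term%' does, would also accept "group/foobar".
fn project_matches(path: &str, filter: &str) -> bool {
    path == filter
        || path
            .strip_suffix(filter)
            .is_some_and(|prefix| prefix.ends_with('/'))
}

/// Hypothetical helper: split a concatenated column on the unit separator,
/// so commas inside a label name survive intact.
fn split_concat(raw: &str) -> Vec<&str> {
    if raw.is_empty() {
        Vec::new()
    } else {
        raw.split(UNIT_SEP).collect()
    }
}
```

With these definitions, `project_matches("group/foo", "foo")` holds while `project_matches("group/foobar", "foo")` does not, and a label such as "bug, urgent" comes back from `split_concat` as one item instead of two.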

88 Commits
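
The watermark comparison quoted in commit 4abbe2a226, `updated_at > COALESCE(discussions_synced_for_updated_at, 0)`, can be modeled in a few lines of Rust to show why setting the watermark to NULL makes every issue eligible again. This is a sketch under assumptions: `IssueRow` is a hypothetical in-memory stand-in for a database row, and the function names are illustrative, not taken from the codebase.

```rust
/// Hypothetical stand-in for an issue row; `None` models SQL NULL.
struct IssueRow {
    updated_at: i64,
    discussions_synced_for_updated_at: Option<i64>,
}

/// Mirrors `updated_at > COALESCE(discussions_synced_for_updated_at, 0)`:
/// a missing watermark is treated as 0.
fn needs_discussion_sync(issue: &IssueRow) -> bool {
    issue.updated_at > issue.discussions_synced_for_updated_at.unwrap_or(0)
}

/// Models the fix for --full: clear the watermark so the comparison
/// falls back to 0 and the issue is picked up again.
fn reset_watermark(issue: &mut IssueRow) {
    issue.discussions_synced_for_updated_at = None;
}
```

An issue whose watermark equals its `updated_at` is skipped by `needs_discussion_sync`; after `reset_watermark` the same issue qualifies, provided its `updated_at` is positive.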