gitlore

Author	SHA1	Message	Date
Taylor Eernisse	ff94f24702	chore(beads): Update issue tracker state for Gate 1 completions Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:46 -05:00
Taylor Eernisse	5c521491b7	chore(beads): Update issue tracker state for Gate 1 completions Closes bd-hu3, bd-2e8, bd-2fm, bd-sqw, bd-1uc, bd-tir, bd-3sh, bd-1m8. All Gate 1 resource events infrastructure beads except bd-1ep (pipeline wiring) are now complete. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:23 -05:00
Taylor Eernisse	0236ef2776	feat(stats): Extend --check with event FK integrity and queue health diagnostics Adds two new categories of integrity checks to 'lore stats --check': Event FK integrity (3 queries): - Detects orphaned resource_state_events where issue_id or merge_request_id points to a non-existent parent entity - Same check for resource_label_events and resource_milestone_events - Under normal CASCADE operation these should always be zero; non-zero indicates manual DB edits, bugs, or partial migration state Queue health diagnostics: - pending_dependent_fetches counts: pending, failed, and stuck (locked) - queue_stuck_locks: Jobs with locked_at set (potential worker crashes) - queue_max_attempts: Highest retry count across all jobs (signals permanently failing jobs when > 3) New IntegrityResult fields: orphan_state_events, orphan_label_events, orphan_milestone_events, queue_stuck_locks, queue_max_attempts. New QueueStats fields: pending_dependent_fetches, pending_dependent_fetches_failed, pending_dependent_fetches_stuck. Human output shows colored PASS/WARN/FAIL indicators: - Red "!" for orphaned events (integrity failure) - Yellow "!" for stuck locks and high retry counts (warnings) - Dependent fetch queue line only shown when non-zero All new queries are guarded by table_exists() checks for graceful degradation on databases without migration 011 applied. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:15 -05:00
Taylor Eernisse	12811683ca	feat(cli): Add 'lore count events' command with human and robot output Extends the count command to support "events" as an entity type, displaying resource event counts broken down by event type (state, label, milestone) and entity type (issue, merge request). New functions in count.rs: - run_count_events: Creates DB connection and delegates to events_db::count_events for the actual queries - print_event_count: Human-readable table with aligned columns showing per-type breakdowns and row/column totals - print_event_count_json: Structured JSON matching the robot mode contract with ok/data envelope and per-type issue/mr/total counts JSON output structure: {"ok":true,"data":{"state_events":{"issue":N,"merge_request":N, "total":N},"label_events":{...},"milestone_events":{...},"total":N}} Updated exports in commands/mod.rs to expose the three new public functions (run_count_events, print_event_count, print_event_count_json). The "events" branch in handle_count (main.rs, committed earlier) routes to these functions before the existing entity type dispatcher. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:08:01 -05:00
Taylor Eernisse	724be4d265	feat(queue): Add generic dependent fetch queue with exponential backoff New module src/core/dependent_queue.rs provides job queue operations against the pending_dependent_fetches table. Designed for second-pass fetches that depend on primary entity ingestion (resource events, MR close references, MR file diffs). Queue operations: - enqueue_job: Idempotent INSERT OR IGNORE keyed on the UNIQUE (project_id, entity_type, entity_iid, job_type) constraint. Returns bool indicating whether the row was actually inserted. - claim_jobs: Two-phase claim — SELECT available jobs (unlocked, past retry window) then UPDATE locked_at in batch. Orders by enqueued_at ASC for FIFO processing within a job type. - complete_job: DELETE the row on successful processing. - fail_job: Increments attempts, calculates exponential backoff (30s * 2^(attempts-1), capped at 480s), sets next_retry_at, clears locked_at, and records the error message. Reads current attempts via query with unwrap_or(0) fallback for robustness. - reclaim_stale_locks: Clears locked_at on jobs locked longer than a configurable threshold, recovering from worker crashes. - count_pending_jobs: GROUP BY job_type aggregation for progress reporting and stats display. Registers both events_db and dependent_queue in src/core/mod.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:48 -05:00
Taylor Eernisse	c34ed3007e	feat(db): Add event upsert functions and count queries in events_db module New module src/core/events_db.rs provides database operations for resource events: - upsert_state_events: Batch INSERT OR REPLACE for state change events, keyed on UNIQUE(gitlab_id, project_id). Wraps in a savepoint for atomicity per entity batch. Maps GitLabStateEvent fields including optional user, source_commit, and source_merge_request_iid. - upsert_label_events: Same pattern for label add/remove events, extracting label.name for denormalized storage. - upsert_milestone_events: Same pattern for milestone assignment events, storing both milestone.title and milestone.id. All three upsert functions: - Take &mut Connection (required for savepoint creation) - Use prepare_cached for statement reuse across batch iterations - Convert ISO timestamps via iso_to_ms_strict for ms-epoch storage - Propagate rusqlite errors via the #[from] LoreError::Database path - Return the count of events processed Supporting functions: - resolve_entity_ids: Maps entity_type string to (issue_id, MR_id) pair with exactly-one-non-NULL invariant matching the CHECK constraints - count_events: Queries all three event tables with conditional COUNT aggregations, returning EventCounts struct. Uses unwrap_or((0, 0)) for graceful degradation when tables don't exist (pre-migration 011). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:34 -05:00
Taylor Eernisse	e73d2907dc	feat(client): Add Resource Events API endpoints with generic paginated fetcher Extends GitLabClient with methods for fetching resource events from GitLab's per-entity API endpoints. Adds a new impl block containing: - fetch_all_pages<T>: Generic paginated collector that handles x-next-page header parsing with fallback to page-size heuristics. Uses per_page=100 and respects the existing rate limiter via request_with_headers. Terminates when: (a) x-next-page header is absent/stale, (b) response is empty, or (c) page is not full. - Six typed endpoint methods: - fetch_issue_state_events / fetch_mr_state_events - fetch_issue_label_events / fetch_mr_label_events - fetch_issue_milestone_events / fetch_mr_milestone_events - fetch_all_resource_events: Convenience method that fetches all three event types for an entity (issue or merge_request) in sequence, returning a tuple of (state, label, milestone) event vectors. Routes to issue or MR endpoints based on entity_type string. All methods follow the existing client patterns: path formatting with gitlab_project_id and iid, error propagation via Result, and rate limiter integration through the shared request_with_headers path. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:19 -05:00
Taylor Eernisse	9d4755521f	feat(config): Add fetchResourceEvents config flag with --no-events CLI override Adds a new boolean field to SyncConfig that controls whether resource event fetching is performed during sync: - SyncConfig.fetch_resource_events: defaults to true via serde default_true helper, serialized as "fetchResourceEvents" in JSON - SyncArgs.no_events: --no-events CLI flag that overrides the config value to false when present - SyncOptions.no_events: propagates the flag through the sync pipeline - handle_sync_cmd: mutates loaded config when --no-events is set, ensuring the flag takes effect regardless of config file contents This follows the existing pattern established by --no-embed and --no-docs flags, where CLI flags override config file defaults. The config is loaded as mutable specifically to support this override. Also adds "events" to the count command's entity type value_parser, enabling `lore count events` (implementation in a separate commit). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:07:06 -05:00
Taylor Eernisse	92ff255909	feat(types): Add GitLab Resource Event serde types with deserialization tests Adds six new types for deserializing responses from GitLab's three Resource Events API endpoints (state, label, milestone): - GitLabStateEvent: State transitions with optional user, source_commit, and source_merge_request reference - GitLabLabelEvent: Label add/remove events with nested GitLabLabelRef - GitLabMilestoneEvent: Milestone assignment changes with nested GitLabMilestoneRef - GitLabMergeRequestRef: Lightweight MR reference (iid, title, web_url) - GitLabLabelRef: Label metadata (id, name, color, description) - GitLabMilestoneRef: Milestone metadata (id, iid, title) All types derive Deserialize + Serialize and use Option<T> for nullable fields (user, source_commit, color, description) to match GitLab's API contract where these fields may be null. Includes 8 new test cases covering: - State events with/without user, with/without source_merge_request - Label events for add and remove actions, including null color handling - Milestone event deserialization - Standalone ref type deserialization (MR, label, milestone) Uses r##"..."## raw string delimiters where JSON contains hex color codes (#FF0000) that would conflict with r#"..."# delimiters. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:06:56 -05:00
Taylor Eernisse	ce5cd9c95d	feat(schema): Add migration 011 for resource events, entity references, and dependent fetch queue Introduces five new tables that power temporal queries (timeline, file-history, trace) via GitLab Resource Events APIs: - resource_state_events: State transitions (opened/closed/reopened/merged/locked) with actor tracking, source commit, and source MR references - resource_label_events: Label add/remove history per entity - resource_milestone_events: Milestone assignment changes per entity - entity_references: Cross-reference table (Gate 2 prep) linking source/target entity pairs with reference type and discovery method - pending_dependent_fetches: Generic job queue for resource_events, mr_closes_issues, and mr_diffs with exponential backoff retry All event tables enforce entity exclusivity via CHECK constraints (exactly one of issue_id or merge_request_id must be non-NULL). Deduplication handled via UNIQUE indexes on (gitlab_id, project_id). FK cascades ensure cleanup when parent entities are removed. The dependent fetch queue uses a UNIQUE constraint on (project_id, entity_type, entity_iid, job_type) for idempotent enqueue, with partial indexes optimizing claim and retry queries. Registered as migration 011 in the embedded MIGRATIONS array in db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 12:06:43 -05:00
Taylor Eernisse	549a0646d7	chore: Add test-runner agent, agent-swarm-launcher skill, review artifacts, and beads updates - .claude/agents/test-runner.md: New Claude Code agent definition for running cargo test suites and analyzing results, configured with haiku model for fast execution. - skills/agent-swarm-launcher/: New skill for bootstrapping coordinated multi-agent workflows with AGENTS.md reconnaissance, Agent Mail coordination, and beads task tracking. - api-review.html, phase-a-review.html: Self-contained HTML review artifacts for API audit and Phase A search pipeline review. - .beads/issues.jsonl, .beads/last-touched: Updated issue tracker state reflecting current project work items. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:36:05 -05:00
Taylor Eernisse	a417640faa	docs: Overhaul AGENTS.md, update README, add pipeline spec and Phase B plan AGENTS.md: Comprehensive rewrite adding file deletion safeguards, destructive git command protocol, Rust toolchain conventions, code editing discipline rules, compiler check requirements, TDD mandate, MCP Agent Mail coordination protocol, beads/bv/ubs/ast-grep/cass tool documentation, and session completion workflow. README.md: Document NO_COLOR/CLICOLOR env vars, --since 1m duration, project resolution cascading match logic, lore health and robot-docs commands, exit codes 17 (not found) and 18 (ambiguous match), --color/--quiet global flags, dirty_sources and pending_discussion_fetches tables, and version command git hash output. docs/embedding-pipeline-hardening.md: Detailed spec covering the three problems from the chunk size reduction (broken --full wiring, mixed chunk sizes in vector space, static dedup multiplier) with decision records, implementation plan, and acceptance criteria. docs/phase-b-temporal-intelligence.md: Draft planning document for transforming gitlore from a search engine into a temporal code intelligence system by ingesting structured event data from GitLab. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:51 -05:00
Taylor Eernisse	f560e6bc00	test(embedding): Add regression tests for pipeline hardening bugs Three targeted regression tests covering bugs fixed in the embedding pipeline hardening: - overflow_doc_with_error_sentinel_not_re_detected_as_pending: verifies that documents skipped for producing too many chunks have their sentinel error recorded in embedding_metadata and are NOT returned by find_pending_documents or count_pending_documents on subsequent runs (prevents infinite re-processing loop). - count_and_find_pending_agree: exercises four states (empty DB, new document, fully-embedded document, config-drifted document) and asserts that count_pending_documents and find_pending_documents produce consistent results across all of them. - full_embed_delete_is_atomic: confirms the --full flag's two DELETE statements (embedding_metadata + embeddings) execute atomically within a transaction. Also updates test DB creation to apply migration 010. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:34 -05:00
Taylor Eernisse	aebbe6b795	feat(cli): Wire --full flag for embed, add sync stage spinners - Add --full / --no-full flag pair to EmbedArgs with overrides_with semantics matching the existing flag pattern. When active, atomically DELETEs all embedding_metadata and embeddings before re-embedding. - Thread the full flag through run_embed -> run_sync so that 'lore sync --full' triggers a complete re-embed alongside the full re-ingest it already performed. - Add indicatif spinners to sync stages with dynamic stage numbering that adjusts when --no-docs or --no-embed skip stages. Spinners are hidden in robot mode. - Update robot-docs manifest to advertise the new --full flag on the embed command. - Replace hardcoded schema version 9 in health check with the LATEST_SCHEMA_VERSION constant from db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:22 -05:00
Taylor Eernisse	7d07f95d4c	fix(embedding): Harden pipeline against chunk overflow, config drift, and partial failures Reduces CHUNK_MAX_BYTES from 32KB to 6KB and CHUNK_OVERLAP_CHARS from 500 to 200 to stay within nomic-embed-text's 8,192-token context window. This commit addresses all downstream consequences of that reduction: - Config drift detection: find_pending_documents and count_pending_documents now take model_name and compare chunk_max_bytes, model, and dims against stored metadata. Documents embedded with stale config are automatically re-queued. - Overflow guard: documents producing >= CHUNK_ROWID_MULTIPLIER chunks are skipped with a sentinel error recorded in embedding_metadata, preventing both rowid collision and infinite re-processing loops. - Deferred clearing: old embeddings are no longer cleared before attempting new ones. clear_document_embeddings is deferred until the first successful chunk embedding, so if all chunks fail the document retains its previous embeddings rather than losing all data. - Savepoints: each page of DB writes is wrapped in a SQLite savepoint so a crash mid-page rolls back atomically instead of leaving partial state (cleared embeddings with no replacements). - Per-chunk retry on context overflow: when a batch fails with a context-length error, each chunk is retried individually so one oversized chunk doesn't poison the entire batch. - Adaptive dedup in vector search: replaces the static 3x over-fetch multiplier with a dynamic one based on actual max chunks per document (using the new chunk_count column with a fallback COUNT query for pre-migration data). Also replaces partial_cmp with total_cmp for f64 distance sorting. - Stores chunk_max_bytes and chunk_count (on sentinel rows) in embedding_metadata to support config drift detection and adaptive dedup without runtime queries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:08 -05:00
Taylor Eernisse	2a52594a60	feat(db): Add migration 010 for chunk config tracking columns Add chunk_max_bytes and chunk_count columns to embedding_metadata to support config drift detection and adaptive dedup sizing. Includes a partial index on sentinel rows (chunk_index=0) to accelerate the drift detection and max-chunk queries. Also exports LATEST_SCHEMA_VERSION as a public constant derived from the MIGRATIONS array length, replacing the previously hardcoded magic number in the health check. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:34:48 -05:00
Taylor Eernisse	51c370fac2	feat(project): Add substring matching and use Ambiguous error for resolution Extend resolve_project() with a 4th cascade step: case-insensitive substring match when exact, case-insensitive, and suffix matches all fail. This allows shorthand like "typescript" to match "vs/typescript-code" when unambiguous. Multi-match still returns an error with all candidates listed. Also change ambiguity errors from LoreError::Other to LoreError::Ambiguous so they get the proper AMBIGUOUS error code (exit 18) instead of INTERNAL_ERROR. Includes tests for unambiguous substring, case-insensitive substring, ambiguous substring, and suffix-preferred-over-substring ordering. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:55:23 -05:00
Taylor Eernisse	7b7d781a19	docs: Update exit codes, add config precedence and shell completions Exit code tables (README + AGENTS.md): - Add codes 14-16 (Ollama unavailable, model not found, embedding failed) - Add code 20 (Config not found, remapped from 2) - Clarify code 1 (now includes health check failed + not implemented) - Clarify code 2 (now exclusively usage/parsing errors from clap) New sections: - Configuration Precedence: CLI flags > env vars > config file > defaults - Shell Completions: bash, zsh, fish, powershell installation instructions Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:55:02 -05:00
Taylor Eernisse	03ea51513d	feat(main): Wire SIGPIPE, color, quiet, completions, and negation flag handling Runtime setup: - Reset SIGPIPE to SIG_DFL on Unix at the very start of main() so piping to head/grep doesn't cause a panic. - Apply --color flag to console::set_colors_enabled() after CLI parse. - Extract quiet flag and thread it to handle_ingest. Command dispatch: - Add Completions match arm using clap_complete::generate(). - Resolve all --no-X negation flags in handlers: asc, has_due, open (issues/mrs), force/full (ingest/sync), check (stats), explain (search), retry_failed (embed). - Auto-enable --check when --repair is used in handle_stats. - Suppress deprecation warnings in robot mode for List, Show, AuthTest, and SyncStatus deprecated aliases. Stubs: - Change handle_backup/handle_reset from ok:true to structured error JSON on stderr with exit code 1. Remove unused NotImplementedOutput and NotImplementedData structs. Version: - Include GIT_HASH env var in handle_version output (human and robot). - Add git_hash field to VersionData with skip_serializing_if for None. Robot-docs: - Update exit code table with codes 14-18 (Ollama, NotFound, Ambiguous) and code 20 (ConfigNotFound). Clarify code 1 and 2 descriptions. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:54:53 -05:00
Taylor Eernisse	667f70e177	refactor(commands): Add IngestDisplay, resolve_project, and color-aware tables Ingest: - Introduce IngestDisplay struct with show_progress/show_text booleans to decouple progress bars from text output. Replaces the robot_mode bool parameter with explicit display control, enabling sync to show progress without duplicating summary text (progress_only mode). - Use resolve_project() for --project filtering instead of LIKE queries, providing proper error messages for ambiguous or missing projects. List: - Add colored_cell() helper that checks console::colors_enabled() before applying comfy-table foreground colors, bridging the gap between the console and comfy-table crates for --color flag support. - Use resolve_project() for project filtering (exact ID match). - Improve since filter to return explicit errors instead of silently ignoring invalid values. - Improve format_relative_time for proper singular/plural forms. Search: - Validate --after/--updated-after with explicit error messages. - Handle optional title field (Option<String>) in HydratedRow. Show: - Use resolve_project() for project disambiguation. Sync: - Thread robot_mode via SyncOptions for IngestDisplay selection. - Use IngestDisplay::progress_only() in interactive sync mode. GenerateDocs: - Use resolve_project() for --project filtering. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:54:36 -05:00
Taylor Eernisse	585b746461	feat(cli): Add --color, --quiet, --no-X negations, completions, and help headings Global flags: - --color (auto|always|never) for explicit color control - --quiet/-q to suppress non-essential output - Hidden Completions subcommand for bash/zsh/fish/powershell Flag negation (--no-X) with overrides_with for: has-due, asc, open (issues/mrs), force/full (ingest/sync), check (stats), explain (search), retry-failed (embed). Enables scripted flag composition where later flags override earlier ones. Validation: - value_parser on search --mode, --type, --fts-mode for early rejection - Remove requires="check" from --repair (auto-enabled in handler) Polish: - help_heading groups (Filters, Sorting, Output, Actions) on issues, mrs, and search args for cleaner --help output - Hide Backup, Reset, and Completions from --help Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:54:18 -05:00
Taylor Eernisse	730ddef339	fix(error): Remap ConfigNotFound to exit 20 and add NotFound/Ambiguous codes ConfigNotFound previously used exit code 2 which collides with clap's usage error code. Remap it to exit 20 to avoid ambiguity. Also add dedicated NotFound (exit 17) and Ambiguous (exit 18) error codes with proper ErrorCode variants and Display implementations, replacing the previous incorrect mapping of these errors to GitLabNotFound. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:54:02 -05:00
Taylor Eernisse	5508d8464a	build: Add clap_complete, libc dependencies and git hash build script Add clap_complete for shell completion generation and libc (unix-only) for SIGPIPE handling. Create build.rs to embed the git commit hash at compile time via cargo:rustc-env=GIT_HASH, enabling `lore version` to display the short hash alongside the version number. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:53:51 -05:00
Taylor Eernisse	41d20f1374	chore(beads): Update issue tracker with search pipeline beads Add new beads for the checkpoint-3 search pipeline work including document generation, FTS5 indexing, embedding pipeline, hybrid search, and CLI command implementations. Update status on completed beads. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:39 -05:00
Taylor Eernisse	9b63671df9	docs: Update documentation for search pipeline and Phase A spec - README.md: Add hybrid search and robot mode to feature list. Update quick start to use new noun-first CLI syntax (lore issues, lore mrs, lore search). Add embedding configuration section. Update command examples throughout. - AGENTS.md: Update robot mode examples to new CLI syntax. Add search, sync, stats, and generate-docs commands to the robot mode reference. Update flag conventions (-n for limit, -s for state, -J for JSON). - docs/prd/checkpoint-3.md: Major expansion with gated milestone structure (Gate A: lexical, Gate B: hybrid, Gate C: sync). Add prerequisite rename note, code sample conventions, chunking strategy details, and sqlite-vec rowid encoding scheme. Clarify that Gate A requires only SQLite + FTS5 with no sqlite-vec dependency. - docs/phase-a-spec.md: New detailed specification for Gate A (lexical search MVP) covering document schema, FTS5 configuration, dirty queue mechanics, CLI interface, and acceptance criteria. - docs/api-efficiency-findings.md: Analysis of GitLab API pagination behavior and efficiency observations from production sync runs. Documents the missing x-next-page header issue and heuristic fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:33 -05:00
Taylor Eernisse	d235f2b4dd	test: Add test suites for embedding, FTS, hybrid search, and golden queries Four new test modules covering the search infrastructure: - tests/embedding.rs: Unit tests for the embedding pipeline including chunk ID encoding/decoding, change detection, and document chunking with overlap verification. - tests/fts_search.rs: Integration tests for FTS5 search including safe query sanitization, multi-term queries, prefix matching, and the raw FTS mode for power users. - tests/hybrid_search.rs: End-to-end tests for hybrid search mode including RRF fusion correctness, graceful degradation when embeddings are unavailable, and filter application. - tests/golden_query_tests.rs: Golden query tests using fixtures from tests/fixtures/golden_queries.json to verify search quality against known-good query/result pairs. Ensures ranking stability across implementation changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:19 -05:00
Taylor Eernisse	daf5a73019	feat(cli): Add search, stats, embed, sync, health, and robot-docs commands Extends the CLI with six new commands that complete the search pipeline: - lore search <QUERY>: Hybrid search with mode selection (lexical, hybrid, semantic), rich filtering (--type, --author, --project, --label, --path, --after, --updated-after), result limits, and optional explain mode showing RRF score breakdowns. Safe FTS mode sanitizes user input; raw mode passes through for power users. - lore stats: Document and index statistics with optional --check for integrity verification and --repair to fix inconsistencies (orphaned documents, missing FTS entries, stale dirty queue items). - lore embed: Generate vector embeddings via Ollama. Supports --retry-failed to re-attempt previously failed embeddings. - lore generate-docs: Drain the dirty queue to regenerate documents. --full seeds all entities for complete rebuild. --project scopes to a single project. - lore sync: Full pipeline orchestration (ingest issues + MRs, generate-docs, embed) with --no-embed and --no-docs flags for partial runs. Reports per-stage results and total elapsed time. - lore health: Quick pre-flight check (config exists, DB exists, schema current). Returns exit code 1 if unhealthy. Designed for agent pre-flight scripts. - lore robot-docs: Machine-readable command manifest for agent self-discovery. Returns all commands, flags, examples, exit codes, and recommended workflows as structured JSON. Also enhances lore init with --gitlab-url, --token-env-var, and --projects flags for fully non-interactive robot-mode initialization. Fixes init's force/non-interactive precedence logic and adds JSON output for robot mode. Updates all command files for the GiError -> LoreError rename. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:10 -05:00
Taylor Eernisse	559f0702ad	feat(ingestion): Mark entities dirty on ingest for document regeneration Integrates the dirty tracking system into all four ingestion paths (issues, MRs, issue discussions, MR discussions). After each entity is upserted within its transaction, a corresponding dirty_queue entry is inserted so the document regenerator knows which documents need rebuilding. This ensures that document generation stays transactionally consistent with data changes: if the ingest transaction rolls back, the dirty marker rolls back too, preventing stale document regeneration attempts. Also updates GiError references to LoreError in these files as part of the codebase-wide rename, and adjusts issue discussion logging from info to debug level to reduce noise during normal sync runs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:51 -05:00
Taylor Eernisse	d5bdb24b0f	feat(search): Add hybrid search engine with FTS5, vector, and RRF fusion Implements the search module providing three search modes: - Lexical (FTS5): Full-text search using SQLite FTS5 with safe query sanitization. User queries are automatically tokenized and wrapped in proper FTS5 syntax. Supports a "raw" mode for power users who want direct FTS5 query syntax (NEAR, column filters, etc.). - Semantic (vector): Embeds the search query via Ollama, then performs cosine similarity search against stored document embeddings. Results are deduplicated by doc_id since documents may have multiple chunks. - Hybrid (default): Executes both lexical and semantic searches in parallel, then fuses results using Reciprocal Rank Fusion (RRF) with k=60. This avoids the complexity of score normalization while producing high-quality merged rankings. Gracefully degrades to lexical-only when embeddings are unavailable. Additional components: - search::filters: Post-retrieval filtering by source_type, author, project, labels (AND logic), file path prefix, created_after, and updated_after. Date filters accept relative formats (7d, 2w) and ISO dates. - search::rrf: Reciprocal Rank Fusion implementation with configurable k parameter and optional explain mode that annotates each result with its component ranks and fusion score breakdown. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:42 -05:00
Taylor Eernisse	723703bed9	feat(embedding): Add Ollama-powered vector embedding pipeline Implements the embedding module that generates vector representations of documents using a local Ollama instance with the nomic-embed-text model. These embeddings enable semantic (vector) search and the hybrid search mode that fuses lexical and semantic results via RRF. Key components: - embedding::ollama: HTTP client for the Ollama /api/embeddings endpoint. Handles connection errors with actionable error messages (OllamaUnavailable, OllamaModelNotFound) and validates response dimensions. - embedding::chunking: Splits long documents into overlapping paragraph-aware chunks for embedding. Uses a configurable max token estimate (8192 default for nomic-embed-text) with 10% overlap to preserve cross-chunk context. - embedding::chunk_ids: Encodes chunk identity as doc_id * 1000 + chunk_index for the embeddings table rowid. This allows vector search to map results back to documents and deduplicate by doc_id efficiently. - embedding::change_detector: Compares document content_hash against stored embedding hashes to skip re-embedding unchanged documents, making incremental embedding runs fast. - embedding::pipeline: Orchestrates the full embedding flow: detect changed documents, chunk them, call Ollama in configurable concurrency (default 4), store results. Supports --retry-failed to re-attempt previously failed embeddings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:30 -05:00
Taylor Eernisse	20edff4ab1	feat(documents): Add document generation pipeline with dirty tracking Implements the documents module that transforms raw ingested entities (issues, MRs, discussions) into searchable document blobs stored in the documents table. This is the foundation for both FTS5 lexical search and vector embedding. Key components: - documents::extractor: Renders entities into structured text documents. Issues include title, description, labels, milestone, assignees, and threaded discussion summaries. MRs additionally include source/target branches, reviewers, and approval status. Discussions are rendered with full note threading. - documents::regenerator: Drains the dirty_queue table to regenerate only documents whose source entities changed since last sync. Supports full rebuild mode (seeds all entities into dirty queue first) and project-scoped regeneration. - documents::truncation: Safety cap at 2MB per document to prevent pathological outliers from degrading FTS or embedding performance. - ingestion::dirty_tracker: Marks entities as dirty inside the ingestion transaction so document regeneration stays consistent with data changes. Uses INSERT OR IGNORE to deduplicate. - ingestion::discussion_queue: Queue-based discussion fetching that isolates individual discussion failures from the broader ingestion pipeline, preventing a single corrupt discussion from blocking an entire project sync. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:18 -05:00
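The 2MB safety cap in documents::truncation can be sketched under two assumptions: the limit is applied in bytes, and the cut must land on a UTF-8 character boundary, because Rust string slicing panics otherwise. The constant's exact value and the function name are assumptions.

```rust
/// Assumed cap of 2 MiB; the commit only says "2MB".
const MAX_DOC_BYTES: usize = 2 * 1024 * 1024;

/// Cut `text` to at most `max_bytes` bytes without splitting a UTF-8 character.
fn truncate_utf8(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Byte 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}
```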
Taylor Eernisse	d31d5292f2	fix(gitlab): Improve pagination heuristics and fix rate limiter lock contention Two targeted fixes to the GitLab API client: 1. Pagination: When the x-next-page header is missing but the current page returned a full page of results, heuristically advance to the next page instead of stopping. This fixes silent data truncation observed with certain GitLab instances that omit pagination headers on intermediate pages. The existing early-exit on empty or partial pages remains as the termination condition. 2. Rate limiter: Refactor the async acquire() method into a synchronous check_delay() that computes the required sleep duration and updates last_request time while holding the mutex, then releases the lock before sleeping. This eliminates holding the Mutex<RateLimiter> across an await point, which previously could block other request tasks unnecessarily during the sleep interval. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:05 -05:00
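Both fixes can be sketched using only the standard library. `next_page` and the `RateLimiter` fields are hypothetical names. The real client is async, so `std::thread::sleep` stands in for an async sleep here. The sketch shows `check_delay` computing the wait under the lock, with the sleep happening only after the guard is dropped.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Pagination decision: stop on an empty or partial page; otherwise follow
/// x-next-page if present, or guess current + 1 when the header is missing.
fn next_page(current: u32, header_next: Option<u32>, returned: usize, per_page: usize) -> Option<u32> {
    if returned == 0 || returned < per_page {
        return None;
    }
    Some(header_next.unwrap_or(current + 1))
}

struct RateLimiter {
    min_interval: Duration,
    last_request: Option<Instant>,
}

impl RateLimiter {
    /// Synchronous and lock-scoped: compute the required wait and reserve
    /// the next request slot, but never sleep while the caller holds the lock.
    fn check_delay(&mut self, now: Instant) -> Duration {
        let delay = match self.last_request {
            Some(last) => (last + self.min_interval).saturating_duration_since(now),
            None => Duration::ZERO,
        };
        self.last_request = Some(now + delay);
        delay
    }
}

/// The MutexGuard is a temporary dropped at the end of the `let` statement,
/// so the sleep happens with the lock already released.
fn acquire(limiter: &Mutex<RateLimiter>) {
    let delay = limiter.lock().unwrap().check_delay(Instant::now());
    if !delay.is_zero() {
        std::thread::sleep(delay);
    }
}
```

Setting last_request to now + delay reserves the slot for the caller that is about to sleep. Concurrent callers therefore queue up behind it instead of all computing the same delay. This is one reasonable reading of "updates last_request time while holding the mutex", not necessarily the exact behavior.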
Taylor Eernisse	6e22f120d0	refactor(core): Rename GiError to LoreError and add search infrastructure Mechanical rename of GiError -> LoreError across the core module to match the project's rebranding from gitlab-inbox to gitlore/lore. Updates the error enum name, all From impls, and the Result type alias. Additionally introduces: - New error variants for embedding pipeline: OllamaUnavailable, OllamaModelNotFound, EmbeddingFailed, EmbeddingsNotBuilt. Each includes actionable suggestions (e.g., "ollama serve", "ollama pull nomic-embed-text") to guide users through recovery. - New error codes 14-16 for programmatic handling of Ollama failures. - Savepoint-based migration execution in db.rs: each migration now runs inside a SQLite SAVEPOINT so a failed migration rolls back cleanly without corrupting the schema_version tracking. Previously a partial migration could leave the database in an inconsistent state. - core::backoff module: exponential backoff with jitter utility for retry loops in the embedding pipeline and discussion queues. - core::project module: helper for resolving project IDs and paths from the local database, used by the document regenerator and search filters. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:54 -05:00
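Exponential backoff with jitter, as in core::backoff, can be sketched as follows. The commit does not say which jitter strategy is used. This sketch uses "equal jitter" (half fixed, half random) as one common choice. It takes the random value as a parameter so the rand dependency stays at the call site. All names are hypothetical.

```rust
use std::time::Duration;

/// Exponential backoff with "equal jitter": the delay base * 2^attempt is
/// capped at `max`, half of it is kept fixed and the other half is scaled by
/// `jitter_unit`, which the caller draws uniformly from [0, 1) (for example
/// with the rand crate). Taking it as a parameter keeps this sketch testable.
fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter_unit: f64) -> Duration {
    // 2^attempt, saturating instead of overflowing for large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    let capped = base.saturating_mul(factor).min(max);
    let half = capped / 2;
    let jitter_nanos = (half.as_nanos() as f64 * jitter_unit.clamp(0.0, 1.0)) as u64;
    half + Duration::from_nanos(jitter_nanos)
}
```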
Taylor Eernisse	4270603da4	feat(db): Add migrations for documents, FTS5, and embeddings Three new migrations establish the search infrastructure: - 007_documents: Creates the `documents` table as the central search unit. Each document is a rendered text blob derived from an issue, MR, or discussion. Includes `dirty_queue` table for tracking which entities need document regeneration after ingestion changes. - 008_fts5: Creates FTS5 virtual table `documents_fts` with content sync triggers. Uses `unicode61` tokenizer with `remove_diacritics=2` for broad language support. Automatic insert/update/delete triggers keep the FTS index synchronized with the documents table. - 009_embeddings: Creates `embeddings` table for storing vector chunks produced by Ollama. Uses `doc_id * 1000 + chunk_index` rowid encoding to support multi-chunk documents while enabling efficient doc-level deduplication in vector search results. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:41 -05:00
Taylor Eernisse	aca4773327	deps: Add rand crate for randomized backoff and jitter The embedding pipeline and retry queues need randomized exponential backoff to prevent thundering herd effects when Ollama or GitLab recover from transient failures. The rand crate (0.8) provides the thread-safe RNG needed for jitter computation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:30 -05:00
Taylor Eernisse	f4dba386c9	docs: Restructure checkpoint-3 PRD with gated milestones Reorganizes the Search & Sync MVP plan into three independently verifiable gates (A: Lexical MVP, B: Hybrid MVP, C: Sync MVP) to reduce integration risk. Each gate has explicit deliverables, acceptance criteria, and can ship on its own. Expands the specification with additional detail on document generation, search API surface, sync orchestration, and integrity repair paths. Removes the outdated rename note since the project is now fully migrated to gitlore/lore naming. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:42:39 -05:00
Taylor Eernisse	856aad1641	feat(cli): Redesign CLI with noun-first subcommands Replaces the verb-first pattern ('lore list issues', 'lore show issue 42') with noun-first subcommands that feel more natural: lore issues # list issues lore issues 42 # show issue #42 lore mrs # list merge requests lore mrs 99 # show MR #99 lore ingest # ingest everything lore ingest issues # ingest only issues lore count issues # count issues lore status # sync status lore auth # verify auth lore doctor # health check Key changes: - New IssuesArgs, MrsArgs, IngestArgs, CountArgs structs with short flags (-n, -s, -p, -a, -l, -o, -f, -J, etc.) - Global -J/--json flag as shorthand for --robot - 'lore ingest' with no argument ingests both issues and MRs, emitting combined JSON summary in robot mode - --asc flag replaces --order=asc/desc for brevity - Renamed flags: --has-due-date -> --has-due, --type -> --for, --confirm -> --yes, target_branch -> --target, etc. Old commands (list, show, auth-test, sync-status) are preserved as hidden backward-compat aliases that emit deprecation warnings to stderr before delegating to the new handlers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:42:26 -05:00
Taylor Eernisse	8fe5feda7e	fix(ingestion): Move counter increments after transaction commit Ingestion counters (discussions_upserted, notes_upserted, discussions_fetched, diffnotes_count) were incremented before tx.commit(), meaning a failed commit would report inflated metrics. Counters now increment only after successful commit so reported numbers accurately reflect persisted state. Also simplifies the stale-removal guard in issue discussions: the received_first_response flag was unnecessary since an empty seen_discussion_ids list is safe to pass to remove_stale -- if there were no discussions, stale removal correctly sweeps all previously-stored discussions. The two separate code paths (empty vs populated) are collapsed into a single branch. Derives Default on IngestResult to eliminate verbose zero-init. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:42:11 -05:00
Taylor Eernisse	753ff46bb4	fix(cli): Correct project filtering and GROUP_CONCAT delimiter Two SQL correctness issues fixed: 1. Project filter used LIKE '%term%' which caused partial matches (e.g. filtering for "foo" matched "group/foobar"). Now uses exact match OR suffix match after '/' so "foo" matches "group/foo" but not "group/foobar". 2. GROUP_CONCAT used comma as delimiter for labels and assignees, which broke parsing when label names themselves contained commas. Switched to ASCII unit separator (0x1F) which cannot appear in GitLab entity names. Also adds a guard for negative time deltas in format_relative_time to handle clock skew gracefully instead of panicking. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:41:56 -05:00
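The three fixes can be sketched as in-memory Rust equivalents. The function names, the SQL in the comment, and the relative-time wording are assumptions. Only the matching rule, the 0x1F delimiter, and the clamp on negative deltas come from the commit.

```rust
/// Exact match, or suffix match after a '/', so "foo" selects "group/foo"
/// but not "group/foobar". In SQL this corresponds to something like
/// `path = ?1 OR path LIKE '%/' || ?1` (LIKE wildcards in ?1 would need escaping).
fn project_matches(path: &str, filter: &str) -> bool {
    path == filter
        || path
            .strip_suffix(filter)
            .map_or(false, |prefix| prefix.ends_with('/'))
}

/// ASCII unit separator (0x1F), e.g. produced by GROUP_CONCAT(name, char(31)).
const UNIT_SEP: char = '\u{1F}';

/// Split a GROUP_CONCAT result; NULL or empty means "no values".
fn split_group_concat(field: Option<&str>) -> Vec<String> {
    match field {
        Some(s) if !s.is_empty() => s.split(UNIT_SEP).map(str::to_owned).collect(),
        _ => Vec::new(),
    }
}

/// Clamp negative deltas (timestamps slightly in the future due to clock skew)
/// to zero instead of failing. Output wording is illustrative only.
fn format_relative_time(now_ms: i64, then_ms: i64) -> String {
    let secs = now_ms.saturating_sub(then_ms).max(0) / 1000;
    match secs {
        0..=59 => "just now".to_string(),
        60..=3_599 => format!("{} min ago", secs / 60),
        3_600..=86_399 => format!("{} hours ago", secs / 3_600),
        _ => format!("{} days ago", secs / 86_400),
    }
}
```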
Taylor Eernisse	d3a05cfb87	fix(error): Improve error suggestions with inline examples Error suggestions now include concrete CLI examples so users (and robot-mode consumers) can act immediately without consulting docs. For instance, ConfigNotFound now shows the expected path and the exact command to run, TokenNotSet shows the export syntax, and Ambiguous shows the -p flag with example project paths. Also fixes the error code for Ambiguous errors: it now maps to GitLabNotFound instead of InternalError, since the entity exists but the user needs to disambiguate -- not an internal failure. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:41:45 -05:00
Taylor Eernisse	390f8a9288	refactor(core): Centralize timestamp parsing in core::time Duplicate ISO 8601 timestamp parsing functions existed in both discussion.rs and merge_request.rs transformers. This extracts iso_to_ms_strict() and iso_to_ms_opt_strict() into core::time as the single source of truth, and updates both transformer modules to use the shared implementations. Also removes the private now_ms() from merge_request.rs in favor of the existing core::time::now_ms(), and replaces the local parse_timestamp_opt() in discussion.rs with the public iso_to_ms() from core::time. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:41:34 -05:00
teernisse	55b895a2eb	Update name to gitlore instead of gitlab-inbox	2026-01-28 15:49:14 -05:00
teernisse	9a6357c353	Begin planning phase 3-5 implementation	2026-01-27 22:40:49 -05:00
Taylor Eernisse	96ef60fa05	docs: Update documentation for CP2 merge request support Updates project documentation to reflect the complete CP2 feature set with merge request ingestion and robot mode capabilities. README.md: - Add MR-related CLI examples (gi list mrs, gi show mr, gi ingest) - Document robot mode (--robot flag, GI_ROBOT env, auto-detect) - Update feature list with MR support and DiffNote positions - Add configuration section with all config file options - Expand CLI reference with new commands and flags AGENTS.md: - Add MR ingestion patterns for AI agent consumption - Document robot mode JSON schemas for parsing - Include error handling patterns with exit codes - Add discussion/note querying examples for code review context Cargo.toml: - Bump version to 0.2.0 reflecting major feature addition The documentation emphasizes the robot mode design which enables AI agents like Claude Code to reliably parse gi output for automated GitLab workflow integration. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:47:34 -05:00
Taylor Eernisse	d338d68191	test: Add comprehensive test suite for MR ingestion Introduces thorough test coverage for merge request functionality, following the established testing patterns from issue ingestion. New test files: - mr_transformer_tests.rs: NormalizedMergeRequest transformation tests covering full MR with all fields, minimal MR, draft detection via title prefix and work_in_progress field, label/assignee/reviewer extraction, and timestamp conversion - mr_discussion_tests.rs: MR discussion normalization tests including polymorphic noteable binding, DiffNote position extraction with line ranges and SHA triplet, and resolvable note handling - diffnote_position_tests.rs: Exhaustive DiffNote position scenarios covering text/image/file types, single-line vs multi-line comments, added/removed/modified lines, and missing position handling New fixtures: - fixtures/gitlab_merge_request.json: Representative MR API response with nested structures for integration testing Updated tests: - gitlab_types_tests.rs: Add MR type deserialization tests - migration_tests.rs: Update expected schema version to 6 Test design follows property-based patterns where feasible, with explicit edge case coverage for nullable fields and API variants across different GitLab versions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:47:17 -05:00
Taylor Eernisse	8ddc974b89	feat(cli): Add MR support to list/show/count/ingest commands Extends all data commands to support merge requests alongside issues, with consistent patterns and JSON output for robot mode. List command (gi list mrs): - MR-specific columns: branches, draft status, reviewers - Filters: --state (opened|merged|closed|locked|all), --draft, --no-draft, --reviewer, --target-branch, --source-branch - Discussion count with unresolved indicator (e.g., "5/2!") - JSON output includes full MR metadata Show command (gi show mr <iid>): - MR details with branches, assignees, reviewers, merge status - DiffNote positions showing file:line for code review comments - Full description and discussion bodies (no truncation in JSON) - --json flag for structured output with ISO timestamps Count command (gi count mrs): - MR counting with optional --type filter for discussions/notes - JSON output with breakdown by state Ingest command (gi ingest --type mrs): - Full MR sync with discussion prefetch - Progress output shows MR-specific metrics (diffnotes count) - JSON summary with comprehensive sync statistics All commands respect global --robot mode for auto-JSON output. The pattern "gi list mrs --json | jq '.mrs[] | .iid'" now works for scripted MR processing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:46:59 -05:00
Taylor Eernisse	7d0d586932	feat(cli): Add global robot mode for machine-readable output Introduces a unified robot mode that enables JSON output across all commands, designed for AI agent and script consumption. Robot mode activation (any of): - --robot flag: Explicit opt-in - GI_ROBOT=1 env var: For persistent configuration - Non-TTY stdout: Auto-detect when piped (e.g., gi list issues | jq) Implementation: - Cli::is_robot_mode(): Centralized detection logic - All command handlers receive robot_mode boolean - Errors emit structured JSON to stderr with exit codes - Success responses emit JSON to stdout Behavior changes in robot mode: - No color/emoji output (no ANSI escapes) - No progress spinners or interactive prompts - Timestamps as ISO 8601 strings (not relative "2 hours ago") - Full content (no truncation of descriptions/notes) - Structured error objects with code, message, suggestion This enables reliable parsing by Claude Code, shell scripts, and automation pipelines. The auto-detect on non-TTY means simple piping "just works" without explicit flags. Per-command --json flags remain for explicit control and override robot mode when needed for human-friendly terminal + JSON file output. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:46:27 -05:00
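The activation rules can be sketched as one pure function plus a thin wrapper that reads the environment. Names are hypothetical, and treating only GI_ROBOT=1 as enabled is an assumption. The per-command --json override is not modeled.

```rust
use std::io::IsTerminal;

/// Pure decision logic: any one trigger enables robot mode.
/// Treating only the exact value "1" as truthy for GI_ROBOT is an assumption.
fn is_robot_mode(robot_flag: bool, gi_robot_env: Option<&str>, stdout_is_tty: bool) -> bool {
    robot_flag || gi_robot_env == Some("1") || !stdout_is_tty
}

/// Gather the real inputs: the --robot flag, the GI_ROBOT variable, and
/// whether stdout is a terminal (std::io::IsTerminal, Rust 1.70+).
fn detect_robot_mode(robot_flag: bool) -> bool {
    let env = std::env::var("GI_ROBOT").ok();
    is_robot_mode(robot_flag, env.as_deref(), std::io::stdout().is_terminal())
}
```

Keeping the decision pure makes each trigger testable without touching the real environment or terminal.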
Taylor Eernisse	5fe76e46a3	fix(core): Add structured error handling and responsive lock release Improves core infrastructure with robot-friendly error output and faster lock release for better sync behavior. Error handling improvements (error.rs): - ErrorCode::exit_code(): Unique exit codes per error type (1-13) for programmatic error handling in scripts/agents - GiError::suggestion(): Helpful hints for common error recovery - GiError::to_robot_error(): Structured JSON error conversion - RobotError/RobotErrorOutput: Serializable error types with code, message, and optional suggestion fields Lock improvements (lock.rs): - Heartbeat thread now polls every 100ms for release flag, only updating database heartbeat at full interval (5s default) - Eliminates 5-10s delay after sync completion when waiting for heartbeat thread to notice release - Reduces lock hold time after operation completes Database (db.rs): - Bump expected schema version to 6 for MR migration The exit code mapping enables shell scripts and CI/CD pipelines to distinguish between configuration errors (2-4), GitLab API errors (5-8), and database errors (9-11) for appropriate retry/alert logic. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:46:08 -05:00
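The lock heartbeat change can be sketched with a background thread and an atomic release flag. The function name and counter are hypothetical stand-ins for the real database update. The poll and heartbeat intervals are parameters; the commit uses 100ms and 5s.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Heartbeat loop that wakes every `poll` to check the release flag, but
/// only records a heartbeat every `interval`. Recording is simulated with
/// a counter; the real code updates the lock row in the database.
fn spawn_heartbeat(
    released: Arc<AtomicBool>,
    beats: Arc<AtomicU32>,
    poll: Duration,
    interval: Duration,
) -> JoinHandle<()> {
    thread::spawn(move || {
        let mut last_beat = Instant::now();
        loop {
            thread::sleep(poll);
            if released.load(Ordering::Acquire) {
                break; // release noticed within one poll interval
            }
            if last_beat.elapsed() >= interval {
                beats.fetch_add(1, Ordering::Relaxed);
                last_beat = Instant::now();
            }
        }
    })
}
```

Because the thread checks the flag after every short sleep, joining it after release takes at most about one poll interval rather than up to a full heartbeat interval.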
Taylor Eernisse	cd44e516e3	feat(ingestion): Implement MR sync with parallel discussion prefetch Adds complete merge request ingestion pipeline with a novel two-phase discussion sync strategy optimized for throughput. New modules: - merge_requests.rs: MR upsert with labels/assignees/reviewers handling, stale MR cleanup, and watermark-based incremental sync - mr_discussions.rs: Parallel prefetch strategy for MR discussions Two-phase MR discussion sync: 1. PREFETCH PHASE: Spawn concurrent tasks to fetch discussions for multiple MRs simultaneously (configurable concurrency, default 8). Transform and validate in parallel, storing results in memory. 2. WRITE PHASE: Serial database writes to avoid lock contention. Each MR's discussions written in a single transaction, with proper stale discussion cleanup. This approach achieves ~4-8x throughput vs serial fetching while maintaining database consistency. Transform errors are tracked per-MR to prevent partial writes from corrupting watermarks. Orchestrator updates: - ingest_merge_requests(): Coordinates MR fetch -> discussion sync flow - Progress callbacks emit MR-specific events for UI feedback - Respects --full flag to reset discussion watermarks for full resync The prefetch strategy is critical for MRs which typically have more discussions than issues, and where API latency dominates sync time. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:45:48 -05:00
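The two-phase strategy can be sketched with scoped threads standing in for the real async tasks. `fetch` and `write` are hypothetical callbacks for the GitLab request and the per-MR SQLite transaction. Concurrency is applied in fixed batches for simplicity, which is cruder than keeping a sliding window of in-flight requests.

```rust
use std::thread;

type Fetched = Result<Vec<String>, String>;

/// Phase 1 (prefetch): fetch discussions for up to `concurrency` MRs at once.
/// Phase 2 (write): hand each successful result to `write` one MR at a time.
/// Returns the iids whose fetch failed, so their watermarks can be left alone.
fn two_phase_sync<F, W>(mr_iids: &[u64], concurrency: usize, fetch: F, mut write: W) -> Vec<u64>
where
    F: Fn(u64) -> Fetched + Sync,
    W: FnMut(u64, Vec<String>),
{
    let mut prefetched: Vec<(u64, Fetched)> = Vec::with_capacity(mr_iids.len());
    for batch in mr_iids.chunks(concurrency.max(1)) {
        let fetch = &fetch;
        let results = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|&iid| s.spawn(move || (iid, fetch(iid))))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("fetch thread panicked"))
                .collect::<Vec<_>>()
        });
        prefetched.extend(results);
    }

    let mut failed = Vec::new();
    for (iid, result) in prefetched {
        match result {
            Ok(discussions) => write(iid, discussions), // one transaction per MR
            Err(_) => failed.push(iid),
        }
    }
    failed
}
```

Only the write phase touches the database, so SQLite sees a single writer regardless of fetch concurrency. A failed fetch is reported rather than partially written.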
Taylor Eernisse	d33f24c91b	feat(transformers): Add MR transformer and polymorphic discussion support Introduces NormalizedMergeRequest transformer and updates discussion normalization to handle both issue and MR discussions polymorphically. New transformers: - NormalizedMergeRequest: Transforms API MergeRequest to database row, extracting labels/assignees/reviewers into separate collections for junction table insertion. Handles draft detection, detailed_merge_status preference over deprecated merge_status, and merge_user over merged_by. Discussion transformer updates: - NormalizedDiscussion now takes noteable_type ("Issue" | "MergeRequest") and noteable_id for polymorphic FK binding - normalize_discussions_for_issue(): Convenience wrapper for issues - normalize_discussions_for_mr(): Convenience wrapper for MRs - DiffNote position fields (type, line_range, SHA triplet) now extracted from API position object for code review context Design decisions: - Transformer returns (normalized_item, labels, assignees, reviewers) tuple for efficient batch insertion without re-querying - Timestamps converted to ms epoch for SQLite storage consistency - Optional fields use map() chains for clean null handling The polymorphic discussion approach allows reusing the same discussions and notes tables for both issues and MRs, with noteable_type + FK determining the parent relationship. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:45:29 -05:00

320 Commits