gitlore

Author	SHA1	Message	Date
Taylor Eernisse	730ddef339	fix(error): Remap ConfigNotFound to exit 20 and add NotFound/Ambiguous codes ConfigNotFound previously used exit code 2 which collides with clap's usage error code. Remap it to exit 20 to avoid ambiguity. Also add dedicated NotFound (exit 17) and Ambiguous (exit 18) error codes with proper ErrorCode variants and Display implementations, replacing the previous incorrect mapping of these errors to GitLabNotFound. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:54:02 -05:00
Taylor Eernisse	5508d8464a	build: Add clap_complete, libc dependencies and git hash build script Add clap_complete for shell completion generation and libc (unix-only) for SIGPIPE handling. Create build.rs to embed the git commit hash at compile time via cargo:rustc-env=GIT_HASH, enabling `lore version` to display the short hash alongside the version number. Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>	2026-01-30 16:53:51 -05:00
Taylor Eernisse	41d20f1374	chore(beads): Update issue tracker with search pipeline beads Add new beads for the checkpoint-3 search pipeline work including document generation, FTS5 indexing, embedding pipeline, hybrid search, and CLI command implementations. Update status on completed beads. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:39 -05:00
Taylor Eernisse	9b63671df9	docs: Update documentation for search pipeline and Phase A spec - README.md: Add hybrid search and robot mode to feature list. Update quick start to use new noun-first CLI syntax (lore issues, lore mrs, lore search). Add embedding configuration section. Update command examples throughout. - AGENTS.md: Update robot mode examples to new CLI syntax. Add search, sync, stats, and generate-docs commands to the robot mode reference. Update flag conventions (-n for limit, -s for state, -J for JSON). - docs/prd/checkpoint-3.md: Major expansion with gated milestone structure (Gate A: lexical, Gate B: hybrid, Gate C: sync). Add prerequisite rename note, code sample conventions, chunking strategy details, and sqlite-vec rowid encoding scheme. Clarify that Gate A requires only SQLite + FTS5 with no sqlite-vec dependency. - docs/phase-a-spec.md: New detailed specification for Gate A (lexical search MVP) covering document schema, FTS5 configuration, dirty queue mechanics, CLI interface, and acceptance criteria. - docs/api-efficiency-findings.md: Analysis of GitLab API pagination behavior and efficiency observations from production sync runs. Documents the missing x-next-page header issue and heuristic fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:33 -05:00
Taylor Eernisse	d235f2b4dd	test: Add test suites for embedding, FTS, hybrid search, and golden queries Four new test modules covering the search infrastructure: - tests/embedding.rs: Unit tests for the embedding pipeline including chunk ID encoding/decoding, change detection, and document chunking with overlap verification. - tests/fts_search.rs: Integration tests for FTS5 search including safe query sanitization, multi-term queries, prefix matching, and the raw FTS mode for power users. - tests/hybrid_search.rs: End-to-end tests for hybrid search mode including RRF fusion correctness, graceful degradation when embeddings are unavailable, and filter application. - tests/golden_query_tests.rs: Golden query tests using fixtures from tests/fixtures/golden_queries.json to verify search quality against known-good query/result pairs. Ensures ranking stability across implementation changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:19 -05:00
Taylor Eernisse	daf5a73019	feat(cli): Add search, stats, embed, sync, health, and robot-docs commands Extends the CLI with six new commands that complete the search pipeline: - lore search <QUERY>: Hybrid search with mode selection (lexical, hybrid, semantic), rich filtering (--type, --author, --project, --label, --path, --after, --updated-after), result limits, and optional explain mode showing RRF score breakdowns. Safe FTS mode sanitizes user input; raw mode passes through for power users. - lore stats: Document and index statistics with optional --check for integrity verification and --repair to fix inconsistencies (orphaned documents, missing FTS entries, stale dirty queue items). - lore embed: Generate vector embeddings via Ollama. Supports --retry-failed to re-attempt previously failed embeddings. - lore generate-docs: Drain the dirty queue to regenerate documents. --full seeds all entities for complete rebuild. --project scopes to a single project. - lore sync: Full pipeline orchestration (ingest issues + MRs, generate-docs, embed) with --no-embed and --no-docs flags for partial runs. Reports per-stage results and total elapsed time. - lore health: Quick pre-flight check (config exists, DB exists, schema current). Returns exit code 1 if unhealthy. Designed for agent pre-flight scripts. - lore robot-docs: Machine-readable command manifest for agent self-discovery. Returns all commands, flags, examples, exit codes, and recommended workflows as structured JSON. Also enhances lore init with --gitlab-url, --token-env-var, and --projects flags for fully non-interactive robot-mode initialization. Fixes init's force/non-interactive precedence logic and adds JSON output for robot mode. Updates all command files for the GiError -> LoreError rename. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:10 -05:00
Taylor Eernisse	559f0702ad	feat(ingestion): Mark entities dirty on ingest for document regeneration Integrates the dirty tracking system into all four ingestion paths (issues, MRs, issue discussions, MR discussions). After each entity is upserted within its transaction, a corresponding dirty_queue entry is inserted so the document regenerator knows which documents need rebuilding. This ensures that document generation stays transactionally consistent with data changes: if the ingest transaction rolls back, the dirty marker rolls back too, preventing stale document regeneration attempts. Also updates GiError references to LoreError in these files as part of the codebase-wide rename, and adjusts issue discussion logging from info to debug level to reduce noise during normal sync runs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:51 -05:00
Taylor Eernisse	d5bdb24b0f	feat(search): Add hybrid search engine with FTS5, vector, and RRF fusion Implements the search module providing three search modes: - Lexical (FTS5): Full-text search using SQLite FTS5 with safe query sanitization. User queries are automatically tokenized and wrapped in proper FTS5 syntax. Supports a "raw" mode for power users who want direct FTS5 query syntax (NEAR, column filters, etc.). - Semantic (vector): Embeds the search query via Ollama, then performs cosine similarity search against stored document embeddings. Results are deduplicated by doc_id since documents may have multiple chunks. - Hybrid (default): Executes both lexical and semantic searches in parallel, then fuses results using Reciprocal Rank Fusion (RRF) with k=60. This avoids the complexity of score normalization while producing high-quality merged rankings. Gracefully degrades to lexical-only when embeddings are unavailable. Additional components: - search::filters: Post-retrieval filtering by source_type, author, project, labels (AND logic), file path prefix, created_after, and updated_after. Date filters accept relative formats (7d, 2w) and ISO dates. - search::rrf: Reciprocal Rank Fusion implementation with configurable k parameter and optional explain mode that annotates each result with its component ranks and fusion score breakdown. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:42 -05:00
Taylor Eernisse	723703bed9	feat(embedding): Add Ollama-powered vector embedding pipeline Implements the embedding module that generates vector representations of documents using a local Ollama instance with the nomic-embed-text model. These embeddings enable semantic (vector) search and the hybrid search mode that fuses lexical and semantic results via RRF. Key components: - embedding::ollama: HTTP client for the Ollama /api/embeddings endpoint. Handles connection errors with actionable error messages (OllamaUnavailable, OllamaModelNotFound) and validates response dimensions. - embedding::chunking: Splits long documents into overlapping paragraph-aware chunks for embedding. Uses a configurable max token estimate (8192 default for nomic-embed-text) with 10% overlap to preserve cross-chunk context. - embedding::chunk_ids: Encodes chunk identity as doc_id * 1000 + chunk_index for the embeddings table rowid. This allows vector search to map results back to documents and deduplicate by doc_id efficiently. - embedding::change_detector: Compares document content_hash against stored embedding hashes to skip re-embedding unchanged documents, making incremental embedding runs fast. - embedding::pipeline: Orchestrates the full embedding flow: detect changed documents, chunk them, call Ollama in configurable concurrency (default 4), store results. Supports --retry-failed to re-attempt previously failed embeddings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:30 -05:00
Taylor Eernisse	20edff4ab1	feat(documents): Add document generation pipeline with dirty tracking Implements the documents module that transforms raw ingested entities (issues, MRs, discussions) into searchable document blobs stored in the documents table. This is the foundation for both FTS5 lexical search and vector embedding. Key components: - documents::extractor: Renders entities into structured text documents. Issues include title, description, labels, milestone, assignees, and threaded discussion summaries. MRs additionally include source/target branches, reviewers, and approval status. Discussions are rendered with full note threading. - documents::regenerator: Drains the dirty_queue table to regenerate only documents whose source entities changed since last sync. Supports full rebuild mode (seeds all entities into dirty queue first) and project-scoped regeneration. - documents::truncation: Safety cap at 2MB per document to prevent pathological outliers from degrading FTS or embedding performance. - ingestion::dirty_tracker: Marks entities as dirty inside the ingestion transaction so document regeneration stays consistent with data changes. Uses INSERT OR IGNORE to deduplicate. - ingestion::discussion_queue: Queue-based discussion fetching that isolates individual discussion failures from the broader ingestion pipeline, preventing a single corrupt discussion from blocking an entire project sync. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:18 -05:00
Taylor Eernisse	d31d5292f2	fix(gitlab): Improve pagination heuristics and fix rate limiter lock contention Two targeted fixes to the GitLab API client: 1. Pagination: When the x-next-page header is missing but the current page returned a full page of results, heuristically advance to the next page instead of stopping. This fixes silent data truncation observed with certain GitLab instances that omit pagination headers on intermediate pages. The existing early-exit on empty or partial pages remains as the termination condition. 2. Rate limiter: Refactor the async acquire() method into a synchronous check_delay() that computes the required sleep duration and updates last_request time while holding the mutex, then releases the lock before sleeping. This eliminates holding the Mutex<RateLimiter> across an await point, which previously could block other request tasks unnecessarily during the sleep interval. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:05 -05:00
Taylor Eernisse	6e22f120d0	refactor(core): Rename GiError to LoreError and add search infrastructure Mechanical rename of GiError -> LoreError across the core module to match the project's rebranding from gitlab-inbox to gitlore/lore. Updates the error enum name, all From impls, and the Result type alias. Additionally introduces: - New error variants for embedding pipeline: OllamaUnavailable, OllamaModelNotFound, EmbeddingFailed, EmbeddingsNotBuilt. Each includes actionable suggestions (e.g., "ollama serve", "ollama pull nomic-embed-text") to guide users through recovery. - New error codes 14-16 for programmatic handling of Ollama failures. - Savepoint-based migration execution in db.rs: each migration now runs inside a SQLite SAVEPOINT so a failed migration rolls back cleanly without corrupting the schema_version tracking. Previously a partial migration could leave the database in an inconsistent state. - core::backoff module: exponential backoff with jitter utility for retry loops in the embedding pipeline and discussion queues. - core::project module: helper for resolving project IDs and paths from the local database, used by the document regenerator and search filters. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:54 -05:00
Taylor Eernisse	4270603da4	feat(db): Add migrations for documents, FTS5, and embeddings Three new migrations establish the search infrastructure: - 007_documents: Creates the `documents` table as the central search unit. Each document is a rendered text blob derived from an issue, MR, or discussion. Includes `dirty_queue` table for tracking which entities need document regeneration after ingestion changes. - 008_fts5: Creates FTS5 virtual table `documents_fts` with content sync triggers. Uses `unicode61` tokenizer with `remove_diacritics=2` for broad language support. Automatic insert/update/delete triggers keep the FTS index synchronized with the documents table. - 009_embeddings: Creates `embeddings` table for storing vector chunks produced by Ollama. Uses `doc_id * 1000 + chunk_index` rowid encoding to support multi-chunk documents while enabling efficient doc-level deduplication in vector search results. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:41 -05:00
Taylor Eernisse	aca4773327	deps: Add rand crate for randomized backoff and jitter The embedding pipeline and retry queues need randomized exponential backoff to prevent thundering herd effects when Ollama or GitLab recover from transient failures. The rand crate (0.8) provides the thread-safe RNG needed for jitter computation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:45:30 -05:00
Taylor Eernisse	f4dba386c9	docs: Restructure checkpoint-3 PRD with gated milestones Reorganizes the Search & Sync MVP plan into three independently verifiable gates (A: Lexical MVP, B: Hybrid MVP, C: Sync MVP) to reduce integration risk. Each gate has explicit deliverables, acceptance criteria, and can ship on its own. Expands the specification with additional detail on document generation, search API surface, sync orchestration, and integrity repair paths. Removes the outdated rename note since the project is now fully migrated to gitlore/lore naming. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:42:39 -05:00
Taylor Eernisse	856aad1641	feat(cli): Redesign CLI with noun-first subcommands Replaces the verb-first pattern ('lore list issues', 'lore show issue 42') with noun-first subcommands that feel more natural: lore issues # list issues lore issues 42 # show issue #42 lore mrs # list merge requests lore mrs 99 # show MR #99 lore ingest # ingest everything lore ingest issues # ingest only issues lore count issues # count issues lore status # sync status lore auth # verify auth lore doctor # health check Key changes: - New IssuesArgs, MrsArgs, IngestArgs, CountArgs structs with short flags (-n, -s, -p, -a, -l, -o, -f, -J, etc.) - Global -J/--json flag as shorthand for --robot - 'lore ingest' with no argument ingests both issues and MRs, emitting combined JSON summary in robot mode - --asc flag replaces --order=asc/desc for brevity - Renamed flags: --has-due-date -> --has-due, --type -> --for, --confirm -> --yes, target_branch -> --target, etc. Old commands (list, show, auth-test, sync-status) are preserved as hidden backward-compat aliases that emit deprecation warnings to stderr before delegating to the new handlers. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:42:26 -05:00
Taylor Eernisse	8fe5feda7e	fix(ingestion): Move counter increments after transaction commit Ingestion counters (discussions_upserted, notes_upserted, discussions_fetched, diffnotes_count) were incremented before tx.commit(), meaning a failed commit would report inflated metrics. Counters now increment only after successful commit so reported numbers accurately reflect persisted state. Also simplifies the stale-removal guard in issue discussions: the received_first_response flag was unnecessary since an empty seen_discussion_ids list is safe to pass to remove_stale -- if there were no discussions, stale removal correctly sweeps all previously-stored discussions. The two separate code paths (empty vs populated) are collapsed into a single branch. Derives Default on IngestResult to eliminate verbose zero-init. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:42:11 -05:00
Taylor Eernisse	753ff46bb4	fix(cli): Correct project filtering and GROUP_CONCAT delimiter Two SQL correctness issues fixed: 1. Project filter used LIKE '%term%' which caused partial matches (e.g. filtering for "foo" matched "group/foobar"). Now uses exact match OR suffix match after '/' so "foo" matches "group/foo" but not "group/foobar". 2. GROUP_CONCAT used comma as delimiter for labels and assignees, which broke parsing when label names themselves contained commas. Switched to ASCII unit separator (0x1F) which cannot appear in GitLab entity names. Also adds a guard for negative time deltas in format_relative_time to handle clock skew gracefully instead of panicking. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:41:56 -05:00
Taylor Eernisse	d3a05cfb87	fix(error): Improve error suggestions with inline examples Error suggestions now include concrete CLI examples so users (and robot-mode consumers) can act immediately without consulting docs. For instance, ConfigNotFound now shows the expected path and the exact command to run, TokenNotSet shows the export syntax, and Ambiguous shows the -p flag with example project paths. Also fixes the error code for Ambiguous errors: it now maps to GitLabNotFound instead of InternalError, since the entity exists but the user needs to disambiguate -- not an internal failure. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:41:45 -05:00
Taylor Eernisse	390f8a9288	refactor(core): Centralize timestamp parsing in core::time Duplicate ISO 8601 timestamp parsing functions existed in both discussion.rs and merge_request.rs transformers. This extracts iso_to_ms_strict() and iso_to_ms_opt_strict() into core::time as the single source of truth, and updates both transformer modules to use the shared implementations. Also removes the private now_ms() from merge_request.rs in favor of the existing core::time::now_ms(), and replaces the local parse_timestamp_opt() in discussion.rs with the public iso_to_ms() from core::time. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-29 08:41:34 -05:00
teernisse	55b895a2eb	Update name to gitlore instead of gitlab-inbox	2026-01-28 15:49:14 -05:00
teernisse	9a6357c353	Begin planning phase 3-5 implementation	2026-01-27 22:40:49 -05:00
Taylor Eernisse	96ef60fa05	docs: Update documentation for CP2 merge request support Updates project documentation to reflect the complete CP2 feature set with merge request ingestion and robot mode capabilities. README.md: - Add MR-related CLI examples (gi list mrs, gi show mr, gi ingest) - Document robot mode (--robot flag, GI_ROBOT env, auto-detect) - Update feature list with MR support and DiffNote positions - Add configuration section with all config file options - Expand CLI reference with new commands and flags AGENTS.md: - Add MR ingestion patterns for AI agent consumption - Document robot mode JSON schemas for parsing - Include error handling patterns with exit codes - Add discussion/note querying examples for code review context Cargo.toml: - Bump version to 0.2.0 reflecting major feature addition The documentation emphasizes the robot mode design which enables AI agents like Claude Code to reliably parse gi output for automated GitLab workflow integration. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:47:34 -05:00
Taylor Eernisse	d338d68191	test: Add comprehensive test suite for MR ingestion Introduces thorough test coverage for merge request functionality, following the established testing patterns from issue ingestion. New test files: - mr_transformer_tests.rs: NormalizedMergeRequest transformation tests covering full MR with all fields, minimal MR, draft detection via title prefix and work_in_progress field, label/assignee/reviewer extraction, and timestamp conversion - mr_discussion_tests.rs: MR discussion normalization tests including polymorphic noteable binding, DiffNote position extraction with line ranges and SHA triplet, and resolvable note handling - diffnote_position_tests.rs: Exhaustive DiffNote position scenarios covering text/image/file types, single-line vs multi-line comments, added/removed/modified lines, and missing position handling New fixtures: - fixtures/gitlab_merge_request.json: Representative MR API response with nested structures for integration testing Updated tests: - gitlab_types_tests.rs: Add MR type deserialization tests - migration_tests.rs: Update expected schema version to 6 Test design follows property-based patterns where feasible, with explicit edge case coverage for nullable fields and API variants across different GitLab versions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:47:17 -05:00
Taylor Eernisse	8ddc974b89	feat(cli): Add MR support to list/show/count/ingest commands Extends all data commands to support merge requests alongside issues, with consistent patterns and JSON output for robot mode. List command (gi list mrs): - MR-specific columns: branches, draft status, reviewers - Filters: --state (opened|merged|closed|locked|all), --draft, --no-draft, --reviewer, --target-branch, --source-branch - Discussion count with unresolved indicator (e.g., "5/2!") - JSON output includes full MR metadata Show command (gi show mr <iid>): - MR details with branches, assignees, reviewers, merge status - DiffNote positions showing file:line for code review comments - Full description and discussion bodies (no truncation in JSON) - --json flag for structured output with ISO timestamps Count command (gi count mrs): - MR counting with optional --type filter for discussions/notes - JSON output with breakdown by state Ingest command (gi ingest --type mrs): - Full MR sync with discussion prefetch - Progress output shows MR-specific metrics (diffnotes count) - JSON summary with comprehensive sync statistics All commands respect global --robot mode for auto-JSON output. The pattern "gi list mrs --json | jq '.mrs[] | .iid'" now works for scripted MR processing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:46:59 -05:00
Taylor Eernisse	7d0d586932	feat(cli): Add global robot mode for machine-readable output Introduces a unified robot mode that enables JSON output across all commands, designed for AI agent and script consumption. Robot mode activation (any of): - --robot flag: Explicit opt-in - GI_ROBOT=1 env var: For persistent configuration - Non-TTY stdout: Auto-detect when piped (e.g., gi list issues | jq) Implementation: - Cli::is_robot_mode(): Centralized detection logic - All command handlers receive robot_mode boolean - Errors emit structured JSON to stderr with exit codes - Success responses emit JSON to stdout Behavior changes in robot mode: - No color/emoji output (no ANSI escapes) - No progress spinners or interactive prompts - Timestamps as ISO 8601 strings (not relative "2 hours ago") - Full content (no truncation of descriptions/notes) - Structured error objects with code, message, suggestion This enables reliable parsing by Claude Code, shell scripts, and automation pipelines. The auto-detect on non-TTY means simple piping "just works" without explicit flags. Per-command --json flags remain for explicit control and override robot mode when needed for human-friendly terminal + JSON file output. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:46:27 -05:00
Taylor Eernisse	5fe76e46a3	fix(core): Add structured error handling and responsive lock release Improves core infrastructure with robot-friendly error output and faster lock release for better sync behavior. Error handling improvements (error.rs): - ErrorCode::exit_code(): Unique exit codes per error type (1-13) for programmatic error handling in scripts/agents - GiError::suggestion(): Helpful hints for common error recovery - GiError::to_robot_error(): Structured JSON error conversion - RobotError/RobotErrorOutput: Serializable error types with code, message, and optional suggestion fields Lock improvements (lock.rs): - Heartbeat thread now polls every 100ms for release flag, only updating database heartbeat at full interval (5s default) - Eliminates 5-10s delay after sync completion when waiting for heartbeat thread to notice release - Reduces lock hold time after operation completes Database (db.rs): - Bump expected schema version to 6 for MR migration The exit code mapping enables shell scripts and CI/CD pipelines to distinguish between configuration errors (2-4), GitLab API errors (5-8), and database errors (9-11) for appropriate retry/alert logic. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:46:08 -05:00
Taylor Eernisse	cd44e516e3	feat(ingestion): Implement MR sync with parallel discussion prefetch Adds complete merge request ingestion pipeline with a novel two-phase discussion sync strategy optimized for throughput. New modules: - merge_requests.rs: MR upsert with labels/assignees/reviewers handling, stale MR cleanup, and watermark-based incremental sync - mr_discussions.rs: Parallel prefetch strategy for MR discussions Two-phase MR discussion sync: 1. PREFETCH PHASE: Spawn concurrent tasks to fetch discussions for multiple MRs simultaneously (configurable concurrency, default 8). Transform and validate in parallel, storing results in memory. 2. WRITE PHASE: Serial database writes to avoid lock contention. Each MR's discussions written in a single transaction, with proper stale discussion cleanup. This approach achieves ~4-8x throughput vs serial fetching while maintaining database consistency. Transform errors are tracked per-MR to prevent partial writes from corrupting watermarks. Orchestrator updates: - ingest_merge_requests(): Coordinates MR fetch -> discussion sync flow - Progress callbacks emit MR-specific events for UI feedback - Respects --full flag to reset discussion watermarks for full resync The prefetch strategy is critical for MRs which typically have more discussions than issues, and where API latency dominates sync time. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:45:48 -05:00
Taylor Eernisse	d33f24c91b	feat(transformers): Add MR transformer and polymorphic discussion support Introduces NormalizedMergeRequest transformer and updates discussion normalization to handle both issue and MR discussions polymorphically. New transformers: - NormalizedMergeRequest: Transforms API MergeRequest to database row, extracting labels/assignees/reviewers into separate collections for junction table insertion. Handles draft detection, detailed_merge_status preference over deprecated merge_status, and merge_user over merged_by. Discussion transformer updates: - NormalizedDiscussion now takes noteable_type ("Issue" | "MergeRequest") and noteable_id for polymorphic FK binding - normalize_discussions_for_issue(): Convenience wrapper for issues - normalize_discussions_for_mr(): Convenience wrapper for MRs - DiffNote position fields (type, line_range, SHA triplet) now extracted from API position object for code review context Design decisions: - Transformer returns (normalized_item, labels, assignees, reviewers) tuple for efficient batch insertion without re-querying - Timestamps converted to ms epoch for SQLite storage consistency - Optional fields use map() chains for clean null handling The polymorphic discussion approach allows reusing the same discussions and notes tables for both issues and MRs, with noteable_type + FK determining the parent relationship. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:45:29 -05:00
Taylor Eernisse	cc8c489fd2	feat(gitlab): Add MR and MR discussion API endpoints to client Extends GitLabClient with endpoints for fetching merge requests and their discussions, following the same patterns established for issues. New methods: - fetch_merge_requests(): Paginated MR listing with cursor support, using updated_after filter for incremental sync. Uses 'all' scope to include MRs where user is author/assignee/reviewer. - fetch_merge_requests_single_page(): Single page variant for callers managing their own pagination (used by parallel prefetch) - fetch_mr_discussions(): Paginated discussion listing for a single MR, returns full discussion trees with notes API design notes: - Uses keyset pagination (order_by=updated_at, keyset=true) for consistent results during sync operations - MR endpoint uses /merge_requests (not /mrs) per GitLab API naming - Discussion endpoint matches issue pattern for consistency - Per_page defaults to 100 (GitLab max) for efficiency The fetch_merge_requests_single_page method enables the parallel prefetch strategy used in mr_discussions.rs, where multiple MRs' discussions are fetched concurrently during the sweep phase. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:45:13 -05:00
Taylor Eernisse	a18908c377	feat(gitlab): Add MergeRequest and related types for API deserialization Extends GitLab type definitions with comprehensive merge request support, matching the API response structure for /projects/:id/merge_requests. New types: - MergeRequest: Full MR metadata including draft status, branch info, detailed_merge_status, merge_user (modern API fields replacing deprecated alternatives), and references for cross-project support - MrReviewer: Reviewer user info (MR-specific, distinct from assignees) - MrAssignee: Assignee user info with consistent structure - MrDiscussion: MR discussion wrapper for polymorphic handling - DiffNotePosition: Rich position data for code review comments with line ranges and SHA triplet for commit context Design decisions: - Use Option<T> for all nullable API fields to handle partial responses - Include deprecated fields (merged_by, merge_status) alongside modern alternatives for backward compatibility with older GitLab instances - DiffNotePosition uses Option for all fields since different position types (text/image/file) populate different subsets These types enable type-safe deserialization of GitLab MR API responses with full coverage of the fields needed for CP2 ingestion. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:44:58 -05:00
Taylor Eernisse	39a71d8b85	feat(db): Add schema migration v6 for merge request support Introduces comprehensive database schema for merge request ingestion (CP2), designed with forward compatibility for future features. New tables: - merge_requests: Core MR metadata with draft status, branch info, detailed_merge_status (modern API field), and sync health telemetry columns for debuggability - mr_labels: Junction table linking MRs to shared labels table - mr_assignees: MR assignee usernames (same pattern as issues) - mr_reviewers: MR-specific reviewer tracking (not applicable to issues) Additional indexes: - discussions: Add merge_request_id and resolved status indexes - notes: Add composite indexes for DiffNote file/line queries DiffNote position enhancements: - position_type: 'text' | 'image' | 'file' for diff comment semantics - position_line_range_start/end: Multi-line comment range support - position_base_sha/start_sha/head_sha: Commit context for diff notes The schema captures CP3-ready fields (head_sha, references_short/full, SHA triplet) at zero additional API cost, preparing for file-context and cross-project reference features. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:44:37 -05:00
Taylor Eernisse	8afb2c2e75	docs: Expand README with comprehensive CLI and config documentation Significantly expand the README to serve as complete user documentation for the CLI tool, reflecting the full CP1 implementation. Configuration section: - Add missing config options: heartbeatIntervalSeconds, primaryConcurrency, dependentConcurrency, backupDir, embedding provider settings - Document config file resolution order (CLI flag, env var, XDG, local) - Add environment variables table with GITLAB_TOKEN, GI_CONFIG_PATH, XDG_CONFIG_HOME, XDG_DATA_HOME, RUST_LOG Commands section: - Document --full flag for complete re-sync (resets cursors and watermarks) - Add output descriptions for list, show, and count commands - Document assignee filter with @ prefix normalization - Add gi doctor checks explanation (config, db, GitLab auth, Ollama) - Add gi sync-status output description - Add placeholder documentation for backup and reset commands Database schema section: - Reformat as table with descriptions - Add sync_runs, sync_cursors, app_locks, schema_version tables - Note WAL mode and foreign keys enabled Development section: - Add RUST_LOG=gi=trace example for detailed logging Current status section: - Document CP1 scope (issues, discussions, incremental sync) - List not-yet-implemented features (MRs, embeddings, backup/reset) - Reference SPEC.md for full roadmap Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 17:01:37 -05:00
Taylor Eernisse	0952d21a90	docs(prd): Add CP2 PRD and CP1-CP2 alignment audit Add comprehensive planning documentation for Checkpoint 2 (Merge Request support) and document the results of the CP1 implementation audit. checkpoint-2.md (2093 lines): - Complete PRD for adding merge request ingestion, querying, and display - Detailed user stories with acceptance criteria - ASCII wireframes for CLI output formats - Database schema extensions (migrations 006-007) - API integration specifications for MR endpoints - State transition diagrams for MR lifecycle - Performance requirements and test specifications - Risk assessment and mitigation strategies cp1-cp2-alignment-audit.md (344 lines): - Gap analysis between CP1 PRD and actual implementation - Identified issues prioritized by severity (P0/P1/P2/P3) - P0: NormalizedDiscussion struct incompatible with MR discussions - P1: --full flag not resetting discussion watermarks - P2: Missing Link header pagination fallback - P3: Missing sync health telemetry and selective payload storage - Each issue includes root cause, recommended fix, and affected files The P0 and P1 issues have been fixed in accompanying commits. P2 and P3 items are deferred to CP2 implementation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 17:01:20 -05:00
Taylor Eernisse	4abbe2a226	fix(ingest): Reset discussion watermarks when --full flag is used This is a P1 fix from the CP1-CP2 alignment audit. The --full flag was designed to enable complete data re-synchronization, but it only reset sync_cursors for issues—it failed to reset the per-issue discussions_synced_for_updated_at watermark. The result was an inconsistent state: issues would be re-fetched from GitLab (because sync_cursors were cleared), but their discussions would NOT be re-synced (because the watermark comparison prevented it). This was a subtle bug because the watermark check uses: WHERE updated_at > COALESCE(discussions_synced_for_updated_at, 0) When discussions_synced_for_updated_at is already set to the issue's updated_at, the comparison fails and discussions are skipped. Fix: Before clearing sync_cursors, set discussions_synced_for_updated_at to NULL for all issues in the project. This makes COALESCE return 0, ensuring all issues become eligible for discussion sync. The ordering is important: watermarks must be reset BEFORE cursors to ensure the full sync behaves consistently. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 17:01:04 -05:00
Taylor Eernisse	d9d749ac57	fix(discussion): Make NormalizedDiscussion polymorphic for MR support This is a P0 fix from the CP1-CP2 alignment audit. The original NormalizedDiscussion struct had issue_id as a non-optional i64 and hardcoded noteable_type to "Issue", making it incompatible with merge request discussions even though the database schema already supports both via nullable columns and a CHECK constraint. Changes: - Add NoteableRef enum with Issue(i64) and MergeRequest(i64) variants to provide compile-time safety against mixing up issue vs MR IDs - Change NormalizedDiscussion.issue_id from i64 to Option<i64> - Add NormalizedDiscussion.merge_request_id: Option<i64> - Update transform_discussion() signature to take NoteableRef instead of local_issue_id, deriving issue_id/merge_request_id/noteable_type from the enum variant - Update upsert_discussion() SQL to include merge_request_id column (now 12 parameters instead of 11) - Export NoteableRef from transformers module - Add test for MergeRequest discussion transformation - Update all existing tests to use NoteableRef::Issue(id) The database schema (migration 002) was forward-thinking and already supports both issue_id and merge_request_id as nullable columns with a CHECK constraint. This change prepares the application layer for CP2 merge request support without requiring any migrations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 17:00:49 -05:00
teernisse	fbdfd8f4cb	beads	2026-01-26 11:34:04 -05:00
Taylor Eernisse	f53645790a	test: Add comprehensive test suite with fixtures Establishes testing infrastructure for reliable development. tests/fixtures/ - GitLab API response samples: - gitlab_issue.json: Single issue with full metadata - gitlab_issues_page.json: Paginated issue list response - gitlab_discussion.json: Discussion thread with notes - gitlab_discussions_page.json: Paginated discussions response All fixtures captured from real GitLab API responses with sensitive data redacted, ensuring tests match actual behavior. tests/gitlab_types_tests.rs - Type deserialization tests: - Validates serde parsing of all GitLab API types - Tests edge cases: null fields, empty arrays, nested objects - Ensures GitLabIssue, GitLabDiscussion, GitLabNote parse correctly - Verifies optional fields handle missing data gracefully - Tests author/assignee extraction from various formats tests/fixture_tests.rs - Integration with fixtures: - Loads fixture files and validates parsing - Tests transformer functions produce correct database rows - Verifies IssueWithMetadata extracts labels and assignees - Tests NormalizedDiscussion/NormalizedNote structure - Validates raw payload preservation logic tests/migration_tests.rs - Database schema tests: - Creates in-memory SQLite for isolation - Runs all migrations and verifies schema - Tests table creation with expected columns - Validates foreign key constraints - Tests index creation for query performance - Verifies idempotent migration behavior Test infrastructure uses: - tempfile for isolated database instances - wiremock for HTTP mocking (available for future API tests) - Standard Rust #[test] attributes Run with: cargo test Run single: cargo test test_name Run with output: cargo test -- --nocapture Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:29:06 -05:00
Taylor Eernisse	8fb890c528	feat(cli): Implement complete command-line interface Provides a user-friendly CLI for all GitLab Inbox operations. src/cli/mod.rs - Clap command definitions: - Global --config flag for alternate config path - Subcommands: init, auth-test, doctor, version, backup, reset, migrate, sync-status, ingest, list, count, show - Ingest supports --type (issues/merge_requests), --project filter, --force lock override, --full resync - List supports rich filtering: --state, --author, --assignee, --label, --milestone, --since, --due-before, --has-due-date - List supports --sort (updated/created/iid), --order (asc/desc) - List supports --open to launch browser, --json for scripting src/cli/commands/ - Command implementations: init.rs: Interactive configuration wizard - Prompts for GitLab URL, token env var, projects to track - Creates config file and initializes database - Supports --force overwrite and --non-interactive mode auth_test.rs: Verify GitLab authentication - Calls /api/v4/user to validate token - Displays username and GitLab instance URL doctor.rs: Environment health check - Validates config file exists and parses correctly - Checks database connectivity and migration state - Verifies GitLab authentication - Reports token environment variable status - Supports --json output for CI integration ingest.rs: Data synchronization from GitLab - Acquires sync lock with stale detection - Shows progress bars for issues and discussions - Reports sync statistics on completion - Supports --full flag to reset cursors and refetch all data list.rs: Query local database - Formatted table output with comfy-table - Filters build dynamic SQL with parameterized queries - Username filters normalize @ prefix automatically - --open flag uses 'open' crate for cross-platform browser launch - --json outputs array of issue objects show.rs: Detailed entity view - Displays issue metadata in structured format - Shows full description with markdown - Lists labels, assignees, milestone - Shows discussion threads with notes count.rs: Entity statistics - Counts issues, discussions, or notes - Supports --type filter for discussions/notes sync_status.rs: Display sync watermarks - Shows last sync time per project - Displays cursor positions for debugging src/main.rs - Application entry point: - Initializes tracing subscriber with env-filter - Parses CLI arguments via clap - Dispatches to appropriate command handler - Consistent error formatting for all failure modes src/lib.rs - Library entry point: - Exports cli, core, gitlab, ingestion modules - Re-exports Config, GiError, Result for convenience Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:28:52 -05:00
Taylor Eernisse	cd60350c6d	feat(ingestion): Implement cursor-based incremental sync from GitLab Provides efficient data synchronization with minimal API calls. src/ingestion/issues.rs - Issue sync logic: - Cursor-based incremental sync using updated_at timestamp - Fetches only issues modified since last sync - Configurable cursor rewind for overlap safety (default 2s) - Batched database writes with transaction wrapping - Upserts issues, labels, milestones, and assignees - Maintains issue_labels and issue_assignees junction tables - Returns IngestIssuesResult with counts and issues needing discussion sync - Identifies issues where discussion count changed src/ingestion/discussions.rs - Discussion sync logic: - Fetches discussions for issues that need sync - Compares discussion count vs stored to detect changes - Batched note insertion with raw payload preservation - Updates discussion metadata (resolved state, note counts) - Tracks sync state per discussion to enable incremental updates - Returns IngestDiscussionsResult with fetched/skipped counts src/ingestion/orchestrator.rs - Sync coordination: - Two-phase sync: issues first, then discussions - Progress callback support for CLI progress bars - ProgressEvent enum for fine-grained status updates: - IssueFetch, IssueProcess, DiscussionFetch, DiscussionSkip - Acquires sync lock before starting - Updates sync watermark on successful completion - Handles partial failures gracefully (watermark not updated) - Returns IngestProjectResult with detailed statistics The architecture supports future additions: - Merge request ingestion (parallel to issues) - Full-text search indexing hooks - Vector embedding pipeline integration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:28:34 -05:00
Taylor Eernisse	dd5eb04953	feat(gitlab): Implement GitLab REST API client and type definitions Provides a typed interface to the GitLab API with pagination support. src/gitlab/types.rs - API response type definitions: - GitLabIssue: Full issue payload with author, assignees, labels - GitLabDiscussion: Discussion thread with notes array - GitLabNote: Individual note with author, timestamps, body - GitLabAuthor/GitLabUser: User information with avatar URLs - GitLabProject: Project metadata from /api/v4/projects - GitLabVersion: GitLab instance version from /api/v4/version - GitLabNotePosition: Line-level position for diff notes - All types derive Deserialize for JSON parsing src/gitlab/client.rs - HTTP client with authentication: - Bearer token authentication from config - Base URL configuration for self-hosted instances - Paginated iteration via keyset or offset pagination - Automatic Link header parsing for next page URLs - Per-page limit control (default 100) - Methods: get_user(), get_version(), get_project() - Async stream for issues: list_issues_paginated() - Async stream for discussions: list_issue_discussions_paginated() - Respects GitLab rate limiting via response headers src/gitlab/transformers/ - API to database mapping: transformers/issue.rs - Issue transformation: - Maps GitLabIssue to IssueRow for database insert - Extracts milestone ID and due date - Normalizes author/assignee usernames - Preserves label IDs for junction table - Returns IssueWithMetadata including label/assignee lists transformers/discussion.rs - Discussion transformation: - Maps GitLabDiscussion to NormalizedDiscussion - Extracts thread metadata (resolvable, resolved) - Flattens notes to NormalizedNote with foreign keys - Handles system notes vs user notes - Preserves note position for diff discussions transformers/mod.rs - Re-exports all transformer types Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:28:21 -05:00
Taylor Eernisse	7aaa51f645	feat(core): Implement infrastructure layer for CLI operations Establishes foundational modules that all other components depend on. src/core/config.rs - Configuration management: - JSON-based config file with Zod-like validation via serde - GitLab settings: base URL, token environment variable - Project list with paths to track - Sync settings: backfill days, stale lock timeout, cursor rewind - Storage settings: database path, payload compression toggle - XDG-compliant config path resolution via dirs crate - Loads GITLAB_TOKEN from configured environment variable src/core/db.rs - Database connection and migrations: - Opens or creates SQLite database with WAL mode for concurrency - Embeds migration SQL as const strings (001-005) - Runs migrations idempotently with checksum verification - Provides thread-safe connection management src/core/error.rs - Unified error handling: - GiError enum with variants for all failure modes - Config, Database, GitLab, Ingestion, Lock, IO, Parse errors - thiserror derive for automatic Display/Error impls - Result type alias for ergonomic error propagation src/core/lock.rs - Distributed sync locking: - File-based locks to prevent concurrent syncs - Stale lock detection with configurable timeout - Force override for recovery scenarios - Lock file contains PID and timestamp for debugging src/core/paths.rs - Path resolution: - XDG Base Directory Specification compliance - Config: ~/.config/gi/config.json - Data: ~/.local/share/gi/gi.db - Creates parent directories on first access src/core/payloads.rs - Raw payload storage: - Optional gzip compression for storage efficiency - SHA-256 content addressing for deduplication - Type-prefixed keys (issue:, discussion:, note:) - Batch insert with UPSERT for idempotent ingestion src/core/time.rs - Timestamp utilities: - Relative time parsing (7d, 2w, 1m) for --since flag - ISO 8601 date parsing for absolute dates - Human-friendly relative time formatting Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:28:07 -05:00
Taylor Eernisse	d15f457a58	feat(db): Add SQLite database migrations for GitLab data model Implements a comprehensive relational schema for storing GitLab data with full audit trail and raw payload preservation. Migration 001_initial.sql establishes core metadata tables: - projects: Tracked GitLab projects with paths and namespace - sync_watermarks: Cursor-based incremental sync state per project - schema_migrations: Migration tracking with checksums for integrity Migration 002_issues.sql creates the issues data model: - issues: Core issue data with timestamps, author, state, counts - labels: Project-specific label definitions with colors/descriptions - issue_labels: Many-to-many junction for issue-label relationships - milestones: Project milestones with state and due dates - discussions: Threaded discussions linked to issues/MRs - notes: Individual notes within discussions with full metadata - raw_payloads: Compressed original API responses keyed by entity Migration 003_indexes.sql adds performance indexes: - Covering indexes for common query patterns (state, updated_at) - Composite indexes for filtered queries (project + state) Migration 004_discussions_payload.sql extends discussions: - Adds raw_payload column for discussion-level API preservation - Enables debugging and data recovery from original responses Migration 005_assignees_milestone_duedate.sql completes the model: - issue_assignees: Many-to-many for multiple assignees per issue - Adds milestone_id, due_date columns to issues table - Indexes for assignee and milestone filtering Schema supports both incremental sync and full historical queries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:27:51 -05:00
Taylor Eernisse	986bc59f6a	docs: Add comprehensive documentation and planning artifacts README.md provides complete user documentation: - Installation via cargo install or build from source - Quick start guide with example commands - Configuration file format with all options documented - Full command reference for init, auth-test, doctor, ingest, list, show, count, sync-status, migrate, and version - Database schema overview covering projects, issues, milestones, assignees, labels, discussions, notes, and raw payloads - Development setup with test, lint, and debug commands SPEC.md updated from original TypeScript planning document: - Added note clarifying this is historical (implementation uses Rust) - Updated sqlite-vss references to sqlite-vec (deprecated library) - Added architecture overview with Technology Choices rationale - Expanded project structure showing all planned modules docs/prd/ contains detailed checkpoint planning: - checkpoint-0.md: Initial project vision and requirements - checkpoint-1.md: Revised planning after technology decisions These documents capture the evolution from initial concept through the decision to use Rust for performance and type safety. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:27:40 -05:00
Taylor Eernisse	e065862f81	feat: Initialize Rust project with dependencies and tooling Set up the GitLab Inbox (gi) CLI tool as a Rust 2024 edition project. Dependencies organized by purpose: - Database: rusqlite (bundled SQLite), sqlite-vec for vector search - Serialization: serde/serde_json for GitLab API responses - CLI: clap for argument parsing, dialoguer for interactive prompts, comfy-table for formatted output, indicatif for progress bars - HTTP: reqwest with tokio async runtime for GitLab API calls - Async: async-stream and futures for paginated API iteration - Utilities: thiserror for error types, chrono for timestamps, flate2 for payload compression, sha2 for content hashing - Logging: tracing with env-filter for structured debug output Release profile optimized for small binary size (LTO, strip symbols). Project structure follows standard Rust conventions with src/lib.rs exposing modules and src/main.rs as CLI entry point. Added .gitignore for Rust/Cargo artifacts and local database files. Added AGENTS.md with TDD workflow guidance and beads issue tracking integration instructions for AI-assisted development. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:27:30 -05:00
teernisse	e846a39ce6	More planning	2026-01-23 15:31:52 -05:00
teernisse	1f36fe6a21	more planning	2026-01-21 15:56:11 -05:00
teernisse	97a303eca9	Spec iterations	2026-01-20 16:43:39 -05:00
teernisse	7702d2a493	initial	2026-01-20 13:11:40 -05:00
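
The two audit fixes in the log above (d9d749ac57 for the polymorphic discussion parent, 4abbe2a226 for the watermark reset) can be sketched in plain Rust. This is a minimal illustration, not the repository's code: `NoteableRef` and its variants come from the commit message, while `DiscussionParent`, `parent_columns`, and `needs_discussion_sync` are hypothetical names, and the `"MergeRequest"` string is assumed to match GitLab's `noteable_type` value.

```rust
/// Which kind of entity a discussion belongs to.
/// Variant names follow commit d9d749ac57; the rest of this sketch is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NoteableRef {
    Issue(i64),
    MergeRequest(i64),
}

/// Hypothetical, trimmed-down stand-in for the parent columns of NormalizedDiscussion.
#[derive(Debug, PartialEq, Eq)]
pub struct DiscussionParent {
    pub issue_id: Option<i64>,
    pub merge_request_id: Option<i64>,
    pub noteable_type: &'static str,
}

/// Derive the nullable ID columns and noteable_type from the enum variant,
/// so an issue ID can never be stored in the merge request column or vice versa.
pub fn parent_columns(noteable: NoteableRef) -> DiscussionParent {
    match noteable {
        NoteableRef::Issue(id) => DiscussionParent {
            issue_id: Some(id),
            merge_request_id: None,
            noteable_type: "Issue",
        },
        NoteableRef::MergeRequest(id) => DiscussionParent {
            issue_id: None,
            merge_request_id: Some(id),
            noteable_type: "MergeRequest", // assumed value
        },
    }
}

/// Mirrors the SQL check quoted in commit 4abbe2a226:
/// `updated_at > COALESCE(discussions_synced_for_updated_at, 0)`.
/// `None` models a NULL watermark, which COALESCE turns into 0.
pub fn needs_discussion_sync(updated_at: i64, watermark: Option<i64>) -> bool {
    updated_at > watermark.unwrap_or(0)
}
```

With a watermark equal to the issue's `updated_at`, `needs_discussion_sync` returns false, which is the skipped-discussion state the commit describes. Resetting the watermark to `None` before clearing cursors makes every issue with a positive `updated_at` eligible again.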

199 Commits