Taylor Eernisse d5bdb24b0f feat(search): Add hybrid search engine with FTS5, vector, and RRF fusion
Implements the search module providing three search modes:

- Lexical (FTS5): Full-text search using SQLite FTS5 with safe query
  sanitization. User queries are automatically tokenized and wrapped
  in proper FTS5 syntax. Supports a "raw" mode for power users who
  want direct FTS5 query syntax (NEAR, column filters, etc.).

- Semantic (vector): Embeds the search query via Ollama, then performs
  cosine similarity search against stored document embeddings. Results
  are deduplicated by doc_id since documents may have multiple chunks.

- Hybrid (default): Executes both lexical and semantic searches in
  parallel, then fuses results using Reciprocal Rank Fusion (RRF) with
  k=60. This avoids the complexity of score normalization while
  producing high-quality merged rankings. Gracefully degrades to
  lexical-only when embeddings are unavailable.
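
The query sanitization described for lexical mode could look roughly like the sketch below. It is an illustration under stated assumptions, not the module's actual code: the function names `sanitize_fts5_query` and `build_match_expr` are hypothetical, and the approach shown (quote every whitespace-separated token, double any embedded quote) is one common way to neutralize FTS5 operators.

```rust
/// Turn free-form user input into a conservative FTS5 MATCH expression.
/// Every whitespace-separated token becomes a quoted FTS5 string, so
/// operators such as NEAR, OR, `*`, or `column:` are treated as plain text.
/// (Hypothetical helper; the real module may tokenize differently.)
fn sanitize_fts5_query(input: &str) -> String {
    input
        .split_whitespace()
        // Inside an FTS5 string, a literal double quote is written as "".
        .map(|token| format!("\"{}\"", token.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        // Space-separated phrases are implicitly ANDed by FTS5.
        .join(" ")
}

/// Hypothetical switch between the sanitized mode and the "raw" mode,
/// where the user's FTS5 syntax is passed through untouched.
fn build_match_expr(input: &str, raw: bool) -> String {
    if raw {
        input.to_string()
    } else {
        sanitize_fts5_query(input)
    }
}
```

An empty or whitespace-only input yields an empty string here, so a caller would likely need to skip the MATCH clause in that case.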

Additional components:

- search::filters: Post-retrieval filtering by source_type, author,
  project, labels (AND logic), file path prefix, created_after, and
  updated_after. Date filters accept relative formats (7d, 2w) and
  ISO dates.

- search::rrf: Reciprocal Rank Fusion implementation with configurable
  k parameter and optional explain mode that annotates each result
  with its component ranks and fusion score breakdown.
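
For reference, Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the lists in which it appears. The sketch below shows that computation for two ranked lists of document IDs. The `Fused` struct and `rrf_fuse` function are hypothetical names, ranks are assumed to be 1-based, and ties are broken by `doc_id` purely to keep the output deterministic.

```rust
use std::collections::HashMap;

/// One fused result, carrying the per-list ranks an explain mode could print.
#[derive(Debug, Clone, PartialEq)]
struct Fused {
    doc_id: i64,
    score: f64,
    lexical_rank: Option<usize>,  // 1-based rank in the lexical list, if present
    semantic_rank: Option<usize>, // 1-based rank in the semantic list, if present
}

/// Fuse two ranked lists (best first) with score = sum of 1 / (k + rank).
fn rrf_fuse(lexical: &[i64], semantic: &[i64], k: f64) -> Vec<Fused> {
    let mut by_id: HashMap<i64, Fused> = HashMap::new();
    for (is_lexical, list) in [(true, lexical), (false, semantic)] {
        for (index, &doc_id) in list.iter().enumerate() {
            let rank = index + 1;
            let entry = by_id.entry(doc_id).or_insert(Fused {
                doc_id,
                score: 0.0,
                lexical_rank: None,
                semantic_rank: None,
            });
            let slot = if is_lexical {
                &mut entry.lexical_rank
            } else {
                &mut entry.semantic_rank
            };
            // Count only the first (best) occurrence of a document per list.
            if slot.is_none() {
                *slot = Some(rank);
                entry.score += 1.0 / (k + rank as f64);
            }
        }
    }
    let mut fused: Vec<Fused> = by_id.into_values().collect();
    // Highest score first; doc_id breaks ties deterministically.
    fused.sort_by(|a, b| b.score.total_cmp(&a.score).then(a.doc_id.cmp(&b.doc_id)));
    fused
}
```

With k = 60, a document ranked first in one list and second in the other scores 1/61 + 1/62, which beats any document that appears in only one list. Because only ranks are used, no score normalization between the lexical and vector results is needed.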

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:46:42 -05:00

Gitlore

Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying and filtering.

Features

  • Local-first: All data stored in SQLite for instant queries
  • Incremental sync: Cursor-based sync only fetches changes since last sync
  • Full re-sync: Reset cursors and fetch all data from scratch when needed
  • Multi-project: Track issues and MRs across multiple GitLab projects
  • Rich filtering: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
  • Raw payload storage: Preserves original GitLab API responses for debugging
  • Discussion threading: Full support for issue and MR discussions including inline code review comments

Installation

cargo install --path .

Or build from source:

cargo build --release
./target/release/lore --help

Quick Start

# Initialize configuration (interactive)
lore init

# Verify authentication
lore auth-test

# Sync issues from GitLab
lore ingest --type issues

# Sync merge requests from GitLab
lore ingest --type mrs

# List recent issues
lore list issues --limit 10

# List open merge requests
lore list mrs --state opened

# Show issue details
lore show issue 123 --project group/repo

# Show MR details with discussions
lore show mr 456 --project group/repo

Configuration

Configuration is stored in ~/.config/lore/config.json (or $XDG_CONFIG_HOME/lore/config.json).

Example Configuration

{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" },
    { "path": "other-group/other-project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10,
    "heartbeatIntervalSeconds": 30,
    "cursorRewindSeconds": 2,
    "primaryConcurrency": 4,
    "dependentConcurrency": 2
  },
  "storage": {
    "compressRawPayloads": true
  }
}

Configuration Options

| Section | Field | Default | Description |
| --- | --- | --- | --- |
| gitlab | baseUrl | | GitLab instance URL (required) |
| gitlab | tokenEnvVar | GITLAB_TOKEN | Environment variable containing API token |
| projects | path | | Project path (e.g., group/project) |
| sync | backfillDays | 14 | Days to backfill on initial sync |
| sync | staleLockMinutes | 10 | Minutes before sync lock considered stale |
| sync | heartbeatIntervalSeconds | 30 | Frequency of lock heartbeat updates |
| sync | cursorRewindSeconds | 2 | Seconds to rewind cursor for overlap safety |
| sync | primaryConcurrency | 4 | Concurrent GitLab requests for primary resources |
| sync | dependentConcurrency | 2 | Concurrent requests for dependent resources |
| storage | dbPath | ~/.local/share/lore/lore.db | Database file path |
| storage | backupDir | ~/.local/share/lore/backups | Backup directory |
| storage | compressRawPayloads | true | Compress stored API responses with gzip |
| embedding | provider | ollama | Embedding provider |
| embedding | model | nomic-embed-text | Model name for embeddings |
| embedding | baseUrl | http://localhost:11434 | Ollama server URL |
| embedding | concurrency | 4 | Concurrent embedding requests |
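
To illustrate how backfillDays and cursorRewindSeconds might interact, here is a minimal sketch of computing the lower bound for the next fetch and advancing the cursor afterwards. The function names, the use of Unix timestamps, and the exact arithmetic are assumptions for illustration; the actual sync code may differ.

```rust
/// Lower bound (Unix seconds) for the next fetch of "updated after" data.
/// With a stored cursor, rewind it slightly so items updated within the
/// same second as the cursor are not missed. Without a cursor (first
/// sync), go back `backfill_days` from now. (Hypothetical helper.)
fn next_fetch_lower_bound(
    cursor_unix: Option<i64>,
    now_unix: i64,
    backfill_days: i64,
    rewind_seconds: i64,
) -> i64 {
    match cursor_unix {
        Some(cursor) => cursor - rewind_seconds,
        None => now_unix - backfill_days * 86_400,
    }
}

/// New cursor after a sync pass: the largest `updated_at` seen so far.
/// The cursor never moves backwards, even if the pass returned nothing.
fn advance_cursor(current: Option<i64>, seen_updated_at: &[i64]) -> Option<i64> {
    seen_updated_at.iter().copied().chain(current).max()
}
```

Because the rewind deliberately re-fetches a small overlap, this scheme only works if writes to the local database are idempotent, for example as upserts keyed on the GitLab ID.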

Config File Resolution

The config file is resolved in this order:

  1. --config CLI flag
  2. LORE_CONFIG_PATH environment variable
  3. ~/.config/lore/config.json (XDG default)
  4. ./lore.config.json (local fallback for development)
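
The resolution order above can be sketched as a single function. This is an illustration, not the project's code: `resolve_config_path` is a hypothetical name, the environment values are passed in as parameters to keep the example testable, and it assumes that explicit paths (flag or LORE_CONFIG_PATH) win without an existence check while the two default locations are only used if the file exists.

```rust
use std::path::{Path, PathBuf};

/// Resolve the config file location in priority order:
/// CLI flag, LORE_CONFIG_PATH, XDG config dir, then ./lore.config.json.
/// `exists` is injected so the logic can be exercised without touching disk.
fn resolve_config_path(
    cli_flag: Option<&str>,
    env_override: Option<&str>,    // value of LORE_CONFIG_PATH, if set
    xdg_config_home: Option<&str>, // value of XDG_CONFIG_HOME, if set
    home: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // 1 and 2: explicit choices are returned as given (assumption: no existence check).
    if let Some(path) = cli_flag {
        return Some(PathBuf::from(path));
    }
    if let Some(path) = env_override {
        return Some(PathBuf::from(path));
    }
    // 3: XDG location, falling back to ~/.config when XDG_CONFIG_HOME is unset.
    let base = xdg_config_home
        .map(PathBuf::from)
        .unwrap_or_else(|| Path::new(home).join(".config"));
    let xdg_path = base.join("lore").join("config.json");
    if exists(&xdg_path) {
        return Some(xdg_path);
    }
    // 4: local fallback for development.
    let local = PathBuf::from("./lore.config.json");
    if exists(&local) {
        return Some(local);
    }
    None
}
```

In a real binary the caller would pass `std::env::var("LORE_CONFIG_PATH").ok().as_deref()` and `|p| p.exists()` for the last argument.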

GitLab Token

Create a personal access token with read_api scope:

  1. Go to GitLab → Settings → Access Tokens
  2. Create token with read_api scope
  3. Export it: export GITLAB_TOKEN=glpat-xxxxxxxxxxxx

Environment Variables

| Variable | Purpose | Required |
| --- | --- | --- |
| GITLAB_TOKEN | GitLab API authentication token (name configurable via gitlab.tokenEnvVar) | Yes |
| LORE_CONFIG_PATH | Override config file location | No |
| XDG_CONFIG_HOME | XDG Base Directory for config (fallback: ~/.config) | No |
| XDG_DATA_HOME | XDG Base Directory for data (fallback: ~/.local/share) | No |
| RUST_LOG | Logging level filter (e.g., lore=debug) | No |

Commands

lore init

Initialize configuration and database interactively.

lore init                    # Interactive setup
lore init --force            # Overwrite existing config
lore init --non-interactive  # Fail if prompts needed

lore auth-test

Verify GitLab authentication is working.

lore auth-test
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com

lore doctor

Check environment health and configuration.

lore doctor          # Human-readable output
lore doctor --json   # JSON output for scripting

Checks performed:

  • Config file existence and validity
  • Database existence and pragmas (WAL mode, foreign keys)
  • GitLab authentication
  • Project accessibility
  • Ollama connectivity (optional)

lore ingest

Sync data from GitLab to local database.

# Issues
lore ingest --type issues                       # Sync all projects
lore ingest --type issues --project group/repo  # Single project
lore ingest --type issues --force               # Override stale lock
lore ingest --type issues --full                # Full re-sync (reset cursors)

# Merge Requests
lore ingest --type mrs                          # Sync all projects
lore ingest --type mrs --project group/repo     # Single project
lore ingest --type mrs --full                   # Full re-sync (reset cursors)

The --full flag resets sync cursors and discussion watermarks, then fetches all data from scratch. This is useful when:

  • Assignee data or other fields were missing from earlier syncs
  • You want to ensure complete data after schema changes
  • Troubleshooting sync issues

lore list issues

Query issues from local database.

lore list issues                              # Recent issues (default 50)
lore list issues --limit 100                  # More results
lore list issues --state opened               # Only open issues
lore list issues --state closed               # Only closed issues
lore list issues --author username            # By author (@ prefix optional)
lore list issues --assignee username          # By assignee (@ prefix optional)
lore list issues --label bug                  # By label (AND logic)
lore list issues --label bug --label urgent   # Multiple labels
lore list issues --milestone "v1.0"           # By milestone title
lore list issues --since 7d                   # Updated in last 7 days
lore list issues --since 2w                   # Updated in last 2 weeks
lore list issues --since 2024-01-01           # Updated since date
lore list issues --due-before 2024-12-31      # Due before date
lore list issues --has-due-date               # Only issues with due dates
lore list issues --project group/repo         # Filter by project
lore list issues --sort created --order asc   # Sort options
lore list issues --open                       # Open first result in browser
lore list issues --json                       # JSON output

Output includes: IID, title, state, author, assignee, labels, and update time.
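
The --since values shown above come in two shapes: a relative window (7d, 2w) or a calendar date (2024-01-01). The following sketch shows one way such input could be parsed using only the standard library. The `Since` enum and `parse_since` function are hypothetical, only the `d` and `w` units from the examples are handled, and date validation is deliberately shallow (it does not check days per month).

```rust
#[derive(Debug, PartialEq)]
enum Since {
    /// Relative window, expressed as seconds before "now".
    SecondsAgo(u64),
    /// Calendar date as (year, month, day).
    Date(u32, u32, u32),
}

/// Parse a non-empty run of ASCII digits; rejects signs and whitespace.
fn parse_count(digits: &str) -> Option<u64> {
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok()
}

/// Accepts "<n>d", "<n>w", or "YYYY-MM-DD"; anything else yields None.
fn parse_since(input: &str) -> Option<Since> {
    let s = input.trim();
    // Relative form: digits followed by a unit letter.
    for (suffix, unit_seconds) in [('d', 86_400u64), ('w', 7 * 86_400)] {
        if let Some(digits) = s.strip_suffix(suffix) {
            return parse_count(digits)?
                .checked_mul(unit_seconds)
                .map(Since::SecondsAgo);
        }
    }
    // Absolute form: exactly three dash-separated numeric fields.
    let parts: Vec<&str> = s.split('-').collect();
    if let [y, m, d] = parts.as_slice() {
        if y.len() == 4 && m.len() == 2 && d.len() == 2 {
            let (y, m, d) = (
                parse_count(y)? as u32,
                parse_count(m)? as u32,
                parse_count(d)? as u32,
            );
            if (1..=12).contains(&m) && (1..=31).contains(&d) {
                return Some(Since::Date(y, m, d));
            }
        }
    }
    None
}
```

A caller would then turn either variant into a timestamp cutoff and compare it against each issue's update time.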

lore list mrs

Query merge requests from local database.

lore list mrs                                 # Recent MRs (default 50)
lore list mrs --limit 100                     # More results
lore list mrs --state opened                  # Only open MRs
lore list mrs --state merged                  # Only merged MRs
lore list mrs --state closed                  # Only closed MRs
lore list mrs --state locked                  # Only locked MRs
lore list mrs --state all                     # All states
lore list mrs --author username               # By author (@ prefix optional)
lore list mrs --assignee username             # By assignee (@ prefix optional)
lore list mrs --reviewer username             # By reviewer (@ prefix optional)
lore list mrs --draft                         # Only draft/WIP MRs
lore list mrs --no-draft                      # Exclude draft MRs
lore list mrs --target-branch main            # By target branch
lore list mrs --source-branch feature/foo     # By source branch
lore list mrs --label needs-review            # By label (AND logic)
lore list mrs --since 7d                      # Updated in last 7 days
lore list mrs --project group/repo            # Filter by project
lore list mrs --sort created --order asc      # Sort options
lore list mrs --open                          # Open first result in browser
lore list mrs --json                          # JSON output

Output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.

lore show issue

Display detailed issue information.

lore show issue 123                      # Show issue #123
lore show issue 123 --project group/repo # Disambiguate if needed

Shows: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.

lore show mr

Display detailed merge request information.

lore show mr 456                         # Show MR !456
lore show mr 456 --project group/repo    # Disambiguate if needed

Shows: title, description, state, draft status, author, assignees, reviewers, labels, source/target branches, merge status, web URL, and threaded discussions. Inline code review comments (DiffNotes) display file context in the format [src/file.ts:45].

lore count

Count entities in local database.

lore count issues                    # Total issues
lore count mrs                       # Total MRs (with state breakdown)
lore count discussions               # Total discussions
lore count discussions --type issue  # Issue discussions only
lore count discussions --type mr     # MR discussions only
lore count notes                     # Total notes (shows system vs user breakdown)

lore sync-status

Show current sync state and watermarks.

lore sync-status

Displays:

  • Last sync run details (status, timing)
  • Cursor positions per project and resource type (issues and MRs)
  • Data summary counts

lore migrate

Run pending database migrations.

lore migrate

Shows current schema version and applies any pending migrations.

lore version

Show version information.

lore version

lore backup

Create timestamped database backup.

lore backup

Note: Not yet implemented.

lore reset

Delete database and reset all state.

lore reset --confirm

Note: Not yet implemented.

Database Schema

Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:

| Table | Purpose |
| --- | --- |
| projects | Tracked GitLab projects with metadata |
| issues | Issue metadata (title, state, author, due date, milestone) |
| merge_requests | MR metadata (title, state, draft, branches, merge status) |
| milestones | Project milestones with state and due dates |
| labels | Project labels with colors |
| issue_labels | Many-to-many issue-label relationships |
| issue_assignees | Many-to-many issue-assignee relationships |
| mr_labels | Many-to-many MR-label relationships |
| mr_assignees | Many-to-many MR-assignee relationships |
| mr_reviewers | Many-to-many MR-reviewer relationships |
| discussions | Issue/MR discussion threads |
| notes | Individual notes within discussions (with system note flag and DiffNote position data) |
| sync_runs | Audit trail of sync operations |
| sync_cursors | Cursor positions for incremental sync |
| app_locks | Crash-safe single-flight lock |
| raw_payloads | Compressed original API responses |
| schema_version | Migration version tracking |

By default the database is stored at ~/.local/share/lore/lore.db, following the XDG Base Directory convention; set storage.dbPath in the config to use a different location.
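
The app_locks table, together with the staleLockMinutes and heartbeatIntervalSeconds settings, suggests a heartbeat-based lock: a running sync refreshes a timestamp periodically, and a lock whose heartbeat is too old is treated as abandoned. The sketch below shows that decision in isolation. The `LockDecision` enum and `decide_lock` function are hypothetical, and it assumes that `--force` takes over an existing lock regardless of its age, which the documentation does not spell out.

```rust
#[derive(Debug, PartialEq)]
enum LockDecision {
    /// No lock row exists; take the lock.
    Acquire,
    /// A lock exists but is stale (or --force was given); replace it.
    TakeOverStale,
    /// Another sync appears to be alive; back off.
    Busy,
}

/// Decide what to do given the last heartbeat (Unix seconds) of any
/// existing lock. (Illustrative only; the real check presumably runs
/// inside a SQLite transaction so that two processes cannot both win.)
fn decide_lock(
    heartbeat_unix: Option<i64>,
    now_unix: i64,
    stale_lock_minutes: i64,
    force: bool,
) -> LockDecision {
    match heartbeat_unix {
        None => LockDecision::Acquire,
        Some(heartbeat) if force || now_unix - heartbeat >= stale_lock_minutes * 60 => {
            LockDecision::TakeOverStale
        }
        Some(_) => LockDecision::Busy,
    }
}
```

With the defaults (heartbeat every 30 seconds, stale after 10 minutes), a live sync would have to miss about twenty consecutive heartbeats before another process considers its lock abandoned.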

Global Options

lore --config /path/to/config.json <command>  # Use alternate config

Development

# Run tests
cargo test

# Run with debug logging
RUST_LOG=lore=debug lore list issues

# Run with trace logging
RUST_LOG=lore=trace lore ingest --type issues

# Check formatting
cargo fmt --check

# Lint
cargo clippy

Tech Stack

  • Rust (2024 edition)
  • SQLite via rusqlite (bundled)
  • clap for CLI parsing
  • reqwest for HTTP
  • tokio for async runtime
  • serde for serialization
  • tracing for logging
  • indicatif for progress bars

Current Status

This is Checkpoint 2 (CP2) of the Gitlore project. Currently implemented:

  • Issue ingestion with cursor-based incremental sync
  • Merge request ingestion with cursor-based incremental sync
  • Discussion and note syncing for issues and MRs
  • DiffNote support for inline code review comments
  • Rich filtering and querying for both issues and MRs
  • Full re-sync capability with watermark reset

Not yet implemented:

  • Semantic search with embeddings (CP3+)
  • Backup and reset commands

See SPEC.md for the full project roadmap and architecture.

License

MIT
