teernisse 86a51cddef fix: Project-scoped job claiming, structured rate-limit logging, RRF total_cmp
Targeted fixes across multiple subsystems:

dependent_queue:
- Add project_id parameter to claim_jobs() for project-scoped job claiming,
  preventing cross-project job theft during concurrent multi-project ingestion
- Add project_id parameter to count_pending_jobs() with optional scoping
  (None returns global counts, Some(pid) returns per-project counts)

gitlab/client:
- Downgrade rate-limit log from warn to info (429s are expected operational
  behavior, not warnings) and add structured fields (path, status_code)
  for better log filtering and aggregation

gitlab/transformers/discussion:
- Add tracing::warn on invalid timestamp parse instead of silent fallback
  to epoch 0, making data quality issues visible in logs

ingestion/merge_requests:
- Remove duplicate doc comment on upsert_label_tx

search/rrf:
- Replace partial_cmp().unwrap_or() with total_cmp() for f64 sorting,
  eliminating the NaN edge case entirely (total_cmp treats NaN consistently)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:39:13 -05:00

Gitlore

Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, and hybrid search.

Features

  • Local-first: All data stored in SQLite for instant queries
  • Incremental sync: Cursor-based sync fetches only changes since the last sync
  • Full re-sync: Reset cursors and fetch all data from scratch when needed
  • Multi-project: Track issues and MRs across multiple GitLab projects
  • Rich filtering: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
  • Hybrid search: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
  • Raw payload storage: Preserves original GitLab API responses for debugging
  • Discussion threading: Full support for issue and MR discussions including inline code review comments
  • Robot mode: Machine-readable JSON output with structured errors and meaningful exit codes

Installation

cargo install --path .

Or build a release binary without installing it:

cargo build --release
./target/release/lore --help

Quick Start

# Initialize configuration (interactive)
lore init

# Verify authentication
lore auth

# Sync everything from GitLab (issues + MRs + docs + embeddings)
lore sync

# List recent issues
lore issues -n 10

# List open merge requests
lore mrs -s opened

# Show issue details
lore issues 123

# Show MR details with discussions
lore mrs 456

# Search across all indexed data
lore search "authentication bug"

# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .

Configuration

Configuration is stored in ~/.config/lore/config.json (or $XDG_CONFIG_HOME/lore/config.json).

Example Configuration

{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" },
    { "path": "other-group/other-project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10,
    "heartbeatIntervalSeconds": 30,
    "cursorRewindSeconds": 2,
    "primaryConcurrency": 4,
    "dependentConcurrency": 2
  },
  "storage": {
    "compressRawPayloads": true
  },
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "baseUrl": "http://localhost:11434",
    "concurrency": 4
  }
}

Configuration Options

Section    Field                     Default                      Description
gitlab     baseUrl                   --                           GitLab instance URL (required)
gitlab     tokenEnvVar               GITLAB_TOKEN                 Environment variable containing the API token
projects   path                      --                           Project path (e.g., group/project)
sync       backfillDays              14                           Days to backfill on initial sync
sync       staleLockMinutes          10                           Minutes before a sync lock is considered stale
sync       heartbeatIntervalSeconds  30                           Seconds between lock heartbeat updates
sync       cursorRewindSeconds       2                            Seconds to rewind the cursor for overlap safety
sync       primaryConcurrency        4                            Concurrent GitLab requests for primary resources
sync       dependentConcurrency      2                            Concurrent requests for dependent resources
storage    dbPath                    ~/.local/share/lore/lore.db  Database file path
storage    backupDir                 ~/.local/share/lore/backups  Backup directory
storage    compressRawPayloads       true                         Compress stored API responses with gzip
embedding  provider                  ollama                       Embedding provider
embedding  model                     nomic-embed-text             Model name for embeddings
embedding  baseUrl                   http://localhost:11434       Ollama server URL
embedding  concurrency               4                            Concurrent embedding requests
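
The cursorRewindSeconds option implies an overlap rule for incremental sync. The sketch below assumes the cursor is the latest updated_at (Unix seconds) seen for one project and resource type, that a missing cursor (first run, or after --full resets cursors) means "fetch everything", and that re-fetched items are upserted by ID so the overlap is harmless. It ignores backfillDays for simplicity, and both function names are hypothetical rather than lore's actual code.

```rust
/// Hypothetical: lower bound for the next `updated_after` query, or `None`
/// to fetch everything (no cursor yet, e.g. after `--full` resets cursors).
fn next_query_start(cursor: Option<i64>, cursor_rewind_seconds: i64) -> Option<i64> {
    // Step back a little so items updated in the same second as the saved
    // cursor are fetched again instead of being skipped.
    cursor.map(|last_seen| last_seen - cursor_rewind_seconds)
}

/// Hypothetical: new cursor after a batch of results has been stored.
/// It never moves backwards, even though the rewound window re-fetches older items.
fn advance_cursor(cursor: Option<i64>, fetched_updated_at: &[i64]) -> Option<i64> {
    // Option's ordering puts None below any Some, so max() keeps the newest value.
    fetched_updated_at.iter().copied().max().max(cursor)
}
```

The rewind trades a few duplicate fetches per run for never missing an item whose timestamp ties the cursor.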

Config File Resolution

The config file is resolved in this order:

  1. --config / -c CLI flag
  2. LORE_CONFIG_PATH environment variable
  3. $XDG_CONFIG_HOME/lore/config.json (XDG default; ~/.config/lore/config.json when XDG_CONFIG_HOME is unset)
  4. ./lore.config.json (local fallback for development)
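
The lookup order above can be sketched as a single function. This is an illustration, not lore's code: the function name is hypothetical, the environment and filesystem are passed in as closures so the precedence can be tested without touching the real machine, and it assumes an explicit --config or LORE_CONFIG_PATH value is used as given even if the file does not exist (so a typo surfaces as a config-not-found error instead of silently falling through).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the config file lookup order.
fn resolve_config_path(
    cli_flag: Option<&Path>,
    env_lookup: impl Fn(&str) -> Option<String>,
    home: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // 1. An explicit --config / -c path wins outright (assumed: no existence check).
    if let Some(path) = cli_flag {
        return Some(path.to_path_buf());
    }
    // 2. LORE_CONFIG_PATH, ignored when empty.
    if let Some(path) = env_lookup("LORE_CONFIG_PATH").filter(|s| !s.is_empty()) {
        return Some(PathBuf::from(path));
    }
    // 3. $XDG_CONFIG_HOME/lore/config.json, with XDG_CONFIG_HOME defaulting to ~/.config.
    let config_home = env_lookup("XDG_CONFIG_HOME")
        .filter(|s| !s.is_empty())
        .map(PathBuf::from)
        .unwrap_or_else(|| home.join(".config"));
    let xdg = config_home.join("lore").join("config.json");
    if exists(xdg.as_path()) {
        return Some(xdg);
    }
    // 4. ./lore.config.json in the working directory (development fallback).
    let local = PathBuf::from("./lore.config.json");
    if exists(local.as_path()) {
        return Some(local);
    }
    None
}
```

In a real binary, env_lookup would wrap std::env::var(name).ok() and exists would call Path::is_file.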

GitLab Token

Create a personal access token with read_api scope:

  1. In GitLab, open your user settings and go to Access Tokens
  2. Create token with read_api scope
  3. Export it: export GITLAB_TOKEN=glpat-xxxxxxxxxxxx

Environment Variables

Variable          Required  Purpose
GITLAB_TOKEN      Yes       GitLab API authentication token (name configurable via gitlab.tokenEnvVar)
LORE_CONFIG_PATH  No        Override config file location
LORE_ROBOT        No        Enable robot mode globally (set to true or 1)
XDG_CONFIG_HOME   No        XDG Base Directory for config (fallback: ~/.config)
XDG_DATA_HOME     No        XDG Base Directory for data (fallback: ~/.local/share)
NO_COLOR          No        Disable color output when set (any value)
CLICOLOR          No        Standard color control (0 to disable)
RUST_LOG          No        Logging level filter (e.g., lore=debug)

Commands

lore issues

Query issues from the local database, or show a specific issue.

lore issues                           # Recent issues (default 50)
lore issues 123                       # Show issue #123 with discussions
lore issues 123 -p group/repo         # Disambiguate by project
lore issues -n 100                    # More results
lore issues -s opened                 # Only open issues
lore issues -s closed                 # Only closed issues
lore issues -a username               # By author (@ prefix optional)
lore issues -A username               # By assignee (@ prefix optional)
lore issues -l bug                    # By label (AND logic)
lore issues -l bug -l urgent          # Multiple labels
lore issues -m "v1.0"                 # By milestone title
lore issues --since 7d                # Updated in last 7 days
lore issues --since 2w                # Updated in last 2 weeks
lore issues --since 1m                # Updated in last month
lore issues --since 2024-01-01        # Updated since date
lore issues --due-before 2024-12-31   # Due before date
lore issues --has-due                 # Only issues with due dates
lore issues -p group/repo             # Filter by project
lore issues --sort created --asc      # Sort by created date, ascending
lore issues -o                        # Open first result in browser

When listing, output includes: IID, title, state, author, assignee, labels, and update time.

When showing a single issue (e.g., lore issues 123), output includes: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.
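
The --since values above accept relative (7d, 2w, 1m) and absolute (YYYY-MM-DD) forms. The parser below is a hypothetical sketch that turns them into a Unix-seconds cutoff. It assumes a month means 30 days and that dates mean midnight UTC; lore's actual definitions are not stated here. The date arithmetic uses Howard Hinnant's days-from-civil algorithm, so no date crate is needed.

```rust
const DAY: i64 = 86_400;

/// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's days_from_civil).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // Jan/Feb count as the end of the prior year
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Hypothetical `--since` parser: cutoff as Unix seconds, or None if invalid.
fn parse_since(input: &str, now: i64) -> Option<i64> {
    // Relative form: a non-negative count followed by d (days), w (weeks) or m (months).
    let days_per_unit = match input.chars().last()? {
        'd' => Some(1),
        'w' => Some(7),
        'm' => Some(30), // assumption: a "month" is 30 days
        _ => None,
    };
    if let Some(days) = days_per_unit {
        let count = input[..input.len() - 1].parse::<u32>().ok()?;
        return Some(now - i64::from(count) * days * DAY);
    }
    // Absolute form: YYYY-MM-DD, taken as midnight UTC (assumption).
    let mut parts = input.splitn(3, '-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(days_from_civil(year, month, day) * DAY)
}
```

Calendar validity beyond the month and day ranges (for example 2024-02-31) is not checked in this sketch.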

Project Resolution

The -p / --project flag uses cascading match logic across all commands:

  1. Exact match: group/project
  2. Case-insensitive: Group/Project
  3. Suffix match: project matches group/project (if unambiguous)
  4. Substring match: typescript matches vs/typescript-code (if unambiguous)

If multiple projects match, an error lists the candidates with a hint to use the full path.
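
The cascade above can be sketched as follows: each tier is tried in order, and the first tier with any match decides, returning either the single match or the candidate list for an ambiguity error. The type and function names are hypothetical, and two details are assumptions: the suffix tier matches whole trailing path segments (so project matches group/project but not group/myproject), and tiers 3 and 4 compare case-insensitively.

```rust
/// Outcome of resolving a `-p` / `--project` argument (hypothetical type).
#[derive(Debug, PartialEq)]
enum Resolution<'a> {
    Found(&'a str),
    Ambiguous(Vec<&'a str>),
    NotFound,
}

/// Runs one tier; None means nothing matched, so the caller tries the next tier.
fn tier<'a>(projects: &[&'a str], pred: impl Fn(&str) -> bool) -> Option<Resolution<'a>> {
    let hits: Vec<&'a str> = projects.iter().copied().filter(|p| pred(*p)).collect();
    match hits.len() {
        0 => None,
        1 => Some(Resolution::Found(hits[0])),
        _ => Some(Resolution::Ambiguous(hits)),
    }
}

/// Hypothetical sketch of the cascading project match.
fn resolve_project<'a>(query: &str, projects: &[&'a str]) -> Resolution<'a> {
    let q = query.to_lowercase();
    let segment_suffix = format!("/{q}");
    tier(projects, |p| p == query) // 1. exact
        .or_else(|| tier(projects, |p| p.to_lowercase() == q)) // 2. case-insensitive
        .or_else(|| tier(projects, |p| p.to_lowercase().ends_with(&segment_suffix))) // 3. suffix
        .or_else(|| tier(projects, |p| p.to_lowercase().contains(&q))) // 4. substring
        .unwrap_or(Resolution::NotFound)
}
```

Because a tier only counts when it produces a match, a unique suffix match such as project wins even though the looser substring tier would have been ambiguous.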

lore mrs

Query merge requests from the local database, or show a specific MR.

lore mrs                              # Recent MRs (default 50)
lore mrs 456                          # Show MR !456 with discussions
lore mrs 456 -p group/repo            # Disambiguate by project
lore mrs -n 100                       # More results
lore mrs -s opened                    # Only open MRs
lore mrs -s merged                    # Only merged MRs
lore mrs -s closed                    # Only closed MRs
lore mrs -s locked                    # Only locked MRs
lore mrs -s all                       # All states
lore mrs -a username                  # By author (@ prefix optional)
lore mrs -A username                  # By assignee (@ prefix optional)
lore mrs -r username                  # By reviewer (@ prefix optional)
lore mrs -d                           # Only draft/WIP MRs
lore mrs -D                           # Exclude draft MRs
lore mrs --target main                # By target branch
lore mrs --source feature/foo         # By source branch
lore mrs -l needs-review              # By label (AND logic)
lore mrs --since 7d                   # Updated in last 7 days
lore mrs -p group/repo                # Filter by project
lore mrs --sort created --asc         # Sort by created date, ascending
lore mrs -o                           # Open first result in browser

When listing, output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.

When showing a single MR (e.g., lore mrs 456), output includes: title, description, state, draft status, author, assignees, reviewers, labels, source/target branches, merge status, web URL, and threaded discussions. Inline code review comments (DiffNotes) display file context in the format [src/file.ts:45].

lore search

Search across indexed documents using hybrid (lexical + semantic), lexical-only, or semantic-only modes.

lore search "authentication bug"              # Hybrid search (default)
lore search "login flow" --mode lexical       # FTS5 lexical only
lore search "login flow" --mode semantic      # Vector similarity only
lore search "auth" --type issue               # Filter by source type
lore search "auth" --type mr                  # MR documents only
lore search "auth" --type discussion          # Discussion documents only
lore search "deploy" --author username        # Filter by author
lore search "deploy" -p group/repo            # Filter by project
lore search "deploy" --label backend          # Filter by label (AND logic)
lore search "deploy" --path src/              # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d               # Created after (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w       # Updated after
lore search "deploy" -n 50                    # Limit results (default 20, max 100)
lore search "deploy" --explain                # Show ranking explanation per result
lore search "deploy" --fts-mode raw           # Raw FTS5 query syntax (advanced)
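
The --path rule (an exact match unless the filter ends in /, which makes it a directory prefix) can be sketched as below. Both functions are hypothetical illustrations: one checks a path in memory, the other shows how such a filter could become an SQLite LIKE pattern used with ESCAPE '\', escaping LIKE's own wildcards. Whether lore filters in SQL or in Rust is not stated here.

```rust
/// Hypothetical in-memory version of the `--path` rule.
fn path_matches(filter: &str, file_path: &str) -> bool {
    if filter.ends_with('/') {
        file_path.starts_with(filter) // directory prefix
    } else {
        file_path == filter // exact file
    }
}

/// Hypothetical SQL form: a pattern for `path LIKE ?1 ESCAPE '\'`.
fn path_like_pattern(filter: &str) -> String {
    let mut pattern = String::with_capacity(filter.len() + 1);
    for c in filter.chars() {
        // `%` and `_` are LIKE wildcards; escape them (and the escape char) so they match literally.
        if matches!(c, '%' | '_' | '\\') {
            pattern.push('\\');
        }
        pattern.push(c);
    }
    if filter.ends_with('/') {
        pattern.push('%'); // trailing slash: match anything below the directory
    }
    pattern
}
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so a real query that must match paths case-sensitively would need a different comparison, such as GLOB.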

Requires lore generate-docs (or lore sync) to have been run at least once. Semantic and hybrid modes require lore embed (or lore sync) to have generated vector embeddings via Ollama.
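
Hybrid mode fuses the lexical and semantic result lists with Reciprocal Rank Fusion (RRF): each document scores the sum of 1 / (k + rank) over the lists it appears in, with 1-based ranks. The sketch below uses k = 60, the constant from the original RRF paper; the value lore uses is not stated here, and the function name and u64 document IDs are hypothetical. Sorting uses f64::total_cmp, in line with the change described in the commit message at the top of this page, so a NaN score cannot make the comparator inconsistent.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over best-first lists of document IDs:
/// score(d) = sum over lists containing d of 1 / (k + rank), with rank starting at 1.
fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (index, &doc_id) in list.iter().enumerate() {
            let rank = (index + 1) as f64;
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first. total_cmp is a total order on f64 (NaN included);
    // ties fall back to the ID so the output order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because RRF only uses ranks, it needs no calibration between BM25-style lexical scores and cosine similarities; a document ranked well by both lists (IDs 1 and 3 in the test) beats one that only a single list found.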

lore sync

Run the full sync pipeline: ingest from GitLab, generate searchable documents, and compute embeddings.

lore sync                    # Full pipeline
lore sync --full             # Reset cursors, fetch everything
lore sync --force            # Override stale lock
lore sync --no-embed         # Skip embedding step
lore sync --no-docs          # Skip document regeneration
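
lore sync and lore ingest appear to share a single-flight lock (the app_locks table) that the holder refreshes every heartbeatIntervalSeconds. The check below is a hypothetical sketch of how staleness could be decided from staleLockMinutes. It assumes a lock whose last heartbeat is older than that window is treated as abandoned and can be taken over, and that --force skips the check entirely; the exact semantics of --force are not spelled out here.

```rust
/// Hypothetical staleness rule for the sync lock.
fn lock_is_stale(last_heartbeat: i64, now: i64, stale_lock_minutes: i64) -> bool {
    now - last_heartbeat > stale_lock_minutes * 60
}

/// Hypothetical acquire decision. `existing_heartbeat` is the current holder's
/// last heartbeat (Unix seconds), or None if nobody holds the lock.
fn can_acquire(existing_heartbeat: Option<i64>, now: i64, stale_lock_minutes: i64, force: bool) -> bool {
    match existing_heartbeat {
        None => true,             // lock is free
        Some(_) if force => true, // --force: take it regardless (assumption)
        // Otherwise only take over if the holder has stopped heartbeating.
        Some(heartbeat) => lock_is_stale(heartbeat, now, stale_lock_minutes),
    }
}
```

With the defaults (30-second heartbeat, 10-minute stale window) a live holder refreshes the lock about 20 times per window, so a healthy sync is not mistaken for a crashed one.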

lore ingest

Sync data from GitLab to the local database. Runs only the ingestion step (no doc generation or embeddings).

lore ingest                                    # Ingest everything (issues + MRs)
lore ingest issues                             # Issues only
lore ingest mrs                                # MRs only
lore ingest issues -p group/repo               # Single project
lore ingest --force                            # Override stale lock
lore ingest --full                             # Full re-sync (reset cursors)

The --full flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:

  • Assignee data or other fields were missing from earlier syncs
  • You want to ensure complete data after schema changes
  • Troubleshooting sync issues

lore generate-docs

Extract searchable documents from ingested issues, MRs, and discussions for the FTS5 index.

lore generate-docs                    # Incremental (dirty items only)
lore generate-docs --full             # Full rebuild
lore generate-docs -p group/repo      # Single project

lore embed

Generate vector embeddings for documents via Ollama. Requires Ollama running with the configured embedding model.

lore embed                    # Embed new/changed documents
lore embed --retry-failed     # Retry previously failed embeddings

lore count

Count entities in the local database.

lore count issues                     # Total issues
lore count mrs                        # Total MRs (with state breakdown)
lore count discussions                # Total discussions
lore count discussions --for issue    # Issue discussions only
lore count discussions --for mr       # MR discussions only
lore count notes                      # Total notes (system vs user breakdown)
lore count notes --for issue          # Issue notes only

lore stats

Show document and index statistics, with optional integrity checks.

lore stats                    # Document and index statistics
lore stats --check            # Run integrity checks
lore stats --check --repair   # Repair integrity issues

lore status

Show current sync state and watermarks.

lore status

Displays:

  • Last sync run details (status, timing)
  • Cursor positions per project and resource type (issues and MRs)
  • Data summary counts

lore init

Initialize configuration and database interactively.

lore init                    # Interactive setup
lore init --force            # Overwrite existing config
lore init --non-interactive  # Fail if prompts needed

lore auth

Verify GitLab authentication is working.

lore auth
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com

lore doctor

Check environment health and configuration.

lore doctor

Checks performed:

  • Config file existence and validity
  • Database existence and pragmas (WAL mode, foreign keys)
  • GitLab authentication
  • Project accessibility
  • Ollama connectivity (optional)

lore migrate

Run pending database migrations.

lore migrate

lore health

Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 1 if unhealthy.

lore health

Useful as a fast gate before running queries or syncs. For a more thorough check including authentication and project access, use lore doctor.

lore robot-docs

Machine-readable command manifest for agent self-discovery. Returns a JSON schema of all commands, flags, exit codes, and example workflows.

lore robot-docs                   # Pretty-printed JSON
lore --robot robot-docs           # Compact JSON for parsing

lore version

Show version information including the git commit hash.

lore version
# lore version 0.1.0 (abc1234)

Robot Mode

Machine-readable JSON output for scripting and AI agent consumption.

Activation

# Global flag
lore --robot issues -n 5

# JSON shorthand (-J)
lore -J issues -n 5

# Environment variable
LORE_ROBOT=1 lore issues -n 5

# Auto-detection (when stdout is not a TTY)
lore issues -n 5 | jq .
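
The four activation paths above can be combined into one decision. This hypothetical sketch treats them as a simple OR and accepts exactly "1" or "true" for LORE_ROBOT, as documented; whether lore also lets LORE_ROBOT=0 force human output when piped is not stated, so the sketch does not model it. The TTY check uses std::io::IsTerminal (stable since Rust 1.70).

```rust
use std::io::IsTerminal;

/// Hypothetical robot-mode decision combining the documented activation paths.
fn robot_mode(robot_flag: bool, json_flag: bool, lore_robot: Option<&str>, stdout_is_tty: bool) -> bool {
    let env_enabled = matches!(lore_robot, Some("1") | Some("true"));
    robot_flag || json_flag || env_enabled || !stdout_is_tty
}

/// Reads the real environment; the two flags would come from the CLI parser.
#[allow(dead_code)]
fn detect_robot_mode(robot_flag: bool, json_flag: bool) -> bool {
    let lore_robot = std::env::var("LORE_ROBOT").ok();
    robot_mode(robot_flag, json_flag, lore_robot.as_deref(), std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function (robot_mode) separate from the environment reads makes each activation path easy to test in isolation.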

Response Format

Successful commands write a consistent JSON envelope to stdout:

{"ok": true, "data": {...}, "meta": {...}}

Errors return structured JSON to stderr:

{"error": {"code": "CONFIG_NOT_FOUND", "message": "...", "suggestion": "Run 'lore init'"}}

Exit Codes

Code Meaning
0 Success
1 Internal error / health check failed / not implemented
2 Usage error (invalid flags or arguments)
3 Config invalid
4 Token not set
5 GitLab auth failed
6 Resource not found
7 Rate limited
8 Network error
9 Database locked
10 Database error
11 Migration failed
12 I/O error
13 Transform error
14 Ollama unavailable
15 Ollama model not found
16 Embedding failed
17 Not found (entity does not exist)
18 Ambiguous match (use -p to specify project)
20 Config not found
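
For scripts and agents written in Rust, the table maps naturally onto a fieldless enum with explicit discriminants. The enum and variant names below are hypothetical (only CONFIG_NOT_FOUND appears as a string code in this README), and the is_retryable grouping is an illustrative judgment, not documented lore behavior.

```rust
use std::process::ExitCode;

/// Hypothetical mirror of the exit-code table.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum LoreExit {
    Success = 0,
    Internal = 1, // internal error / health check failed / not implemented
    Usage = 2,
    ConfigInvalid = 3,
    TokenNotSet = 4,
    AuthFailed = 5,
    ResourceNotFound = 6,
    RateLimited = 7,
    Network = 8,
    DatabaseLocked = 9,
    Database = 10,
    MigrationFailed = 11,
    Io = 12,
    Transform = 13,
    OllamaUnavailable = 14,
    OllamaModelNotFound = 15,
    EmbeddingFailed = 16,
    NotFound = 17,  // entity does not exist
    Ambiguous = 18, // use -p to pick a project
    ConfigNotFound = 20,
}

impl LoreExit {
    fn code(self) -> u8 {
        self as u8
    }

    /// Illustrative grouping of failures a caller might retry after a delay.
    fn is_retryable(self) -> bool {
        matches!(
            self,
            LoreExit::RateLimited | LoreExit::Network | LoreExit::DatabaseLocked | LoreExit::OllamaUnavailable
        )
    }
}

impl From<LoreExit> for ExitCode {
    fn from(e: LoreExit) -> Self {
        ExitCode::from(e.code())
    }
}
```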

Configuration Precedence

Settings are resolved in this order (highest to lowest priority):

  1. CLI flags (--robot, --config, --color)
  2. Environment variables (LORE_ROBOT, GITLAB_TOKEN, LORE_CONFIG_PATH)
  3. Config file (~/.config/lore/config.json)
  4. Built-in defaults

Global Options

lore -c /path/to/config.json <command>   # Use alternate config
lore --robot <command>                    # Machine-readable JSON
lore -J <command>                         # JSON shorthand
lore --color never <command>              # Disable color output
lore --color always <command>             # Force color output
lore -q <command>                         # Suppress non-essential output

Color output respects NO_COLOR and CLICOLOR environment variables in auto mode (the default).
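
A sketch of the color decision follows. NO_COLOR disables color when set to any value and CLICOLOR=0 disables it, as documented above, and --color always or never overrides both. The enum and function names are hypothetical, and the requirement that auto mode also needs stdout to be a terminal is an assumption (common CLI behavior, not stated here).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColorChoice {
    Auto,
    Always,
    Never,
}

/// Hypothetical color decision for `--color`, NO_COLOR and CLICOLOR.
fn use_color(choice: ColorChoice, no_color: Option<&str>, clicolor: Option<&str>, stdout_is_tty: bool) -> bool {
    match choice {
        // CLI flags take precedence over environment variables.
        ColorChoice::Always => true,
        ColorChoice::Never => false,
        // Auto: NO_COLOR (any value) or CLICOLOR=0 turn color off; otherwise
        // color only when writing to a terminal (assumed).
        ColorChoice::Auto => no_color.is_none() && clicolor != Some("0") && stdout_is_tty,
    }
}
```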

Shell Completions

Generate shell completions for tab-completion support:

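# Bash (loaded on demand by the bash-completion package; no ~/.bashrc change needed)
mkdir -p ~/.local/share/bash-completion/completions
lore completions bash > ~/.local/share/bash-completion/completions/lore

# Zsh (in ~/.zshrc, set fpath=(~/.zfunc $fpath) before compinit runs)
mkdir -p ~/.zfunc
lore completions zsh > ~/.zfunc/_lore

# Fish
mkdir -p ~/.config/fish/completions
lore completions fish > ~/.config/fish/completions/lore.fish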

# PowerShell (add to $PROFILE)
lore completions powershell >> $PROFILE

Database Schema

Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:

Table Purpose
projects Tracked GitLab projects with metadata
issues Issue metadata (title, state, author, due date, milestone)
merge_requests MR metadata (title, state, draft, branches, merge status)
milestones Project milestones with state and due dates
labels Project labels with colors
issue_labels Many-to-many issue-label relationships
issue_assignees Many-to-many issue-assignee relationships
mr_labels Many-to-many MR-label relationships
mr_assignees Many-to-many MR-assignee relationships
mr_reviewers Many-to-many MR-reviewer relationships
discussions Issue/MR discussion threads
notes Individual notes within discussions (with system note flag and DiffNote position data)
documents Extracted searchable text for FTS and embedding
documents_fts FTS5 full-text search index
embeddings Vector embeddings for semantic search
dirty_sources Entities needing document regeneration after ingest
pending_discussion_fetches Queue for discussion fetch operations
sync_runs Audit trail of sync operations
sync_cursors Cursor positions for incremental sync
app_locks Crash-safe single-flight lock
raw_payloads Compressed original API responses
schema_version Migration version tracking

By default the database is stored at $XDG_DATA_HOME/lore/lore.db, which is ~/.local/share/lore/lore.db when XDG_DATA_HOME is unset. Override it with storage.dbPath.

Development

# Run tests
cargo test

# Run with debug logging
RUST_LOG=lore=debug lore issues

# Run with trace logging
RUST_LOG=lore=trace lore ingest issues

# Check formatting
cargo fmt --check

# Lint
cargo clippy

Tech Stack

  • Rust (2024 edition)
  • SQLite via rusqlite (bundled) with FTS5 and sqlite-vec
  • Ollama for vector embeddings (nomic-embed-text)
  • clap for CLI parsing
  • reqwest for HTTP
  • tokio for async runtime
  • serde for serialization
  • tracing for logging
  • indicatif for progress bars

License

MIT
