Reduces CHUNK_MAX_BYTES from 32KB to 6KB and CHUNK_OVERLAP_CHARS from 500 to 200 to stay within nomic-embed-text's 8,192-token context window. This commit addresses all downstream consequences of that reduction:

- Config drift detection: find_pending_documents and count_pending_documents now take model_name and compare chunk_max_bytes, model, and dims against stored metadata. Documents embedded with stale config are automatically re-queued.
- Overflow guard: documents producing >= CHUNK_ROWID_MULTIPLIER chunks are skipped with a sentinel error recorded in embedding_metadata, preventing both rowid collision and infinite re-processing loops.
- Deferred clearing: old embeddings are no longer cleared before attempting new ones. clear_document_embeddings is deferred until the first successful chunk embedding, so if all chunks fail the document retains its previous embeddings rather than losing all data.
- Savepoints: each page of DB writes is wrapped in a SQLite savepoint so a crash mid-page rolls back atomically instead of leaving partial state (cleared embeddings with no replacements).
- Per-chunk retry on context overflow: when a batch fails with a context-length error, each chunk is retried individually so one oversized chunk doesn't poison the entire batch.
- Adaptive dedup in vector search: replaces the static 3x over-fetch multiplier with a dynamic one based on actual max chunks per document (using the new chunk_count column with a fallback COUNT query for pre-migration data). Also replaces partial_cmp with total_cmp for f64 distance sorting.
- Stores chunk_max_bytes and chunk_count (on sentinel rows) in embedding_metadata to support config drift detection and adaptive dedup without runtime queries.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
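The adaptive dedup item in the commit message above can be illustrated with a minimal sketch. The function names `overfetch_k` and `dedup_by_document`, and the choice to use the maximum chunk count directly as the multiplier, are assumptions made for illustration; the project's real code may compute the multiplier differently.

```rust
use std::collections::HashSet;

/// Hypothetical helper: how many chunk-level hits to request so that about
/// `limit` distinct documents survive deduplication. `max_chunks_per_doc`
/// stands in for the value read from chunk_count (or a COUNT fallback).
fn overfetch_k(limit: usize, max_chunks_per_doc: usize) -> usize {
    // Worst case: a single document fills the top hits with all its chunks.
    limit.saturating_mul(max_chunks_per_doc.max(1))
}

/// Hypothetical helper: keep the closest chunk per document.
/// Each hit is (document_id, distance); smaller distance is better.
fn dedup_by_document(mut hits: Vec<(i64, f64)>, limit: usize) -> Vec<(i64, f64)> {
    // total_cmp is a total order over f64, so the comparator never has to
    // unwrap an Option the way a partial_cmp based sort does.
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    let mut seen = HashSet::new();
    hits.into_iter()
        .filter(|(doc_id, _)| seen.insert(*doc_id))
        .take(limit)
        .collect()
}
```

With a limit of 10 and at most 4 chunks per document, this sketch would request 40 chunk hits and then collapse them to at most 10 documents.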
Gitlore
Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, and hybrid search.
Features
- Local-first: All data stored in SQLite for instant queries
- Incremental sync: Cursor-based sync only fetches changes since last sync
- Full re-sync: Reset cursors and fetch all data from scratch when needed
- Multi-project: Track issues and MRs across multiple GitLab projects
- Rich filtering: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
- Hybrid search: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
- Raw payload storage: Preserves original GitLab API responses for debugging
- Discussion threading: Full support for issue and MR discussions including inline code review comments
- Robot mode: Machine-readable JSON output with structured errors and meaningful exit codes
Installation
cargo install --path .
Or build from source:
cargo build --release
./target/release/lore --help
Quick Start
# Initialize configuration (interactive)
lore init
# Verify authentication
lore auth
# Sync everything from GitLab (issues + MRs + docs + embeddings)
lore sync
# List recent issues
lore issues -n 10
# List open merge requests
lore mrs -s opened
# Show issue details
lore issues 123
# Show MR details with discussions
lore mrs 456
# Search across all indexed data
lore search "authentication bug"
# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .
Configuration
Configuration is stored in ~/.config/lore/config.json (or $XDG_CONFIG_HOME/lore/config.json).
Example Configuration
{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" },
    { "path": "other-group/other-project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10,
    "heartbeatIntervalSeconds": 30,
    "cursorRewindSeconds": 2,
    "primaryConcurrency": 4,
    "dependentConcurrency": 2
  },
  "storage": {
    "compressRawPayloads": true
  },
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "baseUrl": "http://localhost:11434",
    "concurrency": 4
  }
}
Configuration Options
| Section | Field | Default | Description |
|---|---|---|---|
| gitlab | baseUrl | -- | GitLab instance URL (required) |
| gitlab | tokenEnvVar | GITLAB_TOKEN | Environment variable containing API token |
| projects | path | -- | Project path (e.g., group/project) |
| sync | backfillDays | 14 | Days to backfill on initial sync |
| sync | staleLockMinutes | 10 | Minutes before sync lock considered stale |
| sync | heartbeatIntervalSeconds | 30 | Frequency of lock heartbeat updates |
| sync | cursorRewindSeconds | 2 | Seconds to rewind cursor for overlap safety |
| sync | primaryConcurrency | 4 | Concurrent GitLab requests for primary resources |
| sync | dependentConcurrency | 2 | Concurrent requests for dependent resources |
| storage | dbPath | ~/.local/share/lore/lore.db | Database file path |
| storage | backupDir | ~/.local/share/lore/backups | Backup directory |
| storage | compressRawPayloads | true | Compress stored API responses with gzip |
| embedding | provider | ollama | Embedding provider |
| embedding | model | nomic-embed-text | Model name for embeddings |
| embedding | baseUrl | http://localhost:11434 | Ollama server URL |
| embedding | concurrency | 4 | Concurrent embedding requests |
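As a rough illustration of how the cursorRewindSeconds and staleLockMinutes options from the table could be applied, here is a small sketch. Both function names are hypothetical, and representing timestamps as Unix seconds is an assumption; the actual sync code may store cursors and heartbeats differently.

```rust
/// Hypothetical: the timestamp to request changes from, given the stored
/// cursor (Unix seconds) and the configured rewind. Rewinding re-fetches a
/// short overlap so records updated right at the cursor are not missed.
fn rewound_cursor(cursor_unix_secs: u64, rewind_secs: u64) -> u64 {
    cursor_unix_secs.saturating_sub(rewind_secs)
}

/// Hypothetical: treat the sync lock as stale once no heartbeat has been
/// seen for longer than the configured number of minutes.
fn lock_is_stale(last_heartbeat_unix_secs: u64, now_unix_secs: u64, stale_lock_minutes: u64) -> bool {
    let silence = now_unix_secs.saturating_sub(last_heartbeat_unix_secs);
    silence > stale_lock_minutes.saturating_mul(60)
}
```

Because a rewound cursor fetches some records twice, this approach only works if writes are idempotent, for example by upserting on the GitLab ID.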
Config File Resolution
The config file is resolved in this order:
- --config / -c CLI flag
- LORE_CONFIG_PATH environment variable
- ~/.config/lore/config.json (XDG default)
- ./lore.config.json (local fallback for development)
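The resolution order above could be sketched as follows. The function name `resolve_config_path` is hypothetical. The sketch also assumes that an explicit flag or environment variable wins even if the file is missing (so the error can name that path), while the two default locations are only chosen when the file exists; the README does not spell out this detail.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical resolver following the documented order:
/// CLI flag, LORE_CONFIG_PATH, XDG default, local fallback.
/// `exists` is injected so the logic can be exercised without touching disk.
fn resolve_config_path(
    cli_flag: Option<&Path>,
    env_override: Option<&Path>,
    xdg_default: &Path,
    local_fallback: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // Explicit sources are taken as given (assumption, see lead-in).
    if let Some(p) = cli_flag {
        return Some(p.to_path_buf());
    }
    if let Some(p) = env_override {
        return Some(p.to_path_buf());
    }
    // Default locations are used only if the file is actually there.
    if exists(xdg_default) {
        return Some(xdg_default.to_path_buf());
    }
    if exists(local_fallback) {
        return Some(local_fallback.to_path_buf());
    }
    None
}
```

In real use, `exists` would typically be `|p| p.exists()`.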
GitLab Token
Create a personal access token with read_api scope:
- Go to GitLab > Settings > Access Tokens
- Create token with read_api scope
- Export it:
export GITLAB_TOKEN=glpat-xxxxxxxxxxxx
Environment Variables
| Variable | Purpose | Required |
|---|---|---|
| GITLAB_TOKEN | GitLab API authentication token (name configurable via gitlab.tokenEnvVar) | Yes |
| LORE_CONFIG_PATH | Override config file location | No |
| LORE_ROBOT | Enable robot mode globally (set to true or 1) | No |
| XDG_CONFIG_HOME | XDG Base Directory for config (fallback: ~/.config) | No |
| XDG_DATA_HOME | XDG Base Directory for data (fallback: ~/.local/share) | No |
| RUST_LOG | Logging level filter (e.g., lore=debug) | No |
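The XDG fallbacks listed in the table can be expressed as a short sketch. The function names are hypothetical, and placing the database under $XDG_DATA_HOME/lore/lore.db is inferred from the documented default path rather than confirmed by the source. Treating an empty variable as unset follows the XDG Base Directory specification.

```rust
use std::path::PathBuf;

/// Resolve an XDG base directory: use the variable if set and non-empty,
/// otherwise fall back to a directory under the user's home.
fn xdg_dir(env_value: Option<&str>, home: &str, fallback_rel: &str) -> PathBuf {
    match env_value {
        Some(v) if !v.is_empty() => PathBuf::from(v),
        _ => PathBuf::from(home).join(fallback_rel),
    }
}

/// Hypothetical: default config file location.
fn default_config_path(xdg_config_home: Option<&str>, home: &str) -> PathBuf {
    xdg_dir(xdg_config_home, home, ".config").join("lore").join("config.json")
}

/// Hypothetical: default database location (assumed layout, see lead-in).
fn default_db_path(xdg_data_home: Option<&str>, home: &str) -> PathBuf {
    xdg_dir(xdg_data_home, home, ".local/share").join("lore").join("lore.db")
}
```

A caller would pass `std::env::var("XDG_CONFIG_HOME").ok().as_deref()` and the home directory obtained from the environment.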
Commands
lore issues
Query issues from local database, or show a specific issue.
lore issues # Recent issues (default 50)
lore issues 123 # Show issue #123 with discussions
lore issues 123 -p group/repo # Disambiguate by project
lore issues -n 100 # More results
lore issues -s opened # Only open issues
lore issues -s closed # Only closed issues
lore issues -a username # By author (@ prefix optional)
lore issues -A username # By assignee (@ prefix optional)
lore issues -l bug # By label (AND logic)
lore issues -l bug -l urgent # Multiple labels
lore issues -m "v1.0" # By milestone title
lore issues --since 7d # Updated in last 7 days
lore issues --since 2w # Updated in last 2 weeks
lore issues --since 2024-01-01 # Updated since date
lore issues --due-before 2024-12-31 # Due before date
lore issues --has-due # Only issues with due dates
lore issues -p group/repo # Filter by project
lore issues --sort created --asc # Sort by created date, ascending
lore issues -o # Open first result in browser
When listing, output includes: IID, title, state, author, assignee, labels, and update time.
When showing a single issue (e.g., lore issues 123), output includes: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.
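The --since values shown above come in three shapes: a day count (7d), a week count (2w), and a calendar date (2024-01-01). A minimal parser for those shapes might look like the sketch below. The `Since` type and `parse_since` function are hypothetical names, and the sketch only range-checks the month and day rather than validating the date against a calendar.

```rust
#[derive(Debug, PartialEq)]
enum Since {
    /// Relative window such as 7d or 2w, normalized to days.
    DaysAgo(u32),
    /// Absolute date in YYYY-MM-DD form.
    Date { year: u32, month: u32, day: u32 },
}

/// Parse a non-empty string of ASCII digits into a number.
fn digits(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

fn parse_since(input: &str) -> Option<Since> {
    let s = input.trim();
    if let Some(n) = s.strip_suffix('d') {
        return digits(n).map(Since::DaysAgo);
    }
    if let Some(n) = s.strip_suffix('w') {
        return digits(n).and_then(|w| w.checked_mul(7)).map(Since::DaysAgo);
    }
    // Otherwise expect exactly three dash-separated fields: YYYY-MM-DD.
    let mut parts = s.split('-');
    let (y, m, d) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || y.len() != 4 || m.len() != 2 || d.len() != 2 {
        return None;
    }
    let (year, month, day) = (digits(y)?, digits(m)?, digits(d)?);
    if month < 1 || month > 12 || day < 1 || day > 31 {
        return None;
    }
    Some(Since::Date { year, month, day })
}
```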
lore mrs
Query merge requests from local database, or show a specific MR.
lore mrs # Recent MRs (default 50)
lore mrs 456 # Show MR !456 with discussions
lore mrs 456 -p group/repo # Disambiguate by project
lore mrs -n 100 # More results
lore mrs -s opened # Only open MRs
lore mrs -s merged # Only merged MRs
lore mrs -s closed # Only closed MRs
lore mrs -s locked # Only locked MRs
lore mrs -s all # All states
lore mrs -a username # By author (@ prefix optional)
lore mrs -A username # By assignee (@ prefix optional)
lore mrs -r username # By reviewer (@ prefix optional)
lore mrs -d # Only draft/WIP MRs
lore mrs -D # Exclude draft MRs
lore mrs --target main # By target branch
lore mrs --source feature/foo # By source branch
lore mrs -l needs-review # By label (AND logic)
lore mrs --since 7d # Updated in last 7 days
lore mrs -p group/repo # Filter by project
lore mrs --sort created --asc # Sort by created date, ascending
lore mrs -o # Open first result in browser
When listing, output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.
When showing a single MR (e.g., lore mrs 456), output includes: title, description, state, draft status, author, assignees, reviewers, labels, source/target branches, merge status, web URL, and threaded discussions. Inline code review comments (DiffNotes) display file context in the format [src/file.ts:45].
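The file context format for inline comments, [src/file.ts:45], could be produced by something like the sketch below. The `DiffPosition` struct and its fields are hypothetical stand-ins for the DiffNote position data, and the output for a comment without a line number is an assumption.

```rust
/// Hypothetical position data for an inline code review comment.
struct DiffPosition<'a> {
    path: &'a str,
    line: Option<u32>,
}

/// Render the file context shown next to a DiffNote.
fn file_context(pos: &DiffPosition) -> String {
    match pos.line {
        Some(line) => format!("[{}:{}]", pos.path, line),
        // Assumption: a comment on a whole file shows only the path.
        None => format!("[{}]", pos.path),
    }
}
```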
lore search
Search across indexed documents using hybrid (lexical + semantic), lexical-only, or semantic-only modes.
lore search "authentication bug" # Hybrid search (default)
lore search "login flow" --mode lexical # FTS5 lexical only
lore search "login flow" --mode semantic # Vector similarity only
lore search "auth" --type issue # Filter by source type
lore search "auth" --type mr # MR documents only
lore search "auth" --type discussion # Discussion documents only
lore search "deploy" --author username # Filter by author
lore search "deploy" -p group/repo # Filter by project
lore search "deploy" --label backend # Filter by label (AND logic)
lore search "deploy" --path src/ # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d # Created after (7d, 2w, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w # Updated after
lore search "deploy" -n 50 # Limit results (default 20, max 100)
lore search "deploy" --explain # Show ranking explanation per result
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
Requires lore generate-docs (or lore sync) to have been run at least once. Semantic mode requires Ollama with the configured embedding model.
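Hybrid mode merges the lexical and semantic result lists with Reciprocal Rank Fusion. In the standard formulation each document scores the sum of 1 / (k + rank) over every list it appears in. The sketch below shows that formula; the function name `rrf_fuse` is hypothetical, and k = 60 is the commonly cited constant, not necessarily the value this project uses.

```rust
use std::collections::HashMap;

/// Smoothing constant. 60 is a common choice; the value is an assumption here.
const RRF_K: f64 = 60.0;

/// Fuse several ranked lists of document IDs (best first) into one list of
/// (document_id, score), ordered by descending fused score.
fn rrf_fuse(rankings: &[Vec<i64>]) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
            let rank = idx as f64 + 1.0;
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

A document that appears in both lists tends to outrank one that is ranked highly in only one of them, which is the point of the fusion.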
lore sync
Run the full sync pipeline: ingest from GitLab, generate searchable documents, and compute embeddings.
lore sync # Full pipeline
lore sync --full # Reset cursors, fetch everything
lore sync --force # Override stale lock
lore sync --no-embed # Skip embedding step
lore sync --no-docs # Skip document regeneration
lore ingest
Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings).
lore ingest # Ingest everything (issues + MRs)
lore ingest issues # Issues only
lore ingest mrs # MRs only
lore ingest issues -p group/repo # Single project
lore ingest --force # Override stale lock
lore ingest --full # Full re-sync (reset cursors)
The --full flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:
- Assignee data or other fields were missing from earlier syncs
- You want to ensure complete data after schema changes
- Troubleshooting sync issues
lore generate-docs
Extract searchable documents from ingested issues, MRs, and discussions for the FTS5 index.
lore generate-docs # Incremental (dirty items only)
lore generate-docs --full # Full rebuild
lore generate-docs -p group/repo # Single project
lore embed
Generate vector embeddings for documents via Ollama. Requires Ollama running with the configured embedding model.
lore embed # Embed new/changed documents
lore embed --retry-failed # Retry previously failed embeddings
lore count
Count entities in local database.
lore count issues # Total issues
lore count mrs # Total MRs (with state breakdown)
lore count discussions # Total discussions
lore count discussions --for issue # Issue discussions only
lore count discussions --for mr # MR discussions only
lore count notes # Total notes (system vs user breakdown)
lore count notes --for issue # Issue notes only
lore stats
Show document and index statistics, with optional integrity checks.
lore stats # Document and index statistics
lore stats --check # Run integrity checks
lore stats --check --repair # Repair integrity issues
lore status
Show current sync state and watermarks.
lore status
Displays:
- Last sync run details (status, timing)
- Cursor positions per project and resource type (issues and MRs)
- Data summary counts
lore init
Initialize configuration and database interactively.
lore init # Interactive setup
lore init --force # Overwrite existing config
lore init --non-interactive # Fail if prompts needed
lore auth
Verify GitLab authentication is working.
lore auth
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com
lore doctor
Check environment health and configuration.
lore doctor
Checks performed:
- Config file existence and validity
- Database existence and pragmas (WAL mode, foreign keys)
- GitLab authentication
- Project accessibility
- Ollama connectivity (optional)
lore migrate
Run pending database migrations.
lore migrate
lore version
Show version information.
lore version
Robot Mode
Machine-readable JSON output for scripting and AI agent consumption.
Activation
# Global flag
lore --robot issues -n 5
# JSON shorthand (-J)
lore -J issues -n 5
# Environment variable
LORE_ROBOT=1 lore issues -n 5
# Auto-detection (when stdout is not a TTY)
lore issues -n 5 | jq .
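The four activation paths above reduce to one boolean decision. The sketch below shows one way to combine them; the function name `robot_mode` is hypothetical, and both --robot and -J are folded into a single `flag` argument.

```rust
/// Decide whether robot mode is active.
/// `flag` is true when --robot or -J was passed, `lore_robot_env` is the
/// value of LORE_ROBOT if set, and `stdout_is_tty` reports whether stdout
/// is attached to a terminal.
fn robot_mode(flag: bool, lore_robot_env: Option<&str>, stdout_is_tty: bool) -> bool {
    // Only "true" and "1" enable robot mode via the environment.
    let env_enabled = matches!(lore_robot_env, Some("true") | Some("1"));
    flag || env_enabled || !stdout_is_tty
}
```

In a real binary the last argument could come from `std::io::IsTerminal`, for example `std::io::stdout().is_terminal()`.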
Response Format
All commands return consistent JSON:
{"ok": true, "data": {...}, "meta": {...}}
Errors return structured JSON to stderr:
{"error": {"code": "CONFIG_NOT_FOUND", "message": "...", "suggestion": "Run 'lore init'"}}
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Internal error / health check failed / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 6 | Resource not found |
| 7 | Rate limited |
| 8 | Network error |
| 9 | Database locked |
| 10 | Database error |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
| 14 | Ollama unavailable |
| 15 | Ollama model not found |
| 16 | Embedding failed |
| 20 | Config not found |
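A script or agent that wraps lore may want to turn these codes back into messages and decide whether a retry makes sense. The lookup below mirrors the table; both function names are hypothetical, and classifying codes 7, 8, and 9 as transient is an assumption of this sketch rather than documented behavior.

```rust
/// Map an exit code from the table above to its documented meaning.
fn exit_code_meaning(code: i32) -> Option<&'static str> {
    Some(match code {
        0 => "Success",
        1 => "Internal error / health check failed / not implemented",
        2 => "Usage error (invalid flags or arguments)",
        3 => "Config invalid",
        4 => "Token not set",
        5 => "GitLab auth failed",
        6 => "Resource not found",
        7 => "Rate limited",
        8 => "Network error",
        9 => "Database locked",
        10 => "Database error",
        11 => "Migration failed",
        12 => "I/O error",
        13 => "Transform error",
        14 => "Ollama unavailable",
        15 => "Ollama model not found",
        16 => "Embedding failed",
        20 => "Config not found",
        _ => return None,
    })
}

/// Assumption: rate limiting, network errors, and a locked database are
/// conditions that may clear up on their own, so a retry could help.
fn is_probably_transient(code: i32) -> bool {
    matches!(code, 7 | 8 | 9)
}
```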
Configuration Precedence
Settings are resolved in this order (highest to lowest priority):
- CLI flags (--robot, --config, --color)
- Environment variables (LORE_ROBOT, GITLAB_TOKEN, LORE_CONFIG_PATH)
- Config file (~/.config/lore/config.json)
- Built-in defaults
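The precedence order above maps naturally onto a chain of optional values, where the first layer that provides a value wins. The helper below is a generic sketch with a hypothetical name; it does not claim that every setting is available at every layer.

```rust
/// Pick the highest-priority value that is present:
/// CLI flag, then environment variable, then config file, then default.
fn resolve_setting<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}
```

For example, if no flag is passed but both the environment and the config file supply a value, the environment value is returned.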
Global Options
lore -c /path/to/config.json <command> # Use alternate config
lore --robot <command> # Machine-readable JSON
lore -J <command> # JSON shorthand
Shell Completions
Generate shell completions for tab-completion support:
# Bash (add to ~/.bashrc)
lore completions bash > ~/.local/share/bash-completion/completions/lore
# Zsh (add to ~/.zshrc: fpath=(~/.zfunc $fpath))
lore completions zsh > ~/.zfunc/_lore
# Fish
lore completions fish > ~/.config/fish/completions/lore.fish
# PowerShell (add to $PROFILE)
lore completions powershell >> $PROFILE
Database Schema
Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
| Table | Purpose |
|---|---|
| projects | Tracked GitLab projects with metadata |
| issues | Issue metadata (title, state, author, due date, milestone) |
| merge_requests | MR metadata (title, state, draft, branches, merge status) |
| milestones | Project milestones with state and due dates |
| labels | Project labels with colors |
| issue_labels | Many-to-many issue-label relationships |
| issue_assignees | Many-to-many issue-assignee relationships |
| mr_labels | Many-to-many MR-label relationships |
| mr_assignees | Many-to-many MR-assignee relationships |
| mr_reviewers | Many-to-many MR-reviewer relationships |
| discussions | Issue/MR discussion threads |
| notes | Individual notes within discussions (with system note flag and DiffNote position data) |
| documents | Extracted searchable text for FTS and embedding |
| documents_fts | FTS5 full-text search index |
| embeddings | Vector embeddings for semantic search |
| sync_runs | Audit trail of sync operations |
| sync_cursors | Cursor positions for incremental sync |
| app_locks | Crash-safe single-flight lock |
| raw_payloads | Compressed original API responses |
| schema_version | Migration version tracking |
The database is stored at ~/.local/share/lore/lore.db by default (XDG compliant).
Development
# Run tests
cargo test
# Run with debug logging
RUST_LOG=lore=debug lore issues
# Run with trace logging
RUST_LOG=lore=trace lore ingest issues
# Check formatting
cargo fmt --check
# Lint
cargo clippy
Tech Stack
- Rust (2024 edition)
- SQLite via rusqlite (bundled) with FTS5 and sqlite-vec
- Ollama for vector embeddings (nomic-embed-text)
- clap for CLI parsing
- reqwest for HTTP
- tokio for async runtime
- serde for serialization
- tracing for logging
- indicatif for progress bars
License
MIT