Gitlore
Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying and filtering.
Features
- Local-first: All data stored in SQLite for instant queries
- Incremental sync: Cursor-based sync only fetches changes since last sync
- Full re-sync: Reset cursors and fetch all data from scratch when needed
- Multi-project: Track issues and MRs across multiple GitLab projects
- Rich filtering: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
- Raw payload storage: Preserves original GitLab API responses for debugging
- Discussion threading: Full support for issue and MR discussions including inline code review comments
Installation
cargo install --path .
Or build from source:
cargo build --release
./target/release/lore --help
Quick Start
# Initialize configuration (interactive)
lore init
# Verify authentication
lore auth-test
# Sync issues from GitLab
lore ingest --type issues
# Sync merge requests from GitLab
lore ingest --type mrs
# List recent issues
lore list issues --limit 10
# List open merge requests
lore list mrs --state opened
# Show issue details
lore show issue 123 --project group/repo
# Show MR details with discussions
lore show mr 456 --project group/repo
Configuration
Configuration is stored in ~/.config/lore/config.json (or $XDG_CONFIG_HOME/lore/config.json).
Example Configuration
{
"gitlab": {
"baseUrl": "https://gitlab.com",
"tokenEnvVar": "GITLAB_TOKEN"
},
"projects": [
{ "path": "group/project" },
{ "path": "other-group/other-project" }
],
"sync": {
"backfillDays": 14,
"staleLockMinutes": 10,
"heartbeatIntervalSeconds": 30,
"cursorRewindSeconds": 2,
"primaryConcurrency": 4,
"dependentConcurrency": 2
},
"storage": {
"compressRawPayloads": true
}
}
Configuration Options
| Section | Field | Default | Description |
|---|---|---|---|
| gitlab | baseUrl | — | GitLab instance URL (required) |
| gitlab | tokenEnvVar | GITLAB_TOKEN | Environment variable containing API token |
| projects | path | — | Project path (e.g., group/project) |
| sync | backfillDays | 14 | Days to backfill on initial sync |
| sync | staleLockMinutes | 10 | Minutes before sync lock considered stale |
| sync | heartbeatIntervalSeconds | 30 | Frequency of lock heartbeat updates |
| sync | cursorRewindSeconds | 2 | Seconds to rewind cursor for overlap safety |
| sync | primaryConcurrency | 4 | Concurrent GitLab requests for primary resources |
| sync | dependentConcurrency | 2 | Concurrent requests for dependent resources |
| storage | dbPath | ~/.local/share/lore/lore.db | Database file path |
| storage | backupDir | ~/.local/share/lore/backups | Backup directory |
| storage | compressRawPayloads | true | Compress stored API responses with gzip |
| embedding | provider | ollama | Embedding provider |
| embedding | model | nomic-embed-text | Model name for embeddings |
| embedding | baseUrl | http://localhost:11434 | Ollama server URL |
| embedding | concurrency | 4 | Concurrent embedding requests |
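The sync defaults can be pictured as a plain Rust struct. This is a minimal sketch, not lore's actual config type: the name `SyncConfig`, the snake_case field names, and the `validate` checks are assumptions for illustration (the real crate presumably deserializes the JSON with serde, which is left out here to keep the example dependency-free).

```rust
/// Hypothetical in-memory form of the `sync` config section. Field names are
/// snake_case versions of the documented camelCase keys.
#[derive(Debug, Clone, PartialEq)]
pub struct SyncConfig {
    pub backfill_days: u32,
    pub stale_lock_minutes: u32,
    pub heartbeat_interval_seconds: u32,
    pub cursor_rewind_seconds: u32,
    pub primary_concurrency: usize,
    pub dependent_concurrency: usize,
}

impl Default for SyncConfig {
    // Values copied from the Default column of the configuration table.
    fn default() -> Self {
        Self {
            backfill_days: 14,
            stale_lock_minutes: 10,
            heartbeat_interval_seconds: 30,
            cursor_rewind_seconds: 2,
            primary_concurrency: 4,
            dependent_concurrency: 2,
        }
    }
}

impl SyncConfig {
    /// Illustrative sanity checks (assumed, not documented lore behavior).
    pub fn validate(&self) -> Result<(), String> {
        if self.primary_concurrency == 0 || self.dependent_concurrency == 0 {
            return Err("concurrency values must be at least 1".to_string());
        }
        // A heartbeat slower than the stale window would make a healthy
        // lock holder look like it had crashed.
        let stale_after_secs = u64::from(self.stale_lock_minutes) * 60;
        if u64::from(self.heartbeat_interval_seconds) >= stale_after_secs {
            return Err(format!(
                "heartbeatIntervalSeconds ({}) must be shorter than staleLockMinutes ({} s)",
                self.heartbeat_interval_seconds, stale_after_secs
            ));
        }
        Ok(())
    }
}
```

Partial overrides then fall out of struct update syntax, e.g. `SyncConfig { primary_concurrency: 8, ..SyncConfig::default() }`.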
Config File Resolution
The config file is resolved in this order:
1. --config CLI flag
2. LORE_CONFIG_PATH environment variable
3. ~/.config/lore/config.json (XDG default)
4. ./lore.config.json (local fallback for development)
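A minimal sketch of that lookup order, assuming the first two sources are used as given while the last two are used only if the file exists. `resolve_config_path` is a hypothetical name, and injecting `env` and `exists` as closures is a testing convenience, not lore's API.

```rust
use std::path::{Path, PathBuf};

/// Returns the config path to load, or None if nothing was found.
pub fn resolve_config_path(
    cli_flag: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // 1. An explicit --config flag always wins.
    if let Some(p) = cli_flag {
        return Some(PathBuf::from(p));
    }
    // 2. LORE_CONFIG_PATH overrides the default locations.
    if let Some(p) = env("LORE_CONFIG_PATH") {
        return Some(PathBuf::from(p));
    }
    // 3. XDG default: $XDG_CONFIG_HOME/lore/config.json, else ~/.config/lore/config.json.
    let config_home = env("XDG_CONFIG_HOME")
        .filter(|v| !v.is_empty())
        .map(PathBuf::from)
        .or_else(|| env("HOME").map(|h| PathBuf::from(h).join(".config")));
    if let Some(dir) = config_home {
        let candidate = dir.join("lore").join("config.json");
        if exists(&candidate) {
            return Some(candidate);
        }
    }
    // 4. Local development fallback in the current directory.
    let local = PathBuf::from("./lore.config.json");
    if exists(&local) {
        Some(local)
    } else {
        None
    }
}
```

In a real binary the closures would be `|k| std::env::var(k).ok()` and `|p| p.exists()`.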
GitLab Token
Create a personal access token with read_api scope:
- Go to GitLab → Settings → Access Tokens
- Create a token with the read_api scope
- Export it:
export GITLAB_TOKEN=glpat-xxxxxxxxxxxx
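Because the variable name is configurable through `gitlab.tokenEnvVar`, the token lookup is one level indirect. A sketch, assuming an unset `tokenEnvVar` falls back to `GITLAB_TOKEN` and that surrounding whitespace is trimmed (both assumptions); `resolve_token` is a hypothetical helper.

```rust
/// Reads the GitLab token from the environment variable named in the config.
pub fn resolve_token(
    token_env_var: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
) -> Result<String, String> {
    // Assumed fallback when gitlab.tokenEnvVar is not set.
    let var = token_env_var.unwrap_or("GITLAB_TOKEN");
    match env(var) {
        // Treat a whitespace-only value the same as a missing one.
        Some(t) if !t.trim().is_empty() => Ok(t.trim().to_string()),
        _ => Err(format!("environment variable {var} is not set or empty")),
    }
}
```

With the real environment this would be called as `resolve_token(Some("GITLAB_TOKEN"), |k| std::env::var(k).ok())`.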
Environment Variables
| Variable | Purpose | Required |
|---|---|---|
| GITLAB_TOKEN | GitLab API authentication token (name configurable via gitlab.tokenEnvVar) | Yes |
| LORE_CONFIG_PATH | Override config file location | No |
| XDG_CONFIG_HOME | XDG Base Directory for config (fallback: ~/.config) | No |
| XDG_DATA_HOME | XDG Base Directory for data (fallback: ~/.local/share) | No |
| RUST_LOG | Logging level filter (e.g., lore=debug) | No |
Commands
lore init
Initialize configuration and database interactively.
lore init # Interactive setup
lore init --force # Overwrite existing config
lore init --non-interactive # Fail if prompts needed
lore auth-test
Verify GitLab authentication is working.
lore auth-test
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com
lore doctor
Check environment health and configuration.
lore doctor # Human-readable output
lore doctor --json # JSON output for scripting
Checks performed:
- Config file existence and validity
- Database existence and pragmas (WAL mode, foreign keys)
- GitLab authentication
- Project accessibility
- Ollama connectivity (optional)
lore ingest
Sync data from GitLab to local database.
# Issues
lore ingest --type issues # Sync all projects
lore ingest --type issues --project group/repo # Single project
lore ingest --type issues --force # Override stale lock
lore ingest --type issues --full # Full re-sync (reset cursors)
# Merge Requests
lore ingest --type mrs # Sync all projects
lore ingest --type mrs --project group/repo # Single project
lore ingest --type mrs --full # Full re-sync (reset cursors)
The --full flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:
- Assignee data or other fields were missing from earlier syncs
- You want to ensure complete data after schema changes
- Troubleshooting sync issues
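Incremental sync can be sketched as: start the next request at the stored cursor minus `cursorRewindSeconds`, so items updated in the same second as the cursor are not missed, then upsert results by ID so items re-fetched inside that overlap do not duplicate. This assumes cursors are Unix-second `updated_at` values kept per project and resource type, and that the bound is sent as GitLab's `updated_after` list filter; `fetch_lower_bound`, `apply_page`, and `Item` are hypothetical names. `--full` is modeled as dropping the bound entirely.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
pub struct Item {
    pub id: u64,         // GitLab ID, used as the upsert key
    pub updated_at: i64, // Unix seconds
}

/// Lower bound for the next request; None means "fetch everything"
/// (no cursor yet, or --full).
pub fn fetch_lower_bound(cursor: Option<i64>, rewind_seconds: i64, full: bool) -> Option<i64> {
    if full {
        return None;
    }
    cursor.map(|c| (c - rewind_seconds).max(0))
}

/// Upserts a fetched page keyed by ID and returns the advanced cursor
/// (the newest updated_at seen so far).
pub fn apply_page(store: &mut HashMap<u64, i64>, page: &[Item], cursor: Option<i64>) -> Option<i64> {
    let mut next = cursor;
    for item in page {
        // Re-fetched items overwrite their previous row instead of duplicating.
        store.insert(item.id, item.updated_at);
        next = Some(next.map_or(item.updated_at, |c| c.max(item.updated_at)));
    }
    next
}
```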
lore list issues
Query issues from local database.
lore list issues # Recent issues (default 50)
lore list issues --limit 100 # More results
lore list issues --state opened # Only open issues
lore list issues --state closed # Only closed issues
lore list issues --author username # By author (@ prefix optional)
lore list issues --assignee username # By assignee (@ prefix optional)
lore list issues --label bug # By label (AND logic)
lore list issues --label bug --label urgent # Multiple labels
lore list issues --milestone "v1.0" # By milestone title
lore list issues --since 7d # Updated in last 7 days
lore list issues --since 2w # Updated in last 2 weeks
lore list issues --since 2024-01-01 # Updated since date
lore list issues --due-before 2024-12-31 # Due before date
lore list issues --has-due-date # Only issues with due dates
lore list issues --project group/repo # Filter by project
lore list issues --sort created --order asc # Sort options
lore list issues --open # Open first result in browser
lore list issues --json # JSON output
Output includes: IID, title, state, author, assignee, labels, and update time.
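The `--since` values above are either a relative window or a calendar date. A minimal parsing sketch covering only the `d`, `w`, and `YYYY-MM-DD` forms shown; the `Since` type and `parse_since` are hypothetical, and any other units lore may accept are not assumed.

```rust
/// Parsed value of --since (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Since {
    /// Window relative to "now", in seconds.
    Relative(u64),
    /// Calendar date; the time zone used for comparison is not modeled.
    Date { year: i32, month: u32, day: u32 },
}

pub fn parse_since(input: &str) -> Result<Since, String> {
    let s = input.trim();
    // Relative forms: <n>d (days) and <n>w (weeks).
    let (count, unit_secs): (&str, u64) = if let Some(rest) = s.strip_suffix('d') {
        (rest, 86_400)
    } else if let Some(rest) = s.strip_suffix('w') {
        (rest, 7 * 86_400)
    } else {
        return parse_date(s);
    };
    let n: u64 = count
        .parse()
        .map_err(|_| format!("invalid --since value: {input:?}"))?;
    n.checked_mul(unit_secs)
        .map(Since::Relative)
        .ok_or_else(|| format!("--since value too large: {input:?}"))
}

fn parse_date(s: &str) -> Result<Since, String> {
    let err = || format!("expected <n>d, <n>w, or YYYY-MM-DD, got {s:?}");
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 || parts[0].len() != 4 {
        return Err(err());
    }
    let year: i32 = parts[0].parse().map_err(|_| err())?;
    let month: u32 = parts[1].parse().map_err(|_| err())?;
    let day: u32 = parts[2].parse().map_err(|_| err())?;
    if !(1..=12).contains(&month) || day == 0 || day > days_in_month(year, month) {
        return Err(err());
    }
    Ok(Since::Date { year, month, day })
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        // Gregorian leap year rule.
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        4 | 6 | 9 | 11 => 30,
        _ => 31,
    }
}
```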
lore list mrs
Query merge requests from local database.
lore list mrs # Recent MRs (default 50)
lore list mrs --limit 100 # More results
lore list mrs --state opened # Only open MRs
lore list mrs --state merged # Only merged MRs
lore list mrs --state closed # Only closed MRs
lore list mrs --state locked # Only locked MRs
lore list mrs --state all # All states
lore list mrs --author username # By author (@ prefix optional)
lore list mrs --assignee username # By assignee (@ prefix optional)
lore list mrs --reviewer username # By reviewer (@ prefix optional)
lore list mrs --draft # Only draft/WIP MRs
lore list mrs --no-draft # Exclude draft MRs
lore list mrs --target-branch main # By target branch
lore list mrs --source-branch feature/foo # By source branch
lore list mrs --label needs-review # By label (AND logic)
lore list mrs --since 7d # Updated in last 7 days
lore list mrs --project group/repo # Filter by project
lore list mrs --sort created --order asc # Sort options
lore list mrs --open # Open first result in browser
lore list mrs --json # JSON output
Output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.
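The filter semantics above (optional `@` prefix, AND across repeated `--label`, the `--draft`/`--no-draft` pair, the `[DRAFT]` title prefix) can be illustrated in memory. lore presumably applies these filters in SQL against the local database; `Mr`, `MrFilter`, and `display_title` are hypothetical names used only to show the intended behavior.

```rust
#[derive(Debug)]
pub struct Mr {
    pub iid: u64,
    pub title: String,
    pub author: String, // stored without a leading '@' (assumption)
    pub draft: bool,
    pub labels: Vec<String>,
}

pub struct MrFilter<'a> {
    pub author: Option<&'a str>,
    pub labels: Vec<&'a str>,
    pub draft: Option<bool>, // Some(true) = --draft, Some(false) = --no-draft
}

/// "@alice" and "alice" refer to the same user.
fn normalize_user(name: &str) -> &str {
    name.strip_prefix('@').unwrap_or(name)
}

impl MrFilter<'_> {
    pub fn matches(&self, mr: &Mr) -> bool {
        let author_ok = self.author.map_or(true, |a| normalize_user(a) == mr.author.as_str());
        // AND logic: every requested label must be present on the MR.
        let labels_ok = self
            .labels
            .iter()
            .all(|l| mr.labels.iter().any(|m| m.as_str() == *l));
        let draft_ok = self.draft.map_or(true, |d| d == mr.draft);
        author_ok && labels_ok && draft_ok
    }
}

pub fn display_title(mr: &Mr) -> String {
    if mr.draft {
        format!("[DRAFT] {}", mr.title)
    } else {
        mr.title.clone()
    }
}
```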
lore show issue
Display detailed issue information.
lore show issue 123 # Show issue #123
lore show issue 123 --project group/repo # Disambiguate if needed
Shows: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.
lore show mr
Display detailed merge request information.
lore show mr 456 # Show MR !456
lore show mr 456 --project group/repo # Disambiguate if needed
Shows: title, description, state, draft status, author, assignees, reviewers, labels, source/target branches, merge status, web URL, and threaded discussions. Inline code review comments (DiffNotes) display file context in the format [src/file.ts:45].
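A sketch of how a DiffNote location might be rendered from the `position` object GitLab returns for diff notes (`old_path`, `new_path`, `old_line`, `new_line`), preferring the new side of the diff and falling back to the old side for comments on deleted lines. The preference order and `format_location` are assumptions, not lore's documented logic.

```rust
/// Subset of a GitLab DiffNote position (field names as in the GitLab API).
pub struct DiffPosition {
    pub old_path: Option<String>,
    pub new_path: Option<String>,
    pub old_line: Option<u32>,
    pub new_line: Option<u32>,
}

/// Formats a location like "[src/file.ts:45]".
pub fn format_location(pos: &DiffPosition) -> Option<String> {
    // Pair each path with the line number from the same side of the diff.
    let new_side = pos.new_path.as_deref().zip(pos.new_line);
    let old_side = pos.old_path.as_deref().zip(pos.old_line);
    match new_side.or(old_side) {
        Some((path, line)) => Some(format!("[{path}:{line}]")),
        // No line on either side: show whichever path is known.
        None => pos
            .new_path
            .as_deref()
            .or(pos.old_path.as_deref())
            .map(|p| format!("[{p}]")),
    }
}
```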
lore count
Count entities in local database.
lore count issues # Total issues
lore count mrs # Total MRs (with state breakdown)
lore count discussions # Total discussions
lore count discussions --type issue # Issue discussions only
lore count discussions --type mr # MR discussions only
lore count notes # Total notes (shows system vs user breakdown)
lore sync-status
Show current sync state and watermarks.
lore sync-status
Displays:
- Last sync run details (status, timing)
- Cursor positions per project and resource type (issues and MRs)
- Data summary counts
lore migrate
Run pending database migrations.
lore migrate
Shows current schema version and applies any pending migrations.
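Pending migrations are those with a version newer than the one recorded in `schema_version`. A minimal sketch of that selection, assuming integer versions and that the selected migrations are then applied in ascending order (ideally one transaction each); `Migration` and `pending` are hypothetical names.

```rust
#[derive(Debug)]
pub struct Migration {
    pub version: u32,
    pub sql: &'static str,
}

/// Returns migrations newer than `current`, in ascending version order.
/// Duplicate version numbers are rejected rather than applied twice.
pub fn pending(all: &[Migration], current: u32) -> Result<Vec<&Migration>, String> {
    let mut sorted: Vec<&Migration> = all.iter().collect();
    sorted.sort_by_key(|m| m.version);
    for pair in sorted.windows(2) {
        if pair[0].version == pair[1].version {
            return Err(format!("duplicate migration version {}", pair[0].version));
        }
    }
    Ok(sorted.into_iter().filter(|m| m.version > current).collect())
}
```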
lore version
Show version information.
lore version
lore backup
Create timestamped database backup.
lore backup
Note: Not yet implemented.
lore reset
Delete database and reset all state.
lore reset --confirm
Note: Not yet implemented.
Database Schema
Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
| Table | Purpose |
|---|---|
| projects | Tracked GitLab projects with metadata |
| issues | Issue metadata (title, state, author, due date, milestone) |
| merge_requests | MR metadata (title, state, draft, branches, merge status) |
| milestones | Project milestones with state and due dates |
| labels | Project labels with colors |
| issue_labels | Many-to-many issue-label relationships |
| issue_assignees | Many-to-many issue-assignee relationships |
| mr_labels | Many-to-many MR-label relationships |
| mr_assignees | Many-to-many MR-assignee relationships |
| mr_reviewers | Many-to-many MR-reviewer relationships |
| discussions | Issue/MR discussion threads |
| notes | Individual notes within discussions (with system note flag and DiffNote position data) |
| sync_runs | Audit trail of sync operations |
| sync_cursors | Cursor positions for incremental sync |
| app_locks | Crash-safe single-flight lock |
| raw_payloads | Compressed original API responses |
| schema_version | Migration version tracking |
The database is stored at ~/.local/share/lore/lore.db by default (XDG compliant).
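The `app_locks` row stays valid while its holder keeps heartbeating (every `heartbeatIntervalSeconds`) and is treated as abandoned once its last heartbeat is older than `staleLockMinutes`. A sketch of that classification, assuming a hypothetical `LockRow` with an owner string and a Unix-second heartbeat timestamp; how `--force` interacts with this check is not modeled.

```rust
/// Hypothetical view of an app_locks row.
pub struct LockRow {
    pub owner: String,
    pub heartbeat_at: i64, // Unix seconds of the holder's last heartbeat
}

#[derive(Debug, PartialEq)]
pub enum LockState {
    Free,
    Held { owner: String },
    Stale { owner: String },
}

/// Classifies the lock as seen by a new sync starting at `now`.
pub fn lock_state(existing: Option<&LockRow>, now: i64, stale_lock_minutes: i64) -> LockState {
    match existing {
        None => LockState::Free,
        // No heartbeat within the stale window: the holder probably crashed.
        Some(row) if now - row.heartbeat_at > stale_lock_minutes * 60 => LockState::Stale {
            owner: row.owner.clone(),
        },
        Some(row) => LockState::Held {
            owner: row.owner.clone(),
        },
    }
}
```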
Global Options
lore --config /path/to/config.json <command> # Use alternate config
Development
# Run tests
cargo test
# Run with debug logging
RUST_LOG=lore=debug lore list issues
# Run with trace logging
RUST_LOG=lore=trace lore ingest --type issues
# Check formatting
cargo fmt --check
# Lint
cargo clippy
Tech Stack
- Rust (2024 edition)
- SQLite via rusqlite (bundled)
- clap for CLI parsing
- reqwest for HTTP
- tokio for async runtime
- serde for serialization
- tracing for logging
- indicatif for progress bars
Current Status
This is Checkpoint 2 (CP2) of the Gitlore project. Currently implemented:
- Issue ingestion with cursor-based incremental sync
- Merge request ingestion with cursor-based incremental sync
- Discussion and note syncing for issues and MRs
- DiffNote support for inline code review comments
- Rich filtering and querying for both issues and MRs
- Full re-sync capability with watermark reset
Not yet implemented:
- Semantic search with embeddings (CP3+)
- Backup and reset commands
See SPEC.md for the full project roadmap and architecture.
License
MIT