Gitlore

Local GitLab data management with semantic search and temporal intelligence. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, and chronological event reconstruction.

Features

  • Local-first: All data stored in SQLite for instant queries
  • Incremental sync: Cursor-based sync only fetches changes since last sync
  • Full re-sync: Reset cursors and fetch all data from scratch when needed
  • Multi-project: Track issues and MRs across multiple GitLab projects
  • Rich filtering: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
  • Hybrid search: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
  • Timeline pipeline: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
  • Git history linking: Tracks merge and squash commit SHAs to connect MRs with git history
  • File change tracking: Records which files each MR touches, enabling file-level history queries
  • Raw payload storage: Preserves original GitLab API responses for debugging
  • Discussion threading: Full support for issue and MR discussions including inline code review comments
  • Cross-reference tracking: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
  • Resource event history: Tracks state changes, label events, and milestone events for issues and MRs
  • Robot mode: Machine-readable JSON output with structured errors and meaningful exit codes
  • Observability: Verbosity controls, JSON log format, structured metrics, and stage timing

Installation

cargo install --path .

Or build from source:

cargo build --release
./target/release/lore --help

Quick Start

# Initialize configuration (interactive)
lore init

# Verify authentication
lore auth

# Sync everything from GitLab (issues + MRs + docs + embeddings)
lore sync

# List recent issues
lore issues -n 10

# List open merge requests
lore mrs -s opened

# Show issue details
lore issues 123

# Show MR details with discussions
lore mrs 456

# Search across all indexed data
lore search "authentication bug"

# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .

Configuration

Configuration is stored in ~/.config/lore/config.json (or $XDG_CONFIG_HOME/lore/config.json).

Example Configuration

{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" },
    { "path": "other-group/other-project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10,
    "heartbeatIntervalSeconds": 30,
    "cursorRewindSeconds": 2,
    "primaryConcurrency": 4,
    "dependentConcurrency": 2
  },
  "storage": {
    "compressRawPayloads": true
  },
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "baseUrl": "http://localhost:11434",
    "concurrency": 4
  }
}

Configuration Options

Section    Field                     Default                      Description
gitlab     baseUrl                   --                           GitLab instance URL (required)
gitlab     tokenEnvVar               GITLAB_TOKEN                 Environment variable containing the API token
projects   path                      --                           Project path (e.g., group/project)
sync       backfillDays              14                           Days to backfill on initial sync
sync       staleLockMinutes          10                           Minutes before a sync lock is considered stale
sync       heartbeatIntervalSeconds  30                           Seconds between lock heartbeat updates
sync       cursorRewindSeconds       2                            Seconds to rewind the cursor for overlap safety
sync       primaryConcurrency        4                            Concurrent GitLab requests for primary resources
sync       dependentConcurrency      2                            Concurrent requests for dependent resources
storage    dbPath                    ~/.local/share/lore/lore.db  Database file path
storage    backupDir                 ~/.local/share/lore/backups  Backup directory
storage    compressRawPayloads       true                         Compress stored API responses with gzip
embedding  provider                  ollama                       Embedding provider
embedding  model                     nomic-embed-text             Model name for embeddings
embedding  baseUrl                   http://localhost:11434       Ollama server URL
embedding  concurrency               4                            Concurrent embedding requests
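Incremental sync relies on a stored cursor, and cursorRewindSeconds re-fetches a small overlap behind it. The sketch below shows one way the lower bound of the next fetch could be computed. It assumes the cursor is a millisecond Unix timestamp of the last seen updated_at and that backfillDays applies only when no cursor exists yet; next_fetch_since_ms is a hypothetical name, not lore's API.

```rust
/// Hypothetical helper: lower bound (ms since epoch) for the next incremental fetch.
///
/// - `cursor_ms`: last `updated_at` seen by a previous sync, if any (assumed ms).
/// - `rewind_secs`: `sync.cursorRewindSeconds`; a small overlap guards against
///   items updated in the same instant the cursor was recorded.
/// - `backfill_days`: `sync.backfillDays`; used only when no cursor exists yet.
pub fn next_fetch_since_ms(
    cursor_ms: Option<i64>,
    now_ms: i64,
    rewind_secs: i64,
    backfill_days: i64,
) -> i64 {
    const MS_PER_SEC: i64 = 1_000;
    const MS_PER_DAY: i64 = 86_400 * MS_PER_SEC;
    match cursor_ms {
        // Rewind the cursor slightly, never going below the epoch.
        Some(cursor) => (cursor - rewind_secs * MS_PER_SEC).max(0),
        // Initial sync: look back `backfill_days` from now.
        None => (now_ms - backfill_days * MS_PER_DAY).max(0),
    }
}
```

Because of the rewind, a few items are fetched twice on every run, so writes on the storage side need to be idempotent (for example, upserts keyed by the GitLab ID).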

Config File Resolution

The config file is resolved in this order:

  1. --config / -c CLI flag
  2. LORE_CONFIG_PATH environment variable
  3. ~/.config/lore/config.json (XDG default)
  4. ./lore.config.json (local fallback for development)
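The lookup order above can be sketched as a pure function with the environment and filesystem passed in, which keeps it testable. This is an illustration rather than lore's code: the function names are hypothetical, and the sketch assumes an explicit --config or LORE_CONFIG_PATH value is returned even if the file is missing (so the caller can report a clear error), while steps 3 and 4 are used only when the file exists.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Resolve the config path in the documented order (hypothetical sketch).
pub fn resolve_config_path(
    cli_flag: Option<&Path>,    // value of --config / -c
    env_override: Option<&str>, // LORE_CONFIG_PATH
    xdg_config_home: Option<&str>,
    home: Option<&str>,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // 1. An explicit CLI flag wins outright.
    if let Some(p) = cli_flag {
        return Some(p.to_path_buf());
    }
    // 2. LORE_CONFIG_PATH (treated as unset when empty; an assumption).
    if let Some(p) = env_override.filter(|s| !s.is_empty()) {
        return Some(PathBuf::from(p));
    }
    // 3. XDG default: $XDG_CONFIG_HOME/lore/config.json, else ~/.config/lore/config.json.
    let base = xdg_config_home
        .filter(|s| !s.is_empty())
        .map(PathBuf::from)
        .or_else(|| home.map(|h| Path::new(h).join(".config")));
    if let Some(base) = base {
        let candidate = base.join("lore").join("config.json");
        if exists(candidate.as_path()) {
            return Some(candidate);
        }
    }
    // 4. Local development fallback in the current directory.
    let local = PathBuf::from("lore.config.json");
    exists(local.as_path()).then_some(local)
}

/// Convenience wrapper reading the real process environment and filesystem.
pub fn resolve_from_process(cli_flag: Option<&Path>) -> Option<PathBuf> {
    let env_override = env::var("LORE_CONFIG_PATH").ok();
    let xdg = env::var("XDG_CONFIG_HOME").ok();
    let home = env::var("HOME").ok();
    resolve_config_path(
        cli_flag,
        env_override.as_deref(),
        xdg.as_deref(),
        home.as_deref(),
        |p| p.exists(),
    )
}
```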

GitLab Token

Create a personal access token with read_api scope:

  1. Go to GitLab > Settings > Access Tokens
  2. Create token with read_api scope
  3. Export it: export GITLAB_TOKEN=glpat-xxxxxxxxxxxx

Environment Variables

Variable Purpose Required
GITLAB_TOKEN GitLab API authentication token (name configurable via gitlab.tokenEnvVar) Yes
LORE_CONFIG_PATH Override config file location No
LORE_ROBOT Enable robot mode globally (set to true or 1) No
XDG_CONFIG_HOME XDG Base Directory for config (fallback: ~/.config) No
XDG_DATA_HOME XDG Base Directory for data (fallback: ~/.local/share) No
NO_COLOR Disable color output when set (any value) No
CLICOLOR Standard color control (0 to disable) No
RUST_LOG Logging level filter (e.g., lore=debug) No

Commands

lore issues

Query issues from local database, or show a specific issue.

lore issues                           # Recent issues (default 50)
lore issues 123                       # Show issue #123 with discussions
lore issues 123 -p group/repo         # Disambiguate by project
lore issues -n 100                    # More results
lore issues -s opened                 # Only open issues
lore issues -s closed                 # Only closed issues
lore issues -a username               # By author (@ prefix optional)
lore issues -A username               # By assignee (@ prefix optional)
lore issues -l bug                    # By label (AND logic)
lore issues -l bug -l urgent          # Multiple labels
lore issues -m "v1.0"                 # By milestone title
lore issues --since 7d                # Updated in last 7 days
lore issues --since 2w                # Updated in last 2 weeks
lore issues --since 1m                # Updated in last month
lore issues --since 2024-01-01        # Updated since date
lore issues --due-before 2024-12-31   # Due before date
lore issues --has-due                 # Only issues with due dates
lore issues -p group/repo             # Filter by project
lore issues --sort created --asc      # Sort by created date, ascending
lore issues -o                        # Open first result in browser

# Field selection (robot mode)
lore -J issues --fields minimal       # Compact: iid, title, state, updated_at_iso
lore -J issues --fields iid,title,labels,state  # Custom fields

When listing, output includes: IID, title, state, author, assignee, labels, and update time. In robot mode, the --fields flag controls which fields appear in the JSON response.

When showing a single issue (e.g., lore issues 123), output includes: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.
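The --since values shown above (7d, 2w, 1m, or YYYY-MM-DD) reduce to a cutoff timestamp. A minimal std-only sketch follows, assuming m means 30 days and dates are interpreted as UTC midnight; parse_since is a hypothetical name, and lore's actual parser may treat months or time zones differently.

```rust
/// Parse a `--since`-style value into a cutoff in ms since the Unix epoch.
/// Accepts `<n>d`, `<n>w`, `<n>m`, or `YYYY-MM-DD`.
/// Assumptions: `m` = 30 days; dates are UTC midnight; validation is coarse.
pub fn parse_since(input: &str, now_ms: i64) -> Result<i64, String> {
    const DAY_MS: i64 = 86_400_000;
    let s = input.trim();

    // Relative form: a non-negative integer followed by a unit letter.
    if let Some(unit) = s.chars().last().filter(|c| matches!(*c, 'd' | 'w' | 'm')) {
        let n = s[..s.len() - 1]
            .parse::<u32>()
            .map_err(|_| format!("invalid duration: {input}"))? as i64;
        let days = match unit {
            'd' => n,
            'w' => n * 7,
            _ => n * 30, // 'm'
        };
        return Ok(now_ms - days * DAY_MS);
    }

    // Absolute form: YYYY-MM-DD.
    let bad = || format!("invalid date: {input}");
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 {
        return Err(bad());
    }
    let y: i64 = parts[0].parse().map_err(|_| bad())?;
    let m: i64 = parts[1].parse().map_err(|_| bad())?;
    let d: i64 = parts[2].parse().map_err(|_| bad())?;
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return Err(bad());
    }
    Ok(days_from_civil(y, m, d) * DAY_MS)
}

/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years start in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // March = 0 ... February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}
```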

Project Resolution

The -p / --project flag uses cascading match logic across all commands:

  1. Exact match: group/project
  2. Case-insensitive: Group/Project
  3. Suffix match: project matches group/project (if unambiguous)
  4. Substring match: typescript matches vs/typescript-code (if unambiguous)

If multiple projects match, an error lists the candidates with a hint to use the full path.
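The cascade can be expressed as an ordered list of predicates where the first tier with any hit decides the outcome. This sketch assumes case-insensitive comparison for tiers 2 to 4 and treats a suffix match as matching whole trailing path segments (project matches group/project but not group/subproject); the type and function names are hypothetical.

```rust
/// Outcome of resolving a `-p/--project` argument (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ProjectMatch {
    Found(String),
    Ambiguous(Vec<String>),
    NotFound,
}

/// A match tier: (candidate path, raw input, lowercased input) -> hit?
type Tier = fn(&str, &str, &str) -> bool;

pub fn resolve_project(input: &str, projects: &[&str]) -> ProjectMatch {
    let lower = input.to_lowercase();
    // Tried in order; the first tier with at least one hit decides.
    let tiers: [Tier; 4] = [
        |p, raw, _| p == raw,                                          // 1. exact
        |p, _, lc| p.to_lowercase() == lc,                             // 2. case-insensitive
        |p, _, lc| p.to_lowercase().ends_with(&format!("/{lc}")),      // 3. path suffix
        |p, _, lc| p.to_lowercase().contains(lc),                      // 4. substring
    ];
    for tier in tiers {
        let hits: Vec<String> = projects
            .iter()
            .filter(|p| tier(**p, input, lower.as_str()))
            .map(|p| p.to_string())
            .collect();
        match hits.len() {
            0 => continue,
            1 => return ProjectMatch::Found(hits.into_iter().next().unwrap()),
            _ => return ProjectMatch::Ambiguous(hits),
        }
    }
    ProjectMatch::NotFound
}
```

For example, with group/project and other-group/other-project tracked, -p project resolves through tier 3 to group/project, while -p group reaches tier 4 and is ambiguous.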

lore mrs

Query merge requests from local database, or show a specific MR.

lore mrs                              # Recent MRs (default 50)
lore mrs 456                          # Show MR !456 with discussions
lore mrs 456 -p group/repo            # Disambiguate by project
lore mrs -n 100                       # More results
lore mrs -s opened                    # Only open MRs
lore mrs -s merged                    # Only merged MRs
lore mrs -s closed                    # Only closed MRs
lore mrs -s locked                    # Only locked MRs
lore mrs -s all                       # All states
lore mrs -a username                  # By author (@ prefix optional)
lore mrs -A username                  # By assignee (@ prefix optional)
lore mrs -r username                  # By reviewer (@ prefix optional)
lore mrs -d                           # Only draft/WIP MRs
lore mrs -D                           # Exclude draft MRs
lore mrs --target main                # By target branch
lore mrs --source feature/foo         # By source branch
lore mrs -l needs-review              # By label (AND logic)
lore mrs --since 7d                   # Updated in last 7 days
lore mrs -p group/repo                # Filter by project
lore mrs --sort created --asc         # Sort by created date, ascending
lore mrs -o                           # Open first result in browser

# Field selection (robot mode)
lore -J mrs --fields minimal          # Compact: iid, title, state, updated_at_iso
lore -J mrs --fields iid,title,draft,target_branch  # Custom fields

When listing, output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.

When showing a single MR (e.g., lore mrs 456), output includes: title, description, state, draft status, author, assignees, reviewers, labels, source/target branches, merge status, web URL, and threaded discussions. Inline code review comments (DiffNotes) display file context in the format [src/file.ts:45].

lore search

Search across indexed documents using hybrid (lexical + semantic), lexical-only, or semantic-only modes.

lore search "authentication bug"              # Hybrid search (default)
lore search "login flow" --mode lexical       # FTS5 lexical only
lore search "login flow" --mode semantic      # Vector similarity only
lore search "auth" --type issue               # Filter by source type
lore search "auth" --type mr                  # MR documents only
lore search "auth" --type discussion          # Discussion documents only
lore search "deploy" --author username        # Filter by author
lore search "deploy" -p group/repo            # Filter by project
lore search "deploy" --label backend          # Filter by label (AND logic)
lore search "deploy" --path src/              # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d               # Created after (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w       # Updated after
lore search "deploy" -n 50                    # Limit results (default 20, max 100)
lore search "deploy" --explain                # Show ranking explanation per result
lore search "deploy" --fts-mode raw           # Raw FTS5 query syntax (advanced)

Requires lore generate-docs (or lore sync) to have been run at least once. Semantic and hybrid modes require lore embed (or lore sync) to have generated vector embeddings via Ollama.
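Hybrid mode merges the lexical and semantic rankings with Reciprocal Rank Fusion, which scores each document as the sum of 1/(k + rank) over every list it appears in. A minimal sketch over document IDs follows; k = 60 is a commonly used default and an assumption here, since this README does not state which constant lore uses, and rrf_fuse is a hypothetical name.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over ranked lists of document IDs.
/// Each list is ordered best-first; ranks are 1-based.
/// `k` dampens the influence of the very top ranks.
pub fn rrf_fuse(rankings: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in rankings {
        for (idx, &doc) in list.iter().enumerate() {
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + (idx + 1) as f64);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; break ties by document ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, the BM25 and cosine-similarity scores never need to be normalized onto a common scale; a document ranked well by both retrievers tends to beat one that only a single retriever placed first.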

lore sync

Run the full sync pipeline: ingest from GitLab, generate searchable documents, and compute embeddings.

lore sync                    # Full pipeline
lore sync --full             # Reset cursors, fetch everything
lore sync --force            # Override stale lock
lore sync --no-embed         # Skip embedding step
lore sync --no-docs          # Skip document regeneration
lore sync --no-events        # Skip resource event fetching

The sync command displays animated progress bars for each stage and outputs timing metrics on completion. In robot mode (-J), detailed stage timing is included in the JSON response.
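The staleLockMinutes and heartbeatIntervalSeconds settings, together with --force, describe a heartbeat-based single-flight lock. The decision logic might look like the sketch below. It assumes, from the --force description, that a stale lock is taken over only when --force is given and that a live lock is never overridden; the names are hypothetical and the real policy may differ.

```rust
/// Outcome of consulting the lock before starting a sync (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum LockDecision {
    Acquire,       // no lock is held
    TakeOverStale, // holder stopped heartbeating and --force was given
    Refuse,        // live holder, or stale holder without --force
}

/// `last_heartbeat_ms` is the holder's last heartbeat, or `None` if no lock is held.
/// A holder is expected to refresh it every `heartbeatIntervalSeconds`; a heartbeat
/// older than `staleLockMinutes` marks the lock as abandoned (e.g., after a crash).
pub fn decide_lock(
    last_heartbeat_ms: Option<i64>,
    now_ms: i64,
    stale_lock_minutes: i64,
    force: bool,
) -> LockDecision {
    let Some(heartbeat) = last_heartbeat_ms else {
        return LockDecision::Acquire;
    };
    let stale = now_ms - heartbeat > stale_lock_minutes * 60_000;
    if stale && force {
        LockDecision::TakeOverStale
    } else {
        LockDecision::Refuse
    }
}
```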

lore ingest

Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings).

lore ingest                                    # Ingest everything (issues + MRs)
lore ingest issues                             # Issues only
lore ingest mrs                                # MRs only
lore ingest issues -p group/repo               # Single project
lore ingest --force                            # Override stale lock
lore ingest --full                             # Full re-sync (reset cursors)

The --full flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:

  • Assignee data or other fields were missing from earlier syncs
  • You want to ensure complete data after schema changes
  • Troubleshooting sync issues

lore generate-docs

Extract searchable documents from ingested issues, MRs, and discussions for the FTS5 index.

lore generate-docs                    # Incremental (dirty items only)
lore generate-docs --full             # Full rebuild
lore generate-docs -p group/repo      # Single project

lore embed

Generate vector embeddings for documents via Ollama. Requires Ollama running with the configured embedding model.

lore embed                    # Embed new/changed documents
lore embed --retry-failed     # Retry previously failed embeddings

lore count

Count entities in local database.

lore count issues                     # Total issues
lore count mrs                        # Total MRs (with state breakdown)
lore count discussions                # Total discussions
lore count discussions --for issue    # Issue discussions only
lore count discussions --for mr       # MR discussions only
lore count notes                      # Total notes (system vs user breakdown)
lore count notes --for issue          # Issue notes only

lore stats

Show document and index statistics, with optional integrity checks.

lore stats                    # Document and index statistics
lore stats --check            # Run integrity checks
lore stats --check --repair   # Repair integrity issues

lore status

Show current sync state and watermarks.

lore status

Displays:

  • Last sync run details (status, timing)
  • Cursor positions per project and resource type (issues and MRs)
  • Data summary counts

lore init

Initialize configuration and database interactively.

lore init                    # Interactive setup
lore init --force            # Overwrite existing config
lore init --non-interactive  # Fail if prompts needed

lore auth

Verify GitLab authentication is working.

lore auth
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com

lore doctor

Check environment health and configuration.

lore doctor

Checks performed:

  • Config file existence and validity
  • Database existence and pragmas (WAL mode, foreign keys)
  • GitLab authentication
  • Project accessibility
  • Ollama connectivity (optional)

lore migrate

Run pending database migrations.

lore migrate

lore health

Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 1 if unhealthy.

lore health

Useful as a fast gate before running queries or syncs. For a more thorough check including authentication and project access, use lore doctor.

lore robot-docs

Machine-readable command manifest for agent self-discovery. Returns a JSON schema of all commands, flags, exit codes, and example workflows.

lore robot-docs                   # Pretty-printed JSON
lore --robot robot-docs           # Compact JSON for parsing

lore version

Show version information including the git commit hash.

lore version
# lore version 0.1.0 (abc1234)

Robot Mode

Machine-readable JSON output for scripting and AI agent consumption. All responses use compact (single-line) JSON with a uniform envelope and timing metadata.

Activation

# Global flag
lore --robot issues -n 5

# JSON shorthand (-J)
lore -J issues -n 5

# Environment variable
LORE_ROBOT=1 lore issues -n 5

# Auto-detection (when stdout is not a TTY)
lore issues -n 5 | jq .
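The four activation paths above reduce to a simple disjunction. A sketch using std::io::IsTerminal (stable since Rust 1.70) follows; it assumes LORE_ROBOT is compared exactly against 1 and true, and the function names are hypothetical.

```rust
use std::io::IsTerminal;

/// Decide whether to emit robot-mode JSON.
/// `flag`: `--robot` or `-J` was passed; `env_value`: contents of LORE_ROBOT, if set.
pub fn robot_mode(flag: bool, env_value: Option<&str>, stdout_is_tty: bool) -> bool {
    flag || matches!(env_value, Some("1" | "true")) || !stdout_is_tty
}

/// Wrapper that reads the real environment and checks whether stdout is a terminal.
pub fn robot_mode_from_process(flag: bool) -> bool {
    let env_value = std::env::var("LORE_ROBOT").ok();
    robot_mode(flag, env_value.as_deref(), std::io::stdout().is_terminal())
}
```

Under this logic, piping to another program enables robot mode even when LORE_ROBOT is unset, which matches the auto-detection example.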

Response Format

All commands return a consistent JSON envelope to stdout:

{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}

Every response includes meta.elapsed_ms (wall-clock milliseconds for the command).

Errors return structured JSON to stderr with machine-actionable recovery steps:

{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}

The actions array contains executable shell commands an agent can run to recover from the error. It is omitted when empty (e.g., for generic I/O errors).

Field Selection

The --fields flag on issues and mrs list commands controls which fields appear in the JSON response, reducing token usage for AI agent workflows:

# Minimal preset (~60% fewer tokens)
lore -J issues --fields minimal

# Custom field list
lore -J issues --fields iid,title,state,labels,updated_at_iso

# Available presets
#   minimal: iid, title, state, updated_at_iso

Valid fields for issues: iid, title, state, author_username, labels, assignees, discussion_count, unresolved_count, created_at_iso, updated_at_iso, web_url, project_path

Valid fields for MRs: iid, title, state, author_username, labels, draft, target_branch, source_branch, discussion_count, unresolved_count, created_at_iso, updated_at_iso, web_url, project_path, reviewers
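Expanding and validating a --fields value might look like the following sketch. It hard-codes the issue field list above and the single documented preset, minimal; the names are hypothetical, and rejecting unknown fields (rather than silently ignoring them) is an assumption.

```rust
/// Valid `--fields` names for `lore issues`, copied from the list above.
pub const ISSUE_FIELDS: &[&str] = &[
    "iid", "title", "state", "author_username", "labels", "assignees",
    "discussion_count", "unresolved_count", "created_at_iso", "updated_at_iso",
    "web_url", "project_path",
];

/// Expand a `--fields` value into an ordered, de-duplicated field list.
pub fn expand_fields(spec: &str, valid: &[&str]) -> Result<Vec<String>, String> {
    const MINIMAL: [&str; 4] = ["iid", "title", "state", "updated_at_iso"];
    if spec.trim() == "minimal" {
        return Ok(MINIMAL.iter().map(|f| f.to_string()).collect());
    }
    let mut out: Vec<String> = Vec::new();
    for name in spec.split(',').map(str::trim).filter(|f| !f.is_empty()) {
        if !valid.contains(&name) {
            return Err(format!("unknown field: {name}"));
        }
        // Keep only the first occurrence, preserving the caller's order.
        if !out.iter().any(|f| f == name) {
            out.push(name.to_string());
        }
    }
    Ok(out)
}
```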

Agent Self-Discovery

The robot-docs command provides a complete machine-readable manifest including response schemas for every command:

lore robot-docs | jq '.data.commands.issues.response_schema'

Each command entry includes response_schema describing the shape of its JSON response, fields_presets for commands supporting --fields, and copy-paste example invocations.

Exit Codes

Code  Meaning
0     Success
1     Internal error / health check failed / not implemented
2     Usage error (invalid flags or arguments)
3     Config invalid
4     Token not set
5     GitLab auth failed
6     Resource not found
7     Rate limited
8     Network error
9     Database locked
10    Database error
11    Migration failed
12    I/O error
13    Transform error
14    Ollama unavailable
15    Ollama model not found
16    Embedding failed
17    Not found (entity does not exist)
18    Ambiguous match (use -p to specify project)
19    Health check failed
20    Config not found
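Agents and scripts can branch on these codes. A hypothetical Rust mapping is sketched below; only the numeric values come from the exit code table, while the variant names and the is_transient classification (which failures might clear on a plain retry) are assumptions.

```rust
use std::process::ExitCode;

/// Hypothetical error kinds mirroring the exit code table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Internal, Usage, ConfigInvalid, TokenNotSet, GitLabAuthFailed,
    ResourceNotFound, RateLimited, Network, DatabaseLocked, Database,
    MigrationFailed, Io, Transform, OllamaUnavailable, OllamaModelNotFound,
    EmbeddingFailed, NotFound, AmbiguousMatch, HealthCheckFailed, ConfigNotFound,
}

impl ErrorKind {
    /// Numeric exit code as listed in the table.
    pub fn code(self) -> u8 {
        match self {
            Self::Internal => 1,
            Self::Usage => 2,
            Self::ConfigInvalid => 3,
            Self::TokenNotSet => 4,
            Self::GitLabAuthFailed => 5,
            Self::ResourceNotFound => 6,
            Self::RateLimited => 7,
            Self::Network => 8,
            Self::DatabaseLocked => 9,
            Self::Database => 10,
            Self::MigrationFailed => 11,
            Self::Io => 12,
            Self::Transform => 13,
            Self::OllamaUnavailable => 14,
            Self::OllamaModelNotFound => 15,
            Self::EmbeddingFailed => 16,
            Self::NotFound => 17,
            Self::AmbiguousMatch => 18,
            Self::HealthCheckFailed => 19,
            Self::ConfigNotFound => 20,
        }
    }

    /// Process exit code suitable for returning from `main`.
    pub fn exit_code(self) -> ExitCode {
        ExitCode::from(self.code())
    }

    /// Assumption: conditions that may clear on retry without user action.
    pub fn is_transient(self) -> bool {
        matches!(
            self,
            Self::RateLimited | Self::Network | Self::DatabaseLocked | Self::OllamaUnavailable
        )
    }
}
```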

Configuration Precedence

Settings are resolved in this order (highest to lowest priority):

  1. CLI flags (--robot, --config, --color)
  2. Environment variables (LORE_ROBOT, GITLAB_TOKEN, LORE_CONFIG_PATH)
  3. Config file (~/.config/lore/config.json)
  4. Built-in defaults

Global Options

lore -c /path/to/config.json <command>    # Use alternate config
lore --robot <command>                    # Machine-readable JSON
lore -J <command>                         # JSON shorthand
lore --color never <command>              # Disable color output
lore --color always <command>             # Force color output
lore -q <command>                         # Suppress non-essential output
lore -v <command>                         # Debug logging
lore -vv <command>                        # More verbose debug logging
lore -vvv <command>                       # Trace-level logging
lore --log-format json <command>          # JSON-formatted log output to stderr

Color output respects NO_COLOR and CLICOLOR environment variables in auto mode (the default).

Shell Completions

Generate shell completions for tab-completion support:

# Bash (bash-completion loads this directory automatically)
lore completions bash > ~/.local/share/bash-completion/completions/lore

# Zsh (add to ~/.zshrc before compinit: fpath=(~/.zfunc $fpath))
lore completions zsh > ~/.zfunc/_lore

# Fish
lore completions fish > ~/.config/fish/completions/lore.fish

# PowerShell (add to $PROFILE)
lore completions powershell >> $PROFILE

Database Schema

Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:

Table                       Purpose
projects                    Tracked GitLab projects with metadata
issues                      Issue metadata (title, state, author, due date, milestone)
merge_requests              MR metadata (title, state, draft, branches, merge status, commit SHAs)
milestones                  Project milestones with state and due dates
labels                      Project labels with colors
issue_labels                Many-to-many issue-label relationships
issue_assignees             Many-to-many issue-assignee relationships
mr_labels                   Many-to-many MR-label relationships
mr_assignees                Many-to-many MR-assignee relationships
mr_reviewers                Many-to-many MR-reviewer relationships
mr_file_changes             Files touched by each MR (path, change type, renames)
discussions                 Issue/MR discussion threads
notes                       Individual notes within discussions (with system note flag and DiffNote position data)
resource_state_events       Issue/MR state change history (opened, closed, merged, reopened)
resource_label_events       Label add/remove events with actor and timestamp
resource_milestone_events   Milestone add/remove events with actor and timestamp
entity_references           Cross-references between entities (MR closes issue, mentioned in, etc.)
documents                   Extracted searchable text for FTS and embedding
documents_fts               FTS5 full-text search index
embeddings                  Vector embeddings for semantic search
dirty_sources               Entities needing document regeneration after ingest
pending_discussion_fetches  Queue for discussion fetch operations
sync_runs                   Audit trail of sync operations
sync_cursors                Cursor positions for incremental sync
app_locks                   Crash-safe single-flight lock
raw_payloads                Compressed original API responses
schema_version              Migration version tracking

The database is stored at ~/.local/share/lore/lore.db by default (XDG compliant).

Timeline Pipeline

The timeline pipeline reconstructs chronological event histories for GitLab entities by combining full-text search, cross-reference graph traversal, and resource event aggregation. Given a search query, it identifies relevant issues and MRs, discovers related entities through their reference graph, and assembles a unified, time-ordered event stream.

Stages

The pipeline executes in five stages:

  1. SEED -- Full-text search identifies the most relevant issues and MRs matching the query. Documents (issue bodies, MR descriptions, discussion notes) are ranked by BM25 relevance.

  2. HYDRATE -- Evidence notes are extracted from the seed results: the top FTS-matched discussion notes with 200-character snippets that explain why each entity was surfaced.

  3. EXPAND -- Breadth-first traversal over the entity_references graph discovers related entities. Starting from seed entities, the pipeline follows "closes", "related", and optionally "mentioned" references up to a configurable depth, tracking provenance (which entity referenced which, via what method).

  4. COLLECT -- Events are gathered for all discovered entities (seeds + expanded). Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking (timestamp, then entity ID, then event type).

  5. RENDER -- Events are formatted for output as human-readable text or structured JSON.
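The EXPAND stage is a depth-limited breadth-first search. The sketch below records provenance for each newly discovered entity and skips mentioned edges unless asked to include them. It assumes the reference graph is available as an adjacency map keyed by entity ID; all type and function names are hypothetical, and edges are followed only in the direction given (adding reverse edges would walk the graph both ways).

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct EntityId(pub i64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefKind {
    Closes,
    Related,
    Mentioned,
}

/// How an expanded entity was reached: from which entity, by which reference, at what depth.
#[derive(Debug, Clone, PartialEq)]
pub struct Provenance {
    pub via: EntityId,
    pub kind: RefKind,
    pub depth: u32,
}

/// Breadth-first expansion from `seeds` up to `max_depth` hops.
/// Seeds map to `None`; every other discovered entity maps to its provenance.
pub fn expand(
    seeds: &[EntityId],
    edges: &HashMap<EntityId, Vec<(EntityId, RefKind)>>,
    max_depth: u32,
    include_mentioned: bool,
) -> HashMap<EntityId, Option<Provenance>> {
    let mut found: HashMap<EntityId, Option<Provenance>> = HashMap::new();
    let mut queue: VecDeque<(EntityId, u32)> = VecDeque::new();
    for &seed in seeds {
        // Enqueue each seed once, even if it is listed twice.
        if found.insert(seed, None).is_none() {
            queue.push_back((seed, 0));
        }
    }
    while let Some((node, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue; // depth limit reached; do not follow this node's edges
        }
        let neighbors = edges.get(&node).map(Vec::as_slice).unwrap_or(&[]);
        for &(next, kind) in neighbors {
            if kind == RefKind::Mentioned && !include_mentioned {
                continue;
            }
            // BFS reaches shallower paths first, so the first provenance kept is a shortest one.
            if !found.contains_key(&next) {
                found.insert(next, Some(Provenance { via: node, kind, depth: depth + 1 }));
                queue.push_back((next, depth + 1));
            }
        }
    }
    found
}
```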

Event Types

Event             Description
Created           Entity creation
StateChanged      State transitions (opened, closed, reopened)
LabelAdded        Label applied to entity
LabelRemoved      Label removed from entity
MilestoneSet      Milestone assigned
MilestoneRemoved  Milestone removed
Merged            MR merged (deduplicated against state events)
NoteEvidence      Discussion note matched by FTS, with snippet
CrossReferenced   Reference to another entity
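The COLLECT ordering (timestamp, then entity ID, then event type) maps naturally onto a derived Ord. In the sketch below the event type order is assumed to follow the event types table, and the Merged deduplication is assumed to mean dropping a redundant StateChanged("merged") event for an MR that already has a Merged event; the types and function name are hypothetical.

```rust
use std::collections::HashSet;

/// Event kinds in tie-break order (assumed to follow the table's order).
/// Derived `Ord` compares the variant position first, then the payload.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum EventKind {
    Created,
    StateChanged(String),
    LabelAdded(String),
    LabelRemoved(String),
    MilestoneSet(String),
    MilestoneRemoved(String),
    Merged,
    NoteEvidence(String), // snippet text
    CrossReferenced(i64), // referenced entity ID
}

#[derive(Debug, Clone, PartialEq)]
pub struct TimelineEvent {
    pub ts_ms: i64,
    pub entity_id: i64,
    pub kind: EventKind,
}

pub fn order_events(mut events: Vec<TimelineEvent>) -> Vec<TimelineEvent> {
    // Entities that already have an explicit Merged event.
    let merged: HashSet<i64> = events
        .iter()
        .filter(|e| e.kind == EventKind::Merged)
        .map(|e| e.entity_id)
        .collect();
    // Drop the redundant "merged" state transition for those entities.
    events.retain(|e| {
        let dup = matches!(&e.kind, EventKind::StateChanged(s) if s == "merged");
        !(dup && merged.contains(&e.entity_id))
    });
    // Sort by (timestamp, entity ID, event kind); ties beyond that keep input order.
    events.sort_by(|a, b| (a.ts_ms, a.entity_id, &a.kind).cmp(&(b.ts_ms, b.entity_id, &b.kind)));
    events
}
```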

Unresolved References

When the graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the pipeline output. This enables discovery of external dependencies and can inform future sync targets.

Development

# Run tests
cargo test

# Run with debug logging
RUST_LOG=lore=debug lore issues

# Run with trace logging
RUST_LOG=lore=trace lore ingest issues

# Check formatting
cargo fmt --check

# Lint
cargo clippy

Tech Stack

  • Rust (2024 edition)
  • SQLite via rusqlite (bundled) with FTS5 and sqlite-vec
  • Ollama for vector embeddings (nomic-embed-text)
  • clap for CLI parsing
  • reqwest for HTTP
  • tokio for async runtime
  • serde for serialization
  • tracing for logging
  • indicatif for progress bars

License

MIT
