# Gitlore

Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, and hybrid search.

## Features

- **Local-first**: All data stored in SQLite for instant queries
- **Incremental sync**: Cursor-based sync fetches only the changes made since the last sync
- **Full re-sync**: Reset cursors and fetch all data from scratch when needed
- **Multi-project**: Track issues and MRs across multiple GitLab projects
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
- **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
- **Raw payload storage**: Preserves original GitLab API responses for debugging
- **Discussion threading**: Full support for issue and MR discussions including inline code review comments
- **Cross-reference tracking**: Automatic extraction of "closes" and "mentioned" relationships between MRs and issues
- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
- **Robot mode**: Machine-readable JSON output with structured errors and meaningful exit codes
- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing

## Installation

```bash
cargo install --path .
```

Or build from source:

```bash
cargo build --release
./target/release/lore --help
```

## Quick Start

```bash
# Initialize configuration (interactive)
lore init

# Verify authentication
lore auth

# Sync everything from GitLab (issues + MRs + docs + embeddings)
lore sync

# List recent issues
lore issues -n 10

# List open merge requests
lore mrs -s opened

# Show issue details
lore issues 123

# Show MR details with discussions
lore mrs 456

# Search across all indexed data
lore search "authentication bug"

# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .
```

## Configuration

Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lore/config.json`).

### Example Configuration

```json
{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" },
    { "path": "other-group/other-project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10,
    "heartbeatIntervalSeconds": 30,
    "cursorRewindSeconds": 2,
    "primaryConcurrency": 4,
    "dependentConcurrency": 2
  },
  "storage": {
    "compressRawPayloads": true
  },
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "baseUrl": "http://localhost:11434",
    "concurrency": 4
  }
}
```

### Configuration Options

| Section | Field | Default | Description |
|---------|-------|---------|-------------|
| `gitlab` | `baseUrl` | -- | GitLab instance URL (required) |
| `gitlab` | `tokenEnvVar` | `GITLAB_TOKEN` | Environment variable containing API token |
| `projects` | `path` | -- | Project path (e.g., `group/project`) |
| `sync` | `backfillDays` | `14` | Days to backfill on initial sync |
| `sync` | `staleLockMinutes` | `10` | Minutes before the sync lock is considered stale |
| `sync` | `heartbeatIntervalSeconds` | `30` | Frequency of lock heartbeat updates |
| `sync` | `cursorRewindSeconds` | `2` | Seconds to rewind cursor for overlap safety |
| `sync` | `primaryConcurrency` | `4` | Concurrent GitLab requests for primary resources |
| `sync` | `dependentConcurrency` | `2` | Concurrent requests for dependent resources |
| `storage` | `dbPath` | `~/.local/share/lore/lore.db` | Database file path |
| `storage` | `backupDir` | `~/.local/share/lore/backups` | Backup directory |
| `storage` | `compressRawPayloads` | `true` | Compress stored API responses with gzip |
| `embedding` | `provider` | `ollama` | Embedding provider |
| `embedding` | `model` | `nomic-embed-text` | Model name for embeddings |
| `embedding` | `baseUrl` | `http://localhost:11434` | Ollama server URL |
| `embedding` | `concurrency` | `4` | Concurrent embedding requests |
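
To make the `sync` defaults concrete, the sketch below mirrors them in a Rust struct and shows one way `staleLockMinutes` and `cursorRewindSeconds` might be applied. The struct, field, and function names are hypothetical, and the comparison rules (for example, treating a lock exactly at the threshold as stale) are assumptions rather than Gitlore's actual implementation.

```rust
use std::time::Duration;

/// Hypothetical mirror of the `sync` config section, using the documented defaults.
#[derive(Debug, Clone, PartialEq)]
pub struct SyncSettings {
    pub backfill_days: u64,
    pub stale_lock_minutes: u64,
    pub heartbeat_interval_seconds: u64,
    pub cursor_rewind_seconds: u64,
    pub primary_concurrency: usize,
    pub dependent_concurrency: usize,
}

impl Default for SyncSettings {
    fn default() -> Self {
        Self {
            backfill_days: 14,
            stale_lock_minutes: 10,
            heartbeat_interval_seconds: 30,
            cursor_rewind_seconds: 2,
            primary_concurrency: 4,
            dependent_concurrency: 2,
        }
    }
}

/// Assumption: a lock is stale once its last heartbeat is at least
/// `stale_lock_minutes` old.
pub fn is_lock_stale(settings: &SyncSettings, since_last_heartbeat: Duration) -> bool {
    since_last_heartbeat >= Duration::from_secs(settings.stale_lock_minutes * 60)
}

/// Assumption: the next incremental fetch starts `cursor_rewind_seconds`
/// before the stored cursor (Unix seconds), so updates that landed right at
/// the boundary are read again instead of being missed.
pub fn rewound_cursor(settings: &SyncSettings, cursor_unix_secs: u64) -> u64 {
    cursor_unix_secs.saturating_sub(settings.cursor_rewind_seconds)
}
```

With the defaults, a heartbeat that is 600 seconds old would count as stale under this rule, and a cursor at `1_700_000_000` would be rewound to `1_699_999_998`.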

### Config File Resolution

The config file is resolved in this order:
1. `--config` / `-c` CLI flag
2. `LORE_CONFIG_PATH` environment variable
3. `~/.config/lore/config.json` (XDG default)
4. `./lore.config.json` (local fallback for development)
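
The lookup order above can be sketched as a small pure function. This is an illustration only: `resolve_config_path` and its parameters are hypothetical names, and the sketch assumes that an explicit flag or `LORE_CONFIG_PATH` is used as given, while the XDG default is chosen only when the file exists.

```rust
use std::path::{Path, PathBuf};

/// Sketch of the documented lookup order. Inputs are passed in explicitly
/// (rather than read from the process environment) so the logic is easy to test.
pub fn resolve_config_path(
    cli_flag: Option<&str>,
    lore_config_path: Option<&str>,
    xdg_config_home: Option<&str>,
    home: &str,
    exists: impl Fn(&Path) -> bool,
) -> PathBuf {
    // 1. `--config` / `-c` CLI flag.
    if let Some(p) = cli_flag {
        return PathBuf::from(p);
    }
    // 2. `LORE_CONFIG_PATH` environment variable.
    if let Some(p) = lore_config_path {
        return PathBuf::from(p);
    }
    // 3. XDG default: `$XDG_CONFIG_HOME/lore/config.json`, falling back to
    //    `~/.config/lore/config.json` when the variable is unset or empty.
    let base = match xdg_config_home {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => Path::new(home).join(".config"),
    };
    let xdg_default = base.join("lore").join("config.json");
    if exists(xdg_default.as_path()) {
        return xdg_default;
    }
    // 4. Local fallback for development.
    PathBuf::from("./lore.config.json")
}
```

In a real binary the `exists` argument could simply be `|p| p.exists()`; passing it in keeps the sketch free of filesystem access.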

### GitLab Token

Create a personal access token with `read_api` scope:

1. Go to GitLab > Settings > Access Tokens
2. Create token with `read_api` scope
3. Export it: `export GITLAB_TOKEN=glpat-xxxxxxxxxxxx`

## Environment Variables

| Variable | Purpose | Required |
|----------|---------|----------|
| `GITLAB_TOKEN` | GitLab API authentication token (name configurable via `gitlab.tokenEnvVar`) | Yes |
| `LORE_CONFIG_PATH` | Override config file location | No |
| `LORE_ROBOT` | Enable robot mode globally (set to `true` or `1`) | No |
| `XDG_CONFIG_HOME` | XDG Base Directory for config (fallback: `~/.config`) | No |
| `XDG_DATA_HOME` | XDG Base Directory for data (fallback: `~/.local/share`) | No |
| `NO_COLOR` | Disable color output when set (any value) | No |
| `CLICOLOR` | Standard color control (0 to disable) | No |
| `RUST_LOG` | Logging level filter (e.g., `lore=debug`) | No |

## Commands

### `lore issues`

Query issues from local database, or show a specific issue.

```bash
lore issues                           # Recent issues (default 50)
lore issues 123                       # Show issue #123 with discussions
lore issues 123 -p group/repo        # Disambiguate by project
lore issues -n 100                    # More results
lore issues -s opened                 # Only open issues
lore issues -s closed                 # Only closed issues
lore issues -a username               # By author (@ prefix optional)
lore issues -A username               # By assignee (@ prefix optional)
lore issues -l bug                    # By label (AND logic)
lore issues -l bug -l urgent          # Multiple labels
lore issues -m "v1.0"                 # By milestone title
lore issues --since 7d               # Updated in last 7 days
lore issues --since 2w               # Updated in last 2 weeks
lore issues --since 1m               # Updated in last month
lore issues --since 2024-01-01       # Updated since date
lore issues --due-before 2024-12-31  # Due before date
lore issues --has-due                 # Only issues with due dates
lore issues -p group/repo            # Filter by project
lore issues --sort created --asc     # Sort by created date, ascending
lore issues -o                        # Open first result in browser
```

When listing, output includes: IID, title, state, author, assignee, labels, and update time.

When showing a single issue (e.g., `lore issues 123`), output includes: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.
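
The `--since` values shown above come in two shapes: a relative window (`7d`, `2w`, `1m`) or an absolute `YYYY-MM-DD` date. A minimal parsing sketch follows. The `Since` type and `parse_since` function are hypothetical, and treating `w` as 7 days and `m` as 30 days is an assumption; Gitlore may define a month differently.

```rust
/// Parsed form of a `--since` style argument (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Since {
    /// Relative window, expressed as a number of days before now.
    DaysAgo(u32),
    /// Absolute calendar date; only basic range checks are applied.
    Date { year: u32, month: u32, day: u32 },
}

/// Parse a non-empty run of ASCII digits, rejecting signs and whitespace.
fn parse_digits(s: &str) -> Option<u32> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

/// Assumption: `w` means 7 days and `m` means 30 days.
pub fn parse_since(input: &str) -> Option<Since> {
    let s = input.trim();

    // Relative form: digits followed by a single unit letter.
    if let Some(unit) = s.chars().last().filter(|c| matches!(*c, 'd' | 'w' | 'm')) {
        let n = parse_digits(&s[..s.len() - 1])?;
        let days_per_unit = match unit {
            'd' => 1,
            'w' => 7,
            _ => 30,
        };
        return n.checked_mul(days_per_unit).map(Since::DaysAgo);
    }

    // Absolute form: YYYY-MM-DD.
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 || parts[0].len() != 4 || parts[1].len() != 2 || parts[2].len() != 2 {
        return None;
    }
    let year = parse_digits(parts[0])?;
    let month = parse_digits(parts[1])?;
    let day = parse_digits(parts[2])?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(Since::Date { year, month, day })
}
```

Under these assumptions `parse_since("2w")` yields `Some(Since::DaysAgo(14))`, and anything that fits neither shape yields `None`.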

#### Project Resolution

The `-p` / `--project` flag uses cascading match logic across all commands:

1. **Exact match**: `group/project`
2. **Case-insensitive**: `Group/Project`
3. **Suffix match**: `project` matches `group/project` (if unambiguous)
4. **Substring match**: `typescript` matches `vs/typescript-code` (if unambiguous)

If multiple projects match, an error lists the candidates with a hint to use the full path.
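
The cascade can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not Gitlore's code: `Resolution`, `pick`, and `resolve_project` are hypothetical names, suffix matching is assumed to happen on a `/` boundary, and the sketch assumes the cascade stops at the first stage that produces any match (reporting ambiguity there rather than falling through).

```rust
/// Outcome of resolving a `-p` / `--project` argument (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Resolution {
    Found(String),
    Ambiguous(Vec<String>),
    NotFound,
}

/// Run one stage: `None` means "no match here, try the next stage".
fn pick(projects: &[&str], pred: impl Fn(&str) -> bool) -> Option<Resolution> {
    let hits: Vec<String> = projects
        .iter()
        .copied()
        .filter(|p| pred(*p))
        .map(String::from)
        .collect();
    match hits.len() {
        0 => None,
        1 => Some(Resolution::Found(hits[0].clone())),
        _ => Some(Resolution::Ambiguous(hits)),
    }
}

pub fn resolve_project(query: &str, projects: &[&str]) -> Resolution {
    let q = query.to_lowercase();
    // 1. Exact match.
    pick(projects, |p| p == query)
        // 2. Case-insensitive match.
        .or_else(|| pick(projects, |p| p.to_lowercase() == q))
        // 3. Suffix match on a path-segment boundary.
        .or_else(|| pick(projects, |p| p.to_lowercase().ends_with(format!("/{q}").as_str())))
        // 4. Substring match.
        .or_else(|| pick(projects, |p| p.to_lowercase().contains(q.as_str())))
        .unwrap_or(Resolution::NotFound)
}
```

Given the projects `group/project` and `vs/typescript-code`, the query `project` resolves at the suffix stage and `typescript` at the substring stage. A query that matches several projects at the same stage returns `Ambiguous` with the candidates, which corresponds to the error described above.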

### `lore mrs`

Query merge requests from local database, or show a specific MR.

```bash
lore mrs                              # Recent MRs (default 50)
lore mrs 456                          # Show MR !456 with discussions
lore mrs 456 -p group/repo           # Disambiguate by project
lore mrs -n 100                       # More results
lore mrs -s opened                    # Only open MRs
lore mrs -s merged                    # Only merged MRs
lore mrs -s closed                    # Only closed MRs
lore mrs -s locked                    # Only locked MRs
lore mrs -s all                       # All states
lore mrs -a username                  # By author (@ prefix optional)
lore mrs -A username                  # By assignee (@ prefix optional)
lore mrs -r username                  # By reviewer (@ prefix optional)
lore mrs -d                           # Only draft/WIP MRs
lore mrs -D                           # Exclude draft MRs
lore mrs --target main               # By target branch
lore mrs --source feature/foo        # By source branch
lore mrs -l needs-review              # By label (AND logic)
lore mrs --since 7d                  # Updated in last 7 days
lore mrs -p group/repo               # Filter by project
lore mrs --sort created --asc        # Sort by created date, ascending
lore mrs -o                           # Open first result in browser
```

When listing, output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.

When showing a single MR (e.g., `lore mrs 456`), output includes: title, description, state, draft status, author, assignees, reviewers, labels, source/target branches, merge status, web URL, and threaded discussions. Inline code review comments (DiffNotes) display file context in the format `[src/file.ts:45]`.

### `lore search`

Search across indexed documents using hybrid (lexical + semantic), lexical-only, or semantic-only modes.

```bash
lore search "authentication bug"              # Hybrid search (default)
lore search "login flow" --mode lexical       # FTS5 lexical only
lore search "login flow" --mode semantic      # Vector similarity only
lore search "auth" --type issue               # Filter by source type
lore search "auth" --type mr                  # MR documents only
lore search "auth" --type discussion          # Discussion documents only
lore search "deploy" --author username        # Filter by author
lore search "deploy" -p group/repo           # Filter by project
lore search "deploy" --label backend          # Filter by label (AND logic)
lore search "deploy" --path src/             # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d              # Created after (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w      # Updated after
lore search "deploy" -n 50                    # Limit results (default 20, max 100)
lore search "deploy" --explain               # Show ranking explanation per result
lore search "deploy" --fts-mode raw          # Raw FTS5 query syntax (advanced)
```

Search requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes additionally require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.
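
Hybrid mode merges the lexical and semantic rankings with Reciprocal Rank Fusion (RRF). The sketch below shows the general technique: each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed. The function name, the use of numeric document IDs, and `k = 60` (a commonly used constant for RRF) are assumptions; Gitlore's actual constant and any weighting are not documented here.

```rust
use std::collections::HashMap;

/// Fuse ranked result lists with Reciprocal Rank Fusion:
/// score(d) = sum over lists of 1 / (k + rank(d)), using 1-based ranks.
pub fn reciprocal_rank_fusion(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (index, doc_id) in list.iter().enumerate() {
            let rank = (index + 1) as f64;
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by document ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

For example, fusing a lexical ranking `[1, 2, 3]` with a semantic ranking `[3, 1, 4]` at `k = 60.0` orders the documents 1, 3, 2, 4: documents that appear in both lists outrank those found by only one.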

### `lore sync`

Run the full sync pipeline: ingest from GitLab, generate searchable documents, and compute embeddings.

```bash
lore sync                    # Full pipeline
lore sync --full             # Reset cursors, fetch everything
lore sync --force            # Override stale lock
lore sync --no-embed         # Skip embedding step
lore sync --no-docs          # Skip document regeneration
lore sync --no-events        # Skip resource event fetching
```

The sync command displays animated progress bars for each stage and outputs timing metrics on completion. In robot mode (`-J`), detailed stage timing is included in the JSON response.

### `lore ingest`

Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings).

```bash
lore ingest                                    # Ingest everything (issues + MRs)
lore ingest issues                             # Issues only
lore ingest mrs                                # MRs only
lore ingest issues -p group/repo              # Single project
lore ingest --force                            # Override stale lock
lore ingest --full                             # Full re-sync (reset cursors)
```

The `--full` flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:
- Assignee data or other fields were missing from earlier syncs
- You want to ensure complete data after schema changes
- Troubleshooting sync issues

### `lore generate-docs`

Extract searchable documents from ingested issues, MRs, and discussions for the FTS5 index.

```bash
lore generate-docs                    # Incremental (dirty items only)
lore generate-docs --full             # Full rebuild
lore generate-docs -p group/repo     # Single project
```

### `lore embed`

Generate vector embeddings for documents via Ollama. Requires Ollama running with the configured embedding model.

```bash
lore embed                    # Embed new/changed documents
lore embed --retry-failed     # Retry previously failed embeddings
```

### `lore count`

Count entities in local database.

```bash
lore count issues                     # Total issues
lore count mrs                        # Total MRs (with state breakdown)
lore count discussions                # Total discussions
lore count discussions --for issue   # Issue discussions only
lore count discussions --for mr      # MR discussions only
lore count notes                      # Total notes (system vs user breakdown)
lore count notes --for issue         # Issue notes only
```

### `lore stats`

Show document and index statistics, with optional integrity checks.

```bash
lore stats                    # Document and index statistics
lore stats --check            # Run integrity checks
lore stats --check --repair   # Repair integrity issues
```

### `lore status`

Show current sync state and watermarks.

```bash
lore status
```

Displays:
- Last sync run details (status, timing)
- Cursor positions per project and resource type (issues and MRs)
- Data summary counts

### `lore init`

Initialize configuration and database interactively.

```bash
lore init                    # Interactive setup
lore init --force            # Overwrite existing config
lore init --non-interactive  # Fail if prompts needed
```

### `lore auth`

Verify GitLab authentication is working.

```bash
lore auth
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com
```

### `lore doctor`

Check environment health and configuration.

```bash
lore doctor
```

Checks performed:
- Config file existence and validity
- Database existence and pragmas (WAL mode, foreign keys)
- GitLab authentication
- Project accessibility
- Ollama connectivity (optional)

### `lore migrate`

Run pending database migrations.

```bash
lore migrate
```

### `lore health`

Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 1 if unhealthy.

```bash
lore health
```

Useful as a fast gate before running queries or syncs. For a more thorough check including authentication and project access, use `lore doctor`.

### `lore robot-docs`

Machine-readable command manifest for agent self-discovery. Returns a JSON schema of all commands, flags, exit codes, and example workflows.

```bash
lore robot-docs                   # Pretty-printed JSON
lore --robot robot-docs           # Compact JSON for parsing
```

### `lore version`

Show version information including the git commit hash.

```bash
lore version
# lore version 0.1.0 (abc1234)
```

## Robot Mode

Machine-readable JSON output for scripting and AI agent consumption.

### Activation

```bash
# Global flag
lore --robot issues -n 5

# JSON shorthand (-J)
lore -J issues -n 5

# Environment variable
LORE_ROBOT=1 lore issues -n 5

# Auto-detection (when stdout is not a TTY)
lore issues -n 5 | jq .
```
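
The four activation paths can be summarized as one decision function. This is a sketch with a hypothetical name and signature; it assumes that only the values `true` and `1` enable robot mode through `LORE_ROBOT`, as documented in the environment variable table.

```rust
/// Decide whether robot mode is active (hypothetical helper).
///
/// `robot_flag` covers both `--robot` and `-J`; `lore_robot_env` is the value
/// of `LORE_ROBOT` if set; `stdout_is_tty` is the result of a terminal check.
pub fn robot_mode_enabled(
    robot_flag: bool,
    lore_robot_env: Option<&str>,
    stdout_is_tty: bool,
) -> bool {
    // Explicit flag on the command line.
    if robot_flag {
        return true;
    }
    // Environment variable set to `true` or `1`.
    if matches!(lore_robot_env, Some("1") | Some("true")) {
        return true;
    }
    // Auto-detection: piped or redirected output.
    !stdout_is_tty
}
```

In a real binary, the terminal check could come from `std::io::IsTerminal` (`std::io::stdout().is_terminal()`), available in the standard library since Rust 1.70.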

### Response Format

All commands return consistent JSON:

```json
{"ok": true, "data": {...}, "meta": {...}}
```

Errors return structured JSON to stderr:

```json
{"error": {"code": "CONFIG_NOT_FOUND", "message": "...", "suggestion": "Run 'lore init'"}}
```

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Internal error / health check failed / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 6 | Resource not found |
| 7 | Rate limited |
| 8 | Network error |
| 9 | Database locked |
| 10 | Database error |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
| 14 | Ollama unavailable |
| 15 | Ollama model not found |
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 20 | Config not found |
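
A script or agent that wraps `lore` may want to translate these codes back into messages. The lookup below is a sketch with hypothetical function names. The meanings are copied from the table, while the choice of which codes might be worth retrying is an assumption and is not something Gitlore specifies.

```rust
/// Meaning of each documented exit code, copied from the table above.
/// Returns `None` for codes the table does not list (for example 19).
pub fn describe_exit_code(code: i32) -> Option<&'static str> {
    let meaning = match code {
        0 => "Success",
        1 => "Internal error / health check failed / not implemented",
        2 => "Usage error",
        3 => "Config invalid",
        4 => "Token not set",
        5 => "GitLab auth failed",
        6 => "Resource not found",
        7 => "Rate limited",
        8 => "Network error",
        9 => "Database locked",
        10 => "Database error",
        11 => "Migration failed",
        12 => "I/O error",
        13 => "Transform error",
        14 => "Ollama unavailable",
        15 => "Ollama model not found",
        16 => "Embedding failed",
        17 => "Not found (entity does not exist)",
        18 => "Ambiguous match",
        20 => "Config not found",
        _ => return None,
    };
    Some(meaning)
}

/// Assumption (not stated by Gitlore): rate limits, network errors, and a
/// locked database are the failures a caller might reasonably retry.
pub fn may_be_worth_retrying(code: i32) -> bool {
    matches!(code, 7 | 8 | 9)
}
```

A caller could obtain the code from `std::process::ExitStatus::code()` after running `lore` as a child process.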

## Configuration Precedence

Settings are resolved in this order (highest to lowest priority):

1. CLI flags (`--robot`, `--config`, `--color`)
2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`)
3. Config file (`~/.config/lore/config.json`)
4. Built-in defaults
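
The precedence chain amounts to "first source that provides a value wins". A generic sketch, with a hypothetical function name:

```rust
/// Pick a setting from the highest-priority source that provides one:
/// CLI flag, then environment variable, then config file, then the default.
pub fn resolve_setting<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}
```

For instance, `resolve_setting(None, Some("env"), Some("file"), "default")` returns `"env"`, because no CLI value was supplied and the environment outranks the config file.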

## Global Options

```bash
lore -c /path/to/config.json <command>   # Use alternate config
lore --robot <command>                    # Machine-readable JSON
lore -J <command>                         # JSON shorthand
lore --color never <command>              # Disable color output
lore --color always <command>             # Force color output
lore -q <command>                         # Suppress non-essential output
lore -v <command>                         # Debug logging
lore -vv <command>                        # More verbose debug logging
lore -vvv <command>                       # Trace-level logging
lore --log-format json <command>          # JSON-formatted log output to stderr
```

Color output respects `NO_COLOR` and `CLICOLOR` environment variables in `auto` mode (the default).
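
One way to read the interaction between `--color` and the environment is sketched below. The type and function names are hypothetical. The sketch follows this README in treating any value of `NO_COLOR` as "disable" and `CLICOLOR=0` as "disable", and it assumes that `auto` mode otherwise enables color only when stdout is a terminal.

```rust
/// Value of the `--color` option (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColorMode {
    Auto,
    Always,
    Never,
}

/// Decide whether to emit color. Environment values are passed in explicitly
/// so the decision can be tested without touching the process environment.
pub fn use_color(
    mode: ColorMode,
    no_color: Option<&str>,
    clicolor: Option<&str>,
    stdout_is_tty: bool,
) -> bool {
    match mode {
        // Explicit flags win over the environment.
        ColorMode::Always => true,
        ColorMode::Never => false,
        ColorMode::Auto => {
            // `NO_COLOR` disables color when set to any value.
            if no_color.is_some() {
                return false;
            }
            // `CLICOLOR=0` disables color.
            if clicolor == Some("0") {
                return false;
            }
            // Assumption: otherwise, color only when writing to a terminal.
            stdout_is_tty
        }
    }
}
```

Under these assumptions, `lore --color always` still produces color when `NO_COLOR` is set, because the environment variables are consulted only in `auto` mode.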

## Shell Completions

Generate shell completions for tab-completion support:

```bash
# Bash (add to ~/.bashrc)
lore completions bash > ~/.local/share/bash-completion/completions/lore

# Zsh (add to ~/.zshrc: fpath=(~/.zfunc $fpath))
lore completions zsh > ~/.zfunc/_lore

# Fish
lore completions fish > ~/.config/fish/completions/lore.fish

# PowerShell (add to $PROFILE)
lore completions powershell >> $PROFILE
```

## Database Schema

Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:

| Table | Purpose |
|-------|---------|
| `projects` | Tracked GitLab projects with metadata |
| `issues` | Issue metadata (title, state, author, due date, milestone) |
| `merge_requests` | MR metadata (title, state, draft, branches, merge status) |
| `milestones` | Project milestones with state and due dates |
| `labels` | Project labels with colors |
| `issue_labels` | Many-to-many issue-label relationships |
| `issue_assignees` | Many-to-many issue-assignee relationships |
| `mr_labels` | Many-to-many MR-label relationships |
| `mr_assignees` | Many-to-many MR-assignee relationships |
| `mr_reviewers` | Many-to-many MR-reviewer relationships |
| `discussions` | Issue/MR discussion threads |
| `notes` | Individual notes within discussions (with system note flag and DiffNote position data) |
| `resource_state_events` | Issue/MR state change history (opened, closed, merged, reopened) |
| `resource_label_events` | Label add/remove events with actor and timestamp |
| `resource_milestone_events` | Milestone add/remove events with actor and timestamp |
| `entity_references` | Cross-references between entities (MR closes issue, mentioned in, etc.) |
| `documents` | Extracted searchable text for FTS and embedding |
| `documents_fts` | FTS5 full-text search index |
| `embeddings` | Vector embeddings for semantic search |
| `dirty_sources` | Entities needing document regeneration after ingest |
| `pending_discussion_fetches` | Queue for discussion fetch operations |
| `sync_runs` | Audit trail of sync operations |
| `sync_cursors` | Cursor positions for incremental sync |
| `app_locks` | Crash-safe single-flight lock |
| `raw_payloads` | Compressed original API responses |
| `schema_version` | Migration version tracking |

The database is stored at `~/.local/share/lore/lore.db` by default, following the XDG Base Directory convention (override with `storage.dbPath`).

## Development

```bash
# Run tests
cargo test

# Run with debug logging
RUST_LOG=lore=debug lore issues

# Run with trace logging
RUST_LOG=lore=trace lore ingest issues

# Check formatting
cargo fmt --check

# Lint
cargo clippy
```

## Tech Stack

- **Rust** (2024 edition)
- **SQLite** via rusqlite (bundled) with FTS5 and sqlite-vec
- **Ollama** for vector embeddings (nomic-embed-text)
- **clap** for CLI parsing
- **reqwest** for HTTP
- **tokio** for async runtime
- **serde** for serialization
- **tracing** for logging
- **indicatif** for progress bars

## License

MIT