What is Gitlore?
Gitlore (lore) is a read-only local sync engine for GitLab data. It fetches issues, merge requests, and discussions from the GitLab REST API v4 and stores them in a local SQLite database for fast offline querying.
~11,148 lines of Rust. Noun-first CLI. Robot mode for automation. Cursor-based incremental sync.
API Surface Coverage
Gitlore uses a small, focused subset of the GitLab API. It is read-only — it never creates, updates, or deletes anything on GitLab.
Data Flow
GitLab API (10 req/s + jitter) → Paginated fetch → Normalize data → Local DB
Key Design Decisions
Only GET requests. Never mutates GitLab state. Safe to run repeatedly.
Uses updated_after parameter to only fetch changed data. 2-second rewind overlap for safety.
Stores original JSON responses with SHA-256 dedup and optional gzip compression.
Discussions use a DELETE+INSERT strategy per parent (full refresh, no incremental sync). Parallel prefetch, serial write.
CLI Architecture
Command Structure (Noun-First)
lore <noun> [verb/arg]    # Primary pattern
lore issues               # List all issues
lore issues 42            # Show issue #42
lore mrs                  # List all merge requests
lore mrs 17               # Show MR #17
lore ingest issues        # Fetch issues from GitLab
lore ingest mrs           # Fetch MRs from GitLab
lore count issues         # Count local issues
lore count discussions    # Count local discussions
lore status               # Show sync state
lore auth                 # Verify GitLab auth
lore doctor               # Health check
lore init                 # Initialize config + DB
lore migrate              # Run DB migrations
lore version              # Show version
Global Flags
| Flag | Description |
|---|---|
| -c, --config | Path to config file |
| --robot | Machine-readable JSON output |
| -J, --json | JSON shorthand (same as --robot) |
Robot Mode Detection
Three ways to activate:
lore --robot list issues         # Explicit flag
lore list issues | jq .          # Auto: stdout not a TTY
LORE_ROBOT=1 lore list issues    # Environment variable
Robot mode returns JSON: {"ok":true,"data":{...},"meta":{...}}
Errors go to stderr: {"error":{"code":"...","message":"...","suggestion":"..."}}
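The three activation paths can be folded into a single predicate. The following is a minimal sketch, not Gitlore's actual code: the function names are hypothetical, and the rule that `LORE_ROBOT` counts as set for any non-empty value other than `0` is an assumption.

```rust
use std::io::IsTerminal;

/// Decide whether to emit machine-readable JSON.
/// Any one of the three documented triggers is sufficient.
pub fn robot_mode(flag: bool, env_value: Option<&str>, stdout_is_tty: bool) -> bool {
    // Assumption: any non-empty value other than "0" enables robot mode.
    let env_on = matches!(env_value, Some(v) if !v.is_empty() && v != "0");
    flag || env_on || !stdout_is_tty
}

/// Convenience wrapper that reads the real process state.
/// `flag` is the parsed value of --robot / --json.
pub fn robot_mode_from_process(flag: bool) -> bool {
    let env_value = std::env::var("LORE_ROBOT").ok();
    robot_mode(flag, env_value.as_deref(), std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function makes it easy to test without a real terminal; only the wrapper touches the environment and stdout.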
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Internal error |
| 2 | Config not found |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 6 | Resource not found |
| 7 | Rate limited |
| 8 | Network error |
| 9 | Database locked |
| 10 | Database error |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
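One way to keep the table above in sync with the code is a single error enum that owns the mapping. This is a sketch only; the variant names are hypothetical and chosen to mirror the table, not taken from Gitlore's source.

```rust
/// Hypothetical error categories, one per documented exit code.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoreError {
    Internal,
    ConfigNotFound,
    ConfigInvalid,
    TokenNotSet,
    GitLabAuthFailed,
    ResourceNotFound,
    RateLimited,
    Network,
    DatabaseLocked,
    Database,
    MigrationFailed,
    Io,
    Transform,
}

impl LoreError {
    /// Map each category to the exit code listed in the table.
    pub fn exit_code(self) -> i32 {
        match self {
            LoreError::Internal => 1,
            LoreError::ConfigNotFound => 2,
            LoreError::ConfigInvalid => 3,
            LoreError::TokenNotSet => 4,
            LoreError::GitLabAuthFailed => 5,
            LoreError::ResourceNotFound => 6,
            LoreError::RateLimited => 7,
            LoreError::Network => 8,
            LoreError::DatabaseLocked => 9,
            LoreError::Database => 10,
            LoreError::MigrationFailed => 11,
            LoreError::Io => 12,
            LoreError::Transform => 13,
        }
    }
}

/// Exit code for a finished command: 0 on success, otherwise the mapped code.
pub fn exit_code_for(result: Result<(), LoreError>) -> i32 {
    match result {
        Ok(()) => 0,
        Err(e) => e.exit_code(),
    }
}
```

Because the `match` is exhaustive, adding a new variant without assigning it a code fails to compile.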
Configuration
{
"gitlab": {
"baseUrl": "https://gitlab.com",
"tokenEnvVar": "GITLAB_TOKEN"
},
"projects": [
{ "path": "group/project" }
],
"sync": {
"backfillDays": 14,
"staleLockMinutes": 10,
"heartbeatIntervalSeconds": 30,
"cursorRewindSeconds": 2,
"primaryConcurrency": 4,
"dependentConcurrency": 2
},
"storage": {
"dbPath": "~/.local/share/lore/lore.db",
"compressRawPayloads": true
}
}
Deprecated Commands (Hidden)
| Old | New | Notes |
|---|---|---|
| lore list | lore issues / lore mrs | Shows deprecation warning |
| lore show | lore issues <iid> | Shows deprecation warning |
| lore auth-test | lore auth | Alias |
| lore sync-status | lore status | Alias |
GitLab API Endpoints Used by Gitlore
All requests use PRIVATE-TOKEN header authentication. Rate limited at 10 req/s with 0-50ms jitter.
Full GitLab REST API v4 Reference
Complete endpoint inventory for the resources relevant to Gitlore. USED = consumed by Gitlore.
Issues API
| Method | Endpoint | Description | Status |
|---|---|---|---|
| GET | /issues | List all issues (global) | -- |
| GET | /groups/:id/issues | List group issues | -- |
| GET | /projects/:id/issues | List project issues | USED |
| GET | /projects/:id/issues/:iid | Get single issue | -- |
| POST | /projects/:id/issues | Create issue | -- |
| PUT | /projects/:id/issues/:iid | Update issue | -- |
| DEL | /projects/:id/issues/:iid | Delete issue | -- |
| PUT | /projects/:id/issues/:iid/reorder | Reorder issue | -- |
| POST | /projects/:id/issues/:iid/move | Move issue | -- |
| POST | /projects/:id/issues/:iid/clone | Clone issue | -- |
| POST | /projects/:id/issues/:iid/subscribe | Subscribe to issue | -- |
| POST | /projects/:id/issues/:iid/unsubscribe | Unsubscribe | -- |
| POST | /projects/:id/issues/:iid/todo | Create to-do | -- |
| POST | /projects/:id/issues/:iid/time_estimate | Set time estimate | -- |
| POST | /projects/:id/issues/:iid/add_spent_time | Add spent time | -- |
| GET | /projects/:id/issues/:iid/time_stats | Get time stats | -- |
| GET | /projects/:id/issues/:iid/related_merge_requests | Related MRs | -- |
| GET | /projects/:id/issues/:iid/closed_by | MRs that close issue | -- |
| GET | /projects/:id/issues/:iid/participants | List participants | -- |
Merge Requests API
| Method | Endpoint | Description | Status |
|---|---|---|---|
| GET | /merge_requests | List all MRs (global) | -- |
| GET | /groups/:id/merge_requests | List group MRs | -- |
| GET | /projects/:id/merge_requests | List project MRs | USED |
| GET | /projects/:id/merge_requests/:iid | Get single MR | -- |
| POST | /projects/:id/merge_requests | Create MR | -- |
| PUT | /projects/:id/merge_requests/:iid | Update MR | -- |
| DEL | /projects/:id/merge_requests/:iid | Delete MR | -- |
| PUT | /projects/:id/merge_requests/:iid/merge | Merge an MR | -- |
| POST | /projects/:id/merge_requests/:iid/cancel_merge | Cancel merge | -- |
| PUT | /projects/:id/merge_requests/:iid/rebase | Rebase MR | -- |
| GET | /projects/:id/merge_requests/:iid/commits | List MR commits | -- |
| GET | /projects/:id/merge_requests/:iid/changes | List MR diffs | -- |
| GET | /projects/:id/merge_requests/:iid/pipelines | MR pipelines | -- |
| GET | /projects/:id/merge_requests/:iid/participants | MR participants | -- |
| GET | /projects/:id/merge_requests/:iid/approvals | MR approvals | -- |
| POST | /projects/:id/merge_requests/:iid/approve | Approve MR | -- |
Discussions API
| Method | Endpoint | Description | Status |
|---|---|---|---|
| GET | /projects/:id/issues/:iid/discussions | List issue discussions | USED |
| GET | /projects/:id/issues/:iid/discussions/:did | Get single discussion | -- |
| POST | /projects/:id/issues/:iid/discussions | Create issue thread | -- |
| POST | /projects/:id/issues/:iid/discussions/:did/notes | Add note to thread | -- |
| PUT | /projects/:id/issues/:iid/discussions/:did/notes/:nid | Modify note | -- |
| DEL | /projects/:id/issues/:iid/discussions/:did/notes/:nid | Delete note | -- |
| GET | /projects/:id/merge_requests/:iid/discussions | List MR discussions | USED |
| GET | /projects/:id/merge_requests/:iid/discussions/:did | Get single MR discussion | -- |
| POST | /projects/:id/merge_requests/:iid/discussions | Create MR thread | -- |
| PUT | /projects/:id/merge_requests/:iid/discussions/:did | Resolve/unresolve thread | -- |
| POST | /projects/:id/merge_requests/:iid/discussions/:did/notes | Add note to MR thread | -- |
| PUT | /projects/:id/merge_requests/:iid/discussions/:did/notes/:nid | Modify MR note | -- |
| DEL | /projects/:id/merge_requests/:iid/discussions/:did/notes/:nid | Delete MR note | -- |
| GET | /projects/:id/snippets/:sid/discussions | List snippet discussions | -- |
| GET | /groups/:id/epics/:eid/discussions | List epic discussions | -- |
| GET | /projects/:id/repository/commits/:sha/discussions | List commit discussions | -- |
Notes API (Flat, non-threaded)
| Method | Endpoint | Description | Status |
|---|---|---|---|
| GET | /projects/:id/issues/:iid/notes | List issue notes | -- |
| POST | /projects/:id/issues/:iid/notes | Create issue note | -- |
| GET | /projects/:id/merge_requests/:iid/notes | List MR notes | -- |
| POST | /projects/:id/merge_requests/:iid/notes | Create MR note | -- |
| GET | /projects/:id/snippets/:sid/notes | List snippet notes | -- |
Gitlore uses the Discussions API (threaded) instead of the flat Notes API. Notes are extracted from discussion responses.
Other APIs Used
| Method | Endpoint | Description | Status |
|---|---|---|---|
| GET | /user | Current authenticated user | USED |
| GET | /projects/:path | Get project by path | USED |
| GET | /version | GitLab instance version | USED |
CLI Command ↔ API Endpoint Mapping
How each CLI command maps to GitLab API calls and local database operations.
GET /projects/:id/issues (paginated, cursor-based)
WHERE updated_at > discussions_synced_for_updated_at
GET /projects/:id/issues/:iid/discussions (parallel prefetch)
issues, labels, issue_labels, discussions, notes, raw_payloads

GET /projects/:id/merge_requests (paginated, cursor-based)
WHERE updated_at > discussions_synced_for_updated_at
GET /projects/:id/merge_requests/:iid/discussions (parallel prefetch)
merge_requests, labels, mr_labels, mr_assignees, mr_reviewers, discussions, notes, raw_payloads

SELECT ... FROM issues/merge_requests with filters (no API call)
SELECT ... WHERE iid = ? + join discussions/notes (no API call)
GET /api/v4/user
GET /api/v4/user + GET /api/v4/version
GET /api/v4/projects/:path
API Capabilities NOT Used by Gitlore
- Create/update/delete issues
- Create/update/delete MRs
- Merge MRs
- Create/reply to discussions
- Resolve/unresolve threads
- Approve MRs
- Single issue/MR fetch (uses list with filters instead)
- MR commits, diffs, pipelines
- Issue/MR participants
- Time tracking stats
- Related MRs / closed-by
- Labels API (extracted from issue/MR responses)
- Milestones API (extracted from issue responses)
- Flat Notes API (uses threaded Discussions API)
- Snippets, Epics, Commits discussions
- Webhooks, CI/CD, Pipelines, Deployments
Database Schema
SQLite with WAL mode. 12 tables across 6 migrations.
Entity Relationship
projects ──────────────────────────────────────────────────────────
│ │
├──< issues ──< issue_labels >── labels │
│ │ │
│ └──< discussions ──< notes │
│ │
├──< merge_requests ──< mr_labels >── labels │
│ │ │ │ │
│ │ │ └──< mr_reviewers │
│ │ └──< mr_assignees │
│ │ │
│ └──< discussions ──< notes │
│ │
├──< raw_payloads │
├──< sync_cursors │
└── sync_runs, app_locks, schema_version │
───────────────────────────────────────────────────────────────────
Table Details
Ingestion Pipeline
Three-Phase Architecture
1. Paginated API fetch with cursor-based sync. Stores raw payloads + normalized rows.
2. SQL query: which issues/MRs need their discussions refreshed?
3. Parallel prefetch + serial write. Full-refresh per parent entity.
Cursor-Based Incremental Sync
Cursor State: (updated_at_cursor: i64, tie_breaker_id: i64)

First sync:
  updated_after = (now - backfillDays)

Subsequent syncs:
  updated_after = cursor.updated_at - cursorRewindSeconds

For each fetched resource:
  if (updated_at, gitlab_id) <= cursor:
    SKIP (already processed in overlap zone)
  else:
    UPSERT into database

After each page boundary:
  UPDATE sync_cursors (crash recovery safe)
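The cursor rules above can be sketched in Rust as follows. This is an illustration under stated assumptions, not the actual implementation: timestamps are assumed to be epoch milliseconds, the skip test is assumed to be a lexicographic comparison on (updated_at, id) matching the cursor's field order, and all names are hypothetical.

```rust
/// Cursor state: last processed updated_at plus a tie-breaker id.
/// Assumption: timestamps are epoch milliseconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cursor {
    pub updated_at: i64,
    pub tie_breaker_id: i64,
}

/// Value for the `updated_after` query parameter.
/// First sync: now minus the backfill window. Later syncs: cursor minus the rewind.
pub fn updated_after(
    cursor: Option<Cursor>,
    now_ms: i64,
    backfill_days: i64,
    rewind_seconds: i64,
) -> i64 {
    match cursor {
        None => now_ms - backfill_days * 86_400_000,
        Some(c) => c.updated_at - rewind_seconds * 1_000,
    }
}

/// True when a fetched resource falls at or before the cursor,
/// i.e. it was already handled and only reappears because of the rewind overlap.
pub fn already_processed(cursor: Option<Cursor>, updated_at: i64, gitlab_id: i64) -> bool {
    match cursor {
        None => false,
        // Tuples compare lexicographically: updated_at first, then id.
        Some(c) => (updated_at, gitlab_id) <= (c.updated_at, c.tie_breaker_id),
    }
}

/// Advance the cursor to the newest (updated_at, id) pair seen so far.
pub fn advance(cursor: Option<Cursor>, updated_at: i64, gitlab_id: i64) -> Cursor {
    let candidate = Cursor { updated_at, tie_breaker_id: gitlab_id };
    match cursor {
        Some(c) if (c.updated_at, c.tie_breaker_id) >= (updated_at, gitlab_id) => c,
        _ => candidate,
    }
}
```

The tie-breaker matters when several resources share the same `updated_at`: without it, a crash mid-page could cause items with an identical timestamp to be skipped or reprocessed on resume.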
Discussion Sync Strategy
For each issue/MR where updated_at > discussions_synced_for_updated_at:
  1. PREFETCH (parallel, configurable concurrency):
     GET /projects/:id/issues/:iid/discussions (all pages)
  2. WRITE (serial, inside transaction):
     DELETE FROM notes WHERE discussion_id IN (...)
     DELETE FROM discussions WHERE issue_id = ?
     INSERT discussions + notes (fresh data)
     UPDATE issues SET discussions_synced_for_updated_at = updated_at
Full-refresh avoids complexity of detecting deleted/edited notes. Trade-off: more API calls for heavily-discussed items.
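The watermark check and the full-refresh write can be modelled in memory as below. This is a rough sketch: the `HashMap` stands in for the discussions and notes tables, a missing watermark is assumed to mean "never synced", and the type and function names are hypothetical.

```rust
use std::collections::HashMap;

/// Minimal stand-in for an issue or MR row.
#[derive(Debug, Clone, PartialEq)]
pub struct Parent {
    pub id: i64,
    pub updated_at: i64,
    /// Watermark: the updated_at value the discussions were last synced for.
    pub discussions_synced_for_updated_at: Option<i64>,
}

/// A parent needs a discussion sync when it has changed since the watermark.
/// Assumption: no watermark means the discussions were never synced.
pub fn needs_discussion_sync(p: &Parent) -> bool {
    match p.discussions_synced_for_updated_at {
        None => true,
        Some(watermark) => p.updated_at > watermark,
    }
}

/// Full refresh for one parent: drop everything stored, insert the fresh
/// data, then move the watermark. In a real database these steps would
/// share one transaction.
pub fn full_refresh(
    store: &mut HashMap<i64, Vec<String>>,
    parent: &mut Parent,
    fresh: Vec<String>,
) {
    store.remove(&parent.id); // DELETE
    store.insert(parent.id, fresh); // INSERT
    parent.discussions_synced_for_updated_at = Some(parent.updated_at);
}
```

Setting the watermark to the parent's own `updated_at` (rather than the current time) means an update that arrives during the sync is still picked up on the next run.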
Rate Limiting
RateLimiter {
  min_interval: 100ms (= 1s / 10 req/s)
  jitter: 0-50ms random

  acquire():
    elapsed = now - last_request
    if elapsed < min_interval:
      sleep(min_interval - elapsed + random_jitter)
    last_request = now
}
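A standard-library-only Rust sketch of the limiter above might look like this. It makes several assumptions that are not from the source: it blocks the thread with `std::thread::sleep` where an async client would await a timer, the caller supplies the jitter (std has no random number generator), and the timestamp is recorded after any sleep.

```rust
use std::time::{Duration, Instant};

/// How long to wait before the next request: zero if the minimum interval
/// has already passed, otherwise the remainder plus the supplied jitter.
pub fn wait_for(elapsed: Duration, min_interval: Duration, jitter: Duration) -> Duration {
    if elapsed < min_interval {
        min_interval - elapsed + jitter
    } else {
        Duration::ZERO
    }
}

pub struct RateLimiter {
    min_interval: Duration,
    last_request: Option<Instant>,
}

impl RateLimiter {
    /// `requests_per_second` must be non-zero; 10 gives a 100 ms interval.
    pub fn new(requests_per_second: u32) -> Self {
        Self {
            min_interval: Duration::from_secs(1) / requests_per_second,
            last_request: None,
        }
    }

    pub fn min_interval(&self) -> Duration {
        self.min_interval
    }

    /// Block until a request may be sent. The caller passes a jitter value,
    /// e.g. a random duration in the 0-50 ms range.
    pub fn acquire(&mut self, jitter: Duration) {
        if let Some(last) = self.last_request {
            let wait = wait_for(last.elapsed(), self.min_interval, jitter);
            if !wait.is_zero() {
                std::thread::sleep(wait);
            }
        }
        // Record the time after sleeping so intervals are measured
        // between actual request starts.
        self.last_request = Some(Instant::now());
    }
}
```

Splitting the arithmetic into `wait_for` keeps the timing rule testable without sleeping.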
Pagination
Async stream-based. Fallback chain for next-page detection:
- Link header (RFC 8288): parse rel="next"
- x-next-page header: direct page number
- Full-page heuristic: if response has 100 items, assume more pages
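The fallback chain for next-page detection can be sketched as a pure function over the response headers and item count. This is illustrative only: the names are hypothetical, and the Link parser is deliberately simplified (it splits on commas, so it would mishandle a URL that itself contains a comma).

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum NextPage {
    /// Absolute URL taken from the Link header.
    Url(String),
    /// Page number from x-next-page or from the full-page heuristic.
    Number(u32),
}

/// Extract the rel="next" target from a Link header value.
/// Simplified parser: assumes targets contain no ',' or ';'.
pub fn link_next(header: &str) -> Option<String> {
    header.split(',').find_map(|part| {
        let mut pieces = part.split(';');
        let target = pieces.next()?.trim();
        let is_next = pieces.any(|p| {
            let p = p.trim();
            p == "rel=\"next\"" || p == "rel=next"
        });
        if is_next {
            Some(target.strip_prefix('<')?.strip_suffix('>')?.to_string())
        } else {
            None
        }
    })
}

/// Try each strategy in order and return the first that yields a next page.
pub fn next_page(
    link: Option<&str>,
    x_next_page: Option<&str>,
    items: usize,
    per_page: usize,
    current: u32,
) -> Option<NextPage> {
    // 1. Link header with rel="next".
    if let Some(url) = link.and_then(link_next) {
        return Some(NextPage::Url(url));
    }
    // 2. x-next-page header; an empty value does not parse and falls through.
    if let Some(n) = x_next_page.and_then(|v| v.trim().parse::<u32>().ok()) {
        return Some(NextPage::Number(n));
    }
    // 3. A full page suggests there may be more.
    if per_page > 0 && items >= per_page {
        return Some(NextPage::Number(current + 1));
    }
    None
}
```

Note that the heuristic in step 3 costs one extra request when the final page happens to be exactly full; the follow-up request simply returns an empty page.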
Raw Payload Storage
JSON bytes → SHA-256 dedup check → gzip compression (if enabled) → BLOB storage
UNIQUE constraint on (project_id, resource_type, gitlab_id, payload_hash) prevents storing identical payloads.
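The effect of that UNIQUE constraint can be modelled with a set keyed on the same four columns. This is a toy in-memory sketch with hypothetical names; the hash is passed in precomputed because the Rust standard library has no SHA-256 implementation.

```rust
use std::collections::HashSet;

/// Same columns as the UNIQUE constraint:
/// (project_id, resource_type, gitlab_id, payload_hash).
type PayloadKey = (i64, String, i64, String);

#[derive(Default)]
pub struct RawPayloadIndex {
    seen: HashSet<PayloadKey>,
}

impl RawPayloadIndex {
    /// Returns true if the payload is new (a row would be inserted) and
    /// false if an identical payload is already stored for this resource.
    /// `payload_hash` is assumed to be a precomputed hex digest.
    pub fn insert_if_new(
        &mut self,
        project_id: i64,
        resource_type: &str,
        gitlab_id: i64,
        payload_hash: &str,
    ) -> bool {
        self.seen.insert((
            project_id,
            resource_type.to_string(),
            gitlab_id,
            payload_hash.to_string(),
        ))
    }
}
```

Because the hash is part of the key, a changed payload for the same resource is stored as a new row, while a byte-identical one is rejected.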
Concurrency Model
Single-threaded async stream. Rate-limited. Each page written in a transaction. Cursor updated at page boundaries.
Parallel prefetch (configurable, default 2 concurrent). Serial write phase to avoid DB contention. Each parent entity is one transaction.
Single-Flight Lock
AppLock (database-enforced mutex):
  name: 'sync' (PK)
  owner: UUIDv4 (unique per process)
  heartbeat_at: updated every 30s

Acquire:
  INSERT, or fail if row exists
  Check stale: if now - heartbeat_at > staleLockMinutes, force-acquire

Release:
  DELETE WHERE owner = my_uuid
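The acquire and release rules can be sketched against an in-memory slot standing in for the single `sync` row. The names are hypothetical, timestamps are assumed to be epoch seconds, and a real implementation would perform these steps as SQL statements.

```rust
/// In-memory stand-in for the single 'sync' row.
/// Assumption: heartbeat_at is in epoch seconds.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AppLock {
    pub owner: String,
    pub heartbeat_at: i64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Acquire {
    /// No row existed; the lock is ours.
    Acquired,
    /// A row existed but its heartbeat was too old; we force-acquired it.
    TakenOverStale,
    /// A live lock is held by someone else.
    HeldByOther,
}

pub fn try_acquire(
    slot: &mut Option<AppLock>,
    me: &str,
    now: i64,
    stale_lock_minutes: i64,
) -> Acquire {
    let outcome = match slot.as_ref() {
        None => Acquire::Acquired,
        Some(lock) if now - lock.heartbeat_at > stale_lock_minutes * 60 => {
            Acquire::TakenOverStale
        }
        Some(_) => Acquire::HeldByOther,
    };
    if outcome != Acquire::HeldByOther {
        *slot = Some(AppLock { owner: me.to_string(), heartbeat_at: now });
    }
    outcome
}

/// Release only removes the row if we still own it, so a process whose
/// stale lock was taken over cannot delete the new owner's lock.
pub fn release(slot: &mut Option<AppLock>, me: &str) {
    if slot.as_ref().map_or(false, |lock| lock.owner == me) {
        *slot = None;
    }
}
```

The owner check on release is what makes the stale takeover safe: the original holder, if it eventually resumes, finds that its UUID no longer matches and leaves the row alone.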
Progress Events
| Event | Description |
|---|---|
| IssuesFetchStarted | Beginning primary issue fetch |
| IssueFetched | Each issue processed (for progress bars) |
| IssuesFetchComplete | All pages consumed |
| DiscussionSyncStarted | Beginning discussion phase |
| DiscussionSynced | Each parent's discussions written |
| DiscussionSyncComplete | All discussions updated |
| Same events exist for MRs (MrsFetchStarted, etc.) | |
Field-Level Coverage: API Response vs Gitlore Storage
Every field in every GitLab API response, mapped to what Gitlore does with it. Serde silently drops fields not in the Rust structs.
Field Coverage Summary
Although many fields are dropped during transformation, the raw_payloads table stores the complete original JSON response (with SHA-256 dedup and optional gzip). This means all "dropped" data is still recoverable from the blob storage without re-fetching from GitLab. The normalized tables are optimized for query patterns, not completeness.
Efficiency Analysis & Opportunities
Observations on how gitlore could leverage the GitLab API more efficiently, and data it currently leaves on the table.
Current Efficiency Wins
Uses updated_after + order_by=updated_at&sort=asc to only fetch changed records. Avoids full re-fetch on every sync. This is the single biggest efficiency feature.
SHA-256 hashing prevents storing identical payloads. If an issue's updated_at changes but the actual content is identical, the raw blob is deduplicated.
Only re-syncs discussions for issues/MRs whose updated_at has advanced past their discussions_synced_for_updated_at watermark. Skips unchanged entities.
Fetches discussions for multiple issues/MRs concurrently (configurable, default 2). Dramatically reduces wall-clock time for discussion sync.
Potential Inefficiencies
Every time an issue/MR is updated, ALL its discussions are re-fetched and replaced (DELETE + INSERT). For heavily-discussed items (50+ comments), this is expensive.
| Scenario | Current | Alternative |
|---|---|---|
| Issue with 100 notes gets 1 new comment | Re-fetch all 100 notes (multiple pages) | Could use GET .../notes?order_by=updated_at&updated_after=... for incremental note sync |
| MR label change (no new comments) | Re-fetch all discussions anyway | Could check user_notes_count delta or use Notes API with updated_after |
Trade-off: Full-refresh is simpler and guarantees consistency (catches edits, deletes). Incremental would miss deleted notes.
Gitlore uses page=N&per_page=100 offset pagination. GitLab supports keyset pagination for some endpoints (Issues, MRs), which is more efficient for large datasets and recommended by GitLab.
Current: GET /projects/:id/issues?page=5&per_page=100
Keyset: GET /projects/:id/issues?pagination=keyset&per_page=100
(uses Link header rel="next" with cursor)
Benefit: Keyset pagination is O(1) per page (vs O(N) for offset). GitLab recommends it for >10,000 records. Gitlore already parses Link headers, so the client-side support partially exists.
GitLab returns ETag headers on API responses. Sending If-None-Match on subsequent requests would return 304 Not Modified without consuming rate limit quota on some endpoints. Currently all requests are unconditional.
Impact: Moderate. The cursor-based sync already avoids re-fetching unchanged data, so ETag would mainly help with the discussions full-refresh scenario where nothing changed.
Gitlore extracts labels from the labels[] string array embedded in issue/MR responses. The dedicated GET /projects/:id/labels endpoint returns richer data:
| From issues response | From Labels API |
|---|---|
| Label name (string only) | name, color, description, text_color, priority, is_project_label, subscribed, open_issues_count, closed_issues_count, open_merge_requests_count |
Impact: The labels table has color and description columns but they may not be populated from the embedded string array. A single Labels API call (one request, non-paginated for most projects) would enrich the local label catalog.
Dropped Data Worth Capturing
Fields currently silently dropped that could add value to local queries:
| Field | Source | Value Proposition | Effort |
|---|---|---|---|
| user_notes_count | Issues, MRs | Could skip discussion re-sync when count hasn't changed. Quick "activity" sort without joining notes table. | Low |
| upvotes / downvotes | Issues, MRs | Engagement metrics for triage. "Most upvoted issues" is a common query. | Low |
| confidential | Issues | Security-sensitive filtering. Avoid exposing confidential issues in outputs. | Low |
| weight | Issues | Effort estimation for sprint planning (Premium/Ultimate only). | Low |
| time_stats | Issues, MRs | Time tracking data for project reporting. Already in the response, free to capture. | Low |
| has_conflicts | MRs | Identify MRs needing rebase. Useful for "stale MR" alerts. | Low |
| blocking_discussions_resolved | MRs | MR readiness indicator without joining discussions table. | Low |
| merge_commit_sha | MRs | Trace merged MRs to specific commits. Useful for git correlation. | Low |
| suggestions[] | Discussion notes | Code review suggestions with from/to content. Rich data for code review analysis. | Medium |
| task_completion_status | Issues, MRs | Track task-list checkbox progress without parsing description markdown. | Low |
| issue_type | Issues | Distinguish issues vs incidents vs test cases. | Low |
| discussion_locked | Issues, MRs | Know if new comments can be added. | Low |
Structural Optimization Opportunities
Currently stores only username for authors, assignees, reviewers. The API returns name, avatar_url, web_url, and state for every user reference. A users table could deduplicate this data and provide richer displays.
-- Potential schema
CREATE TABLE users (
  username TEXT PRIMARY KEY,
  name TEXT,
  gitlab_id INTEGER,
  avatar_url TEXT,
  state TEXT,            -- "active", "blocked", etc.
  last_seen_at INTEGER   -- auto-updated on encounter
);
Cost: No additional API calls. Data is already in every issue/MR/note response. Just needs extraction during transform.
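As a rough illustration of that extraction step, the sketch below deduplicates user references by username while processing responses. The struct fields follow the proposed users table; the type and function names are hypothetical, and the in-memory map stands in for an upsert into that table.

```rust
use std::collections::HashMap;

/// User fields as they appear on an author/assignee/reviewer reference.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserRef {
    pub username: String,
    pub name: String,
    pub gitlab_id: i64,
    pub avatar_url: Option<String>,
    pub state: String,
}

/// One deduplicated row per username.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UserRow {
    pub user: UserRef,
    pub last_seen_at: i64,
}

/// Record a user reference encountered during transform.
/// The latest fields overwrite the stored ones; last_seen_at never moves backwards.
pub fn record_user(users: &mut HashMap<String, UserRow>, seen: UserRef, seen_at: i64) {
    let row = users
        .entry(seen.username.clone())
        .or_insert_with(|| UserRow { user: seen.clone(), last_seen_at: seen_at });
    row.last_seen_at = row.last_seen_at.max(seen_at);
    row.user = seen;
}
```

Calling `record_user` for every author, assignee, and reviewer in a response keeps a single row per person regardless of how many issues or notes reference them.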
Milestones are stored for issues but the MR transformer does not extract the milestone object from MR responses, even though GitLab returns it. The merge_requests table has no milestone_id column.
Impact: Cannot query "which MRs are in milestone X?" locally. The data is in the raw payload but not indexed.
MRs store references.short and references.full, but the issue transformer drops the references object entirely. This means issues lack the cross-project reference format (e.g., group/project#42).
API Strategies Not Yet Used
Instead of polling, GitLab can push events via POST /projects/:id/hooks. Would enable near-real-time sync without rate-limit cost. Requires a listener endpoint.
GET /projects/:id/events returns a stream of all project activity. Could be used as a fast "has anything changed?" check before running expensive issue/MR sync. Much lighter than fetching full issue lists.
GitLab's GraphQL API allows requesting exactly the fields needed. Would eliminate bandwidth waste from ~50% of response fields being silently dropped. Trade-off: different pagination model, potentially less stable API surface.
Summary Verdict
Gitlore is well-optimized for its core use case (read-only local sync). The cursor-based incremental sync and raw payload archival are sophisticated. The main opportunities are:
- Capture more "free" data: Fields like user_notes_count, upvotes, has_conflicts are already in API responses. Storing them costs zero API calls and enables richer queries.
- Discussion sync efficiency: The full-refresh strategy is the biggest source of redundant API calls. Even a simple user_notes_count comparison could skip unchanged discussions.
- Keyset pagination: A meaningful improvement for large projects (>10K issues), and Gitlore already has partial infrastructure for it.
- MR milestone parity: Low-effort gap to close with issue milestone support.