Gitlore CLI ↔ GitLab API Review

Mapping of the Gitlore local sync engine to the GitLab REST API v4


What is Gitlore?

Gitlore (lore) is a read-only local sync engine for GitLab data. It fetches issues, merge requests, and discussions from the GitLab REST API v4 and stores them in a local SQLite database for fast offline querying.

About 11,148 lines of Rust, with a noun-first CLI, a robot mode for automation, and cursor-based incremental sync.

API Surface Coverage

Gitlore uses a small, focused subset of the GitLab API. It is read-only: it never creates, updates, or deletes anything on GitLab.

Endpoints Used: 7
  Issues (list): GET
  Merge Requests (list): GET
  Issue Discussions (list): GET
  MR Discussions (list): GET
  Current User: GET
  Project Details: GET
  GitLab Version: GET
Coverage by Resource
  Issues API: 1 of ~20 endpoints
  MR API: 1 of ~30 endpoints
  Discussions API: 2 of ~30 endpoints
  Users API: 1 of ~15 endpoints
  Projects API: 1 of ~50 endpoints

Data Flow

GitLab API v4 → Rate Limiter (10 req/s + jitter) → Async Streams (paginated fetch) → Transformers (normalize data) → SQLite (WAL, local DB)

Key Design Decisions

Read-only sync

Only GET requests. Never mutates GitLab state. Safe to run repeatedly.

Cursor-based incremental

Uses the updated_after parameter to fetch only changed data, rewinding the cursor by 2 seconds so records updated right at the cursor boundary are not missed.

Raw payload archival

Stores original JSON responses with SHA-256 dedup and optional gzip compression.

Discussion full-refresh

Discussions use DELETE+INSERT strategy per parent (no incremental). Parallel prefetch, serial write.

CLI Architecture

Command Structure (Noun-First)

lore <noun> [verb/arg]    # Primary pattern
lore issues              # List all issues
lore issues 42           # Show issue #42
lore mrs                 # List all merge requests
lore mrs 17              # Show MR #17
lore ingest issues       # Fetch issues from GitLab
lore ingest mrs          # Fetch MRs from GitLab
lore count issues        # Count local issues
lore count discussions   # Count local discussions
lore status              # Show sync state
lore auth                # Verify GitLab auth
lore doctor              # Health check
lore init                # Initialize config + DB
lore migrate             # Run DB migrations
lore version             # Show version

Global Flags

Flag | Description
-c, --config | Path to config file
--robot | Machine-readable JSON output
-J, --json | JSON shorthand (same as --robot)

Robot Mode Detection

Three ways to activate:

lore --robot issues          # Explicit flag
lore issues | jq .           # Auto: stdout not a TTY
LORE_ROBOT=1 lore issues     # Environment variable

Robot mode returns JSON: {"ok":true,"data":{...},"meta":{...}}

Errors go to stderr: {"error":{"code":"...","message":"...","suggestion":"..."}}
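A minimal sketch of the three detection rules, assuming a hand-rolled check rather than Gitlore's actual argument parser. The function names are hypothetical, and treating any LORE_ROBOT value other than empty or "0" as enabled is an assumption. std::io::IsTerminal requires Rust 1.70 or later.

```rust
use std::io::IsTerminal;

/// Decide whether robot (JSON) mode is on. Pure function, so it can be
/// tested without a real terminal or environment.
/// Assumption: any LORE_ROBOT value other than "" or "0" enables robot mode.
fn is_robot_mode(args: &[&str], lore_robot_env: Option<&str>, stdout_is_tty: bool) -> bool {
    // 1. Explicit flag: --robot, or its -J/--json shorthand.
    let explicit = args.iter().any(|a| matches!(*a, "--robot" | "-J" | "--json"));
    // 2. Environment variable.
    let from_env = matches!(lore_robot_env, Some(v) if !v.is_empty() && v != "0");
    // 3. stdout is piped or redirected instead of attached to a terminal.
    explicit || from_env || !stdout_is_tty
}

/// Hypothetical wrapper wiring the pure check to the real process state.
fn detect_robot_mode() -> bool {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let arg_refs: Vec<&str> = args.iter().map(String::as_str).collect();
    let env_value = std::env::var("LORE_ROBOT").ok();
    is_robot_mode(&arg_refs, env_value.as_deref(), std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function makes the TTY rule testable; only the thin wrapper touches the real process state.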

Exit Codes

Code | Meaning
0 | Success
1 | Internal error
2 | Config not found
3 | Config invalid
4 | Token not set
5 | GitLab auth failed
6 | Resource not found
7 | Rate limited
8 | Network error
9 | Database locked
10 | Database error
11 | Migration failed
12 | I/O error
13 | Transform error
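One way to mirror the exit-code table in Rust is a fieldless enum with explicit discriminants. The ErrorCode name and the HTTP-status mapping (401, 404, 429) are illustrative assumptions, not taken from the Gitlore source; 0 (success) is left out because it is not an error.

```rust
/// Hypothetical enum mirroring the exit-code table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
enum ErrorCode {
    Internal = 1,
    ConfigNotFound = 2,
    ConfigInvalid = 3,
    TokenNotSet = 4,
    AuthFailed = 5,
    NotFound = 6,
    RateLimited = 7,
    Network = 8,
    DatabaseLocked = 9,
    Database = 10,
    MigrationFailed = 11,
    Io = 12,
    Transform = 13,
}

impl ErrorCode {
    /// Process exit status for this error.
    fn exit_code(self) -> i32 {
        self as i32
    }

    /// Assumed mapping from a few GitLab HTTP statuses; anything else is
    /// left to the caller to classify.
    fn from_http_status(status: u16) -> Option<ErrorCode> {
        match status {
            401 => Some(ErrorCode::AuthFailed),
            404 => Some(ErrorCode::NotFound),
            429 => Some(ErrorCode::RateLimited),
            _ => None,
        }
    }
}

/// Terminate the process with the code for `code`.
fn exit_with(code: ErrorCode) -> ! {
    std::process::exit(code.exit_code())
}
```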

Configuration

{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10,
    "heartbeatIntervalSeconds": 30,
    "cursorRewindSeconds": 2,
    "primaryConcurrency": 4,
    "dependentConcurrency": 2
  },
  "storage": {
    "dbPath": "~/.local/share/lore/lore.db",
    "compressRawPayloads": true
  }
}
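The tokenEnvVar setting names the environment variable that holds the token rather than holding the token itself. A hypothetical resolver, assuming that a missing or blank variable maps to exit code 4 (Token not set), might look like this. The environment lookup is injected so the logic can be exercised without touching the real process environment, and trimming surrounding whitespace is an assumption.

```rust
/// Resolve the GitLab token from the variable named by gitlab.tokenEnvVar.
/// Returns exit code 4 ("Token not set") if it is missing or blank.
fn resolve_token<F>(token_env_var: &str, lookup: F) -> Result<String, i32>
where
    F: Fn(&str) -> Option<String>,
{
    const TOKEN_NOT_SET: i32 = 4;
    match lookup(token_env_var) {
        Some(tok) if !tok.trim().is_empty() => Ok(tok.trim().to_string()),
        _ => Err(TOKEN_NOT_SET),
    }
}

/// Production wiring: read from the real process environment.
fn resolve_token_from_env(token_env_var: &str) -> Result<String, i32> {
    resolve_token(token_env_var, |name| std::env::var(name).ok())
}
```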

Deprecated Commands (Hidden)

Old | New | Notes
lore list | lore issues / lore mrs | Shows deprecation warning
lore show | lore issues <iid> | Shows deprecation warning
lore auth-test | lore auth | Alias
lore sync-status | lore status | Alias

GitLab API Endpoints Used by Gitlore

All requests use PRIVATE-TOKEN header authentication. Rate limited at 10 req/s with 0-50ms jitter.
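As a rough illustration, a Phase 1 request for one page of project issues could be assembled as below. The query parameters are the ones this document names (updated_after, order_by=updated_at, sort=asc, per_page, page); the helper names are hypothetical, and the sketch assumes updated_after is already an ISO 8601 UTC timestamp ending in Z, so it needs no percent-encoding.

```rust
/// Build the URL for one page of the project-issues listing (Phase 1).
fn issues_page_url(base_url: &str, project_id: u64, updated_after_iso: &str, page: u32) -> String {
    format!(
        "{}/api/v4/projects/{}/issues?updated_after={}&order_by=updated_at&sort=asc&per_page=100&page={}",
        base_url.trim_end_matches('/'), // tolerate a trailing slash in baseUrl
        project_id,
        updated_after_iso, // assumed "YYYY-MM-DDTHH:MM:SSZ", safe in a query string
        page
    )
}

/// Header pair attached to every request (PRIVATE-TOKEN authentication).
fn auth_header(token: &str) -> (&'static str, String) {
    ("PRIVATE-TOKEN", token.to_string())
}
```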

Full GitLab REST API v4 Reference

Endpoint inventory (not exhaustive) for the resources relevant to Gitlore. USED marks endpoints Gitlore calls; -- marks endpoints it does not.

Issues API

Method | Endpoint | Description | Status
GET | /issues | List all issues (global) | --
GET | /groups/:id/issues | List group issues | --
GET | /projects/:id/issues | List project issues | USED
GET | /projects/:id/issues/:iid | Get single issue | --
POST | /projects/:id/issues | Create issue | --
PUT | /projects/:id/issues/:iid | Update issue | --
DELETE | /projects/:id/issues/:iid | Delete issue | --
PUT | /projects/:id/issues/:iid/reorder | Reorder issue | --
POST | /projects/:id/issues/:iid/move | Move issue | --
POST | /projects/:id/issues/:iid/clone | Clone issue | --
POST | /projects/:id/issues/:iid/subscribe | Subscribe to issue | --
POST | /projects/:id/issues/:iid/unsubscribe | Unsubscribe | --
POST | /projects/:id/issues/:iid/todo | Create to-do | --
POST | /projects/:id/issues/:iid/time_estimate | Set time estimate | --
POST | /projects/:id/issues/:iid/add_spent_time | Add spent time | --
GET | /projects/:id/issues/:iid/time_stats | Get time stats | --
GET | /projects/:id/issues/:iid/related_merge_requests | Related MRs | --
GET | /projects/:id/issues/:iid/closed_by | MRs that close issue | --
GET | /projects/:id/issues/:iid/participants | List participants | --

Merge Requests API

Method | Endpoint | Description | Status
GET | /merge_requests | List all MRs (global) | --
GET | /groups/:id/merge_requests | List group MRs | --
GET | /projects/:id/merge_requests | List project MRs | USED
GET | /projects/:id/merge_requests/:iid | Get single MR | --
POST | /projects/:id/merge_requests | Create MR | --
PUT | /projects/:id/merge_requests/:iid | Update MR | --
DELETE | /projects/:id/merge_requests/:iid | Delete MR | --
PUT | /projects/:id/merge_requests/:iid/merge | Merge an MR | --
POST | /projects/:id/merge_requests/:iid/cancel_merge_when_pipeline_succeeds | Cancel merge when pipeline succeeds | --
PUT | /projects/:id/merge_requests/:iid/rebase | Rebase MR | --
GET | /projects/:id/merge_requests/:iid/commits | List MR commits | --
GET | /projects/:id/merge_requests/:iid/changes | List MR diffs | --
GET | /projects/:id/merge_requests/:iid/pipelines | MR pipelines | --
GET | /projects/:id/merge_requests/:iid/participants | MR participants | --
GET | /projects/:id/merge_requests/:iid/approvals | MR approvals | --
POST | /projects/:id/merge_requests/:iid/approve | Approve MR | --

Discussions API

Method | Endpoint | Description | Status
GET | /projects/:id/issues/:iid/discussions | List issue discussions | USED
GET | /projects/:id/issues/:iid/discussions/:did | Get single discussion | --
POST | /projects/:id/issues/:iid/discussions | Create issue thread | --
POST | /projects/:id/issues/:iid/discussions/:did/notes | Add note to thread | --
PUT | /projects/:id/issues/:iid/discussions/:did/notes/:nid | Modify note | --
DELETE | /projects/:id/issues/:iid/discussions/:did/notes/:nid | Delete note | --
GET | /projects/:id/merge_requests/:iid/discussions | List MR discussions | USED
GET | /projects/:id/merge_requests/:iid/discussions/:did | Get single MR discussion | --
POST | /projects/:id/merge_requests/:iid/discussions | Create MR thread | --
PUT | /projects/:id/merge_requests/:iid/discussions/:did | Resolve/unresolve thread | --
POST | /projects/:id/merge_requests/:iid/discussions/:did/notes | Add note to MR thread | --
PUT | /projects/:id/merge_requests/:iid/discussions/:did/notes/:nid | Modify MR note | --
DELETE | /projects/:id/merge_requests/:iid/discussions/:did/notes/:nid | Delete MR note | --
GET | /projects/:id/snippets/:sid/discussions | List snippet discussions | --
GET | /groups/:id/epics/:eid/discussions | List epic discussions | --
GET | /projects/:id/repository/commits/:sha/discussions | List commit discussions | --

Notes API (Flat, non-threaded)

Method | Endpoint | Description | Status
GET | /projects/:id/issues/:iid/notes | List issue notes | --
POST | /projects/:id/issues/:iid/notes | Create issue note | --
GET | /projects/:id/merge_requests/:iid/notes | List MR notes | --
POST | /projects/:id/merge_requests/:iid/notes | Create MR note | --
GET | /projects/:id/snippets/:sid/notes | List snippet notes | --

Gitlore uses the Discussions API (threaded) instead of the flat Notes API. Notes are extracted from discussion responses.

Other APIs Used

Method | Endpoint | Description | Status
GET | /user | Current authenticated user | USED
GET | /projects/:path | Get project by (URL-encoded) path | USED
GET | /version | GitLab instance version | USED
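GitLab accepts either a numeric project ID or a URL-encoded namespaced path in /projects/:id, so group/project must be sent as group%2Fproject. A small std-only encoder follows, assuming every byte outside the RFC 3986 unreserved set should be percent-encoded; both helper names are hypothetical.

```rust
/// Percent-encode a namespaced project path for use in /projects/:id.
/// Keeps RFC 3986 unreserved bytes and encodes everything else,
/// so "group/project" becomes "group%2Fproject".
fn encode_project_path(path: &str) -> String {
    let mut out = String::with_capacity(path.len());
    for b in path.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => out.push(b as char),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// URL used by `lore doctor` style checks for one configured project.
fn project_url(base_url: &str, path: &str) -> String {
    format!("{}/api/v4/projects/{}", base_url.trim_end_matches('/'), encode_project_path(path))
}
```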

CLI Command ↔ API Endpoint Mapping

How each CLI command maps to GitLab API calls and local database operations.

lore ingest issues
  Phase 1: Fetch primary resources
    GET /projects/:id/issues (paginated, cursor-based)
  Phase 2: Identify stale discussions
    SQL: WHERE updated_at > discussions_synced_for_updated_at
  Phase 3: Sync discussions
    GET /projects/:id/issues/:iid/discussions (parallel prefetch)
  Storage: Write to DB
    Tables: issues, labels, issue_labels, discussions, notes, raw_payloads

lore ingest mrs
  Phase 1: Fetch primary resources
    GET /projects/:id/merge_requests (paginated, cursor-based)
  Phase 2: Identify stale discussions
    SQL: WHERE updated_at > discussions_synced_for_updated_at
  Phase 3: Sync discussions
    GET /projects/:id/merge_requests/:iid/discussions (parallel prefetch)
  Storage: Write to DB
    Tables: merge_requests, labels, mr_labels, mr_assignees, mr_reviewers, discussions, notes, raw_payloads

lore issues / lore mrs
  List mode: Query local DB
    SQL: SELECT ... FROM issues/merge_requests with filters (no API call)
  Show mode: Query local DB by IID
    SQL: SELECT ... WHERE iid = ? + join discussions/notes (no API call)

lore auth
  Verify token works
    GET /api/v4/user

lore doctor
  Check auth + GitLab version
    GET /api/v4/user + GET /api/v4/version
  Check each configured project
    GET /api/v4/projects/:path

lore count / lore status / lore init / lore migrate
  Local-only operations
    No API calls. Database queries only.

API Capabilities NOT Used by Gitlore

Write Operations
  • Create/update/delete issues
  • Create/update/delete MRs
  • Merge MRs
  • Create/reply to discussions
  • Resolve/unresolve threads
  • Approve MRs
Read Operations
  • Single issue/MR fetch (uses list with filters instead)
  • MR commits, diffs, pipelines
  • Issue/MR participants
  • Time tracking stats
  • Related MRs / closed-by
  • Labels API (extracted from issue/MR responses)
  • Milestones API (extracted from issue responses)
  • Flat Notes API (uses threaded Discussions API)
  • Snippets, Epics, Commits discussions
  • Webhooks, CI/CD, Pipelines, Deployments

Database Schema

SQLite with WAL mode. 12 tables across 6 migrations.

Entity Relationship

  projects
    │
    ├──< issues ──< issue_labels >── labels
    │     │
    │     └──< discussions ──< notes
    │
    ├──< merge_requests ──< mr_labels >── labels
    │     │
    │     ├──< mr_assignees
    │     ├──< mr_reviewers
    │     └──< discussions ──< notes
    │
    ├──< raw_payloads
    ├──< sync_cursors
    └── sync_runs, app_locks, schema_version


Ingestion Pipeline

Three-Phase Architecture

Phase 1: Primary Fetch. Paginated API fetch with cursor-based sync; stores raw payloads and normalized rows.
Phase 2: Identify Stale. SQL query: which issues/MRs need their discussions refreshed?
Phase 3: Discussion Sync. Parallel prefetch + serial write; full refresh per parent entity.

Cursor-Based Incremental Sync

Cursor State: (updated_at_cursor: i64, tie_breaker_id: i64)

First sync:
  updated_after = (now - backfillDays)

Subsequent syncs:
  updated_after = cursor.updated_at - cursorRewindSeconds

  For each fetched resource:
    if (updated_at, gitlab_id) <= (cursor.updated_at, cursor.tie_breaker_id):
      SKIP (already processed in overlap zone)
    else:
      UPSERT into database

  After each page boundary:
    UPDATE sync_cursors  (crash recovery safe)
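A sketch of the cursor arithmetic and the overlap skip, assuming timestamps are Unix epoch milliseconds (the unit of the i64 fields is not stated above). Records are (updated_at, gitlab_id) pairs compared lexicographically; the type and function names are hypothetical. One design choice worth noting: the skip test uses the cursor loaded at the start of the run, while the value persisted at each page boundary is the highest pair seen. Advancing the skip cursor mid-page could otherwise drop a lower-ID record that shares an updated_at and happens to arrive later in the page.

```rust
/// Cursor state: (updated_at_cursor, tie_breaker_id). Assumed epoch ms.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Cursor {
    updated_at: i64,
    tie_breaker_id: i64,
}

const MS_PER_SEC: i64 = 1_000;
const MS_PER_DAY: i64 = 86_400_000;

/// Lower bound sent as updated_after.
fn updated_after(cursor: Option<Cursor>, now_ms: i64, backfill_days: i64, rewind_secs: i64) -> i64 {
    match cursor {
        None => now_ms - backfill_days * MS_PER_DAY,              // first sync
        Some(c) => c.updated_at - rewind_secs * MS_PER_SEC,       // overlap rewind
    }
}

/// True if the record sits at or behind the cursor (overlap zone).
fn already_processed(updated_at: i64, gitlab_id: i64, cursor: Option<Cursor>) -> bool {
    match cursor {
        None => false,
        Some(c) => (updated_at, gitlab_id) <= (c.updated_at, c.tie_breaker_id),
    }
}

/// Process one page of (updated_at, gitlab_id) records: returns the records
/// to upsert and the cursor to persist at this page boundary.
fn process_page(page: &[(i64, i64)], start: Option<Cursor>) -> (Vec<(i64, i64)>, Option<Cursor>) {
    let mut to_upsert = Vec::new();
    let mut high = start.map(|c| (c.updated_at, c.tie_breaker_id));
    for &(updated_at, gitlab_id) in page {
        if already_processed(updated_at, gitlab_id, start) {
            continue; // seen in a previous run
        }
        to_upsert.push((updated_at, gitlab_id));
        if high.map_or(true, |h| (updated_at, gitlab_id) > h) {
            high = Some((updated_at, gitlab_id)); // high-water mark
        }
    }
    let next = high.map(|(updated_at, tie_breaker_id)| Cursor { updated_at, tie_breaker_id });
    (to_upsert, next)
}
```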

Discussion Sync Strategy

For each issue/MR where updated_at > discussions_synced_for_updated_at
(issue shown; MRs follow the same steps against their own tables):

  1. PREFETCH (parallel, configurable concurrency):
     GET /projects/:id/issues/:iid/discussions  (all pages)

  2. WRITE (serial, one transaction per parent):
     DELETE FROM notes WHERE discussion_id IN (...)   (notes first, while the parent's discussions still exist)
     DELETE FROM discussions WHERE issue_id = ?
     INSERT discussions + notes (fresh data)
     UPDATE issues SET discussions_synced_for_updated_at = updated_at

Full-refresh avoids complexity of detecting deleted/edited notes. Trade-off: more API calls for heavily-discussed items.
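The parallel-prefetch, serial-write split can be sketched with scoped OS threads and a channel. This swaps Gitlore's async runtime for std::thread (an assumption made to stay dependency-free), fetches in fixed-size batches rather than a sliding window, and models the discussions table as an in-memory HashMap; `fetch` stands in for the paginated GET, and all names are hypothetical. Only the receiving loop touches the store, so every write is serialized, and each parent's entry is replaced wholesale, which is the full-refresh behavior.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Simplified discussions for one parent: (discussion id, note bodies).
type Discussions = Vec<(String, Vec<String>)>;

/// Fetch discussions for `parents` with at most `concurrency` fetches in
/// flight, and apply each full refresh from a single writer.
fn sync_discussions<F>(parents: &[i64], concurrency: usize, fetch: F, store: &mut HashMap<i64, Discussions>)
where
    F: Fn(i64) -> Discussions + Sync,
{
    let (tx, rx) = mpsc::channel::<(i64, Discussions)>();
    thread::scope(|s| {
        let fetch = &fetch;
        // Producer: fetch one batch in parallel, wait for it, then the next.
        s.spawn(move || {
            for batch in parents.chunks(concurrency.max(1)) {
                thread::scope(|inner| {
                    for &parent in batch {
                        let tx = tx.clone();
                        inner.spawn(move || {
                            let _ = tx.send((parent, fetch(parent)));
                        });
                    }
                });
            }
            // `tx` is dropped here, which ends the writer loop below.
        });
        // Serial writer: replace (never merge) each parent's discussions.
        for (parent, fresh) in rx {
            store.insert(parent, fresh);
        }
    });
}
```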

Rate Limiting

RateLimiter {
  min_interval: 100ms  (= 1s / 10 req/s)
  jitter: 0-50ms random

  acquire():
    elapsed = now() - last_request
    if elapsed < min_interval:
      sleep(min_interval - elapsed + random_jitter)
    last_request = now()  (re-read the clock after sleeping)
}
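A synchronous, std-only version of the limiter, assuming a blocking caller (Gitlore itself runs on async streams, where thread::sleep would become an async sleep). Jitter comes from a small xorshift generator instead of a random-number crate. The wait computation is separated from the sleep so it can be checked without real delays; the struct and method names are hypothetical.

```rust
use std::thread;
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

struct RateLimiter {
    min_interval: Duration, // 1s / requests_per_sec
    max_jitter_ms: u64,     // jitter drawn from 0..=max_jitter_ms
    last_request: Option<Instant>,
    rng_state: u64,         // xorshift64 state, must be non-zero
}

impl RateLimiter {
    /// Panics if requests_per_sec is 0 (division by zero).
    fn new(requests_per_sec: u32, max_jitter_ms: u64) -> Self {
        let seed = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0x9E37_79B9_7F4A_7C15);
        RateLimiter {
            min_interval: Duration::from_secs(1) / requests_per_sec,
            max_jitter_ms,
            last_request: None,
            rng_state: seed | 1,
        }
    }

    /// xorshift64: fast and non-cryptographic, enough to spread requests.
    fn next_jitter(&mut self) -> Duration {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        Duration::from_millis(x % (self.max_jitter_ms + 1))
    }

    /// Wait required for a request arriving at `now`; records the
    /// post-wait instant as last_request.
    fn delay_for(&mut self, now: Instant) -> Duration {
        let wait = match self.last_request {
            Some(last) => {
                let elapsed = now.saturating_duration_since(last);
                if elapsed < self.min_interval {
                    self.min_interval - elapsed + self.next_jitter()
                } else {
                    Duration::ZERO
                }
            }
            None => Duration::ZERO,
        };
        self.last_request = Some(now + wait);
        wait
    }

    /// Blocking acquire: sleep until the next request may be sent.
    fn acquire(&mut self) {
        let wait = self.delay_for(Instant::now());
        if !wait.is_zero() {
            thread::sleep(wait);
        }
    }
}
```

For the concurrent discussion prefetch, a single limiter would need to be shared across workers, for example behind a Mutex.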

Pagination

Async stream-based. Fallback chain for next-page detection:

  1. Link header (RFC 8288): parse rel="next"
  2. x-next-page header: direct page number
  3. Full-page heuristic: if a page returns a full per_page (100) items, assume another page may follow
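The three-step fallback can be written as one pure function over the response headers. This is a sketch: it splits the Link header on commas, which is adequate for GitLab's URLs but is not a general RFC 8288 parser, and it assumes GitLab sends an empty x-next-page value on the last page. NextPage and the function names are hypothetical.

```rust
/// Where the next page comes from.
#[derive(Debug, PartialEq)]
enum NextPage {
    Url(String), // from Link rel="next"
    Number(u32), // from x-next-page, or guessed by the full-page heuristic
}

/// Extract the rel="next" target from a Link header value such as
/// `<https://h/api?page=2>; rel="next", <https://h/api?page=9>; rel="last"`.
fn link_next(link: &str) -> Option<String> {
    for part in link.split(',') {
        let mut pieces = part.split(';');
        let target = pieces.next()?.trim();
        let is_next = pieces.any(|p| {
            let p = p.trim();
            p == "rel=\"next\"" || p == "rel=next"
        });
        if is_next && target.starts_with('<') && target.ends_with('>') {
            return Some(target[1..target.len() - 1].to_string());
        }
    }
    None
}

fn next_page(
    link_header: Option<&str>,
    x_next_page: Option<&str>,
    current_page: u32,
    items_on_page: usize,
    per_page: usize,
) -> Option<NextPage> {
    // 1. Link header.
    if let Some(url) = link_header.and_then(link_next) {
        return Some(NextPage::Url(url));
    }
    // 2. x-next-page: an empty value marks the last page.
    if let Some(v) = x_next_page {
        return v.trim().parse().ok().map(NextPage::Number);
    }
    // 3. Heuristic: a full page suggests there may be more.
    if items_on_page >= per_page {
        Some(NextPage::Number(current_page + 1))
    } else {
        None
    }
}
```

The heuristic can cost one extra, empty request when the last page happens to be exactly full.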

Raw Payload Storage

API Response (JSON bytes) → SHA-256 Hash (dedup check) → Gzip Compress (if enabled) → raw_payloads (BLOB storage)

UNIQUE constraint on (project_id, resource_type, gitlab_id, payload_hash) prevents storing identical payloads.
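The dedup key can be modeled as a set over the same four columns as the UNIQUE constraint. Because the Rust standard library has no SHA-256, this sketch substitutes DefaultHasher purely for illustration; it is neither collision-resistant nor stable across Rust releases, so a real implementation would keep SHA-256. Gzip compression is omitted for the same reason, and the RawPayloads type is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for SHA-256 (illustration only; never persist this value).
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Mirrors UNIQUE(project_id, resource_type, gitlab_id, payload_hash).
#[derive(Default)]
struct RawPayloads {
    keys: HashSet<(i64, String, i64, u64)>,
    blobs: Vec<Vec<u8>>,
}

impl RawPayloads {
    /// Returns true if the payload was stored, false if it was a duplicate.
    fn store(&mut self, project_id: i64, resource_type: &str, gitlab_id: i64, json: &[u8]) -> bool {
        let key = (project_id, resource_type.to_string(), gitlab_id, content_hash(json));
        if !self.keys.insert(key) {
            return false; // identical payload already archived
        }
        // Optional gzip compression would be applied here.
        self.blobs.push(json.to_vec());
        true
    }
}
```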

Concurrency Model

Primary Resource Fetch

Single-threaded async stream. Rate-limited. Each page written in a transaction. Cursor updated at page boundaries.

Discussion Sync

Parallel prefetch (configurable, default 2 concurrent). Serial write phase to avoid DB contention. Each parent entity is one transaction.

Single-Flight Lock

AppLock (database-enforced mutex):
  name: 'sync' (PK)
  owner: UUIDv4 (unique per process)
  heartbeat_at: updated every 30s

  Acquire:
    INSERT; fail if a 'sync' row already exists
    Stale check: if now - heartbeat_at > staleLockMinutes, force-acquire

  Release:
    DELETE WHERE owner = my_uuid
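The acquire, heartbeat, and release rules can be modeled in memory to make the stale-lock logic concrete. In Gitlore the row lives in SQLite so that separate processes see it; this sketch captures only the decision logic, with times in Unix seconds and hypothetical type names.

```rust
/// One row of app_locks (name = 'sync').
#[derive(Debug, Clone, PartialEq)]
struct LockRow {
    owner: String,     // UUIDv4 of the holding process
    heartbeat_at: i64, // Unix seconds
}

struct AppLock {
    row: Option<LockRow>,
    stale_after_secs: i64,
}

impl AppLock {
    fn new(stale_lock_minutes: i64) -> Self {
        AppLock { row: None, stale_after_secs: stale_lock_minutes * 60 }
    }

    /// Succeeds if no row exists, or if the holder's heartbeat is older
    /// than staleLockMinutes (presumed crashed): force-acquire.
    fn try_acquire(&mut self, owner: &str, now: i64) -> bool {
        let free = match &self.row {
            None => true,
            Some(r) => now - r.heartbeat_at > self.stale_after_secs,
        };
        if free {
            self.row = Some(LockRow { owner: owner.to_string(), heartbeat_at: now });
        }
        free
    }

    /// Called by the holder every heartbeatIntervalSeconds. Returns false
    /// if the lock was lost (e.g. force-acquired after going stale).
    fn heartbeat(&mut self, owner: &str, now: i64) -> bool {
        match &mut self.row {
            Some(r) if r.owner == owner => {
                r.heartbeat_at = now;
                true
            }
            _ => false,
        }
    }

    /// DELETE WHERE owner = my_uuid: only the holder can release.
    fn release(&mut self, owner: &str) {
        if matches!(&self.row, Some(r) if r.owner == owner) {
            self.row = None;
        }
    }
}
```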

Progress Events

Event | Description
IssuesFetchStarted | Beginning primary issue fetch
IssueFetched | Each issue processed (for progress bars)
IssuesFetchComplete | All pages consumed
DiscussionSyncStarted | Beginning discussion phase
DiscussionSynced | Each parent's discussions written
DiscussionSyncComplete | All discussions updated
Same events exist for MRs (MrsFetchStarted, etc.)

Field-Level Coverage: API Response vs Gitlore Storage

Every field in every GitLab API response, mapped to what Gitlore does with it. By default, Serde ignores JSON fields that have no counterpart in the Rust structs, so they are silently dropped.

Stored in DB
Used transiently (logic only)
Deserialized but ignored
Never deserialized (silently dropped)

Field Coverage Summary

Issues Response
  Stored: 15 fields
  Deserialized, ignored: 5 fields
  Never deserialized: ~22 fields
Merge Request Response
  Stored: 23 fields
  Transient (fallbacks): 3 fields
  Never deserialized: ~22 fields
Discussion/Note Response
  Stored: 23 fields
  Never deserialized: ~13 fields
Project Response
  Stored: 6 fields
  Deserialized, ignored: 4 fields
  Never deserialized: ~30 fields
Key insight: Raw payloads preserve everything

Although many fields are dropped during transformation, the raw_payloads table stores the complete original JSON response (with SHA-256 dedup and optional gzip). This means all "dropped" data is still recoverable from the blob storage without re-fetching from GitLab. The normalized tables are optimized for query patterns, not completeness.

Efficiency Analysis & Opportunities

Observations on how Gitlore could use the GitLab API more efficiently, and on data it currently leaves on the table.

Current Efficiency Wins

Cursor-based incremental sync

Uses updated_after + order_by=updated_at&sort=asc to only fetch changed records. Avoids full re-fetch on every sync. This is the single biggest efficiency feature.

Raw payload dedup

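SHA-256 hashing prevents storing the same payload twice. When an unchanged record is fetched again (for example, in the cursor rewind overlap or on a repeated sync), its raw blob is deduplicated. A changed updated_at alters the JSON itself, so such a payload hashes differently and is stored as a new version.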

Discussion watermark

Only re-syncs discussions for issues/MRs whose updated_at has advanced past their discussions_synced_for_updated_at watermark. Skips unchanged entities.
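Phase 2's selection can be sketched as a filter, assuming discussions_synced_for_updated_at is NULL (None here) until a parent's first discussion sync; the document does not say how the column is initialized, and Parent is a hypothetical type. That case deserves care: in SQL, updated_at > NULL evaluates to NULL, so a query of the form shown earlier would silently skip never-synced parents unless it adds OR discussions_synced_for_updated_at IS NULL or the column starts at a sentinel value.

```rust
/// A parent entity (issue or MR) as seen by Phase 2.
struct Parent {
    iid: i64,
    updated_at: i64,
    /// Assumed None until the first successful discussion sync.
    discussions_synced_for_updated_at: Option<i64>,
}

/// IIDs whose discussions need a refresh: never synced, or updated since
/// the watermark was last advanced.
fn stale_parents(parents: &[Parent]) -> Vec<i64> {
    parents
        .iter()
        .filter(|p| match p.discussions_synced_for_updated_at {
            None => true,
            Some(watermark) => p.updated_at > watermark,
        })
        .map(|p| p.iid)
        .collect()
}
```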

Parallel discussion prefetch

Fetches discussions for multiple issues/MRs concurrently (configurable, default 2). This overlaps per-request latency and cuts wall-clock time for discussion sync, though total throughput remains capped by the 10 req/s rate limiter.

Potential Inefficiencies

1. Discussion full-refresh strategy

Every time an issue/MR is updated, ALL its discussions are re-fetched and replaced (DELETE + INSERT). For heavily-discussed items (50+ comments), this is expensive.

Scenario | Current | Alternative
Issue with 100 notes gets 1 new comment | Re-fetch all 100 notes (multiple pages) | Could page GET .../notes?order_by=updated_at&sort=desc and stop once notes are older than the last sync
MR label change (no new comments) | Re-fetch all discussions anyway | Could compare user_notes_count before re-fetching

Trade-off: Full-refresh is simpler and guarantees consistency (catches edits, deletes). Incremental would miss deleted notes.
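A hypothetical pre-check for the second scenario might compare GitLab's user_notes_count with the locally stored count before fetching discussions. It assumes local notes carry a system flag, since user_notes_count excludes system notes, and it inherits the trade-off above: an edited note, or one deletion paired with one addition, leaves the count unchanged and would be missed.

```rust
/// Minimal view of a locally stored note.
struct LocalNote {
    system: bool, // true for GitLab-generated system notes
}

/// user_notes_count excludes system notes, so count like with like.
fn local_user_note_count(notes: &[LocalNote]) -> usize {
    notes.iter().filter(|n| !n.system).count()
}

/// Decide whether to run the (expensive) discussions fetch.
fn needs_discussion_refetch(api_user_notes_count: usize, local_notes: &[LocalNote], ever_synced: bool) -> bool {
    if !ever_synced {
        return true; // nothing local to compare against
    }
    // Same count: most likely label/state churn only (edits are not detected).
    api_user_notes_count != local_user_note_count(local_notes)
}
```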

2. Offset pagination instead of keyset

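Gitlore uses page=N&per_page=100 offset pagination. GitLab also offers keyset pagination on a subset of list endpoints; it is more efficient for large datasets and is the approach GitLab recommends where available. Whether the Issues and MRs endpoints support it on a given GitLab version should be confirmed in the GitLab pagination documentation.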

Current:  GET /projects/:id/issues?page=5&per_page=100
Keyset:   GET /projects/:id/issues?pagination=keyset&per_page=100
          (uses Link header rel="next" with cursor)

Benefit: Keyset pagination is O(1) per page (vs O(N) for offset). GitLab recommends it for >10,000 records. Gitlore already parses Link headers, so the client-side support partially exists.

3. No ETag / conditional request support

GitLab returns ETag headers on API responses. Sending If-None-Match on subsequent requests would return 304 Not Modified without consuming rate limit quota on some endpoints. Currently all requests are unconditional.

Impact: Moderate. The cursor-based sync already avoids re-fetching unchanged data, so ETag would mainly help with the discussions full-refresh scenario where nothing changed.
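The client side of conditional requests is small. The sketch below keeps one validator and body per URL, sends If-None-Match when it has one, and reuses the cached body on 304. Whether GitLab emits ETags on these particular endpoints, and whether 304 responses are exempt from rate limiting, should be verified against the target instance; Response and EtagCache are hypothetical types.

```rust
use std::collections::HashMap;

/// The parts of an HTTP response this sketch needs.
struct Response {
    status: u16,
    etag: Option<String>,
    body: String,
}

/// Client-side cache: request URL -> (ETag, body).
#[derive(Default)]
struct EtagCache {
    entries: HashMap<String, (String, String)>,
}

impl EtagCache {
    /// Header to attach to the next GET for `url`, if a validator is known.
    fn conditional_header(&self, url: &str) -> Option<(&'static str, String)> {
        self.entries.get(url).map(|(etag, _)| ("If-None-Match", etag.clone()))
    }

    /// Body to use: the cached copy on 304, otherwise the fresh body
    /// (remembered when the server sent an ETag). None on other statuses.
    fn resolve(&mut self, url: &str, resp: Response) -> Option<String> {
        match resp.status {
            304 => self.entries.get(url).map(|(_, body)| body.clone()),
            200 => {
                if let Some(etag) = resp.etag {
                    self.entries.insert(url.to_string(), (etag, resp.body.clone()));
                }
                Some(resp.body)
            }
            _ => None,
        }
    }
}
```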

4. Labels extracted from embedded data, not dedicated API

Gitlore extracts labels from the labels[] string array embedded in issue/MR responses. The dedicated GET /projects/:id/labels endpoint returns richer data:

From issues response | From Labels API
Label name (string only) | name, color, description, text_color, priority, is_project_label, subscribed, open_issues_count, closed_issues_count, open_merge_requests_count

Impact: The labels table has color and description columns, but they may not be populated from the embedded string array. A single Labels API call (the endpoint is paginated, but one page at per_page=100 covers most projects) would enrich the local label catalog.
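Enrichment could be a two-step merge: names seen in issue/MR labels[] arrays create bare catalog rows, and one Labels API response fills in the richer columns. Label here is a hypothetical subset of the fields listed above, with the catalog held in a HashMap keyed by name.

```rust
use std::collections::HashMap;

/// Local label catalog row (hypothetical subset of Labels API fields).
#[derive(Debug, Clone, Default, PartialEq)]
struct Label {
    name: String,
    color: Option<String>,
    description: Option<String>,
}

/// Step 1: names from issue/MR labels[] arrays create bare rows.
fn upsert_names(catalog: &mut HashMap<String, Label>, names: &[&str]) {
    for &name in names {
        catalog
            .entry(name.to_string())
            .or_insert_with(|| Label { name: name.to_string(), ..Default::default() });
    }
}

/// Step 2: a Labels API response overwrites rows with the full data.
fn enrich(catalog: &mut HashMap<String, Label>, from_api: Vec<Label>) {
    for label in from_api {
        catalog.insert(label.name.clone(), label);
    }
}
```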

Dropped Data Worth Capturing

Fields currently silently dropped that could add value to local queries:

Field | Source | Value Proposition | Effort
user_notes_count | Issues, MRs | Could skip discussion re-sync when the count hasn't changed. Quick "activity" sort without joining the notes table. | Low
upvotes / downvotes | Issues, MRs | Engagement metrics for triage. "Most upvoted issues" is a common query. | Low
confidential | Issues | Security-sensitive filtering. Avoid exposing confidential issues in outputs. | Low
weight | Issues | Effort estimation for sprint planning (Premium/Ultimate only). | Low
time_stats | Issues, MRs | Time tracking data for project reporting. Already in the response, free to capture. | Low
has_conflicts | MRs | Identify MRs needing rebase. Useful for "stale MR" alerts. | Low
blocking_discussions_resolved | MRs | MR readiness indicator without joining the discussions table. | Low
merge_commit_sha | MRs | Trace merged MRs to specific commits. Useful for git correlation. | Low
suggestions[] | Discussion notes | Code review suggestions with from/to content. Rich data for code review analysis. | Medium
task_completion_status | Issues, MRs | Track task-list checkbox progress without parsing description markdown. | Low
issue_type | Issues | Distinguish issues vs incidents vs test cases. | Low
discussion_locked | Issues, MRs | Know if new comments can be added. | Low

Structural Optimization Opportunities

5. User denormalization

Currently stores only username for authors, assignees, reviewers. The API returns name, avatar_url, web_url, and state for every user reference. A users table could deduplicate this data and provide richer displays.

-- Potential schema
CREATE TABLE users (
  username TEXT PRIMARY KEY,
  name TEXT,
  gitlab_id INTEGER,
  avatar_url TEXT,
  state TEXT,           -- "active", "blocked", etc.
  last_seen_at INTEGER  -- auto-updated on encounter
);

Cost: No additional API calls. Data is already in every issue/MR/note response. Just needs extraction during transform.
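Extraction during transform could amount to an upsert keyed by username, mirroring the schema sketch above. In this hypothetical version, a reference seen at a later time overwrites the stored name and state, while an older one leaves them alone; UserRef and UserRow are illustrative names, and timestamps are opaque integers.

```rust
use std::collections::HashMap;

/// A user reference as embedded in issue/MR/note payloads (subset).
#[derive(Debug, Clone, PartialEq)]
struct UserRef {
    username: String,
    name: String,
    gitlab_id: i64,
    state: String, // "active", "blocked", etc.
}

/// In-memory stand-in for a row of the users table.
#[derive(Debug, Clone, PartialEq)]
struct UserRow {
    user: UserRef,
    last_seen_at: i64,
}

/// Upsert every user reference found in one transformed payload.
fn record_users(table: &mut HashMap<String, UserRow>, refs: &[UserRef], seen_at: i64) {
    for r in refs {
        table
            .entry(r.username.clone())
            .and_modify(|row| {
                if seen_at >= row.last_seen_at {
                    row.user = r.clone(); // newer data wins
                    row.last_seen_at = seen_at;
                }
            })
            .or_insert_with(|| UserRow { user: r.clone(), last_seen_at: seen_at });
    }
}
```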

6. MR milestone not captured

Milestones are stored for issues but the MR transformer does not extract the milestone object from MR responses, even though GitLab returns it. The merge_requests table has no milestone_id column.

Impact: Cannot query "which MRs are in milestone X?" locally. The data is in the raw payload but not indexed.

7. Issue references not captured

MRs store references.short and references.full, but the issue transformer drops the references object entirely. This means issues lack the cross-project reference format (e.g., group/project#42).

API Strategies Not Yet Used

Webhooks (push-based sync)

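Instead of polling, GitLab can push events to a registered webhook (POST /projects/:id/hooks). This would enable near-real-time sync without rate-limit cost, but it requires a listener endpoint, and registering the hook is itself a write operation that needs at least the Maintainer role on the project.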

Events API (lightweight change detection)

GET /projects/:id/events returns a stream of all project activity. Could be used as a fast "has anything changed?" check before running expensive issue/MR sync. Much lighter than fetching full issue lists.

GraphQL API (precise field selection)

GitLab's GraphQL API allows requesting exactly the fields needed, which would avoid downloading the roughly half of response fields that are currently dropped. Trade-off: a different pagination model and a potentially less stable API surface.

Summary Verdict

Gitlore is well-optimized for its core use case (read-only local sync). The cursor-based incremental sync and raw payload archival are sophisticated. The main opportunities are:

  1. Capture more "free" data: fields like user_notes_count, upvotes, and has_conflicts are already in API responses. Storing them costs zero API calls and enables richer queries.
  2. Discussion sync efficiency: the full-refresh strategy is the biggest source of redundant API calls. Even a simple user_notes_count comparison could skip unchanged discussions.
  3. Keyset pagination: a meaningful improvement for large projects (>10K issues), and Gitlore already has partial infrastructure for it.
  4. MR milestone parity: a low-effort gap to close, matching the existing issue milestone support.