Gitlore ↔ GitLab API Review

What is Gitlore?

Gitlore (lore) is a read-only local sync engine for GitLab data. It fetches issues, merge requests, and discussions from the GitLab REST API v4 and stores them in a local SQLite database for fast offline querying.

~11,148 lines of Rust. Noun-first CLI. Robot mode for automation. Cursor-based incremental sync.

API Surface Coverage

Gitlore uses a small, focused subset of the GitLab API. It is read-only — it never creates, updates, or deletes anything on GitLab.

Endpoints Used: 7

Endpoint	Method
Issues (list)	GET
Merge Requests (list)	GET
Issue Discussions (list)	GET
MR Discussions (list)	GET
Current User	GET
Project Details	GET
GitLab Version	GET

Coverage by Resource

Resource	Endpoints used
Issues API	1 of ~20
MR API	1 of ~30
Discussions API	2 of ~30
Users API	1 of ~15
Projects API	1 of ~50

Data Flow

GitLab API v4 → Rate Limiter (10 req/s + jitter) → Async Streams (paginated fetch) → Transformers (normalize data) → SQLite (WAL, local DB)

Key Design Decisions

Read-only sync

Only GET requests. Never mutates GitLab state. Safe to run repeatedly.

Cursor-based incremental

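Uses the updated_after parameter to fetch only changed records. The cursor is rewound by 2 seconds (cursorRewindSeconds) so records updated at the boundary timestamp are not missed; rows already processed in that overlap are skipped.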

Raw payload archival

Stores original JSON responses with SHA-256 dedup and optional gzip compression.

Discussion full-refresh

Discussions use DELETE+INSERT strategy per parent (no incremental). Parallel prefetch, serial write.

CLI Architecture

Command Structure (Noun-First)

lore <noun> [verb/arg]    # Primary pattern
lore issues              # List all issues
lore issues 42           # Show issue #42
lore mrs                 # List all merge requests
lore mrs 17              # Show MR #17
lore ingest issues       # Fetch issues from GitLab
lore ingest mrs          # Fetch MRs from GitLab
lore count issues        # Count local issues
lore count discussions   # Count local discussions
lore status              # Show sync state
lore auth                # Verify GitLab auth
lore doctor              # Health check
lore init                # Initialize config + DB
lore migrate             # Run DB migrations
lore version             # Show version

Global Flags

Flag	Description
`-c, --config`	Path to config file
`--robot`	Machine-readable JSON output
`-J, --json`	JSON shorthand (same as --robot)

Robot Mode Detection

Three ways to activate:

lore --robot issues              # Explicit flag
lore issues | jq .               # Auto: stdout is not a TTY
LORE_ROBOT=1 lore issues         # Environment variable

Robot mode returns JSON: {"ok":true,"data":{...},"meta":{...}}

Errors go to stderr: {"error":{"code":"...","message":"...","suggestion":"..."}}
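The three activation rules reduce to a single predicate. The sketch below is illustrative rather than Gitlore's code: the function names are hypothetical, and treating any `LORE_ROBOT` value other than empty or `"0"` as "on" is an assumption (the text only shows `LORE_ROBOT=1`). It uses `std::io::IsTerminal` (Rust 1.70+) for the TTY check.

```rust
use std::io::IsTerminal;

/// Decide whether output should be robot-mode JSON.
/// `flag`: --robot, --json or -J was passed.
/// `env_value`: the value of LORE_ROBOT, if set.
/// `stdout_is_tty`: whether stdout is attached to a terminal.
fn robot_mode(flag: bool, env_value: Option<&str>, stdout_is_tty: bool) -> bool {
    // Assumption: any non-empty value other than "0" enables robot mode.
    let env_on = matches!(env_value, Some(v) if !v.is_empty() && v != "0");
    flag || env_on || !stdout_is_tty
}

/// Gather the three signals from the real process (hypothetical helper).
fn detect_robot_mode(args: &[String]) -> bool {
    let flag = args
        .iter()
        .any(|a| matches!(a.as_str(), "--robot" | "--json" | "-J"));
    let env_value = std::env::var("LORE_ROBOT").ok();
    robot_mode(flag, env_value.as_deref(), std::io::stdout().is_terminal())
}
```

Keeping the decision in a pure function makes all three rules testable without a real terminal.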

Exit Codes

Code	Meaning
0	Success
1	Internal error
2	Config not found
3	Config invalid
4	Token not set
5	GitLab auth failed
6	Resource not found
7	Rate limited
8	Network error
9	Database locked
10	Database error
11	Migration failed
12	I/O error
13	Transform error
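Scripts and wrappers that branch on these codes may want them as named constants. A sketch mapping the table onto a Rust enum; the enum and variant names are hypothetical, and only the numeric values come from the table.

```rust
/// Exit codes from the table above (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
enum LoreExit {
    Success = 0,
    InternalError = 1,
    ConfigNotFound = 2,
    ConfigInvalid = 3,
    TokenNotSet = 4,
    AuthFailed = 5,
    NotFound = 6,
    RateLimited = 7,
    NetworkError = 8,
    DatabaseLocked = 9,
    DatabaseError = 10,
    MigrationFailed = 11,
    IoError = 12,
    TransformError = 13,
}

impl LoreExit {
    const ALL: [LoreExit; 14] = [
        LoreExit::Success, LoreExit::InternalError, LoreExit::ConfigNotFound,
        LoreExit::ConfigInvalid, LoreExit::TokenNotSet, LoreExit::AuthFailed,
        LoreExit::NotFound, LoreExit::RateLimited, LoreExit::NetworkError,
        LoreExit::DatabaseLocked, LoreExit::DatabaseError, LoreExit::MigrationFailed,
        LoreExit::IoError, LoreExit::TransformError,
    ];

    /// Numeric process exit status.
    fn code(self) -> i32 {
        self as i32
    }

    /// Map an observed exit status back to a variant (e.g. in a wrapper).
    fn from_code(code: i32) -> Option<Self> {
        Self::ALL.iter().copied().find(|e| e.code() == code)
    }
}
```

A caller would exit with, for example, `std::process::exit(LoreExit::ConfigNotFound.code())`.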

Configuration

{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10,
    "heartbeatIntervalSeconds": 30,
    "cursorRewindSeconds": 2,
    "primaryConcurrency": 4,
    "dependentConcurrency": 2
  },
  "storage": {
    "dbPath": "~/.local/share/lore/lore.db",
    "compressRawPayloads": true
  }
}

Deprecated Commands (Hidden)

Old	New	Notes
`lore list`	`lore issues` / `lore mrs`	Shows deprecation warning
`lore show`	`lore issues <iid>`	Shows deprecation warning
`lore auth-test`	`lore auth`	Alias
`lore sync-status`	`lore status`	Alias

GitLab API Endpoints Used by Gitlore

All requests use PRIVATE-TOKEN header authentication. Rate limited at 10 req/s with 0-50ms jitter.

GET /api/v4/projects/:id/issues — List project issues

Used By

lore ingest issues

Query Parameters Used

Param	Value	Purpose
`scope`	`all`	Include all issues, not just those created by or assigned to the user
`state`	`all`	Open and closed
`order_by`	`updated_at`	Support cursor-based incremental sync
`sort`	`asc`	Oldest first for cursor correctness
`per_page`	`100`	Maximum page size
`page`	`{n}`	Offset pagination
`updated_after`	ISO 8601	Cursor: only fetch changes since last sync

Response Fields Consumed

id, iid, project_id, title, description, state,
created_at, updated_at, closed_at,
author { id, username, name },
assignees[] { id, username, name },
labels[], milestone { id, iid, title, state, due_date, web_url },
web_url, due_date

Pagination

Fallback chain: Link header (RFC 8288) → x-next-page header → full-page heuristic (100 items = more pages)

GET /api/v4/projects/:id/merge_requests — List project MRs

Used By

lore ingest mrs

Query Parameters Used

Param	Value	Purpose
`scope`	`all`	All MRs in project
`state`	`all`	opened, merged, closed, locked
`order_by`	`updated_at`	Cursor-based sync
`sort`	`asc`	Oldest first
`per_page`	`100`	Max page size
`page`	`{n}`	Offset pagination
`updated_after`	ISO 8601	Incremental cursor

Response Fields Consumed

id, iid, project_id, title, description, state,
draft (preferred) OR work_in_progress (fallback),
author, assignees[], reviewers[],
labels[], source_branch, target_branch, sha,
references { short, full },
detailed_merge_status (preferred) OR merge_status (fallback),
merge_user (preferred) OR merged_by (fallback),
created_at, updated_at, merged_at, closed_at, web_url

Deprecated Field Fallbacks

Preferred	Fallback
`draft`	`work_in_progress`
`detailed_merge_status`	`merge_status`
`merge_user`	`merged_by`
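The preferred/fallback rule reduces to `Option::or`. A minimal sketch, assuming the deserialized MR struct exposes both the current and the deprecated field as `Option`s; the struct names are hypothetical, usernames stand in for the full user objects, and defaulting a missing draft flag to `false` is an assumption.

```rust
/// Hypothetical subset of the MR payload; field names match the GitLab keys.
#[derive(Default)]
struct MrFields {
    draft: Option<bool>,
    work_in_progress: Option<bool>,
    detailed_merge_status: Option<String>,
    merge_status: Option<String>,
    merge_user: Option<String>, // username only, for brevity
    merged_by: Option<String>,
}

/// Values that end up in the merge_requests row.
struct NormalizedMr {
    draft: bool,
    detailed_merge_status: Option<String>,
    merge_user_username: Option<String>,
}

fn normalize(mr: MrFields) -> NormalizedMr {
    NormalizedMr {
        // Preferred field first, deprecated field second.
        draft: mr.draft.or(mr.work_in_progress).unwrap_or(false),
        detailed_merge_status: mr.detailed_merge_status.or(mr.merge_status),
        merge_user_username: mr.merge_user.or(mr.merged_by),
    }
}
```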

GET /api/v4/projects/:id/issues/:iid/discussions — Issue discussions

Used By

lore ingest issues (dependent phase)

Sync Strategy

Full-refresh: DELETE all existing discussions for the issue, INSERT all fetched discussions. Only triggered when issue.updated_at > issue.discussions_synced_for_updated_at.

Response Fields Consumed

id (string), individual_note (bool),
notes[] {
  id, type (DiscussionNote | DiffNote | null),
  body, author { username, name },
  created_at, updated_at,
  system (bool), resolvable, resolved,
  resolved_by, resolved_at,
  position { old_path, new_path, old_line, new_line,
    position_type, line_range, base_sha, start_sha, head_sha }
}

GET /api/v4/projects/:id/merge_requests/:iid/discussions — MR discussions

Used By

lore ingest mrs (dependent phase)

Sync Strategy

Same full-refresh strategy as issue discussions. Includes DiffNote support (code review comments on specific lines).

Additional: DiffNote Position

position {
  old_path, new_path,       // file paths
  old_line, new_line,       // line numbers
  position_type,            // "text" or "image"
  line_range {              // multi-line comments
    start { line_code, type, old_line, new_line },
    end { line_code, type, old_line, new_line }
  },
  base_sha, start_sha, head_sha  // commit references
}

GET /api/v4/user — Current authenticated user

Used By

lore auth, lore doctor

Response Fields

id, username, name, email, avatar_url, web_url, created_at, state

GET /api/v4/projects/:path_encoded — Project details

Used By

lore ingest (project resolution during init/sync)

Response Fields

id, path_with_namespace, default_branch, web_url,
created_at, updated_at, name, description, visibility, archived

GET /api/v4/version — GitLab instance version

Used By

lore doctor

Response Fields

version, revision

Full GitLab REST API v4 Reference

Complete endpoint inventory for the resources relevant to Gitlore. USED = consumed by Gitlore.

Issues API

Method	Endpoint	Description	Status
GET	`/issues`	List all issues (global)	--
GET	`/groups/:id/issues`	List group issues	--
GET	`/projects/:id/issues`	List project issues	USED
GET	`/projects/:id/issues/:iid`	Get single issue	--
POST	`/projects/:id/issues`	Create issue	--
PUT	`/projects/:id/issues/:iid`	Update issue	--
DEL	`/projects/:id/issues/:iid`	Delete issue	--
PUT	`/projects/:id/issues/:iid/reorder`	Reorder issue	--
POST	`/projects/:id/issues/:iid/move`	Move issue	--
POST	`/projects/:id/issues/:iid/clone`	Clone issue	--
POST	`/projects/:id/issues/:iid/subscribe`	Subscribe to issue	--
POST	`/projects/:id/issues/:iid/unsubscribe`	Unsubscribe	--
POST	`/projects/:id/issues/:iid/todo`	Create to-do	--
POST	`/projects/:id/issues/:iid/time_estimate`	Set time estimate	--
POST	`/projects/:id/issues/:iid/add_spent_time`	Add spent time	--
GET	`/projects/:id/issues/:iid/time_stats`	Get time stats	--
GET	`/projects/:id/issues/:iid/related_merge_requests`	Related MRs	--
GET	`/projects/:id/issues/:iid/closed_by`	MRs that close issue	--
GET	`/projects/:id/issues/:iid/participants`	List participants	--

Merge Requests API

Method	Endpoint	Description	Status
GET	`/merge_requests`	List all MRs (global)	--
GET	`/groups/:id/merge_requests`	List group MRs	--
GET	`/projects/:id/merge_requests`	List project MRs	USED
GET	`/projects/:id/merge_requests/:iid`	Get single MR	--
POST	`/projects/:id/merge_requests`	Create MR	--
PUT	`/projects/:id/merge_requests/:iid`	Update MR	--
DEL	`/projects/:id/merge_requests/:iid`	Delete MR	--
PUT	`/projects/:id/merge_requests/:iid/merge`	Merge an MR	--
POST	`/projects/:id/merge_requests/:iid/cancel_merge`	Cancel merge	--
PUT	`/projects/:id/merge_requests/:iid/rebase`	Rebase MR	--
GET	`/projects/:id/merge_requests/:iid/commits`	List MR commits	--
GET	`/projects/:id/merge_requests/:iid/changes`	List MR diffs	--
GET	`/projects/:id/merge_requests/:iid/pipelines`	MR pipelines	--
GET	`/projects/:id/merge_requests/:iid/participants`	MR participants	--
GET	`/projects/:id/merge_requests/:iid/approvals`	MR approvals	--
POST	`/projects/:id/merge_requests/:iid/approve`	Approve MR	--

Discussions API

Method	Endpoint	Description	Status
GET	`/projects/:id/issues/:iid/discussions`	List issue discussions	USED
GET	`/projects/:id/issues/:iid/discussions/:did`	Get single discussion	--
POST	`/projects/:id/issues/:iid/discussions`	Create issue thread	--
POST	`/projects/:id/issues/:iid/discussions/:did/notes`	Add note to thread	--
PUT	`/projects/:id/issues/:iid/discussions/:did/notes/:nid`	Modify note	--
DEL	`/projects/:id/issues/:iid/discussions/:did/notes/:nid`	Delete note	--
GET	`/projects/:id/merge_requests/:iid/discussions`	List MR discussions	USED
GET	`/projects/:id/merge_requests/:iid/discussions/:did`	Get single MR discussion	--
POST	`/projects/:id/merge_requests/:iid/discussions`	Create MR thread	--
PUT	`/projects/:id/merge_requests/:iid/discussions/:did`	Resolve/unresolve thread	--
POST	`/projects/:id/merge_requests/:iid/discussions/:did/notes`	Add note to MR thread	--
PUT	`/projects/:id/merge_requests/:iid/discussions/:did/notes/:nid`	Modify MR note	--
DEL	`/projects/:id/merge_requests/:iid/discussions/:did/notes/:nid`	Delete MR note	--
GET	`/projects/:id/snippets/:sid/discussions`	List snippet discussions	--
GET	`/groups/:id/epics/:eid/discussions`	List epic discussions	--
GET	`/projects/:id/repository/commits/:sha/discussions`	List commit discussions	--

Notes API (Flat, non-threaded)

Method	Endpoint	Description	Status
GET	`/projects/:id/issues/:iid/notes`	List issue notes	--
POST	`/projects/:id/issues/:iid/notes`	Create issue note	--
GET	`/projects/:id/merge_requests/:iid/notes`	List MR notes	--
POST	`/projects/:id/merge_requests/:iid/notes`	Create MR note	--
GET	`/projects/:id/snippets/:sid/notes`	List snippet notes	--

Gitlore uses the Discussions API (threaded) instead of the flat Notes API. Notes are extracted from discussion responses.

Other APIs Used

Method	Endpoint	Description	Status
GET	`/user`	Current authenticated user	USED
GET	`/projects/:path`	Get project by path	USED
GET	`/version`	GitLab instance version	USED

CLI Command ↔ API Endpoint Mapping

How each CLI command maps to GitLab API calls and local database operations.

lore ingest issues

Phase 1: Fetch primary resources → GET /projects/:id/issues (paginated, cursor-based)

Phase 2: Identify stale discussions → SQL: WHERE updated_at > discussions_synced_for_updated_at

Phase 3: Sync discussions → GET /projects/:id/issues/:iid/discussions (parallel prefetch)

Storage: Write to DB → Tables: issues, labels, issue_labels, discussions, notes, raw_payloads

lore ingest mrs

Phase 1: Fetch primary resources → GET /projects/:id/merge_requests (paginated, cursor-based)

Phase 2: Identify stale discussions → SQL: WHERE updated_at > discussions_synced_for_updated_at

Phase 3: Sync discussions → GET /projects/:id/merge_requests/:iid/discussions (parallel prefetch)

Storage: Write to DB → Tables: merge_requests, labels, mr_labels, mr_assignees, mr_reviewers, discussions, notes, raw_payloads

lore issues / lore mrs

List mode: Query local DB → SQL: SELECT ... FROM issues/merge_requests with filters (no API call)

Show mode: Query local DB by IID → SQL: SELECT ... WHERE iid = ? + join discussions/notes (no API call)

lore auth

Verify token works → GET /api/v4/user

lore doctor

Check auth + GitLab version → GET /api/v4/user + GET /api/v4/version

Check each configured project → GET /api/v4/projects/:path

lore count / lore status / lore init / lore migrate

Local-only operations → No API calls. Database queries only.

API Capabilities NOT Used by Gitlore

Write Operations

Create/update/delete issues
Create/update/delete MRs
Merge MRs
Create/reply to discussions
Resolve/unresolve threads
Approve MRs

Read Operations

Single issue/MR fetch (uses list with filters instead)
MR commits, diffs, pipelines
Issue/MR participants
Time tracking stats
Related MRs / closed-by
Labels API (extracted from issue/MR responses)
Milestones API (extracted from issue responses)
Flat Notes API (uses threaded Discussions API)
Snippets, Epics, Commits discussions
Webhooks, CI/CD, Pipelines, Deployments

Database Schema

SQLite with WAL mode. 12 tables across 6 migrations.

Entity Relationship

  projects ──────────────────────────────────────────────────────────
    │                                                                │
    ├──< issues ──< issue_labels >── labels                          │
    │     │                                                          │
    │     └──< discussions ──< notes                                 │
    │                                                                │
    ├──< merge_requests ──< mr_labels >── labels                    │
    │     │    │    │                                                 │
    │     │    │    └──< mr_reviewers                                 │
    │     │    └──< mr_assignees                                     │
    │     │                                                          │
    │     └──< discussions ──< notes                                 │
    │                                                                │
    ├──< raw_payloads                                                │
    ├──< sync_cursors                                                │
    └── sync_runs, app_locks, schema_version                         │
  ───────────────────────────────────────────────────────────────────

Table Details

projects — Configured GitLab projects

Column	Type	Notes
id	INTEGER PK	Auto-increment
gitlab_project_id	INTEGER UNIQUE	GitLab's project ID
path_with_namespace	TEXT	e.g. "group/project"
default_branch	TEXT	e.g. "main"
web_url	TEXT	Full URL
created_at, updated_at	INTEGER	ms epoch
raw_payload_id	INTEGER FK	Link to raw_payloads

issues — Synced GitLab issues

Column	Type	Notes
id	INTEGER PK	Auto-increment
gitlab_id	INTEGER UNIQUE	GitLab's issue ID
project_id	INTEGER FK	projects.id
iid	INTEGER	Issue # within project (UNIQUE with project_id)
title, description	TEXT
state	TEXT	'opened' | 'closed'
author_username	TEXT	Denormalized
created_at, updated_at, closed_at	INTEGER	ms epoch
last_seen_at	INTEGER	Orphan detection
discussions_synced_for_updated_at	INTEGER	Discussion sync watermark
web_url	TEXT
raw_payload_id	INTEGER FK

merge_requests — Synced GitLab MRs

Column	Type	Notes
id	INTEGER PK
gitlab_id	INTEGER UNIQUE
project_id	INTEGER FK
iid	INTEGER
title, description, state	TEXT	'opened' | 'merged' | 'closed' | 'locked'
draft	INTEGER	0/1
author_username	TEXT
source_branch, target_branch	TEXT
head_sha	TEXT
references_short, references_full	TEXT	e.g. "!42", "group/proj!42"
detailed_merge_status	TEXT
merge_user_username	TEXT
created_at, updated_at, merged_at, closed_at	INTEGER	ms epoch
last_seen_at	INTEGER
discussions_synced_for_updated_at	INTEGER	Watermark
discussions_sync_last_attempt_at	INTEGER	Health telemetry
discussions_sync_attempts	INTEGER	Retry count
discussions_sync_last_error	TEXT
web_url, raw_payload_id	TEXT/INT

discussions + notes — Threaded comments

discussions

Column	Type	Notes
id	INTEGER PK
gitlab_discussion_id	TEXT	SHA-like string ID
project_id	INTEGER FK
issue_id	INTEGER FK	XOR with merge_request_id
merge_request_id	INTEGER FK	XOR with issue_id
noteable_type	TEXT	'Issue' | 'MergeRequest'
individual_note	INTEGER	0 = threaded, 1 = standalone
resolvable, resolved	INTEGER	0/1
first_note_at, last_note_at	INTEGER	Computed from notes

notes

Column	Type	Notes
id	INTEGER PK
gitlab_id	INTEGER UNIQUE
discussion_id	INTEGER FK
note_type	TEXT	'DiscussionNote' | 'DiffNote' | null
is_system	INTEGER	Auto-generated (label changes, etc.)
author_username, body	TEXT
position	INTEGER	0-indexed order within discussion
resolvable, resolved	INTEGER	0/1
resolved_by, resolved_at	TEXT/INT
position_old/new_path	TEXT	DiffNote: file path
position_old/new_line	INTEGER	DiffNote: line number
position_type	TEXT	DiffNote: "text" or "image"
position_line_range_start/end	TEXT	JSON for multi-line
position_base/start/head_sha	TEXT	Commit references

Infrastructure tables — sync_cursors, sync_runs, app_locks, raw_payloads

sync_cursors

PK: (project_id, resource_type). Tracks updated_at_cursor (ms epoch) and tie_breaker_id for incremental sync.

sync_runs

Audit trail: started_at, finished_at, status ('running'|'succeeded'|'failed'), command, error, metrics_json.

app_locks

Single-flight mutex: name PK ('sync'), owner (UUIDv4), heartbeat_at. Prevents concurrent sync.

raw_payloads

Archived API responses. SHA-256 dedup. Optional gzip. UNIQUE(project_id, resource_type, gitlab_id, payload_hash).

SQLite Pragmas

PRAGMA journal_mode = WAL;     -- Write-ahead logging
PRAGMA synchronous = NORMAL;   -- Safe with WAL
PRAGMA foreign_keys = ON;      -- Referential integrity
PRAGMA busy_timeout = 5000;    -- Wait up to 5 s on a locked DB
PRAGMA temp_store = MEMORY;    -- Keep temp tables and indices in memory

Ingestion Pipeline

Three-Phase Architecture

Phase 1: Primary Fetch
Paginated API fetch with cursor-based sync. Stores raw payloads and normalized rows.

Phase 2: Identify Stale
SQL query: which issues/MRs need their discussions refreshed?

Phase 3: Discussion Sync
Parallel prefetch + serial write. Full refresh per parent entity.

Cursor-Based Incremental Sync

Cursor State: (updated_at_cursor: i64, tie_breaker_id: i64)

First sync:
  updated_after = now - backfillDays

Subsequent syncs:
  updated_after = cursor.updated_at_cursor - cursorRewindSeconds

  For each fetched resource:
    if (updated_at, gitlab_id) <= (cursor.updated_at_cursor, cursor.tie_breaker_id):
      SKIP (already processed in overlap zone)
    else:
      UPSERT into database

  After each page boundary:
    UPDATE sync_cursors  (crash-recovery safe)
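The cursor arithmetic and the overlap-skip test can be sketched in Rust. This is a minimal illustration, not Gitlore's code: it assumes times are held as ms-epoch `i64` values (as in sync_cursors) and that `tie_breaker_id` holds the GitLab ID of the last processed row; formatting `updated_after` as ISO 8601 and the database writes are left out, and all names are hypothetical.

```rust
/// Mirrors a sync_cursors row: (updated_at_cursor, tie_breaker_id).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cursor {
    updated_at_ms: i64,
    tie_breaker_id: i64, // assumed: gitlab_id of the last processed row
}

const MS_PER_DAY: i64 = 86_400_000;

/// Lower bound to send as `updated_after`, in ms epoch.
fn updated_after_ms(cursor: Option<Cursor>, now_ms: i64, backfill_days: i64, rewind_seconds: i64) -> i64 {
    match cursor {
        None => now_ms - backfill_days * MS_PER_DAY,          // first sync: backfill window
        Some(c) => c.updated_at_ms - rewind_seconds * 1000,   // later syncs: rewind overlap
    }
}

/// True if the row sorts at or before the cursor, i.e. it was already upserted.
fn already_processed(cursor: Option<Cursor>, updated_at_ms: i64, gitlab_id: i64) -> bool {
    match cursor {
        None => false,
        // Tuple comparison is lexicographic: updated_at first, then the tie-breaker.
        Some(c) => (updated_at_ms, gitlab_id) <= (c.updated_at_ms, c.tie_breaker_id),
    }
}
```

Comparing `(updated_at, id)` as a tuple is what lets the rewind window re-fetch rows safely: anything at or before the saved position is skipped, and anything after it is upserted.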

Discussion Sync Strategy

For each issue/MR where updated_at > discussions_synced_for_updated_at:

  1. PREFETCH (parallel, configurable concurrency):
     GET /projects/:id/issues/:iid/discussions  (all pages)

  2. WRITE (serial, one transaction per parent):
     DELETE FROM notes WHERE discussion_id IN (...)
     DELETE FROM discussions WHERE issue_id = ?
     INSERT discussions + notes (fresh data)
     UPDATE issues SET discussions_synced_for_updated_at = updated_at WHERE id = ?

Full refresh avoids the complexity of detecting deleted or edited notes. Trade-off: more API calls for heavily discussed items.
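The stale check and the replace-then-advance write can be modeled in memory. A sketch under stated assumptions: discussions are reduced to ID strings, the names are hypothetical, and a never-synced parent is modeled as a `None` watermark. If the watermark column starts as NULL, a plain `updated_at > watermark` in SQL would skip those rows (the comparison yields NULL), so an explicit `IS NULL` branch would be needed there.

```rust
use std::collections::HashMap;

/// Minimal view of an issue or MR row.
struct Parent {
    id: i64,
    updated_at: i64,
    // None models a NULL watermark: discussions never synced.
    discussions_synced_for_updated_at: Option<i64>,
}

/// Phase 2: does this parent need its discussions refreshed?
fn needs_discussion_sync(p: &Parent) -> bool {
    match p.discussions_synced_for_updated_at {
        None => true,
        Some(watermark) => p.updated_at > watermark,
    }
}

/// Phase 3 write: replace every discussion for the parent, then advance the
/// watermark. In the database both steps share one transaction.
fn apply_full_refresh(store: &mut HashMap<i64, Vec<String>>, parent: &mut Parent, fetched: Vec<String>) {
    // Overwriting drops deleted discussions without any diffing.
    store.insert(parent.id, fetched);
    parent.discussions_synced_for_updated_at = Some(parent.updated_at);
}
```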

Rate Limiting

RateLimiter {
  min_interval: 100ms  (= 1s / 10 req/s)
  jitter: 0-50ms random

  acquire():
    elapsed = now() - last_request
    if elapsed < min_interval:
      sleep(min_interval - elapsed + random_jitter)
    last_request = now()   // re-read the clock after any sleep
}
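A runnable version of the limiter above, with two assumptions: it is synchronous (`std::thread::sleep`), whereas Gitlore's async pipeline would await a timer instead, and jitter comes from a small xorshift generator so the sketch needs no external crate. Names are illustrative.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// At most one request per `min_interval`, plus 0..=max_jitter_ms of
/// random jitter whenever a caller has to wait.
struct RateLimiter {
    min_interval: Duration,
    max_jitter_ms: u64,
    last_request: Option<Instant>,
    rng_state: u64, // xorshift64 state; not cryptographic
}

impl RateLimiter {
    fn new(requests_per_second: u32, max_jitter_ms: u64) -> Self {
        RateLimiter {
            min_interval: Duration::from_secs(1) / requests_per_second,
            max_jitter_ms,
            last_request: None,
            rng_state: 0x9E37_79B9_7F4A_7C15, // any non-zero seed works
        }
    }

    fn random_jitter(&mut self) -> Duration {
        let mut x = self.rng_state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.rng_state = x;
        Duration::from_millis(x % (self.max_jitter_ms + 1))
    }

    /// How long a request issued at `now` must wait.
    fn delay(&mut self, now: Instant) -> Duration {
        match self.last_request {
            Some(last) => {
                let elapsed = now.saturating_duration_since(last);
                if elapsed < self.min_interval {
                    self.min_interval - elapsed + self.random_jitter()
                } else {
                    Duration::ZERO
                }
            }
            None => Duration::ZERO, // first request goes out immediately
        }
    }

    /// Blocking acquire: sleep if needed, then stamp the request time.
    fn acquire(&mut self) {
        let wait = self.delay(Instant::now());
        if !wait.is_zero() {
            thread::sleep(wait);
        }
        // Read the clock again so the next interval starts after the sleep.
        self.last_request = Some(Instant::now());
    }
}
```

Splitting `delay` from `acquire` keeps the timing rule testable with synthetic `Instant`s.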

Pagination

Async stream-based. Fallback chain for next-page detection:

Link header (RFC 8288) — parse rel="next"
x-next-page header — direct page number
Full-page heuristic — if response has 100 items, assume more pages
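The fallback chain can be sketched as a pure function over the response headers and page size. This is an illustration, not Gitlore's code: header parsing is simplified (it assumes URLs contain no commas or semicolons), and the enum and function names are hypothetical.

```rust
/// Where the next page comes from, in fallback order.
#[derive(Debug, PartialEq)]
enum NextPage {
    Url(String), // rel="next" target from the Link header
    Page(u32),   // page number from x-next-page or the heuristic
    Done,
}

/// Extract the rel="next" URL from an RFC 8288 Link header.
fn next_from_link(link: &str) -> Option<String> {
    link.split(',').find_map(|entry| {
        let mut parts = entry.split(';');
        let target = parts.next()?.trim();
        let is_next = parts.any(|p| matches!(p.trim(), "rel=\"next\"" | "rel=next"));
        let url = target.strip_prefix('<')?.strip_suffix('>')?;
        if is_next { Some(url.to_string()) } else { None }
    })
}

/// Link header first, then x-next-page, then the full-page heuristic.
fn next_page(
    link: Option<&str>,
    x_next_page: Option<&str>,
    current_page: u32,
    items: usize,
    per_page: usize,
) -> NextPage {
    if let Some(url) = link.and_then(next_from_link) {
        return NextPage::Url(url);
    }
    // An empty or unparsable x-next-page falls through to the heuristic.
    if let Some(n) = x_next_page.and_then(|v| v.trim().parse::<u32>().ok()) {
        return NextPage::Page(n);
    }
    if items == per_page {
        NextPage::Page(current_page + 1) // full page: assume more; an empty page ends it
    } else {
        NextPage::Done
    }
}
```

The heuristic costs at most one extra request (an empty final page) when the last page happens to be exactly full.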

Raw Payload Storage

API Response (JSON bytes) → SHA-256 Hash (dedup check) → Gzip Compress (if enabled) → raw_payloads (BLOB storage)
UNIQUE constraint on (project_id, resource_type, gitlab_id, payload_hash) prevents storing identical payloads.
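The effect of that UNIQUE key can be modeled with a set of key tuples. A sketch only: the SHA-256 digest is taken as a precomputed hex string (in Rust it could come from a crate such as `sha2`), compression is omitted, and the struct and method names are hypothetical.

```rust
use std::collections::HashSet;

/// In-memory stand-in for raw_payloads with
/// UNIQUE(project_id, resource_type, gitlab_id, payload_hash).
#[derive(Default)]
struct RawPayloadStore {
    keys: HashSet<(i64, String, i64, String)>,
    blobs: Vec<Vec<u8>>,
}

impl RawPayloadStore {
    /// Returns true if the blob was stored, false if the key already existed.
    fn insert(&mut self, project_id: i64, resource_type: &str, gitlab_id: i64,
              payload_hash: &str, blob: Vec<u8>) -> bool {
        let key = (project_id, resource_type.to_string(), gitlab_id, payload_hash.to_string());
        if !self.keys.insert(key) {
            return false; // identical payload for the same entity: skip
        }
        self.blobs.push(blob);
        true
    }
}
```

Because `gitlab_id` is part of the key, two different entities with byte-identical payloads are both stored; only a repeat of the same payload for the same entity is skipped.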

Concurrency Model

Primary Resource Fetch

Single-threaded async stream. Rate-limited. Each page written in a transaction. Cursor updated at page boundaries.

Discussion Sync

Parallel prefetch (configurable, default 2 concurrent). Serial write phase to avoid DB contention. Each parent entity is one transaction.

Single-Flight Lock

AppLock (database-enforced mutex):
  name: 'sync' (PK)
  owner: UUIDv4 (unique per process)
  heartbeat_at: updated every 30s (heartbeatIntervalSeconds)

  Acquire:
    INSERT the row; fail if it already exists
    Stale check: if now - heartbeat_at > staleLockMinutes, force-acquire

  Release:
    DELETE WHERE name = 'sync' AND owner = my_uuid
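The acquire, heartbeat, and release rules can be modeled in memory to make the stale-takeover condition concrete. A sketch under assumptions: times are ms epoch, the lock is an `Option` rather than a SQLite row, and names are hypothetical. In the database the stale check and the takeover would need to happen atomically inside one transaction.

```rust
/// In-memory stand-in for the app_locks row (name = 'sync').
#[derive(Debug, Clone, PartialEq)]
struct LockRow {
    owner: String,        // UUIDv4 of the holder in the real table
    heartbeat_at_ms: i64, // last heartbeat, ms epoch
}

struct AppLock {
    row: Option<LockRow>,
    stale_after_ms: i64, // staleLockMinutes converted to ms
}

impl AppLock {
    /// Take the lock if it is free or the holder's heartbeat is stale.
    fn try_acquire(&mut self, owner: &str, now_ms: i64) -> bool {
        let available = match &self.row {
            None => true,
            Some(r) => now_ms - r.heartbeat_at_ms > self.stale_after_ms,
        };
        if available {
            self.row = Some(LockRow { owner: owner.to_string(), heartbeat_at_ms: now_ms });
        }
        available
    }

    /// Refresh the heartbeat; fails if another process has taken over.
    fn heartbeat(&mut self, owner: &str, now_ms: i64) -> bool {
        match &mut self.row {
            Some(r) if r.owner == owner => {
                r.heartbeat_at_ms = now_ms;
                true
            }
            _ => false,
        }
    }

    /// Release only if we still own the lock.
    fn release(&mut self, owner: &str) {
        if self.row.as_ref().map_or(false, |r| r.owner == owner) {
            self.row = None;
        }
    }
}
```

The owner check on release matters: a process whose lock was taken over as stale must not delete the new holder's row.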

Progress Events

Event	Description
`IssuesFetchStarted`	Beginning primary issue fetch
`IssueFetched`	Each issue processed (for progress bars)
`IssuesFetchComplete`	All pages consumed
`DiscussionSyncStarted`	Beginning discussion phase
`DiscussionSynced`	Each parent's discussions written
`DiscussionSyncComplete`	All discussions updated
Same events exist for MRs (MrsFetchStarted, etc.)
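A consumer of these events, such as a progress bar, only needs to fold them into counters. A sketch assuming the events are delivered one at a time to a handler; the variant names come from the table, but the payload fields and the `Progress` type are hypothetical.

```rust
/// Issue-side events from the table; payload fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum ProgressEvent {
    IssuesFetchStarted,
    IssueFetched { iid: i64 },
    IssuesFetchComplete,
    DiscussionSyncStarted { parents: usize },
    DiscussionSynced { iid: i64 },
    DiscussionSyncComplete,
}

/// Counters a progress display could render.
#[derive(Debug, Default)]
struct Progress {
    issues_fetched: usize,
    fetch_done: bool,
    discussions_total: usize,
    discussions_done: usize,
    finished: bool,
}

impl Progress {
    fn handle(&mut self, event: &ProgressEvent) {
        match event {
            ProgressEvent::IssuesFetchStarted => self.issues_fetched = 0,
            ProgressEvent::IssueFetched { .. } => self.issues_fetched += 1,
            ProgressEvent::IssuesFetchComplete => self.fetch_done = true,
            ProgressEvent::DiscussionSyncStarted { parents } => self.discussions_total = *parents,
            ProgressEvent::DiscussionSynced { .. } => self.discussions_done += 1,
            ProgressEvent::DiscussionSyncComplete => self.finished = true,
        }
    }
}
```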

Field-Level Coverage: API Response vs Gitlore Storage

Every field in every GitLab API response, mapped to what Gitlore does with it. Serde silently drops fields not in the Rust structs.

Status	Meaning
STORED	Stored in DB
TRANSIENT	Used transiently (logic only)
IGNORED	Deserialized but ignored
DROPPED	Never deserialized (silently dropped)

GET /projects/:id/issues — Issue response object (15 stored / 40+ total)

API Field	Status	DB Column	Notes
`id`	STORED	`issues.gitlab_id`	GitLab's global issue ID
`iid`	STORED	`issues.iid`	Project-scoped #
`project_id`	STORED	`issues.project_id`	FK to projects
`title`	STORED	`issues.title`
`description`	STORED	`issues.description`
`state`	STORED	`issues.state`	"opened" | "closed"
`created_at`	STORED	`issues.created_at`	Converted to ms epoch
`updated_at`	STORED	`issues.updated_at`	Converted to ms epoch; drives cursor
`closed_at`	STORED	`issues.closed_at`	Nullable, ms epoch
`due_date`	STORED	`issues.due_date`	YYYY-MM-DD string
`web_url`	STORED	`issues.web_url`
`author.username`	STORED	`issues.author_username`	Only username kept
`assignees[].username`	STORED	`issue_assignees`	Junction table, username only
`labels[]`	STORED	`issue_labels` → `labels`	Junction table
`milestone.*`	STORED	`milestones` table + denorm	id, iid, title, state, due_date, web_url
`author.id`	IGNORED	—	Deserialized but not stored
`author.name`	IGNORED	—	Deserialized but not stored
`author.state`	IGNORED	—	Deserialized but not stored
`author.avatar_url`	IGNORED	—	Deserialized but not stored
`author.web_url`	IGNORED	—	Deserialized but not stored
`type`	DROPPED	—	"ISSUE" — not deserialized
`upvotes`	DROPPED	—	Engagement metric, not captured
`downvotes`	DROPPED	—	Engagement metric, not captured
`merge_requests_count`	DROPPED	—	Count of linked MRs
`user_notes_count`	DROPPED	—	Comment count
`subscribed`	DROPPED	—	User-specific subscription state
`confidential`	DROPPED	—	Visibility flag
`discussion_locked`	DROPPED	—	Lock state
`weight`	DROPPED	—	Premium: issue weight
`time_stats.*`	DROPPED	—	time_estimate, total_time_spent, human_*
`task_completion_status`	DROPPED	—	{count, completed_count}
`references.*`	DROPPED	—	{short, relative, full}
`closed_by`	DROPPED	—	User who closed it
`assignee`	DROPPED	—	Deprecated singular; uses assignees[]
`has_tasks`	DROPPED	—	Task list presence
`issue_type`	DROPPED	—	"issue" | "incident" | "test_case"
`severity`	DROPPED	—	Incident severity
`imported`	DROPPED	—	Import flag
`imported_from`	DROPPED	—	Import source
`moved_to_id`	DROPPED	—	Target if moved
`_links`	DROPPED	—	HATEOAS links
`epic`	DROPPED	—	Premium: parent epic
`iteration`	DROPPED	—	Premium: iteration
`health_status`	DROPPED	—	Ultimate: health

GET /projects/:id/merge_requests — MR response object (23 stored / 45+ total)

API Field	Status	DB Column	Notes
`id`	STORED	`merge_requests.gitlab_id`
`iid`	STORED	`merge_requests.iid`
`project_id`	STORED	`merge_requests.project_id`
`title`	STORED	`merge_requests.title`
`description`	STORED	`merge_requests.description`
`state`	STORED	`merge_requests.state`	opened | merged | closed | locked
`draft`	STORED	`merge_requests.draft`	Preferred field
`source_branch`	STORED	`merge_requests.source_branch`
`target_branch`	STORED	`merge_requests.target_branch`
`sha`	STORED	`merge_requests.head_sha`	Head commit SHA
`references.short`	STORED	`merge_requests.references_short`	e.g. "!42"
`references.full`	STORED	`merge_requests.references_full`	e.g. "group/proj!42"
`detailed_merge_status`	STORED	`merge_requests.detailed_merge_status`	Preferred field
`author.username`	STORED	`merge_requests.author_username`
`merge_user.username`	STORED	`merge_requests.merge_user_username`	Preferred over merged_by
`assignees[].username`	STORED	`mr_assignees`	Junction table
`reviewers[].username`	STORED	`mr_reviewers`	Junction table
`labels[]`	STORED	`mr_labels` → `labels`
`created_at`	STORED	`merge_requests.created_at`	ms epoch
`updated_at`	STORED	`merge_requests.updated_at`	ms epoch; drives cursor
`merged_at`	STORED	`merge_requests.merged_at`	ms epoch, nullable
`closed_at`	STORED	`merge_requests.closed_at`	ms epoch, nullable
`web_url`	STORED	`merge_requests.web_url`
`work_in_progress`	TRANSIENT	—	Deprecated fallback for `draft`
`merge_status`	TRANSIENT	—	Deprecated fallback for `detailed_merge_status`
`merged_by`	TRANSIENT	—	Deprecated fallback for `merge_user`
`upvotes`	DROPPED	—	Engagement metric
`downvotes`	DROPPED	—	Engagement metric
`user_notes_count`	DROPPED	—	Comment count
`source_project_id`	DROPPED	—	Fork source
`target_project_id`	DROPPED	—	Fork target
`milestone`	DROPPED	—	Full milestone object (stored for issues, not MRs)
`merge_when_pipeline_succeeds`	DROPPED	—	Auto-merge flag
`merge_commit_sha`	DROPPED	—	Merge commit reference
`squash_commit_sha`	DROPPED	—	Squash commit reference
`discussion_locked`	DROPPED	—	Lock state
`should_remove_source_branch`	DROPPED	—	Cleanup preference
`force_remove_source_branch`	DROPPED	—	Forced cleanup
`squash`	DROPPED	—	Squash flag
`squash_on_merge`	DROPPED	—	Squash on merge flag
`has_conflicts`	DROPPED	—	Merge conflict state
`blocking_discussions_resolved`	DROPPED	—	All blocking threads resolved?
`time_stats.*`	DROPPED	—	Time tracking data
`task_completion_status`	DROPPED	—	{count, completed_count}
`closed_by`	DROPPED	—	User who closed
`prepared_at`	DROPPED	—	Preparation timestamp
`merge_after`	DROPPED	—	Scheduled merge time
`imported`	DROPPED	—	Import flag
`approvals_before_merge`	DROPPED	—	Deprecated approval count
`references.relative`	DROPPED	—	Only short + full stored

GET /projects/:id/.../discussions — Discussion + Note response (23 stored / 36+ total)

Discussion object

API Field	Status	DB Column	Notes
`id`	STORED	`discussions.gitlab_discussion_id`	SHA-like string
`individual_note`	STORED	`discussions.individual_note`	0/1
`notes[]`	STORED	`notes` table	Each note stored individually

Computed from notes: first_note_at, last_note_at, resolvable (any note resolvable), resolved (all resolvable notes resolved)
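These derived columns can be computed in one pass over a discussion's notes. A sketch with two labeled assumptions: first/last note times are taken from `created_at`, and a discussion with no resolvable notes is reported as unresolved (the text's "all resolvable notes resolved" would otherwise be vacuously true). Struct and function names are hypothetical.

```rust
/// The per-note fields the summary needs.
struct NoteFlags {
    created_at_ms: i64,
    resolvable: bool,
    resolved: bool,
}

/// Derived discussion columns.
struct DiscussionSummary {
    first_note_at: Option<i64>,
    last_note_at: Option<i64>,
    resolvable: bool,
    resolved: bool,
}

fn summarize(notes: &[NoteFlags]) -> DiscussionSummary {
    // resolvable: any note is resolvable.
    let resolvable = notes.iter().any(|n| n.resolvable);
    DiscussionSummary {
        first_note_at: notes.iter().map(|n| n.created_at_ms).min(),
        last_note_at: notes.iter().map(|n| n.created_at_ms).max(),
        resolvable,
        // resolved: every resolvable note is resolved (false if none are resolvable).
        resolved: resolvable && notes.iter().filter(|n| n.resolvable).all(|n| n.resolved),
    }
}
```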

Note object (within discussion)

API Field	Status	DB Column	Notes
`id`	STORED	`notes.gitlab_id`
`type`	STORED	`notes.note_type`	DiscussionNote | DiffNote | null
`body`	STORED	`notes.body`
`system`	STORED	`notes.is_system`	Auto-generated notes
`author.username`	STORED	`notes.author_username`
`created_at`	STORED	`notes.created_at`	ms epoch
`updated_at`	STORED	`notes.updated_at`	ms epoch
`resolvable`	STORED	`notes.resolvable`
`resolved`	STORED	`notes.resolved`
`resolved_by.username`	STORED	`notes.resolved_by`	Username only
`resolved_at`	STORED	`notes.resolved_at`	ms epoch
`position.old_path`	STORED	`notes.position_old_path`	DiffNote only
`position.new_path`	STORED	`notes.position_new_path`	DiffNote only
`position.old_line`	STORED	`notes.position_old_line`
`position.new_line`	STORED	`notes.position_new_line`
`position.position_type`	STORED	`notes.position_type`	"text" | "image" | "file"
`position.line_range`	STORED	`notes.position_line_range_*`	Start/end line numbers extracted
`position.base_sha`	STORED	`notes.position_base_sha`
`position.start_sha`	STORED	`notes.position_start_sha`
`position.head_sha`	STORED	`notes.position_head_sha`
`attachment`	DROPPED	—	File attachment metadata
`noteable_id`	DROPPED	—	Parent ID (redundant, known from URL)
`noteable_type`	DROPPED	—	Parent type (redundant)
`noteable_iid`	DROPPED	—	Parent IID (redundant)
`project_id`	DROPPED	—	From note context (redundant)
`commit_id`	DROPPED	—	DiffNote: specific commit
`position.width`	DROPPED	—	Image position only
`position.height`	DROPPED	—	Image position only
`position.x`	DROPPED	—	Image position only
`position.y`	DROPPED	—	Image position only
`suggestions[]`	DROPPED	—	Code suggestion diffs (from_line, to_line, content)
`line_range.*.line_code`	DROPPED	—	Internal line identifier
`line_range.*.type`	DROPPED	—	"old" | "new" side indicator

GET /projects/:path — Project response (6 stored / 40+ total)

API Field	Status	DB Column	Notes
`id`	STORED	`projects.gitlab_project_id`
`path_with_namespace`	STORED	`projects.path_with_namespace`
`default_branch`	STORED	`projects.default_branch`
`web_url`	STORED	`projects.web_url`
`created_at`	STORED	`projects.created_at`	ms epoch
`updated_at`	STORED	`projects.updated_at`	ms epoch
`name`	IGNORED	—	Deserialized but not stored
`description`	IGNORED	—	Deserialized but not stored
`visibility`	IGNORED	—	Deserialized but not stored
`archived`	IGNORED	—	Deserialized but not stored
`forks_count`	DROPPED	—
`star_count`	DROPPED	—
`issues_enabled`	DROPPED	—	Feature toggle
`merge_requests_enabled`	DROPPED	—	Feature toggle
`wiki_enabled`	DROPPED	—
`http_url_to_repo`	DROPPED	—	Clone URL
`ssh_url_to_repo`	DROPPED	—	Clone URL
`owner`	DROPPED	—	Owner user object
`namespace`	DROPPED	—	Group/namespace info
`last_activity_at`	DROPPED	—	Last push/update
`statistics`	DROPPED	—	Commit count, storage, etc.

GET /user — Current user response (0 stored, transient only / 20+ total)

API Field	Status	Notes
`id`	TRANSIENT	Displayed in auth check
`username`	TRANSIENT	Displayed in auth check
`name`	TRANSIENT	Displayed in auth check
`email`	IGNORED	Deserialized but not used
`avatar_url`	IGNORED	Deserialized but not used
`web_url`	IGNORED	Deserialized but not used
`created_at`	IGNORED	Deserialized but not used
`state`	IGNORED	Deserialized but not used
`bio`	DROPPED
`location`	DROPPED
`public_email`	DROPPED
`organization`	DROPPED
`job_title`	DROPPED
`two_factor_enabled`	DROPPED
`identities`	DROPPED

Field Coverage Summary

Issues Response

Stored	15 fields
Deserialized, ignored	5 fields
Never deserialized	~24 fields

Merge Request Response

Stored	23 fields
Transient (fallbacks)	3 fields
Never deserialized	~24 fields

Discussion/Note Response

Stored	23 fields
Never deserialized	~13 fields

Project Response

Stored	6 fields
Deserialized, ignored	4 fields
Never deserialized	~30 fields

Key insight: Raw payloads preserve everything

Although many fields are dropped during transformation, the raw_payloads table stores the complete original JSON response (with SHA-256 dedup and optional gzip). This means all "dropped" data is still recoverable from the blob storage without re-fetching from GitLab. The normalized tables are optimized for query patterns, not completeness.

Efficiency Analysis & Opportunities

Observations on how gitlore could leverage the GitLab API more efficiently, and data it currently leaves on the table.

Current Efficiency Wins

Cursor-based incremental sync

Uses updated_after + order_by=updated_at&sort=asc to only fetch changed records. Avoids full re-fetch on every sync. This is the single biggest efficiency feature.

Raw payload dedup

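SHA-256 hashing prevents storing the same payload twice. Dedup is byte-level: it applies when an identical response is fetched again, for example a record re-read because of the 2-second cursor rewind overlap. Because updated_at is part of the raw JSON, a payload whose only change is a bumped updated_at is not byte-identical and gets its own hash.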
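A minimal sketch of hash-keyed (content-addressed) payload dedup. `PayloadStore` and `put` are hypothetical. To stay dependency-free it swaps in std's `DefaultHasher` for SHA-256; that hasher is not collision-resistant and its output is not guaranteed stable across Rust releases, so a persisted store should keep SHA-256. Dedup keys on the exact bytes, so two payloads share a blob only when they are byte-identical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical content-addressed blob store: key = hash of the payload bytes.
#[derive(Default)]
struct PayloadStore {
    blobs: HashMap<u64, Vec<u8>>,
}

impl PayloadStore {
    /// Store `payload` unless an identical one is already present.
    /// Returns (content key, whether a new blob was written).
    fn put(&mut self, payload: &[u8]) -> (u64, bool) {
        // Stand-in for SHA-256; see the note above.
        let mut hasher = DefaultHasher::new();
        payload.hash(&mut hasher);
        let key = hasher.finish();
        let is_new = !self.blobs.contains_key(&key);
        if is_new {
            self.blobs.insert(key, payload.to_vec());
        }
        (key, is_new)
    }
}
```

A normalized row can then reference the returned key, so re-fetching an unchanged record costs one hash computation rather than another stored blob.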

Discussion watermark

Only re-syncs discussions for issues/MRs whose updated_at has advanced past their discussions_synced_for_updated_at watermark. Skips unchanged entities.
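The watermark rule reduces to one comparison per parent. A minimal sketch, assuming timestamps are Unix milliseconds; the `Parent` struct and both functions are hypothetical and only mirror the columns named above.

```rust
/// Hypothetical view of an issue/MR row: current updated_at plus the
/// updated_at value its discussions were last synced for (None = never).
struct Parent {
    iid: u64,
    updated_at: i64,
    discussions_synced_for_updated_at: Option<i64>,
}

/// True when the parent's discussions may be stale.
fn needs_discussion_sync(p: &Parent) -> bool {
    match p.discussions_synced_for_updated_at {
        None => true,                        // never synced: fetch
        Some(watermark) => p.updated_at > watermark, // advanced since last sync
    }
}

/// IIDs whose discussions should be refreshed in this run.
fn parents_to_sync(parents: &[Parent]) -> Vec<u64> {
    parents
        .iter()
        .filter(|p| needs_discussion_sync(p))
        .map(|p| p.iid)
        .collect()
}
```

After a successful refresh, the watermark would be set to the parent's current updated_at, so the next run skips it until GitLab reports another change.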

Parallel discussion prefetch

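Fetches discussions for multiple issues/MRs concurrently (configurable, default 2) and writes the results serially. Overlapping network round-trips reduces wall-clock time for discussion sync, up to the ceiling set by the 10 req/s rate limiter.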
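The "parallel prefetch, serial write" shape can be sketched with scoped threads and a channel. This is an illustration, not gitlore's code: `prefetch_then_write` is hypothetical, `fetch` stands in for the HTTP call (a real client would also pass through the rate limiter), and `write` stands in for the per-parent DELETE + INSERT.

```rust
use std::sync::{mpsc, Mutex};
use std::thread;

/// Hypothetical: `workers` threads fetch discussions concurrently; the calling
/// thread applies each result one at a time, so database writes stay serial.
fn prefetch_then_write<F, W>(iids: Vec<u64>, workers: usize, fetch: F, mut write: W)
where
    F: Fn(u64) -> Vec<String> + Sync,
    W: FnMut(u64, Vec<String>),
{
    let queue = Mutex::new(iids);
    let (tx, rx) = mpsc::channel::<(u64, Vec<String>)>();
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            let tx = tx.clone();
            let (queue, fetch) = (&queue, &fetch);
            s.spawn(move || loop {
                // The lock guard is dropped at the end of this statement,
                // so the fetch below runs without holding the queue lock.
                let next = queue.lock().unwrap().pop();
                let Some(iid) = next else { break };
                if tx.send((iid, fetch(iid))).is_err() {
                    break; // writer has gone away
                }
            });
        }
        // Drop the original sender so `rx` ends once every worker finishes.
        drop(tx);
        // Serial writer: one parent at a time on the calling thread.
        for (iid, discussions) in rx {
            write(iid, discussions);
        }
    });
}
```

Binding the popped value with a separate `let` matters: writing `while let Some(iid) = queue.lock().unwrap().pop()` would keep the guard alive for the whole loop body and silently serialize the workers.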

Potential Inefficiencies

1. Discussion full-refresh strategy

Every time an issue/MR is updated, ALL its discussions are re-fetched and replaced (DELETE + INSERT). For heavily discussed items (50+ comments), this is expensive.

Scenario	Current	Alternative
Issue with 100 notes gets 1 new comment	Re-fetch all 100 notes (multiple pages)	Could use `GET .../notes?order_by=updated_at&sort=desc` and stop paging at the first note not updated since the last sync
MR label change (no new comments)	Re-fetch all discussions anyway	Could check the `user_notes_count` delta, or scan the Notes API by `updated_at` as in the row above

Trade-off: Full-refresh is simpler and guarantees consistency (catches edits, deletes). Incremental would miss deleted notes.
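If the incremental alternative were pursued, its core loop could look like this minimal sketch. It assumes notes arrive sorted by updated_at, newest first (GitLab's Notes API accepts `order_by=updated_at` with `sort=desc`), and that a per-parent watermark is stored. `Note` and `changed_notes_since` are hypothetical; `pages` stands in for successive API pages and is consumed lazily, so pages past the watermark are never requested.

```rust
/// Hypothetical note record: only the fields the scan needs.
#[derive(Debug, Clone, PartialEq)]
struct Note {
    id: u64,
    updated_at: i64,
}

/// Collect notes changed after `watermark`, stopping at the first older one.
/// Returns (changed notes, pages consumed). Deleted notes never appear in such
/// a scan, which is the consistency gap of the incremental approach.
fn changed_notes_since<I>(pages: I, watermark: i64) -> (Vec<Note>, usize)
where
    I: IntoIterator<Item = Vec<Note>>,
{
    let mut changed = Vec::new();
    let mut pages_fetched = 0;
    for page in pages {
        pages_fetched += 1;
        for note in page {
            if note.updated_at <= watermark {
                // Sorted newest first: everything after this is older too.
                return (changed, pages_fetched);
            }
            changed.push(note);
        }
    }
    (changed, pages_fetched)
}
```

In the "1 new comment on 100 notes" case this typically touches only the first page, instead of re-reading every page of the thread.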

2. Offset pagination instead of keyset

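Gitlore uses page=N&per_page=100 offset pagination. GitLab also offers keyset pagination, which it recommends for large datasets because it stays efficient deep into a result set. Keyset is only available on the endpoints listed in GitLab's pagination documentation, so support for the Issues and MRs list endpoints should be confirmed there before relying on it.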

Current:  GET /projects/:id/issues?page=5&per_page=100
Keyset:   GET /projects/:id/issues?pagination=keyset&per_page=100
          (uses Link header rel="next" with cursor)

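Benefit: Keyset pagination keeps the cost of each page roughly constant however deep the sync goes, whereas offset pagination slows as the offset grows because the database must skip every earlier row. GitLab recommends it for >10,000 records. Gitlore already parses Link headers, so the client-side support partially exists.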
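With keyset pagination the client cannot compute page N; it can only follow the server-provided `rel="next"` link until it disappears. A minimal parser sketch for that link (`next_link` is hypothetical; the URLs in the usage are made up). It assumes URLs in the header contain no raw commas, which holds for typical percent-encoded GitLab links.

```rust
/// Extract the URL tagged rel="next" from an RFC 8288 Link header such as
/// `<https://host/api/v4/...>; rel="next", <https://host/api/v4/...>; rel="first"`.
/// Returns None on the last page, where no rel="next" entry is present.
fn next_link(header: &str) -> Option<&str> {
    header.split(',').find_map(|entry| {
        let mut params = entry.split(';');
        let target = params.next()?.trim();
        // Accept both quoted and unquoted forms of the rel parameter.
        let is_next = params.any(|p| matches!(p.trim(), "rel=\"next\"" | "rel=next"));
        if is_next {
            target.strip_prefix('<')?.strip_suffix('>')
        } else {
            None
        }
    })
}
```

A keyset loop would then request the first URL, process the page, and repeat with `next_link(link_header)` until it returns `None`.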

3. No ETag / conditional request support

GitLab returns ETag headers on API responses. Sending If-None-Match on subsequent requests would return 304 Not Modified without consuming rate limit quota on some endpoints. Currently all requests are unconditional.

Impact: Moderate. The cursor-based sync already avoids re-fetching unchanged data, so ETag would mainly help with the discussions full-refresh scenario where nothing changed.
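A minimal sketch of per-URL ETag bookkeeping for conditional requests. `EtagStore`, `Fetch`, and both methods are hypothetical, and whether GitLab honors `If-None-Match` on the discussions endpoints is an assumption to verify first. A 304 maps to `Unchanged`, which would let the sync skip the DELETE + INSERT for that parent entirely.

```rust
use std::collections::HashMap;

/// Outcome of one conditional GET.
#[derive(Debug, PartialEq)]
enum Fetch {
    Unchanged,       // 304: local data is already current
    Changed(String), // 200: new body to transform and store
    Failed(u16),     // anything else
}

/// Hypothetical store of the last ETag seen per request URL (per page).
#[derive(Default)]
struct EtagStore {
    etags: HashMap<String, String>,
}

impl EtagStore {
    /// Value to send as If-None-Match for `url`, if one was recorded.
    fn if_none_match(&self, url: &str) -> Option<&str> {
        self.etags.get(url).map(String::as_str)
    }

    /// Interpret a response and remember its ETag when it carries one.
    fn record(&mut self, url: &str, status: u16, etag: Option<&str>, body: String) -> Fetch {
        match status {
            304 => Fetch::Unchanged,
            200 => {
                if let Some(tag) = etag {
                    self.etags.insert(url.to_owned(), tag.to_owned());
                }
                Fetch::Changed(body)
            }
            other => Fetch::Failed(other),
        }
    }
}
```

Persisting the map alongside the discussion watermark would let the check survive between runs.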

4. Labels extracted from embedded data, not dedicated API

Gitlore extracts labels from the labels[] string array embedded in issue/MR responses. The dedicated GET /projects/:id/labels endpoint returns richer data:

From issues response	From Labels API
Label name (string only)	name, color, description, text_color, priority, is_project_label, subscribed, open_issues_count, closed_issues_count, open_merge_requests_count (the count fields require with_counts=true)

Impact: The labels table has color and description columns, but they may not be populated from the embedded string array. A single Labels API call (the endpoint is paginated, but one page at per_page=100 covers most projects) would enrich the local label catalog.

Dropped Data Worth Capturing

Fields currently silently dropped that could add value to local queries:

Field	Source	Value Proposition	Effort
`user_notes_count`	Issues, MRs	Could skip discussion re-sync when count hasn't changed. Quick "activity" sort without joining notes table.	Low
`upvotes` / `downvotes`	Issues, MRs	Engagement metrics for triage. "Most upvoted issues" is a common query.	Low
`confidential`	Issues	Security-sensitive filtering. Avoid exposing confidential issues in outputs.	Low
`weight`	Issues	Effort estimation for sprint planning (Premium/Ultimate only).	Low
`time_stats`	Issues, MRs	Time tracking data for project reporting. Already in the response, free to capture.	Low
`has_conflicts`	MRs	Identify MRs needing rebase. Useful for "stale MR" alerts.	Low
`blocking_discussions_resolved`	MRs	MR readiness indicator without joining discussions table.	Low
`merge_commit_sha`	MRs	Trace merged MRs to specific commits. Useful for git correlation.	Low
`suggestions[]`	Discussion notes	Code review suggestions with from/to content. Rich data for code review analysis.	Medium
`task_completion_status`	Issues, MRs	Track task-list checkbox progress without parsing description markdown.	Low
`issue_type`	Issues	Distinguish issues vs incidents vs test cases.	Low
`discussion_locked`	Issues, MRs	Know if new comments can be added.	Low
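Once `user_notes_count` is stored, it could tighten the discussion watermark check. A minimal sketch; `should_refresh_discussions` and the recorded count at last sync are hypothetical additions. Caveat: an edited note, or one deletion plus one addition, leaves the count unchanged, so this trades completeness for fewer requests.

```rust
/// Refresh only when the parent advanced past its watermark AND its
/// user_notes_count differs from the count recorded at the last sync.
/// Missing history (None) always forces a refresh.
fn should_refresh_discussions(
    updated_at: i64,
    watermark: Option<i64>,
    notes_count: u32,
    count_at_last_sync: Option<u32>,
) -> bool {
    let advanced = watermark.map_or(true, |w| updated_at > w);
    let count_changed = count_at_last_sync.map_or(true, |c| c != notes_count);
    advanced && count_changed
}
```

A label-only change bumps updated_at but not the count, so it would no longer trigger a refresh.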

Structural Optimization Opportunities

5. User denormalization

Currently stores only username for authors, assignees, reviewers. The API returns name, avatar_url, web_url, and state for every user reference. A users table could deduplicate this data and provide richer displays.

-- Potential schema
CREATE TABLE users (
  username TEXT PRIMARY KEY,
  name TEXT,
  gitlab_id INTEGER,
  avatar_url TEXT,
  state TEXT,           -- "active", "blocked", etc.
  last_seen_at INTEGER  -- auto-updated on encounter
);

Cost: No additional API calls. Data is already in every issue/MR/note response. Just needs extraction during transform.
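The extraction step could be a single upsert pass over every user reference met during transform. A minimal sketch; `UserRef`, `UserRow`, and `collect_users` are hypothetical, with `UserRow` mirroring the potential `users` table above. The most recently seen reference for a username wins, so a renamed display name or a newly blocked state replaces older data.

```rust
use std::collections::HashMap;

/// Hypothetical user reference as deserialized from an issue/MR/note payload.
#[derive(Debug, Clone)]
struct UserRef {
    id: u64,
    username: String,
    name: String,
    avatar_url: Option<String>,
    state: String,
}

/// One row of the potential `users` table.
#[derive(Debug, Clone, PartialEq)]
struct UserRow {
    username: String,
    name: String,
    gitlab_id: u64,
    avatar_url: Option<String>,
    state: String,
    last_seen_at: i64,
}

/// Deduplicate references by username, keeping the newest observation.
fn collect_users<'a>(refs: impl IntoIterator<Item = (&'a UserRef, i64)>) -> HashMap<String, UserRow> {
    let mut users: HashMap<String, UserRow> = HashMap::new();
    for (u, seen_at) in refs {
        let is_newer = users
            .get(&u.username)
            .map_or(true, |row| seen_at >= row.last_seen_at);
        if is_newer {
            users.insert(
                u.username.clone(),
                UserRow {
                    username: u.username.clone(),
                    name: u.name.clone(),
                    gitlab_id: u.id,
                    avatar_url: u.avatar_url.clone(),
                    state: u.state.clone(),
                    last_seen_at: seen_at,
                },
            );
        }
    }
    users
}
```

The resulting map could be flushed with one upsert per user at the end of the transform batch.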

6. MR milestone not captured

Milestones are stored for issues but the MR transformer does not extract the milestone object from MR responses, even though GitLab returns it. The merge_requests table has no milestone_id column.

Impact: Cannot query "which MRs are in milestone X?" locally. The data is in the raw payload but not indexed.

7. Issue references not captured

MRs store references.short and references.full, but the issue transformer drops the references object entirely. This means issues lack the cross-project reference format (e.g., group/project#42).

API Strategies Not Yet Used

Webhooks (push-based sync)

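Instead of polling, GitLab can push events to a listener URL registered as a project webhook (registration is done via POST /projects/:id/hooks). Would enable near-real-time sync without rate-limit cost. Requires a listener endpoint, and registering the hook is a write call needing the Maintainer role, a departure from the read-only model.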

Events API (lightweight change detection)

GET /projects/:id/events returns a stream of all project activity. Could be used as a fast "has anything changed?" check before running expensive issue/MR sync. Much lighter than fetching full issue lists.

GraphQL API (precise field selection)

GitLab's GraphQL API allows requesting exactly the fields needed, which would avoid transferring the roughly half of response fields that are currently dropped. Trade-off: different pagination model, potentially less stable API surface.

Summary Verdict

Gitlore is well-optimized for its core use case (read-only local sync). The cursor-based incremental sync and raw payload archival are sophisticated. The main opportunities are:

Capture more "free" data — Fields like user_notes_count, upvotes, has_conflicts are already in API responses. Storing them costs zero API calls and enables richer queries.
Discussion sync efficiency — The full-refresh strategy is the biggest source of redundant API calls. Even a simple user_notes_count comparison could skip unchanged discussions.
Keyset pagination — A meaningful improvement for large projects (>10K issues), and Gitlore already has partial infrastructure for it.
MR milestone parity — Low-effort gap to close with issue milestone support.