Taylor Eernisse 9b63671df9 docs: Update documentation for search pipeline and Phase A spec

- README.md: Add hybrid search and robot mode to feature list. Update
  quick start to use new noun-first CLI syntax (lore issues, lore mrs,
  lore search). Add embedding configuration section. Update command
  examples throughout.

- AGENTS.md: Update robot mode examples to new CLI syntax. Add search,
  sync, stats, and generate-docs commands to the robot mode reference.
  Update flag conventions (-n for limit, -s for state, -J for JSON).

- docs/prd/checkpoint-3.md: Major expansion with gated milestone
  structure (Gate A: lexical, Gate B: hybrid, Gate C: sync). Add
  prerequisite rename note, code sample conventions, chunking strategy
  details, and sqlite-vec rowid encoding scheme. Clarify that Gate A
  requires only SQLite + FTS5 with no sqlite-vec dependency.

- docs/phase-a-spec.md: New detailed specification for Gate A (lexical
  search MVP) covering document schema, FTS5 configuration, dirty
  queue mechanics, CLI interface, and acceptance criteria.

- docs/api-efficiency-findings.md: Analysis of GitLab API pagination
  behavior and efficiency observations from production sync runs.
  Documents the missing x-next-page header issue and heuristic fix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-30 15:47:33 -05:00

Phase A: Complete API Field Capture

Status: Draft
Guiding principle: Mirror everything GitLab gives us.

Lossless mirror: the raw API JSON stored behind raw_payload_id. This is the true complete representation of every API response.

Relational projection: a stable, query-optimized subset of fields we commit to keeping current on every re-sync. This preserves maximum context for processing and analysis while avoiding unbounded schema growth.

Migration: 007_complete_field_capture.sql
Prerequisite: None (independent of CP3)

Scope

One migration. Three categories of work:

New columns on issues and merge_requests for fields currently dropped by serde or dropped during transform
New serde fields on GitLabIssue and GitLabMergeRequest to deserialize currently-silently-dropped JSON fields
Transformer + insert updates to pass the new fields through to the DB

No new tables. No new API calls. No new endpoints. All data comes from responses we already receive.

Issues: Field Gap Inventory

Currently stored

id, iid, project_id, title, description, state, author_username, created_at, updated_at, web_url, due_date, milestone_id, milestone_title, raw_payload_id, last_seen_at, discussions_synced_for_updated_at, labels (junction), assignees (junction)

Currently deserialized but dropped during transform

API Field	Status	Action
`closed_at`	Deserialized in serde struct, but no DB column exists and transformer never populates it	Add column in migration 007, wire up in IssueRow + transform + INSERT
`author.id`	Deserialized	Store as `author_id` column
`author.name`	Deserialized	Store as `author_name` column

Currently silently dropped by serde (not in GitLabIssue struct)

API Field	Type	DB Column	Notes
`issue_type`	Option<String>	`issue_type`	Canonical field (lowercase, e.g. "issue"); preferred for DB storage
`upvotes`	i64	`upvotes`
`downvotes`	i64	`downvotes`
`user_notes_count`	i64	`user_notes_count`	Useful for discussion sync optimization
`merge_requests_count`	i64	`merge_requests_count`	Count of linked MRs
`confidential`	bool	`confidential`	0/1
`discussion_locked`	bool	`discussion_locked`	0/1
`weight`	Option<i64>	`weight`	Premium/Ultimate, null on Free
`time_stats.time_estimate`	i64	`time_estimate`	Seconds
`time_stats.total_time_spent`	i64	`time_spent`	Seconds
`time_stats.human_time_estimate`	Option<String>	`human_time_estimate`	e.g. "3h 30m"
`time_stats.human_total_time_spent`	Option<String>	`human_time_spent`	e.g. "1h 15m"
`task_completion_status.count`	i64	`task_count`	Checkbox total
`task_completion_status.completed_count`	i64	`task_completed_count`	Checkboxes checked
`has_tasks`	bool	`has_tasks`	0/1
`severity`	Option<String>	`severity`	Incident severity
`closed_by`	Option<object>	`closed_by_username`	Who closed it (username only, consistent with author pattern)
`imported`	bool	`imported`	0/1
`imported_from`	Option<String>	`imported_from`	Import source
`moved_to_id`	Option<i64>	`moved_to_id`	Target issue if moved
`references.short`	String	`references_short`	e.g. "#42"
`references.relative`	String	`references_relative`	e.g. "#42" or "group/proj#42"
`references.full`	String	`references_full`	e.g. "group/project#42"
`health_status`	Option<String>	`health_status`	Ultimate only
`type`	Option<String>	(transform-only)	Uppercase category (e.g. "ISSUE"); fallback for `issue_type` -- lowercased before storage. Not stored as separate column; raw JSON remains lossless.
`epic.id`	Option<i64>	`epic_id`	Premium/Ultimate, null on Free
`epic.iid`	Option<i64>	`epic_iid`
`epic.title`	Option<String>	`epic_title`
`epic.url`	Option<String>	`epic_url`
`epic.group_id`	Option<i64>	`epic_group_id`
`iteration.id`	Option<i64>	`iteration_id`	Premium/Ultimate, null on Free
`iteration.iid`	Option<i64>	`iteration_iid`
`iteration.title`	Option<String>	`iteration_title`
`iteration.state`	Option<i64>	`iteration_state`	Enum: 1=upcoming, 2=current, 3=closed
`iteration.start_date`	Option<String>	`iteration_start_date`	ISO date
`iteration.due_date`	Option<String>	`iteration_due_date`	ISO date

Merge Requests: Field Gap Inventory

Currently stored

id, iid, project_id, title, description, state, draft, author_username, source_branch, target_branch, head_sha, references_short, references_full, detailed_merge_status, merge_user_username, created_at, updated_at, merged_at, closed_at, last_seen_at, web_url, raw_payload_id, discussions_synced_for_updated_at, discussions_sync_last_attempt_at, discussions_sync_attempts, discussions_sync_last_error, labels (junction), assignees (junction), reviewers (junction)

Currently deserialized but dropped during transform

API Field	Status	Action
`author.id`	Deserialized	Store as `author_id` column
`author.name`	Deserialized	Store as `author_name` column
`work_in_progress`	Used transiently for `draft` fallback	Already handled, no change needed
`merge_status` (legacy)	Used transiently for `detailed_merge_status` fallback	Already handled, no change needed
`merged_by`	Used transiently for `merge_user` fallback	Already handled, no change needed

Currently silently dropped by serde (not in GitLabMergeRequest struct)

API Field	Type	DB Column	Notes
`upvotes`	i64	`upvotes`
`downvotes`	i64	`downvotes`
`user_notes_count`	i64	`user_notes_count`
`source_project_id`	Option<i64>	`source_project_id`	Fork source
`target_project_id`	Option<i64>	`target_project_id`	Fork target
`milestone`	Option<object>	`milestone_id`, `milestone_title`	Reuse issue milestone pattern
`merge_when_pipeline_succeeds`	bool	`merge_when_pipeline_succeeds`	0/1, auto-merge flag
`merge_commit_sha`	Option<String>	`merge_commit_sha`	Commit ref after merge
`squash_commit_sha`	Option<String>	`squash_commit_sha`	Commit ref after squash
`discussion_locked`	bool	`discussion_locked`	0/1
`should_remove_source_branch`	Option<bool>	`should_remove_source_branch`	0/1
`force_remove_source_branch`	Option<bool>	`force_remove_source_branch`	0/1
`squash`	bool	`squash`	0/1
`squash_on_merge`	bool	`squash_on_merge`	0/1
`has_conflicts`	bool	`has_conflicts`	0/1
`blocking_discussions_resolved`	bool	`blocking_discussions_resolved`	0/1
`time_stats.time_estimate`	i64	`time_estimate`	Seconds
`time_stats.total_time_spent`	i64	`time_spent`	Seconds
`time_stats.human_time_estimate`	Option<String>	`human_time_estimate`
`time_stats.human_total_time_spent`	Option<String>	`human_time_spent`
`task_completion_status.count`	i64	`task_count`
`task_completion_status.completed_count`	i64	`task_completed_count`
`closed_by`	Option<object>	`closed_by_username`
`prepared_at`	Option<String>	`prepared_at`	ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable
`merge_after`	Option<String>	`merge_after`	ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable (scheduled merge)
`imported`	bool	`imported`	0/1
`imported_from`	Option<String>	`imported_from`
`approvals_before_merge`	Option<i64>	`approvals_before_merge`	Deprecated, scheduled for removal in GitLab API v5; store best-effort, keep nullable
`references.relative`	String	`references_relative`	Currently only short + full stored
`confidential`	bool	`confidential`	0/1 (MRs can be confidential too)
`iteration.id`	Option<i64>	`iteration_id`	Premium/Ultimate, null on Free
`iteration.iid`	Option<i64>	`iteration_iid`
`iteration.title`	Option<String>	`iteration_title`
`iteration.state`	Option<i64>	`iteration_state`
`iteration.start_date`	Option<String>	`iteration_start_date`	ISO date
`iteration.due_date`	Option<String>	`iteration_due_date`	ISO date

Migration 007: complete_field_capture.sql

-- Migration 007: Capture all remaining GitLab API response fields.
-- Principle: mirror everything GitLab returns. No field left behind.

-- ============================================================
-- ISSUES: new columns
-- ============================================================

-- Fields currently deserialized but not stored
ALTER TABLE issues ADD COLUMN closed_at INTEGER;             -- ms epoch, deserialized but never stored until now
ALTER TABLE issues ADD COLUMN author_id INTEGER;             -- GitLab user ID
ALTER TABLE issues ADD COLUMN author_name TEXT;              -- Display name

-- Issue metadata
ALTER TABLE issues ADD COLUMN issue_type TEXT;               -- 'issue' | 'incident' | 'test_case'
ALTER TABLE issues ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;

-- Engagement
ALTER TABLE issues ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN merge_requests_count INTEGER NOT NULL DEFAULT 0;

-- Time tracking
ALTER TABLE issues ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0;       -- seconds
ALTER TABLE issues ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0;          -- seconds
ALTER TABLE issues ADD COLUMN human_time_estimate TEXT;
ALTER TABLE issues ADD COLUMN human_time_spent TEXT;

-- Task lists
ALTER TABLE issues ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN has_tasks INTEGER NOT NULL DEFAULT 0;

-- References (MRs already have short + full)
ALTER TABLE issues ADD COLUMN references_short TEXT;         -- e.g. "#42"
ALTER TABLE issues ADD COLUMN references_relative TEXT;      -- context-dependent
ALTER TABLE issues ADD COLUMN references_full TEXT;          -- e.g. "group/project#42"

-- Close tracking
ALTER TABLE issues ADD COLUMN closed_by_username TEXT;

-- Premium/Ultimate fields (nullable, null on Free tier)
ALTER TABLE issues ADD COLUMN weight INTEGER;
ALTER TABLE issues ADD COLUMN severity TEXT;
ALTER TABLE issues ADD COLUMN health_status TEXT;

-- Import and move tracking
ALTER TABLE issues ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN imported_from TEXT;
ALTER TABLE issues ADD COLUMN moved_to_id INTEGER;

-- Epic (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN epic_id INTEGER;
ALTER TABLE issues ADD COLUMN epic_iid INTEGER;
ALTER TABLE issues ADD COLUMN epic_title TEXT;
ALTER TABLE issues ADD COLUMN epic_url TEXT;
ALTER TABLE issues ADD COLUMN epic_group_id INTEGER;

-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN iteration_id INTEGER;
ALTER TABLE issues ADD COLUMN iteration_iid INTEGER;
ALTER TABLE issues ADD COLUMN iteration_title TEXT;
ALTER TABLE issues ADD COLUMN iteration_state INTEGER;
ALTER TABLE issues ADD COLUMN iteration_start_date TEXT;
ALTER TABLE issues ADD COLUMN iteration_due_date TEXT;

-- ============================================================
-- MERGE REQUESTS: new columns
-- ============================================================

-- Author enrichment
ALTER TABLE merge_requests ADD COLUMN author_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN author_name TEXT;

-- Engagement
ALTER TABLE merge_requests ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;

-- Fork tracking
ALTER TABLE merge_requests ADD COLUMN source_project_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN target_project_id INTEGER;

-- Milestone (parity with issues)
ALTER TABLE merge_requests ADD COLUMN milestone_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN milestone_title TEXT;

-- Merge behavior
ALTER TABLE merge_requests ADD COLUMN merge_when_pipeline_succeeds INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN merge_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN squash_on_merge INTEGER NOT NULL DEFAULT 0;

-- Merge readiness
ALTER TABLE merge_requests ADD COLUMN has_conflicts INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN blocking_discussions_resolved INTEGER NOT NULL DEFAULT 0;

-- Branch cleanup
ALTER TABLE merge_requests ADD COLUMN should_remove_source_branch INTEGER;
ALTER TABLE merge_requests ADD COLUMN force_remove_source_branch INTEGER;

-- Discussion lock
ALTER TABLE merge_requests ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;

-- Time tracking
ALTER TABLE merge_requests ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN human_time_estimate TEXT;
ALTER TABLE merge_requests ADD COLUMN human_time_spent TEXT;

-- Task lists
ALTER TABLE merge_requests ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;

-- Close tracking
ALTER TABLE merge_requests ADD COLUMN closed_by_username TEXT;

-- Scheduling (API returns ISO datetimes; we store ms epoch for consistency)
ALTER TABLE merge_requests ADD COLUMN prepared_at INTEGER;       -- ms epoch after iso_to_ms()
ALTER TABLE merge_requests ADD COLUMN merge_after INTEGER;       -- ms epoch after iso_to_ms()

-- References (add relative, short + full already exist)
ALTER TABLE merge_requests ADD COLUMN references_relative TEXT;

-- Import tracking
ALTER TABLE merge_requests ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN imported_from TEXT;

-- Premium/Ultimate
ALTER TABLE merge_requests ADD COLUMN approvals_before_merge INTEGER;
ALTER TABLE merge_requests ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;

-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE merge_requests ADD COLUMN iteration_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_iid INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_title TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_state INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_start_date TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_due_date TEXT;

-- Record migration version
INSERT INTO schema_version (version, applied_at, description)
VALUES (7, strftime('%s', 'now') * 1000, 'Complete API field capture for issues and merge requests');

Serde Struct Changes

Existing type changes

GitLabReferences                              // Add: relative: Option<String> (with #[serde(default)])
                                              // Existing fields short + full remain unchanged
GitLabIssue                                   // Add #[derive(Default)] for test ergonomics
GitLabMergeRequest                            // Add #[derive(Default)] for test ergonomics

New helper types needed

GitLabTimeStats { time_estimate, total_time_spent, human_time_estimate, human_total_time_spent }
GitLabTaskCompletionStatus { count, completed_count }
closed_by: no new type needed; reuse the existing GitLabAuthor (id, username, name)
GitLabEpic { id, iid, title, url, group_id }
GitLabIteration { id, iid, title, state, start_date, due_date }

GitLabIssue: add fields

r#type: Option<String>                        // #[serde(rename = "type")]  -- fallback-only (uppercase category); "type" is a Rust keyword, so the field uses a raw identifier
upvotes: i64                                  // #[serde(default)]
downvotes: i64                                // #[serde(default)]
user_notes_count: i64                         // #[serde(default)]
merge_requests_count: i64                     // #[serde(default)]
confidential: bool                            // #[serde(default)]
discussion_locked: bool                       // #[serde(default)]
weight: Option<i64>
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
has_tasks: bool                               // #[serde(default)]
references: Option<GitLabReferences>
closed_by: Option<GitLabAuthor>
severity: Option<String>
health_status: Option<String>
imported: bool                                // #[serde(default)]
imported_from: Option<String>
moved_to_id: Option<i64>
issue_type: Option<String>                    // canonical field (lowercase); preferred for DB storage over `type`
epic: Option<GitLabEpic>
iteration: Option<GitLabIteration>

GitLabMergeRequest: add fields

upvotes: i64                                  // #[serde(default)]
downvotes: i64                                // #[serde(default)]
user_notes_count: i64                         // #[serde(default)]
source_project_id: Option<i64>
target_project_id: Option<i64>
milestone: Option<GitLabMilestone>            // reuse existing type
merge_when_pipeline_succeeds: bool            // #[serde(default)]
merge_commit_sha: Option<String>
squash_commit_sha: Option<String>
squash: bool                                  // #[serde(default)]
squash_on_merge: bool                         // #[serde(default)]
has_conflicts: bool                           // #[serde(default)]
blocking_discussions_resolved: bool           // #[serde(default)]
should_remove_source_branch: Option<bool>
force_remove_source_branch: Option<bool>
discussion_locked: bool                       // #[serde(default)]
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
closed_by: Option<GitLabAuthor>
prepared_at: Option<String>
merge_after: Option<String>
imported: bool                                // #[serde(default)]
imported_from: Option<String>
approvals_before_merge: Option<i64>
confidential: bool                            // #[serde(default)]
iteration: Option<GitLabIteration>

Transformer Changes

IssueRow: add fields

All new fields map 1:1 from the serde struct except:

closed_at -> iso_to_ms() conversion (already in serde struct, just not passed through)
time_stats -> flatten to 4 individual fields
task_completion_status -> flatten to 2 individual fields
references -> flatten to 3 individual fields
closed_by -> extract username only (consistent with author pattern)
author -> additionally extract id and name (currently only username)
issue_type -> store as-is (canonical, lowercase); fallback to lowercased type field if issue_type absent
epic -> flatten to 5 individual fields (id, iid, title, url, group_id)
iteration -> flatten to 6 individual fields (id, iid, title, state, start_date, due_date)
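The non-1:1 mappings above can be sketched as follows. This is a minimal illustration, not the project's code: Issue, TimeStats, Author, and FlatIssue are hypothetical, trimmed-down stand-ins for GitLabIssue and IssueRow, kept only large enough to show the issue_type fallback, the time_stats flattening, and the closed_by username extraction. Defaulting a missing time_stats object to zeros is an assumption chosen to match the NOT NULL DEFAULT 0 columns in migration 007.

```rust
// Hypothetical stand-ins for the serde structs (not the real definitions).
#[derive(Default)]
struct TimeStats {
    time_estimate: i64,
    total_time_spent: i64,
    human_time_estimate: Option<String>,
    human_total_time_spent: Option<String>,
}

struct Author {
    username: String,
}

#[derive(Default)]
struct Issue {
    issue_type: Option<String>, // canonical, lowercase
    r#type: Option<String>,     // uppercase category, fallback only
    time_stats: Option<TimeStats>,
    closed_by: Option<Author>,
}

// Hypothetical stand-in for the flattened row handed to the INSERT.
#[derive(Debug, PartialEq)]
struct FlatIssue {
    issue_type: Option<String>,
    time_estimate: i64,
    time_spent: i64,
    human_time_estimate: Option<String>,
    human_time_spent: Option<String>,
    closed_by_username: Option<String>,
}

fn flatten_issue(issue: Issue) -> FlatIssue {
    let Issue { issue_type, r#type, time_stats, closed_by } = issue;

    // Prefer issue_type; otherwise fall back to the lowercased `type` field.
    let issue_type = issue_type.or_else(|| r#type.map(|t| t.to_lowercase()));

    // Assumption: an absent time_stats object flattens to 0 / NULL values.
    let ts = time_stats.unwrap_or_default();

    FlatIssue {
        issue_type,
        time_estimate: ts.time_estimate,
        time_spent: ts.total_time_spent,
        human_time_estimate: ts.human_time_estimate,
        human_time_spent: ts.human_total_time_spent,
        // Username only, consistent with the author pattern.
        closed_by_username: closed_by.map(|a| a.username),
    }
}
```

The epic and iteration objects would flatten the same way: each nested Option becomes a group of nullable columns.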

NormalizedMergeRequest: add fields

Same patterns as issues, plus:

milestone -> reuse upsert_milestone_tx from issue pipeline, add milestone_id + milestone_title
prepared_at, merge_after -> iso_to_ms() conversion (API provides ISO datetimes)
source_project_id, target_project_id -> direct pass-through
iteration -> flatten to 6 individual fields (same as issues)

Insert statement changes

Both process_issue_in_transaction and process_mr_in_transaction need their INSERT and ON CONFLICT DO UPDATE statements extended with all new columns. The ON CONFLICT clause should update all new fields on re-sync.

Implementation note (reliability): Define a single authoritative list of persisted columns per entity and generate/compose both SQL fragments from it:

INSERT column list + VALUES placeholders
ON CONFLICT DO UPDATE assignments

This prevents drift where a new field is added to one clause but not the other, which is the most likely bug class when adding 38 new columns to issues and 39 to merge_requests.
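One way to derive both fragments from a single list is sketched below. It is an illustration under assumptions rather than the planned implementation: ISSUE_COLUMNS is a hypothetical subset of the real column list, build_upsert_sql is a made-up helper name, and using id as the conflict target is assumed for the example.

```rust
/// Hypothetical subset of the persisted issue columns; the real list would
/// name every column the issue pipeline writes.
const ISSUE_COLUMNS: &[&str] = &["id", "iid", "project_id", "title", "closed_at", "upvotes"];

/// Assumed conflict target for this sketch.
const CONFLICT_KEY: &str = "id";

/// Builds the full upsert statement from one column list, so the INSERT
/// column list, the placeholders, and the DO UPDATE assignments cannot drift.
fn build_upsert_sql(table: &str, columns: &[&str], conflict_key: &str) -> String {
    let column_list = columns.join(", ");

    // Numbered placeholders ?1, ?2, ... in column order.
    let placeholders: Vec<String> = (1..=columns.len()).map(|i| format!("?{}", i)).collect();

    // Every column except the conflict key is refreshed from the incoming row.
    let assignments: Vec<String> = columns
        .iter()
        .filter(|c| **c != conflict_key)
        .map(|c| format!("{} = excluded.{}", c, c))
        .collect();

    format!(
        "INSERT INTO {} ({}) VALUES ({}) ON CONFLICT({}) DO UPDATE SET {}",
        table,
        column_list,
        placeholders.join(", "),
        conflict_key,
        assignments.join(", ")
    )
}
```

With this shape, adding a column means appending one entry to the list. The bound parameters still have to be supplied in the same order, so the row-to-params code is the one remaining place to keep in step.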

Prerequisite refactors (prep commits before main Phase A work)

1. Align issue transformer on `core::time`

The issue transformer (transformers/issue.rs) has a local parse_timestamp() that duplicates iso_to_ms_strict() from core::time. The MR transformer already uses the shared module. Before adding Phase A's optional timestamp fields (especially closed_at as Option<String>), migrate the issue transformer to use iso_to_ms_strict() and iso_to_ms_opt_strict() from core::time. This avoids duplicating the opt variant locally and establishes one timestamp parsing path across the codebase.

Changes: Replace parse_timestamp() calls with iso_to_ms_strict(), adapt or remove TransformError::TimestampParse (MR transformer uses String errors; align on that or on a shared error type).
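The optional variant can be a thin wrapper over the strict parser. The sketch below shows the idea only: to_ms_opt is a hypothetical name, and the parser is passed in as a parameter because the exact signature of iso_to_ms_strict is assumed here (taking &str and returning Result<i64, String>, in line with the String errors the MR transformer uses).

```rust
/// Applies a strict timestamp parser to an optional field.
/// None stays None; Some(invalid) becomes an error instead of a silent NULL.
fn to_ms_opt<F>(value: Option<&str>, parse_strict: F) -> Result<Option<i64>, String>
where
    F: Fn(&str) -> Result<i64, String>,
{
    // Option<Result<i64, String>> -> Result<Option<i64>, String>
    value.map(parse_strict).transpose()
}

// Assumed call-site shape (closed_at on issues; prepared_at / merge_after on MRs):
//     let closed_at = to_ms_opt(issue.closed_at.as_deref(), iso_to_ms_strict)?;
```

Keeping the wrapper in core::time next to the strict parser means the issue and MR transformers share one failure mode for malformed timestamps.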

2. Extract shared ingestion helpers

upsert_milestone_tx (in ingestion/issues.rs) and upsert_label_tx (duplicated in both ingestion/issues.rs and ingestion/merge_requests.rs) should be moved to a shared module (e.g., src/ingestion/shared.rs). MR ingestion needs upsert_milestone_tx for Phase A milestone support, and the label helper is already copy-pasted between files.

Changes: Create src/ingestion/shared.rs, move upsert_milestone_tx, upsert_label_tx, and MilestoneRow there. Update imports in both issue and MR ingestion modules.

Files touched

File	Change
`migrations/007_complete_field_capture.sql`	New file
`src/gitlab/types.rs`	Add `#[derive(Default)]` to `GitLabIssue` and `GitLabMergeRequest`; add `relative: Option<String>` to `GitLabReferences`; add fields to both structs; add `GitLabTimeStats`, `GitLabTaskCompletionStatus`, `GitLabEpic`, `GitLabIteration`
`src/gitlab/transformers/issue.rs`	Remove local `parse_timestamp()`, switch to `core::time`; extend IssueRow, IssueWithMetadata, transform_issue()
`src/gitlab/transformers/merge_request.rs`	Extend NormalizedMergeRequest, MergeRequestWithMetadata, transform_merge_request(); extract `references_relative`
`src/ingestion/shared.rs`	New file: shared `upsert_milestone_tx`, `upsert_label_tx`, `MilestoneRow`
`src/ingestion/issues.rs`	Extend INSERT/UPSERT SQL; import from shared module
`src/ingestion/merge_requests.rs`	Extend INSERT/UPSERT SQL; import from shared module; add milestone upsert
`src/core/db.rs`	Register migration 007 in `MIGRATIONS` array

What this does NOT include

No new API endpoints called
No new tables (MRs reuse the existing milestones table)
No CLI changes (new fields are stored but not yet surfaced in lore issues / lore mrs output)
No changes to discussion/note ingestion (Phase A is issues + MRs only)
No observability instrumentation (that's Phase B)

Rollout / Backfill Note

After applying Migration 007 and shipping transformer + UPSERT updates, existing rows will not have the new columns populated until issues/MRs are reprocessed. Plan on a one-time full re-sync (lore ingest --type issues --full and lore ingest --type mrs --full) to backfill the new fields. Until then, queries on new columns will return NULL/default values for previously-synced entities.

Resolved decisions

Field	Decision	Rationale
`subscribed`	Excluded	User-relative field (reflects token holder's subscription state, not an entity property). Changes meaning if the token is rotated to a different user. Not entity data.
`_links`	Excluded	HATEOAS API navigation metadata, not entity data. Every URL is deterministically constructable from `project_id` + `iid` + GitLab base URL. Note: `closed_as_duplicate_of` inside `_links` contains a real entity reference -- extracting that is deferred to a future phase.
`epic` / `iteration`	Flatten to columns	Same denormalization pattern as milestones. Epic gets 5 columns (`epic_id`, `epic_iid`, `epic_title`, `epic_url`, `epic_group_id`). Iteration gets 6 columns (`iteration_id`, `iteration_iid`, `iteration_title`, `iteration_state`, `iteration_start_date`, `iteration_due_date`). Both nullable (null on Free tier).
`approvals_before_merge`	Store best-effort	Deprecated and scheduled for removal in GitLab API v5. Keep as `Option<i64>` / nullable column. Never depend on it for correctness -- it may disappear in a future GitLab release.
