# Phase A: Complete API Field Capture

> **Status:** Draft
>
> **Guiding principle:** Mirror everything GitLab gives us.
> - **Lossless mirror:** the raw API JSON stored behind `raw_payload_id`. This is the true complete representation of every API response.
> - **Relational projection:** a stable, query-optimized subset of fields we commit to keeping current on every re-sync.
>
> This preserves maximum context for processing and analysis while avoiding unbounded schema growth.
>
> **Migration:** 007_complete_field_capture.sql
>
> **Prerequisite:** None (independent of CP3)

---

## Scope

One migration. Three categories of work:

1. **New columns** on `issues` and `merge_requests` for fields currently dropped by serde or dropped during transform
2. **New serde fields** on `GitLabIssue` and `GitLabMergeRequest` to deserialize currently-silently-dropped JSON fields
3. **Transformer + insert updates** to pass the new fields through to the DB

No new tables. No new API calls. No new endpoints. All data comes from responses we already receive.

---

## Issues: Field Gap Inventory

### Currently stored

id, iid, project_id, title, description, state, author_username, created_at, updated_at, web_url, due_date, milestone_id, milestone_title, raw_payload_id, last_seen_at, discussions_synced_for_updated_at, labels (junction), assignees (junction)

### Currently deserialized but dropped during transform

| API Field | Status | Action |
|-----------|--------|--------|
| `closed_at` | Deserialized in serde struct, but no DB column exists and transformer never populates it | Add column in migration 007, wire up in IssueRow + transform + INSERT |
| `author.id` | Deserialized | Store as `author_id` column |
| `author.name` | Deserialized | Store as `author_name` column |

### Currently silently dropped by serde (not in GitLabIssue struct)

| API Field | Type | DB Column | Notes |
|-----------|------|-----------|-------|
| `issue_type` | Option\<String\> | `issue_type` | Canonical field (lowercase, e.g. "issue"); preferred for DB storage |
| `upvotes` | i64 | `upvotes` | |
| `downvotes` | i64 | `downvotes` | |
| `user_notes_count` | i64 | `user_notes_count` | Useful for discussion sync optimization |
| `merge_requests_count` | i64 | `merge_requests_count` | Count of linked MRs |
| `confidential` | bool | `confidential` | 0/1 |
| `discussion_locked` | bool | `discussion_locked` | 0/1 |
| `weight` | Option\<i64\> | `weight` | Premium/Ultimate, null on Free |
| `time_stats.time_estimate` | i64 | `time_estimate` | Seconds |
| `time_stats.total_time_spent` | i64 | `time_spent` | Seconds |
| `time_stats.human_time_estimate` | Option\<String\> | `human_time_estimate` | e.g. "3h 30m" |
| `time_stats.human_total_time_spent` | Option\<String\> | `human_time_spent` | e.g. "1h 15m" |
| `task_completion_status.count` | i64 | `task_count` | Checkbox total |
| `task_completion_status.completed_count` | i64 | `task_completed_count` | Checkboxes checked |
| `has_tasks` | bool | `has_tasks` | 0/1 |
| `severity` | Option\<String\> | `severity` | Incident severity |
| `closed_by` | Option\<GitLabClosedBy\> | `closed_by_username` | Who closed it (username only, consistent with author pattern) |
| `imported` | bool | `imported` | 0/1 |
| `imported_from` | Option\<String\> | `imported_from` | Import source |
| `moved_to_id` | Option\<i64\> | `moved_to_id` | Target issue if moved |
| `references.short` | String | `references_short` | e.g. "#42" |
| `references.relative` | String | `references_relative` | e.g. "#42" or "group/proj#42" |
| `references.full` | String | `references_full` | e.g. "group/project#42" |
| `health_status` | Option\<String\> | `health_status` | Ultimate only |
| `type` | Option\<String\> | (transform-only) | Uppercase category (e.g. "ISSUE"); fallback for `issue_type` -- lowercased before storage. Not stored as separate column; raw JSON remains lossless. |
| `epic.id` | Option\<i64\> | `epic_id` | Premium/Ultimate, null on Free |
| `epic.iid` | Option\<i64\> | `epic_iid` | |
| `epic.title` | Option\<String\> | `epic_title` | |
| `epic.url` | Option\<String\> | `epic_url` | |
| `epic.group_id` | Option\<i64\> | `epic_group_id` | |
| `iteration.id` | Option\<i64\> | `iteration_id` | Premium/Ultimate, null on Free |
| `iteration.iid` | Option\<i64\> | `iteration_iid` | |
| `iteration.title` | Option\<String\> | `iteration_title` | |
| `iteration.state` | Option\<i64\> | `iteration_state` | Enum: 1=upcoming, 2=current, 3=closed |
| `iteration.start_date` | Option\<String\> | `iteration_start_date` | ISO date |
| `iteration.due_date` | Option\<String\> | `iteration_due_date` | ISO date |

---

## Merge Requests: Field Gap Inventory

### Currently stored

id, iid, project_id, title, description, state, draft, author_username, source_branch, target_branch, head_sha, references_short, references_full, detailed_merge_status, merge_user_username, created_at, updated_at, merged_at, closed_at, last_seen_at, web_url, raw_payload_id, discussions_synced_for_updated_at, discussions_sync_last_attempt_at, discussions_sync_attempts, discussions_sync_last_error, labels (junction), assignees (junction), reviewers (junction)

### Currently deserialized but dropped during transform

| API Field | Status | Action |
|-----------|--------|--------|
| `author.id` | Deserialized | Store as `author_id` column |
| `author.name` | Deserialized | Store as `author_name` column |
| `work_in_progress` | Used transiently for `draft` fallback | Already handled, no change needed |
| `merge_status` (legacy) | Used transiently for `detailed_merge_status` fallback | Already handled, no change needed |
| `merged_by` | Used transiently for `merge_user` fallback | Already handled, no change needed |

### Currently silently dropped by serde (not in GitLabMergeRequest struct)

| API Field | Type | DB Column | Notes |
|-----------|------|-----------|-------|
| `upvotes` | i64 | `upvotes` | |
| `downvotes` | i64 | `downvotes` | |
| `user_notes_count` | i64 | `user_notes_count` | |
| `source_project_id` | i64 | `source_project_id` | Fork source |
| `target_project_id` | i64 | `target_project_id` | Fork target |
| `milestone` | Option | `milestone_id`, `milestone_title` | Reuse issue milestone pattern |
| `merge_when_pipeline_succeeds` | bool | `merge_when_pipeline_succeeds` | 0/1, auto-merge flag |
| `merge_commit_sha` | Option\<String\> | `merge_commit_sha` | Commit ref after merge |
| `squash_commit_sha` | Option\<String\> | `squash_commit_sha` | Commit ref after squash |
| `discussion_locked` | bool | `discussion_locked` | 0/1 |
| `should_remove_source_branch` | Option\<bool\> | `should_remove_source_branch` | 0/1 |
| `force_remove_source_branch` | Option\<bool\> | `force_remove_source_branch` | 0/1 |
| `squash` | bool | `squash` | 0/1 |
| `squash_on_merge` | bool | `squash_on_merge` | 0/1 |
| `has_conflicts` | bool | `has_conflicts` | 0/1 |
| `blocking_discussions_resolved` | bool | `blocking_discussions_resolved` | 0/1 |
| `time_stats.time_estimate` | i64 | `time_estimate` | Seconds |
| `time_stats.total_time_spent` | i64 | `time_spent` | Seconds |
| `time_stats.human_time_estimate` | Option\<String\> | `human_time_estimate` | |
| `time_stats.human_total_time_spent` | Option\<String\> | `human_time_spent` | |
| `task_completion_status.count` | i64 | `task_count` | |
| `task_completion_status.completed_count` | i64 | `task_completed_count` | |
| `closed_by` | Option\<GitLabClosedBy\> | `closed_by_username` | |
| `prepared_at` | Option\<String\> | `prepared_at` | ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable |
| `merge_after` | Option\<String\> | `merge_after` | ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable (scheduled merge) |
| `imported` | bool | `imported` | 0/1 |
| `imported_from` | Option\<String\> | `imported_from` | |
| `approvals_before_merge` | Option\<i64\> | `approvals_before_merge` | Deprecated, scheduled for removal in GitLab API v5; store best-effort, keep nullable |
| `references.relative` | String | `references_relative` | Currently only short + full stored |
| `confidential` | bool | `confidential` | 0/1 (MRs can be confidential too) |
| `iteration.id` | Option\<i64\> | `iteration_id` | Premium/Ultimate, null on Free |
| `iteration.iid` | Option\<i64\> | `iteration_iid` | |
| `iteration.title` | Option\<String\> | `iteration_title` | |
| `iteration.state` | Option\<i64\> | `iteration_state` | |
| `iteration.start_date` | Option\<String\> | `iteration_start_date` | ISO date |
| `iteration.due_date` | Option\<String\> | `iteration_due_date` | ISO date |

---

## Migration 007: complete_field_capture.sql

```sql
-- Migration 007: Capture all remaining GitLab API response fields.
-- Principle: mirror everything GitLab returns. No field left behind.

-- ============================================================
-- ISSUES: new columns
-- ============================================================

-- Fields currently deserialized but not stored
ALTER TABLE issues ADD COLUMN closed_at INTEGER;    -- ms epoch, deserialized but never stored until now
ALTER TABLE issues ADD COLUMN author_id INTEGER;    -- GitLab user ID
ALTER TABLE issues ADD COLUMN author_name TEXT;     -- Display name

-- Issue metadata
ALTER TABLE issues ADD COLUMN issue_type TEXT;      -- 'issue' | 'incident' | 'test_case'
ALTER TABLE issues ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;

-- Engagement
ALTER TABLE issues ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN merge_requests_count INTEGER NOT NULL DEFAULT 0;

-- Time tracking
ALTER TABLE issues ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0;  -- seconds
ALTER TABLE issues ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0;     -- seconds
ALTER TABLE issues ADD COLUMN human_time_estimate TEXT;
ALTER TABLE issues ADD COLUMN human_time_spent TEXT;

-- Task lists
ALTER TABLE issues ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN has_tasks INTEGER NOT NULL DEFAULT 0;

-- References (MRs already have short + full)
ALTER TABLE issues ADD COLUMN references_short TEXT;     -- e.g. "#42"
ALTER TABLE issues ADD COLUMN references_relative TEXT;  -- context-dependent
ALTER TABLE issues ADD COLUMN references_full TEXT;      -- e.g. "group/project#42"

-- Close/move tracking
ALTER TABLE issues ADD COLUMN closed_by_username TEXT;

-- Premium/Ultimate fields (nullable, null on Free tier)
ALTER TABLE issues ADD COLUMN weight INTEGER;
ALTER TABLE issues ADD COLUMN severity TEXT;
ALTER TABLE issues ADD COLUMN health_status TEXT;

-- Import tracking
ALTER TABLE issues ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN imported_from TEXT;
ALTER TABLE issues ADD COLUMN moved_to_id INTEGER;

-- Epic (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN epic_id INTEGER;
ALTER TABLE issues ADD COLUMN epic_iid INTEGER;
ALTER TABLE issues ADD COLUMN epic_title TEXT;
ALTER TABLE issues ADD COLUMN epic_url TEXT;
ALTER TABLE issues ADD COLUMN epic_group_id INTEGER;

-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN iteration_id INTEGER;
ALTER TABLE issues ADD COLUMN iteration_iid INTEGER;
ALTER TABLE issues ADD COLUMN iteration_title TEXT;
ALTER TABLE issues ADD COLUMN iteration_state INTEGER;
ALTER TABLE issues ADD COLUMN iteration_start_date TEXT;
ALTER TABLE issues ADD COLUMN iteration_due_date TEXT;

-- ============================================================
-- MERGE REQUESTS: new columns
-- ============================================================

-- Author enrichment
ALTER TABLE merge_requests ADD COLUMN author_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN author_name TEXT;

-- Engagement
ALTER TABLE merge_requests ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;

-- Fork tracking
ALTER TABLE merge_requests ADD COLUMN source_project_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN target_project_id INTEGER;

-- Milestone (parity with issues)
ALTER TABLE merge_requests ADD COLUMN milestone_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN milestone_title TEXT;

-- Merge behavior
ALTER TABLE merge_requests ADD COLUMN merge_when_pipeline_succeeds INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN merge_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN squash_on_merge INTEGER NOT NULL DEFAULT 0;

-- Merge readiness
ALTER TABLE merge_requests ADD COLUMN has_conflicts INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN blocking_discussions_resolved INTEGER NOT NULL DEFAULT 0;

-- Branch cleanup
ALTER TABLE merge_requests ADD COLUMN should_remove_source_branch INTEGER;
ALTER TABLE merge_requests ADD COLUMN force_remove_source_branch INTEGER;

-- Discussion lock
ALTER TABLE merge_requests ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;

-- Time tracking
ALTER TABLE merge_requests ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN human_time_estimate TEXT;
ALTER TABLE merge_requests ADD COLUMN human_time_spent TEXT;

-- Task lists
ALTER TABLE merge_requests ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;

-- Close tracking
ALTER TABLE merge_requests ADD COLUMN closed_by_username TEXT;

-- Scheduling (API returns ISO datetimes; we store ms epoch for consistency)
ALTER TABLE merge_requests ADD COLUMN prepared_at INTEGER;  -- ms epoch after iso_to_ms()
ALTER TABLE merge_requests ADD COLUMN merge_after INTEGER;  -- ms epoch after iso_to_ms()

-- References (add relative, short + full already exist)
ALTER TABLE merge_requests ADD COLUMN references_relative TEXT;

-- Import tracking
ALTER TABLE merge_requests ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN imported_from TEXT;

-- Premium/Ultimate
ALTER TABLE merge_requests ADD COLUMN approvals_before_merge INTEGER;
ALTER TABLE merge_requests ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;

-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE merge_requests ADD COLUMN iteration_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_iid INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_title TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_state INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_start_date TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_due_date TEXT;

-- Record migration version
INSERT INTO schema_version (version, applied_at, description)
VALUES (7, strftime('%s', 'now') * 1000, 'Complete API field capture for issues and merge requests');
```

---

## Serde Struct Changes

### Existing type changes

```
GitLabReferences    // Add: relative: Option<String> (with #[serde(default)])
                    // Existing fields short + full remain unchanged
GitLabIssue         // Add #[derive(Default)] for test ergonomics
GitLabMergeRequest  // Add #[derive(Default)] for test ergonomics
```

### New helper types needed

```
GitLabTimeStats { time_estimate, total_time_spent, human_time_estimate, human_total_time_spent }
GitLabTaskCompletionStatus { count, completed_count }
GitLabClosedBy (reuse GitLabAuthor shape: id, username, name)
GitLabEpic { id, iid, title, url, group_id }
GitLabIteration { id, iid, title, state, start_date, due_date }
```

### GitLabIssue: add fields

```
type: Option<String>                                       // #[serde(rename = "type")] -- fallback-only (uppercase category); "type" is reserved in Rust
upvotes: i64                                               // #[serde(default)]
downvotes: i64                                             // #[serde(default)]
user_notes_count: i64                                      // #[serde(default)]
merge_requests_count: i64                                  // #[serde(default)]
confidential: bool                                         // #[serde(default)]
discussion_locked: bool                                    // #[serde(default)]
weight: Option<i64>
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
has_tasks: bool                                            // #[serde(default)]
references: Option<GitLabReferences>
closed_by: Option<GitLabClosedBy>
severity: Option<String>
health_status: Option<String>
imported: bool                                             // #[serde(default)]
imported_from: Option<String>
moved_to_id: Option<i64>
issue_type: Option<String>                                 // canonical field (lowercase); preferred for DB storage over `type`
epic: Option<GitLabEpic>
iteration: Option<GitLabIteration>
```

### GitLabMergeRequest: add fields

```
upvotes: i64                                               // #[serde(default)]
downvotes: i64                                             // #[serde(default)]
user_notes_count: i64                                      // #[serde(default)]
source_project_id: Option<i64>
target_project_id: Option<i64>
milestone: Option                                          // reuse existing type
merge_when_pipeline_succeeds: bool                         // #[serde(default)]
merge_commit_sha: Option<String>
squash_commit_sha: Option<String>
squash: bool                                               // #[serde(default)]
squash_on_merge: bool                                      // #[serde(default)]
has_conflicts: bool                                        // #[serde(default)]
blocking_discussions_resolved: bool                        // #[serde(default)]
should_remove_source_branch: Option<bool>
force_remove_source_branch: Option<bool>
discussion_locked: bool                                    // #[serde(default)]
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
closed_by: Option<GitLabClosedBy>
prepared_at: Option<String>
merge_after: Option<String>
imported: bool                                             // #[serde(default)]
imported_from: Option<String>
approvals_before_merge: Option<i64>
confidential: bool                                         // #[serde(default)]
iteration: Option<GitLabIteration>
```

---

## Transformer Changes

### IssueRow: add fields

All new fields map 1:1 from the serde struct except:

- `closed_at` -> `iso_to_ms()` conversion (already in serde struct, just not passed through)
- `time_stats` -> flatten to 4 individual fields
- `task_completion_status` -> flatten to 2 individual fields
- `references` -> flatten to 3 individual fields
- `closed_by` -> extract `username` only (consistent with author pattern)
- `author` -> additionally extract `id` and `name` (currently only `username`)
- `issue_type` -> store as-is (canonical, lowercase); fallback to lowercased `type` field if `issue_type` absent
- `epic` -> flatten to 5 individual fields (id, iid, title, url, group_id)
- `iteration` -> flatten to 6 individual fields (id, iid, title, state, start_date, due_date)

### NormalizedMergeRequest: add fields

Same patterns as issues, plus:

- `milestone` -> reuse `upsert_milestone_tx` from issue pipeline, add `milestone_id` + `milestone_title`
- `prepared_at`, `merge_after` -> `iso_to_ms()` conversion (API provides ISO datetimes)
- `source_project_id`, `target_project_id` -> direct pass-through
- `iteration` -> flatten to 6 individual fields (same as issues)

### Insert statement changes

Both `process_issue_in_transaction` and `process_mr_in_transaction` need their INSERT and ON CONFLICT DO UPDATE statements extended with all new columns. The ON CONFLICT clause should update all new fields on re-sync.

**Implementation note (reliability):** Define a single authoritative list of persisted columns per entity and generate/compose both SQL fragments from it:

- INSERT column list + VALUES placeholders
- ON CONFLICT DO UPDATE assignments

This prevents drift where a new field is added to one clause but not the other -- the most likely bug class with 40+ new columns.

---

## Prerequisite refactors (prep commits before main Phase A work)

### 1. Align issue transformer on `core::time`

The issue transformer (`transformers/issue.rs`) has a local `parse_timestamp()` that duplicates `iso_to_ms_strict()` from `core::time`. The MR transformer already uses the shared module. Before adding Phase A's optional timestamp fields (especially `closed_at` as `Option<String>`), migrate the issue transformer to use `iso_to_ms_strict()` and `iso_to_ms_opt_strict()` from `core::time`. This avoids duplicating the `opt` variant locally and establishes one timestamp parsing path across the codebase.
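
The optional variant can stay a thin wrapper over the strict parser, which is what keeps a single parsing path cheap to maintain. The sketch below illustrates that shape only. The signatures are assumptions (the real `core::time` functions are not shown in this document), and the parser is a simplified stand-in that accepts only UTC timestamps ending in `Z`.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Stand-in for the strict parser (assumed signature). Accepts only
/// `YYYY-MM-DDTHH:MM:SS[.fff]Z` and returns milliseconds since the Unix epoch.
fn iso_to_ms_strict(s: &str) -> Result<i64, String> {
    let bad = || format!("invalid ISO timestamp: {s:?}");
    let body = s.strip_suffix('Z').ok_or_else(bad)?;
    let (date, time) = body.split_once('T').ok_or_else(bad)?;
    let (clock, frac) = time.split_once('.').unwrap_or((time, "0"));
    let d: Vec<&str> = date.split('-').collect();
    let t: Vec<&str> = clock.split(':').collect();
    let digits_ok = !frac.is_empty() && frac.bytes().all(|b| b.is_ascii_digit());
    if d.len() != 3 || t.len() != 3 || !digits_ok {
        return Err(bad());
    }
    let num = |p: &str| p.parse::<i64>().map_err(|_| bad());
    let (y, mo, day) = (num(d[0])?, num(d[1])?, num(d[2])?);
    let (h, mi, sec) = (num(t[0])?, num(t[1])?, num(t[2])?);
    let in_range = (1..=12).contains(&mo)
        && (1..=31).contains(&day)
        && (0..24).contains(&h)
        && (0..60).contains(&mi)
        && (0..60).contains(&sec);
    if !in_range {
        return Err(bad());
    }
    // Keep millisecond precision: pad or truncate the fraction to three digits.
    let millis = num(&format!("{frac:0<3}")[..3])?;
    let secs = days_from_civil(y, mo, day) * 86_400 + h * 3_600 + mi * 60 + sec;
    Ok(secs * 1_000 + millis)
}

/// Optional variant: `None` stays `None`, but a present value must parse.
fn iso_to_ms_opt_strict(s: Option<&str>) -> Result<Option<i64>, String> {
    s.map(iso_to_ms_strict).transpose()
}
```

With this shape, a nullable field such as `closed_at` could be converted with something like `iso_to_ms_opt_strict(issue.closed_at.as_deref())?`, so a missing value is stored as NULL while a malformed value still surfaces as an error rather than being silently dropped.
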

**Changes:** Replace `parse_timestamp()` calls with `iso_to_ms_strict()`, adapt or remove `TransformError::TimestampParse` (MR transformer uses `String` errors; align on that or on a shared error type).

### 2. Extract shared ingestion helpers

`upsert_milestone_tx` (in `ingestion/issues.rs`) and `upsert_label_tx` (duplicated in both `ingestion/issues.rs` and `ingestion/merge_requests.rs`) should be moved to a shared module (e.g., `src/ingestion/shared.rs`). MR ingestion needs `upsert_milestone_tx` for Phase A milestone support, and the label helper is already copy-pasted between files.

**Changes:** Create `src/ingestion/shared.rs`, move `upsert_milestone_tx`, `upsert_label_tx`, and `MilestoneRow` there. Update imports in both issue and MR ingestion modules.

---

## Files touched

| File | Change |
|------|--------|
| `migrations/007_complete_field_capture.sql` | New file |
| `src/gitlab/types.rs` | Add `#[derive(Default)]` to `GitLabIssue` and `GitLabMergeRequest`; add `relative: Option<String>` to `GitLabReferences`; add fields to both structs; add `GitLabTimeStats`, `GitLabTaskCompletionStatus`, `GitLabEpic`, `GitLabIteration` |
| `src/gitlab/transformers/issue.rs` | Remove local `parse_timestamp()`, switch to `core::time`; extend IssueRow, IssueWithMetadata, transform_issue() |
| `src/gitlab/transformers/merge_request.rs` | Extend NormalizedMergeRequest, MergeRequestWithMetadata, transform_merge_request(); extract `references_relative` |
| `src/ingestion/shared.rs` | New file: shared `upsert_milestone_tx`, `upsert_label_tx`, `MilestoneRow` |
| `src/ingestion/issues.rs` | Extend INSERT/UPSERT SQL; import from shared module |
| `src/ingestion/merge_requests.rs` | Extend INSERT/UPSERT SQL; import from shared module; add milestone upsert |
| `src/core/db.rs` | Register migration 007 in `MIGRATIONS` array |

---

## What this does NOT include

- No new API endpoints called
- No new tables (except reusing existing `milestones` for MRs)
- No CLI changes (new fields are stored but not yet surfaced in `lore issues` / `lore mrs` output)
- No changes to discussion/note ingestion (Phase A is issues + MRs only)
- No observability instrumentation (that's Phase B)

---

## Rollout / Backfill Note

After applying Migration 007 and shipping transformer + UPSERT updates, **existing rows will not have the new columns populated** until issues/MRs are reprocessed. Plan on a **one-time full re-sync** (`lore ingest --type issues --full` and `lore ingest --type mrs --full`) to backfill the new fields. Until then, queries on new columns will return NULL/default values for previously-synced entities.

---

## Resolved decisions

| Field | Decision | Rationale |
|-------|----------|-----------|
| `subscribed` | **Excluded** | User-relative field (reflects token holder's subscription state, not an entity property). Changes meaning if the token is rotated to a different user. Not entity data. |
| `_links` | **Excluded** | HATEOAS API navigation metadata, not entity data. Every URL is deterministically constructable from `project_id` + `iid` + GitLab base URL. Note: `closed_as_duplicate_of` inside `_links` contains a real entity reference -- extracting that is deferred to a future phase. |
| `epic` / `iteration` | **Flatten to columns** | Same denormalization pattern as milestones. Epic gets 5 columns (`epic_id`, `epic_iid`, `epic_title`, `epic_url`, `epic_group_id`). Iteration gets 6 columns (`iteration_id`, `iteration_iid`, `iteration_title`, `iteration_state`, `iteration_start_date`, `iteration_due_date`). Both nullable (null on Free tier). |
| `approvals_before_merge` | **Store best-effort** | Deprecated and scheduled for removal in GitLab API v5. Keep as `Option<i64>` / nullable column. Never depend on it for correctness -- it may disappear in a future GitLab release. |
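
The full re-sync described in the rollout note only backfills a column if that column appears in both the INSERT list and the ON CONFLICT assignments. A minimal sketch of the single-column-list idea from the implementation note follows. The function name `build_upsert_sql`, the `(project_id, iid)` conflict key, and the SQLite-style numbered placeholders are assumptions for illustration, not the project's actual code.

```rust
/// Build an SQLite-style upsert from one authoritative column list, so the
/// INSERT column list, the VALUES placeholders, and the ON CONFLICT
/// assignments cannot drift apart.
fn build_upsert_sql(table: &str, conflict_cols: &[&str], columns: &[&str]) -> String {
    // One numbered placeholder per column: ?1, ?2, ...
    let placeholders: Vec<String> = (1..=columns.len()).map(|i| format!("?{i}")).collect();

    // Every non-key column is refreshed from the incoming row on conflict.
    let assignments: Vec<String> = columns
        .iter()
        .filter(|c| !conflict_cols.contains(*c))
        .map(|c| format!("{c} = excluded.{c}"))
        .collect();

    format!(
        "INSERT INTO {table} ({}) VALUES ({}) ON CONFLICT({}) DO UPDATE SET {}",
        columns.join(", "),
        placeholders.join(", "),
        conflict_cols.join(", "),
        assignments.join(", ")
    )
}
```

Adding a Phase A column then means appending one name to the list and binding one more parameter in the same order. The trade-off is that the bind order must follow the list, so a real implementation would probably derive the bound values from the same source.
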