gitlore/docs/phase-a-spec.md
Taylor Eernisse 9b63671df9 docs: Update documentation for search pipeline and Phase A spec
- README.md: Add hybrid search and robot mode to feature list. Update
  quick start to use new noun-first CLI syntax (lore issues, lore mrs,
  lore search). Add embedding configuration section. Update command
  examples throughout.

- AGENTS.md: Update robot mode examples to new CLI syntax. Add search,
  sync, stats, and generate-docs commands to the robot mode reference.
  Update flag conventions (-n for limit, -s for state, -J for JSON).

- docs/prd/checkpoint-3.md: Major expansion with gated milestone
  structure (Gate A: lexical, Gate B: hybrid, Gate C: sync). Add
  prerequisite rename note, code sample conventions, chunking strategy
  details, and sqlite-vec rowid encoding scheme. Clarify that Gate A
  requires only SQLite + FTS5 with no sqlite-vec dependency.

- docs/phase-a-spec.md: New detailed specification for Gate A (lexical
  search MVP) covering document schema, FTS5 configuration, dirty
  queue mechanics, CLI interface, and acceptance criteria.

- docs/api-efficiency-findings.md: Analysis of GitLab API pagination
  behavior and efficiency observations from production sync runs.
  Documents the missing x-next-page header issue and heuristic fix.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:47:33 -05:00


# Phase A: Complete API Field Capture
> **Status:** Draft
>
> **Guiding principle:** Mirror everything GitLab gives us, in two layers:
> - **Lossless mirror:** the raw API JSON stored behind `raw_payload_id`. This is the true complete representation of every API response.
> - **Relational projection:** a stable, query-optimized subset of fields we commit to keeping current on every re-sync.
>
> This preserves maximum context for processing and analysis while avoiding unbounded schema growth.
>
> **Migration:** `007_complete_field_capture.sql`
>
> **Prerequisite:** None (independent of CP3)
---
## Scope
One migration. Three categories of work:
1. **New columns** on `issues` and `merge_requests` for fields currently dropped by serde or dropped during transform
2. **New serde fields** on `GitLabIssue` and `GitLabMergeRequest` to deserialize JSON fields that are currently dropped silently
3. **Transformer + insert updates** to pass the new fields through to the DB
No new tables. No new API calls. No new endpoints. All data comes from responses we already receive.
---
## Issues: Field Gap Inventory
### Currently stored
id, iid, project_id, title, description, state, author_username, created_at, updated_at, web_url, due_date, milestone_id, milestone_title, raw_payload_id, last_seen_at, discussions_synced_for_updated_at, labels (junction), assignees (junction)
### Currently deserialized but dropped during transform
| API Field | Status | Action |
|-----------|--------|--------|
| `closed_at` | Deserialized in serde struct, but no DB column exists and transformer never populates it | Add column in migration 007, wire up in IssueRow + transform + INSERT |
| `author.id` | Deserialized | Store as `author_id` column |
| `author.name` | Deserialized | Store as `author_name` column |
### Currently silently dropped by serde (not in GitLabIssue struct)
| API Field | Type | DB Column | Notes |
|-----------|------|-----------|-------|
| `issue_type` | Option\<String\> | `issue_type` | Canonical field (lowercase, e.g. "issue"); preferred for DB storage |
| `upvotes` | i64 | `upvotes` | |
| `downvotes` | i64 | `downvotes` | |
| `user_notes_count` | i64 | `user_notes_count` | Useful for discussion sync optimization |
| `merge_requests_count` | i64 | `merge_requests_count` | Count of linked MRs |
| `confidential` | bool | `confidential` | 0/1 |
| `discussion_locked` | bool | `discussion_locked` | 0/1 |
| `weight` | Option\<i64\> | `weight` | Premium/Ultimate, null on Free |
| `time_stats.time_estimate` | i64 | `time_estimate` | Seconds |
| `time_stats.total_time_spent` | i64 | `time_spent` | Seconds |
| `time_stats.human_time_estimate` | Option\<String\> | `human_time_estimate` | e.g. "3h 30m" |
| `time_stats.human_total_time_spent` | Option\<String\> | `human_time_spent` | e.g. "1h 15m" |
| `task_completion_status.count` | i64 | `task_count` | Checkbox total |
| `task_completion_status.completed_count` | i64 | `task_completed_count` | Checkboxes checked |
| `has_tasks` | bool | `has_tasks` | 0/1 |
| `severity` | Option\<String\> | `severity` | Incident severity |
| `closed_by` | Option\<object\> | `closed_by_username` | Who closed it (username only, consistent with author pattern) |
| `imported` | bool | `imported` | 0/1 |
| `imported_from` | Option\<String\> | `imported_from` | Import source |
| `moved_to_id` | Option\<i64\> | `moved_to_id` | Target issue if moved |
| `references.short` | String | `references_short` | e.g. "#42" |
| `references.relative` | Option\<String\> | `references_relative` | e.g. "#42" or "group/proj#42" |
| `references.full` | String | `references_full` | e.g. "group/project#42" |
| `health_status` | Option\<String\> | `health_status` | Ultimate only |
| `type` | Option\<String\> | (transform-only) | Uppercase category (e.g. "ISSUE"); fallback for `issue_type` -- lowercased before storage. Not stored as separate column; raw JSON remains lossless. |
| `epic.id` | Option\<i64\> | `epic_id` | Premium/Ultimate, null on Free |
| `epic.iid` | Option\<i64\> | `epic_iid` | |
| `epic.title` | Option\<String\> | `epic_title` | |
| `epic.url` | Option\<String\> | `epic_url` | |
| `epic.group_id` | Option\<i64\> | `epic_group_id` | |
| `iteration.id` | Option\<i64\> | `iteration_id` | Premium/Ultimate, null on Free |
| `iteration.iid` | Option\<i64\> | `iteration_iid` | |
| `iteration.title` | Option\<String\> | `iteration_title` | |
| `iteration.state` | Option\<i64\> | `iteration_state` | Enum: 1=upcoming, 2=current, 3=closed |
| `iteration.start_date` | Option\<String\> | `iteration_start_date` | ISO date |
| `iteration.due_date` | Option\<String\> | `iteration_due_date` | ISO date |
---
## Merge Requests: Field Gap Inventory
### Currently stored
id, iid, project_id, title, description, state, draft, author_username, source_branch, target_branch, head_sha, references_short, references_full, detailed_merge_status, merge_user_username, created_at, updated_at, merged_at, closed_at, last_seen_at, web_url, raw_payload_id, discussions_synced_for_updated_at, discussions_sync_last_attempt_at, discussions_sync_attempts, discussions_sync_last_error, labels (junction), assignees (junction), reviewers (junction)
### Currently deserialized but dropped during transform
| API Field | Status | Action |
|-----------|--------|--------|
| `author.id` | Deserialized | Store as `author_id` column |
| `author.name` | Deserialized | Store as `author_name` column |
| `work_in_progress` | Used transiently for `draft` fallback | Already handled, no change needed |
| `merge_status` (legacy) | Used transiently for `detailed_merge_status` fallback | Already handled, no change needed |
| `merged_by` | Used transiently for `merge_user` fallback | Already handled, no change needed |
### Currently silently dropped by serde (not in GitLabMergeRequest struct)
| API Field | Type | DB Column | Notes |
|-----------|------|-----------|-------|
| `upvotes` | i64 | `upvotes` | |
| `downvotes` | i64 | `downvotes` | |
| `user_notes_count` | i64 | `user_notes_count` | |
| `source_project_id` | Option\<i64\> | `source_project_id` | Fork source |
| `target_project_id` | Option\<i64\> | `target_project_id` | Fork target |
| `milestone` | Option\<object\> | `milestone_id`, `milestone_title` | Reuse issue milestone pattern |
| `merge_when_pipeline_succeeds` | bool | `merge_when_pipeline_succeeds` | 0/1, auto-merge flag |
| `merge_commit_sha` | Option\<String\> | `merge_commit_sha` | Commit ref after merge |
| `squash_commit_sha` | Option\<String\> | `squash_commit_sha` | Commit ref after squash |
| `discussion_locked` | bool | `discussion_locked` | 0/1 |
| `should_remove_source_branch` | Option\<bool\> | `should_remove_source_branch` | 0/1 |
| `force_remove_source_branch` | Option\<bool\> | `force_remove_source_branch` | 0/1 |
| `squash` | bool | `squash` | 0/1 |
| `squash_on_merge` | bool | `squash_on_merge` | 0/1 |
| `has_conflicts` | bool | `has_conflicts` | 0/1 |
| `blocking_discussions_resolved` | bool | `blocking_discussions_resolved` | 0/1 |
| `time_stats.time_estimate` | i64 | `time_estimate` | Seconds |
| `time_stats.total_time_spent` | i64 | `time_spent` | Seconds |
| `time_stats.human_time_estimate` | Option\<String\> | `human_time_estimate` | |
| `time_stats.human_total_time_spent` | Option\<String\> | `human_time_spent` | |
| `task_completion_status.count` | i64 | `task_count` | |
| `task_completion_status.completed_count` | i64 | `task_completed_count` | |
| `closed_by` | Option\<object\> | `closed_by_username` | |
| `prepared_at` | Option\<String\> | `prepared_at` | ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable |
| `merge_after` | Option\<String\> | `merge_after` | ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable (scheduled merge) |
| `imported` | bool | `imported` | 0/1 |
| `imported_from` | Option\<String\> | `imported_from` | |
| `approvals_before_merge` | Option\<i64\> | `approvals_before_merge` | Deprecated, scheduled for removal in GitLab API v5; store best-effort, keep nullable |
| `references.relative` | Option\<String\> | `references_relative` | Currently only short + full stored |
| `confidential` | bool | `confidential` | 0/1 (MRs can be confidential too) |
| `iteration.id` | Option\<i64\> | `iteration_id` | Premium/Ultimate, null on Free |
| `iteration.iid` | Option\<i64\> | `iteration_iid` | |
| `iteration.title` | Option\<String\> | `iteration_title` | |
| `iteration.state` | Option\<i64\> | `iteration_state` | |
| `iteration.start_date` | Option\<String\> | `iteration_start_date` | ISO date |
| `iteration.due_date` | Option\<String\> | `iteration_due_date` | ISO date |
---
## Migration 007: complete_field_capture.sql
```sql
-- Migration 007: Capture all remaining GitLab API response fields.
-- Principle: mirror everything GitLab returns. No field left behind.
-- ============================================================
-- ISSUES: new columns
-- ============================================================
-- Fields currently deserialized but not stored
ALTER TABLE issues ADD COLUMN closed_at INTEGER; -- ms epoch, deserialized but never stored until now
ALTER TABLE issues ADD COLUMN author_id INTEGER; -- GitLab user ID
ALTER TABLE issues ADD COLUMN author_name TEXT; -- Display name
-- Issue metadata
ALTER TABLE issues ADD COLUMN issue_type TEXT; -- 'issue' | 'incident' | 'test_case'
ALTER TABLE issues ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;
-- Engagement
ALTER TABLE issues ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN merge_requests_count INTEGER NOT NULL DEFAULT 0;
-- Time tracking
ALTER TABLE issues ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0; -- seconds
ALTER TABLE issues ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0; -- seconds
ALTER TABLE issues ADD COLUMN human_time_estimate TEXT;
ALTER TABLE issues ADD COLUMN human_time_spent TEXT;
-- Task lists
ALTER TABLE issues ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN has_tasks INTEGER NOT NULL DEFAULT 0;
-- References (MRs already have short + full)
ALTER TABLE issues ADD COLUMN references_short TEXT; -- e.g. "#42"
ALTER TABLE issues ADD COLUMN references_relative TEXT; -- context-dependent
ALTER TABLE issues ADD COLUMN references_full TEXT; -- e.g. "group/project#42"
-- Close tracking
ALTER TABLE issues ADD COLUMN closed_by_username TEXT;
-- Premium/Ultimate fields (nullable, null on Free tier)
ALTER TABLE issues ADD COLUMN weight INTEGER;
ALTER TABLE issues ADD COLUMN severity TEXT;
ALTER TABLE issues ADD COLUMN health_status TEXT;
-- Import and move tracking
ALTER TABLE issues ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN imported_from TEXT;
ALTER TABLE issues ADD COLUMN moved_to_id INTEGER;
-- Epic (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN epic_id INTEGER;
ALTER TABLE issues ADD COLUMN epic_iid INTEGER;
ALTER TABLE issues ADD COLUMN epic_title TEXT;
ALTER TABLE issues ADD COLUMN epic_url TEXT;
ALTER TABLE issues ADD COLUMN epic_group_id INTEGER;
-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN iteration_id INTEGER;
ALTER TABLE issues ADD COLUMN iteration_iid INTEGER;
ALTER TABLE issues ADD COLUMN iteration_title TEXT;
ALTER TABLE issues ADD COLUMN iteration_state INTEGER;
ALTER TABLE issues ADD COLUMN iteration_start_date TEXT;
ALTER TABLE issues ADD COLUMN iteration_due_date TEXT;
-- ============================================================
-- MERGE REQUESTS: new columns
-- ============================================================
-- Author enrichment
ALTER TABLE merge_requests ADD COLUMN author_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN author_name TEXT;
-- Engagement
ALTER TABLE merge_requests ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;
-- Fork tracking
ALTER TABLE merge_requests ADD COLUMN source_project_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN target_project_id INTEGER;
-- Milestone (parity with issues)
ALTER TABLE merge_requests ADD COLUMN milestone_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN milestone_title TEXT;
-- Merge behavior
ALTER TABLE merge_requests ADD COLUMN merge_when_pipeline_succeeds INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN merge_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN squash_on_merge INTEGER NOT NULL DEFAULT 0;
-- Merge readiness
ALTER TABLE merge_requests ADD COLUMN has_conflicts INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN blocking_discussions_resolved INTEGER NOT NULL DEFAULT 0;
-- Branch cleanup
ALTER TABLE merge_requests ADD COLUMN should_remove_source_branch INTEGER;
ALTER TABLE merge_requests ADD COLUMN force_remove_source_branch INTEGER;
-- Discussion lock
ALTER TABLE merge_requests ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;
-- Time tracking
ALTER TABLE merge_requests ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN human_time_estimate TEXT;
ALTER TABLE merge_requests ADD COLUMN human_time_spent TEXT;
-- Task lists
ALTER TABLE merge_requests ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;
-- Close tracking
ALTER TABLE merge_requests ADD COLUMN closed_by_username TEXT;
-- Scheduling (API returns ISO datetimes; we store ms epoch for consistency)
ALTER TABLE merge_requests ADD COLUMN prepared_at INTEGER; -- ms epoch after iso_to_ms()
ALTER TABLE merge_requests ADD COLUMN merge_after INTEGER; -- ms epoch after iso_to_ms()
-- References (add relative, short + full already exist)
ALTER TABLE merge_requests ADD COLUMN references_relative TEXT;
-- Import tracking
ALTER TABLE merge_requests ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN imported_from TEXT;
-- Approvals (Premium/Ultimate, deprecated) and confidentiality
ALTER TABLE merge_requests ADD COLUMN approvals_before_merge INTEGER;
ALTER TABLE merge_requests ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;
-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE merge_requests ADD COLUMN iteration_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_iid INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_title TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_state INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_start_date TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_due_date TEXT;
-- Record migration version
INSERT INTO schema_version (version, applied_at, description)
VALUES (7, strftime('%s', 'now') * 1000, 'Complete API field capture for issues and merge requests');
```
---
## Serde Struct Changes
### Existing type changes
```
GitLabReferences // Add: relative: Option<String> (with #[serde(default)])
// Existing fields short + full remain unchanged
GitLabIssue // Add #[derive(Default)] for test ergonomics
GitLabMergeRequest // Add #[derive(Default)] for test ergonomics
```
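To illustrate why `#[derive(Default)]` helps test ergonomics, here is a minimal sketch. `IssueFixture` and `confidential_issue` are hypothetical names: the struct is a heavily trimmed stand-in for `GitLabIssue`, which carries serde attributes and many more fields. With `Default` derived, a fixture only spells out the fields the test cares about:

```rust
// Hypothetical, trimmed stand-in for GitLabIssue (illustration only).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct IssueFixture {
    pub iid: i64,
    pub title: String,
    pub upvotes: i64,
    pub confidential: bool,
    pub weight: Option<i64>,
}

/// Builds a fixture that sets only the fields under test; every other field
/// falls back to its Default value (0, "", false, None).
pub fn confidential_issue(iid: i64) -> IssueFixture {
    IssueFixture {
        iid,
        confidential: true,
        ..Default::default()
    }
}
```

Deriving `Default` only compiles when every field type implements `Default`, so any nested struct held as a non-`Option` field needs the derive as well.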
### New helper types needed
```
GitLabTimeStats { time_estimate, total_time_spent, human_time_estimate, human_total_time_spent }
GitLabTaskCompletionStatus { count, completed_count }
// closed_by: no new type needed; reuse GitLabAuthor (id, username, name)
GitLabEpic { id, iid, title, url, group_id }
GitLabIteration { id, iid, title, state, start_date, due_date }
```
### GitLabIssue: add fields
```
type_: Option<String> // #[serde(rename = "type")] -- fallback-only (uppercase category); "type" is a Rust keyword, so the field needs another name (`type_` is illustrative)
upvotes: i64 // #[serde(default)]
downvotes: i64 // #[serde(default)]
user_notes_count: i64 // #[serde(default)]
merge_requests_count: i64 // #[serde(default)]
confidential: bool // #[serde(default)]
discussion_locked: bool // #[serde(default)]
weight: Option<i64>
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
has_tasks: bool // #[serde(default)]
references: Option<GitLabReferences>
closed_by: Option<GitLabAuthor>
severity: Option<String>
health_status: Option<String>
imported: bool // #[serde(default)]
imported_from: Option<String>
moved_to_id: Option<i64>
issue_type: Option<String> // canonical field (lowercase); preferred for DB storage over `type`
epic: Option<GitLabEpic>
iteration: Option<GitLabIteration>
```
### GitLabMergeRequest: add fields
```
upvotes: i64 // #[serde(default)]
downvotes: i64 // #[serde(default)]
user_notes_count: i64 // #[serde(default)]
source_project_id: Option<i64>
target_project_id: Option<i64>
milestone: Option<GitLabMilestone> // reuse existing type
merge_when_pipeline_succeeds: bool // #[serde(default)]
merge_commit_sha: Option<String>
squash_commit_sha: Option<String>
squash: bool // #[serde(default)]
squash_on_merge: bool // #[serde(default)]
has_conflicts: bool // #[serde(default)]
blocking_discussions_resolved: bool // #[serde(default)]
should_remove_source_branch: Option<bool>
force_remove_source_branch: Option<bool>
discussion_locked: bool // #[serde(default)]
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
closed_by: Option<GitLabAuthor>
prepared_at: Option<String>
merge_after: Option<String>
imported: bool // #[serde(default)]
imported_from: Option<String>
approvals_before_merge: Option<i64>
confidential: bool // #[serde(default)]
iteration: Option<GitLabIteration>
```
---
## Transformer Changes
### IssueRow: add fields
All new fields map 1:1 from the serde struct except:
- `closed_at` -> `iso_to_ms()` conversion (already in serde struct, just not passed through)
- `time_stats` -> flatten to 4 individual fields
- `task_completion_status` -> flatten to 2 individual fields
- `references` -> flatten to 3 individual fields
- `closed_by` -> extract `username` only (consistent with author pattern)
- `author` -> additionally extract `id` and `name` (currently only `username`)
- `issue_type` -> store as-is (canonical, lowercase); fallback to lowercased `type` field if `issue_type` absent
- `epic` -> flatten to 5 individual fields (id, iid, title, url, group_id)
- `iteration` -> flatten to 6 individual fields (id, iid, title, state, start_date, due_date)
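The non-1:1 mappings above could be sketched roughly as follows. This is an illustration under assumptions, not the project's transformer: `TimeStats` and `Author` are hypothetical trimmed stand-ins for `GitLabTimeStats` and `GitLabAuthor`, and the three function names are invented for this example.

```rust
// Hypothetical, trimmed stand-ins for the serde types named in this spec.
#[derive(Debug, Default, Clone)]
pub struct TimeStats {
    pub time_estimate: i64,
    pub total_time_spent: i64,
    pub human_time_estimate: Option<String>,
    pub human_total_time_spent: Option<String>,
}

#[derive(Debug, Default, Clone)]
pub struct Author {
    pub id: i64,
    pub username: String,
    pub name: String,
}

/// Prefer the canonical lowercase `issue_type`; otherwise fall back to the
/// uppercase `type` field, lowercased before storage.
pub fn resolve_issue_type(issue_type: Option<&str>, type_field: Option<&str>) -> Option<String> {
    match issue_type {
        Some(t) => Some(t.to_string()),
        None => type_field.map(|t| t.to_lowercase()),
    }
}

/// Flatten the optional `time_stats` object into four column values. A
/// missing object yields the column defaults from migration 007 (0 and NULL).
pub fn flatten_time_stats(stats: Option<&TimeStats>) -> (i64, i64, Option<String>, Option<String>) {
    match stats {
        Some(s) => (
            s.time_estimate,
            s.total_time_spent,
            s.human_time_estimate.clone(),
            s.human_total_time_spent.clone(),
        ),
        None => (0, 0, None, None),
    }
}

/// Keep only the username from `closed_by`, mirroring the author pattern.
pub fn closed_by_username(closed_by: Option<&Author>) -> Option<String> {
    closed_by.map(|u| u.username.clone())
}
```

Returning the column defaults for a missing `time_stats` keeps the transformer output aligned with the `NOT NULL DEFAULT 0` columns, so a row never needs a special case for an absent object.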
### NormalizedMergeRequest: add fields
Same patterns as issues, plus:
- `milestone` -> reuse `upsert_milestone_tx` from issue pipeline, add `milestone_id` + `milestone_title`
- `prepared_at`, `merge_after` -> `iso_to_ms()` conversion (API provides ISO datetimes)
- `source_project_id`, `target_project_id` -> direct pass-through
- `iteration` -> flatten to 6 individual fields (same as issues)
### Insert statement changes
Both `process_issue_in_transaction` and `process_mr_in_transaction` need their INSERT and ON CONFLICT DO UPDATE statements extended with all new columns. The ON CONFLICT clause should update all new fields on re-sync.
**Implementation note (reliability):** Define a single authoritative list of persisted columns per entity and generate/compose both SQL fragments from it:
- INSERT column list + VALUES placeholders
- ON CONFLICT DO UPDATE assignments
This prevents drift where a new field is added to one clause but not the other -- the most likely bug class when each entity gains roughly 40 columns (38 on `issues`, 39 on `merge_requests` in migration 007).
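One way to realize the single-list idea is a small builder that derives all three fragments from the same slice. The sketch below is an assumption about shape only: `build_upsert_sql` is a hypothetical helper, and the conflict key is passed in because this spec does not state which columns form the unique key.

```rust
/// Builds an SQLite upsert from one authoritative column list, so the INSERT
/// column list, the placeholders, and the ON CONFLICT assignments cannot
/// drift apart. `conflict_cols` are the unique-key columns; they are inserted
/// but never reassigned in the DO UPDATE clause.
pub fn build_upsert_sql(table: &str, columns: &[&str], conflict_cols: &[&str]) -> String {
    let column_list = columns.join(", ");

    // Numbered placeholders (?1, ?2, ...) in column order.
    let placeholders = (1..=columns.len())
        .map(|i| format!("?{}", i))
        .collect::<Vec<_>>()
        .join(", ");

    // Every non-key column is refreshed from the incoming row on re-sync.
    let assignments = columns
        .iter()
        .filter(|c| !conflict_cols.contains(*c))
        .map(|c| format!("{} = excluded.{}", c, c))
        .collect::<Vec<_>>()
        .join(", ");

    format!(
        "INSERT INTO {} ({}) VALUES ({}) ON CONFLICT({}) DO UPDATE SET {}",
        table,
        column_list,
        placeholders,
        conflict_cols.join(", "),
        assignments
    )
}
```

With this shape, adding a column means appending one entry to the list; the values bound at execution time must follow the same order as the list.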
---
## Prerequisite refactors (prep commits before main Phase A work)
### 1. Align issue transformer on `core::time`
The issue transformer (`transformers/issue.rs`) has a local `parse_timestamp()` that duplicates `iso_to_ms_strict()` from `core::time`. The MR transformer already uses the shared module. Before adding Phase A's optional timestamp fields (especially `closed_at` as `Option<String>`), migrate the issue transformer to use `iso_to_ms_strict()` and `iso_to_ms_opt_strict()` from `core::time`. This avoids duplicating the `opt` variant locally and establishes one timestamp parsing path across the codebase.
**Changes:** Replace `parse_timestamp()` calls with `iso_to_ms_strict()`, adapt or remove `TransformError::TimestampParse` (MR transformer uses `String` errors; align on that or on a shared error type).
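As a sketch of how an `opt` variant can sit on top of a strict parser without duplicating its logic: the helper below is hypothetical (`lift_opt_strict` is not a name from `core::time`), and it takes the strict parser as a parameter because the real signature of `iso_to_ms_strict()` is not shown in this spec. The `String` error type follows the MR transformer convention mentioned above.

```rust
/// Lifts a strict timestamp parser over an optional field: an absent value
/// stays `None`, while a present value must parse or the whole call fails.
pub fn lift_opt_strict<F>(value: Option<&str>, parse: F) -> Result<Option<i64>, String>
where
    F: Fn(&str) -> Result<i64, String>,
{
    // Option<Result<i64, String>> -> Result<Option<i64>, String>
    value.map(parse).transpose()
}
```

The key property is that a malformed `closed_at` surfaces as an error instead of being stored silently as NULL, while a genuinely absent value remains NULL.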
### 2. Extract shared ingestion helpers
`upsert_milestone_tx` (in `ingestion/issues.rs`) and `upsert_label_tx` (duplicated in both `ingestion/issues.rs` and `ingestion/merge_requests.rs`) should be moved to a shared module (e.g., `src/ingestion/shared.rs`). MR ingestion needs `upsert_milestone_tx` for Phase A milestone support, and the label helper is already copy-pasted between files.
**Changes:** Create `src/ingestion/shared.rs`, move `upsert_milestone_tx`, `upsert_label_tx`, and `MilestoneRow` there. Update imports in both issue and MR ingestion modules.
---
## Files touched
| File | Change |
|------|--------|
| `migrations/007_complete_field_capture.sql` | New file |
| `src/gitlab/types.rs` | Add `#[derive(Default)]` to `GitLabIssue` and `GitLabMergeRequest`; add `relative: Option<String>` to `GitLabReferences`; add fields to both structs; add `GitLabTimeStats`, `GitLabTaskCompletionStatus`, `GitLabEpic`, `GitLabIteration` |
| `src/gitlab/transformers/issue.rs` | Remove local `parse_timestamp()`, switch to `core::time`; extend IssueRow, IssueWithMetadata, transform_issue() |
| `src/gitlab/transformers/merge_request.rs` | Extend NormalizedMergeRequest, MergeRequestWithMetadata, transform_merge_request(); extract `references_relative` |
| `src/ingestion/shared.rs` | New file: shared `upsert_milestone_tx`, `upsert_label_tx`, `MilestoneRow` |
| `src/ingestion/issues.rs` | Extend INSERT/UPSERT SQL; import from shared module |
| `src/ingestion/merge_requests.rs` | Extend INSERT/UPSERT SQL; import from shared module; add milestone upsert |
| `src/core/db.rs` | Register migration 007 in `MIGRATIONS` array |
---
## What this does NOT include
- No new API endpoints called
- No new tables (MRs reuse the existing `milestones` table)
- No CLI changes (new fields are stored but not yet surfaced in `lore issues` / `lore mrs` output)
- No changes to discussion/note ingestion (Phase A is issues + MRs only)
- No observability instrumentation (that's Phase B)
---
## Rollout / Backfill Note
After applying Migration 007 and shipping transformer + UPSERT updates, **existing rows will not have the new columns populated** until issues/MRs are reprocessed. Plan on a **one-time full re-sync** (`lore ingest --type issues --full` and `lore ingest --type mrs --full`) to backfill the new fields. Until then, queries on new columns will return NULL/default values for previously-synced entities.
---
## Resolved decisions
| Field | Decision | Rationale |
|-------|----------|-----------|
| `subscribed` | **Excluded** | User-relative field (reflects token holder's subscription state, not an entity property). Changes meaning if the token is rotated to a different user. Not entity data. |
| `_links` | **Excluded** | HATEOAS API navigation metadata, not entity data. Every URL is deterministically constructable from `project_id` + `iid` + GitLab base URL. Note: `closed_as_duplicate_of` inside `_links` contains a real entity reference -- extracting that is deferred to a future phase. |
| `epic` / `iteration` | **Flatten to columns** | Same denormalization pattern as milestones. Epic gets 5 columns (`epic_id`, `epic_iid`, `epic_title`, `epic_url`, `epic_group_id`). Iteration gets 6 columns (`iteration_id`, `iteration_iid`, `iteration_title`, `iteration_state`, `iteration_start_date`, `iteration_due_date`). Both nullable (null on Free tier). |
| `approvals_before_merge` | **Store best-effort** | Deprecated and scheduled for removal in GitLab API v5. Keep as `Option<i64>` / nullable column. Never depend on it for correctness -- it may disappear in a future GitLab release. |