Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests
(Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark
for incremental sync, and the missing idx_label_events_label index.
The MR transformer and ingestion pipeline now populate commit SHAs during
sync. The orchestrator uses watermark-based filtering for closes_issues
jobs instead of re-enqueuing all MRs every sync.
The Phase B PRD is updated to match the actual codebase: corrected
migration numbering (011-015), documented nullable label/milestone
fields (migration 012), watermark patterns (013), observability
infrastructure (014), simplified source_method values, and updated
entity_references schema to match implementation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -39,7 +39,7 @@ Five gates, each independently verifiable and shippable:
- **Opt-in event ingestion.** New config flag `sync.fetchResourceEvents` (default `true`) controls whether the sync pipeline fetches event data. Users who don't need temporal features skip the additional API calls.
- **Application-level graph traversal.** Cross-reference expansion uses BFS in Rust, not recursive SQL CTEs. Capped at configurable depth (default 1) for predictable performance.
- **Evolutionary library extraction.** New commands are built with typed return structs from day one. Old commands are not retrofitted until a concrete consumer (MCP server, web UI) requires it.
-- **Phase A fields cherry-picked as needed.** `merge_commit_sha` and `squash_commit_sha` are added in this phase's migration. Remaining Phase A fields are handled in their own migration later.
+- **Phase A fields cherry-picked as needed.** `merge_commit_sha` and `squash_commit_sha` are added in migration 015 and populated during MR ingestion. Remaining Phase A fields are handled in their own migration later.
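The depth-capped traversal named in the principles above can be sketched as a plain breadth-first search. This is a minimal illustration under stated assumptions, not the project's code: the `EntityId` alias, the in-memory adjacency map, and the `expand_references` function are hypothetical, and real edges would be read from `entity_references`.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical entity key; the real schema may identify entities differently.
type EntityId = u64;

/// Breadth-first expansion from `seed`, visiting entities at most `max_depth`
/// hops away. Returns each reached entity with its hop distance, seed first.
fn expand_references(
    edges: &HashMap<EntityId, Vec<EntityId>>,
    seed: EntityId,
    max_depth: usize,
) -> Vec<(EntityId, usize)> {
    let mut visited: HashSet<EntityId> = HashSet::from([seed]);
    let mut queue: VecDeque<(EntityId, usize)> = VecDeque::from([(seed, 0)]);
    let mut reached = Vec::new();

    while let Some((entity, depth)) = queue.pop_front() {
        reached.push((entity, depth));
        if depth == max_depth {
            continue; // depth cap: do not expand this entity's neighbours
        }
        for &next in edges.get(&entity).map(Vec::as_slice).unwrap_or(&[]) {
            // `insert` returns false for already-seen entities, so cycles terminate.
            if visited.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    reached
}
```

With the default depth of 1, only the seed and its direct neighbours are returned, so the work per query is bounded by the seed's own edge count.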
### Scope Boundaries
@@ -71,9 +71,9 @@ The original approach was to parse system note body text with regex to extract s
System note parsing is still used for events without structured APIs (see Gate 2), but with the explicit understanding that it's best-effort and fragile for non-English instances.
-### 1.2 Schema (Migration 010)
+### 1.2 Schema (Migration 011)
-**File:** `migrations/010_resource_events.sql`
+**File:** `migrations/011_resource_events.sql`
```sql
-- State change events (opened, closed, reopened, merged, locked)
```
#### 1.2.1 Nullable Label and Milestone Fields (Migration 012)
GitLab returns `null` for `label` and `milestone` in Resource Events when the referenced label or milestone has been deleted from the project. This was discovered in production after the initial schema deployed with `NOT NULL` constraints.
**Migration 012** recreates `resource_label_events` and `resource_milestone_events` with nullable `label_name` and `milestone_title` columns. The table-swap approach (create new → copy → drop old → rename) is required because SQLite doesn't support `ALTER COLUMN`.
Timeline queries that encounter null labels/milestones display `"[deleted label]"` or `"[deleted milestone]"` in human output and omit the name field in robot JSON.
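The two output behaviours for deleted labels can be illustrated with a small sketch. The function names and the `action`/`label_name` field keys are assumptions made for this example; only the `"[deleted label]"` placeholder and the omit-when-null rule come from the text above.

```rust
/// Human output: a `None` mirrors a NULL `label_name` column.
fn human_label(label_name: Option<&str>) -> &str {
    label_name.unwrap_or("[deleted label]")
}

/// Robot output, modelled as key/value pairs: the name field is left out
/// entirely when the label was deleted, rather than emitted as null.
fn robot_fields(action: &str, label_name: Option<&str>) -> Vec<(&'static str, String)> {
    let mut fields = vec![("action", action.to_string())];
    if let Some(name) = label_name {
        fields.push(("label_name", name.to_string()));
    }
    fields
}
```

The same shape would apply to `milestone_title` with the `"[deleted milestone]"` placeholder.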
To avoid re-fetching resource events for every entity on every sync, a watermark column tracks the `updated_at` value at the time of last successful event fetch:
**Incremental behavior:** During sync, only entities where `updated_at > COALESCE(resource_events_synced_for_updated_at, 0)` are enqueued for resource event fetching. On `--full` sync, these watermarks are reset to `NULL`, causing all entities to be re-enqueued.
This mirrors the existing `discussions_synced_for_updated_at` pattern and works in conjunction with the dependent fetch queue.
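The incremental filter and the `--full` reset can be sketched in memory as follows. This assumes `updated_at` is stored as an integer timestamp; the `Entity` struct and function names are hypothetical, and the actual pipeline presumably applies the comparison in SQL.

```rust
/// Minimal stand-in for an issue or MR row; field names follow the columns in the text.
struct Entity {
    iid: u64,
    updated_at: i64,                                    // assumed integer timestamp
    resource_events_synced_for_updated_at: Option<i64>, // None = NULL = never synced
}

/// Mirrors `updated_at > COALESCE(resource_events_synced_for_updated_at, 0)`.
fn needs_event_fetch(entity: &Entity) -> bool {
    entity.updated_at > entity.resource_events_synced_for_updated_at.unwrap_or(0)
}

/// A `--full` sync resets every watermark to NULL.
fn reset_watermarks(entities: &mut [Entity]) {
    for entity in entities {
        entity.resource_events_synced_for_updated_at = None;
    }
}

/// IIDs of the entities that would be enqueued for resource event fetching.
fn entities_to_enqueue(entities: &[Entity]) -> Vec<u64> {
    entities
        .iter()
        .filter(|e| needs_event_fetch(e))
        .map(|e| e.iid)
        .collect()
}
```

An entity whose watermark equals its `updated_at` is skipped; after a reset, every entity with a positive `updated_at` qualifies again.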
**Architecture:** Generic dependent-fetch queue, generalizing the `pending_discussion_fetches` pattern. A single queue table serves all dependent resource types across Gates 1, 2, and 4, avoiding schema churn as new fetch types are added.
-**New queue table (in migration 010):**
+**New queue table (in migration 011):**
```sql
-- Generic queue for all dependent resource fetches (events, closes_issues, diffs)
```
**Purpose:** The `run_id` column correlates log entries (via `tracing`) with sync run records. `total_items_processed` and `total_errors` provide aggregate counts for `lore sync-status` and robot mode health checks without requiring log parsing.
This is separate from the event tables but supports the same operational workflow — answering "did the last sync succeed?" and "how many entities were processed?" programmatically.
Temporal queries need to follow links between entities: "MR !567 closed issue #234", "issue #234 mentioned in MR !567", "#299 was opened as a follow-up to !567". These relationships are captured in two places:
-1. **Structured API:** `GET /projects/:id/merge_requests/:iid/closes_issues` returns issues that close when the MR merges. Also, `resource_state_events` includes `source_merge_request_id` for "closed by MR" events.
+1. **Structured API:** `GET /projects/:id/merge_requests/:iid/closes_issues` returns issues that close when the MR merges. Also, `resource_state_events` includes `source_merge_request_iid` for "closed by MR" events.
2. **System notes:** Cross-references like "mentioned in !456" and "closed by !789" appear in system note body text.
-### 2.2 Schema (in Migration 010)
+### 2.2 Schema (in Migration 011)
```sql
-- Cross-references between entities
```
@@ -340,33 +379,49 @@ Temporal queries need to follow links between entities: "MR !567 closed issue #2
-- silently dropping them. Timeline output marks these as "[external]".
| `'note_parse'` | Extracted from system note body text (best-effort, English only) |
| `'description_parse'` | Extracted from issue/MR description body text (future) |
The original design used more granular values (`'api_closes_issues'`, `'api_state_event'`, `'system_note_parse'`). In practice, the API-sourced references don't need sub-method distinction — the `reference_type` already captures the semantic relationship — so the implementation simplified to three values.
-2. **State events:** When `resource_state_events` contains `source_merge_request_id`, insert `reference_type = 'closes'`, `source_method = 'api_state_event'`. Source = MR (referenced by iid), target = issue (that received the state change).
+2. **State events:** When `resource_state_events` contains `source_merge_request_iid`, insert `reference_type = 'closes'`, `source_method = 'api'`. Source = MR (referenced by iid), target = issue (that received the state change).
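As a rough illustration of the state-event rule in its updated form, the mapping from an event to a reference row might look like the sketch below. The `StateEvent` and `EntityReference` structs are simplified, hypothetical stand-ins for the real tables; only the `'closes'` and `'api'` values and the source/target direction come from the text.

```rust
/// Subset of a resource state event on an issue; field names are illustrative.
struct StateEvent {
    issue_iid: u64,
    source_merge_request_iid: Option<u64>,
}

/// Simplified reference row: source = MR, target = issue.
#[derive(Debug, PartialEq)]
struct EntityReference {
    source_mr_iid: u64,
    target_issue_iid: u64,
    reference_type: &'static str,
    source_method: &'static str,
}

/// Produces a `closes` reference only when the event names a source MR.
fn reference_from_state_event(event: &StateEvent) -> Option<EntityReference> {
    let source_mr_iid = event.source_merge_request_iid?;
    Some(EntityReference {
        source_mr_iid,
        target_issue_iid: event.issue_iid,
        reference_type: "closes",
        source_method: "api",
    })
}
```

Events without a source MR yield `None`, so they contribute no row to `entity_references`.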
**Tier 2 — System note parsing (best-effort):**
@@ -385,14 +440,14 @@ closed by #{iid}
**Cross-project references:** When a system note references `{group}/{project}#{iid}` and the target project is not synced locally, store with `target_entity_id = NULL`, `target_project_path = '{group}/{project}'`, `target_entity_iid = {iid}`. These unresolved references are still valuable for timeline narratives — they indicate external dependencies and decision context even when we can't traverse further.
-Insert with `source_method = 'system_note_parse'`. Accept that:
+Insert with `source_method = 'note_parse'`. Accept that:
- This breaks on non-English GitLab instances
- Format may vary across GitLab versions
- Log parse failures at `debug` level for monitoring
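A best-effort parser for the English note formats quoted above could look roughly like this. It is a standard-library-only sketch: the `ParsedRef` struct and `parse_system_note` function are hypothetical, only two note prefixes are handled, and the real implementation may well use regular expressions instead.

```rust
/// One cross-reference recovered from a system note body.
#[derive(Debug, PartialEq)]
struct ParsedRef {
    reference_type: &'static str,        // "mentioned" or "closes"
    target_project_path: Option<String>, // Some(..) only for cross-project references
    target_is_mr: bool,                  // `!` sigil = MR, `#` sigil = issue
    target_entity_iid: u64,
}

/// Best-effort parse of notes such as "mentioned in !456".
/// Returns None for anything unrecognised; a caller would log that at debug level.
fn parse_system_note(body: &str) -> Option<ParsedRef> {
    let (reference_type, rest) = if let Some(r) = body.strip_prefix("mentioned in ") {
        ("mentioned", r)
    } else if let Some(r) = body.strip_prefix("closed by ") {
        ("closes", r)
    } else {
        return None;
    };
    // Everything before the sigil is an optional `{group}/{project}` path.
    let sigil_pos = rest.find(|c: char| c == '!' || c == '#')?;
    let (path, tail) = rest.split_at(sigil_pos);
    if path.contains(' ') {
        return None; // e.g. "mentioned in commit ...", not an entity reference
    }
    let target_entity_iid = tail[1..].trim().parse::<u64>().ok()?;
    Some(ParsedRef {
        reference_type,
        target_project_path: (!path.is_empty()).then(|| path.to_string()),
        target_is_mr: tail.starts_with('!'),
        target_entity_iid,
    })
}
```

A `Some` project path corresponds to the unresolved case described above, where the row is stored with `target_entity_id = NULL` plus `target_project_path` and `target_entity_iid`.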
-Issue and MR descriptions often contain `#123` or `!456` references. Parsing these is lower confidence (mentions != relationships) and is deferred to a future iteration.
+Issue and MR descriptions often contain `#123` or `!456` references. Parsing these is lower confidence (mentions != relationships) and is deferred to a future iteration. The `source_method` value `'description_parse'` is reserved in the CHECK constraint for this future work.
### 2.4 Ingestion Flow
@@ -401,6 +456,8 @@ The `closes_issues` fetch uses the generic dependent fetch queue (`job_type = 'm
- One additional API call per MR: `GET /projects/:id/merge_requests/:iid/closes_issues`
- Cross-reference parsing from system notes runs as a local post-processing step (no API calls) after all dependent fetches complete
**Watermark pattern (migration 015):** A `closes_issues_synced_for_updated_at` column on `merge_requests` tracks the last `updated_at` value at which closes_issues data was fetched. Only MRs where `updated_at > COALESCE(closes_issues_synced_for_updated_at, 0)` are enqueued for re-fetching. The watermark is updated after successful fetch or after a permanent API error (e.g., 404 for external MRs). On `--full` sync, the watermark is reset to `NULL`.
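The rule for when the `closes_issues` watermark advances can be sketched as a small decision function. The `FetchOutcome` enum and `next_watermark` function are hypothetical, and leaving the watermark unchanged on transient errors is an assumption inferred from the text, which only states the success and permanent-error cases.

```rust
/// Possible results of one closes_issues fetch; variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FetchOutcome {
    Success,
    PermanentError, // e.g. a 404 for an external MR
    TransientError, // assumed: timeouts, rate limits, and similar retryable failures
}

/// Returns the value to store in `closes_issues_synced_for_updated_at`.
fn next_watermark(current: Option<i64>, updated_at: i64, outcome: FetchOutcome) -> Option<i64> {
    match outcome {
        // Both cases mark this `updated_at` as handled, so the MR is not re-enqueued.
        FetchOutcome::Success | FetchOutcome::PermanentError => Some(updated_at),
        // Assumption: keep the old watermark so the next sync retries the fetch.
        FetchOutcome::TransientError => current,
    }
}
```

Advancing on permanent errors keeps an MR that will never resolve from being re-enqueued on every sync.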
### 2.5 Acceptance Criteria
- [ ] `entity_references` table populated from `closes_issues` API for all synced MRs
@@ -562,7 +619,7 @@ Evidence notes (`NOTE` events) show the first ~200 characters of FTS5-matched no
`merge_commit_sha` and `squash_commit_sha` were added to `merge_requests` in migration 015. These are now populated during MR ingestion and available for Gate 4/5 queries.
**File changes table (future migration — not yet created):**
```sql
-- Files changed by each merge request
```
@@ -660,11 +721,6 @@ CREATE INDEX idx_mr_files_new_path ON mr_file_changes(new_path);
@@ -881,14 +937,16 @@ When git integration is added:
### Migration Numbering
-Phase B uses migration numbers starting at 010:
+Phase B uses migration numbers 011–015. The original plan assumed migration 010 was available, but chunk config (`010_chunk_config.sql`) was implemented first, shifting everything by +1.
| 012 | `012_nullable_label_milestone.sql` | Make `label_name` and `milestone_title` nullable for deleted labels/milestones | Gate 1 (fix) |
| 013 | `013_resource_event_watermarks.sql` | Add `resource_events_synced_for_updated_at` to issues and merge_requests | Gate 1 (optimization) |
-Phase A's complete field capture migration should use 012+ when implemented, skipping fields already added by 011 (`merge_commit_sha`, `squash_commit_sha`).
| 015 | `015_commit_shas_and_closes_watermark.sql` | `merge_commit_sha`, `squash_commit_sha`, `closes_issues_synced_for_updated_at` on merge_requests; `idx_label_events_label` index | Gates 2, 4 |
| TBD | — | `mr_file_changes` table for MR diff data | Gate 4 |
### Backward Compatibility
@@ -909,7 +967,7 @@ Phase A's complete field capture migration should use 012+ when implemented, ski
| GitLab diffs API returns large payloads | Low | Extract file metadata only, discard diff content |
| Cross-reference graph traversal unbounded | Medium | BFS depth capped at configurable limit (default 1); `mentioned` edges excluded by default |
| Cross-project references lost when target not synced | Medium | Unresolved references stored with `target_entity_id = NULL`; still appear in timeline output |
-| Phase A migration numbering conflict | Low | Phase B uses 010-011; Phase A uses 012+ |
+| Phase A migration numbering conflict | Low | Resolved: chunk config took 010; Phase B shifted to 011-015 |
| Timeline output lacks "why" evidence | Medium | Evidence-bearing notes from FTS5 included as first-class timeline events |