# CP1 ↔ CP2 Alignment Audit **Created:** 2026-01-26 **Purpose:** Document deviations between CP1 (Issue Ingestion) and CP2 (MR Ingestion) PRDs that could cause implementation drift. Use this checklist to verify alignment before CP2 implementation. --- ## Status Legend - ⬜ **Not Verified** - Needs investigation - ✅ **Aligned** - Implementations match or CP1 was updated - ⚠️ **Intentional Deviation** - Documented reason for difference - ❌ **Gap Found** - Needs remediation --- ## 1. Transformer Return Types **PRD Deviation:** - CP1: Infallible transformers returning structs directly - CP2: Fallible transformers returning `Result` **CP1 PRD Example:** ```rust pub fn transform_issue(gitlab_issue: &GitLabIssue, local_project_id: i64) -> NormalizedIssue pub fn transform_notes(...) -> Vec ``` **CP2 PRD Example:** ```rust pub fn transform_merge_request(...) -> Result pub fn transform_notes_with_diff_position(...) -> Result, String> ``` **Risk:** Inconsistent error handling; one silently produces bad data, other fails explicitly. **Verification:** - [ ] Check actual `transform_issue()` signature in `src/gitlab/transformers/issue.rs` - [ ] Check actual `transform_notes()` signature in `src/gitlab/transformers/discussion.rs` - [ ] Determine canonical pattern (Result vs infallible) **Status:** ⬜ Not Verified --- ## 2. Timestamp Parsing Contract (`iso_to_ms`) **PRD Deviation:** - CP1: Assumes `iso_to_ms()` returns `i64` directly - CP2: Assumes `iso_to_ms()` returns `Option` **CP1 PRD Usage (line 348):** ```rust created_at: iso_to_ms(&gitlab_issue.created_at), ``` **CP2 PRD Usage (line 377):** ```rust let created_at = iso_to_ms(&gitlab_mr.created_at) .ok_or_else(|| format!("Invalid created_at: {}", gitlab_mr.created_at))?; ``` **Risk:** Compile error or silent zero values on invalid timestamps. **Verification:** - [ ] Check actual `iso_to_ms()` signature in `src/core/time.rs` - [ ] Verify all call sites handle return type correctly **Status:** ⬜ Not Verified --- ## 3. NormalizedDiscussion Structure **PRD Deviation:** - CP1: `issue_id: i64` (non-optional, line 381) - CP2: `issue_id: Option`, `merge_request_id: Option` (lines 499-500) **Risk:** Type incompatibility; CP2 can't reuse CP1's struct without modification. **Verification:** - [ ] Check actual `NormalizedDiscussion` struct definition - [ ] Verify it supports both issue and MR contexts **Status:** ⬜ Not Verified --- ## 4. Cursor Update Strategy **PRD Deviation:** - CP1: Modulo-based flush every 100 items - CP2: Page-boundary flush **CP1 PRD (lines 805-808):** ```rust if result.fetched % 100 == 0 { update_cursor(...)?; } ``` **CP2 PRD (lines 994-998):** ```rust // Page-boundary cursor flush (not based on fetched % 100) if let (Some(updated_at), Some(gitlab_id)) = (last_updated_at, last_gitlab_id) { update_cursor(conn, project_id, "merge_requests", updated_at, gitlab_id)?; } ``` **Risk:** Inconsistent crash recovery behavior; page-boundary is more correct. **Verification:** - [ ] Check actual cursor update logic in `src/ingestion/issues.rs` - [ ] Determine if page-boundary pattern was adopted **Status:** ⬜ Not Verified --- ## 5. Pagination Fallback Strategy **PRD Deviation:** - CP1: Two-tier (x-next-page → empty-page detection) - CP2: Three-tier (Link header RFC 8288 → x-next-page → full-page heuristic) **Risk:** Issue sync may fail on proxies that strip headers while MR sync succeeds. **Verification:** - [ ] Check `paginate_issues()` in `src/gitlab/client.rs` - [ ] Verify Link header parsing is implemented - [ ] Verify fallback chain matches CP2's three-tier approach **Status:** ⬜ Not Verified --- ## 6. Stale Data Removal (Sweep Pattern) **PRD Deviation:** - CP1: No sweep pattern; stale discussions/notes accumulate forever - CP2: `last_seen_at` sweep removes stale data after successful pagination **CP2 PRD Pattern (lines 1291-1313):** ```rust // Remove stale discussions AND notes via last_seen_at sweep if result.pagination_succeeded { conn.execute( "DELETE FROM discussions WHERE ... AND last_seen_at < ?", ... )?; } ``` **Risk:** Deleted GitLab issue discussions remain in local DB indefinitely. **Verification:** - [ ] Check `ingest_issue_discussions()` in `src/ingestion/discussions.rs` - [ ] Verify sweep pattern is implemented for issues - [ ] Verify `last_seen_at` is updated during issue discussion upsert **Status:** ⬜ Not Verified --- ## 7. Discussion Sync Selection Pattern **PRD Deviation:** - CP1: In-memory collection during ingestion - CP2: DB-driven selection after ingestion completes **CP1 PRD (lines 818-827):** ```rust issues_in_page.push((local_issue_id, issue.iid, issue_updated_at)); // ... later filter for discussion sync eligibility ``` **CP2 PRD (lines 790-800):** ```sql SELECT id, iid, updated_at FROM merge_requests WHERE discussions_synced_for_updated_at IS NULL OR updated_at > discussions_synced_for_updated_at ``` **Risk:** Memory growth for projects with many issues; DB-driven is more scalable. **Verification:** - [ ] Check actual pattern in `src/ingestion/issues.rs` - [ ] Determine if DB-driven selection was adopted **Status:** ⬜ Not Verified --- ## 8. Sync Health Telemetry **PRD Deviation:** - CP1: No telemetry columns - CP2: `discussions_sync_last_attempt_at`, `discussions_sync_attempts`, `discussions_sync_last_error` **Risk:** Debugging discussion sync failures for issues is harder than for MRs. **Verification:** - [ ] Check `issues` table schema for telemetry columns - [ ] If missing, determine if backport is needed **Status:** ⬜ Not Verified --- ## 9. Watermark Advancement Logic **PRD Deviation:** - CP1: Advances watermark after any successful discussion ingestion - CP2: Only advances if pagination succeeds AND no parse failures **CP2 PRD (lines 1316-1335):** ```rust // CRITICAL: Only advance watermark if pagination succeeded completely if result.pagination_succeeded { mark_discussions_synced(conn, local_mr_id, mr_updated_at)?; } else { warn!("Discussion sync incomplete; watermark NOT advanced"); } ``` **Risk:** Issue discussion sync may lose data on partial failures. **Verification:** - [ ] Check `ingest_issue_discussions()` watermark logic - [ ] Verify partial failure handling matches CP2's stricter approach **Status:** ⬜ Not Verified --- ## 10. Raw Payload Storage Strategy **PRD Deviation:** - CP1: Stores payloads for all discussions and notes - CP2: Selective storage (skip system notes without DiffNote position) **CP2 PRD (lines 1251-1272):** ```rust let should_store_note_payload = !note.is_system() || note.position_new_path().is_some() || note.position_old_path().is_some(); ``` **Risk:** Database bloat for issues with many system notes. **Verification:** - [ ] Check actual payload storage logic in `src/ingestion/discussions.rs` - [ ] Determine if selective storage is implemented **Status:** ⬜ Not Verified --- ## 11. Full Sync (`--full`) Behavior **PRD Deviation:** - CP1: `--full` flag not explicitly documented - CP2: `--full` resets cursor AND discussion watermarks **CP2 PRD (lines 871-876):** ```rust if full_sync { reset_cursor(conn, project_id, "merge_requests")?; reset_discussion_watermarks(conn, project_id)?; } ``` **Risk:** `--full` on issues may not reset discussion watermarks. **Verification:** - [ ] Check `gi ingest --type=issues --full` behavior - [ ] Verify discussion watermarks are reset **Status:** ⬜ Not Verified --- ## 12. NormalizedNote Extended Fields **PRD Deviation:** - CP1: Basic fields only - CP2: Extended with `position_type`, `position_line_range_start/end`, SHA triplet **Expected:** This is intentional for MR DiffNotes. However, the shared `NormalizedNote` struct must support both. **Verification:** - [ ] Check `NormalizedNote` struct definition - [ ] Verify extended fields are present (even if NULL for issue notes) **Status:** ⬜ Not Verified --- ## Summary Action Items After verification, prioritize fixes: | Priority | Item | Reason | |----------|------|--------| | P0 | Timestamp parsing contract | Compile errors or silent corruption | | P0 | NormalizedDiscussion structure | Type incompatibility blocks CP2 | | P1 | Watermark advancement logic | Data loss on partial failures | | P1 | Stale data sweep pattern | Data accumulation over time | | P2 | Pagination fallback chain | Robustness for edge cases | | P2 | Cursor update strategy | Crash recovery consistency | | P3 | Sync health telemetry | Debugging convenience | | P3 | Raw payload storage | Database size optimization | --- ## Verification Commands ```bash # Check transformer signatures grep -n "fn transform_issue" src/gitlab/transformers/issue.rs grep -n "fn transform_notes" src/gitlab/transformers/discussion.rs # Check iso_to_ms signature grep -n "fn iso_to_ms" src/core/time.rs # Check NormalizedDiscussion struct grep -A 20 "struct NormalizedDiscussion" src/gitlab/transformers/discussion.rs # Check cursor update pattern grep -n "update_cursor" src/ingestion/issues.rs # Check sweep pattern grep -n "last_seen_at" src/ingestion/discussions.rs # Check telemetry columns grep -n "sync_attempts\|sync_last_error" migrations/*.sql ``` --- ## Notes _Use this section to record findings during verification._