Add comprehensive planning documentation for Checkpoint 2 (Merge Request support) and document the results of the CP1 implementation audit. checkpoint-2.md (2093 lines): - Complete PRD for adding merge request ingestion, querying, and display - Detailed user stories with acceptance criteria - ASCII wireframes for CLI output formats - Database schema extensions (migrations 006-007) - API integration specifications for MR endpoints - State transition diagrams for MR lifecycle - Performance requirements and test specifications - Risk assessment and mitigation strategies cp1-cp2-alignment-audit.md (344 lines): - Gap analysis between CP1 PRD and actual implementation - Identified issues prioritized by severity (P0/P1/P2/P3) - P0: NormalizedDiscussion struct incompatible with MR discussions - P1: --full flag not resetting discussion watermarks - P2: Missing Link header pagination fallback - P3: Missing sync health telemetry and selective payload storage - Each issue includes root cause, recommended fix, and affected files The P0 and P1 issues have been fixed in accompanying commits. P2 and P3 items are deferred to CP2 implementation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
9.2 KiB
CP1 ↔ CP2 Alignment Audit
Created: 2026-01-26 Purpose: Document deviations between CP1 (Issue Ingestion) and CP2 (MR Ingestion) PRDs that could cause implementation drift. Use this checklist to verify alignment before CP2 implementation.
Status Legend
- ⬜ Not Verified - Needs investigation
- ✅ Aligned - Implementations match or CP1 was updated
- ⚠️ Intentional Deviation - Documented reason for difference
- ❌ Gap Found - Needs remediation
1. Transformer Return Types
PRD Deviation:
- CP1: Infallible transformers returning structs directly
- CP2: Fallible transformers returning
Result<T, String>
CP1 PRD Example:
pub fn transform_issue(gitlab_issue: &GitLabIssue, local_project_id: i64) -> NormalizedIssue
pub fn transform_notes(...) -> Vec<NormalizedNote>
CP2 PRD Example:
pub fn transform_merge_request(...) -> Result<MergeRequestWithMetadata, String>
pub fn transform_notes_with_diff_position(...) -> Result<Vec<NormalizedNote>, String>
Risk: Inconsistent error handling; one silently produces bad data, other fails explicitly.
Verification:
- Check actual
transform_issue()signature insrc/gitlab/transformers/issue.rs - Check actual
transform_notes()signature insrc/gitlab/transformers/discussion.rs - Determine canonical pattern (Result vs infallible)
Status: ⬜ Not Verified
2. Timestamp Parsing Contract (iso_to_ms)
PRD Deviation:
- CP1: Assumes
iso_to_ms()returnsi64directly - CP2: Assumes
iso_to_ms()returnsOption<i64>
CP1 PRD Usage (line 348):
created_at: iso_to_ms(&gitlab_issue.created_at),
CP2 PRD Usage (line 377):
let created_at = iso_to_ms(&gitlab_mr.created_at)
.ok_or_else(|| format!("Invalid created_at: {}", gitlab_mr.created_at))?;
Risk: Compile error or silent zero values on invalid timestamps.
Verification:
- Check actual
iso_to_ms()signature insrc/core/time.rs - Verify all call sites handle return type correctly
Status: ⬜ Not Verified
3. NormalizedDiscussion Structure
PRD Deviation:
- CP1:
issue_id: i64(non-optional, line 381) - CP2:
issue_id: Option<i64>,merge_request_id: Option<i64>(lines 499-500)
Risk: Type incompatibility; CP2 can't reuse CP1's struct without modification.
Verification:
- Check actual
NormalizedDiscussionstruct definition - Verify it supports both issue and MR contexts
Status: ⬜ Not Verified
4. Cursor Update Strategy
PRD Deviation:
- CP1: Modulo-based flush every 100 items
- CP2: Page-boundary flush
CP1 PRD (lines 805-808):
if result.fetched % 100 == 0 {
update_cursor(...)?;
}
CP2 PRD (lines 994-998):
// Page-boundary cursor flush (not based on fetched % 100)
if let (Some(updated_at), Some(gitlab_id)) = (last_updated_at, last_gitlab_id) {
update_cursor(conn, project_id, "merge_requests", updated_at, gitlab_id)?;
}
Risk: Inconsistent crash recovery behavior; page-boundary is more correct.
Verification:
- Check actual cursor update logic in
src/ingestion/issues.rs - Determine if page-boundary pattern was adopted
Status: ⬜ Not Verified
5. Pagination Fallback Strategy
PRD Deviation:
- CP1: Two-tier (x-next-page → empty-page detection)
- CP2: Three-tier (Link header RFC 8288 → x-next-page → full-page heuristic)
Risk: Issue sync may fail on proxies that strip headers while MR sync succeeds.
Verification:
- Check
paginate_issues()insrc/gitlab/client.rs - Verify Link header parsing is implemented
- Verify fallback chain matches CP2's three-tier approach
Status: ⬜ Not Verified
6. Stale Data Removal (Sweep Pattern)
PRD Deviation:
- CP1: No sweep pattern; stale discussions/notes accumulate forever
- CP2:
last_seen_atsweep removes stale data after successful pagination
CP2 PRD Pattern (lines 1291-1313):
// Remove stale discussions AND notes via last_seen_at sweep
if result.pagination_succeeded {
conn.execute(
"DELETE FROM discussions WHERE ... AND last_seen_at < ?",
...
)?;
}
Risk: Deleted GitLab issue discussions remain in local DB indefinitely.
Verification:
- Check
ingest_issue_discussions()insrc/ingestion/discussions.rs - Verify sweep pattern is implemented for issues
- Verify
last_seen_atis updated during issue discussion upsert
Status: ⬜ Not Verified
7. Discussion Sync Selection Pattern
PRD Deviation:
- CP1: In-memory collection during ingestion
- CP2: DB-driven selection after ingestion completes
CP1 PRD (lines 818-827):
issues_in_page.push((local_issue_id, issue.iid, issue_updated_at));
// ... later filter for discussion sync eligibility
CP2 PRD (lines 790-800):
SELECT id, iid, updated_at FROM merge_requests
WHERE discussions_synced_for_updated_at IS NULL
OR updated_at > discussions_synced_for_updated_at
Risk: Memory growth for projects with many issues; DB-driven is more scalable.
Verification:
- Check actual pattern in
src/ingestion/issues.rs - Determine if DB-driven selection was adopted
Status: ⬜ Not Verified
8. Sync Health Telemetry
PRD Deviation:
- CP1: No telemetry columns
- CP2:
discussions_sync_last_attempt_at,discussions_sync_attempts,discussions_sync_last_error
Risk: Debugging discussion sync failures for issues is harder than for MRs.
Verification:
- Check
issuestable schema for telemetry columns - If missing, determine if backport is needed
Status: ⬜ Not Verified
9. Watermark Advancement Logic
PRD Deviation:
- CP1: Advances watermark after any successful discussion ingestion
- CP2: Only advances if pagination succeeds AND no parse failures
CP2 PRD (lines 1316-1335):
// CRITICAL: Only advance watermark if pagination succeeded completely
if result.pagination_succeeded {
mark_discussions_synced(conn, local_mr_id, mr_updated_at)?;
} else {
warn!("Discussion sync incomplete; watermark NOT advanced");
}
Risk: Issue discussion sync may lose data on partial failures.
Verification:
- Check
ingest_issue_discussions()watermark logic - Verify partial failure handling matches CP2's stricter approach
Status: ⬜ Not Verified
10. Raw Payload Storage Strategy
PRD Deviation:
- CP1: Stores payloads for all discussions and notes
- CP2: Selective storage (skip system notes without DiffNote position)
CP2 PRD (lines 1251-1272):
let should_store_note_payload =
!note.is_system() ||
note.position_new_path().is_some() ||
note.position_old_path().is_some();
Risk: Database bloat for issues with many system notes.
Verification:
- Check actual payload storage logic in
src/ingestion/discussions.rs - Determine if selective storage is implemented
Status: ⬜ Not Verified
11. Full Sync (--full) Behavior
PRD Deviation:
- CP1:
--fullflag not explicitly documented - CP2:
--fullresets cursor AND discussion watermarks
CP2 PRD (lines 871-876):
if full_sync {
reset_cursor(conn, project_id, "merge_requests")?;
reset_discussion_watermarks(conn, project_id)?;
}
Risk: --full on issues may not reset discussion watermarks.
Verification:
- Check
gi ingest --type=issues --fullbehavior - Verify discussion watermarks are reset
Status: ⬜ Not Verified
12. NormalizedNote Extended Fields
PRD Deviation:
- CP1: Basic fields only
- CP2: Extended with
position_type,position_line_range_start/end, SHA triplet
Expected: This is intentional for MR DiffNotes. However, the shared NormalizedNote struct must support both.
Verification:
- Check
NormalizedNotestruct definition - Verify extended fields are present (even if NULL for issue notes)
Status: ⬜ Not Verified
Summary Action Items
After verification, prioritize fixes:
| Priority | Item | Reason |
|---|---|---|
| P0 | Timestamp parsing contract | Compile errors or silent corruption |
| P0 | NormalizedDiscussion structure | Type incompatibility blocks CP2 |
| P1 | Watermark advancement logic | Data loss on partial failures |
| P1 | Stale data sweep pattern | Data accumulation over time |
| P2 | Pagination fallback chain | Robustness for edge cases |
| P2 | Cursor update strategy | Crash recovery consistency |
| P3 | Sync health telemetry | Debugging convenience |
| P3 | Raw payload storage | Database size optimization |
Verification Commands
# Check transformer signatures
grep -n "fn transform_issue" src/gitlab/transformers/issue.rs
grep -n "fn transform_notes" src/gitlab/transformers/discussion.rs
# Check iso_to_ms signature
grep -n "fn iso_to_ms" src/core/time.rs
# Check NormalizedDiscussion struct
grep -A 20 "struct NormalizedDiscussion" src/gitlab/transformers/discussion.rs
# Check cursor update pattern
grep -n "update_cursor" src/ingestion/issues.rs
# Check sweep pattern
grep -n "last_seen_at" src/ingestion/discussions.rs
# Check telemetry columns
grep -n "sync_attempts\|sync_last_error" migrations/*.sql
Notes
Use this section to record findings during verification.