Files
gitlore/docs/prd/cp1-cp2-alignment-audit.md
Taylor Eernisse 0952d21a90 docs(prd): Add CP2 PRD and CP1-CP2 alignment audit
Add comprehensive planning documentation for Checkpoint 2 (Merge Request
support) and document the results of the CP1 implementation audit.

checkpoint-2.md (2093 lines):
- Complete PRD for adding merge request ingestion, querying, and display
- Detailed user stories with acceptance criteria
- ASCII wireframes for CLI output formats
- Database schema extensions (migrations 006-007)
- API integration specifications for MR endpoints
- State transition diagrams for MR lifecycle
- Performance requirements and test specifications
- Risk assessment and mitigation strategies

cp1-cp2-alignment-audit.md (344 lines):
- Gap analysis between CP1 PRD and actual implementation
- Identified issues prioritized by severity (P0/P1/P2/P3)
- P0: NormalizedDiscussion struct incompatible with MR discussions
- P1: --full flag not resetting discussion watermarks
- P2: Missing Link header pagination fallback
- P3: Missing sync health telemetry and selective payload storage
- Each issue includes root cause, recommended fix, and affected files

The P0 and P1 issues have been fixed in accompanying commits. P2 and P3
items are deferred to CP2 implementation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 17:01:20 -05:00

9.2 KiB

CP1 ↔ CP2 Alignment Audit

Created: 2026-01-26 Purpose: Document deviations between CP1 (Issue Ingestion) and CP2 (MR Ingestion) PRDs that could cause implementation drift. Use this checklist to verify alignment before CP2 implementation.


Status Legend

  • Not Verified - Needs investigation
  • Aligned - Implementations match or CP1 was updated
  • ⚠️ Intentional Deviation - Documented reason for difference
  • Gap Found - Needs remediation

1. Transformer Return Types

PRD Deviation:

  • CP1: Infallible transformers returning structs directly
  • CP2: Fallible transformers returning Result<T, String>

CP1 PRD Example:

pub fn transform_issue(gitlab_issue: &GitLabIssue, local_project_id: i64) -> NormalizedIssue
pub fn transform_notes(...) -> Vec<NormalizedNote>

CP2 PRD Example:

pub fn transform_merge_request(...) -> Result<MergeRequestWithMetadata, String>
pub fn transform_notes_with_diff_position(...) -> Result<Vec<NormalizedNote>, String>

Risk: Inconsistent error handling; one silently produces bad data, other fails explicitly.

Verification:

  • Check actual transform_issue() signature in src/gitlab/transformers/issue.rs
  • Check actual transform_notes() signature in src/gitlab/transformers/discussion.rs
  • Determine canonical pattern (Result vs infallible)

Status: Not Verified


2. Timestamp Parsing Contract (iso_to_ms)

PRD Deviation:

  • CP1: Assumes iso_to_ms() returns i64 directly
  • CP2: Assumes iso_to_ms() returns Option<i64>

CP1 PRD Usage (line 348):

created_at: iso_to_ms(&gitlab_issue.created_at),

CP2 PRD Usage (line 377):

let created_at = iso_to_ms(&gitlab_mr.created_at)
    .ok_or_else(|| format!("Invalid created_at: {}", gitlab_mr.created_at))?;

Risk: Compile error or silent zero values on invalid timestamps.

Verification:

  • Check actual iso_to_ms() signature in src/core/time.rs
  • Verify all call sites handle return type correctly

Status: Not Verified


3. NormalizedDiscussion Structure

PRD Deviation:

  • CP1: issue_id: i64 (non-optional, line 381)
  • CP2: issue_id: Option<i64>, merge_request_id: Option<i64> (lines 499-500)

Risk: Type incompatibility; CP2 can't reuse CP1's struct without modification.

Verification:

  • Check actual NormalizedDiscussion struct definition
  • Verify it supports both issue and MR contexts

Status: Not Verified


4. Cursor Update Strategy

PRD Deviation:

  • CP1: Modulo-based flush every 100 items
  • CP2: Page-boundary flush

CP1 PRD (lines 805-808):

if result.fetched % 100 == 0 {
    update_cursor(...)?;
}

CP2 PRD (lines 994-998):

// Page-boundary cursor flush (not based on fetched % 100)
if let (Some(updated_at), Some(gitlab_id)) = (last_updated_at, last_gitlab_id) {
    update_cursor(conn, project_id, "merge_requests", updated_at, gitlab_id)?;
}

Risk: Inconsistent crash recovery behavior; page-boundary is more correct.

Verification:

  • Check actual cursor update logic in src/ingestion/issues.rs
  • Determine if page-boundary pattern was adopted

Status: Not Verified


5. Pagination Fallback Strategy

PRD Deviation:

  • CP1: Two-tier (x-next-page → empty-page detection)
  • CP2: Three-tier (Link header RFC 8288 → x-next-page → full-page heuristic)

Risk: Issue sync may fail on proxies that strip headers while MR sync succeeds.

Verification:

  • Check paginate_issues() in src/gitlab/client.rs
  • Verify Link header parsing is implemented
  • Verify fallback chain matches CP2's three-tier approach

Status: Not Verified


6. Stale Data Removal (Sweep Pattern)

PRD Deviation:

  • CP1: No sweep pattern; stale discussions/notes accumulate forever
  • CP2: last_seen_at sweep removes stale data after successful pagination

CP2 PRD Pattern (lines 1291-1313):

// Remove stale discussions AND notes via last_seen_at sweep
if result.pagination_succeeded {
    conn.execute(
        "DELETE FROM discussions WHERE ... AND last_seen_at < ?",
        ...
    )?;
}

Risk: Deleted GitLab issue discussions remain in local DB indefinitely.

Verification:

  • Check ingest_issue_discussions() in src/ingestion/discussions.rs
  • Verify sweep pattern is implemented for issues
  • Verify last_seen_at is updated during issue discussion upsert

Status: Not Verified


7. Discussion Sync Selection Pattern

PRD Deviation:

  • CP1: In-memory collection during ingestion
  • CP2: DB-driven selection after ingestion completes

CP1 PRD (lines 818-827):

issues_in_page.push((local_issue_id, issue.iid, issue_updated_at));
// ... later filter for discussion sync eligibility

CP2 PRD (lines 790-800):

SELECT id, iid, updated_at FROM merge_requests
WHERE discussions_synced_for_updated_at IS NULL
   OR updated_at > discussions_synced_for_updated_at

Risk: Memory growth for projects with many issues; DB-driven is more scalable.

Verification:

  • Check actual pattern in src/ingestion/issues.rs
  • Determine if DB-driven selection was adopted

Status: Not Verified


8. Sync Health Telemetry

PRD Deviation:

  • CP1: No telemetry columns
  • CP2: discussions_sync_last_attempt_at, discussions_sync_attempts, discussions_sync_last_error

Risk: Debugging discussion sync failures for issues is harder than for MRs.

Verification:

  • Check issues table schema for telemetry columns
  • If missing, determine if backport is needed

Status: Not Verified


9. Watermark Advancement Logic

PRD Deviation:

  • CP1: Advances watermark after any successful discussion ingestion
  • CP2: Only advances if pagination succeeds AND no parse failures

CP2 PRD (lines 1316-1335):

// CRITICAL: Only advance watermark if pagination succeeded completely
if result.pagination_succeeded {
    mark_discussions_synced(conn, local_mr_id, mr_updated_at)?;
} else {
    warn!("Discussion sync incomplete; watermark NOT advanced");
}

Risk: Issue discussion sync may lose data on partial failures.

Verification:

  • Check ingest_issue_discussions() watermark logic
  • Verify partial failure handling matches CP2's stricter approach

Status: Not Verified


10. Raw Payload Storage Strategy

PRD Deviation:

  • CP1: Stores payloads for all discussions and notes
  • CP2: Selective storage (skip system notes without DiffNote position)

CP2 PRD (lines 1251-1272):

let should_store_note_payload =
    !note.is_system() ||
    note.position_new_path().is_some() ||
    note.position_old_path().is_some();

Risk: Database bloat for issues with many system notes.

Verification:

  • Check actual payload storage logic in src/ingestion/discussions.rs
  • Determine if selective storage is implemented

Status: Not Verified


11. Full Sync (--full) Behavior

PRD Deviation:

  • CP1: --full flag not explicitly documented
  • CP2: --full resets cursor AND discussion watermarks

CP2 PRD (lines 871-876):

if full_sync {
    reset_cursor(conn, project_id, "merge_requests")?;
    reset_discussion_watermarks(conn, project_id)?;
}

Risk: --full on issues may not reset discussion watermarks.

Verification:

  • Check gi ingest --type=issues --full behavior
  • Verify discussion watermarks are reset

Status: Not Verified


12. NormalizedNote Extended Fields

PRD Deviation:

  • CP1: Basic fields only
  • CP2: Extended with position_type, position_line_range_start/end, SHA triplet

Expected: This is intentional for MR DiffNotes. However, the shared NormalizedNote struct must support both.

Verification:

  • Check NormalizedNote struct definition
  • Verify extended fields are present (even if NULL for issue notes)

Status: Not Verified


Summary Action Items

After verification, prioritize fixes:

Priority Item Reason
P0 Timestamp parsing contract Compile errors or silent corruption
P0 NormalizedDiscussion structure Type incompatibility blocks CP2
P1 Watermark advancement logic Data loss on partial failures
P1 Stale data sweep pattern Data accumulation over time
P2 Pagination fallback chain Robustness for edge cases
P2 Cursor update strategy Crash recovery consistency
P3 Sync health telemetry Debugging convenience
P3 Raw payload storage Database size optimization

Verification Commands

# Check transformer signatures
grep -n "fn transform_issue" src/gitlab/transformers/issue.rs
grep -n "fn transform_notes" src/gitlab/transformers/discussion.rs

# Check iso_to_ms signature
grep -n "fn iso_to_ms" src/core/time.rs

# Check NormalizedDiscussion struct
grep -A 20 "struct NormalizedDiscussion" src/gitlab/transformers/discussion.rs

# Check cursor update pattern
grep -n "update_cursor" src/ingestion/issues.rs

# Check sweep pattern
grep -n "last_seen_at" src/ingestion/discussions.rs

# Check telemetry columns
grep -n "sync_attempts\|sync_last_error" migrations/*.sql

Notes

Use this section to record findings during verification.