9.4 KiB
CP1 ↔ CP2 Alignment Audit
Note: The project was renamed from "gitlab-inbox" to "gitlore" and the CLI from "gi" to "lore". References to "gi" in this document should be read as "lore".
Created: 2026-01-26 Purpose: Document deviations between CP1 (Issue Ingestion) and CP2 (MR Ingestion) PRDs that could cause implementation drift. Use this checklist to verify alignment before CP2 implementation.
Status Legend
- ⬜ Not Verified - Needs investigation
- ✅ Aligned - Implementations match or CP1 was updated
- ⚠️ Intentional Deviation - Documented reason for difference
- ❌ Gap Found - Needs remediation
1. Transformer Return Types
PRD Deviation:
- CP1: Infallible transformers returning structs directly
- CP2: Fallible transformers returning
Result<T, String>
CP1 PRD Example:
pub fn transform_issue(gitlab_issue: &GitLabIssue, local_project_id: i64) -> NormalizedIssue
pub fn transform_notes(...) -> Vec<NormalizedNote>
CP2 PRD Example:
pub fn transform_merge_request(...) -> Result<MergeRequestWithMetadata, String>
pub fn transform_notes_with_diff_position(...) -> Result<Vec<NormalizedNote>, String>
Risk: Inconsistent error handling; one silently produces bad data, other fails explicitly.
Verification:
- Check actual
transform_issue()signature insrc/gitlab/transformers/issue.rs - Check actual
transform_notes()signature insrc/gitlab/transformers/discussion.rs - Determine canonical pattern (Result vs infallible)
Status: ⬜ Not Verified
2. Timestamp Parsing Contract (iso_to_ms)
PRD Deviation:
- CP1: Assumes
iso_to_ms()returnsi64directly - CP2: Assumes
iso_to_ms()returnsOption<i64>
CP1 PRD Usage (line 348):
created_at: iso_to_ms(&gitlab_issue.created_at),
CP2 PRD Usage (line 377):
let created_at = iso_to_ms(&gitlab_mr.created_at)
.ok_or_else(|| format!("Invalid created_at: {}", gitlab_mr.created_at))?;
Risk: Compile error or silent zero values on invalid timestamps.
Verification:
- Check actual
iso_to_ms()signature insrc/core/time.rs - Verify all call sites handle return type correctly
Status: ⬜ Not Verified
3. NormalizedDiscussion Structure
PRD Deviation:
- CP1:
issue_id: i64(non-optional, line 381) - CP2:
issue_id: Option<i64>,merge_request_id: Option<i64>(lines 499-500)
Risk: Type incompatibility; CP2 can't reuse CP1's struct without modification.
Verification:
- Check actual
NormalizedDiscussionstruct definition - Verify it supports both issue and MR contexts
Status: ⬜ Not Verified
4. Cursor Update Strategy
PRD Deviation:
- CP1: Modulo-based flush every 100 items
- CP2: Page-boundary flush
CP1 PRD (lines 805-808):
if result.fetched % 100 == 0 {
update_cursor(...)?;
}
CP2 PRD (lines 994-998):
// Page-boundary cursor flush (not based on fetched % 100)
if let (Some(updated_at), Some(gitlab_id)) = (last_updated_at, last_gitlab_id) {
update_cursor(conn, project_id, "merge_requests", updated_at, gitlab_id)?;
}
Risk: Inconsistent crash recovery behavior; page-boundary is more correct.
Verification:
- Check actual cursor update logic in
src/ingestion/issues.rs - Determine if page-boundary pattern was adopted
Status: ⬜ Not Verified
5. Pagination Fallback Strategy
PRD Deviation:
- CP1: Two-tier (x-next-page → empty-page detection)
- CP2: Three-tier (Link header RFC 8288 → x-next-page → full-page heuristic)
Risk: Issue sync may fail on proxies that strip headers while MR sync succeeds.
Verification:
- Check
paginate_issues()insrc/gitlab/client.rs - Verify Link header parsing is implemented
- Verify fallback chain matches CP2's three-tier approach
Status: ⬜ Not Verified
6. Stale Data Removal (Sweep Pattern)
PRD Deviation:
- CP1: No sweep pattern; stale discussions/notes accumulate forever
- CP2:
last_seen_atsweep removes stale data after successful pagination
CP2 PRD Pattern (lines 1291-1313):
// Remove stale discussions AND notes via last_seen_at sweep
if result.pagination_succeeded {
conn.execute(
"DELETE FROM discussions WHERE ... AND last_seen_at < ?",
...
)?;
}
Risk: Deleted GitLab issue discussions remain in local DB indefinitely.
Verification:
- Check
ingest_issue_discussions()insrc/ingestion/discussions.rs - Verify sweep pattern is implemented for issues
- Verify
last_seen_atis updated during issue discussion upsert
Status: ⬜ Not Verified
7. Discussion Sync Selection Pattern
PRD Deviation:
- CP1: In-memory collection during ingestion
- CP2: DB-driven selection after ingestion completes
CP1 PRD (lines 818-827):
issues_in_page.push((local_issue_id, issue.iid, issue_updated_at));
// ... later filter for discussion sync eligibility
CP2 PRD (lines 790-800):
SELECT id, iid, updated_at FROM merge_requests
WHERE discussions_synced_for_updated_at IS NULL
OR updated_at > discussions_synced_for_updated_at
Risk: Memory growth for projects with many issues; DB-driven is more scalable.
Verification:
- Check actual pattern in
src/ingestion/issues.rs - Determine if DB-driven selection was adopted
Status: ⬜ Not Verified
8. Sync Health Telemetry
PRD Deviation:
- CP1: No telemetry columns
- CP2:
discussions_sync_last_attempt_at,discussions_sync_attempts,discussions_sync_last_error
Risk: Debugging discussion sync failures for issues is harder than for MRs.
Verification:
- Check
issuestable schema for telemetry columns - If missing, determine if backport is needed
Status: ⬜ Not Verified
9. Watermark Advancement Logic
PRD Deviation:
- CP1: Advances watermark after any successful discussion ingestion
- CP2: Only advances if pagination succeeds AND no parse failures
CP2 PRD (lines 1316-1335):
// CRITICAL: Only advance watermark if pagination succeeded completely
if result.pagination_succeeded {
mark_discussions_synced(conn, local_mr_id, mr_updated_at)?;
} else {
warn!("Discussion sync incomplete; watermark NOT advanced");
}
Risk: Issue discussion sync may lose data on partial failures.
Verification:
- Check
ingest_issue_discussions()watermark logic - Verify partial failure handling matches CP2's stricter approach
Status: ⬜ Not Verified
10. Raw Payload Storage Strategy
PRD Deviation:
- CP1: Stores payloads for all discussions and notes
- CP2: Selective storage (skip system notes without DiffNote position)
CP2 PRD (lines 1251-1272):
let should_store_note_payload =
!note.is_system() ||
note.position_new_path().is_some() ||
note.position_old_path().is_some();
Risk: Database bloat for issues with many system notes.
Verification:
- Check actual payload storage logic in
src/ingestion/discussions.rs - Determine if selective storage is implemented
Status: ⬜ Not Verified
11. Full Sync (--full) Behavior
PRD Deviation:
- CP1:
--fullflag not explicitly documented - CP2:
--fullresets cursor AND discussion watermarks
CP2 PRD (lines 871-876):
if full_sync {
reset_cursor(conn, project_id, "merge_requests")?;
reset_discussion_watermarks(conn, project_id)?;
}
Risk: --full on issues may not reset discussion watermarks.
Verification:
- Check
gi ingest --type=issues --fullbehavior - Verify discussion watermarks are reset
Status: ⬜ Not Verified
12. NormalizedNote Extended Fields
PRD Deviation:
- CP1: Basic fields only
- CP2: Extended with
position_type,position_line_range_start/end, SHA triplet
Expected: This is intentional for MR DiffNotes. However, the shared NormalizedNote struct must support both.
Verification:
- Check
NormalizedNotestruct definition - Verify extended fields are present (even if NULL for issue notes)
Status: ⬜ Not Verified
Summary Action Items
After verification, prioritize fixes:
| Priority | Item | Reason |
|---|---|---|
| P0 | Timestamp parsing contract | Compile errors or silent corruption |
| P0 | NormalizedDiscussion structure | Type incompatibility blocks CP2 |
| P1 | Watermark advancement logic | Data loss on partial failures |
| P1 | Stale data sweep pattern | Data accumulation over time |
| P2 | Pagination fallback chain | Robustness for edge cases |
| P2 | Cursor update strategy | Crash recovery consistency |
| P3 | Sync health telemetry | Debugging convenience |
| P3 | Raw payload storage | Database size optimization |
Verification Commands
# Check transformer signatures
grep -n "fn transform_issue" src/gitlab/transformers/issue.rs
grep -n "fn transform_notes" src/gitlab/transformers/discussion.rs
# Check iso_to_ms signature
grep -n "fn iso_to_ms" src/core/time.rs
# Check NormalizedDiscussion struct
grep -A 20 "struct NormalizedDiscussion" src/gitlab/transformers/discussion.rs
# Check cursor update pattern
grep -n "update_cursor" src/ingestion/issues.rs
# Check sweep pattern
grep -n "last_seen_at" src/ingestion/discussions.rs
# Check telemetry columns
grep -n "sync_attempts\|sync_last_error" migrations/*.sql
Notes
Use this section to record findings during verification.