Files
gitlore/docs/prd/cp1-cp2-alignment-audit.md
2026-01-28 15:49:14 -05:00

347 lines
9.4 KiB
Markdown

# CP1 ↔ CP2 Alignment Audit
> **Note:** The project was renamed from "gitlab-inbox" to "gitlore" and the CLI from "gi" to "lore". References to "gi" in this document should be read as "lore".
**Created:** 2026-01-26
**Purpose:** Document deviations between CP1 (Issue Ingestion) and CP2 (MR Ingestion) PRDs that could cause implementation drift. Use this checklist to verify alignment before CP2 implementation.
---
## Status Legend
-**Not Verified** - Needs investigation
-**Aligned** - Implementations match or CP1 was updated
- ⚠️ **Intentional Deviation** - Documented reason for difference
-**Gap Found** - Needs remediation
---
## 1. Transformer Return Types
**PRD Deviation:**
- CP1: Infallible transformers returning structs directly
- CP2: Fallible transformers returning `Result<T, String>`
**CP1 PRD Example:**
```rust
pub fn transform_issue(gitlab_issue: &GitLabIssue, local_project_id: i64) -> NormalizedIssue
pub fn transform_notes(...) -> Vec<NormalizedNote>
```
**CP2 PRD Example:**
```rust
pub fn transform_merge_request(...) -> Result<MergeRequestWithMetadata, String>
pub fn transform_notes_with_diff_position(...) -> Result<Vec<NormalizedNote>, String>
```
**Risk:** Inconsistent error handling; one silently produces bad data, other fails explicitly.
**Verification:**
- [ ] Check actual `transform_issue()` signature in `src/gitlab/transformers/issue.rs`
- [ ] Check actual `transform_notes()` signature in `src/gitlab/transformers/discussion.rs`
- [ ] Determine canonical pattern (Result vs infallible)
**Status:** ⬜ Not Verified
---
## 2. Timestamp Parsing Contract (`iso_to_ms`)
**PRD Deviation:**
- CP1: Assumes `iso_to_ms()` returns `i64` directly
- CP2: Assumes `iso_to_ms()` returns `Option<i64>`
**CP1 PRD Usage (line 348):**
```rust
created_at: iso_to_ms(&gitlab_issue.created_at),
```
**CP2 PRD Usage (line 377):**
```rust
let created_at = iso_to_ms(&gitlab_mr.created_at)
.ok_or_else(|| format!("Invalid created_at: {}", gitlab_mr.created_at))?;
```
**Risk:** Compile error or silent zero values on invalid timestamps.
**Verification:**
- [ ] Check actual `iso_to_ms()` signature in `src/core/time.rs`
- [ ] Verify all call sites handle return type correctly
**Status:** ⬜ Not Verified
---
## 3. NormalizedDiscussion Structure
**PRD Deviation:**
- CP1: `issue_id: i64` (non-optional, line 381)
- CP2: `issue_id: Option<i64>`, `merge_request_id: Option<i64>` (lines 499-500)
**Risk:** Type incompatibility; CP2 can't reuse CP1's struct without modification.
**Verification:**
- [ ] Check actual `NormalizedDiscussion` struct definition
- [ ] Verify it supports both issue and MR contexts
**Status:** ⬜ Not Verified
---
## 4. Cursor Update Strategy
**PRD Deviation:**
- CP1: Modulo-based flush every 100 items
- CP2: Page-boundary flush
**CP1 PRD (lines 805-808):**
```rust
if result.fetched % 100 == 0 {
update_cursor(...)?;
}
```
**CP2 PRD (lines 994-998):**
```rust
// Page-boundary cursor flush (not based on fetched % 100)
if let (Some(updated_at), Some(gitlab_id)) = (last_updated_at, last_gitlab_id) {
update_cursor(conn, project_id, "merge_requests", updated_at, gitlab_id)?;
}
```
**Risk:** Inconsistent crash recovery behavior; page-boundary is more correct.
**Verification:**
- [ ] Check actual cursor update logic in `src/ingestion/issues.rs`
- [ ] Determine if page-boundary pattern was adopted
**Status:** ⬜ Not Verified
---
## 5. Pagination Fallback Strategy
**PRD Deviation:**
- CP1: Two-tier (x-next-page → empty-page detection)
- CP2: Three-tier (Link header RFC 8288 → x-next-page → full-page heuristic)
**Risk:** Issue sync may fail on proxies that strip headers while MR sync succeeds.
**Verification:**
- [ ] Check `paginate_issues()` in `src/gitlab/client.rs`
- [ ] Verify Link header parsing is implemented
- [ ] Verify fallback chain matches CP2's three-tier approach
**Status:** ⬜ Not Verified
---
## 6. Stale Data Removal (Sweep Pattern)
**PRD Deviation:**
- CP1: No sweep pattern; stale discussions/notes accumulate forever
- CP2: `last_seen_at` sweep removes stale data after successful pagination
**CP2 PRD Pattern (lines 1291-1313):**
```rust
// Remove stale discussions AND notes via last_seen_at sweep
if result.pagination_succeeded {
conn.execute(
"DELETE FROM discussions WHERE ... AND last_seen_at < ?",
...
)?;
}
```
**Risk:** Deleted GitLab issue discussions remain in local DB indefinitely.
**Verification:**
- [ ] Check `ingest_issue_discussions()` in `src/ingestion/discussions.rs`
- [ ] Verify sweep pattern is implemented for issues
- [ ] Verify `last_seen_at` is updated during issue discussion upsert
**Status:** ⬜ Not Verified
---
## 7. Discussion Sync Selection Pattern
**PRD Deviation:**
- CP1: In-memory collection during ingestion
- CP2: DB-driven selection after ingestion completes
**CP1 PRD (lines 818-827):**
```rust
issues_in_page.push((local_issue_id, issue.iid, issue_updated_at));
// ... later filter for discussion sync eligibility
```
**CP2 PRD (lines 790-800):**
```sql
SELECT id, iid, updated_at FROM merge_requests
WHERE discussions_synced_for_updated_at IS NULL
OR updated_at > discussions_synced_for_updated_at
```
**Risk:** Memory growth for projects with many issues; DB-driven is more scalable.
**Verification:**
- [ ] Check actual pattern in `src/ingestion/issues.rs`
- [ ] Determine if DB-driven selection was adopted
**Status:** ⬜ Not Verified
---
## 8. Sync Health Telemetry
**PRD Deviation:**
- CP1: No telemetry columns
- CP2: `discussions_sync_last_attempt_at`, `discussions_sync_attempts`, `discussions_sync_last_error`
**Risk:** Debugging discussion sync failures for issues is harder than for MRs.
**Verification:**
- [ ] Check `issues` table schema for telemetry columns
- [ ] If missing, determine if backport is needed
**Status:** ⬜ Not Verified
---
## 9. Watermark Advancement Logic
**PRD Deviation:**
- CP1: Advances watermark after any successful discussion ingestion
- CP2: Only advances if pagination succeeds AND no parse failures
**CP2 PRD (lines 1316-1335):**
```rust
// CRITICAL: Only advance watermark if pagination succeeded completely
if result.pagination_succeeded {
mark_discussions_synced(conn, local_mr_id, mr_updated_at)?;
} else {
warn!("Discussion sync incomplete; watermark NOT advanced");
}
```
**Risk:** Issue discussion sync may lose data on partial failures.
**Verification:**
- [ ] Check `ingest_issue_discussions()` watermark logic
- [ ] Verify partial failure handling matches CP2's stricter approach
**Status:** ⬜ Not Verified
---
## 10. Raw Payload Storage Strategy
**PRD Deviation:**
- CP1: Stores payloads for all discussions and notes
- CP2: Selective storage (skip system notes without DiffNote position)
**CP2 PRD (lines 1251-1272):**
```rust
let should_store_note_payload =
!note.is_system() ||
note.position_new_path().is_some() ||
note.position_old_path().is_some();
```
**Risk:** Database bloat for issues with many system notes.
**Verification:**
- [ ] Check actual payload storage logic in `src/ingestion/discussions.rs`
- [ ] Determine if selective storage is implemented
**Status:** ⬜ Not Verified
---
## 11. Full Sync (`--full`) Behavior
**PRD Deviation:**
- CP1: `--full` flag not explicitly documented
- CP2: `--full` resets cursor AND discussion watermarks
**CP2 PRD (lines 871-876):**
```rust
if full_sync {
reset_cursor(conn, project_id, "merge_requests")?;
reset_discussion_watermarks(conn, project_id)?;
}
```
**Risk:** `--full` on issues may not reset discussion watermarks.
**Verification:**
- [ ] Check `gi ingest --type=issues --full` behavior
- [ ] Verify discussion watermarks are reset
**Status:** ⬜ Not Verified
---
## 12. NormalizedNote Extended Fields
**PRD Deviation:**
- CP1: Basic fields only
- CP2: Extended with `position_type`, `position_line_range_start/end`, SHA triplet
**Expected:** This is intentional for MR DiffNotes. However, the shared `NormalizedNote` struct must support both.
**Verification:**
- [ ] Check `NormalizedNote` struct definition
- [ ] Verify extended fields are present (even if NULL for issue notes)
**Status:** ⬜ Not Verified
---
## Summary Action Items
After verification, prioritize fixes:
| Priority | Item | Reason |
|----------|------|--------|
| P0 | Timestamp parsing contract | Compile errors or silent corruption |
| P0 | NormalizedDiscussion structure | Type incompatibility blocks CP2 |
| P1 | Watermark advancement logic | Data loss on partial failures |
| P1 | Stale data sweep pattern | Data accumulation over time |
| P2 | Pagination fallback chain | Robustness for edge cases |
| P2 | Cursor update strategy | Crash recovery consistency |
| P3 | Sync health telemetry | Debugging convenience |
| P3 | Raw payload storage | Database size optimization |
---
## Verification Commands
```bash
# Check transformer signatures
grep -n "fn transform_issue" src/gitlab/transformers/issue.rs
grep -n "fn transform_notes" src/gitlab/transformers/discussion.rs
# Check iso_to_ms signature
grep -n "fn iso_to_ms" src/core/time.rs
# Check NormalizedDiscussion struct
grep -A 20 "struct NormalizedDiscussion" src/gitlab/transformers/discussion.rs
# Check cursor update pattern
grep -n "update_cursor" src/ingestion/issues.rs
# Check sweep pattern
grep -n "last_seen_at" src/ingestion/discussions.rs
# Check telemetry columns
grep -n "sync_attempts\|sync_last_error" migrations/*.sql
```
---
## Notes
_Use this section to record findings during verification._