347 lines
9.4 KiB
Markdown
347 lines
9.4 KiB
Markdown
# CP1 ↔ CP2 Alignment Audit
|
|
|
|
> **Note:** The project was renamed from "gitlab-inbox" to "gitlore" and the CLI from "gi" to "lore". References to "gi" in this document should be read as "lore".
|
|
|
|
**Created:** 2026-01-26
|
|
**Purpose:** Document deviations between CP1 (Issue Ingestion) and CP2 (MR Ingestion) PRDs that could cause implementation drift. Use this checklist to verify alignment before CP2 implementation.
|
|
|
|
---
|
|
|
|
## Status Legend
|
|
|
|
- ⬜ **Not Verified** - Needs investigation
|
|
- ✅ **Aligned** - Implementations match or CP1 was updated
|
|
- ⚠️ **Intentional Deviation** - Documented reason for difference
|
|
- ❌ **Gap Found** - Needs remediation
|
|
|
|
---
|
|
|
|
## 1. Transformer Return Types
|
|
|
|
**PRD Deviation:**
|
|
- CP1: Infallible transformers returning structs directly
|
|
- CP2: Fallible transformers returning `Result<T, String>`
|
|
|
|
**CP1 PRD Example:**
|
|
```rust
|
|
pub fn transform_issue(gitlab_issue: &GitLabIssue, local_project_id: i64) -> NormalizedIssue
|
|
pub fn transform_notes(...) -> Vec<NormalizedNote>
|
|
```
|
|
|
|
**CP2 PRD Example:**
|
|
```rust
|
|
pub fn transform_merge_request(...) -> Result<MergeRequestWithMetadata, String>
|
|
pub fn transform_notes_with_diff_position(...) -> Result<Vec<NormalizedNote>, String>
|
|
```
|
|
|
|
**Risk:** Inconsistent error handling; one silently produces bad data, other fails explicitly.
|
|
|
|
**Verification:**
|
|
- [ ] Check actual `transform_issue()` signature in `src/gitlab/transformers/issue.rs`
|
|
- [ ] Check actual `transform_notes()` signature in `src/gitlab/transformers/discussion.rs`
|
|
- [ ] Determine canonical pattern (Result vs infallible)
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 2. Timestamp Parsing Contract (`iso_to_ms`)
|
|
|
|
**PRD Deviation:**
|
|
- CP1: Assumes `iso_to_ms()` returns `i64` directly
|
|
- CP2: Assumes `iso_to_ms()` returns `Option<i64>`
|
|
|
|
**CP1 PRD Usage (line 348):**
|
|
```rust
|
|
created_at: iso_to_ms(&gitlab_issue.created_at),
|
|
```
|
|
|
|
**CP2 PRD Usage (line 377):**
|
|
```rust
|
|
let created_at = iso_to_ms(&gitlab_mr.created_at)
|
|
.ok_or_else(|| format!("Invalid created_at: {}", gitlab_mr.created_at))?;
|
|
```
|
|
|
|
**Risk:** Compile error or silent zero values on invalid timestamps.
|
|
|
|
**Verification:**
|
|
- [ ] Check actual `iso_to_ms()` signature in `src/core/time.rs`
|
|
- [ ] Verify all call sites handle return type correctly
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 3. NormalizedDiscussion Structure
|
|
|
|
**PRD Deviation:**
|
|
- CP1: `issue_id: i64` (non-optional, line 381)
|
|
- CP2: `issue_id: Option<i64>`, `merge_request_id: Option<i64>` (lines 499-500)
|
|
|
|
**Risk:** Type incompatibility; CP2 can't reuse CP1's struct without modification.
|
|
|
|
**Verification:**
|
|
- [ ] Check actual `NormalizedDiscussion` struct definition
|
|
- [ ] Verify it supports both issue and MR contexts
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 4. Cursor Update Strategy
|
|
|
|
**PRD Deviation:**
|
|
- CP1: Modulo-based flush every 100 items
|
|
- CP2: Page-boundary flush
|
|
|
|
**CP1 PRD (lines 805-808):**
|
|
```rust
|
|
if result.fetched % 100 == 0 {
|
|
update_cursor(...)?;
|
|
}
|
|
```
|
|
|
|
**CP2 PRD (lines 994-998):**
|
|
```rust
|
|
// Page-boundary cursor flush (not based on fetched % 100)
|
|
if let (Some(updated_at), Some(gitlab_id)) = (last_updated_at, last_gitlab_id) {
|
|
update_cursor(conn, project_id, "merge_requests", updated_at, gitlab_id)?;
|
|
}
|
|
```
|
|
|
|
**Risk:** Inconsistent crash recovery behavior; page-boundary is more correct.
|
|
|
|
**Verification:**
|
|
- [ ] Check actual cursor update logic in `src/ingestion/issues.rs`
|
|
- [ ] Determine if page-boundary pattern was adopted
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 5. Pagination Fallback Strategy
|
|
|
|
**PRD Deviation:**
|
|
- CP1: Two-tier (x-next-page → empty-page detection)
|
|
- CP2: Three-tier (Link header RFC 8288 → x-next-page → full-page heuristic)
|
|
|
|
**Risk:** Issue sync may fail on proxies that strip headers while MR sync succeeds.
|
|
|
|
**Verification:**
|
|
- [ ] Check `paginate_issues()` in `src/gitlab/client.rs`
|
|
- [ ] Verify Link header parsing is implemented
|
|
- [ ] Verify fallback chain matches CP2's three-tier approach
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 6. Stale Data Removal (Sweep Pattern)
|
|
|
|
**PRD Deviation:**
|
|
- CP1: No sweep pattern; stale discussions/notes accumulate forever
|
|
- CP2: `last_seen_at` sweep removes stale data after successful pagination
|
|
|
|
**CP2 PRD Pattern (lines 1291-1313):**
|
|
```rust
|
|
// Remove stale discussions AND notes via last_seen_at sweep
|
|
if result.pagination_succeeded {
|
|
conn.execute(
|
|
"DELETE FROM discussions WHERE ... AND last_seen_at < ?",
|
|
...
|
|
)?;
|
|
}
|
|
```
|
|
|
|
**Risk:** Deleted GitLab issue discussions remain in local DB indefinitely.
|
|
|
|
**Verification:**
|
|
- [ ] Check `ingest_issue_discussions()` in `src/ingestion/discussions.rs`
|
|
- [ ] Verify sweep pattern is implemented for issues
|
|
- [ ] Verify `last_seen_at` is updated during issue discussion upsert
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 7. Discussion Sync Selection Pattern
|
|
|
|
**PRD Deviation:**
|
|
- CP1: In-memory collection during ingestion
|
|
- CP2: DB-driven selection after ingestion completes
|
|
|
|
**CP1 PRD (lines 818-827):**
|
|
```rust
|
|
issues_in_page.push((local_issue_id, issue.iid, issue_updated_at));
|
|
// ... later filter for discussion sync eligibility
|
|
```
|
|
|
|
**CP2 PRD (lines 790-800):**
|
|
```sql
|
|
SELECT id, iid, updated_at FROM merge_requests
|
|
WHERE discussions_synced_for_updated_at IS NULL
|
|
OR updated_at > discussions_synced_for_updated_at
|
|
```
|
|
|
|
**Risk:** Memory growth for projects with many issues; DB-driven is more scalable.
|
|
|
|
**Verification:**
|
|
- [ ] Check actual pattern in `src/ingestion/issues.rs`
|
|
- [ ] Determine if DB-driven selection was adopted
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 8. Sync Health Telemetry
|
|
|
|
**PRD Deviation:**
|
|
- CP1: No telemetry columns
|
|
- CP2: `discussions_sync_last_attempt_at`, `discussions_sync_attempts`, `discussions_sync_last_error`
|
|
|
|
**Risk:** Debugging discussion sync failures for issues is harder than for MRs.
|
|
|
|
**Verification:**
|
|
- [ ] Check `issues` table schema for telemetry columns
|
|
- [ ] If missing, determine if backport is needed
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 9. Watermark Advancement Logic
|
|
|
|
**PRD Deviation:**
|
|
- CP1: Advances watermark after any successful discussion ingestion
|
|
- CP2: Only advances if pagination succeeds AND no parse failures
|
|
|
|
**CP2 PRD (lines 1316-1335):**
|
|
```rust
|
|
// CRITICAL: Only advance watermark if pagination succeeded completely
|
|
if result.pagination_succeeded {
|
|
mark_discussions_synced(conn, local_mr_id, mr_updated_at)?;
|
|
} else {
|
|
warn!("Discussion sync incomplete; watermark NOT advanced");
|
|
}
|
|
```
|
|
|
|
**Risk:** Issue discussion sync may lose data on partial failures.
|
|
|
|
**Verification:**
|
|
- [ ] Check `ingest_issue_discussions()` watermark logic
|
|
- [ ] Verify partial failure handling matches CP2's stricter approach
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 10. Raw Payload Storage Strategy
|
|
|
|
**PRD Deviation:**
|
|
- CP1: Stores payloads for all discussions and notes
|
|
- CP2: Selective storage (skip system notes without DiffNote position)
|
|
|
|
**CP2 PRD (lines 1251-1272):**
|
|
```rust
|
|
let should_store_note_payload =
|
|
!note.is_system() ||
|
|
note.position_new_path().is_some() ||
|
|
note.position_old_path().is_some();
|
|
```
|
|
|
|
**Risk:** Database bloat for issues with many system notes.
|
|
|
|
**Verification:**
|
|
- [ ] Check actual payload storage logic in `src/ingestion/discussions.rs`
|
|
- [ ] Determine if selective storage is implemented
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 11. Full Sync (`--full`) Behavior
|
|
|
|
**PRD Deviation:**
|
|
- CP1: `--full` flag not explicitly documented
|
|
- CP2: `--full` resets cursor AND discussion watermarks
|
|
|
|
**CP2 PRD (lines 871-876):**
|
|
```rust
|
|
if full_sync {
|
|
reset_cursor(conn, project_id, "merge_requests")?;
|
|
reset_discussion_watermarks(conn, project_id)?;
|
|
}
|
|
```
|
|
|
|
**Risk:** `--full` on issues may not reset discussion watermarks.
|
|
|
|
**Verification:**
|
|
- [ ] Check `gi ingest --type=issues --full` behavior
|
|
- [ ] Verify discussion watermarks are reset
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## 12. NormalizedNote Extended Fields
|
|
|
|
**PRD Deviation:**
|
|
- CP1: Basic fields only
|
|
- CP2: Extended with `position_type`, `position_line_range_start/end`, SHA triplet
|
|
|
|
**Expected:** This is intentional for MR DiffNotes. However, the shared `NormalizedNote` struct must support both.
|
|
|
|
**Verification:**
|
|
- [ ] Check `NormalizedNote` struct definition
|
|
- [ ] Verify extended fields are present (even if NULL for issue notes)
|
|
|
|
**Status:** ⬜ Not Verified
|
|
|
|
---
|
|
|
|
## Summary Action Items
|
|
|
|
After verification, prioritize fixes:
|
|
|
|
| Priority | Item | Reason |
|
|
|----------|------|--------|
|
|
| P0 | Timestamp parsing contract | Compile errors or silent corruption |
|
|
| P0 | NormalizedDiscussion structure | Type incompatibility blocks CP2 |
|
|
| P1 | Watermark advancement logic | Data loss on partial failures |
|
|
| P1 | Stale data sweep pattern | Data accumulation over time |
|
|
| P2 | Pagination fallback chain | Robustness for edge cases |
|
|
| P2 | Cursor update strategy | Crash recovery consistency |
|
|
| P3 | Sync health telemetry | Debugging convenience |
|
|
| P3 | Raw payload storage | Database size optimization |
|
|
|
|
---
|
|
|
|
## Verification Commands
|
|
|
|
```bash
|
|
# Check transformer signatures
|
|
grep -n "fn transform_issue" src/gitlab/transformers/issue.rs
|
|
grep -n "fn transform_notes" src/gitlab/transformers/discussion.rs
|
|
|
|
# Check iso_to_ms signature
|
|
grep -n "fn iso_to_ms" src/core/time.rs
|
|
|
|
# Check NormalizedDiscussion struct
|
|
grep -A 20 "struct NormalizedDiscussion" src/gitlab/transformers/discussion.rs
|
|
|
|
# Check cursor update pattern
|
|
grep -n "update_cursor" src/ingestion/issues.rs
|
|
|
|
# Check sweep pattern
|
|
grep -n "last_seen_at" src/ingestion/discussions.rs
|
|
|
|
# Check telemetry columns
|
|
grep -n "sync_attempts\|sync_last_error" migrations/*.sql
|
|
```
|
|
|
|
---
|
|
|
|
## Notes
|
|
|
|
_Use this section to record findings during verification._
|