initial
563
PRD.md
Normal file
@@ -0,0 +1,563 @@
|
||||
# GitLab Inbox - Product Requirements Document
|
||||
|
||||
## Overview
|
||||
|
||||
**Product Name**: GitLab Inbox
|
||||
**Version**: 1.0
|
||||
**Author**: Taylor Eernisse
|
||||
**Date**: January 16, 2026
|
||||
|
||||
### Problem Statement
|
||||
|
||||
Managing GitLab activity with ADHD is overwhelming. The native GitLab interface creates cognitive overload through:
|
||||
|
||||
- **Information scatter**: Issues, MRs, and activity are spread across multiple pages
|
||||
- **Missing reply awareness**: Hard to know when someone has responded to your question (not fully covered by /todos alone)
|
||||
- **Context loss**: Difficult to find the right tab or remember which conversation you were tracking
|
||||
- **No unified "what's next"**: Multiple clicks required to understand what needs attention
|
||||
|
||||
### Solution
|
||||
|
||||
A local, always-open "inbox" application that presents GitLab notifications in an ADHD-friendly interface with explicit "handled" tracking, snooze capabilities, watchlist for awaiting replies, and progress visibility.
|
||||
|
||||
---
|
||||
|
||||
## Target User
|
||||
|
||||
**Primary Persona**: Software developer with ADHD working on 1-2 GitLab projects who needs to track conversations and respond to mentions, reviews, and assignments without cognitive overload.
|
||||
|
||||
**Key Characteristics**:
|
||||
- Needs clear "what's next" visibility
|
||||
- Benefits from external accountability (seeing who's waiting)
|
||||
- Motivated by progress tracking (watching a list shrink)
|
||||
- Prefers always-open tools over on-demand checks
|
||||
- Struggles with context switching and finding the right place
|
||||
- Needs a "not now but not forgotten" path that doesn't require willpower
|
||||
|
||||
---
|
||||
|
||||
## Goals
|
||||
|
||||
### User Goals
|
||||
1. Know immediately when someone has replied or needs my attention
|
||||
2. Quickly navigate to the right place in GitLab to respond
|
||||
3. Track what I've handled today for satisfaction and progress awareness
|
||||
4. Reduce cognitive load of manually tracking conversations
|
||||
5. Defer items temporarily without losing accountability (snooze)
|
||||
6. Know when someone has replied to something I'm waiting on
|
||||
|
||||
### Product Goals
|
||||
1. Reduce time-to-awareness for GitLab notifications
|
||||
2. Eliminate the need to manually poll GitLab for updates
|
||||
3. Provide ADHD-friendly UX patterns (clear actions, progress visibility, minimal decisions)
|
||||
4. Enable keyboard-first operation to reduce friction
|
||||
|
||||
### Non-Goals (v1.0)
|
||||
- Replacing GitLab for any write operations (commenting, reviewing, merging)
|
||||
- Supporting multiple GitLab instances
|
||||
- Team/shared usage
|
||||
- Mobile support
|
||||
|
||||
---
|
||||
|
||||
## Core Features
|
||||
|
||||
### 1. Inbox View (Primary)
|
||||
|
||||
**Description**: Display all GitLab todos (notifications) that need attention.
|
||||
|
||||
**Data Source**: GitLab `/todos` API endpoint
|
||||
|
||||
**Display Elements** (per item):
|
||||
| Element | Description |
|---------|-------------|
| Action Badge | Type indicator: mentioned, assigned, review_requested, build_failed, etc. |
| Target Title | MR or Issue title |
| Author | Who triggered this todo (name + avatar) |
| Time | Relative time since created ("2h ago", "3 days") |
| Project | Project name for context |
|
||||
|
||||
**Interactions**:
|
||||
- **Click item / Enter** → Opens target URL in browser (GitLab)
|
||||
- **Mark Handled** → Moves item to Done Today (local state only)
|
||||
- **Snooze** → Hides item until a chosen time (local state only)
|
||||
- **Dismiss** → `POST /todos/:id/mark_as_done` (marks as done in GitLab)
|
||||
|
||||
**Filtering**: Items marked as "handled" or "snoozed" locally are hidden from Inbox.
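
The filtering rule above could be sketched as a pure function. This is a minimal sketch, not the planned implementation: `TodoLite`, `SnoozeEntry`, `InboxItem`, and `visibleInbox` are hypothetical names, and it assumes a snoozed item reappears (flagged as woken) once its wake time has passed.

```typescript
// Hypothetical minimal shapes; only the fields this sketch needs.
interface TodoLite {
  id: number;
}

interface SnoozeEntry {
  wakeAt: string; // ISO timestamp
}

interface InboxItem {
  todo: TodoLite;
  wokeUp: boolean; // true when the item was snoozed and its wake time has passed
}

// Hide handled items and items that are still snoozed; keep everything else.
function visibleInbox(
  todos: TodoLite[],
  handledIds: Set<number>,
  snoozed: Map<number, SnoozeEntry>,
  now: Date
): InboxItem[] {
  const items: InboxItem[] = [];
  for (const todo of todos) {
    if (handledIds.has(todo.id)) continue;
    const snooze = snoozed.get(todo.id);
    if (snooze !== undefined && Date.parse(snooze.wakeAt) > now.getTime()) continue;
    items.push({ todo, wokeUp: snooze !== undefined });
  }
  return items;
}
```

Keeping the filter pure (state in, list out) makes it straightforward to unit test without touching the GitLab API or the state file.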
|
||||
|
||||
### 2. Snoozed View
|
||||
|
||||
**Description**: Items temporarily deferred until their wake time.
|
||||
|
||||
**Purpose**:
|
||||
- "Not now but not forgotten" path
|
||||
- Reduces inbox dread by shrinking the visible list
|
||||
- Enables focus sessions: clear the deck, then pull from Snoozed intentionally
|
||||
|
||||
**Snooze Options**:
|
||||
- Later today (3 hours)
|
||||
- Tomorrow morning (9am local)
|
||||
- Next weekday (Mon-Fri, 9am)
|
||||
- Custom date/time
|
||||
|
||||
**Behavior**:
|
||||
- Snoozed items are hidden from Inbox
|
||||
- When wake time passes, item returns to Inbox with a "Woke up" indicator
|
||||
- Snoozed view shows all snoozed items with their wake times
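
The preset snooze options could be computed along these lines. This is a sketch under assumptions: `SnoozeOption` and `computeWakeAt` are hypothetical names, "Next weekday" is read as the next calendar day that falls on Monday to Friday, and all times use the machine's local time zone.

```typescript
type SnoozeOption = "laterToday" | "tomorrowMorning" | "nextWeekday";

// Returns the wake time for a preset snooze option, relative to `now`.
function computeWakeAt(option: SnoozeOption, now: Date): Date {
  const wake = new Date(now.getTime());
  if (option === "laterToday") {
    // Fixed 3-hour offset from the current instant.
    wake.setTime(now.getTime() + 3 * 60 * 60 * 1000);
    return wake;
  }
  // Both remaining options land on 9:00 local time of a later day.
  wake.setDate(wake.getDate() + 1);
  wake.setHours(9, 0, 0, 0);
  if (option === "nextWeekday") {
    // Skip Saturday (6) and Sunday (0).
    while (wake.getDay() === 0 || wake.getDay() === 6) {
      wake.setDate(wake.getDate() + 1);
    }
  }
  return wake;
}
```

Working with local calendar fields (`setDate`, `setHours`) rather than adding fixed millisecond offsets keeps "9am" stable across daylight-saving changes.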
|
||||
|
||||
### 3. Watchlist (Awaiting Reply)
|
||||
|
||||
**Description**: Targets you're explicitly waiting on (MRs/Issues/etc.). Alerts when there is new activity since you last checked.
|
||||
|
||||
**Purpose**:
|
||||
- GitLab todos don't guarantee "someone replied" notifications
|
||||
- Explicit watch semantics for "I'm waiting on Bob" tracking
|
||||
- Extends "external accountability" in the other direction: see who you are waiting on, not only who is waiting on you
|
||||
|
||||
**Data Sources**:
|
||||
- Primary: /todos (fast path for items that generate new todos)
|
||||
- Secondary: per-target `updated_at`/notes polling for watched items (small set)
|
||||
|
||||
**Interactions**:
|
||||
- Mark Handled → optionally "Add to Watchlist" toggle
|
||||
- Watch item shows "Last seen" timestamp and "New activity" indicator
|
||||
- Click to open target in GitLab
|
||||
- Remove from watchlist when no longer waiting
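
The "New activity" indicator could be derived by comparing the target's current `updated_at` with the value recorded when the user last looked. A minimal sketch, assuming hypothetical helpers `watchKey`, `hasNewActivity`, and `markSeen`, and assuming that an entry without a baseline shows no indicator until the first poll records one:

```typescript
interface WatchEntry {
  targetType: string;
  targetId: number;
  lastSeenUpdatedAt?: string; // ISO timestamp of the target when last observed
}

// Builds keys such as "MergeRequest:123" or "Issue:456".
function watchKey(targetType: string, targetId: number): string {
  return `${targetType}:${targetId}`;
}

// True when the remote target changed after the recorded baseline.
function hasNewActivity(entry: WatchEntry, remoteUpdatedAt: string): boolean {
  if (entry.lastSeenUpdatedAt === undefined) return false; // no baseline yet
  return Date.parse(remoteUpdatedAt) > Date.parse(entry.lastSeenUpdatedAt);
}

// Records the current remote timestamp as seen (e.g. after the user opens the item).
function markSeen(entry: WatchEntry, remoteUpdatedAt: string): WatchEntry {
  return { ...entry, lastSeenUpdatedAt: remoteUpdatedAt };
}
```

Comparing parsed timestamps rather than raw strings avoids false positives when the API returns the same instant with a different UTC offset.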
|
||||
|
||||
### 4. Done Today View
|
||||
|
||||
**Description**: Items marked as handled during the current day.
|
||||
|
||||
**Purpose**:
|
||||
- ADHD-friendly progress visibility
|
||||
- Satisfaction from watching list shrink
|
||||
- Review of daily accomplishments
|
||||
|
||||
**Behavior**:
|
||||
- Stored as date-bucketed ledger keyed by local date (YYYY-MM-DD)
|
||||
- "Done Today" shows bucket for current local date
|
||||
- Option to clear today's bucket only
|
||||
- Historical buckets retained for potential "Done Yesterday" or weekly views
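
The date-bucketed ledger could look roughly like this. It is a sketch with hypothetical names (`localDateKey`, `markHandled`, `clearToday`) and a reduced entry shape; the key is built from local calendar fields because `toISOString()` would yield the UTC date instead.

```typescript
type HandledLedger = Record<string, Record<string, { handledAt: string }>>;

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// YYYY-MM-DD in the machine's local time zone.
function localDateKey(d: Date): string {
  return `${d.getFullYear()}-${pad2(d.getMonth() + 1)}-${pad2(d.getDate())}`;
}

// Returns a new ledger with the todo recorded in the bucket for `now`.
function markHandled(ledger: HandledLedger, todoId: number, now: Date): HandledLedger {
  const key = localDateKey(now);
  const bucket = { ...(ledger[key] ?? {}), [String(todoId)]: { handledAt: now.toISOString() } };
  return { ...ledger, [key]: bucket };
}

// Clears only the current local date's bucket; historical buckets are retained.
function clearToday(ledger: HandledLedger, now: Date): HandledLedger {
  const copy: HandledLedger = { ...ledger };
  delete copy[localDateKey(now)];
  return copy;
}
```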
|
||||
|
||||
### 5. Manual Refresh
|
||||
|
||||
**Description**: Button to fetch latest todos on demand.
|
||||
|
||||
**Purpose**: Immediate update when user knows something changed.
|
||||
|
||||
### 6. Background Polling (v1.1)
|
||||
|
||||
**Description**: Automatic periodic refresh of todos.
|
||||
|
||||
**Configuration**:
|
||||
- Base interval (default: 60s)
|
||||
- Backoff on failure (exponential, capped at 15m) with jitter
|
||||
- 429 handling (respect `Retry-After` header; otherwise back off)
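
One way to compute the next polling delay from these settings is sketched below. The function name `nextDelayMs` and the "equal jitter" formula (half fixed, half random) are assumptions, and only the numeric-seconds form of `Retry-After` is handled; an HTTP-date value would need separate parsing.

```typescript
const MAX_BACKOFF_MS = 15 * 60 * 1000; // cap from the configuration above

// baseMs: normal polling interval; consecutiveFailures: 0 after a success.
// retryAfterSeconds: parsed Retry-After header value, or null when absent.
function nextDelayMs(
  baseMs: number,
  consecutiveFailures: number,
  retryAfterSeconds: number | null,
  random: () => number = Math.random
): number {
  // The server's explicit instruction wins over our own schedule.
  if (retryAfterSeconds !== null && retryAfterSeconds > 0) {
    return retryAfterSeconds * 1000;
  }
  if (consecutiveFailures <= 0) return baseMs;
  const capped = Math.min(baseMs * Math.pow(2, consecutiveFailures), MAX_BACKOFF_MS);
  // Equal jitter: wait between 50% and 100% of the capped delay.
  const half = capped / 2;
  return Math.floor(half + random() * half);
}
```

Injecting `random` keeps the backoff calculation deterministic in unit tests.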
|
||||
|
||||
**Indicator**:
|
||||
- Last successful refresh time
|
||||
- Next scheduled refresh
|
||||
- Current backoff state (if any)
|
||||
|
||||
### 7. Keyboard Shortcuts (v1.0)
|
||||
|
||||
**Description**: Keyboard-first operation for reduced friction.
|
||||
|
||||
| Key | Action |
|-----|--------|
| `j` / `k` | Navigate down / up |
| `Enter` | Open selected item in GitLab |
| `h` | Mark handled |
| `s` | Snooze (opens snooze picker) |
| `d` | Dismiss (mark as done in GitLab) |
| `w` | Add to / remove from watchlist |
| `/` | Focus search/filter |
|
||||
|
||||
### 8. Focus Mode (Optional)
|
||||
|
||||
**Description**: Show only the next N items (default 3) to reduce decision load.
|
||||
|
||||
**Purpose**:
|
||||
- Convert "overwhelm" into "sequence"
|
||||
- Reduce choices, increase throughput
|
||||
- ADHD-optimized: work the queue, don't manage the list
|
||||
|
||||
**Behavior**:
|
||||
- Primary action emphasized ("Open", then "Handled"/"Snooze")
|
||||
- Toggle: Focus / All Items
|
||||
- Focus queue is top N unhandled items by creation date
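
The focus queue could be a simple sort-and-slice over the already filtered Inbox. This sketch assumes "by creation date" means newest first, matching the ordering in the Morning Check-in flow; `focusQueue` is a hypothetical name.

```typescript
// Returns the top `n` items, newest first, without mutating the input.
function focusQueue<T extends { created_at: string }>(items: T[], n: number = 3): T[] {
  return items
    .slice()
    .sort((a, b) => Date.parse(b.created_at) - Date.parse(a.created_at))
    .slice(0, n);
}
```

Because the function works on whatever list it is given, completing an item and recomputing the queue naturally pulls the next one in.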
|
||||
|
||||
---
|
||||
|
||||
## Technical Architecture
|
||||
|
||||
### Tech Stack
|
||||
|
||||
| Component | Technology |
|-----------|------------|
| Framework | TanStack Start |
| Styling | Tailwind CSS |
| Runtime | Node.js (local) |
| State Persistence | JSON file with atomic writes (local) |
| Secret Storage | OS keychain (preferred) or encrypted local store |
| GitLab Integration | REST API with Personal Access Token |
|
||||
|
||||
### Deployment
|
||||
|
||||
- **Local only**: Runs on localhost
|
||||
- **No external hosting**: No cloud deployment, no auth flows
|
||||
- **Single user**: No multi-tenancy
|
||||
|
||||
### GitLab API
|
||||
|
||||
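**Authentication**: Personal Access Token (PAT). The `read_api` scope covers fetching todos; the Dismiss action (`POST /todos/:id/mark_as_done`) is a write request and needs the `api` scope.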
|
||||
|
||||
**Primary Endpoints**:
|
||||
```
GET /api/v4/todos?state=pending&per_page=100
POST /api/v4/todos/:id/mark_as_done
```
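
The two requests could be assembled as plain data before being handed to `fetch`. This is a sketch, not the planned client: `RequestSpec`, `listTodosRequest`, and `markDoneRequest` are hypothetical names, and the `page` parameter assumes GitLab's offset pagination is used to walk past the first 100 results.

```typescript
interface RequestSpec {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
}

// Strip trailing slashes so "https://host/" and "https://host" behave the same.
function apiRoot(baseUrl: string): string {
  return baseUrl.replace(/\/+$/, "") + "/api/v4";
}

function listTodosRequest(baseUrl: string, token: string, page: number = 1): RequestSpec {
  return {
    method: "GET",
    url: `${apiRoot(baseUrl)}/todos?state=pending&per_page=100&page=${page}`,
    headers: { "PRIVATE-TOKEN": token }, // GitLab's PAT header
  };
}

function markDoneRequest(baseUrl: string, token: string, todoId: number): RequestSpec {
  return {
    method: "POST",
    url: `${apiRoot(baseUrl)}/todos/${todoId}/mark_as_done`,
    headers: { "PRIVATE-TOKEN": token },
  };
}
```

A caller would pass the result on, for example `fetch(spec.url, { method: spec.method, headers: spec.headers })`, which keeps URL construction testable without network access.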
|
||||
|
||||
**Response Structure** (relevant fields):
|
||||
```typescript
interface GitLabTodo {
  id: number;
  action_name:
    | 'assigned'
    | 'mentioned'
    | 'build_failed'
    | 'marked'
    | 'approval_required'
    | 'unmergeable'
    | 'directly_addressed'
    | 'merge_train_removed'
    | 'member_access_requested'
    | 'review_requested' // used by the Action Badge and layout examples
    | string; // forward-compatible for new action types
  target_type: 'MergeRequest' | 'Issue' | 'Commit' | 'Epic' | 'DesignManagement::Design' | string;
  target: {
    id: number;
    iid: number;
    title: string;
    web_url?: string; // optional; may not be present for all target types
  };
  target_url: string; // canonical "Open" URL - use this for navigation
  author: {
    id: number;
    name: string;
    avatar_url: string;
  };
  project: {
    id: number;
    name: string;
    path_with_namespace: string;
  };
  created_at: string;
}
```
|
||||
|
||||
### Local State
|
||||
|
||||
```typescript
interface LocalState {
  schemaVersion: number; // for migrations

  handledByDate: {
    [localDate: string]: { // YYYY-MM-DD in local time
      [todoId: number]: {
        handledAt: string; // ISO timestamp
        todo: GitLabTodo; // Snapshot for Done Today display
      }
    }
  };

  snoozedTodos: {
    [todoId: number]: {
      wakeAt: string; // ISO timestamp
      snoozedAt: string; // ISO timestamp
      todo: GitLabTodo; // snapshot
    }
  };

  watchlist: {
    [watchKey: string]: { // e.g., "MergeRequest:123" or "Issue:456"
      targetType: string;
      projectId?: number;
      targetId: number;
      targetIid?: number;
      targetUrl: string;
      lastSeenUpdatedAt?: string; // ISO - when we last observed the target
      lastCheckedAt?: string; // ISO - when we last polled
      addedAt: string; // ISO
      muted?: boolean;
    }
  };
}
```
|
||||
|
||||
**Storage**: `~/.config/gitlab-inbox/state.json`
|
||||
|
||||
**Persistence Strategy**:
|
||||
- Atomic writes: write to `state.json.tmp`, then rename to `state.json`
|
||||
- Keep `state.json.bak` as last-known-good before each write
|
||||
- Validate JSON schema on load; if invalid, fall back to backup and surface warning
|
||||
- Schema version for forward migrations
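
The persistence strategy could be implemented with Node's synchronous `fs` calls roughly as follows. This is a sketch: `saveState`, `loadState`, and `readValidState` are hypothetical names, validation is reduced to checking that `schemaVersion` is a number, and the backup is refreshed only when the current file still validates so that it stays last-known-good.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

interface PersistedState {
  schemaVersion: number;
}

// Default location from the Storage line above.
const DEFAULT_STATE_FILE = path.join(os.homedir(), ".config", "gitlab-inbox", "state.json");

// Returns the parsed state, or null if the file is missing, unparsable, or invalid.
function readValidState(file: string): PersistedState | null {
  try {
    const parsed: unknown = JSON.parse(fs.readFileSync(file, "utf8"));
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { schemaVersion?: unknown }).schemaVersion === "number"
    ) {
      return parsed as PersistedState;
    }
    return null;
  } catch {
    return null;
  }
}

function saveState(file: string, state: PersistedState): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  // Keep the previous file as backup, but only if it is still valid.
  if (readValidState(file) !== null) fs.copyFileSync(file, file + ".bak");
  // Write the new content beside the target, then swap it in with a rename.
  fs.writeFileSync(file + ".tmp", JSON.stringify(state, null, 2), "utf8");
  fs.renameSync(file + ".tmp", file);
}

function loadState(file: string): { state: PersistedState; recoveredFromBackup: boolean } {
  const primary = readValidState(file);
  if (primary !== null) return { state: primary, recoveredFromBackup: false };
  const backup = readValidState(file + ".bak");
  if (backup !== null) return { state: backup, recoveredFromBackup: true };
  return { state: { schemaVersion: 1 }, recoveredFromBackup: false }; // first run
}
```

The `recoveredFromBackup` flag gives the UI something concrete to surface as the warning mentioned above.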
|
||||
|
||||
---
|
||||
|
||||
## User Interface
|
||||
|
||||
### Layout
|
||||
|
||||
```
|
||||
+--------------------------------------------------+
|
||||
| GitLab Inbox [Focus] [Refresh] 🟢 2m ago |
|
||||
| [Inbox] [Snoozed] [Watchlist] [Done Today] |
|
||||
+--------------------------------------------------+
|
||||
| |
|
||||
| > [mentioned] Fix login bug |
|
||||
| Alice Smith · infra-frontend · 2h ago |
|
||||
| [Snooze] [Handle] [Open] |
|
||||
| |
|
||||
| [review_requested] Add caching layer |
|
||||
| Bob Jones · api-service · 1d ago |
|
||||
| [Snooze] [Handle] [Open] |
|
||||
| |
|
||||
| [assigned] Update documentation |
|
||||
| Carol White · docs · 3d ago |
|
||||
| [Snooze] [Handle] [Open] |
|
||||
| |
|
||||
+--------------------------------------------------+
|
||||
| j/k: navigate Enter: open h: handle s: snooze |
|
||||
+--------------------------------------------------+
|
||||
```
|
||||
|
||||
### Action Badge Colors
|
||||
|
||||
| Action | Color | Meaning |
|--------|-------|---------|
| mentioned | Blue | Someone mentioned you |
| assigned | Purple | Assigned to you |
| approval_required | Yellow | Needs your approval |
| build_failed | Red | Pipeline failure |
| directly_addressed | Cyan | Direct @ mention |
| unmergeable | Orange | MR has conflicts |
| marked | Gray | Marked as todo |
| merge_train_removed | Red | Removed from merge train |
| member_access_requested | Teal | Access request |
| (unknown) | Gray | Forward-compatible fallback |
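
The color table, including its fallback row, could be expressed as a lookup. A minimal sketch; `badgeColor` is a hypothetical helper and the color names stand in for whatever Tailwind classes the UI ends up using.

```typescript
const BADGE_COLORS = new Map<string, string>([
  ["mentioned", "Blue"],
  ["assigned", "Purple"],
  ["approval_required", "Yellow"],
  ["build_failed", "Red"],
  ["directly_addressed", "Cyan"],
  ["unmergeable", "Orange"],
  ["marked", "Gray"],
  ["merge_train_removed", "Red"],
  ["member_access_requested", "Teal"],
]);

// Unknown or future action names fall back to Gray.
function badgeColor(actionName: string): string {
  return BADGE_COLORS.get(actionName) ?? "Gray";
}
```

A `Map` is used instead of a plain object so that action names such as `constructor` cannot accidentally resolve to inherited properties.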
|
||||
|
||||
### States
|
||||
|
||||
- **Loading**: Skeleton cards while fetching
|
||||
- **Empty**: "All clear! No pending items." message
|
||||
- **Error**: Connection error with retry button
|
||||
- **Stale**: Visual indicator if data is old (> 5 min since last *successful* refresh)
|
||||
- **Backoff**: Indicator showing retry status when experiencing errors
|
||||
|
||||
---
|
||||
|
||||
## User Flows
|
||||
|
||||
### Flow 1: Morning Check-in
|
||||
1. Open GitLab Inbox (already running in background tab)
|
||||
2. See list of todos sorted by newest first
|
||||
3. Press `Enter` to open in GitLab
|
||||
4. Handle the item (reply, review, etc.)
|
||||
5. Return to Inbox, press `h` to mark handled
|
||||
6. Item moves to Done Today
|
||||
7. Repeat until Inbox is empty
|
||||
|
||||
### Flow 2: Triage with Snooze
|
||||
1. See inbox with 12 items
|
||||
2. Quickly triage: handle 3, snooze 5 until tomorrow, dismiss 2 already-resolved
|
||||
3. Inbox now shows 2 items to focus on
|
||||
4. Tomorrow: snoozed items wake up and return to inbox
|
||||
|
||||
### Flow 3: Awaiting Reply
|
||||
1. Handle a todo (you replied to someone's question)
|
||||
2. Toggle "Add to Watchlist" when marking handled
|
||||
3. Item appears in Watchlist view
|
||||
4. Later: see "New activity" indicator when they respond
|
||||
5. Open, read response, remove from watchlist
|
||||
|
||||
### Flow 4: Focus Session
|
||||
1. Enable Focus Mode
|
||||
2. See only top 3 items
|
||||
3. Work through them sequentially
|
||||
4. As items complete, next ones appear
|
||||
5. Reduced decision fatigue
|
||||
|
||||
### Flow 5: End-of-Day Review
|
||||
1. Navigate to Done Today view
|
||||
2. See all items handled today
|
||||
3. Satisfaction from visible progress
|
||||
|
||||
---
|
||||
|
||||
## Success Metrics
|
||||
|
||||
| Metric | Target | Measurement |
|--------|--------|-------------|
| Time to awareness | < 2 min | Time from GitLab event to user seeing it |
| Daily items handled | Increased | Compare to baseline (manual tracking) |
| Context switches | Reduced | Fewer GitLab tabs open simultaneously |
| Snooze usage | Regular | Ratio of snoozed to dismissed items (healthy when snooze is actually being used) |
| Reply awareness | High | Watchlist items caught before manual check |
| User satisfaction | Qualitative | Does this reduce ADHD-related friction? |
|
||||
|
||||
---
|
||||
|
||||
## Risks and Mitigations
|
||||
|
||||
| Risk | Impact | Mitigation |
|------|--------|------------|
| GitLab API rate limits | Polling blocked | Configurable interval, backoff + jitter, respect 429/Retry-After |
| Token expiration/rotation | App stops working | Clear error state + setup flow; surface expiry guidance and re-auth path |
| State file corruption | Lose handled/snoozed/watch state | Atomic writes (tmp+rename), schema validation on load, keep last-known-good backup |
| GitLab API changes | App breaks | Pin to known API version, monitor deprecations, forward-compatible types |
| Token leakage | Security incident | Store in OS keychain, not in repo-adjacent files |
|
||||
|
||||
---
|
||||
|
||||
## Future Considerations (Post v1.0)
|
||||
|
||||
- **Grouping**: By project, by action type
|
||||
- **Stale highlighting**: Visual alert for items waiting > X days
|
||||
- **Desktop notifications**: OS-level alerts for new high-priority items
|
||||
- **Quick actions**: Approve MR, close issue directly from app
|
||||
- **Multiple GitLab instances**: Connect to both gitlab.com and self-hosted
|
||||
- **Done history**: View handled items from yesterday, this week
|
||||
|
||||
---
|
||||
|
||||
## Implementation Phases
|
||||
|
||||
### Phase 0: Setup & Auth
|
||||
- First-run setup wizard (GitLab URL + token)
|
||||
- Token storage implementation (keychain/encrypted local)
|
||||
- Connectivity check (`/todos`, auth failure UX)
|
||||
- Clear error states for invalid/expired tokens
|
||||
|
||||
### Phase 1: Foundation
|
||||
- Initialize TanStack Start project
|
||||
- Set up Tailwind CSS
|
||||
- Create GitLab API client with PAT auth
|
||||
- Fetch and display todos in basic list (using `target_url` for navigation)
|
||||
- Implement click-to-open
|
||||
|
||||
### Phase 2: Core Workflow
|
||||
- Add local storage with atomic writes + backup
|
||||
- Implement date-bucketed handled state
|
||||
- Implement "Mark Handled" action
|
||||
- Create Done Today view
|
||||
- Add keyboard shortcuts (minimal set: j/k/Enter/h/s/d)
|
||||
- Add Snooze + Snoozed view
|
||||
- Filter handled/snoozed todos from Inbox
|
||||
|
||||
### Phase 3: Reliability & Awareness
|
||||
- Background polling with configurable interval
|
||||
- Backoff/jitter + 429 handling
|
||||
- Last successful refresh tracking
|
||||
- Watchlist ("Awaiting Reply") implementation
|
||||
- Per-target polling for watched items (small set)
|
||||
- Add manual refresh button
|
||||
- Relative time display
|
||||
- Action type badges with colors
|
||||
- Loading and error states
|
||||
- Connection status indicator
|
||||
|
||||
### Phase 4: Polish
|
||||
- Focus Mode implementation
|
||||
- Snooze time picker refinement
|
||||
- Keyboard shortcut help overlay
|
||||
- State migration handling (schemaVersion)
|
||||
- Edge case handling (DST, timezone changes)
|
||||
|
||||
---
|
||||
|
||||
## Appendix
|
||||
|
||||
### Environment Configuration
|
||||
|
||||
**Primary configuration**:
|
||||
- URL + settings in: `~/.config/gitlab-inbox/config.json`
|
||||
- Token stored in OS keychain (preferred)
|
||||
|
||||
**Optional (dev-only) `.env.local` support**:
|
||||
```env
GITLAB_URL=https://gitlab.yourcompany.com
GITLAB_TOKEN=glpat-xxxxxxxxxxxx
```
|
||||
|
||||
**Config file structure**:
|
||||
```json
{
  "gitlabUrl": "https://gitlab.yourcompany.com",
  "pollingInterval": 60,
  "focusModeCount": 3
}
```
|
||||
|
||||
### Creating a GitLab PAT
|
||||
|
||||
1. Go to GitLab → User Settings → Access Tokens
|
||||
2. Create token with `read_api` scope (choose `api` instead if you want the Dismiss action, which writes via `mark_as_done`)
|
||||
3. Set expiration (note: tokens expire at midnight UTC on expiry date)
|
||||
4. Save token via setup wizard (stored in keychain)
|
||||
5. Token never leaves local machine
|
||||
|
||||
### Project Structure
|
||||
|
||||
```
|
||||
gitlab-inbox/
|
||||
├── app/
|
||||
│ ├── routes/
|
||||
│ │ ├── __root.tsx
|
||||
│ │ ├── index.tsx # Inbox view
|
||||
│ │ ├── snoozed.tsx # Snoozed view
|
||||
│ │ ├── watchlist.tsx # Watchlist view
|
||||
│ │ ├── done.tsx # Done Today view
|
||||
│ │ └── setup.tsx # First-run setup
|
||||
│ ├── components/
|
||||
│ │ ├── TodoCard.tsx
|
||||
│ │ ├── TodoList.tsx
|
||||
│ │ ├── ActionBadge.tsx
|
||||
│ │ ├── Header.tsx
|
||||
│ │ ├── SnoozePicker.tsx
|
||||
│ │ ├── FocusMode.tsx
|
||||
│ │ └── KeyboardHelp.tsx
|
||||
│ ├── lib/
|
||||
│ │ ├── gitlab.ts # API client
|
||||
│ │ ├── storage.ts # Atomic state persistence
|
||||
│ │ ├── keychain.ts # Token storage
|
||||
│ │ ├── polling.ts # Polling state machine
|
||||
│ │ ├── snooze.ts # Snooze logic + wake checking
|
||||
│ │ ├── watchlist.ts # Watchlist polling
|
||||
│ │ └── types.ts
|
||||
│ └── app.tsx
|
||||
├── package.json
|
||||
├── tailwind.config.ts
|
||||
└── vite.config.ts
|
||||
```
|
||||
|
||||
### Test Strategy
|
||||
|
||||
**Unit Tests**:
|
||||
- State normalization and migration
|
||||
- Snooze wake time calculations
|
||||
- Date bucketing logic (timezone handling)
|
||||
- Polling backoff calculations
|
||||
|
||||
**Integration Tests** (mocked GitLab API):
|
||||
- `/todos` response parsing
|
||||
- `mark_as_done` endpoint calls
|
||||
- Error handling (401, 429, network errors)
|
||||
- State persistence round-trip (write + read)
|
||||
- Backup recovery on corruption
|
||||
|
||||
**Manual Testing**:
|
||||
- First-run setup flow
|
||||
- Keyboard navigation
|
||||
- Snooze + wake cycle
|
||||
- Watchlist activity detection
|
||||
641
SPEC.md
Normal file
@@ -0,0 +1,641 @@
|
||||
# GitLab Knowledge Engine - Spec Document
|
||||
|
||||
## Executive Summary
|
||||
|
||||
A self-hosted tool to extract, index, and semantically search 2+ years of GitLab data (issues, MRs, comments/notes, and MR file-change links) from 2 main repositories (~10K items). The MVP delivers semantic search as a foundational capability that enables future specialized views (file history, personal tracking, person context). Commit-level indexing is explicitly post-MVP.
|
||||
|
||||
---
|
||||
|
||||
## Discovery Summary
|
||||
|
||||
### Pain Points Identified
|
||||
1. **Knowledge discovery** - Tribal knowledge buried in old MRs/issues that nobody can find
|
||||
2. **Decision traceability** - Hard to find *why* decisions were made; context scattered across issue comments and MR discussions
|
||||
|
||||
### Constraints
|
||||
| Constraint | Detail |
|------------|--------|
| Hosting | Self-hosted only, no external APIs |
| Compute | Local dev machine (M-series Mac assumed) |
| GitLab Access | Self-hosted instance, PAT access, no webhooks (could request) |
| Build Method | AI agents will implement; user is TypeScript expert for review |
|
||||
|
||||
### Target Use Cases (Priority Order)
|
||||
1. **MVP: Semantic Search** - "Find discussions about authentication redesign"
|
||||
2. **Future: File/Feature History** - "What decisions were made about src/auth/login.ts?"
|
||||
3. **Future: Personal Tracking** - "What am I assigned to or mentioned in?"
|
||||
4. **Future: Person Context** - "What's @johndoe's background in this project?"
|
||||
|
||||
---
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ GitLab API │
|
||||
│ (Issues, MRs, Notes) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
(Commit-level indexing explicitly post-MVP)
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Data Ingestion Layer │
|
||||
│ - Incremental sync (PAT-based polling) │
|
||||
│ - Rate limiting / backoff │
|
||||
│ - Raw JSON storage for replay │
|
||||
│ - Dependent resource fetching (notes, MR changes) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Data Processing Layer │
|
||||
│ - Normalize artifacts to unified schema │
|
||||
│ - Extract searchable documents (canonical text + metadata) │
|
||||
│ - Content hashing for change detection │
|
||||
│ - Build relationship graph (issue↔MR↔note↔file) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Storage Layer │
|
||||
│ - SQLite + sqlite-vss + FTS5 (hybrid search) │
|
||||
│ - Structured metadata in relational tables │
|
||||
│ - Vector embeddings for semantic search │
|
||||
│ - Full-text index for lexical search fallback │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ Query Interface │
|
||||
│ - CLI for human testing │
|
||||
│ - JSON API for AI agent testing │
|
||||
│ - Semantic search with filters (author, date, type, label) │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Technology Choices
|
||||
|
||||
| Component | Recommendation | Rationale |
|-----------|---------------|-----------|
| Language | TypeScript/Node.js | User expertise, good GitLab libs, AI agent friendly |
| Database | SQLite + sqlite-vss | Zero-config, portable, vector search via the sqlite-vss extension |
| Embeddings | Ollama + nomic-embed-text | Self-hosted, runs well on Apple Silicon, 768-dim vectors |
| CLI Framework | Commander.js or oclif | Standard, well-documented |
|
||||
|
||||
### Alternative Considered: Postgres + pgvector
|
||||
- Pros: More scalable, better for production multi-user
|
||||
- Cons: Requires running Postgres, heavier setup
|
||||
- Decision: Start with SQLite for simplicity; migration path exists if needed
|
||||
|
||||
---
|
||||
|
||||
## Checkpoint Structure
|
||||
|
||||
Each checkpoint is a **testable milestone** where a human can validate the system works before proceeding.
|
||||
|
||||
### Checkpoint 0: Project Setup
|
||||
**Deliverable:** Scaffolded project with GitLab API connection verified
|
||||
|
||||
**Tests:**
|
||||
1. Run `gitlab-engine auth-test` → returns authenticated user info
|
||||
2. Run `gitlab-engine doctor` → verifies:
   - Can reach GitLab baseUrl
   - PAT is present and can read configured projects
   - SQLite opens DB and migrations apply
   - Ollama reachable OR embedding disabled with clear warning
|
||||
|
||||
**Scope:**
|
||||
- Project structure (TypeScript, ESLint, Vitest)
|
||||
- GitLab API client with PAT authentication
|
||||
- Environment and project configuration
|
||||
- Basic CLI scaffold with `auth-test` command
|
||||
- `doctor` command for environment verification
|
||||
- Projects table and initial sync
|
||||
|
||||
**Configuration (MVP)**, stored in `gitlab-engine.config.json`:

```json
{
  "gitlab": {
    "baseUrl": "https://gitlab.example.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project-one" },
    { "path": "group/project-two" }
  ],
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "baseUrl": "http://localhost:11434"
  }
}
```
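
A loader for this configuration might resolve the token indirectly through `tokenEnvVar` and turn each project path into an API path. The sketch below uses hypothetical names (`EngineConfig`, `resolveToken`, `projectApiPath`); the environment is passed in as a plain object so the logic can be tested without touching `process.env`.

```typescript
interface EngineConfig {
  gitlab: { baseUrl: string; tokenEnvVar: string };
  projects: { path: string }[];
  embedding: { provider: string; model: string; baseUrl: string };
}

// Looks up the PAT in the environment variable named by the config.
function resolveToken(config: EngineConfig, env: Record<string, string | undefined>): string {
  const token = env[config.gitlab.tokenEnvVar];
  if (token === undefined || token.trim() === "") {
    throw new Error(`Environment variable ${config.gitlab.tokenEnvVar} is not set`);
  }
  return token;
}

// GitLab accepts a URL-encoded namespaced path in place of the numeric project ID.
function projectApiPath(projectPath: string): string {
  return `/api/v4/projects/${encodeURIComponent(projectPath)}`;
}
```

Storing only the variable name in the config file keeps the token itself out of anything that might be committed.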
|
||||
|
||||
**DB Runtime Defaults (Checkpoint 0):**
|
||||
- On every connection:
  - `PRAGMA journal_mode=WAL;`
  - `PRAGMA foreign_keys=ON;`
|
||||
|
||||
**Schema (Checkpoint 0):**
|
||||
```sql
|
||||
-- Projects table (configured targets)
|
||||
CREATE TABLE projects (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_project_id INTEGER UNIQUE NOT NULL,
|
||||
path_with_namespace TEXT NOT NULL,
|
||||
default_branch TEXT,
|
||||
web_url TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
raw_payload_id INTEGER REFERENCES raw_payloads(id)
|
||||
);
|
||||
CREATE INDEX idx_projects_path ON projects(path_with_namespace);
|
||||
|
||||
-- Sync tracking for reliability
|
||||
CREATE TABLE sync_runs (
|
||||
id INTEGER PRIMARY KEY,
|
||||
started_at INTEGER NOT NULL,
|
||||
finished_at INTEGER,
|
||||
status TEXT NOT NULL, -- 'running' | 'succeeded' | 'failed'
|
||||
command TEXT NOT NULL, -- 'ingest issues' | 'sync' | etc.
|
||||
error TEXT
|
||||
);
|
||||
|
||||
-- Sync cursors for primary resources only
|
||||
-- Notes and MR changes are dependent resources (fetched via parent updates)
|
||||
CREATE TABLE sync_cursors (
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
resource_type TEXT NOT NULL, -- 'issues' | 'merge_requests'
|
||||
updated_at_cursor INTEGER, -- last fully processed updated_at (ms epoch)
|
||||
tie_breaker_id INTEGER, -- last fully processed gitlab_id (for stable ordering)
|
||||
PRIMARY KEY(project_id, resource_type)
|
||||
);
|
||||
|
||||
-- Raw payload storage (decoupled from entity tables)
|
||||
CREATE TABLE raw_payloads (
|
||||
id INTEGER PRIMARY KEY,
|
||||
source TEXT NOT NULL, -- 'gitlab'
|
||||
resource_type TEXT NOT NULL, -- 'project' | 'issue' | 'mr' | 'note'
|
||||
gitlab_id INTEGER NOT NULL,
|
||||
fetched_at INTEGER NOT NULL,
|
||||
json TEXT NOT NULL
|
||||
);
|
||||
CREATE INDEX idx_raw_payloads_lookup ON raw_payloads(resource_type, gitlab_id);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Checkpoint 1: Issue Ingestion
|
||||
**Deliverable:** All issues from target repos stored locally
|
||||
|
||||
**Test:** Run `gitlab-engine ingest --type=issues` → count matches GitLab; run `gitlab-engine list issues --limit=10` → displays issues correctly
|
||||
|
||||
**Scope:**
|
||||
- Issue fetcher with pagination handling
|
||||
- Raw JSON storage in raw_payloads table
|
||||
- Normalized issue schema in SQLite
|
||||
- Labels ingestion derived from issue payload:
  - Always persist label names from `labels: string[]`
  - Optionally request `with_labels_details=true` to capture color/description when available
|
||||
- Incremental sync support (run tracking + per-project cursor)
|
||||
- Basic list/count CLI commands
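
Because the `labels` field arrives either as plain names or, with `with_labels_details=true`, as objects, a small normalization step could feed the `labels` table. This is a sketch under assumptions: `LabelPayload`, `LabelRow`, and `normalizeLabels` are hypothetical names, and only `id`, `name`, `color`, and `description` are read from the detailed form.

```typescript
type LabelPayload =
  | string
  | { id?: number; name: string; color?: string | null; description?: string | null };

interface LabelRow {
  gitlabId: number | null; // only known when label details were requested
  name: string;
  color: string | null;
  description: string | null;
}

// Converts either payload shape into rows, de-duplicated by label name.
function normalizeLabels(labels: LabelPayload[]): LabelRow[] {
  const byName = new Map<string, LabelRow>();
  for (const label of labels) {
    const row: LabelRow =
      typeof label === "string"
        ? { gitlabId: null, name: label, color: null, description: null }
        : {
            gitlabId: label.id ?? null,
            name: label.name,
            color: label.color ?? null,
            description: label.description ?? null,
          };
    if (!byName.has(row.name)) byName.set(row.name, row);
  }
  return Array.from(byName.values());
}
```

De-duplicating by name mirrors the `(project_id, name)` uniqueness used for labels within a single project.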
|
||||
|
||||
**Reliability/Idempotency Rules:**
|
||||
- Every ingest/sync creates a `sync_runs` row
|
||||
- Single-flight: refuse to start if an existing run is `running` (unless `--force`)
|
||||
- Cursor advances only after successful transaction commit per page/batch
|
||||
- Ordering: `updated_at ASC`, tie-breaker `gitlab_id ASC`
|
||||
- Use explicit transactions for batch inserts
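
The cursor rules above could be captured in three small functions. This is a sketch: `SyncCursor`, `SyncItem`, `sortForSync`, `isAfterCursor`, and `advanceCursor` are hypothetical names, timestamps are assumed to be millisecond epochs, and the caller is assumed to invoke `advanceCursor` only after the batch's transaction has committed.

```typescript
interface SyncCursor {
  updatedAt: number; // last fully processed updated_at (ms epoch)
  tieBreakerId: number; // last fully processed gitlab_id
}

interface SyncItem {
  gitlabId: number;
  updatedAt: number;
}

// Stable processing order: updated_at ASC, then gitlab_id ASC.
function sortForSync(items: SyncItem[]): SyncItem[] {
  return items.slice().sort((a, b) => a.updatedAt - b.updatedAt || a.gitlabId - b.gitlabId);
}

// True when the item has not yet been covered by the cursor.
function isAfterCursor(item: SyncItem, cursor: SyncCursor | null): boolean {
  if (cursor === null) return true;
  if (item.updatedAt !== cursor.updatedAt) return item.updatedAt > cursor.updatedAt;
  return item.gitlabId > cursor.tieBreakerId;
}

// Call only after the batch has been committed; expects a batch in sync order.
function advanceCursor(cursor: SyncCursor | null, committedBatch: SyncItem[]): SyncCursor | null {
  if (committedBatch.length === 0) return cursor;
  const last = committedBatch[committedBatch.length - 1];
  return { updatedAt: last.updatedAt, tieBreakerId: last.gitlabId };
}
```

The tie-breaker matters when several items share the same `updated_at`: without it, a crash mid-page could cause items with an identical timestamp to be skipped or reprocessed on resume.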
|
||||
|
||||
**Schema Preview:**
|
||||
```sql
|
||||
CREATE TABLE issues (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER UNIQUE NOT NULL,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
iid INTEGER NOT NULL,
|
||||
title TEXT,
|
||||
description TEXT,
|
||||
state TEXT,
|
||||
author_username TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
web_url TEXT,
|
||||
raw_payload_id INTEGER REFERENCES raw_payloads(id)
|
||||
);
|
||||
CREATE INDEX idx_issues_project_updated ON issues(project_id, updated_at);
|
||||
CREATE INDEX idx_issues_author ON issues(author_username);
|
||||
|
||||
-- Labels are derived from issue payloads (string array)
|
||||
-- Uniqueness is (project_id, name) since gitlab_id isn't always available
|
||||
CREATE TABLE labels (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER, -- optional (only if available)
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
name TEXT NOT NULL,
|
||||
color TEXT,
|
||||
description TEXT
|
||||
);
|
||||
CREATE UNIQUE INDEX uq_labels_project_name ON labels(project_id, name);
|
||||
CREATE INDEX idx_labels_name ON labels(name);
|
||||
|
||||
CREATE TABLE issue_labels (
|
||||
issue_id INTEGER REFERENCES issues(id),
|
||||
label_id INTEGER REFERENCES labels(id),
|
||||
PRIMARY KEY(issue_id, label_id)
|
||||
);
|
||||
CREATE INDEX idx_issue_labels_label ON issue_labels(label_id);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Checkpoint 2: MR + Comments + File Links Ingestion
|
||||
**Deliverable:** All MRs, discussion threads, and file-change links stored locally
|
||||
|
||||
**Test:** Run `gitlab-engine ingest --type=merge_requests` → count matches; run `gitlab-engine show mr 1234` → displays MR with comments and files changed
|
||||
|
||||
**Scope:**
|
||||
- MR fetcher with pagination
|
||||
- Notes fetcher (issue notes + MR notes) as a dependent resource:
  - During initial ingest: fetch notes for every issue/MR
  - During sync: refetch notes only for issues/MRs updated since cursor
- MR changes/diffs fetcher as a dependent resource:
  - During initial ingest: fetch changes for every MR
  - During sync: refetch changes only for MRs updated since cursor
|
||||
- Relationship linking (note → parent issue/MR via foreign keys, MR → files)
|
||||
- Extended CLI commands for MR display
|
||||
|
||||
**Schema Additions:**
|
||||
```sql
|
||||
CREATE TABLE merge_requests (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER UNIQUE NOT NULL,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
iid INTEGER NOT NULL,
|
||||
title TEXT,
|
||||
description TEXT,
|
||||
state TEXT,
|
||||
author_username TEXT,
|
||||
source_branch TEXT,
|
||||
target_branch TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
merged_at INTEGER,
|
||||
web_url TEXT,
|
||||
raw_payload_id INTEGER REFERENCES raw_payloads(id)
|
||||
);
|
||||
CREATE INDEX idx_mrs_project_updated ON merge_requests(project_id, updated_at);
|
||||
CREATE INDEX idx_mrs_author ON merge_requests(author_username);
|
||||
|
||||
-- Notes with explicit parent foreign keys for referential integrity
|
||||
CREATE TABLE notes (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER UNIQUE NOT NULL,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
issue_id INTEGER REFERENCES issues(id),
|
||||
merge_request_id INTEGER REFERENCES merge_requests(id),
|
||||
noteable_type TEXT NOT NULL, -- 'Issue' | 'MergeRequest'
|
||||
noteable_iid INTEGER NOT NULL, -- parent IID (from API path)
|
||||
author_username TEXT,
|
||||
body TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
system BOOLEAN,
|
||||
raw_payload_id INTEGER REFERENCES raw_payloads(id),
|
||||
-- Exactly one parent FK must be set
|
||||
CHECK (
|
||||
(noteable_type='Issue' AND issue_id IS NOT NULL AND merge_request_id IS NULL) OR
|
||||
(noteable_type='MergeRequest' AND merge_request_id IS NOT NULL AND issue_id IS NULL)
|
||||
)
|
||||
);
|
||||
CREATE INDEX idx_notes_issue ON notes(issue_id);
|
||||
CREATE INDEX idx_notes_mr ON notes(merge_request_id);
|
||||
CREATE INDEX idx_notes_author ON notes(author_username);
|
||||
|
||||
-- File linkage for "what MRs touched this file?" queries (with rename support)
|
||||
CREATE TABLE mr_files (
|
||||
id INTEGER PRIMARY KEY,
|
||||
merge_request_id INTEGER REFERENCES merge_requests(id),
|
||||
old_path TEXT,
|
||||
new_path TEXT,
|
||||
new_file BOOLEAN,
|
||||
deleted_file BOOLEAN,
|
||||
renamed_file BOOLEAN,
|
||||
UNIQUE(merge_request_id, old_path, new_path)
|
||||
);
|
||||
CREATE INDEX idx_mr_files_old_path ON mr_files(old_path);
|
||||
CREATE INDEX idx_mr_files_new_path ON mr_files(new_path);
|
||||
|
||||
-- MR labels (reuse same labels table)
|
||||
CREATE TABLE mr_labels (
|
||||
merge_request_id INTEGER REFERENCES merge_requests(id),
|
||||
label_id INTEGER REFERENCES labels(id),
|
||||
PRIMARY KEY(merge_request_id, label_id)
|
||||
);
|
||||
CREATE INDEX idx_mr_labels_label ON mr_labels(label_id);
|
||||
```
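The `CHECK` constraint on `notes` is what enforces "exactly one parent". As a rough illustration, the Python sketch below recreates a trimmed `notes` table (only the columns the constraint touches, which is an assumption for brevity) and shows which inserts SQLite accepts. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes (
  id INTEGER PRIMARY KEY,
  issue_id INTEGER,
  merge_request_id INTEGER,
  noteable_type TEXT NOT NULL,
  CHECK (
    (noteable_type='Issue' AND issue_id IS NOT NULL AND merge_request_id IS NULL) OR
    (noteable_type='MergeRequest' AND merge_request_id IS NOT NULL AND issue_id IS NULL)
  )
);
""")

def try_insert(noteable_type, issue_id, merge_request_id):
    """Return True if the row satisfies the CHECK constraint, False if SQLite rejects it."""
    try:
        conn.execute(
            "INSERT INTO notes (noteable_type, issue_id, merge_request_id) VALUES (?, ?, ?)",
            (noteable_type, issue_id, merge_request_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "issue note": try_insert("Issue", 1, None),
    "mr note": try_insert("MergeRequest", None, 2),
    "type/parent mismatch": try_insert("Issue", None, 2),
    "both parents": try_insert("Issue", 1, 2),
    "no parent": try_insert("MergeRequest", None, None),
}
```

Only the first two rows are stored. The other three raise `sqlite3.IntegrityError`, so a mislinked note fails at write time rather than surfacing later as an orphan.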
|
||||
|
||||
---
|
||||
|
||||
### Checkpoint 3: Embedding Generation
|
||||
**Deliverable:** Vector embeddings generated for all text content
|
||||
|
||||
**Test:** Run `gitlab-engine embed --all` → progress indicator; run `gitlab-engine stats` → shows embedding coverage percentage
|
||||
|
||||
**Scope:**
|
||||
- Ollama integration (nomic-embed-text model)
|
||||
- Embedding generation pipeline (batch processing)
|
||||
- Vector storage in SQLite (sqlite-vss extension)
|
||||
- Progress tracking and resumability
|
||||
- Document extraction layer:
|
||||
- Canonical "search documents" derived from issues/MRs/notes
|
||||
- Stable content hashing for change detection (SHA-256 of content_text)
|
||||
- Single embedding per document (chunking deferred to post-MVP)
|
||||
- Denormalized metadata for fast filtering (author, labels, dates)
|
||||
- Fast label filtering via `document_labels` join table
|
||||
|
||||
**Schema Additions:**
|
||||
```sql
|
||||
-- Unified searchable documents (derived from issues/MRs/notes)
|
||||
CREATE TABLE documents (
|
||||
id INTEGER PRIMARY KEY,
|
||||
source_type TEXT NOT NULL, -- 'issue' | 'merge_request' | 'note'
|
||||
source_id INTEGER NOT NULL, -- local DB id in the source table
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
author_username TEXT,
|
||||
label_names TEXT, -- JSON array (display/debug only)
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
url TEXT,
|
||||
title TEXT, -- null for notes
|
||||
content_text TEXT NOT NULL, -- canonical text for embedding/snippets
|
||||
content_hash TEXT NOT NULL, -- SHA-256 for change detection
|
||||
UNIQUE(source_type, source_id)
|
||||
);
|
||||
CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
|
||||
CREATE INDEX idx_documents_author ON documents(author_username);
|
||||
CREATE INDEX idx_documents_source ON documents(source_type, source_id);
|
||||
|
||||
-- Fast label filtering for documents (indexed exact-match)
|
||||
CREATE TABLE document_labels (
|
||||
document_id INTEGER NOT NULL REFERENCES documents(id),
|
||||
label_name TEXT NOT NULL,
|
||||
PRIMARY KEY(document_id, label_name)
|
||||
);
|
||||
CREATE INDEX idx_document_labels_label ON document_labels(label_name);
|
||||
|
||||
-- sqlite-vss virtual table
|
||||
-- Storage rule: embeddings.rowid = documents.id
|
||||
CREATE VIRTUAL TABLE embeddings USING vss0(
|
||||
embedding(768)
|
||||
);
|
||||
|
||||
-- Embedding provenance + change detection
|
||||
-- document_id is PRIMARY KEY and equals embeddings.rowid
|
||||
CREATE TABLE embedding_metadata (
|
||||
document_id INTEGER PRIMARY KEY REFERENCES documents(id),
|
||||
model TEXT NOT NULL, -- 'nomic-embed-text'
|
||||
dims INTEGER NOT NULL, -- 768
|
||||
content_hash TEXT NOT NULL, -- copied from documents.content_hash
|
||||
created_at INTEGER NOT NULL
|
||||
);
|
||||
```
|
||||
|
||||
**Storage Rule (MVP):**
|
||||
- Insert embedding with `rowid = documents.id`
|
||||
- Upsert `embedding_metadata` by `document_id`
|
||||
- This alignment simplifies joins and eliminates rowid mapping fragility
|
||||
|
||||
**Document Extraction Rules:**
|
||||
- Issue → title + "\n\n" + description
|
||||
- MR → title + "\n\n" + description
|
||||
- Note → body (skip system notes unless they contain meaningful content)
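The extraction rules and hash-based change detection can be sketched together. The Python below is illustrative only: `build_content_text` and `content_hash` are hypothetical helper names, the tables are trimmed to the columns involved, and system-note filtering is left out. A document with no `embedding_metadata` row yet is also treated as pending, which a plain inequality join would miss.

```python
import hashlib
import sqlite3

def build_content_text(source_type, title=None, description=None, body=None):
    """Apply the extraction rules: title + blank line + description, or the note body."""
    if source_type == "note":
        return body or ""
    return (title or "") + "\n\n" + (description or "")

def content_hash(content_text):
    """SHA-256 hex digest of the canonical text, used for change detection."""
    return hashlib.sha256(content_text.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
  id INTEGER PRIMARY KEY,
  content_text TEXT NOT NULL,
  content_hash TEXT NOT NULL
);
CREATE TABLE embedding_metadata (
  document_id INTEGER PRIMARY KEY REFERENCES documents(id),
  content_hash TEXT NOT NULL
);
""")

texts = {
    1: build_content_text("issue", "Login fails", "Steps to reproduce"),
    2: build_content_text("merge_request", "Fix login", "Adds a retry"),
    3: build_content_text("note", body="Looks good to me"),
}
for doc_id, text in texts.items():
    conn.execute(
        "INSERT INTO documents (id, content_text, content_hash) VALUES (?, ?, ?)",
        (doc_id, text, content_hash(text)),
    )

# Document 1 is embedded and current, document 2 was embedded from older text,
# document 3 has never been embedded.
conn.execute("INSERT INTO embedding_metadata VALUES (1, ?)", (content_hash(texts[1]),))
conn.execute("INSERT INTO embedding_metadata VALUES (2, ?)", (content_hash("old text"),))

pending = [row[0] for row in conn.execute("""
    SELECT d.id
    FROM documents d
    LEFT JOIN embedding_metadata m ON m.document_id = d.id
    WHERE m.document_id IS NULL OR m.content_hash != d.content_hash
    ORDER BY d.id
""")]
```

Here `pending` contains documents 2 and 3, so only those would be sent to the embedding model.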
|
||||
|
||||
---
|
||||
|
||||
### Checkpoint 4: Semantic Search
|
||||
**Deliverable:** Working semantic search across all indexed content
|
||||
|
||||
**Tests:**
|
||||
1. Run `gitlab-engine search "authentication redesign"` → returns ranked results with snippets
|
||||
2. Golden queries: curated list of 10 queries with expected result *containment* (e.g., "at least one of these 3 known URLs appears in top 10")
|
||||
3. `gitlab-engine search "..." --json` validates against JSON schema (stable fields present)
|
||||
|
||||
**Scope:**
|
||||
- Hybrid retrieval:
|
||||
- Vector recall (sqlite-vss) + FTS lexical recall (fts5)
|
||||
- Merge + rerank results using Reciprocal Rank Fusion (RRF)
|
||||
- Result ranking and scoring (document-level)
|
||||
- Search filters: `--type=issue|mr|note`, `--author=username`, `--after=date`, `--label=name`
|
||||
- Label filtering operates on `document_labels` (indexed, exact-match)
|
||||
- Output formatting: ranked list with title, snippet, score, URL
|
||||
- JSON output mode for AI agent consumption
|
||||
|
||||
**Schema Additions:**
|
||||
```sql
|
||||
-- Full-text search for hybrid retrieval
|
||||
CREATE VIRTUAL TABLE documents_fts USING fts5(
|
||||
title,
|
||||
content_text,
|
||||
content='documents',
|
||||
content_rowid='id'
|
||||
);
|
||||
|
||||
-- Triggers to keep FTS in sync
|
||||
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
|
||||
INSERT INTO documents_fts(rowid, title, content_text)
|
||||
VALUES (new.id, new.title, new.content_text);
|
||||
END;
|
||||
|
||||
CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
|
||||
INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
|
||||
VALUES('delete', old.id, old.title, old.content_text);
|
||||
END;
|
||||
|
||||
CREATE TRIGGER documents_au AFTER UPDATE ON documents BEGIN
|
||||
INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
|
||||
VALUES('delete', old.id, old.title, old.content_text);
|
||||
INSERT INTO documents_fts(rowid, title, content_text)
|
||||
VALUES (new.id, new.title, new.content_text);
|
||||
END;
|
||||
```
|
||||
|
||||
**Hybrid Search Algorithm (MVP) - Reciprocal Rank Fusion:**
|
||||
1. Query both vector index (top 50) and FTS5 (top 50)
|
||||
2. Merge results by document_id
|
||||
3. Combine with Reciprocal Rank Fusion (RRF):
|
||||
- For each retriever list, assign ranks (1..N)
|
||||
- `rrf_score = Σ 1 / (k + rank)` with k=60 (tunable)
|
||||
- RRF is simpler than weighted sums and doesn't require score normalization
|
||||
4. Apply filters (type, author, date, label)
|
||||
5. Return top K
|
||||
|
||||
**Why RRF over Weighted Sums:**
|
||||
- FTS5 BM25 scores and vector distances use different scales
|
||||
- Weighted sums (`0.7 * vector + 0.3 * fts`) require careful normalization
|
||||
- RRF operates on ranks, not scores, making it robust to scale differences
|
||||
- Well-established in information retrieval literature
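The fusion step itself is small. The sketch below is one possible Python rendering of the RRF formula above. The function name `rrf_merge`, the tie-break on document id, and the sample document ids are assumptions for illustration. Each input list is expected to hold document ids ordered best-first, as returned by the vector and FTS5 retrievers.

```python
from collections import defaultdict

def rrf_merge(ranked_lists, k=60, top_k=10):
    """Fuse several best-first id lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document id for stable output.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]

# Hypothetical retriever outputs (document ids, best match first).
vector_hits = [101, 102, 103]
fts_hits = [103, 101, 104]

merged = rrf_merge([vector_hits, fts_hits])
merged_ids = [doc_id for doc_id, _ in merged]
```

Documents 101 and 103 appear in both lists and therefore outrank 102 and 104, which each appear in only one. No BM25 score or vector distance enters the calculation, only positions.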
|
||||
|
||||
**CLI Interface:**
|
||||
```bash
|
||||
# Basic semantic search
|
||||
gitlab-engine search "why did we choose Redis"
|
||||
|
||||
# Pure FTS search (fallback if embeddings unavailable)
|
||||
gitlab-engine search "redis" --mode=lexical
|
||||
|
||||
# Filtered search
|
||||
gitlab-engine search "authentication" --type=mr --after=2024-01-01
|
||||
|
||||
# Filter by label
|
||||
gitlab-engine search "performance" --label=bug --label=critical
|
||||
|
||||
# JSON output for programmatic use
|
||||
gitlab-engine search "payment processing" --json
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### Checkpoint 5: Incremental Sync
|
||||
**Deliverable:** Efficient ongoing synchronization with GitLab
|
||||
|
||||
**Test:** Make a change in GitLab; run `gitlab-engine sync` → only fetches changed items; verify change appears in search
|
||||
|
||||
**Scope:**
|
||||
- Delta sync based on stable cursor (updated_at + tie-breaker id)
|
||||
- Dependent resources sync strategy (notes, MR changes)
|
||||
- Webhook handler (optional, if webhook access granted)
|
||||
- Re-embedding based on content_hash change (documents.content_hash != embedding_metadata.content_hash)
|
||||
- Sync status reporting
|
||||
|
||||
**Correctness Rules (MVP):**
|
||||
1. Fetch pages ordered by `updated_at ASC`; within identical timestamps, advance by `gitlab_id ASC`
|
||||
2. Cursor advances only after successful DB commit for that page
|
||||
3. Dependent resources:
|
||||
- For each updated issue/MR, refetch its notes (sorted by `updated_at`)
|
||||
- For each updated MR, refetch its file changes
|
||||
4. A document is queued for embedding iff it has no `embedding_metadata` row yet or `documents.content_hash != embedding_metadata.content_hash`
|
||||
5. Sync run is marked 'failed' with error message if any page fails (can resume from cursor)
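Rules 1, 2, and 5 can be illustrated with a small cursor loop. This is a sketch under assumptions, not the engine's implementation: the cursor is modeled as an `(updated_at, gitlab_id)` tuple, `commit` stands in for the per-page DB transaction, and the function names and the `'succeeded'` status label are hypothetical.

```python
def apply_page(cursor, page, commit):
    """Commit the items newer than the cursor, then return the advanced cursor.

    If commit raises, the exception propagates and the caller keeps the old cursor.
    """
    fresh = [i for i in page if (i["updated_at"], i["gitlab_id"]) > cursor]
    if not fresh:
        return cursor
    fresh.sort(key=lambda i: (i["updated_at"], i["gitlab_id"]))
    commit(fresh)  # must succeed before the cursor moves
    last = fresh[-1]
    return (last["updated_at"], last["gitlab_id"])

def sync_pages(cursor, pages, commit):
    """Process pages in order; stop at the first failure and report where to resume."""
    for page in pages:
        try:
            cursor = apply_page(cursor, page, commit)
        except Exception as exc:
            return cursor, "failed: " + str(exc)
    return cursor, "succeeded"

pages = [
    [{"updated_at": 100, "gitlab_id": 5}, {"updated_at": 100, "gitlab_id": 7}],
    [{"updated_at": 120, "gitlab_id": 2}],
]

# Happy path: item (100, 5) was already seen, so only two items are stored.
stored = []
final_cursor, final_status = sync_pages((100, 5), pages, stored.extend)

# Failure path: the second page's commit fails, so the cursor stays at page one.
def flaky_commit(items):
    if any(i["gitlab_id"] == 2 for i in items):
        raise RuntimeError("db locked")

failed_cursor, failed_status = sync_pages((100, 5), pages, flaky_commit)
```

The tie-breaker matters for items sharing a timestamp: with a cursor of `(100, 5)`, the item `(100, 7)` is still picked up, while `(100, 5)` is not fetched twice.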
|
||||
|
||||
**Why Dependent Resource Model:**
|
||||
- GitLab Notes API doesn't provide a clean global `updated_after` stream
|
||||
- Notes are listed per-issue or per-MR, not as a top-level resource
|
||||
- Treating notes as dependent resources (refetch when parent updates) is simpler and more correct
|
||||
- Same applies to MR changes/diffs
|
||||
|
||||
**CLI Commands:**
|
||||
```bash
|
||||
# Incremental sync (respects cursors, only fetches new/updated)
|
||||
gitlab-engine sync
|
||||
|
||||
# Force full re-sync (resets cursors)
|
||||
gitlab-engine sync --full
|
||||
|
||||
# Override stale 'running' run after operator review
|
||||
gitlab-engine sync --force
|
||||
|
||||
# Show sync status
|
||||
gitlab-engine sync-status
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Future Checkpoints (Post-MVP)
|
||||
|
||||
### Checkpoint 6: File/Feature History View
|
||||
- Map commits to MRs to discussions
|
||||
- Query: "Show decision history for src/auth/login.ts"
|
||||
- Ship `gitlab-engine file-history <path>` as a first-class feature here
|
||||
- This command is deferred from MVP to sharpen checkpoint focus
|
||||
|
||||
### Checkpoint 7: Personal Dashboard
|
||||
- Filter by assigned/mentioned
|
||||
- Integrate with existing gitlab-inbox tool
|
||||
|
||||
### Checkpoint 8: Person Context
|
||||
- Aggregate contributions by author
|
||||
- Expertise inference from activity
|
||||
|
||||
### Checkpoint 9: Decision Graph
|
||||
- Extract decisions from discussions (LLM-assisted)
|
||||
- Visualize decision relationships
|
||||
|
||||
---
|
||||
|
||||
## Verification Strategy
|
||||
|
||||
Each checkpoint includes:
|
||||
|
||||
1. **Automated tests** - Unit tests for data transformations, integration tests for API calls
|
||||
2. **CLI smoke tests** - Manual commands with expected outputs documented
|
||||
3. **Data integrity checks** - Count verification against GitLab, schema validation
|
||||
4. **Search quality tests** - Known queries with expected results (for Checkpoint 4+)
|
||||
|
||||
---
|
||||
|
||||
## Risk Mitigation
|
||||
|
||||
| Risk | Mitigation |
|
||||
|------|------------|
|
||||
| GitLab rate limiting | Exponential backoff, respect Retry-After headers, incremental sync |
|
||||
| Embedding model quality | Start with nomic-embed-text; architecture allows model swap |
|
||||
| SQLite scale limits | Monitor performance; Postgres migration path documented |
|
||||
| Stale data | Incremental sync with change detection |
|
||||
| Mid-sync failures | Cursor-based resumption, sync_runs audit trail |
|
||||
| Search quality | Hybrid (vector + FTS5) retrieval with RRF, golden query test suite |
|
||||
| Concurrent sync corruption | Single-flight protection (refuse if existing run is `running`) |
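
For the rate-limiting row, one way to combine the two mitigations is to prefer the server's `Retry-After` value when present and fall back to exponential backoff otherwise. The Python sketch below is an assumption-laden illustration: `retry_delay`, the 1-second base, and the 60-second cap are hypothetical choices, and only the delta-seconds form of `Retry-After` is parsed (an HTTP-date value falls through to backoff).

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Uses Retry-After when it is a number of seconds; otherwise doubles the
    base delay per attempt, capped at `cap`.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not delta-seconds (e.g. an HTTP date); use backoff instead
    return min(cap, base * (2 ** attempt))

delays = [retry_delay(attempt) for attempt in range(4)]
header_delay = retry_delay(2, retry_after="30")
date_delay = retry_delay(1, retry_after="Wed, 21 Oct 2015 07:28:00 GMT")
capped_delay = retry_delay(10)
```

Adding random jitter to the backoff branch is a common refinement and is omitted here for brevity.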
|
||||
|
||||
**SQLite Performance Defaults (MVP):**
|
||||
- Enable `PRAGMA journal_mode=WAL;` on every connection
|
||||
- Enable `PRAGMA foreign_keys=ON;` on every connection
|
||||
- Use explicit transactions for page/batch inserts
|
||||
- Targeted indexes on `(project_id, updated_at)` for primary resources
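As a rough sketch of these defaults in Python's standard-library `sqlite3`, the example below opens a connection with both pragmas set and inserts each page inside one explicit transaction. `open_db` and `insert_page` are hypothetical helper names, and the tables are trimmed to a few columns.

```python
import os
import sqlite3
import tempfile

def open_db(path):
    """Open a connection with the MVP defaults applied."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA foreign_keys=ON;")
    return conn

def insert_page(conn, rows):
    """Insert one page of issues atomically: all rows commit, or none do."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            "INSERT INTO issues (gitlab_id, project_id, updated_at) VALUES (?, ?, ?)",
            rows,
        )

with tempfile.TemporaryDirectory() as tmp:
    conn = open_db(os.path.join(tmp, "demo.db"))
    conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY);
    CREATE TABLE issues (
      id INTEGER PRIMARY KEY,
      gitlab_id INTEGER UNIQUE NOT NULL,
      project_id INTEGER NOT NULL REFERENCES projects(id),
      updated_at INTEGER
    );
    CREATE INDEX idx_issues_project_updated ON issues(project_id, updated_at);
    """)
    conn.execute("INSERT INTO projects (id) VALUES (1)")
    conn.commit()

    journal_mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
    fk_enabled = conn.execute("PRAGMA foreign_keys;").fetchone()[0]

    insert_page(conn, [(1001, 1, 100), (1002, 1, 110)])
    try:
        # The second row references a project that does not exist.
        insert_page(conn, [(1003, 1, 120), (1004, 99, 130)])
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    issue_count = conn.execute("SELECT COUNT(*) FROM issues").fetchone()[0]
    conn.close()
```

The failed page leaves no partial rows behind, so `issue_count` stays at 2. Note that WAL mode is stored in the database file and persists across connections, whereas `foreign_keys` is a per-connection setting and has to be enabled each time.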
|
||||
|
||||
---
|
||||
|
||||
## Schema Summary
|
||||
|
||||
| Table | Checkpoint | Purpose |
|
||||
|-------|------------|---------|
|
||||
| projects | 0 | Configured GitLab projects |
|
||||
| sync_runs | 0 | Audit trail of sync operations |
|
||||
| sync_cursors | 0 | Resumable sync state per primary resource |
|
||||
| raw_payloads | 0 | Decoupled raw JSON storage |
|
||||
| issues | 1 | Normalized issues |
|
||||
| labels | 1 | Label definitions (unique by project + name) |
|
||||
| issue_labels | 1 | Issue-label junction |
|
||||
| merge_requests | 2 | Normalized MRs |
|
||||
| notes | 2 | Issue and MR comments (with parent FKs) |
|
||||
| mr_files | 2 | MR file changes (with rename tracking) |
|
||||
| mr_labels | 2 | MR-label junction |
|
||||
| documents | 3 | Unified searchable documents |
|
||||
| document_labels | 3 | Document-label junction for fast filtering |
|
||||
| embeddings | 3 | Vector embeddings (sqlite-vss, rowid=document_id) |
|
||||
| embedding_metadata | 3 | Embedding provenance + change detection |
|
||||
| documents_fts | 4 | Full-text search index (fts5) |
|
||||
|
||||
---
|
||||
|
||||
## Resolved Decisions
|
||||
|
||||
| Question | Decision | Rationale |
|
||||
|----------|----------|-----------|
|
||||
| Commit/file linkage | **Include MR→file links** | Enables "what MRs touched this file?" without full commit history |
|
||||
| Labels | **Index as filters** | Labels are well-used; `document_labels` table enables fast `--label=X` filtering |
|
||||
| Labels uniqueness | **By (project_id, name)** | GitLab API returns labels as strings; gitlab_id isn't always available |
|
||||
| Sync method | **Polling for MVP** | Decide on webhooks after using the system |
|
||||
| Notes sync | **Dependent resource** | Notes API is per-parent, not global; refetch on parent update |
|
||||
| Hybrid ranking | **RRF over weighted sums** | Simpler, no score normalization needed |
|
||||
| Embedding rowid | **rowid = documents.id** | Eliminates fragile rowid mapping during upserts |
|
||||
| file-history CLI | **Post-MVP (CP6)** | Sharpens MVP checkpoint focus |
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. User approves this spec
|
||||
2. Generate Checkpoint 0 PRD for project setup
|
||||
3. Implement Checkpoint 0
|
||||
4. Human validates → proceed to Checkpoint 1
|
||||
5. Repeat for each checkpoint
|
||||