initial

2026-01-20 13:11:36 -05:00
commit 7702d2a493
2 changed files with 1204 additions and 0 deletions
--- a/PRD.md
+++ b/PRD.md
@@ -0,0 +1,563 @@
+# GitLab Inbox - Product Requirements Document
+
+## Overview
+
+**Product Name**: GitLab Inbox
+**Version**: 1.0
+**Author**: Taylor Eernisse
+**Date**: January 16, 2026
+
+### Problem Statement
+
+Managing GitLab activity with ADHD is overwhelming. The native GitLab interface creates cognitive overload through:
+
+- **Information scatter**: Issues, MRs, and activity are spread across multiple pages
+- **Missing reply awareness**: Hard to know when someone has responded to your question (not fully covered by /todos alone)
+- **Context loss**: Difficult to find the right tab or remember which conversation you were tracking
+- **No unified "what's next"**: Multiple clicks required to understand what needs attention
+
+### Solution
+
+A local, always-open "inbox" application that presents GitLab notifications in an ADHD-friendly interface with explicit "handled" tracking, snooze capabilities, watchlist for awaiting replies, and progress visibility.
+
+---
+
+## Target User
+
+**Primary Persona**: Software developer with ADHD working on 1-2 GitLab projects who needs to track conversations and respond to mentions, reviews, and assignments without cognitive overload.
+
+**Key Characteristics**:
+- Needs clear "what's next" visibility
+- Benefits from external accountability (seeing who's waiting)
+- Motivated by progress tracking (watching a list shrink)
+- Prefers always-open tools over on-demand checks
+- Struggles with context switching and finding the right place
+- Needs a "not now but not forgotten" path that doesn't require willpower
+
+---
+
+## Goals
+
+### User Goals
+1. Know immediately when someone has replied or needs my attention
+2. Quickly navigate to the right place in GitLab to respond
+3. Track what I've handled today for satisfaction and progress awareness
+4. Reduce cognitive load of manually tracking conversations
+5. Defer items temporarily without losing accountability (snooze)
+6. Know when someone has replied to something I'm waiting on
+
+### Product Goals
+1. Reduce time-to-awareness for GitLab notifications
+2. Eliminate the need to manually poll GitLab for updates
+3. Provide ADHD-friendly UX patterns (clear actions, progress visibility, minimal decisions)
+4. Enable keyboard-first operation to reduce friction
+
+### Non-Goals (v1.0)
+- Replacing GitLab for write operations (commenting, reviewing, merging); the only write this app performs is marking a todo as done
+- Supporting multiple GitLab instances
+- Team/shared usage
+- Mobile support
+
+---
+
+## Core Features
+
+### 1. Inbox View (Primary)
+
+**Description**: Display all GitLab todos (notifications) that need attention.
+
+**Data Source**: GitLab `/todos` API endpoint
+
+**Display Elements** (per item):
+| Element | Description |
+|---------|-------------|
+| Action Badge | Type indicator: mentioned, assigned, review_requested, build_failed, etc. |
+| Target Title | MR or Issue title |
+| Author | Who triggered this todo (name + avatar) |
+| Time | Relative time since created ("2h ago", "3d ago") |
+| Project | Project name for context |
+
+**Interactions**:
+- **Click item / Enter** → Opens target URL in browser (GitLab)
+- **Mark Handled** → Moves item to Done Today (local state only)
+- **Snooze** → Hides item until a chosen time (local state only)
+- **Dismiss** → `POST /todos/:id/mark_as_done` (marks as done in GitLab)
+
+**Filtering**: Items marked as "handled" or "snoozed" locally are hidden from Inbox.
+
+### 2. Snoozed View
+
+**Description**: Items temporarily deferred until their wake time.
+
+**Purpose**:
+- "Not now but not forgotten" path
+- Reduces inbox dread by shrinking the visible list
+- Enables focus sessions: clear the deck, then pull from Snoozed intentionally
+
+**Snooze Options**:
+- Later today (3 hours)
+- Tomorrow morning (9am local)
+- Next weekday (Mon-Fri, 9am)
+- Custom date/time
+
+**Behavior**:
+- Snoozed items are hidden from Inbox
+- When wake time passes, item returns to Inbox with a "Woke up" indicator
+- Snoozed view shows all snoozed items with their wake times
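+
+The preset snooze options can be reduced to a small wake-time calculation. The following is a minimal sketch, not the planned implementation: `SnoozeOption`, `computeWakeAt`, and `isAwake` are hypothetical names, and "next weekday" is assumed to mean the next calendar day that falls on Monday through Friday.
+
+```typescript
+// Sketch only: these names are illustrative and do not appear in the spec.
+type SnoozeOption = "laterToday" | "tomorrowMorning" | "nextWeekday";
+
+function computeWakeAt(option: SnoozeOption, now: Date): Date {
+  if (option === "laterToday") {
+    return new Date(now.getTime() + 3 * 60 * 60 * 1000); // 3 hours from now
+  }
+  // Both remaining presets wake at 9am local time on a later calendar day.
+  const wake = new Date(now.getFullYear(), now.getMonth(), now.getDate() + 1, 9, 0, 0, 0);
+  if (option === "nextWeekday") {
+    // Assumption: skip Saturday (6) and Sunday (0).
+    while (wake.getDay() === 0 || wake.getDay() === 6) {
+      wake.setDate(wake.getDate() + 1);
+    }
+  }
+  return wake;
+}
+
+// An item returns to the Inbox once its stored wake time has passed.
+function isAwake(wakeAtIso: string, now: Date): boolean {
+  return new Date(wakeAtIso).getTime() <= now.getTime();
+}
+```
+
+Building the wake time from local date fields (rather than adding 24 hours) keeps "9am local" stable across DST changes.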
+
+### 3. Watchlist (Awaiting Reply)
+
+**Description**: Targets you're explicitly waiting on (MRs/Issues/etc.). Alerts when there is new activity since you last checked.
+
+**Purpose**:
+- GitLab todos don't guarantee "someone replied" notifications
+- Explicit watch semantics for "I'm waiting on Bob" tracking
+- Mirror "external accountability": see who you are waiting on, not only who is waiting on you
+
+**Data Sources**:
+- Primary: /todos (fast path for items that generate new todos)
+- Secondary: per-target `updated_at`/notes polling for watched items (small set)
+
+**Interactions**:
+- Mark Handled → optionally "Add to Watchlist" toggle
+- Watch item shows "Last seen" timestamp and "New activity" indicator
+- Click to open target in GitLab
+- Remove from watchlist when no longer waiting
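+
+One way to drive the "New activity" indicator is to compare the target's current `updated_at` with the value recorded when the user last looked. This is a sketch under assumptions: `WatchEntry`, `hasNewActivity`, and `recordPoll` are hypothetical names, and it assumes the baseline only moves when the user opens the item, not on every poll.
+
+```typescript
+// Hypothetical shapes; only the two field names mirror the watchlist state in this document.
+interface WatchEntry {
+  lastSeenUpdatedAt?: string; // ISO timestamp of the target when the user last looked
+  lastCheckedAt?: string;     // ISO timestamp of our last poll
+}
+
+// True when the target's updated_at is newer than what the user last saw.
+function hasNewActivity(entry: WatchEntry, remoteUpdatedAt: string): boolean {
+  if (entry.lastSeenUpdatedAt === undefined) return false; // no baseline yet
+  return Date.parse(remoteUpdatedAt) > Date.parse(entry.lastSeenUpdatedAt);
+}
+
+// Record a poll. The baseline is set only when missing, so the indicator
+// stays on until the user opens the item and the caller resets it.
+function recordPoll(entry: WatchEntry, remoteUpdatedAt: string, now: Date): WatchEntry {
+  return {
+    ...entry,
+    lastSeenUpdatedAt: entry.lastSeenUpdatedAt ?? remoteUpdatedAt,
+    lastCheckedAt: now.toISOString(),
+  };
+}
+```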
+
+### 4. Done Today View
+
+**Description**: Items marked as handled during the current day.
+
+**Purpose**:
+- ADHD-friendly progress visibility
+- Satisfaction from watching the Inbox shrink as this list grows
+- Review of daily accomplishments
+
+**Behavior**:
+- Stored as date-bucketed ledger keyed by local date (YYYY-MM-DD)
+- "Done Today" shows bucket for current local date
+- Option to clear today's bucket only
+- Historical buckets retained for potential "Done Yesterday" or weekly views
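+
+The bucket key has to come from local-time fields; `toISOString()` would yield the UTC date and could file a late-evening item under tomorrow. A minimal sketch, with `localDateKey`, `Ledger`, and `markHandled` as hypothetical names and a simplified ledger entry:
+
+```typescript
+// Formats a Date as YYYY-MM-DD using local-time fields (not UTC).
+function localDateKey(d: Date): string {
+  const y = d.getFullYear();
+  const m = String(d.getMonth() + 1).padStart(2, "0");
+  const day = String(d.getDate()).padStart(2, "0");
+  return `${y}-${m}-${day}`;
+}
+
+// Simplified ledger: local date -> todo id -> entry (the todo snapshot is omitted here).
+type Ledger = Record<string, Record<number, { handledAt: string }>>;
+
+// Returns a new ledger with the todo filed under today's local-date bucket.
+function markHandled(ledger: Ledger, todoId: number, now: Date): Ledger {
+  const key = localDateKey(now);
+  return {
+    ...ledger,
+    [key]: { ...(ledger[key] ?? {}), [todoId]: { handledAt: now.toISOString() } },
+  };
+}
+```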
+
+### 5. Manual Refresh
+
+**Description**: Button to fetch latest todos on demand.
+
+**Purpose**: Immediate update when user knows something changed.
+
+### 6. Background Polling (v1.1)
+
+**Description**: Automatic periodic refresh of todos.
+
+**Configuration**:
+- Base interval (default: 60s)
+- Backoff on failure (exponential, capped at 15m) with jitter
+- 429 handling (respect `Retry-After` header; otherwise back off)
+
+**Indicator**:
+- Last successful refresh time
+- Next scheduled refresh
+- Current backoff state (if any)
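+
+The delay calculation behind this configuration could look like the sketch below. It is illustrative only: `nextDelayMs` is a hypothetical name, the jitter strategy (uniform between the base interval and the current ceiling) is an assumption, and only the delta-seconds form of `Retry-After` is handled, not the HTTP-date form.
+
+```typescript
+const BASE_MS = 60_000;      // default base interval (60s)
+const MAX_MS = 15 * 60_000;  // backoff cap (15m)
+
+// failures: consecutive failed refreshes (0 means healthy).
+// retryAfterSeconds: parsed Retry-After value, if the server sent one.
+// random: injectable for tests; defaults to Math.random.
+function nextDelayMs(
+  failures: number,
+  retryAfterSeconds?: number,
+  random: () => number = Math.random,
+): number {
+  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
+    return retryAfterSeconds * 1000; // the server's instruction wins
+  }
+  if (failures <= 0) return BASE_MS;
+  // Exponential ceiling, capped at MAX_MS.
+  const ceiling = Math.min(MAX_MS, BASE_MS * 2 ** failures);
+  // Jitter: pick a delay in [BASE_MS, ceiling).
+  return BASE_MS + Math.floor(random() * (ceiling - BASE_MS));
+}
+```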
+
+### 7. Keyboard Shortcuts (v1.0)
+
+**Description**: Keyboard-first operation for reduced friction.
+
+| Key | Action |
+|-----|--------|
+| `j` / `k` | Navigate down / up |
+| `Enter` | Open selected item in GitLab |
+| `h` | Mark handled |
+| `s` | Snooze (opens snooze picker) |
+| `d` | Dismiss (mark as done in GitLab) |
+| `w` | Add to / remove from watchlist |
+| `/` | Focus search/filter |
+
+### 8. Focus Mode (Optional)
+
+**Description**: Show only the next N items (default 3) to reduce decision load.
+
+**Purpose**:
+- Convert "overwhelm" into "sequence"
+- Reduce choices, increase throughput
+- ADHD-optimized: work the queue, don't manage the list
+
+**Behavior**:
+- Primary action emphasized ("Open", then "Handled"/"Snooze")
+- Toggle: Focus / All Items
+- Focus queue is top N unhandled items by creation date
+
+---
+
+## Technical Architecture
+
+### Tech Stack
+
+| Component | Technology |
+|-----------|------------|
+| Framework | TanStack Start |
+| Styling | Tailwind CSS |
+| Runtime | Node.js (local) |
+| State Persistence | JSON file with atomic writes (local) |
+| Secret Storage | OS keychain (preferred) or encrypted local store |
+| GitLab Integration | REST API with Personal Access Token |
+
+### Deployment
+
+- **Local only**: Runs on localhost
+- **No external hosting**: No cloud deployment, no auth flows
+- **Single user**: No multi-tenancy
+
+### GitLab API
+
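+**Authentication**: Personal Access Token (PAT). The `read_api` scope is enough to fetch todos; Dismiss (`POST /todos/:id/mark_as_done`) is a write call and needs the full `api` scope.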
+
+**Primary Endpoints**:
+```
+GET /api/v4/todos?state=pending&per_page=100
+
+POST /api/v4/todos/:id/mark_as_done
+```
+
+**Response Structure** (relevant fields):
+```typescript
+interface GitLabTodo {
+  id: number;
+  action_name:
+    | 'assigned'
+    | 'mentioned'
+    | 'build_failed'
+    | 'marked'
+    | 'approval_required'
+    | 'unmergeable'
+    | 'directly_addressed'
+    | 'merge_train_removed'
+    | 'member_access_requested'
+    | 'review_requested'
+    | string; // forward-compatible for new action types
+  target_type: 'MergeRequest' | 'Issue' | 'Commit' | 'Epic' | 'DesignManagement::Design' | string;
+  target: {
+    id: number;
+    iid: number;
+    title: string;
+    web_url?: string; // optional; may not be present for all target types
+  };
+  target_url: string; // canonical "Open" URL - use this for navigation
+  author: {
+    id: number;
+    name: string;
+    avatar_url: string;
+  };
+  project: {
+    id: number;
+    name: string;
+    path_with_namespace: string;
+  };
+  created_at: string;
+}
+```
+
+### Local State
+
+```typescript
+interface LocalState {
+  schemaVersion: number; // for migrations
+
+  handledByDate: {
+    [localDate: string]: { // YYYY-MM-DD in local time
+      [todoId: number]: {
+        handledAt: string;  // ISO timestamp
+        todo: GitLabTodo;   // Snapshot for Done Today display
+      }
+    }
+  };
+
+  snoozedTodos: {
+    [todoId: number]: {
+      wakeAt: string;     // ISO timestamp
+      snoozedAt: string;  // ISO timestamp
+      todo: GitLabTodo;   // snapshot
+    }
+  };
+
+  watchlist: {
+    [watchKey: string]: { // e.g., "MergeRequest:123" or "Issue:456"
+      targetType: string;
+      projectId?: number;
+      targetId: number;
+      targetIid?: number;
+      targetUrl: string;
+      lastSeenUpdatedAt?: string; // ISO - when we last observed the target
+      lastCheckedAt?: string;     // ISO - when we last polled
+      addedAt: string;            // ISO
+      muted?: boolean;
+    }
+  };
+}
+```
+
+**Storage**: `~/.config/gitlab-inbox/state.json`
+
+**Persistence Strategy**:
+- Atomic writes: write to `state.json.tmp`, then rename to `state.json`
+- Keep `state.json.bak` as last-known-good before each write
+- Validate JSON schema on load; if invalid, fall back to backup and surface warning
+- Schema version for forward migrations
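+
+The persistence strategy above can be sketched with Node's synchronous `fs` API. This is a minimal illustration, not the planned `storage.ts`: `saveState` and `loadState` are hypothetical names, validation is reduced to checking for a `schemaVersion` key, and the sketch assumes the file on disk was already validated at load time before it is copied to the backup.
+
+```typescript
+import * as fs from "node:fs";
+import * as path from "node:path";
+
+function saveState(statePath: string, state: unknown): void {
+  const tmp = `${statePath}.tmp`;
+  const bak = `${statePath}.bak`;
+  fs.mkdirSync(path.dirname(statePath), { recursive: true });
+  // Keep the current file as last-known-good before replacing it.
+  if (fs.existsSync(statePath)) fs.copyFileSync(statePath, bak);
+  // Write the full payload to a temp file, then rename it over the target.
+  fs.writeFileSync(tmp, JSON.stringify(state, null, 2), "utf8");
+  fs.renameSync(tmp, statePath);
+}
+
+// Tries state.json first, then state.json.bak; `recovered` tells the UI to warn.
+function loadState(statePath: string): { state: unknown; recovered: boolean } | null {
+  const candidates = [
+    { file: statePath, recovered: false },
+    { file: `${statePath}.bak`, recovered: true },
+  ];
+  for (const { file, recovered } of candidates) {
+    try {
+      const parsed: unknown = JSON.parse(fs.readFileSync(file, "utf8"));
+      if (typeof parsed === "object" && parsed !== null && "schemaVersion" in parsed) {
+        return { state: parsed, recovered };
+      }
+    } catch {
+      // Missing or unparsable file: fall through to the next candidate.
+    }
+  }
+  return null;
+}
+```
+
+The rename is what makes the write atomic from a reader's point of view, provided the temp file lives on the same filesystem as the target.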
+
+---
+
+## User Interface
+
+### Layout
+
+```
++--------------------------------------------------+
+|  GitLab Inbox    [Focus] [Refresh]   🟢 2m ago   |
+|  [Inbox] [Snoozed] [Watchlist] [Done Today]      |
++--------------------------------------------------+
+|                                                  |
+|  > [mentioned] Fix login bug                     |
+|    Alice Smith · infra-frontend · 2h ago         |
+|                    [Snooze] [Handle] [Open]      |
+|                                                  |
+|    [review_requested] Add caching layer          |
+|    Bob Jones · api-service · 1d ago              |
+|                    [Snooze] [Handle] [Open]      |
+|                                                  |
+|    [assigned] Update documentation               |
+|    Carol White · docs · 3d ago                   |
+|                    [Snooze] [Handle] [Open]      |
+|                                                  |
++--------------------------------------------------+
+| j/k: navigate  Enter: open  h: handle  s: snooze |
++--------------------------------------------------+
+```
+
+### Action Badge Colors
+
+| Action | Color | Meaning |
+|--------|-------|---------|
+| mentioned | Blue | Someone mentioned you |
+| assigned | Purple | Assigned to you |
+| approval_required | Yellow | Needs your approval |
+| build_failed | Red | Pipeline failure |
+| directly_addressed | Cyan | Direct @ mention |
+| unmergeable | Orange | MR has conflicts |
+| marked | Gray | Marked as todo |
+| merge_train_removed | Red | Removed from merge train |
+| member_access_requested | Teal | Access request |
+| (unknown) | Gray | Forward-compatible fallback |
+
+### States
+
+- **Loading**: Skeleton cards while fetching
+- **Empty**: "All clear! No pending items." message
+- **Error**: Connection error with retry button
+- **Stale**: Visual indicator if data is old (> 5 min since last *successful* refresh)
+- **Backoff**: Indicator showing retry status when experiencing errors
+
+---
+
+## User Flows
+
+### Flow 1: Morning Check-in
+1. Open GitLab Inbox (already running in background tab)
+2. See list of todos sorted by newest first
+3. Press `Enter` to open in GitLab
+4. Handle the item (reply, review, etc.)
+5. Return to Inbox, press `h` to mark handled
+6. Item moves to Done Today
+7. Repeat until Inbox is empty
+
+### Flow 2: Triage with Snooze
+1. See inbox with 12 items
+2. Quickly triage: handle 3, snooze 5 until tomorrow, dismiss 2 already-resolved
+3. Inbox now shows 2 items to focus on
+4. Tomorrow: snoozed items wake up and return to inbox
+
+### Flow 3: Awaiting Reply
+1. Handle a todo (you replied to someone's question)
+2. Toggle "Add to Watchlist" when marking handled
+3. Item appears in Watchlist view
+4. Later: see "New activity" indicator when they respond
+5. Open, read response, remove from watchlist
+
+### Flow 4: Focus Session
+1. Enable Focus Mode
+2. See only top 3 items
+3. Work through them sequentially
+4. As items complete, next ones appear
+5. Reduced decision fatigue
+
+### Flow 5: End-of-Day Review
+1. Navigate to Done Today view
+2. See all items handled today
+3. Satisfaction from visible progress
+
+---
+
+## Success Metrics
+
+| Metric | Target | Measurement |
+|--------|--------|-------------|
+| Time to awareness | < 2 min | Time from GitLab event to user seeing it |
+| Daily items handled | Increased | Compare to baseline (manual tracking) |
+| Context switches | Reduced | Fewer GitLab tabs open simultaneously |
+| Snooze usage | Regular | Ratio of snoozed to dismissed items (healthy = snooze is actually being used) |
+| Reply awareness | High | Watchlist items caught before manual check |
+| User satisfaction | Qualitative | Does this reduce ADHD-related friction? |
+
+---
+
+## Risks and Mitigations
+
+| Risk | Impact | Mitigation |
+|------|--------|------------|
+| GitLab API rate limits | Polling blocked | Configurable interval, backoff + jitter, respect 429/Retry-After |
+| Token expiration/rotation | App stops working | Clear error state + setup flow; surface expiry guidance and re-auth path |
+| State file corruption | Lose handled/snoozed/watch state | Atomic writes (tmp+rename), schema validation on load, keep last-known-good backup |
+| GitLab API changes | App breaks | Pin to known API version, monitor deprecations, forward-compatible types |
+| Token leakage | Security incident | Store in OS keychain, not in repo-adjacent files |
+
+---
+
+## Future Considerations (Post v1.0)
+
+- **Grouping**: By project, by action type
+- **Stale highlighting**: Visual alert for items waiting > X days
+- **Desktop notifications**: OS-level alerts for new high-priority items
+- **Quick actions**: Approve MR, close issue directly from app
+- **Multiple GitLab instances**: Connect to both gitlab.com and self-hosted
+- **Done history**: View handled items from yesterday, this week
+
+---
+
+## Implementation Phases
+
+### Phase 0: Setup & Auth
+- First-run setup wizard (GitLab URL + token)
+- Token storage implementation (keychain/encrypted local)
+- Connectivity check (`/todos`, auth failure UX)
+- Clear error states for invalid/expired tokens
+
+### Phase 1: Foundation
+- Initialize TanStack Start project
+- Set up Tailwind CSS
+- Create GitLab API client with PAT auth
+- Fetch and display todos in basic list (using `target_url` for navigation)
+- Implement click-to-open
+
+### Phase 2: Core Workflow
+- Add local storage with atomic writes + backup
+- Implement date-bucketed handled state
+- Implement "Mark Handled" action
+- Create Done Today view
+- Add keyboard shortcuts (minimal set: j/k/Enter/h/s/d)
+- Add Snooze + Snoozed view
+- Filter handled/snoozed todos from Inbox
+
+### Phase 3: Reliability & Awareness
+- Background polling with configurable interval
+- Backoff/jitter + 429 handling
+- Last successful refresh tracking
+- Watchlist ("Awaiting Reply") implementation
+- Per-target polling for watched items (small set)
+- Add manual refresh button
+- Relative time display
+- Action type badges with colors
+- Loading and error states
+- Connection status indicator
+
+### Phase 4: Polish
+- Focus Mode implementation
+- Snooze time picker refinement
+- Keyboard shortcut help overlay
+- State migration handling (schemaVersion)
+- Edge case handling (DST, timezone changes)
+
+---
+
+## Appendix
+
+### Environment Configuration
+
+**Primary configuration**:
+- URL + settings in: `~/.config/gitlab-inbox/config.json`
+- Token stored in OS keychain (preferred)
+
+**Optional (dev-only) `.env.local` support**:
+```env
+GITLAB_URL=https://gitlab.yourcompany.com
+GITLAB_TOKEN=glpat-xxxxxxxxxxxx
+```
+
+**Config file structure**:
+```json
+{
+  "gitlabUrl": "https://gitlab.yourcompany.com",
+  "pollingInterval": 60,
+  "focusModeCount": 3
+}
+```
+
+### Creating a GitLab PAT
+
+1. Go to GitLab → User Settings → Access Tokens
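+2. Create token with `read_api` scope (choose `api` instead if you want Dismiss to mark todos as done in GitLab, since that call is a write)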
+3. Set expiration (note: tokens expire at midnight UTC on expiry date)
+4. Save token via setup wizard (stored in keychain)
+5. Token never leaves local machine
+
+### Project Structure
+
+```
+gitlab-inbox/
+├── app/
+│   ├── routes/
+│   │   ├── __root.tsx
+│   │   ├── index.tsx          # Inbox view
+│   │   ├── snoozed.tsx        # Snoozed view
+│   │   ├── watchlist.tsx      # Watchlist view
+│   │   ├── done.tsx           # Done Today view
+│   │   └── setup.tsx          # First-run setup
+│   ├── components/
+│   │   ├── TodoCard.tsx
+│   │   ├── TodoList.tsx
+│   │   ├── ActionBadge.tsx
+│   │   ├── Header.tsx
+│   │   ├── SnoozePicker.tsx
+│   │   ├── FocusMode.tsx
+│   │   └── KeyboardHelp.tsx
+│   ├── lib/
+│   │   ├── gitlab.ts          # API client
+│   │   ├── storage.ts         # Atomic state persistence
+│   │   ├── keychain.ts        # Token storage
+│   │   ├── polling.ts         # Polling state machine
+│   │   ├── snooze.ts          # Snooze logic + wake checking
+│   │   ├── watchlist.ts       # Watchlist polling
+│   │   └── types.ts
+│   └── app.tsx
+├── package.json
+├── tailwind.config.ts
+└── vite.config.ts
+```
+
+### Test Strategy
+
+**Unit Tests**:
+- State normalization and migration
+- Snooze wake time calculations
+- Date bucketing logic (timezone handling)
+- Polling backoff calculations
+
+**Integration Tests** (mocked GitLab API):
+- `/todos` response parsing
+- `mark_as_done` endpoint calls
+- Error handling (401, 429, network errors)
+- State persistence round-trip (write + read)
+- Backup recovery on corruption
+
+**Manual Testing**:
+- First-run setup flow
+- Keyboard navigation
+- Snooze + wake cycle
+- Watchlist activity detection
--- a/SPEC.md
+++ b/SPEC.md
@@ -0,0 +1,641 @@
+# GitLab Knowledge Engine - Spec Document
+
+## Executive Summary
+
+A self-hosted tool to extract, index, and semantically search 2+ years of GitLab data (issues, MRs, comments/notes, and MR file-change links) from 2 main repositories (~10K items). The MVP delivers semantic search as a foundational capability that enables future specialized views (file history, personal tracking, person context). Commit-level indexing is explicitly post-MVP.
+
+---
+
+## Discovery Summary
+
+### Pain Points Identified
+1. **Knowledge discovery** - Tribal knowledge buried in old MRs/issues that nobody can find
+2. **Decision traceability** - Hard to find *why* decisions were made; context scattered across issue comments and MR discussions
+
+### Constraints
+| Constraint | Detail |
+|------------|--------|
+| Hosting | Self-hosted only, no external APIs |
+| Compute | Local dev machine (M-series Mac assumed) |
+| GitLab Access | Self-hosted instance, PAT access; webhooks not currently available (could be requested) |
+| Build Method | AI agents will implement; user is TypeScript expert for review |
+
+### Target Use Cases (Priority Order)
+1. **MVP: Semantic Search** - "Find discussions about authentication redesign"
+2. **Future: File/Feature History** - "What decisions were made about src/auth/login.ts?"
+3. **Future: Personal Tracking** - "What am I assigned to or mentioned in?"
+4. **Future: Person Context** - "What's @johndoe's background in this project?"
+
+---
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                        GitLab API                                │
+│                    (Issues, MRs, Notes)                          │
+└─────────────────────────────────────────────────────────────────┘
+  (Commit-level indexing explicitly post-MVP)
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                     Data Ingestion Layer                         │
+│  - Incremental sync (PAT-based polling)                         │
+│  - Rate limiting / backoff                                       │
+│  - Raw JSON storage for replay                                   │
+│  - Dependent resource fetching (notes, MR changes)              │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                    Data Processing Layer                        │
+│  - Normalize artifacts to unified schema                        │
+│  - Extract searchable documents (canonical text + metadata)     │
+│  - Content hashing for change detection                         │
+│  - Build relationship graph (issue↔MR↔note↔file)               │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      Storage Layer                               │
+│  - SQLite + sqlite-vss + FTS5 (hybrid search)                   │
+│  - Structured metadata in relational tables                      │
+│  - Vector embeddings for semantic search                         │
+│  - Full-text index for lexical search fallback                  │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                      Query Interface                             │
+│  - CLI for human testing                                         │
+│  - JSON API for AI agent testing                                 │
+│  - Semantic search with filters (author, date, type, label)     │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### Technology Choices
+
+| Component | Recommendation | Rationale |
+|-----------|---------------|-----------|
+| Language | TypeScript/Node.js | User expertise, good GitLab libs, AI agent friendly |
+| Database | SQLite + sqlite-vss | Zero-config, portable, vector search via the sqlite-vss extension |
+| Embeddings | Ollama + nomic-embed-text | Self-hosted, runs well on Apple Silicon, 768-dim vectors |
+| CLI Framework | Commander.js or oclif | Standard, well-documented |
+
+### Alternative Considered: Postgres + pgvector
+- Pros: More scalable, better for production multi-user
+- Cons: Requires running Postgres, heavier setup
+- Decision: Start with SQLite for simplicity; migration path exists if needed
+
+---
+
+## Checkpoint Structure
+
+Each checkpoint is a **testable milestone** where a human can validate the system works before proceeding.
+
+### Checkpoint 0: Project Setup
+**Deliverable:** Scaffolded project with GitLab API connection verified
+
+**Tests:**
+1. Run `gitlab-engine auth-test` → returns authenticated user info
+2. Run `gitlab-engine doctor` → verifies:
+   - Can reach GitLab baseUrl
+   - PAT is present and can read configured projects
+   - SQLite opens DB and migrations apply
+   - Ollama reachable OR embedding disabled with clear warning
+
+**Scope:**
+- Project structure (TypeScript, ESLint, Vitest)
+- GitLab API client with PAT authentication
+- Environment and project configuration
+- Basic CLI scaffold with `auth-test` command
+- `doctor` command for environment verification
+- Projects table and initial sync
+
+**Configuration (MVP)** (`gitlab-engine.config.json`):
+```json
+{
+  "gitlab": {
+    "baseUrl": "https://gitlab.example.com",
+    "tokenEnvVar": "GITLAB_TOKEN"
+  },
+  "projects": [
+    { "path": "group/project-one" },
+    { "path": "group/project-two" }
+  ],
+  "embedding": {
+    "provider": "ollama",
+    "model": "nomic-embed-text",
+    "baseUrl": "http://localhost:11434"
+  }
+}
+```
+
+**DB Runtime Defaults (Checkpoint 0):**
+- On every connection:
+  - `PRAGMA journal_mode=WAL;`
+  - `PRAGMA foreign_keys=ON;`
+
+**Schema (Checkpoint 0):**
+```sql
+-- Projects table (configured targets)
+CREATE TABLE projects (
+  id INTEGER PRIMARY KEY,
+  gitlab_project_id INTEGER UNIQUE NOT NULL,
+  path_with_namespace TEXT NOT NULL,
+  default_branch TEXT,
+  web_url TEXT,
+  created_at INTEGER,
+  updated_at INTEGER,
+  raw_payload_id INTEGER REFERENCES raw_payloads(id)
+);
+CREATE INDEX idx_projects_path ON projects(path_with_namespace);
+
+-- Sync tracking for reliability
+CREATE TABLE sync_runs (
+  id INTEGER PRIMARY KEY,
+  started_at INTEGER NOT NULL,
+  finished_at INTEGER,
+  status TEXT NOT NULL,          -- 'running' | 'succeeded' | 'failed'
+  command TEXT NOT NULL,         -- 'ingest issues' | 'sync' | etc.
+  error TEXT
+);
+
+-- Sync cursors for primary resources only
+-- Notes and MR changes are dependent resources (fetched via parent updates)
+CREATE TABLE sync_cursors (
+  project_id INTEGER NOT NULL REFERENCES projects(id),
+  resource_type TEXT NOT NULL,   -- 'issues' | 'merge_requests'
+  updated_at_cursor INTEGER,     -- last fully processed updated_at (ms epoch)
+  tie_breaker_id INTEGER,        -- last fully processed gitlab_id (for stable ordering)
+  PRIMARY KEY(project_id, resource_type)
+);
+
+-- Raw payload storage (decoupled from entity tables)
+CREATE TABLE raw_payloads (
+  id INTEGER PRIMARY KEY,
+  source TEXT NOT NULL,          -- 'gitlab'
+  resource_type TEXT NOT NULL,   -- 'project' | 'issue' | 'mr' | 'note'
+  gitlab_id INTEGER NOT NULL,
+  fetched_at INTEGER NOT NULL,
+  json TEXT NOT NULL
+);
+CREATE INDEX idx_raw_payloads_lookup ON raw_payloads(resource_type, gitlab_id);
+```
+
+---
+
+### Checkpoint 1: Issue Ingestion
+**Deliverable:** All issues from target repos stored locally
+
+**Test:** Run `gitlab-engine ingest --type=issues` → count matches GitLab; run `gitlab-engine list issues --limit=10` → displays issues correctly
+
+**Scope:**
+- Issue fetcher with pagination handling
+- Raw JSON storage in raw_payloads table
+- Normalized issue schema in SQLite
+- Labels ingestion derived from issue payload:
+  - Always persist label names from `labels: string[]`
+  - Optionally request `with_labels_details=true` to capture color/description when available
+- Incremental sync support (run tracking + per-project cursor)
+- Basic list/count CLI commands
+
+**Reliability/Idempotency Rules:**
+- Every ingest/sync creates a `sync_runs` row
+- Single-flight: refuse to start if an existing run is `running` (unless `--force`)
+- Cursor advances only after successful transaction commit per page/batch
+- Ordering: `updated_at ASC`, tie-breaker `gitlab_id ASC`
+- Use explicit transactions for batch inserts
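+
+The ordering and cursor rules can be expressed as a pure function over one fetched page. This is a sketch with hypothetical names (`Cursor`, `Row`, `isAfterCursor`, `advanceCursor`); in the real flow the returned cursor would be persisted only after the page's transaction commits.
+
+```typescript
+interface Cursor { updatedAt: number; tieBreakerId: number; } // ms epoch + gitlab_id
+interface Row { updated_at: number; gitlab_id: number; }
+
+// A row is unprocessed if it sorts strictly after the cursor
+// under (updated_at ASC, gitlab_id ASC).
+function isAfterCursor(row: Row, cursor: Cursor | null): boolean {
+  if (cursor === null) return true;
+  if (row.updated_at !== cursor.updatedAt) return row.updated_at > cursor.updatedAt;
+  return row.gitlab_id > cursor.tieBreakerId;
+}
+
+// Returns the rows still to process (in stable order) and the cursor to store
+// once they have been committed. An empty page leaves the cursor unchanged.
+function advanceCursor(page: Row[], cursor: Cursor | null): { toProcess: Row[]; next: Cursor | null } {
+  const toProcess = page
+    .filter((r) => isAfterCursor(r, cursor))
+    .sort((a, b) => a.updated_at - b.updated_at || a.gitlab_id - b.gitlab_id);
+  const last = toProcess[toProcess.length - 1];
+  const next = last ? { updatedAt: last.updated_at, tieBreakerId: last.gitlab_id } : cursor;
+  return { toProcess, next };
+}
+```
+
+The tie-breaker matters when several items share the same `updated_at`: without it, a crash mid-page could skip or reprocess rows with that timestamp.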
+
+**Schema Preview:**
+```sql
+CREATE TABLE issues (
+  id INTEGER PRIMARY KEY,
+  gitlab_id INTEGER UNIQUE NOT NULL,
+  project_id INTEGER NOT NULL REFERENCES projects(id),
+  iid INTEGER NOT NULL,
+  title TEXT,
+  description TEXT,
+  state TEXT,
+  author_username TEXT,
+  created_at INTEGER,
+  updated_at INTEGER,
+  web_url TEXT,
+  raw_payload_id INTEGER REFERENCES raw_payloads(id)
+);
+CREATE INDEX idx_issues_project_updated ON issues(project_id, updated_at);
+CREATE INDEX idx_issues_author ON issues(author_username);
+
+-- Labels are derived from issue payloads (string array)
+-- Uniqueness is (project_id, name) since gitlab_id isn't always available
+CREATE TABLE labels (
+  id INTEGER PRIMARY KEY,
+  gitlab_id INTEGER,                  -- optional (only if available)
+  project_id INTEGER NOT NULL REFERENCES projects(id),
+  name TEXT NOT NULL,
+  color TEXT,
+  description TEXT
+);
+CREATE UNIQUE INDEX uq_labels_project_name ON labels(project_id, name);
+CREATE INDEX idx_labels_name ON labels(name);
+
+CREATE TABLE issue_labels (
+  issue_id INTEGER REFERENCES issues(id),
+  label_id INTEGER REFERENCES labels(id),
+  PRIMARY KEY(issue_id, label_id)
+);
+CREATE INDEX idx_issue_labels_label ON issue_labels(label_id);
+```
+
+---
+
+### Checkpoint 2: MR + Comments + File Links Ingestion
+**Deliverable:** All MRs, discussion threads, and file-change links stored locally
+
+**Test:** Run `gitlab-engine ingest --type=merge_requests` → count matches; run `gitlab-engine show mr 1234` → displays MR with comments and files changed
+
+**Scope:**
+- MR fetcher with pagination
+- Notes fetcher (issue notes + MR notes) as a dependent resource:
+  - During initial ingest: fetch notes for every issue/MR
+  - During sync: refetch notes only for issues/MRs updated since cursor
+- MR changes/diffs fetcher as a dependent resource:
+  - During initial ingest: fetch changes for every MR
+  - During sync: refetch changes only for MRs updated since cursor
+- Relationship linking (note → parent issue/MR via foreign keys, MR → files)
+- Extended CLI commands for MR display
+
+**Schema Additions:**
+```sql
+CREATE TABLE merge_requests (
+  id INTEGER PRIMARY KEY,
+  gitlab_id INTEGER UNIQUE NOT NULL,
+  project_id INTEGER NOT NULL REFERENCES projects(id),
+  iid INTEGER NOT NULL,
+  title TEXT,
+  description TEXT,
+  state TEXT,
+  author_username TEXT,
+  source_branch TEXT,
+  target_branch TEXT,
+  created_at INTEGER,
+  updated_at INTEGER,
+  merged_at INTEGER,
+  web_url TEXT,
+  raw_payload_id INTEGER REFERENCES raw_payloads(id)
+);
+CREATE INDEX idx_mrs_project_updated ON merge_requests(project_id, updated_at);
+CREATE INDEX idx_mrs_author ON merge_requests(author_username);
+
+-- Notes with explicit parent foreign keys for referential integrity
+CREATE TABLE notes (
+  id INTEGER PRIMARY KEY,
+  gitlab_id INTEGER UNIQUE NOT NULL,
+  project_id INTEGER NOT NULL REFERENCES projects(id),
+  issue_id INTEGER REFERENCES issues(id),
+  merge_request_id INTEGER REFERENCES merge_requests(id),
+  noteable_type TEXT NOT NULL,      -- 'Issue' | 'MergeRequest'
+  noteable_iid INTEGER NOT NULL,    -- parent IID (from API path)
+  author_username TEXT,
+  body TEXT,
+  created_at INTEGER,
+  updated_at INTEGER,
+  system BOOLEAN,
+  raw_payload_id INTEGER REFERENCES raw_payloads(id),
+  -- Exactly one parent FK must be set
+  CHECK (
+    (noteable_type='Issue' AND issue_id IS NOT NULL AND merge_request_id IS NULL) OR
+    (noteable_type='MergeRequest' AND merge_request_id IS NOT NULL AND issue_id IS NULL)
+  )
+);
+CREATE INDEX idx_notes_issue ON notes(issue_id);
+CREATE INDEX idx_notes_mr ON notes(merge_request_id);
+CREATE INDEX idx_notes_author ON notes(author_username);
+
+-- File linkage for "what MRs touched this file?" queries (with rename support)
+CREATE TABLE mr_files (
+  id INTEGER PRIMARY KEY,
+  merge_request_id INTEGER REFERENCES merge_requests(id),
+  old_path TEXT,
+  new_path TEXT,
+  new_file BOOLEAN,
+  deleted_file BOOLEAN,
+  renamed_file BOOLEAN,
+  UNIQUE(merge_request_id, old_path, new_path)
+);
+CREATE INDEX idx_mr_files_old_path ON mr_files(old_path);
+CREATE INDEX idx_mr_files_new_path ON mr_files(new_path);
+
+-- MR labels (reuse same labels table)
+CREATE TABLE mr_labels (
+  merge_request_id INTEGER REFERENCES merge_requests(id),
+  label_id INTEGER REFERENCES labels(id),
+  PRIMARY KEY(merge_request_id, label_id)
+);
+CREATE INDEX idx_mr_labels_label ON mr_labels(label_id);
+```
+
+---
+
+### Checkpoint 3: Embedding Generation
+**Deliverable:** Vector embeddings generated for all text content
+
+**Test:** Run `gitlab-engine embed --all` → progress indicator; run `gitlab-engine stats` → shows embedding coverage percentage
+
+**Scope:**
+- Ollama integration (nomic-embed-text model)
+- Embedding generation pipeline (batch processing)
+- Vector storage in SQLite (sqlite-vss extension)
+- Progress tracking and resumability
+- Document extraction layer:
+  - Canonical "search documents" derived from issues/MRs/notes
+  - Stable content hashing for change detection (SHA-256 of content_text)
+  - Single embedding per document (chunking deferred to post-MVP)
+- Denormalized metadata for fast filtering (author, labels, dates)
+- Fast label filtering via `document_labels` join table
+
+**Schema Additions:**
+```sql
+-- Unified searchable documents (derived from issues/MRs/notes)
+CREATE TABLE documents (
+  id INTEGER PRIMARY KEY,
+  source_type TEXT NOT NULL,     -- 'issue' | 'merge_request' | 'note'
+  source_id INTEGER NOT NULL,    -- local DB id in the source table
+  project_id INTEGER NOT NULL REFERENCES projects(id),
+  author_username TEXT,
+  label_names TEXT,              -- JSON array (display/debug only)
+  created_at INTEGER,
+  updated_at INTEGER,
+  url TEXT,
+  title TEXT,                    -- null for notes
+  content_text TEXT NOT NULL,    -- canonical text for embedding/snippets
+  content_hash TEXT NOT NULL,    -- SHA-256 for change detection
+  UNIQUE(source_type, source_id)
+);
+CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
+CREATE INDEX idx_documents_author ON documents(author_username);
+-- No separate (source_type, source_id) index is needed:
+-- the UNIQUE constraint above already creates one implicitly.
+
+-- Fast label filtering for documents (indexed exact-match)
+CREATE TABLE document_labels (
+  document_id INTEGER NOT NULL REFERENCES documents(id),
+  label_name TEXT NOT NULL,
+  PRIMARY KEY(document_id, label_name)
+);
+CREATE INDEX idx_document_labels_label ON document_labels(label_name);
+
+-- sqlite-vss virtual table
+-- Storage rule: embeddings.rowid = documents.id
+CREATE VIRTUAL TABLE embeddings USING vss0(
+  embedding(768)
+);
+
+-- Embedding provenance + change detection
+-- document_id is PRIMARY KEY and equals embeddings.rowid
+CREATE TABLE embedding_metadata (
+  document_id INTEGER PRIMARY KEY REFERENCES documents(id),
+  model TEXT NOT NULL,           -- 'nomic-embed-text'
+  dims INTEGER NOT NULL,         -- 768
+  content_hash TEXT NOT NULL,    -- copied from documents.content_hash
+  created_at INTEGER NOT NULL
+);
+```
+
+**Storage Rule (MVP):**
+- Insert embedding with `rowid = documents.id`
+- Upsert `embedding_metadata` by `document_id`
+- This alignment simplifies joins and eliminates rowid mapping fragility
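+
+A minimal sketch of the storage rule for one document. The bound parameters (`:document_id`, `:embedding`, `:content_hash`, `:now`) are illustrative names supplied by the application. The delete-then-insert step assumes the installed sqlite-vss build supports `DELETE` on `vss0` tables, and `:embedding` is assumed to be a vector in a format sqlite-vss accepts (for example a JSON array of 768 floats):
+
+```sql
+BEGIN;
+
+-- Replace any previous vector so that embeddings.rowid stays equal to documents.id.
+-- Assumption: DELETE is supported on vss0 tables in the installed version.
+DELETE FROM embeddings WHERE rowid = :document_id;
+INSERT INTO embeddings(rowid, embedding)
+VALUES (:document_id, :embedding);
+
+-- Record which content was embedded. :content_hash is the hash of the text
+-- that was actually sent to the model, not a fresh read of documents.content_hash.
+INSERT INTO embedding_metadata (document_id, model, dims, content_hash, created_at)
+VALUES (:document_id, 'nomic-embed-text', 768, :content_hash, :now)
+ON CONFLICT(document_id) DO UPDATE SET
+  model        = excluded.model,
+  dims         = excluded.dims,
+  content_hash = excluded.content_hash,
+  created_at   = excluded.created_at;
+
+COMMIT;
+```
+
+Writing both rows in one transaction keeps the vector and its provenance from drifting apart if the process dies mid-write.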
+
+**Document Extraction Rules:**
+- Issue → title + "\n\n" + description
+- MR → title + "\n\n" + description
+- Note → body (skip system notes unless they contain meaningful content)
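+
+The note rule can be sketched as an upsert into `documents`. SQLite has no built-in SHA-256 function, so this sketch assumes the application computes the hash of `body` and binds it as `:content_hash`. It also skips every system note, which is a simplification of the rule above. `:note_id` is an illustrative parameter name:
+
+```sql
+-- Sketch only: derive or refresh the search document for a single note.
+-- Requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE.
+INSERT INTO documents (
+  source_type, source_id, project_id, author_username,
+  created_at, updated_at, content_text, content_hash
+)
+SELECT 'note', n.id, n.project_id, n.author_username,
+       n.created_at, n.updated_at, n.body, :content_hash
+FROM notes n
+WHERE n.id = :note_id
+  AND COALESCE(n.system, 0) = 0   -- simplification: all system notes are skipped
+  AND n.body IS NOT NULL
+ON CONFLICT(source_type, source_id) DO UPDATE SET
+  author_username = excluded.author_username,
+  updated_at      = excluded.updated_at,
+  content_text    = excluded.content_text,
+  content_hash    = excluded.content_hash;
+```
+
+Issues and MRs would follow the same shape, with `content_text` built as title, two newlines (`char(10, 10)` in SQLite), then description.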
+
+---
+
+### Checkpoint 4: Semantic Search
+**Deliverable:** Working semantic search across all indexed content
+
+**Tests:**
+1. Run `gitlab-engine search "authentication redesign"` → returns ranked results with snippets
+2. Golden queries: curated list of 10 queries with expected result *containment* (e.g., "at least one of these 3 known URLs appears in top 10")
+3. `gitlab-engine search "..." --json` validates against JSON schema (stable fields present)
+
+**Scope:**
+- Hybrid retrieval:
+  - Vector recall (sqlite-vss) + FTS lexical recall (fts5)
+  - Merge + rerank results using Reciprocal Rank Fusion (RRF)
+- Result ranking and scoring (document-level)
+- Search filters: `--type=issue|mr|note`, `--author=username`, `--after=date`, `--label=name`
+  - Label filtering operates on `document_labels` (indexed, exact-match)
+- Output formatting: ranked list with title, snippet, score, URL
+- JSON output mode for AI agent consumption
+
+**Schema Additions:**
+```sql
+-- Full-text search for hybrid retrieval
+CREATE VIRTUAL TABLE documents_fts USING fts5(
+  title,
+  content_text,
+  content='documents',
+  content_rowid='id'
+);
+
+-- Triggers to keep FTS in sync
+CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
+  INSERT INTO documents_fts(rowid, title, content_text)
+  VALUES (new.id, new.title, new.content_text);
+END;
+
+CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
+  INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
+  VALUES('delete', old.id, old.title, old.content_text);
+END;
+
+CREATE TRIGGER documents_au AFTER UPDATE ON documents BEGIN
+  INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
+  VALUES('delete', old.id, old.title, old.content_text);
+  INSERT INTO documents_fts(rowid, title, content_text)
+  VALUES (new.id, new.title, new.content_text);
+END;
+```
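+
+Because `documents` is populated in Checkpoint 3, rows that exist before these triggers are created will not be in the index. A sketch of a one-time backfill using the FTS5 `rebuild` command, followed by the kind of lexical query that `--mode=lexical` could run (`:query` is an illustrative parameter name):
+
+```sql
+-- One-time backfill: re-read every row of the external content table.
+INSERT INTO documents_fts(documents_fts) VALUES('rebuild');
+
+-- Lexical top 50. bm25() returns lower (more negative) values for better
+-- matches, so ascending order puts the best hits first.
+SELECT d.id,
+       d.title,
+       d.url,
+       snippet(documents_fts, 1, '[', ']', '...', 12) AS snippet,  -- column 1 = content_text
+       bm25(documents_fts) AS score
+FROM documents_fts
+JOIN documents d ON d.id = documents_fts.rowid
+WHERE documents_fts MATCH :query
+ORDER BY bm25(documents_fts)
+LIMIT 50;
+```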
+
+**Hybrid Search Algorithm (MVP) - Reciprocal Rank Fusion:**
+1. Query both vector index (top 50) and FTS5 (top 50)
+2. Merge results by document_id
+3. Combine with Reciprocal Rank Fusion (RRF):
+   - For each retriever list, assign ranks (1..N)
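+   - `rrf_score = Σ 1 / (k + rank)` with k=60 (tunable), summed over the lists in which the document appears; a list that does not contain the document contributes nothing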
+   - RRF is simpler than weighted sums and doesn't require score normalization
+4. Apply filters (type, author, date, label)
+5. Return top K
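+
+The steps above can be sketched in SQL. The staging tables `vector_hits` and `fts_hits` are hypothetical: the application would fill them with each retriever's top 50, using ranks 1..N. The filter parameters are illustrative, and filtering `--after` on `created_at` is an assumption, as is treating `--label` as a single value:
+
+```sql
+-- Hypothetical per-query staging tables, one row per retrieved document.
+CREATE TEMP TABLE vector_hits (document_id INTEGER PRIMARY KEY, hit_rank INTEGER NOT NULL);
+CREATE TEMP TABLE fts_hits    (document_id INTEGER PRIMARY KEY, hit_rank INTEGER NOT NULL);
+
+WITH all_hits AS (
+  SELECT document_id, hit_rank FROM vector_hits
+  UNION ALL
+  SELECT document_id, hit_rank FROM fts_hits
+),
+fused AS (
+  -- k = 60; a document missing from one list simply has no row for it.
+  SELECT document_id, SUM(1.0 / (60 + hit_rank)) AS rrf_score
+  FROM all_hits
+  GROUP BY document_id
+)
+SELECT d.id, d.title, d.url, f.rrf_score
+FROM fused f
+JOIN documents d ON d.id = f.document_id
+WHERE (:type   IS NULL OR d.source_type = :type)
+  AND (:author IS NULL OR d.author_username = :author)
+  AND (:after  IS NULL OR d.created_at >= :after)   -- same unit as created_at
+  AND (:label  IS NULL OR EXISTS (
+        SELECT 1 FROM document_labels dl
+        WHERE dl.document_id = d.id AND dl.label_name = :label))
+ORDER BY f.rrf_score DESC, d.id
+LIMIT :k;
+```
+
+As a worked example with k=60: a document ranked 2nd and 3rd in the two lists scores 1/62 + 1/63 ≈ 0.0320, which beats a document ranked 1st in only one list (1/61 ≈ 0.0164). Agreement between retrievers outweighs a single top rank.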
+
+**Why RRF over Weighted Sums:**
+- FTS5 BM25 scores and vector distances use different scales
+- Weighted sums (`0.7 * vector + 0.3 * fts`) require careful normalization
+- RRF operates on ranks, not scores, making it robust to scale differences
+- Well-established in information retrieval literature
+
+**CLI Interface:**
+```bash
+# Basic semantic search
+gitlab-engine search "why did we choose Redis"
+
+# Pure FTS search (fallback if embeddings unavailable)
+gitlab-engine search "redis" --mode=lexical
+
+# Filtered search
+gitlab-engine search "authentication" --type=mr --after=2024-01-01
+
+# Filter by label
+gitlab-engine search "performance" --label=bug --label=critical
+
+# JSON output for programmatic use
+gitlab-engine search "payment processing" --json
+```
+
+---
+
+### Checkpoint 5: Incremental Sync
+**Deliverable:** Efficient ongoing synchronization with GitLab
+
+**Test:** Make a change in GitLab; run `gitlab-engine sync` → only fetches changed items; verify change appears in search
+
+**Scope:**
+- Delta sync based on stable cursor (updated_at + tie-breaker id)
+- Dependent resources sync strategy (notes, MR changes)
+- Webhook handler (optional, if webhook access granted)
+- Re-embedding when a document has no embedding yet or its content_hash has changed (documents.content_hash != embedding_metadata.content_hash)
+- Sync status reporting
+
+**Correctness Rules (MVP):**
+1. Fetch pages ordered by `updated_at ASC`; within identical timestamps, advance by `gitlab_id ASC`
+2. Cursor advances only after successful DB commit for that page
+3. Dependent resources:
+   - For each updated issue/MR, refetch its notes (sorted by `updated_at`)
+   - For each updated MR, refetch its file changes
+4. A document is queued for embedding iff it has no `embedding_metadata` row or `documents.content_hash != embedding_metadata.content_hash`
+5. A sync run is marked 'failed', with an error message, if any page fails; the next run resumes from the last committed cursor
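+
+Rule 4 has to cover documents that have never been embedded, because a plain `!=` comparison against a missing row yields NULL in SQL and selects nothing. A minimal sketch of the queue query, using a `LEFT JOIN` to catch both cases (`:batch_size` is an illustrative parameter name):
+
+```sql
+-- Documents that are new (no metadata row) or whose content changed
+-- since the stored embedding was generated.
+SELECT d.id, d.content_text, d.content_hash
+FROM documents d
+LEFT JOIN embedding_metadata em ON em.document_id = d.id
+WHERE em.document_id IS NULL
+   OR em.content_hash != d.content_hash
+ORDER BY d.id
+LIMIT :batch_size;
+```
+
+Re-running the query after each batch makes the embedding step resumable: rows drop out of the result as soon as their metadata hash matches.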
+
+**Why Dependent Resource Model:**
+- GitLab Notes API doesn't provide a clean global `updated_after` stream
+- Notes are listed per-issue or per-MR, not as a top-level resource
+- Treating notes as dependent resources (refetch when parent updates) is simpler and more correct
+- Same applies to MR changes/diffs
+
+**CLI Commands:**
+```bash
+# Incremental sync (respects cursors; fetches only new/updated items)
+gitlab-engine sync
+
+# Force full re-sync (resets cursors)
+gitlab-engine sync --full
+
+# Override stale 'running' run after operator review
+gitlab-engine sync --force
+
+# Show sync status
+gitlab-engine sync-status
+```
+
+---
+
+## Future Checkpoints (Post-MVP)
+
+### Checkpoint 6: File/Feature History View
+- Map commits to MRs to discussions
+- Query: "Show decision history for src/auth/login.ts"
+- Ship `gitlab-engine file-history <path>` as a first-class feature here
+- This command is deferred from MVP to sharpen checkpoint focus
+
+### Checkpoint 7: Personal Dashboard
+- Filter by assigned/mentioned
+- Integrate with existing gitlab-inbox tool
+
+### Checkpoint 8: Person Context
+- Aggregate contributions by author
+- Expertise inference from activity
+
+### Checkpoint 9: Decision Graph
+- Extract decisions from discussions (LLM-assisted)
+- Visualize decision relationships
+
+---
+
+## Verification Strategy
+
+Each checkpoint includes:
+
+1. **Automated tests** - Unit tests for data transformations, integration tests for API calls
+2. **CLI smoke tests** - Manual commands with expected outputs documented
+3. **Data integrity checks** - Count verification against GitLab, schema validation
+4. **Search quality tests** - Known queries with expected results (for Checkpoint 4+)
+
+---
+
+## Risk Mitigation
+
+| Risk | Mitigation |
+|------|------------|
+| GitLab rate limiting | Exponential backoff, respect Retry-After headers, incremental sync |
+| Embedding model quality | Start with nomic-embed-text; architecture allows model swap |
+| SQLite scale limits | Monitor performance; Postgres migration path documented |
+| Stale data | Incremental sync with change detection |
+| Mid-sync failures | Cursor-based resumption, sync_runs audit trail |
+| Search quality | Hybrid (vector + FTS5) retrieval with RRF, golden query test suite |
+| Concurrent sync corruption | Single-flight protection (refuse if existing run is `running`) |
+
+**SQLite Performance Defaults (MVP):**
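+- Set `PRAGMA journal_mode=WAL;` when opening the database (the mode persists in the database file, so repeating it on every connection is harmless)
+- Enable `PRAGMA foreign_keys=ON;` on every connection (SQLite does not persist this setting; it protects integrity rather than performance)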
+- Use explicit transactions for page/batch inserts
+- Targeted indexes on `(project_id, updated_at)` for primary resources
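+
+A sketch of how these defaults might look at connection setup and around one page of inserts. The comment placeholder stands in for the application's own upsert statements:
+
+```sql
+-- Run when opening each connection.
+PRAGMA journal_mode = WAL;   -- returns the resulting mode ('wal' for file-backed databases)
+PRAGMA foreign_keys = ON;    -- per-connection; off by default in SQLite
+
+-- One explicit transaction per fetched page or batch.
+BEGIN IMMEDIATE;
+-- ... upsert the rows of the current page here ...
+COMMIT;
+```
+
+`BEGIN IMMEDIATE` takes the write lock up front, so a competing writer fails at the start of the batch rather than partway through it.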
+
+---
+
+## Schema Summary
+
+| Table | Checkpoint | Purpose |
+|-------|------------|---------|
+| projects | 0 | Configured GitLab projects |
+| sync_runs | 0 | Audit trail of sync operations |
+| sync_cursors | 0 | Resumable sync state per primary resource |
+| raw_payloads | 0 | Decoupled raw JSON storage |
+| issues | 1 | Normalized issues |
+| labels | 1 | Label definitions (unique by project + name) |
+| issue_labels | 1 | Issue-label junction |
+| merge_requests | 2 | Normalized MRs |
+| notes | 2 | Issue and MR comments (with parent FKs) |
+| mr_files | 2 | MR file changes (with rename tracking) |
+| mr_labels | 2 | MR-label junction |
+| documents | 3 | Unified searchable documents |
+| document_labels | 3 | Document-label junction for fast filtering |
+| embeddings | 3 | Vector embeddings (sqlite-vss, rowid=document_id) |
+| embedding_metadata | 3 | Embedding provenance + change detection |
+| documents_fts | 4 | Full-text search index (fts5) |
+
+---
+
+## Resolved Decisions
+
+| Question | Decision | Rationale |
+|----------|----------|-----------|
+| Commit/file linkage | **Include MR→file links** | Enables "what MRs touched this file?" without full commit history |
+| Labels | **Index as filters** | Labels are widely used; `document_labels` table enables fast `--label=X` filtering |
+| Labels uniqueness | **By (project_id, name)** | GitLab API returns labels as strings; gitlab_id isn't always available |
+| Sync method | **Polling for MVP** | Decide on webhooks after using the system |
+| Notes sync | **Dependent resource** | Notes API is per-parent, not global; refetch on parent update |
+| Hybrid ranking | **RRF over weighted sums** | Simpler, no score normalization needed |
+| Embedding rowid | **rowid = documents.id** | Eliminates fragile rowid mapping during upserts |
+| file-history CLI | **Post-MVP (CP6)** | Sharpens MVP checkpoint focus |
+
+---
+
+## Next Steps
+
+1. User approves this spec
+2. Generate Checkpoint 0 PRD for project setup
+3. Implement Checkpoint 0
+4. Human validates → proceed to Checkpoint 1
+5. Repeat for each checkpoint