initial
563
PRD.md
Normal file
@@ -0,0 +1,563 @@
# GitLab Inbox - Product Requirements Document

## Overview

**Product Name**: GitLab Inbox
**Version**: 1.0
**Author**: Taylor Eernisse
**Date**: January 16, 2026

### Problem Statement

Managing GitLab activity with ADHD is overwhelming. The native GitLab interface creates cognitive overload through:

- **Information scatter**: Issues, MRs, and activity are spread across multiple pages
- **Missing reply awareness**: Hard to know when someone has responded to your question (not fully covered by /todos alone)
- **Context loss**: Difficult to find the right tab or remember which conversation you were tracking
- **No unified "what's next"**: Multiple clicks required to understand what needs attention

### Solution

A local, always-open "inbox" application that presents GitLab notifications in an ADHD-friendly interface with explicit "handled" tracking, snooze capabilities, watchlist for awaiting replies, and progress visibility.

---

## Target User

**Primary Persona**: Software developer with ADHD working on 1-2 GitLab projects who needs to track conversations and respond to mentions, reviews, and assignments without cognitive overload.

**Key Characteristics**:
- Needs clear "what's next" visibility
- Benefits from external accountability (seeing who's waiting)
- Motivated by progress tracking (watching a list shrink)
- Prefers always-open tools over on-demand checks
- Struggles with context switching and finding the right place
- Needs a "not now but not forgotten" path that doesn't require willpower

---

## Goals

### User Goals
1. Know immediately when someone has replied or needs my attention
2. Quickly navigate to the right place in GitLab to respond
3. Track what I've handled today for satisfaction and progress awareness
4. Reduce cognitive load of manually tracking conversations
5. Defer items temporarily without losing accountability (snooze)
6. Know when someone has replied to something I'm waiting on

### Product Goals
1. Reduce time-to-awareness for GitLab notifications
2. Eliminate the need to manually poll GitLab for updates
3. Provide ADHD-friendly UX patterns (clear actions, progress visibility, minimal decisions)
4. Enable keyboard-first operation to reduce friction

### Non-Goals (v1.0)
- Replacing GitLab for any write operations (commenting, reviewing, merging)
- Supporting multiple GitLab instances
- Team/shared usage
- Mobile support

---

## Core Features

### 1. Inbox View (Primary)

**Description**: Display all GitLab todos (notifications) that need attention.

**Data Source**: GitLab `/todos` API endpoint

**Display Elements** (per item):
| Element | Description |
|---------|-------------|
| Action Badge | Type indicator: mentioned, assigned, review_requested, build_failed, etc. |
| Target Title | MR or Issue title |
| Author | Who triggered this todo (name + avatar) |
| Time | Relative time since created ("2h ago", "3 days") |
| Project | Project name for context |

**Interactions**:
- **Click item / Enter** → Opens target URL in browser (GitLab)
- **Mark Handled** → Moves item to Done Today (local state only)
- **Snooze** → Hides item until a chosen time (local state only)
- **Dismiss** → `POST /todos/:id/mark_as_done` (marks as done in GitLab)

**Filtering**: Items marked as "handled" or "snoozed" locally are hidden from Inbox.
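
The filtering rule can be sketched as a pure function over the fetched todos. This is a minimal illustration, not the app's implementation: `Todo`, `HiddenState`, and `visibleInInbox` are hypothetical names, and the state shape is simplified from the Local State schema later in this document.

```typescript
// Hypothetical minimal shapes; the real GitLabTodo and LocalState carry more fields.
interface Todo {
  id: number;
  created_at: string;
}

interface HiddenState {
  handledIds: Set<number>;           // ids marked handled locally
  snoozedUntil: Map<number, string>; // id -> ISO wake time
}

// Returns the todos that should still be shown in the Inbox at time `now`.
function visibleInInbox(todos: Todo[], state: HiddenState, now: Date): Todo[] {
  return todos.filter((todo) => {
    if (state.handledIds.has(todo.id)) return false;
    const wakeAt = state.snoozedUntil.get(todo.id);
    // A snoozed item stays hidden only until its wake time passes.
    if (wakeAt !== undefined && new Date(wakeAt).getTime() > now.getTime()) {
      return false;
    }
    return true;
  });
}
```

Keeping the filter pure (state and clock passed in) makes the "woke up" transition easy to unit test without touching the state file.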

### 2. Snoozed View

**Description**: Items temporarily deferred until their wake time.

**Purpose**:
- "Not now but not forgotten" path
- Reduces inbox dread by shrinking the visible list
- Enables focus sessions: clear the deck, then pull from Snoozed intentionally

**Snooze Options**:
- Later today (3 hours)
- Tomorrow morning (9am local)
- Next weekday (Mon-Fri, 9am)
- Custom date/time

**Behavior**:
- Snoozed items are hidden from Inbox
- When wake time passes, item returns to Inbox with a "Woke up" indicator
- Snoozed view shows all snoozed items with their wake times
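
The preset snooze options could be computed roughly as follows. This is a sketch under assumptions: `SnoozeOption`, `atNineLocal`, and `computeWakeAt` are illustrative names, and "next weekday" is read here as "the next calendar day that falls on Mon-Fri", which the options list does not spell out.

```typescript
type SnoozeOption = "laterToday" | "tomorrowMorning" | "nextWeekday";

// 9:00 local time, `daysAhead` days after `base`.
function atNineLocal(base: Date, daysAhead: number): Date {
  // The Date constructor normalizes day overflow (e.g. Jan 32 -> Feb 1).
  return new Date(
    base.getFullYear(),
    base.getMonth(),
    base.getDate() + daysAhead,
    9, 0, 0, 0,
  );
}

function computeWakeAt(option: SnoozeOption, now: Date): Date {
  switch (option) {
    case "laterToday":
      return new Date(now.getTime() + 3 * 60 * 60 * 1000);
    case "tomorrowMorning":
      return atNineLocal(now, 1);
    case "nextWeekday": {
      let candidate = atNineLocal(now, 1);
      // getDay(): 0 = Sunday, 6 = Saturday
      while (candidate.getDay() === 0 || candidate.getDay() === 6) {
        candidate = atNineLocal(candidate, 1);
      }
      return candidate;
    }
  }
}
```

Building the wake time from local calendar fields, rather than adding 24 hours, keeps "9am local" stable across DST changes.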

### 3. Watchlist (Awaiting Reply)

**Description**: Targets you're explicitly waiting on (MRs/Issues/etc.). Alerts when there is new activity since you last checked.

**Purpose**:
- GitLab todos don't guarantee "someone replied" notifications
- Explicit watch semantics for "I'm waiting on Bob" tracking
- Adds the other half of "external accountability": you see who you are waiting on, not only who is waiting on you

**Data Sources**:
- Primary: `/todos` (fast path for items that generate new todos)
- Secondary: per-target `updated_at`/notes polling for watched items (small set)

**Interactions**:
- Mark Handled → optionally "Add to Watchlist" toggle
- Watch item shows "Last seen" timestamp and "New activity" indicator
- Click to open target in GitLab
- Remove from watchlist when no longer waiting
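
The "New activity" indicator for the secondary (polling) path can be sketched as a timestamp comparison. The names `WatchEntry`, `watchKey`, `hasNewActivity`, and `markSeen` are hypothetical, and treating the first observation as a baseline rather than as new activity is an assumption.

```typescript
interface WatchEntry {
  targetType: string;
  targetId: number;
  targetUrl: string;
  lastSeenUpdatedAt?: string; // ISO timestamp of the target when the user last looked
}

// Key format follows the "MergeRequest:123" convention used for the watchlist.
function watchKey(entry: Pick<WatchEntry, "targetType" | "targetId">): string {
  return `${entry.targetType}:${entry.targetId}`;
}

// True when the polled target has changed since the user last saw it.
function hasNewActivity(entry: WatchEntry, remoteUpdatedAt: string): boolean {
  // First observation: nothing to compare against, so treat it as the baseline.
  if (!entry.lastSeenUpdatedAt) return false;
  return Date.parse(remoteUpdatedAt) > Date.parse(entry.lastSeenUpdatedAt);
}

// Called when the user opens the target; clears the indicator.
function markSeen(entry: WatchEntry, remoteUpdatedAt: string): WatchEntry {
  return { ...entry, lastSeenUpdatedAt: remoteUpdatedAt };
}
```

One caveat with this approach: a target's `updated_at` can also change because of your own actions, so a fuller version would likely inspect the latest note's author as well.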

### 4. Done Today View

**Description**: Items marked as handled during the current day.

**Purpose**:
- ADHD-friendly progress visibility
- Satisfaction from watching list shrink
- Review of daily accomplishments

**Behavior**:
- Stored as date-bucketed ledger keyed by local date (YYYY-MM-DD)
- "Done Today" shows bucket for current local date
- Option to clear today's bucket only
- Historical buckets retained for potential "Done Yesterday" or weekly views
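
A date-bucketed ledger needs a key derived from the local calendar date, not from UTC. The sketch below shows one way to do that; `localDateKey`, `HandledLedger`, and `recordHandled` are illustrative names, and the ledger stores only a timestamp here rather than the full todo snapshot.

```typescript
// Bucket key in local time. Date.prototype.toISOString() would give the UTC date,
// which can differ from the local date late in the evening or early in the morning.
function localDateKey(d: Date): string {
  const year = d.getFullYear();
  const month = String(d.getMonth() + 1).padStart(2, "0");
  const day = String(d.getDate()).padStart(2, "0");
  return `${year}-${month}-${day}`;
}

type HandledLedger = Record<string, Record<number, { handledAt: string }>>;

// Returns a new ledger with the todo recorded in the bucket for `now`.
function recordHandled(ledger: HandledLedger, todoId: number, now: Date): HandledLedger {
  const key = localDateKey(now);
  return {
    ...ledger,
    [key]: { ...(ledger[key] ?? {}), [todoId]: { handledAt: now.toISOString() } },
  };
}
```

"Done Today" is then just `ledger[localDateKey(new Date())]`, and clearing today's bucket leaves historical buckets untouched.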

### 5. Manual Refresh

**Description**: Button to fetch latest todos on demand.

**Purpose**: Immediate update when user knows something changed.

### 6. Background Polling (v1.1)

**Description**: Automatic periodic refresh of todos.

**Configuration**:
- Base interval (default: 60s)
- Backoff on failure (exponential, capped at 15m) with jitter
- 429 handling (respect `Retry-After` header; otherwise back off)

**Indicator**:
- Last successful refresh time
- Next scheduled refresh
- Current backoff state (if any)
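
The polling schedule above could be computed along these lines. This is a sketch, not a prescribed algorithm: `nextDelayMs` is a hypothetical helper, the jitter strategy (uniform between half and the full backoff value) is one common choice among several, and `Retry-After` is assumed to arrive as a number of seconds even though HTTP also allows a date.

```typescript
const BASE_INTERVAL_MS = 60_000;    // default base interval: 60s
const MAX_BACKOFF_MS = 15 * 60_000; // cap: 15 minutes

// Delay before the next poll.
// consecutiveFailures: 0 after a successful refresh.
// retryAfterSeconds: parsed Retry-After header from a 429 response, if present.
// random: injectable for tests; defaults to Math.random.
function nextDelayMs(
  consecutiveFailures: number,
  retryAfterSeconds?: number,
  random: () => number = Math.random,
): number {
  if (retryAfterSeconds !== undefined) return retryAfterSeconds * 1000;
  if (consecutiveFailures === 0) return BASE_INTERVAL_MS;
  const backoff = Math.min(MAX_BACKOFF_MS, BASE_INTERVAL_MS * 2 ** consecutiveFailures);
  // Jitter: pick uniformly between half the backoff and the full backoff.
  return Math.round(backoff / 2 + random() * (backoff / 2));
}
```

The returned delay can feed the "Next scheduled refresh" indicator directly, and a non-zero `consecutiveFailures` is the backoff state to display.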

### 7. Keyboard Shortcuts (v1.0)

**Description**: Keyboard-first operation for reduced friction.

| Key | Action |
|-----|--------|
| `j` / `k` | Navigate down / up |
| `Enter` | Open selected item in GitLab |
| `h` | Mark handled |
| `s` | Snooze (opens snooze picker) |
| `d` | Dismiss (mark as done in GitLab) |
| `w` | Add to / remove from watchlist |
| `/` | Focus search/filter |

### 8. Focus Mode (Optional)

**Description**: Show only the next N items (default 3) to reduce decision load.

**Purpose**:
- Convert "overwhelm" into "sequence"
- Reduce choices, increase throughput
- ADHD-optimized: work the queue, don't manage the list

**Behavior**:
- Primary action emphasized ("Open", then "Handled"/"Snooze")
- Toggle: Focus / All Items
- Focus queue is top N unhandled items by creation date
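
Selecting the focus queue is a sort followed by a slice. In this sketch `focusQueue` is a hypothetical helper, and the sort direction is an assumption: the behavior list says "by creation date" without a direction, so newest-first is used here to match the "sorted by newest first" wording in the Morning Check-in flow.

```typescript
// Top `n` unhandled items, newest first (sort direction is an assumption).
function focusQueue<T extends { created_at: string }>(unhandled: T[], n = 3): T[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...unhandled]
    .sort((a, b) => Date.parse(b.created_at) - Date.parse(a.created_at))
    .slice(0, n);
}
```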

---

## Technical Architecture

### Tech Stack

| Component | Technology |
|-----------|------------|
| Framework | TanStack Start |
| Styling | Tailwind CSS |
| Runtime | Node.js (local) |
| State Persistence | JSON file with atomic writes (local) |
| Secret Storage | OS keychain (preferred) or encrypted local store |
| GitLab Integration | REST API with Personal Access Token |

### Deployment

- **Local only**: Runs on localhost
- **No external hosting**: No cloud deployment, no auth flows
- **Single user**: No multi-tenancy

### GitLab API

**Authentication**: Personal Access Token (PAT). The `read_api` scope covers fetching todos; the Dismiss action (`POST /todos/:id/mark_as_done`) is a write and requires the full `api` scope.

**Primary Endpoints**:
```
GET /api/v4/todos?state=pending&per_page=100

POST /api/v4/todos/:id/mark_as_done
```

**Response Structure** (relevant fields):
```typescript
interface GitLabTodo {
  id: number;
  action_name:
    | 'assigned'
    | 'mentioned'
    | 'build_failed'
    | 'marked'
    | 'approval_required'
    | 'unmergeable'
    | 'directly_addressed'
    | 'merge_train_removed'
    | 'member_access_requested'
    | 'review_requested'
    | string; // forward-compatible for new action types
  target_type: 'MergeRequest' | 'Issue' | 'Commit' | 'Epic' | 'DesignManagement::Design' | string;
  target: {
    id: number;
    iid: number;
    title: string;
    web_url?: string; // optional; may not be present for all target types
  };
  target_url: string; // canonical "Open" URL - use this for navigation
  author: {
    id: number;
    name: string;
    avatar_url: string;
  };
  project: {
    id: number;
    name: string;
    path_with_namespace: string;
  };
  created_at: string;
}
```
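
A client for the `GET /todos` endpoint has to follow pagination once there are more than 100 pending items. The sketch below is illustrative: `FetchLike` and `fetchPendingTodos` are hypothetical names, and the fetch function is injected so the loop can be tested without a network. It relies on GitLab's `PRIVATE-TOKEN` request header and the `X-Next-Page` response header used by its offset pagination.

```typescript
// Minimal subset of the Fetch API that this sketch needs; the global `fetch` satisfies it.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

// Collects all pending todos across pages.
async function fetchPendingTodos(
  baseUrl: string,
  token: string,
  fetchImpl: FetchLike,
): Promise<unknown[]> {
  const all: unknown[] = [];
  let page: string | null = "1";
  while (page) {
    const url = `${baseUrl}/api/v4/todos?state=pending&per_page=100&page=${page}`;
    const res = await fetchImpl(url, { headers: { "PRIVATE-TOKEN": token } });
    if (!res.ok) throw new Error(`GitLab responded with HTTP ${res.status}`);
    all.push(...((await res.json()) as unknown[]));
    // X-Next-Page is empty on the last page, which ends the loop.
    page = res.headers.get("x-next-page") || null;
  }
  return all;
}
```

In the app this would be called as `fetchPendingTodos(gitlabUrl, token, fetch)`; a 401 or 429 surfaces through the thrown error and can drive the error and backoff states.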

### Local State

```typescript
interface LocalState {
  schemaVersion: number; // for migrations

  handledByDate: {
    [localDate: string]: { // YYYY-MM-DD in local time
      [todoId: number]: {
        handledAt: string; // ISO timestamp
        todo: GitLabTodo; // Snapshot for Done Today display
      }
    }
  };

  snoozedTodos: {
    [todoId: number]: {
      wakeAt: string; // ISO timestamp
      snoozedAt: string; // ISO timestamp
      todo: GitLabTodo; // snapshot
    }
  };

  watchlist: {
    [watchKey: string]: { // e.g., "MergeRequest:123" or "Issue:456"
      targetType: string;
      projectId?: number;
      targetId: number;
      targetIid?: number;
      targetUrl: string;
      lastSeenUpdatedAt?: string; // ISO - when we last observed the target
      lastCheckedAt?: string; // ISO - when we last polled
      addedAt: string; // ISO
      muted?: boolean;
    }
  };
}
```

**Storage**: `~/.config/gitlab-inbox/state.json`

**Persistence Strategy**:
- Atomic writes: write to `state.json.tmp`, then rename to `state.json`
- Keep `state.json.bak` as last-known-good before each write
- Validate JSON schema on load; if invalid, fall back to backup and surface warning
- Schema version for forward migrations
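
The persistence strategy can be sketched with Node's synchronous `fs` API. This is a minimal illustration under assumptions: `saveState`, `loadState`, and `DEFAULT_STATE_PATH` are hypothetical names, validation is delegated to a caller-supplied predicate rather than a real JSON schema, and the sketch does not `fsync` before renaming, which a version that must survive power loss would add. The `usedBackup` flag is what the UI would use to surface the warning.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const DEFAULT_STATE_PATH = path.join(os.homedir(), ".config", "gitlab-inbox", "state.json");

// Write via a temp file and rename, keeping the previous file as a backup.
function saveState(statePath: string, state: object): void {
  const tmpPath = `${statePath}.tmp`;
  const bakPath = `${statePath}.bak`;
  fs.mkdirSync(path.dirname(statePath), { recursive: true });
  // Keep the current file as last-known-good before it is replaced.
  if (fs.existsSync(statePath)) fs.copyFileSync(statePath, bakPath);
  fs.writeFileSync(tmpPath, JSON.stringify(state, null, 2), "utf8");
  // rename() replaces the destination atomically on POSIX filesystems.
  fs.renameSync(tmpPath, statePath);
}

// Load the state file, falling back to the backup if it is missing or invalid.
function loadState(
  statePath: string,
  isValid: (value: unknown) => boolean,
): { state: unknown; usedBackup: boolean } {
  const candidates = [
    [statePath, false],
    [`${statePath}.bak`, true],
  ] as const;
  for (const [file, usedBackup] of candidates) {
    try {
      const parsed: unknown = JSON.parse(fs.readFileSync(file, "utf8"));
      if (isValid(parsed)) return { state: parsed, usedBackup };
    } catch {
      // Missing or unparsable file: try the next candidate.
    }
  }
  throw new Error("No valid state file or backup found");
}
```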

---

## User Interface

### Layout

```
+--------------------------------------------------+
|  GitLab Inbox      [Focus] [Refresh]  🟢 2m ago  |
|  [Inbox] [Snoozed] [Watchlist] [Done Today]      |
+--------------------------------------------------+
|                                                  |
|  > [mentioned] Fix login bug                     |
|    Alice Smith · infra-frontend · 2h ago         |
|    [Snooze] [Handle] [Open]                      |
|                                                  |
|    [review_requested] Add caching layer          |
|    Bob Jones · api-service · 1d ago              |
|    [Snooze] [Handle] [Open]                      |
|                                                  |
|    [assigned] Update documentation               |
|    Carol White · docs · 3d ago                   |
|    [Snooze] [Handle] [Open]                      |
|                                                  |
+--------------------------------------------------+
| j/k: navigate  Enter: open  h: handle  s: snooze |
+--------------------------------------------------+
```

### Action Badge Colors

| Action | Color | Meaning |
|--------|-------|---------|
| mentioned | Blue | Someone mentioned you |
| assigned | Purple | Assigned to you |
| approval_required | Yellow | Needs your approval |
| build_failed | Red | Pipeline failure |
| directly_addressed | Cyan | Direct @ mention |
| unmergeable | Orange | MR has conflicts |
| marked | Gray | Marked as todo |
| merge_train_removed | Red | Removed from merge train |
| member_access_requested | Teal | Access request |
| (unknown) | Gray | Forward-compatible fallback |
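
The table maps directly onto a lookup with a default. In this sketch `BADGE_COLORS` and `badgeColor` are hypothetical names and the color values are plain labels rather than actual Tailwind classes.

```typescript
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const BADGE_COLORS = new Map<string, string>([
  ["mentioned", "blue"],
  ["assigned", "purple"],
  ["approval_required", "yellow"],
  ["build_failed", "red"],
  ["directly_addressed", "cyan"],
  ["unmergeable", "orange"],
  ["marked", "gray"],
  ["merge_train_removed", "red"],
  ["member_access_requested", "teal"],
]);

// Unknown action names fall back to gray, so new GitLab action types still render.
function badgeColor(actionName: string): string {
  return BADGE_COLORS.get(actionName) ?? "gray";
}
```

Because `action_name` is typed as an open string union, the fallback is the only thing that keeps the badge from breaking when GitLab introduces a new action type.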

### States

- **Loading**: Skeleton cards while fetching
- **Empty**: "All clear! No pending items." message
- **Error**: Connection error with retry button
- **Stale**: Visual indicator if data is old (> 5 min since last *successful* refresh)
- **Backoff**: Indicator showing retry status when experiencing errors

---

## User Flows

### Flow 1: Morning Check-in
1. Open GitLab Inbox (already running in background tab)
2. See list of todos sorted by newest first
3. Press `Enter` to open in GitLab
4. Handle the item (reply, review, etc.)
5. Return to Inbox, press `h` to mark handled
6. Item moves to Done Today
7. Repeat until Inbox is empty

### Flow 2: Triage with Snooze
1. See inbox with 12 items
2. Quickly triage: handle 3, snooze 5 until tomorrow, dismiss 2 already-resolved
3. Inbox now shows 2 items to focus on
4. Tomorrow: snoozed items wake up and return to inbox

### Flow 3: Awaiting Reply
1. Handle a todo (you replied to someone's question)
2. Toggle "Add to Watchlist" when marking handled
3. Item appears in Watchlist view
4. Later: see "New activity" indicator when they respond
5. Open, read response, remove from watchlist

### Flow 4: Focus Session
1. Enable Focus Mode
2. See only top 3 items
3. Work through them sequentially
4. As items complete, next ones appear
5. Reduced decision fatigue

### Flow 5: End-of-Day Review
1. Navigate to Done Today view
2. See all items handled today
3. Satisfaction from visible progress

---

## Success Metrics

| Metric | Target | Measurement |
|--------|--------|-------------|
| Time to awareness | < 2 min | Time from GitLab event to user seeing it |
| Daily items handled | Increased | Compare to baseline (manual tracking) |
| Context switches | Reduced | Fewer GitLab tabs open simultaneously |
| Snooze usage | Regular | Items snoozed vs dismissed (healthy ratio = snooze used) |
| Reply awareness | High | Watchlist items caught before manual check |
| User satisfaction | Qualitative | Does this reduce ADHD-related friction? |

---

## Risks and Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| GitLab API rate limits | Polling blocked | Configurable interval, backoff + jitter, respect 429/Retry-After |
| Token expiration/rotation | App stops working | Clear error state + setup flow; surface expiry guidance and re-auth path |
| State file corruption | Lose handled/snoozed/watch state | Atomic writes (tmp+rename), schema validation on load, keep last-known-good backup |
| GitLab API changes | App breaks | Pin to known API version, monitor deprecations, forward-compatible types |
| Token leakage | Security incident | Store in OS keychain, not in repo-adjacent files |

---

## Future Considerations (Post v1.0)

- **Grouping**: By project, by action type
- **Stale highlighting**: Visual alert for items waiting > X days
- **Desktop notifications**: OS-level alerts for new high-priority items
- **Quick actions**: Approve MR, close issue directly from app
- **Multiple GitLab instances**: Connect to both gitlab.com and self-hosted
- **Done history**: View handled items from yesterday, this week

---

## Implementation Phases

### Phase 0: Setup & Auth
- First-run setup wizard (GitLab URL + token)
- Token storage implementation (keychain/encrypted local)
- Connectivity check (`/todos`, auth failure UX)
- Clear error states for invalid/expired tokens

### Phase 1: Foundation
- Initialize TanStack Start project
- Set up Tailwind CSS
- Create GitLab API client with PAT auth
- Fetch and display todos in basic list (using `target_url` for navigation)
- Implement click-to-open

### Phase 2: Core Workflow
- Add local storage with atomic writes + backup
- Implement date-bucketed handled state
- Implement "Mark Handled" action
- Create Done Today view
- Add keyboard shortcuts (minimal set: j/k/Enter/h/s/d)
- Add Snooze + Snoozed view
- Filter handled/snoozed todos from Inbox

### Phase 3: Reliability & Awareness
- Background polling with configurable interval
- Backoff/jitter + 429 handling
- Last successful refresh tracking
- Watchlist ("Awaiting Reply") implementation
- Per-target polling for watched items (small set)
- Add manual refresh button
- Relative time display
- Action type badges with colors
- Loading and error states
- Connection status indicator

### Phase 4: Polish
- Focus Mode implementation
- Snooze time picker refinement
- Keyboard shortcut help overlay
- State migration handling (schemaVersion)
- Edge case handling (DST, timezone changes)

---

## Appendix

### Environment Configuration

**Primary configuration**:
- URL + settings in: `~/.config/gitlab-inbox/config.json`
- Token stored in OS keychain (preferred)

**Optional (dev-only) `.env.local` support**:
```env
GITLAB_URL=https://gitlab.yourcompany.com
GITLAB_TOKEN=glpat-xxxxxxxxxxxx
```

**Config file structure**:
```json
{
  "gitlabUrl": "https://gitlab.yourcompany.com",
  "pollingInterval": 60,
  "focusModeCount": 3
}
```

### Creating a GitLab PAT

1. Go to GitLab → User Settings → Access Tokens
2. Create token with `read_api` scope (use `api` instead if you want the Dismiss action, which writes to GitLab via `mark_as_done`)
3. Set expiration (note: tokens expire at midnight UTC on expiry date)
4. Save token via setup wizard (stored in keychain)
5. Token never leaves local machine

### Project Structure

```
gitlab-inbox/
├── app/
│   ├── routes/
│   │   ├── __root.tsx
│   │   ├── index.tsx           # Inbox view
│   │   ├── snoozed.tsx         # Snoozed view
│   │   ├── watchlist.tsx       # Watchlist view
│   │   ├── done.tsx            # Done Today view
│   │   └── setup.tsx           # First-run setup
│   ├── components/
│   │   ├── TodoCard.tsx
│   │   ├── TodoList.tsx
│   │   ├── ActionBadge.tsx
│   │   ├── Header.tsx
│   │   ├── SnoozePicker.tsx
│   │   ├── FocusMode.tsx
│   │   └── KeyboardHelp.tsx
│   ├── lib/
│   │   ├── gitlab.ts           # API client
│   │   ├── storage.ts          # Atomic state persistence
│   │   ├── keychain.ts         # Token storage
│   │   ├── polling.ts          # Polling state machine
│   │   ├── snooze.ts           # Snooze logic + wake checking
│   │   ├── watchlist.ts        # Watchlist polling
│   │   └── types.ts
│   └── app.tsx
├── package.json
├── tailwind.config.ts
└── vite.config.ts
```

### Test Strategy

**Unit Tests**:
- State normalization and migration
- Snooze wake time calculations
- Date bucketing logic (timezone handling)
- Polling backoff calculations

**Integration Tests** (mocked GitLab API):
- `/todos` response parsing
- `mark_as_done` endpoint calls
- Error handling (401, 429, network errors)
- State persistence round-trip (write + read)
- Backup recovery on corruption

**Manual Testing**:
- First-run setup flow
- Keyboard navigation
- Snooze + wake cycle
- Watchlist activity detection
641
SPEC.md
Normal file
@@ -0,0 +1,641 @@
# GitLab Knowledge Engine - Spec Document

## Executive Summary

A self-hosted tool to extract, index, and semantically search 2+ years of GitLab data (issues, MRs, comments/notes, and MR file-change links) from 2 main repositories (~10K items). The MVP delivers semantic search as a foundational capability that enables future specialized views (file history, personal tracking, person context). Commit-level indexing is explicitly post-MVP.

---

## Discovery Summary

### Pain Points Identified
1. **Knowledge discovery** - Tribal knowledge buried in old MRs/issues that nobody can find
2. **Decision traceability** - Hard to find *why* decisions were made; context scattered across issue comments and MR discussions

### Constraints
| Constraint | Detail |
|------------|--------|
| Hosting | Self-hosted only, no external APIs |
| Compute | Local dev machine (M-series Mac assumed) |
| GitLab Access | Self-hosted instance, PAT access, no webhooks (could request) |
| Build Method | AI agents will implement; user is TypeScript expert for review |

### Target Use Cases (Priority Order)
1. **MVP: Semantic Search** - "Find discussions about authentication redesign"
2. **Future: File/Feature History** - "What decisions were made about src/auth/login.ts?"
3. **Future: Personal Tracking** - "What am I assigned to or mentioned in?"
4. **Future: Person Context** - "What's @johndoe's background in this project?"

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                           GitLab API                            │
│                      (Issues, MRs, Notes)                       │
└─────────────────────────────────────────────────────────────────┘
            (Commit-level indexing explicitly post-MVP)
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Data Ingestion Layer                       │
│  - Incremental sync (PAT-based polling)                         │
│  - Rate limiting / backoff                                      │
│  - Raw JSON storage for replay                                  │
│  - Dependent resource fetching (notes, MR changes)              │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Data Processing Layer                      │
│  - Normalize artifacts to unified schema                        │
│  - Extract searchable documents (canonical text + metadata)     │
│  - Content hashing for change detection                         │
│  - Build relationship graph (issue↔MR↔note↔file)                │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Storage Layer                          │
│  - SQLite + sqlite-vss + FTS5 (hybrid search)                   │
│  - Structured metadata in relational tables                     │
│  - Vector embeddings for semantic search                        │
│  - Full-text index for lexical search fallback                  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Query Interface                         │
│  - CLI for human testing                                        │
│  - JSON API for AI agent testing                                │
│  - Semantic search with filters (author, date, type, label)     │
└─────────────────────────────────────────────────────────────────┘
```

### Technology Choices

| Component | Recommendation | Rationale |
|-----------|---------------|-----------|
| Language | TypeScript/Node.js | User expertise, good GitLab libs, AI agent friendly |
| Database | SQLite + sqlite-vss | Zero-config, portable, vector search via the sqlite-vss extension |
| Embeddings | Ollama + nomic-embed-text | Self-hosted, runs well on Apple Silicon, 768-dim vectors |
| CLI Framework | Commander.js or oclif | Standard, well-documented |

### Alternative Considered: Postgres + pgvector
- Pros: More scalable, better for production multi-user
- Cons: Requires running Postgres, heavier setup
- Decision: Start with SQLite for simplicity; migration path exists if needed

---

## Checkpoint Structure

Each checkpoint is a **testable milestone** where a human can validate the system works before proceeding.

### Checkpoint 0: Project Setup
**Deliverable:** Scaffolded project with GitLab API connection verified

**Tests:**
1. Run `gitlab-engine auth-test` → returns authenticated user info
2. Run `gitlab-engine doctor` → verifies:
   - Can reach GitLab baseUrl
   - PAT is present and can read configured projects
   - SQLite opens DB and migrations apply
   - Ollama reachable OR embedding disabled with clear warning

**Scope:**
- Project structure (TypeScript, ESLint, Vitest)
- GitLab API client with PAT authentication
- Environment and project configuration
- Basic CLI scaffold with `auth-test` command
- `doctor` command for environment verification
- Projects table and initial sync

**Configuration (MVP):** `gitlab-engine.config.json`
```json
{
  "gitlab": {
    "baseUrl": "https://gitlab.example.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project-one" },
    { "path": "group/project-two" }
  ],
  "embedding": {
    "provider": "ollama",
    "model": "nomic-embed-text",
    "baseUrl": "http://localhost:11434"
  }
}
```
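
The config stores only the name of the environment variable that holds the PAT, so the token itself has to be resolved at runtime. The sketch below shows that indirection; `EngineConfig`, `parseConfig`, and `resolveToken` are hypothetical names, validation is deliberately minimal, and the environment is passed in as a plain object (in the CLI it would presumably be `process.env`).

```typescript
interface EngineConfig {
  gitlab: { baseUrl: string; tokenEnvVar: string };
  projects: { path: string }[];
  embedding: { provider: string; model: string; baseUrl: string };
}

// Parse the config file contents and check the fields this sketch relies on.
function parseConfig(jsonText: string): EngineConfig {
  const raw = JSON.parse(jsonText) as Partial<EngineConfig>;
  if (!raw.gitlab?.baseUrl || !raw.gitlab?.tokenEnvVar) {
    throw new Error("config.gitlab.baseUrl and config.gitlab.tokenEnvVar are required");
  }
  if (!Array.isArray(raw.projects) || raw.projects.length === 0) {
    throw new Error("config.projects must list at least one project path");
  }
  return raw as EngineConfig;
}

// Look up the PAT in the environment variable named by the config.
function resolveToken(
  config: EngineConfig,
  env: Record<string, string | undefined>,
): string {
  const token = env[config.gitlab.tokenEnvVar];
  if (!token) {
    throw new Error(`Environment variable ${config.gitlab.tokenEnvVar} is not set`);
  }
  return token;
}
```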

**DB Runtime Defaults (Checkpoint 0):**
- On every connection:
  - `PRAGMA journal_mode=WAL;`
  - `PRAGMA foreign_keys=ON;`

**Schema (Checkpoint 0):**
```sql
-- Projects table (configured targets)
CREATE TABLE projects (
  id INTEGER PRIMARY KEY,
  gitlab_project_id INTEGER UNIQUE NOT NULL,
  path_with_namespace TEXT NOT NULL,
  default_branch TEXT,
  web_url TEXT,
  created_at INTEGER,
  updated_at INTEGER,
  raw_payload_id INTEGER REFERENCES raw_payloads(id)
);
CREATE INDEX idx_projects_path ON projects(path_with_namespace);

-- Sync tracking for reliability
CREATE TABLE sync_runs (
  id INTEGER PRIMARY KEY,
  started_at INTEGER NOT NULL,
  finished_at INTEGER,
  status TEXT NOT NULL,          -- 'running' | 'succeeded' | 'failed'
  command TEXT NOT NULL,         -- 'ingest issues' | 'sync' | etc.
  error TEXT
);

-- Sync cursors for primary resources only
-- Notes and MR changes are dependent resources (fetched via parent updates)
CREATE TABLE sync_cursors (
  project_id INTEGER NOT NULL REFERENCES projects(id),
  resource_type TEXT NOT NULL,   -- 'issues' | 'merge_requests'
  updated_at_cursor INTEGER,     -- last fully processed updated_at (ms epoch)
  tie_breaker_id INTEGER,        -- last fully processed gitlab_id (for stable ordering)
  PRIMARY KEY(project_id, resource_type)
);

-- Raw payload storage (decoupled from entity tables)
CREATE TABLE raw_payloads (
  id INTEGER PRIMARY KEY,
  source TEXT NOT NULL,          -- 'gitlab'
  resource_type TEXT NOT NULL,   -- 'project' | 'issue' | 'mr' | 'note'
  gitlab_id INTEGER NOT NULL,
  fetched_at INTEGER NOT NULL,
  json TEXT NOT NULL
);
CREATE INDEX idx_raw_payloads_lookup ON raw_payloads(resource_type, gitlab_id);
```

---
|
||||||
|
|
||||||
|
### Checkpoint 1: Issue Ingestion
|
||||||
|
**Deliverable:** All issues from target repos stored locally
|
||||||
|
|
||||||
|
**Test:** Run `gitlab-engine ingest --type=issues` → count matches GitLab; run `gitlab-engine list issues --limit=10` → displays issues correctly
|
||||||
|
|
||||||
|
**Scope:**
|
||||||
|
- Issue fetcher with pagination handling
|
||||||
|
- Raw JSON storage in raw_payloads table
|
||||||
|
- Normalized issue schema in SQLite
|
||||||
|
- Labels ingestion derived from issue payload:
|
||||||
|
- Always persist label names from `labels: string[]`
|
||||||
|
- Optionally request `with_labels_details=true` to capture color/description when available
|
||||||
|
- Incremental sync support (run tracking + per-project cursor)
|
||||||
|
- Basic list/count CLI commands
|
||||||
|
|
||||||
|
**Reliability/Idempotency Rules:**
|
||||||
|
- Every ingest/sync creates a `sync_runs` row
|
||||||
|
- Single-flight: refuse to start if an existing run is `running` (unless `--force`)
|
||||||
|
- Cursor advances only after successful transaction commit per page/batch
|
||||||
|
- Ordering: `updated_at ASC`, tie-breaker `gitlab_id ASC`
|
||||||
|
- Use explicit transactions for batch inserts
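
The rules above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not the project's implementation: the function names (`start_run`, `commit_page`, `finish_run`) and the `items` table standing in for `issues` are hypothetical, and the schema is trimmed to the columns the sketch touches. The point it shows is that the page rows and the cursor update share one transaction, so a crash can never leave the cursor ahead of the stored data.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE sync_runs (
  id INTEGER PRIMARY KEY, started_at INTEGER NOT NULL, finished_at INTEGER,
  status TEXT NOT NULL, command TEXT NOT NULL, error TEXT
);
CREATE TABLE sync_cursors (
  project_id INTEGER NOT NULL, resource_type TEXT NOT NULL,
  updated_at_cursor INTEGER, tie_breaker_id INTEGER,
  PRIMARY KEY(project_id, resource_type)
);
-- Hypothetical stand-in for the real issues table
CREATE TABLE items (gitlab_id INTEGER PRIMARY KEY, updated_at INTEGER);
"""

def now_ms():
    return int(time.time() * 1000)

def start_run(conn, command, force=False):
    """Create a sync_runs row; refuse if another run is still 'running'."""
    with conn:  # commits on success, rolls back on exception
        busy = conn.execute(
            "SELECT 1 FROM sync_runs WHERE status = 'running' LIMIT 1"
        ).fetchone()
        if busy and not force:
            raise RuntimeError("a run is already 'running' (pass force to override)")
        cur = conn.execute(
            "INSERT INTO sync_runs (started_at, status, command) VALUES (?, 'running', ?)",
            (now_ms(), command),
        )
        return cur.lastrowid

def commit_page(conn, project_id, resource_type, page):
    """Store one page and advance the cursor in the same transaction."""
    if not page:
        return
    # Ordering rule: updated_at ASC, tie-breaker gitlab_id ASC
    ordered = sorted(page, key=lambda r: (r["updated_at"], r["gitlab_id"]))
    last = ordered[-1]
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO items (gitlab_id, updated_at) VALUES (?, ?)",
            [(r["gitlab_id"], r["updated_at"]) for r in ordered],
        )
        conn.execute(
            "INSERT OR REPLACE INTO sync_cursors "
            "(project_id, resource_type, updated_at_cursor, tie_breaker_id) "
            "VALUES (?, ?, ?, ?)",
            (project_id, resource_type, last["updated_at"], last["gitlab_id"]),
        )

def finish_run(conn, run_id, error=None):
    """Mark the run 'succeeded', or 'failed' when an error message is given."""
    with conn:
        conn.execute(
            "UPDATE sync_runs SET finished_at = ?, status = ?, error = ? WHERE id = ?",
            (now_ms(), "failed" if error else "succeeded", error, run_id),
        )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
run_id = start_run(conn, "ingest issues")
commit_page(conn, 1, "issues", [
    {"gitlab_id": 11, "updated_at": 200},
    {"gitlab_id": 10, "updated_at": 200},
    {"gitlab_id": 7, "updated_at": 100},
])
cursor_row = conn.execute(
    "SELECT updated_at_cursor, tie_breaker_id FROM sync_cursors"
).fetchone()
finish_run(conn, run_id)
```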

**Schema Preview:**
```sql
CREATE TABLE issues (
  id INTEGER PRIMARY KEY,
  gitlab_id INTEGER UNIQUE NOT NULL,
  project_id INTEGER NOT NULL REFERENCES projects(id),
  iid INTEGER NOT NULL,
  title TEXT,
  description TEXT,
  state TEXT,
  author_username TEXT,
  created_at INTEGER,
  updated_at INTEGER,
  web_url TEXT,
  raw_payload_id INTEGER REFERENCES raw_payloads(id)
);
CREATE INDEX idx_issues_project_updated ON issues(project_id, updated_at);
CREATE INDEX idx_issues_author ON issues(author_username);

-- Labels are derived from issue payloads (string array)
-- Uniqueness is (project_id, name) since gitlab_id isn't always available
CREATE TABLE labels (
  id INTEGER PRIMARY KEY,
  gitlab_id INTEGER,  -- optional (only if available)
  project_id INTEGER NOT NULL REFERENCES projects(id),
  name TEXT NOT NULL,
  color TEXT,
  description TEXT
);
CREATE UNIQUE INDEX uq_labels_project_name ON labels(project_id, name);
CREATE INDEX idx_labels_name ON labels(name);

CREATE TABLE issue_labels (
  issue_id INTEGER REFERENCES issues(id),
  label_id INTEGER REFERENCES labels(id),
  PRIMARY KEY(issue_id, label_id)
);
CREATE INDEX idx_issue_labels_label ON issue_labels(label_id);
```
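
As an illustration of how the `(project_id, name)` uniqueness rule might be used when ingesting the `labels: string[]` array, here is a small Python sketch. The function name `sync_issue_labels` is hypothetical, the schema is trimmed (foreign keys omitted), and clearing an issue's existing links before re-inserting is an assumption of this sketch, made so that labels removed in GitLab also disappear locally.

```python
import sqlite3

SCHEMA = """
CREATE TABLE labels (
  id INTEGER PRIMARY KEY, gitlab_id INTEGER, project_id INTEGER NOT NULL,
  name TEXT NOT NULL, color TEXT, description TEXT
);
CREATE UNIQUE INDEX uq_labels_project_name ON labels(project_id, name);
CREATE TABLE issue_labels (
  issue_id INTEGER, label_id INTEGER, PRIMARY KEY(issue_id, label_id)
);
"""

def sync_issue_labels(conn, project_id, issue_id, label_names):
    """Make issue_labels match the label-name array of one issue payload."""
    with conn:
        # Assumption: the payload carries the full label set, so stale links go first.
        conn.execute("DELETE FROM issue_labels WHERE issue_id = ?", (issue_id,))
        for name in dict.fromkeys(label_names):  # de-duplicate, keep payload order
            # The unique index on (project_id, name) turns repeats into no-ops.
            conn.execute(
                "INSERT OR IGNORE INTO labels (project_id, name) VALUES (?, ?)",
                (project_id, name),
            )
            (label_id,) = conn.execute(
                "SELECT id FROM labels WHERE project_id = ? AND name = ?",
                (project_id, name),
            ).fetchone()
            conn.execute(
                "INSERT INTO issue_labels (issue_id, label_id) VALUES (?, ?)",
                (issue_id, label_id),
            )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sync_issue_labels(conn, 1, 100, ["bug", "critical"])
sync_issue_labels(conn, 1, 101, ["bug"])
sync_issue_labels(conn, 1, 100, ["bug"])  # 'critical' was removed upstream
label_count = conn.execute("SELECT COUNT(*) FROM labels").fetchone()[0]
links = conn.execute(
    "SELECT issue_id, label_id FROM issue_labels ORDER BY issue_id"
).fetchall()
```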

---

### Checkpoint 2: MR + Comments + File Links Ingestion
**Deliverable:** All MRs, discussion threads, and file-change links stored locally

**Test:** Run `gitlab-engine ingest --type=merge_requests` → count matches; run `gitlab-engine show mr 1234` → displays MR with comments and files changed

**Scope:**
- MR fetcher with pagination
- Notes fetcher (issue notes + MR notes) as a dependent resource:
  - During initial ingest: fetch notes for every issue/MR
  - During sync: refetch notes only for issues/MRs updated since cursor
- MR changes/diffs fetcher as a dependent resource:
  - During initial ingest: fetch changes for every MR
  - During sync: refetch changes only for MRs updated since cursor
- Relationship linking (note → parent issue/MR via foreign keys, MR → files)
- Extended CLI commands for MR display

**Schema Additions:**
```sql
CREATE TABLE merge_requests (
  id INTEGER PRIMARY KEY,
  gitlab_id INTEGER UNIQUE NOT NULL,
  project_id INTEGER NOT NULL REFERENCES projects(id),
  iid INTEGER NOT NULL,
  title TEXT,
  description TEXT,
  state TEXT,
  author_username TEXT,
  source_branch TEXT,
  target_branch TEXT,
  created_at INTEGER,
  updated_at INTEGER,
  merged_at INTEGER,
  web_url TEXT,
  raw_payload_id INTEGER REFERENCES raw_payloads(id)
);
CREATE INDEX idx_mrs_project_updated ON merge_requests(project_id, updated_at);
CREATE INDEX idx_mrs_author ON merge_requests(author_username);

-- Notes with explicit parent foreign keys for referential integrity
CREATE TABLE notes (
  id INTEGER PRIMARY KEY,
  gitlab_id INTEGER UNIQUE NOT NULL,
  project_id INTEGER NOT NULL REFERENCES projects(id),
  issue_id INTEGER REFERENCES issues(id),
  merge_request_id INTEGER REFERENCES merge_requests(id),
  noteable_type TEXT NOT NULL,  -- 'Issue' | 'MergeRequest'
  noteable_iid INTEGER NOT NULL,  -- parent IID (from API path)
  author_username TEXT,
  body TEXT,
  created_at INTEGER,
  updated_at INTEGER,
  system BOOLEAN,
  raw_payload_id INTEGER REFERENCES raw_payloads(id),
  -- Exactly one parent FK must be set
  CHECK (
    (noteable_type='Issue' AND issue_id IS NOT NULL AND merge_request_id IS NULL) OR
    (noteable_type='MergeRequest' AND merge_request_id IS NOT NULL AND issue_id IS NULL)
  )
);
CREATE INDEX idx_notes_issue ON notes(issue_id);
CREATE INDEX idx_notes_mr ON notes(merge_request_id);
CREATE INDEX idx_notes_author ON notes(author_username);

-- File linkage for "what MRs touched this file?" queries (with rename support)
CREATE TABLE mr_files (
  id INTEGER PRIMARY KEY,
  merge_request_id INTEGER REFERENCES merge_requests(id),
  old_path TEXT,
  new_path TEXT,
  new_file BOOLEAN,
  deleted_file BOOLEAN,
  renamed_file BOOLEAN,
  UNIQUE(merge_request_id, old_path, new_path)
);
CREATE INDEX idx_mr_files_old_path ON mr_files(old_path);
CREATE INDEX idx_mr_files_new_path ON mr_files(new_path);

-- MR labels (reuse same labels table)
CREATE TABLE mr_labels (
  merge_request_id INTEGER REFERENCES merge_requests(id),
  label_id INTEGER REFERENCES labels(id),
  PRIMARY KEY(merge_request_id, label_id)
);
CREATE INDEX idx_mr_labels_label ON mr_labels(label_id);
```
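
To make the "what MRs touched this file?" use case concrete, the following Python sketch queries `mr_files` by either path column, which is what lets a rename show up under both the old and the new name. The helper `mrs_touching` and the sample rows are hypothetical, and the tables are trimmed to the columns used. Note that this simple form matches direct hits only; it does not follow a chain of several renames.

```python
import sqlite3

SCHEMA = """
CREATE TABLE merge_requests (id INTEGER PRIMARY KEY, iid INTEGER NOT NULL, title TEXT);
CREATE TABLE mr_files (
  id INTEGER PRIMARY KEY,
  merge_request_id INTEGER REFERENCES merge_requests(id),
  old_path TEXT, new_path TEXT,
  new_file BOOLEAN, deleted_file BOOLEAN, renamed_file BOOLEAN,
  UNIQUE(merge_request_id, old_path, new_path)
);
CREATE INDEX idx_mr_files_old_path ON mr_files(old_path);
CREATE INDEX idx_mr_files_new_path ON mr_files(new_path);
"""

def mrs_touching(conn, path):
    """Return IIDs of MRs whose changes mention `path` as old or new path."""
    rows = conn.execute(
        """
        SELECT DISTINCT mr.iid
        FROM mr_files f
        JOIN merge_requests mr ON mr.id = f.merge_request_id
        WHERE f.new_path = ? OR f.old_path = ?
        ORDER BY mr.iid
        """,
        (path, path),
    )
    return [iid for (iid,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO merge_requests (id, iid, title) VALUES (?, ?, ?)",
    [(1, 10, "Move login module"), (2, 12, "Fix login bug"), (3, 13, "Update docs")],
)
conn.executemany(
    "INSERT INTO mr_files (merge_request_id, old_path, new_path, "
    "new_file, deleted_file, renamed_file) VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "src/login.ts", "src/auth/login.ts", 0, 0, 1),       # rename
        (2, "src/auth/login.ts", "src/auth/login.ts", 0, 0, 0),  # plain edit
        (3, "README.md", "README.md", 0, 0, 0),
    ],
)
current_name_hits = mrs_touching(conn, "src/auth/login.ts")
old_name_hits = mrs_touching(conn, "src/login.ts")
```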

---

### Checkpoint 3: Embedding Generation
**Deliverable:** Vector embeddings generated for all text content

**Test:** Run `gitlab-engine embed --all` → progress indicator; run `gitlab-engine stats` → shows embedding coverage percentage

**Scope:**
- Ollama integration (nomic-embed-text model)
- Embedding generation pipeline (batch processing)
- Vector storage in SQLite (sqlite-vss extension)
- Progress tracking and resumability
- Document extraction layer:
  - Canonical "search documents" derived from issues/MRs/notes
  - Stable content hashing for change detection (SHA-256 of content_text)
  - Single embedding per document (chunking deferred to post-MVP)
  - Denormalized metadata for fast filtering (author, labels, dates)
  - Fast label filtering via `document_labels` join table

**Schema Additions:**
```sql
-- Unified searchable documents (derived from issues/MRs/notes)
CREATE TABLE documents (
  id INTEGER PRIMARY KEY,
  source_type TEXT NOT NULL,  -- 'issue' | 'merge_request' | 'note'
  source_id INTEGER NOT NULL,  -- local DB id in the source table
  project_id INTEGER NOT NULL REFERENCES projects(id),
  author_username TEXT,
  label_names TEXT,  -- JSON array (display/debug only)
  created_at INTEGER,
  updated_at INTEGER,
  url TEXT,
  title TEXT,  -- null for notes
  content_text TEXT NOT NULL,  -- canonical text for embedding/snippets
  content_hash TEXT NOT NULL,  -- SHA-256 for change detection
  UNIQUE(source_type, source_id)
);
CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
CREATE INDEX idx_documents_author ON documents(author_username);
CREATE INDEX idx_documents_source ON documents(source_type, source_id);

-- Fast label filtering for documents (indexed exact-match)
CREATE TABLE document_labels (
  document_id INTEGER NOT NULL REFERENCES documents(id),
  label_name TEXT NOT NULL,
  PRIMARY KEY(document_id, label_name)
);
CREATE INDEX idx_document_labels_label ON document_labels(label_name);

-- sqlite-vss virtual table
-- Storage rule: embeddings.rowid = documents.id
CREATE VIRTUAL TABLE embeddings USING vss0(
  embedding(768)
);

-- Embedding provenance + change detection
-- document_id is PRIMARY KEY and equals embeddings.rowid
CREATE TABLE embedding_metadata (
  document_id INTEGER PRIMARY KEY REFERENCES documents(id),
  model TEXT NOT NULL,  -- 'nomic-embed-text'
  dims INTEGER NOT NULL,  -- 768
  content_hash TEXT NOT NULL,  -- copied from documents.content_hash
  created_at INTEGER NOT NULL
);
```

**Storage Rule (MVP):**
- Insert embedding with `rowid = documents.id`
- Upsert `embedding_metadata` by `document_id`
- This alignment simplifies joins and eliminates rowid mapping fragility

**Document Extraction Rules:**
- Issue → title + "\n\n" + description
- MR → title + "\n\n" + description
- Note → body (skip system notes unless they contain meaningful content)
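
The extraction rules and the SHA-256 change detection can be sketched in a few lines of Python. The function names are hypothetical. Two simplifications are assumptions of this sketch rather than part of the spec: a missing title or description is treated as an empty string, and every system note is skipped, because the rules do not define what counts as "meaningful content".

```python
import hashlib

def issue_or_mr_text(title, description):
    """Issue/MR rule: title + blank line + description (None treated as empty)."""
    return f"{title or ''}\n\n{description or ''}"

def note_text(body, system):
    """Note rule: use the body; this sketch skips all system notes (returns None)."""
    if system:
        return None
    return body or ""

def content_hash(content_text):
    """Hex SHA-256 of the UTF-8 encoded canonical text."""
    return hashlib.sha256(content_text.encode("utf-8")).hexdigest()

doc_text = issue_or_mr_text("Login fails", "Token refresh returns 401")
doc_hash = content_hash(doc_text)
edited_hash = content_hash(issue_or_mr_text("Login fails", "Token refresh returns 403"))
needs_reembedding = edited_hash != doc_hash  # any text change yields a new hash
```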

---

### Checkpoint 4: Semantic Search
**Deliverable:** Working semantic search across all indexed content

**Tests:**
1. Run `gitlab-engine search "authentication redesign"` → returns ranked results with snippets
2. Golden queries: curated list of 10 queries with expected result *containment* (e.g., "at least one of these 3 known URLs appears in top 10")
3. `gitlab-engine search "..." --json` validates against JSON schema (stable fields present)

**Scope:**
- Hybrid retrieval:
  - Vector recall (sqlite-vss) + FTS lexical recall (fts5)
  - Merge + rerank results using Reciprocal Rank Fusion (RRF)
- Result ranking and scoring (document-level)
- Search filters: `--type=issue|mr|note`, `--author=username`, `--after=date`, `--label=name`
  - Label filtering operates on `document_labels` (indexed, exact-match)
- Output formatting: ranked list with title, snippet, score, URL
- JSON output mode for AI agent consumption

**Schema Additions:**
```sql
-- Full-text search for hybrid retrieval
CREATE VIRTUAL TABLE documents_fts USING fts5(
  title,
  content_text,
  content='documents',
  content_rowid='id'
);

-- Triggers to keep FTS in sync
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
  INSERT INTO documents_fts(rowid, title, content_text)
  VALUES (new.id, new.title, new.content_text);
END;

CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
  INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
  VALUES('delete', old.id, old.title, old.content_text);
END;

CREATE TRIGGER documents_au AFTER UPDATE ON documents BEGIN
  INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
  VALUES('delete', old.id, old.title, old.content_text);
  INSERT INTO documents_fts(rowid, title, content_text)
  VALUES (new.id, new.title, new.content_text);
END;
```

**Hybrid Search Algorithm (MVP) - Reciprocal Rank Fusion:**
1. Query both vector index (top 50) and FTS5 (top 50)
2. Merge results by document_id
3. Combine with Reciprocal Rank Fusion (RRF):
   - For each retriever list, assign ranks (1..N)
   - `rrf_score = Σ 1 / (k + rank)` with k=60 (tunable)
   - RRF is simpler than weighted sums and doesn't require score normalization
4. Apply filters (type, author, date, label)
5. Return top K
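
Steps 2, 3, and 5 can be sketched as a small pure function. This is an illustrative Python version, with a hypothetical name (`rrf_merge`) and made-up document ids; it takes the ranked id lists from each retriever as input and leaves the filtering of step 4 to the caller. Ties are broken by document id here only to keep the output deterministic, which is a choice of this sketch.

```python
def rrf_merge(ranked_lists, k=60, top_k=10):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each list is ordered best-first. A document's score is the sum of
    1 / (k + rank) over every list it appears in, with ranks starting at 1.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; document id as a deterministic tie-breaker.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]

vector_hits = [3, 1, 2]   # best-first ids from the vector index
lexical_hits = [1, 4, 3]  # best-first ids from FTS5
fused = rrf_merge([vector_hits, lexical_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```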

**Why RRF over Weighted Sums:**
- FTS5 BM25 scores and vector distances use different scales
- Weighted sums (`0.7 * vector + 0.3 * fts`) require careful normalization
- RRF operates on ranks, not scores, making it robust to scale differences
- Well-established in information retrieval literature

**CLI Interface:**
```bash
# Basic semantic search
gitlab-engine search "why did we choose Redis"

# Pure FTS search (fallback if embeddings unavailable)
gitlab-engine search "redis" --mode=lexical

# Filtered search
gitlab-engine search "authentication" --type=mr --after=2024-01-01

# Filter by label
gitlab-engine search "performance" --label=bug --label=critical

# JSON output for programmatic use
gitlab-engine search "payment processing" --json
```

---

### Checkpoint 5: Incremental Sync
**Deliverable:** Efficient ongoing synchronization with GitLab

**Test:** Make a change in GitLab; run `gitlab-engine sync` → only fetches changed items; verify change appears in search

**Scope:**
- Delta sync based on stable cursor (updated_at + tie-breaker id)
- Dependent resources sync strategy (notes, MR changes)
- Webhook handler (optional, if webhook access granted)
- Re-embedding based on content_hash change (documents.content_hash != embedding_metadata.content_hash)
- Sync status reporting

**Correctness Rules (MVP):**
1. Fetch pages ordered by `updated_at ASC`; within identical timestamps, advance by `gitlab_id ASC`
2. Cursor advances only after successful DB commit for that page
3. Dependent resources:
   - For each updated issue/MR, refetch its notes (sorted by `updated_at`)
   - For each updated MR, refetch its file changes
4. A document is queued for embedding iff it has no `embedding_metadata` row yet or `documents.content_hash != embedding_metadata.content_hash`
5. Sync run is marked 'failed' with error message if any page fails (can resume from cursor)
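
Rule 4 maps naturally onto a `LEFT JOIN`. The Python sketch below uses trimmed versions of `documents` and `embedding_metadata` (only the columns the query needs) and hypothetical sample hashes. A document with no `embedding_metadata` row has nothing to compare against, so this sketch assumes it should be queued as well; a plain inner join on the hash inequality would silently miss such documents.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, content_hash TEXT NOT NULL);
CREATE TABLE embedding_metadata (
  document_id INTEGER PRIMARY KEY REFERENCES documents(id),
  content_hash TEXT NOT NULL
);
"""

# Documents that were never embedded, or whose text changed since embedding.
PENDING_SQL = """
SELECT d.id
FROM documents d
LEFT JOIN embedding_metadata m ON m.document_id = d.id
WHERE m.document_id IS NULL
   OR m.content_hash != d.content_hash
ORDER BY d.id
"""

def pending_document_ids(conn):
    """Return ids of documents that should be queued for (re-)embedding."""
    return [doc_id for (doc_id,) in conn.execute(PENDING_SQL)]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO documents (id, content_hash) VALUES (?, ?)",
    [(1, "hash-a"), (2, "hash-b-new"), (3, "hash-c")],
)
conn.executemany(
    "INSERT INTO embedding_metadata (document_id, content_hash) VALUES (?, ?)",
    [(1, "hash-a"), (2, "hash-b-old")],  # document 3 has no embedding yet
)
pending = pending_document_ids(conn)
```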

**Why Dependent Resource Model:**
- GitLab Notes API doesn't provide a clean global `updated_after` stream
- Notes are listed per-issue or per-MR, not as a top-level resource
- Treating notes as dependent resources (refetch when parent updates) is simpler and more correct
- Same applies to MR changes/diffs

**CLI Commands:**
```bash
# Full sync (respects cursors, only fetches new/updated)
gitlab-engine sync

# Force full re-sync (resets cursors)
gitlab-engine sync --full

# Override stale 'running' run after operator review
gitlab-engine sync --force

# Show sync status
gitlab-engine sync-status
```

---

## Future Checkpoints (Post-MVP)

### Checkpoint 6: File/Feature History View
- Map commits to MRs to discussions
- Query: "Show decision history for src/auth/login.ts"
- Ship `gitlab-engine file-history <path>` as a first-class feature here
- This command is deferred from MVP to sharpen checkpoint focus

### Checkpoint 7: Personal Dashboard
- Filter by assigned/mentioned
- Integrate with existing gitlab-inbox tool

### Checkpoint 8: Person Context
- Aggregate contributions by author
- Expertise inference from activity

### Checkpoint 9: Decision Graph
- Extract decisions from discussions (LLM-assisted)
- Visualize decision relationships

---

## Verification Strategy

Each checkpoint includes:

1. **Automated tests** - Unit tests for data transformations, integration tests for API calls
2. **CLI smoke tests** - Manual commands with expected outputs documented
3. **Data integrity checks** - Count verification against GitLab, schema validation
4. **Search quality tests** - Known queries with expected results (for Checkpoint 4+)

---

## Risk Mitigation

| Risk | Mitigation |
|------|------------|
| GitLab rate limiting | Exponential backoff, respect Retry-After headers, incremental sync |
| Embedding model quality | Start with nomic-embed-text; architecture allows model swap |
| SQLite scale limits | Monitor performance; Postgres migration path documented |
| Stale data | Incremental sync with change detection |
| Mid-sync failures | Cursor-based resumption, sync_runs audit trail |
| Search quality | Hybrid (vector + FTS5) retrieval with RRF, golden query test suite |
| Concurrent sync corruption | Single-flight protection (refuse if existing run is `running`) |
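
For the rate-limiting row, the delay calculation might look like the Python sketch below. The function name `retry_delay` and the `base` and `cap` defaults are illustrative assumptions, not values from this document. A numeric `Retry-After` value wins when present; the HTTP-date form of that header is not parsed here and simply falls back to exponential backoff.

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    `retry_after` is the raw Retry-After header value, if the server sent one.
    `base` and `cap` are illustrative defaults, not values from the spec.
    """
    if retry_after is not None:
        try:
            # Delay-in-seconds form of the header takes priority.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Exponential backoff: base, 2*base, 4*base, ... limited by cap.
    return min(cap, base * (2 ** attempt))

first_retry = retry_delay(0)
fourth_retry = retry_delay(3)
capped_retry = retry_delay(10)
server_directed = retry_delay(2, retry_after="5")
```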

**SQLite Performance Defaults (MVP):**
- Enable `PRAGMA journal_mode=WAL;` on every connection
- Enable `PRAGMA foreign_keys=ON;` on every connection
- Use explicit transactions for page/batch inserts
- Targeted indexes on `(project_id, updated_at)` for primary resources
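
A connection helper that applies the first two defaults could look like this Python sketch. The helper name `open_db`, the file name, and the `parent`/`child` demo tables are hypothetical. SQLite leaves foreign-key enforcement off unless the pragma is issued on each connection, which is why it belongs in a shared helper; the demo uses a file in a temporary directory because an in-memory database does not use WAL.

```python
import os
import sqlite3
import tempfile

def open_db(path):
    """Open a connection with the MVP pragmas applied."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL;")
    conn.execute("PRAGMA foreign_keys=ON;")  # off by default for each new connection
    return conn

db_path = os.path.join(tempfile.mkdtemp(), "gitlab-engine.db")
conn = open_db(db_path)
journal_mode = conn.execute("PRAGMA journal_mode;").fetchone()[0]
foreign_keys = conn.execute("PRAGMA foreign_keys;").fetchone()[0]

# With enforcement on, a dangling reference is rejected instead of stored.
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (parent_id INTEGER REFERENCES parent(id));
""")
try:
    conn.execute("INSERT INTO child (parent_id) VALUES (42)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```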

---

## Schema Summary

| Table | Checkpoint | Purpose |
|-------|------------|---------|
| projects | 0 | Configured GitLab projects |
| sync_runs | 0 | Audit trail of sync operations |
| sync_cursors | 0 | Resumable sync state per primary resource |
| raw_payloads | 0 | Decoupled raw JSON storage |
| issues | 1 | Normalized issues |
| labels | 1 | Label definitions (unique by project + name) |
| issue_labels | 1 | Issue-label junction |
| merge_requests | 2 | Normalized MRs |
| notes | 2 | Issue and MR comments (with parent FKs) |
| mr_files | 2 | MR file changes (with rename tracking) |
| mr_labels | 2 | MR-label junction |
| documents | 3 | Unified searchable documents |
| document_labels | 3 | Document-label junction for fast filtering |
| embeddings | 3 | Vector embeddings (sqlite-vss, rowid=document_id) |
| embedding_metadata | 3 | Embedding provenance + change detection |
| documents_fts | 4 | Full-text search index (fts5) |

---

## Resolved Decisions

| Question | Decision | Rationale |
|----------|----------|-----------|
| Commit/file linkage | **Include MR→file links** | Enables "what MRs touched this file?" without full commit history |
| Labels | **Index as filters** | Labels are well-used; `document_labels` table enables fast `--label=X` filtering |
| Labels uniqueness | **By (project_id, name)** | GitLab API returns labels as strings; gitlab_id isn't always available |
| Sync method | **Polling for MVP** | Decide on webhooks after using the system |
| Notes sync | **Dependent resource** | Notes API is per-parent, not global; refetch on parent update |
| Hybrid ranking | **RRF over weighted sums** | Simpler, no score normalization needed |
| Embedding rowid | **rowid = documents.id** | Eliminates fragile rowid mapping during upserts |
| file-history CLI | **Post-MVP (CP6)** | Sharpens MVP checkpoint focus |

---

## Next Steps

1. User approves this spec
2. Generate Checkpoint 0 PRD for project setup
3. Implement Checkpoint 0
4. Human validates → proceed to Checkpoint 1
5. Repeat for each checkpoint