docs: Add comprehensive documentation and planning artifacts

README.md provides complete user documentation:
- Installation via cargo install or build from source
- Quick start guide with example commands
- Configuration file format with all options documented
- Full command reference for init, auth-test, doctor, ingest, list, show, count, sync-status, migrate, and version
- Database schema overview covering projects, issues, milestones, assignees, labels, discussions, notes, and raw payloads
- Development setup with test, lint, and debug commands

SPEC.md updated from original TypeScript planning document:
- Added note clarifying this is historical (implementation uses Rust)
- Replaced references to the deprecated sqlite-vss library with sqlite-vec
- Added architecture overview with Technology Choices rationale
- Expanded project structure showing all planned modules

docs/prd/ contains detailed checkpoint planning:
- checkpoint-0.md: Initial project vision and requirements
- checkpoint-1.md: Revised planning after technology decisions

These documents capture the evolution from initial concept through the decision to use Rust for performance and type safety.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
252
README.md
Normal file
@@ -0,0 +1,252 @@
# gi - GitLab Inbox

A command-line tool for managing GitLab issues locally. It syncs issues, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying and filtering.

## Features

- **Local-first**: All data is stored in SQLite for instant queries
- **Incremental sync**: Cursor-based sync fetches only changes made since the last sync
- **Multi-project**: Track issues across multiple GitLab projects
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, and due date
- **Raw payload storage**: Preserves the original GitLab API responses for debugging

## Installation

```bash
cargo install --path .
```

Or build a release binary without installing it:

```bash
cargo build --release
./target/release/gi --help
```

## Quick Start

```bash
# Initialize configuration (interactive)
gi init

# Verify authentication
gi auth-test

# Sync issues from GitLab
gi ingest --type issues

# List recent issues
gi list issues --limit 10

# Show issue details
gi show issue 123 --project group/repo
```

## Configuration

Configuration is stored in `~/.config/gi/config.json`, or in `$XDG_CONFIG_HOME/gi/config.json` when `XDG_CONFIG_HOME` is set.

### Example Configuration

```json
{
  "gitlab": {
    "baseUrl": "https://gitlab.com",
    "tokenEnvVar": "GITLAB_TOKEN"
  },
  "projects": [
    { "path": "group/project" },
    { "path": "other-group/other-project" }
  ],
  "sync": {
    "backfillDays": 14,
    "staleLockMinutes": 10
  },
  "storage": {
    "compressRawPayloads": true
  }
}
```

### Configuration Options

| Section | Field | Default | Description |
|---------|-------|---------|-------------|
| `gitlab` | `baseUrl` | none | GitLab instance URL (required) |
| `gitlab` | `tokenEnvVar` | `GITLAB_TOKEN` | Environment variable that holds the API token |
| `projects` | `path` | none | Project path (e.g., `group/project`) |
| `sync` | `backfillDays` | `14` | Days to backfill on the initial sync |
| `sync` | `staleLockMinutes` | `10` | Minutes after which a sync lock is considered stale |
| `sync` | `cursorRewindSeconds` | `2` | Seconds to rewind the sync cursor so consecutive syncs overlap |
| `storage` | `dbPath` | `~/.local/share/gi/gi.db` | Database file path |
| `storage` | `compressRawPayloads` | `true` | Compress stored API responses |

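The path rule and the defaults above can be sketched as follows. This is a minimal TypeScript illustration, not gi's Rust code. The `GiConfig` interface and the `configPath`/`withDefaults` helpers are hypothetical, and preferring a set `XDG_CONFIG_HOME` over `~/.config` is an assumption based on the XDG convention.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical in-memory shape of config.json, mirroring the options table.
interface GiConfig {
  gitlab: { baseUrl: string; tokenEnvVar: string };
  projects: { path: string }[];
  sync: { backfillDays: number; staleLockMinutes: number; cursorRewindSeconds: number };
  storage: { dbPath: string; compressRawPayloads: boolean };
}

type Env = Record<string, string | undefined>;

// Assumed XDG rule: use $XDG_CONFIG_HOME when set, otherwise ~/.config.
function configPath(env: Env = process.env, home: string = homedir()): string {
  const base = env.XDG_CONFIG_HOME ? env.XDG_CONFIG_HOME : join(home, ".config");
  return join(base, "gi", "config.json");
}

// Fill every omitted field with the default listed in the options table.
function withDefaults(raw: any, home: string = homedir()): GiConfig {
  if (!raw?.gitlab?.baseUrl) throw new Error("gitlab.baseUrl is required");
  return {
    gitlab: { baseUrl: raw.gitlab.baseUrl, tokenEnvVar: raw.gitlab.tokenEnvVar ?? "GITLAB_TOKEN" },
    projects: raw.projects ?? [],
    sync: {
      backfillDays: raw.sync?.backfillDays ?? 14,
      staleLockMinutes: raw.sync?.staleLockMinutes ?? 10,
      cursorRewindSeconds: raw.sync?.cursorRewindSeconds ?? 2,
    },
    storage: {
      dbPath: raw.storage?.dbPath ?? join(home, ".local", "share", "gi", "gi.db"),
      compressRawPayloads: raw.storage?.compressRawPayloads ?? true,
    },
  };
}
```

For example, with `XDG_CONFIG_HOME=/tmp/xdg` the file is looked up at `/tmp/xdg/gi/config.json`. The example configuration above omits `cursorRewindSeconds` and `dbPath`, so it would pick up `2` and `~/.local/share/gi/gi.db`.
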
### GitLab Token

Create a personal access token with the `read_api` scope:

1. Go to GitLab → Settings → Access Tokens
2. Create a token with the `read_api` scope
3. Export it under the variable named by `tokenEnvVar` (default `GITLAB_TOKEN`): `export GITLAB_TOKEN=glpat-xxxxxxxxxxxx`

## Commands

### `gi init`

Initialize the configuration and database interactively.

```bash
gi init                    # Interactive setup
gi init --force            # Overwrite an existing config
gi init --non-interactive  # Fail instead of prompting
```

### `gi auth-test`

Verify that GitLab authentication is working.

```bash
gi auth-test
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com
```

### `gi doctor`

Check environment health and configuration.

```bash
gi doctor         # Human-readable output
gi doctor --json  # JSON output for scripting
```

### `gi ingest`

Sync data from GitLab to the local database.

```bash
gi ingest --type issues                        # Sync all projects
gi ingest --type issues --project group/repo   # Single project
gi ingest --type issues --force                # Override a stale lock
```

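This section does not spell out exactly how `--force` interacts with `sync.staleLockMinutes`. The TypeScript sketch below follows one plausible reading: a lock younger than `staleLockMinutes` always blocks, and an older (stale) lock is taken over only when `--force` is passed. The `SyncLock` record and the `decideLock` function are hypothetical.

```typescript
// Hypothetical lock record persisted by a running `gi ingest`.
interface SyncLock {
  owner: string; // e.g. "hostname:pid" of the process holding the lock
  acquiredAt: Date; // when the lock was taken
}

type LockDecision = "acquire" | "take-over-stale" | "refuse";

// Decide what a new `gi ingest` run should do about an existing lock.
function decideLock(
  existing: SyncLock | null,
  now: Date,
  staleLockMinutes = 10,
  force = false,
): LockDecision {
  if (existing === null) return "acquire";
  const ageMinutes = (now.getTime() - existing.acquiredAt.getTime()) / 60_000;
  const isStale = ageMinutes >= staleLockMinutes;
  // A fresh lock means another sync is probably still running: never steal it.
  if (!isStale) return "refuse";
  // A stale lock is likely left over from a crashed run: replace it only on --force.
  return force ? "take-over-stale" : "refuse";
}
```

Under this reading, with the default 10-minute threshold, a run that starts 11 minutes after the lock was taken gets `"take-over-stale"` with `--force`. Without `--force`, that run is refused.
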
### `gi list issues`

Query issues from the local database.

```bash
gi list issues                               # Recent issues (default limit: 50)
gi list issues --limit 100                   # More results
gi list issues --state opened                # Only open issues
gi list issues --state closed                # Only closed issues
gi list issues --author username             # By author
gi list issues --assignee username           # By assignee
gi list issues --label bug                   # By label
gi list issues --label bug --label urgent    # Multiple labels (AND: must have all)
gi list issues --milestone "v1.0"            # By milestone title
gi list issues --since 7d                    # Updated in the last 7 days
gi list issues --since 2w                    # Updated in the last 2 weeks
gi list issues --since 2024-01-01            # Updated since a date
gi list issues --due-before 2024-12-31       # Due before a date
gi list issues --has-due-date                # Only issues with a due date
gi list issues --project group/repo          # Filter by project
gi list issues --sort created --order asc    # Sort options
gi list issues --open                        # Open the first result in the browser
gi list issues --json                        # JSON output
```

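The three `--since` forms above could be parsed along the following lines. This TypeScript sketch is illustrative only. `parseSince` is a hypothetical name, and the sketch accepts only the `d` and `w` suffixes shown above. Treating a bare date as midnight UTC is an assumption.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Parse "7d", "2w", or "YYYY-MM-DD" into the earliest updated_at to include.
function parseSince(value: string, now: Date = new Date()): Date {
  // Relative form: a count followed by d (days) or w (weeks).
  const rel = /^(\d+)([dw])$/.exec(value);
  if (rel) {
    const days = Number(rel[1]) * (rel[2] === "w" ? 7 : 1);
    return new Date(now.getTime() - days * DAY_MS);
  }
  // Absolute form: a calendar date, read as midnight UTC (assumption).
  const abs = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (abs) {
    const [y, m, d] = [Number(abs[1]), Number(abs[2]), Number(abs[3])];
    const date = new Date(Date.UTC(y, m - 1, d));
    // Date.UTC silently rolls over (2024-02-31 -> March 2), so reject such inputs.
    if (date.getUTCFullYear() === y && date.getUTCMonth() === m - 1 && date.getUTCDate() === d) {
      return date;
    }
  }
  throw new Error(`Invalid --since value "${value}" (expected e.g. 7d, 2w, or 2024-01-01)`);
}
```

Relative values are computed in milliseconds from "now", so daylight-saving changes cannot shift them by an hour.
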
### `gi show issue`

Display detailed issue information.

```bash
gi show issue 123                        # Show issue #123
gi show issue 123 --project group/repo   # Disambiguate when several tracked projects have an issue #123
```

### `gi count`

Count entities in the local database.

```bash
gi count issues                     # Total issues
gi count discussions                # Total discussions
gi count discussions --type issue   # Issue discussions only
gi count notes                      # Total notes
```

### `gi sync-status`

Show the current sync state and watermarks.

```bash
gi sync-status
```

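The watermark that `gi sync-status` reports feeds the next incremental fetch. The following TypeScript sketch assumes the cursor is the newest `updated_at` already stored. The next request asks for items updated after the cursor minus `sync.cursorRewindSeconds`, and on the first sync it starts `sync.backfillDays` ago. `nextUpdatedAfter` and `issuesQuery` are hypothetical helpers. `scope`, `state`, `order_by`, `sort`, `updated_after`, and `per_page` are standard GitLab issues API parameters.

```typescript
const SECOND_MS = 1000;
const DAY_MS = 24 * 60 * 60 * SECOND_MS;

// Lower bound for the next incremental fetch.
// cursor: newest updated_at already stored (null before the first sync).
function nextUpdatedAfter(
  cursor: Date | null,
  now: Date,
  backfillDays = 14,
  cursorRewindSeconds = 2,
): string {
  const start =
    cursor === null
      ? new Date(now.getTime() - backfillDays * DAY_MS) // initial sync: backfill window
      : new Date(cursor.getTime() - cursorRewindSeconds * SECOND_MS); // small overlap
  return start.toISOString();
}

// Query string for GET /projects/:id/issues, oldest updates first.
function issuesQuery(updatedAfter: string): string {
  return new URLSearchParams({
    scope: "all",
    state: "all",
    order_by: "updated_at",
    sort: "asc",
    updated_after: updatedAfter,
    per_page: "100",
  }).toString();
}
```

The rewind deliberately refetches a couple of seconds of already-synced items. This guards against updates that share the cursor's timestamp. It is only safe if writes are upserts keyed on the GitLab ID, so the overlap updates existing rows instead of duplicating them.
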
### `gi migrate`

Run pending database migrations.

```bash
gi migrate
```

### `gi version`

Show version information.

```bash
gi version
```

## Database Schema

Data is stored in SQLite in the following main tables:

- **projects**: Tracked GitLab projects
- **issues**: Issue metadata (title, state, author, assignee info, due date, milestone)
- **milestones**: Project milestones with state and due dates
- **issue_assignees**: Many-to-many issue-assignee relationships
- **labels**: Project labels with colors
- **issue_labels**: Many-to-many issue-label relationships
- **discussions**: Issue/MR discussions
- **notes**: Individual notes within discussions
- **raw_payloads**: Original API responses (compressed when `storage.compressRawPayloads` is `true`)

The database is stored at `~/.local/share/gi/gi.db` by default. To use a different location, set `storage.dbPath`.

## Global Options

```bash
gi --config /path/to/config.json <command>   # Use an alternate config file
```

## Development

```bash
# Run tests
cargo test

# Run with debug logging
RUST_LOG=gi=debug gi list issues

# Check formatting
cargo fmt --check

# Lint
cargo clippy
```

## Tech Stack

- **Rust** (2024 edition)
- **SQLite** via rusqlite (bundled)
- **clap** for CLI parsing
- **reqwest** for HTTP
- **tokio** for async runtime
- **serde** for serialization
- **tracing** for logging

## License

MIT

292
SPEC.md
@@ -1,5 +1,7 @@
# GitLab Knowledge Engine - Spec Document

> **Note:** This is a historical planning document. The actual implementation uses Rust instead of TypeScript/Node.js. See [README.md](README.md) for current documentation.

## Executive Summary

A self-hosted tool to extract, index, and semantically search 2+ years of GitLab data (issues, MRs, and discussion threads) from 2 main repositories (~50-100K documents including threaded discussions). The MVP delivers semantic search as a foundational capability that enables future specialized views (file history, personal tracking, person context). Discussion threads are preserved as first-class entities to maintain conversational context essential for decision traceability.
@@ -122,7 +124,7 @@ npm link # Makes `gi` available globally
▼
┌─────────────────────────────────────────────────────────────────┐
│ Storage Layer │
│ - SQLite + sqlite-vec + FTS5 (hybrid search) │
│ - Structured metadata in relational tables │
│ - Vector embeddings for semantic search │
│ - Full-text index for lexical search fallback │
@@ -139,12 +141,20 @@ npm link # Makes `gi` available globally
### Technology Choices

| Component | Choice | Rationale |
|-----------|--------|-----------|
| Language | TypeScript/Node.js | User expertise, good GitLab libs, AI agent friendly |
| Database | SQLite + sqlite-vec + FTS5 | Zero-config, portable, vector search via pure-C extension |
| Embeddings | Ollama + nomic-embed-text | Self-hosted, runs well on Apple Silicon, 768-dim vectors |
| CLI Framework | Commander.js | Simple, lightweight, well-documented |
| Logging | pino | Fast, JSON-structured, low overhead |
| Validation | Zod | TypeScript-first schema validation |

### Alternative Considered: sqlite-vss

- sqlite-vss was the original choice but is now deprecated
- No Apple Silicon support (no prebuilt ARM binaries)
- Replaced by sqlite-vec, which is pure C with no dependencies
- sqlite-vec uses the `vec0` virtual table module instead of `vss0`

### Alternative Considered: Postgres + pgvector
- Pros: More scalable, better for production multi-user
@@ -153,6 +163,126 @@ npm link # Makes `gi` available globally
---

## Project Structure

```
gitlab-inbox/
├── src/
│   ├── cli/
│   │   ├── index.ts            # CLI entry point (Commander.js)
│   │   └── commands/           # One file per command group
│   │       ├── init.ts
│   │       ├── sync.ts
│   │       ├── search.ts
│   │       ├── list.ts
│   │       └── doctor.ts
│   ├── core/
│   │   ├── config.ts           # Config loading/validation (Zod)
│   │   ├── db.ts               # Database connection + migrations
│   │   ├── errors.ts           # Custom error classes
│   │   └── logger.ts           # pino logger setup
│   ├── gitlab/
│   │   ├── client.ts           # GitLab API client with rate limiting
│   │   ├── types.ts            # GitLab API response types
│   │   └── transformers/       # Payload → normalized schema
│   │       ├── issue.ts
│   │       ├── merge-request.ts
│   │       └── discussion.ts
│   ├── ingestion/
│   │   ├── issues.ts
│   │   ├── merge-requests.ts
│   │   └── discussions.ts
│   ├── documents/
│   │   ├── extractor.ts        # Document generation from entities
│   │   └── truncation.ts       # Note-boundary aware truncation
│   ├── embedding/
│   │   ├── ollama.ts           # Ollama client
│   │   └── pipeline.ts         # Batch embedding orchestration
│   ├── search/
│   │   ├── hybrid.ts           # RRF ranking logic
│   │   ├── fts.ts              # FTS5 queries
│   │   └── vector.ts           # sqlite-vec queries
│   └── types/
│       └── index.ts            # Shared TypeScript types
├── tests/
│   ├── unit/
│   ├── integration/
│   ├── live/                   # Optional GitLab live tests (GITLAB_LIVE_TESTS=1)
│   └── fixtures/
│       └── golden-queries.json
├── migrations/                 # Numbered SQL migration files
│   ├── 001_initial.sql
│   └── ...
├── gi.config.json              # User config (gitignored)
├── package.json
├── tsconfig.json
├── vitest.config.ts
├── eslint.config.js
└── README.md
```

---

## Dependencies

### Runtime Dependencies

```json
{
  "dependencies": {
    "better-sqlite3": "latest",
    "sqlite-vec": "latest",
    "commander": "latest",
    "zod": "latest",
    "pino": "latest",
    "pino-pretty": "latest",
    "ora": "latest",
    "chalk": "latest",
    "cli-table3": "latest"
  }
}
```

| Package | Purpose |
|---------|---------|
| better-sqlite3 | Synchronous SQLite driver (fast, native) |
| sqlite-vec | Vector search extension (pure C, cross-platform) |
| commander | CLI argument parsing |
| zod | Schema validation for config and inputs |
| pino | Structured JSON logging |
| pino-pretty | Dev-mode log formatting |
| ora | CLI spinners for progress indication |
| chalk | Terminal colors |
| cli-table3 | ASCII tables for list output |

### Dev Dependencies

```json
{
  "devDependencies": {
    "typescript": "latest",
    "@types/better-sqlite3": "latest",
    "@types/node": "latest",
    "vitest": "latest",
    "msw": "latest",
    "eslint": "latest",
    "@typescript-eslint/eslint-plugin": "latest",
    "@typescript-eslint/parser": "latest",
    "tsx": "latest"
  }
}
```

| Package | Purpose |
|---------|---------|
| typescript | TypeScript compiler |
| vitest | Test runner |
| msw | Mock Service Worker for API mocking in tests |
| eslint | Linting |
| tsx | Run TypeScript directly during development |

---

## GitLab API Strategy

### Primary Resources (Bulk Fetch)
@@ -368,6 +498,98 @@ tests/integration/init.test.ts
- Decompression is handled transparently when reading payloads
- Tradeoff: Slightly higher CPU on write/read, significantly lower disk usage

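The transparent compression described above can be sketched with Node's built-in `zlib`. The spec does not name a codec, so gzip is an assumption. The `StoredPayload` shape and the `encodePayload`/`decodePayload` names are also hypothetical.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical row shape: a flag records whether `body` is gzip-compressed.
interface StoredPayload {
  compressed: boolean;
  body: Buffer;
}

// Serialize an API response, compressing it when the config flag is on.
function encodePayload(json: unknown, compress: boolean): StoredPayload {
  const raw = Buffer.from(JSON.stringify(json), "utf8");
  return compress ? { compressed: true, body: gzipSync(raw) } : { compressed: false, body: raw };
}

// Reads are transparent: callers always get the parsed JSON back.
function decodePayload(stored: StoredPayload): unknown {
  const raw = stored.compressed ? gunzipSync(stored.body) : stored.body;
  return JSON.parse(raw.toString("utf8"));
}
```

Storing the flag per row means existing uncompressed rows stay readable after `compressRawPayloads` is toggled.
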
**Error Classes (src/core/errors.ts):**

```typescript
// Base error class with error codes for programmatic handling
export class GiError extends Error {
  constructor(message: string, public readonly code: string) {
    super(message);
    // Report the concrete subclass (e.g. 'ConfigNotFoundError') rather than 'GiError'
    this.name = new.target.name;
  }
}

// Config errors
export class ConfigNotFoundError extends GiError {
  constructor() {
    super('Config file not found. Run "gi init" first.', 'CONFIG_NOT_FOUND');
  }
}

export class ConfigValidationError extends GiError {
  constructor(details: string) {
    super(`Invalid config: ${details}`, 'CONFIG_INVALID');
  }
}

// GitLab API errors
export class GitLabAuthError extends GiError {
  constructor() {
    super('GitLab authentication failed. Check your token.', 'GITLAB_AUTH_FAILED');
  }
}

export class GitLabNotFoundError extends GiError {
  constructor(resource: string) {
    super(`GitLab resource not found: ${resource}`, 'GITLAB_NOT_FOUND');
  }
}

export class GitLabRateLimitError extends GiError {
  constructor(public readonly retryAfter: number) {
    super(`Rate limited. Retry after ${retryAfter}s`, 'GITLAB_RATE_LIMITED');
  }
}

// Database errors
export class DatabaseLockError extends GiError {
  constructor() {
    super('Another sync is running. Use --force to override.', 'DB_LOCKED');
  }
}

// Embedding errors
export class OllamaConnectionError extends GiError {
  constructor() {
    super('Cannot connect to Ollama. Is it running?', 'OLLAMA_UNAVAILABLE');
  }
}

export class EmbeddingError extends GiError {
  constructor(documentId: number, reason: string) {
    super(`Failed to embed document ${documentId}: ${reason}`, 'EMBEDDING_FAILED');
  }
}
```

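The `code` field lets the CLI boundary report errors without matching on message strings. Below is a hypothetical sketch of such a boundary; `formatError` and the `INTERNAL` fallback code are not part of the spec. It repeats `GiError` and one subclass so it runs on its own, and adds `Object.setPrototypeOf` so `instanceof` also holds when the code is compiled to an ES5 target.

```typescript
class GiError extends Error {
  constructor(message: string, public readonly code: string) {
    super(message);
    this.name = new.target.name;
    // Keeps `instanceof` working if TypeScript downlevels classes to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class ConfigNotFoundError extends GiError {
  constructor() {
    super('Config file not found. Run "gi init" first.', "CONFIG_NOT_FOUND");
  }
}

// Turn any thrown value into one line for humans or a JSON object for --json.
function formatError(err: unknown, json: boolean): string {
  const code = err instanceof GiError ? err.code : "INTERNAL"; // hypothetical fallback code
  const message = err instanceof Error ? err.message : String(err);
  return json ? JSON.stringify({ error: { code, message } }) : `Error [${code}]: ${message}`;
}
```

A command handler would write the result to stderr and set a non-zero exit code, which keeps stdout free for `--json` results.
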
**Logging Strategy (src/core/logger.ts):**

```typescript
import pino from 'pino';

const level = process.env.LOG_LEVEL || 'info';

// Logs go to stderr, results to stdout (allows clean JSON piping).
// pino rejects a `transport` option combined with a destination stream, so choose one per mode.
export const logger =
  process.env.NODE_ENV === 'production'
    ? pino({ level }, pino.destination(2)) // fd 2 = stderr, plain JSON lines
    : pino({
        level,
        transport: {
          target: 'pino-pretty',
          options: { colorize: true, destination: 2 }, // pretty output, still on stderr
        },
      });
```

**Log Levels:**

| Level | When to use |
|-------|-------------|
| debug | Detailed sync progress, API calls, SQL queries |
| info | Sync start/complete, document counts, search timing |
| warn | Rate limits hit, Ollama unavailable (fallback to FTS), retries |
| error | Failures that stop operations |

**Logging Conventions:**
- Always include structured context: `logger.info({ project, count }, 'Fetched issues')`
- Errors include the `err` object: `logger.error({ err, documentId }, 'Embedding failed')`
- All logs go to stderr so that `gi search --json` output on stdout stays clean

**DB Runtime Defaults (Checkpoint 0):**
- On every connection:
  - `PRAGMA journal_mode=WAL;`
@@ -930,7 +1152,7 @@ tests/unit/embedding-client.test.ts
  ✓ batches requests (32 documents per batch)

tests/integration/embedding-storage.test.ts
  ✓ stores embedding in sqlite-vec
  ✓ embedding rowid matches document id
  ✓ creates embedding_metadata record
  ✓ skips re-embedding when content_hash unchanged
@@ -959,16 +1181,46 @@ tests/integration/embedding-storage.test.ts
- Concurrency: configurable (default 4 workers)
- Retry with exponential backoff for transient failures (max 3 attempts)
- Per-document failure recording to enable targeted re-runs
- Vector storage in SQLite (sqlite-vec extension)
- Progress tracking and resumability
- `gi search --mode=semantic` CLI command

**Ollama API Contract:**

```typescript
// POST http://localhost:11434/api/embed (batch endpoint - preferred)
interface OllamaEmbedRequest {
  model: string;   // "nomic-embed-text"
  input: string[]; // array of texts to embed (up to 32)
}

interface OllamaEmbedResponse {
  model: string;
  embeddings: number[][]; // array of 768-dim vectors
}

// POST http://localhost:11434/api/embeddings (single text - fallback)
interface OllamaEmbeddingsRequest {
  model: string;
  prompt: string;
}

interface OllamaEmbeddingsResponse {
  embedding: number[];
}
```

**Usage:**
- Use `/api/embed` for batching (up to 32 documents per request)
- Fall back to `/api/embeddings` for single documents or if a batch fails
- Check Ollama availability with `GET http://localhost:11434/api/tags`

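The batching and fallback rules in the usage notes can be sketched as follows. The sketch assumes Node 18+ (global `fetch`) and injects the fetch function, so the logic can be exercised without a running Ollama. `embedAll`, `postJson`, and `FetchLike` are hypothetical names. Retrying a failed batch one text at a time is one reading of the fallback rule, and the retry/backoff policy is omitted.

```typescript
interface OllamaEmbedResponse {
  model: string;
  embeddings: number[][];
}

interface OllamaEmbeddingsResponse {
  embedding: number[];
}

// Minimal subset of fetch used here, so tests can pass a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const OLLAMA_URL = "http://localhost:11434";
const BATCH_SIZE = 32;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function postJson(fetchFn: FetchLike, path: string, body: unknown): Promise<unknown> {
  const res = await fetchFn(`${OLLAMA_URL}${path}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Ollama ${path} returned HTTP ${res.status}`);
  return res.json();
}

// Embed texts in order: batches of 32 via /api/embed; if a batch request fails,
// retry that batch one text at a time via /api/embeddings.
async function embedAll(
  texts: string[],
  model = "nomic-embed-text",
  fetchFn: FetchLike = fetch as unknown as FetchLike,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, BATCH_SIZE)) {
    try {
      const res = (await postJson(fetchFn, "/api/embed", { model, input: batch })) as OllamaEmbedResponse;
      if (res.embeddings.length !== batch.length) throw new Error("embedding count mismatch");
      vectors.push(...res.embeddings);
    } catch {
      for (const prompt of batch) {
        const res = (await postJson(fetchFn, "/api/embeddings", { model, prompt })) as OllamaEmbeddingsResponse;
        vectors.push(res.embedding);
      }
    }
  }
  return vectors;
}
```

Vectors are only appended after a whole batch succeeds. A failed batch therefore never leaves partial results that would misalign vectors with their documents.
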
**Schema Additions (CP3B):**
```sql
-- sqlite-vec virtual table for vector search
-- Storage rule: embeddings.rowid = documents.id
CREATE VIRTUAL TABLE embeddings USING vec0(
  embedding float[768]
);

-- Embedding provenance + change detection
@@ -1053,6 +1305,11 @@ If content exceeds 8000 tokens (~32000 chars):
6. Set `documents.truncated_reason = 'token_limit_middle_drop'`
7. Log a warning with the document ID and the original and truncated token counts

**Edge Cases:**
- **Single note > 32000 chars:** Truncate at a character boundary, append `[truncated]`, and set `truncated_reason = 'single_note_oversized'`
- **First + last note > 32000 chars:** Keep only the first note (truncated if needed) and set `truncated_reason = 'first_last_oversized'`
- **Only one note in discussion:** If it exceeds the limit, truncate at a character boundary and append `[truncated]`

**Why note-boundary truncation:**
- Cutting mid-note produces unreadable snippets ("...the authentication flow because--")
- Keeping whole notes preserves semantic coherence for embeddings
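Only steps 6 and 7 of the truncation procedure appear in this excerpt, so the following TypeScript sketch infers the rest from the `token_limit_middle_drop` reason and the edge cases: keep the first and last notes whole and drop middle notes until the text fits. The `"\n\n"` note separator, the order of the checks, and the names `truncateDiscussion`/`cutAtChar` are assumptions. The 32000-character budget and the `[truncated]` marker come from the spec. Logging (step 7) is left out.

```typescript
const MAX_CHARS = 32_000; // ~8000 tokens
const SEP = "\n\n"; // assumed separator between notes
const MARKER = "[truncated]";

type TruncatedReason =
  | "token_limit_middle_drop"
  | "single_note_oversized"
  | "first_last_oversized"
  | null;

interface Truncated {
  text: string;
  truncatedReason: TruncatedReason;
}

// Cut to fit `max` including the marker, without splitting a UTF-16 surrogate pair.
function cutAtChar(text: string, max: number): string {
  let end = Math.max(0, max - MARKER.length);
  const code = text.charCodeAt(end - 1);
  if (code >= 0xd800 && code <= 0xdbff) end -= 1; // high surrogate: back off one unit
  return text.slice(0, end) + MARKER;
}

function truncateDiscussion(notes: string[]): Truncated {
  const full = notes.join(SEP);
  if (full.length <= MAX_CHARS) return { text: full, truncatedReason: null };

  // A lone oversized note: character-boundary cut.
  if (notes.length === 1) {
    return { text: cutAtChar(notes[0], MAX_CHARS), truncatedReason: "single_note_oversized" };
  }

  const first = notes[0];
  const last = notes[notes.length - 1];
  // First + last alone exceed the budget: keep only the first note, cut if needed.
  if (first.length + SEP.length + last.length > MAX_CHARS) {
    const text = first.length > MAX_CHARS ? cutAtChar(first, MAX_CHARS) : first;
    return { text, truncatedReason: "first_last_oversized" };
  }

  // Keep first and last whole; add middle notes in order while they still fit.
  const middle: string[] = [];
  let size = first.length + SEP.length + last.length;
  for (const note of notes.slice(1, -1)) {
    if (size + SEP.length + note.length > MAX_CHARS) break;
    middle.push(note);
    size += SEP.length + note.length;
  }
  return { text: [first, ...middle, last].join(SEP), truncatedReason: "token_limit_middle_drop" };
}
```

Keeping the first note preserves the question or problem statement, and keeping the last note preserves the resolution. Those are usually the parts a decision-tracing search needs.
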
@@ -1148,7 +1405,7 @@ Each query must have at least one expected URL appear in top 10 results.
**Scope:**
- Hybrid retrieval:
- Vector recall (sqlite-vec) + FTS lexical recall (fts5)
- Merge + rerank results using Reciprocal Rank Fusion (RRF)
- Query embedding generation (same Ollama pipeline as documents)
- Result ranking and scoring (document-level)
@@ -1178,6 +1435,7 @@ Each query must have at least one expected URL appear in top 10 results.
   - If any filters are present (--project, --type, --author, --label, --path, --after): `topK = 200`
   - This prevents "no results" when relevant docs exist outside the top-50 unfiltered recall
2. Query both the vector index (top topK) and FTS5 (top topK)
   - Vector recall via sqlite-vec + FTS lexical recall via fts5
   - Apply SQL-expressible filters during retrieval when possible (project_id, author_username, source_type)
3. Merge results by document_id
4. Combine with Reciprocal Rank Fusion (RRF):
@@ -1318,7 +1576,7 @@ tests/integration/incremental-sync.test.ts
  ✓ refetches discussions for updated MRs
  ✓ updates existing records (not duplicates)
  ✓ creates new records for new items
  ✓ re-embeds documents with changed content_hash

tests/integration/sync-recovery.test.ts
  ✓ resumes from cursor after interrupted sync
@@ -1731,7 +1989,7 @@ Each checkpoint includes:
| dirty_sources | 3A | Queue for incremental document regeneration |
| pending_discussion_fetches | 3A | Resumable queue for dependent discussion fetching |
| documents_fts | 3A | Full-text search index (fts5 with porter stemmer) |
| embeddings | 3B | Vector embeddings (sqlite-vec vec0, rowid=document_id) |
| embedding_metadata | 3B | Embedding provenance + error tracking |
| mr_files | 6 | MR file changes (deferred to post-MVP) |
@@ -1759,7 +2017,7 @@ Each checkpoint includes:
| JSON output | **Stable documented schema** | Enables reliable agent/MCP consumption |
| Database location | **XDG compliant: `~/.local/share/gi/`** | Standard location, user-configurable |
| `gi init` validation | **Validate GitLab before writing config** | Fail fast, better UX |
| Ctrl+C handling | **Graceful shutdown** | Finish page, commit cursor, exit cleanly |
| Empty state UX | **Actionable messages** | Guide user to next step |
| raw_payloads.gitlab_id | **TEXT not INTEGER** | Discussion IDs are strings; numeric IDs stored as strings |
| GitLab list params | **Always scope=all&state=all** | Ensures all historical data including closed items |
@@ -1769,6 +2027,12 @@ Each checkpoint includes:
| RRF score normalization | **Per-query normalized 0-1** | score = rrfScore / max(rrfScore); raw score in explain |
| --path semantics | **Trailing / = prefix match** | `--path=src/auth/` does prefix; otherwise exact match |
| CP3 structure | **Split into 3A (FTS) and 3B (embeddings)** | Lexical search works before embedding infra risk |
| Vector extension | **sqlite-vec (not sqlite-vss)** | sqlite-vss deprecated, no Apple Silicon support; sqlite-vec is pure C, runs anywhere |
| CLI framework | **Commander.js** | Simple, lightweight, sufficient for single-user CLI tool |
| Logging | **pino to stderr** | JSON-structured, fast; stderr keeps stdout clean for JSON output piping |
| Error handling | **Custom error class hierarchy** | GiError base with codes; specific classes for config/gitlab/db/embedding errors |
| Truncation edge cases | **Char-boundary cut for oversized notes** | Single notes > 32000 chars truncated at char boundary with `[truncated]` marker |
| Ollama API | **Use /api/embed for batching** | Batch up to 32 docs per request; fall back to /api/embeddings for single documents |

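Step 4 of the hybrid search and the "RRF score normalization" decision can be combined into a small TypeScript sketch. Each document scores the sum of `1 / (k + rank)` over the lists it appears in. `k = 60` is the constant commonly used with RRF and is an assumption here, because the spec's value is outside this excerpt. Scores are then divided by the per-query maximum, as in the table, and the raw score is kept for explain output. `rrfFuse` and `chooseTopK` are hypothetical names, and the default `topK` of 50 is inferred from the "top-50 unfiltered recall" remark.

```typescript
const RRF_K = 60; // assumed constant; not shown in this excerpt

interface FusedResult {
  documentId: number;
  rawScore: number; // sum of 1 / (k + rank) across result lists
  score: number; // rawScore / max rawScore for this query, in (0, 1]
}

// Recall depth per retriever: 200 when filters are present, else 50 (inferred).
function chooseTopK(hasFilters: boolean): number {
  return hasFilters ? 200 : 50;
}

// Fuse ranked document-id lists (best first) with Reciprocal Rank Fusion.
function rrfFuse(lists: number[][], k: number = RRF_K): FusedResult[] {
  const raw = new Map<number, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      raw.set(id, (raw.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const max = Math.max(0, ...Array.from(raw.values()));
  return Array.from(raw.entries())
    .map(([documentId, rawScore]) => ({ documentId, rawScore, score: max > 0 ? rawScore / max : 0 }))
    .sort((a, b) => b.rawScore - a.rawScore || a.documentId - b.documentId);
}
```

For example, `rrfFuse([[1, 2, 3], [3, 1]])` (vector ranking, then FTS ranking) orders the documents 1, 3, 2. Document 1 is first in one list and second in the other, which beats document 3's first and third places. Document 2 appears in only one list.
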
---
1164
docs/prd/checkpoint-0.md
Normal file
File diff suppressed because it is too large
1683
docs/prd/checkpoint-1.md
Normal file
File diff suppressed because it is too large