From 3f38b3fda7895d0d1c158ab95d65ae04c8cd3923 Mon Sep 17 00:00:00 2001 From: teernisse Date: Fri, 27 Feb 2026 07:31:36 -0500 Subject: [PATCH] docs: add comprehensive command surface analysis Deep analysis of the full `lore` CLI command surface (34 commands across 6 categories) covering command inventory, data flow, overlap analysis, and optimization proposals. Document structure: - Main consolidated doc: docs/command-surface-analysis.md (1251 lines) - Split sections in docs/command-surface-analysis/ for navigation: 00-overview.md - Summary, inventory, priorities 01-entity-commands.md - issues, mrs, notes, search, count 02-intelligence-commands.md - who, timeline, me, file-history, trace, related, drift 03-pipeline-and-infra.md - sync, ingest, generate-docs, embed, diagnostics 04-data-flow.md - Shared data source map, command network graph 05-overlap-analysis.md - Quantified overlap percentages for every command pair 06-agent-workflows.md - Common agent flows, round-trip costs, token profiles 07-consolidation-proposals.md - 5 proposals to reduce 34 commands to 29 08-robot-optimization-proposals.md - 6 proposals for --include, --batch, --depth 09-appendices.md - Robot output envelope, field presets, exit codes Key findings: - High overlap pairs: who-workload/me (~85%), health/doctor (~90%) - 5 consolidation proposals to reduce command count by 15% - 6 robot-mode optimization proposals targeting agent round-trip reduction - Full DB table mapping and data flow documentation Co-Authored-By: Claude Opus 4.5 --- docs/command-surface-analysis.md | 1251 +++++++++++++++++ docs/command-surface-analysis/00-overview.md | 92 ++ .../01-entity-commands.md | 308 ++++ .../02-intelligence-commands.md | 452 ++++++ .../03-pipeline-and-infra.md | 210 +++ docs/command-surface-analysis/04-data-flow.md | 179 +++ .../05-overlap-analysis.md | 170 +++ .../06-agent-workflows.md | 216 +++ .../07-consolidation-proposals.md | 198 +++ .../08-robot-optimization-proposals.md | 347 +++++ .../command-surface-analysis/09-appendices.md | 181 +++ 11 files changed, 3604 insertions(+) create mode 100644 docs/command-surface-analysis.md create mode 100644 docs/command-surface-analysis/00-overview.md create mode 100644 docs/command-surface-analysis/01-entity-commands.md create mode 100644 docs/command-surface-analysis/02-intelligence-commands.md create mode 100644 docs/command-surface-analysis/03-pipeline-and-infra.md create mode 100644 docs/command-surface-analysis/04-data-flow.md create mode 100644 docs/command-surface-analysis/05-overlap-analysis.md create mode 100644 docs/command-surface-analysis/06-agent-workflows.md create mode 100644 docs/command-surface-analysis/07-consolidation-proposals.md create mode 100644 docs/command-surface-analysis/08-robot-optimization-proposals.md create mode 100644 docs/command-surface-analysis/09-appendices.md diff --git a/docs/command-surface-analysis.md b/docs/command-surface-analysis.md new file mode 100644 index 0000000..bcbcb15 --- /dev/null +++ b/docs/command-surface-analysis.md @@ -0,0 +1,1251 @@ +# Lore Command Surface Analysis + +**Date:** 2026-02-26 +**Version:** v0.9.1 (439c20e) +**Scope:** Full command inventory, overlap analysis, consolidation proposals, robot-mode optimization proposals + +--- + +## 1. Command Inventory + +34 commands across 6 categories. + +| Category | Commands | Count | +|---|---|---| +| Entity Query | `issues`, `mrs`, `notes`, `search`, `count` | 5 | +| Intelligence | `who` (5 modes), `timeline`, `related`, `drift`, `me`, `file-history`, `trace` | 7 (11 with who sub-modes) | +| Data Pipeline | `sync`, `ingest`, `generate-docs`, `embed` | 4 | +| Diagnostics | `health`, `auth`, `doctor`, `status`, `stats` | 5 | +| Setup | `init`, `token`, `cron`, `migrate` | 4 | +| Meta | `version`, `completions`, `robot-docs` | 3 | + +### 1.1 Entity Query Commands + +#### `issues` (alias: `issue`) + +List or show issues from local database. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `[IID]` | positional | — | Omit to list, provide to show detail | +| `-n, --limit` | int | 50 | Max results | +| `--fields` | string | — | Select output columns (preset: `minimal`) | +| `-s, --state` | enum | — | `opened\|closed\|all` | +| `-p, --project` | string | — | Filter by project (fuzzy) | +| `-a, --author` | string | — | Filter by author username | +| `-A, --assignee` | string | — | Filter by assignee username | +| `-l, --label` | string[] | — | Filter by labels (AND logic, repeatable) | +| `-m, --milestone` | string | — | Filter by milestone title | +| `--status` | string[] | — | Filter by work-item status (COLLATE NOCASE, OR logic) | +| `--since` | duration/date | — | Filter by created date (`7d`, `2w`, `YYYY-MM-DD`) | +| `--due-before` | date | — | Filter by due date | +| `--has-due` | flag | — | Show only issues with due dates | +| `--sort` | enum | `updated` | `updated\|created\|iid` | +| `--asc` | flag | — | Sort ascending | +| `-o, --open` | flag | — | Open first match in browser | + +**DB tables:** `issues`, `projects`, `issue_assignees`, `issue_labels`, `labels` +**Detail mode adds:** `discussions`, `notes`, `entity_references` (closing MRs) + +**Robot output (list):** +```json +{ + "ok": true, + "data": { + "issues": [ + { + "iid": 42, "title": "Fix auth", "state": "opened", + "author_username": "jdoe", "labels": ["backend"], + "assignees": ["jdoe"], "discussion_count": 3, + "unresolved_count": 1, "created_at_iso": "...", + "updated_at_iso": "...", "web_url": "...", + "project_path": "group/repo", + "status_name": "In progress" + } + ], + "total_count": 150, "showing": 50 + }, + "meta": { "elapsed_ms": 40, "available_statuses": ["Open", "In progress", "Closed"] } +} +``` + +**Minimal preset:** `iid`, `title`, `state`, `updated_at_iso` + +--- + +#### `mrs` (aliases: `mr`, `merge-request`, `merge-requests`) + +List or show merge requests. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `[IID]` | positional | — | Omit to list, provide to show detail | +| `-n, --limit` | int | 50 | Max results | +| `--fields` | string | — | Select output columns (preset: `minimal`) | +| `-s, --state` | enum | — | `opened\|merged\|closed\|locked\|all` | +| `-p, --project` | string | — | Filter by project | +| `-a, --author` | string | — | Filter by author | +| `-A, --assignee` | string | — | Filter by assignee | +| `-r, --reviewer` | string | — | Filter by reviewer | +| `-l, --label` | string[] | — | Filter by labels (AND) | +| `--since` | duration/date | — | Filter by created date | +| `-d, --draft` | flag | — | Draft MRs only | +| `-D, --no-draft` | flag | — | Exclude drafts | +| `--target` | string | — | Filter by target branch | +| `--source` | string | — | Filter by source branch | +| `--sort` | enum | `updated` | `updated\|created\|iid` | +| `--asc` | flag | — | Sort ascending | +| `-o, --open` | flag | — | Open in browser | + +**DB tables:** `merge_requests`, `projects`, `mr_reviewers`, `mr_labels`, `labels`, `mr_assignees` +**Detail mode adds:** `discussions`, `notes`, `mr_diffs` + +**Minimal preset:** `iid`, `title`, `state`, `updated_at_iso` + +--- + +#### `notes` (alias: `note`) + +List discussion notes/comments with fine-grained filters. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `-n, --limit` | int | 50 | Max results | +| `--fields` | string | — | Preset: `minimal` | +| `-a, --author` | string | — | Filter by author | +| `--note-type` | enum | — | `DiffNote\|DiscussionNote` | +| `--contains` | string | — | Body text substring filter | +| `--note-id` | int | — | Internal note ID | +| `--gitlab-note-id` | int | — | GitLab note ID | +| `--discussion-id` | string | — | Discussion ID filter | +| `--include-system` | flag | — | Include system notes | +| `--for-issue` | int | — | Notes on specific issue (requires `-p`) | +| `--for-mr` | int | — | Notes on specific MR (requires `-p`) | +| `-p, --project` | string | — | Scope to project | +| `--since` | duration/date | — | Created after | +| `--until` | date | — | Created before (inclusive) | +| `--path` | string | — | File path filter (exact or prefix with `/`) | +| `--resolution` | enum | — | `any\|unresolved\|resolved` | +| `--sort` | enum | `created` | `created\|updated` | +| `--asc` | flag | — | Sort ascending | +| `--open` | flag | — | Open in browser | + +**DB tables:** `notes`, `discussions`, `projects`, `issues`, `merge_requests` + +**Minimal preset:** `id`, `author_username`, `body`, `created_at_iso` + +--- + +#### `search` (aliases: `find`, `query`) + +Semantic + full-text search across indexed documents. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Search query string | +| `--mode` | enum | `hybrid` | `lexical\|hybrid\|semantic` | +| `--type` | enum | — | `issue\|mr\|discussion\|note` | +| `--author` | string | — | Filter by author | +| `-p, --project` | string | — | Scope to project | +| `--label` | string[] | — | Filter by labels (AND) | +| `--path` | string | — | File path filter | +| `--since` | duration/date | — | Created after | +| `--updated-since` | duration/date | — | Updated after | +| `-n, --limit` | int | 20 | Max results (max: 100) | +| `--fields` | string | — | Preset: `minimal` | +| `--explain` | flag | — | Show ranking breakdown | +| `--fts-mode` | enum | `safe` | `safe\|raw` | + +**DB tables:** `documents`, `documents_fts` (FTS5), `embeddings` (vec0), `document_labels`, `document_paths`, `projects` + +**Robot output:** +```json +{ + "data": { + "query": "authentication bug", + "mode": "hybrid", + "total_results": 15, + "results": [ + { + "document_id": 1234, "source_type": "issue", + "title": "Fix SSO auth", "url": "...", + "author": "jdoe", "project_path": "group/repo", + "labels": ["auth"], "paths": ["src/auth/"], + "snippet": "...matching text...", + "score": 0.85, + "explain": { "vector_rank": 2, "fts_rank": 1, "rrf_score": 0.85 } + } + ], + "warnings": [] + } +} +``` + +**Minimal preset:** `document_id`, `title`, `source_type`, `score` + +--- + +#### `count` + +Count entities in local database. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | `issues\|mrs\|discussions\|notes\|events` | +| `-f, --for` | enum | — | Parent type: `issue\|mr` | + +**DB tables:** Conditional aggregation on `issues`, `merge_requests`, `discussions`, `notes`, event tables + +**Robot output:** +```json +{ + "data": { + "entity": "merge_requests", + "count": 1234, + "breakdown": { "opened": 100, "closed": 50, "merged": 1084 } + } +} +``` + +--- + +### 1.2 Intelligence Commands + +#### `who` (People Intelligence) + +Five sub-modes, dispatched by argument shape. + +| Mode | Trigger | Purpose | +|---|---|---| +| **expert** | `who ` or `who --path ` | Who knows about a code area? | +| **workload** | `who @username` | What is this person working on? | +| **reviews** | `who @username --reviews` | Review pattern analysis | +| **active** | `who --active` | Unresolved discussions needing attention | +| **overlap** | `who --overlap ` | Who else touches these files? | + +**Shared flags:** + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `-p, --project` | string | — | Scope to project | +| `-n, --limit` | int | varies | Max results (1-500) | +| `--fields` | string | — | Preset: `minimal` | +| `--since` | duration/date | — | Time window | +| `--include-bots` | flag | — | Include bot users | +| `--include-closed` | flag | — | Include closed issues/MRs | +| `--all-history` | flag | — | Query all history | + +**Expert-only flags:** `--detail` (per-MR breakdown), `--as-of` (score at point in time), `--explain-score` (score breakdown) + +**DB tables by mode:** + +| Mode | Primary Tables | +|---|---| +| expert | `notes` (INDEXED BY idx_notes_diffnote_path_created), `merge_requests`, `mr_reviewers` | +| workload | `issues`, `merge_requests`, `mr_reviewers` | +| reviews | `merge_requests`, `discussions`, `notes` | +| active | `discussions`, `notes`, `issues`, `merge_requests` | +| overlap | `notes`, `mr_file_changes`, `merge_requests` | + +**Robot output (expert):** +```json +{ + "data": { + "mode": "expert", + "result": { + "experts": [ + { "username": "jdoe", "score": 42.5, "detail": { "mr_ids_author": [99, 101] } } + ] + } + } +} +``` + +**Minimal presets:** expert: `username, score` | workload: `iid, title, state` | reviews: `name, count, percentage` | active: `entity_type, iid, title, participants` | overlap: `username, touch_count` + +--- + +#### `timeline` + +Reconstruct chronological event history for a topic/entity with cross-reference expansion. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Search text or entity ref (`issue:42`, `mr:99`) | +| `-p, --project` | string | — | Scope to project | +| `--since` | duration/date | — | Filter events after | +| `--depth` | int | 1 | Cross-ref expansion depth (0=none) | +| `--no-mentions` | flag | — | Skip "mentioned" edges, keep "closes"/"related" | +| `-n, --limit` | int | 100 | Max events | +| `--fields` | string | — | Preset: `minimal` | +| `--max-seeds` | int | 10 | Max seed entities from search | +| `--max-entities` | int | 50 | Max expanded entities | +| `--max-evidence` | int | 10 | Max evidence notes | + +**Pipeline:** SEED -> HYDRATE -> EXPAND -> COLLECT -> RENDER + +**DB tables:** `issues`, `merge_requests`, `discussions`, `notes`, `entity_references`, `resource_state_events`, `resource_label_events`, `resource_milestone_events`, `documents` (for search seeding) + +**Robot output:** +```json +{ + "data": { + "query": "authentication", "event_count": 25, + "seed_entities": [{ "type": "issue", "iid": 42, "project": "group/repo" }], + "expanded_entities": [ + { "type": "mr", "iid": 99, "project": "group/repo", "depth": 1, + "via": { "from": { "type": "issue", "iid": 42 }, "reference_type": "closes" } } + ], + "events": [ + { + "timestamp": "2026-01-15T10:30:00Z", "entity_type": "issue", + "entity_iid": 42, "project": "group/repo", + "event_type": "state_changed", "summary": "Reopened", + "actor": "jdoe", "is_seed": true, + "evidence_notes": [{ "author": "jdoe", "snippet": "..." }] + } + ] + }, + "meta": { + "elapsed_ms": 150, "search_mode": "fts", + "expansion_depth": 1, "include_mentions": true, + "total_entities": 5, "total_events": 25 + } +} +``` + +**Minimal preset:** `timestamp`, `type`, `entity_iid`, `detail` + +--- + +#### `me` (Personal Dashboard) + +Personal work dashboard with issues, MRs, activity, and since-last-check inbox. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--issues` | flag | — | Open issues section only | +| `--mrs` | flag | — | MRs section only | +| `--activity` | flag | — | Activity feed only | +| `--since` | duration/date | `30d` | Activity window | +| `-p, --project` | string | — | Scope to one project | +| `--all` | flag | — | All synced projects | +| `--user` | string | — | Override configured username | +| `--fields` | string | — | Preset: `minimal` | +| `--reset-cursor` | flag | — | Clear since-last-check cursor | + +**Sections (no flags = all):** Issues, MRs authored, MRs reviewing, Activity feed, Inbox (since last check) + +**DB tables:** `issues`, `merge_requests`, `resource_state_events`, `projects`, `issue_labels`, `mr_labels` + +**Robot output:** +```json +{ + "data": { + "username": "jdoe", + "summary": { + "project_count": 3, "open_issue_count": 5, + "authored_mr_count": 2, "reviewing_mr_count": 1, + "needs_attention_count": 3 + }, + "since_last_check": { + "cursor_iso": "2026-02-25T18:00:00Z", + "total_event_count": 8, + "groups": [ + { + "entity_type": "issue", "entity_iid": 42, + "entity_title": "Fix auth", "project": "group/repo", + "events": [ + { "timestamp_iso": "...", "event_type": "comment", + "actor": "reviewer", "summary": "New comment" } + ] + } + ] + }, + "open_issues": [ + { "project": "group/repo", "iid": 42, "title": "Fix auth", + "state": "opened", "attention_state": "needs_attention", + "status_name": "In progress", "labels": ["auth"], + "updated_at_iso": "..." } + ], + "open_mrs_authored": [...], + "reviewing_mrs": [...], + "activity": [...] + } +} +``` + +**Minimal presets:** Items: `iid, title, attention_state, updated_at_iso` | Activity: `timestamp_iso, event_type, entity_iid, actor` + +--- + +#### `file-history` + +Show which MRs touched a file, with linked discussions. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | File path to trace | +| `-p, --project` | string | — | Scope to project | +| `--discussions` | flag | — | Include DiffNote snippets | +| `--no-follow-renames` | flag | — | Skip rename chain resolution | +| `--merged` | flag | — | Only merged MRs | +| `-n, --limit` | int | 50 | Max MRs | + +**DB tables:** `mr_file_changes`, `merge_requests`, `notes` (DiffNotes), `projects` + +**Robot output:** +```json +{ + "data": { + "path": "src/auth/middleware.rs", + "rename_chain": [ + { "previous_path": "src/auth.rs", "mr_iid": 55, "merged_at": "..." } + ], + "merge_requests": [ + { "iid": 99, "title": "Refactor auth", "state": "merged", + "author": "jdoe", "merged_at": "...", "change_type": "modified" } + ], + "discussions": [ + { "discussion_id": 123, "mr_iid": 99, "author": "reviewer", + "body_snippet": "...", "path": "src/auth/middleware.rs" } + ] + }, + "meta": { "elapsed_ms": 30, "total_mrs": 5, "renames_followed": true } +} +``` + +--- + +#### `trace` + +File -> MR -> issue -> discussion chain to understand why code was introduced. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | File path (future: `:line` suffix) | +| `-p, --project` | string | — | Scope to project | +| `--discussions` | flag | — | Include DiffNote snippets | +| `--no-follow-renames` | flag | — | Skip rename chain | +| `-n, --limit` | int | 20 | Max chains | + +**DB tables:** `mr_file_changes`, `merge_requests`, `issues`, `discussions`, `notes`, `entity_references` + +**Robot output:** +```json +{ + "data": { + "path": "src/auth/middleware.rs", + "resolved_paths": ["src/auth/middleware.rs", "src/auth.rs"], + "trace_chains": [ + { + "mr_iid": 99, "mr_title": "Refactor auth", "mr_state": "merged", + "mr_author": "jdoe", "change_type": "modified", + "merged_at_iso": "...", "web_url": "...", + "issues": [42], + "discussions": [ + { "discussion_id": 123, "author_username": "reviewer", + "body_snippet": "...", "path": "src/auth/middleware.rs" } + ] + } + ] + }, + "meta": { "tier": "api_only", "total_chains": 3, "renames_followed": 1 } +} +``` + +--- + +#### `related` + +Find semantically related entities via vector search. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Entity type (`issues`, `mrs`) or free text | +| `[IID]` | positional | — | Entity IID (required with entity type) | +| `-n, --limit` | int | 10 | Max results | +| `-p, --project` | string | — | Scope to project | + +**Two modes:** +- Entity mode: `related issues 42` — find entities similar to issue #42 +- Query mode: `related "auth flow"` — find entities matching free text + +**DB tables:** `documents`, `embeddings` (vec0), `projects` + +**Requires:** Ollama running (for query mode embedding) + +--- + +#### `drift` + +Detect discussion divergence from original intent. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Currently only `issues` | +| `` | positional | required | Entity IID | +| `--threshold` | f32 | 0.4 | Similarity threshold (0.0-1.0) | +| `-p, --project` | string | — | Scope to project | + +**DB tables:** `issues`, `discussions`, `notes`, `embeddings` + +**Requires:** Ollama running + +--- + +### 1.3 Data Pipeline Commands + +#### `sync` (Full Pipeline) + +Complete sync: ingest -> generate-docs -> embed. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--full` | flag | — | Full re-sync (reset cursors) | +| `-f, --force` | flag | — | Override stale lock | +| `--no-embed` | flag | — | Skip embedding | +| `--no-docs` | flag | — | Skip doc generation | +| `--no-events` | flag | — | Skip resource events | +| `--no-file-changes` | flag | — | Skip MR file changes | +| `--no-status` | flag | — | Skip work-item status enrichment | +| `--dry-run` | flag | — | Preview without changes | +| `-t, --timings` | flag | — | Show timing breakdown | +| `--lock` | flag | — | Acquire file lock | +| `--issue` | int[] | — | Surgically sync specific issues (repeatable) | +| `--mr` | int[] | — | Surgically sync specific MRs (repeatable) | +| `-p, --project` | string | — | Required with `--issue`/`--mr` | +| `--preflight-only` | flag | — | Validate without DB writes | + +**Stages:** Ingest -> GraphQL status enrichment -> Generate docs -> Embed + +#### `ingest` + +Fetch data from GitLab API only. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `[ENTITY]` | positional | — | `issues` or `mrs` (omit for all) | +| `-p, --project` | string | — | Filter to single project | +| `-f, --force` | flag | — | Override stale lock | +| `--full` | flag | — | Full re-sync | +| `--dry-run` | flag | — | Preview | + +#### `generate-docs` + +Create searchable documents from ingested data. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--full` | flag | — | Full rebuild | +| `-p, --project` | string | — | Single project rebuild | + +#### `embed` + +Generate vector embeddings via Ollama. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--full` | flag | — | Re-embed all | +| `--retry-failed` | flag | — | Retry failed embeddings | + +**Requires:** Ollama running with `nomic-embed-text` + +--- + +### 1.4 Diagnostic Commands + +#### `health` + +Quick pre-flight check. Exit 0 = healthy, exit 19 = unhealthy. + +**Checks:** config found, DB found, schema current. + +```json +{ + "data": { + "healthy": true, + "config_found": true, "db_found": true, + "schema_current": true, "schema_version": 28 + } +} +``` + +#### `auth` + +Verify GitLab authentication. + +**Checks:** token set, GitLab reachable, user identity. + +#### `doctor` + +Comprehensive environment check. + +**Checks:** config validity, token, GitLab connectivity, DB health, migration status, Ollama availability, model status. + +```json +{ + "data": { + "config": { "valid": true, "path": "..." }, + "token": { "set": true, "gitlab": { "reachable": true, "user": "jdoe" } }, + "database": { "exists": true, "version": 28, "tables": 25 }, + "ollama": { "available": true, "model_ready": true } + } +} +``` + +#### `status` (alias: `st`) + +Show sync state per project (last sync times, cursors). + +```json +{ + "data": { + "projects": [ + { "project_path": "group/repo", "last_synced_at": "...", + "document_count": 5000, "discussion_count": 2000 } + ] + } +} +``` + +#### `stats` (alias: `stat`) + +Document and index statistics with optional integrity checks. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--check` | flag | — | Run integrity checks | +| `--repair` | flag | — | Fix issues (implies `--check`) | +| `--dry-run` | flag | — | Preview repairs | + +```json +{ + "data": { + "documents": { "total": 61652, "issues": 5000, "mrs": 2000, "notes": 50000 }, + "embeddings": { "total": 80000, "synced": 79500, "pending": 500 }, + "fts": { "total_docs": 61652 }, + "queues": { "pending": 0, "in_progress": 0, "failed": 0 }, + "integrity": { "ok": true } + } +} +``` + +--- + +### 1.5 Setup Commands + +| Command | Purpose | +|---|---| +| `init` | Initialize config + DB (interactive or `--non-interactive`) | +| `token set` | Store GitLab token (interactive or `--token`) | +| `token show` | Display token (`--unmask` for full) | +| `cron install` | Schedule auto-sync (`--interval` minutes, default 8) | +| `cron uninstall` | Remove cron job | +| `cron status` | Check cron installation | +| `migrate` | Run pending DB migrations | + +### 1.6 Meta Commands + +| Command | Purpose | +|---|---| +| `version` | Show version string | +| `completions ` | Generate shell completions (bash/zsh/fish/powershell) | +| `robot-docs` | Machine-readable command manifest (`--brief` for smaller) | + +--- + +## 2. Shared Data Source Map + +Which DB tables power which commands. Higher overlap = more consolidation potential. + +| Table | Read By | +|---|---| +| `issues` | issues, me, who-workload, search, timeline, trace, count, stats | +| `merge_requests` | mrs, me, who-workload, search, timeline, trace, file-history, count, stats | +| `notes` | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history | +| `discussions` | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace | +| `entity_references` | trace, timeline | +| `mr_file_changes` | trace, file-history, who-overlap | +| `resource_state_events` | timeline, me-activity | +| `resource_label_events` | timeline | +| `resource_milestone_events` | timeline | +| `documents` + FTS | search, stats | +| `embeddings` | search, related, drift | +| `document_labels` | search | +| `document_paths` | search | +| `issue_labels` | issues, me | +| `mr_labels` | mrs, me | +| `mr_reviewers` | mrs, who-expert, who-workload | +| `issue_assignees` | issues, me | +| `sync_cursors` | status | +| `dirty_sources` | stats | + +--- + +## 3. Command Network Graph + +### 3.1 Data Flow (command A feeds into command B) + +``` + ┌─────────┐ + │ search │─────────────────────────────┐ + └────┬────┘ │ + │ iid │ topic + ┌────▼────┐ ┌────▼─────┐ + ┌─────│ issues │◄───────────────────────│ timeline │ + │ │ mrs │ (detail) └──────────┘ + │ └────┬────┘ ▲ + │ │ iid │ entity ref + │ ┌────▼────┐ ┌──────────────┐ │ + │ │ related │ │ file-history │───────┘ + │ │ drift │ └──────┬───────┘ + │ └─────────┘ │ MR iids + │ ┌────▼────┐ + │ │ trace │──── issues (linked) + │ └────┬────┘ + │ │ paths + │ ┌────▼────┐ + │ │ who │ + │ │ (expert)│ + │ └─────────┘ + │ + file paths ┌─────────┐ + │ │ me │──── issues, mrs (dashboard) + ▼ └─────────┘ + ┌──────────┐ ▲ + │ notes │ │ (same data) + └──────────┘ ┌────┴──────┐ + │who workload│ + └───────────┘ +``` + +### 3.2 Shared-Data Clusters + +Commands that read from the same primary tables form natural clusters: + +**Cluster A: Issue/MR entities** — `issues`, `mrs`, `me`, `who workload`, `count` +All read from `issues` + `merge_requests` with similar filter patterns. + +**Cluster B: Notes/discussions** — `notes`, `issues detail`, `mrs detail`, `who expert`, `who active`, `timeline` +All traverse the `discussions` -> `notes` join path. + +**Cluster C: File genealogy** — `trace`, `file-history`, `who overlap` +All use `mr_file_changes` with rename chain BFS. + +**Cluster D: Semantic/vector** — `search`, `related`, `drift` +All use `documents` + `embeddings` (require Ollama). + +**Cluster E: Diagnostics** — `health`, `auth`, `doctor`, `status`, `stats` +All check system state at various granularities. + +--- + +## 4. Overlap Analysis + +### 4.1 High Overlap (>70% functional duplication) + +#### `who workload` vs `me` + +| Dimension | `who @user` (workload) | `me --user @user` | +|---|---|---| +| Assigned issues | Yes | Yes | +| Authored MRs | Yes | Yes | +| Reviewing MRs | Yes | Yes | +| Attention state | No | **Yes** | +| Activity feed | No | **Yes** | +| Since-last-check inbox | No | **Yes** | +| Cross-project | Yes | **Yes** | + +**Verdict:** `who workload` is a strict subset of `me`. 85% overlap. + +#### `health` vs `doctor` + +| Check | `health` | `doctor` | +|---|---|---| +| Config found | Yes | Yes | +| DB exists | Yes | Yes | +| Schema current | Yes | Yes | +| Token valid | No | **Yes** | +| GitLab reachable | No | **Yes** | +| Ollama available | No | **Yes** | + +**Verdict:** `health` is a strict subset of `doctor`. 90% overlap (but `health` is fast pre-flight with different exit code semantics). + +#### `file-history` vs `trace` + +| Feature | `file-history` | `trace` | +|---|---|---| +| Find MRs for file | Yes | Yes | +| Rename chain BFS | Yes | Yes | +| DiffNote discussions | `--discussions` | `--discussions` | +| Follow to linked issues | No | **Yes** | +| `--merged` filter | **Yes** | No | + +**Verdict:** `trace` is a superset minus the `--merged` flag. 75% overlap. + +#### `related` (query mode) vs `search --mode semantic` + +| Feature | `related "text"` | `search "text" --mode semantic` | +|---|---|---| +| Vector similarity | Yes | Yes | +| FTS component | No | No (semantic mode) | +| Filters (labels, author, since) | No | **Yes** | +| Explain ranking | No | **Yes** | +| Field selection | No | **Yes** | + +**Verdict:** `related` query mode is `search --mode semantic` without filters. 80% overlap. + +### 4.2 Medium Overlap (40-70%) + +| Pair | Overlap % | Notes | +|---|---|---| +| `who expert` vs `who overlap` | ~50% | Both answer "who works on this file"; expert has decay scoring, overlap has raw counts | +| `timeline` vs `trace` | ~45% | Both follow `entity_references`; timeline is entity-centric, trace is file-centric | +| `auth` vs `doctor` | ~100% of auth | auth is fully contained within doctor | +| `count` vs `stats` | ~40% | Both answer "how much data" at different layers (entity vs document index) | +| `notes` vs `issues/mrs detail` | ~50% | Detail embeds notes inline; `notes` adds independent filtering | + +### 4.3 No Significant Overlap + +| Command | Reason | +|---|---| +| `drift` | Unique: semantic divergence detection | +| `timeline` | Unique: multi-entity chronological reconstruction | +| `search` (hybrid) | Unique: FTS + vector combined ranking | +| `me` (inbox) | Unique: cursor-based since-last-check | +| `who expert` | Unique: half-life decay scoring with signal types | +| `who reviews` | Unique: review pattern analysis | + +--- + +## 5. Agent Workflow Analysis + +### 5.1 Common Workflows with Round-Trip Counts + +#### Flow 1: "What should I work on?" — 4 round trips + +``` +me → dashboard overview +issues -p proj → detail on picked issue +trace src/relevant/file.rs → understand code context +who src/relevant/file.rs → find domain experts +``` + +#### Flow 2: "What happened with this feature?" — 3 round trips + +``` +search "feature name" → find relevant entities +timeline "feature name" → reconstruct chronological history +related issues 42 → discover connected work +``` + +#### Flow 3: "Why was this code changed?" — 3 round trips + +``` +trace src/file.rs → file -> MR -> issue chain +issues -p proj → full issue detail +timeline "issue:42" → full history with cross-refs +``` + +#### Flow 4: "Is the system healthy?" — 2-4 round trips + +``` +health → quick pre-flight +doctor → detailed diagnostics +status → sync state per project +stats → document/index health +``` + +#### Flow 5: "Who can review this?" — 2-3 round trips + +``` +who src/auth/ → find file experts +who @jdoe --reviews → check reviewer's patterns +``` + +#### Flow 6: "Find and understand an issue" — 4 round trips + +``` +search "query" → discover entities +issues → full detail with discussions +timeline "issue:42" → chronological context +related issues 42 → connected entities +``` + +### 5.2 Token Cost Profiles + +Measured typical response sizes in robot mode: + +| Command | Typical Tokens | Notes | +|---|---|---| +| `me` (full) | 2000-5000 | Scales with open items | +| `me --fields minimal` | 500-1500 | Good reduction | +| `issues` (list, n=50) | 1500-3000 | Labels inflate it | +| `issues ` (detail) | 1000-8000 | Discussions dominate | +| `timeline` (limit=100) | 2000-6000 | Events + evidence | +| `search` (n=20) | 1000-3000 | Snippets dominate | +| `who expert` | 300-800 | Compact | +| `trace` | 500-2000 | Depends on chain depth | +| `health` | ~100 | Very compact | +| `stats` | ~500 | Fixed structure | + +--- + +## 6. Proposals + +### 6.1 Command Consolidations + +#### A. Absorb `file-history` into `trace --shallow` + +**Rationale:** 75% code overlap. Both do rename chain BFS on `mr_file_changes`, both optionally include DiffNote discussions. `trace` just follows entity_references one step further. + +**Change:** +- `trace ` — full chain: file -> MR -> issue -> discussions (existing) +- `trace --shallow` — MR-only, no issue chain (replaces `file-history`) +- Move `--merged` flag from `file-history` to `trace` +- Deprecate `file-history` with alias redirect + +**Impact:** -1 command. No functionality lost. + +#### B. Absorb `auth` into `doctor` + +**Rationale:** `auth` checks token + GitLab connectivity. `doctor` checks the same plus DB + Ollama. `auth` is 100% contained within `doctor`. + +**Change:** +- `doctor` — full check (existing) +- `doctor --auth` — token + GitLab only (replaces `auth`) +- Keep `health` separate (fast pre-flight, different exit code contract) +- Deprecate `auth` with alias redirect + +**Impact:** -1 command. `health` stays because its ~50ms pre-flight with exit 0/19 contract is valuable for scripting. + +#### C. Remove `related` query-mode, keep entity-mode + +**Rationale:** `related "auth flow"` is functionally identical to `search "auth flow" --mode semantic` but with fewer filter options. + +**Change:** +- `related issues 42` — entity-seeded mode (keep, it's unique) +- `related "free text"` — print deprecation, suggest `search --mode semantic` +- Eventually remove query-mode entirely + +**Impact:** -1 mode. Entity-seeded mode stays because it seeds from a specific entity's embedding rather than generating a new one. + +#### D. Merge `who overlap` into `who expert` output + +**Rationale:** `who overlap` reports who touches a file (raw count). `who expert` reports who knows a file (scored). Overlap is strictly less information. + +**Change:** +- `who ` (expert) adds `touch_count` and `last_touch_at` fields to each expert row +- `who --overlap ` becomes an alias for `who --fields username,touch_count` +- Deprecate `--overlap` flag with redirect + +**Impact:** -1 sub-mode. Agents get both perspectives in one call. + +#### E. Merge `count` and `status` into `stats` + +**Rationale:** Three commands answer "how much data do I have?": +- `count issues` — entity counts with state breakdown +- `status` — sync cursor positions per project +- `stats` — document/index counts + integrity + +**Change:** +- `stats` — document/index health (existing) +- `stats --entities` — adds entity counts (replaces `count`) +- `stats --sync` — adds sync cursor positions (replaces `status`) +- `stats --all` — everything +- Bare `stats` keeps current behavior (documents + queues + integrity) +- Deprecate `count` and `status` with alias redirects + +**Impact:** -2 commands. Progressive disclosure via flags. + +#### Consolidation Summary + +| Before | After | Commands Removed | +|---|---|---| +| `file-history`, `trace` | `trace` (+ `--shallow`) | -1 | +| `auth`, `doctor` | `doctor` (+ `--auth`) | -1 | +| `related` query-mode | `search --mode semantic` | -1 mode | +| `who overlap`, `who expert` | `who expert` (+ touch_count) | -1 sub-mode | +| `count`, `status`, `stats` | `stats` (+ `--entities`, `--sync`) | -2 | + +**Total: 34 commands -> 29 commands** (5 fewer entry points for agents to learn). + +--- + +### 6.2 Robot-Mode Optimizations + +These are additive changes that reduce round trips and token waste without removing commands. + +#### A. `--include` flag for embedded sub-queries + +The single highest-impact optimization. Agents constantly fetch an entity and then immediately need related context. + +**Syntax:** +```bash +lore -J issues 42 -p proj --include timeline,related +lore -J mrs 99 -p proj --include timeline,trace +lore -J trace src/auth/ -p proj --include experts +lore -J me --include detail +``` + +**Response shape (embedded data uses `_` prefix):** +```json +{ + "ok": true, + "data": { + "iid": 42, "title": "Fix auth", "state": "opened", + "discussions": [...], + "_timeline": { + "event_count": 15, + "events": [...] + }, + "_related": { + "similar_entities": [...] + } + } +} +``` + +**Include matrix (which includes are valid on which commands):** + +| Base Command | Valid Includes | Default Limits | +|---|---|---| +| `issues ` | `timeline`, `related`, `trace` | 20 events, 5 related, 5 chains | +| `mrs ` | `timeline`, `related`, `file-changes` | 20 events, 5 related | +| `trace ` | `experts`, `timeline` | 5 experts, 20 events | +| `me` | `detail` (inline top-N item details) | 3 items detailed | +| `search` | `detail` (inline top-N result details) | 3 results detailed | + +**Round-trip savings:** + +| Workflow | Before | After | Savings | +|---|---|---|---| +| Understand an issue | 4 calls | 1 call | **75%** | +| Why was code changed | 3 calls | 1 call | **67%** | +| Find and understand | 4 calls | 2 calls | **50%** | + +**Implementation notes:** +- Each include runs its sub-query with reduced limits (configurable via `--include-limit`) +- Sub-query errors are non-fatal: returned as `"_timeline_error": "Ollama unavailable"` instead of failing the whole request +- `--fields minimal` applies to included data too +- Timing breakdown in meta: `"_timeline_ms": 45, "_related_ms": 120` + +#### B. `--batch` flag for multi-entity detail lookups + +After search/timeline, agents typically need detail on multiple entities: + +```bash +# Before: N round trips +lore -J issues 42 -p proj +lore -J issues 55 -p proj +lore -J issues 71 -p proj + +# After: 1 round trip +lore -J issues --batch 42,55,71 -p proj +``` + +**Response:** +```json +{ + "data": { + "results": [ + { "iid": 42, "title": "Fix auth", ... }, + { "iid": 55, "title": "Add SSO", ... }, + { "iid": 71, "title": "Token refresh", ... } + ], + "errors": [] + } +} +``` + +**Constraints:** +- Max 20 IIDs per batch +- Errors for individual items don't fail the batch +- Works with `--include` for maximum efficiency + +#### C. `--depth` control on `me` + +The `me` command returns everything by default, which is token-wasteful when an agent just wants to check "do I have work?". + +```bash +# Counts only (~100 tokens) +lore -J me --depth counts + +# Counts + titles (~400 tokens) +lore -J me --depth titles + +# Full (current behavior, 2000+ tokens) +lore -J me --depth full +``` + +**Depth levels:** + +| Level | Returns | Typical Tokens | +|---|---|---| +| `counts` | `summary` block only | ~100 | +| `titles` | summary + item lists (iid, title, attention_state only) | ~400 | +| `full` | Everything including discussions, activity, inbox | ~2000+ | + +#### D. Composite `context` command + +Purpose-built for the most common agent workflow: "give me everything I need to understand this entity." + +```bash +lore -J context issues 42 -p proj +lore -J context mrs 99 -p proj +``` + +**Equivalent to:** +```bash +lore -J issues 42 -p proj --include timeline,related +``` + +But with optimized defaults: +- Timeline limited to 20 most recent events +- Related limited to top 5 +- Discussions truncated after 5 threads +- Evidence notes capped at 3 + +**Response:** Same shape as `issues --include timeline,related` but with tighter defaults. + +**Rationale:** Rather than teaching agents the `--include` syntax, provide a single "give me full context" command. + +#### E. `--max-tokens` response budget + +Let the agent cap response size. The server truncates arrays, omits low-priority fields, and uses shorter representations to stay within budget. + +```bash +lore -J me --max-tokens 500 +lore -J timeline "feature" --max-tokens 1000 +lore -J context issues 42 --max-tokens 2000 +``` + +**Truncation strategy (in priority order):** +1. Apply `--fields minimal` if not already set +2. Reduce array lengths (newest/highest-score items survive) +3. Truncate string fields (description, snippets) +4. Omit null/empty fields +5. Drop included sub-queries (if using `--include`) + +**Meta includes truncation notice:** +```json +{ + "meta": { + "elapsed_ms": 50, + "truncated": true, + "original_tokens": 3500, + "budget_tokens": 1000, + "dropped": ["_related", "discussions[5:]"] + } +} +``` + +#### F. `--format tsv` for maximum token efficiency + +Beyond JSON, offer a TSV mode for list commands where structured data is simple: + +```bash +lore -J issues --format tsv --fields iid,title,state -n 10 +``` + +**Output:** +``` +iid title state +42 Fix auth opened +55 Add SSO opened +71 Token refresh closed +``` + +**Estimated savings:** ~60-70% fewer tokens vs JSON for list responses. + +**Applicable to:** `issues` (list), `mrs` (list), `notes` (list), `who expert`, `count`. + +**Not applicable to:** Detail views, timeline, search (nested structures). + +--- + +### 6.3 Robot-Mode Optimization Impact Matrix + +| Optimization | Effort | Round-Trip Savings | Token Savings | Breaking? | +|---|---|---|---|---| +| `--include` flag | High | **50-75%** | Moderate | No (additive) | +| `--batch` flag | Medium | **N-1 per batch** | Moderate | No (additive) | +| `--depth` on `me` | Low | None | **60-80%** | No (additive) | +| `context` command | Medium | **67-75%** | Moderate | No (additive) | +| `--max-tokens` | High | None | **Variable, up to 80%** | No (additive) | +| `--format tsv` | Medium | None | **60-70% on lists** | No (additive) | + +--- + +## 7. Priority Ranking + +Ordered by impact on robot-mode efficiency (round-trip reduction x token savings x implementation ease): + +| Priority | Proposal | Category | Effort | Impact | +|---|---|---|---|---| +| **P0** | `--include` flag | Robot optimization | High | Eliminates 2-3 round trips per workflow | +| **P0** | `--depth` on `me` | Robot optimization | Low | 60-80% token reduction on most-used command | +| **P1** | `--batch` for detail views | Robot optimization | Medium | Eliminates N+1 after search/timeline | +| **P1** | Absorb `file-history` into `trace` | Consolidation | Low | Cleaner surface, shared code | +| **P1** | Merge `who overlap` into `who expert` | Consolidation | Low | -1 round trip in review flows | +| **P2** | `context` composite command | Robot optimization | Medium | Single entry point for understanding entities | +| **P2** | Merge `count`+`status` into `stats` | Consolidation | Medium | -2 commands, progressive disclosure | +| **P2** | Absorb `auth` into `doctor` | Consolidation | Low | -1 command | +| **P2** | Remove `related` query-mode | Consolidation | Low | -1 confusing choice | +| **P3** | `--max-tokens` budget | Robot optimization | High | Flexible but hard to implement well | +| **P3** | `--format tsv` | Robot optimization | Medium | High savings but limited applicability | + +--- + +## 8. Appendix: Robot Output Envelope + +All robot-mode responses follow: + +```json +{ + "ok": true, + "data": { /* command-specific */ }, + "meta": { "elapsed_ms": 42 } +} +``` + +Errors (to stderr): +```json +{ + "error": { + "code": "CONFIG_NOT_FOUND", + "message": "Configuration file not found", + "suggestion": "Run 'lore init'", + "actions": ["lore init"] + } +} +``` + +Exit codes: 0 (success), 1 (internal), 2 (usage), 3 (config invalid), 4 (token missing), 5 (auth failed), 6 (resource not found), 7 (rate limited), 8 (network), 9 (DB locked), 10 (DB error), 11 (migration), 12 (I/O), 13 (transform), 14 (Ollama unavailable), 15 (model missing), 16 (embedding failed), 17 (not found), 18 (ambiguous match), 19 (health failed), 20 (config not found). + +--- + +## 9. Appendix: Field Selection Presets + +The `--fields` flag supports both presets and custom field lists: + +```bash +lore -J issues --fields minimal # Preset +lore -J mrs --fields iid,title,state,draft # Custom +``` + +| Command | Minimal Preset Fields | +|---|---| +| `issues` (list) | `iid`, `title`, `state`, `updated_at_iso` | +| `mrs` (list) | `iid`, `title`, `state`, `updated_at_iso` | +| `notes` (list) | `id`, `author_username`, `body`, `created_at_iso` | +| `search` | `document_id`, `title`, `source_type`, `score` | +| `timeline` | `timestamp`, `type`, `entity_iid`, `detail` | +| `who expert` | `username`, `score` | +| `who workload` | `iid`, `title`, `state` | +| `who reviews` | `name`, `count`, `percentage` | +| `who active` | `entity_type`, `iid`, `title`, `participants` | +| `who overlap` | `username`, `touch_count` | +| `me` (items) | `iid`, `title`, `attention_state`, `updated_at_iso` | +| `me` (activity) | `timestamp_iso`, `event_type`, `entity_iid`, `actor` | diff --git a/docs/command-surface-analysis/00-overview.md b/docs/command-surface-analysis/00-overview.md new file mode 100644 index 0000000..40a4f05 --- /dev/null +++ b/docs/command-surface-analysis/00-overview.md @@ -0,0 +1,92 @@ +# Lore Command Surface Analysis — Overview + +**Date:** 2026-02-26 +**Version:** v0.9.1 (439c20e) + +--- + +## Purpose + +Deep analysis of the full `lore` CLI command surface: what each command does, how commands overlap, how they connect in agent workflows, and where consolidation and robot-mode optimization can reduce round trips and token waste. + +## Document Map + +| File | Contents | When to Read | +|---|---|---| +| **00-overview.md** | This file. Summary, inventory, priorities. | Always read first. | +| [01-entity-commands.md](01-entity-commands.md) | `issues`, `mrs`, `notes`, `search`, `count` — flags, DB tables, robot schemas | Need command reference for entity queries | +| [02-intelligence-commands.md](02-intelligence-commands.md) | `who`, `timeline`, `me`, `file-history`, `trace`, `related`, `drift` | Need command reference for intelligence/analysis | +| [03-pipeline-and-infra.md](03-pipeline-and-infra.md) | `sync`, `ingest`, `generate-docs`, `embed`, diagnostics, setup | Need command reference for data management | +| [04-data-flow.md](04-data-flow.md) | Shared data source map, command network graph, clusters | Understanding how commands interconnect | +| [05-overlap-analysis.md](05-overlap-analysis.md) | Quantified overlap percentages for every command pair | Evaluating what to consolidate | +| [06-agent-workflows.md](06-agent-workflows.md) | Common agent flows, round-trip costs, token profiles | Understanding inefficiency pain points | +| [07-consolidation-proposals.md](07-consolidation-proposals.md) | 5 proposals to reduce 34 commands to 29 | Planning command surface changes | +| [08-robot-optimization-proposals.md](08-robot-optimization-proposals.md) | 6 proposals for `--include`, `--batch`, `--depth`, etc. | Planning robot-mode improvements | +| [09-appendices.md](09-appendices.md) | Robot output envelope, field presets, exit codes | Reference material | + +--- + +## Command Inventory (34 commands) + +| Category | Commands | Count | +|---|---|---| +| Entity Query | `issues`, `mrs`, `notes`, `search`, `count` | 5 | +| Intelligence | `who` (5 modes), `timeline`, `related`, `drift`, `me`, `file-history`, `trace` | 7 (11 with who sub-modes) | +| Data Pipeline | `sync`, `ingest`, `generate-docs`, `embed` | 4 | +| Diagnostics | `health`, `auth`, `doctor`, `status`, `stats` | 5 | +| Setup | `init`, `token`, `cron`, `migrate` | 4 | +| Meta | `version`, `completions`, `robot-docs` | 3 | + +--- + +## Key Findings + +### High-Overlap Pairs + +| Pair | Overlap | Recommendation | +|---|---|---| +| `who workload` vs `me` | ~85% | Workload is a strict subset of me | +| `health` vs `doctor` | ~90% | Health is a strict subset of doctor | +| `file-history` vs `trace` | ~75% | Trace is a superset minus `--merged` | +| `related` query-mode vs `search --mode semantic` | ~80% | Related query-mode is search without filters | +| `auth` vs `doctor` | ~100% of auth | Auth is fully contained within doctor | + +### Agent Workflow Pain Points + +| Workflow | Current Round Trips | With Optimizations | +|---|---|---| +| "Understand this issue" | 4 calls | 1 call (`--include`) | +| "Why was code changed?" | 3 calls | 1 call (`--include`) | +| "What should I work on?" | 4 calls | 2 calls | +| "Find and understand" | 4 calls | 2 calls | +| "Is system healthy?" | 2-4 calls | 1 call | + +--- + +## Priority Ranking + +| Pri | Proposal | Category | Effort | Impact | +|---|---|---|---|---| +| **P0** | `--include` flag on detail commands | Robot optimization | High | Eliminates 2-3 round trips per workflow | +| **P0** | `--depth` on `me` command | Robot optimization | Low | 60-80% token reduction on most-used command | +| **P1** | `--batch` for detail views | Robot optimization | Medium | Eliminates N+1 after search/timeline | +| **P1** | Absorb `file-history` into `trace` | Consolidation | Low | Cleaner surface, shared code | +| **P1** | Merge `who overlap` into `who expert` | Consolidation | Low | -1 round trip in review flows | +| **P2** | `context` composite command | Robot optimization | Medium | Single entry point for entity understanding | +| **P2** | Merge `count`+`status` into `stats` | Consolidation | Medium | -2 commands, progressive disclosure | +| **P2** | Absorb `auth` into `doctor` | Consolidation | Low | -1 command | +| **P2** | Remove `related` query-mode | Consolidation | Low | -1 confusing choice | +| **P3** | `--max-tokens` budget | Robot optimization | High | Flexible but complex to implement | +| **P3** | `--format tsv` | Robot optimization | Medium | High savings, limited applicability | + +### Consolidation Summary + +| Before | After | Removed | +|---|---|---| +| `file-history` + `trace` | `trace` (+ `--shallow`) | -1 | +| `auth` + `doctor` | `doctor` (+ `--auth`) | -1 | +| `related` query-mode | `search --mode semantic` | -1 mode | +| `who overlap` + `who expert` | `who expert` (+ touch_count) | -1 sub-mode | +| `count` + `status` + `stats` | `stats` (+ `--entities`, `--sync`) | -2 | + +**Total: 34 commands -> 29 commands** diff --git a/docs/command-surface-analysis/01-entity-commands.md b/docs/command-surface-analysis/01-entity-commands.md new file mode 100644 index 0000000..aabeec9 --- /dev/null +++ b/docs/command-surface-analysis/01-entity-commands.md @@ -0,0 +1,308 @@ +# Entity Query Commands + +Reference for: `issues`, `mrs`, `notes`, `search`, `count` + +--- + +## `issues` (alias: `issue`) + +List or show issues from local database. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `[IID]` | positional | — | Omit to list, provide to show detail | +| `-n, --limit` | int | 50 | Max results | +| `--fields` | string | — | Select output columns (preset: `minimal`) | +| `-s, --state` | enum | — | `opened\|closed\|all` | +| `-p, --project` | string | — | Filter by project (fuzzy) | +| `-a, --author` | string | — | Filter by author username | +| `-A, --assignee` | string | — | Filter by assignee username | +| `-l, --label` | string[] | — | Filter by labels (AND logic, repeatable) | +| `-m, --milestone` | string | — | Filter by milestone title | +| `--status` | string[] | — | Filter by work-item status (COLLATE NOCASE, OR logic) | +| `--since` | duration/date | — | Filter by created date (`7d`, `2w`, `YYYY-MM-DD`) | +| `--due-before` | date | — | Filter by due date | +| `--has-due` | flag | — | Show only issues with due dates | +| `--sort` | enum | `updated` | `updated\|created\|iid` | +| `--asc` | flag | — | Sort ascending | +| `-o, --open` | flag | — | Open first match in browser | + +**DB tables:** `issues`, `projects`, `issue_assignees`, `issue_labels`, `labels` +**Detail mode adds:** `discussions`, `notes`, `entity_references` (closing MRs) + +### Robot Output (list mode) + +```json +{ + "ok": true, + "data": { + "issues": [ + { + "iid": 42, "title": "Fix auth", "state": "opened", + "author_username": "jdoe", "labels": ["backend"], + "assignees": ["jdoe"], "discussion_count": 3, + "unresolved_count": 1, "created_at_iso": "...", + "updated_at_iso": "...", "web_url": "...", + "project_path": "group/repo", + "status_name": "In progress" + } + ], + "total_count": 150, "showing": 50 + }, + "meta": { "elapsed_ms": 40, "available_statuses": ["Open", "In progress", "Closed"] } +} +``` + +### Robot Output (detail mode — `issues `) + +```json +{ + "ok": true, + "data": { + "id": 12345, "iid": 42, "title": "Fix auth", + "description": "Full markdown body...", + "state": "opened", "author_username": "jdoe", + "created_at": "...", "updated_at": "...", "closed_at": null, + "confidential": false, "web_url": "...", "project_path": "group/repo", + "references_full": "group/repo#42", + "labels": ["backend"], "assignees": ["jdoe"], + "due_date": null, "milestone": null, + "user_notes_count": 5, "merge_requests_count": 1, + "closing_merge_requests": [ + { "iid": 99, "title": "Refactor auth", "state": "merged", "web_url": "..." } + ], + "discussions": [ + { + "notes": [ + { "author_username": "jdoe", "body": "...", "created_at": "...", "is_system": false } + ], + "individual_note": false + } + ], + "status_name": "In progress", "status_color": "#1068bf" + } +} +``` + +**Minimal preset:** `iid`, `title`, `state`, `updated_at_iso` + +--- + +## `mrs` (aliases: `mr`, `merge-request`, `merge-requests`) + +List or show merge requests. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `[IID]` | positional | — | Omit to list, provide to show detail | +| `-n, --limit` | int | 50 | Max results | +| `--fields` | string | — | Select output columns (preset: `minimal`) | +| `-s, --state` | enum | — | `opened\|merged\|closed\|locked\|all` | +| `-p, --project` | string | — | Filter by project | +| `-a, --author` | string | — | Filter by author | +| `-A, --assignee` | string | — | Filter by assignee | +| `-r, --reviewer` | string | — | Filter by reviewer | +| `-l, --label` | string[] | — | Filter by labels (AND) | +| `--since` | duration/date | — | Filter by created date | +| `-d, --draft` | flag | — | Draft MRs only | +| `-D, --no-draft` | flag | — | Exclude drafts | +| `--target` | string | — | Filter by target branch | +| `--source` | string | — | Filter by source branch | +| `--sort` | enum | `updated` | `updated\|created\|iid` | +| `--asc` | flag | — | Sort ascending | +| `-o, --open` | flag | — | Open in browser | + +**DB tables:** `merge_requests`, `projects`, `mr_reviewers`, `mr_labels`, `labels`, `mr_assignees` +**Detail mode adds:** `discussions`, `notes`, `mr_diffs` + +### Robot Output (list mode) + +```json +{ + "ok": true, + "data": { + "mrs": [ + { + "iid": 99, "title": "Refactor auth", "state": "merged", + "draft": false, "author_username": "jdoe", + "source_branch": "feat/auth", "target_branch": "main", + "labels": ["backend"], "assignees": ["jdoe"], "reviewers": ["reviewer"], + "discussion_count": 5, "unresolved_count": 0, + "created_at_iso": "...", "updated_at_iso": "...", + "web_url": "...", "project_path": "group/repo" + } + ], + "total_count": 500, "showing": 50 + } +} +``` + +### Robot Output (detail mode — `mrs `) + +```json +{ + "ok": true, + "data": { + "id": 67890, "iid": 99, "title": "Refactor auth", + "description": "Full markdown body...", + "state": "merged", "draft": false, "author_username": "jdoe", + "source_branch": "feat/auth", "target_branch": "main", + "created_at": "...", "updated_at": "...", + "merged_at": "...", "closed_at": null, + "web_url": "...", "project_path": "group/repo", + "labels": ["backend"], "assignees": ["jdoe"], "reviewers": ["reviewer"], + "discussions": [ + { + "notes": [ + { + "author_username": "reviewer", "body": "...", + "created_at": "...", "is_system": false, + "position": { "new_path": "src/auth.rs", "new_line": 42 } + } + ], + "individual_note": false + } + ] + } +} +``` + +**Minimal preset:** `iid`, `title`, `state`, `updated_at_iso` + +--- + +## `notes` (alias: `note`) + +List discussion notes/comments with fine-grained filters. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `-n, --limit` | int | 50 | Max results | +| `--fields` | string | — | Preset: `minimal` | +| `-a, --author` | string | — | Filter by author | +| `--note-type` | enum | — | `DiffNote\|DiscussionNote` | +| `--contains` | string | — | Body text substring filter | +| `--note-id` | int | — | Internal note ID | +| `--gitlab-note-id` | int | — | GitLab note ID | +| `--discussion-id` | string | — | Discussion ID filter | +| `--include-system` | flag | — | Include system notes | +| `--for-issue` | int | — | Notes on specific issue (requires `-p`) | +| `--for-mr` | int | — | Notes on specific MR (requires `-p`) | +| `-p, --project` | string | — | Scope to project | +| `--since` | duration/date | — | Created after | +| `--until` | date | — | Created before (inclusive) | +| `--path` | string | — | File path filter (exact or prefix with `/`) | +| `--resolution` | enum | — | `any\|unresolved\|resolved` | +| `--sort` | enum | `created` | `created\|updated` | +| `--asc` | flag | — | Sort ascending | +| `--open` | flag | — | Open in browser | + +**DB tables:** `notes`, `discussions`, `projects`, `issues`, `merge_requests` + +### Robot Output + +```json +{ + "ok": true, + "data": { + "notes": [ + { + "id": 1234, "gitlab_id": 56789, + "author_username": "reviewer", "body": "...", + "note_type": "DiffNote", "is_system": false, + "created_at_iso": "...", "updated_at_iso": "...", + "position_new_path": "src/auth.rs", "position_new_line": 42, + "resolvable": true, "resolved": false, + "noteable_type": "MergeRequest", "parent_iid": 99, + "parent_title": "Refactor auth", "project_path": "group/repo" + } + ], + "total_count": 1000, "showing": 50 + } +} +``` + +**Minimal preset:** `id`, `author_username`, `body`, `created_at_iso` + +--- + +## `search` (aliases: `find`, `query`) + +Semantic + full-text search across indexed documents. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Search query string | +| `--mode` | enum | `hybrid` | `lexical\|hybrid\|semantic` | +| `--type` | enum | — | `issue\|mr\|discussion\|note` | +| `--author` | string | — | Filter by author | +| `-p, --project` | string | — | Scope to project | +| `--label` | string[] | — | Filter by labels (AND) | +| `--path` | string | — | File path filter | +| `--since` | duration/date | — | Created after | +| `--updated-since` | duration/date | — | Updated after | +| `-n, --limit` | int | 20 | Max results (max: 100) | +| `--fields` | string | — | Preset: `minimal` | +| `--explain` | flag | — | Show ranking breakdown | +| `--fts-mode` | enum | `safe` | `safe\|raw` | + +**DB tables:** `documents`, `documents_fts` (FTS5), `embeddings` (vec0), `document_labels`, `document_paths`, `projects` + +**Search modes:** +- **lexical** — FTS5 with BM25 ranking (fastest, no Ollama needed) +- **hybrid** — RRF combination of lexical + semantic (default) +- **semantic** — Vector similarity only (requires Ollama) + +### Robot Output + +```json +{ + "ok": true, + "data": { + "query": "authentication bug", + "mode": "hybrid", + "total_results": 15, + "results": [ + { + "document_id": 1234, "source_type": "issue", + "title": "Fix SSO auth", "url": "...", + "author": "jdoe", "project_path": "group/repo", + "labels": ["auth"], "paths": ["src/auth/"], + "snippet": "...matching text...", + "score": 0.85, + "explain": { "vector_rank": 2, "fts_rank": 1, "rrf_score": 0.85 } + } + ], + "warnings": [] + } +} +``` + +**Minimal preset:** `document_id`, `title`, `source_type`, `score` + +--- + +## `count` + +Count entities in local database. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | `issues\|mrs\|discussions\|notes\|events\|references` | +| `-f, --for` | enum | — | Parent type: `issue\|mr` | + +**DB tables:** Conditional aggregation on entity tables + +### Robot Output + +```json +{ + "ok": true, + "data": { + "entity": "merge_requests", + "count": 1234, + "system_excluded": 5000, + "breakdown": { "opened": 100, "closed": 50, "merged": 1084 } + } +} +``` diff --git a/docs/command-surface-analysis/02-intelligence-commands.md b/docs/command-surface-analysis/02-intelligence-commands.md new file mode 100644 index 0000000..bc78f70 --- /dev/null +++ b/docs/command-surface-analysis/02-intelligence-commands.md @@ -0,0 +1,452 @@ +# Intelligence Commands + +Reference for: `who`, `timeline`, `me`, `file-history`, `trace`, `related`, `drift` + +--- + +## `who` (People Intelligence) + +Five sub-modes, dispatched by argument shape. + +| Mode | Trigger | Purpose | +|---|---|---| +| **expert** | `who ` or `who --path ` | Who knows about a code area? | +| **workload** | `who @username` | What is this person working on? | +| **reviews** | `who @username --reviews` | Review pattern analysis | +| **active** | `who --active` | Unresolved discussions needing attention | +| **overlap** | `who --overlap ` | Who else touches these files? | + +### Shared Flags + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `-p, --project` | string | — | Scope to project | +| `-n, --limit` | int | varies | Max results (1-500) | +| `--fields` | string | — | Preset: `minimal` | +| `--since` | duration/date | — | Time window | +| `--include-bots` | flag | — | Include bot users | +| `--include-closed` | flag | — | Include closed issues/MRs | +| `--all-history` | flag | — | Query all history | + +### Expert-Only Flags + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--detail` | flag | — | Per-MR breakdown | +| `--as-of` | date/duration | — | Score at point in time | +| `--explain-score` | flag | — | Score breakdown | + +### DB Tables by Mode + +| Mode | Primary Tables | +|---|---| +| expert | `notes` (INDEXED BY idx_notes_diffnote_path_created), `merge_requests`, `mr_reviewers` | +| workload | `issues`, `merge_requests`, `mr_reviewers` | +| reviews | `merge_requests`, `discussions`, `notes` | +| active | `discussions`, `notes`, `issues`, `merge_requests` | +| overlap | `notes`, `mr_file_changes`, `merge_requests` | + +### Robot Output (expert) + +```json +{ + "ok": true, + "data": { + "mode": "expert", + "input": { "target": "src/auth/", "path": "src/auth/" }, + "resolved_input": { "mode": "expert", "project_id": 1, "project_path": "group/repo" }, + "result": { + "experts": [ + { + "username": "jdoe", "score": 42.5, + "detail": { "mr_ids_author": [99, 101], "mr_ids_reviewer": [88] } + } + ] + } + } +} +``` + +### Robot Output (workload) + +```json +{ + "data": { + "mode": "workload", + "result": { + "assigned_issues": [{ "iid": 42, "title": "Fix auth", "state": "opened" }], + "authored_mrs": [{ "iid": 99, "title": "Refactor auth", "state": "merged" }], + "review_mrs": [{ "iid": 88, "title": "Add SSO", "state": "opened" }] + } + } +} +``` + +### Robot Output (reviews) + +```json +{ + "data": { + "mode": "reviews", + "result": { + "categories": [ + { + "category": "approval_rate", + "reviewers": [{ "name": "jdoe", "count": 15, "percentage": 85.0 }] + } + ] + } + } +} +``` + +### Robot Output (active) + +```json +{ + "data": { + "mode": "active", + "result": { + "discussions": [ + { "entity_type": "mr", "iid": 99, "title": "Refactor auth", "participants": ["jdoe", "reviewer"] } + ] + } + } +} +``` + +### Robot Output (overlap) + +```json +{ + "data": { + "mode": "overlap", + "result": { + "users": [{ "username": "jdoe", "touch_count": 15 }] + } + } +} +``` + +### Minimal Presets + +| Mode | Fields | +|---|---| +| expert | `username`, `score` | +| workload | `iid`, `title`, `state` | +| reviews | `name`, `count`, `percentage` | +| active | `entity_type`, `iid`, `title`, `participants` | +| overlap | `username`, `touch_count` | + +--- + +## `timeline` + +Reconstruct chronological event history for a topic/entity with cross-reference expansion. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Search text or entity ref (`issue:42`, `mr:99`) | +| `-p, --project` | string | — | Scope to project | +| `--since` | duration/date | — | Filter events after | +| `--depth` | int | 1 | Cross-ref expansion depth (0=none) | +| `--no-mentions` | flag | — | Skip "mentioned" edges, keep "closes"/"related" | +| `-n, --limit` | int | 100 | Max events | +| `--fields` | string | — | Preset: `minimal` | +| `--max-seeds` | int | 10 | Max seed entities from search | +| `--max-entities` | int | 50 | Max expanded entities | +| `--max-evidence` | int | 10 | Max evidence notes | + +**Pipeline:** SEED -> HYDRATE -> EXPAND -> COLLECT -> RENDER + +**DB tables:** `issues`, `merge_requests`, `discussions`, `notes`, `entity_references`, `resource_state_events`, `resource_label_events`, `resource_milestone_events`, `documents` (for search seeding) + +### Robot Output + +```json +{ + "ok": true, + "data": { + "query": "authentication", "event_count": 25, + "seed_entities": [{ "type": "issue", "iid": 42, "project": "group/repo" }], + "expanded_entities": [ + { + "type": "mr", "iid": 99, "project": "group/repo", "depth": 1, + "via": { + "from": { "type": "issue", "iid": 42 }, + "reference_type": "closes" + } + } + ], + "unresolved_references": [ + { + "source": { "type": "issue", "iid": 42, "project": "group/repo" }, + "target_type": "mr", "target_iid": 200, "reference_type": "mentioned" + } + ], + "events": [ + { + "timestamp": "2026-01-15T10:30:00Z", + "entity_type": "issue", "entity_iid": 42, "project": "group/repo", + "event_type": "state_changed", "summary": "Reopened", + "actor": "jdoe", "is_seed": true, + "evidence_notes": [{ "author": "jdoe", "snippet": "..." }] + } + ] + }, + "meta": { + "elapsed_ms": 150, "search_mode": "fts", + "expansion_depth": 1, "include_mentions": true, + "total_entities": 5, "total_events": 25, + "evidence_notes_included": 8, "discussion_threads_included": 3, + "unresolved_references": 1, "showing": 25 + } +} +``` + +**Minimal preset:** `timestamp`, `type`, `entity_iid`, `detail` + +--- + +## `me` (Personal Dashboard) + +Personal work dashboard with issues, MRs, activity, and since-last-check inbox. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--issues` | flag | — | Open issues section only | +| `--mrs` | flag | — | MRs section only | +| `--activity` | flag | — | Activity feed only | +| `--since` | duration/date | `30d` | Activity window | +| `-p, --project` | string | — | Scope to one project | +| `--all` | flag | — | All synced projects | +| `--user` | string | — | Override configured username | +| `--fields` | string | — | Preset: `minimal` | +| `--reset-cursor` | flag | — | Clear since-last-check cursor | + +**Sections (no flags = all):** Issues, MRs authored, MRs reviewing, Activity, Inbox + +**DB tables:** `issues`, `merge_requests`, `resource_state_events`, `projects`, `issue_labels`, `mr_labels` + +### Robot Output + +```json +{ + "ok": true, + "data": { + "username": "jdoe", + "summary": { + "project_count": 3, "open_issue_count": 5, + "authored_mr_count": 2, "reviewing_mr_count": 1, + "needs_attention_count": 3 + }, + "since_last_check": { + "cursor_iso": "2026-02-25T18:00:00Z", + "total_event_count": 8, + "groups": [ + { + "entity_type": "issue", "entity_iid": 42, + "entity_title": "Fix auth", "project": "group/repo", + "events": [ + { "timestamp_iso": "...", "event_type": "comment", + "actor": "reviewer", "summary": "New comment" } + ] + } + ] + }, + "open_issues": [ + { + "project": "group/repo", "iid": 42, "title": "Fix auth", + "state": "opened", "attention_state": "needs_attention", + "status_name": "In progress", "labels": ["auth"], + "updated_at_iso": "..." + } + ], + "open_mrs_authored": [ + { + "project": "group/repo", "iid": 99, "title": "Refactor auth", + "state": "opened", "attention_state": "needs_attention", + "draft": false, "labels": ["backend"], "updated_at_iso": "..." + } + ], + "reviewing_mrs": [], + "activity": [ + { + "timestamp_iso": "...", "event_type": "state_changed", + "entity_type": "issue", "entity_iid": 42, "project": "group/repo", + "actor": "jdoe", "is_own": true, "summary": "Closed" + } + ] + } +} +``` + +**Minimal presets:** Items: `iid, title, attention_state, updated_at_iso` | Activity: `timestamp_iso, event_type, entity_iid, actor` + +--- + +## `file-history` + +Show which MRs touched a file, with linked discussions. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | File path to trace | +| `-p, --project` | string | — | Scope to project | +| `--discussions` | flag | — | Include DiffNote snippets | +| `--no-follow-renames` | flag | — | Skip rename chain resolution | +| `--merged` | flag | — | Only merged MRs | +| `-n, --limit` | int | 50 | Max MRs | + +**DB tables:** `mr_file_changes`, `merge_requests`, `notes` (DiffNotes), `projects` + +### Robot Output + +```json +{ + "ok": true, + "data": { + "path": "src/auth/middleware.rs", + "rename_chain": [ + { "previous_path": "src/auth.rs", "mr_iid": 55, "merged_at": "..." } + ], + "merge_requests": [ + { + "iid": 99, "title": "Refactor auth", "state": "merged", + "author": "jdoe", "merged_at": "...", "change_type": "modified" + } + ], + "discussions": [ + { + "discussion_id": 123, "mr_iid": 99, "author": "reviewer", + "body_snippet": "...", "path": "src/auth/middleware.rs" + } + ] + }, + "meta": { "elapsed_ms": 30, "total_mrs": 5, "renames_followed": true } +} +``` + +--- + +## `trace` + +File -> MR -> issue -> discussion chain to understand why code was introduced. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | File path (future: `:line` suffix) | +| `-p, --project` | string | — | Scope to project | +| `--discussions` | flag | — | Include DiffNote snippets | +| `--no-follow-renames` | flag | — | Skip rename chain | +| `-n, --limit` | int | 20 | Max chains | + +**DB tables:** `mr_file_changes`, `merge_requests`, `issues`, `discussions`, `notes`, `entity_references` + +### Robot Output + +```json +{ + "ok": true, + "data": { + "path": "src/auth/middleware.rs", + "resolved_paths": ["src/auth/middleware.rs", "src/auth.rs"], + "trace_chains": [ + { + "mr_iid": 99, "mr_title": "Refactor auth", "mr_state": "merged", + "mr_author": "jdoe", "change_type": "modified", + "merged_at_iso": "...", "web_url": "...", + "issues": [42], + "discussions": [ + { + "discussion_id": 123, "author_username": "reviewer", + "body_snippet": "...", "path": "src/auth/middleware.rs" + } + ] + } + ] + }, + "meta": { "tier": "api_only", "total_chains": 3, "renames_followed": 1 } +} +``` + +--- + +## `related` + +Find semantically related entities via vector search. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Entity type (`issues`, `mrs`) or free text | +| `[IID]` | positional | — | Entity IID (required with entity type) | +| `-n, --limit` | int | 10 | Max results | +| `-p, --project` | string | — | Scope to project | + +**Two modes:** +- **Entity mode:** `related issues 42` — find entities similar to issue #42 +- **Query mode:** `related "auth flow"` — find entities matching free text + +**DB tables:** `documents`, `embeddings` (vec0), `projects` + +**Requires:** Ollama running (for query mode embedding) + +### Robot Output (entity mode) + +```json +{ + "ok": true, + "data": { + "query_entity_type": "issue", + "query_entity_iid": 42, + "query_entity_title": "Fix SSO authentication", + "similar_entities": [ + { + "entity_type": "mr", "entity_iid": 99, + "entity_title": "Refactor auth module", + "project_path": "group/repo", "state": "merged", + "similarity_score": 0.87, + "shared_labels": ["auth"], "shared_authors": ["jdoe"] + } + ] + } +} +``` + +--- + +## `drift` + +Detect discussion divergence from original intent. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `` | positional | required | Currently only `issues` | +| `` | positional | required | Entity IID | +| `--threshold` | f32 | 0.4 | Similarity threshold (0.0-1.0) | +| `-p, --project` | string | — | Scope to project | + +**DB tables:** `issues`, `discussions`, `notes`, `embeddings` + +**Requires:** Ollama running + +### Robot Output + +```json +{ + "ok": true, + "data": { + "entity_type": "issue", "entity_iid": 42, + "total_notes": 15, + "detected_drift": true, + "drift_point": { + "note_index": 8, "similarity": 0.32, + "author": "someone", "created_at": "..." + }, + "similarity_curve": [ + { "note_index": 0, "similarity": 0.95, "author": "jdoe", "created_at": "..." }, + { "note_index": 1, "similarity": 0.88, "author": "reviewer", "created_at": "..." } + ] + } +} +``` diff --git a/docs/command-surface-analysis/03-pipeline-and-infra.md b/docs/command-surface-analysis/03-pipeline-and-infra.md new file mode 100644 index 0000000..cf52d85 --- /dev/null +++ b/docs/command-surface-analysis/03-pipeline-and-infra.md @@ -0,0 +1,210 @@ +# Pipeline & Infrastructure Commands + +Reference for: `sync`, `ingest`, `generate-docs`, `embed`, `health`, `auth`, `doctor`, `status`, `stats`, `init`, `token`, `cron`, `migrate`, `version`, `completions`, `robot-docs` + +--- + +## Data Pipeline + +### `sync` (Full Pipeline) + +Complete sync: ingest -> generate-docs -> embed. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--full` | flag | — | Full re-sync (reset cursors) | +| `-f, --force` | flag | — | Override stale lock | +| `--no-embed` | flag | — | Skip embedding | +| `--no-docs` | flag | — | Skip doc generation | +| `--no-events` | flag | — | Skip resource events | +| `--no-file-changes` | flag | — | Skip MR file changes | +| `--no-status` | flag | — | Skip work-item status enrichment | +| `--dry-run` | flag | — | Preview without changes | +| `-t, --timings` | flag | — | Show timing breakdown | +| `--lock` | flag | — | Acquire file lock | +| `--issue` | int[] | — | Surgically sync specific issues (repeatable) | +| `--mr` | int[] | — | Surgically sync specific MRs (repeatable) | +| `-p, --project` | string | — | Required with `--issue`/`--mr` | +| `--preflight-only` | flag | — | Validate without DB writes | + +**Stages:** GitLab REST ingest -> GraphQL status enrichment -> Document generation -> Ollama embedding + +**Surgical sync:** `lore sync --issue 42 --mr 99 -p group/repo` fetches only specific entities. + +### `ingest` + +Fetch data from GitLab API only (no docs, no embeddings). + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `[ENTITY]` | positional | — | `issues` or `mrs` (omit for all) | +| `-p, --project` | string | — | Single project | +| `-f, --force` | flag | — | Override stale lock | +| `--full` | flag | — | Full re-sync | +| `--dry-run` | flag | — | Preview | + +**Fetches from GitLab:** +- Issues + discussions + notes +- MRs + discussions + notes +- Resource events (state, label, milestone) +- MR file changes (for DiffNote tracking) +- Work-item statuses (via GraphQL) + +### `generate-docs` + +Create searchable documents from ingested data. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--full` | flag | — | Full rebuild | +| `-p, --project` | string | — | Single project rebuild | + +**Writes:** `documents`, `document_labels`, `document_paths` + +### `embed` + +Generate vector embeddings via Ollama. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--full` | flag | — | Re-embed all | +| `--retry-failed` | flag | — | Retry failed embeddings | + +**Requires:** Ollama running with `nomic-embed-text` +**Writes:** `embeddings`, `embedding_metadata` + +--- + +## Diagnostics + +### `health` + +Quick pre-flight check (~50ms). Exit 0 = healthy, exit 19 = unhealthy. + +**Checks:** config found, DB found, schema version current. + +```json +{ + "ok": true, + "data": { + "healthy": true, + "config_found": true, "db_found": true, + "schema_current": true, "schema_version": 28 + } +} +``` + +### `auth` + +Verify GitLab authentication. + +**Checks:** token set, GitLab reachable, user identity. + +### `doctor` + +Comprehensive environment check. + +**Checks:** config validity, token, GitLab connectivity, DB health, migration status, Ollama availability + model status. + +```json +{ + "ok": true, + "data": { + "config": { "valid": true, "path": "~/.config/lore/config.json" }, + "token": { "set": true, "gitlab": { "reachable": true, "user": "jdoe" } }, + "database": { "exists": true, "version": 28, "tables": 25 }, + "ollama": { "available": true, "model_ready": true } + } +} +``` + +### `status` (alias: `st`) + +Show sync state per project. + +```json +{ + "ok": true, + "data": { + "projects": [ + { + "project_path": "group/repo", + "last_synced_at": "2026-02-26T10:00:00Z", + "document_count": 5000, "discussion_count": 2000, "notes_count": 15000 + } + ] + } +} +``` + +### `stats` (alias: `stat`) + +Document and index statistics with optional integrity checks. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `--check` | flag | — | Run integrity checks | +| `--repair` | flag | — | Fix issues (implies `--check`) | +| `--dry-run` | flag | — | Preview repairs | + +```json +{ + "ok": true, + "data": { + "documents": { "total": 61652, "issues": 5000, "mrs": 2000, "notes": 50000 }, + "embeddings": { "total": 80000, "synced": 79500, "pending": 500, "failed": 0 }, + "fts": { "total_docs": 61652 }, + "queues": { "pending": 0, "in_progress": 0, "failed": 0, "max_attempts": 0 }, + "integrity": { + "ok": true, "fts_doc_mismatch": 0, "orphan_embeddings": 0, + "stale_metadata": 0, "orphan_state_events": 0 + } + } +} +``` + +--- + +## Setup + +### `init` + +Initialize configuration and database. + +| Flag | Type | Default | Purpose | +|---|---|---|---| +| `-f, --force` | flag | — | Skip overwrite confirmation | +| `--non-interactive` | flag | — | Fail if prompts needed | +| `--gitlab-url` | string | — | GitLab base URL (required in robot mode) | +| `--token-env-var` | string | — | Env var holding token (required in robot mode) | +| `--projects` | string | — | Comma-separated project paths (required in robot mode) | +| `--default-project` | string | — | Default project path | + +### `token` + +| Subcommand | Flags | Purpose | +|---|---|---| +| `token set` | `--token ` | Store token (reads stdin if omitted) | +| `token show` | `--unmask` | Display token (masked by default) | + +### `cron` + +| Subcommand | Flags | Purpose | +|---|---|---| +| `cron install` | `--interval ` (default: 8) | Schedule auto-sync | +| `cron uninstall` | — | Remove cron job | +| `cron status` | — | Check installation | + +### `migrate` + +Run pending database migrations. No flags. + +--- + +## Meta + +| Command | Purpose | +|---|---| +| `version` | Show version string | +| `completions ` | Generate shell completions (bash/zsh/fish/powershell) | +| `robot-docs` | Machine-readable command manifest (`--brief` for ~60% smaller) | diff --git a/docs/command-surface-analysis/04-data-flow.md b/docs/command-surface-analysis/04-data-flow.md new file mode 100644 index 0000000..eda1d1d --- /dev/null +++ b/docs/command-surface-analysis/04-data-flow.md @@ -0,0 +1,179 @@ +# Data Flow & Command Network + +How commands interconnect through shared data sources and output-to-input dependencies. + +--- + +## 1. Command Network Graph + +Arrows mean "output of A feeds as input to B": + +``` + ┌─────────┐ + │ search │─────────────────────────────┐ + └────┬────┘ │ + │ iid │ topic + ┌────▼────┐ ┌────▼─────┐ + ┌─────│ issues │◄───────────────────────│ timeline │ + │ │ mrs │ (detail) └──────────┘ + │ └────┬────┘ ▲ + │ │ iid │ entity ref + │ ┌────▼────┐ ┌──────────────┐ │ + │ │ related │ │ file-history │───────┘ + │ │ drift │ └──────┬───────┘ + │ └─────────┘ │ MR iids + │ ┌────▼────┐ + │ │ trace │──── issues (linked) + │ └────┬────┘ + │ │ paths + │ ┌────▼────┐ + │ │ who │ + │ │ (expert)│ + │ └─────────┘ + │ + file paths ┌─────────┐ + │ │ me │──── issues, mrs (dashboard) + ▼ └─────────┘ + ┌──────────┐ ▲ + │ notes │ │ (~same data) + └──────────┘ ┌────┴──────┐ + │who workload│ + └───────────┘ +``` + +### Feed Chains (output of A -> input of B) + +| From | To | What Flows | +|---|---|---| +| `search` | `issues`, `mrs` | IIDs from search results -> detail lookup | +| `search` | `timeline` | Topic/query -> chronological history | +| `search` | `related` | Entity IID -> semantic similarity | +| `me` | `issues`, `mrs` | IIDs from dashboard -> detail lookup | +| `trace` | `issues` | Linked issue IIDs -> detail lookup | +| `trace` | `who` | File paths -> expert lookup | +| `file-history` | `mrs` | MR IIDs -> detail lookup | +| `file-history` | `timeline` | Entity refs -> chronological events | +| `timeline` | `issues`, `mrs` | Referenced IIDs -> detail lookup | +| `who expert` | `who reviews` | Username -> review patterns | +| `who expert` | `mrs` | MR IIDs from expert detail -> MR detail | + +--- + +## 2. Shared Data Source Map + +Which DB tables power which commands. Higher overlap = stronger consolidation signal. + +### Primary Entity Tables + +| Table | Read By | +|---|---| +| `issues` | issues, me, who-workload, search, timeline, trace, count, stats | +| `merge_requests` | mrs, me, who-workload, search, timeline, trace, file-history, count, stats | +| `notes` | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history | +| `discussions` | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace | + +### Relationship Tables + +| Table | Read By | +|---|---| +| `entity_references` | trace, timeline | +| `mr_file_changes` | trace, file-history, who-overlap | +| `issue_labels` | issues, me | +| `mr_labels` | mrs, me | +| `issue_assignees` | issues, me | +| `mr_reviewers` | mrs, who-expert, who-workload | + +### Event Tables + +| Table | Read By | +|---|---| +| `resource_state_events` | timeline, me-activity | +| `resource_label_events` | timeline | +| `resource_milestone_events` | timeline | + +### Document/Search Tables + +| Table | Read By | +|---|---| +| `documents` + `documents_fts` | search, stats | +| `embeddings` | search, related, drift | +| `document_labels` | search | +| `document_paths` | search | + +### Infrastructure Tables + +| Table | Read By | +|---|---| +| `sync_cursors` | status | +| `dirty_sources` | stats | +| `embedding_metadata` | stats, embed | + +--- + +## 3. Shared-Data Clusters + +Commands that read from the same primary tables form natural clusters: + +### Cluster A: Issue/MR Entities + +`issues`, `mrs`, `me`, `who workload`, `count` + +All read `issues` + `merge_requests` with similar filter patterns (state, author, labels, project). These commands share the same underlying WHERE-clause builder logic. + +### Cluster B: Notes/Discussions + +`notes`, `issues detail`, `mrs detail`, `who expert`, `who active`, `timeline` + +All traverse the `discussions` -> `notes` join path. The `notes` command does it with independent filters; the others embed notes within parent context. + +### Cluster C: File Genealogy + +`trace`, `file-history`, `who overlap` + +All use `mr_file_changes` with rename chain BFS (forward: old_path -> new_path, backward: new_path -> old_path). Shared `resolve_rename_chain()` function. + +### Cluster D: Semantic/Vector + +`search`, `related`, `drift` + +All use `documents` + `embeddings` via Ollama. `search` adds FTS component; `related` is pure vector; `drift` uses vector for divergence scoring. + +### Cluster E: Diagnostics + +`health`, `auth`, `doctor`, `status`, `stats` + +All check system state. `health` < `doctor` (strict subset). `status` checks sync cursors. `stats` checks document/index health. `auth` checks token/connectivity. + +--- + +## 4. Query Pattern Sharing + +### Dynamic Filter Builder (used by issues, mrs, notes) + +All three list commands use the same pattern: build a WHERE clause dynamically from filter flags with parameterized tokens. Labels use EXISTS subquery against junction table. + +### Rename Chain BFS (used by trace, file-history, who overlap) + +Forward query: +```sql +SELECT DISTINCT new_path FROM mr_file_changes +WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed' +``` + +Backward query: +```sql +SELECT DISTINCT old_path FROM mr_file_changes +WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed' +``` + +Cycle detection via `HashSet` of visited paths, `MAX_RENAME_HOPS = 10`. + +### Hybrid Search (used by search, timeline seeding) + +RRF ranking: `score = (60 / fts_rank) + (60 / vector_rank)` + +FTS5 queries go through `to_fts_query()` which sanitizes input and builds MATCH expressions. Vector search calls Ollama to embed the query, then does cosine similarity against `embeddings` vec0 table. + +### Project Resolution (used by most commands) + +`resolve_project(conn, project_filter)` does fuzzy matching on `path_with_namespace` — suffix and substring matching. Returns `(project_id, path_with_namespace)`. diff --git a/docs/command-surface-analysis/05-overlap-analysis.md b/docs/command-surface-analysis/05-overlap-analysis.md new file mode 100644 index 0000000..7deaae5 --- /dev/null +++ b/docs/command-surface-analysis/05-overlap-analysis.md @@ -0,0 +1,170 @@ +# Overlap Analysis + +Quantified functional duplication between commands. + +--- + +## 1. High Overlap (>70%) + +### `who workload` vs `me` — 85% overlap + +| Dimension | `who @user` (workload) | `me --user @user` | +|---|---|---| +| Assigned issues | Yes | Yes | +| Authored MRs | Yes | Yes | +| Reviewing MRs | Yes | Yes | +| Attention state | No | **Yes** | +| Activity feed | No | **Yes** | +| Since-last-check inbox | No | **Yes** | +| Cross-project | Yes | **Yes** | + +**Verdict:** `who workload` is a strict subset of `me`. The only reason to use `who workload` is if you DON'T want attention_state/activity/inbox — but `me --issues --mrs --fields minimal` achieves the same thing. + +### `health` vs `doctor` — 90% overlap + +| Check | `health` | `doctor` | +|---|---|---| +| Config found | Yes | Yes | +| DB exists | Yes | Yes | +| Schema current | Yes | Yes | +| Token valid | No | **Yes** | +| GitLab reachable | No | **Yes** | +| Ollama available | No | **Yes** | + +**Verdict:** `health` is a strict subset of `doctor`. However, `health` has unique value as a ~50ms pre-flight with clean exit 0/19 semantics for scripting. + +### `file-history` vs `trace` — 75% overlap + +| Feature | `file-history` | `trace` | +|---|---|---| +| Find MRs for file | Yes | Yes | +| Rename chain BFS | Yes | Yes | +| DiffNote discussions | `--discussions` | `--discussions` | +| Follow to linked issues | No | **Yes** | +| `--merged` filter | **Yes** | No | + +**Verdict:** `trace` is a superset of `file-history` minus the `--merged` filter. Both use the same `resolve_rename_chain()` function and query `mr_file_changes`. + +### `related` query-mode vs `search --mode semantic` — 80% overlap + +| Feature | `related "text"` | `search "text" --mode semantic` | +|---|---|---| +| Vector similarity | Yes | Yes | +| FTS component | No | No (semantic mode skips FTS) | +| Filters (labels, author, since) | No | **Yes** | +| Explain ranking | No | **Yes** | +| Field selection | No | **Yes** | +| Requires Ollama | Yes | Yes | + +**Verdict:** `related "text"` is `search --mode semantic` without any filter capabilities. The entity-seeded mode (`related issues 42`) is NOT duplicated — it seeds from an existing entity's embedding. + +--- + +## 2. Medium Overlap (40-70%) + +### `who expert` vs `who overlap` — 50% + +Both answer "who works on this file" but with different scoring: + +| Aspect | `who expert` | `who overlap` | +|---|---|---| +| Scoring | Half-life decay, signal types (diffnote_author, reviewer, etc.) | Raw touch count | +| Output | Ranked experts with scores | Users with touch counts | +| Use case | "Who should review this?" | "Who else touches this?" | + +**Verdict:** Overlap is a simplified version of expert. Expert could include touch_count as a field. + +### `timeline` vs `trace` — 45% + +Both follow `entity_references` to discover connected entities, but from different entry points: + +| Aspect | `timeline` | `trace` | +|---|---|---| +| Entry point | Entity (issue/MR) or search query | File path | +| Direction | Entity -> cross-refs -> events | File -> MRs -> issues -> discussions | +| Output | Chronological events | Causal chains (why code changed) | +| Expansion | Depth-controlled cross-ref following | MR -> issue via entity_references | + +**Verdict:** Complementary, not duplicative. Different questions, shared plumbing. + +### `auth` vs `doctor` — 100% of auth + +`auth` checks: token set + GitLab reachable + user identity. +`doctor` checks: all of the above + DB + schema + Ollama. + +**Verdict:** `auth` is completely contained within `doctor`. + +### `count` vs `stats` — 40% + +Both answer "how much data?": + +| Aspect | `count` | `stats` | +|---|---|---| +| Layer | Entity (issues, MRs, notes) | Document index | +| State breakdown | Yes (opened/closed/merged) | No | +| Integrity checks | No | Yes | +| Queue status | No | Yes | + +**Verdict:** Different layers. Could be unified under `stats --entities`. + +### `notes` vs `issues/mrs detail` — 50% + +Both return note content: + +| Aspect | `notes` command | Detail view discussions | +|---|---|---| +| Independent filtering | **Yes** (author, path, resolution, contains, type) | No | +| Parent context | Minimal (parent_iid, parent_title) | **Full** (complete entity + all discussions) | +| Cross-entity queries | **Yes** (all notes matching criteria) | No (one entity only) | + +**Verdict:** `notes` is for filtered queries across entities. Detail views are for complete context on one entity. Different use cases. + +--- + +## 3. No Significant Overlap + +| Command | Why It's Unique | +|---|---| +| `drift` | Only command doing semantic divergence detection | +| `timeline` | Only command doing multi-entity chronological reconstruction with expansion | +| `search` (hybrid) | Only command combining FTS + vector with RRF ranking | +| `me` (inbox) | Only command with cursor-based since-last-check tracking | +| `who expert` | Only command with half-life decay scoring by signal type | +| `who reviews` | Only command analyzing review patterns (approval rate, latency) | +| `who active` | Only command surfacing unresolved discussions needing attention | + +--- + +## 4. Overlap Adjacency Matrix + +Rows/columns are commands. Values are estimated functional overlap percentage. + +``` + issues mrs notes search who-e who-w who-r who-a who-o timeline me fh trace related drift count status stats health doctor +issues - 30 50 20 5 40 0 5 0 15 40 0 10 10 0 20 0 10 0 0 +mrs 30 - 50 20 5 40 0 5 0 15 40 5 10 10 0 20 0 10 0 0 +notes 50 50 - 15 15 0 5 10 0 10 0 5 5 0 0 0 0 0 0 0 +search 20 20 15 - 0 0 0 0 0 15 0 0 0 80 0 0 0 5 0 0 +who-expert 5 5 15 0 - 0 10 0 50 0 0 10 10 0 0 0 0 0 0 0 +who-workload 40 40 0 0 0 - 0 0 0 0 85 0 0 0 0 0 0 0 0 0 +who-reviews 0 0 5 0 10 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 +who-active 5 5 10 0 0 0 0 - 0 5 0 0 0 0 0 0 0 0 0 0 +who-overlap 0 0 0 0 50 0 0 0 - 0 0 10 5 0 0 0 0 0 0 0 +timeline 15 15 10 15 0 0 0 5 0 - 5 5 45 0 0 0 0 0 0 0 +me 40 40 0 0 0 85 0 0 0 5 - 0 0 0 0 0 5 0 5 5 +file-history 0 5 5 0 10 0 0 0 10 5 0 - 75 0 0 0 0 0 0 0 +trace 10 10 5 0 10 0 0 0 5 45 0 75 - 0 0 0 0 0 0 0 +related 10 10 0 80 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 +drift 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 +count 20 20 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 40 0 0 +status 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 - 20 30 40 +stats 10 10 0 5 0 0 0 0 0 0 0 0 0 0 0 40 20 - 0 15 +health 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 30 0 - 90 +doctor 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 40 15 90 - +``` + +**Highest overlap pairs (>= 75%):** +1. `health` / `doctor` — 90% +2. `who workload` / `me` — 85% +3. `related` query-mode / `search semantic` — 80% +4. `file-history` / `trace` — 75% diff --git a/docs/command-surface-analysis/06-agent-workflows.md b/docs/command-surface-analysis/06-agent-workflows.md new file mode 100644 index 0000000..3303688 --- /dev/null +++ b/docs/command-surface-analysis/06-agent-workflows.md @@ -0,0 +1,216 @@ +# Agent Workflow Analysis + +Common agent workflows, round-trip costs, and token profiles. + +--- + +## 1. Common Workflows + +### Flow 1: "What should I work on?" — 4 round trips + +``` +me → dashboard overview (which items need attention?) +issues -p proj → detail on picked issue (full context + discussions) +trace src/relevant/file.rs → understand code context (why was it written?) +who src/relevant/file.rs → find domain experts (who can help?) +``` + +**Total tokens (minimal):** ~800 + ~2000 + ~1000 + ~400 = ~4200 +**Total tokens (full):** ~3000 + ~6000 + ~1500 + ~800 = ~11300 +**Latency:** 4 serial round trips + +### Flow 2: "What happened with this feature?" — 3 round trips + +``` +search "feature name" → find relevant entities +timeline "feature name" → reconstruct chronological history +related issues 42 → discover connected work +``` + +**Total tokens (minimal):** ~600 + ~1500 + ~400 = ~2500 +**Total tokens (full):** ~2000 + ~5000 + ~1000 = ~8000 +**Latency:** 3 serial round trips + +### Flow 3: "Why was this code changed?" — 3 round trips + +``` +trace src/file.rs → file -> MR -> issue chain +issues -p proj → full issue detail +timeline "issue:42" → full history with cross-refs +``` + +**Total tokens (minimal):** ~800 + ~2000 + ~1500 = ~4300 +**Total tokens (full):** ~1500 + ~6000 + ~5000 = ~12500 +**Latency:** 3 serial round trips + +### Flow 4: "Is the system healthy?" — 2-4 round trips + +``` +health → quick pre-flight (pass/fail) +doctor → detailed diagnostics (if health fails) +status → sync state per project +stats → document/index health +``` + +**Total tokens:** ~100 + ~300 + ~200 + ~400 = ~1000 +**Latency:** 2-4 serial round trips (often 1 if health passes) + +### Flow 5: "Who can review this?" — 2-3 round trips + +``` +who src/auth/ → find file experts +who @jdoe --reviews → check reviewer's patterns +``` + +**Total tokens (minimal):** ~300 + ~300 = ~600 +**Latency:** 2 serial round trips + +### Flow 6: "Find and understand an issue" — 4 round trips + +``` +search "query" → discover entities (get IIDs) +issues → full detail with discussions +timeline "issue:42" → chronological context +related issues 42 → connected entities +``` + +**Total tokens (minimal):** ~600 + ~2000 + ~1500 + ~400 = ~4500 +**Total tokens (full):** ~2000 + ~6000 + ~5000 + ~1000 = ~14000 +**Latency:** 4 serial round trips + +--- + +## 2. Token Cost Profiles + +Measured typical response sizes in robot mode with default settings: + +| Command | Typical Tokens (full) | With `--fields minimal` | Dominant Cost Driver | +|---|---|---|---| +| `me` (all sections) | 2000-5000 | 500-1500 | Open items count | +| `issues` (list, n=50) | 1500-3000 | 400-800 | Labels arrays | +| `issues ` (detail) | 1000-8000 | N/A (no minimal for detail) | Discussion depth | +| `mrs ` (detail) | 1000-8000 | N/A | Discussion depth, DiffNote positions | +| `timeline` (limit=100) | 2000-6000 | 800-1500 | Event count + evidence | +| `search` (n=20) | 1000-3000 | 300-600 | Snippet length | +| `who expert` | 300-800 | 150-300 | Expert count | +| `who workload` | 500-1500 | 200-500 | Open items count | +| `trace` | 500-2000 | 300-800 | Chain depth | +| `file-history` | 300-1500 | 200-500 | MR count | +| `related` | 300-1000 | 200-400 | Result count | +| `drift` | 200-800 | N/A | Similarity curve length | +| `notes` (n=50) | 1500-5000 | 500-1000 | Body length | +| `count` | ~100 | N/A | Fixed structure | +| `stats` | ~500 | N/A | Fixed structure | +| `health` | ~100 | N/A | Fixed structure | +| `doctor` | ~300 | N/A | Fixed structure | +| `status` | ~200 | N/A | Project count | + +### Key Observations + +1. **Detail commands are expensive.** `issues ` and `mrs ` can hit 8000 tokens due to discussions. This is the content agents actually need, but most of it is discussion body text. + +2. **`me` is the most-called command** and ranges 2000-5000 tokens. Agents often just need "do I have work?" which is ~100 tokens (summary counts only). + +3. **Lists with labels are wasteful.** Every issue/MR in a list carries its full label array. With 50 items x 5 labels each, that's 250 strings of overhead. + +4. **`--fields minimal` helps a lot** — 50-70% reduction on list commands. But it's not available on detail views. + +5. **Timeline scales linearly** with event count and evidence notes. The `--max-evidence` flag helps cap the expensive part. + +--- + +## 3. Round-Trip Inefficiency Patterns + +### Pattern A: Discovery -> Detail (N+1) + +Agent searches, gets 5 results, then needs detail on each: + +``` +search "auth bug" → 5 results +issues 42 -p proj → detail +issues 55 -p proj → detail +issues 71 -p proj → detail +issues 88 -p proj → detail +issues 95 -p proj → detail +``` + +**6 round trips** for what should be 2 (search + batch detail). + +### Pattern B: Detail -> Context Gathering + +Agent gets issue detail, then needs timeline + related + trace: + +``` +issues 42 -p proj → detail +timeline "issue:42" -p proj → events +related issues 42 -p proj → similar +trace src/file.rs -p proj → code provenance +``` + +**4 round trips** for what should be 1 (detail with embedded context). + +### Pattern C: Health Check Cascade + +Agent checks health, discovers issue, drills down: + +``` +health → unhealthy (exit 19) +doctor → token OK, Ollama missing +stats --check → 5 orphan embeddings +stats --repair → fixed +``` + +**4 round trips** but only 2 are actually needed (doctor covers health). + +### Pattern D: Dashboard -> Action + +Agent checks dashboard, picks item, needs full context: + +``` +me → 5 open issues, 2 MRs +issues 42 -p proj → picked issue detail +who src/auth/ -p proj → expert for help +timeline "issue:42" -p proj → history +``` + +**4 round trips.** With `--include`, could be 2 (me with inline detail + who). + +--- + +## 4. Optimized Workflow Vision + +What the same workflows look like with proposed optimizations: + +### Flow 1 Optimized: "What should I work on?" — 2 round trips + +``` +me --depth titles → 400 tokens: counts + item titles with attention_state +issues 42 --include timeline,trace → 1 call: detail + events + code provenance +``` + +### Flow 2 Optimized: "What happened with this feature?" — 1-2 round trips + +``` +search "feature" -n 5 → find entities +issues 42 --include timeline,related → everything in one call +``` + +### Flow 3 Optimized: "Why was this code changed?" — 1 round trip + +``` +trace src/file.rs --include experts,timeline → full chain + experts + events +``` + +### Flow 4 Optimized: "Is the system healthy?" — 1 round trip + +``` +doctor → covers health + auth + connectivity +# status + stats only if doctor reveals issues +``` + +### Flow 6 Optimized: "Find and understand" — 2 round trips + +``` +search "query" -n 5 → discover entities +issues --batch 42,55,71 --include timeline → batch detail with events +``` diff --git a/docs/command-surface-analysis/07-consolidation-proposals.md b/docs/command-surface-analysis/07-consolidation-proposals.md new file mode 100644 index 0000000..89a623c --- /dev/null +++ b/docs/command-surface-analysis/07-consolidation-proposals.md @@ -0,0 +1,198 @@ +# Consolidation Proposals + +5 proposals to reduce 34 commands to 29 by merging high-overlap commands. + +--- + +## A. Absorb `file-history` into `trace --shallow` + +**Overlap:** 75%. Both do rename chain BFS on `mr_file_changes`, both optionally include DiffNote discussions. `trace` follows `entity_references` to linked issues; `file-history` stops at MRs. + +**Current state:** +```bash +# These do nearly the same thing: +lore file-history src/auth/ -p proj --discussions +lore trace src/auth/ -p proj --discussions +# trace just adds: issues linked via entity_references +``` + +**Proposed change:** +- `trace ` — full chain: file -> MR -> issue -> discussions (existing behavior) +- `trace --shallow` — MR-only, no issue following (replaces `file-history`) +- Move `--merged` flag from `file-history` to `trace` +- Deprecate `file-history` as an alias that maps to `trace --shallow` + +**Migration path:** +1. Add `--shallow` and `--merged` flags to `trace` +2. Make `file-history` an alias with deprecation warning +3. Update robot-docs to point to `trace` +4. Remove alias after 2 releases + +**Breaking changes:** Robot output shape differs slightly (`trace_chains` vs `merge_requests` key name). The `--shallow` variant should match `file-history`'s output shape for compatibility. + +**Effort:** Low. Most code is already shared via `resolve_rename_chain()`. + +--- + +## B. Absorb `auth` into `doctor` + +**Overlap:** 100% of `auth` is contained within `doctor`. + +**Current state:** +```bash +lore auth # checks: token set, GitLab reachable, user identity +lore doctor # checks: all of above + DB + schema + Ollama +``` + +**Proposed change:** +- `doctor` — full check (existing behavior) +- `doctor --auth` — token + GitLab only (replaces `auth`) +- Keep `health` separate (fast pre-flight, different exit code contract: 0/19) +- Deprecate `auth` as alias for `doctor --auth` + +**Migration path:** +1. Add `--auth` flag to `doctor` +2. Make `auth` an alias with deprecation warning +3. Remove alias after 2 releases + +**Breaking changes:** None for robot mode (same JSON shape). Exit code mapping needs verification. + +**Effort:** Low. Doctor already has the auth check logic. + +--- + +## C. Remove `related` query-mode + +**Overlap:** 80% with `search --mode semantic`. + +**Current state:** +```bash +# These are functionally equivalent: +lore related "authentication flow" +lore search "authentication flow" --mode semantic + +# This is UNIQUE (no overlap): +lore related issues 42 +``` + +**Proposed change:** +- Keep entity-seeded mode: `related issues 42` (seeds from existing entity embedding) +- Remove free-text mode: `related "text"` -> error with suggestion: "Use `search --mode semantic`" +- Alternatively: keep as sugar but document it as equivalent to search + +**Migration path:** +1. Add deprecation warning when query-mode is used +2. After 2 releases, remove query-mode parsing +3. Entity-mode stays unchanged + +**Breaking changes:** Agents using `related "text"` must switch to `search --mode semantic`. This is a strict improvement since search has filters. + +**Effort:** Low. Just argument validation change. + +--- + +## D. Merge `who overlap` into `who expert` + +**Overlap:** 50% functional, but overlap is a strict simplification of expert. + +**Current state:** +```bash +lore who src/auth/ # expert mode: scored rankings +lore who --overlap src/auth/ # overlap mode: raw touch counts +``` + +**Proposed change:** +- `who ` (expert) adds `touch_count` and `last_touch_at` fields to each expert row +- `who --overlap ` becomes an alias for `who --fields username,touch_count` +- Eventually remove `--overlap` flag + +**New expert output:** +```json +{ + "experts": [ + { + "username": "jdoe", "score": 42.5, + "touch_count": 15, "last_touch_at": "2026-02-20", + "detail": { "mr_ids_author": [99, 101] } + } + ] +} +``` + +**Migration path:** +1. Add `touch_count` and `last_touch_at` to expert output +2. Make `--overlap` an alias with deprecation warning +3. Remove `--overlap` after 2 releases + +**Breaking changes:** Expert output gains new fields (non-breaking for JSON consumers). Overlap output shape changes if agents were parsing `{ "users": [...] }` vs `{ "experts": [...] }`. + +**Effort:** Low. Expert query already touches the same tables; just need to add a COUNT aggregation. + +--- + +## E. Merge `count` and `status` into `stats` + +**Overlap:** `count` and `stats` both answer "how much data?"; `status` and `stats` both report system state. + +**Current state:** +```bash +lore count issues # entity count + state breakdown +lore count mrs # entity count + state breakdown +lore status # sync cursors per project +lore stats # document/index counts + integrity +``` + +**Proposed change:** +- `stats` — document/index health (existing behavior, default) +- `stats --entities` — adds entity counts (replaces `count`) +- `stats --sync` — adds sync cursor positions (replaces `status`) +- `stats --all` — everything: entities + sync + documents + integrity +- `stats --check` / `--repair` — unchanged + +**New `--all` output:** +```json +{ + "data": { + "entities": { + "issues": { "total": 5000, "opened": 200, "closed": 4800 }, + "merge_requests": { "total": 1234, "opened": 100, "closed": 50, "merged": 1084 }, + "discussions": { "total": 8000 }, + "notes": { "total": 282000, "system_excluded": 50000 } + }, + "sync": { + "projects": [ + { "project_path": "group/repo", "last_synced_at": "...", "document_count": 5000 } + ] + }, + "documents": { "total": 61652, "issues": 5000, "mrs": 2000, "notes": 50000 }, + "embeddings": { "total": 80000, "synced": 79500, "pending": 500 }, + "fts": { "total_docs": 61652 }, + "queues": { "pending": 0, "in_progress": 0, "failed": 0 }, + "integrity": { "ok": true } + } +} +``` + +**Migration path:** +1. Add `--entities`, `--sync`, `--all` flags to `stats` +2. Make `count` an alias for `stats --entities` with deprecation warning +3. Make `status` an alias for `stats --sync` with deprecation warning +4. Remove aliases after 2 releases + +**Breaking changes:** `count` output currently has `{ "entity": "issues", "count": N, "breakdown": {...} }`. Under `stats --entities`, this becomes nested under `data.entities`. Alias can preserve old shape during deprecation period. + +**Effort:** Medium. Need to compose three query paths into one response builder. + +--- + +## Summary + +| Consolidation | Removes | Effort | Breaking? | +|---|---|---|---| +| `file-history` -> `trace --shallow` | -1 command | Low | Alias redirect, output shape compat | +| `auth` -> `doctor --auth` | -1 command | Low | Alias redirect | +| `related` query-mode removal | -1 mode | Low | Must switch to `search --mode semantic` | +| `who overlap` -> `who expert` | -1 sub-mode | Low | Output gains fields | +| `count` + `status` -> `stats` | -2 commands | Medium | Output nesting changes | + +**Total: 34 commands -> 29 commands.** All changes use deprecation-with-alias pattern for gradual migration. diff --git a/docs/command-surface-analysis/08-robot-optimization-proposals.md b/docs/command-surface-analysis/08-robot-optimization-proposals.md new file mode 100644 index 0000000..093b14e --- /dev/null +++ b/docs/command-surface-analysis/08-robot-optimization-proposals.md @@ -0,0 +1,347 @@ +# Robot-Mode Optimization Proposals + +6 proposals to reduce round trips and token waste for agent consumers. + +--- + +## A. `--include` flag for embedded sub-queries (P0) + +**Problem:** The #1 agent inefficiency. Every "understand this entity" workflow requires 3-4 serial round trips: detail + timeline + related + trace. + +**Proposal:** Add `--include` flag to detail commands that embeds sub-query results in the response. + +```bash +# Before: 4 round trips, ~12000 tokens +lore -J issues 42 -p proj +lore -J timeline "issue:42" -p proj --limit 20 +lore -J related issues 42 -p proj -n 5 +lore -J trace src/auth/ -p proj + +# After: 1 round trip, ~5000 tokens (sub-queries use reduced limits) +lore -J issues 42 -p proj --include timeline,related +``` + +### Include Matrix + +| Base Command | Valid Includes | Default Limits | +|---|---|---| +| `issues ` | `timeline`, `related`, `trace` | 20 events, 5 related, 5 chains | +| `mrs ` | `timeline`, `related`, `file-changes` | 20 events, 5 related | +| `trace ` | `experts`, `timeline` | 5 experts, 20 events | +| `me` | `detail` (inline top-N item details) | 3 items detailed | +| `search` | `detail` (inline top-N result details) | 3 results detailed | + +### Response Shape + +Included data uses `_` prefix to distinguish from base fields: + +```json +{ + "ok": true, + "data": { + "iid": 42, "title": "Fix auth", "state": "opened", + "discussions": [...], + "_timeline": { + "event_count": 15, + "events": [...] + }, + "_related": { + "similar_entities": [...] + } + }, + "meta": { + "elapsed_ms": 200, + "_timeline_ms": 45, + "_related_ms": 120 + } +} +``` + +### Error Handling + +Sub-query errors are non-fatal. If Ollama is down, `_related` returns an error instead of failing the whole request: + +```json +{ + "_related_error": "Ollama unavailable — related results skipped" +} +``` + +### Limit Control + +```bash +# Custom limits for included data +lore -J issues 42 --include timeline:50,related:10 +``` + +### Round-Trip Savings + +| Workflow | Before | After | Savings | +|---|---|---|---| +| Understand an issue | 4 calls | 1 call | **75%** | +| Why was code changed | 3 calls | 1 call | **67%** | +| Find and understand | 4 calls | 2 calls | **50%** | + +**Effort:** High. Each include needs its own sub-query executor, error isolation, and limit enforcement. But the payoff is massive — this single feature halves agent round trips. + +--- + +## B. `--depth` control on `me` (P0) + +**Problem:** `me` returns 2000-5000 tokens. Agents checking "do I have work?" only need ~100 tokens. + +**Proposal:** Add `--depth` flag with three levels. + +```bash +# Counts only (~100 tokens) — "do I have work?" +lore -J me --depth counts + +# Titles (~400 tokens) — "what work do I have?" +lore -J me --depth titles + +# Full (current behavior, 2000+ tokens) — "give me everything" +lore -J me --depth full +lore -J me # same as --depth full +``` + +### Depth Levels + +| Level | Includes | Typical Tokens | +|---|---|---| +| `counts` | `summary` block only (counts, no items) | ~100 | +| `titles` | summary + item lists with minimal fields (iid, title, attention_state) | ~400 | +| `full` | Everything: items, activity, inbox, discussions | ~2000-5000 | + +### Response at `--depth counts` + +```json +{ + "ok": true, + "data": { + "username": "jdoe", + "summary": { + "project_count": 3, + "open_issue_count": 5, + "authored_mr_count": 2, + "reviewing_mr_count": 1, + "needs_attention_count": 3 + } + } +} +``` + +### Response at `--depth titles` + +```json +{ + "ok": true, + "data": { + "username": "jdoe", + "summary": { ... }, + "open_issues": [ + { "iid": 42, "title": "Fix auth", "attention_state": "needs_attention" } + ], + "open_mrs_authored": [ + { "iid": 99, "title": "Refactor auth", "attention_state": "needs_attention" } + ], + "reviewing_mrs": [] + } +} +``` + +**Effort:** Low. The data is already available; just need to gate serialization by depth level. + +--- + +## C. `--batch` flag for multi-entity detail (P1) + +**Problem:** After search/timeline, agents discover N entity IIDs and need detail on each. Currently N round trips. + +**Proposal:** Add `--batch` flag to `issues` and `mrs` detail mode. + +```bash +# Before: 3 round trips +lore -J issues 42 -p proj +lore -J issues 55 -p proj +lore -J issues 71 -p proj + +# After: 1 round trip +lore -J issues --batch 42,55,71 -p proj +``` + +### Response + +```json +{ + "ok": true, + "data": { + "results": [ + { "iid": 42, "title": "Fix auth", "state": "opened", ... }, + { "iid": 55, "title": "Add SSO", "state": "opened", ... }, + { "iid": 71, "title": "Token refresh", "state": "closed", ... } + ], + "errors": [ + { "iid": 99, "error": "Not found" } + ] + } +} +``` + +### Constraints + +- Max 20 IIDs per batch +- Individual errors don't fail the batch (partial results returned) +- Works with `--include` for maximum efficiency: `--batch 42,55 --include timeline` +- Works with `--fields minimal` for token control + +**Effort:** Medium. Need to loop the existing detail handler and compose results. + +--- + +## D. Composite `context` command (P2) + +**Problem:** Agents need full context on an entity but must learn `--include` syntax. A purpose-built command is more discoverable. + +**Proposal:** Add `context` command that returns detail + timeline + related in one call. + +```bash +lore -J context issues 42 -p proj +lore -J context mrs 99 -p proj +``` + +### Equivalent To + +```bash +lore -J issues 42 -p proj --include timeline,related +``` + +But with optimized defaults: +- Timeline: 20 most recent events, max 3 evidence notes +- Related: top 5 entities +- Discussions: truncated after 5 threads +- Non-fatal: Ollama-dependent parts gracefully degrade + +### Response Shape + +Same as `issues --include timeline,related` but with the reduced defaults applied. + +### Relationship to `--include` + +`context` is sugar for the most common `--include` pattern. Both mechanisms can coexist: +- `context` for the 80% case (agents wanting full entity understanding) +- `--include` for custom combinations + +**Effort:** Medium. Thin wrapper around detail + include pipeline. + +--- + +## E. `--max-tokens` response budget (P3) + +**Problem:** Response sizes vary wildly (100 to 8000 tokens). Agents can't predict cost in advance. + +**Proposal:** Let agents cap response size. Server truncates to fit. + +```bash +lore -J me --max-tokens 500 +lore -J timeline "feature" --max-tokens 1000 +lore -J context issues 42 --max-tokens 2000 +``` + +### Truncation Strategy (priority order) + +1. Apply `--fields minimal` if not already set +2. Reduce array lengths (newest/highest-score items survive) +3. Truncate string fields (descriptions, snippets) to 200 chars +4. Omit null/empty fields +5. Drop included sub-queries (if using `--include`) + +### Meta Notice + +```json +{ + "meta": { + "elapsed_ms": 50, + "truncated": true, + "original_tokens": 3500, + "budget_tokens": 1000, + "dropped": ["_related", "discussions[5:]", "activity[10:]"] + } +} +``` + +### Implementation Notes + +Token estimation: rough heuristic based on JSON character count / 4. Doesn't need to be exact — the goal is "roughly this size" not "exactly N tokens." + +**Effort:** High. Requires token estimation, progressive truncation logic, and tracking what was dropped. + +--- + +## F. `--format tsv` for list commands (P3) + +**Problem:** JSON is verbose for tabular data. List commands return arrays of objects with repeated key names. + +**Proposal:** Add `--format tsv` for list commands. + +```bash +lore -J issues --format tsv --fields iid,title,state -n 10 +``` + +### Output + +``` +iid title state +42 Fix auth opened +55 Add SSO opened +71 Token refresh closed +``` + +### Token Savings + +| Command | JSON tokens | TSV tokens | Savings | +|---|---|---|---| +| `issues -n 50 --fields minimal` | ~800 | ~250 | **69%** | +| `mrs -n 50 --fields minimal` | ~800 | ~250 | **69%** | +| `who expert -n 10` | ~300 | ~100 | **67%** | +| `notes -n 50 --fields minimal` | ~1000 | ~350 | **65%** | + +### Applicable Commands + +TSV works well for flat, tabular data: +- `issues` (list), `mrs` (list), `notes` (list) +- `who expert`, `who overlap`, `who reviews` +- `count` + +TSV does NOT work for nested/complex data: +- Detail views (discussions are nested) +- Timeline (events have nested evidence) +- Search (nested explain, labels arrays) +- `me` (multiple sections) + +### Agent Parsing + +Most LLMs parse TSV naturally. Agents that need structured data can still use JSON. + +**Effort:** Medium. Tab-separated serialization for flat structs is straightforward. Need to handle escaping for body text containing tabs/newlines. + +--- + +## Impact Summary + +| Optimization | Priority | Effort | Round-Trip Savings | Token Savings | +|---|---|---|---|---| +| `--include` | P0 | High | **50-75%** | Moderate | +| `--depth` on `me` | P0 | Low | None | **60-80%** | +| `--batch` | P1 | Medium | **N-1 per batch** | Moderate | +| `context` command | P2 | Medium | **67-75%** | Moderate | +| `--max-tokens` | P3 | High | None | **Variable** | +| `--format tsv` | P3 | Medium | None | **65-69% on lists** | + +### Implementation Order + +1. **`--depth` on `me`** — lowest effort, high value, no risk +2. **`--include` on `issues`/`mrs` detail** — highest impact, start with `timeline` include only +3. **`--batch`** — eliminates N+1 pattern +4. **`context` command** — sugar on top of `--include` +5. **`--format tsv`** — nice-to-have, easy to add incrementally +6. **`--max-tokens`** — complex, defer until demand is clear diff --git a/docs/command-surface-analysis/09-appendices.md b/docs/command-surface-analysis/09-appendices.md new file mode 100644 index 0000000..556496d --- /dev/null +++ b/docs/command-surface-analysis/09-appendices.md @@ -0,0 +1,181 @@ +# Appendices + +--- + +## A. Robot Output Envelope + +All robot-mode responses follow this structure: + +```json +{ + "ok": true, + "data": { /* command-specific */ }, + "meta": { "elapsed_ms": 42 } +} +``` + +Errors (to stderr): + +```json +{ + "error": { + "code": "CONFIG_NOT_FOUND", + "message": "Configuration file not found", + "suggestion": "Run 'lore init'", + "actions": ["lore init"] + } +} +``` + +The `actions` array contains copy-paste shell commands for automated recovery. Omitted when empty. + +--- + +## B. Exit Codes + +| Code | Meaning | Retryable | +|---|---|---| +| 0 | Success | N/A | +| 1 | Internal error / not implemented | Maybe | +| 2 | Usage error (invalid flags or arguments) | No (fix syntax) | +| 3 | Config invalid | No (fix config) | +| 4 | Token not set | No (set token) | +| 5 | GitLab auth failed | Maybe (token expired?) | +| 6 | Resource not found (HTTP 404) | No | +| 7 | Rate limited | Yes (wait) | +| 8 | Network error | Yes (retry) | +| 9 | Database locked | Yes (wait) | +| 10 | Database error | Maybe | +| 11 | Migration failed | No (investigate) | +| 12 | I/O error | Maybe | +| 13 | Transform error | No (bug) | +| 14 | Ollama unavailable | Yes (start Ollama) | +| 15 | Ollama model not found | No (pull model) | +| 16 | Embedding failed | Yes (retry) | +| 17 | Not found (entity does not exist) | No | +| 18 | Ambiguous match (use `-p` to specify project) | No (be specific) | +| 19 | Health check failed | Yes (fix issues first) | +| 20 | Config not found | No (run init) | + +--- + +## C. Field Selection Presets + +The `--fields` flag supports both presets and custom field lists: + +```bash +lore -J issues --fields minimal # Preset +lore -J mrs --fields iid,title,state,draft # Custom comma-separated +``` + +| Command | Minimal Preset Fields | +|---|---| +| `issues` (list) | `iid`, `title`, `state`, `updated_at_iso` | +| `mrs` (list) | `iid`, `title`, `state`, `updated_at_iso` | +| `notes` (list) | `id`, `author_username`, `body`, `created_at_iso` | +| `search` | `document_id`, `title`, `source_type`, `score` | +| `timeline` | `timestamp`, `type`, `entity_iid`, `detail` | +| `who expert` | `username`, `score` | +| `who workload` | `iid`, `title`, `state` | +| `who reviews` | `name`, `count`, `percentage` | +| `who active` | `entity_type`, `iid`, `title`, `participants` | +| `who overlap` | `username`, `touch_count` | +| `me` (items) | `iid`, `title`, `attention_state`, `updated_at_iso` | +| `me` (activity) | `timestamp_iso`, `event_type`, `entity_iid`, `actor` | + +--- + +## D. Configuration Precedence + +1. CLI flags (highest priority) +2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`) +3. Config file (`~/.config/lore/config.json`) +4. Built-in defaults (lowest priority) + +--- + +## E. Time Parsing + +All commands accepting `--since`, `--until`, `--as-of` support: + +| Format | Example | Meaning | +|---|---|---| +| Relative days | `7d` | 7 days ago | +| Relative weeks | `2w` | 2 weeks ago | +| Relative months | `1m`, `6m` | 1/6 months ago | +| Absolute date | `2026-01-15` | Specific date | + +Internally converted to Unix milliseconds for DB queries. + +--- + +## F. Database Schema (28 migrations) + +### Primary Entity Tables + +| Table | Key Columns | Notes | +|---|---|---| +| `projects` | `gitlab_project_id`, `path_with_namespace`, `web_url` | No `name` or `last_seen_at` | +| `issues` | `iid`, `title`, `state`, `author_username`, 5 status columns | Status columns nullable (migration 021) | +| `merge_requests` | `iid`, `title`, `state`, `draft`, `source_branch`, `target_branch` | `last_seen_at INTEGER NOT NULL` | +| `discussions` | `gitlab_discussion_id` (text), `issue_id`/`merge_request_id` | One FK must be set | +| `notes` | `gitlab_id`, `author_username`, `body`, DiffNote position columns | `type` column for DiffNote/DiscussionNote | + +### Relationship Tables + +| Table | Purpose | +|---|---| +| `issue_labels`, `mr_labels` | Label junction (DELETE+INSERT for stale removal) | +| `issue_assignees`, `mr_assignees` | Assignee junction | +| `mr_reviewers` | Reviewer junction | +| `entity_references` | Cross-refs: closes, mentioned, related (with `source_method`) | +| `mr_file_changes` | File diffs: old_path, new_path, change_type | + +### Event Tables + +| Table | Constraint | +|---|---| +| `resource_state_events` | CHECK: exactly one of issue_id/merge_request_id NOT NULL | +| `resource_label_events` | Same CHECK constraint; `label_name` nullable (migration 012) | +| `resource_milestone_events` | Same CHECK constraint; `milestone_title` nullable | + +### Document/Search Pipeline + +| Table | Purpose | +|---|---| +| `documents` | Unified searchable content (source_type: issue/merge_request/discussion) | +| `documents_fts` | FTS5 virtual table for text search | +| `documents_fts_docsize` | FTS5 shadow B-tree (19x faster for COUNT) | +| `document_labels` | Fast label filtering (indexed exact-match) | +| `document_paths` | File path association for DiffNote filtering | +| `embeddings` | vec0 virtual table; rowid = document_id * 1000 + chunk_index | +| `embedding_metadata` | Chunk provenance + staleness tracking (document_hash) | +| `dirty_sources` | Documents needing regeneration (with backoff via next_attempt_at) | + +### Infrastructure + +| Table | Purpose | +|---|---| +| `sync_runs` | Sync history with metrics | +| `sync_cursors` | Per-resource sync position (updated_at cursor + tie_breaker_id) | +| `app_locks` | Crash-safe single-flight lock | +| `raw_payloads` | Raw JSON storage for debugging | +| `pending_discussion_fetches` | Dependent discussion fetch queue | +| `pending_dependent_fetches` | Job queue for resource_events, mr_closes, mr_diffs | +| `schema_version` | Migration tracking | + +--- + +## G. Glossary + +| Term | Definition | +|---|---| +| **IID** | Issue/MR number within a project (not globally unique) | +| **FTS5** | SQLite full-text search extension (BM25 ranking) | +| **vec0** | SQLite extension for vector similarity search | +| **RRF** | Reciprocal Rank Fusion — combines FTS and vector rankings | +| **DiffNote** | Comment attached to a specific line in a merge request diff | +| **Entity reference** | Cross-reference between issues/MRs (closes, mentioned, related) | +| **Rename chain** | BFS traversal of mr_file_changes to follow file renames | +| **Attention state** | Computed field on `me` items: needs_attention, not_started, stale, etc. | +| **Surgical sync** | Fetching specific entities by IID instead of full incremental sync |