docs: add comprehensive command surface analysis

Deep analysis of the full `lore` CLI command surface (34 commands across
6 categories) covering command inventory, data flow, overlap analysis,
and optimization proposals.

Document structure:
- Main consolidated doc: docs/command-surface-analysis.md (1251 lines)
- Split sections in docs/command-surface-analysis/ for navigation:
  00-overview.md      - Summary, inventory, priorities
  01-entity-commands.md   - issues, mrs, notes, search, count
  02-intelligence-commands.md - who, timeline, me, file-history, trace, related, drift
  03-pipeline-and-infra.md    - sync, ingest, generate-docs, embed, diagnostics
  04-data-flow.md     - Shared data source map, command network graph
  05-overlap-analysis.md  - Quantified overlap percentages for every command pair
  06-agent-workflows.md   - Common agent flows, round-trip costs, token profiles
  07-consolidation-proposals.md  - 5 proposals to reduce 34 commands to 29
  08-robot-optimization-proposals.md - 6 proposals for --include, --batch, --depth
  09-appendices.md    - Robot output envelope, field presets, exit codes

Key findings:
- High overlap pairs: who-workload/me (~85%), health/doctor (~90%)
- 5 consolidation proposals to reduce command count by 15%
- 6 robot-mode optimization proposals targeting agent round-trip reduction
- Full DB table mapping and data flow documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
teernisse
2026-02-27 07:31:36 -05:00
parent 439c20e713
commit 3f38b3fda7
11 changed files with 3604 additions and 0 deletions

@@ -0,0 +1,179 @@
# Data Flow & Command Network
How commands interconnect through shared data sources and output-to-input dependencies.

---
## 1. Command Network Graph
An arrow from A to B means that A's output feeds B's input:
```
        ┌─────────┐
        │ search  │─────────────────────────────┐
        └────┬────┘                             │
             │ iid                              │ topic
        ┌────▼────┐                        ┌────▼─────┐
  ┌─────│ issues  │◄───────────────────────│ timeline │
  │     │ mrs     │        (detail)        └──────────┘
  │     └────┬────┘                             ▲
  │          │ iid                              │ entity ref
  │     ┌────▼────┐      ┌──────────────┐       │
  │     │ related │      │ file-history │───────┘
  │     │ drift   │      └──────┬───────┘
  │     └─────────┘             │ MR iids
  │                        ┌────▼────┐
  │                        │  trace  │──── issues (linked)
  │                        └────┬────┘
  │                             │ paths
  │                        ┌────▼────┐
  │                        │   who   │
  │                        │ (expert)│
  │                        └─────────┘
file paths                 ┌─────────┐
  │                        │   me    │──── issues, mrs (dashboard)
  ▼                        └─────────┘
┌──────────┐                    ▲
│  notes   │                    │ (~same data)
└──────────┘               ┌────┴───────┐
                           │who workload│
                           └────────────┘
```
### Feed Chains (output of A -> input of B)
| From | To | What Flows |
|---|---|---|
| `search` | `issues`, `mrs` | IIDs from search results -> detail lookup |
| `search` | `timeline` | Topic/query -> chronological history |
| `search` | `related` | Entity IID -> semantic similarity |
| `me` | `issues`, `mrs` | IIDs from dashboard -> detail lookup |
| `trace` | `issues` | Linked issue IIDs -> detail lookup |
| `trace` | `who` | File paths -> expert lookup |
| `file-history` | `mrs` | MR IIDs -> detail lookup |
| `file-history` | `timeline` | Entity refs -> chronological events |
| `timeline` | `issues`, `mrs` | Referenced IIDs -> detail lookup |
| `who expert` | `who reviews` | Username -> review patterns |
| `who expert` | `mrs` | MR IIDs from expert detail -> MR detail |
---
## 2. Shared Data Source Map
The tables below list which DB tables each command reads. The more tables two commands share, the stronger the case for consolidating them.
### Primary Entity Tables
| Table | Read By |
|---|---|
| `issues` | issues, me, who-workload, search, timeline, trace, count, stats |
| `merge_requests` | mrs, me, who-workload, search, timeline, trace, file-history, count, stats |
| `notes` | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history |
| `discussions` | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace |
### Relationship Tables
| Table | Read By |
|---|---|
| `entity_references` | trace, timeline |
| `mr_file_changes` | trace, file-history, who-overlap |
| `issue_labels` | issues, me |
| `mr_labels` | mrs, me |
| `issue_assignees` | issues, me |
| `mr_reviewers` | mrs, who-expert, who-workload |
### Event Tables
| Table | Read By |
|---|---|
| `resource_state_events` | timeline, me-activity |
| `resource_label_events` | timeline |
| `resource_milestone_events` | timeline |
### Document/Search Tables
| Table | Read By |
|---|---|
| `documents` + `documents_fts` | search, stats |
| `embeddings` | search, related, drift |
| `document_labels` | search |
| `document_paths` | search |
### Infrastructure Tables
| Table | Read By |
|---|---|
| `sync_cursors` | status |
| `dirty_sources` | stats |
| `embedding_metadata` | stats, embed |
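
One way to turn this map into a number is to compare the sets of tables that two commands read. The sketch below computes Jaccard overlap over a hand-copied excerpt of the rows above. The metric, the `READ_BY` constant, and both function names are assumptions made for illustration; they are not how the analysis derived its overlap percentages. With this excerpt, `table_overlap(&map, "trace", "file-history")` is one shared table out of two.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Excerpt of the "Read By" tables above: (table, commands that read it).
/// `documents` stands in for the combined `documents` + `documents_fts` row.
const READ_BY: &[(&str, &[&str])] = &[
    ("embeddings", &["search", "related", "drift"]),
    ("documents", &["search", "stats"]),
    ("mr_file_changes", &["trace", "file-history", "who-overlap"]),
    ("entity_references", &["trace", "timeline"]),
];

/// Inverts (table, readers) rows into a command -> set-of-tables map.
fn tables_by_command(rows: &[(&str, &[&str])]) -> BTreeMap<String, BTreeSet<String>> {
    let mut map: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (table, readers) in rows {
        for cmd in readers.iter() {
            map.entry(cmd.to_string())
                .or_default()
                .insert(table.to_string());
        }
    }
    map
}

/// Jaccard overlap of the tables read by `a` and `b`: |A ∩ B| / |A ∪ B|.
/// Returns 0.0 when neither command appears in the map.
fn table_overlap(map: &BTreeMap<String, BTreeSet<String>>, a: &str, b: &str) -> f64 {
    let empty = BTreeSet::new();
    let ta = map.get(a).unwrap_or(&empty);
    let tb = map.get(b).unwrap_or(&empty);
    let union = ta.union(tb).count();
    if union == 0 {
        return 0.0;
    }
    ta.intersection(tb).count() as f64 / union as f64
}
```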
---
## 3. Shared-Data Clusters
Commands that read from the same primary tables form natural clusters:
### Cluster A: Issue/MR Entities
`issues`, `mrs`, `me`, `who workload`, `count`

All read `issues` + `merge_requests` with similar filter patterns (state, author, labels, project). These commands share the same underlying WHERE-clause builder logic.
### Cluster B: Notes/Discussions
`notes`, `issues detail`, `mrs detail`, `who expert`, `who active`, `timeline`

All traverse the `discussions` -> `notes` join path. The `notes` command does it with independent filters; the others embed notes within parent context.
### Cluster C: File Genealogy
`trace`, `file-history`, `who overlap`

All use `mr_file_changes` with a BFS over the rename chain (forward: `old_path` -> `new_path`; backward: `new_path` -> `old_path`), implemented in the shared `resolve_rename_chain()` function.
### Cluster D: Semantic/Vector
`search`, `related`, `drift`

All use `documents` + `embeddings` via Ollama. `search` adds an FTS component; `related` is pure vector; `drift` uses vectors for divergence scoring.
### Cluster E: Diagnostics
`health`, `auth`, `doctor`, `status`, `stats`

All check system state. The checks run by `health` are a strict subset of those run by `doctor`. `status` checks sync cursors. `stats` checks document/index health. `auth` checks token/connectivity.

---
## 4. Query Pattern Sharing
### Dynamic Filter Builder (used by issues, mrs, notes)
All three list commands use the same pattern: they build the WHERE clause dynamically from filter flags and pass every value as a bound parameter. Label filters use an `EXISTS` subquery against the junction table.
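
The pattern can be sketched as follows. This is a minimal illustration, not the actual builder: the `IssueFilters` struct, the `build_issue_query` function, the `author_username` column, and the `labels` table joined through `issue_labels` are all assumptions. Each value is pushed onto `params` before its clause is formatted, so the `?N` index always matches the value's position.

```rust
/// Hypothetical filter flags for a list command.
#[derive(Default)]
struct IssueFilters {
    state: Option<String>,
    author: Option<String>,
    labels: Vec<String>,
}

/// Returns the SQL text plus the values to bind, in placeholder order.
fn build_issue_query(f: &IssueFilters) -> (String, Vec<String>) {
    let mut clauses: Vec<String> = Vec::new();
    let mut params: Vec<String> = Vec::new();

    if let Some(state) = &f.state {
        params.push(state.clone());
        clauses.push(format!("i.state = ?{}", params.len()));
    }
    if let Some(author) = &f.author {
        params.push(author.clone());
        clauses.push(format!("i.author_username = ?{}", params.len()));
    }
    // One EXISTS subquery per requested label (assumed schema).
    for label in &f.labels {
        params.push(label.clone());
        clauses.push(format!(
            "EXISTS (SELECT 1 FROM issue_labels il JOIN labels l ON l.id = il.label_id \
             WHERE il.issue_id = i.id AND l.name = ?{})",
            params.len()
        ));
    }

    let mut sql = String::from("SELECT i.iid, i.title FROM issues i");
    if !clauses.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&clauses.join(" AND "));
    }
    (sql, params)
}
```

Because user input only ever reaches the query through `params`, the generated SQL text depends on which flags are set, never on their values.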
### Rename Chain BFS (used by trace, file-history, who overlap)
Forward query:
```sql
SELECT DISTINCT new_path FROM mr_file_changes
WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed'
```
Backward query:
```sql
SELECT DISTINCT old_path FROM mr_file_changes
WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed'
```
Cycles are detected with a `HashSet` of visited paths, and traversal is capped at `MAX_RENAME_HOPS = 10`.
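
The traversal can be sketched without a database by treating the `renamed` rows as an in-memory edge list. This is an illustration under stated assumptions, not the real `resolve_rename_chain()`: the `rename_chain` name and signature are hypothetical, and the hop cap is interpreted here as a maximum BFS depth from the starting path.

```rust
use std::collections::{HashSet, VecDeque};

const MAX_RENAME_HOPS: usize = 10;

/// Collects every path reachable from `start` by following renames in both
/// directions. Each tuple is one renamed row as (old_path, new_path).
fn rename_chain(renames: &[(&str, &str)], start: &str) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((path, hops)) = queue.pop_front() {
        // Paths at the hop limit are kept but not expanded further.
        if hops >= MAX_RENAME_HOPS {
            continue;
        }
        for &(old, new) in renames {
            // Forward: old_path -> new_path. Backward: new_path -> old_path.
            let next = if old == path {
                new
            } else if new == path {
                old
            } else {
                continue;
            };
            // `insert` returns false for an already-visited path, which is
            // what breaks rename cycles.
            if visited.insert(next.to_string()) {
                queue.push_back((next.to_string(), hops + 1));
            }
        }
    }

    let mut paths: Vec<String> = visited.into_iter().collect();
    paths.sort();
    paths
}
```

In the real commands each loop over `renames` would presumably be replaced by the two SQL queries above, one per direction.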
### Hybrid Search (used by search, timeline seeding)
RRF ranking: `score = 1 / (60 + fts_rank) + 1 / (60 + vector_rank)`

FTS5 queries go through `to_fts_query()`, which sanitizes input and builds MATCH expressions. Vector search calls Ollama to embed the query, then runs a cosine-similarity search against the `embeddings` vec0 table.
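
Reciprocal rank fusion is conventionally written as a sum of `1 / (k + rank)` terms, one per ranked list, with `k = 60` as a common default. The sketch below assumes that conventional form and 1-based ranks. The `rrf_fuse` name, the `RRF_K` constant, and the use of plain `u64` document ids are illustrative and not taken from `lore`.

```rust
use std::collections::HashMap;

/// Smoothing constant; 60 is the commonly used default.
const RRF_K: f64 = 60.0;

/// Fuses two ranked lists of document ids (best first) into one list of
/// (doc_id, score) pairs, highest score first.
fn rrf_fuse(fts: &[u64], vector: &[u64]) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in [fts, vector] {
        for (idx, doc_id) in list.iter().enumerate() {
            let rank = (idx + 1) as f64; // ranks are 1-based
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Sort by descending score; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

A document that appears in only one list still receives a score from that list, so results found by FTS alone or by vector search alone are not dropped, only ranked below documents that both methods agree on at similar positions.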
### Project Resolution (used by most commands)
`resolve_project(conn, project_filter)` does fuzzy matching on `path_with_namespace`, trying suffix and substring matches, and returns `(project_id, path_with_namespace)`.
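
A minimal sketch of suffix-then-substring matching over an in-memory project list follows. Everything beyond "suffix and substring matching" is an assumption here: the `resolve_project_sketch` name, case-insensitive comparison, preferring suffix hits over substring hits, and reporting ambiguous filters as errors. The real function takes a database connection instead of a slice.

```rust
/// Resolves `filter` against (project_id, path_with_namespace) pairs.
fn resolve_project_sketch(
    projects: &[(i64, &str)],
    filter: &str,
) -> Result<(i64, String), String> {
    let needle = filter.to_lowercase();
    let suffix = format!("/{}", needle);

    // Tier 1: exact match, or suffix match on a path-segment boundary.
    let mut hits: Vec<&(i64, &str)> = projects
        .iter()
        .filter(|(_, path)| {
            let path = path.to_lowercase();
            path == needle || path.ends_with(&suffix)
        })
        .collect();

    // Tier 2: fall back to substring matching only if tier 1 found nothing.
    if hits.is_empty() {
        hits = projects
            .iter()
            .filter(|(_, path)| path.to_lowercase().contains(&needle))
            .collect();
    }

    match hits.as_slice() {
        [] => Err(format!("no project matches '{}'", filter)),
        [(id, path)] => Ok((*id, path.to_string())),
        _ => Err(format!("'{}' is ambiguous ({} matches)", filter, hits.len())),
    }
}
```

Checking suffixes before substrings means a filter like `backend` resolves to `group/backend` even when `other/backend-tools` also exists.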