docs: add comprehensive command surface analysis

Deep analysis of the full `lore` CLI command surface (34 commands across 6 categories) covering command inventory, data flow, overlap analysis, and optimization proposals.

Document structure:
- Main consolidated doc: docs/command-surface-analysis.md (1251 lines)
- Split sections in docs/command-surface-analysis/ for navigation:
    00-overview.md - Summary, inventory, priorities
    01-entity-commands.md - issues, mrs, notes, search, count
    02-intelligence-commands.md - who, timeline, me, file-history, trace, related, drift
    03-pipeline-and-infra.md - sync, ingest, generate-docs, embed, diagnostics
    04-data-flow.md - Shared data source map, command network graph
    05-overlap-analysis.md - Quantified overlap percentages for every command pair
    06-agent-workflows.md - Common agent flows, round-trip costs, token profiles
    07-consolidation-proposals.md - 5 proposals to reduce 34 commands to 29
    08-robot-optimization-proposals.md - 6 proposals for --include, --batch, --depth
    09-appendices.md - Robot output envelope, field presets, exit codes

Key findings:
- High overlap pairs: who-workload/me (~85%), health/doctor (~90%)
- 5 consolidation proposals to reduce command count by 15%
- 6 robot-mode optimization proposals targeting agent round-trip reduction
- Full DB table mapping and data flow documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-27 07:31:36 -05:00
parent 439c20e713
commit 3f38b3fda7
11 changed files with 3604 additions and 0 deletions
--- a/docs/command-surface-analysis/04-data-flow.md
+++ b/docs/command-surface-analysis/04-data-flow.md
@@ -0,0 +1,179 @@
+# Data Flow & Command Network
+
+How commands interconnect through shared data sources and output-to-input dependencies.
+
+---
+
+## 1. Command Network Graph
+
+Arrows mean "output of A feeds as input to B":
+
+```
+                    ┌─────────┐
+                    │ search  │─────────────────────────────┐
+                    └────┬────┘                             │
+                         │ iid                              │ topic
+                    ┌────▼────┐                        ┌────▼─────┐
+              ┌─────│ issues  │◄───────────────────────│ timeline │
+              │     │ mrs     │ (detail)               └──────────┘
+              │     └────┬────┘                             ▲
+              │          │ iid                              │ entity ref
+              │     ┌────▼────┐     ┌──────────────┐       │
+              │     │ related │     │ file-history │───────┘
+              │     │ drift   │     └──────┬───────┘
+              │     └─────────┘            │ MR iids
+              │                       ┌────▼────┐
+              │                       │  trace  │──── issues (linked)
+              │                       └────┬────┘
+              │                            │ paths
+              │                       ┌────▼────┐
+              │                       │   who   │
+              │                       │ (expert)│
+              │                       └─────────┘
+              │
+         file paths                   ┌─────────┐
+              │                       │   me    │──── issues, mrs (dashboard)
+              ▼                       └─────────┘
+        ┌──────────┐                       ▲
+        │  notes   │                       │ (~same data)
+        └──────────┘                  ┌────┴──────┐
+                                      │who workload│
+                                      └───────────┘
+```
+
+### Feed Chains (output of A -> input of B)
+
+| From | To | What Flows |
+|---|---|---|
+| `search` | `issues`, `mrs` | IIDs from search results -> detail lookup |
+| `search` | `timeline` | Topic/query -> chronological history |
+| `search` | `related` | Entity IID -> semantic similarity |
+| `me` | `issues`, `mrs` | IIDs from dashboard -> detail lookup |
+| `trace` | `issues` | Linked issue IIDs -> detail lookup |
+| `trace` | `who` | File paths -> expert lookup |
+| `file-history` | `mrs` | MR IIDs -> detail lookup |
+| `file-history` | `timeline` | Entity refs -> chronological events |
+| `timeline` | `issues`, `mrs` | Referenced IIDs -> detail lookup |
+| `who expert` | `who reviews` | Username -> review patterns |
+| `who expert` | `mrs` | MR IIDs from expert detail -> MR detail |
+
+---
+
+## 2. Shared Data Source Map
+
+Which DB tables power which commands. Higher overlap = stronger consolidation signal.
+
+### Primary Entity Tables
+
+| Table | Read By |
+|---|---|
+| `issues` | issues, me, who-workload, search, timeline, trace, count, stats |
+| `merge_requests` | mrs, me, who-workload, search, timeline, trace, file-history, count, stats |
+| `notes` | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history |
+| `discussions` | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace |
+
+### Relationship Tables
+
+| Table | Read By |
+|---|---|
+| `entity_references` | trace, timeline |
+| `mr_file_changes` | trace, file-history, who-overlap |
+| `issue_labels` | issues, me |
+| `mr_labels` | mrs, me |
+| `issue_assignees` | issues, me |
+| `mr_reviewers` | mrs, who-expert, who-workload |
+
+### Event Tables
+
+| Table | Read By |
+|---|---|
+| `resource_state_events` | timeline, me-activity |
+| `resource_label_events` | timeline |
+| `resource_milestone_events` | timeline |
+
+### Document/Search Tables
+
+| Table | Read By |
+|---|---|
+| `documents` + `documents_fts` | search, stats |
+| `embeddings` | search, related, drift |
+| `document_labels` | search |
+| `document_paths` | search |
+
+### Infrastructure Tables
+
+| Table | Read By |
+|---|---|
+| `sync_cursors` | status |
+| `dirty_sources` | stats |
+| `embedding_metadata` | stats, embed |
+
+---
+
+## 3. Shared-Data Clusters
+
+Commands that read from the same primary tables form natural clusters:
+
+### Cluster A: Issue/MR Entities
+
+`issues`, `mrs`, `me`, `who workload`, `count`
+
+All read `issues` + `merge_requests` with similar filter patterns (state, author, labels, project). These commands share the same underlying WHERE-clause builder logic.
+
+### Cluster B: Notes/Discussions
+
+`notes`, `issues detail`, `mrs detail`, `who expert`, `who active`, `timeline`
+
+All traverse the `discussions` -> `notes` join path. The `notes` command does it with independent filters; the others embed notes within parent context.
+
+### Cluster C: File Genealogy
+
+`trace`, `file-history`, `who overlap`
+
+All use `mr_file_changes` with rename chain BFS (forward: old_path -> new_path, backward: new_path -> old_path). Shared `resolve_rename_chain()` function.
+
+### Cluster D: Semantic/Vector
+
+`search`, `related`, `drift`
+
+All use `documents` + `embeddings` via Ollama. `search` adds an FTS component; `related` is pure vector similarity; `drift` uses vectors for divergence scoring.
+
+### Cluster E: Diagnostics
+
+`health`, `auth`, `doctor`, `status`, `stats`
+
+All check system state. The checks run by `health` are a strict subset of those run by `doctor`. `status` checks sync cursors, `stats` checks document/index health, and `auth` checks token/connectivity.
+
+---
+
+## 4. Query Pattern Sharing
+
+### Dynamic Filter Builder (used by issues, mrs, notes)
+
+All three list commands use the same pattern: build a WHERE clause dynamically from the filter flags, binding each value as a query parameter instead of interpolating it into the SQL. Label filters use an `EXISTS` subquery against the junction table.
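+
+The pattern can be sketched in Python as follows. This is an illustration only, not the project's code: the function name `build_where`, the column names (`state`, `author_username`, `label_id`), the `labels` table, and the AND semantics for multiple labels are all assumptions. Only `issues` and `issue_labels` come from the table map above.
+
+```python
+def build_where(filters):
+    """Build a parameterized WHERE clause from optional filter values."""
+    clauses, params = [], []
+    if filters.get("state"):
+        clauses.append("i.state = ?")            # hypothetical column name
+        params.append(filters["state"])
+    if filters.get("author"):
+        clauses.append("i.author_username = ?")  # hypothetical column name
+        params.append(filters["author"])
+    # Each label adds its own EXISTS subquery against the junction table,
+    # so several labels combine with AND semantics (an assumption).
+    for label in filters.get("labels", []):
+        clauses.append(
+            "EXISTS (SELECT 1 FROM issue_labels il "
+            "JOIN labels l ON l.id = il.label_id "
+            "WHERE il.issue_id = i.id AND l.name = ?)"
+        )
+        params.append(label)
+    where = " WHERE " + " AND ".join(clauses) if clauses else ""
+    return where, params
+
+
+where, params = build_where({"state": "opened", "labels": ["bug", "backend"]})
+sql = "SELECT i.iid FROM issues i" + where
+```
+
+User-supplied values only ever land in `params`, never in the SQL text, and an empty filter set yields no WHERE clause at all.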
+
+### Rename Chain BFS (used by trace, file-history, who overlap)
+
+Forward query:
+```sql
+SELECT DISTINCT new_path FROM mr_file_changes
+WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed'
+```
+
+Backward query:
+```sql
+SELECT DISTINCT old_path FROM mr_file_changes
+WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed'
+```
+
+Cycle detection uses a `HashSet` of visited paths, and traversal is capped at `MAX_RENAME_HOPS = 10`.
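+
+A minimal Python sketch of the traversal, run against an in-memory SQLite table. It reuses the two queries above with plain `?` placeholders, and a Python `set` stands in for the `HashSet`. The function signature, the reduced table schema, the sample rows, and the reading of `MAX_RENAME_HOPS` as a BFS depth limit are assumptions made for illustration.
+
+```python
+import sqlite3
+from collections import deque
+
+MAX_RENAME_HOPS = 10  # hop limit stated above; treated here as max BFS depth
+
+FORWARD = ("SELECT DISTINCT new_path FROM mr_file_changes "
+           "WHERE project_id = ? AND old_path = ? AND change_type = 'renamed'")
+BACKWARD = ("SELECT DISTINCT old_path FROM mr_file_changes "
+            "WHERE project_id = ? AND new_path = ? AND change_type = 'renamed'")
+
+
+def resolve_rename_chain(conn, project_id, path):
+    """Return every path linked to `path` by renames, in discovery order."""
+    visited = {path}              # cycle detection
+    order = [path]
+    queue = deque([(path, 0)])
+    while queue:
+        current, hops = queue.popleft()
+        if hops >= MAX_RENAME_HOPS:
+            continue              # do not expand beyond the hop limit
+        for query in (FORWARD, BACKWARD):
+            for (neighbor,) in conn.execute(query, (project_id, current)):
+                if neighbor not in visited:
+                    visited.add(neighbor)
+                    order.append(neighbor)
+                    queue.append((neighbor, hops + 1))
+    return order
+
+
+conn = sqlite3.connect(":memory:")
+conn.execute("CREATE TABLE mr_file_changes "
+             "(project_id INTEGER, old_path TEXT, new_path TEXT, change_type TEXT)")
+conn.executemany(
+    "INSERT INTO mr_file_changes VALUES (?, ?, ?, ?)",
+    [
+        (1, "src/a.rs", "src/b.rs", "renamed"),
+        (1, "src/b.rs", "src/c.rs", "renamed"),
+        (1, "src/c.rs", "src/a.rs", "renamed"),   # closes a cycle
+        (1, "src/x.rs", "src/x.rs", "modified"),  # not a rename, never followed
+    ],
+)
+chain = resolve_rename_chain(conn, 1, "src/b.rs")
+```
+
+Starting from `src/b.rs`, the forward query finds `src/c.rs` and the backward query finds `src/a.rs`; the visited set stops the cycle from being walked again.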
+
+### Hybrid Search (used by search, timeline seeding)
+
+RRF ranking (standard reciprocal rank fusion, `k = 60`): `score = 1 / (60 + fts_rank) + 1 / (60 + vector_rank)`
+
+FTS5 queries go through `to_fts_query()`, which sanitizes input and builds MATCH expressions. Vector search calls Ollama to embed the query, then ranks by cosine similarity against the `embeddings` vec0 table.
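+
+For reference, a small Python sketch of reciprocal rank fusion in its conventional form, where each ranking contributes `1 / (k + rank)` and `k = 60`. Whether `lore` uses exactly this form is an assumption to check against the source; the function name `rrf_fuse`, the tie-break rule, and the sample document IDs are hypothetical.
+
+```python
+RRF_K = 60  # conventional smoothing constant (assumed)
+
+
+def rrf_fuse(fts_ids, vector_ids, k=RRF_K):
+    """Fuse two ranked ID lists (best first) into a single ranking."""
+    scores = {}
+    for ranking in (fts_ids, vector_ids):
+        for rank, doc_id in enumerate(ranking, start=1):
+            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
+    # Highest fused score first; ties broken by ID for determinism.
+    return sorted(scores, key=lambda d: (-scores[d], d))
+
+
+fused = rrf_fuse(["d1", "d2", "d3"], ["d2", "d4", "d1"])
+```
+
+A document that appears in both lists (`d1`, `d2`) outranks one that appears in only one list, even when the single-list hit has a better individual rank.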
+
+### Project Resolution (used by most commands)
+
+`resolve_project(conn, project_filter)` does fuzzy matching on `path_with_namespace`, using suffix and substring matching. Returns `(project_id, path_with_namespace)`.
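+
+One way such matching could work, sketched in Python. To stay self-contained it takes a list of `(project_id, path_with_namespace)` tuples instead of a database connection. The precedence (suffix before substring), the case-insensitive comparison, and the error on zero or multiple matches are assumptions, not documented behavior.
+
+```python
+def resolve_project(projects, project_filter):
+    """Resolve a filter to exactly one (project_id, path_with_namespace)."""
+    needle = project_filter.lower()
+    # Stricter rule first: exact match or a whole trailing path segment.
+    suffix = [p for p in projects
+              if p[1].lower() == needle or p[1].lower().endswith("/" + needle)]
+    # Fall back to a plain substring match only when no suffix matched.
+    matches = suffix or [p for p in projects if needle in p[1].lower()]
+    if len(matches) != 1:
+        raise LookupError(f"{len(matches)} projects match {project_filter!r}")
+    return matches[0]
+
+
+# Hypothetical project list for illustration.
+PROJECTS = [
+    (1, "acme/platform/api"),
+    (2, "acme/platform/web"),
+    (3, "acme/tools/api-client"),
+]
+resolved = resolve_project(PROJECTS, "api")
+```
+
+Here `api` resolves to `acme/platform/api` by suffix even though `acme/tools/api-client` also contains the string, while an ambiguous filter such as `platform` raises instead of guessing.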