docs: add comprehensive command surface analysis
Deep analysis of the full `lore` CLI command surface (34 commands across 6 categories) covering command inventory, data flow, overlap analysis, and optimization proposals.

Document structure:
- Main consolidated doc: docs/command-surface-analysis.md (1251 lines)
- Split sections in docs/command-surface-analysis/ for navigation:
  - 00-overview.md - Summary, inventory, priorities
  - 01-entity-commands.md - issues, mrs, notes, search, count
  - 02-intelligence-commands.md - who, timeline, me, file-history, trace, related, drift
  - 03-pipeline-and-infra.md - sync, ingest, generate-docs, embed, diagnostics
  - 04-data-flow.md - Shared data source map, command network graph
  - 05-overlap-analysis.md - Quantified overlap percentages for every command pair
  - 06-agent-workflows.md - Common agent flows, round-trip costs, token profiles
  - 07-consolidation-proposals.md - 5 proposals to reduce 34 commands to 29
  - 08-robot-optimization-proposals.md - 6 proposals for --include, --batch, --depth
  - 09-appendices.md - Robot output envelope, field presets, exit codes

Key findings:
- High overlap pairs: who-workload/me (~85%), health/doctor (~90%)
- 5 consolidation proposals to reduce command count by 15%
- 6 robot-mode optimization proposals targeting agent round-trip reduction
- Full DB table mapping and data flow documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
docs/command-surface-analysis/04-data-flow.md (new file, 179 lines)
# Data Flow & Command Network

How commands interconnect through shared data sources and output-to-input dependencies.

---

## 1. Command Network Graph

Arrows mean "output of A feeds as input to B":
```
            ┌─────────┐
            │ search  │─────────────────────────┐
            └────┬────┘                         │
                 │ iid                          │ topic
            ┌────▼────┐                    ┌────▼─────┐
      ┌─────│ issues  │◄───────────────────│ timeline │
      │     │ mrs     │     (detail)       └──────────┘
      │     └────┬────┘                         ▲
      │          │ iid                          │ entity ref
      │     ┌────▼────┐   ┌──────────────┐      │
      │     │ related │   │ file-history │──────┘
      │     │ drift   │   └──────┬───────┘
      │     └─────────┘          │ MR iids
      │                     ┌────▼────┐
      │                     │  trace  │──── issues (linked)
      │                     └────┬────┘
      │                          │ paths
      │                     ┌────▼────┐
      │                     │   who   │
      │                     │ (expert)│
      │                     └─────────┘
      │
      │ file paths          ┌─────────┐
      │                     │   me    │──── issues, mrs (dashboard)
      ▼                     └─────────┘
┌──────────┐                     ▲
│  notes   │                     │ (~same data)
└──────────┘               ┌─────┴──────┐
                           │who workload│
                           └────────────┘
```

### Feed Chains (output of A -> input of B)

| From | To | What Flows |
|---|---|---|
| `search` | `issues`, `mrs` | IIDs from search results -> detail lookup |
| `search` | `timeline` | Topic/query -> chronological history |
| `search` | `related` | Entity IID -> semantic similarity |
| `me` | `issues`, `mrs` | IIDs from dashboard -> detail lookup |
| `trace` | `issues` | Linked issue IIDs -> detail lookup |
| `trace` | `who` | File paths -> expert lookup |
| `file-history` | `mrs` | MR IIDs -> detail lookup |
| `file-history` | `timeline` | Entity refs -> chronological events |
| `timeline` | `issues`, `mrs` | Referenced IIDs -> detail lookup |
| `who expert` | `who reviews` | Username -> review patterns |
| `who expert` | `mrs` | MR IIDs from expert detail -> MR detail |
---

## 2. Shared Data Source Map

Which DB tables power which commands. Higher overlap = stronger consolidation signal.

### Primary Entity Tables

| Table | Read By |
|---|---|
| `issues` | issues, me, who-workload, search, timeline, trace, count, stats |
| `merge_requests` | mrs, me, who-workload, search, timeline, trace, file-history, count, stats |
| `notes` | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history |
| `discussions` | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace |
### Relationship Tables

| Table | Read By |
|---|---|
| `entity_references` | trace, timeline |
| `mr_file_changes` | trace, file-history, who-overlap |
| `issue_labels` | issues, me |
| `mr_labels` | mrs, me |
| `issue_assignees` | issues, me |
| `mr_reviewers` | mrs, who-expert, who-workload |
### Event Tables

| Table | Read By |
|---|---|
| `resource_state_events` | timeline, me-activity |
| `resource_label_events` | timeline |
| `resource_milestone_events` | timeline |
### Document/Search Tables

| Table | Read By |
|---|---|
| `documents` + `documents_fts` | search, stats |
| `embeddings` | search, related, drift |
| `document_labels` | search |
| `document_paths` | search |
### Infrastructure Tables

| Table | Read By |
|---|---|
| `sync_cursors` | status |
| `dirty_sources` | stats |
| `embedding_metadata` | stats, embed |
---

## 3. Shared-Data Clusters

Commands that read from the same primary tables form natural clusters:

### Cluster A: Issue/MR Entities

`issues`, `mrs`, `me`, `who workload`, `count`

All read `issues` + `merge_requests` with similar filter patterns (state, author, labels, project). These commands share the same underlying WHERE-clause builder logic.
### Cluster B: Notes/Discussions

`notes`, `issues detail`, `mrs detail`, `who expert`, `who active`, `timeline`

All traverse the `discussions` -> `notes` join path. The `notes` command does it with independent filters; the others embed notes within parent context.
### Cluster C: File Genealogy

`trace`, `file-history`, `who overlap`

All use `mr_file_changes` with rename chain BFS (forward: old_path -> new_path, backward: new_path -> old_path). Shared `resolve_rename_chain()` function.
### Cluster D: Semantic/Vector

`search`, `related`, `drift`

All use `documents` + `embeddings` via Ollama. `search` adds an FTS component; `related` is pure vector; `drift` uses vectors for divergence scoring.
### Cluster E: Diagnostics

`health`, `auth`, `doctor`, `status`, `stats`

All check system state. `health` < `doctor` (strict subset). `status` checks sync cursors. `stats` checks document/index health. `auth` checks token/connectivity.
---

## 4. Query Pattern Sharing

### Dynamic Filter Builder (used by issues, mrs, notes)

All three list commands use the same pattern: build a WHERE clause dynamically from filter flags with parameterized tokens. Labels use an EXISTS subquery against the junction table.
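A minimal sketch of that pattern, with assumed names throughout (`build_where`, `author_username`, and the `issue_labels` alias/columns are illustrative, not taken from the real helper):

```rust
/// Sketch of the shared filter-builder pattern: each set flag appends one
/// parameterized predicate; labels use an EXISTS subquery against the
/// junction table. Names are illustrative assumptions.
fn build_where(
    state: Option<&str>,
    author: Option<&str>,
    label: Option<&str>,
) -> (String, Vec<String>) {
    let mut clauses: Vec<String> = Vec::new();
    let mut params: Vec<String> = Vec::new();
    if let Some(s) = state {
        params.push(s.to_string());
        clauses.push(format!("state = ?{}", params.len()));
    }
    if let Some(a) = author {
        params.push(a.to_string());
        clauses.push(format!("author_username = ?{}", params.len()));
    }
    if let Some(l) = label {
        params.push(l.to_string());
        clauses.push(format!(
            "EXISTS (SELECT 1 FROM issue_labels il WHERE il.issue_id = issues.id AND il.label = ?{})",
            params.len()
        ));
    }
    let where_sql = if clauses.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", clauses.join(" AND "))
    };
    (where_sql, params)
}
```

Because parameter tokens are numbered as flags are consumed, the same builder serves any subset of filters without string interpolation of user input.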
### Rename Chain BFS (used by trace, file-history, who overlap)

Forward query:

```sql
SELECT DISTINCT new_path FROM mr_file_changes
WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed'
```

Backward query:

```sql
SELECT DISTINCT old_path FROM mr_file_changes
WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed'
```

Cycle detection via `HashSet` of visited paths, `MAX_RENAME_HOPS = 10`.
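The walk can be sketched as follows; the two SQL lookups are replaced here by an in-memory edge list of `(old_path, new_path)` pairs, and the function shape is an assumption about `resolve_rename_chain()`, not its real signature:

```rust
use std::collections::{HashSet, VecDeque};

const MAX_RENAME_HOPS: usize = 10;

/// Sketch of the rename-chain BFS: from a starting path, follow rename
/// edges in both directions (forward old->new, backward new->old),
/// bounded by MAX_RENAME_HOPS. The visited HashSet doubles as cycle
/// detection. Edges stand in for the forward/backward SQL queries.
fn resolve_rename_chain(edges: &[(&str, &str)], start: &str) -> HashSet<String> {
    let mut visited: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    visited.insert(start.to_string());
    queue.push_back((start.to_string(), 0));
    while let Some((path, hops)) = queue.pop_front() {
        if hops >= MAX_RENAME_HOPS {
            continue; // hop cap guards against pathological chains
        }
        for (old, new) in edges {
            // forward: old_path -> new_path; backward: new_path -> old_path
            let next = if *old == path.as_str() {
                Some(*new)
            } else if *new == path.as_str() {
                Some(*old)
            } else {
                None
            };
            if let Some(n) = next {
                if visited.insert(n.to_string()) {
                    queue.push_back((n.to_string(), hops + 1));
                }
            }
        }
    }
    visited
}
```

Starting from any path in a chain like `a.rs -> b.rs -> c.rs`, the walk recovers every historical name of the file.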
### Hybrid Search (used by search, timeline seeding)

RRF ranking: `score = (60 / fts_rank) + (60 / vector_rank)`

FTS5 queries go through `to_fts_query()` which sanitizes input and builds MATCH expressions. Vector search calls Ollama to embed the query, then does cosine similarity against `embeddings` vec0 table.
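A sketch of how the two ranked lists combine under that formula (ranks are 1-based; a document missing from one list simply contributes nothing from it; `rrf_merge` is an illustrative name):

```rust
use std::collections::HashMap;

/// Sketch of the RRF merge: each document's score is the sum of 60 / rank
/// over the ranked lists it appears in, then results sort by score
/// descending. Documents found by both FTS and vector search float up.
fn rrf_merge(fts: &[&str], vector: &[&str]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [fts, vector] {
        for (rank0, doc) in list.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 60.0 / (rank0 as f64 + 1.0);
        }
    }
    let mut ranked: Vec<(String, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}
```

For example, a document ranked 2nd by FTS and 1st by vector scores 30 + 60 = 90, beating a document that tops only one list.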
### Project Resolution (used by most commands)

`resolve_project(conn, project_filter)` does fuzzy matching on `path_with_namespace` — suffix and substring matching. Returns `(project_id, path_with_namespace)`.
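A toy sketch of the matching, with a slice of `(project_id, path_with_namespace)` pairs standing in for the DB query; trying suffix before substring is an assumed precedence, and the signature is illustrative rather than the real `resolve_project(conn, project_filter)`:

```rust
/// Sketch of fuzzy project resolution on path_with_namespace:
/// prefer a suffix match, then fall back to a substring match.
fn resolve_project<'a>(
    projects: &[(i64, &'a str)],
    filter: &str,
) -> Option<(i64, &'a str)> {
    projects
        .iter()
        .find(|(_, path)| path.ends_with(filter))
        .or_else(|| projects.iter().find(|(_, path)| path.contains(filter)))
        .copied()
}
```

This lets a user type a bare project name (`lore`) instead of the full `group/lore` namespace path.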