gitlore/docs/command-surface-analysis/04-data-flow.md
teernisse 3f38b3fda7 docs: add comprehensive command surface analysis
Deep analysis of the full `lore` CLI command surface (34 commands across
6 categories) covering command inventory, data flow, overlap analysis,
and optimization proposals.

Document structure:
- Main consolidated doc: docs/command-surface-analysis.md (1251 lines)
- Split sections in docs/command-surface-analysis/ for navigation:
  00-overview.md      - Summary, inventory, priorities
  01-entity-commands.md   - issues, mrs, notes, search, count
  02-intelligence-commands.md - who, timeline, me, file-history, trace, related, drift
  03-pipeline-and-infra.md    - sync, ingest, generate-docs, embed, diagnostics
  04-data-flow.md     - Shared data source map, command network graph
  05-overlap-analysis.md  - Quantified overlap percentages for every command pair
  06-agent-workflows.md   - Common agent flows, round-trip costs, token profiles
  07-consolidation-proposals.md  - 5 proposals to reduce 34 commands to 29
  08-robot-optimization-proposals.md - 6 proposals for --include, --batch, --depth
  09-appendices.md    - Robot output envelope, field presets, exit codes

Key findings:
- High overlap pairs: who-workload/me (~85%), health/doctor (~90%)
- 5 consolidation proposals to reduce command count by 15%
- 6 robot-mode optimization proposals targeting agent round-trip reduction
- Full DB table mapping and data flow documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 00:08:31 -05:00


Data Flow & Command Network

How commands interconnect through shared data sources and output-to-input dependencies.


1. Command Network Graph

An arrow from A to B means the output of A feeds into B as input:

                    ┌─────────┐
                    │ search  │─────────────────────────────┐
                    └────┬────┘                             │
                         │ iid                              │ topic
                    ┌────▼────┐                        ┌────▼─────┐
              ┌─────│ issues  │◄───────────────────────│ timeline │
              │     │ mrs     │ (detail)               └──────────┘
              │     └────┬────┘                             ▲
              │          │ iid                              │ entity ref
              │     ┌────▼────┐     ┌──────────────┐        │
              │     │ related │     │ file-history │────────┘
              │     │ drift   │     └──────┬───────┘
              │     └─────────┘            │ MR iids
              │                       ┌────▼────┐
              │                       │  trace  │──── issues (linked)
              │                       └────┬────┘
              │                            │ paths
              │                       ┌────▼────┐
              │                       │   who   │
              │                       │ (expert)│
              │                       └─────────┘
              │
         file paths                   ┌─────────┐
              │                       │   me    │──── issues, mrs (dashboard)
              ▼                       └─────────┘
        ┌──────────┐                       ▲
        │  notes   │                       │ (~same data)
        └──────────┘                  ┌────┴─────────┐
                                      │ who workload │
                                      └──────────────┘

Feed Chains (output of A -> input of B)

| From | To | What Flows |
|------|----|------------|
| search | issues, mrs | IIDs from search results -> detail lookup |
| search | timeline | Topic/query -> chronological history |
| search | related | Entity IID -> semantic similarity |
| me | issues, mrs | IIDs from dashboard -> detail lookup |
| trace | issues | Linked issue IIDs -> detail lookup |
| trace | who | File paths -> expert lookup |
| file-history | mrs | MR IIDs -> detail lookup |
| file-history | timeline | Entity refs -> chronological events |
| timeline | issues, mrs | Referenced IIDs -> detail lookup |
| who expert | who reviews | Username -> review patterns |
| who expert | mrs | MR IIDs from expert detail -> MR detail |
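The feed chains form a small directed graph. The minimal Python sketch below walks that graph breadth-first and lists every command reachable from a starting command. This is one way to enumerate candidate multi-hop agent flows.

A few assumptions apply:

- The edges are transcribed from the table above, with multi-target rows split into one edge per target.
- The `FEED_CHAINS` layout and the `downstream` helper are hypothetical and are not lore code.

```python
from collections import deque

# Edges transcribed from the Feed Chains table: command -> commands it can feed.
# Hypothetical representation for illustration only.
FEED_CHAINS: dict[str, list[str]] = {
    "search": ["issues", "mrs", "timeline", "related"],
    "me": ["issues", "mrs"],
    "trace": ["issues", "who"],
    "file-history": ["mrs", "timeline"],
    "timeline": ["issues", "mrs"],
    "who expert": ["who reviews", "mrs"],
}


def downstream(start: str) -> list[str]:
    """Commands reachable from `start` by following feed chains, in BFS order."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        command = queue.popleft()
        for target in FEED_CHAINS.get(command, []):
            if target not in seen:  # visit each command once, even if reachable twice
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(downstream("file-history"))  # ['mrs', 'timeline', 'issues']
```

Here `file-history` reaches `issues` only through `timeline`. That two-hop chain appears in the table as two separate rows.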

2. Shared Data Source Map

Which DB tables power which commands. Higher overlap = stronger consolidation signal.

Primary Entity Tables

| Table | Read By |
|-------|---------|
| issues | issues, me, who-workload, search, timeline, trace, count, stats |
| merge_requests | mrs, me, who-workload, search, timeline, trace, file-history, count, stats |
| notes | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history |
| discussions | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace |

Relationship Tables

| Table | Read By |
|-------|---------|
| entity_references | trace, timeline |
| mr_file_changes | trace, file-history, who-overlap |
| issue_labels | issues, me |
| mr_labels | mrs, me |
| issue_assignees | issues, me |
| mr_reviewers | mrs, who-expert, who-workload |

Event Tables

| Table | Read By |
|-------|---------|
| resource_state_events | timeline, me-activity |
| resource_label_events | timeline |
| resource_milestone_events | timeline |

Document/Search Tables

| Table | Read By |
|-------|---------|
| documents + documents_fts | search, stats |
| embeddings | search, related, drift |
| document_labels | search |
| document_paths | search |

Infrastructure Tables

| Table | Read By |
|-------|---------|
| sync_cursors | status |
| dirty_sources | stats |
| embedding_metadata | stats, embed |
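To make "higher overlap = stronger consolidation signal" concrete, one crude proxy is the Jaccard index of the sets of tables two commands read. The Python sketch below transcribes only the Primary Entity and Relationship tables.

The metric and the helper names (`TABLE_READERS`, `tables_read_by`, `jaccard`) are illustrative assumptions. This is not necessarily how the overlap percentages elsewhere in this analysis were computed.

```python
# Table -> commands that read it, transcribed from the Primary Entity and
# Relationship tables (event, search, and infrastructure tables omitted).
TABLE_READERS: dict[str, list[str]] = {
    "issues": ["issues", "me", "who-workload", "search", "timeline", "trace", "count", "stats"],
    "merge_requests": ["mrs", "me", "who-workload", "search", "timeline", "trace",
                       "file-history", "count", "stats"],
    "notes": ["notes", "issues-detail", "mrs-detail", "who-expert", "who-active",
              "search", "timeline", "trace", "file-history"],
    "discussions": ["notes", "issues-detail", "mrs-detail", "who-active", "who-reviews",
                    "timeline", "trace"],
    "entity_references": ["trace", "timeline"],
    "mr_file_changes": ["trace", "file-history", "who-overlap"],
    "issue_labels": ["issues", "me"],
    "mr_labels": ["mrs", "me"],
    "issue_assignees": ["issues", "me"],
    "mr_reviewers": ["mrs", "who-expert", "who-workload"],
}


def tables_read_by(command: str) -> set[str]:
    """Tables whose "Read By" list includes `command`."""
    return {table for table, readers in TABLE_READERS.items() if command in readers}


def jaccard(a: str, b: str) -> float:
    """Shared tables divided by all tables either command reads (0.0 if neither reads any)."""
    ta, tb = tables_read_by(a), tables_read_by(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(sorted(tables_read_by("file-history")))  # ['merge_requests', 'mr_file_changes', 'notes']
print(jaccard("trace", "file-history"))        # 0.5
```

Every table file-history reads is also read by trace (three of trace's six), which fits their grouping in Cluster C.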

3. Shared-Data Clusters

Commands that read from the same primary tables form natural clusters:

Cluster A: Issue/MR Entities

issues, mrs, me, who workload, count

All read issues + merge_requests with similar filter patterns (state, author, labels, project). These commands share the same underlying WHERE-clause builder logic.

Cluster B: Notes/Discussions

notes, issues detail, mrs detail, who expert, who active, timeline

All traverse the discussions -> notes join path. The notes command does it with independent filters; the others embed notes within their parent entity's context.
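The shared discussions -> notes join path can be sketched with Python's built-in `sqlite3`. The sketch runs the same join two ways: embedded in one parent entity (detail style), and with an independent filter (`notes` style).

The schema is hypothetical. The column names `noteable_type`, `noteable_iid`, `discussion_id`, `author`, and `created_at`, and the helpers `notes_in_parent` and `notes_by_author`, are illustrative and may not match lore's tables.

```python
import sqlite3

# Hypothetical minimal schema and sample rows; lore's real columns may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE discussions (
    id INTEGER PRIMARY KEY,
    noteable_type TEXT NOT NULL,   -- 'Issue' or 'MergeRequest'
    noteable_iid INTEGER NOT NULL
);
CREATE TABLE notes (
    id INTEGER PRIMARY KEY,
    discussion_id INTEGER NOT NULL REFERENCES discussions(id),
    author TEXT NOT NULL,
    body TEXT NOT NULL,
    created_at TEXT NOT NULL
);
INSERT INTO discussions VALUES (1, 'Issue', 7), (2, 'MergeRequest', 7);
INSERT INTO notes VALUES
    (1, 1, 'alice', 'repro steps', '2024-01-01'),
    (2, 1, 'bob',   'confirmed',   '2024-01-02'),
    (3, 2, 'alice', 'fix pushed',  '2024-01-03');
""")

# The join path both styles share (a constant, never user input).
JOIN = "FROM discussions d JOIN notes n ON n.discussion_id = d.id"


def notes_in_parent(kind: str, iid: int) -> list:
    """Detail style: notes embedded in one parent issue or MR, oldest first."""
    sql = f"SELECT n.author, n.body {JOIN} WHERE d.noteable_type = ? AND d.noteable_iid = ? ORDER BY n.created_at"
    return conn.execute(sql, (kind, iid)).fetchall()


def notes_by_author(author: str) -> list:
    """`notes` style: same join, independent filter, parent reported per row."""
    sql = f"SELECT d.noteable_type, d.noteable_iid, n.body {JOIN} WHERE n.author = ? ORDER BY n.created_at"
    return conn.execute(sql, (author,)).fetchall()


print(notes_in_parent("Issue", 7))  # [('alice', 'repro steps'), ('bob', 'confirmed')]
print(notes_by_author("alice"))     # [('Issue', 7, 'repro steps'), ('MergeRequest', 7, 'fix pushed')]
```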

Cluster C: File Genealogy

trace, file-history, who overlap

All use mr_file_changes with rename chain BFS (forward: old_path -> new_path, backward: new_path -> old_path) and share the resolve_rename_chain() function.

Cluster D: Semantic/Vector

search, related, drift

All use documents and embeddings via Ollama: search adds an FTS component, related is pure vector, and drift uses vectors for divergence scoring.

Cluster E: Diagnostics

health, auth, doctor, status, stats

All check system state. health's checks are a strict subset of doctor's; status checks sync cursors; stats checks document and index health; auth checks the token and connectivity.


4. Query Pattern Sharing

Dynamic Filter Builder (used by issues, mrs, notes)

All three list commands use the same pattern: they build a WHERE clause dynamically from filter flags, with parameterized placeholders for the values. Labels use an EXISTS subquery against the relevant junction table.
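A minimal Python/`sqlite3` sketch of the dynamic WHERE builder, applied to issues. Each set flag adds one clause, values travel only as bound `?` parameters, and each label adds one EXISTS subquery against the junction table.

The following are assumptions for illustration:

- The column names `state`, `author_username`, `project_id`, `issue_id`, and `label_name`.
- The `IssueFilters`, `build_where`, and `list_issue_iids` names.
- AND semantics across multiple labels.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IssueFilters:
    state: Optional[str] = None
    author: Optional[str] = None
    project_id: Optional[int] = None
    labels: list = field(default_factory=list)  # assumed: every label must be present


def build_where(f: IssueFilters) -> tuple:
    """Return (where_sql, params); filter values are never spliced into the SQL text."""
    clauses, params = [], []
    for column, value in (("i.state", f.state), ("i.author_username", f.author),
                          ("i.project_id", f.project_id)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    for label in f.labels:
        # One EXISTS per label against the junction table gives AND semantics.
        clauses.append("EXISTS (SELECT 1 FROM issue_labels il "
                       "WHERE il.issue_id = i.id AND il.label_name = ?)")
        params.append(label)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return where, params


# Hypothetical sample data to exercise the builder.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER PRIMARY KEY, iid INTEGER, project_id INTEGER,
                     state TEXT, author_username TEXT);
CREATE TABLE issue_labels (issue_id INTEGER, label_name TEXT);
INSERT INTO issues VALUES (1, 10, 1, 'opened', 'alice'), (2, 11, 1, 'closed', 'bob'),
                          (3, 12, 2, 'opened', 'alice');
INSERT INTO issue_labels VALUES (1, 'bug'), (1, 'backend'), (3, 'bug');
""")


def list_issue_iids(f: IssueFilters) -> list:
    where, params = build_where(f)
    rows = conn.execute(f"SELECT i.iid FROM issues i{where} ORDER BY i.iid", params)
    return [iid for (iid,) in rows]


print(list_issue_iids(IssueFilters(state="opened", labels=["bug"])))  # [10, 12]
```

Because a label filter is a correlated EXISTS rather than a JOIN, an issue with several labels never appears twice in the results, so no DISTINCT is needed.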

Rename Chain BFS (used by trace, file-history, who overlap)

Forward query:

SELECT DISTINCT new_path FROM mr_file_changes
WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed'

Backward query:

SELECT DISTINCT old_path FROM mr_file_changes
WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed'

Cycle detection uses a HashSet of visited paths, and traversal is capped at MAX_RENAME_HOPS = 10.
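A minimal Python sketch of the bidirectional rename BFS that runs the two queries above verbatim against SQLite. A visited set provides cycle detection, and the sketch stops expanding at `MAX_RENAME_HOPS`.

Some details are assumptions:

- `rename_chain` is a hypothetical stand-in for resolve_rename_chain().
- Applying the hop cap to BFS depth, counted across both directions, is a guess.
- The minimal `mr_file_changes` columns are illustrative.

```python
import sqlite3
from collections import deque

MAX_RENAME_HOPS = 10  # value from the text; applied to BFS depth here (assumption)

# The forward and backward queries from the text, verbatim.
FORWARD = """SELECT DISTINCT new_path FROM mr_file_changes
WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed'"""
BACKWARD = """SELECT DISTINCT old_path FROM mr_file_changes
WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed'"""


def rename_chain(conn: sqlite3.Connection, project_id: int, path: str) -> set:
    """All paths linked to `path` by renames in either direction (hypothetical helper)."""
    visited = {path}  # cycle detection: a path is enqueued at most once
    queue = deque([(path, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops >= MAX_RENAME_HOPS:
            continue  # do not expand past the hop cap
        for sql in (FORWARD, BACKWARD):
            for (neighbor,) in conn.execute(sql, (project_id, current)):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, hops + 1))
    return visited


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mr_file_changes (project_id INTEGER, old_path TEXT, "
             "new_path TEXT, change_type TEXT)")
conn.executemany("INSERT INTO mr_file_changes VALUES (?, ?, ?, ?)", [
    (1, "src/a.rs", "src/b.rs", "renamed"),
    (1, "src/b.rs", "src/c.rs", "renamed"),
    (1, "src/c.rs", "src/a.rs", "renamed"),   # rename cycle back to the start
    (1, "src/c.rs", "src/c.rs", "modified"),  # not a rename: ignored
    (2, "src/b.rs", "src/z.rs", "renamed"),   # other project: ignored
])
print(sorted(rename_chain(conn, 1, "src/b.rs")))  # ['src/a.rs', 'src/b.rs', 'src/c.rs']
```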

Hybrid Search (used by search, timeline seeding)

RRF (Reciprocal Rank Fusion) ranking with k = 60: score = 1 / (60 + fts_rank) + 1 / (60 + vector_rank)
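A minimal Python sketch of Reciprocal Rank Fusion in its textbook form. Each document's score is the sum, over the ranked lists that contain it, of 1 / (k + rank), with k = 60 and ranks starting at 1.

Two points are assumptions about lore's behavior:

- That lore uses exactly this form.
- That a document missing from one list simply gets no contribution from it.

`rrf_fuse` is a hypothetical name.

```python
def rrf_fuse(fts_ranked: list, vector_ranked: list, k: int = 60) -> list:
    """Fuse two best-first ID lists; return (doc_id, score) pairs, best first.

    score(d) = sum over the lists containing d of 1 / (k + rank), rank from 1.
    """
    scores: dict = {}
    for ranked in (fts_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda item: -item[1])


fused = rrf_fuse(["a", "b", "c"], ["c", "a", "d"])
print([doc for doc, _ in fused])  # ['a', 'c', 'b', 'd']
```

With k = 60, a document ranked 10th in both lists (2/70 ≈ 0.029) outscores one ranked 1st in only one list (1/61 ≈ 0.016). Agreement between the FTS and vector legs therefore dominates a single top placement.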

FTS5 queries go through to_fts_query(), which sanitizes input and builds MATCH expressions. Vector search calls Ollama to embed the query, then runs a cosine-similarity search against the embeddings vec0 table.
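Two minimal Python sketches of the pieces this paragraph names: an FTS5 input sanitizer and the cosine-similarity math.

The sanitizer quotes every token as an FTS5 string, doubling any embedded double quote. User-typed operators or punctuation are then tokenized as ordinary text instead of being parsed as query syntax. Adjacent strings combine with FTS5's implicit AND.

`to_fts_query_sketch` is a hypothetical name; lore's to_fts_query() may differ, for example in prefix handling. In lore the similarity is computed inside the vec0 table. The plain-list `cosine_similarity` shows only the formula.

```python
import math


def to_fts_query_sketch(user_input: str) -> str:
    """Turn free text into a safe FTS5 MATCH expression (hypothetical sketch).

    Each whitespace-separated token becomes a double-quoted FTS5 string with
    embedded quotes doubled, so AND/OR/NOT/NEAR, '*', '-' or ':' typed by the
    user are not interpreted as query syntax. Returns "" for blank input;
    callers should skip the FTS leg in that case.
    """
    return " ".join('"' + token.replace('"', '""') + '"' for token in user_input.split())


def cosine_similarity(a: list, b: list) -> float:
    """a.b / (|a| |b|); 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


print(to_fts_query_sketch('auth AND "login'))        # "auth" "AND" """login"
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))     # 0.0
```

Without quoting, the unbalanced quote in `auth AND "login` would make the MATCH expression a syntax error. With quoting, it becomes three ANDed phrases.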

Project Resolution (used by most commands)

resolve_project(conn, project_filter) fuzzy-matches project_filter against path_with_namespace using suffix and substring matching, and returns (project_id, path_with_namespace).
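A minimal Python sketch of tiered fuzzy project resolution over an in-memory project list. The text only states that resolve_project uses suffix and substring matching. The following are assumptions for illustration:

- The tier order: exact, then suffix on a `/` boundary, then substring.
- Case-insensitivity.
- The ambiguity error.

`resolve_project_sketch` and `ProjectResolutionError` are hypothetical names, and the real function takes a DB connection rather than a list.

```python
class ProjectResolutionError(ValueError):
    """No project, or more than one project, matched the filter."""


def resolve_project_sketch(projects: list, project_filter: str) -> tuple:
    """Resolve `project_filter` to (project_id, path_with_namespace).

    Tiers are tried strictest first; the first tier with any hit decides.
    """
    needle = project_filter.strip("/").lower()
    tiers = [
        lambda path: path == needle,               # exact
        lambda path: path.endswith("/" + needle),  # suffix on a path-segment boundary
        lambda path: needle in path,               # substring
    ]
    for matches in tiers:
        hits = [(pid, path) for pid, path in projects if matches(path.lower())]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            names = ", ".join(path for _, path in hits)
            raise ProjectResolutionError(f"ambiguous project filter {project_filter!r}: {names}")
    raise ProjectResolutionError(f"no project matches {project_filter!r}")


PROJECTS = [(1, "acme/platform/api"), (2, "acme/platform/api-gateway"), (3, "acme/web")]
print(resolve_project_sketch(PROJECTS, "api"))  # (1, 'acme/platform/api')
```

Trying the stricter tier first lets a short filter like `api` resolve uniquely, even though it is a substring of two project paths.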