teernisse 3f38b3fda7 docs: add comprehensive command surface analysis

Deep analysis of the full `lore` CLI command surface (34 commands across
6 categories) covering command inventory, data flow, overlap analysis,
and optimization proposals.

Document structure:
- Main consolidated doc: docs/command-surface-analysis.md (1251 lines)
- Split sections in docs/command-surface-analysis/ for navigation:
  00-overview.md      - Summary, inventory, priorities
  01-entity-commands.md   - issues, mrs, notes, search, count
  02-intelligence-commands.md - who, timeline, me, file-history, trace, related, drift
  03-pipeline-and-infra.md    - sync, ingest, generate-docs, embed, diagnostics
  04-data-flow.md     - Shared data source map, command network graph
  05-overlap-analysis.md  - Quantified overlap percentages for every command pair
  06-agent-workflows.md   - Common agent flows, round-trip costs, token profiles
  07-consolidation-proposals.md  - 5 proposals to reduce 34 commands to 29
  08-robot-optimization-proposals.md - 6 proposals for --include, --batch, --depth
  09-appendices.md    - Robot output envelope, field presets, exit codes

Key findings:
- High overlap pairs: who-workload/me (~85%), health/doctor (~90%)
- 5 consolidation proposals to reduce command count by 15%
- 6 robot-mode optimization proposals targeting agent round-trip reduction
- Full DB table mapping and data flow documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-28 00:08:31 -05:00

7.2 KiB


Data Flow & Command Network

How commands interconnect through shared data sources and output-to-input dependencies.

1. Command Network Graph

Arrows mean "output of A feeds as input to B":

                    ┌─────────┐
                    │ search  │─────────────────────────────┐
                    └────┬────┘                             │
                         │ iid                              │ topic
                    ┌────▼────┐                        ┌────▼─────┐
              ┌─────│ issues  │◄───────────────────────│ timeline │
              │     │ mrs     │ (detail)               └──────────┘
              │     └────┬────┘                             ▲
              │          │ iid                              │ entity ref
              │     ┌────▼────┐     ┌───────────────┐       │
              │     │ related │     │ file-history  │───────┘
              │     │ drift   │     └──────┬────────┘
              │     └─────────┘            │ MR iids
              │                       ┌────▼────┐
              │                       │  trace  │──── issues (linked)
              │                       └────┬────┘
              │                            │ paths
              │                       ┌────▼────┐
              │                       │   who   │
              │                       │ (expert)│
              │                       └─────────┘
              │
         file paths                   ┌─────────┐
              │                       │   me    │──── issues, mrs (dashboard)
              ▼                       └─────────┘
        ┌──────────┐                       ▲
        │  notes   │                       │ (~same data)
        └──────────┘                  ┌────┴───────┐
                                      │who workload│
                                      └────────────┘

Feed Chains (output of A -> input of B)

| From | To | What Flows |
| --- | --- | --- |
| `search` | `issues`, `mrs` | IIDs from search results -> detail lookup |
| `search` | `timeline` | Topic/query -> chronological history |
| `search` | `related` | Entity IID -> semantic similarity |
| `me` | `issues`, `mrs` | IIDs from dashboard -> detail lookup |
| `trace` | `issues` | Linked issue IIDs -> detail lookup |
| `trace` | `who` | File paths -> expert lookup |
| `file-history` | `mrs` | MR IIDs -> detail lookup |
| `file-history` | `timeline` | Entity refs -> chronological events |
| `timeline` | `issues`, `mrs` | Referenced IIDs -> detail lookup |
| `who expert` | `who reviews` | Username -> review patterns |
| `who expert` | `mrs` | MR IIDs from expert detail -> MR detail |
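
The feed chains can also be read as a directed graph. The sketch below copies the rows of the table into an adjacency map and walks it breadth-first to show how many hops separate one command from another. The `FEEDS` name and the `reachable` helper are illustrative and not part of `lore`.

```python
from collections import deque

# Feed chains copied from the table above: source command -> commands it can feed.
FEEDS = {
    "search": ["issues", "mrs", "timeline", "related"],
    "me": ["issues", "mrs"],
    "trace": ["issues", "who"],
    "file-history": ["mrs", "timeline"],
    "timeline": ["issues", "mrs"],
    "who expert": ["who reviews", "mrs"],
}


def reachable(start: str) -> dict[str, int]:
    """Return every command reachable from `start` with its minimum hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        command = queue.popleft()
        for target in FEEDS.get(command, []):
            if target not in hops:
                hops[target] = hops[command] + 1
                queue.append(target)
    del hops[start]  # the starting command is not its own downstream
    return hops


print(reachable("file-history"))  # {'mrs': 1, 'timeline': 1, 'issues': 2}
```

A hop count above 1 marks a flow where an agent needs at least two round trips to get from the first command's output to the final detail view.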

2. Shared Data Source Map

Which DB tables power which commands. Higher overlap = stronger consolidation signal.

Primary Entity Tables

| Table | Read By |
| --- | --- |
| `issues` | issues, me, who-workload, search, timeline, trace, count, stats |
| `merge_requests` | mrs, me, who-workload, search, timeline, trace, file-history, count, stats |
| `notes` | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history |
| `discussions` | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace |

Relationship Tables

| Table | Read By |
| --- | --- |
| `entity_references` | trace, timeline |
| `mr_file_changes` | trace, file-history, who-overlap |
| `issue_labels` | issues, me |
| `mr_labels` | mrs, me |
| `issue_assignees` | issues, me |
| `mr_reviewers` | mrs, who-expert, who-workload |

Event Tables

| Table | Read By |
| --- | --- |
| `resource_state_events` | timeline, me-activity |
| `resource_label_events` | timeline |
| `resource_milestone_events` | timeline |

Document/Search Tables

| Table | Read By |
| --- | --- |
| `documents` + `documents_fts` | search, stats |
| `embeddings` | search, related, drift |
| `document_labels` | search |
| `document_paths` | search |

Infrastructure Tables

| Table | Read By |
| --- | --- |
| `sync_cursors` | status |
| `dirty_sources` | stats |
| `embedding_metadata` | stats, embed |
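
One rough way to turn the table map into a consolidation signal is to invert it (command -> tables) and compare the table sets of each command pair. The sketch below does that with a Jaccard ratio over the primary entity and relationship tables only. This is a coarse, table-level proxy chosen for illustration; it is not the method behind the overlap percentages quoted elsewhere in this analysis, and the helper names are hypothetical.

```python
from itertools import combinations

# Table -> reading commands, copied from the primary entity and relationship tables.
READ_BY = {
    "issues": {"issues", "me", "who-workload", "search", "timeline", "trace", "count", "stats"},
    "merge_requests": {"mrs", "me", "who-workload", "search", "timeline", "trace",
                       "file-history", "count", "stats"},
    "notes": {"notes", "issues-detail", "mrs-detail", "who-expert", "who-active",
              "search", "timeline", "trace", "file-history"},
    "discussions": {"notes", "issues-detail", "mrs-detail", "who-active",
                    "who-reviews", "timeline", "trace"},
    "entity_references": {"trace", "timeline"},
    "mr_file_changes": {"trace", "file-history", "who-overlap"},
    "issue_labels": {"issues", "me"},
    "mr_labels": {"mrs", "me"},
    "issue_assignees": {"issues", "me"},
    "mr_reviewers": {"mrs", "who-expert", "who-workload"},
}


def tables_by_command(read_by):
    """Invert the map: command -> set of tables it reads."""
    per_command = {}
    for table, commands in read_by.items():
        for command in commands:
            per_command.setdefault(command, set()).add(table)
    return per_command


def jaccard(a, b):
    """Shared tables divided by all tables touched by either command."""
    return len(a & b) / len(a | b)


def top_overlaps(read_by, limit=5):
    """Command pairs sorted by descending table overlap, ties broken by name."""
    per_command = tables_by_command(read_by)
    scored = [
        (jaccard(per_command[a], per_command[b]), a, b)
        for a, b in combinations(sorted(per_command), 2)
    ]
    scored.sort(key=lambda row: (-row[0], row[1], row[2]))
    return scored[:limit]


for score, a, b in top_overlaps(READ_BY):
    print(f"{a:<14} {b:<14} {score:.2f}")
```

Two commands that read the same tables can still differ widely in filters and output shape, so a high ratio here is a prompt to look closer rather than a verdict.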

3. Shared-Data Clusters

Commands that read from the same primary tables form natural clusters:

Cluster A: Issue/MR Entities

issues, mrs, me, who workload, count

All read issues + merge_requests with similar filter patterns (state, author, labels, project). These commands share the same underlying WHERE-clause builder logic.

Cluster B: Notes/Discussions

notes, issues detail, mrs detail, who expert, who active, timeline

All traverse the discussions -> notes join path. The notes command does it with independent filters; the others embed notes within parent context.

Cluster C: File Genealogy

trace, file-history, who overlap

All use mr_file_changes with a breadth-first search over the rename chain (forward: old_path -> new_path; backward: new_path -> old_path), implemented in the shared resolve_rename_chain() function.

Cluster D: Semantic/Vector

search, related, drift

All use documents + embeddings via Ollama. search adds an FTS component; related is pure vector search; drift uses vectors for divergence scoring.

Cluster E: Diagnostics

health, auth, doctor, status, stats

All check system state. health is a strict subset of doctor. status checks sync cursors; stats checks document/index health; auth checks token/connectivity.

Dynamic Filter Builder (used by issues, mrs, notes)

All three list commands use the same pattern: build a WHERE clause dynamically from filter flags, binding each value as a query parameter rather than interpolating it into the SQL. Label filters use an EXISTS subquery against the label junction table.
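
A minimal sketch of this pattern, using Python's sqlite3 for brevity. The column names (`state`, `author_username`, `issue_id`, `label`) and the `build_issue_query` helper are assumptions made for the example; only the `issues` and `issue_labels` table names come from this document.

```python
import sqlite3


def build_issue_query(state=None, author=None, labels=()):
    """Return (sql, params) for an issue list query built from optional filters."""
    clauses, params = [], []
    if state is not None:
        clauses.append("i.state = ?")
        params.append(state)
    if author is not None:
        clauses.append("i.author_username = ?")
        params.append(author)
    for label in labels:
        # One EXISTS subquery per requested label, so every label must match.
        clauses.append(
            "EXISTS (SELECT 1 FROM issue_labels il"
            " WHERE il.issue_id = i.id AND il.label = ?)"
        )
        params.append(label)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    return f"SELECT i.iid FROM issues i{where} ORDER BY i.iid", params


def demo_db():
    """Tiny in-memory database with the hypothetical schema used above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE issues (id INTEGER PRIMARY KEY, iid INTEGER,
                             state TEXT, author_username TEXT);
        CREATE TABLE issue_labels (issue_id INTEGER, label TEXT);
        INSERT INTO issues VALUES (1, 101, 'opened', 'alice'),
                                  (2, 102, 'opened', 'bob'),
                                  (3, 103, 'closed', 'alice');
        INSERT INTO issue_labels VALUES (1, 'bug'), (1, 'p1'), (2, 'bug'), (3, 'bug');
    """)
    return conn


sql, params = build_issue_query(state="opened", labels=["bug", "p1"])
print(demo_db().execute(sql, params).fetchall())  # [(101,)]
```

Only fixed SQL fragments are concatenated; user-supplied values travel in `params`, which is what keeps a dynamically built clause safe from injection.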

Rename Chain BFS (used by trace, file-history, who overlap)

Forward query:

SELECT DISTINCT new_path FROM mr_file_changes
WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed'

Backward query:

SELECT DISTINCT old_path FROM mr_file_changes
WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed'

Cycles are detected with a HashSet of visited paths, and traversal is capped at MAX_RENAME_HOPS = 10.
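
The traversal can be sketched as follows, in Python with an in-memory SQLite database. The two queries are the ones shown above, rewritten with plain `?` placeholders; the function name `rename_chain_sketch`, the table definition, and the sample rows are assumptions for illustration, not the actual resolve_rename_chain() code.

```python
import sqlite3
from collections import deque

MAX_RENAME_HOPS = 10  # hop cap named in the text

# Same queries as above, with plain "?" placeholders bound in order:
# project_id first, then the path.
FORWARD_SQL = (
    "SELECT DISTINCT new_path FROM mr_file_changes "
    "WHERE project_id = ? AND old_path = ? AND change_type = 'renamed'"
)
BACKWARD_SQL = (
    "SELECT DISTINCT old_path FROM mr_file_changes "
    "WHERE project_id = ? AND new_path = ? AND change_type = 'renamed'"
)


def rename_chain_sketch(conn, project_id, path):
    """Return all paths linked to `path` by renames, including `path` itself."""
    visited = {path}  # doubles as cycle detection
    queue = deque([(path, 0)])
    while queue:
        current, hops = queue.popleft()
        if hops >= MAX_RENAME_HOPS:
            continue  # paths at the hop limit are kept but not expanded further
        for sql in (FORWARD_SQL, BACKWARD_SQL):
            for (neighbor,) in conn.execute(sql, (project_id, current)):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append((neighbor, hops + 1))
    return sorted(visited)


def demo_db():
    """In-memory table with a three-file rename cycle plus unrelated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mr_file_changes "
        "(project_id INTEGER, old_path TEXT, new_path TEXT, change_type TEXT)"
    )
    conn.executemany(
        "INSERT INTO mr_file_changes VALUES (?, ?, ?, ?)",
        [
            (1, "src/a.rs", "src/b.rs", "renamed"),
            (1, "src/b.rs", "src/c.rs", "renamed"),
            (1, "src/c.rs", "src/a.rs", "renamed"),   # closes a cycle
            (1, "src/x.rs", "src/x.rs", "modified"),  # not a rename
            (2, "src/c.rs", "src/z.rs", "renamed"),   # different project
        ],
    )
    return conn


print(rename_chain_sketch(demo_db(), 1, "src/b.rs"))
# ['src/a.rs', 'src/b.rs', 'src/c.rs']
```

The visited set is what lets the walk terminate on the a -> b -> c -> a cycle, while the hop cap bounds the work on very long rename histories.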

Hybrid Search (used by search, timeline seeding)

RRF ranking (Reciprocal Rank Fusion, k = 60): score = 1 / (60 + fts_rank) + 1 / (60 + vector_rank)
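
A minimal sketch of rank fusion, using the standard Reciprocal Rank Fusion form in which each result list contributes 1 / (k + rank) with k = 60 and 1-based ranks. Whether `lore` uses exactly this form is an assumption, and `rrf_fuse` is a hypothetical name.

```python
def rrf_fuse(fts_ids, vector_ids, k=60):
    """Fuse two ranked ID lists; return (doc_id, score) pairs, best first."""
    scores = {}
    for ranked in (fts_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            # A document missing from one list simply gets no contribution from it.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = rrf_fuse(fts_ids=[1, 2, 3], vector_ids=[2, 3, 4])
print([doc_id for doc_id, _ in fused])  # [2, 3, 1, 4]
```

Documents 2 and 3 appear in both lists and therefore outrank document 1, even though document 1 was the top FTS hit. Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale.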

FTS5 queries go through to_fts_query(), which sanitizes input and builds MATCH expressions. Vector search calls Ollama to embed the query, then ranks results by cosine similarity against the embeddings vec0 table.
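
To illustrate what sanitizing input for a MATCH expression can involve, the sketch below extracts word tokens and wraps each in double quotes, so FTS5 treats them as plain strings rather than as operators or column filters. This is one common approach and an assumption on our part; the real to_fts_query() may behave differently, and `sketch_fts_query` is a hypothetical stand-in.

```python
import re


def sketch_fts_query(raw: str) -> str:
    """Turn free text into a conservative FTS5 MATCH expression."""
    # Keep only word characters; punctuation such as quotes, colons and
    # parentheses is dropped instead of being passed to the query parser.
    tokens = re.findall(r"\w+", raw)
    # Quoted tokens are FTS5 strings, so AND/OR/NOT/NEAR lose their operator
    # meaning; adjacent strings are combined with an implicit AND.
    return " ".join(f'"{token}"' for token in tokens)


print(sketch_fts_query('auth AND "token: refresh'))  # "auth" "AND" "token" "refresh"
```

An empty result means the input had no searchable tokens; a caller would need to skip the MATCH clause in that case, since FTS5 rejects an empty expression.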

Project Resolution (used by most commands)

resolve_project(conn, project_filter) does fuzzy matching on path_with_namespace (suffix and substring matching) and returns (project_id, path_with_namespace).
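
A sketch of how tiered fuzzy matching of this kind might look. The tier order (exact, then path-segment suffix, then substring), the case-insensitive comparison, and the error on ambiguous matches are all assumptions; the function takes an in-memory list instead of a connection and is named `resolve_project_sketch` to keep it distinct from the real resolve_project().

```python
def resolve_project_sketch(projects, project_filter):
    """Resolve a filter to one (project_id, path_with_namespace) pair.

    `projects` is a list of (project_id, path_with_namespace) tuples.
    """
    needle = project_filter.lower()
    tiers = (
        lambda path: path == needle,               # exact match
        lambda path: path.endswith("/" + needle),  # suffix on a segment boundary
        lambda path: needle in path,               # substring anywhere
    )
    for matches in tiers:
        hits = [(pid, path) for pid, path in projects if matches(path.lower())]
        if len(hits) == 1:
            return hits[0]
        if len(hits) > 1:
            candidates = ", ".join(path for _, path in hits)
            raise ValueError(f"ambiguous project '{project_filter}': {candidates}")
    raise LookupError(f"no project matches '{project_filter}'")


PROJECTS = [(1, "group/backend"), (2, "group/frontend"), (3, "other/backend-tools")]
print(resolve_project_sketch(PROJECTS, "backend"))  # (1, 'group/backend')
print(resolve_project_sketch(PROJECTS, "tools"))    # (3, 'other/backend-tools')
```

Trying the stricter tiers first lets a short filter such as `backend` resolve cleanly even when a looser substring match would be ambiguous.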
