gitlore/command-restructure/CLI_AUDIT.md
teernisse 06852e90a6 docs(cli): add command restructure audit and implementation plan
CLI audit scoring the current command surface across human ergonomics,
robot/agent ergonomics, documentation quality, and flag design. Paired
with a detailed implementation plan for restructuring commands into a
more consistent, discoverable hierarchy.
2026-03-10 11:06:53 -04:00

Gitlore CLI Command Audit

1. Full Command Inventory

29 visible + 4 hidden + 2 stub = 35 total command surface

| # | Command | Aliases | Args | Flags | Purpose |
|---|---------|---------|------|-------|---------|
| 1 | issues | issue | [IID] | 15 | List/show issues |
| 2 | mrs | mr, merge-requests | [IID] | 16 | List/show MRs |
| 3 | notes | note | | 16 | List notes |
| 4 | search | find, query | <QUERY> | 13 | Hybrid FTS+vector search |
| 5 | timeline | | <QUERY> | 11 | Chronological event reconstruction |
| 6 | who | | [TARGET] | 16 | People intelligence (5 modes) |
| 7 | me | | | 10 | Personal dashboard |
| 8 | file-history | | <PATH> | 6 | MRs that touched a file |
| 9 | trace | | <PATH> | 5 | file->MR->issue->discussion chain |
| 10 | drift | | <TYPE> <IID> | 3 | Discussion divergence detection |
| 11 | related | | <QUERY_OR_TYPE> [IID] | 3 | Semantic similarity |
| 12 | count | | <ENTITY> | 2 | Count entities |
| 13 | sync | | | 14 | Full pipeline: ingest+docs+embed |
| 14 | ingest | | [ENTITY] | 5 | Fetch from GitLab API |
| 15 | generate-docs | | | 2 | Build searchable documents |
| 16 | embed | | | 2 | Generate vector embeddings |
| 17 | status | st | | 0 | Last sync times per project |
| 18 | health | | | 0 | Quick pre-flight (exit code only) |
| 19 | doctor | | | 0 | Full environment diagnostic |
| 20 | stats | stat | | 3 | Document/index statistics |
| 21 | init | | | 6 | Setup config + database |
| 22 | auth | | | 0 | Verify GitLab token |
| 23 | token | | subcommand | 1-2 | Token CRUD (set/show) |
| 24 | cron | | subcommand | 0-1 | Auto-sync scheduling |
| 25 | migrate | | | 0 | Apply DB migrations |
| 26 | robot-docs | | | 1 | Agent self-discovery manifest |
| 27 | completions | | <SHELL> | 0 | Shell completions |
| 28 | version | | | 0 | Version info |
| 29 | help | | | | (clap built-in) |
| | Hidden/deprecated: | | | | |
| 30 | list | | <ENTITY> | 14 | deprecated, use issues/mrs |
| 31 | show | | <ENTITY> <IID> | 1 | deprecated, use issues/mrs |
| 32 | auth-test | | | 0 | deprecated, use auth |
| 33 | sync-status | | | 0 | deprecated, use status |
| 34 | backup | | | 0 | Stub (not implemented) |
| 35 | reset | | | 1 | Stub (not implemented) |

2. Semantic Overlap Analysis

Cluster A: "Is the system working?" (4 commands, 1 concept)

| Command | What it checks | Exit code semantics | Has flags? |
|---------|----------------|---------------------|------------|
| health | config exists, DB opens, schema version | 0=healthy, 19=unhealthy | No |
| doctor | config, token, database, Ollama | informational | No |
| status | last sync times per project | informational | No |
| stats | document counts, index size, integrity | informational | --check, --repair |

Problem: A user/agent asking "is lore working?" must choose among four commands. health is a strict subset of doctor. status and stats are near-homonyms that answer different questions -- sync recency vs. index health. count (Cluster E) also overlaps with what stats reports.

Cognitive cost: High. The CLI literature (clig.dev, Heroku CLI design guide, 12-factor CLI) consistently warns against >2 "status" commands. Users build a mental model of "the status command" -- when there are four, they pick the wrong one or give up.

Theoretical basis:

  • Nielsen's "Recognition over Recall" -- Four similar system-status commands force users to recall which one does what. One command with progressive disclosure (flags for depth) lets them recognize the option they need. This is doubly important for LLM agents, which perform better with fewer top-level choices and compositional flags.

  • Hick's Law for CLIs -- Decision time grows with the number of choices on offer. Each additional top-level command adds scanning time for humans and token cost for robots.

Cluster B: "Data pipeline stages" (4 commands, 1 pipeline)

| Command | Pipeline stage | Subsumed by sync? |
|---------|----------------|-------------------|
| sync | ingest -> generate-docs -> embed | -- (is the parent) |
| ingest | GitLab API fetch | sync --no-docs --no-embed |
| generate-docs | Build FTS documents | sync --no-embed (after ingest) |
| embed | Vector embeddings via Ollama | (final stage) |

Problem: sync already has skip flags (--no-embed, --no-docs, --no-events, --no-status, --no-file-changes). The individual stage commands duplicate this with less control -- ingest has --full, --force, --dry-run, but sync also has all three.

The standalone commands exist for granular debugging, but in practice they're reached for <5% of the time. They inflate the help screen while sync handles 95% of use cases.
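The relationship between the skip flags and the pipeline stages can be sketched as a small planning function. This is an illustration only: the `SyncFlags` struct and `planned_stages` function are hypothetical names, only two of sync's skip flags are modeled, and the assumption that ingest always runs while each flag skips exactly one later stage is not confirmed by the audit.

```rust
/// Pipeline stages in execution order, as listed in the Cluster B table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Ingest,
    GenerateDocs,
    Embed,
}

/// Hypothetical mirror of two of sync's skip flags (--no-docs, --no-embed).
#[derive(Debug, Default, Clone, Copy)]
pub struct SyncFlags {
    pub no_docs: bool,
    pub no_embed: bool,
}

/// Returns the stages a sync run would execute under the given flags.
/// Assumption: ingest always runs; each flag only skips its own later stage.
pub fn planned_stages(flags: SyncFlags) -> Vec<Stage> {
    let mut stages = vec![Stage::Ingest];
    if !flags.no_docs {
        stages.push(Stage::GenerateDocs);
    }
    if !flags.no_embed {
        stages.push(Stage::Embed);
    }
    stages
}
```

Under these assumptions, `planned_stages(SyncFlags { no_docs: true, no_embed: true })` yields only the ingest stage, which is the sense in which the standalone ingest command is subsumed by sync.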

Cluster C: "File-centric intelligence" (3 overlapping surfaces)

| Command | Input | Output | Key flags |
|---------|-------|--------|-----------|
| file-history | <PATH> | MRs that touched file | -p, --discussions, --no-follow-renames, --merged, -n |
| trace | <PATH> | file->MR->issue->discussion chains | -p, --discussions, --no-follow-renames, -n |
| who --path <PATH> | <PATH> via flag | experts for file area | -p, --since, -n |
| who --overlap <PATH> | <PATH> via flag | users touching same files | -p, --since, -n |

Problem: trace is a superset of file-history -- it follows the same MR chain but additionally links to closing issues and discussions. They share 4 of 5 filter flags. A user who wants "what happened to this file?" has to choose between two commands that sound nearly identical.

Cluster D: "Semantic discovery" (3 commands, all need embeddings)

| Command | Input | Output |
|---------|-------|--------|
| search | free text query | ranked documents |
| related | entity ref OR free text | similar entities |
| drift | entity ref | divergence score per discussion |

related "some text" is functionally equivalent to a vector-only search, i.e. search "some text" --mode semantic. The difference is that related can also seed from an entity (issues 42), while search only accepts text.

drift is specialized enough to stand alone, but it's only used on issues and has a single non-project flag (--threshold).

Cluster E: "Count" is an orphan

count is a standalone command for SELECT COUNT(*) FROM <table>. This could be:

  • A --count flag on issues/mrs/notes
  • A section in stats output (which already shows counts)
  • Part of status output

It exists as its own top-level command primarily for robot convenience, but adds to the 29-command sprawl.


3. Flag Consistency Audit

Consistent (good patterns)

| Flag | Meaning | Used in |
|------|---------|---------|
| -p, --project | Scope to project (fuzzy) | issues, mrs, notes, search, sync, ingest, generate-docs, timeline, who, me, file-history, trace, drift, related |
| -n, --limit | Max results | issues, mrs, notes, search, timeline, who, me, file-history, trace, related |
| --since | Temporal filter (7d, 2w, YYYY-MM-DD) | issues, mrs, notes, search, timeline, who, me |
| --fields | Field selection / minimal preset | issues, mrs, notes, search, timeline, who, me |
| --full | Reset cursors / full rebuild | sync, ingest, embed, generate-docs |
| --force | Override stale lock | sync, ingest |
| --dry-run | Preview without changes | sync, ingest, stats |

Inconsistencies (problems)

| Issue | Details | Impact |
|-------|---------|--------|
| -f collision | ingest -f = --force, count -f = --for | Robot confusion; violates "same short flag = same semantics" |
| -a inconsistency | issues -a = --author, me has no -a (uses --user for analogous concept) | Minor |
| -s inconsistency | issues -s = --state, search has no -s short flag at all | Missed ergonomic shortcut |
| --sort availability | Present in issues/mrs/notes, absent from search/timeline/file-history | Inconsistent query power |
| --discussions | file-history --discussions, trace --discussions, but issues 42 has no --discussions flag | Can't get discussions when showing an issue |
| --open (browser) | issues -o, mrs -o, notes --open (no -o) | Inconsistent short flag |
| --merged | Only on file-history, not on mrs (which uses --state merged) | Different filter mechanics for same concept |
| Entity type naming | count takes issues, mrs, discussions, notes, events; search --type takes issue, mr, discussion, note (singular) | Singular vs plural for same concept |

Theoretical basis:

  • Principle of Least Surprise (POLS) -- When -f means --force in one command and --for in another, both humans and agents learn the wrong lesson from one interaction and apply it to the other. CLI design guides (GNU standards, POSIX conventions, clig.dev) are unanimous: short flags should have consistent semantics across all subcommands.

  • Singular/plural inconsistency (issues vs issue as entity type values) is particularly harmful for LLM agents, which use pattern matching on prior successful invocations. If lore count issues works, the agent will try lore search --type issues -- and get a parse error.
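A collision such as the -f case can be caught mechanically rather than by review. The sketch below is a hypothetical lint, not part of the current codebase: it takes (command, short flag, long flag) bindings and reports every short flag bound to more than one long flag. The `SAMPLE` data is a small subset taken from the tables in this section.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One short-flag binding: (command, short flag, long flag).
pub type Binding = (&'static str, char, &'static str);

/// Returns every short flag that maps to more than one long flag,
/// together with the distinct long flags it is bound to.
pub fn short_flag_collisions(bindings: &[Binding]) -> BTreeMap<char, BTreeSet<&'static str>> {
    let mut by_short: BTreeMap<char, BTreeSet<&'static str>> = BTreeMap::new();
    for &(_command, short, long) in bindings {
        by_short.entry(short).or_default().insert(long);
    }
    // Keep only the short flags with conflicting meanings.
    by_short.retain(|_, longs| longs.len() > 1);
    by_short
}

/// Sample bindings from the audit tables (illustrative subset, not the full CLI).
pub const SAMPLE: &[Binding] = &[
    ("ingest", 'f', "force"),
    ("count", 'f', "for"),
    ("issues", 'p', "project"),
    ("mrs", 'p', "project"),
    ("issues", 'o', "open"),
    ("mrs", 'o', "open"),
];
```

Run against `SAMPLE`, the only reported collision is `f`, bound to both `for` and `force`. A check like this could run as a unit test over the real flag definitions so that new collisions fail the build.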


4. Robot Ergonomics Assessment

Strengths (well above average for a CLI)

| Feature | Rating | Notes |
|---------|--------|-------|
| Structured output | Excellent | Consistent {ok, data, meta} envelope |
| Auto-detection | Excellent | Non-TTY -> robot mode, LORE_ROBOT env var |
| Error output | Excellent | Structured JSON to stderr with actions array for recovery |
| Exit codes | Excellent | 20 distinct, well-documented codes |
| Self-discovery | Excellent | robot-docs manifest, --brief for token savings |
| Typo tolerance | Excellent | Autocorrect with confidence scores + structured warnings |
| Field selection | Good | --fields minimal saves ~60% tokens |
| No-args behavior | Good | Robot mode auto-outputs robot-docs |

Weaknesses

| Issue | Severity | Recommendation |
|-------|----------|----------------|
| 29 commands in robot-docs manifest | High | Agents spend tokens evaluating which command to use. Grouping would reduce decision space. |
| status/stats/stat near-homonyms | High | LLMs are particularly susceptible to surface-level lexical confusion. stat is an alias for stats while status is a different command -- this guarantees agent errors. |
| Singular vs plural entity types | Medium | count issues works but search --type issues fails. Agents learn from one and apply to the other. |
| Overlapping file commands | Medium | Agent must decide between trace, file-history, and who --path. The decision tree isn't obvious from names alone. |
| count as separate command | Low | Could be a flag; standalone command inflates the decision space |

5. Human Ergonomics Assessment

Strengths

| Feature | Rating | Notes |
|---------|--------|-------|
| Help text quality | Excellent | Every command has examples, help headings organize flags |
| Short flags | Good | -p, -n, -s, -a, -J cover 80% of common use |
| Alias coverage | Good | issue/issues, mr/mrs, st/status, find/search |
| Subcommand inference | Good | lore issu -> issues via clap infer |
| Color/icon system | Good | Auto, with overrides |

Weaknesses

| Issue | Severity | Recommendation |
|-------|----------|----------------|
| 29 commands in flat help | High | Doesn't fit one terminal screen. No grouping -> overwhelming |
| status vs stats naming | High | Humans will type the wrong one repeatedly |
| health vs doctor distinction | Medium | "Which one do I run?" -- unclear from names |
| who 5-mode overload | Medium | Help text is long; mode exclusions are complex |
| Pipeline stages as top-level | Low | ingest/generate-docs/embed rarely used directly but clutter help |
| generate-docs is 13 chars | Low | Longest command name; gen-docs or gendocs would help |

6. Proposals (Ranked by Impact x Feasibility)

P1: Help Grouping (HIGH impact, LOW effort)

Problem: 29 flat commands -> information overload.

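Fix: Group subcommands under headings in the help output, using clap's help_heading if the clap version in use supports it on subcommands, or a custom help template otherwise: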

Query:
  issues         List or show issues [aliases: issue]
  mrs            List or show merge requests [aliases: mr]
  notes          List notes from discussions [aliases: note]
  search         Search indexed documents [aliases: find]
  count          Count entities in local database

Intelligence:
  timeline       Chronological timeline of events
  who            People intelligence: experts, workload, overlap
  me             Personal work dashboard

File Analysis:
  trace          Trace why code was introduced
  file-history   Show MRs that touched a file
  related        Find semantically related entities
  drift          Detect discussion divergence

Data Pipeline:
  sync           Run full sync pipeline
  ingest         Ingest data from GitLab
  generate-docs  Generate searchable documents
  embed          Generate vector embeddings

System:
  init           Initialize configuration and database
  status         Show sync state [aliases: st]
  health         Quick health check
  doctor         Check environment health
  stats          Document and index statistics [aliases: stat]
  auth           Verify GitLab authentication
  token          Manage stored GitLab token
  migrate        Run pending database migrations
  cron           Manage automatic syncing
  completions    Generate shell completions
  robot-docs     Agent self-discovery manifest
  version        Show version information

Effort: ~20 lines of #[command(help_heading = "...")] annotations. No behavior changes.
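The target layout can be prototyped independently of clap. The sketch below is a plain-Rust rendering function that groups (heading, command, description) triples under their headings. It is not clap's API: `COMMANDS` and `render_grouped_help` are hypothetical names, and only a few commands from the listing are included.

```rust
/// (heading, command, one-line description); a subset of the proposed listing.
pub const COMMANDS: &[(&str, &str, &str)] = &[
    ("Query", "issues", "List or show issues"),
    ("Query", "mrs", "List or show merge requests"),
    ("Data Pipeline", "sync", "Run full sync pipeline"),
    ("System", "doctor", "Check environment health"),
];

/// Renders commands under their headings, preserving first-seen heading order.
pub fn render_grouped_help(commands: &[(&str, &str, &str)]) -> String {
    // Collect headings in the order they first appear.
    let mut headings: Vec<&str> = Vec::new();
    for (heading, _, _) in commands {
        if !headings.contains(heading) {
            headings.push(*heading);
        }
    }
    // Pad command names to the longest one so descriptions line up.
    let width = commands.iter().map(|(_, name, _)| name.len()).max().unwrap_or(0);
    let mut out = String::new();
    for heading in &headings {
        out.push_str(heading);
        out.push_str(":\n");
        for (h, name, about) in commands {
            if h == heading {
                out.push_str(&format!("  {:<width$}  {}\n", name, about, width = width));
            }
        }
        out.push('\n');
    }
    out
}
```

Printing `render_grouped_help(COMMANDS)` produces a "Query:" block followed by "Data Pipeline:" and "System:" blocks, each separated by a blank line, which is the shape of the listing proposed above.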

P2: Resolve status/stats Confusion (HIGH impact, LOW effort)

Option A (recommended): Rename stats -> index.

  • lore status = when did I last sync? (pipeline state)
  • lore index = how big is my index? (data inventory)
  • The alias stat goes away (it was causing confusion anyway)

Option B: Rename status -> sync-state and stats -> db-stats. More descriptive but longer.

Option C: Merge both under check (see P4).

P3: Fix Singular/Plural Entity Type Inconsistency (MEDIUM impact, TRIVIAL effort)

Accept both singular and plural forms everywhere:

  • count already takes issues (plural) -- also accept issue
  • search --type already takes issue (singular) -- also accept issues
  • drift takes issues -- also accept issue

This is a ~10 line change in the value parsers and eliminates an entire class of agent errors.
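A minimal sketch of such a value parser, assuming a single shared entity enum (the `EntityType` name and the `parse_entity_type` function are hypothetical; the accepted spellings come from the entity values listed in the flag audit, and the singular "event" is an assumed addition):

```rust
/// Entity kinds named in the audit's flag table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityType {
    Issue,
    MergeRequest,
    Discussion,
    Note,
    Event,
}

/// Accepts both singular and plural spellings for every entity type.
/// Returns None for anything else so the caller can emit a parse error.
pub fn parse_entity_type(raw: &str) -> Option<EntityType> {
    match raw {
        "issue" | "issues" => Some(EntityType::Issue),
        "mr" | "mrs" => Some(EntityType::MergeRequest),
        "discussion" | "discussions" => Some(EntityType::Discussion),
        "note" | "notes" => Some(EntityType::Note),
        "event" | "events" => Some(EntityType::Event),
        _ => None,
    }
}
```

Routing count, search --type, and drift through one parser of this kind would make `lore count issue` and `lore search --type issues` resolve to the same value instead of failing.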

P4: Merge health + doctor (MEDIUM impact, LOW effort)

health is a fast subset of doctor. Merge:

  • lore doctor = full diagnostic (current behavior)
  • lore doctor --quick = fast pre-flight, exit-code-only (current health)
  • Drop health as a separate command, add a hidden alias for backward compat
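The merged behavior could look roughly like the following. This is a sketch under stated assumptions: the `Check` struct, its `in_quick` field, and the `doctor` function are hypothetical; exit codes 0 and 19 come from the Cluster A table; and full mode returning 0 regardless of failures is an assumption based on doctor being described as informational.

```rust
/// One diagnostic check. `in_quick` marks checks the fast pre-flight would run
/// (hypothetical field; the audit only states that health is a subset of doctor).
#[derive(Debug, Clone, Copy)]
pub struct Check {
    pub name: &'static str,
    pub ok: bool,
    pub in_quick: bool,
}

pub const EXIT_HEALTHY: i32 = 0;
pub const EXIT_UNHEALTHY: i32 = 19;

/// Returns (exit code, report lines).
/// Quick mode: exit code only, based on the quick subset of checks.
/// Full mode: one report line per check; exit code assumed informational (0).
pub fn doctor(checks: &[Check], quick: bool) -> (i32, Vec<String>) {
    if quick {
        let healthy = checks.iter().filter(|c| c.in_quick).all(|c| c.ok);
        let code = if healthy { EXIT_HEALTHY } else { EXIT_UNHEALTHY };
        return (code, Vec::new());
    }
    let report = checks
        .iter()
        .map(|c| format!("{}: {}", c.name, if c.ok { "ok" } else { "FAIL" }))
        .collect();
    (EXIT_HEALTHY, report)
}
```

With this split, a failing Ollama check marked `in_quick: false` shows up in the full report but does not fail the quick pre-flight, matching the current division between health and doctor.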

P5: Fix -f Short Flag Collision (MEDIUM impact, TRIVIAL effort)

Change count's -f, --for to just --for (no short flag). -f should mean --force project-wide, or nowhere.

P6: Consolidate trace + file-history (MEDIUM impact, MEDIUM effort)

trace already does everything file-history does plus more. Options:

Option A: Make file-history an alias for trace --flat (shows MR list without issue/discussion linking).

Option B: Add --mrs-only to trace that produces file-history output. Deprecate file-history with a hidden alias.

Either way, one fewer top-level command and no lost functionality.

P7: Hide Pipeline Sub-stages (LOW impact, TRIVIAL effort)

Move ingest, generate-docs, embed to #[command(hide = true)]. They remain usable but don't clutter --help. Direct users to sync with stage-skip flags.

For power users who need individual stages, document in sync --help:

To run individual stages:
  lore ingest                    # Fetch from GitLab only
  lore generate-docs             # Rebuild documents only
  lore embed                     # Re-embed only

P8: Make count a Flag, Not a Command (LOW impact, MEDIUM effort)

Add --count to issues, mrs, and notes:

lore issues --count              # replaces: lore count issues
lore mrs --count                 # replaces: lore count mrs
lore notes --count               # replaces: lore count notes

Keep count as a hidden alias for backward compatibility. Removes one top-level command.
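One way to keep the old spelling working is a small argument rewrite applied before parsing. The sketch below is hypothetical (`rewrite_legacy_count` is not an existing function) and covers only the three entities that have a list command; other count targets such as discussions and events are passed through unchanged.

```rust
/// Rewrites `count <entity> [rest...]` to `<entity> --count [rest...]`
/// for entities that have their own list command. Anything else is
/// returned unchanged. `args` excludes the program name.
pub fn rewrite_legacy_count(args: &[String]) -> Vec<String> {
    match args {
        [cmd, entity, rest @ ..]
            if cmd.as_str() == "count"
                && matches!(entity.as_str(), "issues" | "mrs" | "notes") =>
        {
            let mut out = vec![entity.clone(), "--count".to_string()];
            out.extend(rest.iter().cloned());
            out
        }
        _ => args.to_vec(),
    }
}
```

Under this sketch, `count issues` becomes `issues --count`, while `count events` is left as-is and would still need the hidden count command or a separate decision.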

P9: Consistent --open Short Flag (LOW impact, TRIVIAL effort)

notes --open lacks the -o shorthand that issues and mrs have. Add it.

P10: Add --sort to search (LOW impact, LOW effort)

search returns ranked results but offers no --sort override. Adding --sort=score,created,updated would bring it in line with issues/mrs/notes.


7. Summary: Proposed Command Tree (After All Changes)

If all proposals were adopted, the visible top-level surface shrinks from 29 -> 23:

| Before (29) | After (23) | Change |
|-------------|------------|--------|
| issues | issues | -- |
| mrs | mrs | -- |
| notes | notes | -- |
| search | search | -- |
| timeline | timeline | -- |
| who | who | -- |
| me | me | -- |
| file-history | (hidden, alias for trace --flat) | merged into trace |
| trace | trace | absorbs file-history |
| drift | drift | -- |
| related | related | -- |
| count | (hidden, issues --count replaces) | absorbed |
| sync | sync | -- |
| ingest | (hidden) | hidden |
| generate-docs | (hidden) | hidden |
| embed | (hidden) | hidden |
| status | status | -- |
| health | (merged into doctor) | merged |
| doctor | doctor | absorbs health |
| stats | index | renamed |
| init | init | -- |
| auth | auth | -- |
| token | token | -- |
| migrate | migrate | -- |
| cron | cron | -- |
| robot-docs | robot-docs | -- |
| completions | completions | -- |
| version | version | -- |

Net reduction: 29 -> 23 visible (-21%). Six commands leave the visible surface (file-history, count, ingest, generate-docs, embed, health); the built-in help command is counted in both totals but not listed in the table. The hidden commands remain fully functional and documented in robot-docs for agents that already use them.

Theoretical basis:

  • Miller's Law -- Humans can hold 7±2 items in working memory. 29 commands far exceed this. Even with help grouping (P1), the sheer count creates decision fatigue. The literature on CLI design (Heroku's "12-Factor CLI", clig.dev's "Command Line Interface Guidelines") recommends 10-15 top-level commands maximum, with grouping or nesting for anything beyond.

  • For LLM agents specifically: Research on tool use with large tool sets (Schick et al. 2023, Qin et al. 2023) suggests that agent accuracy tends to degrade as the tool count increases. Reducing from 29 to 23 commands in the robot-docs manifest should improve agent command selection accuracy.

  • Backward compatibility is cheap: AGENTS.md says "we don't care about backward compatibility," so hidden aliases are optional, but they cost almost nothing and prevent breakage for agents with cached robot-docs.


8. Priority Matrix

| Proposal | Impact | Effort | Risk | Recommended Order |
|----------|--------|--------|------|-------------------|
| P1: Help grouping | High | Low | None | Do first |
| P3: Singular/plural fix | Medium | Trivial | None | Do first |
| P5: Fix -f collision | Medium | Trivial | None | Do first |
| P9: notes -o shorthand | Low | Trivial | None | Do first |
| P2: Rename stats->index | High | Low | Alias needed | Do second |
| P4: Merge health->doctor | Medium | Low | Alias needed | Do second |
| P7: Hide pipeline stages | Low | Trivial | Needs docs update | Do second |
| P6: Merge file-history->trace | Medium | Medium | Flag design | Plan carefully |
| P8: count -> --count flag | Low | Medium | Compat shim | Plan carefully |
| P10: --sort on search | Low | Low | None | When convenient |

The "do first" tier is 4 changes that could ship in a single commit with zero risk and immediate ergonomic improvement for both humans and agents.