teernisse 06852e90a6 docs(cli): add command restructure audit and implementation plan

CLI audit scoring the current command surface across human ergonomics,
robot/agent ergonomics, documentation quality, and flag design. Paired
with a detailed implementation plan for restructuring commands into a
more consistent, discoverable hierarchy.

2026-03-10 11:06:53 -04:00


Gitlore CLI Command Audit

1. Full Command Inventory

29 visible + 4 hidden + 2 stub = 35 total command surface

#	Command	Aliases	Args	Flags	Purpose
1	`issues`	`issue`	`[IID]`	15	List/show issues
2	`mrs`	`mr`, `merge-requests`	`[IID]`	16	List/show MRs
3	`notes`	`note`	—	16	List notes
4	`search`	`find`, `query`	`<QUERY>`	13	Hybrid FTS+vector search
5	`timeline`	—	`<QUERY>`	11	Chronological event reconstruction
6	`who`	—	`[TARGET]`	16	People intelligence (5 modes)
7	`me`	—	—	10	Personal dashboard
8	`file-history`	—	`<PATH>`	6	MRs that touched a file
9	`trace`	—	`<PATH>`	5	file->MR->issue->discussion chain
10	`drift`	—	`<TYPE> <IID>`	3	Discussion divergence detection
11	`related`	—	`<QUERY_OR_TYPE> [IID]`	3	Semantic similarity
12	`count`	—	`<ENTITY>`	2	Count entities
13	`sync`	—	—	14	Full pipeline: ingest+docs+embed
14	`ingest`	—	`[ENTITY]`	5	Fetch from GitLab API
15	`generate-docs`	—	—	2	Build searchable documents
16	`embed`	—	—	2	Generate vector embeddings
17	`status`	`st`	—	0	Last sync times per project
18	`health`	—	—	0	Quick pre-flight (exit code only)
19	`doctor`	—	—	0	Full environment diagnostic
20	`stats`	`stat`	—	3	Document/index statistics
21	`init`	—	—	6	Setup config + database
22	`auth`	—	—	0	Verify GitLab token
23	`token`	—	subcommand	1-2	Token CRUD (set/show)
24	`cron`	—	subcommand	0-1	Auto-sync scheduling
25	`migrate`	—	—	0	Apply DB migrations
26	`robot-docs`	—	—	1	Agent self-discovery manifest
27	`completions`	—	`<SHELL>`	0	Shell completions
28	`version`	—	—	0	Version info
29	`help`	—	—	—	(clap built-in)
	Hidden/deprecated:
30	`list`	—	`<ENTITY>`	14	deprecated, use issues/mrs
31	`show`	—	`<ENTITY> <IID>`	1	deprecated, use issues/mrs
32	`auth-test`	—	—	0	deprecated, use auth
33	`sync-status`	—	—	0	deprecated, use status
34	`backup`	—	—	0	Stub (not implemented)
35	`reset`	—	—	1	Stub (not implemented)

2. Semantic Overlap Analysis

Cluster A: "Is the system working?" (4 commands, 1 concept)

Command	What it checks	Exit code semantics	Has flags?
`health`	config exists, DB opens, schema version	0=healthy, 19=unhealthy	No
`doctor`	config, token, database, Ollama	informational	No
`status`	last sync times per project	informational	No
`stats`	document counts, index size, integrity	informational	`--check`, `--repair`

Problem: A user/agent asking "is lore working?" must choose among four commands. health is a strict subset of doctor. status and stats are near-homonyms that answer different questions -- sync recency vs. index health. count (Cluster E) also overlaps with what stats reports.

Cognitive cost: High. CLI design guidance (clig.dev, Heroku's CLI style guide, "12 Factor CLI Apps") stresses discoverability and a small, predictable command set; four overlapping "status" commands work against both. Users build a mental model of "the status command" -- when there are four, they pick the wrong one or give up.

Theoretical basis:

Nielsen's "Recognition over Recall" -- Four similar system-status commands force users to recall which one does what. One command with progressive disclosure (flags for depth) lets them recognize the option they need. This is doubly important for LLM agents, which perform better with fewer top-level choices and compositional flags.
Hick's Law -- Decision time grows with the number of available choices, and an ungrouped list has to be scanned item by item. Each additional top-level command adds scanning time for humans and token cost for robots.

Cluster B: "Data pipeline stages" (4 commands, 1 pipeline)

Command	Pipeline stage	Subsumed by `sync`?
`sync`	ingest -> generate-docs -> embed	-- (is the parent)
`ingest`	GitLab API fetch	`sync --no-docs --no-embed`
`generate-docs`	Build FTS documents	`sync --no-embed` (after ingest)
`embed`	Vector embeddings via Ollama	(final stage)

Problem: sync already has skip flags (--no-embed, --no-docs, --no-events, --no-status, --no-file-changes). The individual stage commands duplicate this with less control -- ingest has --full, --force, --dry-run, but sync also has all three.

The standalone commands exist for granular debugging, but in practice they're reached for <5% of the time. They inflate the help screen while sync handles 95% of use cases.
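The overlap can be made concrete with a small model of how `sync`'s skip flags select stages. This is a hypothetical std-only Rust sketch, not lore's code. `Stage`, `SyncFlags`, and `plan_sync` are illustrative names, and only `--no-docs`/`--no-embed` are modeled.

```rust
/// Pipeline stages that `sync` runs in order: ingest -> generate-docs -> embed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Ingest,
    GenerateDocs,
    Embed,
}

/// Hypothetical stand-in for `sync --no-docs` / `sync --no-embed`.
/// The other skip flags (--no-events, --no-status, ...) are omitted here.
#[derive(Debug, Default, Clone, Copy)]
pub struct SyncFlags {
    pub no_docs: bool,
    pub no_embed: bool,
}

/// Returns the stages a `sync` invocation would run, in pipeline order.
pub fn plan_sync(flags: SyncFlags) -> Vec<Stage> {
    // Ingest always runs in this model; later stages are opt-out.
    let mut stages = vec![Stage::Ingest];
    if !flags.no_docs {
        stages.push(Stage::GenerateDocs);
    }
    if !flags.no_embed {
        stages.push(Stage::Embed);
    }
    stages
}
```

In this model every plan begins with `Ingest`. That makes `ingest` alone exactly `sync --no-docs --no-embed`. Running `generate-docs` or `embed` on its own, without fetching, has no skip-flag equivalent, which is the gap the table's "(after ingest)" and "(final stage)" notes point at.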

Cluster C: "File-centric intelligence" (3 overlapping surfaces)

Command	Input	Output	Key flags
`file-history`	`<PATH>`	MRs that touched file	`-p`, `--discussions`, `--no-follow-renames`, `--merged`, `-n`
`trace`	`<PATH>`	file->MR->issue->discussion chains	`-p`, `--discussions`, `--no-follow-renames`, `-n`
`who --path <PATH>`	`<PATH>` via flag	experts for file area	`-p`, `--since`, `-n`
`who --overlap <PATH>`	`<PATH>` via flag	users touching same files	`-p`, `--since`, `-n`

Problem: trace is a superset of file-history -- it follows the same MR chain but additionally links to closing issues and discussions. They share 4 of 5 filter flags. A user who wants "what happened to this file?" has to choose between two commands that sound nearly identical.

Cluster D: "Semantic discovery" (3 commands, all need embeddings)

Command	Input	Output
`search`	free text query	ranked documents
`related`	entity ref OR free text	similar entities
`drift`	entity ref	divergence score per discussion

`related "some text"` is functionally a vector-only search: `search "some text" --mode semantic`. The difference is that `related` can also seed from an entity (`related issues 42`), while `search` only accepts text.

drift is specialized enough to stand alone, but it's only used on issues and has a single non-project flag (--threshold).

Cluster E: "Count" is an orphan

count is a standalone command for SELECT COUNT(*) FROM <table>. This could be:

A --count flag on issues/mrs/notes
A section in stats output (which already shows counts)
Part of status output

It exists as its own top-level command primarily for robot convenience, but adds to the 29-command sprawl.

3. Flag Consistency Audit

Consistent (good patterns)

Flag	Meaning	Used in
`-p, --project`	Scope to project (fuzzy)	issues, mrs, notes, search, sync, ingest, generate-docs, timeline, who, me, file-history, trace, drift, related
`-n, --limit`	Max results	issues, mrs, notes, search, timeline, who, me, file-history, trace, related
`--since`	Temporal filter (7d, 2w, YYYY-MM-DD)	issues, mrs, notes, search, timeline, who, me
`--fields`	Field selection / `minimal` preset	issues, mrs, notes, search, timeline, who, me
`--full`	Reset cursors / full rebuild	sync, ingest, embed, generate-docs
`--force`	Override stale lock	sync, ingest
`--dry-run`	Preview without changes	sync, ingest, stats

Inconsistencies (problems)

Issue	Details	Impact
`-f` collision	`ingest -f` = `--force`, `count -f` = `--for`	Robot confusion; violates "same short flag = same semantics"
`-a` inconsistency	`issues -a` = `--author`, `me` has no `-a` (uses `--user` for analogous concept)	Minor
`-s` inconsistency	`issues -s` = `--state`, `search` has no `-s` short flag at all	Missed ergonomic shortcut
`--sort` availability	Present in issues/mrs/notes, absent from search/timeline/file-history	Inconsistent query power
`--discussions`	`file-history --discussions`, `trace --discussions`, but `issues 42` has no `--discussions` flag	Can't get discussions when showing an issue
`--open` (browser)	`issues -o`, `mrs -o`, `notes --open` (no `-o`)	Inconsistent short flag
`--merged`	Only on `file-history`, not on `mrs` (which uses `--state merged`)	Different filter mechanics for same concept
Entity type naming	`count` takes `issues, mrs, discussions, notes, events`; `search --type` takes `issue, mr, discussion, note` (singular)	Singular vs plural for same concept

Theoretical basis:

Principle of Least Surprise (POLS) -- When `-f` means `--force` in one command and `--for` in another, both humans and agents learn the wrong lesson from one interaction and apply it to the other. CLI design guidance such as clig.dev recommends standard flag names where a standard exists (it lists `-f` as `--force`) and consistent flag meanings across subcommands.
Singular/plural inconsistency (`issues` vs `issue` as entity type values) is particularly harmful for LLM agents, which pattern-match on prior successful invocations. If `lore count issues` works, the agent will try `lore search --type issues` -- and get a parse error.

4. Robot Ergonomics Assessment

Strengths (well above average for a CLI)

Feature	Rating	Notes
Structured output	Excellent	Consistent `{ok, data, meta}` envelope
Auto-detection	Excellent	Non-TTY -> robot mode, `LORE_ROBOT` env var
Error output	Excellent	Structured JSON to stderr with `actions` array for recovery
Exit codes	Excellent	20 distinct, well-documented codes
Self-discovery	Excellent	`robot-docs` manifest, `--brief` for token savings
Typo tolerance	Excellent	Autocorrect with confidence scores + structured warnings
Field selection	Good	`--fields minimal` saves ~60% tokens
No-args behavior	Good	Robot mode auto-outputs robot-docs

Weaknesses

Issue	Severity	Notes
29 commands in robot-docs manifest	High	Agents spend tokens evaluating which command to use. Grouping would reduce decision space.
`status`/`stats`/`stat` near-homonyms	High	LLMs are particularly susceptible to surface-level lexical confusion. `stat` is an alias for `stats`, while `st` is an alias for `status`, a different command -- this makes agent errors very likely.
Singular vs plural entity types	Medium	`count issues` works but `search --type issues` fails. Agents learn from one and apply to the other.
Overlapping file commands	Medium	Agent must decide between `trace`, `file-history`, and `who --path`. The decision tree isn't obvious from names alone.
`count` as separate command	Low	Could be a flag; standalone command inflates the decision space

5. Human Ergonomics Assessment

Strengths

Feature	Rating	Notes
Help text quality	Excellent	Every command has examples, help headings organize flags
Short flags	Good	`-p`, `-n`, `-s`, `-a`, `-J` cover 80% of common use
Alias coverage	Good	`issue`/`issues`, `mr`/`mrs`, `st`/`status`, `find`/`search`
Subcommand inference	Good	`lore issu` -> `issues` via clap's `infer_subcommands`
Color/icon system	Good	Auto, with overrides

Weaknesses

Issue	Severity	Notes
29 commands in flat help	High	Doesn't fit one terminal screen. No grouping -> overwhelming
`status` vs `stats` naming	High	Humans will type wrong one repeatedly
`health` vs `doctor` distinction	Medium	"Which one do I run?" -- unclear from names
`who` 5-mode overload	Medium	Help text is long; mode exclusions are complex
Pipeline stages as top-level	Low	`ingest`/`generate-docs`/`embed` rarely used directly but clutter help
`generate-docs` is 13 chars	Low	Longest command name; `gen-docs` or `gendocs` would help

6. Proposals (Ranked by Impact x Feasibility)

P1: Help Grouping (HIGH impact, LOW effort)

Problem: 29 flat commands -> information overload.

Fix: Group subcommands under headings in `--help`. clap's `help_heading` attribute applies to arguments rather than subcommands. clap 4 prints all subcommands under a single heading (renamable via `subcommand_help_heading`), so per-group headings will likely need a custom help template. Target layout:

Query:
  issues         List or show issues [aliases: issue]
  mrs            List or show merge requests [aliases: mr]
  notes          List notes from discussions [aliases: note]
  search         Search indexed documents [aliases: find]
  count          Count entities in local database

Intelligence:
  timeline       Chronological timeline of events
  who            People intelligence: experts, workload, overlap
  me             Personal work dashboard

File Analysis:
  trace          Trace why code was introduced
  file-history   Show MRs that touched a file
  related        Find semantically related entities
  drift          Detect discussion divergence

Data Pipeline:
  sync           Run full sync pipeline
  ingest         Ingest data from GitLab
  generate-docs  Generate searchable documents
  embed          Generate vector embeddings

System:
  init           Initialize configuration and database
  status         Show sync state [aliases: st]
  health         Quick health check
  doctor         Check environment health
  stats          Document and index statistics [aliases: stat]
  auth           Verify GitLab authentication
  token          Manage stored GitLab token
  migrate        Run pending database migrations
  cron           Manage automatic syncing
  completions    Generate shell completions
  robot-docs     Agent self-discovery manifest
  version        Show version information

Effort: small -- a custom help template, or a hand-maintained grouped listing kept next to the command definitions. No behavior changes.
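Whatever clap mechanism ends up carrying the headings, producing the grouped layout above is straightforward. Below is a minimal std-only sketch. It assumes a hypothetical `HelpGroup` table maintained next to the command enum. Its output could, for example, be spliced into a custom clap `help_template` in place of the `{subcommands}` placeholder.

```rust
/// One help group: a heading plus (command name, one-line description) pairs.
/// Hypothetical type; not part of clap or lore.
pub struct HelpGroup<'a> {
    pub heading: &'a str,
    pub commands: &'a [(&'a str, &'a str)],
}

/// Renders "Heading:" lines followed by two-space-indented entries whose
/// descriptions are aligned in one column, with a blank line between groups.
pub fn render_grouped_help(groups: &[HelpGroup]) -> String {
    // Align descriptions on the longest command name across all groups.
    let width = groups
        .iter()
        .flat_map(|g| g.commands.iter())
        .map(|(name, _)| name.len())
        .max()
        .unwrap_or(0);

    let mut out = String::new();
    for (i, group) in groups.iter().enumerate() {
        if i > 0 {
            out.push('\n'); // blank line between groups
        }
        out.push_str(group.heading);
        out.push_str(":\n");
        for (name, about) in group.commands {
            // Pad the name to the shared width, then two spaces of gutter.
            out.push_str(&format!("  {:<width$}  {}\n", name, about, width = width));
        }
    }
    out
}
```

Keeping the groups in one table also makes a newly added command that belongs to no group easy to spot in review.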

P2: Resolve `status`/`stats` Confusion (HIGH impact, LOW effort)

Option A (recommended): Rename stats -> index.

lore status = when did I last sync? (pipeline state)
lore index = how big is my index? (data inventory)
The alias stat goes away (it was causing confusion anyway)

Option B: Rename status -> sync-state and stats -> db-stats. More descriptive but longer.

Option C: Merge both into a single `check` command, in the same spirit as the `health` + `doctor` merge in P4.
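Edit distance quantifies the naming collision. `stats` is one insertion away from `status`, and `stat` is two, whereas `index` is six. The sketch below is a standard Levenshtein implementation in Rust, included only to make that comparison reproducible. It assumes nothing about how lore's own autocorrect scores candidates.

```rust
/// Classic Levenshtein edit distance (insertions, deletions, substitutions),
/// computed with a single rolling row of the DP table.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i-1 chars of a and first j chars of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: turning i chars of a into the empty string takes i deletions.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // delete from a
                .min(curr[j - 1] + 1) // insert into a
                .min(prev[j - 1] + cost); // substitute (or match)
        }
        prev = curr;
    }
    prev[b.len()]
}
```

After a rename of `stats` to `index`, this cluster would no longer contain two real commands within one or two edits of each other.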

P3: Fix Singular/Plural Entity Type Inconsistency (MEDIUM impact, TRIVIAL effort)

Accept both singular and plural forms everywhere:

count already takes issues (plural) -- also accept issue
search --type already takes issue (singular) -- also accept issues
drift takes issues -- also accept issue

This is a ~10 line change in the value parsers and eliminates an entire class of agent errors.
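One way to make the fix structural rather than per-command is a single shared value type that all three commands parse into. Here is a hypothetical sketch; `EntityType` and `UnknownEntity` are illustrative names, not lore's.

```rust
use std::fmt;
use std::str::FromStr;

/// Entity types shared by `count`, `search --type`, and `drift` (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityType {
    Issue,
    Mr,
    Discussion,
    Note,
    Event,
}

/// Parse error carrying the rejected input.
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownEntity(pub String);

impl fmt::Display for UnknownEntity {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown entity type '{}' (expected e.g. issue/issues, mr/mrs)", self.0)
    }
}

impl std::error::Error for UnknownEntity {}

impl FromStr for EntityType {
    type Err = UnknownEntity;

    /// Accepts both the singular and the plural spelling of every type.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "issue" | "issues" => Ok(EntityType::Issue),
            "mr" | "mrs" => Ok(EntityType::Mr),
            "discussion" | "discussions" => Ok(EntityType::Discussion),
            "note" | "notes" => Ok(EntityType::Note),
            "event" | "events" => Ok(EntityType::Event),
            _ => Err(UnknownEntity(s.to_string())),
        }
    }
}
```

A `FromStr` type whose error implements `std::error::Error` should be usable through clap's `value_parser!`, though lore's actual wiring may differ. Commands that accept only a subset (for example, `search --type` has no events) would still need their own check.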

P4: Merge `health` + `doctor` (MEDIUM impact, LOW effort)

health is a fast subset of doctor. Merge:

lore doctor = full diagnostic (current behavior)
lore doctor --quick = fast pre-flight, exit-code-only (current health)
Drop health as a separate command, add a hidden alias for backward compat
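This sketch shows how one check list could serve both modes, assuming each check is tagged as quick or not. `Check` and `run_doctor` are hypothetical; the exit codes 0 and 19 come from the Cluster A table.

```rust
/// Exit codes from the audit's Cluster A table: 0 = healthy, 19 = unhealthy.
pub const EXIT_OK: i32 = 0;
pub const EXIT_UNHEALTHY: i32 = 19;

/// A single diagnostic. `quick` marks checks cheap enough for a pre-flight run.
/// Hypothetical type, not lore's.
pub struct Check {
    pub name: &'static str,
    pub quick: bool,
    pub run: fn() -> bool,
}

/// Runs every check (`doctor`) or only the quick subset (`doctor --quick`).
/// Returns the names of failed checks plus the process exit code.
pub fn run_doctor(checks: &[Check], quick: bool) -> (Vec<&'static str>, i32) {
    let failed: Vec<&'static str> = checks
        .iter()
        .filter(|c| !quick || c.quick) // quick mode skips expensive checks
        .filter(|c| !(c.run)()) // keep only checks that failed
        .map(|c| c.name)
        .collect();
    // Full mode stays informational; only --quick keeps health's exit-code contract.
    let code = if quick && !failed.is_empty() { EXIT_UNHEALTHY } else { EXIT_OK };
    (failed, code)
}
```

Full mode keeps today's informational `doctor` behavior (exit 0 with a report). Only `--quick` inherits `health`'s exit-code semantics.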

P5: Fix `-f` Short Flag Collision (MEDIUM impact, TRIVIAL effort)

Change `count`'s `-f, --for` to just `--for` (no short flag). `-f` should mean `--force` in every command, or nowhere.
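A check along these lines could run in CI over every (command, short, long) triple so the collision cannot come back. The function is a hypothetical std-only sketch. In a clap app the triples could presumably be collected by walking `Command::get_subcommands()` and each subcommand's `get_arguments()`, then reading `Arg::get_short()` and `Arg::get_long()`.

```rust
use std::collections::BTreeMap;

/// Given (command, short flag, long flag) triples, returns every short flag
/// that maps to more than one long flag, with the (command, long) pairs using it.
/// Hypothetical helper for a CI lint.
pub fn short_flag_collisions<'a>(
    flags: &[(&'a str, char, &'a str)],
) -> BTreeMap<char, Vec<(&'a str, &'a str)>> {
    // Group all uses of each short flag.
    let mut by_short: BTreeMap<char, Vec<(&'a str, &'a str)>> = BTreeMap::new();
    for &(cmd, short, long) in flags {
        by_short.entry(short).or_default().push((cmd, long));
    }
    // Keep only short flags whose long names disagree somewhere.
    by_short.retain(|_, uses| {
        let first = uses[0].1; // every entry has at least one use
        uses.iter().any(|&(_, long)| long != first)
    });
    by_short
}
```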

P6: Consolidate `trace` + `file-history` (MEDIUM impact, MEDIUM effort)

trace already does everything file-history does plus more. Options:

Option A: Make file-history an alias for trace --flat (shows MR list without issue/discussion linking).

Option B: Add --mrs-only to trace that produces file-history output. Deprecate file-history with a hidden alias.

Either way, one fewer top-level command and no lost functionality.
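Either option rests on `file-history` output being derivable from `trace` data. Here is a hypothetical sketch of that projection. `TraceChain` is a simplified stand-in for whatever `trace` computes internally, and `flatten_to_mrs` is an illustrative name.

```rust
use std::collections::HashSet;

/// One trace chain: file -> MR -> issues -> discussions (simplified, hypothetical shape).
pub struct TraceChain {
    pub mr_iid: u64,
    pub issue_iids: Vec<u64>,
    pub discussion_ids: Vec<u64>,
}

/// `trace --flat` view: the distinct MRs that touched the file, in first-seen
/// order, dropping the issue/discussion links (roughly what `file-history` lists).
pub fn flatten_to_mrs(chains: &[TraceChain]) -> Vec<u64> {
    let mut seen = HashSet::new();
    chains
        .iter()
        .map(|c| c.mr_iid)
        .filter(|iid| seen.insert(*iid)) // insert returns false for duplicates
        .collect()
}
```

One gap remains: `file-history` has `--merged`, which `trace` lacks (Cluster C table). That flag would need to move to `trace` for the consolidation to lose nothing.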

P7: Hide Pipeline Sub-stages (LOW impact, TRIVIAL effort)

Move ingest, generate-docs, embed to #[command(hide = true)]. They remain usable but don't clutter --help. Direct users to sync with stage-skip flags.

For power users who need individual stages, document in sync --help:

To run individual stages:
  lore ingest                    # Fetch from GitLab only
  lore generate-docs             # Rebuild documents only
  lore embed                     # Re-embed only

P8: Make `count` a Flag, Not a Command (LOW impact, MEDIUM effort)

Add `--count` to `issues`, `mrs`, and `notes`:

lore issues --count              # replaces: lore count issues
lore mrs --count                 # replaces: lore count mrs
lore notes --count               # replaces: lore count notes

Keep count as a hidden alias for backward compatibility. Removes one top-level command.
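A `--count` flag has one design advantage over a separate `count` command: a single set of filters drives both modes. So `lore issues --state <S> --count` counts exactly what `lore issues --state <S>` would list. The sketch below is hypothetical and in-memory. Since lore works against a local database, a real implementation would more likely swap the `SELECT` list for `COUNT(*)` over the same `WHERE` clause.

```rust
/// Minimal issue record and filter (hypothetical shapes, not lore's).
pub struct Issue {
    pub iid: u64,
    pub state: &'static str,
}

pub struct IssueFilter {
    pub state: Option<&'static str>,
}

impl IssueFilter {
    /// No state filter means every issue matches.
    fn matches(&self, issue: &Issue) -> bool {
        self.state.map_or(true, |s| issue.state == s)
    }
}

#[derive(Debug, PartialEq)]
pub enum Output {
    List(Vec<u64>),
    Count(usize),
}

/// `lore issues [--count]`: the same filter feeds both branches, so the
/// count can never disagree with the listing.
pub fn run_issues(issues: &[Issue], criteria: &IssueFilter, count: bool) -> Output {
    let matching = issues.iter().filter(|i| criteria.matches(i));
    if count {
        Output::Count(matching.count())
    } else {
        Output::List(matching.map(|i| i.iid).collect())
    }
}
```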

P9: Consistent `--open` Short Flag (LOW impact, TRIVIAL effort)

notes --open lacks the -o shorthand that issues and mrs have. Add it.

P10: Add `--sort` to `search` (LOW impact, LOW effort)

`search` returns ranked results but offers no `--sort` override. Adding `--sort` with the values `score` (the current ranking), `created`, and `updated` would bring it in line with `issues`/`mrs`/`notes`.
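This sketch covers the flag's parsing and ordering, assuming each hit carries a score and created/updated timestamps. `Hit`, `SortKey`, and `sort_hits` are hypothetical names. Descending order for every key is also an assumption.

```rust
use std::str::FromStr;

/// A search hit with the fields a `--sort` override would need (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: u64,
    pub score: f64,
    pub created_at: i64, // Unix seconds
    pub updated_at: i64, // Unix seconds
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortKey {
    Score,
    Created,
    Updated,
}

impl FromStr for SortKey {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "score" => Ok(SortKey::Score),
            "created" => Ok(SortKey::Created),
            "updated" => Ok(SortKey::Updated),
            other => Err(format!("invalid --sort value '{other}' (expected score, created, updated)")),
        }
    }
}

/// Sorts hits descending: best score first, or newest first for timestamps.
pub fn sort_hits(hits: &mut [Hit], key: SortKey) {
    hits.sort_by(|a, b| match key {
        SortKey::Score => b.score.total_cmp(&a.score),
        SortKey::Created => b.created_at.cmp(&a.created_at),
        SortKey::Updated => b.updated_at.cmp(&a.updated_at),
    });
}
```

`sort_by` is a stable sort. When hits arrive already ranked, ties on `created` or `updated` keep their relevance order.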

7. Summary: Proposed Command Tree (After All Changes)

If all proposals were adopted, the visible top level would shrink from 29 -> 23 commands (counting the built-in `help`):

Before (29)	After (23)	Change
`issues`	`issues`	--
`mrs`	`mrs`	--
`notes`	`notes`	--
`search`	`search`	--
`timeline`	`timeline`	--
`who`	`who`	--
`me`	`me`	--
`file-history`	(hidden, alias for `trace --flat`)	merged into trace
`trace`	`trace`	absorbs file-history
`drift`	`drift`	--
`related`	`related`	--
`count`	(hidden, `issues --count` replaces)	absorbed
`sync`	`sync`	--
`ingest`	(hidden)	hidden
`generate-docs`	(hidden)	hidden
`embed`	(hidden)	hidden
`status`	`status`	--
`health`	(merged into doctor)	merged
`doctor`	`doctor`	absorbs health
`stats`	`index`	renamed
`init`	`init`	--
`auth`	`auth`	--
`token`	`token`	--
`migrate`	`migrate`	--
`cron`	`cron`	--
`robot-docs`	`robot-docs`	--
`completions`	`completions`	--
`version`	`version`	--
`help`	`help`	--

Net reduction: 29 -> 23 visible (-21%). The hidden commands remain fully functional and documented in robot-docs for agents that already use them.

Theoretical basis:

Miller's Law -- Humans can hold roughly 7+/-2 items in working memory. 29 commands far exceeds this. Even with help grouping (P1), the sheer count creates decision fatigue. CLI design guides (Heroku's "12 Factor CLI Apps", clig.dev's "Command Line Interface Guidelines") favor grouping or nesting related commands over a long flat list.
For LLM agents specifically: Work on tool use with large tool sets (Schick et al. 2023, Qin et al. 2023) suggests that selecting the right tool gets harder as the number of available tools grows. Reducing the visible command set from 29 to 23 should therefore improve agent command selection accuracy.
Backward compatibility is cheap: AGENTS.md says "we don't care about backward compatibility," so it is not a hard requirement. Still, hidden aliases cost almost nothing and prevent breakage for agents with cached robot-docs.

8. Priority Matrix

Proposal	Impact	Effort	Risk	Recommended Order
P1: Help grouping	High	Trivial	None	Do first
P3: Singular/plural fix	Medium	Trivial	None	Do first
P5: Fix `-f` collision	Medium	Trivial	None	Do first
P9: `notes -o` shorthand	Low	Trivial	None	Do first
P2: Rename `stats`->`index`	High	Low	Alias needed	Do second
P4: Merge health->doctor	Medium	Low	Alias needed	Do second
P7: Hide pipeline stages	Low	Trivial	Needs docs update	Do second
P6: Merge file-history->trace	Medium	Medium	Flag design	Plan carefully
P8: count -> --count flag	Low	Medium	Compat shim	Plan carefully
P10: `--sort` on search	Low	Low	None	When convenient

The "do first" tier is 4 changes that could ship in a single commit with zero risk and immediate ergonomic improvement for both humans and agents.
