gitlore/command-restructure/CLI_AUDIT.md

# Gitlore CLI Command Audit

## 1. Full Command Inventory

**29 visible + 4 hidden + 2 stub = 35 total command surface**

| # | Command | Aliases | Args | Flags | Purpose |
|---|---------|---------|------|-------|---------|
| 1 | `issues` | `issue` | `[IID]` | 15 | List/show issues |
| 2 | `mrs` | `mr`, `merge-requests` | `[IID]` | 16 | List/show MRs |
| 3 | `notes` | `note` | — | 16 | List notes |
| 4 | `search` | `find`, `query` | `<QUERY>` | 13 | Hybrid FTS+vector search |
| 5 | `timeline` | — | `<QUERY>` | 11 | Chronological event reconstruction |
| 6 | `who` | — | `[TARGET]` | 16 | People intelligence (5 modes) |
| 7 | `me` | — | — | 10 | Personal dashboard |
| 8 | `file-history` | — | `<PATH>` | 6 | MRs that touched a file |
| 9 | `trace` | — | `<PATH>` | 5 | file->MR->issue->discussion chain |
| 10 | `drift` | — | `<TYPE> <IID>` | 3 | Discussion divergence detection |
| 11 | `related` | — | `<QUERY_OR_TYPE> [IID]` | 3 | Semantic similarity |
| 12 | `count` | — | `<ENTITY>` | 2 | Count entities |
| 13 | `sync` | — | — | 14 | Full pipeline: ingest+docs+embed |
| 14 | `ingest` | — | `[ENTITY]` | 5 | Fetch from GitLab API |
| 15 | `generate-docs` | — | — | 2 | Build searchable documents |
| 16 | `embed` | — | — | 2 | Generate vector embeddings |
| 17 | `status` | `st` | — | 0 | Last sync times per project |
| 18 | `health` | — | — | 0 | Quick pre-flight (exit code only) |
| 19 | `doctor` | — | — | 0 | Full environment diagnostic |
| 20 | `stats` | `stat` | — | 3 | Document/index statistics |
| 21 | `init` | — | — | 6 | Setup config + database |
| 22 | `auth` | — | — | 0 | Verify GitLab token |
| 23 | `token` | — | subcommand | 1-2 | Token CRUD (set/show) |
| 24 | `cron` | — | subcommand | 0-1 | Auto-sync scheduling |
| 25 | `migrate` | — | — | 0 | Apply DB migrations |
| 26 | `robot-docs` | — | — | 1 | Agent self-discovery manifest |
| 27 | `completions` | — | `<SHELL>` | 0 | Shell completions |
| 28 | `version` | — | — | 0 | Version info |
| 29 | *help* | — | — | — | (clap built-in) |
| | **Hidden/deprecated:** | | | | |
| 30 | `list` | — | `<ENTITY>` | 14 | deprecated, use issues/mrs |
| 31 | `auth-test` | — | — | 0 | deprecated, use auth |
| 32 | `sync-status` | — | — | 0 | deprecated, use status |
| 33 | `backup` | — | — | 0 | Stub (not implemented) |
| 34 | `reset` | — | — | 1 | Stub (not implemented) |

---

## 2. Semantic Overlap Analysis

### Cluster A: "Is the system working?" (4 commands, 1 concept)

| Command | What it checks | Exit code semantics | Has flags? |
|---------|---------------|---------------------|------------|
| `health` | config exists, DB opens, schema version | 0=healthy, 19=unhealthy | No |
| `doctor` | config, token, database, Ollama | informational | No |
| `status` | last sync times per project | informational | No |
| `stats` | document counts, index size, integrity | informational | `--check`, `--repair` |

**Problem:** A user/agent asking "is lore working?" must choose among four commands. `health` is a strict subset of `doctor`. `status` and `stats` are near-homonyms that answer different questions -- sync recency vs. index health. `count` (Cluster E) also overlaps with what `stats` reports.

**Cognitive cost:** High. The CLI literature (Clig.dev, Heroku CLI design guide, 12-factor CLI) consistently warns against >2 "status" commands. Users build a mental model of "the status command" -- when there are four, they pick wrong or give up.

**Theoretical basis:**

- **Nielsen's "Recognition over Recall"** -- Four similar system-status commands force users to *recall* which one does what. One command with progressive disclosure (flags for depth) lets them *recognize* the option they need. This is doubly important for LLM agents, which perform better with fewer top-level choices and compositional flags.

- **Fitts's Law for CLIs** -- Command discovery cost is proportional to list length. Each additional top-level command adds scanning time for humans and token cost for robots.

### Cluster B: "Data pipeline stages" (4 commands, 1 pipeline)

| Command | Pipeline stage | Subsumed by `sync`? |
|---------|---------------|---------------------|
| `sync` | ingest -> generate-docs -> embed | -- (is the parent) |
| `ingest` | GitLab API fetch | `sync` without `--no-docs --no-embed` |
| `generate-docs` | Build FTS documents | `sync --no-embed` (after ingest) |
| `embed` | Vector embeddings via Ollama | (final stage) |

**Problem:** `sync` already has skip flags (`--no-embed`, `--no-docs`, `--no-events`, `--no-status`, `--no-file-changes`). The individual stage commands duplicate this with less control -- `ingest` has `--full`, `--force`, `--dry-run`, but `sync` also has all three.

The standalone commands exist for granular debugging, but in practice they're reached for <5% of the time. They inflate the help screen while `sync` handles 95% of use cases.

### Cluster C: "File-centric intelligence" (3 overlapping surfaces)

| Command | Input | Output | Key flags |
|---------|-------|--------|-----------|
| `file-history` | `<PATH>` | MRs that touched file | `-p`, `--discussions`, `--no-follow-renames`, `--merged`, `-n` |
| `trace` | `<PATH>` | file->MR->issue->discussion chains | `-p`, `--discussions`, `--no-follow-renames`, `-n` |
| `who --path <PATH>` | `<PATH>` via flag | experts for file area | `-p`, `--since`, `-n` |
| `who --overlap <PATH>` | `<PATH>` via flag | users touching same files | `-p`, `--since`, `-n` |

**Problem:** `trace` is a superset of `file-history` -- it follows the same MR chain but additionally links to closing issues and discussions. They share 4 of 5 filter flags. A user who wants "what happened to this file?" has to choose between two commands that sound nearly identical.

### Cluster D: "Semantic discovery" (3 commands, all need embeddings)

| Command | Input | Output |
|---------|-------|--------|
| `search` | free text query | ranked documents |
| `related` | entity ref OR free text | similar entities |
| `drift` | entity ref | divergence score per discussion |

`related "some text"` is functionally a vector-only `search "some text" --mode semantic`. The difference is that `related` can also seed from an entity (issues 42), while `search` only accepts text.

`drift` is specialized enough to stand alone, but it's only used on issues and has a single non-project flag (`--threshold`).

### Cluster E: "Count" is an orphan

`count` is a standalone command for `SELECT COUNT(*) FROM <table>`. This could be:
- A `--count` flag on `issues`/`mrs`/`notes`
- A section in `stats` output (which already shows counts)
- Part of `status` output

It exists as its own top-level command primarily for robot convenience, but adds to the 29-command sprawl.

---

## 3. Flag Consistency Audit

### Consistent (good patterns)

| Flag | Meaning | Used in |
|------|---------|---------|
| `-p, --project` | Scope to project (fuzzy) | issues, mrs, notes, search, sync, ingest, generate-docs, timeline, who, me, file-history, trace, drift, related |
| `-n, --limit` | Max results | issues, mrs, notes, search, timeline, who, me, file-history, trace, related |
| `--since` | Temporal filter (7d, 2w, YYYY-MM-DD) | issues, mrs, notes, search, timeline, who, me |
| `--fields` | Field selection / `minimal` preset | issues, mrs, notes, search, timeline, who, me |
| `--full` | Reset cursors / full rebuild | sync, ingest, embed, generate-docs |
| `--force` | Override stale lock | sync, ingest |
| `--dry-run` | Preview without changes | sync, ingest, stats |

### Inconsistencies (problems)

| Issue | Details | Impact |
|-------|---------|--------|
| `-f` collision | `ingest -f` = `--force`, `count -f` = `--for` | Robot confusion; violates "same short flag = same semantics" |
| `-a` inconsistency | `issues -a` = `--author`, `me` has no `-a` (uses `--user` for analogous concept) | Minor |
| `-s` inconsistency | `issues -s` = `--state`, `search` has no `-s` short flag at all | Missed ergonomic shortcut |
| `--sort` availability | Present in issues/mrs/notes, absent from search/timeline/file-history | Inconsistent query power |
| `--discussions` | `file-history --discussions`, `trace --discussions`, but `issues 42` has no `--discussions` flag | Can't get discussions when showing an issue |
| `--open` (browser) | `issues -o`, `mrs -o`, `notes --open` (no `-o`) | Inconsistent short flag |
| `--merged` | Only on `file-history`, not on `mrs` (which uses `--state merged`) | Different filter mechanics for same concept |
| Entity type naming | `count` takes `issues, mrs, discussions, notes, events`; `search --type` takes `issue, mr, discussion, note` (singular) | Singular vs plural for same concept |

**Theoretical basis:**

- **Principle of Least Surprise (POLS)** -- When `-f` means `--force` in one command and `--for` in another, both humans and agents learn the wrong lesson from one interaction and apply it to the other. CLI design guides (GNU standards, POSIX conventions, clig.dev) are unanimous: short flags should have consistent semantics across all subcommands.

- **Singular/plural inconsistency** (`issues` vs `issue` as entity type values) is particularly harmful for LLM agents, which use pattern matching on prior successful invocations. If `lore count issues` works, the agent will try `lore search --type issues` -- and get a parse error.

---

## 4. Robot Ergonomics Assessment

### Strengths (well above average for a CLI)

| Feature | Rating | Notes |
|---------|--------|-------|
| Structured output | Excellent | Consistent `{ok, data, meta}` envelope |
| Auto-detection | Excellent | Non-TTY -> robot mode, `LORE_ROBOT` env var |
| Error output | Excellent | Structured JSON to stderr with `actions` array for recovery |
| Exit codes | Excellent | 20 distinct, well-documented codes |
| Self-discovery | Excellent | `robot-docs` manifest, `--brief` for token savings |
| Typo tolerance | Excellent | Autocorrect with confidence scores + structured warnings |
| Field selection | Good | `--fields minimal` saves ~60% tokens |
| No-args behavior | Good | Robot mode auto-outputs robot-docs |

### Weaknesses

| Issue | Severity | Recommendation |
|-------|----------|----------------|
| 29 commands in robot-docs manifest | High | Agents spend tokens evaluating which command to use. Grouping would reduce decision space. |
| `status`/`stats`/`stat` near-homonyms | High | LLMs are particularly susceptible to surface-level lexical confusion. `stat` is an alias for `stats` while `status` is a different command -- this guarantees agent errors. |
| Singular vs plural entity types | Medium | `count issues` works but `search --type issues` fails. Agents learn from one and apply to the other. |
| Overlapping file commands | Medium | Agent must decide between `trace`, `file-history`, and `who --path`. The decision tree isn't obvious from names alone. |
| `count` as separate command | Low | Could be a flag; standalone command inflates the decision space |

---

## 5. Human Ergonomics Assessment

### Strengths

| Feature | Rating | Notes |
|---------|--------|-------|
| Help text quality | Excellent | Every command has examples, help headings organize flags |
| Short flags | Good | `-p`, `-n`, `-s`, `-a`, `-J` cover 80% of common use |
| Alias coverage | Good | `issue`/`issues`, `mr`/`mrs`, `st`/`status`, `find`/`search` |
| Subcommand inference | Good | `lore issu` -> `issues` via clap infer |
| Color/icon system | Good | Auto, with overrides |

### Weaknesses

| Issue | Severity | Recommendation |
|-------|----------|----------------|
| 29 commands in flat help | High | Doesn't fit one terminal screen. No grouping -> overwhelming |
| `status` vs `stats` naming | High | Humans will type wrong one repeatedly |
| `health` vs `doctor` distinction | Medium | "Which one do I run?" -- unclear from names |
| `who` 5-mode overload | Medium | Help text is long; mode exclusions are complex |
| Pipeline stages as top-level | Low | `ingest`/`generate-docs`/`embed` rarely used directly but clutter help |
| `generate-docs` is 14 chars | Low | Longest command name; `gen-docs` or `gendocs` would help |

---

## 6. Proposals (Ranked by Impact x Feasibility)

### P1: Help Grouping (HIGH impact, LOW effort)

**Problem:** 29 flat commands -> information overload.

**Fix:** Use clap's `help_heading` on subcommands to group them:

```
Query:
  issues         List or show issues [aliases: issue]
  mrs            List or show merge requests [aliases: mr]
  notes          List notes from discussions [aliases: note]
  search         Search indexed documents [aliases: find]
  count          Count entities in local database

Intelligence:
  timeline       Chronological timeline of events
  who            People intelligence: experts, workload, overlap
  me             Personal work dashboard

File Analysis:
  trace          Trace why code was introduced
  file-history   Show MRs that touched a file
  related        Find semantically related entities
  drift          Detect discussion divergence

Data Pipeline:
  sync           Run full sync pipeline
  ingest         Ingest data from GitLab
  generate-docs  Generate searchable documents
  embed          Generate vector embeddings

System:
  init           Initialize configuration and database
  status         Show sync state [aliases: st]
  health         Quick health check
  doctor         Check environment health
  stats          Document and index statistics [aliases: stat]
  auth           Verify GitLab authentication
  token          Manage stored GitLab token
  migrate        Run pending database migrations
  cron           Manage automatic syncing
  completions    Generate shell completions
  robot-docs     Agent self-discovery manifest
  version        Show version information
```

**Effort:** ~20 lines of `#[command(help_heading = "...")]` annotations. No behavior changes.

### P2: Resolve `status`/`stats` Confusion (HIGH impact, LOW effort)

**Option A (recommended):** Rename `stats` -> `index`.
- `lore status` = when did I last sync? (pipeline state)
- `lore index` = how big is my index? (data inventory)
- The alias `stat` goes away (it was causing confusion anyway)

**Option B:** Rename `status` -> `sync-state` and `stats` -> `db-stats`. More descriptive but longer.

**Option C:** Merge both under `check` (see P4).

### P3: Fix Singular/Plural Entity Type Inconsistency (MEDIUM impact, TRIVIAL effort)

Accept both singular and plural forms everywhere:
- `count` already takes `issues` (plural) -- also accept `issue`
- `search --type` already takes `issue` (singular) -- also accept `issues`
- `drift` takes `issues` -- also accept `issue`

This is a ~10 line change in the value parsers and eliminates an entire class of agent errors.

### P4: Merge `health` + `doctor` (MEDIUM impact, LOW effort)

`health` is a fast subset of `doctor`. Merge:
- `lore doctor` = full diagnostic (current behavior)
- `lore doctor --quick` = fast pre-flight, exit-code-only (current `health`)
- Drop `health` as a separate command, add a hidden alias for backward compat

### P5: Fix `-f` Short Flag Collision (MEDIUM impact, TRIVIAL effort)

Change `count`'s `-f, --for` to just `--for` (no short flag). `-f` should mean `--force` project-wide, or nowhere.

### P6: Consolidate `trace` + `file-history` (MEDIUM impact, MEDIUM effort)

`trace` already does everything `file-history` does plus more. Options:

**Option A:** Make `file-history` an alias for `trace --flat` (shows MR list without issue/discussion linking).

**Option B:** Add `--mrs-only` to `trace` that produces `file-history` output. Deprecate `file-history` with a hidden alias.

Either way, one fewer top-level command and no lost functionality.

### P7: Hide Pipeline Sub-stages (LOW impact, TRIVIAL effort)

Move `ingest`, `generate-docs`, `embed` to `#[command(hide = true)]`. They remain usable but don't clutter `--help`. Direct users to `sync` with stage-skip flags.

For power users who need individual stages, document in `sync --help`:
```
To run individual stages:
  lore ingest                    # Fetch from GitLab only
  lore generate-docs             # Rebuild documents only
  lore embed                     # Re-embed only
```

### P8: Make `count` a Flag, Not a Command (LOW impact, MEDIUM effort)

Add `--count` to `issues` and `mrs`:
```bash
lore issues --count              # replaces: lore count issues
lore mrs --count                 # replaces: lore count mrs
lore notes --count               # replaces: lore count notes
```

Keep `count` as a hidden alias for backward compatibility. Removes one top-level command.

### P9: Consistent `--open` Short Flag (LOW impact, TRIVIAL effort)

`notes --open` lacks the `-o` shorthand that `issues` and `mrs` have. Add it.

### P10: Add `--sort` to `search` (LOW impact, LOW effort)

`search` returns ranked results but offers no `--sort` override. Adding `--sort=score,created,updated` would bring it in line with `issues`/`mrs`/`notes`.

---

## 7. Summary: Proposed Command Tree (After All Changes)

If all proposals were adopted, the visible top-level shrinks from **29 -> 21**:

| Before (29) | After (21) | Change |
|-------------|------------|--------|
| `issues` | `issues` | -- |
| `mrs` | `mrs` | -- |
| `notes` | `notes` | -- |
| `search` | `search` | -- |
| `timeline` | `timeline` | -- |
| `who` | `who` | -- |
| `me` | `me` | -- |
| `file-history` | *(hidden, alias for `trace --flat`)* | **merged into trace** |
| `trace` | `trace` | absorbs file-history |
| `drift` | `drift` | -- |
| `related` | `related` | -- |
| `count` | *(hidden, `issues --count` replaces)* | **absorbed** |
| `sync` | `sync` | -- |
| `ingest` | *(hidden)* | **hidden** |
| `generate-docs` | *(hidden)* | **hidden** |
| `embed` | *(hidden)* | **hidden** |
| `status` | `status` | -- |
| `health` | *(merged into doctor)* | **merged** |
| `doctor` | `doctor` | absorbs health |
| `stats` | `index` | **renamed** |
| `init` | `init` | -- |
| `auth` | `auth` | -- |
| `token` | `token` | -- |
| `migrate` | `migrate` | -- |
| `cron` | `cron` | -- |
| `robot-docs` | `robot-docs` | -- |
| `completions` | `completions` | -- |
| `version` | `version` | -- |

**Net reduction:** 29 -> 21 visible (-28%). The hidden commands remain fully functional and documented in `robot-docs` for agents that already use them.

**Theoretical basis:**

- **Miller's Law** -- Humans can hold 7+/-2 items in working memory. 29 commands far exceeds this. Even with help grouping (P1), the sheer count creates decision fatigue. The literature on CLI design (Heroku's "12-Factor CLI", clig.dev's "Command Line Interface Guidelines") recommends 10-15 top-level commands maximum, with grouping or nesting for anything beyond.

- **For LLM agents specifically:** Research on tool-use with large tool sets (Schick et al. 2023, Qin et al. 2023) shows that agent accuracy degrades as the tool count increases, roughly following an inverse log curve. Reducing from 29 to 21 commands in the robot-docs manifest would measurably improve agent command selection accuracy.

- **Backward compatibility is free:** Since AGENTS.md says "we don't care about backward compatibility," hidden aliases cost nothing and prevent breakage for agents with cached robot-docs.

---

## 8. Priority Matrix

| Proposal | Impact | Effort | Risk | Recommended Order |
|----------|--------|--------|------|-------------------|
| P1: Help grouping | High | Trivial | None | **Do first** |
| P3: Singular/plural fix | Medium | Trivial | None | **Do first** |
| P5: Fix `-f` collision | Medium | Trivial | None | **Do first** |
| P9: `notes -o` shorthand | Low | Trivial | None | **Do first** |
| P2: Rename `stats`->`index` | High | Low | Alias needed | **Do second** |
| P4: Merge health->doctor | Medium | Low | Alias needed | **Do second** |
| P7: Hide pipeline stages | Low | Trivial | Needs docs update | **Do second** |
| P6: Merge file-history->trace | Medium | Medium | Flag design | **Plan carefully** |
| P8: count -> --count flag | Low | Medium | Compat shim | **Plan carefully** |
| P10: `--sort` on search | Low | Low | None | **When convenient** |

The "do first" tier is 4 changes that could ship in a single commit with zero risk and immediate ergonomic improvement for both humans and agents.