6 Commits

Author SHA1 Message Date
teernisse
159c490ad7 docs: update README with notes, drift, error tolerance, scoring config, and expanded command reference
Major additions:
- lore notes command: full documentation of rich note querying with
  filters (author, type, path, resolution, time range, body substring),
  sort/format options, field selection, and browser opening
- lore drift command: discussion divergence detection documentation
- Error Tolerance section: table of all 8 auto-correction types with
  examples and mode behavior, stderr JSON warning format, fuzzy
  suggestion format for unrecognized commands
- Command Aliases table: primary commands and their accepted aliases
- scoring config section: all weight/half-life/decay parameters for
  the who-expert scoring engine (authorWeight, reviewerWeight, noteBonus,
  half-life periods, closedMrMultiplier, excludedUsernames)

Updates to existing sections:
- Timeline: entity-direct seeding syntax (issue:N, i:N, mr:N, m:N),
  hybrid search pipeline description replacing pure FTS5, discussion
  thread collection, --fields flag, numbered progress spinners
- Search: --after/--updated-after renamed to --since/--updated-since,
  progress spinner behavior, note type filter
- Who: --explain-score, --as-of, --include-bots, --all-history, --detail
- Sync: --no-file-changes flag
- Robot-docs: --brief flag
- Field selection: expanded to note which commands support --fields
2026-02-13 17:27:59 -05:00
teernisse
e0041ed4d9 feat(cli): improve error recovery with alias-aware suggestions and error tolerance manifest
Two related improvements to agent ergonomics in main.rs:

1. suggest_similar_command now matches against aliases (issue->issues,
   mr->mrs, find->search, stat->stats, note->notes, etc.) and provides
   contextual usage examples via a new command_example() helper, so
   agents get actionable recovery hints like "Did you mean 'lore mrs'?
   Example: lore --robot mrs -n 10" instead of just the command name.

2. robot-docs now includes an error_tolerance section documenting every
   auto-correction the CLI performs: types (single_dash_long_flag,
   case_normalization, flag_prefix, fuzzy_flag, subcommand_alias,
   value_normalization, value_fuzzy, prefix_matching), examples, and
   mode behavior (threshold differences). Also expands the aliases
   section with command_aliases and pre_clap_aliases maps for complete
   agent self-discovery.

Together these ensure agents can programmatically discover and recover
from any CLI input error without human intervention.
2026-02-13 17:27:49 -05:00
teernisse
a34751bd47 feat(autocorrect): expand pre-clap correction to 3-phase pipeline with subcommand aliases, value normalization, and flag prefix matching
Three-phase pipeline replacing the single-pass correction:

- Phase A: Subcommand alias correction — handles forms clap can't
  express (merge_requests, mergerequests, robotdocs, generatedocs,
  gen-docs, etc.) via case-insensitive alias map lookup.
- Phase B: Per-arg flag corrections — adds unambiguous prefix expansion
  (--proj -> --project) alongside existing single-dash, case, and fuzzy
  rules. New FlagPrefix rule with 0.95 confidence.
- Phase C: Enum value normalization — auto-corrects casing, prefixes,
  and typos for flags with known valid values. Handles both --flag value
  and --flag=value forms. Respects POSIX -- option terminator.

Changes strict/robot mode from disabling fuzzy matching entirely to using
a higher threshold (0.9 vs 0.8), still catching obvious typos like
--projct while avoiding speculative corrections that mislead agents.

New CorrectionRule variants: SubcommandAlias, ValueNormalization,
ValueFuzzy, FlagPrefix. Each has a corresponding teaching note.
Comprehensive test coverage for all new correction types including
subcommand aliases, value normalization (case, prefix, fuzzy, eq-form),
flag prefix (ambiguous rejection, eq-value preservation), and updated
strict mode behavior.
2026-02-13 17:27:39 -05:00
teernisse
0aecbf33c0 feat(xref): extract cross-references from descriptions, user notes, and fix system note regex
- Fix MENTIONED_RE/CLOSED_BY_RE to match real GitLab format
  ('mentioned in issue #N' / 'mentioned in merge request !N')
- Add GITLAB_URL_RE + parse_url_refs() for full URL extraction
- Add extract_refs_from_descriptions() -> source_method='description_parse'
- Add extract_refs_from_user_notes() -> source_method='note_parse'
- Wire both into orchestrator after system note extraction
- 36 tests: regex fix, URL parsing, integration, idempotency
2026-02-13 17:19:36 -05:00
teernisse
c10471ddb9 feat(timeline): add entity-direct seeding (issue:N, mr:N syntax)
Adds issue:N / i:N / mr:N / m:N query syntax to bypass hybrid search
and seed the timeline directly from a known entity. All discussions for
the entity are gathered without needing Ollama.

- parse_timeline_query() detects entity-direct patterns
- resolve_entity_by_iid() resolves IID to EntityRef with ambiguity handling
- seed_timeline_direct() gathers all discussions for the entity
- 20 new tests (5 resolve, 6 direct seed, 9 parse)
- Updated CLI help text and robot-docs manifest
2026-02-13 15:22:45 -05:00
teernisse
cbce4c9f59 release: v0.8.2
2026-02-13 15:01:28 -05:00
13 changed files with 1937 additions and 109 deletions

Cargo.lock (generated, 2 lines changed)

@@ -1106,7 +1106,7 @@ checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
[[package]]
name = "lore"
version = "0.8.1"
version = "0.8.2"
dependencies = [
"async-stream",
"chrono",

Cargo.toml

@@ -1,6 +1,6 @@
[package]
name = "lore"
version = "0.8.1"
version = "0.8.2"
edition = "2024"
description = "Gitlore - Local GitLab data management with semantic search"
authors = ["Taylor Eernisse"]

README.md (168 lines changed)

@@ -19,7 +19,10 @@ Local GitLab data management with semantic search, people intelligence, and temp
- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
- **Work item status enrichment**: Fetches issue statuses (e.g., "To do", "In progress", "Done") from GitLab's GraphQL API with adaptive page sizing, color-coded display, and case-insensitive filtering
- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
- **Note querying**: Rich filtering over discussion notes by author, type, path, resolution status, time range, and body content
- **Discussion drift detection**: Semantic analysis of how discussions diverge from original issue intent
- **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
- **Error tolerance**: Auto-corrects common CLI mistakes (case, typos, single-dash flags, value casing) with teaching feedback
- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
## Installation
@@ -71,6 +74,12 @@ lore who @asmith
# Timeline of events related to deployments
lore timeline "deployment"
# Timeline for a specific issue
lore timeline issue:42
# Query notes by author
lore notes --author alice --since 7d
# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .
```
@@ -109,6 +118,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
"model": "nomic-embed-text",
"baseUrl": "http://localhost:11434",
"concurrency": 4
},
"scoring": {
"authorWeight": 25,
"reviewerWeight": 10,
"noteBonus": 1,
"authorHalfLifeDays": 180,
"reviewerHalfLifeDays": 90,
"noteHalfLifeDays": 45,
"excludedUsernames": ["bot-user"]
}
}
```
@@ -135,6 +153,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
| `embedding` | `model` | `nomic-embed-text` | Model name for embeddings |
| `embedding` | `baseUrl` | `http://localhost:11434` | Ollama server URL |
| `embedding` | `concurrency` | `4` | Concurrent embedding requests |
| `scoring` | `authorWeight` | `25` | Points per MR where the user authored code touching the path |
| `scoring` | `reviewerWeight` | `10` | Points per MR where the user reviewed code touching the path |
| `scoring` | `noteBonus` | `1` | Bonus per inline review comment (DiffNote) |
| `scoring` | `reviewerAssignmentWeight` | `3` | Points per MR where the user was assigned as reviewer |
| `scoring` | `authorHalfLifeDays` | `180` | Half-life in days for author contribution decay |
| `scoring` | `reviewerHalfLifeDays` | `90` | Half-life in days for reviewer contribution decay |
| `scoring` | `noteHalfLifeDays` | `45` | Half-life in days for note/comment decay |
| `scoring` | `closedMrMultiplier` | `0.5` | Score multiplier for closed (not merged) MRs |
| `scoring` | `excludedUsernames` | `[]` | Usernames excluded from expert results (e.g., bots) |
### Config File Resolution
@@ -262,18 +289,21 @@ lore search "login flow" --mode semantic # Vector similarity only
lore search "auth" --type issue # Filter by source type
lore search "auth" --type mr # MR documents only
lore search "auth" --type discussion # Discussion documents only
lore search "auth" --type note # Individual notes only
lore search "deploy" --author username # Filter by author
lore search "deploy" -p group/repo # Filter by project
lore search "deploy" --label backend # Filter by label (AND logic)
lore search "deploy" --path src/ # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d # Created after (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w # Updated after
lore search "deploy" --since 7d # Created since (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-since 2w # Updated since
lore search "deploy" -n 50 # Limit results (default 20, max 100)
lore search "deploy" --explain # Show ranking explanation per result
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
```
-The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. Use `raw` for advanced FTS5 query syntax (AND, OR, NOT, phrase matching, prefix queries).
+The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. FTS5 boolean operators (`AND`, `OR`, `NOT`, `NEAR`) are passed through in safe mode, so queries like `"switch AND health"` work without switching to raw mode. Use `raw` for advanced FTS5 query syntax (phrase matching, column filters, prefix queries).
A progress spinner displays during search, showing the active mode (e.g., `Searching (hybrid)...`). In robot mode, spinners are suppressed for clean JSON output.
Requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.
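One way safe mode's behavior can be approximated: quote each bare term so FTS5 treats it literally, while letting recognized boolean operators pass through unchanged. This is an illustrative sketch under that assumption, not lore's actual sanitizer:

```rust
/// Sanitize a user query into FTS5 MATCH syntax: bare terms are quoted so
/// punctuation can't break the query, while uppercase boolean operators
/// (AND, OR, NOT, NEAR) pass through. FTS5 escapes a double quote inside
/// a quoted string by doubling it.
fn sanitize_fts_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|tok| match tok {
            "AND" | "OR" | "NOT" | "NEAR" => tok.to_string(),
            _ => format!("\"{}\"", tok.replace('"', "\"\"")),
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    assert_eq!(sanitize_fts_query("switch AND health"), "\"switch\" AND \"health\"");
    assert_eq!(sanitize_fts_query("login-flow"), "\"login-flow\"");
}
```

A sanitizer like this is why hyphenated or punctuated terms survive safe mode without raw FTS5 syntax errors.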
@@ -283,7 +313,7 @@ People intelligence: discover experts, analyze workloads, review patterns, activ
#### Expert Mode
-Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis).
+Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis). Scores use exponential half-life decay so recent contributions count more than older ones. Scoring weights and half-life periods are configurable via the `scoring` config section.
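Exponential half-life decay can be sketched as follows; the function name is hypothetical, and the numbers come from the documented defaults (`authorWeight` 25, `authorHalfLifeDays` 180):

```rust
/// Decay a contribution's base weight by its age: a contribution exactly
/// one half-life old is worth half its base weight, two half-lives a quarter.
fn decayed_score(base_weight: f64, age_days: f64, half_life_days: f64) -> f64 {
    base_weight * 0.5_f64.powf(age_days / half_life_days)
}

fn main() {
    // With authorWeight = 25 and authorHalfLifeDays = 180:
    let fresh = decayed_score(25.0, 0.0, 180.0);    // full weight
    let aged = decayed_score(25.0, 180.0, 180.0);   // ~half weight
    println!("fresh: {fresh:.2}, six months old: {aged:.2}");
}
```

The same curve applies per contribution type, with the shorter reviewer and note half-lives making those signals fade faster.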
```bash
lore who src/features/auth/ # Who knows about this directory?
@@ -292,6 +322,9 @@ lore who --path README.md # Root files need --path flag
lore who --path Makefile # Dotless root files too
lore who src/ --since 3m # Limit to recent 3 months
lore who src/ -p group/repo # Scope to project
lore who src/ --explain-score # Show per-component score breakdown
lore who src/ --as-of 30d # Score as if "now" was 30 days ago
lore who src/ --include-bots # Include bot users in results
```
The target is auto-detected as a path when it contains `/`. For root files without `/` (e.g., `README.md`), use the `--path` flag. Default time window: 6 months.
@@ -348,13 +381,22 @@ Shows: users with touch counts (author vs. review), linked MR references. Defaul
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
| `-n` / `--limit` | Max results per section (1-500, default 20) |
| `--all-history` | Remove the default time window, query all history |
| `--detail` | Show per-MR detail breakdown (expert mode only) |
| `--explain-score` | Show per-component score breakdown (expert mode only) |
| `--as-of` | Score as if "now" is a past date (ISO 8601 or duration like 30d, expert mode only) |
| `--include-bots` | Include bot users normally excluded via `scoring.excludedUsernames` |
### `lore timeline`
Reconstruct a chronological timeline of events matching a keyword query. The pipeline discovers related entities through cross-reference graph traversal and assembles a unified, time-ordered event stream.
```bash
lore timeline "deployment" # Events related to deployments
lore timeline "deployment" # Search-based seeding (hybrid search)
lore timeline issue:42 # Direct entity seeding by issue IID
lore timeline i:42 # Shorthand for issue:42
lore timeline mr:99 # Direct entity seeding by MR IID
lore timeline m:99 # Shorthand for mr:99
lore timeline "auth" -p group/repo # Scoped to a project
lore timeline "auth" --since 30d # Only recent events
lore timeline "migration" --depth 2 # Deeper cross-reference expansion
@@ -363,6 +405,8 @@ lore timeline "deploy" -n 50 # Limit event count
lore timeline "auth" --max-seeds 5 # Fewer seed entities
```
The query can be either a search string (hybrid search finds matching entities) or an entity reference (`issue:N`, `i:N`, `mr:N`, `m:N`) which directly seeds the timeline from a specific entity and its cross-references.
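Entity-direct detection can be sketched roughly like this; `EntityKind` and `parse_entity_ref` are illustrative names, not the actual `parse_timeline_query` implementation:

```rust
#[derive(Debug, PartialEq)]
enum EntityKind {
    Issue,
    Mr,
}

/// Detect `issue:N` / `i:N` / `mr:N` / `m:N` query forms. Ordinary search
/// strings return None and fall through to hybrid search seeding.
fn parse_entity_ref(query: &str) -> Option<(EntityKind, u64)> {
    let (prefix, rest) = query.split_once(':')?;
    let kind = match prefix.to_ascii_lowercase().as_str() {
        "issue" | "i" => EntityKind::Issue,
        "mr" | "m" => EntityKind::Mr,
        _ => return None,
    };
    rest.parse::<u64>().ok().map(|iid| (kind, iid))
}

fn main() {
    assert_eq!(parse_entity_ref("issue:42"), Some((EntityKind::Issue, 42)));
    assert_eq!(parse_entity_ref("m:99"), Some((EntityKind::Mr, 99)));
    assert_eq!(parse_entity_ref("deployment"), None);
}
```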
#### Flags
| Flag | Default | Description |
@@ -375,13 +419,16 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
| `--max-seeds` | `10` | Maximum seed entities from search |
| `--max-entities` | `50` | Maximum entities discovered via cross-references |
| `--max-evidence` | `10` | Maximum evidence notes included |
| `--fields` | all | Select output fields (comma-separated, or 'minimal' preset) |
#### Pipeline Stages
-1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents are ranked by BM25 relevance.
-2. **HYDRATE** -- Evidence notes are extracted: the top FTS-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced.
+Each stage displays a numbered progress spinner (e.g., `[1/3] Seeding timeline...`). In robot mode, spinners are suppressed for clean JSON output.
+1. **SEED** -- Hybrid search (FTS5 lexical + Ollama vector similarity via Reciprocal Rank Fusion) identifies the most relevant issues and MRs. Falls back to lexical-only if Ollama is unavailable. Discussion notes matching the query are also discovered and attached to their parent entities.
+2. **HYDRATE** -- Evidence notes are extracted: the top search-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced. Matched discussions are collected as full thread candidates.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and optionally "mentioned" references up to the configured depth.
-4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking.
+4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, evidence notes, and full discussion threads. Events are sorted chronologically with stable tiebreaking.
5. **RENDER** -- Events are formatted as human-readable text or structured JSON (robot mode).
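Reciprocal Rank Fusion combines the lexical and vector rankings without needing to calibrate their raw scores. A minimal sketch; the constant k = 60 is the common default from the RRF literature, and lore's actual constant and types are not shown here:

```rust
use std::collections::HashMap;

/// Fuse ranked lists of document ids via Reciprocal Rank Fusion:
/// each document scores sum(1 / (k + rank)) across the lists containing it,
/// so documents ranked well in several lists rise to the top.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, doc) in list.iter().enumerate() {
            let rank = (i + 1) as f64; // ranks are 1-based
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut out: Vec<_> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    let lexical = vec!["issue:42", "mr:99", "issue:7"];
    let semantic = vec!["issue:42", "mr:99"];
    let fused = rrf_fuse(&[lexical, semantic], 60.0);
    // issue:42 leads both lists, so it wins the fused ranking.
    println!("{:?}", fused.first());
}
```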
#### Event Types
@@ -395,13 +442,70 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
-| `NoteEvidence` | Discussion note matched by FTS, with snippet |
+| `NoteEvidence` | Discussion note matched by search, with snippet |
| `DiscussionThread` | Full discussion thread with all non-system notes |
| `CrossReferenced` | Reference to another entity |
#### Unresolved References
When graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the output. This enables discovery of external dependencies and can inform future sync targets.
### `lore notes`
Query individual notes from discussions with rich filtering options.
```bash
lore notes # List 50 most recent notes
lore notes --author alice --since 7d # Notes by alice in last 7 days
lore notes --for-issue 42 -p group/repo # Notes on issue #42
lore notes --for-mr 99 -p group/repo # Notes on MR !99
lore notes --path src/ --resolution unresolved # Unresolved diff notes in src/
lore notes --note-type DiffNote # Only inline code review comments
lore notes --contains "TODO" # Substring search in note body
lore notes --include-system # Include system-generated notes
lore notes --since 2w --until 2024-12-31 # Time-bounded range
lore notes --sort updated --asc # Sort by update time, ascending
lore notes --format csv # CSV output
lore notes --format jsonl # Line-delimited JSON
lore notes -o # Open first result in browser
# Field selection (robot mode)
lore -J notes --fields minimal # Compact: id, author_username, body, created_at_iso
```
#### Filters
| Flag | Description |
|------|-------------|
| `-a` / `--author` | Filter by note author username |
| `--note-type` | Filter by note type (DiffNote, DiscussionNote) |
| `--contains` | Substring search in note body |
| `--note-id` | Filter by internal note ID |
| `--gitlab-note-id` | Filter by GitLab note ID |
| `--discussion-id` | Filter by discussion ID |
| `--include-system` | Include system notes (excluded by default) |
| `--for-issue` | Notes on a specific issue IID (requires `-p`) |
| `--for-mr` | Notes on a specific MR IID (requires `-p`) |
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Notes created since (7d, 2w, 1m, or YYYY-MM-DD) |
| `--until` | Notes created until (YYYY-MM-DD, inclusive end-of-day) |
| `--path` | Filter by file path (DiffNotes only; trailing `/` for prefix match) |
| `--resolution` | Filter by resolution status (`any`, `unresolved`, `resolved`) |
| `--sort` | Sort by `created` (default) or `updated` |
| `--asc` | Sort ascending (default: descending) |
| `--format` | Output format: `table` (default), `json`, `jsonl`, `csv` |
| `-o` / `--open` | Open first result in browser |
### `lore drift`
Detect discussion divergence from the original intent of an issue by comparing the semantic similarity of discussion content against the issue description.
```bash
lore drift issues 42 # Check divergence on issue #42
lore drift issues 42 --threshold 0.6 # Higher threshold (stricter)
lore drift issues 42 -p group/repo # Scope to project
```
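The divergence measurement can be illustrated with cosine similarity over embedding vectors. This is a sketch only: lore compares Ollama embeddings of discussion content against the issue description, and its exact drift formula is not shown here.

```rust
/// Cosine similarity between two embedding vectors: 1.0 means the same
/// direction, 0.0 means orthogonal. Drift would be flagged when the
/// discussion's similarity to the issue falls below the threshold.
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

fn main() {
    // Hypothetical 3-dimensional embeddings (real ones are much larger).
    let issue_desc = [0.9, 0.1, 0.0];
    let discussion = [0.1, 0.9, 0.2];
    let sim = cosine_similarity(&issue_desc, &discussion);
    let threshold = 0.6; // mirrors the --threshold flag's semantics
    println!("similarity {sim:.2}, drifted: {}", sim < threshold);
}
```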
### `lore sync`
Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings.
@@ -413,6 +517,7 @@ lore sync --force # Override stale lock
lore sync --no-embed # Skip embedding step
lore sync --no-docs # Skip document regeneration
lore sync --no-events # Skip resource event fetching
lore sync --no-file-changes # Skip MR file change fetching
lore sync --dry-run # Preview what would be synced
```
@@ -571,6 +676,7 @@ Machine-readable command manifest for agent self-discovery. Returns a JSON schem
```bash
lore robot-docs # Pretty-printed JSON
lore --robot robot-docs # Compact JSON for parsing
lore robot-docs --brief # Omit response_schema (~60% smaller)
```
### `lore version`
@@ -622,7 +728,7 @@ The `actions` array contains executable shell commands an agent can run to recov
### Field Selection
-The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response, reducing token usage for AI agent workflows:
+The `--fields` flag controls which fields appear in the JSON response, reducing token usage for AI agent workflows. Supported on `issues`, `mrs`, `notes`, `search`, `timeline`, and `who` list commands:
```bash
# Minimal preset (~60% fewer tokens)
@@ -639,6 +745,48 @@ Valid fields for issues: `iid`, `title`, `state`, `author_username`, `labels`, `
Valid fields for MRs: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`
### Error Tolerance
The CLI auto-corrects common mistakes before parsing, emitting a teaching note to stderr. Corrections work in both human and robot modes:
| Correction | Example | Mode |
|-----------|---------|------|
| Single-dash long flag | `-robot` -> `--robot` | All |
| Case normalization | `--Robot` -> `--robot` | All |
| Flag prefix expansion | `--proj` -> `--project` (unambiguous only) | All |
| Fuzzy flag match | `--projct` -> `--project` | All (threshold 0.9 in robot, 0.8 in human) |
| Subcommand alias | `merge_requests` -> `mrs`, `robotdocs` -> `robot-docs` | All |
| Value normalization | `--state Opened` -> `--state opened` | All |
| Value fuzzy match | `--state opend` -> `--state opened` | All |
| Subcommand prefix | `lore iss` -> `lore issues` (unambiguous only, via clap) | All |
In robot mode, corrections emit structured JSON to stderr:
```json
{"warning":{"type":"ARG_CORRECTED","corrections":[...],"teaching":["Use double-dash for long flags: --robot (not -robot)"]}}
```
When a command or flag is still unrecognized after corrections, the error response includes a fuzzy suggestion and, for enum-like flags, lists valid values:
```json
{"error":{"code":"UNKNOWN_COMMAND","message":"...","suggestion":"Did you mean 'lore issues'? Example: lore --robot issues -n 10. Run 'lore robot-docs' for all commands"}}
```
### Command Aliases
Commands accept aliases for common variations:
| Primary | Aliases |
|---------|---------|
| `issues` | `issue` |
| `mrs` | `mr`, `merge-requests`, `merge-request` |
| `notes` | `note` |
| `search` | `find`, `query` |
| `stats` | `stat` |
| `status` | `st` |
Unambiguous prefixes also work via subcommand inference (e.g., `lore iss` -> `lore issues`, `lore time` -> `lore timeline`).
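The unambiguous-prefix rule can be sketched as follows (an illustrative helper, not the actual clap inference code):

```rust
/// Resolve a command prefix to its full name only when exactly one
/// candidate matches; zero or multiple matches return None so the CLI
/// can report an error instead of guessing.
fn resolve_prefix<'a>(input: &str, commands: &[&'a str]) -> Option<&'a str> {
    let lower = input.to_ascii_lowercase();
    let matches: Vec<&str> = commands
        .iter()
        .filter(|c| c.starts_with(lower.as_str()))
        .copied()
        .collect();
    if matches.len() == 1 { Some(matches[0]) } else { None }
}

fn main() {
    let cmds = ["issues", "mrs", "search", "stats", "status", "timeline"];
    assert_eq!(resolve_prefix("iss", &cmds), Some("issues"));
    assert_eq!(resolve_prefix("time", &cmds), Some("timeline"));
    // "stat" is ambiguous as a prefix (stats vs status); the alias table
    // is what maps `stat` to `stats`.
    assert_eq!(resolve_prefix("stat", &cmds), None);
}
```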
### Agent Self-Discovery
The `robot-docs` command provides a complete machine-readable manifest including response schemas for every command:


@@ -21,6 +21,10 @@ pub enum CorrectionRule {
SingleDashLongFlag,
CaseNormalization,
FuzzyFlag,
SubcommandAlias,
ValueNormalization,
ValueFuzzy,
FlagPrefix,
}
/// Result of the correction pass over raw args.
@@ -261,18 +265,45 @@ pub const ENUM_VALUES: &[(&str, &[&str])] = &[
("--state", &["opened", "closed", "merged", "locked", "all"]),
("--mode", &["lexical", "hybrid", "semantic"]),
("--sort", &["updated", "created", "iid"]),
("--type", &["issue", "mr", "discussion"]),
("--type", &["issue", "mr", "discussion", "note"]),
("--fts-mode", &["safe", "raw"]),
("--color", &["auto", "always", "never"]),
("--log-format", &["text", "json"]),
("--for", &["issue", "mr"]),
];
// ---------------------------------------------------------------------------
// Subcommand alias map (for forms clap aliases can't express)
// ---------------------------------------------------------------------------
/// Subcommand aliases for non-standard forms (underscores, no separators).
/// Clap `visible_alias`/`alias` handles hyphenated forms (`merge-requests`);
/// this map catches the rest.
const SUBCOMMAND_ALIASES: &[(&str, &str)] = &[
("merge_requests", "mrs"),
("merge_request", "mrs"),
("mergerequests", "mrs"),
("mergerequest", "mrs"),
("generate_docs", "generate-docs"),
("generatedocs", "generate-docs"),
("gendocs", "generate-docs"),
("gen-docs", "generate-docs"),
("robot_docs", "robot-docs"),
("robotdocs", "robot-docs"),
("sync_status", "status"),
("syncstatus", "status"),
("auth_test", "auth"),
("authtest", "auth"),
];
// ---------------------------------------------------------------------------
// Correction thresholds
// ---------------------------------------------------------------------------
const FUZZY_FLAG_THRESHOLD: f64 = 0.8;
/// Stricter threshold for robot mode — only high-confidence corrections to
/// avoid misleading agents. Still catches obvious typos like `--projct`.
const FUZZY_FLAG_THRESHOLD_STRICT: f64 = 0.9;
// ---------------------------------------------------------------------------
// Core logic
@@ -332,20 +363,29 @@ fn valid_flags_for(subcommand: Option<&str>) -> Vec<&'static str> {
/// Run the pre-clap correction pass on raw args.
///
-/// When `strict` is true (robot mode), only deterministic corrections are applied
-/// (single-dash long flags, case normalization). Fuzzy matching is disabled to
-/// prevent misleading agents with speculative corrections.
+/// Three-phase pipeline:
+/// - Phase A: Subcommand alias correction (case-insensitive alias map)
+/// - Phase B: Per-arg flag corrections (single-dash, case, prefix, fuzzy)
+/// - Phase C: Enum value normalization (case + fuzzy + prefix on known values)
+///
+/// When `strict` is true (robot mode), fuzzy matching uses a higher threshold
+/// (0.9 vs 0.8) to avoid speculative corrections while still catching obvious
+/// typos like `--projct` → `--project`.
///
/// Returns the (possibly modified) args and any corrections applied.
pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
-let subcommand = detect_subcommand(&raw);
-let valid = valid_flags_for(subcommand);
-let mut corrected = Vec::with_capacity(raw.len());
let mut corrections = Vec::new();
+// Phase A: Subcommand alias correction
+let args = correct_subcommand(raw, &mut corrections);
+// Phase B: Per-arg flag corrections
+let valid = valid_flags_for(detect_subcommand(&args));
+let mut corrected = Vec::with_capacity(args.len());
let mut past_terminator = false;
-for arg in raw {
+for arg in args {
// B1: Stop correcting after POSIX `--` option terminator
if arg == "--" {
past_terminator = true;
@@ -367,12 +407,177 @@ pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
}
}
// Phase C: Enum value normalization
normalize_enum_values(&mut corrected, &mut corrections);
CorrectionResult {
args: corrected,
corrections,
}
}
/// Phase A: Replace subcommand aliases with their canonical names.
///
/// Handles forms that can't be expressed as clap `alias`/`visible_alias`
/// (underscores, no-separator forms). Case-insensitive matching.
fn correct_subcommand(mut args: Vec<String>, corrections: &mut Vec<Correction>) -> Vec<String> {
// Find the subcommand position index, then check the alias map.
// Can't use iterators easily because we need to mutate args[i].
let mut skip_next = false;
let mut subcmd_idx = None;
for (i, arg) in args.iter().enumerate().skip(1) {
if skip_next {
skip_next = false;
continue;
}
if arg.starts_with('-') {
if arg.contains('=') {
continue;
}
if matches!(arg.as_str(), "--config" | "-c" | "--color" | "--log-format") {
skip_next = true;
}
continue;
}
subcmd_idx = Some(i);
break;
}
if let Some(i) = subcmd_idx
&& let Some((_, canonical)) = SUBCOMMAND_ALIASES
.iter()
.find(|(alias, _)| alias.eq_ignore_ascii_case(&args[i]))
{
corrections.push(Correction {
original: args[i].clone(),
corrected: (*canonical).to_string(),
rule: CorrectionRule::SubcommandAlias,
confidence: 1.0,
});
args[i] = (*canonical).to_string();
}
args
}
/// Phase C: Normalize enum values for flags with known valid values.
///
/// Handles both `--flag value` and `--flag=value` forms. Corrections are:
/// 1. Case normalization: `Opened` → `opened`
/// 2. Prefix expansion: `open` → `opened` (only if unambiguous)
/// 3. Fuzzy matching: `opend` → `opened`
fn normalize_enum_values(args: &mut [String], corrections: &mut Vec<Correction>) {
let mut i = 0;
while i < args.len() {
// Respect POSIX `--` option terminator — don't normalize values after it
if args[i] == "--" {
break;
}
// Handle --flag=value form
if let Some(eq_pos) = args[i].find('=') {
let flag = args[i][..eq_pos].to_string();
let value = args[i][eq_pos + 1..].to_string();
if let Some(valid_vals) = lookup_enum_values(&flag)
&& let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals)
{
let original = args[i].clone();
let corrected = format!("{flag}={corrected_val}");
args[i] = corrected.clone();
corrections.push(Correction {
original,
corrected,
rule: if is_case_only {
CorrectionRule::ValueNormalization
} else {
CorrectionRule::ValueFuzzy
},
confidence: 0.95,
});
}
i += 1;
continue;
}
// Handle --flag value form
if args[i].starts_with("--")
&& let Some(valid_vals) = lookup_enum_values(&args[i])
&& i + 1 < args.len()
&& !args[i + 1].starts_with('-')
{
let value = args[i + 1].clone();
if let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals) {
let original = args[i + 1].clone();
args[i + 1] = corrected_val.to_string();
corrections.push(Correction {
original,
corrected: corrected_val.to_string(),
rule: if is_case_only {
CorrectionRule::ValueNormalization
} else {
CorrectionRule::ValueFuzzy
},
confidence: 0.95,
});
}
i += 2;
continue;
}
i += 1;
}
}
/// Look up valid enum values for a flag (case-insensitive flag name match).
fn lookup_enum_values(flag: &str) -> Option<&'static [&'static str]> {
let lower = flag.to_lowercase();
ENUM_VALUES
.iter()
.find(|(f, _)| f.to_lowercase() == lower)
.map(|(_, vals)| *vals)
}
/// Try to normalize a value against a set of valid values.
///
/// Returns `Some((corrected, is_case_only))` if a correction is needed:
/// - `is_case_only = true` for pure case normalization
/// - `is_case_only = false` for prefix/fuzzy corrections
///
/// Returns `None` if the value is already valid or no match is found.
fn normalize_value(input: &str, valid_values: &[&str]) -> Option<(String, bool)> {
// Already valid (exact match)? No correction needed.
if valid_values.contains(&input) {
return None;
}
let lower = input.to_lowercase();
// Case-insensitive exact match
if let Some(&val) = valid_values.iter().find(|v| v.to_lowercase() == lower) {
return Some((val.to_string(), true));
}
// Prefix match (e.g., "open" → "opened") — only if unambiguous
let prefix_matches: Vec<&&str> = valid_values
.iter()
.filter(|v| v.starts_with(&*lower))
.collect();
if prefix_matches.len() == 1 {
return Some(((*prefix_matches[0]).to_string(), false));
}
// Fuzzy match
let best = valid_values
.iter()
.map(|v| (*v, jaro_winkler(&lower, v)))
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
if let Some((val, score)) = best
&& score >= 0.8
{
return Some((val.to_string(), false));
}
None
}
/// Clap built-in flags that should never be corrected. These are handled by clap
/// directly and are not in our GLOBAL_FLAGS registry.
const CLAP_BUILTINS: &[&str] = &["--help", "--version"];
@@ -491,10 +696,34 @@ fn try_correct(arg: &str, valid_flags: &[&str], strict: bool) -> Option<Correcti
});
}
-// Rule 3: Fuzzy flag match — `--staate` -> `--state` (skip in strict mode)
-if !strict
-&& let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
-&& score >= FUZZY_FLAG_THRESHOLD
+// Rule 3: Prefix match — `--proj` -> `--project` (only if unambiguous)
+let prefix_matches: Vec<&str> = valid_flags
+.iter()
+.filter(|f| f.starts_with(&*lower) && f.to_lowercase() != lower)
+.copied()
+.collect();
+if prefix_matches.len() == 1 {
+let matched = prefix_matches[0];
+let corrected = match value_suffix {
+Some(suffix) => format!("{matched}{suffix}"),
+None => matched.to_string(),
+};
+return Some(Correction {
+original: arg.to_string(),
+corrected,
+rule: CorrectionRule::FlagPrefix,
+confidence: 0.95,
+});
+}
+// Rule 4: Fuzzy flag match — higher threshold in strict/robot mode
+let threshold = if strict {
+FUZZY_FLAG_THRESHOLD_STRICT
+} else {
+FUZZY_FLAG_THRESHOLD
+};
+if let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
+&& score >= threshold
{
let corrected = match value_suffix {
Some(suffix) => format!("{best_flag}{suffix}"),
@@ -568,6 +797,30 @@ pub fn format_teaching_note(correction: &Correction) -> String {
correction.corrected, correction.original
)
}
CorrectionRule::SubcommandAlias => {
format!(
"Use canonical command name: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::ValueNormalization => {
format!(
"Values are lowercase: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::ValueFuzzy => {
format!(
"Correct value spelling: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::FlagPrefix => {
format!(
"Use full flag name: {} (not {})",
correction.corrected, correction.original
)
}
}
}
@@ -751,17 +1004,20 @@ mod tests {
assert_eq!(result.args[1], "--help");
}
// ---- Strict mode (robot) uses higher fuzzy threshold ----
#[test]
fn strict_mode_rejects_low_confidence_fuzzy() {
// `--staate` vs `--state` — close but may be below strict threshold (0.9)
// The exact score depends on Jaro-Winkler; this tests that the strict
// threshold is higher than non-strict.
let non_strict = correct_args(args("lore --robot issues --staate opened"), false);
assert_eq!(non_strict.corrections.len(), 1);
assert_eq!(non_strict.corrections[0].rule, CorrectionRule::FuzzyFlag);
// In strict mode, same typo might or might not match depending on JW score.
// We verify that at least wildly wrong flags are still rejected.
let strict = correct_args(args("lore --robot issues --xyzzy foo"), true);
assert!(strict.corrections.is_empty());
}
@@ -780,6 +1036,155 @@ mod tests {
assert_eq!(result.corrections[0].corrected, "--robot");
}
// ---- Subcommand alias correction ----
#[test]
fn subcommand_alias_merge_requests_underscore() {
let result = correct_args(args("lore --robot merge_requests -n 10"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::SubcommandAlias && c.corrected == "mrs")
);
assert!(result.args.contains(&"mrs".to_string()));
}
#[test]
fn subcommand_alias_mergerequests_no_sep() {
let result = correct_args(args("lore --robot mergerequests"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}
#[test]
fn subcommand_alias_generate_docs_underscore() {
let result = correct_args(args("lore generate_docs"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "generate-docs")
);
}
#[test]
fn subcommand_alias_case_insensitive() {
let result = correct_args(args("lore Merge_Requests"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}
#[test]
fn subcommand_alias_valid_command_untouched() {
let result = correct_args(args("lore issues -n 10"), false);
assert!(result.corrections.is_empty());
}
// ---- Enum value normalization ----
#[test]
fn value_case_normalization() {
let result = correct_args(args("lore issues --state Opened"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::ValueNormalization && c.corrected == "opened")
);
assert!(result.args.contains(&"opened".to_string()));
}
#[test]
fn value_case_normalization_eq_form() {
let result = correct_args(args("lore issues --state=Opened"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "--state=opened")
);
}
#[test]
fn value_prefix_expansion() {
// "open" is a unique prefix of "opened"
let result = correct_args(args("lore issues --state open"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "opened" && c.rule == CorrectionRule::ValueFuzzy)
);
}
#[test]
fn value_fuzzy_typo() {
let result = correct_args(args("lore issues --state opend"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "opened"));
}
#[test]
fn value_already_valid_untouched() {
let result = correct_args(args("lore issues --state opened"), false);
// No value corrections expected (flag corrections may still exist)
assert!(!result.corrections.iter().any(|c| matches!(
c.rule,
CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
)));
}
#[test]
fn value_mode_case() {
let result = correct_args(args("lore search --mode Hybrid query"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "hybrid"));
}
#[test]
fn value_normalization_respects_option_terminator() {
// Values after `--` are positional and must not be corrected
let result = correct_args(args("lore search -- --state Opened"), false);
assert!(!result.corrections.iter().any(|c| matches!(
c.rule,
CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
)));
assert_eq!(result.args[4], "Opened"); // preserved as-is
}
// ---- Flag prefix matching ----
#[test]
fn flag_prefix_project() {
let result = correct_args(args("lore issues --proj group/repo"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::FlagPrefix && c.corrected == "--project")
);
}
#[test]
fn flag_prefix_ambiguous_not_corrected() {
// --s could be --state, --since, --sort, --status — ambiguous
let result = correct_args(args("lore issues --s opened"), false);
assert!(
!result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::FlagPrefix)
);
}
#[test]
fn flag_prefix_with_eq_value() {
let result = correct_args(args("lore issues --proj=group/repo"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "--project=group/repo")
);
}
// ---- Teaching notes ----
#[test]
@@ -819,6 +1224,43 @@ mod tests {
assert!(note.contains("spelling"));
}
#[test]
fn teaching_note_subcommand_alias() {
let c = Correction {
original: "merge_requests".to_string(),
corrected: "mrs".to_string(),
rule: CorrectionRule::SubcommandAlias,
confidence: 1.0,
};
let note = format_teaching_note(&c);
assert!(note.contains("canonical"));
assert!(note.contains("mrs"));
}
#[test]
fn teaching_note_value_normalization() {
let c = Correction {
original: "Opened".to_string(),
corrected: "opened".to_string(),
rule: CorrectionRule::ValueNormalization,
confidence: 0.95,
};
let note = format_teaching_note(&c);
assert!(note.contains("lowercase"));
}
#[test]
fn teaching_note_flag_prefix() {
let c = Correction {
original: "--proj".to_string(),
corrected: "--project".to_string(),
rule: CorrectionRule::FlagPrefix,
confidence: 0.95,
};
let note = format_teaching_note(&c);
assert!(note.contains("full flag name"));
}
// ---- Post-clap suggestion helpers ----
#[test]

View File

@@ -13,7 +13,7 @@ use crate::core::timeline::{
};
use crate::core::timeline_collect::collect_events;
use crate::core::timeline_expand::expand_timeline;
use crate::core::timeline_seed::{seed_timeline, seed_timeline_direct};
use crate::embedding::ollama::{OllamaClient, OllamaConfig};
/// Parameters for running the timeline pipeline.
@@ -30,6 +30,43 @@ pub struct TimelineParams {
pub robot_mode: bool,
}
/// Parsed timeline query: either a search string or a direct entity reference.
enum TimelineQuery {
Search(String),
EntityDirect { entity_type: String, iid: i64 },
}
/// Parse the timeline query for entity-direct patterns.
///
/// Recognized patterns (case-insensitive prefix):
/// - `issue:N`, `i:N` -> issue
/// - `mr:N`, `m:N` -> merge_request
/// - Anything else -> search query
fn parse_timeline_query(query: &str) -> TimelineQuery {
let query = query.trim();
if let Some((prefix, rest)) = query.split_once(':') {
let prefix_lower = prefix.to_ascii_lowercase();
if let Ok(iid) = rest.trim().parse::<i64>() {
match prefix_lower.as_str() {
"issue" | "i" => {
return TimelineQuery::EntityDirect {
entity_type: "issue".to_owned(),
iid,
};
}
"mr" | "m" => {
return TimelineQuery::EntityDirect {
entity_type: "merge_request".to_owned(),
iid,
};
}
_ => {}
}
}
}
TimelineQuery::Search(query.to_owned())
}
/// Run the full timeline pipeline: SEED -> EXPAND -> COLLECT.
pub async fn run_timeline(config: &Config, params: &TimelineParams) -> Result<TimelineResult> {
let db_path = get_db_path(config.storage.db_path.as_deref());
@@ -53,27 +90,42 @@ pub async fn run_timeline(config: &Config, params: &TimelineParams) -> Result<Ti
})
.transpose()?;
// Parse query for entity-direct syntax (issue:N, mr:N, i:N, m:N)
let parsed_query = parse_timeline_query(&params.query);
let seed_result = match parsed_query {
TimelineQuery::EntityDirect { entity_type, iid } => {
// Direct seeding: synchronous, no Ollama needed
let spinner = stage_spinner(1, 3, "Resolving entity...", params.robot_mode);
let result = seed_timeline_direct(&conn, &entity_type, iid, project_id)?;
spinner.finish_and_clear();
result
}
TimelineQuery::Search(ref query) => {
// Construct OllamaClient for hybrid search (same pattern as run_search)
let ollama_cfg = &config.embedding;
let client = OllamaClient::new(OllamaConfig {
base_url: ollama_cfg.base_url.clone(),
model: ollama_cfg.model.clone(),
..OllamaConfig::default()
});
// Stage 1+2: SEED + HYDRATE (hybrid search with FTS fallback)
let spinner = stage_spinner(1, 3, "Seeding timeline...", params.robot_mode);
let result = seed_timeline(
&conn,
Some(&client),
query,
project_id,
since_ms,
params.max_seeds,
params.max_evidence,
)
.await?;
spinner.finish_and_clear();
result
}
};
// Stage 3: EXPAND
let spinner = stage_spinner(2, 3, "Expanding cross-references...", params.robot_mode);
@@ -556,3 +608,84 @@ fn count_discussion_threads(events: &[TimelineEvent]) -> usize {
.filter(|e| matches!(e.event_type, TimelineEventType::DiscussionThread { .. }))
.count()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_issue_colon_number() {
let q = parse_timeline_query("issue:42");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
}
#[test]
fn test_parse_i_colon_number() {
let q = parse_timeline_query("i:42");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
}
#[test]
fn test_parse_mr_colon_number() {
let q = parse_timeline_query("mr:99");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
);
}
#[test]
fn test_parse_m_colon_number() {
let q = parse_timeline_query("m:99");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
);
}
#[test]
fn test_parse_case_insensitive() {
let q = parse_timeline_query("ISSUE:42");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
let q = parse_timeline_query("MR:99");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
);
let q = parse_timeline_query("Issue:7");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 7)
);
}
#[test]
fn test_parse_search_fallback() {
let q = parse_timeline_query("switch health");
assert!(matches!(q, TimelineQuery::Search(ref s) if s == "switch health"));
}
#[test]
fn test_parse_non_numeric_falls_back_to_search() {
let q = parse_timeline_query("issue:abc");
assert!(matches!(q, TimelineQuery::Search(_)));
}
#[test]
fn test_parse_unknown_prefix_falls_back_to_search() {
let q = parse_timeline_query("foo:42");
assert!(matches!(q, TimelineQuery::Search(_)));
}
#[test]
fn test_parse_whitespace_trimmed() {
let q = parse_timeline_query(" issue:42 ");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
}
}

View File

@@ -10,6 +10,7 @@ use std::io::IsTerminal;
#[command(name = "lore")]
#[command(version = env!("LORE_VERSION"), about = "Local GitLab data management with semantic search", long_about = None)]
#[command(subcommand_required = false)]
#[command(infer_subcommands = true)]
#[command(after_long_help = "\x1b[1mEnvironment:\x1b[0m
GITLAB_TOKEN GitLab personal access token (or name set in config)
LORE_ROBOT Enable robot/JSON mode (non-empty, non-zero value)
@@ -107,12 +108,19 @@ impl Cli {
#[allow(clippy::large_enum_variant)]
pub enum Commands {
/// List or show issues
#[command(visible_alias = "issue")]
Issues(IssuesArgs),
/// List or show merge requests
#[command(
visible_alias = "mr",
alias = "merge-requests",
alias = "merge-request"
)]
Mrs(MrsArgs),
/// List notes from discussions
#[command(visible_alias = "note")]
Notes(NotesArgs),
/// Ingest data from GitLab
@@ -122,6 +130,7 @@ pub enum Commands {
Count(CountArgs),
/// Show sync state
#[command(visible_alias = "st")]
Status,
/// Verify GitLab authentication
@@ -170,9 +179,11 @@ pub enum Commands {
},
/// Search indexed documents
#[command(visible_alias = "find", alias = "query")]
Search(SearchArgs),
/// Show document and index statistics
#[command(visible_alias = "stat")]
Stats(StatsArgs),
/// Generate searchable documents from ingested data
@@ -794,11 +805,14 @@ pub struct EmbedArgs {
#[derive(Parser)]
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore timeline 'deployment' # Search-based seeding
lore timeline issue:42 # Direct: issue #42 and related entities
lore timeline i:42 # Shorthand for issue:42
lore timeline mr:99 # Direct: MR !99 and related entities
lore timeline 'auth' --since 30d -p group/repo # Scoped to project and time
lore timeline 'migration' --depth 2 --expand-mentions # Deep cross-reference expansion")]
pub struct TimelineArgs {
/// Search text or entity reference (issue:N, i:N, mr:N, m:N)
pub query: String,
/// Scope to a specific project (fuzzy match)

View File

@@ -22,20 +22,34 @@ pub struct ExtractResult {
pub parse_failures: usize,
}
// GitLab system notes include the entity type word: "mentioned in issue #5"
// or "mentioned in merge request !730". The word is mandatory in real data,
// but we also keep the old bare-sigil form as a fallback (no data uses it today,
// but other GitLab instances might differ).
static MENTIONED_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"mentioned in (?:issue |merge request )?(?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
)
.expect("mentioned regex is valid")
});
static CLOSED_BY_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"closed by (?:issue |merge request )?(?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
)
.expect("closed_by regex is valid")
});
/// Matches full GitLab URLs like:
/// `https://gitlab.example.com/group/project/-/issues/123`
/// `https://gitlab.example.com/group/sub/project/-/merge_requests/456`
static GITLAB_URL_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"https?://[^\s/]+/(?P<project>[^\s]+?)/-/(?P<entity_type>issues|merge_requests)/(?P<iid>\d+)",
)
.expect("gitlab url regex is valid")
});
pub fn parse_cross_refs(body: &str) -> Vec<ParsedCrossRef> {
let mut refs = Vec::new();
@@ -54,6 +68,47 @@ pub fn parse_cross_refs(body: &str) -> Vec<ParsedCrossRef> {
refs
}
/// Extract cross-references from GitLab URLs in free-text bodies (descriptions, user notes).
pub fn parse_url_refs(body: &str) -> Vec<ParsedCrossRef> {
let mut refs = Vec::new();
let mut seen = std::collections::HashSet::new();
for caps in GITLAB_URL_RE.captures_iter(body) {
let Some(entity_type_raw) = caps.name("entity_type").map(|m| m.as_str()) else {
continue;
};
let Some(iid_str) = caps.name("iid").map(|m| m.as_str()) else {
continue;
};
let Some(project) = caps.name("project").map(|m| m.as_str()) else {
continue;
};
let Ok(iid) = iid_str.parse::<i64>() else {
continue;
};
let target_entity_type = match entity_type_raw {
"issues" => "issue",
"merge_requests" => "merge_request",
_ => continue,
};
let key = (target_entity_type, project.to_owned(), iid);
if !seen.insert(key) {
continue; // deduplicate within same body
}
refs.push(ParsedCrossRef {
reference_type: "mentioned".to_owned(),
target_entity_type: target_entity_type.to_owned(),
target_iid: iid,
target_project_path: Some(project.to_owned()),
});
}
refs
}
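The URL recognition above can be illustrated without the regex crate. This is a stdlib-only sketch; `extract_url_ref` is a hypothetical helper, not the real implementation, but it follows the same shape: split the path on the GitLab `/-/` separator, map the plural route segment to the entity type, and ignore trailing anchors like `#note_456`.

```rust
/// Hypothetical sketch: parse (project_path, entity_type, iid) from a GitLab
/// issue or merge-request URL, or None for anything else.
fn extract_url_ref(url: &str) -> Option<(&str, &'static str, i64)> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // Drop the host; keep the path portion.
    let (_host, path) = rest.split_once('/')?;
    // GitLab separates the project path from the route with "/-/".
    let (project, tail) = path.split_once("/-/")?;
    let (kind, iid_str) = tail.split_once('/')?;
    let entity = match kind {
        "issues" => "issue",
        "merge_requests" => "merge_request",
        _ => return None,
    };
    // Trim anchors/query strings ("#note_456", "?x=y") before parsing the iid.
    let digits: String = iid_str.chars().take_while(|c| c.is_ascii_digit()).collect();
    let iid: i64 = digits.parse().ok()?;
    Some((project, entity, iid))
}
```

Non-GitLab URLs (e.g. `github.com/org/repo/issues/1`) fall out naturally because they lack the `/-/` separator.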
fn capture_to_cross_ref(
caps: &regex::Captures<'_>,
reference_type: &str,
@@ -233,6 +288,189 @@ fn resolve_cross_project_entity(
resolve_entity_id(conn, project_id, entity_type, iid)
}
/// Extract cross-references from issue and MR descriptions (GitLab URLs only).
pub fn extract_refs_from_descriptions(conn: &Connection, project_id: i64) -> Result<ExtractResult> {
let mut result = ExtractResult::default();
let mut insert_stmt = conn.prepare_cached(
"INSERT OR IGNORE INTO entity_references
(project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
target_project_path, target_entity_iid,
reference_type, source_method, created_at)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, 'description_parse', ?9)",
)?;
let now = now_ms();
// Issues with descriptions
let mut issue_stmt = conn.prepare_cached(
"SELECT id, iid, description FROM issues
WHERE project_id = ?1 AND description IS NOT NULL AND description != ''",
)?;
let issues: Vec<(i64, i64, String)> = issue_stmt
.query_map([project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
for (entity_id, _iid, description) in &issues {
insert_url_refs(
conn,
&mut insert_stmt,
&mut result,
project_id,
"issue",
*entity_id,
description,
now,
)?;
}
// Merge requests with descriptions
let mut mr_stmt = conn.prepare_cached(
"SELECT id, iid, description FROM merge_requests
WHERE project_id = ?1 AND description IS NOT NULL AND description != ''",
)?;
let mrs: Vec<(i64, i64, String)> = mr_stmt
.query_map([project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
for (entity_id, _iid, description) in &mrs {
insert_url_refs(
conn,
&mut insert_stmt,
&mut result,
project_id,
"merge_request",
*entity_id,
description,
now,
)?;
}
if result.inserted > 0 || result.skipped_unresolvable > 0 {
debug!(
inserted = result.inserted,
unresolvable = result.skipped_unresolvable,
"Description cross-reference extraction complete"
);
}
Ok(result)
}
/// Extract cross-references from user (non-system) notes (GitLab URLs only).
pub fn extract_refs_from_user_notes(conn: &Connection, project_id: i64) -> Result<ExtractResult> {
let mut result = ExtractResult::default();
let mut note_stmt = conn.prepare_cached(
"SELECT n.id, n.body, d.noteable_type,
COALESCE(d.issue_id, d.merge_request_id) AS entity_id
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
WHERE n.is_system = 0
AND n.project_id = ?1
AND n.body IS NOT NULL",
)?;
let notes: Vec<(i64, String, String, i64)> = note_stmt
.query_map([project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
if notes.is_empty() {
return Ok(result);
}
let mut insert_stmt = conn.prepare_cached(
"INSERT OR IGNORE INTO entity_references
(project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
target_project_path, target_entity_iid,
reference_type, source_method, created_at)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, 'note_parse', ?9)",
)?;
let now = now_ms();
for (_, body, noteable_type, entity_id) in &notes {
let source_entity_type = noteable_type_to_entity_type(noteable_type);
insert_url_refs(
conn,
&mut insert_stmt,
&mut result,
project_id,
source_entity_type,
*entity_id,
body,
now,
)?;
}
if result.inserted > 0 || result.skipped_unresolvable > 0 {
debug!(
inserted = result.inserted,
unresolvable = result.skipped_unresolvable,
"User note cross-reference extraction complete"
);
}
Ok(result)
}
/// Shared helper: parse URL refs from a body and insert into entity_references.
#[allow(clippy::too_many_arguments)]
fn insert_url_refs(
conn: &Connection,
insert_stmt: &mut rusqlite::CachedStatement<'_>,
result: &mut ExtractResult,
project_id: i64,
source_entity_type: &str,
source_entity_id: i64,
body: &str,
now: i64,
) -> Result<()> {
let url_refs = parse_url_refs(body);
for xref in &url_refs {
let target_entity_id = if let Some(ref path) = xref.target_project_path {
resolve_cross_project_entity(conn, path, &xref.target_entity_type, xref.target_iid)
} else {
resolve_entity_id(conn, project_id, &xref.target_entity_type, xref.target_iid)
};
let rows_changed = insert_stmt.execute(rusqlite::params![
project_id,
source_entity_type,
source_entity_id,
xref.target_entity_type,
target_entity_id,
xref.target_project_path,
if target_entity_id.is_none() {
Some(xref.target_iid)
} else {
None
},
xref.reference_type,
now,
])?;
if rows_changed > 0 {
if target_entity_id.is_none() {
result.skipped_unresolvable += 1;
} else {
result.inserted += 1;
}
}
}
Ok(())
}
#[cfg(test)]
#[path = "note_parser_tests.rs"]
mod tests;

View File

@@ -1,8 +1,10 @@
use super::*;
// --- parse_cross_refs: real GitLab system note format ---
#[test]
fn test_parse_mentioned_in_mr() {
let refs = parse_cross_refs("mentioned in merge request !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "merge_request");
@@ -12,7 +14,7 @@ fn test_parse_mentioned_in_mr() {
#[test]
fn test_parse_mentioned_in_issue() {
let refs = parse_cross_refs("mentioned in issue #234");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "issue");
@@ -22,7 +24,7 @@ fn test_parse_mentioned_in_issue() {
#[test]
fn test_parse_mentioned_cross_project() {
let refs = parse_cross_refs("mentioned in merge request group/repo!789");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "merge_request");
@@ -32,7 +34,7 @@ fn test_parse_mentioned_cross_project() {
#[test]
fn test_parse_mentioned_cross_project_issue() {
let refs = parse_cross_refs("mentioned in issue group/repo#123");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "issue");
@@ -42,7 +44,7 @@ fn test_parse_mentioned_cross_project_issue() {
#[test]
fn test_parse_closed_by_mr() {
let refs = parse_cross_refs("closed by merge request !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
@@ -52,7 +54,7 @@ fn test_parse_closed_by_mr() {
#[test]
fn test_parse_closed_by_cross_project() {
let refs = parse_cross_refs("closed by merge request group/repo!789");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
@@ -62,7 +64,7 @@ fn test_parse_closed_by_cross_project() {
#[test]
fn test_parse_multiple_refs() {
let refs = parse_cross_refs("mentioned in merge request !123 and mentioned in issue #456");
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 123);
@@ -84,7 +86,7 @@ fn test_parse_non_english_note() {
#[test]
fn test_parse_multi_level_group_path() {
let refs = parse_cross_refs("mentioned in issue top/sub/project#123");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
@@ -95,7 +97,7 @@ fn test_parse_multi_level_group_path() {
#[test]
fn test_parse_deeply_nested_group_path() {
let refs = parse_cross_refs("mentioned in merge request a/b/c/d/e!42");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_project_path.as_deref(), Some("a/b/c/d/e"));
assert_eq!(refs[0].target_iid, 42);
@@ -103,7 +105,7 @@ fn test_parse_deeply_nested_group_path() {
#[test]
fn test_parse_hyphenated_project_path() {
let refs = parse_cross_refs("mentioned in issue my-group/my-project#99");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
@@ -113,7 +115,7 @@ fn test_parse_hyphenated_project_path() {
#[test]
fn test_parse_dotted_project_path() {
let refs = parse_cross_refs("mentioned in issue visiostack.io/backend#123");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
@@ -124,7 +126,7 @@ fn test_parse_dotted_project_path() {
#[test]
fn test_parse_dotted_nested_project_path() {
let refs = parse_cross_refs("closed by merge request my.org/sub.group/my.project!42");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
@@ -134,16 +136,27 @@ fn test_parse_dotted_nested_project_path() {
assert_eq!(refs[0].target_iid, 42);
}
// Bare-sigil fallback (no "issue"/"merge request" word) still works
#[test]
fn test_parse_bare_sigil_fallback() {
let refs = parse_cross_refs("mentioned in #123");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_iid, 123);
assert_eq!(refs[0].target_entity_type, "issue");
}
#[test]
fn test_parse_bare_sigil_closed_by() {
let refs = parse_cross_refs("closed by !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 567);
}
#[test]
fn test_parse_mixed_mentioned_and_closed() {
let refs = parse_cross_refs("mentioned in merge request !10 and closed by merge request !20");
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_iid, 10);
@@ -151,6 +164,113 @@ fn test_parse_mixed_mentioned_and_closed() {
assert_eq!(refs[1].target_iid, 20);
}
// --- parse_url_refs ---
#[test]
fn test_url_ref_same_project_issue() {
let refs = parse_url_refs(
"See https://gitlab.visiostack.com/vs/typescript-code/-/issues/3537 for details",
);
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 3537);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("vs/typescript-code")
);
assert_eq!(refs[0].reference_type, "mentioned");
}
#[test]
fn test_url_ref_merge_request() {
let refs =
parse_url_refs("https://gitlab.visiostack.com/vs/typescript-code/-/merge_requests/3548");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 3548);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("vs/typescript-code")
);
}
#[test]
fn test_url_ref_cross_project() {
let refs = parse_url_refs(
"Related: https://gitlab.visiostack.com/vs/python-code/-/merge_requests/5203",
);
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 5203);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("vs/python-code")
);
}
#[test]
fn test_url_ref_with_anchor() {
let refs =
parse_url_refs("https://gitlab.visiostack.com/vs/typescript-code/-/issues/123#note_456");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 123);
}
#[test]
fn test_url_ref_markdown_link() {
let refs = parse_url_refs(
"Check [this MR](https://gitlab.visiostack.com/vs/typescript-code/-/merge_requests/100) for context",
);
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 100);
}
#[test]
fn test_url_ref_multiple_urls() {
let body =
"See https://gitlab.com/a/b/-/issues/1 and https://gitlab.com/a/b/-/merge_requests/2";
let refs = parse_url_refs(body);
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 1);
assert_eq!(refs[1].target_entity_type, "merge_request");
assert_eq!(refs[1].target_iid, 2);
}
#[test]
fn test_url_ref_deduplicates() {
let body = "See https://gitlab.com/a/b/-/issues/1 and again https://gitlab.com/a/b/-/issues/1";
let refs = parse_url_refs(body);
assert_eq!(
refs.len(),
1,
"Duplicate URLs in same body should be deduplicated"
);
}
#[test]
fn test_url_ref_non_gitlab_urls_ignored() {
let refs = parse_url_refs(
"Check https://google.com/search?q=test and https://github.com/org/repo/issues/1",
);
assert!(refs.is_empty());
}
#[test]
fn test_url_ref_deeply_nested_project() {
let refs = parse_url_refs("https://gitlab.com/org/sub/deep/project/-/issues/42");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("org/sub/deep/project")
);
assert_eq!(refs[0].target_iid, 42);
}
// --- Integration tests: system notes (updated for real format) ---
fn setup_test_db() -> Connection {
use crate::core::db::{create_connection, run_migrations};
@@ -204,27 +324,31 @@ fn seed_test_data(conn: &Connection) -> i64 {
)
.unwrap();
// System note: real GitLab format "mentioned in merge request !789"
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 1, 'mentioned in merge request !789', ?1, ?1, ?1)",
[now],
)
.unwrap();
// System note: real GitLab format "mentioned in issue #456"
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (41, 4001, 31, 1, 1, 'mentioned in issue #456', ?1, ?1, ?1)",
[now],
)
.unwrap();
// User note (is_system=0) — should NOT be processed by system note extractor
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (42, 4002, 30, 1, 0, 'mentioned in merge request !999', ?1, ?1, ?1)",
[now],
)
.unwrap();
// System note with no cross-ref pattern
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (43, 4003, 30, 1, 1, 'added label ~bug', ?1, ?1, ?1)",
@@ -232,9 +356,10 @@ fn seed_test_data(conn: &Connection) -> i64 {
)
.unwrap();
// System note: cross-project ref
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (44, 4004, 30, 1, 1, 'mentioned in issue other/project#999', ?1, ?1, ?1)",
[now],
)
.unwrap();
@@ -323,3 +448,323 @@ fn test_extract_refs_empty_project() {
assert_eq!(result.skipped_unresolvable, 0);
assert_eq!(result.parse_failures, 0);
}
// --- Integration tests: description extraction ---
#[test]
fn test_extract_refs_from_descriptions_issue() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
[now],
)
.unwrap();
// Issue with MR reference in description
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 3537, 'Test Issue', 'opened',
'Related to https://gitlab.com/vs/typescript-code/-/merge_requests/3548',
?1, ?1, ?1)",
[now],
)
.unwrap();
// The target MR so it resolves
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 3548, 'Fix MR', 'merged', 'fix', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(result.inserted, 1, "Should insert 1 description ref");
assert_eq!(result.skipped_unresolvable, 0);
let method: String = conn
.query_row(
"SELECT source_method FROM entity_references WHERE project_id = 1",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(method, "description_parse");
}
#[test]
fn test_extract_refs_from_descriptions_mr() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 100, 'Target Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, description, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 200, 'Fixing MR', 'merged', 'fix', 'main', 'dev',
'Fixes https://gitlab.com/vs/typescript-code/-/issues/100',
?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(result.inserted, 1);
let (src_type, tgt_type): (String, String) = conn
.query_row(
"SELECT source_entity_type, target_entity_type FROM entity_references WHERE project_id = 1",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(src_type, "merge_request");
assert_eq!(tgt_type, "issue");
}
#[test]
fn test_extract_refs_from_descriptions_idempotent() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 1, 'Issue', 'opened',
'See https://gitlab.com/vs/code/-/merge_requests/2', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 2, 'MR', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
let r1 = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(r1.inserted, 1);
let r2 = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(r2.inserted, 0, "Second run should insert 0 (idempotent)");
}
#[test]
fn test_extract_refs_from_descriptions_cross_project_unresolved() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 1, 'Issue', 'opened',
'See https://gitlab.com/vs/other-project/-/merge_requests/99', ?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(result.inserted, 0);
assert_eq!(
result.skipped_unresolvable, 1,
"Cross-project ref with no matching project should be unresolvable"
);
let (path, iid): (String, i64) = conn
.query_row(
"SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(path, "vs/other-project");
assert_eq!(iid, 99);
}
// --- Integration tests: user note extraction ---
#[test]
fn test_extract_refs_from_user_notes_with_url() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 50, 'Source Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 60, 'Target MR', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-user', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
// User note with a URL
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 0,
'This is related to https://gitlab.com/vs/code/-/merge_requests/60',
?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(result.inserted, 1);
let method: String = conn
.query_row(
"SELECT source_method FROM entity_references WHERE project_id = 1",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(method, "note_parse");
}
#[test]
fn test_extract_refs_from_user_notes_no_system_note_patterns() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 50, 'Source', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 999, 'Target', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-x', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
// User note with system-note-like text but no URL — should NOT extract
// (user notes only use URL parsing, not system note pattern matching)
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 0, 'mentioned in merge request !999', ?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(
result.inserted, 0,
"User notes should only parse URLs, not system note patterns"
);
}
#[test]
fn test_extract_refs_from_user_notes_idempotent() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 1, 'Src', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 2, 'Tgt', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-y', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 0,
'See https://gitlab.com/vs/code/-/merge_requests/2', ?1, ?1, ?1)",
[now],
)
.unwrap();
let r1 = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(r1.inserted, 1);
let r2 = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(r2.inserted, 0, "Second extraction should be idempotent");
}

View File

@@ -211,6 +211,77 @@ pub fn resolve_entity_ref(
}
}
/// Resolve an entity by its user-facing IID (e.g. issue #42) to a full [`EntityRef`].
///
/// Unlike [`resolve_entity_ref`] which takes an internal DB id, this takes the
/// GitLab IID that users see. Used by entity-direct timeline seeding (`issue:42`).
///
/// When `project_id` is `Some`, the query is scoped to that project (disambiguates
/// duplicate IIDs across projects).
///
/// Returns `LoreError::NotFound` when no match exists, `LoreError::Ambiguous` when
/// the same IID exists in multiple projects (suggest `--project`).
pub fn resolve_entity_by_iid(
conn: &Connection,
entity_type: &str,
iid: i64,
project_id: Option<i64>,
) -> Result<EntityRef> {
let table = match entity_type {
"issue" => "issues",
"merge_request" => "merge_requests",
_ => {
return Err(super::error::LoreError::NotFound(format!(
"Unknown entity type: {entity_type}"
)));
}
};
let sql = format!(
"SELECT e.id, e.iid, p.path_with_namespace
FROM {table} e
JOIN projects p ON p.id = e.project_id
WHERE e.iid = ?1 AND (?2 IS NULL OR e.project_id = ?2)"
);
let mut stmt = conn.prepare(&sql)?;
let rows: Vec<(i64, i64, String)> = stmt
.query_map(rusqlite::params![iid, project_id], |row| {
Ok((
row.get::<_, i64>(0)?,
row.get::<_, i64>(1)?,
row.get::<_, String>(2)?,
))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
match rows.len() {
0 => {
let sigil = if entity_type == "issue" { "#" } else { "!" };
Err(super::error::LoreError::NotFound(format!(
"{entity_type} {sigil}{iid} not found"
)))
}
1 => {
let (entity_id, entity_iid, project_path) = rows.into_iter().next().unwrap();
Ok(EntityRef {
entity_type: entity_type.to_owned(),
entity_id,
entity_iid,
project_path,
})
}
_ => {
let projects: Vec<&str> = rows.iter().map(|(_, _, p)| p.as_str()).collect();
let sigil = if entity_type == "issue" { "#" } else { "!" };
Err(super::error::LoreError::Ambiguous(format!(
"{entity_type} {sigil}{iid} exists in multiple projects: {}. Use --project to specify.",
projects.join(", ")
)))
}
}
}
#[cfg(test)]
mod tests {
use super::*;
@@ -409,4 +480,106 @@ mod tests {
let long = "a".repeat(300);
assert_eq!(truncate_to_chars(&long, 200).chars().count(), 200);
}
// ─── resolve_entity_by_iid tests ────────────────────────────────────────
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn insert_project(conn: &Connection, gitlab_id: i64, path: &str) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (?1, ?2, ?3)",
rusqlite::params![gitlab_id, path, format!("https://gitlab.com/{path}")],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test issue', 'opened', 'alice', 1000, 2000, 3000)",
rusqlite::params![project_id * 10000 + iid, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
rusqlite::params![project_id * 10000 + iid, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
#[test]
fn test_resolve_entity_by_iid_issue() {
let conn = setup_db();
let project_id = insert_project(&conn, 1, "group/project");
let entity_id = insert_issue(&conn, project_id, 42);
let result = resolve_entity_by_iid(&conn, "issue", 42, None).unwrap();
assert_eq!(result.entity_type, "issue");
assert_eq!(result.entity_id, entity_id);
assert_eq!(result.entity_iid, 42);
assert_eq!(result.project_path, "group/project");
}
#[test]
fn test_resolve_entity_by_iid_mr() {
let conn = setup_db();
let project_id = insert_project(&conn, 1, "group/project");
let entity_id = insert_mr(&conn, project_id, 99);
let result = resolve_entity_by_iid(&conn, "merge_request", 99, None).unwrap();
assert_eq!(result.entity_type, "merge_request");
assert_eq!(result.entity_id, entity_id);
assert_eq!(result.entity_iid, 99);
assert_eq!(result.project_path, "group/project");
}
#[test]
fn test_resolve_entity_by_iid_not_found() {
let conn = setup_db();
insert_project(&conn, 1, "group/project");
let result = resolve_entity_by_iid(&conn, "issue", 999, None);
assert!(result.is_err());
let err = result.unwrap_err();
assert!(matches!(err, crate::core::error::LoreError::NotFound(_)));
}
#[test]
fn test_resolve_entity_by_iid_ambiguous() {
let conn = setup_db();
let proj1 = insert_project(&conn, 1, "group/project-a");
let proj2 = insert_project(&conn, 2, "group/project-b");
insert_issue(&conn, proj1, 42);
insert_issue(&conn, proj2, 42);
let result = resolve_entity_by_iid(&conn, "issue", 42, None);
assert!(result.is_err());
let err = result.unwrap_err();
assert!(matches!(err, crate::core::error::LoreError::Ambiguous(_)));
}
#[test]
fn test_resolve_entity_by_iid_project_scoped() {
let conn = setup_db();
let proj1 = insert_project(&conn, 1, "group/project-a");
let proj2 = insert_project(&conn, 2, "group/project-b");
insert_issue(&conn, proj1, 42);
let entity_id_b = insert_issue(&conn, proj2, 42);
let result = resolve_entity_by_iid(&conn, "issue", 42, Some(proj2)).unwrap();
assert_eq!(result.entity_id, entity_id_b);
assert_eq!(result.project_path, "group/project-b");
}
}

View File

@@ -5,8 +5,8 @@ use tracing::debug;
use crate::core::error::Result;
use crate::core::timeline::{
EntityRef, MatchedDiscussion, TimelineEvent, TimelineEventType, resolve_entity_by_iid,
resolve_entity_ref, truncate_to_chars,
};
use crate::embedding::ollama::OllamaClient;
use crate::search::{FtsQueryMode, SearchFilters, SearchMode, search_hybrid, to_fts_query};
@@ -102,6 +102,53 @@ pub async fn seed_timeline(
})
}
/// Seed the timeline directly from an entity IID, bypassing search entirely.
///
/// Used for `issue:42` / `mr:99` syntax. Resolves the entity, gathers ALL its
/// discussions, and returns a `SeedResult` compatible with the rest of the pipeline.
pub fn seed_timeline_direct(
conn: &Connection,
entity_type: &str,
iid: i64,
project_id: Option<i64>,
) -> Result<SeedResult> {
let entity_ref = resolve_entity_by_iid(conn, entity_type, iid, project_id)?;
// Gather all discussions for this entity (not search-matched, ALL of them)
let entity_id_col = match entity_type {
"issue" => "issue_id",
"merge_request" => "merge_request_id",
_ => {
return Ok(SeedResult {
seed_entities: vec![entity_ref],
evidence_notes: Vec::new(),
matched_discussions: Vec::new(),
search_mode: "direct".to_owned(),
});
}
};
let sql = format!("SELECT id, project_id FROM discussions WHERE {entity_id_col} = ?1");
let mut stmt = conn.prepare(&sql)?;
let matched_discussions: Vec<MatchedDiscussion> = stmt
.query_map(rusqlite::params![entity_ref.entity_id], |row| {
Ok(MatchedDiscussion {
discussion_id: row.get(0)?,
entity_type: entity_type.to_owned(),
entity_id: entity_ref.entity_id,
project_id: row.get(1)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(SeedResult {
seed_entities: vec![entity_ref],
evidence_notes: Vec::new(),
matched_discussions,
search_mode: "direct".to_owned(),
})
}
/// Resolve a list of document IDs to deduplicated entity refs and matched discussions.
/// Discussion and note documents are resolved to their parent entity (issue or MR).
/// Returns (entities, matched_discussions).

View File

@@ -423,3 +423,90 @@ async fn test_seed_matched_discussions_have_correct_parent_entity() {
assert_eq!(result.matched_discussions[0].entity_type, "merge_request");
assert_eq!(result.matched_discussions[0].entity_id, mr_id);
}
// ─── seed_timeline_direct tests ─────────────────────────────────────────────
#[test]
fn test_direct_seed_resolves_entity() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
insert_test_issue(&conn, project_id, 42);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "issue");
assert_eq!(result.seed_entities[0].entity_iid, 42);
assert_eq!(result.seed_entities[0].project_path, "group/project");
}
#[test]
fn test_direct_seed_gathers_all_discussions() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 42);
// Create 3 discussions for this issue
let disc1 = insert_discussion(&conn, project_id, Some(issue_id), None);
let disc2 = insert_discussion(&conn, project_id, Some(issue_id), None);
let disc3 = insert_discussion(&conn, project_id, Some(issue_id), None);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert_eq!(result.matched_discussions.len(), 3);
let disc_ids: Vec<i64> = result
.matched_discussions
.iter()
.map(|d| d.discussion_id)
.collect();
assert!(disc_ids.contains(&disc1));
assert!(disc_ids.contains(&disc2));
assert!(disc_ids.contains(&disc3));
}
#[test]
fn test_direct_seed_no_evidence_notes() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 42);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_note(&conn, disc_id, project_id, "some note body", false);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert!(
result.evidence_notes.is_empty(),
"Direct seeding should not produce evidence notes"
);
}
#[test]
fn test_direct_seed_search_mode_is_direct() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
insert_test_issue(&conn, project_id, 42);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert_eq!(result.search_mode, "direct");
}
#[test]
fn test_direct_seed_not_found() {
let conn = setup_test_db();
insert_test_project(&conn);
let result = seed_timeline_direct(&conn, "issue", 999, None);
assert!(result.is_err());
}
#[test]
fn test_direct_seed_mr() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let mr_id = insert_test_mr(&conn, project_id, 99);
let disc_id = insert_discussion(&conn, project_id, None, Some(mr_id));
let result = seed_timeline_direct(&conn, "merge_request", 99, None).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "merge_request");
assert_eq!(result.seed_entities[0].entity_iid, 99);
assert_eq!(result.matched_discussions.len(), 1);
assert_eq!(result.matched_discussions[0].discussion_id, disc_id);
}

View File

@@ -640,6 +640,24 @@ pub async fn ingest_project_merge_requests_with_progress(
);
}
let desc_refs = crate::core::note_parser::extract_refs_from_descriptions(conn, project_id)?;
if desc_refs.inserted > 0 || desc_refs.skipped_unresolvable > 0 {
debug!(
inserted = desc_refs.inserted,
unresolvable = desc_refs.skipped_unresolvable,
"Extracted cross-references from descriptions"
);
}
let user_note_refs = crate::core::note_parser::extract_refs_from_user_notes(conn, project_id)?;
if user_note_refs.inserted > 0 || user_note_refs.skipped_unresolvable > 0 {
debug!(
inserted = user_note_refs.inserted,
unresolvable = user_note_refs.skipped_unresolvable,
"Extracted cross-references from user notes"
);
}
{
let enqueued = enqueue_mr_closes_issues_jobs(conn, project_id)?;
if enqueued > 0 {

View File

@@ -651,27 +651,37 @@ fn extract_invalid_value_context(e: &clap::Error) -> (Option<String>, Option<Vec
/// Phase 4: Suggest similar command using fuzzy matching
fn suggest_similar_command(invalid: &str) -> String {
// Primary commands + common aliases for fuzzy matching
const VALID_COMMANDS: &[(&str, &str)] = &[
("issues", "issues"),
("issue", "issues"),
("mrs", "mrs"),
("mr", "mrs"),
("merge-requests", "mrs"),
("search", "search"),
("find", "search"),
("query", "search"),
("sync", "sync"),
("ingest", "ingest"),
("count", "count"),
("status", "status"),
("auth", "auth"),
("doctor", "doctor"),
("version", "version"),
("init", "init"),
("stats", "stats"),
("stat", "stats"),
("generate-docs", "generate-docs"),
("embed", "embed"),
("migrate", "migrate"),
("health", "health"),
("robot-docs", "robot-docs"),
("completions", "completions"),
("timeline", "timeline"),
("who", "who"),
("notes", "notes"),
("note", "notes"),
("drift", "drift"),
];
let invalid_lower = invalid.to_lowercase();
@@ -679,19 +689,43 @@ fn suggest_similar_command(invalid: &str) -> String {
// Find the best match using Jaro-Winkler similarity
let best_match = VALID_COMMANDS
.iter()
.map(|(alias, canonical)| (*canonical, jaro_winkler(&invalid_lower, alias)))
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
if let Some((cmd, score)) = best_match
&& score > 0.7
{
let example = command_example(cmd);
return format!(
"Did you mean 'lore {cmd}'? Example: {example}. Run 'lore robot-docs' for all commands"
);
}
"Run 'lore robot-docs' for valid commands. Common: issues, mrs, search, sync, timeline, who"
.to_string()
}
/// Return a contextual usage example for a command.
fn command_example(cmd: &str) -> &'static str {
match cmd {
"issues" => "lore --robot issues -n 10",
"mrs" => "lore --robot mrs -n 10",
"search" => "lore --robot search \"auth bug\"",
"sync" => "lore --robot sync",
"ingest" => "lore --robot ingest issues",
"notes" => "lore --robot notes --for-issue 123",
"count" => "lore --robot count issues",
"status" => "lore --robot status",
"stats" => "lore --robot stats",
"timeline" => "lore --robot timeline \"auth flow\"",
"who" => "lore --robot who --path src/",
"health" => "lore --robot health",
"generate-docs" => "lore --robot generate-docs",
"embed" => "lore --robot embed",
"robot-docs" => "lore robot-docs",
"init" => "lore init",
_ => "lore --robot <command>",
}
}
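The `(alias, canonical)` table above pairs every accepted spelling with its primary command before the Jaro-Winkler pass runs. A minimal std-only sketch of the exact-match half of that lookup (the `canonical_for` helper is illustrative and not part of main.rs; the real code additionally scores near-misses with `strsim::jaro_winkler`):

```rust
/// Illustrative helper (assumption: mirrors the alias table idea above,
/// exact matches only — no fuzzy scoring).
fn canonical_for(input: &str) -> Option<&'static str> {
    const TABLE: &[(&str, &str)] = &[
        ("issues", "issues"),
        ("issue", "issues"),
        ("mrs", "mrs"),
        ("mr", "mrs"),
        ("find", "search"),
        ("stat", "stats"),
        ("note", "notes"),
    ];
    // Case-insensitive: mirrors the invalid.to_lowercase() normalization.
    let lower = input.to_lowercase();
    TABLE
        .iter()
        .find(|(alias, _)| *alias == lower)
        .map(|(_, canonical)| *canonical)
}

fn main() {
    assert_eq!(canonical_for("FIND"), Some("search"));
    assert_eq!(canonical_for("mr"), Some("mrs"));
    assert_eq!(canonical_for("bogus"), None);
}
```

Because each tuple maps an alias to its canonical name, the fuzzy matcher can compare the user's input against every spelling while always suggesting the primary command.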
fn handle_issues(
@@ -2135,6 +2169,8 @@ struct RobotDocsData {
commands: serde_json::Value,
/// Deprecated command aliases (old -> new)
aliases: serde_json::Value,
/// Pre-clap error tolerance: what the CLI auto-corrects
error_tolerance: serde_json::Value,
exit_codes: serde_json::Value,
/// Error codes emitted by clap parse failures
clap_error_codes: serde_json::Value,
@@ -2345,13 +2381,17 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
"example": "lore completions bash > ~/.local/share/bash-completion/completions/lore"
},
"timeline": {
"description": "Chronological timeline of events matching a keyword query or entity reference",
"flags": ["<QUERY>", "-p/--project", "--since <duration>", "--depth <n>", "--expand-mentions", "-n/--limit", "--fields <list>", "--max-seeds", "--max-entities", "--max-evidence"],
"query_syntax": {
"search": "Any text -> hybrid search seeding (FTS + vector)",
"entity_direct": "issue:N, i:N, mr:N, m:N -> direct entity seeding (no search, no Ollama)"
},
"example": "lore --robot timeline issue:42",
"response_schema": {
"ok": "bool",
"data": {"entities": "[{type:string, iid:int, title:string, project_path:string}]", "events": "[{timestamp:string, type:string, entity_type:string, entity_iid:int, detail:string}]", "total_events": "int"},
"meta": {"elapsed_ms": "int", "search_mode": "string (hybrid|lexical|direct)"}
},
"fields_presets": {"minimal": ["timestamp", "type", "entity_iid", "detail"]}
},
@@ -2485,12 +2525,54 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
// Phase 3: Deprecated command aliases
let aliases = serde_json::json!({
"deprecated_commands": {
"list issues": "issues",
"list mrs": "mrs",
"show issue <IID>": "issues <IID>",
"show mr <IID>": "mrs <IID>",
"auth-test": "auth",
"sync-status": "status"
},
"command_aliases": {
"issue": "issues",
"mr": "mrs",
"merge-requests": "mrs",
"merge-request": "mrs",
"note": "notes",
"find": "search",
"query": "search",
"stat": "stats",
"st": "status"
},
"pre_clap_aliases": {
"note": "Underscore/no-separator forms auto-corrected before parsing",
"merge_requests": "mrs",
"merge_request": "mrs",
"mergerequests": "mrs",
"mergerequest": "mrs",
"generate_docs": "generate-docs",
"generatedocs": "generate-docs",
"gendocs": "generate-docs",
"gen-docs": "generate-docs",
"robot_docs": "robot-docs",
"robotdocs": "robot-docs"
},
"prefix_matching": "Enabled via infer_subcommands. Unambiguous prefixes work: 'iss' -> issues, 'time' -> timeline, 'sea' -> search"
});
let error_tolerance = serde_json::json!({
"note": "The CLI auto-corrects common mistakes before parsing. Corrections are applied silently with a teaching note on stderr.",
"auto_corrections": [
{"type": "single_dash_long_flag", "example": "-robot -> --robot", "mode": "all"},
{"type": "case_normalization", "example": "--Robot -> --robot, --State -> --state", "mode": "all"},
{"type": "flag_prefix", "example": "--proj -> --project (when unambiguous)", "mode": "all"},
{"type": "fuzzy_flag", "example": "--projct -> --project", "mode": "all (threshold 0.9 in robot, 0.8 in human)"},
{"type": "subcommand_alias", "example": "merge_requests -> mrs, robotdocs -> robot-docs", "mode": "all"},
{"type": "value_normalization", "example": "--state Opened -> --state opened", "mode": "all"},
{"type": "value_fuzzy", "example": "--state opend -> --state opened", "mode": "all"},
{"type": "prefix_matching", "example": "lore iss -> lore issues, lore time -> lore timeline", "mode": "all (via clap infer_subcommands)"}
],
"teaching_notes": "Auto-corrections emit a JSON warning on stderr: {\"warning\":{\"type\":\"ARG_CORRECTED\",\"corrections\":[...],\"teaching\":[...]}}"
});
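The `teaching_notes` entry above documents the stderr envelope agents should watch for. A hedged, std-only sketch of detecting it on the wrapper side (`is_correction_warning` is a hypothetical helper, not part of lore; a real client would parse the JSON properly, e.g. with serde_json):

```rust
/// Hypothetical agent-side check (not part of lore): does a captured
/// stderr line carry the documented ARG_CORRECTED warning envelope?
/// This cheap substring check only looks for the envelope and type tag.
fn is_correction_warning(stderr_line: &str) -> bool {
    let line = stderr_line.trim_start();
    line.starts_with("{\"warning\"") && line.contains("\"ARG_CORRECTED\"")
}

fn main() {
    let warn = r#"{"warning":{"type":"ARG_CORRECTED","corrections":[],"teaching":[]}}"#;
    assert!(is_correction_warning(warn));
    // Error envelopes use a different top-level key and must not match.
    assert!(!is_correction_warning(r#"{"error":{"code":"BAD_ARG"}}"#));
}
```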
// Phase 3: Clap error codes (emitted by handle_clap_error)
@@ -2529,6 +2611,7 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
quick_start,
commands,
aliases,
error_tolerance,
exit_codes,
clap_error_codes,
error_format: "stderr JSON: {\"error\":{\"code\":\"...\",\"message\":\"...\",\"suggestion\":\"...\",\"actions\":[\"...\"]}}".to_string(),