8 Commits

Author SHA1 Message Date
Taylor Eernisse
c2f34d3a4f chore(beads): Update issue tracker metadata
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:23:13 -05:00
Taylor Eernisse
3bb24dc6cb docs: Add performance audit report with optimization findings
PERFORMANCE_AUDIT.md documents a comprehensive code analysis identifying
12 optimization opportunities across the codebase:

High-impact findings (ICE score > 8):
1. Triple-EXISTS change detection -> LEFT JOIN (DONE)
2. N+1 label/assignee inserts during ingestion
3. Clone in embedding batch loop
4. Correlated GROUP_CONCAT in list queries
5. Multiple EXISTS per label filter (DONE)

Medium-impact findings (ICE 5-7):
6. String allocation in chunking
7. Multiple COUNT queries -> conditional aggregation (DONE)
8. Collect-then-concat in truncation (DONE)
9. Box<dyn ToSql> allocations in filters
10. Missing Vec::with_capacity hints (DONE)
11. FTS token collect-join pattern (DONE)
12. Transformer string clones

Report includes:
- Methodology section explaining code-analysis approach
- ICE (Impact x Confidence / Effort) scoring matrix
- Detailed SQL query transformations with isomorphism proofs
- Before/after code samples for each optimization
- Test verification notes

Status: 6 of 12 optimizations implemented in this session.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:23:06 -05:00
Taylor Eernisse
42a4bca6df docs: Update README and AGENTS.md with new features and options
README.md:
- Add cross-reference tracking feature description
- Add resource event history feature description
- Add observability feature description (verbosity, JSON logs, metrics)
- Document --no-events flag for sync command
- Add sync timing/progress bar behavior note
- Document verbosity flags (-v, -vv, -vvv)
- Document --log-format json option
- Add new database tables to schema reference:
  - resource_state_events
  - resource_label_events
  - resource_milestone_events
  - entity_references

AGENTS.md:
- Add --no-events example for sync command
- Document verbosity flags (-v, -vv, -vvv)
- Document --log-format json option

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:50 -05:00
Taylor Eernisse
c730b0ec54 feat(cli): Improve help text, error handling, and add fuzzy command suggestions
CLI help improvements (cli/mod.rs):
- Add descriptive help text to all global flags (-c, --robot, -J, etc.)
- Add descriptions to all subcommands (Issues, Mrs, Sync, etc.)
- Add --no-quiet flag for explicit quiet override
- Completions command now shows installation instructions for each shell
- Optional subcommand: running bare 'lore' shows help in terminal mode,
  robot-docs in robot mode

Structured clap error handling (main.rs):
- Early robot mode detection before parsing (env + args)
- JSON error output for parse failures in robot mode
- Semantic error codes: UNKNOWN_COMMAND, UNKNOWN_FLAG, MISSING_REQUIRED,
  INVALID_VALUE, ARGUMENT_CONFLICT, etc.
- Fuzzy command suggestion using Jaro-Winkler similarity (>0.7 threshold)
- Help/version requests handled normally (exit 0, not error)
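The suggestion flow is roughly: compare the unknown token against each known subcommand and propose the closest match above a similarity threshold. The real implementation uses strsim 0.11's `jaro_winkler` with a >0.7 cut-off; the stdlib-only sketch below swaps in plain Levenshtein edit distance with a rough 70%-similarity filter so it runs without dependencies, and all names are illustrative:

```rust
// Classic single-row Levenshtein DP over chars.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Suggest the closest known subcommand, if it is close enough.
/// The `d * 10 <= 3 * len` filter approximates "at least 70% similar";
/// the actual CLI uses Jaro-Winkler similarity instead.
fn suggest<'a>(input: &str, commands: &[&'a str]) -> Option<&'a str> {
    commands
        .iter()
        .map(|c| (levenshtein(input, c), *c))
        .filter(|(d, c)| d * 10 <= 3 * c.len().max(input.len()))
        .min_by_key(|(d, _)| *d)
        .map(|(_, c)| c)
}
```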

Robot-docs enhancements (main.rs):
- Document deprecated command aliases (list issues -> issues, etc.)
- Document clap error codes for programmatic error handling
- Include completions command in manifest
- Update flag documentation to show short forms (-n, -s, -p, etc.)

Dependencies:
- Add strsim 0.11 for Jaro-Winkler fuzzy matching

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:38 -05:00
Taylor Eernisse
ab43bbd2db feat: Add dry-run mode to ingest, sync, and stats commands
Enables preview of operations without making changes, useful for
understanding what would happen before committing to a full sync.

Ingest dry-run (--dry-run flag):
- Shows resource type, sync mode (full vs incremental), project list
- Per-project info: existing count, has_cursor, last_synced timestamp
- No GitLab API calls, no database writes

Sync dry-run (--dry-run flag):
- Preview all four stages: issues ingest, MRs ingest, docs, embed
- Shows which stages would run vs be skipped (--no-docs, --no-embed)
- Per-project breakdown for both entity types

Stats repair dry-run (--dry-run flag):
- Shows what would be repaired without executing repairs
- "would fix" vs "fixed" indicator in terminal output
- dry_run: true field in JSON response

Implementation details:
- DryRunPreview struct captures project-level sync state
- SyncDryRunResult aggregates previews for all sync stages
- Terminal output uses yellow styling for "would" actions
- JSON output includes dry_run: true at top level

Flag handling:
- --dry-run and --no-dry-run pair for explicit control
- Defaults to false (normal operation)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:22 -05:00
Taylor Eernisse
784fe79b80 feat(show): Enrich issue detail with assignees, milestones, and closing MRs
Issue detail now includes:
- assignees: List of assigned usernames from issue_assignees table
- due_date: Issue due date when set
- milestone: Milestone title when assigned
- closing_merge_requests: MRs that will close this issue when merged

Closing MR detection:
- Queries entity_references table for 'closes' reference type
- Shows MR iid, title, state (with color coding) in terminal output
- Full MR metadata included in JSON output

Human-readable output:
- "Assignees:" line shows comma-separated @usernames
- "Development:" section lists closing MRs with state indicator
- Green for merged, cyan for opened, red for closed

JSON output:
- New fields: assignees, due_date, milestone, closing_merge_requests
- closing_merge_requests array contains iid, title, state, web_url

Test coverage:
- get_issue_assignees: empty, single, multiple (alphabetical order)
- get_closing_mrs: empty, single, ignores 'mentioned' references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:02 -05:00
Taylor Eernisse
db750e4fc5 fix: Graceful HTTP client fallbacks and overflow protection
HTTP client initialization (embedding/ollama.rs, gitlab/client.rs):
- Replace expect/panic with unwrap_or_else fallback to default Client
- Log warning when configured client fails to build
- Prevents crash on TLS/system configuration issues

Doctor command (cli/commands/doctor.rs):
- Handle reqwest Client::builder() failure in Ollama health check
- Return Warning status with descriptive message instead of panicking
- Ensures doctor command remains operational even with HTTP issues

These changes improve resilience when running in unusual environments
(containers with limited TLS, restrictive network policies, etc.)
without affecting normal operation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:21:40 -05:00
Taylor Eernisse
72f1cafdcf perf: Optimize SQL queries and reduce allocations in hot paths
Change detection queries (embedding/change_detector.rs):
- Replace triple-EXISTS subquery pattern with LEFT JOIN + NULL check
- SQLite now scans embedding_metadata once instead of three times
- Semantically identical: returns docs needing embedding when no
  embedding exists, hash changed, or config mismatch

Count queries (cli/commands/count.rs):
- Consolidate 3 separate COUNT queries for issues into single query
  using conditional aggregation (CASE WHEN state = 'x' THEN 1)
- Same optimization for MRs: 5 queries reduced to 1

Search filter queries (search/filters.rs):
- Replace N separate EXISTS clauses for label filtering with single
  IN() clause with COUNT/GROUP BY HAVING pattern
- For multi-label AND queries, this reduces N subqueries to 1

FTS tokenization (search/fts.rs):
- Replace collect-into-Vec-then-join pattern with direct String building
- Pre-allocate capacity hint for result string

Discussion truncation (documents/truncation.rs):
- Calculate total length without allocating concatenated string first
- Only allocate full string when we know it fits within limit

Embedding pipeline (embedding/pipeline.rs):
- Add Vec::with_capacity hints for chunk work and cleared_docs hashset
- Reduces reallocations during embedding batch processing

Backoff calculation (core/backoff.rs):
- Replace unchecked addition with saturating_add to prevent overflow
- Add test case verifying overflow protection
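A minimal sketch of the saturating step (function and parameter names are hypothetical; the actual code lives in core/backoff.rs): on overflow the delay clamps at `u64::MAX` and the configured ceiling instead of wrapping around to a tiny value.

```rust
// Overflow-safe backoff step: saturating_add never wraps, and min()
// enforces the configured maximum delay.
fn next_delay_ms(current_ms: u64, step_ms: u64, max_ms: u64) -> u64 {
    current_ms.saturating_add(step_ms).min(max_ms)
}
```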

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:21:28 -05:00
24 changed files with 1843 additions and 210 deletions

File diff suppressed because one or more lines are too long


@@ -1 +1 @@
-bd-3ia
+bd-1oo


@@ -633,6 +633,9 @@ lore --robot status
 # Run full sync pipeline
 lore --robot sync
+# Run sync without resource events
+lore --robot sync --no-events
 # Run ingestion only
 lore --robot ingest issues
@@ -712,6 +715,8 @@ Errors return structured JSON to stderr:
 - Use `-n` / `--limit` to control response size
 - Use `-q` / `--quiet` to suppress progress bars and non-essential output
 - Use `--color never` in non-TTY automation for ANSI-free output
+- Use `-v` / `-vv` / `-vvv` for increasing verbosity (debug/trace logging)
+- Use `--log-format json` for machine-readable log output to stderr
 - TTY detection handles piped commands automatically
 - Use `lore --robot health` as a fast pre-flight check before queries
 - The `-p` flag supports fuzzy project matching (suffix and substring)

Cargo.lock (generated): 1 line changed

@@ -1129,6 +1129,7 @@ dependencies = [
 "serde_json",
 "sha2",
 "sqlite-vec",
+"strsim",
 "tempfile",
 "thiserror",
 "tokio",


@@ -47,6 +47,7 @@ flate2 = "1"
 chrono = { version = "0.4", features = ["serde"] }
 uuid = { version = "1", features = ["v4"] }
 regex = "1"
+strsim = "0.11"

 [target.'cfg(unix)'.dependencies]
 libc = "0.2"

PERFORMANCE_AUDIT.md (new file, 467 lines)

@@ -0,0 +1,467 @@
# Gitlore Performance Audit Report
**Date**: 2026-02-05
**Auditor**: Claude Code (Opus 4.5)
**Scope**: Core system performance - ingestion, embedding, search, and document regeneration
## Executive Summary
This audit identifies 12 high-impact optimization opportunities across the Gitlore codebase. The most significant findings center on:
1. **SQL query patterns** with N+1 issues and inefficient correlated subqueries
2. **Memory allocation patterns** in hot paths (embedding, chunking, ingestion)
3. **Change detection queries** using triple-EXISTS patterns instead of JOINs
**Estimated overall improvement potential**: 30-50% reduction in latency for filtered searches, 2-5x improvement in ingestion throughput for issues/MRs with many labels.
---
## Methodology
- **Codebase analysis**: Full read of all modules in `src/`
- **SQL pattern analysis**: All queries checked for N+1, missing indexes, unbounded results
- **Memory allocation analysis**: Clone patterns, unnecessary collections, missing capacity hints
- **Test baseline**: All tests pass (`cargo test --release`)
Note: Without access to a live GitLab instance or a populated database, the findings are based on static code analysis rather than runtime profiling.
---
## Opportunity Matrix
| ID | Issue | Location | Impact | Confidence | Effort | ICE Score | Status |
|----|-------|----------|--------|------------|--------|-----------|--------|
| 1 | Triple-EXISTS change detection | `change_detector.rs:19-46` | HIGH | 95% | LOW | **9.5** | **DONE** |
| 2 | N+1 label/assignee inserts | `issues.rs:270-285`, `merge_requests.rs:242-272` | HIGH | 95% | MEDIUM | **9.0** | Pending |
| 3 | Clone in embedding batch loop | `pipeline.rs:165` | HIGH | 90% | LOW | **9.0** | Pending |
| 4 | Correlated GROUP_CONCAT in list | `list.rs:341-348` | HIGH | 90% | MEDIUM | **8.5** | Pending |
| 5 | Multiple EXISTS per label filter | `filters.rs:100-107` | HIGH | 85% | MEDIUM | **8.0** | **DONE** |
| 6 | String allocation in chunking | `chunking.rs:7-49` | MEDIUM | 95% | MEDIUM | **7.5** | Pending |
| 7 | Multiple COUNT queries | `count.rs:44-56` | MEDIUM | 95% | LOW | **7.0** | **DONE** |
| 8 | Collect-then-concat pattern | `truncation.rs:60-61` | MEDIUM | 90% | LOW | **7.0** | **DONE** |
| 9 | Box<dyn ToSql> allocations | `filters.rs:67-135` | MEDIUM | 80% | HIGH | **6.0** | Pending |
| 10 | Missing Vec::with_capacity | `pipeline.rs:106`, multiple | LOW | 95% | LOW | **5.5** | **DONE** |
| 11 | FTS token collect-join | `fts.rs:26-41` | LOW | 90% | LOW | **5.0** | **DONE** |
| 12 | Transformer string clones | `merge_request.rs:51-77` | MEDIUM | 85% | HIGH | **5.0** | Pending |
ICE Score = (Impact x Confidence) / Effort, scaled 1-10
---
## Detailed Findings
### 1. Triple-EXISTS Change Detection Query (ICE: 9.5)
**Location**: `src/embedding/change_detector.rs:19-46`
**Current Code**:
```sql
SELECT d.id, d.content_text, d.content_hash
FROM documents d
WHERE d.id > ?1
AND (
NOT EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0)
OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND em.document_hash != d.content_hash)
OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND (...))
)
ORDER BY d.id
LIMIT ?2
```
**Problem**: Three separate EXISTS subqueries, each scanning `embedding_metadata`. SQLite cannot short-circuit across OR'd EXISTS efficiently.
**Proposed Fix**:
```sql
SELECT d.id, d.content_text, d.content_hash
FROM documents d
LEFT JOIN embedding_metadata em
ON em.document_id = d.id AND em.chunk_index = 0
WHERE d.id > ?1
AND (
em.document_id IS NULL -- no embedding
OR em.document_hash != d.content_hash -- hash mismatch
OR em.chunk_max_bytes IS NULL
OR em.chunk_max_bytes != ?3
OR em.model != ?4
OR em.dims != ?5
)
ORDER BY d.id
LIMIT ?2
```
**Isomorphism Proof**: Both queries return documents needing embedding when:
- No embedding exists for chunk_index=0 (NULL check)
- Hash changed (direct comparison)
- Config mismatch (model/dims/chunk_max_bytes)
The LEFT JOIN + NULL check is semantically identical to NOT EXISTS. The OR conditions inside WHERE match the EXISTS predicates exactly.
**Expected Impact**: 2-3x faster for large document sets. Single scan of embedding_metadata instead of three.
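The equivalence can be modeled in plain Rust, where an `Option` plays the role of the LEFT JOIN row: `None` corresponds to `em.document_id IS NULL`, i.e. the NOT EXISTS branch. Struct and field names mirror the columns but are otherwise illustrative:

```rust
// Illustrative in-memory model of one embedding_metadata row.
struct Meta {
    document_hash: String,
    model: String,
    dims: i64,
    chunk_max_bytes: Option<i64>,
}

// Mirrors the WHERE clause of the rewritten query.
fn needs_embedding(
    content_hash: &str,
    meta: Option<&Meta>,
    model: &str,
    dims: i64,
    max_bytes: i64,
) -> bool {
    match meta {
        None => true, // em.document_id IS NULL: no embedding exists
        Some(m) => {
            m.document_hash != content_hash              // hash mismatch
                || m.chunk_max_bytes != Some(max_bytes)  // covers NULL and mismatch
                || m.model != model
                || m.dims != dims
        }
    }
}
```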
---
### 2. N+1 Label/Assignee Inserts (ICE: 9.0)
**Location**:
- `src/ingestion/issues.rs:270-285`
- `src/ingestion/merge_requests.rs:242-272`
**Current Code**:
```rust
for label_name in label_names {
let label_id = upsert_label_tx(tx, project_id, label_name, &mut labels_created)?;
link_issue_label_tx(tx, local_issue_id, label_id)?;
}
```
**Problem**: Each label triggers 2+ SQL statements. With 20 labels × 100 issues = 4000+ queries per batch.
**Proposed Fix**: Batch insert using prepared statements with multi-row VALUES:
```rust
// Build batch: INSERT INTO issue_labels VALUES (?, ?), (?, ?), ...
let mut values = String::new();
let mut params: Vec<Box<dyn ToSql>> = Vec::with_capacity(label_ids.len() * 2);
for (i, label_id) in label_ids.iter().enumerate() {
if i > 0 { values.push_str(","); }
values.push_str("(?,?)");
params.push(Box::new(local_issue_id));
params.push(Box::new(*label_id));
}
let sql = format!("INSERT OR IGNORE INTO issue_labels (issue_id, label_id) VALUES {}", values);
```
Or use `prepare_cached()` pattern from `events_db.rs`.
**Isomorphism Proof**: Both approaches insert identical rows. OR IGNORE handles duplicates identically.
**Expected Impact**: 5-10x faster ingestion for issues/MRs with many labels.
---
### 3. Clone in Embedding Batch Loop (ICE: 9.0)
**Location**: `src/embedding/pipeline.rs:165`
**Current Code**:
```rust
let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect();
```
**Problem**: Every batch iteration clones all chunk texts. With BATCH_SIZE=32 and thousands of chunks, this doubles memory allocation in the hot path.
**Proposed Fix**: Transfer ownership instead of cloning:
```rust
// Option A: Drain chunks from all_chunks instead of iterating
let texts: Vec<String> = batch.into_iter().map(|c| c.text).collect();
// Option B: Store references in ChunkWork, clone only at API boundary
struct ChunkWork<'a> {
text: &'a str,
// ...
}
```
**Isomorphism Proof**: Same texts sent to Ollama, same embeddings returned. Order and content identical.
**Expected Impact**: 30-50% reduction in embedding pipeline memory allocation.
---
### 4. Correlated GROUP_CONCAT in List Queries (ICE: 8.5)
**Location**: `src/cli/commands/list.rs:341-348`
**Current Code**:
```sql
SELECT i.*,
(SELECT GROUP_CONCAT(l.name, X'1F') FROM issue_labels il JOIN labels l ... WHERE il.issue_id = i.id) AS labels_csv,
(SELECT COUNT(*) FROM discussions WHERE issue_id = i.id) as discussion_count
FROM issues i
```
**Problem**: Each correlated subquery executes per row. With LIMIT 50, that's 100+ subquery executions.
**Proposed Fix**: Use window functions or pre-aggregated CTEs:
```sql
WITH label_agg AS (
SELECT il.issue_id, GROUP_CONCAT(l.name, X'1F') AS labels_csv
FROM issue_labels il JOIN labels l ON il.label_id = l.id
GROUP BY il.issue_id
),
discussion_agg AS (
SELECT issue_id, COUNT(*) AS cnt
FROM discussions WHERE issue_id IS NOT NULL
GROUP BY issue_id
)
SELECT i.*, la.labels_csv, da.cnt
FROM issues i
LEFT JOIN label_agg la ON la.issue_id = i.id
LEFT JOIN discussion_agg da ON da.issue_id = i.id
WHERE ...
LIMIT 50
```
**Isomorphism Proof**: Same data returned - labels concatenated, discussion counts accurate. JOIN preserves NULL when no labels/discussions exist.
**Expected Impact**: 3-5x faster list queries with discussion/label data.
---
### 5. Multiple EXISTS Per Label Filter (ICE: 8.0)
**Location**: `src/search/filters.rs:100-107`
**Current Code**:
```sql
WHERE EXISTS (SELECT 1 ... AND label_name = ?)
AND EXISTS (SELECT 1 ... AND label_name = ?)
AND EXISTS (SELECT 1 ... AND label_name = ?)
```
**Problem**: Filtering by 3 labels generates 3 EXISTS subqueries. Each scans document_labels.
**Proposed Fix**: Single EXISTS with GROUP BY/HAVING:
```sql
WHERE EXISTS (
SELECT 1 FROM document_labels dl
WHERE dl.document_id = d.id
AND dl.label_name IN (?, ?, ?)
GROUP BY dl.document_id
HAVING COUNT(DISTINCT dl.label_name) = 3
)
```
**Isomorphism Proof**: Both return documents with ALL specified labels. AND of EXISTS = document has label1 AND label2 AND label3. GROUP BY + HAVING COUNT(DISTINCT) = 3 is mathematically equivalent.
**Expected Impact**: 2-4x faster filtered search with multiple labels.
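As a sanity check, the two predicates can be modeled over in-memory rows. Assuming the requested labels are distinct (which the SQL's `COUNT(DISTINCT ...) = N` also relies on), both functions below agree on every (document, label-set) pair:

```rust
use std::collections::HashSet;

type Row = (u32, &'static str); // (document_id, label_name)

// AND of per-label EXISTS: each wanted label is checked independently.
fn and_of_exists(rows: &[Row], doc: u32, wanted: &[&str]) -> bool {
    wanted
        .iter()
        .all(|w| rows.iter().any(|&(d, l)| d == doc && l == *w))
}

// Single pass with GROUP BY/HAVING semantics:
// COUNT(DISTINCT matched labels) must equal wanted.len().
fn having_count(rows: &[Row], doc: u32, wanted: &[&str]) -> bool {
    let matched: HashSet<&str> = rows
        .iter()
        .filter(|&&(d, l)| d == doc && wanted.contains(&l))
        .map(|&(_, l)| l)
        .collect();
    matched.len() == wanted.len()
}
```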
---
### 6. String Allocation in Chunking (ICE: 7.5)
**Location**: `src/embedding/chunking.rs:7-49`
**Current Code**:
```rust
chunks.push((chunk_index, remaining.to_string()));
```
**Problem**: Converts `&str` slices to owned `String` for every chunk. The input is already a `&str`.
**Proposed Fix**: Return borrowed slices or use `Cow`:
```rust
pub fn split_into_chunks(content: &str) -> Vec<(usize, &str)> {
// Return slices into original content
}
```
Or if ownership is needed later:
```rust
pub fn split_into_chunks(content: &str) -> Vec<(usize, Cow<'_, str>)>
```
**Isomorphism Proof**: Same chunk boundaries, same text content. Only allocation behavior changes.
**Expected Impact**: Reduces allocations by ~50% in chunking hot path.
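A stdlib-only sketch of the borrowed-slice variant, splitting on UTF-8 char boundaries. The function name matches the proposal above, but the body is illustrative, and the boundary-backoff loop assumes `max_bytes` is at least the width of one character (i.e. >= 4 in the general case):

```rust
// Returns (chunk_index, slice) pairs borrowing from the original buffer:
// no per-chunk String allocation.
fn split_into_chunks(content: &str, max_bytes: usize) -> Vec<(usize, &str)> {
    let mut chunks = Vec::new();
    let mut rest = content;
    let mut index = 0;
    while !rest.is_empty() {
        let mut end = rest.len().min(max_bytes);
        while !rest.is_char_boundary(end) {
            end -= 1; // back off to a valid UTF-8 boundary
        }
        let (head, tail) = rest.split_at(end);
        chunks.push((index, head));
        rest = tail;
        index += 1;
    }
    chunks
}
```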
---
### 7. Multiple COUNT Queries (ICE: 7.0)
**Location**: `src/cli/commands/count.rs:44-56`
**Current Code**:
```rust
let count = conn.query_row("SELECT COUNT(*) FROM issues", ...)?;
let opened = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'opened'", ...)?;
let closed = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'closed'", ...)?;
```
**Problem**: 5 separate queries for MR state breakdown, 3 for issues.
**Proposed Fix**: Single query with CASE aggregation:
```sql
SELECT
COUNT(*) AS total,
SUM(CASE WHEN state = 'opened' THEN 1 ELSE 0 END) AS opened,
SUM(CASE WHEN state = 'closed' THEN 1 ELSE 0 END) AS closed
FROM issues
```
**Isomorphism Proof**: Identical counts returned. CASE WHEN with SUM is standard SQL for conditional counting.
**Expected Impact**: 3-5x fewer round trips for count command.
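The single-pass semantics can be modeled in plain Rust: one fold over the rows mirrors one scan of the table, accumulating total/opened/closed together instead of re-scanning per state:

```rust
// One pass, three accumulators: the in-memory analogue of
// COUNT(*) plus SUM(CASE WHEN state = '...' THEN 1 ELSE 0 END).
fn count_states(states: &[&str]) -> (i64, i64, i64) {
    states.iter().fold((0, 0, 0), |(total, opened, closed), s| {
        (
            total + 1,
            opened + (*s == "opened") as i64,
            closed + (*s == "closed") as i64,
        )
    })
}
```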
---
### 8. Collect-then-Concat Pattern (ICE: 7.0)
**Location**: `src/documents/truncation.rs:60-61`
**Current Code**:
```rust
let formatted: Vec<String> = notes.iter().map(format_note).collect();
let total: String = formatted.concat();
```
**Problem**: Allocates intermediate Vec<String>, then allocates again for concat.
**Proposed Fix**: Use fold or format directly:
```rust
let total = notes.iter().fold(String::new(), |mut acc, note| {
acc.push_str(&format_note(note));
acc
});
```
Or with capacity hint:
```rust
let total_len: usize = notes.iter().map(|n| estimate_note_len(n)).sum();
let mut total = String::with_capacity(total_len);
for note in notes {
total.push_str(&format_note(note));
}
```
**Isomorphism Proof**: Same concatenated string output. Order preserved.
**Expected Impact**: 50% reduction in allocations for document regeneration.
---
### 9. Box<dyn ToSql> Allocations (ICE: 6.0)
**Location**: `src/search/filters.rs:67-135`
**Current Code**:
```rust
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = vec![Box::new(ids_json)];
// ... more Box::new() calls
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
```
**Problem**: Boxing each parameter, then collecting references. Two allocations per parameter.
**Proposed Fix**: Use rusqlite's params! macro or typed parameter arrays:
```rust
// For known parameter counts, use arrays
let params: [&dyn ToSql; 4] = [&ids_json, &author, &state, &limit];
// Or build SQL with named parameters and use params! directly
```
**Expected Impact**: Eliminates ~15 allocations per filtered search.
---
### 10. Missing Vec::with_capacity (ICE: 5.5)
**Locations**:
- `src/embedding/pipeline.rs:106`
- `src/embedding/pipeline.rs:162`
- Multiple other locations
**Current Code**:
```rust
let mut all_chunks: Vec<ChunkWork> = Vec::new();
```
**Proposed Fix**:
```rust
// Estimate: average 3 chunks per document
let mut all_chunks = Vec::with_capacity(pending.len() * 3);
```
**Expected Impact**: Eliminates reallocation overhead during vector growth.
---
### 11. FTS Token Collect-Join (ICE: 5.0)
**Location**: `src/search/fts.rs:26-41`
**Current Code**:
```rust
let tokens: Vec<String> = trimmed.split_whitespace().map(...).collect();
tokens.join(" ")
```
**Proposed Fix**: Use itertools or avoid intermediate vec:
```rust
use itertools::Itertools;
trimmed.split_whitespace().map(...).join(" ")
```
**Expected Impact**: Minor - search queries are typically short.
---
### 12. Transformer String Clones (ICE: 5.0)
**Location**: `src/gitlab/transformers/merge_request.rs:51-77`
**Problem**: Multiple `.clone()` calls on String fields during transformation.
**Proposed Fix**: Use `std::mem::take()` where possible, or restructure to avoid cloning.
**Expected Impact**: Moderate - depends on MR volume.
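A minimal sketch of the `mem::take` alternative (struct and field names are hypothetical): the `String` is moved out of a mutable source rather than cloned, leaving an empty `String` behind, so the transformer allocates nothing where it previously copied.

```rust
struct RawMergeRequest {
    title: String,
}

struct MergeRequestRecord {
    title: String,
}

// mem::take moves the field out and leaves String::default() in its place:
// no allocation, no byte copy.
fn transform(raw: &mut RawMergeRequest) -> MergeRequestRecord {
    MergeRequestRecord {
        title: std::mem::take(&mut raw.title),
    }
}
```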
---
## Regression Guardrails
For any optimization implemented:
1. **Test Coverage**: All existing tests must pass
2. **Output Equivalence**: For SQL changes, verify identical result sets with test data
3. **Benchmark Suite**: Add benchmarks for affected paths before/after
Suggested benchmark targets:
```rust
#[bench] fn bench_change_detection_1k_docs(b: &mut Bencher) { ... }
#[bench] fn bench_label_insert_50_labels(b: &mut Bencher) { ... }
#[bench] fn bench_hybrid_search_filtered(b: &mut Bencher) { ... }
```
---
## Implementation Priority
**Phase 1 (Quick Wins)** - COMPLETE:
1. ~~Change detection query rewrite (#1)~~ **DONE**
2. ~~Multiple COUNT consolidation (#7)~~ **DONE**
3. ~~Collect-concat pattern (#8)~~ **DONE**
4. ~~Vec::with_capacity hints (#10)~~ **DONE**
5. ~~FTS token collect-join (#11)~~ **DONE**
6. ~~Multiple EXISTS per label (#5)~~ **DONE**
**Phase 2 (Medium Effort)**:
7. Embedding batch clone removal (#3)
8. Chunking string allocation (#6)
**Phase 3 (Higher Effort)**:
9. N+1 batch inserts (#2)
10. List query CTEs (#4)
11. Parameter boxing (#9)
12. Transformer string clones (#12)
---
## Appendix: Test Baseline
```
cargo test --release
running 127 tests
test result: ok. 127 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```
All tests pass. Any optimization must maintain this baseline.


@@ -12,7 +12,10 @@ Local GitLab data management with semantic search. Syncs issues, MRs, discussion
 - **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
 - **Raw payload storage**: Preserves original GitLab API responses for debugging
 - **Discussion threading**: Full support for issue and MR discussions including inline code review comments
+- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
+- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
 - **Robot mode**: Machine-readable JSON output with structured errors and meaningful exit codes
+- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing

 ## Installation
@@ -254,8 +257,11 @@ lore sync --full # Reset cursors, fetch everything
 lore sync --force # Override stale lock
 lore sync --no-embed # Skip embedding step
 lore sync --no-docs # Skip document regeneration
+lore sync --no-events # Skip resource event fetching
 ```
+
+The sync command displays animated progress bars for each stage and outputs timing metrics on completion. In robot mode (`-J`), detailed stage timing is included in the JSON response.

 ### `lore ingest`
 Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings).
@@ -478,6 +484,10 @@ lore -J <command> # JSON shorthand
 lore --color never <command> # Disable color output
 lore --color always <command> # Force color output
 lore -q <command> # Suppress non-essential output
+lore -v <command> # Debug logging
+lore -vv <command> # More verbose debug logging
+lore -vvv <command> # Trace-level logging
+lore --log-format json <command> # JSON-formatted log output to stderr
 ```

 Color output respects `NO_COLOR` and `CLICOLOR` environment variables in `auto` mode (the default).
@@ -518,6 +528,10 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
 | `mr_reviewers` | Many-to-many MR-reviewer relationships |
 | `discussions` | Issue/MR discussion threads |
 | `notes` | Individual notes within discussions (with system note flag and DiffNote position data) |
+| `resource_state_events` | Issue/MR state change history (opened, closed, merged, reopened) |
+| `resource_label_events` | Label add/remove events with actor and timestamp |
+| `resource_milestone_events` | Milestone add/remove events with actor and timestamp |
+| `entity_references` | Cross-references between entities (MR closes issue, mentioned in, etc.) |
 | `documents` | Extracted searchable text for FTS and embedding |
 | `documents_fts` | FTS5 full-text search index |
 | `embeddings` | Vector embeddings for semantic search |


@@ -41,18 +41,15 @@ pub fn run_count(config: &Config, entity: &str, type_filter: Option<&str>) -> Re
 }

 fn count_issues(conn: &Connection) -> Result<CountResult> {
-    let count: i64 = conn.query_row("SELECT COUNT(*) FROM issues", [], |row| row.get(0))?;
-
-    let opened: i64 = conn.query_row(
-        "SELECT COUNT(*) FROM issues WHERE state = 'opened'",
-        [],
-        |row| row.get(0),
-    )?;
-
-    let closed: i64 = conn.query_row(
-        "SELECT COUNT(*) FROM issues WHERE state = 'closed'",
-        [],
-        |row| row.get(0),
-    )?;
+    // Single query with conditional aggregation instead of 3 separate queries
+    let (count, opened, closed): (i64, i64, i64) = conn.query_row(
+        "SELECT
+            COUNT(*),
+            COALESCE(SUM(CASE WHEN state = 'opened' THEN 1 ELSE 0 END), 0),
+            COALESCE(SUM(CASE WHEN state = 'closed' THEN 1 ELSE 0 END), 0)
+         FROM issues",
+        [],
+        |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
+    )?;

     Ok(CountResult {
@@ -69,30 +66,25 @@ fn count_issues(conn: &Connection) -> Result<CountResult> {
 }

 fn count_mrs(conn: &Connection) -> Result<CountResult> {
-    let count: i64 = conn.query_row("SELECT COUNT(*) FROM merge_requests", [], |row| row.get(0))?;
-
-    let opened: i64 = conn.query_row(
-        "SELECT COUNT(*) FROM merge_requests WHERE state = 'opened'",
-        [],
-        |row| row.get(0),
-    )?;
-
-    let merged: i64 = conn.query_row(
-        "SELECT COUNT(*) FROM merge_requests WHERE state = 'merged'",
-        [],
-        |row| row.get(0),
-    )?;
-
-    let closed: i64 = conn.query_row(
-        "SELECT COUNT(*) FROM merge_requests WHERE state = 'closed'",
-        [],
-        |row| row.get(0),
-    )?;
-
-    let locked: i64 = conn.query_row(
-        "SELECT COUNT(*) FROM merge_requests WHERE state = 'locked'",
-        [],
-        |row| row.get(0),
-    )?;
+    // Single query with conditional aggregation instead of 5 separate queries
+    let (count, opened, merged, closed, locked): (i64, i64, i64, i64, i64) = conn.query_row(
+        "SELECT
+            COUNT(*),
+            COALESCE(SUM(CASE WHEN state = 'opened' THEN 1 ELSE 0 END), 0),
+            COALESCE(SUM(CASE WHEN state = 'merged' THEN 1 ELSE 0 END), 0),
+            COALESCE(SUM(CASE WHEN state = 'closed' THEN 1 ELSE 0 END), 0),
+            COALESCE(SUM(CASE WHEN state = 'locked' THEN 1 ELSE 0 END), 0)
+         FROM merge_requests",
+        [],
+        |row| {
+            Ok((
+                row.get(0)?,
+                row.get(1)?,
+                row.get(2)?,
+                row.get(3)?,
+                row.get(4)?,
+            ))
+        },
+    )?;

     Ok(CountResult {


@@ -383,10 +383,22 @@ async fn check_ollama(config: Option<&Config>) -> OllamaCheck {
     let base_url = &config.embedding.base_url;
     let model = &config.embedding.model;

-    let client = reqwest::Client::builder()
+    let client = match reqwest::Client::builder()
         .timeout(std::time::Duration::from_secs(2))
         .build()
-        .unwrap();
+    {
+        Ok(client) => client,
+        Err(e) => {
+            return OllamaCheck {
+                result: CheckResult {
+                    status: CheckStatus::Warning,
+                    message: Some(format!("Failed to build HTTP client: {e}")),
+                },
+                url: Some(base_url.clone()),
+                model: Some(model.clone()),
+            };
+        }
+    };

     match client.get(format!("{base_url}/api/tags")).send().await {
         Ok(response) if response.status().is_success() => {


@@ -42,6 +42,23 @@ pub struct IngestResult {
pub resource_events_failed: usize,
}
#[derive(Debug, Default, Clone, Serialize)]
pub struct DryRunPreview {
pub resource_type: String,
pub projects: Vec<DryRunProjectPreview>,
pub sync_mode: String,
}
#[derive(Debug, Default, Clone, Serialize)]
pub struct DryRunProjectPreview {
pub path: String,
pub local_id: i64,
pub gitlab_id: i64,
pub has_cursor: bool,
pub last_synced: Option<String>,
pub existing_count: i64,
}
enum ProjectIngestOutcome {
Issues {
path: String,
@@ -86,12 +103,14 @@ impl IngestDisplay {
}
}
#[allow(clippy::too_many_arguments)]
pub async fn run_ingest(
config: &Config,
resource_type: &str,
project_filter: Option<&str>,
force: bool,
full: bool,
dry_run: bool,
display: IngestDisplay,
stage_bar: Option<ProgressBar>,
) -> Result<IngestResult> {
@@ -105,6 +124,7 @@ pub async fn run_ingest(
project_filter,
force,
full,
dry_run,
display,
stage_bar,
)
@@ -112,15 +132,107 @@ pub async fn run_ingest(
.await
}
pub fn run_ingest_dry_run(
config: &Config,
resource_type: &str,
project_filter: Option<&str>,
full: bool,
) -> Result<DryRunPreview> {
if resource_type != "issues" && resource_type != "mrs" {
return Err(LoreError::Other(format!(
"Invalid resource type '{}'. Valid types: issues, mrs",
resource_type
)));
}
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
let projects = get_projects_to_sync(&conn, &config.projects, project_filter)?;
if projects.is_empty() {
if let Some(filter) = project_filter {
return Err(LoreError::Other(format!(
"Project '{}' not found in configuration",
filter
)));
}
return Err(LoreError::Other(
"No projects configured. Run 'lore init' first.".to_string(),
));
}
let mut preview = DryRunPreview {
resource_type: resource_type.to_string(),
projects: Vec::new(),
sync_mode: if full {
"full".to_string()
} else {
"incremental".to_string()
},
};
for (local_project_id, gitlab_project_id, path) in &projects {
let cursor_exists: bool = conn
.query_row(
"SELECT EXISTS(SELECT 1 FROM sync_cursors WHERE project_id = ? AND resource_type = ?)",
(*local_project_id, resource_type),
|row| row.get(0),
)
.unwrap_or(false);
let last_synced: Option<String> = conn
.query_row(
"SELECT updated_at FROM sync_cursors WHERE project_id = ? AND resource_type = ?",
(*local_project_id, resource_type),
|row| row.get(0),
)
.ok();
let existing_count: i64 = if resource_type == "issues" {
conn.query_row(
"SELECT COUNT(*) FROM issues WHERE project_id = ?",
[*local_project_id],
|row| row.get(0),
)
.unwrap_or(0)
} else {
conn.query_row(
"SELECT COUNT(*) FROM merge_requests WHERE project_id = ?",
[*local_project_id],
|row| row.get(0),
)
.unwrap_or(0)
};
preview.projects.push(DryRunProjectPreview {
path: path.clone(),
local_id: *local_project_id,
gitlab_id: *gitlab_project_id,
has_cursor: cursor_exists && !full,
last_synced: if full { None } else { last_synced },
existing_count,
});
}
Ok(preview)
}
#[allow(clippy::too_many_arguments)]
async fn run_ingest_inner(
config: &Config,
resource_type: &str,
project_filter: Option<&str>,
force: bool,
full: bool,
dry_run: bool,
display: IngestDisplay,
stage_bar: Option<ProgressBar>,
) -> Result<IngestResult> {
// In dry_run mode, we don't actually ingest - use run_ingest_dry_run instead
// This flag is passed through for consistency but the actual dry-run logic
// is handled at the caller level
let _ = dry_run;
if resource_type != "issues" && resource_type != "mrs" {
return Err(LoreError::Other(format!(
"Invalid resource type '{}'. Valid types: issues, mrs",
@@ -759,3 +871,63 @@ pub fn print_ingest_summary(result: &IngestResult) {
);
}
}
pub fn print_dry_run_preview(preview: &DryRunPreview) {
println!(
"{} {}",
style("Dry Run Preview").cyan().bold(),
style("(no changes will be made)").yellow()
);
println!();
let type_label = if preview.resource_type == "issues" {
"issues"
} else {
"merge requests"
};
println!(" Resource type: {}", style(type_label).white().bold());
println!(
" Sync mode: {}",
if preview.sync_mode == "full" {
style("full (all data will be re-fetched)").yellow()
} else {
style("incremental (only changes since last sync)").green()
}
);
println!(" Projects: {}", preview.projects.len());
println!();
println!("{}", style("Projects to sync:").cyan().bold());
for project in &preview.projects {
let sync_status = if !project.has_cursor {
style("initial sync").yellow()
} else {
style("incremental").green()
};
println!(" {} ({})", style(&project.path).white(), sync_status);
println!(" Existing {}: {}", type_label, project.existing_count);
if let Some(ref last_synced) = project.last_synced {
println!(" Last synced: {}", last_synced);
}
}
}
#[derive(Serialize)]
struct DryRunJsonOutput {
ok: bool,
dry_run: bool,
data: DryRunPreview,
}
pub fn print_dry_run_preview_json(preview: &DryRunPreview) {
let output = DryRunJsonOutput {
ok: true,
dry_run: true,
data: preview.clone(),
};
println!("{}", serde_json::to_string(&output).unwrap());
}
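`print_dry_run_preview_json` wraps the preview in an `ok`/`dry_run` envelope via serde. A std-only sketch of the resulting shape, with hypothetical field values (the real output is produced by `serde_json`, with the full `projects` array populated):

```rust
// Builds the JSON envelope by hand to illustrate its shape; serde
// serializes DryRunPreview fields in declaration order
// (resource_type, projects, sync_mode).
fn envelope(resource_type: &str, sync_mode: &str) -> String {
    format!(
        "{{\"ok\":true,\"dry_run\":true,\"data\":{{\"resource_type\":\"{resource_type}\",\"projects\":[],\"sync_mode\":\"{sync_mode}\"}}}}"
    )
}

fn main() {
    assert_eq!(
        envelope("issues", "incremental"),
        "{\"ok\":true,\"dry_run\":true,\"data\":{\"resource_type\":\"issues\",\"projects\":[],\"sync_mode\":\"incremental\"}}"
    );
}
```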

View File

@@ -17,10 +17,13 @@ pub use count::{
print_count, print_count_json, print_event_count, print_event_count_json, run_count,
run_count_events,
};
pub use doctor::{DoctorChecks, print_doctor_results, run_doctor};
pub use embed::{print_embed, print_embed_json, run_embed};
pub use generate_docs::{print_generate_docs, print_generate_docs_json, run_generate_docs};
pub use ingest::{
DryRunPreview, IngestDisplay, print_dry_run_preview, print_dry_run_preview_json,
print_ingest_summary, print_ingest_summary_json, run_ingest, run_ingest_dry_run,
};
pub use init::{InitInputs, InitOptions, InitResult, run_init};
pub use list::{
ListFilters, MrListFilters, open_issue_in_browser, open_mr_in_browser, print_list_issues,

View File

@@ -56,6 +56,14 @@ pub struct DiffNotePosition {
pub position_type: Option<String>,
}
#[derive(Debug, Clone, Serialize)]
pub struct ClosingMrRef {
pub iid: i64,
pub title: String,
pub state: String,
pub web_url: Option<String>,
}
#[derive(Debug, Serialize)]
pub struct IssueDetail {
pub id: i64,
@@ -69,6 +77,10 @@ pub struct IssueDetail {
pub web_url: Option<String>,
pub project_path: String,
pub labels: Vec<String>,
pub assignees: Vec<String>,
pub due_date: Option<String>,
pub milestone: Option<String>,
pub closing_merge_requests: Vec<ClosingMrRef>,
pub discussions: Vec<DiscussionDetail>,
}
@@ -98,6 +110,10 @@ pub fn run_show_issue(
let labels = get_issue_labels(&conn, issue.id)?;
let assignees = get_issue_assignees(&conn, issue.id)?;
let closing_mrs = get_closing_mrs(&conn, issue.id)?;
let discussions = get_issue_discussions(&conn, issue.id)?;
Ok(IssueDetail {
@@ -112,6 +128,10 @@ pub fn run_show_issue(
web_url: issue.web_url,
project_path: issue.project_path,
labels,
assignees,
due_date: issue.due_date,
milestone: issue.milestone_title,
closing_merge_requests: closing_mrs,
discussions,
})
}
@@ -127,6 +147,8 @@ struct IssueRow {
updated_at: i64,
web_url: Option<String>,
project_path: String,
due_date: Option<String>,
milestone_title: Option<String>,
}
fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Result<IssueRow> {
@@ -135,7 +157,8 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
let project_id = resolve_project(conn, project)?;
(
"SELECT i.id, i.iid, i.title, i.description, i.state, i.author_username,
i.created_at, i.updated_at, i.web_url, p.path_with_namespace,
i.due_date, i.milestone_title
FROM issues i
JOIN projects p ON i.project_id = p.id
WHERE i.iid = ? AND i.project_id = ?",
@@ -144,7 +167,8 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
}
None => (
"SELECT i.id, i.iid, i.title, i.description, i.state, i.author_username,
i.created_at, i.updated_at, i.web_url, p.path_with_namespace,
i.due_date, i.milestone_title
FROM issues i
JOIN projects p ON i.project_id = p.id
WHERE i.iid = ?",
@@ -168,6 +192,8 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
updated_at: row.get(7)?,
web_url: row.get(8)?,
project_path: row.get(9)?,
due_date: row.get(10)?,
milestone_title: row.get(11)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
@@ -201,6 +227,46 @@ fn get_issue_labels(conn: &Connection, issue_id: i64) -> Result<Vec<String>> {
Ok(labels)
}
fn get_issue_assignees(conn: &Connection, issue_id: i64) -> Result<Vec<String>> {
let mut stmt = conn.prepare(
"SELECT username FROM issue_assignees
WHERE issue_id = ?
ORDER BY username",
)?;
let assignees: Vec<String> = stmt
.query_map([issue_id], |row| row.get(0))?
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(assignees)
}
fn get_closing_mrs(conn: &Connection, issue_id: i64) -> Result<Vec<ClosingMrRef>> {
let mut stmt = conn.prepare(
"SELECT mr.iid, mr.title, mr.state, mr.web_url
FROM entity_references er
JOIN merge_requests mr ON mr.id = er.source_entity_id
WHERE er.target_entity_type = 'issue'
AND er.target_entity_id = ?
AND er.source_entity_type = 'merge_request'
AND er.reference_type = 'closes'
ORDER BY mr.iid",
)?;
let mrs: Vec<ClosingMrRef> = stmt
.query_map([issue_id], |row| {
Ok(ClosingMrRef {
iid: row.get(0)?,
title: row.get(1)?,
state: row.get(2)?,
web_url: row.get(3)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(mrs)
}
fn get_issue_discussions(conn: &Connection, issue_id: i64) -> Result<Vec<DiscussionDetail>> {
let mut disc_stmt = conn.prepare(
"SELECT id, individual_note FROM discussions
@@ -546,15 +612,57 @@ pub fn print_show_issue(issue: &IssueDetail) {
println!("State: {}", state_styled);
println!("Author: @{}", issue.author_username);
if !issue.assignees.is_empty() {
let label = if issue.assignees.len() > 1 {
"Assignees"
} else {
"Assignee"
};
println!(
"{}:{} {}",
label,
" ".repeat(10 - label.len()),
issue
.assignees
.iter()
.map(|a| format!("@{}", a))
.collect::<Vec<_>>()
.join(", ")
);
}
println!("Created: {}", format_date(issue.created_at));
println!("Updated: {}", format_date(issue.updated_at));
if let Some(due) = &issue.due_date {
println!("Due: {}", due);
}
if let Some(ms) = &issue.milestone {
println!("Milestone: {}", ms);
}
if issue.labels.is_empty() {
println!("Labels: {}", style("(none)").dim());
} else {
println!("Labels: {}", issue.labels.join(", "));
}
if !issue.closing_merge_requests.is_empty() {
println!();
println!("{}", style("Development:").bold());
for mr in &issue.closing_merge_requests {
let state_indicator = match mr.state.as_str() {
"merged" => style(&mr.state).green(),
"opened" => style(&mr.state).cyan(),
"closed" => style(&mr.state).red(),
_ => style(&mr.state).dim(),
};
println!(" !{} {} ({})", mr.iid, mr.title, state_indicator);
}
}
if let Some(url) = &issue.web_url {
println!("URL: {}", style(url).dim());
}
@@ -779,9 +887,21 @@ pub struct IssueDetailJson {
pub web_url: Option<String>,
pub project_path: String,
pub labels: Vec<String>,
pub assignees: Vec<String>,
pub due_date: Option<String>,
pub milestone: Option<String>,
pub closing_merge_requests: Vec<ClosingMrRefJson>,
pub discussions: Vec<DiscussionDetailJson>,
}
#[derive(Serialize)]
pub struct ClosingMrRefJson {
pub iid: i64,
pub title: String,
pub state: String,
pub web_url: Option<String>,
}
#[derive(Serialize)]
pub struct DiscussionDetailJson {
pub notes: Vec<NoteDetailJson>,
@@ -810,6 +930,19 @@ impl From<&IssueDetail> for IssueDetailJson {
web_url: issue.web_url.clone(),
project_path: issue.project_path.clone(),
labels: issue.labels.clone(),
assignees: issue.assignees.clone(),
due_date: issue.due_date.clone(),
milestone: issue.milestone.clone(),
closing_merge_requests: issue
.closing_merge_requests
.iter()
.map(|mr| ClosingMrRefJson {
iid: mr.iid,
title: mr.title.clone(),
state: mr.state.clone(),
web_url: mr.web_url.clone(),
})
.collect(),
discussions: issue.discussions.iter().map(|d| d.into()).collect(),
}
}
@@ -939,6 +1072,167 @@ pub fn print_show_mr_json(mr: &MrDetail) {
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::run_migrations;
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_project(conn: &Connection) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com', 1000, 2000)",
[],
)
.unwrap();
}
fn seed_issue(conn: &Connection) {
seed_project(conn);
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, author_username,
created_at, updated_at, last_seen_at)
VALUES (1, 200, 10, 1, 'Test issue', 'opened', 'author', 1000, 2000, 2000)",
[],
)
.unwrap();
}
#[test]
fn test_get_issue_assignees_empty() {
let conn = setup_test_db();
seed_issue(&conn);
let result = get_issue_assignees(&conn, 1).unwrap();
assert!(result.is_empty());
}
#[test]
fn test_get_issue_assignees_single() {
let conn = setup_test_db();
seed_issue(&conn);
conn.execute(
"INSERT INTO issue_assignees (issue_id, username) VALUES (1, 'charlie')",
[],
)
.unwrap();
let result = get_issue_assignees(&conn, 1).unwrap();
assert_eq!(result, vec!["charlie"]);
}
#[test]
fn test_get_issue_assignees_multiple_sorted() {
let conn = setup_test_db();
seed_issue(&conn);
conn.execute(
"INSERT INTO issue_assignees (issue_id, username) VALUES (1, 'bob')",
[],
)
.unwrap();
conn.execute(
"INSERT INTO issue_assignees (issue_id, username) VALUES (1, 'alice')",
[],
)
.unwrap();
let result = get_issue_assignees(&conn, 1).unwrap();
assert_eq!(result, vec!["alice", "bob"]); // alphabetical
}
#[test]
fn test_get_closing_mrs_empty() {
let conn = setup_test_db();
seed_issue(&conn);
let result = get_closing_mrs(&conn, 1).unwrap();
assert!(result.is_empty());
}
#[test]
fn test_get_closing_mrs_single() {
let conn = setup_test_db();
seed_issue(&conn);
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, author_username,
source_branch, target_branch, created_at, updated_at, last_seen_at)
VALUES (1, 300, 5, 1, 'Fix the bug', 'merged', 'dev', 'fix', 'main', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id, reference_type, source_method, created_at)
VALUES (1, 'merge_request', 1, 'issue', 1, 'closes', 'api', 3000)",
[],
)
.unwrap();
let result = get_closing_mrs(&conn, 1).unwrap();
assert_eq!(result.len(), 1);
assert_eq!(result[0].iid, 5);
assert_eq!(result[0].title, "Fix the bug");
assert_eq!(result[0].state, "merged");
}
#[test]
fn test_get_closing_mrs_ignores_mentioned() {
let conn = setup_test_db();
seed_issue(&conn);
// Add a 'mentioned' reference that should be ignored
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, author_username,
source_branch, target_branch, created_at, updated_at, last_seen_at)
VALUES (1, 300, 5, 1, 'Some MR', 'opened', 'dev', 'feat', 'main', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id, reference_type, source_method, created_at)
VALUES (1, 'merge_request', 1, 'issue', 1, 'mentioned', 'note_parse', 3000)",
[],
)
.unwrap();
let result = get_closing_mrs(&conn, 1).unwrap();
assert!(result.is_empty()); // 'mentioned' refs not included
}
#[test]
fn test_get_closing_mrs_multiple_sorted() {
let conn = setup_test_db();
seed_issue(&conn);
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, author_username,
source_branch, target_branch, created_at, updated_at, last_seen_at)
VALUES (1, 300, 8, 1, 'Second fix', 'opened', 'dev', 'fix2', 'main', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, author_username,
source_branch, target_branch, created_at, updated_at, last_seen_at)
VALUES (2, 301, 5, 1, 'First fix', 'merged', 'dev', 'fix1', 'main', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id, reference_type, source_method, created_at)
VALUES (1, 'merge_request', 1, 'issue', 1, 'closes', 'api', 3000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id, reference_type, source_method, created_at)
VALUES (1, 'merge_request', 2, 'issue', 1, 'closes', 'api', 3000)",
[],
)
.unwrap();
let result = get_closing_mrs(&conn, 1).unwrap();
assert_eq!(result.len(), 2);
assert_eq!(result[0].iid, 5); // Lower iid first
assert_eq!(result[1].iid, 8);
}
#[test]
fn truncate_leaves_short_strings() {

View File

@@ -69,9 +69,10 @@ pub struct RepairResult {
pub fts_rebuilt: bool,
pub orphans_deleted: i64,
pub stale_cleared: i64,
pub dry_run: bool,
}
pub fn run_stats(config: &Config, check: bool, repair: bool, dry_run: bool) -> Result<StatsResult> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
@@ -220,43 +221,54 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
if repair {
let mut repair_result = RepairResult::default();
repair_result.dry_run = dry_run;
if integrity.fts_doc_mismatch {
if !dry_run {
conn.execute(
"INSERT INTO documents_fts(documents_fts) VALUES('rebuild')",
[],
)?;
}
repair_result.fts_rebuilt = true;
}
if integrity.orphan_embeddings > 0 && table_exists(&conn, "embedding_metadata") {
if !dry_run {
let deleted = conn.execute(
"DELETE FROM embedding_metadata
WHERE NOT EXISTS (SELECT 1 FROM documents d WHERE d.id = embedding_metadata.document_id)",
[],
)?;
repair_result.orphans_deleted = deleted as i64;
if table_exists(&conn, "embeddings") {
let _ = conn.execute(
"DELETE FROM embeddings
WHERE rowid / 1000 NOT IN (SELECT id FROM documents)",
[],
);
}
} else {
repair_result.orphans_deleted = integrity.orphan_embeddings;
}
}
if integrity.stale_metadata > 0 && table_exists(&conn, "embedding_metadata") {
if !dry_run {
let cleared = conn.execute(
"DELETE FROM embedding_metadata
WHERE document_id IN (
SELECT em.document_id FROM embedding_metadata em
JOIN documents d ON d.id = em.document_id
WHERE em.chunk_index = 0 AND em.document_hash != d.content_hash
)",
[],
)?;
repair_result.stale_cleared = cleared as i64;
} else {
repair_result.stale_cleared = integrity.stale_metadata;
}
}
integrity.repair = Some(repair_result);
@@ -387,22 +399,35 @@ pub fn print_stats(result: &StatsResult) {
if let Some(ref repair) = integrity.repair {
println!();
if repair.dry_run {
println!(
"{} {}",
style("Repair").cyan().bold(),
style("(dry run - no changes made)").yellow()
);
} else {
println!("{}", style("Repair").cyan().bold());
}
let action = if repair.dry_run {
style("would fix").yellow()
} else {
style("fixed").green()
};
if repair.fts_rebuilt {
println!(" {} FTS index rebuilt", action);
}
if repair.orphans_deleted > 0 {
println!(
" {} {} orphan embeddings deleted",
action, repair.orphans_deleted
);
}
if repair.stale_cleared > 0 {
println!(
" {} {} stale metadata entries cleared",
action, repair.stale_cleared
);
}
if !repair.fts_rebuilt && repair.orphans_deleted == 0 && repair.stale_cleared == 0 {
@@ -442,6 +467,7 @@ pub fn print_stats_json(result: &StatsResult) {
fts_rebuilt: r.fts_rebuilt,
orphans_deleted: r.orphans_deleted,
stale_cleared: r.stale_cleared,
dry_run: r.dry_run,
}),
}),
},

View File

@@ -12,7 +12,7 @@ use crate::core::metrics::{MetricsLayer, StageTiming};
use super::embed::run_embed;
use super::generate_docs::run_generate_docs;
use super::ingest::{DryRunPreview, IngestDisplay, run_ingest, run_ingest_dry_run};
#[derive(Debug, Default)]
pub struct SyncOptions {
@@ -22,6 +22,7 @@ pub struct SyncOptions {
pub no_docs: bool,
pub no_events: bool,
pub robot_mode: bool,
pub dry_run: bool,
}
#[derive(Debug, Default, Serialize)]
@@ -74,6 +75,11 @@ pub async fn run_sync(
..SyncResult::default()
};
// Handle dry_run mode - show preview without making any changes
if options.dry_run {
return run_sync_dry_run(config, &options).await;
}
let ingest_display = if options.robot_mode {
IngestDisplay::silent()
} else {
@@ -103,6 +109,7 @@ pub async fn run_sync(
None,
options.force,
options.full,
false, // dry_run - sync has its own dry_run handling
ingest_display,
Some(spinner.clone()),
)
@@ -127,6 +134,7 @@ pub async fn run_sync(
None,
options.force,
options.full,
false, // dry_run - sync has its own dry_run handling
ingest_display,
Some(spinner.clone()),
)
@@ -369,3 +377,172 @@ pub fn print_sync_json(result: &SyncResult, elapsed_ms: u64, metrics: Option<&Me
};
println!("{}", serde_json::to_string(&output).unwrap());
}
#[derive(Debug, Default, Serialize)]
pub struct SyncDryRunResult {
pub issues_preview: DryRunPreview,
pub mrs_preview: DryRunPreview,
pub would_generate_docs: bool,
pub would_embed: bool,
}
async fn run_sync_dry_run(config: &Config, options: &SyncOptions) -> Result<SyncResult> {
// Get dry run previews for both issues and MRs
let issues_preview = run_ingest_dry_run(config, "issues", None, options.full)?;
let mrs_preview = run_ingest_dry_run(config, "mrs", None, options.full)?;
let dry_result = SyncDryRunResult {
issues_preview,
mrs_preview,
would_generate_docs: !options.no_docs,
would_embed: !options.no_embed,
};
if options.robot_mode {
print_sync_dry_run_json(&dry_result);
} else {
print_sync_dry_run(&dry_result);
}
// Return an empty SyncResult since this is just a preview
Ok(SyncResult::default())
}
pub fn print_sync_dry_run(result: &SyncDryRunResult) {
println!(
"{} {}",
style("Sync Dry Run Preview").cyan().bold(),
style("(no changes will be made)").yellow()
);
println!();
println!("{}", style("Stage 1: Issues Ingestion").white().bold());
println!(
" Sync mode: {}",
if result.issues_preview.sync_mode == "full" {
style("full").yellow()
} else {
style("incremental").green()
}
);
println!(" Projects: {}", result.issues_preview.projects.len());
for project in &result.issues_preview.projects {
let sync_status = if !project.has_cursor {
style("initial sync").yellow()
} else {
style("incremental").green()
};
println!(
" {} ({}) - {} existing",
&project.path, sync_status, project.existing_count
);
}
println!();
println!(
"{}",
style("Stage 2: Merge Requests Ingestion").white().bold()
);
println!(
" Sync mode: {}",
if result.mrs_preview.sync_mode == "full" {
style("full").yellow()
} else {
style("incremental").green()
}
);
println!(" Projects: {}", result.mrs_preview.projects.len());
for project in &result.mrs_preview.projects {
let sync_status = if !project.has_cursor {
style("initial sync").yellow()
} else {
style("incremental").green()
};
println!(
" {} ({}) - {} existing",
&project.path, sync_status, project.existing_count
);
}
println!();
if result.would_generate_docs {
println!(
"{} {}",
style("Stage 3: Document Generation").white().bold(),
style("(would run)").green()
);
} else {
println!(
"{} {}",
style("Stage 3: Document Generation").white().bold(),
style("(skipped)").dim()
);
}
if result.would_embed {
println!(
"{} {}",
style("Stage 4: Embedding").white().bold(),
style("(would run)").green()
);
} else {
println!(
"{} {}",
style("Stage 4: Embedding").white().bold(),
style("(skipped)").dim()
);
}
}
#[derive(Serialize)]
struct SyncDryRunJsonOutput {
ok: bool,
dry_run: bool,
data: SyncDryRunJsonData,
}
#[derive(Serialize)]
struct SyncDryRunJsonData {
stages: Vec<SyncDryRunStage>,
}
#[derive(Serialize)]
struct SyncDryRunStage {
name: String,
would_run: bool,
#[serde(skip_serializing_if = "Option::is_none")]
preview: Option<DryRunPreview>,
}
pub fn print_sync_dry_run_json(result: &SyncDryRunResult) {
let output = SyncDryRunJsonOutput {
ok: true,
dry_run: true,
data: SyncDryRunJsonData {
stages: vec![
SyncDryRunStage {
name: "ingest_issues".to_string(),
would_run: true,
preview: Some(result.issues_preview.clone()),
},
SyncDryRunStage {
name: "ingest_mrs".to_string(),
would_run: true,
preview: Some(result.mrs_preview.clone()),
},
SyncDryRunStage {
name: "generate_docs".to_string(),
would_run: result.would_generate_docs,
preview: None,
},
SyncDryRunStage {
name: "embed".to_string(),
would_run: result.would_embed,
preview: None,
},
],
},
};
println!("{}", serde_json::to_string(&output).unwrap());
}

View File

@@ -6,71 +6,127 @@ use std::io::IsTerminal;
#[derive(Parser)]
#[command(name = "lore")]
#[command(version, about = "Local GitLab data management with semantic search", long_about = None)]
#[command(subcommand_required = false)]
pub struct Cli {
/// Path to config file
#[arg(short = 'c', long, global = true, help = "Path to config file")]
pub config: Option<String>,
/// Machine-readable JSON output (auto-enabled when piped)
#[arg(
long,
global = true,
env = "LORE_ROBOT",
help = "Machine-readable JSON output (auto-enabled when piped)"
)]
pub robot: bool,
/// JSON output (global shorthand)
#[arg(
short = 'J',
long = "json",
global = true,
help = "JSON output (global shorthand)"
)]
pub json: bool,
/// Color output: auto (default), always, or never
#[arg(long, global = true, value_parser = ["auto", "always", "never"], default_value = "auto", help = "Color output: auto (default), always, or never")]
pub color: String,
/// Suppress non-essential output
#[arg(
short = 'q',
long,
global = true,
overrides_with = "no_quiet",
help = "Suppress non-essential output"
)]
pub quiet: bool,
#[arg(
long = "no-quiet",
global = true,
hide = true,
overrides_with = "quiet"
)]
pub no_quiet: bool,
/// Increase log verbosity (-v, -vv, -vvv)
#[arg(short = 'v', long = "verbose", action = clap::ArgAction::Count, global = true, help = "Increase log verbosity (-v, -vv, -vvv)")]
pub verbose: u8, pub verbose: u8,
#[arg(long = "log-format", global = true, value_parser = ["text", "json"], default_value = "text")] /// Log format for stderr output: text (default) or json
#[arg(long = "log-format", global = true, value_parser = ["text", "json"], default_value = "text", help = "Log format for stderr output: text (default) or json")]
pub log_format: String, pub log_format: String,
#[command(subcommand)] #[command(subcommand)]
pub command: Commands, pub command: Option<Commands>,
} }
impl Cli { impl Cli {
pub fn is_robot_mode(&self) -> bool { pub fn is_robot_mode(&self) -> bool {
self.robot || self.json || !std::io::stdout().is_terminal() self.robot || self.json || !std::io::stdout().is_terminal()
} }
/// Detect robot mode from environment before parsing succeeds.
/// Used for structured error output when clap parsing fails.
pub fn detect_robot_mode_from_env() -> bool {
let args: Vec<String> = std::env::args().collect();
args.iter()
.any(|a| a == "--robot" || a == "-J" || a == "--json")
|| std::env::var("LORE_ROBOT").is_ok()
|| !std::io::stdout().is_terminal()
}
} }
#[derive(Subcommand)] #[derive(Subcommand)]
#[allow(clippy::large_enum_variant)] #[allow(clippy::large_enum_variant)]
pub enum Commands { pub enum Commands {
/// List or show issues
Issues(IssuesArgs), Issues(IssuesArgs),
/// List or show merge requests
Mrs(MrsArgs), Mrs(MrsArgs),
/// Ingest data from GitLab
Ingest(IngestArgs), Ingest(IngestArgs),
/// Count entities in local database
Count(CountArgs), Count(CountArgs),
/// Show sync state
Status, Status,
/// Verify GitLab authentication
Auth, Auth,
/// Check environment health
Doctor, Doctor,
/// Show version information
Version, Version,
/// Initialize configuration and database
Init { Init {
/// Skip overwrite confirmation
#[arg(short = 'f', long)] #[arg(short = 'f', long)]
force: bool, force: bool,
/// Fail if prompts would be shown
#[arg(long)] #[arg(long)]
non_interactive: bool, non_interactive: bool,
/// GitLab base URL (required in robot mode)
#[arg(long)] #[arg(long)]
gitlab_url: Option<String>, gitlab_url: Option<String>,
/// Environment variable name holding GitLab token (required in robot mode)
#[arg(long)] #[arg(long)]
token_env_var: Option<String>, token_env_var: Option<String>,
/// Comma-separated project paths (required in robot mode)
#[arg(long)] #[arg(long)]
projects: Option<String>, projects: Option<String>,
}, },
@@ -84,26 +140,41 @@ pub enum Commands {
         yes: bool,
     },
+    /// Search indexed documents
     Search(SearchArgs),
+    /// Show document and index statistics
     Stats(StatsArgs),
+    /// Generate searchable documents from ingested data
     #[command(name = "generate-docs")]
     GenerateDocs(GenerateDocsArgs),
+    /// Generate vector embeddings for documents via Ollama
     Embed(EmbedArgs),
+    /// Run full sync pipeline: ingest -> generate-docs -> embed
     Sync(SyncArgs),
+    /// Run pending database migrations
     Migrate,
+    /// Quick health check: config, database, schema version
     Health,
+    /// Machine-readable command manifest for agent self-discovery
     #[command(name = "robot-docs")]
     RobotDocs,
-    #[command(hide = true)]
+    /// Generate shell completions
+    #[command(long_about = "Generate shell completions for lore.\n\n\
+        Installation:\n \
+        bash: lore completions bash > ~/.local/share/bash-completion/completions/lore\n \
+        zsh: lore completions zsh > ~/.zfunc/_lore && echo 'fpath+=~/.zfunc' >> ~/.zshrc\n \
+        fish: lore completions fish > ~/.config/fish/completions/lore.fish\n \
+        pwsh: lore completions powershell >> $PROFILE")]
     Completions {
+        /// Shell to generate completions for
         #[arg(value_parser = ["bash", "zsh", "fish", "powershell"])]
         shell: String,
     },
@@ -171,8 +242,10 @@ pub enum Commands {
 #[derive(Parser)]
 pub struct IssuesArgs {
+    /// Issue IID (omit to list, provide to show details)
     pub iid: Option<i64>,
+    /// Maximum results
     #[arg(
         short = 'n',
         long = "limit",
@@ -181,30 +254,43 @@ pub struct IssuesArgs {
     )]
     pub limit: usize,
+    /// Select output fields (comma-separated: iid,title,state,author,labels,updated)
+    #[arg(long, help_heading = "Output", value_delimiter = ',')]
+    pub fields: Option<Vec<String>>,
+    /// Filter by state (opened, closed, all)
     #[arg(short = 's', long, help_heading = "Filters")]
     pub state: Option<String>,
+    /// Filter by project path
     #[arg(short = 'p', long, help_heading = "Filters")]
     pub project: Option<String>,
+    /// Filter by author username
     #[arg(short = 'a', long, help_heading = "Filters")]
     pub author: Option<String>,
+    /// Filter by assignee username
     #[arg(short = 'A', long, help_heading = "Filters")]
     pub assignee: Option<String>,
+    /// Filter by label (repeatable, AND logic)
     #[arg(short = 'l', long, help_heading = "Filters")]
     pub label: Option<Vec<String>>,
+    /// Filter by milestone title
     #[arg(short = 'm', long, help_heading = "Filters")]
     pub milestone: Option<String>,
+    /// Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
     #[arg(long, help_heading = "Filters")]
     pub since: Option<String>,
+    /// Filter by due date (before this date, YYYY-MM-DD)
     #[arg(long = "due-before", help_heading = "Filters")]
     pub due_before: Option<String>,
+    /// Show only issues with a due date
     #[arg(
         long = "has-due",
         help_heading = "Filters",
@@ -215,15 +301,18 @@ pub struct IssuesArgs {
     #[arg(long = "no-has-due", hide = true, overrides_with = "has_due")]
     pub no_has_due: bool,
+    /// Sort field (updated, created, iid)
     #[arg(long, value_parser = ["updated", "created", "iid"], default_value = "updated", help_heading = "Sorting")]
     pub sort: String,
+    /// Sort ascending (default: descending)
     #[arg(long, help_heading = "Sorting", overrides_with = "no_asc")]
     pub asc: bool,
     #[arg(long = "no-asc", hide = true, overrides_with = "asc")]
     pub no_asc: bool,
+    /// Open first matching item in browser
     #[arg(
         short = 'o',
         long,
@@ -238,8 +327,10 @@ pub struct IssuesArgs {
 #[derive(Parser)]
 pub struct MrsArgs {
+    /// MR IID (omit to list, provide to show details)
     pub iid: Option<i64>,
+    /// Maximum results
     #[arg(
         short = 'n',
         long = "limit",
@@ -248,27 +339,39 @@ pub struct MrsArgs {
     )]
     pub limit: usize,
+    /// Select output fields (comma-separated: iid,title,state,author,labels,updated)
+    #[arg(long, help_heading = "Output", value_delimiter = ',')]
+    pub fields: Option<Vec<String>>,
+    /// Filter by state (opened, merged, closed, locked, all)
     #[arg(short = 's', long, help_heading = "Filters")]
     pub state: Option<String>,
+    /// Filter by project path
     #[arg(short = 'p', long, help_heading = "Filters")]
     pub project: Option<String>,
+    /// Filter by author username
     #[arg(short = 'a', long, help_heading = "Filters")]
     pub author: Option<String>,
+    /// Filter by assignee username
     #[arg(short = 'A', long, help_heading = "Filters")]
     pub assignee: Option<String>,
+    /// Filter by reviewer username
     #[arg(short = 'r', long, help_heading = "Filters")]
     pub reviewer: Option<String>,
+    /// Filter by label (repeatable, AND logic)
     #[arg(short = 'l', long, help_heading = "Filters")]
     pub label: Option<Vec<String>>,
+    /// Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
     #[arg(long, help_heading = "Filters")]
     pub since: Option<String>,
+    /// Show only draft MRs
     #[arg(
         short = 'd',
         long,
@@ -277,6 +380,7 @@ pub struct MrsArgs {
     )]
     pub draft: bool,
+    /// Exclude draft MRs
     #[arg(
         short = 'D',
         long = "no-draft",
@@ -285,21 +389,26 @@ pub struct MrsArgs {
     )]
     pub no_draft: bool,
+    /// Filter by target branch
     #[arg(long, help_heading = "Filters")]
     pub target: Option<String>,
+    /// Filter by source branch
     #[arg(long, help_heading = "Filters")]
     pub source: Option<String>,
+    /// Sort field (updated, created, iid)
     #[arg(long, value_parser = ["updated", "created", "iid"], default_value = "updated", help_heading = "Sorting")]
     pub sort: String,
+    /// Sort ascending (default: descending)
     #[arg(long, help_heading = "Sorting", overrides_with = "no_asc")]
     pub asc: bool,
     #[arg(long = "no-asc", hide = true, overrides_with = "asc")]
     pub no_asc: bool,
+    /// Open first matching item in browser
     #[arg(
         short = 'o',
         long,
@@ -314,65 +423,95 @@ pub struct MrsArgs {
 #[derive(Parser)]
 pub struct IngestArgs {
+    /// Entity to ingest (issues, mrs). Omit to ingest everything
     #[arg(value_parser = ["issues", "mrs"])]
     pub entity: Option<String>,
+    /// Filter to single project
     #[arg(short = 'p', long)]
     pub project: Option<String>,
+    /// Override stale sync lock
     #[arg(short = 'f', long, overrides_with = "no_force")]
     pub force: bool,
     #[arg(long = "no-force", hide = true, overrides_with = "force")]
     pub no_force: bool,
+    /// Full re-sync: reset cursors and fetch all data from scratch
     #[arg(long, overrides_with = "no_full")]
     pub full: bool,
     #[arg(long = "no-full", hide = true, overrides_with = "full")]
     pub no_full: bool,
+    /// Preview what would be synced without making changes
+    #[arg(long, overrides_with = "no_dry_run")]
+    pub dry_run: bool,
+    #[arg(long = "no-dry-run", hide = true, overrides_with = "dry_run")]
+    pub no_dry_run: bool,
 }
 #[derive(Parser)]
 pub struct StatsArgs {
+    /// Run integrity checks
     #[arg(long, overrides_with = "no_check")]
     pub check: bool,
     #[arg(long = "no-check", hide = true, overrides_with = "check")]
     pub no_check: bool,
+    /// Repair integrity issues (auto-enables --check)
     #[arg(long)]
     pub repair: bool,
+    /// Preview what would be repaired without making changes (requires --repair)
+    #[arg(long, overrides_with = "no_dry_run")]
+    pub dry_run: bool,
+    #[arg(long = "no-dry-run", hide = true, overrides_with = "dry_run")]
+    pub no_dry_run: bool,
 }
 #[derive(Parser)]
 pub struct SearchArgs {
+    /// Search query string
     pub query: String,
+    /// Search mode (lexical, hybrid, semantic)
     #[arg(long, default_value = "hybrid", value_parser = ["lexical", "hybrid", "semantic"], help_heading = "Output")]
     pub mode: String,
+    /// Filter by source type (issue, mr, discussion)
     #[arg(long = "type", value_name = "TYPE", value_parser = ["issue", "mr", "discussion"], help_heading = "Filters")]
     pub source_type: Option<String>,
+    /// Filter by author username
     #[arg(long, help_heading = "Filters")]
     pub author: Option<String>,
+    /// Filter by project path
     #[arg(short = 'p', long, help_heading = "Filters")]
     pub project: Option<String>,
+    /// Filter by label (repeatable, AND logic)
     #[arg(long, action = clap::ArgAction::Append, help_heading = "Filters")]
     pub label: Vec<String>,
+    /// Filter by file path (trailing / for prefix match)
     #[arg(long, help_heading = "Filters")]
     pub path: Option<String>,
+    /// Filter by created after (7d, 2w, or YYYY-MM-DD)
     #[arg(long, help_heading = "Filters")]
     pub after: Option<String>,
+    /// Filter by updated after (7d, 2w, or YYYY-MM-DD)
     #[arg(long = "updated-after", help_heading = "Filters")]
     pub updated_after: Option<String>,
+    /// Maximum results (default 20, max 100)
     #[arg(
         short = 'n',
         long = "limit",
@@ -381,57 +520,75 @@ pub struct SearchArgs {
     )]
     pub limit: usize,
+    /// Show ranking explanation per result
     #[arg(long, help_heading = "Output", overrides_with = "no_explain")]
     pub explain: bool,
     #[arg(long = "no-explain", hide = true, overrides_with = "explain")]
     pub no_explain: bool,
+    /// FTS query mode: safe (default) or raw
     #[arg(long = "fts-mode", default_value = "safe", value_parser = ["safe", "raw"], help_heading = "Output")]
     pub fts_mode: String,
 }
 #[derive(Parser)]
 pub struct GenerateDocsArgs {
+    /// Full rebuild: seed all entities into dirty queue, then drain
     #[arg(long)]
     pub full: bool,
+    /// Filter to single project
     #[arg(short = 'p', long)]
     pub project: Option<String>,
 }
 #[derive(Parser)]
 pub struct SyncArgs {
+    /// Reset cursors, fetch everything
     #[arg(long, overrides_with = "no_full")]
     pub full: bool,
     #[arg(long = "no-full", hide = true, overrides_with = "full")]
     pub no_full: bool,
+    /// Override stale lock
     #[arg(long, overrides_with = "no_force")]
     pub force: bool,
     #[arg(long = "no-force", hide = true, overrides_with = "force")]
     pub no_force: bool,
+    /// Skip embedding step
     #[arg(long)]
     pub no_embed: bool,
+    /// Skip document regeneration
     #[arg(long)]
     pub no_docs: bool,
+    /// Skip resource event fetching (overrides config)
     #[arg(long = "no-events")]
     pub no_events: bool,
+    /// Preview what would be synced without making changes
+    #[arg(long, overrides_with = "no_dry_run")]
+    pub dry_run: bool,
+    #[arg(long = "no-dry-run", hide = true, overrides_with = "dry_run")]
+    pub no_dry_run: bool,
 }
 #[derive(Parser)]
 pub struct EmbedArgs {
+    /// Re-embed all documents (clears existing embeddings first)
     #[arg(long, overrides_with = "no_full")]
     pub full: bool,
     #[arg(long = "no-full", hide = true, overrides_with = "full")]
     pub no_full: bool,
+    /// Retry previously failed embeddings
     #[arg(long, overrides_with = "no_retry_failed")]
     pub retry_failed: bool,
@@ -441,9 +598,11 @@ pub struct EmbedArgs {
 #[derive(Parser)]
 pub struct CountArgs {
+    /// Entity type to count (issues, mrs, discussions, notes, events)
     #[arg(value_parser = ["issues", "mrs", "discussions", "notes", "events"])]
     pub entity: String,
+    /// Parent type filter: issue or mr (for discussions/notes)
     #[arg(short = 'f', long = "for", value_parser = ["issue", "mr"])]
     pub for_entity: Option<String>,
 }

View File

@@ -8,7 +8,7 @@ pub fn compute_next_attempt_at(now: i64, attempt_count: i64) -> i64 {
     let jitter_factor = rand::thread_rng().gen_range(0.9..=1.1);
     let delay_with_jitter = (capped_delay_ms as f64 * jitter_factor) as i64;
-    now + delay_with_jitter
+    now.saturating_add(delay_with_jitter)
 }
 #[cfg(test)]
@@ -82,4 +82,11 @@ mod tests {
         let result = compute_next_attempt_at(now, i64::MAX);
         assert!(result > now);
     }
+    #[test]
+    fn test_saturating_add_prevents_overflow() {
+        let now = i64::MAX - 10;
+        let result = compute_next_attempt_at(now, 30);
+        assert_eq!(result, i64::MAX);
+    }
 }

View File

@@ -58,9 +58,13 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
     }
     let formatted: Vec<String> = notes.iter().map(format_note).collect();
-    let total: String = formatted.concat();
-    if total.len() <= max_bytes {
+    let total_len: usize = formatted.iter().map(|s| s.len()).sum();
+    if total_len <= max_bytes {
+        let mut total = String::with_capacity(total_len);
+        for s in &formatted {
+            total.push_str(s);
+        }
         return TruncationResult {
             content: total,
             is_truncated: false,
@@ -69,7 +73,7 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
     }
     if notes.len() == 1 {
-        let truncated = truncate_utf8(&total, max_bytes.saturating_sub(11));
+        let truncated = truncate_utf8(&formatted[0], max_bytes.saturating_sub(11));
         let content = format!("{}[truncated]", truncated);
         return TruncationResult {
             content,

View File

@@ -16,30 +16,25 @@ pub fn find_pending_documents(
     last_id: i64,
     model_name: &str,
 ) -> Result<Vec<PendingDocument>> {
+    // Optimized query: LEFT JOIN + NULL check replaces triple-EXISTS pattern.
+    // This allows SQLite to scan embedding_metadata once instead of three times.
+    // Semantically identical: returns documents needing (re-)embedding when:
+    //   - No embedding exists (em.document_id IS NULL)
+    //   - Content hash changed (em.document_hash != d.content_hash)
+    //   - Config mismatch (model/dims/chunk_max_bytes)
     let sql = r#"
         SELECT d.id, d.content_text, d.content_hash
         FROM documents d
+        LEFT JOIN embedding_metadata em
+            ON em.document_id = d.id AND em.chunk_index = 0
         WHERE d.id > ?1
         AND (
-            NOT EXISTS (
-                SELECT 1 FROM embedding_metadata em
-                WHERE em.document_id = d.id AND em.chunk_index = 0
-            )
-            OR EXISTS (
-                SELECT 1 FROM embedding_metadata em
-                WHERE em.document_id = d.id AND em.chunk_index = 0
-                AND em.document_hash != d.content_hash
-            )
-            OR EXISTS (
-                SELECT 1 FROM embedding_metadata em
-                WHERE em.document_id = d.id AND em.chunk_index = 0
-                AND (
-                    em.chunk_max_bytes IS NULL
-                    OR em.chunk_max_bytes != ?3
-                    OR em.model != ?4
-                    OR em.dims != ?5
-                )
-            )
+            em.document_id IS NULL
+            OR em.document_hash != d.content_hash
+            OR em.chunk_max_bytes IS NULL
+            OR em.chunk_max_bytes != ?3
+            OR em.model != ?4
+            OR em.dims != ?5
         )
         ORDER BY d.id
         LIMIT ?2
@@ -69,31 +64,19 @@ pub fn find_pending_documents(
 }
 pub fn count_pending_documents(conn: &Connection, model_name: &str) -> Result<i64> {
+    // Optimized query: LEFT JOIN + NULL check replaces triple-EXISTS pattern
     let count: i64 = conn.query_row(
         r#"
         SELECT COUNT(*)
         FROM documents d
-        WHERE (
-            NOT EXISTS (
-                SELECT 1 FROM embedding_metadata em
-                WHERE em.document_id = d.id AND em.chunk_index = 0
-            )
-            OR EXISTS (
-                SELECT 1 FROM embedding_metadata em
-                WHERE em.document_id = d.id AND em.chunk_index = 0
-                AND em.document_hash != d.content_hash
-            )
-            OR EXISTS (
-                SELECT 1 FROM embedding_metadata em
-                WHERE em.document_id = d.id AND em.chunk_index = 0
-                AND (
-                    em.chunk_max_bytes IS NULL
-                    OR em.chunk_max_bytes != ?1
-                    OR em.model != ?2
-                    OR em.dims != ?3
-                )
-            )
-        )
+        LEFT JOIN embedding_metadata em
+            ON em.document_id = d.id AND em.chunk_index = 0
+        WHERE em.document_id IS NULL
+            OR em.document_hash != d.content_hash
+            OR em.chunk_max_bytes IS NULL
+            OR em.chunk_max_bytes != ?1
+            OR em.model != ?2
+            OR em.dims != ?3
         "#,
         rusqlite::params![CHUNK_MAX_BYTES as i64, model_name, EXPECTED_DIMS as i64],
         |row| row.get(0),

View File

@@ -1,6 +1,7 @@
 use reqwest::Client;
 use serde::{Deserialize, Serialize};
 use std::time::Duration;
+use tracing::warn;
 use crate::core::error::{LoreError, Result};
@@ -53,7 +54,13 @@ impl OllamaClient {
         let client = Client::builder()
             .timeout(Duration::from_secs(config.timeout_secs))
             .build()
-            .expect("Failed to create HTTP client");
+            .unwrap_or_else(|e| {
+                warn!(
+                    error = %e,
+                    "Failed to build configured Ollama HTTP client; falling back to default client"
+                );
+                Client::new()
+            });
         Self { client, config }
     }

View File

@@ -103,7 +103,7 @@ async fn embed_page(
     total: usize,
     progress_callback: &Option<Box<dyn Fn(usize, usize)>>,
 ) -> Result<()> {
-    let mut all_chunks: Vec<ChunkWork> = Vec::new();
+    let mut all_chunks: Vec<ChunkWork> = Vec::with_capacity(pending.len() * 3);
     let mut page_normal_docs: usize = 0;
     for doc in pending {
@@ -159,7 +159,7 @@ async fn embed_page(
         page_normal_docs += 1;
     }
-    let mut cleared_docs: HashSet<i64> = HashSet::new();
+    let mut cleared_docs: HashSet<i64> = HashSet::with_capacity(pending.len());
     for batch in all_chunks.chunks(BATCH_SIZE) {
         let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect();

View File

@@ -8,7 +8,7 @@ use std::sync::Arc;
 use std::time::{Duration, Instant};
 use tokio::sync::Mutex;
 use tokio::time::sleep;
-use tracing::debug;
+use tracing::{debug, warn};
 use super::types::{
     GitLabDiscussion, GitLabIssue, GitLabIssueRef, GitLabLabelEvent, GitLabMergeRequest,
@@ -73,7 +73,13 @@ impl GitLabClient {
             .default_headers(headers)
             .timeout(Duration::from_secs(30))
             .build()
-            .expect("Failed to create HTTP client");
+            .unwrap_or_else(|e| {
+                warn!(
+                    error = %e,
+                    "Failed to build configured HTTP client; falling back to default client"
+                );
+                Client::new()
+            });
         Self {
             client,

View File

@@ -2,6 +2,7 @@ use clap::Parser;
 use console::style;
 use dialoguer::{Confirm, Input};
 use serde::Serialize;
+use strsim::jaro_winkler;
 use tracing_subscriber::Layer;
 use tracing_subscriber::layer::SubscriberExt;
 use tracing_subscriber::util::SubscriberInitExt;
@@ -10,13 +11,14 @@ use lore::Config;
 use lore::cli::commands::{
     IngestDisplay, InitInputs, InitOptions, InitResult, ListFilters, MrListFilters,
     SearchCliFilters, SyncOptions, open_issue_in_browser, open_mr_in_browser, print_count,
-    print_count_json, print_doctor_results, print_embed, print_embed_json, print_event_count,
-    print_event_count_json, print_generate_docs, print_generate_docs_json, print_ingest_summary,
-    print_ingest_summary_json, print_list_issues, print_list_issues_json, print_list_mrs,
-    print_list_mrs_json, print_search_results, print_search_results_json, print_show_issue,
-    print_show_issue_json, print_show_mr, print_show_mr_json, print_stats, print_stats_json,
-    print_sync, print_sync_json, print_sync_status, print_sync_status_json, run_auth_test,
-    run_count, run_count_events, run_doctor, run_embed, run_generate_docs, run_ingest, run_init,
+    print_count_json, print_doctor_results, print_dry_run_preview, print_dry_run_preview_json,
+    print_embed, print_embed_json, print_event_count, print_event_count_json, print_generate_docs,
+    print_generate_docs_json, print_ingest_summary, print_ingest_summary_json, print_list_issues,
+    print_list_issues_json, print_list_mrs, print_list_mrs_json, print_search_results,
+    print_search_results_json, print_show_issue, print_show_issue_json, print_show_mr,
+    print_show_mr_json, print_stats, print_stats_json, print_sync, print_sync_json,
+    print_sync_status, print_sync_status_json, run_auth_test, run_count, run_count_events,
+    run_doctor, run_embed, run_generate_docs, run_ingest, run_ingest_dry_run, run_init,
     run_list_issues, run_list_mrs, run_search, run_show_issue, run_show_mr, run_stats, run_sync,
     run_sync_status,
 };
@@ -40,7 +42,15 @@ async fn main() {
         libc::signal(libc::SIGPIPE, libc::SIG_DFL);
     }
-    let cli = Cli::parse();
+    // Phase 1: Early robot mode detection for structured clap errors
+    let robot_mode_early = Cli::detect_robot_mode_from_env();
+    let cli = match Cli::try_parse() {
+        Ok(cli) => cli,
+        Err(e) => {
+            handle_clap_error(e, robot_mode_early);
+        }
+    };
     let robot_mode = cli.is_robot_mode();
     let logging_config = lore::Config::load(cli.config.as_deref())
@@ -127,15 +137,29 @@ async fn main() {
     let quiet = cli.quiet;
     let result = match cli.command {
-        Commands::Issues(args) => handle_issues(cli.config.as_deref(), args, robot_mode),
-        Commands::Mrs(args) => handle_mrs(cli.config.as_deref(), args, robot_mode),
-        Commands::Search(args) => handle_search(cli.config.as_deref(), args, robot_mode).await,
-        Commands::Stats(args) => handle_stats(cli.config.as_deref(), args, robot_mode).await,
-        Commands::Embed(args) => handle_embed(cli.config.as_deref(), args, robot_mode).await,
-        Commands::Sync(args) => {
+        // Phase 2: Handle no-args case - in robot mode, output robot-docs; otherwise show help
+        None => {
+            if robot_mode {
+                handle_robot_docs(robot_mode)
+            } else {
+                use clap::CommandFactory;
+                let mut cmd = Cli::command();
+                cmd.print_help().ok();
+                println!();
+                Ok(())
+            }
+        }
+        Some(Commands::Issues(args)) => handle_issues(cli.config.as_deref(), args, robot_mode),
+        Some(Commands::Mrs(args)) => handle_mrs(cli.config.as_deref(), args, robot_mode),
+        Some(Commands::Search(args)) => {
+            handle_search(cli.config.as_deref(), args, robot_mode).await
+        }
+        Some(Commands::Stats(args)) => handle_stats(cli.config.as_deref(), args, robot_mode).await,
+        Some(Commands::Embed(args)) => handle_embed(cli.config.as_deref(), args, robot_mode).await,
+        Some(Commands::Sync(args)) => {
             handle_sync_cmd(cli.config.as_deref(), args, robot_mode, &metrics_layer).await
         }
-        Commands::Ingest(args) => {
+        Some(Commands::Ingest(args)) => {
             handle_ingest(
                 cli.config.as_deref(),
                 args,
@@ -145,19 +169,19 @@ async fn main() {
             )
             .await
         }
-        Commands::Count(args) => handle_count(cli.config.as_deref(), args, robot_mode).await,
-        Commands::Status => handle_sync_status_cmd(cli.config.as_deref(), robot_mode).await,
-        Commands::Auth => handle_auth_test(cli.config.as_deref(), robot_mode).await,
-        Commands::Doctor => handle_doctor(cli.config.as_deref(), robot_mode).await,
-        Commands::Version => handle_version(robot_mode),
-        Commands::Completions { shell } => handle_completions(&shell),
-        Commands::Init {
+        Some(Commands::Count(args)) => handle_count(cli.config.as_deref(), args, robot_mode).await,
+        Some(Commands::Status) => handle_sync_status_cmd(cli.config.as_deref(), robot_mode).await,
+        Some(Commands::Auth) => handle_auth_test(cli.config.as_deref(), robot_mode).await,
+        Some(Commands::Doctor) => handle_doctor(cli.config.as_deref(), robot_mode).await,
+        Some(Commands::Version) => handle_version(robot_mode),
+        Some(Commands::Completions { shell }) => handle_completions(&shell),
+        Some(Commands::Init {
             force,
             non_interactive,
             gitlab_url,
             token_env_var,
             projects,
-        } => {
+        }) => {
             handle_init(
                 cli.config.as_deref(),
                 force,
@@ -169,16 +193,16 @@ async fn main() {
             )
             .await
         }
-        Commands::GenerateDocs(args) => {
+        Some(Commands::GenerateDocs(args)) => {
             handle_generate_docs(cli.config.as_deref(), args, robot_mode).await
         }
-        Commands::Backup => handle_backup(robot_mode),
-        Commands::Reset { yes: _ } => handle_reset(robot_mode),
-        Commands::Migrate => handle_migrate(cli.config.as_deref(), robot_mode).await,
-        Commands::Health => handle_health(cli.config.as_deref(), robot_mode).await,
-        Commands::RobotDocs => handle_robot_docs(robot_mode),
-        Commands::List {
+        Some(Commands::Backup) => handle_backup(robot_mode),
+        Some(Commands::Reset { yes: _ }) => handle_reset(robot_mode),
+        Some(Commands::Migrate) => handle_migrate(cli.config.as_deref(), robot_mode).await,
+        Some(Commands::Health) => handle_health(cli.config.as_deref(), robot_mode).await,
+        Some(Commands::RobotDocs) => handle_robot_docs(robot_mode),
+        Some(Commands::List {
             entity,
             limit,
             project,
@@ -198,7 +222,7 @@ async fn main() {
reviewer,
target_branch,
source_branch,
-} => {
+}) => {
if !robot_mode {
eprintln!(
"{}",
@@ -231,11 +255,11 @@ async fn main() {
)
.await
}
-Commands::Show {
+Some(Commands::Show {
entity,
iid,
project,
-} => {
+}) => {
if !robot_mode {
eprintln!(
"{}",
@@ -255,7 +279,7 @@ async fn main() {
)
.await
}
-Commands::AuthTest => {
+Some(Commands::AuthTest) => {
if !robot_mode {
eprintln!(
"{}",
@@ -264,7 +288,7 @@ async fn main() {
}
handle_auth_test(cli.config.as_deref(), robot_mode).await
}
-Commands::SyncStatus => {
+Some(Commands::SyncStatus) => {
if !robot_mode {
eprintln!(
"{}",
@@ -338,11 +362,143 @@ fn handle_error(e: Box<dyn std::error::Error>, robot_mode: bool) -> ! {
std::process::exit(1);
}
/// Phase 1 & 4: Handle clap parsing errors with structured JSON output in robot mode.
/// Also includes fuzzy command matching to suggest similar commands.
fn handle_clap_error(e: clap::Error, robot_mode: bool) -> ! {
use clap::error::ErrorKind;
// Always let clap handle --help and --version normally (print and exit 0).
// These are intentional user actions, not errors, even when stdout is redirected.
if matches!(e.kind(), ErrorKind::DisplayHelp | ErrorKind::DisplayVersion) {
e.exit()
}
if robot_mode {
let error_code = map_clap_error_kind(e.kind());
let message = e
.to_string()
.lines()
.next()
.unwrap_or("Parse error")
.to_string();
// Phase 4: Try to suggest similar command for unknown commands
let suggestion = if e.kind() == ErrorKind::InvalidSubcommand {
if let Some(invalid_cmd) = extract_invalid_subcommand(&e) {
suggest_similar_command(&invalid_cmd)
} else {
"Run 'lore robot-docs' for valid commands".to_string()
}
} else {
"Run 'lore robot-docs' for valid commands".to_string()
};
let output = RobotErrorWithSuggestion {
error: RobotErrorSuggestionData {
code: error_code.to_string(),
message,
suggestion,
},
};
eprintln!(
"{}",
serde_json::to_string(&output).unwrap_or_else(|_| {
r#"{"error":{"code":"PARSE_ERROR","message":"Parse error"}}"#.to_string()
})
);
std::process::exit(2);
} else {
e.exit()
}
}
/// Map clap ErrorKind to semantic error codes
fn map_clap_error_kind(kind: clap::error::ErrorKind) -> &'static str {
use clap::error::ErrorKind;
match kind {
ErrorKind::InvalidSubcommand => "UNKNOWN_COMMAND",
ErrorKind::UnknownArgument => "UNKNOWN_FLAG",
ErrorKind::MissingRequiredArgument => "MISSING_REQUIRED",
ErrorKind::InvalidValue => "INVALID_VALUE",
ErrorKind::ValueValidation => "INVALID_VALUE",
ErrorKind::TooManyValues => "TOO_MANY_VALUES",
ErrorKind::TooFewValues => "TOO_FEW_VALUES",
ErrorKind::ArgumentConflict => "ARGUMENT_CONFLICT",
ErrorKind::MissingSubcommand => "MISSING_COMMAND",
ErrorKind::DisplayHelp | ErrorKind::DisplayVersion => "HELP_REQUESTED",
_ => "PARSE_ERROR",
}
}
/// Extract the invalid subcommand from a clap error (Phase 4)
fn extract_invalid_subcommand(e: &clap::Error) -> Option<String> {
// Parse the error message to find the invalid subcommand
// Format is typically: "error: unrecognized subcommand 'foo'"
let msg = e.to_string();
if let Some(start) = msg.find('\'')
&& let Some(end) = msg[start + 1..].find('\'')
{
return Some(msg[start + 1..start + 1 + end].to_string());
}
None
}
/// Phase 4: Suggest similar command using fuzzy matching
fn suggest_similar_command(invalid: &str) -> String {
const VALID_COMMANDS: &[&str] = &[
"issues",
"mrs",
"search",
"sync",
"ingest",
"count",
"status",
"auth",
"doctor",
"version",
"init",
"stats",
"generate-docs",
"embed",
"migrate",
"health",
"robot-docs",
"completions",
];
let invalid_lower = invalid.to_lowercase();
// Find the best match using Jaro-Winkler similarity
let best_match = VALID_COMMANDS
.iter()
.map(|cmd| (*cmd, jaro_winkler(&invalid_lower, cmd)))
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
if let Some((cmd, score)) = best_match
&& score > 0.7
{
return format!(
"Did you mean 'lore {}'? Run 'lore robot-docs' for all commands",
cmd
);
}
"Run 'lore robot-docs' for valid commands".to_string()
}
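The suggestion flow above depends on `jaro_winkler` from an external crate. As a dependency-free sketch of the same pick-the-nearest-command idea, plain Levenshtein distance stands in below; `suggest` and its distance cutoff are illustrative, not the codebase's actual scoring.

```rust
// Dependency-free sketch of suggest_similar_command. The real code scores
// candidates with Jaro-Winkler (threshold 0.7); here Levenshtein edit
// distance with a small cutoff stands in to keep the example self-contained.
fn levenshtein(a: &str, b: &str) -> usize {
    let (a, b): (Vec<char>, Vec<char>) = (a.chars().collect(), b.chars().collect());
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            // substitution, deletion, insertion
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

fn suggest<'a>(invalid: &str, commands: &[&'a str]) -> Option<&'a str> {
    commands
        .iter()
        .map(|c| (*c, levenshtein(&invalid.to_lowercase(), c)))
        .min_by_key(|&(_, d)| d)
        .filter(|&(_, d)| d <= 2) // only suggest near misses
        .map(|(c, _)| c)
}

fn main() {
    let cmds = ["issues", "mrs", "search", "sync", "status"];
    assert_eq!(suggest("serch", &cmds), Some("search"));
    assert_eq!(suggest("zzzzzz", &cmds), None);
}
```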
fn handle_issues(
config_override: Option<&str>,
args: IssuesArgs,
robot_mode: bool,
) -> Result<(), Box<dyn std::error::Error>> {
// Warn about unimplemented --fields
if args.fields.is_some() && !robot_mode {
eprintln!(
"{}",
style("warning: --fields is not yet implemented, showing all fields").yellow()
);
}
let config = Config::load(config_override)?;
let asc = args.asc && !args.no_asc;
let has_due = args.has_due && !args.no_has_due;
@@ -391,6 +547,14 @@ fn handle_mrs(
args: MrsArgs,
robot_mode: bool,
) -> Result<(), Box<dyn std::error::Error>> {
// Warn about unimplemented --fields
if args.fields.is_some() && !robot_mode {
eprintln!(
"{}",
style("warning: --fields is not yet implemented, showing all fields").yellow()
);
}
let config = Config::load(config_override)?;
let asc = args.asc && !args.no_asc;
let open = args.open && !args.no_open;
@@ -442,16 +606,47 @@ async fn handle_ingest(
quiet: bool,
metrics: &MetricsLayer,
) -> Result<(), Box<dyn std::error::Error>> {
let dry_run = args.dry_run && !args.no_dry_run;
let config = Config::load(config_override)?;
let force = args.force && !args.no_force;
let full = args.full && !args.no_full;
// Handle dry run mode - show preview without making any changes
if dry_run {
match args.entity.as_deref() {
Some(resource_type) => {
let preview =
run_ingest_dry_run(&config, resource_type, args.project.as_deref(), full)?;
if robot_mode {
print_dry_run_preview_json(&preview);
} else {
print_dry_run_preview(&preview);
}
}
None => {
let issues_preview =
run_ingest_dry_run(&config, "issues", args.project.as_deref(), full)?;
let mrs_preview =
run_ingest_dry_run(&config, "mrs", args.project.as_deref(), full)?;
if robot_mode {
print_combined_dry_run_json(&issues_preview, &mrs_preview);
} else {
print_dry_run_preview(&issues_preview);
println!();
print_dry_run_preview(&mrs_preview);
}
}
}
return Ok(());
}
let display = if robot_mode || quiet {
IngestDisplay::silent()
} else {
IngestDisplay::interactive()
};
-let force = args.force && !args.no_force;
-let full = args.full && !args.no_full;
let entity_label = args.entity.as_deref().unwrap_or("all");
let command = format!("ingest:{entity_label}");
let db_path = get_db_path(config.storage.db_path.as_deref());
@@ -469,6 +664,7 @@ async fn handle_ingest(
args.project.as_deref(),
force,
full,
false,
display,
None,
)
@@ -495,6 +691,7 @@ async fn handle_ingest(
args.project.as_deref(),
force,
full,
false,
display,
None,
)
@@ -506,6 +703,7 @@ async fn handle_ingest(
args.project.as_deref(),
force,
full,
false,
display,
None,
)
@@ -592,6 +790,35 @@ fn print_combined_ingest_json(
println!("{}", serde_json::to_string(&output).unwrap());
}
#[derive(Serialize)]
struct CombinedDryRunOutput {
ok: bool,
dry_run: bool,
data: CombinedDryRunData,
}
#[derive(Serialize)]
struct CombinedDryRunData {
issues: lore::cli::commands::DryRunPreview,
merge_requests: lore::cli::commands::DryRunPreview,
}
fn print_combined_dry_run_json(
issues: &lore::cli::commands::DryRunPreview,
mrs: &lore::cli::commands::DryRunPreview,
) {
let output = CombinedDryRunOutput {
ok: true,
dry_run: true,
data: CombinedDryRunData {
issues: issues.clone(),
merge_requests: mrs.clone(),
},
};
println!("{}", serde_json::to_string(&output).unwrap());
}
async fn handle_count(
config_override: Option<&str>,
args: CountArgs,
@@ -921,6 +1148,18 @@ async fn handle_auth_test(
}
}
#[derive(Serialize)]
struct DoctorOutput {
ok: bool,
data: DoctorData,
}
#[derive(Serialize)]
struct DoctorData {
success: bool,
checks: lore::cli::commands::DoctorChecks,
}
async fn handle_doctor(
config_override: Option<&str>,
robot_mode: bool,
@@ -928,7 +1167,14 @@ async fn handle_doctor(
let result = run_doctor(config_override).await;
if robot_mode {
-println!("{}", serde_json::to_string_pretty(&result)?);
+let output = DoctorOutput {
ok: true,
data: DoctorData {
success: result.success,
checks: result.checks,
},
};
println!("{}", serde_json::to_string(&output)?);
} else {
print_doctor_results(&result);
}
@@ -1133,9 +1379,10 @@ async fn handle_stats(
args: StatsArgs,
robot_mode: bool,
) -> Result<(), Box<dyn std::error::Error>> {
let dry_run = args.dry_run && !args.no_dry_run;
let config = Config::load(config_override)?;
let check = (args.check && !args.no_check) || args.repair;
-let result = run_stats(&config, check, args.repair)?;
+let result = run_stats(&config, check, args.repair, dry_run)?;
if robot_mode {
print_stats_json(&result);
} else {
@@ -1219,6 +1466,8 @@ async fn handle_sync_cmd(
robot_mode: bool,
metrics: &MetricsLayer,
) -> Result<(), Box<dyn std::error::Error>> {
let dry_run = args.dry_run && !args.no_dry_run;
let mut config = Config::load(config_override)?;
if args.no_events {
config.sync.fetch_resource_events = false;
@@ -1230,8 +1479,15 @@ async fn handle_sync_cmd(
no_docs: args.no_docs,
no_events: args.no_events,
robot_mode,
dry_run,
};
// For dry_run, skip recording and just show the preview
if dry_run {
run_sync(&config, options, None).await?;
return Ok(());
}
let db_path = get_db_path(config.storage.db_path.as_deref());
let recorder_conn = create_connection(&db_path)?;
let run_id = uuid::Uuid::new_v4().simple().to_string();
@@ -1371,7 +1627,11 @@ struct RobotDocsData {
description: String,
activation: RobotDocsActivation,
commands: serde_json::Value,
/// Deprecated command aliases (old -> new)
aliases: serde_json::Value,
exit_codes: serde_json::Value,
/// Error codes emitted by clap parse failures
clap_error_codes: serde_json::Value,
error_format: String,
workflows: serde_json::Value,
}
@@ -1410,37 +1670,37 @@ fn handle_robot_docs(robot_mode: bool) -> Result<(), Box<dyn std::error::Error>>
},
"ingest": {
"description": "Sync data from GitLab",
-"flags": ["--project <path>", "--force", "--full", "<entity: issues|mrs>"],
+"flags": ["--project <path>", "--force", "--no-force", "--full", "--no-full", "--dry-run", "--no-dry-run", "<entity: issues|mrs>"],
"example": "lore --robot ingest issues --project group/repo"
},
"sync": {
"description": "Full sync pipeline: ingest -> generate-docs -> embed",
-"flags": ["--full", "--force", "--no-embed", "--no-docs"],
+"flags": ["--full", "--no-full", "--force", "--no-force", "--no-embed", "--no-docs", "--no-events", "--dry-run", "--no-dry-run"],
"example": "lore --robot sync"
},
"issues": {
"description": "List or show issues",
-"flags": ["<IID>", "--limit", "--state", "--project", "--author", "--assignee", "--label", "--milestone", "--since", "--due-before", "--has-due", "--sort", "--asc"],
+"flags": ["<IID>", "-n/--limit", "--fields <list>", "-s/--state", "-p/--project", "-a/--author", "-A/--assignee", "-l/--label", "-m/--milestone", "--since", "--due-before", "--has-due", "--no-has-due", "--sort", "--asc", "--no-asc", "-o/--open", "--no-open"],
"example": "lore --robot issues --state opened --limit 10"
},
"mrs": {
"description": "List or show merge requests",
-"flags": ["<IID>", "--limit", "--state", "--project", "--author", "--assignee", "--reviewer", "--label", "--since", "--draft", "--no-draft", "--target", "--source", "--sort", "--asc"],
+"flags": ["<IID>", "-n/--limit", "--fields <list>", "-s/--state", "-p/--project", "-a/--author", "-A/--assignee", "-r/--reviewer", "-l/--label", "--since", "-d/--draft", "-D/--no-draft", "--target", "--source", "--sort", "--asc", "--no-asc", "-o/--open", "--no-open"],
"example": "lore --robot mrs --state opened"
},
"search": {
"description": "Search indexed documents (lexical, hybrid, semantic)",
-"flags": ["<QUERY>", "--mode", "--type", "--author", "--project", "--label", "--path", "--after", "--updated-after", "--limit", "--explain", "--fts-mode"],
+"flags": ["<QUERY>", "--mode", "--type", "--author", "-p/--project", "--label", "--path", "--after", "--updated-after", "-n/--limit", "--explain", "--no-explain", "--fts-mode"],
"example": "lore --robot search 'authentication bug' --mode hybrid --limit 10"
},
"count": {
"description": "Count entities in local database",
-"flags": ["<entity: issues|mrs|discussions|notes>", "--for <issue|mr>"],
+"flags": ["<entity: issues|mrs|discussions|notes|events>", "-f/--for <issue|mr>"],
"example": "lore --robot count issues"
},
"stats": {
"description": "Show document and index statistics",
-"flags": ["--check", "--repair"],
+"flags": ["--check", "--no-check", "--repair", "--dry-run", "--no-dry-run"],
"example": "lore --robot stats"
},
"status": {
@@ -1450,12 +1710,12 @@ fn handle_robot_docs(robot_mode: bool) -> Result<(), Box<dyn std::error::Error>>
},
"generate-docs": {
"description": "Generate searchable documents from ingested data",
-"flags": ["--full", "--project <path>"],
+"flags": ["--full", "-p/--project <path>"],
"example": "lore --robot generate-docs --full"
},
"embed": {
"description": "Generate vector embeddings for documents via Ollama",
-"flags": ["--full", "--retry-failed"],
+"flags": ["--full", "--no-full", "--retry-failed", "--no-retry-failed"],
"example": "lore --robot embed"
},
"migrate": {
@@ -1468,6 +1728,11 @@ fn handle_robot_docs(robot_mode: bool) -> Result<(), Box<dyn std::error::Error>>
"flags": [],
"example": "lore --robot version"
},
"completions": {
"description": "Generate shell completions",
"flags": ["<shell: bash|zsh|fish|powershell>"],
"example": "lore completions bash > ~/.local/share/bash-completion/completions/lore"
},
"robot-docs": {
"description": "This command (agent self-discovery manifest)",
"flags": [],
@@ -1515,6 +1780,30 @@ fn handle_robot_docs(robot_mode: bool) -> Result<(), Box<dyn std::error::Error>>
]
});
// Phase 3: Deprecated command aliases
let aliases = serde_json::json!({
"list issues": "issues",
"list mrs": "mrs",
"show issue <IID>": "issues <IID>",
"show mr <IID>": "mrs <IID>",
"auth-test": "auth",
"sync-status": "status"
});
// Phase 3: Clap error codes (emitted by handle_clap_error)
let clap_error_codes = serde_json::json!({
"UNKNOWN_COMMAND": "Unrecognized subcommand (includes fuzzy suggestion)",
"UNKNOWN_FLAG": "Unrecognized command-line flag",
"MISSING_REQUIRED": "Required argument not provided",
"INVALID_VALUE": "Invalid value for argument",
"TOO_MANY_VALUES": "Too many values provided",
"TOO_FEW_VALUES": "Too few values provided",
"ARGUMENT_CONFLICT": "Conflicting arguments",
"MISSING_COMMAND": "No subcommand provided (in non-robot mode, shows help)",
"HELP_REQUESTED": "Help or version flag used",
"PARSE_ERROR": "General parse error"
});
let output = RobotDocsOutput {
ok: true,
data: RobotDocsData {
@@ -1527,7 +1816,9 @@ fn handle_robot_docs(robot_mode: bool) -> Result<(), Box<dyn std::error::Error>>
auto: "Non-TTY stdout".to_string(),
},
commands,
aliases,
exit_codes,
clap_error_codes,
error_format: "stderr JSON: {\"error\":{\"code\":\"...\",\"message\":\"...\",\"suggestion\":\"...\"}}".to_string(),
workflows,
},
@@ -1639,14 +1930,14 @@ async fn handle_show_compat(
entity: &str,
iid: i64,
project_filter: Option<&str>,
-json: bool,
+robot_mode: bool,
) -> Result<(), Box<dyn std::error::Error>> {
let config = Config::load(config_override)?;
match entity {
"issue" => {
let result = run_show_issue(&config, iid, project_filter)?;
-if json {
+if robot_mode {
print_show_issue_json(&result);
} else {
print_show_issue(&result);
@@ -1655,7 +1946,7 @@ async fn handle_show_compat(
}
"mr" => {
let result = run_show_mr(&config, iid, project_filter)?;
-if json {
+if robot_mode {
print_show_mr_json(&result);
} else {
print_show_mr(&result);

View File

@@ -97,13 +97,19 @@ pub fn apply_filters(
param_idx += 1;
}
-for label in &filters.labels {
+if !filters.labels.is_empty() {
+let placeholders: Vec<String> = (0..filters.labels.len())
+.map(|i| format!("?{}", param_idx + i))
+.collect();
sql.push_str(&format!(
-" AND EXISTS (SELECT 1 FROM document_labels dl WHERE dl.document_id = d.id AND dl.label_name = ?{})",
-param_idx
+" AND EXISTS (SELECT 1 FROM document_labels dl WHERE dl.document_id = d.id AND dl.label_name IN ({}) GROUP BY dl.document_id HAVING COUNT(DISTINCT dl.label_name) = {})",
+placeholders.join(","),
+filters.labels.len()
));
-params.push(Box::new(label.clone()));
-param_idx += 1;
+for label in &filters.labels {
+params.push(Box::new(label.clone()));
+param_idx += 1;
+}
}
if let Some(ref path_filter) = filters.path {
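The batched predicate in the hunk above can be exercised on its own. `label_predicate` below is an illustrative helper, not a function in this codebase; it shows how sequential `?N` placeholders are built, and how `HAVING COUNT(DISTINCT ...)` preserves the AND semantics of the old per-label `EXISTS` loop (a document must carry every requested label).

```rust
// Illustrative helper (not in the codebase): build the SQL fragment the
// hunk above pushes, given the next free parameter index and label count.
fn label_predicate(param_idx: usize, n_labels: usize) -> String {
    let placeholders: Vec<String> = (0..n_labels)
        .map(|i| format!("?{}", param_idx + i))
        .collect();
    format!(
        " AND EXISTS (SELECT 1 FROM document_labels dl WHERE dl.document_id = d.id \
         AND dl.label_name IN ({}) GROUP BY dl.document_id \
         HAVING COUNT(DISTINCT dl.label_name) = {})",
        placeholders.join(","),
        n_labels
    )
}

fn main() {
    let sql = label_predicate(3, 2);
    // Two labels starting at parameter 3 bind as ?3 and ?4.
    assert!(sql.contains("IN (?3,?4)"));
    // Requiring a distinct-count equal to the label count enforces ALL labels.
    assert!(sql.contains("COUNT(DISTINCT dl.label_name) = 2"));
}
```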

View File

@@ -23,22 +23,25 @@ pub fn to_fts_query(raw: &str, mode: FtsQueryMode) -> String {
return String::new();
}
-let tokens: Vec<String> = trimmed
-.split_whitespace()
-.map(|token| {
-if let Some(stem) = token.strip_suffix('*')
-&& !stem.is_empty()
-&& stem.chars().all(|c| c.is_alphanumeric() || c == '_')
-{
-let escaped = stem.replace('"', "\"\"");
-return format!("\"{}\"*", escaped);
-}
-let escaped = token.replace('"', "\"\"");
-format!("\"{}\"", escaped)
-})
-.collect();
-tokens.join(" ")
+let mut result = String::with_capacity(trimmed.len() + 20);
+for (i, token) in trimmed.split_whitespace().enumerate() {
+if i > 0 {
+result.push(' ');
+}
+if let Some(stem) = token.strip_suffix('*')
+&& !stem.is_empty()
+&& stem.chars().all(|c| c.is_alphanumeric() || c == '_')
+{
+result.push('"');
+result.push_str(&stem.replace('"', "\"\""));
+result.push_str("\"*");
+} else {
+result.push('"');
+result.push_str(&token.replace('"', "\"\""));
+result.push('"');
+}
+}
+result
}
}
}
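The rewritten single-buffer builder can be checked in isolation. The sketch below mirrors its logic but swaps the let-chain for `Option::filter` so it compiles on older stable toolchains; the `FtsQueryMode` parameter from the real signature is omitted for brevity.

```rust
// Minimal sketch of the single-buffer FTS5 query builder above.
// The let-chain is replaced with strip_suffix + Option::filter, and the
// mode parameter from the real to_fts_query is left out.
fn to_fts_query(raw: &str) -> String {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return String::new();
    }
    let mut result = String::with_capacity(trimmed.len() + 20);
    for (i, token) in trimmed.split_whitespace().enumerate() {
        if i > 0 {
            result.push(' ');
        }
        // A trailing `*` on a plain word becomes a quoted prefix query.
        let prefix = token.strip_suffix('*').filter(|stem| {
            !stem.is_empty() && stem.chars().all(|c| c.is_alphanumeric() || c == '_')
        });
        if let Some(stem) = prefix {
            result.push('"');
            result.push_str(&stem.replace('"', "\"\""));
            result.push_str("\"*");
        } else {
            result.push('"');
            result.push_str(&token.replace('"', "\"\""));
            result.push('"');
        }
    }
    result
}

fn main() {
    // "auth*" keeps prefix matching; plain tokens are phrase-quoted.
    assert_eq!(to_fts_query("auth* bug"), "\"auth\"* \"bug\"");
    // Embedded quotes are doubled per FTS5 escaping rules.
    assert_eq!(to_fts_query("say\"hi"), "\"say\"\"hi\"");
    assert_eq!(to_fts_query("   "), "");
}
```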