Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)
Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs
Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.
Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stages 3 (generate-docs) and 4 (embed) reported progress by appending
"(N/M)" text to the stage spinner message, while stages 1-2 (ingest)
used dedicated indicatif progress bars with animated [====> ] rendering
registered with the global MultiProgress. This visual inconsistency
was introduced when progress callbacks were wired through in 266ed78.
Replace the spinner.set_message() callbacks with proper ProgressBar
instances that match the ingest stage pattern:
- Create a bar-style ProgressBar registered via multi().add()
- Use the same template/progress_chars as the ingest discussion bars
- Lazy-init the tick via AtomicBool to avoid showing the bar before
the first callback fires (matching how ingest enables ticks only
at DiscussionSyncStarted)
- Update set_length on every callback for the docs stage, since the
regenerator's estimated_total can grow if new dirty items are
queued during processing (using .max() internally)
- Clean up both the sub-bar and stage spinner on completion/error
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three defensive improvements from peer code review:
Replace unreachable!() in GitLab client retry loops:
Both request() and request_with_headers() had unreachable!() after
their for loops. While the logic was sound (the final iteration always
reaches the return/break), any refactor to the loop condition would
turn this into a runtime panic. Restructured both to store
last_response with explicit break, making the control flow
self-documenting and the .expect() message useful if ever violated.
Doctor model name comparison asymmetry:
Ollama model names were stripped of their tag (:latest, :v1.5) for
comparison, but the configured model name was compared as-is. A config
value like "nomic-embed-text:v1.5" would never match. Now strips the
tag from both sides before comparing.
Regenerator savepoint cleanup and progress accuracy:
- upsert_document's error path did ROLLBACK TO but never RELEASE,
leaving a dangling savepoint that could nest on the next call. Added
RELEASE after rollback so the connection is clean.
- estimated_total for progress reporting was computed once at start but
the dirty queue can grow during processing. Now recounts each loop
iteration with max() so the progress fraction never goes backwards.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The sync command's stage spinners now show real-time aggregate progress
for each pipeline phase instead of static "syncing..." messages.
- Add `progress_callback` parameter to `run_embed` and
`run_generate_docs` so callers can receive `(processed, total)` updates
- Add `stage_bar` parameter to `run_ingest` for aggregate progress
across concurrently-ingested projects using shared AtomicUsize counters
- Update `stage_spinner` to use `{prefix}` for the `[N/M]` label,
allowing `{msg}` to be updated independently with progress details
- Thread `ProgressBar` clones into each concurrent project task so
per-entity progress (fetch, discussions, events) is reflected on the
aggregate spinner
- Pass `None` for progress callbacks at standalone CLI entry points
(handle_ingest, handle_generate_docs, handle_embed) to preserve
existing behavior when commands are run outside of sync
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add end-to-end observability to the sync and ingest pipelines:
Sync command:
- Generate UUID-based run_id for each sync invocation, propagated through
all child spans for log correlation across stages
- Accept MetricsLayer reference to extract hierarchical StageTiming data
after pipeline completion for robot-mode performance output
- Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle)
- Wrap entire sync execution in a root tracing span with run_id field
Ingest command:
- Wrap run_ingest in an instrumented root span with run_id and resource_type
- Add project path prefix to discussion progress bars for multi-project clarity
- Reset resource_events_synced_for_updated_at on --full re-sync
Sync status:
- Expand from single last_run to configurable recent runs list (default 10)
- Parse and expose StageTiming metrics from stored metrics_json
- Add run_id, total_items_processed, total_errors to SyncRunInfo
- Add mr_count to DataSummary for complete entity coverage
Orchestrator:
- Add #[instrument] with structured fields to issue and MR ingestion functions
- Record items_processed, items_skipped, errors on span close for MetricsLayer
- Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete)
- Pass project_id through to drain_resource_events for scoped job claiming
Document regenerator and embedding pipeline:
- Add #[instrument] spans with items_processed, items_skipped, errors fields
- Record final counts on span close for metrics extraction
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Overhaul the CLI logging infrastructure for production observability:
CLI flags:
- Add -v/-vv/-vvv (--verbose) for progressive stderr verbosity control:
0=INFO, 1=DEBUG app, 2=DEBUG all, 3+=TRACE
- Add --log-format text|json for structured stderr output in automation
- Existing -q/--quiet overrides verbosity for silent operation
Subscriber architecture (main.rs):
- Replace single-layer subscriber with triple-layer setup:
1. stderr layer: human-readable or JSON, filtered by -v flags
2. file layer: always-on JSON to daily-rotated logs (lore.YYYY-MM-DD.log)
3. MetricsLayer: captures span timing for robot-mode performance payloads
- Parse CLI before subscriber init so verbosity is known at setup time
- Load LoggingConfig early (with graceful fallback for pre-init commands)
- Clean up old log files before subscriber init to avoid holding deleted handles
- Hold WorkerGuard at function scope to ensure flush on exit
Doctor command:
- Add logging health check: validates log directory exists, reports file
count and total size, warns on missing or inaccessible log directory
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The sync pipeline was bottlenecked at 10 req/s (hardcoded) with
sequential project processing and no retry on rate limiting. These
changes target 3-5x throughput improvement.
Rate limit configuration:
- Add requestsPerSecond to SyncConfig (default 30.0, was hardcoded 10)
- Pass configured rate through to GitLabClient::new from ingest
- Floor rate at 0.1 rps in RateLimiter::new to prevent panic on
Duration::from_secs_f64(1.0 / 0.0) — now reachable via user config
429 auto-retry:
- Both request() and request_with_headers() retry up to 3 times on
HTTP 429, respecting the retry-after header (default 60s)
- Extract parse_retry_after helper, reused by handle_response fallback
- After exhausting retries, the 429 error propagates as before
- Improved JSON decode errors now include a response body preview
Concurrent project ingestion:
- Derive Clone on GitLabClient (cheap: shares Arc<Mutex<RateLimiter>>
and reqwest::Client which is already Arc-backed)
- Restructure project loop to use futures::stream::buffer_unordered
with primary_concurrency (default 4) as the parallelism bound
- Each project gets its own SQLite connection (WAL mode + busy_timeout
handles concurrent writes)
- Add show_spinner field to IngestDisplay to separate the per-project
spinner from the sync-level stage spinner
- Error aggregation defers failures: all successful projects get their
summaries printed and results counted before returning the first error
- Bump dependentConcurrency default from 2 to 8 for discussion prefetch
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
tracing-indicatif pulled in vt100, arrayvec, and its own indicatif
integration layer. Replace it with a minimal SuspendingWriter that
coordinates tracing output with progress bars via a global LazyLock
MultiProgress.
- Add src/cli/progress.rs: shared MultiProgress singleton via LazyLock
and a SuspendingWriter that suspends bars before writing log lines,
preventing interleaving/flicker
- Wire all progress bar creation through multi().add() in sync and
ingest commands
- Replace IndicatifLayer in main.rs with SuspendingWriter for
tracing-subscriber's fmt layer
- Remove tracing-indicatif from Cargo.toml (drops vt100 and arrayvec
transitive deps)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three bugs fixed:
1. Early return in orchestrator when no discussions needed sync also
skipped resource event enqueue+drain. On incremental syncs (the most
common case), resource events were never fetched. Restructured to use
if/else instead of early return so Step 4 always executes.
2. Ingest command JSON and human-readable output silently dropped
resource_events_fetched/failed counts. Added to IngestJsonData and
print_ingest_summary.
3. Progress bar reuse after finish_and_clear caused indicatif to silently
ignore subsequent set_position/set_length calls. Added reset() call
before reconfiguring the bar for resource events.
Also removed stale comment referencing "unsafe" that didn't reflect
the actual unchecked_transaction approach.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrate resource event fetching as Step 4 of both issue and MR
ingestion, gated behind the fetch_resource_events config flag.
Orchestrator changes:
- Add ProgressEvent variants: ResourceEventsFetchStarted,
ResourceEventFetched, ResourceEventsFetchComplete
- Add resource_events_fetched/failed fields to IngestProjectResult
and IngestMrProjectResult
- New enqueue_resource_events_for_entity_type() queries all
issues/MRs for a project and enqueues resource_events jobs via
the dependent queue (INSERT OR IGNORE for idempotency)
- New drain_resource_events() claims jobs in batches, fetches
state/label/milestone events from GitLab API, stores them
atomically via unchecked_transaction, and handles failures
with exponential backoff via fail_job()
- Max-iterations guard prevents infinite retry loops within a
single drain run
- New store_resource_events() + per-type _tx helpers write events
using prepared statements inside a single transaction
- DrainResult struct tracks fetched/failed counts
CLI ingest changes:
- IngestResult gains resource_events_fetched/failed fields
- Progress bar repurposed for resource event fetch phase
(reuses discussion bar with updated template)
- Accumulates event counts from both issue and MR ingestion
CLI sync changes:
- SyncResult gains resource_events_fetched/failed fields
- Accumulates counts from both ingest stages
- print_sync() conditionally displays event counts
- Structured logging includes event counts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Automated formatting and lint corrections from parallel agent work:
- cargo fmt: import reordering (alphabetical), line wrapping to respect
max width, trailing comma normalization, destructuring alignment,
function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
i64::from() instead of `as i64` casts, .clamp() instead of
.max().min() chains, let-chain refactors (if-let with &&),
#[allow(clippy::too_many_arguments)] and
#[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace
No behavioral changes. All existing tests pass unmodified.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds two new categories of integrity checks to 'lore stats --check':
Event FK integrity (3 queries):
- Detects orphaned resource_state_events where issue_id or
merge_request_id points to a non-existent parent entity
- Same check for resource_label_events and resource_milestone_events
- Under normal CASCADE operation these should always be zero; non-zero
indicates manual DB edits, bugs, or partial migration state
Queue health diagnostics:
- pending_dependent_fetches counts: pending, failed, and stuck (locked)
- queue_stuck_locks: Jobs with locked_at set (potential worker crashes)
- queue_max_attempts: Highest retry count across all jobs (signals
permanently failing jobs when > 3)
New IntegrityResult fields: orphan_state_events, orphan_label_events,
orphan_milestone_events, queue_stuck_locks, queue_max_attempts.
New QueueStats fields: pending_dependent_fetches,
pending_dependent_fetches_failed, pending_dependent_fetches_stuck.
Human output shows colored PASS/WARN/FAIL indicators:
- Red "!" for orphaned events (integrity failure)
- Yellow "!" for stuck locks and high retry counts (warnings)
- Dependent fetch queue line only shown when non-zero
All new queries are guarded by table_exists() checks for graceful
degradation on databases without migration 011 applied.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends the count command to support "events" as an entity type,
displaying resource event counts broken down by event type (state,
label, milestone) and entity type (issue, merge request).
New functions in count.rs:
- run_count_events: Creates DB connection and delegates to
events_db::count_events for the actual queries
- print_event_count: Human-readable table with aligned columns
showing per-type breakdowns and row/column totals
- print_event_count_json: Structured JSON matching the robot mode
contract with ok/data envelope and per-type issue/mr/total counts
JSON output structure:
{"ok":true,"data":{"state_events":{"issue":N,"merge_request":N,
"total":N},"label_events":{...},"milestone_events":{...},"total":N}}
Updated exports in commands/mod.rs to expose the three new public
functions (run_count_events, print_event_count, print_event_count_json).
The "events" branch in handle_count (main.rs, committed earlier)
routes to these functions before the existing entity type dispatcher.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a new boolean field to SyncConfig that controls whether resource
event fetching is performed during sync:
- SyncConfig.fetch_resource_events: defaults to true via serde
default_true helper, serialized as "fetchResourceEvents" in JSON
- SyncArgs.no_events: --no-events CLI flag that overrides the config
value to false when present
- SyncOptions.no_events: propagates the flag through the sync pipeline
- handle_sync_cmd: mutates loaded config when --no-events is set,
ensuring the flag takes effect regardless of config file contents
This follows the existing pattern established by --no-embed and
--no-docs flags, where CLI flags override config file defaults.
The config is loaded as mutable specifically to support this override.
Also adds "events" to the count command's entity type value_parser,
enabling `lore count events` (implementation in a separate commit).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --full / --no-full flag pair to EmbedArgs with overrides_with
semantics matching the existing flag pattern. When active, atomically
DELETEs all embedding_metadata and embeddings before re-embedding.
- Thread the full flag through run_embed -> run_sync so that
'lore sync --full' triggers a complete re-embed alongside the full
re-ingest it already performed.
- Add indicatif spinners to sync stages with dynamic stage numbering
that adjusts when --no-docs or --no-embed skip stages. Spinners are
hidden in robot mode.
- Update robot-docs manifest to advertise the new --full flag on the
embed command.
- Replace hardcoded schema version 9 in health check with the
LATEST_SCHEMA_VERSION constant from db.rs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ingest:
- Introduce IngestDisplay struct with show_progress/show_text booleans
to decouple progress bars from text output. Replaces the robot_mode
bool parameter with explicit display control, enabling sync to show
progress without duplicating summary text (progress_only mode).
- Use resolve_project() for --project filtering instead of LIKE queries,
providing proper error messages for ambiguous or missing projects.
List:
- Add colored_cell() helper that checks console::colors_enabled() before
applying comfy-table foreground colors, bridging the gap between the
console and comfy-table crates for --color flag support.
- Use resolve_project() for project filtering (exact ID match).
- Improve since filter to return explicit errors instead of silently
ignoring invalid values.
- Improve format_relative_time for proper singular/plural forms.
Search:
- Validate --after/--updated-after with explicit error messages.
- Handle optional title field (Option<String>) in HydratedRow.
Show:
- Use resolve_project() for project disambiguation.
Sync:
- Thread robot_mode via SyncOptions for IngestDisplay selection.
- Use IngestDisplay::progress_only() in interactive sync mode.
GenerateDocs:
- Use resolve_project() for --project filtering.
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
Global flags:
- --color (auto|always|never) for explicit color control
- --quiet/-q to suppress non-essential output
- Hidden Completions subcommand for bash/zsh/fish/powershell
Flag negation (--no-X) with overrides_with for: has-due, asc, open
(issues/mrs), force/full (ingest/sync), check (stats), explain (search),
retry-failed (embed). Enables scripted flag composition where later flags
override earlier ones.
Validation:
- value_parser on search --mode, --type, --fts-mode for early rejection
- Remove requires="check" from --repair (auto-enabled in handler)
Polish:
- help_heading groups (Filters, Sorting, Output, Actions) on issues,
mrs, and search args for cleaner --help output
- Hide Backup, Reset, and Completions from --help
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
Extends the CLI with six new commands that complete the search pipeline:
- lore search <QUERY>: Hybrid search with mode selection (lexical,
hybrid, semantic), rich filtering (--type, --author, --project,
--label, --path, --after, --updated-after), result limits, and
optional explain mode showing RRF score breakdowns. Safe FTS mode
sanitizes user input; raw mode passes through for power users.
- lore stats: Document and index statistics with optional --check
for integrity verification and --repair to fix inconsistencies
(orphaned documents, missing FTS entries, stale dirty queue items).
- lore embed: Generate vector embeddings via Ollama. Supports
--retry-failed to re-attempt previously failed embeddings.
- lore generate-docs: Drain the dirty queue to regenerate documents.
--full seeds all entities for complete rebuild. --project scopes
to a single project.
- lore sync: Full pipeline orchestration (ingest issues + MRs,
generate-docs, embed) with --no-embed and --no-docs flags for
partial runs. Reports per-stage results and total elapsed time.
- lore health: Quick pre-flight check (config exists, DB exists,
schema current). Returns exit code 1 if unhealthy. Designed for
agent pre-flight scripts.
- lore robot-docs: Machine-readable command manifest for agent
self-discovery. Returns all commands, flags, examples, exit codes,
and recommended workflows as structured JSON.
Also enhances lore init with --gitlab-url, --token-env-var, and
--projects flags for fully non-interactive robot-mode initialization.
Fixes init's force/non-interactive precedence logic and adds JSON
output for robot mode.
Updates all command files for the GiError -> LoreError rename.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaces the verb-first pattern ('lore list issues', 'lore show
issue 42') with noun-first subcommands that feel more natural:
lore issues # list issues
lore issues 42 # show issue #42
lore mrs # list merge requests
lore mrs 99 # show MR #99
lore ingest # ingest everything
lore ingest issues # ingest only issues
lore count issues # count issues
lore status # sync status
lore auth # verify auth
lore doctor # health check
Key changes:
- New IssuesArgs, MrsArgs, IngestArgs, CountArgs structs with
short flags (-n, -s, -p, -a, -l, -o, -f, -J, etc.)
- Global -J/--json flag as shorthand for --robot
- 'lore ingest' with no argument ingests both issues and MRs,
emitting combined JSON summary in robot mode
- --asc flag replaces --order=asc/desc for brevity
- Renamed flags: --has-due-date -> --has-due, --type -> --for,
--confirm -> --yes, target_branch -> --target, etc.
Old commands (list, show, auth-test, sync-status) are preserved
as hidden backward-compat aliases that emit deprecation warnings
to stderr before delegating to the new handlers.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ingestion counters (discussions_upserted, notes_upserted,
discussions_fetched, diffnotes_count) were incremented before
tx.commit(), meaning a failed commit would report inflated
metrics. Counters now increment only after successful commit
so reported numbers accurately reflect persisted state.
Also simplifies the stale-removal guard in issue discussions:
the received_first_response flag was unnecessary since an empty
seen_discussion_ids list is safe to pass to remove_stale -- if
there were no discussions, stale removal correctly sweeps all
previously-stored discussions. The two separate code paths
(empty vs populated) are collapsed into a single branch.
Derives Default on IngestResult to eliminate verbose zero-init.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two SQL correctness issues fixed:
1. Project filter used LIKE '%term%' which caused partial matches
(e.g. filtering for "foo" matched "group/foobar"). Now uses
exact match OR suffix match after '/' so "foo" matches
"group/foo" but not "group/foobar".
2. GROUP_CONCAT used comma as delimiter for labels and assignees,
which broke parsing when label names themselves contained commas.
Switched to ASCII unit separator (0x1F) which cannot appear in
GitLab entity names.
Also adds a guard for negative time deltas in format_relative_time
to handle clock skew gracefully instead of panicking.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends all data commands to support merge requests alongside issues,
with consistent patterns and JSON output for robot mode.
List command (gi list mrs):
- MR-specific columns: branches, draft status, reviewers
- Filters: --state (opened|merged|closed|locked|all), --draft,
--no-draft, --reviewer, --target-branch, --source-branch
- Discussion count with unresolved indicator (e.g., "5/2!")
- JSON output includes full MR metadata
Show command (gi show mr <iid>):
- MR details with branches, assignees, reviewers, merge status
- DiffNote positions showing file:line for code review comments
- Full description and discussion bodies (no truncation in JSON)
- --json flag for structured output with ISO timestamps
Count command (gi count mrs):
- MR counting with optional --type filter for discussions/notes
- JSON output with breakdown by state
Ingest command (gi ingest --type mrs):
- Full MR sync with discussion prefetch
- Progress output shows MR-specific metrics (diffnotes count)
- JSON summary with comprehensive sync statistics
All commands respect global --robot mode for auto-JSON output.
The pattern "gi list mrs --json | jq '.mrs[] | .iid'" now works
for scripted MR processing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces a unified robot mode that enables JSON output across all
commands, designed for AI agent and script consumption.
Robot mode activation (any of):
- --robot flag: Explicit opt-in
- GI_ROBOT=1 env var: For persistent configuration
- Non-TTY stdout: Auto-detect when piped (e.g., gi list issues | jq)
Implementation:
- Cli::is_robot_mode(): Centralized detection logic
- All command handlers receive robot_mode boolean
- Errors emit structured JSON to stderr with exit codes
- Success responses emit JSON to stdout
Behavior changes in robot mode:
- No color/emoji output (no ANSI escapes)
- No progress spinners or interactive prompts
- Timestamps as ISO 8601 strings (not relative "2 hours ago")
- Full content (no truncation of descriptions/notes)
- Structured error objects with code, message, suggestion
This enables reliable parsing by Claude Code, shell scripts, and
automation pipelines. The auto-detect on non-TTY means simple piping
"just works" without explicit flags.
Per-command --json flags remain for explicit control and override
robot mode when needed for human-friendly terminal + JSON file output.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This is a P1 fix from the CP1-CP2 alignment audit. The --full flag was
designed to enable complete data re-synchronization, but it only reset
sync_cursors for issues—it failed to reset the per-issue
discussions_synced_for_updated_at watermark.
The result was an inconsistent state: issues would be re-fetched from
GitLab (because sync_cursors were cleared), but their discussions would
NOT be re-synced (because the watermark comparison prevented it). This
was a subtle bug because the watermark check uses:
WHERE updated_at > COALESCE(discussions_synced_for_updated_at, 0)
When discussions_synced_for_updated_at is already set to the issue's
updated_at, the comparison fails and discussions are skipped.
Fix: Before clearing sync_cursors, set discussions_synced_for_updated_at
to NULL for all issues in the project. This makes COALESCE return 0,
ensuring all issues become eligible for discussion sync.
The ordering is important: watermarks must be reset BEFORE cursors to
ensure the full sync behaves consistently.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>