Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)
Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs
Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.
Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Stages 3 (generate-docs) and 4 (embed) reported progress by appending
"(N/M)" text to the stage spinner message, while stages 1-2 (ingest)
used dedicated indicatif progress bars with animated [====> ] rendering
registered with the global MultiProgress. This visual inconsistency
was introduced when progress callbacks were wired through in 266ed78.
Replace the spinner.set_message() callbacks with proper ProgressBar
instances that match the ingest stage pattern:
- Create a bar-style ProgressBar registered via multi().add()
- Use the same template/progress_chars as the ingest discussion bars
- Lazy-init the tick via AtomicBool to avoid showing the bar before
the first callback fires (matching how ingest enables ticks only
at DiscussionSyncStarted)
- Update set_length on every callback for the docs stage, since the
regenerator's estimated_total can grow if new dirty items are
queued during processing (using .max() internally)
- Clean up both the sub-bar and stage spinner on completion/error
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The sync command's stage spinners now show real-time aggregate progress
for each pipeline phase instead of static "syncing..." messages.
- Add `progress_callback` parameter to `run_embed` and
`run_generate_docs` so callers can receive `(processed, total)` updates
- Add `stage_bar` parameter to `run_ingest` for aggregate progress
across concurrently-ingested projects using shared AtomicUsize counters
- Update `stage_spinner` to use `{prefix}` for the `[N/M]` label,
allowing `{msg}` to be updated independently with progress details
- Thread `ProgressBar` clones into each concurrent project task so
per-entity progress (fetch, discussions, events) is reflected on the
aggregate spinner
- Pass `None` for progress callbacks at standalone CLI entry points
(handle_ingest, handle_generate_docs, handle_embed) to preserve
existing behavior when commands are run outside of sync
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add end-to-end observability to the sync and ingest pipelines:
Sync command:
- Generate UUID-based run_id for each sync invocation, propagated through
all child spans for log correlation across stages
- Accept MetricsLayer reference to extract hierarchical StageTiming data
after pipeline completion for robot-mode performance output
- Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle)
- Wrap entire sync execution in a root tracing span with run_id field
Ingest command:
- Wrap run_ingest in an instrumented root span with run_id and resource_type
- Add project path prefix to discussion progress bars for multi-project clarity
- Reset resource_events_synced_for_updated_at on --full re-sync
Sync status:
- Expand from single last_run to configurable recent runs list (default 10)
- Parse and expose StageTiming metrics from stored metrics_json
- Add run_id, total_items_processed, total_errors to SyncRunInfo
- Add mr_count to DataSummary for complete entity coverage
Orchestrator:
- Add #[instrument] with structured fields to issue and MR ingestion functions
- Record items_processed, items_skipped, errors on span close for MetricsLayer
- Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete)
- Pass project_id through to drain_resource_events for scoped job claiming
Document regenerator and embedding pipeline:
- Add #[instrument] spans with items_processed, items_skipped, errors fields
- Record final counts on span close for metrics extraction
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
tracing-indicatif pulled in vt100, arrayvec, and its own indicatif
integration layer. Replace it with a minimal SuspendingWriter that
coordinates tracing output with progress bars via a global LazyLock
MultiProgress.
- Add src/cli/progress.rs: shared MultiProgress singleton via LazyLock
and a SuspendingWriter that suspends bars before writing log lines,
preventing interleaving/flicker
- Wire all progress bar creation through multi().add() in sync and
ingest commands
- Replace IndicatifLayer in main.rs with SuspendingWriter for
tracing-subscriber's fmt layer
- Remove tracing-indicatif from Cargo.toml (drops vt100 and arrayvec
transitive deps)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrate resource event fetching as Step 4 of both issue and MR
ingestion, gated behind the fetch_resource_events config flag.
Orchestrator changes:
- Add ProgressEvent variants: ResourceEventsFetchStarted,
ResourceEventFetched, ResourceEventsFetchComplete
- Add resource_events_fetched/failed fields to IngestProjectResult
and IngestMrProjectResult
- New enqueue_resource_events_for_entity_type() queries all
issues/MRs for a project and enqueues resource_events jobs via
the dependent queue (INSERT OR IGNORE for idempotency)
- New drain_resource_events() claims jobs in batches, fetches
state/label/milestone events from GitLab API, stores them
atomically via unchecked_transaction, and handles failures
with exponential backoff via fail_job()
- Max-iterations guard prevents infinite retry loops within a
single drain run
- New store_resource_events() + per-type _tx helpers write events
using prepared statements inside a single transaction
- DrainResult struct tracks fetched/failed counts
CLI ingest changes:
- IngestResult gains resource_events_fetched/failed fields
- Progress bar repurposed for resource event fetch phase
(reuses discussion bar with updated template)
- Accumulates event counts from both issue and MR ingestion
CLI sync changes:
- SyncResult gains resource_events_fetched/failed fields
- Accumulates counts from both ingest stages
- print_sync() conditionally displays event counts
- Structured logging includes event counts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a new boolean field to SyncConfig that controls whether resource
event fetching is performed during sync:
- SyncConfig.fetch_resource_events: defaults to true via serde
default_true helper, serialized as "fetchResourceEvents" in JSON
- SyncArgs.no_events: --no-events CLI flag that overrides the config
value to false when present
- SyncOptions.no_events: propagates the flag through the sync pipeline
- handle_sync_cmd: mutates loaded config when --no-events is set,
ensuring the flag takes effect regardless of config file contents
This follows the existing pattern established by --no-embed and
--no-docs flags, where CLI flags override config file defaults.
The config is loaded as mutable specifically to support this override.
Also adds "events" to the count command's entity type value_parser,
enabling `lore count events` (implementation in a separate commit).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --full / --no-full flag pair to EmbedArgs with overrides_with
semantics matching the existing flag pattern. When active, atomically
DELETEs all embedding_metadata and embeddings before re-embedding.
- Thread the full flag through run_embed -> run_sync so that
'lore sync --full' triggers a complete re-embed alongside the full
re-ingest it already performed.
- Add indicatif spinners to sync stages with dynamic stage numbering
that adjusts when --no-docs or --no-embed skip stages. Spinners are
hidden in robot mode.
- Update robot-docs manifest to advertise the new --full flag on the
embed command.
- Replace hardcoded schema version 9 in health check with the
LATEST_SCHEMA_VERSION constant from db.rs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ingest:
- Introduce IngestDisplay struct with show_progress/show_text booleans
to decouple progress bars from text output. Replaces the robot_mode
bool parameter with explicit display control, enabling sync to show
progress without duplicating summary text (progress_only mode).
- Use resolve_project() for --project filtering instead of LIKE queries,
providing proper error messages for ambiguous or missing projects.
List:
- Add colored_cell() helper that checks console::colors_enabled() before
applying comfy-table foreground colors, bridging the gap between the
console and comfy-table crates for --color flag support.
- Use resolve_project() for project filtering (exact ID match).
- Improve since filter to return explicit errors instead of silently
ignoring invalid values.
- Improve format_relative_time for proper singular/plural forms.
Search:
- Validate --after/--updated-after with explicit error messages.
- Handle optional title field (Option<String>) in HydratedRow.
Show:
- Use resolve_project() for project disambiguation.
Sync:
- Thread robot_mode via SyncOptions for IngestDisplay selection.
- Use IngestDisplay::progress_only() in interactive sync mode.
GenerateDocs:
- Use resolve_project() for --project filtering.
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
Extends the CLI with six new commands that complete the search pipeline:
- lore search <QUERY>: Hybrid search with mode selection (lexical,
hybrid, semantic), rich filtering (--type, --author, --project,
--label, --path, --after, --updated-after), result limits, and
optional explain mode showing RRF score breakdowns. Safe FTS mode
sanitizes user input; raw mode passes through for power users.
- lore stats: Document and index statistics with optional --check
for integrity verification and --repair to fix inconsistencies
(orphaned documents, missing FTS entries, stale dirty queue items).
- lore embed: Generate vector embeddings via Ollama. Supports
--retry-failed to re-attempt previously failed embeddings.
- lore generate-docs: Drain the dirty queue to regenerate documents.
--full seeds all entities for complete rebuild. --project scopes
to a single project.
- lore sync: Full pipeline orchestration (ingest issues + MRs,
generate-docs, embed) with --no-embed and --no-docs flags for
partial runs. Reports per-stage results and total elapsed time.
- lore health: Quick pre-flight check (config exists, DB exists,
schema current). Returns exit code 1 if unhealthy. Designed for
agent pre-flight scripts.
- lore robot-docs: Machine-readable command manifest for agent
self-discovery. Returns all commands, flags, examples, exit codes,
and recommended workflows as structured JSON.
Also enhances lore init with --gitlab-url, --token-env-var, and
--projects flags for fully non-interactive robot-mode initialization.
Fixes init's force/non-interactive precedence logic and adds JSON
output for robot mode.
Updates all command files for the GiError -> LoreError rename.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>