gitlore

Author	SHA1	Message	Date
Taylor Eernisse	39cb0cb087	feat(embed): concurrent batching, UTF-8 safe chunking, right-sized chunks Three fixes to the embedding pipeline: 1. Concurrent HTTP batching: fire EMBED_CONCURRENCY (2) Ollama requests in parallel via join_all, then write results serially to SQLite. ~2x throughput improvement on GPU-bound workloads. 2. UTF-8 boundary safety: all computed byte offsets in split_into_chunks (paragraph/sentence/word break finders + overlap advance) now use floor_char_boundary() to prevent panics on multi-byte characters like smart quotes and non-breaking spaces. 3. CHUNK_MAX_BYTES reduced from 6000 to 1500 to fit nomic-embed-text's actual 2048-token context window, eliminating context-length retry storms that were causing 10x slowdowns. Also threads ShutdownSignal through embed pipeline for graceful Ctrl+C. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 14:48:34 -05:00
Taylor Eernisse	65583ed5d6	refactor: Remove redundant doc comments throughout codebase Removes module-level doc comments (//! lines) and excessive inline doc comments that were duplicating information already evident from: - Function/struct names (self-documenting code) - Type signatures (the what is clear from types) - Implementation context (the how is clear from code) Affected modules: - cli/* - Removed command descriptions duplicating clap help text - core/* - Removed module headers and obvious function docs - documents/* - Removed extractor/regenerator/truncation docs - embedding/* - Removed pipeline and chunking docs - gitlab/* - Removed client and transformer docs (kept type definitions) - ingestion/* - Removed orchestrator and ingestion docs - search/* - Removed FTS and vector search docs Philosophy: Code should be self-documenting. Comments should explain "why" (business decisions, non-obvious constraints) not "what" (which the code itself shows). This change reduces noise and maintenance burden while keeping the codebase just as understandable. Retains comments for: - Non-obvious business logic - Important safety invariants - Complex algorithm explanations - Public API boundaries where generated docs matter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:32 -05:00
Taylor Eernisse	266ed78e73	feat(sync): Wire progress callbacks through sync pipeline stages The sync command's stage spinners now show real-time aggregate progress for each pipeline phase instead of static "syncing..." messages. - Add `progress_callback` parameter to `run_embed` and `run_generate_docs` so callers can receive `(processed, total)` updates - Add `stage_bar` parameter to `run_ingest` for aggregate progress across concurrently-ingested projects using shared AtomicUsize counters - Update `stage_spinner` to use `{prefix}` for the `[N/M]` label, allowing `{msg}` to be updated independently with progress details - Thread `ProgressBar` clones into each concurrent project task so per-entity progress (fetch, discussions, events) is reflected on the aggregate spinner - Pass `None` for progress callbacks at standalone CLI entry points (handle_ingest, handle_generate_docs, handle_embed) to preserve existing behavior when commands are run outside of sync Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:21 -05:00
Taylor Eernisse	a50fc78823	style: Apply cargo fmt and clippy fixes across codebase Automated formatting and lint corrections from parallel agent work: - cargo fmt: import reordering (alphabetical), line wrapping to respect max width, trailing comma normalization, destructuring alignment, function signature reformatting, match arm formatting - clippy (pedantic): Range::contains() instead of manual comparisons, i64::from() instead of `as i64` casts, .clamp() instead of .max().min() chains, let-chain refactors (if-let with &&), #[allow(clippy::too_many_arguments)] and #[allow(clippy::field_reassign_with_default)] where warranted - Removed trailing blank lines and extra whitespace No behavioral changes. All existing tests pass unmodified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:59 -05:00
Taylor Eernisse	aebbe6b795	feat(cli): Wire --full flag for embed, add sync stage spinners - Add --full / --no-full flag pair to EmbedArgs with overrides_with semantics matching the existing flag pattern. When active, atomically DELETEs all embedding_metadata and embeddings before re-embedding. - Thread the full flag through run_embed -> run_sync so that 'lore sync --full' triggers a complete re-embed alongside the full re-ingest it already performed. - Add indicatif spinners to sync stages with dynamic stage numbering that adjusts when --no-docs or --no-embed skip stages. Spinners are hidden in robot mode. - Update robot-docs manifest to advertise the new --full flag on the embed command. - Replace hardcoded schema version 9 in health check with the LATEST_SCHEMA_VERSION constant from db.rs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 09:35:22 -05:00
Taylor Eernisse	daf5a73019	feat(cli): Add search, stats, embed, sync, health, and robot-docs commands Extends the CLI with six new commands that complete the search pipeline: - lore search <QUERY>: Hybrid search with mode selection (lexical, hybrid, semantic), rich filtering (--type, --author, --project, --label, --path, --after, --updated-after), result limits, and optional explain mode showing RRF score breakdowns. Safe FTS mode sanitizes user input; raw mode passes through for power users. - lore stats: Document and index statistics with optional --check for integrity verification and --repair to fix inconsistencies (orphaned documents, missing FTS entries, stale dirty queue items). - lore embed: Generate vector embeddings via Ollama. Supports --retry-failed to re-attempt previously failed embeddings. - lore generate-docs: Drain the dirty queue to regenerate documents. --full seeds all entities for complete rebuild. --project scopes to a single project. - lore sync: Full pipeline orchestration (ingest issues + MRs, generate-docs, embed) with --no-embed and --no-docs flags for partial runs. Reports per-stage results and total elapsed time. - lore health: Quick pre-flight check (config exists, DB exists, schema current). Returns exit code 1 if unhealthy. Designed for agent pre-flight scripts. - lore robot-docs: Machine-readable command manifest for agent self-discovery. Returns all commands, flags, examples, exit codes, and recommended workflows as structured JSON. Also enhances lore init with --gitlab-url, --token-env-var, and --projects flags for fully non-interactive robot-mode initialization. Fixes init's force/non-interactive precedence logic and adds JSON output for robot mode. Updates all command files for the GiError -> LoreError rename. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:47:10 -05:00

6 Commits