gitlore

Author	SHA1	Message	Date
Taylor Eernisse	45126f04a6	fix: document upsert project_id, truncation budget, and Ollama model matching - regenerator: Include project_id in the ON CONFLICT UPDATE clause for document upserts. Previously, if a document moved between projects (e.g., during re-ingestion), the project_id would remain stale. - truncation: Compute the omission marker ("N notes omitted") before checking whether first+last notes fit in the budget. The old order computed the marker after the budget check, meaning the marker's byte cost was unaccounted for and could cause over-budget output. - ollama: Tighten model name matching to require either an exact match or a colon-delimited tag prefix (model == name or name starts with "model:"). The prior starts_with check would false-positive on "nomic-embed-text-v2" when looking for "nomic-embed-text". Tests updated to cover exact match, tagged, wrong model, and prefix false-positive cases. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 10:16:14 -05:00
Taylor Eernisse	dfa44e5bcd	fix(ingestion): label upsert reliability, init idempotency, and sync health Label upsert (issues + merge_requests): Replace INSERT ... ON CONFLICT DO UPDATE RETURNING with INSERT OR IGNORE + SELECT. The prior RETURNING-based approach relied on last_insert_rowid() matching the returned id, which is not guaranteed when ON CONFLICT triggers an update (SQLite may return 0). The new two-step approach is unambiguous and correctly tracks created_count. Init: Add ON CONFLICT(gitlab_project_id) DO UPDATE to the project insert so re-running `lore init` updates path/branch/url instead of failing with a unique constraint violation. MR discussions sync: Reset discussions_sync_attempts to 0 when clearing a sync health error, so previously-failed MRs get a fresh retry budget after successful sync. Count: format_number now handles negative numbers correctly by extracting the sign before inserting thousand-separators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 10:15:53 -05:00
Taylor Eernisse	53ef21d653	fix: propagate DB errors instead of silently swallowing them Replace .unwrap_or(), .ok(), and .filter_map(|r| r.ok()) patterns with proper error propagation using ? and rusqlite::OptionalExtension where the query may legitimately return no rows. Affected areas: - events_db::count_events: three count queries now propagate errors instead of defaulting to (0, 0) on failure - note_parser::extract_refs_from_system_notes: row iteration errors are now propagated instead of silently dropped via filter_map - note_parser::noteable_type_to_entity_type: unknown types now log a debug warning before defaulting to "issue" - payloads::store_payload/read_payload: use .optional()? instead of .ok() to distinguish "no row" from "query failed" - backoff::compute_next_attempt_at: use .clamp(0, 30) to guard against negative attempt_count, not just .min(30) - search::vector::max_chunks_per_document: returns Result<i64> with proper error propagation through .optional()?.flatten() - embedding::chunk_ids::decode_rowid: promote debug_assert to assert since negative rowids indicate data corruption worth failing fast on - ingestion::dirty_tracker::record_dirty_error: use .optional()? to handle missing dirty_sources row gracefully instead of hard error Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 10:15:36 -05:00
Taylor Eernisse	41504b4941	feat(who): configurable scoring weights, MR refs, detail mode, and suffix path resolution Expert mode now surfaces the specific MR references (project/path!iid) that contributed to each expert's score, capped at 50 per user. A new --detail flag adds per-MR breakdowns showing role (Author/Reviewer/both), note count, and last activity timestamp. Scoring weights (author_weight, reviewer_weight, note_bonus) are now configurable via the config file's `scoring` section with validation that rejects negative values. Defaults shift to author_weight=25, reviewer_weight=10, note_bonus=1 — better reflecting that code authorship is a stronger expertise signal than review assignment alone. Path resolution gains suffix matching: typing "login.rs" auto-resolves to "src/auth/login.rs" when unambiguous, with clear disambiguation errors when multiple paths match. Project-scoping (-p) narrows the candidate set. The MAX_MR_REFS_PER_USER constant is promoted to module scope for reuse across expert and overlap modes. Human output shows MR refs inline and detail sub-rows when requested. Robot JSON includes mr_refs, mr_refs_total, mr_refs_truncated, and optional details array. Includes comprehensive tests for suffix resolution, scoring weight configurability, MR ref aggregation across projects, and detail mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 10:15:15 -05:00
Taylor Eernisse	b168a58134	fix(search): cap vector search k-value and add rowid assertion The vector search multiplier could grow unbounded on documents with many chunks, producing enormous k values that cause SQLite to scan far more rows than necessary. Clamp the multiplier to [8, 200] and cap k at 10,000 to prevent degenerate performance on large corpora. Also adds a debug_assert in decode_rowid to catch negative rowids early — these indicate a bug in the encoding pipeline and should fail fast rather than silently produce garbage document IDs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 14:34:05 -05:00
Taylor Eernisse	b704e33188	feat(sync): surface MR diff fetch/fail counters in sync output Adds mr_diffs_fetched and mr_diffs_failed fields to IngestResult and SyncResult, threads them through the orchestrator aggregation, includes them in the structured tracing span and human-readable sync summary. Previously MR diff failures were silently swallowed — now they appear alongside resource event counts for full pipeline observability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 14:33:53 -05:00
Taylor Eernisse	6e82f723c3	fix(ingestion): unify store + watermark + job-complete in single transaction Previously, drain_resource_events, drain_mr_closes_issues, and drain_mr_diffs each opened a transaction only for the job-complete + watermark update, but the store operation ran outside that transaction. If the process crashed between the store and the watermark update, data would be persisted without the watermark advancing, causing silent duplicates on the next sync. Now each drain function opens the transaction before the store call and commits it only after both the store and the watermark update succeed. On error, the transaction is explicitly dropped so the connection is not left in a half-committed state. Also: - store_resource_events no longer manages its own transaction; the caller passes in a connection (which is actually the transaction) - upsert_mr_file_changes wraps DELETE + INSERT in a transaction internally - reset_discussion_watermarks now also clears diffs_synced_for_updated_at - Orchestrator error span now includes closes_issues_failed + mr_diffs_failed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 14:33:47 -05:00
Taylor Eernisse	940a96375a	refactor(search): rename --after/--updated-after to --since/--updated-since The --since naming is more intuitive (matches git log --since) and consistent with the list commands which already use --since. Renames the CLI flags, SearchCliFilters fields, SearchFilters fields, autocorrect registry, and robot-docs manifest. No behavioral change. Affected paths: - cli/mod.rs: SearchArgs field + clap attribute rename - cli/commands/search.rs: SearchCliFilters + run_search plumbing - search/filters.rs: SearchFilters struct + apply_filters logic - main.rs: handle_search + robot-docs JSON - cli/autocorrect.rs: COMMAND_FLAGS entry for search Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 14:33:24 -05:00
Taylor Eernisse	c54a969269	fix(who): exclude self-assigned reviewers from file-change reviewer signal Signal 4 (mr_reviewers + mr_file_changes) was missing the self-review exclusion that signal 1 (DiffNote reviewer) already had. An MR author listed as their own reviewer would be double-counted as both author and reviewer, inflating their score. Also removes redundant SELECT DISTINCT from signal 2 (GROUP BY already ensures uniqueness). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 13:42:40 -05:00
Taylor Eernisse	95b7183add	feat(who): expand expert + overlap queries with mr_file_changes and mr_reviewers Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries) - Add fetch_mr_file_changes config option and --no-file-changes CLI flag - Add GitLab MR diffs API fetch pipeline with watermark-based sync - Create migration 020 for diffs_synced_for_updated_at watermark column - Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL: DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers - Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END) - Add insert_file_change test helper, 8 new who tests, all 397 tests pass - Also includes: list performance migration 019, autocorrect module, README updates Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 13:35:14 -05:00
Taylor Eernisse	435a208c93	perf: eliminate unnecessary clones and pre-allocate collections Three micro-optimizations with zero behavioral change: 1. timeline_collect.rs: Reorder format!() before enum construction so the owned String moves into the variant directly, eliminating .clone() on state, label, and milestone strings in StateChanged, LabelAdded/Removed, and MilestoneSet/Removed event paths. 2. pipeline.rs: Use Arc<str> for doc_hash shared across a document's chunks instead of cloning the full String per chunk. Also remove redundant embed_buf.reserve() since extend_from_slice already handles growth and the buffer is reused across iterations. 3. rrf.rs: Pre-allocate HashMap with combined vector+fts result count via with_capacity() to avoid rehashing during RRF score accumulation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 08:08:14 -05:00
Taylor Eernisse	cc11d3e5a0	fix: peer review — 5 correctness bugs across who, db, lock, embedding, main Comprehensive peer code review identified and fixed the following: 1. who.rs: @-prefixed path routing used `target` (with @) instead of `clean` (stripped) when checking for '/' and passing to Expert mode, causing `lore who @src/auth/` to silently return zero results because the SQL LIKE matched against `@src/auth/%` which never exists. 2. db.rs: After ROLLBACK TO savepoint on migration failure, the savepoint was never RELEASEd, leaving it active on the connection. Fixed in both run_migrations() and run_migrations_from_dir(). 3. lock.rs: Multiple acquire() calls (e.g. re-acquiring a stale lock) replaced the heartbeat_handle without stopping the old thread, causing two concurrent heartbeat writers competing on the same lock row. Now signals the old thread to stop and joins it before spawning a new one. 4. chunk_ids.rs: encode_rowid() had no guard for chunk_index >= 1000 (CHUNK_ROWID_MULTIPLIER), which would cause rowid collisions between adjacent documents. Added range assertion [0, 1000). 5. main.rs: Fallback JSON error formatting in handle_auth_test interpolated LoreError Display output without escaping quotes or backslashes, potentially producing malformed JSON for robot-mode consumers. Now escapes both characters before interpolation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 08:07:59 -05:00
Taylor Eernisse	5786d7f4b6	fix: defensive hardening — lock release logging, SQLite param guard, vector cast Three defensive improvements found via peer code review: 1. lock.rs: Lock release errors were silently discarded with `let _ =`. If the DELETE failed (disk full, corruption), the lock stayed in the database with no diagnostic. Next sync would require --force with no clue why. Now logs with error!() including the underlying error message. 2. filters.rs: Dynamic SQL label filter construction had no upper bound on bind parameters. With many combined filters, param_idx + labels.len() could exceed SQLite's 999-parameter limit, producing an opaque error. Added a guard that caps labels at 900 - param_idx. 3. vector.rs: max_chunks_per_document returned i64 which was cast to usize. A negative value from a corrupt database would wrap to a huge number, causing overflow in the multiplier calculation. Now clamped to .max(1) and cast via unsigned_abs(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:55:54 -05:00
Taylor Eernisse	d3306114eb	fix(ingestion): pass ShutdownSignal into issue and MR pagination loops The orchestrator already accepted a ShutdownSignal but only checked it between phases (after all issues fetched, before discussions). The inner loops in ingest_issues() and ingest_merge_requests() consumed entire paginated streams without checking for cancellation. On a large initial sync (thousands of issues/MRs), Ctrl+C could be unresponsive for minutes while the current entity type finished draining. Now both functions accept &ShutdownSignal and check is_cancelled() at the top of each iteration, breaking out promptly and committing the cursor for whatever was already processed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:55:36 -05:00
Taylor Eernisse	e6b880cbcb	fix: prevent panics in robot-mode JSON output and arithmetic paths Peer code review found multiple panic-reachable paths: 1. serde_json::to_string().unwrap() in 4 robot-mode output functions (who.rs, main.rs x3). If serialization ever failed (e.g., NaN from edge-case division), the CLI would panic with an unhelpful stack trace. Replaced with unwrap_or_else that emits a structured JSON error fallback. 2. encode_rowid() in chunk_ids.rs used unchecked multiplication (document_id * 1000). On extreme document IDs this could silently wrap in release mode, causing embedding rowid collisions. Now uses checked_mul + checked_add with a diagnostic panic message. 3. HTTP response body truncation at byte index 500 in client.rs could split a multi-byte UTF-8 character, causing a panic. Now uses floor_char_boundary(500) for safe truncation. 4. who.rs reviews mode: SQL used `m.author_username != ?1` which silently dropped MRs with NULL author_username (SQL NULL != anything = NULL). Changed to `(m.author_username IS NULL OR m.author_username != ?1)` to match the pattern already used in expert mode. 5. handle_auth_test hardcoded exit code 5 for all errors regardless of type. Config not found (20), token not set (4), and network errors (8) all incorrectly returned 5. Now uses e.exit_code() from the actual LoreError, with proper suggestion hints in human mode. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:55:20 -05:00
Taylor Eernisse	121a634653	fix: critical data integrity — timeline dedup, discussion atomicity, index collision Three correctness bugs found via peer code review: 1. TimelineEvent PartialEq/Ord omitted entity_type — issue #42 and MR #42 with the same timestamp and event_type were treated as equal. In a BTreeSet or dedup, one would silently be dropped. Added entity_type to both PartialEq and Ord comparisons. 2. discussions.rs: store_payload() was called outside the transaction (on bare conn) while upsert_discussion/notes were inside. A crash between them left orphaned payload rows. Moved store_payload inside the unchecked_transaction block, matching mr_discussions.rs pattern. 3. Migration 017 created idx_issue_assignees_username(username, issue_id) but migration 005 already created the same index name with just (username). SQLite's IF NOT EXISTS silently skipped the composite version on every existing database. New migration 018 drops and recreates the index with correct composite columns. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:54:59 -05:00
Taylor Eernisse	f267578aab	feat: implement lore who — people intelligence commands (5 modes) Add `lore who` command with 5 query modes answering collaboration questions using existing DB data (280K notes, 210K discussions, 33K DiffNotes): - Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring) - Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions) - Active: what discussions need attention (unresolved resolvable, global/project-scoped) - Overlap: who else is touching these files (dual author+reviewer role tracking) - Reviews: what review patterns does a person have (prefix-based category extraction) Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with validation, robot JSON output with input+resolved_input reproducibility, human terminal output, and 20 unit tests. All quality gates pass. Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e, bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-07 23:11:14 -05:00
Taylor Eernisse	b5f78e31a8	fix(cli): audit-driven improvements to flags, help, exit codes, and deprecation Addresses findings from a comprehensive CLI readiness audit: Flag design (I2): - Add hidden --no-verbose flag with overrides_with semantics, matching the --no-quiet pattern already established for all other boolean flags. Help text (I3): - Add after_help examples to issues, mrs, search, sync, and timeline subcommands. Each shows 3-4 concrete, runnable commands with comments. Help headings (I4/P5): - Move --mode and --fts-mode from "Output" heading to "Mode" heading in the search subcommand. These control search strategy, not output format — "Output" is reserved for --limit, --explain, --fields. Exit codes (I5): - Health check failure now exits 19 (was 1). Exit code 1 is reserved for internal errors only. robot-docs updated to document code 19. Deprecation visibility (P4): - Deprecated commands (list, show, auth-test, sync-status) now emit structured JSON warnings to stderr in robot mode: {"warning":{"type":"DEPRECATED","message":"...","successor":"..."}} Previously these were silently swallowed in robot mode. Version string (P1): - Cli struct uses env!("LORE_VERSION") from build.rs so --version shows git hash (see previous commit). Fields flag (P3): - --fields help text updated to document the "minimal" preset. Robot-docs (parallel work): - response_schema added for every command, documenting the JSON shape agents will receive. Agents can now introspect expected fields before calling a command. - error_format documents the new "actions" array. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 23:47:04 -05:00
Taylor Eernisse	cf6d27435a	feat(robot): add elapsed_ms timing, --fields support, and actionable error actions Robot mode consistency improvements across all command output: Timing: - Every robot JSON response now includes meta.elapsed_ms measuring wall-clock time from command start to serialization. Agents can use this to detect slow queries and tune --limit or --project filters. Field selection (--fields): - print_list_issues_json and print_list_mrs_json accept an optional fields slice that prunes each item in the response array to only the requested keys. A "minimal" preset expands to [iid, title, state, updated_at_iso] for token-efficient agent scans. - filter_fields and expand_fields_preset live in the new src/cli/robot.rs module alongside RobotMeta. Actionable error recovery: - LoreError gains an actions() method returning concrete shell commands an agent can execute to recover (e.g. "ollama serve" for OllamaUnavailable, "lore init" for ConfigNotFound). - RobotError now serializes an "actions" array (empty array omitted) so agents can parse and offer one-click fixes. Envelope consistency: - show issue/MR JSON responses now use the standard {"ok":true,"data":...,"meta":...} envelope instead of bare data, matching all other commands. Files: src/cli/robot.rs (new), src/core/error.rs, src/cli/commands/{count,embed,generate_docs,ingest,list,show,stats,sync_status}.rs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 23:46:48 -05:00
Taylor Eernisse	a855759bf8	fix: shutdown safety, CLI hardening, exit code collision Shutdown signal improvements: - Upgrade ShutdownSignal from Relaxed to Release/Acquire ordering. Relaxed was technically sufficient for a single flag but Release/Acquire is the textbook correct pattern and ensures visibility guarantees across threads without relying on x86 TSO. - Add double Ctrl+C support to all three signal handlers (ingest, embed, sync). First Ctrl+C sets cooperative flag with user message; second Ctrl+C force-exits with code 130 (standard SIGINT convention). CLI hardening: - LORE_ROBOT env var now checks for truthy values (!empty, !="0", !="false") instead of mere existence. Setting LORE_ROBOT=0 or LORE_ROBOT=false no longer activates robot mode. - Replace unreachable!() in color mode match with defensive warning and fallback to auto. Clap validates the values but defense in depth prevents panics if the value_parser is ever changed. - Replace unreachable!() in completions shell match with proper error return for unsupported shells. Exit code collision fix: - ConfigNotFound was mapped to exit code 2 (error.rs:56) which collided with handle_clap_error() also using exit code 2 for parse errors. Agents calling lore --robot could not distinguish "bad arguments" from "missing config file." - Restore ConfigNotFound to exit code 20 (its original dedicated code). - Update robot-docs exit code table: code 2 = "Usage error", code 20 = "Config not found". Build script: - Track .git/refs/heads directory for Cargo rebuild triggers. Ensures GIT_HASH env var updates when branch refs change, not just HEAD. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:42:59 -05:00
Taylor Eernisse	f3f3560e0d	fix(ingestion): proper error propagation and transaction safety Three hardening improvements to the ingestion orchestrator: - Replace .unwrap_or(0) with ? on COUNT(*) queries for total_issues and total_mrs. These are simple aggregate queries that should never fail, but if they do (e.g. table missing after failed migration), propagating the error gives an actionable message instead of silently reporting 0 items. - Wrap store_closes_issues_refs in a SAVEPOINT with proper ROLLBACK/RELEASE. Previously, a failure mid-loop (e.g. on the 5th of 10 close-issue references) would leave partial refs committed. Now the entire batch is atomic. - Replace silent catch-all (_ => {}) arms in enqueue_resource_events and update_resource_event_watermark with explicit warnings for unknown entity_type values. Makes debugging easier when new entity types are added but the match arms aren't updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:42:40 -05:00
Taylor Eernisse	2bfa4f1f8c	perf(documents): eliminate redundant hash query in regeneration The document regenerator was making two queries per document: 1. get_existing_hash() — SELECT content_hash 2. upsert_document_inner() — SELECT id, content_hash, labels_hash, paths_hash Query 2 already returns the content_hash needed for change detection. Remove get_existing_hash() entirely and compute content_changed inside upsert_document_inner() from the existing row data. upsert_document_inner now returns Result<bool> (true = content changed) which propagates up through upsert_document and regenerate_one, replacing the separate pre-check. The triple-hash fast-path (all three hashes match → return Ok(false) with no writes) is preserved. This halves the query count for unchanged documents, which dominate incremental syncs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:42:26 -05:00
Taylor Eernisse	8cf14fb69b	feat(search): sanitize raw FTS5 queries with safe fallback Add input validation for Raw FTS query mode to prevent expensive or malformed queries from reaching SQLite FTS5: - Reject unbalanced double quotes (would cause FTS5 syntax error) - Reject leading wildcard-only queries ("*", "* OR ...") that trigger expensive full-table scans - Reject empty/whitespace-only queries - Invalid raw input falls back to Safe mode automatically instead of erroring, so callers never see FTS5 parse failures The Safe mode already escapes all tokens with double-quote wrapping and handles embedded quotes via doubling. Raw mode now has a validation layer on top. All queries remain parameterized (?1, ?2); user input never enters SQL strings directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:42:17 -05:00
Taylor Eernisse	c2036c64e9	feat(embed): docs_embedded tracking, buffer reuse, retry hardening Embedding pipeline improvements building on the concurrent batching foundation: - Track docs_embedded vs chunks_embedded separately. A document counts as embedded only when ALL its chunks succeed, giving accurate progress reporting. The sync command reads docs_embedded for its document count. - Reuse a single Vec<u8> buffer (embed_buf) across all store_embedding calls instead of allocating per chunk. Eliminates ~3KB allocation per 768-dim embedding. - Detect and record errors when Ollama silently returns fewer embeddings than inputs (batch mismatch). Previously these dropped chunks were invisible. - Improve retry error messages: distinguish "retry returned unexpected result" (wrong dims/count) from "retry request failed" (network error) instead of generic "chunk too large" message. - Convert all hot-path SQL from conn.execute() to prepare_cached() for statement cache reuse (clear_document_embeddings, store_embedding, record_embedding_error). - Record embedding_metadata errors for empty documents so they don't appear as perpetually pending on subsequent runs. - Accept concurrency parameter (configurable via config.embedding.concurrency) instead of hardcoded EMBED_CONCURRENCY=2. - Add schema version pre-flight check in embed command to fail fast with actionable error instead of cryptic SQL errors. - Fix --retry-failed to use DELETE instead of UPDATE. UPDATE clears last_error but the row still matches config params in the LEFT JOIN, making the doc permanently invisible to find_pending_documents. DELETE removes the row entirely so the LEFT JOIN returns NULL. Regression test added (old_update_approach_leaves_doc_invisible). - Add chunking forward-progress guard: after floor_char_boundary() rounds backward, ensure start advances by at least one full character to prevent infinite loops on multi-byte sequences (box-drawing chars, smart quotes). Test cases cover the exact patterns that caused production hangs on document 18526. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:42:08 -05:00
Taylor Eernisse	39cb0cb087	feat(embed): concurrent batching, UTF-8 safe chunking, right-sized chunks Three fixes to the embedding pipeline: 1. Concurrent HTTP batching: fire EMBED_CONCURRENCY (2) Ollama requests in parallel via join_all, then write results serially to SQLite. ~2x throughput improvement on GPU-bound workloads. 2. UTF-8 boundary safety: all computed byte offsets in split_into_chunks (paragraph/sentence/word break finders + overlap advance) now use floor_char_boundary() to prevent panics on multi-byte characters like smart quotes and non-breaking spaces. 3. CHUNK_MAX_BYTES reduced from 6000 to 1500 to fit nomic-embed-text's actual 2048-token context window, eliminating context-length retry storms that were causing 10x slowdowns. Also threads ShutdownSignal through embed pipeline for graceful Ctrl+C. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 14:48:34 -05:00
Taylor Eernisse	1c45725cba	fix(sync): pass options.full through to generate-docs stage The sync pipeline was hardcoding `false` for the `full` parameter when calling run_generate_docs, so `lore sync --full` would re-ingest all entities but then only regenerate documents for newly-dirtied ones. Entities loaded before migration 007 (which introduced the dirty_sources system) were never marked dirty and thus never got documents generated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 11:42:11 -05:00
Taylor Eernisse	405e5370dc	feat(sync): concurrent drains, atomic watermarks, graceful Ctrl+C shutdown Three fixes to the sync pipeline: 1. Atomic watermarks: wrap complete_job + update_watermark in a single SQLite transaction so crash between them can't leave partial state. 2. Concurrent drain loops: prefetch HTTP requests via join_all (batch size = dependent_concurrency), then write serially to DB. Reduces ~9K sequential requests from ~19 min to ~2.4 min. 3. Graceful shutdown: install Ctrl+C handler via ShutdownSignal (Arc<AtomicBool>), thread through orchestrator/CLI, release locked jobs on interrupt, record sync_run as "failed". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 11:22:04 -05:00
Taylor Eernisse	32783080f1	fix(timeline): report true total_events in robot JSON meta The robot JSON envelope's meta.total_events field was incorrectly reporting events.len() (the post-limit count), making it identical to meta.showing. This defeated the purpose of having both fields. Changes across the pipeline to fix this: - collect_events now returns (Vec<TimelineEvent>, usize) where the second element is the total event count before truncation - TimelineResult gains a total_events_before_limit field (serde-skipped) so the value flows cleanly from collect through to the renderer - main.rs passes the real total instead of the events.len() workaround Additional cleanup in this pass: - Derive PartialEq/Eq/PartialOrd/Ord on TimelineEventType, replacing the hand-rolled event_type_discriminant() function. Variant declaration order now defines sort tiebreak, documented in a doc comment. - Validate --since input with a proper LoreError::Other instead of silently treating invalid values as None - Fix ANSI-aware tag column padding with console::pad_str (colored tags like "[merged]" were misaligned because ANSI escapes consumed width) - Remove dead print_timeline_json and infer_max_depth functions that were superseded by print_timeline_json_with_meta Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 09:35:02 -05:00
Taylor Eernisse	69df8a5603	feat(timeline): wire up lore timeline command with human + robot renderers Complete Gate 3 by implementing the final three beads: - bd-2f2: Human output renderer with colored event tags, entity refs, evidence snippets, and expansion summary footer - bd-dty: Robot JSON output with {ok,data,meta} envelope, ISO timestamps, nested via provenance, and per-event-type details objects - bd-1nf: CLI wiring with TimelineArgs (9 flags), Commands::Timeline variant, handle_timeline handler, VALID_COMMANDS entry, and robot-docs manifest with temporal_intelligence workflow All 7 Gate 3 children now closed. Pipeline: SEED -> HYDRATE -> EXPAND -> COLLECT -> RENDER fully operational. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 08:49:48 -05:00
Taylor Eernisse	03d9f8cce5	docs(db): document safety invariants for sqlite-vec transmute Adds a SAFETY comment explaining why the transmute of sqlite3_vec_init to the sqlite3_auto_extension callback type is sound. The three invariants (stable C-ABI signature, single-call-per-connection contract, idempotency) were previously undocumented, which left the lone unsafe block without justification for future readers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 08:38:41 -05:00
Taylor Eernisse	9b23d91378	refactor(timeline): harden pipeline stages with shared resolver and exhaustive error handling Follows up on the resolve_entity_ref extraction by updating all three pipeline stages to consume the shared helper and removing their local duplicates (~75 lines of dead code eliminated). timeline_seed.rs: - Switch from local resolve_entity to shared resolve_entity_ref with explicit Some(proj_id) scoping - Add tracing::debug for orphaned discussion parents instead of silently skipping them, aiding debugging when evidence notes go missing - Use saturating_mul for the over-fetch multiplier to prevent overflow on pathological max_seeds values timeline_expand.rs: - Switch from local resolve_entity_ref to shared version with None project scoping (cross-project traversal) - Pass Option<i64> for target_iid in UnresolvedRef construction instead of unwrap_or(0) sentinel - Update test assertion to compare against Some(42) timeline_collect.rs: - Make entity_id_column return Result instead of silently defaulting to issue_id for unknown entity types. The previous fallback could produce incorrect SQL queries that return wrong results rather than failing - Replace if-let chains in collect_merged_event with exhaustive match blocks that propagate real DB errors while gracefully handling expected missing-data cases (QueryReturnedNoRows, NULL merged_at) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 08:38:24 -05:00
Taylor Eernisse	a324fa26e1	refactor(timeline): extract shared resolve_entity_ref and make target_iid optional The seed, expand, and collect stages each had their own near-identical resolve_entity_ref helper that converted internal DB IDs to full EntityRef structs. This duplication made it easy for bug fixes to land in one copy but not the others. Extract a single public resolve_entity_ref into timeline.rs with an optional project_id parameter: - Some(project_id): scopes the lookup (used by seed, which knows the project from the FTS result) - None: unscoped lookup (used by expand, which traverses cross-project references) Also changes UnresolvedRef.target_iid from i64 to Option<i64>. Cross-project references parsed from descriptions may not always carry an IID (e.g. when the reference is malformed or the target was deleted). The previous sentinel value of 0 was semantically incorrect since GitLab IIDs start at 1. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 08:38:12 -05:00
Taylor Eernisse	3e9cf2358e	perf(search+embed): zero-copy embedding API and deferred RRF mapping Change OllamaClient::embed_batch to accept &[&str] instead of Vec<String>. The EmbedRequest struct now borrows both model name and input texts, eliminating per-batch cloning of chunk text (up to 32KB per chunk x 32 chunks per batch). Serialization output is identical since serde serializes &str and String to the same JSON. In hybrid search, defer the RrfResult->HybridResult mapping until after filter+take, so only `limit` items (typically 20) are constructed instead of up to 1,500 at RECALL_CAP. Also switch filtered_ids to into_iter() to avoid an extra .copied() pass. Switch FTS search_fts from prepare() to prepare_cached() for statement reuse across repeated searches. Benchmarked at ~1.6x faster. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 17:35:53 -05:00
Taylor Eernisse	16beb35a69	perf(documents): batch INSERTs and writeln! in document pipeline Replace individual INSERT-per-label and INSERT-per-path loops in upsert_document_inner with single multi-row INSERT statements. For a document with 5 labels, this reduces 5 SQL round-trips to 1. Replace format!()+push_str() with writeln!() in all three document extractors (issue, MR, discussion). writeln! writes directly into the String buffer, avoiding the intermediate allocation that format! creates. Benchmarked at ~1.9x faster for string building and ~1.6x faster for batch inserts (measured over 5k iterations in-memory). Also switch get_existing_hash from prepare() to prepare_cached() since it is called once per document during regeneration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 17:35:42 -05:00
Taylor Eernisse	3767c33c28	feat: Implement Gate 3 timeline pipeline and Gate 4 migration scaffolding Complete 5 beads for the Phase B temporal intelligence feature: - bd-1oo: Register migration 015 (commit SHAs, closes watermark) and create migration 016 (mr_file_changes table with 4 indexes for Gate 4 file-history) - bd-20e: Define TimelineEvent model with 9 event type variants, EntityRef, ExpandedEntityRef, UnresolvedRef, and TimelineResult types. Ord impl for chronological sorting with stable tiebreak. - bd-32q: Implement timeline seed phase - FTS5 keyword search to entity IDs with discussion-to-parent resolution, entity dedup, and evidence note extraction with snippet truncation. - bd-ypa: Implement timeline expand phase - BFS cross-reference expansion over entity_references with bidirectional traversal, depth limiting, mention filtering, provenance tracking, and unresolved reference collection. - bd-3as: Implement timeline event collection - gathers Created, StateChanged, LabelAdded/Removed, MilestoneSet/Removed, Merged, and NoteEvidence events. Merged dedup (state=merged -> Merged variant only). NULL label/milestone fallbacks. Chronological interleaving with since filter and limit. 38 new tests, all 445 tests pass. All quality gates clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 16:54:28 -05:00
Taylor Eernisse	233eb546af	feat: Add commit SHAs, closes_issues watermark, and PRD alignment Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests (Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark for incremental sync, and the missing idx_label_events_label index. The MR transformer and ingestion pipeline now populate commit SHAs during sync. The orchestrator uses watermark-based filtering for closes_issues jobs instead of re-enqueuing all MRs every sync. The Phase B PRD is updated to match the actual codebase: corrected migration numbering (011-015), documented nullable label/milestone fields (migration 012), watermark patterns (013), observability infrastructure (014), simplified source_method values, and updated entity_references schema to match implementation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 15:29:51 -05:00
Taylor Eernisse	5d1586b88e	feat(show): Display full discussion content without truncation Remove artificial length limits from `lore show` output to display complete descriptions and discussion threads. Previously, descriptions were truncated to 500 characters and discussion notes to 300 characters, which cut off important context when reviewing issues and MRs. Users often need the full content to understand the complete discussion history. Changes: - Remove truncate() helper function and its 2 unit tests - Pass description and note bodies directly to wrap_text() - Affects both print_show_issue() and print_show_mr() The wrap_text() function continues to handle line wrapping for readability at the configured widths (76/72/68 chars depending on nesting level). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:46:29 -05:00
Taylor Eernisse	c730b0ec54	feat(cli): Improve help text, error handling, and add fuzzy command suggestions CLI help improvements (cli/mod.rs): - Add descriptive help text to all global flags (-c, --robot, -J, etc.) - Add descriptions to all subcommands (Issues, Mrs, Sync, etc.) - Add --no-quiet flag for explicit quiet override - Shell completions now shows installation instructions for each shell - Optional subcommand: running bare 'lore' shows help in terminal mode, robot-docs in robot mode Structured clap error handling (main.rs): - Early robot mode detection before parsing (env + args) - JSON error output for parse failures in robot mode - Semantic error codes: UNKNOWN_COMMAND, UNKNOWN_FLAG, MISSING_REQUIRED, INVALID_VALUE, ARGUMENT_CONFLICT, etc. - Fuzzy command suggestion using Jaro-Winkler similarity (>0.7 threshold) - Help/version requests handled normally (exit 0, not error) Robot-docs enhancements (main.rs): - Document deprecated command aliases (list issues -> issues, etc.) - Document clap error codes for programmatic error handling - Include completions command in manifest - Update flag documentation to show short forms (-n, -s, -p, etc.) Dependencies: - Add strsim 0.11 for Jaro-Winkler fuzzy matching Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:38 -05:00
Taylor Eernisse	ab43bbd2db	feat: Add dry-run mode to ingest, sync, and stats commands Enables preview of operations without making changes, useful for understanding what would happen before committing to a full sync. Ingest dry-run (--dry-run flag): - Shows resource type, sync mode (full vs incremental), project list - Per-project info: existing count, has_cursor, last_synced timestamp - No GitLab API calls, no database writes Sync dry-run (--dry-run flag): - Preview all four stages: issues ingest, MRs ingest, docs, embed - Shows which stages would run vs be skipped (--no-docs, --no-embed) - Per-project breakdown for both entity types Stats repair dry-run (--dry-run flag): - Shows what would be repaired without executing repairs - "would fix" vs "fixed" indicator in terminal output - dry_run: true field in JSON response Implementation details: - DryRunPreview struct captures project-level sync state - SyncDryRunResult aggregates previews for all sync stages - Terminal output uses yellow styling for "would" actions - JSON output includes dry_run: true at top level Flag handling: - --dry-run and --no-dry-run pair for explicit control - Defaults to false (normal operation) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:22 -05:00
Taylor Eernisse	784fe79b80	feat(show): Enrich issue detail with assignees, milestones, and closing MRs Issue detail now includes: - assignees: List of assigned usernames from issue_assignees table - due_date: Issue due date when set - milestone: Milestone title when assigned - closing_merge_requests: MRs that will close this issue when merged Closing MR detection: - Queries entity_references table for 'closes' reference type - Shows MR iid, title, state (with color coding) in terminal output - Full MR metadata included in JSON output Human-readable output: - "Assignees:" line shows comma-separated @usernames - "Development:" section lists closing MRs with state indicator - Green for merged, cyan for opened, red for closed JSON output: - New fields: assignees, due_date, milestone, closing_merge_requests - closing_merge_requests array contains iid, title, state, web_url Test coverage: - get_issue_assignees: empty, single, multiple (alphabetical order) - get_closing_mrs: empty, single, ignores 'mentioned' references Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:22:02 -05:00
Taylor Eernisse	db750e4fc5	fix: Graceful HTTP client fallbacks and overflow protection HTTP client initialization (embedding/ollama.rs, gitlab/client.rs): - Replace expect/panic with unwrap_or_else fallback to default Client - Log warning when configured client fails to build - Prevents crash on TLS/system configuration issues Doctor command (cli/commands/doctor.rs): - Handle reqwest Client::builder() failure in Ollama health check - Return Warning status with descriptive message instead of panicking - Ensures doctor command remains operational even with HTTP issues These changes improve resilience when running in unusual environments (containers with limited TLS, restrictive network policies, etc.) without affecting normal operation. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:40 -05:00
Taylor Eernisse	72f1cafdcf	perf: Optimize SQL queries and reduce allocations in hot paths Change detection queries (embedding/change_detector.rs): - Replace triple-EXISTS subquery pattern with LEFT JOIN + NULL check - SQLite now scans embedding_metadata once instead of three times - Semantically identical: returns docs needing embedding when no embedding exists, hash changed, or config mismatch Count queries (cli/commands/count.rs): - Consolidate 3 separate COUNT queries for issues into single query using conditional aggregation (CASE WHEN state = 'x' THEN 1) - Same optimization for MRs: 5 queries reduced to 1 Search filter queries (search/filters.rs): - Replace N separate EXISTS clauses for label filtering with single IN() clause with COUNT/GROUP BY HAVING pattern - For multi-label AND queries, this reduces N subqueries to 1 FTS tokenization (search/fts.rs): - Replace collect-into-Vec-then-join pattern with direct String building - Pre-allocate capacity hint for result string Discussion truncation (documents/truncation.rs): - Calculate total length without allocating concatenated string first - Only allocate full string when we know it fits within limit Embedding pipeline (embedding/pipeline.rs): - Add Vec::with_capacity hints for chunk work and cleared_docs hashset - Reduces reallocations during embedding batch processing Backoff calculation (core/backoff.rs): - Replace unchecked addition with saturating_add to prevent overflow - Add test case verifying overflow protection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:28 -05:00
Taylor Eernisse	65583ed5d6	refactor: Remove redundant doc comments throughout codebase Removes module-level doc comments (//! lines) and excessive inline doc comments that were duplicating information already evident from: - Function/struct names (self-documenting code) - Type signatures (the what is clear from types) - Implementation context (the how is clear from code) Affected modules: - cli/* - Removed command descriptions duplicating clap help text - core/* - Removed module headers and obvious function docs - documents/* - Removed extractor/regenerator/truncation docs - embedding/* - Removed pipeline and chunking docs - gitlab/* - Removed client and transformer docs (kept type definitions) - ingestion/* - Removed orchestrator and ingestion docs - search/* - Removed FTS and vector search docs Philosophy: Code should be self-documenting. Comments should explain "why" (business decisions, non-obvious constraints) not "what" (which the code itself shows). This change reduces noise and maintenance burden while keeping the codebase just as understandable. Retains comments for: - Non-obvious business logic - Important safety invariants - Complex algorithm explanations - Public API boundaries where generated docs matter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:32 -05:00
Taylor Eernisse	a76dc8089e	feat(orchestrator): Integrate closes_issues fetching and cross-ref extraction Extends the MR ingestion pipeline to populate the entity_references table from multiple sources: 1. Resource state events (extract_refs_from_state_events): Called after draining the resource_events queue for both issues and MRs. Extracts "closes" relationships from the structured API data. 2. System notes (extract_refs_from_system_notes): Called during MR ingestion to parse "mentioned in" and "closed by" patterns from discussion note bodies. 3. MR closes_issues API (new): - enqueue_mr_closes_issues_jobs(): Queues jobs for all MRs - drain_mr_closes_issues(): Fetches closes_issues for each MR - Records cross-references with source_method='closes_issues_api' New progress events: - ClosesIssuesFetchStarted { total } - ClosesIssueFetched { current, total } - ClosesIssuesFetchComplete { fetched, failed } New result fields on IngestMrProjectResult: - closes_issues_fetched: Count of successful fetches - closes_issues_failed: Count of failed fetches The pipeline now comprehensively builds the relationship graph between issues and MRs, enabling queries like "what will close this issue?" Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:40 -05:00
Taylor Eernisse	26cf13248d	feat(gitlab): Add MR closes_issues API endpoint and GitLabIssueRef type Extends the GitLab client to fetch the list of issues that an MR will close when merged, using the /projects/:id/merge_requests/:iid/closes_issues endpoint. New type: - GitLabIssueRef: Lightweight issue reference with id, iid, project_id, title, state, and web_url. Used for the closes_issues response which returns a list of issue summaries rather than full GitLabIssue objects. New client method: - fetch_mr_closes_issues(gitlab_project_id, iid): Returns Vec<GitLabIssueRef> for all issues that the MR's description/commits indicate will be closed. This enables building the entity_references table from API data in addition to parsing system notes, providing more reliable cross-reference discovery. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:30 -05:00
Taylor Eernisse	f748570d4d	feat(core): Add cross-reference extraction infrastructure Introduces two new modules for extracting and storing entity cross-references from GitLab data: note_parser.rs: - Parses system notes for "mentioned in" and "closed by" patterns - Extracts cross-project references (group/project#42, group/project!123) - Uses lazy-compiled regexes for performance - Handles both issue (#) and MR (!) sigils - Provides extract_refs_from_system_notes() for batch processing references.rs: - Extracts refs from resource_state_events table (API-sourced closes links) - Provides insert_entity_reference() for storing discovered references - Includes resolution helpers: resolve_issue_local_id, resolve_mr_local_id, resolve_project_path for converting iids to internal IDs - Enables cross-project reference resolution These modules power the entity_references table, enabling features like "find all MRs that close this issue" and "find all issues mentioned in this MR". Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:03:13 -05:00
Taylor Eernisse	1d003aeac2	fix(sync): Replace text-only progress with animated bars for docs/embed stages Stages 3 (generate-docs) and 4 (embed) reported progress by appending "(N/M)" text to the stage spinner message, while stages 1-2 (ingest) used dedicated indicatif progress bars with animated [====> ] rendering registered with the global MultiProgress. This visual inconsistency was introduced when progress callbacks were wired through in `266ed78`. Replace the spinner.set_message() callbacks with proper ProgressBar instances that match the ingest stage pattern: - Create a bar-style ProgressBar registered via multi().add() - Use the same template/progress_chars as the ingest discussion bars - Lazy-init the tick via AtomicBool to avoid showing the bar before the first callback fires (matching how ingest enables ticks only at DiscussionSyncStarted) - Update set_length on every callback for the docs stage, since the regenerator's estimated_total can grow if new dirty items are queued during processing (using .max() internally) - Clean up both the sub-bar and stage spinner on completion/error Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 15:02:13 -05:00
Taylor Eernisse	925ec9f574	fix: Retry loop safety, doctor model matching, regenerator robustness Three defensive improvements from peer code review: Replace unreachable!() in GitLab client retry loops: Both request() and request_with_headers() had unreachable!() after their for loops. While the logic was sound (the final iteration always reaches the return/break), any refactor to the loop condition would turn this into a runtime panic. Restructured both to store last_response with explicit break, making the control flow self-documenting and the .expect() message useful if ever violated. Doctor model name comparison asymmetry: Ollama model names were stripped of their tag (:latest, :v1.5) for comparison, but the configured model name was compared as-is. A config value like "nomic-embed-text:v1.5" would never match. Now strips the tag from both sides before comparing. Regenerator savepoint cleanup and progress accuracy: - upsert_document's error path did ROLLBACK TO but never RELEASE, leaving a dangling savepoint that could nest on the next call. Added RELEASE after rollback so the connection is clean. - estimated_total for progress reporting was computed once at start but the dirty queue can grow during processing. Now recounts each loop iteration with max() so the progress fraction never goes backwards. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:54 -05:00
Taylor Eernisse	1fdc6d03cc	fix: Savepoint leak in embedding pipeline, atomic fail_job, RRF dedup Three correctness fixes found during peer code review: Embedding pipeline savepoint leak (HIGH severity): The SAVEPOINT embed_page / RELEASE embed_page pattern had ~10 `?` propagation points between them. Any error from record_embedding_error, clear_document_embeddings, or store_embedding would exit the function without rolling back, leaving the SQLite connection in a broken transactional state and causing cascading failures for the rest of the session. Fixed by extracting page processing into `embed_page()` and wrapping with explicit rollback-on-error handling. Dependent queue fail_job race (MEDIUM severity): fail_job performed a SELECT followed by a separate UPDATE on the attempts counter without a transaction. Under concurrent lock reclamation, the attempts value could be read stale. Replaced with a single atomic UPDATE that increments attempts and computes exponential backoff entirely in SQL, also halving DB round-trips. Added explicit error when the job no longer exists. RRF duplicate document score inflation (MEDIUM severity): If a retriever returned the same document_id multiple times, the RRF score accumulated multiple rank contributions while the rank only recorded the first occurrence. Moved the score accumulation inside the `if is_none` guard so only the first occurrence per list contributes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:38 -05:00
Taylor Eernisse	266ed78e73	feat(sync): Wire progress callbacks through sync pipeline stages The sync command's stage spinners now show real-time aggregate progress for each pipeline phase instead of static "syncing..." messages. - Add `progress_callback` parameter to `run_embed` and `run_generate_docs` so callers can receive `(processed, total)` updates - Add `stage_bar` parameter to `run_ingest` for aggregate progress across concurrently-ingested projects using shared AtomicUsize counters - Update `stage_spinner` to use `{prefix}` for the `[N/M]` label, allowing `{msg}` to be updated independently with progress details - Thread `ProgressBar` clones into each concurrent project task so per-entity progress (fetch, discussions, events) is reflected on the aggregate spinner - Pass `None` for progress callbacks at standalone CLI entry points (handle_ingest, handle_generate_docs, handle_embed) to preserve existing behavior when commands are run outside of sync Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 14:16:21 -05:00
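
The RRF fix described in 1fdc6d03cc (only the first occurrence of a document per retriever list contributes to its score) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `rrf_scores`, the `i64` document ids, and the smoothing constant `k` are all assumptions made for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch of reciprocal rank fusion with per-list dedup.
/// Each inner Vec is one retriever's ranked list of document ids.
/// `k` is the RRF smoothing constant (its value here is the caller's choice).
pub fn rrf_scores(lists: &[Vec<i64>], k: f64) -> HashMap<i64, f64> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in lists {
        // Track ids already seen in THIS list so duplicates are ignored.
        let mut seen: HashSet<i64> = HashSet::new();
        for (idx, &doc_id) in list.iter().enumerate() {
            if !seen.insert(doc_id) {
                // Repeated id within one list: skip, so it cannot inflate the score.
                continue;
            }
            let rank = (idx + 1) as f64; // 1-based position in the raw list
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    scores
}
```

With this shape, a retriever that returns the same id twice contributes a single rank term for that id, while the same id appearing in a different retriever's list still adds its own term.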

107 Commits
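
The doctor model-name fix in 925ec9f574 (strip the tag from both the configured and the installed model name before comparing) might look something like the sketch below. The helper names `base_name` and `models_match` are hypothetical and chosen only for illustration; the actual doctor code may be structured differently.

```rust
/// Hypothetical helper: drop an Ollama-style tag such as ":latest" or ":v1.5".
/// Returns the input unchanged when it contains no colon.
fn base_name(model: &str) -> &str {
    model.split_once(':').map_or(model, |(base, _tag)| base)
}

/// Hypothetical comparison that strips the tag from BOTH sides,
/// so "nomic-embed-text:v1.5" in config matches "nomic-embed-text:latest".
pub fn models_match(configured: &str, installed: &str) -> bool {
    base_name(configured) == base_name(installed)
}
```

Because the comparison is on the whole base name rather than a prefix, a differently named model such as "nomic-embed-text-v2" does not match "nomic-embed-text" in this sketch.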