gitlore

Author	SHA1	Message	Date
teernisse	06889ec85a	fix(explain): address review findings — N+1 queries, duplicate decisions, silent errors 1. fetch_open_threads: replace N+1 loop (2 queries per thread) with a single query using correlated subqueries for note_count and started_by. 2. extract_key_decisions: track consumed notes so the same note is not matched to multiple events, preventing duplicate decision entries. 3. build_timeline_excerpt_from_pipeline: log tracing::warn on seed/collect failures instead of silently returning empty timeline.	2026-03-10 16:43:06 -04:00
teernisse	b2811b5e45	fix(fts): remove NEAR from infix operator list NEAR is an FTS5 function (NEAR(term1 term2, N)), not an infix operator like AND/OR/NOT. Passing it through unquoted in Safe mode was incorrect - it would be treated as a literal term rather than a function call. Users who need NEAR proximity search should use FtsQueryMode::Raw which passes the query through verbatim to FTS5. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:59 -05:00
Taylor Eernisse	ebf64816c9	fix(search): correct FTS5 raw mode fallback test assertion Update test_raw_mode_leading_wildcard_falls_back_to_safe to match the actual Safe mode behavior: OR is a recognized FTS5 boolean operator and passes through unquoted, so the expected output is '"*" OR "auth"' not '"*" "OR" "auth"'. The previous assertion was incorrect since the Safe mode operator-passthrough logic was added. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 22:34:01 -05:00
teernisse	59f65b127a	fix(search): pass FTS5 boolean operators through unquoted FTS5 boolean operators (AND, OR, NOT, NEAR) are case-sensitive uppercase keywords that must appear unquoted in the query string. Previously, the user-friendly query builder would double-quote every token, causing queries like "switch AND health" to search for the literal word "AND" instead of using it as a boolean conjunction. Adds a FTS5_OPERATORS constant and checks each token against it before quoting, allowing natural boolean search syntax to work as expected. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 14:56:29 -05:00
Taylor Eernisse	8cf14fb69b	feat(search): sanitize raw FTS5 queries with safe fallback Add input validation for Raw FTS query mode to prevent expensive or malformed queries from reaching SQLite FTS5: - Reject unbalanced double quotes (would cause FTS5 syntax error) - Reject leading wildcard-only queries ("*", "* OR ...") that trigger expensive full-table scans - Reject empty/whitespace-only queries - Invalid raw input falls back to Safe mode automatically instead of erroring, so callers never see FTS5 parse failures The Safe mode already escapes all tokens with double-quote wrapping and handles embedded quotes via doubling. Raw mode now has a validation layer on top. All queries remain parameterized (?1, ?2); user input never enters SQL strings directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:42:17 -05:00
Taylor Eernisse	3e9cf2358e	perf(search+embed): zero-copy embedding API and deferred RRF mapping Change OllamaClient::embed_batch to accept &[&str] instead of Vec<String>. The EmbedRequest struct now borrows both model name and input texts, eliminating per-batch cloning of chunk text (up to 32KB per chunk x 32 chunks per batch). Serialization output is identical since serde serializes &str and String to the same JSON. In hybrid search, defer the RrfResult->HybridResult mapping until after filter+take, so only `limit` items (typically 20) are constructed instead of up to 1,500 at RECALL_CAP. Also switch filtered_ids to into_iter() to avoid an extra .copied() pass. Switch FTS search_fts from prepare() to prepare_cached() for statement reuse across repeated searches. Benchmarked at ~1.6x faster. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 17:35:53 -05:00
Taylor Eernisse	72f1cafdcf	perf: Optimize SQL queries and reduce allocations in hot paths Change detection queries (embedding/change_detector.rs): - Replace triple-EXISTS subquery pattern with LEFT JOIN + NULL check - SQLite now scans embedding_metadata once instead of three times - Semantically identical: returns docs needing embedding when no embedding exists, hash changed, or config mismatch Count queries (cli/commands/count.rs): - Consolidate 3 separate COUNT queries for issues into single query using conditional aggregation (CASE WHEN state = 'x' THEN 1) - Same optimization for MRs: 5 queries reduced to 1 Search filter queries (search/filters.rs): - Replace N separate EXISTS clauses for label filtering with single IN() clause with COUNT/GROUP BY HAVING pattern - For multi-label AND queries, this reduces N subqueries to 1 FTS tokenization (search/fts.rs): - Replace collect-into-Vec-then-join pattern with direct String building - Pre-allocate capacity hint for result string Discussion truncation (documents/truncation.rs): - Calculate total length without allocating concatenated string first - Only allocate full string when we know it fits within limit Embedding pipeline (embedding/pipeline.rs): - Add Vec::with_capacity hints for chunk work and cleared_docs hashset - Reduces reallocations during embedding batch processing Backoff calculation (core/backoff.rs): - Replace unchecked addition with saturating_add to prevent overflow - Add test case verifying overflow protection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 11:21:28 -05:00
Taylor Eernisse	65583ed5d6	refactor: Remove redundant doc comments throughout codebase Removes module-level doc comments (//! lines) and excessive inline doc comments that were duplicating information already evident from: - Function/struct names (self-documenting code) - Type signatures (the what is clear from types) - Implementation context (the how is clear from code) Affected modules: - cli/* - Removed command descriptions duplicating clap help text - core/* - Removed module headers and obvious function docs - documents/* - Removed extractor/regenerator/truncation docs - embedding/* - Removed pipeline and chunking docs - gitlab/* - Removed client and transformer docs (kept type definitions) - ingestion/* - Removed orchestrator and ingestion docs - search/* - Removed FTS and vector search docs Philosophy: Code should be self-documenting. Comments should explain "why" (business decisions, non-obvious constraints) not "what" (which the code itself shows). This change reduces noise and maintenance burden while keeping the codebase just as understandable. Retains comments for: - Non-obvious business logic - Important safety invariants - Complex algorithm explanations - Public API boundaries where generated docs matter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:32 -05:00
Taylor Eernisse	a50fc78823	style: Apply cargo fmt and clippy fixes across codebase Automated formatting and lint corrections from parallel agent work: - cargo fmt: import reordering (alphabetical), line wrapping to respect max width, trailing comma normalization, destructuring alignment, function signature reformatting, match arm formatting - clippy (pedantic): Range::contains() instead of manual comparisons, i64::from() instead of `as i64` casts, .clamp() instead of .max().min() chains, let-chain refactors (if-let with &&), #[allow(clippy::too_many_arguments)] and #[allow(clippy::field_reassign_with_default)] where warranted - Removed trailing blank lines and extra whitespace No behavioral changes. All existing tests pass unmodified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:59 -05:00
Taylor Eernisse	d5bdb24b0f	feat(search): Add hybrid search engine with FTS5, vector, and RRF fusion Implements the search module providing three search modes: - Lexical (FTS5): Full-text search using SQLite FTS5 with safe query sanitization. User queries are automatically tokenized and wrapped in proper FTS5 syntax. Supports a "raw" mode for power users who want direct FTS5 query syntax (NEAR, column filters, etc.). - Semantic (vector): Embeds the search query via Ollama, then performs cosine similarity search against stored document embeddings. Results are deduplicated by doc_id since documents may have multiple chunks. - Hybrid (default): Executes both lexical and semantic searches in parallel, then fuses results using Reciprocal Rank Fusion (RRF) with k=60. This avoids the complexity of score normalization while producing high-quality merged rankings. Gracefully degrades to lexical-only when embeddings are unavailable. Additional components: - search::filters: Post-retrieval filtering by source_type, author, project, labels (AND logic), file path prefix, created_after, and updated_after. Date filters accept relative formats (7d, 2w) and ISO dates. - search::rrf: Reciprocal Rank Fusion implementation with configurable k parameter and optional explain mode that annotates each result with its component ranks and fusion score breakdown. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:42 -05:00

10 Commits
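
The query-building behavior these commits describe (Safe mode quoting every token except the boolean operators, doubling embedded quotes, and invalid Raw input falling back to Safe mode) can be sketched roughly as below. This is an illustration reconstructed from the commit messages only, not the repository's code: `FtsQueryMode` and `FTS5_OPERATORS` are named in the log, but the function names `to_safe_query`, `is_valid_raw`, and `to_fts_query`, and the exact validation rules, are assumptions.

```rust
// Sketch only: names other than FtsQueryMode and FTS5_OPERATORS are hypothetical.

/// Infix operators passed through unquoted in Safe mode.
/// NEAR is omitted, per commit b2811b5e45 (it is a function, not an infix operator).
const FTS5_OPERATORS: &[&str] = &["AND", "OR", "NOT"];

#[derive(Clone, Copy, Debug, PartialEq)]
enum FtsQueryMode {
    Safe,
    Raw,
}

/// Safe mode: wrap each whitespace-separated token in double quotes,
/// doubling any embedded quotes, but leave boolean operators bare.
fn to_safe_query(input: &str) -> String {
    // Build the string directly with a capacity hint instead of collect + join.
    let mut out = String::with_capacity(input.len() + 8);
    for token in input.split_whitespace() {
        if !out.is_empty() {
            out.push(' ');
        }
        if FTS5_OPERATORS.contains(&token) {
            out.push_str(token);
        } else {
            out.push('"');
            out.push_str(&token.replace('"', "\"\""));
            out.push('"');
        }
    }
    out
}

/// Assumed Raw-mode checks: non-empty, balanced double quotes,
/// and no bare leading wildcard token.
fn is_valid_raw(input: &str) -> bool {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return false;
    }
    if trimmed.matches('"').count() % 2 != 0 {
        return false;
    }
    trimmed.split_whitespace().next() != Some("*")
}

/// Raw input is passed through verbatim when valid; anything else
/// falls back to Safe mode rather than returning an error.
fn to_fts_query(input: &str, mode: FtsQueryMode) -> String {
    match mode {
        FtsQueryMode::Raw if is_valid_raw(input) => input.trim().to_string(),
        _ => to_safe_query(input),
    }
}
```

Under these assumptions, `to_fts_query("switch AND health", FtsQueryMode::Safe)` keeps `AND` as a conjunction, while a Raw query beginning with a bare `*` is rewritten through the Safe path instead of reaching FTS5. The resulting string would still be bound as a query parameter rather than spliced into SQL.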