gitlore/src/embedding/ollama.rs at 5786d7f4b6f135d53a4ae78b4a781a2cc922bd15

Taylor Eernisse 3e9cf2358e perf(search+embed): zero-copy embedding API and deferred RRF mapping

Change OllamaClient::embed_batch to accept &[&str] instead of
Vec<String>. The EmbedRequest struct now borrows both model name and
input texts, eliminating per-batch cloning of chunk text (up to 32KB
per chunk x 32 chunks per batch). Serialization output is identical
since serde serializes &str and String to the same JSON.
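A minimal sketch of the borrowing shape described above, using only the standard library. The field names `model` and `input` and the helpers `build_request` and `borrow_chunks` are assumptions made for illustration, not the actual definitions in ollama.rs; the real struct would also derive serde's `Serialize`, which is left out here so the sketch has no dependencies.

```rust
/// Hypothetical request body. The field names are assumptions; the real
/// struct in ollama.rs may differ and would also derive `Serialize`.
#[derive(Debug)]
struct EmbedRequest<'a> {
    model: &'a str,
    input: &'a [&'a str],
}

/// Builds a request that borrows the model name and the slice of texts.
/// No string data is copied; only references are stored.
fn build_request<'a>(model: &'a str, texts: &'a [&'a str]) -> EmbedRequest<'a> {
    EmbedRequest { model, input: texts }
}

/// Collects borrowed views of owned chunk strings. This allocates one Vec
/// of (pointer, length) pairs but never clones the chunk text itself.
fn borrow_chunks(chunks: &[String]) -> Vec<&str> {
    chunks.iter().map(String::as_str).collect()
}
```

A caller that owns a `Vec<String>` of chunks would pass `&borrow_chunks(&chunks)` to `build_request`. Each `&str` in the request then points into the original chunk buffers, so the per-batch cost is a vector of pointers rather than a copy of every chunk.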

In hybrid search, defer the RrfResult->HybridResult mapping until
after filter+take, so only `limit` items (typically 20) are
constructed instead of up to 1,500 at RECALL_CAP. Also switch
filtered_ids to into_iter() to avoid an extra .copied() pass.
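The deferred mapping can be sketched with a lazy iterator chain. The field names on `RrfResult` and `HybridResult`, the `top_results` function, and the `HashSet` filter are hypothetical stand-ins for the real types in the search module; the point is only the ordering of `filter`, `take`, and `map`.

```rust
use std::collections::HashSet;

/// Hypothetical ranked item produced by reciprocal rank fusion.
struct RrfResult {
    document_id: i64,
    rrf_score: f64,
}

/// Hypothetical final result type returned to the caller.
#[derive(Debug, PartialEq)]
struct HybridResult {
    document_id: i64,
    score: f64,
}

/// Filters first, truncates to `limit`, and only then builds HybridResult
/// values, so at most `limit` of them are ever constructed.
fn top_results(
    ranked: Vec<RrfResult>,
    allowed: &HashSet<i64>,
    limit: usize,
) -> Vec<HybridResult> {
    ranked
        .into_iter() // consume the ranked list; no extra copying pass
        .filter(|r| allowed.contains(&r.document_id))
        .take(limit)
        .map(|r| HybridResult {
            document_id: r.document_id,
            score: r.rrf_score,
        })
        .collect()
}
```

Because iterator adapters are lazy, placing `map` after `take` means the conversion closure runs only for items that survive the filter and fall within the limit. Placing `map` first would produce the same output but construct a `HybridResult` for every ranked item that the chain visits.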

Switch FTS search_fts from prepare() to prepare_cached() for statement
reuse across repeated searches. Benchmarked at ~1.6x faster.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-05 17:35:53 -05:00

5.3 KiB