feat(embed): concurrent batching, UTF-8 safe chunking, right-sized chunks

Three fixes to the embedding pipeline:

1. Concurrent HTTP batching: fire EMBED_CONCURRENCY (2) Ollama requests
   in parallel via join_all, then write results serially to SQLite.
   ~2x throughput improvement on GPU-bound workloads.

2. UTF-8 boundary safety: all computed byte offsets in split_into_chunks
   (paragraph/sentence/word break finders + overlap advance) now use
   floor_char_boundary() to prevent panics on multi-byte characters
   like smart quotes and non-breaking spaces.

3. CHUNK_MAX_BYTES reduced from 6000 to 1500 to fit nomic-embed-text's
   actual 2048-token context window, eliminating context-length retry
   storms that were causing 10x slowdowns.
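A sketch of the batching pattern in (1), using `futures::future::join_all`. The `embed_batch`/`embed_all` helpers here are hypothetical stand-ins for the pipeline's real functions; only `EMBED_CONCURRENCY` and the fan-out/serial-write shape come from this commit:

```rust
use futures::future::join_all;

const EMBED_CONCURRENCY: usize = 2;

// Stand-in for the HTTP call to Ollama; the real function issues a
// request and returns one embedding vector per input text.
async fn embed_batch(batch: Vec<String>) -> Result<Vec<Vec<f32>>, String> {
    Ok(batch.iter().map(|_| vec![0.0f32; 768]).collect())
}

async fn embed_all(batches: Vec<Vec<String>>) -> Result<(), String> {
    for window in batches.chunks(EMBED_CONCURRENCY) {
        // Fire up to EMBED_CONCURRENCY requests concurrently...
        let results = join_all(window.iter().cloned().map(embed_batch)).await;
        // ...then drain the results serially, which is where the real
        // pipeline does its single-writer SQLite inserts.
        for r in results {
            let _embeddings = r?;
        }
    }
    Ok(())
}
```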
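For (2): `str::floor_char_boundary` is still an unstable standard-library API (nightly-only), so a stable-Rust equivalent of the helper — assumed to behave like what the pipeline uses — looks like:

```rust
// Clamp a byte index down to the nearest UTF-8 character boundary,
// so slicing at a computed offset can never panic mid-character.
fn floor_char_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    // A UTF-8 character is at most 4 bytes, so this walks back at
    // most 3 positions before hitting a boundary.
    let mut i = index;
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

A smart quote like `U+201C` is three bytes in UTF-8, so an offset landing inside it gets pulled back to the quote's first byte.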
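The sizing math behind (3), with assumed constant names: tokenizer byte-to-token ratios vary with the input, and in roughly the worst case (dense Unicode, byte-level fallback) a chunk can cost about one token per byte. Under that assumption 1500 bytes always fits a 2048-token window, while 6000 bytes could blow past it and trigger retries:

```rust
// Assumed names; only the 1500/6000 limits and the 2048-token
// window come from the commit itself.
const CHUNK_MAX_BYTES: usize = 1500;       // new limit
const OLD_CHUNK_MAX_BYTES: usize = 6000;   // previous limit
const NOMIC_CONTEXT_TOKENS: usize = 2048;  // nomic-embed-text window

// Rough worst case: about one token per byte of input.
fn worst_case_tokens(bytes: usize) -> usize {
    bytes
}
```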

Also threads ShutdownSignal through embed pipeline for graceful Ctrl+C.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Author: Taylor Eernisse
Date:   2026-02-06 14:48:34 -05:00
Commit: 39cb0cb087 (parent 1c45725cba)
5 changed files with 199 additions and 115 deletions


@@ -1517,7 +1517,15 @@ async fn handle_embed(
     let config = Config::load(config_override)?;
     let full = args.full && !args.no_full;
     let retry_failed = args.retry_failed && !args.no_retry_failed;
-    let result = run_embed(&config, full, retry_failed, None).await?;
+    let signal = ShutdownSignal::new();
+    let signal_for_handler = signal.clone();
+    tokio::spawn(async move {
+        let _ = tokio::signal::ctrl_c().await;
+        signal_for_handler.cancel();
+    });
+    let result = run_embed(&config, full, retry_failed, None, &signal).await?;
     if robot_mode {
         print_embed_json(&result);
     } else {
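The `ShutdownSignal` type itself is not shown in this hunk; a minimal sketch of the shape the call sites imply (cheaply clonable, with `new`/`cancel`, plus a flag the pipeline can poll between batches — the `is_cancelled` name is an assumption) could be:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Clonable cancellation flag: the Ctrl+C handler sets it, and the
// embed loop checks it between batches to exit gracefully.
#[derive(Clone)]
struct ShutdownSignal(Arc<AtomicBool>);

impl ShutdownSignal {
    fn new() -> Self {
        ShutdownSignal(Arc::new(AtomicBool::new(false)))
    }
    fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}
```

Because clones share the same `Arc`, cancelling via the clone handed to the signal handler is observed by the original held by the pipeline.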