Two new microbenchmarks measuring optimizations applied in this session:
bench_redundant_hash_query_elimination:
Compares the old two-query pattern (get_existing_hash + full SELECT)
against the new single-query pattern, in which upsert_document_inner
returns change-detection info directly. Uses 100 seeded documents
with 10k iterations, prepare_cached, and black_box to prevent
elision.
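The single-lookup-with-change-detection idea can be sketched std-only, with a HashMap standing in for the SQLite table. The Change enum and upsert function here are illustrative names for this sketch, not the actual upsert_document_inner API:

```rust
use std::collections::HashMap;

/// What changed during an upsert -- a stand-in for the change-detection
/// info the real upsert is described as returning.
#[derive(Debug, PartialEq)]
enum Change {
    Inserted,
    Updated,
    Unchanged,
}

/// Single-lookup upsert: the entry API visits the map once and reports
/// what happened, instead of a separate "read existing hash" query
/// followed by a write.
fn upsert(docs: &mut HashMap<String, String>, id: &str, hash: &str) -> Change {
    match docs.entry(id.to_string()) {
        std::collections::hash_map::Entry::Vacant(v) => {
            v.insert(hash.to_string());
            Change::Inserted
        }
        std::collections::hash_map::Entry::Occupied(mut o) => {
            if o.get() == hash {
                Change::Unchanged
            } else {
                o.insert(hash.to_string());
                Change::Updated
            }
        }
    }
}

fn main() {
    let mut docs = HashMap::new();
    assert_eq!(upsert(&mut docs, "a", "h1"), Change::Inserted);
    assert_eq!(upsert(&mut docs, "a", "h1"), Change::Unchanged);
    assert_eq!(upsert(&mut docs, "a", "h2"), Change::Updated);
}
```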
bench_embedding_bytes_alloc_vs_reuse:
Compares per-call Vec<u8> allocation against the reusable embed_buf
pattern now used in store_embedding. Simulates 768-dim embeddings
(nomic-embed-text) with 50k iterations, and asserts that both
approaches produce identical byte output.
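A minimal sketch of the two serialization patterns being compared; the function names are illustrative, not the actual store_embedding code:

```rust
/// New pattern: serialize an f32 embedding to little-endian bytes into a
/// caller-owned buffer (the embed_buf idea), reusing its capacity.
fn embedding_to_bytes_reuse(embedding: &[f32], buf: &mut Vec<u8>) {
    buf.clear(); // keeps capacity from previous calls
    for &v in embedding {
        buf.extend_from_slice(&v.to_le_bytes());
    }
}

/// Old pattern: a fresh allocation on every call.
fn embedding_to_bytes_alloc(embedding: &[f32]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(embedding.len() * 4);
    for &v in embedding {
        buf.extend_from_slice(&v.to_le_bytes());
    }
    buf
}

fn main() {
    // 768-dim embedding, as for nomic-embed-text.
    let embedding: Vec<f32> = (0..768).map(|i| i as f32 * 0.5).collect();
    let mut reused = Vec::new();
    embedding_to_bytes_reuse(&embedding, &mut reused);
    // Correctness assertion mirroring the benchmark: identical byte output.
    assert_eq!(reused, embedding_to_bytes_alloc(&embedding));
    assert_eq!(reused.len(), 768 * 4);
}
```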
Both benchmarks use informational-only timing (no pass/fail on
speed), with correctness assertions as the actual test criteria, so
they never flake in CI.
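The timing-plus-assertion shape can be sketched like this (the summation workload is a placeholder, not one of the actual benchmarks):

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let input: Vec<u64> = (0..1_000).collect();

    // Time the "old" approach; black_box keeps the result observable so
    // the optimizer cannot elide the work.
    let t = Instant::now();
    let mut sum_fold = 0u64;
    for _ in 0..1_000 {
        sum_fold = input.iter().copied().fold(0, |a, b| a + b);
        black_box(sum_fold);
    }
    let fold_time = t.elapsed();

    // Time the "new" approach the same way.
    let t = Instant::now();
    let mut sum_iter = 0u64;
    for _ in 0..1_000 {
        sum_iter = input.iter().sum();
        black_box(sum_iter);
    }
    let iter_time = t.elapsed();

    // Timing is printed for information only -- no pass/fail on speed.
    println!("fold: {:?}, sum: {:?} (informational)", fold_time, iter_time);

    // The actual test criterion: both approaches agree.
    assert_eq!(sum_fold, sum_iter);
}
```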
Notes recorded in the benchmark file:
- SHA256 hex formatting optimization measured at 1.01x (reverted)
- compute_list_hash sort strategy measured at 1.02x (reverted)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests/perf_benchmark.rs with three side-by-side benchmarks that
compare old vs new approaches for the optimizations introduced in the
preceding commits:
- bench_label_insert_individual_vs_batch: measures N individual INSERTs
vs single multi-row INSERT (5k iterations, ~1.6x speedup)
- bench_string_building_old_vs_new: measures format!+push_str vs
writeln! (50k iterations, ~1.9x speedup)
- bench_prepare_vs_prepare_cached: measures prepare vs prepare_cached
(10k iterations, ~1.6x speedup)
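The batched half of the first comparison hinges on building one statement with a (?, ?) placeholder group per row. A std-only sketch of that SQL construction (table and column names are illustrative; real code must also respect SQLite's host-parameter limit):

```rust
/// Build a single multi-row INSERT with one (?, ?) group per row,
/// replacing N individual INSERT statements with one statement.
fn batch_insert_sql(table: &str, rows: usize) -> String {
    let mut sql = format!("INSERT INTO {table} (doc_id, label) VALUES ");
    for i in 0..rows {
        if i > 0 {
            sql.push_str(", ");
        }
        sql.push_str("(?, ?)");
    }
    sql
}

fn main() {
    let sql = batch_insert_sql("labels", 3);
    assert_eq!(
        sql,
        "INSERT INTO labels (doc_id, label) VALUES (?, ?), (?, ?), (?, ?)"
    );
}
```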
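The string-building comparison reduces to the following pattern (the builder functions and items are illustrative, not the benchmark's actual workload):

```rust
use std::fmt::Write; // brings writeln! for String targets

/// Old: format! allocates a temporary String per line, which is then
/// copied into the output buffer.
fn build_old(items: &[&str]) -> String {
    let mut out = String::new();
    for item in items {
        out.push_str(&format!("- {}\n", item));
    }
    out
}

/// New: writeln! formats directly into the existing buffer, skipping
/// the per-line temporary allocation.
fn build_new(items: &[&str]) -> String {
    let mut out = String::new();
    for item in items {
        writeln!(out, "- {}", item).expect("writing to a String cannot fail");
    }
    out
}

fn main() {
    let items = ["alpha", "beta", "gamma"];
    // Correctness check mirroring the benchmark: identical output.
    assert_eq!(build_old(&items), build_new(&items));
}
```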
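The prepare vs prepare_cached gap comes from paying the SQL parse/plan cost once per distinct statement instead of once per call. A toy model of that caching idea (this models the concept behind rusqlite's prepare_cached, not its implementation):

```rust
use std::collections::HashMap;

/// Toy statement cache: "compiling" a statement is counted so we can
/// observe how often the expensive step actually runs.
struct StmtCache {
    compiled: HashMap<String, String>,
    compile_count: usize,
}

impl StmtCache {
    fn new() -> Self {
        Self { compiled: HashMap::new(), compile_count: 0 }
    }

    /// Like prepare_cached: compile on first use, reuse afterwards.
    fn prepare_cached(&mut self, sql: &str) -> &String {
        if !self.compiled.contains_key(sql) {
            self.compile_count += 1; // stands in for the parse/plan cost
            self.compiled.insert(sql.to_string(), sql.to_uppercase());
        }
        &self.compiled[sql]
    }
}

fn main() {
    let mut cache = StmtCache::new();
    for _ in 0..10_000 {
        cache.prepare_cached("select 1");
    }
    // 10k calls, but the costly compile step ran only once.
    assert_eq!(cache.compile_count, 1);
}
```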
Each benchmark verifies correctness (both approaches produce identical
output) and uses std::hint::black_box to prevent dead-code
elimination. Run with: cargo test --test perf_benchmark -- --nocapture
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>