gitlore

Author	SHA1	Message	Date
teernisse	9107a78b57	perf(ingestion): replace per-row INSERT loops with chunked batch INSERTs. The issue and MR ingestion paths previously inserted labels, assignees, and reviewers one row at a time inside a transaction. For entities with many labels or assignees, this issued N separate SQLite statements where a single multi-row INSERT suffices. Replace the per-row loops with batch INSERT functions that build a single `INSERT OR IGNORE ... VALUES (?1,?2),(?1,?3),...` statement per chunk. Chunks are capped at 400 rows (BATCH_LINK_ROWS_MAX) to stay comfortably below SQLite's default 999 bind-parameter limit. Affected paths: issues.rs (link_issue_labels_batch_tx, insert_issue_assignees_batch_tx); merge_requests.rs (insert_mr_labels_batch_tx, insert_mr_assignees_batch_tx, insert_mr_reviewers_batch_tx). New tests verify deduplication (OR IGNORE), multi-chunk correctness, and equivalence with the old per-row approach. A perf benchmark (bench_issue_assignee_insert_individual_vs_batch) demonstrates the speedup across representative assignee set sizes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 13:36:26 -05:00
Taylor Eernisse	a573d695d5	test(perf): add benchmarks for hash query elimination and embed bytes. Two new microbenchmarks measure optimizations applied in this session. bench_redundant_hash_query_elimination compares the old 2-query pattern (get_existing_hash + full SELECT) against the new single-query pattern where upsert_document_inner returns change detection info directly; it uses 100 seeded documents with 10K iterations, prepare_cached, and black_box to prevent elision. bench_embedding_bytes_alloc_vs_reuse compares per-call Vec<u8> allocation against the reusable embed_buf pattern now used in store_embedding; it simulates 768-dim embeddings (nomic-embed-text) with 50K iterations and includes a correctness assertion that both approaches produce identical byte output. Both benchmarks use informational-only timing (no pass/fail on speed) with correctness assertions as the actual test criteria, ensuring they never flake on CI. Notes recorded in benchmark file: SHA256 hex formatting optimization measured at 1.01x (reverted); compute_list_hash sort strategy measured at 1.02x (reverted). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 22:43:11 -05:00
Taylor Eernisse	f1cb45a168	style: format perf_benchmark.rs with cargo fmt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-06 08:49:53 -05:00
Taylor Eernisse	e8845380e9	test: add performance regression benchmarks. Add tests/perf_benchmark.rs with three side-by-side benchmarks that compare old vs new approaches for the optimizations introduced in the preceding commits: bench_label_insert_individual_vs_batch measures N individual INSERTs vs a single multi-row INSERT (5k iterations, ~1.6x speedup); bench_string_building_old_vs_new measures format!+push_str vs writeln! (50k iterations, ~1.9x speedup); bench_prepare_vs_prepare_cached measures prepare vs prepare_cached (10k iterations, ~1.6x speedup). Each benchmark verifies correctness (both approaches produce identical output) and uses std::hint::black_box to prevent dead-code elimination. Run with `cargo test --test perf_benchmark -- --nocapture`. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-05 17:36:01 -05:00

4 Commits
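
The chunked batch INSERT described in commit 9107a78b57 can be sketched roughly as below. This is a minimal illustration, not the repository's code: it only builds the SQL text for each chunk, and the function name build_batch_link_sql, its parameters, and the table and column names in the usage note are hypothetical. Only the `INSERT OR IGNORE ... VALUES (?1,?2),(?1,?3),...` shape and the 400-row cap (BATCH_LINK_ROWS_MAX) come from the commit message.

```rust
/// Row cap per statement; the value 400 is taken from the commit message.
const BATCH_LINK_ROWS_MAX: usize = 400;

/// Hypothetical helper: returns one (sql, row_count) pair per chunk.
/// `?1` is the shared parent id; `?2..` are the child ids of that chunk,
/// so a chunk of N rows binds N + 1 parameters.
fn build_batch_link_sql(
    table: &str,
    parent_col: &str,
    child_col: &str,
    child_count: usize,
) -> Vec<(String, usize)> {
    let mut statements = Vec::new();
    let mut remaining = child_count;
    while remaining > 0 {
        // Take at most BATCH_LINK_ROWS_MAX rows for this statement.
        let rows = remaining.min(BATCH_LINK_ROWS_MAX);
        // Produces "(?1,?2),(?1,?3),..." with one tuple per row.
        let placeholders: Vec<String> =
            (0..rows).map(|i| format!("(?1,?{})", i + 2)).collect();
        let sql = format!(
            "INSERT OR IGNORE INTO {table} ({parent_col}, {child_col}) VALUES {}",
            placeholders.join(",")
        );
        statements.push((sql, rows));
        remaining -= rows;
    }
    statements
}
```

Calling build_batch_link_sql("issue_labels", "issue_id", "label_id", 850) under these assumptions yields three statements covering 400, 400, and 50 rows. Each statement would then be executed with the parent id followed by that chunk's child ids as bind values.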