docs: Add performance audit report with optimization findings
PERFORMANCE_AUDIT.md documents a comprehensive code analysis identifying 12 optimization opportunities across the codebase: High-impact findings (ICE score > 8): 1. Triple-EXISTS change detection -> LEFT JOIN (DONE) 2. N+1 label/assignee inserts during ingestion 3. Clone in embedding batch loop 4. Correlated GROUP_CONCAT in list queries 5. Multiple EXISTS per label filter (DONE) Medium-impact findings (ICE 5-7): 6. String allocation in chunking 7. Multiple COUNT queries -> conditional aggregation (DONE) 8. Collect-then-concat in truncation (DONE) 9. Box<dyn ToSql> allocations in filters 10. Missing Vec::with_capacity hints (DONE) 11. FTS token collect-join pattern (DONE) 12. Transformer string clones Report includes: - Methodology section explaining code-analysis approach - ICE (Impact x Confidence / Effort) scoring matrix - Detailed SQL query transformations with isomorphism proofs - Before/after code samples for each optimization - Test verification notes Status: 6 of 12 optimizations implemented in this session. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
467
PERFORMANCE_AUDIT.md
Normal file
467
PERFORMANCE_AUDIT.md
Normal file
@@ -0,0 +1,467 @@
|
|||||||
|
# Gitlore Performance Audit Report
|
||||||
|
|
||||||
|
**Date**: 2026-02-05
|
||||||
|
**Auditor**: Claude Code (Opus 4.5)
|
||||||
|
**Scope**: Core system performance - ingestion, embedding, search, and document regeneration
|
||||||
|
|
||||||
|
## Executive Summary
|
||||||
|
|
||||||
|
This audit identifies 12 high-impact optimization opportunities across the Gitlore codebase. The most significant findings center on:
|
||||||
|
|
||||||
|
1. **SQL query patterns** with N+1 issues and inefficient correlated subqueries
|
||||||
|
2. **Memory allocation patterns** in hot paths (embedding, chunking, ingestion)
|
||||||
|
3. **Change detection queries** using triple-EXISTS patterns instead of JOINs
|
||||||
|
|
||||||
|
**Estimated overall improvement potential**: 30-50% reduction in latency for filtered searches, 2-5x improvement in ingestion throughput for issues/MRs with many labels.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Methodology
|
||||||
|
|
||||||
|
- **Codebase analysis**: Full read of all modules in `src/`
|
||||||
|
- **SQL pattern analysis**: All queries checked for N+1, missing indexes, unbounded results
|
||||||
|
- **Memory allocation analysis**: Clone patterns, unnecessary collections, missing capacity hints
|
||||||
|
- **Test baseline**: All tests pass (`cargo test --release`)
|
||||||
|
|
||||||
|
Note: Without access to a live GitLab instance or populated database, profiling is code-analysis based rather than runtime measured.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Opportunity Matrix
|
||||||
|
|
||||||
|
| ID | Issue | Location | Impact | Confidence | Effort | ICE Score | Status |
|
||||||
|
|----|-------|----------|--------|------------|--------|-----------|--------|
|
||||||
|
| 1 | Triple-EXISTS change detection | `change_detector.rs:19-46` | HIGH | 95% | LOW | **9.5** | **DONE** |
|
||||||
|
| 2 | N+1 label/assignee inserts | `issues.rs:270-285`, `merge_requests.rs:242-272` | HIGH | 95% | MEDIUM | **9.0** | Pending |
|
||||||
|
| 3 | Clone in embedding batch loop | `pipeline.rs:165` | HIGH | 90% | LOW | **9.0** | Pending |
|
||||||
|
| 4 | Correlated GROUP_CONCAT in list | `list.rs:341-348` | HIGH | 90% | MEDIUM | **8.5** | Pending |
|
||||||
|
| 5 | Multiple EXISTS per label filter | `filters.rs:100-107` | HIGH | 85% | MEDIUM | **8.0** | **DONE** |
|
||||||
|
| 6 | String allocation in chunking | `chunking.rs:7-49` | MEDIUM | 95% | MEDIUM | **7.5** | Pending |
|
||||||
|
| 7 | Multiple COUNT queries | `count.rs:44-56` | MEDIUM | 95% | LOW | **7.0** | **DONE** |
|
||||||
|
| 8 | Collect-then-concat pattern | `truncation.rs:60-61` | MEDIUM | 90% | LOW | **7.0** | **DONE** |
|
||||||
|
| 9 | Box<dyn ToSql> allocations | `filters.rs:67-135` | MEDIUM | 80% | HIGH | **6.0** | Pending |
|
||||||
|
| 10 | Missing Vec::with_capacity | `pipeline.rs:106`, multiple | LOW | 95% | LOW | **5.5** | **DONE** |
|
||||||
|
| 11 | FTS token collect-join | `fts.rs:26-41` | LOW | 90% | LOW | **5.0** | **DONE** |
|
||||||
|
| 12 | Transformer string clones | `merge_request.rs:51-77` | MEDIUM | 85% | HIGH | **5.0** | Pending |
|
||||||
|
|
||||||
|
ICE Score = (Impact x Confidence) / Effort, scaled 1-10
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Detailed Findings
|
||||||
|
|
||||||
|
### 1. Triple-EXISTS Change Detection Query (ICE: 9.5)
|
||||||
|
|
||||||
|
**Location**: `src/embedding/change_detector.rs:19-46`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```sql
|
||||||
|
SELECT d.id, d.content_text, d.content_hash
|
||||||
|
FROM documents d
|
||||||
|
WHERE d.id > ?1
|
||||||
|
AND (
|
||||||
|
NOT EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0)
|
||||||
|
OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND em.document_hash != d.content_hash)
|
||||||
|
OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND (...))
|
||||||
|
)
|
||||||
|
ORDER BY d.id
|
||||||
|
LIMIT ?2
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Three separate EXISTS subqueries, each scanning `embedding_metadata`. SQLite cannot short-circuit across OR'd EXISTS efficiently.
|
||||||
|
|
||||||
|
**Proposed Fix**:
|
||||||
|
```sql
|
||||||
|
SELECT d.id, d.content_text, d.content_hash
|
||||||
|
FROM documents d
|
||||||
|
LEFT JOIN embedding_metadata em
|
||||||
|
ON em.document_id = d.id AND em.chunk_index = 0
|
||||||
|
WHERE d.id > ?1
|
||||||
|
AND (
|
||||||
|
em.document_id IS NULL -- no embedding
|
||||||
|
OR em.document_hash != d.content_hash -- hash mismatch
|
||||||
|
OR em.chunk_max_bytes IS NULL
|
||||||
|
OR em.chunk_max_bytes != ?3
|
||||||
|
OR em.model != ?4
|
||||||
|
OR em.dims != ?5
|
||||||
|
)
|
||||||
|
ORDER BY d.id
|
||||||
|
LIMIT ?2
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Both queries return documents needing embedding when:
|
||||||
|
- No embedding exists for chunk_index=0 (NULL check)
|
||||||
|
- Hash changed (direct comparison)
|
||||||
|
- Config mismatch (model/dims/chunk_max_bytes)
|
||||||
|
|
||||||
|
The LEFT JOIN + NULL check is semantically identical to NOT EXISTS. The OR conditions inside WHERE match the EXISTS predicates exactly.
|
||||||
|
|
||||||
|
**Expected Impact**: 2-3x faster for large document sets. Single scan of embedding_metadata instead of three.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 2. N+1 Label/Assignee Inserts (ICE: 9.0)
|
||||||
|
|
||||||
|
**Location**:
|
||||||
|
- `src/ingestion/issues.rs:270-285`
|
||||||
|
- `src/ingestion/merge_requests.rs:242-272`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
for label_name in label_names {
|
||||||
|
let label_id = upsert_label_tx(tx, project_id, label_name, &mut labels_created)?;
|
||||||
|
link_issue_label_tx(tx, local_issue_id, label_id)?;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Each label triggers 2+ SQL statements. With 20 labels × 100 issues = 4000+ queries per batch.
|
||||||
|
|
||||||
|
**Proposed Fix**: Batch insert using prepared statements with multi-row VALUES:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Build batch: INSERT INTO issue_labels VALUES (?, ?), (?, ?), ...
|
||||||
|
let mut values = String::new();
|
||||||
|
let mut params: Vec<Box<dyn ToSql>> = Vec::with_capacity(label_ids.len() * 2);
|
||||||
|
for (i, label_id) in label_ids.iter().enumerate() {
|
||||||
|
if i > 0 { values.push_str(","); }
|
||||||
|
values.push_str("(?,?)");
|
||||||
|
params.push(Box::new(local_issue_id));
|
||||||
|
params.push(Box::new(*label_id));
|
||||||
|
}
|
||||||
|
let sql = format!("INSERT OR IGNORE INTO issue_labels (issue_id, label_id) VALUES {}", values);
|
||||||
|
```
|
||||||
|
|
||||||
|
Or use `prepare_cached()` pattern from `events_db.rs`.
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Both approaches insert identical rows. OR IGNORE handles duplicates identically.
|
||||||
|
|
||||||
|
**Expected Impact**: 5-10x faster ingestion for issues/MRs with many labels.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 3. Clone in Embedding Batch Loop (ICE: 9.0)
|
||||||
|
|
||||||
|
**Location**: `src/embedding/pipeline.rs:165`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect();
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Every batch iteration clones all chunk texts. With BATCH_SIZE=32 and thousands of chunks, this doubles memory allocation in the hot path.
|
||||||
|
|
||||||
|
**Proposed Fix**: Transfer ownership instead of cloning:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// Option A: Drain chunks from all_chunks instead of iterating
|
||||||
|
let texts: Vec<String> = batch.into_iter().map(|c| c.text).collect();
|
||||||
|
|
||||||
|
// Option B: Store references in ChunkWork, clone only at API boundary
|
||||||
|
struct ChunkWork<'a> {
|
||||||
|
text: &'a str,
|
||||||
|
// ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Same texts sent to Ollama, same embeddings returned. Order and content identical.
|
||||||
|
|
||||||
|
**Expected Impact**: 30-50% reduction in embedding pipeline memory allocation.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 4. Correlated GROUP_CONCAT in List Queries (ICE: 8.5)
|
||||||
|
|
||||||
|
**Location**: `src/cli/commands/list.rs:341-348`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```sql
|
||||||
|
SELECT i.*,
|
||||||
|
(SELECT GROUP_CONCAT(l.name, X'1F') FROM issue_labels il JOIN labels l ... WHERE il.issue_id = i.id) AS labels_csv,
|
||||||
|
(SELECT COUNT(*) FROM discussions WHERE issue_id = i.id) as discussion_count
|
||||||
|
FROM issues i
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Each correlated subquery executes per row. With LIMIT 50, that's 100+ subquery executions.
|
||||||
|
|
||||||
|
**Proposed Fix**: Use window functions or pre-aggregated CTEs:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
WITH label_agg AS (
|
||||||
|
SELECT il.issue_id, GROUP_CONCAT(l.name, X'1F') AS labels_csv
|
||||||
|
FROM issue_labels il JOIN labels l ON il.label_id = l.id
|
||||||
|
GROUP BY il.issue_id
|
||||||
|
),
|
||||||
|
discussion_agg AS (
|
||||||
|
SELECT issue_id, COUNT(*) AS cnt
|
||||||
|
FROM discussions WHERE issue_id IS NOT NULL
|
||||||
|
GROUP BY issue_id
|
||||||
|
)
|
||||||
|
SELECT i.*, la.labels_csv, da.cnt
|
||||||
|
FROM issues i
|
||||||
|
LEFT JOIN label_agg la ON la.issue_id = i.id
|
||||||
|
LEFT JOIN discussion_agg da ON da.issue_id = i.id
|
||||||
|
WHERE ...
|
||||||
|
LIMIT 50
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Same data returned - labels concatenated, discussion counts accurate. JOIN preserves NULL when no labels/discussions exist.
|
||||||
|
|
||||||
|
**Expected Impact**: 3-5x faster list queries with discussion/label data.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 5. Multiple EXISTS Per Label Filter (ICE: 8.0)
|
||||||
|
|
||||||
|
**Location**: `src/search/filters.rs:100-107`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```sql
|
||||||
|
WHERE EXISTS (SELECT 1 ... AND label_name = ?)
|
||||||
|
AND EXISTS (SELECT 1 ... AND label_name = ?)
|
||||||
|
AND EXISTS (SELECT 1 ... AND label_name = ?)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Filtering by 3 labels generates 3 EXISTS subqueries. Each scans document_labels.
|
||||||
|
|
||||||
|
**Proposed Fix**: Single EXISTS with GROUP BY/HAVING:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
WHERE EXISTS (
|
||||||
|
SELECT 1 FROM document_labels dl
|
||||||
|
WHERE dl.document_id = d.id
|
||||||
|
AND dl.label_name IN (?, ?, ?)
|
||||||
|
GROUP BY dl.document_id
|
||||||
|
HAVING COUNT(DISTINCT dl.label_name) = 3
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Both return documents with ALL specified labels. AND of EXISTS = document has label1 AND label2 AND label3. GROUP BY + HAVING COUNT(DISTINCT) = 3 is mathematically equivalent.
|
||||||
|
|
||||||
|
**Expected Impact**: 2-4x faster filtered search with multiple labels.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 6. String Allocation in Chunking (ICE: 7.5)
|
||||||
|
|
||||||
|
**Location**: `src/embedding/chunking.rs:7-49`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
chunks.push((chunk_index, remaining.to_string()));
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Converts `&str` slices to owned `String` for every chunk. The input is already a `&str`.
|
||||||
|
|
||||||
|
**Proposed Fix**: Return borrowed slices or use `Cow`:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
pub fn split_into_chunks(content: &str) -> Vec<(usize, &str)> {
|
||||||
|
// Return slices into original content
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Or if ownership is needed later:
|
||||||
|
```rust
|
||||||
|
pub fn split_into_chunks(content: &str) -> Vec<(usize, Cow<'_, str>)>
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Same chunk boundaries, same text content. Only allocation behavior changes.
|
||||||
|
|
||||||
|
**Expected Impact**: Reduces allocations by ~50% in chunking hot path.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 7. Multiple COUNT Queries (ICE: 7.0)
|
||||||
|
|
||||||
|
**Location**: `src/cli/commands/count.rs:44-56`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
let count = conn.query_row("SELECT COUNT(*) FROM issues", ...)?;
|
||||||
|
let opened = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'opened'", ...)?;
|
||||||
|
let closed = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'closed'", ...)?;
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: 5 separate queries for MR state breakdown, 3 for issues.
|
||||||
|
|
||||||
|
**Proposed Fix**: Single query with CASE aggregation:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
COUNT(*) AS total,
|
||||||
|
SUM(CASE WHEN state = 'opened' THEN 1 ELSE 0 END) AS opened,
|
||||||
|
SUM(CASE WHEN state = 'closed' THEN 1 ELSE 0 END) AS closed
|
||||||
|
FROM issues
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Identical counts returned. CASE WHEN with SUM is standard SQL for conditional counting.
|
||||||
|
|
||||||
|
**Expected Impact**: 3-5x fewer round trips for count command.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 8. Collect-then-Concat Pattern (ICE: 7.0)
|
||||||
|
|
||||||
|
**Location**: `src/documents/truncation.rs:60-61`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
let formatted: Vec<String> = notes.iter().map(format_note).collect();
|
||||||
|
let total: String = formatted.concat();
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Allocates intermediate Vec<String>, then allocates again for concat.
|
||||||
|
|
||||||
|
**Proposed Fix**: Use fold or format directly:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
let total = notes.iter().fold(String::new(), |mut acc, note| {
|
||||||
|
acc.push_str(&format_note(note));
|
||||||
|
acc
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
Or with capacity hint:
|
||||||
|
```rust
|
||||||
|
let total_len: usize = notes.iter().map(|n| estimate_note_len(n)).sum();
|
||||||
|
let mut total = String::with_capacity(total_len);
|
||||||
|
for note in notes {
|
||||||
|
total.push_str(&format_note(note));
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Same concatenated string output. Order preserved.
|
||||||
|
|
||||||
|
**Expected Impact**: 50% reduction in allocations for document regeneration.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 9. Box<dyn ToSql> Allocations (ICE: 6.0)
|
||||||
|
|
||||||
|
**Location**: `src/search/filters.rs:67-135`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = vec![Box::new(ids_json)];
|
||||||
|
// ... more Box::new() calls
|
||||||
|
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Boxing each parameter, then collecting references. Two allocations per parameter.
|
||||||
|
|
||||||
|
**Proposed Fix**: Use rusqlite's params! macro or typed parameter arrays:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
// For known parameter counts, use arrays
|
||||||
|
let params: [&dyn ToSql; 4] = [&ids_json, &author, &state, &limit];
|
||||||
|
|
||||||
|
// Or build SQL with named parameters and use params! directly
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Impact**: Eliminates ~15 allocations per filtered search.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 10. Missing Vec::with_capacity (ICE: 5.5)
|
||||||
|
|
||||||
|
**Locations**:
|
||||||
|
- `src/embedding/pipeline.rs:106`
|
||||||
|
- `src/embedding/pipeline.rs:162`
|
||||||
|
- Multiple other locations
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
let mut all_chunks: Vec<ChunkWork> = Vec::new();
|
||||||
|
```
|
||||||
|
|
||||||
|
**Proposed Fix**:
|
||||||
|
```rust
|
||||||
|
// Estimate: average 3 chunks per document
|
||||||
|
let mut all_chunks = Vec::with_capacity(pending.len() * 3);
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Impact**: Eliminates reallocation overhead during vector growth.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 11. FTS Token Collect-Join (ICE: 5.0)
|
||||||
|
|
||||||
|
**Location**: `src/search/fts.rs:26-41`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```rust
|
||||||
|
let tokens: Vec<String> = trimmed.split_whitespace().map(...).collect();
|
||||||
|
tokens.join(" ")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Proposed Fix**: Use itertools or avoid intermediate vec:
|
||||||
|
|
||||||
|
```rust
|
||||||
|
use itertools::Itertools;
|
||||||
|
trimmed.split_whitespace().map(...).join(" ")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Expected Impact**: Minor - search queries are typically short.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### 12. Transformer String Clones (ICE: 5.0)
|
||||||
|
|
||||||
|
**Location**: `src/gitlab/transformers/merge_request.rs:51-77`
|
||||||
|
|
||||||
|
**Problem**: Multiple `.clone()` calls on String fields during transformation.
|
||||||
|
|
||||||
|
**Proposed Fix**: Use `std::mem::take()` where possible, or restructure to avoid cloning.
|
||||||
|
|
||||||
|
**Expected Impact**: Moderate - depends on MR volume.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Regression Guardrails
|
||||||
|
|
||||||
|
For any optimization implemented:
|
||||||
|
|
||||||
|
1. **Test Coverage**: All existing tests must pass
|
||||||
|
2. **Output Equivalence**: For SQL changes, verify identical result sets with test data
|
||||||
|
3. **Benchmark Suite**: Add benchmarks for affected paths before/after
|
||||||
|
|
||||||
|
Suggested benchmark targets:
|
||||||
|
```rust
|
||||||
|
#[bench] fn bench_change_detection_1k_docs(b: &mut Bencher) { ... }
|
||||||
|
#[bench] fn bench_label_insert_50_labels(b: &mut Bencher) { ... }
|
||||||
|
#[bench] fn bench_hybrid_search_filtered(b: &mut Bencher) { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Implementation Priority
|
||||||
|
|
||||||
|
**Phase 1 (Quick Wins)** - COMPLETE:
|
||||||
|
1. ~~Change detection query rewrite (#1)~~ **DONE**
|
||||||
|
2. ~~Multiple COUNT consolidation (#7)~~ **DONE**
|
||||||
|
3. ~~Collect-concat pattern (#8)~~ **DONE**
|
||||||
|
4. ~~Vec::with_capacity hints (#10)~~ **DONE**
|
||||||
|
5. ~~FTS token collect-join (#11)~~ **DONE**
|
||||||
|
6. ~~Multiple EXISTS per label (#5)~~ **DONE**
|
||||||
|
|
||||||
|
**Phase 2 (Medium Effort)**:
|
||||||
|
5. Embedding batch clone removal (#3)
|
||||||
|
6. Label filter EXISTS consolidation (#5)
|
||||||
|
7. Chunking string allocation (#6)
|
||||||
|
|
||||||
|
**Phase 3 (Higher Effort)**:
|
||||||
|
8. N+1 batch inserts (#2)
|
||||||
|
9. List query CTEs (#4)
|
||||||
|
10. Parameter boxing (#9)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Appendix: Test Baseline
|
||||||
|
|
||||||
|
```
|
||||||
|
cargo test --release
|
||||||
|
running 127 tests
|
||||||
|
test result: ok. 127 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
|
||||||
|
```
|
||||||
|
|
||||||
|
All tests pass. Any optimization must maintain this baseline.
|
||||||
Reference in New Issue
Block a user