gitlore/docs/performance-audit-2026-02-12.md
teernisse acc5e12e3d perf: force partial index for DiffNote queries, batch stats counts
Query optimizer fixes for the `who` and `stats` commands based on
a systematic performance audit of the SQLite query plans.

who command (expert/reviews/detail modes):
- Add INDEXED BY idx_notes_diffnote_path_created hints to all DiffNote
  queries. SQLite's planner was selecting idx_notes_system (38% of rows)
  over the far more selective partial index (9.3% of rows). Measured
  50-133x speedup on expert queries, 26x on reviews queries.
- Reorder JOIN clauses in detail mode's MR-author sub-select to match
  the index scan direction (notes -> discussions -> merge_requests).

stats command:
- Replace 12+ sequential COUNT(*) queries with conditional aggregates
  (COALESCE + SUM + CASE). Documents, dirty_sources,
  pending_discussion_fetches, and pending_dependent_fetches tables each
  scanned once instead of 2-3 times. Measured 1.7x speedup
  (109ms -> 65ms warm cache).
- Switch FTS document count from COUNT(*) on the virtual table to
  COUNT(*) on documents_fts_docsize shadow table (B-tree scan vs FTS5
  virtual table overhead). Measured 19x speedup for that single query.

Database: 61652 docs, 282K notes, 211K discussions, 1.5GB.
2026-02-12 11:21:00 -05:00


Deep Performance Audit Report

Date: 2026-02-12
Branch: perf-audit (e9bacc94)
Parent: 039ab1c2 (master, v0.6.1)


Methodology

  1. Baseline — measured p50/p95 latency for all major commands with warm cache
  2. Profile — used macOS sample profiler and EXPLAIN QUERY PLAN to identify hotspots
  3. Golden output — captured exact numeric outputs before changes as equivalence oracle
  4. One lever per change — each optimization isolated and independently benchmarked
  5. Revert threshold — any optimization <1.1x speedup reverted per audit rules
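
The profiling step above leans on EXPLAIN QUERY PLAN to see which index a query actually uses. A minimal sketch of that reading, using Python's stdlib sqlite3 for brevity; the table and index here are simplified stand-ins, not the real schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE notes (id INTEGER PRIMARY KEY, noteable_type TEXT, type TEXT, path TEXT);
    CREATE INDEX idx_notes_type ON notes(type);
""")
# EXPLAIN QUERY PLAN returns rows whose last column describes the scan
# strategy the planner chose for each step.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM notes WHERE type = 'DiffNote'"
).fetchall()
for row in plan:
    print(row[-1])  # e.g. "SEARCH notes USING INDEX idx_notes_type (type=?)"
```

Any step reported as "SCAN" rather than "SEARCH ... USING INDEX" is a candidate hotspot.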

Baseline Measurements (warm cache, release build)

| Command | Latency | Notes |
|---|---|---|
| who --path src/core/db.rs (expert) | 2200ms | Hotspot |
| who --active | 83-93ms | Acceptable |
| who workload | 22ms | Fast |
| stats | 107-112ms | Hotspot |
| search "authentication" | 1030ms | Hotspot (library-level) |
| list issues -n 50 | ~40ms | Fast |

Optimization 1: INDEXED BY for DiffNote Queries

Target: src/cli/commands/who.rs — expert and reviews query paths

Problem: SQLite's query planner chose idx_notes_system (matching 38% of rows, 106K) over idx_notes_diffnote_path_created (matching 9.3% of rows, 26K) for path-filtered DiffNote queries. The partial index, defined WHERE noteable_type = 'MergeRequest' AND type = 'DiffNote', is far more selective, but the planner's cost model never picked it.

Change: Added INDEXED BY idx_notes_diffnote_path_created to all 8 SQL queries across query_expert, query_expert_details, query_reviews, build_path_query (probes 1 & 2), and suffix_probe.
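
A minimal sketch of the INDEXED BY pattern against a partial index, using Python's stdlib sqlite3; the schema is a simplified stand-in for the real notes table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY,
        noteable_type TEXT,
        type TEXT,
        path TEXT,
        created_at TEXT
    );
    -- Partial index analogous to idx_notes_diffnote_path_created.
    CREATE INDEX idx_notes_diffnote_path_created
        ON notes(path, created_at)
        WHERE noteable_type = 'MergeRequest' AND type = 'DiffNote';
""")
# INDEXED BY pins the planner to the named index; SQLite raises an error
# instead of silently falling back if the index cannot satisfy the query.
# The WHERE clause must imply the partial index's own WHERE clause.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM notes INDEXED BY idx_notes_diffnote_path_created
    WHERE noteable_type = 'MergeRequest' AND type = 'DiffNote' AND path = ?
""", ("src/core/db.rs",)).fetchall()
print(plan[0][-1])  # e.g. "SEARCH notes USING INDEX idx_notes_diffnote_path_created (path=?)"
```

The fail-loud behavior is the main design trade-off: if a schema migration drops or renames the index, affected queries error out rather than quietly regressing to a slower plan.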

Results:

| Query | Before | After | Speedup |
|---|---|---|---|
| expert (specific path) | 2200ms | 56-58ms | 38x |
| expert (broad path) | 2200ms | 83ms | 26x |
| reviews | 1800ms | 24ms | 75x |

Isomorphism proof: INDEXED BY only changes which index the planner uses, not the query semantics. Same rows matched, same ordering, same output. Verified by golden output comparison across 5+ runs.


Optimization 2: Conditional Aggregates in Stats

Target: src/cli/commands/stats.rs

Problem: 12+ sequential COUNT(*) queries each requiring a full table scan of documents (61K rows). Each scan touched the same pages but couldn't share work.

Changes:

  • Documents: 5 sequential COUNTs -> 1 query with SUM(CASE WHEN ... THEN 1 END)
  • FTS count: SELECT COUNT(*) FROM documents_fts (virtual table, slow) -> SELECT COUNT(*) FROM documents_fts_docsize (shadow B-tree table, 19x faster)
  • Embeddings: 2 queries -> 1 with COUNT(DISTINCT document_id), COUNT(*)
  • Dirty sources: 2 queries -> 1 with conditional aggregates
  • Pending fetches: 2 queries -> 1 each (discussions, dependents)
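
The consolidation pattern can be sketched as follows, using Python's stdlib sqlite3; the kind and truncated columns are invented stand-ins for the real documents schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, kind TEXT, truncated INTEGER);
    INSERT INTO documents (kind, truncated) VALUES
        ('issue', 0), ('issue', 1), ('mr', 0), ('discussion', 0), ('discussion', 1);
""")
# One table scan replaces three separate COUNT(*) queries. COALESCE guards
# the NULL that SUM returns when no row matches a CASE arm.
total, issues, truncated = con.execute("""
    SELECT COUNT(*),
           COALESCE(SUM(CASE WHEN kind = 'issue' THEN 1 END), 0),
           COALESCE(SUM(CASE WHEN truncated = 1 THEN 1 END), 0)
    FROM documents
""").fetchone()
print(total, issues, truncated)  # 5 2 2
```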

Results:

| Metric | Before | After | Speedup |
|---|---|---|---|
| Warm median | 112ms | 66ms | 1.70x |
| Cold | 1220ms | ~700ms | ~1.7x |

Golden output verified:

total:61652, issues:8241, mrs:10018, discussions:43393, truncated:63
fts:61652, embedded:61652, chunks:88161

All values match exactly across before/after runs.

Isomorphism proof: SUM(CASE WHEN x THEN 1 END) counts exactly the rows that COUNT(*) with WHERE x counts; the one edge case is that SUM yields NULL over zero matching rows, which the COALESCE(..., 0) wrapper normalizes to the 0 that COUNT would return. The FTS5 shadow table documents_fts_docsize has exactly one row per indexed document by FTS5's design, so COUNT(*) on it equals the virtual table count.
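
Assuming an FTS5-enabled SQLite build, the shadow-table equivalence can be checked directly; the table below is a toy stand-in for documents_fts (Python stdlib sqlite3):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE documents_fts USING fts5(body)")
con.executemany("INSERT INTO documents_fts (body) VALUES (?)",
                [("alpha beta",), ("beta gamma",), ("gamma delta",)])
# FTS5 maintains one documents_fts_docsize row per indexed document, so the
# two counts agree, but the shadow table is a plain B-tree and avoids the
# virtual-table machinery entirely.
virtual = con.execute("SELECT COUNT(*) FROM documents_fts").fetchone()[0]
shadow = con.execute("SELECT COUNT(*) FROM documents_fts_docsize").fetchone()[0]
print(virtual, shadow)  # 3 3
```

One caveat: the _docsize shadow table is absent when the FTS table is created with the columnsize=0 option, so this substitution only holds for the default configuration.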


Investigation: Two-Phase FTS Search (REVERTED)

Target: src/search/fts.rs, src/cli/commands/search.rs

Hypothesis: FTS5 snippet() generation is expensive. Splitting search into Phase 1 (score-only MATCH+bm25) and Phase 2 (snippet for filtered results only) should reduce work.

Implementation: Created fetch_fts_snippets() that retrieves snippets only for post-filter document IDs via json_each() join.
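
A sketch of the (since-reverted) Phase 2 shape, using Python's stdlib sqlite3 for brevity; the table and names here are illustrative, not the actual fetch_fts_snippets() code:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(body)")
con.executemany("INSERT INTO docs_fts (rowid, body) VALUES (?, ?)",
                [(1, "alpha beta"), (2, "beta gamma"), (3, "gamma beta")])
# Phase 2: generate snippets only for the rowids that survived filtering,
# passed in as a JSON array and expanded server-side with json_each().
keep = [1, 3]
rows = con.execute("""
    SELECT docs_fts.rowid,
           snippet(docs_fts, 0, '[', ']', '...', 5)
    FROM docs_fts, json_each(?) AS ids
    WHERE docs_fts MATCH 'beta' AND docs_fts.rowid = ids.value
""", (json.dumps(keep),)).fetchall()
print(rows)
```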

Results:

| Metric | Before | After | Improvement |
|---|---|---|---|
| search (limit 20) | 1030ms | 995ms | 3.5% |

Decision: Reverted. Per audit rules, <1.1x speedup does not justify added code complexity.

Root cause: The bottleneck is not snippet generation but MATCH + bm25() scoring itself. Profiling showed strspn (FTS5 tokenizer) and memmove as the top CPU consumers. The same query runs in 30ms on system sqlite3 but 1030ms in rusqlite's bundled SQLite — a ~125x gap despite both being SQLite 3.51.x compiled at -O3.


Library-Level Finding: Bundled SQLite FTS5 Performance

Observation: FTS5 MATCH+bm25 queries are ~125x slower in rusqlite's bundled SQLite vs system sqlite3.

| Environment | Query Time | Notes |
|---|---|---|
| System sqlite3 (macOS) | 30ms (with snippet), 8ms (without) | Same .db file |
| rusqlite bundled | 1030ms | features = ["bundled"], OPT_LEVEL=3 |

Profiler data (macOS sample):

  • Top hotspot: strspn in FTS5 tokenizer
  • Secondary: memmove in FTS5 internals
  • Scaling: roughly linear in result count (limit 5 = 497ms, limit 20 = 995ms, ~33ms per additional result)

Possible causes:

  • Bundled SQLite compiled without platform-specific optimizations (SIMD, etc.)
  • Different memory allocator behavior
  • Missing compile-time tuning flags

Recommendation for future: Investigate switching from features = ["bundled"] to system SQLite linkage, or audit the bundled compile flags in the libsqlite3-sys build script.


Exploration Agent Findings (Informational)

Four parallel exploration agents surveyed the entire codebase. Key findings beyond what was already addressed:

Ingestion Pipeline

  • Serial DB writes in async context (acceptable — rusqlite is synchronous)
  • Label ingestion uses individual inserts (potential batch optimization, low priority)

CLI / GitLab Client

  • GraphQL client recreated per call (client.rs:98-100), defeating connection-pool reuse — minor
  • Double JSON deserialization in GraphQL responses — medium priority
  • N+1 subqueries in list command (list.rs:408-423) — 4 correlated subqueries per row

Search / Embedding

  • No N+1 patterns, no O(n^2) algorithms
  • Chunking is O(n) single-pass with proper UTF-8 safety
  • Ollama concurrency model is sound (parallel HTTP, serial DB writes)

Database / Documents

  • O(n^2) prefix sum in truncation.rs — low traffic path
  • String allocation patterns in extractors — micro-optimization territory

Opportunity Matrix

| Candidate | Impact | Confidence | Effort | Score | Status |
|---|---|---|---|---|---|
| INDEXED BY for DiffNote | Very High | High | Low | 9.0 | Shipped |
| Stats conditional aggregates | Medium | High | Low | 7.0 | Shipped |
| Bundled SQLite FTS5 | Very High | Medium | High | 5.0 | Documented |
| List N+1 subqueries | Medium | Medium | Medium | 4.0 | Backlog |
| GraphQL double deser | Low | Medium | Low | 3.5 | Backlog |
| Truncation O(n^2) | Low | High | Low | 3.0 | Backlog |

Files Modified

| File | Change |
|---|---|
| src/cli/commands/who.rs | INDEXED BY hints on 8 SQL queries |
| src/cli/commands/stats.rs | Conditional aggregates, FTS5 shadow table, merged queries |

Quality Gates

  • All 603 tests pass
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean
  • Golden output verified for both optimizations