test: Add test suites for embedding, FTS, hybrid search, and golden queries

Four new test modules covering the search infrastructure:

- tests/embedding.rs: Unit tests for the embedding pipeline including
  chunk ID encoding/decoding, change detection, and document chunking
  with overlap verification.

- tests/fts_search.rs: Integration tests for FTS5 search including
  safe query sanitization, multi-term queries, prefix matching, and
  the raw FTS mode for power users.

- tests/hybrid_search.rs: End-to-end tests for hybrid search mode
  including RRF fusion correctness, graceful degradation when
  embeddings are unavailable, and filter application.

- tests/golden_query_tests.rs: Golden query tests using fixtures
  from tests/fixtures/golden_queries.json to verify search quality
  against known-good query/result pairs. Ensures ranking stability
  across implementation changes.
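The message does not describe the chunk ID layout or the chunking parameters. The Rust below is a rough sketch of the properties the embedding tests check. It assumes a hypothetical scheme that packs a document ID and a 16-bit chunk index into one `i64` (`encode_chunk_id` / `decode_chunk_id`) and a character-based sliding window (`chunk_with_overlap`). None of these names, bit widths, or units come from the repository.

```rust
// Hypothetical layout: low 16 bits hold the chunk index, the rest the doc ID.
const CHUNK_INDEX_BITS: u32 = 16;

/// Packs a (non-negative) document ID and a chunk index into one integer ID.
pub fn encode_chunk_id(doc_id: i64, chunk_index: u16) -> i64 {
    debug_assert!(doc_id >= 0 && doc_id < (1_i64 << (63 - CHUNK_INDEX_BITS)));
    (doc_id << CHUNK_INDEX_BITS) | i64::from(chunk_index)
}

/// Inverse of `encode_chunk_id`: returns (doc_id, chunk_index).
pub fn decode_chunk_id(chunk_id: i64) -> (i64, u16) {
    let mask = (1_i64 << CHUNK_INDEX_BITS) - 1;
    (chunk_id >> CHUNK_INDEX_BITS, (chunk_id & mask) as u16)
}

/// Splits `text` into windows of at most `chunk_size` characters; each window
/// after the first starts `chunk_size - overlap` characters after the previous
/// one, so consecutive chunks share `overlap` characters.
/// Panics if `chunk_size` is 0 or `overlap >= chunk_size`.
pub fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0 && overlap < chunk_size, "overlap must be smaller than chunk_size");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    if chars.is_empty() {
        return chunks;
    }
    let step = chunk_size - overlap;
    let mut start = 0;
    loop {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        // Stop once the window has reached the end of the text.
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_with_overlap("abcdefghij", 4, 2)` yields `abcd`, `cdef`, `efgh`, `ghij`, and an overlap test can compare the tail of each chunk with the head of the next.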
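One common way to build a "safe" FTS5 mode is to wrap every whitespace-separated term in double quotes. FTS5 then matches operators such as `AND`, `OR`, `NOT`, `NEAR`, parentheses, and column filters as plain text. Inside an FTS5 string, a literal `"` is written as `""`, and a `*` after a string marks it as a prefix token. The sketch below assumes this approach. `sanitize_fts_query` is a hypothetical name, and treating a trailing `*` in user input as a prefix request is an assumption about how prefix matching is exposed.

```rust
/// Hypothetical safe-mode sanitizer: every term becomes a quoted FTS5 string,
/// so user input can never be parsed as FTS5 operators. Quoted terms separated
/// by spaces are implicitly ANDed by FTS5, which gives multi-term matching.
pub fn sanitize_fts_query(input: &str) -> String {
    input
        .split_whitespace()
        .filter_map(|raw| {
            // Keep one trailing '*' as an FTS5 prefix marker.
            let (term, prefix) = match raw.strip_suffix('*') {
                Some(stem) => (stem, true),
                None => (raw, false),
            };
            // Drop tokens that were only '*'.
            if term.is_empty() {
                return None;
            }
            // Escape embedded double quotes by doubling them, then wrap.
            let quoted = format!("\"{}\"", term.replace('"', "\"\""));
            Some(if prefix { quoted + "*" } else { quoted })
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

A raw mode for power users would presumably pass the input to `MATCH` untouched, which is why the two modes are tested separately.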
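Reciprocal Rank Fusion (RRF) scores each document as the sum of `1 / (k + rank)` over every ranked list it appears in, with 1-based ranks and `k = 60` as a commonly used default. The sketch below shows the fusion and the degradation case the hybrid tests mention: when the vector list is empty, the fused order falls back to the lexical order. `rrf_fuse`, its signature, and the tie-breaking rule are assumptions, not the project's code.

```rust
use std::collections::HashMap;

/// Fuses several best-first ranked lists of document IDs with RRF.
/// A document at 1-based position r in a list contributes 1 / (k + r).
pub fn rrf_fuse(ranked_lists: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in ranked_lists {
        for (i, &doc_id) in list.iter().enumerate() {
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by doc ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

With lexical results `[1, 2, 3]` and vector results `[3, 4]`, document 3 appears in both lists and rises to the top, giving the order `[3, 1, 2, 4]`.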

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Taylor Eernisse
2026-01-30 15:47:19 -05:00
parent daf5a73019
commit d235f2b4dd
5 changed files with 931 additions and 0 deletions

tests/fixtures/golden_queries.json (new file, 65 lines)

@@ -0,0 +1,65 @@
[
  {
    "query": "authentication login",
    "mode": "lexical",
    "filters": {},
    "expected_doc_ids": [1],
    "min_results": 1,
    "max_rank": 10,
    "description": "Basic auth keywords should find the OAuth login issue"
  },
  {
    "query": "database migration",
    "mode": "lexical",
    "filters": {},
    "expected_doc_ids": [3],
    "min_results": 1,
    "max_rank": 10,
    "description": "Database migration terms should find the migration issue"
  },
  {
    "query": "user profile",
    "mode": "lexical",
    "filters": {},
    "expected_doc_ids": [2],
    "min_results": 1,
    "max_rank": 10,
    "description": "User profile keywords should find the profile MR"
  },
  {
    "query": "API rate limiting",
    "mode": "lexical",
    "filters": {},
    "expected_doc_ids": [5],
    "min_results": 1,
    "max_rank": 10,
    "description": "Rate limiting query should find the discussion document"
  },
  {
    "query": "performance optimization",
    "mode": "lexical",
    "filters": {},
    "expected_doc_ids": [4],
    "min_results": 1,
    "max_rank": 10,
    "description": "Performance terms should find the performance MR"
  },
  {
    "query": "token refresh",
    "mode": "lexical",
    "filters": {"source_type": "issue"},
    "expected_doc_ids": [1],
    "min_results": 1,
    "max_rank": 10,
    "description": "Token refresh with issue filter should find auth issue only"
  },
  {
    "query": "CSS styling frontend",
    "mode": "lexical",
    "filters": {},
    "expected_doc_ids": [6],
    "min_results": 1,
    "max_rank": 10,
    "description": "Frontend CSS query should find the UI improvements issue"
  }
]
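The fixture fields suggest a simple pass/fail rule for each entry. The sketch below assumes that rule: a query passes if it returns at least `min_results` hits and every ID in `expected_doc_ids` appears within the top `max_rank` positions. `GoldenQuery` and `check_golden` are hypothetical names. The struct is written by hand and leaves out `filters` so the example needs no crates.

```rust
/// Hand-written mirror of one golden_queries.json entry; `filters` is
/// omitted to keep the sketch dependency-free.
pub struct GoldenQuery {
    pub query: &'static str,
    pub mode: &'static str,
    pub expected_doc_ids: Vec<i64>,
    pub min_results: usize,
    pub max_rank: usize,
    pub description: &'static str,
}

/// Checks one best-first result list against a golden query (assumed semantics).
pub fn check_golden(q: &GoldenQuery, results: &[i64]) -> Result<(), String> {
    if results.len() < q.min_results {
        return Err(format!(
            "{:?} [{}]: expected at least {} results, got {}",
            q.query, q.mode, q.min_results, results.len()
        ));
    }
    for &expected in &q.expected_doc_ids {
        match results.iter().position(|&id| id == expected) {
            // `pos` is 0-based while `max_rank` is a 1-based rank.
            Some(pos) if pos < q.max_rank => {}
            Some(pos) => {
                return Err(format!(
                    "{:?} [{}]: doc {} ranked {}, worse than max_rank {}",
                    q.query, q.mode, expected, pos + 1, q.max_rank
                ))
            }
            None => {
                return Err(format!(
                    "{:?} [{}]: doc {} missing ({})",
                    q.query, q.mode, expected, q.description
                ))
            }
        }
    }
    Ok(())
}
```

The real test module probably deserializes the entries from the fixture file (for example with serde_json) instead of writing them by hand. The per-entry check would stay the same.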