fix: Savepoint leak in embedding pipeline, atomic fail_job, RRF dedup
Three correctness fixes found during peer code review:

Embedding pipeline savepoint leak (HIGH severity):
The SAVEPOINT embed_page / RELEASE embed_page pattern had ~10 `?` propagation points between them. Any error from record_embedding_error, clear_document_embeddings, or store_embedding would exit the function without rolling back, leaving the SQLite connection in a broken transactional state and causing cascading failures for the rest of the session. Fixed by extracting page processing into `embed_page()` and wrapping with explicit rollback-on-error handling.

Dependent queue fail_job race (MEDIUM severity):
fail_job performed a SELECT followed by a separate UPDATE on the attempts counter without a transaction. Under concurrent lock reclamation, the attempts value could be read stale. Replaced with a single atomic UPDATE that increments attempts and computes exponential backoff entirely in SQL, also halving DB round-trips. Added explicit error when the job no longer exists.

RRF duplicate document score inflation (MEDIUM severity):
If a retriever returned the same document_id multiple times, the RRF score accumulated multiple rank contributions while the rank only recorded the first occurrence. Moved the score accumulation inside the `if is_none` guard so only the first occurrence per list contributes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
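As an illustration of the rollback-on-error handling described for the savepoint fix, here is a minimal sketch. It does not reproduce the actual `embed_page()` code, which is not shown in this commit view. `FakeConn` and `with_savepoint` are hypothetical names: `FakeConn` only records the statements it is given, so the control flow can be checked without a database. The one SQLite fact it relies on is that `ROLLBACK TO` leaves the savepoint on the stack, so a `RELEASE` is still needed afterwards.

```rust
/// Hypothetical stand-in for a SQLite connection. It only logs statements.
#[derive(Default)]
struct FakeConn {
    log: Vec<String>,
}

impl FakeConn {
    fn execute(&mut self, sql: &str) -> Result<(), String> {
        self.log.push(sql.to_string());
        Ok(())
    }
}

/// Runs `body` inside a named savepoint. On success the savepoint is
/// released; on error it is rolled back and then released, so no savepoint
/// is left open regardless of which step inside `body` failed.
fn with_savepoint<T>(
    conn: &mut FakeConn,
    name: &str,
    body: impl FnOnce(&mut FakeConn) -> Result<T, String>,
) -> Result<T, String> {
    conn.execute(&format!("SAVEPOINT {}", name))?;
    match body(conn) {
        Ok(value) => {
            conn.execute(&format!("RELEASE {}", name))?;
            Ok(value)
        }
        Err(err) => {
            // Best-effort cleanup; the original error is the one reported.
            let _ = conn.execute(&format!("ROLLBACK TO {}", name));
            let _ = conn.execute(&format!("RELEASE {}", name));
            Err(err)
        }
    }
}

fn main() {
    let mut conn = FakeConn::default();
    let result: Result<(), String> = with_savepoint(&mut conn, "embed_page", |c| {
        c.execute("-- hypothetical page statement")?;
        Err("store_embedding failed".to_string())
    });
    assert!(result.is_err());
    println!("{:?}", conn.log);
}
```

Because every early exit from `body` funnels through the single `match`, the number of `?` operators inside the page-processing code no longer matters for savepoint cleanup.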
@@ -33,8 +33,10 @@ pub fn rank_rrf(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Ve
     for (i, &(doc_id, _)) in vector_results.iter().enumerate() {
         let rank = i + 1; // 1-indexed
         let entry = scores.entry(doc_id).or_insert((0.0, None, None));
-        entry.0 += 1.0 / (RRF_K + rank as f64);
+        // Only count the first occurrence per list to prevent duplicates
+        // from inflating the score.
         if entry.1.is_none() {
+            entry.0 += 1.0 / (RRF_K + rank as f64);
             entry.1 = Some(rank);
         }
     }
@@ -42,8 +44,8 @@ pub fn rank_rrf(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Ve
     for (i, &(doc_id, _)) in fts_results.iter().enumerate() {
         let rank = i + 1; // 1-indexed
         let entry = scores.entry(doc_id).or_insert((0.0, None, None));
-        entry.0 += 1.0 / (RRF_K + rank as f64);
         if entry.2.is_none() {
+            entry.0 += 1.0 / (RRF_K + rank as f64);
             entry.2 = Some(rank);
         }
     }
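To show the effect of the dedup fix in isolation, the following standalone sketch fuses two ranked lists with reciprocal rank fusion, counting only the first occurrence of each document per list. It is not the project's `rank_rrf`: the names `accumulate` and `fuse` are hypothetical, it tracks first occurrences with a `HashSet` instead of the per-list rank slots used in the diff, and `RRF_K = 60.0` is an assumed value because the constant's definition is not shown here.

```rust
use std::collections::{HashMap, HashSet};

/// Assumed smoothing constant; the diff does not show how RRF_K is defined.
const RRF_K: f64 = 60.0;

/// Adds one ranked list's reciprocal-rank contributions to `scores`,
/// skipping any document id already seen earlier in the same list.
fn accumulate(scores: &mut HashMap<i64, f64>, results: &[(i64, f64)]) {
    let mut seen = HashSet::new();
    for (i, &(doc_id, _)) in results.iter().enumerate() {
        let rank = i + 1; // 1-indexed
        if seen.insert(doc_id) {
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (RRF_K + rank as f64);
        }
    }
}

/// Fuses both lists and returns (doc_id, score) pairs, best score first.
fn fuse(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Vec<(i64, f64)> {
    let mut scores = HashMap::new();
    accumulate(&mut scores, vector_results);
    accumulate(&mut scores, fts_results);
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Sort by descending score, breaking ties by ascending document id.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Document 7 appears twice in the vector list; document 3 appears once
    // in each list.
    let vector_results = [(7, 0.9), (7, 0.8), (3, 0.5)];
    let fts_results = [(3, 2.0)];
    for (doc_id, score) in fuse(&vector_results, &fts_results) {
        println!("{doc_id}: {score:.6}");
    }
}
```

With these inputs, document 3 (found by both retrievers) scores 1/63 + 1/61 and outranks document 7, which scores 1/61. If the duplicate were counted, document 7 would score 1/61 + 1/62 and come first even though only one retriever returned it.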