fix: Project-scoped job claiming, structured rate-limit logging, RRF total_cmp

Targeted fixes across multiple subsystems:

dependent_queue:
- Add project_id parameter to claim_jobs() for project-scoped job claiming,
  preventing cross-project job theft during concurrent multi-project ingestion
- Add project_id parameter to count_pending_jobs() with optional scoping
  (None returns global counts, Some(pid) returns per-project counts)

gitlab/client:
- Downgrade rate-limit log from warn to info (429s are expected operational
  behavior, not warnings) and add structured fields (path, status_code)
  for better log filtering and aggregation

gitlab/transformers/discussion:
- Add tracing::warn on invalid timestamp parse instead of silent fallback
  to epoch 0, making data quality issues visible in logs

ingestion/merge_requests:
- Remove duplicate doc comment on upsert_label_tx

search/rrf:
- Replace partial_cmp().unwrap_or() with total_cmp() for f64 sorting,
  eliminating the NaN edge case entirely (total_cmp treats NaN consistently)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
teernisse
2026-02-04 10:01:28 -05:00
parent f6d19a9467
commit 86a51cddef
6 changed files with 102 additions and 38 deletions

View File

@@ -1,7 +1,7 @@
/// Multiplier for encoding (document_id, chunk_index) into a single rowid.
/// Supports up to 1000 chunks per document. At CHUNK_MAX_BYTES=6000,
/// a 2MB document (MAX_DOCUMENT_BYTES_HARD) produces ~333 chunks.
/// The pipeline enforces chunk_count < CHUNK_ROWID_MULTIPLIER at runtime.
/// The pipeline enforces chunk_count <= CHUNK_ROWID_MULTIPLIER at runtime.
pub const CHUNK_ROWID_MULTIPLIER: i64 = 1000;
/// Encode (document_id, chunk_index) into a sqlite-vec rowid.