Implement drift detection using cosine similarity between issue description
embedding and chronological note embeddings. Sliding window (size 3) identifies
topic drift points. Includes human and robot output formatters.
New files: drift.rs, similarity.rs
Closes: bd-1cjx
Automated formatting and lint corrections from parallel agent work:
- cargo fmt: import reordering (alphabetical), line wrapping to respect
max width, trailing comma normalization, destructuring alignment,
function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
i64::from() instead of `as i64` casts, .clamp() instead of
.max().min() chains, let-chain refactors (if-let with &&),
#[allow(clippy::too_many_arguments)] and
#[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace
No behavioral changes. All existing tests pass unmodified.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the embedding module that generates vector representations
of documents using a local Ollama instance with the nomic-embed-text
model. These embeddings enable semantic (vector) search and the hybrid
search mode that fuses lexical and semantic results via RRF.
Key components:
- embedding::ollama: HTTP client for the Ollama /api/embeddings
endpoint. Handles connection errors with actionable error messages
(OllamaUnavailable, OllamaModelNotFound) and validates response
dimensions.
- embedding::chunking: Splits long documents into overlapping
paragraph-aware chunks for embedding. Uses a configurable max token
estimate (8192 default for nomic-embed-text) with 10% overlap to
preserve cross-chunk context.
- embedding::chunk_ids: Encodes chunk identity as
doc_id * 1000 + chunk_index for the embeddings table rowid. This
allows vector search to map results back to documents and
deduplicate by doc_id efficiently.
- embedding::change_detector: Compares document content_hash against
stored embedding hashes to skip re-embedding unchanged documents,
making incremental embedding runs fast.
- embedding::pipeline: Orchestrates the full embedding flow: detect
changed documents, chunk them, call Ollama in configurable
concurrency (default 4), store results. Supports --retry-failed
to re-attempt previously failed embeddings.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>