Commit Graph

4 Commits

Author SHA1 Message Date
teernisse
a0519a4d0d feat(surgical-sync): add per-IID surgical sync pipeline
Implement lore sync --issue <IID> --mr <IID> -p <project> for on-demand
sync of specific entities without running the full project-wide pipeline.
Completes in seconds by fetching only targeted entities, their discussions,
resource events, and dependent data, then scoping doc regeneration and
embedding to only affected documents.

Pipeline stages: PREFLIGHT -> TOCTOU -> INGEST -> DEPENDENTS -> DOCS -> EMBED
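
The stage ordering above can be pictured as a small enum. This is only a sketch: the `Stage` type and its `next` method are hypothetical, and the commit message does not say how src/ingestion/surgical.rs actually models the stages.

```rust
/// Hypothetical model of the surgical sync stages, in the order the
/// commit message lists them. Type and method names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    Preflight,
    Toctou,
    Ingest,
    Dependents,
    Docs,
    Embed,
}

impl Stage {
    /// All stages in execution order.
    const ORDER: [Stage; 6] = [
        Stage::Preflight,
        Stage::Toctou,
        Stage::Ingest,
        Stage::Dependents,
        Stage::Docs,
        Stage::Embed,
    ];

    /// The stage that follows this one, or `None` after the last stage.
    fn next(self) -> Option<Stage> {
        let idx = Stage::ORDER.iter().position(|s| *s == self)?;
        Stage::ORDER.get(idx + 1).copied()
    }
}
```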

New files:
- src/ingestion/surgical.rs: TOCTOU guard, preflight fetch, per-entity ingest
- src/ingestion/surgical_tests.rs: 17 unit/wiremock tests
- src/cli/commands/sync_surgical.rs: 719-line orchestrator
- src/embedding/pipeline_tests.rs: scoped embedding tests
- src/gitlab/client_tests.rs: get_by_iid wiremock tests
- migrations/027_surgical_sync_runs.sql: 12 surgical columns + indexes

Key changes:
- SyncOptions: issue_iids, mr_iids, project, preflight_only fields
- SyncResult: surgical_mode, surgical_iids, entity_results fields
- SyncRunRecorder: surgical lifecycle methods (set_surgical_metadata, etc)
- GitLabClient: get_issue_by_iid, get_mr_by_iid
- Scoped docs: regenerate_dirty_documents_for_sources
- Scoped embed: embed_documents_by_ids
- run_sync dispatches to run_sync_surgical when is_surgical()
- robot-docs updated with surgical sync schema + workflows
- All 1019 tests pass, clippy clean
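
The dispatch described in the list above might look roughly like the following. The field types, the rule inside `is_surgical`, and the `Pipeline` enum and `choose_pipeline` function are all assumptions made for illustration; only the field and method names come from the commit message.

```rust
/// Illustrative subset of the options named in the commit message.
/// Field types are assumptions; the real struct likely has more fields.
#[allow(dead_code)] // `project` and `preflight_only` are unused in this sketch
#[derive(Debug, Default)]
struct SyncOptions {
    issue_iids: Vec<u64>,
    mr_iids: Vec<u64>,
    project: Option<String>,
    preflight_only: bool,
}

impl SyncOptions {
    /// Assumed rule: a sync is surgical when at least one IID was requested.
    fn is_surgical(&self) -> bool {
        !self.issue_iids.is_empty() || !self.mr_iids.is_empty()
    }
}

/// Which pipeline a sync run would take (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
enum Pipeline {
    Full,
    Surgical,
}

/// Stand-in for the branch at the top of `run_sync`.
fn choose_pipeline(opts: &SyncOptions) -> Pipeline {
    if opts.is_surgical() {
        Pipeline::Surgical
    } else {
        Pipeline::Full
    }
}
```

Keeping the decision in a single predicate means a plain `lore sync` with no IIDs still takes the project-wide path unchanged.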

Closes: bd-1sc6, bd-tiux, bd-159p, bd-1lja, bd-hs6j, bd-1elx, bd-arka,
        bd-3sez, bd-wcja, bd-kanh, bd-1i4i, bd-3bec
2026-02-18 15:39:14 -05:00
teernisse
47eecce8e9 feat(bd-1cjx): add lore drift command for discussion divergence detection
Implement drift detection using cosine similarity between the issue description
embedding and the note embeddings in chronological order. A sliding window
(size 3) identifies topic drift points. Includes human and robot output formatters.
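
A minimal sketch of the sliding-window idea, not the code in drift.rs or similarity.rs. The function names, the mean-below-threshold rule, and the threshold value are assumptions; the commit message states only the cosine similarity measure and the window size of 3.

```rust
/// Cosine similarity of two equal-length vectors; returns 0.0 if either
/// vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the start index of the first window of `window` consecutive
/// notes whose mean similarity to the description is below `threshold`.
/// Assumed drift rule; the real detector may define drift differently.
fn first_drift_point(
    description: &[f32],
    notes: &[Vec<f32>],
    window: usize,
    threshold: f32,
) -> Option<usize> {
    if window == 0 {
        return None; // slice::windows panics on a zero-sized window
    }
    // Similarity of each note (in chronological order) to the description.
    let sims: Vec<f32> = notes
        .iter()
        .map(|note| cosine_similarity(description, note))
        .collect();
    sims.windows(window)
        .position(|w| w.iter().sum::<f32>() / (w.len() as f32) < threshold)
}
```

Averaging over a window rather than testing single notes keeps one off-topic reply from being reported as drift.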

New files: drift.rs, similarity.rs
Closes: bd-1cjx
2026-02-12 12:02:15 -05:00
Taylor Eernisse
a50fc78823 style: Apply cargo fmt and clippy fixes across codebase
Automated formatting and lint corrections from parallel agent work:

- cargo fmt: import reordering (alphabetical), line wrapping to respect
  max width, trailing comma normalization, destructuring alignment,
  function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
  i64::from() instead of `as i64` casts, .clamp() instead of
  .max().min() chains, let-chain refactors (if-let with &&),
  #[allow(clippy::too_many_arguments)] and
  #[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace
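
For illustration, three of the clippy rewrites listed above look like this in isolation. The function names and bounds are hypothetical and do not come from the codebase; each comment shows the form clippy flags.

```rust
/// Range::contains() in place of `n >= 1 && n <= 100`.
fn is_valid_page_size(n: u32) -> bool {
    (1..=100).contains(&n)
}

/// i64::from() in place of `n as i64`. The conversion is lossless, and
/// it stops compiling if the source type changes to one that could
/// truncate, whereas `as` would silently keep going.
fn to_row_id(n: u32) -> i64 {
    i64::from(n)
}

/// .clamp() in place of `requested.max(1).min(50)`.
fn bounded_limit(requested: i64) -> i64 {
    requested.clamp(1, 50)
}
```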

No behavioral changes. All existing tests pass unmodified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 13:01:59 -05:00
Taylor Eernisse
723703bed9 feat(embedding): Add Ollama-powered vector embedding pipeline
Implements the embedding module that generates vector representations
of documents using a local Ollama instance with the nomic-embed-text
model. These embeddings enable semantic (vector) search and the hybrid
search mode that fuses lexical and semantic results via RRF.
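
As background, Reciprocal Rank Fusion (RRF) scores each document by summing 1 / (k + rank) over every ranked list it appears in. The sketch below is a generic version, not the project's search code: the function name is hypothetical, and the constant k is left to the caller because the commit message does not state which value is used (60 is a common choice).

```rust
use std::collections::HashMap;

/// Fuses ranked lists of document IDs with Reciprocal Rank Fusion.
/// Each document scores the sum of 1 / (k + rank) over the lists it
/// appears in, with rank starting at 1.
fn rrf_fuse(rankings: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by doc ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because only ranks enter the formula, lexical and vector scores never need to be normalized onto a common scale.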

Key components:

- embedding::ollama: HTTP client for the Ollama /api/embeddings
  endpoint. Handles connection errors with actionable error messages
  (OllamaUnavailable, OllamaModelNotFound) and validates response
  dimensions.

- embedding::chunking: Splits long documents into overlapping
  paragraph-aware chunks for embedding. Uses a configurable max token
  estimate (8192 default for nomic-embed-text) with 10% overlap to
  preserve cross-chunk context.

- embedding::chunk_ids: Encodes chunk identity as
  doc_id * 1000 + chunk_index for the embeddings table rowid. This
  allows vector search to map results back to documents and
  deduplicate by doc_id efficiently.

- embedding::change_detector: Compares document content_hash against
  stored embedding hashes to skip re-embedding unchanged documents,
  making incremental embedding runs fast.

- embedding::pipeline: Orchestrates the full embedding flow: detect
  changed documents, chunk them, call Ollama with configurable
  concurrency (default 4), and store the results. Supports --retry-failed
  to re-attempt previously failed embeddings.
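
The rowid scheme described for embedding::chunk_ids can be sketched as a pair of pure functions. The function names and the bounds checks are assumptions; only the formula doc_id * 1000 + chunk_index comes from the commit message.

```rust
/// Multiplier from the commit message: rowid = doc_id * 1000 + chunk_index.
const CHUNK_ID_MULTIPLIER: i64 = 1000;

/// Packs a document ID and chunk index into one rowid. Returns `None` if
/// the doc ID is negative, the index does not fit below the multiplier,
/// or the arithmetic overflows.
fn encode_rowid(doc_id: i64, chunk_index: i64) -> Option<i64> {
    if doc_id < 0 || !(0..CHUNK_ID_MULTIPLIER).contains(&chunk_index) {
        return None;
    }
    doc_id
        .checked_mul(CHUNK_ID_MULTIPLIER)?
        .checked_add(chunk_index)
}

/// Recovers (doc_id, chunk_index) from a rowid produced by `encode_rowid`.
fn decode_rowid(rowid: i64) -> (i64, i64) {
    (rowid / CHUNK_ID_MULTIPLIER, rowid % CHUNK_ID_MULTIPLIER)
}
```

One consequence of the formula is a ceiling of 1000 chunks per document: a chunk index of 1000 or more would collide with the next document's rowids, which is why this sketch rejects it.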

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:46:30 -05:00