gitlore

theirongolddev/gitlore

Fork 0

Commit Graph

Author	SHA1	Message	Date
Taylor Eernisse	65583ed5d6	refactor: Remove redundant doc comments throughout codebase Removes module-level doc comments (//! lines) and excessive inline doc comments that were duplicating information already evident from: - Function/struct names (self-documenting code) - Type signatures (the what is clear from types) - Implementation context (the how is clear from code) Affected modules: - cli/* - Removed command descriptions duplicating clap help text - core/* - Removed module headers and obvious function docs - documents/* - Removed extractor/regenerator/truncation docs - embedding/* - Removed pipeline and chunking docs - gitlab/* - Removed client and transformer docs (kept type definitions) - ingestion/* - Removed orchestrator and ingestion docs - search/* - Removed FTS and vector search docs Philosophy: Code should be self-documenting. Comments should explain "why" (business decisions, non-obvious constraints) not "what" (which the code itself shows). This change reduces noise and maintenance burden while keeping the codebase just as understandable. Retains comments for: - Non-obvious business logic - Important safety invariants - Complex algorithm explanations - Public API boundaries where generated docs matter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:32 -05:00
Taylor Eernisse	a50fc78823	style: Apply cargo fmt and clippy fixes across codebase Automated formatting and lint corrections from parallel agent work: - cargo fmt: import reordering (alphabetical), line wrapping to respect max width, trailing comma normalization, destructuring alignment, function signature reformatting, match arm formatting - clippy (pedantic): Range::contains() instead of manual comparisons, i64::from() instead of `as i64` casts, .clamp() instead of .max().min() chains, let-chain refactors (if-let with &&), #[allow(clippy::too_many_arguments)] and #[allow(clippy::field_reassign_with_default)] where warranted - Removed trailing blank lines and extra whitespace No behavioral changes. All existing tests pass unmodified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:59 -05:00
Taylor Eernisse	723703bed9	feat(embedding): Add Ollama-powered vector embedding pipeline Implements the embedding module that generates vector representations of documents using a local Ollama instance with the nomic-embed-text model. These embeddings enable semantic (vector) search and the hybrid search mode that fuses lexical and semantic results via RRF. Key components: - embedding::ollama: HTTP client for the Ollama /api/embeddings endpoint. Handles connection errors with actionable error messages (OllamaUnavailable, OllamaModelNotFound) and validates response dimensions. - embedding::chunking: Splits long documents into overlapping paragraph-aware chunks for embedding. Uses a configurable max token estimate (8192 default for nomic-embed-text) with 10% overlap to preserve cross-chunk context. - embedding::chunk_ids: Encodes chunk identity as doc_id * 1000 + chunk_index for the embeddings table rowid. This allows vector search to map results back to documents and deduplicate by doc_id efficiently. - embedding::change_detector: Compares document content_hash against stored embedding hashes to skip re-embedding unchanged documents, making incremental embedding runs fast. - embedding::pipeline: Orchestrates the full embedding flow: detect changed documents, chunk them, call Ollama in configurable concurrency (default 4), store results. Supports --retry-failed to re-attempt previously failed embeddings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:30 -05:00

Author

SHA1

Message

Date

Taylor Eernisse

65583ed5d6

refactor: Remove redundant doc comments throughout codebase

Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)

Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs

Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.

Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-05 00:04:32 -05:00

Taylor Eernisse

a50fc78823

style: Apply cargo fmt and clippy fixes across codebase

Automated formatting and lint corrections from parallel agent work:

- cargo fmt: import reordering (alphabetical), line wrapping to respect
  max width, trailing comma normalization, destructuring alignment,
  function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
  i64::from() instead of `as i64` casts, .clamp() instead of
  .max().min() chains, let-chain refactors (if-let with &&),
  #[allow(clippy::too_many_arguments)] and
  #[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace

No behavioral changes. All existing tests pass unmodified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-03 13:01:59 -05:00

Taylor Eernisse

723703bed9

feat(embedding): Add Ollama-powered vector embedding pipeline

Implements the embedding module that generates vector representations
of documents using a local Ollama instance with the nomic-embed-text
model. These embeddings enable semantic (vector) search and the hybrid
search mode that fuses lexical and semantic results via RRF.

Key components:

- embedding::ollama: HTTP client for the Ollama /api/embeddings
  endpoint. Handles connection errors with actionable error messages
  (OllamaUnavailable, OllamaModelNotFound) and validates response
  dimensions.

- embedding::chunking: Splits long documents into overlapping
  paragraph-aware chunks for embedding. Uses a configurable max token
  estimate (8192 default for nomic-embed-text) with 10% overlap to
  preserve cross-chunk context.

- embedding::chunk_ids: Encodes chunk identity as
  doc_id * 1000 + chunk_index for the embeddings table rowid. This
  allows vector search to map results back to documents and
  deduplicate by doc_id efficiently.

- embedding::change_detector: Compares document content_hash against
  stored embedding hashes to skip re-embedding unchanged documents,
  making incremental embedding runs fast.

- embedding::pipeline: Orchestrates the full embedding flow: detect
  changed documents, chunk them, call Ollama in configurable
  concurrency (default 4), store results. Supports --retry-failed
  to re-attempt previously failed embeddings.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-30 15:46:30 -05:00

3 Commits