Replace FTS-only seed entity discovery with hybrid search (FTS + vector
via RRF), using the same search_hybrid infrastructure as the search
command. Falls back gracefully to FTS-only when Ollama is unavailable.
Changes:
- seed_timeline() now accepts OllamaClient, delegates to search_hybrid
- New resolve_documents_to_entities() replaces find_seed_entities()
- SeedResult gains search_mode field tracking actual mode used
- TimelineResult carries search_mode through to JSON renderer
- run_timeline wires up OllamaClient from config
- handle_timeline made async for the hybrid search await
- Tests updated for new function signatures
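For readers unfamiliar with RRF (Reciprocal Rank Fusion), the fusion step can be sketched roughly as below. This is an illustration only, not the search_hybrid implementation: the helper name rrf_fuse, the i64 document IDs, and the constant k (60 is a commonly used value) are all assumptions.

```rust
use std::collections::HashMap;

/// Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k + rank) per document, with rank starting at 1.
/// Hypothetical helper for illustration; not the project's search_hybrid.
fn rrf_fuse(lists: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in lists {
        for (idx, doc_id) in list.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; break ties by document ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Passing a single list (for example, FTS results only) leaves that list's order unchanged, which is one way an FTS-only fallback could reuse the same code path.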
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two microbenchmarks that measure optimizations applied in this session:
bench_redundant_hash_query_elimination:
  Compares the old 2-query pattern (get_existing_hash + full SELECT)
  against the new single-query pattern where upsert_document_inner
  returns change detection info directly. Uses 100 seeded documents
  with 10K iterations, prepare_cached, and black_box to prevent
  elision.
bench_embedding_bytes_alloc_vs_reuse:
  Compares per-call Vec<u8> allocation against the reusable embed_buf
  pattern now used in store_embedding. Simulates 768-dim embeddings
  (nomic-embed-text) with 50K iterations. Includes correctness
  assertion that both approaches produce identical byte output.
Both benchmarks report timing for information only (no pass/fail on
speed); correctness assertions are the actual test criteria, so timing
variance cannot make them flake on CI.
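The allocation-vs-reuse comparison can be illustrated with a minimal sketch. The function names and the little-endian f32 encoding are assumptions made for this example; the actual store_embedding code may differ.

```rust
/// Per-call allocation: builds a fresh Vec<u8> for every embedding.
/// Hypothetical helper; little-endian encoding is an assumption.
fn embedding_to_bytes_alloc(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().flat_map(|f| f.to_le_bytes()).collect()
}

/// Reuse: clears and refills a caller-owned buffer, keeping its capacity
/// so repeated calls with same-sized embeddings do not reallocate.
fn embedding_to_bytes_reuse(embedding: &[f32], buf: &mut Vec<u8>) {
    buf.clear();
    buf.reserve(embedding.len() * 4);
    for f in embedding {
        buf.extend_from_slice(&f.to_le_bytes());
    }
}
```

A correctness check in the spirit of the benchmark is simply that both functions yield the same bytes for the same input.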
Notes recorded in benchmark file:
- SHA256 hex formatting optimization measured at 1.01x (reverted)
- compute_list_hash sort strategy measured at 1.02x (reverted)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The robot JSON envelope's meta.total_events field was incorrectly
reporting events.len() (the post-limit count), making it identical
to meta.showing. This defeated the purpose of having both fields.
Changes across the pipeline to fix this:
- collect_events now returns (Vec<TimelineEvent>, usize) where the
second element is the total event count before truncation
- TimelineResult gains a total_events_before_limit field (serde-skipped)
so the value flows cleanly from collect through to the renderer
- main.rs passes the real total instead of the events.len() workaround
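The shape of the fix can be sketched as follows. This is a simplified stand-in, not the real collect_events: events are reduced to bare i64 timestamps and the function name collect_with_total is hypothetical.

```rust
/// Sort events chronologically, record the full count, then apply the limit.
/// Returns (truncated events, total count before truncation).
/// Simplified illustration: events are represented as bare timestamps.
fn collect_with_total(mut timestamps: Vec<i64>, limit: usize) -> (Vec<i64>, usize) {
    timestamps.sort();
    // Capture the count before truncating so "total" and "showing" can differ.
    let total = timestamps.len();
    timestamps.truncate(limit);
    (timestamps, total)
}
```

With five events and a limit of two, the first tuple element has length 2 (what meta.showing would report) while the second is 5 (what meta.total_events should report).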
Additional cleanup in this pass:
- Derive PartialEq/Eq/PartialOrd/Ord on TimelineEventType, replacing
the hand-rolled event_type_discriminant() function. Variant declaration
order now defines sort tiebreak, documented in a doc comment.
- Validate --since input with a proper LoreError::Other instead of
silently treating invalid values as None
- Fix ANSI-aware tag column padding with console::pad_str (colored tags
like "[merged]" were misaligned because ANSI escapes consumed width)
- Remove dead print_timeline_json and infer_max_depth functions that
were superseded by print_timeline_json_with_meta
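To illustrate how derived ordering can replace a hand-rolled discriminant function, here is a small sketch. The enum EventKind, its variant order, and the Event struct are assumptions for the example; the real TimelineEventType may declare different variants in a different order.

```rust
/// Variant declaration order defines the derived Ord, and therefore the
/// tiebreak between events that share a timestamp.
/// Hypothetical enum; variants and their order are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EventKind {
    Created,
    StateChanged,
    LabelAdded,
    Merged,
}

#[derive(Debug, Clone, Copy)]
struct Event {
    timestamp: i64,
    kind: EventKind,
}

/// Sort by timestamp first, then by the derived variant order.
fn sort_events(events: &mut [Event]) {
    events.sort_by_key(|e| (e.timestamp, e.kind));
}
```

Reordering the variants changes the tiebreak without touching any comparison code, which is why the declaration order deserves a doc comment.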
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds tests/timeline_pipeline_tests.rs with end-to-end integration tests
that exercise the complete timeline pipeline against an in-memory SQLite
database with realistic data:
- pipeline_seed_expand_collect_end_to_end: Full scenario with an issue
closed by an MR, state changes, and label events. Verifies that seed
finds entities via FTS, expand discovers the closing MR through the
entity_references graph, and collect assembles a chronologically sorted
event stream containing Created, StateChanged, LabelAdded, and Merged
events.
- pipeline_empty_query_produces_empty_result: Validates graceful
degradation when FTS returns zero matches -- all three stages should
produce empty results without errors.
- pipeline_since_filter_excludes_old_events: Verifies that the since
timestamp filter propagates correctly through collect, excluding events
before the cutoff while retaining newer ones.
- pipeline_unresolved_refs_have_optional_iid: Tests the Option<i64>
target_iid on UnresolvedRef by creating cross-project references both
with and without known IIDs.
- shared_resolve_entity_ref_scoping: Unit tests for the new shared
resolve_entity_ref helper covering project-scoped lookup, unscoped
lookup, wrong-project rejection, unknown entity types, and nonexistent
entity IDs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests/perf_benchmark.rs with three side-by-side benchmarks that
compare old vs new approaches for the optimizations introduced in the
preceding commits:
- bench_label_insert_individual_vs_batch: measures N individual INSERTs
vs single multi-row INSERT (5k iterations, ~1.6x speedup)
- bench_string_building_old_vs_new: measures format!+push_str vs
writeln! (50k iterations, ~1.9x speedup)
- bench_prepare_vs_prepare_cached: measures prepare vs prepare_cached
(10k iterations, ~1.6x speedup)
Each benchmark verifies correctness (both approaches produce identical
output) and uses std::hint::black_box to prevent dead-code
elimination. Run with: cargo test --test perf_benchmark -- --nocapture
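As an illustration of the string-building comparison, the two patterns look roughly like this. The function names and the "- label" line format are made up for the example and are not taken from the benchmark file.

```rust
use std::fmt::Write;

/// Old pattern: format! allocates a temporary String on every iteration,
/// which is then copied into the output.
fn build_old(labels: &[&str]) -> String {
    let mut out = String::new();
    for label in labels {
        out.push_str(&format!("- {}\n", label));
    }
    out
}

/// New pattern: writeln! formats directly into the output buffer.
fn build_new(labels: &[&str]) -> String {
    let mut out = String::new();
    for label in labels {
        // Writing into a String cannot fail, so the Result is ignored.
        let _ = writeln!(out, "- {}", label);
    }
    out
}
```

Both functions produce byte-identical output, which is the property the benchmark asserts before reporting timings.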
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests
(Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark
for incremental sync, and the missing idx_label_events_label index.
The MR transformer and ingestion pipeline now populate commit SHAs during
sync. The orchestrator uses watermark-based filtering for closes_issues
jobs instead of re-enqueuing all MRs every sync.
The Phase B PRD is updated to match the actual codebase: corrected
migration numbering (011-015), documented nullable label/milestone
fields (migration 012), watermark patterns (013), observability
infrastructure (014), simplified source_method values, and updated
entity_references schema to match implementation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Applies the same doc comment cleanup to test files:
- Removes test module headers (//! lines)
- Removes obvious test function comments
- Retains comments explaining non-obvious test scenarios
Test names should be descriptive enough to convey intent without
additional comments. Complex test setup or assertions that need
explanation retain their comments.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds test coverage for the new GitLabIssueRef type used by the
MR closes_issues API endpoint:
- deserializes_gitlab_issue_ref: Single object with all fields
- deserializes_gitlab_issue_ref_array: Array of refs (typical API response)
Validates that cross-project references (different project_id values)
deserialize correctly, which is important for cross-project close links.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
GitLab returns null for the label/milestone fields on resource_label_events
and resource_milestone_events when the referenced label or milestone has
been deleted. This caused deserialization failures during sync.
- Add migration 012 to recreate both event tables with nullable
label_name, milestone_title, and milestone_id columns (SQLite
requires table recreation to alter NOT NULL constraints)
- Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone
to Option<> in the Rust types
- Update upsert functions to pass through None values correctly
- Add tests for null label and null milestone deserialization
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Automated formatting and lint corrections from parallel agent work:
- cargo fmt: import reordering (alphabetical), line wrapping to respect
max width, trailing comma normalization, destructuring alignment,
function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
i64::from() instead of `as i64` casts, .clamp() instead of
.max().min() chains, let-chain refactors (if-let with &&),
#[allow(clippy::too_many_arguments)] and
#[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace
No behavioral changes. All existing tests pass unmodified.
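The clippy rewrites listed above are behavior-preserving. A minimal before/after sketch, with hypothetical function names and bounds chosen only for illustration:

```rust
// Manual comparison vs. RangeInclusive::contains.
fn in_range_manual(x: i64) -> bool {
    x >= 1 && x <= 100
}
fn in_range_idiomatic(x: i64) -> bool {
    (1..=100).contains(&x)
}

// `as` cast vs. lossless From conversion.
fn widen_cast(x: i32) -> i64 {
    x as i64
}
fn widen_from(x: i32) -> i64 {
    i64::from(x)
}

// .max().min() chain vs. clamp (equivalent when the lower bound <= upper bound).
fn bound_chain(x: i64) -> i64 {
    x.max(1).min(100)
}
fn bound_clamp(x: i64) -> i64 {
    x.clamp(1, 100)
}
```

Each pair returns the same result for every input, so swapping one form for the other should not require test changes.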
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds six new types for deserializing responses from GitLab's three
Resource Events API endpoints (state, label, milestone):
- GitLabStateEvent: State transitions with optional user, source_commit,
and source_merge_request reference
- GitLabLabelEvent: Label add/remove events with nested GitLabLabelRef
- GitLabMilestoneEvent: Milestone assignment changes with nested
GitLabMilestoneRef
- GitLabMergeRequestRef: Lightweight MR reference (iid, title, web_url)
- GitLabLabelRef: Label metadata (id, name, color, description)
- GitLabMilestoneRef: Milestone metadata (id, iid, title)
All types derive Deserialize + Serialize and use Option<T> for nullable
fields (user, source_commit, color, description) to match GitLab's API
contract where these fields may be null.
Includes 8 new test cases covering:
- State events with/without user, with/without source_merge_request
- Label events for add and remove actions, including null color handling
- Milestone event deserialization
- Standalone ref type deserialization (MR, label, milestone)
Uses r##"..."## raw string delimiters where JSON contains hex color
codes (#FF0000) that would conflict with r#"..."# delimiters.
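The delimiter issue is easy to reproduce. In the sketch below (the JSON payload is a made-up example, not one of the actual fixtures), the sequence `"#` inside `"#FF0000"` would terminate an r#"..."# literal early, so two hashes are used:

```rust
/// Returns a sample label payload containing a hex color code.
/// The field values are illustrative, not taken from the real tests.
fn label_json() -> &'static str {
    // r##"..."## only ends at a quote followed by two hashes, so the
    // single "# inside the color value is treated as ordinary text.
    r##"{"id": 1, "name": "bug", "color": "#FF0000", "description": null}"##
}
```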
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three targeted regression tests covering bugs fixed in the embedding
pipeline hardening:
- overflow_doc_with_error_sentinel_not_re_detected_as_pending: verifies
that documents skipped for producing too many chunks have their
sentinel error recorded in embedding_metadata and are NOT returned by
find_pending_documents or count_pending_documents on subsequent runs
(prevents infinite re-processing loop).
- count_and_find_pending_agree: exercises four states (empty DB, new
document, fully-embedded document, config-drifted document) and
asserts that count_pending_documents and find_pending_documents
produce consistent results across all of them.
- full_embed_delete_is_atomic: confirms the --full flag's two DELETE
statements (embedding_metadata + embeddings) execute atomically
within a transaction.
Also updates test DB creation to apply migration 010.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Four new test modules covering the search infrastructure:
- tests/embedding.rs: Unit tests for the embedding pipeline including
chunk ID encoding/decoding, change detection, and document chunking
with overlap verification.
- tests/fts_search.rs: Integration tests for FTS5 search including
safe query sanitization, multi-term queries, prefix matching, and
the raw FTS mode for power users.
- tests/hybrid_search.rs: End-to-end tests for hybrid search mode
including RRF fusion correctness, graceful degradation when
embeddings are unavailable, and filter application.
- tests/golden_query_tests.rs: Golden query tests using fixtures
from tests/fixtures/golden_queries.json to verify search quality
against known-good query/result pairs. Ensures ranking stability
across implementation changes.
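For context on what "chunking with overlap" means, here is a minimal character-based sketch. It is not the project's chunker: the function name, the character-count sizing, and the parameters are all assumptions made for illustration.

```rust
/// Split text into chunks of at most `chunk_size` characters, where each
/// chunk after the first repeats the last `overlap` characters of the
/// previous one. Hypothetical helper; the real pipeline may chunk differently.
fn chunk_with_overlap(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

An overlap check then amounts to asserting that the tail of each chunk equals the head of the next.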
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces thorough test coverage for merge request functionality,
following the established testing patterns from issue ingestion.
New test files:
- mr_transformer_tests.rs: NormalizedMergeRequest transformation tests
covering full MR with all fields, minimal MR, draft detection via
title prefix and work_in_progress field, label/assignee/reviewer
extraction, and timestamp conversion
- mr_discussion_tests.rs: MR discussion normalization tests including
polymorphic noteable binding, DiffNote position extraction with
line ranges and SHA triplet, and resolvable note handling
- diffnote_position_tests.rs: Exhaustive DiffNote position scenarios
covering text/image/file types, single-line vs multi-line comments,
added/removed/modified lines, and missing position handling
New fixtures:
- fixtures/gitlab_merge_request.json: Representative MR API response
with nested structures for integration testing
Updated tests:
- gitlab_types_tests.rs: Add MR type deserialization tests
- migration_tests.rs: Update expected schema version to 6
Test design follows property-based patterns where feasible, with
explicit edge case coverage for nullable fields and API variants
across different GitLab versions.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Establishes the testing infrastructure: API response fixtures, type
deserialization tests, fixture-driven integration tests, and migration tests.
tests/fixtures/ - GitLab API response samples:
- gitlab_issue.json: Single issue with full metadata
- gitlab_issues_page.json: Paginated issue list response
- gitlab_discussion.json: Discussion thread with notes
- gitlab_discussions_page.json: Paginated discussions response
All fixtures captured from real GitLab API responses with
sensitive data redacted, ensuring tests match actual behavior.
tests/gitlab_types_tests.rs - Type deserialization tests:
- Validates serde parsing of all GitLab API types
- Tests edge cases: null fields, empty arrays, nested objects
- Ensures GitLabIssue, GitLabDiscussion, GitLabNote parse correctly
- Verifies optional fields handle missing data gracefully
- Tests author/assignee extraction from various formats
tests/fixture_tests.rs - Integration with fixtures:
- Loads fixture files and validates parsing
- Tests transformer functions produce correct database rows
- Verifies IssueWithMetadata extracts labels and assignees
- Tests NormalizedDiscussion/NormalizedNote structure
- Validates raw payload preservation logic
tests/migration_tests.rs - Database schema tests:
- Creates in-memory SQLite for isolation
- Runs all migrations and verifies schema
- Tests table creation with expected columns
- Validates foreign key constraints
- Tests index creation for query performance
- Verifies idempotent migration behavior
Test infrastructure uses:
- tempfile for isolated database instances
- wiremock for HTTP mocking (available for future API tests)
- Standard Rust #[test] attributes
Run with: cargo test
Run single: cargo test test_name
Run with output: cargo test -- --nocapture
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>