Commit Graph

138 Commits

Author SHA1 Message Date
teernisse
94435c37f0 perf(timeline): hoist prepared statement outside discussion thread loop
Moves the conn.prepare() call for fetching discussion notes outside the
per-discussion loop in collect_discussion_threads(). The SQL is identical
for every iteration, so preparing it once and rebinding parameters avoids
redundant statement compilation on each matched discussion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:40 -05:00
teernisse
59f65b127a fix(search): pass FTS5 boolean operators through unquoted
FTS5 boolean operators (AND, OR, NOT, NEAR) are case-sensitive uppercase
keywords that must appear unquoted in the query string. Previously, the
user-friendly query builder would double-quote every token, causing
queries like "switch AND health" to search for the literal word "AND"
instead of using it as a boolean conjunction.

Adds a FTS5_OPERATORS constant and checks each token against it before
quoting, allowing natural boolean search syntax to work as expected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:29 -05:00
teernisse
f36e900570 feat(cli): add pipeline progress spinners to timeline and search
Adds numbered stage spinners ([1/3], [2/3], [3/3]) to the timeline
pipeline stages (seed, expand, collect) so users see activity during
longer queries. TimelineParams gains a robot_mode field to suppress
spinners in JSON output mode.

Adds a [1/1] spinner to the search command for consistency, using the
shared stage_spinner from cli/progress.

Also refactors wrap_snippet() to delegate to wrap_text() with a 4-line
cap, eliminating the duplicated word-wrapping logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:19 -05:00
teernisse
e2efc61beb refactor(cli): extract stage_spinner to shared progress module
Moves stage_spinner() from a private function in sync.rs to a pub function
in cli/progress.rs so it can be reused by the timeline and search commands.
The function creates a numbered spinner (e.g. [1/3]) for pipeline stages,
returning a hidden no-op bar in robot mode to keep caller code path-uniform.

sync.rs now imports from crate::cli::progress::stage_spinner instead of
defining its own copy. Adds unit tests for robot mode (hidden bar), human
mode (prefix/message properties), and prefix formatting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:10 -05:00
teernisse
2da1a228b3 feat(timeline): collect and render full discussion threads
Implements the downstream consumption of matched discussions from the seed
phase, completing the discussion thread feature across collect, CLI, and
integration tests.

Collect phase (timeline_collect.rs):
- New collect_discussion_threads() function assembles full threads by
  querying notes for each matched discussion_id, filtering out system notes
  (is_system = 0), ordering chronologically, and capping at THREAD_MAX_NOTES
  with a synthetic "[N more notes not shown]" summary note
- build_entity_lookup() creates a (type, id) -> (iid, path) map from seed
  and expanded entities to provide display metadata for thread events
- Thread timestamp is set to the first note's created_at for correct
  chronological interleaving with other timeline events
- collect_events() gains a matched_discussions parameter; threads are
  collected after entity events and before evidence note merging

CLI rendering (cli/commands/timeline.rs):
- Human mode: threads render with box-drawing borders, bold @author tags,
  date-stamped notes, and word-wrapped bodies (60 char width)
- Robot mode: DiscussionThread serializes as discussion_thread kind with
  note_count, full notes array (note_id, author, body, ISO created_at)
- THREAD tag in yellow for human event tag styling
- TimelineMeta gains discussion_threads_included count

Tests:
- 8 new collect tests: basic thread assembly, system note filtering, empty
  thread skipping, body truncation to THREAD_NOTE_MAX_CHARS, note cap with
  synthetic summary, timestamp from first note, chronological sort position,
  and deduplication of duplicate discussion_ids
- Integration tests updated for new collect_events signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:36 -05:00
teernisse
0e65202778 feat(timeline): add DiscussionThread types and seed-phase discussion matching
Introduces the foundation for full discussion thread support in the
timeline pipeline. Adds three new domain types to timeline.rs:

- ThreadNote: individual note within a thread (id, author, body, timestamp)
- MatchedDiscussion: tracks discussions matched during seeding with their
  parent entity (issue or MR) for downstream collection
- DiscussionThread variant on TimelineEventType: carries a full thread of
  notes, sorted between NoteEvidence and CrossReferenced

Moves truncate_to_chars() from timeline_seed.rs to timeline.rs as pub(crate)
for reuse by the collect phase. Adds THREAD_NOTE_MAX_CHARS (2000) and
THREAD_MAX_NOTES (50) constants.

Upgrades the seed SQL in resolve_documents_to_entities() to resolve note
documents to their parent discussion via an additional LEFT JOIN chain
(notes -> discussions), using COALESCE to unify the entity resolution path
for both discussion and note source types. SeedResult gains a
matched_discussions field that captures deduplicated discussion matches.

Tests cover: discussion matching from discussion docs, note-to-parent
resolution, deduplication of same discussion across multiple docs, and
correct parent entity type (issue vs MR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:18 -05:00
teernisse
f439c42b3d chore: add gitignore for mock-seed, roam CI workflow, formatting
- Add tools/mock-seed/ to .gitignore
- Add .github/workflows/roam.yml CI workflow
- Add .roam/fitness.yaml architectural fitness rules
- Rustfmt formatting fixes in show.rs and vector.rs
- Beads sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:30 -05:00
teernisse
4f3ec72923 feat(timeline): upgrade seed phase to hybrid search
Replace FTS-only seed entity discovery with hybrid search (FTS + vector
via RRF), using the same search_hybrid infrastructure as the search
command. Falls back gracefully to FTS-only when Ollama is unavailable.

Changes:
- seed_timeline() now accepts OllamaClient, delegates to search_hybrid
- New resolve_documents_to_entities() replaces find_seed_entities()
- SeedResult gains search_mode field tracking actual mode used
- TimelineResult carries search_mode through to JSON renderer
- run_timeline wires up OllamaClient from config
- handle_timeline made async for the hybrid search await
- Tests updated for new function signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:24 -05:00
teernisse
e6771709f1 refactor(core): extract path_resolver module, fix old_path matching in who
Extract shared path resolution logic from who.rs into a new
core::path_resolver module for cross-module reuse. Functions moved:
escape_like, normalize_repo_path, PathQuery, SuffixResult,
build_path_query, suffix_probe. Duplicate escape_like copies removed
from list.rs, project.rs, and filters.rs — all now import from
path_resolver.

Additionally fixes two bugs in query_expert_details() and
query_overlap() where only position_new_path was checked (missing
old_path matches for renamed files) and state filter excluded 'closed'
MRs despite the main scoring query including them with a decay
multiplier.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:14 -05:00
teernisse
6e55b2470d bugfix: DB column and size issues 2026-02-13 11:11:35 -05:00
Taylor Eernisse
48fbd4bfdb feat(core): add file rename chain resolver with depth-bounded BFS
New module: core::file_history with resolve_rename_chain() that traces
a file path through its rename history in mr_file_changes using
bidirectional BFS (forward: old_path->new_path, backward: new_path->old_path).

Key design decisions:
- Depth-bounded BFS: each queue entry carries its distance from the
  origin, so max_hops correctly limits by graph distance (not by total
  nodes discovered). This matters for branching rename graphs where a
  file was renamed differently in parallel MRs.
- Cycle-safe: visited set prevents infinite loops from circular renames.
- Project-scoped: queries are always scoped to a single project_id.
- Deterministic: output is sorted for stable results.

Tests cover: linear chains (forward/backward), cycles, max_hops=0,
depth-bounded linear chains, branching renames, diamond patterns,
and cross-project isolation (9 tests total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:41 -05:00
Taylor Eernisse
9786ef27f5 refactor(core/time): extract parse_since_from for deterministic time parsing
Factor out parse_since_from(input, reference_ms) so callers can compute
relative durations against a fixed reference timestamp instead of always
using now(). The existing parse_since() now delegates to it with now_ms().

Enables testable and reproducible time-relative queries for features like
timeline --as-of and who --as-of.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:20 -05:00
Taylor Eernisse
7e0e6a91f2 refactor: extract unit tests into separate _tests.rs files
Move inline #[cfg(test)] mod tests { ... } blocks from 22 source files
into dedicated _tests.rs companion files, wired via:

    #[cfg(test)]
    #[path = "module_tests.rs"]
    mod tests;

This keeps implementation-focused source files leaner and more scannable
while preserving full access to private items through `use super::*;`.

Modules extracted:
  core:      db, note_parser, payloads, project, references, sync_run,
             timeline_collect, timeline_expand, timeline_seed
  cli:       list (55 tests), who (75 tests)
  documents: extractor (43 tests), regenerator
  embedding: change_detector, chunking
  gitlab:    graphql (wiremock async tests), transformers/issue
  ingestion: dirty_tracker, discussions, issues, mr_diffs

Also adds conflicts_with("explain_score") to the --detail flag in the
who command to prevent mutually exclusive flags from being combined.

All 629 unit tests pass. No behavior changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:02 -05:00
teernisse
94c8613420 feat(bd-226s): implement time-decay expert scoring model
Replace flat-weight expertise scoring with exponential half-life decay,
split reviewer signals (participated vs assigned-only), dual-path rename
awareness, and new CLI flags (--as-of, --explain-score, --include-bots,
--all-history).

Changes:
- ScoringConfig: 8 new fields with validation (config.rs)
- half_life_decay() and normalize_query_path() pure functions (who.rs)
- CTE-based SQL with dual-path matching, mr_activity, reviewer_participation (who.rs)
- Rust-side decay aggregation with deterministic f64 ordering (who.rs)
- Path resolution probes check old_path columns (who.rs)
- Migration 026: 5 new indexes for dual-path and reviewer participation
- Default --since changed from 6m to 24m
- 31 new tests (example-based + invariant), 621 total who tests passing
- Autocorrect registry updated with new flags

Closes: bd-226s, bd-2w1p, bd-1soz, bd-18dn, bd-2ao4, bd-2yu5, bd-1b50,
bd-1hoq, bd-1h3f, bd-13q8, bd-11mg, bd-1vti, bd-1j5o
2026-02-12 15:44:55 -05:00
teernisse
83cd16c918 feat: implement per-note search and document pipeline
- Add SourceType::Note with extract_note_document() and ParentMetadataCache
- Migration 022: composite indexes for notes queries + author_id column
- Migration 024: table rebuild adding 'note' to CHECK constraints, defense triggers
- Migration 025: backfill existing non-system notes into dirty queue
- Add lore notes CLI command with 17 filter options (author, path, resolution, etc.)
- Support table/json/jsonl/csv output formats with field selection
- Wire note dirty tracking through discussion and MR discussion ingestion
- Fix test_migration_024_preserves_existing_data off-by-one (tested wrong migration)
- Fix upsert_document_inner returning false for label/path-only changes
2026-02-12 13:31:24 -05:00
teernisse
c8d609ab78 chore: add drift to autocorrect command registry 2026-02-12 12:10:02 -05:00
teernisse
35c828ba73 feat(bd-91j1): enhance robot-docs with quick_start and example_output
Add quick_start section with glab equivalents, lore-exclusive features,
and read/write split guidance. Add example_output to issues, mrs, search,
and who commands. Update strip_schemas to also strip example_output in
brief mode. Update beads tracking state.

Closes: bd-91j1
2026-02-12 12:09:44 -05:00
teernisse
ecbfef537a feat(bd-1ksf): wire hybrid search (FTS5 + vector + RRF) to CLI
Make run_search async, replace hardcoded lexical mode with SearchMode::parse(),
wire search_hybrid() with OllamaClient for semantic/hybrid modes, graceful
degradation when Ollama unavailable.

Closes: bd-1ksf
2026-02-12 12:03:47 -05:00
teernisse
47eecce8e9 feat(bd-1cjx): add lore drift command for discussion divergence detection
Implement drift detection using cosine similarity between issue description
embedding and chronological note embeddings. Sliding window (size 3) identifies
topic drift points. Includes human and robot output formatters.

New files: drift.rs, similarity.rs
Closes: bd-1cjx
2026-02-12 12:02:15 -05:00
teernisse
b29c382583 feat(bd-2g50): fill data gaps in issue detail view
Add references_full, user_notes_count, merge_requests_count computed
fields to show issue. Add closed_at and confidential columns via
migration 023.

Closes: bd-2g50
2026-02-12 11:59:44 -05:00
teernisse
d9c9f6e541 fix: escape LIKE metacharacters in project resolver
User-supplied project names containing `%` or `_` were passed directly
into LIKE patterns, causing unintended wildcard matching. For example,
`my_project` would match `my-project` because `_` is a single-char
wildcard in SQL LIKE.

Added escape_like() helper that escapes `\`, `%`, and `_` with
backslash, and added ESCAPE '\' clauses to both the suffix-match and
substring-match queries in resolve_project().

Includes two regression tests:
- test_underscore_not_wildcard: `_` in input must not match `-`
- test_percent_not_wildcard: `%` in input must not match arbitrary strings
2026-02-12 11:21:09 -05:00
teernisse
acc5e12e3d perf: force partial index for DiffNote queries, batch stats counts
Query optimizer fixes for the `who` and `stats` commands based on
a systematic performance audit of the SQLite query plans.

who command (expert/reviews/detail modes):
- Add INDEXED BY idx_notes_diffnote_path_created hints to all DiffNote
  queries. SQLite's planner was selecting idx_notes_system (38% of rows)
  over the far more selective partial index (9.3% of rows). Measured
  50-133x speedup on expert queries, 26x on reviews queries.
- Reorder JOIN clauses in detail mode's MR-author sub-select to match
  the index scan direction (notes -> discussions -> merge_requests).

stats command:
- Replace 12+ sequential COUNT(*) queries with conditional aggregates
  (COALESCE + SUM + CASE). Documents, dirty_sources, pending_discussion_
  fetches, and pending_dependent_fetches tables each scanned once instead
  of 2-3 times. Measured 1.7x speedup (109ms -> 65ms warm cache).
- Switch FTS document count from COUNT(*) on the virtual table to
  COUNT(*) on documents_fts_docsize shadow table (B-tree scan vs FTS5
  virtual table overhead). Measured 19x speedup for that single query.

Database: 61652 docs, 282K notes, 211K discussions, 1.5GB.
2026-02-12 11:21:00 -05:00
teernisse
3a1307dcdc feat(cli): wire defaultProject through init and all commands
Integrates the defaultProject config field across the entire CLI
surface so that omitting `-p` now falls back to the configured default.

Init command:
- New `--default-project` flag on `lore init` (and robot-mode variant)
- InitInputs.default_project: Option<String> passed through to run_init
- Validation in run_init ensures the default matches a configured path
- Interactive mode: when multiple projects are configured, prompts
  whether to set a default and which project to use
- Robot mode: InitOutputJson now includes default_project (omitted when
  null) for downstream automation
- Autocorrect dictionary updated with `--default-project`

Command handlers applying effective_project():
- handle_issues: list filters use config default when -p omitted
- handle_mrs: same cascading resolution for MR listing
- handle_ingest: dry-run and full sync respect the default
- handle_timeline: TimelineParams.project resolved via effective_project
- handle_search: SearchCliFilters.project resolved via effective_project
- handle_generate_docs: project filter cascades
- handle_who: falls back to config.default_project when -p omitted
- handle_count: both count subcommands respect the default
- handle_discussions: discussion count filters respect the default

Robot-docs:
- init command schema updated with --default-project flag and
  response_schema showing default_project as string?
- New config_notes section documents the defaultProject field with
  type, description, and example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:46 -05:00
teernisse
6ea3108a20 feat(config): add defaultProject with validation and cascading resolver
Introduces a new optional `defaultProject` field on Config (and
MinimalConfig for init output) that acts as a fallback when the
`-p`/`--project` CLI flag is omitted.

Domain-layer changes:
- Config.default_project: Option<String> with camelCase serde rename
- Config::load validates that defaultProject matches a configured
  project path (exact or case-insensitive suffix match), returning
  ConfigInvalid on mismatch
- Config::effective_project(cli_flag) -> Option<&str>: cascading
  resolver that prefers the CLI flag, then the config default, then None
- MinimalConfig.default_project with skip_serializing_if for clean
  JSON output when unset

Tests added:
- effective_project: CLI overrides default, falls back to default,
  returns None when both absent
- Config::load: accepts valid defaultProject, rejects nonexistent,
  accepts suffix match
- MinimalConfig: omits null defaultProject, includes when set
- Helper write_config_with_default_project for parameterized tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:33 -05:00
Taylor Eernisse
06229ce98b feat(cli): expose available_statuses in robot mode and hide status_category
(Supersedes empty commit f3788eb — jj auto-snapshot race.)

Three related refinements to how work item status is presented:

1. available_statuses in meta (list.rs, main.rs):
   Robot-mode issue list responses now include meta.available_statuses —
   a sorted array of all distinct status_name values in the database.
   Agents can use this to validate --status filter values or display
   valid options without a separate query.

2. Hide status_category from JSON (list.rs, show.rs):
   status_category is a GitLab internal classification that duplicates
   the state field. Switched to skip_serializing so it never appears
   in JSON output while remaining available internally.

3. Simplify human-readable status display (show.rs):
   Removed the "(category)" parenthetical from the Status line.

4. robot-docs schema updates (main.rs):
   Documented --status filter semantics and meta.available_statuses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:24:41 -05:00
Taylor Eernisse
e9af529f6e feat(ingestion): add progress reporting for status enrichment pipeline
Previously the status enrichment phase (GraphQL work item status fetch)
ran silently — users saw no feedback between "syncing issues" and the
final enrichment summary. For projects with hundreds of issues and
adaptive page-size retries, this felt like a hang.

Changes across three layers:

GraphQL (graphql.rs):
  - Extract fetch_issue_statuses_with_progress() accepting an optional
    on_page callback invoked after each paginated fetch with the
    running count of fetched IIDs
  - Original fetch_issue_statuses() preserved as a zero-cost
    delegation wrapper (no callback overhead)

Orchestrator (orchestrator.rs):
  - Three new ProgressEvent variants: StatusEnrichmentStarted,
    StatusEnrichmentPageFetched, StatusEnrichmentWriting
  - Wire the page callback through to the new _with_progress fn

CLI (ingest.rs):
  - Handle all three new events in the progress callback, updating
    both the per-project spinner and the stage bar with live counts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:22:20 -05:00
Taylor Eernisse
70271c14d6 fix(core): ensure migration framework records schema version automatically
The migration runner now inserts (OR REPLACE) the schema_version row
after each successful migration batch, regardless of whether the
migration SQL itself contains a self-registering INSERT. This prevents
version tracking gaps when a .sql migration omits the bookkeeping
statement, which would leave the schema at an unrecorded version and
cause re-execution attempts on next startup.

Legacy migrations that already self-register are unaffected thanks to
the OR REPLACE conflict resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:21:49 -05:00
Taylor Eernisse
d9f99ef21d feat(cli): status display/filtering, expanded --fields, and robot-docs --brief
Work item status integration across all CLI output:

Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
  hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
  multiple values): lore issues --status "In progress" --status "To do"
- Status fields (name, category, color, icon_name, synced_at) in issue
  list query and JSON output with conditional serialization

Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
  using ANSI 256-color approximation from hex color values
- Status fields included in robot mode JSON with ISO timestamps
- IssueRow, IssueDetail, IssueDetailJson all carry status columns

Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)
- strip_schemas() utility in robot.rs for --brief mode
- expand_fields_preset() extended for search, timeline, and all who modes

Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.

Note: replaces empty commit dcfd449 which lost staging during hook execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:13:37 -05:00
Taylor Eernisse
6b75697638 feat(ingestion): enrich issues with work item status from GraphQL API
Add a "Phase 1.5" status enrichment step to the issue ingestion pipeline
that fetches work item statuses via the GitLab GraphQL API after the
standard REST API ingestion completes.

Schema changes (migration 021):
- Add status_name, status_category, status_color, status_icon_name, and
  status_synced_at columns to the issues table (all nullable)

Ingestion pipeline changes:
- New `enrich_issue_statuses_txn()` function that applies fetched
  statuses in a single transaction with two phases: clear stale statuses
  for issues that no longer have a status widget, then apply new/updated
  statuses from the GraphQL response
- ProgressEvent variants for status enrichment (complete/skipped)
- IngestProjectResult tracks enrichment metrics (seen, enriched, cleared,
  without_widget, partial_error_count, enrichment_mode, errors)
- Robot mode JSON output includes per-project status enrichment details

Configuration:
- New `sync.fetchWorkItemStatus` config option (defaults true) to disable
  GraphQL status enrichment on instances without Premium/Ultimate
- `LoreError::GitLabAuthFailed` now treated as permanent API error so
  status enrichment auth failures don't trigger retries

Also removes the unnecessary nested SAVEPOINT in store_closes_issues_refs
(already runs within the orchestrator's transaction context).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:09:21 -05:00
Taylor Eernisse
dc49f5209e feat(gitlab): add GraphQL client with adaptive pagination and work item status types
Introduce a reusable GraphQL client (`src/gitlab/graphql.rs`) that handles
GitLab's GraphQL API with full error handling for auth failures, rate
limiting, and partial errors. Key capabilities:

- Adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL
  complexity limits without hardcoding a single safe page size
- Paginated issue status fetching via the workItems GraphQL query
- Graceful detection of unsupported instances (missing GraphQL endpoint
  or forbidden auth) so ingestion continues without status data
- Retry-After header parsing via the `httpdate` crate for rate limit
  compliance

Also adds `WorkItemStatus` type to `gitlab::types` with name, category,
color, and icon_name fields (all optional except name) with comprehensive
deserialization tests covering all system statuses (TO_DO, IN_PROGRESS,
DONE, CANCELED) and edge cases (null category, unknown future values).

The `GitLabClient` gains a `graphql_client()` factory method for
ergonomic access from the ingestion pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:08:53 -05:00
Taylor Eernisse
7d40a81512 fix(ingestion): remove nested transaction in upsert_mr_file_changes
drain_mr_diffs in orchestrator.rs already wraps each MR diff store
in an unchecked_transaction (alongside job completion and watermark
update). upsert_mr_file_changes was also starting its own inner
transaction via conn.unchecked_transaction(), causing every call to
fail with "cannot start a transaction within a transaction".

Remove the inner transaction management from upsert_mr_file_changes
so it operates on whatever Connection (or Transaction deref'd to
Connection) the caller provides. The caller in drain_mr_diffs owns
the transaction boundary. Standalone callers (tests, future direct
use) auto-commit each statement, which is correct for their use case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:56:15 -05:00
Taylor Eernisse
45126f04a6 fix: document upsert project_id, truncation budget, and Ollama model matching
- regenerator: Include project_id in the ON CONFLICT UPDATE clause for
  document upserts. Previously, if a document moved between projects
  (e.g., during re-ingestion), the project_id would remain stale.

- truncation: Compute the omission marker ("N notes omitted") before
  checking whether first+last notes fit in the budget. The old order
  computed the marker after the budget check, meaning the marker's byte
  cost was unaccounted for and could cause over-budget output.

- ollama: Tighten model name matching to require either an exact match
  or a colon-delimited tag prefix (model == name or name starts with
  "model:"). The prior starts_with check would false-positive on
  "nomic-embed-text-v2" when looking for "nomic-embed-text". Tests
  updated to cover exact match, tagged, wrong model, and prefix
  false-positive cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:14 -05:00
Taylor Eernisse
dfa44e5bcd fix(ingestion): label upsert reliability, init idempotency, and sync health
Label upsert (issues + merge_requests): Replace INSERT ... ON CONFLICT DO
UPDATE RETURNING with INSERT OR IGNORE + SELECT. The prior RETURNING-based
approach relied on last_insert_rowid() matching the returned id, which is
not guaranteed when ON CONFLICT triggers an update (SQLite may return 0).
The new two-step approach is unambiguous and correctly tracks created_count.

Init: Add ON CONFLICT(gitlab_project_id) DO UPDATE to the project insert
so re-running `lore init` updates path/branch/url instead of failing with
a unique constraint violation.

MR discussions sync: Reset discussions_sync_attempts to 0 when clearing a
sync health error, so previously-failed MRs get a fresh retry budget after
successful sync.

Count: format_number now handles negative numbers correctly by extracting
the sign before inserting thousand-separators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:53 -05:00
Taylor Eernisse
53ef21d653 fix: propagate DB errors instead of silently swallowing them
Replace .unwrap_or(), .ok(), and .filter_map(|r| r.ok()) patterns with
proper error propagation using ? and rusqlite::OptionalExtension where
the query may legitimately return no rows.

Affected areas:
- events_db::count_events: three count queries now propagate errors
  instead of defaulting to (0, 0) on failure
- note_parser::extract_refs_from_system_notes: row iteration errors
  are now propagated instead of silently dropped via filter_map
- note_parser::noteable_type_to_entity_type: unknown types now log a
  debug warning before defaulting to "issue"
- payloads::store_payload/read_payload: use .optional()? instead of
  .ok() to distinguish "no row" from "query failed"
- backoff::compute_next_attempt_at: use .clamp(0, 30) to guard against
  negative attempt_count, not just .min(30)
- search::vector::max_chunks_per_document: returns Result<i64> with
  proper error propagation through .optional()?.flatten()
- embedding::chunk_ids::decode_rowid: promote debug_assert to assert
  since negative rowids indicate data corruption worth failing fast on
- ingestion::dirty_tracker::record_dirty_error: use .optional()? to
  handle missing dirty_sources row gracefully instead of hard error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:36 -05:00
Taylor Eernisse
41504b4941 feat(who): configurable scoring weights, MR refs, detail mode, and suffix path resolution
Expert mode now surfaces the specific MR references (project/path!iid) that
contributed to each expert's score, capped at 50 per user. A new --detail flag
adds per-MR breakdowns showing role (Author/Reviewer/both), note count, and
last activity timestamp.

Scoring weights (author_weight, reviewer_weight, note_bonus) are now
configurable via the config file's `scoring` section with validation that
rejects negative values. Defaults shift to author_weight=25, reviewer_weight=10,
note_bonus=1 — better reflecting that code authorship is a stronger expertise
signal than review assignment alone.

Path resolution gains suffix matching: typing "login.rs" auto-resolves to
"src/auth/login.rs" when unambiguous, with clear disambiguation errors when
multiple paths match. Project-scoping (-p) narrows the candidate set.

The MAX_MR_REFS_PER_USER constant is promoted to module scope for reuse
across expert and overlap modes. Human output shows MR refs inline and detail
sub-rows when requested. Robot JSON includes mr_refs, mr_refs_total,
mr_refs_truncated, and optional details array.

Includes comprehensive tests for suffix resolution, scoring weight
configurability, MR ref aggregation across projects, and detail mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:15 -05:00
Taylor Eernisse
b168a58134 fix(search): cap vector search k-value and add rowid assertion
The vector search multiplier could grow unbounded on documents with
many chunks, producing enormous k values that cause SQLite to scan
far more rows than necessary. Clamp the multiplier to [8, 200] and
cap k at 10,000 to prevent degenerate performance on large corpora.

Also adds a debug_assert in decode_rowid to catch negative rowids
early — these indicate a bug in the encoding pipeline and should
fail fast rather than silently produce garbage document IDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:34:05 -05:00
Taylor Eernisse
b704e33188 feat(sync): surface MR diff fetch/fail counters in sync output
Adds mr_diffs_fetched and mr_diffs_failed fields to IngestResult and
SyncResult, threads them through the orchestrator aggregation, includes
them in the structured tracing span and human-readable sync summary.
Previously MR diff failures were silently swallowed — now they appear
alongside resource event counts for full pipeline observability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:53 -05:00
Taylor Eernisse
6e82f723c3 fix(ingestion): unify store + watermark + job-complete in single transaction
Previously, drain_resource_events, drain_mr_closes_issues, and
drain_mr_diffs each opened a transaction only for the job-complete +
watermark update, but the store operation ran outside that transaction.
If the process crashed between the store and the watermark update, data
would be persisted without the watermark advancing, causing silent
duplicates on the next sync.

Now each drain function opens the transaction before the store call and
commits it only after both the store and the watermark update succeed.
On error, the transaction is explicitly dropped so the connection is
not left in a half-committed state.

Also:
- store_resource_events no longer manages its own transaction; the caller
  passes in a connection (which is actually the transaction)
- upsert_mr_file_changes wraps DELETE + INSERT in a transaction internally
- reset_discussion_watermarks now also clears diffs_synced_for_updated_at
- Orchestrator error span now includes closes_issues_failed + mr_diffs_failed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:47 -05:00
Taylor Eernisse
940a96375a refactor(search): rename --after/--updated-after to --since/--updated-since
The --since naming is more intuitive (matches git log --since) and
consistent with the list commands which already use --since. Renames
the CLI flags, SearchCliFilters fields, SearchFilters fields,
autocorrect registry, and robot-docs manifest. No behavioral change.

Affected paths:
- cli/mod.rs: SearchArgs field + clap attribute rename
- cli/commands/search.rs: SearchCliFilters + run_search plumbing
- search/filters.rs: SearchFilters struct + apply_filters logic
- main.rs: handle_search + robot-docs JSON
- cli/autocorrect.rs: COMMAND_FLAGS entry for search

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:24 -05:00
Taylor Eernisse
c54a969269 fix(who): exclude self-assigned reviewers from file-change reviewer signal
Signal 4 (mr_reviewers + mr_file_changes) was missing the self-review
exclusion that signal 1 (DiffNote reviewer) already had. An MR author
listed as their own reviewer would be double-counted as both author
and reviewer, inflating their score.

Also removes redundant SELECT DISTINCT from signal 2 (GROUP BY
already ensures uniqueness).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 13:42:40 -05:00
Taylor Eernisse
95b7183add feat(who): expand expert + overlap queries with mr_file_changes and mr_reviewers
Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries)

- Add fetch_mr_file_changes config option and --no-file-changes CLI flag
- Add GitLab MR diffs API fetch pipeline with watermark-based sync
- Create migration 020 for diffs_synced_for_updated_at watermark column
- Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL:
  DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers
- Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END)
- Add insert_file_change test helper, 8 new who tests, all 397 tests pass
- Also includes: list performance migration 019, autocorrect module, README updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 13:35:14 -05:00
Taylor Eernisse
435a208c93 perf: eliminate unnecessary clones and pre-allocate collections
Three micro-optimizations with zero behavioral change:

1. timeline_collect.rs: Reorder format!() before enum construction so
   the owned String moves into the variant directly, eliminating
   .clone() on state, label, and milestone strings in StateChanged,
   LabelAdded/Removed, and MilestoneSet/Removed event paths.

2. pipeline.rs: Use Arc<str> for doc_hash shared across a document's
   chunks instead of cloning the full String per chunk. Also remove
   redundant embed_buf.reserve() since extend_from_slice already
   handles growth and the buffer is reused across iterations.

3. rrf.rs: Pre-allocate HashMap with combined vector+fts result count
   via with_capacity() to avoid rehashing during RRF score accumulation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:08:14 -05:00
Taylor Eernisse
cc11d3e5a0 fix: peer review — 5 correctness bugs across who, db, lock, embedding, main
Comprehensive peer code review identified and fixed the following:

1. who.rs: @-prefixed path routing used `target` (with @) instead of
   `clean` (stripped) when checking for '/' and passing to Expert mode,
   causing `lore who @src/auth/` to silently return zero results because
   the SQL LIKE matched against `@src/auth/%` which never exists.

2. db.rs: After ROLLBACK TO savepoint on migration failure, the savepoint
   was never RELEASEd, leaving it active on the connection. Fixed in both
   run_migrations() and run_migrations_from_dir().

3. lock.rs: Multiple acquire() calls (e.g. re-acquiring a stale lock)
   replaced the heartbeat_handle without stopping the old thread, causing
   two concurrent heartbeat writers competing on the same lock row. Now
   signals the old thread to stop and joins it before spawning a new one.

4. chunk_ids.rs: encode_rowid() had no guard for chunk_index >= 1000
   (CHUNK_ROWID_MULTIPLIER), which would cause rowid collisions between
   adjacent documents. Added range assertion [0, 1000).

5. main.rs: Fallback JSON error formatting in handle_auth_test
   interpolated LoreError Display output without escaping quotes or
   backslashes, potentially producing malformed JSON for robot-mode
   consumers. Now escapes both characters before interpolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:07:59 -05:00
Taylor Eernisse
5786d7f4b6 fix: defensive hardening — lock release logging, SQLite param guard, vector cast
Three defensive improvements found via peer code review:

1. lock.rs: Lock release errors were silently discarded with `let _ =`.
   If the DELETE failed (disk full, corruption), the lock stayed in the
   database with no diagnostic. Next sync would require --force with no
   clue why. Now logs with error!() including the underlying error message.

2. filters.rs: Dynamic SQL label filter construction had no upper bound
   on bind parameters. With many combined filters, param_idx + labels.len()
   could exceed SQLite's 999-parameter limit, producing an opaque error.
   Added a guard that caps labels at 900 - param_idx.

3. vector.rs: max_chunks_per_document returned i64 which was cast to
   usize. A negative value from a corrupt database would wrap to a huge
   number, causing overflow in the multiplier calculation. Now clamped
   to .max(1) and cast via unsigned_abs().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:54 -05:00
Taylor Eernisse
d3306114eb fix(ingestion): pass ShutdownSignal into issue and MR pagination loops
The orchestrator already accepted a ShutdownSignal but only checked it
between phases (after all issues fetched, before discussions). The inner
loops in ingest_issues() and ingest_merge_requests() consumed entire
paginated streams without checking for cancellation.

On a large initial sync (thousands of issues/MRs), Ctrl+C could be
unresponsive for minutes while the current entity type finished draining.

Now both functions accept &ShutdownSignal and check is_cancelled() at
the top of each iteration, breaking out promptly and committing the
cursor for whatever was already processed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:36 -05:00
Taylor Eernisse
e6b880cbcb fix: prevent panics in robot-mode JSON output and arithmetic paths
Peer code review found multiple panic-reachable paths:

1. serde_json::to_string().unwrap() in 4 robot-mode output functions
   (who.rs, main.rs x3). If serialization ever failed (e.g., NaN from
   edge-case division), the CLI would panic with an unhelpful stack trace.
   Replaced with unwrap_or_else that emits a structured JSON error fallback.

2. encode_rowid() in chunk_ids.rs used unchecked multiplication
   (document_id * 1000). On extreme document IDs this could silently wrap
   in release mode, causing embedding rowid collisions. Now uses
   checked_mul + checked_add with a diagnostic panic message.

3. HTTP response body truncation at byte index 500 in client.rs could
   split a multi-byte UTF-8 character, causing a panic. Now uses
   floor_char_boundary(500) for safe truncation.

4. who.rs reviews mode: SQL used `m.author_username != ?1` which silently
   dropped MRs with NULL author_username (SQL NULL != anything = NULL).
   Changed to `(m.author_username IS NULL OR m.author_username != ?1)`
   to match the pattern already used in expert mode.

5. handle_auth_test hardcoded exit code 5 for all errors regardless of
   type. Config not found (20), token not set (4), and network errors (8)
   all incorrectly returned 5. Now uses e.exit_code() from the actual
   LoreError, with proper suggestion hints in human mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:20 -05:00
Taylor Eernisse
121a634653 fix: critical data integrity — timeline dedup, discussion atomicity, index collision
Three correctness bugs found via peer code review:

1. TimelineEvent PartialEq/Ord omitted entity_type — issue #42 and MR #42
   with the same timestamp and event_type were treated as equal. In a
   BTreeSet or dedup, one would silently be dropped. Added entity_type to
   both PartialEq and Ord comparisons.

2. discussions.rs: store_payload() was called outside the transaction
   (on bare conn) while upsert_discussion/notes were inside. A crash
   between them left orphaned payload rows. Moved store_payload inside
   the unchecked_transaction block, matching mr_discussions.rs pattern.

3. Migration 017 created idx_issue_assignees_username(username, issue_id)
   but migration 005 already created the same index name with just
   (username). SQLite's IF NOT EXISTS silently skipped the composite
   version on every existing database. New migration 018 drops and
   recreates the index with correct composite columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:54:59 -05:00
Taylor Eernisse
f267578aab feat: implement lore who — people intelligence commands (5 modes)
Add `lore who` command with 5 query modes answering collaboration questions
using existing DB data (280K notes, 210K discussions, 33K DiffNotes):

- Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring)
- Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions)
- Active: what discussions need attention (unresolved resolvable, global/project-scoped)
- Overlap: who else is touching these files (dual author+reviewer role tracking)
- Reviews: what review patterns does a person have (prefix-based category extraction)

Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with
validation, robot JSON output with input+resolved_input reproducibility, human terminal
output, and 20 unit tests. All quality gates pass.

Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e,
bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 23:11:14 -05:00
Taylor Eernisse
b5f78e31a8 fix(cli): audit-driven improvements to flags, help, exit codes, and deprecation
Addresses findings from a comprehensive CLI readiness audit:

Flag design (I2):
- Add hidden --no-verbose flag with overrides_with semantics, matching
  the --no-quiet pattern already established for all other boolean flags.

Help text (I3):
- Add after_help examples to issues, mrs, search, sync, and timeline
  subcommands. Each shows 3-4 concrete, runnable commands with comments.

Help headings (I4/P5):
- Move --mode and --fts-mode from "Output" heading to "Mode" heading
  in the search subcommand. These control search strategy, not output
  format — "Output" is reserved for --limit, --explain, --fields.

Exit codes (I5):
- Health check failure now exits 19 (was 1). Exit code 1 is reserved
  for internal errors only. robot-docs updated to document code 19.

Deprecation visibility (P4):
- Deprecated commands (list, show, auth-test, sync-status) now emit
  structured JSON warnings to stderr in robot mode:
  {"warning":{"type":"DEPRECATED","message":"...","successor":"..."}}
  Previously these were silently swallowed in robot mode.

Version string (P1):
- Cli struct uses env!("LORE_VERSION") from build.rs so --version shows
  git hash (see previous commit).

Fields flag (P3):
- --fields help text updated to document the "minimal" preset.

Robot-docs (parallel work):
- response_schema added for every command, documenting the JSON shape
  agents will receive. Agents can now introspect expected fields before
  calling a command.
- error_format documents the new "actions" array.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:47:04 -05:00
Taylor Eernisse
cf6d27435a feat(robot): add elapsed_ms timing, --fields support, and actionable error actions
Robot mode consistency improvements across all command output:

Timing:
- Every robot JSON response now includes meta.elapsed_ms measuring
  wall-clock time from command start to serialization. Agents can use
  this to detect slow queries and tune --limit or --project filters.

Field selection (--fields):
- print_list_issues_json and print_list_mrs_json accept an optional
  fields slice that prunes each item in the response array to only
  the requested keys. A "minimal" preset expands to
  [iid, title, state, updated_at_iso] for token-efficient agent scans.
- filter_fields and expand_fields_preset live in the new
  src/cli/robot.rs module alongside RobotMeta.

Actionable error recovery:
- LoreError gains an actions() method returning concrete shell commands
  an agent can execute to recover (e.g. "ollama serve" for
  OllamaUnavailable, "lore init" for ConfigNotFound).
- RobotError now serializes an "actions" array (empty array omitted)
  so agents can parse and offer one-click fixes.

Envelope consistency:
- show issue/MR JSON responses now use the standard
  {"ok":true,"data":...,"meta":...} envelope instead of bare data,
  matching all other commands.

Files: src/cli/robot.rs (new), src/core/error.rs,
       src/cli/commands/{count,embed,generate_docs,ingest,list,show,stats,sync_status}.rs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:46:48 -05:00