Commit Graph

325 Commits

Author SHA1 Message Date
Taylor Eernisse
1161edb212 docs: add TUI PRD v2 (FrankenTUI) with 9 plan-refine iterations
Comprehensive product requirements document for the gitlore TUI built on
FrankenTUI's Elm architecture (Msg -> update -> view). The PRD (7800+
lines) covers:

Architecture: Separate binary crate (lore-tui) with runtime delegation,
Elm-style Model/Cmd/Msg, DbManager with closure-based read pool + WAL,
TaskSupervisor for dedup/cancellation, EntityKey system for type-safe
entity references, CommandRegistry as single source of truth for
keybindings/palette/help.

Screens: Dashboard, IssueList, IssueDetail, MrList, MrDetail, Search
(lexical/hybrid/semantic with facets), Timeline (5-stage pipeline),
Who (expert/workload/reviews/active/overlap), Sync (live progress),
CommandPalette, Help overlay.

Infrastructure: InputMode state machine, Clock trait for deterministic
rendering, crash_context ring buffer with redaction, instance lock,
progressive hydration, session restore, grapheme-safe text truncation
(unicode-width + unicode-segmentation), terminal sanitization (ANSI/bidi/
C1 controls), entity LRU cache.

Testing: Snapshot tests via insta, event-fuzz, CLI/TUI parity, tiered
benchmark fixtures (S/M/L), query-plan CI enforcement, Phase 2.5
vertical slice gate.

9 plan-refine iterations (ChatGPT review -> Claude integration):
  Iter 1-3: Connection pool, debounce, EntityKey, TaskSupervisor,
    keyset pagination, capability-adaptive rendering
  Iter 4-6: Separate binary crate, ANSI hardening, session restore,
    read tx isolation, progressive hydration, unicode-width
  Iter 7-9: Per-screen LoadState, CommandRegistry, InputMode, Clock,
    log redaction, entity cache, search cancel SLO, crash diagnostics

Also includes the original tui-prd.md (ratatui-based, superseded by v2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:11:26 -05:00
Taylor Eernisse
5ea976583e docs: update README, AGENTS, and robot-mode-design for work item status
README.md:
- Feature summary updated to mention work item status sync and GraphQL
- New config reference entry for sync.fetchWorkItemStatus (default true)
- Issue listing/show examples include --status flag usage
- Valid fields list expanded with status_name, status_category,
  status_color, status_icon_name, status_synced_at_iso
- Database schema table updated for issues table
- Ingest/sync command descriptions mention status enrichment phase
- Adaptive page sizing and graceful degradation documented

AGENTS.md:
- Robot mode example shows --status flag usage

docs/robot-mode-design.md:
- Issue available fields list expanded with status fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:10:51 -05:00
Taylor Eernisse
dcfd449b72 feat(cli): status display/filtering, expanded --fields, and robot-docs --brief
Work item status integration across all CLI output:

Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
  hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
  multiple values): lore issues --status "In progress" --status "To do"

Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
- Status fields (name, category, color, icon, synced_at) included in
  robot mode JSON with ISO timestamps

Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)

Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:09:47 -05:00
Taylor Eernisse
6b75697638 feat(ingestion): enrich issues with work item status from GraphQL API
Add a "Phase 1.5" status enrichment step to the issue ingestion pipeline
that fetches work item statuses via the GitLab GraphQL API after the
standard REST API ingestion completes.

Schema changes (migration 021):
- Add status_name, status_category, status_color, status_icon_name, and
  status_synced_at columns to the issues table (all nullable)

Ingestion pipeline changes:
- New `enrich_issue_statuses_txn()` function that applies fetched
  statuses in a single transaction with two phases: clear stale statuses
  for issues that no longer have a status widget, then apply new/updated
  statuses from the GraphQL response
- ProgressEvent variants for status enrichment (complete/skipped)
- IngestProjectResult tracks enrichment metrics (seen, enriched, cleared,
  without_widget, partial_error_count, enrichment_mode, errors)
- Robot mode JSON output includes per-project status enrichment details

Configuration:
- New `sync.fetchWorkItemStatus` config option (defaults true) to disable
  GraphQL status enrichment on instances without Premium/Ultimate
- `LoreError::GitLabAuthFailed` now treated as permanent API error so
  status enrichment auth failures don't trigger retries

Also removes the unnecessary nested SAVEPOINT in store_closes_issues_refs
(already runs within the orchestrator's transaction context).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:09:21 -05:00
Taylor Eernisse
dc49f5209e feat(gitlab): add GraphQL client with adaptive pagination and work item status types
Introduce a reusable GraphQL client (`src/gitlab/graphql.rs`) that handles
GitLab's GraphQL API with full error handling for auth failures, rate
limiting, and partial errors. Key capabilities:

- Adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL
  complexity limits without hardcoding a single safe page size
- Paginated issue status fetching via the workItems GraphQL query
- Graceful detection of unsupported instances (missing GraphQL endpoint
  or forbidden auth) so ingestion continues without status data
- Retry-After header parsing via the `httpdate` crate for rate limit
  compliance

Also adds `WorkItemStatus` type to `gitlab::types` with name, category,
color, and icon_name fields (all optional except name) with comprehensive
deserialization tests covering all system statuses (TO_DO, IN_PROGRESS,
DONE, CANCELED) and edge cases (null category, unknown future values).

The `GitLabClient` gains a `graphql_client()` factory method for
ergonomic access from the ingestion pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:08:53 -05:00
Taylor Eernisse
7d40a81512 fix(ingestion): remove nested transaction in upsert_mr_file_changes
drain_mr_diffs in orchestrator.rs already wraps each MR diff store
in an unchecked_transaction (alongside job completion and watermark
update). upsert_mr_file_changes was also starting its own inner
transaction via conn.unchecked_transaction(), causing every call to
fail with "cannot start a transaction within a transaction".

Remove the inner transaction management from upsert_mr_file_changes
so it operates on whatever Connection (or Transaction deref'd to
Connection) the caller provides. The caller in drain_mr_diffs owns
the transaction boundary. Standalone callers (tests, future direct
use) auto-commit each statement, which is correct for their use case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:56:15 -05:00
Taylor Eernisse
4185abe05d docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc
Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:48 -05:00
Taylor Eernisse
d54f669c5e chore: add multi-agent editor config and UBS file-write hook
Add rule/config files for Cursor, Cline, Codex, Gemini, Continue, and
OpenCode editors pointing them to project conventions, UBS usage, and
AGENTS.md. Add a Claude Code on-file-write hook that runs UBS on
supported source files after every save.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:28 -05:00
Taylor Eernisse
45126f04a6 fix: document upsert project_id, truncation budget, and Ollama model matching
- regenerator: Include project_id in the ON CONFLICT UPDATE clause for
  document upserts. Previously, if a document moved between projects
  (e.g., during re-ingestion), the project_id would remain stale.

- truncation: Compute the omission marker ("N notes omitted") before
  checking whether first+last notes fit in the budget. The old order
  computed the marker after the budget check, meaning the marker's byte
  cost was unaccounted for and could cause over-budget output.

- ollama: Tighten model name matching to require either an exact match
  or a colon-delimited tag prefix (model == name or name starts with
  "model:"). The prior starts_with check would false-positive on
  "nomic-embed-text-v2" when looking for "nomic-embed-text". Tests
  updated to cover exact match, tagged, wrong model, and prefix
  false-positive cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:14 -05:00
Taylor Eernisse
dfa44e5bcd fix(ingestion): label upsert reliability, init idempotency, and sync health
Label upsert (issues + merge_requests): Replace INSERT ... ON CONFLICT DO
UPDATE RETURNING with INSERT OR IGNORE + SELECT. The prior RETURNING-based
approach relied on last_insert_rowid() matching the returned id, which is
not guaranteed when ON CONFLICT triggers an update (SQLite may return 0).
The new two-step approach is unambiguous and correctly tracks created_count.

Init: Add ON CONFLICT(gitlab_project_id) DO UPDATE to the project insert
so re-running `lore init` updates path/branch/url instead of failing with
a unique constraint violation.

MR discussions sync: Reset discussions_sync_attempts to 0 when clearing a
sync health error, so previously-failed MRs get a fresh retry budget after
successful sync.

Count: format_number now handles negative numbers correctly by extracting
the sign before inserting thousand-separators.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:53 -05:00
Taylor Eernisse
53ef21d653 fix: propagate DB errors instead of silently swallowing them
Replace .unwrap_or(), .ok(), and .filter_map(|r| r.ok()) patterns with
proper error propagation using ? and rusqlite::OptionalExtension where
the query may legitimately return no rows.

Affected areas:
- events_db::count_events: three count queries now propagate errors
  instead of defaulting to (0, 0) on failure
- note_parser::extract_refs_from_system_notes: row iteration errors
  are now propagated instead of silently dropped via filter_map
- note_parser::noteable_type_to_entity_type: unknown types now log a
  debug warning before defaulting to "issue"
- payloads::store_payload/read_payload: use .optional()? instead of
  .ok() to distinguish "no row" from "query failed"
- backoff::compute_next_attempt_at: use .clamp(0, 30) to guard against
  negative attempt_count, not just .min(30)
- search::vector::max_chunks_per_document: returns Result<i64> with
  proper error propagation through .optional()?.flatten()
- embedding::chunk_ids::decode_rowid: promote debug_assert to assert
  since negative rowids indicate data corruption worth failing fast on
- ingestion::dirty_tracker::record_dirty_error: use .optional()? to
  handle missing dirty_sources row gracefully instead of hard error

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:36 -05:00
Taylor Eernisse
41504b4941 feat(who): configurable scoring weights, MR refs, detail mode, and suffix path resolution
Expert mode now surfaces the specific MR references (project/path!iid) that
contributed to each expert's score, capped at 50 per user. A new --detail flag
adds per-MR breakdowns showing role (Author/Reviewer/both), note count, and
last activity timestamp.

Scoring weights (author_weight, reviewer_weight, note_bonus) are now
configurable via the config file's `scoring` section with validation that
rejects negative values. Defaults shift to author_weight=25, reviewer_weight=10,
note_bonus=1 — better reflecting that code authorship is a stronger expertise
signal than review assignment alone.

Path resolution gains suffix matching: typing "login.rs" auto-resolves to
"src/auth/login.rs" when unambiguous, with clear disambiguation errors when
multiple paths match. Project-scoping (-p) narrows the candidate set.

The MAX_MR_REFS_PER_USER constant is promoted to module scope for reuse
across expert and overlap modes. Human output shows MR refs inline and detail
sub-rows when requested. Robot JSON includes mr_refs, mr_refs_total,
mr_refs_truncated, and optional details array.

Includes comprehensive tests for suffix resolution, scoring weight
configurability, MR ref aggregation across projects, and detail mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:15 -05:00
Taylor Eernisse
d36850f181 release: v0.5.2 2026-02-08 17:24:17 -05:00
Taylor Eernisse
5ce18e0ebc release: v0.5.1 2026-02-08 14:36:06 -05:00
Taylor Eernisse
b168a58134 fix(search): cap vector search k-value and add rowid assertion
The vector search multiplier could grow unbounded on documents with
many chunks, producing enormous k values that cause SQLite to scan
far more rows than necessary. Clamp the multiplier to [8, 200] and
cap k at 10,000 to prevent degenerate performance on large corpora.

Also adds a debug_assert in decode_rowid to catch negative rowids
early — these indicate a bug in the encoding pipeline and should
fail fast rather than silently produce garbage document IDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:34:05 -05:00
Taylor Eernisse
b704e33188 feat(sync): surface MR diff fetch/fail counters in sync output
Adds mr_diffs_fetched and mr_diffs_failed fields to IngestResult and
SyncResult, threads them through the orchestrator aggregation, includes
them in the structured tracing span and human-readable sync summary.
Previously MR diff failures were silently swallowed — now they appear
alongside resource event counts for full pipeline observability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:53 -05:00
Taylor Eernisse
6e82f723c3 fix(ingestion): unify store + watermark + job-complete in single transaction
Previously, drain_resource_events, drain_mr_closes_issues, and
drain_mr_diffs each opened a transaction only for the job-complete +
watermark update, but the store operation ran outside that transaction.
If the process crashed between the store and the watermark update, data
would be persisted without the watermark advancing, causing silent
duplicates on the next sync.

Now each drain function opens the transaction before the store call and
commits it only after both the store and the watermark update succeed.
On error, the transaction is explicitly dropped so the connection is
not left in a half-committed state.

Also:
- store_resource_events no longer manages its own transaction; the caller
  passes in a connection (which is actually the transaction)
- upsert_mr_file_changes wraps DELETE + INSERT in a transaction internally
- reset_discussion_watermarks now also clears diffs_synced_for_updated_at
- Orchestrator error span now includes closes_issues_failed + mr_diffs_failed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:47 -05:00
Taylor Eernisse
940a96375a refactor(search): rename --after/--updated-after to --since/--updated-since
The --since naming is more intuitive (matches git log --since) and
consistent with the list commands which already use --since. Renames
the CLI flags, SearchCliFilters fields, SearchFilters fields,
autocorrect registry, and robot-docs manifest. No behavioral change.

Affected paths:
- cli/mod.rs: SearchArgs field + clap attribute rename
- cli/commands/search.rs: SearchCliFilters + run_search plumbing
- search/filters.rs: SearchFilters struct + apply_filters logic
- main.rs: handle_search + robot-docs JSON
- cli/autocorrect.rs: COMMAND_FLAGS entry for search

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:24 -05:00
Taylor Eernisse
7dd86d5433 fix(db): add missing schema_version insert to migration 019
Migration 019 created performance indexes but never recorded itself
in the schema_version table. Without this row the migration runner
considers the schema outdated and would attempt to re-apply. Adds
the standard INSERT INTO schema_version for version 19.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:13 -05:00
Taylor Eernisse
429c6f07d2 release: v0.5.0
Bump version from 0.1.0 to 0.5.0 to reflect the maturity of the CLI
after months of development — robot mode, search pipeline, ingestion
orchestrator, who commands, timeline pipeline, and embedding support
are all implemented and stable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:07 -05:00
Taylor Eernisse
754efa4369 chore: add /release skill for automated SemVer version bumps
Adds a Claude Code skill that automates the release workflow:
parse bump type (major/minor/patch), update Cargo.toml + Cargo.lock,
commit, and tag. Intentionally does not auto-push so the user
retains control over when releases go to the remote.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:02 -05:00
Taylor Eernisse
c54a969269 fix(who): exclude self-assigned reviewers from file-change reviewer signal
Signal 4 (mr_reviewers + mr_file_changes) was missing the self-review
exclusion that signal 1 (DiffNote reviewer) already had. An MR author
listed as their own reviewer would be double-counted as both author
and reviewer, inflating their score.

Also removes redundant SELECT DISTINCT from signal 2 (GROUP BY
already ensures uniqueness).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 13:42:40 -05:00
Taylor Eernisse
95b7183add feat(who): expand expert + overlap queries with mr_file_changes and mr_reviewers
Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries)

- Add fetch_mr_file_changes config option and --no-file-changes CLI flag
- Add GitLab MR diffs API fetch pipeline with watermark-based sync
- Create migration 020 for diffs_synced_for_updated_at watermark column
- Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL:
  DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers
- Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END)
- Add insert_file_change test helper, 8 new who tests, all 397 tests pass
- Also includes: list performance migration 019, autocorrect module, README updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 13:35:14 -05:00
Taylor Eernisse
435a208c93 perf: eliminate unnecessary clones and pre-allocate collections
Three micro-optimizations with zero behavioral change:

1. timeline_collect.rs: Reorder format!() before enum construction so
   the owned String moves into the variant directly, eliminating
   .clone() on state, label, and milestone strings in StateChanged,
   LabelAdded/Removed, and MilestoneSet/Removed event paths.

2. pipeline.rs: Use Arc<str> for doc_hash shared across a document's
   chunks instead of cloning the full String per chunk. Also remove
   redundant embed_buf.reserve() since extend_from_slice already
   handles growth and the buffer is reused across iterations.

3. rrf.rs: Pre-allocate HashMap with combined vector+fts result count
   via with_capacity() to avoid rehashing during RRF score accumulation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:08:14 -05:00
Taylor Eernisse
cc11d3e5a0 fix: peer review — 5 correctness bugs across who, db, lock, embedding, main
Comprehensive peer code review identified and fixed the following:

1. who.rs: @-prefixed path routing used `target` (with @) instead of
   `clean` (stripped) when checking for '/' and passing to Expert mode,
   causing `lore who @src/auth/` to silently return zero results because
   the SQL LIKE matched against `@src/auth/%` which never exists.

2. db.rs: After ROLLBACK TO savepoint on migration failure, the savepoint
   was never RELEASEd, leaving it active on the connection. Fixed in both
   run_migrations() and run_migrations_from_dir().

3. lock.rs: Multiple acquire() calls (e.g. re-acquiring a stale lock)
   replaced the heartbeat_handle without stopping the old thread, causing
   two concurrent heartbeat writers competing on the same lock row. Now
   signals the old thread to stop and joins it before spawning a new one.

4. chunk_ids.rs: encode_rowid() had no guard for chunk_index >= 1000
   (CHUNK_ROWID_MULTIPLIER), which would cause rowid collisions between
   adjacent documents. Added range assertion [0, 1000).

5. main.rs: Fallback JSON error formatting in handle_auth_test
   interpolated LoreError Display output without escaping quotes or
   backslashes, potentially producing malformed JSON for robot-mode
   consumers. Now escapes both characters before interpolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:07:59 -05:00
Taylor Eernisse
5786d7f4b6 fix: defensive hardening — lock release logging, SQLite param guard, vector cast
Three defensive improvements found via peer code review:

1. lock.rs: Lock release errors were silently discarded with `let _ =`.
   If the DELETE failed (disk full, corruption), the lock stayed in the
   database with no diagnostic. Next sync would require --force with no
   clue why. Now logs with error!() including the underlying error message.

2. filters.rs: Dynamic SQL label filter construction had no upper bound
   on bind parameters. With many combined filters, param_idx + labels.len()
   could exceed SQLite's 999-parameter limit, producing an opaque error.
   Added a guard that caps labels at 900 - param_idx.

3. vector.rs: max_chunks_per_document returned i64 which was cast to
   usize. A negative value from a corrupt database would wrap to a huge
   number, causing overflow in the multiplier calculation. Now clamped
   to .max(1) and cast via unsigned_abs().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:54 -05:00
Taylor Eernisse
d3306114eb fix(ingestion): pass ShutdownSignal into issue and MR pagination loops
The orchestrator already accepted a ShutdownSignal but only checked it
between phases (after all issues fetched, before discussions). The inner
loops in ingest_issues() and ingest_merge_requests() consumed entire
paginated streams without checking for cancellation.

On a large initial sync (thousands of issues/MRs), Ctrl+C could be
unresponsive for minutes while the current entity type finished draining.

Now both functions accept &ShutdownSignal and check is_cancelled() at
the top of each iteration, breaking out promptly and committing the
cursor for whatever was already processed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:36 -05:00
Taylor Eernisse
e6b880cbcb fix: prevent panics in robot-mode JSON output and arithmetic paths
Peer code review found multiple panic-reachable paths:

1. serde_json::to_string().unwrap() in 4 robot-mode output functions
   (who.rs, main.rs x3). If serialization ever failed (e.g., NaN from
   edge-case division), the CLI would panic with an unhelpful stack trace.
   Replaced with unwrap_or_else that emits a structured JSON error fallback.

2. encode_rowid() in chunk_ids.rs used unchecked multiplication
   (document_id * 1000). On extreme document IDs this could silently wrap
   in release mode, causing embedding rowid collisions. Now uses
   checked_mul + checked_add with a diagnostic panic message.

3. HTTP response body truncation at byte index 500 in client.rs could
   split a multi-byte UTF-8 character, causing a panic. Now uses
   floor_char_boundary(500) for safe truncation.

4. who.rs reviews mode: SQL used `m.author_username != ?1` which silently
   dropped MRs with NULL author_username (SQL NULL != anything = NULL).
   Changed to `(m.author_username IS NULL OR m.author_username != ?1)`
   to match the pattern already used in expert mode.

5. handle_auth_test hardcoded exit code 5 for all errors regardless of
   type. Config not found (20), token not set (4), and network errors (8)
   all incorrectly returned 5. Now uses e.exit_code() from the actual
   LoreError, with proper suggestion hints in human mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:20 -05:00
Taylor Eernisse
121a634653 fix: critical data integrity — timeline dedup, discussion atomicity, index collision
Three correctness bugs found via peer code review:

1. TimelineEvent PartialEq/Ord omitted entity_type — issue #42 and MR #42
   with the same timestamp and event_type were treated as equal. In a
   BTreeSet or dedup, one would silently be dropped. Added entity_type to
   both PartialEq and Ord comparisons.

2. discussions.rs: store_payload() was called outside the transaction
   (on bare conn) while upsert_discussion/notes were inside. A crash
   between them left orphaned payload rows. Moved store_payload inside
   the unchecked_transaction block, matching mr_discussions.rs pattern.

3. Migration 017 created idx_issue_assignees_username(username, issue_id)
   but migration 005 already created the same index name with just
   (username). SQLite's IF NOT EXISTS silently skipped the composite
   version on every existing database. New migration 018 drops and
   recreates the index with correct composite columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:54:59 -05:00
Taylor Eernisse
f267578aab feat: implement lore who — people intelligence commands (5 modes)
Add `lore who` command with 5 query modes answering collaboration questions
using existing DB data (280K notes, 210K discussions, 33K DiffNotes):

- Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring)
- Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions)
- Active: what discussions need attention (unresolved resolvable, global/project-scoped)
- Overlap: who else is touching these files (dual author+reviewer role tracking)
- Reviews: what review patterns does a person have (prefix-based category extraction)

Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with
validation, robot JSON output with input+resolved_input reproducibility, human terminal
output, and 20 unit tests. All quality gates pass.

Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e,
bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 23:11:14 -05:00
Taylor Eernisse
859923f86b docs: update AGENTS.md robot mode section for --fields, actions, exit codes
Sync the agent instructions with the current robot mode implementation:
- Add RUST_CLI_TOOLS_BEST_PRACTICES.md reference for Rust coding guidance
- Expand robot mode description to cover all new capabilities
- Add --fields examples (minimal preset, custom field lists)
- Document error actions array for automated recovery workflows
- Update response format to show elapsed_ms and actions in error envelope
- Add field selection section with usage examples
- Separate health check to exit code 19 (was overloaded on exit code 1)
- Add robot-docs recommendation for response schema discovery
- Update best practices with --fields minimal for token efficiency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:32 -05:00
Taylor Eernisse
d701b1f977 docs: add plan frontmatter to api-efficiency-findings
Add YAML frontmatter metadata (plan: true, status: drafting, iteration: 0)
to integrate with the iterative plan review workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:24 -05:00
Taylor Eernisse
736d9c9a80 docs: rewrite robot-mode-design to reflect implemented features
Comprehensive update to the robot mode design document bringing it in sync
with the actual implementation after the elapsed_ms, --fields, and error
actions features landed.

Major additions:
- Response envelope section documenting compact JSON with elapsed_ms timing
- Error actions table mapping each error code to executable recovery commands
- Field selection section with presets (minimal) and per-entity available fields
- Expanded exit codes table (14-20) covering Ollama, embedding, ambiguity errors
- Updated command examples to use current CLI syntax (lore issues vs lore list issues)
- Added -J shorthand and --fields to global flags table
- Best practices section with --fields minimal for token efficiency (~60% reduction)

Removed outdated sections that no longer match the implementation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:16 -05:00
Taylor Eernisse
8dc479e515 docs: add lore who command design plan with 8 iterations of review feedback
Design document for `lore who` — a people intelligence query layer over
existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53
participants). Answers five collaboration questions: expert lookup by
file/path, workload summary, review pattern analysis, active discussion
tracking, and file overlap detection.

Key design decisions refined across 8 feedback iterations:
- All SQL is fully static (no format!()) with prepare_cached() throughout
- Exact vs prefix path matching via PathQuery struct (two static SQL variants)
- Self-review exclusion (author != reviewer) on all DiffNote branches
- Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers
- Bounded payloads with *_total/*_truncated metadata for robot consumers
- Truncation transparency via LIMIT+1 overflow detection pattern
- Robot JSON includes resolved_input for reproducibility (since_mode tri-state)
- Multi-project correctness with project-qualified entity references
- Composite migration indexes designed for query selectivity on hot paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:05 -05:00
Taylor Eernisse
3e7fa607d3 docs: update README for --fields, elapsed_ms, error actions, exit code 19
Documents the robot mode enhancements from the previous commits:

- Field selection (--fields flag and minimal preset) with examples
  and complete field lists for issues and MRs
- Updated response format section to show meta.elapsed_ms and compact
  single-line JSON
- Error actions array with recovery shell commands
- Agent self-discovery section explaining robot-docs response_schema
- Exit code 19 for health check failure added to the table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:47:30 -05:00
Taylor Eernisse
b5f78e31a8 fix(cli): audit-driven improvements to flags, help, exit codes, and deprecation
Addresses findings from a comprehensive CLI readiness audit:

Flag design (I2):
- Add hidden --no-verbose flag with overrides_with semantics, matching
  the --no-quiet pattern already established for all other boolean flags.

Help text (I3):
- Add after_help examples to issues, mrs, search, sync, and timeline
  subcommands. Each shows 3-4 concrete, runnable commands with comments.

Help headings (I4/P5):
- Move --mode and --fts-mode from "Output" heading to "Mode" heading
  in the search subcommand. These control search strategy, not output
  format — "Output" is reserved for --limit, --explain, --fields.

Exit codes (I5):
- Health check failure now exits 19 (was 1). Exit code 1 is reserved
  for internal errors only. robot-docs updated to document code 19.

Deprecation visibility (P4):
- Deprecated commands (list, show, auth-test, sync-status) now emit
  structured JSON warnings to stderr in robot mode:
  {"warning":{"type":"DEPRECATED","message":"...","successor":"..."}}
  Previously these were silently swallowed in robot mode.

Version string (P1):
- Cli struct uses env!("LORE_VERSION") from build.rs so --version shows
  git hash (see previous commit).

Fields flag (P3):
- --fields help text updated to document the "minimal" preset.

Robot-docs (parallel work):
- response_schema added for every command, documenting the JSON shape
  agents will receive. Agents can now introspect expected fields before
  calling a command.
- error_format documents the new "actions" array.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:47:04 -05:00
Taylor Eernisse
cf6d27435a feat(robot): add elapsed_ms timing, --fields support, and actionable error actions
Robot mode consistency improvements across all command output:

Timing:
- Every robot JSON response now includes meta.elapsed_ms measuring
  wall-clock time from command start to serialization. Agents can use
  this to detect slow queries and tune --limit or --project filters.

Field selection (--fields):
- print_list_issues_json and print_list_mrs_json accept an optional
  fields slice that prunes each item in the response array to only
  the requested keys. A "minimal" preset expands to
  [iid, title, state, updated_at_iso] for token-efficient agent scans.
- filter_fields and expand_fields_preset live in the new
  src/cli/robot.rs module alongside RobotMeta.

Actionable error recovery:
- LoreError gains an actions() method returning concrete shell commands
  an agent can execute to recover (e.g. "ollama serve" for
  OllamaUnavailable, "lore init" for ConfigNotFound).
- RobotError now serializes an "actions" array (empty array omitted)
  so agents can parse and offer one-click fixes.

Envelope consistency:
- show issue/MR JSON responses now use the standard
  {"ok":true,"data":...,"meta":...} envelope instead of bare data,
  matching all other commands.

Files: src/cli/robot.rs (new), src/core/error.rs,
       src/cli/commands/{count,embed,generate_docs,ingest,list,show,stats,sync_status}.rs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:46:48 -05:00
Taylor Eernisse
4ce0130620 build: emit LORE_VERSION env var combining version and git hash
The clap --version flag now shows the git hash alongside the semver
version (e.g. "lore 0.1.0 (a573d69)") instead of bare "lore 0.1.0".

LORE_VERSION is constructed at compile time in build.rs from
CARGO_PKG_VERSION + the short git hash, and consumed via
env!("LORE_VERSION") in the Cli struct's #[command(version)] attribute.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:46:29 -05:00
Taylor Eernisse
a573d695d5 test(perf): add benchmarks for hash query elimination and embed bytes
Two new microbenchmarks measuring optimizations applied in this session:

bench_redundant_hash_query_elimination:
  Compares the old 2-query pattern (get_existing_hash + full SELECT)
  against the new single-query pattern where upsert_document_inner
  returns change detection info directly. Uses 100 seeded documents
  with 10K iterations, prepare_cached, and black_box to prevent
  elision.

bench_embedding_bytes_alloc_vs_reuse:
  Compares per-call Vec<u8> allocation against the reusable embed_buf
  pattern now used in store_embedding. Simulates 768-dim embeddings
  (nomic-embed-text) with 50K iterations. Includes correctness
  assertion that both approaches produce identical byte output.

Both benchmarks use informational-only timing (no pass/fail on speed)
with correctness assertions as the actual test criteria, ensuring they
never flake on CI.

Notes recorded in benchmark file:
- SHA256 hex formatting optimization measured at 1.01x (reverted)
- compute_list_hash sort strategy measured at 1.02x (reverted)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:43:11 -05:00
Taylor Eernisse
a855759bf8 fix: shutdown safety, CLI hardening, exit code collision
Shutdown signal improvements:
- Upgrade ShutdownSignal from Relaxed to Release/Acquire ordering.
  Relaxed was technically sufficient for a single flag but
  Release/Acquire is the textbook correct pattern and ensures
  visibility guarantees across threads without relying on x86 TSO.
- Add double Ctrl+C support to all three signal handlers (ingest,
  embed, sync). First Ctrl+C sets cooperative flag with user message;
  second Ctrl+C force-exits with code 130 (standard SIGINT convention).

CLI hardening:
- LORE_ROBOT env var now checks for truthy values (!empty, !="0",
  !="false") instead of mere existence. Setting LORE_ROBOT=0 or
  LORE_ROBOT=false no longer activates robot mode.
- Replace unreachable!() in color mode match with defensive warning
  and fallback to auto. Clap validates the values but defense in depth
  prevents panics if the value_parser is ever changed.
- Replace unreachable!() in completions shell match with proper error
  return for unsupported shells.

Exit code collision fix:
- ConfigNotFound was mapped to exit code 2 (error.rs:56) which
  collided with handle_clap_error() also using exit code 2 for parse
  errors. Agents calling lore --robot could not distinguish "bad
  arguments" from "missing config file."
- Restore ConfigNotFound to exit code 20 (its original dedicated code).
- Update robot-docs exit code table: code 2 = "Usage error", code 20 =
  "Config not found".

Build script:
- Track .git/refs/heads directory for Cargo rebuild triggers. Ensures
  GIT_HASH env var updates when branch refs change, not just HEAD.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:59 -05:00
Taylor Eernisse
f3f3560e0d fix(ingestion): proper error propagation and transaction safety
Three hardening improvements to the ingestion orchestrator:

- Replace .unwrap_or(0) with ? on COUNT(*) queries for total_issues
  and total_mrs. These are simple aggregate queries that should never
  fail, but if they do (e.g. table missing after failed migration),
  propagating the error gives an actionable message instead of silently
  reporting 0 items.

- Wrap store_closes_issues_refs in a SAVEPOINT with proper
  ROLLBACK/RELEASE. Previously, a failure mid-loop (e.g. on the 5th of
  10 close-issue references) would leave partial refs committed. Now
  the entire batch is atomic.

- Replace silent catch-all (_ => {}) arms in enqueue_resource_events
  and update_resource_event_watermark with explicit warnings for
  unknown entity_type values. Makes debugging easier when new entity
  types are added but the match arms aren't updated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:40 -05:00
Taylor Eernisse
2bfa4f1f8c perf(documents): eliminate redundant hash query in regeneration
The document regenerator was making two queries per document:
1. get_existing_hash() — SELECT content_hash
2. upsert_document_inner() — SELECT id, content_hash, labels_hash, paths_hash

Query 2 already returns the content_hash needed for change detection.
Remove get_existing_hash() entirely and compute content_changed inside
upsert_document_inner() from the existing row data.

upsert_document_inner now returns Result<bool> (true = content changed)
which propagates up through upsert_document and regenerate_one,
replacing the separate pre-check. The triple-hash fast-path (all three
hashes match → return Ok(false) with no writes) is preserved.

This halves the query count for unchanged documents, which dominate
incremental syncs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:26 -05:00
Taylor Eernisse
8cf14fb69b feat(search): sanitize raw FTS5 queries with safe fallback
Add input validation for Raw FTS query mode to prevent expensive or
malformed queries from reaching SQLite FTS5:

- Reject unbalanced double quotes (would cause FTS5 syntax error)
- Reject leading wildcard-only queries ("*", "* OR ...") that trigger
  expensive full-table scans
- Reject empty/whitespace-only queries
- Invalid raw input falls back to Safe mode automatically instead of
  erroring, so callers never see FTS5 parse failures

The Safe mode already escapes all tokens with double-quote wrapping
and handles embedded quotes via doubling. Raw mode now has a
validation layer on top.

All queries remain parameterized (?1, ?2) — user input never enters
SQL strings directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:17 -05:00
Taylor Eernisse
c2036c64e9 feat(embed): docs_embedded tracking, buffer reuse, retry hardening
Embedding pipeline improvements building on the concurrent batching
foundation:

- Track docs_embedded vs chunks_embedded separately. A document counts
  as embedded only when ALL its chunks succeed, giving accurate
  progress reporting. The sync command reads docs_embedded for its
  document count.

- Reuse a single Vec<u8> buffer (embed_buf) across all store_embedding
  calls instead of allocating per chunk. Eliminates ~3KB allocation per
  768-dim embedding.

- Detect and record errors when Ollama silently returns fewer
  embeddings than inputs (batch mismatch). Previously these dropped
  chunks were invisible.

- Improve retry error messages: distinguish "retry returned unexpected
  result" (wrong dims/count) from "retry request failed" (network
  error) instead of generic "chunk too large" message.

- Convert all hot-path SQL from conn.execute() to prepare_cached() for
  statement cache reuse (clear_document_embeddings, store_embedding,
  record_embedding_error).

- Record embedding_metadata errors for empty documents so they don't
  appear as perpetually pending on subsequent runs.

- Accept concurrency parameter (configurable via config.embedding.concurrency)
  instead of hardcoded EMBED_CONCURRENCY=2.

- Add schema version pre-flight check in embed command to fail fast
  with actionable error instead of cryptic SQL errors.

- Fix --retry-failed to use DELETE instead of UPDATE. UPDATE clears
  last_error but the row still matches config params in the LEFT JOIN,
  making the doc permanently invisible to find_pending_documents.
  DELETE removes the row entirely so the LEFT JOIN returns NULL.
  Regression test added (old_update_approach_leaves_doc_invisible).

- Add chunking forward-progress guard: after floor_char_boundary()
  rounds backward, ensure start advances by at least one full
  character to prevent infinite loops on multi-byte sequences
  (box-drawing chars, smart quotes). Test cases cover the exact
  patterns that caused production hangs on document 18526.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:08 -05:00
Taylor Eernisse
39cb0cb087 feat(embed): concurrent batching, UTF-8 safe chunking, right-sized chunks
Three fixes to the embedding pipeline:

1. Concurrent HTTP batching: fire EMBED_CONCURRENCY (2) Ollama requests
   in parallel via join_all, then write results serially to SQLite.
   ~2x throughput improvement on GPU-bound workloads.

2. UTF-8 boundary safety: all computed byte offsets in split_into_chunks
   (paragraph/sentence/word break finders + overlap advance) now use
   floor_char_boundary() to prevent panics on multi-byte characters
   like smart quotes and non-breaking spaces.

3. CHUNK_MAX_BYTES reduced from 6000 to 1500 to fit nomic-embed-text's
   actual 2048-token context window, eliminating context-length retry
   storms that were causing 10x slowdowns.

Also threads ShutdownSignal through embed pipeline for graceful Ctrl+C.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 14:48:34 -05:00
Taylor Eernisse
1c45725cba fix(sync): pass options.full through to generate-docs stage
The sync pipeline was hardcoding `false` for the `full` parameter when
calling run_generate_docs, so `lore sync --full` would re-ingest all
entities but then only regenerate documents for newly-dirtied ones.
Entities loaded before migration 007 (which introduced the dirty_sources
system) were never marked dirty and thus never got documents generated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 11:42:11 -05:00
Taylor Eernisse
405e5370dc feat(sync): concurrent drains, atomic watermarks, graceful Ctrl+C shutdown
Three fixes to the sync pipeline:

1. Atomic watermarks: wrap complete_job + update_watermark in a single
   SQLite transaction so crash between them can't leave partial state.

2. Concurrent drain loops: prefetch HTTP requests via join_all (batch
   size = dependent_concurrency), then write serially to DB. Reduces
   ~9K sequential requests from ~19 min to ~2.4 min.

3. Graceful shutdown: install Ctrl+C handler via ShutdownSignal
   (Arc<AtomicBool>), thread through orchestrator/CLI, release locked
   jobs on interrupt, record sync_run as "failed".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 11:22:04 -05:00
Taylor Eernisse
32783080f1 fix(timeline): report true total_events in robot JSON meta
The robot JSON envelope's meta.total_events field was incorrectly
reporting events.len() (the post-limit count), making it identical
to meta.showing. This defeated the purpose of having both fields.

Changes across the pipeline to fix this:

- collect_events now returns (Vec<TimelineEvent>, usize) where the
  second element is the total event count before truncation
- TimelineResult gains a total_events_before_limit field (serde-skipped)
  so the value flows cleanly from collect through to the renderer
- main.rs passes the real total instead of the events.len() workaround

Additional cleanup in this pass:

- Derive PartialEq/Eq/PartialOrd/Ord on TimelineEventType, replacing
  the hand-rolled event_type_discriminant() function. Variant declaration
  order now defines sort tiebreak, documented in a doc comment.
- Validate --since input with a proper LoreError::Other instead of
  silently treating invalid values as None
- Fix ANSI-aware tag column padding with console::pad_str (colored tags
  like "[merged]" were misaligned because ANSI escapes consumed width)
- Remove dead print_timeline_json and infer_max_depth functions that
  were superseded by print_timeline_json_with_meta

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 09:35:02 -05:00
Taylor Eernisse
f1cb45a168 style: format perf_benchmark.rs with cargo fmt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:49:53 -05:00
Taylor Eernisse
69df8a5603 feat(timeline): wire up lore timeline command with human + robot renderers
Complete Gate 3 by implementing the final three beads:
- bd-2f2: Human output renderer with colored event tags, entity refs,
  evidence snippets, and expansion summary footer
- bd-dty: Robot JSON output with {ok,data,meta} envelope, ISO timestamps,
  nested via provenance, and per-event-type details objects
- bd-1nf: CLI wiring with TimelineArgs (9 flags), Commands::Timeline
  variant, handle_timeline handler, VALID_COMMANDS entry, and robot-docs
  manifest with temporal_intelligence workflow

All 7 Gate 3 children now closed. Pipeline: SEED -> HYDRATE -> EXPAND ->
COLLECT -> RENDER fully operational.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:49:48 -05:00