gitlore

Author	SHA1	Message	Date
teernisse	fe7d210988	feat(embedding): strip GitLab boilerplate from titles before embedding GitLab auto-generates MR titles like "Draft: Resolve \"Issue Title\"" when creating MRs from issues. This 4-token boilerplate prefix dominated the embedding vectors, causing unrelated MRs with the same title structure to appear as highly similar in "lore related" results (0.667 similarity vs 0.674 for the actual parent issue — a difference of only 0.007). Add normalize_title_for_embedding() which deterministically strips: - "Draft: " prefix (case-insensitive) - "WIP: " prefix (case-insensitive) - "Resolve \"...\"" wrapper (extracts inner title) - Combinations: "Draft: Resolve \"...\"" The normalization is applied in all four document extractors (issues, MRs, discussions, notes) to the content_text field only. DocumentData.title preserves the original title for human-readable display in CLI output. Since content_text changes, content_hash will differ from stored values, triggering automatic re-embedding on the next "lore embed" run. Uses str::get() for all byte-offset slicing to prevent panics on titles containing emoji or other multi-byte UTF-8 characters. 15 new tests covering: all boilerplate patterns, case insensitivity, edge cases (empty inner text, no-op for normal titles), UTF-8 safety, and end-to-end document extraction with boilerplate titles. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 17:07:23 -04:00
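The sketch below shows the title normalization described in this commit. It assumes the `Draft: `/`WIP: ` markers are stripped first, then the `Resolve "..."` wrapper, and that an empty inner title leaves the input unchanged. The helper `strip_prefix_ci` is hypothetical; only `normalize_title_for_embedding` is named in the commit. All byte-offset slicing goes through `str::get`, so a title whose first character is multi-byte yields `None` instead of a panic.

```rust
/// Case-insensitive ASCII prefix strip (hypothetical helper).
/// `str::get` returns None instead of panicking when `prefix.len()` is not a
/// char boundary in `s`, e.g. when the title starts with an emoji or accent.
fn strip_prefix_ci<'a>(s: &'a str, prefix: &str) -> Option<&'a str> {
    let head = s.get(..prefix.len())?;
    if head.eq_ignore_ascii_case(prefix) {
        s.get(prefix.len()..)
    } else {
        None
    }
}

/// Normalized text for the embedding input only; callers keep the original
/// title for display.
pub fn normalize_title_for_embedding(title: &str) -> String {
    let mut t = title.trim();
    // Strip "Draft: " and then "WIP: " markers, case-insensitively.
    for marker in ["Draft: ", "WIP: "] {
        if let Some(rest) = strip_prefix_ci(t, marker) {
            t = rest.trim_start();
        }
    }
    // Unwrap `Resolve "Inner title"` only when the inner text is non-empty.
    if let Some(rest) = strip_prefix_ci(t, "Resolve \"") {
        if let Some(inner) = rest.strip_suffix('"') {
            let inner = inner.trim();
            if !inner.is_empty() {
                t = inner;
            }
        }
    }
    t.to_string()
}
```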
teernisse	8ab65a3401	fix(search): broaden whitespace collapse to all Unicode whitespace Change collapse_whitespace() from is_ascii_whitespace() to is_whitespace() so non-breaking spaces, em-spaces, and other Unicode whitespace characters in search snippets are also collapsed into single spaces. Additionally fix serde_json::to_value() call site to handle serialization errors gracefully instead of unwrapping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 17:07:10 -04:00
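Here is a minimal sketch of Unicode-aware whitespace collapsing. It assumes the snippet is also trimmed at both ends, which the commit does not state. `str::split_whitespace` splits on `char::is_whitespace`, so NBSP (U+00A0) and em space (U+2003) collapse like ordinary spaces; `split_ascii_whitespace` would leave them in place.

```rust
/// Collapse every run of Unicode whitespace (spaces, tabs, newlines, NBSP,
/// em space, ...) into a single ASCII space, dropping leading/trailing runs.
pub fn collapse_whitespace(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}
```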
teernisse	16bd33e8c0	feat(core): add ollama lifecycle management for cron sync Add src/core/ollama_mgmt.rs module that handles Ollama detection, startup, and health checking. This enables cron-based sync to automatically start Ollama when it's installed but not running, ensuring embeddings are always available during unattended sync runs. Integration points: - sync handler (--lock mode): calls ensure_ollama() before embedding phase - cron status: displays Ollama health (installed/running/not-installed) - robot JSON: includes OllamaStatusBrief in cron status response The module handles local vs remote Ollama URLs, IPv6, process detection via lsof, and graceful startup with configurable wait timeouts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 17:07:05 -04:00
teernisse	fa7c44d88c	fix(search): collapse newlines in snippets to prevent unindented metadata (GIT-5) Document content_text includes multi-line metadata (Project:, URL:, Labels:, State:) separated by newlines. FTS5 snippet() preserves these newlines, causing subsequent lines to render at column 0 with no indent. collapse_newlines() flattens all whitespace runs into single spaces before truncation and rendering. Includes 3 unit tests.	2026-03-12 10:25:39 -04:00
teernisse	e46a2fe590	test(core): add lookup-by-gitlab_project_id test for projects table Validates that the projects table schema uses gitlab_project_id (not gitlab_id) and that queries filtering by this column return the correct project. Uses the test helper convention where insert_project sets gitlab_project_id = id * 100.	2026-03-12 10:08:22 -04:00
teernisse	4ab04a0a1c	test(me): add integration tests for gitlab_base_url in robot JSON envelope Guards against regression in the wiring chain run_me -> print_me_json -> MeJsonEnvelope where the gitlab_base_url meta field could silently disappear. - me_envelope_includes_gitlab_base_url_in_meta: verifies full envelope serialization preserves the base URL in meta - activity_event_carries_url_construction_fields: verifies activity events contain entity_type + entity_iid + project fields, then demonstrates URL construction by combining with meta.gitlab_base_url	2026-03-12 10:08:22 -04:00
teernisse	9c909df6b2	feat(me): add 30-day mention age cutoff to filter stale @-mentions Previously, query_mentioned_in returned mentions from any time in the entity's history as long as the entity was still open (or recently closed). This caused noise: a mention from 6 months ago on a still-open issue would appear in the dashboard indefinitely. Now the SQL filters notes by created_at > mention_cutoff_ms, defaulting to 30 days. The recency_cutoff (7 days) still governs closed/merged entity visibility — this new cutoff governs mention note age on open entities. Signature change: query_mentioned_in gains a mention_cutoff_ms parameter. All existing test call sites updated. Two new tests verify the boundary: - mentioned_in_excludes_old_mention_on_open_issue (45-day mention filtered) - mentioned_in_includes_recent_mention_on_open_issue (5-day mention kept)	2026-03-12 10:08:22 -04:00
teernisse	7e5ffe35d3	feat(explain): enrich output with project path, thread excerpts, entity state, and timeline metadata Multiple improvements to the explain command's data richness: - Add project_path to EntitySummary so consumers can construct URLs from project + entity_type + iid without extra lookups - Include first_note_excerpt (first 200 chars) in open threads so agents and humans get thread context without a separate query - Add state and direction fields to RelatedIssue — consumers now see whether referenced entities are open/closed/merged and whether the reference is incoming or outgoing - Filter out self-references in both outgoing and incoming related entity queries (entity referencing itself via cross-reference extraction) - Wrap timeline excerpt in TimelineExcerpt struct with total_events and truncated fields — consumers know when events were omitted - Keep most recent events (tail) instead of oldest (head) when truncating timeline — recent activity is more actionable - Floor activity summary first_event at entity created_at — label events from bulk operations can predate entity creation - Human output: show project path in header, thread excerpt preview, state badges on related entities, directional arrows, truncation counts	2026-03-12 10:08:22 -04:00
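The next sketch covers two of the timeline changes: keeping the tail of the event list and flooring the first event at `created_at`. It assumes events arrive sorted oldest to newest and timestamps are milliseconds; the generic `TimelineExcerpt<T>`, `tail_excerpt`, and `floor_first_event` are illustrative names, not the crate's actual definitions.

```rust
#[derive(Debug, PartialEq)]
pub struct TimelineExcerpt<T> {
    pub events: Vec<T>,
    pub total_events: usize,
    pub truncated: bool,
}

/// Keep the `max_events` most recent events (the tail of an oldest-first list)
/// and record whether anything was omitted.
pub fn tail_excerpt<T>(mut events: Vec<T>, max_events: usize) -> TimelineExcerpt<T> {
    let total_events = events.len();
    let truncated = total_events > max_events;
    // split_off(k) returns elements [k, len), i.e. the newest ones.
    let events = if truncated {
        events.split_off(total_events - max_events)
    } else {
        events
    };
    TimelineExcerpt { events, total_events, truncated }
}

/// Bulk label events can predate creation, so never report a first event
/// earlier than the entity's created_at.
pub fn floor_first_event(first_event_ms: i64, created_at_ms: i64) -> i64 {
    first_event_ms.max(created_at_ms)
}
```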
teernisse	36b361a50a	fix(search): tag-aware snippet truncation prevents cutting inside <mark> pairs (GIT-5) The old truncation counted <mark></mark> HTML tags (~13 chars per keyword) as visible characters, causing over-aggressive truncation. When a cut landed inside a tag pair, render_snippet would render highlighted text as muted gray instead of bold yellow. New truncate_snippet() walks through markup counting only visible characters, respects tag boundaries, and always closes an open <mark> before appending ellipsis. Includes 6 unit tests.	2026-03-12 09:28:55 -04:00
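This sketch implements the tag-aware truncation described above. It assumes `<mark>`/`</mark>` are the only tags in FTS5 snippets and that the ellipsis is a single `…` character; the real `truncate_snippet` may differ in both. Tags cost zero visible width, closing tags are always emitted, and an open `<mark>` is closed before the ellipsis.

```rust
const OPEN: &str = "<mark>";
const CLOSE: &str = "</mark>";

/// Truncate to at most `max_visible` visible chars without splitting tags.
pub fn truncate_snippet(snippet: &str, max_visible: usize) -> String {
    let mut out = String::with_capacity(snippet.len() + CLOSE.len() + 3);
    let mut visible = 0usize;
    let mut in_mark = false;
    let mut rest = snippet;
    while !rest.is_empty() {
        // Closing tags are zero-width: emit them even after the budget is spent.
        if let Some(after) = rest.strip_prefix(CLOSE) {
            out.push_str(CLOSE);
            in_mark = false;
            rest = after;
            continue;
        }
        // Budget spent and more content remains: close any open highlight, elide.
        if visible == max_visible {
            if in_mark {
                out.push_str(CLOSE);
            }
            out.push('…');
            return out;
        }
        if let Some(after) = rest.strip_prefix(OPEN) {
            out.push_str(OPEN);
            in_mark = true;
            rest = after;
            continue;
        }
        // Copy exactly one visible char (never a partial UTF-8 sequence).
        let mut chars = rest.chars();
        if let Some(ch) = chars.next() {
            out.push(ch);
            visible += 1;
        }
        rest = chars.as_str();
    }
    out
}
```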
teernisse	44431667e8	feat(search): overhaul search output formatting (GIT-5) Phase 1: Add source_entity_iid to search results via CASE subquery on hydrate_results() for all 4 source types (issue, MR, discussion, note). Phase 2: Fix visual alignment - compute indent from prefix visible width. Phase 3: Show compact relative time on title line. Phase 4: Add drill-down hint footer (lore issues <iid>). Phase 5: Move labels to --explain mode, limit snippets to 2 terminal lines. Phase 6: Use section_divider() for results header. Also: promote strip_ansi/visible_width to public render utils, update robot mode --fields minimal search preset with source_entity_iid.	2026-03-12 09:15:34 -04:00
teernisse	ddab186315	feat(me): include GitLab base URL in robot meta for URL construction The `me` dashboard robot output now includes `meta.gitlab_base_url` so consuming agents can construct clickable issue/MR links without needing access to the lore config file. The pattern is: {gitlab_base_url}/{project}/-/issues/{iid} {gitlab_base_url}/{project}/-/merge_requests/{iid} This uses the new RobotMeta::with_base_url() constructor. The base URL is sourced from config.gitlab.base_url (already available in the me command's execution context) and normalized to strip trailing slashes. robot-docs updated to document the new meta field and URL construction pattern for the me command's response schema. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 10:30:03 -04:00
teernisse	d6d1686f8e	refactor(robot): add constructors to RobotMeta, support optional gitlab_base_url RobotMeta previously required direct struct literal construction with only elapsed_ms. This made it impossible to add optional fields without updating every call site to include them. Introduce two constructors: - RobotMeta::new(elapsed_ms) — standard meta with timing only - RobotMeta::with_base_url(elapsed_ms, base_url) — meta enriched with the GitLab instance URL, enabling consumers to construct entity links without needing config access The gitlab_base_url field uses #[serde(skip_serializing_if = "Option::is_none")] so existing JSON envelopes are byte-identical — no breaking change for any robot mode consumer. All 22 call sites across handlers, count, cron, drift, embed, generate_docs, ingest, list (mrs/notes), related, show, stats, sync_status, and who are updated from struct literals to RobotMeta::new(). Three tests verify the new constructors and trailing-slash normalization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-11 10:29:56 -04:00
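Below is a serde-free sketch of the two `RobotMeta` constructors together with the URL pattern documented for `lore me` (`{gitlab_base_url}/{project}/-/issues/{iid}`). The field types (`u64` for `elapsed_ms`) and the `entity_url` helper are assumptions for illustration. In the real struct, `gitlab_base_url` carries `#[serde(skip_serializing_if = "Option::is_none")]`, which is what keeps existing envelopes byte-identical.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct RobotMeta {
    pub elapsed_ms: u64,
    // Omitted from JSON when None in the real (serde-derived) struct.
    pub gitlab_base_url: Option<String>,
}

impl RobotMeta {
    pub fn new(elapsed_ms: u64) -> Self {
        Self { elapsed_ms, gitlab_base_url: None }
    }

    /// Trailing slashes are stripped so URL joins never produce "//".
    pub fn with_base_url(elapsed_ms: u64, base_url: &str) -> Self {
        Self {
            elapsed_ms,
            gitlab_base_url: Some(base_url.trim_end_matches('/').to_string()),
        }
    }
}

/// Hypothetical consumer-side helper following the documented URL pattern.
pub fn entity_url(base_url: &str, project: &str, entity_type: &str, iid: i64) -> Option<String> {
    let segment = match entity_type {
        "issue" => "issues",
        "mr" | "merge_request" => "merge_requests",
        _ => return None,
    };
    Some(format!("{base_url}/{project}/-/{segment}/{iid}"))
}
```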
teernisse	5c44ee91fb	fix(robot): propagate JSON serialization errors instead of silent failure Three robot-mode print functions used `serde_json::to_string().unwrap_or_default()` which silently outputs an empty string on failure (exit 0, no error). This diverged from the codebase standard in handlers.rs which uses `?` propagation. Changed to return Result<()> with proper LoreError::Other mapping: - explain.rs: print_explain_json() - file_history.rs: print_file_history_json() - trace.rs: print_trace_json() Updated callers in handlers.rs and explain.rs to propagate with `?`. While serde_json::to_string on a json!() Value is unlikely to fail in practice (only non-finite floats trigger it), the unwrap_or_default pattern violates the robot mode contract: callers expect either valid JSON on stdout or a structured error on stderr with a non-zero exit code, never empty output with exit 0.	2026-03-10 17:11:03 -04:00
teernisse	6aff96d32f	fix(sql): add ORDER BY to all LIMIT queries for deterministic results SQLite does not guarantee row order without ORDER BY, even with LIMIT. This was a systemic issue found during a multi-pass bug hunt: Production queries (explain.rs): - Outgoing reference query: ORDER BY target_entity_type, target_entity_iid - Incoming reference query: ORDER BY source_entity_type, COALESCE(iid) Without these, robot mode output was non-deterministic across calls, breaking clients expecting stable ordering. Test helper queries (5 locations across 3 files): - discussions_tests.rs: get_discussion_id() - mr_discussions.rs: get_mr_discussion_id() - queue.rs: setup_db_with_job(), release_all_locked_jobs_clears_locks() Currently safe (single-row inserts) but would break silently if tests expanded to multi-row fixtures.	2026-03-10 17:10:52 -04:00
teernisse	06889ec85a	fix(explain): address review findings — N+1 queries, duplicate decisions, silent errors 1. fetch_open_threads: replace N+1 loop (2 queries per thread) with a single query using correlated subqueries for note_count and started_by. 2. extract_key_decisions: track consumed notes so the same note is not matched to multiple events, preventing duplicate decision entries. 3. build_timeline_excerpt_from_pipeline: log tracing::warn on seed/collect failures instead of silently returning empty timeline.	2026-03-10 16:43:06 -04:00
teernisse	08bda08934	fix(explain): filter out NULL iids in related entities queries entity_references.target_entity_iid is nullable (unresolved cross-project refs), and COALESCE(i.iid, mr.iid) returns NULL for orphaned refs. Both paths caused rusqlite InvalidColumnType errors when fetching i64. Added IS NOT NULL filters to both outgoing and incoming reference queries.	2026-03-10 15:54:54 -04:00
teernisse	32134ea933	feat(explain): implement lore explain command for auto-generating issue/MR narratives Adds the full explain command with 7 output sections: entity summary, description, key decisions (heuristic event-note correlation), activity summary, open threads, related entities (closing MRs, cross-references), and timeline excerpt (reuses existing pipeline). Supports --sections filtering, --since time scoping, --no-timeline, --max-decisions, and robot mode JSON output. Closes: bd-2i3z, bd-a3j8, bd-wb0b, bd-3q5e, bd-nj7f, bd-9lbr	2026-03-10 15:04:35 -04:00
teernisse	a10d870863	remove: deprecated `show` command from CLI The `show` command (`lore show issue 42` / `lore show mr 99`) was deprecated in favor of the unified entity commands (`lore issues 42` / `lore mrs 99`). This commit fully removes the command entry point: - Remove `Commands::Show` variant from clap CLI definition - Remove `Commands::Show` match arm and deprecation warning in main.rs - Remove `handle_show_compat()` forwarding function from robot_docs.rs - Remove "show" from autocorrect known-commands and flags tables - Rename response schema keys from "show" to "detail" in robot-docs - Update command descriptions from "List or show" to "List ... or view detail with <IID>" The underlying detail-view module (`src/cli/commands/show/`) is preserved — its types (IssueDetail, MrDetail) and query/render functions are still used by `handle_issues` and `handle_mrs` when an IID argument is provided.	2026-03-10 14:20:57 -04:00
teernisse	cab8c540da	fix(show): include gitlab_id on notes in issue/MR detail views The show command's NoteDetail and MrNoteDetail structs were missing gitlab_id, making individual notes unaddressable in robot mode output. This was inconsistent with the notes list command which already exposed gitlab_id. Without an identifier, agents consuming show output could not construct GitLab web URLs or reference specific notes for follow-up operations via glab. Added gitlab_id to: - NoteDetail / NoteDetailJson (issue discussions) - MrNoteDetail / MrNoteDetailJson (MR discussions) - Both SQL queries (shifted column indices accordingly) - Both From<&T> conversion impls Deliberately scoped to show command only — me/timeline/trace structs were evaluated and intentionally left unchanged because they serve different consumption patterns where note-level identity is not needed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 13:27:33 -04:00
teernisse	62fbd7275e	fix(me): show activity on closed/merged items in dashboard The activity feed and since-last-check inbox previously filtered to only open items via state = 'opened' checks in the SQL subqueries. This meant comments on merged MRs (post-merge follow-ups, questions) and closed issues were silently dropped from the feed. Remove the state filter from the association checks in both query_activity() and query_since_last_check(). The user-association checks (assigned, authored, reviewing) remain — activity still only appears for items the user is connected to, regardless of state. The simplified subqueries also eliminate unnecessary JOINs to the issues/merge_requests tables that were only needed for the state check, resulting in slightly more efficient index-only scans on issue_assignees and mr_reviewers. Add 4 tests covering: merged MR (authored), closed MR (reviewer), closed issue (assignee), and merged MR in the since-last-check inbox.	2026-03-10 11:07:05 -04:00
teernisse	4b0535f852	perf(timeline): guard against overly broad seed queries Add pre-flight FTS count check before expensive bm25-ranked search. Queries matching >10,000 documents are rejected instantly with a suggestion to use a more specific query or --since filter. Prevents multi-minute CPU spin on queries like 'merge request' that match most of the corpus (106K/178K documents).	2026-03-06 21:22:43 -05:00
teernisse	6aaf931c9b	fix(embedding): guard is_multiple_of() progress logs against zero is_multiple_of(N) returns true for 0, which caused debug/info progress messages to fire at doc_num=0 (the start of every page) rather than only at the intended 50/100 milestones. Add != 0 check to both the debug (every 50) and info (every 100) log sites. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 17:01:33 -05:00
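The zero guard reduces to the one-line predicate below. `is_progress_milestone` is a hypothetical name; it uses `%`, which agrees with `u64::is_multiple_of` for a non-zero divisor, plus an explicit guard against a zero divisor because `%` by zero panics.

```rust
/// True only at real milestones (every, 2*every, ...), never at doc_num == 0.
/// 0 is a multiple of every n, so the `doc_num != 0` check is what stops the
/// spurious log at the start of each page.
fn is_progress_milestone(doc_num: u64, every: u64) -> bool {
    every != 0 && doc_num != 0 && doc_num % every == 0
}
```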
teernisse	e8d6c5b15f	feat(runtime): replace tokio+reqwest with asupersync async runtime - Add HTTP adapter layer (src/http.rs) wrapping asupersync h1 client - Migrate gitlab client, graphql, and ollama to HTTP adapter - Swap entrypoint from #[tokio::main] to RuntimeBuilder::new().block_on() - Rewrite signal handler for asupersync (RuntimeHandle::spawn + ctrl_c()) - Migrate rate limiter sleeps to asupersync::time::sleep(wall_now(), d) - Add asupersync-native HTTP integration tests - Convert timeline_seed_tests to RuntimeBuilder pattern Phases 1-3 of asupersync migration (atomic: code won't compile without all pieces).	2026-03-06 15:57:20 -05:00
teernisse	bf977eca1a	refactor(structure): reorganize codebase into domain-focused modules	2026-03-06 15:24:09 -05:00
teernisse	4d41d74ea7	refactor(deps): replace tokio Mutex/join!, add NetworkErrorKind enum, remove reqwest from error types	2026-03-06 15:22:42 -05:00
teernisse	3a4fc96558	refactor(shutdown): extract 4 identical Ctrl+C handlers into core/shutdown.rs	2026-03-06 15:22:37 -05:00
teernisse	d3f8020cf8	perf(me): optimize mentions query with materialized CTEs scoped to candidates The `query_mentioned_in` SQL previously joined notes directly against the full issues/merge_requests tables, with per-row subqueries for author/assignee/reviewer exclusion. On large databases this produced pathological query plans where SQLite scanned the entire notes table before filtering to relevant entities. Refactor into a dedicated `build_mentioned_in_sql()` builder that: 1. Pre-filters candidate issues and MRs into MATERIALIZED CTEs (state open OR recently closed, not authored by user, not assigned/reviewing). This narrows the working set before any notes join. 2. Computes note timestamps (my_ts, others_ts, any_ts) as separate MATERIALIZED CTEs scoped to candidate entities only, rather than scanning all notes. 3. Joins mention-bearing notes against the pre-filtered candidates, avoiding the full-table scans. Also adds a test verifying that authored issues are excluded from the mentions results, and a unit test asserting all four CTEs are materialized. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 13:36:37 -05:00
teernisse	9107a78b57	perf(ingestion): replace per-row INSERT loops with chunked batch INSERTs The issue and MR ingestion paths previously inserted labels, assignees, and reviewers one row at a time inside a transaction. For entities with many labels or assignees, this issued N separate SQLite statements where a single multi-row INSERT suffices. Replace the per-row loops with batch INSERT functions that build a single `INSERT OR IGNORE ... VALUES (?1,?2),(?1,?3),...` statement per chunk. Chunks are capped at 400 rows (BATCH_LINK_ROWS_MAX); because ?1 is shared, a full chunk binds 401 parameters, comfortably below the 999 bind-parameter limit that was SQLite's default before version 3.32.0 raised it to 32766. Affected paths: - issues.rs: link_issue_labels_batch_tx, insert_issue_assignees_batch_tx - merge_requests.rs: insert_mr_labels_batch_tx, insert_mr_assignees_batch_tx, insert_mr_reviewers_batch_tx New tests verify deduplication (OR IGNORE), multi-chunk correctness, and equivalence with the old per-row approach. A perf benchmark (bench_issue_assignee_insert_individual_vs_batch) demonstrates the speedup across representative assignee set sizes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 13:36:26 -05:00
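The sketch below shows one way to build the chunked statements, returning each SQL string with its bind count. The table and column names and both helper functions are hypothetical; only `BATCH_LINK_ROWS_MAX` and the `(?1,?2),(?1,?3)` placeholder shape come from the commit. The caller would bind the parent id first, followed by the chunk's ids.

```rust
const BATCH_LINK_ROWS_MAX: usize = 400;

/// Build `INSERT OR IGNORE INTO t (a, b) VALUES (?1, ?2), (?1, ?3), ...`.
/// ?1 is the shared parent id; each linked id gets its own placeholder.
fn build_link_insert_sql(table: &str, parent_col: &str, child_col: &str, rows: usize) -> String {
    let values: Vec<String> = (0..rows).map(|i| format!("(?1, ?{})", i + 2)).collect();
    format!(
        "INSERT OR IGNORE INTO {table} ({parent_col}, {child_col}) VALUES {}",
        values.join(", ")
    )
}

/// One (sql, bind_count) pair per chunk; an empty id list yields no statements.
fn plan_link_batches(table: &str, parent_col: &str, child_col: &str, child_ids: &[i64]) -> Vec<(String, usize)> {
    child_ids
        .chunks(BATCH_LINK_ROWS_MAX)
        .map(|chunk| {
            let sql = build_link_insert_sql(table, parent_col, child_col, chunk.len());
            (sql, chunk.len() + 1) // +1 for the shared parent id
        })
        .collect()
}
```

Reusing `?1` for the parent keeps each chunk at n + 1 parameters instead of 2n, which is why a 400-row chunk stays well under the 999-parameter limit.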
teernisse	1dfcfd3f83	feat(autocorrect): add fuzzy subcommand matching and flag-as-subcommand detection Extend the CLI autocorrection pipeline with two new correction rules that help agents recover from common typos and misunderstandings: 1. SubcommandFuzzy (threshold 0.85): Fuzzy-matches typo'd subcommands against the canonical list. Examples: - "issuess" → "issues" - "timline" → "timeline" - "serach" → "search" Guards prevent false positives: - Words that look like misplaced global flags are skipped - Valid command prefixes are left to clap's infer_subcommands 2. FlagAsSubcommand: Detects when agents type subcommands as flags. Some agents (especially Codex) assume `--robot-docs` is a flag rather than a subcommand. This rule converts: - "--robot-docs" → "robot-docs" - "--generate-docs" → "generate-docs" Also improves error messages in main.rs: - MissingRequiredArgument: Contextual example based on detected subcommand - MissingSubcommand: Lists common commands - TooFewValues/TooManyValues: Command-specific help hints Added CANONICAL_SUBCOMMANDS constant enumerating all valid subcommands (including hidden ones) for fuzzy matching. This ensures agents that know about hidden commands still get typo correction. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-06 11:15:28 -05:00
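The sketch below illustrates both correction rules. The commit states only the 0.85 threshold, not the similarity metric, so standard Jaro-Winkler (prefix scale 0.1, prefix capped at 4) is an assumption here. The `CANONICAL_SUBCOMMANDS` list is abbreviated and hypothetical, and the prefix-inference guard that defers to clap is omitted.

```rust
/// Jaro similarity: 1.0 for identical strings, 0.0 with no matching chars.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Chars only match if they lie within this distance of each other.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let (mut a_hit, mut b_hit) = (vec![false; a.len()], vec![false; b.len()]);
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_hit[j] && a[i] == b[j] {
                a_hit[i] = true;
                b_hit[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Matched chars appearing in a different order are half-transpositions.
    let a_seq = a.iter().zip(&a_hit).filter(|(_, hit)| **hit).map(|(c, _)| c);
    let b_seq = b.iter().zip(&b_hit).filter(|(_, hit)| **hit).map(|(c, _)| c);
    let half = a_seq.zip(b_seq).filter(|(x, y)| x != y).count();
    let (m, t) = (matches as f64, (half / 2) as f64);
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: boosts scores for a shared prefix of up to 4 chars.
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a.chars().zip(b.chars()).take_while(|(x, y)| x == y).take(4).count() as f64;
    j + prefix * 0.1 * (1.0 - j)
}

const CANONICAL_SUBCOMMANDS: &[&str] = &["issues", "mrs", "search", "timeline", "sync", "me", "who", "robot-docs"];
const FUZZY_THRESHOLD: f64 = 0.85;

/// SubcommandFuzzy: best canonical match at or above the threshold.
fn fuzzy_subcommand(word: &str) -> Option<&'static str> {
    // Flag-like words and already-valid commands are left alone.
    if word.starts_with('-') || CANONICAL_SUBCOMMANDS.iter().any(|c| *c == word) {
        return None;
    }
    CANONICAL_SUBCOMMANDS
        .iter()
        .map(|cmd| (*cmd, jaro_winkler(word, cmd)))
        .filter(|(_, score)| *score >= FUZZY_THRESHOLD)
        .max_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(cmd, _)| cmd)
}

/// FlagAsSubcommand: "--robot-docs" -> "robot-docs" when it names a subcommand.
fn flag_as_subcommand(arg: &str) -> Option<&'static str> {
    let bare = arg.strip_prefix("--")?;
    CANONICAL_SUBCOMMANDS.iter().copied().find(|cmd| *cmd == bare)
}
```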
teernisse	ffbd1e2dce	feat(me): add mentions section for @-mentions in dashboard Add a new --mentions flag to the `lore me` command that surfaces items where the user is @-mentioned but NOT already assigned, authoring, or reviewing. This fills an important gap in the personal work dashboard: cross-team requests and callouts that don't show up in the standard issue/MR sections. Implementation details: - query_mentioned_in() scans notes for @username patterns, then filters out entities where the user is already an assignee, author, or reviewer - MentionedInItem type captures entity_type (issue/mr), iid, title, state, project path, attention state, and updated timestamp - Attention state computation marks items as needs_attention when there's recent activity from others - Recency cutoff (7 days) prevents surfacing stale mentions - Both human and robot renderers include the new section The robot mode schema adds mentioned_in array with me_mentions field preset for token-efficient output. Test coverage: - mentioned_in_finds_mention_on_unassigned_issue: basic case - mentioned_in_excludes_assigned_issue: no duplicate surfacing - mentioned_in_excludes_author_on_mr: author already sees in authored MRs - mentioned_in_excludes_reviewer_on_mr: reviewer already sees in reviewing - mentioned_in_uses_recency_cutoff: old mentions filtered - mentioned_in_respects_project_filter: scoping works Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-06 11:15:15 -05:00
teernisse	571c304031	feat(init): add --refresh flag for project re-registration When new projects are added to the config file, `lore sync` doesn't pick them up because project discovery only happens during `lore init`. Previously, users had to use `--force` to overwrite their entire config. The new `--refresh` flag reads the existing config and updates the database to match, without modifying the config file itself. Features: - Validates GitLab authentication before processing - Registers new projects from config into the database - Detects orphan projects (in DB but removed from config) - Interactive mode: prompts to delete orphans (default: No) - Robot mode: returns JSON with orphan info, no prompts Usage: lore init --refresh # Interactive lore --robot init --refresh # JSON output Improved UX: When running `lore init` with an existing config and no flags, the error message now suggests using `--refresh` to register new projects or `--force` to overwrite the config file. Implementation: - Added RefreshOptions and RefreshResult types to init module - Added run_init_refresh() for core refresh logic - Added delete_orphan_projects() helper for orphan cleanup - Added handle_init_refresh() in main.rs for CLI handling - Added JSON output types for robot mode - Registered --refresh in autocorrect.rs command flags registry - --refresh conflicts with --force (mutually exclusive)	2026-03-02 15:23:41 -05:00
teernisse	5fd1ce6905	perf(ingestion): implement prefetch pattern for issue discussions Issue discussion sync was ~10x slower than MR discussion sync because it used a fully sequential pattern: fetch one issue's discussions, write to DB, repeat. MR sync already used a prefetch pattern with concurrent HTTP requests followed by sequential DB writes. This commit brings issue discussion sync to parity with MRs: Architecture (prefetch pattern): 1. HTTP phase: Concurrent fetches via `join_all()` with batch size controlled by `dependent_concurrency` config (default 8) 2. Transform phase: Normalize discussions and notes during prefetch 3. DB phase: Sequential writes with proper transaction boundaries Changes: - gitlab/client.rs: Add `fetch_all_issue_discussions()` to mirror the existing MR pattern for API consistency - discussions.rs: Replace `ingest_issue_discussions()` with: * `prefetch_issue_discussions()` - async HTTP fetch + transform * `write_prefetched_issue_discussions()` - sync DB writes * New structs: `PrefetchedIssueDiscussions`, `PrefetchedDiscussion` - orchestrator.rs: Update `sync_discussions_sequential()` to use concurrent prefetch for each batch instead of sequential calls - surgical.rs: Update single-issue surgical sync to use new functions - mod.rs: Update public exports Expected improvement: 5-10x speedup on issue discussion sync (from ~50s to ~5-10s for large projects) due to concurrent HTTP round-trips. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-02 14:14:03 -05:00
teernisse	b67bb8754c	fix(who): prevent integer overflow in limit calculations When `--limit` is omitted, the default value is `usize::MAX` to mean "unlimited". The previous code used `(limit + 1) as i64` to fetch one extra row for "has more" detection. This caused integer overflow: usize::MAX + 1 wraps around to 0 in release builds (debug builds panic on the overflow instead). The resulting `LIMIT 0` clause returned zero rows, making the `who` subcommands appear to find nothing even when data existed. Fix: Use `saturating_add(1)` to cap at `usize::MAX` instead of wrapping, then `.min(i64::MAX as usize)` to ensure the value fits in SQLite's signed 64-bit LIMIT parameter. Includes regression tests that verify `usize::MAX` limit returns results. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-02 14:13:51 -05:00
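The fix described above fits in a small helper. `sql_limit_plus_one` and `has_more` are hypothetical names; the arithmetic (`saturating_add(1)` followed by `.min(i64::MAX as usize)`) is the one the commit describes.

```rust
/// Value bound to `LIMIT ?`: one extra row for "has more" detection.
/// `usize::MAX` means "unlimited".
fn sql_limit_plus_one(limit: usize) -> i64 {
    // Saturate instead of wrapping usize::MAX + 1 around to 0.
    let plus_one = limit.saturating_add(1);
    // SQLite's LIMIT is a signed 64-bit integer, so clamp before converting.
    plus_one.min(i64::MAX as usize) as i64
}

/// The extra row came back, so more results exist beyond `limit`.
fn has_more(rows_fetched: usize, limit: usize) -> bool {
    rows_fetched > limit
}
```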
teernisse	b2811b5e45	fix(fts): remove NEAR from infix operator list NEAR is an FTS5 function (NEAR(term1 term2, N)), not an infix operator like AND/OR/NOT. Passing it through unquoted in Safe mode was incorrect - it would be treated as a literal term rather than a function call. Users who need NEAR proximity search should use FtsQueryMode::Raw which passes the query through verbatim to FTS5. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:59 -05:00
teernisse	2d2e470621	refactor(orchestrator): consolidate stale lock reclamation and fix edge cases Several improvements to the ingestion orchestrator: 1. Stale lock reclamation consolidation: Previously, reclaim_stale_locks() was called redundantly in multiple drain functions (drain_resource_events, drain_closes_issues, etc.). Now it's called once at sync entry points (ingest_project_issues, ingest_project_mrs) to reduce overhead and DB contention. 2. Fix status_enrichment_mode error values: - "fetched" -> "error" when project path is missing - "fetched" -> "fetch_error" when GraphQL fetch fails These values are used in robot mode JSON output and should accurately reflect the error condition. 3. Add batch_size zero guard: Added .max(1) to batch_size calculation to prevent panic in .chunks() when config.sync.dependent_concurrency is 0. This makes the code defensive against misconfiguration. These changes improve correctness and reduce unnecessary DB operations during sync, particularly beneficial for large projects with many entities. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:44 -05:00
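The batch-size guard in item 3 follows a simple pattern, sketched below with a hypothetical `batches` helper. `slice::chunks` panics when given a chunk size of 0, so clamping to at least 1 turns a misconfigured `dependent_concurrency = 0` into sequential one-at-a-time batches.

```rust
/// Split work into batches of `dependent_concurrency` items, never 0.
fn batches<T>(items: &[T], dependent_concurrency: usize) -> std::slice::Chunks<'_, T> {
    // chunks(0) would panic; .max(1) degrades to one item per batch instead.
    items.chunks(dependent_concurrency.max(1))
}
```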
teernisse	23efb15599	feat(truncation): add pre-truncation for oversized descriptions Add pre_truncate_description() to prevent unbounded memory allocation when processing pathologically large descriptions (e.g., 500MB base64 blobs in issue descriptions). Previously, the document extraction pipeline would: 1. Allocate memory for the entire description 2. Append to content buffer 3. Only truncate at the end via truncate_hard_cap() For a 500MB description, this would allocate 500MB+ before truncation. New approach: 1. Check description size BEFORE appending 2. If over limit, truncate at UTF-8 boundary immediately 3. Add human-readable marker: "[... description truncated from 500.0MB to 2.0MB ...]" 4. Log warning with original size for observability Also adds format_bytes() helper for human-readable byte sizes (B, KB, MB). This is applied to both issue and MR document extraction in extractor.rs, protecting the embedding pipeline from OOM on malformed GitLab data. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:32 -05:00
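Here is a sketch of pre-truncation at a UTF-8 boundary plus a byte formatter. It assumes 1024-based units with one decimal place, which matches the "500.0MB" example but is not confirmed by the commit; the warning log is omitted. `floor_boundary` is a hypothetical helper that walks back to the nearest char boundary.

```rust
use std::borrow::Cow;

/// Human-readable size (assumed 1024-based): "512B", "1.5KB", "2.0MB".
fn format_bytes(n: usize) -> String {
    const KB: f64 = 1024.0;
    const MB: f64 = 1024.0 * 1024.0;
    let x = n as f64;
    if x >= MB {
        format!("{:.1}MB", x / MB)
    } else if x >= KB {
        format!("{:.1}KB", x / KB)
    } else {
        format!("{n}B")
    }
}

/// Largest char boundary <= max (index 0 is always a boundary).
fn floor_boundary(s: &str, max: usize) -> usize {
    if max >= s.len() {
        return s.len();
    }
    let mut i = max;
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Cut oversized descriptions before they are appended to the document buffer.
fn pre_truncate_description(desc: &str, max_bytes: usize) -> Cow<'_, str> {
    if desc.len() <= max_bytes {
        return Cow::Borrowed(desc);
    }
    let cut = floor_boundary(desc, max_bytes);
    Cow::Owned(format!(
        "{}\n[... description truncated from {} to {} ...]",
        &desc[..cut],
        format_bytes(desc.len()),
        format_bytes(max_bytes)
    ))
}
```

Note that the appended marker makes the result slightly longer than `max_bytes`; the later `truncate_hard_cap()` pass still bounds the final document.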
teernisse	a45c37c7e4	feat(timeline): add entity-direct seeding and round-robin evidence selection Enhance the timeline command with two major improvements: 1. Entity-direct seeding syntax (bypass search): lore timeline issue:42 # Timeline for specific issue lore timeline i:42 # Short form lore timeline mr:99 # Timeline for specific MR lore timeline m:99 # Short form This directly resolves the entity and gathers ALL its discussions without requiring search/embedding. Useful when you know exactly which entity you want. 2. Round-robin evidence note selection: Previously, evidence notes were taken in FTS rank order, which could result in all notes coming from a single high-traffic discussion. Now we: - Fetch 5x the requested limit (or minimum 50) - Group notes by discussion_id - Select round-robin across discussions - This ensures diverse evidence from multiple conversations API changes: - Renamed total_events_before_limit -> total_filtered_events (clearer semantics) - Added resolve_entity_by_iid() in timeline.rs for IID-based entity resolution - Added seed_timeline_direct() in timeline_seed.rs for search-free seeding - Added round_robin_select_by_discussion() helper function The entity-direct mode uses search_mode: "direct" to distinguish from "hybrid" or "lexical" search modes in the response metadata. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:23 -05:00
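The round-robin selection can be sketched as follows. Notes are modeled as hypothetical `(discussion_id, note_id)` pairs in FTS rank order; the real function likely works on richer structs. Each round takes the next-best note from every discussion, in order of each discussion's best-ranked note, until the limit is reached.

```rust
use std::collections::HashMap;

/// Over-fetch factor described in the commit: 5x the limit, at least 50.
fn evidence_fetch_size(limit: usize) -> usize {
    limit.saturating_mul(5).max(50)
}

/// `ranked` is (discussion_id, note_id) in rank order.
fn round_robin_select_by_discussion(ranked: &[(i64, i64)], limit: usize) -> Vec<(i64, i64)> {
    // Group by discussion, preserving rank order within each group and the
    // order in which discussions first appear.
    let mut order: Vec<i64> = Vec::new();
    let mut groups: HashMap<i64, Vec<(i64, i64)>> = HashMap::new();
    for &(discussion_id, note_id) in ranked {
        groups
            .entry(discussion_id)
            .or_insert_with(|| {
                order.push(discussion_id);
                Vec::new()
            })
            .push((discussion_id, note_id));
    }
    let mut selected = Vec::with_capacity(limit.min(ranked.len()));
    let mut round = 0;
    while selected.len() < limit {
        let mut took_any = false;
        for discussion_id in &order {
            if selected.len() == limit {
                break;
            }
            if let Some(note) = groups[discussion_id].get(round) {
                selected.push(*note);
                took_any = true;
            }
        }
        if !took_any {
            break; // every discussion is exhausted
        }
        round += 1;
    }
    selected
}
```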
teernisse	8657e10822	feat(related): add semantic similarity discovery command Implement `lore related` command for discovering semantically similar entities using vector embeddings. Supports two modes: Entity mode: lore related issues 42 # Find entities similar to issue #42 lore related mrs 99 # Find entities similar to MR !99 Query mode: lore related "auth bug" # Find entities matching free text query Key features: - Uses existing embedding infrastructure (nomic-embed-text via Ollama) - Computes shared labels between source and results - Shows similarity scores as percentage (0-100%) - Warns when all results have low similarity (<30%) - Warns for short queries (<=2 words) that may produce noisy results - Filters out discussion/note documents, returning only issues and MRs - Handles orphaned documents gracefully (skips if entity deleted) - Robot mode JSON output with {ok, data, meta} envelope Implementation details: - distance_to_similarity() converts L2 distance to 0-1 score: 1/(1+distance) - Uses saturating_add/saturating_mul for overflow safety on limit parameter - Proper error handling for missing embeddings ("run lore embed first") - Project scoping via -p flag with fuzzy matching CLI integration: - Added to autocorrect.rs command registry - Added Related variant to Commands enum in cli/mod.rs - Wired into main.rs with handle_related() Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:12 -05:00
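The scoring and warning rules above can be sketched as small pure functions. `distance_to_similarity` uses the commit's formula, 1/(1+distance). The percentage rounding and the helper names `similarity_percent`, `all_low_similarity`, and `is_short_query` are assumptions. Because L2 distances are non-negative, scores fall in (0, 1], with 1.0 for an exact match.

```rust
/// L2 distance (>= 0) to a similarity in (0, 1]; distance 0 gives 1.0.
fn distance_to_similarity(distance: f64) -> f64 {
    1.0 / (1.0 + distance)
}

/// Similarity shown as a 0-100 percentage (rounding is an assumption).
fn similarity_percent(distance: f64) -> u32 {
    (distance_to_similarity(distance) * 100.0).round() as u32
}

/// Warn when every result scores below 30%.
fn all_low_similarity(similarities: &[f64]) -> bool {
    !similarities.is_empty() && similarities.iter().all(|&s| s < 0.30)
}

/// Warn for queries of two words or fewer.
fn is_short_query(query: &str) -> bool {
    query.split_whitespace().count() <= 2
}
```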
teernisse	7fdeafa330	feat(db): add migration 028 for discussions.merge_request_id FK constraint Add foreign key constraint on discussions.merge_request_id to prevent orphaned discussions when MRs are deleted. SQLite doesn't support ALTER TABLE ADD CONSTRAINT, so this migration recreates the table with: 1. New table with FK: REFERENCES merge_requests(id) ON DELETE CASCADE 2. Data copy with FK validation (only copies rows with valid MR references) 3. Table swap (DROP old, RENAME new) 4. Full index recreation (all 10 indexes from migrations 002-022) The migration also includes a CHECK constraint ensuring mutual exclusivity: - Issue discussions have issue_id NOT NULL and merge_request_id NULL - MR discussions have merge_request_id NOT NULL and issue_id NULL Also fixes run_migrations() to properly propagate query errors instead of silently returning unwrap_or defaults, improving error diagnostics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-26 11:06:01 -05:00
teernisse	87bdbda468	feat(status): add per-entity sync counts from migration 027 Enhances sync status reporting to include granular per-entity counts that were added in database migration 027. This provides better visibility into what each sync run actually processed. New fields in SyncRunInfo and robot mode JSON: - issues_fetched / issues_ingested: issue sync counts - mrs_fetched / mrs_ingested: merge request sync counts - skipped_stale: entities skipped due to staleness - docs_regenerated / docs_embedded: document pipeline counts - warnings_count: non-fatal issues during sync Robot mode optimization: - Uses skip_serializing_if = "is_zero" to omit zero-value fields - Reduces JSON payload size for typical sync runs - Maintains backwards compatibility (fields are additive) SQL query now reads all 8 new columns from sync_runs table, with defensive unwrap_or(0) for NULL handling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-25 10:02:45 -05:00
teernisse	ed987c8f71	docs: update robot-docs manifest and agent instructions for since-last-check Updates the `lore robot-docs` manifest with comprehensive documentation for the new since-last-check inbox feature, enabling AI agents to discover and use the functionality programmatically. robot-docs manifest additions: - since_last_check response schema with cursor_iso, groups, events - --reset-cursor flag documentation - Design notes: cursor persistence location, --project filter behavior - Example commands in personal_dashboard section Agent instruction updates (AGENTS.md, CLAUDE.md): - Added --mrs, --project, --user flags to command examples - Added --reset-cursor example - Aligned both files for consistency Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-25 10:02:37 -05:00
teernisse	ce5621f3ed	feat(me): add "since last check" cursor-based inbox to dashboard Implements a cursor-based notification inbox that surfaces actionable events from others since the user's last `lore me` invocation. This addresses the core UX need: "what happened while I was away?" Event Sources (three-way UNION query): 1. Others' comments on user's open issues/MRs 2. @mentions on ANY item (not restricted to owned items) 3. Assignment/review-request system notes mentioning user Mention Detection: - SQL LIKE pre-filter for performance, then regex validation - Word-boundary-aware: rejects "alice" in "@alice-bot" or "alice@corp.com" - Domain rejection: "@alice.com" not matched (prevents email false positives) - Punctuation tolerance: "@alice," "@alice." "(@ alice)" all match Cursor Watermark Pattern: - Global watermark computed from ALL projects before --project filtering - Ensures --project display filter doesn't permanently skip events - Cursor advances only after successful render (no data loss on errors) - First run establishes baseline (no inbox shown), subsequent runs show delta Output: - Human: color-coded event badges, grouped by entity, actor + timestamp - Robot: standard envelope with since_last_check object containing cursor_iso, total_event_count, and groups array with nested events CLI additions: - --reset-cursor flag: clears cursor (next run shows no new events) - Autocorrect: --reset-cursor added to known me command flags Tests cover: - Mention with trailing comma/period/parentheses (should match) - Email-like text "@alice.com" (should NOT match) - Domain-like text "@alice.example" (should NOT match) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-25 10:02:31 -05:00
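The word-boundary rules for mention detection can be checked with a small matcher. This is a hedged sketch. The commit describes a SQL LIKE pre-filter followed by regex validation; this version swaps in a hand-written, std-only scan so it has no dependencies. The function name `mentions_user`, the assumed set of username characters, and the case-sensitive comparison are all assumptions.

```rust
/// Characters that continue a username (assumed: ASCII alphanumerics,
/// '_' and '-'; a '.' is handled separately in `mentions_user`).
fn is_name_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '-'
}

/// Returns true if `text` contains an @-mention of exactly `username`.
fn mentions_user(text: &str, username: &str) -> bool {
    let needle = format!("@{username}");
    let mut search_from = 0;
    while let Some(offset) = text[search_from..].find(needle.as_str()) {
        let at = search_from + offset;
        let end = at + needle.len();

        // Reject email-like text such as "bob@alice.com":
        // the '@' must not directly follow a name character.
        let before_ok = text[..at]
            .chars()
            .next_back()
            .map_or(true, |c| !is_name_char(c));

        // Reject "@alice-bot" (name continues) and "@alice.com" (domain),
        // but accept trailing punctuation: "@alice," "@alice." "(@alice)".
        let mut rest = text[end..].chars();
        let after_ok = match rest.next() {
            None => true,
            Some('.') => !rest.next().map_or(false, is_name_char),
            Some(c) => !is_name_char(c),
        };

        if before_ok && after_ok {
            return true;
        }
        // '@' is a single byte, so at + 1 is always a valid char boundary.
        search_from = at + 1;
    }
    false
}
```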
teernisse	eac640225f	feat(core): add cursor persistence module for session-based timestamps Introduces a lightweight file-based cursor system for persisting per-user timestamps across CLI invocations. This enables "since last check" semantics where `lore me` can track what the user has seen. Key design decisions: - Per-user cursor files: ~/.local/share/lore/me_cursor_<username>.json - Atomic writes via temp-file + rename pattern (crash-safe) - Graceful degradation: missing/corrupt files return None - Username sanitization: non-safe chars replaced with underscore The cursor module provides three operations: - read_cursor(username) -> Option<i64>: read last-check timestamp - write_cursor(username, timestamp_ms): atomically persist timestamp - reset_cursor(username): delete cursor file (no-op if missing) Tests cover: missing file, roundtrip, per-user isolation, reset isolation, JSON validity after overwrites, corrupt file handling. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-25 10:02:13 -05:00
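The temp-file-plus-rename pattern and the graceful-degradation rules can be sketched with the standard library alone. In this illustration the cursor directory is passed in explicitly, instead of resolving ~/.local/share/lore, so the sketch can be tested. The JSON field name `last_check_ms`, the safe-character set, and the `.json.tmp` suffix are assumptions, not details taken from the module.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Replace anything outside [A-Za-z0-9_-] with '_' so a username can never
/// escape the cursor directory (the safe set is an assumption).
fn sanitize_username(username: &str) -> String {
    username
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == '-' { c } else { '_' })
        .collect()
}

fn cursor_path(dir: &Path, username: &str) -> PathBuf {
    dir.join(format!("me_cursor_{}.json", sanitize_username(username)))
}

/// Read the last-check timestamp; a missing or corrupt file yields None.
fn read_cursor(dir: &Path, username: &str) -> Option<i64> {
    let text = fs::read_to_string(cursor_path(dir, username)).ok()?;
    // Hypothetical file shape: {"last_check_ms":<i64>}
    let (_, value) = text.split_once(':')?;
    value.trim().trim_end_matches('}').trim().parse().ok()
}

/// Write to a temp file, flush it to disk, then rename it over the real file.
fn write_cursor(dir: &Path, username: &str, timestamp_ms: i64) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let path = cursor_path(dir, username);
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        write!(file, "{{\"last_check_ms\":{timestamp_ms}}}")?;
        file.sync_all()?;
    }
    fs::rename(&tmp, &path)
}

/// Delete the cursor file; a missing file is not treated as an error.
fn reset_cursor(dir: &Path, username: &str) -> io::Result<()> {
    match fs::remove_file(cursor_path(dir, username)) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}
```

The temp file sits in the same directory, and therefore on the same filesystem, which is what makes the rename an atomic replacement on POSIX systems. A crash mid-write leaves at worst a stray .tmp file, and the previous cursor stays intact.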
teernisse	f9e7913232	fix(error): replace misleading Database error suggestions The Database(rusqlite::Error) catch-all variant was suggesting 'lore reset --yes' for ALL database errors, including transient SQLITE_BUSY lock contention. This was wrong on two counts: 1. `lore reset` is not implemented (prints "not yet implemented") 2. Nuking the database is not the fix for a transient lock Changes: - Detect SQLITE_BUSY specifically via sqlite_error_code() and provide targeted advice: "Another process has the database locked" with common causes (cron sync, concurrent lore command) - Map SQLITE_BUSY to ErrorCode::DatabaseLocked (exit code 9) instead of DatabaseError (exit code 10) — semantically correct - Set BUSY actions to ["lore cron status"] (diagnostic) instead of the useless "lore sync --force" (--force overrides the app-level lock table, but SQLITE_BUSY fires before that table is even reached) - Fix MigrationFailed suggestion: also referenced non-existent 'lore reset', now says "try again" with lore migrate / lore doctor - Non-BUSY database errors get a simpler suggestion pointing to lore doctor (no more phantom reset command) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:36:16 -05:00
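The BUSY-versus-other split can be illustrated without rusqlite. The commit detects SQLITE_BUSY via `sqlite_error_code()`; this hypothetical sketch instead classifies a raw SQLite result code, masking extended codes down to their primary code. The exit codes 9 and 10 and the BUSY action come from the commit message. The `Advice` struct and the non-BUSY suggestion and action text are invented for illustration.

```rust
/// SQLite's primary result code for SQLITE_BUSY.
const SQLITE_BUSY: i32 = 5;

#[derive(Debug, PartialEq)]
enum ErrorCode {
    DatabaseLocked,
    DatabaseError,
}

impl ErrorCode {
    fn exit_code(&self) -> i32 {
        match self {
            ErrorCode::DatabaseLocked => 9,
            ErrorCode::DatabaseError => 10,
        }
    }
}

/// Hypothetical bundle of what the CLI reports for a database error.
struct Advice {
    code: ErrorCode,
    suggestion: &'static str,
    actions: &'static [&'static str],
}

/// Classify a raw SQLite result code. Extended codes such as
/// SQLITE_BUSY_RECOVERY (261) carry the primary code in their low 8 bits.
fn advise(result_code: i32) -> Advice {
    if (result_code & 0xff) == SQLITE_BUSY {
        Advice {
            code: ErrorCode::DatabaseLocked,
            suggestion: "Another process has the database locked (e.g. cron sync or a concurrent lore command)",
            actions: &["lore cron status"],
        }
    } else {
        // Hypothetical wording; the commit only says it points to lore doctor.
        Advice {
            code: ErrorCode::DatabaseError,
            suggestion: "Run lore doctor to diagnose the database",
            actions: &["lore doctor"],
        }
    }
}
```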
teernisse	6e487532aa	feat(me): improve dashboard rendering with dynamic layout and table-based activity Overhaul the `lore me` human-mode renderer for better terminal adaptation and visual clarity: Layout: - Add terminal_width() detection (COLUMNS env -> stderr ioctl -> 80 fallback) - Replace hardcoded column widths with dynamic title_width() that adapts to terminal size, clamped to [20, 80] - Section dividers now span the full terminal width Activity feed: - Replace manual println! formatting with Table-based rendering for proper column alignment across variable-width content - Split event_badge() into activity_badge_label() + activity_badge_style() for table cell compatibility - Add system_event_style() (#555555 dark gray) to visually suppress non-note events (label, assign, status, milestone, review changes) - Own actions use dim styling; others' notes render at full color MR display: - Add humanize_merge_status() to convert GitLab API values like "not_approved" -> "needs approval", "ci_must_pass" -> "CI pending" Table infrastructure (render.rs): - Add Table::columns() for headerless tables - Add Table::indent() for row-level indentation - Add truncate_pad() for fixed-width cell formatting - Table::render() now supports headerless mode (no separator line) Other: - Default activity lookback changed from 30d to 1d (more useful default) - Robot-docs schema added for `me` command - AGENTS.md and CLAUDE.md updated with `lore me` examples Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 10:36:01 -05:00
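Several of the layout helpers named in this commit can be sketched in a few lines. This is a minimal version that rests on several assumptions:

- The ioctl step of `terminal_width()` is omitted, because it needs platform bindings.
- The `reserved` parameter of `title_width()` is a hypothetical way to account for space taken by other columns.
- `truncate_pad()` counts chars rather than display width, and its ellipsis marker is an assumption.
- `humanize_merge_status()` falls back to replacing underscores with spaces for any value the commit does not list.

```rust
/// COLUMNS env var, else an 80-column fallback (the ioctl step is omitted).
fn terminal_width() -> usize {
    std::env::var("COLUMNS")
        .ok()
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&w| w > 0)
        .unwrap_or(80)
}

/// Width left for titles after `reserved` columns, clamped to [20, 80].
fn title_width(term_width: usize, reserved: usize) -> usize {
    term_width.saturating_sub(reserved).clamp(20, 80)
}

/// Pad to exactly `width` chars, or truncate with a trailing ellipsis.
fn truncate_pad(s: &str, width: usize) -> String {
    if s.chars().count() <= width {
        format!("{s:<width$}")
    } else if width == 0 {
        String::new()
    } else {
        let mut out: String = s.chars().take(width - 1).collect();
        out.push('…');
        out
    }
}

/// Translate GitLab merge-status values into short human labels.
fn humanize_merge_status(status: &str) -> String {
    match status {
        "not_approved" => "needs approval".to_string(),
        "ci_must_pass" => "CI pending".to_string(),
        other => other.replace('_', " "),
    }
}
```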
teernisse	7e9a23cc0f	fix(me): include NULL statuses in open issues filter Organizations without GitLab Premium/Ultimate don't have work item statuses configured - all their issues have status_name = NULL. Previously, the me command filtered to only 'In Progress' and 'In Review' statuses, showing zero issues for these organizations. Now includes NULL status as a fallback for graceful degradation.	2026-02-21 09:20:25 -05:00
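The NULL fallback matters because, in SQL, `status_name IN ('In Progress', 'In Review')` evaluates to NULL, not true, when status_name is NULL. Such rows are silently dropped unless a filter of that shape adds an explicit `OR status_name IS NULL`. The same predicate, written as a hypothetical Rust helper over `Option<&str>`, looks like this:

```rust
/// Equivalent of:
///   status_name IN ('In Progress', 'In Review') OR status_name IS NULL
/// None models SQL NULL (no work item status configured on the instance).
fn include_open_issue(status_name: Option<&str>) -> bool {
    matches!(status_name, None | Some("In Progress") | Some("In Review"))
}
```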
teernisse	9c1a9bfe5d	feat(me): add lore me personal work dashboard command Implement a personal work dashboard that shows everything relevant to the configured GitLab user: open issues assigned to them, MRs they authored, MRs they are reviewing, and a chronological activity feed. Design decisions: - Attention state computed from GitLab interaction data (comments, reviews) with no local state tracking -- purely derived from existing synced data - Username resolution: --user flag > config.gitlab.username > actionable error - Project scoping: --project (fuzzy) | --all | default_project | all - Section filtering: --issues, --mrs, --activity (combinable, default = all) - Activity feed controlled by --since (default 30d); work item sections always show all open items regardless of --since Architecture (src/cli/commands/me/): - types.rs: MeDashboard, MeSummary, AttentionState data types - queries.rs: 4 SQL queries (open_issues, authored_mrs, reviewing_mrs, activity) using existing issue_assignees, mr_reviewers, notes tables - render_human.rs: colored terminal output with attention state indicators - render_robot.rs: {ok, data, meta} JSON envelope with field selection - mod.rs: orchestration (resolve_username, resolve_project_scope, run_me) - me_tests.rs: comprehensive unit tests covering all query paths Config additions: - New optional gitlab.username field in config.json - Tests for config with/without username - Existing test configs updated with username: None CLI wiring: - MeArgs struct with section filter, since, project, all, user, fields flags - Autocorrect support for me command flags - LoreRenderer::try_get() for safe renderer access in me module - Robot mode field selection presets (me_items, me_activity) - handle_me() in main.rs command dispatch Also fixes duplicate assertions in surgical sync tests (removed 6 duplicate assert! lines that were copy-paste artifacts). Spec: docs/lore-me-spec.md	2026-02-20 14:31:57 -05:00
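The precedence rules for username and project resolution map naturally onto `Option` combinators. The sketch below mirrors the order stated in the commit. The function names echo those listed for mod.rs, but the signatures, the `ProjectScope` enum, and the error text are assumptions, and fuzzy project matching is left out.

```rust
#[derive(Debug, PartialEq)]
enum ProjectScope {
    Single(String),
    All,
}

/// --user flag > config.gitlab.username > actionable error.
fn resolve_username(user_flag: Option<&str>, config_username: Option<&str>) -> Result<String, String> {
    user_flag
        .or(config_username)
        .map(str::to_owned)
        .ok_or_else(|| "no GitLab username: pass --user or set gitlab.username in config.json".to_owned())
}

/// --project > --all > default_project > all projects.
fn resolve_project_scope(project_flag: Option<&str>, all_flag: bool, default_project: Option<&str>) -> ProjectScope {
    match (project_flag, all_flag, default_project) {
        (Some(p), _, _) => ProjectScope::Single(p.to_owned()),
        (None, true, _) => ProjectScope::All,
        (None, false, Some(p)) => ProjectScope::Single(p.to_owned()),
        (None, false, None) => ProjectScope::All,
    }
}
```

If both --project and --all were passed, this sketch lets --project win. The real CLI may instead reject that combination at parse time.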
teernisse	9ec1344945	feat(surgical-sync): add per-IID surgical sync pipeline with preflight validation Add the ability to sync specific issues or merge requests by IID without running a full incremental sync. This enables fast, targeted data refresh for individual entities — useful for agent workflows, debugging, and real-time investigation of specific issues or MRs. Architecture: - New CLI flags: --issue <IID> and --mr <IID> (repeatable, up to 100 total) scoped to a single project via -p/--project - Preflight phase validates all IIDs exist on GitLab before any DB writes, with TOCTOU-aware soft verification at ingest time - 6-stage pipeline: preflight -> fetch -> ingest -> dependents -> docs -> embed - Each stage is cancellation-aware via ShutdownSignal - Dedicated SyncRunRecorder extensions track surgical-specific counters (issues_fetched, mrs_ingested, docs_regenerated, etc.) New modules: - src/ingestion/surgical.rs: Core surgical fetch/ingest/dependent logic with preflight_fetch(), ingest_issue_by_iid(), ingest_mr_by_iid(), and fetch_dependents_for_{issue,mr}() - src/cli/commands/sync_surgical.rs: Full CLI orchestrator with progress spinners, human/robot output, and cancellation handling - src/embedding/pipeline.rs: embed_documents_by_ids() for scoped embedding - src/documents/regenerator.rs: regenerate_dirty_documents_for_sources() for scoped document regeneration Database changes: - Migration 027: Extends sync_runs with mode, phase, surgical_iids_json, per-entity counters, and cancelled_at column - New indexes: idx_sync_runs_mode_started, idx_sync_runs_status_phase_started GitLab client: - get_issue_by_iid() and get_mr_by_iid() single-entity fetch methods Error handling: - New SurgicalPreflightFailed error variant with entity_type, iid, project, and reason fields. Shares exit code 6 with GitLabNotFound. Includes comprehensive test coverage: - 645 lines of surgical ingestion tests (wiremock-based) - 184 lines of scoped embedding tests - 85 lines of scoped regeneration tests - 113 lines of GitLab client single-entity tests - 236 lines of sync_run surgical column/counter tests - Unit tests for SyncOptions, error codes, and CLI validation	2026-02-18 16:28:21 -05:00
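The IID limit and the cancellation-aware stage ordering of the surgical sync pipeline can be sketched as follows. This is illustrative only:

- `AtomicBool` stands in for the project's `ShutdownSignal`.
- `validate_surgical_iids` and `run_stages` are hypothetical names.
- Requiring -p/--project, rather than falling back to a default project, is an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Upper bound on IIDs per surgical run, per the commit message.
const MAX_SURGICAL_IIDS: usize = 100;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Preflight,
    Fetch,
    Ingest,
    Dependents,
    Docs,
    Embed,
}

const STAGES: [Stage; 6] = [
    Stage::Preflight,
    Stage::Fetch,
    Stage::Ingest,
    Stage::Dependents,
    Stage::Docs,
    Stage::Embed,
];

/// Argument checks run before any network or database work.
fn validate_surgical_iids(issue_iids: &[u64], mr_iids: &[u64], project: Option<&str>) -> Result<(), String> {
    let total = issue_iids.len() + mr_iids.len();
    if total == 0 {
        return Err("pass at least one --issue or --mr".into());
    }
    if total > MAX_SURGICAL_IIDS {
        return Err(format!("{total} IIDs requested; the limit is {MAX_SURGICAL_IIDS}"));
    }
    if project.is_none() {
        return Err("surgical sync needs -p/--project".into());
    }
    Ok(())
}

/// Run the stages in order, checking the shutdown flag before each one.
/// Returns the stages that completed, or an error naming where it stopped.
fn run_stages<F>(shutdown: &AtomicBool, mut run_stage: F) -> Result<Vec<Stage>, String>
where
    F: FnMut(Stage) -> Result<(), String>,
{
    let mut completed = Vec::new();
    for stage in STAGES {
        if shutdown.load(Ordering::SeqCst) {
            return Err(format!("cancelled before {stage:?}"));
        }
        run_stage(stage)?;
        completed.push(stage);
    }
    Ok(completed)
}
```

Checking the flag between stages means a cancellation never interrupts a stage midway. Because the commit places preflight before any DB writes, a run cancelled or failed at that point leaves the database unchanged.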
teernisse	ea6e45e43f	refactor(who): make --limit optional (unlimited default) and fix clippy sort lints Change the `who` command's --limit flag from default=20 to optional, so omitting it returns all results. This matches the behavior users expect when they want a complete expert/workload/active/overlap listing without an arbitrary cap. Also applies clippy-recommended sort improvements: - who/reviews: sort_by(|a,b| b.count.cmp(&a.count)) -> sort_by_key with Reverse - drift: same pattern for frequency sorting Adds Theme::color_icon() helper to DRY the stage-icon coloring pattern used in sync output (was inline closure, now shared method).	2026-02-18 16:27:59 -05:00
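The clippy-suggested pattern replaces a descending comparator with a sort key wrapped in `std::cmp::Reverse`, and an optional limit reduces to a conditional truncate. The `ReviewerCount` type and `apply_limit` helper below are hypothetical stand-ins for the who/reviews data, shown only to illustrate both changes.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, PartialEq)]
struct ReviewerCount {
    username: String,
    count: u32,
}

/// Highest count first; replaces sort_by(|a, b| b.count.cmp(&a.count)).
fn sort_by_count_desc(rows: &mut [ReviewerCount]) {
    rows.sort_by_key(|r| Reverse(r.count));
}

/// None means "no cap": return every row, as with an omitted --limit.
fn apply_limit<T>(mut rows: Vec<T>, limit: Option<usize>) -> Vec<T> {
    if let Some(n) = limit {
        rows.truncate(n);
    }
    rows
}
```

`sort_by_key` is a stable sort, so reviewers with equal counts keep their original relative order, just as with the previous `sort_by` comparator.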
teernisse	30ed02c694	feat(token): add stored token support with resolve_token and token_source Introduce a centralized token resolution system that supports both environment variables and config-file-stored tokens with clear priority (env var wins). This enables cron-based sync which runs in minimal shell environments without env vars. Core changes: - GitLabConfig gains optional `token` field and `resolve_token()` method that checks env var first, then config file, returning trimmed values - `token_source()` returns human-readable provenance ("environment variable" or "config file") for diagnostics - `ensure_config_permissions()` enforces 0600 on config files containing tokens (Unix only, no-op on other platforms) New CLI commands: - `lore token set [--token VALUE]` — validates against GitLab API, stores in config, enforces file permissions. Supports flag, stdin pipe, or interactive entry. - `lore token show [--unmask]` — displays masked token with source label Consumers updated to use resolve_token(): - auth_test: removes manual env var lookup - doctor: shows token source in health check output - ingest: uses centralized resolution Includes 10 unit tests for resolve/source logic and 2 for mask_token.	2026-02-18 16:27:48 -05:00
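The env-first resolution order, the provenance labels, and the Unix-only permission fix can be sketched with std alone. The environment value is passed in rather than read from a specific variable, because this commit message does not name the variable. The empty-string handling and the `mask_token` format, which leaves the last four characters visible, are assumptions.

```rust
use std::path::Path;

/// Trim a candidate value and treat blank strings as absent (assumption).
fn non_empty_trimmed(value: Option<&str>) -> Option<String> {
    value.map(str::trim).filter(|t| !t.is_empty()).map(str::to_owned)
}

/// Env var wins over the config file; also report where the token came from.
fn resolve_token(env_value: Option<&str>, config_token: Option<&str>) -> Option<(String, &'static str)> {
    if let Some(token) = non_empty_trimmed(env_value) {
        return Some((token, "environment variable"));
    }
    non_empty_trimmed(config_token).map(|token| (token, "config file"))
}

/// Hypothetical mask: hide all but the last 4 chars (short tokens fully hidden).
fn mask_token(token: &str) -> String {
    let n = token.chars().count();
    if n <= 8 {
        return "*".repeat(n);
    }
    let tail: String = token.chars().skip(n - 4).collect();
    format!("{}{}", "*".repeat(n - 4), tail)
}

/// Restrict the config file to owner read/write (0600) on Unix.
#[cfg(unix)]
fn ensure_config_permissions(path: &Path) -> std::io::Result<()> {
    use std::os::unix::fs::PermissionsExt;
    std::fs::set_permissions(path, std::fs::Permissions::from_mode(0o600))
}

/// No-op on non-Unix platforms.
#[cfg(not(unix))]
fn ensure_config_permissions(_path: &Path) -> std::io::Result<()> {
    Ok(())
}
```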

224 Commits