- types.rs: add #[allow(dead_code)] to truncate_to_chars now that
data-layer truncation was removed in favor of flex-width rendering
- timeline_seed_tests.rs: reformat multi-line assert_eq for clarity
- ollama_mgmt.rs: collapse method chain formatting
- render.rs: clamp flex column width to min(min_flex, natural) instead
of a hardcoded 20, preventing layout overflow when natural width is
small; rewrites flex_width test to be terminal-independent
- list/issues.rs: adopt .flex_col() builder on table construction
- list/mrs.rs, list/notes.rs: consolidate multi-line StyledCell::styled
calls to single-line format
- explain.rs: adopt flex_width() for related-issue title truncation,
consolidate multi-line formatting
File logging was set to DEBUG level unconditionally, causing log files to
grow to 25-32GB each (200GB total across 8 files). The primary volume
came from per-HTTP-request, per-entity, and per-chunk debug!() calls in
the ingestion orchestrator, GitLab client, and embedding pipeline — all
of which wrote JSON events to daily-rotated log files regardless of CLI
verbosity flags.
Two changes:
- File filter: lore=debug,warn -> lore=info (eliminates ~90% of volume)
- Default retention: 30 days -> 7 days (caps total disk usage)
The info level still captures operational events (sync start/complete,
rate limits, errors, embedding progress) while per-request instrumentation
stays silent unless explicitly enabled via -vv/-vvv on stderr.
Replace hardcoded truncation widths across CLI commands with
render::flex_width() calls that adapt to terminal size. Remove
server-side truncate_to_chars() in timeline collect/seed stages so
full text is preserved through the pipeline — truncation now happens
only at the presentation layer where terminal width is known.
Affected commands: explain, file-history, list (issues/mrs/notes),
me, timeline, who (active/expert/workload).
Add flex_width() helper and flex_col() builder method so a designated
column can absorb remaining terminal width after fixed columns are sized.
The flex column's width is clamped between 20 chars and its natural
(content-driven) width, and max_width constraints are skipped for it.
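The clamp described above can be sketched as a pure function; `remaining` (terminal width left over after fixed columns) and the constant name are assumptions, not the shipped API:

```rust
/// Minimal sketch of the flex-column clamp: the lower bound is
/// min(MIN_FLEX, natural) rather than a bare 20, so a column whose
/// natural width is small never forces the layout wider than needed.
fn flex_width(remaining: usize, natural: usize) -> usize {
    const MIN_FLEX: usize = 20;
    let lower = MIN_FLEX.min(natural); // guaranteed <= natural
    remaining.clamp(lower, natural)
}
```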
1. PATH blindness in cron: find_ollama_binary() used `which ollama` which
fails in cron's minimal PATH (/usr/bin:/bin). Added well-known install
locations (/opt/homebrew/bin, /usr/local/bin, /usr/bin, /snap/bin) as
fallback. ensure_ollama() now spawns using the discovered absolute path
instead of bare "ollama".
2. IPv6-first DNS resolution: is_ollama_reachable() only tried the first
address from to_socket_addrs(), which on macOS is ::1 (IPv6). Ollama
only listens on 127.0.0.1 (IPv4), so the check always failed.
Now iterates all resolved addresses — "Connection refused" on ::1 is
instant so there's no performance cost.
3. Excessive blocking on cold start: ensure_ollama() blocked for 30s
waiting for readiness, then reported failure even though ollama serve
was successfully spawned and still booting. Reduced wait to 5s (catches
hot restarts), and reports started=true on timeout since the ~90s
ingestion phase gives Ollama plenty of time to cold-start before the
embed stage needs it.
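Point 2 can be sketched as follows; `is_reachable` and the timeout value are illustrative, not the real is_ollama_reachable() signature:

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Try every resolved address rather than only the first, so an
/// IPv6-first resolution (::1 on macOS) falls through to 127.0.0.1.
fn is_reachable(host_port: &str) -> bool {
    let Ok(addrs) = host_port.to_socket_addrs() else {
        return false;
    };
    // A refused connect on ::1 fails near-instantly, so iterating
    // over the extra address costs essentially nothing.
    addrs
        .into_iter()
        .any(|addr| TcpStream::connect_timeout(&addr, Duration::from_millis(500)).is_ok())
}
```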
The explain command's human-mode output was hand-rolled with raw
println! formatting that didn't use any of the shared render.rs
infrastructure. This made it visually inconsistent with every other
command (me, who, search, timeline).
Changes to print_explain():
- Section headers now use render::section_divider() with counts,
producing the same box-drawing divider lines as the me command
- Entity refs use Theme::issue_ref()/mr_ref() color styling
- Entity state uses Theme::state_opened/closed/merged() styling
- Authors/usernames use Theme::username() with @ prefix
- Project paths use Theme::muted()
- Timestamps use format_relative_time() for recency fields (created,
first/last event, last note) and format_date() for point-in-time
fields (key decisions, timeline events), matching the conventions
in me, who, and timeline respectively
- Note excerpts use render::truncate() instead of manual byte slicing
- Related entity titles are truncated via render::truncate()
- Indentation aligned to 4-space content under section dividers
Robot JSON output is unchanged -- it continues to use ms_to_iso() for
all timestamp fields, consistent with the rest of the robot API.
The ensure_ollama() function previously blocked for up to 10 seconds
waiting for Ollama to become reachable after spawning. Cold starts can
take 30-60s, so this often timed out and reported a misleading error.
Now waits only 5 seconds (enough for hot restarts), and if Ollama is
still starting, reports started=true with no error instead of treating
it as a failure. The embed stage runs 60-90s later (after ingestion),
by which time Ollama is ready. The handler log message is updated to
distinguish hot restarts from cold starts still in progress.
GitLab auto-generates MR titles like "Draft: Resolve \"Issue Title\""
when creating MRs from issues. This 4-token boilerplate prefix dominated
the embedding vectors, causing unrelated MRs with the same title structure
to appear as highly similar in "lore related" results (0.667 similarity
vs 0.674 for the actual parent issue — a difference of only 0.007).
Add normalize_title_for_embedding() which deterministically strips:
- "Draft: " prefix (case-insensitive)
- "WIP: " prefix (case-insensitive)
- "Resolve \"...\"" wrapper (extracts inner title)
- Combinations: "Draft: Resolve \"...\""
The normalization is applied in all four document extractors (issues, MRs,
discussions, notes) to the content_text field only. DocumentData.title
preserves the original title for human-readable display in CLI output.
Since content_text changes, content_hash will differ from stored values,
triggering automatic re-embedding on the next "lore embed" run.
Uses str::get() for all byte-offset slicing to prevent panics on titles
containing emoji or other multi-byte UTF-8 characters.
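A hedged sketch of the stripping logic; the shipped function is normalize_title_for_embedding(), and the empty-inner-text handling here (keep the original) is an assumption about its edge cases:

```rust
/// Strip GitLab MR boilerplate so embeddings see only the real title.
fn normalize_title(title: &str) -> String {
    let mut t = title.trim();
    // Strip "Draft: " / "WIP: " prefixes case-insensitively; str::get
    // keeps the length-based slice safe on multi-byte UTF-8 titles.
    for prefix in ["Draft: ", "WIP: "] {
        if let Some(head) = t.get(..prefix.len()) {
            if head.eq_ignore_ascii_case(prefix) {
                t = t[prefix.len()..].trim_start();
            }
        }
    }
    // Unwrap the `Resolve "..."` wrapper to its inner title.
    if let Some(inner) = t
        .strip_prefix("Resolve \"")
        .and_then(|rest| rest.strip_suffix('"'))
    {
        if !inner.is_empty() {
            return inner.to_string();
        }
    }
    t.to_string()
}
```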
15 new tests covering: all boilerplate patterns, case insensitivity,
edge cases (empty inner text, no-op for normal titles), UTF-8 safety,
and end-to-end document extraction with boilerplate titles.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change collapse_whitespace() from is_ascii_whitespace() to is_whitespace()
so non-breaking spaces, em-spaces, and other Unicode whitespace characters
in search snippets are also collapsed into single spaces. Additionally
fix serde_json::to_value() call site to handle serialization errors
gracefully instead of unwrapping.
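The Unicode-aware collapse amounts to a one-liner, sketched here on the assumption that the function's contract is "flatten all whitespace runs to single spaces":

```rust
/// split_whitespace() already splits on char::is_whitespace, so NBSP
/// (\u{A0}), em-space (\u{2003}), and other Unicode whitespace runs
/// collapse to single ASCII spaces along with tabs and newlines.
fn collapse_whitespace(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}
```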
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add src/core/ollama_mgmt.rs module that handles Ollama detection, startup,
and health checking. This enables cron-based sync to automatically start
Ollama when it's installed but not running, ensuring embeddings are always
available during unattended sync runs.
Integration points:
- sync handler (--lock mode): calls ensure_ollama() before embedding phase
- cron status: displays Ollama health (installed/running/not-installed)
- robot JSON: includes OllamaStatusBrief in cron status response
The module handles local vs remote Ollama URLs, IPv6, process detection
via lsof, and graceful startup with configurable wait timeouts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add .cargo/config.toml to force all builds (including worktrees created
by Claude Code agents) to share a single target/ directory. Without this,
each worktree creates its own ~3GB target/ directory which fills the disk
when multiple agents are working in parallel.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document content_text includes multi-line metadata (Project:, URL:, Labels:,
State:) separated by newlines. FTS5 snippet() preserves these newlines, causing
subsequent lines to render at column 0 with no indent. collapse_newlines()
flattens all whitespace runs into single spaces before truncation and rendering.
Includes 3 unit tests.
SPEC_discussion_analysis.md defines a pre-computed enrichment pipeline that
replaces the current key_decisions heuristic in explain with actual
LLM-extracted discourse analysis (decisions, questions, consensus).
Key design choices:
- Dual LLM backend: Claude Haiku via AWS Bedrock (primary) or Anthropic API
- Pre-computed batch enrichment (lore enrich), never runtime LLM calls
- Staleness detection via notes_hash to skip unchanged threads
- New discussion_analysis SQLite table with structured JSON results
- Configurable via config.json enrichment section
Status: DRAFT — open questions on Bedrock model ID, auth mechanism, rate
limits, cost ceiling, and confidence thresholds.
Validates that the projects table schema uses gitlab_project_id (not
gitlab_id) and that queries filtering by this column return the correct
project. Uses the test helper convention where insert_project sets
gitlab_project_id = id * 100.
Guards against regression in the wiring chain run_me -> print_me_json ->
MeJsonEnvelope where the gitlab_base_url meta field could silently
disappear.
- me_envelope_includes_gitlab_base_url_in_meta: verifies full envelope
serialization preserves the base URL in meta
- activity_event_carries_url_construction_fields: verifies activity events
contain entity_type + entity_iid + project fields, then demonstrates
URL construction by combining with meta.gitlab_base_url
Previously, query_mentioned_in returned mentions from any time in the
entity's history as long as the entity was still open (or recently closed).
This caused noise: a mention from 6 months ago on a still-open issue would
appear in the dashboard indefinitely.
Now the SQL filters notes by created_at > mention_cutoff_ms, defaulting to
30 days. The recency_cutoff (7 days) still governs closed/merged entity
visibility — this new cutoff governs mention note age on open entities.
Signature change: query_mentioned_in gains a mention_cutoff_ms parameter.
All existing test call sites updated. Two new tests verify the boundary:
- mentioned_in_excludes_old_mention_on_open_issue (45-day mention filtered)
- mentioned_in_includes_recent_mention_on_open_issue (5-day mention kept)
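The boundary the two tests exercise can be expressed as a small predicate; the real comparison lives in the SQL, so the helper name here is purely illustrative:

```rust
const DAY_MS: i64 = 86_400_000;

/// A mention note is visible when it is newer than the cutoff, i.e.
/// created_at > now - cutoff_days (30 days by default per the commit).
fn mention_visible(note_created_ms: i64, now_ms: i64, cutoff_days: i64) -> bool {
    note_created_ms > now_ms - cutoff_days * DAY_MS
}
```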
Multiple improvements to the explain command's data richness:
- Add project_path to EntitySummary so consumers can construct URLs from
project + entity_type + iid without extra lookups
- Include first_note_excerpt (first 200 chars) in open threads so agents
and humans get thread context without a separate query
- Add state and direction fields to RelatedIssue — consumers now see
whether referenced entities are open/closed/merged and whether the
reference is incoming or outgoing
- Filter out self-references in both outgoing and incoming related entity
queries (entity referencing itself via cross-reference extraction)
- Wrap timeline excerpt in TimelineExcerpt struct with total_events and
truncated fields — consumers know when events were omitted
- Keep most recent events (tail) instead of oldest (head) when truncating
timeline — recent activity is more actionable
- Floor activity summary first_event at entity created_at — label events
from bulk operations can predate entity creation
- Human output: show project path in header, thread excerpt preview,
state badges on related entities, directional arrows, truncation counts
CEO memory notes for 2026-03-11 and 2026-03-12 capture the full timeline of
GIT-2 (founding engineer evaluation), GIT-3 (calibration task), and GIT-6
(plan reviewer hire).
Founding Engineer: AGENTS.md rewritten from 25-line boilerplate to 3-layer
progressive disclosure model (AGENTS.md core -> DOMAIN.md reference ->
SOUL.md persona). Adds HEARTBEAT.md checklist, TOOLS.md placeholder. Key
changes: memory system reference, async runtime warning, schema gotchas,
UTF-8 boundary safety, search import privacy.
Plan Reviewer: new agent created with AGENTS.md (review workflow, severity
levels, codebase context), HEARTBEAT.md, SOUL.md. Reviews implementation
plans in Paperclip issues before code is written.
The old truncation counted <mark></mark> HTML tags (~13 chars per keyword)
as visible characters, causing over-aggressive truncation. When a cut
landed inside a tag pair, render_snippet would render highlighted text
as muted gray instead of bold yellow.
New truncate_snippet() walks through markup counting only visible
characters, respects tag boundaries, and always closes an open <mark>
before appending ellipsis. Includes 6 unit tests.
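The walk can be sketched as below; this is an illustration of the technique (count visible characters, pass tags through, close an open `<mark>` before the ellipsis), not the shipped truncate_snippet():

```rust
/// Truncate to `max_visible` visible characters, treating <mark> and
/// </mark> as zero-width and never leaving a <mark> unclosed.
fn truncate_snippet(snippet: &str, max_visible: usize) -> String {
    let mut out = String::new();
    let mut visible = 0;
    let mut in_mark = false;
    let mut rest = snippet;
    while !rest.is_empty() {
        if let Some(tail) = rest.strip_prefix("<mark>") {
            out.push_str("<mark>");
            in_mark = true;
            rest = tail;
        } else if let Some(tail) = rest.strip_prefix("</mark>") {
            out.push_str("</mark>");
            in_mark = false;
            rest = tail;
        } else {
            if visible == max_visible {
                break; // visible budget exhausted with content remaining
            }
            let c = rest.chars().next().expect("rest is non-empty");
            out.push(c);
            visible += 1;
            rest = &rest[c.len_utf8()..];
        }
    }
    if !rest.is_empty() {
        if in_mark {
            out.push_str("</mark>"); // close before the ellipsis
        }
        out.push('…');
    }
    out
}
```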
Phase 1: Add source_entity_iid to search results via CASE subquery on
hydrate_results() for all 4 source types (issue, MR, discussion, note).
Phase 2: Fix visual alignment - compute indent from prefix visible width.
Phase 3: Show compact relative time on title line.
Phase 4: Add drill-down hint footer (lore issues <iid>).
Phase 5: Move labels to --explain mode, limit snippets to 2 terminal lines.
Phase 6: Use section_divider() for results header.
Also: promote strip_ansi/visible_width to public render utils, update
robot mode --fields minimal search preset with source_entity_iid.
The `me` dashboard robot output now includes `meta.gitlab_base_url` so
consuming agents can construct clickable issue/MR links without needing
access to the lore config file. The pattern is:
{gitlab_base_url}/{project}/-/issues/{iid}
{gitlab_base_url}/{project}/-/merge_requests/{iid}
This uses the new RobotMeta::with_base_url() constructor. The base URL
is sourced from config.gitlab.base_url (already available in the me
command's execution context) and normalized to strip trailing slashes.
robot-docs updated to document the new meta field and URL construction
pattern for the me command's response schema.
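The documented pattern, as a small helper; the function name and the `kind` strings are assumptions for illustration, not lore's actual API:

```rust
/// Build a GitLab web URL from the robot-mode fields described above.
fn entity_url(base_url: &str, project: &str, kind: &str, iid: u64) -> String {
    // Trailing slashes are normalized, as the commit describes.
    let base = base_url.trim_end_matches('/');
    let segment = match kind {
        "merge_request" => "merge_requests",
        _ => "issues",
    };
    format!("{base}/{project}/-/{segment}/{iid}")
}
```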
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RobotMeta previously required direct struct literal construction with only
elapsed_ms. This made it impossible to add optional fields without updating
every call site to include them.
Introduce two constructors:
- RobotMeta::new(elapsed_ms) — standard meta with timing only
- RobotMeta::with_base_url(elapsed_ms, base_url) — meta enriched with the
GitLab instance URL, enabling consumers to construct entity links without
needing config access
The gitlab_base_url field uses #[serde(skip_serializing_if = "Option::is_none")]
so existing JSON envelopes are byte-identical — no breaking change for any
robot mode consumer.
All 22 call sites across handlers, count, cron, drift, embed, generate_docs,
ingest, list (mrs/notes), related, show, stats, sync_status, and who are
updated from struct literals to RobotMeta::new(). Three tests verify the
new constructors and trailing-slash normalization.
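A sketch of the constructor pair; the real struct also derives Serialize with skip_serializing_if on gitlab_base_url, omitted here to keep the example dependency-free:

```rust
struct RobotMeta {
    elapsed_ms: u64,
    gitlab_base_url: Option<String>, // None => field absent from JSON
}

impl RobotMeta {
    fn new(elapsed_ms: u64) -> Self {
        Self { elapsed_ms, gitlab_base_url: None }
    }

    fn with_base_url(elapsed_ms: u64, base_url: &str) -> Self {
        Self {
            elapsed_ms,
            // Trailing-slash normalization, per the tests mentioned above.
            gitlab_base_url: Some(base_url.trim_end_matches('/').to_string()),
        }
    }
}
```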
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three robot-mode print functions used `serde_json::to_string().unwrap_or_default()`
which silently outputs an empty string on failure (exit 0, no error). This
diverged from the codebase standard in handlers.rs which uses `?` propagation.
Changed to return Result<()> with proper LoreError::Other mapping:
- explain.rs: print_explain_json()
- file_history.rs: print_file_history_json()
- trace.rs: print_trace_json()
Updated callers in handlers.rs and explain.rs to propagate with `?`.
While serde_json::to_string on a json!() Value is unlikely to fail in practice
(only non-finite floats trigger it), the unwrap_or_default pattern violates the
robot mode contract: callers expect either valid JSON on stdout or a structured
error on stderr with a non-zero exit code, never empty output with exit 0.
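The before/after shape of the fix can be sketched with a stand-in serializer (serde_json is swapped out so the example stays dependency-free; LoreError::Other mirrors the commit text but this is not the crate's real error type):

```rust
#[derive(Debug, PartialEq)]
enum LoreError {
    Other(String),
}

// Stand-in for serde_json::to_string: any fallible serializer
// illustrates the pattern.
fn to_json(ok: bool) -> Result<String, String> {
    if ok {
        Ok(r#"{"status":"ok"}"#.to_string())
    } else {
        Err("non-finite float".to_string())
    }
}

// Before: to_json(..).unwrap_or_default() produced "" with exit 0.
// After: map the error into LoreError and propagate with `?` so the
// caller can emit a structured error and a non-zero exit code.
fn print_robot_json(ok: bool) -> Result<String, LoreError> {
    let s = to_json(ok).map_err(LoreError::Other)?;
    Ok(s)
}
```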
SQLite does not guarantee row order without ORDER BY, even with LIMIT.
This was a systemic issue found during a multi-pass bug hunt:
Production queries (explain.rs):
- Outgoing reference query: ORDER BY target_entity_type, target_entity_iid
- Incoming reference query: ORDER BY source_entity_type, COALESCE(i.iid, mr.iid)
Without these, robot mode output was non-deterministic across calls,
breaking clients expecting stable ordering.
Test helper queries (5 locations across 3 files):
- discussions_tests.rs: get_discussion_id()
- mr_discussions.rs: get_mr_discussion_id()
- queue.rs: setup_db_with_job(), release_all_locked_jobs_clears_locks()
Currently safe (single-row inserts) but would break silently if tests
expanded to multi-row fixtures.
1. fetch_open_threads: replace N+1 loop (2 queries per thread) with a
single query using correlated subqueries for note_count and started_by.
2. extract_key_decisions: track consumed notes so the same note is not
matched to multiple events, preventing duplicate decision entries.
3. build_timeline_excerpt_from_pipeline: log tracing::warn on seed/collect
failures instead of silently returning empty timeline.
entity_references.target_entity_iid is nullable (unresolved cross-project
refs), and COALESCE(i.iid, mr.iid) returns NULL for orphaned refs.
Both paths caused rusqlite InvalidColumnType errors when fetching i64.
Added IS NOT NULL filters to both outgoing and incoming reference queries.
Update planning docs and audit tables to reflect the removal of
`lore show`:
- CLI_AUDIT.md: remove show row, renumber remaining entries
- plan-expose-discussion-ids.md: replace `show` with
`issues <IID>`/`mrs <IID>`
- plan-expose-discussion-ids.feedback-3.md: replace `show` with
"detail views"
- work-item-status-graphql.md: update example commands from
`lore show issue 123` to `lore issues 123`
The `show` command (`lore show issue 42` / `lore show mr 99`) was
deprecated in favor of the unified entity commands (`lore issues 42` /
`lore mrs 99`). This commit fully removes the command entry point:
- Remove `Commands::Show` variant from clap CLI definition
- Remove `Commands::Show` match arm and deprecation warning in main.rs
- Remove `handle_show_compat()` forwarding function from robot_docs.rs
- Remove "show" from autocorrect known-commands and flags tables
- Rename response schema keys from "show" to "detail" in robot-docs
- Update command descriptions from "List or show" to "List ... or
view detail with <IID>"
The underlying detail-view module (`src/cli/commands/show/`) is
preserved — its types (IssueDetail, MrDetail) and query/render
functions are still used by `handle_issues` and `handle_mrs` when
an IID argument is provided.
The show command's NoteDetail and MrNoteDetail structs were missing
gitlab_id, making individual notes unaddressable in robot mode output.
This was inconsistent with the notes list command which already exposed
gitlab_id. Without an identifier, agents consuming show output could
not construct GitLab web URLs or reference specific notes for follow-up
operations via glab.
Added gitlab_id to:
- NoteDetail / NoteDetailJson (issue discussions)
- MrNoteDetail / MrNoteDetailJson (MR discussions)
- Both SQL queries (shifted column indices accordingly)
- Both From<&T> conversion impls
Deliberately scoped to show command only — me/timeline/trace structs
were evaluated and intentionally left unchanged because they serve
different consumption patterns where note-level identity is not needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document that the activity feed and since-last-check inbox cover items
in any state (open, closed, merged), while the issues and MRs sections
show only open items. Add the previously undocumented since-last-check
inbox section to the dashboard description.
The activity feed and since-last-check inbox previously filtered to
only open items via state = 'opened' checks in the SQL subqueries.
This meant comments on merged MRs (post-merge follow-ups, questions)
and closed issues were silently dropped from the feed.
Remove the state filter from the association checks in both
query_activity() and query_since_last_check(). The user-association
checks (assigned, authored, reviewing) remain — activity still only
appears for items the user is connected to, regardless of state.
The simplified subqueries also eliminate unnecessary JOINs to the
issues/merge_requests tables that were only needed for the state
check, resulting in slightly more efficient index-only scans on
issue_assignees and mr_reviewers.
Add 4 tests covering: merged MR (authored), closed MR (reviewer),
closed issue (assignee), and merged MR in the since-last-check inbox.
CLI audit scoring the current command surface across human ergonomics,
robot/agent ergonomics, documentation quality, and flag design. Paired
with a detailed implementation plan for restructuring commands into a
more consistent, discoverable hierarchy.
Add pre-flight FTS count check before expensive bm25-ranked search.
Queries matching >10,000 documents are rejected instantly with a
suggestion to use a more specific query or --since filter.
Prevents multi-minute CPU spin on queries like 'merge request' that
match most of the corpus (106K/178K documents).
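The guard itself is simple; the 10,000-document threshold and the --since suggestion come from the commit, while the function shape is an assumption:

```rust
const MAX_FTS_MATCHES: u64 = 10_000;

/// Reject overly broad queries before the expensive bm25 ranking runs.
fn preflight_check(match_count: u64) -> Result<(), String> {
    if match_count > MAX_FTS_MATCHES {
        return Err(format!(
            "query matches {match_count} documents; use a more specific query or --since"
        ));
    }
    Ok(())
}
```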
is_multiple_of(N) returns true for 0, which caused debug/info
progress messages to fire at doc_num=0 (the start of every page)
rather than only at the intended 50/100 milestones. Add != 0
check to both the debug (every 50) and info (every 100) log sites.
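The guard in miniature (plain modulo is used here, but u64::is_multiple_of behaves identically at zero):

```rust
/// n % k == 0 holds at n = 0, so an explicit doc_num != 0 check keeps
/// the first document of each page from triggering a milestone log line.
fn is_milestone(doc_num: u64, every: u64) -> bool {
    doc_num != 0 && doc_num % every == 0
}
```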
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Major additions to the migration plan based on review feedback:
Alternative analysis:
- Add "Why not tokio CancellationToken + JoinSet?" section explaining
why obligation tracking and single-migration cost favor asupersync
over incremental tokio fixes.
Error handling depth:
- Add NetworkErrorKind enum design for preserving error categories
(timeout, DNS, TLS, connection refused) without coupling LoreError
to any HTTP client.
- Add response body size guard (64 MiB) to prevent unbounded memory
growth from misconfigured endpoints.
Adapter layer refinements:
- Expand append_query_params with URL fragment handling, edge case
docs, and doc comments.
- Add contention constraint note for std::sync::Mutex rate limiter.
Cancellation invariants (INV-1 through INV-4):
- Atomic batch writes, no .await between tx open/commit,
ShutdownSignal + region cancellation complementarity.
- Concrete test plan for each invariant.
Semantic ordering concerns:
- Document 4 behavioral differences when replacing join_all with
region-spawned tasks (ordering, error aggregation, backpressure,
late result loss on cancellation).
HTTP behavior parity:
- Replace informational table with concrete acceptance criteria and
pass/fail tests for redirects, proxy, keep-alive, DNS, TLS, and
Content-Length.
Phasing refinements:
- Add Cx threading sub-steps (orchestration path first, then
command/embedding layer) for blast radius reduction.
- Add decision gate between Phase 0d and Phase 1 requiring compile +
behavioral smoke tests before committing to runtime swap.
Rollback strategy:
- Per-phase rollback guidance with concrete escape hatch triggers
(nightly breakage > 7d, TLS incompatibility, API instability,
wiremock issues).
Testing depth:
- Adapter-layer test gap analysis with 5 specific asupersync-native
integration tests.
- Cancellation integration test specifications.
- Coverage gap documentation for wiremock-on-tokio tests.
Risk register additions:
- Unbounded response body buffering, manual URL/header handling
correctness.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>