84 Commits

Author SHA1 Message Date
teernisse
171260a772 feat(cli): implement 'lore trace' command (bd-2n4, bd-9dd)
Gate 5 Code Trace - Tier 1 (API-only, no git blame).
Answers 'Why was this code introduced?' by building
file -> MR -> issue -> discussion chains.

New files:
- src/core/trace.rs: run_trace() query logic with rename-aware
  path resolution, entity_reference-based issue linking, and
  DiffNote discussion extraction
- src/core/trace_tests.rs: 7 unit tests for query logic
- src/cli/commands/trace.rs: CLI command with human output,
  robot JSON output, and :line suffix parsing (5 tests)

Human output shows full content (no truncation).
Robot JSON truncates discussion bodies to 500 chars for token efficiency.

Wiring:
- TraceArgs + Commands::Trace in cli/mod.rs
- handle_trace in main.rs
- VALID_COMMANDS + robot-docs manifest entry
- COMMAND_FLAGS autocorrect registry entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:57:21 -05:00
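The `:line` suffix parsing mentioned in 171260a772 could look roughly like the sketch below. The helper name `split_line_suffix` and its return shape are assumptions for illustration; the log does not show the real signature in src/cli/commands/trace.rs.

```rust
/// Hypothetical helper: split "path:line" into the path and an optional
/// line number. Inputs without a numeric suffix are returned unchanged.
fn split_line_suffix(input: &str) -> (&str, Option<u32>) {
    // Split on the LAST colon so paths that contain colons still work.
    if let Some((path, suffix)) = input.rsplit_once(':') {
        let all_digits = !suffix.is_empty() && suffix.bytes().all(|b| b.is_ascii_digit());
        if !path.is_empty() && all_digits {
            // parse() can still fail on overflow; fall through in that case.
            if let Ok(line) = suffix.parse::<u32>() {
                return (path, Some(line));
            }
        }
    }
    (input, None)
}
```

Splitting on the last colon and requiring an all-digit suffix keeps an input like `src/core/trace.rs:abc` intact as a plain path instead of rejecting it.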
teernisse
a1bca10408 feat(cli): implement 'lore file-history' command (bd-z94)
Adds file-history command showing which MRs touched a file, with:
- Rename chain resolution via BFS (resolve_rename_chain from bd-1yx)
- DiffNote discussion snippets with --discussions flag
- --merged filter, --no-follow-renames, -n limit
- Human output with styled MR list and rename chain display
- Robot JSON output with {ok, data, meta} envelope
- Autocorrect registry and robot-docs manifest entry
- Fixes pre-existing --no-status missing from sync autocorrect registry
2026-02-17 12:57:56 -05:00
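A minimal sketch of BFS-based rename chain resolution, as referenced in a1bca10408. This is not the actual resolve_rename_chain from bd-1yx: the edge representation as `(old_path, new_path)` pairs and the choice to follow renames in both directions are assumptions.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical sketch: collect every path connected to `start` through
/// rename edges, in breadth-first order. Each edge is (old_path, new_path).
fn resolve_rename_chain_sketch(start: &str, renames: &[(&str, &str)]) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut order: Vec<String> = Vec::new();

    seen.insert(start.to_string());
    queue.push_back(start.to_string());

    while let Some(path) = queue.pop_front() {
        for (old, new) in renames {
            // Follow the edge forwards (old -> new) and backwards (new -> old).
            let neighbor = if *old == path.as_str() {
                Some(*new)
            } else if *new == path.as_str() {
                Some(*old)
            } else {
                None
            };
            if let Some(next) = neighbor {
                // insert() returns false for already-visited paths, which
                // also protects against rename cycles (a -> b -> a).
                if seen.insert(next.to_string()) {
                    queue.push_back(next.to_string());
                }
            }
        }
        order.push(path);
    }
    order
}
```

The visited set is what makes the traversal terminate when a file is renamed back to an earlier name.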
teernisse
491dc52864 release: v0.8.3
2026-02-16 10:29:52 -05:00
teernisse
b9063aa17a feat(cli): add --no-status flag to skip GraphQL status enrichment during sync
2026-02-16 10:29:11 -05:00
teernisse
fc0d9cb1d3 feat(sync): colored stage output, functional sub-rows, and error visibility
Overhaul the sync command's human output to use semantic colors and a
cleaner rendering architecture. The changes fall into four areas:

Stage lines: Replace direct finish_stage() calls with an
emit_stage_line/emit_stage_block pattern that clears the spinner first,
then prints static lines via MultiProgress::suspend. Stage icons are
now color-coded green (success) or yellow (warning) via color_icon().
A separate "Status" stage line now appears after Issues, summarizing
work-item status enrichment across all projects.

Sub-rows: Replace the imperative print_issue_sub_rows/print_mr_sub_rows
functions with functional issue_sub_rows(), mr_sub_rows(), and new
status_sub_rows() that return Vec<String>. Project paths use
Theme::muted(), error/failure counts use Theme::warning(), and
separators use the dim middle-dot style. Sub-rows are printed atomically
with their parent stage line to avoid interleaving with spinners.

Summary: In print_sync(), counts now use Theme::info().bold() for visual
pop, detail-line separators are individually styled (dim middle-dot),
and a new "Sync completed with issues" headline appears when any stage
had failures. Document errors and embedding failures are surfaced in
both the doc-parts line and the errors line.

Tests: Full coverage for append_failures, summarize_status_enrichment,
should_print_timings, issue_sub_rows, mr_sub_rows, and status_sub_rows.
2026-02-16 09:43:36 -05:00
teernisse
c8b47bf8f8 feat(cli): add --timings flag and enrich error tracking fields
Add -t/--timings flag to the sync subcommand, allowing users to opt
into a per-stage timing breakdown after the sync summary. Wire the flag
through main.rs into print_sync() which passes it to the new
should_print_timings() gate.

Enrich the data structures that flow through the sync pipeline so
downstream renderers have full error visibility:

- ProjectSummary gains status_errors (issue-side status enrichment
  failures per project)
- ProjectStatusEnrichment gains path (project path for sub-row display)
- SyncResult gains documents_errored and embedding_failed so the
  summary can surface doc-gen and embed failures separately
- Autocorrect table updated with --timings for fuzzy flag matching
2026-02-16 09:43:22 -05:00
teernisse
a570327a6b refactor(progress): extract format_stage_line with themed styling
Pull the line-formatting logic out of finish_stage() into a standalone
public format_stage_line() so that sync.rs can build stage lines without
needing a live ProgressBar (e.g. for static multi-line blocks printed
after the spinner is cleared).

The new function applies Theme::info().bold() to the label and
Theme::timing() to the elapsed column, giving every stage line
consistent color treatment. finish_stage() now delegates to it.

Includes a unit test asserting the formatted output contains the
expected icon, label, summary, and elapsed components.
2026-02-16 09:43:13 -05:00
teernisse
eef73decb5 fix(cli): timeline tag width, test env isolation, and logging verbosity
Miscellaneous fixes across CLI and core modules:

- Timeline: widen TAG_WIDTH from 10 to 11 to accommodate longer event
  type labels without truncation
- render.rs: save and restore the LORE_ICONS env var in the glyph_mode test
  so its value cannot leak into, or in from, other tests that set
  LORE_ICONS
- logging.rs: adjust verbose=1 to info level (was debug), verbose=2 to
  debug — this reduces noise at -v while keeping -vv as the full debug
  experience
- issues.rs, merge_requests.rs: use infodebug! macro consistently for
  ingestion summary logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:42 -05:00
teernisse
bb6660178c feat(sync): per-project breakdown, status enrichment progress bars, and summary polish
Add per-project detail rows beneath stage completion lines during multi-project
syncs, showing itemized counts (issues/MRs, discussions, events, statuses, diffs)
for each project. Previously, only aggregate totals were visible, making it hard
to diagnose which project contributed what during a sync.

Status enrichment gets proper progress bars replacing the old spinner-only
display: StatusEnrichmentStarted now carries a total count so the CLI can
render a determinate bar with rate and ETA. The enrichment SQL is tightened
to use IS NOT comparisons for diff-only UPDATEs (skip rows where values
haven't changed), and a follow-up touch_stmt ensures status_synced_at is
updated even for unchanged rows so staleness detection works correctly.

Other improvements:
- New ProjectSummary struct aggregates per-project metrics during ingestion
- SyncResult gains statuses_enriched + per-project summary vectors
- "Already up to date" message when sync finds zero changes
- Remove Arc<AtomicBool> tick_started pattern from docs/embed stages
  (enable_steady_tick is idempotent, the guard was unnecessary)
- Progress bar styling: dim spinner, dark_gray track, per_sec + eta display
- Tick intervals tightened from 100ms to 60ms for smoother animation
- statuses_without_widget calculation uses fetch_result.statuses.len()
  instead of subtracting enriched (more accurate when some statuses lack
  work item widgets)
- Status enrichment completion log downgraded from info to debug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:33 -05:00
teernisse
64e73b1cab fix(graphql): handle past HTTP dates in retry-after header gracefully
Extract parse_retry_after_value(header, now) as a pure function to enable
deterministic testing of Retry-After header parsing. The previous
implementation used let-chains with SystemTime::now() inline, which made
it untestable and would panic on negative durations when the server
clock was behind or the header contained a date in the past.

Changes:
- Extract parse_retry_after_value() taking an explicit `now` parameter
- Handle past HTTP dates by returning 1 second instead of panicking on
  negative Duration (date.duration_since(now) returns Err for past dates)
- Trim whitespace from header values before parsing
- Add test for past HTTP date returning 1 second minimum
- Add test for delta-seconds with surrounding whitespace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:19 -05:00
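The approach in 64e73b1cab (explicit `now`, trimmed input, 1-second floor for past dates) can be sketched as follows. The function name, the `Option` return type, and the injected `parse_http_date` callback are assumptions made to keep the sketch std-only; the real parse_retry_after_value(header, now) presumably parses HTTP dates itself.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical sketch of a pure Retry-After parser. `parse_http_date` is a
/// stand-in for a real HTTP-date parser and is an assumption of this sketch.
fn retry_after_sketch(
    header: &str,
    now: SystemTime,
    parse_http_date: impl Fn(&str) -> Option<SystemTime>,
) -> Option<Duration> {
    // Header values may carry surrounding whitespace.
    let value = header.trim();

    // Form 1: delta-seconds, e.g. "120".
    if let Ok(secs) = value.parse::<u64>() {
        return Some(Duration::from_secs(secs));
    }

    // Form 2: an HTTP date. duration_since() returns Err when the date is
    // earlier than `now`, so fall back to a 1-second minimum, not a panic.
    let date = parse_http_date(value)?;
    Some(date.duration_since(now).unwrap_or(Duration::from_secs(1)))
}
```

Passing `now` in as a parameter is what makes the past-date case testable without depending on the wall clock.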
teernisse
361757568f refactor(cli): remove deprecated stage_spinner, migrate remaining callers to v2
Phase 7 cleanup: migrate timeline.rs and main.rs search spinner
from stage_spinner() to stage_spinner_v2() with proper icon labels,
then remove the now-unused stage_spinner() function and its tests.

No external callers remain for the old numbered-stage API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:13:06 -05:00
Taylor Eernisse
8572f6cc04 refactor(cli): polish secondary commands with icons, number formatting, and section dividers
Phase 6 of the UX overhaul. Applies consistent visual treatment across
the remaining command outputs: stats, doctor, timeline, who, count,
and drift.

Stats (stats.rs):
- Apply render::format_number() to all numeric values (documents,
  FTS indexed, embedding counts, chunks) for thousand-separator
  formatting in large databases

Doctor (doctor.rs):
- Replace Unicode check/warning/cross symbols with Icons::success(),
  Icons::warning(), Icons::error() for glyph-mode awareness
- Add summary line after checks showing "Ready/Not ready" with counts
  of passed, warnings, and failed checks separated by middle dots
- Remove "lore doctor" title header for cleaner output

Count (count.rs):
- Right-align numeric values with {:>10} format for columnar output
  in count and state breakdown displays

Timeline (timeline.rs):
- Add entity icons (issue/MR) before entity references in event rows
- Refactor format_event_tag to pad plain text before applying style,
  preventing ANSI codes from breaking column alignment
- Extract style_padded() helper for width-then-style pattern

Who (who.rs):
- Add Icons::user() before usernames in expert, workload, reviews,
  and overlap displays
- Replace manual bold section headers with render::section_divider()
  in workload view (Assigned Issues, Authored MRs, Reviewing MRs,
  Unresolved Discussions)

Drift (drift.rs):
- Add Icons::error()/success() before drift detection status line
- Replace '#' bar character with Unicode full block for similarity
  curve visualization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
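The width-then-style pattern from 8572f6cc04 (pad the plain text first, then apply the style) can be illustrated with the sketch below. The signature is an assumption: the real style_padded() presumably takes a Theme style, not a closure.

```rust
/// Hypothetical sketch: left-pad `text` to `width` columns BEFORE styling,
/// so ANSI escape sequences never count toward the column width.
fn style_padded(text: &str, width: usize, style: impl Fn(&str) -> String) -> String {
    // Padding operates on the plain text, where every char is visible.
    let padded = format!("{text:<width$}");
    style(&padded)
}
```

Doing it the other way round, for example `format!("{:<11}", styled)`, counts the escape characters as part of the width, so a styled tag that is already longer than 11 characters receives no padding and the column drifts.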
Taylor Eernisse
d0744039ef refactor(show): polish issue and MR detail views with section dividers and icons
Phase 4 of the UX overhaul. Restructures the show issue and show MR
detail displays with consistent section layout, state icons, and
improved typography.

Issue detail changes:
- Replace bold header + box-drawing underline with indented title using
  Theme::bold() for the title text only
- Organize fields into named sections using render::section_divider():
  Details, Development, Description, Discussions
- Add state icons (Icons::issue_opened/closed) alongside text labels
- Add relative time in parentheses next to Created/Updated dates
- Switch labels from "Labels: (none)" to only showing when present,
  using format_labels_bare for clean comma-separated output
- Move URL and confidential indicator into Details section
- Closing MRs show state-colored icons (merged/opened/closed)
- Discussions use section_divider instead of bold text, remove colons
  from author lines, adjust wrap widths for consistent indentation

MR detail changes:
- Same section-divider layout: Details, Description, Discussions
- State icons for opened/merged/closed using Icons::mr_* helpers
- Draft indicator uses Icons::mr_draft() instead of [Draft] text prefix
- Relative times added to Created, Updated, Merged, Closed dates
- Reviewers and Assignees fields aligned with fixed-width labels
- Labels shown only when present, using format_labels_bare
- Discussion formatting matches issue detail style

Both views use 5-space left indent for field alignment and consistent
wrap widths (72 for descriptions, 68/66 for discussion notes/replies).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
4b372dfb38 refactor(list): polish list commands with icons, compact timestamps, and styled discussions
Phase 3 of the UX overhaul. Enhances the issues, merge requests, and
notes list displays with visual indicators and improved formatting.

List display changes (src/cli/commands/list.rs):
- Add state icons to issues (opened/closed) and merge requests
  (opened/merged/closed) using Icons:: helpers alongside text labels
- Replace [DRAFT] prefix with Icons::mr_draft() glyph for draft MRs
- Switch from format_relative_time to format_relative_time_compact for
  tighter column widths in tabular output
- Switch from format_labels to format_labels_bare for bracket-free labels
- Change format_discussions() return type from String to StyledCell so
  unresolved counts render with Theme::warning() color inline
- Bold the section headers ("Issues", "Merge Requests", "Notes")
  with count separated from the label for cleaner scanning
- Import Icons from render module

Test updates (src/cli/commands/list_tests.rs):
- Update format_discussions tests to assert on StyledCell.text field
  instead of raw String, since the function now returns styled output
- The unresolved-count test checks starts_with/contains to handle
  embedded ANSI escape codes from Theme::warning()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
af8fc4af76 refactor(sync): overhaul progress display with stage spinners and summaries
Phase 2 of the UX overhaul. Replaces the old numbered-stage progress
system (1/4, 2/4...) and manual indicatif ProgressBar/ProgressStyle
setup with the new centralized progress helpers.

Sync command changes (src/cli/commands/sync.rs):
- Replace stage_spinner(n, total, msg) with stage_spinner_v2(icon, label, status)
  removing the rigid numbered-stage counter in favor of named stages
- Replace manual ProgressBar::new + ProgressStyle::default_bar for docs
  and embed sub-progress with nested_progress(label, len, robot_mode)
- Add finish_stage() calls that display a completion summary with
  elapsed time, e.g. "Issues  42 issues from 3 projects  1.2s"
- Each stage (Issues, MRs, Docs, Embed) now reports what it did on
  completion rather than just clearing the spinner silently
- Embed failure path uses Icons::warning() instead of inline Theme
  formatting, keeping error display consistent with success path
- Remove indicatif direct dependency from sync.rs (now handled by
  progress module)

Main entry point changes (src/main.rs):
- Add GlyphMode detection: auto-detect Unicode/Nerd Font support or
  fall back to ASCII based on --icons flag, --color=never, NO_COLOR,
  or robot mode
- Update all LoreRenderer::init() calls to pass GlyphMode alongside
  ColorMode for icon-aware rendering throughout the CLI
- Overhaul handle_error() formatting: use Icons::error() glyph,
  bold error text, arrow prefixed action suggestions, and breathing
  room with blank lines for scannability
- Migrate handle_embed() progress bar from manual ProgressBar +
  ProgressStyle to nested_progress() helper, matching sync command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
96b288ccdd refactor(search): polish search results rendering with semantic Theme styles
Phase 5 of the UX overhaul. Migrates search result display from raw
console styling to the centralized Theme system with semantic methods,
improving visual consistency and readability.

Search result changes:
- Type badges now use semantic styles (issue_ref, mr_ref) with
  fixed-width alignment for clean columnar layout
- Snippet rendering uses Theme::highlight() for matched terms and
  Theme::muted() for surrounding context, replacing bold+underline
- Metadata line uses Theme::username() for authors and per-part
  styling with middle-dot separators instead of a single dim line
- Result numbering uses muted style with right-aligned width
- Consistent 8-space indent for metadata, snippets, and explain lines
- Header line uses muted style for search mode instead of dim+parens
- Trailing blank line moved after the result loop instead of per-result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
teernisse
d710403567 feat(cli): add GlyphMode icon system, Theme extensions, and progress API
Phase 1 of UX skin overhaul: foundation layer that all subsequent
phases build upon.

Icons: 3-tier glyph system (Nerd Font / Unicode / ASCII) with
auto-detection from TERM_PROGRAM, LORE_ICONS env, or --icons flag.
16 semantic icon methods on Icons struct (success, warning, error,
issue states, MR states, note, search, user, sync, waiting).

Theme: 4 new semantic styles — muted (#6b7280), highlight (#fbbf24),
timing (#94a3b8), state_draft (#6b7280).

Progress: stage_spinner_v2 with icon prefix, nested_progress with
bounded bar/throughput/ETA, finish_stage for static completion lines,
format_elapsed for compact duration strings.

Utilities: format_relative_time_compact (3h, 2d, 1w, 3mo),
format_labels_bare (comma-separated without brackets).

CLI: --icons global flag, GLOBAL_FLAGS registry updated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
ebf64816c9 fix(search): correct FTS5 raw mode fallback test assertion
Update test_raw_mode_leading_wildcard_falls_back_to_safe to match the
actual Safe mode behavior: OR is a recognized FTS5 boolean operator and
passes through unquoted, so the expected output is '"*" OR "auth"' not
'"*" "OR" "auth"'. The previous assertion has been incorrect ever since the
Safe mode operator-passthrough logic was added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:34:01 -05:00
Taylor Eernisse
450951dee1 feat(timeline): rename --expand-mentions to --no-mentions, default mentions on
Invert the timeline mention-expansion flag semantics. Previously, mention
edges were excluded by default and --expand-mentions opted in. Now mention
edges are included by default (matching the more common use case) and
--no-mentions opts out to reduce fan-out when needed.

This is a breaking CLI change but aligns with the principle that the
default behavior should produce the most useful output. Users who were
passing --expand-mentions get the same behavior without any flag. Users
who want reduced output can pass --no-mentions.

Updated: CLI args (TimelineArgs), autocorrect flag list, robot-docs
schema, README documentation and flag reference table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:33:34 -05:00
Taylor Eernisse
81f049a7fa refactor(main): wire LoreRenderer init, migrate to Theme, improve UX polish
Wire the LoreRenderer singleton initialization into main.rs color mode
handling, replacing the console::style import with Theme throughout.

Key changes:

- Color initialization: LoreRenderer::init() called for all code paths
  (NO_COLOR, --color never/always/auto, unknown mode fallback) alongside
  the existing console::set_colors_enabled() calls. Both systems must
  agree since some transitive code still uses console (e.g. dialoguer).

- Tracing: Replace .with_target(false) with .event_format(CompactHumanFormat)
  for the stderr layer, producing the clean 'HH:MM:SS LEVEL  message' format.

- Error handling: handle_error() now shows machine-actionable recovery
  commands from gi_error.actions() below the hint, formatted with dim '$'
  prefix and bold command text.

- Deprecation warnings: All 'lore list', 'lore show', 'lore auth-test',
  'lore sync-status' warnings migrated to Theme::warning().

- Init wizard: All success/info/error messages migrated. Unicode check
  marks use explicit \u{2713} escapes instead of literal symbols.

- Embed command: Added progress bar with indicatif for embedding stage,
  showing position/total with steady tick. Elapsed time shown on completion.

- Generate-docs and ingest commands: Added 'Done in Xs' elapsed time and
  next-step hints (run embed after generate-docs, run generate-docs after
  ingest) for better workflow guidance.

- Sync output: Interrupt message and lock release migrated to Theme.

- Health command: Status labels and overall healthy/unhealthy styled.

- Robot-docs: Added drift command schema, updated sync flags to include
  --no-file-changes, updated who flags with new options.

- Timeline --expand-mentions -> --no-mentions flag rename wired through
  params and robot-docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:33:09 -05:00
Taylor Eernisse
dd00a2b840 refactor(cli): migrate all command modules from console::style to Theme
Replace all console::style() calls in command modules with the centralized
Theme API and render:: utility functions. This ensures consistent color
behavior across the entire CLI, proper NO_COLOR/--color never support via
the LoreRenderer singleton, and eliminates duplicated formatting code.

Changes per module:

- count.rs: Theme for table headers, render::format_number replacing local
  duplicate. Removed local format_number implementation.
- doctor.rs: Theme::success/warning/error for check status symbols and
  messages. Unicode escapes for check/warning/cross symbols.
- drift.rs: Theme::bold/error/success for drift detection headers and
  status messages.
- embed.rs: Compact output format — headline with count, zero-suppressed
  detail lines, 'nothing to embed' short-circuit for no-op runs.
- generate_docs.rs: Same compact pattern — headline + detail + hint for
  next step. No-op short-circuit when regenerated==0.
- ingest.rs: Theme for project summaries, sync status, dry-run preview.
  All console::style -> Theme replacements.
- list.rs: Replace comfy-table with render::LoreTable for issue/MR listing.
  Remove local colored_cell, colored_cell_hex, format_relative_time,
  truncate_with_ellipsis, and format_labels (all moved to render.rs).
- list_tests.rs: Update test assertions to use render:: functions.
- search.rs: Add render_snippet() for FTS5 <mark> tag highlighting via
  Theme::bold().underline(). Compact result layout with type badges.
- show.rs: Theme for entity detail views, delegate format_date and
  wrap_text to render module.
- stats.rs: Section-based layout using render::section_divider. Compact
  middle-dot format for document counts. Color-coded embedding coverage
  percentage (green >=95%, yellow >=50%, red <50%).
- sync.rs: Compact sync summary — headline with counts and elapsed time,
  zero-suppressed detail lines, visually prominent error-only section.
- sync_status.rs: Theme for run history headers, removed local
  format_number duplicate.
- timeline.rs: Theme for headers/footers, render:: for date/truncate,
  standard format! padding replacing console::pad_str.
- who.rs: Theme for all expert/workload/active/overlap/review output
  modes, render:: for relative time and truncation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:32:35 -05:00
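The embedding coverage thresholds listed for stats.rs in dd00a2b840 (green >=95%, yellow >=50%, red <50%) amount to a small classification function. The enum and function names below are assumptions; the real code presumably maps straight to Theme styles.

```rust
#[derive(Debug, PartialEq)]
enum CoverageColor {
    Green,
    Yellow,
    Red,
}

/// Hypothetical sketch: pick a color bucket for an embedding coverage
/// percentage in the range 0.0..=100.0.
fn coverage_color(percent: f64) -> CoverageColor {
    if percent >= 95.0 {
        CoverageColor::Green
    } else if percent >= 50.0 {
        CoverageColor::Yellow
    } else {
        CoverageColor::Red
    }
}
```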
Taylor Eernisse
c6a5461d41 refactor(ingestion): compact log summaries and quieter shutdown messages
Migrate all ingestion completion logs to use nonzero_summary() for compact,
zero-suppressed output. Before: 8-14 individual key=value structured fields
per completion message. After: a single summary field like
'42 fetched · 3 labels · 12 notes' that only shows non-zero counters.

Also downgrade all 'Shutdown requested...' messages from info! to debug!.
These are emitted on every Ctrl+C and add noise to the partial results
output that immediately follows. They remain visible at -vv for debugging
graceful shutdown behavior.

Affected modules:
- issues.rs: issue ingestion completion
- merge_requests.rs: MR ingestion completion, full-sync cursor reset
- mr_discussions.rs: discussion ingestion completion
- orchestrator.rs: project-level issue and MR completion summaries,
  all shutdown-requested checkpoints across discussion sync, resource
  events drain, closes-issues drain, and MR diffs drain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:57 -05:00
Taylor Eernisse
a7f86b26e4 refactor(core): compact human log format, quieter lock lifecycle, nonzero_summary helper
Three quality-of-life improvements to reduce log noise and improve readability:

1. logging.rs: Add CompactHumanFormat for stderr tracing output. Replaces the
   default format with a minimal 'HH:MM:SS LEVEL  message key=value' layout —
   no span context, no full timestamps, no target module. The JSON file log
   layer is unaffected. This makes watching 'lore sync' output much cleaner.

2. lock.rs: Downgrade AppLock acquire/release messages from info! to debug!.
   Lock lifecycle events (acquired new, acquired existing, released) are
   operational bookkeeping that clutters normal output. They remain visible
   at -vv verbosity for troubleshooting.

3. ingestion/mod.rs: Add nonzero_summary() utility that formats named counters
   as a compact middle-dot-separated string, suppressing zero values. Produces
   output like '42 fetched · 3 labels · 12 notes' instead of verbose key=value
   structured fields. Returns 'nothing to update' when all values are zero.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:30 -05:00
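The behaviour described for nonzero_summary() in a7f86b26e4 can be sketched as below. The parameter type (a slice of name/count pairs) is an assumption; only the output format and the 'nothing to update' fallback come from the commit message.

```rust
/// Hypothetical sketch: format named counters as a middle-dot-separated
/// string, skipping counters that are zero.
fn nonzero_summary(counters: &[(&str, u64)]) -> String {
    let parts: Vec<String> = counters
        .iter()
        .filter(|(_, count)| *count > 0)
        .map(|(name, count)| format!("{count} {name}"))
        .collect();

    if parts.is_empty() {
        // Every counter was zero.
        "nothing to update".to_string()
    } else {
        // U+00B7 MIDDLE DOT with a space on each side.
        parts.join(" \u{b7} ")
    }
}
```

For example, passing `[("fetched", 42), ("labels", 3), ("skipped", 0), ("notes", 12)]` yields `42 fetched · 3 labels · 12 notes`, with the zero counter dropped.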
Taylor Eernisse
5ee8b0841c feat(cli): add centralized render module with semantic Theme and LoreRenderer
Introduce src/cli/render.rs as the single source of truth for all terminal
output styling and formatting utilities. Key components:

- LoreRenderer: global singleton initialized once at startup, resolving
  color mode (Auto/Always/Never) against TTY state and NO_COLOR env var.
  This fixes lipgloss's limitation of hardcoded TrueColor rendering by
  gating all style application through a colors_on() check.

- Theme: semantic style constants (success/warning/error/info/accent,
  entity refs, state colors, structural styles) that return plain
  Style::new() when colors are disabled. Replaces ad-hoc console::style()
  calls scattered across 15+ command modules.

- Shared formatting utilities consolidated from duplicated implementations:
  format_relative_time (was in list.rs and who.rs), format_number (was in
  count.rs and sync_status.rs), truncate (was truncate_with_ellipsis in
  list.rs and truncate_summary in timeline.rs), format_labels, format_date,
  wrap_indent, section_divider.

- LoreTable: lightweight table renderer replacing comfy-table with simple
  column alignment (Left/Right/Center), adaptive terminal width, and
  NO_COLOR-safe output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:02 -05:00
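Of the utilities consolidated in 5ee8b0841c, format_number is the simplest to illustrate. The sketch below assumes an unsigned input and a comma as the thousands separator; neither is confirmed by the log.

```rust
/// Hypothetical sketch: insert a comma between every group of three digits.
fn format_number(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // A separator goes before any digit that has a multiple of three
        // digits remaining from it to the end (but never at the start).
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}
```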
Taylor Eernisse
7062a3f1fd deps: replace comfy-table with lipgloss (charmed-lipgloss)
Switch from comfy-table to the lipgloss Rust port for terminal styling.
lipgloss provides a composable Style API better suited to our new semantic
theming approach (Theme::success(), Theme::error(), etc.) where we apply
styles to individual text spans rather than constructing styled table cells.
The comfy-table dependency was only used by the list command's human output
and is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:30:31 -05:00
teernisse
159c490ad7 docs: update README with notes, drift, error tolerance, scoring config, and expanded command reference
Major additions:
- lore notes command: full documentation of rich note querying with
  filters (author, type, path, resolution, time range, body substring),
  sort/format options, field selection, and browser opening
- lore drift command: discussion divergence detection documentation
- Error Tolerance section: table of all 8 auto-correction types with
  examples and mode behavior, stderr JSON warning format, fuzzy
  suggestion format for unrecognized commands
- Command Aliases table: primary commands and their accepted aliases
- scoring config section: all weight/half-life/decay parameters for
  the who-expert scoring engine (authorWeight, reviewerWeight, noteBonus,
  half-life periods, closedMrMultiplier, excludedUsernames)

Updates to existing sections:
- Timeline: entity-direct seeding syntax (issue:N, i:N, mr:N, m:N),
  hybrid search pipeline description replacing pure FTS5, discussion
  thread collection, --fields flag, numbered progress spinners
- Search: --after/--updated-after renamed to --since/--updated-since,
  progress spinner behavior, note type filter
- Who: --explain-score, --as-of, --include-bots, --all-history, --detail
- Sync: --no-file-changes flag
- Robot-docs: --brief flag
- Field selection: expanded to note which commands support --fields
2026-02-13 17:27:59 -05:00
teernisse
e0041ed4d9 feat(cli): improve error recovery with alias-aware suggestions and error tolerance manifest
Two related improvements to agent ergonomics in main.rs:

1. suggest_similar_command now matches against aliases (issue->issues,
   mr->mrs, find->search, stat->stats, note->notes, etc.) and provides
   contextual usage examples via a new command_example() helper, so
   agents get actionable recovery hints like "Did you mean 'lore mrs'?
   Example: lore --robot mrs -n 10" instead of just the command name.

2. robot-docs now includes an error_tolerance section documenting every
   auto-correction the CLI performs: types (single_dash_long_flag,
   case_normalization, flag_prefix, fuzzy_flag, subcommand_alias,
   value_normalization, value_fuzzy, prefix_matching), examples, and
   mode behavior (threshold differences). Also expands the aliases
   section with command_aliases and pre_clap_aliases maps for complete
   agent self-discovery.

Together these ensure agents can programmatically discover and recover
from any CLI input error without human intervention.
2026-02-13 17:27:49 -05:00
teernisse
a34751bd47 feat(autocorrect): expand pre-clap correction to 3-phase pipeline with subcommand aliases, value normalization, and flag prefix matching
Three-phase pipeline replacing the single-pass correction:

- Phase A: Subcommand alias correction — handles forms clap can't
  express (merge_requests, mergerequests, robotdocs, generatedocs,
  gen-docs, etc.) via case-insensitive alias map lookup.
- Phase B: Per-arg flag corrections — adds unambiguous prefix expansion
  (--proj -> --project) alongside existing single-dash, case, and fuzzy
  rules. New FlagPrefix rule with 0.95 confidence.
- Phase C: Enum value normalization — auto-corrects casing, prefixes,
  and typos for flags with known valid values. Handles both --flag value
  and --flag=value forms. Respects POSIX -- option terminator.

Changes strict/robot mode from disabling fuzzy matching entirely to using
a higher threshold (0.9 vs 0.8), still catching obvious typos like
--projct while avoiding speculative corrections that mislead agents.

New CorrectionRule variants: SubcommandAlias, ValueNormalization,
ValueFuzzy, FlagPrefix. Each has a corresponding teaching note.
Comprehensive test coverage for all new correction types including
subcommand aliases, value normalization (case, prefix, fuzzy, eq-form),
flag prefix (ambiguous rejection, eq-value preservation), and updated
strict mode behavior.
2026-02-13 17:27:39 -05:00
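The unambiguous prefix expansion from Phase B of a34751bd47 (`--proj` -> `--project`, ambiguous prefixes rejected) might look like this sketch. The function name and the flat list of known flags are assumptions; the real FlagPrefix rule also attaches a 0.95 confidence and preserves `=value` suffixes, which this sketch omits.

```rust
/// Hypothetical sketch: expand `arg` to the single known flag it is a
/// prefix of. Returns None when nothing matches or several flags match.
fn expand_flag_prefix<'a>(arg: &str, known: &[&'a str]) -> Option<&'a str> {
    if !arg.starts_with("--") || arg.len() <= 2 {
        return None;
    }
    // An exact match wins even if it is also a prefix of a longer flag.
    if let Some(exact) = known.iter().copied().find(|k| *k == arg) {
        return Some(exact);
    }
    let mut matches = known.iter().copied().filter(|k| k.starts_with(arg));
    match (matches.next(), matches.next()) {
        // Exactly one candidate: the prefix is unambiguous.
        (Some(only), None) => Some(only),
        // Zero or multiple candidates: refuse to guess.
        _ => None,
    }
}
```

Refusing to pick among several candidates matches the "ambiguous rejection" case named in the commit's test list.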
teernisse
0aecbf33c0 feat(xref): extract cross-references from descriptions, user notes, and fix system note regex
- Fix MENTIONED_RE/CLOSED_BY_RE to match real GitLab format
  ('mentioned in issue #N' / 'mentioned in merge request !N')
- Add GITLAB_URL_RE + parse_url_refs() for full URL extraction
- Add extract_refs_from_descriptions() -> source_method='description_parse'
- Add extract_refs_from_user_notes() -> source_method='note_parse'
- Wire both into orchestrator after system note extraction
- 36 tests: regex fix, URL parsing, integration, idempotency
2026-02-13 17:19:36 -05:00
teernisse
c10471ddb9 feat(timeline): add entity-direct seeding (issue:N, mr:N syntax)
Adds issue:N / i:N / mr:N / m:N query syntax to bypass hybrid search
and seed the timeline directly from a known entity. All discussions for
the entity are gathered without needing Ollama.

- parse_timeline_query() detects entity-direct patterns
- resolve_entity_by_iid() resolves IID to EntityRef with ambiguity handling
- seed_timeline_direct() gathers all discussions for the entity
- 20 new tests (5 resolve, 6 direct seed, 9 parse)
- Updated CLI help text and robot-docs manifest
2026-02-13 15:22:45 -05:00
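
A rough sketch of how the entity-direct patterns might be detected. Only the name `parse_timeline_query` and the four prefixes come from the commit message; the `TimelineQuery` and `EntityKind` types are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
enum EntityKind {
    Issue,
    MergeRequest,
}

// Hypothetical result type: either a direct entity seed or a free-text search.
#[derive(Debug, PartialEq)]
enum TimelineQuery {
    EntityDirect { kind: EntityKind, iid: u64 },
    Search(String),
}

fn parse_timeline_query(input: &str) -> TimelineQuery {
    let trimmed = input.trim();
    if let Some((prefix, rest)) = trimmed.split_once(':') {
        let kind = match prefix {
            "issue" | "i" => Some(EntityKind::Issue),
            "mr" | "m" => Some(EntityKind::MergeRequest),
            _ => None,
        };
        // Only treat the query as entity-direct when the suffix is a valid IID.
        if let (Some(kind), Ok(iid)) = (kind, rest.parse::<u64>()) {
            return TimelineQuery::EntityDirect { kind, iid };
        }
    }
    // Anything else falls through to the normal search path.
    TimelineQuery::Search(trimmed.to_string())
}
```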
teernisse
cbce4c9f59 release: v0.8.2 2026-02-13 15:01:28 -05:00
teernisse
94435c37f0 perf(timeline): hoist prepared statement outside discussion thread loop
Moves the conn.prepare() call for fetching discussion notes outside the
per-discussion loop in collect_discussion_threads(). The SQL is identical
for every iteration, so preparing it once and rebinding parameters avoids
redundant statement compilation on each matched discussion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:40 -05:00
teernisse
59f65b127a fix(search): pass FTS5 boolean operators through unquoted
FTS5 boolean operators (AND, OR, NOT, NEAR) are case-sensitive uppercase
keywords that must appear unquoted in the query string. Previously, the
user-friendly query builder would double-quote every token, causing
queries like "switch AND health" to search for the literal word "AND"
instead of using it as a boolean conjunction.

Adds a FTS5_OPERATORS constant and checks each token against it before
quoting, allowing natural boolean search syntax to work as expected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:29 -05:00
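
The quoting rule can be illustrated with a small sketch. The constant name `FTS5_OPERATORS` is taken from the commit message; the function name `build_fts_query` and the exact quoting details are assumptions.

```rust
const FTS5_OPERATORS: [&str; 4] = ["AND", "OR", "NOT", "NEAR"];

/// Quote every token except uppercase FTS5 operators (hypothetical helper).
fn build_fts_query(raw: &str) -> String {
    raw.split_whitespace()
        .map(|token| {
            if FTS5_OPERATORS.contains(&token) {
                // Operators are case-sensitive and must stay unquoted.
                token.to_string()
            } else {
                // Double any embedded quotes, then wrap the token in quotes.
                format!("\"{}\"", token.replace('"', "\"\""))
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under this sketch, `switch AND health` becomes `"switch" AND "health"`, while lowercase `and` is still quoted and searched as a literal word.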
teernisse
f36e900570 feat(cli): add pipeline progress spinners to timeline and search
Adds numbered stage spinners ([1/3], [2/3], [3/3]) to the timeline
pipeline stages (seed, expand, collect) so users see activity during
longer queries. TimelineParams gains a robot_mode field to suppress
spinners in JSON output mode.

Adds a [1/1] spinner to the search command for consistency, using the
shared stage_spinner from cli/progress.

Also refactors wrap_snippet() to delegate to wrap_text() with a 4-line
cap, eliminating the duplicated word-wrapping logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:19 -05:00
teernisse
e2efc61beb refactor(cli): extract stage_spinner to shared progress module
Moves stage_spinner() from a private function in sync.rs to a pub function
in cli/progress.rs so it can be reused by the timeline and search commands.
The function creates a numbered spinner (e.g. [1/3]) for pipeline stages,
returning a hidden no-op bar in robot mode to keep caller code path-uniform.

sync.rs now imports from crate::cli::progress::stage_spinner instead of
defining its own copy. Adds unit tests for robot mode (hidden bar), human
mode (prefix/message properties), and prefix formatting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:10 -05:00
teernisse
2da1a228b3 feat(timeline): collect and render full discussion threads
Implements the downstream consumption of matched discussions from the seed
phase, completing the discussion thread feature across collect, CLI, and
integration tests.

Collect phase (timeline_collect.rs):
- New collect_discussion_threads() function assembles full threads by
  querying notes for each matched discussion_id, filtering out system notes
  (is_system = 0), ordering chronologically, and capping at THREAD_MAX_NOTES
  with a synthetic "[N more notes not shown]" summary note
- build_entity_lookup() creates a (type, id) -> (iid, path) map from seed
  and expanded entities to provide display metadata for thread events
- Thread timestamp is set to the first note's created_at for correct
  chronological interleaving with other timeline events
- collect_events() gains a matched_discussions parameter; threads are
  collected after entity events and before evidence note merging

CLI rendering (cli/commands/timeline.rs):
- Human mode: threads render with box-drawing borders, bold @author tags,
  date-stamped notes, and word-wrapped bodies (60 char width)
- Robot mode: DiscussionThread serializes as discussion_thread kind with
  note_count, full notes array (note_id, author, body, ISO created_at)
- THREAD tag in yellow for human event tag styling
- TimelineMeta gains discussion_threads_included count

Tests:
- 8 new collect tests: basic thread assembly, system note filtering, empty
  thread skipping, body truncation to THREAD_NOTE_MAX_CHARS, note cap with
  synthetic summary, timestamp from first note, chronological sort position,
  and deduplication of duplicate discussion_ids
- Integration tests updated for new collect_events signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:36 -05:00
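
The body truncation and note cap described above might look roughly like this. The constant names and values are those given in the log; `cap_thread` is a hypothetical helper, and whether the synthetic summary note counts toward the cap is an assumption (here it does not).

```rust
const THREAD_NOTE_MAX_CHARS: usize = 2000;
const THREAD_MAX_NOTES: usize = 50;

/// Truncate on a character boundary so multi-byte UTF-8 text is never split.
fn truncate_to_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // already short enough
    }
}

/// Keep at most THREAD_MAX_NOTES bodies and append a summary for the rest.
fn cap_thread(bodies: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = bodies
        .iter()
        .take(THREAD_MAX_NOTES)
        .map(|b| truncate_to_chars(b, THREAD_NOTE_MAX_CHARS).to_string())
        .collect();
    if bodies.len() > THREAD_MAX_NOTES {
        let hidden = bodies.len() - THREAD_MAX_NOTES;
        out.push(format!("[{} more notes not shown]", hidden));
    }
    out
}
```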
teernisse
0e65202778 feat(timeline): add DiscussionThread types and seed-phase discussion matching
Introduces the foundation for full discussion thread support in the
timeline pipeline. Adds three new domain types to timeline.rs:

- ThreadNote: individual note within a thread (id, author, body, timestamp)
- MatchedDiscussion: tracks discussions matched during seeding with their
  parent entity (issue or MR) for downstream collection
- DiscussionThread variant on TimelineEventType: carries a full thread of
  notes, sorted between NoteEvidence and CrossReferenced

Moves truncate_to_chars() from timeline_seed.rs to timeline.rs as pub(crate)
for reuse by the collect phase. Adds THREAD_NOTE_MAX_CHARS (2000) and
THREAD_MAX_NOTES (50) constants.

Upgrades the seed SQL in resolve_documents_to_entities() to resolve note
documents to their parent discussion via an additional LEFT JOIN chain
(notes -> discussions), using COALESCE to unify the entity resolution path
for both discussion and note source types. SeedResult gains a
matched_discussions field that captures deduplicated discussion matches.

Tests cover: discussion matching from discussion docs, note-to-parent
resolution, deduplication of same discussion across multiple docs, and
correct parent entity type (issue vs MR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:18 -05:00
teernisse
f439c42b3d chore: add gitignore for mock-seed, roam CI workflow, formatting
- Add tools/mock-seed/ to .gitignore
- Add .github/workflows/roam.yml CI workflow
- Add .roam/fitness.yaml architectural fitness rules
- Rustfmt formatting fixes in show.rs and vector.rs
- Beads sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:30 -05:00
teernisse
4f3ec72923 feat(timeline): upgrade seed phase to hybrid search
Replace FTS-only seed entity discovery with hybrid search (FTS + vector
via RRF), using the same search_hybrid infrastructure as the search
command. Falls back gracefully to FTS-only when Ollama is unavailable.

Changes:
- seed_timeline() now accepts OllamaClient, delegates to search_hybrid
- New resolve_documents_to_entities() replaces find_seed_entities()
- SeedResult gains search_mode field tracking actual mode used
- TimelineResult carries search_mode through to JSON renderer
- run_timeline wires up OllamaClient from config
- handle_timeline made async for the hybrid search await
- Tests updated for new function signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:24 -05:00
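
For readers unfamiliar with RRF (reciprocal rank fusion), a minimal sketch of the fusion step follows. The function name `rrf_fuse`, the use of plain document IDs, and the constant `k` are assumptions; the log does not show how `search_hybrid` implements it.

```rust
use std::collections::HashMap;

/// Fuse two ranked lists of document IDs with reciprocal rank fusion.
/// Each list contributes 1 / (k + rank) per document, with rank starting at 1.
fn rrf_fuse(lexical: &[u64], vector: &[u64], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in [lexical, vector] {
        for (rank, doc_id) in list.iter().enumerate() {
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; break ties by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Passing an empty vector list leaves the lexical order unchanged, which mirrors the FTS-only fallback when no embeddings are available.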
teernisse
e6771709f1 refactor(core): extract path_resolver module, fix old_path matching in who
Extract shared path resolution logic from who.rs into a new
core::path_resolver module for cross-module reuse. Functions moved:
escape_like, normalize_repo_path, PathQuery, SuffixResult,
build_path_query, suffix_probe. Duplicate escape_like copies removed
from list.rs, project.rs, and filters.rs — all now import from
path_resolver.

Additionally fixes two bugs in query_expert_details() and
query_overlap() where only position_new_path was checked (missing
old_path matches for renamed files) and state filter excluded 'closed'
MRs despite the main scoring query including them with a decay
multiplier.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:14 -05:00
Taylor Eernisse
8c86b0dfd7 release: v0.8.1 2026-02-13 11:12:31 -05:00
teernisse
6e55b2470d bugfix: DB column and size issues 2026-02-13 11:11:35 -05:00
Taylor Eernisse
b05922d60b release: v0.8.0 2026-02-13 10:59:05 -05:00
Taylor Eernisse
11fe02fac9 docs: add proposed code file reorganization plan
Planning document for the ongoing test extraction and code organization
effort. Covers module-by-module analysis, proposed file splits, and
phased execution plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:56 -05:00
Taylor Eernisse
48fbd4bfdb feat(core): add file rename chain resolver with depth-bounded BFS
New module: core::file_history with resolve_rename_chain() that traces
a file path through its rename history in mr_file_changes using
bidirectional BFS (forward: old_path->new_path, backward: new_path->old_path).

Key design decisions:
- Depth-bounded BFS: each queue entry carries its distance from the
  origin, so max_hops correctly limits by graph distance (not by total
  nodes discovered). This matters for branching rename graphs where a
  file was renamed differently in parallel MRs.
- Cycle-safe: visited set prevents infinite loops from circular renames.
- Project-scoped: queries are always scoped to a single project_id.
- Deterministic: output is sorted for stable results.

Tests cover: linear chains (forward/backward), cycles, max_hops=0,
depth-bounded linear chains, branching renames, diamond patterns,
and cross-project isolation (9 tests total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:41 -05:00
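
The depth-bounded, cycle-safe traversal can be sketched in memory as below. This is an illustration under assumptions: the real `resolve_rename_chain()` queries `mr_file_changes` per project, whereas here a slice of `(old_path, new_path)` pairs stands in for the table and the signature is invented.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

fn resolve_rename_chain(renames: &[(&str, &str)], start: &str, max_hops: usize) -> Vec<String> {
    // Bidirectional adjacency: follow renames both forward and backward.
    let mut adjacent: HashMap<&str, Vec<&str>> = HashMap::new();
    for (old, new) in renames {
        adjacent.entry(*old).or_default().push(*new);
        adjacent.entry(*new).or_default().push(*old);
    }

    let mut visited: HashSet<&str> = HashSet::new();
    visited.insert(start);
    // Each queue entry carries its distance from the origin.
    let mut queue: VecDeque<(&str, usize)> = VecDeque::new();
    queue.push_back((start, 0));

    while let Some((path, depth)) = queue.pop_front() {
        if depth == max_hops {
            continue; // hop budget exhausted on this branch
        }
        for next in adjacent.get(path).into_iter().flatten() {
            // The visited set keeps circular renames from looping forever.
            if visited.insert(*next) {
                queue.push_back((*next, depth + 1));
            }
        }
    }

    // Sort for stable, deterministic output.
    let mut paths: Vec<String> = visited.into_iter().map(String::from).collect();
    paths.sort();
    paths
}
```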
Taylor Eernisse
9786ef27f5 refactor(core/time): extract parse_since_from for deterministic time parsing
Factor out parse_since_from(input, reference_ms) so callers can compute
relative durations against a fixed reference timestamp instead of always
using now(). The existing parse_since() now delegates to it with now_ms().

Enables testable and reproducible time-relative queries for features like
timeline --as-of and who --as-of.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:20 -05:00
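
A sketch of what a reference-relative parser could look like. The function name and the `reference_ms` parameter come from the commit message; the accepted input grammar (`7d`, `2w`, `6m`, with a month approximated as 30 days) and the `Option<i64>` return type are assumptions.

```rust
const DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Resolve a relative duration such as "7d" against a fixed reference time.
/// Returns the cutoff timestamp in milliseconds, or None for unparseable input.
fn parse_since_from(input: &str, reference_ms: i64) -> Option<i64> {
    let input = input.trim();
    let unit = input.chars().last()?;
    let amount: i64 = input[..input.len() - unit.len_utf8()].parse().ok()?;
    if amount < 0 {
        return None;
    }
    let days = match unit {
        'd' => amount,
        'w' => amount.checked_mul(7)?,
        'm' => amount.checked_mul(30)?, // assumption: a month counts as 30 days
        _ => return None,
    };
    reference_ms.checked_sub(days.checked_mul(DAY_MS)?)
}
```

Because the reference is an argument, tests can pin it to a constant instead of depending on the wall clock.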
Taylor Eernisse
7e0e6a91f2 refactor: extract unit tests into separate _tests.rs files
Move inline #[cfg(test)] mod tests { ... } blocks from 22 source files
into dedicated _tests.rs companion files, wired via:

    #[cfg(test)]
    #[path = "module_tests.rs"]
    mod tests;

This keeps implementation-focused source files leaner and more scannable
while preserving full access to private items through `use super::*;`.

Modules extracted:
  core:      db, note_parser, payloads, project, references, sync_run,
             timeline_collect, timeline_expand, timeline_seed
  cli:       list (55 tests), who (75 tests)
  documents: extractor (43 tests), regenerator
  embedding: change_detector, chunking
  gitlab:    graphql (wiremock async tests), transformers/issue
  ingestion: dirty_tracker, discussions, issues, mr_diffs

Also adds conflicts_with("explain_score") to the --detail flag in the
who command to prevent mutually exclusive flags from being combined.

All 629 unit tests pass. No behavior changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:02 -05:00
Taylor Eernisse
5c2df3df3b chore(beads): sync issue tracker
Export latest bead state to JSONL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:53:33 -05:00
teernisse
94c8613420 feat(bd-226s): implement time-decay expert scoring model
Replace flat-weight expertise scoring with exponential half-life decay,
split reviewer signals (participated vs assigned-only), dual-path rename
awareness, and new CLI flags (--as-of, --explain-score, --include-bots,
--all-history).

Changes:
- ScoringConfig: 8 new fields with validation (config.rs)
- half_life_decay() and normalize_query_path() pure functions (who.rs)
- CTE-based SQL with dual-path matching, mr_activity, reviewer_participation (who.rs)
- Rust-side decay aggregation with deterministic f64 ordering (who.rs)
- Path resolution probes check old_path columns (who.rs)
- Migration 026: 5 new indexes for dual-path and reviewer participation
- Default --since changed from 6m to 24m
- 31 new tests (example-based + invariant), 621 total who tests passing
- Autocorrect registry updated with new flags

Closes: bd-226s, bd-2w1p, bd-1soz, bd-18dn, bd-2ao4, bd-2yu5, bd-1b50,
bd-1hoq, bd-1h3f, bd-13q8, bd-11mg, bd-1vti, bd-1j5o
2026-02-12 15:44:55 -05:00
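
The exponential half-life decay and the deterministic ordering can be sketched as follows. The name `half_life_decay` appears in the commit; its signature, the handling of non-positive half-lives, and the `rank_experts` helper are assumptions.

```rust
/// Weight multiplier for a signal that is `age_days` old:
/// 1.0 when fresh, 0.5 after one half-life, 0.25 after two, and so on.
fn half_life_decay(age_days: f64, half_life_days: f64) -> f64 {
    if half_life_days <= 0.0 {
        return 0.0; // assumption: treat an invalid half-life as no contribution
    }
    0.5_f64.powf(age_days.max(0.0) / half_life_days)
}

/// Sort by score descending, then by username, so equal scores order stably.
/// (Hypothetical helper illustrating deterministic f64 ordering.)
fn rank_experts(mut scored: Vec<(String, f64)>) -> Vec<(String, f64)> {
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scored
}
```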
teernisse
ad4dd6e855 release: v0.7.0 2026-02-12 13:31:57 -05:00
teernisse
83cd16c918 feat: implement per-note search and document pipeline
- Add SourceType::Note with extract_note_document() and ParentMetadataCache
- Migration 022: composite indexes for notes queries + author_id column
- Migration 024: table rebuild adding 'note' to CHECK constraints, defense triggers
- Migration 025: backfill existing non-system notes into dirty queue
- Add lore notes CLI command with 17 filter options (author, path, resolution, etc.)
- Support table/json/jsonl/csv output formats with field selection
- Wire note dirty tracking through discussion and MR discussion ingestion
- Fix test_migration_024_preserves_existing_data off-by-one (tested wrong migration)
- Fix upsert_document_inner returning false for label/path-only changes
2026-02-12 13:31:24 -05:00
teernisse
fda9cd8835 chore(beads): revise 18 NOTE beads with verified codebase context
Enriched all per-note search beads (NOTE-0A through NOTE-2I) with:
- Corrected migration numbers (022, 024, 025)
- Verified file paths and line numbers from codebase
- Complete function signatures for referenced code
- Detailed approach sections with SQL and Rust patterns
- DocumentData struct field mappings
- TDD anchors with specific test names
- Edge cases from codebase analysis
- Dependency context explaining what each blocker provides
2026-02-12 12:26:48 -05:00
teernisse
c8d609ab78 chore: add drift to autocorrect command registry 2026-02-12 12:10:02 -05:00
teernisse
35c828ba73 feat(bd-91j1): enhance robot-docs with quick_start and example_output
Add quick_start section with glab equivalents, lore-exclusive features,
and read/write split guidance. Add example_output to issues, mrs, search,
and who commands. Update strip_schemas to also strip example_output in
brief mode. Update beads tracking state.

Closes: bd-91j1
2026-02-12 12:09:44 -05:00
teernisse
ecbfef537a feat(bd-1ksf): wire hybrid search (FTS5 + vector + RRF) to CLI
Make run_search async, replace hardcoded lexical mode with SearchMode::parse(),
wire search_hybrid() with OllamaClient for semantic/hybrid modes, graceful
degradation when Ollama unavailable.

Closes: bd-1ksf
2026-02-12 12:03:47 -05:00
teernisse
47eecce8e9 feat(bd-1cjx): add lore drift command for discussion divergence detection
Implement drift detection using cosine similarity between issue description
embedding and chronological note embeddings. Sliding window (size 3) identifies
topic drift points. Includes human and robot output formatters.

New files: drift.rs, similarity.rs
Closes: bd-1cjx
2026-02-12 12:02:15 -05:00
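
As a rough illustration of the approach, the sketch below computes cosine similarity against the description embedding and scans a size-3 window over the notes. The function names, the use of a window mean, and the threshold parameter are assumptions; the log does not specify how drift.rs scores a window.

```rust
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Index of the first note in the first window whose mean similarity to the
/// description falls below `threshold`, or None if no window drifts.
fn first_drift_point(description: &[f32], notes: &[Vec<f32>], threshold: f32) -> Option<usize> {
    const WINDOW: usize = 3;
    let sims: Vec<f32> = notes
        .iter()
        .map(|note| cosine_similarity(description, note))
        .collect();
    sims.windows(WINDOW)
        .position(|w| (w.iter().sum::<f32>() / WINDOW as f32) < threshold)
}
```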
teernisse
b29c382583 feat(bd-2g50): fill data gaps in issue detail view
Add references_full, user_notes_count, merge_requests_count computed
fields to show issue. Add closed_at and confidential columns via
migration 023.

Closes: bd-2g50
2026-02-12 11:59:44 -05:00
teernisse
e26816333f feat(bd-kvij): rewrite agent skills to mandate lore for reads
Add Read/Write Split section to AGENTS.md and CLAUDE.md mandating lore
for all read operations and glab for all write operations.

Closes: bd-kvij
2026-02-12 11:59:32 -05:00
teernisse
f772de8aef release: v0.6.2 2026-02-12 11:33:59 -05:00
teernisse
dd4d867c6e chore: update beads issue tracking state
Sync beads database with current issue status. Includes history
snapshot rotation and updated issue metadata from triage session.
2026-02-12 11:25:27 -05:00
teernisse
ffd074499a docs: update TUI PRD, time-decay scoring, and plan-to-beads plans
TUI PRD v2 (frankentui): Rounds 10-11 feedback refining the hybrid
Ratatui terminal UI approach — component architecture, keybinding
model, and incremental search integration.

Time-decay expert scoring: Round 6 feedback on the weighted scoring
model for the `who` command's expert mode, covering decay curves,
activity normalization, and bot filtering thresholds.

Plan-to-beads v2: Draft specification for the next iteration of the
plan-to-beads skill that converts markdown plans into dependency-
aware beads with full agent-executable context.
2026-02-12 11:21:32 -05:00
teernisse
125938fba6 docs: add per-note search PRD and user journey documentation
Per-note search PRD: Comprehensive product requirements for evolving
the search system from document-level to note-level granularity.
Includes 6 rounds of iterative feedback refining scope, ranking
strategy, migration path, and robot mode integration.

User journeys: Detailed walkthrough of 8 primary user workflows
covering issue triage, MR review lookup, code archaeology, expert
discovery, sync pipeline operation, and agent integration patterns.
2026-02-12 11:21:23 -05:00
teernisse
cd25cf61ca docs: add architecture and flow diagrams
Excalidraw source files and PNG exports for 5 architectural diagrams:

01-human-flow-map: User journey through lore CLI commands
02-agent-flow-map: AI agent interaction patterns with robot mode
03-command-coverage: Matrix of CLI commands vs data entities
04-gap-priority-matrix: Feature gap analysis with priority scoring
05-data-flow-architecture: End-to-end data pipeline from GitLab
    through ingestion, storage, indexing, and query layers
2026-02-12 11:21:15 -05:00
teernisse
d9c9f6e541 fix: escape LIKE metacharacters in project resolver
User-supplied project names containing `%` or `_` were passed directly
into LIKE patterns, causing unintended wildcard matching. For example,
`my_project` would match `my-project` because `_` is a single-char
wildcard in SQL LIKE.

Added escape_like() helper that escapes `\`, `%`, and `_` with
backslash, and added ESCAPE '\' clauses to both the suffix-match and
substring-match queries in resolve_project().

Includes two regression tests:
- test_underscore_not_wildcard: `_` in input must not match `-`
- test_percent_not_wildcard: `%` in input must not match arbitrary strings
2026-02-12 11:21:09 -05:00
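
A minimal sketch of the escaping described above. The name `escape_like` and the three escaped characters come from the commit message; the body is an illustration rather than the project's exact code.

```rust
/// Escape LIKE metacharacters so user input matches literally.
/// Intended for use with a query that declares ESCAPE '\'.
fn escape_like(input: &str) -> String {
    let mut escaped = String::with_capacity(input.len());
    for ch in input.chars() {
        // The backslash itself must be escaped first-class, along with % and _.
        if matches!(ch, '\\' | '%' | '_') {
            escaped.push('\\');
        }
        escaped.push(ch);
    }
    escaped
}
```

A suffix-match pattern would then be built as `format!("%{}", escape_like(name))` and bound to a clause such as `LIKE ?1 ESCAPE '\'`, so `my_project` no longer matches `my-project`.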
teernisse
acc5e12e3d perf: force partial index for DiffNote queries, batch stats counts
Query optimizer fixes for the `who` and `stats` commands based on
a systematic performance audit of the SQLite query plans.

who command (expert/reviews/detail modes):
- Add INDEXED BY idx_notes_diffnote_path_created hints to all DiffNote
  queries. SQLite's planner was selecting idx_notes_system (38% of rows)
  over the far more selective partial index (9.3% of rows). Measured
  50-133x speedup on expert queries, 26x on reviews queries.
- Reorder JOIN clauses in detail mode's MR-author sub-select to match
  the index scan direction (notes -> discussions -> merge_requests).

stats command:
- Replace 12+ sequential COUNT(*) queries with conditional aggregates
  (COALESCE + SUM + CASE). The documents, dirty_sources,
  pending_discussion_fetches, and pending_dependent_fetches tables are
  each scanned once instead of 2-3 times. Measured 1.7x speedup
  (109ms -> 65ms warm cache).
- Switch FTS document count from COUNT(*) on the virtual table to
  COUNT(*) on documents_fts_docsize shadow table (B-tree scan vs FTS5
  virtual table overhead). Measured 19x speedup for that single query.

Database: 61652 docs, 282K notes, 211K discussions, 1.5GB.
2026-02-12 11:21:00 -05:00
teernisse
039ab1c2a3 release: v0.6.1 2026-02-11 15:15:41 -05:00
teernisse
d63d6f0b9c docs: document defaultProject configuration option
Updates README.md to explain the new defaultProject behavior:
- Config example now shows the defaultProject field
- New row in the configuration reference table describing the field,
  its type (optional string), default (none), and behavior (fallback
  when -p omitted, must match a configured path, CLI always overrides)
- Project Resolution section updated to explain the cascading logic:
  CLI flag > config default > all projects
- Init section notes the interactive prompt for multi-project setups
  and the --default-project flag for non-interactive/robot mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:53 -05:00
teernisse
3a1307dcdc feat(cli): wire defaultProject through init and all commands
Integrates the defaultProject config field across the entire CLI
surface so that omitting `-p` now falls back to the configured default.

Init command:
- New `--default-project` flag on `lore init` (and robot-mode variant)
- InitInputs.default_project: Option<String> passed through to run_init
- Validation in run_init ensures the default matches a configured path
- Interactive mode: when multiple projects are configured, prompts
  whether to set a default and which project to use
- Robot mode: InitOutputJson now includes default_project (omitted when
  null) for downstream automation
- Autocorrect dictionary updated with `--default-project`

Command handlers applying effective_project():
- handle_issues: list filters use config default when -p omitted
- handle_mrs: same cascading resolution for MR listing
- handle_ingest: dry-run and full sync respect the default
- handle_timeline: TimelineParams.project resolved via effective_project
- handle_search: SearchCliFilters.project resolved via effective_project
- handle_generate_docs: project filter cascades
- handle_who: falls back to config.default_project when -p omitted
- handle_count: both count subcommands respect the default
- handle_discussions: discussion count filters respect the default

Robot-docs:
- init command schema updated with --default-project flag and
  response_schema showing default_project as string?
- New config_notes section documents the defaultProject field with
  type, description, and example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:46 -05:00
teernisse
6ea3108a20 feat(config): add defaultProject with validation and cascading resolver
Introduces a new optional `defaultProject` field on Config (and
MinimalConfig for init output) that acts as a fallback when the
`-p`/`--project` CLI flag is omitted.

Domain-layer changes:
- Config.default_project: Option<String> with camelCase serde rename
- Config::load validates that defaultProject matches a configured
  project path (exact or case-insensitive suffix match), returning
  ConfigInvalid on mismatch
- Config::effective_project(cli_flag) -> Option<&str>: cascading
  resolver that prefers the CLI flag, then the config default, then None
- MinimalConfig.default_project with skip_serializing_if for clean
  JSON output when unset

Tests added:
- effective_project: CLI overrides default, falls back to default,
  returns None when both absent
- Config::load: accepts valid defaultProject, rejects nonexistent,
  accepts suffix match
- MinimalConfig: omits null defaultProject, includes when set
- Helper write_config_with_default_project for parameterized tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:33 -05:00
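
The cascading resolver can be sketched as below. The method name, its `Option<&str>` return type, and the `default_project` field are taken from the commit message; the trimmed-down `Config` struct and the lifetime annotations are assumptions.

```rust
// Reduced stand-in for the real Config, which has many more fields.
struct Config {
    default_project: Option<String>,
}

impl Config {
    /// Prefer the CLI flag, then the configured default, then None.
    fn effective_project<'a>(&'a self, cli_flag: Option<&'a str>) -> Option<&'a str> {
        cli_flag.or(self.default_project.as_deref())
    }
}
```

Callers that receive `None` would then fall back to operating across all configured projects.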
teernisse
81647545e7 release: v0.6.0 2026-02-11 10:56:26 -05:00
teernisse
39a832688d feat(sync): status enrichment progress visibility and status discoverability
- Add StatusEnrichmentStarted/PageFetched/Writing progress events so
  sync no longer has a 45-60s silent gap during GraphQL status fetch
- Thread per-page callback into fetch_issue_statuses_with_progress
- Hide status_category from all human and robot output (keep in DB)
- Add meta.available_statuses to issues list JSON response for agent
  self-discovery of valid --status filter values
- Update robot-docs with status filtering documentation
2026-02-11 10:56:01 -05:00
Taylor Eernisse
06229ce98b feat(cli): expose available_statuses in robot mode and hide status_category
(Supersedes empty commit f3788eb — jj auto-snapshot race.)

Four related refinements to how work item status is presented:

1. available_statuses in meta (list.rs, main.rs):
   Robot-mode issue list responses now include meta.available_statuses —
   a sorted array of all distinct status_name values in the database.
   Agents can use this to validate --status filter values or display
   valid options without a separate query.

2. Hide status_category from JSON (list.rs, show.rs):
   status_category is a GitLab internal classification that duplicates
   the state field. Switched to skip_serializing so it never appears
   in JSON output while remaining available internally.

3. Simplify human-readable status display (show.rs):
   Removed the "(category)" parenthetical from the Status line.

4. robot-docs schema updates (main.rs):
   Documented --status filter semantics and meta.available_statuses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:24:41 -05:00
Taylor Eernisse
8d18552298 docs: add jj-first VCS policy to AGENTS.md
Establishes Jujutsu (jj) as the preferred VCS tool for this colocated
repo, matching the global Claude Code rules. Agents should use jj
equivalents for all git operations and only fall back to raw git for
hooks, LFS, submodules, or gh CLI interop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:23:01 -05:00
Taylor Eernisse
f3788eb687 feat(cli): expose available_statuses in robot mode and hide status_category
Four related refinements to how work item status is presented:

1. available_statuses in meta (list.rs, main.rs):
   Robot-mode issue list responses now include meta.available_statuses —
   a sorted array of all distinct status_name values in the database.
   Agents can use this to validate --status filter values, offer
   autocomplete, or display valid options without a separate query.

2. Hide status_category from JSON (list.rs, show.rs):
   status_category (e.g. "open", "closed") is a GitLab internal
   classification that duplicates the state field and adds no actionable
   signal for consumers. Switched from skip_serializing_if to
   skip_serializing so it never appears in JSON output while remaining
   available internally for future use.

3. Simplify human-readable status display (show.rs):
   Removed the "(category)" parenthetical from the Status line in
   lore show issue output. The category was noise — users care about
   the board column label, not GitLab's internal taxonomy.

4. robot-docs schema updates (main.rs):
   Documented the --status filter semantics and the new
   meta.available_statuses field in the self-discovery manifest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:22:39 -05:00
Taylor Eernisse
e9af529f6e feat(ingestion): add progress reporting for status enrichment pipeline
Previously the status enrichment phase (GraphQL work item status fetch)
ran silently — users saw no feedback between "syncing issues" and the
final enrichment summary. For projects with hundreds of issues and
adaptive page-size retries, this felt like a hang.

Changes across three layers:

GraphQL (graphql.rs):
  - Extract fetch_issue_statuses_with_progress() accepting an optional
    on_page callback invoked after each paginated fetch with the
    running count of fetched IIDs
  - Original fetch_issue_statuses() preserved as a zero-cost
    delegation wrapper (no callback overhead)

Orchestrator (orchestrator.rs):
  - Three new ProgressEvent variants: StatusEnrichmentStarted,
    StatusEnrichmentPageFetched, StatusEnrichmentWriting
  - Wire the page callback through to the new _with_progress fn

CLI (ingest.rs):
  - Handle all three new events in the progress callback, updating
    both the per-project spinner and the stage bar with live counts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:22:20 -05:00
Taylor Eernisse
70271c14d6 fix(core): ensure migration framework records schema version automatically
The migration runner now inserts (OR REPLACE) the schema_version row
after each successful migration batch, regardless of whether the
migration SQL itself contains a self-registering INSERT. This prevents
version tracking gaps when a .sql migration omits the bookkeeping
statement, which would leave the schema at an unrecorded version and
cause re-execution attempts on next startup.

Legacy migrations that already self-register are unaffected thanks to
the OR REPLACE conflict resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:21:49 -05:00
Taylor Eernisse
d9f99ef21d feat(cli): status display/filtering, expanded --fields, and robot-docs --brief
Work item status integration across all CLI output:

Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
  hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
  multiple values): lore issues --status "In progress" --status "To do"
- Status fields (name, category, color, icon_name, synced_at) in issue
  list query and JSON output with conditional serialization

Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
  using ANSI 256-color approximation from hex color values
- Status fields included in robot mode JSON with ISO timestamps
- IssueRow, IssueDetail, IssueDetailJson all carry status columns

Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)
- strip_schemas() utility in robot.rs for --brief mode
- expand_fields_preset() extended for search, timeline, and all who modes

Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.

Note: replaces empty commit dcfd449 which lost staging during hook execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:13:37 -05:00
Taylor Eernisse
f5967a8e52 chore: fix UBS hook stdin parsing and update beads
.claude/hooks/on-file-write.sh:
- Fix hook to read Claude Code context from JSON stdin (FILE_PATH and
  CWD extracted via jq) instead of relying on environment variables
- Scan only the changed file instead of the entire project directory,
  reducing hook execution from ~30s to <1s per save

.beads/:
- Sync issue tracker state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:34 -05:00
Taylor Eernisse
2c9de1a6c3 docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for
  integration with external tools (dashboards, IDE extensions, chat
  agents). Covers authentication, rate limiting, caching strategy, and
  webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status
  enrichment feature (now implemented). Includes the TDD appendix with
  test-first development specifications covering GraphQL client, adaptive
  pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback
  on decay curve parameterization, recency weighting for discussion
  contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00
Taylor Eernisse
1161edb212 docs: add TUI PRD v2 (FrankenTUI) with 9 plan-refine iterations
Comprehensive product requirements document for the gitlore TUI built on
FrankenTUI's Elm architecture (Msg -> update -> view). The PRD (7800+
lines) covers:

Architecture: Separate binary crate (lore-tui) with runtime delegation,
Elm-style Model/Cmd/Msg, DbManager with closure-based read pool + WAL,
TaskSupervisor for dedup/cancellation, EntityKey system for type-safe
entity references, CommandRegistry as single source of truth for
keybindings/palette/help.

Screens: Dashboard, IssueList, IssueDetail, MrList, MrDetail, Search
(lexical/hybrid/semantic with facets), Timeline (5-stage pipeline),
Who (expert/workload/reviews/active/overlap), Sync (live progress),
CommandPalette, Help overlay.

Infrastructure: InputMode state machine, Clock trait for deterministic
rendering, crash_context ring buffer with redaction, instance lock,
progressive hydration, session restore, grapheme-safe text truncation
(unicode-width + unicode-segmentation), terminal sanitization (ANSI/bidi/
C1 controls), entity LRU cache.

Testing: Snapshot tests via insta, event-fuzz, CLI/TUI parity, tiered
benchmark fixtures (S/M/L), query-plan CI enforcement, Phase 2.5
vertical slice gate.

9 plan-refine iterations (ChatGPT review -> Claude integration):
  Iter 1-3: Connection pool, debounce, EntityKey, TaskSupervisor,
    keyset pagination, capability-adaptive rendering
  Iter 4-6: Separate binary crate, ANSI hardening, session restore,
    read tx isolation, progressive hydration, unicode-width
  Iter 7-9: Per-screen LoadState, CommandRegistry, InputMode, Clock,
    log redaction, entity cache, search cancel SLO, crash diagnostics

Also includes the original tui-prd.md (ratatui-based, superseded by v2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:11:26 -05:00
Taylor Eernisse
5ea976583e docs: update README, AGENTS, and robot-mode-design for work item status
README.md:
- Feature summary updated to mention work item status sync and GraphQL
- New config reference entry for sync.fetchWorkItemStatus (default true)
- Issue listing/show examples include --status flag usage
- Valid fields list expanded with status_name, status_category,
  status_color, status_icon_name, status_synced_at_iso
- Database schema table updated for issues table
- Ingest/sync command descriptions mention status enrichment phase
- Adaptive page sizing and graceful degradation documented

AGENTS.md:
- Robot mode example shows --status flag usage

docs/robot-mode-design.md:
- Issue available fields list expanded with status fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:10:51 -05:00
Taylor Eernisse
dcfd449b72 feat(cli): status display/filtering, expanded --fields, and robot-docs --brief
Work item status integration across all CLI output:

Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
  hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
  multiple values): lore issues --status "In progress" --status "To do"
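
The hex-to-ANSI approximation mentioned above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names are hypothetical, and it assumes the common xterm 6x6x6 color cube (indices 16 to 231) while ignoring the grayscale ramp.

```rust
/// The six intensity levels used by each channel of the xterm 6x6x6 color cube.
const CUBE_LEVELS: [u8; 6] = [0, 95, 135, 175, 215, 255];

/// Index (0..=5) of the cube level closest to `value`.
fn nearest_cube_level(value: u8) -> u8 {
    let mut best_index = 0u8;
    let mut best_dist = u8::MAX;
    for (i, &level) in CUBE_LEVELS.iter().enumerate() {
        let dist = value.abs_diff(level);
        if dist < best_dist {
            best_dist = dist;
            best_index = i as u8;
        }
    }
    best_index
}

/// Hypothetical helper: maps "#rrggbb" (or "rrggbb") to an ANSI 256-color index.
/// Returns None when the input is not six hex digits.
fn hex_to_ansi256(hex: &str) -> Option<u8> {
    let digits = hex.strip_prefix('#').unwrap_or(hex);
    if digits.len() != 6 || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let r = u8::from_str_radix(&digits[0..2], 16).ok()?;
    let g = u8::from_str_radix(&digits[2..4], 16).ok()?;
    let b = u8::from_str_radix(&digits[4..6], 16).ok()?;
    // Cube colors start at index 16; each channel contributes a base-6 digit.
    Some(16 + 36 * nearest_cube_level(r) + 6 * nearest_cube_level(g) + nearest_cube_level(b))
}

/// Wraps `text` in the 256-color foreground escape sequence and a reset.
fn colorize(text: &str, code: u8) -> String {
    format!("\x1b[38;5;{}m{}\x1b[0m", code, text)
}
```

A caller would fall back to plain text when `hex_to_ansi256` returns `None`, so a malformed color from the API never breaks the listing.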

Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
- Status fields (name, category, color, icon, synced_at) included in
  robot mode JSON with ISO timestamps

Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)

Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:09:47 -05:00
Taylor Eernisse
6b75697638 feat(ingestion): enrich issues with work item status from GraphQL API
Add a "Phase 1.5" status enrichment step to the issue ingestion pipeline
that fetches work item statuses via the GitLab GraphQL API after the
standard REST API ingestion completes.

Schema changes (migration 021):
- Add status_name, status_category, status_color, status_icon_name, and
  status_synced_at columns to the issues table (all nullable)

Ingestion pipeline changes:
- New `enrich_issue_statuses_txn()` function that applies fetched
  statuses in a single transaction with two phases: clear stale statuses
  for issues that no longer have a status widget, then apply new/updated
  statuses from the GraphQL response
- ProgressEvent variants for status enrichment (complete/skipped)
- IngestProjectResult tracks enrichment metrics (seen, enriched, cleared,
  without_widget, partial_error_count, enrichment_mode, errors)
- Robot mode JSON output includes per-project status enrichment details
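
The two-phase shape described for `enrich_issue_statuses_txn()` can be sketched roughly as below. This is an illustration under stated assumptions, not the committed code: a `HashMap` stands in for the issues table, the type and function names are hypothetical, and the sketch does not model the SQLite transaction that makes the real update atomic.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the nullable status columns on an issue row.
#[derive(Debug, Clone, PartialEq)]
struct Status {
    name: String,
    category: Option<String>,
}

/// Subset of the enrichment metrics named in the commit message.
#[derive(Debug, Default, PartialEq)]
struct EnrichmentCounts {
    enriched: usize,
    cleared: usize,
}

/// Applies fetched statuses in two phases over an in-memory "table"
/// keyed by issue iid.
fn apply_statuses_sketch(
    issues: &mut HashMap<u64, Option<Status>>,
    fetched: &HashMap<u64, Status>,
    without_widget: &[u64],
) -> EnrichmentCounts {
    let mut counts = EnrichmentCounts::default();

    // Phase 1: clear stale statuses for issues that no longer have a status widget.
    for iid in without_widget {
        if let Some(slot) = issues.get_mut(iid) {
            if slot.take().is_some() {
                counts.cleared += 1;
            }
        }
    }

    // Phase 2: apply new or updated statuses; iids not stored locally are ignored.
    for (iid, status) in fetched {
        if let Some(slot) = issues.get_mut(iid) {
            *slot = Some(status.clone());
            counts.enriched += 1;
        }
    }

    counts
}
```

Clearing before applying means an issue that appears in both inputs ends up with the fetched status rather than an empty one.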

Configuration:
- New `sync.fetchWorkItemStatus` config option (defaults to true); set it to
  false to disable GraphQL status enrichment on instances without Premium/Ultimate
- `LoreError::GitLabAuthFailed` now treated as permanent API error so
  status enrichment auth failures don't trigger retries

Also removes the unnecessary nested SAVEPOINT in store_closes_issues_refs
(already runs within the orchestrator's transaction context).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:09:21 -05:00
Taylor Eernisse
dc49f5209e feat(gitlab): add GraphQL client with adaptive pagination and work item status types
Introduce a reusable GraphQL client (`src/gitlab/graphql.rs`) that handles
GitLab's GraphQL API with full error handling for auth failures, rate
limiting, and partial errors. Key capabilities:

- Adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL
  complexity limits without hardcoding a single safe page size
- Paginated issue status fetching via the workItems GraphQL query
- Graceful detection of unsupported instances (missing GraphQL endpoint
  or forbidden auth) so ingestion continues without status data
- Retry-After header parsing via the `httpdate` crate for rate limit
  compliance
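
The adaptive page sizing step-down (100 → 50 → 25 → 10) could look roughly like this. It is a sketch only: `FetchError` and `fetch_with_adaptive_page_size` are hypothetical names, and how the real client detects a complexity rejection is not shown in this commit message.

```rust
/// Page sizes to try, largest first (taken from the commit message).
const PAGE_SIZES: [u32; 4] = [100, 50, 25, 10];

/// Hypothetical error type; the real client's error variants may differ.
#[derive(Debug, PartialEq)]
enum FetchError {
    /// The server rejected the query as too complex for this page size.
    TooComplex,
    /// Any other failure; not retried with a smaller page.
    Other(String),
}

/// Calls `fetch` with successively smaller page sizes until one succeeds.
/// Returns the result together with the page size that worked.
fn fetch_with_adaptive_page_size<T>(
    mut fetch: impl FnMut(u32) -> Result<T, FetchError>,
) -> Result<(T, u32), FetchError> {
    let mut last_err = FetchError::Other("no page sizes configured".to_string());
    for &size in PAGE_SIZES.iter() {
        match fetch(size) {
            Ok(page) => return Ok((page, size)),
            // A complexity rejection steps down to the next smaller size.
            Err(FetchError::TooComplex) => last_err = FetchError::TooComplex,
            // Every other error is returned immediately.
            Err(other) => return Err(other),
        }
    }
    Err(last_err)
}
```

Returning the size that succeeded lets the caller reuse it for later pages instead of restarting from 100 each time.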

Also adds `WorkItemStatus` type to `gitlab::types` with name, category,
color, and icon_name fields (all optional except name) with comprehensive
deserialization tests covering all system statuses (TO_DO, IN_PROGRESS,
DONE, CANCELED) and edge cases (null category, unknown future values).

The `GitLabClient` gains a `graphql_client()` factory method for
ergonomic access from the ingestion pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:08:53 -05:00
179 changed files with 60572 additions and 10067 deletions

File diff suppressed because one or more lines are too long

View File

@@ -1 +1 @@
-bd-3qn6
+bd-1sc6

View File

@@ -1,6 +1,11 @@
#!/bin/bash
# Ultimate Bug Scanner - Claude Code Hook
# Runs on every file save for UBS-supported languages (JS/TS, Python, C/C++, Rust, Go, Java, Ruby)
+# Claude Code hooks receive context as JSON on stdin.
+INPUT=$(cat)
+FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
+CWD=$(echo "$INPUT" | jq -r '.cwd // empty')
if [[ "$FILE_PATH" =~ \.(js|jsx|ts|tsx|mjs|cjs|py|pyw|pyi|c|cc|cpp|cxx|h|hh|hpp|hxx|rs|go|java|rb)$ ]]; then
echo "🔬 Running bug scanner..."
@@ -8,5 +13,5 @@ if [[ "$FILE_PATH" =~ \.(js|jsx|ts|tsx|mjs|cjs|py|pyw|pyi|c|cc|cpp|cxx|h|hh|hpp|
echo "⚠️ 'ubs' not found in PATH; install it before using this hook." >&2
exit 0
fi
-ubs "${PROJECT_DIR}" --ci 2>&1 | head -50
+ubs "$FILE_PATH" --ci 2>&1 | head -50
fi

99
.claude/plan.md Normal file

@@ -0,0 +1,99 @@
# Plan: Add Colors to Sync Command Output
## Current State
The sync output has three layers, each needing color treatment:
### Layer 1: Stage Lines (during sync)
```
✓ Issues 10 issues from 2 projects 4.2s
✓ Status 3 statuses updated · 5 seen 4.2s
vs/typescript-code 2 issues · 1 statuses updated
✓ MRs 5 merge requests from 2 projects 12.3s
vs/python-code 3 MRs · 10 discussions
✓ Docs 1,200 documents generated 8.1s
✓ Embed 3,400 chunks embedded 45.2s
```
**What's uncolored:** icons, labels, numbers, elapsed times, sub-row project paths, failure counts in parentheses.
### Layer 2: Summary (after sync)
```
Synced 10 issues and 5 MRs in 42.3s
120 discussions · 45 events · 12 diffs · 3 statuses updated
1,200 docs regenerated · 3,400 embedded
```
**What's already colored:** headline ("Synced" = green bold, "Sync completed with issues" = warning bold), issue/MR counts (bold), error line (red). Detail lines are all dim.
### Layer 3: Timing breakdown (`-t` flag)
```
── Timing ──────────────────────
issues .............. 4.2s
merge_requests ...... 12.3s
```
**What's already colored:** dots (dim), time (bold), errors (red), rate limits (warning).
---
## Color Plan
Using only existing `Theme` methods — no new colors needed.
### Stage Lines (`format_stage_line` + callers in sync.rs)
| Element | Current | Proposed | Theme method |
|---------|---------|----------|-------------|
| Icon (✓/⚠) | plain | green for success, yellow for warning | `Theme::success()` / `Theme::warning()` |
| Label ("Issues", "MRs", etc.) | plain | bold | `Theme::bold()` |
| Numbers in summary text | plain | bold | `Theme::bold()` (just the count) |
| Elapsed time | plain | muted gray | `Theme::timing()` |
| Failure text in parens | plain | warning/error color | `Theme::warning()` |
### Sub-rows (project breakdown lines)
| Element | Current | Proposed |
|---------|---------|----------|
| Project path | dim | `Theme::muted()` (slightly brighter than dim) |
| Counts (numbers only) | dim | `Theme::dim()` but numbers in normal weight |
| Error/failure counts | dim | `Theme::warning()` |
| Middle dots | dim | keep dim (they're separators, should recede) |
### Summary (`print_sync`)
| Element | Current | Proposed |
|---------|---------|----------|
| Issue/MR counts in headline | bold only | `Theme::info()` + bold (cyan numbers pop) |
| Time in headline | plain | `Theme::timing()` |
| Detail line numbers | all dim | numbers in `Theme::info()`, rest stays dim |
| Doc line numbers | all dim | numbers in `Theme::info()`, rest stays dim |
| "Already up to date" time | plain | `Theme::timing()` |
---
## Files to Change
1. **`src/cli/progress.rs`** — `format_stage_line()`: apply color to icon, bold to label, `Theme::timing()` to elapsed
2. **`src/cli/commands/sync.rs`** —
- Pass colored icons to `format_stage_line` / `emit_stage_line` / `emit_stage_block`
- Color failure text in `append_failures()`
- Color numbers and time in `print_sync()`
- Color error/failure counts in sub-row functions (`issue_sub_rows`, `mr_sub_rows`, `status_sub_rows`)
## Approach
- `format_stage_line` already receives the icon string — color it before passing
- Add a `color_icon` helper that applies success/warning color to the icon glyph
- Bold the label in `format_stage_line`
- Apply `Theme::timing()` to elapsed in `format_stage_line`
- In `append_failures`, wrap failure text in `Theme::warning()`
- In `print_sync`, wrap count numbers with `Theme::info().bold()`
- In sub-row functions, apply `Theme::warning()` to error/failure parts only (keep rest dim)
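A rough sketch of the approach above, for orientation only. The real `Theme` API is not shown in this plan, so raw ANSI escape constants stand in for `Theme::success()`, `Theme::warning()`, `Theme::bold()`, and `Theme::timing()`, and the signatures below are assumptions:
```rust
// Assumed stand-ins for the Theme methods; the real palette may differ.
const GREEN: &str = "\x1b[32m";
const YELLOW: &str = "\x1b[33m";
const BOLD: &str = "\x1b[1m";
const DIM: &str = "\x1b[2m";
const RESET: &str = "\x1b[0m";

/// Colors the icon glyph: yellow when the stage had failures, green otherwise.
fn color_icon(icon: &str, has_failures: bool) -> String {
    let color = if has_failures { YELLOW } else { GREEN };
    format!("{}{}{}", color, icon, RESET)
}

/// Hypothetical shape of a stage line: pre-colored icon, bold label,
/// plain summary text, and a muted elapsed time.
fn format_stage_line(icon: &str, label: &str, summary: &str, elapsed_secs: f64) -> String {
    format!(
        "{} {}{}{}  {}  {}{:.1}s{}",
        icon, BOLD, label, RESET, summary, DIM, elapsed_secs, RESET
    )
}
```
Coloring the icon before it reaches `format_stage_line` keeps that function unaware of success or failure state, which matches the first bullet of the approach.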
## Non-goals
- No changes to robot mode (JSON output)
- No changes to dry-run output (already reasonably colored)
- No new Theme colors — use existing palette
- No changes to timing breakdown (already colored)

21
.github/workflows/roam.yml vendored Normal file

@@ -0,0 +1,21 @@
name: Roam Code Analysis
on:
pull_request:
branches: [main, master]
permissions:
contents: read
pull-requests: write
jobs:
roam:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- run: pip install roam-code
- run: roam index
- run: roam fitness
- run: roam pr-risk --json

3
.gitignore vendored

@@ -41,6 +41,9 @@ lore.config.json
*.db-shm
# Mock seed data
tools/mock-seed/
# Added by cargo
/target

11
.roam/fitness.yaml Normal file

@@ -0,0 +1,11 @@
rules:
- name: No circular imports in core
type: dependency
source: "src/**"
forbidden_target: "tests/**"
reason: "Production code should not import test modules"
- name: Complexity threshold
type: metric
metric: cognitive_complexity
threshold: 30
reason: "Functions above 30 cognitive complexity need refactoring"

View File

@@ -14,6 +14,14 @@ If I tell you to do something, even if it goes against what follows below, YOU M
---
## Version Control: jj-First (CRITICAL)
**ALWAYS prefer jj (Jujutsu) over git for all VCS operations.** This is a colocated repo with both `.jj/` and `.git/`. When instructed to use git by anything — even later in this file — use the best jj replacement commands instead. Only fall back to raw `git` for things jj cannot do (hooks, LFS, submodules, `gh` CLI interop).
See `~/.claude/rules/jj-vcs/` for the full command reference, translation table, revsets, patterns, and recovery recipes.
---
## Irreversible Git & Filesystem Actions — DO NOT EVER BREAK GLASS
> **Note:** Treat destructive commands as break-glass. If there's any doubt, stop and ask.
@@ -316,7 +324,7 @@ bv --robot-insights | jq '.Cycles' # Circular deps (must
```bash
ubs file.rs file2.rs # Specific files (< 1s) — USE THIS
-ubs $(git diff --name-only --cached) # Staged files — before commit
+ubs $(jj diff --name-only) # Changed files — before commit
ubs --only=rust,toml src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs . # Whole project (ignores target/, Cargo.lock)
@@ -428,9 +436,9 @@ Returns structured results with file paths, line ranges, and extracted code snip
## Beads Workflow Integration
-This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in git.
+This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in version control.
-**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.
+**Note:** `br` is non-invasive—it never executes VCS commands directly. You must commit manually after `br sync --flush-only`.
### Essential Commands
@@ -446,7 +454,7 @@ br create --title="..." --type=task --priority=2
br update <id> --status=in_progress
br close <id> --reason="Completed"
br close <id1> <id2> # Close multiple issues at once
-br sync --flush-only # Export to JSONL (then manually: git add .beads/ && git commit)
+br sync --flush-only # Export to JSONL (then: jj commit -m "Update beads")
```
### Workflow Pattern
@@ -466,15 +474,14 @@ br sync --flush-only # Export to JSONL (then manually: git add .beads/ && git c
### Session Protocol
-**Before ending any session, run this checklist:**
+**Before ending any session, run this checklist (solo/lead only — workers skip VCS):**
```bash
-git status # Check what changed
-git add <files> # Stage code changes
-br sync --flush-only # Export beads to JSONL
-git add .beads/ # Stage beads changes
-git commit -m "..." # Commit code and beads
-git push # Push to remote
+jj status # Check what changed
+br sync --flush-only # Export beads to JSONL
+jj commit -m "..." # Commit code and beads (jj auto-tracks all changes)
+jj bookmark set <name> -r @- # Point bookmark at committed work
+jj git push -b <name> # Push to remote
```
### Best Practices
@@ -483,13 +490,15 @@ git push # Push to remote
- Update status as you work (in_progress → closed)
- Create new issues with `br create` when you discover tasks
- Use descriptive titles and set appropriate priority/type
-- Always run `br sync --flush-only` then commit .beads/ before ending session
+- Always run `br sync --flush-only` then commit before ending session (jj auto-tracks .beads/)
<!-- end-bv-agent-instructions -->
## Landing the Plane (Session Completion)
-**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.
+**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until push succeeds.
+**WHO RUNS THIS:** Solo agents run it themselves. In multi-agent sessions, ONLY the team lead runs this. Workers skip VCS entirely.
**MANDATORY WORKFLOW:**
@@ -498,19 +507,20 @@ git push # Push to remote
3. **Update issue status** - Close finished work, update in-progress items
4. **PUSH TO REMOTE** - This is MANDATORY:
```bash
-git pull --rebase
-br sync --flush-only
-git add .beads/
-git commit -m "Update beads"
-git push
-git status # MUST show "up to date with origin"
+jj git fetch # Get latest remote state
+jj rebase -d trunk() # Rebase onto latest trunk if needed
+br sync --flush-only # Export beads to JSONL
+jj commit -m "Update beads" # Commit (jj auto-tracks .beads/ changes)
+jj bookmark set <name> -r @- # Point bookmark at committed work
+jj git push -b <name> # Push to remote
+jj log -r '<name>' # Verify bookmark position
```
-5. **Clean up** - Clear stashes, prune remote branches
+5. **Clean up** - Abandon empty orphan changes if any (`jj abandon <rev>`)
6. **Verify** - All changes committed AND pushed
7. **Hand off** - Provide context for next session
**CRITICAL RULES:**
-- Work is NOT complete until `git push` succeeds
+- Work is NOT complete until `jj git push` succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds
@@ -618,6 +628,9 @@ LORE_ROBOT=1 lore issues
lore --robot issues -n 10
lore --robot mrs -s opened
+# Filter issues by work item status (case-insensitive)
+lore --robot issues --status "In progress"
# List with field selection (reduces token usage ~60%)
lore --robot issues --fields minimal
lore --robot mrs --fields iid,title,state,draft
@@ -741,6 +754,21 @@ lore -J mrs --fields iid,title,state,draft,labels # Custom field list
- Use `lore robot-docs` for response schema discovery
- The `-p` flag supports fuzzy project matching (suffix and substring)
---
## Read/Write Split: lore vs glab
| Operation | Tool | Why |
|-----------|------|-----|
| List issues/MRs | lore | Richer: includes status, discussions, closing MRs |
| View issue/MR detail | lore | Pre-joined discussions, work-item status |
| Search across entities | lore | FTS5 + vector hybrid search |
| Expert/workload analysis | lore | who command — no glab equivalent |
| Timeline reconstruction | lore | Chronological narrative — no glab equivalent |
| Create/update/close | glab | Write operations |
| Approve/merge MR | glab | Write operations |
| CI/CD pipelines | glab | Not in lore scope |
````markdown
## UBS Quick Reference for AI Agents

742
AGENTS.md.backup Normal file

@@ -0,0 +1,742 @@
# AGENTS.md
## RULE 0 - THE FUNDAMENTAL OVERRIDE PREROGATIVE
If I tell you to do something, even if it goes against what follows below, YOU MUST LISTEN TO ME. I AM IN CHARGE, NOT YOU.
---
## RULE NUMBER 1: NO FILE DELETION
**YOU ARE NEVER ALLOWED TO DELETE A FILE WITHOUT EXPRESS PERMISSION.** Even a new file that you yourself created, such as a test code file. You have a horrible track record of deleting critically important files or otherwise throwing away tons of expensive work. As a result, you have permanently lost any and all rights to determine that a file or folder should be deleted.
**YOU MUST ALWAYS ASK AND RECEIVE CLEAR, WRITTEN PERMISSION BEFORE EVER DELETING A FILE OR FOLDER OF ANY KIND.**
---
## Irreversible Git & Filesystem Actions — DO NOT EVER BREAK GLASS
> **Note:** Treat destructive commands as break-glass. If there's any doubt, stop and ask.
1. **Absolutely forbidden commands:** `git reset --hard`, `git clean -fd`, `rm -rf`, or any command that can delete or overwrite code/data must never be run unless the user explicitly provides the exact command and states, in the same message, that they understand and want the irreversible consequences.
2. **No guessing:** If there is any uncertainty about what a command might delete or overwrite, stop immediately and ask the user for specific approval. "I think it's safe" is never acceptable.
3. **Safer alternatives first:** When cleanup or rollbacks are needed, request permission to use non-destructive options (`git status`, `git diff`, `git stash`, copying to backups) before ever considering a destructive command.
4. **Mandatory explicit plan:** Even after explicit user authorization, restate the command verbatim, list exactly what will be affected, and wait for a confirmation that your understanding is correct. Only then may you execute it—if anything remains ambiguous, refuse and escalate.
5. **Document the confirmation:** When running any approved destructive command, record (in the session notes / final response) the exact user text that authorized it, the command actually run, and the execution time. If that record is absent, the operation did not happen.
---
## Toolchain: Rust & Cargo
We only use **Cargo** in this project, NEVER any other package manager.
- **Edition/toolchain:** Follow `rust-toolchain.toml` (if present). Do not assume stable vs nightly.
- **Dependencies:** Explicit versions for stability; keep the set minimal.
- **Configuration:** Cargo.toml only
- **Unsafe code:** Forbidden (`#![forbid(unsafe_code)]`)
When writing Rust code, reference RUST_CLI_TOOLS_BEST_PRACTICES.md
### Release Profile
Use the release profile defined in `Cargo.toml`. If you need to change it, justify the
performance/size tradeoff and how it impacts determinism and cancellation behavior.
---
## Code Editing Discipline
### No Script-Based Changes
**NEVER** run a script that processes/changes code files in this repo. Brittle regex-based transformations create far more problems than they solve.
- **Always make code changes manually**, even when there are many instances
- For many simple changes: use parallel subagents
- For subtle/complex changes: do them methodically yourself
### No File Proliferation
If you want to change something or add a feature, **revise existing code files in place**.
**NEVER** create variations like:
- `mainV2.rs`
- `main_improved.rs`
- `main_enhanced.rs`
New files are reserved for **genuinely new functionality** that makes zero sense to include in any existing file. The bar for creating new files is **incredibly high**.
---
## Backwards Compatibility
We do not care about backwards compatibility—we're in early development with no users. We want to do things the **RIGHT** way with **NO TECH DEBT**.
- Never create "compatibility shims"
- Never create wrapper functions for deprecated APIs
- Just fix the code directly
---
## Compiler Checks (CRITICAL)
**After any substantive code changes, you MUST verify no errors were introduced:**
```bash
# Check for compiler errors and warnings
cargo check --all-targets
# Check for clippy lints (pedantic + nursery are enabled)
cargo clippy --all-targets -- -D warnings
# Verify formatting
cargo fmt --check
```
If you see errors, **carefully understand and resolve each issue**. Read sufficient context to fix them the RIGHT way.
---
## Testing
### Unit & Property Tests
```bash
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
```
When adding or changing primitives, add tests that assert the core invariants:
- no task leaks
- no obligation leaks
- losers are drained after races
- region close implies quiescence
Prefer deterministic lab-runtime tests for concurrency-sensitive behavior.
---
## MCP Agent Mail — Multi-Agent Coordination
A mail-like layer that lets coding agents coordinate asynchronously via MCP tools and resources. Provides identities, inbox/outbox, searchable threads, and advisory file reservations with human-auditable artifacts in Git.
### Why It's Useful
- **Prevents conflicts:** Explicit file reservations (leases) for files/globs
- **Token-efficient:** Messages stored in per-project archive, not in context
- **Quick reads:** `resource://inbox/...`, `resource://thread/...`
### Same Repository Workflow
1. **Register identity:**
```
ensure_project(project_key=<abs-path>)
register_agent(project_key, program, model)
```
2. **Reserve files before editing:**
```
file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true)
```
3. **Communicate with threads:**
```
send_message(..., thread_id="FEAT-123")
fetch_inbox(project_key, agent_name)
acknowledge_message(project_key, agent_name, message_id)
```
4. **Quick reads:**
```
resource://inbox/{Agent}?project=<abs-path>&limit=20
resource://thread/{id}?project=<abs-path>&include_bodies=true
```
### Macros vs Granular Tools
- **Prefer macros for speed:** `macro_start_session`, `macro_prepare_thread`, `macro_file_reservation_cycle`, `macro_contact_handshake`
- **Use granular tools for control:** `register_agent`, `file_reservation_paths`, `send_message`, `fetch_inbox`, `acknowledge_message`
### Common Pitfalls
- `"from_agent not registered"`: Always `register_agent` in the correct `project_key` first
- `"FILE_RESERVATION_CONFLICT"`: Adjust patterns, wait for expiry, or use non-exclusive reservation
- **Auth errors:** If JWT+JWKS enabled, include bearer token with matching `kid`
---
## Beads (br) — Dependency-Aware Issue Tracking
Beads provides a lightweight, dependency-aware issue database and CLI (`br` / beads_rust) for selecting "ready work," setting priorities, and tracking status. It complements MCP Agent Mail's messaging and file reservations.
**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.
### Conventions
- **Single source of truth:** Beads for task status/priority/dependencies; Agent Mail for conversation and audit
- **Shared identifiers:** Use Beads issue ID (e.g., `br-123`) as Mail `thread_id` and prefix subjects with `[br-123]`
- **Reservations:** When starting a task, call `file_reservation_paths()` with the issue ID in `reason`
### Typical Agent Flow
1. **Pick ready work (Beads):**
```bash
br ready --json # Choose highest priority, no blockers
```
2. **Reserve edit surface (Mail):**
```
file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true, reason="br-123")
```
3. **Announce start (Mail):**
```
send_message(..., thread_id="br-123", subject="[br-123] Start: <title>", ack_required=true)
```
4. **Work and update:** Reply in-thread with progress
5. **Complete and release:**
```bash
br close br-123 --reason "Completed"
```
```
release_file_reservations(project_key, agent_name, paths=["src/**"])
```
Final Mail reply: `[br-123] Completed` with summary
### Mapping Cheat Sheet
| Concept | Value |
|---------|-------|
| Mail `thread_id` | `br-###` |
| Mail subject | `[br-###] ...` |
| File reservation `reason` | `br-###` |
| Commit messages | Include `br-###` for traceability |
---
## bv — Graph-Aware Triage Engine
bv is a graph-aware triage engine for Beads projects (`.beads/beads.jsonl`). It computes PageRank, betweenness, critical path, cycles, HITS, eigenvector, and k-core metrics deterministically.
**Scope boundary:** bv handles *what to work on* (triage, priority, planning). For agent-to-agent coordination (messaging, work claiming, file reservations), use MCP Agent Mail.
**CRITICAL: Use ONLY `--robot-*` flags. Bare `bv` launches an interactive TUI that blocks your session.**
### The Workflow: Start With Triage
**`bv --robot-triage` is your single entry point.** It returns:
- `quick_ref`: at-a-glance counts + top 3 picks
- `recommendations`: ranked actionable items with scores, reasons, unblock info
- `quick_wins`: low-effort high-impact items
- `blockers_to_clear`: items that unblock the most downstream work
- `project_health`: status/type/priority distributions, graph metrics
- `commands`: copy-paste shell commands for next steps
```bash
bv --robot-triage # THE MEGA-COMMAND: start here
bv --robot-next # Minimal: just the single top pick + claim command
```
### Command Reference
**Planning:**
| Command | Returns |
|---------|---------|
| `--robot-plan` | Parallel execution tracks with `unblocks` lists |
| `--robot-priority` | Priority misalignment detection with confidence |
**Graph Analysis:**
| Command | Returns |
|---------|---------|
| `--robot-insights` | Full metrics: PageRank, betweenness, HITS, eigenvector, critical path, cycles, k-core, articulation points, slack |
| `--robot-label-health` | Per-label health: `health_level`, `velocity_score`, `staleness`, `blocked_count` |
| `--robot-label-flow` | Cross-label dependency: `flow_matrix`, `dependencies`, `bottleneck_labels` |
| `--robot-label-attention [--attention-limit=N]` | Attention-ranked labels |
**History & Change Tracking:**
| Command | Returns |
|---------|---------|
| `--robot-history` | Bead-to-commit correlations |
| `--robot-diff --diff-since <ref>` | Changes since ref: new/closed/modified issues, cycles |
**Other:**
| Command | Returns |
|---------|---------|
| `--robot-burndown <sprint>` | Sprint burndown, scope changes, at-risk items |
| `--robot-forecast <id\|all>` | ETA predictions with dependency-aware scheduling |
| `--robot-alerts` | Stale issues, blocking cascades, priority mismatches |
| `--robot-suggest` | Hygiene: duplicates, missing deps, label suggestions |
| `--robot-graph [--graph-format=json\|dot\|mermaid]` | Dependency graph export |
| `--export-graph <file.html>` | Interactive HTML visualization |
### Scoping & Filtering
```bash
bv --robot-plan --label backend # Scope to label's subgraph
bv --robot-insights --as-of HEAD~30 # Historical point-in-time
bv --recipe actionable --robot-plan # Pre-filter: ready to work
bv --recipe high-impact --robot-triage # Pre-filter: top PageRank
bv --robot-triage --robot-triage-by-track # Group by parallel work streams
bv --robot-triage --robot-triage-by-label # Group by domain
```
### Understanding Robot Output
**All robot JSON includes:**
- `data_hash` — Fingerprint of source beads.jsonl
- `status` — Per-metric state: `computed|approx|timeout|skipped` + elapsed ms
- `as_of` / `as_of_commit` — Present when using `--as-of`
**Two-phase analysis:**
- **Phase 1 (instant):** degree, topo sort, density
- **Phase 2 (async, 500ms timeout):** PageRank, betweenness, HITS, eigenvector, cycles
### jq Quick Reference
```bash
bv --robot-triage | jq '.quick_ref' # At-a-glance summary
bv --robot-triage | jq '.recommendations[0]' # Top recommendation
bv --robot-plan | jq '.plan.summary.highest_impact' # Best unblock target
bv --robot-insights | jq '.status' # Check metric readiness
bv --robot-insights | jq '.Cycles' # Circular deps (must fix!)
```
---
## UBS — Ultimate Bug Scanner
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
### Commands
```bash
ubs file.rs file2.rs # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=rust,toml src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs . # Whole project (ignores target/, Cargo.lock)
```
### Output Format
```
⚠️ Category (N errors)
file.rs:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
### Fix Workflow
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
### Bug Severity
- **Critical (always fix):** Memory safety, use-after-free, data races, SQL injection
- **Important (production):** Unwrap panics, resource leaks, overflow checks
- **Contextual (judgment):** TODO/FIXME, println! debugging
---
## ast-grep vs ripgrep
**Use `ast-grep` when structure matters.** It parses code and matches AST nodes, ignoring comments/strings, and can **safely rewrite** code.
- Refactors/codemods: rename APIs, change import forms
- Policy checks: enforce patterns across a repo
- Editor/automation: LSP mode, `--json` output
**Use `ripgrep` when text is enough.** Fastest way to grep literals/regex.
- Recon: find strings, TODOs, log lines, config values
- Pre-filter: narrow candidate files before ast-grep
### Rule of Thumb
- Need correctness or **applying changes** → `ast-grep`
- Need raw speed or **hunting text** → `rg`
- Often combine: `rg` to shortlist files, then `ast-grep` to match/modify
### Rust Examples
```bash
# Find structured code (ignores comments)
ast-grep run -l Rust -p 'fn $NAME($$$ARGS) -> $RET { $$$BODY }'
# Find all unwrap() calls
ast-grep run -l Rust -p '$EXPR.unwrap()'
# Quick textual hunt
rg -n 'println!' -t rust
# Combine speed + precision
rg -l -t rust 'unwrap\(' | xargs ast-grep run -l Rust -p '$X.unwrap()' --json
```
---
## Morph Warp Grep — AI-Powered Code Search
**Use `mcp__morph-mcp__warp_grep` for exploratory "how does X work?" questions.** An AI agent expands your query, greps the codebase, reads relevant files, and returns precise line ranges with full context.
**Use `ripgrep` for targeted searches.** When you know exactly what you're looking for.
**Use `ast-grep` for structural patterns.** When you need AST precision for matching/rewriting.
### When to Use What
| Scenario | Tool | Why |
|----------|------|-----|
| "How is pattern matching implemented?" | `warp_grep` | Exploratory; don't know where to start |
| "Where is the quick reject filter?" | `warp_grep` | Need to understand architecture |
| "Find all uses of `Regex::new`" | `ripgrep` | Targeted literal search |
| "Find files with `println!`" | `ripgrep` | Simple pattern |
| "Replace all `unwrap()` with `expect()`" | `ast-grep` | Structural refactor |
### warp_grep Usage
```
mcp__morph-mcp__warp_grep(
repoPath: "/path/to/dcg",
query: "How does the safe pattern whitelist work?"
)
```
Returns structured results with file paths, line ranges, and extracted code snippets.
### Anti-Patterns
- **Don't** use `warp_grep` to find a specific function name → use `ripgrep`
- **Don't** use `ripgrep` to understand "how does X work" → wastes time with manual reads
- **Don't** use `ripgrep` for codemods → risks collateral edits
<!-- bv-agent-instructions-v1 -->
---
## Beads Workflow Integration
This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in git.
**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.
### Essential Commands
```bash
# View issues (launches TUI - avoid in automated sessions)
bv
# CLI commands for agents (use these instead)
br ready # Show issues ready to work (no blockers)
br list --status=open # All open issues
br show <id> # Full issue details with dependencies
br create --title="..." --type=task --priority=2
br update <id> --status=in_progress
br close <id> --reason="Completed"
br close <id1> <id2> # Close multiple issues at once
br sync --flush-only # Export to JSONL (then manually: git add .beads/ && git commit)
```
### Workflow Pattern
1. **Start**: Run `br ready` to find actionable work
2. **Claim**: Use `br update <id> --status=in_progress`
3. **Work**: Implement the task
4. **Complete**: Use `br close <id>`
5. **Sync**: Run `br sync --flush-only`, then `git add .beads/ && git commit -m "Update beads"`
### Key Concepts
- **Dependencies**: Issues can block other issues. `br ready` shows only unblocked work.
- **Priority**: P0=critical, P1=high, P2=medium, P3=low, P4=backlog (use numbers, not words)
- **Types**: task, bug, feature, epic, question, docs
- **Blocking**: `br dep add <issue> <depends-on>` to add dependencies
### Session Protocol
**Before ending any session, run this checklist:**
```bash
git status # Check what changed
git add <files> # Stage code changes
br sync --flush-only # Export beads to JSONL
git add .beads/ # Stage beads changes
git commit -m "..." # Commit code and beads
git push # Push to remote
```
### Best Practices
- Check `br ready` at session start to find available work
- Update status as you work (in_progress → closed)
- Create new issues with `br create` when you discover tasks
- Use descriptive titles and set appropriate priority/type
- Always run `br sync --flush-only` then commit .beads/ before ending session
<!-- end-bv-agent-instructions -->
## Landing the Plane (Session Completion)
**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.
**MANDATORY WORKFLOW:**
1. **File issues for remaining work** - Create issues for anything that needs follow-up
2. **Run quality gates** (if code changed) - Tests, linters, builds
3. **Update issue status** - Close finished work, update in-progress items
4. **PUSH TO REMOTE** - This is MANDATORY:
```bash
git pull --rebase
br sync --flush-only
git add .beads/
git commit -m "Update beads"
git push
git status # MUST show "up to date with origin"
```
5. **Clean up** - Clear stashes, prune remote branches
6. **Verify** - All changes committed AND pushed
7. **Hand off** - Provide context for next session
**CRITICAL RULES:**
- Work is NOT complete until `git push` succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds
---
## cass — Cross-Agent Session Search
`cass` indexes prior agent conversations (Claude Code, Codex, Cursor, Gemini, ChatGPT, etc.) so we can reuse solved problems.
**Rules:** Never run bare `cass` (TUI). Always use `--robot` or `--json`.
### Examples
```bash
cass health
cass search "async runtime" --robot --limit 5
cass view /path/to/session.jsonl -n 42 --json
cass expand /path/to/session.jsonl -n 42 -C 3 --json
cass capabilities --json
cass robot-docs guide
```
### Tips
- Use `--fields minimal` for lean output
- Filter by agent with `--agent`
- Use `--days N` to limit to recent history
stdout is data-only, stderr is diagnostics; exit code 0 means success.
Treat cass as a way to avoid re-solving problems other agents already handled.
---
## Note for Codex/GPT-5.2
You constantly bother me and stop working with concerned questions that look similar to this:
```
Unexpected changes (need guidance)
- Working tree still shows edits I did not make in Cargo.toml, Cargo.lock, src/runtime.rs, src/scope.rs. Please advise whether to keep/commit/revert these before any further work. I did not touch them.
Next steps (pick one)
1. Decide how to handle the unrelated modified files above so we can resume cleanly.
```
NEVER EVER DO THAT AGAIN. The answer is literally ALWAYS the same: those are changes created by the potentially dozens of other agents working on the project at the same time. This is not only a common occurrence, it happens multiple times PER MINUTE. The way to deal with it is simple: you NEVER, under ANY CIRCUMSTANCE, stash, revert, overwrite, or otherwise disturb in ANY way the work of other agents. Just treat those changes identically to changes that you yourself made. Just fool yourself into thinking YOU made the changes and simply don't recall it for some reason.
---
## Note on Built-in TODO Functionality
Also, if I ask you to explicitly use your built-in TODO functionality, don't complain about this and say you need to use beads. You can use built-in TODOs if I tell you specifically to do so. Always comply with such orders.
## TDD Requirements
Test-first development is mandatory:
1. **RED** - Write failing test first
2. **GREEN** - Minimal implementation to pass
3. **REFACTOR** - Clean up while green
## Key Patterns
Find the simplest solution that meets all acceptance criteria.
Use third-party libraries whenever there's a well-maintained, active, and widely adopted solution (for example, date-fns for TS date math).
Build extensible pieces of logic that can easily be integrated with other pieces.
DRY principles should be loosely held.
Architecture MUST be clear and well thought-out. Ask the user for clarification whenever ambiguity is discovered around architecture, or you think a better approach than planned exists.
---
## Third-Party Library Usage
If you aren't 100% sure how to use a third-party library, **SEARCH ONLINE** to find the latest documentation and mid-2025 best practices.
---
## Gitlore Robot Mode
The `lore` CLI has a robot mode optimized for AI agent consumption with compact JSON output, structured errors with machine-actionable recovery steps, meaningful exit codes, response timing metadata, field selection for token efficiency, and TTY auto-detection.
### Activation
```bash
# Explicit flag
lore --robot issues -n 10
# JSON shorthand (-J)
lore -J issues -n 10
# Auto-detection (when stdout is not a TTY)
lore issues | jq .
# Environment variable
LORE_ROBOT=1 lore issues
```
### Robot Mode Commands
```bash
# List issues/MRs with JSON output
lore --robot issues -n 10
lore --robot mrs -s opened
# List with field selection (reduces token usage ~60%)
lore --robot issues --fields minimal
lore --robot mrs --fields iid,title,state,draft
# Show detailed entity info
lore --robot issues 123
lore --robot mrs 456 -p group/repo
# Count entities
lore --robot count issues
lore --robot count discussions --for mr
# Search indexed documents
lore --robot search "authentication bug"
# Check sync status
lore --robot status
# Run full sync pipeline
lore --robot sync
# Run sync without resource events
lore --robot sync --no-events
# Run ingestion only
lore --robot ingest issues
# Check environment health
lore --robot doctor
# Document and index statistics
lore --robot stats
# Quick health pre-flight check (exit 0 = healthy, 19 = unhealthy)
lore --robot health
# Generate searchable documents from ingested data
lore --robot generate-docs
# Generate vector embeddings via Ollama
lore --robot embed
# Agent self-discovery manifest (all commands, flags, exit codes, response schemas)
lore robot-docs
# Version information
lore --robot version
```
### Response Format
All commands return compact JSON with a uniform envelope and timing metadata:
```json
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```
Errors return structured JSON to stderr with machine-actionable recovery steps:
```json
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}
```
The `actions` array contains executable shell commands for automated recovery. It is omitted when empty.
### Field Selection
The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response:
```bash
lore -J issues --fields minimal # Preset: iid, title, state, updated_at_iso
lore -J mrs --fields iid,title,state,draft,labels # Custom field list
```
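To make the preset concrete, here is a minimal Rust sketch of how a `--fields` value could be expanded and applied to one record. It is not `lore`'s implementation: `resolve_fields` and `project` are hypothetical helpers, and the record is modeled as plain name/value pairs so the example needs only the standard library.

```rust
/// Expand a `--fields` value into the field names to keep.
/// `minimal` expands to the four fields listed above; anything else is
/// treated as a comma-separated custom list. (Hypothetical helper.)
pub fn resolve_fields(spec: &str) -> Vec<String> {
    let spec = spec.trim();
    if spec == "minimal" {
        return ["iid", "title", "state", "updated_at_iso"]
            .iter()
            .map(|name| name.to_string())
            .collect();
    }
    spec.split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(String::from)
        .collect()
}

/// Keep only the selected (name, value) pairs, preserving the record's order.
pub fn project<'a>(
    record: &[(&'a str, &'a str)],
    fields: &[String],
) -> Vec<(&'a str, &'a str)> {
    record
        .iter()
        .copied()
        .filter(|(name, _)| fields.iter().any(|f| f.as_str() == *name))
        .collect()
}
```

Dropping large fields such as descriptions is presumably where most of the token savings come from.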
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Internal error / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 6 | Resource not found |
| 7 | Rate limited |
| 8 | Network error |
| 9 | Database locked |
| 10 | Database error |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
| 14 | Ollama unavailable |
| 15 | Ollama model not found |
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 19 | Health check failed |
| 20 | Config not found |
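An agent-side wrapper might turn the table above into a lookup like the sketch below. The descriptions are copied from the table. The retry classification is an assumption of this sketch, not something `lore` documents, and both function names are hypothetical.

```rust
/// Human-readable meaning of a `lore` exit code, per the table above.
pub fn describe_exit_code(code: i32) -> &'static str {
    match code {
        0 => "Success",
        1 => "Internal error / not implemented",
        2 => "Usage error (invalid flags or arguments)",
        3 => "Config invalid",
        4 => "Token not set",
        5 => "GitLab auth failed",
        6 => "Resource not found",
        7 => "Rate limited",
        8 => "Network error",
        9 => "Database locked",
        10 => "Database error",
        11 => "Migration failed",
        12 => "I/O error",
        13 => "Transform error",
        14 => "Ollama unavailable",
        15 => "Ollama model not found",
        16 => "Embedding failed",
        17 => "Not found (entity does not exist)",
        18 => "Ambiguous match (use `-p` to specify project)",
        19 => "Health check failed",
        20 => "Config not found",
        _ => "Unknown exit code",
    }
}

/// ASSUMPTION (not stated in the table): rate limiting (7), network errors (8),
/// and a locked database (9) are treated as conditions that may clear on retry.
pub fn may_succeed_on_retry(code: i32) -> bool {
    matches!(code, 7 | 8 | 9)
}
```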
### Configuration Precedence
1. CLI flags (highest priority)
2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`)
3. Config file (`~/.config/lore/config.json`)
4. Built-in defaults (lowest priority)
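The precedence order can be expressed as a chain of fallbacks. The function below is a sketch with a hypothetical name and signature; it only shows the ordering, not how `lore` actually loads each layer.

```rust
/// Resolve one setting using the precedence listed above:
/// CLI flag, then environment variable, then config file, then default.
/// (Hypothetical helper; each layer is passed in already looked up.)
pub fn resolve_setting(
    cli_flag: Option<&str>,
    env_var: Option<&str>,
    config_file: Option<&str>,
    default: &str,
) -> String {
    cli_flag
        .or(env_var)
        .or(config_file)
        .unwrap_or(default)
        .to_string()
}
```

In a real caller the environment layer could come from something like `std::env::var("LORE_CONFIG_PATH").ok()`.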
### Best Practices
- Use `lore --robot` or `lore -J` for all agent interactions
- Check exit codes for error handling
- Parse JSON errors from stderr; use `actions` array for automated recovery
- Use `--fields minimal` to reduce token usage (~60% fewer tokens)
- Use `-n` / `--limit` to control response size
- Use `-q` / `--quiet` to suppress progress bars and non-essential output
- Use `--color never` in non-TTY automation for ANSI-free output
- Use `-v` / `-vv` / `-vvv` for increasing verbosity (debug/trace logging)
- Use `--log-format json` for machine-readable log output to stderr
- TTY detection handles piped commands automatically
- Use `lore --robot health` as a fast pre-flight check before queries
- Use `lore robot-docs` for response schema discovery
- The `-p` flag supports fuzzy project matching (suffix and substring)
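
As an illustration of the suffix and substring matching mentioned in the last bullet, here is a minimal Rust sketch. The rule order (exact, then suffix, then substring) and the ambiguity handling are assumptions of this sketch; the actual resolution logic in `lore` may differ, and all names here are hypothetical.

```rust
/// Outcome of resolving a `-p` value against known project paths.
#[derive(Debug, PartialEq)]
pub enum ProjectMatch<'a> {
    Unique(&'a str),
    Ambiguous(Vec<&'a str>),
    NotFound,
}

fn is_suffix(path: &str, query: &str) -> bool {
    path.ends_with(query)
}

fn is_substring(path: &str, query: &str) -> bool {
    path.contains(query)
}

/// ASSUMED order: an exact match wins, then suffix matches are tried,
/// then substring matches. The first rule with any hit decides the result.
pub fn match_project<'a>(query: &str, known: &[&'a str]) -> ProjectMatch<'a> {
    if let Some(exact) = known.iter().copied().find(|path| *path == query) {
        return ProjectMatch::Unique(exact);
    }
    let rules: [fn(&str, &str) -> bool; 2] = [is_suffix, is_substring];
    for rule in rules.iter() {
        let hits: Vec<&'a str> = known
            .iter()
            .copied()
            .filter(|path| rule(path, query))
            .collect();
        match hits.len() {
            0 => continue,
            1 => return ProjectMatch::Unique(hits[0]),
            _ => return ProjectMatch::Ambiguous(hits),
        }
    }
    ProjectMatch::NotFound
}
```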

Cargo.lock generated

@@ -169,6 +169,23 @@ version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
[[package]]
name = "charmed-lipgloss"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "45e10db01f5eaea11d98ca5c5cffd8cc4add7ac56d0128d91ba1f2a3757b6c5a"
dependencies = [
"bitflags",
"colored",
"crossterm",
"serde",
"serde_json",
"thiserror",
"toml",
"tracing",
"unicode-width 0.1.14",
]
[[package]]
name = "chrono"
version = "0.4.43"
@@ -239,14 +256,13 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b05b61dc5112cbb17e4b6cd61790d9845d13888356391624cbe7e41efeac1e75"
[[package]]
-name = "comfy-table"
-version = "7.2.2"
+name = "colored"
+version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "958c5d6ecf1f214b4c2bbbbf6ab9523a864bd136dcf71a7e8904799acfe1ad47"
+checksum = "117725a109d387c937a1533ce01b450cbde6b88abceea8473c4d7a85853cda3c"
dependencies = [
-"crossterm",
-"unicode-segmentation",
-"unicode-width",
+"lazy_static",
+"windows-sys 0.52.0",
]
[[package]]
@@ -258,10 +274,19 @@ dependencies = [
"encode_unicode",
"libc",
"once_cell",
-"unicode-width",
+"unicode-width 0.2.2",
"windows-sys 0.61.2",
]
[[package]]
name = "convert_case"
version = "0.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "633458d4ef8c78b72454de2d54fd6ab2e60f9e02be22f3c6104cdc8a4e0fceb9"
dependencies = [
"unicode-segmentation",
]
[[package]]
name = "core-foundation"
version = "0.9.4"
@@ -319,9 +344,13 @@ checksum = "d8b9f2e4c67f833b660cdb0a3523065869fb35570177239812ed4c905aeff87b"
dependencies = [
"bitflags",
"crossterm_winapi",
"derive_more",
"document-features",
"mio",
"parking_lot",
"rustix",
"signal-hook",
"signal-hook-mio",
"winapi",
]
@@ -371,6 +400,28 @@ dependencies = [
"powerfmt",
]
[[package]]
name = "derive_more"
version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d751e9e49156b02b44f9c1815bcb94b984cdcc4396ecc32521c739452808b134"
dependencies = [
"derive_more-impl",
]
[[package]]
name = "derive_more-impl"
version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "799a97264921d8623a957f6c3b9011f3b5492f557bbb7a5a19b7fa6d06ba8dcb"
dependencies = [
"convert_case",
"proc-macro2",
"quote",
"rustc_version",
"syn",
]
[[package]]
name = "dialoguer"
version = "0.12.0"
@@ -976,7 +1027,7 @@ checksum = "9375e112e4b463ec1b1c6c011953545c65a30164fbab5b581df32b3abf0dcb88"
dependencies = [
"console",
"portable-atomic",
-"unicode-width",
+"unicode-width 0.2.2",
"unit-prefix",
"web-time",
]
@@ -1106,18 +1157,19 @@ checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
[[package]]
name = "lore"
-version = "0.5.2"
+version = "0.8.3"
dependencies = [
"async-stream",
+"charmed-lipgloss",
"chrono",
"clap",
"clap_complete",
-"comfy-table",
"console",
"dialoguer",
"dirs",
"flate2",
"futures",
+"httpdate",
"indicatif",
"libc",
"open",
@@ -1180,6 +1232,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a69bcab0ad47271a0234d9422b131806bf3968021e5dc9328caf2d4cd58557fc"
dependencies = [
"libc",
"log",
"wasi",
"windows-sys 0.61.2",
]
@@ -1573,6 +1626,15 @@ dependencies = [
"sqlite-wasm-rs",
]
[[package]]
name = "rustc_version"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cfcb3a22ef46e85b45de6ee7e79d063319ebb6594faafcf1c225ea92ab6e9b92"
dependencies = [
"semver",
]
[[package]]
name = "rustix"
version = "1.1.3"
@@ -1669,6 +1731,12 @@ dependencies = [
"libc",
]
[[package]]
name = "semver"
version = "1.0.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d767eb0aabc880b29956c35734170f26ed551a859dbd361d140cdbeca61ab1e2"
[[package]]
name = "serde"
version = "1.0.228"
@@ -1712,6 +1780,15 @@ dependencies = [
"zmij",
]
[[package]]
name = "serde_spanned"
version = "0.6.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf41e0cfaf7226dca15e8197172c295a782857fcb97fad1808a166870dee75a3"
dependencies = [
"serde",
]
[[package]]
name = "serde_urlencoded"
version = "0.7.1"
@@ -1756,6 +1833,27 @@ version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
[[package]]
name = "signal-hook"
version = "0.3.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d881a16cf4426aa584979d30bd82cb33429027e42122b169753d6ef1085ed6e2"
dependencies = [
"libc",
"signal-hook-registry",
]
[[package]]
name = "signal-hook-mio"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b75a19a7a740b25bc7944bdee6172368f988763b744e3d4dfe753f6b4ece40cc"
dependencies = [
"libc",
"mio",
"signal-hook",
]
[[package]]
name = "signal-hook-registry"
version = "1.4.5"
@@ -2027,6 +2125,47 @@ dependencies = [
"tokio",
]
[[package]]
name = "toml"
version = "0.8.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dc1beb996b9d83529a9e75c17a1686767d148d70663143c7854d8b4a09ced362"
dependencies = [
"serde",
"serde_spanned",
"toml_datetime",
"toml_edit",
]
[[package]]
name = "toml_datetime"
version = "0.6.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "22cddaf88f4fbc13c51aebbf5f8eceb5c7c5a9da2ac40a13519eb5b0a0e8f11c"
dependencies = [
"serde",
]
[[package]]
name = "toml_edit"
version = "0.22.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "41fe8c660ae4257887cf66394862d21dbca4a6ddd26f04a3560410406a2f819a"
dependencies = [
"indexmap",
"serde",
"serde_spanned",
"toml_datetime",
"toml_write",
"winnow",
]
[[package]]
name = "toml_write"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5d99f8c9a7727884afe522e9bd5edbfc91a3312b36a77b5fb8926e4c31a41801"
[[package]]
name = "tower"
version = "0.5.3"
@@ -2182,6 +2321,12 @@ version = "1.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f6ccf251212114b54433ec949fd6a7841275f9ada20dddd2f29e9ceea4501493"
[[package]]
name = "unicode-width"
version = "0.1.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7dd6e30e90baa6f72411720665d41d89b9a3d039dc45b8faea1ddd07f617f6af"
[[package]]
name = "unicode-width"
version = "0.2.2"
@@ -2610,6 +2755,15 @@ version = "0.53.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d6bbff5f0aada427a1e5a6da5f1f98158182f26556f345ac9e04d36d0ebed650"
[[package]]
name = "winnow"
version = "0.7.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5a5364e9d77fcdeeaa6062ced926ee3381faa2ee02d3eb83a5c27a8825540829"
dependencies = [
"memchr",
]
[[package]]
name = "wiremock"
version = "0.6.5"


@@ -1,6 +1,6 @@
[package]
name = "lore"
-version = "0.5.2"
+version = "0.8.3"
edition = "2024"
description = "Gitlore - Local GitLab data management with semantic search"
authors = ["Taylor Eernisse"]
@@ -25,7 +25,7 @@ clap_complete = "4"
dialoguer = "0.12"
console = "0.16"
indicatif = "0.18"
-comfy-table = "7"
+lipgloss = { package = "charmed-lipgloss", version = "0.1", default-features = false, features = ["native"] }
open = "5"
# HTTP
@@ -45,6 +45,7 @@ rand = "0.8"
sha2 = "0.10"
flate2 = "1"
chrono = { version = "0.4", features = ["serde"] }
+httpdate = "1"
uuid = { version = "1", features = ["v4"] }
regex = "1"
strsim = "0.11"


@@ -0,0 +1,425 @@
# Proposed Code File Reorganization Plan
## Executive Summary
The codebase is 79 Rust source files / 46K lines across 7 top-level modules. Most modules (`gitlab/`, `embedding/`, `search/`, `documents/`, `ingestion/`) are well-organized. The pain points are:
1. **`core/` is a grab-bag** — 22 files mixing infrastructure, domain logic, DB operations, and an entire timeline pipeline
2. **`main.rs` is 2713 lines** — ~30 handler functions that bridge CLI args to commands
3. **`cli/mod.rs` is 949 lines** — every clap argument struct is packed into one file
4. **Giant command files**: `who.rs` (6067 lines) and `list.rs` (2931 lines) are unwieldy
This plan is organized into **three tiers** based on impact-to-risk ratio. Tier 1 changes are "no-brainers" — they reduce confusion with minimal import churn. Tier 2 changes are valuable but involve more cross-cutting import updates. Tier 3 changes are "maybe later" — they'd be nice but the juice might not be worth the squeeze right now.
---
## Current Structure (Annotated)
```
src/
├── main.rs (2713 lines) ← dispatch + ~30 handler functions + error helpers
├── lib.rs (9 lines)
├── cli/
│ ├── mod.rs (949 lines) ← ALL clap arg structs crammed here
│ ├── autocorrect.rs (945 lines)
│ ├── progress.rs (92 lines)
│ ├── robot.rs (111 lines)
│ └── commands/
│ ├── mod.rs (50 lines) — re-exports
│ ├── auth_test.rs
│ ├── count.rs (406 lines)
│ ├── doctor.rs (576 lines)
│ ├── drift.rs (642 lines)
│ ├── embed.rs
│ ├── generate_docs.rs (320 lines)
│ ├── ingest.rs (1064 lines)
│ ├── init.rs (174 lines)
│ ├── list.rs (2931 lines) ← handles issues, MRs, AND notes listing
│ ├── search.rs (418 lines)
│ ├── show.rs (1377 lines)
│ ├── stats.rs (505 lines)
│ ├── sync_status.rs (454 lines)
│ ├── sync.rs (576 lines)
│ ├── timeline.rs (488 lines)
│ └── who.rs (6067 lines) ← 5 sub-modes: expert, workload, active, overlap, reviews
├── core/
│ ├── mod.rs (25 lines)
│ ├── backoff.rs ← retry logic (used by ingestion)
│ ├── config.rs (789 lines) ← configuration types
│ ├── db.rs (970 lines) ← connection + 22 migrations
│ ├── dependent_queue.rs (330 lines) ← job queue (used by ingestion orchestrator)
│ ├── error.rs (295 lines) ← error enum + exit codes
│ ├── events_db.rs (199 lines) ← resource event upserts (used by ingestion)
│ ├── lock.rs (228 lines) ← filesystem sync lock
│ ├── logging.rs (179 lines) ← tracing filter builders
│ ├── metrics.rs (566 lines) ← tracing-based stage timing
│ ├── note_parser.rs (563 lines) ← cross-ref extraction from note bodies
│ ├── paths.rs ← config/db/log file path resolution
│ ├── payloads.rs (204 lines) ← raw JSON payload storage
│ ├── project.rs (274 lines) ← fuzzy project resolution from DB
│ ├── references.rs (551 lines) ← entity cross-reference extraction
│ ├── shutdown.rs ← graceful shutdown via tokio signal
│ ├── sync_run.rs (218 lines) ← sync run recording to DB
│ ├── time.rs ← time conversion utilities
│ ├── timeline.rs (284 lines) ← timeline types + EntityRef
│ ├── timeline_collect.rs (695 lines) ← Stage 4: collect events from DB
│ ├── timeline_expand.rs (557 lines) ← Stage 3: expand via cross-refs
│ └── timeline_seed.rs (552 lines) ← Stage 1: FTS search seeding
├── documents/ ← well-organized, 3 focused files
├── embedding/ ← well-organized, 6 focused files
├── gitlab/ ← well-organized, with transformers/ subdir
├── ingestion/ ← well-organized, 8 focused files
└── search/ ← well-organized, 5 focused files
```
---
## Tier 1: No-Brainers (Do First)
### 1.1 Extract `timeline/` from `core/`
**What:** Move the 4 timeline files into their own top-level module `src/timeline/`.
**Current location:**
- `core/timeline.rs` (284 lines) — types: `EntityRef`, `ExpandedEntityRef`, `TimelineEvent`, `TimelineEventType`, etc.
- `core/timeline_seed.rs` (552 lines) — Stage 1: FTS-based seeding
- `core/timeline_expand.rs` (557 lines) — Stage 3: cross-reference expansion
- `core/timeline_collect.rs` (695 lines) — Stage 4: event collection from DB
**New structure:**
```
src/timeline/
├── mod.rs ← types (from timeline.rs) + re-exports
├── seed.rs ← from timeline_seed.rs
├── expand.rs ← from timeline_expand.rs
└── collect.rs ← from timeline_collect.rs
```
**Rationale:** These 4 files belong to one cohesive 5-stage pipeline (SEED→HYDRATE→EXPAND→COLLECT→RENDER): the shared types plus the seed, expand, and collect stages. They have nothing to do with "core" infrastructure like `db.rs`, `config.rs`, or `error.rs`. They only import from `core::error`, `core::time`, and `search::fts`, all of which remain accessible via `crate::core::*` and `crate::search::*` after the move.
**Import changes needed:**
- `cli/commands/timeline.rs`: `use crate::core::timeline::*` → `use crate::timeline::*`, same for `timeline_seed`, `timeline_expand`, `timeline_collect`
- `core/mod.rs`: remove the 4 `pub mod timeline*` lines
- `lib.rs`: add `pub mod timeline;`
**Risk: LOW** — Only 1 consumer (`cli/commands/timeline.rs`) + internal cross-references between the 4 files.
---
### 1.2 Extract `xref/` (cross-reference extraction) from `core/`
**What:** Move `note_parser.rs` and `references.rs` into `src/xref/`.
**Current location:**
- `core/note_parser.rs` (563 lines) — parses note bodies for "mentioned in group/repo#123" patterns, persists to `note_cross_references` table
- `core/references.rs` (551 lines) — extracts entity references from state events and closing MRs, writes to `entity_references` table
**New structure:**
```
src/xref/
├── mod.rs ← re-exports
├── note_parser.rs ← from core/note_parser.rs
└── references.rs ← from core/references.rs
```
**Rationale:** These files implement a specific domain concept — extracting and persisting cross-references between issues and MRs. They are not "core infrastructure." They're consumed by `ingestion/orchestrator.rs` for the cross-reference extraction phase, and the data they produce is consumed by the timeline pipeline. Putting them in their own module makes the data flow clearer: `ingestion → xref → timeline`.
**Import changes needed:**
- `ingestion/orchestrator.rs`: `use crate::core::references::*` → `use crate::xref::references::*`
- `ingestion/orchestrator.rs`: `use crate::core::note_parser::*` (if used directly — needs verification) → `use crate::xref::*`
- `core/mod.rs`: remove `pub mod note_parser; pub mod references;`
- `lib.rs`: add `pub mod xref;`
- Internal: the files use `super::error::Result` and `super::time::now_ms` which become `crate::core::error::Result` and `crate::core::time::now_ms`
**Risk: LOW** — 2-3 consumers at most. The files already use `super::` internally which just needs updating to `crate::core::`.
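The `super::` to `crate::core::` change can be sketched in a single file with inline modules. The module names follow the plan, but the bodies are placeholders and `record_reference` is a hypothetical function that exists only to exercise the imports.

```rust
// Stand-ins for the helpers that stay in `core/` (placeholder bodies).
pub mod core {
    pub mod error {
        pub type Result<T> = std::result::Result<T, String>;
    }
    pub mod time {
        use std::time::{SystemTime, UNIX_EPOCH};

        /// Milliseconds since the Unix epoch (0 if the clock is before it).
        pub fn now_ms() -> i64 {
            SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .map(|d| d.as_millis() as i64)
                .unwrap_or(0)
        }
    }
}

// After the move: `xref` is a top-level module, so `super::` no longer
// reaches `core`. The imports switch to absolute `crate::core::` paths.
pub mod xref {
    pub mod references {
        // Before the move these were `super::error::Result` and `super::time::now_ms`.
        use crate::core::error::Result;
        use crate::core::time::now_ms;

        /// Hypothetical function: returns a label and the time it was recorded.
        pub fn record_reference(source: &str, target: &str) -> Result<(String, i64)> {
            if source.is_empty() || target.is_empty() {
                return Err("empty entity reference".to_string());
            }
            Ok((format!("{} -> {}", source, target), now_ms()))
        }
    }

    // Re-export so callers can write `crate::xref::record_reference`.
    pub use self::references::record_reference;
}
```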
---
## Tier 2: Good Improvements (Do After Tier 1)
### 2.1 Group ingestion-adjacent DB operations
**What:** Move `events_db.rs`, `dependent_queue.rs`, `payloads.rs`, and `sync_run.rs` from `core/` into `ingestion/`, since all four exist to support the ingestion pipeline even where their direct callers are CLI commands.
**Current consumers:**
- `events_db.rs` → only used by `cli/commands/count.rs` (for event counts)
- `dependent_queue.rs` → only used by `ingestion/orchestrator.rs` and `main.rs` (to release locked jobs)
- `payloads.rs` → only used by `ingestion/discussions.rs`, `ingestion/issues.rs`, `ingestion/merge_requests.rs`, `ingestion/mr_discussions.rs`
- `sync_run.rs` → only used by `cli/commands/sync.rs` and `cli/commands/sync_status.rs`
**New structure:**
```
src/ingestion/
├── (existing files...)
├── events_db.rs ← from core/events_db.rs
├── dependent_queue.rs ← from core/dependent_queue.rs
├── payloads.rs ← from core/payloads.rs
└── sync_run.rs ← from core/sync_run.rs
```
**Rationale:** All 4 files exist to support the ingestion pipeline:
- `events_db.rs` upserts resource state/label/milestone events fetched during ingestion
- `dependent_queue.rs` manages the job queue that drives incremental discussion fetching
- `payloads.rs` stores the raw JSON payloads fetched from GitLab
- `sync_run.rs` records when syncs start/finish and their metrics
When you're looking for "how does ingestion work?", you'd naturally look in `ingestion/`. Having these scattered in `core/` requires knowing the hidden dependency.
**Import changes needed:**
- `events_db.rs`: 1 consumer in `cli/commands/count.rs` changes from `crate::core::events_db` → `crate::ingestion::events_db`
- `dependent_queue.rs`: 2 consumers — `ingestion/orchestrator.rs` (becomes `super::dependent_queue`) and `main.rs`
- `payloads.rs`: 4 consumers in `ingestion/*.rs` (become `super::payloads`)
- `sync_run.rs`: 2 consumers in `cli/commands/sync.rs` and `sync_status.rs`
- Internal references change from `super::error` / `super::time` to `crate::core::error` / `crate::core::time`
**Risk: MEDIUM** — More import changes, but all straightforward. The internal `super::` references need the most attention.
**Alternatively:** If moving feels like too much churn, a lighter option is to create `core/ingestion_db.rs` that re-exports from these 4 files, making the grouping visible without moving files. But I think the move is cleaner.
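The lighter option could look roughly like the sketch below, shown with inline modules so it compiles on its own. The four module names come from the plan; the `describe` functions are placeholders standing in for the real contents.

```rust
pub mod core {
    // The four files stay where they are (placeholder bodies).
    pub mod events_db {
        pub fn describe() -> &'static str { "resource event upserts" }
    }
    pub mod dependent_queue {
        pub fn describe() -> &'static str { "job queue" }
    }
    pub mod payloads {
        pub fn describe() -> &'static str { "raw JSON payload storage" }
    }
    pub mod sync_run {
        pub fn describe() -> &'static str { "sync run recording" }
    }

    /// `core/ingestion_db.rs`: a facade that groups the four modules under
    /// one name without moving any files.
    pub mod ingestion_db {
        pub use super::{dependent_queue, events_db, payloads, sync_run};
    }
}
```

Existing `crate::core::payloads` imports keep working, while new code can reach the same items through `crate::core::ingestion_db::payloads`.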
---
### 2.2 Split `cli/mod.rs` — move arg structs to their command files
**What:** Move each `*Args` struct from `cli/mod.rs` into the corresponding `cli/commands/*.rs` file. Keep `Cli` struct, `Commands` enum, and `detect_robot_mode_from_env()` in `cli/mod.rs`.
**Currently `cli/mod.rs` (949 lines) contains:**
- `Cli` struct (81 lines) — the root clap parser
- `Commands` enum (193 lines) — all subcommand variants
- `IssuesArgs` (86 lines) → move to `commands/list.rs` or stay near issues handling
- `MrsArgs` (93 lines) → move to `commands/list.rs` or stay near MRs handling
- `NotesArgs` (99 lines) → move to `commands/list.rs`
- `IngestArgs` (33 lines) → move to `commands/ingest.rs`
- `StatsArgs` (19 lines) → move to `commands/stats.rs`
- `SearchArgs` (58 lines) → move to `commands/search.rs`
- `GenerateDocsArgs` (9 lines) → move to `commands/generate_docs.rs`
- `SyncArgs` (39 lines) → move to `commands/sync.rs`
- `EmbedArgs` (15 lines) → move to `commands/embed.rs`
- `TimelineArgs` (53 lines) → move to `commands/timeline.rs`
- `WhoArgs` (76 lines) → move to `commands/who.rs`
- `CountArgs` (9 lines) → move to `commands/count.rs`
**After refactoring, `cli/mod.rs` shrinks to ~300 lines** (just `Cli` + `Commands` + the inlined variants like `Init`, `Drift`, `Backup`, `Reset`).
**Rationale:** When adding a new flag to the `who` command, you currently have to edit `cli/mod.rs` (the args struct), `cli/commands/who.rs` (the implementation), and `main.rs` (the dispatch). If the args struct lives in `commands/who.rs`, you only need two files. This is the standard pattern in mature clap-based Rust CLIs.
**Import changes needed:**
- `main.rs` currently does `use lore::cli::{..., WhoArgs, ...}` — these would become `use lore::cli::commands::{..., WhoArgs, ...}` or the `commands/mod.rs` re-exports them
- Each `commands/*.rs` gets its own `#[derive(Parser)]` struct
- `Commands` enum in `cli/mod.rs` keeps using the types but imports from `commands::*`
**Risk: MEDIUM** — Lots of `use` path changes in `main.rs`, but purely mechanical. No logic changes.
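A compact sketch of the target layout, using inline modules and plain structs so it builds with the standard library alone. In the real code `WhoArgs` would carry its clap derive and attributes; the fields, `run_who`, and `dispatch` are hypothetical stand-ins.

```rust
pub mod cli {
    pub mod commands {
        pub mod who {
            /// Lives next to the command implementation after the refactor.
            /// (clap derive omitted; fields are illustrative only.)
            #[derive(Debug, Default, PartialEq)]
            pub struct WhoArgs {
                pub path: Option<String>,
                pub limit: usize,
            }

            /// Placeholder for the command implementation.
            pub fn run_who(args: &WhoArgs) -> String {
                format!(
                    "who: path={} limit={}",
                    args.path.as_deref().unwrap_or("-"),
                    args.limit
                )
            }
        }

        // Re-export so `main.rs` can import from `cli::commands` directly.
        pub use self::who::{run_who, WhoArgs};
    }

    use self::commands::WhoArgs;

    /// Stays in `cli/mod.rs` and refers to the args types by import.
    pub enum Commands {
        Who(WhoArgs),
    }
}

/// Stand-in for the dispatch in `main.rs`.
pub fn dispatch(cmd: cli::Commands) -> String {
    match cmd {
        cli::Commands::Who(args) => cli::commands::run_who(&args),
    }
}
```

With this layout, adding a flag to `who` touches only the command file and, if the dispatch needs it, `main.rs`.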
---
## Tier 3: Consider Later
### 3.1 Split `main.rs` (2713 lines)
**The problem:** `main.rs` contains `main()`, ~30 `handle_*` functions, error handling, clap error formatting, fuzzy command matching, and the `robot-docs` JSON manifest (a 400+ line inline JSON literal).
**Possible approach:**
- Extract `handle_*` functions into `cli/dispatch.rs` (the routing layer)
- Extract error handling into `cli/errors.rs`
- Extract `handle_robot_docs` + the JSON manifest into `cli/robot_docs.rs`
- Keep `main()` in `main.rs` at ~150 lines (just the tracing setup + dispatch call)
**Why Tier 3:** This is the messiest split. The handler functions depend on the `cli::commands::*` functions AND the `cli::robot::*` helpers AND direct `std::process::exit` calls. Making this work cleanly requires careful thought about the error boundary between `main.rs` (binary) and `lib.rs` (library).
**Risk: HIGH** — Every handler function touches `robot_mode`, constructs its own timer, opens the DB, and manages error display. The boilerplate is high but consistent, so splitting would just move it around without reducing complexity.
---
### 3.2 Split `cli/commands/who.rs` (6067 lines)
**The problem:** This file implements 5 distinct modes (expert, workload, active, overlap, reviews), each with its own query, scoring model, and output formatting. It also includes the time-decay scoring model (~500 lines) and per-MR detail breakdown logic.
**Possible split:**
```
src/cli/commands/who/
├── mod.rs ← WhoRun dispatcher, shared types
├── expert.rs ← expert mode (path-based file expertise lookup)
├── workload.rs ← workload mode (user's assigned issues/MRs)
├── active.rs ← active discussions mode
├── overlap.rs ← file overlap between users
├── reviews.rs ← review pattern analysis
└── scoring.rs ← time-decay expert scoring model
```
**Why Tier 3:** The 5 modes share many helper functions, database connection patterns, and output formatting logic. Splitting would require carefully identifying the shared helpers and deciding where they live. The file is big but internally consistent — the modes use a shared dispatcher pattern and common types.
---
### 3.3 Split `cli/commands/list.rs` (2931 lines)
**The problem:** This file handles issue listing, MR listing, AND note listing — three related but distinct operations with separate query builders, output formatters, and test suites.
**Possible split:**
```
src/cli/commands/
├── list_issues.rs ← issue listing + query builder
├── list_mrs.rs ← MR listing + query builder
├── list_notes.rs ← note listing + query builder
└── list.rs ← shared types (ListFilters, etc.) + re-exports
```
**Why Tier 3:** Same issue as `who.rs` — the three listing modes share query building patterns, field selection logic, and sorting code. Splitting requires identifying and extracting the shared pieces first.
---
## Files NOT Recommended to Move
These files belong exactly where they are:
| File | Why it belongs in `core/` |
|------|--------------------------|
| `config.rs` | Config types used by nearly everything |
| `db.rs` | Database connection + migrations — foundational |
| `error.rs` | Error types used by every module |
| `paths.rs` | File path resolution — infrastructure |
| `logging.rs` | Tracing setup — infrastructure |
| `lock.rs` | Filesystem sync lock — infrastructure |
| `shutdown.rs` | Graceful shutdown signal — infrastructure |
| `backoff.rs` | Retry math — infrastructure |
| `time.rs` | Time conversion — used everywhere |
| `metrics.rs` | Tracing metrics layer — infrastructure |
| `project.rs` | Fuzzy project resolution — used by 8+ consumers across modules |
These files are legitimate "core infrastructure" used across multiple modules. Moving them would create import churn with no clarity gain.
---
## Files NOT Recommended to Split/Merge
| File | Why leave it alone |
|------|-------------------|
| `documents/extractor.rs` (2341 lines) | One cohesive extractor per entity type — the size comes from per-type formatting logic, not mixed concerns |
| `ingestion/orchestrator.rs` (1703 lines) | Single orchestration flow — splitting would scatter the pipeline |
| `gitlab/graphql.rs` (1293 lines) | GraphQL client with adaptive paging — cohesive |
| `gitlab/client.rs` (851 lines) | REST client with all endpoints — cohesive |
| `cli/autocorrect.rs` (945 lines) | Correction registry + fuzzy matching — splitting gains nothing |
---
## Proposed Final Structure (Tiers 1+2)
```
src/
├── main.rs (2713 lines — unchanged for now)
├── lib.rs (adds: pub mod timeline; pub mod xref;)
├── cli/
│ ├── mod.rs (~300 lines — Cli + Commands only, args moved out)
│ ├── autocorrect.rs (unchanged)
│ ├── progress.rs (unchanged)
│ ├── robot.rs (unchanged)
│ └── commands/
│ ├── mod.rs (re-exports + WhoArgs, IssuesArgs, etc.)
│ ├── (all existing files — unchanged but with args structs moved in)
│ └── ...
├── core/ (slimmed: 22 files → 12, infrastructure only)
│ ├── mod.rs
│ ├── backoff.rs
│ ├── config.rs
│ ├── db.rs
│ ├── error.rs
│ ├── lock.rs
│ ├── logging.rs
│ ├── metrics.rs
│ ├── paths.rs
│ ├── project.rs
│ ├── shutdown.rs
│ └── time.rs
├── timeline/ (NEW — extracted from core/)
│ ├── mod.rs (types from core/timeline.rs)
│ ├── seed.rs (from core/timeline_seed.rs)
│ ├── expand.rs (from core/timeline_expand.rs)
│ └── collect.rs (from core/timeline_collect.rs)
├── xref/ (NEW — extracted from core/)
│ ├── mod.rs
│ ├── note_parser.rs (from core/note_parser.rs)
│ └── references.rs (from core/references.rs)
├── ingestion/ (gains 4 files from core/)
│ ├── (existing files...)
│ ├── events_db.rs (from core/events_db.rs)
│ ├── dependent_queue.rs (from core/dependent_queue.rs)
│ ├── payloads.rs (from core/payloads.rs)
│ └── sync_run.rs (from core/sync_run.rs)
├── documents/ (unchanged)
├── embedding/ (unchanged)
├── gitlab/ (unchanged)
└── search/ (unchanged)
```
---
## Import Change Tracking
### Tier 1.1: Timeline extraction
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `cli/commands/timeline.rs:10-15` | `crate::core::timeline::*` | `crate::timeline::*` |
| `cli/commands/timeline.rs:13` | `crate::core::timeline_collect::collect_events` | `crate::timeline::collect_events` (or `crate::timeline::collect::collect_events`) |
| `cli/commands/timeline.rs:14` | `crate::core::timeline_expand::expand_timeline` | `crate::timeline::expand_timeline` |
| `cli/commands/timeline.rs:15` | `crate::core::timeline_seed::seed_timeline` | `crate::timeline::seed_timeline` |
| `core/timeline_seed.rs:7-8` | `super::timeline::*` | `super::*` (or `crate::timeline::*` depending on structure) |
| `core/timeline_expand.rs:6` | `super::timeline::*` | `super::*` |
| `core/timeline_collect.rs:4` | `super::timeline::*` | `super::*` |
| `core/timeline_seed.rs:8` | `crate::search::*` | `crate::search::*` (no change) |
| `core/timeline_seed.rs:6-7` | `super::error::Result` | `crate::core::error::Result` |
| `core/timeline_expand.rs:5` | `super::error::Result` | `crate::core::error::Result` |
| `core/timeline_collect.rs:3` | `super::error::*` | `crate::core::error::*` |
### Tier 1.2: Cross-reference extraction
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `ingestion/orchestrator.rs:10-12` | `crate::core::references::*` | `crate::xref::references::*` |
| `core/note_parser.rs:7-8` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| `core/references.rs:4-5` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
### Tier 2.1: Ingestion-adjacent DB ops
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `cli/commands/count.rs:9` | `crate::core::events_db::*` | `crate::ingestion::events_db::*` |
| `ingestion/orchestrator.rs:6-8` | `crate::core::dependent_queue::*` | `super::dependent_queue::*` |
| `main.rs:37` | `crate::core::dependent_queue::release_all_locked_jobs` | `crate::ingestion::dependent_queue::release_all_locked_jobs` |
| `ingestion/discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/issues.rs:9` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/merge_requests.rs:8` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/mr_discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
| `cli/commands/sync.rs` | (uses `crate::core::sync_run::*`) | `crate::ingestion::sync_run::*` |
| `cli/commands/sync_status.rs` | (uses `crate::core::sync_run::*` or `crate::core::metrics::*`) | check and update |
| Internal: `events_db.rs:4-5` | `super::error::*`, `super::time::*` | `crate::core::error::*`, `crate::core::time::*` |
| Internal: `dependent_queue.rs:5-6` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| Internal: `payloads.rs:9-10` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| Internal: `sync_run.rs:2-4` | `super::error::*`, `super::metrics::*`, `super::time::*` | `crate::core::error::*`, `crate::core::metrics::*`, `crate::core::time::*` |
---
## Execution Order
1. **Tier 1.1** — Extract timeline → `src/timeline/` (LOW risk, 1 consumer)
2. **Tier 1.2** — Extract xref → `src/xref/` (LOW risk, 1-2 consumers)
3. **Cargo check + clippy + test** after each tier
4. **Tier 2.1** — Move ingestion DB ops (MEDIUM risk, more consumers)
5. **Cargo check + clippy + test**
6. **Tier 2.2** — Split `cli/mod.rs` args (MEDIUM risk, mostly mechanical)
7. **Cargo check + clippy + test + fmt**
Each tier should be its own commit for easy rollback.
---
## What This Achieves
**Before:** A developer looking at `core/` sees 22 files and has to mentally sort "infrastructure vs. domain logic vs. pipeline stage." The timeline pipeline is invisible unless you know to look in `core/`.
**After:**
- `core/` has 12 files, all clearly infrastructure (`mod.rs` plus db, config, error, paths, logging, lock, shutdown, backoff, time, metrics, project)
- `timeline/` is a discoverable first-class module showing the 5-stage pipeline
- `xref/` makes the cross-reference extraction domain visible
- `ingestion/` contains everything related to data fetching: the orchestrator, entity ingestors, AND their supporting DB operations
- `cli/mod.rs` is lean — just the top-level Cli struct and Commands enum
A new developer (or coding agent) can now answer "where is the timeline code?" → `src/timeline/`, "where is ingestion?" → `src/ingestion/`, "where is cross-reference extraction?" → `src/xref/`, without needing institutional knowledge.

README.md

@@ -1,6 +1,6 @@
# Gitlore
Local GitLab data management with semantic search, people intelligence, and temporal analysis. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, chronological event reconstruction, and expert discovery.
Local GitLab data management with semantic search, people intelligence, and temporal analysis. Syncs issues, MRs, discussions, notes, and work item statuses from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, chronological event reconstruction, and expert discovery.
## Features
@@ -8,7 +8,7 @@ Local GitLab data management with semantic search, people intelligence, and temp
- **Incremental sync**: Cursor-based sync only fetches changes since last sync
- **Full re-sync**: Reset cursors and fetch all data from scratch when needed
- **Multi-project**: Track issues and MRs across multiple GitLab projects
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches, work item status
- **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
- **People intelligence**: Expert discovery, workload analysis, review patterns, active discussions, and code ownership overlap
- **Timeline pipeline**: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
@@ -17,8 +17,12 @@ Local GitLab data management with semantic search, people intelligence, and temp
- **Raw payload storage**: Preserves original GitLab API responses for debugging
- **Discussion threading**: Full support for issue and MR discussions including inline code review comments
- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
- **Work item status enrichment**: Fetches issue statuses (e.g., "To do", "In progress", "Done") from GitLab's GraphQL API with adaptive page sizing, color-coded display, and case-insensitive filtering
- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
- **Note querying**: Rich filtering over discussion notes by author, type, path, resolution status, time range, and body content
- **Discussion drift detection**: Semantic analysis of how discussions diverge from original issue intent
- **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
- **Error tolerance**: Auto-corrects common CLI mistakes (case, typos, single-dash flags, value casing) with teaching feedback
- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
## Installation
@@ -70,6 +74,12 @@ lore who @asmith
# Timeline of events related to deployments
lore timeline "deployment"
# Timeline for a specific issue
lore timeline issue:42
# Query notes by author
lore notes --author alice --since 7d
# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .
```
@@ -90,13 +100,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
{ "path": "group/project" },
{ "path": "other-group/other-project" }
],
"defaultProject": "group/project",
"sync": {
"backfillDays": 14,
"staleLockMinutes": 10,
"heartbeatIntervalSeconds": 30,
"cursorRewindSeconds": 2,
"primaryConcurrency": 4,
"dependentConcurrency": 2
"dependentConcurrency": 2,
"fetchWorkItemStatus": true
},
"storage": {
"compressRawPayloads": true
@@ -106,6 +118,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
"model": "nomic-embed-text",
"baseUrl": "http://localhost:11434",
"concurrency": 4
},
"scoring": {
"authorWeight": 25,
"reviewerWeight": 10,
"noteBonus": 1,
"authorHalfLifeDays": 180,
"reviewerHalfLifeDays": 90,
"noteHalfLifeDays": 45,
"excludedUsernames": ["bot-user"]
}
}
```
@@ -117,12 +138,14 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
| `gitlab` | `baseUrl` | -- | GitLab instance URL (required) |
| `gitlab` | `tokenEnvVar` | `GITLAB_TOKEN` | Environment variable containing API token |
| `projects` | `path` | -- | Project path (e.g., `group/project`) |
| *(top-level)* | `defaultProject` | none | Fallback project path used when `-p` is omitted. Must match a configured project path (exact or suffix). CLI `-p` always overrides. |
| `sync` | `backfillDays` | `14` | Days to backfill on initial sync |
| `sync` | `staleLockMinutes` | `10` | Minutes before sync lock considered stale |
| `sync` | `heartbeatIntervalSeconds` | `30` | Frequency of lock heartbeat updates |
| `sync` | `cursorRewindSeconds` | `2` | Seconds to rewind cursor for overlap safety |
| `sync` | `primaryConcurrency` | `4` | Concurrent GitLab requests for primary resources |
| `sync` | `dependentConcurrency` | `2` | Concurrent requests for dependent resources |
| `sync` | `fetchWorkItemStatus` | `true` | Enrich issues with work item status via GraphQL (requires GitLab Premium/Ultimate) |
| `storage` | `dbPath` | `~/.local/share/lore/lore.db` | Database file path |
| `storage` | `backupDir` | `~/.local/share/lore/backups` | Backup directory |
| `storage` | `compressRawPayloads` | `true` | Compress stored API responses with gzip |
@@ -130,6 +153,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
| `embedding` | `model` | `nomic-embed-text` | Model name for embeddings |
| `embedding` | `baseUrl` | `http://localhost:11434` | Ollama server URL |
| `embedding` | `concurrency` | `4` | Concurrent embedding requests |
| `scoring` | `authorWeight` | `25` | Points per MR where the user authored code touching the path |
| `scoring` | `reviewerWeight` | `10` | Points per MR where the user reviewed code touching the path |
| `scoring` | `noteBonus` | `1` | Bonus per inline review comment (DiffNote) |
| `scoring` | `reviewerAssignmentWeight` | `3` | Points per MR where the user was assigned as reviewer |
| `scoring` | `authorHalfLifeDays` | `180` | Half-life in days for author contribution decay |
| `scoring` | `reviewerHalfLifeDays` | `90` | Half-life in days for reviewer contribution decay |
| `scoring` | `noteHalfLifeDays` | `45` | Half-life in days for note/comment decay |
| `scoring` | `closedMrMultiplier` | `0.5` | Score multiplier for closed (not merged) MRs |
| `scoring` | `excludedUsernames` | `[]` | Usernames excluded from expert results (e.g., bots) |
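To make the half-life settings concrete, here is a minimal sketch of exponential half-life decay using the default weights from the table. The formula `weight * 0.5^(age / half_life)` is the textbook form and is an assumption about how lore combines these values; the function names are hypothetical.

```rust
/// Exponential half-life decay: a contribution loses half its weight
/// every `half_life_days`. Assumed formula, not taken from the lore source.
pub fn decayed_points(weight: f64, age_days: f64, half_life_days: f64) -> f64 {
    weight * 0.5_f64.powf(age_days / half_life_days)
}

/// Sums decayed points over contributions given as
/// (base weight, age in days, half-life in days).
pub fn expert_score(contributions: &[(f64, f64, f64)]) -> f64 {
    contributions
        .iter()
        .map(|&(weight, age, half_life)| decayed_points(weight, age, half_life))
        .sum()
}
```

Under this assumption, an authored MR (`authorWeight` 25) that is 180 days old contributes 12.5 points, while one merged today contributes the full 25.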
### Config File Resolution
@@ -184,6 +216,8 @@ lore issues --since 1m # Updated in last month
lore issues --since 2024-01-01 # Updated since date
lore issues --due-before 2024-12-31 # Due before date
lore issues --has-due # Only issues with due dates
lore issues --status "In progress" # By work item status (case-insensitive)
lore issues --status "To do" --status "In progress" # Multiple statuses (OR)
lore issues -p group/repo # Filter by project
lore issues --sort created --asc # Sort by created date, ascending
lore issues -o # Open first result in browser
@@ -193,13 +227,13 @@ lore -J issues --fields minimal # Compact: iid, title, state, updated_at_i
lore -J issues --fields iid,title,labels,state # Custom fields
```
When listing, output includes: IID, title, state, author, assignee, labels, and update time. In robot mode, the `--fields` flag controls which fields appear in the JSON response.
When listing, output includes: IID, title, state, status (when any issue has one), assignee, labels, and update time. Status values display with their configured color. In robot mode, the `--fields` flag controls which fields appear in the JSON response.
When showing a single issue (e.g., `lore issues 123`), output includes: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.
When showing a single issue (e.g., `lore issues 123`), output includes: title, description, state, work item status (with color and category), author, assignees, labels, milestone, due date, web URL, and threaded discussions.
#### Project Resolution
The `-p` / `--project` flag uses cascading match logic across all commands:
When `-p` / `--project` is omitted, the `defaultProject` from config is used as a fallback. If neither is set, results span all configured projects. When a project is specified (via `-p` or config default), it uses cascading match logic across all commands:
1. **Exact match**: `group/project`
2. **Case-insensitive**: `Group/Project`
@@ -255,18 +289,21 @@ lore search "login flow" --mode semantic # Vector similarity only
lore search "auth" --type issue # Filter by source type
lore search "auth" --type mr # MR documents only
lore search "auth" --type discussion # Discussion documents only
lore search "auth" --type note # Individual notes only
lore search "deploy" --author username # Filter by author
lore search "deploy" -p group/repo # Filter by project
lore search "deploy" --label backend # Filter by label (AND logic)
lore search "deploy" --path src/ # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d # Created after (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w # Updated after
lore search "deploy" --since 7d # Created since (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-since 2w # Updated since
lore search "deploy" -n 50 # Limit results (default 20, max 100)
lore search "deploy" --explain # Show ranking explanation per result
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
```
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. Use `raw` for advanced FTS5 query syntax (AND, OR, NOT, phrase matching, prefix queries).
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. FTS5 boolean operators (`AND`, `OR`, `NOT`, `NEAR`) are passed through in safe mode, so queries like `"switch AND health"` work without switching to raw mode. Use `raw` for advanced FTS5 query syntax (phrase matching, column filters, prefix queries).
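One way a safe mode like this could work is to quote every token except the boolean operators. The sketch below is an assumption about the approach, not lore's actual sanitizer, and `sanitize_fts_query` is a hypothetical name. It does not handle `NEAR(...)` groups, only a bare `NEAR` token.

```rust
/// FTS5 operators that the README says pass through in safe mode.
const OPERATORS: [&str; 4] = ["AND", "OR", "NOT", "NEAR"];

/// Quote every non-operator token so FTS5 treats it as a literal string.
/// Embedded double quotes are doubled, which is how FTS5 escapes them.
pub fn sanitize_fts_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|tok| {
            if OPERATORS.contains(&tok) {
                tok.to_string()
            } else {
                format!("\"{}\"", tok.replace('"', "\"\""))
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Quoting keeps characters such as `-` or `:` in user input from being parsed as FTS5 syntax, while the operators remain usable.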
A progress spinner displays during search, showing the active mode (e.g., `Searching (hybrid)...`). In robot mode, spinners are suppressed for clean JSON output.
Requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.
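For readers unfamiliar with Reciprocal Rank Fusion, the hybrid mode's merging step can be sketched as follows. This is the generic algorithm; the constant `k = 60` is the conventional choice and whether lore uses it is an assumption, as is the function name `rrf_fuse`.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
/// per document, with rank starting at 1.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, the lexical (BM25) and vector similarity scores never need to be normalized against each other.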
@@ -276,7 +313,7 @@ People intelligence: discover experts, analyze workloads, review patterns, activ
#### Expert Mode
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis).
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis). Scores use exponential half-life decay so recent contributions count more than older ones. Scoring weights and half-life periods are configurable via the `scoring` config section.
```bash
lore who src/features/auth/ # Who knows about this directory?
@@ -285,6 +322,9 @@ lore who --path README.md # Root files need --path flag
lore who --path Makefile # Dotless root files too
lore who src/ --since 3m # Limit to recent 3 months
lore who src/ -p group/repo # Scope to project
lore who src/ --explain-score # Show per-component score breakdown
lore who src/ --as-of 30d # Score as if "now" was 30 days ago
lore who src/ --include-bots # Include bot users in results
```
The target is auto-detected as a path when it contains `/`. For root files without `/` (e.g., `README.md`), use the `--path` flag. Default time window: 6 months.
@@ -341,21 +381,32 @@ Shows: users with touch counts (author vs. review), linked MR references. Defaul
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
| `-n` / `--limit` | Max results per section (1-500, default 20) |
| `--all-history` | Remove the default time window, query all history |
| `--detail` | Show per-MR detail breakdown (expert mode only) |
| `--explain-score` | Show per-component score breakdown (expert mode only) |
| `--as-of` | Score as if "now" is a past date (ISO 8601 or duration like 30d, expert mode only) |
| `--include-bots` | Include bot users normally excluded via `scoring.excludedUsernames` |
### `lore timeline`
Reconstruct a chronological timeline of events matching a keyword query. The pipeline discovers related entities through cross-reference graph traversal and assembles a unified, time-ordered event stream.
```bash
lore timeline "deployment" # Events related to deployments
lore timeline "deployment" # Search-based seeding (hybrid search)
lore timeline issue:42 # Direct entity seeding by issue IID
lore timeline i:42 # Shorthand for issue:42
lore timeline mr:99 # Direct entity seeding by MR IID
lore timeline m:99 # Shorthand for mr:99
lore timeline "auth" -p group/repo # Scoped to a project
lore timeline "auth" --since 30d # Only recent events
lore timeline "migration" --depth 2 # Deeper cross-reference expansion
lore timeline "migration" --expand-mentions # Follow 'mentioned' edges (high fan-out)
lore timeline "migration" --no-mentions # Skip 'mentioned' edges (reduces fan-out)
lore timeline "deploy" -n 50 # Limit event count
lore timeline "auth" --max-seeds 5 # Fewer seed entities
```
The query can be either a search string (hybrid search finds matching entities) or an entity reference (`issue:N`, `i:N`, `mr:N`, `m:N`) which directly seeds the timeline from a specific entity and its cross-references.
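The distinction between the two query forms can be sketched as a small parser. The `TimelineSeed` type and `parse_seed` function are hypothetical names for illustration; only the accepted prefixes (`issue:`, `i:`, `mr:`, `m:`) come from the text above.

```rust
#[derive(Debug, PartialEq)]
pub enum TimelineSeed {
    Issue(u64),
    MergeRequest(u64),
    Search(String),
}

/// Treat `issue:N`, `i:N`, `mr:N`, `m:N` as entity references;
/// anything else falls back to a search query.
pub fn parse_seed(query: &str) -> TimelineSeed {
    if let Some((prefix, rest)) = query.split_once(':') {
        if let Ok(iid) = rest.parse::<u64>() {
            match prefix {
                "issue" | "i" => return TimelineSeed::Issue(iid),
                "mr" | "m" => return TimelineSeed::MergeRequest(iid),
                _ => {}
            }
        }
    }
    TimelineSeed::Search(query.to_string())
}
```

In this sketch a malformed reference such as `issue:abc` is treated as a search string rather than an error; the real CLI may behave differently.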
#### Flags
| Flag | Default | Description |
@@ -363,18 +414,21 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
| `-p` / `--project` | all | Scope to a specific project (fuzzy match) |
| `--since` | none | Only events after this date (7d, 2w, 6m, YYYY-MM-DD) |
| `--depth` | `1` | Cross-reference expansion depth (0 = seeds only) |
| `--expand-mentions` | off | Also follow "mentioned" edges during expansion |
| `--no-mentions` | off | Skip "mentioned" edges during expansion (reduces fan-out) |
| `-n` / `--limit` | `100` | Maximum events to display |
| `--max-seeds` | `10` | Maximum seed entities from search |
| `--max-entities` | `50` | Maximum entities discovered via cross-references |
| `--max-evidence` | `10` | Maximum evidence notes included |
| `--fields` | all | Select output fields (comma-separated, or 'minimal' preset) |
#### Pipeline Stages
1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents are ranked by BM25 relevance.
2. **HYDRATE** -- Evidence notes are extracted: the top FTS-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and optionally "mentioned" references up to the configured depth.
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking.
Each stage displays a numbered progress spinner (e.g., `[1/3] Seeding timeline...`). In robot mode, spinners are suppressed for clean JSON output.
1. **SEED** -- Hybrid search (FTS5 lexical + Ollama vector similarity via Reciprocal Rank Fusion) identifies the most relevant issues and MRs. Falls back to lexical-only if Ollama is unavailable. Discussion notes matching the query are also discovered and attached to their parent entities.
2. **HYDRATE** -- Evidence notes are extracted: the top search-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced. Matched discussions are collected as full thread candidates.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and "mentioned" references up to the configured depth. Use `--no-mentions` to exclude "mentioned" edges and reduce fan-out.
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, evidence notes, and full discussion threads. Events are sorted chronologically with stable tiebreaking.
5. **RENDER** -- Events are formatted as human-readable text or structured JSON (robot mode).
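The EXPAND stage described above is a depth-limited breadth-first search. The following sketch uses an in-memory adjacency map in place of the `entity_references` table; the `RefKind` enum, the `expand` signature, and the use of directed edges are assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum RefKind {
    Closes,
    Related,
    Mentioned,
}

/// Returns entities discovered from `seeds` within `max_depth` hops,
/// in discovery order. Seeds themselves are not included.
pub fn expand(
    graph: &HashMap<u32, Vec<(u32, RefKind)>>,
    seeds: &[u32],
    max_depth: usize,
    follow_mentions: bool,
) -> Vec<u32> {
    let mut seen: HashSet<u32> = seeds.iter().copied().collect();
    let mut queue: VecDeque<(u32, usize)> = seeds.iter().map(|&s| (s, 0)).collect();
    let mut discovered = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth 0 means seeds only
        }
        for &(next, kind) in graph.get(&node).into_iter().flatten() {
            if kind == RefKind::Mentioned && !follow_mentions {
                continue; // corresponds to `--no-mentions`
            }
            if seen.insert(next) {
                discovered.push(next);
                queue.push_back((next, depth + 1));
            }
        }
    }
    discovered
}
```

The `seen` set is what keeps fan-out bounded: an entity reachable through several references is queued only once.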
#### Event Types
@@ -388,16 +442,73 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
| `NoteEvidence` | Discussion note matched by FTS, with snippet |
| `NoteEvidence` | Discussion note matched by search, with snippet |
| `DiscussionThread` | Full discussion thread with all non-system notes |
| `CrossReferenced` | Reference to another entity |
#### Unresolved References
When graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the output. This enables discovery of external dependencies and can inform future sync targets.
### `lore notes`
Query individual notes from discussions with rich filtering options.
```bash
lore notes # List 50 most recent notes
lore notes --author alice --since 7d # Notes by alice in last 7 days
lore notes --for-issue 42 -p group/repo # Notes on issue #42
lore notes --for-mr 99 -p group/repo # Notes on MR !99
lore notes --path src/ --resolution unresolved # Unresolved diff notes in src/
lore notes --note-type DiffNote # Only inline code review comments
lore notes --contains "TODO" # Substring search in note body
lore notes --include-system # Include system-generated notes
lore notes --since 2w --until 2024-12-31 # Time-bounded range
lore notes --sort updated --asc # Sort by update time, ascending
lore notes --format csv # CSV output
lore notes --format jsonl # Line-delimited JSON
lore notes -o # Open first result in browser
# Field selection (robot mode)
lore -J notes --fields minimal # Compact: id, author_username, body, created_at_iso
```
#### Filters
| Flag | Description |
|------|-------------|
| `-a` / `--author` | Filter by note author username |
| `--note-type` | Filter by note type (DiffNote, DiscussionNote) |
| `--contains` | Substring search in note body |
| `--note-id` | Filter by internal note ID |
| `--gitlab-note-id` | Filter by GitLab note ID |
| `--discussion-id` | Filter by discussion ID |
| `--include-system` | Include system notes (excluded by default) |
| `--for-issue` | Notes on a specific issue IID (requires `-p`) |
| `--for-mr` | Notes on a specific MR IID (requires `-p`) |
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Notes created since (7d, 2w, 1m, or YYYY-MM-DD) |
| `--until` | Notes created until (YYYY-MM-DD, inclusive end-of-day) |
| `--path` | Filter by file path (DiffNotes only; trailing `/` for prefix match) |
| `--resolution` | Filter by resolution status (`any`, `unresolved`, `resolved`) |
| `--sort` | Sort by `created` (default) or `updated` |
| `--asc` | Sort ascending (default: descending) |
| `--format` | Output format: `table` (default), `json`, `jsonl`, `csv` |
| `-o` / `--open` | Open first result in browser |
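The relative forms accepted by `--since` (`7d`, `2w`, `1m`) can be parsed along these lines. This is a sketch with a hypothetical function name; treating a month as 30 days is an assumption, and absolute `YYYY-MM-DD` dates are deliberately left out.

```rust
/// Returns the window in days for relative values such as `7d`, `2w`, `1m`.
/// Absolute dates (YYYY-MM-DD) and malformed input return `None`.
pub fn parse_relative_days(value: &str) -> Option<u32> {
    let value = value.trim();
    let unit = value.chars().last()?;
    let digits = &value[..value.len() - unit.len_utf8()];
    let n: u32 = digits.parse().ok()?;
    let factor = match unit {
        'd' => 1,
        'w' => 7,
        'm' => 30, // assumption: a month is treated as 30 days
        _ => return None,
    };
    n.checked_mul(factor)
}
```

A caller would try this first and fall back to date parsing when it returns `None`.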
### `lore drift`
Detect discussion divergence from the original intent of an issue by comparing the semantic similarity of discussion content against the issue description.
```bash
lore drift issues 42 # Check divergence on issue #42
lore drift issues 42 --threshold 0.6 # Higher threshold (stricter)
lore drift issues 42 -p group/repo # Scope to project
```
### `lore sync`
Run the full sync pipeline: ingest from GitLab, generate searchable documents, and compute embeddings.
Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings.
```bash
lore sync # Full pipeline
@@ -406,6 +517,7 @@ lore sync --force # Override stale lock
lore sync --no-embed # Skip embedding step
lore sync --no-docs # Skip document regeneration
lore sync --no-events # Skip resource event fetching
lore sync --no-file-changes # Skip MR file change fetching
lore sync --dry-run # Preview what would be synced
```
@@ -413,11 +525,11 @@ The sync command displays animated progress bars for each stage and outputs timi
### `lore ingest`
Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings).
Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings). For issue ingestion, this includes a status enrichment phase that fetches work item statuses via the GitLab GraphQL API.
```bash
lore ingest # Ingest everything (issues + MRs)
lore ingest issues # Issues only
lore ingest issues # Issues only (includes status enrichment)
lore ingest mrs # MRs only
lore ingest issues -p group/repo # Single project
lore ingest --force # Override stale lock
@@ -430,6 +542,8 @@ The `--full` flag resets sync cursors and discussion watermarks, then fetches al
- You want to ensure complete data after schema changes
- Troubleshooting sync issues
Status enrichment uses adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL complexity limits. It gracefully handles instances without GraphQL support or Premium/Ultimate licensing. Disable via `sync.fetchWorkItemStatus: false` in config.
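The step-down behavior can be sketched as a retry loop over the listed page sizes. The `fetch` closure stands in for the GraphQL request, and `fetch_adaptive` is a hypothetical name. A real client would presumably step down only on complexity errors, whereas this sketch retries on any error.

```rust
/// Page sizes tried in order, as listed in the README.
pub const PAGE_SIZES: [u32; 4] = [100, 50, 25, 10];

/// Calls `fetch` with each page size until one succeeds, returning the
/// result together with the size that worked. If every size fails,
/// the last error is returned.
pub fn fetch_adaptive<T, E>(
    mut fetch: impl FnMut(u32) -> Result<T, E>,
) -> Result<(T, u32), E> {
    let mut last_err = None;
    for &size in PAGE_SIZES.iter() {
        match fetch(size) {
            Ok(page) => return Ok((page, size)),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("PAGE_SIZES is non-empty"))
}
```

Returning the size that succeeded lets the caller reuse it for subsequent pages instead of starting again at 100.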
### `lore generate-docs`
Extract searchable documents from ingested issues, MRs, and discussions for the FTS5 index.
@@ -501,12 +615,15 @@ lore init --force # Overwrite existing config
lore init --non-interactive # Fail if prompts needed
```
When multiple projects are configured, `init` prompts whether to set a default project (used when `-p` is omitted). This can also be set via the `--default-project` flag.
In robot mode, `init` supports non-interactive setup via flags:
```bash
lore -J init --gitlab-url https://gitlab.com \
--token-env-var GITLAB_TOKEN \
--projects "group/project,other/project"
--projects "group/project,other/project" \
--default-project group/project
```
### `lore auth`
@@ -559,6 +676,7 @@ Machine-readable command manifest for agent self-discovery. Returns a JSON schem
```bash
lore robot-docs # Pretty-printed JSON
lore --robot robot-docs # Compact JSON for parsing
lore robot-docs --brief # Omit response_schema (~60% smaller)
```
### `lore version`
@@ -610,7 +728,7 @@ The `actions` array contains executable shell commands an agent can run to recov
### Field Selection
The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response, reducing token usage for AI agent workflows:
The `--fields` flag controls which fields appear in the JSON response, reducing token usage for AI agent workflows. Supported on `issues`, `mrs`, `notes`, `search`, `timeline`, and `who` list commands:
```bash
# Minimal preset (~60% fewer tokens)
@@ -623,10 +741,52 @@ lore -J issues --fields iid,title,state,labels,updated_at_iso
# minimal: iid, title, state, updated_at_iso
```
Valid fields for issues: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`
Valid fields for issues: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at_iso`
Valid fields for MRs: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`
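As an illustration of how a `--fields` value could be resolved against the issue field list above, here is a minimal sketch. It assumes that `minimal` expands to the four fields named in the preset comment and that an unknown name is reported as an error; the function name `resolve_issue_fields` and the error wording are hypothetical, and the real CLI may behave differently.

```rust
/// Issue fields as listed in the section above.
const ISSUE_FIELDS: &[&str] = &[
    "iid", "title", "state", "author_username", "labels", "assignees",
    "discussion_count", "unresolved_count", "created_at_iso", "updated_at_iso",
    "web_url", "project_path", "status_name", "status_category", "status_color",
    "status_icon_name", "status_synced_at_iso",
];

/// The `minimal` preset, per the comment in the example above.
const MINIMAL_PRESET: &[&str] = &["iid", "title", "state", "updated_at_iso"];

/// Expand a `--fields` value into concrete field names, dropping repeats.
/// Assumption for this sketch: unknown names produce an error.
fn resolve_issue_fields(spec: &str) -> Result<Vec<&'static str>, String> {
    if spec.trim() == "minimal" {
        return Ok(MINIMAL_PRESET.to_vec());
    }
    let mut selected: Vec<&'static str> = Vec::new();
    for raw in spec.split(',') {
        let name = raw.trim();
        match ISSUE_FIELDS.iter().find(|f| **f == name) {
            Some(field) => {
                // Keep the first occurrence only, preserving request order.
                if !selected.contains(field) {
                    selected.push(*field);
                }
            }
            None => return Err(format!("unknown field: {}", name)),
        }
    }
    Ok(selected)
}
```

Preserving the order in which fields were requested lets a caller control column order in the output; whether lore does the same is not stated in the source.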
### Error Tolerance
The CLI auto-corrects common mistakes before parsing, emitting a teaching note to stderr. Corrections work in both human and robot modes:
| Correction | Example | Mode |
|-----------|---------|------|
| Single-dash long flag | `-robot` -> `--robot` | All |
| Case normalization | `--Robot` -> `--robot` | All |
| Flag prefix expansion | `--proj` -> `--project` (unambiguous only) | All |
| Fuzzy flag match | `--projct` -> `--project` | All (threshold 0.9 in robot, 0.8 in human) |
| Subcommand alias | `merge_requests` -> `mrs`, `robotdocs` -> `robot-docs` | All |
| Value normalization | `--state Opened` -> `--state opened` | All |
| Value fuzzy match | `--state opend` -> `--state opened` | All |
| Subcommand prefix | `lore iss` -> `lore issues` (unambiguous only, via clap) | All |
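Three of the corrections in the table (single-dash long flag, case normalization, and unambiguous prefix expansion) can be sketched as a small pipeline. This is an illustrative sketch only: `KNOWN_FLAGS` is a hypothetical subset of flags, `correct_flag` is not a function from the lore codebase, and fuzzy matching is left out because the source does not say which similarity metric is used.

```rust
/// Hypothetical subset of known long flags, for illustration only.
const KNOWN_FLAGS: &[&str] = &["--project", "--robot", "--state", "--since"];

/// Apply, in order: single-dash long flag repair, case normalization,
/// and unambiguous prefix expansion. Returns `None` when no unique
/// known flag matches.
fn correct_flag(arg: &str) -> Option<&'static str> {
    // `-robot` -> `--robot`. Two-character args such as `-p` are treated
    // as short flags and left untouched.
    let repaired = if arg.starts_with('-') && !arg.starts_with("--") && arg.len() > 2 {
        format!("-{}", arg)
    } else {
        arg.to_string()
    };
    // `--Robot` -> `--robot`.
    let flag = repaired.to_lowercase();

    // An exact match needs no further correction.
    if let Some(exact) = KNOWN_FLAGS.iter().find(|k| **k == flag.as_str()) {
        return Some(*exact);
    }

    // Prefix expansion: accept only when exactly one known flag matches.
    let mut candidates = KNOWN_FLAGS.iter().filter(|k| k.starts_with(flag.as_str()));
    match (candidates.next(), candidates.next()) {
        (Some(only), None) => Some(*only),
        _ => None,
    }
}
```

With this flag set, `--proj` expands to `--project`, while `--s` is rejected because both `--state` and `--since` match it.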
In robot mode, corrections emit structured JSON to stderr:
```json
{"warning":{"type":"ARG_CORRECTED","corrections":[...],"teaching":["Use double-dash for long flags: --robot (not -robot)"]}}
```
When a command or flag is still unrecognized after corrections, the error response includes a fuzzy suggestion and, for enum-like flags, lists valid values:
```json
{"error":{"code":"UNKNOWN_COMMAND","message":"...","suggestion":"Did you mean 'lore issues'? Example: lore --robot issues -n 10. Run 'lore robot-docs' for all commands"}}
```
### Command Aliases
Commands accept aliases for common variations:
| Primary | Aliases |
|---------|---------|
| `issues` | `issue` |
| `mrs` | `mr`, `merge-requests`, `merge-request` |
| `notes` | `note` |
| `search` | `find`, `query` |
| `stats` | `stat` |
| `status` | `st` |
Unambiguous prefixes also work via subcommand inference (e.g., `lore iss` -> `lore issues`, `lore time` -> `lore timeline`).
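The alias table and prefix inference can be modelled as a two-step lookup: exact names and aliases first, then a prefix match over primary command names that succeeds only when it is unique. The sketch below is hand-rolled for illustration and is an assumption about the behaviour, not the project's code (the text says the real prefix inference comes from clap); `COMMANDS` covers only the commands named in this section and `resolve_command` is a hypothetical name.

```rust
/// Primary command names with their aliases, from the table above.
/// `timeline` has no aliases listed and is included for the prefix example.
const COMMANDS: &[(&str, &[&str])] = &[
    ("issues", &["issue"]),
    ("mrs", &["mr", "merge-requests", "merge-request"]),
    ("notes", &["note"]),
    ("search", &["find", "query"]),
    ("stats", &["stat"]),
    ("status", &["st"]),
    ("timeline", &[]),
];

/// Resolve user input to a primary command name.
fn resolve_command(input: &str) -> Option<&'static str> {
    // Step 1: an exact primary name or alias wins outright.
    for (primary, aliases) in COMMANDS {
        if *primary == input || aliases.contains(&input) {
            return Some(*primary);
        }
    }
    // Step 2: prefix inference, accepted only when exactly one command matches.
    let mut matches = COMMANDS
        .iter()
        .map(|(primary, _)| *primary)
        .filter(|p| p.starts_with(input));
    match (matches.next(), matches.next()) {
        (Some(only), None) => Some(only),
        _ => None,
    }
}
```

Checking aliases before prefixes matters for inputs such as `stat`: it is a prefix of both `stats` and `status`, but resolves to `stats` because it is a listed alias.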
### Agent Self-Discovery
The `robot-docs` command provides a complete machine-readable manifest including response schemas for every command:
@@ -714,7 +874,7 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
| Table | Purpose |
|-------|---------|
| `projects` | Tracked GitLab projects with metadata |
| `issues` | Issue metadata (title, state, author, due date, milestone) |
| `issues` | Issue metadata (title, state, author, due date, milestone, work item status) |
| `merge_requests` | MR metadata (title, state, draft, branches, merge status, commit SHAs) |
| `milestones` | Project milestones with state and due dates |
| `labels` | Project labels with colors |


@@ -0,0 +1,245 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 300, "y": 15, "text": "Human User Flow Map", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 220, "y": 53, "text": "15 human workflows mapped to lore commands. Arrows show data dependency.", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "text", "id": "col-trigger", "x": 60, "y": 80, "text": "TRIGGER (Problem)", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-flow", "x": 400, "y": 80, "text": "COMMAND FLOW", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-gap", "x": 880, "y": 80, "text": "GAP", "fontSize": 16, "strokeColor": "#ef4444" },
{ "type": "rectangle", "id": "zone-daily", "x": 20, "y": 110, "width": 960, "height": 190,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-daily-label", "x": 30, "y": 115, "text": "Daily Operations", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "h1-trigger", "x": 30, "y": 140, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H1: Standup prep\n\"What moved overnight?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a1", "x": 230, "y": 165, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd1", "x": 280, "y": 145, "width": 90, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "sync -q", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a2", "x": 370, "y": 165, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd2", "x": 400, "y": 145, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues --since 1d", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a3", "x": 540, "y": 165, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd3", "x": 570, "y": 145, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs --since 1d", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a4", "x": 700, "y": 165, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd4", "x": 730, "y": 145, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who @me", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a5", "x": 830, "y": 165, "width": 40, "height": 0,
"points": [[0,0],[40,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-gap", "x": 870, "y": 140, "width": 100, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No @me\nNo feed", "fontSize": 14 } },
{ "type": "rectangle", "id": "h3-trigger", "x": 30, "y": 210, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H3: Incident\n\"Deploy broke prod\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a1", "x": 230, "y": 235, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd1", "x": 280, "y": 215, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "timeline deploy", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a2", "x": 410, "y": 235, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd2", "x": 440, "y": 215, "width": 160, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search deploy --mr", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a3", "x": 600, "y": 235, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd3", "x": 630, "y": 215, "width": 110, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs <iid>", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a4", "x": 740, "y": 235, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd4", "x": 770, "y": 215, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who --overlap", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-planning", "x": 20, "y": 310, "width": 960, "height": 190,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-planning-label", "x": 30, "y": 315, "text": "Planning & Assignment", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "h2-trigger", "x": 30, "y": 340, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H2: Sprint plan\n\"What's ready to pick?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h2-a1", "x": 230, "y": 365, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h2-cmd1", "x": 280, "y": 345, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues -s opened -l ready", "fontSize": 13 } },
{ "type": "arrow", "id": "h2-a2", "x": 450, "y": 365, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h2-cmd2", "x": 480, "y": 345, "width": 150, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues --has-due", "fontSize": 14 } },
{ "type": "arrow", "id": "h2-a3", "x": 630, "y": 365, "width": 230, "height": 0,
"points": [[0,0],[230,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h2-gap", "x": 860, "y": 340, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No\n--no-assignee", "fontSize": 14 } },
{ "type": "rectangle", "id": "h8-trigger", "x": 30, "y": 410, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H8: Assign work\n\"Who has bandwidth?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a1", "x": 230, "y": 435, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-cmd1", "x": 280, "y": 415, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @alice", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a2", "x": 400, "y": 435, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-cmd2", "x": 430, "y": 415, "width": 110, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @bob", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a3", "x": 540, "y": 435, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-cmd3", "x": 570, "y": 415, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @carol...", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a4", "x": 690, "y": 435, "width": 170, "height": 0,
"points": [[0,0],[170,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-gap", "x": 860, "y": 410, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No team\nworkload view", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-investigation", "x": 20, "y": 510, "width": 960, "height": 260,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-invest-label", "x": 30, "y": 515, "text": "Investigation & Understanding", "fontSize": 14, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "h7-trigger", "x": 30, "y": 540, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H7: Why this way?\n\"Understand a decision\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a1", "x": 230, "y": 565, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-cmd1", "x": 280, "y": 545, "width": 160, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search \"rationale\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a2", "x": 440, "y": 565, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-cmd2", "x": 470, "y": 545, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a3", "x": 610, "y": 565, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-cmd3", "x": 640, "y": 545, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues 234", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a4", "x": 740, "y": 565, "width": 120, "height": 0,
"points": [[0,0],[120,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-gap", "x": 860, "y": 540, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No per-note\nsearch", "fontSize": 14 } },
{ "type": "rectangle", "id": "h11-trigger", "x": 30, "y": 610, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H11: Bug lifecycle\n\"Why does #321 reopen?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h11-a1", "x": 230, "y": 635, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h11-cmd1", "x": 280, "y": 615, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues 321", "fontSize": 14 } },
{ "type": "arrow", "id": "h11-a2", "x": 400, "y": 635, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h11-cmd2", "x": 430, "y": 615, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "timeline ???", "fontSize": 14 } },
{ "type": "arrow", "id": "h11-a3", "x": 560, "y": 635, "width": 300, "height": 0,
"points": [[0,0],[300,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h11-gap", "x": 860, "y": 610, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No entity\ntimeline", "fontSize": 14 } },
{ "type": "rectangle", "id": "h14-trigger", "x": 30, "y": 680, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H14: Prior art?\n\"Was this tried before?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h14-a1", "x": 230, "y": 705, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h14-cmd1", "x": 280, "y": 685, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search \"memory leak\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h14-a2", "x": 450, "y": 705, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h14-cmd2", "x": 480, "y": 685, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "mrs --closed?", "fontSize": 14 } },
{ "type": "arrow", "id": "h14-a3", "x": 600, "y": 705, "width": 260, "height": 0,
"points": [[0,0],[260,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h14-gap", "x": 860, "y": 680, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No --state\non search", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-people", "x": 20, "y": 780, "width": 960, "height": 190,
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-people-label", "x": 30, "y": 785, "text": "People & Expertise", "fontSize": 14, "strokeColor": "#7048e8" },
{ "type": "rectangle", "id": "h4-trigger", "x": 30, "y": 810, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H4: Review prep\n\"Context for MR !789\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a1", "x": 230, "y": 835, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-cmd1", "x": 280, "y": 815, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs 789", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a2", "x": 380, "y": 835, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-cmd2", "x": 410, "y": 815, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who src/auth/", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a3", "x": 530, "y": 835, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-cmd3", "x": 560, "y": 815, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search \"auth\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a4", "x": 690, "y": 835, "width": 170, "height": 0,
"points": [[0,0],[170,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-gap", "x": 860, "y": 810, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No MR file\nlist output", "fontSize": 14 } },
{ "type": "rectangle", "id": "h6-trigger", "x": 30, "y": 880, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H6: Find reviewer\n\"Who should review?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a1", "x": 230, "y": 905, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-cmd1", "x": 280, "y": 885, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who src/auth/", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a2", "x": 410, "y": 905, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-cmd2", "x": 440, "y": 885, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who src/pay/", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a3", "x": 580, "y": 905, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-cmd3", "x": 610, "y": 885, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @candidate", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a4", "x": 750, "y": 905, "width": 110, "height": 0,
"points": [[0,0],[110,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-gap", "x": 860, "y": 880, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No multi-\npath query", "fontSize": 14 } },
{ "type": "text", "id": "callout-1", "x": 30, "y": 990, "text": "Pattern: Most human flows require 3-5 serial commands. Gap rate: 73% of flows have at least one gap.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "callout-2", "x": 30, "y": 1015, "text": "Top optimization: Composite commands (activity feed, team workload) would reduce multi-command flows by ~40%.", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "callout-3", "x": 30, "y": 1040, "text": "Top missing data: MR file changes and entity references are stored but invisible to CLI users.", "fontSize": 14, "strokeColor": "#ef4444" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown.


@@ -0,0 +1,204 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 320, "y": 15, "text": "AI Agent Flow Map", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 180, "y": 53, "text": "15 agent automation workflows. Agents need structured JSON (-J), exit codes, and field selection.", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "text", "id": "col-trigger", "x": 60, "y": 80, "text": "TRIGGER (Agent Goal)", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-flow", "x": 400, "y": 80, "text": "COMMAND PIPELINE", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-gap", "x": 880, "y": 80, "text": "BLOCKED BY", "fontSize": 16, "strokeColor": "#ef4444" },
{ "type": "rectangle", "id": "zone-context", "x": 20, "y": 110, "width": 960, "height": 200,
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-context-label", "x": 30, "y": 115, "text": "Context Gathering (pre-action)", "fontSize": 14, "strokeColor": "#7048e8" },
{ "type": "rectangle", "id": "a1-trigger", "x": 30, "y": 140, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A1: Pre-edit context\nAbout to modify files", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a1", "x": 230, "y": 165, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd1", "x": 280, "y": 145, "width": 80, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J health", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a2", "x": 360, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd2", "x": 380, "y": 145, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who src/auth/", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a3", "x": 520, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd3", "x": 540, "y": 145, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search \"auth\" -n 10", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a4", "x": 710, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd4", "x": 730, "y": 145, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who --overlap", "fontSize": 14 } },
{ "type": "rectangle", "id": "a6-trigger", "x": 30, "y": 210, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A6: Auto-assign reviewers\nBased on file expertise", "fontSize": 14 } },
{ "type": "arrow", "id": "a6-a1", "x": 230, "y": 235, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a6-cmd1", "x": 280, "y": 215, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "-J mrs 456", "fontSize": 14 } },
{ "type": "text", "id": "a6-block", "x": 390, "y": 218, "text": "file list not\nin response!", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "arrow", "id": "a6-a2", "x": 380, "y": 245, "width": 480, "height": -10,
"points": [[0,0],[480,-10]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a6-gap", "x": 860, "y": 210, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "MR files\nnot exposed", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-report", "x": 20, "y": 320, "width": 960, "height": 200,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-report-label", "x": 30, "y": 325, "text": "Reporting & Synthesis", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "a3-trigger", "x": 30, "y": 350, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A3: Sprint status report\n7 queries for 1 report", "fontSize": 14 } },
{ "type": "arrow", "id": "a3-a1", "x": 230, "y": 375, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a3-cmd1", "x": 280, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues -s closed", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd2", "x": 390, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues --status", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd3", "x": 500, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs -s merged", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd4", "x": 610, "y": 352, "width": 80, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs -s open", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd5", "x": 700, "y": 352, "width": 80, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "count x2", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd6", "x": 790, "y": 352, "width": 60, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who", "fontSize": 12 } },
{ "type": "arrow", "id": "a3-agap", "x": 850, "y": 370, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a3-gap", "x": 860, "y": 350, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No summary\ncommand", "fontSize": 14 } },
{ "type": "text", "id": "a3-note", "x": 280, "y": 395, "text": "7 sequential API calls for one report. A `lore summary` could reduce to 1.", "fontSize": 12, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "a7-trigger", "x": 30, "y": 430, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A7: Incident timeline\nPostmortem reconstruction", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a1", "x": 230, "y": 455, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd1", "x": 280, "y": 435, "width": 190, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a2", "x": 470, "y": 455, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd2", "x": 490, "y": 435, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search --since 3d", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a3", "x": 660, "y": 455, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd3", "x": 680, "y": 435, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J mrs -s merged", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-discover", "x": 20, "y": 530, "width": 960, "height": 200,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-discover-label", "x": 30, "y": 535, "text": "Discovery & Correlation", "fontSize": 14, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "a5-trigger", "x": 30, "y": 560, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A5: PR description\nFind related issues to link", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a1", "x": 230, "y": 585, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a5-cmd1", "x": 280, "y": 565, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search keywords", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a2", "x": 450, "y": 585, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a5-cmd2", "x": 470, "y": 565, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues --fields iid,url", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a3", "x": 650, "y": 585, "width": 210, "height": 0,
"points": [[0,0],[210,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a5-gap", "x": 860, "y": 560, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs\nquery", "fontSize": 14 } },
{ "type": "text", "id": "a5-note", "x": 280, "y": 612, "text": "Agent can't ask \"which issues does MR !456 close?\" -- entity_references data exists but isn't queryable.", "fontSize": 12, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "a11-trigger", "x": 30, "y": 640, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A11: Knowledge graph\nMap entity relationships", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a1", "x": 230, "y": 665, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a11-cmd1", "x": 280, "y": 645, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search -n 30", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a2", "x": 420, "y": 665, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a11-cmd2", "x": 440, "y": 645, "width": 190, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a3", "x": 630, "y": 665, "width": 230, "height": 0,
"points": [[0,0],[230,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a11-gap", "x": 860, "y": 640, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs\nquery", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-maint", "x": 20, "y": 740, "width": 960, "height": 160,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-maint-label", "x": 30, "y": 745, "text": "Maintenance & Cleanup", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "a9-trigger", "x": 30, "y": 770, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A9: Stale issue cleanup\nWeekly backlog hygiene", "fontSize": 14 } },
{ "type": "arrow", "id": "a9-a1", "x": 230, "y": 795, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a9-cmd1", "x": 280, "y": 775, "width": 200, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues --sort updated --asc", "fontSize": 12 } },
{ "type": "arrow", "id": "a9-a2", "x": 480, "y": 795, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a9-cmd2", "x": 500, "y": 775, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "filter client-side", "fontSize": 14 } },
{ "type": "arrow", "id": "a9-a3", "x": 620, "y": 795, "width": 240, "height": 0,
"points": [[0,0],[240,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a9-gap", "x": 860, "y": 770, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No --before\nNo offset", "fontSize": 14 } },
{ "type": "rectangle", "id": "a15-trigger", "x": 30, "y": 840, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A15: Conflict detect\n\"Safe to start work?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a1", "x": 230, "y": 865, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a15-cmd1", "x": 280, "y": 845, "width": 110, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues 123", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a2", "x": 390, "y": 865, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a15-cmd2", "x": 410, "y": 845, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who --overlap", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a3", "x": 540, "y": 865, "width": 320, "height": 0,
"points": [[0,0],[320,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a15-gap", "x": 860, "y": 840, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs +\n--state", "fontSize": 14 } },
{ "type": "text", "id": "callout-1", "x": 30, "y": 910, "text": "Agent-specific pain: Agents always use -J and --fields minimal for token efficiency. Every extra query burns tokens.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "callout-2", "x": 30, "y": 935, "text": "Biggest ROI: `lore refs` command would unblock A5, A11, A12, A15 instantly. Data already exists in entity_references table.", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "callout-3", "x": 30, "y": 960, "text": "Token waste: Sprint report (A3) requires 7 calls. A composite `lore summary` could save ~85% of tokens.", "fontSize": 14, "strokeColor": "#ef4444" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown. Size: 269 KiB

@@ -0,0 +1,203 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 280, "y": 15, "text": "Command Coverage Heatmap", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 220, "y": 53, "text": "Which commands serve which workflows? Darker = more essential to that flow.", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "text", "id": "col-issues", "x": 260, "y": 85, "text": "issues", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-mrs", "x": 330, "y": 85, "text": "mrs", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-search", "x": 390, "y": 85, "text": "search", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-who", "x": 465, "y": 85, "text": "who", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-timeline", "x": 520, "y": 85, "text": "timeline", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-sync", "x": 600, "y": 85, "text": "sync", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-count", "x": 660, "y": 85, "text": "count", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-status", "x": 720, "y": 85, "text": "status", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-missing", "x": 790, "y": 85, "text": "MISSING?", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "text", "id": "grp-human", "x": 15, "y": 108, "text": "HUMAN FLOWS", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "h1-label", "x": 15, "y": 135, "text": "H1 Standup prep", "fontSize": 14 },
{ "type": "rectangle", "id": "h1-issues", "x": 255, "y": 130, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-mrs", "x": 325, "y": 130, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-who", "x": 460, "y": 130, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-sync", "x": 595, "y": 130, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h1-gap", "x": 780, "y": 135, "text": "activity feed", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h2-label", "x": 15, "y": 170, "text": "H2 Sprint planning", "fontSize": 14 },
{ "type": "rectangle", "id": "h2-issues", "x": 255, "y": 165, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h2-count", "x": 655, "y": 165, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h2-gap", "x": 780, "y": 170, "text": "--no-assignee", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h3-label", "x": 15, "y": 205, "text": "H3 Incident response", "fontSize": 14 },
{ "type": "rectangle", "id": "h3-mrs", "x": 325, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-search", "x": 390, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-who", "x": 460, "y": 200, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-timeline", "x": 525, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-sync", "x": 595, "y": 200, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h4-label", "x": 15, "y": 240, "text": "H4 Code review prep", "fontSize": 14 },
{ "type": "rectangle", "id": "h4-mrs", "x": 325, "y": 235, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-search", "x": 390, "y": 235, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-who", "x": 460, "y": 235, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-timeline", "x": 525, "y": 235, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h4-gap", "x": 780, "y": 240, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h5-label", "x": 15, "y": 275, "text": "H5 Onboarding", "fontSize": 14 },
{ "type": "rectangle", "id": "h5-issues", "x": 255, "y": 270, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-mrs", "x": 325, "y": 270, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-search", "x": 390, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-who", "x": 460, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-timeline", "x": 525, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h6-label", "x": 15, "y": 310, "text": "H6 Find reviewer", "fontSize": 14 },
{ "type": "rectangle", "id": "h6-who", "x": 460, "y": 305, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h6-gap", "x": 780, "y": 310, "text": "multi-path who", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h7-label", "x": 15, "y": 345, "text": "H7 Why was this built?", "fontSize": 14 },
{ "type": "rectangle", "id": "h7-issues", "x": 255, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-mrs", "x": 325, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-search", "x": 390, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-timeline", "x": 525, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h7-gap", "x": 780, "y": 345, "text": "per-note search", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h8-label", "x": 15, "y": 380, "text": "H8 Team workload", "fontSize": 14 },
{ "type": "rectangle", "id": "h8-who", "x": 460, "y": 375, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h8-gap", "x": 780, "y": 380, "text": "team view", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h9-label", "x": 15, "y": 415, "text": "H9 Release notes", "fontSize": 14 },
{ "type": "rectangle", "id": "h9-issues", "x": 255, "y": 410, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h9-mrs", "x": 325, "y": 410, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h9-gap", "x": 780, "y": 415, "text": "mrs --milestone", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h10-label", "x": 15, "y": 450, "text": "H10 Stale issues", "fontSize": 14 },
{ "type": "rectangle", "id": "h10-issues", "x": 255, "y": 445, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h10-gap", "x": 780, "y": 450, "text": "--updated-before", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h11-label", "x": 15, "y": 485, "text": "H11 Bug lifecycle", "fontSize": 14 },
{ "type": "rectangle", "id": "h11-issues", "x": 255, "y": 480, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h11-timeline", "x": 525, "y": 480, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "text", "id": "h11-gap", "x": 780, "y": 485, "text": "entity timeline", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h12-label", "x": 15, "y": 520, "text": "H12 Who broke tests?", "fontSize": 14 },
{ "type": "rectangle", "id": "h12-search", "x": 390, "y": 515, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h12-who", "x": 460, "y": 515, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h13-label", "x": 15, "y": 555, "text": "H13 Feature tracking", "fontSize": 14 },
{ "type": "rectangle", "id": "h13-issues", "x": 255, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h13-mrs", "x": 325, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h13-timeline", "x": 525, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h14-label", "x": 15, "y": 590, "text": "H14 Prior art check", "fontSize": 14 },
{ "type": "rectangle", "id": "h14-search", "x": 390, "y": 585, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h14-timeline", "x": 525, "y": 585, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h14-gap", "x": 780, "y": 590, "text": "--state on search", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h15-label", "x": 15, "y": 625, "text": "H15 My discussions", "fontSize": 14 },
{ "type": "rectangle", "id": "h15-who", "x": 460, "y": 620, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "text", "id": "h15-gap", "x": 780, "y": 625, "text": "participant filter", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "rectangle", "id": "divider", "x": 10, "y": 655, "width": 910, "height": 2, "backgroundColor": "#dee2e6", "fillStyle": "solid" },
{ "type": "text", "id": "grp-agent", "x": 15, "y": 668, "text": "AI AGENT FLOWS", "fontSize": 14, "strokeColor": "#7048e8" },
{ "type": "text", "id": "a1-label", "x": 15, "y": 695, "text": "A1 Pre-edit context", "fontSize": 14 },
{ "type": "rectangle", "id": "a1-mrs", "x": 325, "y": 690, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a1-search", "x": 390, "y": 690, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a1-who", "x": 460, "y": 690, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a2-label", "x": 15, "y": 730, "text": "A2 Auto-triage", "fontSize": 14 },
{ "type": "rectangle", "id": "a2-issues", "x": 255, "y": 725, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a2-search", "x": 390, "y": 725, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a2-who", "x": 460, "y": 725, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a2-gap", "x": 780, "y": 730, "text": "detail --fields", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a3-label", "x": 15, "y": 765, "text": "A3 Sprint report", "fontSize": 14 },
{ "type": "rectangle", "id": "a3-issues", "x": 255, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a3-mrs", "x": 325, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a3-who", "x": 460, "y": 760, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a3-count", "x": 655, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a3-gap", "x": 780, "y": 765, "text": "summary cmd", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a4-label", "x": 15, "y": 800, "text": "A4 Prior art", "fontSize": 14 },
{ "type": "rectangle", "id": "a4-search", "x": 390, "y": 795, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a4-timeline", "x": 525, "y": 795, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a4-gap", "x": 780, "y": 800, "text": "per-note search", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a5-label", "x": 15, "y": 835, "text": "A5 PR description", "fontSize": 14 },
{ "type": "rectangle", "id": "a5-issues", "x": 255, "y": 830, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a5-search", "x": 390, "y": 830, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a5-gap", "x": 780, "y": 835, "text": "entity refs query", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a6-label", "x": 15, "y": 870, "text": "A6 Reviewer assign", "fontSize": 14 },
{ "type": "rectangle", "id": "a6-mrs", "x": 325, "y": 865, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a6-who", "x": 460, "y": 865, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a6-gap", "x": 780, "y": 870, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a7-label", "x": 15, "y": 905, "text": "A7 Incident timeline", "fontSize": 14 },
{ "type": "rectangle", "id": "a7-mrs", "x": 325, "y": 900, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a7-search", "x": 390, "y": 900, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a7-timeline", "x": 525, "y": 900, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a8-label", "x": 15, "y": 940, "text": "A8 Cross-project", "fontSize": 14 },
{ "type": "rectangle", "id": "a8-search", "x": 390, "y": 935, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a8-timeline", "x": 525, "y": 935, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a8-gap", "x": 780, "y": 940, "text": "group by project", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a9-label", "x": 15, "y": 975, "text": "A9 Stale cleanup", "fontSize": 14 },
{ "type": "rectangle", "id": "a9-issues", "x": 255, "y": 970, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a9-search", "x": 390, "y": 970, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a9-gap", "x": 780, "y": 975, "text": "--updated-before", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a10-label", "x": 15, "y": 1010, "text": "A10 Review context", "fontSize": 14 },
{ "type": "rectangle", "id": "a10-mrs", "x": 325, "y": 1005, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a10-who", "x": 460, "y": 1005, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a10-gap", "x": 780, "y": 1010, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a11-label", "x": 15, "y": 1045, "text": "A11 Knowledge graph", "fontSize": 14 },
{ "type": "rectangle", "id": "a11-search", "x": 390, "y": 1040, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a11-timeline", "x": 525, "y": 1040, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a11-gap", "x": 780, "y": 1045, "text": "entity refs query", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a12-label", "x": 15, "y": 1080, "text": "A12 Release check", "fontSize": 14 },
{ "type": "rectangle", "id": "a12-issues", "x": 255, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a12-mrs", "x": 325, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a12-who", "x": 460, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a12-gap", "x": 780, "y": 1080, "text": "mrs --milestone", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a13-label", "x": 15, "y": 1115, "text": "A13 What changed?", "fontSize": 14 },
{ "type": "rectangle", "id": "a13-issues", "x": 255, "y": 1110, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a13-mrs", "x": 325, "y": 1110, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a13-gap", "x": 780, "y": 1115, "text": "state-change filter", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a14-label", "x": 15, "y": 1150, "text": "A14 Meeting prep", "fontSize": 14 },
{ "type": "rectangle", "id": "a14-issues", "x": 255, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a14-mrs", "x": 325, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a14-who", "x": 460, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a14-count", "x": 655, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a14-gap", "x": 780, "y": 1150, "text": "summary cmd", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a15-label", "x": 15, "y": 1185, "text": "A15 Conflict detect", "fontSize": 14 },
{ "type": "rectangle", "id": "a15-issues", "x": 255, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a15-mrs", "x": 325, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a15-who", "x": 460, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a15-gap", "x": 780, "y": 1185, "text": "entity refs, --state", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "legend-title", "x": 15, "y": 1230, "text": "Legend:", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-essential", "x": 80, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "leg-essential-t", "x": 105, "y": 1230, "text": "Essential", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-supporting", "x": 190, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "leg-supporting-t", "x": 215, "y": 1230, "text": "Supporting", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-partial", "x": 310, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "text", "id": "leg-partial-t", "x": 335, "y": 1230, "text": "Partially blocked", "fontSize": 14 },
{ "type": "text", "id": "leg-gap-t", "x": 470, "y": 1230, "text": "Red text = gap", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "text", "id": "insight-1", "x": 15, "y": 1270, "text": "Key insight: `issues` and `search` are the workhorses (between them, used in 25 of 30 flows).", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "insight-2", "x": 15, "y": 1295, "text": "`who` is critical for people questions but siloed from file-change data.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "insight-3", "x": 15, "y": 1320, "text": "`timeline` is powerful but keyword-only seeding limits entity-specific queries.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "insight-4", "x": 15, "y": 1345, "text": "24/30 flows have at least one gap. Most gaps are filter additions, not new commands.", "fontSize": 14, "strokeColor": "#ef4444" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown. Size: 217 KiB

@@ -0,0 +1,110 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 300, "y": 20, "text": "Lore CLI Gap Priority Matrix", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 310, "y": 58, "text": "20 identified gaps plotted by impact vs effort", "fontSize": 16, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "q1-zone", "x": 100, "y": 120, "width": 500, "height": 380,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q1-label", "x": 110, "y": 126, "text": "QUICK WINS", "fontSize": 18, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "q2-zone", "x": 620, "y": 120, "width": 500, "height": 380,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q2-label", "x": 630, "y": 126, "text": "STRATEGIC", "fontSize": 18, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "q3-zone", "x": 100, "y": 520, "width": 500, "height": 300,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q3-label", "x": 110, "y": 526, "text": "FILL-IN", "fontSize": 18, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "q4-zone", "x": 620, "y": 520, "width": 500, "height": 300,
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#ef4444", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q4-label", "x": 630, "y": 526, "text": "DEPRIORITIZE", "fontSize": 18, "strokeColor": "#c92a2a" },
{ "type": "text", "id": "y-axis-hi", "x": 30, "y": 130, "text": "HIGH\nIMPACT", "fontSize": 16, "strokeColor": "#495057", "textAlign": "center" },
{ "type": "text", "id": "y-axis-lo", "x": 30, "y": 550, "text": "LOW\nIMPACT", "fontSize": 16, "strokeColor": "#495057", "textAlign": "center" },
{ "type": "text", "id": "x-axis-lo", "x": 280, "y": 840, "text": "LOW EFFORT", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "x-axis-hi", "x": 800, "y": 840, "text": "HIGH EFFORT", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "arrow", "id": "y-arrow", "x": 85, "y": 810, "width": 0, "height": -680,
"points": [[0,0],[0,-680]], "endArrowhead": "arrow", "strokeColor": "#495057", "strokeWidth": 1 },
{ "type": "arrow", "id": "x-arrow", "x": 85, "y": 810, "width": 1050, "height": 0,
"points": [[0,0],[1050,0]], "endArrowhead": "arrow", "strokeColor": "#495057", "strokeWidth": 1 },
{ "type": "rectangle", "id": "g5", "x": 120, "y": 160, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#5 @me alias", "fontSize": 16 } },
{ "type": "rectangle", "id": "g8", "x": 120, "y": 225, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#8 --state on search", "fontSize": 16 } },
{ "type": "rectangle", "id": "g9", "x": 120, "y": 290, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#9 mrs --milestone", "fontSize": 16 } },
{ "type": "rectangle", "id": "g10", "x": 120, "y": 355, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#10 --no-assignee", "fontSize": 16 } },
{ "type": "rectangle", "id": "g11", "x": 350, "y": 160, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#11 --updated-before", "fontSize": 16 } },
{ "type": "rectangle", "id": "g14", "x": 350, "y": 225, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#14 detail --fields", "fontSize": 16 } },
{ "type": "rectangle", "id": "g18", "x": 350, "y": 290, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#18 1y/12m duration", "fontSize": 16 } },
{ "type": "rectangle", "id": "g20", "x": 350, "y": 355, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#20 sort by due date", "fontSize": 16 } },
{ "type": "rectangle", "id": "g1", "x": 640, "y": 160, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#1 MR file changes", "fontSize": 16 } },
{ "type": "rectangle", "id": "g2", "x": 640, "y": 225, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#2 entity refs query", "fontSize": 16 } },
{ "type": "rectangle", "id": "g3", "x": 640, "y": 290, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#3 per-note search", "fontSize": 16 } },
{ "type": "rectangle", "id": "g4", "x": 880, "y": 160, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#4 entity timeline", "fontSize": 16 } },
{ "type": "rectangle", "id": "g6", "x": 880, "y": 225, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#6 activity feed", "fontSize": 16 } },
{ "type": "rectangle", "id": "g12", "x": 880, "y": 290, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#12 team workload", "fontSize": 16 } },
{ "type": "rectangle", "id": "g13", "x": 120, "y": 570, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "#13 pagination/offset", "fontSize": 16 } },
{ "type": "rectangle", "id": "g15", "x": 120, "y": 635, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "#15 group by project", "fontSize": 16 } },
{ "type": "rectangle", "id": "g19", "x": 120, "y": 700, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "#19 participant filter", "fontSize": 16 } },
{ "type": "rectangle", "id": "g7", "x": 640, "y": 570, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
"label": { "text": "#7 multi-path who", "fontSize": 16 } },
{ "type": "rectangle", "id": "g16", "x": 640, "y": 635, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
"label": { "text": "#16 trend metrics", "fontSize": 16 } },
{ "type": "rectangle", "id": "g17", "x": 640, "y": 700, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
"label": { "text": "#17 --for-issue on mrs", "fontSize": 16 } },
{ "type": "text", "id": "q1-count", "x": 180, "y": 430, "text": "8 gaps - lowest hanging fruit", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "q2-count", "x": 710, "y": 370, "text": "6 gaps - build deliberately", "fontSize": 14, "strokeColor": "#b45309" },
{ "type": "text", "id": "q3-count", "x": 160, "y": 770, "text": "3 gaps - fill as needed", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "q4-count", "x": 680, "y": 770, "text": "3 gaps - defer or rethink", "fontSize": 14, "strokeColor": "#c92a2a" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown.

@@ -0,0 +1,184 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 350, "y": 15, "text": "Lore Data Flow Architecture", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 280, "y": 53, "text": "Green = queryable via CLI | Red = stored but hidden | Gray = internal", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "zone-gitlab", "x": 30, "y": 90, "width": 200, "height": 300,
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 30 },
{ "type": "text", "id": "zone-gitlab-label", "x": 55, "y": 96, "text": "GitLab APIs", "fontSize": 16, "strokeColor": "#7048e8" },
{ "type": "rectangle", "id": "rest-api", "x": 50, "y": 130, "width": 160, "height": 60,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "REST API\n(paginated)", "fontSize": 16 } },
{ "type": "rectangle", "id": "graphql-api", "x": 50, "y": 210, "width": 160, "height": 60,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "GraphQL API\n(adaptive pages)", "fontSize": 16 } },
{ "type": "rectangle", "id": "ollama-api", "x": 50, "y": 310, "width": 160, "height": 60,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "Ollama\n(embeddings)", "fontSize": 16 } },
{ "type": "rectangle", "id": "zone-ingest", "x": 270, "y": 90, "width": 180, "height": 300,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 30 },
{ "type": "text", "id": "zone-ingest-label", "x": 300, "y": 96, "text": "Ingestion", "fontSize": 16, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "ingest-issues", "x": 285, "y": 130, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "Issue Sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "ingest-mrs", "x": 285, "y": 195, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "MR Sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "ingest-disc", "x": 285, "y": 260, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "Discussion Sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "ingest-events", "x": 285, "y": 325, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "Event Sync", "fontSize": 16 } },
{ "type": "arrow", "id": "a-rest-issues", "x": 210, "y": 155, "width": 75, "height": 0,
"points": [[0,0],[75,0]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "arrow", "id": "a-rest-mrs", "x": 210, "y": 165, "width": 75, "height": 50,
"points": [[0,0],[75,50]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "arrow", "id": "a-graphql-issues", "x": 210, "y": 240, "width": 75, "height": -80,
"points": [[0,0],[75,-80]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "rectangle", "id": "zone-sqlite", "x": 490, "y": 90, "width": 400, "height": 650,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-sqlite-label", "x": 570, "y": 96, "text": "SQLite (WAL mode)", "fontSize": 16, "strokeColor": "#15803d" },
{ "type": "text", "id": "grp-queryable", "x": 500, "y": 120, "text": "Queryable Tables", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "t-projects", "x": 500, "y": 145, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "projects", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-issues", "x": 500, "y": 195, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues + assignees", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-mrs", "x": 500, "y": 245, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "merge_requests", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-discussions", "x": 500, "y": 295, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "discussions + notes", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-events", "x": 500, "y": 345, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "resource_*_events", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-docs", "x": 500, "y": 395, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "documents + FTS5", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-embed", "x": 500, "y": 445, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "embeddings (vec)", "fontSize": 14 } },
{ "type": "text", "id": "grp-hidden", "x": 700, "y": 120, "text": "Hidden Tables", "fontSize": 14, "strokeColor": "#c92a2a" },
{ "type": "rectangle", "id": "t-file-changes", "x": 695, "y": 145, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "mr_file_changes", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-entity-refs", "x": 695, "y": 195, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "entity_references", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-raw", "x": 695, "y": 245, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "raw_payloads", "fontSize": 14 } },
{ "type": "text", "id": "grp-internal", "x": 700, "y": 310, "text": "Internal Only", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "t-sync", "x": 695, "y": 340, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
"label": { "text": "sync_runs + cursors", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-dirty", "x": 695, "y": 390, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
"label": { "text": "dirty_sources", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-locks", "x": 695, "y": 440, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
"label": { "text": "app_locks", "fontSize": 14 } },
{ "type": "arrow", "id": "a-ingest-tables", "x": 435, "y": 200, "width": 55, "height": 0,
"points": [[0,0],[55,0]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "rectangle", "id": "zone-cli", "x": 930, "y": 90, "width": 250, "height": 650,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "zone-cli-label", "x": 990, "y": 96, "text": "CLI Commands", "fontSize": 16, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "cmd-issues", "x": 950, "y": 130, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore issues", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-mrs", "x": 950, "y": 185, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore mrs", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-search", "x": 950, "y": 240, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore search", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-who", "x": 950, "y": 295, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore who", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-timeline", "x": 950, "y": 350, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore timeline", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-count", "x": 950, "y": 405, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore count", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-sync", "x": 950, "y": 460, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-status", "x": 950, "y": 515, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore status", "fontSize": 16 } },
{ "type": "arrow", "id": "a-issues-cmd", "x": 670, "y": 215, "width": 270, "height": -65,
"points": [[0,0],[270,-65]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-mrs-cmd", "x": 670, "y": 265, "width": 270, "height": -60,
"points": [[0,0],[270,-60]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-docs-cmd", "x": 670, "y": 415, "width": 270, "height": -155,
"points": [[0,0],[270,-155]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-embed-cmd", "x": 670, "y": 465, "width": 270, "height": -200,
"points": [[0,0],[270,-200]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-events-cmd", "x": 670, "y": 365, "width": 270, "height": 5,
"points": [[0,0],[270,5]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "text", "id": "hidden-note-1", "x": 695, "y": 498, "text": "mr_file_changes: populated by\nMR sync but NOT queryable.\nBlocks H4, A6, A10 flows.", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "text", "id": "hidden-note-2", "x": 695, "y": 568, "text": "entity_references: used by\ntimeline internally but NOT\nqueryable. Blocks A5, A11.", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "arrow", "id": "a-hidden-who", "x": 875, "y": 165, "width": 65, "height": 148,
"points": [[0,0],[65,148]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeWidth": 2,
"strokeStyle": "dashed" },
{ "type": "text", "id": "hidden-who-label", "x": 880, "y": 240, "text": "who uses\nDiffNotes,\nnot file\nchanges", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "arrow", "id": "a-hidden-timeline", "x": 875, "y": 215, "width": 65, "height": 155,
"points": [[0,0],[65,155]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeWidth": 2,
"strokeStyle": "dashed" },
{ "type": "rectangle", "id": "cmd-missing-refs", "x": 950, "y": 580, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
"label": { "text": "lore refs (missing)", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-missing-files", "x": 950, "y": 635, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
"label": { "text": "lore files (missing)", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-missing-activity", "x": 950, "y": 690, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
"label": { "text": "lore activity (missing)", "fontSize": 16 } },
{ "type": "text", "id": "legend-title", "x": 30, "y": 430, "text": "Legend", "fontSize": 16 },
{ "type": "rectangle", "id": "leg-green", "x": 30, "y": 460, "width": 20, "height": 20,
"backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "leg-green-t", "x": 60, "y": 462, "text": "Queryable via CLI", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-red", "x": 30, "y": 490, "width": 20, "height": 20,
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444" },
{ "type": "text", "id": "leg-red-t", "x": 60, "y": 492, "text": "Stored but hidden", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-gray", "x": 30, "y": 520, "width": 20, "height": 20,
"backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96" },
{ "type": "text", "id": "leg-gray-t", "x": 60, "y": 522, "text": "Internal bookkeeping", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-dashed", "x": 30, "y": 550, "width": 20, "height": 20,
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "text", "id": "leg-dashed-t", "x": 60, "y": 552, "text": "Missing command", "fontSize": 14 }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown.

@@ -0,0 +1,179 @@
# Deep Performance Audit Report
**Date:** 2026-02-12
**Branch:** `perf-audit` (e9bacc94)
**Parent:** `039ab1c2` (master, v0.6.1)
---
## Methodology
1. **Baseline** — measured p50/p95 latency for all major commands with warm cache
2. **Profile** — used macOS `sample` profiler and `EXPLAIN QUERY PLAN` to identify hotspots
3. **Golden output** — captured exact numeric outputs before changes as equivalence oracle
4. **One lever per change** — each optimization isolated and independently benchmarked
5. **Revert threshold** — any optimization <1.1x speedup reverted per audit rules
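The measurement and revert rules above can be sketched in a few lines of Rust. This is a minimal illustration, not the audit's actual tooling: `percentile`, `speedup`, and `should_keep` are hypothetical names, and the nearest-rank percentile method is an assumption, since the report does not say how p50/p95 were computed.

```rust
/// Nearest-rank percentile over latency samples in milliseconds.
/// `pct` is expected in 0.0..=100.0; returns None for an empty slice.
/// (Assumed method: the report does not state which one was used.)
fn percentile(samples: &[f64], pct: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: rank = ceil(pct / 100 * n), clamped to 1..=n.
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

/// Speedup factor of `after_ms` relative to `before_ms`.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

/// Hypothetical encoding of the revert rule: a change is kept only if it
/// reaches the 1.1x threshold named in step 5.
fn should_keep(before_ms: f64, after_ms: f64) -> bool {
    speedup(before_ms, after_ms) >= 1.1
}
```

With the figures reported later in this document, `should_keep(112.0, 66.0)` holds for the stats change, while `should_keep(1030.0, 995.0)` does not, which is consistent with the two-phase search being reverted.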
---
## Baseline Measurements (warm cache, release build)
| Command | Latency | Notes |
|---------|---------|-------|
| `who --path src/core/db.rs` (expert) | 2200ms | **Hotspot** |
| `who --active` | 83-93ms | Acceptable |
| `who workload` | 22ms | Fast |
| `stats` | 107-112ms | **Hotspot** |
| `search "authentication"` | 1030ms | **Hotspot** (library-level) |
| `list issues -n 50` | ~40ms | Fast |
---
## Optimization 1: INDEXED BY for DiffNote Queries
**Target:** `src/cli/commands/who.rs` — expert and reviews query paths
**Problem:** SQLite query planner chose `idx_notes_system` (38% selectivity, 106K rows) over `idx_notes_diffnote_path_created` (9.3% selectivity, 26K rows) for path-filtered DiffNote queries. The partial index `WHERE noteable_type = 'MergeRequest' AND type = 'DiffNote'` is far more selective but the planner's cost model didn't pick it.
**Change:** Added `INDEXED BY idx_notes_diffnote_path_created` to all 8 SQL queries across `query_expert`, `query_expert_details`, `query_reviews`, `build_path_query` (probes 1 & 2), and `suffix_probe`.
**Results:**
| Query | Before | After | Speedup |
|-------|--------|-------|---------|
| expert (specific path) | 2200ms | 56-58ms | **38x** |
| expert (broad path) | 2200ms | 83ms | **26x** |
| reviews | 1800ms | 24ms | **75x** |
**Isomorphism proof:** `INDEXED BY` only changes which index the planner uses, not the query semantics. Same rows matched, same ordering, same output. Verified by golden output comparison across 5+ runs.
---
## Optimization 2: Conditional Aggregates in Stats
**Target:** `src/cli/commands/stats.rs`
**Problem:** 12+ sequential `COUNT(*)` queries each requiring a full table scan of `documents` (61K rows). Each scan touched the same pages but couldn't share work.
**Changes:**
- Documents: 5 sequential COUNTs -> 1 query with `SUM(CASE WHEN ... THEN 1 END)`
- FTS count: `SELECT COUNT(*) FROM documents_fts` (virtual table, slow) -> `SELECT COUNT(*) FROM documents_fts_docsize` (shadow B-tree table, 19x faster)
- Embeddings: 2 queries -> 1 with `COUNT(DISTINCT document_id), COUNT(*)`
- Dirty sources: 2 queries -> 1 with conditional aggregates
- Pending fetches: 2 queries -> 1 each (discussions, dependents)
**Results:**
| Metric | Before | After | Speedup |
|--------|--------|-------|---------|
| Warm median | 112ms | 66ms | **1.70x** |
| Cold | 1220ms | ~700ms | ~1.7x |
**Golden output verified:**
```
total:61652, issues:8241, mrs:10018, discussions:43393, truncated:63
fts:61652, embedded:61652, chunks:88161
```
All values match exactly across before/after runs.
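**Isomorphism proof:** `SUM(CASE WHEN x THEN 1 END)` counts the same rows as `COUNT(*) ... WHERE x`. The two differ only when no row matches: `SUM` then yields `NULL` where `COUNT` yields 0, so a zero-match case needs `COALESCE(..., 0)` or an `ELSE 0` branch. The golden counts shown above are all nonzero. The FTS5 shadow table `documents_fts_docsize` has exactly one row per FTS document by SQLite specification, so `COUNT(*)` on it equals the virtual table count.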
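The single-scan idea behind the conditional aggregates can be illustrated in plain Rust, with an in-memory slice standing in for the `documents` table. This is an analogy only: the `Doc` struct, its field names, and the `source_type` values are assumptions for illustration, not the real schema or the code in `stats.rs`.

```rust
/// Stand-in for one row of the `documents` table (illustrative fields only).
struct Doc {
    source_type: &'static str,
    truncated: bool,
}

#[derive(Debug, Default, PartialEq)]
struct DocCounts {
    total: u64,
    issues: u64,
    mrs: u64,
    discussions: u64,
    truncated: u64,
}

/// One scan, five counters: the in-memory analogue of a single query with
/// `COUNT(*)` plus several `SUM(CASE WHEN ... THEN 1 END)` columns.
fn count_single_pass(docs: &[Doc]) -> DocCounts {
    let mut c = DocCounts::default();
    for d in docs {
        c.total += 1;
        match d.source_type {
            "issue" => c.issues += 1,
            "merge_request" => c.mrs += 1,
            "discussion" => c.discussions += 1,
            _ => {}
        }
        if d.truncated {
            c.truncated += 1;
        }
    }
    c
}

/// Separate scans per counter: the analogue of sequential `COUNT(*)` queries.
fn count_multi_pass(docs: &[Doc]) -> DocCounts {
    let by_type = |t: &str| docs.iter().filter(|d| d.source_type == t).count() as u64;
    DocCounts {
        total: docs.len() as u64,
        issues: by_type("issue"),
        mrs: by_type("merge_request"),
        discussions: by_type("discussion"),
        truncated: docs.iter().filter(|d| d.truncated).count() as u64,
    }
}
```

Both functions return identical counts for any input; the single-pass version simply touches each row once. Unlike SQL `SUM`, the Rust counters start at zero, so the `NULL`-on-no-match case does not arise here.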
---
## Investigation: Two-Phase FTS Search (REVERTED)
**Target:** `src/search/fts.rs`, `src/cli/commands/search.rs`
**Hypothesis:** FTS5 `snippet()` generation is expensive. Splitting search into Phase 1 (score-only MATCH+bm25) and Phase 2 (snippet for filtered results only) should reduce work.
**Implementation:** Created `fetch_fts_snippets()` that retrieves snippets only for post-filter document IDs via `json_each()` join.
**Results:**
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| search (limit 20) | 1030ms | 995ms | 3.5% |
**Decision:** Reverted. Per audit rules, <1.1x speedup does not justify added code complexity.
**Root cause:** The bottleneck is not snippet generation but `MATCH` + `bm25()` scoring itself. Profiling showed `strspn` (FTS5 tokenizer) and `memmove` as the top CPU consumers. The same query runs in 30ms on system sqlite3 but 1030ms in rusqlite's bundled SQLite, a ~34x gap (~125x when measured against the 8ms snippet-free run on system sqlite3), despite both being SQLite 3.51.x compiled at -O3.
---
## Library-Level Finding: Bundled SQLite FTS5 Performance
**Observation:** FTS5 MATCH+bm25 queries are ~34x slower in rusqlite's bundled SQLite than in system sqlite3 with snippets (1030ms vs 30ms), and ~125x slower than the snippet-free system run (8ms).
| Environment | Query Time | Notes |
|-------------|-----------|-------|
| System sqlite3 (macOS) | 30ms (with snippet), 8ms (without) | Same .db file |
| rusqlite bundled | 1030ms | `features = ["bundled"]`, OPT_LEVEL=3 |
**Profiler data (macOS `sample`):**
- Top hotspot: `strspn` in FTS5 tokenizer
- Secondary: `memmove` in FTS5 internals
- Scaling: ~33ms per additional result on top of ~330ms fixed cost (limit 5 = 497ms, limit 20 = 995ms)
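Taking the two quoted measurements (limit 5 = 497ms, limit 20 = 995ms) at face value, a straight-line fit separates per-result cost from fixed overhead. The sketch below shows the arithmetic; `fit_line` is a hypothetical helper, and two data points cannot confirm that scaling is actually linear.

```rust
/// Straight-line fit through two (limit, latency_ms) measurements.
/// Returns (per_result_ms, fixed_overhead_ms).
fn fit_line(p1: (f64, f64), p2: (f64, f64)) -> (f64, f64) {
    // Slope: extra milliseconds per extra result.
    let slope = (p2.1 - p1.1) / (p2.0 - p1.0);
    // Intercept: latency extrapolated to zero results.
    let intercept = p1.1 - slope * p1.0;
    (slope, intercept)
}
```

For `fit_line((5.0, 497.0), (20.0, 995.0))` the slope is (995 - 497) / 15 = 33.2 ms per result and the intercept is 497 - 5 × 33.2 = 331 ms.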
**Possible causes:**
- Bundled SQLite compiled without platform-specific optimizations (SIMD, etc.)
- Different memory allocator behavior
- Missing compile-time tuning flags
**Recommendation for future:** Investigate switching from `features = ["bundled"]` to system SQLite linkage, or audit the bundled compile flags in the `libsqlite3-sys` build script.
---
## Exploration Agent Findings (Informational)
Four parallel exploration agents surveyed the entire codebase. Key findings beyond what was already addressed:
### Ingestion Pipeline
- Serial DB writes in async context (acceptable — rusqlite is synchronous)
- Label ingestion uses individual inserts (potential batch optimization, low priority)
### CLI / GitLab Client
- GraphQL client recreated per call (`client.rs:98-100`) — caches connection pool, minor
- Double JSON deserialization in GraphQL responses — medium priority
- N+1 subqueries in `list` command (`list.rs:408-423`) — 4 correlated subqueries per row
### Search / Embedding
- No N+1 patterns, no O(n^2) algorithms
- Chunking is O(n) single-pass with proper UTF-8 safety
- Ollama concurrency model is sound (parallel HTTP, serial DB writes)
### Database / Documents
- O(n^2) prefix sum in `truncation.rs` — low traffic path
- String allocation patterns in extractors — micro-optimization territory
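For context on the prefix-sum finding, the difference between a quadratic and a linear prefix sum can be sketched as follows. This is a generic illustration; the function names are hypothetical and the code does not reproduce what `truncation.rs` actually does.

```rust
/// Quadratic version: re-sums every prefix from scratch, O(n^2) overall.
fn prefix_sums_quadratic(lens: &[usize]) -> Vec<usize> {
    (0..lens.len())
        .map(|i| lens[..=i].iter().sum::<usize>())
        .collect()
}

/// Linear version: carries a running total forward, O(n) overall.
fn prefix_sums_linear(lens: &[usize]) -> Vec<usize> {
    let mut total = 0usize;
    lens.iter()
        .map(|&len| {
            total += len;
            total
        })
        .collect()
}
```

Both return the same vector for any input, so the linear form is a drop-in replacement wherever only the cumulative lengths are needed.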
---
## Opportunity Matrix
| Candidate | Impact | Confidence | Effort | Score | Status |
|-----------|--------|------------|--------|-------|--------|
| INDEXED BY for DiffNote | Very High | High | Low | **9.0** | Shipped |
| Stats conditional aggregates | Medium | High | Low | **7.0** | Shipped |
| Bundled SQLite FTS5 | Very High | Medium | High | 5.0 | Documented |
| List N+1 subqueries | Medium | Medium | Medium | 4.0 | Backlog |
| GraphQL double deser | Low | Medium | Low | 3.5 | Backlog |
| Truncation O(n^2) | Low | High | Low | 3.0 | Backlog |
---
## Files Modified
| File | Change |
|------|--------|
| `src/cli/commands/who.rs` | INDEXED BY hints on 8 SQL queries |
| `src/cli/commands/stats.rs` | Conditional aggregates, FTS5 shadow table, merged queries |
---
## Quality Gates
- All 603 tests pass
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean
- Golden output verified for both optimizations

@@ -0,0 +1,202 @@
No `## Rejected Recommendations` section appears in the plan you pasted, so the revisions below are all net-new.
1. **Add an explicit “Bridge Contract” and fix scope inconsistency**
Analysis: The plan says “Three changes” but defines four. More importantly, identifier requirements are scattered. A single contract section prevents drift and makes every new read surface prove it can drive a write call.
```diff
@@
-**Scope**: Three changes, delivered in order:
+**Scope**: Four workstreams, delivered in order:
1. Add `gitlab_discussion_id` to notes output
2. Add `gitlab_discussion_id` to show command discussion groups
3. Add a standalone `discussions` list command
4. Fix robot-docs to list actual field names instead of opaque type references
+
+## Bridge Contract (Cross-Cutting)
+Every read payload that surfaces notes/discussions MUST include:
+- `project_path`
+- `noteable_type`
+- `parent_iid`
+- `gitlab_discussion_id`
+- `gitlab_note_id` (when note-level data is returned)
+This contract is required so agents can deterministically construct `glab api` write calls.
```
2. **Normalize identifier naming now (break ambiguous names)**
Analysis: Current `id`/`gitlab_id` naming is ambiguous in mixed payloads. Rename to explicit `note_id` and `gitlab_note_id` now (you explicitly don't care about backward compatibility). This reduces automation mistakes.
```diff
@@ 1b. Add field to `NoteListRow`
-pub struct NoteListRow {
- pub id: i64,
- pub gitlab_id: i64,
+pub struct NoteListRow {
+ pub note_id: i64, // local DB id
+ pub gitlab_note_id: i64, // GitLab note id
@@
@@ 1c. Add field to `NoteListRowJson`
-pub struct NoteListRowJson {
- pub id: i64,
- pub gitlab_id: i64,
+pub struct NoteListRowJson {
+ pub note_id: i64,
+ pub gitlab_note_id: i64,
@@
-#### 2f. Add `gitlab_note_id` to note detail structs in show
-While we're here, add `gitlab_id` to `NoteDetail`, `MrNoteDetail`, and their JSON
+#### 2f. Add `gitlab_note_id` to note detail structs in show
+While we're here, add `gitlab_note_id` to `NoteDetail`, `MrNoteDetail`, and their JSON
counterparts.
```
3. **Stop positional column indexing for these changes**
Analysis: In `list.rs`, row extraction is positional (`row.get(18)`, etc.). Adding fields is fragile and easy to break silently. Use named aliases and named lookup for robustness.
```diff
@@ 1a/1b SQL + query_map
- p.path_with_namespace AS project_path
+ p.path_with_namespace AS project_path,
+ d.gitlab_discussion_id AS gitlab_discussion_id
@@
- project_path: row.get(18)?,
- gitlab_discussion_id: row.get(19)?,
+ project_path: row.get("project_path")?,
+ gitlab_discussion_id: row.get("gitlab_discussion_id")?,
```
4. **Redesign `discussions` query to avoid correlated subquery fanout**
Analysis: Proposed query uses many correlated subqueries per row. That's acceptable for tiny MR-scoped sets, but degrades for project-wide scans. Use a base CTE + one rollup pass over notes.
```diff
@@ 3c. SQL Query
-SELECT
- d.id,
- ...
- (SELECT COUNT(*) FROM notes n2 WHERE n2.discussion_id = d.id AND n2.is_system = 0) AS note_count,
- (SELECT n3.author_username FROM notes n3 WHERE n3.discussion_id = d.id ORDER BY n3.position LIMIT 1) AS first_author,
- ...
-FROM discussions d
+WITH base AS (
+ SELECT d.id, d.gitlab_discussion_id, d.noteable_type, d.project_id, d.issue_id, d.merge_request_id,
+ d.individual_note, d.first_note_at, d.last_note_at, d.resolvable, d.resolved
+ FROM discussions d
+ {where_sql}
+),
+note_rollup AS (
+ SELECT n.discussion_id,
+ COUNT(*) FILTER (WHERE n.is_system = 0) AS user_note_count,
+ COUNT(*) AS total_note_count,
+ MIN(CASE WHEN n.is_system = 0 THEN n.position END) AS first_user_pos
+ FROM notes n
+ JOIN base b ON b.id = n.discussion_id
+ GROUP BY n.discussion_id
+)
+SELECT ...
+FROM base b
+LEFT JOIN note_rollup r ON r.discussion_id = b.id
```
5. **Add explicit index work for new access patterns**
Analysis: Existing indexes are good but not ideal for new list patterns (`project + last_note`, note position ordering inside discussion). Add migration entries to keep latency stable.
```diff
@@ ## 3. Add Standalone `discussions` List Command
+#### 3h. Add migration for discussion-list performance
+**File**: `migrations/027_discussions_list_indexes.sql`
+```sql
+CREATE INDEX IF NOT EXISTS idx_discussions_project_last_note
+ ON discussions(project_id, last_note_at DESC, id DESC);
+CREATE INDEX IF NOT EXISTS idx_discussions_project_first_note
+ ON discussions(project_id, first_note_at DESC, id DESC);
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
+ ON notes(discussion_id, position);
+```
```
6. **Add keyset pagination (critical for agent workflows)**
Analysis: `--limit` alone is not enough for automation over large datasets. Add cursor-based pagination with deterministic sort keys and `next_cursor` in JSON.
```diff
@@ 3a. CLI Args
+ /// Keyset cursor from previous response
+ #[arg(long, help_heading = "Output")]
+ pub cursor: Option<String>,
@@
@@ Response Schema
- "total_count": 15,
- "showing": 15
+ "total_count": 15,
+ "showing": 15,
+ "next_cursor": "eyJsYXN0X25vdGVfYXQiOjE3MDAwMDAwMDAwMDAsImlkIjoxMjN9"
@@
@@ Validation Criteria
+7. `lore -J discussions ... --cursor <token>` returns the next stable page without duplicates/skips
```
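The keyset idea can be sketched over an in-memory slice. This is a minimal illustration under stated assumptions: `Cursor`, `Discussion`, and `page` are hypothetical names, the sort key `(last_note_at DESC, id DESC)` is borrowed from the index proposal in item 5, and token encoding of the cursor is left out entirely.

```rust
/// Position of the last row already returned. The pair is unique per row,
/// so the sort order is total and pages stay stable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Cursor {
    last_note_at: i64,
    id: i64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Discussion {
    id: i64,
    last_note_at: i64,
}

/// Returns one page plus the cursor for the next page (None when exhausted).
/// `rows` must already be sorted by (last_note_at DESC, id DESC).
fn page(
    rows: &[Discussion],
    after: Option<Cursor>,
    limit: usize,
) -> (Vec<Discussion>, Option<Cursor>) {
    let fetched: Vec<Discussion> = rows
        .iter()
        // Keyset predicate: strictly after the cursor in sort order, the
        // analogue of `WHERE (last_note_at, id) < (?, ?)` in SQL.
        .filter(|d| match after {
            Some(c) => (d.last_note_at, d.id) < (c.last_note_at, c.id),
            None => true,
        })
        .take(limit + 1) // one extra row tells us whether another page exists
        .cloned()
        .collect();
    let has_more = fetched.len() > limit;
    let items: Vec<Discussion> = fetched.into_iter().take(limit).collect();
    let next = if has_more {
        items.last().map(|d| Cursor { last_note_at: d.last_note_at, id: d.id })
    } else {
        None
    };
    (items, next)
}
```

Because each page resumes strictly after the last returned key rather than at a numeric offset, rows inserted or removed ahead of the cursor cannot cause duplicates or skips in later pages.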
7. **Fix semantic ambiguities in discussion summary fields**
Analysis: `note_count` is ambiguous, and `first_author` can accidentally be a system note author. Make fields explicit and consistent with non-system default behavior.
```diff
@@ Response Schema
- "note_count": 3,
- "first_author": "elovegrove",
+ "user_note_count": 3,
+ "total_note_count": 4,
+ "first_user_author": "elovegrove",
@@
@@ 3d. Filters struct / path behavior
-- `path` → `EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id AND n.position_new_path LIKE ?)`
+- `path` → match on BOTH `position_new_path` and `position_old_path` (exact/prefix)
```
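The proposed field semantics can be made concrete with a small in-memory rollup. This is a sketch only: the `Note` and `Rollup` structs are illustrative stand-ins, not the project's types, and the real computation would live in SQL.

```rust
/// Illustrative stand-in for a row of the `notes` table.
struct Note {
    position: i64,
    is_system: bool,
    author_username: &'static str,
}

#[derive(Debug, PartialEq)]
struct Rollup {
    user_note_count: usize,
    total_note_count: usize,
    first_user_author: Option<&'static str>,
}

/// Summarizes one discussion's notes with the explicit field names.
fn rollup(notes: &[Note]) -> Rollup {
    // Non-system notes only; system notes never count as "user" notes.
    let user_notes = || notes.iter().filter(|n| !n.is_system);
    Rollup {
        user_note_count: user_notes().count(),
        total_note_count: notes.len(),
        // Author of the lowest-positioned non-system note, if any exists.
        first_user_author: user_notes()
            .min_by_key(|n| n.position)
            .map(|n| n.author_username),
    }
}
```

A discussion that opens with a system note followed by three user notes yields `user_note_count: 3`, `total_note_count: 4`, and the first human author rather than the system note's author. A discussion containing only system notes yields `first_user_author: None`.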
8. **Enrich show outputs with actionable thread metadata**
Analysis: Adding only discussion id helps, but agents still need thread state and note ids to pick targets correctly. Add `resolvable`, `resolved`, `last_note_at_iso`, and `gitlab_note_id` in show discussion payloads.
```diff
@@ 2a/2b show discussion structs
pub struct DiscussionDetailJson {
pub gitlab_discussion_id: String,
+ pub resolvable: bool,
+ pub resolved: bool,
+ pub last_note_at_iso: String,
pub notes: Vec<NoteDetailJson>,
@@
pub struct NoteDetailJson {
+ pub gitlab_note_id: i64,
pub author_username: String,
```
9. **Harden robot-docs against schema drift with tests**
Analysis: Static JSON in `main.rs` will drift again. Add a lightweight contract test that asserts docs include required fields for `notes`, `discussions`, and show payloads.
```diff
@@ 4. Fix Robot-Docs Response Schemas
+#### 4f. Add robot-docs contract tests
+**File**: `src/main.rs` (or dedicated test module)
+- Assert `robot-docs` contains `gitlab_discussion_id` and `gitlab_note_id` in:
+ - `notes.response_schema`
+ - `issues.response_schema.show`
+ - `mrs.response_schema.show`
+ - `discussions.response_schema`
```
10. **Adjust delivery order to reduce rework and include missing CSV path**
Analysis: In your sample `handle_discussions`, `csv` is declared in args but not handled. Also, robot-docs should land after all payload changes. Sequence should minimize churn.
```diff
@@ Delivery Order
-3. **Change 4** (robot-docs) — depends on 1 and 2 being done so schemas are accurate.
-4. **Change 3** (discussions command) — largest change, depends on 1 for design consistency.
+3. **Change 3** (discussions command + indexes + pagination) — largest change.
+4. **Change 4** (robot-docs + contract tests) — last, after payloads are final.
@@ 3e. Handler wiring
- match format {
+ match format {
"json" => ...
"jsonl" => ...
+ "csv" => print_list_discussions_csv(&result),
_ => ...
}
```
If you want, I can produce a single consolidated revised plan markdown with these edits applied so you can drop it in directly.

@@ -0,0 +1,162 @@
The best non-rejected upgrades I'd make to this plan are below. They focus on reducing schema drift, making robot output safer to consume, and improving performance behavior at scale.
1. Add a shared contract model and field constants first (before workstreams 1-4)
Rationale: Right now each command has its own structs and ad-hoc mapping. That is exactly how drift happens. A single contract definition reused by `notes`, `show`, `discussions`, and robot-docs gives compile-time coupling between output payloads and docs. It also makes future fields cheaper and safer to add.
```diff
@@ Scope: Four workstreams, delivered in order:
-1. Add `gitlab_discussion_id` to notes output
-2. Add `gitlab_discussion_id` to show command discussion groups
-3. Add a standalone `discussions` list command
-4. Fix robot-docs to list actual field names instead of opaque type references
+0. Introduce shared Bridge Contract model/constants used by notes/show/discussions/robot-docs
+1. Add `gitlab_discussion_id` to notes output
+2. Add `gitlab_discussion_id` to show command discussion groups
+3. Add a standalone `discussions` list command
+4. Fix robot-docs to list actual field names instead of opaque type references
+## 0. Shared Contract Model (Cross-Cutting)
+Define canonical required-field constants and shared mapping helpers, then consume them in:
+- `src/cli/commands/list.rs`
+- `src/cli/commands/show.rs`
+- `src/cli/robot.rs`
+- `src/main.rs` robot-docs builder
+This removes duplicated field-name strings and prevents docs/output mismatch.
```
2. Make bridge fields “non-droppable” in robot mode
Rationale: The current plan adds fields, but `--fields` can still remove them. That breaks the core read/write bridge contract in exactly the workflows this change is trying to fix. In robot mode, contract fields should always be force-included.
```diff
@@ ## Bridge Contract (Cross-Cutting)
Every read payload that surfaces notes or discussions **MUST** include:
- `project_path`
- `noteable_type`
- `parent_iid`
- `gitlab_discussion_id`
- `gitlab_note_id` (when note-level data is returned — i.e., in notes list and show detail)
+### Field Filtering Guardrail
+In robot mode, `filter_fields` must force-include Bridge Contract fields even when users pass a narrower `--fields` list.
+Human/table mode keeps existing behavior.
```
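A minimal sketch of the guardrail, assuming the `--fields` list reaches the filter as string slices. The helper name `effective_fields` is hypothetical and the constant mirrors the Bridge Contract list above; the real `filter_fields` signature may differ.

```rust
/// Bridge Contract fields for note-level payloads (names taken from the plan).
const BRIDGE_FIELDS_NOTES: &[&str] = &[
    "project_path",
    "noteable_type",
    "parent_iid",
    "gitlab_discussion_id",
    "gitlab_note_id",
];

/// Hypothetical helper: returns the field list that is actually applied.
/// In robot mode every contract field is appended if the caller omitted it;
/// human/table mode returns the requested list unchanged.
pub fn effective_fields(requested: &[&str], robot_mode: bool) -> Vec<String> {
    let mut fields: Vec<String> = requested.iter().map(|f| (*f).to_string()).collect();
    if robot_mode {
        for required in BRIDGE_FIELDS_NOTES {
            if !fields.iter().any(|f| f.as_str() == *required) {
                fields.push((*required).to_string());
            }
        }
    }
    fields
}
```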
3. Replace correlated subqueries in `discussions` rollup with a single-pass window/aggregate pattern
Rationale: Your CTE is better than naive fanout, but it still uses multiple correlated sub-selects per discussion for first author/body/path. At 200K+ discussions this can regress badly depending on cache/index state. A window-ranked `notes` CTE with grouped aggregates is usually faster and more predictable in SQLite.
```diff
@@ #### 3c. SQL Query
-Core query uses a CTE + rollup to avoid correlated subquery fanout on larger result sets:
+Core query uses a CTE + ranked-notes rollup (window function) to avoid per-row correlated subqueries:
-WITH filtered_discussions AS (...),
-note_rollup AS (
- SELECT
- n.discussion_id,
- SUM(...) AS note_count,
- (SELECT ... LIMIT 1) AS first_author,
- (SELECT ... LIMIT 1) AS first_note_body,
- (SELECT ... LIMIT 1) AS position_new_path,
- (SELECT ... LIMIT 1) AS position_new_line
- FROM notes n
- ...
-)
+WITH filtered_discussions AS (...),
+ranked_notes AS (
+ SELECT
+ n.*,
+ ROW_NUMBER() OVER (PARTITION BY n.discussion_id ORDER BY n.position, n.id) AS rn
+ FROM notes n
+ WHERE n.discussion_id IN (SELECT id FROM filtered_discussions)
+),
+note_rollup AS (
+ SELECT
+ discussion_id,
+ SUM(CASE WHEN is_system = 0 THEN 1 ELSE 0 END) AS note_count,
+ MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
+ MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
+ MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
+ MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+ FROM ranked_notes
+ GROUP BY discussion_id
+)
```
4. Add direct GitLab ID filters for deterministic bridging
Rationale: Bridge workflows often start from one known ID. You already have `gitlab_note_id` in notes filters, but discussion filtering still looks internal-ID-centric. Add explicit GitLab-ID filters so agents do not need extra translation calls.
```diff
@@ #### 3a. CLI Args
pub struct DiscussionsArgs {
+ /// Filter by GitLab discussion ID
+ #[arg(long, help_heading = "Filters")]
+ pub gitlab_discussion_id: Option<String>,
@@
@@ #### 3d. Filters struct
pub struct DiscussionListFilters {
+ pub gitlab_discussion_id: Option<String>,
@@
}
```
```diff
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+#### 1g. Add `--gitlab-discussion-id` filter to notes
+Allow filtering notes directly by GitLab thread ID (not only internal discussion ID).
+This enables one-hop note retrieval from external references.
```
5. Add optional note expansion to `discussions` for fewer round-trips
Rationale: Today the agent flow is often `discussions -> show`. Optional embedded notes (`--include-notes N`) gives a fast path for “list unresolved threads with latest context” without forcing full show payloads.
```diff
@@ ### Design
lore -J discussions --for-mr 99 --resolution unresolved
+lore -J discussions --for-mr 99 --resolution unresolved --include-notes 2
@@ #### 3a. CLI Args
+ /// Include up to N latest notes per discussion (0 = none)
+ #[arg(long, default_value = "0", help_heading = "Output")]
+ pub include_notes: usize,
```
6. Upgrade robot-docs from string blobs to structured schema + explicit contract block
Rationale: `contains("gitlab_discussion_id")` tests on schema strings are brittle. A structured schema object gives machine-checked docs and reliable test assertions. Add a contract section for agent consumers.
```diff
@@ ## 4. Fix Robot-Docs Response Schemas
-#### 4a. Notes response_schema
-Replace stringly-typed schema snippets...
+#### 4a. Notes response_schema (structured)
+Represent response fields as JSON objects (field -> type/nullable), not freeform strings.
+#### 4g. Add `bridge_contract` section in robot-docs
+Publish canonical required fields per entity:
+- notes
+- discussions
+- show.discussions
+- show.notes
```
7. Strengthen validation: add CLI-level contract tests and perf guardrails
Rationale: Most current tests are unit-level struct/query checks. Add end-to-end JSON contract tests via command handlers, plus a benchmark-style regression test (ignored by default) so performance work stays intentional.
```diff
@@ ## Validation Criteria
8. Bridge Contract fields (...) are present in every applicable read payload
+9. Contract fields remain present even with `--fields` in robot mode
+10. `discussions` query meets performance guardrail on representative fixture (documented threshold)
@@ ### Tests
+#### Test: robot-mode fields cannot drop bridge contract keys
+Run notes/discussions JSON output through `filter_fields` path and assert required keys remain.
+
+#### Test: CLI contract integration
+Invoke command handlers for `notes`, `discussions`, `mrs <iid>`, parse JSON, assert required keys and types.
+
+#### Test (ignored): large-fixture performance regression
+Generate representative fixture and assert `query_discussions` stays under target elapsed time.
```
If you want, I can now produce a full “v2 plan” document that applies these diffs end-to-end (including revised delivery order and complete updated sections).

@@ -0,0 +1,147 @@
1. **Make `gitlab_note_id` explicit in all note-level payloads without breaking existing consumers**
Rationale: Your Bridge Contract already requires `gitlab_note_id`, but current plan keeps `gitlab_id` only in `notes` list while adding `gitlab_note_id` only in `show`. That forces agents to special-case commands. Add `gitlab_note_id` as an alias field everywhere note-level data appears, while keeping `gitlab_id` for compatibility.
```diff
@@ Bridge Contract (Cross-Cutting)
-Every read payload that surfaces notes or discussions MUST include:
+Every read payload that surfaces notes or discussions MUST include:
- project_path
- noteable_type
- parent_iid
- gitlab_discussion_id
- gitlab_note_id (when note-level data is returned — i.e., in notes list and show detail)
+ - Back-compat rule: note payloads may continue exposing `gitlab_id`, but MUST also expose `gitlab_note_id` with the same value.
@@ 1. Add `gitlab_discussion_id` to Notes Output
-#### 1c. Add field to `NoteListRowJson`
+#### 1c. Add fields to `NoteListRowJson`
+Add `gitlab_note_id` alias in addition to existing `gitlab_id` (no rename, no breakage).
@@ 1f. Update `--fields minimal` preset
-"notes" => ["id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
+"notes" => ["id", "gitlab_note_id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
```
2. **Avoid duplicate flag semantics for discussion filtering**
Rationale: `notes` already has `--discussion-id` and it already maps to `d.gitlab_discussion_id`. Adding a second independent flag/field (`--gitlab-discussion-id`) increases complexity and precedence bugs. Keep one backing filter field and make the new flag an alias.
```diff
@@ 1g. Add `--gitlab-discussion-id` filter to notes
-Allow filtering notes directly by GitLab discussion thread ID...
+Normalize discussion ID flags:
+- Keep one backing filter field (`discussion_id`)
+- Support both `--discussion-id` (existing) and `--gitlab-discussion-id` (alias)
+- If both are provided, clap should reject as duplicate/alias conflict
```
3. **Add ambiguity guardrails for cross-project discussion IDs**
Rationale: `gitlab_discussion_id` is unique per project, not globally. Filtering by discussion ID without project can return multiple rows across repos, which breaks deterministic write bridging. Fail fast with an `Ambiguous` error and actionable fix (`--project`).
```diff
@@ Bridge Contract (Cross-Cutting)
+### Ambiguity Guardrail
+When filtering by `gitlab_discussion_id` without `--project`, if multiple projects match:
+- return `Ambiguous` error
+- include matching project paths in message
+- suggest retry with `--project <path>`
```
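The guardrail could be sketched roughly as follows. `LookupError` is a hypothetical stand-in for the project's own error type, and the function assumes the caller has already collected the distinct project paths that contain the requested discussion ID.

```rust
/// Hypothetical error type standing in for the project's own error enum.
#[derive(Debug, PartialEq)]
pub enum LookupError {
    Ambiguous(String),
}

/// `matching_projects` holds the distinct project paths that contain the
/// requested discussion ID (for example, from a `SELECT DISTINCT` query).
pub fn check_discussion_ambiguity(
    gitlab_discussion_id: &str,
    project_filter: Option<&str>,
    matching_projects: &[String],
) -> Result<(), LookupError> {
    // An explicit --project already disambiguates the lookup.
    if project_filter.is_some() || matching_projects.len() <= 1 {
        return Ok(());
    }
    Err(LookupError::Ambiguous(format!(
        "discussion {} matches multiple projects: {}. Retry with --project <path>",
        gitlab_discussion_id,
        matching_projects.join(", ")
    )))
}
```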
4. **Replace `--include-notes` N+1 retrieval with one batched top-N query**
Rationale: The current plan's per-discussion follow-up query scales poorly and creates latency spikes. Use a single window-function query over selected discussion IDs and group rows in Rust. This is both faster and more predictable.
```diff
@@ 3c-ii. Note expansion query (--include-notes)
-When `include_notes > 0`, after the main discussion query, run a follow-up query per discussion...
+When `include_notes > 0`, run one batched query:
+WITH ranked_notes AS (
+ SELECT
+ n.*,
+ d.gitlab_discussion_id,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY n.created_at DESC, n.id DESC
+ ) AS rn
+ FROM notes n
+ JOIN discussions d ON d.id = n.discussion_id
+ WHERE n.discussion_id IN ( ...selected discussion ids... )
+)
+SELECT ... FROM ranked_notes WHERE rn <= ?
+ORDER BY discussion_id, rn;
+
+Group by `discussion_id` in Rust and attach notes arrays without per-thread round-trips.
```
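The Rust-side grouping step might look like the sketch below. `RankedNoteRow` is a hypothetical row shape with only the columns needed for illustration; the rows are assumed to arrive already ordered by `discussion_id, rn`, as in the query above.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape for one result of the ranked-notes query.
#[derive(Debug, Clone, PartialEq)]
pub struct RankedNoteRow {
    pub discussion_id: i64,
    pub rn: u32,
    pub body: String,
}

/// Groups rows by discussion in one pass, preserving the incoming `rn` order
/// within each discussion. No per-thread follow-up query is needed.
pub fn group_by_discussion(rows: Vec<RankedNoteRow>) -> BTreeMap<i64, Vec<RankedNoteRow>> {
    let mut grouped: BTreeMap<i64, Vec<RankedNoteRow>> = BTreeMap::new();
    for row in rows {
        grouped.entry(row.discussion_id).or_default().push(row);
    }
    grouped
}
```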
5. **Add hard output guardrails and explicit truncation metadata**
Rationale: `--limit` and `--include-notes` are unbounded today. For robot workflows this can accidentally generate huge payloads. Cap values and surface effective limits plus truncation state in `meta`.
```diff
@@ 3a. CLI Args
- pub limit: usize,
+ pub limit: usize, // clamp to max (e.g., 500)
- pub include_notes: usize,
+ pub include_notes: usize, // clamp to max (e.g., 20)
@@ Response Schema
- "meta": { "elapsed_ms": 12 }
+ "meta": {
+ "elapsed_ms": 12,
+ "effective_limit": 50,
+ "effective_include_notes": 2,
+ "has_more": true
+ }
```
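One way to sketch the clamping and the `has_more` flag is shown below. The caps reuse the example values from the diff (500 and 20). The "fetch `limit + 1` rows" trick for detecting truncation is an assumption of this sketch, not something the plan specifies, and all names are illustrative.

```rust
/// Example caps taken from the "e.g." values in the diff above.
pub const MAX_LIMIT: usize = 500;
pub const MAX_INCLUDE_NOTES: usize = 20;

#[derive(Debug, PartialEq)]
pub struct EffectiveLimits {
    pub limit: usize,
    pub include_notes: usize,
}

/// Clamps user-supplied values so robot callers cannot request huge payloads.
pub fn clamp_limits(limit: usize, include_notes: usize) -> EffectiveLimits {
    EffectiveLimits {
        limit: limit.min(MAX_LIMIT),
        include_notes: include_notes.min(MAX_INCLUDE_NOTES),
    }
}

/// Assumes the query fetched up to `limit + 1` rows: an extra row means more
/// data exists, and it is dropped before the page is returned.
pub fn split_page<T>(mut rows: Vec<T>, limit: usize) -> (Vec<T>, bool) {
    let has_more = rows.len() > limit;
    rows.truncate(limit);
    (rows, has_more)
}
```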
6. **Strengthen deterministic ordering and null handling**
Rationale: `first_note_at`, `last_note_at`, and note `position` can be null/incomplete during partial sync states. Add null-safe ordering to avoid unstable output and flaky automation.
```diff
@@ 2c. Update queries to SELECT new fields
-... ORDER BY first_note_at
+... ORDER BY COALESCE(first_note_at, last_note_at, 0), id
@@ show note query
-ORDER BY position
+ORDER BY COALESCE(position, 9223372036854775807), created_at, id
@@ 3c. SQL Query
-ORDER BY {sort_column} {order}
+ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
```
7. **Make write-bridging more useful with optional command hints**
Rationale: Exposing IDs is necessary but not sufficient; agents still need to assemble endpoints repeatedly. Add optional `--with-write-hints` that injects compact endpoint templates (`reply`, `resolve`) derived from row context. This improves usability without bloating default output.
```diff
@@ 3a. CLI Args
+ /// Include machine-actionable glab write hints per row
+ #[arg(long, help_heading = "Output")]
+ pub with_write_hints: bool,
@@ Response Schema (notes/discussions/show)
+ "write_hints?": {
+ "reply_endpoint": "string",
+ "resolve_endpoint?": "string"
+ }
```
8. **Upgrade robot-docs/contract validation from string-contains to parity checks**
Rationale: `contains("gitlab_discussion_id")` catches very little and allows schema drift. Build field-set parity tests that compare actual serialized JSON keys to robot-docs declared fields for `notes`, `discussions`, and `show` discussion nodes.
```diff
@@ 4f. Add robot-docs contract tests
-assert!(notes_schema.contains("gitlab_discussion_id"));
+let declared = parse_schema_field_list(notes_schema);
+let sample = sample_notes_row_json_keys();
+assert_required_subset(&declared, &["project_path","noteable_type","parent_iid","gitlab_discussion_id","gitlab_note_id"]);
+assert_schema_matches_payload(&declared, &sample);
@@ 4g. Add CLI-level contract integration tests
+Add parity tests for:
+- notes list JSON
+- discussions list JSON
+- issues show discussions[*]
+- mrs show discussions[*]
```
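A parity check of this kind could be sketched with plain set differences. `schema_drift` and `SchemaDrift` are hypothetical names; the sketch assumes both the declared schema and the serialized payload have already been reduced to flat lists of key names, and it does not handle optional-field markers such as a trailing `?`.

```rust
use std::collections::BTreeSet;

/// Result of comparing robot-docs declared fields with real payload keys.
#[derive(Debug, PartialEq)]
pub struct SchemaDrift {
    /// Declared in robot-docs but absent from the serialized payload.
    pub missing_from_payload: Vec<String>,
    /// Present in the payload but never declared in robot-docs.
    pub undeclared_in_docs: Vec<String>,
}

/// Computes drift in both directions; both lists are empty when docs and
/// payload agree exactly. Results are sorted because BTreeSet iterates in order.
pub fn schema_drift(declared: &[&str], payload_keys: &[&str]) -> SchemaDrift {
    let declared: BTreeSet<&str> = declared.iter().copied().collect();
    let actual: BTreeSet<&str> = payload_keys.iter().copied().collect();
    SchemaDrift {
        missing_from_payload: declared.difference(&actual).map(|s| s.to_string()).collect(),
        undeclared_in_docs: actual.difference(&declared).map(|s| s.to_string()).collect(),
    }
}
```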
If you want, I can produce a full revised v3 plan text with these edits merged end-to-end so it's ready to execute directly.

@@ -0,0 +1,207 @@
Below are the highest-impact revisions I'd make to this plan. I excluded everything listed in your `## Rejected Recommendations` section.
**1. Fix a correctness bug in the ambiguity guardrail (must run before `LIMIT`)**
The current post-query ambiguity check can silently fail when `--limit` truncates results to one project even though multiple projects match the same `gitlab_discussion_id`. That creates non-deterministic write targeting risk.
```diff
@@ ## Ambiguity Guardrail
-**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project`
-was provided, check if the result set spans multiple `project_path` values.
+**Implementation**: Run a preflight distinct-project check when `gitlab_discussion_id` is set
+and `--project` was not provided, before the main list query applies `LIMIT`.
+Use:
+```sql
+SELECT DISTINCT p.path_with_namespace
+FROM discussions d
+JOIN projects p ON p.id = d.project_id
+WHERE d.gitlab_discussion_id = ?
+LIMIT 3
+```
+If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with project
+paths and suggestion to retry with `--project <path>`.
```
---
**2. Add `gitlab_project_id` to the Bridge Contract**
`project_path` is human-friendly but mutable (renames/transfers). `gitlab_project_id` gives a stable write target and avoids path re-resolution failures.
```diff
@@ ## Bridge Contract (Cross-Cutting)
Every read payload that surfaces notes or discussions **MUST** include:
- `project_path`
+- `gitlab_project_id`
- `noteable_type`
- `parent_iid`
- `gitlab_discussion_id`
- `gitlab_note_id`
@@
const BRIDGE_FIELDS_NOTES: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id", "gitlab_note_id",
];
const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id",
];
```
---
**3. Replace stringly-typed filter/sort fields with enums end-to-end**
Right now `sort`, `order`, `resolution`, `noteable_type` are mostly `String`. This is fragile and risks unsafe SQL interpolation drift over time. Typed enums make invalid states unrepresentable.
```diff
@@ ## 3a. CLI Args
- pub resolution: Option<String>,
+ pub resolution: Option<ResolutionFilter>,
@@
- pub noteable_type: Option<String>,
+ pub noteable_type: Option<NoteableTypeFilter>,
@@
- pub sort: String,
+ pub sort: DiscussionSortField,
@@
- pub asc: bool,
+ pub order: SortDirection,
@@ ## 3d. Filters struct
- pub resolution: Option<String>,
- pub noteable_type: Option<String>,
- pub sort: String,
- pub order: String,
+ pub resolution: Option<ResolutionFilter>,
+ pub noteable_type: Option<NoteableTypeFilter>,
+ pub sort: DiscussionSortField,
+ pub order: SortDirection,
@@
+Map enum -> SQL fragment via `match` in query builder; never interpolate raw strings.
```
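The enum-to-SQL mapping could be sketched like this. The enum names come from the diff above, but the variants (`FirstNote`, `LastNote`) and the column names are assumptions based on the `first_note_at` / `last_note_at` fields mentioned elsewhere in the plan. Only fixed `&'static str` fragments ever reach the query text.

```rust
/// Variants are illustrative; the plan only names the enum types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DiscussionSortField {
    FirstNote,
    LastNote,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortDirection {
    Asc,
    Desc,
}

impl DiscussionSortField {
    /// Maps each variant to a fixed column; no user string is interpolated.
    pub fn sql_column(self) -> &'static str {
        match self {
            DiscussionSortField::FirstNote => "fd.first_note_at",
            DiscussionSortField::LastNote => "fd.last_note_at",
        }
    }
}

impl SortDirection {
    pub fn sql_keyword(self) -> &'static str {
        match self {
            SortDirection::Asc => "ASC",
            SortDirection::Desc => "DESC",
        }
    }
}

/// Builds the ORDER BY clause with an `id` tie-breaker for stable output.
pub fn order_by_clause(sort: DiscussionSortField, order: SortDirection) -> String {
    format!(
        "ORDER BY COALESCE({0}, 0) {1}, fd.id {1}",
        sort.sql_column(),
        order.sql_keyword()
    )
}
```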
---
**4. Enforce snapshot consistency for multi-query commands**
`discussions` with `--include-notes` does multiple reads. Without a single read transaction, concurrent ingest can produce mismatched `total_count`, row set, and expanded notes.
```diff
@@ ## 3c. SQL Query
-pub fn query_discussions(...)
+pub fn query_discussions(...)
{
+ // Run count query + page query + note expansion under one deferred read transaction
+ // so output is a single consistent snapshot.
+ let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
...
+ tx.commit()?;
}
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+Apply the same snapshot rule to `query_notes` when returning `total_count` + paged rows.
```
---
**5. Correct first-note rollup semantics (current CTE can return null/incorrect `first_author`)**
In the proposed SQL, `rn=1` is computed over all notes but then filtered with `is_system=0`, so threads with a leading system note may incorrectly lose `first_author`/snippet. Also path rollup uses non-deterministic `MAX(...)`.
```diff
@@ ## 3c. SQL Query
-ranked_notes AS (
+ranked_notes AS (
SELECT
n.discussion_id,
n.author_username,
n.body,
n.is_system,
n.position_new_path,
n.position_new_line,
- ROW_NUMBER() OVER (
- PARTITION BY n.discussion_id
- ORDER BY n.position, n.id
- ) AS rn
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.is_system = 0 THEN 0 ELSE 1 END, n.created_at, n.id
+ ) AS rn_first_note,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.position_new_path IS NULL THEN 1 ELSE 0 END, n.created_at, n.id
+ ) AS rn_first_position
@@
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
- MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
- MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN author_username END) AS first_author,
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN body END) AS first_note_body,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_path END) AS position_new_path,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_line END) AS position_new_line
```
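To make the intended semantics concrete, here is an in-memory equivalent of the corrected `rn_first_note` ranking. It is only an illustration of the ordering rule, not a replacement for the SQL; the `Note` struct is a hypothetical reduction of the real row type.

```rust
/// Hypothetical reduced note row with only the columns the rollup needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Note {
    pub id: i64,
    pub created_at: i64,
    pub is_system: bool,
    pub author_username: String,
}

/// Mirrors `ORDER BY CASE WHEN is_system = 0 THEN 0 ELSE 1 END, created_at, id`
/// followed by `rn_first_note = 1 AND is_system = 0`: user notes sort ahead of
/// system notes, so a leading system note no longer hides the first author.
pub fn first_author(notes: &[Note]) -> Option<&str> {
    notes
        .iter()
        .min_by_key(|n| (n.is_system, n.created_at, n.id))
        .filter(|n| !n.is_system)
        .map(|n| n.author_username.as_str())
}
```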
---
**6. Add per-discussion truncation signals for `--include-notes`**
Top-level `has_more` is useful, but agents also need to know if an individual thread's notes were truncated. Otherwise they can't tell if a thread is complete.
```diff
@@ ## Response Schema
{
"gitlab_discussion_id": "...",
...
- "notes": []
+ "included_note_count": 0,
+ "has_more_notes": false,
+ "notes": []
}
@@ ## 3b. Domain Structs
pub struct DiscussionListRowJson {
@@
+ pub included_note_count: usize,
+ pub has_more_notes: bool,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub notes: Vec<NoteListRowJson>,
}
@@ ## 3c-ii. Note expansion query (--include-notes)
-Group by `discussion_id` in Rust and attach notes arrays...
+Group by `discussion_id` in Rust, attach notes arrays, and set:
+`included_note_count = notes.len()`,
+`has_more_notes = note_count > included_note_count`.
```
---
**7. Add explicit query-plan gate and targeted index workstream (measured, not speculative)**
This plan introduces heavy discussion-centric reads. You should bake in deterministic performance validation with `EXPLAIN QUERY PLAN` and only then add indexes if missing.
```diff
@@ ## Scope: Four workstreams, delivered in order:
-4. Fix robot-docs to list actual field names instead of opaque type references
+4. Add query-plan validation + targeted index updates for new discussion queries
+5. Fix robot-docs to list actual field names instead of opaque type references
@@
+## 4. Query-Plan Validation and Targeted Indexes
+
+Before and after implementing `query_discussions`, capture `EXPLAIN QUERY PLAN` for:
+- `--for-mr <iid> --resolution unresolved`
+- `--project <path> --since 7d --sort last_note`
+- `--gitlab-discussion-id <id>`
+
+If plans show table scans on `notes`/`discussions`, add indexes in `MIGRATIONS` array:
+- `discussions(project_id, gitlab_discussion_id)`
+- `discussions(merge_request_id, last_note_at, id)`
+- `notes(discussion_id, created_at DESC, id DESC)`
+- `notes(discussion_id, position, id)`
+
+Tests: assert the new query paths return expected rows under indexed schema and no regressions.
```
---
If you want, I can produce a single consolidated “iteration 4” version of the plan text with all seven revisions merged in place.

@@ -0,0 +1,160 @@
I reviewed the plan end-to-end and focused only on new improvements (none of the items in `## Rejected Recommendations` are re-proposed).
1. Add direct `--discussion-id` retrieval paths
Rationale: This removes a full discovery hop for the exact workflow that failed (replying to a known thread). It also reduces ambiguity and query cost when an agent already has the thread ID.
```diff
@@ Core Changes
| 7 | Fix robot-docs to list actual field names | Docs | Small |
+| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |
@@ Change 3: Add Standalone `discussions` List Command
lore -J discussions --for-mr 99 --cursor <token> # keyset pagination
+lore -J discussions --discussion-id 6a9c1750b37d... # direct lookup
@@ 3a. CLI Args
+ #[arg(long, conflicts_with_all = ["for_issue", "for_mr"], help_heading = "Filters")]
+ pub discussion_id: Option<String>,
@@ Change 1: Add `gitlab_discussion_id` to Notes Output
+Add `--discussion-id <hex>` filter to `notes` for direct note retrieval within one thread.
```
2. Add a shared filter compiler to eliminate count/query drift
Rationale: The plan currently repeats filters across data query, `total_count`, and `incomplete_rows` count queries. That is a classic reliability bug source. A single compiled filter object makes count semantics provably consistent.
```diff
@@ Count Semantics (Cross-Cutting Convention)
+## Filter Compiler (NEW, Cross-Cutting Convention)
+All list commands must build predicates via a shared `CompiledFilters` object that emits:
+- SQL predicate fragment
+- bind parameters
+- canonical filter string (for cursor hash)
+The same compiled object is reused by:
+- page data query
+- `total_count` query
+- `incomplete_rows` query
```
3. Harden keyset pagination semantics for `DESC`, limits, and client ergonomics
Rationale: `(sort_value, id) > (?, ?)` is only correct for ascending order. Descending sort needs `<`. Also add explicit `has_more` so clients don't infer from cursor nullability.
```diff
@@ Keyset Pagination (Cross-Cutting, Change B)
-```sql
-WHERE (sort_value, id) > (?, ?)
-```
+Use comparator by order:
+- ASC: `(sort_value, id) > (?, ?)`
+- DESC: `(sort_value, id) < (?, ?)`
@@ 3a. CLI Args
+ #[arg(short = 'n', long = "limit", default_value = "50", value_parser = clap::value_parser!(usize).range(1..=500), help_heading = "Output")]
+ pub limit: usize,
@@ Response Schema
- "next_cursor": "aW...xyz=="
+ "next_cursor": "aW...xyz==",
+ "has_more": true
```
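The comparator rule can be checked with a small in-memory model of keyset paging. This is a sketch under the assumption that rows are keyed by `(sort_value, id)` tuples; Rust compares tuples lexicographically, which matches the row-value comparison in the SQL above. `next_page` is a hypothetical name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortDirection {
    Asc,
    Desc,
}

/// Returns the next page of `(sort_value, id)` keys strictly after `cursor`.
/// Ascending order keeps keys greater than the cursor; descending order keeps
/// keys less than it. Using `>` for both would repeat or skip rows on DESC.
pub fn next_page(
    rows: &[(i64, i64)],
    cursor: Option<(i64, i64)>,
    order: SortDirection,
    limit: usize,
) -> Vec<(i64, i64)> {
    let mut sorted = rows.to_vec();
    sorted.sort();
    if order == SortDirection::Desc {
        sorted.reverse();
    }
    sorted
        .into_iter()
        .filter(|key| match (cursor, order) {
            (None, _) => true,
            (Some(c), SortDirection::Asc) => *key > c,
            (Some(c), SortDirection::Desc) => *key < c,
        })
        .take(limit)
        .collect()
}
```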
4. Add DB-level entity integrity invariants (not just response invariants)
Rationale: Response-side filtering is good, but DB correctness should also be guarded. This prevents silent corruption and bad joins from ingestion or future migrations.
```diff
@@ Contract Invariants (NEW)
+### Entity Integrity Invariants (DB + Ingest)
+1. `discussions` must belong to exactly one parent (`issue_id XOR merge_request_id`).
+2. `discussions.noteable_type` must match the populated parent column.
+3. Natural-key uniqueness is enforced where valid:
+ - `(project_id, gitlab_discussion_id)` unique for discussions.
+4. Ingestion must reject/quarantine rows violating invariants and report counts.
@@ Supporting Indexes (Cross-Cutting, Change D)
+CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
+ ON discussions(project_id, gitlab_discussion_id);
```
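On the ingest side, invariants 1 and 2 could be checked with something like the sketch below. `DiscussionRow` is a hypothetical reduced row, and the `noteable_type` values `"Issue"` and `"MergeRequest"` are assumed to match what the ingester stores; adjust if the stored values differ.

```rust
/// Hypothetical reduced row carrying only the columns the invariants touch.
#[derive(Debug, Clone, PartialEq)]
pub struct DiscussionRow {
    pub issue_id: Option<i64>,
    pub merge_request_id: Option<i64>,
    pub noteable_type: String,
}

/// Checks `issue_id XOR merge_request_id` and that `noteable_type` agrees
/// with whichever parent column is populated.
pub fn validate_parent(row: &DiscussionRow) -> Result<(), &'static str> {
    match (row.issue_id, row.merge_request_id, row.noteable_type.as_str()) {
        (Some(_), None, "Issue") => Ok(()),
        (None, Some(_), "MergeRequest") => Ok(()),
        (Some(_), Some(_), _) => Err("discussion has two parents"),
        (None, None, _) => Err("discussion has no parent"),
        _ => Err("noteable_type does not match the populated parent column"),
    }
}
```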
5. Switch bulk note loading to streaming grouping (avoid large intermediate vecs)
Rationale: Current bulk strategy still materializes all notes before grouping. Streaming into the map cuts peak memory and improves large-MR stability.
```diff
@@ Change 2e. Constructor — use bulk notes map
-let all_note_rows: Vec<MrNoteDetail> = ... // From bulk query above
-let notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> =
- all_note_rows.into_iter().fold(HashMap::new(), |mut map, note| {
- map.entry(note.discussion_id).or_insert_with(Vec::new).push(note);
- map
- });
+let mut notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> = HashMap::new();
+for row in bulk_note_stmt.query_map(params, map_note_row)? {
+ let note = row?;
+ notes_by_discussion.entry(note.discussion_id).or_default().push(note);
+}
```
6. Make freshness tri-state (`fresh|stale|unknown`) and fail closed on unknown with `--require-fresh`
Rationale: `stale: bool` alone cannot represent “never synced / unknown project freshness.” For write safety, unknown freshness should be explicit and reject under freshness constraints.
```diff
@@ Freshness Metadata & Staleness Guards
pub struct ResponseMeta {
pub elapsed_ms: i64,
pub data_as_of_iso: String,
pub sync_lag_seconds: i64,
pub stale: bool,
+ pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub freshness_reason: Option<String>,
pub incomplete_rows: i64,
@@
-if sync_lag_seconds > max_age_secs {
+if freshness_state == "unknown" || sync_lag_seconds > max_age_secs {
```
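A typed version of the tri-state could be sketched as follows. Modeling "never synced" as `None` for the sync lag is an assumption of this sketch, as are the names `classify_freshness` and `require_fresh`; the string values match the ones listed in the diff.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FreshnessState {
    Fresh,
    Stale,
    Unknown,
}

impl FreshnessState {
    /// Serialized form, matching the values named in the plan.
    pub fn as_str(self) -> &'static str {
        match self {
            FreshnessState::Fresh => "fresh",
            FreshnessState::Stale => "stale",
            FreshnessState::Unknown => "unknown",
        }
    }
}

/// `None` means no sync timestamp is known (assumption of this sketch).
pub fn classify_freshness(sync_lag_seconds: Option<i64>, max_age_secs: i64) -> FreshnessState {
    match sync_lag_seconds {
        None => FreshnessState::Unknown,
        Some(lag) if lag > max_age_secs => FreshnessState::Stale,
        Some(_) => FreshnessState::Fresh,
    }
}

/// Fails closed: both `stale` and `unknown` are rejected.
pub fn require_fresh(state: FreshnessState) -> Result<(), String> {
    match state {
        FreshnessState::Fresh => Ok(()),
        other => Err(format!("freshness check failed: state is {}", other.as_str())),
    }
}
```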
7. Tune indexes to match actual ORDER BY paths in window queries
Rationale: `idx_notes_discussion_position` is likely insufficient for the two window orderings. A covering-style index aligned with partition/order keys reduces random table lookups.
```diff
@@ Supporting Indexes (Cross-Cutting, Change D)
--- Notes: window function ORDER BY (discussion_id, position) for ROW_NUMBER()
-CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
- ON notes(discussion_id, position);
+-- Notes: support dual ROW_NUMBER() orderings and reduce table lookups
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_window
+ ON notes(discussion_id, is_system, position, created_at, gitlab_id);
```
8. Add a phased rollout gate before strict exclusion becomes default
Rationale: Enforcing `gitlab_* IS NOT NULL` immediately can hide data if existing rows are incomplete. A short observation gate prevents sudden regressions while preserving the end-state contract.
```diff
@@ Delivery Order
+Batch 0: Observability gate (NEW)
+- Ship `incomplete_rows` and freshness meta first
+- Measure incomplete rate across real datasets
+- If incomplete ratio <= threshold, enable strict exclusion defaults
+- If above threshold, block rollout and fix ingestion quality first
+
Change 1 (notes output) ──┐
```
9. Add property-based invariants for pagination/count correctness
Rationale: Your current tests are scenario-based and good, but randomized property tests are much better at catching edge-case cursor/count bugs.
```diff
@@ Tests (Change 3 / Change B)
+**Test 12**: Property-based pagination invariants (`proptest`)
+```rust
+#[test]
+fn prop_discussion_cursor_no_overlap_no_gap_under_random_data() { /* ... */ }
+```
+
+**Test 13**: Property-based count invariants
+```rust
+#[test]
+fn prop_total_count_and_incomplete_rows_match_filter_partition() { /* ... */ }
+```
```
If you want, I can now produce a fully consolidated “Plan v4” that applies these diffs cleanly into your original document so it reads as a single coherent spec.

@@ -0,0 +1,158 @@
I reviewed the whole plan and only proposed changes that are not in your `## Rejected Recommendations`.
1. **Fix plan-internal inconsistencies first**
Analysis: The plan currently has a few self-contradictions (`8` vs `9` cross-cutting improvements, `stale` still referenced after moving to tri-state freshness). Cleaning this prevents implementation drift and bad AC validation.
```diff
--- a/plan.md
+++ b/plan.md
@@
-**Scope**: 8 core changes + 8 cross-cutting architectural improvements across 3 tiers:
+**Scope**: 8 core changes + 9 cross-cutting architectural improvements across 3 tiers:
@@ AC-7: Freshness Metadata Present & Staleness Guards Work
-lore -J notes -n 1 | jq '.meta | {data_as_of_iso, sync_lag_seconds, stale}'
-# All fields present, stale=false if recently synced
+lore -J notes -n 1 | jq '.meta | {data_as_of_iso, sync_lag_seconds, freshness_state}'
+# All fields present, freshness_state is one of fresh|stale|unknown
@@ Change 6 Response Schema example
- "stale": false,
+ "freshness_state": "fresh",
```
2. **Require snapshot-consistent list responses (page + counts)**
Analysis: `total_count`, `incomplete_rows`, and page rows can drift if sync writes between queries. Enforcing a single read snapshot for all list commands makes pagination and counts deterministic.
```diff
--- a/plan.md
+++ b/plan.md
@@ Count Semantics (Cross-Cutting Convention)
All list commands use consistent count fields:
+All three queries (`page`, `total_count`, `incomplete_rows`) MUST execute inside one read transaction/snapshot.
+This guarantees count/page consistency under concurrent sync writes.
```
3. **Use RAII transactions instead of manual `BEGIN/COMMIT`**
Analysis: Manual `execute_batch("BEGIN...")` is fragile on early returns. `rusqlite::Transaction` guarantees rollback on error and removes transaction-leak risk.
```diff
--- a/plan.md
+++ b/plan.md
@@ Change 2: Consistency guarantee
-conn.execute_batch("BEGIN DEFERRED")?;
-// ... discussion query ...
-// ... bulk note query ...
-conn.execute_batch("COMMIT")?;
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
+// ... discussion query ...
+// ... bulk note query ...
+tx.commit()?;
```
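The benefit of the RAII form can be illustrated with a small stand-in. `MockTx` below is not `rusqlite::Transaction`; it is a hypothetical guard that records statements in a log, used only to show that an early return triggers a rollback without any explicit cleanup code.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a transaction guard; records what would be sent.
pub struct MockTx {
    log: Rc<RefCell<Vec<String>>>,
    committed: bool,
}

impl MockTx {
    pub fn begin(log: Rc<RefCell<Vec<String>>>) -> Self {
        log.borrow_mut().push("BEGIN".to_string());
        MockTx { log, committed: false }
    }

    /// Consumes the guard so it cannot be used after commit.
    pub fn commit(mut self) {
        self.log.borrow_mut().push("COMMIT".to_string());
        self.committed = true;
    }
}

impl Drop for MockTx {
    /// Any path that drops the guard without committing rolls back.
    fn drop(&mut self) {
        if !self.committed {
            self.log.borrow_mut().push("ROLLBACK".to_string());
        }
    }
}

/// Simulates the discussion + bulk note queries with an optional failure.
pub fn run_queries(log: Rc<RefCell<Vec<String>>>, fail: bool) -> Result<(), String> {
    let tx = MockTx::begin(log);
    if fail {
        // Early return: `tx` is dropped here, which logs ROLLBACK.
        return Err("query failed".to_string());
    }
    tx.commit();
    Ok(())
}
```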
4. **Allow small focused new modules for query infrastructure**
Analysis: Keeping everything in `list.rs`/`show.rs` will become a maintenance hotspot as filters/cursors/freshness expand. A small module split reduces coupling and regression risk.
```diff
--- a/plan.md
+++ b/plan.md
@@ Change 3: File Architecture
-**No new files.** Follow existing patterns:
+Allow focused infra modules for shared logic:
+- `src/cli/query/filters.rs` (CompiledFilters + builders)
+- `src/cli/query/cursor.rs` (encode/decode/validate v2 cursors)
+- `src/cli/query/freshness.rs` (freshness computation + guards)
+Command handlers remain in existing files.
```
5. **Add ingest-time `discussion_rollups` to avoid repeated heavy window scans**
Analysis: Window functions are good, but doing them on every read over large note volumes is still expensive. Precomputing rollups during ingest gives lower and more predictable p95 latency while keeping read paths simpler.
```diff
--- a/plan.md
+++ b/plan.md
@@ Architectural Improvements (Cross-Cutting)
+| J | Ingest-time discussion rollups (`discussion_rollups`) | Performance | Medium |
@@ Change 3 SQL strategy
-Use `ROW_NUMBER()` window function instead of correlated subqueries...
+Primary path: join precomputed `discussion_rollups` for `note_count`, `first_author`,
+`first_note_body`, `position_new_path`, `position_new_line`.
+Fallback path: window-function recompute if rollup row is missing (defensive correctness).
```
6. **Add deterministic numeric project selector `--project-id`**
Analysis: `-p group/repo` is human-friendly, but numeric project IDs are safer for robots and avoid fuzzy/project-path ambiguity. This reduces false ambiguity failures and lookup overhead.
```diff
--- a/plan.md
+++ b/plan.md
@@ DiscussionsArgs
#[arg(short = 'p', long, help_heading = "Filters")]
pub project: Option<String>,
+ #[arg(long, conflicts_with = "project", help_heading = "Filters")]
+ pub project_id: Option<i64>,
@@ Ambiguity handling
+If `--project-id` is provided, IID resolution is scoped directly to that project.
+`--project-id` takes precedence over path-based project matching.
```
7. **Make path filtering rename-aware (`old` + `new`)**
Analysis: The current `--path` strategy uses only `position_new_path`, so it misses discussions on deleted or renamed files. Supporting side selection makes the feature materially more useful for review workflows.
```diff
--- a/plan.md
+++ b/plan.md
@@ DiscussionsArgs
#[arg(long, help_heading = "Filters")]
pub path: Option<String>,
+ #[arg(long, value_parser = ["either", "new", "old"], default_value = "either", help_heading = "Filters")]
+ pub path_side: String,
@@ Change 3 filtering
-Path filter matches `position_new_path`.
+Path filter semantics:
+- `either` (default): match `position_new_path` OR `position_old_path`
+- `new`: match only `position_new_path`
+- `old`: match only `position_old_path`
```
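A minimal sketch of the side-selection semantics above, assuming exact string equality on paths (the plan does not say whether prefix or glob matching is intended). `PathSide` and `path_matches` are hypothetical names; the plan itself keeps `path_side` as a `String`.

```rust
/// Which side of a diff position the `--path` filter should match.
/// Hypothetical type: the plan stores this as a validated `String`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathSide {
    Either,
    New,
    Old,
}

impl PathSide {
    /// Parse the CLI value; anything outside the three allowed values is rejected.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "either" => Some(Self::Either),
            "new" => Some(Self::New),
            "old" => Some(Self::Old),
            _ => None,
        }
    }
}

/// Returns true when a discussion position matches `path` on the requested side.
/// Assumption: exact equality, not prefix matching.
pub fn path_matches(
    side: PathSide,
    path: &str,
    position_new_path: Option<&str>,
    position_old_path: Option<&str>,
) -> bool {
    let new_hit = position_new_path == Some(path);
    let old_hit = position_old_path == Some(path);
    match side {
        PathSide::Either => new_hit || old_hit,
        PathSide::New => new_hit,
        PathSide::Old => old_hit,
    }
}
```

In SQL terms the `either` arm would correspond to an `OR` across both columns, which is worth keeping in mind when choosing indexes.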
8. **Add explicit freshness behavior for empty-result queries + bootstrap backfill**
Analysis: Freshness based only on “participating rows” is undefined when results are empty. Define deterministic behavior and backfill `project_sync_state` on migration so `unknown` doesn't spike unexpectedly after deploy.
```diff
--- a/plan.md
+++ b/plan.md
@@ Freshness state logic
+Empty-result rules:
+- If query is project-scoped (`-p` or `--project-id`), freshness is computed from that project even when no rows match.
+- If query is unscoped and returns zero rows, freshness is computed from all tracked projects.
@@ A1. Track per-project sync timestamp
+Migration step: seed `project_sync_state` from latest known sync metadata where available
+to avoid mass `unknown` freshness immediately after rollout.
```
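The empty-result rules can be sketched as a small pure function that decides which projects feed the freshness computation. The function name and the use of plain `i64` project IDs are assumptions for illustration, not part of the plan.

```rust
/// Decide which projects' sync timestamps feed the freshness computation.
/// Hypothetical helper; the parameter names are illustrative only.
pub fn freshness_scope(
    scoped_project: Option<i64>,
    result_project_ids: &[i64],
    tracked_project_ids: &[i64],
) -> Vec<i64> {
    let mut ids: Vec<i64> = match scoped_project {
        // Project-scoped query: always that project, even when no rows match.
        Some(id) => vec![id],
        // Unscoped query with zero rows: fall back to all tracked projects.
        None if result_project_ids.is_empty() => tracked_project_ids.to_vec(),
        // Unscoped query with rows: only the projects that contributed rows.
        None => result_project_ids.to_vec(),
    };
    ids.sort_unstable();
    ids.dedup();
    ids
}
```

Keeping this decision separate from the SQL makes the empty-result behavior easy to unit test without a database.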
9. **Upgrade `--discussion-id` from filter-only to first-class thread retrieval**
Analysis: Filtering list output by discussion ID still returns list-shaped data and partial note context. A direct thread retrieval mode is faster for agent workflows and avoids extra commands.
```diff
--- a/plan.md
+++ b/plan.md
@@ Core Changes
-| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |
+| 8 | Add direct `--discussion-id` filter + single-thread retrieval mode | Core | Medium |
@@ Change 8
+lore -J discussions --discussion-id <id> --full-thread
+# Returns one discussion with full notes payload (same note schema as show command).
```
10. **Replace ad-hoc AC performance timing with repeatable perf harness**
Analysis: `time lore ...` is noisy and machine-dependent. A reproducible seeded benchmark test gives stable guardrails and catches regressions earlier.
```diff
--- a/plan.md
+++ b/plan.md
@@ AC-10: Performance Budget
-time lore -J discussions --for-mr <iid> -n 100
-# real 0m0.100s (p95 < 150ms)
+cargo test --test perf_discussions -- --ignored --nocapture
+# Uses seeded fixture DB and N repeated runs; asserts p95 < 150ms for target query shape.
```
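A rough sketch of the measurement core such a harness might use, assuming a nearest-rank p95 over N repeated runs. `p95` and `measure` are hypothetical helpers; the real `perf_discussions` test would wrap the actual query against the seeded fixture DB.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank 95th percentile of the samples; `None` for an empty set.
pub fn p95(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest-rank method: rank = ceil(0.95 * n), 1-indexed.
    let rank = (sorted.len() * 95 + 99) / 100;
    Some(sorted[rank - 1])
}

/// Run `f` `runs` times and collect the wall-clock duration of each run.
pub fn measure<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}
```

A test would then assert something like `p95(&measure(50, run_query)).unwrap() < Duration::from_millis(150)`, where `run_query` stands in for the target query shape.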
If you want, I can also produce a fully merged “iteration 5” rewritten plan document with these edits applied end-to-end so it's directly executable by an implementation agent.


@@ -0,0 +1,143 @@
Strong plan overall. The biggest gaps I'd fix are around sync-health correctness, idempotency/integrity under repeated ingests, deleted-entity lifecycle, and reducing schema drift risk without heavy reflection machinery.
I avoided everything in your `## Rejected Recommendations` section.
**1. Add Sync Health Semantics (not just age)**
Time freshness alone can mislead after partial/failed syncs. Agents need to know whether data is both recent and complete.
```diff
@@ ## Freshness Metadata & Staleness Guards (Cross-Cutting, Change A/F/G)
- pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ pub sync_status: String, // "ok" | "partial" | "failed" | "never"
+ pub last_successful_sync_run_id: Option<i64>,
+ pub last_attempted_sync_run_id: Option<i64>,
@@
-#[arg(long, help_heading = "Freshness")]
-pub require_fresh: Option<String>,
+#[arg(long, help_heading = "Freshness")]
+pub require_fresh: Option<String>,
+#[arg(long, help_heading = "Freshness")]
+pub require_sync_ok: bool,
```
Rationale: this prevents false confidence when one project is fresh-by-time but latest sync actually failed or was partial.
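One way the two guards could combine is sketched below. The enums mirror the string values in the diff, but `check_guards` is a hypothetical helper, and `require_fresh` is simplified to a boolean here (in the plan it is an `Option<String>`, presumably a threshold that has already been evaluated into a freshness state).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FreshnessState {
    Fresh,
    Stale,
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SyncStatus {
    Ok,
    Partial,
    Failed,
    Never,
}

/// Apply both guards independently: data can be fresh by age yet come from a
/// partial or failed sync, so neither check implies the other.
pub fn check_guards(
    freshness: FreshnessState,
    status: SyncStatus,
    require_fresh: bool,
    require_sync_ok: bool,
) -> Result<(), String> {
    if require_fresh && freshness != FreshnessState::Fresh {
        return Err(format!("freshness guard failed: state is {freshness:?}"));
    }
    if require_sync_ok && status != SyncStatus::Ok {
        return Err(format!("sync guard failed: last sync status is {status:?}"));
    }
    Ok(())
}
```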
---
**2. Add `--require-complete` Guard for Missing Required IDs**
You already expose `meta.incomplete_rows`; add a hard gate for automation.
```diff
@@ ## Count Semantics (Cross-Cutting Convention)
`incomplete_rows` is computed via a dedicated COUNT query...
+Add CLI guard:
+`--require-complete` fails with exit code 19 when `meta.incomplete_rows > 0`.
+Suggested action: `lore sync --full`.
```
Rationale: agents can fail fast instead of silently acting on partial datasets.
---
**3. Strengthen Ingestion Idempotency + Referential Integrity for Notes**
You added natural-key uniqueness for discussions; do the same for notes and enforce parent integrity at DB level.
```diff
@@ ## Supporting Indexes (Cross-Cutting, Change D)
CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
ON discussions(project_id, gitlab_discussion_id);
+CREATE UNIQUE INDEX IF NOT EXISTS idx_notes_project_gitlab_id
+ ON notes(project_id, gitlab_id);
+
+-- Referential integrity
+-- notes.discussion_id REFERENCES discussions(id)
+-- notes.project_id REFERENCES projects(id)
```
Rationale: repeated syncs and retries won't duplicate notes, and orphaned rows can't accumulate.
---
**4. Add Deleted/Tombstoned Entity Lifecycle**
The current plan excludes null IDs but doesn't define behavior when GitLab entities are deleted after sync.
```diff
@@ ## Contract Invariants (NEW)
+### Deletion Lifecycle Invariant
+1. Notes/discussions deleted upstream are tombstoned locally (`deleted_at`), not hard-deleted.
+2. All list/show commands exclude tombstoned rows by default.
+3. Optional flag `--include-deleted` exposes tombstoned rows for audit/debug.
```
Rationale: preserves auditability, prevents ghost actions on deleted objects, and avoids destructive resync behavior.
---
**5. Expand Discussions Payload for Rename Accuracy + Better Triage**
`--path-side old` is great, but output currently only returns `position_new_*`.
```diff
@@ ## Change 3: Add Standalone `discussions` List Command
pub position_new_path: Option<String>,
pub position_new_line: Option<i64>,
+ pub position_old_path: Option<String>,
+ pub position_old_line: Option<i64>,
+ pub last_author: Option<String>,
+ pub participant_usernames: Vec<String>,
```
Rationale: for renamed/deleted files, agents need old and new coordinates to act confidently; participants/last_author improve thread routing and prioritization.
---
**6. Add SQLite Busy Handling + Retry Policy**
Read transactions + concurrent sync writes can still produce `SQLITE_BUSY` under load.
```diff
@@ ## Count Semantics (Cross-Cutting Convention)
**Snapshot consistency**: All three queries ... inside a single read transaction ...
+**Busy handling**: set `PRAGMA busy_timeout` (e.g. 5000ms) and retry transient
+`SQLITE_BUSY` errors up to 3 times with jittered backoff for read commands.
```
Rationale: improves reliability in real multi-agent usage without changing semantics.
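A minimal, dependency-free sketch of the retry policy, assuming the caller maps the database's busy error into a `DbError::Busy` variant. `DbError`, `retry_on_busy`, and the clock-derived jitter are illustrative stand-ins; a real implementation would likely match on the actual driver error and use a proper RNG.

```rust
use std::thread::sleep;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical error type standing in for the real database error.
#[derive(Debug, PartialEq, Eq)]
pub enum DbError {
    Busy,
    Other(String),
}

/// Cheap jitter in `[0, max]` derived from the clock (no RNG dependency).
fn jitter(max: Duration) -> Duration {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0);
    let max_ms = max.as_millis() as u64;
    Duration::from_millis(u64::from(nanos) % (max_ms + 1))
}

/// Retry `op` on `Busy` up to `max_retries` times with exponential backoff
/// plus jitter. Any other error, or success, is returned immediately.
pub fn retry_on_busy<T, F>(max_retries: u32, base_delay: Duration, mut op: F) -> Result<T, DbError>
where
    F: FnMut() -> Result<T, DbError>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Err(DbError::Busy) if attempt < max_retries => {
                attempt += 1;
                let backoff = base_delay * 2u32.pow(attempt - 1);
                sleep(backoff + jitter(backoff));
            }
            other => return other,
        }
    }
}
```

With `max_retries = 3` the operation runs at most four times in total: the initial attempt plus three retries.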
---
**7. Make Field Definitions Single-Source (Lightweight Drift Prevention)**
You rejected full schema generation from code; a lower-cost middle ground is shared field manifests used by both docs and `--fields` validation.
```diff
@@ ## Change 7: Fix Robot-Docs Response Schemas
+#### 7h. Single-source field manifests (no reflection)
+Define per-command field constants (e.g. `NOTES_FIELDS`, `DISCUSSIONS_FIELDS`)
+used by:
+1) `--fields` validation/filtering
+2) `--fields minimal` expansion
+3) `robot-docs` schema rendering
```
Rationale: cuts drift risk materially while staying much simpler than reflection/snapshot infra.
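As a sketch of what a shared manifest could look like: the field names below are taken from the discussions payload in this review, while the contents of the minimal set and the `resolve_fields` helper are assumptions for illustration.

```rust
/// Single source of truth for the discussions command's selectable fields.
pub const DISCUSSIONS_FIELDS: &[&str] = &[
    "note_count",
    "first_author",
    "first_note_body",
    "position_new_path",
    "position_new_line",
    "position_old_path",
    "position_old_line",
    "last_author",
    "participant_usernames",
];

/// Hypothetical minimal preset; the real plan may choose different fields.
pub const DISCUSSIONS_MINIMAL: &[&str] = &["note_count", "first_author"];

/// Expand `--fields minimal` or validate a comma-separated field list
/// against the manifest. Unknown names produce an error listing valid ones.
pub fn resolve_fields(
    requested: &str,
    all: &[&str],
    minimal: &[&str],
) -> Result<Vec<String>, String> {
    if requested == "minimal" {
        return Ok(minimal.iter().map(|f| f.to_string()).collect());
    }
    let mut out = Vec::new();
    for name in requested.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        if !all.iter().any(|f| *f == name) {
            return Err(format!(
                "unknown field '{name}'; valid fields: {}",
                all.join(", ")
            ));
        }
        out.push(name.to_string());
    }
    Ok(out)
}
```

The same constants can be iterated when rendering `robot-docs`, so docs and validation cannot disagree about which fields exist.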
---
**8. De-duplicate and Upgrade Test Strategy Around Concurrency**
There are duplicated tests across Change 2 and Change 3; add explicit race tests where sync writes happen between list subqueries to prove tx consistency.
```diff
@@ ## Tests
-**Test 6**: `--project-id` scopes IID resolution directly
-**Test 7**: `--path-side old` matches renamed file discussions
-**Test 8**: `--path-side either` matches both old and new paths
+Move shared discussion-filter tests to a single section under Change 3.
+Add concurrency tests:
+1) count/page/incomplete consistency under concurrent sync writes
+2) show discussion+notes snapshot consistency under concurrent writes
```
Rationale: less maintenance noise, better coverage of your highest-risk correctness path.
---
If you want, I can also produce a single consolidated patch block that rewrites your plan text end-to-end with these edits applied in-place.


@@ -0,0 +1,169 @@
Below are the strongest **new** revisions I'd make (excluding everything in your rejected list), with rationale and plan-level diffs.
### 1. Add a durable run ledger (`sync_runs`) with phase state
This makes surgical sync crash-resumable, auditable, and safer under Ctrl+C. Right now `run_id` is mostly ephemeral; persisting phase state removes ambiguity about what completed.
```diff
@@ Design Constraints
+9. **Durable run state**: Surgical sync MUST persist a `sync_runs` row keyed by `run_id`
+ with phase transitions (`preflight`, `ingest`, `dependents`, `docs`, `embed`, `done`, `failed`).
+ This is required for crash recovery, observability, and deterministic retries.
@@ Step 9: Create `run_sync_surgical`
+Before Stage 0, insert `sync_runs(run_id, project_id, mode='surgical', requested_counts, started_at)`.
+After each stage, update `sync_runs.phase`, counters, and `last_error` if present.
+On success/failure, set terminal state (`done`/`failed`) and `finished_at`.
```
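The phase list could be modeled as an ordered enum so illegal transitions are rejected before they reach the ledger. This is only a sketch: the forward-only rule (with skipping allowed, e.g. when a stage is disabled) is an assumption, and `Phase` is a hypothetical type rather than something the plan defines.

```rust
/// Phases of a surgical sync run, in pipeline order.
/// Derived ordering follows declaration order, so later phases compare greater.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Phase {
    Preflight,
    Ingest,
    Dependents,
    Docs,
    Embed,
    Done,
    Failed,
}

impl Phase {
    /// String form as it would be stored in `sync_runs.phase`.
    pub fn as_str(self) -> &'static str {
        match self {
            Phase::Preflight => "preflight",
            Phase::Ingest => "ingest",
            Phase::Dependents => "dependents",
            Phase::Docs => "docs",
            Phase::Embed => "embed",
            Phase::Done => "done",
            Phase::Failed => "failed",
        }
    }

    /// `done` and `failed` are terminal: no further transitions allowed.
    pub fn is_terminal(self) -> bool {
        matches!(self, Phase::Done | Phase::Failed)
    }

    /// Assumed rule: only forward moves from a non-terminal phase.
    /// `Failed` sorts last, so any non-terminal phase may fail.
    pub fn can_transition_to(self, next: Phase) -> bool {
        !self.is_terminal() && next > self
    }
}
```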
### 2. Add `--preflight-only` (network validation without writes)
`--dry-run` is intentionally zero-network, so it cannot validate IIDs. `--preflight-only` is high-value for agents: verifies existence/permissions quickly with no DB mutation.
```diff
@@ CLI Interface
lore sync --dry-run --issue 123 -p myproject
+lore sync --preflight-only --issue 123 -p myproject
@@ Step 2: Add `--issue`, `--mr`, `-p` to `SyncArgs`
+ /// Validate remote entities and auth without any DB writes
+ #[arg(long, default_value_t = false)]
+ pub preflight_only: bool,
@@ Step 10: Add branch in `run_sync`
+if options.preflight_only && options.is_surgical() {
+ return run_sync_surgical_preflight_only(config, &options, run_id, signal).await;
+}
```
### 3. Preflight should aggregate all missing/failed IIDs, not fail-fast
Fail-fast causes repeated reruns. Aggregating errors gives one-shot correction and better robot automation.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-/// Returns the fetched payloads. If ANY fetch fails, the entire operation should abort.
+/// Returns fetched payloads plus per-IID failures; caller aborts writes if failures exist.
pub async fn preflight_fetch(...) -> Result<PreflightResult> {
@@
#[derive(Debug, Default)]
pub struct PreflightResult {
pub issues: Vec<GitLabIssue>,
pub merge_requests: Vec<GitLabMergeRequest>,
+ pub failures: Vec<EntityFailure>, // stage="fetch"
}
@@ Step 9: Create `run_sync_surgical`
-let preflight = preflight_fetch(...).await?;
+let preflight = preflight_fetch(...).await?;
+if !preflight.failures.is_empty() {
+ result.entity_failures = preflight.failures;
+ return Err(LoreError::Other("Surgical preflight failed for one or more IIDs".into()).into());
+}
```
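The aggregate-then-abort shape can be sketched synchronously as follows. The plan's `preflight_fetch` is async and returns GitLab payload types; here the fetcher is a plain closure, and the `EntityFailure` fields and `preflight_collect` name are assumptions made for illustration.

```rust
/// Hypothetical failure record; the plan's `EntityFailure` may carry more detail.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EntityFailure {
    pub entity_type: &'static str,
    pub iid: u64,
    pub stage: &'static str,
    pub message: String,
}

/// Fetched payloads plus per-IID failures, collected in one pass.
#[derive(Debug)]
pub struct PreflightResult<T> {
    pub fetched: Vec<T>,
    pub failures: Vec<EntityFailure>,
}

/// Try every IID instead of stopping at the first error, so the caller can
/// report all missing or failed IIDs at once and abort before any writes.
pub fn preflight_collect<T, F>(
    entity_type: &'static str,
    iids: &[u64],
    mut fetch: F,
) -> PreflightResult<T>
where
    F: FnMut(u64) -> Result<T, String>,
{
    let mut result = PreflightResult { fetched: Vec::new(), failures: Vec::new() };
    for &iid in iids {
        match fetch(iid) {
            Ok(payload) => result.fetched.push(payload),
            Err(message) => result.failures.push(EntityFailure {
                entity_type,
                iid,
                stage: "fetch",
                message,
            }),
        }
    }
    result
}
```

The caller then checks `failures.is_empty()` before starting any write, which matches the abort condition in the diff above.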
### 4. Stop filtering scoped queue drains with raw `json_extract` scans
`json_extract(payload_json, '$.scope_run_id')` in hot drain queries will degrade as queue grows. Use indexed scope metadata.
```diff
@@ Step 9b: Implement scoped drain helpers
-// claim query adds:
-// AND json_extract(payload_json, '$.scope_run_id') = ?
+// Add migration:
+// 1) Add `scope_run_id` generated/stored column derived from payload_json (or explicit column)
+// 2) Create index on (project_id, job_type, scope_run_id, status, id)
+// Scoped drains filter by indexed `scope_run_id`, not full-table JSON extraction.
```
### 5. Replace `dirty_source_ids` collection-by-query with explicit run scoping
Current approach can accidentally include prior dirty rows for same source and can duplicate work. Tag dirty rows with `origin_run_id` and consume by run.
```diff
@@ Design Constraints
-2. **Dirty queue scoping**: ... MUST call ... `run_generate_docs_for_dirty_ids`
+2. **Dirty queue scoping**: Surgical sync MUST scope docs by `origin_run_id` on `dirty_sources`
+ (or equivalent exact run marker) and MUST NOT drain unrelated dirty rows.
@@ Step 7: `SurgicalIngestResult`
- pub dirty_source_ids: Vec<i64>,
+ pub origin_run_id: String,
@@ Step 9a: Implement `run_generate_docs_for_dirty_ids`
-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<...>
+pub fn run_generate_docs_for_run_id(config: &Config, run_id: &str) -> Result<...>
```
### 6. Enforce transaction safety at the type boundary
Combining `unchecked_transaction()` with `&Connection` signatures is fragile. Accept `&Transaction` for ingest internals and use `TransactionBehavior::Immediate` for deterministic lock behavior.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-pub fn ingest_issue_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_issue_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-pub fn ingest_mr_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_mr_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-let tx = conn.unchecked_transaction()?;
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Immediate)?;
```
### 7. Acquire sync lock only for mutation phases, not remote preflight
This materially reduces lock contention and keeps normal sync throughput higher, while still guaranteeing mutation serialization.
```diff
@@ Design Constraints
+10. **Lock window minimization**: Preflight fetch runs without sync lock; lock is acquired immediately
+ before first DB mutation and held through all mutation stages.
@@ Step 9: Create `run_sync_surgical`
-// ── Acquire sync lock ──
-...
-// ── Stage 0: Preflight fetch ──
+// ── Stage 0: Preflight fetch (no lock, no writes) ──
...
+// ── Acquire sync lock just before Stage 1 mutation ──
```
### 8. Add explicit transient retry policy beyond 429
Client already handles rate limits; surgical reliability improves a lot if 5xx/timeouts are retried with bounded backoff.
```diff
@@ Design Constraints
+11. **Transient retry policy**: Preflight and dependent remote fetches MUST retry boundedly on
+ timeout/5xx with jittered backoff; permanent errors (404/401/403) fail immediately.
@@ Step 5: Add `get_issue_by_iid` / `get_mr_by_iid`
+Document retry behavior for transient transport/server failures.
```
### 9. Tighten automated tests around scoping invariants
You already list manual checks; these should be enforced in unit/integration tests to prevent regressions.
```diff
@@ Step 1: TDD — Write Failing Tests First
+### 1d. New invariants tests
+- `surgical_docs_scope_ignores_preexisting_dirty_rows`
+- `scoped_queue_drain_ignores_orphaned_jobs`
+- `preflight_aggregates_multiple_missing_iids`
+- `preflight_only_performs_zero_writes`
+- `dry_run_performs_zero_network_calls`
+- `lock_window_does_not_block_during_preflight`
@@ Acceptance Criteria
+32. Scoped queue/docs invariants are covered by automated tests (not manual-only verification).
```
### 10. Make robot-mode surgical output first-class
For agent workflows, include full stage telemetry and actionable recovery commands.
```diff
@@ Step 15: Update `SyncResult` for robot mode structured output
+ /// Per-stage elapsed ms for deterministic performance tracking
+ pub stage_timings_ms: std::collections::BTreeMap<String, u64>,
+ /// Suggested recovery commands (robot ergonomics)
+ pub recovery_actions: Vec<String>,
@@ Step 14: Update `robot-docs` manifest
+Document surgical-specific error codes and `actions` schema for automated recovery.
```
If you want, I can now produce a fully rewritten **iteration 3** plan that merges these into your current structure end-to-end.


@@ -0,0 +1,212 @@
1. **Resolve the current contract contradictions (`preflight-only`, `dry-run`, `sync_runs`)**
Why this improves the plan:
- Right now constraints conflict: “zero DB writes before commit” vs inserting `sync_runs` during preflight.
- This ambiguity will cause implementation drift and flaky acceptance tests.
- Splitting control-plane writes from content-plane writes keeps safety guarantees strict while preserving observability.
```diff
@@ ## Design Constraints
-6. **Preflight-then-commit**: All remote fetches happen BEFORE any DB writes. If any IID fetch fails (404, network error), the entire operation aborts with zero DB mutations.
+6. **Preflight-then-commit (content-plane)**: All remote fetches happen BEFORE any writes to content tables (`issues`, `merge_requests`, `discussions`, `resource_events`, `documents`, `embeddings`).
+7. **Control-plane exception**: `sync_runs` / `sync_run_entities` writes are allowed during preflight for observability and crash diagnostics.
@@
-11. **Preflight-only mode**: `--preflight-only` validates remote entity existence and permissions with zero DB writes.
+11. **Preflight-only mode**: `--preflight-only` performs zero content writes; control-plane run-ledger writes are allowed.
@@ ### For me to evaluate (functional):
-24. **Preflight-only mode** ... no DB mutations beyond the sync_runs ledger entry
+24. **Preflight-only mode** ... no content DB mutations; only run-ledger rows may be written
```
---
2. **Add stale-write protection to avoid TOCTOU regressions during unlocked preflight**
Why this improves the plan:
- You intentionally preflight without lock; that's good for throughput but introduces race risk.
- Without a guard, a slower surgical run can overwrite newer data ingested by a concurrent normal sync.
- This is a correctness bug under contention, not a nice-to-have.
```diff
@@ ## Design Constraints
+12. **Stale-write protection**: Surgical ingest MUST NOT overwrite fresher local rows. If local `updated_at` is newer than the preflight payload's `updated_at`, skip that entity and record `skipped_stale`.
@@ ## Step 7: Create `src/ingestion/surgical.rs`
- let labels_created = process_single_issue(conn, config, project_id, issue)?;
+ // Skip stale payloads to avoid TOCTOU overwrite after unlocked preflight.
+ if is_local_newer_issue(conn, project_id, issue.iid, issue.updated_at)? {
+ result.skipped_stale += 1;
+ return Ok(result);
+ }
+ let labels_created = process_single_issue(conn, config, project_id, issue)?;
@@
+// same guard for MR path
@@ ## Step 15: Update `SyncResult`
+ /// Entities skipped because local row was newer than preflight payload
+ pub skipped_stale: usize,
@@ ### Edge cases to verify:
+38. **TOCTOU safety**: if a normal sync updates entity after preflight but before ingest, surgical run skips stale payload (no overwrite)
```
---
3. **Make dirty-source scoping exact (do not capture pre-existing rows for same entity)**
Why this improves the plan:
- Current “query dirty rows by `source_id` after ingest” can accidentally include older dirty rows for the same entity.
- That silently violates strict run scoping and can delete unrelated backlog rows.
- You can fix this without adding `origin_run_id` to `dirty_sources` (which you already rejected).
```diff
@@ ## Step 7: Create `src/ingestion/surgical.rs`
- // Collect dirty_source rows for this entity
- let mut stmt = conn.prepare(
- "SELECT id FROM dirty_sources WHERE source_type = 'issue' AND source_id = ?1"
- )?;
+ // Capture only rows inserted by THIS call using high-water mark.
+ let before_dirty_id: i64 = conn.query_row(
+ "SELECT COALESCE(MAX(id), 0) FROM dirty_sources",
+ [], |r| r.get(0),
+ )?;
+ // ... call process_single_issue ...
+ let mut stmt = conn.prepare(
+ "SELECT id FROM dirty_sources
+ WHERE id > ?1 AND source_type = 'issue' AND source_id = ?2"
+ )?;
@@
+ // same pattern for MR
@@ ### 1d. Scoping invariant tests
+#[test]
+fn surgical_docs_scope_ignores_preexisting_dirty_rows_for_same_entity() {
+ // pre-insert dirty row for iid=7, then surgical ingest iid=7
+ // assert result.dirty_source_ids only contains newly inserted rows
+}
```
---
4. **Fix embed-stage leakage when `--no-docs` is used in surgical mode**
Why this improves the plan:
- Current design can run global embed even when docs stage is skipped, which may embed unrelated backlog docs.
- That breaks the surgical “scope only this run” promise.
- This is both correctness and operator-trust critical.
```diff
@@ ## Step 9: Create `run_sync_surgical`
- if !options.no_embed {
+ // Surgical embed only runs when surgical docs actually regenerated docs in this run.
+ if !options.no_embed && !options.no_docs && result.documents_regenerated > 0 {
@@ ## Step 4: Wire new fields in `handle_sync_cmd`
+ if options.is_surgical() && options.no_docs && !options.no_embed {
+ return Err(Box::new(LoreError::Other(
+ "In surgical mode, --no-docs requires --no-embed (to preserve scoping guarantees)".to_string()
+ )));
+ }
@@ ### For me to evaluate
+39. **No embed leakage**: `sync --issue X --no-docs` never embeds unrelated unembedded docs
```
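The two checks in this revision can be isolated into pure functions, which makes the `surgical_no_docs_requires_no_embed` style of test trivial to write. `SurgicalFlags`, `validate_flags`, and `should_embed` are hypothetical names; the error text is copied from the diff above.

```rust
/// Subset of sync options relevant to embed scoping (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct SurgicalFlags {
    pub is_surgical: bool,
    pub no_docs: bool,
    pub no_embed: bool,
}

/// Reject the flag combination that would let a global embed run without
/// a scoped docs stage in front of it.
pub fn validate_flags(flags: SurgicalFlags) -> Result<(), String> {
    if flags.is_surgical && flags.no_docs && !flags.no_embed {
        return Err(
            "In surgical mode, --no-docs requires --no-embed (to preserve scoping guarantees)"
                .to_string(),
        );
    }
    Ok(())
}

/// Embed only when this run's docs stage actually regenerated something.
pub fn should_embed(flags: SurgicalFlags, documents_regenerated: usize) -> bool {
    !flags.no_embed && !flags.no_docs && documents_regenerated > 0
}
```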
---
5. **Add queue-failure hygiene so scoped jobs do not leak forever**
Why this improves the plan:
- Scoped drains prevent accidental processing, but failed runs can strand pending jobs permanently.
- You need explicit terminalization (`aborted`) and optional replay mechanics.
- Otherwise queue bloat and confusing diagnostics accumulate.
```diff
@@ ## Step 8a: Add `sync_runs` table migration
+ALTER TABLE dependent_queue ADD COLUMN aborted_reason TEXT;
+-- status domain now includes: pending, claimed, done, failed, aborted
@@ ## Step 9: run_sync_surgical failure paths
+// On run failure/cancel:
+conn.execute(
+ "UPDATE dependent_queue
+ SET status='aborted', aborted_reason=?1
+ WHERE project_id=?2 AND scope_run_id=?3 AND status='pending'",
+ rusqlite::params![failure_summary, project_id, run_id],
+)?;
@@ ## Acceptance Criteria
+40. **No stranded scoped jobs**: failed surgical runs leave no `pending` rows for their `scope_run_id`
```
---
6. **Persist per-entity lifecycle (`sync_run_entities`) for real observability and deterministic retry**
Why this improves the plan:
- `sync_runs` alone gives aggregate counters but not which IID failed at which stage.
- Per-entity records make retries deterministic and robot output far more useful.
- This is the missing piece for your stated “deterministic retry decisions.”
```diff
@@ ## Step 8a: Add `sync_runs` table migration
+CREATE TABLE IF NOT EXISTS sync_run_entities (
+ id INTEGER PRIMARY KEY,
+ run_id TEXT NOT NULL REFERENCES sync_runs(run_id),
+ entity_type TEXT NOT NULL CHECK(entity_type IN ('issue','merge_request')),
+ iid INTEGER NOT NULL,
+ stage TEXT NOT NULL,
+ status TEXT NOT NULL CHECK(status IN ('ok','failed','skipped_stale')),
+ error_code TEXT,
+ error_message TEXT,
+ updated_at INTEGER NOT NULL
+);
+CREATE INDEX IF NOT EXISTS idx_sync_run_entities_run ON sync_run_entities(run_id, entity_type, iid);
@@ ## Step 15: Update `SyncResult`
+ pub failed_iids: Vec<(String, u64)>,
+ pub skipped_stale_iids: Vec<(String, u64)>,
@@ ## CLI Interface
+lore --robot sync-runs --run-id <id>
+lore --robot sync-runs --run-id <id> --retry-failed
```
---
7. **Use explicit error type for surgical preflight failures (not `LoreError::Other`)**
Why this improves the plan:
- `Other(String)` loses machine semantics, weakens robot mode, and leads to bad exit-code behavior.
- A typed error preserves structured failures and enables actionable recovery commands.
```diff
@@ ## Step 9: run_sync_surgical
- return Err(LoreError::Other(
- format!("Surgical preflight failed for {} of {} IIDs: {}", ...)
- ).into());
+ return Err(LoreError::SurgicalPreflightFailed {
+ run_id: run_id.to_string(),
+ total: total_items,
+ failures: preflight.failures.clone(),
+ }.into());
@@ ## Step 15: Update `SyncResult`
+ /// Machine-actionable error summary for robot mode
+ pub error_code: Option<String>,
@@ ## Acceptance Criteria
+41. **Typed failure**: preflight failures serialize structured errors (not generic `Other`) with machine-usable codes/actions
```
---
8. **Strengthen tests for rollback, contention, and stale-skip guarantees**
Why this improves the plan:
- Current tests cover many happy-paths and scoping invariants, but key race/rollback behaviors are still under-tested.
- These are exactly where regressions will appear first in production.
```diff
@@ ## Step 1: TDD — Write Failing Tests First
+### 1f. Transactional rollback + TOCTOU tests
+1. `preflight_success_then_ingest_failure_rolls_back_all_content_writes`
+2. `stale_payload_is_skipped_when_local_updated_at_is_newer`
+3. `failed_run_aborts_pending_scoped_jobs`
+4. `surgical_no_docs_requires_no_embed`
@@ ### Automated scoping invariants
-38. **Scoped queue/docs invariants are enforced by automated tests**
+42. **Rollback and race invariants are enforced by automated tests** (no partial writes on ingest failure, no stale overwrite)
```
---
These eight revisions keep your core approach intact, avoid your explicitly rejected ideas, and close the biggest correctness/operability gaps before implementation.


@@ -0,0 +1,130 @@
**Critical Gaps In Current Plan**
1. `dirty_sources` scoping is based on `id`, but `dirty_sources` has no `id` column and uses `(source_type, source_id)` UPSERT semantics.
2. Plan assumes a new `dependent_queue` with `status`, but current code uses `pending_dependent_fetches` (delete-on-complete), so queue-scoping design conflicts with existing invariants.
3. Constraint 6 says all remote fetches happen before any content writes, but the proposed surgical flow fetches discussions/events/diffs after ingest writes.
4. `sync_runs` is already an existing table and already used by `SyncRunRecorder`; the plan currently treats it like a new table.
**Best Revisions**
1. **Fix dirty-source scoping to match real schema (queued-at watermark, not `id` high-water).**
Why this is better: This removes a correctness bug and makes same-entity re-ingest deterministic under UPSERT behavior.
```diff
@@ Design Constraints
-2. Dirty queue scoping: ... capture MAX(id) FROM dirty_sources ... run_generate_docs_for_dirty_ids ...
+2. Dirty queue scoping: `dirty_sources` is keyed by `(source_type, source_id)` and updated via UPSERT.
+ Surgical scoping MUST use:
+ 1) a run-level `run_dirty_floor_ms` captured before surgical ingest, and
+ 2) explicit touched source keys from ingest (`(source_type, source_id)`).
+ Surgical docs MUST call a scoped API (e.g. `run_generate_docs_for_sources`) and MUST NOT drain global dirty queue.
@@ Step 9a
-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<GenerateDocsResult>
+pub fn run_generate_docs_for_sources(config: &Config, sources: &[(SourceType, i64)]) -> Result<GenerateDocsResult>
```
2. **Bypass shared dependent queue in surgical mode; run dependents inline per target.**
Why this is better: Avoids queue migration churn, avoids run-scope conflicts with existing unique constraints, and removes orphan-job hygiene complexity entirely.
```diff
@@ Design Constraints
-4. Dependent queue scoping: ... scope_run_id indexed column on dependent_queue ...
+4. Surgical dependent execution: surgical mode MUST bypass `pending_dependent_fetches`.
+ Dependents (resource_events, mr_closes_issues, mr_diffs) run inline for targeted entities only.
+ Global queue remains for normal sync only.
@@ Design Constraints
-14. Queue failure hygiene: ... pending scoped jobs ... terminalized to aborted ...
+14. Surgical failure hygiene: surgical mode MUST leave no queue artifacts because it does not enqueue dependent jobs.
@@ Step 9b / 9c / Step 13
-Implement scoped drain helpers and enqueue_job scope_run_id plumbing
+Replace with direct per-entity helpers in ingestion layer:
+ - sync_issue_resource_events_direct(...)
+ - sync_mr_resource_events_direct(...)
+ - sync_mr_closes_issues_direct(...)
+ - sync_mr_diffs_direct(...)
```
3. **Clarify atomicity contract to “primary-entity atomicity” (remove contradiction).**
Why this is better: Keeps strong zero-write guarantees for missing IIDs while matching practical staged pipeline behavior.
```diff
@@ Design Constraints
-6. Preflight-then-commit (content-plane): All remote fetches happen BEFORE any writes to content tables ...
+6. Primary-entity atomicity: all requested issue/MR payload fetches complete before first content write.
+ If any primary IID fetch fails, primary ingest does zero content writes.
+ Dependent stages (discussions/events/diffs/closes) are post-ingest and best-effort, with structured per-stage failure reporting.
```
4. **Extend existing `sync_runs` schema instead of redefining it.**
Why this is better: Preserves compatibility with current `SyncRunRecorder`, `sync_status`, and existing historical data.
```diff
@@ Step 8a
-Add `sync_runs` table migration (CREATE TABLE sync_runs ...)
+Add migration 027 to extend existing `sync_runs` table:
+ - ADD COLUMN mode TEXT NULL -- 'standard' | 'surgical'
+ - ADD COLUMN phase TEXT NULL -- preflight|ingest|dependents|docs|embed|done|failed
+ - ADD COLUMN surgical_summary_json TEXT NULL
+Reuse `SyncRunRecorder` row lifecycle; do not introduce a parallel run-ledger model.
```
5. **Strengthen TOCTOU stale protection for equal timestamps.**
Why this is better: Prevents regressions when `updated_at` is equal but a fresher local fetch already happened.
```diff
@@ Design Constraints
-13. ... If local `updated_at` is newer than preflight payload `updated_at`, skip ...
+13. ... Skip stale when:
+ a) local.updated_at > payload.updated_at, OR
+ b) local.updated_at == payload.updated_at AND local.last_seen_at > preflight_started_at_ms.
+ This prevents equal-timestamp regressions under concurrent sync.
@@ Step 1f tests
+Add test: `equal_updated_at_but_newer_last_seen_is_skipped`.
```
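The two-part skip rule can be expressed as a small predicate. This sketch assumes all timestamps are epoch milliseconds and that a missing local row is never stale; `LocalRow` and `is_payload_stale` are hypothetical names.

```rust
/// Timestamps of the locally stored entity (assumed epoch milliseconds).
#[derive(Debug, Clone, Copy)]
pub struct LocalRow {
    pub updated_at_ms: i64,
    pub last_seen_at_ms: i64,
}

/// Returns true when the preflight payload should be skipped as stale:
/// a) the local row is strictly newer, or
/// b) timestamps are equal but the local row was seen after preflight began,
///    meaning a concurrent sync already fetched it more recently.
pub fn is_payload_stale(
    local: Option<LocalRow>,
    payload_updated_at_ms: i64,
    preflight_started_at_ms: i64,
) -> bool {
    match local {
        // No local row yet: nothing to overwrite, so the payload is not stale.
        None => false,
        Some(row) => {
            row.updated_at_ms > payload_updated_at_ms
                || (row.updated_at_ms == payload_updated_at_ms
                    && row.last_seen_at_ms > preflight_started_at_ms)
        }
    }
}
```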
6. **Shrink lock window further: release `sync` lock before embed; use dedicated embed lock.**
Why this is better: Prevents long embedding from blocking unrelated syncs and avoids concurrent embed writers.
```diff
@@ Design Constraints
-11. Lock ... held through all mutation stages.
+11. Lock ... held through ingest/dependents/docs only.
+ Release `AppLock("sync")` before embed.
+ Embed stage uses `AppLock("embed")` for single-flight embedding writes.
@@ Step 9
-Embed runs inside the same sync lock window
+Embed runs after sync lock release, under dedicated embed lock
```
7. **Add the missing `sync-runs` robot read path (the plan references it but doesn't define it).**
Why this is better: Makes durable run-state actually useful for recovery automation and observability.
```diff
@@ Step 14 (new)
+## Step 14a: Add `sync-runs` read command
+
+CLI:
+ lore --robot sync-runs --limit 20
+ lore --robot sync-runs --run-id <id>
+ lore --robot sync-runs --state failed
+
+Robot response fields:
+ run_id, mode, phase, status, started_at, finished_at, counters, failures, suggested_retry_command
```
8. **Add URL-native surgical targets (`--issue-url`, `--mr-url`) with project inference.**
Why this is better: Much more agent-friendly and reduces project-resolution errors from copy/paste workflows.
```diff
@@ CLI Interface
lore sync --issue 123 --issue 456 -p myproject
+lore sync --issue-url https://gitlab.example.com/group/proj/-/issues/123
+lore sync --mr-url https://gitlab.example.com/group/proj/-/merge_requests/789
@@ Step 2
+Add repeatable flags:
+ --issue-url <url>
+ --mr-url <url>
+Parse URL into (project_path, iid). If all targets are URL-derived and same project, `-p` is optional.
+If mixed projects are provided in one command, reject with clear error.
```
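A standard-library-only sketch of the URL parsing and the mixed-project check, assuming GitLab's `/-/issues/<iid>` and `/-/merge_requests/<iid>` path layout as shown in the examples above. `UrlTarget`, `parse_target_url`, and `infer_project` are hypothetical names; a production version would probably use a real URL parser and validate the host against the configured instance.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetKind {
    Issue,
    MergeRequest,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UrlTarget {
    pub kind: TargetKind,
    pub project_path: String,
    pub iid: u64,
}

/// Parse `https://host/group/proj/-/issues/123` into (kind, project_path, iid).
/// Returns `None` for anything that does not match the expected layout.
pub fn parse_target_url(url: &str) -> Option<UrlTarget> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // Drop the host, then any query string or fragment.
    let (_host, path) = rest.split_once('/')?;
    let path = path.split(|c: char| c == '?' || c == '#').next()?;
    let (project_path, tail) = path.split_once("/-/")?;
    if project_path.is_empty() {
        return None;
    }
    let mut parts = tail.split('/');
    let kind = match parts.next()? {
        "issues" => TargetKind::Issue,
        "merge_requests" => TargetKind::MergeRequest,
        _ => return None,
    };
    let iid = parts.next()?.parse().ok()?;
    Some(UrlTarget { kind, project_path: project_path.to_string(), iid })
}

/// All URL-derived targets must share one project; otherwise reject.
pub fn infer_project(targets: &[UrlTarget]) -> Result<&str, String> {
    let first = targets
        .first()
        .ok_or_else(|| "no URL targets given".to_string())?;
    for t in targets {
        if t.project_path != first.project_path {
            return Err(format!(
                "targets span multiple projects: {} and {}",
                first.project_path, t.project_path
            ));
        }
    }
    Ok(first.project_path.as_str())
}
```

Nested groups such as `group/sub/proj` are handled because everything before `/-/` is treated as the project path.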
If you want, I can produce a single consolidated patched version of your plan (iteration 5 draft) with these revisions already merged.


@@ -0,0 +1,152 @@
Highest-impact revisions after reviewing your v5 plan:
1. **Fix a real scoping hole: embed can still process unrelated docs**
Rationale: Current plan assumes scoped docs implies scoped embed, but that only holds while no other run creates unembedded docs. You explicitly release sync lock before embed, so another sync can enqueue/regenerate docs in between, and `run_embed` may embed unrelated backlog. This breaks surgical isolation and can hide backlog debt.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-3. Embed scoping: Embedding runs only for documents regenerated by this surgical run. Because `run_embed` processes only unembedded docs, scoping is automatic IF docs are scoped correctly...
+3. Embed scoping: Embedding MUST be explicitly scoped to documents regenerated by this surgical run.
+ `run_generate_docs_for_sources` returns regenerated `document_ids`; surgical mode calls
+ `run_embed_for_document_ids(document_ids)` and never global `run_embed`.
+ This remains true even after lock release and under concurrent normal sync activity.
@@ Step 9a: Implement `run_generate_docs_for_sources`
-pub fn run_generate_docs_for_sources(...) -> Result<GenerateDocsResult> {
+pub fn run_generate_docs_for_sources(...) -> Result<GenerateDocsResult> {
+ // Return regenerated document IDs for scoped embedding.
+ // GenerateDocsResult { regenerated, errored, regenerated_document_ids: Vec<i64> }
@@ Step 9: Embed stage
- match run_embed(config, false, false, None, signal).await {
+ match run_embed_for_document_ids(config, &result.regenerated_document_ids, signal).await {
```
2. **Make run-ledger lifecycle actually durable (and consistent with your own constraint 10)**
Rationale: Plan text says “reuse `SyncRunRecorder`”, but Step 9 writes raw SQL directly. That creates lifecycle drift, missing heartbeats, and inconsistent failure handling as code evolves.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-10. Durable run state: ... Reuses `SyncRunRecorder` row lifecycle ...
+10. Durable run state: surgical sync MUST use `SyncRunRecorder` end-to-end (no ad-hoc SQL updates).
+ Add recorder APIs for `set_mode`, `set_phase`, `set_counters`, `finish_succeeded`,
+ `finish_failed`, `finish_cancelled`, and periodic `heartbeat`.
@@ Step 9: Create `run_sync_surgical`
- conn.execute("INSERT INTO sync_runs ...")
- conn.execute("UPDATE sync_runs SET phase = ...")
+ let mut recorder = SyncRunRecorder::start_surgical(...)?;
+ recorder.set_phase("preflight")?;
+ recorder.heartbeat_if_due()?;
+ recorder.set_phase("ingest")?;
+ ...
+ recorder.finish_succeeded_with_warnings(...)?;
```
3. **Add explicit `cancelled` terminal state**
Rationale: Current early cancellation branches return `Ok(result)` without guaranteed run-row finalization. That leaves misleading `running` rows and weak crash diagnostics.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
+15. Cancellation semantics: If shutdown is observed after run start, phase is set to `cancelled`,
+ status is `cancelled`, `finished_at` is written, and lock is released before return.
@@ Step 8a migration
+ALTER TABLE sync_runs ADD COLUMN warnings_count INTEGER NOT NULL DEFAULT 0;
+ALTER TABLE sync_runs ADD COLUMN cancelled_at INTEGER;
@@ Acceptance Criteria
+47. Cancellation durability: Ctrl+C during surgical sync records `status='cancelled'`,
+ `phase='cancelled'`, and `finished_at` in `sync_runs`.
```
4. **Reduce lock contention further by separating dependent fetch and dependent write**
Rationale: You currently hold lock through network-heavy dependent stages. That maximizes contention and increases lock timeout risk. Better: fetch dependents unlocked, write in short locked transactions with per-entity freshness guards.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-11. Lock window minimization: ... held through ingest, dependents, and docs stages.
+11. Lock window minimization: lock is held only for DB mutation windows.
+ Dependents run in two phases:
+ (a) fetch from GitLab without lock,
+ (b) write results under lock in short transactions.
+ Apply per-entity freshness checks before dependent writes.
@@ Step 9: Dependent stages
- // All dependents run INLINE per-entity ... while lock is held
+ // Dependents fetch outside lock, then write under lock with CAS-style watermark guards.
```
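The "CAS-style watermark guard" in item 4 could look something like the sketch below. It is an in-memory stand-in under stated assumptions: `Store`, `WriteOutcome`, and `write_if_fresh` are hypothetical names, and the watermark is assumed to be the entity's `updated_at` in milliseconds. In the plan this check would run inside a short locked transaction.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the local DB: entity IID -> last written `updated_at` (ms).
#[derive(Default)]
pub struct Store {
    watermarks: HashMap<u64, i64>,
}

/// Outcome of a guarded dependent write.
#[derive(Debug, PartialEq)]
pub enum WriteOutcome {
    Written,
    StaleSkipped,
}

impl Store {
    /// Write only if the payload fetched outside the lock is not older than
    /// what another run has already stored for this entity.
    pub fn write_if_fresh(&mut self, iid: u64, fetched_updated_at: i64) -> WriteOutcome {
        let stored = self.watermarks.get(&iid).copied();
        if matches!(stored, Some(s) if s > fetched_updated_at) {
            // A concurrent sync wrote newer data while we were fetching.
            return WriteOutcome::StaleSkipped;
        }
        self.watermarks.insert(iid, fetched_updated_at);
        WriteOutcome::Written
    }
}
```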
5. **Introduce stage timeout budgets to prevent hung surgical runs**
Rationale: A single slow GitLab endpoint can stall the whole run and hold resources too long. Timeout budgets plus per-entity failure recording keep the run bounded and predictable.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
+16. Stage timeout budgets: each dependent fetch has a per-entity timeout and a global stage budget.
+ Timed-out entities are recorded in `entity_failures` with code `TIMEOUT` and run continues best-effort.
@@ Step 9 notes
+ - Wrap dependent network calls with `tokio::time::timeout`.
+ - Add config knobs:
+ `sync.surgical_entity_timeout_seconds` (default 20),
+ `sync.surgical_dependents_budget_seconds` (default 120).
```
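One way to combine the per-entity timeout with the global stage budget from item 5 is to clamp each entity's timeout to whatever budget remains. The sketch below assumes that interpretation; `StageBudget` and its methods are hypothetical names. The returned duration is what would be handed to `tokio::time::timeout` in the async code.

```rust
use std::time::Duration;

/// Tracks the global stage budget and hands out per-entity timeouts.
pub struct StageBudget {
    per_entity: Duration,
    remaining: Duration,
}

impl StageBudget {
    pub fn new(per_entity: Duration, stage_budget: Duration) -> Self {
        Self { per_entity, remaining: stage_budget }
    }

    /// Timeout for the next entity, or `None` once the stage budget is spent.
    pub fn next_timeout(&self) -> Option<Duration> {
        if self.remaining.is_zero() {
            None
        } else {
            // Never wait longer than what is left of the stage budget.
            Some(self.per_entity.min(self.remaining))
        }
    }

    /// Charge the time an entity actually took against the stage budget.
    pub fn charge(&mut self, elapsed: Duration) {
        self.remaining = self.remaining.saturating_sub(elapsed);
    }
}
```

With the proposed defaults (20 s per entity, 120 s per stage), an entity that starts with 10 s of budget left gets a 10 s timeout, and entities after the budget is spent would be recorded as `TIMEOUT` without a fetch attempt.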
6. **Add payload integrity checks (project mismatch hard-fail)**
Rationale: Surgical mode is precision tooling. If API/proxy misconfiguration returns payloads from wrong project, you should fail preflight loudly, not trust downstream assumptions.
```diff
diff --git a/plan.md b/plan.md
@@ Step 7: preflight_fetch
+ // Integrity check: payload.project_id must equal requested gitlab_project_id.
+ // On mismatch, record EntityFailure { code: "PROJECT_MISMATCH", stage: "fetch" }.
@@ Step 9d: error codes
+PROJECT_MISMATCH -> usage/config data integrity failure (typed, machine-readable)
@@ Acceptance Criteria
+48. Project integrity: payloads with unexpected `project_id` are rejected in preflight
+ and produce zero content writes.
```
7. **Upgrade robot output from aggregate-only to per-entity lifecycle**
Rationale: `entity_failures` alone is not enough for robust automation. Agents need a complete entity outcome map (fetched, ingested, stale-skipped, dependent failures) to retry deterministically.
```diff
diff --git a/plan.md b/plan.md
@@ Step 15: Update `SyncResult`
+pub struct EntityOutcome {
+ pub entity_type: String,
+ pub iid: u64,
+ pub fetched: bool,
+ pub ingested: bool,
+ pub stale_skipped: bool,
+ pub dependent_failures: Vec<EntityFailure>,
+}
@@
+pub entity_outcomes: Vec<EntityOutcome>,
+pub completion_status: String, // succeeded | succeeded_with_warnings | failed | cancelled
@@ Robot mode
- enables agents to detect partial failures via `entity_failures`
+ enables deterministic, per-IID retry and richer UI messaging.
```
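To illustrate how an agent might consume per-entity outcomes, the sketch below derives `completion_status` and a retry list from `entity_outcomes`. The derivation rules are assumptions for illustration only (the plan does not define them): a run counts as failed when no entity was ingested or stale-skipped, and as `succeeded_with_warnings` when any entity was unhandled or has dependent failures. `EntityFailure` is reduced to the `code` and `stage` fields mentioned elsewhere in the plan.

```rust
#[derive(Debug, Clone)]
pub struct EntityFailure {
    pub code: String,
    pub stage: String,
}

#[derive(Debug, Clone)]
pub struct EntityOutcome {
    pub entity_type: String,
    pub iid: u64,
    pub fetched: bool,
    pub ingested: bool,
    pub stale_skipped: bool,
    pub dependent_failures: Vec<EntityFailure>,
}

impl EntityOutcome {
    /// An entity is "handled" if it was written or deliberately skipped as stale.
    fn handled(&self) -> bool {
        self.ingested || self.stale_skipped
    }

    /// Worth retrying if it was not handled or a dependent stage failed.
    fn needs_retry(&self) -> bool {
        !self.handled() || !self.dependent_failures.is_empty()
    }
}

/// Derive the run-level status (assumed rules, see lead-in).
pub fn completion_status(outcomes: &[EntityOutcome], cancelled: bool) -> &'static str {
    if cancelled {
        return "cancelled";
    }
    if !outcomes.is_empty() && !outcomes.iter().any(|o| o.handled()) {
        return "failed";
    }
    if outcomes.iter().any(|o| o.needs_retry()) {
        "succeeded_with_warnings"
    } else {
        "succeeded"
    }
}

/// IIDs an agent could pass back to a follow-up surgical sync.
pub fn retry_iids(outcomes: &[EntityOutcome]) -> Vec<u64> {
    outcomes.iter().filter(|o| o.needs_retry()).map(|o| o.iid).collect()
}
```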
8. **Index `sync_runs` for real observability at scale**
Rationale: You're adding mode/phase/counters and then querying recent surgical runs. Without indexes, this degrades as run history grows.
```diff
diff --git a/plan.md b/plan.md
@@ Step 8a migration
+CREATE INDEX IF NOT EXISTS idx_sync_runs_mode_started
+ ON sync_runs(mode, started_at DESC);
+CREATE INDEX IF NOT EXISTS idx_sync_runs_status_phase_started
+ ON sync_runs(status, phase, started_at DESC);
```
9. **Add tests specifically for the new failure-prone paths**
Rationale: Current tests are strong on ingest and scoping, but still miss new high-risk runtime behavior (cancel state, timeout handling, scoped embed under concurrency).
```diff
diff --git a/plan.md b/plan.md
@@ Step 1f tests
+#[tokio::test]
+async fn cancellation_marks_sync_run_cancelled() { ... }
+
+#[tokio::test]
+async fn dependent_timeout_records_entity_failure_and_continues() { ... }
+
+#[tokio::test]
+async fn scoped_embed_does_not_embed_unrelated_docs_created_after_docs_stage() { ... }
@@ Acceptance Criteria
+49. Scoped embed isolation under concurrency is verified by automated test.
+50. Timeout path is verified (TIMEOUT code + continued processing).
```
These revisions keep your core direction intact, avoid every rejected recommendation, and materially improve correctness under concurrency, operational observability, and agent automation quality.

2240
docs/plan-surgical-sync.md Normal file

File diff suppressed because it is too large.

@@ -0,0 +1,131 @@
1. **Make immutable identity usable now (`--author-id`)**
Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.
```diff
@@ Phase 1: `lore notes` Command / Work Chunk 1A
pub struct NoteListFilters<'a> {
+ pub author_id: Option<i64>, // immutable identity filter
@@
- pub author: Option<&'a str>, // case-insensitive match via COLLATE NOCASE
+ pub author: Option<&'a str>, // display-name filter
+ // If both author and author_id are provided, apply both (AND) for precision.
}
@@
Filter mappings:
+ - `author_id`: `n.author_id = ?` (exact immutable identity)
- `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`
@@ Phase 1 / Work Chunk 1B (CLI)
+ /// Filter by immutable author id
+ #[arg(long = "author-id", help_heading = "Filters")]
+ pub author_id: Option<i64>,
@@ Phase 2 / Work Chunk 2F
+ Add `--author-id` support to `lore search` filtering for note documents.
@@ Phase 1 / Work Chunk 1E
+ CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
+ ON notes(project_id, author_id, created_at DESC, id DESC)
+ WHERE is_system = 0 AND author_id IS NOT NULL;
```
2. **Fix document staleness on username changes**
Why: Current plan says username changes are “not semantic,” but note documents include username in content/title, so docs go stale/inconsistent.
```diff
@@ Work Chunk 0D: Immutable Author Identity Capture
- Assert: changed_semantics = false (username change is not a semantic change for documents)
+ Assert: changed_semantics = true (username affects note document content/title)
@@ Work Chunk 0A: semantic-change detection
- old_body != body || old_note_type != note_type || ...
+ old_body != body || old_note_type != note_type || ...
+ || old_author_username != author_username
@@ Work Chunk 2C: Note Document Extractor header
author: @{author}
+ author_id: {author_id}
```
3. **Replace `last_seen_at` sweep marker with monotonic `sync_run_id`**
Why: Timestamp markers are vulnerable to clock skew and concurrent runs; run IDs are deterministic and safer.
```diff
@@ Phase 0: Stable Note Identity
+ ### Work Chunk 0E: Monotonic Run Marker
+ Add `sync_runs` table and `notes.last_seen_run_id`.
+ Ingest assigns one run_id per sync transaction.
+ Upsert sets `last_seen_run_id = current_run_id`.
+ Sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).
@@ Work Chunk 0C
- fetch_complete + last_seen_at-based sweep
+ fetch_complete + run_id-based sweep
```
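The run-ID sweep in item 3 can be illustrated with an in-memory model. This is a sketch under assumptions: `Note`, `mark_seen`, and `sweep` are hypothetical names, a `Vec<Note>` stands in for the `notes` table, and document/dirty-source cleanup is left out.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Note {
    pub id: i64,
    pub discussion_id: i64,
    pub last_seen_run_id: i64,
}

/// Upsert step: stamp every note returned by the fetch with the current run ID.
pub fn mark_seen(notes: &mut Vec<Note>, discussion_id: i64, seen_ids: &[i64], run_id: i64) {
    for &id in seen_ids {
        if let Some(pos) = notes.iter().position(|n| n.id == id) {
            notes[pos].last_seen_run_id = run_id;
        } else {
            notes.push(Note { id, discussion_id, last_seen_run_id: run_id });
        }
    }
}

/// Sweep step: drop notes of this discussion that the current run did not see.
/// Skipped entirely when the fetch was partial. Returns the removed note IDs.
pub fn sweep(notes: &mut Vec<Note>, discussion_id: i64, run_id: i64, fetch_complete: bool) -> Vec<i64> {
    if !fetch_complete {
        return Vec::new();
    }
    let stale: Vec<i64> = notes
        .iter()
        .filter(|n| n.discussion_id == discussion_id && n.last_seen_run_id < run_id)
        .map(|n| n.id)
        .collect();
    notes.retain(|n| !stale.contains(&n.id));
    stale
}
```

Because the comparison is on integer run IDs, the result does not depend on wall-clock ordering between the upsert and the sweep.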
4. **Materialize stale-note set once during sweep**
Why: Current set-based SQL still re-runs the stale subquery 3 times; materializing once improves performance and guarantees identical deletion set.
```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
- DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM notes WHERE ...;
+ CREATE TEMP TABLE _stale_note_ids AS
+ SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
+ DELETE FROM documents
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM dirty_sources
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
+ DROP TABLE _stale_note_ids;
```
5. **Move historical note backfill out of migration into resumable runtime job**
Why: Data-heavy migration can block startup and is harder to resume/recover on large DBs.
```diff
@@ Work Chunk 2H
- Backfill Existing Notes After Upgrade (Migration 024)
+ Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
@@
- Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
+ Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
@@
- INSERT INTO dirty_sources ... SELECT ... FROM notes ...
+ Introduce batched backfill API:
+ `enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
+ invoked from `generate-docs`/`sync` until complete, resumable across runs.
```
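The batched, resumable shape of the backfill in item 5 might look like the sketch below. The signature differs from the one proposed in the diff: here two in-memory vectors stand in for "notes without documents" and the `dirty_sources` queue, which is purely an assumption for illustration.

```rust
/// Progress report for one backfill batch.
#[derive(Debug, PartialEq)]
pub struct BackfillProgress {
    pub enqueued: usize,
    pub done: bool,
}

/// Move up to `batch_size` note IDs from `missing` into `dirty_queue`.
/// Calling it again later picks up where the previous call stopped.
pub fn enqueue_missing_note_documents(
    missing: &mut Vec<i64>,
    dirty_queue: &mut Vec<i64>,
    batch_size: usize,
) -> BackfillProgress {
    let take = batch_size.min(missing.len());
    dirty_queue.extend(missing.drain(..take));
    BackfillProgress { enqueued: take, done: missing.is_empty() }
}
```

A caller such as `generate-docs` or `sync` would invoke this once per run (or in a loop) until `done` is true, so an interrupted backfill loses at most one batch of progress.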
6. **Add streaming path for large `jsonl`/`csv` note exports**
Why: Current `query_notes` materializes full result set in memory; streaming improves scalability and latency.
```diff
@@ Work Chunk 1A
+ Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.
@@ Work Chunk 1C
- print_list_notes_jsonl(&result)
- print_list_notes_csv(&result)
+ print_list_notes_jsonl_stream(config, filters)
+ print_list_notes_csv_stream(config, filters)
+ (table/json keep counted buffered path)
```
7. **Add index for path-centric note queries**
Why: `--path` + project/date queries are a stated hot path and not fully covered by current proposed indexes.
```diff
@@ Work Chunk 1E: Composite Query Index
+ CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
+ ON notes(project_id, position_new_path, created_at DESC, id DESC)
+ WHERE is_system = 0 AND position_new_path IS NOT NULL;
```
8. **Add property/invariant tests (not only examples)**
Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration; randomized invariants will catch subtle regressions.
```diff
@@ Verification Checklist
+ Add property tests (proptest):
+ - stable local IDs across randomized re-sync orderings
+ - no orphan `documents(source_type='note')` after randomized deletions/sweeps
+ - partial-fetch runs never reduce note count
+ - repeated full rebuild converges (fixed-point idempotence)
```
These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scale behavior, and long-term maintainability.

2518
docs/prd-per-note-search.md Normal file

File diff suppressed because it is too large.

@@ -125,7 +125,7 @@ lore -J mrs --fields iid,title,state,draft,target_branch
### Available Fields
-**Issues**: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`
+**Issues**: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at_iso`
**MRs**: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`

541
docs/user-journeys.md Normal file

@@ -0,0 +1,541 @@
# Lore CLI User Journeys
## Purpose
Map realistic workflows for both human users and AI agents to identify gaps in the command surface and optimization opportunities. Each journey starts with a **problem** and traces the commands needed to reach a **resolution**.
---
## Part 1: Human User Flows
### H1. Morning Standup Prep
**Problem:** "What happened since yesterday? I need to know what moved before standup."
**Flow:**
```
lore sync -q # Refresh data (quiet, no noise)
lore issues -s opened --since 1d # Issues that changed overnight
lore mrs -s opened --since 1d # MRs that moved
lore who @me # My current workload snapshot
```
**Gap identified:** No single "activity feed" command. User runs 3 queries to get what should be one view. No `--since 1d` shorthand for "since yesterday." No `@me` alias for the authenticated user.
---
### H2. Sprint Planning: What's Ready to Pick Up?
**Problem:** "We're planning the next sprint. What's open, unassigned, and actionable?"
**Flow:**
```
lore issues -s opened -p myproject # All open issues
lore issues -s opened -l "ready" # Issues labeled ready
lore issues -s opened --has-due # Issues with deadlines approaching
lore count issues -p myproject # How many total?
```
**Gap identified:** No way to filter by "unassigned" issues (missing `--no-assignee` flag). No way to sort by due date. No way to see priority/weight. Can't combine filters like "opened AND no assignee AND has due date."
---
### H3. Investigating a Production Incident
**Problem:** "Deploy broke prod. I need the full timeline of what changed around the deploy."
**Flow:**
```
lore sync -q # Get latest
lore timeline "deploy" --since 7d # What happened around deploys
lore search "deploy" --type mr # MRs mentioning deploy
lore mrs 456 # Inspect the suspicious MR
lore who --overlap src/deploy/ # Who else touches deploy code
```
**Gap identified:** Timeline is keyword-based, not event-based. Can't filter by "MRs merged in the last 24 hours" directly. No way to see which MRs were merged between two dates (release diff). Would benefit from `lore mrs -s merged --since 1d`.
---
### H4. Preparing to Review Someone's MR
**Problem:** "I was assigned to review MR !789. I need context before diving in."
**Flow:**
```
lore mrs 789 # Read the MR description + discussions
lore mrs 789 -o # Open in browser for the actual diff
lore who src/features/auth/ # Who are the experts in this area?
lore search "auth refactor" --type issue # Related issues for background
lore timeline "authentication" # History of auth changes
```
**Gap identified:** No way to see the file list touched by an MR from the CLI (data is stored in `mr_file_changes` but not surfaced). No way to link an MR back to its closing issue(s) from the MR detail view. The cross-reference data exists in `entity_references` but isn't shown in `mrs <iid>` output.
---
### H5. Onboarding to an Unfamiliar Code Area
**Problem:** "I'm new to the team and need to understand how the billing module works."
**Flow:**
```
lore search "billing" -n 20 # What exists about billing?
lore who src/billing/ # Who knows billing best?
lore timeline "billing" --depth 2 # History of billing changes
lore mrs -s merged -l billing --since 6m # Recent merged billing work
lore issues -s opened -l billing # Outstanding billing issues
```
**Gap identified:** No way to get a "module overview" in one command. The search spans issues, MRs, and discussions but doesn't summarize by category. No way to see the most-discussed or most-referenced entities (high-signal items for understanding).
---
### H6. Finding the Right Reviewer for My PR
**Problem:** "I'm about to submit a PR touching auth and payments. Who should review?"
**Flow:**
```
lore who src/features/auth/ # Auth experts
lore who src/features/payments/ # Payment experts
lore who @candidate1 # Check candidate1's workload
lore who @candidate2 # Check candidate2's workload
```
**Gap identified:** No way to query multiple paths at once (`lore who src/auth/ src/payments/`). No way to find the intersection of expertise. No workload-aware recommendation ("who knows this AND has bandwidth"). Four separate commands for what should be one decision.
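The "intersection of expertise" idea could be prototyped client-side by merging the per-path results. The sketch below is hypothetical: it assumes each `lore who <path>` result has already been reduced to `(username, score)` pairs with at most one entry per user, and `shared_experts` is not an existing lore function.

```rust
use std::collections::HashMap;

/// Users who appear in the expert list of every path, ranked by summed score.
/// Assumes each user appears at most once per path list.
pub fn shared_experts(per_path: &[Vec<(&str, u32)>]) -> Vec<(String, u32)> {
    // username -> (number of paths the user appears in, total score)
    let mut totals: HashMap<&str, (usize, u32)> = HashMap::new();
    for experts in per_path {
        for &(user, score) in experts {
            let entry = totals.entry(user).or_insert((0, 0));
            entry.0 += 1;
            entry.1 += score;
        }
    }
    let mut shared: Vec<(String, u32)> = totals
        .into_iter()
        .filter(|(_, (paths, _))| *paths == per_path.len())
        .map(|(user, (_, score))| (user.to_string(), score))
        .collect();
    // Highest combined score first; ties broken by username for stable output.
    shared.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    shared
}
```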
---
### H7. Understanding Why a Feature Was Built This Way
**Problem:** "This code is weird. Why was it implemented like this? What was the original discussion?"
**Flow:**
```
lore search "feature-name rationale" # Search for decision context
lore timeline "feature-name" --depth 2 # Full history with cross-refs
lore issues 234 # Read the original issue
lore mrs 567 # Read the implementation MR
```
**Gap identified:** No way to search within a specific issue's or MR's discussion notes. The search covers documents (titles + descriptions) but per-note search isn't available yet (PRD exists). No way to navigate "issue 234 was closed by MR 567" without manually knowing both IDs.
---
### H8. Checking Team Workload Before Assigning Work
**Problem:** "I need to assign this urgent bug. Who has the least on their plate?"
**Flow:**
```
lore who @alice # Alice's workload
lore who @bob # Bob's workload
lore who @carol # Carol's workload
lore who @dave # Dave's workload
```
**Gap identified:** No team-level workload view. Must query each person individually. No way to list "all assignees and their open issue counts." No concept of a team roster. Would benefit from `lore who --team` or `lore workload`.
---
### H9. Preparing Release Notes
**Problem:** "We're cutting a release. I need to summarize what's in this version."
**Flow:**
```
lore mrs -s merged --since 2w -p myproject # MRs merged since last release
lore issues -s closed --since 2w -p myproject # Issues closed since last release
lore mrs -s merged -l feature --since 2w # Feature MRs specifically
lore mrs -s merged -l bugfix --since 2w # Bugfix MRs
```
**Gap identified:** No way to filter MRs by milestone for version-based releases: `issues` has `-m` for milestone but `mrs` does not. No changelog generation. No "what closed between tag A and tag B." No grouping by label for release note categories.
---
### H10. Finding and Closing Stale Issues
**Problem:** "Our backlog is bloated. Which issues haven't been touched in months?"
**Flow:**
```
lore issues -s opened --sort updated --asc -n 50 # Oldest-updated first
# Then manually inspect each one...
lore issues 42 # Is this still relevant?
```
**Gap identified:** No `--before` or `--updated-before` filter (only `--since` exists). Can sort ascending but can't filter "not updated in 90 days." No staleness indicator. No bulk operations concept.
---
### H11. Understanding a Bug's Full History
**Problem:** "Bug #321 keeps getting reopened. I need to understand its entire lifecycle."
**Flow:**
```
lore issues 321 # Read the issue
lore timeline "bug-keyword" -p myproject # Try to find timeline events
# But timeline is keyword-based, not entity-based...
```
**Gap identified:** No way to get a timeline for a specific entity by IID. `lore timeline` requires a keyword query, not an entity reference. Would benefit from `lore timeline --issue 321` or `lore timeline --mr 456` to get the event history of a specific entity directly.
---
### H12. Identifying Who to Ask About Failing Tests
**Problem:** "CI tests are failing in `src/lib/parser.rs`. Who last touched this?"
**Flow:**
```
lore who src/lib/parser.rs # Expert lookup
lore who --overlap src/lib/parser.rs # Who else has touched it
lore search "parser" --type mr --since 2w # Recent MRs touching parser
```
**Gap identified:** Expert mode uses DiffNote analysis (code review comments), not actual file change tracking. The `mr_file_changes` table has the real data but `who` doesn't use it for attribution. Could be much more accurate with file-change-based expertise.
---
### H13. Tracking a Feature Across Multiple MRs
**Problem:** "The 'dark mode' feature spans 5 MRs. I need to see them all together."
**Flow:**
```
lore mrs -l dark-mode # MRs with the label
lore issues -l dark-mode # Related issues
lore timeline "dark mode" --depth 2 # Cross-referenced events
```
**Gap identified:** Works reasonably well with labels as the grouping mechanism. But if the team didn't label consistently, there's no way to discover related MRs by content similarity. No "related items" view that combines issues + MRs + discussions for a topic.
---
### H14. Checking if a Similar Fix Was Already Attempted
**Problem:** "Before I implement this fix, was something similar tried before?"
**Flow:**
```
lore search "memory leak connection pool" # Semantic search
lore search "connection pool" --type mr -s all # Wait, no state filter on search
lore mrs -s closed -l bugfix # Closed bugfix MRs (coarse)
lore timeline "connection pool" # Historical context
```
**Gap identified:** Search doesn't have a `--state` filter. Can't search only closed/merged items. The semantic search is powerful but can't be combined with entity state. Would benefit from `--state merged` on search to find past attempts.
---
### H15. Reviewing Discussions That Need My Attention
**Problem:** "Which discussion threads am I involved in that are still unresolved?"
**Flow:**
```
lore who --active # All active unresolved discussions
lore who --active --since 30d # Wider window
# But can't filter to "discussions I'm in"...
```
**Gap identified:** `--active` shows all unresolved discussions, not filtered by participant. No way to say "show me discussions where @me participated." No notification/mention tracking. No "my unresolved threads" view.
---
## Part 2: AI Agent Flows
### A1. Context Gathering Before Code Modification
**Problem:** Agent is about to modify `src/features/auth/session.rs` and needs full context.
**Flow:**
```
lore -J health # Pre-flight check
lore -J who src/features/auth/ # Who knows this area
lore -J search "auth session" -n 10 # Related issues/MRs
lore -J mrs -s merged --since 3m -l auth # Recent auth changes
lore -J who --overlap src/features/auth/session.rs # Concurrent work risk
```
**Gap identified:** No way to check "are there open MRs touching this file right now?" The overlap mode shows historical touches, not active branches. An agent needs to know about in-flight changes to avoid conflicts.
---
### A2. Auto-Triaging an Incoming Issue
**Problem:** Agent receives a new issue and needs to categorize it, find related work, and suggest assignees.
**Flow:**
```
lore -J issues 999 # Read the new issue
lore -J search "$(extract_keywords)" --explain # Find similar past issues
lore -J who src/affected/path/ # Suggest experts as assignees
lore -J issues -s opened -l same-label # Check for duplicates
```
**Gap identified:** No way to get just the description text for programmatic keyword extraction. `issues <iid>` returns full detail including discussions. Agent must parse the full response to extract the description for a secondary search. Would benefit from `--fields description` on detail view. No duplicate detection built in.
---
### A3. Generating Sprint Status Report
**Problem:** Agent needs to produce a weekly status report for the team.
**Flow:**
```
lore -J issues -s closed --since 1w --fields minimal # Completed work
lore -J issues -s opened --status "In progress" # In-flight work
lore -J mrs -s merged --since 1w --fields minimal # Merged PRs
lore -J mrs -s opened -D --fields minimal # Open non-draft MRs
lore -J count issues # Totals
lore -J count mrs # MR totals
lore -J who --active --since 1w # Discussions needing attention
```
**Gap identified:** Seven separate queries for one report. No `lore summary` or `lore report` command. No way to get "issues transitioned from X to Y this week" (state change history exists in events but isn't queryable). No velocity metric (issues closed per week trend).
---
### A4. Finding Relevant Prior Art Before Implementing
**Problem:** Agent is implementing a caching layer and wants to find if similar patterns exist in the codebase's GitLab history.
**Flow:**
```
lore -J search "caching" --mode hybrid -n 20 --explain
lore -J search "cache invalidation" --mode hybrid -n 10
lore -J search "redis" --mode lexical --type discussion # Exact term in discussions
lore -J timeline "cache" --since 1y # Wait, max is 1y? Let's try 12m
```
**Gap identified:** No way to search discussion notes individually (per-note search). Discussions are aggregated into documents, so individual note-level matches are lost. The `--explain` flag helps but doesn't show which specific note matched. No `--since 1y` or `--since 12m` duration format.
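A duration parser that accepts year and month suffixes might look like the following sketch. The unit meanings are assumptions, not lore's documented behavior: `d` = days, `w` = 7 days, `m` = 30 days, `y` = 365 days. `parse_since_days` is a hypothetical name.

```rust
/// Parse a `<number><unit>` duration such as `1d`, `2w`, `12m`, or `1y` into days.
/// Unit lengths (30-day month, 365-day year) are approximations chosen for this sketch.
pub fn parse_since_days(input: &str) -> Result<u64, String> {
    let s = input.trim();
    let unit = s
        .chars()
        .last()
        .ok_or_else(|| "empty duration".to_string())?;
    // Everything before the trailing unit character must be the number.
    let digits = &s[..s.len() - unit.len_utf8()];
    let n: u64 = digits
        .parse()
        .map_err(|_| format!("invalid number in duration '{input}'"))?;
    let days_per_unit: u64 = match unit {
        'd' => 1,
        'w' => 7,
        'm' => 30,
        'y' => 365,
        other => return Err(format!("unknown duration unit '{other}'")),
    };
    n.checked_mul(days_per_unit)
        .ok_or_else(|| format!("duration too large: '{input}'"))
}
```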
---
### A5. Building Context for PR Description
**Problem:** Agent wrote code and needs to generate a PR description that references relevant issues.
**Flow:**
```
lore -J search "feature description keywords" --type issue
lore -J issues -s opened -l feature-label --fields iid,title,web_url
# Cross-reference: which issues does this MR close?
# No command for this -- must manually scan search results
```
**Gap identified:** No way to query the `entity_references` table directly. Agent can't ask "which issues reference MR !456" or "which issues contain 'closes #123' in their text." The data exists but isn't exposed as a query surface. Would benefit from `lore refs --mr 456` or `lore refs --issue 123`.
---
### A6. Identifying Affected Experts for Review Assignment
**Problem:** Agent needs to automatically assign reviewers based on the files changed in an MR.
**Flow:**
```
lore -J mrs 456 # Get MR details
# Parse file paths from response... but file changes aren't in the output
lore -J who src/path/from/mr/ # Query each path
lore -J who src/another/path/ # One at a time...
lore -J who @candidate --fields minimal # Check workload
```
**Gap identified:** MR detail view (`mrs <iid>`) doesn't include the file change list from `mr_file_changes`. Agent can't programmatically extract which files an MR touches. Must fall back to GitLab API or guess from description. The `who` command doesn't accept multiple paths. No "auto-reviewer" suggestion combining expertise + availability.
---
### A7. Incident Investigation and Timeline Reconstruction
**Problem:** Agent needs to reconstruct what happened during an outage for a postmortem.
**Flow:**
```
lore -J timeline "outage" --since 3d --depth 2 --expand-mentions
lore -J search "error 500" --since 3d
lore -J mrs -s merged --since 3d -p production-service
lore -J issues --status "In progress" -p production-service
```
**Gap identified:** Timeline is keyword-seeded, which means if the outage wasn't described with that exact term, seeds may miss it. No way to seed a timeline from an entity ID (e.g., "start from issue #321 and expand outward"). No severity/priority filter. No way to correlate with merge times.
---
### A8. Cross-Project Impact Assessment
**Problem:** Agent needs to understand how a breaking API change in project A affects projects B and C.
**Flow:**
```
lore -J search "api-endpoint-name" -p project-a
lore -J search "api-endpoint-name" -p project-b
lore -J search "api-endpoint-name" -p project-c
# Or without project filter to search everywhere:
lore -J search "api-endpoint-name" -n 50
lore -J timeline "api-endpoint-name" --depth 2
```
**Gap identified:** Cross-project references in entity_references are tracked but the timeline shows unresolved references for entities not synced locally. No way to see a cross-project dependency map. Search works across projects but doesn't group results by project.
---
### A9. Automated Stale Issue Recommendations
**Problem:** Agent runs weekly to identify issues that should be closed or re-prioritized.
**Flow:**
```
lore -J issues -s opened --sort updated --asc -n 100 # Oldest first
# For each issue, check:
lore -J issues <iid> # Read details
lore -J search "<issue title keywords>" # Any recent activity?
```
**Gap identified:** No `--updated-before` filter, so agent must fetch all and filter client-side. No way to detect "issue has no assignee AND no activity in 90 days." The 100-issue limit means pagination is needed for large backlogs, but there's no cursor/offset pagination -- only `--limit`. Agent must do N+1 queries to inspect each candidate.
---
### A10. Code Review Preparation (File-Level Context)
**Problem:** Agent is reviewing MR !789 and needs to understand the history of each changed file.
**Flow:**
```
lore -J mrs 789 # Get MR details
# Can't get file list from output...
# Fall back to search by MR title keywords
lore -J search "feature-from-mr" --type mr
lore -J who src/guessed/path/ # Expertise for each file
lore -J who --overlap src/guessed/path/ # Concurrent changes
```
**Gap identified:** Same as A6 -- `mr_file_changes` data isn't exposed. Agent is blind to the actual files in the MR unless it parses the description or uses the GitLab API directly. This is the single biggest gap for automated code review workflows.
---
### A11. Building a Knowledge Graph of Entity Relationships
**Problem:** Agent wants to map how issues, MRs, and discussions are connected for a feature.
**Flow:**
```
lore -J search "feature-name" -n 30
lore -J timeline "feature-name" --depth 2 --max-entities 100
# Timeline shows expanded entities and cross-refs, but...
# No way to query entity_references directly
# No way to get "all entities that reference issue #123"
```
**Gap identified:** The `entity_references` table (closes, related, mentioned) is used internally by timeline but isn't queryable as a standalone command. Agent can't ask "what closes issue #123?" or "what does MR !456 reference?" No graph export. Would enable powerful dependency mapping.
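The two questions above ("what closes issue #123?" and "what does MR !456 reference?") are simple lookups over reference rows. The sketch below models them in memory; the field names and the `EntityRef` shape are hypothetical and do not reflect the actual `entity_references` schema.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityKind {
    Issue,
    MergeRequest,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefType {
    Closes,
    Related,
    Mentioned,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EntityId {
    pub kind: EntityKind,
    pub iid: u64,
}

/// One row of the reference graph: `source` refers to `target`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EntityRef {
    pub source: EntityId,
    pub target: EntityId,
    pub ref_type: RefType,
}

/// "What closes issue #N?": IIDs of MRs with a `Closes` reference to the issue.
pub fn closing_mrs(refs: &[EntityRef], issue_iid: u64) -> Vec<u64> {
    let target = EntityId { kind: EntityKind::Issue, iid: issue_iid };
    refs.iter()
        .filter(|r| {
            r.ref_type == RefType::Closes
                && r.source.kind == EntityKind::MergeRequest
                && r.target == target
        })
        .map(|r| r.source.iid)
        .collect()
}

/// "What does this entity reference?": all outgoing edges from `source`.
pub fn outgoing(refs: &[EntityRef], source: EntityId) -> Vec<(EntityId, RefType)> {
    refs.iter()
        .filter(|r| r.source == source)
        .map(|r| (r.target, r.ref_type))
        .collect()
}
```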
---
### A12. Release Readiness Assessment
**Problem:** Agent needs to verify all issues in milestone "v2.0" are closed and MRs are merged.
**Flow:**
```
lore -J issues -m "v2.0" -s opened # Any open issues in milestone?
lore -J issues -m "v2.0" -s closed # Closed issues
# MRs don't have milestone filter...
lore -J mrs -s opened -l "v2.0" # Try label as proxy
lore -J who --active -p myproject # Unresolved discussions
```
**Gap identified:** MRs don't have a `--milestone` filter (issues do). No way to check "all MRs linked to issues in milestone v2.0" -- would require joining `entity_references` with issue milestone. No release checklist concept. No way to verify "every issue in this milestone has a closing MR."
---
### A13. Answering "What Changed?" Between Two Points
**Problem:** Agent needs to diff project state between two dates for a stakeholder report.
**Flow:**
```
lore -J issues -s closed --since 2w --fields minimal # Recently closed
lore -J issues -s opened --since 2w --fields minimal # Recently opened
lore -J mrs -s merged --since 2w --fields minimal # Recently merged
# But no way to get "issues that CHANGED STATE" in a window
# An issue opened 3 months ago but closed yesterday won't appear in --since 2w for issues -s opened
```
**Gap identified:** `--since` filters by `updated_at`, not by "state changed at." An issue closed yesterday but created 6 months ago would appear in `issues -s closed --since 1d` (because updated_at changed), but the semantics are subtle. No explicit "state transitions in time window" query. The resource_state_events table has this data but it's not exposed as a filter.
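To make the difference from an `updated_at` filter concrete, here is a sketch of a "state transitions in time window" query over in-memory rows. The `StateEvent` fields are an assumed shape for `resource_state_events`, not its actual columns, and `transitioned_in_window` is a hypothetical helper.

```rust
/// Assumed shape of one row from `resource_state_events`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct StateEvent {
    issue_iid: u64,
    /// Target state of the transition, e.g. "closed" or "reopened".
    state: String,
    /// When the transition happened, in epoch milliseconds.
    created_at_ms: i64,
}

/// Issues that moved into `state` within `[from_ms, to_ms)`, regardless of
/// when they were created or last updated. Each IID appears once, sorted.
fn transitioned_in_window(events: &[StateEvent], state: &str, from_ms: i64, to_ms: i64) -> Vec<u64> {
    let mut iids: Vec<u64> = events
        .iter()
        .filter(|e| e.state == state)
        .filter(|e| e.created_at_ms >= from_ms && e.created_at_ms < to_ms)
        .map(|e| e.issue_iid)
        .collect();
    iids.sort_unstable();
    iids.dedup();
    iids
}
```

The filter keys on the event timestamp, so an issue that was merely commented on inside the window (which bumps `updated_at`) would not be reported as a state change.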
---
### A14. Meeting Prep: Summarize Recent Activity for a Stakeholder
**Problem:** Agent needs to prepare a 2-minute summary for a project sponsor meeting.
**Flow:**
```
lore -J count issues -p project # Current totals
lore -J count mrs -p project # MR totals
lore -J issues -s closed --since 1w -p project --fields minimal
lore -J mrs -s merged --since 1w -p project --fields minimal
lore -J issues -s opened --status "In progress" -p project
lore -J who --active -p project --since 1w
```
**Gap identified:** Six queries, same as A3. No summary/dashboard command. Agent must synthesize all responses. No trend data (is the open issue count growing or shrinking?). No "highlights" extraction.
---
### A15. Determining If Work Is Safe to Start (Conflict Detection)
**Problem:** Agent is about to start work on an issue and needs to check nobody else is already working on it.
**Flow:**
```
lore -J issues 123 # Read the issue
# Check assignees from response
lore -J mrs -s opened -A other-person # Are they working on related MRs?
lore -J who --overlap src/target/path/ # Anyone actively touching these files?
lore -J search "issue-123-keywords" --type mr -s opened # Wait, search has no --state
```
**Gap identified:** No way to check "is there an open MR that closes issue #123?" -- the entity_references data exists but isn't queryable. Search doesn't support `--state` filter. No "conflict detection" or "in-flight work" check. Agent must do multiple queries and manually correlate.
---
## Part 3: Gap Summary
### Critical Gaps (high impact, blocks common workflows)
| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|----------------------|
| 1 | **MR file changes not surfaced** | H4, A6, A10 | `lore mrs <iid> --files` or include in detail view |
| 2 | **Entity references not queryable** | H7, A5, A11, A15 | `lore refs --issue 123` / `lore refs --mr 456` |
| 3 | **Per-note search missing** | H7, A4 | `lore search --granularity note` (PRD exists) |
| 4 | **No entity-based timeline** | H11, A7 | `lore timeline --issue 321` / `lore timeline --mr 456` |
| 5 | **No @me / current-user alias** | H1, H15 | Resolve from auth token automatically |
### Important Gaps (significant friction, multiple workarounds needed)
| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|----------------------|
| 6 | **No activity feed / summary** | H1, A3, A14 | `lore activity --since 1d` or `lore summary` |
| 7 | **No multi-path who query** | H6, A6 | `lore who src/path1/ src/path2/` |
| 8 | **No --state filter on search** | H14, A15 | `lore search --state merged` |
| 9 | **MRs missing --milestone filter** | H9, A12 | `lore mrs -m "v2.0"` |
| 10 | **No --no-assignee / --unassigned** | H2 | `lore issues --no-assignee` |
| 11 | **No --updated-before filter** | H10, A9 | `lore issues --before 90d` or `--stale 90d` |
| 12 | **No team workload view** | H8 | `lore who --team` or `lore workload` |
### Nice-to-Have Gaps (would improve agent efficiency)
| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|----------------------|
| 13 | **No pagination/offset** | A9 | `--offset 100` for large result sets |
| 14 | **No detail --fields on show** | A2 | `lore issues 999 --fields description` |
| 15 | **No cross-project grouping** | A8 | `lore search --group-by project` |
| 16 | **No trend/velocity metrics** | A3, A14 | `lore trends issues --period week` |
| 17 | **No --for-issue on mrs** | A12, A15 | `lore mrs --closes 123` (query entity_refs) |
| 18 | **1y/12m duration not supported** | A4 | Support `1y`, `12m`, `365d` in --since |
| 19 | **No discussion participant filter** | H15 | `lore who --active --participant @me` |
| 20 | **No sort by due date** | H2 | `lore issues --sort due` |
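
For gap 18, a minimal sketch of a duration parser that accepts `1y`, `12m`, and `365d`. It assumes fixed approximations of 30 days per month and 365 days per year, which is a choice made for this example rather than documented `lore` behavior; `parse_since_days` is a hypothetical name.

```rust
/// Parse a relative duration such as "7d", "2w", "12m", or "1y" into days.
/// Months and years use fixed approximations (30 and 365 days); this is an
/// assumption of the sketch, not the CLI's documented behavior.
fn parse_since_days(input: &str) -> Option<u64> {
    let input = input.trim();
    // The unit is the final character; everything before it must be digits.
    let unit = input.chars().last()?;
    let digits = &input[..input.len() - unit.len_utf8()];
    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    let n: u64 = digits.parse().ok()?;
    let days_per_unit = match unit {
        'd' => 1,
        'w' => 7,
        'm' => 30,
        'y' => 365,
        _ => return None,
    };
    // Reject values that would overflow instead of wrapping.
    n.checked_mul(days_per_unit)
}
```

Under these approximations `12m` resolves to 360 days while `1y` and `365d` both resolve to 365, so the three spellings are close but not identical.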

@@ -1,434 +0,0 @@
Below are the highest-leverage revisions I'd make to this plan. I'm focusing on correctness pitfalls, SQLite gotchas, query performance on 280K notes, and reducing “dynamic SQL + param juggling” complexity, without turning this into a new ingestion project.
Change 1 — Fix a hard SQLite bug in --active (GROUP_CONCAT DISTINCT + separator)
Why
SQLite does not allow GROUP_CONCAT(DISTINCT x, sep). With DISTINCT, SQLite only permits a single argument (GROUP_CONCAT(DISTINCT x)). Your current query will error at runtime in many SQLite versions.
Revision
Use a subquery that selects distinct participants, then GROUP_CONCAT with your separator.
diff --git a/Plan.md b/Plan.md
@@ fn query_active(...)
- (SELECT GROUP_CONCAT(DISTINCT n.author_username, X'1F')
- FROM notes n
- WHERE n.discussion_id = d.id
- AND n.is_system = 0
- AND n.author_username IS NOT NULL) AS participants
+ (SELECT GROUP_CONCAT(username, X'1F') FROM (
+ SELECT DISTINCT n.author_username AS username
+ FROM notes n
+ WHERE n.discussion_id = d.id
+ AND n.is_system = 0
+ AND n.author_username IS NOT NULL
+ ORDER BY username
+ )) AS participants
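On the Rust side, the `participants` column then needs to be split on the same separator. A small sketch, assuming the column arrives as an optional string; `join_participants` mirrors the subquery in memory purely for testing, and both function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// ASCII Unit Separator, the same byte as SQLite's X'1F' literal.
const UNIT_SEP: &str = "\u{1F}";

/// In-memory mirror of the subquery: distinct, sorted usernames joined by
/// the separator.
fn join_participants<'a>(authors: impl IntoIterator<Item = &'a str>) -> String {
    let distinct: BTreeSet<&str> = authors.into_iter().collect();
    distinct.into_iter().collect::<Vec<_>>().join(UNIT_SEP)
}

/// Split a GROUP_CONCAT result back into usernames. A NULL or empty column
/// (no qualifying notes) yields an empty list.
fn split_participants(raw: Option<&str>) -> Vec<String> {
    match raw {
        Some(s) if !s.is_empty() => s.split(UNIT_SEP).map(str::to_owned).collect(),
        _ => Vec::new(),
    }
}
```

The unit separator is a reasonable choice because it should not appear in a username, so splitting stays unambiguous where a comma-joined list might not.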
Change 2 — Replace “contains('.') => exact file match” with segment-aware path classification
Why
path.contains('.') misclassifies directories like:
.github/workflows/
src/v1.2/auth/
It also fails the “root file” case (README.md) because your mode discriminator only treats paths as paths if they contain /.
Revision
Add explicit --path to force Expert mode (covers root files cleanly).
Classify file-vs-dir by checking last path segment for a dot, and whether the input ends with /.
diff --git a/Plan.md b/Plan.md
@@ pub struct WhoArgs {
- /// Username or file path (path if contains /)
- pub target: Option<String>,
+ /// Username or file path shorthand (ambiguous for root files like README.md)
+ pub target: Option<String>,
+
+ /// Force expert mode for a file/directory path (supports root files like README.md)
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["active", "overlap", "reviews"])]
+ pub path: Option<String>,
@@ fn resolve_mode<'a>(args: &'a WhoArgs) -> Result<WhoMode<'a>> {
- if let Some(target) = &args.target {
+ if let Some(p) = &args.path {
+ return Ok(WhoMode::Expert { path: p });
+ }
+ if let Some(target) = &args.target {
let clean = target.strip_prefix('@').unwrap_or(target);
if args.reviews {
return Ok(WhoMode::Reviews { username: clean });
}
- // Disambiguation: if target contains '/', it's a file path.
- // GitLab usernames never contain '/'.
- if target.contains('/') {
+ // Disambiguation:
+ // - treat as path if it contains '/'
+ // - otherwise treat as username (root files require --path)
+ if target.contains('/') {
return Ok(WhoMode::Expert { path: target });
}
return Ok(WhoMode::Workload { username: clean });
}
And update the path pattern logic used by Expert/Overlap:
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
- // Normalize path for LIKE matching: add trailing % if no extension
- let path_pattern = if path.contains('.') {
- path.to_string() // Exact file match
- } else {
- let trimmed = path.trim_end_matches('/');
- format!("{trimmed}/%")
- };
+ // Normalize:
+ // - if ends_with('/') => directory prefix
+ // - else if last segment contains '.' => file exact match
+ // - else => directory prefix
+ let trimmed = path.trim_end_matches('/');
+ let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
+ let is_file = !path.ends_with('/') && last.contains('.');
+ let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
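The segment-aware rule can be exercised in isolation. The sketch below restates it as a standalone function with a result enum; `PathPattern` and `classify_path` are hypothetical names, and LIKE escaping is deliberately left out here.

```rust
/// How a user-supplied path should be matched against `position_new_path`.
#[derive(Debug, PartialEq, Eq)]
enum PathPattern {
    /// Match exactly one file.
    Exact(String),
    /// Match everything under a directory; the value is a LIKE pattern
    /// ending in "/%".
    Prefix(String),
}

fn classify_path(path: &str) -> PathPattern {
    let trimmed = path.trim_end_matches('/');
    // Only the last segment decides; dots in parent directories are ignored.
    let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_file = !path.ends_with('/') && last.contains('.');
    if is_file {
        PathPattern::Exact(trimmed.to_string())
    } else {
        PathPattern::Prefix(format!("{trimmed}/%"))
    }
}
```

Both problem cases from above (`.github/workflows/` and `src/v1.2/auth/`) now classify as directory prefixes. One limitation remains: a dot-directory written without a trailing slash, such as `.github`, is still classified as a file, so the trailing slash is what disambiguates it.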
Change 3 — Stop building dynamic SQL strings for optional filters; always bind params
Why
Right now you're mixing:
dynamic project_clause string fragments
ad-hoc param vectors
placeholder renumbering by branch
That's brittle and easy to regress (especially when you add more conditions later). SQLite/rusqlite can bind Option<T> to NULL, which enables a simple pattern:
AND (?3 IS NULL OR n.project_id = ?3)
Revision (representative; apply to all queries)
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
- let project_clause = if project_id.is_some() {
- "AND n.project_id = ?3"
- } else {
- ""
- };
-
- let sql = format!(
+ let sql = format!(
"SELECT username, role, activity_count, last_active_at FROM (
@@
FROM notes n
WHERE n.position_new_path LIKE ?1
AND n.is_system = 0
AND n.author_username IS NOT NULL
AND n.created_at >= ?2
- {project_clause}
+ AND (?3 IS NULL OR n.project_id = ?3)
@@
WHERE n.position_new_path LIKE ?1
AND m.author_username IS NOT NULL
AND m.updated_at >= ?2
- {project_clause}
+ AND (?3 IS NULL OR n.project_id = ?3)
GROUP BY m.author_username
- )"
+ ) t"
);
-
- let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
- params.push(Box::new(path_pattern.clone()));
- params.push(Box::new(since_ms));
- if let Some(pid) = project_id {
- params.push(Box::new(pid));
- }
- let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
+ let param_refs = rusqlite::params![path_pattern, since_ms, project_id];
Notes:
Adds a derived-table alias t. SQLite itself accepts an unaliased subquery in FROM, but some other SQL engines require the alias, so it keeps the query portable.
Eliminates the dynamic param vector and placeholder gymnastics.
Change 4 — Filter “path touch” queries to DiffNotes and escape LIKE properly
Why
Only DiffNotes reliably have position_new_path; including other note types can skew counts and harm performance.
LIKE treats % and _ as wildcards—rare in file paths, but not impossible (generated files, templates). Escaping is a low-cost robustness win.
Revision
Add note_type='DiffNote' and LIKE ... ESCAPE '\' plus a tiny escape helper.
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
- FROM notes n
- WHERE n.position_new_path LIKE ?1
+ FROM notes n
+ WHERE n.note_type = 'DiffNote'
+ AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND n.is_system = 0
@@
diff --git a/Plan.md b/Plan.md
@@ Helper Functions
+fn escape_like(input: &str) -> String {
+ input.replace('\\', "\\\\").replace('%', "\\%").replace('_', "\\_")
+}
And when building patterns:
- let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
+ let base = escape_like(trimmed);
+ let path_pattern = if is_file { base } else { format!("{base}/%") };
Apply the same changes to query_overlap and any other position_new_path LIKE ....
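To see why escaping matters, the sketch below pairs `escape_like` with a tiny in-memory model of `LIKE ... ESCAPE '\'`. The matcher (`like_match`, `like_rec`) is a hypothetical illustration only: it is case-sensitive, whereas SQLite's LIKE is case-insensitive for ASCII by default, and it is not how SQLite implements LIKE.

```rust
/// Escape LIKE metacharacters so the input matches literally under ESCAPE '\'.
fn escape_like(input: &str) -> String {
    input.replace('\\', "\\\\").replace('%', "\\%").replace('_', "\\_")
}

/// Simplified, case-sensitive model of `text LIKE pattern ESCAPE '\'`.
fn like_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    like_rec(&p, &t)
}

fn like_rec(p: &[char], t: &[char]) -> bool {
    let (head, rest) = match p.split_first() {
        Some((h, r)) => (*h, r),
        // Pattern exhausted: match only if the text is exhausted too.
        None => return t.is_empty(),
    };
    match head {
        // '%' matches any run of characters, including none.
        '%' => (0..=t.len()).any(|i| like_rec(rest, &t[i..])),
        // '_' matches exactly one character.
        '_' => !t.is_empty() && like_rec(rest, &t[1..]),
        // Escape: the next pattern character must match literally.
        '\\' => match rest.split_first() {
            Some((lit, after)) => t.first() == Some(lit) && like_rec(after, &t[1..]),
            None => false,
        },
        lit => t.first() == Some(&lit) && like_rec(rest, &t[1..]),
    }
}
```

Without escaping, the directory `gen_out` turns into the pattern `gen_out/%`, where `_` is a wildcard that also matches `genXout/a.rs`. With `escape_like`, the pattern becomes `gen\_out/%` and only the literal directory matches.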
Change 5 — Use note timestamps for “touch since” semantics (Expert/Overlap author branch)
Why
In Expert/Overlap “author” branches you filter by m.updated_at >= since. That answers “MR updated recently” rather than “MR touched at this path recently”, which can surface stale ownership.
Revision
Filter by the note creation time (and use it for “last touch” where relevant). You can still compute author activity, but anchor it to note activity.
diff --git a/Plan.md b/Plan.md
@@ fn query_overlap(...)
- WHERE n.position_new_path LIKE ?1
+ WHERE n.note_type = 'DiffNote'
+ AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND m.state IN ('opened', 'merged')
AND m.author_username IS NOT NULL
- AND m.updated_at >= ?2
+ AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
Same idea in Expert mode's “MR authors” branch.
Change 6 — Workload mode: apply --since consistently to unresolved discussions
Why
Workload's unresolved discussions ignore since_ms. That makes --since partially misleading and can dump very old threads.
Revision
Filter on d.last_note_at when since_ms is set.
diff --git a/Plan.md b/Plan.md
@@ fn query_workload(...)
- let disc_sql = format!(
+ let disc_since = if since_ms.is_some() {
+ "AND d.last_note_at >= ?2"
+ } else { "" };
+ let disc_sql = format!(
"SELECT d.noteable_type,
@@
WHERE d.resolvable = 1 AND d.resolved = 0
AND EXISTS (
@@
)
{disc_project_filter}
+ {disc_since}
ORDER BY d.last_note_at DESC
LIMIT {limit}"
);
@@
- // Rebuild params for discussion query (only username + optional project_id)
- let mut disc_params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
- disc_params.push(Box::new(username.to_string()));
- if let Some(pid) = project_id {
- disc_params.push(Box::new(pid));
- }
+ // Params: username, since_ms, project_id (NULLs ok)
+ let disc_param_refs = rusqlite::params![username, since_ms, project_id];
(If you adopt Change 3 fully, this becomes very clean.)
Change 7 — Make Overlap results represent “both roles” instead of collapsing to one
Why
Collapsing to a single role loses valuable info (“they authored and reviewed”). Also your current “prefer author” rule is arbitrary for the “who else is touching this” question.
Revision
Track role counts separately and render as A, R, or A+R.
diff --git a/Plan.md b/Plan.md
@@ pub struct OverlapUser {
pub username: String,
- pub role: String,
- pub touch_count: u32,
+ pub author_touch_count: u32,
+ pub review_touch_count: u32,
+ pub touch_count: u32,
pub last_touch_at: i64,
pub mr_iids: Vec<i64>,
}
@@ fn query_overlap(...)
- let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
+ let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
username: username.clone(),
- role: role.clone(),
+ author_touch_count: 0,
+ review_touch_count: 0,
touch_count: 0,
last_touch_at: 0,
mr_iids: Vec::new(),
});
entry.touch_count += count;
+ if role == "author" { entry.author_touch_count += count; }
+ if role == "reviewer" { entry.review_touch_count += count; }
@@ human output
- println!(
- " {:<16} {:<8} {:>7} {:<12} {}",
+ println!(
+ " {:<16} {:<6} {:>7} {:<12} {}",
...
);
@@
- user.role,
+ format_roles(user.author_touch_count, user.review_touch_count),
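The diff calls `format_roles` without defining it. One possible shape, as a sketch: the `"-"` fallback for a user with neither count is an assumption added here, since the plan only names `A`, `R`, and `A+R`.

```rust
/// Render which roles a user held on the path: "A" (author), "R" (reviewer),
/// or "A+R" (both). The "-" fallback for zero/zero is an assumption.
fn format_roles(author_touch_count: u32, review_touch_count: u32) -> &'static str {
    match (author_touch_count > 0, review_touch_count > 0) {
        (true, true) => "A+R",
        (true, false) => "A",
        (false, true) => "R",
        (false, false) => "-",
    }
}
```

The longest value, `A+R`, is three characters wide, so it fits the `{:<6}` column used in the human output above.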
Change 8 — Add an “Index Audit + optional migration” step (big perf win, low blast radius)
Why
With 280K notes, the path/timestamp queries will degrade quickly without indexes. This isn't “scope creep”; it's making the feature usable.
Revision (plan-level)
Add a non-breaking migration that only creates indexes if missing.
Optionally add a runtime check: if EXPLAIN QUERY PLAN indicates full table scan on notes, print a dim warning in human mode.
diff --git a/Plan.md b/Plan.md
@@ Implementation Order
-| Step | What | Files |
+| Step | What | Files |
| 1 | CLI skeleton: `WhoArgs` + `Commands::Who` + dispatch + stub | `cli/mod.rs`, `commands/mod.rs`, `main.rs` |
+| 1.5 | Index audit + add `CREATE INDEX IF NOT EXISTS` migration for who hot paths | `migrations/0xx_who_indexes.sql` |
@@
Suggested indexes (tune names to your conventions):
notes(note_type, position_new_path, created_at)
notes(discussion_id, is_system, author_username)
discussions(resolvable, resolved, last_note_at, project_id)
merge_requests(project_id, state, updated_at, author_username)
issue_assignees(username, issue_id)
Even if SQLite can't perfectly index LIKE, these still help with join and timestamp filters.
Change 9 — Make robot JSON reproducible by echoing the effective query inputs
Why
Agent workflows benefit from a stable “query record”: what mode ran, what path/user, resolved project, effective since, limit.
Revision
Include an input object in JSON output.
diff --git a/Plan.md b/Plan.md
@@ struct WhoJsonData {
mode: String,
+ input: serde_json::Value,
#[serde(flatten)]
result: serde_json::Value,
}
@@ pub fn print_who_json(...)
- let output = WhoJsonEnvelope {
+ let input = serde_json::json!({
+ "project": /* resolved or raw args.project */,
+ "since": /* resolved since ISO */,
+ "limit": /* args.limit */,
+ });
+ let output = WhoJsonEnvelope {
ok: true,
data: WhoJsonData {
mode: mode.to_string(),
+ input,
result: data,
},
meta: RobotMeta { elapsed_ms },
};
Change 10 — Tighten clap constraints so invalid combinations never reach resolve_mode
Why
Right now conflicts are enforced manually (or not at all). Clamp the invalid combos at the CLI layer:
--active should conflict with target, --overlap, --reviews, --path
--reviews should require a username (and should conflict with Expert path modes)
diff --git a/Plan.md b/Plan.md
@@ pub struct WhoArgs {
- pub active: bool,
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "overlap", "reviews", "path"])]
+ pub active: bool,
@@
- pub overlap: Option<String>,
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "active", "reviews", "path"])]
+ pub overlap: Option<String>,
@@
- pub reviews: bool,
+ #[arg(long, help_heading = "Mode", requires = "target", conflicts_with_all = ["active", "overlap", "path"])]
+ pub reviews: bool,
Summary of what I'd definitely change
If you do nothing else, do these first:
Fix GROUP_CONCAT(DISTINCT ..., sep) in Active mode (runtime error).
Path classification: add --path, and stop using contains('.') globally.
Remove dynamic SQL + param vectors: always bind project_id as nullable and use (? IS NULL OR ...).
Filter to DiffNotes + LIKE escaping for correctness and fewer rows scanned.
Optional index migration: otherwise this will feel slow/non-deterministically slow depending on local DB state.
If you want, I can also provide a consolidated “v2 plan” as a single unified patch (one diff) rather than per-change snippets.

@@ -1,303 +0,0 @@
Below are the highest-leverage revisions I'd make to iteration 1 to tighten correctness, performance, and “agent usefulness” without blowing up scope. For each change: (1) rationale, (2) a focused unified diff against the plan you pasted.
Change 1 — Make robot “input echo” actually resolved (project_id, project_path, since_ms/iso, mode)
Why
Your Design Principle #5 says the robot envelope should echo resolved inputs (“effective since, resolved project”), but the current input object echoes only raw CLI strings. Agents can't reliably reproduce or compare runs (e.g., fuzzy project resolution may map differently over time).
This is also a reliability improvement: “what ran” should be computed once and propagated, not recomputed in output.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-5. **Robot-first reproducibility.** Robot JSON output includes an `input` object echoing the resolved query parameters (effective since, resolved project, limit) so agents can trace exactly what ran.
+5. **Robot-first reproducibility.** Robot JSON output includes a `resolved_input` object (mode, since_ms + since_iso, resolved project_id + project_path, limit, db_path) so agents can trace exactly what ran.
@@
-/// Main entry point. Resolves mode from args and dispatches.
-pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoResult> {
+/// Main entry point. Resolves mode + resolved inputs once, then dispatches.
+pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoRun> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
- let project_id = args
+ let project_id = args
.project
.as_deref()
.map(|p| resolve_project(&conn, p))
.transpose()?;
+ let project_path = project_id
+ .map(|id| lookup_project_path(&conn, id))
+ .transpose()?;
let mode = resolve_mode(args)?;
match mode {
WhoMode::Expert { path } => {
let since_ms = resolve_since(args.since.as_deref(), "6m")?;
let result = query_expert(&conn, path, project_id, since_ms, args.limit)?;
- Ok(WhoResult::Expert(result))
+ Ok(WhoRun::new("expert", &db_path, project_id, project_path, since_ms, args.limit, WhoResult::Expert(result)))
}
@@
}
}
+
+/// Wrapper that carries resolved inputs for reproducible output.
+pub struct WhoRun {
+ pub mode: String,
+ pub resolved_input: WhoResolvedInput,
+ pub result: WhoResult,
+}
+
+pub struct WhoResolvedInput {
+ pub db_path: String,
+ pub project_id: Option<i64>,
+ pub project_path: Option<String>,
+ pub since_ms: i64,
+ pub since_iso: String,
+ pub limit: usize,
+}
@@
-pub fn print_who_json(result: &WhoResult, args: &WhoArgs, elapsed_ms: u64) {
- let (mode, data) = match result {
+pub fn print_who_json(run: &WhoRun, args: &WhoArgs, elapsed_ms: u64) {
+ let (mode, data) = match &run.result {
WhoResult::Expert(r) => ("expert", expert_to_json(r)),
@@
- let input = serde_json::json!({
+ let input = serde_json::json!({
"target": args.target,
"path": args.path,
"project": args.project,
"since": args.since,
"limit": args.limit,
});
+
+ let resolved_input = serde_json::json!({
+ "mode": run.mode,
+ "db_path": run.resolved_input.db_path,
+ "project_id": run.resolved_input.project_id,
+ "project_path": run.resolved_input.project_path,
+ "since_ms": run.resolved_input.since_ms,
+ "since_iso": run.resolved_input.since_iso,
+ "limit": run.resolved_input.limit,
+ });
@@
- data: WhoJsonData {
- mode: mode.to_string(),
- input,
- result: data,
- },
+ data: WhoJsonData { mode: mode.to_string(), input, resolved_input, result: data },
meta: RobotMeta { elapsed_ms },
};
@@
struct WhoJsonData {
mode: String,
input: serde_json::Value,
+ resolved_input: serde_json::Value,
#[serde(flatten)]
result: serde_json::Value,
}
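`WhoResolvedInput` carries both `since_ms` and `since_iso`, so the second has to be derived from the first. A std-only sketch of that conversion follows, using the well-known days-to-civil-date algorithm for the proleptic Gregorian calendar. `ms_to_iso` is a hypothetical name; the project may already have a time helper or use a date crate instead.

```rust
/// Convert epoch milliseconds to an ISO 8601 UTC timestamp with second
/// precision, e.g. "2001-09-09T01:46:40Z". Sub-second digits are dropped.
fn ms_to_iso(ms: i64) -> String {
    let secs = ms.div_euclid(1000);
    let days = secs.div_euclid(86_400);
    let sod = secs.rem_euclid(86_400); // seconds of day, 0..86_400

    // Days since 1970-01-01 -> (year, month, day), computed on a calendar
    // whose year starts on 1 March so the leap day falls at the end.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // 0..=365
    let mp = (5 * doy + 2) / 153; // month index with March = 0
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year,
        month,
        day,
        sod / 3600,
        (sod % 3600) / 60,
        sod % 60
    )
}
```

Computing this once in `run_who` and storing it in `WhoResolvedInput` keeps the printed value consistent with the `since_ms` actually bound into the queries.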
Change 2 — Remove dynamic SQL format!(..LIMIT {limit}) and parameterize LIMIT everywhere
Why
You explicitly prefer static SQL ((?N IS NULL OR ...)) to avoid subtle bugs, but Workload/Active still use format! for LIMIT. Even though limit is typed, it's an inconsistency that complicates statement caching and encourages future string assembly creep.
SQLite supports LIMIT ? with bound parameters; rusqlite can bind an i64.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
- let issues_sql = format!(
- "SELECT ...
- ORDER BY i.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&issues_sql)?;
+ let issues_sql =
+ "SELECT ...
+ ORDER BY i.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(issues_sql)?;
let assigned_issues: Vec<WorkloadIssue> = stmt
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let authored_sql = format!(
- "SELECT ...
- ORDER BY m.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&authored_sql)?;
+ let authored_sql =
+ "SELECT ...
+ ORDER BY m.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(authored_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let reviewing_sql = format!(
- "SELECT ...
- ORDER BY m.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&reviewing_sql)?;
+ let reviewing_sql =
+ "SELECT ...
+ ORDER BY m.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(reviewing_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let disc_sql = format!(
- "SELECT ...
- ORDER BY d.last_note_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&disc_sql)?;
+ let disc_sql =
+ "SELECT ...
+ ORDER BY d.last_note_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(disc_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let sql = format!(
- "SELECT ...
- ORDER BY d.last_note_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&sql)?;
+ let sql =
+ "SELECT ...
+ ORDER BY d.last_note_at DESC
+ LIMIT ?3";
+ let mut stmt = conn.prepare(sql)?;
@@
- .query_map(rusqlite::params![since_ms, project_id], |row| {
+ .query_map(rusqlite::params![since_ms, project_id, limit as i64], |row| {
Change 3 — Fix path matching for dotless files (LICENSE/Makefile) via “exact OR prefix” (no new flags)
Why
Your improved “dot only in last segment” heuristic still fails on dotless files (LICENSE, Makefile, Dockerfile), which are common, especially at repo root. Right now they'll be treated as directories (LICENSE/%) and silently return nothing.
Best minimal UX: if the user provides a path that's ambiguous (no trailing slash), match either the exact file OR the directory prefix.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-/// Build a LIKE pattern from a user-supplied path, with proper LIKE escaping.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix → `escaped_path%`
-/// - If the last path segment contains `.`, it's a file → exact match
-/// - Otherwise, it's a directory prefix → `escaped_path/%`
+/// Build an exact + prefix match from a user-supplied path, with proper LIKE escaping.
+///
+/// Rules:
+/// - If the path ends with `/`, treat as directory-only (prefix match)
+/// - Otherwise, treat as ambiguous: exact match OR directory prefix
+/// (fixes dotless files like LICENSE/Makefile without requiring new flags)
@@
-fn build_path_pattern(path: &str) -> String {
+struct PathMatch {
+ exact: String,
+ prefix: String,
+ dir_only: bool,
+}
+
+fn build_path_match(path: &str) -> PathMatch {
let trimmed = path.trim_end_matches('/');
- let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
- let is_file = !path.ends_with('/') && last_segment.contains('.');
let escaped = escape_like(trimmed);
-
- if is_file {
- escaped
- } else {
- format!("{escaped}/%")
- }
+ PathMatch {
+ exact: trimmed.to_string(), // unescaped: compared with '=', which does not interpret LIKE escapes
+ prefix: format!("{escaped}/%"),
+ dir_only: path.ends_with('/'),
+ }
}
@@
- let path_pattern = build_path_pattern(path);
+ let pm = build_path_match(path);
@@
- AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND (
+ (?4 = 1 AND n.position_new_path LIKE ?2 ESCAPE '\\')
+ OR (?4 = 0 AND (n.position_new_path = ?1 OR n.position_new_path LIKE ?2 ESCAPE '\\'))
+ )
@@
- let rows: Vec<(String, String, u32, i64)> = stmt
- .query_map(rusqlite::params![path_pattern, since_ms, project_id], |row| {
+ let rows: Vec<(String, String, u32, i64)> = stmt
+ .query_map(rusqlite::params![pm.exact, pm.prefix, since_ms, i32::from(pm.dir_only), project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
})?
(Apply the same pattern to Overlap mode.)
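The “exact OR prefix” rule can be checked without a database. The sketch below evaluates the same predicate in memory; this `PathMatch` is a simplified stand-in for the struct in the diff (it stores a plain directory prefix instead of a LIKE pattern), and `path_matches` is a hypothetical helper.

```rust
/// In-memory stand-in for the SQL predicate: exact file OR directory prefix.
struct PathMatch {
    /// Unescaped path, compared with equality.
    exact: String,
    /// Directory prefix including the trailing '/'.
    dir: String,
    /// True when the user wrote a trailing '/', which rules out a file match.
    dir_only: bool,
}

fn build_path_match(path: &str) -> PathMatch {
    let trimmed = path.trim_end_matches('/');
    PathMatch {
        exact: trimmed.to_string(),
        dir: format!("{trimmed}/"),
        dir_only: path.ends_with('/'),
    }
}

/// Mirrors `(?4 = 1 AND prefix) OR (?4 = 0 AND (exact OR prefix))`.
fn path_matches(pm: &PathMatch, candidate: &str) -> bool {
    let under_dir = candidate.starts_with(pm.dir.as_str());
    if pm.dir_only {
        under_dir
    } else {
        candidate == pm.exact.as_str() || under_dir
    }
}
```

Keeping the trailing `/` in the prefix is what stops `src/auth` from also matching `src/authz/...`, while a bare `LICENSE` still matches the root file.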
Change 4 — Consistently exclude system notes in all DiffNote-based branches (Expert/Overlap author branches currently don't)
Why
You filter n.is_system = 0 for reviewer branches, but not in the author branches of Expert/Overlap. That can skew “author touch” via system-generated diff notes or bot activity.
Consistency here improves correctness and also enables more aggressive partial indexing.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND n.is_system = 0
AND m.author_username IS NOT NULL
AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND n.is_system = 0
AND m.state IN ('opened', 'merged')
AND m.author_username IS NOT NULL
AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
Change 5 — Rework Migration 017 indexes to match real predicates + add one critical notes index for discussion participation
Why
(a) idx_notes_diffnote_path_created currently leads with note_type even though it's constant via partial index. You want the leading columns to match your most selective predicates: position_new_path prefix + created_at range, with optional project_id.
(b) Active + Workload discussion participation repeatedly hits notes by (discussion_id, author_username); you only guarantee notes(discussion_id) is indexed. Adding a narrow partial composite index pays off immediately for both “participants” and “EXISTS user participated” checks.
(c) The discussions index should focus on (project_id, last_note_at) with a partial predicate; resolvable/resolved a_

@@ -1,471 +0,0 @@
Below are the revisions I'd make to iteration 2 to improve correctness, determinism, query-plan quality, and multi-project usability without turning this into a bigger product.
I'm treating your plan as the “source of truth” and showing git-diff style patches against the plan text/code blocks you included.
Change 1 — Fix project scoping to hit the right index (DiffNote branches)
Why
Your hot-path index is:
idx_notes_diffnote_path_created ON notes(position_new_path, created_at, project_id) WHERE note_type='DiffNote' AND is_system=0
But in Expert/Overlap you sometimes scope by m.project_id = ?3 (MR table), not n.project_id = ?3 (notes table). That weakens the optimizer's ability to use the composite notes index (and can force broader joins before filtering).
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Expert Mode @@
- AND (?3 IS NULL OR m.project_id = ?3)
+ -- IMPORTANT: scope on notes.project_id to maximize use of
+ -- idx_notes_diffnote_path_created (notes is the selective table)
+ AND (?3 IS NULL OR n.project_id = ?3)
@@ Query: Overlap Mode @@
- AND (?3 IS NULL OR m.project_id = ?3)
+ AND (?3 IS NULL OR n.project_id = ?3)
@@ Query: Overlap Mode (author branch) @@
- AND (?3 IS NULL OR m.project_id = ?3)
+ AND (?3 IS NULL OR n.project_id = ?3)
Change 2 — Introduce a “prefix vs exact” path query to avoid LIKE when you don't need it
Why
For exact file paths (e.g. src/auth/login.rs), you currently do:
position_new_path LIKE ?1 ESCAPE '\' where ?1 has no wildcard
That's logically fine, but it's a worse signal to the planner than = and can degrade performance depending on collation/case settings.
This doesn't violate “static SQL”: you can pick between two static query strings.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Helper: Path Pattern Construction @@
-fn build_path_pattern(path: &str) -> String {
+struct PathQuery {
+ /// The parameter value to bind.
+ value: String,
+ /// If true: bind to `LIKE ?1 ESCAPE '\'` (value already ends in `/%`). If false: bind to `= ?1`.
+ is_prefix: bool,
+}
+
+fn build_path_query(path: &str) -> PathQuery {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
let is_file = !path.ends_with('/') && last_segment.contains('.');
let escaped = escape_like(trimmed);
if is_file {
- escaped
+ PathQuery { value: trimmed.to_string(), is_prefix: false } // '=' needs the unescaped path
} else {
- format!("{escaped}/%")
+ PathQuery { value: format!("{escaped}/%"), is_prefix: true }
}
}
And then (example for DiffNote predicates):
@@ Query: Expert Mode @@
- let path_pattern = build_path_pattern(path);
+ let pq = build_path_query(path);
- let sql = " ... n.position_new_path LIKE ?1 ESCAPE '\\' ... ";
+ let sql_prefix = " ... n.position_new_path LIKE ?1 ESCAPE '\\' ... ";
+ let sql_exact = " ... n.position_new_path = ?1 ... ";
- let mut stmt = conn.prepare(sql)?;
+ let mut stmt = if pq.is_prefix { conn.prepare_cached(sql_prefix)? }
+ else { conn.prepare_cached(sql_exact)? };
let rows = stmt.query_map(params![... pq.value ...], ...);
Change 3 — Push Expert aggregation into SQL (less Rust, fewer rows, SQL-level LIMIT)
Why
Right now Expert does:
UNION ALL
return per-role rows
HashMap merge
score compute
sort/truncate
You can do all of that in SQL deterministically, then LIMIT ?N actually works.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Expert Mode @@
- let sql = "SELECT username, role, activity_count, last_active_at FROM (
- ...
- )";
+ let sql = "
+ WITH activity AS (
+ SELECT
+ n.author_username AS username,
+ 'reviewer' AS role,
+ COUNT(*) AS cnt,
+ MAX(n.created_at) AS last_active_at
+ FROM notes n
+ WHERE n.note_type = 'DiffNote'
+ AND n.is_system = 0
+ AND n.author_username IS NOT NULL
+ AND n.created_at >= ?2
+ AND (?3 IS NULL OR n.project_id = ?3)
+ AND (
+ (?4 = 1 AND n.position_new_path LIKE ?1 ESCAPE '\\') OR
+ (?4 = 0 AND n.position_new_path = ?1)
+ )
+ GROUP BY n.author_username
+
+ UNION ALL
+
+ SELECT
+ m.author_username AS username,
+ 'author' AS role,
+ COUNT(DISTINCT m.id) AS cnt,
+ MAX(n.created_at) AS last_active_at
+ FROM merge_requests m
+ JOIN discussions d ON d.merge_request_id = m.id
+ JOIN notes n ON n.discussion_id = d.id
+ WHERE n.note_type = 'DiffNote'
+ AND n.is_system = 0
+ AND m.author_username IS NOT NULL
+ AND n.created_at >= ?2
+ AND (?3 IS NULL OR n.project_id = ?3)
+ AND (
+ (?4 = 1 AND n.position_new_path LIKE ?1 ESCAPE '\\') OR
+ (?4 = 0 AND n.position_new_path = ?1)
+ )
+ GROUP BY m.author_username
+ )
+ SELECT
+ username,
+ SUM(CASE WHEN role='reviewer' THEN cnt ELSE 0 END) AS review_count,
+ SUM(CASE WHEN role='author' THEN cnt ELSE 0 END) AS author_count,
+ MAX(last_active_at) AS last_active_at,
+ (SUM(CASE WHEN role='reviewer' THEN cnt ELSE 0 END) * 3.0) +
+ (SUM(CASE WHEN role='author' THEN cnt ELSE 0 END) * 2.0) AS score
+ FROM activity
+ GROUP BY username
+ ORDER BY score DESC, last_active_at DESC, username ASC
+ LIMIT ?5
+ ";
- // Aggregate by username: combine reviewer + author counts
- let mut user_map: HashMap<...> = HashMap::new();
- ...
- experts.sort_by(...); experts.truncate(limit);
+ // No Rust-side merge/sort needed; SQL already returns final rows.
Change 4 — Overlap output is ambiguous across projects: include stable MR refs (project_path!iid)
Why
mr_iids: Vec<i64> is ambiguous in a multi-project DB. !123 only means something with a project.
Also: your MR IID dedup is currently Vec.contains() inside a loop (O(n²)). Use a HashSet.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ OverlapResult @@
pub struct OverlapUser {
pub username: String,
@@
- pub mr_iids: Vec<i64>,
+ /// Stable MR references like "group/project!123"
+ pub mr_refs: Vec<String>,
}
@@ Query: Overlap Mode (SQL) @@
- GROUP_CONCAT(DISTINCT m.iid) AS mr_iids
+ GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN merge_requests m ON d.merge_request_id = m.id
+ JOIN projects p ON m.project_id = p.id
@@
- GROUP_CONCAT(DISTINCT m.iid) AS mr_iids
+ GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
FROM merge_requests m
JOIN discussions d ON d.merge_request_id = m.id
JOIN notes n ON n.discussion_id = d.id
+ JOIN projects p ON m.project_id = p.id
@@ Query: Overlap Mode (Rust merge) @@
- let mr_iids: Vec<i64> = mr_iids_csv ...
+ let mr_refs: Vec<String> = mr_refs_csv
+ .as_deref()
+ .map(|csv| csv.split(',').map(|s| s.trim().to_string()).collect())
+ .unwrap_or_default();
@@
- // Merge MR IIDs, deduplicate
- for iid in &mr_iids {
- if !entry.mr_iids.contains(iid) {
- entry.mr_iids.push(*iid);
- }
- }
+ // Merge MR refs, deduplicate
+ use std::collections::HashSet;
+ let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
+ for r in mr_refs { set.insert(r); }
+ entry.mr_refs = set.into_iter().collect();
Change 5 — Active mode: avoid correlated subqueries by preselecting discussions, then aggregating notes once
Why
Your Active query does two correlated subqueries per discussion row:
note_count
participants
With LIMIT 20 it's not catastrophic, but it is still unnecessary work and creates “spiky” behavior if the planner chooses poorly.
Pattern to use:
CTE selects the limited set of discussions
Join notes once, aggregate with GROUP BY
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Active Mode @@
- let sql =
- "SELECT
- d.noteable_type,
- ...
- (SELECT COUNT(*) FROM notes n
- WHERE n.discussion_id = d.id AND n.is_system = 0) AS note_count,
- (SELECT GROUP_CONCAT(username, X'1F') FROM (
- SELECT DISTINCT n.author_username AS username
- FROM notes n
- WHERE n.discussion_id = d.id
- AND n.is_system = 0
- AND n.author_username IS NOT NULL
- ORDER BY username
- )) AS participants
- FROM discussions d
- ...
- LIMIT ?3";
+ let sql = "
+ WITH picked AS (
+ SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id, d.project_id, d.last_note_at
+ FROM discussions d
+ WHERE d.resolvable = 1 AND d.resolved = 0
+ AND d.last_note_at >= ?1
+ AND (?2 IS NULL OR d.project_id = ?2)
+ ORDER BY d.last_note_at DESC
+ LIMIT ?3
+ ),
+ note_agg AS (
+ SELECT
+ n.discussion_id,
+ COUNT(*) AS note_count,
+ GROUP_CONCAT(n.author_username, X'1F') AS participants
+ FROM (
+ SELECT DISTINCT discussion_id, author_username
+ FROM notes
+ WHERE is_system = 0 AND author_username IS NOT NULL
+ ) n
+ JOIN picked p ON p.id = n.discussion_id
+ GROUP BY n.discussion_id
+ )
+ SELECT
+ p.noteable_type,
+ COALESCE(i.iid, m.iid) AS entity_iid,
+ COALESCE(i.title, m.title) AS entity_title,
+ proj.path_with_namespace,
+ p.last_note_at,
+ COALESCE(na.note_count, 0) AS note_count,
+ COALESCE(na.participants, '') AS participants
+ FROM picked p
+ JOIN projects proj ON p.project_id = proj.id
+ LEFT JOIN issues i ON p.issue_id = i.id
+ LEFT JOIN merge_requests m ON p.merge_request_id = m.id
+ LEFT JOIN note_agg na ON na.discussion_id = p.id
+ ORDER BY p.last_note_at DESC
+ ";
Change 6 — Use prepare_cached() everywhere (cheap perf win, no scope creep)
Why
You already worked hard to keep SQL static. Taking advantage of sqlite statement caching completes the loop.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query functions @@
- let mut stmt = conn.prepare(sql)?;
+ let mut stmt = conn.prepare_cached(sql)?;
Apply in all query fns (query_workload, query_reviews, query_active, query_expert, query_overlap, lookup_project_path).
Change 7 — Human output: show project_path where ambiguity exists (Workload + Overlap)
Why
When not project-scoped, #42 and !100 aren't unique. You already have project paths in the query results; you're just not printing them.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ print_workload_human @@
- println!(
- " {} {} {}",
+ println!(
+ " {} {} {} {}",
style(format!("#{:<5}", item.iid)).cyan(),
truncate_str(&item.title, 45),
style(format_relative_time(item.updated_at)).dim(),
+ style(&item.project_path).dim(),
);
@@ print_workload_human (MRs) @@
- println!(
- " {} {}{} {}",
+ println!(
+ " {} {}{} {} {}",
style(format!("!{:<5}", mr.iid)).cyan(),
truncate_str(&mr.title, 40),
style(draft).dim(),
style(format_relative_time(mr.updated_at)).dim(),
+ style(&mr.project_path).dim(),
);
@@ print_overlap_human @@
- let mr_str = user.mr_iids.iter().take(5).map(|iid| format!("!{iid}")).collect::<Vec<_>>().join(", ");
+ let mr_str = user.mr_refs.iter().take(5).cloned().collect::<Vec<_>>().join(", ");
Change 8 — Robot JSON: add stable IDs + “defaulted” flags for reproducibility
Why
You already added resolved_input — good. Two more reproducibility gaps remain:
Agents can't reliably “open” an entity without IDs (discussion_id, mr_id, issue_id).
Agents can't tell whether since was user-provided vs defaulted (important when replaying intent).
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ WhoResolvedInput @@
pub struct WhoResolvedInput {
@@
pub since_ms: Option<i64>,
pub since_iso: Option<String>,
+ pub since_was_default: bool,
pub limit: usize,
}
@@ run_who @@
- let since_ms = resolve_since(args.since.as_deref(), "6m")?;
+ let since_was_default = args.since.is_none();
+ let since_ms = resolve_since(args.since.as_deref(), "6m")?;
Ok(WhoRun {
resolved_input: WhoResolvedInput {
@@
since_ms: Some(since_ms),
since_iso: Some(ms_to_iso(since_ms)),
+ since_was_default,
limit: args.limit,
},
@@ print_who_json resolved_input @@
let resolved_input = serde_json::json!({
@@
"since_ms": run.resolved_input.since_ms,
"since_iso": run.resolved_input.since_iso,
+ "since_was_default": run.resolved_input.since_was_default,
"limit": run.resolved_input.limit,
});
And for Active/Workload discussion items, add IDs in SQL and JSON:
@@ ActiveDiscussion @@
pub struct ActiveDiscussion {
+ pub discussion_id: i64,
@@
}
@@ query_active SELECT @@
- SELECT
- p.noteable_type,
+ SELECT
+ p.id AS discussion_id,
+ p.noteable_type,
@@ active_to_json @@
- "discussions": r.discussions.iter().map(|d| json!({
+ "discussions": r.discussions.iter().map(|d| json!({
+ "discussion_id": d.discussion_id,
...
}))
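The "defaulted" flag only works if provenance is captured before the default is substituted. A minimal sketch of that ordering follows; `SinceInput` and `since_or_default` are hypothetical names used for illustration, and the plan's own `resolve_since` is not reproduced here.

```rust
/// A `--since` value together with where it came from.
#[derive(Debug, PartialEq)]
struct SinceInput {
    raw: String,
    was_default: bool,
}

/// Record provenance first, then fall back to the default.
fn since_or_default(arg: Option<&str>, default: &str) -> SinceInput {
    SinceInput {
        raw: arg.unwrap_or(default).to_string(),
        was_default: arg.is_none(),
    }
}
```

Note that an explicit `--since 6m` and an omitted flag resolve to the same value but differ in `was_default`, which is exactly the distinction a replaying agent needs.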
Change 9 — Make performance verification explicit: require EXPLAIN QUERY PLAN checks for each mode
Why
You're adding indexes specifically for these queries. The only way to ensure the planner is doing what you think is to lock in a short perf checklist (especially after schema drift or SQLite version differences).
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Verification @@
# Manual verification against real data
cargo run --release -- who src/features/global-search/
@@
cargo run --release -- who src/features/global-search/ -p typescript # project scoped
+
+# Perf verification (required before merge):
+# Confirm idx_notes_diffnote_path_created is used for Expert/Overlap and
+# idx_discussions_unresolved_recent is used for Active.
+sqlite3 path/to/db.sqlite "
+ EXPLAIN QUERY PLAN
+ SELECT ... -- paste final Expert SQL with representative bindings
+";
(Keep it lightweight: one representative query per mode is enough.)
Net effect
Correctness: project scoping hits the notes index; IDs added for agent workflows.
Performance: fewer rows/materialization in Expert; statement caching everywhere; Active avoids correlated subqueries.
UX: human output no longer ambiguous across projects; Overlap MR references become actionable.
Reproducibility: agents can distinguish defaults vs explicit inputs; can dereference entities reliably.
If you want one “highest ROI” subset to implement first: Change 1 + Change 4 + Change 6 + Change 7. That's where the real operational value lands.


@@ -1,3 +0,0 @@
ChatGPT said:
Proposing code revisions for performance and determinism
Answer now


@@ -1,356 +0,0 @@
Below are the highest-leverage revisions I'd make. They're tightly scoped (no new tables/APIs), but fix a few real correctness issues and make the outputs more actionable.
1) Fix a correctness bug in PathQuery: don't escape for =, and make --path Makefile actually work
Why
Bug: build_path_query() currently runs escape_like() even when is_prefix = false (exact match). That will break exact matches for paths containing _, %, or \ because = does not treat those as metacharacters (so the escaped string won't equal the stored path).
UX mismatch: The plan says --path handles dotless root files (Makefile/LICENSE), but the current logic still treats them as directory prefixes (Makefile/%) → zero results.
Change
Only escape for LIKE.
Treat root paths (no /) passed via --path as exact matches by default (unless they end with /).
diff --git a/plan.md b/plan.md
@@
-/// Build a path query from a user-supplied path.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix -> `escaped_path%` (LIKE)
-/// - If the last path segment contains `.`, it's a file -> exact match (=)
-/// - Otherwise, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// Build a path query from a user-supplied path.
+///
+/// Rules:
+/// - If the path ends with `/`, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// - If the path is a root path (no `/`) and does NOT end with `/`, treat as exact (=)
+/// (this makes `--path Makefile` and `--path LICENSE` work as intended)
+/// - Else if the last path segment contains `.`, treat as exact (=)
+/// - Otherwise, treat as directory prefix -> `escaped_path/%` (LIKE)
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(path: &str) -> PathQuery {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
- let is_file = !path.ends_with('/') && last_segment.contains('.');
- let escaped = escape_like(trimmed);
+ let is_root = !trimmed.contains('/');
+ let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
if is_file {
PathQuery {
- value: escaped,
+ // IMPORTANT: do NOT escape for exact match (=)
+ value: trimmed.to_string(),
is_prefix: false,
}
} else {
+ let escaped = escape_like(trimmed);
PathQuery {
value: format!("{escaped}/%"),
is_prefix: true,
}
}
}
@@
-/// **Known limitation:** Dotless root files (LICENSE, Makefile, Dockerfile)
-/// without a trailing `/` will be treated as directory prefixes. Use `--path`
-/// for these — the `--path` flag passes through to Expert mode directly,
-/// and the `build_path_query` output for "LICENSE" is a prefix `LICENSE/%`
-/// which will simply return zero results (a safe, obvious failure mode that the
-/// help text addresses).
+/// Note: Root file paths passed via `--path` (including dotless files like Makefile/LICENSE)
+/// are treated as exact matches unless they end with `/`.
Also update the --path help text to be explicit:
diff --git a/plan.md b/plan.md
@@
- /// Force expert mode for a file/directory path (handles root files like
- /// README.md, LICENSE, Makefile that lack a / and can't be auto-detected)
+ /// Force expert mode for a file/directory path.
+ /// Root files (README.md, LICENSE, Makefile) are treated as exact matches.
+ /// Use a trailing `/` to force directory-prefix matching.
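The revised rules can be checked in isolation with a self-contained sketch like the one below. The body of `escape_like` is an assumption (the plan references the helper but its definition is not shown in this excerpt); it escapes `\`, `%`, and `_` with a backslash to match `ESCAPE '\'`.

```rust
#[derive(Debug, PartialEq)]
struct PathQuery {
    value: String,
    is_prefix: bool,
}

/// Assumed helper: escape LIKE metacharacters so input matches literally.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

fn build_path_query(path: &str) -> PathQuery {
    let trimmed = path.trim_end_matches('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_root = !trimmed.contains('/');
    let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
    if is_file {
        // Exact match uses `=`, so the raw path is bound unescaped.
        PathQuery { value: trimmed.to_string(), is_prefix: false }
    } else {
        // Prefix match uses LIKE, so only this branch escapes.
        PathQuery { value: format!("{}/%", escape_like(trimmed)), is_prefix: true }
    }
}
```

With these rules, `Makefile` binds as an exact match while `Makefile/` still forces a directory prefix.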
2) Fix Active mode: your note_count is currently counting participants, and the CTE scans too broadly
Why
In note_agg, you do SELECT DISTINCT discussion_id, author_username and then COUNT(*) AS note_count. That's participant count, not note count.
The current note_agg also builds the DISTINCT set from all notes and then joins to picked. It's avoidable work.
Change
Split into two aggregations scoped to picked:
note_counts: counts non-system notes per picked discussion.
participants: distinct usernames per picked discussion, then GROUP_CONCAT.
diff --git a/plan.md b/plan.md
@@
- note_agg AS (
- SELECT
- n.discussion_id,
- COUNT(*) AS note_count,
- GROUP_CONCAT(n.author_username, X'1F') AS participants
- FROM (
- SELECT DISTINCT discussion_id, author_username
- FROM notes
- WHERE is_system = 0 AND author_username IS NOT NULL
- ) n
- JOIN picked p ON p.id = n.discussion_id
- GROUP BY n.discussion_id
- )
+ note_counts AS (
+ SELECT
+ n.discussion_id,
+ COUNT(*) AS note_count
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0
+ GROUP BY n.discussion_id
+ ),
+ participants AS (
+ SELECT
+ x.discussion_id,
+ GROUP_CONCAT(x.author_username, X'1F') AS participants
+ FROM (
+ SELECT DISTINCT n.discussion_id, n.author_username
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0 AND n.author_username IS NOT NULL
+ ) x
+ GROUP BY x.discussion_id
+ )
@@
- LEFT JOIN note_agg na ON na.discussion_id = p.id
+ LEFT JOIN note_counts nc ON nc.discussion_id = p.id
+ LEFT JOIN participants pa ON pa.discussion_id = p.id
@@
- COALESCE(na.note_count, 0) AS note_count,
- COALESCE(na.participants, '') AS participants
+ COALESCE(nc.note_count, 0) AS note_count,
+ COALESCE(pa.participants, '') AS participants
Net effect: correctness fix + more predictable perf.
Add a test that would have failed before:
diff --git a/plan.md b/plan.md
@@
#[test]
fn test_active_query() {
@@
- insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+ insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+ insert_diffnote(&conn, 2, 1, 1, "reviewer_b", "src/foo.rs", "follow-up");
@@
- assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+ assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+ assert_eq!(result.discussions[0].note_count, 2);
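The difference between the two aggregates is easy to see outside SQL. The sketch below is an in-memory analogue under assumed names (`Note`, `summarize`); it is not the plan's query, only the same counting semantics: every non-system note counts once, every non-NULL author appears once.

```rust
use std::collections::BTreeSet;

/// Minimal stand-in for a row of the `notes` table (hypothetical shape).
struct Note<'a> {
    discussion_id: i64,
    author: Option<&'a str>,
    is_system: bool,
}

/// Returns (note_count, sorted distinct participants) for one discussion.
fn summarize<'a>(notes: &[Note<'a>], discussion_id: i64) -> (usize, Vec<&'a str>) {
    let mut note_count = 0;
    let mut authors: BTreeSet<&'a str> = BTreeSet::new();
    for n in notes {
        if n.discussion_id != discussion_id || n.is_system {
            continue;
        }
        // Count every non-system note, even repeated ones by the same author.
        note_count += 1;
        // Record each author once; NULL authors are skipped.
        if let Some(a) = n.author {
            authors.insert(a);
        }
    }
    (note_count, authors.into_iter().collect())
}
```

Two notes by the same reviewer give a note count of 2 and a single participant, which is the case the added assertion pins down.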
3) Index fix: idx_discussions_unresolved_recent won't help global --active ordering
Why
Your index is (project_id, last_note_at) with WHERE resolvable=1 AND resolved=0.
When --active is not project-scoped (common default), SQLite can't use (project_id, last_note_at) to satisfy ORDER BY last_note_at DESC efficiently because project_id isn't constrained.
This can turn into a scan+sort over potentially large unresolved sets.
Change
Keep the project-scoped index, but add a global ordering index (partial, still small):
diff --git a/plan.md b/plan.md
@@
CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent
ON discussions(project_id, last_note_at)
WHERE resolvable = 1 AND resolved = 0;
+
+-- Active (global): unresolved discussions by recency (no project scope).
+-- Supports ORDER BY last_note_at DESC LIMIT N when project_id is unconstrained.
+CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global
+ ON discussions(last_note_at)
+ WHERE resolvable = 1 AND resolved = 0;
4) Make Overlap “touches” coherent: count MRs for reviewers, not DiffNotes
Why
Overlap's question is “Who else has MRs touching my files?” but:
reviewer branch uses COUNT(*) (DiffNotes)
author branch uses COUNT(DISTINCT m.id) (MRs)
Those are different units; summing them into touch_count is misleading.
Change
Count distinct MRs on the reviewer branch too:
diff --git a/plan.md b/plan.md
@@
- COUNT(*) AS touch_count,
+ COUNT(DISTINCT m.id) AS touch_count,
MAX(n.created_at) AS last_touch_at,
GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
Also update human output labeling:
diff --git a/plan.md b/plan.md
@@
- style("Touches").bold(),
+ style("MRs").bold(),
(You still preserve “strength” via mr_refs and last_touch_at.)
5) Make outputs more actionable: add a canonical ref field (group/project!iid, group/project#iid)
Why
You already do this for Overlap (mr_refs). Doing the same for Workload and Active reduces friction for both humans and agents:
humans can copy/paste a single token
robots don't need to stitch project_path + iid + prefix
Change (Workload structs + SQL)
diff --git a/plan.md b/plan.md
@@
pub struct WorkloadIssue {
pub iid: i64,
+ pub ref_: String,
pub title: String,
pub project_path: String,
pub updated_at: i64,
}
@@
pub struct WorkloadMr {
pub iid: i64,
+ pub ref_: String,
pub title: String,
pub draft: bool,
pub project_path: String,
@@
- let issues_sql =
- "SELECT i.iid, i.title, p.path_with_namespace, i.updated_at
+ let issues_sql =
+ "SELECT i.iid,
+ (p.path_with_namespace || '#' || i.iid) AS ref,
+ i.title, p.path_with_namespace, i.updated_at
@@
- iid: row.get(0)?,
- title: row.get(1)?,
- project_path: row.get(2)?,
- updated_at: row.get(3)?,
+ iid: row.get(0)?,
+ ref_: row.get(1)?,
+ title: row.get(2)?,
+ project_path: row.get(3)?,
+ updated_at: row.get(4)?,
})
@@
- let authored_sql =
- "SELECT m.iid, m.title, m.draft, p.path_with_namespace, m.updated_at
+ let authored_sql =
+ "SELECT m.iid,
+ (p.path_with_namespace || '!' || m.iid) AS ref,
+ m.title, m.draft, p.path_with_namespace, m.updated_at
@@
- iid: row.get(0)?,
- title: row.get(1)?,
- draft: row.get::<_, i32>(2)? != 0,
- project_path: row.get(3)?,
+ iid: row.get(0)?,
+ ref_: row.get(1)?,
+ title: row.get(2)?,
+ draft: row.get::<_, i32>(3)? != 0,
+ project_path: row.get(4)?,
author_username: None,
- updated_at: row.get(4)?,
+ updated_at: row.get(5)?,
})
Then use ref_ in human output + robot JSON.
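If you ever need the same token on the Rust side (for example in tests), the format is small enough to sketch. `EntityKind` and `canonical_ref` are hypothetical names; the `#` and `!` sigils follow the SQL expressions above.

```rust
/// Which sigil to use when building a canonical reference.
enum EntityKind {
    Issue,
    MergeRequest,
}

/// Builds "group/project#42" for issues and "group/project!100" for MRs.
fn canonical_ref(project_path: &str, kind: EntityKind, iid: i64) -> String {
    let sigil = match kind {
        EntityKind::Issue => '#',
        EntityKind::MergeRequest => '!',
    };
    format!("{project_path}{sigil}{iid}")
}
```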
6) Reviews mode: tolerate leading whitespace before **prefix**
Why
Many people write " **suggestion**: ...". Current LIKE '**%**%' misses that.
Change
Use ltrim(n.body) consistently:
diff
Copy code
diff --git a/plan.md b/plan.md
@@
- AND n.body LIKE '**%**%'
+ AND ltrim(n.body) LIKE '**%**%'
@@
- SUBSTR(n.body, 3, INSTR(SUBSTR(n.body, 3), '**') - 1) AS raw_prefix,
+ SUBSTR(ltrim(n.body), 3, INSTR(SUBSTR(ltrim(n.body), 3), '**') - 1) AS raw_prefix,
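For reference, here is a Rust analogue of the trimmed extraction, useful as a test oracle. `extract_prefix` is a hypothetical name. One caveat worth keeping in mind: one-argument `ltrim()` in SQLite strips spaces only, so the sketch deliberately trims spaces rather than all whitespace; bodies that start with a tab or newline would still be missed by the SQL above.

```rust
/// Mirrors `ltrim(body) LIKE '**%**%'` plus the SUBSTR/INSTR extraction.
fn extract_prefix(body: &str) -> Option<&str> {
    // Strip leading spaces only, matching SQLite's default ltrim().
    let trimmed = body.trim_start_matches(' ');
    // Must open with "**" ...
    let rest = trimmed.strip_prefix("**")?;
    // ... and contain a closing "**" somewhere after it.
    let end = rest.find("**")?;
    Some(&rest[..end])
}
```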
7) Add two small tests that catch the above regressions
Why
These are exactly the kind of issues that slip through without targeted tests.
diff --git a/plan.md b/plan.md
@@
#[test]
fn test_escape_like() {
@@
}
+
+ #[test]
+ fn test_build_path_query_exact_does_not_escape() {
+ // '_' must not be escaped for '='
+ let pq = build_path_query("README_with_underscore.md");
+ assert_eq!(pq.value, "README_with_underscore.md");
+ assert!(!pq.is_prefix);
+ }
+
+ #[test]
+ fn test_path_flag_dotless_root_file_is_exact() {
+ let pq = build_path_query("Makefile");
+ assert_eq!(pq.value, "Makefile");
+ assert!(!pq.is_prefix);
+ }
Summary of net effect
Correctness fixes: exact-path escaping bug; Active.note_count bug.
Perf fixes: global --active index; avoid broad note scans in Active.
Usefulness upgrades: coherent overlap “touch” metric; canonical refs everywhere; reviews prefix more robust.
If you want one extra “stretch” that still isn't scope creep: add an unscoped warning line in human output when project_id == None (e.g., “Aggregated across projects; use -p to scope”) for Expert/Overlap/Active. That's pure presentation, but it prevents misinterpretation in multi-project DBs.


@@ -1,471 +0,0 @@
Proposed revisions (Iteration 6)
Below are the highest-leverage changes I'd make on top of your current Iteration 5 plan, with rationale and git-diff style edits to the plan text/snippets.
1) Fix a real edge case: dotless non-root files (src/Dockerfile, infra/Makefile, etc.)
Why
Your current build_path_query() treats dotless last segments as directories (prefix match) unless the path is root. That misclassifies legitimate dotless files inside directories and silently produces path/% (zero hits or wrong hits).
Best minimal fix: keep your static SQL approach, but add a DB existence probe (static SQL) for path queries:
If the user didn't force a directory (trailing /), and the exact path exists in DiffNotes, treat as exact =.
Otherwise use prefix LIKE 'dir/%'.
This avoids new CLI flags, avoids heuristics lists, and uses your existing partial index (idx_notes_diffnote_path_created) efficiently.
Diff
diff --git a/Plan.md b/Plan.md
@@
-struct PathQuery {
+struct PathQuery {
/// The parameter value to bind.
value: String,
/// If true: use `LIKE value ESCAPE '\'`. If false: use `= value`.
is_prefix: bool,
}
-/// Build a path query from a user-supplied path.
+/// Build a path query from a user-supplied path, with a DB probe for dotless files.
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(conn: &Connection, path: &str) -> Result<PathQuery> {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
let is_root = !trimmed.contains('/');
- let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
+ let forced_dir = path.ends_with('/');
+ let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
+
+ // If it doesn't "look like a file" but the exact path exists in DiffNotes,
+ // treat as exact (handles src/Dockerfile, infra/Makefile, etc.).
+ let exact_exists = if !looks_like_file && !forced_dir {
+ conn.query_row(
+ "SELECT 1
+ FROM notes
+ WHERE note_type = 'DiffNote'
+ AND is_system = 0
+ AND position_new_path = ?1
+ LIMIT 1",
+ rusqlite::params![trimmed],
+ |_| Ok(()),
+ ).is_ok()
+ } else {
+ false
+ };
+
+ let is_file = looks_like_file || exact_exists;
- if is_file {
+ Ok(if is_file {
PathQuery {
value: trimmed.to_string(),
is_prefix: false,
}
} else {
let escaped = escape_like(trimmed);
PathQuery {
value: format!("{escaped}/%"),
is_prefix: true,
}
- }
+ })
}
Also update callers:
@@
- let pq = build_path_query(path);
+ let pq = build_path_query(conn, path)?;
@@
- let pq = build_path_query(path);
+ let pq = build_path_query(conn, path)?;
And tests:
@@
- fn test_build_path_query() {
+ fn test_build_path_query() {
@@
- // Dotless root file -> exact match (root path without '/')
+ // Dotless root file -> exact match (root path without '/')
- let pq = build_path_query("Makefile");
+ let pq = build_path_query(&conn, "Makefile").unwrap();
assert_eq!(pq.value, "Makefile");
assert!(!pq.is_prefix);
+
+ // Dotless file in subdir should become exact if DB contains it (probe)
+ // (set up: insert one DiffNote with position_new_path = "src/Dockerfile")
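The decision logic can be unit-tested without a database by injecting the probe. In the sketch below, `PathKind`, `classify_path`, and the `exact_exists` closure are hypothetical stand-ins: the closure plays the role of the `SELECT 1 ... LIMIT 1` probe, and escaping is left out to keep the focus on classification.

```rust
#[derive(Debug, PartialEq)]
enum PathKind {
    Exact,
    Prefix,
}

/// `exact_exists` stands in for the DB probe on `position_new_path`.
fn classify_path(path: &str, exact_exists: impl Fn(&str) -> bool) -> PathKind {
    let trimmed = path.trim_end_matches('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let forced_dir = path.ends_with('/');
    let is_root = !trimmed.contains('/');
    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
    // Short-circuiting keeps the probe off the common paths: it runs only
    // when the heuristic is inconclusive and no trailing '/' was given.
    if looks_like_file || (!forced_dir && exact_exists(trimmed)) {
        PathKind::Exact
    } else {
        PathKind::Prefix
    }
}
```

A trailing `/` always wins, so `src/Dockerfile/` stays a prefix query even if a file with that name exists.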
2) Make “reviewer” semantics correct: exclude MR authors commenting on their own diffs
Why
Right now, Overlap (and Expert reviewer branch) will count MR authors as “reviewers” if they leave DiffNotes in their own MR (clarifications / replies), inflating A+R and contaminating “who reviewed here” signals.
You already enforce this in --reviews mode (m.author_username != ?1). Apply the same principle consistently:
Reviewer branch: only count notes where n.author_username != m.author_username (when both non-NULL).
Diff (Overlap reviewer branch)
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND n.is_system = 0
AND n.author_username IS NOT NULL
+ AND (m.author_username IS NULL OR n.author_username != m.author_username)
AND n.created_at >= ?2
AND (?3 IS NULL OR n.project_id = ?3)
Same change for sql_exact.
3) Expert mode scoring: align units + reduce single-MR “comment storms”
Why
Expert currently mixes units:
reviewer side: DiffNote count
author side: distinct MR count
That makes score noisy and can crown “someone who wrote 30 comments on one MR” as top expert.
Fix: make both sides primarily MR-breadth:
reviewer: COUNT(DISTINCT m.id) as review_mr_count
author: COUNT(DISTINCT m.id) as author_mr_count
Optionally keep review_note_count as a secondary intensity signal (but not the main driver).
Diff (types + SQL)
@@
pub struct Expert {
pub username: String,
- pub score: f64,
- pub review_count: u32,
- pub author_count: u32,
+ pub score: i64,
+ pub review_mr_count: u32,
+ pub review_note_count: u32,
+ pub author_mr_count: u32,
pub last_active_ms: i64,
}
Reviewer branch now joins to MR so it can count distinct MRs and exclude self-comments:
@@
- SELECT
- n.author_username AS username,
- 'reviewer' AS role,
- COUNT(*) AS cnt,
- MAX(n.created_at) AS last_active_at
- FROM notes n
+ SELECT
+ n.author_username AS username,
+ 'reviewer' AS role,
+ COUNT(DISTINCT m.id) AS mr_cnt,
+ COUNT(*) AS note_cnt,
+ MAX(n.created_at) AS last_active_at
+ FROM notes n
+ JOIN discussions d ON n.discussion_id = d.id
+ JOIN merge_requests m ON d.merge_request_id = m.id
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username IS NOT NULL
+ AND (m.author_username IS NULL OR n.author_username != m.author_username)
AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND n.created_at >= ?2
AND (?3 IS NULL OR n.project_id = ?3)
GROUP BY n.author_username
Update author branch payload to match shape:
@@
SELECT
m.author_username AS username,
'author' AS role,
- COUNT(DISTINCT m.id) AS cnt,
+ COUNT(DISTINCT m.id) AS mr_cnt,
+ 0 AS note_cnt,
MAX(n.created_at) AS last_active_at
Aggregate:
@@
SELECT
username,
- SUM(CASE WHEN role = 'reviewer' THEN cnt ELSE 0 END) AS review_count,
- SUM(CASE WHEN role = 'author' THEN cnt ELSE 0 END) AS author_count,
+ SUM(CASE WHEN role = 'reviewer' THEN mr_cnt ELSE 0 END) AS review_mr_count,
+ SUM(CASE WHEN role = 'reviewer' THEN note_cnt ELSE 0 END) AS review_note_count,
+ SUM(CASE WHEN role = 'author' THEN mr_cnt ELSE 0 END) AS author_mr_count,
MAX(last_active_at) AS last_active_at,
- (SUM(CASE WHEN role = 'reviewer' THEN cnt ELSE 0 END) * 3.0) +
- (SUM(CASE WHEN role = 'author' THEN cnt ELSE 0 END) * 2.0) AS score
+ (
+ (SUM(CASE WHEN role = 'reviewer' THEN mr_cnt ELSE 0 END) * 20) +
+ (SUM(CASE WHEN role = 'author' THEN mr_cnt ELSE 0 END) * 12) +
+ (SUM(CASE WHEN role = 'reviewer' THEN note_cnt ELSE 0 END) * 1)
+ ) AS score
Human header:
@@
- style("Reviews").bold(),
- style("Authored").bold(),
+ style("Reviewed(MRs)").bold(),
+ style("Notes").bold(),
+ style("Authored(MRs)").bold(),
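To sanity-check the weights, the same formula can be written in Rust and compared against the SQL output in a test. This is a sketch under assumptions: `score` and `rank` are hypothetical helpers, the struct is a trimmed copy of `Expert` without the stored `score` field, and the tie-breakers follow the earlier ORDER BY (score DESC, last active DESC, username ASC).

```rust
#[derive(Debug, Clone, PartialEq)]
struct Expert {
    username: String,
    review_mr_count: u32,
    review_note_count: u32,
    author_mr_count: u32,
    last_active_ms: i64,
}

/// 20 per reviewed MR, 12 per authored MR, 1 per review note.
fn score(e: &Expert) -> i64 {
    i64::from(e.review_mr_count) * 20
        + i64::from(e.author_mr_count) * 12
        + i64::from(e.review_note_count)
}

/// Highest score first, then most recently active, then username.
fn rank(experts: &mut [Expert]) {
    experts.sort_by(|a, b| {
        score(b)
            .cmp(&score(a))
            .then_with(|| b.last_active_ms.cmp(&a.last_active_ms))
            .then_with(|| a.username.cmp(&b.username))
    });
}
```

With these weights, 30 notes on one MR scores 50 while one note on each of three MRs scores 63, so breadth outranks a single comment storm. A large enough storm can still win, which is why the note weight stays at 1.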
4) Deterministic output: participants + MR refs + tie-breakers
Why
You've correctly focused on reproducibility (resolved_input), but you still have nondeterministic lists:
participants: GROUP_CONCAT order is undefined → vector order changes run-to-run.
mr_refs: you dedup via HashSet then iterate → undefined order.
user sorting in overlap is missing stable tie-breakers.
This is a real “robot mode flake” source.
Diff (Active participants sort)
@@
- let participants: Vec<String> = participants_csv
+ let mut participants: Vec<String> = participants_csv
.as_deref()
.filter(|s| !s.is_empty())
.map(|csv| csv.split('\x1F').map(String::from).collect())
.unwrap_or_default();
+ participants.sort(); // stable, deterministic
Diff (Overlap MR refs sort + stable user sort)
@@
- users.sort_by(|a, b| b.touch_count.cmp(&a.touch_count));
+ users.sort_by(|a, b| {
+ b.touch_count.cmp(&a.touch_count)
+ .then_with(|| b.last_touch_at.cmp(&a.last_touch_at))
+ .then_with(|| a.username.cmp(&b.username))
+ });
@@
- entry.mr_refs = set.into_iter().collect();
+ let mut v: Vec<String> = set.into_iter().collect();
+ v.sort();
+ entry.mr_refs = v;
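The participants half of this change is small enough to isolate as a helper. A minimal sketch, assuming the column arrives as an optional string joined with the unit separator (0x1F) as in the Active query; `parse_participants` is a hypothetical name.

```rust
/// Split a GROUP_CONCAT result joined with 0x1F and return it sorted.
fn parse_participants(joined: Option<&str>) -> Vec<String> {
    let mut names: Vec<String> = joined
        .filter(|s| !s.is_empty())
        .map(|s| s.split('\x1F').map(String::from).collect())
        .unwrap_or_default();
    // Sorting here removes any dependence on aggregate ordering in SQL.
    names.sort();
    names
}
```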
5) Make --limit actionable: surface truncation explicitly (human + robot)
Why
Agents (and humans) need to know if results were cut off so they can rerun with a bigger -n.
Right now there's no signal.
Minimal pattern: query limit + 1, set truncated = true if you got > limit, then truncate.
Diff (result types)
@@
pub struct ExpertResult {
pub path_query: String,
pub experts: Vec<Expert>,
+ pub truncated: bool,
}
@@
pub struct ActiveResult {
pub discussions: Vec<ActiveDiscussion>,
pub total_unresolved: u32,
+ pub truncated: bool,
}
@@
pub struct OverlapResult {
pub path_query: String,
pub users: Vec<OverlapUser>,
+ pub truncated: bool,
}
Diff (query pattern example)
@@
- let limit_i64 = limit as i64;
+ let limit_plus_one = (limit + 1) as i64;
@@
- LIMIT ?4
+ LIMIT ?4
@@
- rusqlite::params![pq.value, since_ms, project_id, limit_i64],
+ rusqlite::params![pq.value, since_ms, project_id, limit_plus_one],
@@
- Ok(ExpertResult {
+ let truncated = experts.len() > limit;
+ let experts = experts.into_iter().take(limit).collect();
+ Ok(ExpertResult {
path_query: path.to_string(),
experts,
+ truncated,
})
Human output hint:
@@
if r.experts.is_empty() { ... }
+ if r.truncated {
+ println!(" {}", style("(showing first -n; rerun with a higher --limit)").dim());
+ }
Robot output field:
@@
fn expert_to_json(r: &ExpertResult) -> serde_json::Value {
serde_json::json!({
"path_query": r.path_query,
+ "truncated": r.truncated,
"experts": ...
})
}
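Since the same limit + 1 pattern applies to Expert, Active, and Overlap, it may be worth factoring into one generic helper. A minimal sketch, with `apply_limit` as a hypothetical name; it assumes the caller already fetched up to `limit + 1` rows.

```rust
/// Given up to `limit + 1` fetched rows, return at most `limit` rows
/// plus a flag saying whether anything was cut off.
fn apply_limit<T>(mut rows: Vec<T>, limit: usize) -> (Vec<T>, bool) {
    let truncated = rows.len() > limit;
    rows.truncate(limit);
    (rows, truncated)
}
```

The extra row is only a sentinel: it proves more results exist and is never shown.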
6) Overlap merge hot loop: avoid repeated HashSet rebuild per row
Why
This line is expensive in a UNION result with many rows:
let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
It reallocates and rehashes every time.
Fix: store an accumulator with HashSet during merge, convert once at end.
Diff (internal accumulator)
diff
Copy code
@@
- let mut user_map: HashMap<String, OverlapUser> = HashMap::new();
+ struct OverlapAcc {
+ username: String,
+ author_touch_count: u32,
+ review_touch_count: u32,
+ touch_count: u32,
+ last_touch_at: i64,
+ mr_refs: HashSet<String>,
+ }
+ let mut user_map: HashMap<String, OverlapAcc> = HashMap::new();
@@
- let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
+ let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapAcc {
username: username.clone(),
author_touch_count: 0,
review_touch_count: 0,
touch_count: 0,
last_touch_at: 0,
- mr_refs: Vec::new(),
+ mr_refs: HashSet::new(),
});
@@
- let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
- for r in mr_refs { set.insert(r); }
- entry.mr_refs = set.into_iter().collect();
+ for r in mr_refs { entry.mr_refs.insert(r); }
@@
- let mut users: Vec<OverlapUser> = user_map.into_values().collect();
+ let mut users: Vec<OverlapUser> = user_map.into_values().map(|a| {
+ let mut mr_refs: Vec<String> = a.mr_refs.into_iter().collect();
+ mr_refs.sort();
+ OverlapUser {
+ username: a.username,
+ author_touch_count: a.author_touch_count,
+ review_touch_count: a.review_touch_count,
+ touch_count: a.touch_count,
+ last_touch_at: a.last_touch_at,
+ mr_refs,
+ }
+ }).collect();
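A minimal, std-only sketch of the accumulate-then-convert idea, assuming a trimmed-down row type (`UserRefs` is hypothetical; the real `OverlapUser` carries more fields). The set stays in the accumulator for the whole merge and becomes a sorted Vec once per user at the end:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, trimmed-down output row (illustration only).
#[derive(Debug, PartialEq)]
pub struct UserRefs {
    pub username: String,
    pub mr_refs: Vec<String>,
}

/// Merge `(username, mr_refs)` rows, deduplicating refs per user.
/// No HashSet is rebuilt per row: each row only extends the existing set.
pub fn merge_rows(rows: Vec<(String, Vec<String>)>) -> Vec<UserRefs> {
    let mut acc: HashMap<String, HashSet<String>> = HashMap::new();
    for (username, refs) in rows {
        acc.entry(username).or_default().extend(refs);
    }

    // Convert once at the end; sort refs and users for deterministic output.
    let mut users: Vec<UserRefs> = acc
        .into_iter()
        .map(|(username, set)| {
            let mut mr_refs: Vec<String> = set.into_iter().collect();
            mr_refs.sort();
            UserRefs { username, mr_refs }
        })
        .collect();
    users.sort_by(|a, b| a.username.cmp(&b.username));
    users
}
```

Sorting users by name here is only for a stable illustration; the plan's own ordering (by touch count, recency, etc.) would replace it.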
7) Tests to lock these behaviors
Add tests (high value)
dotless subdir file uses DB probe → exact match
self-review exclusion prevents MR author showing up as reviewer
deterministic ordering for participants and mr_refs (sort)
Diff (test additions outline)
diff
@@
#[test]
+ fn test_build_path_query_dotless_subdir_file_uses_probe() {
+ let conn = setup_test_db();
+ insert_project(&conn, 1, "team/backend");
+ insert_mr(&conn, 1, 1, 100, "author_a", "opened");
+ insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+ insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/Dockerfile", "note");
+
+ let pq = build_path_query(&conn, "src/Dockerfile").unwrap();
+ assert_eq!(pq.value, "src/Dockerfile");
+ assert!(!pq.is_prefix);
+ }
+
+ #[test]
+ fn test_overlap_excludes_self_review_notes() {
+ let conn = setup_test_db();
+ insert_project(&conn, 1, "team/backend");
+ insert_mr(&conn, 1, 1, 100, "author_a", "opened");
+ insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+ // author_a comments on their own MR diff
+ insert_diffnote(&conn, 1, 1, 1, "author_a", "src/auth/login.rs", "clarification");
+
+ let result = query_overlap(&conn, "src/auth/", None, 0, 20).unwrap();
+ let u = result.users.iter().find(|u| u.username == "author_a");
+ // should not be credited as reviewer touch
+ assert_eq!(u.map(|x| x.review_touch_count).unwrap_or(0), 0);
+ }
Net effect
Correctness: fixes dotless subdir files + self-review pollution.
Signal quality: Expert ranking becomes harder to game by comment volume.
Robot reproducibility: deterministic ordering + explicit truncation.
Performance: avoids rehash loops in overlap merges; path probe uses indexed equality.
If you want one “single best” change: #1 (DB probe exact-match) is the most likely to prevent confusing “why is this empty?” behavior without adding any user-facing complexity.


@@ -1,353 +0,0 @@
Below are the highest-leverage revisions I'd make to iteration 6 to improve correctness (multi-project edge cases), robot-mode reliability (bounded payloads + truncation), and signal quality, without changing the fundamental scope (still pure SQL over existing tables).
1) Make build_path_query project-aware and two-way probe (exact and prefix)
Why
Your DB probe currently answers: “does this exact file exist anywhere in DiffNotes?” That can misclassify in a project-scoped run:
Path exists as a dotless file in Project A → probe returns true
User runs -p Project B where the path is a directory (or different shape) → you switch to exact, return empty, and miss valid prefix hits.
Also, you still have a minor heuristic fragility for dot directories when the user omits the trailing / (e.g., ci/.config): the last segment contains a dot → you treat it as a file unless forced dir. (A path like .github/workflows is not affected, because its last segment has no dot.)
Revision
Thread project_id into build_path_query(conn, path, project_id)
Probe exact first (scoped), then probe prefix (scoped)
Only fall back to heuristics if both probes fail
This keeps “static SQL, no dynamic assembly,” and costs at most 2 indexed existence queries per invocation.
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
- fn build_path_query(conn: &Connection, path: &str) -> Result<PathQuery> {
+ fn build_path_query(conn: &Connection, path: &str, project_id: Option<i64>) -> Result<PathQuery> {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
let is_root = !trimmed.contains('/');
let forced_dir = path.ends_with('/');
- let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
+ // Heuristic is now only a fallback; probes decide first.
+ let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
- let exact_exists = if !looks_like_file && !forced_dir {
- conn.query_row(
- "SELECT 1 FROM notes
- WHERE note_type = 'DiffNote'
- AND is_system = 0
- AND position_new_path = ?1
- LIMIT 1",
- rusqlite::params![trimmed],
- |_| Ok(()),
- )
- .is_ok()
- } else {
- false
- };
+ // Probe 1: exact file exists (scoped)
+ let exact_exists = conn.query_row(
+ "SELECT 1 FROM notes
+ WHERE note_type = 'DiffNote'
+ AND is_system = 0
+ AND position_new_path = ?1
+ AND (?2 IS NULL OR project_id = ?2)
+ LIMIT 1",
+ rusqlite::params![trimmed, project_id],
+ |_| Ok(()),
+ ).is_ok();
+
+ // Probe 2: directory prefix exists (scoped)
+ let prefix_exists = if !forced_dir {
+ let escaped = escape_like(trimmed);
+ let pat = format!("{escaped}/%");
+ conn.query_row(
+ "SELECT 1 FROM notes
+ WHERE note_type = 'DiffNote'
+ AND is_system = 0
+ AND position_new_path LIKE ?1 ESCAPE '\\'
+ AND (?2 IS NULL OR project_id = ?2)
+ LIMIT 1",
+ rusqlite::params![pat, project_id],
+ |_| Ok(()),
+ ).is_ok()
+ } else { false };
- let is_file = looks_like_file || exact_exists;
+ // Forced directory always wins; otherwise: exact > prefix > heuristic
+ let is_file = if forced_dir { false }
+ else if exact_exists { true }
+ else if prefix_exists { false }
+ else { looks_like_file };
if is_file {
Ok(PathQuery { value: trimmed.to_string(), is_prefix: false })
} else {
let escaped = escape_like(trimmed);
Ok(PathQuery { value: format!("{escaped}/%"), is_prefix: true })
}
}
@@
- let pq = build_path_query(conn, path)?;
+ let pq = build_path_query(conn, path, project_id)?;
Add test coverage for the multi-project misclassification case:
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
#[test]
fn test_build_path_query_dotless_subdir_file_uses_db_probe() {
@@
- let pq = build_path_query(&conn, "src/Dockerfile").unwrap();
+ let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();
@@
- let pq2 = build_path_query(&conn2, "src/Dockerfile").unwrap();
+ let pq2 = build_path_query(&conn2, "src/Dockerfile", None).unwrap();
}
+
+ #[test]
+ fn test_build_path_query_probe_is_project_scoped() {
+ // Path exists as a dotless file in project 1; project 2 should not
+ // treat it as an exact file unless it exists there too.
+ let conn = setup_test_db();
+ insert_project(&conn, 1, "team/a");
+ insert_project(&conn, 2, "team/b");
+ insert_mr(&conn, 1, 1, 10, "author_a", "opened");
+ insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+ insert_diffnote(&conn, 1, 1, 1, "rev", "infra/Makefile", "note");
+
+ let pq_scoped = build_path_query(&conn, "infra/Makefile", Some(2)).unwrap();
+ assert!(pq_scoped.is_prefix); // should fall back to prefix in project 2
+ }
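The precedence rule above (forced directory, then exact probe, then prefix probe, then heuristic) can be sketched as a pure function. This is an illustration under assumptions: the two booleans stand in for the scoped DB probes, `classify_path` is a hypothetical name, and the `escape_like` body is a guess at what the plan's helper does (backslash-escaping `\`, `%`, and `_`):

```rust
/// Mirrors the plan's `PathQuery` shape.
#[derive(Debug, PartialEq)]
pub struct PathQuery {
    pub value: String,
    pub is_prefix: bool,
}

/// Hypothetical stand-in for the plan's `escape_like`: escape LIKE
/// metacharacters with a backslash (to pair with `ESCAPE '\'`).
pub fn escape_like(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if matches!(c, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Precedence: forced directory > exact probe > prefix probe > heuristic.
/// `exact_exists` / `prefix_exists` are the results of the two scoped probes.
pub fn classify_path(path: &str, exact_exists: bool, prefix_exists: bool) -> PathQuery {
    let trimmed = path.trim_end_matches('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_root = !trimmed.contains('/');
    let forced_dir = path.ends_with('/');
    // Heuristic is only the last fallback.
    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));

    let is_file = if forced_dir {
        false
    } else if exact_exists {
        true
    } else if prefix_exists {
        false
    } else {
        looks_like_file
    };

    if is_file {
        PathQuery { value: trimmed.to_string(), is_prefix: false }
    } else {
        PathQuery { value: format!("{}/%", escape_like(trimmed)), is_prefix: true }
    }
}
```

Keeping the decision separate from the SQL makes the precedence table testable without a database; the probes themselves still need the integration tests shown above.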
2) Bound robot payload sizes for participants and mr_refs (with totals + truncation)
Why
mr_refs and participants can become unbounded arrays in robot mode, which is a real operational hazard:
huge JSON → slow, noisy diffs, brittle downstream pipelines
potential SQLite group_concat truncation becomes invisible (and you can't distinguish “no refs” vs “refs truncated”)
Revision
Introduce hard caps and explicit metadata:
participants_total, participants_truncated
mr_refs_total, mr_refs_truncated
This is not scope creep; it's defensive output hygiene.
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
pub struct ActiveDiscussion {
@@
pub participants: Vec<String>,
+ pub participants_total: u32,
+ pub participants_truncated: bool,
}
@@
pub struct OverlapUser {
@@
pub mr_refs: Vec<String>,
+ pub mr_refs_total: u32,
+ pub mr_refs_truncated: bool,
}
Implementation sketch (Rust-side, deterministic):
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
fn query_active(...) -> Result<ActiveResult> {
+ const MAX_PARTICIPANTS: usize = 50;
@@
- participants.sort();
+ participants.sort();
+ let participants_total = participants.len() as u32;
+ let participants_truncated = participants.len() > MAX_PARTICIPANTS;
+ if participants_truncated {
+ participants.truncate(MAX_PARTICIPANTS);
+ }
@@
Ok(ActiveDiscussion {
@@
participants,
+ participants_total,
+ participants_truncated,
})
@@
fn query_overlap(...) -> Result<OverlapResult> {
+ const MAX_MR_REFS_PER_USER: usize = 50;
@@
.map(|a| {
let mut mr_refs: Vec<String> = a.mr_refs.into_iter().collect();
mr_refs.sort();
+ let mr_refs_total = mr_refs.len() as u32;
+ let mr_refs_truncated = mr_refs.len() > MAX_MR_REFS_PER_USER;
+ if mr_refs_truncated {
+ mr_refs.truncate(MAX_MR_REFS_PER_USER);
+ }
OverlapUser {
@@
mr_refs,
+ mr_refs_total,
+ mr_refs_truncated,
}
})
Update robot JSON accordingly:
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
fn active_to_json(r: &ActiveResult) -> serde_json::Value {
@@
"participants": d.participants,
+ "participants_total": d.participants_total,
+ "participants_truncated": d.participants_truncated,
}))
@@
fn overlap_to_json(r: &OverlapResult) -> serde_json::Value {
@@
"mr_refs": u.mr_refs,
+ "mr_refs_total": u.mr_refs_total,
+ "mr_refs_truncated": u.mr_refs_truncated,
}))
Also update robot-docs manifest schema snippet for who.active.discussions[] and who.overlap.users[].
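Since the same sort / count / truncate sequence appears for both participants and mr_refs, it could be factored into one helper. A minimal sketch, assuming hypothetical names `Capped` and `cap_sorted` (neither is in the plan):

```rust
/// Result of capping a list for robot output (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct Capped<T> {
    pub items: Vec<T>,
    /// Number of items before the cap was applied.
    pub total: u32,
    /// True when at least one item was dropped.
    pub truncated: bool,
}

/// Sort for determinism, record the pre-cap total, then truncate to `max`.
pub fn cap_sorted<T: Ord>(mut items: Vec<T>, max: usize) -> Capped<T> {
    items.sort();
    let total = items.len() as u32;
    let truncated = items.len() > max;
    items.truncate(max); // no-op when the list is already within the cap
    Capped { items, total, truncated }
}
```

With this shape, `*_total` and `*_truncated` always come from the same pass over the list, so the two fields cannot drift apart.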
3) Add truncation metadata to Workload sections (same LIMIT+1 pattern)
Why
Workload is the mode most likely to be consumed by agents, and right now it has silent truncation (each section is LIMIT N with no signal). Your plan already treats truncation as a first-class contract elsewhere; Workload should match.
Revision
For each workload query:
request LIMIT + 1
set *_truncated booleans
trim to requested limit
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
pub struct WorkloadResult {
pub username: String,
pub assigned_issues: Vec<WorkloadIssue>,
pub authored_mrs: Vec<WorkloadMr>,
pub reviewing_mrs: Vec<WorkloadMr>,
pub unresolved_discussions: Vec<WorkloadDiscussion>,
+ pub assigned_issues_truncated: bool,
+ pub authored_mrs_truncated: bool,
+ pub reviewing_mrs_truncated: bool,
+ pub unresolved_discussions_truncated: bool,
}
And in JSON include the booleans (plus you already have summary.counts).
This is mechanically repetitive but extremely valuable for automation.
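The three steps can be sketched without a database. Here `fetch_limited` is a hypothetical stand-in for a `SELECT ... LIMIT ?` query, and `query_with_truncation` is an illustrative name; the real queries would bind `limit + 1` as the SQL LIMIT parameter:

```rust
/// Stand-in for `SELECT ... LIMIT ?` over an in-memory "table" (assumption:
/// the real code runs this against SQLite).
pub fn fetch_limited<T: Clone>(table: &[T], sql_limit: usize) -> Vec<T> {
    table.iter().take(sql_limit).cloned().collect()
}

/// LIMIT+1 pattern: ask for one extra row, use its presence as the
/// truncation signal, then trim back to the requested limit.
pub fn query_with_truncation<T: Clone>(table: &[T], limit: usize) -> (Vec<T>, bool) {
    let mut rows = fetch_limited(table, limit.saturating_add(1));
    let truncated = rows.len() > limit;
    rows.truncate(limit);
    (rows, truncated)
}
```

The extra row costs almost nothing and avoids a separate COUNT query per Workload section.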
4) Rename “Last Active” → “Last Seen” for Expert/Overlap
Why
For “author” rows, the timestamp is derived from review activity on their MR (via MAX(n.created_at)), not necessarily that person's direct action. Calling that “active” is semantically misleading. “Last seen” is accurate across both reviewer+author branches.
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
pub struct Expert {
@@
- pub last_active_ms: i64,
+ pub last_seen_ms: i64,
}
@@
pub struct OverlapUser {
@@
- pub last_touch_at: i64,
+ pub last_seen_at: i64,
@@
fn print_expert_human(...) {
@@
- style("Last Active").bold(),
+ style("Last Seen").bold(),
@@
- style(format_relative_time(expert.last_active_ms)).dim(),
+ style(format_relative_time(expert.last_seen_ms)).dim(),
(Keep internal SQL aliases consistent: last_seen_at everywhere.)
5) Make MR state filtering consistent in Expert/Overlap reviewer branches
Why
You already restrict Overlap author branch to opened|merged, but reviewer branches can include closed/unmerged noise. Consistency improves signal quality and can reduce scan churn.
Low-risk revision: apply the same state filter to reviewer branches (Expert + Overlap). You can keep “closed” excluded by default without adding new flags.
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
@@
- AND n.created_at >= ?2
+ AND m.state IN ('opened','merged')
+ AND n.created_at >= ?2
This is a semantic choice; if you later want archaeology across closed/unmerged, that belongs in a separate mode/flag, but I would not add it now.
6) Add a design principle for bounded outputs (aligns with robot-first reproducibility)
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
10. **Truncation transparency.** Result types carry a `truncated: bool` flag...
+11. **Bounded payloads.** Robot JSON must never emit unbounded arrays (participants, refs).
+ Large list fields are capped with `*_total` + `*_truncated` so agents can page/retry.
Consolidated plan metadata bump (Iteration 7)
diff
diff --git a/who-command-design.md b/who-command-design.md
@@
-iteration: 6
+iteration: 7
updated: 2026-02-07
Net effect (what you get)
Correct path classification under -p scoping (no cross-project probe leakage)
Deterministic + bounded robot payloads (no giant JSON surprises)
Uniform truncation contract across all modes (Workload no longer silently truncates)
Clearer semantics (“Last Seen” avoids misinterpretation)
Cleaner signals (reviewer branches ignore closed/unmerged by default)
If you want, I can also produce a second diff that updates the robot-docs schema block and the Verification EXPLAIN expectations to reflect the new probe queries and the state filter.


@@ -0,0 +1,9 @@
ALTER TABLE issues ADD COLUMN status_name TEXT;
ALTER TABLE issues ADD COLUMN status_category TEXT;
ALTER TABLE issues ADD COLUMN status_color TEXT;
ALTER TABLE issues ADD COLUMN status_icon_name TEXT;
ALTER TABLE issues ADD COLUMN status_synced_at INTEGER;
CREATE INDEX IF NOT EXISTS idx_issues_project_status_name ON issues(project_id, status_name);
INSERT INTO schema_version (version, applied_at, description)
VALUES (21, strftime('%s', 'now') * 1000, 'Work item status columns for issues');


@@ -0,0 +1,21 @@
-- Migration 022: Composite query indexes for notes + author_id column
-- Optimizes author-scoped and project-scoped date-range queries on notes.
-- Adds discussion JOIN indexes and immutable author identity column.
-- Composite index for author-scoped queries (who command, notes --author)
CREATE INDEX IF NOT EXISTS idx_notes_user_created
ON notes(project_id, author_username COLLATE NOCASE, created_at DESC, id DESC)
WHERE is_system = 0;
-- Composite index for project-scoped date-range queries
CREATE INDEX IF NOT EXISTS idx_notes_project_created
ON notes(project_id, created_at DESC, id DESC)
WHERE is_system = 0;
-- Discussion JOIN indexes
CREATE INDEX IF NOT EXISTS idx_discussions_issue_id ON discussions(issue_id);
CREATE INDEX IF NOT EXISTS idx_discussions_mr_id ON discussions(merge_request_id);
-- Immutable author identity column (GitLab numeric user ID)
ALTER TABLE notes ADD COLUMN author_id INTEGER;
CREATE INDEX IF NOT EXISTS idx_notes_author_id ON notes(author_id) WHERE author_id IS NOT NULL;


@@ -0,0 +1,5 @@
ALTER TABLE issues ADD COLUMN closed_at TEXT;
ALTER TABLE issues ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;
INSERT INTO schema_version (version, applied_at, description)
VALUES (23, strftime('%s', 'now') * 1000, 'Add closed_at and confidential to issues');


@@ -0,0 +1,153 @@
-- Migration 024: Add 'note' source_type to documents and dirty_sources
-- SQLite does not support ALTER CONSTRAINT, so we use the table-rebuild pattern.
-- ============================================================
-- 1. Rebuild dirty_sources with updated CHECK constraint
-- ============================================================
CREATE TABLE dirty_sources_new (
source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
source_id INTEGER NOT NULL,
queued_at INTEGER NOT NULL,
attempt_count INTEGER NOT NULL DEFAULT 0,
last_attempt_at INTEGER,
last_error TEXT,
next_attempt_at INTEGER,
PRIMARY KEY(source_type, source_id)
);
INSERT INTO dirty_sources_new SELECT * FROM dirty_sources;
DROP TABLE dirty_sources;
ALTER TABLE dirty_sources_new RENAME TO dirty_sources;
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
-- ============================================================
-- 2. Rebuild documents with updated CHECK constraint
-- ============================================================
-- 2a. Backup junction table data
CREATE TEMP TABLE _doc_labels_backup AS SELECT * FROM document_labels;
CREATE TEMP TABLE _doc_paths_backup AS SELECT * FROM document_paths;
-- 2b. Drop all triggers that reference documents
DROP TRIGGER IF EXISTS documents_ai;
DROP TRIGGER IF EXISTS documents_ad;
DROP TRIGGER IF EXISTS documents_au;
DROP TRIGGER IF EXISTS documents_embeddings_ad;
-- 2c. Drop junction tables (they have FK references to documents)
DROP TABLE IF EXISTS document_labels;
DROP TABLE IF EXISTS document_paths;
-- 2d. Create new documents table with 'note' in CHECK constraint
CREATE TABLE documents_new (
id INTEGER PRIMARY KEY,
source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
source_id INTEGER NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id),
author_username TEXT,
label_names TEXT,
created_at INTEGER,
updated_at INTEGER,
url TEXT,
title TEXT,
content_text TEXT NOT NULL,
content_hash TEXT NOT NULL,
labels_hash TEXT NOT NULL DEFAULT '',
paths_hash TEXT NOT NULL DEFAULT '',
is_truncated INTEGER NOT NULL DEFAULT 0,
truncated_reason TEXT CHECK (
truncated_reason IN (
'token_limit_middle_drop','single_note_oversized','first_last_oversized',
'hard_cap_oversized'
)
OR truncated_reason IS NULL
),
UNIQUE(source_type, source_id)
);
-- 2e. Copy all existing data
INSERT INTO documents_new SELECT * FROM documents;
-- 2f. Swap tables
DROP TABLE documents;
ALTER TABLE documents_new RENAME TO documents;
-- 2g. Recreate all indexes on documents
CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
CREATE INDEX idx_documents_author ON documents(author_username);
CREATE INDEX idx_documents_source ON documents(source_type, source_id);
CREATE INDEX idx_documents_hash ON documents(content_hash);
-- 2h. Recreate junction tables
CREATE TABLE document_labels (
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
label_name TEXT NOT NULL,
PRIMARY KEY(document_id, label_name)
) WITHOUT ROWID;
CREATE INDEX idx_document_labels_label ON document_labels(label_name);
CREATE TABLE document_paths (
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
path TEXT NOT NULL,
PRIMARY KEY(document_id, path)
) WITHOUT ROWID;
CREATE INDEX idx_document_paths_path ON document_paths(path);
-- 2i. Restore junction table data from backups
INSERT INTO document_labels SELECT * FROM _doc_labels_backup;
INSERT INTO document_paths SELECT * FROM _doc_paths_backup;
-- 2j. Recreate FTS triggers (from migration 008)
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
INSERT INTO documents_fts(rowid, title, content_text)
VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;
CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
END;
CREATE TRIGGER documents_au AFTER UPDATE ON documents
WHEN old.title IS NOT new.title OR old.content_text != new.content_text
BEGIN
INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
INSERT INTO documents_fts(rowid, title, content_text)
VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;
-- 2k. Recreate embeddings cleanup trigger (from migration 009)
CREATE TRIGGER documents_embeddings_ad AFTER DELETE ON documents BEGIN
DELETE FROM embeddings
WHERE rowid >= old.id * 1000
AND rowid < (old.id + 1) * 1000;
END;
-- 2l. Rebuild FTS index to ensure consistency after table swap
INSERT INTO documents_fts(documents_fts) VALUES('rebuild');
-- ============================================================
-- 3. Defense triggers: clean up documents when notes are
-- deleted or flipped to system notes
-- ============================================================
CREATE TRIGGER notes_ad_cleanup AFTER DELETE ON notes
WHEN old.is_system = 0
BEGIN
DELETE FROM documents WHERE source_type = 'note' AND source_id = old.id;
END;
CREATE TRIGGER notes_au_system_cleanup AFTER UPDATE OF is_system ON notes
WHEN NEW.is_system = 1 AND OLD.is_system = 0
BEGIN
DELETE FROM documents WHERE source_type = 'note' AND source_id = OLD.id;
END;
-- ============================================================
-- 4. Drop temp backup tables
-- ============================================================
DROP TABLE IF EXISTS _doc_labels_backup;
DROP TABLE IF EXISTS _doc_paths_backup;


@@ -0,0 +1,8 @@
-- Backfill existing non-system notes into dirty queue for document generation.
-- Only seeds notes that don't already have documents and aren't already queued.
INSERT INTO dirty_sources (source_type, source_id, queued_at)
SELECT 'note', n.id, CAST(strftime('%s', 'now') AS INTEGER) * 1000
FROM notes n
LEFT JOIN documents d ON d.source_type = 'note' AND d.source_id = n.id
WHERE n.is_system = 0 AND d.id IS NULL
ON CONFLICT(source_type, source_id) DO NOTHING;


@@ -0,0 +1,20 @@
-- Indexes for time-decay expert scoring: dual-path matching and reviewer participation.
CREATE INDEX IF NOT EXISTS idx_notes_old_path_author
ON notes(position_old_path, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
ON mr_file_changes(old_path, project_id, merge_request_id)
WHERE old_path IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
ON mr_file_changes(new_path, project_id, merge_request_id);
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
ON notes(discussion_id, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0;
CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
ON notes(position_old_path, project_id, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;


@@ -0,0 +1,186 @@
1. **Isolate scheduled behavior from manual `sync`**
Reasoning: Your current plan injects backoff into `handle_sync_cmd`, which affects all `lore sync` calls (including manual recovery runs). Scheduled behavior should be isolated so humans aren't unexpectedly blocked by service backoff.
```diff
@@ Context
-`lore sync` runs a 4-stage pipeline (issues, MRs, docs, embeddings) that takes 2-4 minutes.
+`lore sync` remains the manual/operator command.
+`lore service run` (hidden/internal) is the scheduled execution entrypoint.
@@ Commands & User Journeys
+### `lore service run` (hidden/internal)
+**What it does:** Executes one scheduled sync attempt with service-only policy:
+- applies service backoff policy
+- records service run state
+- invokes sync pipeline with configured profile
+- updates retry state on success/failure
+
+**Invocation:** scheduler always runs:
+`lore --robot service run --reason timer`
@@ Backoff Integration into `handle_sync_cmd`
-Insert **after** config load but **before** the dry_run check:
+Do not add backoff checks to `handle_sync_cmd`.
+Backoff logic lives only in `handle_service_run`.
```
2. **Use DB as source-of-truth for service state (not a standalone JSON status file)**
Reasoning: You already have `sync_runs` in SQLite. A separate JSON status file creates split-brain and race/corruption risk. Keep JSON as optional cache/export only.
```diff
@@ Status File
-Location: `{get_data_dir()}/sync-status.json`
+Primary state location: SQLite (`service_state` table) + existing `sync_runs`.
+Optional mirror file: `{get_data_dir()}/sync-status.json` (best-effort export only).
@@ File-by-File Implementation Details
-### `src/core/sync_status.rs` (NEW)
+### `migrations/015_service_state.sql` (NEW)
+CREATE TABLE service_state (
+ id INTEGER PRIMARY KEY CHECK (id = 1),
+ installed INTEGER NOT NULL DEFAULT 0,
+ platform TEXT,
+ interval_seconds INTEGER,
+ profile TEXT NOT NULL DEFAULT 'balanced',
+ consecutive_failures INTEGER NOT NULL DEFAULT 0,
+ next_retry_at_ms INTEGER,
+ last_error_code TEXT,
+ last_error_message TEXT,
+ updated_at_ms INTEGER NOT NULL
+);
+
+### `src/core/service_state.rs` (NEW)
+- read/write state row
+- derive backoff/next_retry
+- join with latest `sync_runs` for status output
```
3. **Backoff policy should be configurable, jittered, and error-aware**
Reasoning: Fixed hardcoded backoff (`base=1800`) is wrong when user sets another interval. Also permanent failures (bad token/config) should not burn retries forever; they should enter paused/error state.
```diff
@@ Backoff Logic
-// Exponential: base * 2^failures, capped at 4 hours
+// Exponential with jitter: base * 2^(failures-1), capped, ±20% jitter
+// Applies only to transient errors.
+// Permanent errors set `paused_reason` and stop retries until user action.
@@ CLI Definition Changes
+ServiceCommand::Resume, // clear paused state / failures
+ServiceCommand::Run, // hidden
@@ Error Types
+ServicePaused, // scheduler paused due to permanent error
+ServiceCommandFailed, // OS command failure with stderr context
```
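A minimal sketch of the proposed schedule, under assumptions: the jitter source is injected as a unit value in [-1.0, 1.0] so tests stay deterministic, jitter is applied after the cap (so a wait may exceed the cap by up to 20%), and `backoff_seconds` is a hypothetical name rather than an existing function:

```rust
/// Exponential backoff with jitter: base * 2^(failures-1), capped, then
/// scaled by a factor in [0.8, 1.2].
///
/// `jitter_unit` is an injected value in [-1.0, 1.0] (an RNG draw in
/// production, a constant in tests). Returns the number of seconds to wait.
pub fn backoff_seconds(base_secs: u64, failures: u32, cap_secs: u64, jitter_unit: f64) -> u64 {
    if failures == 0 {
        return 0; // no failures recorded, nothing to wait for
    }
    // Clamp the exponent so the shift cannot overflow a u64.
    let exp = (failures - 1).min(32);
    let raw = base_secs.saturating_mul(1u64 << exp).min(cap_secs);
    let factor = 1.0 + 0.2 * jitter_unit.clamp(-1.0, 1.0);
    ((raw as f64) * factor).round() as u64
}
```

Passing the user's configured interval as `base_secs` addresses the hardcoded `base=1800` concern; permanent errors would bypass this function entirely and go to the paused state.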
4. **Add a pipeline-level single-flight lock**
Reasoning: Current locking is in ingest stages; there's still overlap risk across full sync pipelines (docs/embed can overlap with another run). Add a top-level lock for scheduled/manual sync pipeline execution.
```diff
@@ Architecture
+Add `sync_pipeline` lock at top-level sync execution.
+Keep existing ingest lock (`sync`) for ingest internals.
@@ Backoff Integration into `handle_sync_cmd`
+Before starting sync pipeline, acquire `AppLock` with:
+name = "sync_pipeline"
+stale_lock_minutes = config.sync.stale_lock_minutes
+heartbeat_interval_seconds = config.sync.heartbeat_interval_seconds
```
5. **Don't embed token in service files by default**
Reasoning: Embedding PAT into unit/plist is a high-risk secret leak path. Make secure storage explicit and default-safe.
```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--token-source env-file|embedded]`
+Default: `env-file` (0600 perms, user-owned)
+`embedded` allowed only with explicit opt-in and warning
@@ Robot output
- "token_embedded": true
+ "token_source": "env_file"
@@ Human output
- Note: Your GITLAB_TOKEN is embedded in the service file.
+ Note: Token is stored in a user-private env file (0600).
```
6. **Introduce a command-runner abstraction with timeout + stderr capture**
Reasoning: `launchctl/systemctl/schtasks` calls are failure-prone; you need consistent error mapping and deterministic tests.
```diff
@@ Platform Backends
-exports free functions that dispatch via `#[cfg(target_os)]`
+exports backend + shared `CommandRunner`:
+- run(cmd, args, timeout)
+- capture stdout/stderr/exit code
+- map failure to `ServiceCommandFailed { cmd, exit_code, stderr }`
```
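One possible shape for this abstraction, as a sketch only: the trait signature, the `CommandOutput` type, and the `FakeRunner` test double are all assumptions; only the `ServiceCommandFailed { cmd, exit_code, stderr }` fields come from the diff above. A real implementation would wrap `std::process::Command` and enforce the timeout:

```rust
use std::time::Duration;

/// Captured result of one OS command (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct CommandOutput {
    pub exit_code: i32,
    pub stdout: String,
    pub stderr: String,
}

/// Error carrying the context the plan asks for.
#[derive(Debug, PartialEq)]
pub struct ServiceCommandFailed {
    pub cmd: String,
    pub exit_code: Option<i32>,
    pub stderr: String,
}

/// Seam between platform backends and the OS.
pub trait CommandRunner {
    fn run(
        &self,
        cmd: &str,
        args: &[&str],
        timeout: Duration,
    ) -> Result<CommandOutput, ServiceCommandFailed>;
}

/// Test double: returns a canned result without touching the OS.
pub struct FakeRunner {
    pub canned: CommandOutput,
}

impl CommandRunner for FakeRunner {
    fn run(
        &self,
        cmd: &str,
        _args: &[&str],
        _timeout: Duration,
    ) -> Result<CommandOutput, ServiceCommandFailed> {
        if self.canned.exit_code == 0 {
            Ok(self.canned.clone())
        } else {
            // Non-zero exit is mapped to the structured error.
            Err(ServiceCommandFailed {
                cmd: cmd.to_string(),
                exit_code: Some(self.canned.exit_code),
                stderr: self.canned.stderr.clone(),
            })
        }
    }
}
```

Backends that take `&dyn CommandRunner` can then be exercised in tests for launchctl/systemctl/schtasks failure paths without any of those tools installed.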
7. **Persist install manifest to avoid brittle file parsing**
Reasoning: Parsing timer/plist for interval/state is fragile and platform-format dependent. Persist a manifest with checksums and expected artifacts.
```diff
@@ Platform Backends
-Same pattern for ... `get_interval_seconds()`
+Add manifest: `{data_dir}/service-manifest.json`
+Stores platform, interval, profile, generated files, and command.
+`service status` reads manifest first, then verifies platform state.
@@ Acceptance criteria
+Install is idempotent:
+- if manifest+files already match, report `no_change: true`
+- if drift detected, reconcile and rewrite
```
8. **Make schedule profile explicit (`fast|balanced|full`)**
Reasoning: This makes the feature more useful and performance-tunable without requiring users to understand internal flags.
```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--profile fast|balanced|full]`
+
+Profiles:
+- fast: `sync --no-docs --no-embed`
+- balanced (default): `sync --no-embed`
+- full: `sync`
```
9. **Upgrade `service status` to include scheduler health + recent run summary**
Reasoning: Single last-sync snapshot is too shallow. Include recent attempts and whether scheduler is paused/backing off/running.
```diff
@@ `lore service status`
-What it does: Shows whether the service is installed, its configuration, last sync result, and next scheduled run.
+What it does: Shows install state, scheduler state (running/backoff/paused), recent runs, and next run estimate.
@@ Robot output
- "last_sync": { ... },
- "backoff": null
+ "scheduler_state": "running|backoff|paused|idle",
+ "last_sync": { ... },
+ "recent_runs": [{"run_id":"...","status":"...","started_at_iso":"..."}],
+ "backoff": null,
+ "paused_reason": null
```
10. **Strengthen tests around determinism and cross-platform generation**
Reasoning: Time-based backoff and shell quoting are classic flaky points. Add fake clock + fake command runner for deterministic tests.
```diff
@@ Testing Strategy
+Add deterministic test seams:
+- `Clock` trait for backoff/now calculations
+- `CommandRunner` trait for backend command execution
+
+Add tests:
+- transient vs permanent error classification
+- backoff schedule with jitter bounds
+- manifest drift reconciliation
+- quoting/escaping for paths with spaces and special chars
+- `service run` does not modify manual `sync` behavior
```
If you want, I can rewrite your full plan as a single clean revised document with these changes already integrated (instead of patch fragments).


@@ -0,0 +1,182 @@
**High-Impact Revisions (ordered by priority)**
1. **Make service identity project-scoped (avoid collisions across repos/users)**
Analysis: Current fixed names (`com.gitlore.sync`, `LoreSync`, `lore-sync.timer`) will collide when users run multiple gitlore workspaces. This causes silent overwrites and broken uninstall/status behavior.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Commands & User Journeys / install
- lore service install [--interval 30m] [--profile balanced] [--token-source env-file]
+ lore service install [--interval 30m] [--profile balanced] [--token-source auto] [--name <optional>]
@@ Install Manifest Schema
+ /// Stable per-install identity (default derived from project root hash)
+ pub service_id: String,
@@ Platform Backends
- Label: com.gitlore.sync
+ Label: com.gitlore.sync.{service_id}
- Task name: LoreSync
+ Task name: LoreSync-{service_id}
- ~/.config/systemd/user/lore-sync.service
+ ~/.config/systemd/user/lore-sync-{service_id}.service
```
2. **Replace token model with secure per-OS defaults**
Analysis: The current “env-file default” is not actually secure on macOS launchd (token still ends up in plist). On Windows, assumptions about inherited environment are fragile. Use OS-native secure stores by default and keep `embedded` as explicit opt-in only.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Token storage strategies
-| env-file (default) | ...
+| auto (default) | macOS: Keychain, Linux: env-file (0600), Windows: Credential Manager |
+| env-file | Linux/systemd only |
| embedded | ... explicit warning ...
@@ macOS launchd section
- env-file strategy stores canonical token in service-env but embeds token in plist
+ default strategy is Keychain lookup at runtime; no token persisted in plist
+ env-file is not offered on macOS
@@ Windows schtasks section
- token must be in user's system environment
+ default strategy stores token in Windows Credential Manager and injects at runtime
```
3. **Version and atomically persist manifest/status**
Analysis: `Option<Self>` on read hides corruption, and non-atomic writes risk truncated JSON on crashes. This will create false “not installed” and scheduler confusion.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Install Manifest Schema
+ pub schema_version: u32, // start at 1
+ pub updated_at_iso: String,
@@ Status File Schema
+ pub schema_version: u32, // start at 1
+ pub updated_at_iso: String,
@@ Read/Write
- read(path) -> Option<Self>
+ read(path) -> Result<Option<Self>, LoreError>
- write(...) -> std::io::Result<()>
+ write_atomic(...) -> std::io::Result<()> // tmp file + fsync + rename
```
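The "tmp file + fsync + rename" sequence can be sketched with the standard library alone. Assumptions: the `.tmp` suffix is arbitrary, the temp file sits next to the destination so the rename stays on one filesystem, and the directory itself is not fsynced (a stricter version might do that on Unix):

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `bytes` to `path` so readers see either the old or the new
/// content, never a partially written file.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Temp file in the same directory as the destination.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp"); // hypothetical suffix
    let tmp = PathBuf::from(tmp_name);

    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // flush file data and metadata to disk
    } // file handle closed here, before the rename

    // Replaces any existing destination file.
    fs::rename(&tmp, path)
}
```

A crash before the rename leaves the previous manifest intact, which is what lets `read` distinguish "missing" from "corrupt" instead of collapsing both into `None`.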
4. **Persist `next_retry_at_ms` instead of recomputing jitter**
Analysis: Deterministic jitter from timestamp modulo is predictable and can herd retries. Persisting `next_retry_at_ms` at failure time makes status accurate, stable, and cheap to compute.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ SyncStatusFile
pub consecutive_failures: u32,
+ pub next_retry_at_ms: Option<i64>,
@@ Backoff Logic
- compute backoff from last_run.timestamp_ms and deterministic jitter each read
+ compute backoff once on failure, store next_retry_at_ms, read-only comparison afterward
+ jitter algorithm: full jitter in [0, cap], injectable RNG for tests
```
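The "compute once, compare afterward" idea can be sketched as follows. `BackoffPolicy`, the exponential ceiling, and the closure-based random source are assumptions made for illustration; the plan only specifies full jitter, a cap, and an injectable RNG.

```rust
/// Hypothetical policy knobs; field names are illustrative.
pub struct BackoffPolicy {
    pub base_interval_ms: i64,
    pub cap_ms: i64,
}

/// Exponential ceiling for the given failure count, clamped to `cap_ms`
/// (the doubling schedule is an assumption).
fn ceiling_ms(policy: &BackoffPolicy, consecutive_failures: u32) -> i64 {
    let shift = consecutive_failures.saturating_sub(1).min(30);
    policy
        .base_interval_ms
        .saturating_mul(1i64 << shift)
        .min(policy.cap_ms)
}

/// Called once at failure time. `rand_unit` yields a value in [0, 1];
/// tests inject a fixed value, production code would inject a real RNG.
pub fn schedule_retry(
    policy: &BackoffPolicy,
    consecutive_failures: u32,
    now_ms: i64,
    rand_unit: &mut dyn FnMut() -> f64,
) -> i64 {
    let ceiling = ceiling_ms(policy, consecutive_failures);
    // Full jitter: pick a delay anywhere between 0 and the ceiling.
    let delay = (ceiling as f64 * rand_unit().clamp(0.0, 1.0)) as i64;
    now_ms.saturating_add(delay)
}

/// Read path: a plain comparison against the persisted value.
pub fn is_in_backoff(next_retry_at_ms: Option<i64>, now_ms: i64) -> bool {
    matches!(next_retry_at_ms, Some(t) if now_ms < t)
}
```

Because the timestamp is stored, every later status read reports the same retry time instead of re-deriving it.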
5. **Add circuit breaker for repeated transient failures**
Analysis: Infinite transient retries can run forever on systemic failures (DB corruption, bad network policy). After N transient failures, pause with actionable reason.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Scheduler states
- backoff — transient failures, waiting to retry
+ backoff — transient failures, waiting to retry
+ paused — permanent error OR circuit breaker tripped after N transient failures
@@ Service run flow
- On transient failure: increment failures, compute backoff
+ On transient failure: increment failures, compute backoff, if failures >= max_transient_failures -> pause
```
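A small sketch of the breaker check described in the run flow above. `FailureTracker` and the reason string are hypothetical; only `consecutive_failures`, `max_transient_failures`, and the `backoff`/`paused` states come from the plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SchedulerState {
    Backoff,
    Paused,
}

/// Hypothetical subset of the status file relevant to the breaker.
pub struct FailureTracker {
    pub consecutive_failures: u32,
    pub paused_reason: Option<String>,
}

/// Increment the failure count and trip the breaker once it reaches
/// `max_transient_failures`.
pub fn on_transient_failure(
    tracker: &mut FailureTracker,
    max_transient_failures: u32,
) -> SchedulerState {
    tracker.consecutive_failures = tracker.consecutive_failures.saturating_add(1);
    if tracker.consecutive_failures >= max_transient_failures {
        tracker.paused_reason = Some(format!(
            "circuit breaker tripped after {} consecutive transient failures",
            tracker.consecutive_failures
        ));
        SchedulerState::Paused
    } else {
        SchedulerState::Backoff
    }
}
```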
6. **Stage-aware outcome policy (core freshness over all-or-nothing)**
Analysis: Failing embeddings/docs should not block issues/MRs freshness. Split stage outcomes and only treat core stages as hard-fail by default. This improves reliability and practical usefulness.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Context
- lore sync runs a 4-stage pipeline ... treated as one run result
+ lore service run records per-stage outcomes (issues, mrs, docs, embeddings)
@@ Status File Schema
+ pub stage_results: Vec<StageResult>,
@@ service run flow
- Execute sync pipeline with flags derived from profile
+ Execute stage-by-stage and classify severity:
+ - critical: issues, mrs
+ - optional: docs, embeddings
+ optional stage failures mark run as degraded, not failed
```
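The severity split above might look like this in code. The `StageResult` shape here is deliberately simplified and the `RunOutcome` enum is an assumed name; the stage names and the critical/optional grouping are taken from the diff.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Critical,
    Optional,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RunOutcome {
    Success,
    Degraded,
    Failed,
}

/// Simplified stand-in for the plan's `StageResult`.
pub struct StageResult {
    pub stage: &'static str,
    pub success: bool,
}

/// `issues` and `mrs` are core; everything else is best-effort.
pub fn severity_of(stage: &str) -> Severity {
    match stage {
        "issues" | "mrs" => Severity::Critical,
        _ => Severity::Optional,
    }
}

/// A failed critical stage fails the run; a failed optional stage only
/// degrades it.
pub fn classify_run(results: &[StageResult]) -> RunOutcome {
    let mut outcome = RunOutcome::Success;
    for result in results.iter().filter(|r| !r.success) {
        match severity_of(result.stage) {
            Severity::Critical => return RunOutcome::Failed,
            Severity::Optional => outcome = RunOutcome::Degraded,
        }
    }
    outcome
}
```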
7. **Replace cfg free-function backend with trait-based backend**
Analysis: Current backend API is hard to test end-to-end without real OS commands. A `SchedulerBackend` trait enables deterministic integration tests and cleaner architecture.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Platform Backends / Architecture
- exports free functions dispatched via #[cfg]
+ define trait SchedulerBackend { install, uninstall, state, file_paths, next_run }
+ provide LaunchdBackend, SystemdBackend, SchtasksBackend implementations
+ include FakeBackend for integration tests
```
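As a rough illustration of the trait-based backend, the sketch below defines a trimmed `SchedulerBackend` and an in-memory `FakeBackend`. All signatures, the `BackendState` enum, and the fake file path are assumptions; `next_run` is left out for brevity, and the real launchd/systemd/schtasks implementations are not shown.

```rust
use std::collections::BTreeMap;
use std::path::PathBuf;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendState {
    NotInstalled,
    Installed { interval_seconds: u64 },
}

/// Assumed trait shape; the plan names the methods but not their signatures.
pub trait SchedulerBackend {
    fn install(&mut self, service_id: &str, interval_seconds: u64) -> Result<(), String>;
    fn uninstall(&mut self, service_id: &str) -> Result<(), String>;
    fn state(&self, service_id: &str) -> BackendState;
    fn file_paths(&self, service_id: &str) -> Vec<PathBuf>;
}

/// In-memory backend for tests: no OS commands are executed.
#[derive(Default)]
pub struct FakeBackend {
    installed: BTreeMap<String, u64>,
    /// When true, `install` fails, simulating a scheduler enable error.
    pub fail_install: bool,
}

impl SchedulerBackend for FakeBackend {
    fn install(&mut self, service_id: &str, interval_seconds: u64) -> Result<(), String> {
        if self.fail_install {
            return Err("simulated enable failure".to_string());
        }
        self.installed.insert(service_id.to_string(), interval_seconds);
        Ok(())
    }

    fn uninstall(&mut self, service_id: &str) -> Result<(), String> {
        self.installed.remove(service_id);
        Ok(())
    }

    fn state(&self, service_id: &str) -> BackendState {
        match self.installed.get(service_id) {
            Some(&interval_seconds) => BackendState::Installed { interval_seconds },
            None => BackendState::NotInstalled,
        }
    }

    fn file_paths(&self, service_id: &str) -> Vec<PathBuf> {
        // Hypothetical path; a real backend would return plist/unit/task files.
        vec![PathBuf::from(format!("/fake/lore-sync-{}.unit", service_id))]
    }
}
```

Command handlers written against `&mut dyn SchedulerBackend` can then be exercised with `FakeBackend`, including the failure path, without touching the host scheduler.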
8. **Harden platform units and detect scheduler prerequisites**
Analysis: systemd user timers often fail silently without user manager/linger; launchd context can be wrong in headless sessions. Add explicit diagnostics and unit hardening.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Linux systemd unit
[Service]
Type=oneshot
ExecStart=...
+TimeoutStartSec=900
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=read-only
@@ Linux install/status
+ detect user manager availability and linger state; surface warning/action
@@ macOS install/status
+ detect non-GUI bootstrap context and return actionable error
```
9. **Add operational commands: `trigger`, `doctor`, and non-interactive log tail**
Analysis: `logs` opening an editor is weak for automation and incident response. Operators need a preflight and immediate controlled run.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ ServiceCommand
+ Trigger, // run one attempt through service policy now
+ Doctor, // validate scheduler, token, paths, permissions
@@ logs
- opens editor
+ supports --tail <n> and --follow in human mode
+ robot mode can return last_n lines optionally
```
10. **Fix plan inconsistencies and edge-case correctness**
Analysis: There are internal mismatches that will cause implementation drift.
Diff:
```diff
--- a/plan.md
+++ b/plan.md
@@ Interval Parsing
- supports 's' suffix
+ remove 's' suffix (acceptance only allows 5m..24h)
@@ uninstall acceptance
- removes ALL service files only
+ explicitly also remove service-manifest and service-env (status/logs retained)
@@ SyncStatusFile schema
- pub last_run: SyncRunRecord
+ pub last_run: Option<SyncRunRecord> // matches idle/no runs state
```
---
**Recommended Architecture Upgrade Summary**
The strongest improvement set is: **(1) project-scoped IDs, (2) secure token defaults, (3) atomic/versioned state, (4) persisted retry schedule + circuit breaker, (5) stage-aware outcomes**. That combination materially improves correctness, multi-repo safety, security, operability, and real-world reliability without changing your core manual-vs-scheduled separation principle.


@@ -0,0 +1,174 @@
Below are the highest-impact revisions I'd make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX.
1. **Fix multi-install ambiguity (`service_id` exists, but commands can't target one explicitly)**
Analysis: The plan introduces `service-manifest-{service_id}.json`, but `status/uninstall/resume/logs` have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery.
```diff
@@ ## Commands & User Journeys
+### `lore service list`
+Lists installed services discovered from `{data_dir}/service-manifest-*.json`.
+Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`.
@@ ### `lore service uninstall`
-### `lore service uninstall`
+### `lore service uninstall [--service <service_id|name>] [--all]`
@@
-2. CLI reads install manifest to find `service_id`
+2. CLI resolves target service via `--service` or current-project-derived default.
+3. If multiple candidates and no selector, return actionable error.
@@ ### `lore service status`
-### `lore service status`
+### `lore service status [--service <service_id|name>]`
```
2. **Make status state service-scoped (not global)**
Analysis: A single `sync-status.json` for all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep lock global, but state per service.
```diff
@@ ## Status File
-### Location
-`{get_data_dir()}/sync-status.json`
+### Location
+`{get_data_dir()}/sync-status-{service_id}.json`
@@ ## Paths Module Additions
-pub fn get_service_status_path() -> PathBuf {
- get_data_dir().join("sync-status.json")
+pub fn get_service_status_path(service_id: &str) -> PathBuf {
+ get_data_dir().join(format!("sync-status-{service_id}.json"))
}
@@
-Note: `sync-status.json` is NOT scoped by `service_id`
+Note: status is scoped by `service_id`; lock remains global (`sync_pipeline`) to prevent overlapping writes.
```
3. **Stop classifying permanence via string matching**
Analysis: Matching `"401 Unauthorized"` in strings is brittle and will misclassify edge cases. Carry machine codes through stage results and classify by `ErrorCode` only.
```diff
@@ pub struct StageResult {
- pub error: Option<String>,
+ pub error: Option<String>,
+ pub error_code: Option<String>, // e.g., AUTH_FAILED, NETWORK_ERROR
}
@@ Error classification helpers
-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... }
+fn is_permanent_error_code(code: Option<&str>) -> bool {
+ matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED"))
+}
```
4. **Install should be transactional (manifest written last)**
Analysis: Current order writes manifest before scheduler enable. If enable fails, you persist a false “installed” state. Use two-phase install with rollback.
```diff
@@ ### `lore service install` User journey
-9. CLI writes install manifest ...
-10. CLI runs the platform-specific enable command
+9. CLI runs the platform-specific enable command
+10. On success, CLI writes install manifest atomically
+11. On failure, CLI removes generated files and returns `ServiceCommandFailed`
```
5. **Fix launchd token security gap (env-file currently still embeds token)**
Analysis: Current “env-file” on macOS still writes token into plist, defeating the main security goal. Generate a private wrapper script that reads env file at runtime and execs `lore`.
```diff
@@ ### macOS: launchd
-<key>ProgramArguments</key>
-<array>
- <string>{binary_path}</string>
- <string>--robot</string>
- <string>service</string>
- <string>run</string>
-</array>
+<key>ProgramArguments</key>
+<array>
+ <string>{data_dir}/service-run-{service_id}.sh</string>
+</array>
@@
-`env-file`: ... token value must still appear in plist ...
+`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime.
```
6. **Improve backoff math and add half-open circuit recovery**
Analysis: Current jitter + min clamp makes first retry deterministic and can over-pause. Also circuit-breaker requires manual resume forever. Add cooldown + half-open probe to self-heal.
```diff
@@ Backoff Logic
-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64;
-let backoff_secs = backoff_secs.max(base_interval_seconds);
+let max_backoff = base_backoff;
+let min_backoff = base_interval_seconds;
+let span = max_backoff.saturating_sub(min_backoff);
+let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64;
@@ Scheduler states
-- `paused` — permanent error ... OR circuit breaker tripped ...
+- `paused` — permanent error requiring intervention
+- `half_open` — probe state after circuit cooldown; one trial run allowed
@@ Circuit breaker
-... transitions to `paused` ... Run: lore service resume
+... transitions to `half_open` after cooldown (default 30m). Successful probe closes breaker automatically; failed probe returns to backoff/paused.
```
7. **Promote backend trait to v1 (not v2) for deterministic integration tests**
Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors.
```diff
@@ ### Platform Backends
-> Future architecture note: A `SchedulerBackend` trait ... for v2.
+Adopt `SchedulerBackend` trait in v1 with real backends (`launchd/systemd/schtasks`) and `FakeBackend` for tests.
+This enables deterministic install/uninstall/status/run-path integration tests without touching host scheduler.
```
8. **Harden `run_cmd` timeout behavior**
Analysis: If timeout occurs, child process must be killed and reaped. Otherwise you leak processes and can wedge repeated runs.
```diff
@@ fn run_cmd(...)
-// Wait with timeout
-let output = wait_with_timeout(output, timeout_secs)?;
+// Wait with timeout; on timeout kill child and wait to reap
+let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?;
```
9. **Add manual control commands (`pause`, `trigger`, `repair`)**
Analysis: These are high-utility operational controls. `trigger` helps immediate sync without waiting interval. `pause` supports maintenance windows. `repair` avoids manual file deletion for corrupt state.
```diff
@@ pub enum ServiceCommand {
+ /// Pause scheduled execution without uninstalling
+ Pause { #[arg(long)] reason: Option<String> },
+ /// Trigger an immediate one-off run using installed profile
+ Trigger { #[arg(long)] ignore_backoff: bool },
+ /// Repair corrupt manifest/status by backing up and reinitializing
+ Repair { #[arg(long)] service: Option<String> },
}
```
10. **Make `logs` default non-interactive and add rotation policy**
Analysis: Opening editor by default is awkward for automation/SSH and slower for normal diagnosis. Defaulting to `tail` is more practical; `--open` can preserve editor behavior.
```diff
@@ ### `lore service logs`
-By default, opens in the user's preferred editor.
+By default, prints last 100 lines to stdout.
+Use `--open` to open editor.
@@
+Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files.
```
11. **Remove destructive/shell-unsafe suggested action**
Analysis: `actions(): ["rm {path}", ...]` is unsafe (shell injection + destructive guidance). Replace with safe command path.
```diff
@@ LoreError::actions()
-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"],
+Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"],
```
12. **Tighten scheduler units for real-world reliability**
Analysis: Add explicit working directory and success-exit handling to reduce environment drift and edge failures.
```diff
@@ systemd service unit
[Service]
Type=oneshot
ExecStart={binary_path} --robot service run
+WorkingDirectory={data_dir}
+SuccessExitStatus=0
TimeoutStartSec=900
```
If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure.


@@ -0,0 +1,190 @@
No `## Rejected Recommendations` section was present in the plan you shared, so the proposals below are all net-new.
1. **Make scheduled runs explicitly target a single service instance**
Analysis: right now `service run` has no selector, but the plan supports multiple installed services. That creates ambiguity and incorrect manifest/status selection. This is the most important architectural fix.
```diff
@@ `lore service install` What it does
- runs `lore --robot service run` at the specified interval
+ runs `lore --robot service run --service-id <service_id>` at the specified interval
@@ Robot output (`install`)
- "sync_command": "/usr/local/bin/lore --robot service run",
+ "sync_command": "/usr/local/bin/lore --robot service run --service-id a1b2c3d4",
@@ `ServiceCommand` enum
- #[command(hide = true)]
- Run,
+ #[command(hide = true)]
+ Run {
+ /// Internal selector injected by scheduler backend
+ #[arg(long, hide = true)]
+ service_id: String,
+ },
@@ `handle_service_run` signature
-pub fn handle_service_run(start: std::time::Instant) -> Result<(), Box<dyn std::error::Error>>
+pub fn handle_service_run(service_id: &str, start: std::time::Instant) -> Result<(), Box<dyn std::error::Error>>
@@ run flow step 1
- Read install manifest
+ Read install manifest for `service_id`
```
2. **Strengthen `service_id` derivation to avoid cross-workspace collisions**
Analysis: hashing config path alone can collide when many workspaces share one global config. Identity should represent what is being synced, not only where config lives.
```diff
@@ Key Design Principles / Project-Scoped Service Identity
- derive from a stable hash of the config file path
+ derive from a stable fingerprint of:
+ - canonical workspace root
+ - normalized configured GitLab project URLs
+ - canonical config path
+ then take first 12 hex chars of SHA-256
@@ `compute_service_id`
- Returns first 8 hex chars of SHA-256 of the canonical config path.
+ Returns first 12 hex chars of SHA-256 of a canonical identity tuple
+ (workspace_root + sorted project URLs + config_path).
```
3. **Introduce a service-state machine with a dedicated admin lock**
Analysis: install/uninstall/pause/resume/repair/status can race each other. A lock and explicit transition table prevents invalid states and file races.
```diff
@@ New section: Service State Model
+ All state mutations are serialized by `AppLock("service-admin-{service_id}")`.
+ Legal transitions:
+ - idle -> running -> success|degraded|backoff|paused
+ - backoff -> running|paused
+ - paused -> half_open|running (resume)
+ - half_open -> running|paused
+ Any invalid transition is rejected with `ServiceCorruptState`.
@@ `handle_install`, `handle_uninstall`, `handle_pause`, `handle_resume`, `handle_repair`
+ Acquire `service-admin-{service_id}` before mutating manifest/status/service files.
```
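The transition table above can be encoded as a single predicate. This sketch covers only the transitions the list names; how a service leaves `success` or `degraded` is not stated there, so those edges are intentionally absent. The enum and function names are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceState {
    Idle,
    Running,
    Success,
    Degraded,
    Backoff,
    Paused,
    HalfOpen,
}

/// Returns true only for transitions listed in the plan's table.
pub fn is_legal_transition(from: ServiceState, to: ServiceState) -> bool {
    use self::ServiceState::*;
    matches!(
        (from, to),
        (Idle, Running)
            | (Running, Success | Degraded | Backoff | Paused)
            | (Backoff, Running | Paused)
            | (Paused, HalfOpen | Running)
            | (HalfOpen, Running | Paused)
    )
}
```

A caller holding the admin lock would check this predicate before persisting a new state and return `ServiceCorruptState` when it is false.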
4. **Unify manual and scheduled sync execution behind one orchestrator**
Analysis: the plan currently duplicates stage logic and error classification in `service run`, increasing drift risk. A shared orchestrator gives one authoritative pipeline behavior.
```diff
@@ Key Design Principles
+ #### 6. Single Sync Orchestrator
+ Both `lore sync` and `lore service run` call `SyncOrchestrator`.
+ Service mode adds policy (backoff/circuit-breaker); manual mode bypasses policy.
@@ Service Run Implementation
- execute_sync_stages(&sync_args)
+ SyncOrchestrator::run(SyncMode::Service { profile, policy })
@@ manual sync
- separate pipeline path
+ SyncOrchestrator::run(SyncMode::Manual { flags })
```
5. **Add bounded in-run retries for transient core-stage failures**
Analysis: single-shot failure handling will over-trigger backoff on temporary network blips. One short retry per core stage significantly improves freshness without much extra runtime.
```diff
@@ Stage-aware execution
+ Core stages (`issues`, `mrs`) get up to 1 immediate retry on transient errors
+ (jittered 1-5s). Permanent errors are never retried.
+ Optional stages keep best-effort semantics.
@@ Acceptance criteria (`service run`)
+ Retries transient core stage failures once before counting run as failed.
```
6. **Harden persistence with full crash-safety semantics**
Analysis: current atomic write description is good but incomplete for power-loss durability. You should fsync parent directory after rename and include lightweight integrity metadata.
```diff
@@ `write_atomic`
- tmp file + fsync + rename
+ tmp file + fsync(file) + rename + fsync(parent_dir)
@@ `ServiceManifest` and `SyncStatusFile`
+ pub write_seq: u64,
+ pub content_sha256: String, // optional integrity guard for repair/doctor
```
7. **Fix token handling to avoid shell/env injection and add secure-store mode**
Analysis: sourcing env files in shell is brittle if token contains special chars/newlines. Also, secure OS credential stores should be first-class for production reliability/security.
```diff
@@ Token storage strategies
-| `env-file` (default) ...
+| `auto` (default) | use secure-store when available, else env-file |
+| `secure-store` | macOS Keychain / libsecret / Windows Credential Manager |
+| `env-file` | explicit fallback |
@@ macOS wrapper script
-. "{data_dir}/service-env-{service_id}"
-export {token_env_var}
+TOKEN_VALUE="$(cat "{data_dir}/service-token-{service_id}" )"
+export {token_env_var}="$TOKEN_VALUE"
@@ Acceptance criteria
+ Reject token values containing `\0` or newline for env-file mode.
+ Never eval/source untrusted token content.
```
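The acceptance criterion for env-file mode amounts to a small validation step before the token is written. The function name and error type below are hypothetical.

```rust
/// Reject token values that cannot be stored safely as a single line.
/// Only the two cases named in the acceptance criteria are checked.
pub fn validate_env_file_token(token: &str) -> Result<(), &'static str> {
    if token.contains('\0') {
        return Err("token contains a NUL byte");
    }
    if token.contains('\n') {
        return Err("token contains a newline");
    }
    Ok(())
}
```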
8. **Correct platform/runtime implementation hazards**
Analysis: there are a few correctness risks that should be fixed in-plan now.
```diff
@@ macOS install steps
- Get UID via `unsafe { libc::getuid() }`
+ Get UID via safe API (`nix::unistd::Uid::current()` or equivalent safe helper)
@@ Command Runner Helper
- poll try_wait and read stdout/stderr after exit
+ avoid potential pipe backpressure deadlock:
+ use wait-with-timeout + concurrent stdout/stderr draining
@@ Linux timer
- OnUnitActiveSec={interval_seconds}s
+ OnUnitInactiveSec={interval_seconds}s
+ AccuracySec=1min
```
9. **Make logs fully service-scoped**
Analysis: you already scoped manifest/status by `service_id`; logs are still global in several places. Multi-service installs will overwrite each other's logs.
```diff
@@ Paths Module Additions
-pub fn get_service_log_path() -> PathBuf
+pub fn get_service_log_path(service_id: &str, stream: LogStream) -> PathBuf
@@ log filenames
- logs/service-stderr.log
- logs/service-stdout.log
+ logs/service-{service_id}-stderr.log
+ logs/service-{service_id}-stdout.log
@@ `service logs`
- default path: `{data_dir}/logs/service-stderr.log`
+ default path: `{data_dir}/logs/service-{service_id}-stderr.log`
```
10. **Resolve internal spec contradictions and rollback gaps**
Analysis: there are a few contradictory statements and incomplete rollback behavior that will cause implementation churn.
```diff
@@ `service logs` behavior
- default (no flags): open in editor (human)
+ default (no flags): print last 100 lines (human and robot metadata mode)
+ `--open` is explicit opt-in
@@ install rollback
- On failure: removes generated service files
+ On failure: removes generated service files, env file, wrapper script, and temp manifest
@@ `handle_service_run` sample code
- let manifest_path = get_service_manifest_path();
+ let manifest_path = get_service_manifest_path(service_id);
```
If you want, I can take these revisions and produce a single consolidated “Iteration 4” replacement plan block with all sections rewritten coherently so it's ready to hand to an implementer.


@@ -0,0 +1,196 @@
I reviewed the full plan and avoided everything already listed in `## Rejected Recommendations`. These are the highest-impact revisions I'd make.
1. **Fix identity model inconsistency and prevent `--name` alias collisions**
Why this improves the plan: your text says identity includes workspace root, but the current derivation code does not. Also, using `--name` as the actual `service_id` risks accidental cross-project collisions and destructive updates.
```diff
--- a/plan.md
+++ b/plan.md
@@ Key Design Principles / 2. Project-Scoped Service Identity
- Each installed service gets a unique `service_id` derived from a canonical identity tuple: the config file path, sorted GitLab project URLs, and workspace root.
+ Each installed service gets an immutable `identity_hash` derived from a canonical identity tuple:
+ workspace root + canonical config path + sorted normalized project URLs.
+ `service_id` remains the scheduler identifier; `--name` is a human alias only.
+ If `--name` collides with an existing service that has a different `identity_hash`, install fails with an actionable error.
@@ Install Manifest / Schema
+ /// Immutable identity hash for collision-safe matching across reinstalls
+ pub identity_hash: String,
+ /// Optional human-readable alias passed via --name
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub service_alias: Option<String>,
+ /// Canonical workspace root used in identity derivation
+ pub workspace_root: String,
@@ service_id derivation
-pub fn compute_service_id(config_path: &Path, project_urls: &[&str]) -> String
+pub fn compute_identity_hash(workspace_root: &Path, config_path: &Path, project_urls: &[&str]) -> String
```
2. **Add lock protocol to eliminate uninstall/run race conditions**
Why this improves the plan: today `service run` does not take admin lock, and admin commands do not take pipeline lock. `uninstall` can race with an active run and remove files mid-execution.
```diff
--- a/plan.md
+++ b/plan.md
@@ Key Design Principles / 6. Serialized Admin Mutations
- The `service run` entrypoint does NOT acquire the admin lock — it only acquires the `sync_pipeline` lock
+ The `service run` entrypoint acquires only `sync_pipeline`.
+ Destructive admin operations (`install` overwrite, `uninstall`, `repair --regenerate`) must:
+ 1) acquire `service-admin-{service_id}`
+ 2) disable scheduler backend entrypoint
+ 3) acquire `sync_pipeline` lock with timeout
+ 4) mutate/remove files
+ This lock ordering is mandatory to prevent deadlocks and run/delete races.
@@ lore service uninstall / User journey
- 4. Runs platform-specific disable command
- 5. Removes service files from disk
+ 4. Acquires `sync_pipeline` lock (after disabling scheduler) with bounded wait
+ 5. Removes service files from disk only after lock acquisition
```
3. **Make transient handling `Retry-After` aware**
Why this improves the plan: rate-limit and 503 responses often carry retry hints. Ignoring them causes useless retries and longer outages.
```diff
--- a/plan.md
+++ b/plan.md
@@ Transient vs permanent error classification
-| Transient | Retry with backoff | Network timeout, rate limited, DB locked, 5xx from GitLab |
+| Transient | Retry with adaptive backoff | Network timeout, DB locked, 5xx from GitLab |
+| Transient (hinted) | Respect server retry hint | Rate limited with Retry-After/X-RateLimit-Reset |
@@ Backoff Logic
+ If an error includes a retry hint (e.g., `Retry-After`), set:
+ `next_retry_at_ms = max(computed_backoff, hinted_retry_at_ms)`.
+ Persist `backoff_reason` for status visibility.
```
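A sketch of the hint-aware rule, assuming both candidates are compared as absolute timestamps and that the hint arrives as a `Retry-After` value in seconds. The function name and the reason strings are hypothetical.

```rust
/// Returns the retry timestamp to persist plus a reason for status output.
pub fn resolve_next_retry(
    now_ms: i64,
    computed_backoff_ms: i64,
    retry_after_secs: Option<u64>,
) -> (i64, &'static str) {
    let computed_at = now_ms.saturating_add(computed_backoff_ms);
    match retry_after_secs {
        Some(secs) => {
            // Convert the hint to milliseconds without overflowing.
            let hint_ms = (secs.min(i64::MAX as u64) as i64).saturating_mul(1_000);
            let hinted_at = now_ms.saturating_add(hint_ms);
            if hinted_at > computed_at {
                (hinted_at, "server_retry_hint")
            } else {
                (computed_at, "computed_backoff")
            }
        }
        None => (computed_at, "computed_backoff"),
    }
}
```

Taking the later of the two means a server hint can lengthen the wait but never shorten the locally computed backoff.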
4. **Decouple optional stage cadence from core sync interval**
Why this improves the plan: running docs/embeddings every 5-30 minutes is expensive and unnecessary. Separate freshness windows reduce cost/latency while keeping core data fresh.
```diff
--- a/plan.md
+++ b/plan.md
@@ Sync profiles
-| `balanced` (default) | `--no-embed` | Issues + MRs + doc generation |
-| `full` | (none) | Full pipeline including embeddings |
+| `balanced` (default) | core every interval, docs every 60m, no embeddings | Fast + useful docs |
+| `full` | core every interval, docs every interval, embeddings every 6h (default) | Full freshness with bounded cost |
@@ Status File / StageResult
+ /// true when stage intentionally skipped due to freshness window
+ #[serde(default)]
+ pub skipped: bool,
@@ lore service run / Stage-aware execution
+ Optional stages may be skipped when their last successful run is within configured freshness windows.
+ Skipped optional stages do not count as failures and are recorded explicitly.
```
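The skip decision for optional stages reduces to one comparison. The function name and the millisecond units are assumptions; a stage that has never succeeded is always run.

```rust
/// True when an optional stage last succeeded inside its freshness window
/// and can therefore be skipped for this run.
pub fn should_skip_optional_stage(
    last_success_ms: Option<i64>,
    now_ms: i64,
    freshness_window_ms: i64,
) -> bool {
    match last_success_ms {
        Some(last) => now_ms.saturating_sub(last) < freshness_window_ms,
        None => false,
    }
}
```

For the `balanced` profile's 60-minute docs window, a run 30 minutes after the last successful docs stage would record that stage with `skipped: true` instead of executing it.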
5. **Give Windows parity for secure token handling (env-file + wrapper)**
Why this improves the plan: current Windows path requires global/system env and has poor UX. A wrapper+env-file model gives platform parity and avoids global token exposure.
```diff
--- a/plan.md
+++ b/plan.md
@@ Token storage strategies
-| On Windows, neither strategy applies — the token must be in the user's system environment
+| On Windows, `env-file` is supported via a generated wrapper script (`service-run-{service_id}.cmd` or `.ps1`)
+| that reads `{data_dir}/service-env-{service_id}` and launches `lore --robot service run ...`.
+| `embedded` remains opt-in and warned as less secure.
@@ Windows: schtasks
- Token handling on Windows: The env var must be set system-wide via `setx`
+ Token handling on Windows:
+ - `env-file` (default): wrapper script reads token from private file at runtime
+ - `embedded`: token passed via wrapper-set environment variable
+ - `system_env`: still supported as fallback
```
6. **Add run heartbeat and stale-run detection**
Why this improves the plan: `running` state can become misleading after crashes or stale locks. Heartbeat metadata makes status accurate and improves incident triage.
```diff
--- a/plan.md
+++ b/plan.md
@@ Status File / Schema
+ /// In-flight run metadata for crash/stale detection
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub current_run: Option<CurrentRunState>,
+
+pub struct CurrentRunState {
+ pub run_id: String,
+ pub started_at_ms: i64,
+ pub last_heartbeat_ms: i64,
+ pub pid: u32,
+}
@@ lore service status
- - `running` — currently executing (sync_pipeline lock held)
+ - `running` — currently executing with live heartbeat
+ - `running_stale` — in-flight metadata exists but heartbeat exceeded stale threshold
```
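Stale-run detection from the heartbeat metadata might be sketched like this. `CurrentRunState` mirrors the struct in the diff; `RunLiveness`, the function name, and the threshold parameter are assumptions.

```rust
pub struct CurrentRunState {
    pub run_id: String,
    pub started_at_ms: i64,
    pub last_heartbeat_ms: i64,
    pub pid: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RunLiveness {
    NotRunning,
    Running,
    RunningStale,
}

/// Classify in-flight metadata by heartbeat age.
pub fn run_liveness(
    current_run: Option<&CurrentRunState>,
    now_ms: i64,
    stale_threshold_ms: i64,
) -> RunLiveness {
    match current_run {
        None => RunLiveness::NotRunning,
        Some(run) if now_ms.saturating_sub(run.last_heartbeat_ms) > stale_threshold_ms => {
            RunLiveness::RunningStale
        }
        Some(_) => RunLiveness::Running,
    }
}
```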
7. **Upgrade drift detection from “loaded/unloaded” to spec-level drift**
Why this improves the plan: platform state alone misses manual edits to unit/plist/wrapper files. Spec-hash drift gives reliable “what changed?” diagnostics and safe regeneration.
```diff
--- a/plan.md
+++ b/plan.md
@@ Install Manifest / Schema
+ /// Hash of generated scheduler artifacts and command spec
+ pub spec_hash: String,
@@ lore service status
- Detects manifest/platform drift and reports it
+ Detects:
+ - platform drift (loaded/unloaded mismatch)
+ - spec drift (artifact content hash mismatch)
+ - command drift (sync command differs from manifest)
@@ lore service repair
+ Add `--regenerate` to rewrite scheduler artifacts from manifest when spec drift is detected.
+ This is non-destructive and does not delete status/log history.
```
8. **Add safe operational modes: `install --dry-run` and `doctor --fix`**
Why this improves the plan: dry-run reduces risk before writing OS scheduler files; fix-mode improves operator ergonomics and lowers support burden.
```diff
--- a/plan.md
+++ b/plan.md
@@ lore service install
+ Add `--dry-run`:
+ - validates config/token/prereqs
+ - renders service files and planned commands
+ - writes nothing, executes nothing
@@ lore service doctor
+ Add `--fix` for safe, non-destructive remediations:
+ - create missing dirs
+ - correct file permissions on env/wrapper files
+ - run `systemctl --user daemon-reload` when applicable
+ - report applied fixes in robot output
```
9. **Define explicit schema migration behavior (not just `schema_version` fields)**
Why this improves the plan: version fields without migration policy become operational risk during upgrades.
```diff
--- a/plan.md
+++ b/plan.md
@@ ServiceManifest Read/Write
- `ServiceManifest::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `ServiceManifest::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
+ - Migrates known older schema versions to current in-memory model
+ - Rewrites migrated file atomically
+ - Fails with actionable `ServiceCorruptState` for unknown future major versions
@@ SyncStatusFile Read/Write
- `SyncStatusFile::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `SyncStatusFile::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
```
If you want, I can produce a fully rewritten v5 plan text that integrates all nine changes coherently section-by-section.

3759
plans/lore-service.md Normal file

File diff suppressed because it is too large


@@ -0,0 +1,250 @@
# plan-to-beads v2 — Draft for Review
This is a draft of the improved skill. Review before applying to `~/.claude/skills/plan-to-beads/SKILL.md`.
---
```markdown
---
name: plan-to-beads
description: Transforms markdown implementation plans into granular, agent-ready beads with dependency graphs. Each bead is fully self-contained — an agent can execute it with zero external context. Triggers on "break down this plan", "create beads from", "convert to beads", "make issues from plan".
argument-hint: "[path/to/plan.md]"
---
# Plan to Beads Conversion
## The Prime Directive
**Every bead must be executable by an agent that has ONLY the bead description.** No plan document. No Slack context. No "see the PRD." The bead IS the spec. If an agent can't start coding within 60 seconds of reading the bead, it's not ready.
## Workflow
```
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ 1. PARSE │──▶│ 2. MINE │──▶│ 3. BUILD │──▶│ 4. LINK │──▶│ 5. AUDIT │
│ Structure│ │ Context │ │ Beads │ │ Deps │ │ Quality │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
```
### 1. Parse Structure
Read the plan document. Identify:
- **Epics**: Major sections / phases / milestones
- **Tasks**: Implementable units with clear outcomes (1-4 hour scope)
- **Subtasks**: Granular steps within tasks
### 2. Mine Context
This is the critical step. For EACH identified task, extract everything an implementing agent will need.
#### From the plan document:
| Extract | Where to look | Example |
|---------|--------------|---------|
| **Rationale** | Intro paragraphs, "why" sections | "We need this because the current approach causes N+1 queries" |
| **Approach details** | Implementation notes, code snippets, architecture decisions | "Use a 5-stage pipeline: SEED → HYDRATE → ..." |
| **Test requirements** | TDD sections, acceptance criteria, "verify by" notes | "Test that empty input returns empty vec" |
| **Edge cases & risks** | Warnings, gotchas, "watch out for" notes | "Multi-byte UTF-8 chars can cause panics at byte boundaries" |
| **Data shapes** | Type definitions, struct descriptions, API contracts | "TimelineEvent { kind: EventKind, timestamp: DateTime, ... }" |
| **File paths** | Explicit mentions or inferable from module structure | "src/core/timeline_seed.rs" |
| **Dependencies on other tasks** | "requires X", "after Y is done", "uses Z from step N" | "Consumes the TimelineEvent struct from the types task" |
| **Verification commands** | Test commands, CLI invocations, expected outputs | "cargo test timeline_seed -- --nocapture" |
#### From the codebase:
Search the codebase to supplement what the plan says:
- Find existing files mentioned or implied by the plan
- Discover patterns the task should follow (e.g., how existing similar modules are structured)
- Check test files for naming conventions and test infrastructure in use
- Confirm exact file paths rather than guessing
Use codebase search tools (WarpGrep, Explore agent, or targeted Grep/Glob) appropriate to the scope of what you need to find.
### 3. Build Beads
Use `br` exclusively.
| Type | Priority | Command |
|------|----------|---------|
| Epic | 1 | `br create "Epic: [Title]" -p 1` |
| Task | 2-3 | `br create "[Verb] [Object]" -p 2` |
| Subtask | 3-4 | `br q "[Verb] [Object]"` |
**Granularity target**: Each bead completable in 1-4 hours by one agent.
#### Description Templates
Use the **full template** for all task-level beads. Use the **light template** only for trivially small tasks (config change, single-line fix, add a re-export).
##### Full Template (default)
```
## Background
[WHY this exists. What problem it solves. How it fits into the larger system.
Include enough context that an agent unfamiliar with the project understands
the purpose. Reference architectural patterns in use.]
## Approach
[HOW to implement. Be specific:
- Data structures / types to create or use (include field names and types)
- Algorithms or patterns to follow
- Code snippets from the plan if available
- Which existing code to reference for patterns (exact file paths)]
## Acceptance Criteria
### Specified (from plan — implement as-is)
- [ ] <criteria explicitly stated in the plan>
- [ ] <criteria explicitly stated in the plan>
### Proposed (inferred — confirm with user before implementing) [?]
- [ ] [?] <criteria the agent inferred but the plan didn't specify>
- [ ] [?] <criteria the agent inferred but the plan didn't specify>
**ASSUMPTION RULE**: If proposed criteria exceed ~30% of total, STOP.
The bead needs human input before it's ready for implementation. Flag it
in the audit output and ask the user to refine the ACs.
## Files
[Exact paths to create or modify. Confirmed by searching the codebase.]
- CREATE: src/foo/bar.rs
- MODIFY: src/foo/mod.rs (add pub mod bar)
- MODIFY: tests/foo_tests.rs (add test module)
## TDD Anchor
[The first test to write. This grounds the agent's work.]
RED: Write `test_<name>` in `<test_file>` that asserts <specific behavior>.
GREEN: Implement the minimal code to make it pass.
VERIFY: <project's test command> <pattern>
[If the plan specifies additional tests, list them all:]
- test_empty_input_returns_empty_vec
- test_single_issue_produces_one_event
- test_handles_missing_fields_gracefully
## Edge Cases
[Gotchas, risks, things that aren't obvious. Pulled from the plan's warnings,
known issues, or your analysis of the approach.]
- <edge case 1>
- <edge case 2>
## Dependency Context
[For each dependency, explain WHAT it provides that this bead consumes.
Not just "depends on bd-xyz" but "uses the `TimelineEvent` struct and
`SeedConfig` type defined in bd-xyz".]
```
##### Light Template (trivially small tasks only)
Use this ONLY when the task is a one-liner or pure mechanical change (add a re-export, flip a config flag, rename a constant). If there's any ambiguity about approach, use the full template.
```
## What
[One sentence: what to do and where.]
## Acceptance Criteria
- [ ] <single binary criterion>
## Files
- MODIFY: <exact path>
```
### 4. Link Dependencies
```bash
br dep add [blocker-id] [blocked-id]
```
Dependency patterns:
- Types/structs → code that uses them
- Infrastructure (DB, config) → features that need them
- Core logic → extensions/enhancements
- Tests may depend on test helpers
**Critical**: When linking deps, update the "Dependency Context" section in the blocked bead to describe exactly what it receives from the blocker.
### 5. Audit Quality
Before reporting, review EVERY bead against this checklist:
| Check | Pass criteria |
|-------|--------------|
| **Self-contained?** | Agent can start coding in 60 seconds with ONLY this description |
| **TDD anchor?** | First test to write is named and described |
| **Binary criteria?** | Every acceptance criterion is pass/fail, not subjective |
| **Exact paths?** | File paths verified against codebase, not guessed |
| **Edge cases?** | At least 1 non-obvious gotcha identified |
| **Dep context?** | Each dependency explains WHAT it provides, not just its ID |
| **Approach specifics?** | Types, field names, patterns — not "implement the thing" |
| **Assumption budget?** | Proposed [?] criteria are <30% of total ACs |
If a bead fails any check, fix it before moving on. If the assumption budget is exceeded, flag the bead for human review rather than inventing more ACs.
## Assumption & AC Guidance
Agents filling in beads will inevitably encounter gaps in the plan. The rules:
1. **Never silently fill gaps.** If the plan doesn't specify a behavior, don't assume one and bury it in the ACs. Mark it `[?]` so the implementing agent knows to ask.
2. **Specify provenance on every AC.** Specified = from the plan. Proposed = your inference. The implementing agent treats these differently:
   - **Specified**: implement without question
   - **Proposed [?]**: pause and confirm with the user before implementing
3. **The 30% rule.** If more than ~30% of ACs on a bead are proposed/inferred, the plan was too vague for this task. Don't create the bead as-is. Instead:
   - Create it with status noting "needs AC refinement"
   - List the open questions explicitly
   - Flag it in the output report under "Beads Needing Human Input"
4. **Prefer smaller scope over more assumptions.** If you're unsure whether a task should handle edge case X, make the bead's scope explicitly exclude it and note it as a potential follow-up. A bead that does less but does it right beats one that guesses wrong.
5. **Implementing agents: honor the markers.** When you encounter `[?]` on an AC, you MUST ask the user before implementing that behavior. Do not silently resolve it in either direction.
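The 30% rule above is mechanical enough to check in code. The following is a minimal sketch, assuming acceptance criteria are available as plain checkbox lines and that inferred ones carry the `[?]` marker; the function name `exceeds_assumption_budget` is hypothetical and not part of `br`.

```rust
/// Returns true when the share of proposed (`[?]`) criteria exceeds `budget`.
/// `ac_lines` holds one checkbox line per acceptance criterion.
/// `budget` is a fraction such as 0.30. A bead with no criteria never exceeds it.
fn exceeds_assumption_budget(ac_lines: &[&str], budget: f64) -> bool {
    if ac_lines.is_empty() {
        return false;
    }
    // A criterion counts as proposed when it carries the `[?]` marker.
    let proposed = ac_lines.iter().filter(|line| line.contains("[?]")).count();
    (proposed as f64) / (ac_lines.len() as f64) > budget
}
```

A bead with one `[?]` criterion out of four (25%) passes; two out of three (about 67%) should be flagged under "Beads Needing Human Input".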
## Output Format
After completion, report:
```
## Beads Created: N total (X epics, Y tasks, Z subtasks)
### Quality Audit
- Beads scoring 4+: N/N (target: 100%)
- [list any beads that needed extra attention and why]
### Beads Needing Human Input
[List any beads where proposed ACs exceeded 30%, or where significant
ambiguity in the plan made self-contained descriptions impossible.
Include the specific open questions for each.]
### Critical Path
[blocker] → [blocked] → [blocked]
### Ready to Start
- bd-xxx: [Title] — [one-line summary of what agent will do]
- bd-yyy: [Title] — [one-line summary of what agent will do]
### Dependency Graph
[Brief visualization or description of the dep structure]
```
## Risk Tiers
| Operation | Tier | Behavior |
|-----------|------|----------|
| `br create` | SAFE | Auto-proceed |
| `br dep add` | SAFE | Auto-proceed |
| `br update --description` | CAUTION | Verify content |
| Bulk creation (>20 beads) | CAUTION | Confirm count first |
## Anti-Patterns
| Anti-Pattern | Why it's bad | Fix |
|-------------|-------------|-----|
| "Implement the pipeline stage" | Agent doesn't know WHAT to implement | Name the types, the function signatures, the test |
| "See plan for details" | Plan isn't available to the agent | Copy the relevant details INTO the bead |
| "Files: probably src/foo/" | Agent wastes time finding the right file | Search the codebase, confirm exact paths |
| "Should work correctly" | Not binary, not testable | "test_x passes" or "output matches Y" |
| No TDD anchor | Agent doesn't know where to start | Always specify the first test to write |
| "Depends on bd-xyz" (without context) | Agent doesn't know what bd-xyz provides | "Uses FooStruct and bar() function from bd-xyz" |
| Single-line description | Score 1 bead, agent is stuck | Use the full template, every section |
| Silently invented ACs | User surprised by implementation choices | Mark inferred ACs with [?], honor the 30% rule |
```


@@ -0,0 +1,128 @@
**Best Revisions To Strengthen The Plan**
1. **[Critical] Replace one-hop rename matching with canonical path identities**
Analysis and rationale: The current `old_path OR new_path` matching handles direct renames, but it still breaks on rename chains (`a.rs -> b.rs -> c.rs`) and split/move patterns. A canonical `path_identity` graph built from `mr_file_changes(old_path, new_path)` gives each file a stable identity over time, which is the right architectural boundary for expertise history.
```diff
@@ ## Context
-- Match both old and new paths in all signal queries AND path resolution probes so expertise survives file renames
+- Build canonical path identities from rename edges and score by identity, not raw path strings, so expertise survives multi-hop renames and moves
@@ ## Files to Modify
-2. **`src/cli/commands/who.rs`** — Core changes:
+2. **`src/cli/commands/who.rs`** — Core changes:
...
- - Match both `new_path` and `old_path` in all signal queries (rename awareness)
+ - Resolve queried paths to `path_identity_id` and match all aliases in that identity set
+4. **`src/core/path_identity.rs`** — New module:
+ - Build/maintain rename graph from `mr_file_changes`
+ - Resolve path -> identity + aliases for probes/scoring
```
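To make the multi-hop case in revision 1 concrete, here is a minimal sketch of alias resolution over rename edges. It assumes the edges are already loaded as `(old_path, new_path)` pairs; the function name `resolve_aliases` and the in-memory representation are hypothetical, not the project's actual module.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Collects every path connected to `start` through rename edges, in either direction.
/// `edges` holds `(old_path, new_path)` pairs, as they might be read from `mr_file_changes`.
fn resolve_aliases(start: &str, edges: &[(&str, &str)]) -> BTreeSet<String> {
    // Undirected adjacency map, so a query on any name in the chain finds the rest.
    let mut adjacent: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(old, new) in edges {
        adjacent.entry(old).or_default().push(new);
        adjacent.entry(new).or_default().push(old);
    }

    let mut seen: BTreeSet<String> = BTreeSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    seen.insert(start.to_string());
    queue.push_back(start);

    // Breadth-first search over the rename graph.
    while let Some(path) = queue.pop_front() {
        if let Some(neighbors) = adjacent.get(path) {
            for &next in neighbors {
                // `insert` returns true only for paths not visited before.
                if seen.insert(next.to_string()) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen
}
```

Querying `c.rs` against the edges `a.rs -> b.rs` and `b.rs -> c.rs` yields all three names, which a one-hop `old_path OR new_path` match would miss. Treating edges as undirected is a simplification: split or copy patterns may need directed edges to avoid merging unrelated files.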
2. **[Critical] Shift scoring input from runtime CTE joins to a normalized `expertise_events` table**
Analysis and rationale: Your SQL is correct but complex and expensive at query time. Precomputing normalized events at ingestion gives simpler, faster, and more reliable scoring queries; it also enables model versioning/backfills without touching raw MR/note tables each request.
```diff
@@ ## Files to Modify
-3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes
+3. **`src/core/db.rs`** — Add migrations for:
+ - `expertise_events` table (normalized scoring events)
+ - supporting indexes
+4. **`src/core/ingest/expertise_events.rs`** — New:
+ - Incremental upsert of events during sync/ingest
@@ ## SQL Restructure (who.rs)
-The SQL uses CTE-based dual-path matching and hybrid aggregation...
+Runtime SQL reads precomputed `expertise_events` filtered by path identity + time window.
+Heavy joins/aggregation move to ingest-time normalization.
```
3. **[High] Upgrade reviewer engagement model beyond char-count threshold**
Analysis and rationale: `min_note_chars` is a useful guardrail but brittle (easy to game, penalizes concise high-quality comments). Add explicit review-state signals (`approved`, `changes_requested`) and trivial-comment pattern filtering to better capture real reviewer expertise.
```diff
@@ ## Scoring Formula
-| **Reviewer Participated** (left DiffNote on MR/path) | 10 | 90 days |
+| **Reviewer Participated** (substantive DiffNote and/or formal review action) | 10 | 90 days |
+| **Review Decision: changes_requested** | 6 | 120 days |
+| **Review Decision: approved** | 4 | 75 days |
@@ ### 1. ScoringConfig (config.rs)
pub reviewer_min_note_chars: u32,
+ pub reviewer_trivial_note_patterns: Vec<String>, // default: ["lgtm","+1","nit","ship it","👍"]
+ pub review_approved_weight: i64, // default: 4
+ pub review_changes_requested_weight: i64, // default: 6
```
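One way the trivial-comment filter from revision 3 might combine with the existing length threshold is sketched below. How patterns match is not specified in the proposal, so this sketch assumes case-insensitive whole-note equality; `is_substantive_note` is a hypothetical name.

```rust
/// Returns true when a review note should count as substantive participation.
/// Assumption: a pattern matches only when it equals the whole trimmed note,
/// compared case-insensitively.
fn is_substantive_note(body: &str, min_chars: usize, trivial_patterns: &[&str]) -> bool {
    let trimmed = body.trim();
    let lowered = trimmed.to_lowercase();
    // Reject notes that are nothing but a known trivial phrase.
    if trivial_patterns.iter().any(|p| lowered == p.to_lowercase()) {
        return false;
    }
    // Otherwise fall back to the character-count guardrail.
    trimmed.chars().count() >= min_chars
}
```

With this shape, setting `min_chars` to 0 still filters `lgtm`-style notes, so a concise but meaningful comment is not penalized for length alone.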
4. **[High] Make temporal semantics explicit and deterministic**
Analysis and rationale: `--as-of` is good, but day parsing and boundary semantics can still cause subtle reproducibility issues. Define window as `[since_ms, as_of_ms)` and parse `YYYY-MM-DD` as end-of-day UTC (or explicit timezone) so user expectations match outputs.
```diff
@@ ### 5a. Reproducible Scoring via `--as-of`
-- All event selection is bounded by `[since_ms, as_of_ms]`
+- All event selection is bounded by `[since_ms, as_of_ms)` (exclusive upper bound)
+- `YYYY-MM-DD` is interpreted as the end of that day in UTC, i.e. an exclusive bound of `00:00:00.000Z` on the following day, unless `--timezone` is provided
+- Robot output includes `window_start_iso`, `window_end_iso`, `window_end_exclusive: true`
```
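The value of the half-open window in revision 4 is that adjacent windows partition events with no overlap. A minimal sketch, with hypothetical helper names `in_window` and `count_in_window`:

```rust
/// Half-open window test: `since_ms` is inclusive, `as_of_ms` is exclusive.
fn in_window(ts_ms: i64, since_ms: i64, as_of_ms: i64) -> bool {
    since_ms <= ts_ms && ts_ms < as_of_ms
}

/// Counts the event timestamps that fall inside `[since_ms, as_of_ms)`.
fn count_in_window(events_ms: &[i64], since_ms: i64, as_of_ms: i64) -> usize {
    events_ms
        .iter()
        .filter(|&&ts| in_window(ts, since_ms, as_of_ms))
        .count()
}
```

An event stamped exactly at a boundary belongs to the later window only, so scoring `[0, 100)` and then `[100, 200)` never counts it twice.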
5. **[High] Replace fixed default `--since 24m` with contribution-floor auto cutoff**
Analysis and rationale: A static window is simple but often over-scans data. Compute a model-derived horizon per signal from a minimum contribution floor (for example `0.01` points): any event older than the horizon would contribute less than the floor. This keeps rankings equivalent up to those sub-floor contributions while reducing query cost.
```diff
@@ ### 5. Default --since Change
-Expert mode: `"6m"` -> `"24m"`
+Expert mode default: `--since auto`
+`auto` computes earliest relevant timestamp from configured weights/half-lives and `min_contribution_floor`
+Add config: `min_contribution_floor` (default: 0.01)
+`--since` still overrides, `--all-history` still bypasses cutoff
```
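The horizon in revision 5 follows from solving `weight * 0.5^(d / half_life) = floor` for `d`, which gives `d = half_life * log2(weight / floor)`. The sketch below assumes a plain exponential half-life decay; the `half_life_decay` signature matches the plan, but its body here is an assumption, and `horizon_days` is a hypothetical helper.

```rust
/// Exponential half-life decay: 1.0 at zero elapsed time, 0.5 after one half-life.
/// The body is an assumed implementation; negative elapsed time is clamped to zero.
fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    let days = elapsed_ms.max(0) as f64 / 86_400_000.0;
    0.5_f64.powf(days / f64::from(half_life_days))
}

/// Days after which a single event of `weight` contributes less than `floor`.
/// Solves weight * 0.5^(d / half_life) = floor for d.
fn horizon_days(weight: f64, half_life_days: u32, floor: f64) -> f64 {
    if weight <= floor {
        // The event is already below the floor at age zero.
        return 0.0;
    }
    f64::from(half_life_days) * (weight / floor).log2()
}
```

For a weight of 10 with a 90-day half-life and a floor of 0.01, the horizon is `90 * log2(1000)`, a little under 900 days. The `auto` cutoff would take the largest horizon across all configured signals. Many sub-floor events can still add up, so the cutoff is an approximation rather than an exact equivalence.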
6. **[High] Add bot/service-account filtering now (not later)**
Analysis and rationale: Bot activity can materially distort expertise rankings in real repos. This is low implementation cost with high quality gain and should be in v1 of the scoring revamp, not deferred.
```diff
@@ ### 1. ScoringConfig (config.rs)
+ pub excluded_username_patterns: Vec<String>, // default: ["bot","\\[bot\\]","service-account","ci-"]
@@ ### 2. SQL Restructure (who.rs)
+Apply username exclusion in all signal sources unless `--include-bots` is set
@@ ### 5b. Score Explainability via `--explain-score`
+Add `filtered_events` counts in robot output metadata
```
7. **[Medium] Enforce deterministic floating-point accumulation**
Analysis and rationale: Floating-point addition is not associative, and Rust's `HashMap` iteration order is unspecified and can change between runs, so even small sets can produce tiny score differences that reorder rankings near ties. Sorting contributions and using Neumaier summation removes this nondeterminism and stabilizes tests/CI.
```diff
@@ ### 4. Rust-Side Aggregation (who.rs)
-Compute score as `f64`.
+Compute score as `f64` using deterministic contribution ordering:
+1) sort by (username, signal, mr_id, ts)
+2) sum with Neumaier compensation
+Tie-break remains `(raw_score DESC, last_seen DESC, username ASC)`
```
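A minimal sketch of the two steps in revision 7, sorting by a fixed key and then summing with Neumaier compensation. The `Contribution` struct and its field names are illustrative, not the plan's actual row type.

```rust
/// One decayed contribution row. Field names are illustrative.
struct Contribution {
    username: &'static str,
    signal: &'static str,
    mr_id: i64,
    ts: i64,
    value: f64,
}

/// Neumaier (improved Kahan) summation: tracks the low-order bits lost in each add.
fn neumaier_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut compensation = 0.0_f64;
    for &v in values {
        let t = sum + v;
        if sum.abs() >= v.abs() {
            // Low-order digits of `v` were lost in the addition.
            compensation += (sum - t) + v;
        } else {
            // Low-order digits of `sum` were lost in the addition.
            compensation += (v - t) + sum;
        }
        sum = t;
    }
    sum + compensation
}

/// Sorts rows by (username, signal, mr_id, ts), then sums their values.
fn deterministic_total(rows: &mut [Contribution]) -> f64 {
    rows.sort_by(|a, b| {
        (a.username, a.signal, a.mr_id, a.ts).cmp(&(b.username, b.signal, b.mr_id, b.ts))
    });
    let values: Vec<f64> = rows.iter().map(|r| r.value).collect();
    neumaier_sum(&values)
}
```

Because the rows are sorted before summation, the same set of contributions produces a bit-identical total regardless of the order in which they were collected.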
8. **[Medium] Strengthen explainability with evidence, not just totals**
Analysis and rationale: Component totals help, but disputes usually need “why this user got this score now.” Add compact top evidence rows per component (`mr_id`, `ts`, `raw_contribution`) behind an optional mode.
```diff
@@ ### 5b. Score Explainability via `--explain-score`
-Component breakdown only (4 floats per user).
+Add `--explain-score=summary|full`:
+`summary`: current 4-component totals
+`full`: adds top N evidence rows per component (default N=3)
+Robot output includes per-evidence `mr_id`, `signal`, `ts`, `contribution`
```
9. **[Medium] Make query plan strategy explicit: `UNION ALL` default for dual-path scans**
Analysis and rationale: You currently treat `UNION ALL` as fallback if planner regresses. For SQLite, OR-across-indexed-columns regressions are common enough that defaulting to branch-split queries is often more predictable.
```diff
@@ **Index optimization fallback (UNION ALL split)**
-Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm degradation.
+Use `UNION ALL` + dedup as default for dual-path matching.
+Keep `OR` variant as optional strategy flag for benchmarking/regression checks.
```
10. **[Medium] Add explicit performance SLO + benchmark gate**
Analysis and rationale: This plan is query-heavy and ranking-critical; add measurable performance budgets so future edits do not silently degrade UX. Include synthetic fixture benchmarks for exact, prefix, and suffix path modes.
```diff
@@ ## Verification
+8. Performance regression gate:
+ - `cargo bench --bench who_expert_scoring`
+ - Dataset tiers: 100k, 1M, 5M notes
+ - SLOs: p95 exact path < 150ms, prefix < 250ms, suffix < 400ms on reference hardware
+ - Fail CI if regression > 20% vs stored baseline
```
If you want, I can produce a single consolidated “iteration 5” plan document with these changes already merged into your current structure.


@@ -0,0 +1,134 @@
I avoided everything already listed in your `Rejected Ideas` section and focused on net-new upgrades.
1. Centralize MR temporal semantics in one `mr_activity` CTE (architecture + correctness)
Why this improves the plan: right now the state-aware timestamp logic is repeated across multiple signal branches, while `closed_mr_multiplier` is applied later in Rust by string state checks. That split is brittle. A single `mr_activity` CTE removes drift risk, simplifies query maintenance, and avoids per-row state-string handling in Rust.
```diff
diff --git a/plan.md b/plan.md
@@ SQL Restructure
+mr_activity AS (
+ SELECT
+ m.id AS mr_id,
+ m.project_id,
+ m.author_username,
+ m.state,
+ CASE
+ WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
+ WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
+ ELSE COALESCE(m.updated_at, m.created_at)
+ END AS activity_ts,
+ CASE
+ WHEN m.state = 'closed' THEN ?5
+ ELSE 1.0
+ END AS state_mult
+ FROM merge_requests m
+ WHERE m.state IN ('opened','merged','closed')
+),
@@
-... {state_aware_ts} AS seen_at, m.state AS mr_state
+... a.activity_ts AS seen_at, a.state_mult
@@
-SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated
+SELECT username, signal, mr_id, qty, ts, state_mult FROM aggregated
```
2. Parameterize `reviewer_min_note_chars` and tighten config validation (robustness)
Why this improves the plan: inlining `reviewer_min_note_chars` into SQL text creates statement-cache churn and avoidable SQL-text variability. Also, current validation misses finite-range guards (`NaN`, absurd half-lives). Parameterization + stronger validation reduces weird failure modes.
```diff
diff --git a/plan.md b/plan.md
@@ 1. ScoringConfig (config.rs)
- reviewer_min_note_chars must be >= 0
+ reviewer_min_note_chars must be <= 4096
+ all half-life values must be <= 3650 (10 years safety cap)
+ closed_mr_multiplier must be finite and in (0.0, 1.0]
@@ SQL Restructure
-AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= {reviewer_min_note_chars}
+AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= ?6
```
3. Add path canonicalization before probes/scoring (correctness + UX)
Why this improves the plan: rename-awareness helps only after path resolution succeeds. Inputs like `./src//foo.rs` or inconsistent trailing slashes can still miss. Canonicalizing query paths up front reduces false negatives and ambiguous suffix behavior.
```diff
diff --git a/plan.md b/plan.md
@@ 3a. Path Resolution Probes (who.rs)
+Add `normalize_query_path()` before `build_path_query()`:
+- strip leading `./`
+- collapse repeated `/`
+- trim whitespace
+- preserve trailing `/` only for explicit prefix intent
+Expose both `path_input_original` and `path_input_normalized` in `resolved_input`.
@@ New Tests
+test_path_normalization_handles_dot_and_double_slash
+test_path_normalization_preserves_explicit_prefix_semantics
```
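The normalization rules listed in revision 3 could look roughly like the sketch below. The function name comes from the proposal, but the body is an assumption. It also drops a leading `/`, on the assumption that queried paths are repository-relative.

```rust
/// Canonicalizes a user-supplied path before path resolution.
/// Assumed behavior: trim whitespace, strip leading "./", collapse repeated "/",
/// and keep a single trailing "/" only when the input ended with one.
fn normalize_query_path(input: &str) -> String {
    let trimmed = input.trim();
    let keep_trailing_slash = trimmed.ends_with('/');

    // Strip any number of leading "./" prefixes.
    let mut rest = trimmed;
    while let Some(stripped) = rest.strip_prefix("./") {
        rest = stripped;
    }

    // Splitting on '/' and dropping empty segments collapses repeated slashes.
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    let mut normalized = segments.join("/");
    if keep_trailing_slash && !normalized.is_empty() {
        normalized.push('/');
    }
    normalized
}
```

Under these rules `./src//foo.rs` becomes `src/foo.rs`, while `src/auth/` keeps its trailing slash so it can still signal prefix intent.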
4. Add epsilon-based tie buckets for stable ranking (determinism)
Why this improves the plan: even with deterministic summation order, tiny `powf` platform differences can reorder near-equal scores. Tie bucketing keeps ordering stable and user-meaningful.
```diff
diff --git a/plan.md b/plan.md
@@ 4. Rust-Side Aggregation (who.rs)
-Sort on raw `f64` score — `(raw_score DESC, last_seen DESC, username ASC)`.
+Sort using a tie bucket:
+`score_bucket = (raw_score / 1e-9).floor() as i64`
+Order by `(score_bucket DESC, raw_score DESC, last_seen DESC, username ASC)`.
+This preserves precision while preventing meaningless micro-delta reorderings.
@@ New Tests
+test_near_equal_scores_use_stable_tie_bucket_order
```
5. Add `--diagnose-score` aggregated diagnostics (operability)
Why this improves the plan: `--explain-score` tells “why this user scored”, but not “why this query behaved oddly” (path ambiguity, dedup collapse, old_path contribution share, filtered bots, window exclusions). Lightweight aggregate diagnostics are high-value without per-MR drill-down complexity.
```diff
diff --git a/plan.md b/plan.md
@@ CLI changes (who.rs)
+Add `--diagnose-score` flag (compatible with `--explain-score`, incompatible with `--detail`).
+When enabled, include:
+- matched_notes_raw_count
+- matched_notes_dedup_count
+- matched_file_changes_raw_count
+- matched_file_changes_dedup_count
+- rows_excluded_by_window_upper_bound
+- users_filtered_by_excluded_usernames
+- query_elapsed_ms
@@ Robot output
+`diagnostics` object emitted only when `--diagnose-score` is set.
```
6. Add probe-optimized indexes for path resolution (performance)
Why this improves the plan: current proposed indexes are optimized for scoring joins, but `build_path_query()` and `suffix_probe()` run existence/path-only probes where `author_username` is not constrained. Dedicated probe indexes will materially reduce latency for path lookup modes.
```diff
diff --git a/plan.md b/plan.md
@@ 6. Index Migration (db.rs)
+-- Fast exact/prefix/suffix path probes on notes (no author predicate)
+CREATE INDEX IF NOT EXISTS idx_notes_new_path_project_created
+ ON notes(position_new_path, project_id, created_at)
+ WHERE note_type = 'DiffNote' AND is_system = 0 AND position_new_path IS NOT NULL;
+
+CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
+ ON notes(position_old_path, project_id, created_at)
+ WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
```
7. Add multi-path expert scoring (`--path` repeatable) with dedup across paths (feature + utility)
Why this improves the plan: current model is single-path centric. Real ownership questions are usually subsystem-level. Repeatable paths/prefixes let users ask “who knows auth stack?” in one call. Dedup by `(username, signal, mr_id)` avoids double-counting same MR touching multiple requested paths.
```diff
diff --git a/plan.md b/plan.md
@@ CLI/feature scope
+Add repeatable `--path` in expert mode:
+`lore who --expert --path src/auth/ --path src/session/`
+Optional `--path-file <file>` for large path sets (one per line).
@@ SQL Restructure
+Add `requested_paths` CTE and match each source against that set.
+Ensure dedup key includes `(username, signal, mr_id)` so one MR contributes once per signal even if multiple paths match.
@@ New Tests
+test_multi_path_query_unions_results_without_double_counting
+test_multi_path_with_overlap_prefixes_is_idempotent
```
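The dedup key from revision 7 can be illustrated in Rust, even though the proposal places the dedup in SQL. This is a sketch only; `SignalRow` and `dedup_across_paths` are hypothetical names.

```rust
use std::collections::HashSet;

/// One signal row as it might come back from a multi-path query. Illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct SignalRow {
    username: String,
    signal: String,
    mr_id: i64,
    matched_path: String,
}

/// Keeps the first row for each (username, signal, mr_id), preserving input order,
/// so an MR that touches several requested paths contributes once per signal.
fn dedup_across_paths(rows: Vec<SignalRow>) -> Vec<SignalRow> {
    let mut seen: HashSet<(String, String, i64)> = HashSet::new();
    rows.into_iter()
        .filter(|r| seen.insert((r.username.clone(), r.signal.clone(), r.mr_id)))
        .collect()
}
```

If one MR by `alice` touches both `src/auth/` and `src/session/`, only one `file_author` row for that MR survives, while rows for other users or other signals on the same MR are kept.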
These 7 revisions keep your current model direction intact, but reduce correctness drift risk, harden edge handling, improve query observability, and make the feature materially more useful for real ownership workflows.


@@ -2,12 +2,12 @@
plan: true
title: ""
status: iterating
-iteration: 4
+iteration: 6
target_iterations: 8
-beads_revision: 0
+beads_revision: 2
related_plans: []
created: 2026-02-08
-updated: 2026-02-08
+updated: 2026-02-12
---
# Time-Decay Expert Scoring Model
@@ -70,7 +70,8 @@ Author/reviewer signals are deduplicated per MR (one signal per distinct MR). No
1. **`src/core/config.rs`** — Add half-life fields + assigned-only reviewer config to `ScoringConfig`; add config validation
2. **`src/cli/commands/who.rs`** — Core changes:
- Add `half_life_decay()` pure function
- - Restructure `query_expert()`: SQL returns hybrid-aggregated signal rows with timestamps (MR-level for author/reviewer, note-count-per-MR for notes), Rust applies decay + `log2(1+count)` + final ranking
+ - Add `normalize_query_path()` for input canonicalization before path resolution
+ - Restructure `query_expert()`: SQL returns hybrid-aggregated signal rows with timestamps and state multiplier (MR-level for author/reviewer, note-count-per-MR for notes), Rust applies decay + `log2(1+count)` + final ranking
- Match both `new_path` and `old_path` in all signal queries (rename awareness)
- Extend rename awareness to `build_path_query()` probes and `suffix_probe()` (not just scoring)
- Split reviewer signal into participated vs assigned-only
@@ -78,6 +79,7 @@ Author/reviewer signals are deduplicated per MR (one signal per distinct MR). No
- Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals)
- Add `--as-of` flag for reproducible scoring at a fixed timestamp
- Add `--explain-score` flag for per-user score component breakdown
+ - Add `--include-bots` flag to disable bot/service-account filtering
- Sort on raw f64 score, round only for display
- Update tests
3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)
@@ -100,14 +102,16 @@ pub struct ScoringConfig {
pub note_half_life_days: u32, // default: 45
pub closed_mr_multiplier: f64, // default: 0.5 (applied to closed-without-merge MRs)
pub reviewer_min_note_chars: u32, // default: 20 (minimum note body length to count as participation)
+pub excluded_usernames: Vec<String>, // default: [] (exact-match usernames to exclude, e.g. ["renovate-bot", "gitlab-ci"])
}
```
**Config validation**: Add a `validate_scoring()` call in `Config::load_from_path()` after deserialization:
-- All `*_half_life_days` must be > 0 (prevents division by zero in decay function)
+- All `*_half_life_days` must be > 0 and <= 3650 (prevents division by zero in decay function; rejects absurd 10+ year half-lives that would effectively disable decay)
- All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores)
-- `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them)
-- `reviewer_min_note_chars` must be >= 0 (0 disables the filter; typical useful values: 10-50)
+- `closed_mr_multiplier` must be finite (not NaN/Inf) and in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them; NaN/Inf would propagate through all scores)
+- `reviewer_min_note_chars` must be >= 0 and <= 4096 (0 disables the filter; 4096 is a sane upper bound, since no real review comment needs to be longer to qualify; typical useful values: 10-50)
+- `excluded_usernames` entries must be non-empty strings (no blank entries)
- Return `LoreError::ConfigInvalid` with a clear message on failure
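The validation rules above might translate into something like the following sketch. It covers only the fields visible in this section, uses a plain `String` error as a stand-in for `LoreError::ConfigInvalid` (whose shape is not shown here), and treats whitespace-only usernames as blank, which is an assumption.

```rust
/// Subset of the scoring config, limited to fields named in this section.
struct ScoringConfig {
    note_half_life_days: u32,
    closed_mr_multiplier: f64,
    reviewer_min_note_chars: u32,
    excluded_usernames: Vec<String>,
}

/// Validates the config. The `String` error stands in for `LoreError::ConfigInvalid`.
fn validate_scoring(cfg: &ScoringConfig) -> Result<(), String> {
    // Half-life must be positive (avoids division by zero) and capped at 10 years.
    if cfg.note_half_life_days == 0 || cfg.note_half_life_days > 3650 {
        return Err(format!(
            "note_half_life_days must be in 1..=3650, got {}",
            cfg.note_half_life_days
        ));
    }
    // Multiplier must be finite and in (0.0, 1.0]; NaN fails the finiteness check.
    let m = cfg.closed_mr_multiplier;
    if !m.is_finite() || m <= 0.0 || m > 1.0 {
        return Err(format!("closed_mr_multiplier must be in (0.0, 1.0], got {}", m));
    }
    if cfg.reviewer_min_note_chars > 4096 {
        return Err(format!(
            "reviewer_min_note_chars must be <= 4096, got {}",
            cfg.reviewer_min_note_chars
        ));
    }
    // Assumption: whitespace-only entries count as blank.
    if cfg.excluded_usernames.iter().any(|u| u.trim().is_empty()) {
        return Err("excluded_usernames must not contain blank entries".to_string());
    }
    Ok(())
}
```

The remaining `*_half_life_days`, `*_weight`, and `*_bonus` fields would follow the same pattern as the checks shown here.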
### 2. Decay Function (who.rs)
@@ -123,30 +127,74 @@ fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
### 3. SQL Restructure (who.rs)
-The SQL uses **CTE-based dual-path matching** and **hybrid aggregation**. Rather than repeating `OR old_path` in every signal subquery, two foundational CTEs (`matched_notes`, `matched_file_changes`) centralize path matching. A third CTE (`reviewer_participation`) precomputes which reviewers actually left DiffNotes, avoiding correlated `EXISTS`/`NOT EXISTS` subqueries.
+The SQL uses **CTE-based dual-path matching**, a **centralized `mr_activity` CTE**, and **hybrid aggregation**. Rather than repeating `OR old_path` in every signal subquery, two foundational CTEs (`matched_notes`, `matched_file_changes`) centralize path matching. An `mr_activity` CTE centralizes the state-aware timestamp and state multiplier in one place, eliminating repetition of the CASE expression across signals 3, 4a, 4b. A fourth CTE (`reviewer_participation`) precomputes which reviewers actually left DiffNotes, avoiding correlated `EXISTS`/`NOT EXISTS` subqueries.
-MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`.
+MR-level signals return one row per (username, signal, mr_id) with a timestamp and state multiplier; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`.
```sql
WITH matched_notes AS (
-- Centralize dual-path matching for DiffNotes
SELECT n.id, n.discussion_id, n.author_username, n.created_at,
n.position_new_path, n.position_old_path, n.project_id
WITH matched_notes_raw AS (
-- Branch 1: match on new_path (uses idx_notes_new_path or equivalent)
SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
FROM notes n
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username IS NOT NULL
AND n.created_at >= ?2
AND n.created_at <= ?4
AND n.created_at < ?4
AND (?3 IS NULL OR n.project_id = ?3)
AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
AND n.position_new_path {path_op}
UNION ALL
-- Branch 2: match on old_path (uses idx_notes_old_path_author)
SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
FROM notes n
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username IS NOT NULL
AND n.created_at >= ?2
AND n.created_at < ?4
AND (?3 IS NULL OR n.project_id = ?3)
AND n.position_old_path {path_op}
),
matched_file_changes AS (
-- Centralize dual-path matching for file changes
matched_notes AS (
-- Dedup: prevent double-counting when old_path = new_path (no rename)
SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
FROM matched_notes_raw
),
matched_file_changes_raw AS (
-- Branch 1: match on new_path (uses idx_mfc_new_path_project_mr)
SELECT fc.merge_request_id, fc.project_id
FROM mr_file_changes fc
WHERE (?3 IS NULL OR fc.project_id = ?3)
AND (fc.new_path {path_op} OR fc.old_path {path_op})
AND fc.new_path {path_op}
UNION ALL
-- Branch 2: match on old_path (uses idx_mfc_old_path_project_mr)
SELECT fc.merge_request_id, fc.project_id
FROM mr_file_changes fc
WHERE (?3 IS NULL OR fc.project_id = ?3)
AND fc.old_path {path_op}
),
matched_file_changes AS (
-- Dedup: prevent double-counting when old_path = new_path (no rename)
SELECT DISTINCT merge_request_id, project_id
FROM matched_file_changes_raw
),
mr_activity AS (
-- Centralized state-aware timestamps and state multiplier.
-- Defined once, referenced by all file-change-based signals (3, 4a, 4b).
-- Scoped to MRs matched by file changes to avoid materializing the full MR table.
SELECT DISTINCT
m.id AS mr_id,
m.author_username,
m.state,
CASE
WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
ELSE COALESCE(m.updated_at, m.created_at)
END AS activity_ts,
CASE WHEN m.state = 'closed' THEN ?5 ELSE 1.0 END AS state_mult
FROM merge_requests m
JOIN matched_file_changes mfc ON mfc.merge_request_id = m.id
WHERE m.state IN ('opened','merged','closed')
),
reviewer_participation AS (
-- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
@@ -156,17 +204,20 @@ reviewer_participation AS (
-- reviewer from 3-point to 10-point weight, defeating the purpose of the split.
-- Note: mn.id refers back to notes.id, so we join notes to access the body column
-- (not carried in matched_notes to avoid bloating that CTE with body text).
-- ?6 is the configured reviewer_min_note_chars value (default 20).
SELECT DISTINCT d.merge_request_id AS mr_id, mn.author_username AS username
FROM matched_notes mn
JOIN discussions d ON mn.discussion_id = d.id
JOIN notes n_body ON mn.id = n_body.id
WHERE d.merge_request_id IS NOT NULL
AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= {reviewer_min_note_chars}
AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= ?6
),
raw AS (
-- Signal 1: DiffNote reviewer (individual notes for note_cnt)
-- Computes state_mult inline (not via mr_activity) because this joins through discussions, not file changes.
SELECT mn.author_username AS username, 'diffnote_reviewer' AS signal,
m.id AS mr_id, mn.id AS note_id, mn.created_at AS seen_at, m.state AS mr_state
m.id AS mr_id, mn.id AS note_id, mn.created_at AS seen_at,
CASE WHEN m.state = 'closed' THEN ?5 ELSE 1.0 END AS state_mult
FROM matched_notes mn
JOIN discussions d ON mn.discussion_id = d.id
JOIN merge_requests m ON d.merge_request_id = m.id
@@ -176,8 +227,10 @@ raw AS (
UNION ALL
-- Signal 2: DiffNote MR author
-- Computes state_mult inline (same reason as signal 1).
SELECT m.author_username AS username, 'diffnote_author' AS signal,
m.id AS mr_id, NULL AS note_id, MAX(mn.created_at) AS seen_at, m.state AS mr_state
m.id AS mr_id, NULL AS note_id, MAX(mn.created_at) AS seen_at,
CASE WHEN m.state = 'closed' THEN ?5 ELSE 1.0 END AS state_mult
FROM merge_requests m
JOIN discussions d ON d.merge_request_id = m.id
JOIN matched_notes mn ON mn.discussion_id = d.id
@@ -187,69 +240,63 @@ raw AS (
UNION ALL
-- Signal 3: MR author via file changes (state-aware timestamp)
SELECT m.author_username AS username, 'file_author' AS signal,
m.id AS mr_id, NULL AS note_id,
{state_aware_ts} AS seen_at, m.state AS mr_state
FROM matched_file_changes mfc
JOIN merge_requests m ON mfc.merge_request_id = m.id
WHERE m.author_username IS NOT NULL
AND m.state IN ('opened','merged','closed')
AND {state_aware_ts} >= ?2
AND {state_aware_ts} <= ?4
-- Signal 3: MR author via file changes (uses mr_activity CTE for timestamp + state_mult)
SELECT a.author_username AS username, 'file_author' AS signal,
a.mr_id, NULL AS note_id,
a.activity_ts AS seen_at, a.state_mult
FROM mr_activity a
WHERE a.author_username IS NOT NULL
AND a.activity_ts >= ?2
AND a.activity_ts < ?4
UNION ALL
-- Signal 4a: Reviewer participated (in mr_reviewers AND left DiffNotes on path)
SELECT r.username AS username, 'file_reviewer_participated' AS signal,
m.id AS mr_id, NULL AS note_id,
{state_aware_ts} AS seen_at, m.state AS mr_state
FROM matched_file_changes mfc
JOIN merge_requests m ON mfc.merge_request_id = m.id
JOIN mr_reviewers r ON r.merge_request_id = m.id
JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
a.mr_id, NULL AS note_id,
a.activity_ts AS seen_at, a.state_mult
FROM mr_activity a
JOIN mr_reviewers r ON r.merge_request_id = a.mr_id
JOIN reviewer_participation rp ON rp.mr_id = a.mr_id AND rp.username = r.username
WHERE r.username IS NOT NULL
AND (m.author_username IS NULL OR r.username != m.author_username)
AND m.state IN ('opened','merged','closed')
AND {state_aware_ts} >= ?2
AND {state_aware_ts} <= ?4
AND (a.author_username IS NULL OR r.username != a.author_username)
AND a.activity_ts >= ?2
AND a.activity_ts < ?4
UNION ALL
-- Signal 4b: Reviewer assigned-only (in mr_reviewers, NO DiffNotes on path)
SELECT r.username AS username, 'file_reviewer_assigned' AS signal,
m.id AS mr_id, NULL AS note_id,
{state_aware_ts} AS seen_at, m.state AS mr_state
FROM matched_file_changes mfc
JOIN merge_requests m ON mfc.merge_request_id = m.id
JOIN mr_reviewers r ON r.merge_request_id = m.id
LEFT JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
a.mr_id, NULL AS note_id,
a.activity_ts AS seen_at, a.state_mult
FROM mr_activity a
JOIN mr_reviewers r ON r.merge_request_id = a.mr_id
LEFT JOIN reviewer_participation rp ON rp.mr_id = a.mr_id AND rp.username = r.username
WHERE rp.username IS NULL -- NOT in participation set
AND r.username IS NOT NULL
AND (m.author_username IS NULL OR r.username != m.author_username)
AND m.state IN ('opened','merged','closed')
AND {state_aware_ts} >= ?2
AND {state_aware_ts} <= ?4
AND (a.author_username IS NULL OR r.username != a.author_username)
AND a.activity_ts >= ?2
AND a.activity_ts < ?4
),
aggregated AS (
-- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
SELECT username, signal, mr_id, 1 AS qty, MAX(seen_at) AS ts, mr_state
SELECT username, signal, mr_id, 1 AS qty, MAX(seen_at) AS ts, MAX(state_mult) AS state_mult
FROM raw WHERE signal != 'diffnote_reviewer'
GROUP BY username, signal, mr_id
UNION ALL
-- Note signals: 1 row per (username, mr_id) with note_count and max_ts
SELECT username, 'note_group' AS signal, mr_id, COUNT(*) AS qty, MAX(seen_at) AS ts, mr_state
SELECT username, 'note_group' AS signal, mr_id, COUNT(*) AS qty, MAX(seen_at) AS ts, MAX(state_mult) AS state_mult
FROM raw WHERE signal = 'diffnote_reviewer' AND note_id IS NOT NULL
GROUP BY username, mr_id
)
SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL
SELECT username, signal, mr_id, qty, ts, state_mult FROM aggregated WHERE username IS NOT NULL
```
Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `BETWEEN ?2 AND ?4` pattern ensures that when `--as-of` is set to a past date, events after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility.
Where `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?2` is `since_ms`, `?3` is the optional project_id, `?4` is the `as_of_ms` exclusive upper bound (defaults to `now_ms` when `--as-of` is not specified), `?5` is the `closed_mr_multiplier` (default 0.5, bound as a parameter), and `?6` is the configured `reviewer_min_note_chars` value (default 20, bound as a parameter). The `>= ?2 AND < ?4` pattern (half-open interval) ensures that when `--as-of` is set to a past date, events at or after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. The exclusive upper bound avoids edge-case ambiguity when events have timestamps exactly equal to the as-of value.
**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into `matched_notes` and `matched_file_changes` CTEs means path matching is defined once, the indexes are hit once, and adding future path resolution logic (e.g., alias chains) only requires changes in one place.
**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into foundational CTEs (`matched_notes_raw` → `matched_notes`, `matched_file_changes_raw` → `matched_file_changes`) means path matching is defined once, each index branch is explicit, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. The UNION ALL + dedup pattern ensures SQLite uses the optimal index for each path column independently.
**Index optimization fallback (UNION ALL split)**: SQLite's query planner sometimes struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. If EXPLAIN QUERY PLAN shows this during step 6 verification, replace the `OR`-based CTEs with a `UNION ALL` split + dedup pattern:
**Dual-path matching strategy (UNION ALL split)**: SQLite's query planner commonly struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. Rather than starting with `OR` and hoping the planner cooperates, use `UNION ALL` + dedup as the default strategy:
```sql
matched_notes AS (
SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
@@ -261,25 +308,39 @@ matched_notes_dedup AS (
FROM matched_notes
),
```
This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm the degradation.
This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). The same pattern applies to `matched_file_changes`. The simpler `OR` variant is retained as a comment for benchmarking — if a future SQLite version handles `OR` well, the split can be collapsed.
**Rationale for precomputed participation set**: The previous approach used correlated `EXISTS`/`NOT EXISTS` subqueries to classify reviewers. The `reviewer_participation` CTE materializes the set of `(mr_id, username)` pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with `IS NULL` (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every `mr_reviewers` row falls into exactly one bucket.
**Rationale for hybrid over fully-raw**: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density.
**Path rename awareness**: Both `matched_notes` and `matched_file_changes` CTEs match against both old and new path columns:
**Path rename awareness**: Both `matched_notes` and `matched_file_changes` use UNION ALL + dedup to match against both old and new path columns independently, ensuring each branch uses its respective index:
- Notes: `(n.position_new_path {path_op} OR n.position_old_path {path_op})`
- File changes: `(fc.new_path {path_op} OR fc.old_path {path_op})`
- Notes: branch 1 matches `position_new_path`, branch 2 matches `position_old_path`, deduped by `notes.id`
- File changes: branch 1 matches `new_path`, branch 2 matches `old_path`, deduped by `(merge_request_id, project_id)`
Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The `OR` match ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.
Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The UNION ALL approach ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.
**Signal 4 splits into two**: The current signal 4 (`file_reviewer`) joins `mr_reviewers` but doesn't distinguish participation. In the new plan:
- **Signal 4a** (`file_reviewer_participated`): User is in `mr_reviewers` AND appears in the `reviewer_participation` CTE (left DiffNotes on the path for that MR). Gets `reviewer_weight` (10) and `reviewer_half_life_days` (90).
- **Signal 4b** (`file_reviewer_assigned`): User is in `mr_reviewers` but NOT in the `reviewer_participation` CTE. Gets `reviewer_assignment_weight` (3) and `reviewer_assignment_half_life_days` (45).
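The 4a/4b split can be mirrored in Rust for unit tests. This is a minimal sketch, not the plan's implementation: `ReviewerClass` and `classify_reviewer` are hypothetical names, and the function assumes the caller has already collected the reviewer's DiffNote bodies on the queried path for one MR. The weights and half-lives are the defaults quoted above.

```rust
/// Hypothetical classification of one `mr_reviewers` row.
#[derive(Debug, PartialEq, Eq)]
pub enum ReviewerClass {
    /// Signal 4a: left at least one substantive DiffNote on the path.
    Participated,
    /// Signal 4b: assigned, but no substantive DiffNote on the path.
    AssignedOnly,
}

impl ReviewerClass {
    /// Default (weight, half_life_days) for each bucket, as quoted in the plan.
    pub fn default_weight_and_half_life(&self) -> (f64, f64) {
        match self {
            ReviewerClass::Participated => (10.0, 90.0),
            ReviewerClass::AssignedOnly => (3.0, 45.0),
        }
    }
}

/// Every reviewer lands in exactly one bucket: a note counts only when its
/// trimmed body has at least `min_note_chars` characters.
/// Note: Rust's `trim()` strips all Unicode whitespace, while SQLite's
/// one-argument TRIM strips spaces only, so edge cases may differ.
pub fn classify_reviewer(note_bodies: &[&str], min_note_chars: usize) -> ReviewerClass {
    let has_substantive_note = note_bodies
        .iter()
        .any(|body| body.trim().chars().count() >= min_note_chars);
    if has_substantive_note {
        ReviewerClass::Participated
    } else {
        ReviewerClass::AssignedOnly
    }
}
```

Because the function returns a single enum value, the "never both, never neither" property holds by construction in this sketch; the SQL version gets the same guarantee from the JOIN / LEFT JOIN ... IS NULL pair.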
### 3a. Path Resolution Probes (who.rs)
**Rationale for `mr_activity` CTE**: The previous approach repeated the state-aware CASE expression and `m.state` column in signals 3, 4a, and 4b, with the `closed_mr_multiplier` applied later in Rust by string-matching on `mr_state`. This split was brittle — the CASE expression could drift between signal branches, and per-row state-string handling in Rust was unnecessary indirection. The `mr_activity` CTE defines the timestamp and multiplier once, scoped to matched MRs only (via JOIN with `matched_file_changes`) to avoid materializing the full MR table. Signals 3, 4a, 4b now reference `a.activity_ts` and `a.state_mult` directly. Signals 1 and 2 (DiffNote-based) still compute `state_mult` inline because they join through `discussions`, not `matched_file_changes`, and adding them to `mr_activity` would require a second join path that doesn't simplify anything.
**Rationale for parameterized `reviewer_min_note_chars` and `closed_mr_multiplier`**: Previous iterations inlined `reviewer_min_note_chars` as a literal in the SQL string and kept `closed_mr_multiplier` in Rust only. Binding both as SQL parameters (`?5` for `closed_mr_multiplier`, `?6` for `reviewer_min_note_chars`) eliminates statement-cache churn (the SQL text is identical regardless of config values), avoids SQL-text variability that complicates EXPLAIN QUERY PLAN analysis, and centralizes the multiplier application in SQL for file-change signals. The DiffNote signals (1, 2) still compute `state_mult` inline because they don't go through `mr_activity`.
### 3a. Path Canonicalization and Resolution Probes (who.rs)
**Path canonicalization**: Before any path resolution or scoring, normalize the user's input path via `normalize_query_path()`:
- Strip leading `./` (e.g., `./src/foo.rs` → `src/foo.rs`)
- Collapse repeated `/` (e.g., `src//foo.rs` → `src/foo.rs`)
- Trim leading/trailing whitespace
- Preserve trailing `/` only when present — it signals explicit prefix intent
This is applied once at the top of `run_who()` before `build_path_query()`. The robot JSON `resolved_input` includes both `path_input_original` (raw user input) and `path_input_normalized` (after canonicalization) for debugging transparency. The normalization is purely syntactic — no filesystem lookups, no canonicalization against the database.
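The four rules above could be implemented roughly as follows. This is a sketch under assumptions: the plan names `normalize_query_path()` but does not give its signature, so the `&str -> String` shape is a guess, and stripping `./` repeatedly (rather than once) is an assumption as well.

```rust
/// Purely syntactic normalization of a user-supplied path query.
/// No filesystem or database lookups are performed.
pub fn normalize_query_path(input: &str) -> String {
    // Trim leading/trailing whitespace first.
    let mut rest = input.trim();

    // Strip leading "./" segments (repeated stripping is an assumption).
    while let Some(stripped) = rest.strip_prefix("./") {
        rest = stripped;
    }

    // Collapse runs of '/' into a single '/'. A trailing '/' survives as a
    // single slash because it signals explicit prefix intent.
    let mut out = String::with_capacity(rest.len());
    let mut prev_was_slash = false;
    for ch in rest.chars() {
        if ch == '/' {
            if !prev_was_slash {
                out.push('/');
            }
            prev_was_slash = true;
        } else {
            out.push(ch);
            prev_was_slash = false;
        }
    }
    out
}
```

Keeping the function free of I/O means it can be exercised directly by the `test_path_normalization_*` cases without a database fixture.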
**Path resolution probes**: Rename awareness must extend beyond scoring queries to the path resolution layer. Currently `build_path_query()` (line 457) and `suffix_probe()` (line 584) only check `position_new_path` and `new_path`. If a user queries an old path name, these probes return "not found" and the scoring query never runs.
Rename awareness must extend beyond scoring queries to the path resolution layer. Currently `build_path_query()` (line 457) and `suffix_probe()` (line 584) only check `position_new_path` and `new_path`. If a user queries an old path name, these probes return "not found" and the scoring query never runs.
@@ -308,39 +369,29 @@ WHERE old_path IS NOT NULL
This ensures that querying by an old filename (e.g., `login.rs` after it was renamed to `auth.rs`) still resolves to a usable path for scoring. The UNION deduplicates so the same path appearing in both old and new columns doesn't cause false ambiguity.
**State-aware timestamps for file-change signals (signals 3, 4a, 4b)**: Replace `m.updated_at` with a state-aware expression:
```sql
CASE
WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
ELSE COALESCE(m.updated_at, m.created_at) -- opened / other
END AS activity_ts
```
**State-aware timestamps for file-change signals (signals 3, 4a, 4b)**: Centralized in the `mr_activity` CTE (see section 3). The CASE expression uses `merged_at` for merged MRs, `closed_at` for closed MRs, and `updated_at` for open MRs, with `created_at` as fallback when the preferred timestamp is NULL.
**Rationale**: `updated_at` is noisy for merged MRs — it changes on label edits, title changes, rebases, and metadata touches, creating false recency. `merged_at` is the best indicator of when code expertise was formed (the moment the code entered the branch). But for **open MRs**, `updated_at` is actually the right signal because it reflects ongoing active work. `closed_at` anchors closed-without-merge MRs to their closure time (these represent review effort even if the code was abandoned). Each state gets the timestamp that best represents when expertise was last exercised.
### 4. Rust-Side Aggregation (who.rs)
For each username, accumulate into a struct with:
- **Author MRs**: `HashMap<i64, (i64, String)>` (mr_id -> (max timestamp, mr_state)) from `diffnote_author` + `file_author` signals
- **Reviewer Participated MRs**: `HashMap<i64, (i64, String)>` from `diffnote_reviewer` + `file_reviewer_participated` signals
- **Reviewer Assigned-Only MRs**: `HashMap<i64, (i64, String)>` from `file_reviewer_assigned` signals (excluding any MR already in participated set)
- **Notes per MR**: `HashMap<i64, (u32, i64, String)>` (mr_id -> (count, max_ts, mr_state)) from `note_group` rows in the aggregated query (already grouped per user+MR with note_count in `qty`). Used for `log2(1 + count)` diminishing returns.
- **Author MRs**: `HashMap<i64, (i64, f64)>` (mr_id -> (max timestamp, state_mult)) from `diffnote_author` + `file_author` signals
- **Reviewer Participated MRs**: `HashMap<i64, (i64, f64)>` from `diffnote_reviewer` + `file_reviewer_participated` signals
- **Reviewer Assigned-Only MRs**: `HashMap<i64, (i64, f64)>` from `file_reviewer_assigned` signals (excluding any MR already in participated set)
- **Notes per MR**: `HashMap<i64, (u32, i64, f64)>` (mr_id -> (count, max_ts, state_mult)) from `note_group` rows in the aggregated query (already grouped per user+MR with note_count in `qty`). Used for `log2(1 + count)` diminishing returns.
- **Last seen**: max of all timestamps
- **Components** (when `--explain-score`): Track per-component f64 subtotals for `author`, `reviewer_participated`, `reviewer_assigned`, `notes`
The `mr_state` field from each SQL row is stored alongside the timestamp so the Rust-side can apply `closed_mr_multiplier` when `mr_state == "closed"`.
The `state_mult` field from each SQL row (already computed in SQL as 1.0 for merged/open or `closed_mr_multiplier` for closed) is stored alongside the timestamp — no string-matching on MR state needed in Rust.
Compute score as `f64`. Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
Compute score as `f64` with **deterministic contribution ordering**: within each signal type, sort contributions by `(mr_id ASC)` before summing. This eliminates platform-dependent HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the complexity of compensated summation (Neumaier/Kahan). Each MR-level contribution is multiplied by its `state_mult` (already computed in SQL):
```
state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 }
raw_score =
sum(author_weight * state_mult(mr) * decay(now - ts, author_hl) for (mr, ts) in author_mrs)
+ sum(reviewer_weight * state_mult(mr) * decay(now - ts, reviewer_hl) for (mr, ts) in reviewer_participated)
+ sum(reviewer_assignment_weight * state_mult(mr) * decay(now - ts, reviewer_assignment_hl) for (mr, ts) in reviewer_assigned)
+ sum(note_bonus * state_mult(mr) * log2(1 + count) * decay(now - ts, note_hl) for (mr, count, ts) in notes_per_mr)
sum(author_weight * state_mult * decay(now - ts, author_hl) for (mr, ts, state_mult) in author_mrs)
+ sum(reviewer_weight * state_mult * decay(now - ts, reviewer_hl) for (mr, ts, state_mult) in reviewer_participated)
+ sum(reviewer_assignment_weight * state_mult * decay(now - ts, reviewer_assignment_hl) for (mr, ts, state_mult) in reviewer_assigned)
+ sum(note_bonus * state_mult * log2(1 + count) * decay(now - ts, note_hl) for (mr, count, ts, state_mult) in notes_per_mr)
```
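One term of the sum above can be sketched in Rust as follows. The plan does not spell out `decay`, so the standard half-life form `0.5^(elapsed_days / half_life_days)` is assumed here; `sum_mr_signal` and `MS_PER_DAY` are hypothetical names. The map layout `mr_id -> (ts, state_mult)` follows the accumulator structs described earlier.

```rust
use std::collections::HashMap;

const MS_PER_DAY: f64 = 86_400_000.0;

/// Assumed half-life decay: 1.0 at zero elapsed time, 0.5 after one half-life.
/// Negative elapsed time is clamped to zero, matching `elapsed.max(0.0)`.
pub fn decay(elapsed_ms: i64, half_life_days: f64) -> f64 {
    let elapsed_days = (elapsed_ms as f64 / MS_PER_DAY).max(0.0);
    0.5_f64.powf(elapsed_days / half_life_days)
}

/// Sums one MR-level signal in ascending `mr_id` order, so the f64 result
/// does not depend on HashMap iteration order.
pub fn sum_mr_signal(
    mrs: &HashMap<i64, (i64, f64)>, // mr_id -> (ts_ms, state_mult)
    weight: f64,
    half_life_days: f64,
    now_ms: i64,
) -> f64 {
    let mut rows: Vec<(i64, i64, f64)> = mrs
        .iter()
        .map(|(&mr_id, &(ts, state_mult))| (mr_id, ts, state_mult))
        .collect();
    rows.sort_by_key(|&(mr_id, _, _)| mr_id);
    rows.iter()
        .map(|&(_, ts, state_mult)| weight * state_mult * decay(now_ms - ts, half_life_days))
        .sum()
}
```

The note term would follow the same shape with an extra `(1.0 + count as f64).log2()` factor. Sorting before summing costs O(n log n) per user, which should be negligible next to the SQL query for typical MR counts.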
**Why include closed MRs?** A closed-without-merge MR still represents review effort and code familiarity — the reviewer read the diff, left comments, and engaged with the code even though it was ultimately abandoned. Excluding closed MRs entirely (the previous plan's approach) discarded this signal. The `closed_mr_multiplier` (default 0.5) halves the contribution, reflecting that the code never landed but the reviewer's cognitive engagement was real. This also eliminates the dead-code inconsistency where the state-aware CASE expression handled `closed` but the WHERE clause excluded it.
@@ -352,6 +403,8 @@ Compute counts from the accumulated data:
- `review_note_count = notes_per_mr.values().map(|(count, _)| count).sum()`
- `author_mr_count = author_mrs.len()`
**Bot/service-account filtering**: After accumulating all user scores and before sorting, filter out any username that appears in `config.scoring.excluded_usernames` (exact match, case-insensitive). This is applied in Rust post-query (not SQL) to keep the SQL clean and avoid parameter explosion. When `--include-bots` is active, the filter is skipped entirely. The robot JSON `resolved_input` includes `excluded_usernames_applied: true|false` to indicate whether filtering was active.
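A minimal sketch of that post-query filter, assuming scored users are held as `(username, score)` pairs; `filter_excluded` is a hypothetical helper name, and the generic `T` stands in for whatever score struct the real code uses.

```rust
use std::collections::HashSet;

/// Drops excluded usernames (exact match, case-insensitive).
/// When `include_bots` is true the filter is skipped entirely.
pub fn filter_excluded<T>(
    scored: Vec<(String, T)>,
    excluded_usernames: &[String],
    include_bots: bool,
) -> Vec<(String, T)> {
    if include_bots {
        return scored;
    }
    // Lowercase once up front so each lookup is a single hash probe.
    let excluded: HashSet<String> = excluded_usernames
        .iter()
        .map(|name| name.to_lowercase())
        .collect();
    scored
        .into_iter()
        .filter(|(username, _)| !excluded.contains(&username.to_lowercase()))
        .collect()
}
```

Running the filter before sorting and truncation matters: filtering after `--limit` truncation could return fewer rows than requested even when more eligible users exist.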
Truncate to limit after sorting.
### 5. Default --since Change
@@ -364,10 +417,11 @@ At 2 years, author decay = 6%, reviewer decay = 0.4%, note decay = 0.006% — ne
### 5a. Reproducible Scoring via `--as-of`
Add `--as-of <RFC3339|YYYY-MM-DD>` flag that overrides the `now_ms` reference point used for decay calculations. When set:
- All event selection is bounded by `[since_ms, as_of_ms]` — events after `as_of_ms` are excluded from SQL results entirely (not just decayed)
- All event selection is bounded by `[since_ms, as_of_ms)` — exclusive upper bound; events at or after `as_of_ms` are excluded from SQL results entirely (not just decayed). The SQL uses `< ?4` (strict less-than), not `<= ?4`.
- `YYYY-MM-DD` input (without time component) is interpreted as end-of-day UTC: `T23:59:59.999Z`. This matches user intuition that `--as-of 2025-06-01` means "as of the end of June 1st" rather than "as of midnight at the start of June 1st" which would exclude the entire day's activity.
- All decay computations use `as_of_ms` instead of `SystemTime::now()`
- The `--since` window is calculated relative to `as_of_ms` (not wall clock)
- Robot JSON `resolved_input` includes `as_of_ms` and `as_of_iso` fields
- Robot JSON `resolved_input` includes `as_of_ms`, `as_of_iso`, `window_start_iso`, `window_end_iso`, and `window_end_exclusive: true` fields — making the exact query window unambiguous in output
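The date-only rule and the half-open window can be sketched with the standard library alone. Everything named here (`days_from_civil`, `as_of_date_to_ms`, `in_window`) is hypothetical; the real code may well use a date crate instead, and RFC3339 inputs are out of scope for this sketch (they return `None`).

```rust
/// Days since 1970-01-01 for a proleptic Gregorian calendar date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so the leap day falls at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// Parses a date-only `YYYY-MM-DD` value as end-of-day UTC (T23:59:59.999Z)
/// in epoch milliseconds. Returns None for anything else.
pub fn as_of_date_to_ms(input: &str) -> Option<i64> {
    let mut parts = input.trim().split('-');
    let year: i64 = parts.next()?.parse().ok()?;
    let month: i64 = parts.next()?.parse().ok()?;
    let day: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&month) {
        return None;
    }
    let leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    let days_in_month = match month {
        2 => if leap { 29 } else { 28 },
        4 | 6 | 9 | 11 => 30,
        _ => 31,
    };
    if !(1..=days_in_month).contains(&day) {
        return None;
    }
    Some(days_from_civil(year, month, day) * 86_400_000 + 86_399_999)
}

/// Half-open window check mirroring `>= ?2 AND < ?4` in the SQL.
pub fn in_window(ts_ms: i64, since_ms: i64, as_of_ms: i64) -> bool {
    since_ms <= ts_ms && ts_ms < as_of_ms
}
```

One consequence worth noting: because the upper bound is exclusive and the date-only form resolves to `.999`, an event stamped exactly at `T23:59:59.999Z` on the as-of day is excluded in this sketch.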
**Rationale**: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. `--as-of` pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since `elapsed.max(0.0)` clamps negative elapsed time to zero), breaking the reproducibility guarantee.
@@ -426,9 +480,16 @@ CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
ON notes(discussion_id, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0;
-- Support path resolution probes on old_path (build_path_query() and suffix_probe())
-- The existing idx_notes_diffnote_path_created covers new_path probes, but old_path probes
-- need their own index since probes don't constrain author_username.
CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
ON notes(position_old_path, project_id, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
```
**Rationale**: The existing indexes cover `position_new_path` and `new_path` but not their `old_path` counterparts. Without these, the `OR old_path` clauses would force table scans on renamed files. The `reviewer_participation` CTE joins `matched_notes` -> `discussions` -> `merge_requests`, so an index on `(discussion_id, author_username)` speeds up the CTE materialization.
**Rationale**: The existing indexes cover `position_new_path` and `new_path` but not their `old_path` counterparts. Without these, the `OR old_path` clauses would force table scans on renamed files. The `reviewer_participation` CTE joins `matched_notes` -> `discussions` -> `merge_requests`, so an index on `(discussion_id, author_username)` speeds up the CTE materialization. The `idx_notes_old_path_project_created` index supports path resolution probes (`build_path_query()` and `suffix_probe()`) which run existence/path-only checks without constraining `author_username` — the scoring-oriented `idx_notes_old_path_author` has `author_username` as the second column, which is suboptimal for these probes.
**Schema note**: The `notes` table uses `discussion_id` as its FK to `discussions`, which in turn has `merge_request_id`. There is no `noteable_id` column on `notes`. The previous plan revision incorrectly referenced `noteable_id` — this is corrected.
@@ -484,10 +545,24 @@ Add timestamp-aware variants:
**`test_closed_mr_multiplier`**: Two identical MRs (same author, same age, same path). One is `merged`, one is `closed`. The merged MR should contribute `author_weight * decay(...)`, the closed MR should contribute `author_weight * closed_mr_multiplier * decay(...)`. With default multiplier 0.5, the closed MR contributes half.
**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the upper-bound filtering in SQL.
**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the exclusive upper-bound (`< ?4`) filtering in SQL.
**`test_as_of_exclusive_upper_bound`**: Insert an event with timestamp exactly equal to the `as_of_ms` value. Verify it is excluded from results (strict less-than, not less-than-or-equal). This validates the half-open interval `[since, as_of)` semantics.
**`test_excluded_usernames_filters_bots`**: Insert signals for a user named "renovate-bot" and a user named "jsmith", both with the same activity. With `excluded_usernames: ["renovate-bot"]` in config, only "jsmith" should appear in results. Validates the Rust-side post-query filtering.
**`test_include_bots_flag_disables_filtering`**: Same setup as above, but with `--include-bots` active. Both "renovate-bot" and "jsmith" should appear in results.
**`test_null_timestamp_fallback_to_created_at`**: Insert a merged MR with `merged_at = NULL` (edge case: old data before the column was populated). The state-aware timestamp should fall back to `created_at`. Verify the score reflects `created_at`, not 0 or a panic.
**`test_path_normalization_handles_dot_and_double_slash`**: Call `normalize_query_path("./src//foo.rs")` — should return `"src/foo.rs"`. Call `normalize_query_path(" src/bar.rs ")` — should return `"src/bar.rs"`. Call `normalize_query_path("src/foo.rs")` — should return unchanged (already normalized). Call `normalize_query_path("")` — should return `""` (empty input passes through).
**`test_path_normalization_preserves_prefix_semantics`**: Call `normalize_query_path("./src/dir/")` — should return `"src/dir/"` (trailing slash preserved for prefix intent). Call `normalize_query_path("src/dir")` — should return `"src/dir"` (no trailing slash = file, not prefix).
**`test_config_validation_rejects_absurd_half_life`**: `ScoringConfig` with `author_half_life_days = 5000` (>3650 cap) should return `ConfigInvalid` error. Similarly, `reviewer_min_note_chars = 5000` (>4096 cap) should fail.
**`test_config_validation_rejects_nan_multiplier`**: `ScoringConfig` with `closed_mr_multiplier = f64::NAN` should return `ConfigInvalid` error. Same for `f64::INFINITY`.
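The two validation tests above imply checks along these lines. This is a sketch only: the struct below carries just three of the config fields, the field types are guesses, and `validate_scoring` and the `ConfigError` enum shape are hypothetical (the plan names only a `ConfigInvalid` error). Any range check on the multiplier beyond finiteness is left out because the plan excerpt does not state one.

```rust
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    ConfigInvalid(String),
}

/// Reduced, hypothetical view of the scoring config.
pub struct ScoringConfig {
    pub author_half_life_days: u32,
    pub reviewer_min_note_chars: u32,
    pub closed_mr_multiplier: f64,
}

// Caps quoted in the test descriptions.
const MAX_HALF_LIFE_DAYS: u32 = 3650;
const MAX_MIN_NOTE_CHARS: u32 = 4096;

pub fn validate_scoring(cfg: &ScoringConfig) -> Result<(), ConfigError> {
    if cfg.author_half_life_days > MAX_HALF_LIFE_DAYS {
        return Err(ConfigError::ConfigInvalid(format!(
            "author_half_life_days {} exceeds cap {}",
            cfg.author_half_life_days, MAX_HALF_LIFE_DAYS
        )));
    }
    if cfg.reviewer_min_note_chars > MAX_MIN_NOTE_CHARS {
        return Err(ConfigError::ConfigInvalid(format!(
            "reviewer_min_note_chars {} exceeds cap {}",
            cfg.reviewer_min_note_chars, MAX_MIN_NOTE_CHARS
        )));
    }
    // is_finite() rejects NaN as well as positive and negative infinity.
    if !cfg.closed_mr_multiplier.is_finite() {
        return Err(ConfigError::ConfigInvalid(
            "closed_mr_multiplier must be finite".to_string(),
        ));
    }
    Ok(())
}
```

Rejecting non-finite multipliers up front matters because `?5` is bound straight into the SQL `CASE` expression, where a NaN would silently poison every closed-MR contribution.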
#### Invariant tests (regression safety for ranking systems)
**`test_score_monotonicity_by_age`**: For any single signal type, an older timestamp must never produce a higher score than a newer timestamp with the same weight and half-life. Generate N random (age, half_life) pairs and assert `decay(older) <= decay(newer)` for all.
@@ -496,11 +571,13 @@ Add timestamp-aware variants:
**`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.
**`test_deterministic_accumulation_order`**: Insert signals for a user with contributions at many different timestamps (10+ MRs with varied ages). Run `query_expert` 100 times in a loop. All 100 runs must produce the exact same `f64` score (bit-identical). Validates that the sorted contribution ordering eliminates HashMap-iteration-order nondeterminism.
### 9. Existing Test Compatibility
All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.
The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, and `reviewer_min_note_chars` fields.
The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, `reviewer_min_note_chars`, and `excluded_usernames` fields.
## Verification
@@ -511,11 +588,19 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings
6. Manual query plan verification (not automated — SQLite planner varies across versions):
- Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database
- Confirm that `matched_notes` CTE uses `idx_notes_old_path_author` or the existing new_path index (not a full table scan)
- Confirm that `matched_file_changes` CTE uses `idx_mfc_old_path_project_mr` or `idx_mfc_new_path_project_mr`
- Confirm that `matched_notes_raw` branch 1 uses the existing new_path index and branch 2 uses `idx_notes_old_path_author` (not a full table scan on either branch)
- Confirm that `matched_file_changes_raw` branch 1 uses `idx_mfc_new_path_project_mr` and branch 2 uses `idx_mfc_old_path_project_mr`
- Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author`
- Confirm that `mr_activity` CTE joins `merge_requests` via primary key from `matched_file_changes`
- Confirm that path resolution probes (old_path leg) use `idx_notes_old_path_project_created`
- Document the observed plan in a comment near the SQL for future regression reference
7. Real-world validation:
7. Performance baseline (manual, not CI-gated):
- Run `time cargo run --release -- who --path <exact-path>` on the real database for exact, prefix, and suffix modes
- Target SLOs: p95 exact path < 200ms, prefix < 300ms, suffix < 500ms on development hardware
- Record baseline timings as a comment near the SQL for regression reference
- If any mode exceeds 2x the baseline after future changes, investigate before merging
- Note: These are soft targets for developer awareness, not automated CI gates. Automated benchmarking with synthetic fixtures (100k/1M/5M notes) is a v2 investment if performance becomes a real concern.
8. Real-world validation:
- `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors
@@ -524,6 +609,8 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs
- Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
- Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score`
- If the project has known bot accounts (e.g., renovate-bot), add them to `excluded_usernames` config and verify they no longer appear in results. Run again with `--include-bots` to confirm they reappear.
- Test path normalization: `who --path ./src//foo.rs` and `who --path src/foo.rs` should produce identical results
## Accepted from External Review
@@ -553,12 +640,28 @@ Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genu
- **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).
**From feedback-4:**
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `<= ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms]`. Without this, `--as-of` reproducibility was fundamentally broken.
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `< ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms)`. Without this, `--as-of` reproducibility was fundamentally broken. (Refined to exclusive upper bound in feedback-5.)
- **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
- **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments.
- **UNION ALL optimization fallback for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Added documentation of a `UNION ALL` split + dedup fallback pattern to use if EXPLAIN QUERY PLAN shows degradation during verification. Start with the simpler `OR` approach; switch only if needed.
- **UNION ALL optimization for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Originally documented as a fallback; promoted to default strategy in feedback-5 iteration. The UNION ALL + dedup approach ensures each index branch is used independently.
- **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round.
**From feedback-5 (ChatGPT review):**
- **Exclusive upper bound for `--as-of`**: Changed from `[since_ms, as_of_ms]` (inclusive) to `[since_ms, as_of_ms)` (exclusive). Half-open intervals are the standard convention in temporal systems — they eliminate edge-case ambiguity when events have timestamps exactly at the boundary. Also added `YYYY-MM-DD` → end-of-day UTC parsing and window metadata in robot output.
- **UNION ALL as default for dual-path matching**: Promoted from "fallback if planner regresses" to default strategy. SQLite `OR`-across-indexed-columns degradation is common enough that the predictable UNION ALL + dedup approach is the safer starting point. The simpler `OR` variant is retained as a comment for benchmarking.
- **Deterministic contribution ordering**: Within each signal type, sort contributions by `mr_id` before summing. This eliminates HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the overhead of compensated summation (Neumaier/Kahan was rejected as overkill at this scale).
- **Minimal bot/service-account filtering**: Added `excluded_usernames` (exact match, case-insensitive) to `ScoringConfig` and `--include-bots` CLI flag. Applied as a Rust-side post-filter (not SQL) to keep queries clean. Scope is deliberately minimal — no regex patterns, no heuristic detection. Users configure the list for their team's specific bots.
- **Performance baseline SLOs**: Added manual performance baseline step to verification — record timings for exact/prefix/suffix modes and flag >2x regressions. Kept lightweight (no CI gating, no synthetic benchmarks) to match the project's current maturity.
- **New tests**: `test_as_of_exclusive_upper_bound`, `test_excluded_usernames_filters_bots`, `test_include_bots_flag_disables_filtering`, `test_deterministic_accumulation_order` — cover the newly-accepted features.
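
The half-open window and the sorted accumulation described above can be sketched together as follows. This is a minimal illustration, not the code in `who.rs`: the `Contribution` struct, its field names, and the `deterministic_total` helper are hypothetical, and the real filtering happens in SQL rather than in Rust.

```rust
use std::collections::HashMap;

/// Hypothetical contribution record; the field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct Contribution {
    pub mr_id: i64,
    pub created_at_ms: i64,
    pub points: f64,
}

/// Half-open window check: `since_ms` is inclusive, `as_of_ms` is exclusive,
/// i.e. the interval `[since_ms, as_of_ms)`.
pub fn in_window(ts_ms: i64, since_ms: i64, as_of_ms: i64) -> bool {
    ts_ms >= since_ms && ts_ms < as_of_ms
}

/// Sum the in-window contributions in a fixed order (ascending `mr_id`) so
/// that HashMap iteration order cannot change the floating-point result.
pub fn deterministic_total(
    by_mr: &HashMap<i64, Contribution>,
    since_ms: i64,
    as_of_ms: i64,
) -> f64 {
    let mut kept: Vec<&Contribution> = by_mr
        .values()
        .filter(|c| in_window(c.created_at_ms, since_ms, as_of_ms))
        .collect();
    kept.sort_by_key(|c| c.mr_id);
    kept.iter().map(|c| c.points).sum()
}
```

An event stamped exactly at `as_of_ms` is excluded, which is the behavior `test_as_of_exclusive_upper_bound` is meant to pin down.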
**From feedback-6 (ChatGPT review):**
- **Centralized `mr_activity` CTE**: The state-aware timestamp CASE expression and `closed_mr_multiplier` were repeated across signals 3, 4a, 4b with the multiplier applied later in Rust via string-matching on `mr_state`. This was brittle — the CASE could drift between branches and the Rust-side string matching was unnecessary indirection. A single `mr_activity` CTE defines both `activity_ts` and `state_mult` once, scoped to matched MRs only (via JOIN with `matched_file_changes`). Signals 1 and 2 still compute `state_mult` inline because they join through `discussions`, not `matched_file_changes`.
- **Parameterized `reviewer_min_note_chars` and `closed_mr_multiplier`**: Previously `reviewer_min_note_chars` was inlined as a literal in the SQL string and `closed_mr_multiplier` was applied only in Rust. Binding both as SQL parameters (`?5` for `closed_mr_multiplier`, `?6` for `reviewer_min_note_chars`) eliminates statement-cache churn, ensures identical SQL text regardless of config values, and simplifies EXPLAIN QUERY PLAN analysis.
- **Tightened config validation**: Added upper bounds — `*_half_life_days <= 3650` (10-year safety cap), `reviewer_min_note_chars <= 4096`, and `closed_mr_multiplier` must be finite (not NaN/Inf). These prevent absurd configurations from silently producing nonsensical results.
- **Path canonicalization via `normalize_query_path()`**: Inputs like `./src//foo.rs` or whitespace-padded paths could fail path resolution even when the file exists in the database. A simple syntactic normalization (strip `./`, collapse `//`, trim whitespace, preserve trailing `/`) runs before `build_path_query()` to reduce false negatives. No filesystem or database lookups — purely string manipulation.
- **Probe-optimized `idx_notes_old_path_project_created` index**: The scoring-oriented `idx_notes_old_path_author` index has `author_username` as its second column, which is suboptimal for path resolution probes that don't constrain author. A dedicated probe index on `(position_old_path, project_id, created_at)` ensures `build_path_query()` and `suffix_probe()` old_path lookups are efficient.
- **New tests**: `test_path_normalization_handles_dot_and_double_slash`, `test_path_normalization_preserves_prefix_semantics`, `test_config_validation_rejects_absurd_half_life`, `test_config_validation_rejects_nan_multiplier` — cover the path canonicalization and tightened validation logic.
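
A purely syntactic normalization along the lines described for `normalize_query_path()` might look like the sketch below. The function name comes from the plan; the body is an assumption about one reasonable way to implement "strip `./`, collapse `//`, trim whitespace, preserve trailing `/`", and the actual implementation may differ in edge cases.

```rust
/// Sketch of a syntactic path normalizer. No filesystem or database access:
/// it only rewrites the string.
pub fn normalize_query_path(input: &str) -> String {
    // 1. Trim surrounding whitespace.
    let trimmed = input.trim();

    // 2. Collapse runs of '/' into a single '/'. A trailing '/' survives,
    //    so prefix (directory) queries keep their meaning.
    let mut collapsed = String::with_capacity(trimmed.len());
    let mut prev_slash = false;
    for ch in trimmed.chars() {
        if ch == '/' {
            if !prev_slash {
                collapsed.push('/');
            }
            prev_slash = true;
        } else {
            collapsed.push(ch);
            prev_slash = false;
        }
    }

    // 3. Strip any number of leading "./" segments.
    let mut rest = collapsed.as_str();
    while let Some(stripped) = rest.strip_prefix("./") {
        rest = stripped;
    }
    rest.to_string()
}
```

With this sketch, `./src//foo.rs` and `src/foo.rs` normalize to the same string, which is the property the validation step checks end to end.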
## Rejected Ideas (with rationale)
These suggestions were considered during review but explicitly excluded from this iteration:
@@ -573,3 +676,13 @@ These suggestions were considered during review but explicitly excluded from thi
- **Split scoring engine into core module** (feedback-4 #5): Proposed extracting scoring math from `who.rs` into `src/core/scoring/model_v2_decay.rs`. Premature modularization — `who.rs` is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.
- **Bot/service-account filtering** (feedback-4 #7): Real concern but orthogonal to time-decay scoring. This is a general data quality feature that belongs in its own issue — it affects all `who` modes, not just expert scoring. Adding `excluded_username_patterns` config and `--include-bots` flag is scope expansion that should be designed and tested independently.
- **Model compare mode / rank-delta diagnostics** (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The `--explain-score` + `--as-of` combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.
- **Canonical path identity graph** (feedback-5 #1, also feedback-2 #2, feedback-4 #4): Third time proposed, third time rejected. Building a rename graph from `mr_file_changes(old_path, new_path)` with identity resolution requires new schema (`path_identities`, `path_aliases` tables), ingestion pipeline changes, graph traversal at query time, and backfill logic for existing data. The UNION ALL dual-path matching already covers the 80%+ case (direct renames). Multi-hop rename chains (A→B→C) are rare in practice and can be addressed in v2 with real usage data showing the gap matters.
- **Normalized `expertise_events` table** (feedback-5 #2): Proposes shifting from query-time CTE joins to a precomputed `expertise_events` table populated at ingest time. While architecturally appealing for read performance, this doubles the data surface area (raw tables + derived events), requires new ingestion pipelines with incremental upsert logic, backfill tooling for existing databases, and introduces consistency risks when raw data is corrected/re-synced. The CTE approach is correct, maintainable, and performant at our current scale. If query latency becomes a real bottleneck (see performance baseline SLOs), materialized views or derived tables become a v2 optimization.
- **Reviewer engagement model upgrade** (feedback-5 #3): Proposes adding `approved`/`changes_requested` review-state signals and trivial-comment pattern matching (`["lgtm","+1","nit","ship it"]`). Expands the signal type count from 4 to 6 and adds a fragile pattern-matching layer (what about "don't ship it"? "lgtm but..."?). The `reviewer_min_note_chars` threshold is imperfect but pragmatic — it's a single configurable number with no false-positive risk from substring matching. Review-state signals may be worth adding later as a separate enhancement when we have data on how often they diverge from DiffNote participation.
- **Contribution-floor auto cutoff for `--since`** (feedback-5 #5): Proposes `--since auto` computing the earliest relevant timestamp from `min_contribution_floor` (e.g., 0.01 points). Adds a non-obvious config parameter for minimal benefit — the 24m default is already mathematically justified from the decay curves (author: 6%, reviewer: 0.4% at 2 years) and easily overridden with `--since` or `--all-history`. The auto-derivation formula (`ceil(max_half_life * log2(1/floor))`) is opaque to users who just want to understand why a certain time range was selected.
- **Full evidence drill-down in `--explain-score`** (feedback-5 #8): Proposes `--explain-score=summary|full` with per-MR evidence rows. Already rejected in feedback-2 #7. Component totals are sufficient for v1 debugging — they answer "which signal type drives this user's score." Per-MR drill-down requires additional SQL queries and significant output format complexity. Deferred unless component breakdowns prove insufficient.
- **Neumaier compensated summation** (feedback-5 #7 partial): Accepted the sorting aspect for deterministic ordering, but rejected Neumaier/Kahan compensated summation. At the scale of dozens to low hundreds of contributions per user, the rounding error from naive f64 summation is on the order of 1e-14 — several orders of magnitude below any meaningful score difference. Compensated summation adds code complexity and a maintenance burden for no practical benefit at this scale.
- **Automated CI benchmark gate** (feedback-5 #10 partial): Accepted manual performance baselines, but rejected automated CI regression gating with synthetic fixtures (100k/1M/5M notes). Building and maintaining benchmark infrastructure is a significant investment that's premature for a CLI tool with ~3 users. Manual timing checks during development are sufficient until performance becomes a real concern.
- **Epsilon-based tie buckets for ranking** (feedback-6 #4) — rejected because the plan already has deterministic contribution ordering by `mr_id` within each signal type, which eliminates HashMap-iteration nondeterminism. Platform-dependent `powf` differences at the scale of dozens to hundreds of contributions per user are sub-epsilon (order of 1e-15). If two users genuinely score within 1e-9 of each other, the existing tiebreak by `(last_seen DESC, username ASC)` is already meaningful and deterministic. Adding a bucketing layer introduces a magic epsilon constant and floor operation for a problem that doesn't manifest in practice.
- **`--diagnose-score` aggregated diagnostics flag** (feedback-6 #5) — rejected because this is diagnostic/debugging tooling that adds a new flag, new output format, and new counting logic (matched_notes_raw_count, dedup_count, window exclusions, etc.) across the SQL pipeline. The existing `--explain-score` component breakdown + manual EXPLAIN QUERY PLAN verification already covers the debugging need. The additional SQL instrumentation required (counting rows at each CTE stage) would complicate the query for a feature with unclear demand. A v2 addition if operational debugging becomes a recurring need.
- **Multi-path expert scoring (`--path` repeatable)** (feedback-6 #7) — rejected because this is a feature expansion, not a plan improvement for the time-decay model. Multi-path requires a `requested_paths` CTE, modified dedup logic keyed on `(username, signal, mr_id)` across paths, CLI parsing changes for repeatable `--path` and `--path-file`, and new test cases for overlap/prefix/dedup semantics. This is a separate bead/feature that should be designed independently — it's orthogonal to time-decay scoring and can be added later without requiring any changes to the decay model.
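
For readers who want to check the decay figures quoted in the `--since auto` rejection, the arithmetic can be sketched as below. The half-life values used in the usage note (180 days for authors, 90 days for reviewers) are assumptions chosen because they roughly reproduce the quoted 6% and 0.4% residual weights at two years; the function names are hypothetical.

```rust
/// Exponential half-life decay: the weight halves every `half_life_days`.
/// Negative elapsed time is clamped to zero, mirroring `elapsed.max(0.0)`.
pub fn decay_weight(elapsed_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(elapsed_days.max(0.0) / half_life_days)
}

/// The rejected auto-derivation, `ceil(max_half_life * log2(1/floor))`:
/// the number of days after which a contribution decays below `floor`.
pub fn auto_since_days(max_half_life_days: f64, floor: f64) -> f64 {
    (max_half_life_days * (1.0 / floor).log2()).ceil()
}
```

Under those assumed half-lives, `decay_weight(730.0, 180.0)` is about 0.06 and `decay_weight(730.0, 90.0)` is about 0.004, while `auto_since_days(180.0, 0.01)` evaluates to 1196 days, noticeably longer than the fixed 24-month default.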


@@ -0,0 +1,209 @@
No `## Rejected Recommendations` section was present, so these are all net-new improvements.
1. Keep core `lore` stable; isolate nightly to a TUI crate
Rationale: the current plan says “whole project nightly” but later assumes TUI is feature-gated. Isolating nightly removes unnecessary risk from non-TUI users, CI, and release cadence.
```diff
@@ 3.2 Nightly Rust Strategy
-- The entire gitlore project moves to pinned nightly, not just the TUI feature.
+- Keep core `lore` on stable Rust.
+- Add workspace member `lore-tui` pinned to nightly for FrankenTUI.
+- Ship `lore tui` only when `--features tui` (or separate `lore-tui` binary) is enabled.
@@ 10.1 New Files
+- crates/lore-tui/Cargo.toml
+- crates/lore-tui/src/main.rs
@@ 11. Assumptions
-17. TUI module is feature-gated.
+17. TUI is isolated in a workspace crate and feature-gated in root CLI integration.
```
2. Add a framework adapter boundary from day 1
Rationale: the “3-day ratatui escape hatch” is optimistic without a strict interface. A tiny `UiRuntime` + screen renderer trait makes fallback real, not aspirational.
```diff
@@ 4. Architecture
+### 4.9 UI Runtime Abstraction
+Introduce `UiRuntime` trait (`run`, `send`, `subscribe`) and `ScreenRenderer` trait.
+FrankenTUI implementation is default; ratatui adapter can be dropped in with no state/action rewrite.
@@ 3.5 Escape Hatch
-- The migration cost to ratatui is ~3 days
+- Migration cost target is ~3-5 days, validated by one ratatui spike screen in Phase 1.
```
3. Stop using CLI command modules as the TUI query API
Rationale: coupling TUI to CLI output-era structs creates long-term friction and accidental regressions. Create a shared domain query layer used by both CLI and TUI.
```diff
@@ 10.20 Refactor: Extract Query Functions
-- extract query_* from cli/commands/*
+- introduce `src/domain/query/*` as the canonical read model API.
+- CLI and TUI both depend on domain query layer.
+- CLI modules retain formatting/output only.
@@ 10.2 Modified Files
+- src/domain/query/mod.rs
+- src/domain/query/issues.rs
+- src/domain/query/mrs.rs
+- src/domain/query/search.rs
+- src/domain/query/who.rs
```
4. Replace single `Arc<Mutex<Connection>>` with connection manager
Rationale: one locked connection serializes everything and hurts responsiveness, especially during sync. Use separate read pool + writer connection with WAL and busy timeout.
```diff
@@ 4.4 App — Implementing the Model Trait
- pub db: Arc<Mutex<Connection>>,
+ pub db: Arc<DbManager>, // read pool + single writer coordination
@@ 4.5 Async Action System
- Each Cmd::task closure locks the mutex, runs the query, and returns a Msg
+ Reads use pooled read-only connections.
+ Sync/write path uses dedicated writer connection.
+ Enforce WAL, busy_timeout, and retry policy for SQLITE_BUSY.
```
5. Make debouncing/cancellation explicit and correct
Rationale: “runtime coalesces rapid keypresses” is not a safe correctness guarantee. Add request IDs and stale-response dropping to prevent flicker and wrong data.
```diff
@@ 4.3 Core Types (Msg)
+ SearchRequestStarted { request_id: u64, query: String }
- SearchExecuted(SearchResults),
+ SearchExecuted { request_id: u64, results: SearchResults },
@@ 4.4 maybe_debounced_query()
- runtime coalesces rapid keypresses
+ use explicit 200ms debounce timer + monotonic request_id
+ ignore results whose request_id != current_search_request_id
```
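
The stale-response rule in item 5 can be illustrated with a small state holder. This is a sketch under assumptions: `SearchState`, `begin_request`, and `apply` are hypothetical names, results are plain strings rather than `SearchResults`, and the 200ms debounce timer itself is left out.

```rust
/// Hypothetical search state that tracks the most recent request id.
#[derive(Debug, Default)]
pub struct SearchState {
    current_request_id: u64,
    pub results: Vec<String>,
}

impl SearchState {
    /// Start a new search and return the monotonic id to attach to the task.
    pub fn begin_request(&mut self) -> u64 {
        self.current_request_id += 1;
        self.current_request_id
    }

    /// Apply results only if they belong to the latest request.
    /// Returns false (and leaves state untouched) for stale responses.
    pub fn apply(&mut self, request_id: u64, results: Vec<String>) -> bool {
        if request_id != self.current_request_id {
            return false;
        }
        self.results = results;
        true
    }
}
```

If two searches are started back to back, a late response for the first one is dropped instead of overwriting the results of the second.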
6. Implement true streaming sync, not batch-at-end pseudo-streaming
Rationale: the plan promises real-time logs/progress, but the code currently returns one completion message. This gap will disappoint users and complicate cancellation.
```diff
@@ 4.4 start_sync_task()
- Pragmatic approach: run sync synchronously, collect all progress events, return summary.
+ Use event channel subscription for `SyncProgress`/`SyncLogLine` streaming.
+ Keep `SyncCompleted` only as terminal event.
+ Add cooperative cancel token mapped to `Esc` while running.
@@ 5.9 Sync
+ Add "Resume from checkpoint" option for interrupted syncs.
```
7. Fix entity identity ambiguity across projects
Rationale: using `iid` alone is unsafe in multi-project datasets. Navigation and cross-refs should key by `(project_id, iid)` or global ID.
```diff
@@ 4.3 Core Types
- IssueDetail(i64)
- MrDetail(i64)
+ IssueDetail(EntityKey)
+ MrDetail(EntityKey)
+ pub struct EntityKey { pub project_id: i64, pub iid: i64, pub kind: EntityKind }
@@ 10.12.4 Cross-Reference Widget
- parse "group/project#123" -> iid only
+ parse into `{project_path, iid, kind}` then resolve to `project_id` before navigation
```
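
A parser for the cross-reference form in item 7 could be sketched as follows. The `CrossRef` struct and `parse_cross_ref` function are hypothetical; the sketch assumes GitLab-style references where `#` marks an issue and `!` marks a merge request, and it leaves the `project_path` to `project_id` lookup out because that needs the database.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityKind {
    Issue,
    MergeRequest,
}

/// Parsed but not yet resolved reference (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CrossRef {
    pub project_path: String,
    pub iid: i64,
    pub kind: EntityKind,
}

/// Parse "group/project#123" or "group/project!45".
/// Returns None when the project path or the numeric iid is missing.
pub fn parse_cross_ref(text: &str) -> Option<CrossRef> {
    let idx = text.rfind(|c| c == '#' || c == '!')?;
    let (path, rest) = text.split_at(idx);
    let kind = if rest.starts_with('#') {
        EntityKind::Issue
    } else {
        EntityKind::MergeRequest
    };
    let digits = &rest[1..]; // skip the one-byte '#' or '!' sigil
    if path.is_empty() || digits.is_empty() || !digits.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    let iid: i64 = digits.parse().ok()?;
    Some(CrossRef { project_path: path.to_string(), iid, kind })
}
```

Keeping the project path in the parsed value is what lets navigation resolve to `(project_id, iid)` instead of guessing from `iid` alone.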
8. Resolve keybinding conflicts and formalize keymap precedence
Rationale: the current spec contains conflicting bindings (`Tab` is assigned to both sort and focus filter; `gg` collides with the go-prefix). A deterministic keymap contract prevents UX bugs.
```diff
@@ 8.2 List Screens
- Tab | Cycle sort column
- f | Focus filter bar
+ Tab | Focus filter bar
+ S | Cycle sort column
+ / | Focus filter bar (alias)
@@ 4.4 interpret_key()
+ Add explicit precedence table:
+ 1) modal/palette
+ 2) focused input
+ 3) global
+ 4) screen-local
+ Add configurable go-prefix timeout (default 500ms) with cancel feedback.
```
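
The precedence table in item 8 reduces to a first-match rule. The sketch below assumes a simplified `KeyContext` with four booleans; the real `interpret_key()` would consult actual keymaps, and all names here are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyLayer {
    Modal,
    FocusedInput,
    Global,
    ScreenLocal,
}

/// Simplified view of what could handle the current key press.
#[derive(Debug, Clone, Copy, Default)]
pub struct KeyContext {
    pub modal_open: bool,
    pub input_focused: bool,
    pub global_has_binding: bool,
    pub screen_has_binding: bool,
}

/// Pick the layer that handles the key, in the documented order:
/// 1) modal/palette, 2) focused input, 3) global, 4) screen-local.
pub fn resolve_layer(ctx: KeyContext) -> Option<KeyLayer> {
    if ctx.modal_open {
        Some(KeyLayer::Modal)
    } else if ctx.input_focused {
        Some(KeyLayer::FocusedInput)
    } else if ctx.global_has_binding {
        Some(KeyLayer::Global)
    } else if ctx.screen_has_binding {
        Some(KeyLayer::ScreenLocal)
    } else {
        None
    }
}
```

Encoding the order in one function makes the contract testable: a key typed into a focused filter bar never triggers a global shortcut.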
9. Add performance SLOs and DB/index plan
Rationale: “fast enough” is vague. Add measurable budgets, required indexes, and query-plan gates in CI for predictable performance.
```diff
@@ 3.1 Risk Matrix
+ Add risk: "Query latency regressions on large datasets"
@@ 9.3 Phase 0 — Toolchain Gate
+7. p95 list query latency < 75ms on 100k issues synthetic fixture
+8. p95 search latency < 200ms on 1M docs (lexical mode)
@@ 11. Assumptions
-5. SQLite queries are fast enough for interactive use (<50ms for filtered results).
+5. Performance budgets are enforced by benchmark fixtures and query-plan checks.
+6. Required indexes documented and migration-backed before TUI GA.
```
10. Add reliability/observability model (error classes, retries, tracing)
Rationale: one string toast is not enough for production debugging. Add typed errors, retry policy, and an in-TUI diagnostics pane.
```diff
@@ 4.3 Core Types (Msg)
- Error(String),
+ Error(AppError),
+ pub enum AppError {
+ DbBusy, DbCorruption, NetworkRateLimited, NetworkUnavailable,
+ AuthFailed, ParseError, Internal(String)
+ }
@@ 5.11 Doctor / Stats
+ Add "Diagnostics" tab:
+ - last 100 errors
+ - retry counts
+ - current sync/backoff state
+ - DB contention metrics
```
11. Add “Saved Views + Watchlist” as high-value product features
Rationale: this makes the TUI compelling daily, not just navigable. Users can persist filters and monitor critical slices (e.g., “P1 auth issues updated in last 24h”).
```diff
@@ 1. Executive Summary
+ - Saved Views (named filters and layouts)
+ - Watchlist panel (tracked queries with delta badges)
@@ 5. Screen Taxonomy
+### 5.12 Saved Views / Watchlist
+Persistent named filters for Issues/MRs/Search.
+Dashboard shows per-watchlist deltas since last session.
@@ 6. User Flows
+### 6.9 Flow: "Run morning watchlist triage"
+Dashboard -> Watchlist -> filtered IssueList/MRList -> detail drilldown
```
12. Strengthen testing plan with deterministic behavior and chaos cases
Rationale: snapshot tests alone won't catch race/staleness/cancellation issues. Add concurrency, cancellation, and flaky terminal behavior tests.
```diff
@@ 9.2 Phases
+Phase 5.5 Reliability Test Pack (2d)
+ - stale response drop tests
+ - sync cancel/resume tests
+ - SQLITE_BUSY retry tests
+ - resize storm and rapid key-chord tests
@@ 10.9 Snapshot Test Example
+ Add non-snapshot tests:
+ - property tests for navigation invariants
+ - integration tests for request ordering correctness
+ - benchmark tests for query budgets
```
If you want, I can produce a consolidated “PRD v2.1 patch” with all of the above merged into one coherent updated document structure.


@@ -0,0 +1,214 @@
I found 9 high-impact revisions that materially improve correctness, robustness, and usability without reintroducing anything in `## Rejected Recommendations`.
### 1. Prevent stale async overwrites on **all** screens (not just search)
Right now, only `SearchExecuted` is generation-guarded. `IssueListLoaded`, `MrListLoaded`, `IssueDetailLoaded`, etc. can still race and overwrite newer state after rapid navigation/filtering. This is the biggest correctness risk in the current design.
```diff
diff --git a/PRD.md b/PRD.md
@@ message.rs
- IssueListLoaded(Vec<IssueRow>),
+ IssueListLoaded { generation: u64, rows: Vec<IssueRow> },
@@
- MrListLoaded(Vec<MrRow>),
+ MrListLoaded { generation: u64, rows: Vec<MrRow> },
@@
- IssueDetailLoaded { key: EntityKey, detail: IssueDetail },
- MrDetailLoaded { key: EntityKey, detail: MrDetail },
+ IssueDetailLoaded { generation: u64, key: EntityKey, detail: IssueDetail },
+ MrDetailLoaded { generation: u64, key: EntityKey, detail: MrDetail },
@@ update()
- Msg::IssueListLoaded(result) => {
+ Msg::IssueListLoaded { generation, rows } => {
+ if !self.task_supervisor.is_current(&TaskKey::LoadScreen(Screen::IssueList), generation) {
+ return Cmd::none();
+ }
self.state.set_loading(false);
- self.state.issue_list.set_result(result);
+ self.state.issue_list.set_result(rows);
Cmd::none()
}
```
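
The `is_current` check used in the diff above can be backed by a per-key generation counter. The sketch below is one possible shape, not the PRD's actual `TaskSupervisor`: the `submit` method and the two-variant `Screen` enum are assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Screen {
    IssueList,
    MrList,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TaskKey {
    LoadScreen(Screen),
}

/// Tracks the newest generation issued for each task key.
#[derive(Debug, Default)]
pub struct TaskSupervisor {
    latest: HashMap<TaskKey, u64>,
}

impl TaskSupervisor {
    /// Register a new task for `key`; any earlier generation becomes stale.
    pub fn submit(&mut self, key: TaskKey) -> u64 {
        let generation = self.latest.entry(key).or_insert(0);
        *generation += 1;
        *generation
    }

    /// True only for the most recently submitted generation of `key`.
    pub fn is_current(&self, key: &TaskKey, generation: u64) -> bool {
        self.latest.get(key) == Some(&generation)
    }
}
```

Because generations are tracked per key, a slow issue-list load cannot be invalidated by, or overwrite, an unrelated MR-list load.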
### 2. Make cancellation safe with task-owned SQLite interrupt handles
The plan mentions `sqlite3_interrupt()` but uses pooled shared reader connections. Interrupting a shared connection can cancel unrelated work. Use per-task reader leases and store `InterruptHandle` in `TaskHandle`.
```diff
diff --git a/PRD.md b/PRD.md
@@ DbManager
- readers: Vec<Mutex<Connection>>,
+ readers: Vec<Mutex<Connection>>,
+ // task-scoped interrupt handles prevent cross-task cancellation bleed
+ // each dispatched query receives an owned ReaderLease
+pub struct ReaderLease {
+ conn: Connection,
+ interrupt: rusqlite::InterruptHandle,
+}
+
+impl DbManager {
+ pub fn lease_reader(&self) -> Result<ReaderLease, LoreError> { ... }
+}
@@ TaskHandle
pub struct TaskHandle {
pub key: TaskKey,
pub generation: u64,
pub cancel: Arc<CancelToken>,
+ pub interrupt: Option<rusqlite::InterruptHandle>,
}
@@ cancellation
-Query interruption: ... fires sqlite3_interrupt() on the connection.
+Query interruption: cancel triggers the task's owned InterruptHandle only.
+No shared-connection interrupt is permitted.
```
### 3. Harden keyset pagination for multi-project and sort changes
An `updated_at + iid` cursor is not enough when rows share timestamps across projects or when the sort mode changes. Either case can duplicate or skip rows.
```diff
diff --git a/PRD.md b/PRD.md
@@ issue_list.rs
-pub struct IssueCursor {
- pub updated_at: i64,
- pub iid: i64,
-}
+pub struct IssueCursor {
+ pub sort_field: SortField,
+ pub sort_order: SortOrder,
+ pub updated_at: Option<i64>,
+ pub created_at: Option<i64>,
+ pub iid: i64,
+ pub project_id: i64, // deterministic tie-breaker
+ pub filter_hash: u64, // invalidates stale cursors on filter mutation
+}
@@ pagination section
-Windowed keyset pagination ...
+Windowed keyset pagination uses deterministic tuple ordering:
+`ORDER BY <primary_sort>, project_id, iid`.
+Cursor is rejected if `filter_hash` or sort tuple mismatches current query.
```
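
Cursor rejection on filter or sort mismatch might be checked as in the sketch below. The `IssueCursor` fields follow the diff above; `filter_hash`, `cursor_is_valid`, and the two-variant enums are hypothetical, and hashing the raw filter text with the standard library's `DefaultHasher` is an assumption that only suits in-memory cursors, since its output is not guaranteed stable across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SortField {
    UpdatedAt,
    CreatedAt,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum SortOrder {
    Asc,
    Desc,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IssueCursor {
    pub sort_field: SortField,
    pub sort_order: SortOrder,
    pub updated_at: Option<i64>,
    pub created_at: Option<i64>,
    pub iid: i64,
    pub project_id: i64, // deterministic tie-breaker
    pub filter_hash: u64, // invalidates stale cursors on filter mutation
}

/// Hash the active filter text (in-memory use only).
pub fn filter_hash(filter_text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    filter_text.hash(&mut hasher);
    hasher.finish()
}

/// A cursor is usable only if it was produced under the same sort and filter.
pub fn cursor_is_valid(
    cursor: &IssueCursor,
    sort_field: SortField,
    sort_order: SortOrder,
    current_filter: &str,
) -> bool {
    cursor.sort_field == sort_field
        && cursor.sort_order == sort_order
        && cursor.filter_hash == filter_hash(current_filter)
}
```

When validation fails, the simplest recovery is to discard the cursor and reload the first window under the new query.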
### 4. Replace ad-hoc filter parsing with a small typed DSL
The current `split_whitespace()` parser is brittle and silently lossy. Add quoted values, negation, and strict parse errors.
```diff
diff --git a/PRD.md b/PRD.md
@@ filter_bar.rs
- fn parse_tokens(&mut self) {
- let text = self.input.value().to_string();
- self.tokens = text.split_whitespace().map(|chunk| { ... }).collect();
- }
+ fn parse_tokens(&mut self) {
+ // grammar (v1):
+ // term := [ "-" ] (field ":" value | quoted_text | bare_text)
+ // value := quoted | unquoted
+ // examples:
+ // state:opened label:"P1 blocker" -author:bot since:14d
+ self.tokens = filter_dsl::parse(self.input.value())?;
+ }
@@ section 8 / keybindings-help
+Filter parser surfaces actionable inline diagnostics with cursor position,
+and never silently drops unknown fields.
```
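
The v1 grammar in the diff above is small enough to parse by hand. The sketch below is one possible reading of it, not the proposed `filter_dsl` module: `Term`, `ParseError`, and `parse` are hypothetical, positions are character indices, escape sequences inside quotes are not supported, and field names are not validated against a known list.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Term {
    Field { negated: bool, field: String, value: String },
    Text { negated: bool, text: String },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseError {
    pub position: usize, // character index where the problem starts
    pub message: String,
}

/// Read a quoted or unquoted value starting at `*i`, advancing the index.
fn read_value(chars: &[char], i: &mut usize) -> Result<String, ParseError> {
    if *i < chars.len() && chars[*i] == '"' {
        let open = *i;
        *i += 1;
        let start = *i;
        while *i < chars.len() && chars[*i] != '"' {
            *i += 1;
        }
        if *i >= chars.len() {
            return Err(ParseError { position: open, message: "unterminated quote".into() });
        }
        let value: String = chars[start..*i].iter().collect();
        *i += 1; // consume the closing quote
        Ok(value)
    } else {
        let start = *i;
        while *i < chars.len() && !chars[*i].is_whitespace() {
            *i += 1;
        }
        Ok(chars[start..*i].iter().collect())
    }
}

/// term := [ "-" ] (field ":" value | quoted_text | bare_text)
pub fn parse(input: &str) -> Result<Vec<Term>, ParseError> {
    let chars: Vec<char> = input.chars().collect();
    let mut i = 0;
    let mut terms = Vec::new();
    while i < chars.len() {
        if chars[i].is_whitespace() {
            i += 1;
            continue;
        }
        let term_start = i;
        let negated = chars[i] == '-';
        if negated {
            i += 1;
        }
        // Look ahead for `field:` unless the term starts with a quote.
        let mut field = None;
        if i < chars.len() && chars[i] != '"' {
            let mut j = i;
            while j < chars.len() && !chars[j].is_whitespace() && chars[j] != ':' && chars[j] != '"' {
                j += 1;
            }
            if j < chars.len() && chars[j] == ':' && j > i {
                field = Some(chars[i..j].iter().collect::<String>());
                i = j + 1;
            }
        }
        let value = read_value(&chars, &mut i)?;
        if value.is_empty() {
            return Err(ParseError { position: term_start, message: "missing value".into() });
        }
        terms.push(match field {
            Some(field) => Term::Field { negated, field, value },
            None => Term::Text { negated, text: value },
        });
    }
    Ok(terms)
}
```

Returning a position with every error is what would let the filter bar render an inline diagnostic under the offending character instead of silently dropping the term.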
### 5. Add render caches for markdown/tree shaping
Markdown and tree shaping are currently recomputed on every frame in several snippets. Cache render artifacts by `(entity, width, theme, content_hash)` to protect frame time.
```diff
diff --git a/PRD.md b/PRD.md
@@ module structure
+ render_cache.rs # Width/theme/content-hash keyed cache for markdown + tree layouts
@@ Assumptions / Performance
+Detail and search preview rendering uses memoized render artifacts.
+Cache invalidation triggers: content hash change, terminal width change, theme change.
```
### 6. Use one-shot timers for debounce/prefix timeout
`Every` is periodic; it wakes repeatedly and can produce edge-case repeated firings. One-shot subscriptions are cleaner and cheaper.
```diff
diff --git a/PRD.md b/PRD.md
@@ subscriptions()
- if self.state.search.debounce_pending() {
- subs.push(Box::new(
- Every::with_id(3, Duration::from_millis(200), move || {
- Msg::SearchDebounceFired { generation }
- })
- ));
- }
+ if self.state.search.debounce_pending() {
+ subs.push(Box::new(
+ After::with_id(3, Duration::from_millis(200), move || {
+ Msg::SearchDebounceFired { generation }
+ })
+ ));
+ }
@@ InputMode GoPrefix timeout
-The tick subscription compares clock instant...
+GoPrefix timeout is a one-shot `After(500ms)` tied to prefix generation.
```
### 7. New feature: list “Quick Peek” panel (`Space`) for triage speed
This adds immediate value without v2-level scope. Users can inspect selected issue/MR metadata/snippet without entering detail and coming back.
```diff
diff --git a/PRD.md b/PRD.md
@@ 5.2 Issue List
-Interaction: Enter detail
+Interaction: Enter detail, Space quick-peek (toggle right preview pane)
@@ 5.4 MR List
+Quick Peek mode mirrors Issue List: metadata + first discussion snippet + cross-refs.
@@ 8.2 List Screens
| `Enter` | Open selected item |
+| `Space` | Toggle Quick Peek panel for selected row |
```
### 8. Upgrade compatibility handshake from integer to machine-readable contract
Single integer compat is too coarse for real drift detection. Keep it simple but structured.
```diff
diff --git a/PRD.md b/PRD.md
@@ Nightly Rust Strategy / Compatibility contract
- 1. Binary compat version (`lore-tui --compat-version`) — integer check ...
+ 1. Binary compat contract (`lore-tui --compat-json`) — JSON:
+ `{ "protocol": 1, "compat_version": 2, "min_schema": 14, "max_schema": 16, "build": "..." }`
+ `lore` validates protocol + compat + schema range before spawn.
@@ CLI integration
-fn validate_tui_compat(...) { ... --compat-version ... }
+fn validate_tui_compat(...) { ... --compat-json ... }
```
### 9. Fix sync stream bug and formalize progress coalescing
The current snippet calls `try_send` for progress twice in one callback path, and the queue-depth math is wrong. Progress spam should also be coalesced by lane.
```diff
diff --git a/PRD.md b/PRD.md
@@ start_sync_task()
- let current_depth = 2048 - tx.try_send(Msg::SyncProgress(event.clone()))
- .err().map_or(0, |_| 1);
- max_queue_depth = max_queue_depth.max(current_depth);
- if tx.try_send(Msg::SyncProgress(event.clone())).is_err() {
+ // coalesce by lane key at <=30Hz; one send attempt per flush
+ coalescer.update(event.clone());
+ if let Some(batch) = coalescer.flush_ready() {
+ if tx.try_send(Msg::SyncProgressBatch(batch)).is_err() {
dropped_count += 1;
let _ = tx.try_send(Msg::SyncBackpressureDrop);
+ } else {
+ max_queue_depth = max_queue_depth.max(observed_queue_depth());
+ }
}
```
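
A lane-keyed coalescer matching the `update` / `flush_ready` calls in the diff above could be sketched like this. It is an illustration under assumptions: `ProgressEvent` and its fields are hypothetical, and unlike the diff, `flush_ready` takes an explicit `now_ms` argument so the rate limit can be exercised without a real clock.

```rust
use std::collections::BTreeMap;

/// Hypothetical progress event; `lane` identifies the sync stage.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ProgressEvent {
    pub lane: String,
    pub done: u64,
    pub total: u64,
}

/// Keeps only the newest event per lane and releases batches at a capped rate.
pub struct Coalescer {
    min_interval_ms: u64,
    last_flush_ms: Option<u64>,
    pending: BTreeMap<String, ProgressEvent>,
}

impl Coalescer {
    pub fn new(max_hz: u64) -> Self {
        Coalescer {
            min_interval_ms: 1000 / max_hz.max(1),
            last_flush_ms: None,
            pending: BTreeMap::new(),
        }
    }

    /// Replace any pending event for the same lane with the newer one.
    pub fn update(&mut self, event: ProgressEvent) {
        self.pending.insert(event.lane.clone(), event);
    }

    /// Return the pending batch if the rate limit allows a flush at `now_ms`.
    pub fn flush_ready(&mut self, now_ms: u64) -> Option<Vec<ProgressEvent>> {
        if self.pending.is_empty() {
            return None;
        }
        if let Some(last) = self.last_flush_ms {
            if now_ms.saturating_sub(last) < self.min_interval_ms {
                return None;
            }
        }
        self.last_flush_ms = Some(now_ms);
        Some(std::mem::take(&mut self.pending).into_values().collect())
    }
}
```

With `Coalescer::new(30)` the minimum gap between flushes is 33ms, so a burst of per-item callbacks produces at most one channel send per lane per frame-sized interval.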
---
If you want, I can produce a single consolidated patch-style rewrite of Sections `4.x`, `5.2/5.4`, `8.2`, `9.3`, and `10.x` so you can drop it directly into iteration 10.


@@ -0,0 +1,177 @@
I reviewed the full PRD and avoided everything listed under `## Rejected Recommendations`.
These are the highest-impact revisions I'd make.
1. Stable list pagination via snapshot fences
Why this improves the plan: your keyset cursor is deterministic for sort/filter, but still vulnerable to duplicates/skips if sync writes land between page fetches. Add a per-browse snapshot fence so one browse session sees a stable dataset.
Tradeoff: newest rows are hidden until refresh, which is correct for deterministic triage.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 5.2 Issue List
- **Pagination:** Windowed keyset pagination with explicit cursor state.
+ **Pagination:** Windowed keyset pagination with explicit cursor state.
+ **Snapshot fence:** On list entry, capture `snapshot_upper_updated_at` (ms) and pin all
+ list-page queries to `updated_at <= snapshot_upper_updated_at`. This guarantees no duplicate
+ or skipped rows during scrolling even if sync writes occur concurrently.
+ A "new data available" badge appears when a newer sync completes; `r` refreshes the fence.
@@ 5.4 MR List
- **Pagination:** Same windowed keyset pagination strategy as Issue List.
+ **Pagination:** Same strategy plus snapshot fence (`updated_at <= snapshot_upper_updated_at`)
+ for deterministic cross-page traversal under concurrent sync writes.
@@ 4.7 Navigation Stack Implementation
+ Browsing sessions carry a per-screen `BrowseSnapshot` token to preserve stable ordering
+ until explicit refresh or screen re-entry.
```
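
The effect of the fence predicate can be shown with an in-memory stand-in for the list query. This sketch assumes a descending `updated_at` sort with `(project_id, iid)` as tie-breakers; `Row`, `BrowseSnapshot`, and `next_page` are hypothetical, and the real implementation would express the same conditions in SQL.

```rust
use std::cmp::Reverse;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub project_id: i64,
    pub iid: i64,
    pub updated_at: i64,
}

/// Captured once on list entry; all page fetches are pinned to it.
pub struct BrowseSnapshot {
    pub snapshot_upper_updated_at: i64,
}

/// Ordering tuple: newest first, then deterministic tie-breakers.
fn sort_key(r: &Row) -> (Reverse<i64>, i64, i64) {
    (Reverse(r.updated_at), r.project_id, r.iid)
}

/// Return up to `limit` rows that sort strictly after `after`,
/// ignoring anything newer than the snapshot fence.
pub fn next_page(
    rows: &[Row],
    snap: &BrowseSnapshot,
    after: Option<&Row>,
    limit: usize,
) -> Vec<Row> {
    let mut visible: Vec<Row> = rows
        .iter()
        .filter(|r| r.updated_at <= snap.snapshot_upper_updated_at)
        .filter(|r| after.map_or(true, |c| sort_key(r) > sort_key(c)))
        .cloned()
        .collect();
    visible.sort_by_key(sort_key);
    visible.truncate(limit);
    visible
}
```

A row inserted by a concurrent sync with a newer `updated_at` stays invisible to the browse session until the fence is refreshed, so it cannot shift page boundaries mid-scroll.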
2. Query budgets and soft deadlines
Why this improves the plan: currently “slow query” is handled mostly by cancellation and stale-drop. Add explicit latency budgets so UI responsiveness stays predictable under worst-case filters.
Tradeoff: the user sometimes gets partial/truncated results first, followed by full results on retry/refine.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 4.5 Async Action System
+ #### 4.5.2 Query Budgets and Soft Deadlines
+ Each query type gets a budget:
+ - list window fetch: 120ms target, 250ms hard deadline
+ - detail phase-1 metadata: 75ms target, 150ms hard deadline
+ - search lexical/hybrid: 250ms hard deadline
+ On hard deadline breach, return `QueryDegraded { truncated: true }` and show inline badge:
+ "results truncated; refine filter or press r to retry full".
+ Implementation uses SQLite progress handler + per-task interrupt deadline.
@@ 9.3 Phase 0 — Toolchain Gate
+ 26. Query deadline behavior validated: hard deadline cancels query and renders degraded badge
+ without blocking input loop.
```
3. Targeted cache invalidation and prewarm after sync
Why this improves the plan: `invalidate_all()` after sync throws away hot detail cache and hurts the exact post-sync workflow you optimized for. Invalidate only changed keys and prewarm likely-next entities.
Tradeoff: slightly more bookkeeping in sync result handling.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 4.1 Module Structure
- entity_cache.rs # Bounded LRU cache ... Invalidated on sync completion.
+ entity_cache.rs # Bounded LRU cache with selective invalidation by changed EntityKey
+ # and optional post-sync prewarm of top changed entities.
@@ 4.4 App — Implementing the Model Trait (Msg::SyncCompleted)
- // Invalidate entity cache — synced data may have changed.
- self.entity_cache.invalidate_all();
+ // Selective invalidation: evict only changed entities from sync delta.
+ self.entity_cache.invalidate_keys(&result.changed_entity_keys);
+ // Prewarm top N changed/new entities for immediate post-sync triage.
+ self.enqueue_cache_prewarm(&result.changed_entity_keys);
```
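To make the difference from `invalidate_all()` concrete, here is a minimal sketch of selective invalidation. It assumes an `EntityKey` of project id, IID, and kind, and uses a plain `HashMap` in place of the bounded LRU; `EntityCache` and its methods are illustrative, not the plan's actual types.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum EntityKind {
    Issue,
    MergeRequest,
}

/// Assumed key shape: the same IID in two projects must not collide.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct EntityKey {
    pub project_id: i64,
    pub iid: i64,
    pub kind: EntityKind,
}

/// Simplified stand-in for `entity_cache.rs` (no LRU bound in this sketch).
pub struct EntityCache<V> {
    entries: HashMap<EntityKey, V>,
}

impl<V> EntityCache<V> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    pub fn insert(&mut self, key: EntityKey, value: V) {
        self.entries.insert(key, value);
    }

    pub fn get(&self, key: &EntityKey) -> Option<&V> {
        self.entries.get(key)
    }

    /// Evicts only the keys reported as changed; returns how many were present.
    pub fn invalidate_keys(&mut self, changed: &[EntityKey]) -> usize {
        changed
            .iter()
            .filter(|k| self.entries.remove(*k).is_some())
            .count()
    }
}
```

Entries for entities the sync did not touch stay warm, which is the point of the change. Prewarming would then re-fetch a bounded number of the evicted keys in the background.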
4. Exact “what changed” navigation without new DB tables
Why this improves the plan: your summary currently uses a timestamp filter, which can include unrelated updates and miss edge cases. Keep an in-memory delta ledger per sync run and navigate by exact IDs.
Tradeoff: small memory overhead per run; no schema migration required.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 5.9 Sync (Summary mode)
-- `i` navigates to Issue List pre-filtered to "since last sync" (using `sync_status.last_completed_at` timestamp comparison)
-- `m` navigates to MR List pre-filtered to "since last sync" (using `sync_status.last_completed_at` timestamp comparison)
+- `i` navigates to Issue List filtered by exact issue IDs changed in this sync run
+- `m` navigates to MR List filtered by exact MR IDs changed in this sync run
+ (fallback to timestamp filter only if run delta not available)
@@ 10.1 New Files
+crates/lore-tui/src/sync_delta_ledger.rs # In-memory per-run exact changed/new IDs (issues/MRs/discussions)
```
5. Adaptive render governor (runtime performance safety)
Why this improves the plan: capability detection is static; you also need dynamic adaptation when frame time/backpressure worsens (SSH, tmux nesting, huge logs).
Tradeoff: visual richness may step down automatically under load.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 3.4.1 Capability-Adaptive Rendering
+#### 3.4.2 Adaptive Render Governor
+Runtime monitors frame time and stream pressure:
+- if frame p95 > 40ms or sync drops spike, switch to lighter profile:
+ plain markdown, reduced tree guides, slower spinner tick, less frequent repaint.
+- when stable for N seconds, restore previous profile.
+CLI override:
+`lore tui --render-profile=auto|quality|balanced|speed`
@@ 9.3 Phase 0 — Toolchain Gate
+27. Frame-time governor validated: under induced load, UI remains responsive and input latency
+stays within p95 < 75ms while auto-downgrading render profile.
```
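One possible shape for the governor, as a sketch under stated assumptions: it tracks a sliding window of frame times, computes a nearest-rank p95, and steps down when p95 exceeds the threshold. Only two profiles are modeled, and "stable for N seconds" is approximated as a count of consecutive healthy frames. `RenderGovernor` and its fields are hypothetical.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RenderProfile {
    Quality,
    Speed,
}

pub struct RenderGovernor {
    samples_ms: VecDeque<u32>, // sliding window of recent frame times
    window: usize,
    threshold_ms: u32,    // 40 in the proposal
    stable_needed: usize, // healthy frames required before restoring
    stable_run: usize,
    profile: RenderProfile,
}

impl RenderGovernor {
    pub fn new(window: usize, threshold_ms: u32, stable_needed: usize) -> Self {
        Self {
            samples_ms: VecDeque::new(),
            window: window.max(1),
            threshold_ms,
            stable_needed,
            stable_run: 0,
            profile: RenderProfile::Quality,
        }
    }

    /// Nearest-rank p95 over the current window (window is never empty here).
    fn p95(&self) -> u32 {
        let mut sorted: Vec<u32> = self.samples_ms.iter().copied().collect();
        sorted.sort_unstable();
        let rank = (sorted.len() * 95 + 99) / 100; // ceil(0.95 * n)
        sorted[rank.saturating_sub(1)]
    }

    /// Records one frame time and returns the profile to use for the next frame.
    pub fn record_frame(&mut self, frame_ms: u32) -> RenderProfile {
        if self.samples_ms.len() == self.window {
            self.samples_ms.pop_front();
        }
        self.samples_ms.push_back(frame_ms);

        if self.p95() > self.threshold_ms {
            self.profile = RenderProfile::Speed;
            self.stable_run = 0;
        } else if self.profile == RenderProfile::Speed {
            self.stable_run += 1;
            if self.stable_run >= self.stable_needed {
                self.profile = RenderProfile::Quality;
            }
        }
        self.profile
    }
}
```

The hysteresis (a separate `stable_needed` count) is what keeps the profile from flapping when frame times hover near the threshold.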
6. First-run/data-not-ready screen (not an init wizard)
Why this improves the plan: empty DB or missing indexes will otherwise feel broken. A dedicated read-only readiness screen improves first impression and self-recovery.
Tradeoff: one extra lightweight screen/state.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 4.3 Core Types (Screen enum)
Sync,
Stats,
Doctor,
+ Bootstrap,
@@ 5.11 Doctor / Stats (Info Screens)
+### 5.12 Bootstrap (Data Readiness)
+Shown when no synced projects/documents are present or required indexes are missing.
+Displays concise readiness checks and exact CLI commands to recover:
+`lore sync`, `lore migrate`, `lore --robot doctor`.
+Read-only; no auto-execution.
```
7. Global project scope pinning across screens
Why this improves the plan: users repeatedly apply the same project filter across dashboard/list/search/timeline/who. Add a global scope pin to reduce repetitive filtering and speed triage.
Tradeoff: must show clear “scope active” indicator to avoid confusion.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 4.1 Module Structure
+ scope.rs # Global project scope context (all-projects or pinned project set)
@@ 8.1 Global (Available Everywhere)
+| `P` | Open project scope picker / toggle global scope pin |
@@ 4.10 State Module — Complete
+pub global_scope: ScopeContext, // Applies to dashboard/list/search/timeline/who queries
@@ 10.11 Action Module — Query Bridge
- pub fn fetch_issues(conn: &Connection, filter: &IssueFilter) -> Result<Vec<IssueListRow>, LoreError>
+ pub fn fetch_issues(conn: &Connection, scope: &ScopeContext, filter: &IssueFilter) -> Result<Vec<IssueListRow>, LoreError>
```
8. Concurrency correctness tests for pagination and cancellation races
Why this improves the plan: the current reliability tests are good, but they lack a direct test for duplicate/skip behavior under concurrent sync writes while paginating.
Tradeoff: additional integration test complexity.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 9.2 Phases (Phase 5.5 — Reliability Test Pack)
+ Concurrent pagination/write race tests :p55j, after p55h, 1d
+ Query deadline cancellation race tests :p55k, after p55j, 0.5d
@@ 9.3 Phase 0 — Toolchain Gate
+28. Concurrent pagination/write test proves no duplicates/skips within a pinned browse snapshot.
+29. Cancellation race test proves no cross-task interrupt bleed and no stuck loading state.
```
9. URL opening policy v2: allowlisted GitLab entity paths
Why this improves the plan: host validation is necessary but not always sufficient. Restrict default browser opens to known GitLab entity paths and require confirmation for unusual paths on same host.
Tradeoff: occasional extra prompt for uncommon but valid URLs.
```diff
diff --git a/docs/plans/gitlore-tui-prd-v2.md b/docs/plans/gitlore-tui-prd-v2.md
@@ 3.1 Risk Matrix
-| Malicious URL in entity data opened in browser | Medium | Low | URL host validated against configured GitLab instance before `open`/`xdg-open` |
+| Malicious URL in entity data opened in browser | Medium | Low | Validate scheme+host+port and path pattern allowlist (`/-/issues/`, `/-/merge_requests/`, project issue/MR routes). Unknown same-host paths require explicit confirm modal. |
@@ 10.4.1 Terminal Safety — Untrusted Text Sanitization
- pub fn is_safe_url(url: &str, allowed_origins: &[AllowedOrigin]) -> bool
+ pub fn classify_safe_url(url: &str, policy: &UrlPolicy) -> UrlSafety
+ // UrlSafety::{AllowedEntityPath, AllowedButUnrecognizedPath, Blocked}
```
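The three-way classification could look roughly like the following. This is a deliberately conservative string-prefix sketch, not a full URL parser: `UrlPolicy` is reduced to a single hypothetical `allowed_origin` field (scheme, host, and optional port, with no trailing slash), and only the two `/-/` path patterns from the risk matrix are recognized. A production version would more likely build on a real URL-parsing library.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum UrlSafety {
    AllowedEntityPath,
    AllowedButUnrecognizedPath,
    Blocked,
}

/// Hypothetical policy: one exact origin such as "https://gitlab.example.com".
pub struct UrlPolicy {
    pub allowed_origin: String,
}

pub fn classify_safe_url(url: &str, policy: &UrlPolicy) -> UrlSafety {
    // Control characters and whitespace have no place in a URL we will open.
    if url.chars().any(|c| c.is_control() || c.is_whitespace()) {
        return UrlSafety::Blocked;
    }
    // Scheme + host + port must match exactly (case-sensitive, by design).
    let Some(rest) = url.strip_prefix(policy.allowed_origin.as_str()) else {
        return UrlSafety::Blocked;
    };
    // The origin must end right here. This rejects look-alikes such as
    // "https://gitlab.example.com.evil.test/" or "...example.com@evil.test/".
    if !(rest.is_empty() || rest.starts_with('/')) {
        return UrlSafety::Blocked;
    }
    // Ignore query string and fragment when matching the path allowlist.
    let path = rest.split(|c| c == '?' || c == '#').next().unwrap_or("");
    if path.contains("/-/issues/") || path.contains("/-/merge_requests/") {
        UrlSafety::AllowedEntityPath
    } else {
        UrlSafety::AllowedButUnrecognizedPath
    }
}
```

The caller would open `AllowedEntityPath` directly, show the confirm modal for `AllowedButUnrecognizedPath`, and refuse `Blocked`.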
These 9 changes are additive, avoid previously rejected ideas, and materially improve determinism, responsiveness, post-sync usefulness, and safety without forcing a big architecture reset.


@@ -0,0 +1,203 @@
I excluded the two items in your `## Rejected Recommendations` and focused on net-new improvements.
These are the highest-impact revisions I'd make.
### 1. Fix the package graph now (avoid a hard Cargo cycle)
Your current plan has `root -> optional lore-tui` and `lore-tui -> lore (root)`, which creates a cyclic dependency risk. Split shared logic into a dedicated core crate so CLI and TUI both depend downward.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 9.1 Dependency Changes
-[workspace]
-members = [".", "crates/lore-tui"]
+[workspace]
+members = [".", "crates/lore-core", "crates/lore-tui"]
@@
-[dependencies]
-lore-tui = { path = "crates/lore-tui", optional = true }
+[dependencies]
+lore-core = { path = "crates/lore-core" }
+lore-tui = { path = "crates/lore-tui", optional = true }
@@ # crates/lore-tui/Cargo.toml
-lore = { path = "../.." } # Core lore library
+lore-core = { path = "../lore-core" } # Shared domain/query crate (acyclic graph)
```
### 2. Stop coupling TUI to `cli/commands/*` internals
Calling CLI command modules from TUI is brittle and will drift. Introduce a shared query/service layer with DTOs owned by core.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 4.1 Module Structure
- action.rs # Async action runners (DB queries, GitLab calls)
+ action.rs # Task dispatch only
+ service/
+ mod.rs
+ query.rs # Shared read services (CLI + TUI)
+ sync.rs # Shared sync orchestration facade
+ dto.rs # UI-agnostic data contracts
@@ ## 10.2 Modified Files
-src/cli/commands/list.rs # Extract query_issues(), query_mrs() as pub fns
-src/cli/commands/show.rs # Extract query_issue_detail(), query_mr_detail() as pub fns
-src/cli/commands/who.rs # Extract query_experts(), etc. as pub fns
-src/cli/commands/search.rs # Extract run_search_query() as pub fn
+crates/lore-core/src/query/issues.rs # Canonical issue queries
+crates/lore-core/src/query/mrs.rs # Canonical MR queries
+crates/lore-core/src/query/show.rs # Canonical detail queries
+crates/lore-core/src/query/who.rs # Canonical people queries
+crates/lore-core/src/query/search.rs # Canonical search queries
+src/cli/commands/*.rs # Consume lore-core query services
+crates/lore-tui/src/action.rs # Consume lore-core query services
```
### 3. Add a real task supervisor (dedupe + cancellation + priority)
Right now tasks are ad hoc and can overrun each other. Add a scheduler keyed by screen+intent.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 4.5 Async Action System
-The `Cmd::task(|| { ... })` pattern runs a blocking closure on a background thread pool.
+The TUI uses a `TaskSupervisor`:
+- Keyed tasks (`TaskKey`) to dedupe redundant requests
+- Priority lanes (`Input`, `Navigation`, `Background`)
+- Cooperative cancellation tokens per task
+- Late-result drop via generation IDs (not just search)
@@ ## 4.3 Core Types
+pub enum TaskKey {
+ LoadScreen(Screen),
+ Search { generation: u64 },
+ SyncStream,
+}
```
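The dedupe and late-result-drop parts of the supervisor can be sketched with a generation counter per key. This is a simplified illustration: priority lanes and cancellation tokens are omitted, `Screen` is replaced by a `String` name, and the generation lives in the supervisor rather than inside `TaskKey::Search` as in the diff above.

```rust
use std::collections::HashMap;

/// Simplified key; the proposal uses `LoadScreen(Screen)` and
/// `Search { generation }`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum TaskKey {
    LoadScreen(String),
    Search,
    SyncStream,
}

#[derive(Default)]
pub struct TaskSupervisor {
    latest: HashMap<TaskKey, u64>, // newest generation per key
    next_generation: u64,
}

impl TaskSupervisor {
    /// Registers a new task for `key`, superseding any earlier task with the
    /// same key. The returned generation travels with the task's result.
    pub fn begin(&mut self, key: TaskKey) -> u64 {
        self.next_generation += 1;
        self.latest.insert(key, self.next_generation);
        self.next_generation
    }

    /// True only if `generation` is the most recent task started for `key`.
    pub fn is_current(&self, key: &TaskKey, generation: u64) -> bool {
        self.latest.get(key) == Some(&generation)
    }

    /// Accepts a result if it is current; stale results return false and
    /// should be dropped by the caller.
    pub fn complete(&mut self, key: &TaskKey, generation: u64) -> bool {
        if self.is_current(key, generation) {
            self.latest.remove(key);
            true
        } else {
            false
        }
    }
}
```

In the update loop, a message such as a loaded detail or search result would carry its generation, and the handler would ignore it when `complete` returns `false`.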
### 4. Correct sync streaming architecture (current sketch loses streamed events)
The sample creates `tx/rx` then drops `rx`; events never reach update loop. Define an explicit stream subscription with bounded queue and backpressure policy.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 4.4 App — Implementing the Model Trait
- let (tx, _rx) = std::sync::mpsc::channel::<Msg>();
+ let (tx, rx) = std::sync::mpsc::sync_channel::<Msg>(1024);
+ // rx is registered via Subscription::from_receiver("sync-stream", rx)
@@
- let result = crate::ingestion::orchestrator::run_sync(
+ let result = crate::ingestion::orchestrator::run_sync(
&config,
&conn,
|event| {
@@
- let _ = tx.send(Msg::SyncProgress(event.clone()));
- let _ = tx.send(Msg::SyncLogLine(format!("{event:?}")));
+ if tx.try_send(Msg::SyncProgress(event.clone())).is_err() {
+ let _ = tx.try_send(Msg::SyncBackpressureDrop);
+ }
+ let _ = tx.try_send(Msg::SyncLogLine(format!("{event:?}")));
},
);
```
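One detail worth noting about the diff above: when the bounded queue is full, the `Msg::SyncBackpressureDrop` marker is sent with `try_send` into the same full queue, so the marker itself may be lost. A possible alternative, sketched here with a hypothetical `ProgressSink` wrapper, is to count drops on the producer side and report the count later. The `Msg` enum is a reduced stand-in for the real one.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Reduced stand-in for the TUI message type.
#[derive(Debug, PartialEq)]
pub enum Msg {
    SyncProgress(u32),
    SyncLogLine(String),
}

/// Producer-side wrapper that never blocks the sync thread.
pub struct ProgressSink {
    tx: SyncSender<Msg>,
    pub dropped: u64, // number of messages that did not fit in the queue
}

impl ProgressSink {
    pub fn bounded(capacity: usize) -> (Self, Receiver<Msg>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, dropped: 0 }, rx)
    }

    /// Tries to enqueue `msg`; on a full or disconnected queue the message is
    /// discarded and the drop counter is incremented.
    pub fn emit(&mut self, msg: Msg) -> bool {
        match self.tx.try_send(msg) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped += 1;
                false
            }
        }
    }
}
```

The final sync result could then include `dropped`, so the UI can show how many progress or log messages were skipped without depending on queue space at the moment of the drop.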
### 5. Upgrade data-plane performance plan (keyset pagination + index contracts)
Virtualized list without keyset paging still forces expensive scans. Add explicit keyset pagination and query-plan CI checks.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 9.3 Phase 0 — Toolchain Gate
-7. p95 list query latency < 75ms on synthetic fixture (10k issues, 5k MRs)
+7. p95 list page fetch latency < 75ms using keyset pagination (10k issues, 5k MRs)
+8. EXPLAIN QUERY PLAN must show index usage for top 10 TUI queries
+9. No full table scan on issues/MRs/discussions under default filters
@@
-8. p95 search latency < 200ms on synthetic fixture (50k documents, lexical mode)
+10. p95 search latency < 200ms on synthetic fixture (50k documents, lexical mode)
+## 9.4 Required Indexes (GA blocker)
+- `issues(project_id, state, updated_at DESC, iid DESC)`
+- `merge_requests(project_id, state, updated_at DESC, iid DESC)`
+- `discussions(project_id, entity_type, entity_iid, created_at DESC)`
+- `notes(discussion_id, created_at ASC)`
```
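Keyset pagination is easiest to see with the cursor logic isolated from the database. The sketch below pages over an in-memory slice that is already sorted by `(updated_at DESC, iid DESC)`, matching the column order of the proposed indexes. `Row`, `Cursor`, and `next_page` are illustrative names; the SQL in the comment shows the assumed query shape, not the project's actual query.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct Row {
    pub updated_at: i64,
    pub iid: i64,
}

/// Sort key of the last row on the previous page.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cursor {
    pub updated_at: i64,
    pub iid: i64,
}

/// Returns the next page after `after`, plus the cursor for the page after it.
///
/// Assumed SQL equivalent:
///   WHERE (updated_at, iid) < (?1, ?2)
///   ORDER BY updated_at DESC, iid DESC
///   LIMIT ?3
pub fn next_page(
    rows_desc: &[Row],
    after: Option<Cursor>,
    limit: usize,
) -> (Vec<Row>, Option<Cursor>) {
    let page: Vec<Row> = rows_desc
        .iter()
        .filter(|r| match after {
            None => true,
            // Tuple comparison is lexicographic: updated_at first, then iid.
            Some(c) => (r.updated_at, r.iid) < (c.updated_at, c.iid),
        })
        .take(limit)
        .cloned()
        .collect();
    let cursor = page.last().map(|r| Cursor {
        updated_at: r.updated_at,
        iid: r.iid,
    });
    (page, cursor)
}
```

Because the cursor is a value rather than an offset, rows with equal `updated_at` are disambiguated by `iid`, and each page fetch seeks directly to its start instead of scanning skipped rows.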
### 6. Enforce `EntityKey` everywhere (remove bare IID paths)
You correctly identified multi-project IID collisions, but many message/state signatures still use `i64`. Make `EntityKey` mandatory in all navigation and detail loaders.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 4.3 Core Types
- IssueSelected(i64),
+ IssueSelected(EntityKey),
@@
- MrSelected(i64),
+ MrSelected(EntityKey),
@@
- IssueDetailLoaded(IssueDetail),
+ IssueDetailLoaded { key: EntityKey, detail: IssueDetail },
@@
- MrDetailLoaded(MrDetail),
+ MrDetailLoaded { key: EntityKey, detail: MrDetail },
@@ ## 10.10 State Module — Complete
- Cmd::msg(Msg::NavigateTo(Screen::IssueDetail(iid)))
+ Cmd::msg(Msg::NavigateTo(Screen::IssueDetail(entity_key)))
```
### 7. Harden filter/search semantics (strict parser + inline diagnostics + explain scores)
The current filter parser silently ignores unknown fields, which hides user mistakes. Add strict parse diagnostics and search score explainability.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 10.12.1 Filter Bar Widget
- _ => {} // Unknown fields silently ignored
+ _ => self.errors.push(format!("Unknown filter field: {}", token.field))
+ pub errors: Vec<String>, // inline parse/validation errors
+ pub warnings: Vec<String>, // non-fatal coercions
@@ ## 5.6 Search
-- **Live preview:** Selected result shows snippet + metadata in right pane
+- **Live preview:** Selected result shows snippet + metadata in right pane
+- **Explain score:** Optional breakdown (lexical, semantic, recency, boosts) for trust/debug
```
### 8. Add operational resilience: safe mode + panic report + startup fallback
TUI failures should degrade gracefully, not block usage.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 3.1 Risk Matrix
+| Runtime panic leaves user blocked | High | Medium | Panic hook writes crash report, restores terminal, offers fallback CLI command |
@@ ## 10.3 Entry Point
+pub fn launch_tui(config: Config, db_path: &Path) -> Result<(), LoreError> {
+ install_panic_hook_for_tui(); // terminal restore + crash dump path
+ ...
+}
@@ ## 8.1 Global (Available Everywhere)
+| `:` | Show fallback equivalent CLI command for current screen/action |
```
### 9. Add a “jump list” (forward/back navigation, not only stack pop)
Current model has only push/pop and reset. Add browser-like history for investigation workflows.
```diff
diff --git a/PRD.md b/PRD.md
@@ ## 4.7 Navigation Stack Implementation
pub struct NavigationStack {
- stack: Vec<Screen>,
+ back_stack: Vec<Screen>,
+ current: Screen,
+ forward_stack: Vec<Screen>,
+ jump_list: Vec<Screen>, // recent entity/detail hops
}
@@ ## 8.1 Global (Available Everywhere)
+| `Ctrl+o` | Jump backward in jump list |
+| `Ctrl+i` | Jump forward in jump list |
```
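The back/forward part of this proposal follows the usual browser-history model, sketched below. The struct is generic over the screen type so the example stays self-contained; `NavigationHistory` and its method names are assumptions, and the separate `jump_list` field from the diff is left out.

```rust
pub struct NavigationHistory<S> {
    back_stack: Vec<S>,
    current: S,
    forward_stack: Vec<S>,
}

impl<S: PartialEq> NavigationHistory<S> {
    pub fn new(start: S) -> Self {
        Self {
            back_stack: Vec::new(),
            current: start,
            forward_stack: Vec::new(),
        }
    }

    pub fn current(&self) -> &S {
        &self.current
    }

    /// Moves to a new screen. Any forward history is discarded, as in a browser.
    pub fn navigate_to(&mut self, screen: S) {
        if screen == self.current {
            return; // navigating to the current screen is a no-op
        }
        let previous = std::mem::replace(&mut self.current, screen);
        self.back_stack.push(previous);
        self.forward_stack.clear();
    }

    /// Steps back one screen; returns false if there is nothing to go back to.
    pub fn back(&mut self) -> bool {
        match self.back_stack.pop() {
            Some(previous) => {
                let left = std::mem::replace(&mut self.current, previous);
                self.forward_stack.push(left);
                true
            }
            None => false,
        }
    }

    /// Steps forward one screen; returns false if no forward history exists.
    pub fn forward(&mut self) -> bool {
        match self.forward_stack.pop() {
            Some(next) => {
                let left = std::mem::replace(&mut self.current, next);
                self.back_stack.push(left);
                true
            }
            None => false,
        }
    }
}
```

With this shape, the existing "pop" behavior maps to `back()`, and the two new keybindings map to `back()` and `forward()`.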
If you want, I can produce a single consolidated “PRD v2.1” patch that applies all nine revisions coherently section-by-section.


@@ -0,0 +1,163 @@
I excluded everything already listed in `## Rejected Recommendations`.
These are the highest-impact net-new revisions I'd make.
1. **Enforce Entity Identity Consistency End-to-End (P0)**
Analysis: The PRD defines `EntityKey`, but many code paths still pass bare `iid` (`IssueSelected(item.iid)`, timeline refs, search refs). In multi-project datasets this will cause wrong-entity navigation and subtle data corruption in cached state. Make `EntityKey` mandatory in every navigation message and add compile-time constructors.
```diff
@@ 4.3 Core Types
pub struct EntityKey {
pub project_id: i64,
pub iid: i64,
pub kind: EntityKind,
}
+impl EntityKey {
+ pub fn issue(project_id: i64, iid: i64) -> Self { Self { project_id, iid, kind: EntityKind::Issue } }
+ pub fn mr(project_id: i64, iid: i64) -> Self { Self { project_id, iid, kind: EntityKind::MergeRequest } }
+}
@@ 10.10 state/issue_list.rs
- .map(|item| Msg::IssueSelected(item.iid))
+ .map(|item| Msg::IssueSelected(EntityKey::issue(item.project_id, item.iid)))
@@ 10.10 state/mr_list.rs
- .map(|item| Msg::MrSelected(item.iid))
+ .map(|item| Msg::MrSelected(EntityKey::mr(item.project_id, item.iid)))
```
2. **Make TaskSupervisor Mandatory for All Background Work (P0)**
Analysis: The plan introduces `TaskSupervisor` but still dispatches many direct `Cmd::task` calls. That will reintroduce stale updates, duplicate queries, and priority inversion under rapid input. Centralize all background task creation through one spawn path that enforces dedupe, cancellation tokening, and generation checks.
```diff
@@ 4.5.1 Task Supervisor (Dedup + Cancellation + Priority)
-The supervisor is owned by `LoreApp` and consulted before dispatching any `Cmd::task`.
+The supervisor is owned by `LoreApp` and is the ONLY allowed path for background work.
+All task launches use `LoreApp::spawn_task(TaskKey, TaskPriority, closure)`.
@@ 4.4 App — Implementing the Model Trait
- Cmd::task(move || { ... })
+ self.spawn_task(TaskKey::LoadScreen(screen.clone()), TaskPriority::Navigation, move |token| { ... })
```
3. **Remove the Sync Streaming TODO and Make Real-Time Streaming a GA Gate (P0)**
Analysis: Current text admits sync progress is buffered with a TODO. That undercuts one of the main value props. Make streaming progress/log delivery non-optional, with bounded buffers and dropped-line accounting.
```diff
@@ 4.4 start_sync_task()
- // TODO: Register rx as subscription when FrankenTUI supports it.
- // For now, the task returns the final Msg and progress is buffered.
+ // Register rx as a live subscription (`Subscription::from_receiver` adapter).
+ // Progress and logs must render in real time (no batch-at-end fallback).
+ // Keep a bounded ring buffer (N=5000) and surface `dropped_log_lines` in UI.
@@ 9.3 Phase 0 — Toolchain Gate
+11. Real-time sync stream verified: progress updates visible during run, not only at completion.
```
4. **Upgrade List/Search Data Strategy to Windowed Keyset + Prefetch (P0)**
Analysis: “Virtualized list” alone does not solve query/transfer cost if full result sets are loaded. Move to fixed-size keyset windows with next-window prefetch and fast first paint; this keeps latency predictable on 100k+ records.
```diff
@@ 5.2 Issue List
- Pagination: Virtual scrolling for large result sets
+ Pagination: Windowed keyset pagination (window=200 rows) with background prefetch of next window.
+ First paint uses current window only; no full-result materialization.
@@ 5.4 MR List
+ Same windowed keyset pagination strategy as Issue List.
@@ 9.3 Success criteria
- 7. p95 list page fetch latency < 75ms using keyset pagination on synthetic fixture (10k issues, 5k MRs)
+ 7. p95 first-paint latency < 50ms and p95 next-window fetch < 75ms on synthetic fixture (100k issues, 50k MRs)
```
5. **Add Resumable Sync Checkpoints + Per-Project Fault Isolation (P1)**
Analysis: If sync is interrupted or one project fails, the current design mostly falls back to cancel/fail. Add checkpoints so long runs can resume, and isolate failures to project/resource scope while the other lanes continue.
```diff
@@ 3.1 Risk Matrix
+| Interrupted sync loses progress | High | Medium | Persist phase checkpoints and offer resume |
@@ 5.9 Sync
+Running mode: failed project/resource lanes are marked degraded while other lanes continue.
+Summary mode: offer `[R]esume interrupted sync` from last checkpoint.
@@ 11 Assumptions
-16. No new SQLite tables needed (but required indexes must be verified — see Performance SLOs).
+16. Add minimal internal tables for reliability: `sync_runs` and `sync_checkpoints` (append-only metadata).
```
6. **Add Capability-Adaptive Rendering Modes (P1)**
Analysis: Terminal compatibility is currently test-focused, but runtime adaptation is under-specified. Add explicit degradations for no-truecolor, no-unicode, slow SSH/tmux paths to reduce rendering artifacts and support incidents.
```diff
@@ 3.4 Terminal Compatibility Testing
+Add capability matrix validation: truecolor/256/16 color, unicode/ascii glyphs, alt-screen on/off.
@@ 10.19 CLI Integration
+Tui {
+ #[arg(long, default_value="auto")] render_mode: String, // auto|full|minimal
+ #[arg(long)] ascii: bool,
+ #[arg(long)] no_alt_screen: bool,
+}
```
7. **Harden Browser/Open and Log Privacy (P1)**
Analysis: `open_current_in_browser` currently trusts stored URLs; sync logs may expose tokens/emails from upstream messages. Add host allowlisting and redaction pipeline by default.
```diff
@@ 4.4 open_current_in_browser()
- if let Some(url) = url { ... open ... }
+ if let Some(url) = url {
+ if !self.state.security.is_allowed_gitlab_url(&url) {
+ self.state.set_error("Blocked non-GitLab URL".into());
+ return;
+ }
+ ... open ...
+ }
@@ 5.9 Sync
+Log stream passes through redaction (tokens, auth headers, email local-parts) before render/storage.
```
8. **Add “My Workbench” Screen for Daily Pull (P1, new feature)**
Analysis: The PRD is strong on exploration, weaker on “what should I do now?”. Add a focused operator screen aggregating assigned issues, requested reviews, unresolved threads mentioning me, and stale approvals. This makes the TUI habit-forming.
```diff
@@ 5. Screen Taxonomy
+### 5.12 My Workbench
+Single-screen triage cockpit:
+- Assigned-to-me open issues/MRs
+- Review requests awaiting action
+- Threads mentioning me and unresolved
+- Recently stale approvals / blocked MRs
@@ 8.1 Global
+| `gb` | Go to My Workbench |
@@ 9.2 Phases
+section Phase 3.5 — Daily Workflow
+My Workbench screen + queries :p35a, after p3d, 2d
```
9. **Add Rollout, SLO Telemetry, and Kill-Switch Plan (P0)**
Analysis: You have implementation phases but no production rollout control. Add explicit experiment flags, health telemetry, and rollback criteria so risk is operationally bounded.
```diff
@@ Table of Contents
-11. [Assumptions](#11-assumptions)
+11. [Assumptions](#11-assumptions)
+12. [Rollout & Telemetry](#12-rollout--telemetry)
@@ NEW SECTION 12
+## 12. Rollout & Telemetry
+- Feature flags: `tui_experimental`, `tui_sync_streaming`, `tui_workbench`
+- Metrics: startup_ms, frame_render_p95_ms, db_busy_rate, panic_free_sessions, sync_drop_events
+- Kill-switch: disable `tui` feature path at runtime if panic rate > 0.5% sessions over 24h
+- Canary rollout: internal only -> opt-in beta -> default-on
```
10. **Strengthen Reliability Pack with Event-Fuzz + Soak Tests (P0)**
Analysis: Current tests are good but still light on prolonged event pressure. Add deterministic fuzzed key/resize/paste streams and a long soak to catch rare deadlocks/leaks and state corruption.
```diff
@@ 9.2 Phase 5.5 — Reliability Test Pack
+Event fuzz tests (key/resize/paste interleavings) :p55g, after p55e, 1d
+30-minute soak test (no panic, bounded memory) :p55h, after p55g, 1d
@@ 9.3 Success criteria
+12. Event-fuzz suite passes with zero invariant violations across 10k randomized traces.
+13. 30-minute soak: no panic, no deadlock, memory growth < 5%.
```
If you want, I can produce a single consolidated unified diff of the full PRD text next (all edits merged, ready to apply as v3).


@@ -0,0 +1,157 @@
Below are my strongest revisions, focused on correctness, reliability, and long-term maintainability, while avoiding all items in your `## Rejected Recommendations`.
1. **Fix the Cargo/toolchain architecture (current plan has a real dependency-cycle risk and shaky per-member toolchain behavior).**
Analysis: The current plan has `lore -> lore-tui (optional)` and `lore-tui -> lore`, which creates a package cycle when `tui` is enabled. Also, per-member `rust-toolchain.toml` in a workspace is easy to misapply in CI/dev workflows. The cleanest robust shape is: `lore-tui` is a separate binary crate (nightly), `lore` remains stable and delegates at runtime (`lore tui` shells out to `lore-tui`).
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 3.2 Nightly Rust Strategy
-- The `lore` binary integrates TUI via `lore tui` subcommand. The `lore-tui` crate is a library dependency feature-gated in the root.
+- `lore-tui` is a separate binary crate built on pinned nightly.
+- `lore` (stable) does not compile-link `lore-tui`; `lore tui` delegates by spawning `lore-tui`.
+- This removes Cargo dependency-cycle risk and keeps stable builds nightly-free.
@@ 9.1 Dependency Changes
-[features]
-tui = ["dep:lore-tui"]
-[dependencies]
-lore-tui = { path = "crates/lore-tui", optional = true }
+[dependencies]
+# no compile-time dependency on lore-tui from lore
+# runtime delegation keeps toolchains isolated
@@ 10.19 CLI Integration
-Add Tui match arm that directly calls crate::tui::launch_tui(...)
+Add Tui match arm that resolves and spawns `lore-tui` with passthrough args.
+If missing, print actionable install/build command.
```
2. **Make `TaskSupervisor` the *actual* single async path (remove contradictory direct `Cmd::task` usage in state handlers).**
Analysis: You declare “direct `Cmd::task` is prohibited outside supervisor,” but later `handle_screen_msg` still launches tasks directly. That contradiction will reintroduce stale-result bugs and race conditions. Make state handlers pure (intent-only); all async launch/cancel/dedup goes through one supervised API.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 4.5.1 Task Supervisor
-The supervisor is the ONLY allowed path for background work.
+The supervisor is the ONLY allowed path for background work, enforced by architecture:
+`AppState` emits intents only; `LoreApp::update` launches tasks via `spawn_task(...)`.
@@ 10.10 State Module — Complete
-pub fn handle_screen_msg(..., db: &Arc<Mutex<Connection>>) -> Cmd<Msg>
+pub fn handle_screen_msg(...) -> ScreenIntent
+// no DB access, no Cmd::task in state layer
```
3. **Enforce `EntityKey` everywhere (remove raw IID navigation paths).**
Analysis: Multi-project identity is one of your strongest ideas, but multiple snippets still navigate by bare IID (`document_id`, `EntityRef::Issue(i64)`). That can misroute across projects and create silent correctness bugs. Make all navigation-bearing results carry `EntityKey` end-to-end.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 4.3 Core Types
-pub enum EntityRef { Issue(i64), MergeRequest(i64) }
+pub enum EntityRef { Issue(EntityKey), MergeRequest(EntityKey) }
@@ 10.10 state/search.rs
-Some(Msg::NavigateTo(Screen::IssueDetail(r.document_id)))
+Some(Msg::NavigateTo(Screen::IssueDetail(r.entity_key.clone())))
@@ 10.11 action.rs
-pub fn fetch_issue_detail(conn: &Connection, iid: i64) -> Result<IssueDetail, LoreError>
+pub fn fetch_issue_detail(conn: &Connection, key: &EntityKey) -> Result<IssueDetail, LoreError>
```
4. **Introduce a shared query boundary inside the existing crate (not a new crate) to decouple TUI from CLI presentation structs.**
Analysis: Reusing CLI command modules directly is fast initially, but it ties TUI to output-layer types and command concerns. A minimal internal `core::query::*` module gives a stable data contract used by both CLI and TUI without the overhead of a new crate split.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 10.2 Modified Files
-src/cli/commands/list.rs # extract query_issues/query_mrs as pub
-src/cli/commands/show.rs # extract query_issue_detail/query_mr_detail as pub
+src/core/query/mod.rs
+src/core/query/issues.rs
+src/core/query/mrs.rs
+src/core/query/detail.rs
+src/core/query/search.rs
+src/core/query/who.rs
+src/cli/commands/* now call core::query::* + format output
+TUI action.rs calls core::query::* directly
```
5. **Add terminal-safety sanitization for untrusted text (ANSI/OSC injection hardening).**
Analysis: Issue/MR bodies, notes, and logs are untrusted text in a terminal context. Without sanitization, terminal escape/control sequences can spoof UI or trigger unintended behavior. Add explicit sanitization and a strict URL policy before rendering/opening.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 3.1 Risk Matrix
+| Terminal escape/control-sequence injection via issue/note text | High | Medium | Strip ANSI/OSC/control chars before render; escape markdown output; allowlist URL scheme+host |
@@ 4.1 Module Structure
+ safety.rs # sanitize_for_terminal(), safe_url_policy()
@@ 10.5/10.8/10.14/10.16
+All user-sourced text passes through `sanitize_for_terminal()` before widget rendering.
+Disable markdown raw HTML and clickable links unless URL policy passes.
```
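A minimal sketch of what `sanitize_for_terminal()` might do, assuming the goal is to drop CSI and OSC escape sequences plus other control characters while keeping newlines and tabs. The exact policy (for example, whether to keep `\r` or handle further escape families) is an assumption here, not something the PRD specifies.

```rust
/// Removes terminal escape sequences and control characters from untrusted text.
/// Keeps '\n' and '\t'; everything else that is a control character is dropped.
pub fn sanitize_for_terminal(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '\u{1b}' => match chars.peek().copied() {
                // CSI: ESC [ parameters... ending at a final byte in 0x40..=0x7E.
                Some('[') => {
                    chars.next();
                    for p in chars.by_ref() {
                        if ('\u{40}'..='\u{7e}').contains(&p) {
                            break;
                        }
                    }
                }
                // OSC: ESC ] payload... terminated by BEL or by ESC \ (ST).
                Some(']') => {
                    chars.next();
                    while let Some(p) = chars.next() {
                        if p == '\u{07}' {
                            break;
                        }
                        if p == '\u{1b}' && chars.peek() == Some(&'\\') {
                            chars.next();
                            break;
                        }
                    }
                }
                // Any other escape: drop the ESC itself and keep scanning.
                _ => {}
            },
            '\n' | '\t' => out.push(c),
            c if c.is_control() => {}
            c => out.push(c),
        }
    }
    out
}
```

An unterminated sequence consumes the rest of the input, which errs on the side of hiding text rather than letting a partial escape reach the terminal.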
6. **Move resumable sync checkpoints into v1 (lightweight version).**
Analysis: You already identify interruption risk as real. Deferring resumability to post-v1 leaves a major reliability gap in exactly the heaviest workflow. A lightweight checkpoint table (resource cursor + updated-at watermark) gives large reliability gain with modest complexity.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 3.1 Risk Matrix
-- Resumable checkpoints planned for post-v1
+Resumable checkpoints included in v1 (lightweight cursors per project/resource lane)
@@ 9.3 Success Criteria
+14. Interrupt-and-resume test: sync resumes from checkpoint and reaches completion without full restart.
@@ 9.3.1 Required Indexes (GA Blocker)
+CREATE TABLE IF NOT EXISTS sync_checkpoints (
+ project_id INTEGER NOT NULL,
+ lane TEXT NOT NULL,
+ cursor TEXT,
+ updated_at_ms INTEGER NOT NULL,
+ PRIMARY KEY (project_id, lane)
+);
```
7. **Strengthen performance gates with tiered fixtures and memory ceilings.**
Analysis: Current thresholds are good, but fixture sizes are too close to mid-scale only. Add S/M/L fixtures and memory budget checks so regressions appear before real-world datasets hit them. This gives much more confidence in long-term scalability.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 9.3 Phase 0 — Toolchain Gate
-7. p95 first-paint latency < 50ms ... (100k issues, 50k MRs)
-10. p95 search latency < 200ms ... (50k documents)
+7. Tiered fixtures:
+ S: 10k issues / 5k MRs / 50k notes
+ M: 100k issues / 50k MRs / 500k notes
+ L: 250k issues / 100k MRs / 1M notes
+ Enforce p95 targets per tier and memory ceiling (<250MB RSS in M tier).
+10. Search SLO validated in S and M tiers, lexical and hybrid modes.
```
8. **Add session restore (last screen + filters + selection), with explicit `--fresh` opt-out.**
Analysis: This is high-value daily UX with low complexity, and it makes the TUI feel materially more “compelling/useful” without feature bloat. It also reduces friction when recovering from crash/restart.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 1. Executive Summary
+- **Session restore** — resume last screen, filters, and selection on startup.
@@ 4.1 Module Structure
+ session.rs # persisted UI session state
@@ 8.1 Global
+| `Ctrl+R` | Reset session state for current screen |
@@ 10.19 CLI Integration
+`lore tui --fresh` starts without restoring prior session state.
@@ 11. Assumptions
-12. No TUI-specific configuration initially.
+12. Minimal TUI state file is allowed for session restore only.
```
9. **Add parity tests between TUI data panels and `--robot` outputs.**
Analysis: You already have `ShowCliEquivalent`; parity tests make that claim trustworthy and prevent drift between interfaces. This is a strong reliability multiplier and helps future refactors.
```diff
--- a/Gitlore_TUI_PRD_v2.md
+++ b/Gitlore_TUI_PRD_v2.md
@@ 9.2 Phases / 9.3 Success Criteria
+Phase 5.6 — CLI/TUI Parity Pack
+ - Dashboard count parity vs `lore --robot count/status`
+ - List/detail parity for issues/MRs on sampled entities
+ - Search result identity parity (top-N ids) for lexical mode
+Success criterion: parity suite passes on CI fixtures.
```
If you want, I can produce a single consolidated patch of the PRD text (one unified diff) so you can drop it directly into the next iteration.


@@ -0,0 +1,200 @@
1. **Fix the structural inconsistency between `src/tui` and `crates/lore-tui/src`**
Analysis: The PRD currently defines two different code layouts for the same system. That will cause implementation drift, wrong imports, and duplicated modules. Locking to one canonical layout early prevents execution churn and makes every snippet/action item unambiguous.
```diff
@@ 4.1 Module Structure @@
-src/
- tui/
+crates/lore-tui/src/
mod.rs
app.rs
message.rs
@@
-### 10.5 Dashboard View (FrankenTUI Native)
-// src/tui/view/dashboard.rs
+### 10.5 Dashboard View (FrankenTUI Native)
+// crates/lore-tui/src/view/dashboard.rs
@@
-### 10.6 Sync View
-// src/tui/view/sync.rs
+### 10.6 Sync View
+// crates/lore-tui/src/view/sync.rs
```
2. **Add a small `ui_adapter` seam to contain FrankenTUI API churn**
Analysis: You already identified high likelihood of upstream breakage. Pinning a commit helps, but if every screen imports raw `ftui_*` types directly, churn ripples through dozens of files. A thin adapter layer reduces upgrade cost without introducing the rejected “full portability abstraction”.
```diff
@@ 3.1 Risk Matrix @@
| API breaking changes | High | High (v0.x) | Pin exact git commit; vendor source if needed |
+| API breakage blast radius across app code | High | High | Constrain ftui usage behind `ui_adapter/*` wrappers |
@@ 4.1 Module Structure @@
+ ui_adapter/
+ mod.rs # Re-export stable local UI primitives
+ runtime.rs # App launch/options wrappers
+ widgets.rs # Table/List/Modal wrapper constructors
+ input.rs # Text input + focus helpers
@@ 9.3 Phase 0 — Toolchain Gate @@
+14. `ui_adapter` compile-check: no screen module imports `ftui_*` directly (lint-enforced)
```
3. **Correct search mode behavior and replace sleep-based debounce with cancelable scheduling**
Analysis: Current plan hardcodes `"hybrid"` in `execute_search`, so mode switching is UI-only and incorrect. Also, spawning sleeping tasks per keypress is wasteful under fast typing. Make mode a first-class query parameter and debounce via one cancelable scheduled event per input domain.
```diff
@@ 4.4 maybe_debounced_query @@
-std::thread::sleep(std::time::Duration::from_millis(200));
-match crate::tui::action::execute_search(&conn, &query, &filters) {
+// no thread sleep; schedule SearchRequestStarted after 200ms via debounce scheduler
+match crate::tui::action::execute_search(&conn, &query, &filters, mode) {
@@ 10.11 Action Module — Query Bridge @@
-pub fn execute_search(conn: &Connection, query: &str, filters: &SearchCliFilters) -> Result<SearchResponse, LoreError> {
- let mode_str = "hybrid"; // default; TUI mode selector overrides
+pub fn execute_search(
+ conn: &Connection,
+ query: &str,
+ filters: &SearchCliFilters,
+ mode: SearchMode,
+) -> Result<SearchResponse, LoreError> {
+ let mode_str = match mode {
+ SearchMode::Hybrid => "hybrid",
+ SearchMode::Lexical => "lexical",
+ SearchMode::Semantic => "semantic",
+ };
@@ 9.3 Phase 0 — Toolchain Gate @@
+15. Search mode parity: lexical/hybrid/semantic each return mode-consistent top-N IDs on fixture
```
4. **Guarantee consistent multi-query reads and add query interruption for responsiveness**
Analysis: Detail screens combine multiple queries that can observe mixed states during sync writes. Wrap each detail fetch in a single read transaction for snapshot consistency. Add cancellation/interrupt checks for long-running queries so UI remains responsive under heavy datasets.
```diff
@@ 4.5 Async Action System @@
+All detail fetches (`issue_detail`, `mr_detail`, timeline expansion) run inside one read transaction
+to guarantee snapshot consistency across subqueries.
@@ 10.11 Action Module — Query Bridge @@
+pub fn with_read_snapshot<T>(
+ conn: &Connection,
+ f: impl FnOnce(&rusqlite::Transaction<'_>) -> Result<T, LoreError>,
+) -> Result<T, LoreError> { ... }
+// Long queries register interrupt checks tied to CancelToken
+// to avoid >1s uninterruptible stalls during rapid navigation/filtering.
```
5. **Formalize sync event streaming contract to prevent “stuck” states**
Analysis: Dropping events on backpressure is acceptable, but completion must never be dropped and event ordering must be explicit. Add a typed `SyncUiEvent` stream with guaranteed terminal sentinel and progress coalescing to reduce load while preserving correctness.
```diff
@@ 4.4 start_sync_task @@
-let (tx, rx) = std::sync::mpsc::sync_channel::<Msg>(1024);
+let (tx, rx) = std::sync::mpsc::sync_channel::<SyncUiEvent>(2048);
-// drop this progress update rather than blocking the sync thread
+// coalesce progress to max 30Hz per lane; never drop terminal events
+// always emit SyncUiEvent::StreamClosed { outcome }
@@ 5.9 Sync @@
-- Log viewer with streaming output
+- Log viewer with streaming output and explicit stream-finalization state
+- UI shows dropped/coalesced event counters for transparency
```
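A minimal sketch of the "never drop terminal events" rule, using only `std::sync::mpsc`. The `SyncEmitter` type and the `ok: bool` payload are illustrative assumptions (the PRD's `StreamClosed { outcome }` would carry a richer type), and this version counts dropped progress events rather than implementing the 30Hz time-based coalescing:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Simplified stand-in for the PRD's `SyncUiEvent` (hypothetical shape).
#[derive(Debug, PartialEq)]
pub enum SyncUiEvent {
    Progress { done: u64, total: u64 },
    StreamClosed { ok: bool },
}

/// Hypothetical sender-side wrapper owned by the sync thread.
pub struct SyncEmitter {
    tx: SyncSender<SyncUiEvent>,
    dropped: u64,
}

impl SyncEmitter {
    pub fn new(capacity: usize) -> (Self, Receiver<SyncUiEvent>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, dropped: 0 }, rx)
    }

    /// Progress is best-effort: if the queue is full, count the drop and move on
    /// instead of blocking the sync thread.
    pub fn progress(&mut self, done: u64, total: u64) {
        match self.tx.try_send(SyncUiEvent::Progress { done, total }) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => self.dropped += 1,
            Err(TrySendError::Disconnected(_)) => {}
        }
    }

    /// The terminal event uses a blocking send, so backpressure cannot drop it.
    /// Returns the number of progress events dropped, for the UI counter.
    pub fn close(self, ok: bool) -> u64 {
        let _ = self.tx.send(SyncUiEvent::StreamClosed { ok });
        self.dropped
    }
}
```

Because `close` consumes the emitter, the sender is dropped right after the sentinel is queued, so the receiver's iterator ends immediately after `StreamClosed`.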
6. **Version and validate session restore payloads**
Analysis: A raw JSON session file without schema/version checks is fragile across releases and DB switches. Add schema version, DB fingerprint, and safe fallback rules so session restore never blocks startup or applies stale state incorrectly.
```diff
@@ 11. Assumptions @@
-12. Minimal TUI state file allowed for session restore only ...
+12. Versioned TUI state file allowed for session restore only:
+ fields include `schema_version`, `app_version`, `db_fingerprint`, `saved_at`, `state`.
@@ 10.1 New Files @@
crates/lore-tui/src/session.rs # Lightweight session state persistence
+ # + versioning, validation, corruption quarantine
@@ 4.1 Module Structure @@
session.rs # Lightweight session state persistence
+ # corrupted file -> `.bad-<timestamp>` and fresh start
```
7. **Harden terminal safety beyond ANSI stripping**
Analysis: ANSI stripping is necessary but not sufficient. Bidi controls and invisible Unicode controls can still spoof displayed content. URL checks should normalize host/port and disallow deceptive variants. This closes realistic terminal spoofing vectors.
```diff
@@ 3.1 Risk Matrix @@
| Terminal escape/control-sequence injection via issue/note text | High | Medium | Strip ANSI/OSC/control chars via sanitize_for_terminal() ... |
+| Bidi/invisible Unicode spoofing in rendered text | High | Medium | Strip bidi overrides + zero-width controls in untrusted text |
@@ 10.4.1 Terminal Safety — Untrusted Text Sanitization @@
-Strip ANSI escape sequences, OSC commands, and control characters
+Strip ANSI/OSC/control chars, bidi overrides (RLO/LRO/PDF/RLI/LRI/FSI/PDI),
+and zero-width/invisible controls from untrusted text
-pub fn is_safe_url(url: &str, allowed_hosts: &[String]) -> bool {
+pub fn is_safe_url(url: &str, allowed_origins: &[Origin]) -> bool {
+ // normalize host (IDNA), enforce scheme+host+port match
```
8. **Use progressive hydration for detail screens**
Analysis: Issue/MR detail first-paint can become slow when discussions are large. Split fetch into phases: metadata first, then discussions/file changes, then deep thread content on expand. This improves perceived performance and keeps navigation snappy on large repos.
```diff
@@ 5.3 Issue Detail @@
-Data source: `lore issues <iid>` + discussions + cross-references
+Data source (progressive):
+1) metadata/header (first paint)
+2) discussions summary + cross-refs
+3) full thread bodies loaded on demand when expanded
@@ 5.5 MR Detail @@
-Unique features: File changes list, Diff discussions ...
+Unique features (progressive hydration):
+- file change summary in first paint
+- diff discussion bodies loaded lazily per expanded thread
@@ 9.3 Phase 0 — Toolchain Gate @@
+16. Detail first-paint p95 < 75ms on M-tier fixtures (metadata-only phase)
```
9. **Make reliability tests reproducible with deterministic clocks/seeds**
Analysis: Relative-time rendering and fuzz tests are currently tied to wall clock/randomness, which makes CI flakes hard to diagnose. Introduce a `Clock` abstraction and deterministic fuzz seeds with failure replay output.
```diff
@@ 10.9.1 Non-Snapshot Tests @@
+/// All time-based rendering uses injected `Clock` in tests.
+/// Fuzz failures print deterministic seed for replay.
@@ 9.2 Phase 5.5 — Reliability Test Pack @@
-Event fuzz tests (key/resize/paste):p55g
+Event fuzz tests (key/resize/paste, deterministic seed replay):p55g
+Deterministic clock/render tests:p55i
```
10. **Add an “Actionable Insights” dashboard panel for stronger day-to-day utility**
Analysis: The current dashboard is informative but does not prioritize. Adding ranked insights (stale P1s, blocked MRs, discussion hotspots) turns it into a decision surface, not just a metrics screen. This makes the TUI materially more compelling for triage workflows.
```diff
@@ 1. Executive Summary @@
- Dashboard — sync status, project health, counts at a glance
+- Dashboard — sync status, project health, counts, and ranked actionable insights
@@ 5.1 Dashboard (Home Screen) @@
-│ Recent Activity │
+│ Recent Activity │
+│ Actionable Insights │
+│ 1) 7 opened P1 issues >14d │
+│ 2) 3 MRs blocked by unresolved │
+│ 3) auth/ has +42% note velocity │
@@ 6. User Flows @@
+### 6.9 Flow: "Risk-first morning sweep"
+Dashboard -> select insight -> jump to pre-filtered list/detail
```
These 10 changes stay clear of your `Rejected Recommendations` list and materially improve correctness, operability, and product value without adding speculative architecture.


@@ -0,0 +1,150 @@
Your plan is strong and unusually detailed. The biggest upgrades I'd make are around build isolation, async correctness, terminal correctness, and turning existing data into sharper triage workflows.
## 1) Fix toolchain isolation so stable builds cannot accidentally pull nightly
Rationale: a `rust-toolchain.toml` inside `crates/lore-tui` is not a complete guard when running workspace commands from repo root. You should structurally prevent stable workflows from touching nightly-only code.
```diff
@@ 3.2 Nightly Rust Strategy
-[workspace]
-members = [".", "crates/lore-tui"]
+[workspace]
+members = ["."]
+exclude = ["crates/lore-tui"]
+`crates/lore-tui` is built as an isolated workspace/package with explicit toolchain invocation:
+ cargo +nightly-2026-02-08 check --manifest-path crates/lore-tui/Cargo.toml
+Core repo remains:
+ cargo +stable check --workspace
```
## 2) Add an explicit `lore` <-> `lore-tui` compatibility contract
Rationale: runtime delegation is correct, but version drift between binaries will become the #1 support failure mode. Add a handshake before launch.
```diff
@@ 10.19 CLI Integration — Adding `lore tui`
+Before spawning `lore-tui`, `lore` runs:
+ lore-tui --print-contract-json
+and validates:
+ - minimum_core_version
+ - supported_db_schema_range
+ - contract_version
+On mismatch, print actionable remediation:
+ cargo install --path crates/lore-tui
```
## 3) Make TaskSupervisor truly authoritative (remove split async paths)
Rationale: the document says supervisor is the only path, but examples still use direct `Cmd::task` and `search_request_id`. Close that contradiction now to avoid stale-data races.
```diff
@@ 4.4 App — Implementing the Model Trait
- search_request_id: u64,
+ task_supervisor: TaskSupervisor,
@@ 4.5.1 Task Supervisor
-The `search_request_id` field in `LoreApp` is superseded...
+`search_request_id` is removed. All async work uses TaskSupervisor generations.
+No direct `Cmd::task` from screen handlers or ad-hoc helpers.
```
## 4) Resolve keybinding conflicts and implement real go-prefix timeout
Rationale: `Ctrl+I` collides with `Tab` in terminals. Also your 500ms go-prefix timeout is described but not enforced in code.
```diff
@@ 8.1 Global (Available Everywhere)
-| `Ctrl+I` | Jump forward in jump list (entity hops) |
+| `Alt+o` | Jump forward in jump list (entity hops) |
@@ 8.2 Keybinding precedence
+Go-prefix timeout is enforced by timestamped state + tick check.
+Backspace global-back behavior is implemented (currently documented but not wired).
```
## 5) Add a shared display-width text utility (Unicode-safe truncation and alignment)
Rationale: current `truncate()` implementations use byte/char length and will misalign CJK/emoji/full-width text in tables and trees.
```diff
@@ 10.1 New Files
+crates/lore-tui/src/text_width.rs # grapheme-safe truncation + display width helpers
@@ 10.5 Dashboard View / 10.13 Issue List / 10.16 Who View
-fn truncate(s: &str, max: usize) -> String { ... }
+use crate::text_width::truncate_display_width;
+// all column fitting/truncation uses terminal display width, not bytes/chars
```
## 6) Upgrade sync streaming to a QoS event bus with sequence IDs
Rationale: today progress/log events can be dropped under load with weak observability. Keep UI responsive while guaranteeing completion semantics and visible gap accounting.
```diff
@@ 4.4 start_sync_task()
-let (tx, rx) = std::sync::mpsc::sync_channel::<SyncUiEvent>(2048);
+let (ctrl_tx, ctrl_rx) = std::sync::mpsc::sync_channel::<SyncCtrlEvent>(256); // never-drop
+let (data_tx, data_rx) = std::sync::mpsc::sync_channel::<SyncDataEvent>(4096); // coalescible
+Every streamed event carries seq_no.
+UI detects gaps and renders: "Dropped N log/progress events due to backpressure."
+Terminal events (started/completed/failed/cancelled) remain lossless.
```
## 7) Make list pagination truly keyset-driven in state, not just in prose
Rationale: plan text promises windowed keyset paging, but state examples still keep a single list without cursor model. Encode pagination state explicitly.
```diff
@@ 10.10 state/issue_list.rs
-pub items: Vec<IssueListRow>,
+pub window: Vec<IssueListRow>,
+pub next_cursor: Option<IssueCursor>,
+pub prev_cursor: Option<IssueCursor>,
+pub prefetch: Option<Vec<IssueListRow>>,
+pub window_size: usize, // default 200
@@ 5.2 Issue List
-Pagination: Windowed keyset pagination...
+Pagination: Keyset cursor model is first-class state with forward/back cursors and prefetch buffer.
```
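To make the cursor model concrete, here is a small in-memory sketch of one keyset step. The field layout of `IssueCursor` and `IssueListRow` and the `fetch_window` helper are assumptions for illustration; in the real action layer the filter would be a SQL predicate over `(updated_at, iid)` rather than an iterator:

```rust
/// Hypothetical cursor: the sort key of the last row in the current window.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IssueCursor {
    pub updated_at: i64,
    pub iid: i64,
}

/// Trimmed-down row type (the real `IssueListRow` has more fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IssueListRow {
    pub iid: i64,
    pub updated_at: i64,
}

/// Returns the next window after `after`, plus the cursor for the window that follows.
/// `rows` must already be sorted by (updated_at DESC, iid DESC), mirroring the
/// default list index order.
pub fn fetch_window(
    rows: &[IssueListRow],
    after: Option<IssueCursor>,
    window_size: usize,
) -> (Vec<IssueListRow>, Option<IssueCursor>) {
    let window: Vec<IssueListRow> = rows
        .iter()
        .filter(|r| match after {
            // Strictly "older" than the cursor in the composite sort order.
            Some(c) => (r.updated_at, r.iid) < (c.updated_at, c.iid),
            None => true,
        })
        .take(window_size)
        .cloned()
        .collect();
    // A full window may have more rows behind it; a short window is the last one.
    let next_cursor = if window.len() == window_size {
        window.last().map(|r| IssueCursor { updated_at: r.updated_at, iid: r.iid })
    } else {
        None
    };
    (window, next_cursor)
}
```

The cursor is a position in the sort order, not an offset, so rows inserted by a concurrent sync do not shift or duplicate entries in later windows.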
## 8) Harden session restore with atomic persistence + integrity checksum
Rationale: versioning/quarantine is good, but you still need crash-safe write semantics and tamper/corruption detection to avoid random boot failures.
```diff
@@ 10.1 New Files
-crates/lore-tui/src/session.rs # Versioned session state persistence + validation + corruption quarantine
+crates/lore-tui/src/session.rs # + atomic write (tmp->fsync->rename), checksum, max-size guard
@@ 11. Assumptions
+Session writes are atomic and checksummed.
+Invalid checksum or oversized file triggers quarantine and fresh boot.
```
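A rough sketch of the tmp -> fsync -> rename write path with a checksum header and size guard. The function names, the one-line hex header format, and the choice of FNV-1a (picked only because it needs no dependency) are all assumptions, not part of the PRD:

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// FNV-1a 64-bit. Detects accidental corruption; it is not tamper-proof.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Writes `<checksum hex>\n<payload>` to a temp file, syncs it, then renames it
/// over `path` so readers see either the old file or the complete new one.
pub fn write_session_atomic(path: &Path, payload: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = File::create(&tmp)?;
        writeln!(f, "{:016x}", fnv1a64(payload))?;
        f.write_all(payload)?;
        f.sync_all()?; // flush file contents before the rename makes them visible
    }
    // Note: strict durability would also sync the parent directory (omitted here).
    fs::rename(&tmp, path)
}

/// Returns `Ok(None)` when the file is oversized, malformed, or fails its checksum.
/// The caller would then quarantine the file and start fresh.
pub fn read_session_checked(path: &Path, max_bytes: u64) -> io::Result<Option<Vec<u8>>> {
    if fs::metadata(path)?.len() > max_bytes {
        return Ok(None);
    }
    let raw = fs::read(path)?;
    let Some(newline) = raw.iter().position(|b| *b == b'\n') else {
        return Ok(None);
    };
    let (header, body) = (&raw[..newline], &raw[newline + 1..]);
    let expected = format!("{:016x}", fnv1a64(body));
    Ok((header == expected.as_bytes()).then(|| body.to_vec()))
}
```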
## 9) Evolve Doctor from read-only text into actionable remediation
Rationale: your CLI already returns machine-actionable `actions`. TUI should surface those as one-key fixes; this materially increases usefulness.
```diff
@@ 5.11 Doctor / Stats (Info Screens)
-Simple read-only views rendering the output...
+Doctor is interactive:
+ - shows health checks + severity
+ - exposes suggested `actions` from robot-mode errors
+ - Enter runs selected action command (with confirmation modal)
+Stats remains read-only.
```
## 10) Add a Dependency Lens to Issue/MR detail (high-value triage feature)
Rationale: you already have cross-refs + discussions + timeline. A compact dependency panel (blocked-by / blocks / unresolved threads) makes this data operational for prioritization.
```diff
@@ 5.3 Issue Detail
-│ ┌─ Cross-References ─────────────────────────────────────────┐ │
+│ ┌─ Dependency Lens ──────────────────────────────────────────┐ │
+│ │ Blocked by: #1198 (open, stale 9d) │ │
+│ │ Blocks: !458 (opened, 2 unresolved threads) │ │
+│ │ Risk: High (P1 + stale blocker + open MR discussion) │ │
+│ └────────────────────────────────────────────────────────────┘ │
@@ 9.2 Phases
+Dependency Lens (issue/mr detail, computed risk score) :p3e, after p2e, 1d
```
---
If you want, I can next produce a consolidated **“v2.1 patch”** of the PRD with all these edits merged into one coherent updated document structure.


@@ -0,0 +1,264 @@
1. **Fix a critical contradiction in workspace/toolchain isolation**
Rationale: Section `3.2` says `crates/lore-tui` is excluded from the root workspace, but Section `9.1` currently adds it as a member. That inconsistency will cause broken CI/tooling behavior and confusion about whether stable-only workflows remain safe.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 9.1 Dependency Changes
-# Root Cargo.toml changes
-[workspace]
-members = [".", "crates/lore-tui"]
+# Root Cargo.toml changes
+[workspace]
+members = ["."]
+exclude = ["crates/lore-tui"]
@@
-# Add workspace member (no lore-tui dep, no tui feature)
+# Keep lore-tui EXCLUDED from root workspace (nightly isolation boundary)
@@ 9.3 Phase 0 — Toolchain Gate
-1. `cargo check --all-targets` passes on pinned nightly (TUI crate) and stable (core)
+1. `cargo +stable check --workspace --all-targets` passes for root workspace
+2. `cargo +nightly-2026-02-08 check --manifest-path crates/lore-tui/Cargo.toml --all-targets` passes
```
2. **Replace global loading spinner with per-screen stale-while-revalidate**
Rationale: A single `is_loading` flag causes full-screen flicker and blocked context during quick refreshes. Per-screen load states keep existing data visible while background refresh runs, improving perceived performance and usability.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 10.10 State Module — Complete
- pub is_loading: bool,
+ pub load_state: ScreenLoadStateMap,
@@
- pub fn set_loading(&mut self, loading: bool) {
- self.is_loading = loading;
- }
+ pub fn set_loading(&mut self, screen: ScreenId, state: LoadState) {
+ self.load_state.insert(screen, state);
+ }
+
+pub enum LoadState {
+ Idle,
+ LoadingInitial,
+ Refreshing, // stale data remains visible
+ Error(String),
+}
@@ 4.4 App — Implementing the Model Trait
- // Loading spinner overlay (while async data is fetching)
- if self.state.is_loading {
- crate::tui::view::common::render_loading(frame, body);
- } else {
- match self.navigation.current() { ... }
- }
+ // Always render screen; show lightweight refresh indicator when needed.
+ match self.navigation.current() { ... }
+ crate::tui::view::common::render_refresh_indicator_if_needed(
+ self.navigation.current(), &self.state.load_state, frame, body
+ );
```
3. **Make `TaskSupervisor` a real scheduler (not just token registry)**
Rationale: Current design declares priority lanes but still dispatches directly with `Cmd::task`, and debounce uses `thread::sleep` per keystroke (wastes worker threads). A bounded scheduler with queued tasks and timer-driven debounce will reduce contention and tail latency.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 4.5.1 Task Supervisor (Dedup + Cancellation + Priority)
-pub struct TaskSupervisor {
- active: HashMap<TaskKey, Arc<CancelToken>>,
- generation: AtomicU64,
-}
+pub struct TaskSupervisor {
+ active: HashMap<TaskKey, Arc<CancelToken>>,
+ generation: AtomicU64,
+ queue: BinaryHeap<ScheduledTask>,
+ inflight: HashMap<TaskPriority, usize>,
+ limits: TaskLaneLimits, // e.g. Input=4, Navigation=2, Background=1
+}
@@
-// 200ms debounce via cancelable scheduled event (not thread::sleep).
-Cmd::task(move || {
- std::thread::sleep(std::time::Duration::from_millis(200));
- ...
-})
+// Debounce via runtime timer message; no sleeping worker thread.
+self.state.search.debounce_deadline = Some(now + 200ms);
+Cmd::none()
@@ 4.4 update()
+Msg::Tick => {
+ if self.state.search.debounce_expired(now) {
+ return self.dispatch_supervised(TaskKey::Search, TaskPriority::Input, ...);
+ }
+ self.task_supervisor.dispatch_ready(now)
+}
```
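The deadline-plus-tick debounce can be sketched without any runtime at all. The `Debounce` type below is a hypothetical helper, not the PRD's `SearchState`; it only shows that each keystroke moves one deadline forward and that the query fires once, on the first tick past that deadline:

```rust
use std::time::{Duration, Instant};

/// Hypothetical debounce state for one input domain (e.g. the search box).
pub struct Debounce {
    delay: Duration,
    deadline: Option<Instant>,
    pending: Option<String>,
}

impl Debounce {
    pub fn new(delay: Duration) -> Self {
        Self { delay, deadline: None, pending: None }
    }

    /// Called on every keystroke: replaces the pending query and pushes the
    /// deadline out. No thread sleeps and no task is spawned.
    pub fn on_input(&mut self, query: &str, now: Instant) {
        self.pending = Some(query.to_owned());
        self.deadline = Some(now + self.delay);
    }

    /// Called on every tick: yields the pending query once the deadline has
    /// passed, then clears itself so the query is dispatched exactly once.
    pub fn poll(&mut self, now: Instant) -> Option<String> {
        match self.deadline {
            Some(deadline) if now >= deadline => {
                self.deadline = None;
                self.pending.take()
            }
            _ => None,
        }
    }
}
```

Passing `now` in explicitly keeps the helper testable without real waiting, which fits the deterministic-clock goal elsewhere in these notes.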
4. **Add a sync run ledger for exact “new since sync” navigation**
Rationale: “Since last sync” based on timestamps is ambiguous with partial failures, retries, and clock drift. A lightweight `sync_runs` + `sync_deltas` ledger makes summary-mode drill-down exact and auditable without implementing full resumable checkpoints.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 5.9 Sync
-- `i` navigates to Issue List pre-filtered to "since last sync"
-- `m` navigates to MR List pre-filtered to "since last sync"
+- `i` navigates to Issue List pre-filtered to `sync_run_id=<last_run>`
+- `m` navigates to MR List pre-filtered to `sync_run_id=<last_run>`
+- Filters are driven by persisted `sync_deltas` rows (exact entity keys changed in run)
@@ 10.1 New Files
+src/core/migrations/00xx_add_sync_run_ledger.sql
@@ New migration (appendix)
+CREATE TABLE sync_runs (
+ id INTEGER PRIMARY KEY,
+ started_at_ms INTEGER NOT NULL,
+ completed_at_ms INTEGER,
+ status TEXT NOT NULL
+);
+CREATE TABLE sync_deltas (
+ sync_run_id INTEGER NOT NULL,
+ entity_kind TEXT NOT NULL,
+ project_id INTEGER NOT NULL,
+ iid INTEGER NOT NULL,
+ change_kind TEXT NOT NULL
+);
+CREATE INDEX idx_sync_deltas_run_kind ON sync_deltas(sync_run_id, entity_kind);
@@ 11 Assumptions
-16. No new SQLite tables needed for v1
+16. Two small v1 tables are added: `sync_runs` and `sync_deltas` for deterministic post-sync UX.
```
5. **Expand the GA index set to match actual filter surface**
Rationale: Current required indexes only cover default sort paths; they do not match common filters like `author`, `assignee`, `reviewer`, `target_branch`, label-based filtering. This will likely miss p95 SLOs at M tier.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 9.3.1 Required Indexes (GA Blocker)
CREATE INDEX IF NOT EXISTS idx_issues_list_default
ON issues(project_id, state, updated_at DESC, iid DESC);
+CREATE INDEX IF NOT EXISTS idx_issues_author_updated
+ ON issues(project_id, state, author_username, updated_at DESC, iid DESC);
+CREATE INDEX IF NOT EXISTS idx_issues_assignee_updated
+ ON issues(project_id, state, assignee_username, updated_at DESC, iid DESC);
@@
CREATE INDEX IF NOT EXISTS idx_mrs_list_default
ON merge_requests(project_id, state, updated_at DESC, iid DESC);
+CREATE INDEX IF NOT EXISTS idx_mrs_reviewer_updated
+ ON merge_requests(project_id, state, reviewer_username, updated_at DESC, iid DESC);
+CREATE INDEX IF NOT EXISTS idx_mrs_target_updated
+ ON merge_requests(project_id, state, target_branch, updated_at DESC, iid DESC);
+CREATE INDEX IF NOT EXISTS idx_mrs_source_updated
+ ON merge_requests(project_id, state, source_branch, updated_at DESC, iid DESC);
@@
+-- If labels are normalized through join table:
+CREATE INDEX IF NOT EXISTS idx_issue_labels_label_issue ON issue_labels(label, issue_id);
+CREATE INDEX IF NOT EXISTS idx_mr_labels_label_mr ON mr_labels(label, mr_id);
@@ CI enforcement
-asserts that none show `SCAN TABLE` for the primary entity tables
+asserts that none show full scans for primary tables under default filters AND top 8 user-facing filter combinations
```
6. **Add DB schema compatibility preflight (separate from binary compat)**
Rationale: Binary compat (`--compat-version`) does not protect against schema mismatches. Add explicit schema version checks before booting the TUI to avoid runtime SQL errors deep in navigation paths.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 3.2 Nightly Rust Strategy
-- **Compatibility contract:** Before spawning `lore-tui`, the `lore tui` subcommand runs `lore-tui --compat-version` ...
+- **Compatibility contract:** Before spawning `lore-tui`, `lore tui` validates:
+ 1) binary compat version (`lore-tui --compat-version`)
+ 2) DB schema range (`lore-tui --check-schema <db-path>`)
+If schema is out-of-range, print remediation: `lore migrate`.
@@ 9.3 Phase 0 — Toolchain Gate
+17. Schema preflight test: incompatible DB schema yields actionable error and non-zero exit before entering TUI loop.
```
7. **Refine terminal sanitization to preserve legitimate Unicode while blocking control attacks**
Rationale: Current sanitizer strips zero-width joiners and similar characters, which breaks emoji/grapheme rendering and undermines your own `text_width` goals. Keep benign Unicode, remove only dangerous controls/bidi spoof vectors, and sanitize markdown link targets too.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 10.4.1 Terminal Safety — Untrusted Text Sanitization
-- Strip bidi overrides ... and zero-width/invisible controls ...
+- Strip ANSI/OSC/control chars and bidi spoof controls.
+- Preserve legitimate grapheme-joining characters (ZWJ/ZWNJ/combining marks) for correct Unicode rendering.
+- Sanitize markdown link targets with strict URL allowlist before rendering clickable links.
@@ safety.rs
- // Strip zero-width and invisible controls
- '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{FEFF}' | '\u{00AD}' => {}
+ // Preserve grapheme/emoji join behavior; remove only harmful controls.
+ // (ZWJ/ZWNJ/combining marks are retained)
@@ Enforcement rule
- Search result snippets
- Author names and labels
+- Markdown link destinations (scheme + origin validation before render/open)
```
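One possible shape for the refined sanitizer, shown as a sketch rather than the PRD's actual `safety.rs`: it drops CSI and OSC escape sequences, other control characters, and the bidi embedding/override/isolate controls, while leaving ZWJ/ZWNJ and combining marks alone. Keeping `\n` and `\t` is an assumption about what the views need:

```rust
/// Bidi embedding/override controls (U+202A..U+202E) and isolates (U+2066..U+2069).
fn is_bidi_control(c: char) -> bool {
    matches!(c, '\u{202A}'..='\u{202E}' | '\u{2066}'..='\u{2069}')
}

pub fn sanitize_for_terminal(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1B}' {
            match chars.peek().copied() {
                // CSI: ESC [ parameters... final byte in 0x40..=0x7E
                Some('[') => {
                    chars.next();
                    for n in chars.by_ref() {
                        if ('\u{40}'..='\u{7E}').contains(&n) {
                            break;
                        }
                    }
                }
                // OSC: ESC ] payload terminated by BEL or ESC \
                Some(']') => {
                    chars.next();
                    while let Some(n) = chars.next() {
                        if n == '\u{07}' {
                            break;
                        }
                        if n == '\u{1B}' {
                            if chars.peek() == Some(&'\\') {
                                chars.next();
                            }
                            break;
                        }
                    }
                }
                // Any other escape: drop the ESC byte itself.
                _ => {}
            }
            continue;
        }
        if c == '\n' || c == '\t' {
            out.push(c);
            continue;
        }
        // `is_control` covers C0, DEL, and C1; ZWJ/ZWNJ are format chars and pass through.
        if c.is_control() || is_bidi_control(c) {
            continue;
        }
        out.push(c);
    }
    out
}
```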
8. **Add key normalization layer for terminal portability**
Rationale: Collision notes are good, but you still need a canonicalization layer because terminals emit different sequences for Alt/Meta/Backspace/Enter variants. This reduces “works in iTerm, broken in tmux/SSH” bugs.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 8.2 List Screens
**Terminal keybinding safety notes:**
@@
- `Ctrl+M` is NOT used — it collides with `Enter` ...
+
+**Key normalization layer (new):**
+- Introduce `KeyNormalizer` before `interpret_key()`:
+ - normalize Backspace variants (`^H`, `DEL`)
+ - normalize Alt/Meta prefixes
+ - normalize Shift+Tab vs Tab where terminal supports it
+ - normalize kitty/CSI-u enhanced key protocols when present
@@ 9.2 Phases
+ Key normalization integration tests :p5d, after p5c, 1d
+ Terminal profile replay tests :p5e, after p5d, 1d
```
9. **Add deterministic event-trace capture for crash reproduction**
Rationale: Panic logs without recent event context are often insufficient for TUI race bugs. Persist last-N normalized events + active screen + task state snapshot on panic for one-command repro.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 3.1 Risk Matrix
| Runtime panic leaves user blocked | High | Medium | Panic hook writes crash report, restores terminal, offers fallback CLI command |
+| Hard-to-reproduce input race bugs | Medium | Medium | Persist last 2k normalized events + state hash on panic for deterministic replay |
@@ 10.3 Entry Point / panic hook
- // 2. Write crash dump
+ // 2. Write crash dump + event trace snapshot
+ // Includes: last 2000 normalized events, current screen, in-flight task keys/generations
@@ 10.9.1 Non-Snapshot Tests
+/// Replay captured event trace from panic artifact and assert no panic.
+#[test]
+fn replay_trace_artifact_is_stable() { ... }
```
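The "last N normalized events" buffer could be as small as a bounded `VecDeque`. This `EventTrace` type is an illustrative assumption; the PRD does not specify the container or the event type:

```rust
use std::collections::VecDeque;

/// Hypothetical bounded trace of the most recent events, oldest first.
pub struct EventTrace<E> {
    capacity: usize,
    events: VecDeque<E>,
}

impl<E: Clone> EventTrace<E> {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, events: VecDeque::with_capacity(capacity) }
    }

    /// Records an event, evicting the oldest one once the buffer is full.
    pub fn record(&mut self, event: E) {
        if self.capacity == 0 {
            return;
        }
        if self.events.len() == self.capacity {
            self.events.pop_front();
        }
        self.events.push_back(event);
    }

    /// Copies the retained events out in arrival order, e.g. for the crash dump.
    pub fn snapshot(&self) -> Vec<E> {
        self.events.iter().cloned().collect()
    }
}
```

A panic hook would serialize `snapshot()` next to the current screen and in-flight task keys; the replay test then feeds the same sequence back through `update()`.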
10. **Do a plan-wide consistency pass on pseudocode contracts**
Rationale: There are internal mismatches that will create implementation churn (`search_request_id` still referenced after replacement, `items` vs `window`, keybinding mismatch `Ctrl+I` vs `Alt+o`). Tightening these now saves real engineering time later.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ 4.4 LoreApp::new
- search_request_id: 0,
+ // dedup generation handled by TaskSupervisor
@@ 8.1 Global
-| `Ctrl+O` | Jump backward in jump list (entity hops) |
-| `Alt+o` | Jump forward in jump list (entity hops) |
+| `Ctrl+O` | Jump backward in jump list (entity hops) |
+| `Alt+o` | Jump forward in jump list (entity hops) |
@@ 10.10 IssueListState
- pub fn selected_item(&self) -> Option<&IssueListRow> {
- self.items.get(self.selected_index)
- }
+ pub fn selected_item(&self) -> Option<&IssueListRow> {
+ self.window.get(self.selected_index)
+ }
```
If you want, I can now produce a single consolidated unified diff patch of the full PRD with these revisions merged end-to-end.


@@ -0,0 +1,211 @@
Below are the strongest revisions I'd make. I intentionally avoided anything in your `## Rejected Recommendations`.
1. **Unify commands/keybindings/help/palette into one registry**
Rationale: your plan currently duplicates action definitions across `execute_palette_action`, `ShowCliEquivalent`, help overlay text, and status hints. That will drift quickly and create correctness bugs. A single `CommandRegistry` makes behavior consistent and testable.
```diff
diff --git a/PRD.md b/PRD.md
@@ 4.1 Module Structure
+ commands.rs # Single source of truth for actions, keybindings, CLI equivalents
@@ 4.4 App — Implementing the Model Trait
- fn execute_palette_action(&self, action_id: &str) -> Cmd<Msg> { ... big match ... }
+ fn execute_palette_action(&self, action_id: &str) -> Cmd<Msg> {
+ if let Some(spec) = self.commands.get(action_id) {
+ return self.update(spec.to_msg(self.navigation.current()));
+ }
+ Cmd::none()
+ }
@@ 8. Keybinding Reference
+All keybinding/help/status/palette definitions are generated from `commands.rs`.
+No hardcoded duplicate maps in view/state modules.
```
2. **Replace ad-hoc key flags with explicit input state machine**
Rationale: `pending_go` + `go_prefix_instant` is fragile and already inconsistent with documented behavior. A typed `InputMode` removes edge-case bugs and makes prefix timeout deterministic.
```diff
diff --git a/PRD.md b/PRD.md
@@ 4.4 LoreApp struct
- pending_go: bool,
- go_prefix_instant: Option<std::time::Instant>,
+ input_mode: InputMode, // Normal | Text | Palette | GoPrefix { started_at }
@@ 8.2 List Screens
-| `g` `g` | Jump to top |
+| `g` `g` | Jump to top (current list screen) |
@@ 4.4 interpret_key
- KeyCode::Char('g') => Msg::IssueListScrollToTop
+ KeyCode::Char('g') => Msg::ScrollToTopCurrentScreen
```
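A sketch of how a typed `InputMode` makes the go-prefix timeout deterministic. Only the `Normal` and `GoPrefix` variants are modeled, keys are plain `char`s instead of the real key event type, and `Msg::GoToHotspots` / `Msg::Unhandled` are placeholder names:

```rust
use std::time::{Duration, Instant};

const GO_PREFIX_TIMEOUT: Duration = Duration::from_millis(500);

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum InputMode {
    Normal,
    GoPrefix { started_at: Instant },
}

#[derive(Debug, PartialEq)]
pub enum Msg {
    ScrollToTopCurrentScreen,
    GoToHotspots,
    Unhandled(char),
}

/// Interprets one key press. `now` is injected so the timeout is testable.
pub fn interpret_key(mode: &mut InputMode, key: char, now: Instant) -> Option<Msg> {
    // An expired prefix is discarded before the key is interpreted.
    if let InputMode::GoPrefix { started_at } = *mode {
        if now.saturating_duration_since(started_at) > GO_PREFIX_TIMEOUT {
            *mode = InputMode::Normal;
        }
    }
    match (*mode, key) {
        (InputMode::Normal, 'g') => {
            *mode = InputMode::GoPrefix { started_at: now };
            None
        }
        (InputMode::Normal, k) => Some(Msg::Unhandled(k)),
        (InputMode::GoPrefix { .. }, k) => {
            // Any second key ends the prefix, whether or not it is bound.
            *mode = InputMode::Normal;
            Some(match k {
                'g' => Msg::ScrollToTopCurrentScreen,
                'x' => Msg::GoToHotspots,
                other => Msg::Unhandled(other),
            })
        }
    }
}
```

Because the mode carries its own `started_at`, there is no separate boolean that can fall out of sync with the timestamp.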
3. **Fix TaskSupervisor contract and message schema drift**
Rationale: the plan mixes `request_id` and `generation`, and `TaskKey::Search { generation }` defeats dedup by making every key unique. This can silently reintroduce stale-result races.
```diff
diff --git a/PRD.md b/PRD.md
@@ 4.3 Core Types (Msg)
- SearchRequestStarted { request_id: u64, query: String },
- SearchExecuted { request_id: u64, results: SearchResults },
+ SearchRequestStarted { generation: u64, query: String },
+ SearchExecuted { generation: u64, results: SearchResults },
@@ 4.5.1 Task Supervisor
- Search { generation: u64 },
+ Search,
+ struct TaskStamp { key: TaskKey, generation: u64 }
@@ 10.9.1 Non-Snapshot Tests
- Msg::SearchExecuted { request_id: 3, ... }
+ Msg::SearchExecuted { generation: 3, ... }
```
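The key/generation split can be illustrated with a stripped-down supervisor. Cancellation tokens, priority lanes, and the queue are left out, and the `submit` / `is_current` method names are assumptions:

```rust
use std::collections::HashMap;

/// Keys identify *what* is being loaded; they carry no generation.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TaskKey {
    Search,
    IssueList,
}

/// Stamp attached to each dispatched task and echoed back in its result message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TaskStamp {
    pub key: TaskKey,
    pub generation: u64,
}

#[derive(Default)]
pub struct TaskSupervisor {
    next_generation: u64,
    latest: HashMap<TaskKey, u64>,
}

impl TaskSupervisor {
    /// Registers a new task for `key`, superseding any earlier one with the same key.
    pub fn submit(&mut self, key: TaskKey) -> TaskStamp {
        self.next_generation += 1;
        self.latest.insert(key, self.next_generation);
        TaskStamp { key, generation: self.next_generation }
    }

    /// A result is applied only if its stamp is still the newest for that key.
    pub fn is_current(&self, stamp: TaskStamp) -> bool {
        self.latest.get(&stamp.key) == Some(&stamp.generation)
    }
}
```

With the generation inside the key, every submission would occupy a fresh map slot and nothing would ever be superseded, which is the dedup failure described above.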
4. **Add a `Clock` boundary everywhere time is computed**
Rationale: you call `SystemTime::now()` in many query/render paths, causing inconsistent relative-time labels inside one frame and flaky tests. Injected clock gives deterministic rendering and lower per-frame overhead.
```diff
diff --git a/PRD.md b/PRD.md
@@ 4.1 Module Structure
+ clock.rs # Clock trait: SystemClock/FakeClock
@@ 4.4 LoreApp struct
+ clock: Arc<dyn Clock>,
@@ 10.11 action.rs
- let now_ms = std::time::SystemTime::now()...
+ let now_ms = clock.now_ms();
@@ 9.3 Phase 0 success criteria
+19. Relative-time rendering deterministic under FakeClock across snapshot runs.
```
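A minimal sketch of the `Clock` boundary. The trait and the `SystemClock` / `FakeClock` names follow the diff above; `advance_ms`, `relative_time`, and its label format are assumptions made up for the example:

```rust
use std::sync::atomic::{AtomicI64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

pub trait Clock: Send + Sync {
    /// Milliseconds since the Unix epoch.
    fn now_ms(&self) -> i64;
}

/// Production clock backed by the wall clock.
pub struct SystemClock;

impl Clock for SystemClock {
    fn now_ms(&self) -> i64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as i64)
            .unwrap_or(0)
    }
}

/// Test clock that only moves when told to.
pub struct FakeClock(AtomicI64);

impl FakeClock {
    pub fn new(start_ms: i64) -> Self {
        Self(AtomicI64::new(start_ms))
    }

    pub fn advance_ms(&self, delta: i64) {
        self.0.fetch_add(delta, Ordering::SeqCst);
    }
}

impl Clock for FakeClock {
    fn now_ms(&self) -> i64 {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical relative-time label; timestamps in the future clamp to "just now".
pub fn relative_time(clock: &dyn Clock, then_ms: i64) -> String {
    let secs = (clock.now_ms() - then_ms).max(0) / 1000;
    match secs {
        0..=59 => "just now".to_string(),
        60..=3599 => format!("{}m ago", secs / 60),
        3600..=86_399 => format!("{}h ago", secs / 3600),
        _ => format!("{}d ago", secs / 86_400),
    }
}
```

In the app the clock would be held as `Arc<dyn Clock>` on `LoreApp` and read once per frame, so every label in a frame is computed against the same instant.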
5. **Upgrade text truncation to grapheme-safe width handling**
Rationale: `unicode-width` alone is not enough for safe truncation; it can split grapheme clusters (emoji ZWJ sequences, skin tones, flags). You need width + grapheme segmentation together.
```diff
diff --git a/PRD.md b/PRD.md
@@ 10.1 New Files
-crates/lore-tui/src/text_width.rs # ... using unicode-width crate
+crates/lore-tui/src/text_width.rs # Grapheme-safe width/truncation using unicode-width + unicode-segmentation
@@ 10.1 New Files
+Cargo.toml (lore-tui): unicode-segmentation = "1"
@@ 9.3 Phase 0 success criteria
+20. Unicode rendering tests pass for CJK, emoji ZWJ, combining marks, RTL text.
```
6. **Redact sensitive values in logs and crash dumps**
Rationale: current crash/log strategy risks storing tokens/credentials in plain text. This is a serious operational/security gap for local tooling too.
```diff
diff --git a/PRD.md b/PRD.md
@@ 4.1 Module Structure
safety.rs # sanitize_for_terminal(), safe_url_policy()
+ redact.rs # redact_sensitive() for logs/crash reports
@@ 10.3 install_panic_hook_for_tui
- let _ = std::fs::write(&crash_path, format!("{panic_info:#?}"));
+ let report = redact_sensitive(format!("{panic_info:#?}"));
+ let _ = std::fs::write(&crash_path, report);
@@ 9.3 Phase 0 success criteria
+21. Redaction tests confirm tokens/Authorization headers never appear in persisted crash/log artifacts.
```
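As a starting point, `redact_sensitive` might look like the sketch below. The rule set is an assumption and deliberately small: it blanks the value of `Authorization:` and `PRIVATE-TOKEN:` headers and any space-delimited word containing the `glpat-` prefix used by GitLab personal access tokens. A real implementation would need a broader, tested pattern list:

```rust
/// Redacts each line independently. Line breaks are normalized to `\n` and a
/// trailing newline is not preserved.
pub fn redact_sensitive(input: &str) -> String {
    input.lines().map(redact_line).collect::<Vec<_>>().join("\n")
}

fn redact_line(line: &str) -> String {
    // ASCII lowercasing keeps byte offsets identical to the original line.
    let lower = line.to_ascii_lowercase();
    for header in ["authorization:", "private-token:"] {
        if let Some(pos) = lower.find(header) {
            // Keep the header name as written; drop everything after the colon.
            let end = pos + header.len();
            return format!("{}{} [REDACTED]", &line[..pos], &line[pos..end]);
        }
    }
    // Replace whole words that embed a token so the surrounding message survives.
    line.split(' ')
        .map(|word| if word.contains("glpat-") { "[REDACTED]" } else { word })
        .collect::<Vec<_>>()
        .join(" ")
}
```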
7. **Add search capability detection and mode fallback UX**
Rationale: semantic/hybrid mode should not silently degrade when embeddings are absent/stale. Explicit capability state increases trust and avoids “why are results weird?” confusion.
```diff
diff --git a/PRD.md b/PRD.md
@@ 5.6 Search
+Capability-aware modes:
+- If embeddings unavailable/stale, semantic mode is disabled with inline reason.
+- Hybrid mode auto-falls back to lexical and shows badge: "semantic unavailable".
@@ 4.3 Core Types
+ SearchCapabilitiesLoaded(SearchCapabilities)
@@ 9.3 Phase 0 success criteria
+22. Mode availability checks validated: lexical/hybrid/semantic correctly enabled/disabled by fixture capabilities.
```
8. **Define sync cancel latency SLO and enforce fine-grained checks**
Rationale: “check cancel between phases” is too coarse on big projects. Users need fast cancel acknowledgment and bounded stop time.
```diff
diff --git a/PRD.md b/PRD.md
@@ 5.9 Sync
-CANCELLATION: checked between sync phases
+CANCELLATION: checked at page boundaries, batch upsert boundaries, and before each network request.
+UX target: cancel acknowledged <250ms, sync stop p95 <2s after Esc.
@@ 9.3 Phase 0 success criteria
+23. Cancel latency test passes: p95 stop time <2s under M-tier fixtures.
```
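The fine-grained check can be sketched as a shared flag that is consulted before every page fetch. `CancelToken`, `SyncOutcome`, and `sync_pages` here are illustrative stand-ins (assumptions), and the `fetch_page` closure represents one network request.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical cancel flag shared between the UI thread and the sync task.
#[derive(Clone, Default)]
pub struct CancelToken(Arc<AtomicBool>);

impl CancelToken {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
pub enum SyncOutcome {
    Completed { pages: usize },
    Cancelled { pages: usize },
}

/// Fetches pages in order, checking for cancellation before each request.
pub fn sync_pages<F>(total_pages: usize, cancel: &CancelToken, mut fetch_page: F) -> SyncOutcome
where
    F: FnMut(usize),
{
    for page in 0..total_pages {
        // Page boundary: stop before issuing the next network request.
        if cancel.is_cancelled() {
            return SyncOutcome::Cancelled { pages: page };
        }
        fetch_page(page);
    }
    SyncOutcome::Completed { pages: total_pages }
}
```

With this shape the stop time is bounded by the duration of a single in-flight request rather than a whole sync phase.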
9. **Add a “Hotspots” screen for risk/churn triage**
Rationale: this is high-value and uses existing data (events, unresolved discussions, stale items). It makes the TUI more compelling without needing new sync tables or rejected features.
```diff
diff --git a/PRD.md b/PRD.md
@@ 1. Executive Summary
+- **Hotspots** — file/path risk ranking by churn × unresolved discussion pressure × staleness
@@ 5. Screen Taxonomy
+### 5.12 Hotspots
+Shows top risky paths with drill-down to related issues/MRs/timeline.
@@ 8.1 Global
+| `gx` | Go to Hotspots |
@@ 10.1 New Files
+crates/lore-tui/src/state/hotspots.rs
+crates/lore-tui/src/view/hotspots.rs
```
10. **Add degraded startup mode when compat/schema checks fail**
Rationale: hard-exit on mismatch blocks users. A degraded mode that shells to `lore --robot` for read-only summary/doctor keeps the product usable and gives guided recovery.
```diff
diff --git a/PRD.md b/PRD.md
@@ 3.2 Nightly Rust Strategy
- On mismatch: actionable error and exit
+ On mismatch: actionable error with `--degraded` option.
+ `--degraded` launches limited TUI (Dashboard/Doctor/Stats via `lore --robot` subprocess calls).
@@ 10.3 TuiCli
+ /// Allow limited mode when schema/compat checks fail
+ #[arg(long)]
+ degraded: bool,
```
11. **Harden query-plan CI checks (don't rely on `SCAN TABLE` string matching)**
Rationale: SQLite planner text varies by version. Parse opcode structure and assert index usage semantically; otherwise CI will be flaky or miss regressions.
```diff
diff --git a/PRD.md b/PRD.md
@@ 9.3.1 Required Indexes (CI enforcement)
- asserts that none show `SCAN TABLE`
+ parses EXPLAIN QUERY PLAN rows and asserts:
+ - top-level loop uses expected index families
+ - no full scan on primary entity tables under default and top filter combos
+ - join order remains bounded (no accidental cartesian expansions)
```
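One way to make the check less text-sensitive is to tokenize each `EXPLAIN QUERY PLAN` detail row and assert on the parsed structure. This is a rough sketch under the assumption that detail rows look like `SCAN issues`, `SCAN TABLE issues`, or `SEARCH issues USING INDEX <name> (...)`; `PlanStep`, `parse_plan_detail`, and `is_full_table_scan` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum PlanStep {
    Scan { table: String, index: Option<String> },
    Search { table: String, index: Option<String> },
    Other,
}

/// Parses one detail row, tolerating the optional `TABLE` keyword.
pub fn parse_plan_detail(detail: &str) -> PlanStep {
    let mut tokens = detail.split_whitespace().peekable();
    let op = match tokens.next() {
        Some(t) => t,
        None => return PlanStep::Other,
    };
    if op != "SCAN" && op != "SEARCH" {
        return PlanStep::Other;
    }
    if tokens.peek() == Some(&"TABLE") {
        tokens.next();
    }
    let table = match tokens.next() {
        Some(t) => t.to_string(),
        None => return PlanStep::Other,
    };
    // Covers both `USING INDEX x` and `USING COVERING INDEX x`.
    let mut index = None;
    while let Some(tok) = tokens.next() {
        if tok == "INDEX" {
            index = tokens.next().map(str::to_string);
            break;
        }
    }
    if op == "SCAN" {
        PlanStep::Scan { table, index }
    } else {
        PlanStep::Search { table, index }
    }
}

/// True when the row is an index-less scan of `table`.
pub fn is_full_table_scan(detail: &str, table: &str) -> bool {
    matches!(parse_plan_detail(detail), PlanStep::Scan { table: t, index: None } if t == table)
}
```

A CI test would run the real queries, feed each detail row through a parser like this, and fail only on index-less scans of the primary entity tables.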
12. **Enforce single-instance lock for session/state safety**
Rationale: assumption says no concurrent TUI sessions, but accidental double-launch will still happen. Locking prevents state corruption and confusing interleaved sync actions.
```diff
diff --git a/PRD.md b/PRD.md
@@ 10.1 New Files
+crates/lore-tui/src/instance_lock.rs # lock file with stale-lock recovery
@@ 11. Assumptions
-21. No concurrent TUI sessions.
+21. Concurrent sessions unsupported and actively prevented by instance lock (with clear error message).
```
If you want, I can turn this into a consolidated patched PRD (single unified diff) next.


@@ -0,0 +1,198 @@
I reviewed the full PRD end-to-end and avoided all items already listed in `## Rejected Recommendations`.
These are the highest-impact revisions I'd make.
1. **Fix keybinding/state-machine correctness gaps (critical)**
The plan currently has an internal conflict: the doc says jump-forward is `Alt+o`, but the code sample uses `Ctrl+i` (which collides with `Tab` in many terminals). Also, the `g`-prefix timeout depends on `Tick`, but `Tick` isn't guaranteed when idle, so prefix mode can get “stuck.” This is a correctness bug, not polish.
```diff
@@ 8.1 Global (Available Everywhere)
-| `Ctrl+O` | Jump backward in jump list (entity hops) |
-| `Alt+o` | Jump forward in jump list (entity hops) |
+| `Ctrl+O` | Jump backward in jump list (entity hops) |
+| `Alt+o` | Jump forward in jump list (entity hops) |
+| `Backspace` | Go back (when no text input is focused) |
@@ 4.4 LoreApp::interpret_key
- (KeyCode::Char('i'), m) if m.contains(Modifiers::CTRL) => {
- return Some(Msg::JumpForward);
- }
+ (KeyCode::Char('o'), m) if m.contains(Modifiers::ALT) => {
+ return Some(Msg::JumpForward);
+ }
+ (KeyCode::Backspace, Modifiers::NONE) => {
+ return Some(Msg::GoBack);
+ }
@@ 4.4 Model::subscriptions
+ // Go-prefix timeout enforcement must tick even when nothing is loading.
+ if matches!(self.input_mode, InputMode::GoPrefix { .. }) {
+ subs.push(Box::new(
+ Every::with_id(2, Duration::from_millis(50), || Msg::Tick)
+ ));
+ }
```
2. **Make `TaskSupervisor` API internally consistent and enforceable**
The plan uses `submit()`/`is_current()` in one place and `register()`/`next_generation()` in another. That inconsistency will cause implementation drift and stale-result bugs. Use one coherent API with a returned handle containing `{key, generation, cancel_token}`.
```diff
@@ 4.5.1 Task Supervisor (Dedup + Cancellation + Priority)
-pub struct TaskSupervisor {
- active: HashMap<TaskKey, Arc<CancelToken>>,
- generation: AtomicU64,
-}
+pub struct TaskSupervisor {
+ active: HashMap<TaskKey, TaskHandle>,
+}
+
+pub struct TaskHandle {
+ pub key: TaskKey,
+ pub generation: u64,
+ pub cancel: Arc<CancelToken>,
+}
- pub fn register(&mut self, key: TaskKey) -> Arc<CancelToken>
- pub fn next_generation(&self) -> u64
+ pub fn submit(&mut self, key: TaskKey) -> TaskHandle
+ pub fn is_current(&self, key: &TaskKey, generation: u64) -> bool
+ pub fn complete(&mut self, key: &TaskKey, generation: u64)
```
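A minimal sketch of the unified API, to show how `submit()`, `is_current()`, and `complete()` fit together. The `TaskKey` variants, the `CancelToken` internals, and the `next_generation` counter field are assumptions; the diff above does not say where the generation counter lives once `generation: AtomicU64` is removed.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical key set; the PRD defines the real variants.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum TaskKey {
    Search,
    SyncStream,
}

#[derive(Default)]
pub struct CancelToken(AtomicBool);

impl CancelToken {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Clone)]
pub struct TaskHandle {
    pub key: TaskKey,
    pub generation: u64,
    pub cancel: Arc<CancelToken>,
}

#[derive(Default)]
pub struct TaskSupervisor {
    active: HashMap<TaskKey, TaskHandle>,
    next_generation: u64, // assumed: a plain counter, since submit takes &mut self
}

impl TaskSupervisor {
    /// Cancels any task already running under `key` and registers a new one.
    pub fn submit(&mut self, key: TaskKey) -> TaskHandle {
        if let Some(previous) = self.active.get(&key) {
            previous.cancel.cancel();
        }
        self.next_generation += 1;
        let handle = TaskHandle {
            key: key.clone(),
            generation: self.next_generation,
            cancel: Arc::new(CancelToken::default()),
        };
        self.active.insert(key, handle.clone());
        handle
    }

    /// True only for the most recently submitted task under `key`.
    pub fn is_current(&self, key: &TaskKey, generation: u64) -> bool {
        self.active.get(key).map_or(false, |h| h.generation == generation)
    }

    /// Removes the entry, but only if it still belongs to this generation.
    pub fn complete(&mut self, key: &TaskKey, generation: u64) {
        if self.is_current(key, generation) {
            self.active.remove(key);
        }
    }
}
```

Result handlers would call `is_current()` before applying a payload, so a late result from a superseded task is dropped instead of overwriting newer state.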
3. **Replace thread-sleep debounce with runtime timer messages**
`std::thread::sleep(200ms)` inside task closures wastes pool threads under fast typing and reduces responsiveness under contention. Use timer-driven debounce messages and only fire the latest generation. This improves latency stability on large datasets.
```diff
@@ 4.3 Core Types (Msg enum)
+ SearchDebounceArmed { generation: u64, query: String },
+ SearchDebounceFired { generation: u64 },
@@ 4.4 maybe_debounced_query
- Cmd::task(move || {
- std::thread::sleep(std::time::Duration::from_millis(200));
- ...
- })
+ // Arm debounce only; runtime timer emits SearchDebounceFired.
+ Cmd::msg(Msg::SearchDebounceArmed { generation, query })
@@ 4.4 subscriptions()
+ if self.state.search.debounce_pending() {
+ subs.push(Box::new(
+ Every::with_id(3, Duration::from_millis(200), || Msg::SearchDebounceFired { generation: ... })
+ ));
+ }
```
4. **Harden `DbManager` API to avoid lock-poison panics and accidental long-held guards**
Returning raw `MutexGuard<Connection>` invites accidental lock scope expansion and `expect("lock poisoned")` panics. Move to closure-based access (`with_reader`, `with_writer`) returning `Result`, and use cached statements. This reduces deadlock risk and tail latency.
```diff
@@ 4.4 DbManager
- pub fn reader(&self) -> MutexGuard<'_, Connection> { ...expect("reader lock poisoned") }
- pub fn writer(&self) -> MutexGuard<'_, Connection> { ...expect("writer lock poisoned") }
+ pub fn with_reader<T>(&self, f: impl FnOnce(&Connection) -> Result<T, LoreError>) -> Result<T, LoreError>
+ pub fn with_writer<T>(&self, f: impl FnOnce(&Connection) -> Result<T, LoreError>) -> Result<T, LoreError>
@@ 10.11 action.rs
- let conn = db.reader();
- match fetch_issues(&conn, &filter) { ... }
+ match db.with_reader(|conn| fetch_issues(conn, &filter)) { ... }
+ // Query hot paths use prepare_cached() to reduce parse overhead.
```
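The closure-based access pattern could be sketched as follows. `Connection` is a stand-in struct here (the real type would be the SQLite connection), and the `LoreError::LockPoisoned` variant is an assumption made for this example.

```rust
use std::sync::Mutex;

/// Stand-in for the real database connection type.
pub struct Connection {
    pub name: &'static str,
}

/// Simplified error type; `LockPoisoned` is a hypothetical variant.
#[derive(Debug, PartialEq)]
pub enum LoreError {
    LockPoisoned(&'static str),
    Other(String),
}

pub struct DbManager {
    reader: Mutex<Connection>,
    writer: Mutex<Connection>,
}

impl DbManager {
    pub fn new(reader: Connection, writer: Connection) -> Self {
        Self { reader: Mutex::new(reader), writer: Mutex::new(writer) }
    }

    /// Runs `f` with the reader connection; the guard is released when `f` returns.
    pub fn with_reader<T>(
        &self,
        f: impl FnOnce(&Connection) -> Result<T, LoreError>,
    ) -> Result<T, LoreError> {
        let guard = self.reader.lock().map_err(|_| LoreError::LockPoisoned("reader"))?;
        f(&*guard)
    }

    /// Same as `with_reader`, but for the writer connection.
    pub fn with_writer<T>(
        &self,
        f: impl FnOnce(&Connection) -> Result<T, LoreError>,
    ) -> Result<T, LoreError> {
        let guard = self.writer.lock().map_err(|_| LoreError::LockPoisoned("writer"))?;
        f(&*guard)
    }
}
```

Because the guard never leaves the method, a caller cannot hold the lock across unrelated work, and a poisoned lock surfaces as an error value rather than a panic.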
5. **Add read-path entity cache (LRU) for repeated drill-in/out workflows**
Your core daily flow is Enter/Esc bouncing between list/detail. Without caching, identical detail payloads are re-queried repeatedly. A bounded LRU by `EntityKey` with invalidation on sync completion gives near-instant reopen behavior and reduces DB pressure.
```diff
@@ 4.1 Module Structure
+ entity_cache.rs # Bounded LRU cache for detail payloads
@@ app.rs LoreApp fields
+ entity_cache: EntityCache,
@@ load_screen(Screen::IssueDetail / MrDetail)
+ if let Some(cached) = self.entity_cache.get_issue(&key) {
+ return Cmd::msg(Msg::IssueDetailLoaded { key, detail: cached.clone() });
+ }
@@ Msg::IssueDetailLoaded / Msg::MrDetailLoaded handlers
+ self.entity_cache.put_issue(key.clone(), detail.clone());
@@ Msg::SyncCompleted
+ self.entity_cache.invalidate_all();
```
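A bounded LRU of this kind can be sketched with a map plus a recency queue. The generic `EntityCache<K, V>` below and its `get`/`put` methods are simplifications of the `get_issue`/`put_issue` calls in the diff, and the O(n) recency update is an assumption that only holds up for small capacities.

```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

/// Small bounded LRU: `order` holds keys from least to most recently used.
pub struct EntityCache<K, V> {
    capacity: usize,
    map: HashMap<K, V>,
    order: VecDeque<K>,
}

impl<K: Clone + Eq + Hash, V> EntityCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, map: HashMap::new(), order: VecDeque::new() }
    }

    /// Moves `key` to the most-recently-used end of the queue.
    fn touch(&mut self, key: &K) {
        if let Some(pos) = self.order.iter().position(|k| k == key) {
            self.order.remove(pos);
        }
        self.order.push_back(key.clone());
    }

    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.map.contains_key(key) {
            self.touch(key);
        }
        self.map.get(key)
    }

    pub fn put(&mut self, key: K, value: V) {
        self.touch(&key);
        self.map.insert(key, value);
        // Evict least-recently-used entries until we are back under capacity.
        while self.map.len() > self.capacity {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.map.remove(&oldest);
                }
                None => break,
            }
        }
    }

    /// Called on sync completion so stale detail payloads are never served.
    pub fn invalidate_all(&mut self) {
        self.map.clear();
        self.order.clear();
    }
}
```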
6. **Tighten sync-stream observability and drop semantics without adding heavy architecture**
You already handle backpressure, but operators need visibility when it happens. Track dropped-progress count and max queue depth in state and surface it in running/summary views. This keeps the current simple design while making reliability measurable.
```diff
@@ 4.3 Msg
+ SyncStreamStats { dropped_progress: u64, max_queue_depth: usize },
@@ 5.9 Sync (Running mode footer)
-| Esc cancel f full sync e embed after d dry-run l log level|
+| Esc cancel f full sync e embed after d dry-run l log level stats:drop=12 qmax=1847 |
@@ 9.3 Success criteria
+24. Sync stream stats are emitted and rendered; terminal events (completed/failed/cancelled) delivery is 100% under induced backpressure.
```
7. **Make crash reporting match the promised diagnostic value**
The PRD promises event replay context, but sample hook writes only panic text. Add explicit crash context capture (`last events`, `current screen`, `task handles`, `build id`, `db fingerprint`) and retention policy. This materially improves post-mortem debugging.
```diff
@@ 4.1 Module Structure
+ crash_context.rs # ring buffer of normalized events + task/screen snapshot
@@ 10.3 install_panic_hook_for_tui()
- let report = crate::redact::redact_sensitive(&format!("{panic_info:#?}"));
+ let ctx = crate::crash_context::snapshot();
+ let report = crate::redact::redact_sensitive(&format!("{panic_info:#?}\n{ctx:#?}"));
+ // Retention: keep latest 20 crash files, delete oldest metadata entries only.
```
8. **Add Search Facets panel for faster triage (high-value feature, low risk)**
Search is central, but right now filtering requires manual field edits. Add facet counts (`issues`, `MRs`, `discussions`, top labels/projects/authors) with one-key apply. This makes search more compelling and actionable without introducing schema changes.
```diff
@@ 5.6 Search
-- Layout: Split pane — results list (left) + preview (right)
+- Layout: Three-pane on wide terminals — results (left) + preview (center) + facets (right)
+**Facets panel:**
+- Entity type counts (issue/MR/discussion)
+- Top labels/projects/authors for current query
+- `1/2/3` quick-apply type facet; `l` cycles top label facet
@@ 8.2 List/Search keybindings
+| `1` `2` `3` | Apply facet: Issue / MR / Discussion |
+| `l` | Apply next top-label facet |
```
9. **Strengthen text sanitization for terminal edge cases**
Current sanitizer is strong, but still misses some control-space edge cases (C1 controls, directional marks beyond the listed bidi set). Add those and test them. This closes spoofing/render confusion gaps with minimal complexity.
```diff
@@ 10.4.1 sanitize_for_terminal()
+ // Strip C1 control block (U+0080..U+009F) and additional directional marks
+ c if ('\u{0080}'..='\u{009F}').contains(&c) => {}
+ '\u{200E}' | '\u{200F}' | '\u{061C}' => {} // LRM, RLM, ALM
@@ tests
+ #[test] fn strips_c1_controls() { ... }
+ #[test] fn strips_lrm_rlm_alm() { ... }
```
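As an isolated illustration of the two added match arms, the filter below strips only the C1 block and the three directional marks. It is a sketch: the real `sanitize_for_terminal()` handles more cases (escape sequences, the existing bidi set) that are not reproduced here, and `is_stripped` is a hypothetical helper name.

```rust
/// Removes C1 controls (U+0080..=U+009F) and the LRM, RLM, and ALM marks.
pub fn sanitize_for_terminal(input: &str) -> String {
    input.chars().filter(|&c| !is_stripped(c)).collect()
}

fn is_stripped(c: char) -> bool {
    matches!(
        c,
        '\u{0080}'..='\u{009F}' // C1 control block, includes the 8-bit CSI (U+009B)
            | '\u{200E}' // LRM
            | '\u{200F}' // RLM
            | '\u{061C}' // ALM
    )
}
```

Legitimate non-ASCII text such as accented Latin or CJK characters falls outside these ranges and passes through unchanged.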
10. **Add an explicit vertical-slice gate before broad screen expansion**
The plan is comprehensive, but risk is still front-loaded on framework + runtime behavior. Insert a strict vertical slice gate (`Dashboard + IssueList + IssueDetail + Sync running`) with perf and stability thresholds before Phase 3 features. This reduces rework if foundational assumptions break.
```diff
@@ 9.2 Phases
+section Phase 2.5 — Vertical Slice Gate
+Dashboard + IssueList + IssueDetail + Sync (running) integrated :p25a, after p2c, 3d
+Gate: p95 nav latency < 75ms on M tier; zero stuck-input-state bugs; cancel p95 < 2s :p25b, after p25a, 1d
+Only then proceed to Search/Timeline/Who/Palette expansion.
```
If you want, I can produce a full consolidated `diff` block against the entire PRD text (single patch), but the above is the set I'd prioritize first.

File diff suppressed because it is too large

2075
plans/tui-prd.md Normal file

File diff suppressed because it is too large


@@ -0,0 +1,157 @@
**Top Revisions I Recommend**
1. **Fix auth semantics + a real inconsistency in your test plan**
Your ACs require graceful handling for `403`, but the test list says the “403” test returns `401`. That hides the exact behavior you care about and can let permission regressions slip through.
```diff
@@ AC-1: GraphQL Client (Unit)
- [ ] HTTP 401 → `LoreError::GitLabAuthFailed`
+ [ ] HTTP 401 → `LoreError::GitLabAuthFailed`
+ [ ] HTTP 403 → `LoreError::GitLabForbidden`
@@ AC-3: Status Fetcher (Integration)
- [ ] GraphQL 403 → returns `Ok(HashMap::new())` with warning log
+ [ ] GraphQL 403 (`GitLabForbidden`) → returns `Ok(HashMap::new())` with warning log
@@ TDD Plan (RED)
- 13. `test_fetch_statuses_403_graceful` — mock returns 401 → `Ok(HashMap::new())`
+ 13. `test_fetch_statuses_403_graceful` — mock returns 403 → `Ok(HashMap::new())`
```
2. **Make enrichment atomic and stale-safe**
Current plan can leave stale status values forever when a widget disappears or status becomes null. Make writes transactional and clear status fields for fetched scope before upserts.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] Enrichment DB writes are transactional per project (all-or-nothing)
+ [ ] Status fields are cleared for fetched issue scope before applying new statuses
+ [ ] If enrichment fails mid-project, prior persisted statuses are unchanged (rollback)
@@ File 6: `src/ingestion/orchestrator.rs`
- fn enrich_issue_statuses(...)
+ fn enrich_issue_statuses_txn(...)
+ // BEGIN TRANSACTION
+ // clear status columns for fetched issue scope
+ // apply updates
+ // COMMIT
```
3. **Add transient retry/backoff (429/5xx/network)**
Right now one transient failure loses status enrichment for that sync. Retrying with bounded backoff gives much better reliability at low cost.
```diff
@@ AC-1: GraphQL Client (Unit)
+ [ ] Retries 429/502/503/504/network errors with bounded exponential backoff + jitter (max 3 attempts)
+ [ ] Honors `Retry-After` on 429 before retrying
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] Cancellation signal is checked before each retry sleep and between paginated calls
```
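The retry policy can be reduced to a pure function that decides whether to retry and for how long. This is a sketch under assumptions: the 500 ms base delay is invented for illustration, `attempt` counts attempts already made (1-based), and jitter is passed in as a fraction so the example needs no random-number crate.

```rust
use std::time::Duration;

/// Matches the "max 3 attempts" bound from the acceptance criteria.
pub const MAX_ATTEMPTS: u32 = 3;
/// Assumed base delay; the plan does not specify one.
const BASE_DELAY_MS: u64 = 500;

/// HTTP statuses treated as transient.
pub fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 502 | 503 | 504)
}

/// Returns the sleep before the next attempt, or `None` when attempts are exhausted.
/// `jitter` is a fraction in [0, 1] supplied by the caller.
pub fn retry_delay(attempt: u32, retry_after: Option<Duration>, jitter: f64) -> Option<Duration> {
    if attempt >= MAX_ATTEMPTS {
        return None;
    }
    // A server-provided Retry-After takes precedence over computed backoff.
    if let Some(delay) = retry_after {
        return Some(delay);
    }
    let base_ms = BASE_DELAY_MS * 2u64.pow(attempt.saturating_sub(1));
    let jitter_ms = (base_ms as f64 * jitter.clamp(0.0, 1.0)) as u64;
    Some(Duration::from_millis(base_ms + jitter_ms))
}
```

The caller would check the cancellation signal immediately before sleeping for the returned duration, as the last criterion in the diff requires.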
4. **Stop full GraphQL scans when nothing changed**
Running full pagination on every sync will dominate runtime on large repos. Trigger enrichment only when issue ingestion reports changes, with a manual override.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Runs when issue ingestion changed at least one issue in the project
+ [ ] New override flag `--refresh-status` forces enrichment even with zero issue deltas
+ [ ] Optional periodic full refresh (e.g. every N syncs) to prevent long-tail drift
```
5. **Do not expose raw token via `client.token()`**
Architecturally cleaner and safer: keep token encapsulated and expose a GraphQL-ready client factory from `GitLabClient`.
```diff
@@ File 13: `src/gitlab/client.rs`
- pub fn token(&self) -> &str
+ pub fn graphql_client(&self) -> crate::gitlab::graphql::GraphqlClient
@@ File 6: `src/ingestion/orchestrator.rs`
- let graphql_client = GraphqlClient::new(&config.gitlab.base_url, client.token());
+ let graphql_client = client.graphql_client();
```
6. **Add indexes for new status filters**
`--status` on large tables will otherwise full-scan `issues`. Add compound indexes aligned with project-scoped list queries.
```diff
@@ AC-4: Migration 021 (Unit)
+ [ ] Adds index `idx_issues_project_status_name(project_id, status_name)`
+ [ ] Adds index `idx_issues_project_status_category(project_id, status_category)`
@@ File 14: `migrations/021_work_item_status.sql`
ALTER TABLE issues ADD COLUMN status_name TEXT;
ALTER TABLE issues ADD COLUMN status_category TEXT;
ALTER TABLE issues ADD COLUMN status_color TEXT;
ALTER TABLE issues ADD COLUMN status_icon_name TEXT;
+CREATE INDEX IF NOT EXISTS idx_issues_project_status_name
+ ON issues(project_id, status_name);
+CREATE INDEX IF NOT EXISTS idx_issues_project_status_category
+ ON issues(project_id, status_category);
```
7. **Improve filter UX: add category filter + case-insensitive status**
Case-sensitive exact name matches are brittle with custom lifecycle names. Category filter is stable and useful for automation.
```diff
@@ AC-9: List Issues Filter (E2E)
- [ ] Filter is case-sensitive (matches GitLab's exact status name)
+ [ ] `--status` uses case-insensitive exact match by default (`COLLATE NOCASE`)
+ [ ] New filter `--status-category` supports `triage|to_do|in_progress|done|canceled`
+ [ ] `--status-exact` enables strict case-sensitive behavior when needed
```
8. **Add capability probe/cache to avoid pointless calls**
Free tier / old GitLab versions will never return status widget. Cache that capability per project (with TTL) to reduce noise and wasted requests.
```diff
@@ GitLab API Constraints
+### Capability Probe
+On first sync per project, detect status-widget support and cache result for 24h.
+If unsupported, skip enrichment silently (debug log) until TTL expiry.
@@ AC-3: Status Fetcher (Integration)
+ [ ] Unsupported capability state bypasses GraphQL fetch and warning spam
```
9. **Use a nested robot `status` object instead of 4 top-level fields**
This is cleaner schema design and scales better as status metadata grows (IDs, lifecycle, timestamps, etc.).
```diff
@@ AC-7: Show Issue Display (Robot)
- [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields
- [ ] Fields are `null` (not absent) when status not available
+ [ ] JSON includes `status` object:
+ `{ "name": "...", "category": "...", "color": "...", "icon_name": "..." }` or `null`
@@ AC-8: List Issues Display (Robot)
- [ ] JSON includes `status_name`, `status_category` fields on each issue
+ [ ] JSON includes `status` object (or `null`) on each issue
```
10. **Add one compelling feature: status analytics, not just status display**
Right now this is mostly a transport/display enhancement. Make it genuinely useful with “stale in-progress” detection and age-in-status filters.
```diff
@@ Acceptance Criteria
+### AC-11: Status Aging & Triage Value (E2E)
+- [ ] `lore list issues --status-category in_progress --stale-days 14` filters to stale work
+- [ ] Human table shows `Status Age` (days) when status exists
+- [ ] Robot output includes `status_age_days` (nullable integer)
```
11. **Harden test plan around failure modes you'll actually hit**
The current tests are good, but miss rollback/staleness/retry behavior that drives real reliability.
```diff
@@ TDD Plan (RED) additions
+21. `test_enrich_clears_removed_status`
+22. `test_enrich_transaction_rolls_back_on_failure`
+23. `test_graphql_retry_429_then_success`
+24. `test_graphql_retry_503_then_success`
+25. `test_cancel_during_backoff_aborts_cleanly`
+26. `test_status_filter_query_uses_project_status_index` (EXPLAIN smoke test)
```
If you want, I can produce a fully revised v3 plan document end-to-end (frontmatter + reordered ACs + updated file list + updated TDD matrix) so it is ready to implement directly.


@@ -0,0 +1,159 @@
Your plan is already strong, but I'd revise it in 10 places to reduce risk at scale and make it materially more useful.
1. Shared transport + retries for GraphQL (must-have)
Reasoning: `REST` already has throttling/retry in `src/gitlab/client.rs`; your proposed GraphQL client would bypass that and can spike rate limits under concurrent project ingest (`src/cli/commands/ingest.rs`). Unifying transport prevents split behavior and cuts production incidents.
```diff
@@ AC-1: GraphQL Client (Unit)
- [ ] Network error → `LoreError::Other`
+ [ ] GraphQL requests use shared GitLab transport (same timeout, rate limiter, retry policy as REST)
+ [ ] Retries 429/502/503/504/network errors (max 3) with exponential backoff + jitter
+ [ ] 429 honors `Retry-After` before retrying
+ [ ] Exhausted network retries → `LoreError::GitLabNetworkError`
@@ Decisions
- 8. **No retry/backoff in v1** — DEFER.
+ 8. **Retry/backoff in v1** — YES (shared REST+GraphQL reliability policy).
@@ Implementation Detail
+ File 15: `src/gitlab/transport.rs` (NEW) — shared HTTP execution and retry/backoff policy.
```
2. Capability cache for unsupported projects (must-have)
Reasoning: Free tier / older GitLab will repeatedly emit warning noise every sync and waste calls. Cache support status per project and re-probe on TTL.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ [ ] Unsupported capability responses (missing endpoint/type/widget) are cached per project
+ [ ] While cached unsupported, enrichment is skipped without repeated warning spam
+ [ ] Capability cache auto-expires (default 24h) and is re-probed
@@ Migration Numbering
- This feature uses **migration 021**.
+ This feature uses **migrations 021-022**.
@@ Files Changed (Summary)
+ `migrations/022_project_capabilities.sql` | NEW — support cache table for project capabilities
```
3. Delta-first enrichment with periodic full reconcile (must-have)
Reasoning: Full GraphQL scan every sync is expensive for large projects. You already compute issue deltas in ingestion; use that as fast path and keep a periodic full sweep as safety net.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Fast path: skip status enrichment when issue ingestion upserted 0 issues for that project
+ [ ] Safety net: run full reconciliation every `status_full_reconcile_hours` (default 24)
+ [ ] `--full` always forces reconciliation
@@ AC-5: Config Toggle (Unit)
+ [ ] `SyncConfig` has `status_full_reconcile_hours: u32` (default 24)
```
4. Strongly typed widget parsing via `__typename` (must-have)
Reasoning: current “deserialize arbitrary widget JSON into `StatusWidget`” is fragile. Query/type by `__typename` for forward compatibility and fewer silent parse mistakes.
```diff
@@ AC-3: Status Fetcher (Integration)
- [ ] Extracts status from `widgets` array by matching `WorkItemWidgetStatus` fragment
+ [ ] Query includes `widgets { __typename ... }` and parser matches `__typename == "WorkItemWidgetStatus"`
+ [ ] Non-status widgets are ignored deterministically (no heuristic JSON-deserialize attempts)
@@ GraphQL Query
+ widgets {
+ __typename
+ ... on WorkItemWidgetStatus { ... }
+ }
```
5. Set-based transactional DB apply (must-have)
Reasoning: row-by-row clear/update loops will be slow on large projects and hold write locks longer. Temp-table + set-based SQL inside one txn is faster and easier to reason about rollback.
```diff
@@ AC-3: Status Fetcher (Integration)
- `all_fetched_iids: Vec<i64>`
+ `all_fetched_iids: HashSet<i64>`
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] Before applying updates, NULL out status fields ... (loop per IID)
- [ ] UPDATE SQL: `SET status_name=?, ... WHERE project_id=? AND iid=?`
+ [ ] Use temp tables and set-based SQL in one transaction:
+ [ ] (1) clear stale statuses for fetched IIDs absent from status rows
+ [ ] (2) apply status values for fetched IIDs with status
+ [ ] One commit per project; rollback leaves prior state intact
```
6. Fix index strategy for `COLLATE NOCASE` + default sorting (must-have)
Reasoning: your proposed `(project_id, status_name)` index may not fully help `COLLATE NOCASE` + `ORDER BY updated_at`. Tune index to real query shape in `src/cli/commands/list.rs`.
```diff
@@ AC-4: Migration 021 (Unit)
- [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)` for `--status` filter performance
+ [ ] Adds covering NOCASE-aware index:
+ [ ] `idx_issues_project_status_name_nocase_updated(project_id, status_name COLLATE NOCASE, updated_at DESC)`
+ [ ] Adds category index:
+ [ ] `idx_issues_project_status_category_nocase(project_id, status_category COLLATE NOCASE)`
```
7. Add stable/automation-friendly filters now (high-value feature)
Reasoning: status names are user-customizable and renameable; category is more stable. Also add `--no-status` for quality checks and migration visibility.
```diff
@@ AC-9: List Issues Filter (E2E)
+ [ ] `lore list issues --status-category in_progress` filters by category (case-insensitive)
+ [ ] `lore list issues --no-status` returns only issues where `status_name IS NULL`
+ [ ] `--status` + `--status-category` combine with AND logic
@@ File 9: `src/cli/mod.rs`
+ Add flags: `--status-category`, `--no-status`
@@ File 11: `src/cli/autocorrect.rs`
+ Register `--status-category` and `--no-status` for `issues`
```
8. Better enrichment observability and failure accounting (must-have ops)
Reasoning: only tracking `statuses_enriched` hides skipped/cleared/errors, and auth failures become silent partial data quality issues. Add counters and explicit progress events.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] `IngestProjectResult` gains `statuses_enriched: usize` counter
- [ ] Progress event: `ProgressEvent::StatusEnrichmentComplete { enriched: usize }`
+ [ ] `IngestProjectResult` gains:
+ [ ] `statuses_enriched`, `statuses_cleared`, `status_enrichment_skipped`, `status_enrichment_failed`
+ [ ] Progress events:
+ [ ] `StatusEnrichmentStarted`, `StatusEnrichmentSkipped`, `StatusEnrichmentComplete`, `StatusEnrichmentFailed`
+ [ ] End-of-sync summary includes per-project enrichment outcome counts
```
9. Add `status_changed_at` for immediately useful workflow analytics (high-value feature)
Reasoning: without a change timestamp, you can't answer “how long has this been in progress?” which is one of the most useful agent/human queries.
```diff
@@ AC-4: Migration 021 (Unit)
+ [ ] Adds nullable INTEGER column `status_changed_at` (ms epoch UTC)
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] If status_name/category changes, update `status_changed_at = now_ms()`
+ [ ] If status is cleared, set `status_changed_at = NULL`
@@ AC-9: List Issues Filter (E2E)
+ [ ] `lore list issues --stale-status-days N` filters by `status_changed_at <= now - N days`
```
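The `status_changed_at` rules above can be expressed as a small pure function, which also makes the "updates only on actual change" test easy to write. `StatusRow` and `apply_status` are hypothetical names for this sketch; the real update would be SQL inside the enrichment transaction.

```rust
/// Hypothetical in-memory view of the status columns on one issue row.
#[derive(Clone, Debug, PartialEq)]
pub struct StatusRow {
    pub name: Option<String>,
    pub category: Option<String>,
    pub changed_at_ms: Option<i64>,
}

/// Computes the next row state from the previous row and freshly fetched status.
pub fn apply_status(
    prev: &StatusRow,
    name: Option<&str>,
    category: Option<&str>,
    now_ms: i64,
) -> StatusRow {
    let changed_at_ms = if name.is_none() {
        // Status cleared: the timestamp is cleared with it.
        None
    } else if prev.name.as_deref() != name || prev.category.as_deref() != category {
        // Name or category actually changed: stamp the transition.
        Some(now_ms)
    } else {
        // Unchanged status: keep the original transition time.
        prev.changed_at_ms
    };
    StatusRow {
        name: name.map(str::to_string),
        category: category.map(str::to_string),
        changed_at_ms,
    }
}
```

Keeping the timestamp on unchanged rows is what makes repeated syncs idempotent and `--stale-status-days` meaningful.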
10. Expand test matrix for real-world failure/perf paths (must-have)
Reasoning: current tests are good, but the highest-risk failures are retry behavior, capability caching, idempotency under repeated runs, and large-project performance.
```diff
@@ TDD Plan — RED Phase
+ 26. `test_graphql_retries_429_with_retry_after_then_succeeds`
+ 27. `test_graphql_retries_503_then_fails_after_max_attempts`
+ 28. `test_capability_cache_skips_unsupported_project_until_ttl_expiry`
+ 29. `test_delta_skip_when_no_issue_upserts`
+ 30. `test_periodic_full_reconcile_runs_after_threshold`
+ 31. `test_set_based_enrichment_scales_10k_issues_without_timeout`
+ 32. `test_enrichment_idempotent_across_two_runs`
+ 33. `test_status_changed_at_updates_only_on_actual_status_change`
```
If you want, I can now produce a single consolidated revised plan document (full rewritten Markdown) with these changes merged in-place so it's ready to execute.


@@ -0,0 +1,124 @@
Your plan is already strong and implementation-aware. The best upgrades are mostly about reliability under real-world API instability, large-scale performance, and making the feature more useful for automation.
1. Promote retry/backoff from deferred to in-scope now.
Reason: Right now, transient failures cause silent status gaps until a later sync. Bounded retries with jitter and a time budget dramatically improve successful enrichment without making syncs hang.
```diff
@@ AC-1: GraphQL Client (Unit) @@
- [ ] Network error → `LoreError::Other`
+ [ ] Transient failures (`429`, `502`, `503`, `504`, timeout, connect reset) retry with exponential backoff + jitter (max 3 attempts)
+ [ ] `Retry-After` supports both delta-seconds and HTTP-date formats
+ [ ] Per-request retry budget capped (e.g. 120s total) to preserve cancellation responsiveness
@@ AC-6: Enrichment in Orchestrator (Integration) @@
- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ [ ] On transient GraphQL errors: retry policy applied before warning/skip behavior
@@ Decisions @@
- 8. **No retry/backoff in v1** — DEFER.
+ 8. **Retry/backoff in v1** — YES. Required for reliable enrichment under normal GitLab/API turbulence.
```
2. Add a capability cache so unsupported projects stop paying repeated GraphQL cost.
Reason: Free tier / older instances will never return status widgets. Re-querying every sync is wasted time and noisy logs.
```diff
@@ Acceptance Criteria @@
+ ### AC-11: Capability Probe & Cache (Integration)
+ - [ ] Add `project_capabilities` cache with `supports_work_item_status`, `checked_at`, `cooldown_until`
+ - [ ] 404/403/known-unsupported responses update capability cache and suppress repeated warnings until TTL expires
+ - [ ] Supported projects still enrich every run (subject to normal schedule)
@@ Future Enhancements (Not in Scope) @@
- **Capability probe/cache**: Detect status-widget support per project ... (deferred)
+ (moved into scope as AC-11)
```
3. Make enrichment delta-aware with periodic forced reconciliation.
Reason: Full pagination every sync is expensive on large projects. You can skip unnecessary status fetches when no issue changes occurred, while still doing periodic safety sweeps.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration) @@
- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Runs when issue ingestion reports project issue deltas OR reconcile window elapsed
+ [ ] New config: `status_reconcile_hours` (default: 24) for periodic full sweep
+ [ ] `--refresh-status` forces enrichment regardless of delta/reconcile window
```
4. Replace row-by-row update loops with set-based SQL via temp table.
Reason: Current per-IID loops are simple but slow at scale and hold locks longer. Set-based updates are much faster and reduce lock contention.
```diff
@@ File 6: `src/ingestion/orchestrator.rs` (MODIFY) @@
- for iid in all_fetched_iids { ... UPDATE issues ... }
- for (iid, status) in statuses { ... UPDATE issues ... }
+ CREATE TEMP TABLE temp_issue_status_updates(...)
+ bulk INSERT temp rows (iid, name, category, color, icon_name)
+ single set-based UPDATE for enriched rows
+ single set-based NULL-clear for fetched-without-status rows
+ commit transaction
```
5. Add strict mode and explicit partial-failure reporting.
Reason: “Warn and continue” is good default UX, but automation needs a fail-fast option and machine-readable failure output.
```diff
@@ AC-5: Config Toggle (Unit) @@
+ - [ ] `SyncConfig` adds `status_enrichment_strict: bool` (default false)
@@ AC-6: Enrichment in Orchestrator (Integration) @@
- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ [ ] Default mode: warn + continue
+ [ ] Strict mode: status enrichment error fails sync for that run
@@ AC-6: IngestProjectResult @@
+ - [ ] Adds `status_enrichment_error: Option<String>`
@@ AC-8 / Robot sync envelope @@
+ - [ ] Robot output includes `partial_failures` array with per-project enrichment failures
```
6. Fix case-insensitive matching robustness and track freshness.
Reason: SQLite `COLLATE NOCASE` is ASCII-centric; custom statuses may be non-ASCII. Also you need visibility into staleness.
```diff
@@ AC-4: Migration 021 (Unit) @@
- [ ] Migration adds 4 nullable TEXT columns to `issues`
+ [ ] Migration adds 6 columns:
+ `status_name`, `status_category`, `status_color`, `status_icon_name`,
+ `status_name_fold`, `status_synced_at`
- [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)`
+ [ ] Adds compound index `idx_issues_project_status_name_fold(project_id, status_name_fold)`
@@ AC-9: List Issues Filter (E2E) @@
- [ ] Filter uses case-insensitive matching (`COLLATE NOCASE`)
+ [ ] Filter uses `status_name_fold` (Unicode-safe fold normalization done at write time)
```
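A rough sketch of write-time folding for `status_name_fold`, using only the standard library. Note the assumption: `str::to_lowercase` applies Unicode lowercase mappings, which is broader than SQLite's ASCII-only `NOCASE` but is not full Unicode case folding or normalization, so a production version might need a dedicated crate. `fold_status_name` and `status_matches` are hypothetical helper names.

```rust
/// Produces the value stored in `status_name_fold` at write time.
pub fn fold_status_name(name: &str) -> String {
    name.trim().to_lowercase()
}

/// Filter-side comparison: fold the user's input the same way before matching.
pub fn status_matches(stored_fold: &str, user_input: &str) -> bool {
    stored_fold == fold_status_name(user_input)
}
```

Because both sides go through the same function, the `--status` filter becomes a plain equality check against the indexed `status_name_fold` column.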
7. Expand filtering to category and missing-status workflows.
Reason: Name filters are useful, but automation is better on semantic categories and “missing data” detection.
```diff
@@ AC-9: List Issues Filter (E2E) @@
+ - [ ] `--status-category in_progress` filters by `status_category` (case-insensitive)
+ - [ ] `--no-status` returns only issues where `status_name IS NULL`
+ - [ ] `--status` and `--status-category` can be combined with AND logic
```
8. Change robot payload from flat status fields to a nested `status` object.
Reason: Better schema evolution and less top-level field sprawl as you add metadata (`synced_at`, future lifecycle fields).
```diff
@@ AC-7: Show Issue Display (E2E) @@
- [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields
- [ ] Fields are `null` (not absent) when status not available
+ [ ] JSON includes `status` object:
+ `{ "name", "category", "color", "icon_name", "synced_at" }`
+ [ ] `status: null` when not available
@@ AC-8: List Issues Display (E2E) @@
- [ ] `--fields` supports: `status_name`, `status_category`, `status_color`, `status_icon_name`
+ [ ] `--fields` supports: `status.name,status.category,status.color,status.icon_name,status.synced_at`
```
If you want, I can produce a fully rewritten “Iteration 5” plan document with these changes integrated end-to-end (ACs, files, migrations, TDD batches, and updated decisions/future-scope).


@@ -0,0 +1,130 @@
Your iteration-5 plan is strong. The biggest remaining gaps are outcome ambiguity, cancellation safety, and long-term status identity. These are the revisions I'd make.
1. **Make enrichment outcomes explicit (not “empty success”)**
Analysis:
Right now `404/403 -> Ok(empty)` is operationally ambiguous: “project has no statuses” vs “feature unavailable/auth issue.” Agents and dashboards need that distinction to make correct decisions.
This improves reliability and observability without making sync fail-hard.
```diff
@@ AC-3: Status Fetcher (Integration)
-- [ ] `fetch_issue_statuses()` returns `FetchStatusResult` containing:
+- [ ] `fetch_issue_statuses()` returns `FetchStatusOutcome`:
+ - `Fetched(FetchStatusResult)`
+ - `Unsupported { reason: UnsupportedReason }`
+ - `CancelledPartial(FetchStatusResult)`
@@
-- [ ] GraphQL 404 → returns `Ok(FetchStatusResult)` with empty collections + warning log
-- [ ] GraphQL 403 (`GitLabAuthFailed`) → returns `Ok(FetchStatusResult)` with empty collections + warning log
+- [ ] GraphQL 404 → `Unsupported { reason: GraphqlEndpointMissing }` + warning log
+- [ ] GraphQL 403 (`GitLabAuthFailed`) → `Unsupported { reason: AuthForbidden }` + warning log
@@ AC-10: Robot Sync Envelope (E2E)
-- [ ] `status_enrichment` object: `{ "enriched": N, "cleared": N, "error": null | "message" }`
+- [ ] `status_enrichment` object: `{ "mode": "fetched|unsupported|cancelled_partial", "reason": null|"...", "enriched": N, "cleared": N, "error": null|"message" }`
```
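As a rough sketch of the outcome type described in revision 1, the enum below mirrors the variant names from the diff. `FetchStatusResult` is reduced to a placeholder, and the `mode()`, `reason()`, and `outcome_for_http_status()` helpers, along with the exact `reason` string spellings, are assumptions for illustration only.

```rust
/// Placeholder for the plan's `FetchStatusResult`; its real fields are not shown here.
#[derive(Debug, Default, PartialEq)]
pub struct FetchStatusResult {
    pub enriched: usize,
}

#[derive(Debug, PartialEq)]
pub enum UnsupportedReason {
    GraphqlEndpointMissing,
    AuthForbidden,
}

#[derive(Debug, PartialEq)]
pub enum FetchStatusOutcome {
    Fetched(FetchStatusResult),
    Unsupported { reason: UnsupportedReason },
    CancelledPartial(FetchStatusResult),
}

impl FetchStatusOutcome {
    /// Value for the robot envelope's `mode` field.
    pub fn mode(&self) -> &'static str {
        match self {
            FetchStatusOutcome::Fetched(_) => "fetched",
            FetchStatusOutcome::Unsupported { .. } => "unsupported",
            FetchStatusOutcome::CancelledPartial(_) => "cancelled_partial",
        }
    }

    /// Value for the `reason` field; the string spellings are assumed.
    pub fn reason(&self) -> Option<&'static str> {
        match self {
            FetchStatusOutcome::Unsupported {
                reason: UnsupportedReason::GraphqlEndpointMissing,
            } => Some("graphql_endpoint_missing"),
            FetchStatusOutcome::Unsupported {
                reason: UnsupportedReason::AuthForbidden,
            } => Some("auth_forbidden"),
            _ => None,
        }
    }
}

/// Hypothetical mapping from an HTTP status code to an "unsupported" outcome.
pub fn outcome_for_http_status(code: u16) -> Option<FetchStatusOutcome> {
    match code {
        404 => Some(FetchStatusOutcome::Unsupported {
            reason: UnsupportedReason::GraphqlEndpointMissing,
        }),
        403 => Some(FetchStatusOutcome::Unsupported {
            reason: UnsupportedReason::AuthForbidden,
        }),
        _ => None,
    }
}
```

With this shape, "project has no statuses" stays a `Fetched` result with empty collections, while 404/403 become distinguishable in both logs and robot output.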
2. **Add cancellation and pagination loop safety**
Analysis:
Large projects can run long. Current flow checks cancellation only before enrichment starts; pagination and per-row update loops can ignore cancellation for too long. Also, GraphQL cursor bugs can create infinite loops (`hasNextPage=true` with unchanged cursor).
This is a robustness must-have.
```diff
@@ AC-3: Status Fetcher (Integration)
+ [ ] `fetch_issue_statuses()` accepts cancellation signal and checks it between page requests
+ [ ] Pagination guard: if `hasNextPage=true` but `endCursor` is `None` or unchanged, abort loop with warning and return partial outcome
+ [ ] Emits `pages_fetched` count for diagnostics
@@ File 1: `src/gitlab/graphql.rs`
-- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str) -> Result<FetchStatusResult>
+- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str, signal: &CancellationSignal) -> Result<FetchStatusOutcome>
```
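The pagination guard in revision 2 can be isolated as a small pure function, which makes it easy to unit test. This is a sketch under assumptions: `PageStep` and `next_page_step` are hypothetical names, and the cancellation check between page requests is left to the caller.

```rust
/// Hypothetical decision type for the pagination loop.
#[derive(Debug, PartialEq)]
pub enum PageStep {
    /// Fetch the next page using this cursor.
    Continue(String),
    /// `hasNextPage` was false: pagination finished normally.
    Done,
    /// Guard tripped: the server claims more pages but the cursor is missing
    /// or unchanged, so the loop should stop and return a partial outcome.
    AbortStuckCursor,
}

/// Decide what to do after a page has been received.
pub fn next_page_step(
    has_next_page: bool,
    end_cursor: Option<&str>,
    previous_cursor: Option<&str>,
) -> PageStep {
    if !has_next_page {
        return PageStep::Done;
    }
    match end_cursor {
        None => PageStep::AbortStuckCursor,
        Some(cursor) if Some(cursor) == previous_cursor => PageStep::AbortStuckCursor,
        Some(cursor) => PageStep::Continue(cursor.to_string()),
    }
}
```

The fetch loop would call this once per page and log a warning on `AbortStuckCursor` before returning whatever it has collected so far.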
3. **Persist stable `status_id` in addition to name**
Analysis:
`status_name` is display-oriented and mutable (rename/custom lifecycle changes). A stable status identifier is critical for durable automations, analytics, and future migrations.
This is a schema decision that is cheap now and expensive later if skipped.
```diff
@@ AC-2: Status Types (Unit)
-- [ ] `WorkItemStatus` struct has `name`, `category`, `color`, `icon_name`
+- [ ] `WorkItemStatus` struct has `id: String`, `name`, `category`, `color`, `icon_name`
@@ AC-4: Migration 021 (Unit)
-- [ ] Migration adds 5 nullable columns to `issues`: `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at`
+- [ ] Migration adds 6 nullable columns to `issues`: `status_id`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at`
+ [ ] Adds index `idx_issues_project_status_id(project_id, status_id)` for stable-machine filters
@@ GraphQL query
- status { name category color iconName }
+ status { id name category color iconName }
@@ AC-7 / AC-8 Robot
+ [ ] JSON includes `status_id` (null when unavailable)
```
4. **Handle GraphQL partial-data responses correctly**
Analysis:
GraphQL can return both `data` and `errors` in the same response. Current plan treats any `errors` as hard failure, which can discard valid data and reduce reliability.
Use partial-data semantics: keep data, log/report warnings.
```diff
@@ AC-1: GraphQL Client (Unit)
-- [ ] Error response: if top-level `errors` array is non-empty, returns `LoreError` with first error message
+- [ ] If `errors` non-empty and `data` missing: return `LoreError` with first error message
+- [ ] If `errors` non-empty and `data` present: return `data` + warning metadata (do not fail the whole fetch)
@@ TDD Plan (RED)
+ 33. `test_graphql_partial_data_with_errors_returns_data_and_warning`
```
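The partial-data rule in revision 4 reduces to a three-way decision on `data` and `errors`. The sketch below is an illustration only: `QueryOutcome` and `interpret_response` are hypothetical names, a plain `String` stands in for `LoreError`, and the "neither data nor errors" case is an added assumption that the plan does not spell out.

```rust
/// Hypothetical result wrapper: the data plus an optional warning message.
#[derive(Debug, PartialEq)]
pub struct QueryOutcome<T> {
    pub data: T,
    pub first_error_message: Option<String>,
}

/// Apply partial-data semantics to a decoded GraphQL response.
/// `String` stands in for the crate's real error type.
pub fn interpret_response<T>(
    data: Option<T>,
    errors: Vec<String>,
) -> Result<QueryOutcome<T>, String> {
    let first_error = errors.into_iter().next();
    match (data, first_error) {
        // Data present: keep it, and carry the first error along as a warning.
        (Some(data), warning) => Ok(QueryOutcome {
            data,
            first_error_message: warning,
        }),
        // Errors without data: hard failure with the first error message.
        (None, Some(message)) => Err(message),
        // Assumed handling for a malformed response with neither field.
        (None, None) => Err("response contained neither data nor errors".to_string()),
    }
}
```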
5. **Extract status enrichment from orchestrator into dedicated module**
Analysis:
`orchestrator.rs` already has many phases. Putting status transport/parsing/transaction policy directly there increases coupling and test friction.
A dedicated module improves architecture clarity and makes future enhancements safer.
```diff
@@ Implementation Detail
+- File 15: `src/ingestion/enrichment/status.rs` (NEW)
+ - `run_status_enrichment(...)`
+ - `enrich_issue_statuses_txn(...)`
+ - outcome mapping + telemetry
@@ File 6: `src/ingestion/orchestrator.rs`
-- Inline Phase 1.5 logic + helper function
+- Delegates to `enrichment::status::run_status_enrichment(...)` and records returned stats
```
6. **Add status/state consistency checks**
Analysis:
GitLab states that status categories and issue state should stay synchronized, but ingestion drift or API edge cases can violate this. Detecting a mismatch is a high-signal indicator of data integrity issues.
This is compelling for agents because it catches “looks correct but isn't” problems.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] Enrichment computes `status_state_mismatches` count:
+ - `DONE|CANCELED` with `state=open` or `TO_DO|IN_PROGRESS|TRIAGE` with `state=closed`
+ [ ] Logs warning summary when mismatches > 0
@@ AC-10: Robot Sync Envelope (E2E)
+ [ ] `status_enrichment` includes `state_mismatches: N`
```
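The mismatch rule in revision 6 is simple enough to express as a predicate. In this sketch the function names are hypothetical, and the open state is accepted as either `open` (as written in the diff) or `opened` (the spelling used by the `--state` flag); which spelling is actually stored is an assumption.

```rust
/// Returns true when a status category contradicts the issue state.
/// Category names follow the diff above; unknown categories never count.
pub fn is_status_state_mismatch(category: &str, state: &str) -> bool {
    // Assumption: the stored state may be spelled "open" or "opened".
    let is_open = matches!(state, "open" | "opened");
    let is_closed = state == "closed";
    match category {
        "DONE" | "CANCELED" => is_open,
        "TO_DO" | "IN_PROGRESS" | "TRIAGE" => is_closed,
        _ => false,
    }
}

/// Count mismatches over `(status_category, state)` pairs, as would be
/// reported in a `state_mismatches` field.
pub fn count_mismatches(rows: &[(&str, &str)]) -> usize {
    rows.iter()
        .filter(|(category, state)| is_status_state_mismatch(category, state))
        .count()
}
```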
7. **Add explicit performance envelope acceptance criterion**
Analysis:
The plan claims large-project handling but defines no hard validation target. Add a bounded, reproducible performance criterion to prevent regressions.
This is especially important with pagination + per-row writes.
```diff
@@ Acceptance Criteria
+ ### AC-12: Performance Envelope (Integration)
+ - [ ] 10k-issue fixture completes status fetch + apply within defined budget on CI baseline machine
+ - [ ] Memory usage remains O(page_size), not O(total_issues)
+ - [ ] Cancellation during large sync exits within a bounded latency target
@@ TDD Plan (RED)
+ 34. `test_enrichment_large_project_budget`
+ 35. `test_fetch_statuses_memory_bound_by_page`
+ 36. `test_cancellation_latency_during_pagination`
```
If you want, I can next produce a single consolidated “iteration 6” plan draft with these diffs fully merged so it's ready to execute.


@@ -0,0 +1,118 @@
**Highest-Impact Revisions (new, not in your rejected list)**
1. **Critical: Preserve GraphQL partial-error metadata end-to-end (don't just log it)**
Rationale: Right now partial GraphQL errors are warning-only. Agents get no machine-readable signal that status data may be incomplete, which can silently corrupt downstream automation decisions. Exposing partial-error metadata in `FetchStatusResult` and robot sync output makes reliability observable and actionable.
```diff
@@ AC-1: GraphQL Client (Unit)
- [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and logs warning with first error message
+ [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and warning metadata (`had_errors=true`, `first_error_message`)
+ [ ] `GraphqlClient::query()` returns `GraphqlQueryResult { data, had_errors, first_error_message }`
@@ AC-3: Status Fetcher (Integration)
+ [ ] `FetchStatusResult` includes `partial_error_count: usize` and `first_partial_error: Option<String>`
+ [ ] Partial GraphQL errors increment `partial_error_count` and are surfaced to orchestrator result
@@ AC-10: Robot Sync Envelope (E2E)
- { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ... }
+ { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ..., "partial_errors": N, "first_partial_error": null|"..." }
@@ File 1: src/gitlab/graphql.rs
- pub async fn query(...) -> Result<serde_json::Value>
+ pub async fn query(...) -> Result<GraphqlQueryResult>
+ pub struct GraphqlQueryResult { pub data: serde_json::Value, pub had_errors: bool, pub first_error_message: Option<String> }
```
2. **High: Add adaptive page-size fallback for GraphQL complexity/timeout failures**
Rationale: Fixed `first: 100` is brittle on self-hosted instances with stricter complexity/time limits. Adaptive page size (100→50→25→10) improves success rate without retries/backoff and avoids failing an entire project due to one tunable server constraint.
```diff
@@ Query Path
-query($projectPath: ID!, $after: String) { ... workItems(types: [ISSUE], first: 100, after: $after) ... }
+query($projectPath: ID!, $after: String, $first: Int!) { ... workItems(types: [ISSUE], first: $first, after: $after) ... }
@@ AC-3: Status Fetcher (Integration)
+ [ ] Starts with `first=100`; on GraphQL complexity/timeout errors, retries same cursor with smaller page size (50, 25, 10)
+ [ ] If smallest page size still fails, returns error as today
+ [ ] Emits warning including page size downgrade event
@@ TDD Plan (RED)
+ 36. `test_fetch_statuses_complexity_error_reduces_page_size`
+ 37. `test_fetch_statuses_timeout_error_reduces_page_size`
```
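The page-size downgrade ladder in revision 2 might be sketched as follows. The names `PAGE_SIZES`, `next_smaller_page_size`, and `fetch_page_with_fallback` are hypothetical. For brevity the closure's `Err` is assumed to represent only complexity/timeout failures; a real implementation would need to tell those apart from other errors and retry with the same cursor.

```rust
/// Page sizes to try, largest first (values taken from the proposal above).
pub const PAGE_SIZES: [u32; 4] = [100, 50, 25, 10];

/// Next smaller page size in the ladder, or `None` at the bottom.
pub fn next_smaller_page_size(current: u32) -> Option<u32> {
    PAGE_SIZES.iter().copied().find(|&size| size < current)
}

/// Try one page request, shrinking the page size on each failure.
/// Assumption: `attempt` returns `Err` only for complexity/timeout errors.
/// Returns the page together with the page size that finally succeeded.
pub fn fetch_page_with_fallback<T>(
    mut attempt: impl FnMut(u32) -> Result<T, String>,
) -> Result<(T, u32), String> {
    let mut size = PAGE_SIZES[0];
    loop {
        match attempt(size) {
            Ok(page) => return Ok((page, size)),
            Err(err) => match next_smaller_page_size(size) {
                // A downgrade warning would be logged here.
                Some(smaller) => size = smaller,
                // Smallest size still failed: surface the error as today.
                None => return Err(err),
            },
        }
    }
}
```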
3. **High: Make project path lookup failure non-fatal for the sync**
Rationale: Enrichment is optional. If `projects.path_with_namespace` lookup fails for any reason, sync should continue with a structured enrichment error instead of risking full project pipeline failure.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] If project path lookup fails/missing, status enrichment is skipped for that project, warning logged, and sync continues
+ [ ] `status_enrichment_error` captures `"project_path_missing"` (or DB error text)
@@ File 6: src/ingestion/orchestrator.rs
- let project_path: String = conn.query_row(...)?;
+ let project_path = conn.query_row(...).optional()?;
+ if project_path.is_none() {
+ result.status_enrichment_error = Some("project_path_missing".to_string());
+ result.status_enrichment_mode = "fetched".to_string(); // attempted but unavailable locally
+ emit(ProgressEvent::StatusEnrichmentComplete { enriched: 0, cleared: 0 });
+ // continue to discussion sync
+ }
```
4. **Medium: Upgrade `--status` from single-value to repeatable multi-value filter**
Rationale: Practical usage often needs “active buckets” (`To do` OR `In progress`). Repeatable `--status` with OR semantics dramatically improves usefulness without adding new conceptual surface area.
```diff
@@ AC-9: List Issues Filter (E2E)
- [ ] `lore list issues --status "In progress"` → only issues where `status_name = 'In progress'`
+ [ ] `lore list issues --status "In progress"` → unchanged single-value behavior
+ [ ] Repeatable flags supported: `--status "In progress" --status "To do"` (OR semantics across status values)
+ [ ] Repeated `--status` remains AND-composed with other filters
@@ File 9: src/cli/mod.rs
- pub status: Option<String>,
+ pub status: Vec<String>, // repeatable flag
@@ File 8: src/cli/commands/list.rs
- if let Some(status) = filters.status { where_clauses.push("i.status_name = ? COLLATE NOCASE"); ... }
+ if !filters.statuses.is_empty() { /* dynamic OR/IN clause with case-insensitive matching */ }
```
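One way to build the dynamic clause for repeated `--status` values is sketched below. `build_status_clause` is a hypothetical helper, and the SQL keeps the `COLLATE NOCASE` matching shown in the diff; how the returned parameters are bound to the statement is left to the caller.

```rust
/// Build a WHERE fragment and its bind parameters for repeated `--status`
/// values (OR semantics via `IN`). Returns `None` when no status was given,
/// so the caller adds no clause at all.
pub fn build_status_clause(statuses: &[String]) -> Option<(String, Vec<String>)> {
    if statuses.is_empty() {
        return None;
    }
    // One positional placeholder per value; values are bound, never inlined.
    let placeholders = vec!["?"; statuses.len()].join(", ");
    let clause = format!("i.status_name COLLATE NOCASE IN ({placeholders})");
    Some((clause, statuses.to_vec()))
}
```

Pushing the returned fragment onto the existing `where_clauses` list keeps it AND-composed with the other filters, while the `IN` list gives OR semantics across status values.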
5. **Medium: Add coverage telemetry (`seen`, `with_status`, `without_status`)**
Rationale: `enriched`/`cleared` alone is not enough to judge enrichment health. Coverage counters make it obvious whether a project truly has no statuses, is unsupported, or has unexpectedly low status population.
```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] `IngestProjectResult` gains `statuses_seen: usize` and `statuses_without_widget: usize`
+ [ ] Enrichment log includes `seen`, `enriched`, `cleared`, `without_widget`
@@ AC-10: Robot Sync Envelope (E2E)
- status_enrichment: { mode, reason, enriched, cleared, error }
+ status_enrichment: { mode, reason, seen, enriched, cleared, without_widget, error, partial_errors }
@@ File 6: src/ingestion/orchestrator.rs
+ result.statuses_seen = fetch_result.all_fetched_iids.len();
+ result.statuses_without_widget = result.statuses_seen.saturating_sub(result.statuses_enriched);
```
6. **Medium: Centralize color parsing/render decisions (single helper used by show/list)**
Rationale: Color parsing is duplicated in `show.rs` and `list.rs`, which invites drift and inconsistent behavior. One shared helper gives consistent fallback behavior and simpler tests.
```diff
@@ File 7: src/cli/commands/show.rs
- fn style_with_hex(...) { ...hex parse logic... }
+ use crate::cli::commands::color::style_with_hex;
@@ File 8: src/cli/commands/list.rs
- fn colored_cell_hex(...) { ...hex parse logic... }
+ use crate::cli::commands::color::colored_cell_hex;
@@ Files Changed (Summary)
+ `src/cli/commands/color.rs` (NEW) — shared hex parsing + styling helpers
- duplicated hex parsing blocks removed from show/list
```
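The shared hex-parsing core that both `style_with_hex` and `colored_cell_hex` could delegate to might look like this sketch. `parse_hex_color` is a hypothetical name, and accepting the value with or without a leading `#` is an assumption about the input format.

```rust
/// Parse a 6-digit hex color such as `#1f75cb` into RGB components.
/// Returns `None` for anything else so callers can fall back to unstyled text.
pub fn parse_hex_color(input: &str) -> Option<(u8, u8, u8)> {
    let trimmed = input.trim();
    // Assumption: the leading '#' is optional.
    let hex = trimmed.strip_prefix('#').unwrap_or(trimmed);
    // Validate up front: this also makes the byte slicing below safe and
    // rejects sign characters that `from_str_radix` would otherwise accept.
    if hex.len() != 6 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let r = u8::from_str_radix(&hex[0..2], 16).ok()?;
    let g = u8::from_str_radix(&hex[2..4], 16).ok()?;
    let b = u8::from_str_radix(&hex[4..6], 16).ok()?;
    Some((r, g, b))
}
```

Keeping the parsing separate from the styling means the fallback behavior for malformed colors is tested once rather than per command.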
---
If you want, I can produce a **single consolidated patch-style diff of the plan document itself**, with all section edits merged and ready to paste as iteration 7.

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -21,6 +21,10 @@ pub enum CorrectionRule {
SingleDashLongFlag,
CaseNormalization,
FuzzyFlag,
SubcommandAlias,
ValueNormalization,
ValueFuzzy,
FlagPrefix,
}
/// Result of the correction pass over raw args.
@@ -40,6 +44,7 @@ const GLOBAL_FLAGS: &[&str] = &[
"--robot",
"--json",
"--color",
"--icons",
"--quiet",
"--no-quiet",
"--verbose",
@@ -61,6 +66,7 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--assignee",
"--label",
"--milestone",
"--status",
"--since",
"--due-before",
"--has-due",
@@ -118,8 +124,10 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--no-docs",
"--no-events",
"--no-file-changes",
"--no-status",
"--dry-run",
"--no-dry-run",
"--timings",
],
),
(
@@ -134,6 +142,7 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--since",
"--updated-since",
"--limit",
"--fields",
"--explain",
"--no-explain",
"--fts-mode",
@@ -160,8 +169,9 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--project",
"--since",
"--depth",
"--expand-mentions",
"--no-mentions",
"--limit",
"--fields",
"--max-seeds",
"--max-entities",
"--max-evidence",
@@ -177,8 +187,39 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--since",
"--project",
"--limit",
"--fields",
"--detail",
"--no-detail",
"--as-of",
"--explain-score",
"--include-bots",
"--all-history",
],
),
("drift", &["--threshold", "--project"]),
(
"notes",
&[
"--limit",
"--fields",
"--format",
"--author",
"--note-type",
"--contains",
"--note-id",
"--gitlab-note-id",
"--discussion-id",
"--include-system",
"--for-issue",
"--for-mr",
"--project",
"--since",
"--until",
"--path",
"--resolution",
"--sort",
"--asc",
"--open",
],
),
(
@@ -189,10 +230,31 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--gitlab-url",
"--token-env-var",
"--projects",
"--default-project",
],
),
(
"file-history",
&[
"--project",
"--discussions",
"--no-follow-renames",
"--merged",
"--limit",
],
),
(
"trace",
&[
"--project",
"--discussions",
"--no-follow-renames",
"--limit",
],
),
("generate-docs", &["--full", "--project"]),
("completions", &[]),
("robot-docs", &["--brief"]),
(
"list",
&[
@@ -225,18 +287,47 @@ pub const ENUM_VALUES: &[(&str, &[&str])] = &[
("--state", &["opened", "closed", "merged", "locked", "all"]),
("--mode", &["lexical", "hybrid", "semantic"]),
("--sort", &["updated", "created", "iid"]),
("--type", &["issue", "mr", "discussion"]),
("--type", &["issue", "mr", "discussion", "note"]),
("--fts-mode", &["safe", "raw"]),
("--color", &["auto", "always", "never"]),
("--log-format", &["text", "json"]),
("--for", &["issue", "mr"]),
];
// ---------------------------------------------------------------------------
// Subcommand alias map (for forms clap aliases can't express)
// ---------------------------------------------------------------------------
/// Subcommand aliases for non-standard forms (underscores, no separators).
/// Clap `visible_alias`/`alias` handles hyphenated forms (`merge-requests`);
/// this map catches the rest.
const SUBCOMMAND_ALIASES: &[(&str, &str)] = &[
("merge_requests", "mrs"),
("merge_request", "mrs"),
("mergerequests", "mrs"),
("mergerequest", "mrs"),
("generate_docs", "generate-docs"),
("generatedocs", "generate-docs"),
("gendocs", "generate-docs"),
("gen-docs", "generate-docs"),
("robot_docs", "robot-docs"),
("robotdocs", "robot-docs"),
("sync_status", "status"),
("syncstatus", "status"),
("auth_test", "auth"),
("authtest", "auth"),
("file_history", "file-history"),
("filehistory", "file-history"),
];
// ---------------------------------------------------------------------------
// Correction thresholds
// ---------------------------------------------------------------------------
const FUZZY_FLAG_THRESHOLD: f64 = 0.8;
/// Stricter threshold for robot mode — only high-confidence corrections to
/// avoid misleading agents. Still catches obvious typos like `--projct`.
const FUZZY_FLAG_THRESHOLD_STRICT: f64 = 0.9;
// ---------------------------------------------------------------------------
// Core logic
@@ -296,20 +387,29 @@ fn valid_flags_for(subcommand: Option<&str>) -> Vec<&'static str> {
/// Run the pre-clap correction pass on raw args.
///
/// When `strict` is true (robot mode), only deterministic corrections are applied
/// (single-dash long flags, case normalization). Fuzzy matching is disabled to
/// prevent misleading agents with speculative corrections.
/// Three-phase pipeline:
/// - Phase A: Subcommand alias correction (case-insensitive alias map)
/// - Phase B: Per-arg flag corrections (single-dash, case, prefix, fuzzy)
/// - Phase C: Enum value normalization (case + fuzzy + prefix on known values)
///
/// When `strict` is true (robot mode), fuzzy matching uses a higher threshold
/// (0.9 vs 0.8) to avoid speculative corrections while still catching obvious
/// typos like `--projct` → `--project`.
///
/// Returns the (possibly modified) args and any corrections applied.
pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
let subcommand = detect_subcommand(&raw);
let valid = valid_flags_for(subcommand);
let mut corrected = Vec::with_capacity(raw.len());
let mut corrections = Vec::new();
// Phase A: Subcommand alias correction
let args = correct_subcommand(raw, &mut corrections);
// Phase B: Per-arg flag corrections
let valid = valid_flags_for(detect_subcommand(&args));
let mut corrected = Vec::with_capacity(args.len());
let mut past_terminator = false;
for arg in raw {
for arg in args {
// B1: Stop correcting after POSIX `--` option terminator
if arg == "--" {
past_terminator = true;
@@ -331,12 +431,177 @@ pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
}
}
// Phase C: Enum value normalization
normalize_enum_values(&mut corrected, &mut corrections);
CorrectionResult {
args: corrected,
corrections,
}
}
/// Phase A: Replace subcommand aliases with their canonical names.
///
/// Handles forms that can't be expressed as clap `alias`/`visible_alias`
/// (underscores, no-separator forms). Case-insensitive matching.
fn correct_subcommand(mut args: Vec<String>, corrections: &mut Vec<Correction>) -> Vec<String> {
// Find the subcommand position index, then check the alias map.
// Can't use iterators easily because we need to mutate args[i].
let mut skip_next = false;
let mut subcmd_idx = None;
for (i, arg) in args.iter().enumerate().skip(1) {
if skip_next {
skip_next = false;
continue;
}
if arg.starts_with('-') {
if arg.contains('=') {
continue;
}
if matches!(arg.as_str(), "--config" | "-c" | "--color" | "--log-format") {
skip_next = true;
}
continue;
}
subcmd_idx = Some(i);
break;
}
if let Some(i) = subcmd_idx
&& let Some((_, canonical)) = SUBCOMMAND_ALIASES
.iter()
.find(|(alias, _)| alias.eq_ignore_ascii_case(&args[i]))
{
corrections.push(Correction {
original: args[i].clone(),
corrected: (*canonical).to_string(),
rule: CorrectionRule::SubcommandAlias,
confidence: 1.0,
});
args[i] = (*canonical).to_string();
}
args
}
/// Phase C: Normalize enum values for flags with known valid values.
///
/// Handles both `--flag value` and `--flag=value` forms. Corrections are:
/// 1. Case normalization: `Opened` → `opened`
/// 2. Prefix expansion: `open` → `opened` (only if unambiguous)
/// 3. Fuzzy matching: `opend` → `opened`
fn normalize_enum_values(args: &mut [String], corrections: &mut Vec<Correction>) {
let mut i = 0;
while i < args.len() {
// Respect POSIX `--` option terminator — don't normalize values after it
if args[i] == "--" {
break;
}
// Handle --flag=value form
if let Some(eq_pos) = args[i].find('=') {
let flag = args[i][..eq_pos].to_string();
let value = args[i][eq_pos + 1..].to_string();
if let Some(valid_vals) = lookup_enum_values(&flag)
&& let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals)
{
let original = args[i].clone();
let corrected = format!("{flag}={corrected_val}");
args[i] = corrected.clone();
corrections.push(Correction {
original,
corrected,
rule: if is_case_only {
CorrectionRule::ValueNormalization
} else {
CorrectionRule::ValueFuzzy
},
confidence: 0.95,
});
}
i += 1;
continue;
}
// Handle --flag value form
if args[i].starts_with("--")
&& let Some(valid_vals) = lookup_enum_values(&args[i])
&& i + 1 < args.len()
&& !args[i + 1].starts_with('-')
{
let value = args[i + 1].clone();
if let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals) {
let original = args[i + 1].clone();
args[i + 1] = corrected_val.to_string();
corrections.push(Correction {
original,
corrected: corrected_val.to_string(),
rule: if is_case_only {
CorrectionRule::ValueNormalization
} else {
CorrectionRule::ValueFuzzy
},
confidence: 0.95,
});
}
i += 2;
continue;
}
i += 1;
}
}
/// Look up valid enum values for a flag (case-insensitive flag name match).
fn lookup_enum_values(flag: &str) -> Option<&'static [&'static str]> {
let lower = flag.to_lowercase();
ENUM_VALUES
.iter()
.find(|(f, _)| f.to_lowercase() == lower)
.map(|(_, vals)| *vals)
}
/// Try to normalize a value against a set of valid values.
///
/// Returns `Some((corrected, is_case_only))` if a correction is needed:
/// - `is_case_only = true` for pure case normalization
/// - `is_case_only = false` for prefix/fuzzy corrections
///
/// Returns `None` if the value is already valid or no match is found.
fn normalize_value(input: &str, valid_values: &[&str]) -> Option<(String, bool)> {
// Already valid (exact match)? No correction needed.
if valid_values.contains(&input) {
return None;
}
let lower = input.to_lowercase();
// Case-insensitive exact match
if let Some(&val) = valid_values.iter().find(|v| v.to_lowercase() == lower) {
return Some((val.to_string(), true));
}
// Prefix match (e.g., "open" → "opened") — only if unambiguous
let prefix_matches: Vec<&&str> = valid_values
.iter()
.filter(|v| v.starts_with(&*lower))
.collect();
if prefix_matches.len() == 1 {
return Some(((*prefix_matches[0]).to_string(), false));
}
// Fuzzy match
let best = valid_values
.iter()
.map(|v| (*v, jaro_winkler(&lower, v)))
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
if let Some((val, score)) = best
&& score >= 0.8
{
return Some((val.to_string(), false));
}
None
}
/// Clap built-in flags that should never be corrected. These are handled by clap
/// directly and are not in our GLOBAL_FLAGS registry.
const CLAP_BUILTINS: &[&str] = &["--help", "--version"];
@@ -455,10 +720,34 @@ fn try_correct(arg: &str, valid_flags: &[&str], strict: bool) -> Option<Correcti
});
}
// Rule 3: Fuzzy flag match — `--staate` -> `--state` (skip in strict mode)
if !strict
&& let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
&& score >= FUZZY_FLAG_THRESHOLD
// Rule 3: Prefix match — `--proj` -> `--project` (only if unambiguous)
let prefix_matches: Vec<&str> = valid_flags
.iter()
.filter(|f| f.starts_with(&*lower) && f.to_lowercase() != lower)
.copied()
.collect();
if prefix_matches.len() == 1 {
let matched = prefix_matches[0];
let corrected = match value_suffix {
Some(suffix) => format!("{matched}{suffix}"),
None => matched.to_string(),
};
return Some(Correction {
original: arg.to_string(),
corrected,
rule: CorrectionRule::FlagPrefix,
confidence: 0.95,
});
}
// Rule 4: Fuzzy flag match — higher threshold in strict/robot mode
let threshold = if strict {
FUZZY_FLAG_THRESHOLD_STRICT
} else {
FUZZY_FLAG_THRESHOLD
};
if let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
&& score >= threshold
{
let corrected = match value_suffix {
Some(suffix) => format!("{best_flag}{suffix}"),
@@ -532,6 +821,30 @@ pub fn format_teaching_note(correction: &Correction) -> String {
correction.corrected, correction.original
)
}
CorrectionRule::SubcommandAlias => {
format!(
"Use canonical command name: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::ValueNormalization => {
format!(
"Values are lowercase: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::ValueFuzzy => {
format!(
"Correct value spelling: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::FlagPrefix => {
format!(
"Use full flag name: {} (not {})",
correction.corrected, correction.original
)
}
}
}
@@ -715,17 +1028,20 @@ mod tests {
assert_eq!(result.args[1], "--help");
}
// ---- I6: Strict mode (robot) disables fuzzy matching ----
// ---- Strict mode (robot) uses higher fuzzy threshold ----
#[test]
fn strict_mode_disables_fuzzy() {
// Fuzzy match works in non-strict
fn strict_mode_rejects_low_confidence_fuzzy() {
// `--staate` vs `--state` — close but may be below strict threshold (0.9)
// The exact score depends on Jaro-Winkler; this tests that the strict
// threshold is higher than non-strict.
let non_strict = correct_args(args("lore --robot issues --staate opened"), false);
assert_eq!(non_strict.corrections.len(), 1);
assert_eq!(non_strict.corrections[0].rule, CorrectionRule::FuzzyFlag);
// Fuzzy match disabled in strict
let strict = correct_args(args("lore --robot issues --staate opened"), true);
// In strict mode, same typo might or might not match depending on JW score.
// We verify that at least wildly wrong flags are still rejected.
let strict = correct_args(args("lore --robot issues --xyzzy foo"), true);
assert!(strict.corrections.is_empty());
}
@@ -744,6 +1060,155 @@ mod tests {
assert_eq!(result.corrections[0].corrected, "--robot");
}
// ---- Subcommand alias correction ----
#[test]
fn subcommand_alias_merge_requests_underscore() {
let result = correct_args(args("lore --robot merge_requests -n 10"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::SubcommandAlias && c.corrected == "mrs")
);
assert!(result.args.contains(&"mrs".to_string()));
}
#[test]
fn subcommand_alias_mergerequests_no_sep() {
let result = correct_args(args("lore --robot mergerequests"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}
#[test]
fn subcommand_alias_generate_docs_underscore() {
let result = correct_args(args("lore generate_docs"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "generate-docs")
);
}
#[test]
fn subcommand_alias_case_insensitive() {
let result = correct_args(args("lore Merge_Requests"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}
#[test]
fn subcommand_alias_valid_command_untouched() {
let result = correct_args(args("lore issues -n 10"), false);
assert!(result.corrections.is_empty());
}
// ---- Enum value normalization ----
#[test]
fn value_case_normalization() {
let result = correct_args(args("lore issues --state Opened"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::ValueNormalization && c.corrected == "opened")
);
assert!(result.args.contains(&"opened".to_string()));
}
#[test]
fn value_case_normalization_eq_form() {
let result = correct_args(args("lore issues --state=Opened"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "--state=opened")
);
}
#[test]
fn value_prefix_expansion() {
// "open" is a unique prefix of "opened"
let result = correct_args(args("lore issues --state open"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "opened" && c.rule == CorrectionRule::ValueFuzzy)
);
}
#[test]
fn value_fuzzy_typo() {
let result = correct_args(args("lore issues --state opend"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "opened"));
}
#[test]
fn value_already_valid_untouched() {
let result = correct_args(args("lore issues --state opened"), false);
// No value corrections expected (flag corrections may still exist)
assert!(!result.corrections.iter().any(|c| matches!(
c.rule,
CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
)));
}
#[test]
fn value_mode_case() {
let result = correct_args(args("lore search --mode Hybrid query"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "hybrid"));
}
#[test]
fn value_normalization_respects_option_terminator() {
// Values after `--` are positional and must not be corrected
let result = correct_args(args("lore search -- --state Opened"), false);
assert!(!result.corrections.iter().any(|c| matches!(
c.rule,
CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
)));
assert_eq!(result.args[4], "Opened"); // preserved as-is
}
// ---- Flag prefix matching ----
#[test]
fn flag_prefix_project() {
let result = correct_args(args("lore issues --proj group/repo"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::FlagPrefix && c.corrected == "--project")
);
}
#[test]
fn flag_prefix_ambiguous_not_corrected() {
// --s could be --state, --since, --sort, --status — ambiguous
let result = correct_args(args("lore issues --s opened"), false);
assert!(
!result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::FlagPrefix)
);
}
#[test]
fn flag_prefix_with_eq_value() {
let result = correct_args(args("lore issues --proj=group/repo"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "--project=group/repo")
);
}
// ---- Teaching notes ----
#[test]
@@ -783,6 +1248,43 @@ mod tests {
assert!(note.contains("spelling"));
}
#[test]
fn teaching_note_subcommand_alias() {
let c = Correction {
original: "merge_requests".to_string(),
corrected: "mrs".to_string(),
rule: CorrectionRule::SubcommandAlias,
confidence: 1.0,
};
let note = format_teaching_note(&c);
assert!(note.contains("canonical"));
assert!(note.contains("mrs"));
}
#[test]
fn teaching_note_value_normalization() {
let c = Correction {
original: "Opened".to_string(),
corrected: "opened".to_string(),
rule: CorrectionRule::ValueNormalization,
confidence: 0.95,
};
let note = format_teaching_note(&c);
assert!(note.contains("lowercase"));
}
#[test]
fn teaching_note_flag_prefix() {
let c = Correction {
original: "--proj".to_string(),
corrected: "--project".to_string(),
rule: CorrectionRule::FlagPrefix,
confidence: 0.95,
};
let note = format_teaching_note(&c);
assert!(note.contains("full flag name"));
}
// ---- Post-clap suggestion helpers ----
#[test]

@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::{self, Theme};
use rusqlite::Connection;
use serde::Serialize;
@@ -178,27 +178,6 @@ fn count_notes(conn: &Connection, type_filter: Option<&str>) -> Result<CountResu
})
}
fn format_number(n: i64) -> String {
let (prefix, abs) = if n < 0 {
("-", n.unsigned_abs())
} else {
("", n.unsigned_abs())
};
let s = abs.to_string();
let chars: Vec<char> = s.chars().collect();
let mut result = String::from(prefix);
for (i, c) in chars.iter().enumerate() {
if i > 0 && (chars.len() - i).is_multiple_of(3) {
result.push(',');
}
result.push(*c);
}
result
}
#[derive(Serialize)]
struct CountJsonOutput {
ok: bool,
@@ -284,10 +263,10 @@ pub fn print_event_count_json(counts: &EventCounts, elapsed_ms: u64) {
pub fn print_event_count(counts: &EventCounts) {
println!(
"{:<20} {:>8} {:>8} {:>8}",
style("Event Type").cyan().bold(),
style("Issues").bold(),
style("MRs").bold(),
style("Total").bold()
Theme::info().bold().render("Event Type"),
Theme::bold().render("Issues"),
Theme::bold().render("MRs"),
Theme::bold().render("Total")
);
let state_total = counts.state_issue + counts.state_mr;
@@ -297,33 +276,33 @@ pub fn print_event_count(counts: &EventCounts) {
println!(
"{:<20} {:>8} {:>8} {:>8}",
"State events",
format_number(counts.state_issue as i64),
format_number(counts.state_mr as i64),
format_number(state_total as i64)
render::format_number(counts.state_issue as i64),
render::format_number(counts.state_mr as i64),
render::format_number(state_total as i64)
);
println!(
"{:<20} {:>8} {:>8} {:>8}",
"Label events",
format_number(counts.label_issue as i64),
format_number(counts.label_mr as i64),
format_number(label_total as i64)
render::format_number(counts.label_issue as i64),
render::format_number(counts.label_mr as i64),
render::format_number(label_total as i64)
);
println!(
"{:<20} {:>8} {:>8} {:>8}",
"Milestone events",
format_number(counts.milestone_issue as i64),
format_number(counts.milestone_mr as i64),
format_number(milestone_total as i64)
render::format_number(counts.milestone_issue as i64),
render::format_number(counts.milestone_mr as i64),
render::format_number(milestone_total as i64)
);
let total_issues = counts.state_issue + counts.label_issue + counts.milestone_issue;
let total_mrs = counts.state_mr + counts.label_mr + counts.milestone_mr;
println!(
"{:<20} {:>8} {:>8} {:>8}",
style("Total").bold(),
format_number(total_issues as i64),
format_number(total_mrs as i64),
style(format_number(counts.total() as i64)).bold()
Theme::bold().render("Total"),
render::format_number(total_issues as i64),
render::format_number(total_mrs as i64),
Theme::bold().render(&render::format_number(counts.total() as i64))
);
}
@@ -350,57 +329,56 @@ pub fn print_count_json(result: &CountResult, elapsed_ms: u64) {
}
pub fn print_count(result: &CountResult) {
let count_str = format_number(result.count);
let count_str = render::format_number(result.count);
if let Some(system_count) = result.system_count {
println!(
"{}: {} {}",
style(&result.entity).cyan(),
style(&count_str).bold(),
style(format!(
"{}: {:>10} {}",
Theme::info().render(&result.entity),
Theme::bold().render(&count_str),
Theme::dim().render(&format!(
"(excluding {} system)",
format_number(system_count)
render::format_number(system_count)
))
.dim()
);
} else {
println!(
"{}: {}",
style(&result.entity).cyan(),
style(&count_str).bold()
"{}: {:>10}",
Theme::info().render(&result.entity),
Theme::bold().render(&count_str)
);
}
if let Some(breakdown) = &result.state_breakdown {
println!(" opened: {}", format_number(breakdown.opened));
println!(" opened: {:>10}", render::format_number(breakdown.opened));
if let Some(merged) = breakdown.merged {
println!(" merged: {}", format_number(merged));
println!(" merged: {:>10}", render::format_number(merged));
}
println!(" closed: {}", format_number(breakdown.closed));
println!(" closed: {:>10}", render::format_number(breakdown.closed));
if let Some(locked) = breakdown.locked
&& locked > 0
{
println!(" locked: {}", format_number(locked));
println!(" locked: {:>10}", render::format_number(locked));
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::cli::render;
#[test]
fn format_number_handles_small_numbers() {
assert_eq!(format_number(0), "0");
assert_eq!(format_number(1), "1");
assert_eq!(format_number(100), "100");
assert_eq!(format_number(999), "999");
assert_eq!(render::format_number(0), "0");
assert_eq!(render::format_number(1), "1");
assert_eq!(render::format_number(100), "100");
assert_eq!(render::format_number(999), "999");
}
#[test]
fn format_number_adds_thousands_separators() {
assert_eq!(format_number(1000), "1,000");
assert_eq!(format_number(12345), "12,345");
assert_eq!(format_number(1234567), "1,234,567");
assert_eq!(render::format_number(1000), "1,000");
assert_eq!(render::format_number(12345), "12,345");
assert_eq!(render::format_number(1234567), "1,234,567");
}
}
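Note on `format_number`: this hunk removes the local helper and calls `render::format_number` instead, but the body of the shared version is not shown here. The sketch below is a standalone equivalent of the removed helper, assuming the shared one behaves the same way. It uses `% 3 == 0` rather than `is_multiple_of` so it also builds on older toolchains.

```rust
/// Format an integer with comma thousands separators.
/// Standalone sketch modeled on the helper removed in this hunk.
fn format_number(n: i64) -> String {
    // unsigned_abs avoids overflow on i64::MIN.
    let digits = n.unsigned_abs().to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3 + 1);
    if n < 0 {
        out.push('-');
    }
    for (i, c) in digits.chars().enumerate() {
        // Insert a separator whenever the remaining digit count is a multiple of 3.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(c);
    }
    out
}
```

Negative values keep their sign ahead of the grouped digits, a case the two tests above do not cover.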

@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::{Icons, Theme};
use serde::Serialize;
use crate::core::config::Config;
@@ -530,7 +530,7 @@ fn check_logging(config: Option<&Config>) -> LoggingCheck {
}
pub fn print_doctor_results(result: &DoctorResult) {
println!("\nlore doctor\n");
println!();
print_check("Config", &result.checks.config.result);
print_check("Database", &result.checks.database.result);
@@ -539,38 +539,61 @@ pub fn print_doctor_results(result: &DoctorResult) {
print_check("Ollama", &result.checks.ollama.result);
print_check("Logging", &result.checks.logging.result);
// Count statuses
let checks = [
&result.checks.config.result,
&result.checks.database.result,
&result.checks.gitlab.result,
&result.checks.projects.result,
&result.checks.ollama.result,
&result.checks.logging.result,
];
let passed = checks
.iter()
.filter(|c| c.status == CheckStatus::Ok)
.count();
let warnings = checks
.iter()
.filter(|c| c.status == CheckStatus::Warning)
.count();
let failed = checks
.iter()
.filter(|c| c.status == CheckStatus::Error)
.count();
println!();
let mut summary_parts = Vec::new();
if result.success {
let ollama_ok = result.checks.ollama.result.status == CheckStatus::Ok;
if ollama_ok {
println!("{}", style("Status: Ready").green());
} else {
println!(
"{} {}",
style("Status: Ready").green(),
style("(lexical search available, semantic search requires Ollama)").yellow()
);
}
summary_parts.push(Theme::success().render("Ready"));
} else {
println!("{}", style("Status: Not ready").red());
summary_parts.push(Theme::error().render("Not ready"));
}
summary_parts.push(format!("{passed} passed"));
if warnings > 0 {
summary_parts.push(Theme::warning().render(&format!("{warnings} warning")));
}
if failed > 0 {
summary_parts.push(Theme::error().render(&format!("{failed} failed")));
}
println!(" {}", summary_parts.join(" \u{b7} "));
println!();
}
fn print_check(name: &str, result: &CheckResult) {
let symbol = match result.status {
CheckStatus::Ok => style("").green(),
CheckStatus::Warning => style("").yellow(),
CheckStatus::Error => style("").red(),
let icon = match result.status {
CheckStatus::Ok => Theme::success().render(Icons::success()),
CheckStatus::Warning => Theme::warning().render(Icons::warning()),
CheckStatus::Error => Theme::error().render(Icons::error()),
};
let message = result.message.as_deref().unwrap_or("");
let message_styled = match result.status {
CheckStatus::Ok => message.to_string(),
CheckStatus::Warning => style(message).yellow().to_string(),
CheckStatus::Error => style(message).red().to_string(),
CheckStatus::Warning => Theme::warning().render(message),
CheckStatus::Error => Theme::error().render(message),
};
println!(" {symbol} {:<10} {message_styled}", name);
println!(" {icon} {:<10} {message_styled}", name);
}
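Note on the new doctor summary line: the hunk above replaces the "Status: Ready" banner with a single line of parts joined by a middle dot. A minimal plain-text sketch of that assembly follows. `summary_line` is a hypothetical name, the `CheckStatus` enum is redeclared locally so the block stands alone, and the `Theme` styling is left out.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum CheckStatus {
    Ok,
    Warning,
    Error,
}

/// Hypothetical helper: build the one-line summary from per-check statuses.
/// Warning and failure parts are only added when their count is non-zero.
fn summary_line(success: bool, statuses: &[CheckStatus]) -> String {
    let count = |wanted: CheckStatus| statuses.iter().filter(|s| **s == wanted).count();

    let mut parts = vec![if success {
        "Ready".to_string()
    } else {
        "Not ready".to_string()
    }];
    parts.push(format!("{} passed", count(CheckStatus::Ok)));

    let warnings = count(CheckStatus::Warning);
    if warnings > 0 {
        parts.push(format!("{warnings} warning"));
    }
    let failed = count(CheckStatus::Error);
    if failed > 0 {
        parts.push(format!("{failed} failed"));
    }
    // U+00B7 MIDDLE DOT, the same separator the hunk uses.
    parts.join(" \u{b7} ")
}
```

As in the hunk, the warning part is not pluralized, so two warnings would read "2 warning".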

src/cli/commands/drift.rs (new file, 650 lines)

@@ -0,0 +1,650 @@
use std::collections::HashMap;
use std::sync::LazyLock;
use regex::Regex;
use serde::Serialize;
use crate::cli::render::{Icons, Theme};
use crate::cli::robot::RobotMeta;
use crate::core::config::Config;
use crate::core::db::create_connection;
use crate::core::error::{LoreError, Result};
use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::ms_to_iso;
use crate::embedding::ollama::{OllamaClient, OllamaConfig};
use crate::embedding::similarity::cosine_similarity;
const BATCH_SIZE: usize = 32;
const WINDOW_SIZE: usize = 3;
const MIN_DESCRIPTION_LEN: usize = 20;
const MAX_NOTES: i64 = 200;
const TOP_TOPICS: usize = 3;
// ---------------------------------------------------------------------------
// Response types
// ---------------------------------------------------------------------------
#[derive(Debug, Serialize)]
pub struct DriftResponse {
pub entity: DriftEntity,
pub drift_detected: bool,
pub threshold: f32,
#[serde(skip_serializing_if = "Option::is_none")]
pub drift_point: Option<DriftPoint>,
pub drift_topics: Vec<String>,
pub similarity_curve: Vec<SimilarityPoint>,
pub recommendation: String,
}
#[derive(Debug, Serialize)]
pub struct DriftEntity {
pub entity_type: String,
pub iid: i64,
pub title: String,
}
#[derive(Debug, Serialize)]
pub struct DriftPoint {
pub note_index: usize,
pub note_id: i64,
pub author: String,
pub created_at: String,
pub similarity: f32,
}
#[derive(Debug, Serialize)]
pub struct SimilarityPoint {
pub note_index: usize,
pub similarity: f32,
pub author: String,
pub created_at: String,
}
// ---------------------------------------------------------------------------
// Internal row types
// ---------------------------------------------------------------------------
struct IssueInfo {
id: i64,
iid: i64,
title: String,
description: Option<String>,
}
struct NoteRow {
id: i64,
body: String,
author_username: String,
created_at: i64,
}
// ---------------------------------------------------------------------------
// Main entry point
// ---------------------------------------------------------------------------
pub async fn run_drift(
config: &Config,
entity_type: &str,
iid: i64,
threshold: f32,
project: Option<&str>,
) -> Result<DriftResponse> {
if entity_type != "issues" {
return Err(LoreError::Other(
"drift currently supports 'issues' only".to_string(),
));
}
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
let issue = find_issue(&conn, iid, project)?;
let description = match &issue.description {
Some(d) if d.len() >= MIN_DESCRIPTION_LEN => d.clone(),
_ => {
return Ok(DriftResponse {
entity: DriftEntity {
entity_type: entity_type.to_string(),
iid: issue.iid,
title: issue.title,
},
drift_detected: false,
threshold,
drift_point: None,
drift_topics: vec![],
similarity_curve: vec![],
recommendation: "Description too short for drift analysis.".to_string(),
});
}
};
let notes = fetch_notes(&conn, issue.id)?;
if notes.len() < WINDOW_SIZE {
return Ok(DriftResponse {
entity: DriftEntity {
entity_type: entity_type.to_string(),
iid: issue.iid,
title: issue.title,
},
drift_detected: false,
threshold,
drift_point: None,
drift_topics: vec![],
similarity_curve: vec![],
recommendation: format!(
"Only {} note(s) found; need at least {} for drift detection.",
notes.len(),
WINDOW_SIZE
),
});
}
// Build texts to embed: description first, then each note body.
let mut texts: Vec<String> = Vec::with_capacity(1 + notes.len());
texts.push(description.clone());
for note in &notes {
texts.push(note.body.clone());
}
let embeddings = embed_texts(config, &texts).await?;
let desc_embedding = &embeddings[0];
let note_embeddings = &embeddings[1..];
// Build similarity curve.
let similarity_curve: Vec<SimilarityPoint> = note_embeddings
.iter()
.enumerate()
.map(|(i, emb)| SimilarityPoint {
note_index: i,
similarity: cosine_similarity(desc_embedding, emb),
author: notes[i].author_username.clone(),
created_at: ms_to_iso(notes[i].created_at),
})
.collect();
// Detect drift via sliding window.
let (drift_detected, drift_point) = detect_drift(&similarity_curve, &notes, threshold);
// Extract drift topics.
let drift_topics = if drift_detected {
let drift_idx = drift_point.as_ref().map_or(0, |dp| dp.note_index);
extract_drift_topics(&description, &notes, drift_idx)
} else {
vec![]
};
let recommendation = if drift_detected {
let dp = drift_point.as_ref().unwrap();
format!(
"Discussion drifted at note {} by @{} (similarity {:.2}). Consider splitting into a new issue.",
dp.note_index, dp.author, dp.similarity
)
} else {
"Discussion remains on topic.".to_string()
};
Ok(DriftResponse {
entity: DriftEntity {
entity_type: entity_type.to_string(),
iid: issue.iid,
title: issue.title,
},
drift_detected,
threshold,
drift_point,
drift_topics,
similarity_curve,
recommendation,
})
}
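Note on the similarity curve: `run_drift` scores each note against the description with `cosine_similarity` from `crate::embedding::similarity`, whose body is not part of this diff. The sketch below is a textbook cosine similarity, shown only to make the curve values concrete. Returning `0.0` for empty, mismatched, or zero-length vectors is an assumption, not necessarily what the repository does.

```rust
/// Textbook cosine similarity between two equal-length vectors.
/// Assumption: degenerate inputs (empty, length mismatch, zero norm) yield 0.0.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}
```

Identical directions score near 1.0, orthogonal vectors near 0.0, and opposite directions near -1.0. Values at or below zero are why the human printer clamps them before drawing a bar.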
// ---------------------------------------------------------------------------
// DB helpers
// ---------------------------------------------------------------------------
fn find_issue(
conn: &rusqlite::Connection,
iid: i64,
project_filter: Option<&str>,
) -> Result<IssueInfo> {
let (sql, params): (&str, Vec<Box<dyn rusqlite::ToSql>>) = match project_filter {
Some(project) => {
let project_id = resolve_project(conn, project)?;
(
"SELECT i.id, i.iid, i.title, i.description
FROM issues i
WHERE i.iid = ? AND i.project_id = ?",
vec![Box::new(iid), Box::new(project_id)],
)
}
None => (
"SELECT i.id, i.iid, i.title, i.description
FROM issues i
WHERE i.iid = ?",
vec![Box::new(iid)],
),
};
let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let mut stmt = conn.prepare(sql)?;
let rows: Vec<IssueInfo> = stmt
.query_map(param_refs.as_slice(), |row| {
Ok(IssueInfo {
id: row.get(0)?,
iid: row.get(1)?,
title: row.get(2)?,
description: row.get(3)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
match rows.len() {
0 => Err(LoreError::NotFound(format!("Issue #{iid} not found"))),
1 => Ok(rows.into_iter().next().unwrap()),
_ => Err(LoreError::Ambiguous(format!(
"Issue #{iid} exists in multiple projects. Use --project to specify."
))),
}
}
fn fetch_notes(conn: &rusqlite::Connection, issue_id: i64) -> Result<Vec<NoteRow>> {
let mut stmt = conn.prepare(
"SELECT n.id, n.body, n.author_username, n.created_at
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
WHERE d.issue_id = ?
AND n.is_system = 0
AND LENGTH(n.body) >= 20
ORDER BY n.created_at ASC
LIMIT ?",
)?;
let notes: Vec<NoteRow> = stmt
.query_map(rusqlite::params![issue_id, MAX_NOTES], |row| {
Ok(NoteRow {
id: row.get(0)?,
body: row.get(1)?,
author_username: row.get(2)?,
created_at: row.get(3)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(notes)
}
// ---------------------------------------------------------------------------
// Embedding helper
// ---------------------------------------------------------------------------
async fn embed_texts(config: &Config, texts: &[String]) -> Result<Vec<Vec<f32>>> {
let ollama = OllamaClient::new(OllamaConfig {
base_url: config.embedding.base_url.clone(),
model: config.embedding.model.clone(),
timeout_secs: 60,
});
let mut all_embeddings: Vec<Vec<f32>> = Vec::with_capacity(texts.len());
for chunk in texts.chunks(BATCH_SIZE) {
let refs: Vec<&str> = chunk.iter().map(|s| s.as_str()).collect();
let batch_result = ollama.embed_batch(&refs).await?;
all_embeddings.extend(batch_result);
}
Ok(all_embeddings)
}
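Note on batching: `embed_texts` sends the texts to Ollama in chunks of `BATCH_SIZE` and concatenates the results in order, so `embeddings[0]` is still the description. The synchronous sketch below shows only that chunk-and-extend pattern. `fake_embed_batch` is a hypothetical stand-in for the real `embed_batch` call and returns a dummy one-element vector per text.

```rust
const BATCH_SIZE: usize = 32;

/// Hypothetical stand-in for an embedding backend: one dummy vector per input.
fn fake_embed_batch(batch: &[&str]) -> Vec<Vec<f32>> {
    batch.iter().map(|t| vec![t.len() as f32]).collect()
}

/// Embed texts in fixed-size batches, preserving input order.
/// Also returns the size of each batch so the chunking is observable.
fn embed_in_batches(texts: &[String]) -> (Vec<Vec<f32>>, Vec<usize>) {
    let mut all = Vec::with_capacity(texts.len());
    let mut batch_sizes = Vec::new();
    for chunk in texts.chunks(BATCH_SIZE) {
        let refs: Vec<&str> = chunk.iter().map(String::as_str).collect();
        batch_sizes.push(refs.len());
        all.extend(fake_embed_batch(&refs));
    }
    (all, batch_sizes)
}
```

With 70 inputs this produces batches of 32, 32, and 6, and the output vector has one entry per input in the original order.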
// ---------------------------------------------------------------------------
// Drift detection
// ---------------------------------------------------------------------------
fn detect_drift(
curve: &[SimilarityPoint],
notes: &[NoteRow],
threshold: f32,
) -> (bool, Option<DriftPoint>) {
if curve.len() < WINDOW_SIZE {
return (false, None);
}
for i in 0..=curve.len() - WINDOW_SIZE {
let window_avg: f32 = curve[i..i + WINDOW_SIZE]
.iter()
.map(|p| p.similarity)
.sum::<f32>()
/ WINDOW_SIZE as f32;
if window_avg < threshold {
return (
true,
Some(DriftPoint {
note_index: i,
note_id: notes[i].id,
author: notes[i].author_username.clone(),
created_at: ms_to_iso(notes[i].created_at),
similarity: curve[i].similarity,
}),
);
}
}
(false, None)
}
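Note on the sliding window: `detect_drift` reports the start index of the first window of `WINDOW_SIZE` notes whose mean similarity falls below the threshold. The sketch below restates that rule on a bare slice of similarities. `first_drift_index` is a hypothetical name, and the `DriftPoint` bookkeeping is omitted.

```rust
const WINDOW_SIZE: usize = 3;

/// Return the start index of the first window whose mean similarity is
/// below `threshold`, or None if no window qualifies (or the slice is
/// shorter than one window).
fn first_drift_index(similarities: &[f32], threshold: f32) -> Option<usize> {
    similarities
        .windows(WINDOW_SIZE)
        .position(|w| w.iter().sum::<f32>() / (WINDOW_SIZE as f32) < threshold)
}
```

Because the reported index is the start of the window, the note at that index can itself still be on topic. In the curve `[0.9, 0.8, 0.8, 0.05, 0.05]` with threshold 0.4, the result is index 2, whose own similarity is 0.8.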
// ---------------------------------------------------------------------------
// Topic extraction
// ---------------------------------------------------------------------------
static STOPWORDS: LazyLock<std::collections::HashSet<&'static str>> = LazyLock::new(|| {
[
"the", "a", "an", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had",
"do", "does", "did", "will", "would", "could", "should", "may", "might", "shall", "can",
"need", "dare", "ought", "used", "to", "of", "in", "for", "on", "with", "at", "by", "from",
"as", "into", "through", "during", "before", "after", "above", "below", "between", "out",
"off", "over", "under", "again", "further", "then", "once", "here", "there", "when",
"where", "why", "how", "all", "each", "every", "both", "few", "more", "most", "other",
"some", "such", "no", "not", "only", "own", "same", "so", "than", "too", "very", "just",
"because", "but", "and", "or", "if", "while", "about", "up", "it", "its", "this", "that",
"these", "those", "i", "me", "my", "we", "our", "you", "your", "he", "him", "his", "she",
"her", "they", "them", "their", "what", "which", "who", "whom", "also", "like", "get",
"got", "think", "know", "see", "make", "go", "one", "two", "new", "way",
]
.into_iter()
.collect()
});
fn tokenize(text: &str) -> Vec<String> {
let cleaned = strip_markdown(text);
cleaned
.split(|c: char| !c.is_alphanumeric() && c != '_')
.filter(|w| w.len() >= 3)
.map(|w| w.to_lowercase())
.filter(|w| !STOPWORDS.contains(w.as_str()))
.collect()
}
fn extract_drift_topics(description: &str, notes: &[NoteRow], drift_idx: usize) -> Vec<String> {
let desc_terms: std::collections::HashSet<String> = tokenize(description).into_iter().collect();
let mut freq: HashMap<String, usize> = HashMap::new();
for note in notes.iter().skip(drift_idx) {
for term in tokenize(&note.body) {
if !desc_terms.contains(&term) {
*freq.entry(term).or_insert(0) += 1;
}
}
}
let mut sorted: Vec<(String, usize)> = freq.into_iter().collect();
sorted.sort_by(|a, b| b.1.cmp(&a.1));
sorted
.into_iter()
.take(TOP_TOPICS)
.map(|(t, _)| t)
.collect()
}
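Note on topic extraction: `extract_drift_topics` counts terms that appear in the notes from the drift point onward but not in the description, then keeps the most frequent ones. The std-only sketch below follows the same steps with two labeled deviations. It uses a tiny illustrative stopword list and no markdown stripping, and it breaks frequency ties alphabetically so the output is deterministic. The function above sorts by count only, so equal-count terms may come out in any order. `top_new_terms` is a hypothetical name.

```rust
use std::collections::{HashMap, HashSet};

/// Lowercased alphanumeric/underscore tokens of length >= 3, minus a tiny
/// illustrative stopword list (an assumption, far shorter than the real one).
fn tokenize(text: &str) -> Vec<String> {
    const STOPWORDS: [&str; 6] = ["the", "and", "with", "also", "has", "for"];
    text.split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|w| w.len() >= 3)
        .map(str::to_lowercase)
        .filter(|w| !STOPWORDS.iter().any(|s| *s == w.as_str()))
        .collect()
}

/// Most frequent terms that occur in `notes` but not in `description`.
fn top_new_terms(description: &str, notes: &[&str], top_n: usize) -> Vec<String> {
    let known: HashSet<String> = tokenize(description).into_iter().collect();
    let mut freq: HashMap<String, usize> = HashMap::new();
    for note in notes {
        for term in tokenize(note) {
            if !known.contains(&term) {
                *freq.entry(term).or_insert(0) += 1;
            }
        }
    }
    let mut sorted: Vec<(String, usize)> = freq.into_iter().collect();
    // Highest count first; alphabetical order breaks ties deterministically.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted.into_iter().take(top_n).map(|(t, _)| t).collect()
}
```

On the description and two notes used in `test_extract_drift_topics_excludes_description_terms`, this sketch yields `database`, `migration`, `postgres`.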
// ---------------------------------------------------------------------------
// Markdown stripping
// ---------------------------------------------------------------------------
static RE_FENCED_CODE: LazyLock<Regex> =
LazyLock::new(|| Regex::new(r"(?s)```[^\n]*\n.*?```").unwrap());
static RE_INLINE_CODE: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"`[^`]+`").unwrap());
static RE_LINK: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"\[([^\]]+)\]\([^)]+\)").unwrap());
static RE_BLOCKQUOTE: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"(?m)^>\s?").unwrap());
static RE_HTML_TAG: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"<[^>]+>").unwrap());
fn strip_markdown(text: &str) -> String {
let text = RE_FENCED_CODE.replace_all(text, "");
let text = RE_INLINE_CODE.replace_all(&text, "");
let text = RE_LINK.replace_all(&text, "$1");
let text = RE_BLOCKQUOTE.replace_all(&text, "");
let text = RE_HTML_TAG.replace_all(&text, "");
text.into_owned()
}
// ---------------------------------------------------------------------------
// Printers
// ---------------------------------------------------------------------------
pub fn print_drift_human(response: &DriftResponse) {
let header = format!(
"Drift Analysis: {} #{}",
response.entity.entity_type, response.entity.iid
);
println!("{}", Theme::bold().render(&header));
println!("{}", "-".repeat(header.len().min(60)));
println!("Title: {}", response.entity.title);
println!("Threshold: {:.2}", response.threshold);
println!("Notes: {}", response.similarity_curve.len());
println!();
if response.drift_detected {
println!(
"{} {}",
Theme::error().render(Icons::error()),
Theme::error().bold().render("DRIFT DETECTED")
);
if let Some(dp) = &response.drift_point {
println!(
" At note #{} by @{} ({}) - similarity {:.2}",
dp.note_index, dp.author, dp.created_at, dp.similarity
);
}
if !response.drift_topics.is_empty() {
println!(" Topics: {}", response.drift_topics.join(", "));
}
} else {
println!(
"{} {}",
Theme::success().render(Icons::success()),
Theme::success().render("No drift detected")
);
}
println!();
println!("{}", response.recommendation);
if !response.similarity_curve.is_empty() {
println!();
println!("{}", Theme::bold().render("Similarity Curve:"));
for pt in &response.similarity_curve {
let bar_len = ((pt.similarity.max(0.0)) * 30.0) as usize;
let bar: String = "\u{2588}".repeat(bar_len);
println!(
" {:>3} {:.2} {} @{}",
pt.note_index, pt.similarity, bar, pt.author
);
}
}
}
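Note on the similarity bars: `print_drift_human` scales each similarity to a bar of at most 30 block characters and clamps negative values to zero first. A minimal sketch of that mapping follows. `similarity_bar` and `BAR_WIDTH` are hypothetical names introduced for illustration.

```rust
/// Hypothetical constant: bar length for a similarity of 1.0.
const BAR_WIDTH: f32 = 30.0;

/// Render a similarity as a run of U+2588 FULL BLOCK characters.
/// Negative similarities are clamped to an empty bar.
fn similarity_bar(similarity: f32) -> String {
    let len = (similarity.max(0.0) * BAR_WIDTH) as usize;
    "\u{2588}".repeat(len)
}
```

The float-to-integer cast truncates, so a similarity of 0.5 gives 15 blocks and anything below 1/30 gives none.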
pub fn print_drift_json(response: &DriftResponse, elapsed_ms: u64) {
let meta = RobotMeta { elapsed_ms };
let output = serde_json::json!({
"ok": true,
"data": response,
"meta": meta,
});
match serde_json::to_string(&output) {
Ok(json) => println!("{json}"),
Err(e) => eprintln!("Error serializing to JSON: {e}"),
}
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_detect_drift_when_divergent() {
let notes: Vec<NoteRow> = (0..6)
.map(|i| NoteRow {
id: i as i64,
body: format!("note {i}"),
author_username: "user".to_string(),
created_at: 1000 + i as i64,
})
.collect();
let curve: Vec<SimilarityPoint> = [0.9, 0.85, 0.8, 0.25, 0.2, 0.15]
.iter()
.enumerate()
.map(|(i, &sim)| SimilarityPoint {
note_index: i,
similarity: sim,
author: "user".to_string(),
created_at: ms_to_iso(1000 + i as i64),
})
.collect();
let (detected, point) = detect_drift(&curve, &notes, 0.4);
assert!(detected);
assert!(point.is_some());
}
#[test]
fn test_no_drift_consistent() {
let notes: Vec<NoteRow> = (0..5)
.map(|i| NoteRow {
id: i as i64,
body: format!("note {i}"),
author_username: "user".to_string(),
created_at: 1000 + i as i64,
})
.collect();
let curve: Vec<SimilarityPoint> = [0.85, 0.8, 0.75, 0.7, 0.65]
.iter()
.enumerate()
.map(|(i, &sim)| SimilarityPoint {
note_index: i,
similarity: sim,
author: "user".to_string(),
created_at: ms_to_iso(1000 + i as i64),
})
.collect();
let (detected, _) = detect_drift(&curve, &notes, 0.4);
assert!(!detected);
}
#[test]
fn test_drift_point_is_first_divergent() {
let notes: Vec<NoteRow> = (0..5)
.map(|i| NoteRow {
id: (i * 10) as i64,
body: format!("note {i}"),
author_username: format!("user{i}"),
created_at: 1000 + i as i64,
})
.collect();
// Window of 3: indices [0,1,2] avg=0.83, [1,2,3] avg=0.55, [2,3,4] avg=0.30
let curve: Vec<SimilarityPoint> = [0.9, 0.8, 0.8, 0.05, 0.05]
.iter()
.enumerate()
.map(|(i, &sim)| SimilarityPoint {
note_index: i,
similarity: sim,
author: format!("user{i}"),
created_at: ms_to_iso(1000 + i as i64),
})
.collect();
let (detected, point) = detect_drift(&curve, &notes, 0.4);
assert!(detected);
let dp = point.unwrap();
// Window [2,3,4] avg = (0.8+0.05+0.05)/3 = 0.3 < 0.4
// But [1,2,3] avg = (0.8+0.8+0.05)/3 = 0.55 >= 0.4, so first failing is index 2
assert_eq!(dp.note_index, 2);
assert_eq!(dp.note_id, 20);
}
#[test]
fn test_extract_drift_topics_excludes_description_terms() {
let description = "We need to fix the authentication flow for login users";
let notes = vec![
NoteRow {
id: 1,
body: "The database migration script is broken and needs postgres update"
.to_string(),
author_username: "dev".to_string(),
created_at: 1000,
},
NoteRow {
id: 2,
body: "The database connection pool also has migration issues with postgres"
.to_string(),
author_username: "dev".to_string(),
created_at: 2000,
},
];
let topics = extract_drift_topics(description, &notes, 0);
// "database", "migration", "postgres" should appear; "fix" should not (it's in description)
assert!(!topics.is_empty());
for t in &topics {
assert_ne!(t, "fix");
assert_ne!(t, "authentication");
assert_ne!(t, "login");
}
}
#[test]
fn test_strip_markdown_code_blocks() {
let input = "Before\n```rust\nfn main() {}\n```\nAfter";
let result = strip_markdown(input);
assert!(!result.contains("fn main"));
assert!(result.contains("Before"));
assert!(result.contains("After"));
}
#[test]
fn test_strip_markdown_preserves_text() {
let input = "Check [this link](https://example.com) and `inline code` for details";
let result = strip_markdown(input);
assert!(result.contains("this link"));
assert!(!result.contains("https://example.com"));
assert!(!result.contains("inline code"));
assert!(result.contains("details"));
}
#[test]
fn test_too_few_notes() {
let notes: Vec<NoteRow> = (0..2)
.map(|i| NoteRow {
id: i as i64,
body: format!("note {i}"),
author_username: "user".to_string(),
created_at: 1000 + i as i64,
})
.collect();
let curve: Vec<SimilarityPoint> = [0.1, 0.1]
.iter()
.enumerate()
.map(|(i, &sim)| SimilarityPoint {
note_index: i,
similarity: sim,
author: "user".to_string(),
created_at: ms_to_iso(1000 + i as i64),
})
.collect();
let (detected, _) = detect_drift(&curve, &notes, 0.4);
assert!(!detected);
}
}

@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::Theme;
use serde::Serialize;
use crate::Config;
@@ -96,16 +96,31 @@ pub async fn run_embed(
}
pub fn print_embed(result: &EmbedCommandResult) {
println!("{} Embedding complete", style("done").green().bold(),);
if result.docs_embedded == 0 && result.failed == 0 && result.skipped == 0 {
println!(
"\n {} nothing to embed",
Theme::success().bold().render("Embedding")
);
return;
}
println!(
" Embedded: {} documents ({} chunks)",
result.docs_embedded, result.chunks_embedded
"\n {} {} documents ({} chunks)",
Theme::success().bold().render("Embedded"),
Theme::bold().render(&result.docs_embedded.to_string()),
result.chunks_embedded
);
if result.failed > 0 {
println!(" Failed: {}", style(result.failed).red());
println!(
" {}",
Theme::error().render(&format!("{} failed", result.failed))
);
}
if result.skipped > 0 {
println!(" Skipped: {}", result.skipped);
println!(
" {}",
Theme::dim().render(&format!("{} skipped", result.skipped))
);
}
}

@@ -0,0 +1,334 @@
use serde::Serialize;
use crate::Config;
use crate::cli::render::{self, Icons, Theme};
use crate::core::db::create_connection;
use crate::core::error::Result;
use crate::core::file_history::resolve_rename_chain;
use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::ms_to_iso;
/// Maximum rename chain BFS depth.
const MAX_RENAME_HOPS: usize = 10;
/// A single MR that touched the file.
#[derive(Debug, Serialize)]
pub struct FileHistoryMr {
pub iid: i64,
pub title: String,
pub state: String,
pub author_username: String,
pub change_type: String,
pub merged_at_iso: Option<String>,
pub updated_at_iso: String,
pub merge_commit_sha: Option<String>,
pub web_url: Option<String>,
}
/// A DiffNote discussion snippet on the file.
#[derive(Debug, Serialize)]
pub struct FileDiscussion {
pub discussion_id: String,
pub author_username: String,
pub body_snippet: String,
pub path: String,
pub created_at_iso: String,
}
/// Full result of a file-history query.
#[derive(Debug, Serialize)]
pub struct FileHistoryResult {
pub path: String,
pub rename_chain: Vec<String>,
pub renames_followed: bool,
pub merge_requests: Vec<FileHistoryMr>,
pub discussions: Vec<FileDiscussion>,
pub total_mrs: usize,
pub paths_searched: usize,
}
/// Run the file-history query.
pub fn run_file_history(
config: &Config,
path: &str,
project: Option<&str>,
no_follow_renames: bool,
merged_only: bool,
include_discussions: bool,
limit: usize,
) -> Result<FileHistoryResult> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
let project_id = project.map(|p| resolve_project(&conn, p)).transpose()?;
// Resolve rename chain unless disabled
let (all_paths, renames_followed) = if no_follow_renames {
(vec![path.to_string()], false)
} else if let Some(pid) = project_id {
let chain = resolve_rename_chain(&conn, pid, path, MAX_RENAME_HOPS)?;
let followed = chain.len() > 1;
(chain, followed)
} else {
// Without a project scope, can't resolve renames (need project_id)
(vec![path.to_string()], false)
};
let paths_searched = all_paths.len();
// Build placeholders for IN clause
let placeholders: Vec<String> = (0..all_paths.len())
.map(|i| format!("?{}", i + 2))
.collect();
let in_clause = placeholders.join(", ");
let merged_filter = if merged_only {
" AND mr.state = 'merged'"
} else {
""
};
let project_filter = if project_id.is_some() {
"AND mfc.project_id = ?1"
} else {
""
};
let sql = format!(
"SELECT DISTINCT \
mr.iid, mr.title, mr.state, mr.author_username, \
mfc.change_type, mr.merged_at, mr.updated_at, mr.merge_commit_sha, mr.web_url \
FROM mr_file_changes mfc \
JOIN merge_requests mr ON mr.id = mfc.merge_request_id \
WHERE mfc.new_path IN ({in_clause}) {project_filter} {merged_filter} \
ORDER BY COALESCE(mr.merged_at, mr.updated_at) DESC \
LIMIT ?{}",
all_paths.len() + 2
);
let mut stmt = conn.prepare(&sql)?;
// Bind parameters: ?1 = project_id (or 0 placeholder), ?2..?N+1 = paths, ?N+2 = limit
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = Vec::new();
params.push(Box::new(project_id.unwrap_or(0)));
for p in &all_paths {
params.push(Box::new(p.clone()));
}
params.push(Box::new(limit as i64));
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let merge_requests: Vec<FileHistoryMr> = stmt
.query_map(param_refs.as_slice(), |row| {
let merged_at: Option<i64> = row.get(5)?;
let updated_at: i64 = row.get(6)?;
Ok(FileHistoryMr {
iid: row.get(0)?,
title: row.get(1)?,
state: row.get(2)?,
author_username: row.get(3)?,
change_type: row.get(4)?,
merged_at_iso: merged_at.map(ms_to_iso),
updated_at_iso: ms_to_iso(updated_at),
merge_commit_sha: row.get(7)?,
web_url: row.get(8)?,
})
})?
.filter_map(std::result::Result::ok)
.collect();
let total_mrs = merge_requests.len();
// Optionally fetch DiffNote discussions on this file
let discussions = if include_discussions && !merge_requests.is_empty() {
fetch_file_discussions(&conn, &all_paths, project_id)?
} else {
Vec::new()
};
Ok(FileHistoryResult {
path: path.to_string(),
rename_chain: all_paths,
renames_followed,
merge_requests,
discussions,
total_mrs,
paths_searched,
})
}
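Note on the numbered placeholders: `run_file_history` reserves `?1` for the project id, numbers the path placeholders from `?2`, and puts the limit at `?N+2`. The sketch below reproduces only that numbering so the resulting SQL fragments are easy to inspect. `build_placeholders` is a hypothetical helper, not part of the commit.

```rust
/// Hypothetical helper: build the IN-clause placeholders for `path_count`
/// paths (starting at ?2, since ?1 is reserved for the project id) and the
/// placeholder used for LIMIT.
fn build_placeholders(path_count: usize) -> (String, String) {
    let in_clause = (0..path_count)
        .map(|i| format!("?{}", i + 2))
        .collect::<Vec<_>>()
        .join(", ");
    let limit_placeholder = format!("?{}", path_count + 2);
    (in_clause, limit_placeholder)
}
```

For a rename chain of three paths the IN clause is `?2, ?3, ?4` and the limit binds to `?5`, which matches the binding order in the function above: project id (or the `0` placeholder), then the paths, then the limit.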
/// Fetch DiffNote discussions that reference the given file paths.
fn fetch_file_discussions(
conn: &rusqlite::Connection,
paths: &[String],
project_id: Option<i64>,
) -> Result<Vec<FileDiscussion>> {
let placeholders: Vec<String> = (0..paths.len()).map(|i| format!("?{}", i + 2)).collect();
let in_clause = placeholders.join(", ");
let project_filter = if project_id.is_some() {
"AND d.project_id = ?1"
} else {
""
};
let sql = format!(
"SELECT d.gitlab_discussion_id, n.author_username, n.body, n.position_new_path, n.created_at \
FROM notes n \
JOIN discussions d ON d.id = n.discussion_id \
WHERE n.position_new_path IN ({in_clause}) {project_filter} \
AND n.is_system = 0 \
ORDER BY n.created_at DESC \
LIMIT 50"
);
let mut stmt = conn.prepare(&sql)?;
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = Vec::new();
params.push(Box::new(project_id.unwrap_or(0)));
for p in paths {
params.push(Box::new(p.clone()));
}
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let discussions: Vec<FileDiscussion> = stmt
.query_map(param_refs.as_slice(), |row| {
let body: String = row.get(2)?;
let snippet = if body.len() > 200 {
format!("{}...", &body[..body.floor_char_boundary(200)])
} else {
body
};
let created_at: i64 = row.get(4)?;
Ok(FileDiscussion {
discussion_id: row.get(0)?,
author_username: row.get(1)?,
body_snippet: snippet,
path: row.get(3)?,
created_at_iso: ms_to_iso(created_at),
})
})?
.filter_map(std::result::Result::ok)
.collect();
Ok(discussions)
}
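Note on snippet truncation: `fetch_file_discussions` cuts long bodies at `floor_char_boundary(200)` so the slice never lands inside a multi-byte character. For toolchains where that method is not available, the same effect can be sketched with `is_char_boundary`. `snippet` is a hypothetical helper, and the byte limit is a parameter here rather than the fixed 200.

```rust
/// Hypothetical helper: truncate `body` to at most `max_bytes` bytes on a
/// UTF-8 character boundary and append "..." when anything was cut.
fn snippet(body: &str, max_bytes: usize) -> String {
    if body.len() <= max_bytes {
        return body.to_string();
    }
    // Walk back to the nearest boundary; index 0 is always a boundary,
    // so the loop terminates.
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}...", &body[..end])
}
```

Slicing at a fixed byte offset without this step would panic whenever the offset falls in the middle of a character.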
// ── Human output ────────────────────────────────────────────────────────────
pub fn print_file_history(result: &FileHistoryResult) {
// Header
let paths_info = if result.paths_searched > 1 {
format!(
" (via {} paths, {} MRs)",
result.paths_searched, result.total_mrs
)
} else {
format!(" ({} MRs)", result.total_mrs)
};
println!();
println!(
"{}",
Theme::bold().render(&format!("File History: {}{}", result.path, paths_info))
);
// Rename chain
if result.renames_followed && result.rename_chain.len() > 1 {
let chain_str: Vec<&str> = result.rename_chain.iter().map(String::as_str).collect();
println!(
" Rename chain: {}",
Theme::dim().render(&chain_str.join(" -> "))
);
}
if result.merge_requests.is_empty() {
println!(
"\n {} {}",
Icons::info(),
Theme::dim().render("No merge requests found touching this file.")
);
println!(
" {}",
Theme::dim().render("Hint: Run 'lore sync' to fetch MR file changes.")
);
println!();
return;
}
println!();
for mr in &result.merge_requests {
let (icon, state_style) = match mr.state.as_str() {
"merged" => (Icons::mr_merged(), Theme::accent()),
"opened" => (Icons::mr_opened(), Theme::success()),
"closed" => (Icons::mr_closed(), Theme::warning()),
_ => (Icons::mr_opened(), Theme::dim()),
};
let date = mr
.merged_at_iso
.as_deref()
.unwrap_or(mr.updated_at_iso.as_str())
.split('T')
.next()
.unwrap_or("");
println!(
" {} {} {} {} @{} {} {}",
icon,
Theme::accent().render(&format!("!{}", mr.iid)),
render::truncate(&mr.title, 50),
state_style.render(&mr.state),
mr.author_username,
date,
Theme::dim().render(&mr.change_type),
);
}
// Discussions
if !result.discussions.is_empty() {
println!(
"\n {} File discussions ({}):",
Icons::note(),
result.discussions.len()
);
for d in &result.discussions {
let date = d.created_at_iso.split('T').next().unwrap_or("");
println!(
" @{} ({}) [{}]: {}",
d.author_username,
date,
Theme::dim().render(&d.path),
d.body_snippet
);
}
}
println!();
}
// ── Robot (JSON) output ─────────────────────────────────────────────────────
pub fn print_file_history_json(result: &FileHistoryResult, elapsed_ms: u64) {
let output = serde_json::json!({
"ok": true,
"data": {
"path": result.path,
"rename_chain": if result.renames_followed { Some(&result.rename_chain) } else { None },
"merge_requests": result.merge_requests,
"discussions": if result.discussions.is_empty() { None } else { Some(&result.discussions) },
},
"meta": {
"elapsed_ms": elapsed_ms,
"total_mrs": result.total_mrs,
"renames_followed": result.renames_followed,
"paths_searched": result.paths_searched,
}
});
println!("{}", serde_json::to_string(&output).unwrap_or_default());
}
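
The snippet truncation in fetch_file_discussions cuts the note body at a byte offset that must not land inside a multi-byte UTF-8 character. A minimal sketch of that idea, assuming only the standard library; the helper name truncate_snippet is hypothetical and does not appear in the diff, and it uses is_char_boundary rather than floor_char_boundary:

```rust
/// Hypothetical helper (not part of the diff): return `body` unchanged when it
/// fits in `max_bytes`, otherwise cut it at the nearest char boundary at or
/// below `max_bytes` and append "...".
fn truncate_snippet(body: &str, max_bytes: usize) -> String {
    if body.len() <= max_bytes {
        return body.to_string();
    }
    // Walk back from the byte limit until we sit on a char boundary.
    // Offset 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}...", &body[..end])
}
```

Calling truncate_snippet(&body, 200) would give the same kind of result as the 200-byte cut in the query_map closure above, under the assumption that the body is valid UTF-8 (which a Rust String always is).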


@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::Theme;
use rusqlite::Connection;
use serde::Serialize;
use tracing::info;
@@ -39,6 +39,7 @@ pub fn run_generate_docs(
result.seeded += seed_dirty(&conn, SourceType::Issue, project_filter)?;
result.seeded += seed_dirty(&conn, SourceType::MergeRequest, project_filter)?;
result.seeded += seed_dirty(&conn, SourceType::Discussion, project_filter)?;
result.seeded += seed_dirty_notes(&conn, project_filter)?;
}
let regen =
@@ -67,6 +68,10 @@ fn seed_dirty(
SourceType::Issue => "issues",
SourceType::MergeRequest => "merge_requests",
SourceType::Discussion => "discussions",
SourceType::Note => {
// NOTE-2E: notes are seeded separately by seed_dirty_notes (needs is_system filter)
unreachable!("Note seeding handled by seed_dirty_notes, not seed_dirty")
}
};
let type_str = source_type.as_str();
let now = chrono::Utc::now().timestamp_millis();
@@ -125,25 +130,95 @@ fn seed_dirty(
Ok(total_seeded)
}
fn seed_dirty_notes(conn: &Connection, project_filter: Option<&str>) -> Result<usize> {
let now = chrono::Utc::now().timestamp_millis();
let mut total_seeded: usize = 0;
let mut last_id: i64 = 0;
loop {
let inserted = if let Some(project) = project_filter {
let project_id = resolve_project(conn, project)?;
conn.execute(
"INSERT INTO dirty_sources (source_type, source_id, queued_at, attempt_count, last_attempt_at, last_error, next_attempt_at)
SELECT 'note', id, ?1, 0, NULL, NULL, NULL
FROM notes WHERE id > ?2 AND project_id = ?3 AND is_system = 0 ORDER BY id LIMIT ?4
ON CONFLICT(source_type, source_id) DO NOTHING",
rusqlite::params![now, last_id, project_id, FULL_MODE_CHUNK_SIZE],
)?
} else {
conn.execute(
"INSERT INTO dirty_sources (source_type, source_id, queued_at, attempt_count, last_attempt_at, last_error, next_attempt_at)
SELECT 'note', id, ?1, 0, NULL, NULL, NULL
FROM notes WHERE id > ?2 AND is_system = 0 ORDER BY id LIMIT ?3
ON CONFLICT(source_type, source_id) DO NOTHING",
rusqlite::params![now, last_id, FULL_MODE_CHUNK_SIZE],
)?
};
if inserted == 0 {
break;
}
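// Advance the cursor with the same project filter the INSERT used; an
// unfiltered MAX(id) can lag behind the inserted rows and end the loop early.
let max_id: i64 = if let Some(project) = project_filter {
let project_id = resolve_project(conn, project)?;
conn.query_row(
"SELECT MAX(id) FROM (SELECT id FROM notes WHERE id > ?1 AND project_id = ?2 AND is_system = 0 ORDER BY id LIMIT ?3)",
rusqlite::params![last_id, project_id, FULL_MODE_CHUNK_SIZE],
|row| row.get(0),
)?
} else {
conn.query_row(
"SELECT MAX(id) FROM (SELECT id FROM notes WHERE id > ?1 AND is_system = 0 ORDER BY id LIMIT ?2)",
rusqlite::params![last_id, FULL_MODE_CHUNK_SIZE],
|row| row.get(0),
)?
};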
total_seeded += inserted;
last_id = max_id;
}
info!(
source_type = "note",
seeded = total_seeded,
"Seeded dirty_sources"
);
Ok(total_seeded)
}
pub fn print_generate_docs(result: &GenerateDocsResult) {
let mode = if result.full_mode {
"full"
} else {
"incremental"
};
if result.regenerated == 0 && result.errored == 0 {
println!(
"\n {} no documents to update ({})",
Theme::success().bold().render("Docs"),
mode
);
return;
}
// Headline
println!(
"{} Document generation complete ({})",
style("done").green().bold(),
"\n {} {} documents ({})",
Theme::success().bold().render("Generated"),
Theme::bold().render(&result.regenerated.to_string()),
mode
);
if result.full_mode {
println!(" Seeded: {}", result.seeded);
// Detail line: compact middle-dot format, zero-suppressed
let mut details: Vec<String> = Vec::new();
if result.full_mode && result.seeded > 0 {
details.push(format!("{} seeded", result.seeded));
}
if result.unchanged > 0 {
details.push(format!("{} unchanged", result.unchanged));
}
if !details.is_empty() {
println!(" {}", Theme::dim().render(&details.join(" \u{b7} ")));
}
println!(" Regenerated: {}", result.regenerated);
println!(" Unchanged: {}", result.unchanged);
if result.errored > 0 {
println!(" Errored: {}", style(result.errored).red());
println!(
" {}",
Theme::error().render(&format!("{} errored", result.errored))
);
}
}
@@ -186,3 +261,81 @@ pub fn print_generate_docs_json(result: &GenerateDocsResult, elapsed_ms: u64) {
};
println!("{}", serde_json::to_string(&output).unwrap());
}
#[cfg(test)]
mod tests {
use std::path::Path;
use crate::core::db::{create_connection, run_migrations};
use super::*;
fn setup_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url) VALUES (1, 100, 'group/project', 'https://gitlab.com/group/project')",
[],
).unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 1, 'Test', 'opened', 1000, 2000, 3000)",
[],
).unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
[],
).unwrap();
conn
}
fn insert_note(conn: &Connection, id: i64, gitlab_id: i64, is_system: bool) {
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (?1, ?2, 1, 1, 'alice', 'note body', 1000, 2000, 3000, ?3)",
rusqlite::params![id, gitlab_id, is_system as i32],
).unwrap();
}
#[test]
fn test_full_seed_includes_notes() {
let conn = setup_db();
insert_note(&conn, 1, 101, false);
insert_note(&conn, 2, 102, false);
insert_note(&conn, 3, 103, false);
insert_note(&conn, 4, 104, true); // system note — should be excluded
let seeded = seed_dirty_notes(&conn, None).unwrap();
assert_eq!(seeded, 3);
let count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(count, 3);
}
#[test]
fn test_note_document_count_stable_after_second_generate_docs_full() {
let conn = setup_db();
insert_note(&conn, 1, 101, false);
insert_note(&conn, 2, 102, false);
let first = seed_dirty_notes(&conn, None).unwrap();
assert_eq!(first, 2);
// Second run should be idempotent (ON CONFLICT DO NOTHING)
let second = seed_dirty_notes(&conn, None).unwrap();
assert_eq!(second, 0);
let count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(count, 2);
}
}
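
seed_dirty_notes walks the notes table in id order, one chunk at a time, and relies on ON CONFLICT DO NOTHING to stay idempotent. The sketch below models that pattern in memory, assuming the notes are already sorted by id. The Note struct and seed_in_chunks are hypothetical stand-ins, and a BTreeSet plays the role of dirty_sources. Unlike the function in the diff, this sketch stops when a chunk comes back empty rather than when nothing was inserted.

```rust
use std::collections::BTreeSet;

/// Hypothetical in-memory stand-in for a row of the `notes` table.
struct Note {
    id: i64,
    is_system: bool,
}

/// Seed every non-system note id into `dirty`, `chunk_size` rows at a time.
/// Assumes `notes` is sorted by ascending id. Returns how many ids were
/// newly inserted.
fn seed_in_chunks(notes: &[Note], dirty: &mut BTreeSet<i64>, chunk_size: usize) -> usize {
    let mut last_id: i64 = 0;
    let mut total = 0;
    loop {
        // Models: WHERE id > last_id AND is_system = 0 ORDER BY id LIMIT chunk_size
        let chunk: Vec<i64> = notes
            .iter()
            .filter(|n| n.id > last_id && !n.is_system)
            .map(|n| n.id)
            .take(chunk_size)
            .collect();
        // An empty chunk means the cursor has passed the last matching row.
        let Some(&max_id) = chunk.last() else { break };
        for id in chunk {
            // Models ON CONFLICT DO NOTHING: duplicates are not counted.
            if dirty.insert(id) {
                total += 1;
            }
        }
        // Keyset cursor: the next chunk starts strictly after this id.
        last_id = max_id;
    }
    total
}
```

Running it twice over the same notes returns 0 the second time, which is the property test_note_document_count_stable_after_second_generate_docs_full checks against the real table.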


@@ -1,7 +1,7 @@
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use console::style;
use crate::cli::render::Theme;
use indicatif::{ProgressBar, ProgressStyle};
use rusqlite::Connection;
use serde::Serialize;
@@ -44,6 +44,38 @@ pub struct IngestResult {
pub resource_events_failed: usize,
pub mr_diffs_fetched: usize,
pub mr_diffs_failed: usize,
pub status_enrichment_errors: usize,
pub status_enrichment_projects: Vec<ProjectStatusEnrichment>,
pub project_summaries: Vec<ProjectSummary>,
}
/// Per-project summary for display in stage completion sub-rows.
#[derive(Debug, Default)]
pub struct ProjectSummary {
pub path: String,
pub items_upserted: usize,
pub discussions_synced: usize,
pub events_fetched: usize,
pub events_failed: usize,
pub statuses_enriched: usize,
pub statuses_seen: usize,
pub status_errors: usize,
pub mr_diffs_fetched: usize,
pub mr_diffs_failed: usize,
}
/// Per-project status enrichment result, collected during ingestion.
pub struct ProjectStatusEnrichment {
pub path: String,
pub mode: String,
pub reason: Option<String>,
pub seen: usize,
pub enriched: usize,
pub cleared: usize,
pub without_widget: usize,
pub partial_errors: usize,
pub first_partial_error: Option<String>,
pub error: Option<String>,
}
#[derive(Debug, Default, Clone, Serialize)]
@@ -278,7 +310,7 @@ async fn run_ingest_inner(
if display.show_text {
println!(
"{}",
style("Full sync: resetting cursors to fetch all data...").yellow()
Theme::warning().render("Full sync: resetting cursors to fetch all data...")
);
}
for (local_project_id, _, path) in &projects {
@@ -326,7 +358,10 @@ async fn run_ingest_inner(
"merge requests"
};
if display.show_text {
println!("{}", style(format!("Ingesting {type_label}...")).blue());
println!(
"{}",
Theme::info().render(&format!("Ingesting {type_label}..."))
);
println!();
}
@@ -370,11 +405,11 @@ async fn run_ingest_inner(
let s = multi.add(ProgressBar::new_spinner());
s.set_style(
ProgressStyle::default_spinner()
.template("{spinner:.blue} {msg}")
.template("{spinner:.cyan} {msg}")
.unwrap(),
);
s.set_message(format!("Fetching {type_label} from {path}..."));
s.enable_steady_tick(std::time::Duration::from_millis(100));
s.enable_steady_tick(std::time::Duration::from_millis(60));
s
};
@@ -385,12 +420,13 @@ async fn run_ingest_inner(
b.set_style(
ProgressStyle::default_bar()
.template(
" {spinner:.blue} {prefix:.cyan} Syncing discussions [{bar:30.cyan/dim}] {pos}/{len}",
" {spinner:.dim} {prefix:.cyan} Syncing discussions [{bar:30.cyan/dark_gray}] {pos}/{len} {per_sec:.dim} {eta:.dim}",
)
.unwrap()
.progress_chars("=> "),
.progress_chars(crate::cli::render::Icons::progress_chars()),
);
b.set_prefix(path.clone());
b.enable_steady_tick(std::time::Duration::from_millis(60));
b
};
@@ -427,7 +463,7 @@ async fn run_ingest_inner(
spinner_clone.finish_and_clear();
let agg_total = agg_disc_total_clone.fetch_add(total, Ordering::Relaxed) + total;
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(format!(
"Syncing discussions... (0/{agg_total})"
));
@@ -447,7 +483,7 @@ async fn run_ingest_inner(
spinner_clone.finish_and_clear();
let agg_total = agg_disc_total_clone.fetch_add(total, Ordering::Relaxed) + total;
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(format!(
"Syncing discussions... (0/{agg_total})"
));
@@ -468,11 +504,11 @@ async fn run_ingest_inner(
disc_bar_clone.set_length(total as u64);
disc_bar_clone.set_style(
ProgressStyle::default_bar()
.template(" {spinner:.blue} {prefix:.cyan} Fetching resource events [{bar:30.cyan/dim}] {pos}/{len}")
.template(" {spinner:.dim} {prefix:.cyan} Fetching resource events [{bar:30.cyan/dark_gray}] {pos}/{len} {per_sec:.dim} {eta:.dim}")
.unwrap()
.progress_chars("=> "),
.progress_chars(crate::cli::render::Icons::progress_chars()),
);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
agg_events_total_clone.fetch_add(total, Ordering::Relaxed);
stage_bar_clone.set_message(
"Fetching resource events...".to_string()
@@ -492,7 +528,7 @@ async fn run_ingest_inner(
ProgressEvent::ClosesIssuesFetchStarted { total } => {
disc_bar_clone.reset();
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(
"Fetching closes-issues references...".to_string()
);
@@ -506,7 +542,7 @@ async fn run_ingest_inner(
ProgressEvent::MrDiffsFetchStarted { total } => {
disc_bar_clone.reset();
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(
"Fetching MR file changes...".to_string()
);
@@ -517,6 +553,43 @@ async fn run_ingest_inner(
ProgressEvent::MrDiffsFetchComplete { .. } => {
disc_bar_clone.finish_and_clear();
}
ProgressEvent::StatusEnrichmentStarted { total } => {
spinner_clone.finish_and_clear();
disc_bar_clone.reset();
disc_bar_clone.set_length(total as u64);
disc_bar_clone.set_style(
ProgressStyle::default_bar()
.template(" {spinner:.dim} {prefix:.cyan} Statuses [{bar:30.cyan/dark_gray}] {pos}/{len} {per_sec:.dim} {eta:.dim}")
.unwrap()
.progress_chars(crate::cli::render::Icons::progress_chars()),
);
disc_bar_clone.set_prefix(path_for_cb.clone());
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(
"Enriching work item statuses...".to_string()
);
}
ProgressEvent::StatusEnrichmentPageFetched { items_so_far } => {
disc_bar_clone.set_position(items_so_far as u64);
stage_bar_clone.set_message(format!(
"Enriching work item statuses... ({items_so_far} fetched)"
));
}
ProgressEvent::StatusEnrichmentWriting { total } => {
disc_bar_clone.set_message(format!("Writing {total} statuses..."));
stage_bar_clone.set_message(format!(
"Writing {total} work item statuses..."
));
}
ProgressEvent::StatusEnrichmentComplete { enriched, cleared } => {
disc_bar_clone.finish_and_clear();
if enriched > 0 || cleared > 0 {
stage_bar_clone.set_message(format!(
"Status enrichment: {enriched} enriched, {cleared} cleared"
));
}
}
ProgressEvent::StatusEnrichmentSkipped => {}
})
};
@@ -587,6 +660,36 @@ async fn run_ingest_inner(
total.issues_skipped_discussion_sync += result.issues_skipped_discussion_sync;
total.resource_events_fetched += result.resource_events_fetched;
total.resource_events_failed += result.resource_events_failed;
if result.status_enrichment_error.is_some() {
total.status_enrichment_errors += 1;
}
total
.status_enrichment_projects
.push(ProjectStatusEnrichment {
path: path.clone(),
mode: result.status_enrichment_mode.clone(),
reason: result.status_unsupported_reason.clone(),
seen: result.statuses_seen,
enriched: result.statuses_enriched,
cleared: result.statuses_cleared,
without_widget: result.statuses_without_widget,
partial_errors: result.partial_error_count,
first_partial_error: result.first_partial_error.clone(),
error: result.status_enrichment_error.clone(),
});
total.project_summaries.push(ProjectSummary {
path: path.clone(),
items_upserted: result.issues_upserted,
discussions_synced: result.discussions_fetched,
events_fetched: result.resource_events_fetched,
events_failed: result.resource_events_failed,
statuses_enriched: result.statuses_enriched,
statuses_seen: result.statuses_seen,
status_errors: result.partial_error_count
+ usize::from(result.status_enrichment_error.is_some()),
mr_diffs_fetched: 0,
mr_diffs_failed: 0,
});
}
Ok(ProjectIngestOutcome::Mrs {
ref path,
@@ -610,6 +713,18 @@ async fn run_ingest_inner(
total.resource_events_failed += result.resource_events_failed;
total.mr_diffs_fetched += result.mr_diffs_fetched;
total.mr_diffs_failed += result.mr_diffs_failed;
total.project_summaries.push(ProjectSummary {
path: path.clone(),
items_upserted: result.mrs_upserted,
discussions_synced: result.discussions_fetched,
events_fetched: result.resource_events_fetched,
events_failed: result.resource_events_failed,
statuses_enriched: 0,
statuses_seen: 0,
status_errors: 0,
mr_diffs_fetched: result.mr_diffs_fetched,
mr_diffs_failed: result.mr_diffs_failed,
});
}
}
}
@@ -680,7 +795,7 @@ fn print_issue_project_summary(path: &str, result: &IngestProjectResult) {
println!(
" {}: {} issues fetched{}",
style(path).cyan(),
Theme::info().render(path),
result.issues_upserted,
labels_str
);
@@ -695,7 +810,7 @@ fn print_issue_project_summary(path: &str, result: &IngestProjectResult) {
if result.issues_skipped_discussion_sync > 0 {
println!(
" {} unchanged issues (discussion sync skipped)",
style(result.issues_skipped_discussion_sync).dim()
Theme::dim().render(&result.issues_skipped_discussion_sync.to_string())
);
}
}
@@ -718,7 +833,7 @@ fn print_mr_project_summary(path: &str, result: &IngestMrProjectResult) {
println!(
" {}: {} MRs fetched{}{}",
style(path).cyan(),
Theme::info().render(path),
result.mrs_upserted,
labels_str,
assignees_str
@@ -742,7 +857,7 @@ fn print_mr_project_summary(path: &str, result: &IngestMrProjectResult) {
if result.mrs_skipped_discussion_sync > 0 {
println!(
" {} unchanged MRs (discussion sync skipped)",
style(result.mrs_skipped_discussion_sync).dim()
Theme::dim().render(&result.mrs_skipped_discussion_sync.to_string())
);
}
}
@@ -767,6 +882,25 @@ struct IngestJsonData {
notes_upserted: usize,
resource_events_fetched: usize,
resource_events_failed: usize,
#[serde(skip_serializing_if = "Vec::is_empty")]
status_enrichment: Vec<StatusEnrichmentJson>,
status_enrichment_errors: usize,
}
#[derive(Serialize)]
struct StatusEnrichmentJson {
mode: String,
#[serde(skip_serializing_if = "Option::is_none")]
reason: Option<String>,
seen: usize,
enriched: usize,
cleared: usize,
without_widget: usize,
partial_errors: usize,
#[serde(skip_serializing_if = "Option::is_none")]
first_partial_error: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
error: Option<String>,
}
#[derive(Serialize)]
@@ -814,6 +948,22 @@ pub fn print_ingest_summary_json(result: &IngestResult, elapsed_ms: u64) {
)
};
let status_enrichment: Vec<StatusEnrichmentJson> = result
.status_enrichment_projects
.iter()
.map(|p| StatusEnrichmentJson {
mode: p.mode.clone(),
reason: p.reason.clone(),
seen: p.seen,
enriched: p.enriched,
cleared: p.cleared,
without_widget: p.without_widget,
partial_errors: p.partial_errors,
first_partial_error: p.first_partial_error.clone(),
error: p.error.clone(),
})
.collect();
let output = IngestJsonOutput {
ok: true,
data: IngestJsonData {
@@ -826,6 +976,8 @@ pub fn print_ingest_summary_json(result: &IngestResult, elapsed_ms: u64) {
notes_upserted: result.notes_upserted,
resource_events_fetched: result.resource_events_fetched,
resource_events_failed: result.resource_events_failed,
status_enrichment,
status_enrichment_errors: result.status_enrichment_errors,
},
meta: RobotMeta { elapsed_ms },
};
@@ -839,21 +991,19 @@ pub fn print_ingest_summary(result: &IngestResult) {
if result.resource_type == "issues" {
println!(
"{}",
style(format!(
Theme::success().render(&format!(
"Total: {} issues, {} discussions, {} notes",
result.issues_upserted, result.discussions_fetched, result.notes_upserted
))
.green()
);
if result.issues_skipped_discussion_sync > 0 {
println!(
"{}",
style(format!(
Theme::dim().render(&format!(
"Skipped discussion sync for {} unchanged issues.",
result.issues_skipped_discussion_sync
))
.dim()
);
}
} else {
@@ -865,24 +1015,22 @@ pub fn print_ingest_summary(result: &IngestResult) {
println!(
"{}",
style(format!(
Theme::success().render(&format!(
"Total: {} MRs, {} discussions, {} notes{}",
result.mrs_upserted,
result.discussions_fetched,
result.notes_upserted,
diffnotes_str
))
.green()
);
if result.mrs_skipped_discussion_sync > 0 {
println!(
"{}",
style(format!(
Theme::dim().render(&format!(
"Skipped discussion sync for {} unchanged MRs.",
result.mrs_skipped_discussion_sync
))
.dim()
);
}
}
@@ -903,8 +1051,8 @@ pub fn print_ingest_summary(result: &IngestResult) {
pub fn print_dry_run_preview(preview: &DryRunPreview) {
println!(
"{} {}",
style("Dry Run Preview").cyan().bold(),
style("(no changes will be made)").yellow()
Theme::info().bold().render("Dry Run Preview"),
Theme::warning().render("(no changes will be made)")
);
println!();
@@ -914,27 +1062,31 @@ pub fn print_dry_run_preview(preview: &DryRunPreview) {
"merge requests"
};
println!(" Resource type: {}", style(type_label).white().bold());
println!(" Resource type: {}", Theme::bold().render(type_label));
println!(
" Sync mode: {}",
if preview.sync_mode == "full" {
style("full (all data will be re-fetched)").yellow()
Theme::warning().render("full (all data will be re-fetched)")
} else {
style("incremental (only changes since last sync)").green()
Theme::success().render("incremental (only changes since last sync)")
}
);
println!(" Projects: {}", preview.projects.len());
println!();
println!("{}", style("Projects to sync:").cyan().bold());
println!("{}", Theme::info().bold().render("Projects to sync:"));
for project in &preview.projects {
let sync_status = if !project.has_cursor {
style("initial sync").yellow()
Theme::warning().render("initial sync")
} else {
style("incremental").green()
Theme::success().render("incremental")
};
println!(" {} ({})", style(&project.path).white(), sync_status);
println!(
" {} ({})",
Theme::bold().render(&project.path),
sync_status
);
println!(" Existing {}: {}", type_label, project.existing_count);
if let Some(ref last_synced) = project.last_synced {


@@ -10,6 +10,7 @@ pub struct InitInputs {
pub gitlab_url: String,
pub token_env_var: String,
pub project_paths: Vec<String>,
pub default_project: Option<String>,
}
pub struct InitOptions {
@@ -23,6 +24,7 @@ pub struct InitResult {
pub data_dir: String,
pub user: UserInfo,
pub projects: Vec<ProjectInfo>,
pub default_project: Option<String>,
}
pub struct UserInfo {
@@ -104,6 +106,20 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
));
}
// Validate default_project matches one of the configured project paths
if let Some(ref dp) = inputs.default_project {
let matched = inputs.project_paths.iter().any(|p| {
p.eq_ignore_ascii_case(dp)
|| p.to_ascii_lowercase()
.ends_with(&format!("/{}", dp.to_ascii_lowercase()))
});
if !matched {
return Err(LoreError::Other(format!(
"defaultProject '{dp}' does not match any configured project path"
)));
}
}
if let Some(parent) = config_path.parent() {
fs::create_dir_all(parent)?;
}
@@ -118,6 +134,7 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
.iter()
.map(|p| ProjectConfig { path: p.clone() })
.collect(),
default_project: inputs.default_project.clone(),
};
let config_json = serde_json::to_string_pretty(&config)?;
@@ -152,5 +169,6 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
data_dir: data_dir.display().to_string(),
user,
projects: validated_projects.into_iter().map(|(p, _)| p).collect(),
default_project: inputs.default_project,
})
}
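
The default_project check in run_init accepts either a full project path or a trailing path segment, compared without regard to ASCII case. A small sketch of that rule in isolation, assuming only the standard library; the function name matches_default_project is hypothetical and is not part of the diff:

```rust
/// Hypothetical helper (not part of the diff): true when `default_project`
/// equals one of `project_paths` ignoring ASCII case, or when a path ends
/// with "/<default_project>" ignoring ASCII case.
fn matches_default_project(project_paths: &[String], default_project: &str) -> bool {
    let wanted = default_project.to_ascii_lowercase();
    // The leading slash keeps "ject" from matching "group/project".
    let suffix = format!("/{wanted}");
    project_paths.iter().any(|p| {
        let p = p.to_ascii_lowercase();
        p == wanted || p.ends_with(&suffix)
    })
}
```

With paths group/project and Other/Tool, the inputs project, GROUP/PROJECT and tool would all match, while ject and missing would be rejected, which is when run_init returns the defaultProject error.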

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,7 +1,9 @@
pub mod auth_test;
pub mod count;
pub mod doctor;
pub mod drift;
pub mod embed;
pub mod file_history;
pub mod generate_docs;
pub mod ingest;
pub mod init;
@@ -12,6 +14,7 @@ pub mod stats;
pub mod sync;
pub mod sync_status;
pub mod timeline;
pub mod trace;
pub mod who;
pub use auth_test::run_auth_test;
@@ -20,7 +23,9 @@ pub use count::{
run_count_events,
};
pub use doctor::{DoctorChecks, print_doctor_results, run_doctor};
pub use drift::{DriftResponse, print_drift_human, print_drift_json, run_drift};
pub use embed::{print_embed, print_embed_json, run_embed};
pub use file_history::{print_file_history, print_file_history_json, run_file_history};
pub use generate_docs::{print_generate_docs, print_generate_docs_json, run_generate_docs};
pub use ingest::{
DryRunPreview, IngestDisplay, print_dry_run_preview, print_dry_run_preview_json,
@@ -28,8 +33,10 @@ pub use ingest::{
};
pub use init::{InitInputs, InitOptions, InitResult, run_init};
pub use list::{
ListFilters, MrListFilters, open_issue_in_browser, open_mr_in_browser, print_list_issues,
print_list_issues_json, print_list_mrs, print_list_mrs_json, run_list_issues, run_list_mrs,
ListFilters, MrListFilters, NoteListFilters, open_issue_in_browser, open_mr_in_browser,
print_list_issues, print_list_issues_json, print_list_mrs, print_list_mrs_json,
print_list_notes, print_list_notes_csv, print_list_notes_json, print_list_notes_jsonl,
query_notes, run_list_issues, run_list_mrs,
};
pub use search::{
SearchCliFilters, SearchResponse, print_search_results, print_search_results_json, run_search,
@@ -42,4 +49,5 @@ pub use stats::{print_stats, print_stats_json, run_stats};
pub use sync::{SyncOptions, SyncResult, print_sync, print_sync_json, run_sync};
pub use sync_status::{print_sync_status, print_sync_status_json, run_sync_status};
pub use timeline::{TimelineParams, print_timeline, print_timeline_json_with_meta, run_timeline};
pub use trace::{parse_trace_path, print_trace, print_trace_json};
pub use who::{WhoRun, print_who_human, print_who_json, run_who};


@@ -1,4 +1,6 @@
use console::style;
use std::collections::HashMap;
use crate::cli::render::Theme;
use serde::Serialize;
use crate::Config;
@@ -8,9 +10,10 @@ use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::{ms_to_iso, parse_since};
use crate::documents::SourceType;
use crate::embedding::ollama::{OllamaClient, OllamaConfig};
use crate::search::{
FtsQueryMode, PathFilter, SearchFilters, apply_filters, get_result_snippet, rank_rrf,
search_fts,
FtsQueryMode, HybridResult, PathFilter, SearchFilters, SearchMode, get_result_snippet,
search_fts, search_hybrid,
};
#[derive(Debug, Serialize)]
@@ -58,7 +61,7 @@ pub struct SearchCliFilters {
pub limit: usize,
}
pub fn run_search(
pub async fn run_search(
config: &Config,
query: &str,
cli_filters: SearchCliFilters,
@@ -71,15 +74,18 @@ pub fn run_search(
let mut warnings: Vec<String> = Vec::new();
// Determine actual mode: vector search requires embeddings, which need async + Ollama.
// Until hybrid/semantic are wired up, we run lexical and warn if the user asked for more.
let actual_mode = "lexical";
if requested_mode != "lexical" {
warnings.push(format!(
"Requested mode '{}' is not yet available; falling back to lexical search.",
requested_mode
));
}
let actual_mode = SearchMode::parse(requested_mode).unwrap_or(SearchMode::Hybrid);
let client = if actual_mode != SearchMode::Lexical {
let ollama_cfg = &config.embedding;
Some(OllamaClient::new(OllamaConfig {
base_url: ollama_cfg.base_url.clone(),
model: ollama_cfg.model.clone(),
..OllamaConfig::default()
}))
} else {
None
};
let doc_count: i64 = conn
.query_row("SELECT COUNT(*) FROM documents", [], |row| row.get(0))
@@ -89,7 +95,7 @@ pub fn run_search(
warnings.push("No documents indexed. Run 'lore generate-docs' first.".to_string());
return Ok(SearchResponse {
query: query.to_string(),
mode: actual_mode.to_string(),
mode: actual_mode.as_str().to_string(),
total_results: 0,
results: vec![],
warnings,
@@ -151,52 +157,54 @@ pub fn run_search(
limit: cli_filters.limit,
};
let requested = filters.clamp_limit();
let top_k = if filters.has_any_filter() {
(requested * 50).clamp(200, 1500)
} else {
(requested * 10).clamp(50, 1500)
};
let fts_results = search_fts(&conn, query, top_k, fts_mode)?;
let fts_tuples: Vec<(i64, f64)> = fts_results
.iter()
.map(|r| (r.document_id, r.bm25_score))
.collect();
let snippet_map: std::collections::HashMap<i64, String> = fts_results
// Run FTS separately for snippet extraction (search_hybrid doesn't return snippets).
let snippet_top_k = filters
.clamp_limit()
.checked_mul(10)
.unwrap_or(500)
.clamp(50, 1500);
let fts_results = search_fts(&conn, query, snippet_top_k, fts_mode)?;
let snippet_map: HashMap<i64, String> = fts_results
.iter()
.map(|r| (r.document_id, r.snippet.clone()))
.collect();
let ranked = rank_rrf(&[], &fts_tuples);
let ranked_ids: Vec<i64> = ranked.iter().map(|r| r.document_id).collect();
// search_hybrid handles recall sizing, RRF ranking, and filter application internally.
let (hybrid_results, mut hybrid_warnings) = search_hybrid(
&conn,
client.as_ref(),
query,
actual_mode,
&filters,
fts_mode,
)
.await?;
warnings.append(&mut hybrid_warnings);
let filtered_ids = apply_filters(&conn, &ranked_ids, &filters)?;
if filtered_ids.is_empty() {
if hybrid_results.is_empty() {
return Ok(SearchResponse {
query: query.to_string(),
mode: actual_mode.to_string(),
mode: actual_mode.as_str().to_string(),
total_results: 0,
results: vec![],
warnings,
});
}
let hydrated = hydrate_results(&conn, &filtered_ids)?;
let ranked_ids: Vec<i64> = hybrid_results.iter().map(|r| r.document_id).collect();
let hydrated = hydrate_results(&conn, &ranked_ids)?;
let rrf_map: std::collections::HashMap<i64, &crate::search::RrfResult> =
ranked.iter().map(|r| (r.document_id, r)).collect();
let hybrid_map: HashMap<i64, &HybridResult> =
hybrid_results.iter().map(|r| (r.document_id, r)).collect();
let mut results: Vec<SearchResultDisplay> = Vec::with_capacity(hydrated.len());
for row in &hydrated {
let rrf = rrf_map.get(&row.document_id);
let hr = hybrid_map.get(&row.document_id);
let fts_snippet = snippet_map.get(&row.document_id).map(|s| s.as_str());
let snippet = get_result_snippet(fts_snippet, &row.content_text);
let explain_data = if explain {
rrf.map(|r| ExplainData {
hr.map(|r| ExplainData {
vector_rank: r.vector_rank,
fts_rank: r.fts_rank,
rrf_score: r.rrf_score,
@@ -217,14 +225,14 @@ pub fn run_search(
labels: row.labels.clone(),
paths: row.paths.clone(),
snippet,
score: rrf.map(|r| r.normalized_score).unwrap_or(0.0),
score: hr.map(|r| r.score).unwrap_or(0.0),
explain: explain_data,
});
}
Ok(SearchResponse {
query: query.to_string(),
mode: actual_mode.to_string(),
mode: actual_mode.as_str().to_string(),
total_results: results.len(),
results,
warnings,
@@ -301,67 +309,97 @@ fn parse_json_array(json: &str) -> Vec<String> {
.collect()
}
/// Render FTS snippet with `<mark>` tags as terminal highlight style.
fn render_snippet(snippet: &str) -> String {
let mut result = String::new();
let mut remaining = snippet;
while let Some(start) = remaining.find("<mark>") {
result.push_str(&Theme::muted().render(&remaining[..start]));
remaining = &remaining[start + 6..];
if let Some(end) = remaining.find("</mark>") {
let highlighted = &remaining[..end];
result.push_str(&Theme::highlight().render(highlighted));
remaining = &remaining[end + 7..];
}
}
result.push_str(&Theme::muted().render(remaining));
result
}
pub fn print_search_results(response: &SearchResponse) {
if !response.warnings.is_empty() {
for w in &response.warnings {
eprintln!("{} {}", style("Warning:").yellow(), w);
eprintln!("{} {}", Theme::warning().render("Warning:"), w);
}
}
if response.results.is_empty() {
println!("No results found for '{}'", style(&response.query).bold());
println!(
"No results found for '{}'",
Theme::bold().render(&response.query)
);
return;
}
println!(
"{} results for '{}' ({})",
response.total_results,
style(&response.query).bold(),
response.mode
"\n {} results for '{}' {}",
Theme::bold().render(&response.total_results.to_string()),
Theme::bold().render(&response.query),
Theme::muted().render(&response.mode)
);
println!();
for (i, result) in response.results.iter().enumerate() {
let type_prefix = match result.source_type.as_str() {
"issue" => "Issue",
"merge_request" => "MR",
"discussion" => "Discussion",
_ => &result.source_type,
println!();
let type_badge = match result.source_type.as_str() {
"issue" => Theme::issue_ref().render("issue"),
"merge_request" => Theme::mr_ref().render(" mr "),
"discussion" => Theme::info().render(" disc"),
"note" => Theme::muted().render(" note"),
_ => Theme::muted().render(&format!("{:>5}", &result.source_type)),
};
// Title line: rank, type badge, title
println!(
"[{}] {} - {} (score: {:.2})",
i + 1,
style(type_prefix).cyan(),
result.title,
result.score
" {:>3}. {} {}",
Theme::muted().render(&(i + 1).to_string()),
type_badge,
Theme::bold().render(&result.title)
);
if let Some(ref url) = result.url {
println!(" {}", style(url).dim());
// Metadata: project, author, labels — compact middle-dot line
let sep = Theme::muted().render(" \u{b7} ");
let mut meta_parts: Vec<String> = Vec::new();
meta_parts.push(Theme::muted().render(&result.project_path));
if let Some(ref author) = result.author {
meta_parts.push(Theme::username().render(&format!("@{author}")));
}
println!(
" {} | {}",
style(&result.project_path).dim(),
result
.author
.as_deref()
.map(|a| format!("@{}", a))
.unwrap_or_default()
);
if !result.labels.is_empty() {
println!(" Labels: {}", result.labels.join(", "));
let label_str = if result.labels.len() <= 3 {
result.labels.join(", ")
} else {
format!(
"{} +{}",
result.labels[..2].join(", "),
result.labels.len() - 2
)
};
meta_parts.push(Theme::muted().render(&label_str));
}
println!(" {}", meta_parts.join(&sep));
let clean_snippet = result.snippet.replace("<mark>", "").replace("</mark>", "");
println!(" {}", style(clean_snippet).dim());
// Snippet with highlight styling
let rendered = render_snippet(&result.snippet);
println!(" {rendered}");
if let Some(ref explain) = result.explain {
println!(
" {} fts_rank={} rrf_score={:.6}",
style("[explain]").magenta(),
" {} vec={} fts={} rrf={:.4}",
Theme::accent().render("explain"),
explain
.vector_rank
.map(|r| r.to_string())
.unwrap_or_else(|| "-".into()),
explain
.fts_rank
.map(|r| r.to_string())
@@ -369,9 +407,9 @@ pub fn print_search_results(response: &SearchResponse) {
explain.rrf_score
);
}
println!();
}
println!();
}
#[derive(Serialize)]
@@ -386,11 +424,20 @@ struct SearchMeta {
elapsed_ms: u64,
}
pub fn print_search_results_json(response: &SearchResponse, elapsed_ms: u64) {
pub fn print_search_results_json(
response: &SearchResponse,
elapsed_ms: u64,
fields: Option<&[String]>,
) {
let output = SearchJsonOutput {
ok: true,
data: response,
meta: SearchMeta { elapsed_ms },
};
println!("{}", serde_json::to_string(&output).unwrap());
let mut value = serde_json::to_value(&output).unwrap();
if let Some(f) = fields {
let expanded = crate::cli::robot::expand_fields_preset(f, "search");
crate::cli::robot::filter_fields(&mut value, "results", &expanded);
}
println!("{}", serde_json::to_string(&value).unwrap());
}
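
Review note: the `<mark>` handling in the new `render_snippet` can be exercised without the `Theme` type. The sketch below is a minimal standalone version, not the code in this commit. It assumes the same scanning loop but returns `(text, is_highlighted)` segments, and the names `split_marked` and `render_plain` are hypothetical. Unlike the diffed function, it skips empty unhighlighted segments, and it marks highlights with square brackets instead of terminal styling.

```rust
/// Split an FTS snippet into (text, is_highlighted) segments,
/// using the same scan as `render_snippet`: find `<mark>`, then `</mark>`.
fn split_marked(snippet: &str) -> Vec<(&str, bool)> {
    const OPEN: &str = "<mark>";
    const CLOSE: &str = "</mark>";

    let mut segments = Vec::new();
    let mut remaining = snippet;

    while let Some(start) = remaining.find(OPEN) {
        // Text before the opening tag is not highlighted.
        if start > 0 {
            segments.push((&remaining[..start], false));
        }
        remaining = &remaining[start + OPEN.len()..];

        // A closed tag yields a highlighted segment. An unclosed tag is
        // dropped and the rest falls through as plain text.
        if let Some(end) = remaining.find(CLOSE) {
            segments.push((&remaining[..end], true));
            remaining = &remaining[end + CLOSE.len()..];
        }
    }

    if !remaining.is_empty() {
        segments.push((remaining, false));
    }
    segments
}

/// Stand-in for the themed renderer: wrap highlighted text in brackets.
fn render_plain(snippet: &str) -> String {
    split_marked(snippet)
        .into_iter()
        .map(|(text, highlighted)| {
            if highlighted {
                format!("[{text}]")
            } else {
                text.to_string()
            }
        })
        .collect()
}

fn main() {
    println!("{}", render_plain("fix <mark>login</mark> bug"));
}
```

Separating the scan from the styling keeps the tag parsing testable without a terminal. An unclosed `<mark>` loses its tag and the trailing text is treated as unhighlighted, which matches the loop in the diff above.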

Some files were not shown because too many files have changed in this diff