194 Commits

Author SHA1 Message Date
teernisse
171260a772 feat(cli): implement 'lore trace' command (bd-2n4, bd-9dd)
Gate 5 Code Trace - Tier 1 (API-only, no git blame).
Answers 'Why was this code introduced?' by building
file -> MR -> issue -> discussion chains.

New files:
- src/core/trace.rs: run_trace() query logic with rename-aware
  path resolution, entity_reference-based issue linking, and
  DiffNote discussion extraction
- src/core/trace_tests.rs: 7 unit tests for query logic
- src/cli/commands/trace.rs: CLI command with human output,
  robot JSON output, and :line suffix parsing (5 tests)

Human output shows full content (no truncation).
Robot JSON truncates discussion bodies to 500 chars for token efficiency.

Wiring:
- TraceArgs + Commands::Trace in cli/mod.rs
- handle_trace in main.rs
- VALID_COMMANDS + robot-docs manifest entry
- COMMAND_FLAGS autocorrect registry entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:57:21 -05:00
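The :line suffix parsing mentioned in 171260a772 can be pictured with the sketch below. It is a minimal illustration, not the code in src/cli/commands/trace.rs: the function name split_line_suffix and the rule that only a positive integer counts as a line number are assumptions.

```rust
/// Split an argument such as `src/main.rs:42` into a path and an optional line.
/// Hypothetical helper: the name and the exact rules are assumptions.
fn split_line_suffix(arg: &str) -> (&str, Option<u32>) {
    // Look at the text after the last ':' only.
    if let Some((path, suffix)) = arg.rsplit_once(':') {
        // Treat the suffix as a line number only when it parses as a positive
        // integer and a non-empty path remains in front of it.
        if !path.is_empty() {
            if let Ok(line) = suffix.parse::<u32>() {
                if line > 0 {
                    return (path, Some(line));
                }
            }
        }
    }
    // Anything else is passed through unchanged as a plain path.
    (arg, None)
}
```

Splitting on the last colon keeps paths that contain an earlier colon intact, and a non-numeric suffix simply leaves the argument as a plain path.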
teernisse
a1bca10408 feat(cli): implement 'lore file-history' command (bd-z94)
Adds file-history command showing which MRs touched a file, with:
- Rename chain resolution via BFS (resolve_rename_chain from bd-1yx)
- DiffNote discussion snippets with --discussions flag
- --merged filter, --no-follow-renames, -n limit
- Human output with styled MR list and rename chain display
- Robot JSON output with {ok, data, meta} envelope
- Autocorrect registry and robot-docs manifest entry
- Fixes pre-existing --no-status missing from sync autocorrect registry
2026-02-17 12:57:56 -05:00
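The BFS rename chain resolution named in a1bca10408 might look roughly like the sketch below. The real resolve_rename_chain is not shown in this log, so the function name rename_chain, the (old, new) edge list, and the choice to follow renames in both directions are all assumptions.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Return every path reachable from `start` through rename edges.
/// Hypothetical shape: `renames` is a list of (old_path, new_path) pairs.
fn rename_chain(start: &str, renames: &[(&str, &str)]) -> BTreeSet<String> {
    // Index the edges in both directions so older and newer names are found.
    let mut adjacent: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(old, new) in renames {
        adjacent.entry(old).or_default().push(new);
        adjacent.entry(new).or_default().push(old);
    }

    let mut seen = BTreeSet::new();
    seen.insert(start.to_string());
    let mut queue = VecDeque::from([start]);

    // Standard breadth-first search: visit each path once.
    while let Some(path) = queue.pop_front() {
        for &next in adjacent.get(path).into_iter().flatten() {
            if seen.insert(next.to_string()) {
                queue.push_back(next);
            }
        }
    }
    seen
}
```

The `seen` set doubles as the result and as the cycle guard, so a rename loop (a.rs to b.rs and back) terminates.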
teernisse
491dc52864 release: v0.8.3
2026-02-16 10:29:52 -05:00
teernisse
b9063aa17a feat(cli): add --no-status flag to skip GraphQL status enrichment during sync
2026-02-16 10:29:11 -05:00
teernisse
fc0d9cb1d3 feat(sync): colored stage output, functional sub-rows, and error visibility
Overhaul the sync command's human output to use semantic colors and a
cleaner rendering architecture. The changes fall into four areas:

Stage lines: Replace direct finish_stage() calls with an
emit_stage_line/emit_stage_block pattern that clears the spinner first,
then prints static lines via MultiProgress::suspend. Stage icons are
now color-coded green (success) or yellow (warning) via color_icon().
A separate "Status" stage line now appears after Issues, summarizing
work-item status enrichment across all projects.

Sub-rows: Replace the imperative print_issue_sub_rows/print_mr_sub_rows
functions with functional issue_sub_rows(), mr_sub_rows(), and new
status_sub_rows() that return Vec<String>. Project paths use
Theme::muted(), error/failure counts use Theme::warning(), and
separators use the dim middle-dot style. Sub-rows are printed atomically
with their parent stage line to avoid interleaving with spinners.

Summary: In print_sync(), counts now use Theme::info().bold() for visual
pop, detail-line separators are individually styled (dim middle-dot),
and a new "Sync completed with issues" headline appears when any stage
had failures. Document errors and embedding failures are surfaced in
both the doc-parts line and the errors line.

Tests: Full coverage for append_failures, summarize_status_enrichment,
should_print_timings, issue_sub_rows, mr_sub_rows, and status_sub_rows.
2026-02-16 09:43:36 -05:00
teernisse
c8b47bf8f8 feat(cli): add --timings flag and enrich error tracking fields
Add -t/--timings flag to the sync subcommand, allowing users to opt
into a per-stage timing breakdown after the sync summary. Wire the flag
through main.rs into print_sync() which passes it to the new
should_print_timings() gate.

Enrich the data structures that flow through the sync pipeline so
downstream renderers have full error visibility:

- ProjectSummary gains status_errors (issue-side status enrichment
  failures per project)
- ProjectStatusEnrichment gains path (project path for sub-row display)
- SyncResult gains documents_errored and embedding_failed so the
  summary can surface doc-gen and embed failures separately
- Autocorrect table updated with --timings for fuzzy flag matching
2026-02-16 09:43:22 -05:00
teernisse
a570327a6b refactor(progress): extract format_stage_line with themed styling
Pull the line-formatting logic out of finish_stage() into a standalone
public format_stage_line() so that sync.rs can build stage lines without
needing a live ProgressBar (e.g. for static multi-line blocks printed
after the spinner is cleared).

The new function applies Theme::info().bold() to the label and
Theme::timing() to the elapsed column, giving every stage line
consistent color treatment. finish_stage() now delegates to it.

Includes a unit test asserting the formatted output contains the
expected icon, label, summary, and elapsed components.
2026-02-16 09:43:13 -05:00
teernisse
eef73decb5 fix(cli): timeline tag width, test env isolation, and logging verbosity
Miscellaneous fixes across CLI and core modules:

- Timeline: widen TAG_WIDTH from 10 to 11 to accommodate longer event
  type labels without truncation
- render.rs: save and restore the LORE_ICONS env var in the glyph_mode
  test so its value neither leaks into nor is clobbered by other tests
  that set LORE_ICONS
- logging.rs: adjust verbose=1 to info level (was debug), verbose=2 to
  debug — this reduces noise at -v while keeping -vv as the full debug
  experience
- issues.rs, merge_requests.rs: use infodebug! macro consistently for
  ingestion summary logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:42 -05:00
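The verbosity change in eef73decb5 amounts to a small mapping from the -v count to a log level. The sketch below is illustrative only: the function name is hypothetical, and the levels for 0 and for 3 or more are assumptions, since the commit states only the 1 and 2 cases.

```rust
/// Map the number of -v flags to a log level name.
/// Hypothetical helper; only the 1 -> info and 2 -> debug arms come from the commit.
fn level_for_verbosity(verbose: u8) -> &'static str {
    match verbose {
        0 => "warn",  // assumption: the quiet default is not stated in the commit
        1 => "info",  // -v (was debug before this change)
        _ => "debug", // -vv; treating 3+ the same way is an assumption
    }
}
```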
teernisse
bb6660178c feat(sync): per-project breakdown, status enrichment progress bars, and summary polish
Add per-project detail rows beneath stage completion lines during multi-project
syncs, showing itemized counts (issues/MRs, discussions, events, statuses, diffs)
for each project. Previously, only aggregate totals were visible, making it hard
to diagnose which project contributed what during a sync.

Status enrichment gets proper progress bars replacing the old spinner-only
display: StatusEnrichmentStarted now carries a total count so the CLI can
render a determinate bar with rate and ETA. The enrichment SQL is tightened
to use IS NOT comparisons for diff-only UPDATEs (skip rows where values
haven't changed), and a follow-up touch_stmt ensures status_synced_at is
updated even for unchanged rows so staleness detection works correctly.

Other improvements:
- New ProjectSummary struct aggregates per-project metrics during ingestion
- SyncResult gains statuses_enriched + per-project summary vectors
- "Already up to date" message when sync finds zero changes
- Remove Arc<AtomicBool> tick_started pattern from docs/embed stages
  (enable_steady_tick is idempotent, the guard was unnecessary)
- Progress bar styling: dim spinner, dark_gray track, per_sec + eta display
- Tick intervals tightened from 100ms to 60ms for smoother animation
- statuses_without_widget calculation uses fetch_result.statuses.len()
  instead of subtracting enriched (more accurate when some statuses lack
  work item widgets)
- Status enrichment completion log downgraded from info to debug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:33 -05:00
teernisse
64e73b1cab fix(graphql): handle past HTTP dates in retry-after header gracefully
Extract parse_retry_after_value(header, now) as a pure function to enable
deterministic testing of Retry-After header parsing. The previous
implementation used let-chains with SystemTime::now() inline, which made
it untestable and would panic on negative durations when the server
clock was behind or the header contained a date in the past.

Changes:
- Extract parse_retry_after_value() taking an explicit `now` parameter
- Handle past HTTP dates by returning 1 second instead of panicking on
  negative Duration (date.duration_since(now) returns Err for past dates)
- Trim whitespace from header values before parsing
- Add test for past HTTP date returning 1 second minimum
- Add test for delta-seconds with surrounding whitespace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:19 -05:00
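The core of 64e73b1cab is that the current time is passed in instead of read inline, which makes the past-date case testable. A minimal sketch of that idea follows. It is not the project's parse_retry_after_value: HTTP-date parsing is left out, and the helper names delay_until and parse_delta_seconds are assumptions.

```rust
use std::time::{Duration, SystemTime};

/// Delay until `retry_at`, falling back to one second when the date is already past.
/// Hypothetical helper; the caller is assumed to have parsed the HTTP date already.
fn delay_until(retry_at: SystemTime, now: SystemTime) -> Duration {
    // duration_since returns Err when `retry_at` is earlier than `now`,
    // so the past-date case becomes a fallback instead of a panic.
    retry_at
        .duration_since(now)
        .unwrap_or(Duration::from_secs(1))
}

/// Parse the delta-seconds form of Retry-After, ignoring surrounding whitespace.
fn parse_delta_seconds(header: &str) -> Option<Duration> {
    header.trim().parse::<u64>().ok().map(Duration::from_secs)
}
```

Because `now` is an argument, a test can pin it to a fixed instant and exercise both the future and the past branch deterministically.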
teernisse
361757568f refactor(cli): remove deprecated stage_spinner, migrate remaining callers to v2
Phase 7 cleanup: migrate timeline.rs and main.rs search spinner
from stage_spinner() to stage_spinner_v2() with proper icon labels,
then remove the now-unused stage_spinner() function and its tests.

No external callers remain for the old numbered-stage API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:13:06 -05:00
Taylor Eernisse
8572f6cc04 refactor(cli): polish secondary commands with icons, number formatting, and section dividers
Phase 6 of the UX overhaul. Applies consistent visual treatment across
the remaining command outputs: stats, doctor, timeline, who, count,
and drift.

Stats (stats.rs):
- Apply render::format_number() to all numeric values (documents,
  FTS indexed, embedding counts, chunks) for thousand-separator
  formatting in large databases

Doctor (doctor.rs):
- Replace Unicode check/warning/cross symbols with Icons::success(),
  Icons::warning(), Icons::error() for glyph-mode awareness
- Add summary line after checks showing "Ready/Not ready" with counts
  of passed, warnings, and failed checks separated by middle dots
- Remove "lore doctor" title header for cleaner output

Count (count.rs):
- Right-align numeric values with {:>10} format for columnar output
  in count and state breakdown displays

Timeline (timeline.rs):
- Add entity icons (issue/MR) before entity references in event rows
- Refactor format_event_tag to pad plain text before applying style,
  preventing ANSI codes from breaking column alignment
- Extract style_padded() helper for width-then-style pattern

Who (who.rs):
- Add Icons::user() before usernames in expert, workload, reviews,
  and overlap displays
- Replace manual bold section headers with render::section_divider()
  in workload view (Assigned Issues, Authored MRs, Reviewing MRs,
  Unresolved Discussions)

Drift (drift.rs):
- Add Icons::error()/success() before drift detection status line
- Replace '#' bar character with Unicode full block for similarity
  curve visualization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
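The width-then-style pattern described for timeline.rs in 8572f6cc04 can be shown with a short sketch. The signature below is an assumption (the real style_padded presumably takes a Theme style rather than a closure), and the bold function is a stand-in styler added only for the demonstration.

```rust
/// Pad the plain text to `width` first, then hand the padded string to the styler.
/// Sketch only: the closure parameter is an assumption about the signature.
fn style_padded(text: &str, width: usize, style: impl Fn(&str) -> String) -> String {
    let padded = format!("{:<width$}", text, width = width);
    style(&padded)
}

/// Stand-in styler that wraps text in ANSI bold codes (demo assumption).
fn bold(text: &str) -> String {
    format!("\x1b[1m{text}\x1b[0m")
}
```

Padding after styling would count the escape bytes toward the width, so `format!("{:<5}", bold("MR"))` adds no spaces at all, while `style_padded("MR", 5, bold)` keeps three visible padding spaces inside the styled span.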
Taylor Eernisse
d0744039ef refactor(show): polish issue and MR detail views with section dividers and icons
Phase 4 of the UX overhaul. Restructures the show issue and show MR
detail displays with consistent section layout, state icons, and
improved typography.

Issue detail changes:
- Replace bold header + box-drawing underline with indented title using
  Theme::bold() for the title text only
- Organize fields into named sections using render::section_divider():
  Details, Development, Description, Discussions
- Add state icons (Icons::issue_opened/closed) alongside text labels
- Add relative time in parentheses next to Created/Updated dates
- Switch labels from "Labels: (none)" to only showing when present,
  using format_labels_bare for clean comma-separated output
- Move URL and confidential indicator into Details section
- Closing MRs show state-colored icons (merged/opened/closed)
- Discussions use section_divider instead of bold text, remove colons
  from author lines, adjust wrap widths for consistent indentation

MR detail changes:
- Same section-divider layout: Details, Description, Discussions
- State icons for opened/merged/closed using Icons::mr_* helpers
- Draft indicator uses Icons::mr_draft() instead of [Draft] text prefix
- Relative times added to Created, Updated, Merged, Closed dates
- Reviewers and Assignees fields aligned with fixed-width labels
- Labels shown only when present, using format_labels_bare
- Discussion formatting matches issue detail style

Both views use 5-space left indent for field alignment and consistent
wrap widths (72 for descriptions, 68/66 for discussion notes/replies).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
4b372dfb38 refactor(list): polish list commands with icons, compact timestamps, and styled discussions
Phase 3 of the UX overhaul. Enhances the issues, merge requests, and
notes list displays with visual indicators and improved formatting.

List display changes (src/cli/commands/list.rs):
- Add state icons to issues (opened/closed) and merge requests
  (opened/merged/closed) using Icons:: helpers alongside text labels
- Replace [DRAFT] prefix with Icons::mr_draft() glyph for draft MRs
- Switch from format_relative_time to format_relative_time_compact for
  tighter column widths in tabular output
- Switch from format_labels to format_labels_bare for bracket-free output
- Change format_discussions() return type from String to StyledCell so
  unresolved counts render with Theme::warning() color inline
- Bold the section headers ("Issues", "Merge Requests", "Notes")
  with count separated from the label for cleaner scanning
- Import Icons from render module

Test updates (src/cli/commands/list_tests.rs):
- Update format_discussions tests to assert on StyledCell.text field
  instead of raw String, since the function now returns styled output
- The unresolved-count test checks starts_with/contains to handle
  embedded ANSI escape codes from Theme::warning()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
af8fc4af76 refactor(sync): overhaul progress display with stage spinners and summaries
Phase 2 of the UX overhaul. Replaces the old numbered-stage progress
system (1/4, 2/4...) and manual indicatif ProgressBar/ProgressStyle
setup with the new centralized progress helpers.

Sync command changes (src/cli/commands/sync.rs):
- Replace stage_spinner(n, total, msg) with stage_spinner_v2(icon, label, status)
  removing the rigid numbered-stage counter in favor of named stages
- Replace manual ProgressBar::new + ProgressStyle::default_bar for docs
  and embed sub-progress with nested_progress(label, len, robot_mode)
- Add finish_stage() calls that display a completion summary with
  elapsed time, e.g. "Issues  42 issues from 3 projects  1.2s"
- Each stage (Issues, MRs, Docs, Embed) now reports what it did on
  completion rather than just clearing the spinner silently
- Embed failure path uses Icons::warning() instead of inline Theme
  formatting, keeping error display consistent with success path
- Remove indicatif direct dependency from sync.rs (now handled by
  progress module)

Main entry point changes (src/main.rs):
- Add GlyphMode detection: auto-detect Unicode/Nerd Font support or
  fall back to ASCII based on --icons flag, --color=never, NO_COLOR,
  or robot mode
- Update all LoreRenderer::init() calls to pass GlyphMode alongside
  ColorMode for icon-aware rendering throughout the CLI
- Overhaul handle_error() formatting: use Icons::error() glyph,
  bold error text, arrow prefixed action suggestions, and breathing
  room with blank lines for scannability
- Migrate handle_embed() progress bar from manual ProgressBar +
  ProgressStyle to nested_progress() helper, matching sync command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
96b288ccdd refactor(search): polish search results rendering with semantic Theme styles
Phase 5 of the UX overhaul. Migrates search result display from raw
console styling to the centralized Theme system with semantic methods,
improving visual consistency and readability.

Search result changes:
- Type badges now use semantic styles (issue_ref, mr_ref) with
  fixed-width alignment for clean columnar layout
- Snippet rendering uses Theme::highlight() for matched terms and
  Theme::muted() for surrounding context, replacing bold+underline
- Metadata line uses Theme::username() for authors and per-part
  styling with middle-dot separators instead of a single dim line
- Result numbering uses muted style with right-aligned width
- Consistent 8-space indent for metadata, snippets, and explain lines
- Header line uses muted style for search mode instead of dim+parens
- Trailing blank line moved after the result loop instead of per-result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
teernisse
d710403567 feat(cli): add GlyphMode icon system, Theme extensions, and progress API
Phase 1 of UX skin overhaul: foundation layer that all subsequent
phases build upon.

Icons: 3-tier glyph system (Nerd Font / Unicode / ASCII) with
auto-detection from TERM_PROGRAM, LORE_ICONS env, or --icons flag.
16 semantic icon methods on Icons struct (success, warning, error,
issue states, MR states, note, search, user, sync, waiting).

Theme: 4 new semantic styles — muted (#6b7280), highlight (#fbbf24),
timing (#94a3b8), state_draft (#6b7280).

Progress: stage_spinner_v2 with icon prefix, nested_progress with
bounded bar/throughput/ETA, finish_stage for static completion lines,
format_elapsed for compact duration strings.

Utilities: format_relative_time_compact (3h, 2d, 1w, 3mo),
format_labels_bare (comma-separated without brackets).

CLI: --icons global flag, GLOBAL_FLAGS registry updated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
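The compact relative-time format from d710403567 (3h, 2d, 1w, 3mo) could be produced along the lines of the sketch below. Only those four example outputs come from the commit; the function name, the "now" and minute buckets, and the 30-day month are assumptions.

```rust
/// Format an elapsed duration in seconds as a compact label such as 3h or 2d.
/// Hypothetical helper; bucket boundaries are assumptions, not the project's.
fn relative_time_compact(elapsed_secs: u64) -> String {
    const MINUTE: u64 = 60;
    const HOUR: u64 = 60 * MINUTE;
    const DAY: u64 = 24 * HOUR;
    const WEEK: u64 = 7 * DAY;
    const MONTH: u64 = 30 * DAY; // assumption: a month is treated as 30 days

    match elapsed_secs {
        s if s < MINUTE => "now".to_string(),
        s if s < HOUR => format!("{}m", s / MINUTE),
        s if s < DAY => format!("{}h", s / HOUR),
        s if s < WEEK => format!("{}d", s / DAY),
        s if s < MONTH => format!("{}w", s / WEEK),
        s => format!("{}mo", s / MONTH),
    }
}
```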
Taylor Eernisse
ebf64816c9 fix(search): correct FTS5 raw mode fallback test assertion
Update test_raw_mode_leading_wildcard_falls_back_to_safe to match the
actual Safe mode behavior: OR is a recognized FTS5 boolean operator and
passes through unquoted, so the expected output is '"*" OR "auth"' not
'"*" "OR" "auth"'. The previous assertion had been incorrect ever since
the Safe mode operator-passthrough logic was added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:34:01 -05:00
Taylor Eernisse
450951dee1 feat(timeline): rename --expand-mentions to --no-mentions, default mentions on
Invert the timeline mention-expansion flag semantics. Previously, mention
edges were excluded by default and --expand-mentions opted in. Now mention
edges are included by default (matching the more common use case) and
--no-mentions opts out to reduce fan-out when needed.

This is a breaking CLI change but aligns with the principle that the
default behavior should produce the most useful output. Users who were
passing --expand-mentions get the same behavior without any flag. Users
who want reduced output can pass --no-mentions.

Updated: CLI args (TimelineArgs), autocorrect flag list, robot-docs
schema, README documentation and flag reference table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:33:34 -05:00
Taylor Eernisse
81f049a7fa refactor(main): wire LoreRenderer init, migrate to Theme, improve UX polish
Wire the LoreRenderer singleton initialization into main.rs color mode
handling, replacing the console::style import with Theme throughout.

Key changes:

- Color initialization: LoreRenderer::init() called for all code paths
  (NO_COLOR, --color never/always/auto, unknown mode fallback) alongside
  the existing console::set_colors_enabled() calls. Both systems must
  agree since some transitive code still uses console (e.g. dialoguer).

- Tracing: Replace .with_target(false) with .event_format(CompactHumanFormat)
  for the stderr layer, producing the clean 'HH:MM:SS LEVEL  message' format.

- Error handling: handle_error() now shows machine-actionable recovery
  commands from gi_error.actions() below the hint, formatted with dim '$'
  prefix and bold command text.

- Deprecation warnings: All 'lore list', 'lore show', 'lore auth-test',
  'lore sync-status' warnings migrated to Theme::warning().

- Init wizard: All success/info/error messages migrated. Unicode check
  marks use explicit \u{2713} escapes instead of literal symbols.

- Embed command: Added progress bar with indicatif for embedding stage,
  showing position/total with steady tick. Elapsed time shown on completion.

- Generate-docs and ingest commands: Added 'Done in Xs' elapsed time and
  next-step hints (run embed after generate-docs, run generate-docs after
  ingest) for better workflow guidance.

- Sync output: Interrupt message and lock release migrated to Theme.

- Health command: Status labels and overall healthy/unhealthy styled.

- Robot-docs: Added drift command schema, updated sync flags to include
  --no-file-changes, updated who flags with new options.

- Timeline --expand-mentions -> --no-mentions flag rename wired through
  params and robot-docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:33:09 -05:00
Taylor Eernisse
dd00a2b840 refactor(cli): migrate all command modules from console::style to Theme
Replace all console::style() calls in command modules with the centralized
Theme API and render:: utility functions. This ensures consistent color
behavior across the entire CLI, proper NO_COLOR/--color never support via
the LoreRenderer singleton, and eliminates duplicated formatting code.

Changes per module:

- count.rs: Theme for table headers, render::format_number replacing local
  duplicate. Removed local format_number implementation.
- doctor.rs: Theme::success/warning/error for check status symbols and
  messages. Unicode escapes for check/warning/cross symbols.
- drift.rs: Theme::bold/error/success for drift detection headers and
  status messages.
- embed.rs: Compact output format — headline with count, zero-suppressed
  detail lines, 'nothing to embed' short-circuit for no-op runs.
- generate_docs.rs: Same compact pattern — headline + detail + hint for
  next step. No-op short-circuit when regenerated==0.
- ingest.rs: Theme for project summaries, sync status, dry-run preview.
  All console::style -> Theme replacements.
- list.rs: Replace comfy-table with render::LoreTable for issue/MR listing.
  Remove local colored_cell, colored_cell_hex, format_relative_time,
  truncate_with_ellipsis, and format_labels (all moved to render.rs).
- list_tests.rs: Update test assertions to use render:: functions.
- search.rs: Add render_snippet() for FTS5 <mark> tag highlighting via
  Theme::bold().underline(). Compact result layout with type badges.
- show.rs: Theme for entity detail views, delegate format_date and
  wrap_text to render module.
- stats.rs: Section-based layout using render::section_divider. Compact
  middle-dot format for document counts. Color-coded embedding coverage
  percentage (green >=95%, yellow >=50%, red <50%).
- sync.rs: Compact sync summary — headline with counts and elapsed time,
  zero-suppressed detail lines, visually prominent error-only section.
- sync_status.rs: Theme for run history headers, removed local
  format_number duplicate.
- timeline.rs: Theme for headers/footers, render:: for date/truncate,
  standard format! padding replacing console::pad_str.
- who.rs: Theme for all expert/workload/active/overlap/review output
  modes, render:: for relative time and truncation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:32:35 -05:00
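The embedding coverage thresholds listed for stats.rs in dd00a2b840 (green >=95%, yellow >=50%, red <50%) reduce to a three-way comparison. The sketch below uses a hypothetical CoverageColor enum in place of the project's Theme styles.

```rust
/// Hypothetical stand-in for the Theme styles used by the stats output.
#[derive(Debug, PartialEq)]
enum CoverageColor {
    Green,
    Yellow,
    Red,
}

/// Pick a color for an embedding coverage percentage using the stated thresholds.
fn coverage_color(percent: f64) -> CoverageColor {
    if percent >= 95.0 {
        CoverageColor::Green
    } else if percent >= 50.0 {
        CoverageColor::Yellow
    } else {
        CoverageColor::Red
    }
}
```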
Taylor Eernisse
c6a5461d41 refactor(ingestion): compact log summaries and quieter shutdown messages
Migrate all ingestion completion logs to use nonzero_summary() for compact,
zero-suppressed output. Before: 8-14 individual key=value structured fields
per completion message. After: a single summary field like
'42 fetched · 3 labels · 12 notes' that only shows non-zero counters.

Also downgrade all 'Shutdown requested...' messages from info! to debug!.
These are emitted on every Ctrl+C and add noise to the partial results
output that immediately follows. They remain visible at -vv for debugging
graceful shutdown behavior.

Affected modules:
- issues.rs: issue ingestion completion
- merge_requests.rs: MR ingestion completion, full-sync cursor reset
- mr_discussions.rs: discussion ingestion completion
- orchestrator.rs: project-level issue and MR completion summaries,
  all shutdown-requested checkpoints across discussion sync, resource
  events drain, closes-issues drain, and MR diffs drain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:57 -05:00
Taylor Eernisse
a7f86b26e4 refactor(core): compact human log format, quieter lock lifecycle, nonzero_summary helper
Three quality-of-life improvements to reduce log noise and improve readability:

1. logging.rs: Add CompactHumanFormat for stderr tracing output. Replaces the
   default format with a minimal 'HH:MM:SS LEVEL  message key=value' layout —
   no span context, no full timestamps, no target module. The JSON file log
   layer is unaffected. This makes watching 'lore sync' output much cleaner.

2. lock.rs: Downgrade AppLock acquire/release messages from info! to debug!.
   Lock lifecycle events (acquired new, acquired existing, released) are
   operational bookkeeping that clutters normal output. They remain visible
   at -vv verbosity for troubleshooting.

3. ingestion/mod.rs: Add nonzero_summary() utility that formats named counters
   as a compact middle-dot-separated string, suppressing zero values. Produces
   output like '42 fetched · 3 labels · 12 notes' instead of verbose key=value
   structured fields. Returns 'nothing to update' when all values are zero.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:30 -05:00
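The behavior described for nonzero_summary() in a7f86b26e4 can be sketched as follows. The slice-of-pairs signature is an assumption; the separator, the zero suppression, and the 'nothing to update' fallback follow the commit text.

```rust
/// Format named counters as a middle-dot-separated string, skipping zeros.
/// Sketch only: the parameter shape is an assumption about the real helper.
fn nonzero_summary(counters: &[(&str, u64)]) -> String {
    let parts: Vec<String> = counters
        .iter()
        .filter(|(_, count)| *count > 0)
        .map(|(name, count)| format!("{count} {name}"))
        .collect();

    if parts.is_empty() {
        // Every counter was zero.
        "nothing to update".to_string()
    } else {
        // U+00B7 is the middle dot used as the separator.
        parts.join(" \u{b7} ")
    }
}
```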
Taylor Eernisse
5ee8b0841c feat(cli): add centralized render module with semantic Theme and LoreRenderer
Introduce src/cli/render.rs as the single source of truth for all terminal
output styling and formatting utilities. Key components:

- LoreRenderer: global singleton initialized once at startup, resolving
  color mode (Auto/Always/Never) against TTY state and NO_COLOR env var.
  This fixes lipgloss's limitation of hardcoded TrueColor rendering by
  gating all style application through a colors_on() check.

- Theme: semantic style constants (success/warning/error/info/accent,
  entity refs, state colors, structural styles) that return plain
  Style::new() when colors are disabled. Replaces ad-hoc console::style()
  calls scattered across 15+ command modules.

- Shared formatting utilities consolidated from duplicated implementations:
  format_relative_time (was in list.rs and who.rs), format_number (was in
  count.rs and sync_status.rs), truncate (was truncate_with_ellipsis in
  list.rs and truncate_summary in timeline.rs), format_labels, format_date,
  wrap_indent, section_divider.

- LoreTable: lightweight table renderer replacing comfy-table with simple
  column alignment (Left/Right/Center), adaptive terminal width, and
  NO_COLOR-safe output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:02 -05:00
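The colors_on() gate that 5ee8b0841c describes for LoreRenderer can be reduced to a pure function over its inputs. This is a sketch under stated assumptions: the function signature is hypothetical, and the precedence (explicit Always/Never wins, NO_COLOR only affects Auto) is a guess that the commit does not confirm.

```rust
/// Color mode as named in the commit (Auto/Always/Never).
#[derive(Clone, Copy)]
enum ColorMode {
    Auto,
    Always,
    Never,
}

/// Decide whether styles should be applied.
/// Assumption: NO_COLOR and TTY state are consulted only in Auto mode.
fn colors_on(mode: ColorMode, stdout_is_tty: bool, no_color_set: bool) -> bool {
    match mode {
        ColorMode::Never => false,
        ColorMode::Always => true,
        ColorMode::Auto => stdout_is_tty && !no_color_set,
    }
}
```

Resolving the decision once from explicit inputs is what lets every Theme method fall back to a plain style with a single boolean check.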
Taylor Eernisse
7062a3f1fd deps: replace comfy-table with lipgloss (charmed-lipgloss)
Switch from comfy-table to the lipgloss Rust port for terminal styling.
lipgloss provides a composable Style API better suited to our new semantic
theming approach (Theme::success(), Theme::error(), etc.) where we apply
styles to individual text spans rather than constructing styled table cells.
The comfy-table dependency was only used by the list command's human output
and is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:30:31 -05:00
teernisse
159c490ad7 docs: update README with notes, drift, error tolerance, scoring config, and expanded command reference
Major additions:
- lore notes command: full documentation of rich note querying with
  filters (author, type, path, resolution, time range, body substring),
  sort/format options, field selection, and browser opening
- lore drift command: discussion divergence detection documentation
- Error Tolerance section: table of all 8 auto-correction types with
  examples and mode behavior, stderr JSON warning format, fuzzy
  suggestion format for unrecognized commands
- Command Aliases table: primary commands and their accepted aliases
- scoring config section: all weight/half-life/decay parameters for
  the who-expert scoring engine (authorWeight, reviewerWeight, noteBonus,
  half-life periods, closedMrMultiplier, excludedUsernames)

Updates to existing sections:
- Timeline: entity-direct seeding syntax (issue:N, i:N, mr:N, m:N),
  hybrid search pipeline description replacing pure FTS5, discussion
  thread collection, --fields flag, numbered progress spinners
- Search: --after/--updated-after renamed to --since/--updated-since,
  progress spinner behavior, note type filter
- Who: --explain-score, --as-of, --include-bots, --all-history, --detail
- Sync: --no-file-changes flag
- Robot-docs: --brief flag
- Field selection: expanded to note which commands support --fields
2026-02-13 17:27:59 -05:00
teernisse
e0041ed4d9 feat(cli): improve error recovery with alias-aware suggestions and error tolerance manifest
Two related improvements to agent ergonomics in main.rs:

1. suggest_similar_command now matches against aliases (issue->issues,
   mr->mrs, find->search, stat->stats, note->notes, etc.) and provides
   contextual usage examples via a new command_example() helper, so
   agents get actionable recovery hints like "Did you mean 'lore mrs'?
   Example: lore --robot mrs -n 10" instead of just the command name.

2. robot-docs now includes an error_tolerance section documenting every
   auto-correction the CLI performs: types (single_dash_long_flag,
   case_normalization, flag_prefix, fuzzy_flag, subcommand_alias,
   value_normalization, value_fuzzy, prefix_matching), examples, and
   mode behavior (threshold differences). Also expands the aliases
   section with command_aliases and pre_clap_aliases maps for complete
   agent self-discovery.

Together these ensure agents can programmatically discover and recover
from any CLI input error without human intervention.
2026-02-13 17:27:49 -05:00
teernisse
a34751bd47 feat(autocorrect): expand pre-clap correction to 3-phase pipeline with subcommand aliases, value normalization, and flag prefix matching
Three-phase pipeline replacing the single-pass correction:

- Phase A: Subcommand alias correction — handles forms clap can't
  express (merge_requests, mergerequests, robotdocs, generatedocs,
  gen-docs, etc.) via case-insensitive alias map lookup.
- Phase B: Per-arg flag corrections — adds unambiguous prefix expansion
  (--proj -> --project) alongside existing single-dash, case, and fuzzy
  rules. New FlagPrefix rule with 0.95 confidence.
- Phase C: Enum value normalization — auto-corrects casing, prefixes,
  and typos for flags with known valid values. Handles both --flag value
  and --flag=value forms. Respects POSIX -- option terminator.

Changes strict/robot mode from disabling fuzzy matching entirely to using
a higher threshold (0.9 vs 0.8), still catching obvious typos like
--projct while avoiding speculative corrections that mislead agents.

New CorrectionRule variants: SubcommandAlias, ValueNormalization,
ValueFuzzy, FlagPrefix. Each has a corresponding teaching note.
Comprehensive test coverage for all new correction types including
subcommand aliases, value normalization (case, prefix, fuzzy, eq-form),
flag prefix (ambiguous rejection, eq-value preservation), and updated
strict mode behavior.
2026-02-13 17:27:39 -05:00
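The unambiguous prefix expansion from Phase B of a34751bd47 (--proj -> --project, with ambiguous prefixes rejected) can be sketched as below. The function name and the flat list of known flags are assumptions; the project's COMMAND_FLAGS registry and confidence scoring are not reproduced.

```rust
/// Expand `input` to the single known flag it is a prefix of.
/// Returns None when no flag matches or when more than one does.
/// Hypothetical helper; `known` stands in for the real flag registry.
fn expand_flag_prefix<'a>(input: &str, known: &[&'a str]) -> Option<&'a str> {
    // An exact match always wins, even if it is also a prefix of another flag.
    if let Some(&exact) = known.iter().find(|&&flag| flag == input) {
        return Some(exact);
    }

    let mut matches = known.iter().copied().filter(|flag| flag.starts_with(input));
    match (matches.next(), matches.next()) {
        (Some(only), None) => Some(only), // exactly one candidate
        _ => None,                        // zero or several candidates
    }
}
```

Rejecting ambiguous prefixes keeps the correction safe for agents: --s could mean --since or --sort, so it is left for clap to report rather than guessed.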
teernisse
0aecbf33c0 feat(xref): extract cross-references from descriptions, user notes, and fix system note regex
- Fix MENTIONED_RE/CLOSED_BY_RE to match real GitLab format
  ('mentioned in issue #N' / 'mentioned in merge request !N')
- Add GITLAB_URL_RE + parse_url_refs() for full URL extraction
- Add extract_refs_from_descriptions() -> source_method='description_parse'
- Add extract_refs_from_user_notes() -> source_method='note_parse'
- Wire both into orchestrator after system note extraction
- 36 tests: regex fix, URL parsing, integration, idempotency
2026-02-13 17:19:36 -05:00
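The system note formats quoted in 0aecbf33c0 ('mentioned in issue #N' / 'mentioned in merge request !N') can be matched without a regex, as in the sketch below. It is only an illustration of the two formats: the Mention enum and parse_mention are hypothetical, and cross-project references and full URLs (which the commit handles via GITLAB_URL_RE) are not covered.

```rust
/// Hypothetical result type for a parsed system note.
#[derive(Debug, PartialEq)]
enum Mention {
    Issue(u64),
    MergeRequest(u64),
}

/// Parse the two same-project system note formats named in the commit.
fn parse_mention(note: &str) -> Option<Mention> {
    let body = note.trim();
    if let Some(iid) = body.strip_prefix("mentioned in issue #") {
        return iid.parse::<u64>().ok().map(Mention::Issue);
    }
    if let Some(iid) = body.strip_prefix("mentioned in merge request !") {
        return iid.parse::<u64>().ok().map(Mention::MergeRequest);
    }
    None
}
```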
teernisse
c10471ddb9 feat(timeline): add entity-direct seeding (issue:N, mr:N syntax)
Adds issue:N / i:N / mr:N / m:N query syntax to bypass hybrid search
and seed the timeline directly from a known entity. All discussions for
the entity are gathered without needing Ollama.

- parse_timeline_query() detects entity-direct patterns
- resolve_entity_by_iid() resolves IID to EntityRef with ambiguity handling
- seed_timeline_direct() gathers all discussions for the entity
- 20 new tests (5 resolve, 6 direct seed, 9 parse)
- Updated CLI help text and robot-docs manifest
2026-02-13 15:22:45 -05:00
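The entity-direct syntax above (issue:N, i:N, mr:N, m:N) could be detected roughly as follows. The enum shapes and the function signature are assumptions made for illustration; only the function name and the accepted prefixes come from the commit message.

```rust
#[derive(Debug, PartialEq)]
enum EntityKind {
    Issue,
    MergeRequest,
}

#[derive(Debug, PartialEq)]
enum TimelineQuery {
    /// Seed directly from a known entity, bypassing search.
    EntityDirect { kind: EntityKind, iid: u64 },
    /// Anything else is treated as a free-text search query.
    Search(String),
}

fn parse_timeline_query(input: &str) -> TimelineQuery {
    let trimmed = input.trim();
    if let Some((prefix, rest)) = trimmed.split_once(':') {
        let kind = match prefix {
            "issue" | "i" => Some(EntityKind::Issue),
            "mr" | "m" => Some(EntityKind::MergeRequest),
            _ => None,
        };
        // Only a known prefix followed by a numeric IID counts as direct.
        if let (Some(kind), Ok(iid)) = (kind, rest.parse::<u64>()) {
            return TimelineQuery::EntityDirect { kind, iid };
        }
    }
    TimelineQuery::Search(trimmed.to_string())
}
```

Falling back to `Search` for inputs such as `mr:abc` keeps a malformed entity reference usable as an ordinary query instead of failing outright; whether the real parser does the same is not stated in the commit.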
teernisse
cbce4c9f59 release: v0.8.2 2026-02-13 15:01:28 -05:00
teernisse
94435c37f0 perf(timeline): hoist prepared statement outside discussion thread loop
Moves the conn.prepare() call for fetching discussion notes outside the
per-discussion loop in collect_discussion_threads(). The SQL is identical
for every iteration, so preparing it once and rebinding parameters avoids
redundant statement compilation on each matched discussion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:40 -05:00
teernisse
59f65b127a fix(search): pass FTS5 boolean operators through unquoted
FTS5 boolean operators (AND, OR, NOT, NEAR) are case-sensitive uppercase
keywords that must appear unquoted in the query string. Previously, the
user-friendly query builder would double-quote every token, causing
queries like "switch AND health" to search for the literal word "AND"
instead of using it as a boolean conjunction.

Adds a FTS5_OPERATORS constant and checks each token against it before
quoting, allowing natural boolean search syntax to work as expected.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:29 -05:00
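A sketch of the quoting rule described in the commit above: every token is double-quoted unless it is an uppercase FTS5 operator. The constant name comes from the commit; the function name `to_fts_query` and the whitespace tokenization are assumptions.

```rust
const FTS5_OPERATORS: [&str; 4] = ["AND", "OR", "NOT", "NEAR"];

/// Build an FTS5 query string from user input, quoting ordinary tokens
/// and passing boolean operators through unquoted.
fn to_fts_query(raw: &str) -> String {
    raw.split_whitespace()
        .map(|token| {
            if FTS5_OPERATORS.iter().any(|op| *op == token) {
                // Operators are case-sensitive, so only exact uppercase matches pass.
                token.to_string()
            } else {
                // Embedded double quotes are escaped by doubling them.
                format!("\"{}\"", token.replace('"', "\"\""))
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under these assumptions, `switch AND health` becomes `"switch" AND "health"`, while a lowercase `and` is still quoted and searched as a literal word.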
teernisse
f36e900570 feat(cli): add pipeline progress spinners to timeline and search
Adds numbered stage spinners ([1/3], [2/3], [3/3]) to the timeline
pipeline stages (seed, expand, collect) so users see activity during
longer queries. TimelineParams gains a robot_mode field to suppress
spinners in JSON output mode.

Adds a [1/1] spinner to the search command for consistency, using the
shared stage_spinner from cli/progress.

Also refactors wrap_snippet() to delegate to wrap_text() with a 4-line
cap, eliminating the duplicated word-wrapping logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:19 -05:00
teernisse
e2efc61beb refactor(cli): extract stage_spinner to shared progress module
Moves stage_spinner() from a private function in sync.rs to a pub function
in cli/progress.rs so it can be reused by the timeline and search commands.
The function creates a numbered spinner (e.g. [1/3]) for pipeline stages,
returning a hidden no-op bar in robot mode to keep caller code path-uniform.

sync.rs now imports from crate::cli::progress::stage_spinner instead of
defining its own copy. Adds unit tests for robot mode (hidden bar), human
mode (prefix/message properties), and prefix formatting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:10 -05:00
teernisse
2da1a228b3 feat(timeline): collect and render full discussion threads
Implements the downstream consumption of matched discussions from the seed
phase, completing the discussion thread feature across collect, CLI, and
integration tests.

Collect phase (timeline_collect.rs):
- New collect_discussion_threads() function assembles full threads by
  querying notes for each matched discussion_id, filtering out system notes
  (is_system = 0), ordering chronologically, and capping at THREAD_MAX_NOTES
  with a synthetic "[N more notes not shown]" summary note
- build_entity_lookup() creates a (type, id) -> (iid, path) map from seed
  and expanded entities to provide display metadata for thread events
- Thread timestamp is set to the first note's created_at for correct
  chronological interleaving with other timeline events
- collect_events() gains a matched_discussions parameter; threads are
  collected after entity events and before evidence note merging

CLI rendering (cli/commands/timeline.rs):
- Human mode: threads render with box-drawing borders, bold @author tags,
  date-stamped notes, and word-wrapped bodies (60 char width)
- Robot mode: DiscussionThread serializes as discussion_thread kind with
  note_count, full notes array (note_id, author, body, ISO created_at)
- THREAD tag in yellow for human event tag styling
- TimelineMeta gains discussion_threads_included count

Tests:
- 8 new collect tests: basic thread assembly, system note filtering, empty
  thread skipping, body truncation to THREAD_NOTE_MAX_CHARS, note cap with
  synthetic summary, timestamp from first note, chronological sort position,
  and deduplication of duplicate discussion_ids
- Integration tests updated for new collect_events signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:36 -05:00
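The note cap with a synthetic summary could look roughly like this. Notes are reduced to plain body strings, the function name `cap_thread` is hypothetical, and it is an assumption that the summary is appended after the first `max_notes` notes rather than replacing the last of them.

```rust
/// Keep at most `max_notes` notes and append a summary line for the rest.
fn cap_thread(mut bodies: Vec<String>, max_notes: usize) -> Vec<String> {
    if bodies.len() > max_notes {
        let hidden = bodies.len() - max_notes;
        bodies.truncate(max_notes);
        bodies.push(format!("[{hidden} more notes not shown]"));
    }
    bodies
}
```

In the commit series the cap is the THREAD_MAX_NOTES constant (50); it is passed as a parameter here so the behavior is easy to exercise with small inputs.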
teernisse
0e65202778 feat(timeline): add DiscussionThread types and seed-phase discussion matching
Introduces the foundation for full discussion thread support in the
timeline pipeline. Adds three new domain types to timeline.rs:

- ThreadNote: individual note within a thread (id, author, body, timestamp)
- MatchedDiscussion: tracks discussions matched during seeding with their
  parent entity (issue or MR) for downstream collection
- DiscussionThread variant on TimelineEventType: carries a full thread of
  notes, sorted between NoteEvidence and CrossReferenced

Moves truncate_to_chars() from timeline_seed.rs to timeline.rs as pub(crate)
for reuse by the collect phase. Adds THREAD_NOTE_MAX_CHARS (2000) and
THREAD_MAX_NOTES (50) constants.

Upgrades the seed SQL in resolve_documents_to_entities() to resolve note
documents to their parent discussion via an additional LEFT JOIN chain
(notes -> discussions), using COALESCE to unify the entity resolution path
for both discussion and note source types. SeedResult gains a
matched_discussions field that captures deduplicated discussion matches.

Tests cover: discussion matching from discussion docs, note-to-parent
resolution, deduplication of same discussion across multiple docs, and
correct parent entity type (issue vs MR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:18 -05:00
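A character-safe truncation helper in the spirit of truncate_to_chars() might be written as below. The signature and the absence of an ellipsis marker are assumptions; the point of the sketch is that slicing happens on a character boundary, not at a raw byte offset.

```rust
/// Return at most `max_chars` characters of `s` without splitting a
/// multi-byte UTF-8 sequence.
fn truncate_to_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the start of the first character to drop.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than `max_chars + 1` characters: nothing to cut.
        None => s,
    }
}
```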
teernisse
f439c42b3d chore: add gitignore for mock-seed, roam CI workflow, formatting
- Add tools/mock-seed/ to .gitignore
- Add .github/workflows/roam.yml CI workflow
- Add .roam/fitness.yaml architectural fitness rules
- Rustfmt formatting fixes in show.rs and vector.rs
- Beads sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:30 -05:00
teernisse
4f3ec72923 feat(timeline): upgrade seed phase to hybrid search
Replace FTS-only seed entity discovery with hybrid search (FTS + vector
via RRF), using the same search_hybrid infrastructure as the search
command. Falls back gracefully to FTS-only when Ollama is unavailable.

Changes:
- seed_timeline() now accepts OllamaClient, delegates to search_hybrid
- New resolve_documents_to_entities() replaces find_seed_entities()
- SeedResult gains search_mode field tracking actual mode used
- TimelineResult carries search_mode through to JSON renderer
- run_timeline wires up OllamaClient from config
- handle_timeline made async for the hybrid search await
- Tests updated for new function signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:24 -05:00
teernisse
e6771709f1 refactor(core): extract path_resolver module, fix old_path matching in who
Extract shared path resolution logic from who.rs into a new
core::path_resolver module for cross-module reuse. Functions moved:
escape_like, normalize_repo_path, PathQuery, SuffixResult,
build_path_query, suffix_probe. Duplicate escape_like copies removed
from list.rs, project.rs, and filters.rs — all now import from
path_resolver.

Additionally fixes two bugs in query_expert_details() and
query_overlap(): only position_new_path was checked, so old_path
matches for renamed files were missed, and the state filter excluded
'closed' MRs even though the main scoring query includes them with a
decay multiplier.


Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:14 -05:00
Taylor Eernisse
8c86b0dfd7 release: v0.8.1 2026-02-13 11:12:31 -05:00
teernisse
6e55b2470d bugfix: DB column and size issues 2026-02-13 11:11:35 -05:00
Taylor Eernisse
b05922d60b release: v0.8.0 2026-02-13 10:59:05 -05:00
Taylor Eernisse
11fe02fac9 docs: add proposed code file reorganization plan
Planning document for the ongoing test extraction and code organization
effort. Covers module-by-module analysis, proposed file splits, and
phased execution plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:56 -05:00
Taylor Eernisse
48fbd4bfdb feat(core): add file rename chain resolver with depth-bounded BFS
New module: core::file_history with resolve_rename_chain() that traces
a file path through its rename history in mr_file_changes using
bidirectional BFS (forward: old_path->new_path, backward: new_path->old_path).

Key design decisions:
- Depth-bounded BFS: each queue entry carries its distance from the
  origin, so max_hops correctly limits by graph distance (not by total
  nodes discovered). This matters for branching rename graphs where a
  file was renamed differently in parallel MRs.
- Cycle-safe: visited set prevents infinite loops from circular renames.
- Project-scoped: queries are always scoped to a single project_id.
- Deterministic: output is sorted for stable results.

Tests cover: linear chains (forward/backward), cycles, max_hops=0,
depth-bounded linear chains, branching renames, diamond patterns,
and cross-project isolation (9 tests total).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:41 -05:00
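The depth-bounded bidirectional BFS can be sketched over an in-memory list of renames. The real resolver queries mr_file_changes and is project-scoped; here the `(old_path, new_path)` slice stands in for that table, and the signature is an assumption.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Collect every path reachable from `start` within `max_hops` renames,
/// following rename edges in both directions. Output is sorted.
fn resolve_rename_chain(renames: &[(&str, &str)], start: &str, max_hops: usize) -> Vec<String> {
    // Bidirectional adjacency: old -> new and new -> old.
    let mut adjacent: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(old, new) in renames {
        adjacent.entry(old).or_default().push(new);
        adjacent.entry(new).or_default().push(old);
    }

    // The visited set makes circular renames terminate.
    let mut visited: HashSet<&str> = HashSet::from([start]);
    // Each queue entry carries its distance from the origin.
    let mut queue: VecDeque<(&str, usize)> = VecDeque::from([(start, 0)]);

    while let Some((path, depth)) = queue.pop_front() {
        if depth == max_hops {
            continue; // do not expand beyond the hop limit
        }
        for &next in adjacent.get(path).into_iter().flatten() {
            if visited.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }

    let mut chain: Vec<String> = visited.into_iter().map(String::from).collect();
    chain.sort(); // deterministic output
    chain
}
```

Because the depth travels with each queue entry, a branching graph with many nodes at distance 1 is still fully explored when max_hops is 1, which is the distinction the commit draws against counting total discovered nodes.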
Taylor Eernisse
9786ef27f5 refactor(core/time): extract parse_since_from for deterministic time parsing
Factor out parse_since_from(input, reference_ms) so callers can compute
relative durations against a fixed reference timestamp instead of always
using now(). The existing parse_since() now delegates to it with now_ms().

Enables testable and reproducible time-relative queries for features like
timeline --as-of and who --as-of.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:20 -05:00
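A sketch of why a reference timestamp makes relative durations testable. The accepted input grammar is not given in the commit, so the `Nd` / `Nw` / `Nm` suffixes, the 30-day month, and the `Option<i64>` return type below are all assumptions; only the function name and the `(input, reference_ms)` parameters come from the message.

```rust
const DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Resolve a relative duration such as "7d" against a fixed reference
/// timestamp (milliseconds since the epoch) instead of the current time.
fn parse_since_from(input: &str, reference_ms: i64) -> Option<i64> {
    let input = input.trim();
    let unit = input.chars().last()?;
    let count: u32 = input[..input.len() - unit.len_utf8()].parse().ok()?;
    let days = match unit {
        'd' => i64::from(count),
        'w' => i64::from(count) * 7,
        'm' => i64::from(count) * 30, // assumed: a month is 30 days
        _ => return None,
    };
    reference_ms.checked_sub(days.checked_mul(DAY_MS)?)
}
```

A wrapper equivalent to parse_since() would simply pass the current time as `reference_ms`, so tests can pin the reference and assert exact results.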
Taylor Eernisse
7e0e6a91f2 refactor: extract unit tests into separate _tests.rs files
Move inline #[cfg(test)] mod tests { ... } blocks from 22 source files
into dedicated _tests.rs companion files, wired via:

    #[cfg(test)]
    #[path = "module_tests.rs"]
    mod tests;

This keeps implementation-focused source files leaner and more scannable
while preserving full access to private items through `use super::*;`.

Modules extracted:
  core:      db, note_parser, payloads, project, references, sync_run,
             timeline_collect, timeline_expand, timeline_seed
  cli:       list (55 tests), who (75 tests)
  documents: extractor (43 tests), regenerator
  embedding: change_detector, chunking
  gitlab:    graphql (wiremock async tests), transformers/issue
  ingestion: dirty_tracker, discussions, issues, mr_diffs

Also adds conflicts_with("explain_score") to the --detail flag in the
who command to prevent mutually exclusive flags from being combined.

All 629 unit tests pass. No behavior changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:02 -05:00
Taylor Eernisse
5c2df3df3b chore(beads): sync issue tracker
Export latest bead state to JSONL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:53:33 -05:00
teernisse
94c8613420 feat(bd-226s): implement time-decay expert scoring model
Replace flat-weight expertise scoring with exponential half-life decay,
split reviewer signals (participated vs assigned-only), dual-path rename
awareness, and new CLI flags (--as-of, --explain-score, --include-bots,
--all-history).

Changes:
- ScoringConfig: 8 new fields with validation (config.rs)
- half_life_decay() and normalize_query_path() pure functions (who.rs)
- CTE-based SQL with dual-path matching, mr_activity, reviewer_participation (who.rs)
- Rust-side decay aggregation with deterministic f64 ordering (who.rs)
- Path resolution probes check old_path columns (who.rs)
- Migration 026: 5 new indexes for dual-path and reviewer participation
- Default --since changed from 6m to 24m
- 31 new tests (example-based + invariant), 621 total who tests passing
- Autocorrect registry updated with new flags

Closes: bd-226s, bd-2w1p, bd-1soz, bd-18dn, bd-2ao4, bd-2yu5, bd-1b50,
bd-1hoq, bd-1h3f, bd-13q8, bd-11mg, bd-1vti, bd-1j5o
2026-02-12 15:44:55 -05:00
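Exponential half-life decay and a deterministic f64 ordering can be illustrated as follows. The parameter units (days), the guard for non-positive half-lives, and the `rank` helper with its username tie-break are assumptions; the commit only names half_life_decay() as a pure function and mentions deterministic ordering.

```rust
/// Weight of a signal that is `elapsed_days` old: 1.0 when fresh,
/// 0.5 after one half-life, 0.25 after two, and so on.
fn half_life_decay(elapsed_days: f64, half_life_days: f64) -> f64 {
    if half_life_days <= 0.0 {
        return 0.0; // assumed behavior for an invalid half-life
    }
    0.5_f64.powf(elapsed_days.max(0.0) / half_life_days)
}

/// Sort by score descending, breaking ties by name so the order is stable
/// across runs. `total_cmp` gives a total order over f64 values.
fn rank(mut scores: Vec<(String, f64)>) -> Vec<(String, f64)> {
    scores.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    scores
}
```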
teernisse
ad4dd6e855 release: v0.7.0 2026-02-12 13:31:57 -05:00
teernisse
83cd16c918 feat: implement per-note search and document pipeline
- Add SourceType::Note with extract_note_document() and ParentMetadataCache
- Migration 022: composite indexes for notes queries + author_id column
- Migration 024: table rebuild adding 'note' to CHECK constraints, defense triggers
- Migration 025: backfill existing non-system notes into dirty queue
- Add lore notes CLI command with 17 filter options (author, path, resolution, etc.)
- Support table/json/jsonl/csv output formats with field selection
- Wire note dirty tracking through discussion and MR discussion ingestion
- Fix test_migration_024_preserves_existing_data off-by-one (tested wrong migration)
- Fix upsert_document_inner returning false for label/path-only changes
2026-02-12 13:31:24 -05:00
teernisse
fda9cd8835 chore(beads): revise 18 NOTE beads with verified codebase context
Enriched all per-note search beads (NOTE-0A through NOTE-2I) with:
- Corrected migration numbers (022, 024, 025)
- Verified file paths and line numbers from codebase
- Complete function signatures for referenced code
- Detailed approach sections with SQL and Rust patterns
- DocumentData struct field mappings
- TDD anchors with specific test names
- Edge cases from codebase analysis
- Dependency context explaining what each blocker provides
2026-02-12 12:26:48 -05:00
teernisse
c8d609ab78 chore: add drift to autocorrect command registry 2026-02-12 12:10:02 -05:00
teernisse
35c828ba73 feat(bd-91j1): enhance robot-docs with quick_start and example_output
Add quick_start section with glab equivalents, lore-exclusive features,
and read/write split guidance. Add example_output to issues, mrs, search,
and who commands. Update strip_schemas to also strip example_output in
brief mode. Update beads tracking state.

Closes: bd-91j1
2026-02-12 12:09:44 -05:00
teernisse
ecbfef537a feat(bd-1ksf): wire hybrid search (FTS5 + vector + RRF) to CLI
Make run_search async, replace hardcoded lexical mode with SearchMode::parse(),
wire search_hybrid() with OllamaClient for semantic/hybrid modes, graceful
degradation when Ollama unavailable.

Closes: bd-1ksf
2026-02-12 12:03:47 -05:00
teernisse
47eecce8e9 feat(bd-1cjx): add lore drift command for discussion divergence detection
Implement drift detection using cosine similarity between issue description
embedding and chronological note embeddings. Sliding window (size 3) identifies
topic drift points. Includes human and robot output formatters.

New files: drift.rs, similarity.rs
Closes: bd-1cjx
2026-02-12 12:02:15 -05:00
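The drift idea (cosine similarity of each note to the issue description, scanned with a sliding window) can be sketched as below. How the real command turns windows into drift points is not described, so the mean-below-threshold rule, the threshold parameter, and both function names are assumptions.

```rust
/// Cosine similarity of two embedding vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Index of the first window whose mean similarity drops below `threshold`.
/// `similarities` holds note-to-description scores in chronological order.
fn first_drift_point(similarities: &[f32], window: usize, threshold: f32) -> Option<usize> {
    if window == 0 {
        return None; // `windows(0)` would panic
    }
    similarities
        .windows(window)
        .position(|w| w.iter().sum::<f32>() / (window as f32) < threshold)
}
```

Averaging over a window of three notes, as the commit describes, keeps a single off-topic comment from being reported as a change of subject.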
teernisse
b29c382583 feat(bd-2g50): fill data gaps in issue detail view
Add references_full, user_notes_count, merge_requests_count computed
fields to show issue. Add closed_at and confidential columns via
migration 023.

Closes: bd-2g50
2026-02-12 11:59:44 -05:00
teernisse
e26816333f feat(bd-kvij): rewrite agent skills to mandate lore for reads
Add Read/Write Split section to AGENTS.md and CLAUDE.md mandating lore
for all read operations and glab for all write operations.

Closes: bd-kvij
2026-02-12 11:59:32 -05:00
teernisse
f772de8aef release: v0.6.2 2026-02-12 11:33:59 -05:00
teernisse
dd4d867c6e chore: update beads issue tracking state
Sync beads database with current issue status. Includes history
snapshot rotation and updated issue metadata from triage session.
2026-02-12 11:25:27 -05:00
teernisse
ffd074499a docs: update TUI PRD, time-decay scoring, and plan-to-beads plans
TUI PRD v2 (frankentui): Rounds 10-11 feedback refining the hybrid
Ratatui terminal UI approach — component architecture, keybinding
model, and incremental search integration.

Time-decay expert scoring: Round 6 feedback on the weighted scoring
model for the `who` command's expert mode, covering decay curves,
activity normalization, and bot filtering thresholds.

Plan-to-beads v2: Draft specification for the next iteration of the
plan-to-beads skill that converts markdown plans into
dependency-aware beads with full agent-executable context.
2026-02-12 11:21:32 -05:00
teernisse
125938fba6 docs: add per-note search PRD and user journey documentation
Per-note search PRD: Comprehensive product requirements for evolving
the search system from document-level to note-level granularity.
Includes 6 rounds of iterative feedback refining scope, ranking
strategy, migration path, and robot mode integration.

User journeys: Detailed walkthrough of 8 primary user workflows
covering issue triage, MR review lookup, code archaeology, expert
discovery, sync pipeline operation, and agent integration patterns.
2026-02-12 11:21:23 -05:00
teernisse
cd25cf61ca docs: add architecture and flow diagrams
Excalidraw source files and PNG exports for 5 architectural diagrams:

01-human-flow-map: User journey through lore CLI commands
02-agent-flow-map: AI agent interaction patterns with robot mode
03-command-coverage: Matrix of CLI commands vs data entities
04-gap-priority-matrix: Feature gap analysis with priority scoring
05-data-flow-architecture: End-to-end data pipeline from GitLab
    through ingestion, storage, indexing, and query layers
2026-02-12 11:21:15 -05:00
teernisse
d9c9f6e541 fix: escape LIKE metacharacters in project resolver
User-supplied project names containing `%` or `_` were passed directly
into LIKE patterns, causing unintended wildcard matching. For example,
`my_project` would match `my-project` because `_` is a single-char
wildcard in SQL LIKE.

Added escape_like() helper that escapes `\`, `%`, and `_` with
backslash, and added ESCAPE '\' clauses to both the suffix-match and
substring-match queries in resolve_project().

Includes two regression tests:
- test_underscore_not_wildcard: `_` in input must not match `-`
- test_percent_not_wildcard: `%` in input must not match arbitrary strings
2026-02-12 11:21:09 -05:00
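The escaping described above can be sketched in a few lines. The function name and the escaped characters come from the commit; the exact signature is an assumption. The result is only safe when the query also declares the escape character with an ESCAPE '\' clause, as the commit notes.

```rust
/// Escape LIKE metacharacters so user input matches literally.
/// Backslash itself is escaped first-class, since it is the escape character.
fn escape_like(input: &str) -> String {
    let mut escaped = String::with_capacity(input.len());
    for c in input.chars() {
        if matches!(c, '\\' | '%' | '_') {
            escaped.push('\\');
        }
        escaped.push(c);
    }
    escaped
}
```

With this in place, `my_project` is bound as `my\_project`, so the underscore no longer matches the hyphen in `my-project`.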
teernisse
acc5e12e3d perf: force partial index for DiffNote queries, batch stats counts
Query optimizer fixes for the `who` and `stats` commands based on
a systematic performance audit of the SQLite query plans.

who command (expert/reviews/detail modes):
- Add INDEXED BY idx_notes_diffnote_path_created hints to all DiffNote
  queries. SQLite's planner was selecting idx_notes_system (38% of rows)
  over the far more selective partial index (9.3% of rows). Measured
  50-133x speedup on expert queries, 26x on reviews queries.
- Reorder JOIN clauses in detail mode's MR-author sub-select to match
  the index scan direction (notes -> discussions -> merge_requests).

stats command:
- Replace 12+ sequential COUNT(*) queries with conditional aggregates
  (COALESCE + SUM + CASE). The documents, dirty_sources,
  pending_discussion_fetches, and pending_dependent_fetches tables are
  each scanned once instead of 2-3 times. Measured 1.7x speedup
  (109ms -> 65ms warm cache).
- Switch FTS document count from COUNT(*) on the virtual table to
  COUNT(*) on documents_fts_docsize shadow table (B-tree scan vs FTS5
  virtual table overhead). Measured 19x speedup for that single query.

Database: 61652 docs, 282K notes, 211K discussions, 1.5GB.
2026-02-12 11:21:00 -05:00
teernisse
039ab1c2a3 release: v0.6.1 2026-02-11 15:15:41 -05:00
teernisse
d63d6f0b9c docs: document defaultProject configuration option
Updates README.md to explain the new defaultProject behavior:
- Config example now shows the defaultProject field
- New row in the configuration reference table describing the field,
  its type (optional string), default (none), and behavior (fallback
  when -p omitted, must match a configured path, CLI always overrides)
- Project Resolution section updated to explain the cascading logic:
  CLI flag > config default > all projects
- Init section notes the interactive prompt for multi-project setups
  and the --default-project flag for non-interactive/robot mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:53 -05:00
teernisse
3a1307dcdc feat(cli): wire defaultProject through init and all commands
Integrates the defaultProject config field across the entire CLI
surface so that omitting `-p` now falls back to the configured default.

Init command:
- New `--default-project` flag on `lore init` (and robot-mode variant)
- InitInputs.default_project: Option<String> passed through to run_init
- Validation in run_init ensures the default matches a configured path
- Interactive mode: when multiple projects are configured, prompts
  whether to set a default and which project to use
- Robot mode: InitOutputJson now includes default_project (omitted when
  null) for downstream automation
- Autocorrect dictionary updated with `--default-project`

Command handlers applying effective_project():
- handle_issues: list filters use config default when -p omitted
- handle_mrs: same cascading resolution for MR listing
- handle_ingest: dry-run and full sync respect the default
- handle_timeline: TimelineParams.project resolved via effective_project
- handle_search: SearchCliFilters.project resolved via effective_project
- handle_generate_docs: project filter cascades
- handle_who: falls back to config.default_project when -p omitted
- handle_count: both count subcommands respect the default
- handle_discussions: discussion count filters respect the default

Robot-docs:
- init command schema updated with --default-project flag and
  response_schema showing default_project as string?
- New config_notes section documents the defaultProject field with
  type, description, and example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:46 -05:00
teernisse
6ea3108a20 feat(config): add defaultProject with validation and cascading resolver
Introduces a new optional `defaultProject` field on Config (and
MinimalConfig for init output) that acts as a fallback when the
`-p`/`--project` CLI flag is omitted.

Domain-layer changes:
- Config.default_project: Option<String> with camelCase serde rename
- Config::load validates that defaultProject matches a configured
  project path (exact or case-insensitive suffix match), returning
  ConfigInvalid on mismatch
- Config::effective_project(cli_flag) -> Option<&str>: cascading
  resolver that prefers the CLI flag, then the config default, then None
- MinimalConfig.default_project with skip_serializing_if for clean
  JSON output when unset

Tests added:
- effective_project: CLI overrides default, falls back to default,
  returns None when both absent
- Config::load: accepts valid defaultProject, rejects nonexistent,
  accepts suffix match
- MinimalConfig: omits null defaultProject, includes when set
- Helper write_config_with_default_project for parameterized tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 15:09:33 -05:00
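The cascading resolver reduces to a single `Option` combinator. The `Config` struct here is a pared-down stand-in with only the one field; serde attributes and load-time validation are omitted, and the lifetime annotations are an assumption about how the real signature is written.

```rust
struct Config {
    default_project: Option<String>,
}

impl Config {
    /// Prefer the CLI flag, then the configured default, then `None`
    /// (meaning: all projects).
    fn effective_project<'a>(&'a self, cli_flag: Option<&'a str>) -> Option<&'a str> {
        cli_flag.or(self.default_project.as_deref())
    }
}
```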
teernisse
81647545e7 release: v0.6.0 2026-02-11 10:56:26 -05:00
teernisse
39a832688d feat(sync): status enrichment progress visibility and status discoverability
- Add StatusEnrichmentStarted/PageFetched/Writing progress events so
  sync no longer has a 45-60s silent gap during GraphQL status fetch
- Thread per-page callback into fetch_issue_statuses_with_progress
- Hide status_category from all human and robot output (keep in DB)
- Add meta.available_statuses to issues list JSON response for agent
  self-discovery of valid --status filter values
- Update robot-docs with status filtering documentation
2026-02-11 10:56:01 -05:00
Taylor Eernisse
06229ce98b feat(cli): expose available_statuses in robot mode and hide status_category
(Supersedes empty commit f3788eb — jj auto-snapshot race.)

Four related changes to how work item status is presented:

1. available_statuses in meta (list.rs, main.rs):
   Robot-mode issue list responses now include meta.available_statuses —
   a sorted array of all distinct status_name values in the database.
   Agents can use this to validate --status filter values or display
   valid options without a separate query.

2. Hide status_category from JSON (list.rs, show.rs):
   status_category is a GitLab internal classification that duplicates
   the state field. Switched to skip_serializing so it never appears
   in JSON output while remaining available internally.

3. Simplify human-readable status display (show.rs):
   Removed the "(category)" parenthetical from the Status line.

4. robot-docs schema updates (main.rs):
   Documented --status filter semantics and meta.available_statuses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:24:41 -05:00
Taylor Eernisse
8d18552298 docs: add jj-first VCS policy to AGENTS.md
Establishes Jujutsu (jj) as the preferred VCS tool for this colocated
repo, matching the global Claude Code rules. Agents should use jj
equivalents for all git operations and only fall back to raw git for
hooks, LFS, submodules, or gh CLI interop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:23:01 -05:00
Taylor Eernisse
f3788eb687 feat(cli): expose available_statuses in robot mode and hide status_category
Four related changes to how work item status is presented:

1. available_statuses in meta (list.rs, main.rs):
   Robot-mode issue list responses now include meta.available_statuses —
   a sorted array of all distinct status_name values in the database.
   Agents can use this to validate --status filter values, offer
   autocomplete, or display valid options without a separate query.

2. Hide status_category from JSON (list.rs, show.rs):
   status_category (e.g. "open", "closed") is a GitLab internal
   classification that duplicates the state field and adds no actionable
   signal for consumers. Switched from skip_serializing_if to
   skip_serializing so it never appears in JSON output while remaining
   available internally for future use.

3. Simplify human-readable status display (show.rs):
   Removed the "(category)" parenthetical from the Status line in
   lore show issue output. The category was noise — users care about
   the board column label, not GitLab's internal taxonomy.

4. robot-docs schema updates (main.rs):
   Documented the --status filter semantics and the new
   meta.available_statuses field in the self-discovery manifest.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:22:39 -05:00
Taylor Eernisse
e9af529f6e feat(ingestion): add progress reporting for status enrichment pipeline
Previously the status enrichment phase (GraphQL work item status fetch)
ran silently — users saw no feedback between "syncing issues" and the
final enrichment summary. For projects with hundreds of issues and
adaptive page-size retries, this felt like a hang.

Changes across three layers:

GraphQL (graphql.rs):
  - Extract fetch_issue_statuses_with_progress() accepting an optional
    on_page callback invoked after each paginated fetch with the
    running count of fetched IIDs
  - Original fetch_issue_statuses() preserved as a zero-cost
    delegation wrapper (no callback overhead)

Orchestrator (orchestrator.rs):
  - Three new ProgressEvent variants: StatusEnrichmentStarted,
    StatusEnrichmentPageFetched, StatusEnrichmentWriting
  - Wire the page callback through to the new _with_progress fn

CLI (ingest.rs):
  - Handle all three new events in the progress callback, updating
    both the per-project spinner and the stage bar with live counts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:22:20 -05:00
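The callback-plus-delegation pattern from the GraphQL layer can be shown with an in-memory stand-in. The real functions are asynchronous and fetch over the network; here `pages` simulates paginated responses, and the shortened names `fetch_with_progress` and `fetch` are hypothetical.

```rust
/// Accumulate IIDs page by page, invoking `on_page` (if any) with the
/// running count after each page.
fn fetch_with_progress<F: FnMut(usize)>(pages: &[Vec<u64>], mut on_page: Option<F>) -> Vec<u64> {
    let mut all = Vec::new();
    for page in pages {
        all.extend_from_slice(page);
        if let Some(callback) = on_page.as_mut() {
            callback(all.len());
        }
    }
    all
}

/// Original entry point preserved as a thin wrapper with no callback.
fn fetch(pages: &[Vec<u64>]) -> Vec<u64> {
    fetch_with_progress(pages, None::<fn(usize)>)
}
```

Keeping the callback generic means the no-callback wrapper monomorphizes to a loop with an always-`None` branch, which is one way to read the commit's "zero-cost delegation wrapper".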
Taylor Eernisse
70271c14d6 fix(core): ensure migration framework records schema version automatically
The migration runner now inserts (OR REPLACE) the schema_version row
after each successful migration batch, regardless of whether the
migration SQL itself contains a self-registering INSERT. This prevents
version tracking gaps when a .sql migration omits the bookkeeping
statement, which would leave the schema at an unrecorded version and
cause re-execution attempts on next startup.

Legacy migrations that already self-register are unaffected thanks to
the OR REPLACE conflict resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 10:21:49 -05:00
Taylor Eernisse
d9f99ef21d feat(cli): status display/filtering, expanded --fields, and robot-docs --brief
Work item status integration across all CLI output:

Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
  hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
  multiple values): lore issues --status "In progress" --status "To do"
- Status fields (name, category, color, icon_name, synced_at) in issue
  list query and JSON output with conditional serialization

Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
  using ANSI 256-color approximation from hex color values
- Status fields included in robot mode JSON with ISO timestamps
- IssueRow, IssueDetail, IssueDetailJson all carry status columns

Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)
- strip_schemas() utility in robot.rs for --brief mode
- expand_fields_preset() extended for search, timeline, and all who modes

Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.

Note: replaces empty commit dcfd449 which lost staging during hook execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:13:37 -05:00
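One common way to approximate a hex color in the ANSI 256-color palette is to map each channel onto the 6x6x6 color cube (indices 16 to 231). The commit does not say which mapping lore uses, so the function name, the linear rounding of channels, and the exclusion of the grayscale ramp are assumptions.

```rust
/// Map "#rrggbb" (or "rrggbb") to an index in the ANSI 6x6x6 color cube.
fn hex_to_ansi256(hex: &str) -> Option<u8> {
    let hex = hex.strip_prefix('#').unwrap_or(hex);
    if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    let (r, g, b) = (channel(0)?, channel(2)?, channel(4)?);
    // Round each 0..=255 channel to the nearest of six levels (0..=5).
    let level = |c: u8| (u16::from(c) * 5 + 127) / 255;
    // Cube index: 16 + 36*r + 6*g + b, at most 231, so it fits in u8.
    Some((16 + 36 * level(r) + 6 * level(g) + level(b)) as u8)
}
```

The resulting index would then be emitted in an escape sequence of the form `ESC[38;5;<n>m` for foreground color.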
Taylor Eernisse
f5967a8e52 chore: fix UBS hook stdin parsing and update beads
.claude/hooks/on-file-write.sh:
- Fix hook to read Claude Code context from JSON stdin (FILE_PATH and
  CWD extracted via jq) instead of relying on environment variables
- Scan only the changed file instead of the entire project directory,
  reducing hook execution from ~30s to <1s per save

.beads/:
- Sync issue tracker state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:34 -05:00
Taylor Eernisse
2c9de1a6c3 docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for
  integration with external tools (dashboards, IDE extensions, chat
  agents). Covers authentication, rate limiting, caching strategy, and
  webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status
  enrichment feature (now implemented). Includes the TDD appendix with
  test-first development specifications covering GraphQL client, adaptive
  pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback
  on decay curve parameterization, recency weighting for discussion
  contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00
Taylor Eernisse
1161edb212 docs: add TUI PRD v2 (FrankenTUI) with 9 plan-refine iterations
Comprehensive product requirements document for the gitlore TUI built on
FrankenTUI's Elm architecture (Msg -> update -> view). The PRD (7800+
lines) covers:

Architecture: Separate binary crate (lore-tui) with runtime delegation,
Elm-style Model/Cmd/Msg, DbManager with closure-based read pool + WAL,
TaskSupervisor for dedup/cancellation, EntityKey system for type-safe
entity references, CommandRegistry as single source of truth for
keybindings/palette/help.

Screens: Dashboard, IssueList, IssueDetail, MrList, MrDetail, Search
(lexical/hybrid/semantic with facets), Timeline (5-stage pipeline),
Who (expert/workload/reviews/active/overlap), Sync (live progress),
CommandPalette, Help overlay.

Infrastructure: InputMode state machine, Clock trait for deterministic
rendering, crash_context ring buffer with redaction, instance lock,
progressive hydration, session restore, grapheme-safe text truncation
(unicode-width + unicode-segmentation), terminal sanitization (ANSI/bidi/
C1 controls), entity LRU cache.

Testing: Snapshot tests via insta, event-fuzz, CLI/TUI parity, tiered
benchmark fixtures (S/M/L), query-plan CI enforcement, Phase 2.5
vertical slice gate.

9 plan-refine iterations (ChatGPT review -> Claude integration):
  Iter 1-3: Connection pool, debounce, EntityKey, TaskSupervisor,
    keyset pagination, capability-adaptive rendering
  Iter 4-6: Separate binary crate, ANSI hardening, session restore,
    read tx isolation, progressive hydration, unicode-width
  Iter 7-9: Per-screen LoadState, CommandRegistry, InputMode, Clock,
    log redaction, entity cache, search cancel SLO, crash diagnostics

Also includes the original tui-prd.md (ratatui-based, superseded by v2).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:11:26 -05:00
Taylor Eernisse
5ea976583e docs: update README, AGENTS, and robot-mode-design for work item status
README.md:
- Feature summary updated to mention work item status sync and GraphQL
- New config reference entry for sync.fetchWorkItemStatus (default true)
- Issue listing/show examples include --status flag usage
- Valid fields list expanded with status_name, status_category,
  status_color, status_icon_name, status_synced_at_iso
- Database schema table updated for issues table
- Ingest/sync command descriptions mention status enrichment phase
- Adaptive page sizing and graceful degradation documented

AGENTS.md:
- Robot mode example shows --status flag usage

docs/robot-mode-design.md:
- Issue available fields list expanded with status fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:10:51 -05:00
Taylor Eernisse
dcfd449b72 feat(cli): status display/filtering, expanded --fields, and robot-docs --brief
Work item status integration across all CLI output:

Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
  hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
  multiple values): lore issues --status "In progress" --status "To do"

Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
- Status fields (name, category, color, icon, synced_at) included in
  robot mode JSON with ISO timestamps

Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)

Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.
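
A minimal sketch of the case-insensitive, OR-combined `--status` filter described above. The function name and the in-memory comparison are assumptions for illustration; the real filter may well be applied in SQL instead.

```rust
/// Hypothetical helper: returns true when `status` matches ANY of the
/// requested filters, ignoring ASCII case. An empty filter list matches
/// everything, so omitting --status keeps all issues.
fn status_matches(status: Option<&str>, filters: &[&str]) -> bool {
    if filters.is_empty() {
        return true;
    }
    match status {
        // OR logic: one matching filter is enough.
        Some(s) => filters.iter().any(|f| f.eq_ignore_ascii_case(s)),
        // Issues without status data never match an explicit filter.
        None => false,
    }
}
```

With filters `["In progress", "To do"]`, an issue whose status is `in PROGRESS` would be kept and one whose status is `Done` would be dropped.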

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:09:47 -05:00
Taylor Eernisse
6b75697638 feat(ingestion): enrich issues with work item status from GraphQL API
Add a "Phase 1.5" status enrichment step to the issue ingestion pipeline
that fetches work item statuses via the GitLab GraphQL API after the
standard REST API ingestion completes.

Schema changes (migration 021):
- Add status_name, status_category, status_color, status_icon_name, and
  status_synced_at columns to the issues table (all nullable)

Ingestion pipeline changes:
- New `enrich_issue_statuses_txn()` function that applies fetched
  statuses in a single transaction with two phases: clear stale statuses
  for issues that no longer have a status widget, then apply new/updated
  statuses from the GraphQL response
- ProgressEvent variants for status enrichment (complete/skipped)
- IngestProjectResult tracks enrichment metrics (seen, enriched, cleared,
  without_widget, partial_error_count, enrichment_mode, errors)
- Robot mode JSON output includes per-project status enrichment details

Configuration:
- New `sync.fetchWorkItemStatus` config option (default true); set it to
  false to skip GraphQL status enrichment on instances without
  Premium/Ultimate
- `LoreError::GitLabAuthFailed` now treated as permanent API error so
  status enrichment auth failures don't trigger retries

Also removes the unnecessary nested SAVEPOINT in store_closes_issues_refs
(already runs within the orchestrator's transaction context).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:09:21 -05:00
Taylor Eernisse
dc49f5209e feat(gitlab): add GraphQL client with adaptive pagination and work item status types
Introduce a reusable GraphQL client (`src/gitlab/graphql.rs`) that handles
GitLab's GraphQL API with full error handling for auth failures, rate
limiting, and partial errors. Key capabilities:

- Adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL
  complexity limits without hardcoding a single safe page size
- Paginated issue status fetching via the workItems GraphQL query
- Graceful detection of unsupported instances (missing GraphQL endpoint
  or forbidden auth) so ingestion continues without status data
- Retry-After header parsing via the `httpdate` crate for rate limit
  compliance

Also adds `WorkItemStatus` type to `gitlab::types` with name, category,
color, and icon_name fields (all optional except name) with comprehensive
deserialization tests covering all system statuses (TO_DO, IN_PROGRESS,
DONE, CANCELED) and edge cases (null category, unknown future values).

The `GitLabClient` gains a `graphql_client()` factory method for
ergonomic access from the ingestion pipeline.
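
The adaptive page sizing (100 → 50 → 25 → 10) could look roughly like the sketch below. Only the sequence of sizes comes from the commit message; `FetchError`, the closure-based `fetch` parameter, and the function name are hypothetical stand-ins, not the actual client API.

```rust
/// Page sizes to try, largest first (sequence taken from the commit message).
const PAGE_SIZES: [u32; 4] = [100, 50, 25, 10];

/// Hypothetical error type: only a complexity rejection triggers a retry.
#[derive(Debug, PartialEq)]
enum FetchError {
    TooComplex,
    Other(String),
}

/// Calls `fetch` with progressively smaller page sizes until one succeeds.
/// Returns the fetched page together with the page size that worked.
fn fetch_with_adaptive_page_size<T, F>(mut fetch: F) -> Result<(T, u32), FetchError>
where
    F: FnMut(u32) -> Result<T, FetchError>,
{
    for &size in PAGE_SIZES.iter() {
        match fetch(size) {
            Ok(page) => return Ok((page, size)),
            // Complexity rejection: try the next smaller page size.
            Err(FetchError::TooComplex) => continue,
            // Auth or network failures are not fixed by shrinking the page.
            Err(other) => return Err(other),
        }
    }
    // Even the smallest page size was rejected.
    Err(FetchError::TooComplex)
}
```

Stepping down through a fixed ladder avoids hardcoding one conservative size: simple queries still get 100 items per page, and only queries that hit the complexity limit pay for extra round trips.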

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:08:53 -05:00
Taylor Eernisse
7d40a81512 fix(ingestion): remove nested transaction in upsert_mr_file_changes
drain_mr_diffs in orchestrator.rs already wraps each MR diff store
in an unchecked_transaction (alongside job completion and watermark
update). upsert_mr_file_changes was also starting its own inner
transaction via conn.unchecked_transaction(), causing every call to
fail with "cannot start a transaction within a transaction".

Remove the inner transaction management from upsert_mr_file_changes
so it operates on whatever Connection (or Transaction deref'd to
Connection) the caller provides. The caller in drain_mr_diffs owns
the transaction boundary. Standalone callers (tests, future direct
use) auto-commit each statement, which is correct for their use case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 11:56:15 -05:00
Taylor Eernisse
4185abe05d docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc
Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:48 -05:00
Taylor Eernisse
d54f669c5e chore: add multi-agent editor config and UBS file-write hook
Add rule/config files for Cursor, Cline, Codex, Gemini, Continue, and
OpenCode editors pointing them to project conventions, UBS usage, and
AGENTS.md. Add a Claude Code on-file-write hook that runs UBS on
supported source files after every save.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:28 -05:00
Taylor Eernisse
45126f04a6 fix: document upsert project_id, truncation budget, and Ollama model matching
- regenerator: Include project_id in the ON CONFLICT UPDATE clause for
  document upserts. Previously, if a document moved between projects
  (e.g., during re-ingestion), the project_id would remain stale.

- truncation: Compute the omission marker ("N notes omitted") before
  checking whether first+last notes fit in the budget. The old order
  computed the marker after the budget check, meaning the marker's byte
  cost was unaccounted for and could cause over-budget output.

- ollama: Tighten model name matching to require either an exact match
  or a colon-delimited tag prefix (model == name or name starts with
  "model:"). The prior starts_with check would false-positive on
  "nomic-embed-text-v2" when looking for "nomic-embed-text". Tests
  updated to cover exact match, tagged, wrong model, and prefix
  false-positive cases.
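
As an illustration of the tightened Ollama matching rule (exact match, or the wanted name followed by a colon-delimited tag), a sketch might look like this. The function name and argument order are assumptions.

```rust
/// Hypothetical helper: `installed` is a model name as reported by Ollama,
/// `wanted` is the configured model. Matches "wanted" exactly or
/// "wanted:<tag>", but not longer names that merely share the prefix.
fn model_matches(installed: &str, wanted: &str) -> bool {
    installed == wanted
        || installed
            .strip_prefix(wanted)
            .map_or(false, |rest| rest.starts_with(':'))
}
```

Under this rule `nomic-embed-text:latest` matches `nomic-embed-text`, while `nomic-embed-text-v2` does not, because the remainder after the prefix starts with `-` rather than `:`.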

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:14 -05:00
Taylor Eernisse
dfa44e5bcd fix(ingestion): label upsert reliability, init idempotency, and sync health
Label upsert (issues + merge_requests): Replace INSERT ... ON CONFLICT DO
UPDATE RETURNING with INSERT OR IGNORE + SELECT. The prior RETURNING-based
approach relied on last_insert_rowid() matching the returned id, which is
not guaranteed when ON CONFLICT triggers an update (SQLite may return 0).
The new two-step approach is unambiguous and correctly tracks created_count.

Init: Add ON CONFLICT(gitlab_project_id) DO UPDATE to the project insert
so re-running `lore init` updates path/branch/url instead of failing with
a unique constraint violation.

MR discussions sync: Reset discussions_sync_attempts to 0 when clearing a
sync health error, so previously-failed MRs get a fresh retry budget after
successful sync.

Count: format_number now handles negative numbers correctly by extracting
the sign before inserting thousand-separators.
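
The sign-handling fix in `format_number` can be sketched as follows. The signature and the comma separator are assumptions; only the idea of extracting the sign before grouping digits comes from the commit message.

```rust
/// Sketch: format an integer with thousands separators.
/// The sign is split off first so that grouping only ever sees digits.
fn format_number(n: i64) -> String {
    let sign = if n < 0 { "-" } else { "" };
    // unsigned_abs() avoids overflow on i64::MIN.
    let digits = n.unsigned_abs().to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3 + 1);
    out.push_str(sign);
    for (i, ch) in digits.chars().enumerate() {
        // Insert a separator whenever a multiple of three digits remains.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}
```

Grouping the raw string of a negative number would count the `-` as a digit, which is how outputs such as `-,100` can arise.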

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:53 -05:00
Taylor Eernisse
53ef21d653 fix: propagate DB errors instead of silently swallowing them
Replace .unwrap_or(), .ok(), and .filter_map(|r| r.ok()) patterns with
proper error propagation using ? and rusqlite::OptionalExtension where
the query may legitimately return no rows.

Affected areas:
- events_db::count_events: three count queries now propagate errors
  instead of defaulting to (0, 0) on failure
- note_parser::extract_refs_from_system_notes: row iteration errors
  are now propagated instead of silently dropped via filter_map
- note_parser::noteable_type_to_entity_type: unknown types now log a
  debug warning before defaulting to "issue"
- payloads::store_payload/read_payload: use .optional()? instead of
  .ok() to distinguish "no row" from "query failed"
- backoff::compute_next_attempt_at: use .clamp(0, 30) to guard against
  negative attempt_count, not just .min(30)
- search::vector::max_chunks_per_document: returns Result<i64> with
  proper error propagation through .optional()?.flatten()
- embedding::chunk_ids::decode_rowid: promote debug_assert to assert
  since negative rowids indicate data corruption worth failing fast on
- ingestion::dirty_tracker::record_dirty_error: use .optional()? to
  handle missing dirty_sources row gracefully instead of hard error
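
To illustrate why `.clamp(0, 30)` matters for the backoff item above, here is a sketch of an exponential delay. The doubling formula, the `base_ms` parameter, and the function name are assumptions; only the clamp bounds come from the commit message.

```rust
/// Sketch: exponential backoff delay in milliseconds.
/// The exponent is clamped to [0, 30], so a negative attempt_count
/// behaves like a first attempt instead of becoming a huge shift amount.
fn backoff_delay_ms(attempt_count: i64, base_ms: u64) -> u64 {
    let exponent = attempt_count.clamp(0, 30) as u32;
    base_ms.saturating_mul(1u64 << exponent)
}
```

With only `.min(30)`, a negative count would pass through unchanged and, once cast to an unsigned shift amount, would be far larger than the width of the integer.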

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:36 -05:00
Taylor Eernisse
41504b4941 feat(who): configurable scoring weights, MR refs, detail mode, and suffix path resolution
Expert mode now surfaces the specific MR references (project/path!iid) that
contributed to each expert's score, capped at 50 per user. A new --detail flag
adds per-MR breakdowns showing role (Author/Reviewer/both), note count, and
last activity timestamp.

Scoring weights (author_weight, reviewer_weight, note_bonus) are now
configurable via the config file's `scoring` section with validation that
rejects negative values. Defaults shift to author_weight=25, reviewer_weight=10,
note_bonus=1 — better reflecting that code authorship is a stronger expertise
signal than review assignment alone.

Path resolution gains suffix matching: typing "login.rs" auto-resolves to
"src/auth/login.rs" when unambiguous, with clear disambiguation errors when
multiple paths match. Project-scoping (-p) narrows the candidate set.

The MAX_MR_REFS_PER_USER constant is promoted to module scope for reuse
across expert and overlap modes. Human output shows MR refs inline and detail
sub-rows when requested. Robot JSON includes mr_refs, mr_refs_total,
mr_refs_truncated, and optional details array.

Includes comprehensive tests for suffix resolution, scoring weight
configurability, MR ref aggregation across projects, and detail mode.
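
The suffix resolution behavior could be sketched like this. `PathResolution`, `resolve_suffix`, and the in-memory list of known paths are hypothetical; the actual command presumably resolves against paths stored in the database.

```rust
/// Hypothetical result type for suffix-based path resolution.
#[derive(Debug, PartialEq)]
enum PathResolution {
    Unique(String),
    Ambiguous(Vec<String>),
    NotFound,
}

/// Sketch: resolve `input` against `known_paths`.
/// An exact match wins; otherwise `input` must match a whole trailing
/// path segment sequence (so "login.rs" does not match "src/mylogin.rs").
fn resolve_suffix(input: &str, known_paths: &[&str]) -> PathResolution {
    if known_paths.iter().any(|p| *p == input) {
        return PathResolution::Unique(input.to_string());
    }
    let needle = format!("/{input}");
    let mut matches: Vec<String> = known_paths
        .iter()
        .filter(|p| p.ends_with(needle.as_str()))
        .map(|p| p.to_string())
        .collect();
    // Sort and dedup so the disambiguation list is deterministic.
    matches.sort();
    matches.dedup();
    match matches.len() {
        0 => PathResolution::NotFound,
        1 => PathResolution::Unique(matches.remove(0)),
        _ => PathResolution::Ambiguous(matches),
    }
}
```

Passing a narrower `known_paths` slice plays the role of project scoping (`-p`) here: fewer candidates means fewer ambiguous results.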

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:15:15 -05:00
Taylor Eernisse
d36850f181 release: v0.5.2 2026-02-08 17:24:17 -05:00
Taylor Eernisse
5ce18e0ebc release: v0.5.1 2026-02-08 14:36:06 -05:00
Taylor Eernisse
b168a58134 fix(search): cap vector search k-value and add rowid assertion
The vector search multiplier could grow unbounded on documents with
many chunks, producing enormous k values that cause SQLite to scan
far more rows than necessary. Clamp the multiplier to [8, 200] and
cap k at 10,000 to prevent degenerate performance on large corpora.

Also adds a debug_assert in decode_rowid to catch negative rowids
early — these indicate a bug in the encoding pipeline and should
fail fast rather than silently produce garbage document IDs.
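
A rough sketch of the k-value bounds described above. The bounds ([8, 200] for the multiplier, 10,000 for k) come from the commit message; how the multiplier is derived and the formula `k = limit * multiplier` are assumptions made for illustration.

```rust
/// Sketch: number of nearest neighbours to request from the vector index.
/// Assumes the multiplier is derived from the largest chunk count per
/// document; the real derivation may differ.
fn compute_k(limit: usize, max_chunks_per_document: usize) -> usize {
    let multiplier = max_chunks_per_document.clamp(8, 200);
    // saturating_mul guards against overflow before the final cap.
    limit.saturating_mul(multiplier).min(10_000)
}
```

The lower bound keeps enough candidates for deduplication when documents are small; the upper bound and the cap keep a single many-chunk document from inflating every query.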

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:34:05 -05:00
Taylor Eernisse
b704e33188 feat(sync): surface MR diff fetch/fail counters in sync output
Adds mr_diffs_fetched and mr_diffs_failed fields to IngestResult and
SyncResult, threads them through the orchestrator aggregation, includes
them in the structured tracing span and human-readable sync summary.
Previously MR diff failures were silently swallowed — now they appear
alongside resource event counts for full pipeline observability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:53 -05:00
Taylor Eernisse
6e82f723c3 fix(ingestion): unify store + watermark + job-complete in single transaction
Previously, drain_resource_events, drain_mr_closes_issues, and
drain_mr_diffs each opened a transaction only for the job-complete +
watermark update, but the store operation ran outside that transaction.
If the process crashed between the store and the watermark update, data
would be persisted without the watermark advancing, causing silent
duplicates on the next sync.

Now each drain function opens the transaction before the store call and
commits it only after both the store and the watermark update succeed.
On error, the transaction is explicitly dropped so the connection is
not left in a half-committed state.

Also:
- store_resource_events no longer manages its own transaction; the caller
  passes in a connection (which is actually the transaction)
- upsert_mr_file_changes wraps DELETE + INSERT in a transaction internally
- reset_discussion_watermarks now also clears diffs_synced_for_updated_at
- Orchestrator error span now includes closes_issues_failed + mr_diffs_failed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:47 -05:00
Taylor Eernisse
940a96375a refactor(search): rename --after/--updated-after to --since/--updated-since
The --since naming is more intuitive (matches git log --since) and
consistent with the list commands which already use --since. Renames
the CLI flags, SearchCliFilters fields, SearchFilters fields,
autocorrect registry, and robot-docs manifest. No behavioral change.

Affected paths:
- cli/mod.rs: SearchArgs field + clap attribute rename
- cli/commands/search.rs: SearchCliFilters + run_search plumbing
- search/filters.rs: SearchFilters struct + apply_filters logic
- main.rs: handle_search + robot-docs JSON
- cli/autocorrect.rs: COMMAND_FLAGS entry for search

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:24 -05:00
Taylor Eernisse
7dd86d5433 fix(db): add missing schema_version insert to migration 019
Migration 019 created performance indexes but never recorded itself
in the schema_version table. Without this row the migration runner
considers the schema outdated and would attempt to re-apply the
migration. Adds the standard INSERT INTO schema_version for version 19.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:13 -05:00
Taylor Eernisse
429c6f07d2 release: v0.5.0
Bump version from 0.1.0 to 0.5.0 to reflect the maturity of the CLI
after months of development — robot mode, search pipeline, ingestion
orchestrator, who commands, timeline pipeline, and embedding support
are all implemented and stable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:07 -05:00
Taylor Eernisse
754efa4369 chore: add /release skill for automated SemVer version bumps
Adds a Claude Code skill that automates the release workflow:
parse bump type (major/minor/patch), update Cargo.toml + Cargo.lock,
commit, and tag. Intentionally does not auto-push so the user
retains control over when releases go to the remote.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 14:33:02 -05:00
Taylor Eernisse
c54a969269 fix(who): exclude self-assigned reviewers from file-change reviewer signal
Signal 4 (mr_reviewers + mr_file_changes) was missing the self-review
exclusion that signal 1 (DiffNote reviewer) already had. An MR author
listed as their own reviewer would be double-counted as both author
and reviewer, inflating their score.

Also removes redundant SELECT DISTINCT from signal 2 (GROUP BY
already ensures uniqueness).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 13:42:40 -05:00
Taylor Eernisse
95b7183add feat(who): expand expert + overlap queries with mr_file_changes and mr_reviewers
Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries)

- Add fetch_mr_file_changes config option and --no-file-changes CLI flag
- Add GitLab MR diffs API fetch pipeline with watermark-based sync
- Create migration 020 for diffs_synced_for_updated_at watermark column
- Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL:
  DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers
- Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END)
- Add insert_file_change test helper, 8 new who tests, all 397 tests pass
- Also includes: list performance migration 019, autocorrect module, README updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 13:35:14 -05:00
Taylor Eernisse
435a208c93 perf: eliminate unnecessary clones and pre-allocate collections
Three micro-optimizations with zero behavioral change:

1. timeline_collect.rs: Reorder format!() before enum construction so
   the owned String moves into the variant directly, eliminating
   .clone() on state, label, and milestone strings in StateChanged,
   LabelAdded/Removed, and MilestoneSet/Removed event paths.

2. pipeline.rs: Use Arc<str> for doc_hash shared across a document's
   chunks instead of cloning the full String per chunk. Also remove
   redundant embed_buf.reserve() since extend_from_slice already
   handles growth and the buffer is reused across iterations.

3. rrf.rs: Pre-allocate HashMap with combined vector+fts result count
   via with_capacity() to avoid rehashing during RRF score accumulation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:08:14 -05:00
Taylor Eernisse
cc11d3e5a0 fix: peer review — 5 correctness bugs across who, db, lock, embedding, main
Comprehensive peer code review identified and fixed the following:

1. who.rs: @-prefixed path routing used `target` (with @) instead of
   `clean` (stripped) when checking for '/' and passing to Expert mode,
   causing `lore who @src/auth/` to silently return zero results because
   the SQL LIKE matched against `@src/auth/%` which never exists.

2. db.rs: After ROLLBACK TO savepoint on migration failure, the savepoint
   was never RELEASEd, leaving it active on the connection. Fixed in both
   run_migrations() and run_migrations_from_dir().

3. lock.rs: Multiple acquire() calls (e.g. re-acquiring a stale lock)
   replaced the heartbeat_handle without stopping the old thread, causing
   two concurrent heartbeat writers competing on the same lock row. Now
   signals the old thread to stop and joins it before spawning a new one.

4. chunk_ids.rs: encode_rowid() had no guard for chunk_index >= 1000
   (CHUNK_ROWID_MULTIPLIER), which would cause rowid collisions between
   adjacent documents. Added range assertion [0, 1000).

5. main.rs: Fallback JSON error formatting in handle_auth_test
   interpolated LoreError Display output without escaping quotes or
   backslashes, potentially producing malformed JSON for robot-mode
   consumers. Now escapes both characters before interpolation.
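
The rowid encoding from item 4 can be sketched as below, combined with the checked arithmetic and negative-rowid check mentioned for the same module in other commits in this log. The `i64` parameter types and panic messages are assumptions; the multiplier of 1000 and the [0, 1000) range come from the commit messages.

```rust
const CHUNK_ROWID_MULTIPLIER: i64 = 1000;

/// Sketch: pack (document_id, chunk_index) into a single rowid.
/// Panics if chunk_index is outside [0, 1000) or if the result overflows.
fn encode_rowid(document_id: i64, chunk_index: i64) -> i64 {
    assert!(
        (0..CHUNK_ROWID_MULTIPLIER).contains(&chunk_index),
        "chunk_index {chunk_index} out of range [0, {CHUNK_ROWID_MULTIPLIER})"
    );
    document_id
        .checked_mul(CHUNK_ROWID_MULTIPLIER)
        .and_then(|base| base.checked_add(chunk_index))
        .expect("rowid overflow while encoding document_id")
}

/// Sketch: inverse of encode_rowid. Negative rowids are rejected.
fn decode_rowid(rowid: i64) -> (i64, i64) {
    assert!(rowid >= 0, "negative rowid {rowid}");
    (rowid / CHUNK_ROWID_MULTIPLIER, rowid % CHUNK_ROWID_MULTIPLIER)
}
```

Without the range check, `encode_rowid(1, 1000)` would produce 2000, the same rowid as chunk 0 of document 2, which is the collision the commit describes.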

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:07:59 -05:00
Taylor Eernisse
5786d7f4b6 fix: defensive hardening — lock release logging, SQLite param guard, vector cast
Three defensive improvements found via peer code review:

1. lock.rs: Lock release errors were silently discarded with `let _ =`.
   If the DELETE failed (disk full, corruption), the lock stayed in the
   database with no diagnostic. Next sync would require --force with no
   clue why. Now logs with error!() including the underlying error message.

2. filters.rs: Dynamic SQL label filter construction had no upper bound
   on bind parameters. With many combined filters, param_idx + labels.len()
   could exceed SQLite's 999-parameter limit, producing an opaque error.
   Added a guard that caps labels at 900 - param_idx.

3. vector.rs: max_chunks_per_document returned i64 which was cast to
   usize. A negative value from a corrupt database would wrap to a huge
   number, causing overflow in the multiplier calculation. Now clamped
   to .max(1) and cast via unsigned_abs().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:54 -05:00
Taylor Eernisse
d3306114eb fix(ingestion): pass ShutdownSignal into issue and MR pagination loops
The orchestrator already accepted a ShutdownSignal but only checked it
between phases (after all issues fetched, before discussions). The inner
loops in ingest_issues() and ingest_merge_requests() consumed entire
paginated streams without checking for cancellation.

On a large initial sync (thousands of issues/MRs), Ctrl+C could be
unresponsive for minutes while the current entity type finished draining.

Now both functions accept &ShutdownSignal and check is_cancelled() at
the top of each iteration, breaking out promptly and committing the
cursor for whatever was already processed.
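
A minimal sketch of checking cancellation at the top of each pagination iteration. The `ShutdownSignal` shown here is a stand-in built on an `AtomicBool`, and `ingest_pages` with its `u64` page type is hypothetical; only `is_cancelled()` and the check-per-iteration pattern come from the commit message.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Stand-in for the real ShutdownSignal (assumed shape).
#[derive(Clone, Default)]
struct ShutdownSignal(Arc<AtomicBool>);

impl ShutdownSignal {
    fn cancel(&self) {
        self.0.store(true, Ordering::Relaxed);
    }
    fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::Relaxed)
    }
}

/// Sketch: consume a paginated stream, checking for cancellation before
/// each item. Returns the cursor of the last item that was processed.
fn ingest_pages<I>(pages: I, signal: &ShutdownSignal) -> Option<u64>
where
    I: IntoIterator<Item = u64>,
{
    let mut cursor = None;
    for page in pages {
        if signal.is_cancelled() {
            // Stop promptly; everything before this point is kept.
            break;
        }
        // ... store the page here ...
        cursor = Some(page);
    }
    cursor
}
```

Because the check sits inside the loop, the worst-case delay after Ctrl+C is one item rather than the remainder of the stream.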

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:36 -05:00
Taylor Eernisse
e6b880cbcb fix: prevent panics in robot-mode JSON output and arithmetic paths
Peer code review found multiple panic-reachable paths:

1. serde_json::to_string().unwrap() in 4 robot-mode output functions
   (who.rs, main.rs x3). If serialization ever failed (e.g., NaN from
   edge-case division), the CLI would panic with an unhelpful stack trace.
   Replaced with unwrap_or_else that emits a structured JSON error fallback.

2. encode_rowid() in chunk_ids.rs used unchecked multiplication
   (document_id * 1000). On extreme document IDs this could silently wrap
   in release mode, causing embedding rowid collisions. Now uses
   checked_mul + checked_add with a diagnostic panic message.

3. HTTP response body truncation at byte index 500 in client.rs could
   split a multi-byte UTF-8 character, causing a panic. Now uses
   floor_char_boundary(500) for safe truncation.

4. who.rs reviews mode: SQL used `m.author_username != ?1`, which silently
   dropped MRs with NULL author_username (in SQL, NULL != x evaluates to
   NULL, so the row fails the WHERE filter).
   Changed to `(m.author_username IS NULL OR m.author_username != ?1)`
   to match the pattern already used in expert mode.

5. handle_auth_test hardcoded exit code 5 for all errors regardless of
   type. Config not found (20), token not set (4), and network errors (8)
   all incorrectly returned 5. Now uses e.exit_code() from the actual
   LoreError, with proper suggestion hints in human mode.
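
The UTF-8-safe truncation from item 3 can be illustrated with a hand-rolled equivalent. This sketch uses `str::is_char_boundary` instead of the `floor_char_boundary` call the commit names, so that it does not depend on that method being available; the function name is an assumption.

```rust
/// Sketch: truncate `s` to at most `max_bytes` bytes without splitting
/// a multi-byte UTF-8 character.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Walk back to the nearest boundary; index 0 is always a boundary.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Slicing with `&s[..500]` directly panics when byte 500 falls inside a character, which is the failure mode the commit describes.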

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:20 -05:00
Taylor Eernisse
121a634653 fix: critical data integrity — timeline dedup, discussion atomicity, index collision
Three correctness bugs found via peer code review:

1. TimelineEvent PartialEq/Ord omitted entity_type — issue #42 and MR #42
   with the same timestamp and event_type were treated as equal. In a
   BTreeSet or dedup, one would silently be dropped. Added entity_type to
   both PartialEq and Ord comparisons.

2. discussions.rs: store_payload() was called outside the transaction
   (on bare conn) while upsert_discussion/notes were inside. A crash
   between them left orphaned payload rows. Moved store_payload inside
   the unchecked_transaction block, matching mr_discussions.rs pattern.

3. Migration 017 created idx_issue_assignees_username(username, issue_id)
   but migration 005 already created the same index name with just
   (username). SQLite's IF NOT EXISTS silently skipped the composite
   version on every existing database. New migration 018 drops and
   recreates the index with correct composite columns.
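
The timeline dedup bug in item 1 can be reproduced in miniature. The field names below are hypothetical, and the derives stand in for the hand-written `PartialEq`/`Ord` impls in the real code; the point is only that `entity_type` must take part in the comparison.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum EntityType {
    Issue,
    MergeRequest,
}

/// Hypothetical shape; deriving Ord compares every field in order,
/// including entity_type.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct TimelineEvent {
    timestamp: i64,
    entity_type: EntityType,
    entity_iid: i64,
    event_type: String,
}

/// Sketch: dedup via BTreeSet, which keeps one element per Ord-equal group.
fn dedup_events(events: Vec<TimelineEvent>) -> BTreeSet<TimelineEvent> {
    events.into_iter().collect()
}
```

With `entity_type` in the comparison, issue #42 and MR #42 sharing a timestamp and event type both survive; if the comparison ignored it, the set would keep only one of them.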

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:54:59 -05:00
Taylor Eernisse
f267578aab feat: implement lore who — people intelligence commands (5 modes)
Add `lore who` command with 5 query modes answering collaboration questions
using existing DB data (280K notes, 210K discussions, 33K DiffNotes):

- Expert: who knows about a file/directory (DiffNote path analysis + MR breadth scoring)
- Workload: what is a person working on (assigned issues, authored/reviewing MRs, discussions)
- Active: what discussions need attention (unresolved resolvable, global/project-scoped)
- Overlap: who else is touching these files (dual author+reviewer role tracking)
- Reviews: what review patterns does a person have (prefix-based category extraction)

Includes migration 017 (5 composite indexes), CLI skeleton with clap conflicts_with
validation, robot JSON output with input+resolved_input reproducibility, human terminal
output, and 20 unit tests. All quality gates pass.

Closes: bd-1q8z, bd-34rr, bd-2rk9, bd-2ldg, bd-zqpf, bd-s3rc, bd-m7k1, bd-b51e,
bd-2711, bd-1rdi, bd-3mj2, bd-tfh3, bd-zibc, bd-g0d5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 23:11:14 -05:00
Taylor Eernisse
859923f86b docs: update AGENTS.md robot mode section for --fields, actions, exit codes
Sync the agent instructions with the current robot mode implementation:
- Add RUST_CLI_TOOLS_BEST_PRACTICES.md reference for Rust coding guidance
- Expand robot mode description to cover all new capabilities
- Add --fields examples (minimal preset, custom field lists)
- Document error actions array for automated recovery workflows
- Update response format to show elapsed_ms and actions in error envelope
- Add field selection section with usage examples
- Separate health check to exit code 19 (was overloaded on exit code 1)
- Add robot-docs recommendation for response schema discovery
- Update best practices with --fields minimal for token efficiency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:32 -05:00
Taylor Eernisse
d701b1f977 docs: add plan frontmatter to api-efficiency-findings
Add YAML frontmatter metadata (plan: true, status: drafting, iteration: 0)
to integrate with the iterative plan review workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:24 -05:00
Taylor Eernisse
736d9c9a80 docs: rewrite robot-mode-design to reflect implemented features
Comprehensive update to the robot mode design document bringing it in sync
with the actual implementation after the elapsed_ms, --fields, and error
actions features landed.

Major additions:
- Response envelope section documenting compact JSON with elapsed_ms timing
- Error actions table mapping each error code to executable recovery commands
- Field selection section with presets (minimal) and per-entity available fields
- Expanded exit codes table (14-20) covering Ollama, embedding, ambiguity errors
- Updated command examples to use current CLI syntax (lore issues vs lore list issues)
- Added -J shorthand and --fields to global flags table
- Best practices section with --fields minimal for token efficiency (~60% reduction)

Removed outdated sections that no longer match the implementation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:16 -05:00
Taylor Eernisse
8dc479e515 docs: add lore who command design plan with 8 iterations of review feedback
Design document for `lore who` — a people intelligence query layer over
existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53
participants). Answers five collaboration questions: expert lookup by
file/path, workload summary, review pattern analysis, active discussion
tracking, and file overlap detection.

Key design decisions refined across 8 feedback iterations:
- All SQL is fully static (no format!()) with prepare_cached() throughout
- Exact vs prefix path matching via PathQuery struct (two static SQL variants)
- Self-review exclusion (author != reviewer) on all DiffNote branches
- Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers
- Bounded payloads with *_total/*_truncated metadata for robot consumers
- Truncation transparency via LIMIT+1 overflow detection pattern
- Robot JSON includes resolved_input for reproducibility (since_mode tri-state)
- Multi-project correctness with project-qualified entity references
- Composite migration indexes designed for query selectivity on hot paths
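
The LIMIT+1 overflow detection pattern from the list above can be sketched as follows. The helper name is an assumption; the idea is that the query asks for one row more than the caller wants, and the extra row only signals truncation.

```rust
/// Sketch: `rows` is the result of a query run with LIMIT (limit + 1).
/// Returns at most `limit` rows plus a flag telling whether more existed.
fn take_with_truncation<T>(mut rows: Vec<T>, limit: usize) -> (Vec<T>, bool) {
    let truncated = rows.len() > limit;
    rows.truncate(limit);
    (rows, truncated)
}
```

This gives robot consumers a reliable `*_truncated` flag without a second COUNT query.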

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:05 -05:00
Taylor Eernisse
3e7fa607d3 docs: update README for --fields, elapsed_ms, error actions, exit code 19
Documents the robot mode enhancements from the previous commits:

- Field selection (--fields flag and minimal preset) with examples
  and complete field lists for issues and MRs
- Updated response format section to show meta.elapsed_ms and compact
  single-line JSON
- Error actions array with recovery shell commands
- Agent self-discovery section explaining robot-docs response_schema
- Exit code 19 for health check failure added to the table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:47:30 -05:00
Taylor Eernisse
b5f78e31a8 fix(cli): audit-driven improvements to flags, help, exit codes, and deprecation
Addresses findings from a comprehensive CLI readiness audit:

Flag design (I2):
- Add hidden --no-verbose flag with overrides_with semantics, matching
  the --no-quiet pattern already established for all other boolean flags.

Help text (I3):
- Add after_help examples to issues, mrs, search, sync, and timeline
  subcommands. Each shows 3-4 concrete, runnable commands with comments.

Help headings (I4/P5):
- Move --mode and --fts-mode from "Output" heading to "Mode" heading
  in the search subcommand. These control search strategy, not output
  format — "Output" is reserved for --limit, --explain, --fields.

Exit codes (I5):
- Health check failure now exits 19 (was 1). Exit code 1 is reserved
  for internal errors only. robot-docs updated to document code 19.

Deprecation visibility (P4):
- Deprecated commands (list, show, auth-test, sync-status) now emit
  structured JSON warnings to stderr in robot mode:
  {"warning":{"type":"DEPRECATED","message":"...","successor":"..."}}
  Previously these were silently swallowed in robot mode.

Version string (P1):
- Cli struct uses env!("LORE_VERSION") from build.rs so --version shows
  git hash (see previous commit).

Fields flag (P3):
- --fields help text updated to document the "minimal" preset.

Robot-docs (parallel work):
- response_schema added for every command, documenting the JSON shape
  agents will receive. Agents can now introspect expected fields before
  calling a command.
- error_format documents the new "actions" array.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:47:04 -05:00
Taylor Eernisse
cf6d27435a feat(robot): add elapsed_ms timing, --fields support, and actionable error actions
Robot mode consistency improvements across all command output:

Timing:
- Every robot JSON response now includes meta.elapsed_ms measuring
  wall-clock time from command start to serialization. Agents can use
  this to detect slow queries and tune --limit or --project filters.

Field selection (--fields):
- print_list_issues_json and print_list_mrs_json accept an optional
  fields slice that prunes each item in the response array to only
  the requested keys. A "minimal" preset expands to
  [iid, title, state, updated_at_iso] for token-efficient agent scans.
- filter_fields and expand_fields_preset live in the new
  src/cli/robot.rs module alongside RobotMeta.
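
A rough sketch of how field selection could work. The function names come from the commit message, but the signatures are assumptions, and a `BTreeMap<String, String>` stands in for a JSON object so the example needs only the standard library.

```rust
use std::collections::BTreeMap;

/// Expand the "minimal" preset; any other list passes through unchanged.
fn expand_fields_preset(fields: &[&str]) -> Vec<String> {
    if fields.len() == 1 && fields[0] == "minimal" {
        ["iid", "title", "state", "updated_at_iso"]
            .iter()
            .map(|s| s.to_string())
            .collect()
    } else {
        fields.iter().map(|s| s.to_string()).collect()
    }
}

/// Keep only the requested keys of one item (the map stands in for a JSON object).
fn filter_fields(
    item: &BTreeMap<String, String>,
    fields: &[String],
) -> BTreeMap<String, String> {
    item.iter()
        .filter(|(key, _)| fields.contains(*key))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}
```

Requested keys that an item does not have are simply absent from the pruned result in this sketch.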

Actionable error recovery:
- LoreError gains an actions() method returning concrete shell commands
  an agent can execute to recover (e.g. "ollama serve" for
  OllamaUnavailable, "lore init" for ConfigNotFound).
- RobotError now serializes an "actions" array (empty array omitted)
  so agents can parse and offer one-click fixes.
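
The recovery-actions idea might look roughly like this. Only the two variants and commands named above come from the commit message; the `Other` variant, the `actions_json` helper, and the hand-rolled JSON (the real code presumably uses a serializer) are assumptions made to keep the sketch self-contained.

```rust
/// Illustrative subset of error variants.
#[derive(Debug)]
enum LoreError {
    OllamaUnavailable,
    ConfigNotFound,
    Other(String),
}

impl LoreError {
    /// Shell commands an agent could run to recover; empty when none apply.
    fn actions(&self) -> Vec<&'static str> {
        match self {
            LoreError::OllamaUnavailable => vec!["ollama serve"],
            LoreError::ConfigNotFound => vec!["lore init"],
            LoreError::Other(_) => Vec::new(),
        }
    }
}

/// Render the "actions" member of an error payload, omitting it when empty.
/// Hypothetical helper; no escaping is done because the commands are fixed strings.
fn actions_json(err: &LoreError) -> Option<String> {
    let actions = err.actions();
    if actions.is_empty() {
        return None;
    }
    let quoted: Vec<String> = actions.iter().map(|a| format!("\"{}\"", a)).collect();
    Some(format!("\"actions\":[{}]", quoted.join(",")))
}
```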

Envelope consistency:
- show issue/MR JSON responses now use the standard
  {"ok":true,"data":...,"meta":...} envelope instead of bare data,
  matching all other commands.

Files: src/cli/robot.rs (new), src/core/error.rs,
       src/cli/commands/{count,embed,generate_docs,ingest,list,show,stats,sync_status}.rs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:46:48 -05:00
Taylor Eernisse
4ce0130620 build: emit LORE_VERSION env var combining version and git hash
The clap --version flag now shows the git hash alongside the semver
version (e.g. "lore 0.1.0 (a573d69)") instead of bare "lore 0.1.0".

LORE_VERSION is constructed at compile time in build.rs from
CARGO_PKG_VERSION + the short git hash, and consumed via
env!("LORE_VERSION") in the Cli struct's #[command(version)] attribute.
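
A minimal sketch of the string construction a build script could perform. The helper names are hypothetical, and the fallback for a missing git hash is an assumption; `cargo:rustc-env=NAME=VALUE` is the standard Cargo directive for exposing a value to `env!`.

```rust
/// Combine the crate version with a short git hash, e.g. "0.1.0 (a573d69)".
/// Falls back to the bare version when no hash is available (assumed behavior).
fn lore_version(pkg_version: &str, git_hash: &str) -> String {
    if git_hash.is_empty() {
        pkg_version.to_string()
    } else {
        format!("{} ({})", pkg_version, git_hash)
    }
}

/// Line a build script would print so that env!("LORE_VERSION") resolves.
fn rustc_env_directive(value: &str) -> String {
    format!("cargo:rustc-env=LORE_VERSION={}", value)
}
```

In a real build.rs the directive would be emitted with `println!`, and clap would prepend the binary name when rendering `--version`.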

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 23:46:29 -05:00
Taylor Eernisse
a573d695d5 test(perf): add benchmarks for hash query elimination and embed bytes
Two new microbenchmarks measuring optimizations applied in this session:

bench_redundant_hash_query_elimination:
  Compares the old 2-query pattern (get_existing_hash + full SELECT)
  against the new single-query pattern where upsert_document_inner
  returns change detection info directly. Uses 100 seeded documents
  with 10K iterations, prepare_cached, and black_box to prevent
  elision.

bench_embedding_bytes_alloc_vs_reuse:
  Compares per-call Vec<u8> allocation against the reusable embed_buf
  pattern now used in store_embedding. Simulates 768-dim embeddings
  (nomic-embed-text) with 50K iterations. Includes correctness
  assertion that both approaches produce identical byte output.

Both benchmarks use informational-only timing (no pass/fail on speed)
with correctness assertions as the actual test criteria, ensuring they
never flake on CI.
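
The allocation-versus-reuse comparison can be illustrated with a small sketch. The function names are hypothetical and the little-endian f32 layout is an assumption; the commit message does not state the byte order.

```rust
/// Serialize an embedding into `buf`, reusing its allocation across calls.
/// Little-endian layout is an assumption of this sketch.
fn encode_into(buf: &mut Vec<u8>, embedding: &[f32]) {
    buf.clear();
    buf.reserve(embedding.len() * 4);
    for value in embedding {
        buf.extend_from_slice(&value.to_le_bytes());
    }
}

/// Per-call allocation, shown for comparison.
fn encode_alloc(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().flat_map(|v| v.to_le_bytes()).collect()
}
```

For a 768-dimension embedding each call produces 768 x 4 = 3072 bytes, which is the allocation the reusable buffer avoids after the first call.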

Notes recorded in benchmark file:
- SHA256 hex formatting optimization measured at 1.01x (reverted)
- compute_list_hash sort strategy measured at 1.02x (reverted)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:43:11 -05:00
Taylor Eernisse
a855759bf8 fix: shutdown safety, CLI hardening, exit code collision
Shutdown signal improvements:
- Upgrade ShutdownSignal from Relaxed to Release/Acquire ordering.
  Relaxed was technically sufficient for a single flag but
  Release/Acquire is the textbook correct pattern and ensures
  visibility guarantees across threads without relying on x86 TSO.
- Add double Ctrl+C support to all three signal handlers (ingest,
  embed, sync). First Ctrl+C sets cooperative flag with user message;
  second Ctrl+C force-exits with code 130 (standard SIGINT convention).
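
A hedged sketch of the flag and the double Ctrl+C decision. The method names and the `on_ctrl_c` helper are assumptions; actually installing the signal handler is omitted because it depends on the runtime in use.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Cooperative shutdown flag shared between the handler and workers.
#[derive(Clone, Default)]
struct ShutdownSignal {
    flag: Arc<AtomicBool>,
}

impl ShutdownSignal {
    fn new() -> Self {
        Self::default()
    }

    /// Publish the request; Release pairs with the Acquire load below.
    fn cancel(&self) {
        self.flag.store(true, Ordering::Release);
    }

    /// Workers poll this between units of work.
    fn is_cancelled(&self) -> bool {
        self.flag.load(Ordering::Acquire)
    }
}

/// Conventional exit status for SIGINT: 128 + signal number 2.
const EXIT_SIGINT: i32 = 130;

/// Decide what a Ctrl+C handler should do.
/// None = first press, keep running cooperatively; Some(code) = force exit.
fn on_ctrl_c(signal: &ShutdownSignal) -> Option<i32> {
    if signal.is_cancelled() {
        Some(EXIT_SIGINT)
    } else {
        signal.cancel();
        None
    }
}
```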

CLI hardening:
- LORE_ROBOT env var now checks for truthy values (!empty, !="0",
  !="false") instead of mere existence. Setting LORE_ROBOT=0 or
  LORE_ROBOT=false no longer activates robot mode.
- Replace unreachable!() in color mode match with defensive warning
  and fallback to auto. Clap validates the values but defense in depth
  prevents panics if the value_parser is ever changed.
- Replace unreachable!() in completions shell match with proper error
  return for unsupported shells.
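
The truthy check for LORE_ROBOT could be sketched like this. The function names are hypothetical, and the comparison is case-sensitive because the commit message lists only the literal values; whether the real code also treats "FALSE" as falsy is not stated.

```rust
/// Truthy means: set, non-empty, and neither "0" nor "false".
fn is_truthy(value: Option<&str>) -> bool {
    match value {
        Some(v) => !v.is_empty() && v != "0" && v != "false",
        None => false,
    }
}

/// Read the variable from the process environment.
fn robot_mode_from_env() -> bool {
    is_truthy(std::env::var("LORE_ROBOT").ok().as_deref())
}
```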

Exit code collision fix:
- ConfigNotFound was mapped to exit code 2 (error.rs:56) which
  collided with handle_clap_error() also using exit code 2 for parse
  errors. Agents calling lore --robot could not distinguish "bad
  arguments" from "missing config file."
- Restore ConfigNotFound to exit code 20 (its original dedicated code).
- Update robot-docs exit code table: code 2 = "Usage error", code 20 =
  "Config not found".

Build script:
- Track .git/refs/heads directory for Cargo rebuild triggers. Ensures
  GIT_HASH env var updates when branch refs change, not just HEAD.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:59 -05:00
Taylor Eernisse
f3f3560e0d fix(ingestion): proper error propagation and transaction safety
Three hardening improvements to the ingestion orchestrator:

- Replace .unwrap_or(0) with ? on COUNT(*) queries for total_issues
  and total_mrs. These are simple aggregate queries that should never
  fail, but if they do (e.g. table missing after failed migration),
  propagating the error gives an actionable message instead of silently
  reporting 0 items.

- Wrap store_closes_issues_refs in a SAVEPOINT with proper
  ROLLBACK/RELEASE. Previously, a failure mid-loop (e.g. on the 5th of
  10 close-issue references) would leave partial refs committed. Now
  the entire batch is atomic.

- Replace silent catch-all (_ => {}) arms in enqueue_resource_events
  and update_resource_event_watermark with explicit warnings for
  unknown entity_type values. Makes debugging easier when new entity
  types are added but the match arms aren't updated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:40 -05:00
Taylor Eernisse
2bfa4f1f8c perf(documents): eliminate redundant hash query in regeneration
The document regenerator was making two queries per document:
1. get_existing_hash() — SELECT content_hash
2. upsert_document_inner() — SELECT id, content_hash, labels_hash, paths_hash

Query 2 already returns the content_hash needed for change detection.
Remove get_existing_hash() entirely and compute content_changed inside
upsert_document_inner() from the existing row data.

upsert_document_inner now returns Result<bool> (true = content changed)
which propagates up through upsert_document and regenerate_one,
replacing the separate pre-check. The triple-hash fast-path (all three
hashes match → return Ok(false) with no writes) is preserved.

This halves the query count for unchanged documents, which dominate
incremental syncs.
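
The single-lookup change detection might be sketched as below. A `HashMap` stands in for the documents table, the `Hashes` struct is an assumption, and the function returns a plain `bool` rather than the `Result<bool>` described above because nothing here can fail.

```rust
use std::collections::HashMap;

/// The three hashes stored per document (struct is illustrative).
#[derive(Clone, PartialEq, Eq)]
struct Hashes {
    content: String,
    labels: String,
    paths: String,
}

/// Upsert into an in-memory map. Returns true when the content hash changed.
fn upsert_document(store: &mut HashMap<i64, Hashes>, id: i64, new: Hashes) -> bool {
    let content_changed = match store.get(&id) {
        // Fast path: all three hashes match, so nothing is written.
        Some(existing) if *existing == new => return false,
        // Row exists but something differs; report only content changes.
        Some(existing) => existing.content != new.content,
        // New document: treated as changed.
        None => true,
    };
    store.insert(id, new);
    content_changed
}
```

One lookup serves both the fast-path check and the change flag, which is the point of dropping the separate pre-check.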

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:26 -05:00
Taylor Eernisse
8cf14fb69b feat(search): sanitize raw FTS5 queries with safe fallback
Add input validation for Raw FTS query mode to prevent expensive or
malformed queries from reaching SQLite FTS5:

- Reject unbalanced double quotes (would cause FTS5 syntax error)
- Reject leading wildcard-only queries ("*", "* OR ...") that trigger
  expensive full-table scans
- Reject empty/whitespace-only queries
- Invalid raw input falls back to Safe mode automatically instead of
  erroring, so callers never see FTS5 parse failures

The Safe mode already escapes all tokens with double-quote wrapping
and handles embedded quotes via doubling. Raw mode now has a
validation layer on top.

All queries remain parameterized (?1, ?2) — user input never enters
SQL strings directly.
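
The validation rules and the Safe-mode escaping described above can be sketched as follows. The function names and the exact fallback wiring are assumptions; only the rules themselves come from the commit message.

```rust
/// Raw-mode validation: non-empty, balanced double quotes, no leading bare "*".
fn is_valid_raw(query: &str) -> bool {
    let trimmed = query.trim();
    if trimmed.is_empty() {
        return false;
    }
    if trimmed.matches('"').count() % 2 != 0 {
        return false;
    }
    trimmed.split_whitespace().next() != Some("*")
}

/// Safe mode: wrap every token in double quotes, doubling embedded quotes.
fn to_safe_query(query: &str) -> String {
    query
        .split_whitespace()
        .map(|tok| format!("\"{}\"", tok.replace('"', "\"\"")))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Raw input is used only when it validates; otherwise fall back to Safe mode.
fn sanitize(query: &str, raw: bool) -> String {
    if raw && is_valid_raw(query) {
        query.trim().to_string()
    } else {
        to_safe_query(query)
    }
}
```

The returned string would still be bound as a parameter to the MATCH clause rather than spliced into SQL.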

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:17 -05:00
Taylor Eernisse
c2036c64e9 feat(embed): docs_embedded tracking, buffer reuse, retry hardening
Embedding pipeline improvements building on the concurrent batching
foundation:

- Track docs_embedded vs chunks_embedded separately. A document counts
  as embedded only when ALL its chunks succeed, giving accurate
  progress reporting. The sync command reads docs_embedded for its
  document count.

- Reuse a single Vec<u8> buffer (embed_buf) across all store_embedding
  calls instead of allocating per chunk. Eliminates ~3KB allocation per
  768-dim embedding.

- Detect and record errors when Ollama silently returns fewer
  embeddings than inputs (batch mismatch). Previously these dropped
  chunks were invisible.

- Improve retry error messages: distinguish "retry returned unexpected
  result" (wrong dims/count) from "retry request failed" (network
  error) instead of generic "chunk too large" message.

- Convert all hot-path SQL from conn.execute() to prepare_cached() for
  statement cache reuse (clear_document_embeddings, store_embedding,
  record_embedding_error).

- Record embedding_metadata errors for empty documents so they don't
  appear as perpetually pending on subsequent runs.

- Accept concurrency parameter (configurable via config.embedding.concurrency)
  instead of hardcoded EMBED_CONCURRENCY=2.

- Add schema version pre-flight check in embed command to fail fast
  with actionable error instead of cryptic SQL errors.

- Fix --retry-failed to use DELETE instead of UPDATE. UPDATE clears
  last_error but the row still matches config params in the LEFT JOIN,
  making the doc permanently invisible to find_pending_documents.
  DELETE removes the row entirely so the LEFT JOIN returns NULL.
  Regression test added (old_update_approach_leaves_doc_invisible).

- Add chunking forward-progress guard: after floor_char_boundary()
  rounds backward, ensure start advances by at least one full
  character to prevent infinite loops on multi-byte sequences
  (box-drawing chars, smart quotes). Test cases cover the exact
  patterns that caused production hangs on document 18526.
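
A simplified sketch of the forward-progress guard from the last item. It omits overlap and paragraph/sentence break finding, uses a hand-rolled `floor_boundary` in place of `floor_char_boundary`, and both function names are hypothetical.

```rust
/// Largest char boundary <= index (stand-in for str::floor_char_boundary).
fn floor_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Split `text` into chunks of at most `max_bytes`, never inside a character.
fn split_chunks(text: &str, max_bytes: usize) -> Vec<&str> {
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < text.len() {
        let mut end = floor_boundary(text, start.saturating_add(max_bytes));
        if end <= start {
            // Guard: rounding backward made no progress, so advance by one
            // full character even though it exceeds max_bytes.
            let ch_len = text[start..].chars().next().map_or(1, char::len_utf8);
            end = start + ch_len;
        }
        chunks.push(&text[start..end]);
        start = end;
    }
    chunks
}
```

Without the guard, a budget smaller than the next character (for example 2 bytes against a 3-byte box-drawing character) would round `end` back to `start` on every iteration and loop forever.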

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 22:42:08 -05:00
Taylor Eernisse
39cb0cb087 feat(embed): concurrent batching, UTF-8 safe chunking, right-sized chunks
Three fixes to the embedding pipeline:

1. Concurrent HTTP batching: fire EMBED_CONCURRENCY (2) Ollama requests
   in parallel via join_all, then write results serially to SQLite.
   ~2x throughput improvement on GPU-bound workloads.

2. UTF-8 boundary safety: all computed byte offsets in split_into_chunks
   (paragraph/sentence/word break finders + overlap advance) now use
   floor_char_boundary() to prevent panics on multi-byte characters
   like smart quotes and non-breaking spaces.

3. CHUNK_MAX_BYTES reduced from 6000 to 1500 to fit nomic-embed-text's
   actual 2048-token context window, eliminating context-length retry
   storms that were causing 10x slowdowns.

Also threads ShutdownSignal through embed pipeline for graceful Ctrl+C.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 14:48:34 -05:00
Taylor Eernisse
1c45725cba fix(sync): pass options.full through to generate-docs stage
The sync pipeline was hardcoding `false` for the `full` parameter when
calling run_generate_docs, so `lore sync --full` would re-ingest all
entities but then only regenerate documents for newly-dirtied ones.
Entities loaded before migration 007 (which introduced the dirty_sources
system) were never marked dirty and thus never got documents generated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 11:42:11 -05:00
Taylor Eernisse
405e5370dc feat(sync): concurrent drains, atomic watermarks, graceful Ctrl+C shutdown
Three fixes to the sync pipeline:

1. Atomic watermarks: wrap complete_job + update_watermark in a single
   SQLite transaction so a crash between them can't leave partial state.

2. Concurrent drain loops: prefetch HTTP requests via join_all (batch
   size = dependent_concurrency), then write serially to DB. Reduces
   ~9K sequential requests from ~19 min to ~2.4 min.

3. Graceful shutdown: install Ctrl+C handler via ShutdownSignal
   (Arc<AtomicBool>), thread through orchestrator/CLI, release locked
   jobs on interrupt, record sync_run as "failed".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 11:22:04 -05:00
Taylor Eernisse
32783080f1 fix(timeline): report true total_events in robot JSON meta
The robot JSON envelope's meta.total_events field was incorrectly
reporting events.len() (the post-limit count), making it identical
to meta.showing. This defeated the purpose of having both fields.

Changes across the pipeline to fix this:

- collect_events now returns (Vec<TimelineEvent>, usize) where the
  second element is the total event count before truncation
- TimelineResult gains a total_events_before_limit field (serde-skipped)
  so the value flows cleanly from collect through to the renderer
- main.rs passes the real total instead of the events.len() workaround
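
The shape of the fix can be sketched in a few lines. Plain `i64` timestamps stand in for `TimelineEvent` values here, and the signature is an assumption beyond the tuple return described above.

```rust
/// Sort events chronologically, then cap the list at `limit`.
/// Returns the kept events and the count before truncation.
fn collect_events(mut timestamps: Vec<i64>, limit: usize) -> (Vec<i64>, usize) {
    timestamps.sort();
    // Capture the total before truncating so both numbers can be reported.
    let total_before_limit = timestamps.len();
    timestamps.truncate(limit);
    (timestamps, total_before_limit)
}
```

The renderer can then report the first vector's length as `showing` and the second value as `total_events`.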

Additional cleanup in this pass:

- Derive PartialEq/Eq/PartialOrd/Ord on TimelineEventType, replacing
  the hand-rolled event_type_discriminant() function. Variant declaration
  order now defines sort tiebreak, documented in a doc comment.
- Validate --since input with a proper LoreError::Other instead of
  silently treating invalid values as None
- Fix ANSI-aware tag column padding with console::pad_str (colored tags
  like "[merged]" were misaligned because ANSI escapes consumed width)
- Remove dead print_timeline_json and infer_max_depth functions that
  were superseded by print_timeline_json_with_meta
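
How derived ordering supplies the tiebreak can be shown with a small sketch. The enum is a reduced, hypothetical version of `TimelineEventType`; the variant names appear elsewhere in this log, but their real declaration order is an assumption.

```rust
/// Declaration order is the tiebreak order for events with equal timestamps.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EventType {
    Created,
    StateChanged,
    Merged,
}

/// Tuples compare field by field: timestamp first, then the derived variant order.
fn sort_events(events: &mut [(i64, EventType)]) {
    events.sort();
}
```

Reordering the variants changes the sort result, which is why the declaration order needs a doc comment.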

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 09:35:02 -05:00
Taylor Eernisse
f1cb45a168 style: format perf_benchmark.rs with cargo fmt
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:49:53 -05:00
Taylor Eernisse
69df8a5603 feat(timeline): wire up lore timeline command with human + robot renderers
Complete Gate 3 by implementing the final three beads:
- bd-2f2: Human output renderer with colored event tags, entity refs,
  evidence snippets, and expansion summary footer
- bd-dty: Robot JSON output with {ok,data,meta} envelope, ISO timestamps,
  nested via provenance, and per-event-type details objects
- bd-1nf: CLI wiring with TimelineArgs (9 flags), Commands::Timeline
  variant, handle_timeline handler, VALID_COMMANDS entry, and robot-docs
  manifest with temporal_intelligence workflow

All 7 Gate 3 children now closed. Pipeline: SEED -> HYDRATE -> EXPAND ->
COLLECT -> RENDER fully operational.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:49:48 -05:00
Taylor Eernisse
b005edb7f2 docs(readme): add timeline pipeline documentation and schema updates
Documents the timeline pipeline feature in the README:
- New feature bullets: timeline pipeline, git history linking, file
  change tracking
- Updated schema table: merge_requests now includes commit SHAs,
  added mr_file_changes table
- New "Timeline Pipeline" section explaining the 5-stage architecture
  (SEED -> HYDRATE -> EXPAND -> COLLECT -> RENDER) with a table of all
  event types and a note on unresolved cross-project references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:38:48 -05:00
Taylor Eernisse
03d9f8cce5 docs(db): document safety invariants for sqlite-vec transmute
Adds a SAFETY comment explaining why the transmute of sqlite3_vec_init
to the sqlite3_auto_extension callback type is sound. The three
invariants (stable C-ABI signature, single-call-per-connection contract,
idempotency) were previously undocumented, which left the lone unsafe
block without justification for future readers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:38:41 -05:00
Taylor Eernisse
7eadae75f0 test(timeline): add integration tests for full seed-expand-collect pipeline
Adds tests/timeline_pipeline_tests.rs with end-to-end integration tests
that exercise the complete timeline pipeline against an in-memory SQLite
database with realistic data:

- pipeline_seed_expand_collect_end_to_end: Full scenario with an issue
  closed by an MR, state changes, and label events. Verifies that seed
  finds entities via FTS, expand discovers the closing MR through the
  entity_references graph, and collect assembles a chronologically sorted
  event stream containing Created, StateChanged, LabelAdded, and Merged
  events.

- pipeline_empty_query_produces_empty_result: Validates graceful
  degradation when FTS returns zero matches -- all three stages should
  produce empty results without errors.

- pipeline_since_filter_excludes_old_events: Verifies that the since
  timestamp filter propagates correctly through collect, excluding events
  before the cutoff while retaining newer ones.

- pipeline_unresolved_refs_have_optional_iid: Tests the Option<i64>
  target_iid on UnresolvedRef by creating cross-project references both
  with and without known IIDs.

- shared_resolve_entity_ref_scoping: Unit tests for the new shared
  resolve_entity_ref helper covering project-scoped lookup, unscoped
  lookup, wrong-project rejection, unknown entity types, and nonexistent
  entity IDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:38:34 -05:00
Taylor Eernisse
9b23d91378 refactor(timeline): harden pipeline stages with shared resolver and exhaustive error handling
Follows up on the resolve_entity_ref extraction by updating all three
pipeline stages to consume the shared helper and removing their local
duplicates (~75 lines of dead code eliminated).

timeline_seed.rs:
- Switch from local resolve_entity to shared resolve_entity_ref with
  explicit Some(proj_id) scoping
- Add tracing::debug for orphaned discussion parents instead of silently
  skipping them, aiding debugging when evidence notes go missing
- Use saturating_mul for the over-fetch multiplier to prevent overflow on
  pathological max_seeds values

timeline_expand.rs:
- Switch from local resolve_entity_ref to shared version with None
  project scoping (cross-project traversal)
- Pass Option<i64> for target_iid in UnresolvedRef construction instead
  of unwrap_or(0) sentinel
- Update test assertion to compare against Some(42)

timeline_collect.rs:
- Make entity_id_column return Result instead of silently defaulting to
  issue_id for unknown entity types. The previous fallback could produce
  incorrect SQL queries that return wrong results rather than failing
- Replace if-let chains in collect_merged_event with exhaustive match
  blocks that propagate real DB errors while gracefully handling expected
  missing-data cases (QueryReturnedNoRows, NULL merged_at)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:38:24 -05:00
Taylor Eernisse
a324fa26e1 refactor(timeline): extract shared resolve_entity_ref and make target_iid optional
The seed, expand, and collect stages each had their own near-identical
resolve_entity_ref helper that converted internal DB IDs to full EntityRef
structs. This duplication made it easy for bug fixes to land in one copy
but not the others.

Extract a single public resolve_entity_ref into timeline.rs with an
optional project_id parameter:
- Some(project_id): scopes the lookup (used by seed, which knows the
  project from the FTS result)
- None: unscoped lookup (used by expand, which traverses cross-project
  references)

Also changes UnresolvedRef.target_iid from i64 to Option<i64>. Cross-
project references parsed from descriptions may not always carry an IID
(e.g. when the reference is malformed or the target was deleted). The
previous sentinel value of 0 was semantically incorrect since GitLab IIDs
start at 1.
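
The optional project scoping might be sketched like this. A slice stands in for the database tables, and the `EntityRef` fields and the function signature are assumptions made for illustration.

```rust
/// Resolved entity (fields are illustrative).
#[derive(Debug, Clone, PartialEq)]
struct EntityRef {
    entity_type: String,
    entity_id: i64,
    project_id: i64,
    iid: i64,
}

/// Look up an entity in an in-memory table (stand-in for the DB query).
/// Some(project_id) scopes the lookup; None matches any project.
fn resolve_entity_ref(
    table: &[EntityRef],
    entity_type: &str,
    entity_id: i64,
    project_id: Option<i64>,
) -> Option<EntityRef> {
    table
        .iter()
        .find(|e| {
            e.entity_type == entity_type
                && e.entity_id == entity_id
                && project_id.map_or(true, |p| e.project_id == p)
        })
        .cloned()
}
```

Returning `Option` here mirrors the move away from the 0 sentinel: a missing entity is represented as absence rather than as a fake value.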

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-06 08:38:12 -05:00
Taylor Eernisse
e8845380e9 test: add performance regression benchmarks
Add tests/perf_benchmark.rs with three side-by-side benchmarks that
compare old vs new approaches for the optimizations introduced in the
preceding commits:

- bench_label_insert_individual_vs_batch: measures N individual INSERTs
  vs single multi-row INSERT (5k iterations, ~1.6x speedup)
- bench_string_building_old_vs_new: measures format!+push_str vs
  writeln! (50k iterations, ~1.9x speedup)
- bench_prepare_vs_prepare_cached: measures prepare vs prepare_cached
  (10k iterations, ~1.6x speedup)

Each benchmark verifies correctness (both approaches produce identical
output) and uses std::hint::black_box to prevent dead-code
elimination. Run with: cargo test --test perf_benchmark -- --nocapture

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 17:36:01 -05:00
Taylor Eernisse
3e9cf2358e perf(search+embed): zero-copy embedding API and deferred RRF mapping
Change OllamaClient::embed_batch to accept &[&str] instead of
Vec<String>. The EmbedRequest struct now borrows both model name and
input texts, eliminating per-batch cloning of chunk text (up to 32KB
per chunk x 32 chunks per batch). Serialization output is identical
since serde serializes &str and String to the same JSON.

In hybrid search, defer the RrfResult->HybridResult mapping until
after filter+take, so only `limit` items (typically 20) are
constructed instead of up to 1,500 at RECALL_CAP. Also switch
filtered_ids to into_iter() to avoid an extra .copied() pass.
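
The deferred mapping can be illustrated with an iterator chain. The struct fields and the `top_results` helper are assumptions; only the filter, take, then map ordering reflects the change described above.

```rust
use std::collections::HashSet;

/// Ranked candidate from rank fusion (fields are illustrative).
struct RrfResult {
    document_id: i64,
    rrf_score: f64,
}

/// Result type handed to callers (fields are illustrative).
#[derive(Debug, PartialEq)]
struct HybridResult {
    document_id: i64,
    score: f64,
}

/// Filter and cap first, then construct output values only for survivors.
fn top_results(
    ranked: Vec<RrfResult>,
    allowed: &HashSet<i64>,
    limit: usize,
) -> Vec<HybridResult> {
    ranked
        .into_iter()
        .filter(|r| allowed.contains(&r.document_id))
        .take(limit)
        .map(|r| HybridResult {
            document_id: r.document_id,
            score: r.rrf_score,
        })
        .collect()
}
```

Because iterators are lazy, the `map` closure runs at most `limit` times regardless of how many candidates were ranked.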

Switch FTS search_fts from prepare() to prepare_cached() for statement
reuse across repeated searches. Benchmarked at ~1.6x faster.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 17:35:53 -05:00
Taylor Eernisse
16beb35a69 perf(documents): batch INSERTs and writeln! in document pipeline
Replace individual INSERT-per-label and INSERT-per-path loops in
upsert_document_inner with single multi-row INSERT statements. For a
document with 5 labels, this reduces 5 SQL round-trips to 1.

Replace format!()+push_str() with writeln!() in all three document
extractors (issue, MR, discussion). writeln! writes directly into the
String buffer, avoiding the intermediate allocation that format!
creates. Benchmarked at ~1.9x faster for string building and ~1.6x
faster for batch inserts (measured over 5k iterations in-memory).
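
The two string-building styles can be compared side by side. The function names and the "Title"/"Label" line format are hypothetical; the point is only that `writeln!` on a `String` (via `std::fmt::Write`) skips the temporary that `format!` allocates.

```rust
use std::fmt::Write;

/// Old style: format! allocates a temporary String for every line.
fn build_old(title: &str, labels: &[&str]) -> String {
    let mut out = String::new();
    out.push_str(&format!("Title: {}\n", title));
    for label in labels {
        out.push_str(&format!("Label: {}\n", label));
    }
    out
}

/// New style: writeln! formats straight into the destination buffer.
fn build_new(title: &str, labels: &[&str]) -> String {
    let mut out = String::new();
    // Writing to a String cannot fail, so the Result is discarded.
    let _ = writeln!(out, "Title: {}", title);
    for label in labels {
        let _ = writeln!(out, "Label: {}", label);
    }
    out
}
```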

Also switch get_existing_hash from prepare() to prepare_cached() since
it is called once per document during regeneration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 17:35:42 -05:00
Taylor Eernisse
3767c33c28 feat: Implement Gate 3 timeline pipeline and Gate 4 migration scaffolding
Complete 5 beads for the Phase B temporal intelligence feature:

- bd-1oo: Register migration 015 (commit SHAs, closes watermark) and
  create migration 016 (mr_file_changes table with 4 indexes for
  Gate 4 file-history)

- bd-20e: Define TimelineEvent model with 9 event type variants,
  EntityRef, ExpandedEntityRef, UnresolvedRef, and TimelineResult
  types. Ord impl for chronological sorting with stable tiebreak.

- bd-32q: Implement timeline seed phase - FTS5 keyword search to
  entity IDs with discussion-to-parent resolution, entity dedup,
  and evidence note extraction with snippet truncation.

- bd-ypa: Implement timeline expand phase - BFS cross-reference
  expansion over entity_references with bidirectional traversal,
  depth limiting, mention filtering, provenance tracking, and
  unresolved reference collection.

- bd-3as: Implement timeline event collection - gathers Created,
  StateChanged, LabelAdded/Removed, MilestoneSet/Removed, Merged,
  and NoteEvidence events. Merged dedup (state=merged -> Merged
  variant only). NULL label/milestone fallbacks. Chronological
  interleaving with since filter and limit.
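
The expand phase's depth-limited, bidirectional BFS can be sketched over a plain edge list. This is a simplified illustration: entities are bare `u32` ids, the function name and signature are assumptions, and mention filtering, provenance tracking, and unresolved-reference collection are left out.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first expansion from seed entities, following references in both
/// directions and stopping at `max_depth` hops. Returns (entity, depth) pairs
/// for newly discovered entities, in discovery order.
fn expand(edges: &[(u32, u32)], seeds: &[u32], max_depth: usize) -> Vec<(u32, usize)> {
    // Store each edge in both directions so traversal is bidirectional.
    let mut adjacency: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(from, to) in edges {
        adjacency.entry(from).or_default().push(to);
        adjacency.entry(to).or_default().push(from);
    }

    let mut visited: HashSet<u32> = seeds.iter().copied().collect();
    let mut queue: VecDeque<(u32, usize)> = seeds.iter().map(|&s| (s, 0)).collect();
    let mut discovered = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth limit reached; do not follow further references
        }
        if let Some(neighbors) = adjacency.get(&node) {
            for &next in neighbors {
                // insert() returns false for already-seen entities (dedup).
                if visited.insert(next) {
                    discovered.push((next, depth + 1));
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    discovered
}
```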

38 new tests, all 445 tests pass. All quality gates clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 16:54:28 -05:00
Taylor Eernisse
d1b2b5fa7d chore(beads): Revise 11 Phase B beads with corrected migration numbering and enriched descriptions
Critical fix: Migration 015 exists on disk but was not registered in db.rs.
All beads referencing "migration 015 for mr_file_changes" corrected to migration
016. bd-1oo retitled to reflect dual responsibility (register 015 + create 016).
bd-2y79 renumbered from 016 to 017.

Revised beads: bd-1oo, bd-2yo, bd-1yx, bd-2y79, bd-1nf, bd-2f2, bd-ike,
bd-14q, bd-1ht, bd-z94, bd-2n4.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 15:59:27 -05:00
Taylor Eernisse
a7d5d1c99f chore(beads): Update issue tracker metadata
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 15:30:01 -05:00
Taylor Eernisse
233eb546af feat: Add commit SHAs, closes_issues watermark, and PRD alignment
Migration 015 adds merge_commit_sha/squash_commit_sha to merge_requests
(Gate 4/5 prerequisites), closes_issues_synced_for_updated_at watermark
for incremental sync, and the missing idx_label_events_label index.

The MR transformer and ingestion pipeline now populate commit SHAs during
sync. The orchestrator uses watermark-based filtering for closes_issues
jobs instead of re-enqueuing all MRs every sync.

The Phase B PRD is updated to match the actual codebase: corrected
migration numbering (011-015), documented nullable label/milestone
fields (migration 012), watermark patterns (013), observability
infrastructure (014), simplified source_method values, and updated
entity_references schema to match implementation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 15:29:51 -05:00
Taylor Eernisse
ddcfff1026 chore(beads): Fix factual errors in Phase B bead descriptions
- Fix 6 beads (bd-1ht, bd-2n4, bd-9dd, bd-z94, bd-1yx, bd-3as) that
  incorrectly claimed merge_requests has NO merged_at column. Migration
  006 defines it and it's used throughout the codebase. Updated SQL
  ordering to use COALESCE(merged_at, updated_at).
- Fix bd-32q: build_safe_fts_query() -> to_fts_query(query, FtsQueryMode::Safe)
  (actual function in src/search/fts.rs)
- Add Rust JSON struct examples to bd-dty (robot mode output)
- Add edge cases section to bd-jec (config flag)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 15:00:46 -05:00
Taylor Eernisse
001e4f37b4 chore(beads): Revise 22 Phase B beads with codebase-grounded context
Audited all 24 open beads against the actual codebase state and
Phase B spec. Key corrections:

- Added Codebase Context sections documenting Gates 1-2 as COMPLETE
  (migrations 011-014, all resource event + reference infrastructure)
- Fixed entity_type from &'static str to String for Serialize compat
- Documented actual source_method values (api/note_parse/description_parse)
  vs spec's original values (api_closes_issues etc.)
- Noted merge_requests has NO merged_at column (use updated_at)
- Confirmed migration 015 as next sequential number
- Added NULL label_name/milestone_title handling (migration 012)
- Fixed --since filter threading through collect_events phase
- Added Merged event deduplication from StateChanged{merged}

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 14:49:15 -05:00
Taylor Eernisse
873d2c0ab8 fix(beads): Align bead descriptions with Phase B spec
Reconciled 9 beads (bd-20e, bd-dty, bd-2f2, bd-3as, bd-ypa, bd-32q,
bd-1nf, bd-2ez, bd-343o) against docs/phase-b-temporal-intelligence.md.

Key fixes:
- bd-20e: Add url field, align StateChanged to {state} per spec 3.3,
  fix NoteEvidence fields (note_id, snippet, discussion_id), simplify
  Merged to unit variant, align CrossReferenced to {target}
- bd-dty: Restructure expanded_entities JSON to use nested "via" object
  per spec 3.5, add url/details fields to events, use "project" key
- bd-3as: Align event collection with updated TimelineEventType variants
- bd-ypa: Add via_from/via_reference_type/via_source_method provenance
- bd-32q, bd-1nf, bd-2f2: Add spec section references throughout
- bd-2ez: Document source_method value discrepancy (spec vs codebase)
- bd-343o: Add spec context for how it extends Gate 2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 14:13:34 -05:00
Taylor Eernisse
42b8238329 chore(beads): Enrich all 24 open beads with agent-ready descriptions
Score-2 beads (11 beads, previously stubs) now include:
- Background with rationale and system fit
- Approach with exact code snippets, SQL queries, and type signatures
- Binary acceptance criteria with specific file paths
- TDD loops with test names and verify commands
- Edge cases and gotchas

Score-3 beads (10 beads, previously adequate) enriched with:
- Concrete TDD loops and test names
- Specific SQL queries for database operations
- Edge case documentation

All beads now target score 4+ for autonomous agent execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 13:57:49 -05:00
Taylor Eernisse
5d1586b88e feat(show): Display full discussion content without truncation
Remove artificial length limits from `lore show` output to display
complete descriptions and discussion threads.

Previously, descriptions were truncated to 500 characters and discussion
notes to 300 characters, which cut off important context when reviewing
issues and MRs. Users often need the full content to understand the
complete discussion history.

Changes:
- Remove truncate() helper function and its 2 unit tests
- Pass description and note bodies directly to wrap_text()
- Affects both print_show_issue() and print_show_mr()

The wrap_text() function continues to handle line wrapping for
readability at the configured widths (76/72/68 chars depending on
nesting level).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:46:29 -05:00
Taylor Eernisse
c2f34d3a4f chore(beads): Update issue tracker metadata
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:23:13 -05:00
Taylor Eernisse
3bb24dc6cb docs: Add performance audit report with optimization findings
PERFORMANCE_AUDIT.md documents a comprehensive code analysis identifying
12 optimization opportunities across the codebase:

High-impact findings (ICE score > 8):
1. Triple-EXISTS change detection -> LEFT JOIN (DONE)
2. N+1 label/assignee inserts during ingestion
3. Clone in embedding batch loop
4. Correlated GROUP_CONCAT in list queries
5. Multiple EXISTS per label filter (DONE)

Medium-impact findings (ICE 5-7):
6. String allocation in chunking
7. Multiple COUNT queries -> conditional aggregation (DONE)
8. Collect-then-concat in truncation (DONE)
9. Box<dyn ToSql> allocations in filters
10. Missing Vec::with_capacity hints (DONE)
11. FTS token collect-join pattern (DONE)
12. Transformer string clones

Report includes:
- Methodology section explaining code-analysis approach
- ICE (Impact x Confidence / Effort) scoring matrix
- Detailed SQL query transformations with isomorphism proofs
- Before/after code samples for each optimization
- Test verification notes

Status: 6 of 12 optimizations implemented in this session.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:23:06 -05:00
Taylor Eernisse
42a4bca6df docs: Update README and AGENTS.md with new features and options
README.md:
- Add cross-reference tracking feature description
- Add resource event history feature description
- Add observability feature description (verbosity, JSON logs, metrics)
- Document --no-events flag for sync command
- Add sync timing/progress bar behavior note
- Document verbosity flags (-v, -vv, -vvv)
- Document --log-format json option
- Add new database tables to schema reference:
  - resource_state_events
  - resource_label_events
  - resource_milestone_events
  - entity_references

AGENTS.md:
- Add --no-events example for sync command
- Document verbosity flags (-v, -vv, -vvv)
- Document --log-format json option

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:50 -05:00
Taylor Eernisse
c730b0ec54 feat(cli): Improve help text, error handling, and add fuzzy command suggestions
CLI help improvements (cli/mod.rs):
- Add descriptive help text to all global flags (-c, --robot, -J, etc.)
- Add descriptions to all subcommands (Issues, Mrs, Sync, etc.)
- Add --no-quiet flag for explicit quiet override
- Shell completions command now shows installation instructions for each shell
- Optional subcommand: running bare 'lore' shows help in terminal mode,
  robot-docs in robot mode

Structured clap error handling (main.rs):
- Early robot mode detection before parsing (env + args)
- JSON error output for parse failures in robot mode
- Semantic error codes: UNKNOWN_COMMAND, UNKNOWN_FLAG, MISSING_REQUIRED,
  INVALID_VALUE, ARGUMENT_CONFLICT, etc.
- Fuzzy command suggestion using Jaro-Winkler similarity (>0.7 threshold)
- Help/version requests handled normally (exit 0, not error)
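
The fuzzy suggestion step can be sketched as below. This is a minimal
illustration, not the code from main.rs: the commit uses strsim, while this
sketch hand-rolls Jaro-Winkler so it needs no external crate. The names
jaro, jaro_winkler, and suggest_command are hypothetical.

```rust
/// Jaro similarity in [0, 1]; 1.0 means identical strings.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters only match if they are at most `window` positions apart.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_hit = vec![false; a.len()];
    let mut b_hit = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_hit[j] && a[i] == b[j] {
                a_hit[i] = true;
                b_hit[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Count matched characters that appear in a different order.
    let mut k = 0;
    let mut out_of_order = 0usize;
    for i in 0..a.len() {
        if !a_hit[i] {
            continue;
        }
        while !b_hit[k] {
            k += 1;
        }
        if a[i] != b[k] {
            out_of_order += 1;
        }
        k += 1;
    }
    let m = matches as f64;
    let t = (out_of_order / 2) as f64;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: boosts the Jaro score for a shared prefix (up to 4 chars).
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a.chars().zip(b.chars()).take(4).take_while(|(x, y)| x == y).count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}

/// Returns the closest known command if it scores above the 0.7 threshold.
fn suggest_command<'a>(input: &str, known: &[&'a str]) -> Option<&'a str> {
    known
        .iter()
        .map(|cmd| (*cmd, jaro_winkler(input, cmd)))
        .filter(|(_, score)| *score > 0.7)
        .max_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(cmd, _)| cmd)
}
```

With this sketch a typo such as "isues" resolves to "issues", while an
unrelated word yields no suggestion at all.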

Robot-docs enhancements (main.rs):
- Document deprecated command aliases (list issues -> issues, etc.)
- Document clap error codes for programmatic error handling
- Include completions command in manifest
- Update flag documentation to show short forms (-n, -s, -p, etc.)

Dependencies:
- Add strsim 0.11 for Jaro-Winkler fuzzy matching

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:38 -05:00
Taylor Eernisse
ab43bbd2db feat: Add dry-run mode to ingest, sync, and stats commands
Enables preview of operations without making changes, useful for
understanding what would happen before committing to a full sync.

Ingest dry-run (--dry-run flag):
- Shows resource type, sync mode (full vs incremental), project list
- Per-project info: existing count, has_cursor, last_synced timestamp
- No GitLab API calls, no database writes

Sync dry-run (--dry-run flag):
- Preview all four stages: issues ingest, MRs ingest, docs, embed
- Shows which stages would run vs be skipped (--no-docs, --no-embed)
- Per-project breakdown for both entity types

Stats repair dry-run (--dry-run flag):
- Shows what would be repaired without executing repairs
- "would fix" vs "fixed" indicator in terminal output
- dry_run: true field in JSON response

Implementation details:
- DryRunPreview struct captures project-level sync state
- SyncDryRunResult aggregates previews for all sync stages
- Terminal output uses yellow styling for "would" actions
- JSON output includes dry_run: true at top level

Flag handling:
- --dry-run and --no-dry-run pair for explicit control
- Defaults to false (normal operation)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:22 -05:00
Taylor Eernisse
784fe79b80 feat(show): Enrich issue detail with assignees, milestones, and closing MRs
Issue detail now includes:
- assignees: List of assigned usernames from issue_assignees table
- due_date: Issue due date when set
- milestone: Milestone title when assigned
- closing_merge_requests: MRs that will close this issue when merged

Closing MR detection:
- Queries entity_references table for 'closes' reference type
- Shows MR iid, title, state (with color coding) in terminal output
- Full MR metadata included in JSON output

Human-readable output:
- "Assignees:" line shows comma-separated @usernames
- "Development:" section lists closing MRs with state indicator
- Green for merged, cyan for opened, red for closed

JSON output:
- New fields: assignees, due_date, milestone, closing_merge_requests
- closing_merge_requests array contains iid, title, state, web_url

Test coverage:
- get_issue_assignees: empty, single, multiple (alphabetical order)
- get_closing_mrs: empty, single, ignores 'mentioned' references

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:22:02 -05:00
Taylor Eernisse
db750e4fc5 fix: Graceful HTTP client fallbacks and overflow protection
HTTP client initialization (embedding/ollama.rs, gitlab/client.rs):
- Replace expect/panic with unwrap_or_else fallback to default Client
- Log warning when configured client fails to build
- Prevents crash on TLS/system configuration issues

Doctor command (cli/commands/doctor.rs):
- Handle reqwest Client::builder() failure in Ollama health check
- Return Warning status with descriptive message instead of panicking
- Ensures doctor command remains operational even with HTTP issues

These changes improve resilience when running in unusual environments
(containers with limited TLS, restrictive network policies, etc.)
without affecting normal operation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:21:40 -05:00
Taylor Eernisse
72f1cafdcf perf: Optimize SQL queries and reduce allocations in hot paths
Change detection queries (embedding/change_detector.rs):
- Replace triple-EXISTS subquery pattern with LEFT JOIN + NULL check
- SQLite now scans embedding_metadata once instead of three times
- Semantically identical: returns docs needing embedding when no
  embedding exists, hash changed, or config mismatch

Count queries (cli/commands/count.rs):
- Consolidate 3 separate COUNT queries for issues into single query
  using conditional aggregation (CASE WHEN state = 'x' THEN 1)
- Same optimization for MRs: 5 queries reduced to 1

Search filter queries (search/filters.rs):
- Replace N separate EXISTS clauses for label filtering with single
  IN() clause with COUNT/GROUP BY HAVING pattern
- For multi-label AND queries, this reduces N subqueries to 1

FTS tokenization (search/fts.rs):
- Replace collect-into-Vec-then-join pattern with direct String building
- Pre-allocate capacity hint for result string
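
A rough sketch of the direct-building pattern, assuming tokens are
whitespace-separated and each one is wrapped in double quotes; the function
name build_fts_query and the quoting rule are illustrative assumptions, not
taken from search/fts.rs.

```rust
/// Builds a space-separated list of quoted tokens in a single String,
/// without collecting into an intermediate Vec and joining afterwards.
fn build_fts_query(raw: &str) -> String {
    // Capacity hint: the input length plus a little room for added quotes.
    let mut out = String::with_capacity(raw.len() + 8);
    for token in raw.split_whitespace() {
        if !out.is_empty() {
            out.push(' ');
        }
        out.push('"');
        for ch in token.chars() {
            // Assumption: an embedded double quote is escaped by doubling it.
            if ch == '"' {
                out.push('"');
            }
            out.push(ch);
        }
        out.push('"');
    }
    out
}
```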

Discussion truncation (documents/truncation.rs):
- Calculate total length without allocating concatenated string first
- Only allocate full string when we know it fits within limit
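
The length-first check could look roughly like this. The helper name
join_if_fits and its signature are hypothetical; the real truncation code
likely handles more cases, such as dropping parts to make the rest fit.

```rust
/// Joins `parts` with `sep` only if the result fits within `limit` bytes.
/// The total length is computed up front, so nothing is allocated when
/// the joined string would be too long.
fn join_if_fits(parts: &[&str], sep: &str, limit: usize) -> Option<String> {
    if parts.is_empty() {
        return Some(String::new());
    }
    let total: usize =
        parts.iter().map(|p| p.len()).sum::<usize>() + sep.len() * (parts.len() - 1);
    if total > limit {
        return None;
    }
    let mut out = String::with_capacity(total);
    for (i, part) in parts.iter().enumerate() {
        if i > 0 {
            out.push_str(sep);
        }
        out.push_str(part);
    }
    Some(out)
}
```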

Embedding pipeline (embedding/pipeline.rs):
- Add Vec::with_capacity hints for chunk work and cleared_docs hashset
- Reduces reallocations during embedding batch processing

Backoff calculation (core/backoff.rs):
- Replace unchecked addition with saturating_add to prevent overflow
- Add test case verifying overflow protection
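
For illustration, overflow-safe backoff arithmetic might be written as
follows. The function name, parameters, and the doubling schedule are
assumptions; only the use of saturating arithmetic comes from the commit.

```rust
/// Computes the next attempt time in milliseconds using exponential backoff.
/// Every step saturates at u64::MAX instead of wrapping or panicking.
fn next_attempt_at(now_ms: u64, attempt: u32, base_ms: u64) -> u64 {
    // 2^attempt, clamped when the shift itself would overflow.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let delay = base_ms.saturating_mul(factor);
    now_ms.saturating_add(delay)
}
```

A plain `now_ms + delay` would panic in debug builds and wrap in release
builds once the attempt count grows large enough.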

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 11:21:28 -05:00
Taylor Eernisse
9c04b7fb1b chore(beads): Update issue tracker metadata
Syncs .beads/issues.jsonl and last-touched timestamp with current
project state.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:44 -05:00
Taylor Eernisse
dd2869fd98 test: Remove redundant comments from test files
Applies the same doc comment cleanup to test files:
- Removes test module headers (//! lines)
- Removes obvious test function comments
- Retains comments explaining non-obvious test scenarios

Test names should be descriptive enough to convey intent without
additional comments. Complex test setup or assertions that need
explanation retain their comments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:39 -05:00
Taylor Eernisse
65583ed5d6 refactor: Remove redundant doc comments throughout codebase
Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)

Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs

Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.

Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:32 -05:00
Taylor Eernisse
976ad92ef0 test(gitlab): Add GitLabIssueRef deserialization tests
Adds test coverage for the new GitLabIssueRef type used by the
MR closes_issues API endpoint:

- deserializes_gitlab_issue_ref: Single object with all fields
- deserializes_gitlab_issue_ref_array: Array of refs (typical API response)

Validates that cross-project references (different project_id values)
deserialize correctly, which is important for cross-project close links.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:47 -05:00
Taylor Eernisse
a76dc8089e feat(orchestrator): Integrate closes_issues fetching and cross-ref extraction
Extends the MR ingestion pipeline to populate the entity_references table
from multiple sources:

1. Resource state events (extract_refs_from_state_events):
   Called after draining the resource_events queue for both issues and MRs.
   Extracts "closes" relationships from the structured API data.

2. System notes (extract_refs_from_system_notes):
   Called during MR ingestion to parse "mentioned in" and "closed by"
   patterns from discussion note bodies.

3. MR closes_issues API (new):
   - enqueue_mr_closes_issues_jobs(): Queues jobs for all MRs
   - drain_mr_closes_issues(): Fetches closes_issues for each MR
   - Records cross-references with source_method='closes_issues_api'

New progress events:
- ClosesIssuesFetchStarted { total }
- ClosesIssueFetched { current, total }
- ClosesIssuesFetchComplete { fetched, failed }

New result fields on IngestMrProjectResult:
- closes_issues_fetched: Count of successful fetches
- closes_issues_failed: Count of failed fetches

The pipeline now comprehensively builds the relationship graph between
issues and MRs, enabling queries like "what will close this issue?"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:40 -05:00
Taylor Eernisse
26cf13248d feat(gitlab): Add MR closes_issues API endpoint and GitLabIssueRef type
Extends the GitLab client to fetch the list of issues that an MR will close
when merged, using the /projects/:id/merge_requests/:iid/closes_issues endpoint.

New type:
- GitLabIssueRef: Lightweight issue reference with id, iid, project_id, title,
  state, and web_url. Used for the closes_issues response which returns a list
  of issue summaries rather than full GitLabIssue objects.

New client method:
- fetch_mr_closes_issues(gitlab_project_id, iid): Returns Vec<GitLabIssueRef>
  for all issues that the MR's description/commits indicate will be closed.

This enables building the entity_references table from API data in addition to
parsing system notes, providing more reliable cross-reference discovery.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:30 -05:00
Taylor Eernisse
a2e26454dc build: Add regex dependency for cross-reference parsing
The note_parser module requires regex for extracting "mentioned in" and
"closed by" patterns from GitLab system notes. The regex crate provides:

- LazyLock-compatible lazy compilation (Regex::new at first use)
- Named capture groups for clean field extraction
- Efficient iteration over all matches via captures_iter()

Version 1.x is the current stable release with good compile times.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:21 -05:00
Taylor Eernisse
f748570d4d feat(core): Add cross-reference extraction infrastructure
Introduces two new modules for extracting and storing entity cross-references
from GitLab data:

note_parser.rs:
- Parses system notes for "mentioned in" and "closed by" patterns
- Extracts cross-project references (group/project#42, group/project!123)
- Uses lazy-compiled regexes for performance
- Handles both issue (#) and MR (!) sigils
- Provides extract_refs_from_system_notes() for batch processing
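
As a rough illustration of the reference shapes involved, the sketch below
parses a single token such as group/project#42 or !123. It uses plain
string operations rather than the regexes note_parser.rs relies on, and the
types ParsedRef and RefKind are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum RefKind {
    Issue,        // '#' sigil
    MergeRequest, // '!' sigil
}

#[derive(Debug, PartialEq)]
struct ParsedRef {
    /// None for same-project references such as "#42".
    project_path: Option<String>,
    kind: RefKind,
    iid: u64,
}

/// Parses "group/project#42", "group/project!123", "#42", or "!123".
fn parse_ref(token: &str) -> Option<ParsedRef> {
    let pos = token.rfind(|c: char| c == '#' || c == '!')?;
    let (path, rest) = token.split_at(pos);
    let kind = if rest.starts_with('#') {
        RefKind::Issue
    } else {
        RefKind::MergeRequest
    };
    // The sigil is one ASCII byte, so slicing at 1 is always valid.
    let digits = &rest[1..];
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(ParsedRef {
        project_path: if path.is_empty() { None } else { Some(path.to_string()) },
        kind,
        iid: digits.parse().ok()?,
    })
}
```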

references.rs:
- Extracts refs from resource_state_events table (API-sourced closes links)
- Provides insert_entity_reference() for storing discovered references
- Includes resolution helpers: resolve_issue_local_id, resolve_mr_local_id,
  resolve_project_path for converting iids to internal IDs
- Enables cross-project reference resolution

These modules power the entity_references table, enabling features like
"find all MRs that close this issue" and "find all issues mentioned in this MR".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:13 -05:00
Taylor Eernisse
0b6b168043 chore(beads): Update issue tracker metadata
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 15:02:17 -05:00
Taylor Eernisse
1d003aeac2 fix(sync): Replace text-only progress with animated bars for docs/embed stages
Stages 3 (generate-docs) and 4 (embed) reported progress by appending
"(N/M)" text to the stage spinner message, while stages 1-2 (ingest)
used dedicated indicatif progress bars with animated [====> ] rendering
registered with the global MultiProgress. This visual inconsistency
was introduced when progress callbacks were wired through in 266ed78.

Replace the spinner.set_message() callbacks with proper ProgressBar
instances that match the ingest stage pattern:
- Create a bar-style ProgressBar registered via multi().add()
- Use the same template/progress_chars as the ingest discussion bars
- Lazy-init the tick via AtomicBool to avoid showing the bar before
  the first callback fires (matching how ingest enables ticks only
  at DiscussionSyncStarted)
- Update set_length on every callback for the docs stage, since the
  regenerator's estimated_total can grow if new dirty items are
  queued during processing (using .max() internally)
- Clean up both the sub-bar and stage spinner on completion/error

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 15:02:13 -05:00
Taylor Eernisse
925ec9f574 fix: Retry loop safety, doctor model matching, regenerator robustness
Three defensive improvements from peer code review:

Replace unreachable!() in GitLab client retry loops:
Both request() and request_with_headers() had unreachable!() after
their for loops. While the logic was sound (the final iteration always
reaches the return/break), any refactor to the loop condition would
turn this into a runtime panic. Restructured both to store
last_response with explicit break, making the control flow
self-documenting and the .expect() message useful if ever violated.

Doctor model name comparison asymmetry:
Ollama model names were stripped of their tag (:latest, :v1.5) for
comparison, but the configured model name was compared as-is. A config
value like "nomic-embed-text:v1.5" would never match. Now strips the
tag from both sides before comparing.
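
A minimal sketch of the symmetric comparison, assuming the tag is whatever
follows the first colon; the helper names are hypothetical and not taken
from doctor.rs.

```rust
/// Strips an optional ":tag" suffix, e.g. "nomic-embed-text:v1.5".
fn base_model_name(name: &str) -> &str {
    name.split(':').next().unwrap_or(name)
}

/// Compares two model names after stripping the tag from BOTH sides.
fn model_matches(configured: &str, installed: &str) -> bool {
    base_model_name(configured) == base_model_name(installed)
}
```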

Regenerator savepoint cleanup and progress accuracy:
- upsert_document's error path did ROLLBACK TO but never RELEASE,
  leaving a dangling savepoint that could nest on the next call. Added
  RELEASE after rollback so the connection is clean.
- estimated_total for progress reporting was computed once at start but
  the dirty queue can grow during processing. Now recounts each loop
  iteration with max() so the progress fraction never goes backwards.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 14:16:54 -05:00
Taylor Eernisse
1fdc6d03cc fix: Savepoint leak in embedding pipeline, atomic fail_job, RRF dedup
Three correctness fixes found during peer code review:

Embedding pipeline savepoint leak (HIGH severity):
The SAVEPOINT embed_page / RELEASE embed_page pattern had ~10 `?`
propagation points between them. Any error from record_embedding_error,
clear_document_embeddings, or store_embedding would exit the function
without rolling back, leaving the SQLite connection in a broken
transactional state and causing cascading failures for the rest of the
session. Fixed by extracting page processing into `embed_page()` and
wrapping with explicit rollback-on-error handling.

Dependent queue fail_job race (MEDIUM severity):
fail_job performed a SELECT followed by a separate UPDATE on the
attempts counter without a transaction. Under concurrent lock
reclamation, the attempts value could be read stale. Replaced with a
single atomic UPDATE that increments attempts and computes exponential
backoff entirely in SQL, also halving DB round-trips. Added explicit
error when the job no longer exists.

RRF duplicate document score inflation (MEDIUM severity):
If a retriever returned the same document_id multiple times, the RRF
score accumulated multiple rank contributions while the rank only
recorded the first occurrence. Moved the score accumulation inside the
`if is_none` guard so only the first occurrence per list contributes.
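
The deduplication rule can be illustrated with a small sketch. It tracks
seen IDs per list with a HashSet rather than the `if is_none` guard named
above, and the constant k = 60 and the function name are assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// Assumed smoothing constant; 60 is a commonly used RRF default.
const RRF_K: f64 = 60.0;

/// Reciprocal rank fusion over several ranked lists of document IDs.
/// Only the first occurrence of a document within each list contributes.
fn rrf_scores(lists: &[Vec<u64>]) -> HashMap<u64, f64> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        let mut seen = HashSet::new();
        for (idx, &doc_id) in list.iter().enumerate() {
            // insert() returns false for a duplicate within this list.
            if !seen.insert(doc_id) {
                continue;
            }
            let rank = (idx + 1) as f64;
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (RRF_K + rank);
        }
    }
    scores
}
```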

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 14:16:38 -05:00
Taylor Eernisse
266ed78e73 feat(sync): Wire progress callbacks through sync pipeline stages
The sync command's stage spinners now show real-time aggregate progress
for each pipeline phase instead of static "syncing..." messages.

- Add `progress_callback` parameter to `run_embed` and
  `run_generate_docs` so callers can receive `(processed, total)` updates
- Add `stage_bar` parameter to `run_ingest` for aggregate progress
  across concurrently-ingested projects using shared AtomicUsize counters
- Update `stage_spinner` to use `{prefix}` for the `[N/M]` label,
  allowing `{msg}` to be updated independently with progress details
- Thread `ProgressBar` clones into each concurrent project task so
  per-entity progress (fetch, discussions, events) is reflected on the
  aggregate spinner
- Pass `None` for progress callbacks at standalone CLI entry points
  (handle_ingest, handle_generate_docs, handle_embed) to preserve
  existing behavior when commands are run outside of sync

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 14:16:21 -05:00
teernisse
a65ea2f56f chore(beads): Add observability and orchestrator issues to tracker
Add new beads for MR orchestrator integration, sync run observability,
metrics collection, logging infrastructure, and CLI verbosity controls.
Update last-touched timestamp.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:39:34 -05:00
teernisse
38da7ca47b docs: Add observability PRD and sync pipeline explorer visualization
- prd-observability.md: Product requirements document for the sync pipeline
  observability system, covering structured logging, metrics collection,
  sync run tracking, and robot-mode performance output
- gitlore-sync-explorer.html: Self-contained interactive HTML visualization
  for exploring sync pipeline stage timings and data flow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:39:22 -05:00
teernisse
86a51cddef fix: Project-scoped job claiming, structured rate-limit logging, RRF total_cmp
Targeted fixes across multiple subsystems:

dependent_queue:
- Add project_id parameter to claim_jobs() for project-scoped job claiming,
  preventing cross-project job theft during concurrent multi-project ingestion
- Add project_id parameter to count_pending_jobs() with optional scoping
  (None returns global counts, Some(pid) returns per-project counts)

gitlab/client:
- Downgrade rate-limit log from warn to info (429s are expected operational
  behavior, not warnings) and add structured fields (path, status_code)
  for better log filtering and aggregation

gitlab/transformers/discussion:
- Add tracing::warn on invalid timestamp parse instead of silent fallback
  to epoch 0, making data quality issues visible in logs

ingestion/merge_requests:
- Remove duplicate doc comment on upsert_label_tx

search/rrf:
- Replace partial_cmp().unwrap_or() with total_cmp() for f64 sorting,
  eliminating the NaN edge case entirely (total_cmp imposes a total order,
  so NaN values sort deterministically)
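
A short sketch of the sorting change, assuming results are (document ID,
score) pairs; the function name is hypothetical.

```rust
/// Sorts results by score, highest first. f64::total_cmp never returns
/// None, so a NaN score cannot cause a panic or an arbitrary fallback.
fn sort_by_score_desc(results: &mut [(u64, f64)]) {
    results.sort_by(|a, b| b.1.total_cmp(&a.1));
}
```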

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:39:13 -05:00
teernisse
f6d19a9467 feat(sync): Instrument pipeline with tracing spans, run_id correlation, and metrics
Add end-to-end observability to the sync and ingest pipelines:

Sync command:
- Generate UUID-based run_id for each sync invocation, propagated through
  all child spans for log correlation across stages
- Accept MetricsLayer reference to extract hierarchical StageTiming data
  after pipeline completion for robot-mode performance output
- Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle)
- Wrap entire sync execution in a root tracing span with run_id field

Ingest command:
- Wrap run_ingest in an instrumented root span with run_id and resource_type
- Add project path prefix to discussion progress bars for multi-project clarity
- Reset resource_events_synced_for_updated_at on --full re-sync

Sync status:
- Expand from single last_run to configurable recent runs list (default 10)
- Parse and expose StageTiming metrics from stored metrics_json
- Add run_id, total_items_processed, total_errors to SyncRunInfo
- Add mr_count to DataSummary for complete entity coverage

Orchestrator:
- Add #[instrument] with structured fields to issue and MR ingestion functions
- Record items_processed, items_skipped, errors on span close for MetricsLayer
- Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete)
- Pass project_id through to drain_resource_events for scoped job claiming

Document regenerator and embedding pipeline:
- Add #[instrument] spans with items_processed, items_skipped, errors fields
- Record final counts on span close for metrics extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:39:00 -05:00
teernisse
362503d3bf feat(cli): Add verbosity controls, JSON log format, and triple-layer subscriber
Overhaul the CLI logging infrastructure for production observability:

CLI flags:
- Add -v/-vv/-vvv (--verbose) for progressive stderr verbosity control:
  0=INFO, 1=DEBUG app, 2=DEBUG all, 3+=TRACE
- Add --log-format text|json for structured stderr output in automation
- Existing -q/--quiet overrides verbosity for silent operation
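
The flag-to-level mapping might be modeled like this. The enum and function
names are hypothetical, and treating --quiet as fully silent is an
assumption about what "silent operation" means here.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum StderrLevel {
    Silent,   // assumed effect of -q/--quiet
    Info,     // no -v flag
    DebugApp, // -v: DEBUG for the application crate only
    DebugAll, // -vv: DEBUG for all crates
    Trace,    // -vvv and beyond
}

/// Maps the count of -v flags and the quiet flag to a stderr level.
/// Quiet wins over any verbosity count.
fn stderr_level(verbose: u8, quiet: bool) -> StderrLevel {
    if quiet {
        return StderrLevel::Silent;
    }
    match verbose {
        0 => StderrLevel::Info,
        1 => StderrLevel::DebugApp,
        2 => StderrLevel::DebugAll,
        _ => StderrLevel::Trace,
    }
}
```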

Subscriber architecture (main.rs):
- Replace single-layer subscriber with triple-layer setup:
  1. stderr layer: human-readable or JSON, filtered by -v flags
  2. file layer: always-on JSON to daily-rotated logs (lore.YYYY-MM-DD.log)
  3. MetricsLayer: captures span timing for robot-mode performance payloads
- Parse CLI before subscriber init so verbosity is known at setup time
- Load LoggingConfig early (with graceful fallback for pre-init commands)
- Clean up old log files before subscriber init to avoid holding deleted handles
- Hold WorkerGuard at function scope to ensure flush on exit

Doctor command:
- Add logging health check: validates log directory exists, reports file
  count and total size, warns on missing or inaccessible log directory

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:38:43 -05:00
teernisse
329c8f4539 feat(observability): Add metrics, logging, and sync-run core modules
Introduce the foundational observability layer for the sync pipeline:

- MetricsLayer: Custom tracing subscriber layer that captures span timing
  and structured fields, materializing them into a hierarchical
  Vec<StageTiming> tree for robot-mode performance data output
- logging: Dual-layer subscriber infrastructure with configurable stderr
  verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily
  rotation and configurable retention (default 30 days)
- SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs
  table (start -> succeed|fail), with correlation IDs and aggregate counts
- LoggingConfig: New config section for log_dir, retention_days, and
  file_logging toggle
- get_log_dir(): Path helper for log directory resolution
- is_permanent_api_error(): Distinguish retryable vs permanent API failures
  (only 404 is truly permanent; 403/auth errors may be environmental)
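
One way a compile-time enforced lifecycle such as the SyncRunRecorder one
can be expressed is by having the terminal methods consume self. The sketch
below is illustrative only: the type names are hypothetical, and it returns
the outcome instead of writing to the sync_runs table.

```rust
#[derive(Debug, PartialEq)]
enum RunOutcome {
    Succeeded { items: u64, errors: u64 },
    Failed { message: String },
}

/// A started run. It can be finished exactly once, because succeed()
/// and fail() take `self` by value and the recorder is gone afterwards.
struct RunRecorder {
    run_id: String,
}

impl RunRecorder {
    fn start(run_id: &str) -> Self {
        RunRecorder { run_id: run_id.to_string() }
    }

    fn succeed(self, items: u64, errors: u64) -> (String, RunOutcome) {
        (self.run_id, RunOutcome::Succeeded { items, errors })
    }

    fn fail(self, message: &str) -> (String, RunOutcome) {
        (self.run_id, RunOutcome::Failed { message: message.to_string() })
    }
}
```

Calling succeed() and then fail() on the same recorder is rejected by the
borrow checker as a use of a moved value, so the start -> succeed|fail
ordering cannot be violated at runtime.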

Database changes:
- Migration 013: Add resource_events_synced_for_updated_at watermark columns
  to issues and merge_requests tables for incremental resource event sync
- Migration 014: Enrich sync_runs with run_id correlation ID, aggregate
  counts (total_items_processed, total_errors), and run_id index
- Wrap file-based migrations in savepoints for rollback safety

Dependencies: Add uuid (run_id generation), tracing-appender (file logging)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:38:29 -05:00
Taylor Eernisse
ee5c5f9645 perf: Eliminate double serialization, add SQLite tuning, optimize hot paths
11 isomorphic performance fixes from deep audit (no behavior changes):

- Eliminate double serialization: store_payload now accepts pre-serialized
  bytes (&[u8]) instead of re-serializing from serde_json::Value. Uses
  Cow<[u8]> for zero-copy when compression is disabled.
- Add SQLite cache_size (64MB) and mmap_size (256MB) pragmas
- Replace SELECT-then-INSERT label upserts with INSERT...ON CONFLICT
  RETURNING in both issues.rs and merge_requests.rs
- Replace INSERT + SELECT milestone upsert with RETURNING
- Use prepare_cached for 5 hot-path queries in extractor.rs
- Optimize compute_list_hash: index-sort + incremental SHA-256 instead
  of clone+sort+join+hash
- Pre-allocate embedding float-to-bytes buffer with Vec::with_capacity
- Replace RandomState::new() in rand_jitter with atomic counter XOR nanos
- Remove redundant per-note payload storage (discussion payload contains
  all notes already)
- Change transform_issue to accept &GitLabIssue (avoids full struct clone)
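
The Cow<[u8]> idea from the first item can be sketched as follows. The
function names are hypothetical and toy_compress is a stand-in run-length
encoder, not the compression the project actually uses.

```rust
use std::borrow::Cow;

/// Stand-in compressor: run-length encodes as (count, byte) pairs.
fn toy_compress(raw: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = raw.iter().peekable();
    while let Some(&byte) = iter.next() {
        let mut run = 1u8;
        while run < u8::MAX && iter.peek() == Some(&&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

/// Returns the bytes to store. Without compression the input slice is
/// borrowed as-is, so no copy is made; with compression a new buffer
/// is allocated and owned.
fn prepare_payload(raw: &[u8], compress: bool) -> Cow<'_, [u8]> {
    if compress {
        Cow::Owned(toy_compress(raw))
    } else {
        Cow::Borrowed(raw)
    }
}
```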

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 08:12:37 -05:00
Taylor Eernisse
f5b4a765b7 perf: Configurable rate limit, 429 auto-retry, concurrent project ingestion
The sync pipeline was bottlenecked at 10 req/s (hardcoded) with
sequential project processing and no retry on rate limiting. These
changes target 3-5x throughput improvement.

Rate limit configuration:
- Add requestsPerSecond to SyncConfig (default 30.0, was hardcoded 10)
- Pass configured rate through to GitLabClient::new from ingest
- Floor rate at 0.1 rps in RateLimiter::new to prevent a panic in
  Duration::from_secs_f64(1.0 / 0.0), which is now reachable via user config
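
A minimal sketch of the floor, assuming the limiter derives its minimum
request spacing from the configured rate. The names are hypothetical, and
mapping non-finite input to the floor is an extra assumption.

```rust
use std::time::Duration;

/// Lowest accepted rate, as described above.
const MIN_RPS: f64 = 0.1;

/// Minimum spacing between requests for a configured rate.
/// Duration::from_secs_f64 panics on non-finite or negative input, so a
/// rate of 0.0 (1.0 / 0.0 = infinity) must be clamped before dividing.
fn min_interval(requests_per_second: f64) -> Duration {
    let rps = if requests_per_second.is_finite() {
        requests_per_second.max(MIN_RPS)
    } else {
        MIN_RPS
    };
    Duration::from_secs_f64(1.0 / rps)
}
```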

429 auto-retry:
- Both request() and request_with_headers() retry up to 3 times on
  HTTP 429, respecting the retry-after header (default 60s)
- Extract parse_retry_after helper, reused by handle_response fallback
- After exhausting retries, the 429 error propagates as before
- JSON decode errors now include a response body preview
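
The header parsing could look roughly like this. The signature is an
assumption, and the sketch handles only the delay-in-seconds form of
retry-after, not the HTTP-date form.

```rust
/// Fallback wait when the header is missing or unparseable.
const DEFAULT_RETRY_AFTER_SECS: u64 = 60;

/// Extracts the retry delay in seconds from a retry-after header value.
fn parse_retry_after(header: Option<&str>) -> u64 {
    header
        .and_then(|value| value.trim().parse::<u64>().ok())
        .unwrap_or(DEFAULT_RETRY_AFTER_SECS)
}
```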

Concurrent project ingestion:
- Derive Clone on GitLabClient (cheap: shares Arc<Mutex<RateLimiter>>
  and reqwest::Client which is already Arc-backed)
- Restructure project loop to use futures::stream::buffer_unordered
  with primary_concurrency (default 4) as the parallelism bound
- Each project gets its own SQLite connection (WAL mode + busy_timeout
  handles concurrent writes)
- Add show_spinner field to IngestDisplay to separate the per-project
  spinner from the sync-level stage spinner
- Error aggregation defers failures: all successful projects get their
  summaries printed and results counted before returning the first error
- Bump dependentConcurrency default from 2 to 8 for discussion prefetch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:37:06 -05:00
Taylor Eernisse
4ee99c1677 fix: Propagate queue errors, eliminate format!-based SQL construction
Two hardening changes to the dependent queue and orchestrator:

- dependent_queue::fail_job now propagates the rusqlite error via ?
  instead of silently falling back to 0 attempts when the job row is
  missing. A missing job is a real bug that should surface, not be
  masked by unwrap_or(0) which would cause infinite retries at the
  base backoff interval.

- orchestrator::enqueue_resource_events_for_entity_type replaces
  format!-based SQL ("SELECT {id_col} FROM {table}") with separate
  hardcoded queries per entity type. While the original values were
  not user-controlled, hardcoded SQL is clearer about intent and
  eliminates a class of injection risk entirely.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:36:45 -05:00
Taylor Eernisse
c35f485e0e refactor(cli): Replace tracing-indicatif with shared MultiProgress
tracing-indicatif pulled in vt100, arrayvec, and its own indicatif
integration layer. Replace it with a minimal SuspendingWriter that
coordinates tracing output with progress bars via a global LazyLock
MultiProgress.

- Add src/cli/progress.rs: shared MultiProgress singleton via LazyLock
  and a SuspendingWriter that suspends bars before writing log lines,
  preventing interleaving/flicker
- Wire all progress bar creation through multi().add() in sync and
  ingest commands
- Replace IndicatifLayer in main.rs with SuspendingWriter for
  tracing-subscriber's fmt layer
- Remove tracing-indicatif from Cargo.toml (drops vt100 and arrayvec
  transitive deps)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:36:31 -05:00
Taylor Eernisse
a92e176bb6 fix(events): Handle nullable label and milestone in resource events
GitLab returns null for the label/milestone fields on resource_label_events
and resource_milestone_events when the referenced label or milestone has
been deleted. This caused deserialization failures during sync.

- Add migration 012 to recreate both event tables with nullable
  label_name, milestone_title, and milestone_id columns (SQLite
  requires table recreation to alter NOT NULL constraints)
- Change GitLabLabelEvent.label and GitLabMilestoneEvent.milestone
  to Option<> in the Rust types
- Update upsert functions to pass through None values correctly
- Add tests for null label and null milestone deserialization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 17:36:17 -05:00
Taylor Eernisse
deafa88af5 perf: Concurrent resource event fetching, remove unnecessary async
client.rs:
- fetch_all_resource_events() now uses tokio::try_join!() to fire all
  three API requests (state, label, milestone events) concurrently
  instead of awaiting each sequentially. For entities with many events,
  this reduces wall-clock time by up to ~3x since the three independent
  HTTP round-trips overlap.

main.rs:
- Removed async from handle_issues() and handle_mrs(). These functions
  perform only synchronous database queries and formatting; they never
  await anything. Removing the async annotation avoids the overhead of
  an unnecessary Future state machine and makes the sync nature of
  these code paths explicit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 14:09:44 -05:00
Taylor Eernisse
880ad1d3fa refactor(events): Lift transaction control to callers, eliminate duplicated store functions
events_db.rs:
- Removed internal savepoints from upsert_state_events,
  upsert_label_events, and upsert_milestone_events. Each function
  previously created its own savepoint, making it impossible for
  callers to wrap all three in a single atomic transaction.
- Changed signatures from &mut Connection to &Connection, since
  savepoints are no longer created internally. This makes the
  functions compatible with rusqlite::Transaction (which derefs to
  Connection), allowing callers to pass a transaction directly.

orchestrator.rs:
- Deleted the three store_*_events_tx() functions (store_state_events_tx,
  store_label_events_tx, store_milestone_events_tx) which were
  hand-duplicated copies of the events_db upsert functions, created as
  a workaround for the &mut Connection requirement. Now that events_db
  accepts &Connection, store_resource_events() calls the canonical
  upsert functions directly through the unchecked_transaction.
- Replaced the max-iterations guard in drain_resource_events() with a
  HashSet-based deduplication of job IDs. The old guard used an
  arbitrary 2x multiplier on total_pending which could either terminate
  too early (if many retries were legitimate) or too late. The new
  approach precisely prevents reprocessing the same job within a single
  drain run, which is the actual invariant we need.
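
A minimal sketch of the HashSet-based guard, assuming a drain loop that claims jobs in batches. The `Job` struct and the `claim` and `process` closures are hypothetical; stopping once a batch contains only already-seen jobs is likewise an assumption about how the loop ends.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a claimed queue row; only the id matters here.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    id: i64,
}

/// Drain batches until the queue is empty or a batch contains only jobs
/// that were already handled in this run. Returns the number processed.
fn drain<C, P>(mut claim: C, mut process: P) -> usize
where
    C: FnMut() -> Vec<Job>,
    P: FnMut(&Job),
{
    let mut seen: HashSet<i64> = HashSet::new();
    let mut processed = 0;
    loop {
        let batch = claim();
        if batch.is_empty() {
            break;
        }
        let mut progressed = false;
        for job in &batch {
            // insert() returns false when the id was already present,
            // so a retried job is skipped for the rest of this run.
            if seen.insert(job.id) {
                process(job);
                processed += 1;
                progressed = true;
            }
        }
        if !progressed {
            break;
        }
    }
    processed
}
```

Unlike an iteration cap, the set bounds the loop by the number of distinct jobs, so no multiplier has to be tuned.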

Net effect: ~133 lines of duplicated SQL removed, single source of
truth for event upsert logic, and callers control transaction scope.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 14:09:35 -05:00
Taylor Eernisse
4c0123426a fix: Content hash now computed after truncation, atomic job claiming
Two bug fixes:

1. extractor.rs: The content hash was computed on the pre-truncation
   content, meaning the hash stored in the document didn't correspond
   to the actual stored (truncated) content. This would cause change
   detection to miss updates when content changed only within the
   truncated portion. Hash is now computed after truncate_hard_cap()
   so it always matches the persisted content.

2. dependent_queue.rs: claim_jobs() had a TOCTOU race between the
   SELECT that found available jobs and the UPDATE that locked them.
   Under concurrent callers, two drain runs could claim the same job.
   Replaced with a single UPDATE ... RETURNING statement that
   atomically selects and locks jobs in one operation.
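
The ordering fix in item 1 can be sketched as follows. This is an illustration under assumptions: `truncate_hard_cap` here is a simplified stand-in with a guessed signature, `prepare_document` is a hypothetical helper, and `DefaultHasher` merely substitutes for whatever hash the extractor really uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in: keep at most `max_bytes` bytes, backing up to a
/// char boundary so the result is still valid UTF-8.
fn truncate_hard_cap(content: &str, max_bytes: usize) -> &str {
    if content.len() <= max_bytes {
        return content;
    }
    let mut end = max_bytes;
    while !content.is_char_boundary(end) {
        end -= 1;
    }
    &content[..end]
}

/// Placeholder hash; the real project may use a different algorithm.
fn content_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Truncate first, then hash, so the hash always describes what is stored.
fn prepare_document(raw: &str, max_bytes: usize) -> (String, u64) {
    let stored = truncate_hard_cap(raw, max_bytes);
    (stored.to_string(), content_hash(stored))
}
```

Two inputs that differ only beyond the cap now yield the same stored text and the same hash, so the hash and the persisted content cannot disagree.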

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 14:09:22 -05:00
Taylor Eernisse
bb75a9d228 fix(events): Resource events now run on incremental syncs, fix output and progress bar
Three bugs fixed:

1. Early return in orchestrator when no discussions needed sync also
   skipped resource event enqueue+drain. On incremental syncs (the most
   common case), resource events were never fetched. Restructured to use
   if/else instead of early return so Step 4 always executes.

2. Ingest command JSON and human-readable output silently dropped
   resource_events_fetched/failed counts. Added to IngestJsonData and
   print_ingest_summary.

3. Progress bar reuse after finish_and_clear caused indicatif to silently
   ignore subsequent set_position/set_length calls. Added reset() call
   before reconfiguring the bar for resource events.

Also removed stale comment referencing "unsafe" that didn't reflect
the actual unchecked_transaction approach.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 13:06:35 -05:00
Taylor Eernisse
2bcd8db0e9 feat(events): Wire resource event fetching into sync pipeline (bd-1ep)
Integrate resource event fetching as Step 4 of both issue and MR
ingestion, gated behind the fetch_resource_events config flag.

Orchestrator changes:
- Add ProgressEvent variants: ResourceEventsFetchStarted,
  ResourceEventFetched, ResourceEventsFetchComplete
- Add resource_events_fetched/failed fields to IngestProjectResult
  and IngestMrProjectResult
- New enqueue_resource_events_for_entity_type() queries all
  issues/MRs for a project and enqueues resource_events jobs via
  the dependent queue (INSERT OR IGNORE for idempotency)
- New drain_resource_events() claims jobs in batches, fetches
  state/label/milestone events from GitLab API, stores them
  atomically via unchecked_transaction, and handles failures
  with exponential backoff via fail_job()
- Max-iterations guard prevents infinite retry loops within a
  single drain run
- New store_resource_events() + per-type _tx helpers write events
  using prepared statements inside a single transaction
- DrainResult struct tracks fetched/failed counts

CLI ingest changes:
- IngestResult gains resource_events_fetched/failed fields
- Progress bar repurposed for resource event fetch phase
  (reuses discussion bar with updated template)
- Accumulates event counts from both issue and MR ingestion

CLI sync changes:
- SyncResult gains resource_events_fetched/failed fields
- Accumulates counts from both ingest stages
- print_sync() conditionally displays event counts
- Structured logging includes event counts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 13:02:15 -05:00
Taylor Eernisse
a50fc78823 style: Apply cargo fmt and clippy fixes across codebase
Automated formatting and lint corrections from parallel agent work:

- cargo fmt: import reordering (alphabetical), line wrapping to respect
  max width, trailing comma normalization, destructuring alignment,
  function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
  i64::from() instead of `as i64` casts, .clamp() instead of
  .max().min() chains, let-chain refactors (if-let with &&),
  #[allow(clippy::too_many_arguments)] and
  #[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace
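
For readers unfamiliar with these lints, the three rewrites named above look roughly like this. The function names and bounds are invented for illustration and do not come from the codebase.

```rust
/// Before: `page >= 1 && page <= 100`
fn is_valid_page(page: u32) -> bool {
    (1..=100).contains(&page)
}

/// Before: `count as i64`. `i64::from` only compiles for lossless
/// conversions, so a later type change cannot silently truncate.
fn widen(count: i32) -> i64 {
    i64::from(count)
}

/// Before: `limit.max(1).min(100)`
fn clamp_limit(limit: i64) -> i64 {
    limit.clamp(1, 100)
}
```

Each rewrite returns the same value as the form it replaces, consistent with the "no behavioral changes" note.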

No behavioral changes. All existing tests pass unmodified.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 13:01:59 -05:00
Taylor Eernisse
ff94f24702 chore(beads): Update issue tracker state for Gate 1 completions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 13:01:46 -05:00
Taylor Eernisse
5c521491b7 chore(beads): Update issue tracker state for Gate 1 completions
Closes bd-hu3, bd-2e8, bd-2fm, bd-sqw, bd-1uc, bd-tir, bd-3sh, bd-1m8.
All Gate 1 resource events infrastructure beads except bd-1ep (pipeline
wiring) are now complete.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:08:23 -05:00
Taylor Eernisse
0236ef2776 feat(stats): Extend --check with event FK integrity and queue health diagnostics
Adds two new categories of integrity checks to 'lore stats --check':

Event FK integrity (3 queries):
- Detects orphaned resource_state_events where issue_id or
  merge_request_id points to a non-existent parent entity
- Same check for resource_label_events and resource_milestone_events
- Under normal CASCADE operation these should always be zero; non-zero
  indicates manual DB edits, bugs, or partial migration state

Queue health diagnostics:
- pending_dependent_fetches counts: pending, failed, and stuck (locked)
- queue_stuck_locks: Jobs with locked_at set (potential worker crashes)
- queue_max_attempts: Highest retry count across all jobs (signals
  permanently failing jobs when > 3)

New IntegrityResult fields: orphan_state_events, orphan_label_events,
orphan_milestone_events, queue_stuck_locks, queue_max_attempts.

New QueueStats fields: pending_dependent_fetches,
pending_dependent_fetches_failed, pending_dependent_fetches_stuck.

Human output shows colored PASS/WARN/FAIL indicators:
- Red "!" for orphaned events (integrity failure)
- Yellow "!" for stuck locks and high retry counts (warnings)
- Dependent fetch queue line only shown when non-zero

All new queries are guarded by table_exists() checks for graceful
degradation on databases without migration 011 applied.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:08:15 -05:00
Taylor Eernisse
12811683ca feat(cli): Add 'lore count events' command with human and robot output
Extends the count command to support "events" as an entity type,
displaying resource event counts broken down by event type (state,
label, milestone) and entity type (issue, merge request).

New functions in count.rs:
- run_count_events: Creates DB connection and delegates to
  events_db::count_events for the actual queries
- print_event_count: Human-readable table with aligned columns
  showing per-type breakdowns and row/column totals
- print_event_count_json: Structured JSON matching the robot mode
  contract with ok/data envelope and per-type issue/mr/total counts

JSON output structure:
  {"ok":true,"data":{"state_events":{"issue":N,"merge_request":N,
  "total":N},"label_events":{...},"milestone_events":{...},"total":N}}

Updated exports in commands/mod.rs to expose the three new public
functions (run_count_events, print_event_count, print_event_count_json).

The "events" branch in handle_count (main.rs, committed earlier)
routes to these functions before the existing entity type dispatcher.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:08:01 -05:00
Taylor Eernisse
724be4d265 feat(queue): Add generic dependent fetch queue with exponential backoff
New module src/core/dependent_queue.rs provides job queue operations
against the pending_dependent_fetches table. Designed for second-pass
fetches that depend on primary entity ingestion (resource events,
MR close references, MR file diffs).

Queue operations:
- enqueue_job: Idempotent INSERT OR IGNORE keyed on the UNIQUE
  (project_id, entity_type, entity_iid, job_type) constraint.
  Returns bool indicating whether the row was actually inserted.

- claim_jobs: Two-phase claim — SELECT available jobs (unlocked,
  past retry window) then UPDATE locked_at in batch. Orders by
  enqueued_at ASC for FIFO processing within a job type.

- complete_job: DELETE the row on successful processing.

- fail_job: Increments attempts, calculates exponential backoff
  (30s * 2^(attempts-1), capped at 480s), sets next_retry_at,
  clears locked_at, and records the error message. Reads current
  attempts via query with unwrap_or(0) fallback for robustness.

- reclaim_stale_locks: Clears locked_at on jobs locked longer than
  a configurable threshold, recovering from worker crashes.

- count_pending_jobs: GROUP BY job_type aggregation for progress
  reporting and stats display.
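
The backoff formula in fail_job can be written out as a small function. This is a sketch of the arithmetic only; `backoff_seconds` is a hypothetical name, and it assumes `attempts` is the count after incrementing (so the first failure passes 1).

```rust
/// Delay before the next retry: 30s * 2^(attempts - 1), capped at 480s.
fn backoff_seconds(attempts: u32) -> u64 {
    const BASE_SECS: u64 = 30;
    const CAP_SECS: u64 = 480;
    // saturating_sub guards attempts == 0; min(10) keeps the shift small
    // so the multiplication can never overflow before the cap applies.
    let exponent = attempts.saturating_sub(1).min(10);
    (BASE_SECS << exponent).min(CAP_SECS)
}
```

Under that assumption the delays run 30s, 60s, 120s, 240s, 480s, and stay at 480s from the fifth failure onward.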

Registers both events_db and dependent_queue in src/core/mod.rs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:07:48 -05:00
Taylor Eernisse
c34ed3007e feat(db): Add event upsert functions and count queries in events_db module
New module src/core/events_db.rs provides database operations for
resource events:

- upsert_state_events: Batch INSERT OR REPLACE for state change events,
  keyed on UNIQUE(gitlab_id, project_id). Wraps in a savepoint for
  atomicity per entity batch. Maps GitLabStateEvent fields including
  optional user, source_commit, and source_merge_request_iid.

- upsert_label_events: Same pattern for label add/remove events,
  extracting label.name for denormalized storage.

- upsert_milestone_events: Same pattern for milestone assignment events,
  storing both milestone.title and milestone.id.

All three upsert functions:
- Take &mut Connection (required for savepoint creation)
- Use prepare_cached for statement reuse across batch iterations
- Convert ISO timestamps via iso_to_ms_strict for ms-epoch storage
- Propagate rusqlite errors via the #[from] LoreError::Database path
- Return the count of events processed

Supporting functions:
- resolve_entity_ids: Maps entity_type string to (issue_id, MR_id) pair
  with exactly-one-non-NULL invariant matching the CHECK constraints
- count_events: Queries all three event tables with conditional COUNT
  aggregations, returning EventCounts struct. Uses unwrap_or((0, 0))
  for graceful degradation when tables don't exist (pre-migration 011).
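
The exactly-one-non-NULL invariant behind resolve_entity_ids can be sketched like this. The signature, the `local_id` parameter, and the `String` error type are assumptions made for illustration; the real function may look up ids in the database and return a `LoreError`.

```rust
/// Map an entity type to the (issue_id, merge_request_id) column pair.
/// Exactly one side is Some, mirroring the CHECK constraints on the
/// event tables.
fn resolve_entity_ids(
    entity_type: &str,
    local_id: i64,
) -> Result<(Option<i64>, Option<i64>), String> {
    match entity_type {
        "issue" => Ok((Some(local_id), None)),
        "merge_request" => Ok((None, Some(local_id))),
        other => Err(format!("unknown entity type: {other}")),
    }
}
```

Rejecting unknown entity types up front means a bad string surfaces as an error in Rust rather than as a constraint violation at insert time.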

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:07:34 -05:00
Taylor Eernisse
e73d2907dc feat(client): Add Resource Events API endpoints with generic paginated fetcher
Extends GitLabClient with methods for fetching resource events from
GitLab's per-entity API endpoints. Adds a new impl block containing:

- fetch_all_pages<T>: Generic paginated collector that handles
  x-next-page header parsing with fallback to page-size heuristics.
  Uses per_page=100 and respects the existing rate limiter via
  request_with_headers. Terminates when: (a) x-next-page header is
  absent/stale, (b) response is empty, or (c) page is not full.

- Six typed endpoint methods:
  - fetch_issue_state_events / fetch_mr_state_events
  - fetch_issue_label_events / fetch_mr_label_events
  - fetch_issue_milestone_events / fetch_mr_milestone_events

- fetch_all_resource_events: Convenience method that fetches all three
  event types for an entity (issue or merge_request) in sequence,
  returning a tuple of (state, label, milestone) event vectors.
  Routes to issue or MR endpoints based on entity_type string.
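
One possible reading of the three termination conditions in fetch_all_pages is sketched below. The HTTP call is abstracted into a hypothetical `fetch_page` closure that returns the page's items together with the parsed x-next-page value; the real method is async and goes through request_with_headers.

```rust
const PER_PAGE: usize = 100;

/// Collect every page. `fetch_page(page)` returns the items on that page
/// and the parsed x-next-page header, if present.
fn fetch_all_pages<T, F>(mut fetch_page: F) -> Result<Vec<T>, String>
where
    F: FnMut(u32) -> Result<(Vec<T>, Option<u32>), String>,
{
    let mut all = Vec::new();
    let mut page: u32 = 1;
    loop {
        let (items, next) = fetch_page(page)?;
        // (b) an empty response means there is nothing further to read.
        if items.is_empty() {
            break;
        }
        let full = items.len() >= PER_PAGE;
        all.extend(items);
        // (c) a short page is treated as the last page.
        if !full {
            break;
        }
        match next {
            // Only follow a header that points strictly forward.
            Some(n) if n > page => page = n,
            // (a) header absent or stale: stop instead of looping.
            _ => break,
        }
    }
    Ok(all)
}
```

Requiring the header to point strictly forward is what keeps a stale value from causing an endless loop.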

All methods follow the existing client patterns: path formatting with
gitlab_project_id and iid, error propagation via Result, and rate
limiter integration through the shared request_with_headers path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:07:19 -05:00
Taylor Eernisse
9d4755521f feat(config): Add fetchResourceEvents config flag with --no-events CLI override
Adds a new boolean field to SyncConfig that controls whether resource
event fetching is performed during sync:

- SyncConfig.fetch_resource_events: defaults to true via serde
  default_true helper, serialized as "fetchResourceEvents" in JSON
- SyncArgs.no_events: --no-events CLI flag that overrides the config
  value to false when present
- SyncOptions.no_events: propagates the flag through the sync pipeline
- handle_sync_cmd: mutates loaded config when --no-events is set,
  ensuring the flag takes effect regardless of config file contents
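
The override pattern can be sketched without serde or clap. The structs below are reduced to the one field each that matters here, and `apply_cli_overrides` is a hypothetical helper; the commit describes handle_sync_cmd mutating the loaded config in place.

```rust
/// Illustrative subset of the sync configuration.
#[derive(Debug, Clone, PartialEq)]
struct SyncConfig {
    fetch_resource_events: bool,
}

impl Default for SyncConfig {
    fn default() -> Self {
        // Mirrors the documented default of `true`.
        Self { fetch_resource_events: true }
    }
}

/// Illustrative subset of the parsed CLI arguments.
struct SyncArgs {
    no_events: bool,
}

/// Apply CLI overrides on top of the loaded config.
/// The flag can only turn event fetching off, never on.
fn apply_cli_overrides(mut config: SyncConfig, args: &SyncArgs) -> SyncConfig {
    if args.no_events {
        config.fetch_resource_events = false;
    }
    config
}
```

Because the flag is one-directional, a config file that already disables event fetching stays disabled when the flag is absent.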

This follows the existing pattern established by --no-embed and
--no-docs flags, where CLI flags override config file defaults.
The config is loaded as mutable specifically to support this override.

Also adds "events" to the count command's entity type value_parser,
enabling `lore count events` (implementation in a separate commit).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:07:06 -05:00
Taylor Eernisse
92ff255909 feat(types): Add GitLab Resource Event serde types with deserialization tests
Adds six new types for deserializing responses from GitLab's three
Resource Events API endpoints (state, label, milestone):

- GitLabStateEvent: State transitions with optional user, source_commit,
  and source_merge_request reference
- GitLabLabelEvent: Label add/remove events with nested GitLabLabelRef
- GitLabMilestoneEvent: Milestone assignment changes with nested
  GitLabMilestoneRef
- GitLabMergeRequestRef: Lightweight MR reference (iid, title, web_url)
- GitLabLabelRef: Label metadata (id, name, color, description)
- GitLabMilestoneRef: Milestone metadata (id, iid, title)

All types derive Deserialize + Serialize and use Option<T> for nullable
fields (user, source_commit, color, description) to match GitLab's API
contract where these fields may be null.

Includes 8 new test cases covering:
- State events with/without user, with/without source_merge_request
- Label events for add and remove actions, including null color handling
- Milestone event deserialization
- Standalone ref type deserialization (MR, label, milestone)

Uses r##"..."## raw string delimiters where JSON contains hex color
codes (#FF0000) that would conflict with r#"..."# delimiters.
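
A short illustration of the delimiter issue. The JSON payload and the function name are made up for the example.

```rust
/// In an `r#"..."#` literal the two characters `"#` end the string, and
/// `"#FF0000"` begins with exactly that sequence. Doubling the hashes
/// moves the terminator to `"##`, which does not occur in the JSON.
fn label_json() -> &'static str {
    r##"{"id": 1, "name": "bug", "color": "#FF0000", "description": null}"##
}
```

The general rule is to use one more `#` than the longest run of `#` that follows a double quote inside the literal.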

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:06:56 -05:00
Taylor Eernisse
ce5cd9c95d feat(schema): Add migration 011 for resource events, entity references, and dependent fetch queue
Introduces five new tables that power temporal queries (timeline,
file-history, trace) via GitLab Resource Events APIs:

- resource_state_events: State transitions (opened/closed/reopened/merged/locked)
  with actor tracking, source commit, and source MR references
- resource_label_events: Label add/remove history per entity
- resource_milestone_events: Milestone assignment changes per entity
- entity_references: Cross-reference table (Gate 2 prep) linking
  source/target entity pairs with reference type and discovery method
- pending_dependent_fetches: Generic job queue for resource_events,
  mr_closes_issues, and mr_diffs with exponential backoff retry

All event tables enforce entity exclusivity via CHECK constraints
(exactly one of issue_id or merge_request_id must be non-NULL).
Deduplication handled via UNIQUE indexes on (gitlab_id, project_id).
FK cascades ensure cleanup when parent entities are removed.

The dependent fetch queue uses a UNIQUE constraint on
(project_id, entity_type, entity_iid, job_type) for idempotent
enqueue, with partial indexes optimizing claim and retry queries.

Registered as migration 011 in the embedded MIGRATIONS array in db.rs.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 12:06:43 -05:00
256 changed files with 85055 additions and 6126 deletions

File diff suppressed because one or more lines are too long


@@ -1 +1 @@
bd-1m8
bd-1sc6

17
.claude/hooks/on-file-write.sh Executable file

@@ -0,0 +1,17 @@
#!/bin/bash
# Ultimate Bug Scanner - Claude Code Hook
# Runs on every file save for UBS-supported languages (JS/TS, Python, C/C++, Rust, Go, Java, Ruby)
# Claude Code hooks receive context as JSON on stdin.
INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
CWD=$(echo "$INPUT" | jq -r '.cwd // empty')
if [[ "$FILE_PATH" =~ \.(js|jsx|ts|tsx|mjs|cjs|py|pyw|pyi|c|cc|cpp|cxx|h|hh|hpp|hxx|rs|go|java|rb)$ ]]; then
  echo "🔬 Running bug scanner..."
  if ! command -v ubs >/dev/null 2>&1; then
    echo "⚠️ 'ubs' not found in PATH; install it before using this hook." >&2
    exit 0
  fi
  ubs "$FILE_PATH" --ci 2>&1 | head -50
fi

99
.claude/plan.md Normal file

@@ -0,0 +1,99 @@
# Plan: Add Colors to Sync Command Output
## Current State
The sync output has three layers, each needing color treatment:
### Layer 1: Stage Lines (during sync)
```
✓ Issues 10 issues from 2 projects 4.2s
✓ Status 3 statuses updated · 5 seen 4.2s
vs/typescript-code 2 issues · 1 statuses updated
✓ MRs 5 merge requests from 2 projects 12.3s
vs/python-code 3 MRs · 10 discussions
✓ Docs 1,200 documents generated 8.1s
✓ Embed 3,400 chunks embedded 45.2s
```
**What's uncolored:** icons, labels, numbers, elapsed times, sub-row project paths, failure counts in parentheses.
### Layer 2: Summary (after sync)
```
Synced 10 issues and 5 MRs in 42.3s
120 discussions · 45 events · 12 diffs · 3 statuses updated
1,200 docs regenerated · 3,400 embedded
```
**What's already colored:** headline ("Synced" = green bold, "Sync completed with issues" = warning bold), issue/MR counts (bold), error line (red). Detail lines are all dim.
### Layer 3: Timing breakdown (`-t` flag)
```
── Timing ──────────────────────
issues .............. 4.2s
merge_requests ...... 12.3s
```
**What's already colored:** dots (dim), time (bold), errors (red), rate limits (warning).
---
## Color Plan
Using only existing `Theme` methods — no new colors needed.
### Stage Lines (`format_stage_line` + callers in sync.rs)
| Element | Current | Proposed | Theme method |
|---------|---------|----------|-------------|
| Icon (✓/⚠) | plain | green for success, yellow for warning | `Theme::success()` / `Theme::warning()` |
| Label ("Issues", "MRs", etc.) | plain | bold | `Theme::bold()` |
| Numbers in summary text | plain | bold | `Theme::bold()` (just the count) |
| Elapsed time | plain | muted gray | `Theme::timing()` |
| Failure text in parens | plain | warning/error color | `Theme::warning()` |
### Sub-rows (project breakdown lines)
| Element | Current | Proposed |
|---------|---------|----------|
| Project path | dim | `Theme::muted()` (slightly brighter than dim) |
| Counts (numbers only) | dim | `Theme::dim()` but numbers in normal weight |
| Error/failure counts | dim | `Theme::warning()` |
| Middle dots | dim | keep dim (they're separators, should recede) |
### Summary (`print_sync`)
| Element | Current | Proposed |
|---------|---------|----------|
| Issue/MR counts in headline | bold only | `Theme::info()` + bold (cyan numbers pop) |
| Time in headline | plain | `Theme::timing()` |
| Detail line numbers | all dim | numbers in `Theme::info()`, rest stays dim |
| Doc line numbers | all dim | numbers in `Theme::info()`, rest stays dim |
| "Already up to date" time | plain | `Theme::timing()` |
---
## Files to Change
1. **`src/cli/progress.rs`** — `format_stage_line()`: apply color to icon, bold to label, `Theme::timing()` to elapsed
2. **`src/cli/commands/sync.rs`** —
   - Pass colored icons to `format_stage_line` / `emit_stage_line` / `emit_stage_block`
   - Color failure text in `append_failures()`
   - Color numbers and time in `print_sync()`
   - Color error/failure counts in sub-row functions (`issue_sub_rows`, `mr_sub_rows`, `status_sub_rows`)
## Approach
- `format_stage_line` already receives the icon string — color it before passing
- Add a `color_icon` helper that applies success/warning color to the icon glyph
- Bold the label in `format_stage_line`
- Apply `Theme::timing()` to elapsed in `format_stage_line`
- In `append_failures`, wrap failure text in `Theme::warning()`
- In `print_sync`, wrap count numbers with `Theme::info().bold()`
- In sub-row functions, apply `Theme::warning()` to error/failure parts only (keep rest dim)
## Non-goals
- No changes to robot mode (JSON output)
- No changes to dry-run output (already reasonably colored)
- No new Theme colors — use existing palette
- No changes to timing breakdown (already colored)


@@ -0,0 +1,106 @@
---
name: release
description: Bump version, tag, and prepare for next development cycle
version: 1.0.0
author: Taylor Eernisse
category: automation
tags: ["release", "versioning", "semver", "git"]
---
# Release
Automate SemVer version bumps for the `lore` CLI.
## Invocation
```
/release <type>
```
Where `<type>` is one of:
- **major** — breaking changes (0.5.0 -> 1.0.0)
- **minor** — new features (0.5.0 -> 0.6.0)
- **patch** / **hotfix** — bug fixes (0.5.0 -> 0.5.1)
If no type is provided, ask the user.
## Procedure
Follow these steps exactly. Do NOT skip any step.
### 1. Determine bump type
Parse the argument. Accept these aliases:
- `major`, `breaking` -> MAJOR
- `minor`, `feature`, `feat` -> MINOR
- `patch`, `hotfix`, `fix` -> PATCH
If the argument doesn't match, ask the user to clarify.
### 2. Read current version
Read `Cargo.toml` and extract the `version = "X.Y.Z"` line. Parse into major, minor, patch integers.
### 3. Compute new version
- MAJOR: `(major+1).0.0`
- MINOR: `major.(minor+1).0`
- PATCH: `major.minor.(patch+1)`
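
Steps 1 through 3 amount to a small pure function. The sketch below is illustrative only: the skill itself is carried out by the agent, and `Bump`, `parse_bump`, `parse_version`, and `bump_version` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Bump {
    Major,
    Minor,
    Patch,
}

/// Step 1: map the argument and its aliases to a bump type.
fn parse_bump(arg: &str) -> Option<Bump> {
    match arg {
        "major" | "breaking" => Some(Bump::Major),
        "minor" | "feature" | "feat" => Some(Bump::Minor),
        "patch" | "hotfix" | "fix" => Some(Bump::Patch),
        _ => None,
    }
}

/// Step 2: split "X.Y.Z" into integers, rejecting anything else.
fn parse_version(version: &str) -> Option<(u64, u64, u64)> {
    let mut parts = version.split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next()??;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Step 3: compute the new version, resetting the lower components.
fn bump_version(version: &str, bump: Bump) -> Option<String> {
    let (major, minor, patch) = parse_version(version)?;
    Some(match bump {
        Bump::Major => format!("{}.0.0", major + 1),
        Bump::Minor => format!("{}.{}.0", major, minor + 1),
        Bump::Patch => format!("{}.{}.{}", major, minor, patch + 1),
    })
}
```

An unrecognized argument yields `None` from `parse_bump`, which corresponds to the "ask the user to clarify" branch of step 1.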
### 4. Check preconditions
Run `git status` and `git log --oneline -5`. Show the user:
- Current version: X.Y.Z
- New version: A.B.C
- Bump type: major/minor/patch
- Working tree status (clean or dirty)
- Last 5 commits (so they can confirm scope)
If the working tree is dirty, warn: "You have uncommitted changes. They will NOT be included in the release tag. Continue?"
Ask the user to confirm before proceeding.
### 5. Update Cargo.toml
Edit the `version = "..."` line in Cargo.toml to the new version.
### 6. Update Cargo.lock
Run `cargo check` to update Cargo.lock with the new version. This also verifies the project compiles.
### 7. Commit the version bump
```bash
git add Cargo.toml Cargo.lock
git commit -m "release: v{NEW_VERSION}"
```
### 8. Tag the release
```bash
git tag v{NEW_VERSION}
```
### 9. Report
Print a summary:
```
Release v{NEW_VERSION} created.
Previous: v{OLD_VERSION}
Bump: {type}
Tag: v{NEW_VERSION}
Commit: {short hash}
To push: git push && git push --tags
```
Do NOT push automatically. The user decides when to push.
## Examples
```
/release minor -> 0.5.0 -> 0.6.0
/release hotfix -> 0.5.0 -> 0.5.1
/release patch -> 0.5.0 -> 0.5.1
/release major -> 0.5.0 -> 1.0.0
```

50
.cline/rules Normal file

@@ -0,0 +1,50 @@
````markdown
## UBS Quick Reference for AI Agents
UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**
**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
**Commands:**
```bash
ubs file.ts file2.py # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=js,python src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs --help # Full command reference
ubs sessions --entries 1 # Tail the latest install session log
ubs . # Whole project (ignores things like .venv and node_modules automatically)
```
**Output Format:**
```
⚠️ Category (N errors)
file.ts:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
**Fix Workflow:**
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.
**Bug Severity:**
- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs
**Anti-Patterns:**
- ❌ Ignore findings → ✅ Investigate each
- ❌ Full scan per edit → ✅ Scope to file
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
````

50
.codex/rules/ubs.md Normal file

@@ -0,0 +1,50 @@
````markdown
## UBS Quick Reference for AI Agents
UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**
**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
**Commands:**
```bash
ubs file.ts file2.py # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=js,python src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs --help # Full command reference
ubs sessions --entries 1 # Tail the latest install session log
ubs . # Whole project (ignores things like .venv and node_modules automatically)
```
**Output Format:**
```
⚠️ Category (N errors)
file.ts:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
**Fix Workflow:**
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.
**Bug Severity:**
- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs
**Anti-Patterns:**
- ❌ Ignore findings → ✅ Investigate each
- ❌ Full scan per edit → ✅ Scope to file
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
````

16
.continue/config.json Normal file

@@ -0,0 +1,16 @@
{
  "customCommands": [
    {
      "name": "scan-bugs",
      "description": "Run Ultimate Bug Scanner on current project",
      "prompt": "Run 'ubs --fail-on-warning .' and fix any critical issues found before proceeding"
    }
  ],
  "slashCommands": [
    {
      "name": "quality",
      "description": "Check code quality with UBS",
      "run": "ubs ."
    }
  ]
}

50
.cursor/rules Normal file

@@ -0,0 +1,50 @@
````markdown
## UBS Quick Reference for AI Agents
UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**
**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
**Commands:**
```bash
ubs file.ts file2.py # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=js,python src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs --help # Full command reference
ubs sessions --entries 1 # Tail the latest install session log
ubs . # Whole project (ignores things like .venv and node_modules automatically)
```
**Output Format:**
```
⚠️ Category (N errors)
file.ts:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
**Fix Workflow:**
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.
**Bug Severity:**
- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs
**Anti-Patterns:**
- ❌ Ignore findings → ✅ Investigate each
- ❌ Full scan per edit → ✅ Scope to file
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
````

50
.gemini/rules Normal file

@@ -0,0 +1,50 @@
````markdown
## UBS Quick Reference for AI Agents
UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**
**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
**Commands:**
```bash
ubs file.ts file2.py # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=js,python src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs --help # Full command reference
ubs sessions --entries 1 # Tail the latest install session log
ubs . # Whole project (ignores things like .venv and node_modules automatically)
```
**Output Format:**
```
⚠️ Category (N errors)
file.ts:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
**Fix Workflow:**
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.
**Bug Severity:**
- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs
**Anti-Patterns:**
- ❌ Ignore findings → ✅ Investigate each
- ❌ Full scan per edit → ✅ Scope to file
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
````

21
.github/workflows/roam.yml vendored Normal file

@@ -0,0 +1,21 @@
name: Roam Code Analysis
on:
  pull_request:
    branches: [main, master]
permissions:
  contents: read
  pull-requests: write
jobs:
  roam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install roam-code
      - run: roam index
      - run: roam fitness
      - run: roam pr-risk --json

3
.gitignore vendored

@@ -41,6 +41,9 @@ lore.config.json
*.db-shm
# Mock seed data
tools/mock-seed/
# Added by cargo
/target

50
.opencode/rules Normal file

@@ -0,0 +1,50 @@
````markdown
## UBS Quick Reference for AI Agents
UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**
**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
**Commands:**
```bash
ubs file.ts file2.py # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=js,python src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs --help # Full command reference
ubs sessions --entries 1 # Tail the latest install session log
ubs . # Whole project (ignores things like .venv and node_modules automatically)
```
**Output Format:**
```
⚠️ Category (N errors)
file.ts:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
**Fix Workflow:**
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.
**Bug Severity:**
- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs
**Anti-Patterns:**
- ❌ Ignore findings → ✅ Investigate each
- ❌ Full scan per edit → ✅ Scope to file
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
````

11
.roam/fitness.yaml Normal file

@@ -0,0 +1,11 @@
rules:
  - name: No circular imports in core
    type: dependency
    source: "src/**"
    forbidden_target: "tests/**"
    reason: "Production code should not import test modules"
  - name: Complexity threshold
    type: metric
    metric: cognitive_complexity
    threshold: 30
    reason: "Functions above 30 cognitive complexity need refactoring"

163
AGENTS.md

@@ -14,6 +14,14 @@ If I tell you to do something, even if it goes against what follows below, YOU M
---
## Version Control: jj-First (CRITICAL)
**ALWAYS prefer jj (Jujutsu) over git for all VCS operations.** This is a colocated repo with both `.jj/` and `.git/`. When instructed to use git by anything — even later in this file — use the best jj replacement commands instead. Only fall back to raw `git` for things jj cannot do (hooks, LFS, submodules, `gh` CLI interop).
See `~/.claude/rules/jj-vcs/` for the full command reference, translation table, revsets, patterns, and recovery recipes.
---
## Irreversible Git & Filesystem Actions — DO NOT EVER BREAK GLASS
> **Note:** Treat destructive commands as break-glass. If there's any doubt, stop and ask.
@@ -35,6 +43,8 @@ We only use **Cargo** in this project, NEVER any other package manager.
- **Configuration:** Cargo.toml only
- **Unsafe code:** Forbidden (`#![forbid(unsafe_code)]`)
When writing Rust code, reference RUST_CLI_TOOLS_BEST_PRACTICES.md
### Release Profile
Use the release profile defined in `Cargo.toml`. If you need to change it, justify the
@@ -314,7 +324,7 @@ bv --robot-insights | jq '.Cycles' # Circular deps (must
```bash
ubs file.rs file2.rs # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs $(jj diff --name-only) # Changed files — before commit
ubs --only=rust,toml src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs . # Whole project (ignores target/, Cargo.lock)
@@ -426,9 +436,9 @@ Returns structured results with file paths, line ranges, and extracted code snip
## Beads Workflow Integration
This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in git.
This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in version control.
**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.
**Note:** `br` is non-invasive—it never executes VCS commands directly. You must commit manually after `br sync --flush-only`.
### Essential Commands
@@ -444,7 +454,7 @@ br create --title="..." --type=task --priority=2
br update <id> --status=in_progress
br close <id> --reason="Completed"
br close <id1> <id2> # Close multiple issues at once
br sync --flush-only # Export to JSONL (then manually: git add .beads/ && git commit)
br sync --flush-only # Export to JSONL (then: jj commit -m "Update beads")
```
### Workflow Pattern
@@ -464,15 +474,14 @@ br sync --flush-only # Export to JSONL (then manually: git add .beads/ && git c
### Session Protocol
**Before ending any session, run this checklist:**
**Before ending any session, run this checklist (solo/lead only — workers skip VCS):**
```bash
git status # Check what changed
git add <files> # Stage code changes
br sync --flush-only # Export beads to JSONL
git add .beads/ # Stage beads changes
git commit -m "..." # Commit code and beads
git push # Push to remote
jj status # Check what changed
br sync --flush-only # Export beads to JSONL
jj commit -m "..." # Commit code and beads (jj auto-tracks all changes)
jj bookmark set <name> -r @- # Point bookmark at committed work
jj git push -b <name> # Push to remote
```
### Best Practices
@@ -481,13 +490,15 @@ git push # Push to remote
- Update status as you work (in_progress → closed)
- Create new issues with `br create` when you discover tasks
- Use descriptive titles and set appropriate priority/type
- Always run `br sync --flush-only` then commit .beads/ before ending session
- Always run `br sync --flush-only` then commit before ending session (jj auto-tracks .beads/)
<!-- end-bv-agent-instructions -->
## Landing the Plane (Session Completion)
**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.
**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until push succeeds.
**WHO RUNS THIS:** Solo agents run it themselves. In multi-agent sessions, ONLY the team lead runs this. Workers skip VCS entirely.
**MANDATORY WORKFLOW:**
@@ -496,19 +507,20 @@ git push # Push to remote
3. **Update issue status** - Close finished work, update in-progress items
4. **PUSH TO REMOTE** - This is MANDATORY:
```bash
git pull --rebase
br sync --flush-only
git add .beads/
git commit -m "Update beads"
git push
git status # MUST show "up to date with origin"
jj git fetch # Get latest remote state
jj rebase -d trunk() # Rebase onto latest trunk if needed
br sync --flush-only # Export beads to JSONL
jj commit -m "Update beads" # Commit (jj auto-tracks .beads/ changes)
jj bookmark set <name> -r @- # Point bookmark at committed work
jj git push -b <name> # Push to remote
jj log -r '<name>' # Verify bookmark position
```
5. **Clean up** - Clear stashes, prune remote branches
5. **Clean up** - Abandon empty orphan changes if any (`jj abandon <rev>`)
6. **Verify** - All changes committed AND pushed
7. **Hand off** - Provide context for next session
**CRITICAL RULES:**
- Work is NOT complete until `git push` succeeds
- Work is NOT complete until `jj git push` succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds
@@ -591,7 +603,7 @@ If you aren't 100% sure how to use a third-party library, **SEARCH ONLINE** to f
## Gitlore Robot Mode
The `lore` CLI has a robot mode optimized for AI agent consumption with structured JSON output, meaningful exit codes, and TTY auto-detection.
The `lore` CLI has a robot mode optimized for AI agent consumption with compact JSON output, structured errors with machine-actionable recovery steps, meaningful exit codes, response timing metadata, field selection for token efficiency, and TTY auto-detection.
### Activation
@@ -616,6 +628,13 @@ LORE_ROBOT=1 lore issues
lore --robot issues -n 10
lore --robot mrs -s opened
# Filter issues by work item status (case-insensitive)
lore --robot issues --status "In progress"
# List with field selection (reduces token usage ~60%)
lore --robot issues --fields minimal
lore --robot mrs --fields iid,title,state,draft
# Show detailed entity info
lore --robot issues 123
lore --robot mrs 456 -p group/repo
@@ -633,6 +652,9 @@ lore --robot status
# Run full sync pipeline
lore --robot sync
# Run sync without resource events
lore --robot sync --no-events
# Run ingestion only
lore --robot ingest issues
@@ -642,7 +664,7 @@ lore --robot doctor
# Document and index statistics
lore --robot stats
# Quick health pre-flight check (exit 0 = healthy, 1 = unhealthy)
# Quick health pre-flight check (exit 0 = healthy, 19 = unhealthy)
lore --robot health
# Generate searchable documents from ingested data
@@ -651,7 +673,7 @@ lore --robot generate-docs
# Generate vector embeddings via Ollama
lore --robot embed
# Agent self-discovery manifest (all commands, flags, exit codes)
# Agent self-discovery manifest (all commands, flags, exit codes, response schemas)
lore robot-docs
# Version information
@@ -660,16 +682,27 @@ lore --robot version
### Response Format
All commands return consistent JSON:
All commands return compact JSON with a uniform envelope and timing metadata:
```json
{"ok":true,"data":{...},"meta":{...}}
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```
Errors return structured JSON to stderr:
Errors return structured JSON to stderr with machine-actionable recovery steps:
```json
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'"}}
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}
```
The `actions` array contains executable shell commands for automated recovery. It is omitted when empty.
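As a minimal sketch of consuming the error envelope above, the helper below lists the recovery commands from an error JSON string. It assumes `jq` is installed and that the envelope has the shape shown in the example; the function name `lore_recovery_actions` is hypothetical and not part of `lore`.

```bash
# Hypothetical helper (not part of lore): print each recovery command
# found in an error envelope passed as the first argument.
# Assumes jq is available. Prints nothing when `actions` is omitted.
lore_recovery_actions() {
  printf '%s' "$1" | jq -r '.error.actions // [] | .[]'
}

# Usage sketch (not run here); a real envelope would be captured from stderr:
#   lore_recovery_actions '{"error":{"code":"CONFIG_NOT_FOUND","message":"...","actions":["lore init"]}}'
```

The sketch only prints the commands; whether to execute them automatically is left to the caller.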
### Field Selection
The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response:
```bash
lore -J issues --fields minimal # Preset: iid, title, state, updated_at_iso
lore -J mrs --fields iid,title,state,draft,labels # Custom field list
```
### Exit Codes
@@ -677,7 +710,7 @@ Errors return structured JSON to stderr:
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Internal error / health check failed / not implemented |
| 1 | Internal error / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
@@ -695,6 +728,7 @@ Errors return structured JSON to stderr:
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 19 | Health check failed |
| 20 | Config not found |
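
A wrapper script might translate these codes into readable messages. The following is a sketch under stated assumptions: it maps only the codes visible in the table above, and the function name `lore_exit_meaning` is hypothetical. Codes not shown here fall through to a generic message rather than being guessed.

```bash
# Hypothetical helper (not part of lore): describe an exit code.
# Only codes listed in the table above are mapped; the rest fall through.
lore_exit_meaning() {
  case "$1" in
    0)  echo "Success" ;;
    1)  echo "Internal error / not implemented" ;;
    2)  echo "Usage error" ;;
    3)  echo "Config invalid" ;;
    4)  echo "Token not set" ;;
    16) echo "Embedding failed" ;;
    17) echo "Not found" ;;
    18) echo "Ambiguous match" ;;
    19) echo "Health check failed" ;;
    20) echo "Config not found" ;;
    *)  echo "Unmapped code $1" ;;
  esac
}

# Usage sketch (not run here):
#   lore --robot health; lore_exit_meaning "$?"
```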
### Configuration Precedence
@@ -708,10 +742,79 @@ Errors return structured JSON to stderr:
- Use `lore --robot` or `lore -J` for all agent interactions
- Check exit codes for error handling
- Parse JSON errors from stderr
- Parse JSON errors from stderr; use `actions` array for automated recovery
- Use `--fields minimal` to reduce token usage (~60% fewer tokens)
- Use `-n` / `--limit` to control response size
- Use `-q` / `--quiet` to suppress progress bars and non-essential output
- Use `--color never` in non-TTY automation for ANSI-free output
- Use `-v` / `-vv` / `-vvv` for increasing verbosity (debug/trace logging)
- Use `--log-format json` for machine-readable log output to stderr
- TTY detection handles piped commands automatically
- Use `lore --robot health` as a fast pre-flight check before queries
- Use `lore robot-docs` for response schema discovery
- The `-p` flag supports fuzzy project matching (suffix and substring)
---
## Read/Write Split: lore vs glab
| Operation | Tool | Why |
|-----------|------|-----|
| List issues/MRs | lore | Richer: includes status, discussions, closing MRs |
| View issue/MR detail | lore | Pre-joined discussions, work-item status |
| Search across entities | lore | FTS5 + vector hybrid search |
| Expert/workload analysis | lore | who command — no glab equivalent |
| Timeline reconstruction | lore | Chronological narrative — no glab equivalent |
| Create/update/close | glab | Write operations |
| Approve/merge MR | glab | Write operations |
| CI/CD pipelines | glab | Not in lore scope |
````markdown
## UBS Quick Reference for AI Agents
UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**
**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
**Commands:**
```bash
ubs file.ts file2.py # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=js,python src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs --help # Full command reference
ubs sessions --entries 1 # Tail the latest install session log
ubs . # Whole project (ignores things like .venv and node_modules automatically)
```
**Output Format:**
```
⚠️ Category (N errors)
file.ts:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
**Fix Workflow:**
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.
**Bug Severity:**
- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs
**Anti-Patterns:**
- ❌ Ignore findings → ✅ Investigate each
- ❌ Full scan per edit → ✅ Scope to file
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
````

742
AGENTS.md.backup Normal file

@@ -0,0 +1,742 @@
# AGENTS.md
## RULE 0 - THE FUNDAMENTAL OVERRIDE PREROGATIVE
If I tell you to do something, even if it goes against what follows below, YOU MUST LISTEN TO ME. I AM IN CHARGE, NOT YOU.
---
## RULE NUMBER 1: NO FILE DELETION
**YOU ARE NEVER ALLOWED TO DELETE A FILE WITHOUT EXPRESS PERMISSION.** Even a new file that you yourself created, such as a test code file. You have a horrible track record of deleting critically important files or otherwise throwing away tons of expensive work. As a result, you have permanently lost any and all rights to determine that a file or folder should be deleted.
**YOU MUST ALWAYS ASK AND RECEIVE CLEAR, WRITTEN PERMISSION BEFORE EVER DELETING A FILE OR FOLDER OF ANY KIND.**
---
## Irreversible Git & Filesystem Actions — DO NOT EVER BREAK GLASS
> **Note:** Treat destructive commands as break-glass. If there's any doubt, stop and ask.
1. **Absolutely forbidden commands:** `git reset --hard`, `git clean -fd`, `rm -rf`, or any command that can delete or overwrite code/data must never be run unless the user explicitly provides the exact command and states, in the same message, that they understand and want the irreversible consequences.
2. **No guessing:** If there is any uncertainty about what a command might delete or overwrite, stop immediately and ask the user for specific approval. "I think it's safe" is never acceptable.
3. **Safer alternatives first:** When cleanup or rollbacks are needed, request permission to use non-destructive options (`git status`, `git diff`, `git stash`, copying to backups) before ever considering a destructive command.
4. **Mandatory explicit plan:** Even after explicit user authorization, restate the command verbatim, list exactly what will be affected, and wait for a confirmation that your understanding is correct. Only then may you execute it—if anything remains ambiguous, refuse and escalate.
5. **Document the confirmation:** When running any approved destructive command, record (in the session notes / final response) the exact user text that authorized it, the command actually run, and the execution time. If that record is absent, the operation did not happen.
---
## Toolchain: Rust & Cargo
We only use **Cargo** in this project, NEVER any other package manager.
- **Edition/toolchain:** Follow `rust-toolchain.toml` (if present). Do not assume stable vs nightly.
- **Dependencies:** Explicit versions for stability; keep the set minimal.
- **Configuration:** Cargo.toml only
- **Unsafe code:** Forbidden (`#![forbid(unsafe_code)]`)
When writing Rust code, reference RUST_CLI_TOOLS_BEST_PRACTICES.md
### Release Profile
Use the release profile defined in `Cargo.toml`. If you need to change it, justify the
performance/size tradeoff and how it impacts determinism and cancellation behavior.
---
## Code Editing Discipline
### No Script-Based Changes
**NEVER** run a script that processes/changes code files in this repo. Brittle regex-based transformations create far more problems than they solve.
- **Always make code changes manually**, even when there are many instances
- For many simple changes: use parallel subagents
- For subtle/complex changes: do them methodically yourself
### No File Proliferation
If you want to change something or add a feature, **revise existing code files in place**.
**NEVER** create variations like:
- `mainV2.rs`
- `main_improved.rs`
- `main_enhanced.rs`
New files are reserved for **genuinely new functionality** that makes zero sense to include in any existing file. The bar for creating new files is **incredibly high**.
---
## Backwards Compatibility
We do not care about backwards compatibility—we're in early development with no users. We want to do things the **RIGHT** way with **NO TECH DEBT**.
- Never create "compatibility shims"
- Never create wrapper functions for deprecated APIs
- Just fix the code directly
---
## Compiler Checks (CRITICAL)
**After any substantive code changes, you MUST verify no errors were introduced:**
```bash
# Check for compiler errors and warnings
cargo check --all-targets
# Check for clippy lints (pedantic + nursery are enabled)
cargo clippy --all-targets -- -D warnings
# Verify formatting
cargo fmt --check
```
If you see errors, **carefully understand and resolve each issue**. Read sufficient context to fix them the RIGHT way.
---
## Testing
### Unit & Property Tests
```bash
# Run all tests
cargo test
# Run with output
cargo test -- --nocapture
```
When adding or changing primitives, add tests that assert the core invariants:
- no task leaks
- no obligation leaks
- losers are drained after races
- region close implies quiescence
Prefer deterministic lab-runtime tests for concurrency-sensitive behavior.
---
## MCP Agent Mail — Multi-Agent Coordination
A mail-like layer that lets coding agents coordinate asynchronously via MCP tools and resources. Provides identities, inbox/outbox, searchable threads, and advisory file reservations with human-auditable artifacts in Git.
### Why It's Useful
- **Prevents conflicts:** Explicit file reservations (leases) for files/globs
- **Token-efficient:** Messages stored in per-project archive, not in context
- **Quick reads:** `resource://inbox/...`, `resource://thread/...`
### Same Repository Workflow
1. **Register identity:**
```
ensure_project(project_key=<abs-path>)
register_agent(project_key, program, model)
```
2. **Reserve files before editing:**
```
file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true)
```
3. **Communicate with threads:**
```
send_message(..., thread_id="FEAT-123")
fetch_inbox(project_key, agent_name)
acknowledge_message(project_key, agent_name, message_id)
```
4. **Quick reads:**
```
resource://inbox/{Agent}?project=<abs-path>&limit=20
resource://thread/{id}?project=<abs-path>&include_bodies=true
```
### Macros vs Granular Tools
- **Prefer macros for speed:** `macro_start_session`, `macro_prepare_thread`, `macro_file_reservation_cycle`, `macro_contact_handshake`
- **Use granular tools for control:** `register_agent`, `file_reservation_paths`, `send_message`, `fetch_inbox`, `acknowledge_message`
### Common Pitfalls
- `"from_agent not registered"`: Always `register_agent` in the correct `project_key` first
- `"FILE_RESERVATION_CONFLICT"`: Adjust patterns, wait for expiry, or use non-exclusive reservation
- **Auth errors:** If JWT+JWKS enabled, include bearer token with matching `kid`
---
## Beads (br) — Dependency-Aware Issue Tracking
Beads provides a lightweight, dependency-aware issue database and CLI (`br` / beads_rust) for selecting "ready work," setting priorities, and tracking status. It complements MCP Agent Mail's messaging and file reservations.
**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.
### Conventions
- **Single source of truth:** Beads for task status/priority/dependencies; Agent Mail for conversation and audit
- **Shared identifiers:** Use Beads issue ID (e.g., `br-123`) as Mail `thread_id` and prefix subjects with `[br-123]`
- **Reservations:** When starting a task, call `file_reservation_paths()` with the issue ID in `reason`
### Typical Agent Flow
1. **Pick ready work (Beads):**
```bash
br ready --json # Choose highest priority, no blockers
```
2. **Reserve edit surface (Mail):**
```
file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true, reason="br-123")
```
3. **Announce start (Mail):**
```
send_message(..., thread_id="br-123", subject="[br-123] Start: <title>", ack_required=true)
```
4. **Work and update:** Reply in-thread with progress
5. **Complete and release:**
```bash
br close br-123 --reason "Completed"
```
```
release_file_reservations(project_key, agent_name, paths=["src/**"])
```
Final Mail reply: `[br-123] Completed` with summary
### Mapping Cheat Sheet
| Concept | Value |
|---------|-------|
| Mail `thread_id` | `br-###` |
| Mail subject | `[br-###] ...` |
| File reservation `reason` | `br-###` |
| Commit messages | Include `br-###` for traceability |
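
The mapping above can be sketched as two small shell helpers that derive the Mail subject and a commit message from a Beads issue ID. Both function names are hypothetical, and the trailing `(br-###)` commit format is an assumption; the table only requires that the ID appear somewhere in the message.

```bash
# Hypothetical helpers (not part of br or Agent Mail).

# Build a Mail subject of the form "[br-123] <text>".
bead_mail_subject() {
  local issue_id="$1" text="$2"
  printf '[%s] %s\n' "$issue_id" "$text"
}

# Build a commit message that carries the issue ID for traceability.
# The trailing "(br-123)" placement is an assumed convention.
bead_commit_message() {
  local issue_id="$1" summary="$2"
  printf '%s (%s)\n' "$summary" "$issue_id"
}

bead_mail_subject "br-123" "Start: add trace command"
bead_commit_message "br-123" "Add trace command"
```

The issue ID itself doubles as the Mail `thread_id` and the file reservation `reason`, so no helper is needed for those.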
---
## bv — Graph-Aware Triage Engine
bv is a graph-aware triage engine for Beads projects (`.beads/beads.jsonl`). It computes PageRank, betweenness, critical path, cycles, HITS, eigenvector, and k-core metrics deterministically.
**Scope boundary:** bv handles *what to work on* (triage, priority, planning). For agent-to-agent coordination (messaging, work claiming, file reservations), use MCP Agent Mail.
**CRITICAL: Use ONLY `--robot-*` flags. Bare `bv` launches an interactive TUI that blocks your session.**
### The Workflow: Start With Triage
**`bv --robot-triage` is your single entry point.** It returns:
- `quick_ref`: at-a-glance counts + top 3 picks
- `recommendations`: ranked actionable items with scores, reasons, unblock info
- `quick_wins`: low-effort high-impact items
- `blockers_to_clear`: items that unblock the most downstream work
- `project_health`: status/type/priority distributions, graph metrics
- `commands`: copy-paste shell commands for next steps
```bash
bv --robot-triage # THE MEGA-COMMAND: start here
bv --robot-next # Minimal: just the single top pick + claim command
```
### Command Reference
**Planning:**
| Command | Returns |
|---------|---------|
| `--robot-plan` | Parallel execution tracks with `unblocks` lists |
| `--robot-priority` | Priority misalignment detection with confidence |
**Graph Analysis:**
| Command | Returns |
|---------|---------|
| `--robot-insights` | Full metrics: PageRank, betweenness, HITS, eigenvector, critical path, cycles, k-core, articulation points, slack |
| `--robot-label-health` | Per-label health: `health_level`, `velocity_score`, `staleness`, `blocked_count` |
| `--robot-label-flow` | Cross-label dependency: `flow_matrix`, `dependencies`, `bottleneck_labels` |
| `--robot-label-attention [--attention-limit=N]` | Attention-ranked labels |
**History & Change Tracking:**
| Command | Returns |
|---------|---------|
| `--robot-history` | Bead-to-commit correlations |
| `--robot-diff --diff-since <ref>` | Changes since ref: new/closed/modified issues, cycles |
**Other:**
| Command | Returns |
|---------|---------|
| `--robot-burndown <sprint>` | Sprint burndown, scope changes, at-risk items |
| `--robot-forecast <id\|all>` | ETA predictions with dependency-aware scheduling |
| `--robot-alerts` | Stale issues, blocking cascades, priority mismatches |
| `--robot-suggest` | Hygiene: duplicates, missing deps, label suggestions |
| `--robot-graph [--graph-format=json\|dot\|mermaid]` | Dependency graph export |
| `--export-graph <file.html>` | Interactive HTML visualization |
### Scoping & Filtering
```bash
bv --robot-plan --label backend # Scope to label's subgraph
bv --robot-insights --as-of HEAD~30 # Historical point-in-time
bv --recipe actionable --robot-plan # Pre-filter: ready to work
bv --recipe high-impact --robot-triage # Pre-filter: top PageRank
bv --robot-triage --robot-triage-by-track # Group by parallel work streams
bv --robot-triage --robot-triage-by-label # Group by domain
```
### Understanding Robot Output
**All robot JSON includes:**
- `data_hash` — Fingerprint of source beads.jsonl
- `status` — Per-metric state: `computed|approx|timeout|skipped` + elapsed ms
- `as_of` / `as_of_commit` — Present when using `--as-of`
**Two-phase analysis:**
- **Phase 1 (instant):** degree, topo sort, density
- **Phase 2 (async, 500ms timeout):** PageRank, betweenness, HITS, eigenvector, cycles
### jq Quick Reference
```bash
bv --robot-triage | jq '.quick_ref' # At-a-glance summary
bv --robot-triage | jq '.recommendations[0]' # Top recommendation
bv --robot-plan | jq '.plan.summary.highest_impact' # Best unblock target
bv --robot-insights | jq '.status' # Check metric readiness
bv --robot-insights | jq '.Cycles' # Circular deps (must fix!)
```
---
## UBS — Ultimate Bug Scanner
**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.
### Commands
```bash
ubs file.rs file2.rs # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=rust,toml src/ # Language filter (3-5x faster)
ubs --ci --fail-on-warning . # CI mode — before PR
ubs . # Whole project (ignores target/, Cargo.lock)
```
### Output Format
```
⚠️ Category (N errors)
file.rs:42:5 Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail
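
As a minimal sketch of the parse rule above, the helper below splits one finding line into its parts using bash parameter expansion. It assumes findings keep the `file:line:col description` shape from the sample output and that the path contains no colon or space; the function name `parse_finding` is hypothetical and not part of UBS.

```bash
# Hypothetical helper (not part of UBS): split "file:line:col description".
# Assumes the path contains neither ':' nor ' '.
parse_finding() {
  local finding="$1"
  local location="${finding%% *}"     # everything before the first space
  local description="${finding#* }"   # everything after the first space
  local file="${location%%:*}"        # text before the first colon
  local rest="${location#*:}"         # "line:col"
  local line="${rest%%:*}"
  local col="${rest#*:}"
  printf 'file=%s line=%s col=%s desc=%s\n' "$file" "$line" "$col" "$description"
}

parse_finding "file.rs:42:5 Issue description"
# prints: file=file.rs line=42 col=5 desc=Issue description
```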
### Fix Workflow
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
### Bug Severity
- **Critical (always fix):** Memory safety, use-after-free, data races, SQL injection
- **Important (production):** Unwrap panics, resource leaks, overflow checks
- **Contextual (judgment):** TODO/FIXME, println! debugging
---
## ast-grep vs ripgrep
**Use `ast-grep` when structure matters.** It parses code and matches AST nodes, ignoring comments/strings, and can **safely rewrite** code.
- Refactors/codemods: rename APIs, change import forms
- Policy checks: enforce patterns across a repo
- Editor/automation: LSP mode, `--json` output
**Use `ripgrep` when text is enough.** Fastest way to grep literals/regex.
- Recon: find strings, TODOs, log lines, config values
- Pre-filter: narrow candidate files before ast-grep
### Rule of Thumb
- Need correctness or **applying changes** → `ast-grep`
- Need raw speed or **hunting text** → `rg`
- Often combine: `rg` to shortlist files, then `ast-grep` to match/modify
### Rust Examples
```bash
# Find structured code (ignores comments)
ast-grep run -l Rust -p 'fn $NAME($$$ARGS) -> $RET { $$$BODY }'
# Find all unwrap() calls
ast-grep run -l Rust -p '$EXPR.unwrap()'
# Quick textual hunt
rg -n 'println!' -t rust
# Combine speed + precision
rg -l -t rust 'unwrap\(' | xargs ast-grep run -l Rust -p '$X.unwrap()' --json
```
---
## Morph Warp Grep — AI-Powered Code Search
**Use `mcp__morph-mcp__warp_grep` for exploratory "how does X work?" questions.** An AI agent expands your query, greps the codebase, reads relevant files, and returns precise line ranges with full context.
**Use `ripgrep` for targeted searches.** When you know exactly what you're looking for.
**Use `ast-grep` for structural patterns.** When you need AST precision for matching/rewriting.
### When to Use What
| Scenario | Tool | Why |
|----------|------|-----|
| "How is pattern matching implemented?" | `warp_grep` | Exploratory; don't know where to start |
| "Where is the quick reject filter?" | `warp_grep` | Need to understand architecture |
| "Find all uses of `Regex::new`" | `ripgrep` | Targeted literal search |
| "Find files with `println!`" | `ripgrep` | Simple pattern |
| "Replace all `unwrap()` with `expect()`" | `ast-grep` | Structural refactor |
### warp_grep Usage
```
mcp__morph-mcp__warp_grep(
repoPath: "/path/to/dcg",
query: "How does the safe pattern whitelist work?"
)
```
Returns structured results with file paths, line ranges, and extracted code snippets.
### Anti-Patterns
- **Don't** use `warp_grep` to find a specific function name → use `ripgrep`
- **Don't** use `ripgrep` to understand "how does X work" → wastes time with manual reads
- **Don't** use `ripgrep` for codemods → risks collateral edits
<!-- bv-agent-instructions-v1 -->
---
## Beads Workflow Integration
This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in git.
**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.
### Essential Commands
```bash
# View issues (launches TUI - avoid in automated sessions)
bv
# CLI commands for agents (use these instead)
br ready # Show issues ready to work (no blockers)
br list --status=open # All open issues
br show <id> # Full issue details with dependencies
br create --title="..." --type=task --priority=2
br update <id> --status=in_progress
br close <id> --reason="Completed"
br close <id1> <id2> # Close multiple issues at once
br sync --flush-only # Export to JSONL (then manually: git add .beads/ && git commit)
```
### Workflow Pattern
1. **Start**: Run `br ready` to find actionable work
2. **Claim**: Use `br update <id> --status=in_progress`
3. **Work**: Implement the task
4. **Complete**: Use `br close <id>`
5. **Sync**: Run `br sync --flush-only`, then `git add .beads/ && git commit -m "Update beads"`
### Key Concepts
- **Dependencies**: Issues can block other issues. `br ready` shows only unblocked work.
- **Priority**: P0=critical, P1=high, P2=medium, P3=low, P4=backlog (use numbers, not words)
- **Types**: task, bug, feature, epic, question, docs
- **Blocking**: `br dep add <issue> <depends-on>` to add dependencies
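
Because priorities are passed as numbers rather than words, a script that receives a priority word could convert it first. This is a sketch only: the function name `priority_number` is hypothetical and not part of `br`, and the word list simply mirrors the P0 to P4 labels above.

```bash
# Hypothetical helper (not part of br): map a priority word to its number.
priority_number() {
  case "$1" in
    critical) echo 0 ;;
    high)     echo 1 ;;
    medium)   echo 2 ;;
    low)      echo 3 ;;
    backlog)  echo 4 ;;
    [0-4])    echo "$1" ;;   # already numeric, pass through
    *)        echo "unknown priority: $1" >&2; return 1 ;;
  esac
}

# Usage sketch (not run here):
#   br create --title="..." --type=task --priority="$(priority_number medium)"
```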
### Session Protocol
**Before ending any session, run this checklist:**
```bash
git status # Check what changed
git add <files> # Stage code changes
br sync --flush-only # Export beads to JSONL
git add .beads/ # Stage beads changes
git commit -m "..." # Commit code and beads
git push # Push to remote
```
### Best Practices
- Check `br ready` at session start to find available work
- Update status as you work (in_progress → closed)
- Create new issues with `br create` when you discover tasks
- Use descriptive titles and set appropriate priority/type
- Always run `br sync --flush-only` then commit .beads/ before ending session
<!-- end-bv-agent-instructions -->
## Landing the Plane (Session Completion)
**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.
**MANDATORY WORKFLOW:**
1. **File issues for remaining work** - Create issues for anything that needs follow-up
2. **Run quality gates** (if code changed) - Tests, linters, builds
3. **Update issue status** - Close finished work, update in-progress items
4. **PUSH TO REMOTE** - This is MANDATORY:
```bash
git pull --rebase
br sync --flush-only
git add .beads/
git commit -m "Update beads"
git push
git status # MUST show "up to date with origin"
```
5. **Clean up** - Clear stashes, prune remote branches
6. **Verify** - All changes committed AND pushed
7. **Hand off** - Provide context for next session
**CRITICAL RULES:**
- Work is NOT complete until `git push` succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds
---
## cass — Cross-Agent Session Search
`cass` indexes prior agent conversations (Claude Code, Codex, Cursor, Gemini, ChatGPT, etc.) so we can reuse solved problems.
**Rules:** Never run bare `cass` (TUI). Always use `--robot` or `--json`.
### Examples
```bash
cass health
cass search "async runtime" --robot --limit 5
cass view /path/to/session.jsonl -n 42 --json
cass expand /path/to/session.jsonl -n 42 -C 3 --json
cass capabilities --json
cass robot-docs guide
```
### Tips
- Use `--fields minimal` for lean output
- Filter by agent with `--agent`
- Use `--days N` to limit to recent history
stdout is data-only, stderr is diagnostics; exit code 0 means success.
Treat cass as a way to avoid re-solving problems other agents already handled.
---
## Note for Codex/GPT-5.2
You constantly bother me and stop working with concerned questions that look similar to this:
```
Unexpected changes (need guidance)
- Working tree still shows edits I did not make in Cargo.toml, Cargo.lock, src/runtime.rs, src/scope.rs. Please advise whether to keep/commit/revert these before any further work. I did not touch them.
Next steps (pick one)
1. Decide how to handle the unrelated modified files above so we can resume cleanly.
```
NEVER EVER DO THAT AGAIN. The answer is literally ALWAYS the same: those are changes created by the potentially dozens of other agents working on the project at the same time. This is not only a common occurrence, it happens multiple times PER MINUTE. The way to deal with it is simple: you NEVER, under ANY CIRCUMSTANCE, stash, revert, overwrite, or otherwise disturb in ANY way the work of other agents. Just treat those changes identically to changes that you yourself made. Just fool yourself into thinking YOU made the changes and simply don't recall it for some reason.
---
## Note on Built-in TODO Functionality
Also, if I ask you to explicitly use your built-in TODO functionality, don't complain about this and say you need to use beads. You can use built-in TODOs if I tell you specifically to do so. Always comply with such orders.
## TDD Requirements
Test-first development is mandatory:
1. **RED** - Write failing test first
2. **GREEN** - Minimal implementation to pass
3. **REFACTOR** - Clean up while green
## Key Patterns
Find the simplest solution that meets all acceptance criteria.
Use third-party libraries whenever there's a well-maintained, active, and widely adopted solution (for example, date-fns for TS date math).
Build extensible pieces of logic that can easily be integrated with other pieces.
DRY principles should be loosely held.
Architecture MUST be clear and well thought-out. Ask the user for clarification whenever ambiguity is discovered around architecture, or you think a better approach than planned exists.
---
## Third-Party Library Usage
If you aren't 100% sure how to use a third-party library, **SEARCH ONLINE** to find the latest documentation and mid-2025 best practices.
---
## Gitlore Robot Mode
The `lore` CLI has a robot mode optimized for AI agent consumption with compact JSON output, structured errors with machine-actionable recovery steps, meaningful exit codes, response timing metadata, field selection for token efficiency, and TTY auto-detection.
### Activation
```bash
# Explicit flag
lore --robot issues -n 10
# JSON shorthand (-J)
lore -J issues -n 10
# Auto-detection (when stdout is not a TTY)
lore issues | jq .
# Environment variable
LORE_ROBOT=1 lore issues
```
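
The activation paths above reduce to a single boolean decision. A minimal sketch, assuming the flag, the environment variable, and the non-TTY check are simply OR-ed together; the function name `robot_mode_enabled` and the rule that `LORE_ROBOT` counts as set when it is non-empty and not `0` are assumptions for illustration, not taken from the `lore` source:

```rust
/// Decide whether to emit robot (JSON) output.
/// Hypothetical helper: the real `lore` logic may differ.
fn robot_mode_enabled(flag: bool, env_value: Option<&str>, stdout_is_tty: bool) -> bool {
    // Assumption: any non-empty value other than "0" enables robot mode.
    let env_on = matches!(env_value, Some(v) if !v.is_empty() && v != "0");
    // Explicit flag, environment variable, or piped stdout each suffice.
    flag || env_on || !stdout_is_tty
}
```

In a real binary the third argument could come from `std::io::stdout().is_terminal()` (the `std::io::IsTerminal` trait), and the second from `std::env::var("LORE_ROBOT").ok()`.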
### Robot Mode Commands
```bash
# List issues/MRs with JSON output
lore --robot issues -n 10
lore --robot mrs -s opened
# List with field selection (reduces token usage ~60%)
lore --robot issues --fields minimal
lore --robot mrs --fields iid,title,state,draft
# Show detailed entity info
lore --robot issues 123
lore --robot mrs 456 -p group/repo
# Count entities
lore --robot count issues
lore --robot count discussions --for mr
# Search indexed documents
lore --robot search "authentication bug"
# Check sync status
lore --robot status
# Run full sync pipeline
lore --robot sync
# Run sync without resource events
lore --robot sync --no-events
# Run ingestion only
lore --robot ingest issues
# Check environment health
lore --robot doctor
# Document and index statistics
lore --robot stats
# Quick health pre-flight check (exit 0 = healthy, 19 = unhealthy)
lore --robot health
# Generate searchable documents from ingested data
lore --robot generate-docs
# Generate vector embeddings via Ollama
lore --robot embed
# Agent self-discovery manifest (all commands, flags, exit codes, response schemas)
lore robot-docs
# Version information
lore --robot version
```
### Response Format
All commands return compact JSON with a uniform envelope and timing metadata:
```json
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```
Errors return structured JSON to stderr with machine-actionable recovery steps:
```json
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}
```
The `actions` array contains executable shell commands for automated recovery. It is omitted when empty.
### Field Selection
The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response:
```bash
lore -J issues --fields minimal # Preset: iid, title, state, updated_at_iso
lore -J mrs --fields iid,title,state,draft,labels # Custom field list
```
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Internal error / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 6 | Resource not found |
| 7 | Rate limited |
| 8 | Network error |
| 9 | Database locked |
| 10 | Database error |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
| 14 | Ollama unavailable |
| 15 | Ollama model not found |
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 19 | Health check failed |
| 20 | Config not found |
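
A caller can branch on these codes without parsing stderr. The sketch below is one possible grouping in Rust; the `Recovery` enum and the choice of which codes count as retryable are assumptions for illustration, not part of `lore`:

```rust
/// Hypothetical recovery classes for an agent wrapping `lore`.
#[derive(Debug, PartialEq)]
enum Recovery {
    Ok,
    RetryLater, // transient: rate limited, network error, database locked
    FixConfig,  // configuration or credentials need attention
    FixInput,   // bad arguments, or a missing or ambiguous entity
    Fatal,      // everything else
}

/// Map a `lore` exit code (see the table above) to a recovery class.
fn classify_exit_code(code: i32) -> Recovery {
    match code {
        0 => Recovery::Ok,
        7 | 8 | 9 => Recovery::RetryLater,
        3 | 4 | 5 | 20 => Recovery::FixConfig,
        2 | 6 | 17 | 18 => Recovery::FixInput,
        _ => Recovery::Fatal,
    }
}
```

Grouping by recovery strategy keeps the calling code short; the structured error on stderr still carries the precise `code` and `actions` when more detail is needed.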
### Configuration Precedence
1. CLI flags (highest priority)
2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`)
3. Config file (`~/.config/lore/config.json`)
4. Built-in defaults (lowest priority)
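
The precedence order above is a first-match-wins chain. A minimal sketch, assuming each layer has already been read into an `Option`; the `resolve` helper is hypothetical and not the name used in `lore`:

```rust
/// Pick the first value that is set, in precedence order:
/// CLI flag, then environment variable, then config file, then default.
fn resolve<T>(cli: Option<T>, env: Option<T>, file: Option<T>, default: T) -> T {
    cli.or(env).or(file).unwrap_or(default)
}
```

For example, `resolve(None, Some("from-env"), Some("from-file"), "built-in")` yields `"from-env"`, because the environment outranks the config file.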
### Best Practices
- Use `lore --robot` or `lore -J` for all agent interactions
- Check exit codes for error handling
- Parse JSON errors from stderr; use `actions` array for automated recovery
- Use `--fields minimal` to reduce token usage (~60% fewer tokens)
- Use `-n` / `--limit` to control response size
- Use `-q` / `--quiet` to suppress progress bars and non-essential output
- Use `--color never` in non-TTY automation for ANSI-free output
- Use `-v` / `-vv` / `-vvv` for increasing verbosity (debug/trace logging)
- Use `--log-format json` for machine-readable log output to stderr
- TTY detection handles piped commands automatically
- Use `lore --robot health` as a fast pre-flight check before queries
- Use `lore robot-docs` for response schema discovery
- The `-p` flag supports fuzzy project matching (suffix and substring)

Cargo.lock generated

@@ -76,12 +76,6 @@ dependencies = [
"windows-sys 0.61.2",
]
[[package]]
name = "arrayvec"
version = "0.7.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
[[package]]
name = "assert-json-diff"
version = "2.0.2"
@@ -175,6 +169,23 @@ version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
[[package]]
name = "charmed-lipgloss"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "45e10db01f5eaea11d98ca5c5cffd8cc4add7ac56d0128d91ba1f2a3757b6c5a"
dependencies = [
"bitflags",
"colored",
"crossterm",
"serde",
"serde_json",
"thiserror",
"toml",
"tracing",
"unicode-width 0.1.14",
]
[[package]]
name = "chrono"
version = "0.4.43"
@@ -245,14 +256,13 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b05b61dc5112cbb17e4b6cd61790d9845d13888356391624cbe7e41efeac1e75"
[[package]]
name = "comfy-table"
version = "7.2.2"
name = "colored"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "958c5d6ecf1f214b4c2bbbbf6ab9523a864bd136dcf71a7e8904799acfe1ad47"
checksum = "117725a109d387c937a1533ce01b450cbde6b88abceea8473c4d7a85853cda3c"
dependencies = [
"crossterm",
"unicode-segmentation",
"unicode-width",
"lazy_static",
"windows-sys 0.52.0",
]
[[package]]
@@ -264,10 +274,19 @@ dependencies = [
"encode_unicode",
"libc",
"once_cell",
"unicode-width",
"unicode-width 0.2.2",
"windows-sys 0.61.2",
]
[[package]]
name = "convert_case"
version = "0.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "633458d4ef8c78b72454de2d54fd6ab2e60f9e02be22f3c6104cdc8a4e0fceb9"
dependencies = [
"unicode-segmentation",
]
[[package]]
name = "core-foundation"
version = "0.9.4"
@@ -302,6 +321,21 @@ dependencies = [
"cfg-if",
]
[[package]]
name = "crossbeam-channel"
version = "0.5.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "82b8f8f868b36967f9606790d1903570de9ceaf870a7bf9fbbd3016d636a2cb2"
dependencies = [
"crossbeam-utils",
]
[[package]]
name = "crossbeam-utils"
version = "0.8.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
[[package]]
name = "crossterm"
version = "0.29.0"
@@ -310,9 +344,13 @@ checksum = "d8b9f2e4c67f833b660cdb0a3523065869fb35570177239812ed4c905aeff87b"
dependencies = [
"bitflags",
"crossterm_winapi",
"derive_more",
"document-features",
"mio",
"parking_lot",
"rustix",
"signal-hook",
"signal-hook-mio",
"winapi",
]
@@ -353,6 +391,37 @@ version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "092966b41edc516079bdf31ec78a2e0588d1d0c08f78b91d8307215928642b2b"
[[package]]
name = "deranged"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ececcb659e7ba858fb4f10388c250a7252eb0a27373f1a72b8748afdd248e587"
dependencies = [
"powerfmt",
]
[[package]]
name = "derive_more"
version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d751e9e49156b02b44f9c1815bcb94b984cdcc4396ecc32521c739452808b134"
dependencies = [
"derive_more-impl",
]
[[package]]
name = "derive_more-impl"
version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "799a97264921d8623a957f6c3b9011f3b5492f557bbb7a5a19b7fa6d06ba8dcb"
dependencies = [
"convert_case",
"proc-macro2",
"quote",
"rustc_version",
"syn",
]
[[package]]
name = "dialoguer"
version = "0.12.0"
@@ -958,9 +1027,8 @@ checksum = "9375e112e4b463ec1b1c6c011953545c65a30164fbab5b581df32b3abf0dcb88"
dependencies = [
"console",
"portable-atomic",
"unicode-width",
"unicode-width 0.2.2",
"unit-prefix",
"vt100",
"web-time",
]
@@ -1089,33 +1157,36 @@ checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
[[package]]
name = "lore"
version = "0.1.0"
version = "0.8.3"
dependencies = [
"async-stream",
"charmed-lipgloss",
"chrono",
"clap",
"clap_complete",
"comfy-table",
"console",
"dialoguer",
"dirs",
"flate2",
"futures",
"httpdate",
"indicatif",
"libc",
"open",
"rand",
"regex",
"reqwest",
"rusqlite",
"serde",
"serde_json",
"sha2",
"sqlite-vec",
"strsim",
"tempfile",
"thiserror",
"tokio",
"tracing",
"tracing-indicatif",
"tracing-appender",
"tracing-subscriber",
"url",
"urlencoding",
@@ -1161,6 +1232,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a69bcab0ad47271a0234d9422b131806bf3968021e5dc9328caf2d4cd58557fc"
dependencies = [
"libc",
"log",
"wasi",
"windows-sys 0.61.2",
]
@@ -1191,6 +1263,12 @@ dependencies = [
"windows-sys 0.61.2",
]
[[package]]
name = "num-conv"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cf97ec579c3c42f953ef76dbf8d55ac91fb219dde70e49aa4a6b7d74e9919050"
[[package]]
name = "num-traits"
version = "0.2.19"
@@ -1351,6 +1429,12 @@ dependencies = [
"zerovec",
]
[[package]]
name = "powerfmt"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "439ee305def115ba05938db6eb1644ff94165c5ab5e9420d1c1bcedbba909391"
[[package]]
name = "ppv-lite86"
version = "0.2.21"
@@ -1542,6 +1626,15 @@ dependencies = [
"sqlite-wasm-rs",
]
[[package]]
name = "rustc_version"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cfcb3a22ef46e85b45de6ee7e79d063319ebb6594faafcf1c225ea92ab6e9b92"
dependencies = [
"semver",
]
[[package]]
name = "rustix"
version = "1.1.3"
@@ -1638,6 +1731,12 @@ dependencies = [
"libc",
]
[[package]]
name = "semver"
version = "1.0.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d767eb0aabc880b29956c35734170f26ed551a859dbd361d140cdbeca61ab1e2"
[[package]]
name = "serde"
version = "1.0.228"
@@ -1681,6 +1780,15 @@ dependencies = [
"zmij",
]
[[package]]
name = "serde_spanned"
version = "0.6.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf41e0cfaf7226dca15e8197172c295a782857fcb97fad1808a166870dee75a3"
dependencies = [
"serde",
]
[[package]]
name = "serde_urlencoded"
version = "0.7.1"
@@ -1725,6 +1833,36 @@ version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
[[package]]
name = "signal-hook"
version = "0.3.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d881a16cf4426aa584979d30bd82cb33429027e42122b169753d6ef1085ed6e2"
dependencies = [
"libc",
"signal-hook-registry",
]
[[package]]
name = "signal-hook-mio"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b75a19a7a740b25bc7944bdee6172368f988763b744e3d4dfe753f6b4ece40cc"
dependencies = [
"libc",
"mio",
"signal-hook",
]
[[package]]
name = "signal-hook-registry"
version = "1.4.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9203b8055f63a2a00e2f593bb0510367fe707d7ff1e5c872de2f537b339e5410"
dependencies = [
"libc",
]
[[package]]
name = "simd-adler32"
version = "0.3.8"
@@ -1886,6 +2024,37 @@ dependencies = [
"cfg-if",
]
[[package]]
name = "time"
version = "0.3.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9da98b7d9b7dad93488a84b8248efc35352b0b2657397d4167e7ad67e5d535e5"
dependencies = [
"deranged",
"itoa",
"num-conv",
"powerfmt",
"serde_core",
"time-core",
"time-macros",
]
[[package]]
name = "time-core"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
[[package]]
name = "time-macros"
version = "0.2.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "78cc610bac2dcee56805c99642447d4c5dbde4d01f752ffea0199aee1f601dc4"
dependencies = [
"num-conv",
"time-core",
]
[[package]]
name = "tinystr"
version = "0.8.2"
@@ -1906,6 +2075,7 @@ dependencies = [
"libc",
"mio",
"pin-project-lite",
"signal-hook-registry",
"socket2",
"tokio-macros",
"windows-sys 0.61.2",
@@ -1955,6 +2125,47 @@ dependencies = [
"tokio",
]
[[package]]
name = "toml"
version = "0.8.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dc1beb996b9d83529a9e75c17a1686767d148d70663143c7854d8b4a09ced362"
dependencies = [
"serde",
"serde_spanned",
"toml_datetime",
"toml_edit",
]
[[package]]
name = "toml_datetime"
version = "0.6.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "22cddaf88f4fbc13c51aebbf5f8eceb5c7c5a9da2ac40a13519eb5b0a0e8f11c"
dependencies = [
"serde",
]
[[package]]
name = "toml_edit"
version = "0.22.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "41fe8c660ae4257887cf66394862d21dbca4a6ddd26f04a3560410406a2f819a"
dependencies = [
"indexmap",
"serde",
"serde_spanned",
"toml_datetime",
"toml_write",
"winnow",
]
[[package]]
name = "toml_write"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5d99f8c9a7727884afe522e9bd5edbfc91a3312b36a77b5fb8926e4c31a41801"
[[package]]
name = "tower"
version = "0.5.3"
@@ -2011,6 +2222,18 @@ dependencies = [
"tracing-core",
]
[[package]]
name = "tracing-appender"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "786d480bce6247ab75f005b14ae1624ad978d3029d9113f0a22fa1ac773faeaf"
dependencies = [
"crossbeam-channel",
"thiserror",
"time",
"tracing-subscriber",
]
[[package]]
name = "tracing-attributes"
version = "0.1.31"
@@ -2032,18 +2255,6 @@ dependencies = [
"valuable",
]
[[package]]
name = "tracing-indicatif"
version = "0.3.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e1ef6990e0438749f0080573248e96631171a0b5ddfddde119aa5ba8c3a9c47e"
dependencies = [
"indicatif",
"tracing",
"tracing-core",
"tracing-subscriber",
]
[[package]]
name = "tracing-log"
version = "0.2.0"
@@ -2055,6 +2266,16 @@ dependencies = [
"tracing-core",
]
[[package]]
name = "tracing-serde"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "704b1aeb7be0d0a84fc9828cae51dab5970fee5088f83d1dd7ee6f6246fc6ff1"
dependencies = [
"serde",
"tracing-core",
]
[[package]]
name = "tracing-subscriber"
version = "0.3.22"
@@ -2065,12 +2286,15 @@ dependencies = [
"nu-ansi-term",
"once_cell",
"regex-automata",
"serde",
"serde_json",
"sharded-slab",
"smallvec",
"thread_local",
"tracing",
"tracing-core",
"tracing-log",
"tracing-serde",
]
[[package]]
@@ -2097,6 +2321,12 @@ version = "1.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f6ccf251212114b54433ec949fd6a7841275f9ada20dddd2f29e9ceea4501493"
[[package]]
name = "unicode-width"
version = "0.1.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7dd6e30e90baa6f72411720665d41d89b9a3d039dc45b8faea1ddd07f617f6af"
[[package]]
name = "unicode-width"
version = "0.2.2"
@@ -2174,27 +2404,6 @@ version = "0.9.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
[[package]]
name = "vt100"
version = "0.16.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "054ff75fb8fa83e609e685106df4faeffdf3a735d3c74ebce97ec557d5d36fd9"
dependencies = [
"itoa",
"unicode-width",
"vte",
]
[[package]]
name = "vte"
version = "0.15.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a5924018406ce0063cd67f8e008104968b74b563ee1b85dde3ed1f7cb87d3dbd"
dependencies = [
"arrayvec",
"memchr",
]
[[package]]
name = "want"
version = "0.3.1"
@@ -2546,6 +2755,15 @@ version = "0.53.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d6bbff5f0aada427a1e5a6da5f1f98158182f26556f345ac9e04d36d0ebed650"
[[package]]
name = "winnow"
version = "0.7.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5a5364e9d77fcdeeaa6062ced926ee3381faa2ee02d3eb83a5c27a8825540829"
dependencies = [
"memchr",
]
[[package]]
name = "wiremock"
version = "0.6.5"


@@ -1,6 +1,6 @@
[package]
name = "lore"
version = "0.1.0"
version = "0.8.3"
edition = "2024"
description = "Gitlore - Local GitLab data management with semantic search"
authors = ["Taylor Eernisse"]
@@ -25,12 +25,12 @@ clap_complete = "4"
dialoguer = "0.12"
console = "0.16"
indicatif = "0.18"
comfy-table = "7"
lipgloss = { package = "charmed-lipgloss", version = "0.1", default-features = false, features = ["native"] }
open = "5"
# HTTP
reqwest = { version = "0.12", features = ["json"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "time"] }
tokio = { version = "1", features = ["rt-multi-thread", "macros", "time", "signal"] }
# Async streaming for pagination
async-stream = "0.3"
@@ -45,15 +45,18 @@ rand = "0.8"
sha2 = "0.10"
flate2 = "1"
chrono = { version = "0.4", features = ["serde"] }
httpdate = "1"
uuid = { version = "1", features = ["v4"] }
regex = "1"
strsim = "0.11"
[target.'cfg(unix)'.dependencies]
libc = "0.2"
# Logging
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tracing-indicatif = "0.3"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
tracing-appender = "0.2"
[dev-dependencies]
tempfile = "3"

PERFORMANCE_AUDIT.md Normal file

@@ -0,0 +1,467 @@
# Gitlore Performance Audit Report
**Date**: 2026-02-05
**Auditor**: Claude Code (Opus 4.5)
**Scope**: Core system performance - ingestion, embedding, search, and document regeneration
## Executive Summary
This audit identifies 12 high-impact optimization opportunities across the Gitlore codebase. The most significant findings center on:
1. **SQL query patterns** with N+1 issues and inefficient correlated subqueries
2. **Memory allocation patterns** in hot paths (embedding, chunking, ingestion)
3. **Change detection queries** using triple-EXISTS patterns instead of JOINs
**Estimated overall improvement potential**: 30-50% reduction in latency for filtered searches, 2-5x improvement in ingestion throughput for issues/MRs with many labels.
---
## Methodology
- **Codebase analysis**: Full read of all modules in `src/`
- **SQL pattern analysis**: All queries checked for N+1, missing indexes, unbounded results
- **Memory allocation analysis**: Clone patterns, unnecessary collections, missing capacity hints
- **Test baseline**: All tests pass (`cargo test --release`)
Note: Without access to a live GitLab instance or a populated database, the findings rest on static code analysis rather than runtime measurement.
---
## Opportunity Matrix
| ID | Issue | Location | Impact | Confidence | Effort | ICE Score | Status |
|----|-------|----------|--------|------------|--------|-----------|--------|
| 1 | Triple-EXISTS change detection | `change_detector.rs:19-46` | HIGH | 95% | LOW | **9.5** | **DONE** |
| 2 | N+1 label/assignee inserts | `issues.rs:270-285`, `merge_requests.rs:242-272` | HIGH | 95% | MEDIUM | **9.0** | Pending |
| 3 | Clone in embedding batch loop | `pipeline.rs:165` | HIGH | 90% | LOW | **9.0** | Pending |
| 4 | Correlated GROUP_CONCAT in list | `list.rs:341-348` | HIGH | 90% | MEDIUM | **8.5** | Pending |
| 5 | Multiple EXISTS per label filter | `filters.rs:100-107` | HIGH | 85% | MEDIUM | **8.0** | **DONE** |
| 6 | String allocation in chunking | `chunking.rs:7-49` | MEDIUM | 95% | MEDIUM | **7.5** | Pending |
| 7 | Multiple COUNT queries | `count.rs:44-56` | MEDIUM | 95% | LOW | **7.0** | **DONE** |
| 8 | Collect-then-concat pattern | `truncation.rs:60-61` | MEDIUM | 90% | LOW | **7.0** | **DONE** |
| 9 | Box<dyn ToSql> allocations | `filters.rs:67-135` | MEDIUM | 80% | HIGH | **6.0** | Pending |
| 10 | Missing Vec::with_capacity | `pipeline.rs:106`, multiple | LOW | 95% | LOW | **5.5** | **DONE** |
| 11 | FTS token collect-join | `fts.rs:26-41` | LOW | 90% | LOW | **5.0** | **DONE** |
| 12 | Transformer string clones | `merge_request.rs:51-77` | MEDIUM | 85% | HIGH | **5.0** | Pending |
ICE Score = (Impact x Confidence) / Effort, scaled 1-10
---
## Detailed Findings
### 1. Triple-EXISTS Change Detection Query (ICE: 9.5)
**Location**: `src/embedding/change_detector.rs:19-46`
**Current Code**:
```sql
SELECT d.id, d.content_text, d.content_hash
FROM documents d
WHERE d.id > ?1
  AND (
    NOT EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0)
    OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND em.document_hash != d.content_hash)
    OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND (...))
  )
ORDER BY d.id
LIMIT ?2
```
**Problem**: Three separate EXISTS subqueries each probe `embedding_metadata` for the same `(document_id, chunk_index = 0)` row. A document that is already up to date fails all three branches, so it costs three lookups where one would do.
**Proposed Fix**:
```sql
SELECT d.id, d.content_text, d.content_hash
FROM documents d
LEFT JOIN embedding_metadata em
  ON em.document_id = d.id AND em.chunk_index = 0
WHERE d.id > ?1
  AND (
    em.document_id IS NULL                 -- no embedding
    OR em.document_hash != d.content_hash  -- hash mismatch
    OR em.chunk_max_bytes IS NULL
    OR em.chunk_max_bytes != ?3
    OR em.model != ?4
    OR em.dims != ?5
  )
ORDER BY d.id
LIMIT ?2
```
**Isomorphism Proof**: Both queries return documents needing embedding when:
- No embedding exists for chunk_index=0 (NULL check)
- Hash changed (direct comparison)
- Config mismatch (model/dims/chunk_max_bytes)
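The LEFT JOIN + NULL check is semantically identical to NOT EXISTS, provided `embedding_metadata` holds at most one row per `(document_id, chunk_index)`; otherwise the join could return the same document more than once. The OR conditions inside WHERE match the EXISTS predicates exactly.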
**Expected Impact**: 2-3x faster for large document sets. Single scan of embedding_metadata instead of three.
---
### 2. N+1 Label/Assignee Inserts (ICE: 9.0)
**Location**:
- `src/ingestion/issues.rs:270-285`
- `src/ingestion/merge_requests.rs:242-272`
**Current Code**:
```rust
for label_name in label_names {
    let label_id = upsert_label_tx(tx, project_id, label_name, &mut labels_created)?;
    link_issue_label_tx(tx, local_issue_id, label_id)?;
}
```
**Problem**: Each label triggers 2+ SQL statements. With 20 labels on each of 100 issues, that is 4,000+ statements per batch.
**Proposed Fix**: Batch insert using prepared statements with multi-row VALUES:
```rust
// Build batch: INSERT OR IGNORE INTO issue_labels VALUES (?, ?), (?, ?), ...
let mut values = String::new();
let mut params: Vec<Box<dyn ToSql>> = Vec::with_capacity(label_ids.len() * 2);
for (i, label_id) in label_ids.iter().enumerate() {
    if i > 0 {
        values.push(',');
    }
    values.push_str("(?,?)");
    params.push(Box::new(local_issue_id));
    params.push(Box::new(*label_id));
}
let sql = format!("INSERT OR IGNORE INTO issue_labels (issue_id, label_id) VALUES {}", values);
```
Or use `prepare_cached()` pattern from `events_db.rs`.
**Isomorphism Proof**: Both approaches insert identical rows. OR IGNORE handles duplicates identically.
**Expected Impact**: 5-10x faster ingestion for issues/MRs with many labels.
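
The string-building half of the batch insert can be isolated and tested without a database. A minimal std-only sketch; the helper names `two_column_placeholders` and `build_label_insert` are hypothetical, and the table and column names are copied from the proposed fix above:

```rust
/// Build "(?,?),(?,?),..." for `rows` two-column rows.
fn two_column_placeholders(rows: usize) -> String {
    vec!["(?,?)"; rows].join(",")
}

/// Build the full statement, or None when there is nothing to insert
/// (an empty VALUES list is a SQL syntax error).
fn build_label_insert(rows: usize) -> Option<String> {
    if rows == 0 {
        return None;
    }
    Some(format!(
        "INSERT OR IGNORE INTO issue_labels (issue_id, label_id) VALUES {}",
        two_column_placeholders(rows)
    ))
}
```

SQLite caps the number of bound parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`), so a very large label set may need to be split across several statements.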
---
### 3. Clone in Embedding Batch Loop (ICE: 9.0)
**Location**: `src/embedding/pipeline.rs:165`
**Current Code**:
```rust
let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect();
```
**Problem**: Every batch iteration clones all chunk texts. With BATCH_SIZE=32 and thousands of chunks, this doubles memory allocation in the hot path.
**Proposed Fix**: Transfer ownership instead of cloning:
```rust
// Option A: Drain chunks from all_chunks instead of iterating
let texts: Vec<String> = batch.into_iter().map(|c| c.text).collect();
// Option B: Store references in ChunkWork, clone only at API boundary
struct ChunkWork<'a> {
    text: &'a str,
    // ...
}
```
**Isomorphism Proof**: Same texts sent to Ollama, same embeddings returned. Order and content identical.
**Expected Impact**: 30-50% reduction in embedding pipeline memory allocation.
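
Option A can be checked in isolation: moving a `String` out of its struct keeps the same heap buffer, so nothing is copied. A minimal sketch, assuming a simplified `ChunkWork` with only two fields (the real struct and the name `split_batch` are stand-ins for illustration):

```rust
/// Hypothetical stand-in for the pipeline's chunk type.
struct ChunkWork {
    doc_id: i64,
    text: String,
}

/// Consume the batch, moving each text out instead of cloning it.
/// The ids are returned alongside so results can be matched back to documents.
fn split_batch(batch: Vec<ChunkWork>) -> (Vec<i64>, Vec<String>) {
    batch.into_iter().map(|c| (c.doc_id, c.text)).unzip()
}
```

This only works when the caller no longer needs the batch afterwards; if the chunks must survive the call, Option B (borrowing) is the better fit.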
---
### 4. Correlated GROUP_CONCAT in List Queries (ICE: 8.5)
**Location**: `src/cli/commands/list.rs:341-348`
**Current Code**:
```sql
SELECT i.*,
  (SELECT GROUP_CONCAT(l.name, X'1F') FROM issue_labels il JOIN labels l ... WHERE il.issue_id = i.id) AS labels_csv,
  (SELECT COUNT(*) FROM discussions WHERE issue_id = i.id) AS discussion_count
FROM issues i
```
**Problem**: Each correlated subquery executes per row. With LIMIT 50, that's 100+ subquery executions.
**Proposed Fix**: Use window functions or pre-aggregated CTEs:
```sql
WITH label_agg AS (
  SELECT il.issue_id, GROUP_CONCAT(l.name, X'1F') AS labels_csv
  FROM issue_labels il JOIN labels l ON il.label_id = l.id
  GROUP BY il.issue_id
),
discussion_agg AS (
  SELECT issue_id, COUNT(*) AS cnt
  FROM discussions WHERE issue_id IS NOT NULL
  GROUP BY issue_id
)
SELECT i.*, la.labels_csv, da.cnt
FROM issues i
LEFT JOIN label_agg la ON la.issue_id = i.id
LEFT JOIN discussion_agg da ON da.issue_id = i.id
WHERE ...
LIMIT 50
```
**Isomorphism Proof**: Same data returned - labels concatenated, discussion counts accurate. JOIN preserves NULL when no labels/discussions exist.
**Expected Impact**: 3-5x faster list queries with discussion/label data.
---
### 5. Multiple EXISTS Per Label Filter (ICE: 8.0)
**Location**: `src/search/filters.rs:100-107`
**Current Code**:
```sql
WHERE EXISTS (SELECT 1 ... AND label_name = ?)
AND EXISTS (SELECT 1 ... AND label_name = ?)
AND EXISTS (SELECT 1 ... AND label_name = ?)
```
**Problem**: Filtering by 3 labels generates 3 EXISTS subqueries. Each scans document_labels.
**Proposed Fix**: Single EXISTS with GROUP BY/HAVING:
```sql
WHERE EXISTS (
  SELECT 1 FROM document_labels dl
  WHERE dl.document_id = d.id
    AND dl.label_name IN (?, ?, ?)
  GROUP BY dl.document_id
  HAVING COUNT(DISTINCT dl.label_name) = 3
)
```
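**Isomorphism Proof**: Both return documents with ALL specified labels. AND of EXISTS = document has label1 AND label2 AND label3. GROUP BY + HAVING COUNT(DISTINCT) = 3 is equivalent as long as the requested labels are de-duplicated first, so that the HAVING count equals the number of distinct labels in the IN list.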
**Expected Impact**: 2-4x faster filtered search with multiple labels.
---
### 6. String Allocation in Chunking (ICE: 7.5)
**Location**: `src/embedding/chunking.rs:7-49`
**Current Code**:
```rust
chunks.push((chunk_index, remaining.to_string()));
```
**Problem**: Converts `&str` slices to owned `String` for every chunk. The input is already a `&str`.
**Proposed Fix**: Return borrowed slices or use `Cow`:
```rust
pub fn split_into_chunks(content: &str) -> Vec<(usize, &str)> {
    // Return slices into original content
}
```
Or if ownership is needed later:
```rust
pub fn split_into_chunks(content: &str) -> Vec<(usize, Cow<'_, str>)>
```
**Isomorphism Proof**: Same chunk boundaries, same text content. Only allocation behavior changes.
**Expected Impact**: Reduces allocations by ~50% in chunking hot path.
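
To make the borrowed-slice variant concrete, here is a minimal sketch of a splitter that never copies text. It assumes a plain byte budget per chunk; the `max_bytes` parameter and the fixed-size strategy are assumptions for illustration, and the real splitter may prefer paragraph or sentence boundaries:

```rust
/// Split `content` into chunks of at most `max_bytes` bytes without copying.
/// Each returned `&str` is a slice into `content`.
fn split_into_chunks(content: &str, max_bytes: usize) -> Vec<(usize, &str)> {
    assert!(max_bytes >= 4, "max_bytes must fit any UTF-8 character");
    let mut chunks = Vec::new();
    let mut remaining = content;
    while !remaining.is_empty() {
        let mut end = remaining.len().min(max_bytes);
        // Back up to a character boundary so multi-byte characters are never split.
        while !remaining.is_char_boundary(end) {
            end -= 1;
        }
        let (head, tail) = remaining.split_at(end);
        let index = chunks.len();
        chunks.push((index, head));
        remaining = tail;
    }
    chunks
}
```

Because the slices borrow from `content`, the caller has to keep the source text alive for as long as the chunks are in use, which is the trade-off against the `Cow` variant.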
---
### 7. Multiple COUNT Queries (ICE: 7.0)
**Location**: `src/cli/commands/count.rs:44-56`
**Current Code**:
```rust
let count = conn.query_row("SELECT COUNT(*) FROM issues", ...)?;
let opened = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'opened'", ...)?;
let closed = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'closed'", ...)?;
```
**Problem**: 5 separate queries for MR state breakdown, 3 for issues.
**Proposed Fix**: Single query with CASE aggregation:
```sql
SELECT
  COUNT(*) AS total,
  SUM(CASE WHEN state = 'opened' THEN 1 ELSE 0 END) AS opened,
  SUM(CASE WHEN state = 'closed' THEN 1 ELSE 0 END) AS closed
FROM issues
```
**Isomorphism Proof**: Identical counts returned. CASE WHEN with SUM is standard SQL for conditional counting.
**Expected Impact**: 3-5x fewer round trips for count command.
---
### 8. Collect-then-Concat Pattern (ICE: 7.0)
**Location**: `src/documents/truncation.rs:60-61`
**Current Code**:
```rust
let formatted: Vec<String> = notes.iter().map(format_note).collect();
let total: String = formatted.concat();
```
**Problem**: Allocates intermediate Vec<String>, then allocates again for concat.
**Proposed Fix**: Use fold or format directly:
```rust
let total = notes.iter().fold(String::new(), |mut acc, note| {
    acc.push_str(&format_note(note));
    acc
});
```
Or with capacity hint:
```rust
let total_len: usize = notes.iter().map(|n| estimate_note_len(n)).sum();
let mut total = String::with_capacity(total_len);
for note in notes {
    total.push_str(&format_note(note));
}
```
**Isomorphism Proof**: Same concatenated string output. Order preserved.
**Expected Impact**: 50% reduction in allocations for document regeneration.
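
Both variants above still allocate one temporary `String` per call to `format_note`. A further step, sketched here under the assumption that the formatter can be changed to write into a caller-supplied buffer, removes those as well. The `Note` fields, the output format, and the names `write_note` and `format_notes` are hypothetical:

```rust
use std::fmt::Write;

/// Hypothetical note shape; the real type lives in the documents module.
struct Note {
    author: String,
    body: String,
}

/// Append one formatted note to `out` instead of returning a new String.
fn write_note(out: &mut String, note: &Note) {
    // Writing to a String cannot fail, so the Result is safe to unwrap.
    writeln!(out, "@{}: {}", note.author, note.body).unwrap();
}

/// Format all notes into a single buffer sized up front.
fn format_notes(notes: &[Note]) -> String {
    // Capacity hint: payload bytes plus "@", ": " and "\n" per note.
    let hint: usize = notes.iter().map(|n| n.author.len() + n.body.len() + 4).sum();
    let mut total = String::with_capacity(hint);
    for note in notes {
        write_note(&mut total, note);
    }
    total
}
```

With an exact capacity hint the whole concatenation needs a single allocation, regardless of how many notes there are.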
---
### 9. Box<dyn ToSql> Allocations (ICE: 6.0)
**Location**: `src/search/filters.rs:67-135`
**Current Code**:
```rust
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = vec![Box::new(ids_json)];
// ... more Box::new() calls
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
```
**Problem**: Boxing each parameter, then collecting references. Two allocations per parameter.
**Proposed Fix**: Use rusqlite's params! macro or typed parameter arrays:
```rust
// For known parameter counts, use arrays
let params: [&dyn ToSql; 4] = [&ids_json, &author, &state, &limit];
// Or build SQL with named parameters and use params! directly
```
**Expected Impact**: Eliminates ~15 allocations per filtered search.
---
### 10. Missing Vec::with_capacity (ICE: 5.5)
**Locations**:
- `src/embedding/pipeline.rs:106`
- `src/embedding/pipeline.rs:162`
- Multiple other locations
**Current Code**:
```rust
let mut all_chunks: Vec<ChunkWork> = Vec::new();
```
**Proposed Fix**:
```rust
// Estimate: average 3 chunks per document
let mut all_chunks = Vec::with_capacity(pending.len() * 3);
```
**Expected Impact**: Eliminates reallocation overhead during vector growth.
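When the per-document chunk counts are already known, the capacity can be exact rather than the `* 3` estimate. The sketch below assumes that situation; `ChunkWork` here is a hypothetical stand-in with illustrative fields, not the pipeline's real type.

```rust
/// Hypothetical stand-in for the pipeline's `ChunkWork` type.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkWork {
    pub doc_index: usize,
    pub chunk_index: usize,
}

/// Builds the work list for documents whose chunk counts are already known.
pub fn build_chunk_work(chunks_per_doc: &[usize]) -> Vec<ChunkWork> {
    // The exact size is known up front, so reserve once.
    let total: usize = chunks_per_doc.iter().sum();
    let mut all_chunks: Vec<ChunkWork> = Vec::with_capacity(total);
    let reserved = all_chunks.capacity();

    for (doc_index, &count) in chunks_per_doc.iter().enumerate() {
        for chunk_index in 0..count {
            all_chunks.push(ChunkWork { doc_index, chunk_index });
        }
    }

    // No push exceeded the reserved capacity, so the buffer never grew.
    debug_assert_eq!(all_chunks.capacity(), reserved);
    all_chunks
}
```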
---
### 11. FTS Token Collect-Join (ICE: 5.0)
**Location**: `src/search/fts.rs:26-41`
**Current Code**:
```rust
let tokens: Vec<String> = trimmed.split_whitespace().map(...).collect();
tokens.join(" ")
```
**Proposed Fix**: Use itertools or avoid intermediate vec:
```rust
use itertools::Itertools;
trimmed.split_whitespace().map(...).join(" ")
```
**Expected Impact**: Minor - search queries are typically short.
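If adding `itertools` is not desirable, the same result can be built with the standard library alone. This is a sketch under one assumption: the per-token transform (elided as `map(...)` above) is replaced by a hypothetical quoting rule that wraps each token in double quotes and doubles any embedded quote.

```rust
/// Joins whitespace-separated tokens into one string without an
/// intermediate `Vec<String>`. The quoting rule is a hypothetical
/// placeholder for the real per-token transform.
pub fn join_tokens(input: &str) -> String {
    // Small headroom for quotes; the String grows on demand if it is not enough.
    let mut out = String::with_capacity(input.len() + 8);
    let mut first = true;

    for token in input.split_whitespace() {
        if !first {
            out.push(' ');
        }
        first = false;

        // Write the quoted token straight into the output buffer.
        out.push('"');
        for ch in token.chars() {
            if ch == '"' {
                out.push('"'); // double embedded quotes
            }
            out.push(ch);
        }
        out.push('"');
    }
    out
}
```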
---
### 12. Transformer String Clones (ICE: 5.0)
**Location**: `src/gitlab/transformers/merge_request.rs:51-77`
**Problem**: Multiple `.clone()` calls on String fields during transformation.
**Proposed Fix**: Use `std::mem::take()` where possible, or restructure to avoid cloning.
**Expected Impact**: Moderate - depends on MR volume.
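The two options can be sketched as follows. The struct and field names are hypothetical and only illustrate the pattern; the real transformer types are not shown in this document.

```rust
use std::mem;

/// Hypothetical API payload; field names are illustrative only.
#[derive(Debug, Default)]
pub struct ApiMergeRequest {
    pub title: String,
    pub description: Option<String>,
    pub source_branch: String,
}

/// Hypothetical row type produced by the transformer.
#[derive(Debug, PartialEq)]
pub struct MrRow {
    pub title: String,
    pub description: Option<String>,
    pub source_branch: String,
}

/// Option 1: only `&mut` access is available. `mem::take` moves each
/// String out and leaves an empty one behind, so nothing is cloned.
pub fn transform_in_place(mr: &mut ApiMergeRequest) -> MrRow {
    MrRow {
        title: mem::take(&mut mr.title),
        description: mr.description.take(),
        source_branch: mem::take(&mut mr.source_branch),
    }
}

/// Option 2: the caller gives up ownership, so plain moves suffice.
pub fn transform_owned(mr: ApiMergeRequest) -> MrRow {
    MrRow {
        title: mr.title,
        description: mr.description,
        source_branch: mr.source_branch,
    }
}
```

Option 1 leaves the source struct with empty fields, so it only fits call sites that do not read the payload again afterwards.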
---
## Regression Guardrails
For any optimization implemented:
1. **Test Coverage**: All existing tests must pass
2. **Output Equivalence**: For SQL changes, verify identical result sets with test data
3. **Benchmark Suite**: Add benchmarks for affected paths before/after
Suggested benchmark targets:
```rust
#[bench] fn bench_change_detection_1k_docs(b: &mut Bencher) { ... }
#[bench] fn bench_label_insert_50_labels(b: &mut Bencher) { ... }
#[bench] fn bench_hybrid_search_filtered(b: &mut Bencher) { ... }
```
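Note that `#[bench]` is only available on nightly Rust. On a stable toolchain, a rough before/after comparison can be made with a small timing helper like the sketch below. `time_iterations` is a hypothetical helper, not part of the codebase, and it gives coarse averages rather than statistically sound measurements.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` `iterations` times and returns the average wall-clock time per call.
pub fn time_iterations<T>(iterations: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iterations > 0, "iterations must be non-zero");
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iterations
}
```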
---
## Implementation Priority
**Phase 1 (Quick Wins)** - COMPLETE:
1. ~~Change detection query rewrite (#1)~~ **DONE**
2. ~~Multiple COUNT consolidation (#7)~~ **DONE**
3. ~~Collect-concat pattern (#8)~~ **DONE**
4. ~~Vec::with_capacity hints (#10)~~ **DONE**
5. ~~FTS token collect-join (#11)~~ **DONE**
6. ~~Multiple EXISTS per label (#5)~~ **DONE**
**Phase 2 (Medium Effort)**:
7. Embedding batch clone removal (#3)
8. ~~Label filter EXISTS consolidation (#5)~~ **DONE**
9. Chunking string allocation (#6)
**Phase 3 (Higher Effort)**:
10. N+1 batch inserts (#2)
11. List query CTEs (#4)
12. Parameter boxing (#9)
---
## Appendix: Test Baseline
```
cargo test --release
running 127 tests
test result: ok. 127 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```
All tests pass. Any optimization must maintain this baseline.

@@ -0,0 +1,425 @@
# Proposed Code File Reorganization Plan
## Executive Summary
The codebase is 79 Rust source files / 46K lines across 7 top-level modules. Most modules (`gitlab/`, `embedding/`, `search/`, `documents/`, `ingestion/`) are well-organized. The pain points are:
1. **`core/` is a grab-bag** — 22 files mixing infrastructure, domain logic, DB operations, and an entire timeline pipeline
2. **`main.rs` is 2713 lines** — ~30 handler functions that bridge CLI args to commands
3. **`cli/mod.rs` is 949 lines** — every clap argument struct is packed into one file
4. **Giant command files**: `who.rs` (6067 lines) and `list.rs` (2931 lines) are unwieldy
This plan is organized into **three tiers** based on impact-to-risk ratio. Tier 1 changes are "no-brainers" — they reduce confusion with minimal import churn. Tier 2 changes are valuable but involve more cross-cutting import updates. Tier 3 changes are "maybe later" — they'd be nice but the juice might not be worth the squeeze right now.
---
## Current Structure (Annotated)
```
src/
├── main.rs (2713 lines) ← dispatch + ~30 handler functions + error helpers
├── lib.rs (9 lines)
├── cli/
│ ├── mod.rs (949 lines) ← ALL clap arg structs crammed here
│ ├── autocorrect.rs (945 lines)
│ ├── progress.rs (92 lines)
│ ├── robot.rs (111 lines)
│ └── commands/
│ ├── mod.rs (50 lines) — re-exports
│ ├── auth_test.rs
│ ├── count.rs (406 lines)
│ ├── doctor.rs (576 lines)
│ ├── drift.rs (642 lines)
│ ├── embed.rs
│ ├── generate_docs.rs (320 lines)
│ ├── ingest.rs (1064 lines)
│ ├── init.rs (174 lines)
│ ├── list.rs (2931 lines) ← handles issues, MRs, AND notes listing
│ ├── search.rs (418 lines)
│ ├── show.rs (1377 lines)
│ ├── stats.rs (505 lines)
│ ├── sync_status.rs (454 lines)
│ ├── sync.rs (576 lines)
│ ├── timeline.rs (488 lines)
│ └── who.rs (6067 lines) ← 5 sub-modes: expert, workload, active, overlap, reviews
├── core/
│ ├── mod.rs (25 lines)
│ ├── backoff.rs ← retry logic (used by ingestion)
│ ├── config.rs (789 lines) ← configuration types
│ ├── db.rs (970 lines) ← connection + 22 migrations
│ ├── dependent_queue.rs (330 lines) ← job queue (used by ingestion orchestrator)
│ ├── error.rs (295 lines) ← error enum + exit codes
│ ├── events_db.rs (199 lines) ← resource event upserts (used by ingestion)
│ ├── lock.rs (228 lines) ← filesystem sync lock
│ ├── logging.rs (179 lines) ← tracing filter builders
│ ├── metrics.rs (566 lines) ← tracing-based stage timing
│ ├── note_parser.rs (563 lines) ← cross-ref extraction from note bodies
│ ├── paths.rs ← config/db/log file path resolution
│ ├── payloads.rs (204 lines) ← raw JSON payload storage
│ ├── project.rs (274 lines) ← fuzzy project resolution from DB
│ ├── references.rs (551 lines) ← entity cross-reference extraction
│ ├── shutdown.rs ← graceful shutdown via tokio signal
│ ├── sync_run.rs (218 lines) ← sync run recording to DB
│ ├── time.rs ← time conversion utilities
│ ├── timeline.rs (284 lines) ← timeline types + EntityRef
│ ├── timeline_collect.rs (695 lines) ← Stage 4: collect events from DB
│ ├── timeline_expand.rs (557 lines) ← Stage 3: expand via cross-refs
│ └── timeline_seed.rs (552 lines) ← Stage 1: FTS search seeding
├── documents/ ← well-organized, 3 focused files
├── embedding/ ← well-organized, 6 focused files
├── gitlab/ ← well-organized, with transformers/ subdir
├── ingestion/ ← well-organized, 8 focused files
└── search/ ← well-organized, 5 focused files
```
---
## Tier 1: No-Brainers (Do First)
### 1.1 Extract `timeline/` from `core/`
**What:** Move the 4 timeline files into their own top-level module `src/timeline/`.
**Current location:**
- `core/timeline.rs` (284 lines) — types: `EntityRef`, `ExpandedEntityRef`, `TimelineEvent`, `TimelineEventType`, etc.
- `core/timeline_seed.rs` (552 lines) — Stage 1: FTS-based seeding
- `core/timeline_expand.rs` (557 lines) — Stage 3: cross-reference expansion
- `core/timeline_collect.rs` (695 lines) — Stage 4: event collection from DB
**New structure:**
```
src/timeline/
├── mod.rs ← types (from timeline.rs) + re-exports
├── seed.rs ← from timeline_seed.rs
├── expand.rs ← from timeline_expand.rs
└── collect.rs ← from timeline_collect.rs
```
**Rationale:** These 4 files form a cohesive 5-stage pipeline (SEED→HYDRATE→EXPAND→COLLECT→RENDER). They have nothing to do with "core" infrastructure like `db.rs`, `config.rs`, or `error.rs`. They only import from `core::error`, `core::time`, and `search::fts` — all of which remain accessible via `crate::core::*` and `crate::search::*` after the move.
**Import changes needed:**
- `cli/commands/timeline.rs`: `use crate::core::timeline::*` → `use crate::timeline::*`, same for `timeline_seed`, `timeline_expand`, `timeline_collect`
- `core/mod.rs`: remove the 4 `pub mod timeline*` lines
- `lib.rs`: add `pub mod timeline;`
**Risk: LOW** — Only 1 consumer (`cli/commands/timeline.rs`) + internal cross-references between the 4 files.
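The re-export layout for the new module can be sketched with inline modules. In the real tree each `mod` would be its own file under `src/timeline/`. The `EntityRef` fields, the `seed_timeline` signature, and its placeholder body are assumptions for illustration; the real function performs FTS-based seeding.

```rust
pub mod timeline {
    /// Shared types live at the module root (`src/timeline/mod.rs`).
    /// Field names are hypothetical.
    #[derive(Debug, Clone, PartialEq)]
    pub struct EntityRef {
        pub entity_type: String,
        pub iid: i64,
    }

    /// Would be `src/timeline/seed.rs`.
    pub mod seed {
        // Formerly `super::timeline::*` while the file lived in `core/`.
        use super::EntityRef;

        /// Placeholder body: parses `type:iid` queries only.
        pub fn seed_timeline(query: &str) -> Vec<EntityRef> {
            match query.split_once(':') {
                Some((kind, iid)) => match iid.parse::<i64>() {
                    Ok(iid) => vec![EntityRef { entity_type: kind.to_string(), iid }],
                    Err(_) => Vec::new(),
                },
                None => Vec::new(),
            }
        }
    }

    // Re-export so callers can use `timeline::seed_timeline`
    // as well as `timeline::seed::seed_timeline`.
    pub use self::seed::seed_timeline;
}
```

With the re-export in place, the consumer can choose either the short or the fully qualified path, which keeps the import change in `cli/commands/timeline.rs` to a simple prefix swap.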
---
### 1.2 Extract `xref/` (cross-reference extraction) from `core/`
**What:** Move `note_parser.rs` and `references.rs` into `src/xref/`.
**Current location:**
- `core/note_parser.rs` (563 lines) — parses note bodies for "mentioned in group/repo#123" patterns, persists to `note_cross_references` table
- `core/references.rs` (551 lines) — extracts entity references from state events and closing MRs, writes to `entity_references` table
**New structure:**
```
src/xref/
├── mod.rs ← re-exports
├── note_parser.rs ← from core/note_parser.rs
└── references.rs ← from core/references.rs
```
**Rationale:** These files implement a specific domain concept — extracting and persisting cross-references between issues and MRs. They are not "core infrastructure." They're consumed by `ingestion/orchestrator.rs` for the cross-reference extraction phase, and the data they produce is consumed by the timeline pipeline. Putting them in their own module makes the data flow clearer: `ingestion → xref → timeline`.
**Import changes needed:**
- `ingestion/orchestrator.rs`: `use crate::core::references::*` → `use crate::xref::references::*`
- `ingestion/orchestrator.rs`: `use crate::core::note_parser::*` (if used directly — needs verification) → `use crate::xref::*`
- `core/mod.rs`: remove `pub mod note_parser; pub mod references;`
- `lib.rs`: add `pub mod xref;`
- Internal: the files use `super::error::Result` and `super::time::now_ms` which become `crate::core::error::Result` and `crate::core::time::now_ms`
**Risk: LOW** — 2-3 consumers at most. The files already use `super::` internally which just needs updating to `crate::core::`.
---
## Tier 2: Good Improvements (Do After Tier 1)
### 2.1 Group ingestion-adjacent DB operations
**What:** Move `events_db.rs`, `dependent_queue.rs`, `payloads.rs`, and `sync_run.rs` from `core/` into `ingestion/` since they all exist to support the ingestion pipeline.
**Current consumers:**
- `events_db.rs` → only used by `cli/commands/count.rs` (for event counts)
- `dependent_queue.rs` → only used by `ingestion/orchestrator.rs` and `main.rs` (to release locked jobs)
- `payloads.rs` → only used by `ingestion/discussions.rs`, `ingestion/issues.rs`, `ingestion/merge_requests.rs`, `ingestion/mr_discussions.rs`
- `sync_run.rs` → only used by `cli/commands/sync.rs` and `cli/commands/sync_status.rs`
**New structure:**
```
src/ingestion/
├── (existing files...)
├── events_db.rs ← from core/events_db.rs
├── dependent_queue.rs ← from core/dependent_queue.rs
├── payloads.rs ← from core/payloads.rs
└── sync_run.rs ← from core/sync_run.rs
```
**Rationale:** All 4 files exist to support the ingestion pipeline:
- `events_db.rs` upserts resource state/label/milestone events fetched during ingestion
- `dependent_queue.rs` manages the job queue that drives incremental discussion fetching
- `payloads.rs` stores the raw JSON payloads fetched from GitLab
- `sync_run.rs` records when syncs start/finish and their metrics
When you're looking for "how does ingestion work?", you'd naturally look in `ingestion/`. Having these scattered in `core/` requires knowing the hidden dependency.
**Import changes needed:**
- `events_db.rs`: 1 consumer in `cli/commands/count.rs` changes from `crate::core::events_db` → `crate::ingestion::events_db`
- `dependent_queue.rs`: 2 consumers — `ingestion/orchestrator.rs` (becomes `super::dependent_queue`) and `main.rs`
- `payloads.rs`: 4 consumers in `ingestion/*.rs` (become `super::payloads`)
- `sync_run.rs`: 2 consumers in `cli/commands/sync.rs` and `sync_status.rs`
- Internal references change from `super::error` / `super::time` to `crate::core::error` / `crate::core::time`
**Risk: MEDIUM** — More import changes, but all straightforward. The internal `super::` references need the most attention.
**Alternatively:** If moving feels like too much churn, a lighter option is to create `core/ingestion_db.rs` that re-exports from these 4 files, making the grouping visible without moving files. But I think the move is cleaner.
---
### 2.2 Split `cli/mod.rs` — move arg structs to their command files
**What:** Move each `*Args` struct from `cli/mod.rs` into the corresponding `cli/commands/*.rs` file. Keep `Cli` struct, `Commands` enum, and `detect_robot_mode_from_env()` in `cli/mod.rs`.
**Currently `cli/mod.rs` (949 lines) contains:**
- `Cli` struct (81 lines) — the root clap parser
- `Commands` enum (193 lines) — all subcommand variants
- `IssuesArgs` (86 lines) → move to `commands/list.rs` or stay near issues handling
- `MrsArgs` (93 lines) → move to `commands/list.rs` or stay near MRs handling
- `NotesArgs` (99 lines) → move to `commands/list.rs`
- `IngestArgs` (33 lines) → move to `commands/ingest.rs`
- `StatsArgs` (19 lines) → move to `commands/stats.rs`
- `SearchArgs` (58 lines) → move to `commands/search.rs`
- `GenerateDocsArgs` (9 lines) → move to `commands/generate_docs.rs`
- `SyncArgs` (39 lines) → move to `commands/sync.rs`
- `EmbedArgs` (15 lines) → move to `commands/embed.rs`
- `TimelineArgs` (53 lines) → move to `commands/timeline.rs`
- `WhoArgs` (76 lines) → move to `commands/who.rs`
- `CountArgs` (9 lines) → move to `commands/count.rs`
**After refactoring, `cli/mod.rs` shrinks to ~300 lines** (just `Cli` + `Commands` + the inlined variants like `Init`, `Drift`, `Backup`, `Reset`).
**Rationale:** When adding a new flag to the `who` command, you currently have to edit `cli/mod.rs` (the args struct), `cli/commands/who.rs` (the implementation), and `main.rs` (the dispatch). If the args struct lives in `commands/who.rs`, you only need two files. This is the standard pattern in mature clap-based Rust CLIs.
**Import changes needed:**
- `main.rs` currently does `use lore::cli::{..., WhoArgs, ...}` — these would become `use lore::cli::commands::{..., WhoArgs, ...}` or the `commands/mod.rs` re-exports them
- Each `commands/*.rs` gets its own `#[derive(Parser)]` struct
- `Commands` enum in `cli/mod.rs` keeps using the types but imports from `commands::*`
**Risk: MEDIUM** — Lots of `use` path changes in `main.rs`, but purely mechanical. No logic changes.
---
## Tier 3: Consider Later
### 3.1 Split `main.rs` (2713 lines)
**The problem:** `main.rs` contains `main()`, ~30 `handle_*` functions, error handling, clap error formatting, fuzzy command matching, and the `robot-docs` JSON manifest (a 400+ line inline JSON literal).
**Possible approach:**
- Extract `handle_*` functions into `cli/dispatch.rs` (the routing layer)
- Extract error handling into `cli/errors.rs`
- Extract `handle_robot_docs` + the JSON manifest into `cli/robot_docs.rs`
- Keep `main()` in `main.rs` at ~150 lines (just the tracing setup + dispatch call)
**Why Tier 3:** This is the messiest split. The handler functions depend on the `cli::commands::*` functions AND the `cli::robot::*` helpers AND direct `std::process::exit` calls. Making this work cleanly requires careful thought about the error boundary between `main.rs` (binary) and `lib.rs` (library).
**Risk: HIGH** — Every handler function touches `robot_mode`, constructs its own timer, opens the DB, and manages error display. The boilerplate is high but consistent, so splitting would just move it around without reducing complexity.
---
### 3.2 Split `cli/commands/who.rs` (6067 lines)
**The problem:** This file implements 5 distinct modes (expert, workload, active, overlap, reviews), each with its own query, scoring model, and output formatting. It also includes the time-decay scoring model (~500 lines) and per-MR detail breakdown logic.
**Possible split:**
```
src/cli/commands/who/
├── mod.rs ← WhoRun dispatcher, shared types
├── expert.rs ← expert mode (path-based file expertise lookup)
├── workload.rs ← workload mode (user's assigned issues/MRs)
├── active.rs ← active discussions mode
├── overlap.rs ← file overlap between users
├── reviews.rs ← review pattern analysis
└── scoring.rs ← time-decay expert scoring model
```
**Why Tier 3:** The 5 modes share many helper functions, database connection patterns, and output formatting logic. Splitting would require carefully identifying the shared helpers and deciding where they live. The file is big but internally consistent — the modes use a shared dispatcher pattern and common types.
---
### 3.3 Split `cli/commands/list.rs` (2931 lines)
**The problem:** This file handles issue listing, MR listing, AND note listing — three related but distinct operations with separate query builders, output formatters, and test suites.
**Possible split:**
```
src/cli/commands/
├── list_issues.rs ← issue listing + query builder
├── list_mrs.rs ← MR listing + query builder
├── list_notes.rs ← note listing + query builder
└── list.rs ← shared types (ListFilters, etc.) + re-exports
```
**Why Tier 3:** Same issue as `who.rs` — the three listing modes share query building patterns, field selection logic, and sorting code. Splitting requires identifying and extracting the shared pieces first.
---
## Files NOT Recommended to Move
These files belong exactly where they are:
| File | Why it belongs in `core/` |
|------|--------------------------|
| `config.rs` | Config types used by nearly everything |
| `db.rs` | Database connection + migrations — foundational |
| `error.rs` | Error types used by every module |
| `paths.rs` | File path resolution — infrastructure |
| `logging.rs` | Tracing setup — infrastructure |
| `lock.rs` | Filesystem sync lock — infrastructure |
| `shutdown.rs` | Graceful shutdown signal — infrastructure |
| `backoff.rs` | Retry math — infrastructure |
| `time.rs` | Time conversion — used everywhere |
| `metrics.rs` | Tracing metrics layer — infrastructure |
| `project.rs` | Fuzzy project resolution — used by 8+ consumers across modules |
These files are legitimate "core infrastructure" used across multiple modules. Moving them would create import churn with no clarity gain.
---
## Files NOT Recommended to Split/Merge
| File | Why leave it alone |
|------|-------------------|
| `documents/extractor.rs` (2341 lines) | One cohesive extractor per entity type — the size comes from per-type formatting logic, not mixed concerns |
| `ingestion/orchestrator.rs` (1703 lines) | Single orchestration flow — splitting would scatter the pipeline |
| `gitlab/graphql.rs` (1293 lines) | GraphQL client with adaptive paging — cohesive |
| `gitlab/client.rs` (851 lines) | REST client with all endpoints — cohesive |
| `cli/autocorrect.rs` (945 lines) | Correction registry + fuzzy matching — splitting gains nothing |
---
## Proposed Final Structure (Tiers 1+2)
```
src/
├── main.rs (2713 lines — unchanged for now)
├── lib.rs (adds: pub mod timeline; pub mod xref;)
├── cli/
│ ├── mod.rs (~300 lines — Cli + Commands only, args moved out)
│ ├── autocorrect.rs (unchanged)
│ ├── progress.rs (unchanged)
│ ├── robot.rs (unchanged)
│ └── commands/
│ ├── mod.rs (re-exports + WhoArgs, IssuesArgs, etc.)
│ ├── (all existing files — unchanged but with args structs moved in)
│ └── ...
├── core/ (slimmed: 12 files → infrastructure only)
│ ├── mod.rs
│ ├── backoff.rs
│ ├── config.rs
│ ├── db.rs
│ ├── error.rs
│ ├── lock.rs
│ ├── logging.rs
│ ├── metrics.rs
│ ├── paths.rs
│ ├── project.rs
│ ├── shutdown.rs
│ └── time.rs
├── timeline/ (NEW — extracted from core/)
│ ├── mod.rs (types from core/timeline.rs)
│ ├── seed.rs (from core/timeline_seed.rs)
│ ├── expand.rs (from core/timeline_expand.rs)
│ └── collect.rs (from core/timeline_collect.rs)
├── xref/ (NEW — extracted from core/)
│ ├── mod.rs
│ ├── note_parser.rs (from core/note_parser.rs)
│ └── references.rs (from core/references.rs)
├── ingestion/ (gains 4 files from core/)
│ ├── (existing files...)
│ ├── events_db.rs (from core/events_db.rs)
│ ├── dependent_queue.rs (from core/dependent_queue.rs)
│ ├── payloads.rs (from core/payloads.rs)
│ └── sync_run.rs (from core/sync_run.rs)
├── documents/ (unchanged)
├── embedding/ (unchanged)
├── gitlab/ (unchanged)
└── search/ (unchanged)
```
---
## Import Change Tracking
### Tier 1.1: Timeline extraction
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `cli/commands/timeline.rs:10-15` | `crate::core::timeline::*` | `crate::timeline::*` |
| `cli/commands/timeline.rs:13` | `crate::core::timeline_collect::collect_events` | `crate::timeline::collect_events` (or `crate::timeline::collect::collect_events`) |
| `cli/commands/timeline.rs:14` | `crate::core::timeline_expand::expand_timeline` | `crate::timeline::expand_timeline` |
| `cli/commands/timeline.rs:15` | `crate::core::timeline_seed::seed_timeline` | `crate::timeline::seed_timeline` |
| `core/timeline_seed.rs:7-8` | `super::timeline::*` | `super::*` (or `crate::timeline::*` depending on structure) |
| `core/timeline_expand.rs:6` | `super::timeline::*` | `super::*` |
| `core/timeline_collect.rs:4` | `super::timeline::*` | `super::*` |
| `core/timeline_seed.rs:8` | `crate::search::*` | `crate::search::*` (no change) |
| `core/timeline_seed.rs:6-7` | `super::error::Result` | `crate::core::error::Result` |
| `core/timeline_expand.rs:5` | `super::error::Result` | `crate::core::error::Result` |
| `core/timeline_collect.rs:3` | `super::error::*` | `crate::core::error::*` |
### Tier 1.2: Cross-reference extraction
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `ingestion/orchestrator.rs:10-12` | `crate::core::references::*` | `crate::xref::references::*` |
| `core/note_parser.rs:7-8` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| `core/references.rs:4-5` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
### Tier 2.1: Ingestion-adjacent DB ops
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `cli/commands/count.rs:9` | `crate::core::events_db::*` | `crate::ingestion::events_db::*` |
| `ingestion/orchestrator.rs:6-8` | `crate::core::dependent_queue::*` | `super::dependent_queue::*` |
| `main.rs:37` | `crate::core::dependent_queue::release_all_locked_jobs` | `crate::ingestion::dependent_queue::release_all_locked_jobs` |
| `ingestion/discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/issues.rs:9` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/merge_requests.rs:8` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/mr_discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
| `cli/commands/sync.rs` | (uses `crate::core::sync_run::*`) | `crate::ingestion::sync_run::*` |
| `cli/commands/sync_status.rs` | (uses `crate::core::sync_run::*` or `crate::core::metrics::*`) | check and update |
| Internal: `events_db.rs:4-5` | `super::error::*`, `super::time::*` | `crate::core::error::*`, `crate::core::time::*` |
| Internal: `dependent_queue.rs:5-6` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| Internal: `payloads.rs:9-10` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| Internal: `sync_run.rs:2-4` | `super::error::*`, `super::metrics::*`, `super::time::*` | `crate::core::error::*`, `crate::core::metrics::*`, `crate::core::time::*` |
---
## Execution Order
1. **Tier 1.1** — Extract timeline → `src/timeline/` (LOW risk, 1 consumer)
2. **Tier 1.2** — Extract xref → `src/xref/` (LOW risk, 1-2 consumers)
3. **Cargo check + clippy + test** after each tier
4. **Tier 2.1** — Move ingestion DB ops (MEDIUM risk, more consumers)
5. **Cargo check + clippy + test**
6. **Tier 2.2** — Split `cli/mod.rs` args (MEDIUM risk, mostly mechanical)
7. **Cargo check + clippy + test + fmt**
Each tier should be its own commit for easy rollback.
---
## What This Achieves
**Before:** A developer looking at `core/` sees 22 files and has to mentally sort "infrastructure vs. domain logic vs. pipeline stage." The timeline pipeline is invisible unless you know to look in `core/`.
**After:**
- `core/` has 12 files, all clearly infrastructure (db, config, error, paths, logging, lock, shutdown, backoff, time, metrics, project)
- `timeline/` is a discoverable first-class module showing the 5-stage pipeline
- `xref/` makes the cross-reference extraction domain visible
- `ingestion/` contains everything related to data fetching: the orchestrator, entity ingestors, AND their supporting DB operations
- `cli/mod.rs` is lean — just the top-level Cli struct and Commands enum
A new developer (or coding agent) can now answer "where is the timeline code?" → `src/timeline/`, "where is ingestion?" → `src/ingestion/`, "where is cross-reference extraction?" → `src/xref/`, without needing institutional knowledge.

README.md

@@ -1,6 +1,6 @@
# Gitlore
Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, and hybrid search.
Local GitLab data management with semantic search, people intelligence, and temporal analysis. Syncs issues, MRs, discussions, notes, and work item statuses from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, chronological event reconstruction, and expert discovery.
## Features
@@ -8,11 +8,22 @@ Local GitLab data management with semantic search. Syncs issues, MRs, discussion
- **Incremental sync**: Cursor-based sync only fetches changes since last sync
- **Full re-sync**: Reset cursors and fetch all data from scratch when needed
- **Multi-project**: Track issues and MRs across multiple GitLab projects
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches, work item status
- **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
- **People intelligence**: Expert discovery, workload analysis, review patterns, active discussions, and code ownership overlap
- **Timeline pipeline**: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
- **Git history linking**: Tracks merge and squash commit SHAs to connect MRs with git history
- **File change tracking**: Records which files each MR touches, enabling file-level history queries
- **Raw payload storage**: Preserves original GitLab API responses for debugging
- **Discussion threading**: Full support for issue and MR discussions including inline code review comments
- **Robot mode**: Machine-readable JSON output with structured errors and meaningful exit codes
- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
- **Work item status enrichment**: Fetches issue statuses (e.g., "To do", "In progress", "Done") from GitLab's GraphQL API with adaptive page sizing, color-coded display, and case-insensitive filtering
- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
- **Note querying**: Rich filtering over discussion notes by author, type, path, resolution status, time range, and body content
- **Discussion drift detection**: Semantic analysis of how discussions diverge from original issue intent
- **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
- **Error tolerance**: Auto-corrects common CLI mistakes (case, typos, single-dash flags, value casing) with teaching feedback
- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
## Installation
@@ -54,6 +65,21 @@ lore mrs 456
# Search across all indexed data
lore search "authentication bug"
# Who knows about this code area?
lore who src/features/auth/
# What is @asmith working on?
lore who @asmith
# Timeline of events related to deployments
lore timeline "deployment"
# Timeline for a specific issue
lore timeline issue:42
# Query notes by author
lore notes --author alice --since 7d
# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .
```
@@ -74,13 +100,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
{ "path": "group/project" },
{ "path": "other-group/other-project" }
],
"defaultProject": "group/project",
"sync": {
"backfillDays": 14,
"staleLockMinutes": 10,
"heartbeatIntervalSeconds": 30,
"cursorRewindSeconds": 2,
"primaryConcurrency": 4,
"dependentConcurrency": 2
"dependentConcurrency": 2,
"fetchWorkItemStatus": true
},
"storage": {
"compressRawPayloads": true
@@ -90,6 +118,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
"model": "nomic-embed-text",
"baseUrl": "http://localhost:11434",
"concurrency": 4
},
"scoring": {
"authorWeight": 25,
"reviewerWeight": 10,
"noteBonus": 1,
"authorHalfLifeDays": 180,
"reviewerHalfLifeDays": 90,
"noteHalfLifeDays": 45,
"excludedUsernames": ["bot-user"]
}
}
```
@@ -101,12 +138,14 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
| `gitlab` | `baseUrl` | -- | GitLab instance URL (required) |
| `gitlab` | `tokenEnvVar` | `GITLAB_TOKEN` | Environment variable containing API token |
| `projects` | `path` | -- | Project path (e.g., `group/project`) |
| *(top-level)* | `defaultProject` | none | Fallback project path used when `-p` is omitted. Must match a configured project path (exact or suffix). CLI `-p` always overrides. |
| `sync` | `backfillDays` | `14` | Days to backfill on initial sync |
| `sync` | `staleLockMinutes` | `10` | Minutes before sync lock considered stale |
| `sync` | `heartbeatIntervalSeconds` | `30` | Frequency of lock heartbeat updates |
| `sync` | `cursorRewindSeconds` | `2` | Seconds to rewind cursor for overlap safety |
| `sync` | `primaryConcurrency` | `4` | Concurrent GitLab requests for primary resources |
| `sync` | `dependentConcurrency` | `2` | Concurrent requests for dependent resources |
| `sync` | `fetchWorkItemStatus` | `true` | Enrich issues with work item status via GraphQL (requires GitLab Premium/Ultimate) |
| `storage` | `dbPath` | `~/.local/share/lore/lore.db` | Database file path |
| `storage` | `backupDir` | `~/.local/share/lore/backups` | Backup directory |
| `storage` | `compressRawPayloads` | `true` | Compress stored API responses with gzip |
@@ -114,6 +153,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
| `embedding` | `model` | `nomic-embed-text` | Model name for embeddings |
| `embedding` | `baseUrl` | `http://localhost:11434` | Ollama server URL |
| `embedding` | `concurrency` | `4` | Concurrent embedding requests |
| `scoring` | `authorWeight` | `25` | Points per MR where the user authored code touching the path |
| `scoring` | `reviewerWeight` | `10` | Points per MR where the user reviewed code touching the path |
| `scoring` | `noteBonus` | `1` | Bonus per inline review comment (DiffNote) |
| `scoring` | `reviewerAssignmentWeight` | `3` | Points per MR where the user was assigned as reviewer |
| `scoring` | `authorHalfLifeDays` | `180` | Half-life in days for author contribution decay |
| `scoring` | `reviewerHalfLifeDays` | `90` | Half-life in days for reviewer contribution decay |
| `scoring` | `noteHalfLifeDays` | `45` | Half-life in days for note/comment decay |
| `scoring` | `closedMrMultiplier` | `0.5` | Score multiplier for closed (not merged) MRs |
| `scoring` | `excludedUsernames` | `[]` | Usernames excluded from expert results (e.g., bots) |
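
To illustrate how a weight and a half-life interact, the sketch below assumes standard exponential half-life decay, where a contribution loses half its value every `half_life_days`. The formula and the `decayed_points` helper are assumptions for illustration; the actual scoring model may combine these settings differently.

```rust
/// Points contributed by one event of the given `weight` that happened
/// `age_days` ago, assuming exponential decay with the given half-life.
pub fn decayed_points(weight: f64, age_days: f64, half_life_days: f64) -> f64 {
    weight * 0.5_f64.powf(age_days / half_life_days)
}
```

Under this assumption, an authored MR (`authorWeight` 25, `authorHalfLifeDays` 180) would contribute 25 points when new and 12.5 points after 180 days.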
### Config File Resolution
@@ -168,18 +216,24 @@ lore issues --since 1m # Updated in last month
lore issues --since 2024-01-01 # Updated since date
lore issues --due-before 2024-12-31 # Due before date
lore issues --has-due # Only issues with due dates
lore issues --status "In progress" # By work item status (case-insensitive)
lore issues --status "To do" --status "In progress" # Multiple statuses (OR)
lore issues -p group/repo # Filter by project
lore issues --sort created --asc # Sort by created date, ascending
lore issues -o # Open first result in browser
# Field selection (robot mode)
lore -J issues --fields minimal # Compact: iid, title, state, updated_at_iso
lore -J issues --fields iid,title,labels,state # Custom fields
```
When listing, output includes: IID, title, state, author, assignee, labels, and update time.
When listing, output includes: IID, title, state, status (when any issue has one), assignee, labels, and update time. Status values display with their configured color. In robot mode, the `--fields` flag controls which fields appear in the JSON response.
When showing a single issue (e.g., `lore issues 123`), output includes: title, description, state, author, assignees, labels, milestone, due date, web URL, and threaded discussions.
When showing a single issue (e.g., `lore issues 123`), output includes: title, description, state, work item status (with color and category), author, assignees, labels, milestone, due date, web URL, and threaded discussions.
#### Project Resolution
The `-p` / `--project` flag uses cascading match logic across all commands:
When `-p` / `--project` is omitted, the `defaultProject` from config is used as a fallback. If neither is set, results span all configured projects. When a project is specified (via `-p` or config default), it uses cascading match logic across all commands:
1. **Exact match**: `group/project`
2. **Case-insensitive**: `Group/Project`
@@ -214,6 +268,10 @@ lore mrs --since 7d # Updated in last 7 days
lore mrs -p group/repo # Filter by project
lore mrs --sort created --asc # Sort by created date, ascending
lore mrs -o # Open first result in browser
# Field selection (robot mode)
lore -J mrs --fields minimal # Compact: iid, title, state, updated_at_iso
lore -J mrs --fields iid,title,draft,target_branch # Custom fields
```
When listing, output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.
@@ -231,22 +289,226 @@ lore search "login flow" --mode semantic # Vector similarity only
lore search "auth" --type issue # Filter by source type
lore search "auth" --type mr # MR documents only
lore search "auth" --type discussion # Discussion documents only
lore search "auth" --type note # Individual notes only
lore search "deploy" --author username # Filter by author
lore search "deploy" -p group/repo # Filter by project
lore search "deploy" --label backend # Filter by label (AND logic)
lore search "deploy" --path src/ # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d # Created after (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w # Updated after
lore search "deploy" --since 7d # Created since (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-since 2w # Updated since
lore search "deploy" -n 50 # Limit results (default 20, max 100)
lore search "deploy" --explain # Show ranking explanation per result
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
```
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. FTS5 boolean operators (`AND`, `OR`, `NOT`, `NEAR`) are passed through in safe mode, so queries like `"switch AND health"` work without switching to raw mode. Use `raw` for advanced FTS5 query syntax (phrase matching, column filters, prefix queries).
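As a rough illustration of what "safe" sanitization could look like, the sketch below quotes every token as an FTS5 string literal while letting the four boolean operators through unchanged. The function name `sanitize_fts_query` and the quoting strategy are assumptions made for this example, not lore's actual implementation, and the automatic fallback is not modeled.

```rust
/// FTS5 operators that pass through untouched in safe mode (per the text above).
const OPERATORS: [&str; 4] = ["AND", "OR", "NOT", "NEAR"];

/// Hypothetical sanitizer: quote every token except the boolean operators.
fn sanitize_fts_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|token| {
            if OPERATORS.contains(&token) {
                token.to_string()
            } else {
                // FTS5 string literals escape an embedded double quote by doubling it.
                format!("\"{}\"", token.replace('"', "\"\""))
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Operators survive; ordinary words become quoted literals.
    println!("{}", sanitize_fts_query("switch AND health"));
    // Characters that are FTS5 syntax errors when bare are harmless inside quotes.
    println!("{}", sanitize_fts_query("foo(bar"));
}
```

Quoting each token keeps punctuation such as `(` or `-` from being parsed as FTS5 syntax, which is the kind of input that would otherwise force a user into `raw` mode.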
A progress spinner displays during search, showing the active mode (e.g., `Searching (hybrid)...`). In robot mode, spinners are suppressed for clean JSON output.
Requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.
### `lore who`
People intelligence: discover experts, analyze workloads, review patterns, active discussions, and code overlap.
#### Expert Mode
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis). Scores use exponential half-life decay so recent contributions count more than older ones. Scoring weights and half-life periods are configurable via the `scoring` config section.
```bash
lore who src/features/auth/ # Who knows about this directory?
lore who src/features/auth/login.ts # Who knows about this file?
lore who --path README.md # Root files need --path flag
lore who --path Makefile # Dotless root files too
lore who src/ --since 3m # Limit to recent 3 months
lore who src/ -p group/repo # Scope to project
lore who src/ --explain-score # Show per-component score breakdown
lore who src/ --as-of 30d # Score as if "now" were 30 days ago
lore who src/ --include-bots # Include bot users in results
```
The target is auto-detected as a path when it contains `/`. For root files without `/` (e.g., `README.md`), use the `--path` flag. Default time window: 6 months.
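The half-life decay used by expert scoring can be sketched as follows. This assumes the standard half-life formula (`weight * 0.5^(age / half_life)`) and that the decayed components are simply summed; the README confirms the decay model and the default weights, but the exact way lore combines components is an assumption here, and the `decayed` helper is hypothetical.

```rust
/// Exponential half-life decay: a contribution loses half its weight
/// every `half_life_days`. (Hypothetical helper, not lore's API.)
fn decayed(weight: f64, age_days: f64, half_life_days: f64) -> f64 {
    weight * 0.5_f64.powf(age_days / half_life_days)
}

fn main() {
    // Default weights and half-lives taken from the `scoring` config table.
    let author = decayed(25.0, 180.0, 180.0); // authored an MR 180 days ago
    let reviewer = decayed(10.0, 90.0, 90.0); // reviewed an MR 90 days ago
    let note = decayed(1.0, 0.0, 45.0); // inline review comment left today

    println!("author={author:.2} reviewer={reviewer:.2} note={note:.2}");
    // Assumption: the total is the plain sum of the decayed components.
    println!("total={:.2}", author + reviewer + note);
}
```

With the defaults, an authored MR exactly one half-life old contributes 12.5 of its original 25 points, which is why shortening `authorHalfLifeDays` favors recent contributors more strongly.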
#### Workload Mode
See what someone is currently working on.
```bash
lore who @asmith # Full workload summary
lore who @asmith -p group/repo # Scoped to one project
```
Shows: assigned open issues, authored MRs, MRs under review, and unresolved discussions.
#### Reviews Mode
Analyze someone's code review patterns by area.
```bash
lore who @asmith --reviews # Review activity breakdown
lore who @asmith --reviews --since 3m # Recent review patterns
```
Shows: total DiffNotes, categorized by code area with percentage breakdown.
#### Active Mode
Surface unresolved discussions needing attention.
```bash
lore who --active # Unresolved discussions (last 7 days)
lore who --active --since 30d # Wider time window
lore who --active -p group/repo # Scoped to project
```
Shows: discussion threads with participants and last activity timestamps.
#### Overlap Mode
Find who else is touching a file or directory.
```bash
lore who --overlap src/features/auth/ # Who else works here?
lore who --overlap src/lib.rs # Single file overlap
```
Shows: users with touch counts (author vs. review), linked MR references. Default time window: 6 months.
#### Common Flags
| Flag | Description |
|------|-------------|
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
| `-n` / `--limit` | Max results per section (1-500, default 20) |
| `--all-history` | Remove the default time window, query all history |
| `--detail` | Show per-MR detail breakdown (expert mode only) |
| `--explain-score` | Show per-component score breakdown (expert mode only) |
| `--as-of` | Score as if "now" is a past date (ISO 8601 or duration like 30d, expert mode only) |
| `--include-bots` | Include bot users normally excluded via `scoring.excludedUsernames` |
### `lore timeline`
Reconstruct a chronological timeline of events matching a keyword query. The pipeline discovers related entities through cross-reference graph traversal and assembles a unified, time-ordered event stream.
```bash
lore timeline "deployment" # Search-based seeding (hybrid search)
lore timeline issue:42 # Direct entity seeding by issue IID
lore timeline i:42 # Shorthand for issue:42
lore timeline mr:99 # Direct entity seeding by MR IID
lore timeline m:99 # Shorthand for mr:99
lore timeline "auth" -p group/repo # Scoped to a project
lore timeline "auth" --since 30d # Only recent events
lore timeline "migration" --depth 2 # Deeper cross-reference expansion
lore timeline "migration" --no-mentions # Skip 'mentioned' edges (reduces fan-out)
lore timeline "deploy" -n 50 # Limit event count
lore timeline "auth" --max-seeds 5 # Fewer seed entities
```
The query can be either a search string (hybrid search finds matching entities) or an entity reference (`issue:N`, `i:N`, `mr:N`, `m:N`) which directly seeds the timeline from a specific entity and its cross-references.
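A minimal sketch of how a query might be classified as an entity reference or a search string. The `Seed` enum and `parse_seed` function are hypothetical names for illustration; only the four prefixes (`issue:`, `i:`, `mr:`, `m:`) come from the text above.

```rust
/// Hypothetical classification of a timeline query.
#[derive(Debug, PartialEq)]
enum Seed {
    Issue(u64),
    MergeRequest(u64),
    Search(String),
}

/// Treat `issue:N` / `i:N` / `mr:N` / `m:N` as direct entity seeds;
/// anything else falls back to a search string.
fn parse_seed(query: &str) -> Seed {
    if let Some((prefix, rest)) = query.split_once(':') {
        if let Ok(iid) = rest.parse::<u64>() {
            match prefix {
                "issue" | "i" => return Seed::Issue(iid),
                "mr" | "m" => return Seed::MergeRequest(iid),
                _ => {}
            }
        }
    }
    Seed::Search(query.to_string())
}

fn main() {
    for query in ["issue:42", "i:42", "mr:99", "m:99", "deployment"] {
        println!("{query} -> {:?}", parse_seed(query));
    }
}
```

In this sketch, a malformed reference such as `issue:abc` is treated as an ordinary search string rather than an error; whether lore does the same is not stated in the README.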
#### Flags
| Flag | Default | Description |
|------|---------|-------------|
| `-p` / `--project` | all | Scope to a specific project (fuzzy match) |
| `--since` | none | Only events after this date (7d, 2w, 6m, YYYY-MM-DD) |
| `--depth` | `1` | Cross-reference expansion depth (0 = seeds only) |
| `--no-mentions` | off | Skip "mentioned" edges during expansion (reduces fan-out) |
| `-n` / `--limit` | `100` | Maximum events to display |
| `--max-seeds` | `10` | Maximum seed entities from search |
| `--max-entities` | `50` | Maximum entities discovered via cross-references |
| `--max-evidence` | `10` | Maximum evidence notes included |
| `--fields` | all | Select output fields (comma-separated, or 'minimal' preset) |
#### Pipeline Stages
Each stage displays a numbered progress spinner (e.g., `[1/3] Seeding timeline...`). In robot mode, spinners are suppressed for clean JSON output.
1. **SEED** -- Hybrid search (FTS5 lexical + Ollama vector similarity via Reciprocal Rank Fusion) identifies the most relevant issues and MRs. Falls back to lexical-only if Ollama is unavailable. Discussion notes matching the query are also discovered and attached to their parent entities.
2. **HYDRATE** -- Evidence notes are extracted: the top search-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced. Matched discussions are collected as full thread candidates.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and "mentioned" references up to the configured depth. Use `--no-mentions` to exclude "mentioned" edges and reduce fan-out.
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, evidence notes, and full discussion threads. Events are sorted chronologically with stable tiebreaking.
5. **RENDER** -- Events are formatted as human-readable text or structured JSON (robot mode).
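The EXPAND stage above can be sketched as a depth-limited breadth-first traversal. The in-memory `Graph` type, the `RefKind` enum, and the `expand` function are hypothetical stand-ins for the `entity_references` table and lore's real query logic; the sketch also assumes edges are followed in one direction only.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

#[derive(Clone, Copy, PartialEq)]
enum RefKind {
    Closes,
    Related,
    Mentioned,
}

/// Hypothetical in-memory stand-in for the `entity_references` table.
type Graph = HashMap<&'static str, Vec<(&'static str, RefKind)>>;

/// Breadth-first expansion from the seeds, up to `max_depth` hops.
/// With `no_mentions`, "mentioned" edges are skipped to reduce fan-out.
fn expand(
    graph: &Graph,
    seeds: &[&'static str],
    max_depth: usize,
    no_mentions: bool,
) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    for &seed in seeds {
        if seen.insert(seed) {
            queue.push_back((seed, 0));
        }
    }

    let mut order = Vec::new();
    while let Some((entity, depth)) = queue.pop_front() {
        order.push(entity);
        if depth == max_depth {
            continue; // depth 0 means seeds only
        }
        for &(target, kind) in graph.get(entity).into_iter().flatten() {
            if no_mentions && kind == RefKind::Mentioned {
                continue;
            }
            if seen.insert(target) {
                queue.push_back((target, depth + 1));
            }
        }
    }
    order
}

fn main() {
    let mut graph: Graph = HashMap::new();
    graph.insert(
        "mr:99",
        vec![("issue:42", RefKind::Closes), ("issue:7", RefKind::Mentioned)],
    );
    graph.insert("issue:42", vec![("issue:50", RefKind::Related)]);

    println!("{:?}", expand(&graph, &["mr:99"], 1, false));
    println!("{:?}", expand(&graph, &["mr:99"], 1, true));
    println!("{:?}", expand(&graph, &["mr:99"], 2, false));
}
```

The `seen` set guarantees each entity is visited once even when several references point at it, which keeps the traversal bounded on densely cross-linked projects.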
#### Event Types
| Event | Description |
|-------|-------------|
| `Created` | Entity creation |
| `StateChanged` | State transitions (opened, closed, reopened) |
| `LabelAdded` | Label applied to entity |
| `LabelRemoved` | Label removed from entity |
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
| `NoteEvidence` | Discussion note matched by search, with snippet |
| `DiscussionThread` | Full discussion thread with all non-system notes |
| `CrossReferenced` | Reference to another entity |
#### Unresolved References
When graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the output. This enables discovery of external dependencies and can inform future sync targets.
### `lore notes`
Query individual notes from discussions with rich filtering options.
```bash
lore notes # List 50 most recent notes
lore notes --author alice --since 7d # Notes by alice in last 7 days
lore notes --for-issue 42 -p group/repo # Notes on issue #42
lore notes --for-mr 99 -p group/repo # Notes on MR !99
lore notes --path src/ --resolution unresolved # Unresolved diff notes in src/
lore notes --note-type DiffNote # Only inline code review comments
lore notes --contains "TODO" # Substring search in note body
lore notes --include-system # Include system-generated notes
lore notes --since 2w --until 2024-12-31 # Time-bounded range
lore notes --sort updated --asc # Sort by update time, ascending
lore notes --format csv # CSV output
lore notes --format jsonl # Line-delimited JSON
lore notes -o # Open first result in browser
# Field selection (robot mode)
lore -J notes --fields minimal # Compact: id, author_username, body, created_at_iso
```
#### Filters
| Flag | Description |
|------|-------------|
| `-a` / `--author` | Filter by note author username |
| `--note-type` | Filter by note type (DiffNote, DiscussionNote) |
| `--contains` | Substring search in note body |
| `--note-id` | Filter by internal note ID |
| `--gitlab-note-id` | Filter by GitLab note ID |
| `--discussion-id` | Filter by discussion ID |
| `--include-system` | Include system notes (excluded by default) |
| `--for-issue` | Notes on a specific issue IID (requires `-p`) |
| `--for-mr` | Notes on a specific MR IID (requires `-p`) |
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Notes created since (7d, 2w, 1m, or YYYY-MM-DD) |
| `--until` | Notes created until (YYYY-MM-DD, inclusive end-of-day) |
| `--path` | Filter by file path (DiffNotes only; trailing `/` for prefix match) |
| `--resolution` | Filter by resolution status (`any`, `unresolved`, `resolved`) |
| `--sort` | Sort by `created` (default) or `updated` |
| `--asc` | Sort ascending (default: descending) |
| `--format` | Output format: `table` (default), `json`, `jsonl`, `csv` |
| `-o` / `--open` | Open first result in browser |
### `lore drift`
Detect discussion divergence from the original intent of an issue by comparing the semantic similarity of discussion content against the issue description.
```bash
lore drift issues 42 # Check divergence on issue #42
lore drift issues 42 --threshold 0.6 # Higher threshold (stricter)
lore drift issues 42 -p group/repo # Scope to project
```
### `lore sync`
Run the full sync pipeline: ingest from GitLab, generate searchable documents, and compute embeddings.
Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings.
```bash
lore sync # Full pipeline
@@ -254,19 +516,25 @@ lore sync --full # Reset cursors, fetch everything
lore sync --force # Override stale lock
lore sync --no-embed # Skip embedding step
lore sync --no-docs # Skip document regeneration
lore sync --no-events # Skip resource event fetching
lore sync --no-file-changes # Skip MR file change fetching
lore sync --dry-run # Preview what would be synced
```
The sync command displays animated progress bars for each stage and outputs timing metrics on completion. In robot mode (`-J`), detailed stage timing is included in the JSON response.
### `lore ingest`
Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings).
Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings). For issue ingestion, this includes a status enrichment phase that fetches work item statuses via the GitLab GraphQL API.
```bash
lore ingest # Ingest everything (issues + MRs)
lore ingest issues # Issues only
lore ingest issues # Issues only (includes status enrichment)
lore ingest mrs # MRs only
lore ingest issues -p group/repo # Single project
lore ingest --force # Override stale lock
lore ingest --full # Full re-sync (reset cursors)
lore ingest --dry-run # Preview what would change
```
The `--full` flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:
@@ -274,6 +542,8 @@ The `--full` flag resets sync cursors and discussion watermarks, then fetches al
- You want to ensure complete data after schema changes
- Troubleshooting sync issues
Status enrichment uses adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL complexity limits. It gracefully handles instances without GraphQL support or Premium/Ultimate licensing. Disable via `sync.fetchWorkItemStatus: false` in config.
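The adaptive page sizing described above can be sketched as a simple step-down retry loop. The function name `fetch_with_adaptive_page_size` and the closure-based interface are assumptions for illustration; a real implementation would likely step down only on complexity-limit errors rather than on every failure.

```rust
/// Page sizes tried in order, as listed above.
const PAGE_SIZES: [u32; 4] = [100, 50, 25, 10];

/// Hypothetical helper: retry `fetch` with progressively smaller pages and
/// return the first success together with the page size that worked.
fn fetch_with_adaptive_page_size<T, F>(mut fetch: F) -> Result<(u32, T), String>
where
    F: FnMut(u32) -> Result<T, String>,
{
    let mut last_err = String::from("no page size attempted");
    for size in PAGE_SIZES {
        match fetch(size) {
            Ok(page) => return Ok((size, page)),
            Err(err) => last_err = err, // fall through to the next smaller size
        }
    }
    Err(last_err)
}

fn main() {
    // Simulated server that rejects any page larger than 25 items.
    let result = fetch_with_adaptive_page_size(|size| {
        if size > 25 {
            Err(format!("complexity limit exceeded at page size {size}"))
        } else {
            Ok(vec![0u8; size as usize])
        }
    });

    match result {
        Ok((size, items)) => println!("fetched {} items with page size {size}", items.len()),
        Err(err) => println!("enrichment skipped: {err}"),
    }
}
```

Returning the error after the smallest size fails lets the caller skip enrichment and continue the rest of the sync, in line with the graceful handling described above.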
### `lore generate-docs`
Extract searchable documents from ingested issues, MRs, and discussions for the FTS5 index.
@@ -290,6 +560,7 @@ Generate vector embeddings for documents via Ollama. Requires Ollama running wit
```bash
lore embed # Embed new/changed documents
lore embed --full # Re-embed all documents (clears existing)
lore embed --retry-failed # Retry previously failed embeddings
```
@@ -305,6 +576,9 @@ lore count discussions --for issue # Issue discussions only
lore count discussions --for mr # MR discussions only
lore count notes # Total notes (system vs user breakdown)
lore count notes --for issue # Issue notes only
lore count events # Total resource events
lore count events --for issue # Issue events only
lore count events --for mr # MR events only
```
### `lore stats`
@@ -315,6 +589,7 @@ Show document and index statistics, with optional integrity checks.
lore stats # Document and index statistics
lore stats --check # Run integrity checks
lore stats --check --repair # Repair integrity issues
lore stats --dry-run # Preview repairs without saving
```
### `lore status`
@@ -340,6 +615,17 @@ lore init --force # Overwrite existing config
lore init --non-interactive # Fail if prompts needed
```
When multiple projects are configured, `init` prompts whether to set a default project (used when `-p` is omitted). This can also be set via the `--default-project` flag.
In robot mode, `init` supports non-interactive setup via flags:
```bash
lore -J init --gitlab-url https://gitlab.com \
--token-env-var GITLAB_TOKEN \
--projects "group/project,other/project" \
--default-project group/project
```
### `lore auth`
Verify GitLab authentication is working.
@@ -375,7 +661,7 @@ lore migrate
### `lore health`
Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 1 if unhealthy.
Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 19 if unhealthy.
```bash
lore health
@@ -390,6 +676,7 @@ Machine-readable command manifest for agent self-discovery. Returns a JSON schem
```bash
lore robot-docs # Pretty-printed JSON
lore --robot robot-docs # Compact JSON for parsing
lore robot-docs --brief # Omit response_schema (~60% smaller)
```
### `lore version`
@@ -403,7 +690,7 @@ lore version
## Robot Mode
Machine-readable JSON output for scripting and AI agent consumption.
Machine-readable JSON output for scripting and AI agent consumption. All responses use compact (single-line) JSON with a uniform envelope and timing metadata.
### Activation
@@ -423,18 +710,93 @@ lore issues -n 5 | jq .
### Response Format
All commands return consistent JSON:
All commands return a consistent JSON envelope to stdout:
```json
{"ok": true, "data": {...}, "meta": {...}}
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```
Errors return structured JSON to stderr:
Every response includes `meta.elapsed_ms` (wall-clock milliseconds for the command).
Errors return structured JSON to stderr with machine-actionable recovery steps:
```json
{"error": {"code": "CONFIG_NOT_FOUND", "message": "...", "suggestion": "Run 'lore init'"}}
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}
```
The `actions` array contains executable shell commands an agent can run to recover from the error. It is omitted when empty (e.g., for generic I/O errors).
### Field Selection
The `--fields` flag controls which fields appear in the JSON response, reducing token usage for AI agent workflows. Supported on `issues`, `mrs`, `notes`, `search`, `timeline`, and `who` list commands:
```bash
# Minimal preset (~60% fewer tokens)
lore -J issues --fields minimal
# Custom field list
lore -J issues --fields iid,title,state,labels,updated_at_iso
# Available presets
# minimal: iid, title, state, updated_at_iso
```
Valid fields for issues: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at_iso`
Valid fields for MRs: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`
### Error Tolerance
The CLI auto-corrects common mistakes before parsing, emitting a teaching note to stderr. Corrections work in both human and robot modes:
| Correction | Example | Mode |
|-----------|---------|------|
| Single-dash long flag | `-robot` -> `--robot` | All |
| Case normalization | `--Robot` -> `--robot` | All |
| Flag prefix expansion | `--proj` -> `--project` (unambiguous only) | All |
| Fuzzy flag match | `--projct` -> `--project` | All (threshold 0.9 in robot, 0.8 in human) |
| Subcommand alias | `merge_requests` -> `mrs`, `robotdocs` -> `robot-docs` | All |
| Value normalization | `--state Opened` -> `--state opened` | All |
| Value fuzzy match | `--state opend` -> `--state opened` | All |
| Subcommand prefix | `lore iss` -> `lore issues` (unambiguous only, via clap) | All |
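The "unambiguous only" rule for flag prefix expansion in the table above can be sketched as follows. The `expand_flag_prefix` function and the flag list passed to it are illustrative assumptions, not lore's actual autocorrect registry.

```rust
/// Hypothetical prefix expansion: return the full flag only when exactly one
/// known flag starts with the input.
fn expand_flag_prefix<'a>(input: &str, known: &[&'a str]) -> Option<&'a str> {
    // An exact match always wins, even if it is also a prefix of another flag.
    if let Some(exact) = known.iter().copied().find(|flag| *flag == input) {
        return Some(exact);
    }
    let mut matches = known.iter().copied().filter(|flag| flag.starts_with(input));
    match (matches.next(), matches.next()) {
        (Some(only), None) => Some(only),
        _ => None, // no match, or ambiguous: leave the input alone
    }
}

fn main() {
    let flags = ["--project", "--path", "--since", "--sort"];
    println!("{:?}", expand_flag_prefix("--proj", &flags)); // Some("--project")
    println!("{:?}", expand_flag_prefix("--p", &flags)); // None (ambiguous)
    println!("{:?}", expand_flag_prefix("--x", &flags)); // None (no match)
}
```

Refusing to guess on ambiguous prefixes means a correction never silently picks the wrong flag; the fuzzy-match step with its similarity threshold would be a separate pass.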
In robot mode, corrections emit structured JSON to stderr:
```json
{"warning":{"type":"ARG_CORRECTED","corrections":[...],"teaching":["Use double-dash for long flags: --robot (not -robot)"]}}
```
When a command or flag is still unrecognized after corrections, the error response includes a fuzzy suggestion and, for enum-like flags, lists valid values:
```json
{"error":{"code":"UNKNOWN_COMMAND","message":"...","suggestion":"Did you mean 'lore issues'? Example: lore --robot issues -n 10. Run 'lore robot-docs' for all commands"}}
```
### Command Aliases
Commands accept aliases for common variations:
| Primary | Aliases |
|---------|---------|
| `issues` | `issue` |
| `mrs` | `mr`, `merge-requests`, `merge-request` |
| `notes` | `note` |
| `search` | `find`, `query` |
| `stats` | `stat` |
| `status` | `st` |
Unambiguous prefixes also work via subcommand inference (e.g., `lore iss` -> `lore issues`, `lore time` -> `lore timeline`).
### Agent Self-Discovery
The `robot-docs` command provides a complete machine-readable manifest including response schemas for every command:
```bash
lore robot-docs | jq '.data.commands.issues.response_schema'
```
Each command entry includes `response_schema` describing the shape of its JSON response, `fields_presets` for commands supporting `--fields`, and copy-paste `example` invocations.
### Exit Codes
| Code | Meaning |
@@ -458,6 +820,7 @@ Errors return structured JSON to stderr:
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 19 | Health check failed |
| 20 | Config not found |
## Configuration Precedence
@@ -478,6 +841,10 @@ lore -J <command> # JSON shorthand
lore --color never <command> # Disable color output
lore --color always <command> # Force color output
lore -q <command> # Suppress non-essential output
lore -v <command> # Debug logging
lore -vv <command> # More verbose debug logging
lore -vvv <command> # Trace-level logging
lore --log-format json <command> # JSON-formatted log output to stderr
```
Color output respects `NO_COLOR` and `CLICOLOR` environment variables in `auto` mode (the default).
@@ -507,8 +874,8 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
| Table | Purpose |
|-------|---------|
| `projects` | Tracked GitLab projects with metadata |
| `issues` | Issue metadata (title, state, author, due date, milestone) |
| `merge_requests` | MR metadata (title, state, draft, branches, merge status) |
| `issues` | Issue metadata (title, state, author, due date, milestone, work item status) |
| `merge_requests` | MR metadata (title, state, draft, branches, merge status, commit SHAs) |
| `milestones` | Project milestones with state and due dates |
| `labels` | Project labels with colors |
| `issue_labels` | Many-to-many issue-label relationships |
@@ -516,8 +883,13 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
| `mr_labels` | Many-to-many MR-label relationships |
| `mr_assignees` | Many-to-many MR-assignee relationships |
| `mr_reviewers` | Many-to-many MR-reviewer relationships |
| `mr_file_changes` | Files touched by each MR (path, change type, renames) |
| `discussions` | Issue/MR discussion threads |
| `notes` | Individual notes within discussions (with system note flag and DiffNote position data) |
| `resource_state_events` | Issue/MR state change history (opened, closed, merged, reopened) |
| `resource_label_events` | Label add/remove events with actor and timestamp |
| `resource_milestone_events` | Milestone add/remove events with actor and timestamp |
| `entity_references` | Cross-references between entities (MR closes issue, mentioned in, etc.) |
| `documents` | Extracted searchable text for FTS and embedding |
| `documents_fts` | FTS5 full-text search index |
| `embeddings` | Vector embeddings for semantic search |

@@ -5,6 +5,17 @@ fn main() {
.ok()
.and_then(|o| String::from_utf8(o.stdout).ok())
.unwrap_or_default();
println!("cargo:rustc-env=GIT_HASH={}", hash.trim());
let hash = hash.trim();
println!("cargo:rustc-env=GIT_HASH={hash}");
// Combined version string for clap --version flag
let pkg_version = std::env::var("CARGO_PKG_VERSION").unwrap_or_default();
if hash.is_empty() {
println!("cargo:rustc-env=LORE_VERSION={pkg_version}");
} else {
println!("cargo:rustc-env=LORE_VERSION={pkg_version} ({hash})");
}
println!("cargo:rerun-if-changed=.git/HEAD");
println!("cargo:rerun-if-changed=.git/refs/heads");
}

@@ -1,3 +1,15 @@
---
plan: true
title: "api-efficiency-findings"
status: drafting
iteration: 0
target_iterations: 8
beads_revision: 0
related_plans: []
created: 2026-02-07
updated: 2026-02-07
---
# API Efficiency & Observability Findings
> **Status:** Draft - working through items

@@ -0,0 +1,245 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 300, "y": 15, "text": "Human User Flow Map", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 220, "y": 53, "text": "15 human workflows mapped to lore commands. Arrows show data dependency.", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "text", "id": "col-trigger", "x": 60, "y": 80, "text": "TRIGGER (Problem)", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-flow", "x": 400, "y": 80, "text": "COMMAND FLOW", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-gap", "x": 880, "y": 80, "text": "GAP", "fontSize": 16, "strokeColor": "#ef4444" },
{ "type": "rectangle", "id": "zone-daily", "x": 20, "y": 110, "width": 960, "height": 190,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-daily-label", "x": 30, "y": 115, "text": "Daily Operations", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "h1-trigger", "x": 30, "y": 140, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H1: Standup prep\n\"What moved overnight?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a1", "x": 230, "y": 165, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd1", "x": 280, "y": 145, "width": 90, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "sync -q", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a2", "x": 370, "y": 165, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd2", "x": 400, "y": 145, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues --since 1d", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a3", "x": 540, "y": 165, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd3", "x": 570, "y": 145, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs --since 1d", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a4", "x": 700, "y": 165, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-cmd4", "x": 730, "y": 145, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who @me", "fontSize": 14 } },
{ "type": "arrow", "id": "h1-a5", "x": 830, "y": 165, "width": 40, "height": 0,
"points": [[0,0],[40,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h1-gap", "x": 870, "y": 140, "width": 100, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No @me\nNo feed", "fontSize": 14 } },
{ "type": "rectangle", "id": "h3-trigger", "x": 30, "y": 210, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H3: Incident\n\"Deploy broke prod\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a1", "x": 230, "y": 235, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd1", "x": 280, "y": 215, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "timeline deploy", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a2", "x": 410, "y": 235, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd2", "x": 440, "y": 215, "width": 160, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search deploy --mr", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a3", "x": 600, "y": 235, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd3", "x": 630, "y": 215, "width": 110, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs <iid>", "fontSize": 14 } },
{ "type": "arrow", "id": "h3-a4", "x": 740, "y": 235, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h3-cmd4", "x": 770, "y": 215, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who --overlap", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-planning", "x": 20, "y": 310, "width": 960, "height": 190,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-planning-label", "x": 30, "y": 315, "text": "Planning & Assignment", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "h2-trigger", "x": 30, "y": 340, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H2: Sprint plan\n\"What's ready to pick?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h2-a1", "x": 230, "y": 365, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h2-cmd1", "x": 280, "y": 345, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues -s opened -l ready", "fontSize": 13 } },
{ "type": "arrow", "id": "h2-a2", "x": 450, "y": 365, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h2-cmd2", "x": 480, "y": 345, "width": 150, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues --has-due", "fontSize": 14 } },
{ "type": "arrow", "id": "h2-a3", "x": 630, "y": 365, "width": 230, "height": 0,
"points": [[0,0],[230,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h2-gap", "x": 860, "y": 340, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No\n--no-assignee", "fontSize": 14 } },
{ "type": "rectangle", "id": "h8-trigger", "x": 30, "y": 410, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H8: Assign work\n\"Who has bandwidth?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a1", "x": 230, "y": 435, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-cmd1", "x": 280, "y": 415, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @alice", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a2", "x": 400, "y": 435, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-cmd2", "x": 430, "y": 415, "width": 110, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @bob", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a3", "x": 540, "y": 435, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-cmd3", "x": 570, "y": 415, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @carol...", "fontSize": 14 } },
{ "type": "arrow", "id": "h8-a4", "x": 690, "y": 435, "width": 170, "height": 0,
"points": [[0,0],[170,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h8-gap", "x": 860, "y": 410, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No team\nworkload view", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-investigation", "x": 20, "y": 510, "width": 960, "height": 260,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-invest-label", "x": 30, "y": 515, "text": "Investigation & Understanding", "fontSize": 14, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "h7-trigger", "x": 30, "y": 540, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H7: Why this way?\n\"Understand a decision\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a1", "x": 230, "y": 565, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-cmd1", "x": 280, "y": 545, "width": 160, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search \"rationale\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a2", "x": 440, "y": 565, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-cmd2", "x": 470, "y": 545, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a3", "x": 610, "y": 565, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-cmd3", "x": 640, "y": 545, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues 234", "fontSize": 14 } },
{ "type": "arrow", "id": "h7-a4", "x": 740, "y": 565, "width": 120, "height": 0,
"points": [[0,0],[120,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h7-gap", "x": 860, "y": 540, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No per-note\nsearch", "fontSize": 14 } },
{ "type": "rectangle", "id": "h11-trigger", "x": 30, "y": 610, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H11: Bug lifecycle\n\"Why does #321 reopen?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h11-a1", "x": 230, "y": 635, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h11-cmd1", "x": 280, "y": 615, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues 321", "fontSize": 14 } },
{ "type": "arrow", "id": "h11-a2", "x": 400, "y": 635, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h11-cmd2", "x": 430, "y": 615, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "timeline ???", "fontSize": 14 } },
{ "type": "arrow", "id": "h11-a3", "x": 560, "y": 635, "width": 300, "height": 0,
"points": [[0,0],[300,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h11-gap", "x": 860, "y": 610, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No entity\ntimeline", "fontSize": 14 } },
{ "type": "rectangle", "id": "h14-trigger", "x": 30, "y": 680, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H14: Prior art?\n\"Was this tried before?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h14-a1", "x": 230, "y": 705, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h14-cmd1", "x": 280, "y": 685, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search \"memory leak\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h14-a2", "x": 450, "y": 705, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h14-cmd2", "x": 480, "y": 685, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "mrs --closed?", "fontSize": 14 } },
{ "type": "arrow", "id": "h14-a3", "x": 600, "y": 705, "width": 260, "height": 0,
"points": [[0,0],[260,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h14-gap", "x": 860, "y": 680, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No --state\non search", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-people", "x": 20, "y": 780, "width": 960, "height": 190,
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-people-label", "x": 30, "y": 785, "text": "People & Expertise", "fontSize": 14, "strokeColor": "#7048e8" },
{ "type": "rectangle", "id": "h4-trigger", "x": 30, "y": 810, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H4: Review prep\n\"Context for MR !789\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a1", "x": 230, "y": 835, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-cmd1", "x": 280, "y": 815, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs 789", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a2", "x": 380, "y": 835, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-cmd2", "x": 410, "y": 815, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who src/auth/", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a3", "x": 530, "y": 835, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-cmd3", "x": 560, "y": 815, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "search \"auth\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h4-a4", "x": 690, "y": 835, "width": 170, "height": 0,
"points": [[0,0],[170,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h4-gap", "x": 860, "y": 810, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No MR file\nlist output", "fontSize": 14 } },
{ "type": "rectangle", "id": "h6-trigger", "x": 30, "y": 880, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "H6: Find reviewer\n\"Who should review?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a1", "x": 230, "y": 905, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-cmd1", "x": 280, "y": 885, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who src/auth/", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a2", "x": 410, "y": 905, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-cmd2", "x": 440, "y": 885, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who src/pay/", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a3", "x": 580, "y": 905, "width": 30, "height": 0,
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-cmd3", "x": 610, "y": 885, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "who @candidate", "fontSize": 14 } },
{ "type": "arrow", "id": "h6-a4", "x": 750, "y": 905, "width": 110, "height": 0,
"points": [[0,0],[110,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "h6-gap", "x": 860, "y": 880, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No multi-\npath query", "fontSize": 14 } },
{ "type": "text", "id": "callout-1", "x": 30, "y": 990, "text": "Pattern: Most human flows require 3-5 serial commands. Gap rate: 73% of flows have at least one gap.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "callout-2", "x": 30, "y": 1015, "text": "Top optimization: Composite commands (activity feed, team workload) would reduce multi-command flows by ~40%.", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "callout-3", "x": 30, "y": 1040, "text": "Top missing data: MR file changes and entity references are stored but invisible to CLI users.", "fontSize": 14, "strokeColor": "#ef4444" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown (274 KiB).

@@ -0,0 +1,204 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 320, "y": 15, "text": "AI Agent Flow Map", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 180, "y": 53, "text": "15 agent automation workflows. Agents need structured JSON (-J), exit codes, and field selection.", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "text", "id": "col-trigger", "x": 60, "y": 80, "text": "TRIGGER (Agent Goal)", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-flow", "x": 400, "y": 80, "text": "COMMAND PIPELINE", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-gap", "x": 880, "y": 80, "text": "BLOCKED BY", "fontSize": 16, "strokeColor": "#ef4444" },
{ "type": "rectangle", "id": "zone-context", "x": 20, "y": 110, "width": 960, "height": 200,
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-context-label", "x": 30, "y": 115, "text": "Context Gathering (pre-action)", "fontSize": 14, "strokeColor": "#7048e8" },
{ "type": "rectangle", "id": "a1-trigger", "x": 30, "y": 140, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A1: Pre-edit context\nAbout to modify files", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a1", "x": 230, "y": 165, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd1", "x": 280, "y": 145, "width": 80, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J health", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a2", "x": 360, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd2", "x": 380, "y": 145, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who src/auth/", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a3", "x": 520, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd3", "x": 540, "y": 145, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search \"auth\" -n 10", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a4", "x": 710, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd4", "x": 730, "y": 145, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who --overlap", "fontSize": 14 } },
{ "type": "rectangle", "id": "a6-trigger", "x": 30, "y": 210, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A6: Auto-assign reviewers\nBased on file expertise", "fontSize": 14 } },
{ "type": "arrow", "id": "a6-a1", "x": 230, "y": 235, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a6-cmd1", "x": 280, "y": 215, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "-J mrs 456", "fontSize": 14 } },
{ "type": "text", "id": "a6-block", "x": 390, "y": 218, "text": "file list not\nin response!", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "arrow", "id": "a6-a2", "x": 380, "y": 245, "width": 480, "height": -10,
"points": [[0,0],[480,-10]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a6-gap", "x": 860, "y": 210, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "MR files\nnot exposed", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-report", "x": 20, "y": 320, "width": 960, "height": 200,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-report-label", "x": 30, "y": 325, "text": "Reporting & Synthesis", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "a3-trigger", "x": 30, "y": 350, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A3: Sprint status report\n7 queries for 1 report", "fontSize": 14 } },
{ "type": "arrow", "id": "a3-a1", "x": 230, "y": 375, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a3-cmd1", "x": 280, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues -s closed", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd2", "x": 390, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues --status", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd3", "x": 500, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs -s merged", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd4", "x": 610, "y": 352, "width": 80, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs -s open", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd5", "x": 700, "y": 352, "width": 80, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "count x2", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd6", "x": 790, "y": 352, "width": 60, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who", "fontSize": 12 } },
{ "type": "arrow", "id": "a3-agap", "x": 850, "y": 370, "width": 10, "height": 0,
"points": [[0,0],[10,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a3-gap", "x": 860, "y": 350, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No summary\ncommand", "fontSize": 14 } },
{ "type": "text", "id": "a3-note", "x": 280, "y": 395, "text": "7 sequential API calls for one report. A `lore summary` command could reduce this to 1.", "fontSize": 12, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "a7-trigger", "x": 30, "y": 430, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A7: Incident timeline\nPostmortem reconstruction", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a1", "x": 230, "y": 455, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd1", "x": 280, "y": 435, "width": 190, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a2", "x": 470, "y": 455, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd2", "x": 490, "y": 435, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search --since 3d", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a3", "x": 660, "y": 455, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd3", "x": 680, "y": 435, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J mrs -s merged", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-discover", "x": 20, "y": 530, "width": 960, "height": 200,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-discover-label", "x": 30, "y": 535, "text": "Discovery & Correlation", "fontSize": 14, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "a5-trigger", "x": 30, "y": 560, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A5: MR description\nFind related issues to link", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a1", "x": 230, "y": 585, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a5-cmd1", "x": 280, "y": 565, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search keywords", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a2", "x": 450, "y": 585, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a5-cmd2", "x": 470, "y": 565, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues --fields iid,url", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a3", "x": 650, "y": 585, "width": 210, "height": 0,
"points": [[0,0],[210,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a5-gap", "x": 860, "y": 560, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs\nquery", "fontSize": 14 } },
{ "type": "text", "id": "a5-note", "x": 280, "y": 612, "text": "Agent can't ask \"which issues does MR !456 close?\" -- entity_references data exists but isn't queryable.", "fontSize": 12, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "a11-trigger", "x": 30, "y": 640, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A11: Knowledge graph\nMap entity relationships", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a1", "x": 230, "y": 665, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a11-cmd1", "x": 280, "y": 645, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search -n 30", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a2", "x": 420, "y": 665, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a11-cmd2", "x": 440, "y": 645, "width": 190, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a3", "x": 630, "y": 665, "width": 230, "height": 0,
"points": [[0,0],[230,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a11-gap", "x": 860, "y": 640, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs\nquery", "fontSize": 14 } },
{ "type": "rectangle", "id": "zone-maint", "x": 20, "y": 740, "width": 960, "height": 160,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-maint-label", "x": 30, "y": 745, "text": "Maintenance & Cleanup", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "a9-trigger", "x": 30, "y": 770, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A9: Stale issue cleanup\nWeekly backlog hygiene", "fontSize": 14 } },
{ "type": "arrow", "id": "a9-a1", "x": 230, "y": 795, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a9-cmd1", "x": 280, "y": 775, "width": 200, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues --sort updated --asc", "fontSize": 12 } },
{ "type": "arrow", "id": "a9-a2", "x": 480, "y": 795, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a9-cmd2", "x": 500, "y": 775, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "filter client-side", "fontSize": 14 } },
{ "type": "arrow", "id": "a9-a3", "x": 620, "y": 795, "width": 240, "height": 0,
"points": [[0,0],[240,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a9-gap", "x": 860, "y": 770, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No --before\nNo offset", "fontSize": 14 } },
{ "type": "rectangle", "id": "a15-trigger", "x": 30, "y": 840, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A15: Conflict detect\n\"Safe to start work?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a1", "x": 230, "y": 865, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a15-cmd1", "x": 280, "y": 845, "width": 110, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues 123", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a2", "x": 390, "y": 865, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a15-cmd2", "x": 410, "y": 845, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who --overlap", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a3", "x": 540, "y": 865, "width": 320, "height": 0,
"points": [[0,0],[320,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a15-gap", "x": 860, "y": 840, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs +\n--state", "fontSize": 14 } },
{ "type": "text", "id": "callout-1", "x": 30, "y": 910, "text": "Agent-specific pain: Agents always use -J and --fields minimal for token efficiency. Every extra query burns tokens.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "callout-2", "x": 30, "y": 935, "text": "Biggest ROI: `lore refs` command would unblock A5, A11, A12, A15 instantly. Data already exists in entity_references table.", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "callout-3", "x": 30, "y": 960, "text": "Token waste: Sprint report (A3) requires 7 calls. A composite `lore summary` could save ~85% of tokens.", "fontSize": 14, "strokeColor": "#ef4444" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown (269 KiB).

@@ -0,0 +1,203 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 280, "y": 15, "text": "Command Coverage Heatmap", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 220, "y": 53, "text": "Which commands serve which workflows? Darker = more essential to that flow.", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "text", "id": "col-issues", "x": 260, "y": 85, "text": "issues", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-mrs", "x": 330, "y": 85, "text": "mrs", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-search", "x": 390, "y": 85, "text": "search", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-who", "x": 465, "y": 85, "text": "who", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-timeline", "x": 520, "y": 85, "text": "timeline", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-sync", "x": 600, "y": 85, "text": "sync", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-count", "x": 660, "y": 85, "text": "count", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-status", "x": 720, "y": 85, "text": "status", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-missing", "x": 790, "y": 85, "text": "MISSING?", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "text", "id": "grp-human", "x": 15, "y": 108, "text": "HUMAN FLOWS", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "h1-label", "x": 15, "y": 135, "text": "H1 Standup prep", "fontSize": 14 },
{ "type": "rectangle", "id": "h1-issues", "x": 255, "y": 130, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-mrs", "x": 325, "y": 130, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-who", "x": 460, "y": 130, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-sync", "x": 595, "y": 130, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h1-gap", "x": 780, "y": 135, "text": "activity feed", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h2-label", "x": 15, "y": 170, "text": "H2 Sprint planning", "fontSize": 14 },
{ "type": "rectangle", "id": "h2-issues", "x": 255, "y": 165, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h2-count", "x": 655, "y": 165, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h2-gap", "x": 780, "y": 170, "text": "--no-assignee", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h3-label", "x": 15, "y": 205, "text": "H3 Incident response", "fontSize": 14 },
{ "type": "rectangle", "id": "h3-mrs", "x": 325, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-search", "x": 390, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-who", "x": 460, "y": 200, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-timeline", "x": 525, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-sync", "x": 595, "y": 200, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h4-label", "x": 15, "y": 240, "text": "H4 Code review prep", "fontSize": 14 },
{ "type": "rectangle", "id": "h4-mrs", "x": 325, "y": 235, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-search", "x": 390, "y": 235, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-who", "x": 460, "y": 235, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-timeline", "x": 525, "y": 235, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h4-gap", "x": 780, "y": 240, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h5-label", "x": 15, "y": 275, "text": "H5 Onboarding", "fontSize": 14 },
{ "type": "rectangle", "id": "h5-issues", "x": 255, "y": 270, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-mrs", "x": 325, "y": 270, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-search", "x": 390, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-who", "x": 460, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-timeline", "x": 525, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h6-label", "x": 15, "y": 310, "text": "H6 Find reviewer", "fontSize": 14 },
{ "type": "rectangle", "id": "h6-who", "x": 460, "y": 305, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h6-gap", "x": 780, "y": 310, "text": "multi-path who", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h7-label", "x": 15, "y": 345, "text": "H7 Why was this built?", "fontSize": 14 },
{ "type": "rectangle", "id": "h7-issues", "x": 255, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-mrs", "x": 325, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-search", "x": 390, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-timeline", "x": 525, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h7-gap", "x": 780, "y": 345, "text": "per-note search", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h8-label", "x": 15, "y": 380, "text": "H8 Team workload", "fontSize": 14 },
{ "type": "rectangle", "id": "h8-who", "x": 460, "y": 375, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h8-gap", "x": 780, "y": 380, "text": "team view", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h9-label", "x": 15, "y": 415, "text": "H9 Release notes", "fontSize": 14 },
{ "type": "rectangle", "id": "h9-issues", "x": 255, "y": 410, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h9-mrs", "x": 325, "y": 410, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h9-gap", "x": 780, "y": 415, "text": "mrs --milestone", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h10-label", "x": 15, "y": 450, "text": "H10 Stale issues", "fontSize": 14 },
{ "type": "rectangle", "id": "h10-issues", "x": 255, "y": 445, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h10-gap", "x": 780, "y": 450, "text": "--updated-before", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h11-label", "x": 15, "y": 485, "text": "H11 Bug lifecycle", "fontSize": 14 },
{ "type": "rectangle", "id": "h11-issues", "x": 255, "y": 480, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h11-timeline", "x": 525, "y": 480, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "text", "id": "h11-gap", "x": 780, "y": 485, "text": "entity timeline", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h12-label", "x": 15, "y": 520, "text": "H12 Who broke tests?", "fontSize": 14 },
{ "type": "rectangle", "id": "h12-search", "x": 390, "y": 515, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h12-who", "x": 460, "y": 515, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h13-label", "x": 15, "y": 555, "text": "H13 Feature tracking", "fontSize": 14 },
{ "type": "rectangle", "id": "h13-issues", "x": 255, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h13-mrs", "x": 325, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h13-timeline", "x": 525, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h14-label", "x": 15, "y": 590, "text": "H14 Prior art check", "fontSize": 14 },
{ "type": "rectangle", "id": "h14-search", "x": 390, "y": 585, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h14-timeline", "x": 525, "y": 585, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h14-gap", "x": 780, "y": 590, "text": "--state on search", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "h15-label", "x": 15, "y": 625, "text": "H15 My discussions", "fontSize": 14 },
{ "type": "rectangle", "id": "h15-who", "x": 460, "y": 620, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "text", "id": "h15-gap", "x": 780, "y": 625, "text": "participant filter", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "rectangle", "id": "divider", "x": 10, "y": 655, "width": 910, "height": 2, "backgroundColor": "#dee2e6", "fillStyle": "solid" },
{ "type": "text", "id": "grp-agent", "x": 15, "y": 668, "text": "AI AGENT FLOWS", "fontSize": 14, "strokeColor": "#7048e8" },
{ "type": "text", "id": "a1-label", "x": 15, "y": 695, "text": "A1 Pre-edit context", "fontSize": 14 },
{ "type": "rectangle", "id": "a1-mrs", "x": 325, "y": 690, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a1-search", "x": 390, "y": 690, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a1-who", "x": 460, "y": 690, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a2-label", "x": 15, "y": 730, "text": "A2 Auto-triage", "fontSize": 14 },
{ "type": "rectangle", "id": "a2-issues", "x": 255, "y": 725, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a2-search", "x": 390, "y": 725, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a2-who", "x": 460, "y": 725, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a2-gap", "x": 780, "y": 730, "text": "detail --fields", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a3-label", "x": 15, "y": 765, "text": "A3 Sprint report", "fontSize": 14 },
{ "type": "rectangle", "id": "a3-issues", "x": 255, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a3-mrs", "x": 325, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a3-who", "x": 460, "y": 760, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a3-count", "x": 655, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a3-gap", "x": 780, "y": 765, "text": "summary cmd", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a4-label", "x": 15, "y": 800, "text": "A4 Prior art", "fontSize": 14 },
{ "type": "rectangle", "id": "a4-search", "x": 390, "y": 795, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a4-timeline", "x": 525, "y": 795, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a4-gap", "x": 780, "y": 800, "text": "per-note search", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a5-label", "x": 15, "y": 835, "text": "A5 PR description", "fontSize": 14 },
{ "type": "rectangle", "id": "a5-issues", "x": 255, "y": 830, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a5-search", "x": 390, "y": 830, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a5-gap", "x": 780, "y": 835, "text": "entity refs query", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a6-label", "x": 15, "y": 870, "text": "A6 Reviewer assign", "fontSize": 14 },
{ "type": "rectangle", "id": "a6-mrs", "x": 325, "y": 865, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a6-who", "x": 460, "y": 865, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a6-gap", "x": 780, "y": 870, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a7-label", "x": 15, "y": 905, "text": "A7 Incident timeline", "fontSize": 14 },
{ "type": "rectangle", "id": "a7-mrs", "x": 325, "y": 900, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a7-search", "x": 390, "y": 900, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a7-timeline", "x": 525, "y": 900, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a8-label", "x": 15, "y": 940, "text": "A8 Cross-project", "fontSize": 14 },
{ "type": "rectangle", "id": "a8-search", "x": 390, "y": 935, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a8-timeline", "x": 525, "y": 935, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a8-gap", "x": 780, "y": 940, "text": "group by project", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a9-label", "x": 15, "y": 975, "text": "A9 Stale cleanup", "fontSize": 14 },
{ "type": "rectangle", "id": "a9-issues", "x": 255, "y": 970, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a9-search", "x": 390, "y": 970, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a9-gap", "x": 780, "y": 975, "text": "--updated-before", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a10-label", "x": 15, "y": 1010, "text": "A10 Review context", "fontSize": 14 },
{ "type": "rectangle", "id": "a10-mrs", "x": 325, "y": 1005, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a10-who", "x": 460, "y": 1005, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a10-gap", "x": 780, "y": 1010, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a11-label", "x": 15, "y": 1045, "text": "A11 Knowledge graph", "fontSize": 14 },
{ "type": "rectangle", "id": "a11-search", "x": 390, "y": 1040, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a11-timeline", "x": 525, "y": 1040, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a11-gap", "x": 780, "y": 1045, "text": "entity refs query", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a12-label", "x": 15, "y": 1080, "text": "A12 Release check", "fontSize": 14 },
{ "type": "rectangle", "id": "a12-issues", "x": 255, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a12-mrs", "x": 325, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a12-who", "x": 460, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a12-gap", "x": 780, "y": 1080, "text": "mrs --milestone", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a13-label", "x": 15, "y": 1115, "text": "A13 What changed?", "fontSize": 14 },
{ "type": "rectangle", "id": "a13-issues", "x": 255, "y": 1110, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a13-mrs", "x": 325, "y": 1110, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a13-gap", "x": 780, "y": 1115, "text": "state-change filter", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a14-label", "x": 15, "y": 1150, "text": "A14 Meeting prep", "fontSize": 14 },
{ "type": "rectangle", "id": "a14-issues", "x": 255, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a14-mrs", "x": 325, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a14-who", "x": 460, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a14-count", "x": 655, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "a14-gap", "x": 780, "y": 1150, "text": "summary cmd", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "a15-label", "x": 15, "y": 1185, "text": "A15 Conflict detect", "fontSize": 14 },
{ "type": "rectangle", "id": "a15-issues", "x": 255, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a15-mrs", "x": 325, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "a15-who", "x": 460, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "a15-gap", "x": 780, "y": 1185, "text": "entity refs, --state", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "text", "id": "legend-title", "x": 15, "y": 1230, "text": "Legend:", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-essential", "x": 80, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "leg-essential-t", "x": 105, "y": 1230, "text": "Essential", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-supporting", "x": 190, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "leg-supporting-t", "x": 215, "y": 1230, "text": "Supporting", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-partial", "x": 310, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "text", "id": "leg-partial-t", "x": 335, "y": 1230, "text": "Partially blocked", "fontSize": 14 },
{ "type": "text", "id": "leg-gap-t", "x": 470, "y": 1230, "text": "Red text = gap", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "text", "id": "insight-1", "x": 15, "y": 1270, "text": "Key insight: `issues` and `search` are the workhorses (used in 20+ flows).", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "insight-2", "x": 15, "y": 1295, "text": "`who` is critical for people questions but siloed from file-change data.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "insight-3", "x": 15, "y": 1320, "text": "`timeline` is powerful but keyword-only seeding limits entity-specific queries.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "insight-4", "x": 15, "y": 1345, "text": "22/30 flows have at least one gap. Most gaps are filter additions, not new commands.", "fontSize": 14, "strokeColor": "#ef4444" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}
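The legend and gap labels in the matrix above can be tallied mechanically. The following is a minimal sketch, not part of the commit: it assumes the element ids follow the `<flow>-<command>` and `<flow>-gap` pattern visible in the diagram, and the names `LEGEND`, `SAMPLE`, and `summarize` are hypothetical. The sample holds only a few elements copied from the file.

```python
import json
import re
from collections import Counter

# Fill colors taken from the diagram's legend.
LEGEND = {
    "#22c55e": "essential",
    "#b2f2bb": "supporting",
    "#ffd8a8": "partially blocked",
}

# Flow ids look like "h4" or "a12"; this filter skips legend and divider shapes.
FLOW_ID = re.compile(r"[ha]\d+")

# A handful of elements copied from the diagram (the real file has many more).
SAMPLE = json.loads("""
[
  {"type": "rectangle", "id": "h4-mrs", "backgroundColor": "#22c55e"},
  {"type": "rectangle", "id": "h4-search", "backgroundColor": "#b2f2bb"},
  {"type": "text", "id": "h4-gap", "text": "MR file list"},
  {"type": "rectangle", "id": "h5-search", "backgroundColor": "#22c55e"},
  {"type": "rectangle", "id": "a6-mrs", "backgroundColor": "#ffd8a8"},
  {"type": "text", "id": "a6-gap", "text": "MR file list"},
  {"type": "rectangle", "id": "leg-essential", "backgroundColor": "#22c55e"}
]
""")


def summarize(elements):
    """Count cells per (command, role) and collect the gap label of each flow."""
    usage = Counter()
    gaps = {}
    for el in elements:
        flow, _, suffix = el.get("id", "").partition("-")
        if not FLOW_ID.fullmatch(flow):
            continue  # not a flow row (title, legend, divider, insight text)
        if el.get("type") == "rectangle" and el.get("backgroundColor") in LEGEND:
            usage[(suffix, LEGEND[el["backgroundColor"]])] += 1
        elif el.get("type") == "text" and suffix == "gap":
            gaps[flow] = el["text"]
    return usage, gaps


usage, gaps = summarize(SAMPLE)
```

Running `summarize` over the full `elements` array would be one way to re-check the counts quoted in the insight lines, provided the id pattern assumed here holds for every row.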

Binary file not shown.

@@ -0,0 +1,110 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 300, "y": 20, "text": "Lore CLI Gap Priority Matrix", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 310, "y": 58, "text": "20 identified gaps plotted by impact vs effort", "fontSize": 16, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "q1-zone", "x": 100, "y": 120, "width": 500, "height": 380,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q1-label", "x": 110, "y": 126, "text": "QUICK WINS", "fontSize": 18, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "q2-zone", "x": 620, "y": 120, "width": 500, "height": 380,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q2-label", "x": 630, "y": 126, "text": "STRATEGIC", "fontSize": 18, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "q3-zone", "x": 100, "y": 520, "width": 500, "height": 300,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q3-label", "x": 110, "y": 526, "text": "FILL-IN", "fontSize": 18, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "q4-zone", "x": 620, "y": 520, "width": 500, "height": 300,
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#ef4444", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "q4-label", "x": 630, "y": 526, "text": "DEPRIORITIZE", "fontSize": 18, "strokeColor": "#c92a2a" },
{ "type": "text", "id": "y-axis-hi", "x": 30, "y": 130, "text": "HIGH\nIMPACT", "fontSize": 16, "strokeColor": "#495057", "textAlign": "center" },
{ "type": "text", "id": "y-axis-lo", "x": 30, "y": 550, "text": "LOW\nIMPACT", "fontSize": 16, "strokeColor": "#495057", "textAlign": "center" },
{ "type": "text", "id": "x-axis-lo", "x": 280, "y": 840, "text": "LOW EFFORT", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "x-axis-hi", "x": 800, "y": 840, "text": "HIGH EFFORT", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "arrow", "id": "y-arrow", "x": 85, "y": 810, "width": 0, "height": -680,
"points": [[0,0],[0,-680]], "endArrowhead": "arrow", "strokeColor": "#495057", "strokeWidth": 1 },
{ "type": "arrow", "id": "x-arrow", "x": 85, "y": 810, "width": 1050, "height": 0,
"points": [[0,0],[1050,0]], "endArrowhead": "arrow", "strokeColor": "#495057", "strokeWidth": 1 },
{ "type": "rectangle", "id": "g5", "x": 120, "y": 160, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#5 @me alias", "fontSize": 16 } },
{ "type": "rectangle", "id": "g8", "x": 120, "y": 225, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#8 --state on search", "fontSize": 16 } },
{ "type": "rectangle", "id": "g9", "x": 120, "y": 290, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#9 mrs --milestone", "fontSize": 16 } },
{ "type": "rectangle", "id": "g10", "x": 120, "y": 355, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#10 --no-assignee", "fontSize": 16 } },
{ "type": "rectangle", "id": "g11", "x": 350, "y": 160, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#11 --updated-before", "fontSize": 16 } },
{ "type": "rectangle", "id": "g14", "x": 350, "y": 225, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#14 detail --fields", "fontSize": 16 } },
{ "type": "rectangle", "id": "g18", "x": 350, "y": 290, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#18 1y/12m duration", "fontSize": 16 } },
{ "type": "rectangle", "id": "g20", "x": 350, "y": 355, "width": 230, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "#20 sort by due date", "fontSize": 16 } },
{ "type": "rectangle", "id": "g1", "x": 640, "y": 160, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#1 MR file changes", "fontSize": 16 } },
{ "type": "rectangle", "id": "g2", "x": 640, "y": 225, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#2 entity refs query", "fontSize": 16 } },
{ "type": "rectangle", "id": "g3", "x": 640, "y": 290, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#3 per-note search", "fontSize": 16 } },
{ "type": "rectangle", "id": "g4", "x": 880, "y": 160, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#4 entity timeline", "fontSize": 16 } },
{ "type": "rectangle", "id": "g6", "x": 880, "y": 225, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#6 activity feed", "fontSize": 16 } },
{ "type": "rectangle", "id": "g12", "x": 880, "y": 290, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "#12 team workload", "fontSize": 16 } },
{ "type": "rectangle", "id": "g13", "x": 120, "y": 570, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "#13 pagination/offset", "fontSize": 16 } },
{ "type": "rectangle", "id": "g15", "x": 120, "y": 635, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "#15 group by project", "fontSize": 16 } },
{ "type": "rectangle", "id": "g19", "x": 120, "y": 700, "width": 210, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "#19 participant filter", "fontSize": 16 } },
{ "type": "rectangle", "id": "g7", "x": 640, "y": 570, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
"label": { "text": "#7 multi-path who", "fontSize": 16 } },
{ "type": "rectangle", "id": "g16", "x": 640, "y": 635, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
"label": { "text": "#16 trend metrics", "fontSize": 16 } },
{ "type": "rectangle", "id": "g17", "x": 640, "y": 700, "width": 220, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
"label": { "text": "#17 --for-issue on mrs", "fontSize": 16 } },
{ "type": "text", "id": "q1-count", "x": 180, "y": 430, "text": "8 gaps - lowest hanging fruit", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "q2-count", "x": 710, "y": 370, "text": "6 gaps - build deliberately", "fontSize": 14, "strokeColor": "#b45309" },
{ "type": "text", "id": "q3-count", "x": 160, "y": 770, "text": "3 gaps - fill as needed", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "q4-count", "x": 680, "y": 770, "text": "3 gaps - defer or rethink", "fontSize": 14, "strokeColor": "#c92a2a" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}
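The quadrant a gap belongs to follows from where its box sits relative to the four zone rectangles. Below is a small sketch of that check, not part of the commit: the coordinates are copied from the diagram, while `ZONES`, `BOXES`, and `quadrant_of` are hypothetical names, and testing the box center is an assumption about how membership should be decided.

```python
# Zone rectangles as (x, y, width, height), copied from the q1..q4 elements.
ZONES = {
    "QUICK WINS": (100, 120, 500, 380),
    "STRATEGIC": (620, 120, 500, 380),
    "FILL-IN": (100, 520, 500, 300),
    "DEPRIORITIZE": (620, 520, 500, 300),
}

# A few gap boxes as (x, y, width, height), copied from the g* elements.
BOXES = {
    "#5 @me alias": (120, 160, 210, 50),
    "#11 --updated-before": (350, 160, 230, 50),
    "#1 MR file changes": (640, 160, 220, 50),
    "#13 pagination/offset": (120, 570, 210, 50),
    "#7 multi-path who": (640, 570, 220, 50),
}


def quadrant_of(box):
    """Return the name of the zone that contains the center of the box, or None."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    for name, (zx, zy, zw, zh) in ZONES.items():
        if zx <= cx <= zx + zw and zy <= cy <= zy + zh:
            return name
    return None


placement = {label: quadrant_of(box) for label, box in BOXES.items()}
```

Applied to all twenty `g*` boxes, the same check could confirm the per-quadrant totals written in the `q1-count` through `q4-count` labels.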

Binary file not shown.

@@ -0,0 +1,184 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 350, "y": 15, "text": "Lore Data Flow Architecture", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 280, "y": 53, "text": "Green = queryable via CLI | Red = stored but hidden | Gray = internal", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "zone-gitlab", "x": 30, "y": 90, "width": 200, "height": 300,
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 30 },
{ "type": "text", "id": "zone-gitlab-label", "x": 55, "y": 96, "text": "GitLab APIs", "fontSize": 16, "strokeColor": "#7048e8" },
{ "type": "rectangle", "id": "rest-api", "x": 50, "y": 130, "width": 160, "height": 60,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "REST API\n(paginated)", "fontSize": 16 } },
{ "type": "rectangle", "id": "graphql-api", "x": 50, "y": 210, "width": 160, "height": 60,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "GraphQL API\n(adaptive pages)", "fontSize": 16 } },
{ "type": "rectangle", "id": "ollama-api", "x": 50, "y": 310, "width": 160, "height": 60,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "Ollama\n(embeddings)", "fontSize": 16 } },
{ "type": "rectangle", "id": "zone-ingest", "x": 270, "y": 90, "width": 180, "height": 300,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 30 },
{ "type": "text", "id": "zone-ingest-label", "x": 300, "y": 96, "text": "Ingestion", "fontSize": 16, "strokeColor": "#1971c2" },
{ "type": "rectangle", "id": "ingest-issues", "x": 285, "y": 130, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "Issue Sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "ingest-mrs", "x": 285, "y": 195, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "MR Sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "ingest-disc", "x": 285, "y": 260, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "Discussion Sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "ingest-events", "x": 285, "y": 325, "width": 150, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
"label": { "text": "Event Sync", "fontSize": 16 } },
{ "type": "arrow", "id": "a-rest-issues", "x": 210, "y": 155, "width": 75, "height": 0,
"points": [[0,0],[75,0]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "arrow", "id": "a-rest-mrs", "x": 210, "y": 165, "width": 75, "height": 50,
"points": [[0,0],[75,50]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "arrow", "id": "a-graphql-issues", "x": 210, "y": 240, "width": 75, "height": -80,
"points": [[0,0],[75,-80]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "rectangle", "id": "zone-sqlite", "x": 490, "y": 90, "width": 400, "height": 650,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-sqlite-label", "x": 570, "y": 96, "text": "SQLite (WAL mode)", "fontSize": 16, "strokeColor": "#15803d" },
{ "type": "text", "id": "grp-queryable", "x": 500, "y": 120, "text": "Queryable Tables", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "rectangle", "id": "t-projects", "x": 500, "y": 145, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "projects", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-issues", "x": 500, "y": 195, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues + assignees", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-mrs", "x": 500, "y": 245, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "merge_requests", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-discussions", "x": 500, "y": 295, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "discussions + notes", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-events", "x": 500, "y": 345, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "resource_*_events", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-docs", "x": 500, "y": 395, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "documents + FTS5", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-embed", "x": 500, "y": 445, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "embeddings (vec)", "fontSize": 14 } },
{ "type": "text", "id": "grp-hidden", "x": 700, "y": 120, "text": "Hidden Tables", "fontSize": 14, "strokeColor": "#c92a2a" },
{ "type": "rectangle", "id": "t-file-changes", "x": 695, "y": 145, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "mr_file_changes", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-entity-refs", "x": 695, "y": 195, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "entity_references", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-raw", "x": 695, "y": 245, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "raw_payloads", "fontSize": 14 } },
{ "type": "text", "id": "grp-internal", "x": 700, "y": 310, "text": "Internal Only", "fontSize": 14, "strokeColor": "#868e96" },
{ "type": "rectangle", "id": "t-sync", "x": 695, "y": 340, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
"label": { "text": "sync_runs + cursors", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-dirty", "x": 695, "y": 390, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
"label": { "text": "dirty_sources", "fontSize": 14 } },
{ "type": "rectangle", "id": "t-locks", "x": 695, "y": 440, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
"label": { "text": "app_locks", "fontSize": 14 } },
{ "type": "arrow", "id": "a-ingest-tables", "x": 435, "y": 200, "width": 55, "height": 0,
"points": [[0,0],[55,0]], "endArrowhead": "arrow", "strokeColor": "#495057" },
{ "type": "rectangle", "id": "zone-cli", "x": 930, "y": 90, "width": 250, "height": 650,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 25 },
{ "type": "text", "id": "zone-cli-label", "x": 990, "y": 96, "text": "CLI Commands", "fontSize": 16, "strokeColor": "#b45309" },
{ "type": "rectangle", "id": "cmd-issues", "x": 950, "y": 130, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore issues", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-mrs", "x": 950, "y": 185, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore mrs", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-search", "x": 950, "y": 240, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore search", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-who", "x": 950, "y": 295, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore who", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-timeline", "x": 950, "y": 350, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore timeline", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-count", "x": 950, "y": 405, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore count", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-sync", "x": 950, "y": 460, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore sync", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-status", "x": 950, "y": 515, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
"label": { "text": "lore status", "fontSize": 16 } },
{ "type": "arrow", "id": "a-issues-cmd", "x": 670, "y": 215, "width": 270, "height": -65,
"points": [[0,0],[270,-65]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-mrs-cmd", "x": 670, "y": 265, "width": 270, "height": -60,
"points": [[0,0],[270,-60]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-docs-cmd", "x": 670, "y": 415, "width": 270, "height": -155,
"points": [[0,0],[270,-155]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-embed-cmd", "x": 670, "y": 465, "width": 270, "height": -200,
"points": [[0,0],[270,-200]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "arrow", "id": "a-events-cmd", "x": 670, "y": 365, "width": 270, "height": 5,
"points": [[0,0],[270,5]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
{ "type": "text", "id": "hidden-note-1", "x": 695, "y": 498, "text": "mr_file_changes: populated by\nMR sync but NOT queryable.\nBlocks H4, A6, A10 flows.", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "text", "id": "hidden-note-2", "x": 695, "y": 568, "text": "entity_references: used by\ntimeline internally but NOT\nqueryable. Blocks A5, A11.", "fontSize": 14, "strokeColor": "#ef4444" },
{ "type": "arrow", "id": "a-hidden-who", "x": 875, "y": 165, "width": 65, "height": 148,
"points": [[0,0],[65,148]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeWidth": 2,
"strokeStyle": "dashed" },
{ "type": "text", "id": "hidden-who-label", "x": 880, "y": 240, "text": "who uses\nDiffNotes,\nnot file\nchanges", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "arrow", "id": "a-hidden-timeline", "x": 875, "y": 215, "width": 65, "height": 155,
"points": [[0,0],[65,155]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeWidth": 2,
"strokeStyle": "dashed" },
{ "type": "rectangle", "id": "cmd-missing-refs", "x": 950, "y": 580, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
"label": { "text": "lore refs (missing)", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-missing-files", "x": 950, "y": 635, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
"label": { "text": "lore files (missing)", "fontSize": 16 } },
{ "type": "rectangle", "id": "cmd-missing-activity", "x": 950, "y": 690, "width": 210, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
"label": { "text": "lore activity (missing)", "fontSize": 16 } },
{ "type": "text", "id": "legend-title", "x": 30, "y": 430, "text": "Legend", "fontSize": 16 },
{ "type": "rectangle", "id": "leg-green", "x": 30, "y": 460, "width": 20, "height": 20,
"backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "leg-green-t", "x": 60, "y": 462, "text": "Queryable via CLI", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-red", "x": 30, "y": 490, "width": 20, "height": 20,
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444" },
{ "type": "text", "id": "leg-red-t", "x": 60, "y": 492, "text": "Stored but hidden", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-gray", "x": 30, "y": 520, "width": 20, "height": 20,
"backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96" },
{ "type": "text", "id": "leg-gray-t", "x": 60, "y": 522, "text": "Internal bookkeeping", "fontSize": 14 },
{ "type": "rectangle", "id": "leg-dashed", "x": 30, "y": 550, "width": 20, "height": 20,
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "text", "id": "leg-dashed-t", "x": 60, "y": 552, "text": "Missing command", "fontSize": 14 }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}

Binary file not shown.


66
docs/ideas/README.md Normal file

@@ -0,0 +1,66 @@
# Gitlore Feature Ideas
Central registry of potential features. Each idea leverages data already ingested
into the local SQLite database (issues, MRs, discussions, notes, resource events,
entity references, embeddings, file changes).
## Priority Tiers
**Tier 1 — High confidence, low effort, immediate value:**
| # | Idea | File | Confidence |
|---|------|------|------------|
| 9 | Similar Issues Finder | [similar-issues.md](similar-issues.md) | 95% |
| 17 | "What Changed?" Digest | [digest.md](digest.md) | 93% |
| 5 | Who Knows About X? | [experts.md](experts.md) | 92% |
| -- | Multi-Project Ergonomics | [project-ergonomics.md](project-ergonomics.md) | 90% |
| 27 | Weekly Digest Generator | [weekly-digest.md](weekly-digest.md) | 90% |
| 4 | Stale Discussion Finder | [stale-discussions.md](stale-discussions.md) | 90% |
**Tier 2 — Strong ideas, moderate effort:**
| # | Idea | File | Confidence |
|---|------|------|------------|
| 19 | MR-to-Issue Closure Gap | [closure-gaps.md](closure-gaps.md) | 88% |
| 1 | Contributor Heatmap | [contributors.md](contributors.md) | 88% |
| 21 | Knowledge Silo Detection | [silos.md](silos.md) | 87% |
| 2 | Review Bottleneck Detector | [bottlenecks.md](bottlenecks.md) | 85% |
| 14 | File Hotspot Report | [hotspots.md](hotspots.md) | 85% |
| 26 | Unlinked MR Finder | [unlinked.md](unlinked.md) | 83% |
| 6 | Decision Archaeology | [decisions.md](decisions.md) | 82% |
| 18 | Label Hygiene Audit | [label-audit.md](label-audit.md) | 82% |
**Tier 3 — Promising, needs more design work:**
| # | Idea | File | Confidence |
|---|------|------|------------|
| 29 | Entity Relationship Explorer | [graph.md](graph.md) | 80% |
| 12 | Milestone Risk Report | [milestone-risk.md](milestone-risk.md) | 78% |
| 3 | Label Velocity | [label-flow.md](label-flow.md) | 78% |
| 25 | MR Pipeline Efficiency | [mr-pipeline.md](mr-pipeline.md) | 78% |
| 24 | Recurring Bug Patterns | [recurring-patterns.md](recurring-patterns.md) | 76% |
| 7 | Cross-Project Impact Graph | [impact-graph.md](impact-graph.md) | 75% |
| 28 | DiffNote Coverage Map | [review-coverage.md](review-coverage.md) | 75% |
| 16 | Idle Work Detector | [idle.md](idle.md) | 73% |
| 8 | MR Churn Analysis | [churn.md](churn.md) | 72% |
| 15 | Author Collaboration Network | [collaboration.md](collaboration.md) | 70% |
## Rejected Ideas (with reasons)
| # | Idea | Reason |
|---|------|--------|
| 10 | Sprint Burndown from Labels | Too opinionated about label semantics |
| 11 | Code Review Quality Score | Subjective "quality" scoring creates perverse incentives |
| 13 | Discussion Sentiment Drift | Unreliable heuristic sentiment on technical text |
| 20 | Response Time Leaderboard | Toxic "leaderboard" framing; metric folded into #2 |
| 22 | Timeline Diff | Niche use case; timeline already interleaves events |
| 23 | Discussion Thread Summarizer | Requires LLM inference; out of scope for local-first tool |
| 30 | NL Query Interface | Over-engineered; existing filters cover this |
## How to use this list
1. Pick an idea from Tier 1 or Tier 2
2. Read its detail file for implementation plan and SQL sketches
3. Create a bead (`br create`) referencing the idea file
4. Implement following TDD (test first, then minimal impl)
5. Update the idea file with `status: implemented` when done


@@ -0,0 +1,555 @@
# Project Manager System — Design Proposal
## The Problem
We have a growing backlog of ideas and issues in markdown files. Agents can ship
features in under an hour. The constraint isn't execution speed — it's knowing
WHAT to execute NEXT, in what ORDER, and detecting when the plan needs to change.
We need a system that:
1. Automatically scores and sequences work items
2. Detects when scope changes during spec generation
3. Tracks the full lifecycle: idea → spec → beads → shipped
4. Re-triages instantly when the dependency graph changes
5. Runs in seconds, not minutes
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ docs/ideas/*.md │
│ docs/issues/*.md │
│ (YAML frontmatter) │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ IDEA TRIAGE SKILL │
│ │
│ Phase 1: INGEST — parse all frontmatter │
│ Phase 2: VALIDATE — check refs, detect staleness │
│ Phase 3: EVALUATE — detect scope changes since last run │
│ Phase 4: SCORE — compute priority with unlock graph │
│ Phase 5: SEQUENCE — topological sort by dependency + score │
│ Phase 6: RECOMMEND — top 3 + unlock advisories + warnings │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ HUMAN DECIDES │
│ (picks from top 3, takes seconds) │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ SPEC GENERATION (Claude/GPT) │
│ Takes the idea doc, generates detailed implementation spec │
│ ALSO: re-evaluates frontmatter fields based on deeper │
│ understanding. Updates effort, blocked-by, components. │
│ This is the SCOPE CHANGE DETECTION point. │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ PLAN-TO-BEADS (existing skill) │
│ Spec → granular beads with dependencies via br CLI │
│ Links bead IDs back into the idea frontmatter │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ AGENT IMPLEMENTATION │
│ Works beads via br/bv workflow │
│ bv --robot-triage handles execution-phase prioritization │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ COMPLETION & RE-TRIAGE │
│ Beads close → idea status updates to implemented │
│ Skill re-runs → newly unblocked ideas surface │
│ Loop back to top │
└─────────────────────────────────────────────────────────────┘
```
## The Two Systems and Their Boundary
| Concern | Ideas System (new) | Beads System (existing) |
|---------|-------------------|------------------------|
| Phase | Pre-commitment (what to build) | Execution (how to build) |
| Data | docs/ideas/*.md, docs/issues/*.md | .beads/issues.jsonl |
| Triage | Idea triage skill | bv --robot-triage |
| Tracking | YAML frontmatter | JSONL records |
| Granularity | Feature-level | Task-level |
| Lifecycle | proposed → accepted → specced → promoted → implemented | open → in_progress → closed |
**The handoff point is promotion.** An idea becomes one or more beads. After that,
the ideas system only tracks the idea's status (promoted/implemented). Beads owns
execution.
An idea file is NEVER deleted. It's a permanent design record. Even after
implementation, it documents WHY the feature was built and what tradeoffs were made.
---
## Data Model
### Frontmatter Schema
```yaml
---
# ── Identity ──
id: idea-009 # stable unique identifier
title: Similar Issues Finder
type: idea # idea | issue
status: proposed # see lifecycle below
# ── Timestamps ──
created: 2026-02-09
updated: 2026-02-09
eval-hash: null # SHA of scoring fields at last triage run
# ── Scoring Inputs ──
impact: high # high | medium | low
effort: small # small | medium | large | xlarge
severity: null # critical | high | medium | low (issues only)
autonomy: full # full | needs-design | needs-human
# ── Dependency Graph ──
blocked-by: [] # IDs of ideas/issues that must complete first
unlocks: # IDs that become possible/better after this ships
- idea-recurring-patterns
requires: [] # external prerequisites (gate names)
related: # soft links, not blocking
- issue-001
# ── Implementation Context ──
components: # source code paths this will touch
- src/search/
- src/embedding/
command: lore similar # proposed CLI command (null for issues)
has-spec: false # detailed spec has been generated
spec-path: null # path to spec doc if it exists
beads: [] # bead IDs after promotion
# ── Classification ──
tags:
- embeddings
- search
---
```
### Status Lifecycle
```
IDEA lifecycle:
  proposed ──→ accepted ──→ specced ──→ promoted ──→ implemented
     │                         │
     └──→ rejected             └──→ (scope changed, back to accepted)
ISSUE lifecycle:
  open ──→ accepted ──→ specced ──→ promoted ──→ resolved
     └──→ wontfix
```
Transitions:
- `proposed → accepted`: Human confirms this is worth building
- `accepted → specced`: Detailed implementation spec has been generated
- `specced → promoted`: Beads created from the spec
- `promoted → implemented`: All beads closed
- Any → `rejected`/`wontfix`: Decided not to build (with reason in body)
- `specced → accepted`: Scope changed during spec, needs re-evaluation
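The idea transitions above can be written down as a lookup table, which is one way Phase 2 could check that a status change is legal. This is a minimal sketch, not part of the skill: the names `IDEA_TRANSITIONS` and `is_valid_transition` are hypothetical, and it assumes that `implemented` and `rejected` are terminal states.

```python
# Allowed idea status transitions, taken from the lifecycle above.
# Assumption: "implemented" and "rejected" are terminal.
IDEA_TRANSITIONS = {
    "proposed": {"accepted", "rejected"},
    "accepted": {"specced", "rejected"},
    # specced may fall back to accepted when scope changes during spec
    "specced": {"promoted", "accepted", "rejected"},
    "promoted": {"implemented", "rejected"},
    "implemented": set(),
    "rejected": set(),
}


def is_valid_transition(current: str, new: str) -> bool:
    """Return True if an idea may move from `current` to `new`."""
    return new in IDEA_TRANSITIONS.get(current, set())
```

The issue lifecycle would need its own table with `open`, `resolved`, and `wontfix`.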
### Effort Calibration (Agent-Executed)
| Level | Wall Clock | Autonomy | Example |
|-------|-----------|----------|---------|
| small | ~30 min | Agent ships end-to-end | stale-discussions, closure-gaps |
| medium | ~1 hour | Agent ships end-to-end | similar-issues, digest |
| large | 1-2 hours | May need one design decision | recurring-patterns, experts |
| xlarge | 2+ hours | Needs human architecture input | project groups |
### Gates Registry (docs/gates.yaml)
```yaml
gates:
  gate-1:
    title: Resource Events Ingestion
    status: complete
    completed: 2025-12-15
  gate-2:
    title: Cross-References & Entity Graph
    status: complete
    completed: 2026-01-10
  gate-3:
    title: Timeline Pipeline
    status: complete
    completed: 2026-01-25
  gate-4:
    title: MR File Changes Ingestion
    status: partial
    notes: Schema ready (migration 016), ingestion code exists but untested
    tracks: mr_file_changes table population
  gate-5:
    title: Code Trace (file:line → commit → MR → issue)
    status: not-started
    blocked-by: gate-4
    notes: Requires git log parsing + commit SHA matching
```
The skill reads this file to determine which `requires` entries are satisfied.
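A rough illustration of that lookup, assuming the registry has already been parsed into a dict (the `GATES` literal below is a hand-copied stand-in for `docs/gates.yaml`, and `unmet_requires` is a hypothetical helper name). Only gates with `status: complete` are treated as satisfied here; whether `partial` should count is a design choice this proposal does not settle.

```python
# Stand-in for the parsed docs/gates.yaml; only the status field is used.
GATES = {
    "gate-1": {"status": "complete"},
    "gate-2": {"status": "complete"},
    "gate-3": {"status": "complete"},
    "gate-4": {"status": "partial"},
    "gate-5": {"status": "not-started"},
}


def unmet_requires(requires: list, gates: dict) -> list:
    """Return the gate names in `requires` that are unknown or not complete."""
    return [g for g in requires if gates.get(g, {}).get("status") != "complete"]
```

An item is ready with respect to gates when `unmet_requires(item_requires, GATES)` is empty.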
---
## Scoring Algorithm
### Priority Score
```
For ideas:
  base = impact_weight                      # high=3, medium=2, low=1
  unlock = 1 + (0.5 × count_of_unlocks)     # items this directly enables
  readiness = 0 if blocked, 1 if ready
  priority = base × unlock × readiness
For issues:
  base = severity_weight × 1.5              # critical=6, high=4.5, medium=3, low=1.5
  unlock = 1 + (0.5 × count_of_unlocks)     # (bugs rarely unlock, but can)
  readiness = 0 if blocked, 1 if ready
  priority = base × unlock × readiness
Tiebreak (among equal priority):
  1. Prefer smaller effort (ships faster, starts next cycle sooner)
  2. Prefer autonomy:full over needs-design over needs-human
  3. Prefer older items (FIFO within same score)
```
### Why This Works
- High-impact items that unlock other items float to the top
- Blocked items score 0 regardless of impact (can't be worked)
- Effort is a tiebreaker, not a primary factor (since execution is fast)
- Issues with severity get a 1.5× multiplier (bugs degrade existing value)
- Unlock multiplier captures the "do Gate 4 first" insight automatically
### Example Rankings
| Item | Impact | Unlocks | Readiness | Score |
|------|--------|---------|-----------|-------|
| project-ergonomics | high(3) | 10 | ready(1) | 3 × 6.0 = 18.0 |
| gate-4-completion | med(2) | 5 | ready(1) | 2 × 3.5 = 7.0 |
| similar-issues | high(3) | 1 | ready(1) | 3 × 1.5 = 4.5 |
| stale-discussions | high(3) | 0 | ready(1) | 3 × 1.0 = 3.0 |
| hotspots | high(3) | 1 | blocked(0) | 0.0 |
Project-ergonomics dominates because it unlocks 10 downstream items. This is the
correct recommendation — it's the highest-leverage work even though "stale-discussions"
is simpler.
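The formula is small enough to check by hand against the table. The sketch below is one possible encoding, not the skill's implementation: the function name `priority` and its parameters are hypothetical, and the severity weights (4/3/2/1) are inferred from the "critical=6, high=4.5, medium=3, low=1.5" comment after dividing out the 1.5× multiplier.

```python
IMPACT_WEIGHT = {"high": 3, "medium": 2, "low": 1}
# Inferred: the documented issue bases (6, 4.5, 3, 1.5) divided by 1.5.
SEVERITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def priority(kind: str, level: str, unlock_count: int, blocked: bool) -> float:
    """priority = base × unlock multiplier × readiness."""
    if kind == "issue":
        base = SEVERITY_WEIGHT[level] * 1.5
    else:
        base = IMPACT_WEIGHT[level]
    unlock = 1 + 0.5 * unlock_count
    readiness = 0 if blocked else 1
    return base * unlock * readiness


# Rows from the example rankings table.
print(priority("idea", "high", 10, False))   # project-ergonomics
print(priority("idea", "medium", 5, False))  # gate-4-completion
print(priority("idea", "high", 1, True))     # hotspots (blocked)
```

Effort and autonomy are deliberately absent from the function: they only enter as tiebreakers during sequencing.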
---
## Scope Change Detection
This is the hardest problem. An idea's scope can change in three ways:
### 1. During Spec Generation (Primary Detection Point)
When Claude/GPT generates a detailed implementation spec from an idea doc, it
understands the idea more deeply than the original sketch. The spec process should
be instructed to:
- Re-evaluate effort (now that implementation is understood in detail)
- Discover new dependencies (need to change schema first, need a new config option)
- Identify component changes (touches more modules than originally thought)
- Assess impact more accurately (this is actually higher/lower value than estimated)
**Mechanism:** The spec generation prompt includes an explicit "re-evaluate frontmatter"
step. The spec output includes an updated frontmatter block. If scoring-relevant
fields changed, the skill flags it:
```
SCOPE CHANGE DETECTED:
idea-009 (Similar Issues Finder)
- effort: small → medium (needs embedding aggregation strategy)
- blocked-by: [] → [gate-embeddings-populated]
- components: +src/cli/commands/similar.rs (new file)
Previous score: 4.5 → New score: 3.0
Recommendation: Still top-3, but sequencing may change.
```
### 2. During Implementation (Discovered Complexity)
An agent working on beads may discover the spec was wrong:
- "This requires a database migration I didn't anticipate"
- "This module doesn't expose the API I need"
**Mechanism:** When a bead is blocked or takes significantly longer than estimated,
the agent should update the idea's frontmatter. The skill detects the change on
next triage run via eval-hash comparison.
### 3. External Changes (Gate Completion, New Ideas)
When a gate completes or a new idea is added that changes the dependency graph:
- Gate 4 completes → 5 ideas become unblocked
- New idea added that's higher priority than current top-3
- Two ideas discovered to be duplicates
**Mechanism:** The skill detects these automatically by re-computing the full graph
on every run. The eval-hash tracks what the scoring fields looked like last time;
if they haven't changed but the SCORE changed (because a dependency was resolved),
the skill flags it as "newly unblocked."
### The eval-hash Field
```yaml
eval-hash: "a1b2c3d4" # SHA-256 of: impact + effort + blocked-by + unlocks + requires
```
Computed by hashing the concatenation of all scoring-relevant fields. When the skill
runs, it compares:
- If eval-hash matches AND score is same → no change, skip
- If eval-hash matches BUT score changed → external change (dependency resolved)
- If eval-hash differs → item was modified, re-evaluate
This avoids re-announcing unchanged items on every run.
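A sketch of how the hash and the three comparison rules might fit together. Several details are assumptions rather than decisions made in this proposal: the fields are serialized as sorted-key JSON instead of raw concatenation (to avoid ambiguous joins), the digest is truncated to 8 hex characters to match the example value, and the previous score is assumed to be available from the last run. The names `eval_hash` and `classify` are hypothetical.

```python
import hashlib
import json

SCORING_FIELDS = ("impact", "effort", "blocked-by", "unlocks", "requires")


def eval_hash(frontmatter: dict) -> str:
    """Hash the scoring-relevant fields in a stable order."""
    payload = {field: frontmatter.get(field) for field in SCORING_FIELDS}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    # Truncation to 8 hex chars is an assumption based on the example value.
    return hashlib.sha256(encoded).hexdigest()[:8]


def classify(frontmatter: dict, old_score: float, new_score: float) -> str:
    """Apply the three comparison rules to one item."""
    if eval_hash(frontmatter) != frontmatter.get("eval-hash"):
        return "SCOPE_CHANGED"      # item was modified, re-evaluate
    if new_score != old_score:
        return "NEWLY_UNBLOCKED"    # external change, e.g. dependency resolved
    return "UNCHANGED"
```

Because the stored `eval-hash` is not itself one of the hashed fields, writing it back after a run does not change the next run's result.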
---
## Skill Design
### Location
`.claude/skills/idea-triage/SKILL.md` (project-local)
### Trigger Phrases
- "triage ideas" / "what should I build next?"
- "idea triage" / "prioritize ideas"
- "what's the highest value work?"
- `/idea-triage`
### Workflow Phases
**Phase 1: INGEST**
- Glob docs/ideas/*.md and docs/issues/*.md
- Parse YAML frontmatter from each file
- Read docs/gates.yaml for capability status
- Collect: id, title, type, status, impact, effort, severity, autonomy,
blocked-by, unlocks, requires, has-spec, beads, eval-hash
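The ingest step amounts to separating the frontmatter block from the Markdown body before handing it to a YAML parser. A minimal sketch of that split, with a hypothetical function name `split_frontmatter`; the YAML parsing itself is left out, and a file with a missing or unterminated block is returned with empty frontmatter so that it can be reported as a warning.

```python
def split_frontmatter(text: str) -> tuple:
    """Return (frontmatter, body); frontmatter is '' if the file has none."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    # Opening delimiter without a closing one: treat as invalid frontmatter.
    return "", text
```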
**Phase 2: VALIDATE**
- Required fields present (id, title, type, status, impact, effort)
- All blocked-by IDs reference existing files
- All unlocks IDs reference existing files
- All requires entries exist in gates.yaml
- No dependency cycles (blocked-by graph is a DAG)
- Status transitions are valid (no "proposed" with beads linked)
- Output: list of validation errors/warnings
**Phase 3: EVALUATE (Scope Change Detection)**
- For each item, compute current eval-hash from scoring fields
- Compare against stored eval-hash in frontmatter
- If different: flag as SCOPE_CHANGED with field-level diff
- If same but score changed (due to external dep resolution): flag as NEWLY_UNBLOCKED
- If status is specced but has-spec is false: flag as INCONSISTENT
**Phase 4: SCORE**
- Resolve requires against gates.yaml (is the gate complete?)
- Resolve blocked-by against other items (is the blocker done?)
- Compute readiness: 0 if any hard blocker is unresolved, 1 otherwise
- Compute unlock count: count items whose blocked-by includes this ID
- Apply scoring formula:
- Ideas: impact_weight × (1 + 0.5 × unlock_count) × readiness
- Issues: severity_weight × 1.5 × (1 + 0.5 × unlock_count) × readiness
- Apply tiebreak: effort_weight, autonomy, created date
**Phase 5: SEQUENCE**
- Separate into: actionable (score > 0) vs blocked (score = 0)
- Among actionable: sort by score descending with tiebreak
- Among blocked: sort by "what-if score" (score if blockers were resolved)
- Compute unlock advisories: "completing X unblocks Y items worth Z total score"
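The sequencing rules can be expressed as a compound sort key. In this sketch the dict keys `score` and `what_if_score` and the function name `sequence` are hypothetical; it assumes scores were computed in Phase 4 and that `created` is an ISO date string, which sorts chronologically as plain text.

```python
EFFORT_ORDER = {"small": 0, "medium": 1, "large": 2, "xlarge": 3}
AUTONOMY_ORDER = {"full": 0, "needs-design": 1, "needs-human": 2}


def sequence(items: list) -> tuple:
    """Split items into (actionable, blocked), each sorted best-first."""
    def tiebreak(item: dict) -> tuple:
        # Smaller effort, then more autonomy, then older items first.
        return (
            EFFORT_ORDER[item["effort"]],
            AUTONOMY_ORDER[item["autonomy"]],
            item["created"],
        )

    actionable = [i for i in items if i["score"] > 0]
    blocked = [i for i in items if i["score"] == 0]
    actionable.sort(key=lambda i: (-i["score"], *tiebreak(i)))
    blocked.sort(key=lambda i: -i["what_if_score"])
    return actionable, blocked
```

Negating the score keeps everything in a single ascending sort, so the tiebreak fields need no special handling.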
**Phase 6: RECOMMEND**
Output structured report:
```
== IDEA TRIAGE ==
Run: 2026-02-09T14:30:00Z
Items: 22 (18 proposed, 2 accepted, 1 specced, 1 implemented)
RECOMMENDED SEQUENCE:
  1. [idea-project-ergonomics] Multi-Project Ergonomics
     impact:high effort:medium autonomy:full score:18.0
     WHY FIRST: Unlocks 10 downstream ideas. Highest leverage.
     COMPONENTS: src/core/config.rs, src/core/project.rs, src/cli/
  2. [idea-009] Similar Issues Finder
     impact:high effort:small autonomy:full score:4.5
     WHY NEXT: Highest standalone impact. Ships in ~30 min.
     UNLOCKS: idea-recurring-patterns
  3. [idea-004] Stale Discussion Finder
     impact:high effort:small autonomy:full score:3.0
     WHY NEXT: Quick win, no dependencies, immediate user value.
BLOCKED (would rank high if unblocked):
  idea-014 File Hotspots score-if-unblocked:4.5 BLOCKED BY: gate-4
  idea-021 Knowledge Silos score-if-unblocked:3.0 BLOCKED BY: gate-4
UNLOCK ADVISORY: Completing gate-4 unblocks 5 items (combined: 15.0)
SCOPE CHANGES DETECTED:
  idea-009: effort changed small→medium (eval-hash mismatch)
  idea-017: now has spec (has-spec flipped to true)
NEWLY UNBLOCKED:
  (none this run)
WARNINGS:
  idea-016: status=proposed, unchanged for 30+ days
  idea-008: blocked-by references "idea-gate4" which doesn't exist (typo?)
HEALTH:
  Proposed: 18 | Accepted: 2 | Specced: 1 | Promoted: 0 | Implemented: 1
  Blocked: 6 | Actionable: 16
  Backlog runway at ~5/day: ~3 days
```
### What the Skill Does NOT Do
- **Never modifies files.** Read-only triage. The agent or human updates frontmatter.
Exception: the skill CAN update eval-hash after a triage run (opt-in).
- **Never creates beads.** That's plan-to-beads skill territory.
- **Never replaces bv.** Once work is in beads, bv --robot-triage handles execution
prioritization. This skill owns pre-commitment only.
- **Never generates specs.** That's a separate step with Claude/GPT.
---
## Integration Points
### With Spec Generation
The spec generation prompt (separate from this skill) should include:
```
After generating the implementation spec, re-evaluate the idea's frontmatter:
1. Is the effort estimate still accurate? (small/medium/large/xlarge)
2. Did you discover new dependencies? (add to blocked-by)
3. Are there components not listed? (add to components)
4. Has the impact assessment changed?
5. Can an agent ship this autonomously? (autonomy: full/needs-design/needs-human)
Output an UPDATED frontmatter block at the end of the spec.
If any scoring field changed, explain what changed and why.
```
### With plan-to-beads
When promoting an idea to beads:
1. Run plan-to-beads on the spec
2. Capture the created bead IDs
3. Update the idea's frontmatter: status → promoted, beads → [bd-xxx, bd-yyy]
4. Run br sync --flush-only && git add .beads/
### With bv --robot-triage
These systems don't talk to each other directly. The boundary is:
- Idea triage skill → "build idea-009 next"
- Human/agent generates spec → plan-to-beads → beads created
- bv --robot-triage → "work on bd-xxx next"
- Beads close → human/agent updates idea frontmatter → idea triage re-runs
### With New Item Ingestion
When someone adds a new file to docs/ideas/ or docs/issues/:
- If it has valid frontmatter: picked up automatically on next triage run
- If it has no/invalid frontmatter: flagged in WARNINGS section
- Skill can suggest default frontmatter based on content analysis
---
## Failure Modes and Mitigations
### 1. Frontmatter Rot
**Risk:** Fields don't get updated. Status says "proposed" but it's actually shipped.
**Mitigation:** Cross-reference with beads. If an idea has beads and all beads are
closed, flag that the idea should be "implemented" even if frontmatter says otherwise.
The skill detects this inconsistency.
### 2. Score Gaming
**Risk:** Someone inflates impact or unlocks count to make their idea rank higher.
**Mitigation:** Unlocks are verified — the skill checks that the referenced items
actually have this idea in their blocked-by. Impact is subjective but reviewed during
spec generation (second opinion from a different model/session).
### 3. Stale Gates Registry
**Risk:** gate-4 is actually complete but gates.yaml wasn't updated.
**Mitigation:** Skill warns when a gate has been "partial" for a long time. Could
also probe the codebase (check if mr_file_changes ingestion code exists and has tests).
### 4. Circular Dependencies
**Risk:** A blocks B blocks A.
**Mitigation:** Phase 2 validation explicitly checks for cycles in the blocked-by
graph and reports them as errors.
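One standard way to perform that check is a depth-first search with three node colors. The sketch below is illustrative only: the function name `find_cycle` and the `{id: [blocker ids]}` input shape are assumptions, and IDs that appear only as blockers (for example gate names) are treated as leaf nodes.

```python
from typing import Optional


def find_cycle(blocked_by: dict) -> Optional[list]:
    """Return one dependency cycle as a list of IDs, or None if the graph is a DAG."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in blocked_by}
    stack = []

    def visit(node: str) -> Optional[list]:
        color[node] = GRAY
        stack.append(node)
        for dep in blocked_by.get(node, []):
            if color.get(dep, WHITE) == GRAY:
                # dep is already on the current path: cycle found.
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(blocked_by):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

Returning the cycle path itself, rather than a boolean, lets the validation report name the offending items.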
### 5. Unlock Count Inflation
**Risk:** An item claims to unlock 20 things, making it score astronomically.
**Mitigation:** Unlock count is VERIFIED by checking reverse blocked-by references.
If idea-X says it unlocks idea-Y, but idea-Y's blocked-by doesn't include idea-X,
the claim is discounted. Both explicit unlocks and reverse blocked-by contribute to
the count, but unverified claims are flagged.
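A sketch of that verification, under one reading of the rule: only reverse blocked-by edges count toward the score, and any `unlocks` claim without a matching edge is reported. The function name `verify_unlocks` and the `{id: frontmatter}` input shape are hypothetical.

```python
def verify_unlocks(items: dict) -> tuple:
    """Return (verified unlock counts, unverified (source, target) claims)."""
    counts = {item_id: 0 for item_id in items}
    # Reverse blocked-by edges are treated as the source of truth.
    for item in items.values():
        for blocker in item.get("blocked-by", []):
            if blocker in counts:
                counts[blocker] += 1
    # An unlocks claim is unverified if the target does not list the source.
    unverified = [
        (item_id, target)
        for item_id, item in items.items()
        for target in item.get("unlocks", [])
        if item_id not in items.get(target, {}).get("blocked-by", [])
    ]
    return counts, unverified
```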
### 6. Scope Creep During Spec
**Risk:** Spec generation reveals the idea is actually 5× harder than estimated.
Its ranking drops (effort is a tiebreaker, and any newly discovered blocker zeroes
the score), but the human has already mentally committed.
**Mitigation:** The scope change detection makes this VISIBLE. The triage output
explicitly shows the field-level diff (for example "effort changed small→xlarge")
alongside the previous and new score.
Human can then decide: proceed anyway, or switch to a different top-3 pick.
### 7. Orphaned Ideas
**Risk:** Ideas get promoted to beads, beads get implemented, but the idea file
never gets updated. It sits in "promoted" forever.
**Mitigation:** Skill checks: for each idea with status=promoted, look up the
linked beads. If all beads are closed, flag: "idea-009 appears complete, update
status to implemented."
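The check itself is a simple all-closed test per promoted idea. In this sketch `bead_status` is a hypothetical `{bead_id: status}` mapping that would be derived from `.beads/issues.jsonl`; how that file is read is not shown, and the function name `completion_warnings` is illustrative.

```python
def completion_warnings(ideas: list, bead_status: dict) -> list:
    """Flag promoted ideas whose linked beads are all closed."""
    warnings = []
    for idea in ideas:
        beads = idea.get("beads", [])
        if idea.get("status") != "promoted" or not beads:
            continue
        # Unknown bead IDs are treated as not closed.
        if all(bead_status.get(b) == "closed" for b in beads):
            warnings.append(
                f"{idea['id']} appears complete, update status to implemented"
            )
    return warnings
```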
---
## Implementation Plan
### Step 1: Create the Frontmatter Schema (this doc → applied to all files)
- Define the exact YAML schema (above)
- Create docs/gates.yaml
- Apply frontmatter to all 22 existing files in docs/ideas/ and docs/issues/
### Step 2: Build the Skill
- Create .claude/skills/idea-triage/SKILL.md
- Implement all 6 phases in the skill prompt
- The skill uses Glob, Read, and text processing — no external scripts needed
(the 22 existing files are few enough for Claude to process directly)
### Step 3: Test the System
- Run the skill against current files
- Verify scoring matches manual expectations
- Check that project-ergonomics ranks #1 (it should, due to unlock count)
- Verify blocked items score 0
- Check validation catches intentional errors
### Step 4: Run One Full Cycle
- Pick the top recommendation
- Generate a spec (separate session)
- Verify scope change detection works (spec should update frontmatter)
- Promote to beads via plan-to-beads
- Implement
- Verify completion detection works
### Step 5: Iterate
- Run triage again after implementation
- Verify newly unblocked items surface
- Adjust scoring weights if rankings feel wrong
- Add new ideas as they emerge

88
docs/ideas/bottlenecks.md Normal file

@@ -0,0 +1,88 @@
# Review Bottleneck Detector
- **Command:** `lore bottlenecks [--since <date>]`
- **Confidence:** 85%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — join MRs with first review note, compute percentiles
## What
For MRs in a given time window, compute:
1. **Time to first review** — created_at to first non-author DiffNote
2. **Review cycles** — count of discussion resolution rounds
3. **Time to merge** — created_at to merged_at
Flag MRs above P90 thresholds as bottlenecks.
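SQLite has no built-in percentile function, so the P90 threshold would likely be computed in application code after the query runs. A minimal sketch using the nearest-rank method; the function names and the `{iid: hours}` input are hypothetical, and MRs with no review yet (NULL in the query result) are assumed to be filtered out or handled separately.

```python
import math


def percentile(values: list, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_bottlenecks(hours_by_mr: dict) -> list:
    """Return MR iids whose time to first review exceeds the P90 threshold."""
    threshold = percentile(list(hours_by_mr.values()), 90)
    return sorted(iid for iid, hours in hours_by_mr.items() if hours > threshold)
```

Nearest-rank always returns an observed value, which keeps the reported threshold easy to explain; an interpolating method would be an equally valid choice.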
## Why
Review bottlenecks are a major drag on developer productivity. Making them visible
and measurable is the first step to fixing them. This provides data for process
retrospectives.
## Data Required
All exists today:
- `merge_requests` (created_at, merged_at, author_username)
- `notes` (note_type='DiffNote', author_username, created_at)
- `discussions` (resolved, resolvable)
## Implementation Sketch
```sql
-- Time to first review per MR.
-- Timestamps are assumed to be epoch milliseconds (3600000 ms = 1 hour).
SELECT
  mr.id,
  mr.iid,
  mr.title,
  mr.author_username,
  mr.created_at,
  mr.merged_at,
  p.path_with_namespace,
  MIN(n.created_at) AS first_review_at,
  (MIN(n.created_at) - mr.created_at) / 3600000.0 AS hours_to_first_review,
  (mr.merged_at - mr.created_at) / 3600000.0 AS hours_to_merge
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN discussions d ON d.merge_request_id = mr.id
LEFT JOIN notes n ON n.discussion_id = d.id
  AND n.note_type = 'DiffNote'
  AND n.is_system = 0
  AND n.author_username != mr.author_username
WHERE mr.created_at >= ?1
  AND mr.state IN ('merged', 'opened')
GROUP BY mr.id
-- MRs with no review yet have a NULL first_review_at and sort first.
ORDER BY hours_to_first_review DESC NULLS FIRST;
```
## Human Output
```
Review Bottlenecks (last 30 days)
P50 time to first review: 4.2h
P90 time to first review: 28.1h
P50 time to merge: 2.1d
P90 time to merge: 8.3d
Slowest to review:
!234 Refactor auth 72h to first review (alice, still open)
!228 Database migration 48h to first review (bob, merged in 5d)
Most review cycles:
!234 Refactor auth 8 discussion threads, 4 resolved
!225 API versioning 6 discussion threads, 6 resolved
```
## Downsides
- Doesn't capture review done outside GitLab (Slack, in-person)
- DiffNote timestamp != when reviewer started reading
- Large MRs naturally take longer; no size normalization
## Extensions
- `lore bottlenecks --reviewer alice` — how fast does alice review?
- Per-project comparison: which project has the fastest review cycle?
- Trend line: is review speed improving or degrading over time?

77
docs/ideas/churn.md Normal file

@@ -0,0 +1,77 @@
# MR Churn Analysis
- **Command:** `lore churn [--since <date>]`
- **Confidence:** 72%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — multi-table aggregation with composite scoring
## What
For merged MRs, compute a "contentiousness score" based on: number of review
discussions, number of DiffNotes, resolution cycles, file count. Flag high-churn
MRs as candidates for architectural review.
## Why
High-churn MRs often indicate architectural disagreements, unclear requirements,
or code that's hard to review. Surfacing them post-merge enables retrospectives
and identifies areas that need better design upfront.
## Data Required
All exists today:
- `merge_requests` (state='merged')
- `discussions` (merge_request_id, resolved, resolvable)
- `notes` (note_type='DiffNote', discussion_id)
- `mr_file_changes` (file count per MR)
## Implementation Sketch
```sql
SELECT
  mr.iid,
  mr.title,
  mr.author_username,
  p.path_with_namespace,
  COUNT(DISTINCT d.id) AS discussion_count,
  COUNT(DISTINCT CASE WHEN n.note_type = 'DiffNote' THEN n.id END) AS diffnote_count,
  COUNT(DISTINCT CASE WHEN d.resolvable = 1 AND d.resolved = 1 THEN d.id END) AS resolved_threads,
  COUNT(DISTINCT mfc.id) AS files_changed,
  -- Composite score: unnormalized weighted sum (discussions count double),
  -- using the same DiffNote count that is reported above
  (COUNT(DISTINCT d.id) * 2
    + COUNT(DISTINCT CASE WHEN n.note_type = 'DiffNote' THEN n.id END)
    + COUNT(DISTINCT mfc.id)) AS churn_score
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN discussions d ON d.merge_request_id = mr.id AND d.noteable_type = 'MergeRequest'
LEFT JOIN notes n ON n.discussion_id = d.id AND n.is_system = 0
LEFT JOIN mr_file_changes mfc ON mfc.merge_request_id = mr.id
WHERE mr.state = 'merged'
  AND mr.merged_at >= ?1
GROUP BY mr.id
ORDER BY churn_score DESC
LIMIT ?2;
```
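The scoring expression in the query above can be mirrored in Rust, which makes the weights easier to test and tune. This is a minimal sketch, not the project's implementation: `MrCounts`, `churn_score`, and `discussions_per_file` are hypothetical names, and the per-file variant illustrates the "normalize by file count" extension.

```rust
/// Per-MR counts, mirroring the columns aggregated by the SQL sketch.
/// (Hypothetical struct; field names are assumptions.)
#[derive(Debug, Clone, PartialEq)]
pub struct MrCounts {
    pub iid: u64,
    pub discussions: u32,
    pub notes: u32,
    pub files_changed: u32,
}

/// Same weighting as the SQL expression: discussions * 2 + notes + files.
pub fn churn_score(c: &MrCounts) -> u32 {
    c.discussions * 2 + c.notes + c.files_changed
}

/// Size-normalized variant: discussions per file changed.
/// Returns None when no file changes are recorded for the MR.
pub fn discussions_per_file(c: &MrCounts) -> Option<f64> {
    if c.files_changed == 0 {
        None
    } else {
        Some(f64::from(c.discussions) / f64::from(c.files_changed))
    }
}
```

Keeping the weights in one function means a later calibration pass only has to touch `churn_score`.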
## Human Output
```
High-Churn MRs (last 90 days)
MR Discussions DiffNotes Files Score Title
!234 12 28 8 60 Refactor auth middleware
!225 8 19 5 40 API versioning v2
!218 6 15 12 39 Database schema migration
!210 5 8 3 21 Update logging framework
```
## Downsides
- High discussion count could mean thorough review, not contention
- Composite scoring weights are arbitrary; needs calibration per team
- Large MRs naturally score higher regardless of contention
## Extensions
- Normalize by file count (discussions per file changed)
- Compare against team averages (flag outliers, not absolute values)
- `lore churn --author alice` — which of alice's MRs generate the most discussion?


@@ -0,0 +1,73 @@
# MR-to-Issue Closure Gap
- **Command:** `lore closure-gaps`
- **Confidence:** 88%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — single join query
## What
Find entity_references where reference_type='closes' AND the target issue is still
open AND the source MR is merged. These represent broken auto-close links where a
merge should have closed an issue but didn't.
## Why
Simple, definitive, actionable. If a merged MR says "closes #42" but #42 is still
open, something is wrong. Either auto-close failed (wrong target branch), the
reference was incorrect, or the issue needs manual attention.
## Data Required
All exists today:
- `entity_references` (reference_type='closes')
- `merge_requests` (state='merged')
- `issues` (state='opened')
## Implementation Sketch
```sql
SELECT
mr.iid as mr_iid,
mr.title as mr_title,
mr.merged_at,
mr.target_branch,
i.iid as issue_iid,
i.title as issue_title,
i.state as issue_state,
p.path_with_namespace
FROM entity_references er
JOIN merge_requests mr ON er.source_entity_type = 'merge_request'
AND er.source_entity_id = mr.id
JOIN issues i ON er.target_entity_type = 'issue'
AND er.target_entity_id = i.id
JOIN projects p ON er.project_id = p.id
WHERE er.reference_type = 'closes'
AND mr.state = 'merged'
AND i.state = 'opened';
```
## Human Output
```
Closure Gaps — merged MRs that didn't close their referenced issues
group/backend !234 merged 3d ago → #42 still OPEN
"Refactor auth middleware" should have closed "Login timeout bug"
Target branch: develop (default: main) — possible branch mismatch
group/frontend !45 merged 1w ago → #38 still OPEN
"Update dashboard" should have closed "Dashboard layout broken"
```
## Downsides
- May be intentional (the MR deliberately targeted a non-default branch, or the issue is tracked across several branches)
- Cross-project references may not be resolvable if target project not synced
- GitLab auto-close only works when merging to default branch
## Extensions
- Flag likely cause: branch mismatch (target_branch != project.default_branch)
- `lore closure-gaps --auto-close` — actually close the issues via API (dangerous, needs confirmation)
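The branch-mismatch check from the first extension is small enough to sketch. The example below assumes the project's default branch is available locally (the docs reference `project.default_branch`, but whether it is stored is an assumption); `LikelyCause` and `likely_cause` are hypothetical names.

```rust
/// Possible explanation for a closure gap. (Hypothetical enum.)
#[derive(Debug, PartialEq)]
pub enum LikelyCause {
    /// The MR was merged into a branch other than the project default,
    /// so GitLab's auto-close would not have fired.
    BranchMismatch,
    /// No obvious cause; needs manual attention.
    Unknown,
}

/// Compare the MR's target branch with the project's default branch.
/// `default_branch` is None when the value is not known locally.
pub fn likely_cause(target_branch: &str, default_branch: Option<&str>) -> LikelyCause {
    match default_branch {
        Some(default) if default != target_branch => LikelyCause::BranchMismatch,
        _ => LikelyCause::Unknown,
    }
}
```

Returning `Unknown` when the default branch is missing avoids guessing a cause the data cannot support.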

docs/ideas/collaboration.md Normal file

@@ -0,0 +1,101 @@
# Author Collaboration Network
- **Command:** `lore collaboration [--since <date>]`
- **Confidence:** 70%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — self-join on notes, graph construction
## What
Build a weighted graph of author pairs: (author_A, author_B, weight) where weight =
number of times A reviewed B's MR + B reviewed A's MR + they both commented on the
same entity.
## Why
Reveals team structure empirically. Shows who collaborates across team boundaries
and where knowledge transfer happens. Useful for re-orgs, onboarding planning,
and identifying isolated team members.
## Data Required
All exists today:
- `merge_requests` (author_username)
- `notes` (author_username, note_type='DiffNote')
- `discussions` (for co-participation)
## Implementation Sketch
```sql
-- Review relationships: who reviews whose MRs
SELECT
mr.author_username as author,
n.author_username as reviewer,
COUNT(*) as review_count
FROM merge_requests mr
JOIN discussions d ON d.merge_request_id = mr.id
JOIN notes n ON n.discussion_id = d.id
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username != mr.author_username
AND mr.created_at >= ?1
GROUP BY mr.author_username, n.author_username;
-- Co-participation: who comments on the same entities
WITH entity_participants AS (
SELECT
COALESCE(d.issue_id, d.merge_request_id) as entity_id,
d.noteable_type,
n.author_username
FROM discussions d
JOIN notes n ON n.discussion_id = d.id
WHERE n.is_system = 0
AND n.created_at >= ?1
)
SELECT
a.author_username as person_a,
b.author_username as person_b,
COUNT(DISTINCT a.entity_id) as shared_entities
FROM entity_participants a
JOIN entity_participants b
ON a.entity_id = b.entity_id
AND a.noteable_type = b.noteable_type
AND a.author_username < b.author_username -- avoid duplicates
GROUP BY a.author_username, b.author_username;
```
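The two result sets above still need to be merged into one undirected edge list. A possible sketch follows, assuming the rows arrive as simple tuples; `Edge`, `pair_key`, and `build_edges` are hypothetical names. Review counts in both directions (A reviewed B, B reviewed A) are summed onto the same pair, as described in the What section.

```rust
use std::collections::BTreeMap;

/// Combined weight components for one author pair. (Hypothetical struct.)
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Edge {
    pub reviews: u32,
    pub co_participated: u32,
}

/// Order a pair of usernames so (a, b) and (b, a) share one key.
fn pair_key(a: &str, b: &str) -> (String, String) {
    if a <= b {
        (a.to_string(), b.to_string())
    } else {
        (b.to_string(), a.to_string())
    }
}

/// Merge review rows (author, reviewer, review_count) and co-participation
/// rows (person_a, person_b, shared_entities) into undirected edges.
pub fn build_edges(
    reviews: &[(&str, &str, u32)],
    shared: &[(&str, &str, u32)],
) -> BTreeMap<(String, String), Edge> {
    let mut edges: BTreeMap<(String, String), Edge> = BTreeMap::new();
    for &(author, reviewer, count) in reviews {
        edges.entry(pair_key(author, reviewer)).or_default().reviews += count;
    }
    for &(a, b, count) in shared {
        edges.entry(pair_key(a, b)).or_default().co_participated += count;
    }
    edges
}
```

A `BTreeMap` keeps the output in a stable order, which is convenient for both the JSON and human formats.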
## Output Formats
### JSON (for further analysis)
```json
{
"nodes": ["alice", "bob", "charlie"],
"edges": [
{ "source": "alice", "target": "bob", "reviews": 15, "co_participated": 8 },
{ "source": "bob", "target": "charlie", "reviews": 3, "co_participated": 12 }
]
}
```
### Human
```
Collaboration Network (last 90 days)
alice <-> bob 15 reviews, 8 shared discussions [strong]
bob <-> charlie 3 reviews, 12 shared discussions [moderate]
alice <-> charlie 1 review, 2 shared discussions [weak]
dave <-> (none) 0 reviews, 0 shared discussions [isolated]
```
## Downsides
- Interpretation requires context; high collaboration might mean dependency
- Doesn't capture collaboration outside GitLab
- Self-join can be slow with many notes
## Extensions
- `lore collaboration --format dot` — GraphViz network diagram
- `lore collaboration --isolated` — find team members with no collaboration edges
- Team boundary detection via graph clustering algorithms


@@ -0,0 +1,86 @@
# Contributor Heatmap
- **Command:** `lore contributors [--since <date>]`
- **Confidence:** 88%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — multiple aggregation queries
## What
Rank team members by activity across configurable time windows (7d, 30d, 90d). Shows
issues authored, MRs authored, MRs merged, review comments made, discussions
participated in.
## Why
Team leads constantly ask "who's been active?" or "who's contributing to reviews?"
This answers it from local data without GitLab Premium analytics. Also useful for
identifying team members who may be overloaded or disengaged.
## Data Required
All exists today:
- `issues` (author_username, created_at)
- `merge_requests` (author_username, created_at, merged_at)
- `notes` (author_username, created_at, note_type, is_system)
- `discussions` (for participation counting)
## Implementation Sketch
```sql
-- Combined activity per author
WITH activity AS (
SELECT author_username, 'issue_authored' as activity_type, created_at
FROM issues WHERE created_at >= ?1
UNION ALL
SELECT author_username, 'mr_authored', created_at
FROM merge_requests WHERE created_at >= ?1
UNION ALL
SELECT author_username, 'mr_merged', merged_at
FROM merge_requests WHERE merged_at >= ?1 AND state = 'merged'
UNION ALL
SELECT author_username, 'review_comment', created_at
FROM notes WHERE created_at >= ?1 AND note_type = 'DiffNote' AND is_system = 0
UNION ALL
SELECT author_username, 'discussion_comment', created_at
FROM notes WHERE created_at >= ?1 AND note_type != 'DiffNote' AND is_system = 0
)
SELECT
author_username,
COUNT(*) FILTER (WHERE activity_type = 'issue_authored') as issues,
COUNT(*) FILTER (WHERE activity_type = 'mr_authored') as mrs_authored,
COUNT(*) FILTER (WHERE activity_type = 'mr_merged') as mrs_merged,
COUNT(*) FILTER (WHERE activity_type = 'review_comment') as reviews,
COUNT(*) FILTER (WHERE activity_type = 'discussion_comment') as comments,
COUNT(*) as total
FROM activity
GROUP BY author_username
ORDER BY total DESC;
```
Note: `FILTER` on aggregate functions requires SQLite 3.30.0 or later. On older versions, use SUM(CASE WHEN ... THEN 1 ELSE 0 END) instead.
## Human Output
```
Contributors (last 30 days)
Username Issues MRs Merged Reviews Comments Total
alice 3 8 7 23 12 53
bob 1 5 4 31 8 49
charlie 5 3 2 4 15 29
dave 0 1 0 2 3 6
```
## Downsides
- Could be used for surveillance; frame as team health, not individual tracking
- Activity volume != productivity (one thoughtful review > ten "LGTM"s)
- Doesn't capture work done outside GitLab
## Extensions
- `lore contributors --project group/backend` — scoped to project
- `lore contributors --type reviews` — focus on review activity only
- Trend comparison: `--compare 30d,90d` shows velocity changes

docs/ideas/decisions.md Normal file

@@ -0,0 +1,94 @@
# Decision Archaeology
- **Command:** `lore decisions <query>`
- **Confidence:** 82%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — search pipeline + regex pattern matching on notes
## What
Search for discussion notes that contain decision-making language. Use the existing
search pipeline but boost notes containing patterns like "decided", "agreed",
"will go with", "tradeoff", "because we", "rationale", "the approach is", "we chose".
Return the surrounding discussion context.
## Why
This is gitlore's unique value proposition — "why was this decision made?" is the
question that no other tool answers well. Architecture Decision Records are rarely
maintained; the real decisions live in discussion threads. This mines them.
## Data Required
All exists today:
- `documents` + search pipeline (for finding relevant entities)
- `notes` (body text for pattern matching)
- `discussions` (for thread context)
## Implementation Sketch
```
1. Run existing hybrid search to find entities matching the query topic
2. For each result entity, query all discussion notes
3. Score each note against decision-language patterns:
- Strong signals (weight 3): "decided to", "agreed on", "the decision is",
"we will go with", "approved approach"
- Medium signals (weight 2): "tradeoff", "because", "rationale", "chosen",
"opted for", "rejected", "alternative"
- Weak signals (weight 1): "should we", "proposal", "option A", "option B",
"pros and cons"
4. Return notes scoring above threshold, with surrounding context (previous and
next note in discussion thread)
5. Sort by: search relevance * decision score
```
### Decision Patterns (regex)
```rust
const STRONG_PATTERNS: &[&str] = &[
r"(?i)\b(decided|agreed|approved)\s+(to|on|that)\b",
r"(?i)\bthe\s+(decision|approach|plan)\s+is\b",
r"(?i)\bwe('ll| will| are going to)\s+(go with|use|implement)\b",
r"(?i)\blet'?s\s+(go with|use|do)\b",
];
const MEDIUM_PATTERNS: &[&str] = &[
r"(?i)\b(tradeoff|trade-off|rationale|because we|opted for)\b",
r"(?i)\b(rejected|ruled out|won't work|not viable)\b",
r"(?i)\b(chosen|selected|picked)\b.{0,20}\b(over|instead of)\b",
];
```
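To show how the weights from step 3 combine, here is a simplified scorer that uses only the standard library. It is a sketch under stated assumptions: it does plain case-insensitive substring matching on the phrase list from the implementation sketch instead of the regex patterns above, so it has no word boundaries and will over-match (for example on "because"). `SIGNALS` and `decision_score` are hypothetical names.

```rust
/// (phrase, weight) pairs taken from the strong/medium/weak signal lists.
/// Phrases are lowercase because the note body is lowercased before matching.
const SIGNALS: &[(&str, u32)] = &[
    // Strong signals (weight 3)
    ("decided to", 3),
    ("agreed on", 3),
    ("the decision is", 3),
    ("we will go with", 3),
    ("approved approach", 3),
    // Medium signals (weight 2)
    ("tradeoff", 2),
    ("because", 2),
    ("rationale", 2),
    ("chosen", 2),
    ("opted for", 2),
    ("rejected", 2),
    ("alternative", 2),
    // Weak signals (weight 1)
    ("should we", 1),
    ("proposal", 1),
    ("option a", 1),
    ("option b", 1),
    ("pros and cons", 1),
];

/// Sum the weights of every signal phrase that occurs in the note.
/// Each phrase counts at most once, however often it appears.
pub fn decision_score(note_body: &str) -> u32 {
    let lowered = note_body.to_lowercase();
    SIGNALS
        .iter()
        .filter(|(phrase, _)| lowered.contains(*phrase))
        .map(|(_, weight)| *weight)
        .sum()
}
```

A note would then be returned when `decision_score` exceeds the chosen threshold; the threshold itself is left open here, as in the sketch.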
## Human Output
```
Decisions related to "authentication"
group/backend !234 — "Refactor auth middleware"
Discussion #a1b2c3 (alice, 3w ago):
"We decided to use JWT with short-lived tokens instead of session cookies.
The tradeoff is more complexity in the refresh flow, but we get stateless
auth which scales better."
Decision confidence: HIGH (3 strong pattern matches)
group/backend #42 — "Auth architecture review"
Discussion #d4e5f6 (bob, 2mo ago):
"After discussing with the security team, we'll go with bcrypt for password
hashing. Argon2 is theoretically better but bcrypt has wider library support."
Decision confidence: HIGH (2 strong pattern matches)
```
## Downsides
- Pattern matching is imperfect; may miss decisions phrased differently
- May surface "discussion about deciding" rather than actual decisions
- Non-English discussions won't match
- Requires good search results as input (garbage in, garbage out)
## Extensions
- `lore decisions --recent` — decisions made in last 30 days
- `lore decisions --author alice` — decisions made by specific person
- Export as ADR (Architecture Decision Record) format
- Combine with timeline for chronological decision history

docs/ideas/digest.md Normal file

@@ -0,0 +1,131 @@
# "What Changed?" Digest
- **Command:** `lore digest [--since <date>]`
- **Confidence:** 93%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — multiple queries across event tables, formatting logic
## What
Generate a structured summary of all activity since a given date: issues
opened/closed, MRs merged, labels changed, milestones updated, key discussions.
Group by project and sort by significance (state changes > merges > label changes >
new comments).
Default `--since` is 1 day (last 24 hours). Supports `7d`, `2w`, `YYYY-MM-DD`.
## Why
"What happened while I was on PTO?" is the most universal developer question. This
is a killer feature that leverages ALL the event data gitlore has ingested. No other
local tool provides this.
## Data Required
All exists today:
- `resource_state_events` (opened/closed/merged/reopened)
- `resource_label_events` (label add/remove)
- `resource_milestone_events` (milestone add/remove)
- `merge_requests` (merged_at for merge events)
- `issues` (created_at for new issues)
- `discussions` (last_note_at for active discussions)
## Implementation Sketch
```
1. Parse --since into ms epoch timestamp
2. Query each event table WHERE created_at >= since
3. Query new issues WHERE created_at >= since
4. Query merged MRs WHERE merged_at >= since
5. Query active discussions WHERE last_note_at >= since
6. Group all events by project
7. Within each project, sort by: state changes first, then merges, then labels
8. Format as human-readable sections or robot JSON
```
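Step 1 can be sketched for the relative forms (`7d`, `2w`). The example below is an illustration only: it does not handle `YYYY-MM-DD`, which would need calendar arithmetic or a date crate, and `parse_relative_window_ms` and `since_cutoff_ms` are hypothetical names. Timestamps are milliseconds since the epoch, matching the sketch above.

```rust
const MS_PER_DAY: i64 = 86_400_000;

/// Parse a relative window such as "7d" or "2w" into milliseconds.
/// Returns None for anything else, including absolute dates.
pub fn parse_relative_window_ms(input: &str) -> Option<i64> {
    let input = input.trim();
    let unit = input.chars().last()?;
    let number = &input[..input.len() - unit.len_utf8()];
    let count: i64 = number.parse().ok()?;
    if count < 0 {
        return None;
    }
    let days = match unit {
        'd' => count,
        'w' => count.checked_mul(7)?,
        _ => return None,
    };
    days.checked_mul(MS_PER_DAY)
}

/// Lower bound for `created_at >= ?1`, given "now" in ms since the epoch.
pub fn since_cutoff_ms(now_ms: i64, window: &str) -> Option<i64> {
    now_ms.checked_sub(parse_relative_window_ms(window)?)
}
```

Passing `now_ms` in explicitly keeps the function deterministic, which makes the cutoff easy to unit-test.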
### SQL Queries
```sql
-- State changes in window
SELECT rse.*, i.iid as issue_iid, mr.iid as mr_iid,
COALESCE(i.title, mr.title) as title,
p.path_with_namespace
FROM resource_state_events rse
LEFT JOIN issues i ON rse.issue_id = i.id
LEFT JOIN merge_requests mr ON rse.merge_request_id = mr.id
JOIN projects p ON rse.project_id = p.id
WHERE rse.created_at >= ?1
ORDER BY rse.created_at DESC;
-- Newly merged MRs
SELECT mr.iid, mr.title, mr.author_username, mr.merged_at,
p.path_with_namespace
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
WHERE mr.merged_at >= ?1
ORDER BY mr.merged_at DESC;
-- New issues
SELECT i.iid, i.title, i.author_username, i.created_at,
p.path_with_namespace
FROM issues i
JOIN projects p ON i.project_id = p.id
WHERE i.created_at >= ?1
ORDER BY i.created_at DESC;
```
## Human Output Format
```
=== What Changed (last 7 days) ===
group/backend (12 events)
Merged:
!234 Refactor auth middleware (alice, 2d ago)
!231 Fix connection pool leak (bob, 5d ago)
Closed:
#89 Login timeout on slow networks (closed by alice, 3d ago)
Opened:
#95 Rate limiting returns 500 (charlie, 1d ago)
Labels:
#90 +priority::high (dave, 4d ago)
group/frontend (3 events)
Merged:
!45 Update dashboard layout (eve, 6d ago)
```
## Robot Mode Output
```json
{
"ok": true,
"data": {
"since": "2025-01-20T00:00:00Z",
"projects": [
{
"path": "group/backend",
"merged": [ { "iid": 234, "title": "...", "author": "alice" } ],
"closed": [ { "iid": 89, "title": "...", "actor": "alice" } ],
"opened": [ { "iid": 95, "title": "...", "author": "charlie" } ],
"label_changes": [ { "iid": 90, "label": "priority::high", "action": "add" } ]
}
],
"summary": { "total_events": 15, "projects_active": 2 }
}
}
```
## Downsides
- Can be overwhelming for very active repos; needs `--limit` per category
- Doesn't capture nuance (a 200-comment MR merge is more significant than a typo fix)
- Only shows what gitlore has synced; stale data = stale digest
## Extensions
- `lore digest --author alice` — personal activity digest
- `lore digest --project group/backend` — single project scope
- `lore digest --format markdown` — paste-ready for Slack/email
- Combine with weekly-digest for scheduled summaries

docs/ideas/experts.md Normal file

@@ -0,0 +1,120 @@
# Who Knows About X?
- **Command:** `lore experts <path-or-topic>`
- **Confidence:** 92%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — two query paths (file-based, topic-based)
## What
Given a file path, find people who have authored MRs touching that file, left
DiffNotes on that file, or discussed issues referencing that file. Given a topic
string, use search to find relevant entities then extract the active participants.
## Why
"Who should I ask about the auth module?" is one of the most common questions in
large teams. This answers it empirically from actual contribution and review data.
No guessing, no out-of-date wiki pages.
## Data Required
All exists today:
- `mr_file_changes` (new_path, merge_request_id) — who changed the file
- `notes` (position_new_path, author_username) — who reviewed the file
- `merge_requests` (author_username) — MR authorship
- `documents` + search pipeline — for topic-based queries
- `discussions` + `notes` — for participant extraction
## Implementation Sketch
### Path Mode: `lore experts src/auth/`
```
1. Query mr_file_changes WHERE new_path LIKE 'src/auth/%'
2. Join merge_requests to get author_username for each MR
3. Query notes WHERE position_new_path LIKE 'src/auth/%'
4. Collect all usernames with activity counts
5. Rank by: MR authorship (weight 3) + DiffNote authorship (weight 2) + discussion participation (weight 1)
6. Apply recency decay (recent activity weighted higher)
```
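Steps 5 and 6 can be sketched as two small functions. The weights come from the list above; the decay shape is not specified there, so the exponential half-life used here is purely an assumption, and `raw_score` and `decayed_score` are hypothetical names.

```rust
/// Weighted activity score: MR authorship x3, DiffNotes x2, discussions x1.
pub fn raw_score(changes: u32, reviews: u32, discussions: u32) -> u32 {
    changes * 3 + reviews * 2 + discussions
}

/// Apply recency decay to a raw score.
/// Assumption: exponential decay that halves the score every `half_life_days`.
pub fn decayed_score(raw: u32, days_since_active: f64, half_life_days: f64) -> f64 {
    f64::from(raw) * 0.5_f64.powf(days_since_active / half_life_days)
}
```

With a 90-day half-life, someone last active three months ago would rank at half their raw score, which roughly matches the `[stale]` marker in the sample output.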
### Topic Mode: `lore experts "authentication timeout"`
```
1. Run existing hybrid search for the topic
2. Collect top N document results
3. For each document, extract author_username
4. For each document's entity, query discussions and collect note authors
5. Rank by frequency and recency
```
### SQL (Path Mode)
```sql
-- Authors who changed files matching pattern
SELECT mr.author_username, COUNT(*) as changes, MAX(mr.merged_at) as last_active
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
WHERE mfc.new_path LIKE ?1
AND mr.state = 'merged'
GROUP BY mr.author_username
ORDER BY changes DESC;
-- Reviewers who commented on files matching pattern
SELECT n.author_username, COUNT(*) as reviews, MAX(n.created_at) as last_active
FROM notes n
WHERE n.position_new_path LIKE ?1
AND n.note_type = 'DiffNote'
AND n.is_system = 0
GROUP BY n.author_username
ORDER BY reviews DESC;
```
## Human Output Format
```
Experts for: src/auth/
alice 12 changes, 8 reviews (last active 3d ago) [top contributor]
bob 3 changes, 15 reviews (last active 1d ago) [top reviewer]
charlie 5 changes, 2 reviews (last active 2w ago)
dave 1 change, 0 reviews (last active 3mo ago) [stale]
```
## Robot Mode Output
```json
{
"ok": true,
"data": {
"query": "src/auth/",
"query_type": "path",
"experts": [
{
"username": "alice",
"changes": 12,
"reviews": 8,
"discussions": 3,
"score": 55,
"last_active": "2025-01-25T10:00:00Z",
"role": "top_contributor"
}
]
}
}
```
## Downsides
- Historical data may be stale (people leave teams, change roles)
- Path mode requires `mr_file_changes` to be populated (Gate 4 ingestion)
- Topic mode quality depends on search quality
- Doesn't account for org chart / actual ownership
## Extensions
- `lore experts --since 90d` — recency filter
- `lore experts --min-activity 3` — noise filter
- Combine with `lore silos` to highlight when an expert is the ONLY expert

docs/ideas/graph.md Normal file

@@ -0,0 +1,75 @@
# Entity Relationship Explorer
- **Command:** `lore graph <entity-type> <iid>`
- **Confidence:** 80%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — BFS traversal (similar to timeline expand), output formatting
## What
Given an issue or MR, traverse `entity_references` and display all connected
entities with relationship types and depths. Output as tree, JSON, or Mermaid diagram.
## Why
The entity_references graph is already built (Gate 2) but has no dedicated
exploration command. Timeline shows events over time; this shows the relationship
structure. "What's connected to this issue?" is a different question from "what
happened to this issue?"
## Data Required
All exists today:
- `entity_references` (source/target entity, reference_type)
- `issues` / `merge_requests` (for entity context)
- Timeline expand stage already implements BFS over this graph
## Implementation Sketch
```
1. Resolve entity type + iid to local ID
2. BFS over entity_references:
- Follow source→target AND target→source (bidirectional)
- Track depth (--depth flag, default 2)
- Track reference_type for edge labels
3. Hydrate each discovered entity with title, state, URL
4. Format as tree / JSON / Mermaid
```
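The traversal in step 2 could look roughly like the sketch below. It works on in-memory rows rather than the database, and `Node`, `Reference`, and `connected` are hypothetical names; the existing timeline expand stage may structure its BFS differently.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Node id as a hypothetical (entity_type, local_id) pair, e.g. ("issue", 42).
pub type Node = (&'static str, i64);

/// One row of entity_references, reduced to what the traversal needs.
pub struct Reference {
    pub source: Node,
    pub target: Node,
    pub reference_type: &'static str,
}

/// Breadth-first traversal that follows references in both directions.
/// Returns each reachable node (excluding `start`) with its depth and the
/// reference type of the edge it was first discovered through.
pub fn connected(start: Node, refs: &[Reference], max_depth: u32) -> Vec<(Node, u32, &'static str)> {
    // Build an undirected adjacency list: every reference is indexed twice.
    let mut adjacency: HashMap<Node, Vec<(Node, &'static str)>> = HashMap::new();
    for r in refs {
        adjacency.entry(r.source).or_default().push((r.target, r.reference_type));
        adjacency.entry(r.target).or_default().push((r.source, r.reference_type));
    }

    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0u32)]);
    let mut found = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue; // do not expand beyond the --depth limit
        }
        if let Some(neighbors) = adjacency.get(&node) {
            for &(next, kind) in neighbors {
                // `insert` returns false for nodes already visited.
                if seen.insert(next) {
                    found.push((next, depth + 1, kind));
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    found
}
```

The `seen` set is what keeps the bidirectional walk from bouncing back along the edge it just followed.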
## Human Output (Tree)
```
#42 Login timeout bug (CLOSED)
├── closes ── !234 Refactor auth middleware (MERGED)
│ ├── mentioned ── #38 Connection timeout in auth flow (CLOSED)
│ └── mentioned ── #51 Token refresh improvements (OPEN)
├── related ── #45 Auth module documentation (OPEN)
└── mentioned ── !228 Database migration (MERGED)
└── closes ── #35 Schema version drift (CLOSED)
```
## Mermaid Output
```mermaid
graph LR
MR234["!234 Refactor auth"] -->|closes| I42["#42 Login timeout"]
MR234 -->|mentioned| I38["#38 Connection timeout"]
MR234 -->|mentioned| I51["#51 Token refresh"]
I42 -->|related| I45["#45 Auth docs"]
I42 -->|mentioned| MR228["!228 DB migration"]
MR228 -->|closes| I35["#35 Schema drift"]
```
## Downsides
- Overlaps somewhat with timeline (but different focus: structure vs chronology)
- High fan-out for popular entities (need depth + limit controls)
- Unresolved cross-project references appear as dead ends
## Extensions
- `lore graph --format dot` — GraphViz DOT output
- `lore graph --format mermaid` — Mermaid diagram
- `lore graph --include-discussions` — show discussion threads as nodes
- Interactive HTML visualization (future web UI)

docs/ideas/hotspots.md Normal file

@@ -0,0 +1,70 @@
# File Hotspot Report
- **Command:** `lore hotspots [--since <date>]`
- **Confidence:** 85%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — single query on mr_file_changes (requires Gate 4 population)
## What
Rank files by frequency of appearance in merged MRs over a time window. Show
change_type breakdown (modified vs added vs deleted). Optionally filter by project.
## Why
Hot files are where bugs live. This is a proven engineering metric (see "Your Code
as a Crime Scene" by Adam Tornhill). High-churn files deserve extra test coverage,
better documentation, and architectural review.
## Data Required
- `mr_file_changes` (new_path, change_type, merge_request_id) — needs Gate 4 population
- `merge_requests` (merged_at, state='merged')
## Implementation Sketch
```sql
SELECT
mfc.new_path,
p.path_with_namespace,
COUNT(*) as total_changes,
SUM(CASE WHEN mfc.change_type = 'modified' THEN 1 ELSE 0 END) as modifications,
SUM(CASE WHEN mfc.change_type = 'added' THEN 1 ELSE 0 END) as additions,
SUM(CASE WHEN mfc.change_type = 'deleted' THEN 1 ELSE 0 END) as deletions,
SUM(CASE WHEN mfc.change_type = 'renamed' THEN 1 ELSE 0 END) as renames,
COUNT(DISTINCT mr.author_username) as unique_authors
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
JOIN projects p ON mfc.project_id = p.id
WHERE mr.state = 'merged'
AND mr.merged_at >= ?1
GROUP BY mfc.new_path, p.path_with_namespace
ORDER BY total_changes DESC
LIMIT ?2;
```
## Human Output
```
File Hotspots (last 90 days, top 20)
File Changes Authors Type Breakdown
src/auth/middleware.rs 18 4 14 mod, 3 add, 1 del
src/api/routes.rs 15 3 12 mod, 2 add, 1 rename
src/db/migrations.rs 12 2 8 mod, 4 add
tests/integration/auth_test.rs 11 3 9 mod, 2 add
```
## Downsides
- Requires `mr_file_changes` to be populated (Gate 4 ingestion)
- Doesn't distinguish meaningful changes from trivial ones (formatting, imports)
- Configuration files (CI, Cargo.toml) will rank high but aren't risky
## Extensions
- `lore hotspots --exclude "*.toml,*.yml"` — filter out config files
- `lore hotspots --dir src/auth/` — scope to directory
- Combine with `lore silos` for risk scoring: high churn + bus factor 1 = critical
- Complexity trend: correlate with discussion count (churn + many discussions = problematic)

docs/ideas/idle.md Normal file

@@ -0,0 +1,69 @@
# Idle Work Detector
- **Command:** `lore idle [--days <N>] [--labels <pattern>]`
- **Confidence:** 73%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — label event querying with configurable patterns
## What
Find entities that received an "in progress" or similar label but have had no
discussion activity for N days. Cross-reference with assignee to show who might
have forgotten about something.
## Why
Forgotten WIP is invisible waste. Developers start work, get pulled to something
urgent, and the original task sits idle. This makes it visible before it becomes
a problem.
## Data Required
All exists today:
- `resource_label_events` (label_name, action='add', created_at)
- `discussions` (last_note_at for entity activity)
- `issues` / `merge_requests` (state, assignees)
- `issue_assignees` / `mr_assignees`
## Implementation Sketch
```
1. Query resource_label_events for labels matching "in progress" patterns
Default patterns: "in-progress", "in_progress", "doing", "wip",
"workflow::in-progress", "status::in-progress"
Configurable via --labels flag
2. For each entity that has received an "in progress" label:
a. Check if the label was subsequently removed (if so, skip)
b. Get last_note_at from discussions for that entity
c. Flag if last_note_at is older than threshold
3. Join with assignees for attribution
```
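The checks in step 2 reduce to two small predicates. The sketch below assumes label events and `last_note_at` are available as millisecond timestamps; `LabelAction`, `label_still_applied`, and `idle_days` are hypothetical names.

```rust
const MS_PER_DAY: i64 = 86_400_000;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LabelAction {
    Add,
    Remove,
}

/// True when the most recent event for the label is an `add`.
/// `events` holds (created_at_ms, action) pairs for one label on one entity,
/// in any order.
pub fn label_still_applied(events: &[(i64, LabelAction)]) -> bool {
    events
        .iter()
        .max_by_key(|(created_at, _)| *created_at)
        .map_or(false, |(_, action)| *action == LabelAction::Add)
}

/// Whole days since the last note, or None when below the threshold.
pub fn idle_days(last_note_at_ms: i64, now_ms: i64, threshold_days: i64) -> Option<i64> {
    let days = (now_ms - last_note_at_ms) / MS_PER_DAY;
    (days >= threshold_days).then_some(days)
}
```

Looking at the latest event rather than "was it ever removed" handles labels that were removed and later re-applied.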
## Human Output
```
Idle Work (labeled "in progress" but no activity for 14+ days)
group/backend
#90 Rate limiting design assigned to: charlie idle 18 days
Last activity: label +priority::high by dave
#85 Cache invalidation fix assigned to: alice idle 21 days
Last activity: discussion comment by bob
group/frontend
!230 Dashboard redesign assigned to: eve idle 14 days
Last activity: DiffNote by dave
```
## Downsides
- Requires label naming conventions; no universal standard
- Work may be happening outside GitLab (local branch, design doc)
- "Idle" threshold is subjective; 14 days may be normal for large features
## Extensions
- `lore idle --assignee alice` — personal idle work check
- `lore idle --notify` — generate message templates for nudging owners
- Configurable label patterns in config.json for team-specific workflows


@@ -0,0 +1,92 @@
# Cross-Project Impact Graph
- **Command:** `lore impact-graph [--format json|dot|mermaid]`
- **Confidence:** 75%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — aggregation over entity_references, graph output formatting
## What
Aggregate `entity_references` by project pair to produce a weighted adjacency matrix
showing how projects reference each other. Output as JSON, DOT, or Mermaid for
visualization.
## Why
Makes invisible architectural coupling visible. "Backend and frontend repos have
47 cross-references this quarter" tells you about tight coupling that may need
architectural attention.
## Data Required
All exists today:
- `entity_references` (source/target entity IDs)
- `issues` / `merge_requests` (project_id for source/target)
- `projects` (path_with_namespace)
## Implementation Sketch
```sql
-- Project-to-project reference counts
WITH ref_projects AS (
SELECT
CASE er.source_entity_type
WHEN 'issue' THEN i_src.project_id
WHEN 'merge_request' THEN mr_src.project_id
END as source_project_id,
CASE er.target_entity_type
WHEN 'issue' THEN i_tgt.project_id
WHEN 'merge_request' THEN mr_tgt.project_id
END as target_project_id,
er.reference_type
FROM entity_references er
LEFT JOIN issues i_src ON er.source_entity_type = 'issue' AND er.source_entity_id = i_src.id
LEFT JOIN merge_requests mr_src ON er.source_entity_type = 'merge_request' AND er.source_entity_id = mr_src.id
LEFT JOIN issues i_tgt ON er.target_entity_type = 'issue' AND er.target_entity_id = i_tgt.id
LEFT JOIN merge_requests mr_tgt ON er.target_entity_type = 'merge_request' AND er.target_entity_id = mr_tgt.id
WHERE er.target_entity_id IS NOT NULL -- resolved references only
)
SELECT
p_src.path_with_namespace as source_project,
p_tgt.path_with_namespace as target_project,
rp.reference_type, -- the er alias is only visible inside the CTE
COUNT(*) as weight
FROM ref_projects rp
JOIN projects p_src ON rp.source_project_id = p_src.id
JOIN projects p_tgt ON rp.target_project_id = p_tgt.id
WHERE rp.source_project_id != rp.target_project_id -- cross-project only
GROUP BY p_src.path_with_namespace, p_tgt.path_with_namespace, rp.reference_type
ORDER BY weight DESC;
```
## Output Formats
### Mermaid
```mermaid
graph LR
Backend -->|closes 23| Frontend
Backend -->|mentioned 47| Infrastructure
Frontend -->|mentioned 12| Backend
```
### DOT
```dot
digraph impact {
"group/backend" -> "group/frontend" [label="closes: 23"];
"group/backend" -> "group/infra" [label="mentioned: 47"];
}
```
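Rendering the aggregated rows as DOT is mostly string formatting. A minimal sketch, assuming the query results have been collected into a hypothetical `ProjectEdge` struct; `to_dot` and `escape` are likewise illustrative names.

```rust
/// One aggregated row: references of a given type between two projects.
/// (Hypothetical struct mirroring the query's output columns.)
pub struct ProjectEdge {
    pub source_project: String,
    pub target_project: String,
    pub reference_type: String,
    pub weight: u64,
}

/// Escape backslashes and double quotes for use inside a DOT quoted string.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Render the edges as a DOT digraph in the format shown above.
pub fn to_dot(edges: &[ProjectEdge]) -> String {
    let mut out = String::from("digraph impact {\n");
    for e in edges {
        out.push_str(&format!(
            "  \"{}\" -> \"{}\" [label=\"{}: {}\"];\n",
            escape(&e.source_project),
            escape(&e.target_project),
            escape(&e.reference_type),
            e.weight
        ));
    }
    out.push_str("}\n");
    out
}
```

Quoting every node name is what lets paths such as `group/backend` pass through without further sanitizing.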
## Downsides
- Requires multiple projects synced; limited value for single-project users
- "Mentioned" references are noisy (high volume, low signal)
- Doesn't capture coupling through shared libraries or APIs (code-level coupling)
## Extensions
- `lore impact-graph --since 90d` — time-scoped coupling analysis
- `lore impact-graph --type closes` — only meaningful reference types
- Include unresolved references to show dependencies on un-synced projects
- Coupling trend: is cross-project coupling increasing over time?

docs/ideas/label-audit.md Normal file

@@ -0,0 +1,97 @@
# Label Hygiene Audit
- **Command:** `lore label-audit`
- **Confidence:** 82%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — straightforward aggregation queries
## What
Report on label health:
- Labels used only once (may be typos or abandoned experiments)
- Labels applied and removed within 1 hour (likely mistakes)
- Labels with no active issues/MRs (orphaned)
- Label name collisions across projects (same name, different meaning)
- Labels never used at all (defined but not applied)
## Why
Label sprawl is real and makes filtering useless over time. Teams create labels
ad-hoc and never clean them up. This simple audit surfaces maintenance tasks.
## Data Required
All exists today:
- `labels` (name, project_id)
- `issue_labels` / `mr_labels` (usage counts)
- `resource_label_events` (add/remove pairs for mistake detection)
- `issues` / `merge_requests` (state for "active" filtering)
## Implementation Sketch
```sql
-- Labels used only once
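-- DISTINCT counts are needed because the two LEFT JOINs multiply rows
SELECT l.name, p.path_with_namespace,
COUNT(DISTINCT il.issue_id) + COUNT(DISTINCT ml.merge_request_id) as usage
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
GROUP BY l.id
HAVING COUNT(DISTINCT il.issue_id) + COUNT(DISTINCT ml.merge_request_id) = 1;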
-- Flash labels (applied and removed within 1 hour)
SELECT
rle1.label_name,
rle1.created_at as added_at,
rle2.created_at as removed_at,
(rle2.created_at - rle1.created_at) / 60000 as minutes_active
FROM resource_label_events rle1
JOIN resource_label_events rle2
ON (rle1.issue_id = rle2.issue_id OR rle1.merge_request_id = rle2.merge_request_id) -- cover MRs as well as issues
AND rle1.label_name = rle2.label_name
AND rle1.action = 'add'
AND rle2.action = 'remove'
AND rle2.created_at > rle1.created_at
AND (rle2.created_at - rle1.created_at) < 3600000;
-- Unused labels (defined but never applied)
SELECT l.name, p.path_with_namespace
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
WHERE il.issue_id IS NULL AND ml.merge_request_id IS NULL;
```
## Human Output
```
Label Audit
Unused Labels (4):
group/backend: deprecated-v1, needs-triage, wontfix-maybe
group/frontend: old-design
Single-Use Labels (3):
group/backend: perf-regression (1 issue)
group/frontend: ux-debt (1 MR), mobile-only (1 issue)
Flash Labels (applied < 1hr, 2):
group/backend #90: +priority::critical then -priority::critical (12 min)
group/backend #85: +blocked then -blocked (5 min)
Cross-Project Collisions (1):
"needs-review" used in group/backend (32 uses) AND group/frontend (8 uses)
```
## Downsides
- Low glamour; this is janitorial work
- Single-use labels may be legitimate (one-off categorization)
- Cross-project collisions may be intentional (shared vocabulary)
## Extensions
- `lore label-audit --fix` — suggest deletions for unused labels
- Trend: label count over time (is sprawl increasing?)

74
docs/ideas/label-flow.md Normal file

@@ -0,0 +1,74 @@
# Label Velocity
- **Command:** `lore label-flow <from-label> <to-label>`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — self-join on resource_label_events, percentile computation
## What
For a given label pair (e.g., "needs-review" to "approved"), compute median and P90
transition times using `resource_label_events`. Shows how fast work moves through
your process labels.
Also supports: single label dwell time (how long does "in-progress" stay applied?).
## Why
Process bottlenecks become quantifiable. "Our code review takes a median of 3 days"
is actionable data for retrospectives and process improvement.
## Data Required
All exists today:
- `resource_label_events` (label_name, action, created_at, issue_id, merge_request_id)
## Implementation Sketch
```sql
-- Label A → Label B transition time
WITH add_a AS (
SELECT issue_id, merge_request_id, MIN(created_at) as added_at
FROM resource_label_events
WHERE label_name = ?1 AND action = 'add'
GROUP BY issue_id, merge_request_id
),
add_b AS (
SELECT issue_id, merge_request_id, MIN(created_at) as added_at
FROM resource_label_events
WHERE label_name = ?2 AND action = 'add'
GROUP BY issue_id, merge_request_id
)
SELECT
(b.added_at - a.added_at) / 3600000.0 as hours_transition
FROM add_a a
JOIN add_b b ON a.issue_id = b.issue_id OR a.merge_request_id = b.merge_request_id
WHERE b.added_at > a.added_at;
```
Then compute percentiles in Rust (median, P75, P90).
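A minimal sketch of that percentile step, assuming the query's `hours_transition` values have already been collected into a `Vec<f64>`. The function names are hypothetical, and the nearest-rank method is one choice among several (with an even number of samples it reports the lower middle value as the median, where interpolation would average the two).

```rust
/// Nearest-rank percentile over an ascending-sorted slice of durations (hours).
/// Returns None for an empty input.
fn percentile(sorted_hours: &[f64], p: f64) -> Option<f64> {
    if sorted_hours.is_empty() {
        return None;
    }
    // Nearest-rank: the smallest value with at least p% of samples at or below it.
    let rank = (p * sorted_hours.len() as f64 / 100.0).ceil() as usize;
    let idx = rank.clamp(1, sorted_hours.len()) - 1;
    Some(sorted_hours[idx])
}

/// Sorts the raw durations and returns (median, P75, P90).
fn summarize(mut hours: Vec<f64>) -> Option<(f64, f64, f64)> {
    // total_cmp is a total order on f64, so a stray NaN cannot panic the sort.
    hours.sort_by(|a, b| a.total_cmp(b));
    Some((
        percentile(&hours, 50.0)?,
        percentile(&hours, 75.0)?,
        percentile(&hours, 90.0)?,
    ))
}
```

Sorting once and indexing keeps this O(n log n), which is likely negligible next to the self-join that produces the rows.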
## Human Output
```
Label Flow: "needs-review" → "approved"
Transitions: 42 issues/MRs in last 90 days
Median: 18.5 hours
P75: 36.2 hours
P90: 72.8 hours
Slowest: !234 Refactor auth (168 hours)
```
## Downsides
- Only works if teams use label-based workflows consistently
- Labels may be applied out of order or skipped
- Self-join performance could be slow with many events
## Extensions
- `lore label-flow --dwell "in-progress"` — how long does a label stay?
- `lore label-flow --all` — auto-discover common transitions from event data
- Visualization: label state machine with median transition times on edges


@@ -0,0 +1,81 @@
# Milestone Risk Report
- **Command:** `lore milestone-risk [title]`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — milestone + issue aggregation with scope change detection
## What
For each active milestone (or a specific one): show total issues, % closed, issues
added after milestone creation (scope creep), issues with no assignee, issues with
overdue due_date. Flag milestones where completion rate is below expected trajectory.
## Why
Milestone health is usually assessed by gut feel. This provides objective signals
from data already ingested. Project managers can spot risks early.
## Data Required
All exists today:
- `milestones` (title, state, due_date)
- `issues` (milestone_id, state, created_at, due_date, assignee)
- `issue_assignees` (for unassigned detection)
## Implementation Sketch
```sql
SELECT
  m.title,
  m.state,
  m.due_date,
  COUNT(*) as total_issues,
  SUM(CASE WHEN i.state = 'closed' THEN 1 ELSE 0 END) as closed,
  SUM(CASE WHEN i.state = 'opened' THEN 1 ELSE 0 END) as open,
  SUM(CASE WHEN i.created_at > m.created_at THEN 1 ELSE 0 END) as scope_creep,
  -- NOT EXISTS instead of a LEFT JOIN: joining issue_assignees would repeat
  -- each issue once per assignee and inflate every count above
  SUM(CASE WHEN i.state = 'opened' AND NOT EXISTS (
    SELECT 1 FROM issue_assignees ia WHERE ia.issue_id = i.id
  ) THEN 1 ELSE 0 END) as unassigned,
  SUM(CASE WHEN i.due_date < DATE('now') AND i.state = 'opened' THEN 1 ELSE 0 END) as overdue
FROM milestones m
JOIN issues i ON i.milestone_id = m.id
WHERE m.state = 'active'
GROUP BY m.id;
```
Note: the `created_at` comparison only approximates scope creep, because an issue
can be created before the milestone and added to it later. `resource_milestone_events`
records when an issue was added to a milestone, so use those events for precise
scope change detection.
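The "below expected trajectory" flag could be sketched as follows. This is only one possible rule, assuming a milestone start timestamp is available (GitLab milestones carry a start date, but whether it is ingested is not stated here). The names `Status` and `milestone_status` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Status {
    OnTrack,
    AtRisk,
}

/// Compares the fraction of issues closed with the fraction of time elapsed
/// between milestone start and due date. All timestamps share one unit.
fn milestone_status(closed: u32, total: u32, start: i64, due: i64, now: i64) -> Status {
    if total == 0 || due <= start {
        // Nothing to measure; treating this as on track is an arbitrary default.
        return Status::OnTrack;
    }
    let completion = closed as f64 / total as f64;
    let elapsed = ((now - start) as f64 / (due - start) as f64).clamp(0.0, 1.0);
    if completion >= elapsed {
        Status::OnTrack
    } else {
        Status::AtRisk
    }
}
```

A linear trajectory is a simplification; teams that close most issues near the deadline would need a different expected curve.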
## Human Output
```
Milestone Risk Report
v2.0 (due Feb 15, 2025)
Progress: 14/20 closed (70%)
Scope: +3 issues added after milestone start
Risks: 2 issues overdue, 1 issue unassigned
Status: ON TRACK (70% complete, 60% time elapsed)
v2.1 (due Mar 30, 2025)
Progress: 2/15 closed (13%)
Scope: +8 issues added after milestone start
Risks: 5 issues unassigned
Status: AT RISK (13% complete, scope still growing)
```
## Downsides
- Milestone semantics vary wildly between teams
- "Scope creep" detection is noisy if teams batch-add issues to milestones
- due_date comparison assumes consistent timezone handling
## Extensions
- `lore milestone-risk --history` — show scope changes over time
- Velocity estimation: at current closure rate, will the milestone finish on time?
- Combine with label-flow for "how fast are milestone issues moving through workflow"

67
docs/ideas/mr-pipeline.md Normal file

@@ -0,0 +1,67 @@
# MR Pipeline Efficiency
- **Command:** `lore mr-pipeline [--since <date>]`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — builds on bottleneck detector with more stages
## What
Track the full MR lifecycle: creation, first review, all reviews complete (threads
resolved), approval, merge. Compute time spent in each stage across all MRs.
Identify which stage is the bottleneck.
## Why
"Our merge process is slow" is vague. This breaks it into stages so teams can target
the actual bottleneck. Maybe creation-to-review is fast but review-to-merge is slow
(merge queue issues). Maybe first review is fast but resolution takes forever
(contentious code).
## Data Required
All exists today:
- `merge_requests` (created_at, merged_at)
- `notes` (note_type='DiffNote', created_at, author_username)
- `discussions` (resolved, resolvable, merge_request_id)
- `resource_state_events` (state changes with timestamps)
## Implementation Sketch
For each merged MR, compute:
1. **Created → First Review**: MIN(DiffNote.created_at) - mr.created_at
2. **First Review → All Resolved**: MAX(discussion.resolved_at) - MIN(DiffNote.created_at)
3. **All Resolved → Merged**: mr.merged_at - MAX(discussion.resolved_at)
Note: "resolved_at" isn't directly stored but can be approximated from the last
note in resolved discussions, or from state events.
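The three stage durations can be sketched like this, assuming millisecond epoch timestamps (as in the SQL sketches elsewhere in these docs) and an already-approximated resolution time. The struct and field names are hypothetical.

```rust
/// Per-MR timestamps in milliseconds since the Unix epoch.
struct MrTimestamps {
    created_at: i64,
    first_diff_note_at: Option<i64>,
    /// Approximated, e.g. from the last note in resolved discussions.
    last_resolved_at: Option<i64>,
    merged_at: i64,
}

#[derive(Debug, PartialEq)]
struct StageHours {
    created_to_first_review: f64,
    first_review_to_resolved: f64,
    resolved_to_merged: f64,
}

const MS_PER_HOUR: f64 = 3_600_000.0;

/// Returns None when the MR has no review or resolution timestamp, or when
/// the timestamps are out of order (the flow was not linear).
fn stage_hours(mr: &MrTimestamps) -> Option<StageHours> {
    let first = mr.first_diff_note_at?;
    let resolved = mr.last_resolved_at?;
    let ordered = mr.created_at <= first && first <= resolved && resolved <= mr.merged_at;
    if !ordered {
        return None;
    }
    Some(StageHours {
        created_to_first_review: (first - mr.created_at) as f64 / MS_PER_HOUR,
        first_review_to_resolved: (resolved - first) as f64 / MS_PER_HOUR,
        resolved_to_merged: (mr.merged_at - resolved) as f64 / MS_PER_HOUR,
    })
}
```

Skipping out-of-order MRs instead of clamping them to zero keeps back-and-forth review cycles from dragging the medians down; whether that is the right trade-off is a judgment call.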
## Human Output
```
MR Pipeline (last 30 days, 24 merged MRs)
Stage                       Median   P75     P90
Created → First Review      4.2h     12.1h   28.3h
First Review → Resolved     8.1h     24.5h   72.0h   <-- BOTTLENECK
Resolved → Merged           0.5h     1.2h    3.1h
Total (Created → Merged)    18.4h    48.2h   96.1h
Biggest bottleneck: Review resolution (median 8.1h)
Suggestion: Consider breaking large MRs into smaller reviewable chunks
```
## Downsides
- "Resolved" timestamp approximation may be inaccurate
- Pipeline assumes linear flow; real MRs have back-and-forth cycles
- Draft MRs skew metrics (created early, reviewed late intentionally)
## Extensions
- `lore mr-pipeline --exclude-drafts` — cleaner metrics
- Per-project comparison: which project has the fastest pipeline?
- Trend line: weekly pipeline speed over time
- Break down by MR size (files changed) to normalize


@@ -0,0 +1,265 @@
# Multi-Project Ergonomics
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium (multiple small improvements that compound)
## The Problem
Every command that touches project-scoped data requires `-p group/subgroup/project`
to disambiguate. For users with 5+ projects synced, this is:
- Repetitive: typing `-p infra/platform/auth-service` on every query
- Error-prone: mistyping long paths
- Discoverable only by failure: you don't know you need `-p` until a command fails
with an ambiguity error
The fuzzy matching in `resolve_project` is already good (suffix, substring,
case-insensitive) but it only kicks in on the `-p` value itself. There's no way to
set a default, group projects, or scope a whole session.
## Proposed Improvements
### 1. Project Aliases in Config
Let users define short aliases for long project paths.
```json
{
  "projects": [
    { "path": "infra/platform/auth-service", "alias": "auth" },
    { "path": "infra/platform/billing-service", "alias": "billing" },
    { "path": "frontend/customer-portal", "alias": "portal" },
    { "path": "frontend/admin-dashboard", "alias": "admin" }
  ]
}
```
Then: `lore issues -p auth` resolves via alias before falling through to fuzzy match.
**Implementation:** Add optional `alias` field to `ProjectConfig`. In
`resolve_project`, check aliases before the existing exact/suffix/substring cascade.
```rust
use serde::Deserialize;

#[derive(Debug, Clone, Deserialize)]
pub struct ProjectConfig {
    pub path: String,
    /// Optional short name; absent in existing configs, hence the default.
    #[serde(default)]
    pub alias: Option<String>,
}
```
Resolution order becomes:
1. Exact alias match (new)
2. Exact path match
3. Case-insensitive path match
4. Suffix match
5. Substring match
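The resolution order above can be sketched as a cascade that stops at the first rule with any match. This is a simplified, hypothetical stand-in and not the existing `resolve_project`: the `ProjectConfig` here is stripped down, and returning `None` on an ambiguous rule is an assumption where the real code presumably reports an error.

```rust
/// Simplified stand-in for the real config struct.
struct ProjectConfig {
    path: String,
    alias: Option<String>,
}

/// Rules 0..=4 mirror the resolution order: alias, exact path,
/// case-insensitive path, suffix, substring.
fn rule_matches(rule: usize, p: &ProjectConfig, query: &str) -> bool {
    let path = p.path.to_lowercase();
    let q = query.to_lowercase();
    match rule {
        0 => p.alias.as_deref() == Some(query),
        1 => p.path == query,
        2 => path == q,
        3 => path.ends_with(&format!("/{}", q)),
        _ => path.contains(&q),
    }
}

/// Tries each rule in order and stops at the first one that matches anything.
/// Exactly one hit resolves; more than one hit is ambiguous and yields None.
fn resolve_project<'a>(projects: &'a [ProjectConfig], query: &str) -> Option<&'a str> {
    for rule in 0..5 {
        let mut hits = projects.iter().filter(|p| rule_matches(rule, p, query));
        match (hits.next(), hits.next()) {
            (None, _) => continue,
            (Some(only), None) => return Some(only.path.as_str()),
            _ => return None,
        }
    }
    None
}
```

Checking aliases first means an alias can shadow a path fragment of another project, so `lore init` may want to reject aliases that collide with existing path segments.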
### 2. Default Project (`LORE_PROJECT` env var)
Set a default project for your shell session so you don't need `-p` at all.
```bash
export LORE_PROJECT=auth
lore issues # scoped to auth-service
lore mrs --state opened # scoped to auth-service
lore search "timeout bug" # scoped to auth-service
lore issues -p billing # explicit -p overrides the env var
```
**Implementation:** In every command that accepts `-p`, fall back to
`std::env::var("LORE_PROJECT")` when the flag is absent. The `-p` flag always wins.
Could also support a config-level default:
```json
{
"defaultProject": "auth"
}
```
Precedence: CLI flag > env var > config default > (no filter).
### 3. `lore use <project>` — Session Context Switcher
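A command that sets the project context for later `lore` invocations by writing it to a state file. (A child process cannot change its parent shell's environment, so `LORE_PROJECT` itself stays untouched.)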
```bash
lore use auth
# writes ~/.local/state/lore/current-project containing "auth"
lore issues # reads current-project file, scopes to auth
lore use --clear # removes the file, back to all-project mode
lore use # shows current project context
```
This is similar to `kubectl config use-context`, `nvm use`, or `tfenv use`.
**Implementation:** Write a one-line file at a known state path. Each command reads
it as a default that ranks below the CLI flag and env var but above the config default.
Precedence: CLI flag > env var > `lore use` state file > config default > (no filter).
### 4. `lore projects` — Project Listing and Discovery
A dedicated command to see what's synced, with aliases and activity stats.
```bash
$ lore projects
Alias     Path                             Issues   MRs   Last Sync
auth      infra/platform/auth-service      142      87    2h ago
billing   infra/platform/billing-service   56       34    2h ago
portal    frontend/customer-portal         203      112   2h ago
admin     frontend/admin-dashboard         28       15    3d ago
-         data/ml-pipeline                 89       45    2h ago
```
Robot mode returns the same as JSON with alias, path, counts, and last sync time.
**Implementation:** Query `projects` joined with `COUNT(issues)`, `COUNT(merge_requests)`,
and `MAX(sync_runs.finished_at)`. Overlay aliases from config.
### 5. Project Groups in Config
Let users define named groups of projects for batch scoping.
```json
{
  "projectGroups": {
    "backend": ["auth", "billing", "data/ml-pipeline"],
    "frontend": ["portal", "admin"],
    "all-infra": ["auth", "billing"]
  }
}
```
Then: `lore issues -p @backend` (or `--group backend`) queries across all projects
in the group.
**Implementation:** When `-p` value starts with `@`, look up the group and resolve
each member project. Pass as a `Vec<i64>` of project IDs to the query layer.
This is especially powerful for:
- `lore search "auth bug" -p @backend` — search across related repos
- `lore digest --since 7d -p @frontend` — team-scoped activity digest
- `lore timeline "deployment" -p @all-infra` — cross-repo timeline
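The `@group` lookup could be sketched as below. The function name and the plain `HashMap` representation of `projectGroups` are assumptions. Each returned member would still need to go through project resolution to obtain the `Vec<i64>` of project IDs.

```rust
use std::collections::HashMap;

/// Expands a `-p` value into the project references it stands for.
/// `@name` is looked up in `groups`; anything else passes through unchanged.
/// Returns None for an unknown group.
fn expand_scope(value: &str, groups: &HashMap<String, Vec<String>>) -> Option<Vec<String>> {
    match value.strip_prefix('@') {
        Some(name) => groups.get(name).cloned(),
        None => Some(vec![value.to_string()]),
    }
}
```

Returning `None` for an unknown group lets the caller produce a specific error instead of silently querying every project.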
### 6. Git-Aware Project Detection
When running `lore` from inside a git repo that matches a synced project, auto-scope
to that project without any flags.
```bash
cd ~/code/auth-service
lore issues # auto-detects this is infra/platform/auth-service
```
**Implementation:** Read `.git/config` for the remote URL, extract the project path,
check if it matches a synced project. Only activate when exactly one project matches.
Detection logic:
```
1. Check if cwd is inside a git repo (find .git)
2. Parse git remote origin URL
3. Extract path component (e.g., "infra/platform/auth-service.git" → "infra/platform/auth-service")
4. Match against synced projects
5. If exactly one match, use as implicit -p
6. If ambiguous or no match, do nothing (fall through to normal behavior)
```
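Steps 2 and 3 of the detection logic might look like this sketch. It assumes the remote URL has already been read from `.git/config`, covers the common scp-like and `scheme://` forms only, and uses a hypothetical function name. Matching against synced projects (steps 4 to 6) is left out.

```rust
/// Extracts "group/subgroup/project" from a git remote URL.
/// Handles the scp-like SSH form (git@host:path.git) and URL forms
/// (https://host/path.git, ssh://git@host:port/path.git).
fn project_path_from_remote(url: &str) -> Option<String> {
    let url = url.trim();
    let path = if let Some((_, rest)) = url.split_once("://") {
        // scheme://[user@]host[:port]/path -> everything after the first '/'
        rest.split_once('/')?.1
    } else {
        // scp-like: [user@]host:path -> everything after the first ':'
        url.split_once(':')?.1
    };
    let path = path.trim_matches('/');
    let path = path.strip_suffix(".git").unwrap_or(path);
    if path.is_empty() {
        None
    } else {
        Some(path.to_string())
    }
}
```

Instances served under a URL prefix (for example `https://host/gitlab/group/project`) would need the prefix stripped as well, which this sketch does not attempt.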
Precedence: CLI flag > env var > `lore use` > config default > git detection > (no filter).
This is similar to how `gh` (GitHub CLI) auto-detects the repo you're in.
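The full precedence chain can be expressed as a chain of `Option::or` calls. This is a sketch in which every source has already been read by the caller; the struct and its field names are hypothetical.

```rust
/// One optional project scope per source, highest priority first.
#[derive(Default)]
struct ScopeSources {
    cli_flag: Option<String>,
    env_var: Option<String>,
    use_state_file: Option<String>,
    config_default: Option<String>,
    git_detected: Option<String>,
}

/// CLI flag > env var > `lore use` > config default > git detection > (no filter).
fn effective_project(s: ScopeSources) -> Option<String> {
    s.cli_flag
        .or(s.env_var)
        .or(s.use_state_file)
        .or(s.config_default)
        .or(s.git_detected)
}
```

Keeping the chain in one function gives the prompt integration and every command a single place to agree on which context applies.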
### 7. Prompt Integration / Shell Function
Provide a shell function that shows the current project context in the prompt.
```bash
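# In .zshrc (PROMPT and its % escapes are zsh syntax)
eval "$(lore completions zsh)"
setopt PROMPT_SUBST   # required so $(lore-prompt) is re-evaluated for each prompt
PROMPT='$(lore-prompt)%~ %# '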
```
Output: `[lore:auth] ~/code/auth-service %`
Shows which project `lore` commands will scope to, using the same precedence chain.
Helps users understand what context they're in before running a query.
### 8. Short Project References in Output
Once aliases exist, use them everywhere in output for brevity:
**Before:**
```
infra/platform/auth-service#42 Login timeout bug
infra/platform/auth-service!234 Refactor auth middleware
```
**After:**
```
auth#42 Login timeout bug
auth!234 Refactor auth middleware
```
With `--full-paths` flag to get the verbose form when needed.
## Combined UX Flow
With all improvements, a typical session looks like:
```bash
# One-time config
lore init # sets up aliases during interactive setup
# Daily use
lore use auth # set context
lore issues --state opened # no -p needed
lore search "timeout" # scoped to auth
lore timeline "login flow" # scoped to auth
lore issues -p @backend # cross-repo query via group
lore mrs -p billing # quick alias switch
lore use --clear # back to global
```
Or for the power user who never wants to type `lore use`:
```bash
cd ~/code/auth-service
lore issues # git-aware auto-detection
```
Or for the scripter:
```bash
LORE_PROJECT=auth lore --robot issues -n 50 # env var for automation
```
## Priority Order
Implement in this order for maximum incremental value:
1. **Project aliases** — smallest change, biggest daily friction reduction
2. **`LORE_PROJECT` env var** — trivial to implement, enables scripting
3. **`lore projects` command** — discoverability, completes the alias story
4. **`lore use` context** — nice-to-have for heavy users
5. **Project groups** — high value for multi-repo teams
6. **Git-aware detection** — polish, "it just works" feel
7. **Short refs in output** — ties into timeline issue #001
8. **Prompt integration** — extra polish
## Relationship to Issue #001
The timeline entity-ref ambiguity (issue #001) is solved naturally by project aliases
(improvement 1) and short project references in output (improvement 8). Once aliases
exist, `format_entity_ref` can use the alias as the short project
identifier in multi-project output:
```
auth#42 instead of infra/platform/auth-service#42
```
And in single-project timelines (detected via `lore use` or git-aware), the project
prefix is omitted entirely — matching the current behavior but now intentionally.


@@ -0,0 +1,81 @@
# Recurring Bug Pattern Detector
- **Command:** `lore recurring-patterns [--min-cluster <N>]`
- **Confidence:** 76%
- **Tier:** 3
- **Status:** proposed
- **Effort:** high — vector clustering, threshold tuning
## What
Cluster closed issues by embedding similarity. Identify clusters of 3+ issues that
are semantically similar — these represent recurring problems that need a systemic
fix rather than one-off patches.
## Why
Finding the same bug filed 5 different ways is one of the most impactful things you
can surface. This is a sophisticated use of the embedding pipeline that no competing
tool offers. It turns "we keep having auth issues" from a gut feeling into data.
## Data Required
All exists today:
- `documents` (source_type='issue', content_text)
- `embeddings` (768-dim vectors)
- `issues` (state='closed' for filtering)
## Implementation Sketch
```
1. Collect all embeddings for closed issue documents
2. For each issue, find K nearest neighbors (K=10)
3. Build adjacency graph: edge exists if similarity > threshold (e.g., 0.80)
4. Find connected components (simple DFS/BFS)
5. Filter to components with >= min-cluster members (default 3)
6. For each cluster:
   a. Extract common terms (TF-IDF or simple word frequency)
   b. Sort by recency (most recent issue first)
   c. Report cluster with: theme, member issues, time span
```
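Steps 2 to 5 can be sketched as threshold-graph clustering. For brevity this version compares all pairs by brute force instead of querying K nearest neighbors, assumes one embedding per issue (no multi-chunk aggregation), and uses hypothetical function names.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Returns clusters as sorted lists of indices into `embeddings`.
fn cluster(embeddings: &[Vec<f32>], threshold: f32, min_cluster: usize) -> Vec<Vec<usize>> {
    let n = embeddings.len();
    // Adjacency list: an edge exists when similarity exceeds the threshold.
    let mut adj: Vec<Vec<usize>> = vec![Vec::new(); n];
    for i in 0..n {
        for j in (i + 1)..n {
            if cosine(&embeddings[i], &embeddings[j]) > threshold {
                adj[i].push(j);
                adj[j].push(i);
            }
        }
    }
    // Connected components via iterative DFS.
    let mut seen = vec![false; n];
    let mut clusters = Vec::new();
    for start in 0..n {
        if seen[start] {
            continue;
        }
        seen[start] = true;
        let mut stack = vec![start];
        let mut component = Vec::new();
        while let Some(node) = stack.pop() {
            component.push(node);
            for &next in &adj[node] {
                if !seen[next] {
                    seen[next] = true;
                    stack.push(next);
                }
            }
        }
        // Keep only components that reach the minimum cluster size.
        if component.len() >= min_cluster {
            component.sort_unstable();
            clusters.push(component);
        }
    }
    clusters
}
```

Connected components behave like single-linkage clustering: a chain of pairwise-similar issues can join two otherwise unrelated topics, which is one reason to report a cohesion score per cluster.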
### Similarity Threshold Tuning
This is the critical parameter. Too low = noise, too high = misses.
- Start at 0.80 cosine similarity
- Expose as `--threshold` flag for user tuning
- Report cluster cohesion score for transparency
## Human Output
```
Recurring Patterns (3+ similar closed issues)
Cluster 1: "Authentication timeout errors" (5 issues, spanning 6 months)
#89 Login timeout on slow networks (closed 3d ago)
#72 Auth flow hangs on cellular (closed 2mo ago)
#58 Token refresh timeout (closed 3mo ago)
#45 SSO login timeout for remote users (closed 5mo ago)
#31 Connection timeout in auth middleware (closed 6mo ago)
Avg similarity: 0.87 | Suggested: systemic fix for auth timeout handling
Cluster 2: "Cache invalidation issues" (3 issues, spanning 2 months)
#85 Stale cache after deploy (closed 2w ago)
#77 Cache headers not updated (closed 1mo ago)
#69 Dashboard shows old data after settings change (closed 2mo ago)
Avg similarity: 0.82 | Suggested: review cache invalidation strategy
```
## Downsides
- Clustering quality depends on embedding quality and threshold tuning
- May produce false clusters (issues that mention similar terms but are different problems)
- Computationally expensive for large issue counts (N^2 comparisons)
- Need to handle multi-chunk documents (aggregate embeddings)
## Extensions
- `lore recurring-patterns --open` — find clusters in open issues (duplicates to merge)
- `lore recurring-patterns --cross-project` — patterns across repos
- Trend detection: are cluster sizes growing? (escalating problem)
- Export as report for engineering retrospectives


@@ -0,0 +1,78 @@
# DiffNote Coverage Map
- **Command:** `lore review-coverage <mr-iid>`
- **Confidence:** 75%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — join DiffNote positions with mr_file_changes
## What
For a specific MR, show which files received review comments (DiffNotes) vs. which
files were changed but received no review attention. Highlights blind spots in code
review.
## Why
Large MRs often have files that get reviewed thoroughly and files that slip through
with no comments. This makes the review coverage visible so teams can decide if
un-reviewed files need a second look.
## Data Required
All exists today:
- `mr_file_changes` (new_path per MR)
- `notes` (position_new_path, note_type='DiffNote', discussion_id)
- `discussions` (merge_request_id)
## Implementation Sketch
```sql
SELECT
  mfc.new_path,
  mfc.change_type,
  COUNT(DISTINCT n.id) as review_comments,
  COUNT(DISTINCT n.discussion_id) as review_threads,
  CASE WHEN COUNT(n.id) = 0 THEN 'NOT REVIEWED' ELSE 'REVIEWED' END as status
FROM mr_file_changes mfc
-- Join discussions first so notes are limited to this MR; joining notes on
-- path alone would also count DiffNotes from other MRs touching the same file
LEFT JOIN discussions d ON d.merge_request_id = mfc.merge_request_id
LEFT JOIN notes n ON n.discussion_id = d.id
  AND n.position_new_path = mfc.new_path
  AND n.note_type = 'DiffNote'
  AND n.is_system = 0
WHERE mfc.merge_request_id = ?1
GROUP BY mfc.new_path, mfc.change_type
ORDER BY review_comments DESC;
```
## Human Output
```
Review Coverage for !234 — Refactor auth middleware
REVIEWED (5 files, 23 comments)
src/auth/middleware.rs 12 comments, 4 threads
src/auth/jwt.rs 6 comments, 2 threads
src/auth/session.rs 3 comments, 1 thread
tests/auth/middleware_test.rs 1 comment, 1 thread
src/auth/mod.rs 1 comment, 1 thread
NOT REVIEWED (3 files)
src/auth/types.rs modified [no review comments]
src/api/routes.rs modified [no review comments]
Cargo.toml modified [no review comments]
Coverage: 5/8 files (62.5%)
```
## Downsides
- Reviewers may have reviewed a file without leaving comments (approval by silence)
- position_new_path matching may not cover all DiffNote position formats
- Config files (Cargo.toml) not being reviewed is usually fine
## Extensions
- `lore review-coverage --all --since 30d` — aggregate coverage across all MRs
- Per-reviewer breakdown: which reviewers cover which files?
- Coverage heatmap: files that consistently escape review across multiple MRs

90
docs/ideas/silos.md Normal file

@@ -0,0 +1,90 @@
# Knowledge Silo Detection
- **Command:** `lore silos [--min-changes <N>]`
- **Confidence:** 87%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — requires mr_file_changes population (Gate 4)
## What
For each file path (or directory), count unique MR authors. Flag paths where only
1 person has ever authored changes (bus factor = 1). Aggregate by directory to show
silo areas.
## Why
Bus factor analysis is critical for team resilience. If only one person has ever
touched the auth module, that's a risk. This uses data already ingested to surface
knowledge concentration that's otherwise invisible.
## Data Required
- `mr_file_changes` (new_path, merge_request_id) — needs Gate 4 ingestion
- `merge_requests` (author_username, state='merged')
- `projects` (path_with_namespace)
## Implementation Sketch
```sql
-- Find directories with bus factor = 1
WITH file_authors AS (
SELECT
mfc.new_path,
mr.author_username,
p.path_with_namespace,
mfc.project_id
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
JOIN projects p ON mfc.project_id = p.id
WHERE mr.state = 'merged'
),
directory_authors AS (
SELECT
project_id,
path_with_namespace,
-- Extract directory: everything up to and including the last '/'.
-- RTRIM strips trailing non-'/' characters, i.e. the file name.
CASE
WHEN INSTR(new_path, '/') > 0
THEN RTRIM(new_path, REPLACE(new_path, '/', ''))
ELSE '.'
END as directory,
COUNT(DISTINCT author_username) as unique_authors,
COUNT(*) as total_changes,
GROUP_CONCAT(DISTINCT author_username) as authors
FROM file_authors
GROUP BY project_id, directory
)
SELECT * FROM directory_authors
WHERE unique_authors = 1
AND total_changes >= ?1 -- min-changes threshold
ORDER BY total_changes DESC;
```
## Human Output
```
Knowledge Silos (bus factor = 1, min 3 changes)
group/backend
src/auth/ alice (8 changes) HIGH RISK
src/billing/ bob (5 changes) HIGH RISK
src/utils/cache/ charlie (3 changes) MODERATE RISK
group/frontend
src/admin/ dave (12 changes) HIGH RISK
```
## Downsides
- Historical authors may have left the team; needs recency weighting
- Requires `mr_file_changes` to be populated (Gate 4)
- Single-author directories may be intentional (ownership model)
- Directory aggregation heuristic is imperfect for deep nesting
## Extensions
- `lore silos --since 180d` — only count recent activity
- `lore silos --depth 2` — aggregate at directory depth N
- Combine with `lore experts` to show both silos and experts in one view
- Risk scoring: weight by directory size, change frequency, recency


@@ -0,0 +1,95 @@
# Similar Issues Finder
- **Command:** `lore similar <iid>`
- **Confidence:** 95%
- **Tier:** 1
- **Status:** proposed
- **Effort:** low — infrastructure exists, needs one new query path
## What
Given an issue IID, find the N most semantically similar issues using the existing
vector embeddings. Show similarity score and overlapping keywords.
Can also work with MRs: `lore similar --mr <iid>`.
## Why
Duplicate detection is a constant problem on active projects. "Is this bug already
filed?" becomes a one-liner. This is the most natural use of the embedding pipeline
and the feature people expect when they hear "semantic search."
## Data Required
All exists today:
- `documents` table (source_type, source_id, content_text)
- `embeddings` virtual table (768-dim vectors via sqlite-vec)
- `embedding_metadata` (document_hash for staleness check)
## Implementation Sketch
```
1. Resolve IID → issue.id → document.id (via source_type='issue', source_id)
2. Look up embedding vector(s) for that document
3. Query sqlite-vec for K nearest neighbors (K = limit * 2 for headroom)
4. Filter to source_type='issue' (or 'merge_request' if --include-mrs)
5. Exclude self
6. Rank by cosine similarity
7. Return top N with: iid, title, project, similarity_score, url
```
### SQL Core
```sql
-- Get the embedding for target document (chunk 0 = representative)
SELECT embedding FROM embeddings WHERE rowid = ?1 * 1000;
-- Find nearest neighbors
SELECT
rowid,
distance
FROM embeddings
WHERE embedding MATCH ?1
AND k = ?2
ORDER BY distance;
-- Resolve back to entities
SELECT d.source_type, d.source_id, d.title, d.url, i.iid, i.state
FROM documents d
JOIN issues i ON d.source_id = i.id AND d.source_type = 'issue'
WHERE d.id = ?;
```
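Steps 5 to 7 (exclude self, rank, return top N) could be sketched as follows. It assumes the rowid layout implied by the `?1 * 1000` lookup above, namely `rowid = document_id * 1000 + chunk_index`, and ranks by the raw sqlite-vec distance (smaller is closer) instead of converting to a similarity score. The function name is hypothetical.

```rust
use std::collections::HashMap;

/// Assumed layout: rowid = document_id * 1000 + chunk_index.
const CHUNKS_PER_DOC: i64 = 1000;

/// Collapses chunk-level KNN hits `(rowid, distance)` to one entry per
/// document, keeps the smallest distance, drops the query document, and
/// returns the `limit` closest documents as `(document_id, distance)`.
fn top_similar(hits: &[(i64, f64)], self_doc_id: i64, limit: usize) -> Vec<(i64, f64)> {
    let mut best: HashMap<i64, f64> = HashMap::new();
    for &(rowid, distance) in hits {
        let doc_id = rowid / CHUNKS_PER_DOC;
        if doc_id == self_doc_id {
            continue;
        }
        let entry = best.entry(doc_id).or_insert(distance);
        if distance < *entry {
            *entry = distance;
        }
    }
    let mut ranked: Vec<(i64, f64)> = best.into_iter().collect();
    // Ascending distance; ties broken by document id for deterministic output.
    ranked.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    ranked.truncate(limit);
    ranked
}
```

Deduplicating by document is why the sketch asks for `limit * 2` neighbors: several chunks of the same long issue can occupy multiple KNN slots.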
## Robot Mode Output
```json
{
  "ok": true,
  "data": {
    "query_issue": { "iid": 42, "title": "Login timeout on slow networks" },
    "similar": [
      {
        "iid": 38,
        "title": "Connection timeout in auth flow",
        "project": "group/backend",
        "similarity": 0.87,
        "state": "closed",
        "url": "https://gitlab.com/group/backend/-/issues/38"
      }
    ]
  },
  "meta": { "elapsed_ms": 45, "candidates_scanned": 200 }
}
```
## Downsides
- Embedding quality depends on description quality; short issues may not match well
- Multi-chunk documents need aggregation strategy (use chunk 0 or average?)
- Requires embeddings to be generated first (`lore embed`)
## Extensions
- `lore similar --open-only` to filter to unresolved issues (duplicate triage)
- `lore similar --text "free text query"` to find issues similar to arbitrary text
- Batch mode: find all potential duplicate clusters across the entire database


@@ -0,0 +1,100 @@
# Stale Discussion Finder
- **Command:** `lore stale-discussions [--days <N>]`
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** low — single query, minimal formatting
## What
List unresolved, resolvable discussions where `last_note_at` is older than a
threshold (default 14 days), grouped by parent entity. Prioritize by discussion
count per entity (more stale threads = more urgent).
## Why
Unresolved discussions are silent blockers. They prevent MR merges, stall
decision-making, and represent forgotten conversations. This surfaces them so teams
can take action: resolve, respond, or explicitly mark as won't-fix.
## Data Required
All exists today:
- `discussions` (resolved, resolvable, last_note_at)
- `issues` / `merge_requests` (for parent entity context)
## Implementation Sketch
```sql
-- ?1 = now, ?2 = cutoff (now minus the staleness threshold), both in ms since epoch.
-- Using one parameter for both would report days past the cutoff, not days stale.
SELECT
  d.id,
  d.noteable_type,
  CASE WHEN d.issue_id IS NOT NULL THEN i.iid ELSE mr.iid END as entity_iid,
  CASE WHEN d.issue_id IS NOT NULL THEN i.title ELSE mr.title END as entity_title,
  p.path_with_namespace,
  d.last_note_at,
  ((?1 - d.last_note_at) / 86400000) as days_stale,
  COUNT(*) OVER (PARTITION BY COALESCE(d.issue_id, d.merge_request_id), d.noteable_type) as stale_count_for_entity
FROM discussions d
JOIN projects p ON d.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests mr ON d.merge_request_id = mr.id
WHERE d.resolved = 0
  AND d.resolvable = 1
  AND d.last_note_at < ?2
ORDER BY days_stale DESC;
```
## Human Output Format
```
Stale Discussions (14+ days without activity)
group/backend !234 — Refactor auth middleware (3 stale threads)
Discussion #a1b2c3 (28d stale) "Should we use JWT or session tokens?"
Discussion #d4e5f6 (21d stale) "Error handling for expired tokens"
Discussion #g7h8i9 (14d stale) "Performance implications of per-request validation"
group/backend #90 — Rate limiting design (1 stale thread)
Discussion #j0k1l2 (18d stale) "Redis vs in-memory rate counter"
```
## Robot Mode Output
```json
{
  "ok": true,
  "data": {
    "threshold_days": 14,
    "total_stale": 4,
    "entities": [
      {
        "type": "merge_request",
        "iid": 234,
        "title": "Refactor auth middleware",
        "project": "group/backend",
        "stale_discussions": [
          {
            "discussion_id": "a1b2c3",
            "days_stale": 28,
            "first_note_preview": "Should we use JWT or session tokens?"
          }
        ]
      }
    ]
  }
}
```
## Downsides
- Some discussions are intentionally left open (design docs, long-running threads)
- Could produce noise in repos with loose discussion hygiene
- Doesn't distinguish "stale and blocking" from "stale and irrelevant"
## Extensions
- `lore stale-discussions --mr-only` — focus on MR review threads (most actionable)
- `lore stale-discussions --author alice` — "threads I started that went quiet"
- `lore stale-discussions --assignee bob` — "threads on my MRs that need attention"

82
docs/ideas/unlinked.md Normal file

@@ -0,0 +1,82 @@
# Unlinked MR Finder
- **Command:** `lore unlinked [--since <date>]`
- **Confidence:** 83%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — LEFT JOIN queries
## What
Two reports:
1. Merged MRs with no entity_references at all (no "closes", no "mentioned",
no "related") — orphan MRs with no issue traceability
2. Closed issues with no MR reference — issues closed manually without code change
## Why
Process compliance metric. Unlinked MRs mean lost traceability — you can't trace
a code change back to a requirement. Manually closed issues might mean work was done
outside the tracked process, or issues were closed prematurely.
## Data Required
All exists today:
- `merge_requests` (state, merged_at)
- `issues` (state, closed/updated_at)
- `entity_references` (for join/anti-join)
## Implementation Sketch
```sql
-- Orphan merged MRs (no references at all)
SELECT mr.iid, mr.title, mr.author_username, mr.merged_at,
p.path_with_namespace
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN entity_references er
ON er.source_entity_type = 'merge_request' AND er.source_entity_id = mr.id
WHERE mr.state = 'merged'
AND mr.merged_at >= ?1
AND er.id IS NULL
ORDER BY mr.merged_at DESC;
-- Closed issues with no MR reference
SELECT i.iid, i.title, i.author_username, i.updated_at,
p.path_with_namespace
FROM issues i
JOIN projects p ON i.project_id = p.id
LEFT JOIN entity_references er
ON er.target_entity_type = 'issue' AND er.target_entity_id = i.id
AND er.source_entity_type = 'merge_request'
WHERE i.state = 'closed'
AND i.updated_at >= ?1
AND er.id IS NULL
ORDER BY i.updated_at DESC;
```
## Human Output
```
Unlinked MRs (merged with no issue reference, last 30 days)
!245 Fix typo in README (alice, merged 2d ago)
!239 Update CI pipeline (bob, merged 1w ago)
!236 Bump dependency versions (charlie, merged 2w ago)
Orphan Closed Issues (closed without any MR, last 30 days)
#92 Update documentation for v2 (closed by dave, 3d ago)
#88 Investigate memory usage (closed by eve, 2w ago)
```
## Downsides
- Some MRs legitimately don't reference issues (chores, CI fixes, dependency bumps)
- Some issues are legitimately closed without code (questions, duplicates, won't-fix)
- Noise level depends on team discipline
## Extensions
- `lore unlinked --ignore-labels "chore,ci"` — filter out expected orphans
- Compliance score: % of MRs with issue links over time (trend metric)

102
docs/ideas/weekly-digest.md Normal file

@@ -0,0 +1,102 @@
# Weekly Digest Generator
- **Command:** `lore weekly [--since <date>]`
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — builds on digest infrastructure, adds markdown formatting
## What
Auto-generate a markdown document summarizing the week: MRs merged (grouped by
project), issues closed, new issues opened, ongoing discussions, milestone progress.
Formatted for pasting into Slack, email, or team standup notes.
Default window is 7 days. `--since` overrides.
## Why
Every team lead writes a weekly status update. This writes itself from the data.
Leverages everything gitlore has ingested. Saves 30-60 minutes of manual summarization
per week.
## Data Required
Same as digest (all exists today):
- `resource_state_events`, `merge_requests`, `issues`, `discussions`
- `milestones` for progress tracking
## Implementation Sketch
This is essentially `lore digest --since 7d --format markdown` with:
1. Section headers for each category
2. Milestone progress bars (X/Y issues closed)
3. "Highlights" section with the most-discussed items
4. "Risks" section with overdue issues and stale MRs
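As a rough illustration of item 2, a milestone progress bar can be rendered from the closed and total issue counts. This is a minimal sketch, not the planned implementation: the function name `milestone_progress`, the `width` parameter, and the `#`/`.` cell characters are all assumptions made for this example.

```rust
/// Render a fixed-width text progress bar for a milestone.
/// All names here are illustrative; nothing below exists in gitlore today.
fn milestone_progress(closed: u32, total: u32, width: usize) -> String {
    // Guard against empty milestones so we never divide by zero.
    if total == 0 {
        return format!("[{}] 0/0 (n/a)", ".".repeat(width));
    }
    // Clamp in case the caller passes more closed issues than exist.
    let closed = closed.min(total);
    let filled = (closed as usize * width) / total as usize;
    let percent = (closed as u64 * 100) / total as u64;
    format!(
        "[{}{}] {}/{} ({}%)",
        "#".repeat(filled),
        ".".repeat(width - filled),
        closed,
        total,
        percent
    )
}
```

Integer division rounds the bar and the percentage down, so a milestone only shows as full once every issue is closed.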
### Markdown Template
```markdown
# Weekly Summary — Jan 20-27, 2025
## Highlights
- **!234** Refactor auth middleware merged (12 discussions, 4 reviewers)
- **#95** New critical bug: Rate limiting returns 500
## Merged (3)
| MR | Title | Author | Reviewers |
|----|-------|--------|-----------|
| !234 | Refactor auth middleware | alice | bob, charlie |
| !231 | Fix connection pool leak | bob | alice |
| !45 | Update dashboard layout | eve | dave |
## Closed Issues (2)
- **#89** Login timeout on slow networks (closed by alice)
- **#87** Stale cache headers (closed by bob)
## New Issues (3)
- **#95** Rate limiting returns 500 (priority::high, assigned to charlie)
- **#94** Add rate limit documentation (priority::low)
- **#93** Flaky test in CI pipeline (assigned to dave)
## Milestone Progress
- **v2.0** — 14/20 issues closed (70%) — due Feb 15
- **v1.9-hotfix** — 3/3 issues closed (100%) — COMPLETE
## Active Discussions
- **#90** 8 new comments this week (needs-review)
- **!230** 5 review threads unresolved
```
## Robot Mode Output
```json
{
"ok": true,
"data": {
"period": { "from": "2025-01-20", "to": "2025-01-27" },
"merged_count": 3,
"closed_count": 2,
"opened_count": 3,
"highlights": [...],
"merged": [...],
"closed": [...],
"opened": [...],
"milestones": [...],
"active_discussions": [...]
}
}
```
## Downsides
- Formatting preferences vary by team; hard to please everyone
- "Highlights" ranking is heuristic (discussion count as proxy for importance)
- Doesn't capture work done outside GitLab
## Extensions
- `lore weekly --project group/backend` — single project scope
- `lore weekly --author alice` — personal weekly summary
- `lore weekly --output weekly.md` — write to file
- Scheduled generation via cron + robot mode


@@ -0,0 +1,140 @@
# 001: Timeline human output omits project path from entity references
- **Severity:** medium
- **Component:** `src/cli/commands/timeline.rs`
- **Status:** open
## Problem
The `lore timeline` human-readable output renders entity references as bare `#42` or
`!234` without the project path. When multiple projects are synced, this makes the
output ambiguous — issue `#42` in `group/backend` and `#42` in `group/frontend` are
indistinguishable.
### Affected code
`format_entity_ref` at `src/cli/commands/timeline.rs:201-207`:
```rust
fn format_entity_ref(entity_type: &str, iid: i64) -> String {
match entity_type {
"issue" => format!("#{iid}"),
"merge_request" => format!("!{iid}"),
_ => format!("{entity_type}:{iid}"),
}
}
```
This function is called in three places:
1. **Event lines** (`print_timeline_event`, line 130) — each event row shows `#42`
with no project context
2. **Footer seed list** (`print_timeline_footer`, line 161) — seed entities listed as
`#42, !234` with no project disambiguation
3. **Collect stage summaries** (`timeline_collect.rs:107`) — the `summary` field itself
bakes in `"Issue #42 created: ..."` without project
### Current output (ambiguous)
```
2025-01-20 CREATED #42 Issue #42 created: Login timeout bug @alice
2025-01-21 LABEL+ #42 Label added: priority::high @dave
2025-01-22 CREATED !234 MR !234 created: Refactor auth middleware @alice
2025-01-25 MERGED !234 MR !234 merged @bob
Seed entities: #42, !234
```
When multiple projects are synced, a reader cannot tell which project `#42` belongs to.
## Robot mode is partially affected
The robot JSON output (`EventJson`, lines 387-416) does include a `project` field per
event, so programmatic consumers can disambiguate. However, the `summary` string field
still bakes in bare `#42` without project context, which is misleading if an agent uses
the summary for display.
## Proposed fix
### 1. Add project to `format_entity_ref`
Pass `project_path` into `format_entity_ref` and use GitLab's full reference format:
```rust
fn format_entity_ref(entity_type: &str, iid: i64, project_path: &str) -> String {
match entity_type {
"issue" => format!("{project_path}#{iid}"),
"merge_request" => format!("{project_path}!{iid}"),
_ => format!("{project_path}/{entity_type}:{iid}"),
}
}
```
### 2. Smart elision for single-project timelines
When all events belong to the same project, the full path is visual noise. Detect
this and fall back to bare `#42` / `!234`:
```rust
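// Requires `use std::collections::HashSet;` at the top of the file.
fn should_show_project(events: &[TimelineEvent]) -> bool {
    // Collect the distinct project paths; more than one means the output is ambiguous.
    let projects = events.iter().map(|e| &e.project_path).collect::<HashSet<_>>();
    projects.len() > 1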
}
```
Then conditionally format:
```rust
let entity_ref = if show_project {
format_entity_ref(&event.entity_type, event.entity_iid, &event.project_path)
} else {
format_entity_ref_short(&event.entity_type, event.entity_iid)
};
```
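Steps 1 and 2 can be combined into one self-contained sketch. The `TimelineEvent` struct below is a hypothetical, trimmed-down stand-in for the real type, and `event` and `render_refs` are helper names invented for this example; only `format_entity_ref`, `format_entity_ref_short`, and `should_show_project` come from the proposal above.

```rust
use std::collections::HashSet;

// Hypothetical, trimmed-down event type; the real `TimelineEvent` has more fields.
struct TimelineEvent {
    entity_type: String,
    entity_iid: i64,
    project_path: String,
}

// Illustrative constructor used only to keep the example short.
fn event(entity_type: &str, iid: i64, project_path: &str) -> TimelineEvent {
    TimelineEvent {
        entity_type: entity_type.to_string(),
        entity_iid: iid,
        project_path: project_path.to_string(),
    }
}

// Full GitLab-style reference, e.g. `group/backend#42`.
fn format_entity_ref(entity_type: &str, iid: i64, project_path: &str) -> String {
    match entity_type {
        "issue" => format!("{project_path}#{iid}"),
        "merge_request" => format!("{project_path}!{iid}"),
        _ => format!("{project_path}/{entity_type}:{iid}"),
    }
}

// Bare reference, e.g. `#42`, used when only one project is present.
fn format_entity_ref_short(entity_type: &str, iid: i64) -> String {
    match entity_type {
        "issue" => format!("#{iid}"),
        "merge_request" => format!("!{iid}"),
        _ => format!("{entity_type}:{iid}"),
    }
}

// True when the events span more than one project.
fn should_show_project(events: &[TimelineEvent]) -> bool {
    let projects: HashSet<&str> = events.iter().map(|e| e.project_path.as_str()).collect();
    projects.len() > 1
}

// Decide once per timeline, then format every reference the same way.
fn render_refs(events: &[TimelineEvent]) -> Vec<String> {
    let show_project = should_show_project(events);
    events
        .iter()
        .map(|e| {
            if show_project {
                format_entity_ref(&e.entity_type, e.entity_iid, &e.project_path)
            } else {
                format_entity_ref_short(&e.entity_type, e.entity_iid)
            }
        })
        .collect()
}
```

Computing `show_project` once per timeline keeps the output consistent: either every row carries a project path or none does.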
### 3. Fix summary strings in collect stage
`timeline_collect.rs:107` bakes the summary as `"Issue #42 created: title"`. This
should include the project when multi-project:
```rust
let prefix = if multi_project {
format!("{type_label} {project_path}#{iid}")
} else {
format!("{type_label} #{iid}")
};
summary = format!("{prefix} created: {title_str}");
```
Same pattern for the merge summary at lines 317 and 347.
### 4. Update footer seed list
`print_timeline_footer` (lines 155-164) should also use the project-aware format:
```rust
result.seed_entities.iter()
.map(|e| format_entity_ref(&e.entity_type, e.entity_iid, &e.project_path))
```
## Expected output after fix
### Single project (no change)
```
2025-01-20 CREATED #42 Issue #42 created: Login timeout bug @alice
```
### Multi-project (project path added)
```
2025-01-20 CREATED group/backend#42 Issue group/backend#42 created: Login timeout @alice
2025-01-22 CREATED group/frontend#42 Issue group/frontend#42 created: Broken layout @eve
```
## Impact
- Human output: ambiguous for multi-project users (the primary use case for gitlore)
- Robot output: summary field misleading, but `project` field provides workaround
- Timeline footer: seed entity list ambiguous
- Collect-stage summaries: baked-in bare references propagate to both renderers


@@ -0,0 +1,179 @@
# Deep Performance Audit Report
**Date:** 2026-02-12
**Branch:** `perf-audit` (e9bacc94)
**Parent:** `039ab1c2` (master, v0.6.1)
---
## Methodology
1. **Baseline** — measured p50/p95 latency for all major commands with warm cache
2. **Profile** — used macOS `sample` profiler and `EXPLAIN QUERY PLAN` to identify hotspots
3. **Golden output** — captured exact numeric outputs before changes as equivalence oracle
4. **One lever per change** — each optimization isolated and independently benchmarked
5. **Revert threshold** — any optimization <1.1x speedup reverted per audit rules
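The revert rule in step 5 reduces to a single comparison. The sketch below assumes latencies are given in milliseconds as `f64`; the names `speedup`, `should_keep`, and `MIN_SPEEDUP` are illustrative and not taken from the audit tooling.

```rust
/// Revert threshold from the methodology: a change must reach at least 1.1x.
const MIN_SPEEDUP: f64 = 1.1;

/// Speedup factor: how many times faster the optimized run is than the baseline.
fn speedup(before_ms: f64, after_ms: f64) -> f64 {
    before_ms / after_ms
}

/// True when the optimization clears the threshold and should be kept.
fn should_keep(before_ms: f64, after_ms: f64) -> bool {
    speedup(before_ms, after_ms) >= MIN_SPEEDUP
}
```

Applied to the figures reported in this audit, `should_keep(112.0, 66.0)` holds for the stats change, while `should_keep(1030.0, 995.0)` does not, which matches the decision to revert the two-phase search experiment.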
---
## Baseline Measurements (warm cache, release build)
| Command | Latency | Notes |
|---------|---------|-------|
| `who --path src/core/db.rs` (expert) | 2200ms | **Hotspot** |
| `who --active` | 83-93ms | Acceptable |
| `who workload` | 22ms | Fast |
| `stats` | 107-112ms | **Hotspot** |
| `search "authentication"` | 1030ms | **Hotspot** (library-level) |
| `list issues -n 50` | ~40ms | Fast |
---
## Optimization 1: INDEXED BY for DiffNote Queries
**Target:** `src/cli/commands/who.rs` — expert and reviews query paths
**Problem:** SQLite query planner chose `idx_notes_system` (38% selectivity, 106K rows) over `idx_notes_diffnote_path_created` (9.3% selectivity, 26K rows) for path-filtered DiffNote queries. The partial index `WHERE noteable_type = 'MergeRequest' AND type = 'DiffNote'` is far more selective but the planner's cost model didn't pick it.
**Change:** Added `INDEXED BY idx_notes_diffnote_path_created` to all 8 SQL queries across `query_expert`, `query_expert_details`, `query_reviews`, `build_path_query` (probes 1 & 2), and `suffix_probe`.
**Results:**
| Query | Before | After | Speedup |
|-------|--------|-------|---------|
| expert (specific path) | 2200ms | 56-58ms | **38x** |
| expert (broad path) | 2200ms | 83ms | **26x** |
| reviews | 1800ms | 24ms | **75x** |
**Isomorphism proof:** `INDEXED BY` only changes which index the planner uses, not the query semantics. Same rows matched, same ordering, same output. Verified by golden output comparison across 5+ runs.
---
## Optimization 2: Conditional Aggregates in Stats
**Target:** `src/cli/commands/stats.rs`
**Problem:** 12+ sequential `COUNT(*)` queries each requiring a full table scan of `documents` (61K rows). Each scan touched the same pages but couldn't share work.
**Changes:**
- Documents: 5 sequential COUNTs -> 1 query with `SUM(CASE WHEN ... THEN 1 END)`
- FTS count: `SELECT COUNT(*) FROM documents_fts` (virtual table, slow) -> `SELECT COUNT(*) FROM documents_fts_docsize` (shadow B-tree table, 19x faster)
- Embeddings: 2 queries -> 1 with `COUNT(DISTINCT document_id), COUNT(*)`
- Dirty sources: 2 queries -> 1 with conditional aggregates
- Pending fetches: 2 queries -> 1 each (discussions, dependents)
**Results:**
| Metric | Before | After | Speedup |
|--------|--------|-------|---------|
| Warm median | 112ms | 66ms | **1.70x** |
| Cold | 1220ms | ~700ms | ~1.7x |
**Golden output verified:**
```
total:61652, issues:8241, mrs:10018, discussions:43393, truncated:63
fts:61652, embedded:61652, chunks:88161
```
All values match exactly across before/after runs.
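**Isomorphism proof:** `SUM(CASE WHEN x THEN 1 END)` returns the same value as `COUNT(*) WHERE x` whenever at least one row satisfies `x`. With no matching rows it yields `NULL` rather than `0`, so an `ELSE 0` branch or a `COALESCE(..., 0)` wrapper is needed for buckets that can be empty. The FTS5 shadow table `documents_fts_docsize` has exactly one row per FTS document by SQLite specification, so `COUNT(*)` on it equals the virtual table count.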
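The idea behind the merged query, one pass that fills several counters instead of one scan per counter, can be shown with an in-memory analogy. This is only a sketch: the `Doc` and `DocCounts` types and the `source_type` string values are assumptions for illustration, not the schema or code in `stats.rs`.

```rust
// Hypothetical row shape; the real `documents` table has more columns.
struct Doc {
    source_type: &'static str,
    truncated: bool,
}

#[derive(Debug, Default, PartialEq)]
struct DocCounts {
    total: u64,
    issues: u64,
    mrs: u64,
    discussions: u64,
    truncated: u64,
}

/// Single pass over the rows, updating every counter as it goes.
/// This mirrors one query with several conditional aggregates.
fn count_documents(docs: &[Doc]) -> DocCounts {
    let mut counts = DocCounts::default();
    for doc in docs {
        counts.total += 1;
        match doc.source_type {
            "issue" => counts.issues += 1,
            "merge_request" => counts.mrs += 1,
            "discussion" => counts.discussions += 1,
            _ => {}
        }
        if doc.truncated {
            counts.truncated += 1;
        }
    }
    counts
}
```

Because the counters start at zero, an empty bucket reports `0` here; the SQL form needs the `ELSE 0` or `COALESCE` guard to get the same behavior.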
---
## Investigation: Two-Phase FTS Search (REVERTED)
**Target:** `src/search/fts.rs`, `src/cli/commands/search.rs`
**Hypothesis:** FTS5 `snippet()` generation is expensive. Splitting search into Phase 1 (score-only MATCH+bm25) and Phase 2 (snippet for filtered results only) should reduce work.
**Implementation:** Created `fetch_fts_snippets()` that retrieves snippets only for post-filter document IDs via `json_each()` join.
**Results:**
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| search (limit 20) | 1030ms | 995ms | 3.5% |
**Decision:** Reverted. Per audit rules, <1.1x speedup does not justify added code complexity.
**Root cause:** The bottleneck is not snippet generation but `MATCH` + `bm25()` scoring itself. Profiling showed `strspn` (FTS5 tokenizer) and `memmove` as the top CPU consumers. The same query runs in 30ms on system sqlite3 but 1030ms in rusqlite's bundled SQLite: a ~34x gap with snippets, and ~125x when measured against the 8ms snippet-free system run, despite both being SQLite 3.51.x compiled at -O3.
---
## Library-Level Finding: Bundled SQLite FTS5 Performance
**Observation:** FTS5 MATCH+bm25 queries are far slower in rusqlite's bundled SQLite than in system sqlite3: ~34x against the 30ms system run with `snippet()`, and ~125x against the 8ms run without it.
| Environment | Query Time | Notes |
|-------------|-----------|-------|
| System sqlite3 (macOS) | 30ms (with snippet), 8ms (without) | Same .db file |
| rusqlite bundled | 1030ms | `features = ["bundled"]`, OPT_LEVEL=3 |
**Profiler data (macOS `sample`):**
- Top hotspot: `strspn` in FTS5 tokenizer
- Secondary: `memmove` in FTS5 internals
- Scaling: latency grows with the result limit (limit 5 = 497ms, limit 20 = 995ms, i.e. ~33ms per additional result)
**Possible causes:**
- Bundled SQLite compiled without platform-specific optimizations (SIMD, etc.)
- Different memory allocator behavior
- Missing compile-time tuning flags
**Recommendation for future:** Investigate switching from `features = ["bundled"]` to system SQLite linkage, or audit the bundled compile flags in the `libsqlite3-sys` build script.
---
## Exploration Agent Findings (Informational)
Four parallel exploration agents surveyed the entire codebase. Key findings beyond what was already addressed:
### Ingestion Pipeline
- Serial DB writes in async context (acceptable — rusqlite is synchronous)
- Label ingestion uses individual inserts (potential batch optimization, low priority)
### CLI / GitLab Client
- GraphQL client recreated per call (`client.rs:98-100`) — caches connection pool, minor
- Double JSON deserialization in GraphQL responses — medium priority
- N+1 subqueries in `list` command (`list.rs:408-423`) — 4 correlated subqueries per row
### Search / Embedding
- No N+1 patterns, no O(n^2) algorithms
- Chunking is O(n) single-pass with proper UTF-8 safety
- Ollama concurrency model is sound (parallel HTTP, serial DB writes)
### Database / Documents
- O(n^2) prefix sum in `truncation.rs` — low traffic path
- String allocation patterns in extractors — micro-optimization territory
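For the prefix-sum finding, a linear-time running total is the usual replacement for recomputing each partial sum from scratch. The sketch below is not the code in `truncation.rs`; the function names and the "item lengths against a budget" framing are assumptions chosen to illustrate the technique.

```rust
/// Running totals: `sums[i]` is the combined length of the first `i` items.
/// Built in one pass, so the cost is O(n) rather than O(n^2).
fn prefix_sums(lengths: &[usize]) -> Vec<usize> {
    let mut sums = Vec::with_capacity(lengths.len() + 1);
    let mut running = 0usize;
    sums.push(running);
    for &len in lengths {
        running += len;
        sums.push(running);
    }
    sums
}

/// Largest number of leading items whose combined length fits in `budget`.
fn items_within_budget(lengths: &[usize], budget: usize) -> usize {
    let sums = prefix_sums(lengths);
    // `sums` is non-decreasing and starts at 0, so the partition point is at least 1.
    sums.partition_point(|&s| s <= budget) - 1
}
```

Since the totals are non-decreasing, `partition_point` performs a binary search, so the lookup after the single pass costs O(log n).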
---
## Opportunity Matrix
| Candidate | Impact | Confidence | Effort | Score | Status |
|-----------|--------|------------|--------|-------|--------|
| INDEXED BY for DiffNote | Very High | High | Low | **9.0** | Shipped |
| Stats conditional aggregates | Medium | High | Low | **7.0** | Shipped |
| Bundled SQLite FTS5 | Very High | Medium | High | 5.0 | Documented |
| List N+1 subqueries | Medium | Medium | Medium | 4.0 | Backlog |
| GraphQL double deser | Low | Medium | Low | 3.5 | Backlog |
| Truncation O(n^2) | Low | High | Low | 3.0 | Backlog |
---
## Files Modified
| File | Change |
|------|--------|
| `src/cli/commands/who.rs` | INDEXED BY hints on 8 SQL queries |
| `src/cli/commands/stats.rs` | Conditional aggregates, FTS5 shadow table, merged queries |
---
## Quality Gates
- All 603 tests pass
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean
- Golden output verified for both optimizations


@@ -39,7 +39,7 @@ Five gates, each independently verifiable and shippable:
- **Opt-in event ingestion.** New config flag `sync.fetchResourceEvents` (default `true`) controls whether the sync pipeline fetches event data. Users who don't need temporal features skip the additional API calls.
- **Application-level graph traversal.** Cross-reference expansion uses BFS in Rust, not recursive SQL CTEs. Capped at configurable depth (default 1) for predictable performance.
- **Evolutionary library extraction.** New commands are built with typed return structs from day one. Old commands are not retrofitted until a concrete consumer (MCP server, web UI) requires it.
- **Phase A fields cherry-picked as needed.** `merge_commit_sha` and `squash_commit_sha` are added in this phase's migration. Remaining Phase A fields are handled in their own migration later.
- **Phase A fields cherry-picked as needed.** `merge_commit_sha` and `squash_commit_sha` are added in migration 015 and populated during MR ingestion. Remaining Phase A fields are handled in their own migration later.
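The depth-capped BFS mentioned under application-level graph traversal could look roughly like the sketch below. It assumes entities are plain integer ids and that edges have already been loaded into an adjacency map; `expand_references` and its parameters are hypothetical names, not the project's API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first expansion from `seed`, following edges at most `max_depth` hops.
/// The adjacency map is a hypothetical stand-in for rows loaded from the
/// cross-reference table.
fn expand_references(
    edges: &HashMap<u64, Vec<u64>>,
    seed: u64,
    max_depth: usize,
) -> Vec<u64> {
    let mut visited = HashSet::from([seed]);
    let mut order = vec![seed];
    let mut queue = VecDeque::from([(seed, 0usize)]);

    while let Some((node, depth)) = queue.pop_front() {
        // Stop expanding once the configured depth is reached.
        if depth == max_depth {
            continue;
        }
        if let Some(neighbors) = edges.get(&node) {
            for &next in neighbors {
                // `insert` returns false for already-seen nodes, which breaks cycles.
                if visited.insert(next) {
                    order.push(next);
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    order
}
```

With the default depth of 1 only direct neighbors of the seed are returned, which keeps the work bounded by the seed's own reference count.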
### Scope Boundaries
@@ -71,9 +71,9 @@ The original approach was to parse system note body text with regex to extract s
System note parsing is still used for events without structured APIs (see Gate 2), but with the explicit understanding that it's best-effort and fragile for non-English instances.
### 1.2 Schema (Migration 010)
### 1.2 Schema (Migration 011)
**File:** `migrations/010_resource_events.sql`
**File:** `migrations/011_resource_events.sql`
```sql
-- State change events (opened, closed, reopened, merged, locked)
@@ -89,16 +89,16 @@ CREATE TABLE resource_state_events (
actor_gitlab_id INTEGER, -- GitLab user ID (stable; usernames can change)
actor_username TEXT, -- display/search convenience
created_at INTEGER NOT NULL, -- ms epoch UTC
-- "closed by MR" link: structured by GitLab, not parsed from text
source_merge_request_id INTEGER, -- GitLab's MR iid that caused this state change
source_commit TEXT, -- commit SHA that caused this state change
UNIQUE(gitlab_id, project_id),
source_merge_request_iid INTEGER, -- iid from source_merge_request ref
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL)
OR (issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
CREATE UNIQUE INDEX uq_state_events_gitlab ON resource_state_events(gitlab_id, project_id);
CREATE INDEX idx_state_events_issue ON resource_state_events(issue_id)
WHERE issue_id IS NOT NULL;
CREATE INDEX idx_state_events_mr ON resource_state_events(merge_request_id)
@@ -114,24 +114,25 @@ CREATE TABLE resource_label_events (
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
label_name TEXT NOT NULL,
action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
actor_gitlab_id INTEGER, -- GitLab user ID (stable; usernames can change)
actor_username TEXT, -- display/search convenience
label_name TEXT, -- nullable: GitLab returns null for deleted labels (see §1.2.1)
actor_gitlab_id INTEGER,
actor_username TEXT,
created_at INTEGER NOT NULL, -- ms epoch UTC
UNIQUE(gitlab_id, project_id),
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL)
OR (issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
CREATE UNIQUE INDEX uq_label_events_gitlab ON resource_label_events(gitlab_id, project_id);
CREATE INDEX idx_label_events_issue ON resource_label_events(issue_id)
WHERE issue_id IS NOT NULL;
CREATE INDEX idx_label_events_mr ON resource_label_events(merge_request_id)
WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_label_events_created ON resource_label_events(created_at);
CREATE INDEX idx_label_events_label ON resource_label_events(label_name);
-- Note: idx_label_events_label was added in migration 015 (not in the original 011)
-- Milestone change events (add, remove)
-- Source: GET /projects/:id/issues/:iid/resource_milestone_events
@@ -142,19 +143,20 @@ CREATE TABLE resource_milestone_events (
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
milestone_title TEXT NOT NULL,
milestone_id INTEGER,
action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
actor_gitlab_id INTEGER, -- GitLab user ID (stable; usernames can change)
actor_username TEXT, -- display/search convenience
milestone_title TEXT, -- nullable: GitLab returns null for deleted milestones (see §1.2.1)
milestone_id INTEGER,
actor_gitlab_id INTEGER,
actor_username TEXT,
created_at INTEGER NOT NULL, -- ms epoch UTC
UNIQUE(gitlab_id, project_id),
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL)
OR (issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
CREATE UNIQUE INDEX uq_milestone_events_gitlab ON resource_milestone_events(gitlab_id, project_id);
CREATE INDEX idx_milestone_events_issue ON resource_milestone_events(issue_id)
WHERE issue_id IS NOT NULL;
CREATE INDEX idx_milestone_events_mr ON resource_milestone_events(merge_request_id)
@@ -162,6 +164,27 @@ CREATE INDEX idx_milestone_events_mr ON resource_milestone_events(merge_request_
CREATE INDEX idx_milestone_events_created ON resource_milestone_events(created_at);
```
#### 1.2.1 Nullable Label and Milestone Fields (Migration 012)
GitLab returns `null` for `label` and `milestone` in Resource Events when the referenced label or milestone has been deleted from the project. This was discovered in production after the initial schema deployed with `NOT NULL` constraints.
**Migration 012** recreates `resource_label_events` and `resource_milestone_events` with nullable `label_name` and `milestone_title` columns. The table-swap approach (create new → copy → drop old → rename) is required because SQLite doesn't support `ALTER COLUMN`.
Timeline queries that encounter null labels/milestones display `"[deleted label]"` or `"[deleted milestone]"` in human output and omit the name field in robot JSON.
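A minimal sketch of the human-output fallback for deleted labels follows. The helper names `label_display` and `label_event_summary` are invented for this example, and the "Label removed" wording is an assumption modeled on the "Label added" text that appears in timeline output.

```rust
/// Placeholder shown in human output when the label no longer exists.
fn label_display(label_name: Option<&str>) -> &str {
    label_name.unwrap_or("[deleted label]")
}

/// Build the human-readable summary for a label event.
/// `action` is the stored value: "add" or "remove".
fn label_event_summary(action: &str, label_name: Option<&str>) -> String {
    let verb = match action {
        "add" => "added",
        "remove" => "removed",
        other => other,
    };
    format!("Label {verb}: {}", label_display(label_name))
}
```

Milestone events would follow the same shape with `"[deleted milestone]"` as the placeholder.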
#### 1.2.2 Resource Event Watermarks (Migration 013)
To avoid re-fetching resource events for every entity on every sync, a watermark column tracks the `updated_at` value at the time of last successful event fetch:
```sql
ALTER TABLE issues ADD COLUMN resource_events_synced_for_updated_at INTEGER;
ALTER TABLE merge_requests ADD COLUMN resource_events_synced_for_updated_at INTEGER;
```
**Incremental behavior:** During sync, only entities where `updated_at > COALESCE(resource_events_synced_for_updated_at, 0)` are enqueued for resource event fetching. On `--full` sync, these watermarks are reset to `NULL`, causing all entities to be re-enqueued.
This mirrors the existing `discussions_synced_for_updated_at` pattern and works in conjunction with the dependent fetch queue.
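The enqueue condition translates directly into a small predicate. In this sketch the `NULL` watermark is modeled as `None` and timestamps are ms-epoch integers as in the schema; the function name `needs_event_fetch` is an assumption for illustration.

```rust
/// Mirrors `updated_at > COALESCE(resource_events_synced_for_updated_at, 0)`.
/// A `None` watermark (never synced, or reset by `--full`) always enqueues.
fn needs_event_fetch(updated_at: i64, synced_for_updated_at: Option<i64>) -> bool {
    updated_at > synced_for_updated_at.unwrap_or(0)
}
```

An entity whose `updated_at` equals its watermark is skipped, so an unchanged entity costs no API calls on incremental sync.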
### 1.3 Config Extension
**File:** `src/core/config.rs`
@@ -223,7 +246,7 @@ pub struct GitLabLabelEvent {
pub created_at: String,
pub resource_type: String,
pub resource_id: i64,
pub label: GitLabLabelRef,
pub label: Option<GitLabLabelRef>, // nullable: deleted labels return null
pub action: String, // "add" | "remove"
}
@@ -234,7 +257,7 @@ pub struct GitLabMilestoneEvent {
pub created_at: String,
pub resource_type: String,
pub resource_id: i64,
pub milestone: GitLabMilestoneRef,
pub milestone: Option<GitLabMilestoneRef>, // nullable: deleted milestones return null
pub action: String, // "add" | "remove"
}
```
@@ -243,7 +266,7 @@ pub struct GitLabMilestoneEvent {
**Architecture:** Generic dependent-fetch queue, generalizing the `pending_discussion_fetches` pattern. A single queue table serves all dependent resource types across Gates 1, 2, and 4, avoiding schema churn as new fetch types are added.
**New queue table (in migration 010):**
**New queue table (in migration 011):**
```sql
-- Generic queue for all dependent resource fetches (events, closes_issues, diffs)
@@ -302,16 +325,32 @@ Acceptable for initial sync. Incremental sync adds negligible overhead.
### 1.7 Acceptance Criteria
- [ ] Migration 010 creates all three event tables + generic dependent fetch queue
- [ ] `lore sync` fetches resource events for changed entities when `fetchResourceEvents` is true
- [ ] `lore sync --no-events` skips event fetching
- [ ] Event fetch failures are queued for retry with exponential backoff
- [ ] Stale locks (crashed sync) automatically reclaimed on next run
- [ ] `lore count events` shows event counts by type
- [x] Migration 011 creates all three event tables + generic dependent fetch queue
- [x] `lore sync` fetches resource events for changed entities when `fetchResourceEvents` is true
- [x] `lore sync --no-events` skips event fetching
- [x] Event fetch failures are queued for retry with exponential backoff
- [x] Stale locks (crashed sync) automatically reclaimed on next run
- [x] `lore count events` shows event counts by type
- [ ] `lore stats --check` validates event table referential integrity
- [ ] `lore stats --check` validates dependent job queue health (no stuck locks, retryable jobs visible)
- [ ] Robot mode JSON for all new commands
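The retry criterion above mentions exponential backoff without fixing its parameters. The sketch below shows one common shape, doubling per attempt with a cap; the base delay, the cap, and the name `retry_delay_secs` are assumptions for illustration, not values taken from the implementation.

```rust
/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max_secs`.
/// Saturating arithmetic keeps large attempt counts from overflowing.
fn retry_delay_secs(attempt: u32, base_secs: u64, max_secs: u64) -> u64 {
    // `checked_shl` returns None when the shift is 64 or more; treat that as "huge".
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    base_secs.saturating_mul(factor).min(max_secs)
}
```

With a hypothetical 30-second base and a one-hour cap, the delays run 30, 60, 120, 240 seconds and so on until they level off at 3600.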
### 1.8 Observability Infrastructure (Migration 014)
The sync pipeline includes lightweight observability via `sync_runs` enrichment. Migration 014 adds:
```sql
ALTER TABLE sync_runs ADD COLUMN run_id TEXT; -- correlation ID for log tracing
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
CREATE INDEX IF NOT EXISTS idx_sync_runs_run_id ON sync_runs(run_id);
```
**Purpose:** The `run_id` column correlates log entries (via `tracing`) with sync run records. `total_items_processed` and `total_errors` provide aggregate counts for `lore sync-status` and robot mode health checks without requiring log parsing.
This is separate from the event tables but supports the same operational workflow — answering "did the last sync succeed?" and "how many entities were processed?" programmatically.
---
## Gate 2: Cross-Reference Extraction
@@ -320,10 +359,10 @@ Acceptable for initial sync. Incremental sync adds negligible overhead.
Temporal queries need to follow links between entities: "MR !567 closed issue #234", "issue #234 mentioned in MR !567", "#299 was opened as a follow-up to !567". These relationships are captured in two places:
1. **Structured API:** `GET /projects/:id/merge_requests/:iid/closes_issues` returns issues that close when the MR merges. Also, `resource_state_events` includes `source_merge_request_id` for "closed by MR" events.
1. **Structured API:** `GET /projects/:id/merge_requests/:iid/closes_issues` returns issues that close when the MR merges. Also, `resource_state_events` includes `source_merge_request_iid` for "closed by MR" events.
2. **System notes:** Cross-references like "mentioned in !456" and "closed by !789" appear in system note body text.
### 2.2 Schema (in Migration 010)
### 2.2 Schema (in Migration 011)
```sql
-- Cross-references between entities
@@ -340,33 +379,49 @@ Temporal queries need to follow links between entities: "MR !567 closed issue #2
-- silently dropping them. Timeline output marks these as "[external]".
CREATE TABLE entity_references (
id INTEGER PRIMARY KEY,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
source_entity_type TEXT NOT NULL CHECK (source_entity_type IN ('issue', 'merge_request')),
source_entity_id INTEGER NOT NULL, -- local DB id
target_entity_type TEXT NOT NULL CHECK (target_entity_type IN ('issue', 'merge_request')),
target_entity_id INTEGER, -- local DB id (NULL when target is unresolved/external)
target_project_path TEXT, -- e.g. "group/other-repo" (populated for cross-project refs)
target_entity_iid INTEGER, -- GitLab iid (populated when target_entity_id is NULL)
reference_type TEXT NOT NULL, -- 'closes' | 'mentioned' | 'related'
source_method TEXT NOT NULL, -- 'api_closes_issues' | 'api_state_event' | 'system_note_parse'
created_at INTEGER, -- when the reference was created (if known)
UNIQUE(source_entity_type, source_entity_id, target_entity_type,
COALESCE(target_entity_id, -1), COALESCE(target_project_path, ''),
COALESCE(target_entity_iid, -1), reference_type)
reference_type TEXT NOT NULL CHECK (reference_type IN ('closes', 'mentioned', 'related')),
source_method TEXT NOT NULL CHECK (source_method IN ('api', 'note_parse', 'description_parse')),
created_at INTEGER NOT NULL -- ms epoch UTC
);
CREATE INDEX idx_refs_source ON entity_references(source_entity_type, source_entity_id);
CREATE INDEX idx_refs_target ON entity_references(target_entity_type, target_entity_id)
-- Unique constraint includes source_method: the same relationship can be discovered by
-- multiple methods (e.g., closes_issues API and a state event), and we store both for provenance.
CREATE UNIQUE INDEX uq_entity_refs ON entity_references(
project_id, source_entity_type, source_entity_id, target_entity_type,
COALESCE(target_entity_id, -1), COALESCE(target_project_path, ''),
COALESCE(target_entity_iid, -1), reference_type, source_method
);
CREATE INDEX idx_entity_refs_source ON entity_references(source_entity_type, source_entity_id);
CREATE INDEX idx_entity_refs_target ON entity_references(target_entity_id)
WHERE target_entity_id IS NOT NULL;
CREATE INDEX idx_refs_unresolved ON entity_references(target_project_path, target_entity_iid)
CREATE INDEX idx_entity_refs_unresolved ON entity_references(target_project_path, target_entity_iid)
WHERE target_entity_id IS NULL;
```
**`source_method` values:**
| Value | Meaning |
|-------|---------|
| `'api'` | Populated from structured GitLab APIs (`closes_issues`, `resource_state_events`) |
| `'note_parse'` | Extracted from system note body text (best-effort, English only) |
| `'description_parse'` | Extracted from issue/MR description body text (future) |
The original design used more granular values (`'api_closes_issues'`, `'api_state_event'`, `'system_note_parse'`). In practice, the API-sourced references don't need sub-method distinction — the `reference_type` already captures the semantic relationship — so the implementation simplified to three values.
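One way to keep the three stored values in sync with the CHECK constraint is a small enum. This is a sketch under assumptions: the type `SourceMethod` and the `from_legacy` helper are hypothetical and only illustrate how the original granular values collapse onto the simplified set.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SourceMethod {
    Api,
    NoteParse,
    DescriptionParse,
}

impl SourceMethod {
    /// Value stored in `entity_references.source_method`.
    fn as_str(self) -> &'static str {
        match self {
            SourceMethod::Api => "api",
            SourceMethod::NoteParse => "note_parse",
            SourceMethod::DescriptionParse => "description_parse",
        }
    }

    /// Map a value from the original, more granular design onto the simplified set.
    /// Current values are accepted unchanged; anything else is rejected.
    fn from_legacy(value: &str) -> Option<Self> {
        match value {
            "api" | "api_closes_issues" | "api_state_event" => Some(SourceMethod::Api),
            "note_parse" | "system_note_parse" => Some(SourceMethod::NoteParse),
            "description_parse" => Some(SourceMethod::DescriptionParse),
            _ => None,
        }
    }
}
```

Routing every write through `as_str` means a typo surfaces at compile time instead of as a CHECK constraint failure at insert time.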
### 2.3 Population Strategy
**Tier 1 — Structured APIs (reliable):**
1. **`closes_issues` endpoint:** After MR ingestion, fetch `GET /projects/:id/merge_requests/:iid/closes_issues`. Insert `reference_type = 'closes'`, `source_method = 'api_closes_issues'`. Source = MR, target = issue.
2. **State events:** When `resource_state_events` contains `source_merge_request_id`, insert `reference_type = 'closes'`, `source_method = 'api_state_event'`. Source = MR (referenced by iid), target = issue (that received the state change).
1. **`closes_issues` endpoint:** After MR ingestion, fetch `GET /projects/:id/merge_requests/:iid/closes_issues`. Insert `reference_type = 'closes'`, `source_method = 'api'`. Source = MR, target = issue.
2. **State events:** When `resource_state_events` contains `source_merge_request_iid`, insert `reference_type = 'closes'`, `source_method = 'api'`. Source = MR (referenced by iid), target = issue (that received the state change).
**Tier 2 — System note parsing (best-effort):**
@@ -385,14 +440,14 @@ closed by #{iid}
**Cross-project references:** When a system note references `{group}/{project}#{iid}` and the target project is not synced locally, store with `target_entity_id = NULL`, `target_project_path = '{group}/{project}'`, `target_entity_iid = {iid}`. These unresolved references are still valuable for timeline narratives — they indicate external dependencies and decision context even when we can't traverse further.
Insert with `source_method = 'system_note_parse'`. Accept that:
Insert with `source_method = 'note_parse'`. Accept that:
- This breaks on non-English GitLab instances
- Format may vary across GitLab versions
- Log parse failures at `debug` level for monitoring
**Tier 3 — Description/body parsing (deferred):**
**Tier 3 — Description/body parsing (`source_method = 'description_parse'`, deferred):**
Issue and MR descriptions often contain `#123` or `!456` references. Parsing these is lower confidence (mentions != relationships) and is deferred to a future iteration.
Issue and MR descriptions often contain `#123` or `!456` references. Parsing these is lower confidence (mentions != relationships) and is deferred to a future iteration. The `source_method` value `'description_parse'` is reserved in the CHECK constraint for this future work.
### 2.4 Ingestion Flow
@@ -401,6 +456,8 @@ The `closes_issues` fetch uses the generic dependent fetch queue (`job_type = 'm
- One additional API call per MR: `GET /projects/:id/merge_requests/:iid/closes_issues`
- Cross-reference parsing from system notes runs as a local post-processing step (no API calls) after all dependent fetches complete
**Watermark pattern (migration 015):** A `closes_issues_synced_for_updated_at` column on `merge_requests` tracks the last `updated_at` value at which closes_issues data was fetched. Only MRs where `updated_at > COALESCE(closes_issues_synced_for_updated_at, 0)` are enqueued for re-fetching. The watermark is updated after successful fetch or after a permanent API error (e.g., 404 for external MRs). On `--full` sync, the watermark is reset to `NULL`.
### 2.5 Acceptance Criteria
- [ ] `entity_references` table populated from `closes_issues` API for all synced MRs
@@ -562,7 +619,7 @@ Evidence notes (`NOTE` events) show the first ~200 characters of FTS5-matched no
"via": {
"from": { "type": "merge_request", "iid": 567, "project": "group/repo" },
"reference_type": "closes",
"source_method": "api_closes_issues"
"source_method": "api"
}
}
],
@@ -639,9 +696,13 @@ Evidence notes (`NOTE` events) show the first ~200 characters of FTS5-matched no
## Gate 4: File Decision History (`lore file-history`)
### 4.1 Schema (Migration 011)
### 4.1 Schema
**File:** `migrations/011_file_changes.sql`
**Commit SHAs (Migration 015 — already applied):**
`merge_commit_sha` and `squash_commit_sha` were added to `merge_requests` in migration 015. These are now populated during MR ingestion and available for Gate 4/5 queries.
**File changes table (future migration — not yet created):**
```sql
-- Files changed by each merge request
@@ -660,11 +721,6 @@ CREATE INDEX idx_mr_files_new_path ON mr_file_changes(new_path);
CREATE INDEX idx_mr_files_old_path ON mr_file_changes(old_path)
WHERE old_path IS NOT NULL;
CREATE INDEX idx_mr_files_mr ON mr_file_changes(merge_request_id);
-- Add commit SHAs to merge_requests (cherry-picked from Phase A)
-- These link MRs to actual git history
ALTER TABLE merge_requests ADD COLUMN merge_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash_commit_sha TEXT;
```
### 4.2 Config Extension
@@ -881,14 +937,16 @@ When git integration is added:
### Migration Numbering
Phase B uses migration numbers starting at 010:
Phase B uses migration numbers 011-015. The original plan assumed migration 010 was available, but chunk config (`010_chunk_config.sql`) was implemented first, shifting everything by +1.
| Migration | Content | Gate |
|-----------|---------|------|
| 010 | Resource event tables, generic dependent fetch queue, entity_references | Gates 1, 2 |
| 011 | mr_file_changes, merge_commit_sha, squash_commit_sha | Gate 4 |
Phase A's complete field capture migration should use 012+ when implemented, skipping fields already added by 011 (`merge_commit_sha`, `squash_commit_sha`).
| Migration | File | Content | Gate |
|-----------|------|---------|------|
| 011 | `011_resource_events.sql` | Resource event tables (state, label, milestone), entity_references, generic dependent fetch queue | Gates 1, 2 |
| 012 | `012_nullable_label_milestone.sql` | Make `label_name` and `milestone_title` nullable for deleted labels/milestones | Gate 1 (fix) |
| 013 | `013_resource_event_watermarks.sql` | Add `resource_events_synced_for_updated_at` to issues and merge_requests | Gate 1 (optimization) |
| 014 | `014_sync_runs_enrichment.sql` | Observability: `run_id`, `total_items_processed`, `total_errors` on sync_runs | Observability |
| 015 | `015_commit_shas_and_closes_watermark.sql` | `merge_commit_sha`, `squash_commit_sha`, `closes_issues_synced_for_updated_at` on merge_requests; `idx_label_events_label` index | Gates 2, 4 |
| TBD | — | `mr_file_changes` table for MR diff data | Gate 4 |
### Backward Compatibility
@@ -909,7 +967,7 @@ Phase A's complete field capture migration should use 012+ when implemented, ski
| GitLab diffs API returns large payloads | Low | Extract file metadata only, discard diff content |
| Cross-reference graph traversal unbounded | Medium | BFS depth capped at configurable limit (default 1); `mentioned` edges excluded by default |
| Cross-project references lost when target not synced | Medium | Unresolved references stored with `target_entity_id = NULL`; still appear in timeline output |
| Phase A migration numbering conflict | Low | Phase B uses 010-011; Phase A uses 012+ |
| Phase A migration numbering conflict | Low | Resolved: chunk config took 010; Phase B shifted to 011-015 |
| Timeline output lacks "why" evidence | Medium | Evidence-bearing notes from FTS5 included as first-class timeline events |
| Squash commits break blame-to-MR mapping | Medium | Tier 2 (git integration) deferred; Tier 1 uses file-level MR matching |

@@ -0,0 +1,202 @@
No `## Rejected Recommendations` section appears in the plan you pasted, so the revisions below are all net-new.
1. **Add an explicit “Bridge Contract” and fix scope inconsistency**
Analysis: The plan says “Three changes” but defines four. More importantly, identifier requirements are scattered. A single contract section prevents drift and makes every new read surface prove it can drive a write call.
```diff
@@
-**Scope**: Three changes, delivered in order:
+**Scope**: Four workstreams, delivered in order:
1. Add `gitlab_discussion_id` to notes output
2. Add `gitlab_discussion_id` to show command discussion groups
3. Add a standalone `discussions` list command
4. Fix robot-docs to list actual field names instead of opaque type references
+
+## Bridge Contract (Cross-Cutting)
+Every read payload that surfaces notes/discussions MUST include:
+- `project_path`
+- `noteable_type`
+- `parent_iid`
+- `gitlab_discussion_id`
+- `gitlab_note_id` (when note-level data is returned)
+This contract is required so agents can deterministically construct `glab api` write calls.
```
2. **Normalize identifier naming now (break ambiguous names)**
Analysis: Current `id`/`gitlab_id` naming is ambiguous in mixed payloads. Rename to explicit `note_id` and `gitlab_note_id` now (you explicitly don't care about backward compatibility). This reduces automation mistakes.
```diff
@@ 1b. Add field to `NoteListRow`
-pub struct NoteListRow {
- pub id: i64,
- pub gitlab_id: i64,
+pub struct NoteListRow {
+ pub note_id: i64, // local DB id
+ pub gitlab_note_id: i64, // GitLab note id
@@
@@ 1c. Add field to `NoteListRowJson`
-pub struct NoteListRowJson {
- pub id: i64,
- pub gitlab_id: i64,
+pub struct NoteListRowJson {
+ pub note_id: i64,
+ pub gitlab_note_id: i64,
@@
-#### 2f. Add `gitlab_note_id` to note detail structs in show
-While we're here, add `gitlab_id` to `NoteDetail`, `MrNoteDetail`, and their JSON
+#### 2f. Add `gitlab_note_id` to note detail structs in show
+While we're here, add `gitlab_note_id` to `NoteDetail`, `MrNoteDetail`, and their JSON
counterparts.
```
3. **Stop positional column indexing for these changes**
Analysis: In `list.rs`, row extraction is positional (`row.get(18)`, etc.). Adding fields is fragile and easy to break silently. Use named aliases and named lookup for robustness.
```diff
@@ 1a/1b SQL + query_map
- p.path_with_namespace AS project_path
+ p.path_with_namespace AS project_path,
+ d.gitlab_discussion_id AS gitlab_discussion_id
@@
- project_path: row.get(18)?,
- gitlab_discussion_id: row.get(19)?,
+ project_path: row.get("project_path")?,
+ gitlab_discussion_id: row.get("gitlab_discussion_id")?,
```
4. **Redesign `discussions` query to avoid correlated subquery fanout**
Analysis: The proposed query uses many correlated subqueries per row. That's acceptable for tiny MR-scoped sets, but it degrades for project-wide scans. Use a base CTE + one rollup pass over notes.
```diff
@@ 3c. SQL Query
-SELECT
- d.id,
- ...
- (SELECT COUNT(*) FROM notes n2 WHERE n2.discussion_id = d.id AND n2.is_system = 0) AS note_count,
- (SELECT n3.author_username FROM notes n3 WHERE n3.discussion_id = d.id ORDER BY n3.position LIMIT 1) AS first_author,
- ...
-FROM discussions d
+WITH base AS (
+ SELECT d.id, d.gitlab_discussion_id, d.noteable_type, d.project_id, d.issue_id, d.merge_request_id,
+ d.individual_note, d.first_note_at, d.last_note_at, d.resolvable, d.resolved
+ FROM discussions d
+ {where_sql}
+),
+note_rollup AS (
+ SELECT n.discussion_id,
+ COUNT(*) FILTER (WHERE n.is_system = 0) AS user_note_count,
+ COUNT(*) AS total_note_count,
+ MIN(CASE WHEN n.is_system = 0 THEN n.position END) AS first_user_pos
+ FROM notes n
+ JOIN base b ON b.id = n.discussion_id
+ GROUP BY n.discussion_id
+)
+SELECT ...
+FROM base b
+LEFT JOIN note_rollup r ON r.discussion_id = b.id
```
5. **Add explicit index work for new access patterns**
Analysis: Existing indexes are good but not ideal for new list patterns (`project + last_note`, note position ordering inside discussion). Add migration entries to keep latency stable.
```diff
@@ ## 3. Add Standalone `discussions` List Command
+#### 3h. Add migration for discussion-list performance
+**File**: `migrations/027_discussions_list_indexes.sql`
+```sql
+CREATE INDEX IF NOT EXISTS idx_discussions_project_last_note
+ ON discussions(project_id, last_note_at DESC, id DESC);
+CREATE INDEX IF NOT EXISTS idx_discussions_project_first_note
+ ON discussions(project_id, first_note_at DESC, id DESC);
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
+ ON notes(discussion_id, position);
+```
```
6. **Add keyset pagination (critical for agent workflows)**
Analysis: `--limit` alone is not enough for automation over large datasets. Add cursor-based pagination with deterministic sort keys and `next_cursor` in JSON.
```diff
@@ 3a. CLI Args
+ /// Keyset cursor from previous response
+ #[arg(long, help_heading = "Output")]
+ pub cursor: Option<String>,
@@
@@ Response Schema
- "total_count": 15,
- "showing": 15
+ "total_count": 15,
+ "showing": 15,
+ "next_cursor": "eyJsYXN0X25vdGVfYXQiOjE3MDAwMDAwMDAwMDAsImlkIjoxMjN9"
@@
@@ Validation Criteria
+7. `lore -J discussions ... --cursor <token>` returns the next stable page without duplicates/skips
```
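The paging rule can be modeled in memory to check for duplicates and skips. This is a sketch under assumptions: rows are reduced to `(last_note_at, id)` pairs, the sort is `last_note_at DESC, id DESC`, and the `Cursor` struct and `page_desc` function are hypothetical names, not the plan's implementation.

```rust
/// Position of the last row on the previous page (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cursor {
    last_note_at: i64,
    id: i64,
}

/// Returns one page of `(last_note_at, id)` rows ordered by
/// `last_note_at DESC, id DESC`, plus the cursor for the next page.
fn page_desc(
    rows: &[(i64, i64)],
    after: Option<Cursor>,
    limit: usize,
) -> (Vec<(i64, i64)>, Option<Cursor>) {
    let mut sorted = rows.to_vec();
    // Descending on both keys; `id` breaks ties so the order is total.
    sorted.sort_by(|a, b| b.cmp(a));

    let mut matching = sorted
        .into_iter()
        // Tuple comparison is lexicographic, like a SQL row-value comparison:
        // (last_note_at, id) < (cursor.last_note_at, cursor.id).
        .filter(|row| after.map_or(true, |c| *row < (c.last_note_at, c.id)));

    let page: Vec<(i64, i64)> = matching.by_ref().take(limit).collect();
    // Only hand out a cursor when at least one more row exists.
    let next = if matching.next().is_some() {
        page.last().map(|&(last_note_at, id)| Cursor { last_note_at, id })
    } else {
        None
    };
    (page, next)
}
```

Because `id` is part of the comparison, two discussions sharing the same `last_note_at` cannot be split incorrectly across pages.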
7. **Fix semantic ambiguities in discussion summary fields**
Analysis: `note_count` is ambiguous, and `first_author` can accidentally be a system note author. Make fields explicit and consistent with non-system default behavior.
```diff
@@ Response Schema
- "note_count": 3,
- "first_author": "elovegrove",
+ "user_note_count": 3,
+ "total_note_count": 4,
+ "first_user_author": "elovegrove",
@@
@@ 3d. Filters struct / path behavior
-- `path` → `EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id AND n.position_new_path LIKE ?)`
+- `path` → match on BOTH `position_new_path` and `position_old_path` (exact/prefix)
```
8. **Enrich show outputs with actionable thread metadata**
Analysis: Adding only discussion id helps, but agents still need thread state and note ids to pick targets correctly. Add `resolvable`, `resolved`, `last_note_at_iso`, and `gitlab_note_id` in show discussion payloads.
```diff
@@ 2a/2b show discussion structs
pub struct DiscussionDetailJson {
pub gitlab_discussion_id: String,
+ pub resolvable: bool,
+ pub resolved: bool,
+ pub last_note_at_iso: String,
pub notes: Vec<NoteDetailJson>,
@@
pub struct NoteDetailJson {
+ pub gitlab_note_id: i64,
pub author_username: String,
```
9. **Harden robot-docs against schema drift with tests**
Analysis: Static JSON in `main.rs` will drift again. Add a lightweight contract test that asserts docs include required fields for `notes`, `discussions`, and show payloads.
```diff
@@ 4. Fix Robot-Docs Response Schemas
+#### 4f. Add robot-docs contract tests
+**File**: `src/main.rs` (or dedicated test module)
+- Assert `robot-docs` contains `gitlab_discussion_id` and `gitlab_note_id` in:
+ - `notes.response_schema`
+ - `issues.response_schema.show`
+ - `mrs.response_schema.show`
+ - `discussions.response_schema`
```
10. **Adjust delivery order to reduce rework and include missing CSV path**
Analysis: In your sample `handle_discussions`, `csv` is declared in args but not handled. Also, robot-docs should land after all payload changes. Sequence should minimize churn.
```diff
@@ Delivery Order
-3. **Change 4** (robot-docs) — depends on 1 and 2 being done so schemas are accurate.
-4. **Change 3** (discussions command) — largest change, depends on 1 for design consistency.
+3. **Change 3** (discussions command + indexes + pagination) — largest change.
+4. **Change 4** (robot-docs + contract tests) — last, after payloads are final.
@@ 3e. Handler wiring
- match format {
+ match format {
"json" => ...
"jsonl" => ...
+ "csv" => print_list_discussions_csv(&result),
_ => ...
}
```
If you want, I can produce a single consolidated revised plan markdown with these edits applied so you can drop it in directly.

@@ -0,0 +1,162 @@
The best non-rejected upgrades I'd make to this plan are below. They focus on reducing schema drift, making robot output safer to consume, and improving performance behavior at scale.
1. Add a shared contract model and field constants first (before workstreams 1-4)
Rationale: Right now each command has its own structs and ad-hoc mapping. That is exactly how drift happens. A single contract definition reused by `notes`, `show`, `discussions`, and robot-docs gives compile-time coupling between output payloads and docs. It also makes future fields cheaper and safer to add.
```diff
@@ Scope: Four workstreams, delivered in order:
-1. Add `gitlab_discussion_id` to notes output
-2. Add `gitlab_discussion_id` to show command discussion groups
-3. Add a standalone `discussions` list command
-4. Fix robot-docs to list actual field names instead of opaque type references
+0. Introduce shared Bridge Contract model/constants used by notes/show/discussions/robot-docs
+1. Add `gitlab_discussion_id` to notes output
+2. Add `gitlab_discussion_id` to show command discussion groups
+3. Add a standalone `discussions` list command
+4. Fix robot-docs to list actual field names instead of opaque type references
+## 0. Shared Contract Model (Cross-Cutting)
+Define canonical required-field constants and shared mapping helpers, then consume them in:
+- `src/cli/commands/list.rs`
+- `src/cli/commands/show.rs`
+- `src/cli/robot.rs`
+- `src/main.rs` robot-docs builder
+This removes duplicated field-name strings and prevents docs/output mismatch.
```
2. Make bridge fields “non-droppable” in robot mode
Rationale: The current plan adds fields, but `--fields` can still remove them. That breaks the core read/write bridge contract in exactly the workflows this change is trying to fix. In robot mode, contract fields should always be force-included.
```diff
@@ ## Bridge Contract (Cross-Cutting)
Every read payload that surfaces notes or discussions **MUST** include:
- `project_path`
- `noteable_type`
- `parent_iid`
- `gitlab_discussion_id`
- `gitlab_note_id` (when note-level data is returned — i.e., in notes list and show detail)
+### Field Filtering Guardrail
+In robot mode, `filter_fields` must force-include Bridge Contract fields even when users pass a narrower `--fields` list.
+Human/table mode keeps existing behavior.
```
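One way to picture the guardrail is a helper that widens the requested field list before filtering. This is a sketch only: `effective_fields` is a hypothetical name, and the plan does not show the real `filter_fields` signature.

```rust
/// Bridge Contract fields for note payloads, as listed in the plan.
const BRIDGE_FIELDS_NOTES: &[&str] = &[
    "project_path",
    "noteable_type",
    "parent_iid",
    "gitlab_discussion_id",
    "gitlab_note_id",
];

/// Returns the requested fields plus any missing contract fields when
/// `robot_mode` is set. Hypothetical helper, not the real `filter_fields`.
fn effective_fields(requested: &[&str], robot_mode: bool) -> Vec<String> {
    let mut out: Vec<String> = requested.iter().map(|f| f.to_string()).collect();
    if robot_mode {
        for field in BRIDGE_FIELDS_NOTES {
            // Append only what the caller left out, keeping their order first.
            if !out.iter().any(|f| f.as_str() == *field) {
                out.push(field.to_string());
            }
        }
    }
    out
}
```

Human/table mode passes `robot_mode = false` and gets the requested list back unchanged.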
3. Replace correlated subqueries in `discussions` rollup with a single-pass window/aggregate pattern
Rationale: Your CTE is better than naive fanout, but it still uses multiple correlated sub-selects per discussion for first author/body/path. At 200K+ discussions this can regress badly depending on cache/index state. A window-ranked `notes` CTE with grouped aggregates is usually faster and more predictable in SQLite.
```diff
@@ #### 3c. SQL Query
-Core query uses a CTE + rollup to avoid correlated subquery fanout on larger result sets:
+Core query uses a CTE + ranked-notes rollup (window function) to avoid per-row correlated subqueries:
-WITH filtered_discussions AS (...),
-note_rollup AS (
- SELECT
- n.discussion_id,
- SUM(...) AS note_count,
- (SELECT ... LIMIT 1) AS first_author,
- (SELECT ... LIMIT 1) AS first_note_body,
- (SELECT ... LIMIT 1) AS position_new_path,
- (SELECT ... LIMIT 1) AS position_new_line
- FROM notes n
- ...
-)
+WITH filtered_discussions AS (...),
+ranked_notes AS (
+ SELECT
+ n.*,
+ ROW_NUMBER() OVER (PARTITION BY n.discussion_id ORDER BY n.position, n.id) AS rn
+ FROM notes n
+ WHERE n.discussion_id IN (SELECT id FROM filtered_discussions)
+),
+note_rollup AS (
+ SELECT
+ discussion_id,
+ SUM(CASE WHEN is_system = 0 THEN 1 ELSE 0 END) AS note_count,
+ MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
+ MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
+ MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
+ MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+ FROM ranked_notes
+ GROUP BY discussion_id
+)
```
4. Add direct GitLab ID filters for deterministic bridging
Rationale: Bridge workflows often start from one known ID. You already have `gitlab_note_id` in notes filters, but discussion filtering still looks internal-ID-centric. Add explicit GitLab-ID filters so agents do not need extra translation calls.
```diff
@@ #### 3a. CLI Args
pub struct DiscussionsArgs {
+ /// Filter by GitLab discussion ID
+ #[arg(long, help_heading = "Filters")]
+ pub gitlab_discussion_id: Option<String>,
@@
@@ #### 3d. Filters struct
pub struct DiscussionListFilters {
+ pub gitlab_discussion_id: Option<String>,
@@
}
```
```diff
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+#### 1g. Add `--gitlab-discussion-id` filter to notes
+Allow filtering notes directly by GitLab thread ID (not only internal discussion ID).
+This enables one-hop note retrieval from external references.
```
5. Add optional note expansion to `discussions` for fewer round-trips
Rationale: Today the agent flow is often `discussions -> show`. Optional embedded notes (`--include-notes N`) gives a fast path for “list unresolved threads with latest context” without forcing full show payloads.
```diff
@@ ### Design
lore -J discussions --for-mr 99 --resolution unresolved
+lore -J discussions --for-mr 99 --resolution unresolved --include-notes 2
@@ #### 3a. CLI Args
+ /// Include up to N latest notes per discussion (0 = none)
+ #[arg(long, default_value = "0", help_heading = "Output")]
+ pub include_notes: usize,
```
6. Upgrade robot-docs from string blobs to structured schema + explicit contract block
Rationale: `contains("gitlab_discussion_id")` tests on schema strings are brittle. A structured schema object gives machine-checked docs and reliable test assertions. Add a contract section for agent consumers.
```diff
@@ ## 4. Fix Robot-Docs Response Schemas
-#### 4a. Notes response_schema
-Replace stringly-typed schema snippets...
+#### 4a. Notes response_schema (structured)
+Represent response fields as JSON objects (field -> type/nullable), not freeform strings.
+#### 4g. Add `bridge_contract` section in robot-docs
+Publish canonical required fields per entity:
+- notes
+- discussions
+- show.discussions
+- show.notes
```
7. Strengthen validation: add CLI-level contract tests and perf guardrails
Rationale: Most current tests are unit-level struct/query checks. Add end-to-end JSON contract tests via command handlers, plus a benchmark-style regression test (ignored by default) so performance work stays intentional.
```diff
@@ ## Validation Criteria
8. Bridge Contract fields (...) are present in every applicable read payload
+9. Contract fields remain present even with `--fields` in robot mode
+10. `discussions` query meets performance guardrail on representative fixture (documented threshold)
@@ ### Tests
+#### Test: robot-mode fields cannot drop bridge contract keys
+Run notes/discussions JSON output through `filter_fields` path and assert required keys remain.
+
+#### Test: CLI contract integration
+Invoke command handlers for `notes`, `discussions`, `mrs <iid>`, parse JSON, assert required keys and types.
+
+#### Test (ignored): large-fixture performance regression
+Generate representative fixture and assert `query_discussions` stays under target elapsed time.
```
If you want, I can now produce a full “v2 plan” document that applies these diffs end-to-end (including revised delivery order and complete updated sections).

@@ -0,0 +1,147 @@
1. **Make `gitlab_note_id` explicit in all note-level payloads without breaking existing consumers**
Rationale: Your Bridge Contract already requires `gitlab_note_id`, but current plan keeps `gitlab_id` only in `notes` list while adding `gitlab_note_id` only in `show`. That forces agents to special-case commands. Add `gitlab_note_id` as an alias field everywhere note-level data appears, while keeping `gitlab_id` for compatibility.
```diff
@@ Bridge Contract (Cross-Cutting)
-Every read payload that surfaces notes or discussions MUST include:
+Every read payload that surfaces notes or discussions MUST include:
- project_path
- noteable_type
- parent_iid
- gitlab_discussion_id
- gitlab_note_id (when note-level data is returned — i.e., in notes list and show detail)
+ - Back-compat rule: note payloads may continue exposing `gitlab_id`, but MUST also expose `gitlab_note_id` with the same value.
@@ 1. Add `gitlab_discussion_id` to Notes Output
-#### 1c. Add field to `NoteListRowJson`
+#### 1c. Add fields to `NoteListRowJson`
+Add `gitlab_note_id` alias in addition to existing `gitlab_id` (no rename, no breakage).
@@ 1f. Update `--fields minimal` preset
-"notes" => ["id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
+"notes" => ["id", "gitlab_note_id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
```
2. **Avoid duplicate flag semantics for discussion filtering**
Rationale: `notes` already has `--discussion-id` and it already maps to `d.gitlab_discussion_id`. Adding a second independent flag/field (`--gitlab-discussion-id`) increases complexity and precedence bugs. Keep one backing filter field and make the new flag an alias.
```diff
@@ 1g. Add `--gitlab-discussion-id` filter to notes
-Allow filtering notes directly by GitLab discussion thread ID...
+Normalize discussion ID flags:
+- Keep one backing filter field (`discussion_id`)
+- Support both `--discussion-id` (existing) and `--gitlab-discussion-id` (alias)
+- If both are provided, clap should reject as duplicate/alias conflict
```
3. **Add ambiguity guardrails for cross-project discussion IDs**
Rationale: `gitlab_discussion_id` is unique per project, not globally. Filtering by discussion ID without project can return multiple rows across repos, which breaks deterministic write bridging. Fail fast with an `Ambiguous` error and actionable fix (`--project`).
```diff
@@ Bridge Contract (Cross-Cutting)
+### Ambiguity Guardrail
+When filtering by `gitlab_discussion_id` without `--project`, if multiple projects match:
+- return `Ambiguous` error
+- include matching project paths in message
+- suggest retry with `--project <path>`
```
4. **Replace `--include-notes` N+1 retrieval with one batched top-N query**
Rationale: The current plan's per-discussion follow-up query scales poorly and creates latency spikes. Use a single window-function query over selected discussion IDs and group rows in Rust. This is both faster and more predictable.
```diff
@@ 3c-ii. Note expansion query (--include-notes)
-When `include_notes > 0`, after the main discussion query, run a follow-up query per discussion...
+When `include_notes > 0`, run one batched query:
+WITH ranked_notes AS (
+ SELECT
+ n.*,
+ d.gitlab_discussion_id,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY n.created_at DESC, n.id DESC
+ ) AS rn
+ FROM notes n
+ JOIN discussions d ON d.id = n.discussion_id
+ WHERE n.discussion_id IN ( ...selected discussion ids... )
+)
+SELECT ... FROM ranked_notes WHERE rn <= ?
+ORDER BY discussion_id, rn;
+
+Group by `discussion_id` in Rust and attach notes arrays without per-thread round-trips.
```
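The Rust-side grouping step might look like the following. It is a sketch that assumes each row has been reduced to `(discussion_id, rn, body)`; the tuple shape and the `group_notes` name are illustrative, not taken from the codebase.

```rust
use std::collections::BTreeMap;

/// One row from the batched query: (discussion_id, rn, body).
/// The tuple shape is a simplification for illustration.
type RankedNoteRow = (i64, u32, String);

/// Groups rows already filtered to `rn <= N` into per-discussion note lists.
fn group_notes(rows: Vec<RankedNoteRow>) -> BTreeMap<i64, Vec<String>> {
    let mut grouped: BTreeMap<i64, Vec<String>> = BTreeMap::new();
    for (discussion_id, _rn, body) in rows {
        // Rows arrive ordered by (discussion_id, rn), so pushing keeps rank order.
        grouped.entry(discussion_id).or_default().push(body);
    }
    grouped
}
```

A single pass over the result set replaces the per-thread follow-up queries.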
5. **Add hard output guardrails and explicit truncation metadata**
Rationale: `--limit` and `--include-notes` are unbounded today. For robot workflows this can accidentally generate huge payloads. Cap values and surface effective limits plus truncation state in `meta`.
```diff
@@ 3a. CLI Args
- pub limit: usize,
+ pub limit: usize, // clamp to max (e.g., 500)
- pub include_notes: usize,
+ pub include_notes: usize, // clamp to max (e.g., 20)
@@ Response Schema
- "meta": { "elapsed_ms": 12 }
+ "meta": {
+ "elapsed_ms": 12,
+ "effective_limit": 50,
+ "effective_include_notes": 2,
+ "has_more": true
+ }
```
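The clamping itself is small. In this sketch the caps reuse the "e.g." values from the diff above, and the `EffectiveLimits` struct and `clamp_limits` function are hypothetical names.

```rust
/// Upper bounds taken from the "e.g." values in the diff above.
const MAX_LIMIT: usize = 500;
const MAX_INCLUDE_NOTES: usize = 20;

/// Effective values reported back in `meta` (hypothetical struct name).
#[derive(Debug, PartialEq)]
struct EffectiveLimits {
    effective_limit: usize,
    effective_include_notes: usize,
}

/// Clamps user-supplied values so one call cannot request an unbounded payload.
fn clamp_limits(limit: usize, include_notes: usize) -> EffectiveLimits {
    EffectiveLimits {
        effective_limit: limit.min(MAX_LIMIT),
        effective_include_notes: include_notes.min(MAX_INCLUDE_NOTES),
    }
}
```

Reporting the clamped values in `meta` lets a caller detect that its request was reduced.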
6. **Strengthen deterministic ordering and null handling**
Rationale: `first_note_at`, `last_note_at`, and note `position` can be null/incomplete during partial sync states. Add null-safe ordering to avoid unstable output and flaky automation.
```diff
@@ 2c. Update queries to SELECT new fields
-... ORDER BY first_note_at
+... ORDER BY COALESCE(first_note_at, last_note_at, 0), id
@@ show note query
-ORDER BY position
+ORDER BY COALESCE(position, 9223372036854775807), created_at, id
@@ 3c. SQL Query
-ORDER BY {sort_column} {order}
+ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
```
7. **Make write-bridging more useful with optional command hints**
Rationale: Exposing IDs is necessary but not sufficient; agents still need to assemble endpoints repeatedly. Add optional `--with-write-hints` that injects compact endpoint templates (`reply`, `resolve`) derived from row context. This improves usability without bloating default output.
```diff
@@ 3a. CLI Args
+ /// Include machine-actionable glab write hints per row
+ #[arg(long, help_heading = "Output")]
+ pub with_write_hints: bool,
@@ Response Schema (notes/discussions/show)
+ "write_hints?": {
+ "reply_endpoint": "string",
+ "resolve_endpoint?": "string"
+ }
```
8. **Upgrade robot-docs/contract validation from string-contains to parity checks**
Rationale: `contains("gitlab_discussion_id")` catches very little and allows schema drift. Build field-set parity tests that compare actual serialized JSON keys to robot-docs declared fields for `notes`, `discussions`, and `show` discussion nodes.
```diff
@@ 4f. Add robot-docs contract tests
-assert!(notes_schema.contains("gitlab_discussion_id"));
+let declared = parse_schema_field_list(notes_schema);
+let sample = sample_notes_row_json_keys();
+assert_required_subset(&declared, &["project_path","noteable_type","parent_iid","gitlab_discussion_id","gitlab_note_id"]);
+assert_schema_matches_payload(&declared, &sample);
@@ 4g. Add CLI-level contract integration tests
+Add parity tests for:
+- notes list JSON
+- discussions list JSON
+- issues show discussions[*]
+- mrs show discussions[*]
```
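The parity idea can be sketched with plain set operations. The helper names here (`missing_required`, `schema_drift`) are hypothetical stand-ins for the `assert_required_subset` and `assert_schema_matches_payload` helpers named in the diff, whose real signatures are not shown.

```rust
use std::collections::BTreeSet;

/// Required fields that the declared schema fails to list.
fn missing_required<'a>(declared: &BTreeSet<&str>, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|field| !declared.contains(*field))
        .collect()
}

/// Fields present on only one side:
/// (declared but not serialized, serialized but not declared).
fn schema_drift(
    declared: &BTreeSet<&str>,
    payload: &BTreeSet<&str>,
) -> (Vec<String>, Vec<String>) {
    let docs_only = declared.difference(payload).map(|f| f.to_string()).collect();
    let payload_only = payload.difference(declared).map(|f| f.to_string()).collect();
    (docs_only, payload_only)
}
```

A test would then assert that both returned lists are empty, which fails on drift in either direction rather than only on a missing substring.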
If you want, I can produce a full revised v3 plan text with these edits merged end-to-end so it's ready to execute directly.

@@ -0,0 +1,207 @@
Below are the highest-impact revisions I'd make to this plan. I excluded everything listed in your `## Rejected Recommendations` section.
**1. Fix a correctness bug in the ambiguity guardrail (must run before `LIMIT`)**
The current post-query ambiguity check can silently fail when `--limit` truncates results to one project even though multiple projects match the same `gitlab_discussion_id`. That creates non-deterministic write targeting risk.
```diff
@@ ## Ambiguity Guardrail
-**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project`
-was provided, check if the result set spans multiple `project_path` values.
+**Implementation**: Run a preflight distinct-project check when `gitlab_discussion_id` is set
+and `--project` was not provided, before the main list query applies `LIMIT`.
+Use:
+```sql
+SELECT DISTINCT p.path_with_namespace
+FROM discussions d
+JOIN projects p ON p.id = d.project_id
+WHERE d.gitlab_discussion_id = ?
+LIMIT 3
+```
+If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with project
+paths and suggestion to retry with `--project <path>`.
```
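The decision that follows the preflight query can be sketched as below. `LookupError` is a hypothetical stand-in: the plan names `LoreError::Ambiguous` but does not show its fields, and `check_unambiguous` is an illustrative name.

```rust
/// Hypothetical error type standing in for `LoreError::Ambiguous`.
#[derive(Debug, PartialEq)]
enum LookupError {
    Ambiguous { projects: Vec<String> },
}

/// `projects` holds the rows returned by the preflight `SELECT DISTINCT` query.
fn check_unambiguous(
    projects: Vec<String>,
    project_flag: Option<&str>,
) -> Result<(), LookupError> {
    // With `--project` set, the main query is already scoped to one project.
    if project_flag.is_none() && projects.len() > 1 {
        return Err(LookupError::Ambiguous { projects });
    }
    Ok(())
}
```

Since the check runs on the distinct-project result rather than on the paged rows, `--limit` cannot hide a second matching project.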
---
**2. Add `gitlab_project_id` to the Bridge Contract**
`project_path` is human-friendly but mutable (renames/transfers). `gitlab_project_id` gives a stable write target and avoids path re-resolution failures.
```diff
@@ ## Bridge Contract (Cross-Cutting)
Every read payload that surfaces notes or discussions **MUST** include:
- `project_path`
+- `gitlab_project_id`
- `noteable_type`
- `parent_iid`
- `gitlab_discussion_id`
- `gitlab_note_id`
@@
const BRIDGE_FIELDS_NOTES: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id", "gitlab_note_id",
];
const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id",
];
```
---
**3. Replace stringly-typed filter/sort fields with enums end-to-end**
Right now `sort`, `order`, `resolution`, `noteable_type` are mostly `String`. This is fragile and risks unsafe SQL interpolation drift over time. Typed enums make invalid states unrepresentable.
```diff
@@ ## 3a. CLI Args
- pub resolution: Option<String>,
+ pub resolution: Option<ResolutionFilter>,
@@
- pub noteable_type: Option<String>,
+ pub noteable_type: Option<NoteableTypeFilter>,
@@
- pub sort: String,
+ pub sort: DiscussionSortField,
@@
- pub asc: bool,
+ pub order: SortDirection,
@@ ## 3d. Filters struct
- pub resolution: Option<String>,
- pub noteable_type: Option<String>,
- pub sort: String,
- pub order: String,
+ pub resolution: Option<ResolutionFilter>,
+ pub noteable_type: Option<NoteableTypeFilter>,
+ pub sort: DiscussionSortField,
+ pub order: SortDirection,
@@
+Map enum -> SQL fragment via `match` in query builder; never interpolate raw strings.
```
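A sketch of the enum-to-SQL mapping follows. The variant names and the column names they map to are assumptions, since the plan only names the enum types; the `ORDER BY` shape follows the null-safe form proposed in an earlier review.

```rust
/// Variant names are assumptions; the plan only names the enum types.
#[derive(Debug, Clone, Copy)]
enum DiscussionSortField {
    FirstNote,
    LastNote,
}

#[derive(Debug, Clone, Copy)]
enum SortDirection {
    Asc,
    Desc,
}

impl DiscussionSortField {
    /// The column comes from a closed set, never from user input.
    fn column(self) -> &'static str {
        match self {
            DiscussionSortField::FirstNote => "first_note_at",
            DiscussionSortField::LastNote => "last_note_at",
        }
    }
}

impl SortDirection {
    fn keyword(self) -> &'static str {
        match self {
            SortDirection::Asc => "ASC",
            SortDirection::Desc => "DESC",
        }
    }
}

/// Builds the ORDER BY clause from static fragments only.
fn order_by_clause(sort: DiscussionSortField, order: SortDirection) -> String {
    format!(
        "ORDER BY COALESCE({col}, 0) {dir}, fd.id {dir}",
        col = sort.column(),
        dir = order.keyword()
    )
}
```

Every interpolated fragment is a `&'static str` chosen by `match`, so an unrecognized sort value is rejected at argument parsing instead of reaching the query builder.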
---
**4. Enforce snapshot consistency for multi-query commands**
`discussions` with `--include-notes` does multiple reads. Without a single read transaction, concurrent ingest can produce mismatched `total_count`, row set, and expanded notes.
```diff
@@ ## 3c. SQL Query
-pub fn query_discussions(...)
+pub fn query_discussions(...)
{
+ // Run count query + page query + note expansion under one deferred read transaction
+ // so output is a single consistent snapshot.
+ let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
...
+ tx.commit()?;
}
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+Apply the same snapshot rule to `query_notes` when returning `total_count` + paged rows.
```
---
**5. Correct first-note rollup semantics (current CTE can return null/incorrect `first_author`)**
In the proposed SQL, `rn=1` is computed over all notes but then filtered with `is_system=0`, so threads with a leading system note may incorrectly lose `first_author`/snippet. Also path rollup uses non-deterministic `MAX(...)`.
```diff
@@ ## 3c. SQL Query
-ranked_notes AS (
+ranked_notes AS (
SELECT
n.discussion_id,
n.author_username,
n.body,
n.is_system,
n.position_new_path,
n.position_new_line,
- ROW_NUMBER() OVER (
- PARTITION BY n.discussion_id
- ORDER BY n.position, n.id
- ) AS rn
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.is_system = 0 THEN 0 ELSE 1 END, n.created_at, n.id
+ ) AS rn_first_note,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.position_new_path IS NULL THEN 1 ELSE 0 END, n.created_at, n.id
+ ) AS rn_first_position
@@
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
- MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
- MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN author_username END) AS first_author,
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN body END) AS first_note_body,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_path END) AS position_new_path,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_line END) AS position_new_line
```
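The intended semantics can be checked against a small in-memory model. This sketch assumes a reduced `Note` struct and a hypothetical `first_user_author` function; it mirrors what the corrected rollup should return and is not the query itself.

```rust
/// Minimal in-memory stand-in for a `notes` row (field subset is an assumption).
struct Note {
    id: i64,
    created_at: i64,
    is_system: bool,
    author_username: String,
}

/// Picks the author of the earliest non-system note, ordered by
/// `(created_at, id)`, regardless of where system notes fall in the thread.
fn first_user_author(notes: &[Note]) -> Option<&str> {
    notes
        .iter()
        .filter(|n| !n.is_system)
        .min_by_key(|n| (n.created_at, n.id))
        .map(|n| n.author_username.as_str())
}
```

A thread that opens with a system note still reports its first human author, and a thread made only of system notes yields `None`.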
---
**6. Add per-discussion truncation signals for `--include-notes`**
Top-level `has_more` is useful, but agents also need to know if an individual thread's notes were truncated. Otherwise they can't tell if a thread is complete.
```diff
@@ ## Response Schema
{
"gitlab_discussion_id": "...",
...
- "notes": []
+ "included_note_count": 0,
+ "has_more_notes": false,
+ "notes": []
}
@@ ## 3b. Domain Structs
pub struct DiscussionListRowJson {
@@
+ pub included_note_count: usize,
+ pub has_more_notes: bool,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub notes: Vec<NoteListRowJson>,
}
@@ ## 3c-ii. Note expansion query (--include-notes)
-Group by `discussion_id` in Rust and attach notes arrays...
+Group by `discussion_id` in Rust, attach notes arrays, and set:
+`included_note_count = notes.len()`,
+`has_more_notes = note_count > included_note_count`.
```
---
**7. Add explicit query-plan gate and targeted index workstream (measured, not speculative)**
This plan introduces heavy discussion-centric reads. You should bake in deterministic performance validation with `EXPLAIN QUERY PLAN` and only then add indexes if missing.
```diff
@@ ## Scope: Four workstreams, delivered in order:
-4. Fix robot-docs to list actual field names instead of opaque type references
+4. Add query-plan validation + targeted index updates for new discussion queries
+5. Fix robot-docs to list actual field names instead of opaque type references
@@
+## 4. Query-Plan Validation and Targeted Indexes
+
+Before and after implementing `query_discussions`, capture `EXPLAIN QUERY PLAN` for:
+- `--for-mr <iid> --resolution unresolved`
+- `--project <path> --since 7d --sort last_note`
+- `--gitlab-discussion-id <id>`
+
+If plans show table scans on `notes`/`discussions`, add indexes in `MIGRATIONS` array:
+- `discussions(project_id, gitlab_discussion_id)`
+- `discussions(merge_request_id, last_note_at, id)`
+- `notes(discussion_id, created_at DESC, id DESC)`
+- `notes(discussion_id, position, id)`
+
+Tests: assert the new query paths return expected rows under indexed schema and no regressions.
```
---
If you want, I can produce a single consolidated “iteration 4” version of the plan text with all seven revisions merged in place.

@@ -0,0 +1,160 @@
I reviewed the plan end-to-end and focused only on new improvements (none of the items in `## Rejected Recommendations` are re-proposed).
1. Add direct `--discussion-id` retrieval paths
Rationale: This removes a full discovery hop for the exact workflow that failed (replying to a known thread). It also reduces ambiguity and query cost when an agent already has the thread ID.
```diff
@@ Core Changes
| 7 | Fix robot-docs to list actual field names | Docs | Small |
+| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |
@@ Change 3: Add Standalone `discussions` List Command
lore -J discussions --for-mr 99 --cursor <token> # keyset pagination
+lore -J discussions --discussion-id 6a9c1750b37d... # direct lookup
@@ 3a. CLI Args
+ #[arg(long, conflicts_with_all = ["for_issue", "for_mr"], help_heading = "Filters")]
+ pub discussion_id: Option<String>,
@@ Change 1: Add `gitlab_discussion_id` to Notes Output
+Add `--discussion-id <hex>` filter to `notes` for direct note retrieval within one thread.
```
2. Add a shared filter compiler to eliminate count/query drift
Rationale: The plan currently repeats filters across data query, `total_count`, and `incomplete_rows` count queries. That is a classic reliability bug source. A single compiled filter object makes count semantics provably consistent.
```diff
@@ Count Semantics (Cross-Cutting Convention)
+## Filter Compiler (NEW, Cross-Cutting Convention)
+All list commands must build predicates via a shared `CompiledFilters` object that emits:
+- SQL predicate fragment
+- bind parameters
+- canonical filter string (for cursor hash)
+The same compiled object is reused by:
+- page data query
+- `total_count` query
+- `incomplete_rows` query
```
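One way the shared filter object could look, as a rough sketch. Only the name `CompiledFilters` and its three outputs (predicate fragment, bind parameters, canonical string) come from the proposal; `BindValue`, `with_eq`, and the column names in the usage note are assumptions made for illustration.

```rust
/// Hypothetical bind value; a real implementation would likely use the
/// database driver's own parameter type instead.
#[derive(Debug, Clone, PartialEq)]
pub enum BindValue {
    Int(i64),
    Text(String),
}

#[derive(Debug, Default)]
pub struct CompiledFilters {
    clauses: Vec<String>,
    params: Vec<BindValue>,
    canonical: Vec<String>,
}

impl CompiledFilters {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds one `column = ?` predicate. `column` must come from a fixed
    /// allowlist in the caller, never from user input.
    pub fn with_eq(mut self, column: &str, value: BindValue) -> Self {
        self.clauses.push(format!("{column} = ?"));
        self.canonical.push(format!("{column}={value:?}"));
        self.params.push(value);
        self
    }

    /// SQL predicate fragment reused verbatim by the page, `total_count`,
    /// and `incomplete_rows` queries.
    pub fn where_sql(&self) -> String {
        if self.clauses.is_empty() {
            "1 = 1".to_string()
        } else {
            self.clauses.join(" AND ")
        }
    }

    /// Bind parameters, in the same order as the placeholders.
    pub fn params(&self) -> &[BindValue] {
        &self.params
    }

    /// Order-independent canonical string, suitable as cursor-hash input.
    pub fn canonical(&self) -> String {
        let mut parts = self.canonical.clone();
        parts.sort();
        parts.join("&")
    }
}
```

Because all three queries interpolate the same `where_sql()` and bind the same `params()`, the counts cannot filter differently from the page query. Sorting the canonical parts means two callers that add the same filters in a different order still produce the same cursor hash input.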
3. Harden keyset pagination semantics for `DESC`, limits, and client ergonomics
Rationale: `(sort_value, id) > (?, ?)` is only correct for ascending order. Descending sort needs `<`. Also add explicit `has_more` so clients don't have to infer it from cursor nullability.
```diff
@@ Keyset Pagination (Cross-Cutting, Change B)
-```sql
-WHERE (sort_value, id) > (?, ?)
-```
+Use comparator by order:
+- ASC: `(sort_value, id) > (?, ?)`
+- DESC: `(sort_value, id) < (?, ?)`
@@ 3a. CLI Args
+ #[arg(short = 'n', long = "limit", default_value = "50", value_parser = clap::value_parser!(usize).range(1..=500), help_heading = "Output")]
+ pub limit: usize,
@@ Response Schema
- "next_cursor": "aW...xyz=="
+ "next_cursor": "aW...xyz==",
+ "has_more": true
```
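To illustrate the comparator rule, here is a small in-memory model. It relies on Rust tuple ordering being lexicographic, which mirrors SQL row-value comparison. `SortOrder`, `keyset_predicate`, and `page_after` are hypothetical names, and the `(sort_value, id)` pairs stand in for real rows.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortOrder {
    Asc,
    Desc,
}

/// Picks the keyset predicate that matches the ORDER BY direction.
pub fn keyset_predicate(order: SortOrder) -> &'static str {
    match order {
        SortOrder::Asc => "(sort_value, id) > (?, ?)",
        SortOrder::Desc => "(sort_value, id) < (?, ?)",
    }
}

/// In-memory model of one page fetch over `(sort_value, id)` pairs.
/// Returns the page plus an explicit `has_more` flag.
pub fn page_after(
    rows: &[(i64, i64)],
    cursor: Option<(i64, i64)>,
    order: SortOrder,
    limit: usize,
) -> (Vec<(i64, i64)>, bool) {
    let mut sorted = rows.to_vec();
    sorted.sort();
    if order == SortOrder::Desc {
        sorted.reverse();
    }
    // Keep only rows strictly past the cursor in the chosen direction.
    let mut remaining = sorted.into_iter().filter(|row| match (cursor, order) {
        (None, _) => true,
        (Some(c), SortOrder::Asc) => *row > c,
        (Some(c), SortOrder::Desc) => *row < c,
    });
    let page: Vec<(i64, i64)> = remaining.by_ref().take(limit).collect();
    let has_more = remaining.next().is_some();
    (page, has_more)
}
```

Using `>` with a descending sort would re-select rows that were already returned, which is the overlap bug the revised comparator avoids.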
4. Add DB-level entity integrity invariants (not just response invariants)
Rationale: Response-side filtering is good, but DB correctness should also be guarded. This prevents silent corruption and bad joins from ingestion or future migrations.
```diff
@@ Contract Invariants (NEW)
+### Entity Integrity Invariants (DB + Ingest)
+1. `discussions` must belong to exactly one parent (`issue_id XOR merge_request_id`).
+2. `discussions.noteable_type` must match the populated parent column.
+3. Natural-key uniqueness is enforced where valid:
+ - `(project_id, gitlab_discussion_id)` unique for discussions.
+4. Ingestion must reject/quarantine rows violating invariants and report counts.
@@ Supporting Indexes (Cross-Cutting, Change D)
+CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
+ ON discussions(project_id, gitlab_discussion_id);
```
5. Switch bulk note loading to streaming grouping (avoid large intermediate vecs)
Rationale: Current bulk strategy still materializes all notes before grouping. Streaming into the map cuts peak memory and improves large-MR stability.
```diff
@@ Change 2e. Constructor — use bulk notes map
-let all_note_rows: Vec<MrNoteDetail> = ... // From bulk query above
-let notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> =
- all_note_rows.into_iter().fold(HashMap::new(), |mut map, note| {
- map.entry(note.discussion_id).or_insert_with(Vec::new).push(note);
- map
- });
+let mut notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> = HashMap::new();
+for row in bulk_note_stmt.query_map(params, map_note_row)? {
+ let note = row?;
+ notes_by_discussion.entry(note.discussion_id).or_default().push(note);
+}
```
6. Make freshness tri-state (`fresh|stale|unknown`) and fail closed on unknown with `--require-fresh`
Rationale: `stale: bool` alone cannot represent “never synced / unknown project freshness.” For write safety, unknown freshness should be explicit and reject under freshness constraints.
```diff
@@ Freshness Metadata & Staleness Guards
pub struct ResponseMeta {
pub elapsed_ms: i64,
pub data_as_of_iso: String,
pub sync_lag_seconds: i64,
pub stale: bool,
+ pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub freshness_reason: Option<String>,
pub incomplete_rows: i64,
@@
-if sync_lag_seconds > max_age_secs {
+if freshness_state == "unknown" || sync_lag_seconds > max_age_secs {
```
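A sketch of how the tri-state could be modeled with an enum rather than a raw string, under the assumption that a missing sync timestamp is represented as `None`. `FreshnessState`, `classify`, and `require_fresh` are illustrative names, not the plan's API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FreshnessState {
    Fresh,
    Stale,
    Unknown,
}

impl FreshnessState {
    /// Wire representation used in the response meta.
    pub fn as_str(self) -> &'static str {
        match self {
            FreshnessState::Fresh => "fresh",
            FreshnessState::Stale => "stale",
            FreshnessState::Unknown => "unknown",
        }
    }
}

/// `sync_lag_seconds` is `None` when the project has never been synced
/// (an assumption about how "unknown" would be detected).
pub fn classify(sync_lag_seconds: Option<i64>, max_age_secs: i64) -> FreshnessState {
    match sync_lag_seconds {
        None => FreshnessState::Unknown,
        Some(lag) if lag > max_age_secs => FreshnessState::Stale,
        Some(_) => FreshnessState::Fresh,
    }
}

/// Fail-closed guard: anything other than `Fresh` is rejected.
pub fn require_fresh(state: FreshnessState) -> Result<(), String> {
    match state {
        FreshnessState::Fresh => Ok(()),
        other => Err(format!("freshness requirement not met: {}", other.as_str())),
    }
}
```

With an enum, the compiler forces every consumer to handle `Unknown` explicitly, which a `stale: bool` cannot do.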
7. Tune indexes to match actual ORDER BY paths in window queries
Rationale: `idx_notes_discussion_position` is likely insufficient for the two window orderings. A covering-style index aligned with partition/order keys reduces random table lookups.
```diff
@@ Supporting Indexes (Cross-Cutting, Change D)
--- Notes: window function ORDER BY (discussion_id, position) for ROW_NUMBER()
-CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
- ON notes(discussion_id, position);
+-- Notes: support dual ROW_NUMBER() orderings and reduce table lookups
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_window
+ ON notes(discussion_id, is_system, position, created_at, gitlab_id);
```
8. Add a phased rollout gate before strict exclusion becomes default
Rationale: Enforcing `gitlab_* IS NOT NULL` immediately can hide data if existing rows are incomplete. A short observation gate prevents sudden regressions while preserving the end-state contract.
```diff
@@ Delivery Order
+Batch 0: Observability gate (NEW)
+- Ship `incomplete_rows` and freshness meta first
+- Measure incomplete rate across real datasets
+- If incomplete ratio <= threshold, enable strict exclusion defaults
+- If above threshold, block rollout and fix ingestion quality first
+
Change 1 (notes output) ──┐
```
9. Add property-based invariants for pagination/count correctness
Rationale: Your current tests are scenario-based and good, but randomized property tests are much better at catching edge-case cursor/count bugs.
```diff
@@ Tests (Change 3 / Change B)
+**Test 12**: Property-based pagination invariants (`proptest`)
+```rust
+#[test]
+fn prop_discussion_cursor_no_overlap_no_gap_under_random_data() { /* ... */ }
+```
+
+**Test 13**: Property-based count invariants
+```rust
+#[test]
+fn prop_total_count_and_incomplete_rows_match_filter_partition() { /* ... */ }
+```
```
If you want, I can now produce a fully consolidated “Plan v4” that applies these diffs cleanly into your original document so it reads as a single coherent spec.

@@ -0,0 +1,158 @@
I reviewed the whole plan and only proposed changes that are not in your `## Rejected Recommendations`.
1. **Fix plan-internal inconsistencies first**
Analysis: The plan currently has a few self-contradictions (`8` vs `9` cross-cutting improvements, `stale` still referenced after moving to tri-state freshness). Cleaning this prevents implementation drift and bad AC validation.
```diff
--- a/plan.md
+++ b/plan.md
@@
-**Scope**: 8 core changes + 8 cross-cutting architectural improvements across 3 tiers:
+**Scope**: 8 core changes + 9 cross-cutting architectural improvements across 3 tiers:
@@ AC-7: Freshness Metadata Present & Staleness Guards Work
-lore -J notes -n 1 | jq '.meta | {data_as_of_iso, sync_lag_seconds, stale}'
-# All fields present, stale=false if recently synced
+lore -J notes -n 1 | jq '.meta | {data_as_of_iso, sync_lag_seconds, freshness_state}'
+# All fields present, freshness_state is one of fresh|stale|unknown
@@ Change 6 Response Schema example
- "stale": false,
+ "freshness_state": "fresh",
```
2. **Require snapshot-consistent list responses (page + counts)**
Analysis: `total_count`, `incomplete_rows`, and page rows can drift if sync writes between queries. Enforcing a single read snapshot for all list commands makes pagination and counts deterministic.
```diff
--- a/plan.md
+++ b/plan.md
@@ Count Semantics (Cross-Cutting Convention)
All list commands use consistent count fields:
+All three queries (`page`, `total_count`, `incomplete_rows`) MUST execute inside one read transaction/snapshot.
+This guarantees count/page consistency under concurrent sync writes.
```
3. **Use RAII transactions instead of manual `BEGIN/COMMIT`**
Analysis: Manual `execute_batch("BEGIN...")` is fragile on early returns. `rusqlite::Transaction` guarantees rollback on error and removes transaction-leak risk.
```diff
--- a/plan.md
+++ b/plan.md
@@ Change 2: Consistency guarantee
-conn.execute_batch("BEGIN DEFERRED")?;
-// ... discussion query ...
-// ... bulk note query ...
-conn.execute_batch("COMMIT")?;
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
+// ... discussion query ...
+// ... bulk note query ...
+tx.commit()?;
```
4. **Allow small focused new modules for query infrastructure**
Analysis: Keeping everything in `list.rs`/`show.rs` will become a maintenance hotspot as filters/cursors/freshness expand. A small module split reduces coupling and regression risk.
```diff
--- a/plan.md
+++ b/plan.md
@@ Change 3: File Architecture
-**No new files.** Follow existing patterns:
+Allow focused infra modules for shared logic:
+- `src/cli/query/filters.rs` (CompiledFilters + builders)
+- `src/cli/query/cursor.rs` (encode/decode/validate v2 cursors)
+- `src/cli/query/freshness.rs` (freshness computation + guards)
+Command handlers remain in existing files.
```
5. **Add ingest-time `discussion_rollups` to avoid repeated heavy window scans**
Analysis: Window functions are good, but doing them on every read over large note volumes is still expensive. Precomputing rollups during ingest gives lower and more predictable p95 latency while keeping read paths simpler.
```diff
--- a/plan.md
+++ b/plan.md
@@ Architectural Improvements (Cross-Cutting)
+| J | Ingest-time discussion rollups (`discussion_rollups`) | Performance | Medium |
@@ Change 3 SQL strategy
-Use `ROW_NUMBER()` window function instead of correlated subqueries...
+Primary path: join precomputed `discussion_rollups` for `note_count`, `first_author`,
+`first_note_body`, `position_new_path`, `position_new_line`.
+Fallback path: window-function recompute if rollup row is missing (defensive correctness).
```
6. **Add deterministic numeric project selector `--project-id`**
Analysis: `-p group/repo` is human-friendly, but numeric project IDs are safer for robots and avoid fuzzy/project-path ambiguity. This reduces false ambiguity failures and lookup overhead.
```diff
--- a/plan.md
+++ b/plan.md
@@ DiscussionsArgs
#[arg(short = 'p', long, help_heading = "Filters")]
pub project: Option<String>,
+ #[arg(long, conflicts_with = "project", help_heading = "Filters")]
+ pub project_id: Option<i64>,
@@ Ambiguity handling
+If `--project-id` is provided, IID resolution is scoped directly to that project.
+`--project-id` takes precedence over path-based project matching.
```
7. **Make path filtering rename-aware (`old` + `new`)**
Analysis: The current `--path` strategy uses only `position_new_path`, so it misses discussions on deleted or renamed files. Supporting side selection makes the feature materially more useful for review workflows.
```diff
--- a/plan.md
+++ b/plan.md
@@ DiscussionsArgs
#[arg(long, help_heading = "Filters")]
pub path: Option<String>,
+ #[arg(long, value_parser = ["either", "new", "old"], default_value = "either", help_heading = "Filters")]
+ pub path_side: String,
@@ Change 3 filtering
-Path filter matches `position_new_path`.
+Path filter semantics:
+- `either` (default): match `position_new_path` OR `position_old_path`
+- `new`: match only `position_new_path`
+- `old`: match only `position_old_path`
```
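The three `--path-side` modes can be sketched as a small matcher. This assumes exact path equality, which may differ from the plan's real matching rules (for example prefix or glob matching); `PathSide` and `path_matches` are hypothetical names.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathSide {
    Either,
    New,
    Old,
}

impl FromStr for PathSide {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "either" => Ok(PathSide::Either),
            "new" => Ok(PathSide::New),
            "old" => Ok(PathSide::Old),
            other => Err(format!("invalid path side: {other}")),
        }
    }
}

/// Returns true when the discussion's position matches `filter` on the
/// requested side. Either position column may be NULL, hence `Option`.
pub fn path_matches(
    side: PathSide,
    filter: &str,
    new_path: Option<&str>,
    old_path: Option<&str>,
) -> bool {
    let hit_new = new_path == Some(filter);
    let hit_old = old_path == Some(filter);
    match side {
        PathSide::Either => hit_new || hit_old,
        PathSide::New => hit_new,
        PathSide::Old => hit_old,
    }
}
```

Parsing into an enum at the CLI boundary keeps the rest of the query code free of string comparisons on `path_side`.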
8. **Add explicit freshness behavior for empty-result queries + bootstrap backfill**
Analysis: Freshness based only on “participating rows” is undefined when results are empty. Define deterministic behavior and backfill `project_sync_state` on migration so `unknown` doesn't spike unexpectedly after deploy.
```diff
--- a/plan.md
+++ b/plan.md
@@ Freshness state logic
+Empty-result rules:
+- If query is project-scoped (`-p` or `--project-id`), freshness is computed from that project even when no rows match.
+- If query is unscoped and returns zero rows, freshness is computed from all tracked projects.
@@ A1. Track per-project sync timestamp
+Migration step: seed `project_sync_state` from latest known sync metadata where available
+to avoid mass `unknown` freshness immediately after rollout.
```
9. **Upgrade `--discussion-id` from filter-only to first-class thread retrieval**
Analysis: Filtering list output by discussion ID still returns list-shaped data and partial note context. A direct thread retrieval mode is faster for agent workflows and avoids extra commands.
```diff
--- a/plan.md
+++ b/plan.md
@@ Core Changes
-| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |
+| 8 | Add direct `--discussion-id` filter + single-thread retrieval mode | Core | Medium |
@@ Change 8
+lore -J discussions --discussion-id <id> --full-thread
+# Returns one discussion with full notes payload (same note schema as show command).
```
10. **Replace ad-hoc AC performance timing with repeatable perf harness**
Analysis: `time lore ...` is noisy and machine-dependent. A reproducible seeded benchmark test gives stable guardrails and catches regressions earlier.
```diff
--- a/plan.md
+++ b/plan.md
@@ AC-10: Performance Budget
-time lore -J discussions --for-mr <iid> -n 100
-# real 0m0.100s (p95 < 150ms)
+cargo test --test perf_discussions -- --ignored --nocapture
+# Uses seeded fixture DB and N repeated runs; asserts p95 < 150ms for target query shape.
```
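The p95 assertion in such a harness could be computed as follows. This is a sketch using the nearest-rank percentile definition; `measure` and `p95` are hypothetical helpers, and the real `perf_discussions` test would run the target query inside the closure.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and records the wall-clock duration of each run.
pub fn measure<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Nearest-rank 95th percentile: the sample at 1-based rank ceil(0.95 * n).
/// Sorts `samples` in place; returns `None` for an empty slice.
pub fn p95(samples: &mut [Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort();
    let rank = (samples.len() * 95 + 99) / 100; // integer ceil(0.95 * n)
    Some(samples[rank - 1])
}
```

A test would then assert something like `p95(&mut samples).unwrap() < Duration::from_millis(150)` against the seeded fixture database.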
If you want, I can also produce a fully merged “iteration 5” rewritten plan document with these edits applied end-to-end so it's directly executable by an implementation agent.

@@ -0,0 +1,143 @@
Strong plan overall. The biggest gaps I'd fix are around sync-health correctness, idempotency/integrity under repeated ingests, deleted-entity lifecycle, and reducing schema drift risk without heavy reflection machinery.
I avoided everything in your `## Rejected Recommendations` section.
**1. Add Sync Health Semantics (not just age)**
Time freshness alone can mislead after partial/failed syncs. Agents need to know whether data is both recent and complete.
```diff
@@ ## Freshness Metadata & Staleness Guards (Cross-Cutting, Change A/F/G)
- pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ pub sync_status: String, // "ok" | "partial" | "failed" | "never"
+ pub last_successful_sync_run_id: Option<i64>,
+ pub last_attempted_sync_run_id: Option<i64>,
@@
-#[arg(long, help_heading = "Freshness")]
-pub require_fresh: Option<String>,
+#[arg(long, help_heading = "Freshness")]
+pub require_fresh: Option<String>,
+#[arg(long, help_heading = "Freshness")]
+pub require_sync_ok: bool,
```
Rationale: this prevents false confidence when one project is fresh-by-time but latest sync actually failed or was partial.
---
**2. Add `--require-complete` Guard for Missing Required IDs**
You already expose `meta.incomplete_rows`; add a hard gate for automation.
```diff
@@ ## Count Semantics (Cross-Cutting Convention)
`incomplete_rows` is computed via a dedicated COUNT query...
+Add CLI guard:
+`--require-complete` fails with exit code 19 when `meta.incomplete_rows > 0`.
+Suggested action: `lore sync --full`.
```
Rationale: agents can fail fast instead of silently acting on partial datasets.
---
**3. Strengthen Ingestion Idempotency + Referential Integrity for Notes**
You added natural-key uniqueness for discussions; do the same for notes and enforce parent integrity at DB level.
```diff
@@ ## Supporting Indexes (Cross-Cutting, Change D)
CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
ON discussions(project_id, gitlab_discussion_id);
+CREATE UNIQUE INDEX IF NOT EXISTS idx_notes_project_gitlab_id
+ ON notes(project_id, gitlab_id);
+
+-- Referential integrity
+-- notes.discussion_id REFERENCES discussions(id)
+-- notes.project_id REFERENCES projects(id)
```
Rationale: repeated syncs and retries won't duplicate notes, and orphaned rows can't accumulate.
---
**4. Add Deleted/Tombstoned Entity Lifecycle**
The current plan excludes null IDs but doesn't define behavior when GitLab entities are deleted after sync.
```diff
@@ ## Contract Invariants (NEW)
+### Deletion Lifecycle Invariant
+1. Notes/discussions deleted upstream are tombstoned locally (`deleted_at`), not hard-deleted.
+2. All list/show commands exclude tombstoned rows by default.
+3. Optional flag `--include-deleted` exposes tombstoned rows for audit/debug.
```
Rationale: preserves auditability, prevents ghost actions on deleted objects, and avoids destructive resync behavior.
---
**5. Expand Discussions Payload for Rename Accuracy + Better Triage**
`--path-side old` is great, but output currently only returns `position_new_*`.
```diff
@@ ## Change 3: Add Standalone `discussions` List Command
pub position_new_path: Option<String>,
pub position_new_line: Option<i64>,
+ pub position_old_path: Option<String>,
+ pub position_old_line: Option<i64>,
+ pub last_author: Option<String>,
+ pub participant_usernames: Vec<String>,
```
Rationale: for renamed/deleted files, agents need old and new coordinates to act confidently; participants/last_author improve thread routing and prioritization.
---
**6. Add SQLite Busy Handling + Retry Policy**
Read transactions + concurrent sync writes can still produce `SQLITE_BUSY` under load.
```diff
@@ ## Count Semantics (Cross-Cutting Convention)
**Snapshot consistency**: All three queries ... inside a single read transaction ...
+**Busy handling**: set `PRAGMA busy_timeout` (e.g. 5000ms) and retry transient
+`SQLITE_BUSY` errors up to 3 times with jittered backoff for read commands.
```
Rationale: improves reliability in real multi-agent usage without changing semantics.
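A rough sketch of the retry loop, independent of any database driver. `DbError` and `retry_on_busy` are hypothetical; in real code the busy case would come from the driver's error type, and the jitter here is a deterministic stand-in rather than a random value.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type; `Busy` stands in for a transient SQLITE_BUSY.
#[derive(Debug, PartialEq)]
pub enum DbError {
    Busy,
    Other(String),
}

/// Retries `op` on `Busy` up to `max_retries` times with linearly growing
/// backoff. Any other error, or success, is returned immediately.
pub fn retry_on_busy<T, F>(
    max_retries: u32,
    base_delay: Duration,
    mut op: F,
) -> Result<T, DbError>
where
    F: FnMut() -> Result<T, DbError>,
{
    let mut attempt = 0;
    loop {
        match op() {
            Err(DbError::Busy) if attempt < max_retries => {
                attempt += 1;
                // Deterministic stand-in for jitter; real code would use a RNG.
                let jitter_ms = (u64::from(attempt) * 7) % 10;
                sleep(base_delay * attempt + Duration::from_millis(jitter_ms));
            }
            other => return other,
        }
    }
}
```

With `max_retries = 3` the operation runs at most four times, and a persistent busy condition is still surfaced to the caller as `DbError::Busy`.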
---
**7. Make Field Definitions Single-Source (Lightweight Drift Prevention)**
You rejected full schema generation from code; a lower-cost middle ground is shared field manifests used by both docs and `--fields` validation.
```diff
@@ ## Change 7: Fix Robot-Docs Response Schemas
+#### 7h. Single-source field manifests (no reflection)
+Define per-command field constants (e.g. `NOTES_FIELDS`, `DISCUSSIONS_FIELDS`)
+used by:
+1) `--fields` validation/filtering
+2) `--fields minimal` expansion
+3) `robot-docs` schema rendering
```
Rationale: cuts drift risk materially while staying much simpler than reflection/snapshot infra.
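As an illustration of the manifest idea, the constant below feeds validation, `minimal` expansion, and docs rendering from one list. Apart from `gitlab_discussion_id`, the field names are placeholders and not the actual notes schema; `NOTES_MINIMAL`, `resolve_fields`, and `render_schema_doc` are hypothetical.

```rust
/// Placeholder manifest; only `gitlab_discussion_id` is taken from the plan.
pub const NOTES_FIELDS: &[&str] = &["id", "gitlab_discussion_id", "author", "body"];

/// Placeholder expansion for `--fields minimal`.
pub const NOTES_MINIMAL: &[&str] = &["id", "gitlab_discussion_id"];

/// Validates a comma-separated `--fields` value against the manifest.
pub fn resolve_fields(spec: &str) -> Result<Vec<&'static str>, String> {
    if spec == "minimal" {
        return Ok(NOTES_MINIMAL.to_vec());
    }
    spec.split(',')
        .map(str::trim)
        .map(|name| {
            NOTES_FIELDS
                .iter()
                .copied()
                .find(|field| *field == name)
                .ok_or_else(|| format!("unknown field: {name}"))
        })
        .collect()
}

/// Renders the schema line for robot-docs from the same manifest.
pub fn render_schema_doc() -> String {
    format!("notes fields: {}", NOTES_FIELDS.join(", "))
}
```

Adding a field to `NOTES_FIELDS` then updates validation and documentation together, which is the drift protection the recommendation is after.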
---
**8. De-duplicate and Upgrade Test Strategy Around Concurrency**
There are duplicated tests across Change 2 and Change 3; add explicit race tests where sync writes happen between list subqueries to prove tx consistency.
```diff
@@ ## Tests
-**Test 6**: `--project-id` scopes IID resolution directly
-**Test 7**: `--path-side old` matches renamed file discussions
-**Test 8**: `--path-side either` matches both old and new paths
+Move shared discussion-filter tests to a single section under Change 3.
+Add concurrency tests:
+1) count/page/incomplete consistency under concurrent sync writes
+2) show discussion+notes snapshot consistency under concurrent writes
```
Rationale: less maintenance noise, better coverage of your highest-risk correctness path.
---
If you want, I can also produce a single consolidated patch block that rewrites your plan text end-to-end with these edits applied in-place.

@@ -0,0 +1,169 @@
Below are the strongest **new** revisions I'd make (excluding everything in your rejected list), with rationale and plan-level diffs.
### 1. Add a durable run ledger (`sync_runs`) with phase state
This makes surgical sync crash-resumable, auditable, and safer under Ctrl+C. Right now `run_id` is mostly ephemeral; persisting phase state removes ambiguity about what completed.
```diff
@@ Design Constraints
+9. **Durable run state**: Surgical sync MUST persist a `sync_runs` row keyed by `run_id`
+ with phase transitions (`preflight`, `ingest`, `dependents`, `docs`, `embed`, `done`, `failed`).
+ This is required for crash recovery, observability, and deterministic retries.
@@ Step 9: Create `run_sync_surgical`
+Before Stage 0, insert `sync_runs(run_id, project_id, mode='surgical', requested_counts, started_at)`.
+After each stage, update `sync_runs.phase`, counters, and `last_error` if present.
+On success/failure, set terminal state (`done`/`failed`) and `finished_at`.
```
### 2. Add `--preflight-only` (network validation without writes)
`--dry-run` is intentionally zero-network, so it cannot validate IIDs. `--preflight-only` is high-value for agents: verifies existence/permissions quickly with no DB mutation.
```diff
@@ CLI Interface
lore sync --dry-run --issue 123 -p myproject
+lore sync --preflight-only --issue 123 -p myproject
@@ Step 2: Add `--issue`, `--mr`, `-p` to `SyncArgs`
+ /// Validate remote entities and auth without any DB writes
+ #[arg(long, default_value_t = false)]
+ pub preflight_only: bool,
@@ Step 10: Add branch in `run_sync`
+if options.preflight_only && options.is_surgical() {
+ return run_sync_surgical_preflight_only(config, &options, run_id, signal).await;
+}
```
### 3. Preflight should aggregate all missing/failed IIDs, not fail-fast
Fail-fast causes repeated reruns. Aggregating errors gives one-shot correction and better robot automation.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-/// Returns the fetched payloads. If ANY fetch fails, the entire operation should abort.
+/// Returns fetched payloads plus per-IID failures; caller aborts writes if failures exist.
pub async fn preflight_fetch(...) -> Result<PreflightResult> {
@@
#[derive(Debug, Default)]
pub struct PreflightResult {
pub issues: Vec<GitLabIssue>,
pub merge_requests: Vec<GitLabMergeRequest>,
+ pub failures: Vec<EntityFailure>, // stage="fetch"
}
@@ Step 9: Create `run_sync_surgical`
-let preflight = preflight_fetch(...).await?;
+let preflight = preflight_fetch(...).await?;
+if !preflight.failures.is_empty() {
+ result.entity_failures = preflight.failures;
+ return Err(LoreError::Other("Surgical preflight failed for one or more IIDs".into()).into());
+}
```
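The aggregate-then-abort behavior can be sketched without the network layer. Here `fetch` is an injected closure standing in for the GitLab client call, and the generic `PreflightResult<T>` is a simplification of the struct in the diff; both are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub struct EntityFailure {
    pub iid: i64,
    pub stage: &'static str,
    pub message: String,
}

/// Simplified, generic variant of the plan's `PreflightResult`.
#[derive(Debug)]
pub struct PreflightResult<T> {
    pub fetched: Vec<T>,
    pub failures: Vec<EntityFailure>,
}

/// Attempts every IID and records failures instead of stopping at the first.
pub fn preflight_fetch<T, F>(iids: &[i64], mut fetch: F) -> PreflightResult<T>
where
    F: FnMut(i64) -> Result<T, String>,
{
    let mut result = PreflightResult {
        fetched: Vec::new(),
        failures: Vec::new(),
    };
    for &iid in iids {
        match fetch(iid) {
            Ok(payload) => result.fetched.push(payload),
            Err(message) => result.failures.push(EntityFailure {
                iid,
                stage: "fetch",
                message,
            }),
        }
    }
    result
}
```

The caller still aborts before any write when `failures` is non-empty, but it can report every bad IID in a single run.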
### 4. Stop filtering scoped queue drains with raw `json_extract` scans
`json_extract(payload_json, '$.scope_run_id')` in hot drain queries will degrade as queue grows. Use indexed scope metadata.
```diff
@@ Step 9b: Implement scoped drain helpers
-// claim query adds:
-// AND json_extract(payload_json, '$.scope_run_id') = ?
+// Add migration:
+// 1) Add `scope_run_id` generated/stored column derived from payload_json (or explicit column)
+// 2) Create index on (project_id, job_type, scope_run_id, status, id)
+// Scoped drains filter by indexed `scope_run_id`, not full-table JSON extraction.
```
### 5. Replace `dirty_source_ids` collection-by-query with explicit run scoping
Current approach can accidentally include prior dirty rows for same source and can duplicate work. Tag dirty rows with `origin_run_id` and consume by run.
```diff
@@ Design Constraints
-2. **Dirty queue scoping**: ... MUST call ... `run_generate_docs_for_dirty_ids`
+2. **Dirty queue scoping**: Surgical sync MUST scope docs by `origin_run_id` on `dirty_sources`
+ (or equivalent exact run marker) and MUST NOT drain unrelated dirty rows.
@@ Step 7: `SurgicalIngestResult`
- pub dirty_source_ids: Vec<i64>,
+ pub origin_run_id: String,
@@ Step 9a: Implement `run_generate_docs_for_dirty_ids`
-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<...>
+pub fn run_generate_docs_for_run_id(config: &Config, run_id: &str) -> Result<...>
```
### 6. Enforce transaction safety at the type boundary
Pairing `unchecked_transaction()` with `&Connection` signatures is fragile. Accept `&Transaction` for ingest internals and use `TransactionBehavior::Immediate` for deterministic lock behavior.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-pub fn ingest_issue_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_issue_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-pub fn ingest_mr_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_mr_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-let tx = conn.unchecked_transaction()?;
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Immediate)?;
```
### 7. Acquire sync lock only for mutation phases, not remote preflight
This materially reduces lock contention and keeps normal sync throughput higher, while still guaranteeing mutation serialization.
```diff
@@ Design Constraints
+10. **Lock window minimization**: Preflight fetch runs without sync lock; lock is acquired immediately
+ before first DB mutation and held through all mutation stages.
@@ Step 9: Create `run_sync_surgical`
-// ── Acquire sync lock ──
-...
-// ── Stage 0: Preflight fetch ──
+// ── Stage 0: Preflight fetch (no lock, no writes) ──
...
+// ── Acquire sync lock just before Stage 1 mutation ──
```
### 8. Add explicit transient retry policy beyond 429
Client already handles rate limits; surgical reliability improves a lot if 5xx/timeouts are retried with bounded backoff.
```diff
@@ Design Constraints
+11. **Transient retry policy**: Preflight and dependent remote fetches MUST retry boundedly on
+ timeout/5xx with jittered backoff; permanent errors (404/401/403) fail immediately.
@@ Step 5: Add `get_issue_by_iid` / `get_mr_by_iid`
+Document retry behavior for transient transport/server failures.
```
### 9. Tighten automated tests around scoping invariants
You already list manual checks; these should be enforced in unit/integration tests to prevent regressions.
```diff
@@ Step 1: TDD — Write Failing Tests First
+### 1d. New invariants tests
+- `surgical_docs_scope_ignores_preexisting_dirty_rows`
+- `scoped_queue_drain_ignores_orphaned_jobs`
+- `preflight_aggregates_multiple_missing_iids`
+- `preflight_only_performs_zero_writes`
+- `dry_run_performs_zero_network_calls`
+- `lock_window_does_not_block_during_preflight`
@@ Acceptance Criteria
+32. Scoped queue/docs invariants are covered by automated tests (not manual-only verification).
```
### 10. Make robot-mode surgical output first-class
For agent workflows, include full stage telemetry and actionable recovery commands.
```diff
@@ Step 15: Update `SyncResult` for robot mode structured output
+ /// Per-stage elapsed ms for deterministic performance tracking
+ pub stage_timings_ms: std::collections::BTreeMap<String, u64>,
+ /// Suggested recovery commands (robot ergonomics)
+ pub recovery_actions: Vec<String>,
@@ Step 14: Update `robot-docs` manifest
+Document surgical-specific error codes and `actions` schema for automated recovery.
```
If you want, I can now produce a fully rewritten **iteration 3** plan that merges these into your current structure end-to-end.

@@ -0,0 +1,212 @@
1. **Resolve the current contract contradictions (`preflight-only`, `dry-run`, `sync_runs`)**
Why this improves the plan:
- Right now constraints conflict: “zero DB writes before commit” vs inserting `sync_runs` during preflight.
- This ambiguity will cause implementation drift and flaky acceptance tests.
- Splitting control-plane writes from content-plane writes keeps safety guarantees strict while preserving observability.
```diff
@@ ## Design Constraints
-6. **Preflight-then-commit**: All remote fetches happen BEFORE any DB writes. If any IID fetch fails (404, network error), the entire operation aborts with zero DB mutations.
+6. **Preflight-then-commit (content-plane)**: All remote fetches happen BEFORE any writes to content tables (`issues`, `merge_requests`, `discussions`, `resource_events`, `documents`, `embeddings`).
+7. **Control-plane exception**: `sync_runs` / `sync_run_entities` writes are allowed during preflight for observability and crash diagnostics.
@@
-11. **Preflight-only mode**: `--preflight-only` validates remote entity existence and permissions with zero DB writes.
+11. **Preflight-only mode**: `--preflight-only` performs zero content writes; control-plane run-ledger writes are allowed.
@@ ### For me to evaluate (functional):
-24. **Preflight-only mode** ... no DB mutations beyond the sync_runs ledger entry
+24. **Preflight-only mode** ... no content DB mutations; only run-ledger rows may be written
```
---
2. **Add stale-write protection to avoid TOCTOU regressions during unlocked preflight**
Why this improves the plan:
- You intentionally preflight without a lock; that's good for throughput but introduces race risk.
- Without a guard, a slower surgical run can overwrite newer data ingested by a concurrent normal sync.
- This is a correctness bug under contention, not a nice-to-have.
```diff
@@ ## Design Constraints
+12. **Stale-write protection**: Surgical ingest MUST NOT overwrite fresher local rows. If local `updated_at` is newer than the preflight payload's `updated_at`, skip that entity and record `skipped_stale`.
@@ ## Step 7: Create `src/ingestion/surgical.rs`
- let labels_created = process_single_issue(conn, config, project_id, issue)?;
+ // Skip stale payloads to avoid TOCTOU overwrite after unlocked preflight.
+ if is_local_newer_issue(conn, project_id, issue.iid, issue.updated_at)? {
+ result.skipped_stale += 1;
+ return Ok(result);
+ }
+ let labels_created = process_single_issue(conn, config, project_id, issue)?;
@@
+// same guard for MR path
@@ ## Step 15: Update `SyncResult`
+ /// Entities skipped because local row was newer than preflight payload
+ pub skipped_stale: usize,
@@ ### Edge cases to verify:
+38. **TOCTOU safety**: if a normal sync updates entity after preflight but before ingest, surgical run skips stale payload (no overwrite)
```
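The stale-write guard reduces to a single comparison. The sketch below assumes `updated_at` values are comparable integers (for example epoch milliseconds) and that a missing local row is `None`; `IngestDecision`, `decide`, and `count_skipped_stale` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IngestDecision {
    Apply,
    SkipStale,
}

/// Skips the payload only when a local row exists and is strictly newer.
/// Equal timestamps are applied, which keeps re-ingestion idempotent.
pub fn decide(local_updated_at: Option<i64>, payload_updated_at: i64) -> IngestDecision {
    match local_updated_at {
        Some(local) if local > payload_updated_at => IngestDecision::SkipStale,
        _ => IngestDecision::Apply,
    }
}

/// Counts skipped entities for a batch of `(local, payload)` timestamp pairs,
/// mirroring the proposed `skipped_stale` counter.
pub fn count_skipped_stale(pairs: &[(Option<i64>, i64)]) -> usize {
    pairs
        .iter()
        .filter(|(local, payload)| decide(*local, *payload) == IngestDecision::SkipStale)
        .count()
}
```

For the guard to close the race, the comparison has to run inside the same write transaction as the upsert; checking before the transaction opens would reintroduce the TOCTOU window.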
---
3. **Make dirty-source scoping exact (do not capture pre-existing rows for same entity)**
Why this improves the plan:
- Current “query dirty rows by `source_id` after ingest” can accidentally include older dirty rows for the same entity.
- That silently violates strict run scoping and can delete unrelated backlog rows.
- You can fix this without adding `origin_run_id` to `dirty_sources` (which you already rejected).
```diff
@@ ## Step 7: Create `src/ingestion/surgical.rs`
- // Collect dirty_source rows for this entity
- let mut stmt = conn.prepare(
- "SELECT id FROM dirty_sources WHERE source_type = 'issue' AND source_id = ?1"
- )?;
+ // Capture only rows inserted by THIS call using high-water mark.
+ let before_dirty_id: i64 = conn.query_row(
+ "SELECT COALESCE(MAX(id), 0) FROM dirty_sources",
+ [], |r| r.get(0),
+ )?;
+ // ... call process_single_issue ...
+ let mut stmt = conn.prepare(
+ "SELECT id FROM dirty_sources
+ WHERE id > ?1 AND source_type = 'issue' AND source_id = ?2"
+ )?;
@@
+ // same pattern for MR
@@ ### 1d. Scoping invariant tests
+#[test]
+fn surgical_docs_scope_ignores_preexisting_dirty_rows_for_same_entity() {
+ // pre-insert dirty row for iid=7, then surgical ingest iid=7
+ // assert result.dirty_source_ids only contains newly inserted rows
+}
```
---
4. **Fix embed-stage leakage when `--no-docs` is used in surgical mode**
Why this improves the plan:
- Current design can run global embed even when docs stage is skipped, which may embed unrelated backlog docs.
- That breaks the surgical “scope only this run” promise.
- This is both correctness and operator-trust critical.
```diff
@@ ## Step 9: Create `run_sync_surgical`
- if !options.no_embed {
+ // Surgical embed only runs when surgical docs actually regenerated docs in this run.
+ if !options.no_embed && !options.no_docs && result.documents_regenerated > 0 {
@@ ## Step 4: Wire new fields in `handle_sync_cmd`
+ if options.is_surgical() && options.no_docs && !options.no_embed {
+ return Err(Box::new(LoreError::Other(
+ "In surgical mode, --no-docs requires --no-embed (to preserve scoping guarantees)".to_string()
+ )));
+ }
@@ ### For me to evaluate
+39. **No embed leakage**: `sync --issue X --no-docs` never embeds unrelated unembedded docs
```
---
5. **Add queue-failure hygiene so scoped jobs do not leak forever**
Why this improves the plan:
- Scoped drains prevent accidental processing, but failed runs can strand pending jobs permanently.
- You need explicit terminalization (`aborted`) and optional replay mechanics.
- Otherwise queue bloat and confusing diagnostics accumulate.
```diff
@@ ## Step 8a: Add `sync_runs` table migration
+ALTER TABLE dependent_queue ADD COLUMN aborted_reason TEXT;
+-- status domain now includes: pending, claimed, done, failed, aborted
@@ ## Step 9: run_sync_surgical failure paths
+// On run failure/cancel:
+conn.execute(
+ "UPDATE dependent_queue
+ SET status='aborted', aborted_reason=?1
+ WHERE project_id=?2 AND scope_run_id=?3 AND status='pending'",
+ rusqlite::params![failure_summary, project_id, run_id],
+)?;
@@ ## Acceptance Criteria
+40. **No stranded scoped jobs**: failed surgical runs leave no `pending` rows for their `scope_run_id`
```
---
6. **Persist per-entity lifecycle (`sync_run_entities`) for real observability and deterministic retry**
Why this improves the plan:
- `sync_runs` alone gives aggregate counters but not which IID failed at which stage.
- Per-entity records make retries deterministic and robot output far more useful.
- This is the missing piece for your stated “deterministic retry decisions.”
```diff
@@ ## Step 8a: Add `sync_runs` table migration
+CREATE TABLE IF NOT EXISTS sync_run_entities (
+ id INTEGER PRIMARY KEY,
+ run_id TEXT NOT NULL REFERENCES sync_runs(run_id),
+ entity_type TEXT NOT NULL CHECK(entity_type IN ('issue','merge_request')),
+ iid INTEGER NOT NULL,
+ stage TEXT NOT NULL,
+ status TEXT NOT NULL CHECK(status IN ('ok','failed','skipped_stale')),
+ error_code TEXT,
+ error_message TEXT,
+ updated_at INTEGER NOT NULL
+);
+CREATE INDEX IF NOT EXISTS idx_sync_run_entities_run ON sync_run_entities(run_id, entity_type, iid);
@@ ## Step 15: Update `SyncResult`
+ pub failed_iids: Vec<(String, u64)>,
+ pub skipped_stale_iids: Vec<(String, u64)>,
@@ ## CLI Interface
+lore --robot sync-runs --run-id <id>
+lore --robot sync-runs --run-id <id> --retry-failed
```
---
7. **Use explicit error type for surgical preflight failures (not `LoreError::Other`)**
Why this improves the plan:
- `Other(String)` loses machine semantics, weakens robot mode, and leads to bad exit-code behavior.
- A typed error preserves structured failures and enables actionable recovery commands.
```diff
@@ ## Step 9: run_sync_surgical
- return Err(LoreError::Other(
- format!("Surgical preflight failed for {} of {} IIDs: {}", ...)
- ).into());
+ return Err(LoreError::SurgicalPreflightFailed {
+ run_id: run_id.to_string(),
+ total: total_items,
+ failures: preflight.failures.clone(),
+ }.into());
@@ ## Step 15: Update `SyncResult`
+ /// Machine-actionable error summary for robot mode
+ pub error_code: Option<String>,
@@ ## Acceptance Criteria
+41. **Typed failure**: preflight failures serialize structured errors (not generic `Other`) with machine-usable codes/actions
```
---
8. **Strengthen tests for rollback, contention, and stale-skip guarantees**
Why this improves the plan:
- Current tests cover many happy-paths and scoping invariants, but key race/rollback behaviors are still under-tested.
- These are exactly where regressions will appear first in production.
```diff
@@ ## Step 1: TDD — Write Failing Tests First
+### 1f. Transactional rollback + TOCTOU tests
+1. `preflight_success_then_ingest_failure_rolls_back_all_content_writes`
+2. `stale_payload_is_skipped_when_local_updated_at_is_newer`
+3. `failed_run_aborts_pending_scoped_jobs`
+4. `surgical_no_docs_requires_no_embed`
@@ ### Automated scoping invariants
-38. **Scoped queue/docs invariants are enforced by automated tests**
+42. **Rollback and race invariants are enforced by automated tests** (no partial writes on ingest failure, no stale overwrite)
```
---
These eight revisions keep your core approach intact, avoid your explicitly rejected ideas, and close the biggest correctness/operability gaps before implementation.

@@ -0,0 +1,130 @@
**Critical Gaps In Current Plan**
1. `dirty_sources` scoping is based on `id`, but `dirty_sources` has no `id` column and uses `(source_type, source_id)` UPSERT semantics.
2. Plan assumes a new `dependent_queue` with `status`, but current code uses `pending_dependent_fetches` (delete-on-complete), so queue-scoping design conflicts with existing invariants.
3. Constraint 6 says all remote fetches happen before any content writes, but the proposed surgical flow fetches discussions/events/diffs after ingest writes.
4. `sync_runs` is already an existing table and already used by `SyncRunRecorder`; the plan currently treats it like a new table.
**Best Revisions**
1. **Fix dirty-source scoping to match real schema (queued-at watermark, not `id` high-water).**
Why this is better: This removes a correctness bug and makes same-entity re-ingest deterministic under UPSERT behavior.
```diff
@@ Design Constraints
-2. Dirty queue scoping: ... capture MAX(id) FROM dirty_sources ... run_generate_docs_for_dirty_ids ...
+2. Dirty queue scoping: `dirty_sources` is keyed by `(source_type, source_id)` and updated via UPSERT.
+ Surgical scoping MUST use:
+ 1) a run-level `run_dirty_floor_ms` captured before surgical ingest, and
+ 2) explicit touched source keys from ingest (`(source_type, source_id)`).
+ Surgical docs MUST call a scoped API (e.g. `run_generate_docs_for_sources`) and MUST NOT drain global dirty queue.
@@ Step 9a
-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<GenerateDocsResult>
+pub fn run_generate_docs_for_sources(config: &Config, sources: &[(SourceType, i64)]) -> Result<GenerateDocsResult>
```
2. **Bypass shared dependent queue in surgical mode; run dependents inline per target.**
Why this is better: Avoids queue migration churn, avoids run-scope conflicts with existing unique constraints, and removes orphan-job hygiene complexity entirely.
```diff
@@ Design Constraints
-4. Dependent queue scoping: ... scope_run_id indexed column on dependent_queue ...
+4. Surgical dependent execution: surgical mode MUST bypass `pending_dependent_fetches`.
+ Dependents (resource_events, mr_closes_issues, mr_diffs) run inline for targeted entities only.
+ Global queue remains for normal sync only.
@@ Design Constraints
-14. Queue failure hygiene: ... pending scoped jobs ... terminalized to aborted ...
+14. Surgical failure hygiene: surgical mode MUST leave no queue artifacts because it does not enqueue dependent jobs.
@@ Step 9b / 9c / Step 13
-Implement scoped drain helpers and enqueue_job scope_run_id plumbing
+Replace with direct per-entity helpers in ingestion layer:
+ - sync_issue_resource_events_direct(...)
+ - sync_mr_resource_events_direct(...)
+ - sync_mr_closes_issues_direct(...)
+ - sync_mr_diffs_direct(...)
```
3. **Clarify atomicity contract to “primary-entity atomicity” (remove contradiction).**
Why this is better: Keeps strong zero-write guarantees for missing IIDs while matching practical staged pipeline behavior.
```diff
@@ Design Constraints
-6. Preflight-then-commit (content-plane): All remote fetches happen BEFORE any writes to content tables ...
+6. Primary-entity atomicity: all requested issue/MR payload fetches complete before first content write.
+ If any primary IID fetch fails, primary ingest does zero content writes.
+ Dependent stages (discussions/events/diffs/closes) are post-ingest and best-effort, with structured per-stage failure reporting.
```
4. **Extend existing `sync_runs` schema instead of redefining it.**
Why this is better: Preserves compatibility with current `SyncRunRecorder`, `sync_status`, and existing historical data.
```diff
@@ Step 8a
-Add `sync_runs` table migration (CREATE TABLE sync_runs ...)
+Add migration 027 to extend existing `sync_runs` table:
+ - ADD COLUMN mode TEXT NULL -- 'standard' | 'surgical'
+ - ADD COLUMN phase TEXT NULL -- preflight|ingest|dependents|docs|embed|done|failed
+ - ADD COLUMN surgical_summary_json TEXT NULL
+Reuse `SyncRunRecorder` row lifecycle; do not introduce a parallel run-ledger model.
```
5. **Strengthen TOCTOU stale protection for equal timestamps.**
Why this is better: Prevents regressions when `updated_at` is equal but a fresher local fetch already happened.
```diff
@@ Design Constraints
-13. ... If local `updated_at` is newer than preflight payload `updated_at`, skip ...
+13. ... Skip stale when:
+ a) local.updated_at > payload.updated_at, OR
+ b) local.updated_at == payload.updated_at AND local.last_seen_at > preflight_started_at_ms.
+ This prevents equal-timestamp regressions under concurrent sync.
@@ Step 1f tests
+Add test: `equal_updated_at_but_newer_last_seen_is_skipped`.
```
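The stale-skip rule in revision 5 reduces to a small pure predicate, which makes the equal-timestamp case easy to unit test. A minimal sketch, assuming all timestamps are milliseconds; the `LocalRow` struct and the `is_stale_payload` name are hypothetical and do not appear in the plan:

```rust
/// Hypothetical snapshot of the locally stored entity row (times in ms).
#[derive(Debug, Clone, Copy)]
pub struct LocalRow {
    pub updated_at: i64,
    pub last_seen_at: i64,
}

/// Returns true when the preflight payload must be skipped as stale:
/// (a) the local row is strictly newer than the payload, or
/// (b) the timestamps are equal but the row was refreshed after preflight began.
pub fn is_stale_payload(
    local: LocalRow,
    payload_updated_at: i64,
    preflight_started_at_ms: i64,
) -> bool {
    local.updated_at > payload_updated_at
        || (local.updated_at == payload_updated_at
            && local.last_seen_at > preflight_started_at_ms)
}
```

Keeping the rule free of database access lets the proposed `equal_updated_at_but_newer_last_seen_is_skipped` test exercise it directly.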
6. **Shrink lock window further: release `sync` lock before embed; use dedicated embed lock.**
Why this is better: Prevents long embedding from blocking unrelated syncs and avoids concurrent embed writers.
```diff
@@ Design Constraints
-11. Lock ... held through all mutation stages.
+11. Lock ... held through ingest/dependents/docs only.
+ Release `AppLock("sync")` before embed.
+ Embed stage uses `AppLock("embed")` for single-flight embedding writes.
@@ Step 9
-Embed runs inside the same sync lock window
+Embed runs after sync lock release, under dedicated embed lock
```
7. **Add the missing `sync-runs` robot read path (the plan references it but doesn't define it).**
Why this is better: Makes durable run-state actually useful for recovery automation and observability.
```diff
@@ Step 14 (new)
+## Step 14a: Add `sync-runs` read command
+
+CLI:
+ lore --robot sync-runs --limit 20
+ lore --robot sync-runs --run-id <id>
+ lore --robot sync-runs --state failed
+
+Robot response fields:
+ run_id, mode, phase, status, started_at, finished_at, counters, failures, suggested_retry_command
```
8. **Add URL-native surgical targets (`--issue-url`, `--mr-url`) with project inference.**
Why this is better: Much more agent-friendly and reduces project-resolution errors from copy/paste workflows.
```diff
@@ CLI Interface
lore sync --issue 123 --issue 456 -p myproject
+lore sync --issue-url https://gitlab.example.com/group/proj/-/issues/123
+lore sync --mr-url https://gitlab.example.com/group/proj/-/merge_requests/789
@@ Step 2
+Add repeatable flags:
+ --issue-url <url>
+ --mr-url <url>
+Parse URL into (project_path, iid). If all targets are URL-derived and same project, `-p` is optional.
+If mixed projects are provided in one command, reject with clear error.
```
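The URL-to-target parsing in revision 8 could look roughly like the following. This is a sketch under stated assumptions: it only handles the `/-/issues/<iid>` and `/-/merge_requests/<iid>` URL shapes shown above, and the names `UrlTarget`, `TargetKind`, `parse_target_url`, and `single_project` are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TargetKind {
    Issue,
    MergeRequest,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UrlTarget {
    pub project_path: String,
    pub kind: TargetKind,
    pub iid: u64,
}

/// Parses `https://host/<project_path>/-/<issues|merge_requests>/<iid>`.
pub fn parse_target_url(url: &str) -> Result<UrlTarget, String> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))
        .ok_or_else(|| format!("unsupported URL scheme: {url}"))?;
    // Drop the host, leaving "<project_path>/-/<kind>/<iid>".
    let (_host, path) = rest
        .split_once('/')
        .ok_or_else(|| format!("missing path: {url}"))?;
    // Ignore any query string or fragment.
    let path = path.split(|c| c == '?' || c == '#').next().unwrap_or("");
    let (project_path, tail) = path
        .split_once("/-/")
        .ok_or_else(|| format!("missing '/-/' separator: {url}"))?;
    if project_path.is_empty() {
        return Err(format!("missing project path: {url}"));
    }
    let mut parts = tail.trim_end_matches('/').split('/');
    let kind = match parts.next() {
        Some("issues") => TargetKind::Issue,
        Some("merge_requests") => TargetKind::MergeRequest,
        other => return Err(format!("unsupported entity segment: {other:?}")),
    };
    let iid = parts
        .next()
        .ok_or_else(|| format!("missing iid: {url}"))?
        .parse::<u64>()
        .map_err(|e| format!("invalid iid: {e}"))?;
    Ok(UrlTarget { project_path: project_path.to_string(), kind, iid })
}

/// Returns the one project shared by all targets, or an error for mixed projects.
pub fn single_project(targets: &[UrlTarget]) -> Result<Option<&str>, String> {
    let mut project: Option<&str> = None;
    for t in targets {
        match project {
            None => project = Some(t.project_path.as_str()),
            Some(p) if p == t.project_path => {}
            Some(p) => {
                return Err(format!(
                    "targets span multiple projects: {p} and {}",
                    t.project_path
                ))
            }
        }
    }
    Ok(project)
}
```

Splitting on `/-/` rather than counting path segments keeps nested groups (`group/sub/proj`) working without special cases.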
If you want, I can produce a single consolidated patched version of your plan (iteration 5 draft) with these revisions already merged.

@@ -0,0 +1,152 @@
Highest-impact revisions after reviewing your v5 plan:
1. **Fix a real scoping hole: embed can still process unrelated docs**
Rationale: Current plan assumes scoped docs implies scoped embed, but that only holds while no other run creates unembedded docs. You explicitly release sync lock before embed, so another sync can enqueue/regenerate docs in between, and `run_embed` may embed unrelated backlog. This breaks surgical isolation and can hide backlog debt.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-3. Embed scoping: Embedding runs only for documents regenerated by this surgical run. Because `run_embed` processes only unembedded docs, scoping is automatic IF docs are scoped correctly...
+3. Embed scoping: Embedding MUST be explicitly scoped to documents regenerated by this surgical run.
+ `run_generate_docs_for_sources` returns regenerated `document_ids`; surgical mode calls
+ `run_embed_for_document_ids(document_ids)` and never global `run_embed`.
+ This remains true even after lock release and under concurrent normal sync activity.
@@ Step 9a: Implement `run_generate_docs_for_sources`
-pub fn run_generate_docs_for_sources(...) -> Result<GenerateDocsResult> {
+pub fn run_generate_docs_for_sources(...) -> Result<GenerateDocsResult> {
+ // Return regenerated document IDs for scoped embedding.
+ // GenerateDocsResult { regenerated, errored, regenerated_document_ids: Vec<i64> }
@@ Step 9: Embed stage
- match run_embed(config, false, false, None, signal).await {
+ match run_embed_for_document_ids(config, &result.regenerated_document_ids, signal).await {
```
2. **Make run-ledger lifecycle actually durable (and consistent with your own constraint 10)**
Rationale: Plan text says “reuse `SyncRunRecorder`”, but Step 9 writes raw SQL directly. That creates lifecycle drift, missing heartbeats, and inconsistent failure handling as code evolves.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-10. Durable run state: ... Reuses `SyncRunRecorder` row lifecycle ...
+10. Durable run state: surgical sync MUST use `SyncRunRecorder` end-to-end (no ad-hoc SQL updates).
+ Add recorder APIs for `set_mode`, `set_phase`, `set_counters`, `finish_succeeded`,
+ `finish_failed`, `finish_cancelled`, and periodic `heartbeat`.
@@ Step 9: Create `run_sync_surgical`
- conn.execute("INSERT INTO sync_runs ...")
- conn.execute("UPDATE sync_runs SET phase = ...")
+ let mut recorder = SyncRunRecorder::start_surgical(...)?;
+ recorder.set_phase("preflight")?;
+ recorder.heartbeat_if_due()?;
+ recorder.set_phase("ingest")?;
+ ...
+ recorder.finish_succeeded_with_warnings(...)?;
```
3. **Add explicit `cancelled` terminal state**
Rationale: Current early cancellation branches return `Ok(result)` without guaranteed run-row finalization. That leaves misleading `running` rows and weak crash diagnostics.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
+15. Cancellation semantics: If shutdown is observed after run start, phase is set to `cancelled`,
+ status is `cancelled`, `finished_at` is written, and lock is released before return.
@@ Step 8a migration
+ALTER TABLE sync_runs ADD COLUMN warnings_count INTEGER NOT NULL DEFAULT 0;
+ALTER TABLE sync_runs ADD COLUMN cancelled_at INTEGER;
@@ Acceptance Criteria
+47. Cancellation durability: Ctrl+C during surgical sync records `status='cancelled'`,
+ `phase='cancelled'`, and `finished_at` in `sync_runs`.
```
4. **Reduce lock contention further by separating dependent fetch and dependent write**
Rationale: You currently hold lock through network-heavy dependent stages. That maximizes contention and increases lock timeout risk. Better: fetch dependents unlocked, write in short locked transactions with per-entity freshness guards.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-11. Lock window minimization: ... held through ingest, dependents, and docs stages.
+11. Lock window minimization: lock is held only for DB mutation windows.
+ Dependents run in two phases:
+ (a) fetch from GitLab without lock,
+ (b) write results under lock in short transactions.
+ Apply per-entity freshness checks before dependent writes.
@@ Step 9: Dependent stages
- // All dependents run INLINE per-entity ... while lock is held
+ // Dependents fetch outside lock, then write under lock with CAS-style watermark guards.
```
5. **Introduce stage timeout budgets to prevent hung surgical runs**
Rationale: A single slow GitLab endpoint can stall the whole run and hold resources too long. Timeout budgets plus per-entity failure recording keep the run bounded and predictable.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
+16. Stage timeout budgets: each dependent fetch has a per-entity timeout and a global stage budget.
+ Timed-out entities are recorded in `entity_failures` with code `TIMEOUT` and run continues best-effort.
@@ Step 9 notes
+ - Wrap dependent network calls with `tokio::time::timeout`.
+ - Add config knobs:
+ `sync.surgical_entity_timeout_seconds` (default 20),
+ `sync.surgical_dependents_budget_seconds` (default 120).
```
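One way to combine the per-entity timeout with the global stage budget is to compute, before each dependent fetch, the smaller of the two remaining allowances. A minimal std-only sketch; the `StageBudget` type is hypothetical, and the returned duration would be what gets passed to `tokio::time::timeout` in the real code:

```rust
use std::time::{Duration, Instant};

/// Tracks the global stage budget and the per-entity cap (hypothetical helper).
pub struct StageBudget {
    deadline: Instant,
    per_entity: Duration,
}

impl StageBudget {
    pub fn new(start: Instant, stage_budget: Duration, per_entity: Duration) -> Self {
        Self { deadline: start + stage_budget, per_entity }
    }

    /// Timeout to apply to the next entity, or `None` once the stage budget is spent.
    pub fn next_timeout(&self, now: Instant) -> Option<Duration> {
        // `checked_duration_since` yields None when `now` is past the deadline.
        let remaining = self.deadline.checked_duration_since(now)?;
        if remaining.is_zero() {
            return None;
        }
        Some(remaining.min(self.per_entity))
    }
}
```

When `next_timeout` returns `None`, the remaining entities would be recorded with the `TIMEOUT` code without issuing a request, which keeps the stage bounded by the budget.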
6. **Add payload integrity checks (project mismatch hard-fail)**
Rationale: Surgical mode is precision tooling. If API/proxy misconfiguration returns payloads from wrong project, you should fail preflight loudly, not trust downstream assumptions.
```diff
diff --git a/plan.md b/plan.md
@@ Step 7: preflight_fetch
+ // Integrity check: payload.project_id must equal requested gitlab_project_id.
+ // On mismatch, record EntityFailure { code: "PROJECT_MISMATCH", stage: "fetch" }.
@@ Step 9d: error codes
+PROJECT_MISMATCH -> usage/config data integrity failure (typed, machine-readable)
@@ Acceptance Criteria
+48. Project integrity: payloads with unexpected `project_id` are rejected in preflight
+ and produce zero content writes.
```
7. **Upgrade robot output from aggregate-only to per-entity lifecycle**
Rationale: `entity_failures` alone is not enough for robust automation. Agents need a complete entity outcome map (fetched, ingested, stale-skipped, dependent failures) to retry deterministically.
```diff
diff --git a/plan.md b/plan.md
@@ Step 15: Update `SyncResult`
+pub struct EntityOutcome {
+ pub entity_type: String,
+ pub iid: u64,
+ pub fetched: bool,
+ pub ingested: bool,
+ pub stale_skipped: bool,
+ pub dependent_failures: Vec<EntityFailure>,
+}
@@
+pub entity_outcomes: Vec<EntityOutcome>,
+pub completion_status: String, // succeeded | succeeded_with_warnings | failed | cancelled
@@ Robot mode
- enables agents to detect partial failures via `entity_failures`
+ enables deterministic, per-IID retry and richer UI messaging.
```
8. **Index `sync_runs` for real observability at scale**
Rationale: You're adding mode/phase/counters and then querying recent surgical runs. Without indexes, this degrades as run history grows.
```diff
diff --git a/plan.md b/plan.md
@@ Step 8a migration
+CREATE INDEX IF NOT EXISTS idx_sync_runs_mode_started
+ ON sync_runs(mode, started_at DESC);
+CREATE INDEX IF NOT EXISTS idx_sync_runs_status_phase_started
+ ON sync_runs(status, phase, started_at DESC);
```
9. **Add tests specifically for the new failure-prone paths**
Rationale: Current tests are strong on ingest and scoping, but still miss new high-risk runtime behavior (cancel state, timeout handling, scoped embed under concurrency).
```diff
diff --git a/plan.md b/plan.md
@@ Step 1f tests
+#[tokio::test]
+async fn cancellation_marks_sync_run_cancelled() { ... }
+
+#[tokio::test]
+async fn dependent_timeout_records_entity_failure_and_continues() { ... }
+
+#[tokio::test]
+async fn scoped_embed_does_not_embed_unrelated_docs_created_after_docs_stage() { ... }
@@ Acceptance Criteria
+49. Scoped embed isolation under concurrency is verified by automated test.
+50. Timeout path is verified (TIMEOUT code + continued processing).
```
These revisions keep your core direction intact, avoid every rejected recommendation, and materially improve correctness under concurrency, operational observability, and agent automation quality.

2240
docs/plan-surgical-sync.md Normal file

File diff suppressed because it is too large

866
docs/prd-observability.md Normal file

@@ -0,0 +1,866 @@
# PRD: Observability Infrastructure for lore CLI
**Status:** Draft
**Author:** Taylor + Claude
**Date:** 2026-02-04
---
## 1. Problem Statement
lore currently has minimal observability. Logging is ephemeral (stderr only), there are no persistent log files, no performance metrics, no structured JSON log output, no verbosity controls beyond `RUST_LOG`, and no way to diagnose issues after the fact. When a sync fails at 3 AM in a cron job, or an embedding run takes 10x longer than usual, there is zero forensic data available.
### Current State
| Capability | Status |
|---|---|
| Log destination | stderr only, ephemeral |
| Log persistence | None |
| Structured output | Human-readable fmt only |
| Verbosity control | `RUST_LOG` env var (no CLI flag) |
| Performance metrics | Ad-hoc `Instant::now()` in 2 commands |
| Timing in robot JSON | `elapsed_ms` in search and sync `meta` only |
| Spans / correlation | None |
| Log rotation | None |
| Per-stage timing | None |
| Rate limit / retry visibility | `tracing::warn!` only |
| Error aggregation | None |
| Historical comparison | None |
### What's Already in Place (to build on)
- `tracing` (0.1) + `tracing-subscriber` (0.3) with `env-filter` feature
- Registry-based subscriber initialized in `src/main.rs:44-58` with a single `fmt::layer()` using `SuspendingWriter`
- `SuspendingWriter` (`src/cli/progress.rs:25-73`) that coordinates log output with indicatif `MultiProgress` — buffers each log line, calls `MULTI.suspend()` on drop to clear progress bars before writing to stderr
- `IngestDisplay` struct (`src/cli/commands/ingest.rs:65-104`) controlling UI verbosity with three modes: `interactive()` / `silent()` / `progress_only()`
- Robot mode JSON envelope: `{ "ok": true, "data": {...}, "meta": {...} }` — used consistently in sync, search, sync-status, and doctor commands
- XDG-compliant data directory at `~/.local/share/lore/`
- `sync_runs` table (migration 001) with schema: `id`, `started_at`, `heartbeat_at`, `finished_at`, `status`, `command`, `error`, `metrics_json`. The table **exists but is never written to** (no INSERT anywhere in the codebase; `sync_status.rs` reads from it but always gets zero rows)
- `uuid` crate (v1, v4 feature) already a dependency
- Structured fields used in tracing calls (e.g., `info!(owner = %self.owner, ...)`)
- `EnvFilter` currently hardcoded: `lore=info` + `warn` default directives
- Global CLI flags in `src/cli/mod.rs:9-43`: `--config`, `--robot`, `-J`, `--color`, `--quiet` (all `global = true`)
---
## 2. Goals
### Primary
1. **Post-mortem debugging**: Any failed or slow run can be diagnosed after the fact from persistent, structured log files.
2. **Performance visibility**: Every sync/ingest/embed/search operation reports granular stage-level timing, both to the terminal and to persistent storage.
3. **Ergonomic verbosity**: Users and agents control log verbosity through CLI flags (`-v`, `-vv`, `-vvv`) without needing to know `RUST_LOG` syntax.
4. **Machine-parseable logs**: A JSON log mode for piping into log aggregators (jq, Datadog, Loki, etc.).
5. **Agent-friendly metrics**: Robot mode JSON output includes comprehensive timing breakdowns for every command, enabling automated monitoring.
### Secondary
6. **Log rotation and retention**: Log files don't grow unbounded; old logs are automatically cleaned up.
7. **Correlation IDs**: Every sync run gets a unique ID that connects log lines, database records, and robot output.
8. **Rate limit and retry transparency**: Every rate-limited request and retry is visible in logs with full context.
9. **Sync history with metrics**: The `sync_runs` table is enriched with per-stage timing, item counts, and error counts so `lore sync-status` becomes a real dashboard.
### Non-Goals
- External telemetry export (OpenTelemetry, Prometheus) -- out of scope for v1.
- Real-time log streaming / tailing UI.
- Alerting or notification systems.
- Distributed tracing across multiple lore instances.
---
## 3. Research Foundation
### 3.1 The Three Pillars of Observability
Academic and industry consensus (Gholamian & Ward 2021, "A Comprehensive Survey of Logging in Software") identifies three pillars:
1. **Logs** -- Discrete events with context. The foundation.
2. **Metrics** -- Numerical measurements over time (counters, gauges, histograms).
3. **Traces** -- Causally ordered spans representing operations.
For a CLI tool (not a long-running service), the mapping is:
| Pillar | CLI Equivalent |
|---|---|
| Logs | Structured log files per invocation |
| Metrics | Per-stage timing, item counts, error counts stored in DB |
| Traces | Span hierarchy within a single invocation (sync -> ingest issues -> fetch page N -> sync discussions) |
### 3.2 Structured Logging Best Practices
From Duan et al. 2025 ("PDLogger: Automated Logging Framework for Practical Software Development") and industry practice:
- **Always structured**: JSON or key=value, never free-form prose in production logs.
- **Contextual fields propagate**: A sync_run_id set at the top level appears in every downstream log line.
- **Levels have semantic meaning**:
- `ERROR`: Operation failed, requires attention.
- `WARN`: Degraded behavior (rate limited, retry, skip).
- `INFO`: Significant state transitions (stage start/complete, items processed).
- `DEBUG`: Detailed operational data (page boundaries, individual API calls).
- `TRACE`: Wire-level detail (request/response bodies, SQL queries).
### 3.3 CLI Verbosity Conventions
From the GNU Coding Standards, POSIX conventions, and modern Rust CLI tools (ripgrep, fd, cargo):
| Pattern | Meaning | Precedent |
|---|---|---|
| (default) | INFO for app, WARN for deps | cargo, rustc |
| `-q` / `--quiet` | Suppress non-error output | ripgrep, fd, cargo |
| `-v` | DEBUG for app | ripgrep, fd |
| `-vv` | DEBUG for app + deps | cargo |
| `-vvv` | TRACE for everything | cargo, curl |
| `RUST_LOG=...` | Fine-grained override | Universal in Rust |
The `-v` flag should feel familiar to anyone who has used cargo, curl, or ssh.
### 3.4 Log File Rotation
`tracing-appender` (from the tokio-rs/tracing ecosystem) provides:
- **Daily rotation**: New file per day, named `lore.2026-02-04.log`.
- **Non-blocking writes**: A dedicated writer thread keeps file I/O off the main async runtime.
- **Configurable retention**: Delete files older than N days.
This is the canonical solution in the Rust tracing ecosystem and requires no custom code.
### 3.5 Performance Metrics for CLI Tools
Inspired by hyperfine's approach to benchmarking and cargo's `--timings` flag:
- Report wall-clock time per stage.
- Report item throughput (items/sec).
- Store historical runs for trend comparison.
- Present timing data in both human-readable and machine-readable formats.
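Throughput can be derived from two values the PRD already plans to collect per stage. A minimal sketch, assuming an item count and a wall-clock duration in milliseconds; the function name is hypothetical:

```rust
/// Items per second for a stage, or `None` when no time elapsed
/// (avoids reporting an infinite rate for sub-millisecond stages).
pub fn items_per_sec(items_processed: usize, elapsed_ms: u64) -> Option<f64> {
    if elapsed_ms == 0 {
        return None;
    }
    Some(items_processed as f64 * 1000.0 / elapsed_ms as f64)
}
```

Computing the rate at display time, rather than storing it, keeps the persisted data limited to raw counts and durations.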
---
## 4. Design
### 4.1 Architecture Overview
```
                CLI Invocation
                      |
           +----------+----------+
           |                     |
    Interactive Mode         Robot Mode
           |                     |
   +---stderr (human fmt)   stdout (JSON envelope)
   |       |                     |
   |  progress bars         { ok, data, meta: {
   |  colored output            elapsed_ms,
   |                            stages: [...],
   |                            run_id
   |                        }}
   |
Log Subscribers (layered)
   |
   +----+----+--------+
   |         |        |
 stderr    file    (future:
 (fmt)    (JSON)    OTLP)
```
### 4.2 Subscriber Stack
Replace the current single-layer subscriber with a layered registry. Each layer has its own filter:
```
registry()
    .with(stderr_layer.with_filter(stderr_filter)) // Human-readable, SuspendingWriter, -v controlled
    .with(file_layer.with_filter(file_filter))     // JSON, daily rotation, always DEBUG+
```
**stderr layer**: Same `fmt::layer()` as today with `SuspendingWriter`, but level controlled by `-v` flags. When `--log-format json` is passed, this layer switches to `fmt::layer().json()` (same JSON format as file layer, but still routed through `SuspendingWriter` for progress bar coordination).
**file layer**: Always-on JSON output to `~/.local/share/lore/logs/`, daily rotation via `tracing-appender`. Uses its own `EnvFilter` set to `lore=debug,warn` regardless of `-v` flags, ensuring post-mortem data is always available. The file layer does NOT use `SuspendingWriter` — it writes to a file, not stderr, so progress bar coordination is unnecessary.
**Filter architecture**: Per-layer filtering (not a single shared `EnvFilter`) is required because the file layer must always be at DEBUG+ while stderr follows `-v`. `tracing-subscriber`'s `Layer::with_filter()` method enables this.
**`RUST_LOG` override**: When `RUST_LOG` is set, it overrides BOTH layer filters. This is the expert escape hatch.
**Current subscriber** (`src/main.rs:44-58`):
```rust
tracing_subscriber::registry()
    .with(
        tracing_subscriber::fmt::layer()
            .with_target(false)
            .with_writer(lore::cli::progress::SuspendingWriter),
    )
    .with(
        EnvFilter::from_default_env()
            .add_directive("lore=info".parse().unwrap())
            .add_directive("warn".parse().unwrap()),
    )
    .init();
```
This will be replaced by the dual-layer setup. The `SuspendingWriter` integration and `with_target(false)` on the stderr layer remain unchanged.
### 4.3 Verbosity Levels
#### stderr layer (controlled by `-v` flags)
| Flags | App Level | Dep Level | Behavior |
|---|---|---|---|
| (none) | INFO | WARN | Default. Stage transitions, summaries. |
| `-q` | WARN | ERROR | Errors and warnings only. |
| `-v` | DEBUG | WARN | Detailed app behavior. API pages, skip reasons. |
| `-vv` | DEBUG | INFO | App + dependency detail. HTTP client, SQLite. |
| `-vvv` | TRACE | DEBUG | Everything. Wire-level detail. |
| `RUST_LOG=...` | (overrides all) | (overrides all) | Expert escape hatch. |
Precedence: `RUST_LOG` > `-v` flags > defaults. This matches cargo's behavior.
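The table and precedence rule above can be expressed as one pure function that produces a filter directive string. This is a sketch, not the planned implementation: the function name is hypothetical, and it assumes `-q` wins when combined with `-v`, which the PRD does not specify.

```rust
/// Resolves the stderr filter directives from CLI flags.
/// `rust_log` is the value of the `RUST_LOG` environment variable, if set.
pub fn stderr_directives(verbose: u8, quiet: bool, rust_log: Option<&str>) -> String {
    // RUST_LOG takes precedence over every flag.
    if let Some(custom) = rust_log.filter(|s| !s.trim().is_empty()) {
        return custom.to_string();
    }
    // (app level, dependency level) per the stderr table.
    let (app, deps) = match (quiet, verbose) {
        (true, _) => ("warn", "error"), // assumption: quiet overrides -v
        (false, 0) => ("info", "warn"),
        (false, 1) => ("debug", "warn"),
        (false, 2) => ("debug", "info"),
        (false, _) => ("trace", "debug"),
    };
    format!("lore={app},{deps}")
}
```

The result uses the same `target=level,default` directive syntax as the `lore=debug,warn` filter described for the file layer, so it could be handed to `EnvFilter::new` when building the stderr layer.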
#### file layer (independent of `-v` flags)
| Condition | App Level | Dep Level |
|---|---|---|
| Always (default) | DEBUG | WARN |
| `RUST_LOG=...` set | (overrides) | (overrides) |
The file layer always captures DEBUG+ for the `lore` crate and WARN+ for dependencies. This ensures post-mortem data is available even when the user ran with default stderr verbosity. `RUST_LOG` overrides both layers when set.
#### New CLI flags
Add to the `Cli` struct (`src/cli/mod.rs`):
```rust
/// Increase log verbosity (-v, -vv, -vvv)
#[arg(short = 'v', long = "verbose", action = clap::ArgAction::Count, global = true)]
pub verbose: u8,
/// Log format for stderr output: text (default) or json
#[arg(long = "log-format", global = true, value_parser = ["text", "json"], default_value = "text")]
pub log_format: String,
```
The `-v` flag uses `clap::ArgAction::Count` to support `-v`, `-vv`, `-vvv` as a single flag with increasing count. The `--log-format` flag controls whether stderr emits human-readable or JSON-formatted log lines.
### 4.4 Structured Log File Output
**Location**: `~/.local/share/lore/logs/lore.YYYY-MM-DD.log`
**Format**: One JSON object per line (JSONL), produced by `tracing-subscriber`'s `fmt::layer().json()`:
```json
{"timestamp":"2026-02-04T14:32:01.123Z","level":"INFO","target":"lore::ingestion","fields":{"message":"Discussion sync complete","project":"group/repo","issues_synced":42,"elapsed_ms":1234},"span":{"name":"ingest_issues","run_id":"a1b2c3"}}
```
**Rotation**: Daily via `tracing-appender::rolling::daily()`.
**Retention**: Configurable, default 30 days. A `logs.retention_days` config field. Cleanup runs at startup (check directory, delete files older than N days).
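The startup cleanup can lean on the file naming scheme: zero-padded `YYYY-MM-DD` dates sort lexicographically in chronological order, so no date parsing is needed to pick expired files. A minimal sketch; the function names are hypothetical, and the caller is assumed to compute `cutoff_date` (today minus `logs.retention_days`) and to perform the actual deletion:

```rust
/// Returns the log file names dated strictly before `cutoff_date` ("YYYY-MM-DD").
/// Names that do not match `lore.YYYY-MM-DD.log` are never selected.
pub fn expired_log_files<'a>(names: &[&'a str], cutoff_date: &str) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|name| match log_date(name) {
            Some(date) => date < cutoff_date,
            None => false,
        })
        .collect()
}

/// Extracts the date portion of `lore.YYYY-MM-DD.log`, validating its shape.
fn log_date(name: &str) -> Option<&str> {
    let date = name.strip_prefix("lore.")?.strip_suffix(".log")?;
    let bytes = date.as_bytes();
    let well_formed = bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        });
    well_formed.then_some(date)
}
```

Matching the name pattern strictly means unrelated files placed in the log directory are left alone.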
### 4.5 Tracing Spans
Introduce spans for causal correlation within a single invocation:
```
sync (run_id=uuid)
  +-- ingest_issues
  |     +-- fetch_pages (project="group/repo")
  |     +-- sync_discussions (project="group/repo")
  |     +-- fetch_resource_events (project="group/repo")
  +-- ingest_mrs
  |     +-- fetch_pages (project="group/repo")
  |     +-- sync_discussions (project="group/repo")
  +-- generate_docs
  +-- embed
```
Each span records `elapsed_ms` on close. The `run_id` propagates to all child spans and log events, enabling `jq -c 'select(.span.run_id == "a1b2c3")' lore.2026-02-04.log` to extract an entire run.
### 4.6 Performance Metrics
#### 4.6.1 Per-Stage Timing
Every command collects a `Vec<StageTiming>`:
```rust
use serde::Serialize;

#[derive(Debug, Clone, Serialize)]
pub struct StageTiming {
    pub name: String, // "ingest_issues", "fetch_pages", etc.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub project: Option<String>, // Which project, if applicable
    pub elapsed_ms: u64,
    pub items_processed: usize,
    #[serde(skip_serializing_if = "is_zero")]
    pub items_skipped: usize,
    #[serde(skip_serializing_if = "is_zero")]
    pub errors: usize,
    #[serde(skip_serializing_if = "Vec::is_empty")]
    pub sub_stages: Vec<StageTiming>, // Nested child stages
}

/// Predicate for `skip_serializing_if`: omits zero counters from the JSON output.
fn is_zero(n: &usize) -> bool {
    *n == 0
}
```
**Collection mechanism**: Stage timing is materialized from tracing spans, not plumbed manually through function signatures. Phase 2 adds `#[instrument]` spans to each sync stage. Phase 3 adds a custom `tracing-subscriber` layer that records span enter/exit times and structured fields, then extracts the span tree into `Vec<StageTiming>` when the root span closes.
This means:
- No mutable timing collector threaded through `run_ingest` → `fetch_pages` → `sync_discussions`
- Spans are the single source of truth for timing
- `StageTiming` is a materialized view of the span tree
- The custom layer implements `on_close` to capture `elapsed` and `on_record` to capture structured fields like `items_processed`
**Where to define**: `src/core/metrics.rs` (new file — genuinely new functionality that doesn't fit in any existing file)
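To illustrate how a span tree can be materialized into nested stage timings, here is a simplified, std-only model. It does not use the `tracing` crate: `SpanEvent` and `Collector` are hypothetical stand-ins for the callbacks a custom layer would receive (`on_new_span`, `on_record`, `on_close`), and `StageTiming` is reduced to the fields needed for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct StageTiming {
    name: String,
    elapsed_ms: u64,
    items_processed: usize,
    sub_stages: Vec<StageTiming>,
}

/// Hypothetical, simplified versions of the events a tracing layer observes.
enum SpanEvent {
    New { id: u64, parent: Option<u64>, name: &'static str },
    Record { id: u64, items_processed: usize },
    Close { id: u64, elapsed_ms: u64 },
}

#[derive(Default)]
struct Collector {
    /// Spans that are still open, keyed by span id, with their parent id.
    open: HashMap<u64, (Option<u64>, StageTiming)>,
    /// Finished top-level stages.
    roots: Vec<StageTiming>,
}

impl Collector {
    fn handle(&mut self, event: SpanEvent) {
        match event {
            SpanEvent::New { id, parent, name } => {
                let stage = StageTiming {
                    name: name.to_string(),
                    elapsed_ms: 0,
                    items_processed: 0,
                    sub_stages: Vec::new(),
                };
                self.open.insert(id, (parent, stage));
            }
            SpanEvent::Record { id, items_processed } => {
                if let Some((_, stage)) = self.open.get_mut(&id) {
                    stage.items_processed = items_processed;
                }
            }
            SpanEvent::Close { id, elapsed_ms } => {
                if let Some((parent, mut stage)) = self.open.remove(&id) {
                    stage.elapsed_ms = elapsed_ms;
                    // Children close before their parent, so the parent is still open.
                    let parent_entry = match parent {
                        Some(p) => self.open.get_mut(&p),
                        None => None,
                    };
                    match parent_entry {
                        Some((_, parent_stage)) => parent_stage.sub_stages.push(stage),
                        None => self.roots.push(stage),
                    }
                }
            }
        }
    }
}
```

Because a child span always closes before its parent, attaching each finished stage to its still-open parent yields the nested tree without a separate post-processing pass.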
#### 4.6.2 Robot JSON Meta Enhancement
Currently:
```json
{ "ok": true, "data": {...}, "meta": { "elapsed_ms": 1234 } }
```
Proposed:
```json
{
"ok": true,
"data": { ... },
"meta": {
"run_id": "a1b2c3d4",
"elapsed_ms": 45230,
"stages": [
{
"name": "ingest_issues",
"elapsed_ms": 12340,
"items_processed": 150,
"items_skipped": 30,
"errors": 0,
"sub_stages": [
{ "name": "fetch_pages", "project": "group/repo", "elapsed_ms": 5200, "items_processed": 150 },
{ "name": "sync_discussions", "project": "group/repo", "elapsed_ms": 6800, "items_processed": 42, "items_skipped": 108 }
]
},
{
"name": "ingest_mrs",
"elapsed_ms": 18900,
"items_processed": 85,
"items_skipped": 12,
"errors": 1
},
{ "name": "generate_docs", "elapsed_ms": 8500, "items_processed": 235 },
{ "name": "embed", "elapsed_ms": 5490, "items_processed": 1024 }
]
}
}
```
#### 4.6.3 Sync History Enrichment
**Prerequisite bug fix**: The `sync_runs` table (migration 001) exists with columns `id`, `started_at`, `heartbeat_at`, `finished_at`, `status`, `command`, `error`, `metrics_json` — but **no code ever writes to it**. The `sync_status.rs` command reads from it but always gets zero rows. This must be fixed before enrichment.
**Step 1: Wire up sync_runs lifecycle** (prerequisite, in Phase 4)
Add INSERT/UPDATE calls to the sync and ingest command handlers:
```sql
-- On sync/ingest start:
INSERT INTO sync_runs (started_at, heartbeat_at, status, command)
VALUES (:now_ms, :now_ms, 'running', :command_name)
RETURNING id;
-- On sync/ingest success:
UPDATE sync_runs
SET finished_at = :now_ms, status = 'succeeded', metrics_json = :metrics
WHERE id = :run_id;
-- On sync/ingest failure:
UPDATE sync_runs
SET finished_at = :now_ms, status = 'failed', error = :error_msg, metrics_json = :metrics
WHERE id = :run_id;
```
**Where**: Add a `SyncRunRecorder` helper in `src/core/db.rs` or `src/core/sync_run.rs` that encapsulates the INSERT/UPDATE lifecycle. Called from `run_sync()` in `src/cli/commands/sync.rs` and `run_ingest()` in `src/cli/commands/ingest.rs`.
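The start/finish lifecycle can be sketched without a database. The following is an in-memory model under stated assumptions: `RunTable`, `RunRow`, and `RunRecorder` are hypothetical names standing in for the `sync_runs` table and the proposed helper, and the real helper would issue the SQL statements above instead of mutating a `Vec`.

```rust
#[derive(Debug, Clone, PartialEq)]
enum RunStatus {
    Running,
    Succeeded,
    Failed(String),
}

#[derive(Debug, Clone)]
struct RunRow {
    command: String,
    started_at: u64,
    finished_at: Option<u64>,
    status: RunStatus,
}

/// In-memory stand-in for the `sync_runs` table.
#[derive(Default)]
struct RunTable {
    rows: Vec<RunRow>,
}

/// Handle for one in-flight run; consumed when the run is finalized.
struct RunRecorder {
    row: usize,
}

impl RunRecorder {
    /// Mirrors the INSERT: the row starts out as 'running'.
    fn start(table: &mut RunTable, command: &str, now_ms: u64) -> RunRecorder {
        table.rows.push(RunRow {
            command: command.to_string(),
            started_at: now_ms,
            finished_at: None,
            status: RunStatus::Running,
        });
        RunRecorder { row: table.rows.len() - 1 }
    }

    fn finish(self, table: &mut RunTable, now_ms: u64, status: RunStatus) {
        let row = &mut table.rows[self.row];
        row.finished_at = Some(now_ms);
        row.status = status;
    }

    /// Mirrors the success UPDATE.
    fn succeed(self, table: &mut RunTable, now_ms: u64) {
        self.finish(table, now_ms, RunStatus::Succeeded)
    }

    /// Mirrors the failure UPDATE.
    fn fail(self, table: &mut RunTable, now_ms: u64, error: &str) {
        self.finish(table, now_ms, RunStatus::Failed(error.to_string()))
    }
}
```

Taking `self` by value in `succeed` and `fail` means the compiler rejects any attempt to finalize the same run twice.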
**Step 2: Schema migration** (migration 014)
Add dedicated queryable columns alongside the existing `metrics_json`:
```sql
-- Migration 014: sync_runs enrichment for observability
ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
-- Index for correlation queries
CREATE INDEX idx_sync_runs_run_id ON sync_runs(run_id);
```
The existing `metrics_json` column stores the detailed `Vec<StageTiming>` as a JSON array. No need for a separate `stages_json` column.
**Step 3: Enhanced sync-status display**
`lore sync-status` (`src/cli/commands/sync_status.rs`) currently shows only the last run. Enhance to show recent runs with metrics:
```
Recent sync runs:
Run a1b2c3d4 | 2026-02-04 14:32 | 45.2s | 235 items | 1 error
Run d4e5f6a7 | 2026-02-03 14:30 | 38.1s | 220 items | 0 errors
Run 0a7b8c9d | 2026-02-02 14:29 | 42.7s | 228 items | 0 errors
```
Robot mode (`lore --robot sync-status`):
```json
{
"ok": true,
"data": {
"runs": [
{
"run_id": "a1b2c3d4",
"started_at": "2026-02-04T14:32:01.123Z",
"elapsed_ms": 45230,
"status": "succeeded",
"command": "sync",
"total_items_processed": 235,
"total_errors": 1,
"stages": [...]
}
],
"cursors": [...],
"summary": {...}
}
}
```
The `stages` array is parsed from `metrics_json` and included in the robot output. Interactive mode shows the summary table above; `lore --robot sync-status --run a1b2c3d4` shows a single run's full stage breakdown.
#### 4.6.4 Human-Readable Timing
At the end of `lore sync` (interactive mode), print a timing summary:
```
Sync complete in 45.2s
Ingest issues .... 12.3s (150 items, 42 discussions)
Ingest MRs ....... 18.9s (85 items, 1 error)
Generate docs .... 8.5s (235 documents)
Embed ............ 5.5s (1024 chunks)
```
Gated behind `display.show_text` so it doesn't appear in progress_only or silent modes.
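The dot-leader alignment in the summary above could be produced along these lines. This is a sketch only: `format_stage_row`, `format_summary`, the 18-column label width, and the two-space row indent are assumptions chosen to reproduce the example, not existing code.

```rust
/// Width of "label + space + dot leaders", chosen to match the example above.
const LABEL_WIDTH: usize = 18;

/// Formats one row, e.g. a label padded with dots, then seconds and a detail.
fn format_stage_row(label: &str, elapsed_ms: u64, detail: &str) -> String {
    let dots = ".".repeat(LABEL_WIDTH.saturating_sub(label.len() + 1));
    format!(
        "{} {} {:.1}s ({})",
        label,
        dots,
        elapsed_ms as f64 / 1000.0,
        detail
    )
}

/// Formats the header line plus one indented row per stage.
fn format_summary(total_ms: u64, rows: &[(&str, u64, &str)]) -> String {
    let mut out = format!("Sync complete in {:.1}s\n", total_ms as f64 / 1000.0);
    for (label, ms, detail) in rows {
        out.push_str("  ");
        out.push_str(&format_stage_row(label, *ms, detail));
        out.push('\n');
    }
    out
}
```

Labels longer than the fixed width simply get no dot leaders, so the row stays readable even if a new stage name is added later.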
### 4.7 Rate Limit and Retry Transparency
Currently, rate limits emit a `tracing::warn!`. Enhance to:
- Log at INFO level (not just WARN) with structured fields: `info!(path, attempt, retry_after_secs, "Rate limited, retrying")`.
- Count total rate-limit hits per run and include in stage timing.
- In `-v` mode, show retry progress on stderr: ` Retrying /api/v4/projects/123/issues (429, waiting 2s)`.
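Counting rate-limit hits separately from other retries might look like the sketch below. It is deliberately simplified and makes several assumptions: `RetryStats` and `send_with_retries` are hypothetical names, HTTP 5xx responses are treated as the "transient" class, and no backoff sleep is performed.

```rust
#[derive(Debug, Default, PartialEq)]
struct RetryStats {
    rate_limit_hits: usize,
    retries: usize,
}

/// Calls `send(attempt)` until it yields a non-retryable status or the
/// attempt budget is exhausted, updating `stats` along the way.
fn send_with_retries<F>(mut send: F, max_attempts: u32, stats: &mut RetryStats) -> u16
where
    F: FnMut(u32) -> u16,
{
    let mut attempt = 1;
    loop {
        let status = send(attempt);
        let rate_limited = status == 429;
        let transient = (500..600).contains(&status); // assumption: 5xx is retryable
        if rate_limited {
            stats.rate_limit_hits += 1;
        }
        if !(rate_limited || transient) || attempt >= max_attempts {
            return status;
        }
        if transient {
            stats.retries += 1; // non-429 retry
        }
        attempt += 1;
    }
}
```

In the real client, the point where `rate_limit_hits` is incremented is also where the structured `info!` event would be emitted.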
### 4.8 Configuration
Add a new `logging` section to `Config` (`src/core/config.rs`):
```rust
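#[derive(Debug, Clone, Deserialize)]
pub struct LoggingConfig {
    /// Directory for log files. Default: ~/.local/share/lore/logs/
    #[serde(default)]
    pub log_dir: Option<String>,
    /// Days to retain log files. Default: 30. Set to 0 to disable file logging.
    #[serde(default = "default_retention_days")]
    pub retention_days: u32,
    /// Enable JSON log files. Default: true.
    #[serde(default = "default_true")]
    pub file_logging: bool,
}
fn default_retention_days() -> u32 { 30 }
fn default_true() -> bool { true }

// Required by `#[serde(default)]` on `Config::logging`. A derived `Default`
// would yield retention_days = 0 and file_logging = false, so implement it by hand.
impl Default for LoggingConfig {
    fn default() -> Self {
        Self {
            log_dir: None,
            retention_days: default_retention_days(),
            file_logging: default_true(),
        }
    }
}
```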
Add to the `Config` struct:
```rust
#[serde(default)]
pub logging: LoggingConfig,
```
With `config.json`:
```json
{
"logging": {
"log_dir": null,
"retention_days": 30,
"file_logging": true
}
}
```
Defaults are sane so existing configs continue working with zero changes.
**CLI flags** (added to `Cli` struct in `src/cli/mod.rs`):
| Flag | Type | Default | Description |
|---|---|---|---|
| `-v` / `--verbose` | count (u8) | 0 | Increase stderr log verbosity. Stacks: `-v`, `-vv`, `-vvv`. |
| `--log-format` | text \| json | text | Stderr log format. `json` emits one JSON object per log line (same schema as file layer). |
These are global flags (`global = true`) consistent with the existing `--quiet`, `--robot`, etc.
---
## 5. Implementation Plan
### Phase 1: Verbosity Flags + Structured File Logging
**Scope**: CLI flags, dual-layer subscriber, file logging, rotation, retention, `--log-format`.
**Files touched**:
- `Cargo.toml` — add `tracing-appender` dependency
- `src/cli/mod.rs` — add `-v`/`--verbose` (count) and `--log-format` flags to `Cli` struct
- `src/main.rs` — replace subscriber initialization (lines 44-58) with dual-layer setup
- `src/core/config.rs` — add `LoggingConfig` struct and `logging` field to `Config`
- `src/core/paths.rs` — add `get_log_dir()` helper (XDG data dir + `/logs/`)
- `src/cli/commands/doctor.rs` — add log file location and disk usage check
**Implementation steps**:
1. Add `-v` / `--verbose` (count, `u8`) and `--log-format` (text|json) flags to `Cli` struct.
2. Add `tracing-appender` dependency to `Cargo.toml`.
3. Add `LoggingConfig` to `Config` with `#[serde(default)]`.
4. Add `get_log_dir()` to `src/core/paths.rs` (mirrors `get_db_path()` pattern).
5. Replace subscriber init in `main.rs`:
- Build `stderr_filter` from `-v` count (or `RUST_LOG` if set).
- Build `file_filter` as `lore=debug,warn` (or `RUST_LOG` if set).
- stderr layer: `fmt::layer().with_writer(SuspendingWriter)` with `stderr_filter`. When `--log-format json`, chain `.json()`.
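- file layer: `fmt::layer().json().with_writer(appender)` with `file_filter`, where `appender` is built via `RollingFileAppender::builder().rotation(Rotation::DAILY).filename_prefix("lore").filename_suffix("log").build(log_dir)` so that files are named `lore.YYYY-MM-DD.log`. (The `tracing_appender::rolling::daily(log_dir, "lore")` shorthand produces `lore.YYYY-MM-DD` without the `.log` suffix, which would not match the retention pattern.)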
- Combine via `registry().with(stderr_layer.with_filter(stderr_filter)).with(file_layer.with_filter(file_filter))`.
6. Implement log retention at startup: scan `log_dir`, delete files matching `lore.*.log` pattern older than `retention_days`. Run before subscriber init so deleted files aren't held open.
7. Add log file check to `lore doctor`: report log directory path, number of log files, total disk usage. In robot mode, add a `logging` field to `DoctorChecks` with `log_dir`, `file_count`, `total_bytes`, `oldest_file`.
**New dependencies**: `tracing-appender` (0.2)
**Interaction with `-q`/`--quiet`**: The existing `--quiet` flag suppresses non-error terminal output via `IngestDisplay::silent()`. It should NOT affect file logging (file layer is always on). When `-q` and `-v` are both passed, `-q` wins for stderr (set stderr filter to WARN+). File layer remains at DEBUG+.
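The precedence between `RUST_LOG`, `-q`, and `-v` could be captured in one pure function, which is also what the unit tests listed below would exercise. This is a sketch under assumptions: the function name and the exact directive strings are illustrative, and letting `RUST_LOG` take priority over `-q` is an assumption (the plan only states that `RUST_LOG` overrides `-v` and that `-q` wins over `-v`).

```rust
/// Returns the filter directives for the stderr layer.
///
/// Precedence (assumed): RUST_LOG, then --quiet, then the -v count.
fn stderr_filter_directives(verbose: u8, quiet: bool, rust_log: Option<&str>) -> String {
    if let Some(spec) = rust_log {
        if !spec.trim().is_empty() {
            return spec.to_string();
        }
    }
    if quiet {
        return "warn".to_string();
    }
    let directives = match verbose {
        0 => "lore=info,warn",  // default: INFO for lore, WARN for dependencies
        1 => "lore=debug,warn", // -v
        2 => "lore=debug,info", // -vv
        _ => "trace",           // -vvv and above
    };
    directives.to_string()
}
```

The returned string would then be handed to the filter constructor in `main.rs`; keeping the mapping separate from subscriber setup makes it testable without installing a global subscriber.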
**Tests** (see Section 6.1 for details):
- Unit: `EnvFilter` construction from verbosity count (0→INFO, 1→DEBUG, 2→DEBUG+deps, 3→TRACE)
- Unit: `RUST_LOG` overrides `-v` flags
- Unit: `-q` + `-v` interaction (quiet wins)
- Unit: `LoggingConfig` deserialization with missing/partial/full fields
- Unit: Log retention deletes old files, preserves recent ones
- Integration: Subscriber produces JSON lines to a test file
- Integration: `SuspendingWriter` still works with dual-layer stack (no garbled output)
### Phase 2: Spans + Correlation IDs
**Scope**: Tracing spans, UUID-based `run_id`, span recording for JSON logs.
**Depends on**: Phase 1 (subscriber must support span recording).
**Files touched**:
- `src/cli/commands/sync.rs` — add root span with `run_id` field to `run_sync()`
- `src/cli/commands/ingest.rs` — add `#[instrument]` spans to `run_ingest()` and its stages
- `src/ingestion/orchestrator.rs` — add spans for `fetch_pages`, `sync_discussions`, `fetch_resource_events`
- `src/documents/regenerator.rs` — add span for `generate_docs` stage
- `src/embedding/pipeline.rs` — add span for `embed` stage
- `src/main.rs` — generate `run_id` before calling command handler, pass as field
**Implementation steps**:
1. Generate `run_id` using `Uuid::new_v4().to_string()[..8].to_string()` (first 8 chars of UUIDv4; the first hyphen sits at index 8, so the prefix is pure hex) at command entry in `main.rs`. No new dependency needed: `uuid` v1 with v4 feature is already in `Cargo.toml`.
2. Create root span: `let _root = tracing::info_span!("sync", run_id = %run_id).entered();` (or equivalent for each command).
3. Add `#[instrument(skip_all, fields(stage = "ingest_issues"))]` to ingest stages.
4. Add `#[instrument(skip_all, fields(project = %project_path))]` to per-project functions.
5. Ensure the file layer's JSON formatter includes span context. `tracing-subscriber`'s `fmt::layer().json()` includes the current span chain by default when the registry has span storage enabled.
6. Verify: parse a log file, confirm every line includes `span.run_id`.
**New dependencies**: None (`uuid` already present).
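The truncation and the "8-character hex" invariant can be checked with a small std-only helper. This is an illustrative sketch: `short_run_id` is a hypothetical name, and it takes the UUID as a string so the example does not depend on the `uuid` crate.

```rust
/// Returns the first 8 characters of a hyphenated UUID string as the run id,
/// or `None` if the input is too short or the prefix is not hexadecimal.
fn short_run_id(uuid: &str) -> Option<String> {
    let prefix = uuid.get(..8)?;
    if prefix.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(prefix.to_ascii_lowercase())
    } else {
        None
    }
}
```

In production code the input would come from `Uuid::new_v4().to_string()`, for which the prefix is always valid; the `None` branch mainly guards the `--run <run_id>` style of user input.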
**Tests**:
- Unit: `run_id` is a valid 8-character hex string
- Integration: Run a sync-like operation with spans, parse JSON log output, verify every line contains `run_id` in span context
- Integration: Nested spans produce correct parent-child relationships in JSON output
### Phase 3: Performance Metrics Collection
**Scope**: `StageTiming` struct, span-to-metrics extraction, robot JSON enrichment, timing summary.
**Depends on**: Phase 2 (spans must exist to extract timing from).
**Files touched**:
- `src/core/metrics.rs` — new file: `StageTiming` struct, `MetricsLayer` (custom tracing layer), span-to-timing extraction
- `src/cli/commands/sync.rs` — consume `Vec<StageTiming>` from `MetricsLayer`, include in `SyncMeta`
- `src/cli/commands/ingest.rs` — same pattern for standalone ingest
- `src/main.rs` — register `MetricsLayer` in the subscriber stack
**Implementation steps**:
1. Define `StageTiming` struct with `sub_stages: Vec<StageTiming>` in `src/core/metrics.rs`.
2. Implement `MetricsLayer` as a custom `tracing_subscriber::Layer`:
- `on_new_span`: Record span ID, name, parent, start time.
- `on_record`: Capture structured fields (`items_processed`, `items_skipped`, `errors`) recorded via `Span::record()`.
- `on_close`: Calculate `elapsed_ms`, build `StageTiming` entry, attach to parent.
- Provide `fn extract_timings(&self, run_id: &str) -> Vec<StageTiming>` to materialize the span tree after the root span closes.
3. Store `MetricsLayer` reference (behind `Arc`) so command handlers can call `extract_timings()` after `run_sync()` completes.
4. Extend `SyncMeta` and `SyncJsonOutput` to include `run_id: String` and `stages: Vec<StageTiming>`.
5. Print human-readable timing summary at end of interactive sync (gated behind `IngestDisplay::show_text`).
**Span field recording**: Sync stages must record item counts as span fields for `MetricsLayer` to capture:
```rust
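// Declare fields up front: `Span::record` silently ignores undeclared fields.
use tracing::field::Empty;
let span = tracing::info_span!("ingest_issues", items_processed = Empty, items_skipped = Empty);
let _guard = span.enter();
// ... do work ...
span.record("items_processed", count);
span.record("items_skipped", skipped);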
```
**Tests**:
- Unit: `StageTiming` serialization matches expected JSON (including nested `sub_stages`)
- Unit: `MetricsLayer` correctly builds span tree from synthetic span events
- Unit: `MetricsLayer` handles spans with no children (leaf stages like `embed`)
- Unit: `MetricsLayer` handles concurrent spans (multiple projects in parallel)
- Integration: `lore --robot sync` output includes `meta.stages` array with correct nesting
- Integration: Interactive sync prints timing summary table to stderr
### Phase 4: Sync History Enrichment
**Scope**: Wire up `sync_runs` INSERT/UPDATE lifecycle, schema migration, enhanced sync-status.
**Depends on**: Phase 3 (needs `Vec<StageTiming>` to store in `metrics_json`).
**Files touched**:
- `migrations/014_sync_runs_enrichment.sql` — new migration: add `run_id`, `total_items_processed`, `total_errors` columns + index
- `src/core/sync_run.rs` — new file: `SyncRunRecorder` struct encapsulating INSERT on start, UPDATE on finish
- `src/cli/commands/sync.rs` — create `SyncRunRecorder` before pipeline, finalize after
- `src/cli/commands/ingest.rs` — same pattern for standalone ingest
- `src/cli/commands/sync_status.rs` — enhance to show recent runs with metrics, parse `metrics_json`
**Implementation steps**:
1. Create migration `014_sync_runs_enrichment.sql`:
```sql
ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
CREATE INDEX idx_sync_runs_run_id ON sync_runs(run_id);
```
Note: Migration number 014 assumes no other migration is added before this phase. If concurrent work adds migration 014, renumber accordingly.
2. Implement `SyncRunRecorder`:
```rust
// Signature sketch (bodies omitted). The recorder borrows the connection,
// matching `start(conn: &Connection, ...)`.
pub struct SyncRunRecorder<'a> { id: i64, conn: &'a Connection }
impl<'a> SyncRunRecorder<'a> {
    pub fn start(conn: &'a Connection, command: &str, run_id: &str) -> Result<Self>;
    pub fn succeed(self, metrics: &[StageTiming], total_items: usize, total_errors: usize) -> Result<()>;
    pub fn fail(self, error: &str, metrics: Option<&[StageTiming]>) -> Result<()>;
}
```
3. In `run_sync()`: create `SyncRunRecorder::start()` before pipeline, call `.succeed()` or `.fail()` after.
4. In `run_ingest()`: same pattern.
5. Enhance `sync_status.rs`:
- Query last N runs (default 10) instead of just the last 1.
- Parse `metrics_json` column to extract stage breakdown.
- Show `run_id`, duration, item counts, error counts in both interactive and robot modes.
- Add `--run <run_id>` flag to `sync-status` for single-run detail view.
**Tests**:
- Unit: `SyncRunRecorder::start` inserts a row with status='running'
- Unit: `SyncRunRecorder::succeed` updates status, sets finished_at, writes metrics_json
- Unit: `SyncRunRecorder::fail` updates status, sets error, sets finished_at
- Unit: Migration 014 applies cleanly on top of migration 013
- Integration: `lore sync` creates a sync_runs row; `lore sync-status` displays it
- Integration: `lore --robot sync-status` JSON includes `runs` array with stage breakdowns
- Integration: Failed sync records error in sync_runs with partial metrics
### Phase 5: Rate Limit + Retry Instrumentation
**Scope**: Enhanced logging in GitLab client, retry counters in stage timing.
**Depends on**: Phase 2 (spans for context), Phase 3 (StageTiming for counters).
**Files touched**:
- `src/gitlab/client.rs` (or wherever the HTTP client with retry logic lives) — add structured fields to retry/rate-limit log events
- `src/core/metrics.rs` — add `rate_limit_hits` and `retries` fields to `StageTiming`
**Implementation steps**:
1. Find the retry/rate-limit handling code (likely in the GitLab HTTP client). Add structured tracing fields:
```rust
info!(
path = %request_path,
attempt = attempt_number,
retry_after_secs = retry_after,
status_code = 429,
"Rate limited, retrying"
);
```
2. Add `rate_limit_hits: usize` and `retries: usize` fields to `StageTiming` (with `#[serde(skip_serializing_if = "is_zero")]`).
3. In `MetricsLayer`, count rate-limit and retry events within each span and include in `StageTiming`.
4. In `-v` mode, the existing stderr layer already shows INFO+ events, so retry activity becomes visible automatically. No additional work needed beyond step 1.
**Tests**:
- Unit: Rate-limit log events include all required structured fields
- Unit: `StageTiming` serialization includes `rate_limit_hits` and `retries` when non-zero, omits when zero
- Integration: Simulate 429 response, verify log line has `path`, `attempt`, `retry_after_secs` fields
- Integration: After simulated retries, `StageTiming` counts match expected values
---
## 6. Acceptance Criteria
### 6.1 Phase 1: Verbosity Flags + Structured File Logging
**Functional criteria**:
- [ ] `lore sync` writes JSON log lines to `~/.local/share/lore/logs/lore.YYYY-MM-DD.log` with zero configuration.
- [ ] `lore -v sync` shows DEBUG-level `lore::*` output on stderr; dependency output stays at WARN.
- [ ] `lore -vv sync` shows DEBUG-level `lore::*` + INFO-level dependency output on stderr.
- [ ] `lore -vvv sync` shows TRACE-level output for everything on stderr.
- [ ] `RUST_LOG=lore::gitlab=trace lore sync` overrides `-v` flags for both stderr and file layers.
- [ ] `lore --log-format json sync` emits JSON-formatted log lines on stderr (same schema as file layer).
- [ ] Log files rotate daily (new file per calendar day).
- [ ] Files matching `lore.*.log` older than `retention_days` are deleted on startup.
- [ ] Existing behavior is unchanged when no new flags are passed (INFO on stderr, human-readable format).
- [ ] `--quiet` suppresses non-error stderr output. `-q` + `-v` together: `-q` wins (stderr at WARN+).
- [ ] `--quiet` does NOT affect file logging (file layer remains at DEBUG+).
- [ ] `lore doctor` reports: log directory path, number of log files, total disk usage in bytes. Robot mode includes a `logging` field in the checks JSON.
- [ ] File layer always logs at DEBUG+ for `lore::*` crate regardless of `-v` flags.
**Test specifications**:
- `test_verbosity_filter_construction`: Given verbosity count 0/1/2/3, assert the resulting `EnvFilter` matches the expected directives table.
- `test_rust_log_overrides_verbose`: Set `RUST_LOG=lore=trace`, pass `-v` (count=1), assert the filter uses TRACE (not DEBUG).
- `test_quiet_overrides_verbose`: Pass `-q` and `-v` together, assert stderr filter is WARN+.
- `test_logging_config_defaults`: Deserialize an empty `{}` JSON as `LoggingConfig`, assert `retention_days=30`, `file_logging=true`, `log_dir=None`.
- `test_logging_config_partial`: Deserialize `{"retention_days": 7}`, assert `file_logging=true` default preserved.
- `test_log_retention_cleanup`: Create temp dir with files named `lore.2026-01-01.log` through `lore.2026-02-04.log`. Run retention with `retention_days=7`. Assert files older than 7 days are deleted, recent files preserved.
- `test_log_retention_ignores_non_log_files`: Create temp dir with `lore.2026-01-01.log` and `other.txt`. Run retention. Assert `other.txt` is NOT deleted.
- `test_json_log_output_format`: Capture file layer output, parse each line as JSON, assert keys: `timestamp`, `level`, `target`, `fields`, `span`.
- `test_suspending_writer_dual_layer`: Run a tracing event with both layers active and a progress bar. Assert no garbled output on stderr (no interleaved progress bar fragments in log lines).
### 6.2 Phase 2: Spans + Correlation IDs
**Functional criteria**:
- [ ] Every log line within a sync run includes `run_id` in the JSON span context.
- [ ] `jq -c 'select(any(.spans[]?; .run_id == "a1b2c3d4"))' lore.2026-02-04.log` extracts all lines from the run with that `run_id`.
- [ ] Nested spans produce a chain: log lines inside `fetch_pages` include both the `fetch_pages` span and the parent `ingest_issues` span in their span context.
- [ ] `run_id` is an 8-character hex string (truncated UUIDv4).
- [ ] Spans are visible in `-vv` stderr output as bracketed context.
**Test specifications**:
- `test_run_id_format`: Generate 100 run_ids, assert each is 8 chars, all hex characters.
- `test_run_id_uniqueness`: Generate 1000 run_ids, assert no duplicates.
- `test_span_context_in_json_logs`: Run a mock sync with spans, capture JSON log output, parse and verify each line has `spans` array containing `run_id`.
- `test_nested_span_chain`: Create parent span "sync" with child "ingest_issues" with child "fetch_pages". Emit a log event inside "fetch_pages". Assert the JSON log line's span chain includes all three span names.
- `test_span_elapsed_on_close`: Create a span, sleep 10ms, close it. Verify the close event records `elapsed_ms >= 10`.
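One caveat for `test_run_id_uniqueness`: with only 32 bits of identifier space, a batch of random IDs has a small but nonzero chance of colliding. The birthday approximation p ≈ 1 − exp(−n(n−1) / (2·2^bits)) can be evaluated with the sketch below (`collision_probability` is a hypothetical helper, not part of the plan).

```rust
/// Approximate probability that at least two of `n` uniformly random
/// identifiers collide in a space of 2^id_bits values (birthday bound).
fn collision_probability(n: u64, id_bits: u32) -> f64 {
    let space = 2f64.powi(id_bits as i32);
    let pairs = (n as f64) * (n as f64 - 1.0) / 2.0;
    // 1 - exp(-x), computed via exp_m1 for accuracy at small x.
    -(-pairs / space).exp_m1()
}
```

For 1000 IDs of 8 hex characters (32 bits) this gives roughly 1.2 × 10⁻⁴, so the uniqueness test would be expected to fail spuriously about once in 8,600 runs; a smaller batch or a tolerance for rare duplicates would avoid that flakiness.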
### 6.3 Phase 3: Performance Metrics Collection
**Functional criteria**:
- [ ] `lore --robot sync` JSON includes `meta.run_id` (string) and `meta.stages` (array).
- [ ] Each stage in `meta.stages` has: `name`, `elapsed_ms`, `items_processed`.
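- [ ] Top-level stages with child spans (e.g., `ingest_issues`, `ingest_mrs`) include a `sub_stages` array; leaf stages (`generate_docs`, `embed`) omit it, because an empty `sub_stages` is skipped during serialization.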
- [ ] Sub-stages include `project` field when applicable.
- [ ] `lore sync` (interactive) prints a timing summary table on stderr, gated behind `IngestDisplay::show_text`.
- [ ] `lore -q sync` does NOT print the timing summary.
- [ ] Zero-value fields (`items_skipped: 0`, `errors: 0`) are omitted from JSON output.
**Test specifications**:
- `test_stage_timing_serialization`: Create a `StageTiming` with sub_stages, serialize to JSON, assert structure matches PRD example.
- `test_stage_timing_zero_fields_omitted`: Create `StageTiming` with `errors: 0`, serialize, assert no `errors` key in output.
- `test_metrics_layer_single_span`: Create `MetricsLayer`, enter/exit one span with recorded fields, extract timings, assert one `StageTiming` entry.
- `test_metrics_layer_nested_spans`: Create parent + child spans, extract timings, assert parent has child in `sub_stages`.
- `test_metrics_layer_parallel_spans`: Create two sibling spans (simulating two projects), extract timings, assert both appear as sub_stages of parent.
- `test_sync_meta_includes_stages`: Mock a sync pipeline, verify robot JSON output parses correctly with `meta.stages`.
- `test_timing_summary_format`: Capture stderr during interactive sync, verify timing table format matches PRD example.
### 6.4 Phase 4: Sync History Enrichment
**Functional criteria**:
- [ ] `lore sync` creates a row in `sync_runs` with status='running' at start, updated to 'succeeded'/'failed' at finish.
- [ ] `lore ingest issues` also creates a `sync_runs` row.
- [ ] `sync_runs.run_id` matches the `run_id` in log files and robot JSON.
- [ ] `sync_runs.metrics_json` contains the serialized `Vec<StageTiming>`.
- [ ] `sync_runs.total_items_processed` and `total_errors` are populated.
- [ ] `lore sync-status` shows the last 10 runs with: run_id, timestamp, duration, item count, error count.
- [ ] `lore --robot sync-status` JSON includes `runs` array with `stages` parsed from `metrics_json`.
- [ ] Failed syncs record the error message and any partial metrics collected before failure.
- [ ] Migration 014 applies cleanly, and re-running the migration runner afterwards is a no-op (safe to re-run).
**Test specifications**:
- `test_sync_run_recorder_start`: Call `start()`, query sync_runs, assert one row with status='running'.
- `test_sync_run_recorder_succeed`: Call `start()` then `succeed()`, assert row has status='succeeded', finished_at set, metrics_json parseable.
- `test_sync_run_recorder_fail`: Call `start()` then `fail()`, assert row has status='failed', error set.
- `test_sync_run_recorder_fail_with_partial_metrics`: Call `start()`, collect some metrics, then `fail()`. Assert metrics_json contains partial data.
- `test_migration_014_applies`: Apply all migrations 001-014 on a fresh DB. Assert `sync_runs` has `run_id`, `total_items_processed`, `total_errors` columns.
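- `test_migration_014_idempotent`: Run the migration runner twice on the same DB. Assert no error on the second run. The raw SQL is not re-runnable by itself (SQLite rejects `ADD COLUMN` for a column that already exists), so idempotency must come from the runner skipping already-applied migrations.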
- `test_sync_status_shows_runs`: Insert 3 sync_runs rows, run `print_sync_status()`, assert output includes all 3 with correct formatting.
- `test_sync_status_json_includes_stages`: Insert a sync_runs row with metrics_json, run robot-mode sync-status, parse JSON, assert `runs[0].stages` is an array.
### 6.5 Phase 5: Rate Limit + Retry Instrumentation
**Functional criteria**:
- [ ] Rate-limit events (HTTP 429) log at INFO with structured fields: `path`, `attempt`, `retry_after_secs`, `status_code`.
- [ ] Retry events (non-429 transient errors) log with: `path`, `attempt`, `error`.
- [ ] `StageTiming` includes `rate_limit_hits` and `retries` counts (omitted when zero).
- [ ] `lore -v sync` shows retry activity on stderr (visible because it's INFO+).
- [ ] Rate limit counts are included in `metrics_json` stored in `sync_runs`.
**Test specifications**:
- `test_rate_limit_log_fields`: Simulate a 429 response, capture log output, parse JSON, assert fields: `path`, `attempt`, `retry_after_secs`, `status_code`.
- `test_retry_log_fields`: Simulate a transient error + retry, capture log, assert fields: `path`, `attempt`, `error`.
- `test_stage_timing_rate_limit_counts`: Simulate 3 rate-limit hits within a span, extract `StageTiming`, assert `rate_limit_hits == 3`.
- `test_stage_timing_retry_counts`: Simulate 2 retries, extract `StageTiming`, assert `retries == 2`.
- `test_rate_limit_fields_omitted_when_zero`: Create `StageTiming` with zero rate limits, serialize, assert no `rate_limit_hits` key.
---
## 7. Resolved Decisions
1. **Log format**: Use `tracing-subscriber`'s built-in JSON formatter (`fmt::layer().json()`). Zero custom code, battle-tested, and ecosystem tools (Grafana Loki, Datadog) already parse this format. The schema difference from our robot JSON envelope is cosmetic and not worth the maintenance burden of a custom formatter.
2. **Span recording**: Always-on. lore is I/O-bound (GitLab API + SQLite), so the nanosecond-level overhead of span storage and chain lookup is unmeasurable against our millisecond-scale operations. Conditional recording would add subscriber construction complexity for zero practical benefit.
3. **Log file location**: `~/.local/share/lore/logs/` (XDG data directory). Logs are NOT reproducible — you can generate new logs, but you cannot regenerate the exact diagnostic output from a past run. They are forensic artifacts that users would notice missing, so they belong in data, not cache.
4. **Retention**: In scope for Phase 1. Startup cleanup: scan log directory, delete files matching `lore.*.log` older than `retention_days` (default 30). Simple, no background threads, no external dependencies. Runs before subscriber initialization so deleted file handles aren't held.
5. **Stage timing granularity**: Per-project with nested sub-stages. When one project has 500 MRs and another has 3, knowing which one consumed the time budget is the difference between "sync was slow" and actionable diagnosis. The `StageTiming` struct includes an optional `project` field and a `sub_stages: Vec<StageTiming>` field for nesting.
6. **Stage timing collection mechanism**: Materialized from tracing spans, not plumbed manually. A custom `MetricsLayer` in the subscriber stack records span enter/exit/record events and builds the `StageTiming` tree. This avoids threading a mutable collector through every function signature and makes spans the single source of truth for timing data. Phase 2 adds spans; Phase 3 adds the layer that reads them.
7. **run_id format**: First 8 characters of `Uuid::new_v4().to_string()` (e.g., `"a1b2c3d4"`). The `uuid` crate (v1, v4 feature) is already a dependency. No new crate needed. 8 characters provide ~4 billion unique values — more than sufficient for local CLI invocations.
8. **File log level**: Always DEBUG+ for `lore::*` crate, WARN+ for dependencies, regardless of `-v` flags. This ensures post-mortem data is always richer than what was shown on stderr. `RUST_LOG` overrides both layers when set.
9. **sync_runs lifecycle**: The table exists (migration 001) but nothing writes to it. Phase 4 wires up the INSERT (on start) / UPDATE (on finish) lifecycle AND adds enrichment columns in a single migration. The existing `metrics_json` column stores the detailed `Vec<StageTiming>` array — no need for a separate `stages_json` column.
10. **JSON stderr via --log-format**: A `--log-format text|json` global flag controls stderr log format. Default is `text` (human-readable). When `json`, stderr uses the same JSON formatter as the file layer, routed through `SuspendingWriter` for progress bar coordination. This enables `lore sync 2>&1 | jq` workflows without reading log files.
---
## 8. Phase Dependency Graph
```
Phase 1 (Subscriber + Flags)
|
v
Phase 2 (Spans + run_id)
|
+------+------+
| |
v v
Phase 3 Phase 5
(Metrics) (Rate Limit Logging)
| |
v |
Phase 4 |
(Sync History) <--+
```
**Parallelization opportunities**:
- Phase 1 must complete before anything else.
- Phase 2 must complete before Phase 3 or Phase 5.
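- Phase 3 and Phase 5 can largely run in parallel: Phase 5's structured log events (step 1) only need the spans from Phase 2, while its `StageTiming` counters (steps 2 and 3) must land after Phase 3's `MetricsLayer` exists.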
- Phase 4 depends on Phase 3 (needs `Vec<StageTiming>` to store). Phase 5's `rate_limit_hits`/`retries` fields on `StageTiming` can be added to Phase 4's stored data after Phase 5 completes, or Phase 4 can store them as zero initially.
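The parallelization rules above amount to a layered topological sort of the phase graph. As a sketch (the function name `execution_waves` is hypothetical, and the dependency map takes the graph's edges as drawn, including the Phase 5 to Phase 4 arrow):

```rust
use std::collections::BTreeMap;

/// Groups phases into waves: every phase in a wave depends only on phases
/// from earlier waves, so phases within one wave can run in parallel.
fn execution_waves(deps: &BTreeMap<u32, Vec<u32>>) -> Vec<Vec<u32>> {
    let mut done: Vec<u32> = Vec::new();
    let mut waves: Vec<Vec<u32>> = Vec::new();
    while done.len() < deps.len() {
        let wave: Vec<u32> = deps
            .iter()
            .filter(|(phase, reqs)| {
                !done.contains(phase) && reqs.iter().all(|r| done.contains(r))
            })
            .map(|(phase, _)| *phase)
            .collect();
        if wave.is_empty() {
            break; // cycle or dependency on an unknown phase
        }
        done.extend(&wave);
        waves.push(wave);
    }
    waves
}
```

With the edges from the graph (2 needs 1; 3 and 5 need 2; 4 needs 3 and 5), this yields four waves, with Phases 3 and 5 sharing the third.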
**Agent assignment suggestion**:
- Agent A: Phase 1 → Phase 2 (sequential, foundational infrastructure)
- Agent B: Phase 3 (after Phase 2 completes)
- Agent C: Phase 5 (after Phase 2 completes, parallel with Phase 3)
- Agent B or D: Phase 4 (after Phase 3 completes)
---
## 9. References
- Gholamian, S. & Ward, P. (2021). "A Comprehensive Survey of Logging in Software." arXiv:2110.12489.
- Duan, S. et al. (2025). "PDLogger: Automated Logging Framework for Practical Software Development." arXiv:2507.19951.
- tokio-rs/tracing ecosystem: `tracing`, `tracing-subscriber`, `tracing-appender`.
- GNU Coding Standards: Verbosity and diagnostic output conventions.
- Rust CLI Working Group: Recommendations for error reporting and verbosity.

@@ -0,0 +1,131 @@
1. **Make immutable identity usable now (`--author-id`)**
Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.
```diff
@@ Phase 1: `lore notes` Command / Work Chunk 1A
pub struct NoteListFilters<'a> {
+ pub author_id: Option<i64>, // immutable identity filter
@@
- pub author: Option<&'a str>, // case-insensitive match via COLLATE NOCASE
+ pub author: Option<&'a str>, // display-name filter
+ // If both author and author_id are provided, apply both (AND) for precision.
}
@@
Filter mappings:
+ - `author_id`: `n.author_id = ?` (exact immutable identity)
- `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`
@@ Phase 1 / Work Chunk 1B (CLI)
+ /// Filter by immutable author id
+ #[arg(long = "author-id", help_heading = "Filters")]
+ pub author_id: Option<i64>,
@@ Phase 2 / Work Chunk 2F
+ Add `--author-id` support to `lore search` filtering for note documents.
@@ Phase 1 / Work Chunk 1E
+ CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
+ ON notes(project_id, author_id, created_at DESC, id DESC)
+ WHERE is_system = 0 AND author_id IS NOT NULL;
```
2. **Fix document staleness on username changes**
Why: Current plan says username changes are “not semantic,” but note documents include username in content/title, so docs go stale/inconsistent.
```diff
@@ Work Chunk 0D: Immutable Author Identity Capture
- Assert: changed_semantics = false (username change is not a semantic change for documents)
+ Assert: changed_semantics = true (username affects note document content/title)
@@ Work Chunk 0A: semantic-change detection
- old_body != body || old_note_type != note_type || ...
+ old_body != body || old_note_type != note_type || ...
+ || old_author_username != author_username
@@ Work Chunk 2C: Note Document Extractor header
author: @{author}
+ author_id: {author_id}
```
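The revised check can be sketched as a field-by-field comparison. This is an illustrative Python stand-in, not the plan's Rust code; the field tuple lists only the fields named in the diff, and whatever the elided `...` covers is deliberately left out.

```python
# Fields named in the diff; the plan's elided "..." fields are not reproduced.
SEMANTIC_FIELDS = ("body", "note_type", "author_username")


def changed_semantics(old: dict, new: dict) -> bool:
    """True when any field that feeds the note document differs."""
    return any(old.get(name) != new.get(name) for name in SEMANTIC_FIELDS)
```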
3. **Replace `last_seen_at` sweep marker with monotonic `sync_run_id`**
Why: Timestamp markers are vulnerable to clock skew and concurrent runs; run IDs are deterministic and safer.
```diff
@@ Phase 0: Stable Note Identity
+ ### Work Chunk 0E: Monotonic Run Marker
+ Add `sync_runs` table and `notes.last_seen_run_id`.
+ Ingest assigns one run_id per sync transaction.
+ Upsert sets `last_seen_run_id = current_run_id`.
+ Sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).
@@ Work Chunk 0C
- fetch_complete + last_seen_at-based sweep
+ fetch_complete + run_id-based sweep
```
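A rough sketch of the run-ID sweep against an in-memory SQLite database, assuming a pared-down `notes` table. Only `sync_runs`, `last_seen_run_id`, and the sweep condition come from the diff; the function names and remaining columns are illustrative.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sync_runs (id INTEGER PRIMARY KEY AUTOINCREMENT);
        CREATE TABLE notes (
            id INTEGER PRIMARY KEY,
            discussion_id INTEGER NOT NULL,
            body TEXT NOT NULL,
            last_seen_run_id INTEGER NOT NULL
        );
    """)
    return conn


def sync_discussion(conn, discussion_id, fetched, fetch_complete):
    """Upsert fetched (id, body) pairs under one run ID, then sweep if safe."""
    # AUTOINCREMENT yields a strictly increasing run ID, independent of clocks.
    run_id = conn.execute("INSERT INTO sync_runs DEFAULT VALUES").lastrowid
    for note_id, body in fetched:
        conn.execute(
            "INSERT INTO notes (id, discussion_id, body, last_seen_run_id) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET body = excluded.body, "
            "last_seen_run_id = excluded.last_seen_run_id",
            (note_id, discussion_id, body, run_id),
        )
    if fetch_complete:
        # Notes not seen in this run are stale; a partial fetch never sweeps.
        conn.execute(
            "DELETE FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?",
            (discussion_id, run_id),
        )
    conn.commit()
    return run_id
```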
4. **Materialize stale-note set once during sweep**
Why: Current set-based SQL still re-runs the stale subquery 3 times; materializing once improves performance and guarantees identical deletion set.
```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
- DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
- DELETE FROM notes WHERE ...;
+ CREATE TEMP TABLE _stale_note_ids AS
+ SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
+ DELETE FROM documents
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM dirty_sources
+ WHERE source_type='note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system=0);
+ DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
+ DROP TABLE _stale_note_ids;
```
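The materialized sweep can be exercised end to end with Python's `sqlite3`. The schema here is a minimal stand-in, and the temp table is filled with `INSERT ... SELECT` rather than `CREATE ... AS SELECT`, a small variation chosen for this sketch; the deletion order follows the diff.

```python
import sqlite3


def make_schema(conn):
    # Minimal stand-in schema; the real tables carry more columns.
    conn.executescript("""
        CREATE TABLE notes (id INTEGER PRIMARY KEY, discussion_id INTEGER NOT NULL,
                            is_system INTEGER NOT NULL,
                            last_seen_run_id INTEGER NOT NULL);
        CREATE TABLE documents (source_type TEXT NOT NULL, source_id INTEGER NOT NULL);
        CREATE TABLE dirty_sources (source_type TEXT NOT NULL, source_id INTEGER NOT NULL);
    """)


def sweep_stale_notes(conn, discussion_id, current_run_id):
    """Delete stale notes and their derived rows; return the number of notes removed."""
    with conn:  # commit on success, roll back on error
        conn.execute("CREATE TEMP TABLE _stale_note_ids "
                     "(id INTEGER PRIMARY KEY, is_system INTEGER NOT NULL)")
        # The stale set is computed exactly once.
        conn.execute(
            "INSERT INTO _stale_note_ids "
            "SELECT id, is_system FROM notes "
            "WHERE discussion_id = ? AND last_seen_run_id < ?",
            (discussion_id, current_run_id),
        )
        for table in ("documents", "dirty_sources"):
            conn.execute(
                f"DELETE FROM {table} WHERE source_type = 'note' AND source_id IN "
                "(SELECT id FROM _stale_note_ids WHERE is_system = 0)"
            )
        deleted = conn.execute(
            "DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids)"
        ).rowcount
        conn.execute("DROP TABLE _stale_note_ids")
    return deleted
```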
5. **Move historical note backfill out of migration into resumable runtime job**
Why: Data-heavy migration can block startup and is harder to resume/recover on large DBs.
```diff
@@ Work Chunk 2H
- Backfill Existing Notes After Upgrade (Migration 024)
+ Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
@@
- Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
+ Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
@@
- INSERT INTO dirty_sources ... SELECT ... FROM notes ...
+ Introduce batched backfill API:
+ `enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
+ invoked from `generate-docs`/`sync` until complete, resumable across runs.
```
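One way the batched API could behave, sketched in Python against a stand-in schema. The function name and `BackfillProgress` come from the diff; the fields on `BackfillProgress` and the eligibility query are assumptions. Progress is derived from table state rather than a stored cursor, so an interrupted run resumes by calling again.

```python
import sqlite3
from dataclasses import dataclass

# Stand-in schema for the sketch; the real tables carry more columns.
SCHEMA = """
    CREATE TABLE notes (id INTEGER PRIMARY KEY, is_system INTEGER NOT NULL);
    CREATE TABLE documents (source_type TEXT NOT NULL, source_id INTEGER NOT NULL);
    CREATE TABLE dirty_sources (source_type TEXT NOT NULL, source_id INTEGER NOT NULL);
"""


@dataclass
class BackfillProgress:
    enqueued: int  # rows queued by this call
    done: bool     # True once fewer than batch_size eligible notes were found


def enqueue_missing_note_documents(conn, batch_size: int) -> BackfillProgress:
    """Queue up to batch_size non-system notes that have no document yet."""
    rows = conn.execute(
        """
        SELECT n.id FROM notes n
        WHERE n.is_system = 0
          AND NOT EXISTS (SELECT 1 FROM documents d
                          WHERE d.source_type = 'note' AND d.source_id = n.id)
          AND NOT EXISTS (SELECT 1 FROM dirty_sources q
                          WHERE q.source_type = 'note' AND q.source_id = n.id)
        ORDER BY n.id
        LIMIT ?
        """,
        (batch_size,),
    ).fetchall()
    with conn:  # one short transaction per batch
        conn.executemany(
            "INSERT INTO dirty_sources (source_type, source_id) VALUES ('note', ?)",
            rows,
        )
    return BackfillProgress(enqueued=len(rows), done=len(rows) < batch_size)
```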
6. **Add streaming path for large `jsonl`/`csv` note exports**
Why: Current `query_notes` materializes full result set in memory; streaming improves scalability and latency.
```diff
@@ Work Chunk 1A
+ Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.
@@ Work Chunk 1C
- print_list_notes_jsonl(&result)
- print_list_notes_csv(&result)
+ print_list_notes_jsonl_stream(config, filters)
+ print_list_notes_csv_stream(config, filters)
+ (table/json keep counted buffered path)
```
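A sketch of the forward-only path in Python. `query_notes_stream` keeps the `(conn, filters, row_handler)` shape from the diff, but representing `filters` as a `(where_sql, params)` pair, and the selected columns, are assumptions made for this example. Rows are handed to the handler one at a time, so memory use stays flat regardless of result size.

```python
import io
import json
import sqlite3


def query_notes_stream(conn, filters, row_handler) -> int:
    """Call row_handler once per matching row; return the number of rows seen."""
    where_sql, params = filters
    cur = conn.execute(
        f"SELECT id, author_username, body FROM notes WHERE {where_sql} ORDER BY id",
        params,
    )
    columns = [col[0] for col in cur.description]
    count = 0
    for row in cur:  # the cursor yields rows lazily; nothing is buffered here
        row_handler(dict(zip(columns, row)))
        count += 1
    return count


def print_list_notes_jsonl_stream(conn, out) -> int:
    """Write one JSON object per line for every non-system note."""
    return query_notes_stream(
        conn, ("is_system = 0", []),
        lambda note: out.write(json.dumps(note) + "\n"),
    )
```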
7. **Add index for path-centric note queries**
Why: `--path` + project/date queries are a stated hot path and not fully covered by current proposed indexes.
```diff
@@ Work Chunk 1E: Composite Query Index
+ CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
+ ON notes(project_id, position_new_path, created_at DESC, id DESC)
+ WHERE is_system = 0 AND position_new_path IS NOT NULL;
```
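To check whether a query can use the proposed index, `EXPLAIN QUERY PLAN` is a quick probe. The sketch below assumes a cut-down `notes` table and a representative `--path` query. SQLite only considers a partial index when the query's WHERE clause implies the index's WHERE clause, so the query spells out `is_system = 0` as a literal instead of binding it.

```python
import sqlite3

INDEX_NAME = "idx_notes_project_path_created"


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE notes (
            id INTEGER PRIMARY KEY,
            project_id INTEGER NOT NULL,
            position_new_path TEXT,
            created_at INTEGER NOT NULL,
            is_system INTEGER NOT NULL DEFAULT 0
        );
        CREATE INDEX IF NOT EXISTS {INDEX_NAME}
            ON notes(project_id, position_new_path, created_at DESC, id DESC)
            WHERE is_system = 0 AND position_new_path IS NOT NULL;
    """)
    return conn


def plan_for_path_query(conn) -> list[str]:
    """Return the planner's description lines for a path-centric query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id FROM notes "
        "WHERE project_id = ? AND position_new_path = ? AND is_system = 0 "
        "ORDER BY created_at DESC, id DESC LIMIT 50",
        (1, "src/main.rs"),
    ).fetchall()
    return [row[-1] for row in rows]


if __name__ == "__main__":
    for line in plan_for_path_query(build_db()):
        print(line)  # inspect whether the index name appears in the plan
```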
8. **Add property/invariant tests (not only examples)**
Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration; randomized invariants will catch subtle regressions.
```diff
@@ Verification Checklist
+ Add property tests (proptest):
+ - stable local IDs across randomized re-sync orderings
+ - no orphan `documents(source_type='note')` after randomized deletions/sweeps
+ - partial-fetch runs never reduce note count
+ - repeated full rebuild converges (fixed-point idempotence)
```
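The shape of such an invariant check, sketched with Python's standard `random` module as a stand-in for `proptest`. The in-memory `store` (note ID mapped to last-seen run ID) is a toy model of the sweep logic, not the real ingestion code.

```python
import random


def apply_sync(store: dict, fetched, fetch_complete: bool, run_id: int) -> None:
    """Toy model: mark fetched notes as seen, sweep unseen ones on a full fetch."""
    for note_id in fetched:
        store[note_id] = run_id
    if fetch_complete:
        for note_id in [n for n, seen in store.items() if seen < run_id]:
            del store[note_id]


def check_sweep_invariants(seed: int, rounds: int = 200) -> bool:
    """Run randomized syncs and assert the invariants after every step."""
    rng = random.Random(seed)  # seeded so that failures are reproducible
    store: dict = {}
    for run_id in range(1, rounds + 1):
        fetched = rng.sample(range(20), rng.randint(0, 20))
        fetch_complete = rng.random() < 0.5
        before = len(store)
        apply_sync(store, fetched, fetch_complete, run_id)
        if fetch_complete:
            # A full fetch leaves exactly the fetched set behind.
            assert set(store) == set(fetched)
        else:
            # A partial fetch never reduces the note count.
            assert len(store) >= before
    return True
```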
These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scale behavior, and long-term maintainability.

2518
docs/prd-per-note-search.md Normal file

File diff suppressed because it is too large


@@ -2,19 +2,22 @@
## Overview
Robot mode optimizes the `lore` CLI for AI agent consumption with structured JSON output, meaningful exit codes, and token-efficient responses.
Robot mode optimizes the `lore` CLI for AI agent consumption with compact JSON output, structured errors with machine-actionable recovery steps, meaningful exit codes, response timing metadata, field selection for token efficiency, and TTY auto-detection.
## Activation
```bash
# Explicit flag
lore --robot list issues
lore --robot issues -n 5
# Auto-detection (when stdout is not a TTY)
lore list issues | jq .
# JSON shorthand
lore -J issues -n 5
# Environment variable
LORE_ROBOT=true lore list issues
LORE_ROBOT=1 lore issues
# Auto-detection (when stdout is not a TTY)
lore issues | jq .
```
## Global Flags
@@ -22,218 +25,160 @@ LORE_ROBOT=true lore list issues
| Flag | Description |
|------|-------------|
| `--robot` | Force JSON output, structured errors |
| `--quiet` | Suppress progress/spinners (implied by --robot) |
| `-J` / `--json` | Shorthand for `--robot` |
| `--quiet` | Suppress progress/spinners (implied by `--robot`) |
| `--fields <list>` | Select output fields for list commands |
## Exit Codes
## Response Envelope
| Code | ErrorCode | Meaning |
|------|-----------|---------|
| 0 | - | Success |
| 1 | INTERNAL_ERROR | Unknown/internal error |
| 2 | CONFIG_NOT_FOUND | Config file missing |
| 3 | CONFIG_INVALID | Config file malformed |
| 4 | TOKEN_NOT_SET | GitLab token not configured |
| 5 | GITLAB_AUTH_FAILED | Authentication failed |
| 6 | GITLAB_NOT_FOUND | Resource not found |
| 7 | GITLAB_RATE_LIMITED | Rate limited |
| 8 | GITLAB_NETWORK_ERROR | Network/connection error |
| 9 | DB_LOCKED | Database locked by another process |
| 10 | DB_ERROR | Database error |
| 11 | MIGRATION_FAILED | Migration failed |
| 12 | IO_ERROR | File I/O error |
| 13 | TRANSFORM_ERROR | Data transformation error |
All commands return a consistent JSON envelope to stdout:
```json
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```
Key properties:
- **Compact JSON**: Single-line output (no pretty-printing) for efficient parsing
- **Uniform envelope**: Every command wraps its data in `{"ok":true,"data":{...},"meta":{...}}`
- **Timing metadata**: `meta.elapsed_ms` is present on every response (wall-clock milliseconds)
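
On the consuming side, an agent might unwrap the envelope along these lines. This is a sketch in Python; `parse_envelope` is a hypothetical helper, and the payload used to exercise it is made up.

```python
import json


def parse_envelope(line: str):
    """Split a success envelope into (data, meta); raise if it is not one."""
    envelope = json.loads(line)
    if envelope.get("ok") is not True:
        raise ValueError("response is not a success envelope")
    return envelope["data"], envelope.get("meta", {})


if __name__ == "__main__":
    data, meta = parse_envelope('{"ok":true,"data":{"n":1},"meta":{"elapsed_ms":42}}')
    print(data, meta["elapsed_ms"])
```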
## Error Output Format
When `--robot` is active, errors are JSON on stderr:
Errors are JSON on stderr with structured fields for programmatic handling:
```json
{
"error": {
"code": "CONFIG_NOT_FOUND",
"message": "Config file not found at ~/.config/lore/config.toml",
"suggestion": "Run 'lore init' to create configuration"
"message": "Config file not found at ~/.config/lore/config.json. Run \"lore init\" first.",
"suggestion": "Run 'lore init' to set up your GitLab connection.",
"actions": ["lore init"]
}
}
```
## Success Output Format
| Field | Type | Description |
|-------|------|-------------|
| `code` | string | Machine-readable error code (e.g., `CONFIG_NOT_FOUND`) |
| `message` | string | Human-readable error description |
| `suggestion` | string? | Recovery guidance (omitted when not applicable) |
| `actions` | string[]? | Executable shell commands for recovery (omitted when empty) |
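
A sketch of how an agent could turn the stderr payload into a typed value, treating `suggestion` and `actions` as optional per the table above. `RobotError` and `parse_robot_error` are hypothetical names for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RobotError:
    code: str
    message: str
    suggestion: Optional[str] = None             # omitted when not applicable
    actions: list = field(default_factory=list)  # omitted when empty


def parse_robot_error(stderr_text: str) -> RobotError:
    """Parse the structured error object that robot mode writes to stderr."""
    err = json.loads(stderr_text)["error"]
    return RobotError(
        code=err["code"],
        message=err["message"],
        suggestion=err.get("suggestion"),
        actions=err.get("actions", []),
    )
```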
All commands return consistent JSON structure:
### Error Actions by Code
```json
{
"ok": true,
"data": { ... },
"meta": {
"count": 50,
"total": 1234,
"elapsed_ms": 45
}
}
| Error Code | Actions |
|------------|---------|
| `CONFIG_NOT_FOUND` | `["lore init"]` |
| `CONFIG_INVALID` | `["lore init --force"]` |
| `GITLAB_AUTH_FAILED` | `["export GITLAB_TOKEN=glpat-xxx", "lore auth"]` |
| `TOKEN_NOT_SET` | `["export GITLAB_TOKEN=glpat-xxx"]` |
| `OLLAMA_UNAVAILABLE` | `["ollama serve"]` |
| `OLLAMA_MODEL_NOT_FOUND` | `["ollama pull nomic-embed-text"]` |
| `DB_LOCKED` | `["lore ingest --force"]` |
| `EMBEDDING_FAILED` | `["lore embed --retry-failed"]` |
| `MIGRATION_FAILED` | `["lore migrate"]` |
| `GITLAB_NETWORK_ERROR` | `["lore doctor"]` |
## Exit Codes
| Code | ErrorCode | Meaning |
|------|-----------|---------|
| 0 | -- | Success |
| 1 | `INTERNAL_ERROR` | Unknown/internal error |
| 2 | -- | Usage error (invalid flags or arguments) |
| 3 | `CONFIG_INVALID` | Config file malformed |
| 4 | `TOKEN_NOT_SET` | GitLab token not configured |
| 5 | `GITLAB_AUTH_FAILED` | Authentication failed |
| 6 | `GITLAB_NOT_FOUND` | Resource not found |
| 7 | `GITLAB_RATE_LIMITED` | Rate limited |
| 8 | `GITLAB_NETWORK_ERROR` | Network/connection error |
| 9 | `DB_LOCKED` | Database locked by another process |
| 10 | `DB_ERROR` | Database error |
| 11 | `MIGRATION_FAILED` | Migration failed |
| 12 | `IO_ERROR` | File I/O error |
| 13 | `TRANSFORM_ERROR` | Data transformation error |
| 14 | `OLLAMA_UNAVAILABLE` | Ollama not running |
| 15 | `OLLAMA_MODEL_NOT_FOUND` | Ollama model not installed |
| 16 | `EMBEDDING_FAILED` | Embedding generation failed |
| 17 | `NOT_FOUND` | Entity does not exist locally |
| 18 | `AMBIGUOUS` | Multiple projects match (use `-p`) |
| 19 | -- | Health check failed |
| 20 | `CONFIG_NOT_FOUND` | Config file missing |
## Field Selection
The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in each item of the response array:
```bash
# Preset: ~60% fewer tokens
lore -J issues --fields minimal
# Custom field list
lore -J mrs --fields iid,title,state,draft,target_branch
```
## Command-Specific Output
### Presets
### lore list issues --robot
| Preset | Expands to |
|--------|------------|
| `minimal` | `iid`, `title`, `state`, `updated_at_iso` |
```json
{
"ok": true,
"data": {
"issues": [
{
"iid": 123,
"project": "group/project",
"title": "Bug in login",
"state": "opened",
"author": "username",
"assignees": ["user1"],
"labels": ["bug", "priority::high"],
"discussions": { "total": 5, "unresolved": 2 },
"updated_at": "2024-01-15T10:30:00Z",
"web_url": "https://..."
}
]
},
"meta": { "showing": 50, "total": 234 }
}
### Available Fields
**Issues**: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at_iso`
**MRs**: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`
Field selection applies only to list output, not to show (single-entity) output, which returns full detail.
## Command Response Schemas
Every command in `lore robot-docs` includes a `response_schema` field describing the shape of its JSON response. This enables agents to understand response structures without trial-and-error.
```bash
# Get schema for a specific command
lore robot-docs | jq '.data.commands.issues.response_schema'
# Get all schemas
lore robot-docs | jq '[.data.commands | to_entries[] | select(.value.response_schema) | {(.key): .value.response_schema}] | add'
```
### lore show issue 123 --robot
## Clap Error Handling
Parse errors from the argument parser emit structured JSON to stderr with semantic error codes:
| Code | Meaning |
|------|---------|
| `UNKNOWN_COMMAND` | Unrecognized subcommand (includes fuzzy suggestion) |
| `UNKNOWN_FLAG` | Unrecognized command-line flag |
| `MISSING_REQUIRED` | Required argument not provided |
| `INVALID_VALUE` | Invalid value for argument |
| `TOO_MANY_VALUES` | Too many values provided |
| `TOO_FEW_VALUES` | Too few values provided |
| `ARGUMENT_CONFLICT` | Conflicting arguments |
| `MISSING_COMMAND` | No subcommand provided |
| `HELP_REQUESTED` | Help or version flag used |
| `PARSE_ERROR` | General parse error |
Unknown commands include a fuzzy suggestion when a close match exists:
```json
{
"ok": true,
"data": {
"issue": {
"iid": 123,
"project": "group/project",
"title": "Bug in login",
"description": "Full markdown...",
"state": "opened",
"author": "username",
"created_at": "2024-01-10T08:00:00Z",
"updated_at": "2024-01-15T10:30:00Z",
"discussions": [
{
"id": "abc123",
"resolved": false,
"notes": [
{
"author": "user1",
"body": "Comment text...",
"created_at": "2024-01-11T09:00:00Z",
"system": false
}
]
}
]
}
}
}
{"error":{"code":"UNKNOWN_COMMAND","message":"...","suggestion":"Did you mean 'lore issues'? Run 'lore robot-docs' for all commands"}}
```
### lore ingest --type issues --robot
## Agent Self-Discovery
```json
{
"ok": true,
"data": {
"resource_type": "issues",
"projects": [
{
"path": "group/project",
"issues_synced": 45,
"discussions_synced": 123
}
],
"totals": {
"issues": 45,
"discussions": 123
}
},
"meta": { "elapsed_ms": 3400 }
}
`lore robot-docs` provides a complete manifest for agent bootstrapping:
```bash
lore robot-docs # Pretty-printed (human-readable)
lore --robot robot-docs # Compact (for parsing)
```
### lore count issues --robot
```json
{
"ok": true,
"data": {
"entity": "issues",
"count": 1234,
"breakdown": {
"opened": 456,
"closed": 778
}
}
}
```
### lore doctor --robot
```json
{
"ok": true,
"data": {
"success": true,
"checks": {
"config": { "status": "ok", "path": "~/.config/lore/config.toml" },
"database": { "status": "ok", "version": 6 },
"gitlab": { "status": "ok", "user": "username" },
"projects": [
{ "path": "group/project", "status": "ok" }
]
}
}
}
```
### lore sync-status --robot
```json
{
"ok": true,
"data": {
"last_sync": {
"status": "completed",
"resource_type": "issues",
"started_at": "2024-01-15T10:00:00Z",
"completed_at": "2024-01-15T10:00:45Z",
"duration_ms": 45000
},
"cursors": [
{
"project": "group/project",
"resource_type": "issues",
"cursor": "2024-01-15T10:00:00Z"
}
]
}
}
```
## Implementation Plan
### Phase 1: Core Infrastructure
1. Add `--robot` global flag to Cli struct
2. Create `RobotOutput` trait for consistent JSON serialization
3. Add exit code mapping from ErrorCode
4. Implement TTY detection with `atty` crate
### Phase 2: Command Updates
1. Update all commands to check robot mode
2. Add JSON output variants for commands missing them (count, ingest, sync-status)
3. Suppress progress bars in robot mode
### Phase 3: Error Handling
1. Update main.rs error handler for robot mode
2. Add suggestion field to GiError variants
3. Emit structured JSON errors to stderr
### Phase 4: Documentation
1. Update AGENTS.md with robot mode commands
2. Add --robot examples to help text
The manifest includes:
- All commands with flags, examples, and response schemas
- Deprecated command aliases (e.g., `list issues` -> `issues`)
- Exit codes with meanings
- Clap error codes
- Suggested workflows (first setup, daily sync, search, pre-flight)
- Activation methods (flags, env vars, TTY auto-detection)
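
An agent that cannot read the manifest at startup could mirror the revised exit-code table from this document as a static lookup. A sketch in Python follows; the `TRANSIENT` set is this example's own guess at which failures are worth retrying, not something the documentation specifies.

```python
from typing import Optional

# Mirrors the revised exit-code table; codes 0, 2 and 19 carry no ErrorCode.
EXIT_CODE_NAMES = {
    1: "INTERNAL_ERROR", 3: "CONFIG_INVALID", 4: "TOKEN_NOT_SET",
    5: "GITLAB_AUTH_FAILED", 6: "GITLAB_NOT_FOUND", 7: "GITLAB_RATE_LIMITED",
    8: "GITLAB_NETWORK_ERROR", 9: "DB_LOCKED", 10: "DB_ERROR",
    11: "MIGRATION_FAILED", 12: "IO_ERROR", 13: "TRANSFORM_ERROR",
    14: "OLLAMA_UNAVAILABLE", 15: "OLLAMA_MODEL_NOT_FOUND",
    16: "EMBEDDING_FAILED", 17: "NOT_FOUND", 18: "AMBIGUOUS",
    20: "CONFIG_NOT_FOUND",
}

# Assumption for illustration only: rate limits, network errors and a locked
# database are treated as retryable.
TRANSIENT = {7, 8, 9}


def error_code_for_exit(status: int) -> Optional[str]:
    """Return the documented ErrorCode for an exit status, if it has one."""
    return EXIT_CODE_NAMES.get(status)


def is_transient(status: int) -> bool:
    return status in TRANSIENT
```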

541
docs/user-journeys.md Normal file

@@ -0,0 +1,541 @@
# Lore CLI User Journeys
## Purpose
Map realistic workflows for both human users and AI agents to identify gaps in the command surface and optimization opportunities. Each journey starts with a **problem** and traces the commands needed to reach a **resolution**.
---
## Part 1: Human User Flows
### H1. Morning Standup Prep
**Problem:** "What happened since yesterday? I need to know what moved before standup."
**Flow:**
```
lore sync -q # Refresh data (quiet, no noise)
lore issues -s opened --since 1d # Issues that changed overnight
lore mrs -s opened --since 1d # MRs that moved
lore who @me # My current workload snapshot
```
**Gap identified:** No single "activity feed" command. User runs 3 queries to get what should be one view. No `--since 1d` shorthand for "since yesterday." No `@me` alias for the authenticated user.
---
### H2. Sprint Planning: What's Ready to Pick Up?
**Problem:** "We're planning the next sprint. What's open, unassigned, and actionable?"
**Flow:**
```
lore issues -s opened -p myproject # All open issues
lore issues -s opened -l "ready" # Issues labeled ready
lore issues -s opened --has-due # Issues with deadlines approaching
lore count issues -p myproject # How many total?
```
**Gap identified:** No way to filter by "unassigned" issues (missing `--no-assignee` flag). No way to sort by due date. No way to see priority/weight. Can't combine filters like "opened AND no assignee AND has due date."
---
### H3. Investigating a Production Incident
**Problem:** "Deploy broke prod. I need the full timeline of what changed around the deploy."
**Flow:**
```
lore sync -q # Get latest
lore timeline "deploy" --since 7d # What happened around deploys
lore search "deploy" --type mr # MRs mentioning deploy
lore mrs 456 # Inspect the suspicious MR
lore who --overlap src/deploy/ # Who else touches deploy code
```
**Gap identified:** Timeline is keyword-based, not event-based. Can't filter by "MRs merged in the last 24 hours" directly. No way to see which MRs were merged between two dates (release diff). Would benefit from `lore mrs -s merged --since 1d`.
---
### H4. Preparing to Review Someone's MR
**Problem:** "I was assigned to review MR !789. I need context before diving in."
**Flow:**
```
lore mrs 789 # Read the MR description + discussions
lore mrs 789 -o # Open in browser for the actual diff
lore who src/features/auth/ # Who are the experts in this area?
lore search "auth refactor" --type issue # Related issues for background
lore timeline "authentication" # History of auth changes
```
**Gap identified:** No way to see the file list touched by an MR from the CLI (data is stored in `mr_file_changes` but not surfaced). No way to link an MR back to its closing issue(s) from the MR detail view. The cross-reference data exists in `entity_references` but isn't shown in `mrs <iid>` output.
---
### H5. Onboarding to an Unfamiliar Code Area
**Problem:** "I'm new to the team and need to understand how the billing module works."
**Flow:**
```
lore search "billing" -n 20 # What exists about billing?
lore who src/billing/ # Who knows billing best?
lore timeline "billing" --depth 2 # History of billing changes
lore mrs -s merged -l billing --since 6m # Recent merged billing work
lore issues -s opened -l billing # Outstanding billing issues
```
**Gap identified:** No way to get a "module overview" in one command. The search spans issues, MRs, and discussions but doesn't summarize by category. No way to see the most-discussed or most-referenced entities (high-signal items for understanding).
---
### H6. Finding the Right Reviewer for My PR
**Problem:** "I'm about to submit a PR touching auth and payments. Who should review?"
**Flow:**
```
lore who src/features/auth/ # Auth experts
lore who src/features/payments/ # Payment experts
lore who @candidate1 # Check candidate1's workload
lore who @candidate2 # Check candidate2's workload
```
**Gap identified:** No way to query multiple paths at once (`lore who src/auth/ src/payments/`). No way to find the intersection of expertise. No workload-aware recommendation ("who knows this AND has bandwidth"). Four separate commands for what should be one decision.
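
Until `who` accepts several paths, the intersection has to be computed client-side. A sketch in Python, assuming the caller supplies a `lookup` function that returns expert usernames for one path (for example by running `lore -J who <path>` and extracting names from the response, whose exact shape is not shown in this document):

```python
from typing import Callable, Iterable


def experts_for_paths(paths: Iterable[str],
                      lookup: Callable[[str], Iterable[str]]):
    """Return (experts per path, experts common to every path)."""
    per_path = {path: set(lookup(path)) for path in paths}
    common = set.intersection(*per_path.values()) if per_path else set()
    return per_path, common
```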
---
### H7. Understanding Why a Feature Was Built This Way
**Problem:** "This code is weird. Why was it implemented like this? What was the original discussion?"
**Flow:**
```
lore search "feature-name rationale" # Search for decision context
lore timeline "feature-name" --depth 2 # Full history with cross-refs
lore issues 234 # Read the original issue
lore mrs 567 # Read the implementation MR
```
**Gap identified:** No way to search within a specific issue's or MR's discussion notes. The search covers documents (titles + descriptions) but per-note search isn't available yet (PRD exists). No way to navigate "issue 234 was closed by MR 567" without manually knowing both IDs.
---
### H8. Checking Team Workload Before Assigning Work
**Problem:** "I need to assign this urgent bug. Who has the least on their plate?"
**Flow:**
```
lore who @alice # Alice's workload
lore who @bob # Bob's workload
lore who @carol # Carol's workload
lore who @dave # Dave's workload
```
**Gap identified:** No team-level workload view. Must query each person individually. No way to list "all assignees and their open issue counts." No concept of a team roster. Would benefit from `lore who --team` or `lore workload`.
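
As a stopgap, the per-person loop can be scripted. This Python sketch shells out to `lore -J who @<user>` once per roster entry and keeps each response's `data` object as-is, since the `who` response schema is not reproduced here. The `run` parameter exists so the loop can be exercised without the CLI installed.

```python
import json
import subprocess


def run_lore(args):
    """Run `lore -J <args...>` and return the parsed JSON envelope."""
    completed = subprocess.run(
        ["lore", "-J", *args], capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


def collect_workloads(usernames, run=run_lore):
    """Map each username to the `data` payload of `lore -J who @<username>`."""
    return {user: run(["who", f"@{user}"])["data"] for user in usernames}
```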
---
### H9. Preparing Release Notes
**Problem:** "We're cutting a release. I need to summarize what's in this version."
**Flow:**
```
lore mrs -s merged --since 2w -p myproject # MRs merged since last release
lore issues -s closed --since 2w -p myproject # Issues closed since last release
lore mrs -s merged -l feature --since 2w # Feature MRs specifically
lore mrs -s merged -l bugfix --since 2w # Bugfix MRs
```
**Gap identified:** No way to filter MRs by milestone for version-based releases: `issues` has `-m` for milestone, but `mrs` does not. No changelog generation. No "what closed between tag A and tag B." No grouping by label for release note categories.
---
### H10. Finding and Closing Stale Issues
**Problem:** "Our backlog is bloated. Which issues haven't been touched in months?"
**Flow:**
```
lore issues -s opened --sort updated --asc -n 50 # Oldest-updated first
# Then manually inspect each one...
lore issues 42 # Is this still relevant?
```
**Gap identified:** No `--before` or `--updated-before` filter (only `--since` exists). Can sort ascending but can't filter "not updated in 90 days." No staleness indicator. No bulk operations concept.
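
Without an `--updated-before` filter, the cutoff has to be applied after fetching. A Python sketch, assuming each list item carries the `updated_at_iso` field as an ISO 8601 timestamp with a `Z` suffix or explicit offset (the exact format is not confirmed here):

```python
from datetime import datetime, timedelta, timezone


def stale_issues(issues, now: datetime, max_age_days: int = 90):
    """Return the issues whose updated_at_iso is older than the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for issue in issues:
        # Normalise a trailing "Z" so fromisoformat() accepts it on older Pythons.
        stamp = issue["updated_at_iso"].replace("Z", "+00:00")
        if datetime.fromisoformat(stamp) < cutoff:
            stale.append(issue)
    return stale


if __name__ == "__main__":
    sample = [{"iid": 42, "updated_at_iso": "2025-06-01T00:00:00Z"}]
    print(stale_issues(sample, datetime.now(timezone.utc)))
```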
---
### H11. Understanding a Bug's Full History
**Problem:** "Bug #321 keeps getting reopened. I need to understand its entire lifecycle."
**Flow:**
```
lore issues 321 # Read the issue
lore timeline "bug-keyword" -p myproject # Try to find timeline events
# But timeline is keyword-based, not entity-based...
```
**Gap identified:** No way to get a timeline for a specific entity by IID. `lore timeline` requires a keyword query, not an entity reference. Would benefit from `lore timeline --issue 321` or `lore timeline --mr 456` to get the event history of a specific entity directly.
---
### H12. Identifying Who to Ask About Failing Tests
**Problem:** "CI tests are failing in `src/lib/parser.rs`. Who last touched this?"
**Flow:**
```
lore who src/lib/parser.rs # Expert lookup
lore who --overlap src/lib/parser.rs # Who else has touched it
lore search "parser" --type mr --since 2w # Recent MRs touching parser
```
**Gap identified:** Expert mode uses DiffNote analysis (code review comments), not actual file change tracking. The `mr_file_changes` table has the real data but `who` doesn't use it for attribution. Could be much more accurate with file-change-based expertise.
---
### H13. Tracking a Feature Across Multiple MRs
**Problem:** "The 'dark mode' feature spans 5 MRs. I need to see them all together."
**Flow:**
```
lore mrs -l dark-mode # MRs with the label
lore issues -l dark-mode # Related issues
lore timeline "dark mode" --depth 2 # Cross-referenced events
```
**Gap identified:** Works reasonably well with labels as the grouping mechanism. But if the team didn't label consistently, there's no way to discover related MRs by content similarity. No "related items" view that combines issues + MRs + discussions for a topic.
---
### H14. Checking if a Similar Fix Was Already Attempted
**Problem:** "Before I implement this fix, was something similar tried before?"
**Flow:**
```
lore search "memory leak connection pool" # Semantic search
lore search "connection pool" --type mr -s all # Wait, no state filter on search
lore mrs -s closed -l bugfix # Closed bugfix MRs (coarse)
lore timeline "connection pool" # Historical context
```
**Gap identified:** Search doesn't have a `--state` filter. Can't search only closed/merged items. The semantic search is powerful but can't be combined with entity state. Would benefit from `--state merged` on search to find past attempts.
---
### H15. Reviewing Discussions That Need My Attention
**Problem:** "Which discussion threads am I involved in that are still unresolved?"
**Flow:**
```
lore who --active # All active unresolved discussions
lore who --active --since 30d # Wider window
# But can't filter to "discussions I'm in"...
```
**Gap identified:** `--active` shows all unresolved discussions, not filtered by participant. No way to say "show me discussions where @me participated." No notification/mention tracking. No "my unresolved threads" view.
---
## Part 2: AI Agent Flows
### A1. Context Gathering Before Code Modification
**Problem:** Agent is about to modify `src/features/auth/session.rs` and needs full context.
**Flow:**
```
lore -J health # Pre-flight check
lore -J who src/features/auth/ # Who knows this area
lore -J search "auth session" -n 10 # Related issues/MRs
lore -J mrs -s merged --since 3m -l auth # Recent auth changes
lore -J who --overlap src/features/auth/session.rs # Concurrent work risk
```
**Gap identified:** No way to check "are there open MRs touching this file right now?" The overlap mode shows historical touches, not active branches. An agent needs to know about in-flight changes to avoid conflicts.
---
### A2. Auto-Triaging an Incoming Issue
**Problem:** Agent receives a new issue and needs to categorize it, find related work, and suggest assignees.
**Flow:**
```
lore -J issues 999 # Read the new issue
lore -J search "$(extract_keywords)" --explain # Find similar past issues
lore -J who src/affected/path/ # Suggest experts as assignees
lore -J issues -s opened -l same-label # Check for duplicates
```
**Gap identified:** No way to get just the description text for programmatic keyword extraction. `issues <iid>` returns full detail including discussions. Agent must parse the full response to extract the description for a secondary search. Would benefit from `--fields description` on detail view. No duplicate detection built in.
---
### A3. Generating Sprint Status Report
**Problem:** Agent needs to produce a weekly status report for the team.
**Flow:**
```
lore -J issues -s closed --since 1w --fields minimal # Completed work
lore -J issues -s opened --status "In progress" # In-flight work
lore -J mrs -s merged --since 1w --fields minimal # Merged PRs
lore -J mrs -s opened -D --fields minimal # Open non-draft MRs
lore -J count issues # Totals
lore -J count mrs # MR totals
lore -J who --active --since 1w # Discussions needing attention
```
**Gap identified:** Seven separate queries for one report. No `lore summary` or `lore report` command. No way to get "issues transitioned from X to Y this week" (state change history exists in events but isn't queryable). No velocity metric (issues closed per week trend).
---
### A4. Finding Relevant Prior Art Before Implementing
**Problem:** Agent is implementing a caching layer and wants to find if similar patterns exist in the codebase's GitLab history.
**Flow:**
```
lore -J search "caching" --mode hybrid -n 20 --explain
lore -J search "cache invalidation" --mode hybrid -n 10
lore -J search "redis" --mode lexical --type discussion # Exact term in discussions
lore -J timeline "cache" --since 1y # Wait, max is 1y? Let's try 12m
```
**Gap identified:** No way to search discussion notes individually (per-note search). Discussions are aggregated into documents, so individual note-level matches are lost. The `--explain` flag helps but doesn't show which specific note matched. No `--since 1y` or `--since 12m` duration format.
---
### A5. Building Context for PR Description
**Problem:** Agent wrote code and needs to generate a PR description that references relevant issues.
**Flow:**
```
lore -J search "feature description keywords" --type issue
lore -J issues -s opened -l feature-label --fields iid,title,web_url
# Cross-reference: which issues does this MR close?
# No command for this -- must manually scan search results
```
**Gap identified:** No way to query the `entity_references` table directly. Agent can't ask "which issues reference MR !456" or "which issues contain 'closes #123' in their text." The data exists but isn't exposed as a query surface. Would benefit from `lore refs --mr 456` or `lore refs --issue 123`.
---
### A6. Identifying Affected Experts for Review Assignment
**Problem:** Agent needs to automatically assign reviewers based on the files changed in an MR.
**Flow:**
```
lore -J mrs 456 # Get MR details
# Parse file paths from response... but file changes aren't in the output
lore -J who src/path/from/mr/ # Query each path
lore -J who src/another/path/ # One at a time...
lore -J who @candidate --fields minimal # Check workload
```
**Gap identified:** MR detail view (`mrs <iid>`) doesn't include the file change list from `mr_file_changes`. Agent can't programmatically extract which files an MR touches. Must fall back to GitLab API or guess from description. The `who` command doesn't accept multiple paths. No "auto-reviewer" suggestion combining expertise + availability.
---
### A7. Incident Investigation and Timeline Reconstruction
**Problem:** Agent needs to reconstruct what happened during an outage for a postmortem.
**Flow:**
```
lore -J timeline "outage" --since 3d --depth 2 --expand-mentions
lore -J search "error 500" --since 3d
lore -J mrs -s merged --since 3d -p production-service
lore -J issues --status "In progress" -p production-service
```
**Gap identified:** Timeline is keyword-seeded, which means if the outage wasn't described with that exact term, seeds may miss it. No way to seed a timeline from an entity ID (e.g., "start from issue #321 and expand outward"). No severity/priority filter. No way to correlate with merge times.
---
### A8. Cross-Project Impact Assessment
**Problem:** Agent needs to understand how a breaking API change in project A affects projects B and C.
**Flow:**
```
lore -J search "api-endpoint-name" -p project-a
lore -J search "api-endpoint-name" -p project-b
lore -J search "api-endpoint-name" -p project-c
# Or without project filter to search everywhere:
lore -J search "api-endpoint-name" -n 50
lore -J timeline "api-endpoint-name" --depth 2
```
**Gap identified:** Cross-project references in entity_references are tracked but the timeline shows unresolved references for entities not synced locally. No way to see a cross-project dependency map. Search works across projects but doesn't group results by project.
---
### A9. Automated Stale Issue Recommendations
**Problem:** Agent runs weekly to identify issues that should be closed or re-prioritized.
**Flow:**
```
lore -J issues -s opened --sort updated --asc -n 100 # Oldest first
# For each issue, check:
lore -J issues <iid> # Read details
lore -J search "<issue title keywords>" # Any recent activity?
```
**Gap identified:** No `--updated-before` filter, so agent must fetch all and filter client-side. No way to detect "issue has no assignee AND no activity in 90 days." The 100-issue limit means pagination is needed for large backlogs, but there's no cursor/offset pagination -- only `--limit`. Agent must do N+1 queries to inspect each candidate.
---
### A10. Code Review Preparation (File-Level Context)
**Problem:** Agent is reviewing MR !789 and needs to understand the history of each changed file.
**Flow:**
```
lore -J mrs 789 # Get MR details
# Can't get file list from output...
# Fall back to search by MR title keywords
lore -J search "feature-from-mr" --type mr
lore -J who src/guessed/path/ # Expertise for each file
lore -J who --overlap src/guessed/path/ # Concurrent changes
```
**Gap identified:** Same as A6 -- `mr_file_changes` data isn't exposed. Agent is blind to the actual files in the MR unless it parses the description or uses the GitLab API directly. This is the single biggest gap for automated code review workflows.
---
### A11. Building a Knowledge Graph of Entity Relationships
**Problem:** Agent wants to map how issues, MRs, and discussions are connected for a feature.
**Flow:**
```
lore -J search "feature-name" -n 30
lore -J timeline "feature-name" --depth 2 --max-entities 100
# Timeline shows expanded entities and cross-refs, but...
# No way to query entity_references directly
# No way to get "all entities that reference issue #123"
```
**Gap identified:** The `entity_references` table (closes, related, mentioned) is used internally by timeline but isn't queryable as a standalone command. Agent can't ask "what closes issue #123?" or "what does MR !456 reference?" No graph export. Would enable powerful dependency mapping.
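To make the missing query concrete, here is a minimal in-memory sketch of a reverse lookup over reference rows. The `Entity`, `RefKind`, and `Reference` types are hypothetical stand-ins for whatever `entity_references` actually stores; only the three reference kinds (closes, related, mentioned) come from the text above.

```rust
use std::collections::HashMap;

/// Hypothetical entity identifier; the real table likely keys on internal ids.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Entity {
    Issue(u64),
    MergeRequest(u64),
}

/// The three reference kinds named in the gap description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefKind {
    Closes,
    Related,
    Mentioned,
}

/// One row: `source` references `target` with the given kind.
#[derive(Debug, Clone, Copy)]
pub struct Reference {
    pub source: Entity,
    pub target: Entity,
    pub kind: RefKind,
}

/// Build a reverse index: target entity -> everything that references it.
pub fn index_by_target(refs: &[Reference]) -> HashMap<Entity, Vec<(Entity, RefKind)>> {
    let mut index: HashMap<Entity, Vec<(Entity, RefKind)>> = HashMap::new();
    for r in refs {
        index.entry(r.target).or_default().push((r.source, r.kind));
    }
    index
}

/// Answer "what closes this entity?" from the reverse index.
pub fn closers_of(index: &HashMap<Entity, Vec<(Entity, RefKind)>>, target: Entity) -> Vec<Entity> {
    index
        .get(&target)
        .map(|sources| {
            sources
                .iter()
                .filter(|(_, kind)| *kind == RefKind::Closes)
                .map(|(source, _)| *source)
                .collect()
        })
        .unwrap_or_default()
}
```

A standalone command would presumably run the equivalent lookup in SQL; the sketch only shows the shape of the question an agent wants to ask.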
---
### A12. Release Readiness Assessment
**Problem:** Agent needs to verify all issues in milestone "v2.0" are closed and MRs are merged.
**Flow:**
```
lore -J issues -m "v2.0" -s opened # Any open issues in milestone?
lore -J issues -m "v2.0" -s closed # Closed issues
# MRs don't have milestone filter...
lore -J mrs -s opened -l "v2.0" # Try label as proxy
lore -J who --active -p myproject # Unresolved discussions
```
**Gap identified:** MRs don't have a `--milestone` filter (issues do). No way to check "all MRs linked to issues in milestone v2.0" -- would require joining `entity_references` with issue milestone. No release checklist concept. No way to verify "every issue in this milestone has a closing MR."
---
### A13. Answering "What Changed?" Between Two Points
**Problem:** Agent needs to diff project state between two dates for a stakeholder report.
**Flow:**
```
lore -J issues -s closed --since 2w --fields minimal # Recently closed
lore -J issues -s opened --since 2w --fields minimal # Recently opened
lore -J mrs -s merged --since 2w --fields minimal # Recently merged
# But no way to get "issues that CHANGED STATE" in a window
# An issue opened 3 months ago but closed yesterday won't appear in --since 2w for issues -s opened
```
**Gap identified:** `--since` filters by `updated_at`, not by "state changed at." An issue closed yesterday but created 6 months ago would appear in `issues -s closed --since 1d` (because updated_at changed), but the semantics are subtle. No explicit "state transitions in time window" query. The resource_state_events table has this data but it's not exposed as a filter.
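The difference between "updated in window" and "changed state in window" can be sketched as a filter over state events rather than over issues. The `StateEvent` struct below is an assumed, simplified shape for rows of `resource_state_events`; its field names are illustrative only.

```rust
/// Hypothetical, simplified row from `resource_state_events`.
#[derive(Debug, Clone, PartialEq)]
pub struct StateEvent {
    pub issue_iid: u64,
    pub state: String,
    pub created_at_ms: i64,
}

/// Return iids of issues that transitioned into `state` within the
/// half-open window `[from_ms, to_ms)`, regardless of when they were created.
pub fn transitions_in_window(
    events: &[StateEvent],
    state: &str,
    from_ms: i64,
    to_ms: i64,
) -> Vec<u64> {
    events
        .iter()
        .filter(|e| e.state == state && e.created_at_ms >= from_ms && e.created_at_ms < to_ms)
        .map(|e| e.issue_iid)
        .collect()
}
```

Filtering on the event timestamp means an issue that was merely commented on in the window is not reported as a state change.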
---
### A14. Meeting Prep: Summarize Recent Activity for a Stakeholder
**Problem:** Agent needs to prepare a 2-minute summary for a project sponsor meeting.
**Flow:**
```
lore -J count issues -p project # Current totals
lore -J count mrs -p project # MR totals
lore -J issues -s closed --since 1w -p project --fields minimal
lore -J mrs -s merged --since 1w -p project --fields minimal
lore -J issues -s opened --status "In progress" -p project
lore -J who --active -p project --since 1w
```
**Gap identified:** Six queries, same as A3. No summary/dashboard command. Agent must synthesize all responses. No trend data (is the open issue count growing or shrinking?). No "highlights" extraction.
---
### A15. Determining If Work Is Safe to Start (Conflict Detection)
**Problem:** Agent is about to start work on an issue and needs to check nobody else is already working on it.
**Flow:**
```
lore -J issues 123 # Read the issue
# Check assignees from response
lore -J mrs -s opened -A other-person # Are they working on related MRs?
lore -J who --overlap src/target/path/ # Anyone actively touching these files?
lore -J search "issue-123-keywords" --type mr -s opened # Wait, search has no --state
```
**Gap identified:** No way to check "is there an open MR that closes issue #123?" -- the entity_references data exists but isn't queryable. Search doesn't support `--state` filter. No "conflict detection" or "in-flight work" check. Agent must do multiple queries and manually correlate.
---
## Part 3: Gap Summary
### Critical Gaps (high impact, blocks common workflows)
| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|----------------------|
| 1 | **MR file changes not surfaced** | H4, A6, A10 | `lore mrs <iid> --files` or include in detail view |
| 2 | **Entity references not queryable** | H7, A5, A11, A15 | `lore refs --issue 123` / `lore refs --mr 456` |
| 3 | **Per-note search missing** | H7, A4 | `lore search --granularity note` (PRD exists) |
| 4 | **No entity-based timeline** | H11, A7 | `lore timeline --issue 321` / `lore timeline --mr 456` |
| 5 | **No @me / current-user alias** | H1, H15 | Resolve from auth token automatically |
### Important Gaps (significant friction, multiple workarounds needed)
| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|----------------------|
| 6 | **No activity feed / summary** | H1, A3, A14 | `lore activity --since 1d` or `lore summary` |
| 7 | **No multi-path who query** | H6, A6 | `lore who src/path1/ src/path2/` |
| 8 | **No --state filter on search** | H14, A15 | `lore search --state merged` |
| 9 | **MRs missing --milestone filter** | H9, A12 | `lore mrs -m "v2.0"` |
| 10 | **No --no-assignee / --unassigned** | H2 | `lore issues --no-assignee` |
| 11 | **No --updated-before filter** | H10, A9 | `lore issues --before 90d` or `--stale 90d` |
| 12 | **No team workload view** | H8 | `lore who --team` or `lore workload` |
### Nice-to-Have Gaps (would improve agent efficiency)
| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|----------------------|
| 13 | **No pagination/offset** | A9 | `--offset 100` for large result sets |
| 14 | **No detail --fields on show** | A2 | `lore issues 999 --fields description` |
| 15 | **No cross-project grouping** | A8 | `lore search --group-by project` |
| 16 | **No trend/velocity metrics** | A3, A14 | `lore trends issues --period week` |
| 17 | **No --for-issue on mrs** | A12, A15 | `lore mrs --closes 123` (query entity_refs) |
| 18 | **1y/12m duration not supported** | A4 | Support `1y`, `12m`, `365d` in --since |
| 19 | **No discussion participant filter** | H15 | `lore who --active --participant @me` |
| 20 | **No sort by due date** | H2 | `lore issues --sort due` |
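For gap 18, a duration parser that accepts all three spellings could look like the sketch below. The unit meanings are assumptions made for illustration (`m` as a 30-day month, `y` as 365 days); how `--since` actually interprets its suffixes is not stated here.

```rust
/// Parse a relative duration such as `2w`, `12m`, `1y`, or `365d` into days.
/// Assumed unit sizes (not confirmed by the tool): d = 1, w = 7, m = 30, y = 365.
/// Returns `None` for an unknown unit, a missing number, or overflow.
pub fn parse_duration_days(input: &str) -> Option<u64> {
    let s = input.trim();
    let unit = s.chars().last()?;
    // Everything before the final character must be the numeric part.
    let digits = &s[..s.len() - unit.len_utf8()];
    let n: u64 = digits.parse().ok()?;
    let days_per_unit = match unit {
        'd' => 1,
        'w' => 7,
        'm' => 30,
        'y' => 365,
        _ => return None,
    };
    n.checked_mul(days_per_unit)
}
```

Under these assumptions `12m` and `1y` differ by five days, so a real implementation would need to decide whether the two should be treated as equivalent.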


@@ -0,0 +1,552 @@
Below are the highest-leverage revisions I'd make for iteration 8, staying within your MVP constraints (static SQL, no scope creep into new data sources), but tightening correctness, index utilization predictability, debuggability, and output safety.
1) Fix the semantic bug in since_was_default (Workload mode) by introducing since_mode
Why this is better
Right now since_was_default = args.since.is_none() is misleading for Workload, because Workload has no default window (it's “unbounded unless explicitly filtered”). In robot mode, this creates incorrect intent replay and ambiguity.
Replace the boolean with a tri-state:
since_mode: "default" | "explicit" | "none"
Keep since_was_default only if you want backward compatibility, but compute it as since_mode == "default".
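The tri-state can also be sketched as an enum, which keeps the three cases exhaustive at compile time. This is an illustrative variant under stated assumptions, not what the patch below does (the patch stores a `String`); the names `SinceMode`, `resolve`, and `mode_has_default_window` are hypothetical.

```rust
/// Hypothetical enum form of the proposed `since_mode` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SinceMode {
    /// The mode's default window was applied.
    Default,
    /// The user passed --since.
    Explicit,
    /// No window at all (Workload without --since).
    NoWindow,
}

impl SinceMode {
    /// Derive the mode from the raw CLI value and whether the selected
    /// query mode defines a default window.
    pub fn resolve(user_since: Option<&str>, mode_has_default_window: bool) -> Self {
        match (user_since, mode_has_default_window) {
            (Some(_), _) => SinceMode::Explicit,
            (None, true) => SinceMode::Default,
            (None, false) => SinceMode::NoWindow,
        }
    }

    /// String form matching the values proposed for robot JSON.
    pub fn as_str(self) -> &'static str {
        match self {
            SinceMode::Default => "default",
            SinceMode::Explicit => "explicit",
            SinceMode::NoWindow => "none",
        }
    }

    /// Backward-compatible boolean, computed as `since_mode == "default"`.
    pub fn was_default(self) -> bool {
        self == SinceMode::Default
    }
}
```

With this shape, Workload would call `resolve(args.since.as_deref(), false)` and the other modes would pass `true`.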
Patch
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-5. **Robot-first reproducibility.** Robot JSON output includes both a raw `input` object (echoing CLI args) and a `resolved_input` object (computed `since_ms`, `since_iso`, `since_was_default`, resolved `project_id` + `project_path`, effective `mode`, `limit`) so agents can trace exactly what ran and reproduce it precisely.
+5. **Robot-first reproducibility.** Robot JSON output includes both a raw `input` object (echoing CLI args) and a `resolved_input` object (computed `since_ms`, `since_iso`, `since_mode`, resolved `project_id` + `project_path`, effective `mode`, `limit`) so agents can trace exactly what ran and reproduce it precisely.
@@
pub struct WhoResolvedInput {
pub mode: String,
pub project_id: Option<i64>,
pub project_path: Option<String>,
pub since_ms: Option<i64>,
pub since_iso: Option<String>,
- pub since_was_default: bool,
+ /// "default" (mode default applied), "explicit" (user provided --since), "none" (no window)
+ pub since_mode: String,
pub limit: usize,
}
@@
- let since_was_default = args.since.is_none();
+ // since_mode semantics:
+ // - expert/reviews/active/overlap: default window applies if args.since is None
+ // - workload: no default window; args.since None => "none"
+ let since_mode_for_defaulted = if args.since.is_some() { "explicit" } else { "default" };
+ let since_mode_for_workload = if args.since.is_some() { "explicit" } else { "none" };
@@
WhoMode::Expert { path } => {
let since_ms = resolve_since(args.since.as_deref(), "6m")?;
let result = query_expert(&conn, path, project_id, since_ms, args.limit)?;
Ok(WhoRun {
resolved_input: WhoResolvedInput {
mode: "expert".to_string(),
project_id,
project_path,
since_ms: Some(since_ms),
since_iso: Some(ms_to_iso(since_ms)),
- since_was_default,
+ since_mode: since_mode_for_defaulted.to_string(),
limit: args.limit,
},
result: WhoResult::Expert(result),
})
}
@@
WhoMode::Workload { username } => {
let since_ms = args
.since
.as_deref()
.map(|s| resolve_since_required(s))
.transpose()?;
let result = query_workload(&conn, username, project_id, since_ms, args.limit)?;
Ok(WhoRun {
resolved_input: WhoResolvedInput {
mode: "workload".to_string(),
project_id,
project_path,
since_ms,
since_iso: since_ms.map(ms_to_iso),
- since_was_default,
+ since_mode: since_mode_for_workload.to_string(),
limit: args.limit,
},
result: WhoResult::Workload(result),
})
}
@@
fn print_who_json(run: &WhoRun, args: &WhoArgs, elapsed_ms: u64) {
@@
let resolved_input = serde_json::json!({
"mode": run.resolved_input.mode,
"project_id": run.resolved_input.project_id,
"project_path": run.resolved_input.project_path,
"since_ms": run.resolved_input.since_ms,
"since_iso": run.resolved_input.since_iso,
- "since_was_default": run.resolved_input.since_was_default,
+ "since_mode": run.resolved_input.since_mode,
"limit": run.resolved_input.limit,
});
}
2) Stop using nullable-OR ((? IS NULL OR col = ?)) where it determines the “right” index (Active is the big one)
Why this is better
Your global vs project-scoped Active indexes are correct, but the nullable binding pattern undermines them because SQLite's planner can't assume whether ?2 is NULL at prepare time. Result: it can pick a “good enough for both” plan, which is often the wrong one for -p.
Fix: keep SQL static, but use two static statements selected at runtime (like you already do for exact vs prefix path matching).
Patch
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-1. **Lean on existing infrastructure.** Use `(?N IS NULL OR ...)` nullable binding pattern (already used in `timeline_seed.rs`) instead of dynamic SQL string assembly.
+1. **Lean on existing infrastructure.** Prefer `(?N IS NULL OR ...)` nullable binding for optional filters **unless** it materially changes index choice. In those cases, select between **two static SQL strings** at runtime (no `format!()`), e.g. Active mode uses separate global vs project-scoped statements to ensure the intended index is used.
@@
fn query_active(
conn: &Connection,
project_id: Option<i64>,
since_ms: i64,
limit: usize,
) -> Result<ActiveResult> {
let limit_plus_one = (limit + 1) as i64;
- // Total unresolved count
- let total_sql =
- "SELECT COUNT(*) FROM discussions d
- WHERE d.resolvable = 1 AND d.resolved = 0
- AND d.last_note_at >= ?1
- AND (?2 IS NULL OR d.project_id = ?2)";
+ // Total unresolved count (two static variants to avoid nullable-OR planner ambiguity)
+ let total_sql_global =
+ "SELECT COUNT(*) FROM discussions d
+ WHERE d.resolvable = 1 AND d.resolved = 0
+ AND d.last_note_at >= ?1";
+ let total_sql_scoped =
+ "SELECT COUNT(*) FROM discussions d
+ WHERE d.resolvable = 1 AND d.resolved = 0
+ AND d.last_note_at >= ?1
+ AND d.project_id = ?2";
- let total_unresolved: u32 =
- conn.query_row(total_sql, rusqlite::params![since_ms, project_id], |row| row.get(0))?;
+ let total_unresolved: u32 = match project_id {
+ None => conn.query_row(total_sql_global, rusqlite::params![since_ms], |row| row.get(0))?,
+ Some(pid) => conn.query_row(total_sql_scoped, rusqlite::params![since_ms, pid], |row| row.get(0))?,
+ };
- let sql = "
+ let sql_global = "
WITH picked AS (
SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id,
d.project_id, d.last_note_at
FROM discussions d
WHERE d.resolvable = 1 AND d.resolved = 0
AND d.last_note_at >= ?1
- AND (?2 IS NULL OR d.project_id = ?2)
ORDER BY d.last_note_at DESC
- LIMIT ?3
+ LIMIT ?2
),
@@
ORDER BY p.last_note_at DESC
";
- let mut stmt = conn.prepare_cached(sql)?;
- let discussions: Vec<ActiveDiscussion> = stmt
- .query_map(rusqlite::params![since_ms, project_id, limit_plus_one], |row| {
+ let sql_scoped = "
+ WITH picked AS (
+ SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id,
+ d.project_id, d.last_note_at
+ FROM discussions d
+ WHERE d.resolvable = 1 AND d.resolved = 0
+ AND d.last_note_at >= ?1
+ AND d.project_id = ?2
+ ORDER BY d.last_note_at DESC
+ LIMIT ?3
+ ),
+ note_counts AS (
+ SELECT n.discussion_id, COUNT(*) AS note_count
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0
+ GROUP BY n.discussion_id
+ ),
+ participants AS (
+ SELECT x.discussion_id, GROUP_CONCAT(x.author_username, X'1F') AS participants
+ FROM (
+ SELECT DISTINCT n.discussion_id, n.author_username
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0 AND n.author_username IS NOT NULL
+ ) x
+ GROUP BY x.discussion_id
+ )
+ SELECT
+ p.id AS discussion_id,
+ p.noteable_type,
+ COALESCE(i.iid, m.iid) AS entity_iid,
+ COALESCE(i.title, m.title) AS entity_title,
+ proj.path_with_namespace,
+ p.last_note_at,
+ COALESCE(nc.note_count, 0) AS note_count,
+ COALESCE(pa.participants, '') AS participants
+ FROM picked p
+ JOIN projects proj ON p.project_id = proj.id
+ LEFT JOIN issues i ON p.issue_id = i.id
+ LEFT JOIN merge_requests m ON p.merge_request_id = m.id
+ LEFT JOIN note_counts nc ON nc.discussion_id = p.id
+ LEFT JOIN participants pa ON pa.discussion_id = p.id
+ ORDER BY p.last_note_at DESC
+ ";
+
+ let discussions: Vec<ActiveDiscussion> = match project_id {
+ None => {
+ let mut stmt = conn.prepare_cached(sql_global)?;
+ stmt.query_map(rusqlite::params![since_ms, limit_plus_one], |row| {
+ /* unchanged row mapping */
+ })?.collect::<std::result::Result<Vec<_>, _>>()?
+ }
+ Some(pid) => {
+ let mut stmt = conn.prepare_cached(sql_scoped)?;
+ stmt.query_map(rusqlite::params![since_ms, pid, limit_plus_one], |row| {
+ /* unchanged row mapping */
+ })?.collect::<std::result::Result<Vec<_>, _>>()?
+ }
+ };
Also update Verification to explicitly check both variants:
@@
# Performance verification (required before merge):
@@
sqlite3 path/to/db.sqlite "
EXPLAIN QUERY PLAN
SELECT d.id, d.last_note_at
FROM discussions d
WHERE d.resolvable = 1 AND d.resolved = 0
AND d.last_note_at >= 0
ORDER BY d.last_note_at DESC
LIMIT 20;
"
# Expected: SEARCH discussions USING INDEX idx_discussions_unresolved_recent_global
+
+sqlite3 path/to/db.sqlite "
+ EXPLAIN QUERY PLAN
+ SELECT d.id, d.last_note_at
+ FROM discussions d
+ WHERE d.resolvable = 1 AND d.resolved = 0
+ AND d.project_id = 1
+ AND d.last_note_at >= 0
+ ORDER BY d.last_note_at DESC
+ LIMIT 20;
+"
+# Expected: SEARCH discussions USING INDEX idx_discussions_unresolved_recent
3) Add repo-path normalization (eliminate trivial “no results” footguns)
Why this is better
People paste:
- ./src/foo/
- /src/foo/
- src\foo\bar.rs (Windows)
These currently lead to silent misses.
Normalize only user input (not DB content):
- trim whitespace
- strip leading ./ and /
- convert \ → / when present
- collapse repeated //
Patch
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
fn resolve_mode<'a>(args: &'a WhoArgs) -> Result<WhoMode<'a>> {
@@
- if let Some(p) = &args.path {
- return Ok(WhoMode::Expert { path: p });
+ if let Some(p) = &args.path {
+ let norm = normalize_repo_path(p);
+ return Ok(WhoMode::Expert { path: Box::leak(norm.into_boxed_str()) });
}
@@
- if let Some(path) = &args.overlap {
- return Ok(WhoMode::Overlap { path });
+ if let Some(path) = &args.overlap {
+ let norm = normalize_repo_path(path);
+ return Ok(WhoMode::Overlap { path: Box::leak(norm.into_boxed_str()) });
}
@@
- if target.contains('/') {
- return Ok(WhoMode::Expert { path: target });
+ if target.contains('/') {
+ let norm = normalize_repo_path(target);
+ return Ok(WhoMode::Expert { path: Box::leak(norm.into_boxed_str()) });
}
@@
}
+
+/// Normalize user-supplied repo paths to match stored DiffNote paths.
+/// - trims whitespace
+/// - strips leading "./" and "/" (repo-relative)
+/// - converts '\' to '/' (Windows paste)
+/// - collapses repeated slashes
+fn normalize_repo_path(input: &str) -> String {
+ let mut s = input.trim().to_string();
+ if s.contains('\\') && !s.contains('/') {
+ s = s.replace('\\', "/");
+ }
+ while s.starts_with("./") {
+ s = s.trim_start_matches("./").to_string();
+ }
+ while s.starts_with('/') {
+ s = s.trim_start_matches('/').to_string();
+ }
+ while s.contains("//") {
+ s = s.replace("//", "/");
+ }
+ s
+}
(Add a small test block for normalization; even 2-3 asserts catch regressions.)
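A handful of cases is enough to pin the behavior down. The sketch below repeats the `normalize_repo_path` helper from the patch so that it stands alone, and adds a hypothetical case table plus a checker function; in the real crate these would more likely live in a `#[cfg(test)]` module.

```rust
/// Copy of the helper proposed in the patch: normalize user-supplied repo paths.
pub fn normalize_repo_path(input: &str) -> String {
    let mut s = input.trim().to_string();
    // Only treat backslashes as separators when the input has no forward slashes.
    if s.contains('\\') && !s.contains('/') {
        s = s.replace('\\', "/");
    }
    while s.starts_with("./") {
        s = s.trim_start_matches("./").to_string();
    }
    while s.starts_with('/') {
        s = s.trim_start_matches('/').to_string();
    }
    while s.contains("//") {
        s = s.replace("//", "/");
    }
    s
}

/// Illustrative (input, expected) pairs covering each normalization rule.
pub const NORMALIZATION_CASES: &[(&str, &str)] = &[
    ("./src/foo/", "src/foo/"),
    ("/src/foo/", "src/foo/"),
    ("src\\foo\\bar.rs", "src/foo/bar.rs"),
    ("  src//foo///bar.rs ", "src/foo/bar.rs"),
];

/// Panics with the offending input if any case regresses.
pub fn check_normalization_cases() {
    for (input, expected) in NORMALIZATION_CASES {
        assert_eq!(normalize_repo_path(input), *expected, "input: {input:?}");
    }
}
```

Note that trailing slashes are deliberately preserved here, since the path query construction strips them later.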
4) Make path matching observable: include path_match (exact vs prefix) in results/JSON
Why this is better
You've made path classification smarter (heuristics + two-way probe). That's great, but without visibility you'll get “why did it treat this as a directory?” confusion. Exposing match metadata is low cost and hugely helps debugging.
Patch
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-struct PathQuery {
- /// The parameter value to bind.
- value: String,
- /// If true: use `LIKE value ESCAPE '\'`. If false: use `= value`.
- is_prefix: bool,
-}
+struct PathQuery {
+ /// User input after normalization (no trailing slash stripping yet).
+ input: String,
+ /// Trimmed path without trailing '/' used for exact/prefix construction.
+ normalized: String,
+ /// The SQL parameter bound to the statement (`foo/bar` or `foo/bar/%`).
+ sql_value: String,
+ /// If true: use `LIKE sql_value ESCAPE '\'`. If false: use `= normalized`.
+ is_prefix: bool,
+}
@@
- let trimmed = path.trim_end_matches('/');
+ let input = normalize_repo_path(path);
+ let trimmed = input.trim_end_matches('/').to_string();
@@
- Ok(PathQuery {
- value: trimmed.to_string(),
- is_prefix: false,
- })
+ Ok(PathQuery { input, normalized: trimmed.clone(), sql_value: trimmed, is_prefix: false })
} else {
- Ok(PathQuery {
- value: format!("{escaped}/%"),
- is_prefix: true,
- })
+ Ok(PathQuery { input, normalized: trimmed.clone(), sql_value: format!("{escaped}/%"), is_prefix: true })
}
@@
pub struct ExpertResult {
pub path_query: String,
+ pub path_match: String, // "exact" or "prefix"
pub experts: Vec<Expert>,
pub truncated: bool,
}
@@
pub struct OverlapResult {
pub path_query: String,
+ pub path_match: String, // "exact" or "prefix"
pub users: Vec<OverlapUser>,
pub truncated: bool,
}
@@
fn query_expert(...) -> Result<ExpertResult> {
let pq = build_path_query(conn, path, project_id)?;
@@
Ok(ExpertResult {
path_query: path.to_string(),
+ path_match: if pq.is_prefix { "prefix".to_string() } else { "exact".to_string() },
experts,
truncated,
})
}
@@
fn query_overlap(...) -> Result<OverlapResult> {
let pq = build_path_query(conn, path, project_id)?;
@@
Ok(OverlapResult {
path_query: path.to_string(),
+ path_match: if pq.is_prefix { "prefix".to_string() } else { "exact".to_string() },
users,
truncated,
})
}
@@
fn expert_to_json(r: &ExpertResult) -> serde_json::Value {
serde_json::json!({
"path_query": r.path_query,
+ "path_match": r.path_match,
"truncated": r.truncated,
"experts": ...
})
}
@@
fn overlap_to_json(r: &OverlapResult) -> serde_json::Value {
serde_json::json!({
"path_query": r.path_query,
+ "path_match": r.path_match,
"truncated": r.truncated,
"users": ...
})
}
Human output can add a single dim hint line:
(matching exact file) or (matching directory prefix)
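The prefix branch relies on an `escaped` value whose construction is not shown in the patch. A minimal sketch of one way to build it follows, assuming backslash is the escape character (as in `LIKE ... ESCAPE '\'`); the function names `escape_like` and `prefix_pattern` are hypothetical.

```rust
/// Escape LIKE metacharacters so user input matches literally.
/// Assumes the statement uses `ESCAPE '\'`.
pub fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        // `%` and `_` are LIKE wildcards; the escape character itself must be escaped too.
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Build the bound parameter for a directory-prefix match, e.g. `src/foo/%`.
pub fn prefix_pattern(dir: &str) -> String {
    let trimmed = dir.trim_end_matches('/');
    format!("{}/%", escape_like(trimmed))
}
```

Without the escaping step, a directory such as `src/my_module/` would also match `src/myXmodule/`, because `_` matches any single character in LIKE.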
5) Put a hard upper bound on --limit at the CLI boundary
Why this is better
You already bounded nested arrays (participants, mr_refs), but top-level lists are still user-unbounded. A single --limit 50000 can:
- generate huge JSON payloads
- blow up downstream agent pipelines
- create slow queries / memory spikes
Clamp it before execution. A max of 500 is usually plenty; even 200 is fine.
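Clamping, as opposed to rejecting out-of-range values, can be sketched in a few lines. The constant and function names are hypothetical; the 500 ceiling is the figure suggested above.

```rust
/// Upper bound suggested in the review; 200 would work equally well.
pub const MAX_LIMIT: usize = 500;

/// Force a user-supplied limit into `1..=MAX_LIMIT` before any query runs.
pub fn clamp_limit(requested: usize) -> usize {
    requested.clamp(1, MAX_LIMIT)
}
```

The design choice is between silently clamping (forgiving, but the agent may not notice it got fewer rows than requested) and failing at argument parsing (strict, and the error is visible). The patch below takes the second route via a ranged value parser.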
Patch
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
pub struct WhoArgs {
@@
- /// Maximum results per section
- #[arg(short = 'n', long = "limit", default_value = "20", help_heading = "Output")]
+ /// Maximum results per section (bounded for output safety)
+ #[arg(
+ short = 'n',
+ long = "limit",
+ default_value = "20",
+ value_parser = clap::builder::RangedU64ValueParser::<usize>::new().range(1..=500),
+ help_heading = "Output"
+ )]
pub limit: usize,
}
@@
-11. **Bounded payloads.** Robot JSON must never emit unbounded arrays ...
+11. **Bounded payloads.** Robot JSON must never emit unbounded arrays ...
+ Top-level result set size is also bounded via `--limit` (1..=500) to prevent runaway payloads.
6) Clarify Active “unresolved count” semantics (window vs total)
Why this is better
total_unresolved currently means “unresolved within the time window”. The human header prints “Active Discussions (X unresolved)” which can easily be misread as “total unresolved overall”.
Small rename avoids confusion, no new behavior.
Patch
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
pub struct ActiveResult {
pub discussions: Vec<ActiveDiscussion>,
- pub total_unresolved: u32,
+ pub total_unresolved_in_window: u32,
pub truncated: bool,
}
@@
- println!(
- "{}",
- style(format!(
- "Active Discussions ({} unresolved)",
- r.total_unresolved
- ))
- .bold()
- );
+ println!("{}", style(format!(
+ "Active Discussions ({} unresolved in window)",
+ r.total_unresolved_in_window
+ )).bold());
(If you later want global total, add a second count query, but I'd keep MVP lean.)
7) Tighten statement cache behavior: avoid preparing both SQL variants when not needed
Why this is better
You already use prepare_cached(), but as you add more “two static variants” (exact/prefix; scoped/unscoped), it's easy to accidentally prepare multiple statements per invocation.
Codify: select variant first, then prepare exactly one.
This is mostly a plan hygiene change (helps future you keep perf predictable).
Patch (plan-level emphasis)
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-1. **Lean on existing infrastructure.** ...
+1. **Lean on existing infrastructure.** ...
+ When multiple static SQL variants exist (exact/prefix; scoped/unscoped), always:
+ (a) resolve which variant applies, then (b) `prepare_cached()` exactly one statement.
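The "resolve first, prepare once" rule can be isolated from the database layer as a pure selection function. The sketch below reuses the two count statements from the Active patch; the function name `select_total_query` and the idea of returning the bind values alongside the SQL are assumptions for illustration.

```rust
/// Global variant: no project filter, so the global index can apply.
pub const TOTAL_SQL_GLOBAL: &str = "SELECT COUNT(*) FROM discussions d
     WHERE d.resolvable = 1 AND d.resolved = 0
       AND d.last_note_at >= ?1";

/// Project-scoped variant: plain equality on project_id instead of nullable-OR.
pub const TOTAL_SQL_SCOPED: &str = "SELECT COUNT(*) FROM discussions d
     WHERE d.resolvable = 1 AND d.resolved = 0
       AND d.last_note_at >= ?1
       AND d.project_id = ?2";

/// Step (a): pick exactly one static statement and its positional parameters.
/// Step (b), preparing the statement, is left to the caller, so only the
/// selected string ever reaches the statement cache.
pub fn select_total_query(since_ms: i64, project_id: Option<i64>) -> (&'static str, Vec<i64>) {
    match project_id {
        None => (TOTAL_SQL_GLOBAL, vec![since_ms]),
        Some(pid) => (TOTAL_SQL_SCOPED, vec![since_ms, pid]),
    }
}
```

Keeping the selection separate also makes it unit-testable without a database: a test can assert which string and how many parameters each input produces.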
Net effect (what you gain)
- Correct robot semantics (since_mode) without breaking your static-SQL/agent-first contract.
- Guaranteed intended index usage for Active global vs scoped queries (the nullable-OR planner pitfall is real).
- Fewer “why no results?” surprises via path normalization.
- Better debugging (path match introspection) with essentially no runtime cost.
- Output safety even when users/agents misconfigure --limit.
- Less ambiguous UX around “unresolved” counts.
If you want the most important changes to ship before iteration 8 locks: #2 (Active query variants) and #1 (since semantics) are the two that prevent the most painful, hard-to-diagnose failures.

3251
docs/who-command-design.md Normal file

File diff suppressed because it is too large

844
gitlore-sync-explorer.html Normal file

@@ -0,0 +1,844 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Gitlore Sync Pipeline Explorer</title>
<style>
:root {
--bg: #0d1117;
--bg-secondary: #161b22;
--bg-tertiary: #1c2129;
--border: #30363d;
--text: #c9d1d9;
--text-dim: #8b949e;
--text-bright: #f0f6fc;
--cyan: #58a6ff;
--green: #3fb950;
--amber: #d29922;
--red: #f85149;
--purple: #bc8cff;
--pink: #f778ba;
--cyan-dim: rgba(88,166,255,0.15);
--green-dim: rgba(63,185,80,0.15);
--amber-dim: rgba(210,153,34,0.15);
--red-dim: rgba(248,81,73,0.15);
--purple-dim: rgba(188,140,255,0.15);
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: 'SF Mono', 'Cascadia Code', 'Fira Code', 'JetBrains Mono', monospace;
background: var(--bg); color: var(--text);
display: flex; height: 100vh; overflow: hidden;
}
.sidebar {
width: 220px; min-width: 220px; background: var(--bg-secondary);
border-right: 1px solid var(--border); display: flex; flex-direction: column; padding: 16px 0;
}
.sidebar-title {
font-size: 11px; font-weight: 700; text-transform: uppercase;
letter-spacing: 1.2px; color: var(--text-dim); padding: 0 16px 12px;
}
.logo {
padding: 0 16px 20px; font-size: 15px; font-weight: 700; color: var(--cyan);
display: flex; align-items: center; gap: 8px;
}
.logo svg { width: 20px; height: 20px; }
.nav-item {
padding: 10px 16px; cursor: pointer; font-size: 13px; color: var(--text-dim);
transition: all 0.15s; border-left: 3px solid transparent;
display: flex; align-items: center; gap: 10px;
}
.nav-item:hover { background: var(--bg-tertiary); color: var(--text); }
.nav-item.active { background: var(--cyan-dim); color: var(--cyan); border-left-color: var(--cyan); }
.nav-dot { width: 8px; height: 8px; border-radius: 50%; flex-shrink: 0; }
.main { flex: 1; display: flex; flex-direction: column; overflow: hidden; }
.header {
padding: 16px 24px; border-bottom: 1px solid var(--border);
display: flex; align-items: center; justify-content: space-between;
}
.header h1 { font-size: 16px; font-weight: 600; color: var(--text-bright); }
.header-badge {
font-size: 11px; padding: 3px 10px; border-radius: 12px;
background: var(--cyan-dim); color: var(--cyan);
}
.canvas-wrapper { flex: 1; overflow: auto; position: relative; }
.canvas { padding: 32px; min-height: 100%; }
.flow-container { display: none; }
.flow-container.active { display: block; }
.phase { margin-bottom: 32px; }
.phase-header { display: flex; align-items: center; gap: 12px; margin-bottom: 16px; }
.phase-number {
width: 28px; height: 28px; border-radius: 50%; display: flex; align-items: center;
justify-content: center; font-size: 13px; font-weight: 700; flex-shrink: 0;
}
.phase-title { font-size: 14px; font-weight: 600; color: var(--text-bright); }
.phase-subtitle { font-size: 11px; color: var(--text-dim); margin-left: 4px; font-weight: 400; }
.flow-row {
display: flex; align-items: stretch; gap: 0; flex-wrap: wrap;
margin-left: 14px; padding-left: 26px; border-left: 2px solid var(--border);
}
.flow-row:last-child { border-left-color: transparent; }
.node {
position: relative; padding: 12px 16px; border-radius: 8px;
border: 1px solid var(--border); background: var(--bg-secondary);
font-size: 12px; cursor: pointer; transition: all 0.2s;
min-width: 180px; max-width: 260px; margin: 4px 0;
}
.node:hover {
border-color: var(--cyan); transform: translateY(-1px);
box-shadow: 0 4px 12px rgba(0,0,0,0.3);
}
.node.selected {
border-color: var(--cyan);
box-shadow: 0 0 0 1px var(--cyan), 0 4px 16px rgba(88,166,255,0.15);
}
.node-title { font-weight: 600; font-size: 12px; margin-bottom: 4px; color: var(--text-bright); }
.node-desc { font-size: 11px; color: var(--text-dim); line-height: 1.5; }
.node.api { border-left: 3px solid var(--cyan); }
.node.transform { border-left: 3px solid var(--purple); }
.node.db { border-left: 3px solid var(--green); }
.node.decision { border-left: 3px solid var(--amber); }
.node.error { border-left: 3px solid var(--red); }
.node.queue { border-left: 3px solid var(--pink); }
.arrow {
display: flex; align-items: center; padding: 0 6px;
color: var(--text-dim); font-size: 16px; flex-shrink: 0;
}
.arrow-down {
display: flex; justify-content: center; padding: 4px 0;
color: var(--text-dim); font-size: 16px; margin-left: 14px;
padding-left: 26px; border-left: 2px solid var(--border);
}
.branch-container {
margin-left: 14px; padding-left: 26px;
border-left: 2px solid var(--border); padding-bottom: 8px;
}
.branch-row { display: flex; gap: 12px; margin: 8px 0; flex-wrap: wrap; }
.branch-label {
font-size: 11px; font-weight: 600; margin: 8px 0 4px;
display: flex; align-items: center; gap: 6px;
}
.branch-label.success { color: var(--green); }
.branch-label.error { color: var(--red); }
.branch-label.retry { color: var(--amber); }
.diff-badge {
display: inline-block; font-size: 10px; padding: 2px 6px;
border-radius: 4px; margin-top: 6px; font-weight: 600;
}
.diff-badge.changed { background: var(--amber-dim); color: var(--amber); }
.diff-badge.same { background: var(--green-dim); color: var(--green); }
.detail-panel {
position: fixed; right: 0; top: 0; bottom: 0; width: 380px;
background: var(--bg-secondary); border-left: 1px solid var(--border);
transform: translateX(100%); transition: transform 0.25s ease;
z-index: 100; display: flex; flex-direction: column; overflow: hidden;
}
.detail-panel.open { transform: translateX(0); }
.detail-header {
padding: 16px 20px; border-bottom: 1px solid var(--border);
display: flex; align-items: center; justify-content: space-between;
}
.detail-header h2 { font-size: 14px; font-weight: 600; color: var(--text-bright); }
.detail-close {
cursor: pointer; color: var(--text-dim); font-size: 18px;
background: none; border: none; padding: 4px 8px; border-radius: 4px;
}
.detail-close:hover { background: var(--bg-tertiary); color: var(--text); }
.detail-body { flex: 1; overflow-y: auto; padding: 20px; }
.detail-section { margin-bottom: 20px; }
.detail-section h3 {
font-size: 11px; text-transform: uppercase; letter-spacing: 0.8px;
color: var(--text-dim); margin-bottom: 8px;
}
.detail-section p { font-size: 12px; line-height: 1.7; color: var(--text); }
.sql-block {
background: var(--bg); border: 1px solid var(--border); border-radius: 6px;
padding: 12px; font-size: 11px; line-height: 1.6; color: var(--green);
overflow-x: auto; white-space: pre; margin-top: 8px;
}
.detail-tag {
display: inline-block; font-size: 10px; padding: 2px 8px;
border-radius: 10px; margin: 2px 4px 2px 0;
}
.detail-tag.file { background: var(--purple-dim); color: var(--purple); }
.detail-tag.type-api { background: var(--cyan-dim); color: var(--cyan); }
.detail-tag.type-db { background: var(--green-dim); color: var(--green); }
.detail-tag.type-transform { background: var(--purple-dim); color: var(--purple); }
.detail-tag.type-decision { background: var(--amber-dim); color: var(--amber); }
.detail-tag.type-error { background: var(--red-dim); color: var(--red); }
.detail-tag.type-queue { background: rgba(247,120,186,0.15); color: var(--pink); }
.watermark-panel { border-top: 1px solid var(--border); background: var(--bg-secondary); }
.watermark-toggle {
padding: 10px 24px; cursor: pointer; font-size: 12px; color: var(--text-dim);
display: flex; align-items: center; gap: 8px; user-select: none;
}
.watermark-toggle:hover { color: var(--text); }
.watermark-toggle .chevron { transition: transform 0.2s; font-size: 10px; }
.watermark-toggle .chevron.open { transform: rotate(180deg); }
.watermark-content { display: none; padding: 0 24px 16px; max-height: 260px; overflow-y: auto; }
.watermark-content.open { display: block; }
.wm-table { width: 100%; border-collapse: collapse; font-size: 11px; }
.wm-table th {
text-align: left; padding: 6px 12px; color: var(--text-dim); font-weight: 600;
border-bottom: 1px solid var(--border); font-size: 10px;
text-transform: uppercase; letter-spacing: 0.5px;
}
.wm-table td { padding: 6px 12px; border-bottom: 1px solid var(--border); color: var(--text); }
.wm-table td:first-child { color: var(--cyan); font-weight: 600; }
.wm-table td:nth-child(2) { color: var(--green); }
.overview-pipeline { display: flex; gap: 0; align-items: stretch; margin: 24px 0; flex-wrap: wrap; }
.overview-stage {
flex: 1; min-width: 200px; background: var(--bg-secondary);
border: 1px solid var(--border); border-radius: 10px; padding: 20px;
cursor: pointer; transition: all 0.2s;
}
.overview-stage:hover {
border-color: var(--cyan); transform: translateY(-2px);
box-shadow: 0 6px 20px rgba(0,0,0,0.3);
}
.overview-arrow { display: flex; align-items: center; padding: 0 8px; font-size: 20px; color: var(--text-dim); }
.stage-num { font-size: 10px; font-weight: 700; text-transform: uppercase; letter-spacing: 1px; margin-bottom: 8px; }
.stage-title { font-size: 15px; font-weight: 700; color: var(--text-bright); margin-bottom: 6px; }
.stage-desc { font-size: 11px; color: var(--text-dim); line-height: 1.6; }
.stage-detail {
margin-top: 12px; padding-top: 12px; border-top: 1px solid var(--border);
font-size: 11px; color: var(--text-dim); line-height: 1.6;
}
.stage-detail code {
color: var(--amber); background: var(--amber-dim); padding: 1px 5px;
border-radius: 3px; font-size: 10px;
}
.info-box {
background: var(--bg-tertiary); border: 1px solid var(--border);
border-radius: 8px; padding: 16px; margin: 16px 0; font-size: 12px; line-height: 1.7;
}
.info-box-title { font-weight: 600; color: var(--cyan); margin-bottom: 6px; display: flex; align-items: center; gap: 6px; }
.info-box ul { margin-left: 16px; color: var(--text-dim); }
.info-box li { margin: 4px 0; }
.info-box code {
color: var(--amber); background: var(--amber-dim);
padding: 1px 5px; border-radius: 3px; font-size: 11px;
}
.legend {
display: flex; gap: 16px; flex-wrap: wrap; margin-bottom: 24px;
padding: 12px 16px; background: var(--bg-secondary);
border: 1px solid var(--border); border-radius: 8px;
}
.legend-item { display: flex; align-items: center; gap: 6px; font-size: 11px; color: var(--text-dim); }
.legend-color { width: 12px; height: 3px; border-radius: 2px; }
::-webkit-scrollbar { width: 8px; height: 8px; }
::-webkit-scrollbar-track { background: transparent; }
::-webkit-scrollbar-thumb { background: var(--border); border-radius: 4px; }
::-webkit-scrollbar-thumb:hover { background: var(--text-dim); }
</style>
</head>
<body>
<div class="sidebar">
<div class="logo">
<svg viewBox="0 0 20 20" fill="none" stroke="currentColor" stroke-width="1.5">
<circle cx="10" cy="10" r="8"/><path d="M10 6v4l3 2"/>
</svg>
lore sync
</div>
<div class="sidebar-title">Entity Flows</div>
<div class="nav-item active" data-view="overview" onclick="switchView('overview')">
<div class="nav-dot" style="background:var(--cyan)"></div>Full Sync Overview
</div>
<div class="nav-item" data-view="issues" onclick="switchView('issues')">
<div class="nav-dot" style="background:var(--green)"></div>Issues
</div>
<div class="nav-item" data-view="mrs" onclick="switchView('mrs')">
<div class="nav-dot" style="background:var(--purple)"></div>Merge Requests
</div>
<div class="nav-item" data-view="docs" onclick="switchView('docs')">
<div class="nav-dot" style="background:var(--amber)"></div>Documents
</div>
<div class="nav-item" data-view="embed" onclick="switchView('embed')">
<div class="nav-dot" style="background:var(--pink)"></div>Embeddings
</div>
</div>
<div class="main">
<div class="header">
<h1 id="view-title">Full Sync Overview</h1>
<span class="header-badge" id="view-badge">4 stages</span>
</div>
<div class="canvas-wrapper"><div class="canvas">
<!-- OVERVIEW -->
<div class="flow-container active" id="view-overview">
<div class="overview-pipeline">
<div class="overview-stage" onclick="switchView('issues')">
<div class="stage-num" style="color:var(--green)">Stage 1</div>
<div class="stage-title">Ingest Issues</div>
<div class="stage-desc">Fetch issues + discussions + resource events from GitLab API</div>
<div class="stage-detail">Cursor-based incremental sync.<br>Sequential discussion fetch.<br>Queue-based resource events.</div>
</div>
<div class="overview-arrow">&rarr;</div>
<div class="overview-stage" onclick="switchView('mrs')">
<div class="stage-num" style="color:var(--purple)">Stage 2</div>
<div class="stage-title">Ingest MRs</div>
<div class="stage-desc">Fetch merge requests + discussions + resource events</div>
<div class="stage-detail">Page-based incremental sync.<br>Parallel prefetch discussions.<br>Queue-based resource events.</div>
</div>
<div class="overview-arrow">&rarr;</div>
<div class="overview-stage" onclick="switchView('docs')">
<div class="stage-num" style="color:var(--amber)">Stage 3</div>
<div class="stage-title">Generate Docs</div>
<div class="stage-desc">Regenerate searchable documents for changed entities</div>
<div class="stage-detail">Driven by <code>dirty_sources</code> table.<br>Triple-hash skip optimization.<br>FTS5 index auto-updated.</div>
</div>
<div class="overview-arrow">&rarr;</div>
<div class="overview-stage" onclick="switchView('embed')">
<div class="stage-num" style="color:var(--pink)">Stage 4</div>
<div class="stage-title">Embed</div>
<div class="stage-desc">Generate vector embeddings via Ollama for semantic search</div>
<div class="stage-detail">Hash-based change detection.<br>Chunked, batched API calls.<br><b>Non-fatal</b> &mdash; graceful if Ollama down.</div>
</div>
</div>
<div class="info-box">
<div class="info-box-title">Concurrency Model</div>
<ul>
<li>Stages 1 &amp; 2 process <b>projects concurrently</b> via <code>buffer_unordered(primary_concurrency)</code></li>
<li>Each project gets its own <b>SQLite connection</b>; rate limiter is <b>shared</b></li>
<li>Discussions: <b>sequential</b> (issues) or <b>batched parallel prefetch</b> (MRs)</li>
<li>Resource events use a <b>persistent job queue</b> with atomic claim + exponential backoff</li>
</ul>
</div>
<div class="info-box">
<div class="info-box-title">Sync Flags</div>
<ul>
<li><code>--full</code> &mdash; Resets all cursors &amp; watermarks, forces complete re-fetch</li>
<li><code>--no-docs</code> &mdash; Skips Stage 3 (document generation)</li>
<li><code>--no-embed</code> &mdash; Skips Stage 4 (embedding generation)</li>
<li><code>--force</code> &mdash; Overrides stale single-flight lock</li>
<li><code>--project &lt;path&gt;</code> &mdash; Sync only one project (fuzzy matching)</li>
</ul>
</div>
<div class="info-box">
<div class="info-box-title">Single-Flight Lock</div>
<ul>
<li>Table-based lock (<code>AppLock</code>) prevents concurrent syncs</li>
<li>Heartbeat keeps the lock alive; stale locks auto-detected</li>
<li>Use <code>--force</code> to override a stale lock</li>
</ul>
</div>
</div>
<!-- ISSUES -->
<div class="flow-container" id="view-issues">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API Call</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Transform</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error Path</div>
<div class="legend-item"><div class="legend-color" style="background:var(--pink)"></div>Queue</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
<div class="phase-title">Fetch Issues <span class="phase-subtitle">Cursor-Based Incremental Sync</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="issue-api-call"><div class="node-title">GitLab API Call</div><div class="node-desc">paginate_issues() with<br>updated_after = cursor - rewind</div></div>
<div class="arrow">&rarr;</div>
<div class="node decision" data-detail="issue-cursor-filter"><div class="node-title">Cursor Filter</div><div class="node-desc">updated_at &gt; cursor_ts<br>OR tie_breaker check</div></div>
<div class="arrow">&rarr;</div>
<div class="node transform" data-detail="issue-transform"><div class="node-title">transform_issue()</div><div class="node-desc">GitLab API shape &rarr;<br>local DB row shape</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="issue-transaction"><div class="node-title">Transaction</div><div class="node-desc">store_payload &rarr; upsert &rarr;<br>mark_dirty &rarr; relink</div></div>
</div>
<div class="arrow-down">&darr;</div>
<div class="flow-row">
<div class="node db" data-detail="issue-cursor-update"><div class="node-title">Update Cursor</div><div class="node-desc">Every 100 issues + final<br>sync_cursors table</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--green-dim);color:var(--green)">2</div>
<div class="phase-title">Discussion Sync <span class="phase-subtitle">Sequential, Watermark-Based</span></div>
</div>
<div class="flow-row">
<div class="node db" data-detail="issue-disc-query"><div class="node-title">Query Stale Issues</div><div class="node-desc">updated_at &gt; COALESCE(<br>discussions_synced_for_<br>updated_at, 0)</div></div>
<div class="arrow">&rarr;</div>
<div class="node api" data-detail="issue-disc-fetch"><div class="node-title">Paginate Discussions</div><div class="node-desc">Sequential per issue<br>paginate_issue_discussions()</div></div>
<div class="arrow">&rarr;</div>
<div class="node transform" data-detail="issue-disc-transform"><div class="node-title">Transform</div><div class="node-desc">transform_discussion()<br>transform_notes()</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="issue-disc-write"><div class="node-title">Write Discussion</div><div class="node-desc">store_payload &rarr; upsert<br>DELETE notes &rarr; INSERT notes</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">&#10003; On Success (all pages fetched)</div>
<div class="branch-row">
<div class="node db" data-detail="issue-disc-stale"><div class="node-title">Remove Stale</div><div class="node-desc">DELETE discussions not<br>seen in this fetch</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="issue-disc-watermark"><div class="node-title">Advance Watermark</div><div class="node-desc">discussions_synced_for_<br>updated_at = updated_at</div></div>
</div>
<div class="branch-label error">&#10007; On Pagination Error</div>
<div class="branch-row">
<div class="node error" data-detail="issue-disc-fail"><div class="node-title">Skip Stale Removal</div><div class="node-desc">Watermark NOT advanced<br>Will retry next sync</div></div>
</div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:rgba(247,120,186,0.15);color:var(--pink)">3</div>
<div class="phase-title">Resource Events <span class="phase-subtitle">Queue-Based, Concurrent Fetch</span></div>
</div>
<div class="flow-row">
<div class="node queue" data-detail="re-cleanup"><div class="node-title">Cleanup Obsolete</div><div class="node-desc">DELETE jobs where entity<br>watermark is current</div></div>
<div class="arrow">&rarr;</div>
<div class="node queue" data-detail="re-enqueue"><div class="node-title">Enqueue Jobs</div><div class="node-desc">INSERT for entities where<br>updated_at &gt; watermark</div></div>
<div class="arrow">&rarr;</div>
<div class="node queue" data-detail="re-claim"><div class="node-title">Claim Jobs</div><div class="node-desc">Atomic UPDATE...RETURNING<br>with lock acquisition</div></div>
<div class="arrow">&rarr;</div>
<div class="node api" data-detail="re-fetch"><div class="node-title">Fetch Events</div><div class="node-desc">3 concurrent: state +<br>label + milestone</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">&#10003; On Success</div>
<div class="branch-row">
<div class="node db" data-detail="re-store"><div class="node-title">Store Events</div><div class="node-desc">Transaction: upsert all<br>3 event types</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="re-complete"><div class="node-title">Complete + Watermark</div><div class="node-desc">DELETE job row<br>Advance watermark</div></div>
</div>
<div class="branch-label error">&#10007; Permanent Error (404 / 403)</div>
<div class="branch-row">
<div class="node error" data-detail="re-permanent"><div class="node-title">Skip Permanently</div><div class="node-desc">complete_job + advance<br>watermark (coalesced)</div></div>
</div>
<div class="branch-label retry">&#8635; Transient Error</div>
<div class="branch-row">
<div class="node error" data-detail="re-transient"><div class="node-title">Backoff Retry</div><div class="node-desc">fail_job: 30s x 2^(n-1)<br>capped at 480s</div></div>
</div>
</div>
</div>
</div>
<!-- MERGE REQUESTS -->
<div class="flow-container" id="view-mrs">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API Call</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Transform</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Diff from Issues</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error Path</div>
<div class="legend-item"><div class="legend-color" style="background:var(--pink)"></div>Queue</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
<div class="phase-title">Fetch MRs <span class="phase-subtitle">Page-Based Incremental Sync</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="mr-api-call"><div class="node-title">GitLab API Call</div><div class="node-desc">fetch_merge_requests_page()<br>with cursor rewind</div><div class="diff-badge changed">Page-based, not streaming</div></div>
<div class="arrow">&rarr;</div>
<div class="node decision" data-detail="mr-cursor-filter"><div class="node-title">Cursor Filter</div><div class="node-desc">Same logic as issues:<br>timestamp + tie-breaker</div><div class="diff-badge same">Same as issues</div></div>
<div class="arrow">&rarr;</div>
<div class="node transform" data-detail="mr-transform"><div class="node-title">transform_merge_request()</div><div class="node-desc">Maps API shape &rarr;<br>local DB row</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="mr-transaction"><div class="node-title">Transaction</div><div class="node-desc">store &rarr; upsert &rarr; dirty &rarr;<br>labels + assignees + reviewers</div><div class="diff-badge changed">3 junction tables (not 2)</div></div>
</div>
<div class="arrow-down">&darr;</div>
<div class="flow-row">
<div class="node db" data-detail="mr-cursor-update"><div class="node-title">Update Cursor</div><div class="node-desc">Per page (not every 100)</div><div class="diff-badge changed">Per page boundary</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--green-dim);color:var(--green)">2</div>
<div class="phase-title">MR Discussion Sync <span class="phase-subtitle">Parallel Prefetch + Serial Write</span></div>
</div>
<div class="info-box" style="margin-left:40px;margin-bottom:16px;">
<div class="info-box-title">Key Differences from Issue Discussions</div>
<ul>
<li><b>Parallel prefetch</b> &mdash; fetches all discussions for a batch concurrently via <code>join_all()</code></li>
<li><b>Upsert pattern</b> &mdash; notes use INSERT...ON CONFLICT (not delete-all + re-insert)</li>
<li><b>Sweep stale</b> &mdash; uses <code>last_seen_at</code> timestamp comparison (not set difference)</li>
<li><b>Sync health tracking</b> &mdash; records <code>discussions_sync_attempts</code> and <code>last_error</code></li>
</ul>
</div>
<div class="flow-row">
<div class="node db" data-detail="mr-disc-query"><div class="node-title">Query Stale MRs</div><div class="node-desc">updated_at &gt; COALESCE(<br>discussions_synced_for_<br>updated_at, 0)</div><div class="diff-badge same">Same watermark logic</div></div>
<div class="arrow">&rarr;</div>
<div class="node decision" data-detail="mr-disc-batch"><div class="node-title">Batch by Concurrency</div><div class="node-desc">dependent_concurrency<br>MRs per batch</div><div class="diff-badge changed">Batched processing</div></div>
</div>
<div class="arrow-down">&darr;</div>
<div class="flow-row">
<div class="node api" data-detail="mr-disc-prefetch"><div class="node-title">Parallel Prefetch</div><div class="node-desc">join_all() fetches all<br>discussions for batch</div><div class="diff-badge changed">Parallel (not sequential)</div></div>
<div class="arrow">&rarr;</div>
<div class="node transform" data-detail="mr-disc-transform"><div class="node-title">Transform In-Memory</div><div class="node-desc">transform_mr_discussion()<br>+ diff position notes</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="mr-disc-write"><div class="node-title">Serial Write</div><div class="node-desc">upsert discussion<br>upsert notes (ON CONFLICT)</div><div class="diff-badge changed">Upsert, not delete+insert</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">&#10003; On Full Success</div>
<div class="branch-row">
<div class="node db" data-detail="mr-disc-sweep"><div class="node-title">Sweep Stale</div><div class="node-desc">DELETE WHERE last_seen_at<br>&lt; run_seen_at (disc + notes)</div><div class="diff-badge changed">last_seen_at sweep</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="mr-disc-watermark"><div class="node-title">Advance Watermark</div><div class="node-desc">discussions_synced_for_<br>updated_at = updated_at</div></div>
</div>
<div class="branch-label error">&#10007; On Failure</div>
<div class="branch-row">
<div class="node error" data-detail="mr-disc-fail"><div class="node-title">Record Sync Health</div><div class="node-desc">Watermark NOT advanced<br>Tracks attempts + last_error</div><div class="diff-badge changed">Health tracking</div></div>
</div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:rgba(247,120,186,0.15);color:var(--pink)">3</div>
<div class="phase-title">Resource Events <span class="phase-subtitle">Same as Issues</span></div>
</div>
<div class="info-box" style="margin-left:40px">
<div class="info-box-title">Identical to Issue Resource Events</div>
<ul>
<li>Same queue-based approach: cleanup &rarr; enqueue &rarr; claim &rarr; fetch &rarr; store/fail</li>
<li>Same watermark column: <code>resource_events_synced_for_updated_at</code></li>
<li>Same error handling: 404/403 coalesced to empty, transient errors get backoff</li>
<li>entity_type = <code>"merge_request"</code> instead of <code>"issue"</code></li>
</ul>
</div>
</div>
</div>
<!-- DOCUMENTS -->
<div class="flow-container" id="view-docs">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>Trigger</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Extract</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
<div class="phase-title">Dirty Source Queue <span class="phase-subtitle">Populated During Ingestion</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="doc-trigger"><div class="node-title">mark_dirty_tx()</div><div class="node-desc">Called during every issue/<br>MR/discussion upsert</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="doc-dirty-table"><div class="node-title">dirty_sources Table</div><div class="node-desc">INSERT (source_type, source_id)<br>ON CONFLICT reset backoff</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--amber-dim);color:var(--amber)">2</div>
<div class="phase-title">Drain Loop <span class="phase-subtitle">Batch 500, Respects Backoff</span></div>
</div>
<div class="flow-row">
<div class="node db" data-detail="doc-drain"><div class="node-title">Get Dirty Sources</div><div class="node-desc">Batch 500, ORDER BY<br>attempt_count, queued_at</div></div>
<div class="arrow">&rarr;</div>
<div class="node decision" data-detail="doc-dispatch"><div class="node-title">Dispatch by Type</div><div class="node-desc">issue / mr / discussion<br>&rarr; extract function</div></div>
<div class="arrow">&rarr;</div>
<div class="node decision" data-detail="doc-deleted-check"><div class="node-title">Source Exists?</div><div class="node-desc">If deleted: remove doc row<br>(cascade cleans FTS + embeds)</div></div>
</div>
<div class="arrow-down">&darr;</div>
<div class="flow-row">
<div class="node transform" data-detail="doc-extract"><div class="node-title">Extract Content</div><div class="node-desc">Structured text:<br>header + metadata + body</div></div>
<div class="arrow">&rarr;</div>
<div class="node decision" data-detail="doc-triple-hash"><div class="node-title">Triple-Hash Check</div><div class="node-desc">content_hash + labels_hash<br>+ paths_hash all match?</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="doc-write"><div class="node-title">SAVEPOINT Write</div><div class="node-desc">Atomic: document row +<br>labels + paths</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">&#10003; On Success</div>
<div class="branch-row">
<div class="node db" data-detail="doc-clear"><div class="node-title">clear_dirty()</div><div class="node-desc">Remove from dirty_sources</div></div>
</div>
<div class="branch-label error">&#10007; On Error</div>
<div class="branch-row">
<div class="node error" data-detail="doc-error"><div class="node-title">record_dirty_error()</div><div class="node-desc">Increment attempt_count<br>Exponential backoff</div></div>
</div>
<div class="branch-label" style="color:var(--purple)">&#8801; Triple-Hash Match (skip)</div>
<div class="branch-row">
<div class="node db" data-detail="doc-skip"><div class="node-title">Skip Write</div><div class="node-desc">All 3 hashes match &rarr;<br>no WAL churn, clear dirty</div></div>
</div>
</div>
</div>
<div class="info-box">
<div class="info-box-title">Full Mode (<code>--full</code>)</div>
<ul>
<li>Seeds <b>ALL</b> entities into <code>dirty_sources</code> via keyset pagination</li>
<li>Triple-hash optimization prevents redundant writes even in full mode</li>
<li>Runs FTS <code>OPTIMIZE</code> after drain completes</li>
</ul>
</div>
</div>
<!-- EMBEDDINGS -->
<div class="flow-container" id="view-embed">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API (Ollama)</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Processing</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--amber-dim);color:var(--amber)">1</div>
<div class="phase-title">Change Detection <span class="phase-subtitle">Hash + Config Drift</span></div>
</div>
<div class="flow-row">
<div class="node decision" data-detail="embed-detect"><div class="node-title">find_pending_documents()</div><div class="node-desc">No metadata row? OR<br>document_hash mismatch? OR<br>config drift?</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="embed-paginate"><div class="node-title">Keyset Pagination</div><div class="node-desc">500 documents per page<br>ordered by doc ID</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--purple-dim);color:var(--purple)">2</div>
<div class="phase-title">Chunking <span class="phase-subtitle">Split + Overflow Guard</span></div>
</div>
<div class="flow-row">
<div class="node transform" data-detail="embed-chunk"><div class="node-title">split_into_chunks()</div><div class="node-desc">Split by paragraph boundaries<br>with configurable overlap</div></div>
<div class="arrow">&rarr;</div>
<div class="node decision" data-detail="embed-overflow"><div class="node-title">Overflow Guard</div><div class="node-desc">Too many chunks?<br>Skip to prevent rowid collision</div></div>
<div class="arrow">&rarr;</div>
<div class="node transform" data-detail="embed-work"><div class="node-title">Build ChunkWork</div><div class="node-desc">Assign encoded chunk IDs<br>per document</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">3</div>
<div class="phase-title">Ollama Embedding <span class="phase-subtitle">Batched API Calls</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="embed-batch"><div class="node-title">Batch Embed</div><div class="node-desc">32 chunks per Ollama<br>API call</div></div>
<div class="arrow">&rarr;</div>
<div class="node db" data-detail="embed-store"><div class="node-title">Store Vectors</div><div class="node-desc">sqlite-vec embeddings table<br>+ embedding_metadata</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">&#10003; On Success</div>
<div class="branch-row">
<div class="node db" data-detail="embed-success"><div class="node-title">SAVEPOINT Commit</div><div class="node-desc">Atomic per page:<br>clear old + write new</div></div>
</div>
<div class="branch-label retry">&#8635; Context-Length Error</div>
<div class="branch-row">
<div class="node error" data-detail="embed-ctx-error"><div class="node-title">Retry Individually</div><div class="node-desc">Re-embed each chunk solo<br>to isolate oversized one</div></div>
</div>
<div class="branch-label error">&#10007; Other Error</div>
<div class="branch-row">
<div class="node error" data-detail="embed-other-error"><div class="node-title">Record Error</div><div class="node-desc">Store in embedding_metadata<br>for retry next run</div></div>
</div>
</div>
</div>
<div class="info-box">
<div class="info-box-title">Full Mode (<code>--full</code>)</div>
<ul>
<li>DELETEs all <code>embedding_metadata</code> and <code>embeddings</code> rows first</li>
<li>Every document re-processed from scratch</li>
</ul>
</div>
<div class="info-box">
<div class="info-box-title">Non-Fatal in Sync</div>
<ul>
<li>Stage 4 failures (Ollama down, model missing) are <b>graceful</b></li>
<li>Sync completes successfully; embeddings just won't be updated</li>
<li>Semantic search degrades to FTS-only mode</li>
</ul>
</div>
</div>
</div></div>
<!-- Watermark Panel -->
<div class="watermark-panel">
<div class="watermark-toggle" onclick="toggleWatermarks()">
<span class="chevron" id="wm-chevron">&#9650;</span>
Watermark &amp; Cursor Reference
</div>
<div class="watermark-content" id="wm-content">
<table class="wm-table">
<thead><tr><th>Table</th><th>Column(s)</th><th>Purpose</th></tr></thead>
<tbody>
<tr><td>sync_cursors</td><td>updated_at_cursor + tie_breaker_id</td><td>Incremental fetch: "last entity we saw" per project+type</td></tr>
<tr><td>issues</td><td>discussions_synced_for_updated_at</td><td>Per-issue discussion watermark</td></tr>
<tr><td>issues</td><td>resource_events_synced_for_updated_at</td><td>Per-issue resource event watermark</td></tr>
<tr><td>merge_requests</td><td>discussions_synced_for_updated_at</td><td>Per-MR discussion watermark</td></tr>
<tr><td>merge_requests</td><td>resource_events_synced_for_updated_at</td><td>Per-MR resource event watermark</td></tr>
<tr><td>dirty_sources</td><td>queued_at + next_attempt_at</td><td>Document regeneration queue with backoff</td></tr>
<tr><td>embedding_metadata</td><td>document_hash + chunk_max_bytes + model + dims</td><td>Embedding staleness detection</td></tr>
<tr><td>pending_dependent_fetches</td><td>locked_at + next_retry_at + attempts</td><td>Resource event job queue with backoff</td></tr>
</tbody>
</table>
</div>
</div>
</div>
<!-- Detail Panel -->
<div class="detail-panel" id="detail-panel">
<div class="detail-header">
<h2 id="detail-title">Node Details</h2>
<button class="detail-close" onclick="closeDetail()">&times;</button>
</div>
<div class="detail-body" id="detail-body"></div>
</div>
<script>
const viewTitles = {
overview: 'Full Sync Overview', issues: 'Issue Ingestion Flow',
mrs: 'Merge Request Ingestion Flow', docs: 'Document Generation Flow',
embed: 'Embedding Generation Flow',
};
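/*
The watermark comparison used by the "Query Stale Issues" and "Query Stale MRs" nodes,
updated_at > COALESCE(discussions_synced_for_updated_at, 0), can be sketched in JavaScript.
This is only an illustration: needsDiscussionSync and the camelCase field names are
hypothetical, and timestamps are assumed to be milliseconds since epoch.

```javascript
// Hypothetical sketch of the per-row watermark check (not the project's Rust or SQL code).
function needsDiscussionSync(row) {
  // COALESCE(watermark, 0): a NULL watermark is treated as 0,
  // so rows that were never synced always qualify.
  const watermark = row.discussionsSyncedForUpdatedAt ?? 0;
  return row.updatedAt > watermark;
}

const rows = [
  { iid: 1, updatedAt: 500, discussionsSyncedForUpdatedAt: null },
  { iid: 2, updatedAt: 500, discussionsSyncedForUpdatedAt: 500 },
  { iid: 3, updatedAt: 900, discussionsSyncedForUpdatedAt: 500 },
];
const staleIids = rows.filter(needsDiscussionSync).map((row) => row.iid);
```

With this sample data, rows 1 (never synced) and 3 (changed since the last sync) are selected,
while row 2 is skipped because its watermark already equals updated_at.
*/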
const viewBadges = {
overview: '4 stages', issues: '3 phases', mrs: '3 phases',
docs: '2 phases', embed: '3 phases',
};
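/*
The cursor filter described in the 'issue-cursor-filter' detail entry can be sketched in
JavaScript as follows. This is an illustration under stated assumptions: the function name
passesCursorFilter and the field names updatedAt, gitlabId, and tieBreakerId are hypothetical,
timestamps are assumed to be milliseconds since epoch, and a null cursor is assumed to mean
that no previous sync exists. The actual Rust logic in src/ingestion/issues.rs may differ.

```javascript
// Hypothetical sketch of the two-part cursor comparison (not the project's Rust code).
function passesCursorFilter(item, cursor) {
  // Assumption: with no stored cursor, every fetched item is new.
  if (cursor === null) return true;
  // Primary comparison: strictly newer than the cursor timestamp.
  if (item.updatedAt > cursor.updatedAt) return true;
  // Tie-breaker: same timestamp, but a higher GitLab ID than the last one processed.
  return item.updatedAt === cursor.updatedAt && item.gitlabId > cursor.tieBreakerId;
}

// Items re-fetched because of the rewind window are dropped by the filter.
const cursor = { updatedAt: 1000, tieBreakerId: 42 };
const fetched = [
  { gitlabId: 41, updatedAt: 1000 },
  { gitlabId: 42, updatedAt: 1000 },
  { gitlabId: 43, updatedAt: 1000 },
  { gitlabId: 7, updatedAt: 1500 },
];
const accepted = fetched.filter((item) => passesCursorFilter(item, cursor));
```

With this sample data only the items with gitlabId 43 and 7 pass the filter; the other two
share the cursor timestamp and do not have a higher ID than the tie-breaker.
*/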
function switchView(view) {
document.querySelectorAll('.flow-container').forEach(function(el) { el.classList.remove('active'); });
document.getElementById('view-' + view).classList.add('active');
document.querySelectorAll('.nav-item').forEach(function(el) {
el.classList.toggle('active', el.dataset.view === view);
});
document.getElementById('view-title').textContent = viewTitles[view];
document.getElementById('view-badge').textContent = viewBadges[view];
closeDetail();
}
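/*
The retry delay shown on the "Backoff Retry" node (30s x 2^(n-1), capped at 480s) can be worked
through with a small helper. This is a hedged sketch: backoffSeconds and its parameter names are
hypothetical, and it assumes n is the 1-based attempt count. Only the base, the doubling, and
the cap come from the diagram.

```javascript
// Hypothetical helper mirroring the formula 30s x 2^(n-1), capped at 480s.
function backoffSeconds(attempt, baseSeconds = 30, capSeconds = 480) {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError('attempt must be a positive integer');
  }
  // Double the delay for each prior failure, then clamp to the cap.
  return Math.min(baseSeconds * Math.pow(2, attempt - 1), capSeconds);
}

// Delays for the first six attempts.
const delays = [1, 2, 3, 4, 5, 6].map((n) => backoffSeconds(n));
```

Under these assumptions the delays are 30, 60, 120, 240, 480, and 480 seconds, so the cap is
reached on the fifth attempt.
*/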
function toggleWatermarks() {
document.getElementById('wm-content').classList.toggle('open');
document.getElementById('wm-chevron').classList.toggle('open');
}
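/*
The "Triple-Hash Check" in the document generation flow skips a write only when content_hash,
labels_hash, and paths_hash all match. A minimal sketch of that decision follows, assuming
hypothetical camelCase field names, a hypothetical function name canSkipDocumentWrite, and the
assumption that a missing stored row always requires a write.

```javascript
// Hypothetical sketch of the skip decision (not the project's Rust code).
function canSkipDocumentWrite(stored, candidate) {
  // Assumption: no stored document yet means the write cannot be skipped.
  if (stored === null || stored === undefined) return false;
  // All three hashes must match for the write to be skipped.
  return (
    stored.contentHash === candidate.contentHash &&
    stored.labelsHash === candidate.labelsHash &&
    stored.pathsHash === candidate.pathsHash
  );
}

const stored = { contentHash: 'c1', labelsHash: 'l1', pathsHash: 'p1' };
const unchanged = canSkipDocumentWrite(stored, { contentHash: 'c1', labelsHash: 'l1', pathsHash: 'p1' });
const relabeled = canSkipDocumentWrite(stored, { contentHash: 'c1', labelsHash: 'l2', pathsHash: 'p1' });
```

Here unchanged is true (the write can be skipped) and relabeled is false, because a change in
any one of the three hashes is enough to require a rewrite.
*/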
var details = {
'issue-api-call': { title: 'GitLab API: Paginate Issues', type: 'api', file: 'src/ingestion/issues.rs:51-140', desc: 'Streams issues from the GitLab API using cursor-based incremental sync. The API is called with updated_after set to the last known cursor minus a configurable rewind window (to handle clock skew between GitLab and the local database).', sql: 'GET /api/v4/projects/{id}/issues\n ?updated_after={cursor - rewind_seconds}\n &order_by=updated_at&sort=asc\n &per_page=100' },
'issue-cursor-filter': { title: 'Cursor Filter (Dedup)', type: 'decision', file: 'src/ingestion/issues.rs:95-110', desc: 'Because of the cursor rewind, some issues will be re-fetched that we already have. The cursor filter skips these using a two-part comparison: primary on updated_at timestamp, with gitlab_id as a tie-breaker when timestamps are equal.', sql: '// Pseudocode:\nif issue.updated_at > cursor_ts:\n ACCEPT // newer than cursor\nelif issue.updated_at == cursor_ts\n AND issue.gitlab_id > tie_breaker_id:\n ACCEPT // same timestamp, higher ID\nelse:\n SKIP // already processed' },
'issue-transform': { title: 'Transform Issue', type: 'transform', file: 'src/gitlab/transformers/issue.rs', desc: 'Maps the GitLab API response shape to the local database row shape. Parses ISO 8601 timestamps to milliseconds-since-epoch, extracts label names, assignee usernames, milestone info, and due dates.' },
'issue-transaction': { title: 'Issue Write Transaction', type: 'db', file: 'src/ingestion/issues.rs:190-220', desc: 'All operations for a single issue are wrapped in one SQLite transaction for atomicity. If any step fails, the entire issue write is rolled back.', sql: 'BEGIN;\n-- 1. Store raw JSON payload (compressed, deduped)\nINSERT INTO payloads ...;\n-- 2. Upsert issue row\nINSERT INTO issues ... ON CONFLICT(gitlab_id)\n DO UPDATE SET ...;\n-- 3. Mark dirty for document regen\nINSERT INTO dirty_sources ...;\n-- 4. Relink labels\nDELETE FROM issue_labels WHERE issue_id = ?;\nINSERT INTO labels ... ON CONFLICT DO UPDATE;\nINSERT INTO issue_labels ...;\n-- 5. Relink assignees\nDELETE FROM issue_assignees WHERE issue_id = ?;\nINSERT INTO issue_assignees ...;\nCOMMIT;' },
'issue-cursor-update': { title: 'Update Sync Cursor', type: 'db', file: 'src/ingestion/issues.rs:130-140', desc: 'The sync cursor is updated every 100 issues (for crash recovery) and once at the end of the stream. If the process crashes mid-sync, it resumes from at most 100 issues back.', sql: 'INSERT INTO sync_cursors\n (project_id, resource_type,\n updated_at_cursor, tie_breaker_id)\nVALUES (?1, \'issues\', ?2, ?3)\nON CONFLICT(project_id, resource_type)\n DO UPDATE SET\n updated_at_cursor = ?2,\n tie_breaker_id = ?3;' },
'issue-disc-query': { title: 'Query Issues Needing Discussion Sync', type: 'db', file: 'src/ingestion/issues.rs:450-471', desc: 'Finds all issues in this project whose updated_at timestamp exceeds their per-row discussion watermark. Issues that have not changed since their last discussion sync are skipped entirely.', sql: 'SELECT id, iid, updated_at\nFROM issues\nWHERE project_id = ?1\n AND updated_at > COALESCE(\n discussions_synced_for_updated_at, 0\n );' },
'issue-disc-fetch': { title: 'Paginate Issue Discussions', type: 'api', file: 'src/ingestion/discussions.rs:73-205', desc: 'Discussions are fetched sequentially per issue (a rusqlite Connection is Send but not Sync, so it cannot be shared across concurrently running tasks). Each issue\'s discussions are streamed page by page from the GitLab API.', sql: 'GET /api/v4/projects/{id}/issues/{iid}\n /discussions?per_page=100' },
'issue-disc-transform': { title: 'Transform Discussion + Notes', type: 'transform', file: 'src/gitlab/transformers/discussion.rs', desc: 'Transforms the raw GitLab discussion payload into normalized rows. Sets NoteableRef::Issue. Computes resolvable/resolved status, first_note_at/last_note_at timestamps, and per-note position indices.' },
'issue-disc-write': { title: 'Write Discussion (Full Refresh)', type: 'db', file: 'src/ingestion/discussions.rs:140-180', desc: 'Issue discussions use a full-refresh pattern: all existing notes for a discussion are deleted and re-inserted. This is simpler than upsert but means partial failures lose the previous state.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO discussions ... ON CONFLICT DO UPDATE;\nINSERT INTO dirty_sources ...;\n-- Full refresh: delete all then re-insert\nDELETE FROM notes WHERE discussion_id = ?;\nINSERT INTO notes VALUES (...);\nCOMMIT;' },
'issue-disc-stale': { title: 'Remove Stale Discussions', type: 'db', file: 'src/ingestion/discussions.rs:185-195', desc: 'After successfully fetching ALL discussion pages for an issue, any discussions in the DB that were not seen in this fetch are deleted. Uses a temp table for >500 IDs to stay clear of SQLite\'s bound-parameter limit (999 by default before SQLite 3.32.0).', sql: '-- For small sets (<= 500):\nDELETE FROM discussions\nWHERE issue_id = ?\n AND gitlab_id NOT IN (...);\n\n-- For large sets (> 500):\nCREATE TEMP TABLE seen_ids(id TEXT);\nINSERT INTO seen_ids ...;\nDELETE FROM discussions\nWHERE issue_id = ?\n AND gitlab_id NOT IN\n (SELECT id FROM seen_ids);\nDROP TABLE seen_ids;' },
'issue-disc-watermark': { title: 'Advance Discussion Watermark', type: 'db', file: 'src/ingestion/discussions.rs:198', desc: 'Sets the per-issue watermark to the issue\'s current updated_at, signaling that discussions are now synced for this version of the issue.', sql: 'UPDATE issues\nSET discussions_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
'issue-disc-fail': { title: 'Pagination Error Handling', type: 'error', file: 'src/ingestion/discussions.rs:182', desc: 'If pagination fails mid-stream, stale discussion removal is skipped (we don\'t know the full set) and the watermark is NOT advanced. The issue will be retried on the next sync run.' },
're-cleanup': { title: 'Cleanup Obsolete Jobs', type: 'queue', file: 'src/ingestion/orchestrator.rs:490-520', desc: 'Before enqueuing new jobs, delete any existing jobs for entities whose watermark is already current. These are leftover from a previous run.', sql: 'DELETE FROM pending_dependent_fetches\nWHERE project_id = ?\n AND job_type = \'resource_events\'\n AND entity_local_id IN (\n SELECT id FROM issues\n WHERE project_id = ?\n AND updated_at <= COALESCE(\n resource_events_synced_for_updated_at, 0\n )\n );' },
're-enqueue': { title: 'Enqueue Resource Event Jobs', type: 'queue', file: 'src/ingestion/orchestrator.rs:525-555', desc: 'For each entity whose updated_at exceeds its resource event watermark, insert a job into the queue. Uses INSERT OR IGNORE for idempotency.', sql: 'INSERT OR IGNORE INTO pending_dependent_fetches\n (project_id, entity_type, entity_iid,\n entity_local_id, job_type, enqueued_at)\nSELECT project_id, \'issue\', iid, id,\n \'resource_events\', ?now\nFROM issues\nWHERE project_id = ?\n AND updated_at > COALESCE(\n resource_events_synced_for_updated_at, 0\n );' },
're-claim': { title: 'Claim Jobs (Atomic Lock)', type: 'queue', file: 'src/core/dependent_queue.rs', desc: 'Atomically claims a batch of unlocked jobs whose backoff period has elapsed. Uses UPDATE...RETURNING for lock acquisition in a single statement.', sql: 'UPDATE pending_dependent_fetches\nSET locked_at = ?now\nWHERE rowid IN (\n SELECT rowid\n FROM pending_dependent_fetches\n WHERE project_id = ?\n AND job_type = \'resource_events\'\n AND locked_at IS NULL\n AND (next_retry_at IS NULL\n OR next_retry_at <= ?now)\n ORDER BY enqueued_at ASC\n LIMIT ?batch_size\n)\nRETURNING *;' },
're-fetch': { title: 'Fetch 3 Event Types Concurrently', type: 'api', file: 'src/gitlab/client.rs:732-771', desc: 'Uses tokio::join! (not try_join!) to fetch state, label, and milestone events concurrently. Permanent errors (404, 403) are coalesced to empty vecs via coalesce_inaccessible().', sql: 'tokio::join!(\n fetch_issue_state_events(proj, iid),\n fetch_issue_label_events(proj, iid),\n fetch_issue_milestone_events(proj, iid),\n)\n// Each: coalesce_inaccessible()\n// 404/403 -> Ok(vec![])\n// Other errors -> propagated' },
're-store': { title: 'Store Resource Events', type: 'db', file: 'src/ingestion/orchestrator.rs:620-640', desc: 'All three event types are upserted in a single transaction.', sql: 'BEGIN;\nINSERT INTO resource_state_events ...\n ON CONFLICT DO UPDATE;\nINSERT INTO resource_label_events ...\n ON CONFLICT DO UPDATE;\nINSERT INTO resource_milestone_events ...\n ON CONFLICT DO UPDATE;\nCOMMIT;' },
're-complete': { title: 'Complete Job + Advance Watermark', type: 'db', file: 'src/ingestion/orchestrator.rs:645-660', desc: 'After successful storage, the job row is deleted and the entity\'s watermark is advanced.', sql: 'DELETE FROM pending_dependent_fetches\n WHERE rowid = ?;\n\nUPDATE issues\nSET resource_events_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
're-permanent': { title: 'Permanent Error: Skip Entity', type: 'error', file: 'src/ingestion/orchestrator.rs:665-680', desc: '404 (endpoint doesn\'t exist) and 403 (insufficient permissions) are treated as permanent errors. The job is completed and the watermark advanced, so the entity is skipped until it is next updated on GitLab.' },
're-transient': { title: 'Transient Error: Exponential Backoff', type: 'error', file: 'src/core/dependent_queue.rs', desc: 'Network errors, 500s, rate limits get exponential backoff. Formula: 30s * 2^(attempts-1), capped at 480s (8 minutes).', sql: 'UPDATE pending_dependent_fetches\nSET locked_at = NULL,\n attempts = attempts + 1,\n next_retry_at = ?now\n + 30000 * pow(2, attempts),\n -- capped at 480000ms (8 min)\n last_error = ?error_msg\nWHERE rowid = ?;' },
'mr-api-call': { title: 'GitLab API: Fetch MR Pages', type: 'api', file: 'src/ingestion/merge_requests.rs:51-151', desc: 'Unlike issues which stream, MRs use explicit page-based pagination via fetch_merge_requests_page(). Each page returns items plus a next_page indicator.', sql: 'GET /api/v4/projects/{id}/merge_requests\n ?updated_after={cursor - rewind}\n &order_by=updated_at&sort=asc\n &per_page=100&page={n}' },
'mr-cursor-filter': { title: 'Cursor Filter', type: 'decision', file: 'src/ingestion/merge_requests.rs:90-105', desc: 'Identical logic to issues: timestamp comparison with gitlab_id tie-breaker.' },
'mr-transform': { title: 'Transform Merge Request', type: 'transform', file: 'src/gitlab/transformers/mr.rs', desc: 'Maps GitLab MR response to local row. Handles draft detection (prefers draft field, falls back to work_in_progress), detailed_merge_status, merge_user resolution, and reviewer extraction.' },
'mr-transaction': { title: 'MR Write Transaction', type: 'db', file: 'src/ingestion/merge_requests.rs:170-210', desc: 'Same pattern as issues but with THREE junction tables: labels, assignees, AND reviewers.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO merge_requests ...\n ON CONFLICT DO UPDATE;\nINSERT INTO dirty_sources ...;\n-- 3 junction tables:\nDELETE FROM mr_labels WHERE mr_id = ?;\nINSERT INTO mr_labels ...;\nDELETE FROM mr_assignees WHERE mr_id = ?;\nINSERT INTO mr_assignees ...;\nDELETE FROM mr_reviewers WHERE mr_id = ?;\nINSERT INTO mr_reviewers ...;\nCOMMIT;' },
'mr-cursor-update': { title: 'Update Cursor Per Page', type: 'db', file: 'src/ingestion/merge_requests.rs:140-150', desc: 'Unlike issues (every 100 items), MR cursor is updated at each page boundary for better crash recovery.' },
'mr-disc-query': { title: 'Query MRs Needing Discussion Sync', type: 'db', file: 'src/ingestion/merge_requests.rs:430-451', desc: 'Same watermark pattern as issues. Runs AFTER MR ingestion to avoid memory growth.', sql: 'SELECT id, iid, updated_at\nFROM merge_requests\nWHERE project_id = ?1\n AND updated_at > COALESCE(\n discussions_synced_for_updated_at, 0\n );' },
'mr-disc-batch': { title: 'Batch by Concurrency', type: 'decision', file: 'src/ingestion/orchestrator.rs:420-465', desc: 'MRs are processed in batches sized by dependent_concurrency. Each batch first prefetches all discussions in parallel, then writes serially.' },
'mr-disc-prefetch': { title: 'Parallel Prefetch', type: 'api', file: 'src/ingestion/mr_discussions.rs:66-120', desc: 'All MRs in the batch have their discussions fetched concurrently via join_all(). Each MR\'s discussions are fetched in one call, transformed in memory, and returned as PrefetchedMrDiscussions.', sql: '// For each MR in batch, concurrently:\nGET /api/v4/projects/{id}/merge_requests\n /{iid}/discussions?per_page=100\n\n// All fetched + transformed in memory\n// before any DB writes happen' },
'mr-disc-transform': { title: 'Transform MR Discussions', type: 'transform', file: 'src/ingestion/mr_discussions.rs:125-160', desc: 'Uses transform_mr_discussion() which additionally handles DiffNote positions (file paths, line ranges, SHA triplets).' },
'mr-disc-write': { title: 'Serial Write (Upsert Pattern)', type: 'db', file: 'src/ingestion/mr_discussions.rs:165-220', desc: 'Unlike issue discussions (delete-all + re-insert), MR discussions use INSERT...ON CONFLICT DO UPDATE for both discussions and notes. Safer for partial failures.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO discussions ...\n ON CONFLICT DO UPDATE\n SET ..., last_seen_at = ?run_ts;\nINSERT INTO dirty_sources ...;\n-- Upsert notes (not delete+insert):\nINSERT INTO notes ...\n ON CONFLICT DO UPDATE\n SET ..., last_seen_at = ?run_ts;\nCOMMIT;' },
'mr-disc-sweep': { title: 'Sweep Stale (last_seen_at)', type: 'db', file: 'src/ingestion/mr_discussions.rs:225-245', desc: 'Staleness detected via last_seen_at timestamps. Both discussions AND notes are swept independently.', sql: '-- Sweep stale discussions:\nDELETE FROM discussions\nWHERE merge_request_id = ?\n AND last_seen_at < ?run_seen_at;\n\n-- Sweep stale notes:\nDELETE FROM notes\nWHERE discussion_id IN (\n SELECT id FROM discussions\n WHERE merge_request_id = ?\n) AND last_seen_at < ?run_seen_at;' },
'mr-disc-watermark': { title: 'Advance MR Discussion Watermark', type: 'db', file: 'src/ingestion/mr_discussions.rs:248', desc: 'Same as issues: stamps the per-MR watermark.', sql: 'UPDATE merge_requests\nSET discussions_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
'mr-disc-fail': { title: 'Failure: Sync Health Tracking', type: 'error', file: 'src/ingestion/mr_discussions.rs:252-260', desc: 'Unlike issues, MR discussion failures are tracked: discussions_sync_attempts is incremented and discussions_sync_last_error is recorded. Watermark is NOT advanced.' },
'doc-trigger': { title: 'mark_dirty_tx()', type: 'api', file: 'src/ingestion/dirty_tracker.rs', desc: 'Called during every upsert in ingestion. Inserts into dirty_sources, or on conflict resets backoff. This bridges ingestion (stages 1-2) and document generation (stage 3).', sql: 'INSERT INTO dirty_sources\n (source_type, source_id, queued_at)\nVALUES (?1, ?2, ?now)\nON CONFLICT(source_type, source_id)\n DO UPDATE SET\n queued_at = ?now,\n attempt_count = 0,\n next_attempt_at = NULL,\n last_error = NULL;' },
'doc-dirty-table': { title: 'dirty_sources Table', type: 'db', file: 'src/ingestion/dirty_tracker.rs', desc: 'Persistent queue of entities needing document regeneration. Supports exponential backoff for failed extractions.' },
'doc-drain': { title: 'Get Dirty Sources (Batched)', type: 'db', file: 'src/documents/regenerator.rs:35-45', desc: 'Fetches up to 500 dirty entries per batch, prioritizing fewer attempts. Respects exponential backoff.', sql: 'SELECT source_type, source_id\nFROM dirty_sources\nWHERE next_attempt_at IS NULL\n OR next_attempt_at <= ?now\nORDER BY attempt_count ASC,\n queued_at ASC\nLIMIT 500;' },
'doc-dispatch': { title: 'Dispatch by Source Type', type: 'decision', file: 'src/documents/extractor.rs', desc: 'Routes to the appropriate extraction function: "issue" -> extract_issue_document(), "merge_request" -> extract_mr_document(), "discussion" -> extract_discussion_document().' },
'doc-deleted-check': { title: 'Source Exists Check', type: 'decision', file: 'src/documents/regenerator.rs:48-55', desc: 'If the source entity was deleted, the extractor returns None. The regenerator deletes the document row. FK cascades clean up FTS and embeddings.' },
'doc-extract': { title: 'Extract Structured Content', type: 'transform', file: 'src/documents/extractor.rs', desc: 'Builds searchable text:\n[[Issue]] #42: Title\nProject: group/repo\nURL: ...\nLabels: [bug, urgent]\nState: opened\n\n--- Description ---\n...\n\nDiscussions inherit parent labels and extract DiffNote file paths.' },
'doc-triple-hash': { title: 'Triple-Hash Write Optimization', type: 'decision', file: 'src/documents/regenerator.rs:55-62', desc: 'Checks content_hash + labels_hash + paths_hash against existing document. If ALL three match, write is completely skipped. Critical for --full mode performance.' },
'doc-write': { title: 'SAVEPOINT Atomic Write', type: 'db', file: 'src/documents/regenerator.rs:58-65', desc: 'Document, labels, and paths written inside a SAVEPOINT for atomicity.', sql: 'SAVEPOINT doc_write;\nINSERT INTO documents ...\n ON CONFLICT DO UPDATE SET\n content = ?, content_hash = ?,\n labels_hash = ?, paths_hash = ?;\nDELETE FROM document_labels\n WHERE doc_id = ?;\nINSERT INTO document_labels ...;\nDELETE FROM document_paths\n WHERE doc_id = ?;\nINSERT INTO document_paths ...;\nRELEASE doc_write;' },
'doc-clear': { title: 'Clear Dirty Entry', type: 'db', file: 'src/ingestion/dirty_tracker.rs', desc: 'On success, the dirty_sources row is deleted.', sql: 'DELETE FROM dirty_sources\nWHERE source_type = ?\n AND source_id = ?;' },
'doc-error': { title: 'Record Error + Backoff', type: 'error', file: 'src/ingestion/dirty_tracker.rs', desc: 'Increments attempt_count, sets next_attempt_at with exponential backoff. Entry stays for retry.', sql: 'UPDATE dirty_sources\nSET attempt_count = attempt_count + 1,\n next_attempt_at = ?now\n + compute_backoff(attempt_count),\n last_error = ?error_msg\nWHERE source_type = ?\n AND source_id = ?;' },
'doc-skip': { title: 'Skip Write (Hash Match)', type: 'db', file: 'src/documents/regenerator.rs:57', desc: 'When all three hashes match, the document has not actually changed. Common when updated_at changes but content/labels/paths remain the same. Dirty entry is cleared without writes.' },
'embed-detect': { title: 'Change Detection', type: 'decision', file: 'src/embedding/change_detector.rs', desc: 'Document needs re-embedding if: (1) No embedding_metadata row, (2) document_hash mismatch, (3) Config drift in chunk_max_bytes, model, or dims.', sql: 'SELECT d.id, d.content, d.content_hash\nFROM documents d\nLEFT JOIN embedding_metadata em\n ON em.document_id = d.id\nWHERE em.document_id IS NULL\n OR em.document_hash != d.content_hash\n OR em.chunk_max_bytes != ?config\n OR em.model != ?model\n OR em.dims != ?dims;' },
'embed-paginate': { title: 'Keyset Pagination', type: 'db', file: 'src/embedding/pipeline.rs:80-100', desc: '500 documents per page using keyset pagination. Each page wrapped in a SAVEPOINT.' },
'embed-chunk': { title: 'Split Into Chunks', type: 'transform', file: 'src/embedding/chunking.rs', desc: 'Splits content at paragraph boundaries with configurable max size and overlap.' },
'embed-overflow': { title: 'Overflow Guard', type: 'decision', file: 'src/embedding/pipeline.rs:110-120', desc: 'If a document produces too many chunks, it is skipped to prevent rowid collisions in the encoded chunk ID scheme.' },
'embed-work': { title: 'Build ChunkWork Items', type: 'transform', file: 'src/embedding/pipeline.rs:125-140', desc: 'Each chunk gets an encoded ID (document_id * 1000000 + chunk_index) for the sqlite-vec primary key.' },
'embed-batch': { title: 'Batch Embed via Ollama', type: 'api', file: 'src/embedding/pipeline.rs:150-200', desc: 'Sends 32 chunks per Ollama API call. Model default: nomic-embed-text.', sql: 'POST http://localhost:11434/api/embed\n{\n "model": "nomic-embed-text",\n "input": ["chunk1...", "chunk2...", ...]\n}' },
'embed-store': { title: 'Store Vectors', type: 'db', file: 'src/embedding/pipeline.rs:205-230', desc: 'Vectors stored in sqlite-vec virtual table. Metadata in embedding_metadata. Old embeddings cleared on first successful chunk.', sql: '-- Clear old embeddings:\nDELETE FROM embeddings\n WHERE rowid / 1000000 = ?doc_id;\n\n-- Insert new vector:\nINSERT INTO embeddings(rowid, embedding)\nVALUES (?chunk_id, ?vector_blob);\n\n-- Update metadata:\nINSERT INTO embedding_metadata ...\n ON CONFLICT DO UPDATE SET\n document_hash = ?,\n chunk_max_bytes = ?,\n model = ?, dims = ?;' },
'embed-success': { title: 'SAVEPOINT Commit', type: 'db', file: 'src/embedding/pipeline.rs:240-250', desc: 'Each page of 500 documents wrapped in a SAVEPOINT. Completed pages survive crashes.' },
'embed-ctx-error': { title: 'Context-Length Retry', type: 'error', file: 'src/embedding/pipeline.rs:260-280', desc: 'If Ollama returns context-length error for a batch, each chunk is retried individually to isolate the oversized one.' },
'embed-other-error': { title: 'Record Error for Retry', type: 'error', file: 'src/embedding/pipeline.rs:285-295', desc: 'Network/model errors recorded in embedding_metadata. Document detected as pending again on next run.' },
};
function escapeHtml(str) {
var div = document.createElement('div');
div.appendChild(document.createTextNode(str));
return div.innerHTML; // serialized markup has &, <, and > escaped; textContent would return str unchanged
}
function buildDetailContent(d) {
var container = document.createDocumentFragment();
// Tags section
var tagSection = document.createElement('div');
tagSection.className = 'detail-section';
var typeTag = document.createElement('span');
typeTag.className = 'detail-tag type-' + d.type;
typeTag.textContent = d.type.toUpperCase();
tagSection.appendChild(typeTag);
if (d.file) {
var fileTag = document.createElement('span');
fileTag.className = 'detail-tag file';
fileTag.textContent = d.file;
tagSection.appendChild(fileTag);
}
container.appendChild(tagSection);
// Description
var descSection = document.createElement('div');
descSection.className = 'detail-section';
var descH3 = document.createElement('h3');
descH3.textContent = 'Description';
descSection.appendChild(descH3);
var descP = document.createElement('p');
descP.textContent = d.desc;
descSection.appendChild(descP);
container.appendChild(descSection);
// SQL
if (d.sql) {
var sqlSection = document.createElement('div');
sqlSection.className = 'detail-section';
var sqlH3 = document.createElement('h3');
sqlH3.textContent = 'Key Query / Code';
sqlSection.appendChild(sqlH3);
var sqlBlock = document.createElement('div');
sqlBlock.className = 'sql-block';
sqlBlock.textContent = d.sql;
sqlSection.appendChild(sqlBlock);
container.appendChild(sqlSection);
}
return container;
}
function showDetail(key) {
var d = details[key];
if (!d) return;
var panel = document.getElementById('detail-panel');
document.getElementById('detail-title').textContent = d.title;
var body = document.getElementById('detail-body');
while (body.firstChild) body.removeChild(body.firstChild);
body.appendChild(buildDetailContent(d));
document.querySelectorAll('.node.selected').forEach(function(n) { n.classList.remove('selected'); });
var clicked = document.querySelector('[data-detail="' + key + '"]');
if (clicked) clicked.classList.add('selected');
panel.classList.add('open');
}
function closeDetail() {
document.getElementById('detail-panel').classList.remove('open');
document.querySelectorAll('.node.selected').forEach(function(n) { n.classList.remove('selected'); });
}
document.addEventListener('click', function(e) {
var node = e.target.closest('.node[data-detail]');
if (node) { showDetail(node.dataset.detail); return; }
if (!e.target.closest('.detail-panel') && !e.target.closest('.node')) closeDetail();
});
document.addEventListener('keydown', function(e) { if (e.key === 'Escape') closeDetail(); });
</script>
</body>
</html>
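The `issue-cursor-filter` entry in the details object above describes the dedup check only as pseudocode. A minimal Rust sketch of the same two-part comparison follows. The names `SyncCursor` and `passes_cursor` are illustrative assumptions, not identifiers taken from `src/ingestion/issues.rs`, and accepting everything when no cursor exists is also an assumption.

```rust
/// Newest (updated_at, gitlab_id) pair already processed.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SyncCursor {
    updated_at_ms: i64,
    tie_breaker_id: i64,
}

/// Returns true when an item is strictly newer than the cursor.
/// Tuple comparison is lexicographic, so this is equivalent to:
/// updated_at > cursor_ts, or equal timestamps and gitlab_id > tie_breaker_id.
fn passes_cursor(updated_at_ms: i64, gitlab_id: i64, cursor: Option<SyncCursor>) -> bool {
    match cursor {
        // Assumption: with no stored cursor (first sync), every item is accepted.
        None => true,
        Some(c) => (updated_at_ms, gitlab_id) > (c.updated_at_ms, c.tie_breaker_id),
    }
}
```

Comparing the pair as a tuple keeps the timestamp check and the tie-breaker in one expression, which makes it hard to get the equal-timestamp branch wrong.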

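The `re-transient` entry above gives the retry delay as 30s * 2^(attempts-1), capped at 480s. The arithmetic can be sketched in Rust as below. The function and constant names are hypothetical, and `attempts` is assumed to be the 1-based count after the failure has been recorded.

```rust
// Values taken from the description: 30 s base, 8 min cap, in milliseconds.
const BASE_BACKOFF_MS: u64 = 30_000;
const MAX_BACKOFF_MS: u64 = 480_000;

/// Delay before the next retry, where `attempts` counts failures so far
/// (1 after the first failure). Hypothetical helper for illustration.
fn backoff_ms(attempts: u32) -> u64 {
    // Clamp the exponent so the shift can never overflow a u64.
    let exponent = attempts.saturating_sub(1).min(16);
    BASE_BACKOFF_MS
        .saturating_mul(1u64 << exponent)
        .min(MAX_BACKOFF_MS)
}
```

Under these assumptions the delays run 30 s, 60 s, 120 s, 240 s, then stay at 480 s from the fifth attempt onward.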
@@ -0,0 +1,65 @@
-- Migration 012: Make label_name and milestone_title nullable
-- GitLab returns null for these when the referenced label/milestone has been deleted.
-- Recreate resource_label_events with nullable label_name
CREATE TABLE resource_label_events_new (
id INTEGER PRIMARY KEY,
gitlab_id INTEGER NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
label_name TEXT,
actor_gitlab_id INTEGER,
actor_username TEXT,
created_at INTEGER NOT NULL,
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL) OR
(issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
INSERT INTO resource_label_events_new
SELECT * FROM resource_label_events;
DROP TABLE resource_label_events;
ALTER TABLE resource_label_events_new RENAME TO resource_label_events;
CREATE UNIQUE INDEX uq_label_events_gitlab ON resource_label_events(gitlab_id, project_id);
CREATE INDEX idx_label_events_issue ON resource_label_events(issue_id) WHERE issue_id IS NOT NULL;
CREATE INDEX idx_label_events_mr ON resource_label_events(merge_request_id) WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_label_events_created ON resource_label_events(created_at);
-- Recreate resource_milestone_events with nullable milestone_title
CREATE TABLE resource_milestone_events_new (
id INTEGER PRIMARY KEY,
gitlab_id INTEGER NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
milestone_title TEXT,
milestone_id INTEGER,
actor_gitlab_id INTEGER,
actor_username TEXT,
created_at INTEGER NOT NULL,
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL) OR
(issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
INSERT INTO resource_milestone_events_new
SELECT * FROM resource_milestone_events;
DROP TABLE resource_milestone_events;
ALTER TABLE resource_milestone_events_new RENAME TO resource_milestone_events;
CREATE UNIQUE INDEX uq_milestone_events_gitlab ON resource_milestone_events(gitlab_id, project_id);
CREATE INDEX idx_milestone_events_issue ON resource_milestone_events(issue_id) WHERE issue_id IS NOT NULL;
CREATE INDEX idx_milestone_events_mr ON resource_milestone_events(merge_request_id) WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_milestone_events_created ON resource_milestone_events(created_at);
-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (12, strftime('%s', 'now') * 1000, 'Make label_name and milestone_title nullable for deleted labels/milestones');

@@ -0,0 +1,10 @@
-- Migration 013: Add resource event sync watermarks
-- Mirrors the discussions_synced_for_updated_at pattern so that only entities
-- whose updated_at exceeds the last resource event sync get re-enqueued.
ALTER TABLE issues ADD COLUMN resource_events_synced_for_updated_at INTEGER;
ALTER TABLE merge_requests ADD COLUMN resource_events_synced_for_updated_at INTEGER;
-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (13, strftime('%s', 'now') * 1000, 'Add resource event sync watermarks to issues and merge_requests');
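The watermark columns added in migration 013 are compared against `updated_at` with a `COALESCE(..., 0)` fallback in the enqueue query. A small Rust sketch of that predicate, with a NULL watermark modelled as `None`, might look like this. The `Entity` struct and both function names are assumptions made for illustration and do not come from the repository.

```rust
/// In-memory stand-in for an issues / merge_requests row (hypothetical).
struct Entity {
    id: i64,
    updated_at_ms: i64,
    /// NULL in SQLite until the first successful resource event sync.
    resource_events_synced_for_updated_at: Option<i64>,
}

/// Mirrors `updated_at > COALESCE(resource_events_synced_for_updated_at, 0)`.
fn needs_resource_event_sync(e: &Entity) -> bool {
    e.updated_at_ms > e.resource_events_synced_for_updated_at.unwrap_or(0)
}

/// Ids of entities that would be enqueued for a resource event fetch.
fn entities_to_enqueue(entities: &[Entity]) -> Vec<i64> {
    entities
        .iter()
        .filter(|e| needs_resource_event_sync(e))
        .map(|e| e.id)
        .collect()
}
```

An entity that has never been synced always qualifies, and one whose watermark equals its `updated_at` is skipped until GitLab reports a newer update.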

@@ -0,0 +1,12 @@
-- Migration 014: sync_runs enrichment for observability
-- Adds correlation ID and aggregate counts for queryable sync history
ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
-- Index for correlation queries (find run by run_id from logs)
CREATE INDEX IF NOT EXISTS idx_sync_runs_run_id ON sync_runs(run_id);
INSERT INTO schema_version (version, applied_at, description)
VALUES (14, strftime('%s', 'now') * 1000, 'Sync runs enrichment for observability');

@@ -0,0 +1,17 @@
-- Migration 015: Add commit SHAs to merge_requests, closes_issues watermark,
-- and missing label_name index on resource_label_events.
-- Commit SHAs link MRs to actual git history (needed for Gate 4: file-history, Gate 5: trace)
ALTER TABLE merge_requests ADD COLUMN merge_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash_commit_sha TEXT;
-- Watermark for closes_issues sync (same pattern as resource_events_synced_for_updated_at)
-- Prevents re-fetching closes_issues for MRs that haven't changed since last sync
ALTER TABLE merge_requests ADD COLUMN closes_issues_synced_for_updated_at INTEGER;
-- Missing index from original spec: enables efficient label-name filtering in timeline queries
CREATE INDEX IF NOT EXISTS idx_label_events_label ON resource_label_events(label_name);
-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (15, strftime('%s', 'now') * 1000, 'Add commit SHAs, closes_issues watermark, and label event index');

@@ -0,0 +1,20 @@
-- Migration 016: MR file changes table
-- Powers file-history and trace commands (Gates 4-5)
CREATE TABLE mr_file_changes (
id INTEGER PRIMARY KEY,
merge_request_id INTEGER NOT NULL REFERENCES merge_requests(id) ON DELETE CASCADE,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
old_path TEXT,
new_path TEXT NOT NULL,
change_type TEXT NOT NULL CHECK (change_type IN ('added', 'modified', 'renamed', 'deleted')),
UNIQUE(merge_request_id, new_path)
);
CREATE INDEX idx_mfc_project_path ON mr_file_changes(project_id, new_path);
CREATE INDEX idx_mfc_project_old_path ON mr_file_changes(project_id, old_path) WHERE old_path IS NOT NULL;
CREATE INDEX idx_mfc_mr ON mr_file_changes(merge_request_id);
CREATE INDEX idx_mfc_renamed ON mr_file_changes(project_id, change_type) WHERE change_type = 'renamed';
INSERT INTO schema_version (version, applied_at, description)
VALUES (16, strftime('%s', 'now') * 1000, 'MR file changes table');
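The `old_path` / `new_path` pair on `renamed` rows is what lets a file's history be followed across renames, and the commit log for `file-history` mentions a BFS-based rename chain resolution. The following is an independent Rust sketch of that idea over in-memory rows, not the project's implementation. `Rename` and `rename_chain` are hypothetical names, and following renames in both directions is an assumption.

```rust
use std::collections::{BTreeSet, VecDeque};

/// One 'renamed' row from mr_file_changes, reduced to its two paths.
struct Rename<'a> {
    old_path: &'a str,
    new_path: &'a str,
}

/// Collects every path connected to `start` through rename rows,
/// walking the rename graph breadth-first in both directions.
fn rename_chain<'a>(start: &'a str, renames: &[Rename<'a>]) -> BTreeSet<&'a str> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);

    while let Some(path) = queue.pop_front() {
        for r in renames {
            // A row is adjacent to `path` if it renames from it or to it.
            let neighbor = if r.old_path == path {
                Some(r.new_path)
            } else if r.new_path == path {
                Some(r.old_path)
            } else {
                None
            };
            if let Some(next) = neighbor {
                // `insert` returns false for already-visited paths, which
                // also terminates rename cycles (a -> b -> a).
                if seen.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen
}
```

A real query would look rows up through the `(project_id, new_path)` and `(project_id, old_path)` indexes rather than scanning a slice. The linear scan here only keeps the sketch short.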

@@ -0,0 +1,28 @@
-- Migration 017: Composite indexes for `who` query paths
-- Expert/Overlap: DiffNote path prefix + timestamp filter.
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_path_created
ON notes(position_new_path, created_at, project_id)
WHERE note_type = 'DiffNote' AND is_system = 0;
-- Active/Workload: discussion participation lookups.
CREATE INDEX IF NOT EXISTS idx_notes_discussion_author
ON notes(discussion_id, author_username)
WHERE is_system = 0;
-- Active (project-scoped): unresolved discussions by recency.
CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent
ON discussions(project_id, last_note_at)
WHERE resolvable = 1 AND resolved = 0;
-- Active (global): unresolved discussions by recency (no project scope).
CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global
ON discussions(last_note_at)
WHERE resolvable = 1 AND resolved = 0;
-- Workload: issue assignees by username.
CREATE INDEX IF NOT EXISTS idx_issue_assignees_username
ON issue_assignees(username, issue_id);
INSERT INTO schema_version (version, applied_at, description)
VALUES (17, strftime('%s', 'now') * 1000, 'Composite indexes for who query paths');

@@ -0,0 +1,10 @@
-- Migration 018: Fix composite index on issue_assignees
-- Migration 005 created idx_issue_assignees_username(username) as single-column.
-- Migration 017 attempted to recreate it as (username, issue_id), but IF NOT EXISTS
-- silently skipped the statement because an index with that name already existed.
-- Drop and recreate with the correct composite columns.
DROP INDEX IF EXISTS idx_issue_assignees_username;
CREATE INDEX idx_issue_assignees_username ON issue_assignees(username, issue_id);
INSERT INTO schema_version (version, applied_at, description)
VALUES (18, strftime('%s', 'now') * 1000, 'Fix composite index on issue_assignees');

@@ -0,0 +1,16 @@
-- Standalone updated_at DESC indexes for ORDER BY without temp B-tree sort.
-- The existing composite indexes (project_id, updated_at) only help when
-- filtering by project first.
CREATE INDEX IF NOT EXISTS idx_issues_updated_at_desc
ON issues(updated_at DESC);
CREATE INDEX IF NOT EXISTS idx_mrs_updated_at_desc
ON merge_requests(updated_at DESC);
-- Covering index for correlated subquery: unresolved discussion count per issue.
-- MRs already have idx_discussions_mr_resolved (migration 006).
CREATE INDEX IF NOT EXISTS idx_discussions_issue_resolved
ON discussions(issue_id, resolvable, resolved);
INSERT INTO schema_version (version, applied_at, description)
VALUES (19, strftime('%s', 'now') * 1000, 'List performance indexes');

@@ -0,0 +1,7 @@
-- Migration 020: Watermark column for MR diffs sync
-- Tracks which MRs have had their file changes fetched, same pattern as closes_issues_synced_for_updated_at
ALTER TABLE merge_requests ADD COLUMN diffs_synced_for_updated_at INTEGER;
INSERT INTO schema_version (version, applied_at, description)
VALUES (20, strftime('%s', 'now') * 1000, 'MR diffs sync watermark');

@@ -0,0 +1,9 @@
ALTER TABLE issues ADD COLUMN status_name TEXT;
ALTER TABLE issues ADD COLUMN status_category TEXT;
ALTER TABLE issues ADD COLUMN status_color TEXT;
ALTER TABLE issues ADD COLUMN status_icon_name TEXT;
ALTER TABLE issues ADD COLUMN status_synced_at INTEGER;
CREATE INDEX IF NOT EXISTS idx_issues_project_status_name ON issues(project_id, status_name);
INSERT INTO schema_version (version, applied_at, description)
VALUES (21, strftime('%s', 'now') * 1000, 'Work item status columns for issues');

@@ -0,0 +1,21 @@
-- Migration 022: Composite query indexes for notes + author_id column
-- Optimizes author-scoped and project-scoped date-range queries on notes.
-- Adds discussion JOIN indexes and immutable author identity column.
-- Composite index for author-scoped queries (who command, notes --author)
CREATE INDEX IF NOT EXISTS idx_notes_user_created
ON notes(project_id, author_username COLLATE NOCASE, created_at DESC, id DESC)
WHERE is_system = 0;
-- Composite index for project-scoped date-range queries
CREATE INDEX IF NOT EXISTS idx_notes_project_created
ON notes(project_id, created_at DESC, id DESC)
WHERE is_system = 0;
-- Discussion JOIN indexes
CREATE INDEX IF NOT EXISTS idx_discussions_issue_id ON discussions(issue_id);
CREATE INDEX IF NOT EXISTS idx_discussions_mr_id ON discussions(merge_request_id);
-- Immutable author identity column (GitLab numeric user ID)
ALTER TABLE notes ADD COLUMN author_id INTEGER;
CREATE INDEX IF NOT EXISTS idx_notes_author_id ON notes(author_id) WHERE author_id IS NOT NULL;

@@ -0,0 +1,5 @@
ALTER TABLE issues ADD COLUMN closed_at TEXT;
ALTER TABLE issues ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;
INSERT INTO schema_version (version, applied_at, description)
VALUES (23, strftime('%s', 'now') * 1000, 'Add closed_at and confidential to issues');

@@ -0,0 +1,153 @@
-- Migration 024: Add 'note' source_type to documents and dirty_sources
-- SQLite does not support ALTER CONSTRAINT, so we use the table-rebuild pattern.
-- ============================================================
-- 1. Rebuild dirty_sources with updated CHECK constraint
-- ============================================================
CREATE TABLE dirty_sources_new (
    source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
    source_id INTEGER NOT NULL,
    queued_at INTEGER NOT NULL,
    attempt_count INTEGER NOT NULL DEFAULT 0,
    last_attempt_at INTEGER,
    last_error TEXT,
    next_attempt_at INTEGER,
    PRIMARY KEY(source_type, source_id)
);
INSERT INTO dirty_sources_new SELECT * FROM dirty_sources;
DROP TABLE dirty_sources;
ALTER TABLE dirty_sources_new RENAME TO dirty_sources;
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
-- ============================================================
-- 2. Rebuild documents with updated CHECK constraint
-- ============================================================
-- 2a. Backup junction table data
CREATE TEMP TABLE _doc_labels_backup AS SELECT * FROM document_labels;
CREATE TEMP TABLE _doc_paths_backup AS SELECT * FROM document_paths;
-- 2b. Drop all triggers that reference documents
DROP TRIGGER IF EXISTS documents_ai;
DROP TRIGGER IF EXISTS documents_ad;
DROP TRIGGER IF EXISTS documents_au;
DROP TRIGGER IF EXISTS documents_embeddings_ad;
-- 2c. Drop junction tables (they have FK references to documents)
DROP TABLE IF EXISTS document_labels;
DROP TABLE IF EXISTS document_paths;
-- 2d. Create new documents table with 'note' in CHECK constraint
CREATE TABLE documents_new (
    id INTEGER PRIMARY KEY,
    source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
    source_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    author_username TEXT,
    label_names TEXT,
    created_at INTEGER,
    updated_at INTEGER,
    url TEXT,
    title TEXT,
    content_text TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    labels_hash TEXT NOT NULL DEFAULT '',
    paths_hash TEXT NOT NULL DEFAULT '',
    is_truncated INTEGER NOT NULL DEFAULT 0,
    truncated_reason TEXT CHECK (
        truncated_reason IN (
            'token_limit_middle_drop','single_note_oversized','first_last_oversized',
            'hard_cap_oversized'
        )
        OR truncated_reason IS NULL
    ),
    UNIQUE(source_type, source_id)
);
-- 2e. Copy all existing data
INSERT INTO documents_new SELECT * FROM documents;
-- 2f. Swap tables
DROP TABLE documents;
ALTER TABLE documents_new RENAME TO documents;
-- 2g. Recreate all indexes on documents
CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
CREATE INDEX idx_documents_author ON documents(author_username);
CREATE INDEX idx_documents_source ON documents(source_type, source_id);
CREATE INDEX idx_documents_hash ON documents(content_hash);
-- 2h. Recreate junction tables
CREATE TABLE document_labels (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    label_name TEXT NOT NULL,
    PRIMARY KEY(document_id, label_name)
) WITHOUT ROWID;
CREATE INDEX idx_document_labels_label ON document_labels(label_name);
CREATE TABLE document_paths (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    path TEXT NOT NULL,
    PRIMARY KEY(document_id, path)
) WITHOUT ROWID;
CREATE INDEX idx_document_paths_path ON document_paths(path);
-- 2i. Restore junction table data from backups
INSERT INTO document_labels SELECT * FROM _doc_labels_backup;
INSERT INTO document_paths SELECT * FROM _doc_paths_backup;
-- 2j. Recreate FTS triggers (from migration 008)
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
    INSERT INTO documents_fts(rowid, title, content_text)
    VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;
CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
    VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
END;
CREATE TRIGGER documents_au AFTER UPDATE ON documents
WHEN old.title IS NOT new.title OR old.content_text != new.content_text
BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
    VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
    INSERT INTO documents_fts(rowid, title, content_text)
    VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;
-- 2k. Recreate embeddings cleanup trigger (from migration 009)
CREATE TRIGGER documents_embeddings_ad AFTER DELETE ON documents BEGIN
    DELETE FROM embeddings
    WHERE rowid >= old.id * 1000
      AND rowid < (old.id + 1) * 1000;
END;
-- 2l. Rebuild FTS index to ensure consistency after table swap
INSERT INTO documents_fts(documents_fts) VALUES('rebuild');
-- ============================================================
-- 3. Defense triggers: clean up documents when notes are
-- deleted or flipped to system notes
-- ============================================================
CREATE TRIGGER notes_ad_cleanup AFTER DELETE ON notes
WHEN old.is_system = 0
BEGIN
    DELETE FROM documents WHERE source_type = 'note' AND source_id = old.id;
END;
CREATE TRIGGER notes_au_system_cleanup AFTER UPDATE OF is_system ON notes
WHEN NEW.is_system = 1 AND OLD.is_system = 0
BEGIN
    DELETE FROM documents WHERE source_type = 'note' AND source_id = OLD.id;
END;
-- ============================================================
-- 4. Drop temp backup tables
-- ============================================================
DROP TABLE IF EXISTS _doc_labels_backup;
DROP TABLE IF EXISTS _doc_paths_backup;
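
The table-rebuild pattern used in this migration can be sketched in isolation. The example below is a reduced illustration rather than the migration itself: the `queue` table, its two-value CHECK list, and the sample rows are hypothetical, and a real migration would normally run these statements inside a single transaction. It shows that the old table rejects the new value, and that after create / copy / drop / rename the value is accepted and the existing rows survive.

```python
import sqlite3

# Autocommit mode so executescript() and execute() do not interleave
# with implicit transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE queue (
    source_type TEXT NOT NULL CHECK (source_type IN ('issue','discussion')),
    source_id INTEGER NOT NULL,
    PRIMARY KEY(source_type, source_id)
);
INSERT INTO queue VALUES ('issue', 1), ('discussion', 2);
""")


def try_insert_note(db):
    """Return True if a 'note' row passes the CHECK constraint."""
    try:
        db.execute("INSERT INTO queue VALUES ('note', 3)")
        return True
    except sqlite3.IntegrityError:
        return False


accepted_before = try_insert_note(conn)

# Rebuild: create the new shape, copy rows, drop the old table, rename.
conn.executescript("""
CREATE TABLE queue_new (
    source_type TEXT NOT NULL CHECK (source_type IN ('issue','discussion','note')),
    source_id INTEGER NOT NULL,
    PRIMARY KEY(source_type, source_id)
);
INSERT INTO queue_new SELECT * FROM queue;
DROP TABLE queue;
ALTER TABLE queue_new RENAME TO queue;
""")

accepted_after = try_insert_note(conn)
rows = sorted(conn.execute("SELECT source_type, source_id FROM queue"))
```

`INSERT ... SELECT *` relies on both tables declaring their columns in the same order, which is why the rebuilt tables above repeat the original column lists unchanged.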


@@ -0,0 +1,8 @@
-- Backfill existing non-system notes into dirty queue for document generation.
-- Only seeds notes that don't already have documents and aren't already queued.
INSERT INTO dirty_sources (source_type, source_id, queued_at)
SELECT 'note', n.id, CAST(strftime('%s', 'now') AS INTEGER) * 1000
FROM notes n
LEFT JOIN documents d ON d.source_type = 'note' AND d.source_id = n.id
WHERE n.is_system = 0 AND d.id IS NULL
ON CONFLICT(source_type, source_id) DO NOTHING;
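
A small sketch of why this backfill is safe to run more than once. The three tables are trimmed to the columns the statement touches, and the sample rows are invented for illustration; the INSERT itself is the statement above. The LEFT JOIN filters out notes that already have a document, and the ON CONFLICT clause leaves already-queued notes untouched. This assumes a SQLite build with upsert support (3.24 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE notes (id INTEGER PRIMARY KEY, is_system INTEGER NOT NULL DEFAULT 0);
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    source_type TEXT NOT NULL,
    source_id INTEGER NOT NULL,
    UNIQUE(source_type, source_id)
);
CREATE TABLE dirty_sources (
    source_type TEXT NOT NULL,
    source_id INTEGER NOT NULL,
    queued_at INTEGER NOT NULL,
    PRIMARY KEY(source_type, source_id)
);
-- Notes 1, 2, 4 are user notes; note 3 is a system note.
INSERT INTO notes (id, is_system) VALUES (1, 0), (2, 0), (3, 1), (4, 0);
-- Note 2 already has a document; note 4 is already queued.
INSERT INTO documents (source_type, source_id) VALUES ('note', 2);
INSERT INTO dirty_sources VALUES ('note', 4, 0);
""")

BACKFILL = """
INSERT INTO dirty_sources (source_type, source_id, queued_at)
SELECT 'note', n.id, CAST(strftime('%s', 'now') AS INTEGER) * 1000
FROM notes n
LEFT JOIN documents d ON d.source_type = 'note' AND d.source_id = n.id
WHERE n.is_system = 0 AND d.id IS NULL
ON CONFLICT(source_type, source_id) DO NOTHING
"""


def queued_ids(db):
    """Return queued source ids in ascending order."""
    return [r[0] for r in db.execute(
        "SELECT source_id FROM dirty_sources ORDER BY source_id")]


conn.execute(BACKFILL)
first_run = queued_ids(conn)

conn.execute(BACKFILL)  # second run should change nothing
second_run = queued_ids(conn)

# The pre-existing queue entry keeps its original timestamp.
queued_at_note_4 = conn.execute(
    "SELECT queued_at FROM dirty_sources WHERE source_id = 4").fetchone()[0]
```

Only note 1 is newly queued: note 2 is skipped by the anti-join, note 3 by the `is_system` filter, and note 4 by the conflict clause.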

Some files were not shown because too many files have changed in this diff.