54 Commits

Author SHA1 Message Date
teernisse
171260a772 feat(cli): implement 'lore trace' command (bd-2n4, bd-9dd)
Gate 5 Code Trace - Tier 1 (API-only, no git blame).
Answers 'Why was this code introduced?' by building
file -> MR -> issue -> discussion chains.

New files:
- src/core/trace.rs: run_trace() query logic with rename-aware
  path resolution, entity_reference-based issue linking, and
  DiffNote discussion extraction
- src/core/trace_tests.rs: 7 unit tests for query logic
- src/cli/commands/trace.rs: CLI command with human output,
  robot JSON output, and :line suffix parsing (5 tests)

Human output shows full content (no truncation).
Robot JSON truncates discussion bodies to 500 chars for token efficiency.

Wiring:
- TraceArgs + Commands::Trace in cli/mod.rs
- handle_trace in main.rs
- VALID_COMMANDS + robot-docs manifest entry
- COMMAND_FLAGS autocorrect registry entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 14:57:21 -05:00
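
The `:line` suffix parsing mentioned in 171260a772 might look roughly like the sketch below. The helper name `parse_path_with_line` and its exact rules (positive integer suffix, non-empty path) are assumptions for illustration; the real parser in src/cli/commands/trace.rs may behave differently.

```rust
/// Split an optional `:line` suffix off a path argument.
/// Hypothetical helper, not the actual code from src/cli/commands/trace.rs.
fn parse_path_with_line(input: &str) -> (&str, Option<u32>) {
    if let Some((path, suffix)) = input.rsplit_once(':') {
        // Treat the suffix as a line number only when it parses as a
        // positive integer and a non-empty path remains in front of it.
        if !path.is_empty() {
            if let Ok(line) = suffix.parse::<u32>() {
                if line > 0 {
                    return (path, Some(line));
                }
            }
        }
    }
    // No usable suffix: the whole input is the path.
    (input, None)
}
```

Under these assumptions, `src/core/trace.rs:42` yields the path plus `Some(42)`, while an input without a numeric suffix is returned unchanged with `None`.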
teernisse
a1bca10408 feat(cli): implement 'lore file-history' command (bd-z94)
Adds file-history command showing which MRs touched a file, with:
- Rename chain resolution via BFS (resolve_rename_chain from bd-1yx)
- DiffNote discussion snippets with --discussions flag
- --merged filter, --no-follow-renames, -n limit
- Human output with styled MR list and rename chain display
- Robot JSON output with {ok, data, meta} envelope
- Autocorrect registry and robot-docs manifest entry
- Fixes pre-existing --no-status missing from sync autocorrect registry
2026-02-17 12:57:56 -05:00
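
As an illustration of the BFS rename chain resolution named in a1bca10408, here is a minimal in-memory sketch. The function `rename_chain`, its `(old, new)` edge list, and the choice to follow renames in both directions are assumptions; the real resolve_rename_chain presumably reads rename records from the database.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Collect every path connected to `start` through rename edges.
/// Hypothetical stand-in for resolve_rename_chain, operating on a slice
/// of (old_path, new_path) pairs instead of a database.
fn rename_chain(start: &str, renames: &[(&str, &str)]) -> BTreeSet<String> {
    // Undirected adjacency map so the walk reaches both older and newer names.
    let mut adjacent: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(old, new) in renames {
        adjacent.entry(old).or_default().push(new);
        adjacent.entry(new).or_default().push(old);
    }

    let mut seen: BTreeSet<String> = BTreeSet::new();
    let mut queue: VecDeque<&str> = VecDeque::new();
    seen.insert(start.to_string());
    queue.push_back(start);

    // Standard breadth-first walk; `seen` doubles as the visited set.
    while let Some(path) = queue.pop_front() {
        if let Some(neighbors) = adjacent.get(path) {
            for &next in neighbors {
                if seen.insert(next.to_string()) {
                    queue.push_back(next);
                }
            }
        }
    }
    seen
}
```

The visited set keeps the walk from looping when a file is renamed back to an earlier name.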
teernisse
491dc52864 release: v0.8.3
2026-02-16 10:29:52 -05:00
teernisse
b9063aa17a feat(cli): add --no-status flag to skip GraphQL status enrichment during sync
2026-02-16 10:29:11 -05:00
teernisse
fc0d9cb1d3 feat(sync): colored stage output, functional sub-rows, and error visibility
Overhaul the sync command's human output to use semantic colors and a
cleaner rendering architecture. The changes fall into four areas:

Stage lines: Replace direct finish_stage() calls with an
emit_stage_line/emit_stage_block pattern that clears the spinner first,
then prints static lines via MultiProgress::suspend. Stage icons are
now color-coded green (success) or yellow (warning) via color_icon().
A separate "Status" stage line now appears after Issues, summarizing
work-item status enrichment across all projects.

Sub-rows: Replace the imperative print_issue_sub_rows/print_mr_sub_rows
functions with functional issue_sub_rows(), mr_sub_rows(), and new
status_sub_rows() that return Vec<String>. Project paths use
Theme::muted(), error/failure counts use Theme::warning(), and
separators use the dim middle-dot style. Sub-rows are printed atomically
with their parent stage line to avoid interleaving with spinners.

Summary: In print_sync(), counts now use Theme::info().bold() for visual
pop, detail-line separators are individually styled (dim middle-dot),
and a new "Sync completed with issues" headline appears when any stage
had failures. Document errors and embedding failures are surfaced in
both the doc-parts line and the errors line.

Tests: Full coverage for append_failures, summarize_status_enrichment,
should_print_timings, issue_sub_rows, mr_sub_rows, and status_sub_rows.
2026-02-16 09:43:36 -05:00
teernisse
c8b47bf8f8 feat(cli): add --timings flag and enrich error tracking fields
Add -t/--timings flag to the sync subcommand, allowing users to opt
into a per-stage timing breakdown after the sync summary. Wire the flag
through main.rs into print_sync() which passes it to the new
should_print_timings() gate.

Enrich the data structures that flow through the sync pipeline so
downstream renderers have full error visibility:

- ProjectSummary gains status_errors (issue-side status enrichment
  failures per project)
- ProjectStatusEnrichment gains path (project path for sub-row display)
- SyncResult gains documents_errored and embedding_failed so the
  summary can surface doc-gen and embed failures separately
- Autocorrect table updated with --timings for fuzzy flag matching
2026-02-16 09:43:22 -05:00
teernisse
a570327a6b refactor(progress): extract format_stage_line with themed styling
Pull the line-formatting logic out of finish_stage() into a standalone
public format_stage_line() so that sync.rs can build stage lines without
needing a live ProgressBar (e.g. for static multi-line blocks printed
after the spinner is cleared).

The new function applies Theme::info().bold() to the label and
Theme::timing() to the elapsed column, giving every stage line
consistent color treatment. finish_stage() now delegates to it.

Includes a unit test asserting the formatted output contains the
expected icon, label, summary, and elapsed components.
2026-02-16 09:43:13 -05:00
teernisse
eef73decb5 fix(cli): timeline tag width, test env isolation, and logging verbosity
Miscellaneous fixes across CLI and core modules:

- Timeline: widen TAG_WIDTH from 10 to 11 to accommodate longer event
  type labels without truncation
- render.rs: save and restore LORE_ICONS env var in glyph_mode test so the
  test neither leaks its setting into, nor is affected by, other tests
  that set LORE_ICONS
- logging.rs: adjust verbose=1 to info level (was debug), verbose=2 to
  debug — this reduces noise at -v while keeping -vv as the full debug
  experience
- issues.rs, merge_requests.rs: use infodebug! macro consistently for
  ingestion summary logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:42 -05:00
teernisse
bb6660178c feat(sync): per-project breakdown, status enrichment progress bars, and summary polish
Add per-project detail rows beneath stage completion lines during multi-project
syncs, showing itemized counts (issues/MRs, discussions, events, statuses, diffs)
for each project. Previously, only aggregate totals were visible, making it hard
to diagnose which project contributed what during a sync.

Status enrichment gets proper progress bars replacing the old spinner-only
display: StatusEnrichmentStarted now carries a total count so the CLI can
render a determinate bar with rate and ETA. The enrichment SQL is tightened
to use IS NOT comparisons for diff-only UPDATEs (skip rows where values
haven't changed), and a follow-up touch_stmt ensures status_synced_at is
updated even for unchanged rows so staleness detection works correctly.

Other improvements:
- New ProjectSummary struct aggregates per-project metrics during ingestion
- SyncResult gains statuses_enriched + per-project summary vectors
- "Already up to date" message when sync finds zero changes
- Remove Arc<AtomicBool> tick_started pattern from docs/embed stages
  (enable_steady_tick is idempotent, the guard was unnecessary)
- Progress bar styling: dim spinner, dark_gray track, per_sec + eta display
- Tick intervals tightened from 100ms to 60ms for smoother animation
- statuses_without_widget calculation uses fetch_result.statuses.len()
  instead of subtracting enriched (more accurate when some statuses lack
  work item widgets)
- Status enrichment completion log downgraded from info to debug

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:33 -05:00
teernisse
64e73b1cab fix(graphql): handle past HTTP dates in retry-after header gracefully
Extract parse_retry_after_value(header, now) as a pure function to enable
deterministic testing of Retry-After header parsing. The previous
implementation used let-chains with SystemTime::now() inline, which made
it untestable, and it would panic on negative durations when the server
clock was behind or the header contained a date in the past.

Changes:
- Extract parse_retry_after_value() taking an explicit `now` parameter
- Handle past HTTP dates by returning 1 second instead of panicking on
  negative Duration (date.duration_since(now) returns Err for past dates)
- Trim whitespace from header values before parsing
- Add test for past HTTP date returning 1 second minimum
- Add test for delta-seconds with surrounding whitespace

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 11:25:19 -05:00
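
The core of the fix in 64e73b1cab can be sketched with the standard library alone. The helper names `delay_until` and `parse_delta_seconds` are hypothetical, and HTTP-date parsing is left out, so this is not the actual parse_retry_after_value; it only shows why taking `now` as a parameter makes the past-date case testable.

```rust
use std::time::{Duration, SystemTime};

/// Delay until `retry_at`, falling back to 1 second for dates in the past.
/// Hypothetical helper; `now` is passed in so tests can pin the clock.
fn delay_until(retry_at: SystemTime, now: SystemTime) -> Duration {
    // duration_since returns Err when `retry_at` is earlier than `now`,
    // so the fallback replaces what used to be a panic.
    retry_at
        .duration_since(now)
        .unwrap_or(Duration::from_secs(1))
}

/// Parse the delta-seconds form of Retry-After, tolerating whitespace.
/// Returns None for anything that is not a non-negative integer.
fn parse_delta_seconds(header: &str) -> Option<Duration> {
    header.trim().parse::<u64>().ok().map(Duration::from_secs)
}
```

Because both functions are pure, a test can construct `now` from a fixed offset rather than reading the system clock.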
teernisse
361757568f refactor(cli): remove deprecated stage_spinner, migrate remaining callers to v2
Phase 7 cleanup: migrate timeline.rs and main.rs search spinner
from stage_spinner() to stage_spinner_v2() with proper icon labels,
then remove the now-unused stage_spinner() function and its tests.

No external callers remain for the old numbered-stage API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:13:06 -05:00
Taylor Eernisse
8572f6cc04 refactor(cli): polish secondary commands with icons, number formatting, and section dividers
Phase 6 of the UX overhaul. Applies consistent visual treatment across
the remaining command outputs: stats, doctor, timeline, who, count,
and drift.

Stats (stats.rs):
- Apply render::format_number() to all numeric values (documents,
  FTS indexed, embedding counts, chunks) for thousand-separator
  formatting in large databases

Doctor (doctor.rs):
- Replace Unicode check/warning/cross symbols with Icons::success(),
  Icons::warning(), Icons::error() for glyph-mode awareness
- Add summary line after checks showing "Ready/Not ready" with counts
  of passed, warnings, and failed checks separated by middle dots
- Remove "lore doctor" title header for cleaner output

Count (count.rs):
- Right-align numeric values with {:>10} format for columnar output
  in count and state breakdown displays

Timeline (timeline.rs):
- Add entity icons (issue/MR) before entity references in event rows
- Refactor format_event_tag to pad plain text before applying style,
  preventing ANSI codes from breaking column alignment
- Extract style_padded() helper for width-then-style pattern

Who (who.rs):
- Add Icons::user() before usernames in expert, workload, reviews,
  and overlap displays
- Replace manual bold section headers with render::section_divider()
  in workload view (Assigned Issues, Authored MRs, Reviewing MRs,
  Unresolved Discussions)

Drift (drift.rs):
- Add Icons::error()/success() before drift detection status line
- Replace '#' bar character with Unicode full block for similarity
  curve visualization

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
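
The width-then-style pattern from the Timeline changes in 8572f6cc04 can be illustrated as follows. This sketch uses a raw ANSI SGR code in place of the project's Theme styles, and the signature shown is an assumption rather than the real style_padded() helper.

```rust
/// Pad plain text to `width` first, then wrap it in an ANSI style.
/// `sgr_code` is a raw SGR parameter such as "32" (green); the real helper
/// presumably takes a Theme style instead.
fn style_padded(text: &str, width: usize, sgr_code: &str) -> String {
    // Padding is computed on the unstyled text, so the escape sequences
    // added afterwards never count toward the column width.
    let padded = format!("{:<width$}", text, width = width);
    format!("\x1b[{}m{}\x1b[0m", sgr_code, padded)
}
```

Doing it the other way round (styling first, padding second) would count the escape bytes as visible characters and shift the columns.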
Taylor Eernisse
d0744039ef refactor(show): polish issue and MR detail views with section dividers and icons
Phase 4 of the UX overhaul. Restructures the show issue and show MR
detail displays with consistent section layout, state icons, and
improved typography.

Issue detail changes:
- Replace bold header + box-drawing underline with indented title using
  Theme::bold() for the title text only
- Organize fields into named sections using render::section_divider():
  Details, Development, Description, Discussions
- Add state icons (Icons::issue_opened/closed) alongside text labels
- Add relative time in parentheses next to Created/Updated dates
- Show the Labels field only when labels are present instead of printing
  "Labels: (none)", using format_labels_bare for clean comma-separated output
- Move URL and confidential indicator into Details section
- Closing MRs show state-colored icons (merged/opened/closed)
- Discussions use section_divider instead of bold text, remove colons
  from author lines, adjust wrap widths for consistent indentation

MR detail changes:
- Same section-divider layout: Details, Description, Discussions
- State icons for opened/merged/closed using Icons::mr_* helpers
- Draft indicator uses Icons::mr_draft() instead of [Draft] text prefix
- Relative times added to Created, Updated, Merged, Closed dates
- Reviewers and Assignees fields aligned with fixed-width labels
- Labels shown only when present, using format_labels_bare
- Discussion formatting matches issue detail style

Both views use 5-space left indent for field alignment and consistent
wrap widths (72 for descriptions, 68/66 for discussion notes/replies).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
4b372dfb38 refactor(list): polish list commands with icons, compact timestamps, and styled discussions
Phase 3 of the UX overhaul. Enhances the issues, merge requests, and
notes list displays with visual indicators and improved formatting.

List display changes (src/cli/commands/list.rs):
- Add state icons to issues (opened/closed) and merge requests
  (opened/merged/closed) using Icons:: helpers alongside text labels
- Replace [DRAFT] prefix with Icons::mr_draft() glyph for draft MRs
- Switch from format_relative_time to format_relative_time_compact for
  tighter column widths in tabular output
- Switch from format_labels to format_labels_bare for bracket-free output
- Change format_discussions() return type from String to StyledCell so
  unresolved counts render with Theme::warning() color inline
- Bold the section headers ("Issues", "Merge Requests", "Notes")
  with count separated from the label for cleaner scanning
- Import Icons from render module

Test updates (src/cli/commands/list_tests.rs):
- Update format_discussions tests to assert on StyledCell.text field
  instead of raw String, since the function now returns styled output
- The unresolved-count test checks starts_with/contains to handle
  embedded ANSI escape codes from Theme::warning()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
af8fc4af76 refactor(sync): overhaul progress display with stage spinners and summaries
Phase 2 of the UX overhaul. Replaces the old numbered-stage progress
system (1/4, 2/4...) and manual indicatif ProgressBar/ProgressStyle
setup with the new centralized progress helpers.

Sync command changes (src/cli/commands/sync.rs):
- Replace stage_spinner(n, total, msg) with stage_spinner_v2(icon, label, status)
  removing the rigid numbered-stage counter in favor of named stages
- Replace manual ProgressBar::new + ProgressStyle::default_bar for docs
  and embed sub-progress with nested_progress(label, len, robot_mode)
- Add finish_stage() calls that display a completion summary with
  elapsed time, e.g. "Issues  42 issues from 3 projects  1.2s"
- Each stage (Issues, MRs, Docs, Embed) now reports what it did on
  completion rather than just clearing the spinner silently
- Embed failure path uses Icons::warning() instead of inline Theme
  formatting, keeping error display consistent with success path
- Remove indicatif direct dependency from sync.rs (now handled by
  progress module)

Main entry point changes (src/main.rs):
- Add GlyphMode detection: auto-detect Unicode/Nerd Font support or
  fall back to ASCII based on --icons flag, --color=never, NO_COLOR,
  or robot mode
- Update all LoreRenderer::init() calls to pass GlyphMode alongside
  ColorMode for icon-aware rendering throughout the CLI
- Overhaul handle_error() formatting: use Icons::error() glyph,
  bold error text, arrow-prefixed action suggestions, and breathing
  room with blank lines for scannability
- Migrate handle_embed() progress bar from manual ProgressBar +
  ProgressStyle to nested_progress() helper, matching sync command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
96b288ccdd refactor(search): polish search results rendering with semantic Theme styles
Phase 5 of the UX overhaul. Migrates search result display from raw
console styling to the centralized Theme system with semantic methods,
improving visual consistency and readability.

Search result changes:
- Type badges now use semantic styles (issue_ref, mr_ref) with
  fixed-width alignment for clean columnar layout
- Snippet rendering uses Theme::highlight() for matched terms and
  Theme::muted() for surrounding context, replacing bold+underline
- Metadata line uses Theme::username() for authors and per-part
  styling with middle-dot separators instead of a single dim line
- Result numbering uses muted style with right-aligned width
- Consistent 8-space indent for metadata, snippets, and explain lines
- Header line uses muted style for search mode instead of dim+parens
- Trailing blank line moved after the result loop instead of per-result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
teernisse
d710403567 feat(cli): add GlyphMode icon system, Theme extensions, and progress API
Phase 1 of UX skin overhaul: foundation layer that all subsequent
phases build upon.

Icons: 3-tier glyph system (Nerd Font / Unicode / ASCII) with
auto-detection from TERM_PROGRAM, LORE_ICONS env, or --icons flag.
16 semantic icon methods on Icons struct (success, warning, error,
issue states, MR states, note, search, user, sync, waiting).

Theme: 4 new semantic styles — muted (#6b7280), highlight (#fbbf24),
timing (#94a3b8), state_draft (#6b7280).

Progress: stage_spinner_v2 with icon prefix, nested_progress with
bounded bar/throughput/ETA, finish_stage for static completion lines,
format_elapsed for compact duration strings.

Utilities: format_relative_time_compact (3h, 2d, 1w, 3mo),
format_labels_bare (comma-separated without brackets).

CLI: --icons global flag, GLOBAL_FLAGS registry updated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 10:06:05 -05:00
Taylor Eernisse
ebf64816c9 fix(search): correct FTS5 raw mode fallback test assertion
Update test_raw_mode_leading_wildcard_falls_back_to_safe to match the
actual Safe mode behavior: OR is a recognized FTS5 boolean operator and
passes through unquoted, so the expected output is '"*" OR "auth"' not
'"*" "OR" "auth"'. The previous assertion was incorrect since the Safe
mode operator-passthrough logic was added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:34:01 -05:00
Taylor Eernisse
450951dee1 feat(timeline): rename --expand-mentions to --no-mentions, default mentions on
Invert the timeline mention-expansion flag semantics. Previously, mention
edges were excluded by default and --expand-mentions opted in. Now mention
edges are included by default (matching the more common use case) and
--no-mentions opts out to reduce fan-out when needed.

This is a breaking CLI change but aligns with the principle that the
default behavior should produce the most useful output. Users who were
passing --expand-mentions get the same behavior without any flag. Users
who want reduced output can pass --no-mentions.

Updated: CLI args (TimelineArgs), autocorrect flag list, robot-docs
schema, README documentation and flag reference table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:33:34 -05:00
Taylor Eernisse
81f049a7fa refactor(main): wire LoreRenderer init, migrate to Theme, improve UX polish
Wire the LoreRenderer singleton initialization into main.rs color mode
handling, replacing the console::style import with Theme throughout.

Key changes:

- Color initialization: LoreRenderer::init() called for all code paths
  (NO_COLOR, --color never/always/auto, unknown mode fallback) alongside
  the existing console::set_colors_enabled() calls. Both systems must
  agree since some transitive code still uses console (e.g. dialoguer).

- Tracing: Replace .with_target(false) with .event_format(CompactHumanFormat)
  for the stderr layer, producing the clean 'HH:MM:SS LEVEL  message' format.

- Error handling: handle_error() now shows machine-actionable recovery
  commands from gi_error.actions() below the hint, formatted with dim '$'
  prefix and bold command text.

- Deprecation warnings: All 'lore list', 'lore show', 'lore auth-test',
  'lore sync-status' warnings migrated to Theme::warning().

- Init wizard: All success/info/error messages migrated. Unicode check
  marks use explicit \u{2713} escapes instead of literal symbols.

- Embed command: Added progress bar with indicatif for embedding stage,
  showing position/total with steady tick. Elapsed time shown on completion.

- Generate-docs and ingest commands: Added 'Done in Xs' elapsed time and
  next-step hints (run embed after generate-docs, run generate-docs after
  ingest) for better workflow guidance.

- Sync output: Interrupt message and lock release migrated to Theme.

- Health command: Status labels and overall healthy/unhealthy styled.

- Robot-docs: Added drift command schema, updated sync flags to include
  --no-file-changes, updated who flags with new options.

- Timeline --expand-mentions -> --no-mentions flag rename wired through
  params and robot-docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:33:09 -05:00
Taylor Eernisse
dd00a2b840 refactor(cli): migrate all command modules from console::style to Theme
Replace all console::style() calls in command modules with the centralized
Theme API and render:: utility functions. This ensures consistent color
behavior across the entire CLI, proper NO_COLOR/--color never support via
the LoreRenderer singleton, and eliminates duplicated formatting code.

Changes per module:

- count.rs: Theme for table headers, render::format_number replacing local
  duplicate. Removed local format_number implementation.
- doctor.rs: Theme::success/warning/error for check status symbols and
  messages. Unicode escapes for check/warning/cross symbols.
- drift.rs: Theme::bold/error/success for drift detection headers and
  status messages.
- embed.rs: Compact output format — headline with count, zero-suppressed
  detail lines, 'nothing to embed' short-circuit for no-op runs.
- generate_docs.rs: Same compact pattern — headline + detail + hint for
  next step. No-op short-circuit when regenerated==0.
- ingest.rs: Theme for project summaries, sync status, dry-run preview.
  All console::style -> Theme replacements.
- list.rs: Replace comfy-table with render::LoreTable for issue/MR listing.
  Remove local colored_cell, colored_cell_hex, format_relative_time,
  truncate_with_ellipsis, and format_labels (all moved to render.rs).
- list_tests.rs: Update test assertions to use render:: functions.
- search.rs: Add render_snippet() for FTS5 <mark> tag highlighting via
  Theme::bold().underline(). Compact result layout with type badges.
- show.rs: Theme for entity detail views, delegate format_date and
  wrap_text to render module.
- stats.rs: Section-based layout using render::section_divider. Compact
  middle-dot format for document counts. Color-coded embedding coverage
  percentage (green >=95%, yellow >=50%, red <50%).
- sync.rs: Compact sync summary — headline with counts and elapsed time,
  zero-suppressed detail lines, visually prominent error-only section.
- sync_status.rs: Theme for run history headers, removed local
  format_number duplicate.
- timeline.rs: Theme for headers/footers, render:: for date/truncate,
  standard format! padding replacing console::pad_str.
- who.rs: Theme for all expert/workload/active/overlap/review output
  modes, render:: for relative time and truncation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:32:35 -05:00
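
The embedding coverage thresholds listed for stats.rs in dd00a2b840 (green >=95%, yellow >=50%, red <50%) amount to a small classification. The enum and function names below are hypothetical; the real code presumably maps directly to Theme styles.

```rust
/// Color bucket for embedding coverage. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
enum CoverageColor {
    Green,
    Yellow,
    Red,
}

/// Classify a coverage percentage using the thresholds from the commit
/// message: green at 95% and above, yellow at 50% and above, red otherwise.
fn coverage_color(percent: f64) -> CoverageColor {
    if percent >= 95.0 {
        CoverageColor::Green
    } else if percent >= 50.0 {
        CoverageColor::Yellow
    } else {
        CoverageColor::Red
    }
}
```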
Taylor Eernisse
c6a5461d41 refactor(ingestion): compact log summaries and quieter shutdown messages
Migrate all ingestion completion logs to use nonzero_summary() for compact,
zero-suppressed output. Before: 8-14 individual key=value structured fields
per completion message. After: a single summary field like
'42 fetched · 3 labels · 12 notes' that only shows non-zero counters.

Also downgrade all 'Shutdown requested...' messages from info! to debug!.
These are emitted on every Ctrl+C and add noise to the partial results
output that immediately follows. They remain visible at -vv for debugging
graceful shutdown behavior.

Affected modules:
- issues.rs: issue ingestion completion
- merge_requests.rs: MR ingestion completion, full-sync cursor reset
- mr_discussions.rs: discussion ingestion completion
- orchestrator.rs: project-level issue and MR completion summaries,
  all shutdown-requested checkpoints across discussion sync, resource
  events drain, closes-issues drain, and MR diffs drain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:57 -05:00
Taylor Eernisse
a7f86b26e4 refactor(core): compact human log format, quieter lock lifecycle, nonzero_summary helper
Three quality-of-life improvements to reduce log noise and improve readability:

1. logging.rs: Add CompactHumanFormat for stderr tracing output. Replaces the
   default format with a minimal 'HH:MM:SS LEVEL  message key=value' layout —
   no span context, no full timestamps, no target module. The JSON file log
   layer is unaffected. This makes watching 'lore sync' output much cleaner.

2. lock.rs: Downgrade AppLock acquire/release messages from info! to debug!.
   Lock lifecycle events (acquired new, acquired existing, released) are
   operational bookkeeping that clutters normal output. They remain visible
   at -vv verbosity for troubleshooting.

3. ingestion/mod.rs: Add nonzero_summary() utility that formats named counters
   as a compact middle-dot-separated string, suppressing zero values. Produces
   output like '42 fetched · 3 labels · 12 notes' instead of verbose key=value
   structured fields. Returns 'nothing to update' when all values are zero.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:30 -05:00
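
A minimal sketch of the nonzero_summary() behavior described in a7f86b26e4. The function name and the two output forms come from the commit message; the `(label, count)` slice signature is an assumption.

```rust
/// Format named counters as a middle-dot-separated string, skipping zeros.
/// The parameter shape is assumed; only the output format is from the log.
fn nonzero_summary(counters: &[(&str, usize)]) -> String {
    let parts: Vec<String> = counters
        .iter()
        .filter(|(_, count)| *count > 0)
        .map(|(label, count)| format!("{count} {label}"))
        .collect();

    if parts.is_empty() {
        // Every counter was zero.
        "nothing to update".to_string()
    } else {
        // U+00B7 is the middle dot used as the separator.
        parts.join(" \u{b7} ")
    }
}
```

For example, `[("fetched", 42), ("labels", 3), ("updated", 0), ("notes", 12)]` drops the zero-valued `updated` entry and joins the rest with middle dots.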
Taylor Eernisse
5ee8b0841c feat(cli): add centralized render module with semantic Theme and LoreRenderer
Introduce src/cli/render.rs as the single source of truth for all terminal
output styling and formatting utilities. Key components:

- LoreRenderer: global singleton initialized once at startup, resolving
  color mode (Auto/Always/Never) against TTY state and NO_COLOR env var.
  This fixes lipgloss's limitation of hardcoded TrueColor rendering by
  gating all style application through a colors_on() check.

- Theme: semantic style constants (success/warning/error/info/accent,
  entity refs, state colors, structural styles) that return plain
  Style::new() when colors are disabled. Replaces ad-hoc console::style()
  calls scattered across 15+ command modules.

- Shared formatting utilities consolidated from duplicated implementations:
  format_relative_time (was in list.rs and who.rs), format_number (was in
  count.rs and sync_status.rs), truncate (was truncate_with_ellipsis in
  list.rs and truncate_summary in timeline.rs), format_labels, format_date,
  wrap_indent, section_divider.

- LoreTable: lightweight table renderer replacing comfy-table with simple
  column alignment (Left/Right/Center), adaptive terminal width, and
  NO_COLOR-safe output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:02 -05:00
Taylor Eernisse
7062a3f1fd deps: replace comfy-table with lipgloss (charmed-lipgloss)
Switch from comfy-table to the lipgloss Rust port for terminal styling.
lipgloss provides a composable Style API better suited to our new semantic
theming approach (Theme::success(), Theme::error(), etc.) where we apply
styles to individual text spans rather than constructing styled table cells.
The comfy-table dependency was only used by the list command's human output
and is no longer needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:30:31 -05:00
teernisse
159c490ad7 docs: update README with notes, drift, error tolerance, scoring config, and expanded command reference
Major additions:
- lore notes command: full documentation of rich note querying with
  filters (author, type, path, resolution, time range, body substring),
  sort/format options, field selection, and browser opening
- lore drift command: discussion divergence detection documentation
- Error Tolerance section: table of all 8 auto-correction types with
  examples and mode behavior, stderr JSON warning format, fuzzy
  suggestion format for unrecognized commands
- Command Aliases table: primary commands and their accepted aliases
- scoring config section: all weight/half-life/decay parameters for
  the who-expert scoring engine (authorWeight, reviewerWeight, noteBonus,
  half-life periods, closedMrMultiplier, excludedUsernames)

Updates to existing sections:
- Timeline: entity-direct seeding syntax (issue:N, i:N, mr:N, m:N),
  hybrid search pipeline description replacing pure FTS5, discussion
  thread collection, --fields flag, numbered progress spinners
- Search: --after/--updated-after renamed to --since/--updated-since,
  progress spinner behavior, note type filter
- Who: --explain-score, --as-of, --include-bots, --all-history, --detail
- Sync: --no-file-changes flag
- Robot-docs: --brief flag
- Field selection: expanded to note which commands support --fields
2026-02-13 17:27:59 -05:00
teernisse
e0041ed4d9 feat(cli): improve error recovery with alias-aware suggestions and error tolerance manifest
Two related improvements to agent ergonomics in main.rs:

1. suggest_similar_command now matches against aliases (issue->issues,
   mr->mrs, find->search, stat->stats, note->notes, etc.) and provides
   contextual usage examples via a new command_example() helper, so
   agents get actionable recovery hints like "Did you mean 'lore mrs'?
   Example: lore --robot mrs -n 10" instead of just the command name.

2. robot-docs now includes an error_tolerance section documenting every
   auto-correction the CLI performs: types (single_dash_long_flag,
   case_normalization, flag_prefix, fuzzy_flag, subcommand_alias,
   value_normalization, value_fuzzy, prefix_matching), examples, and
   mode behavior (threshold differences). Also expands the aliases
   section with command_aliases and pre_clap_aliases maps for complete
   agent self-discovery.

Together these ensure agents can programmatically discover and recover
from any CLI input error without human intervention.
2026-02-13 17:27:49 -05:00
teernisse
a34751bd47 feat(autocorrect): expand pre-clap correction to 3-phase pipeline with subcommand aliases, value normalization, and flag prefix matching
Three-phase pipeline replacing the single-pass correction:

- Phase A: Subcommand alias correction — handles forms clap can't
  express (merge_requests, mergerequests, robotdocs, generatedocs,
  gen-docs, etc.) via case-insensitive alias map lookup.
- Phase B: Per-arg flag corrections — adds unambiguous prefix expansion
  (--proj -> --project) alongside existing single-dash, case, and fuzzy
  rules. New FlagPrefix rule with 0.95 confidence.
- Phase C: Enum value normalization — auto-corrects casing, prefixes,
  and typos for flags with known valid values. Handles both --flag value
  and --flag=value forms. Respects POSIX -- option terminator.

Changes strict/robot mode from disabling fuzzy matching entirely to using
a higher threshold (0.9 vs 0.8), still catching obvious typos like
--projct while avoiding speculative corrections that mislead agents.

New CorrectionRule variants: SubcommandAlias, ValueNormalization,
ValueFuzzy, FlagPrefix. Each has a corresponding teaching note.
Comprehensive test coverage for all new correction types including
subcommand aliases, value normalization (case, prefix, fuzzy, eq-form),
flag prefix (ambiguous rejection, eq-value preservation), and updated
strict mode behavior.
2026-02-13 17:27:39 -05:00
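
The unambiguous prefix expansion of Phase B in a34751bd47 (--proj -> --project, with ambiguous prefixes rejected) could be sketched as below. The function `expand_flag_prefix` and its flag-list parameter are hypothetical; the real rule also assigns a confidence and handles --flag=value forms, which this sketch omits.

```rust
/// Expand `input` to the single known flag it is a prefix of.
/// Returns None when the prefix matches no flag or more than one.
fn expand_flag_prefix<'a>(input: &str, known_flags: &[&'a str]) -> Option<&'a str> {
    // Require the leading "--" plus at least one more character.
    if !input.starts_with("--") || input.len() <= 2 {
        return None;
    }
    // An exact match wins even if it is also a prefix of a longer flag.
    if let Some(exact) = known_flags.iter().copied().find(|flag| *flag == input) {
        return Some(exact);
    }
    let mut matches = known_flags
        .iter()
        .copied()
        .filter(|flag| flag.starts_with(input));
    // Accept only when exactly one flag matches the prefix.
    match (matches.next(), matches.next()) {
        (Some(only), None) => Some(only),
        _ => None,
    }
}
```

Rejecting ambiguous prefixes means a short input such as `--p` is left for the fuzzy rules (or an error) instead of being silently rewritten.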
teernisse
0aecbf33c0 feat(xref): extract cross-references from descriptions, user notes, and fix system note regex
- Fix MENTIONED_RE/CLOSED_BY_RE to match real GitLab format
  ('mentioned in issue #N' / 'mentioned in merge request !N')
- Add GITLAB_URL_RE + parse_url_refs() for full URL extraction
- Add extract_refs_from_descriptions() -> source_method='description_parse'
- Add extract_refs_from_user_notes() -> source_method='note_parse'
- Wire both into orchestrator after system note extraction
- 36 tests: regex fix, URL parsing, integration, idempotency
2026-02-13 17:19:36 -05:00
teernisse
c10471ddb9 feat(timeline): add entity-direct seeding (issue:N, mr:N syntax)
Adds issue:N / i:N / mr:N / m:N query syntax to bypass hybrid search
and seed the timeline directly from a known entity. All discussions for
the entity are gathered without needing Ollama.

- parse_timeline_query() detects entity-direct patterns
- resolve_entity_by_iid() resolves IID to EntityRef with ambiguity handling
- seed_timeline_direct() gathers all discussions for the entity
- 20 new tests (5 resolve, 6 direct seed, 9 parse)
- Updated CLI help text and robot-docs manifest
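
The entity-direct detection described above might look roughly like this. It is a sketch under assumptions: the `TimelineQuery` enum, the return type, and the case-insensitive prefix handling are illustrative and are not taken from the actual parse_timeline_query() implementation.

```rust
/// What the timeline should be seeded from (hypothetical type).
#[derive(Debug, PartialEq)]
enum TimelineQuery {
    /// Seed directly from a known issue IID.
    Issue(u64),
    /// Seed directly from a known merge request IID.
    MergeRequest(u64),
    /// Anything else is handed to search.
    Search(String),
}

/// Detect `issue:N` / `i:N` / `mr:N` / `m:N`; fall back to a search query.
fn parse_timeline_query(input: &str) -> TimelineQuery {
    let trimmed = input.trim();
    if let Some((prefix, rest)) = trimmed.split_once(':') {
        // Only treat the input as entity-direct when the suffix is a number.
        if let Ok(iid) = rest.parse::<u64>() {
            match prefix.to_ascii_lowercase().as_str() {
                "issue" | "i" => return TimelineQuery::Issue(iid),
                "mr" | "m" => return TimelineQuery::MergeRequest(iid),
                _ => {}
            }
        }
    }
    TimelineQuery::Search(trimmed.to_string())
}
```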
2026-02-13 15:22:45 -05:00
teernisse
cbce4c9f59 release: v0.8.2 2026-02-13 15:01:28 -05:00
teernisse
94435c37f0 perf(timeline): hoist prepared statement outside discussion thread loop
Moves the conn.prepare() call for fetching discussion notes outside the
per-discussion loop in collect_discussion_threads(). The SQL is identical
for every iteration, so preparing it once and rebinding parameters avoids
redundant statement compilation on each matched discussion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:40 -05:00
teernisse
59f65b127a fix(search): pass FTS5 boolean operators through unquoted
FTS5 boolean operators (AND, OR, NOT, NEAR) are case-sensitive uppercase
keywords that must appear unquoted in the query string. Previously, the
user-friendly query builder would double-quote every token, causing
queries like "switch AND health" to search for the literal word "AND"
instead of using it as a boolean conjunction.

Adds a FTS5_OPERATORS constant and checks each token against it before
quoting, allowing natural boolean search syntax to work as expected.
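
A minimal sketch of the token check, assuming a whitespace-split query builder. Only the FTS5_OPERATORS constant is named in this commit; the function name `build_fts_query` and the quote-escaping step are assumptions for illustration.

```rust
/// FTS5 keywords that must stay unquoted. They are only recognized in uppercase.
const FTS5_OPERATORS: [&str; 4] = ["AND", "OR", "NOT", "NEAR"];

/// Quote every token except boolean operators so that ordinary user input
/// is treated as literal search terms. (Hypothetical helper.)
fn build_fts_query(raw: &str) -> String {
    raw.split_whitespace()
        .map(|token| {
            if FTS5_OPERATORS.contains(&token) {
                // Pass the operator through untouched.
                token.to_string()
            } else {
                // Double any embedded quotes, then wrap the token in quotes.
                format!("\"{}\"", token.replace('"', "\"\""))
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

Because the comparison is case-sensitive, a lowercase `and` is still quoted and searched as a word, which matches FTS5's own treatment of operators.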

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:29 -05:00
teernisse
f36e900570 feat(cli): add pipeline progress spinners to timeline and search
Adds numbered stage spinners ([1/3], [2/3], [3/3]) to the timeline
pipeline stages (seed, expand, collect) so users see activity during
longer queries. TimelineParams gains a robot_mode field to suppress
spinners in JSON output mode.

Adds a [1/1] spinner to the search command for consistency, using the
shared stage_spinner from cli/progress.

Also refactors wrap_snippet() to delegate to wrap_text() with a 4-line
cap, eliminating the duplicated word-wrapping logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:19 -05:00
teernisse
e2efc61beb refactor(cli): extract stage_spinner to shared progress module
Moves stage_spinner() from a private function in sync.rs to a pub function
in cli/progress.rs so it can be reused by the timeline and search commands.
The function creates a numbered spinner (e.g. [1/3]) for pipeline stages,
returning a hidden no-op bar in robot mode to keep caller code path-uniform.

sync.rs now imports from crate::cli::progress::stage_spinner instead of
defining its own copy. Adds unit tests for robot mode (hidden bar), human
mode (prefix/message properties), and prefix formatting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:56:10 -05:00
teernisse
2da1a228b3 feat(timeline): collect and render full discussion threads
Implements the downstream consumption of matched discussions from the seed
phase, completing the discussion thread feature across collect, CLI, and
integration tests.

Collect phase (timeline_collect.rs):
- New collect_discussion_threads() function assembles full threads by
  querying notes for each matched discussion_id, filtering out system notes
  (is_system = 0), ordering chronologically, and capping at THREAD_MAX_NOTES
  with a synthetic "[N more notes not shown]" summary note
- build_entity_lookup() creates a (type, id) -> (iid, path) map from seed
  and expanded entities to provide display metadata for thread events
- Thread timestamp is set to the first note's created_at for correct
  chronological interleaving with other timeline events
- collect_events() gains a matched_discussions parameter; threads are
  collected after entity events and before evidence note merging
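
The note cap with a synthetic summary can be illustrated as follows. This is a sketch only: notes are reduced to plain strings, the function name `cap_thread` is hypothetical, and whether the real code counts the summary note toward the cap is not stated in this commit.

```rust
/// Cap from the commit message; the value 50 is taken from the types commit.
const THREAD_MAX_NOTES: usize = 50;

/// Keep at most `max` notes and append one summary line for the rest.
/// (Hypothetical helper; here the summary is added on top of the cap.)
fn cap_thread(mut notes: Vec<String>, max: usize) -> Vec<String> {
    if notes.len() > max {
        let hidden = notes.len() - max;
        notes.truncate(max);
        notes.push(format!("[{hidden} more notes not shown]"));
    }
    notes
}
```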

CLI rendering (cli/commands/timeline.rs):
- Human mode: threads render with box-drawing borders, bold @author tags,
  date-stamped notes, and word-wrapped bodies (60 char width)
- Robot mode: DiscussionThread serializes as discussion_thread kind with
  note_count, full notes array (note_id, author, body, ISO created_at)
- THREAD tag in yellow for human event tag styling
- TimelineMeta gains discussion_threads_included count

Tests:
- 8 new collect tests: basic thread assembly, system note filtering, empty
  thread skipping, body truncation to THREAD_NOTE_MAX_CHARS, note cap with
  synthetic summary, timestamp from first note, chronological sort position,
  and deduplication of duplicate discussion_ids
- Integration tests updated for new collect_events signature

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:36 -05:00
teernisse
0e65202778 feat(timeline): add DiscussionThread types and seed-phase discussion matching
Introduces the foundation for full discussion thread support in the
timeline pipeline. Adds three new domain types to timeline.rs:

- ThreadNote: individual note within a thread (id, author, body, timestamp)
- MatchedDiscussion: tracks discussions matched during seeding with their
  parent entity (issue or MR) for downstream collection
- DiscussionThread variant on TimelineEventType: carries a full thread of
  notes, sorted between NoteEvidence and CrossReferenced

Moves truncate_to_chars() from timeline_seed.rs to timeline.rs as pub(crate)
for reuse by the collect phase. Adds THREAD_NOTE_MAX_CHARS (2000) and
THREAD_MAX_NOTES (50) constants.

Upgrades the seed SQL in resolve_documents_to_entities() to resolve note
documents to their parent discussion via an additional LEFT JOIN chain
(notes -> discussions), using COALESCE to unify the entity resolution path
for both discussion and note source types. SeedResult gains a
matched_discussions field that captures deduplicated discussion matches.

Tests cover: discussion matching from discussion docs, note-to-parent
resolution, deduplication of same discussion across multiple docs, and
correct parent entity type (issue vs MR).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:18 -05:00
teernisse
f439c42b3d chore: add gitignore for mock-seed, roam CI workflow, formatting
- Add tools/mock-seed/ to .gitignore
- Add .github/workflows/roam.yml CI workflow
- Add .roam/fitness.yaml architectural fitness rules
- Rustfmt formatting fixes in show.rs and vector.rs
- Beads sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:30 -05:00
teernisse
4f3ec72923 feat(timeline): upgrade seed phase to hybrid search
Replace FTS-only seed entity discovery with hybrid search (FTS + vector
via RRF), using the same search_hybrid infrastructure as the search
command. Falls back gracefully to FTS-only when Ollama is unavailable.
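
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the fusion step can be sketched as below. This is a generic illustration of the technique, not the project's search_hybrid code: the function name `rrf_fuse`, the use of `u64` document IDs, and the constant k = 60 (a commonly used default) are assumptions.

```rust
use std::collections::HashMap;

/// Fuse ranked lists of document IDs with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k + rank) per document, with rank starting at 1.
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + idx as f64 + 1.0);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; break ties by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Passing a single ranking (for example, FTS results only) leaves the order unchanged, which is one simple way to model an FTS-only fallback.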

Changes:
- seed_timeline() now accepts OllamaClient, delegates to search_hybrid
- New resolve_documents_to_entities() replaces find_seed_entities()
- SeedResult gains search_mode field tracking actual mode used
- TimelineResult carries search_mode through to JSON renderer
- run_timeline wires up OllamaClient from config
- handle_timeline made async for the hybrid search await
- Tests updated for new function signatures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:24 -05:00
teernisse
e6771709f1 refactor(core): extract path_resolver module, fix old_path matching in who
Extract shared path resolution logic from who.rs into a new
core::path_resolver module for cross-module reuse. Functions moved:
escape_like, normalize_repo_path, PathQuery, SuffixResult,
build_path_query, suffix_probe. Duplicate escape_like copies removed
from list.rs, project.rs, and filters.rs — all now import from
path_resolver.

Additionally fixes two bugs in query_expert_details() and
query_overlap() where only position_new_path was checked (missing
old_path matches for renamed files) and state filter excluded 'closed'
MRs despite the main scoring query including them with a decay
multiplier.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 13:50:14 -05:00
Taylor Eernisse
8c86b0dfd7 release: v0.8.1 2026-02-13 11:12:31 -05:00
teernisse
6e55b2470d bugfix: DB column and size issues 2026-02-13 11:11:35 -05:00
Taylor Eernisse
b05922d60b release: v0.8.0 2026-02-13 10:59:05 -05:00
Taylor Eernisse
11fe02fac9 docs: add proposed code file reorganization plan
Planning document for the ongoing test extraction and code organization
effort. Covers module-by-module analysis, proposed file splits, and
phased execution plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:56 -05:00
Taylor Eernisse
48fbd4bfdb feat(core): add file rename chain resolver with depth-bounded BFS
New module: core::file_history with resolve_rename_chain() that traces
a file path through its rename history in mr_file_changes using
bidirectional BFS (forward: old_path->new_path, backward: new_path->old_path).

Key design decisions:
- Depth-bounded BFS: each queue entry carries its distance from the
  origin, so max_hops correctly limits by graph distance (not by total
  nodes discovered). This matters for branching rename graphs where a
  file was renamed differently in parallel MRs.
- Cycle-safe: visited set prevents infinite loops from circular renames.
- Project-scoped: queries are always scoped to a single project_id.
- Deterministic: output is sorted for stable results.

Tests cover: linear chains (forward/backward), cycles, max_hops=0,
depth-bounded linear chains, branching renames, diamond patterns,
and cross-project isolation (9 tests total).
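
The depth-bounded, cycle-safe traversal described above can be sketched over an in-memory edge list. This is an illustration under assumptions: the real resolve_rename_chain() queries mr_file_changes and is project-scoped, whereas the hypothetical `rename_chain_in_memory` below takes (old_path, new_path) pairs directly.

```rust
use std::collections::{HashSet, VecDeque};

/// Collect every path reachable from `start` through rename edges, in either
/// direction, within `max_hops` renames. `renames` holds (old_path, new_path).
fn rename_chain_in_memory(start: &str, renames: &[(&str, &str)], max_hops: usize) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::from([start.to_string()]);
    // Each queue entry carries its distance from the origin.
    let mut queue: VecDeque<(String, usize)> = VecDeque::from([(start.to_string(), 0)]);

    while let Some((path, depth)) = queue.pop_front() {
        if depth >= max_hops {
            continue; // Do not expand beyond the hop limit.
        }
        for (old, new) in renames {
            // Follow the edge forward (old -> new) or backward (new -> old).
            let neighbor = if *old == path {
                *new
            } else if *new == path {
                *old
            } else {
                continue;
            };
            // `insert` returns false for already-seen paths, which breaks cycles.
            if visited.insert(neighbor.to_string()) {
                queue.push_back((neighbor.to_string(), depth + 1));
            }
        }
    }

    // Sort for deterministic output.
    let mut paths: Vec<String> = visited.into_iter().collect();
    paths.sort();
    paths
}
```

Tracking depth per queue entry is what makes `max_hops` a limit on graph distance rather than on the number of nodes discovered.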

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:41 -05:00
Taylor Eernisse
9786ef27f5 refactor(core/time): extract parse_since_from for deterministic time parsing
Factor out parse_since_from(input, reference_ms) so callers can compute
relative durations against a fixed reference timestamp instead of always
using now(). The existing parse_since() now delegates to it with now_ms().

Enables testable and reproducible time-relative queries for features like
timeline --as-of and who --as-of.
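
The idea of resolving a relative duration against a fixed reference can be sketched like this. The body is an assumption throughout: the accepted units (`d`, `w`, `m`), the 30-day month approximation, and the `Option<i64>` return type are illustrative and may not match the real parse_since_from().

```rust
/// Resolve a relative duration such as "7d", "2w", or "6m" against a fixed
/// reference timestamp, returning the cutoff in milliseconds since the epoch.
fn parse_since_from(input: &str, reference_ms: i64) -> Option<i64> {
    const DAY_MS: i64 = 24 * 60 * 60 * 1000;
    let input = input.trim();
    let unit = input.chars().last()?;
    let digits = &input[..input.len() - unit.len_utf8()];
    let count = i64::from(digits.parse::<u32>().ok()?);
    let days = match unit {
        'd' => count,
        'w' => count * 7,
        'm' => count * 30, // Assumption: a month is approximated as 30 days.
        _ => return None,
    };
    // Checked arithmetic avoids overflow on absurdly large inputs.
    let offset_ms = days.checked_mul(DAY_MS)?;
    reference_ms.checked_sub(offset_ms)
}
```

Because the reference timestamp is a parameter, a test can pin it to a constant and assert exact cutoffs instead of depending on the wall clock.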

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:20 -05:00
Taylor Eernisse
7e0e6a91f2 refactor: extract unit tests into separate _tests.rs files
Move inline #[cfg(test)] mod tests { ... } blocks from 22 source files
into dedicated _tests.rs companion files, wired via:

    #[cfg(test)]
    #[path = "module_tests.rs"]
    mod tests;

This keeps implementation-focused source files leaner and more scannable
while preserving full access to private items through `use super::*;`.

Modules extracted:
  core:      db, note_parser, payloads, project, references, sync_run,
             timeline_collect, timeline_expand, timeline_seed
  cli:       list (55 tests), who (75 tests)
  documents: extractor (43 tests), regenerator
  embedding: change_detector, chunking
  gitlab:    graphql (wiremock async tests), transformers/issue
  ingestion: dirty_tracker, discussions, issues, mr_diffs

Also adds conflicts_with("explain_score") to the --detail flag in the
who command to prevent mutually exclusive flags from being combined.

All 629 unit tests pass. No behavior changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:02 -05:00
Taylor Eernisse
5c2df3df3b chore(beads): sync issue tracker
Export latest bead state to JSONL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:53:33 -05:00
teernisse
94c8613420 feat(bd-226s): implement time-decay expert scoring model
Replace flat-weight expertise scoring with exponential half-life decay,
split reviewer signals (participated vs assigned-only), dual-path rename
awareness, and new CLI flags (--as-of, --explain-score, --include-bots,
--all-history).

Changes:
- ScoringConfig: 8 new fields with validation (config.rs)
- half_life_decay() and normalize_query_path() pure functions (who.rs)
- CTE-based SQL with dual-path matching, mr_activity, reviewer_participation (who.rs)
- Rust-side decay aggregation with deterministic f64 ordering (who.rs)
- Path resolution probes check old_path columns (who.rs)
- Migration 026: 5 new indexes for dual-path and reviewer participation
- Default --since changed from 6m to 24m
- 31 new tests (example-based + invariant), 621 total who tests passing
- Autocorrect registry updated with new flags
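
Exponential half-life decay itself is a one-line formula: a signal's weight halves every `half_life` units of age. The sketch below shows one possible shape; the signature, the use of days as the unit, and the handling of non-positive half-lives and negative ages are assumptions, not the contents of who.rs.

```rust
/// Weight of a signal that is `age_days` old when its value halves
/// every `half_life_days`. Returns 1.0 for a brand-new signal.
fn half_life_decay(age_days: f64, half_life_days: f64) -> f64 {
    if half_life_days <= 0.0 {
        return 0.0; // Assumption: a non-positive half-life contributes nothing.
    }
    // Clamp negative ages (timestamps after the reference point) to zero.
    0.5_f64.powf(age_days.max(0.0) / half_life_days)
}
```

With a 180-day half-life, a signal from 180 days ago counts half as much as one from today, and one from 360 days ago counts a quarter.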

Closes: bd-226s, bd-2w1p, bd-1soz, bd-18dn, bd-2ao4, bd-2yu5, bd-1b50,
bd-1hoq, bd-1h3f, bd-13q8, bd-11mg, bd-1vti, bd-1j5o
2026-02-12 15:44:55 -05:00
teernisse
ad4dd6e855 release: v0.7.0 2026-02-12 13:31:57 -05:00
teernisse
83cd16c918 feat: implement per-note search and document pipeline
- Add SourceType::Note with extract_note_document() and ParentMetadataCache
- Migration 022: composite indexes for notes queries + author_id column
- Migration 024: table rebuild adding 'note' to CHECK constraints, defense triggers
- Migration 025: backfill existing non-system notes into dirty queue
- Add lore notes CLI command with 17 filter options (author, path, resolution, etc.)
- Support table/json/jsonl/csv output formats with field selection
- Wire note dirty tracking through discussion and MR discussion ingestion
- Fix test_migration_024_preserves_existing_data off-by-one (tested wrong migration)
- Fix upsert_document_inner returning false for label/path-only changes
2026-02-12 13:31:24 -05:00
teernisse
fda9cd8835 chore(beads): revise 18 NOTE beads with verified codebase context
Enriched all per-note search beads (NOTE-0A through NOTE-2I) with:
- Corrected migration numbers (022, 024, 025)
- Verified file paths and line numbers from codebase
- Complete function signatures for referenced code
- Detailed approach sections with SQL and Rust patterns
- DocumentData struct field mappings
- TDD anchors with specific test names
- Edge cases from codebase analysis
- Dependency context explaining what each blocker provides
2026-02-12 12:26:48 -05:00
teernisse
c8d609ab78 chore: add drift to autocorrect command registry 2026-02-12 12:10:02 -05:00
teernisse
35c828ba73 feat(bd-91j1): enhance robot-docs with quick_start and example_output
Add quick_start section with glab equivalents, lore-exclusive features,
and read/write split guidance. Add example_output to issues, mrs, search,
and who commands. Update strip_schemas to also strip example_output in
brief mode. Update beads tracking state.

Closes: bd-91j1
2026-02-12 12:09:44 -05:00
128 changed files with 30911 additions and 11785 deletions

File diff suppressed because one or more lines are too long



@@ -1 +1 @@
bd-xsgw
bd-1sc6

99
.claude/plan.md Normal file

@@ -0,0 +1,99 @@
# Plan: Add Colors to Sync Command Output
## Current State
The sync output has three layers, each needing color treatment:
### Layer 1: Stage Lines (during sync)
```
✓ Issues 10 issues from 2 projects 4.2s
✓ Status 3 statuses updated · 5 seen 4.2s
vs/typescript-code 2 issues · 1 statuses updated
✓ MRs 5 merge requests from 2 projects 12.3s
vs/python-code 3 MRs · 10 discussions
✓ Docs 1,200 documents generated 8.1s
✓ Embed 3,400 chunks embedded 45.2s
```
**What's uncolored:** icons, labels, numbers, elapsed times, sub-row project paths, failure counts in parentheses.
### Layer 2: Summary (after sync)
```
Synced 10 issues and 5 MRs in 42.3s
120 discussions · 45 events · 12 diffs · 3 statuses updated
1,200 docs regenerated · 3,400 embedded
```
**What's already colored:** headline ("Synced" = green bold, "Sync completed with issues" = warning bold), issue/MR counts (bold), error line (red). Detail lines are all dim.
### Layer 3: Timing breakdown (`-t` flag)
```
── Timing ──────────────────────
issues .............. 4.2s
merge_requests ...... 12.3s
```
**What's already colored:** dots (dim), time (bold), errors (red), rate limits (warning).
---
## Color Plan
Using only existing `Theme` methods — no new colors needed.
### Stage Lines (`format_stage_line` + callers in sync.rs)
| Element | Current | Proposed | Theme method |
|---------|---------|----------|-------------|
| Icon (✓/⚠) | plain | green for success, yellow for warning | `Theme::success()` / `Theme::warning()` |
| Label ("Issues", "MRs", etc.) | plain | bold | `Theme::bold()` |
| Numbers in summary text | plain | bold | `Theme::bold()` (just the count) |
| Elapsed time | plain | muted gray | `Theme::timing()` |
| Failure text in parens | plain | warning/error color | `Theme::warning()` |
### Sub-rows (project breakdown lines)
| Element | Current | Proposed |
|---------|---------|----------|
| Project path | dim | `Theme::muted()` (slightly brighter than dim) |
| Counts (numbers only) | dim | `Theme::dim()` but numbers in normal weight |
| Error/failure counts | dim | `Theme::warning()` |
| Middle dots | dim | keep dim (they're separators, should recede) |
### Summary (`print_sync`)
| Element | Current | Proposed |
|---------|---------|----------|
| Issue/MR counts in headline | bold only | `Theme::info()` + bold (cyan numbers pop) |
| Time in headline | plain | `Theme::timing()` |
| Detail line numbers | all dim | numbers in `Theme::info()`, rest stays dim |
| Doc line numbers | all dim | numbers in `Theme::info()`, rest stays dim |
| "Already up to date" time | plain | `Theme::timing()` |
---
## Files to Change
1. **`src/cli/progress.rs`** — `format_stage_line()`: apply color to icon, bold to label, `Theme::timing()` to elapsed
2. **`src/cli/commands/sync.rs`** —
- Pass colored icons to `format_stage_line` / `emit_stage_line` / `emit_stage_block`
- Color failure text in `append_failures()`
- Color numbers and time in `print_sync()`
- Color error/failure counts in sub-row functions (`issue_sub_rows`, `mr_sub_rows`, `status_sub_rows`)
## Approach
- `format_stage_line` already receives the icon string — color it before passing
- Add a `color_icon` helper that applies success/warning color to the icon glyph
- Bold the label in `format_stage_line`
- Apply `Theme::timing()` to elapsed in `format_stage_line`
- In `append_failures`, wrap failure text in `Theme::warning()`
- In `print_sync`, wrap count numbers with `Theme::info().bold()`
- In sub-row functions, apply `Theme::warning()` to error/failure parts only (keep rest dim)
## Non-goals
- No changes to robot mode (JSON output)
- No changes to dry-run output (already reasonably colored)
- No new Theme colors — use existing palette
- No changes to timing breakdown (already colored)

21
.github/workflows/roam.yml vendored Normal file

@@ -0,0 +1,21 @@
name: Roam Code Analysis
on:
  pull_request:
    branches: [main, master]
permissions:
  contents: read
  pull-requests: write
jobs:
  roam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install roam-code
      - run: roam index
      - run: roam fitness
      - run: roam pr-risk --json

3
.gitignore vendored

@@ -41,6 +41,9 @@ lore.config.json
*.db-shm
# Mock seed data
tools/mock-seed/
# Added by cargo
/target

11
.roam/fitness.yaml Normal file

@@ -0,0 +1,11 @@
rules:
  - name: No circular imports in core
    type: dependency
    source: "src/**"
    forbidden_target: "tests/**"
    reason: "Production code should not import test modules"
  - name: Complexity threshold
    type: metric
    metric: cognitive_complexity
    threshold: 30
    reason: "Functions above 30 cognitive complexity need refactoring"


@@ -16,43 +16,10 @@ If I tell you to do something, even if it goes against what follows below, YOU M
## Version Control: jj-First (CRITICAL)
**ALWAYS prefer jj (Jujutsu) over git for VCS mutations** (commit, describe, rebase, push, bookmark, undo). This is a colocated repo with both `.jj/` and `.git/`. Only fall back to raw `git` for things jj cannot do (hooks, LFS, submodules, `gh` CLI interop).
**Exception — read-only inspection:** Use `git status`, `git diff`, `git log` instead of their jj equivalents. In a colocated repo these see accurate data, and unlike jj, they don't create operations that cause divergences when multiple agents run concurrently. See "Parallel Agent VCS Protocol" below.
**ALWAYS prefer jj (Jujutsu) over git for all VCS operations.** This is a colocated repo with both `.jj/` and `.git/`. When instructed to use git by anything — even later in this file — use the best jj replacement commands instead. Only fall back to raw `git` for things jj cannot do (hooks, LFS, submodules, `gh` CLI interop).
See `~/.claude/rules/jj-vcs/` for the full command reference, translation table, revsets, patterns, and recovery recipes.
### Parallel Agent VCS Protocol (CRITICAL)
Multiple agents often run concurrently in separate terminal panes, sharing the same repo directory. This requires care because jj's auto-snapshot creates operations on EVERY command — even read-only ones like `jj status`. Concurrent jj commands fork from the same parent operation and create **divergent changes**.
**The rule: use git for reads, jj for writes.**
In a colocated repo, git reads see accurate data because jj keeps `.git/` in sync.
| Operation | Use | Why |
|-----------|-----|-----|
| Check status | `git status` | No jj operation created |
| View diff | `git diff` | No jj operation created |
| Browse history | `git log` | No jj operation created |
| Commit work | `jj commit -m "msg"` | jj mutation (better UX) |
| Update description | `jj describe -m "msg"` | jj mutation |
| Rebase | `jj rebase -d trunk()` | jj mutation |
| Push | `jj git push -b <name>` | jj mutation |
| Manage bookmarks | `jj bookmark set ...` | jj mutation |
| Undo a mistake | `jj undo` | jj mutation |
**NEVER run `jj status`, `jj diff`, `jj log`, or `jj show` when other agents may be active** — these trigger snapshots that cause divergences.
**If using Claude Code's built-in agent teams:** Only the team lead runs ANY VCS commands (git or jj). Workers only edit files via Edit/Write tools and do NOT run "Landing the Plane".
**Resolving divergences if they occur:**
```bash
jj log -r 'divergent()' # Find divergent changes
jj abandon <unwanted-commit-id> # Keep the version you want
```
---
## Irreversible Git & Filesystem Actions — DO NOT EVER BREAK GLASS

173
Cargo.lock generated

@@ -169,6 +169,23 @@ version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
[[package]]
name = "charmed-lipgloss"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "45e10db01f5eaea11d98ca5c5cffd8cc4add7ac56d0128d91ba1f2a3757b6c5a"
dependencies = [
"bitflags",
"colored",
"crossterm",
"serde",
"serde_json",
"thiserror",
"toml",
"tracing",
"unicode-width 0.1.14",
]
[[package]]
name = "chrono"
version = "0.4.43"
@@ -239,14 +256,13 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b05b61dc5112cbb17e4b6cd61790d9845d13888356391624cbe7e41efeac1e75"
[[package]]
name = "comfy-table"
version = "7.2.2"
name = "colored"
version = "2.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "958c5d6ecf1f214b4c2bbbbf6ab9523a864bd136dcf71a7e8904799acfe1ad47"
checksum = "117725a109d387c937a1533ce01b450cbde6b88abceea8473c4d7a85853cda3c"
dependencies = [
"crossterm",
"unicode-segmentation",
"unicode-width",
"lazy_static",
"windows-sys 0.52.0",
]
[[package]]
@@ -258,10 +274,19 @@ dependencies = [
"encode_unicode",
"libc",
"once_cell",
"unicode-width",
"unicode-width 0.2.2",
"windows-sys 0.61.2",
]
[[package]]
name = "convert_case"
version = "0.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "633458d4ef8c78b72454de2d54fd6ab2e60f9e02be22f3c6104cdc8a4e0fceb9"
dependencies = [
"unicode-segmentation",
]
[[package]]
name = "core-foundation"
version = "0.9.4"
@@ -319,9 +344,13 @@ checksum = "d8b9f2e4c67f833b660cdb0a3523065869fb35570177239812ed4c905aeff87b"
dependencies = [
"bitflags",
"crossterm_winapi",
"derive_more",
"document-features",
"mio",
"parking_lot",
"rustix",
"signal-hook",
"signal-hook-mio",
"winapi",
]
@@ -371,6 +400,28 @@ dependencies = [
"powerfmt",
]
[[package]]
name = "derive_more"
version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d751e9e49156b02b44f9c1815bcb94b984cdcc4396ecc32521c739452808b134"
dependencies = [
"derive_more-impl",
]
[[package]]
name = "derive_more-impl"
version = "2.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "799a97264921d8623a957f6c3b9011f3b5492f557bbb7a5a19b7fa6d06ba8dcb"
dependencies = [
"convert_case",
"proc-macro2",
"quote",
"rustc_version",
"syn",
]
[[package]]
name = "dialoguer"
version = "0.12.0"
@@ -976,7 +1027,7 @@ checksum = "9375e112e4b463ec1b1c6c011953545c65a30164fbab5b581df32b3abf0dcb88"
dependencies = [
"console",
"portable-atomic",
"unicode-width",
"unicode-width 0.2.2",
"unit-prefix",
"web-time",
]
@@ -1106,13 +1157,13 @@ checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
[[package]]
name = "lore"
version = "0.6.2"
version = "0.8.3"
dependencies = [
"async-stream",
"charmed-lipgloss",
"chrono",
"clap",
"clap_complete",
"comfy-table",
"console",
"dialoguer",
"dirs",
@@ -1181,6 +1232,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a69bcab0ad47271a0234d9422b131806bf3968021e5dc9328caf2d4cd58557fc"
dependencies = [
"libc",
"log",
"wasi",
"windows-sys 0.61.2",
]
@@ -1574,6 +1626,15 @@ dependencies = [
"sqlite-wasm-rs",
]
[[package]]
name = "rustc_version"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cfcb3a22ef46e85b45de6ee7e79d063319ebb6594faafcf1c225ea92ab6e9b92"
dependencies = [
"semver",
]
[[package]]
name = "rustix"
version = "1.1.3"
@@ -1670,6 +1731,12 @@ dependencies = [
"libc",
]
[[package]]
name = "semver"
version = "1.0.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d767eb0aabc880b29956c35734170f26ed551a859dbd361d140cdbeca61ab1e2"
[[package]]
name = "serde"
version = "1.0.228"
@@ -1713,6 +1780,15 @@ dependencies = [
"zmij",
]
[[package]]
name = "serde_spanned"
version = "0.6.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf41e0cfaf7226dca15e8197172c295a782857fcb97fad1808a166870dee75a3"
dependencies = [
"serde",
]
[[package]]
name = "serde_urlencoded"
version = "0.7.1"
@@ -1757,6 +1833,27 @@ version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
[[package]]
name = "signal-hook"
version = "0.3.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d881a16cf4426aa584979d30bd82cb33429027e42122b169753d6ef1085ed6e2"
dependencies = [
"libc",
"signal-hook-registry",
]
[[package]]
name = "signal-hook-mio"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b75a19a7a740b25bc7944bdee6172368f988763b744e3d4dfe753f6b4ece40cc"
dependencies = [
"libc",
"mio",
"signal-hook",
]
[[package]]
name = "signal-hook-registry"
version = "1.4.5"
@@ -2028,6 +2125,47 @@ dependencies = [
"tokio",
]
[[package]]
name = "toml"
version = "0.8.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dc1beb996b9d83529a9e75c17a1686767d148d70663143c7854d8b4a09ced362"
dependencies = [
"serde",
"serde_spanned",
"toml_datetime",
"toml_edit",
]
[[package]]
name = "toml_datetime"
version = "0.6.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "22cddaf88f4fbc13c51aebbf5f8eceb5c7c5a9da2ac40a13519eb5b0a0e8f11c"
dependencies = [
"serde",
]
[[package]]
name = "toml_edit"
version = "0.22.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "41fe8c660ae4257887cf66394862d21dbca4a6ddd26f04a3560410406a2f819a"
dependencies = [
"indexmap",
"serde",
"serde_spanned",
"toml_datetime",
"toml_write",
"winnow",
]
[[package]]
name = "toml_write"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5d99f8c9a7727884afe522e9bd5edbfc91a3312b36a77b5fb8926e4c31a41801"
[[package]]
name = "tower"
version = "0.5.3"
@@ -2183,6 +2321,12 @@ version = "1.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f6ccf251212114b54433ec949fd6a7841275f9ada20dddd2f29e9ceea4501493"
[[package]]
name = "unicode-width"
version = "0.1.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7dd6e30e90baa6f72411720665d41d89b9a3d039dc45b8faea1ddd07f617f6af"
[[package]]
name = "unicode-width"
version = "0.2.2"
@@ -2611,6 +2755,15 @@ version = "0.53.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d6bbff5f0aada427a1e5a6da5f1f98158182f26556f345ac9e04d36d0ebed650"
[[package]]
name = "winnow"
version = "0.7.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5a5364e9d77fcdeeaa6062ced926ee3381faa2ee02d3eb83a5c27a8825540829"
dependencies = [
"memchr",
]
[[package]]
name = "wiremock"
version = "0.6.5"


@@ -1,6 +1,6 @@
[package]
name = "lore"
version = "0.6.2"
version = "0.8.3"
edition = "2024"
description = "Gitlore - Local GitLab data management with semantic search"
authors = ["Taylor Eernisse"]
@@ -25,7 +25,7 @@ clap_complete = "4"
dialoguer = "0.12"
console = "0.16"
indicatif = "0.18"
comfy-table = "7"
lipgloss = { package = "charmed-lipgloss", version = "0.1", default-features = false, features = ["native"] }
open = "5"
# HTTP


@@ -0,0 +1,425 @@
# Proposed Code File Reorganization Plan
## Executive Summary
The codebase is 79 Rust source files / 46K lines across 7 top-level modules. Most modules (`gitlab/`, `embedding/`, `search/`, `documents/`, `ingestion/`) are well-organized. The pain points are:
1. **`core/` is a grab-bag** — 22 files mixing infrastructure, domain logic, DB operations, and an entire timeline pipeline
2. **`main.rs` is 2713 lines** — ~30 handler functions that bridge CLI args to commands
3. **`cli/mod.rs` is 949 lines** — every clap argument struct is packed into one file
4. **Giant command files**: `who.rs` (6067 lines) and `list.rs` (2931 lines) are unwieldy
This plan is organized into **three tiers** based on impact-to-risk ratio. Tier 1 changes are "no-brainers" — they reduce confusion with minimal import churn. Tier 2 changes are valuable but involve more cross-cutting import updates. Tier 3 changes are "maybe later" — they'd be nice but the juice might not be worth the squeeze right now.
---
## Current Structure (Annotated)
```
src/
├── main.rs (2713 lines) ← dispatch + ~30 handler functions + error helpers
├── lib.rs (9 lines)
├── cli/
│ ├── mod.rs (949 lines) ← ALL clap arg structs crammed here
│ ├── autocorrect.rs (945 lines)
│ ├── progress.rs (92 lines)
│ ├── robot.rs (111 lines)
│ └── commands/
│ ├── mod.rs (50 lines) — re-exports
│ ├── auth_test.rs
│ ├── count.rs (406 lines)
│ ├── doctor.rs (576 lines)
│ ├── drift.rs (642 lines)
│ ├── embed.rs
│ ├── generate_docs.rs (320 lines)
│ ├── ingest.rs (1064 lines)
│ ├── init.rs (174 lines)
│ ├── list.rs (2931 lines) ← handles issues, MRs, AND notes listing
│ ├── search.rs (418 lines)
│ ├── show.rs (1377 lines)
│ ├── stats.rs (505 lines)
│ ├── sync_status.rs (454 lines)
│ ├── sync.rs (576 lines)
│ ├── timeline.rs (488 lines)
│ └── who.rs (6067 lines) ← 5 sub-modes: expert, workload, active, overlap, reviews
├── core/
│ ├── mod.rs (25 lines)
│ ├── backoff.rs ← retry logic (used by ingestion)
│ ├── config.rs (789 lines) ← configuration types
│ ├── db.rs (970 lines) ← connection + 22 migrations
│ ├── dependent_queue.rs (330 lines) ← job queue (used by ingestion orchestrator)
│ ├── error.rs (295 lines) ← error enum + exit codes
│ ├── events_db.rs (199 lines) ← resource event upserts (used by ingestion)
│ ├── lock.rs (228 lines) ← filesystem sync lock
│ ├── logging.rs (179 lines) ← tracing filter builders
│ ├── metrics.rs (566 lines) ← tracing-based stage timing
│ ├── note_parser.rs (563 lines) ← cross-ref extraction from note bodies
│ ├── paths.rs ← config/db/log file path resolution
│ ├── payloads.rs (204 lines) ← raw JSON payload storage
│ ├── project.rs (274 lines) ← fuzzy project resolution from DB
│ ├── references.rs (551 lines) ← entity cross-reference extraction
│ ├── shutdown.rs ← graceful shutdown via tokio signal
│ ├── sync_run.rs (218 lines) ← sync run recording to DB
│ ├── time.rs ← time conversion utilities
│ ├── timeline.rs (284 lines) ← timeline types + EntityRef
│ ├── timeline_collect.rs (695 lines) ← Stage 4: collect events from DB
│ ├── timeline_expand.rs (557 lines) ← Stage 3: expand via cross-refs
│ └── timeline_seed.rs (552 lines) ← Stage 1: FTS search seeding
├── documents/ ← well-organized, 3 focused files
├── embedding/ ← well-organized, 6 focused files
├── gitlab/ ← well-organized, with transformers/ subdir
├── ingestion/ ← well-organized, 8 focused files
└── search/ ← well-organized, 5 focused files
```
---
## Tier 1: No-Brainers (Do First)
### 1.1 Extract `timeline/` from `core/`
**What:** Move the 4 timeline files into their own top-level module `src/timeline/`.
**Current location:**
- `core/timeline.rs` (284 lines) — types: `EntityRef`, `ExpandedEntityRef`, `TimelineEvent`, `TimelineEventType`, etc.
- `core/timeline_seed.rs` (552 lines) — Stage 1: FTS-based seeding
- `core/timeline_expand.rs` (557 lines) — Stage 3: cross-reference expansion
- `core/timeline_collect.rs` (695 lines) — Stage 4: event collection from DB
**New structure:**
```
src/timeline/
├── mod.rs ← types (from timeline.rs) + re-exports
├── seed.rs ← from timeline_seed.rs
├── expand.rs ← from timeline_expand.rs
└── collect.rs ← from timeline_collect.rs
```
**Rationale:** These 4 files form a cohesive 5-stage pipeline (SEED→HYDRATE→EXPAND→COLLECT→RENDER). They have nothing to do with "core" infrastructure like `db.rs`, `config.rs`, or `error.rs`. They only import from `core::error`, `core::time`, and `search::fts` — all of which remain accessible via `crate::core::*` and `crate::search::*` after the move.
**Import changes needed:**
- `cli/commands/timeline.rs`: `use crate::core::timeline::*` → `use crate::timeline::*`, same for `timeline_seed`, `timeline_expand`, `timeline_collect`
- `core/mod.rs`: remove the 4 `pub mod timeline*` lines
- `lib.rs`: add `pub mod timeline;`
**Risk: LOW** — Only 1 consumer (`cli/commands/timeline.rs`) + internal cross-references between the 4 files.
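A minimal single-file sketch of the proposed layout, with inline modules standing in for the files under `src/timeline/`. The function names `seed_timeline`, `expand_timeline`, and `collect_events` come from the import table in this plan; their signatures, their bodies, and the fields of `EntityRef` are placeholders assumed for illustration, not the real definitions.

```rust
// Inline modules stand in for src/timeline/{mod,seed,expand,collect}.rs.
// Signatures, bodies, and struct fields are hypothetical placeholders.
mod timeline {
    // Types that lived in core/timeline.rs now sit directly in mod.rs.
    #[derive(Debug, Clone, PartialEq)]
    pub struct EntityRef {
        pub entity_type: String, // hypothetical field
        pub iid: i64,            // hypothetical field
    }

    pub mod seed {
        use super::EntityRef; // was `super::timeline::EntityRef` under core/

        // Placeholder for the Stage 1 entry point.
        pub fn seed_timeline(query: &str) -> Vec<EntityRef> {
            if query.is_empty() {
                return Vec::new();
            }
            vec![EntityRef { entity_type: "issue".to_string(), iid: 42 }]
        }
    }

    pub mod expand {
        use super::EntityRef;

        // Placeholder for Stage 3: returns the seeds unchanged.
        pub fn expand_timeline(seeds: Vec<EntityRef>) -> Vec<EntityRef> {
            seeds
        }
    }

    pub mod collect {
        use super::EntityRef;

        // Placeholder for Stage 4: one fake event per entity.
        pub fn collect_events(entities: &[EntityRef]) -> Vec<String> {
            entities
                .iter()
                .map(|e| format!("{}:{} created", e.entity_type, e.iid))
                .collect()
        }
    }

    // Re-exports give consumers the short `crate::timeline::...` paths.
    pub use self::collect::collect_events;
    pub use self::expand::expand_timeline;
    pub use self::seed::seed_timeline;
}

// What cli/commands/timeline.rs would import after the move.
use crate::timeline::{collect_events, expand_timeline, seed_timeline};

fn main() {
    let seeds = seed_timeline("deployment");
    let entities = expand_timeline(seeds);
    for event in collect_events(&entities) {
        println!("{event}");
    }
}
```

With the re-exports in `mod.rs`, the single consumer can use either the short path (`crate::timeline::collect_events`) or the explicit one (`crate::timeline::collect::collect_events`).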
---
### 1.2 Extract `xref/` (cross-reference extraction) from `core/`
**What:** Move `note_parser.rs` and `references.rs` into `src/xref/`.
**Current location:**
- `core/note_parser.rs` (563 lines) — parses note bodies for "mentioned in group/repo#123" patterns, persists to `note_cross_references` table
- `core/references.rs` (551 lines) — extracts entity references from state events and closing MRs, writes to `entity_references` table
**New structure:**
```
src/xref/
├── mod.rs ← re-exports
├── note_parser.rs ← from core/note_parser.rs
└── references.rs ← from core/references.rs
```
**Rationale:** These files implement a specific domain concept — extracting and persisting cross-references between issues and MRs. They are not "core infrastructure." They're consumed by `ingestion/orchestrator.rs` for the cross-reference extraction phase, and the data they produce is consumed by the timeline pipeline. Putting them in their own module makes the data flow clearer: `ingestion → xref → timeline`.
**Import changes needed:**
- `ingestion/orchestrator.rs`: `use crate::core::references::*` → `use crate::xref::references::*`
- `ingestion/orchestrator.rs`: `use crate::core::note_parser::*` (if used directly — needs verification) → `use crate::xref::*`
- `core/mod.rs`: remove `pub mod note_parser; pub mod references;`
- `lib.rs`: add `pub mod xref;`
- Internal: the files use `super::error::Result` and `super::time::now_ms` which become `crate::core::error::Result` and `crate::core::time::now_ms`
**Risk: LOW** — 2-3 consumers at most. The files already use `super::` internally which just needs updating to `crate::core::`.
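The internal import change can be sketched in one file, with inline modules standing in for `core/` and `xref/`. Only the paths `error::Result` and `time::now_ms` are taken from this plan; `LoreError`, `ExtractedRef`, and `parse_issue_ref` are hypothetical names invented for the example.

```rust
// Inline modules stand in for src/core/ and src/xref/.
// LoreError, ExtractedRef, and parse_issue_ref are hypothetical.
mod core {
    pub mod error {
        use std::result::Result as StdResult;

        #[derive(Debug, PartialEq)]
        pub enum LoreError {
            Parse(String),
        }

        pub type Result<T> = StdResult<T, LoreError>;
    }

    pub mod time {
        use std::time::{SystemTime, UNIX_EPOCH};

        // Milliseconds since the Unix epoch (0 if the clock is before it).
        pub fn now_ms() -> i64 {
            SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .map(|d| d.as_millis() as i64)
                .unwrap_or(0)
        }
    }
}

mod xref {
    pub mod references {
        // While the file lived under core/, these two lines read:
        //   use super::error::Result;
        //   use super::time::now_ms;
        // After the move, `super` is `xref`, so the paths become absolute.
        use crate::core::error::{LoreError, Result};
        use crate::core::time::now_ms;

        #[derive(Debug)]
        pub struct ExtractedRef {
            pub target_iid: i64,
            pub extracted_at_ms: i64,
        }

        // Parses a reference of the form "#123" into an ExtractedRef.
        pub fn parse_issue_ref(text: &str) -> Result<ExtractedRef> {
            let digits = text
                .strip_prefix('#')
                .ok_or_else(|| LoreError::Parse(format!("missing '#': {text}")))?;
            let target_iid = digits
                .parse::<i64>()
                .map_err(|e| LoreError::Parse(e.to_string()))?;
            Ok(ExtractedRef { target_iid, extracted_at_ms: now_ms() })
        }
    }
}

fn main() {
    match xref::references::parse_issue_ref("#123") {
        Ok(r) => println!("iid={} at={}ms", r.target_iid, r.extracted_at_ms),
        Err(e) => println!("error: {e:?}"),
    }
}
```

Because `crate::core::...` paths are absolute, they keep working wherever the file ends up, which is why the plan prefers them over `super::` for moved files.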
---
## Tier 2: Good Improvements (Do After Tier 1)
### 2.1 Group ingestion-adjacent DB operations
**What:** Move `events_db.rs`, `dependent_queue.rs`, `payloads.rs`, and `sync_run.rs` from `core/` into `ingestion/`, since all four exist to support the ingestion pipeline (even where the direct caller is a CLI command).
**Current consumers:**
- `events_db.rs` → only used by `cli/commands/count.rs` (for event counts)
- `dependent_queue.rs` → only used by `ingestion/orchestrator.rs` and `main.rs` (to release locked jobs)
- `payloads.rs` → only used by `ingestion/discussions.rs`, `ingestion/issues.rs`, `ingestion/merge_requests.rs`, `ingestion/mr_discussions.rs`
- `sync_run.rs` → only used by `cli/commands/sync.rs` and `cli/commands/sync_status.rs`
**New structure:**
```
src/ingestion/
├── (existing files...)
├── events_db.rs ← from core/events_db.rs
├── dependent_queue.rs ← from core/dependent_queue.rs
├── payloads.rs ← from core/payloads.rs
└── sync_run.rs ← from core/sync_run.rs
```
**Rationale:** All 4 files exist to support the ingestion pipeline:
- `events_db.rs` upserts resource state/label/milestone events fetched during ingestion
- `dependent_queue.rs` manages the job queue that drives incremental discussion fetching
- `payloads.rs` stores the raw JSON payloads fetched from GitLab
- `sync_run.rs` records when syncs start/finish and their metrics
When you're looking for "how does ingestion work?", you'd naturally look in `ingestion/`. Having these scattered in `core/` requires knowing the hidden dependency.
**Import changes needed:**
- `events_db.rs`: 1 consumer in `cli/commands/count.rs` changes from `crate::core::events_db` → `crate::ingestion::events_db`
- `dependent_queue.rs`: 2 consumers — `ingestion/orchestrator.rs` (becomes `super::dependent_queue`) and `main.rs`
- `payloads.rs`: 4 consumers in `ingestion/*.rs` (become `super::payloads`)
- `sync_run.rs`: 2 consumers in `cli/commands/sync.rs` and `sync_status.rs`
- Internal references change from `super::error` / `super::time` to `crate::core::error` / `crate::core::time`
**Risk: MEDIUM** — More import changes, but all straightforward. The internal `super::` references need the most attention.
**Alternatively:** If moving feels like too much churn, a lighter option is to create `core/ingestion_db.rs` that re-exports from these 4 files, making the grouping visible without moving files. But I think the move is cleaner.
---
### 2.2 Split `cli/mod.rs` — move arg structs to their command files
**What:** Move each `*Args` struct from `cli/mod.rs` into the corresponding `cli/commands/*.rs` file. Keep `Cli` struct, `Commands` enum, and `detect_robot_mode_from_env()` in `cli/mod.rs`.
**Currently `cli/mod.rs` (949 lines) contains:**
- `Cli` struct (81 lines) — the root clap parser
- `Commands` enum (193 lines) — all subcommand variants
- `IssuesArgs` (86 lines) → move to `commands/list.rs` or stay near issues handling
- `MrsArgs` (93 lines) → move to `commands/list.rs` or stay near MRs handling
- `NotesArgs` (99 lines) → move to `commands/list.rs`
- `IngestArgs` (33 lines) → move to `commands/ingest.rs`
- `StatsArgs` (19 lines) → move to `commands/stats.rs`
- `SearchArgs` (58 lines) → move to `commands/search.rs`
- `GenerateDocsArgs` (9 lines) → move to `commands/generate_docs.rs`
- `SyncArgs` (39 lines) → move to `commands/sync.rs`
- `EmbedArgs` (15 lines) → move to `commands/embed.rs`
- `TimelineArgs` (53 lines) → move to `commands/timeline.rs`
- `WhoArgs` (76 lines) → move to `commands/who.rs`
- `CountArgs` (9 lines) → move to `commands/count.rs`
**After refactoring, `cli/mod.rs` shrinks to ~300 lines** (just `Cli` + `Commands` + the inlined variants like `Init`, `Drift`, `Backup`, `Reset`).
**Rationale:** When adding a new flag to the `who` command, you currently have to edit `cli/mod.rs` (the args struct), `cli/commands/who.rs` (the implementation), and `main.rs` (the dispatch). If the args struct lives in `commands/who.rs`, you only need two files. This is the standard pattern in mature clap-based Rust CLIs.
**Import changes needed:**
- `main.rs` currently does `use lore::cli::{..., WhoArgs, ...}` — these would become `use lore::cli::commands::{..., WhoArgs, ...}` or the `commands/mod.rs` re-exports them
- Each `commands/*.rs` gets its own `#[derive(Parser)]` struct
- `Commands` enum in `cli/mod.rs` keeps using the types but imports from `commands::*`
**Risk: MEDIUM** — Lots of `use` path changes in `main.rs`, but purely mechanical. No logic changes.
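A rough sketch of the re-export option, again using inline modules in place of files. The clap derive attributes are left out so the example compiles with the standard library alone; the fields of `WhoArgs` and the `run_who` and `dispatch` functions are hypothetical stand-ins.

```rust
// Inline modules stand in for cli/mod.rs, cli/commands/mod.rs, and
// cli/commands/who.rs. clap attributes are omitted; fields are hypothetical.
mod cli {
    pub mod commands {
        pub mod who {
            // The args struct now lives next to the command it configures.
            #[derive(Debug, Default, PartialEq)]
            pub struct WhoArgs {
                pub target: Option<String>, // hypothetical field
                pub limit: usize,           // hypothetical field
            }

            // Hypothetical command entry point.
            pub fn run_who(args: &WhoArgs) -> String {
                format!("who target={:?} limit={}", args.target, args.limit)
            }
        }

        // commands/mod.rs re-exports the struct...
        pub use self::who::WhoArgs;
    }

    // ...and cli/mod.rs re-exports it again, so `cli::WhoArgs` still resolves.
    pub use self::commands::WhoArgs;

    // The Commands enum keeps using the type through the re-export.
    pub enum Commands {
        Who(WhoArgs),
    }
}

// The import in main.rs can stay exactly as it was before the move.
use crate::cli::{Commands, WhoArgs};

fn dispatch(cmd: Commands) -> String {
    match cmd {
        Commands::Who(args) => cli::commands::who::run_who(&args),
    }
}

fn main() {
    let args = WhoArgs { target: Some("src/".to_string()), limit: 20 };
    println!("{}", dispatch(Commands::Who(args)));
}
```

If both re-exports are kept, the `use` lines in `main.rs` need no changes at all, which would lower the churn for this step; dropping them later is a separate, optional cleanup.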
---
## Tier 3: Consider Later
### 3.1 Split `main.rs` (2713 lines)
**The problem:** `main.rs` contains `main()`, ~30 `handle_*` functions, error handling, clap error formatting, fuzzy command matching, and the `robot-docs` JSON manifest (a 400+ line inline JSON literal).
**Possible approach:**
- Extract `handle_*` functions into `cli/dispatch.rs` (the routing layer)
- Extract error handling into `cli/errors.rs`
- Extract `handle_robot_docs` + the JSON manifest into `cli/robot_docs.rs`
- Keep `main()` in `main.rs` at ~150 lines (just the tracing setup + dispatch call)
**Why Tier 3:** This is the messiest split. The handler functions depend on the `cli::commands::*` functions AND the `cli::robot::*` helpers AND direct `std::process::exit` calls. Making this work cleanly requires careful thought about the error boundary between `main.rs` (binary) and `lib.rs` (library).
**Risk: HIGH** — Every handler function touches `robot_mode`, constructs its own timer, opens the DB, and manages error display. The boilerplate is high but consistent, so splitting would just move it around without reducing complexity.
---
### 3.2 Split `cli/commands/who.rs` (6067 lines)
**The problem:** This file implements 5 distinct modes (expert, workload, active, overlap, reviews), each with its own query, scoring model, and output formatting. It also includes the time-decay scoring model (~500 lines) and per-MR detail breakdown logic.
**Possible split:**
```
src/cli/commands/who/
├── mod.rs ← WhoRun dispatcher, shared types
├── expert.rs ← expert mode (path-based file expertise lookup)
├── workload.rs ← workload mode (user's assigned issues/MRs)
├── active.rs ← active discussions mode
├── overlap.rs ← file overlap between users
├── reviews.rs ← review pattern analysis
└── scoring.rs ← time-decay expert scoring model
```
**Why Tier 3:** The 5 modes share many helper functions, database connection patterns, and output formatting logic. Splitting would require carefully identifying the shared helpers and deciding where they live. The file is big but internally consistent — the modes use a shared dispatcher pattern and common types.
---
### 3.3 Split `cli/commands/list.rs` (2931 lines)
**The problem:** This file handles issue listing, MR listing, AND note listing — three related but distinct operations with separate query builders, output formatters, and test suites.
**Possible split:**
```
src/cli/commands/
├── list_issues.rs ← issue listing + query builder
├── list_mrs.rs ← MR listing + query builder
├── list_notes.rs ← note listing + query builder
└── list.rs ← shared types (ListFilters, etc.) + re-exports
```
**Why Tier 3:** Same issue as `who.rs` — the three listing modes share query building patterns, field selection logic, and sorting code. Splitting requires identifying and extracting the shared pieces first.
---
## Files NOT Recommended to Move
These files belong exactly where they are:
| File | Why it belongs in `core/` |
|------|--------------------------|
| `config.rs` | Config types used by nearly everything |
| `db.rs` | Database connection + migrations — foundational |
| `error.rs` | Error types used by every module |
| `paths.rs` | File path resolution — infrastructure |
| `logging.rs` | Tracing setup — infrastructure |
| `lock.rs` | Filesystem sync lock — infrastructure |
| `shutdown.rs` | Graceful shutdown signal — infrastructure |
| `backoff.rs` | Retry math — infrastructure |
| `time.rs` | Time conversion — used everywhere |
| `metrics.rs` | Tracing metrics layer — infrastructure |
| `project.rs` | Fuzzy project resolution — used by 8+ consumers across modules |
These files are legitimate "core infrastructure" used across multiple modules. Moving them would create import churn with no clarity gain.
---
## Files NOT Recommended to Split/Merge
| File | Why leave it alone |
|------|-------------------|
| `documents/extractor.rs` (2341 lines) | One cohesive extractor per entity type — the size comes from per-type formatting logic, not mixed concerns |
| `ingestion/orchestrator.rs` (1703 lines) | Single orchestration flow — splitting would scatter the pipeline |
| `gitlab/graphql.rs` (1293 lines) | GraphQL client with adaptive paging — cohesive |
| `gitlab/client.rs` (851 lines) | REST client with all endpoints — cohesive |
| `cli/autocorrect.rs` (945 lines) | Correction registry + fuzzy matching — splitting gains nothing |
---
## Proposed Final Structure (Tiers 1+2)
```
src/
├── main.rs (2713 lines — unchanged for now)
├── lib.rs (adds: pub mod timeline; pub mod xref;)
├── cli/
│ ├── mod.rs (~300 lines — Cli + Commands only, args moved out)
│ ├── autocorrect.rs (unchanged)
│ ├── progress.rs (unchanged)
│ ├── robot.rs (unchanged)
│ └── commands/
│ ├── mod.rs (re-exports + WhoArgs, IssuesArgs, etc.)
│ ├── (all existing files — unchanged but with args structs moved in)
│ └── ...
├── core/ (slimmed: 22 → 12 files, infrastructure only)
│ ├── mod.rs
│ ├── backoff.rs
│ ├── config.rs
│ ├── db.rs
│ ├── error.rs
│ ├── lock.rs
│ ├── logging.rs
│ ├── metrics.rs
│ ├── paths.rs
│ ├── project.rs
│ ├── shutdown.rs
│ └── time.rs
├── timeline/ (NEW — extracted from core/)
│ ├── mod.rs (types from core/timeline.rs)
│ ├── seed.rs (from core/timeline_seed.rs)
│ ├── expand.rs (from core/timeline_expand.rs)
│ └── collect.rs (from core/timeline_collect.rs)
├── xref/ (NEW — extracted from core/)
│ ├── mod.rs
│ ├── note_parser.rs (from core/note_parser.rs)
│ └── references.rs (from core/references.rs)
├── ingestion/ (gains 4 files from core/)
│ ├── (existing files...)
│ ├── events_db.rs (from core/events_db.rs)
│ ├── dependent_queue.rs (from core/dependent_queue.rs)
│ ├── payloads.rs (from core/payloads.rs)
│ └── sync_run.rs (from core/sync_run.rs)
├── documents/ (unchanged)
├── embedding/ (unchanged)
├── gitlab/ (unchanged)
└── search/ (unchanged)
```
---
## Import Change Tracking
### Tier 1.1: Timeline extraction
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `cli/commands/timeline.rs:10-15` | `crate::core::timeline::*` | `crate::timeline::*` |
| `cli/commands/timeline.rs:13` | `crate::core::timeline_collect::collect_events` | `crate::timeline::collect_events` (or `crate::timeline::collect::collect_events`) |
| `cli/commands/timeline.rs:14` | `crate::core::timeline_expand::expand_timeline` | `crate::timeline::expand_timeline` |
| `cli/commands/timeline.rs:15` | `crate::core::timeline_seed::seed_timeline` | `crate::timeline::seed_timeline` |
| `core/timeline_seed.rs:7-8` | `super::timeline::*` | `super::*` (or `crate::timeline::*` depending on structure) |
| `core/timeline_expand.rs:6` | `super::timeline::*` | `super::*` |
| `core/timeline_collect.rs:4` | `super::timeline::*` | `super::*` |
| `core/timeline_seed.rs:8` | `crate::search::*` | `crate::search::*` (no change) |
| `core/timeline_seed.rs:6-7` | `super::error::Result` | `crate::core::error::Result` |
| `core/timeline_expand.rs:5` | `super::error::Result` | `crate::core::error::Result` |
| `core/timeline_collect.rs:3` | `super::error::*` | `crate::core::error::*` |
### Tier 1.2: Cross-reference extraction
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `ingestion/orchestrator.rs:10-12` | `crate::core::references::*` | `crate::xref::references::*` |
| `core/note_parser.rs:7-8` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| `core/references.rs:4-5` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
### Tier 2.1: Ingestion-adjacent DB ops
| Consumer file | Old import | New import |
|---------------|-----------|------------|
| `cli/commands/count.rs:9` | `crate::core::events_db::*` | `crate::ingestion::events_db::*` |
| `ingestion/orchestrator.rs:6-8` | `crate::core::dependent_queue::*` | `super::dependent_queue::*` |
| `main.rs:37` | `crate::core::dependent_queue::release_all_locked_jobs` | `crate::ingestion::dependent_queue::release_all_locked_jobs` |
| `ingestion/discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/issues.rs:9` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/merge_requests.rs:8` | `crate::core::payloads::*` | `super::payloads::*` |
| `ingestion/mr_discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
| `cli/commands/sync.rs` | (uses `crate::core::sync_run::*`) | `crate::ingestion::sync_run::*` |
| `cli/commands/sync_status.rs` | (uses `crate::core::sync_run::*` or `crate::core::metrics::*`) | check and update |
| Internal: `events_db.rs:4-5` | `super::error::*`, `super::time::*` | `crate::core::error::*`, `crate::core::time::*` |
| Internal: `dependent_queue.rs:5-6` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| Internal: `payloads.rs:9-10` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
| Internal: `sync_run.rs:2-4` | `super::error::*`, `super::metrics::*`, `super::time::*` | `crate::core::error::*`, `crate::core::metrics::*`, `crate::core::time::*` |
---
## Execution Order
1. **Tier 1.1** — Extract timeline → `src/timeline/` (LOW risk, 1 consumer)
2. **Tier 1.2** — Extract xref → `src/xref/` (LOW risk, 1-2 consumers)
3. **Cargo check + clippy + test** after each tier
4. **Tier 2.1** — Move ingestion DB ops (MEDIUM risk, more consumers)
5. **Cargo check + clippy + test**
6. **Tier 2.2** — Split `cli/mod.rs` args (MEDIUM risk, mostly mechanical)
7. **Cargo check + clippy + test + fmt**
Each tier should be its own commit for easy rollback.
---
## What This Achieves
**Before:** A developer looking at `core/` sees 22 files and has to mentally sort "infrastructure vs. domain logic vs. pipeline stage." The timeline pipeline is invisible unless you know to look in `core/`.
**After:**
- `core/` has 12 files, all clearly infrastructure (db, config, error, paths, logging, lock, shutdown, backoff, time, metrics, project)
- `timeline/` is a discoverable first-class module showing the 5-stage pipeline
- `xref/` makes the cross-reference extraction domain visible
- `ingestion/` contains everything related to data fetching: the orchestrator, entity ingestors, AND their supporting DB operations
- `cli/mod.rs` is lean — just the top-level Cli struct and Commands enum
A new developer (or coding agent) can now answer "where is the timeline code?" → `src/timeline/`, "where is ingestion?" → `src/ingestion/`, "where is cross-reference extraction?" → `src/xref/`, without needing institutional knowledge.

README.md

@@ -19,7 +19,10 @@ Local GitLab data management with semantic search, people intelligence, and temp
- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
- **Work item status enrichment**: Fetches issue statuses (e.g., "To do", "In progress", "Done") from GitLab's GraphQL API with adaptive page sizing, color-coded display, and case-insensitive filtering
- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
- **Note querying**: Rich filtering over discussion notes by author, type, path, resolution status, time range, and body content
- **Discussion drift detection**: Semantic analysis of how discussions diverge from original issue intent
- **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
- **Error tolerance**: Auto-corrects common CLI mistakes (case, typos, single-dash flags, value casing) with teaching feedback
- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
## Installation
@@ -71,6 +74,12 @@ lore who @asmith
# Timeline of events related to deployments
lore timeline "deployment"
# Timeline for a specific issue
lore timeline issue:42
# Query notes by author
lore notes --author alice --since 7d
# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .
```
@@ -109,6 +118,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
"model": "nomic-embed-text",
"baseUrl": "http://localhost:11434",
"concurrency": 4
},
"scoring": {
"authorWeight": 25,
"reviewerWeight": 10,
"noteBonus": 1,
"authorHalfLifeDays": 180,
"reviewerHalfLifeDays": 90,
"noteHalfLifeDays": 45,
"excludedUsernames": ["bot-user"]
}
}
```
@@ -135,6 +153,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
| `embedding` | `model` | `nomic-embed-text` | Model name for embeddings |
| `embedding` | `baseUrl` | `http://localhost:11434` | Ollama server URL |
| `embedding` | `concurrency` | `4` | Concurrent embedding requests |
| `scoring` | `authorWeight` | `25` | Points per MR where the user authored code touching the path |
| `scoring` | `reviewerWeight` | `10` | Points per MR where the user reviewed code touching the path |
| `scoring` | `noteBonus` | `1` | Bonus per inline review comment (DiffNote) |
| `scoring` | `reviewerAssignmentWeight` | `3` | Points per MR where the user was assigned as reviewer |
| `scoring` | `authorHalfLifeDays` | `180` | Half-life in days for author contribution decay |
| `scoring` | `reviewerHalfLifeDays` | `90` | Half-life in days for reviewer contribution decay |
| `scoring` | `noteHalfLifeDays` | `45` | Half-life in days for note/comment decay |
| `scoring` | `closedMrMultiplier` | `0.5` | Score multiplier for closed (not merged) MRs |
| `scoring` | `excludedUsernames` | `[]` | Usernames excluded from expert results (e.g., bots) |
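
The half-life fields can be read as standard exponential decay: a contribution loses half its value every `halfLifeDays`. The sketch below assumes the formula `weight * 0.5^(age_days / half_life_days)`; the README does not spell out the exact expression, and `decayed_score` is a hypothetical helper, not a function from the codebase.

```rust
/// Decayed value of one contribution, assuming plain half-life decay.
/// This formula is an assumption; the real scoring model may differ.
fn decayed_score(weight: f64, age_days: f64, half_life_days: f64) -> f64 {
    weight * 0.5_f64.powf(age_days / half_life_days)
}

fn main() {
    // Defaults from the table: authorWeight = 25, authorHalfLifeDays = 180,
    // reviewerWeight = 10, reviewerHalfLifeDays = 90.
    println!("authored today:        {:.2}", decayed_score(25.0, 0.0, 180.0));
    println!("authored 180 days ago: {:.2}", decayed_score(25.0, 180.0, 180.0));
    println!("reviewed 90 days ago:  {:.2}", decayed_score(10.0, 90.0, 90.0));
}
```

Under this reading, an authored MR from one half-life ago is worth half the configured `authorWeight`, and one from two half-lives ago a quarter.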
### Config File Resolution
@@ -262,18 +289,21 @@ lore search "login flow" --mode semantic # Vector similarity only
lore search "auth" --type issue # Filter by source type
lore search "auth" --type mr # MR documents only
lore search "auth" --type discussion # Discussion documents only
lore search "auth" --type note # Individual notes only
lore search "deploy" --author username # Filter by author
lore search "deploy" -p group/repo # Filter by project
lore search "deploy" --label backend # Filter by label (AND logic)
lore search "deploy" --path src/ # Filter by file path (trailing / for prefix)
lore search "deploy" --after 7d # Created after (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-after 2w # Updated after
lore search "deploy" --since 7d # Created since (7d, 2w, 1m, or YYYY-MM-DD)
lore search "deploy" --updated-since 2w # Updated since
lore search "deploy" -n 50 # Limit results (default 20, max 100)
lore search "deploy" --explain # Show ranking explanation per result
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
```
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. Use `raw` for advanced FTS5 query syntax (AND, OR, NOT, phrase matching, prefix queries).
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. FTS5 boolean operators (`AND`, `OR`, `NOT`, `NEAR`) are passed through in safe mode, so queries like `"switch AND health"` work without switching to raw mode. Use `raw` for advanced FTS5 query syntax (phrase matching, column filters, prefix queries).
A progress spinner displays during search, showing the active mode (e.g., `Searching (hybrid)...`). In robot mode, spinners are suppressed for clean JSON output.
Requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.
@@ -283,7 +313,7 @@ People intelligence: discover experts, analyze workloads, review patterns, activ
#### Expert Mode
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis).
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis). Scores use exponential half-life decay so recent contributions count more than older ones. Scoring weights and half-life periods are configurable via the `scoring` config section.
```bash
lore who src/features/auth/ # Who knows about this directory?
@@ -292,6 +322,9 @@ lore who --path README.md # Root files need --path flag
lore who --path Makefile # Dotless root files too
lore who src/ --since 3m # Limit to recent 3 months
lore who src/ -p group/repo # Scope to project
lore who src/ --explain-score # Show per-component score breakdown
lore who src/ --as-of 30d # Score as if "now" was 30 days ago
lore who src/ --include-bots # Include bot users in results
```
The target is auto-detected as a path when it contains `/`. For root files without `/` (e.g., `README.md`), use the `--path` flag. Default time window: 6 months.
@@ -348,21 +381,32 @@ Shows: users with touch counts (author vs. review), linked MR references. Defaul
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
| `-n` / `--limit` | Max results per section (1-500, default 20) |
| `--all-history` | Remove the default time window, query all history |
| `--detail` | Show per-MR detail breakdown (expert mode only) |
| `--explain-score` | Show per-component score breakdown (expert mode only) |
| `--as-of` | Score as if "now" is a past date (ISO 8601 or duration like 30d, expert mode only) |
| `--include-bots` | Include bot users normally excluded via `scoring.excludedUsernames` |
### `lore timeline`
Reconstruct a chronological timeline of events matching a keyword query. The pipeline discovers related entities through cross-reference graph traversal and assembles a unified, time-ordered event stream.
```bash
lore timeline "deployment" # Events related to deployments
lore timeline "deployment" # Search-based seeding (hybrid search)
lore timeline issue:42 # Direct entity seeding by issue IID
lore timeline i:42 # Shorthand for issue:42
lore timeline mr:99 # Direct entity seeding by MR IID
lore timeline m:99 # Shorthand for mr:99
lore timeline "auth" -p group/repo # Scoped to a project
lore timeline "auth" --since 30d # Only recent events
lore timeline "migration" --depth 2 # Deeper cross-reference expansion
lore timeline "migration" --expand-mentions # Follow 'mentioned' edges (high fan-out)
lore timeline "migration" --no-mentions # Skip 'mentioned' edges (reduces fan-out)
lore timeline "deploy" -n 50 # Limit event count
lore timeline "auth" --max-seeds 5 # Fewer seed entities
```
The query can be either a search string (hybrid search finds matching entities) or an entity reference (`issue:N`, `i:N`, `mr:N`, `m:N`) which directly seeds the timeline from a specific entity and its cross-references.
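
One way to picture the distinction between the two query forms is a small parser that tries the entity-reference prefixes first and falls back to a search string. This is a sketch only: `TimelineQuery` and `parse_timeline_query` are hypothetical names, and the real command may parse its argument differently.

```rust
// Hypothetical classification of the `lore timeline` argument.
#[derive(Debug, PartialEq)]
enum TimelineQuery {
    Issue(u64),
    MergeRequest(u64),
    Search(String),
}

// Treats `issue:N`, `i:N`, `mr:N`, and `m:N` as entity references when N is
// numeric; everything else is passed on as a search string.
fn parse_timeline_query(input: &str) -> TimelineQuery {
    if let Some((prefix, rest)) = input.split_once(':') {
        if let Ok(iid) = rest.parse::<u64>() {
            match prefix {
                "issue" | "i" => return TimelineQuery::Issue(iid),
                "mr" | "m" => return TimelineQuery::MergeRequest(iid),
                _ => {}
            }
        }
    }
    TimelineQuery::Search(input.to_string())
}

fn main() {
    for arg in ["issue:42", "i:42", "mr:99", "m:99", "deployment"] {
        println!("{arg} -> {:?}", parse_timeline_query(arg));
    }
}
```

Falling back to `Search` for unknown prefixes or non-numeric IIDs keeps queries such as `"error: timeout"` usable as plain search text.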
#### Flags
| Flag | Default | Description |
@@ -370,18 +414,21 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
| `-p` / `--project` | all | Scope to a specific project (fuzzy match) |
| `--since` | none | Only events after this date (7d, 2w, 6m, YYYY-MM-DD) |
| `--depth` | `1` | Cross-reference expansion depth (0 = seeds only) |
| `--expand-mentions` | off | Also follow "mentioned" edges during expansion |
| `--no-mentions` | off | Skip "mentioned" edges during expansion (reduces fan-out) |
| `-n` / `--limit` | `100` | Maximum events to display |
| `--max-seeds` | `10` | Maximum seed entities from search |
| `--max-entities` | `50` | Maximum entities discovered via cross-references |
| `--max-evidence` | `10` | Maximum evidence notes included |
| `--fields` | all | Select output fields (comma-separated, or 'minimal' preset) |
#### Pipeline Stages
1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents are ranked by BM25 relevance.
2. **HYDRATE** -- Evidence notes are extracted: the top FTS-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and optionally "mentioned" references up to the configured depth.
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking.
Each stage displays a numbered progress spinner (e.g., `[1/3] Seeding timeline...`). In robot mode, spinners are suppressed for clean JSON output.
1. **SEED** -- Hybrid search (FTS5 lexical + Ollama vector similarity via Reciprocal Rank Fusion) identifies the most relevant issues and MRs. Falls back to lexical-only if Ollama is unavailable. Discussion notes matching the query are also discovered and attached to their parent entities.
2. **HYDRATE** -- Evidence notes are extracted: the top search-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced. Matched discussions are collected as full thread candidates.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and "mentioned" references up to the configured depth. Use `--no-mentions` to exclude "mentioned" edges and reduce fan-out.
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, evidence notes, and full discussion threads. Events are sorted chronologically with stable tiebreaking.
5. **RENDER** -- Events are formatted as human-readable text or structured JSON (robot mode).
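
The EXPAND stage described above can be sketched as a depth-limited breadth-first traversal. The in-memory adjacency map, `EntityId`, `RefKind`, and `expand` are all assumptions made for this example; the real stage reads the `entity_references` table rather than a `HashMap`.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Hypothetical in-memory stand-ins for rows of `entity_references`.
type EntityId = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
enum RefKind {
    Closes,
    Related,
    Mentioned,
}

type Graph = HashMap<EntityId, Vec<(EntityId, RefKind)>>;

// Breadth-first expansion from the seeds, up to `max_depth` hops.
// Returns newly discovered entities in discovery order (seeds excluded).
fn expand(graph: &Graph, seeds: &[EntityId], max_depth: usize, follow_mentions: bool) -> Vec<EntityId> {
    let mut seen: HashSet<EntityId> = seeds.iter().copied().collect();
    let mut queue: VecDeque<(EntityId, usize)> = seeds.iter().map(|&s| (s, 0)).collect();
    let mut discovered = Vec::new();

    while let Some((id, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // depth 0 therefore means "seeds only"
        }
        if let Some(edges) = graph.get(&id) {
            for &(next, kind) in edges {
                if kind == RefKind::Mentioned && !follow_mentions {
                    continue; // mirrors the effect of --no-mentions
                }
                if seen.insert(next) {
                    discovered.push(next);
                    queue.push_back((next, depth + 1));
                }
            }
        }
    }
    discovered
}

// Small sample graph: 1 closes 2, 1 mentions 3, 2 is related to 4.
fn sample_graph() -> Graph {
    let mut g = Graph::new();
    g.insert(1, vec![(2, RefKind::Closes), (3, RefKind::Mentioned)]);
    g.insert(2, vec![(4, RefKind::Related)]);
    g
}

fn main() {
    let g = sample_graph();
    println!("depth 1, mentions on:  {:?}", expand(&g, &[1], 1, true));
    println!("depth 2, mentions off: {:?}", expand(&g, &[1], 2, false));
}
```

The `seen` set is what keeps fan-out bounded on cyclic reference graphs: each entity is enqueued at most once, at the shallowest depth where it is first reached.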
#### Event Types
@@ -395,13 +442,70 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
| `NoteEvidence` | Discussion note matched by FTS, with snippet |
| `NoteEvidence` | Discussion note matched by search, with snippet |
| `DiscussionThread` | Full discussion thread with all non-system notes |
| `CrossReferenced` | Reference to another entity |
#### Unresolved References
When graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the output. This enables discovery of external dependencies and can inform future sync targets.
### `lore notes`
Query individual notes from discussions with rich filtering options.
```bash
lore notes # List 50 most recent notes
lore notes --author alice --since 7d # Notes by alice in last 7 days
lore notes --for-issue 42 -p group/repo # Notes on issue #42
lore notes --for-mr 99 -p group/repo # Notes on MR !99
lore notes --path src/ --resolution unresolved # Unresolved diff notes in src/
lore notes --note-type DiffNote # Only inline code review comments
lore notes --contains "TODO" # Substring search in note body
lore notes --include-system # Include system-generated notes
lore notes --since 2w --until 2024-12-31 # Time-bounded range
lore notes --sort updated --asc # Sort by update time, ascending
lore notes --format csv # CSV output
lore notes --format jsonl # Line-delimited JSON
lore notes -o # Open first result in browser
# Field selection (robot mode)
lore -J notes --fields minimal # Compact: id, author_username, body, created_at_iso
```
#### Filters
| Flag | Description |
|------|-------------|
| `-a` / `--author` | Filter by note author username |
| `--note-type` | Filter by note type (DiffNote, DiscussionNote) |
| `--contains` | Substring search in note body |
| `--note-id` | Filter by internal note ID |
| `--gitlab-note-id` | Filter by GitLab note ID |
| `--discussion-id` | Filter by discussion ID |
| `--include-system` | Include system notes (excluded by default) |
| `--for-issue` | Notes on a specific issue IID (requires `-p`) |
| `--for-mr` | Notes on a specific MR IID (requires `-p`) |
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Notes created since (7d, 2w, 1m, or YYYY-MM-DD) |
| `--until` | Notes created until (YYYY-MM-DD, inclusive end-of-day) |
| `--path` | Filter by file path (DiffNotes only; trailing `/` for prefix match) |
| `--resolution` | Filter by resolution status (`any`, `unresolved`, `resolved`) |
| `--sort` | Sort by `created` (default) or `updated` |
| `--asc` | Sort ascending (default: descending) |
| `--format` | Output format: `table` (default), `json`, `jsonl`, `csv` |
| `-o` / `--open` | Open first result in browser |
### `lore drift`
Detect discussion divergence from the original intent of an issue by comparing the semantic similarity of discussion content against the issue description.
```bash
lore drift issues 42 # Check divergence on issue #42
lore drift issues 42 --threshold 0.6 # Higher threshold (stricter)
lore drift issues 42 -p group/repo # Scope to project
```
### `lore sync`
Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings.
@@ -413,6 +517,7 @@ lore sync --force # Override stale lock
lore sync --no-embed # Skip embedding step
lore sync --no-docs # Skip document regeneration
lore sync --no-events # Skip resource event fetching
lore sync --no-file-changes # Skip MR file change fetching
lore sync --dry-run # Preview what would be synced
```
@@ -571,6 +676,7 @@ Machine-readable command manifest for agent self-discovery. Returns a JSON schem
```bash
lore robot-docs # Pretty-printed JSON
lore --robot robot-docs # Compact JSON for parsing
lore robot-docs --brief # Omit response_schema (~60% smaller)
```
### `lore version`
@@ -622,7 +728,7 @@ The `actions` array contains executable shell commands an agent can run to recov
### Field Selection
The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response, reducing token usage for AI agent workflows:
The `--fields` flag controls which fields appear in the JSON response, reducing token usage for AI agent workflows. Supported on `issues`, `mrs`, `notes`, `search`, `timeline`, and `who` list commands:
```bash
# Minimal preset (~60% fewer tokens)
@@ -639,6 +745,48 @@ Valid fields for issues: `iid`, `title`, `state`, `author_username`, `labels`, `
Valid fields for MRs: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`
### Error Tolerance
The CLI auto-corrects common mistakes before parsing, emitting a teaching note to stderr. Corrections work in both human and robot modes:
| Correction | Example | Mode |
|-----------|---------|------|
| Single-dash long flag | `-robot` -> `--robot` | All |
| Case normalization | `--Robot` -> `--robot` | All |
| Flag prefix expansion | `--proj` -> `--project` (unambiguous only) | All |
| Fuzzy flag match | `--projct` -> `--project` | All (threshold 0.9 in robot, 0.8 in human) |
| Subcommand alias | `merge_requests` -> `mrs`, `robotdocs` -> `robot-docs` | All |
| Value normalization | `--state Opened` -> `--state opened` | All |
| Value fuzzy match | `--state opend` -> `--state opened` | All |
| Subcommand prefix | `lore iss` -> `lore issues` (unambiguous only, via clap) | All |
In robot mode, corrections emit structured JSON to stderr:
```json
{"warning":{"type":"ARG_CORRECTED","corrections":[...],"teaching":["Use double-dash for long flags: --robot (not -robot)"]}}
```
When a command or flag is still unrecognized after corrections, the error response includes a fuzzy suggestion and, for enum-like flags, lists valid values:
```json
{"error":{"code":"UNKNOWN_COMMAND","message":"...","suggestion":"Did you mean 'lore issues'? Example: lore --robot issues -n 10. Run 'lore robot-docs' for all commands"}}
```
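The "unambiguous only" rule for flag prefix expansion can be illustrated with a minimal sketch. The function name `expand_flag_prefix` and the flag list are hypothetical, chosen for illustration; this is not the CLI's actual autocorrect code.

```rust
/// Expand an abbreviated long flag to its full name when exactly one known
/// flag starts with the given prefix. Returns None for ambiguous or unknown
/// input. (Hypothetical helper, not the real autocorrect registry.)
fn expand_flag_prefix<'a>(input: &str, known_flags: &[&'a str]) -> Option<&'a str> {
    // An exact match always wins, even if it is also a prefix of another flag.
    if let Some(exact) = known_flags.iter().copied().find(|f| *f == input) {
        return Some(exact);
    }
    let mut matches = known_flags.iter().copied().filter(|f| f.starts_with(input));
    let first = matches.next()?;
    // A second candidate means the prefix is ambiguous, so refuse to guess.
    if matches.next().is_some() {
        return None;
    }
    Some(first)
}
```

With the flags `--project`, `--path`, and `--since`, the input `--proj` expands to `--project`, while `--p` is rejected because it matches two flags.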
### Command Aliases
Commands accept aliases for common variations:
| Primary | Aliases |
|---------|---------|
| `issues` | `issue` |
| `mrs` | `mr`, `merge-requests`, `merge-request` |
| `notes` | `note` |
| `search` | `find`, `query` |
| `stats` | `stat` |
| `status` | `st` |
Unambiguous prefixes also work via subcommand inference (e.g., `lore iss` -> `lore issues`, `lore time` -> `lore timeline`).
### Agent Self-Discovery
The `robot-docs` command provides a complete machine-readable manifest including response schemas for every command:


@@ -0,0 +1,202 @@
No `## Rejected Recommendations` section appears in the plan you pasted, so the revisions below are all net-new.
1. **Add an explicit “Bridge Contract” and fix scope inconsistency**
Analysis: The plan says “Three changes” but defines four. More importantly, identifier requirements are scattered. A single contract section prevents drift and makes every new read surface prove it can drive a write call.
```diff
@@
-**Scope**: Three changes, delivered in order:
+**Scope**: Four workstreams, delivered in order:
1. Add `gitlab_discussion_id` to notes output
2. Add `gitlab_discussion_id` to show command discussion groups
3. Add a standalone `discussions` list command
4. Fix robot-docs to list actual field names instead of opaque type references
+
+## Bridge Contract (Cross-Cutting)
+Every read payload that surfaces notes/discussions MUST include:
+- `project_path`
+- `noteable_type`
+- `parent_iid`
+- `gitlab_discussion_id`
+- `gitlab_note_id` (when note-level data is returned)
+This contract is required so agents can deterministically construct `glab api` write calls.
```
2. **Normalize identifier naming now (break ambiguous names)**
Analysis: Current `id`/`gitlab_id` naming is ambiguous in mixed payloads. Rename to explicit `note_id` and `gitlab_note_id` now (you explicitly don't care about backward compatibility). This reduces automation mistakes.
```diff
@@ 1b. Add field to `NoteListRow`
-pub struct NoteListRow {
- pub id: i64,
- pub gitlab_id: i64,
+pub struct NoteListRow {
+ pub note_id: i64, // local DB id
+ pub gitlab_note_id: i64, // GitLab note id
@@
@@ 1c. Add field to `NoteListRowJson`
-pub struct NoteListRowJson {
- pub id: i64,
- pub gitlab_id: i64,
+pub struct NoteListRowJson {
+ pub note_id: i64,
+ pub gitlab_note_id: i64,
@@
-#### 2f. Add `gitlab_note_id` to note detail structs in show
-While we're here, add `gitlab_id` to `NoteDetail`, `MrNoteDetail`, and their JSON
+#### 2f. Add `gitlab_note_id` to note detail structs in show
+While we're here, add `gitlab_note_id` to `NoteDetail`, `MrNoteDetail`, and their JSON
counterparts.
```
3. **Stop positional column indexing for these changes**
Analysis: In `list.rs`, row extraction is positional (`row.get(18)`, etc.). Adding fields is fragile and easy to break silently. Use named aliases and named lookup for robustness.
```diff
@@ 1a/1b SQL + query_map
- p.path_with_namespace AS project_path
+ p.path_with_namespace AS project_path,
+ d.gitlab_discussion_id AS gitlab_discussion_id
@@
- project_path: row.get(18)?,
- gitlab_discussion_id: row.get(19)?,
+ project_path: row.get("project_path")?,
+ gitlab_discussion_id: row.get("gitlab_discussion_id")?,
```
4. **Redesign `discussions` query to avoid correlated subquery fanout**
Analysis: Proposed query uses many correlated subqueries per row. That's acceptable for tiny MR-scoped sets, but degrades for project-wide scans. Use a base CTE + one rollup pass over notes.
```diff
@@ 3c. SQL Query
-SELECT
- d.id,
- ...
- (SELECT COUNT(*) FROM notes n2 WHERE n2.discussion_id = d.id AND n2.is_system = 0) AS note_count,
- (SELECT n3.author_username FROM notes n3 WHERE n3.discussion_id = d.id ORDER BY n3.position LIMIT 1) AS first_author,
- ...
-FROM discussions d
+WITH base AS (
+ SELECT d.id, d.gitlab_discussion_id, d.noteable_type, d.project_id, d.issue_id, d.merge_request_id,
+ d.individual_note, d.first_note_at, d.last_note_at, d.resolvable, d.resolved
+ FROM discussions d
+ {where_sql}
+),
+note_rollup AS (
+ SELECT n.discussion_id,
+ COUNT(*) FILTER (WHERE n.is_system = 0) AS user_note_count,
+ COUNT(*) AS total_note_count,
+ MIN(CASE WHEN n.is_system = 0 THEN n.position END) AS first_user_pos
+ FROM notes n
+ JOIN base b ON b.id = n.discussion_id
+ GROUP BY n.discussion_id
+)
+SELECT ...
+FROM base b
+LEFT JOIN note_rollup r ON r.discussion_id = b.id
```
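To make the intent of the `note_rollup` CTE concrete, here is an in-memory analogue that computes the same three aggregates in one pass over the notes. It is only a sketch: the `Note` and `NoteRollup` structs are hypothetical stand-ins for table rows, and the real work would happen in SQLite.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a row of the `notes` table.
struct Note {
    discussion_id: i64,
    position: i64,
    is_system: bool,
}

/// Per-discussion aggregates mirroring the `note_rollup` CTE columns.
#[derive(Debug, Default, PartialEq)]
struct NoteRollup {
    user_note_count: usize,
    total_note_count: usize,
    first_user_pos: Option<i64>,
}

/// Single pass over all notes: each note is visited once, instead of
/// re-scanning the notes of a discussion for every output column.
fn rollup_notes(notes: &[Note]) -> HashMap<i64, NoteRollup> {
    let mut out: HashMap<i64, NoteRollup> = HashMap::new();
    for n in notes {
        let r = out.entry(n.discussion_id).or_default();
        r.total_note_count += 1;
        if !n.is_system {
            r.user_note_count += 1;
            // Equivalent of MIN(CASE WHEN is_system = 0 THEN position END).
            r.first_user_pos = Some(match r.first_user_pos {
                Some(p) => p.min(n.position),
                None => n.position,
            });
        }
    }
    out
}
```

A discussion containing only system notes ends up with `user_note_count` of 0 and no `first_user_pos`, which corresponds to the NULL the SQL aggregate would produce.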
5. **Add explicit index work for new access patterns**
Analysis: Existing indexes are good but not ideal for new list patterns (`project + last_note`, note position ordering inside discussion). Add migration entries to keep latency stable.
```diff
@@ ## 3. Add Standalone `discussions` List Command
+#### 3h. Add migration for discussion-list performance
+**File**: `migrations/027_discussions_list_indexes.sql`
+```sql
+CREATE INDEX IF NOT EXISTS idx_discussions_project_last_note
+ ON discussions(project_id, last_note_at DESC, id DESC);
+CREATE INDEX IF NOT EXISTS idx_discussions_project_first_note
+ ON discussions(project_id, first_note_at DESC, id DESC);
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
+ ON notes(discussion_id, position);
+```
```
6. **Add keyset pagination (critical for agent workflows)**
Analysis: `--limit` alone is not enough for automation over large datasets. Add cursor-based pagination with deterministic sort keys and `next_cursor` in JSON.
```diff
@@ 3a. CLI Args
+ /// Keyset cursor from previous response
+ #[arg(long, help_heading = "Output")]
+ pub cursor: Option<String>,
@@
@@ Response Schema
- "total_count": 15,
- "showing": 15
+ "total_count": 15,
+ "showing": 15,
+ "next_cursor": "eyJsYXN0X25vdGVfYXQiOjE3MDAwMDAwMDAwMDAsImlkIjoxMjN9"
@@
@@ Validation Criteria
+7. `lore -J discussions ... --cursor <token>` returns the next stable page without duplicates/skips
```
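As a rough illustration of the keyset idea, the sketch below pages through in-memory rows ordered by `(last_note_at DESC, id DESC)` and hands back the sort key of the last row as the cursor. The `Row` and `Cursor` types and the `page` function are hypothetical; token encoding (the sample `next_cursor` above looks like base64-encoded JSON of the sort key) is left out.

```rust
/// Hypothetical discussion row; only the sort keys matter here.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    last_note_at: i64,
}

/// Decoded cursor: the sort key of the last row on the previous page.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cursor {
    last_note_at: i64,
    id: i64,
}

/// Return one page sorted by (last_note_at DESC, id DESC) plus the cursor
/// for the next page, or None when no rows remain.
fn page(rows: &[Row], cursor: Option<Cursor>, limit: usize) -> (Vec<Row>, Option<Cursor>) {
    let mut sorted: Vec<Row> = rows.to_vec();
    sorted.sort_by(|a, b| (b.last_note_at, b.id).cmp(&(a.last_note_at, a.id)));

    // Keep only rows strictly after the cursor in sort order.
    let mut remaining = sorted
        .into_iter()
        .filter(|r| match cursor {
            Some(c) => (r.last_note_at, r.id) < (c.last_note_at, c.id),
            None => true,
        })
        .peekable();

    let mut out = Vec::new();
    while out.len() < limit {
        match remaining.next() {
            Some(r) => out.push(r),
            None => break,
        }
    }

    // Emit a cursor only if rows remain beyond this page.
    let next = if remaining.peek().is_some() {
        out.last().map(|r| Cursor { last_note_at: r.last_note_at, id: r.id })
    } else {
        None
    };
    (out, next)
}
```

Including `id` as a tiebreaker is what keeps pages free of duplicates and skips when several discussions share the same `last_note_at`.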
7. **Fix semantic ambiguities in discussion summary fields**
Analysis: `note_count` is ambiguous, and `first_author` can accidentally be a system note author. Make fields explicit and consistent with non-system default behavior.
```diff
@@ Response Schema
- "note_count": 3,
- "first_author": "elovegrove",
+ "user_note_count": 3,
+ "total_note_count": 4,
+ "first_user_author": "elovegrove",
@@
@@ 3d. Filters struct / path behavior
-- `path` → `EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id AND n.position_new_path LIKE ?)`
+- `path` → match on BOTH `position_new_path` and `position_old_path` (exact/prefix)
```
8. **Enrich show outputs with actionable thread metadata**
Analysis: Adding only discussion id helps, but agents still need thread state and note ids to pick targets correctly. Add `resolvable`, `resolved`, `last_note_at_iso`, and `gitlab_note_id` in show discussion payloads.
```diff
@@ 2a/2b show discussion structs
pub struct DiscussionDetailJson {
pub gitlab_discussion_id: String,
+ pub resolvable: bool,
+ pub resolved: bool,
+ pub last_note_at_iso: String,
pub notes: Vec<NoteDetailJson>,
@@
pub struct NoteDetailJson {
+ pub gitlab_note_id: i64,
pub author_username: String,
```
9. **Harden robot-docs against schema drift with tests**
Analysis: Static JSON in `main.rs` will drift again. Add a lightweight contract test that asserts docs include required fields for `notes`, `discussions`, and show payloads.
```diff
@@ 4. Fix Robot-Docs Response Schemas
+#### 4f. Add robot-docs contract tests
+**File**: `src/main.rs` (or dedicated test module)
+- Assert `robot-docs` contains `gitlab_discussion_id` and `gitlab_note_id` in:
+ - `notes.response_schema`
+ - `issues.response_schema.show`
+ - `mrs.response_schema.show`
+ - `discussions.response_schema`
```
10. **Adjust delivery order to reduce rework and include missing CSV path**
Analysis: In your sample `handle_discussions`, `csv` is declared in args but not handled. Also, robot-docs should land after all payload changes. Sequence should minimize churn.
```diff
@@ Delivery Order
-3. **Change 4** (robot-docs) — depends on 1 and 2 being done so schemas are accurate.
-4. **Change 3** (discussions command) — largest change, depends on 1 for design consistency.
+3. **Change 3** (discussions command + indexes + pagination) — largest change.
+4. **Change 4** (robot-docs + contract tests) — last, after payloads are final.
@@ 3e. Handler wiring
- match format {
+ match format {
"json" => ...
"jsonl" => ...
+ "csv" => print_list_discussions_csv(&result),
_ => ...
}
```
If you want, I can produce a single consolidated revised plan markdown with these edits applied so you can drop it in directly.


@@ -0,0 +1,162 @@
The best non-rejected upgrades I'd make to this plan are below. They focus on reducing schema drift, making robot output safer to consume, and improving performance behavior at scale.
1. Add a shared contract model and field constants first (before workstreams 1-4)
Rationale: Right now each command has its own structs and ad-hoc mapping. That is exactly how drift happens. A single contract definition reused by `notes`, `show`, `discussions`, and robot-docs gives compile-time coupling between output payloads and docs. It also makes future fields cheaper and safer to add.
```diff
@@ Scope: Four workstreams, delivered in order:
-1. Add `gitlab_discussion_id` to notes output
-2. Add `gitlab_discussion_id` to show command discussion groups
-3. Add a standalone `discussions` list command
-4. Fix robot-docs to list actual field names instead of opaque type references
+0. Introduce shared Bridge Contract model/constants used by notes/show/discussions/robot-docs
+1. Add `gitlab_discussion_id` to notes output
+2. Add `gitlab_discussion_id` to show command discussion groups
+3. Add a standalone `discussions` list command
+4. Fix robot-docs to list actual field names instead of opaque type references
+## 0. Shared Contract Model (Cross-Cutting)
+Define canonical required-field constants and shared mapping helpers, then consume them in:
+- `src/cli/commands/list.rs`
+- `src/cli/commands/show.rs`
+- `src/cli/robot.rs`
+- `src/main.rs` robot-docs builder
+This removes duplicated field-name strings and prevents docs/output mismatch.
```
2. Make bridge fields “non-droppable” in robot mode
Rationale: The current plan adds fields, but `--fields` can still remove them. That breaks the core read/write bridge contract in exactly the workflows this change is trying to fix. In robot mode, contract fields should always be force-included.
```diff
@@ ## Bridge Contract (Cross-Cutting)
Every read payload that surfaces notes or discussions **MUST** include:
- `project_path`
- `noteable_type`
- `parent_iid`
- `gitlab_discussion_id`
- `gitlab_note_id` (when note-level data is returned — i.e., in notes list and show detail)
+### Field Filtering Guardrail
+In robot mode, `filter_fields` must force-include Bridge Contract fields even when users pass a narrower `--fields` list.
+Human/table mode keeps existing behavior.
```
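One possible shape for this guardrail, as a sketch only: the real `filter_fields` signature is not shown in the plan, so `effective_fields` and its parameters are hypothetical. The field names are the Bridge Contract fields listed above.

```rust
/// Bridge Contract field names taken from the plan text.
const BRIDGE_FIELDS: &[&str] = &[
    "project_path",
    "noteable_type",
    "parent_iid",
    "gitlab_discussion_id",
    "gitlab_note_id",
];

/// Compute the fields to emit. In robot mode, Bridge Contract fields are
/// appended whenever the caller's `--fields` list omits them; human mode
/// returns the requested list unchanged. (Hypothetical helper.)
fn effective_fields(requested: &[&str], robot_mode: bool) -> Vec<String> {
    let mut fields: Vec<String> = requested.iter().map(|f| f.to_string()).collect();
    if robot_mode {
        for b in BRIDGE_FIELDS {
            if !fields.iter().any(|f| f.as_str() == *b) {
                fields.push((*b).to_string());
            }
        }
    }
    fields
}
```

Appending rather than reordering keeps the caller's requested fields first, so a narrow `--fields body` request in robot mode still leads with `body`.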
3. Replace correlated subqueries in `discussions` rollup with a single-pass window/aggregate pattern
Rationale: Your CTE is better than naive fanout, but it still uses multiple correlated sub-selects per discussion for first author/body/path. At 200K+ discussions this can regress badly depending on cache/index state. A window-ranked `notes` CTE with grouped aggregates is usually faster and more predictable in SQLite.
```diff
@@ #### 3c. SQL Query
-Core query uses a CTE + rollup to avoid correlated subquery fanout on larger result sets:
+Core query uses a CTE + ranked-notes rollup (window function) to avoid per-row correlated subqueries:
-WITH filtered_discussions AS (...),
-note_rollup AS (
- SELECT
- n.discussion_id,
- SUM(...) AS note_count,
- (SELECT ... LIMIT 1) AS first_author,
- (SELECT ... LIMIT 1) AS first_note_body,
- (SELECT ... LIMIT 1) AS position_new_path,
- (SELECT ... LIMIT 1) AS position_new_line
- FROM notes n
- ...
-)
+WITH filtered_discussions AS (...),
+ranked_notes AS (
+ SELECT
+ n.*,
+ ROW_NUMBER() OVER (PARTITION BY n.discussion_id ORDER BY n.position, n.id) AS rn
+ FROM notes n
+ WHERE n.discussion_id IN (SELECT id FROM filtered_discussions)
+),
+note_rollup AS (
+ SELECT
+ discussion_id,
+ SUM(CASE WHEN is_system = 0 THEN 1 ELSE 0 END) AS note_count,
+ MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
+ MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
+ MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
+ MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+ FROM ranked_notes
+ GROUP BY discussion_id
+)
```
4. Add direct GitLab ID filters for deterministic bridging
Rationale: Bridge workflows often start from one known ID. You already have `gitlab_note_id` in notes filters, but discussion filtering still looks internal-ID-centric. Add explicit GitLab-ID filters so agents do not need extra translation calls.
```diff
@@ #### 3a. CLI Args
pub struct DiscussionsArgs {
+ /// Filter by GitLab discussion ID
+ #[arg(long, help_heading = "Filters")]
+ pub gitlab_discussion_id: Option<String>,
@@
@@ #### 3d. Filters struct
pub struct DiscussionListFilters {
+ pub gitlab_discussion_id: Option<String>,
@@
}
```
```diff
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+#### 1g. Add `--gitlab-discussion-id` filter to notes
+Allow filtering notes directly by GitLab thread ID (not only internal discussion ID).
+This enables one-hop note retrieval from external references.
```
5. Add optional note expansion to `discussions` for fewer round-trips
Rationale: Today the agent flow is often `discussions -> show`. Optional embedded notes (`--include-notes N`) gives a fast path for “list unresolved threads with latest context” without forcing full show payloads.
```diff
@@ ### Design
lore -J discussions --for-mr 99 --resolution unresolved
+lore -J discussions --for-mr 99 --resolution unresolved --include-notes 2
@@ #### 3a. CLI Args
+ /// Include up to N latest notes per discussion (0 = none)
+ #[arg(long, default_value = "0", help_heading = "Output")]
+ pub include_notes: usize,
```
6. Upgrade robot-docs from string blobs to structured schema + explicit contract block
Rationale: `contains("gitlab_discussion_id")` tests on schema strings are brittle. A structured schema object gives machine-checked docs and reliable test assertions. Add a contract section for agent consumers.
```diff
@@ ## 4. Fix Robot-Docs Response Schemas
-#### 4a. Notes response_schema
-Replace stringly-typed schema snippets...
+#### 4a. Notes response_schema (structured)
+Represent response fields as JSON objects (field -> type/nullable), not freeform strings.
+#### 4g. Add `bridge_contract` section in robot-docs
+Publish canonical required fields per entity:
+- notes
+- discussions
+- show.discussions
+- show.notes
```
7. Strengthen validation: add CLI-level contract tests and perf guardrails
Rationale: Most current tests are unit-level struct/query checks. Add end-to-end JSON contract tests via command handlers, plus a benchmark-style regression test (ignored by default) so performance work stays intentional.
```diff
@@ ## Validation Criteria
8. Bridge Contract fields (...) are present in every applicable read payload
+9. Contract fields remain present even with `--fields` in robot mode
+10. `discussions` query meets performance guardrail on representative fixture (documented threshold)
@@ ### Tests
+#### Test: robot-mode fields cannot drop bridge contract keys
+Run notes/discussions JSON output through `filter_fields` path and assert required keys remain.
+
+#### Test: CLI contract integration
+Invoke command handlers for `notes`, `discussions`, `mrs <iid>`, parse JSON, assert required keys and types.
+
+#### Test (ignored): large-fixture performance regression
+Generate representative fixture and assert `query_discussions` stays under target elapsed time.
```
If you want, I can now produce a full “v2 plan” document that applies these diffs end-to-end (including revised delivery order and complete updated sections).


@@ -0,0 +1,147 @@
1. **Make `gitlab_note_id` explicit in all note-level payloads without breaking existing consumers**
Rationale: Your Bridge Contract already requires `gitlab_note_id`, but current plan keeps `gitlab_id` only in `notes` list while adding `gitlab_note_id` only in `show`. That forces agents to special-case commands. Add `gitlab_note_id` as an alias field everywhere note-level data appears, while keeping `gitlab_id` for compatibility.
```diff
@@ Bridge Contract (Cross-Cutting)
-Every read payload that surfaces notes or discussions MUST include:
+Every read payload that surfaces notes or discussions MUST include:
- project_path
- noteable_type
- parent_iid
- gitlab_discussion_id
- gitlab_note_id (when note-level data is returned — i.e., in notes list and show detail)
+ - Back-compat rule: note payloads may continue exposing `gitlab_id`, but MUST also expose `gitlab_note_id` with the same value.
@@ 1. Add `gitlab_discussion_id` to Notes Output
-#### 1c. Add field to `NoteListRowJson`
+#### 1c. Add fields to `NoteListRowJson`
+Add `gitlab_note_id` alias in addition to existing `gitlab_id` (no rename, no breakage).
@@ 1f. Update `--fields minimal` preset
-"notes" => ["id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
+"notes" => ["id", "gitlab_note_id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
```
2. **Avoid duplicate flag semantics for discussion filtering**
Rationale: `notes` already has `--discussion-id` and it already maps to `d.gitlab_discussion_id`. Adding a second independent flag/field (`--gitlab-discussion-id`) increases complexity and precedence bugs. Keep one backing filter field and make the new flag an alias.
```diff
@@ 1g. Add `--gitlab-discussion-id` filter to notes
-Allow filtering notes directly by GitLab discussion thread ID...
+Normalize discussion ID flags:
+- Keep one backing filter field (`discussion_id`)
+- Support both `--discussion-id` (existing) and `--gitlab-discussion-id` (alias)
+- If both are provided, clap should reject as duplicate/alias conflict
```
3. **Add ambiguity guardrails for cross-project discussion IDs**
Rationale: `gitlab_discussion_id` is unique per project, not globally. Filtering by discussion ID without project can return multiple rows across repos, which breaks deterministic write bridging. Fail fast with an `Ambiguous` error and actionable fix (`--project`).
```diff
@@ Bridge Contract (Cross-Cutting)
+### Ambiguity Guardrail
+When filtering by `gitlab_discussion_id` without `--project`, if multiple projects match:
+- return `Ambiguous` error
+- include matching project paths in message
+- suggest retry with `--project <path>`
```
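A minimal sketch of the guardrail logic, assuming the caller has already collected the project paths that match the discussion ID. The `LookupError` type and `check_ambiguity` function are hypothetical names for illustration.

```rust
use std::collections::BTreeSet;

/// Hypothetical error type standing in for the plan's `Ambiguous` error.
#[derive(Debug, PartialEq)]
enum LookupError {
    Ambiguous { message: String },
}

/// Fail fast when a discussion ID matches more than one project and no
/// `--project` scope was given.
fn check_ambiguity(
    matching_project_paths: &[&str],
    project_scoped: bool,
) -> Result<(), LookupError> {
    if project_scoped {
        return Ok(());
    }
    // Deduplicate; a BTreeSet also gives a stable order for the message.
    let distinct: BTreeSet<&str> = matching_project_paths.iter().copied().collect();
    if distinct.len() > 1 {
        let paths: Vec<&str> = distinct.into_iter().collect();
        return Err(LookupError::Ambiguous {
            message: format!(
                "discussion ID matches multiple projects: {}. Retry with --project <path>",
                paths.join(", ")
            ),
        });
    }
    Ok(())
}
```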
4. **Replace `--include-notes` N+1 retrieval with one batched top-N query**
Rationale: The current plan's per-discussion follow-up query scales poorly and creates latency spikes. Use a single window-function query over selected discussion IDs and group rows in Rust. This is both faster and more predictable.
```diff
@@ 3c-ii. Note expansion query (--include-notes)
-When `include_notes > 0`, after the main discussion query, run a follow-up query per discussion...
+When `include_notes > 0`, run one batched query:
+WITH ranked_notes AS (
+ SELECT
+ n.*,
+ d.gitlab_discussion_id,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY n.created_at DESC, n.id DESC
+ ) AS rn
+ FROM notes n
+ JOIN discussions d ON d.id = n.discussion_id
+ WHERE n.discussion_id IN ( ...selected discussion ids... )
+)
+SELECT ... FROM ranked_notes WHERE rn <= ?
+ORDER BY discussion_id, rn;
+
+Group by `discussion_id` in Rust and attach notes arrays without per-thread round-trips.
```
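The Rust-side grouping step might look roughly like the following. The `RankedNote` struct is a hypothetical, trimmed-down row shape; the rows are assumed to arrive already filtered to `rn <= N` and ordered by `discussion_id, rn`, as in the query above.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape returned by the batched `ranked_notes` query.
#[derive(Debug, Clone, PartialEq)]
struct RankedNote {
    discussion_id: i64,
    rn: u32,
    body: String,
}

/// Group flat rows into per-discussion note lists. Row order within each
/// discussion is preserved, so the SQL `ORDER BY discussion_id, rn` carries
/// through to the attached arrays.
fn group_notes(rows: Vec<RankedNote>) -> BTreeMap<i64, Vec<RankedNote>> {
    let mut grouped: BTreeMap<i64, Vec<RankedNote>> = BTreeMap::new();
    for row in rows {
        grouped.entry(row.discussion_id).or_default().push(row);
    }
    grouped
}
```

The result can then be attached to each discussion row by looking up its internal `id`, with no further database round-trips.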
5. **Add hard output guardrails and explicit truncation metadata**
Rationale: `--limit` and `--include-notes` are unbounded today. For robot workflows this can accidentally generate huge payloads. Cap values and surface effective limits plus truncation state in `meta`.
```diff
@@ 3a. CLI Args
- pub limit: usize,
+ pub limit: usize, // clamp to max (e.g., 500)
- pub include_notes: usize,
+ pub include_notes: usize, // clamp to max (e.g., 20)
@@ Response Schema
- "meta": { "elapsed_ms": 12 }
+ "meta": {
+ "elapsed_ms": 12,
+ "effective_limit": 50,
+ "effective_include_notes": 2,
+ "has_more": true
+ }
```
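A small sketch of the clamping and the resulting `meta` values, assuming the example caps from the diff (500 and 20) and a first-page request. The `Meta` struct and `build_meta` function are hypothetical.

```rust
/// Example caps taken from the "e.g." values in the plan diff.
const MAX_LIMIT: usize = 500;
const MAX_INCLUDE_NOTES: usize = 20;

/// Subset of the proposed `meta` block (hypothetical struct).
#[derive(Debug, PartialEq)]
struct Meta {
    effective_limit: usize,
    effective_include_notes: usize,
    has_more: bool,
}

/// Clamp requested values to the caps and report whether rows beyond the
/// first page exist. Assumes no cursor offset, i.e. the first page.
fn build_meta(requested_limit: usize, requested_include_notes: usize, total_count: usize) -> Meta {
    let effective_limit = requested_limit.min(MAX_LIMIT);
    let effective_include_notes = requested_include_notes.min(MAX_INCLUDE_NOTES);
    Meta {
        effective_limit,
        effective_include_notes,
        has_more: total_count > effective_limit,
    }
}
```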
6. **Strengthen deterministic ordering and null handling**
Rationale: `first_note_at`, `last_note_at`, and note `position` can be null/incomplete during partial sync states. Add null-safe ordering to avoid unstable output and flaky automation.
```diff
@@ 2c. Update queries to SELECT new fields
-... ORDER BY first_note_at
+... ORDER BY COALESCE(first_note_at, last_note_at, 0), id
@@ show note query
-ORDER BY position
+ORDER BY COALESCE(position, 9223372036854775807), created_at, id
@@ 3c. SQL Query
-ORDER BY {sort_column} {order}
+ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
```
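The effect of `COALESCE(first_note_at, last_note_at, 0), id` can be mirrored in memory to show why the ordering becomes total. This is a sketch with a hypothetical `Discussion` struct, not code from the plan.

```rust
/// Hypothetical discussion row with nullable timestamps (ms since epoch).
#[derive(Debug, Clone, PartialEq)]
struct Discussion {
    id: i64,
    first_note_at: Option<i64>,
    last_note_at: Option<i64>,
}

/// Sort key mirroring `COALESCE(first_note_at, last_note_at, 0), id`.
/// Missing timestamps fall back to 0 and `id` breaks ties, so every row
/// has a well-defined position.
fn sort_key(d: &Discussion) -> (i64, i64) {
    (d.first_note_at.or(d.last_note_at).unwrap_or(0), d.id)
}

/// Sort ascending by the null-safe key.
fn sort_discussions(rows: &mut [Discussion]) {
    rows.sort_by_key(sort_key);
}
```

Rows with no timestamps at all sort first in ascending order, and two rows with the same effective timestamp are ordered by `id`, so repeated runs return the same sequence.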
7. **Make write-bridging more useful with optional command hints**
Rationale: Exposing IDs is necessary but not sufficient; agents still need to assemble endpoints repeatedly. Add optional `--with-write-hints` that injects compact endpoint templates (`reply`, `resolve`) derived from row context. This improves usability without bloating default output.
```diff
@@ 3a. CLI Args
+ /// Include machine-actionable glab write hints per row
+ #[arg(long, help_heading = "Output")]
+ pub with_write_hints: bool,
@@ Response Schema (notes/discussions/show)
+ "write_hints?": {
+ "reply_endpoint": "string",
+ "resolve_endpoint?": "string"
+ }
```
8. **Upgrade robot-docs/contract validation from string-contains to parity checks**
Rationale: `contains("gitlab_discussion_id")` catches very little and allows schema drift. Build field-set parity tests that compare actual serialized JSON keys to robot-docs declared fields for `notes`, `discussions`, and `show` discussion nodes.
```diff
@@ 4f. Add robot-docs contract tests
-assert!(notes_schema.contains("gitlab_discussion_id"));
+let declared = parse_schema_field_list(notes_schema);
+let sample = sample_notes_row_json_keys();
+assert_required_subset(&declared, &["project_path","noteable_type","parent_iid","gitlab_discussion_id","gitlab_note_id"]);
+assert_schema_matches_payload(&declared, &sample);
@@ 4g. Add CLI-level contract integration tests
+Add parity tests for:
+- notes list JSON
+- discussions list JSON
+- issues show discussions[*]
+- mrs show discussions[*]
```
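The core of such a parity check is a two-way set difference between declared fields and actual payload keys. The helper below is a hypothetical sketch of that comparison; it is not the `assert_schema_matches_payload` function named in the diff, whose signature the plan does not define.

```rust
use std::collections::BTreeSet;

/// Compare declared schema fields against actual payload keys.
/// Returns (undeclared, missing): keys present in the payload but not in
/// the schema, and fields declared in the schema but absent from the payload.
fn schema_diff(declared: &[&str], payload_keys: &[&str]) -> (Vec<String>, Vec<String>) {
    let d: BTreeSet<&str> = declared.iter().copied().collect();
    let p: BTreeSet<&str> = payload_keys.iter().copied().collect();
    let undeclared = p.difference(&d).map(|s| s.to_string()).collect();
    let missing = d.difference(&p).map(|s| s.to_string()).collect();
    (undeclared, missing)
}
```

A parity test would assert that both returned lists are empty, which catches drift in either direction rather than only a missing substring.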
If you want, I can produce a full revised v3 plan text with these edits merged end-to-end so it's ready to execute directly.


@@ -0,0 +1,207 @@
Below are the highest-impact revisions I'd make to this plan. I excluded everything listed in your `## Rejected Recommendations` section.
**1. Fix a correctness bug in the ambiguity guardrail (must run before `LIMIT`)**
The current post-query ambiguity check can silently fail when `--limit` truncates results to one project even though multiple projects match the same `gitlab_discussion_id`. That creates non-deterministic write targeting risk.
```diff
@@ ## Ambiguity Guardrail
-**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project`
-was provided, check if the result set spans multiple `project_path` values.
+**Implementation**: Run a preflight distinct-project check when `gitlab_discussion_id` is set
+and `--project` was not provided, before the main list query applies `LIMIT`.
+Use:
+```sql
+SELECT DISTINCT p.path_with_namespace
+FROM discussions d
+JOIN projects p ON p.id = d.project_id
+WHERE d.gitlab_discussion_id = ?
+LIMIT 3
+```
+If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with project
+paths and suggestion to retry with `--project <path>`.
```
---
**2. Add `gitlab_project_id` to the Bridge Contract**
`project_path` is human-friendly but mutable (renames/transfers). `gitlab_project_id` gives a stable write target and avoids path re-resolution failures.
```diff
@@ ## Bridge Contract (Cross-Cutting)
Every read payload that surfaces notes or discussions **MUST** include:
- `project_path`
+- `gitlab_project_id`
- `noteable_type`
- `parent_iid`
- `gitlab_discussion_id`
- `gitlab_note_id`
@@
const BRIDGE_FIELDS_NOTES: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id", "gitlab_note_id",
];
const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id",
];
```
---
**3. Replace stringly-typed filter/sort fields with enums end-to-end**
Right now `sort`, `order`, `resolution`, `noteable_type` are mostly `String`. This is fragile and risks unsafe SQL interpolation drift over time. Typed enums make invalid states unrepresentable.
```diff
@@ ## 3a. CLI Args
- pub resolution: Option<String>,
+ pub resolution: Option<ResolutionFilter>,
@@
- pub noteable_type: Option<String>,
+ pub noteable_type: Option<NoteableTypeFilter>,
@@
- pub sort: String,
+ pub sort: DiscussionSortField,
@@
- pub asc: bool,
+ pub order: SortDirection,
@@ ## 3d. Filters struct
- pub resolution: Option<String>,
- pub noteable_type: Option<String>,
- pub sort: String,
- pub order: String,
+ pub resolution: Option<ResolutionFilter>,
+ pub noteable_type: Option<NoteableTypeFilter>,
+ pub sort: DiscussionSortField,
+ pub order: SortDirection,
@@
+Map enum -> SQL fragment via `match` in query builder; never interpolate raw strings.
```
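A sketch of the enum-to-SQL mapping described in the last diff line. The type names `DiscussionSortField` and `SortDirection` come from the diff; the variant names and the `order_by_clause` helper are assumptions made for illustration.

```rust
/// Sort column choices. Variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DiscussionSortField {
    FirstNote,
    LastNote,
}

/// Sort direction. Variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SortDirection {
    Asc,
    Desc,
}

impl DiscussionSortField {
    /// Every variant maps to a fixed column name, so user input never
    /// reaches the SQL string directly.
    fn as_sql(self) -> &'static str {
        match self {
            DiscussionSortField::FirstNote => "first_note_at",
            DiscussionSortField::LastNote => "last_note_at",
        }
    }
}

impl SortDirection {
    fn as_sql(self) -> &'static str {
        match self {
            SortDirection::Asc => "ASC",
            SortDirection::Desc => "DESC",
        }
    }
}

/// Build the ORDER BY clause from typed values only.
fn order_by_clause(sort: DiscussionSortField, order: SortDirection) -> String {
    format!(
        "ORDER BY {col} {dir}, id {dir}",
        col = sort.as_sql(),
        dir = order.as_sql()
    )
}
```

Because the `match` is exhaustive, adding a new sort field without deciding its column name becomes a compile error instead of a runtime surprise.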
---
**4. Enforce snapshot consistency for multi-query commands**
`discussions` with `--include-notes` does multiple reads. Without a single read transaction, concurrent ingest can produce mismatched `total_count`, row set, and expanded notes.
```diff
@@ ## 3c. SQL Query
-pub fn query_discussions(...)
+pub fn query_discussions(...)
{
+ // Run count query + page query + note expansion under one deferred read transaction
+ // so output is a single consistent snapshot.
+ let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
...
+ tx.commit()?;
}
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+Apply the same snapshot rule to `query_notes` when returning `total_count` + paged rows.
```
---
**5. Correct first-note rollup semantics (current CTE can return null/incorrect `first_author`)**
In the proposed SQL, `rn=1` is computed over all notes but then filtered with `is_system=0`, so threads with a leading system note may incorrectly lose `first_author`/snippet. Also path rollup uses non-deterministic `MAX(...)`.
```diff
@@ ## 3c. SQL Query
-ranked_notes AS (
+ranked_notes AS (
SELECT
n.discussion_id,
n.author_username,
n.body,
n.is_system,
n.position_new_path,
n.position_new_line,
- ROW_NUMBER() OVER (
- PARTITION BY n.discussion_id
- ORDER BY n.position, n.id
- ) AS rn
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.is_system = 0 THEN 0 ELSE 1 END, n.created_at, n.id
+ ) AS rn_first_note,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.position_new_path IS NULL THEN 1 ELSE 0 END, n.created_at, n.id
+ ) AS rn_first_position
@@
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
- MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
- MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN author_username END) AS first_author,
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN body END) AS first_note_body,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_path END) AS position_new_path,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_line END) AS position_new_line
```
---
**6. Add per-discussion truncation signals for `--include-notes`**
Top-level `has_more` is useful, but agents also need to know if an individual thread's notes were truncated. Otherwise they can't tell if a thread is complete.
```diff
@@ ## Response Schema
{
"gitlab_discussion_id": "...",
...
- "notes": []
+ "included_note_count": 0,
+ "has_more_notes": false,
+ "notes": []
}
@@ ## 3b. Domain Structs
pub struct DiscussionListRowJson {
@@
+ pub included_note_count: usize,
+ pub has_more_notes: bool,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub notes: Vec<NoteListRowJson>,
}
@@ ## 3c-ii. Note expansion query (--include-notes)
-Group by `discussion_id` in Rust and attach notes arrays...
+Group by `discussion_id` in Rust, attach notes arrays, and set:
+`included_note_count = notes.len()`,
+`has_more_notes = note_count > included_note_count`.
```
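The two signals reduce to a small pure function. A sketch, assuming `note_count` is the discussion's total and `included` is `notes.len()` after any cap; the `NoteTruncation` type and `note_truncation` name are illustrative only:

```rust
/// Per-discussion truncation signals, as proposed for `DiscussionListRowJson`.
#[derive(Debug, PartialEq, Eq)]
pub struct NoteTruncation {
    pub included_note_count: usize,
    pub has_more_notes: bool,
}

/// `note_count` is the thread's total; `included` is how many notes were attached.
pub fn note_truncation(note_count: usize, included: usize) -> NoteTruncation {
    NoteTruncation {
        included_note_count: included,
        has_more_notes: note_count > included,
    }
}
```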
---
**7. Add explicit query-plan gate and targeted index workstream (measured, not speculative)**
This plan introduces heavy discussion-centric reads. You should bake in deterministic performance validation with `EXPLAIN QUERY PLAN` and only then add indexes if missing.
```diff
@@ ## Scope: Four workstreams, delivered in order:
-4. Fix robot-docs to list actual field names instead of opaque type references
+4. Add query-plan validation + targeted index updates for new discussion queries
+5. Fix robot-docs to list actual field names instead of opaque type references
@@
+## 4. Query-Plan Validation and Targeted Indexes
+
+Before and after implementing `query_discussions`, capture `EXPLAIN QUERY PLAN` for:
+- `--for-mr <iid> --resolution unresolved`
+- `--project <path> --since 7d --sort last_note`
+- `--gitlab-discussion-id <id>`
+
+If plans show table scans on `notes`/`discussions`, add indexes in `MIGRATIONS` array:
+- `discussions(project_id, gitlab_discussion_id)`
+- `discussions(merge_request_id, last_note_at, id)`
+- `notes(discussion_id, created_at DESC, id DESC)`
+- `notes(discussion_id, position, id)`
+
+Tests: assert the new query paths return expected rows under indexed schema and no regressions.
```
---
If you want, I can produce a single consolidated “iteration 4” version of the plan text with all seven revisions merged in place.


@@ -0,0 +1,160 @@
I reviewed the plan end-to-end and focused only on new improvements (none of the items in `## Rejected Recommendations` are re-proposed).
1. Add direct `--discussion-id` retrieval paths
Rationale: This removes a full discovery hop for the exact workflow that failed (replying to a known thread). It also reduces ambiguity and query cost when an agent already has the thread ID.
```diff
@@ Core Changes
| 7 | Fix robot-docs to list actual field names | Docs | Small |
+| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |
@@ Change 3: Add Standalone `discussions` List Command
lore -J discussions --for-mr 99 --cursor <token> # keyset pagination
+lore -J discussions --discussion-id 6a9c1750b37d... # direct lookup
@@ 3a. CLI Args
+ #[arg(long, conflicts_with_all = ["for_issue", "for_mr"], help_heading = "Filters")]
+ pub discussion_id: Option<String>,
@@ Change 1: Add `gitlab_discussion_id` to Notes Output
+Add `--discussion-id <hex>` filter to `notes` for direct note retrieval within one thread.
```
2. Add a shared filter compiler to eliminate count/query drift
Rationale: The plan currently repeats filters across data query, `total_count`, and `incomplete_rows` count queries. That is a classic reliability bug source. A single compiled filter object makes count semantics provably consistent.
```diff
@@ Count Semantics (Cross-Cutting Convention)
+## Filter Compiler (NEW, Cross-Cutting Convention)
+All list commands must build predicates via a shared `CompiledFilters` object that emits:
+- SQL predicate fragment
+- bind parameters
+- canonical filter string (for cursor hash)
+The same compiled object is reused by:
+- page data query
+- `total_count` query
+- `incomplete_rows` query
```
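One possible shape for the compiled filter object, as a dependency-free sketch. The `Bind` enum, the `eq_filter` builder, and the column names in the usage below are assumptions for illustration; a real implementation would likely use the database driver's own parameter type.

```rust
/// Hypothetical bind value standing in for the driver's parameter type.
#[derive(Debug, Clone, PartialEq)]
pub enum Bind {
    Int(i64),
    Text(String),
}

#[derive(Debug, Default)]
pub struct CompiledFilters {
    clauses: Vec<String>,
    binds: Vec<Bind>,
    canonical: Vec<String>,
}

impl CompiledFilters {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds `column = ?`, its bind value, and a canonical `key=value` entry.
    pub fn eq_filter(mut self, column: &str, key: &str, value: Bind) -> Self {
        let rendered = match &value {
            Bind::Int(i) => i.to_string(),
            Bind::Text(s) => s.clone(),
        };
        self.clauses.push(format!("{column} = ?"));
        self.canonical.push(format!("{key}={rendered}"));
        self.binds.push(value);
        self
    }

    /// Predicate fragment shared by the page, `total_count`, and `incomplete_rows` queries.
    pub fn where_sql(&self) -> String {
        if self.clauses.is_empty() {
            "1 = 1".to_string()
        } else {
            self.clauses.join(" AND ")
        }
    }

    /// Bind parameters in the same order as the `?` placeholders.
    pub fn binds(&self) -> &[Bind] {
        &self.binds
    }

    /// Order-independent filter string, suitable for hashing into a cursor.
    pub fn canonical(&self) -> String {
        let mut parts = self.canonical.clone();
        parts.sort();
        parts.join("&")
    }
}
```

Because all three queries read the predicate and binds from the same value, a filter added in one place cannot be forgotten in the count queries.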
3. Harden keyset pagination semantics for `DESC`, limits, and client ergonomics
Rationale: `(sort_value, id) > (?, ?)` is only correct for ascending order. Descending sort needs `<`. Also add explicit `has_more` so clients don't infer it from cursor nullability.
```diff
@@ Keyset Pagination (Cross-Cutting, Change B)
-```sql
-WHERE (sort_value, id) > (?, ?)
-```
+Use comparator by order:
+- ASC: `(sort_value, id) > (?, ?)`
+- DESC: `(sort_value, id) < (?, ?)`
@@ 3a. CLI Args
+ #[arg(short = 'n', long = "limit", default_value = "50", value_parser = clap::value_parser!(usize).range(1..=500), help_heading = "Output")]
+ pub limit: usize,
@@ Response Schema
- "next_cursor": "aW...xyz=="
+ "next_cursor": "aW...xyz==",
+ "has_more": true
```
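To make the comparator rule concrete, here is a small sketch that emits the predicate per sort order and checks the same logic against in-memory `(sort_value, id)` tuples. `SortOrder`, `keyset_predicate`, and `next_page` are hypothetical names; Rust's lexicographic tuple comparison plays the role of the SQL row-value comparison.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortOrder {
    Asc,
    Desc,
}

/// Row-value predicate selecting rows strictly after the cursor.
pub fn keyset_predicate(sort_column: &str, order: SortOrder) -> String {
    let op = match order {
        SortOrder::Asc => ">",
        SortOrder::Desc => "<",
    };
    format!("({sort_column}, id) {op} (?, ?)")
}

/// In-memory equivalent over `(sort_value, id)` rows; returns the page and `has_more`.
pub fn next_page(
    rows: &[(i64, i64)],
    cursor: Option<(i64, i64)>,
    order: SortOrder,
    limit: usize,
) -> (Vec<(i64, i64)>, bool) {
    let mut sorted = rows.to_vec();
    sorted.sort();
    if order == SortOrder::Desc {
        sorted.reverse();
    }
    let mut remaining = sorted.into_iter().filter(|row| match (cursor, order) {
        (None, _) => true,
        (Some(c), SortOrder::Asc) => *row > c,
        (Some(c), SortOrder::Desc) => *row < c,
    });
    let page: Vec<(i64, i64)> = remaining.by_ref().take(limit).collect();
    // Any row left after filling the page means another page exists.
    let has_more = remaining.next().is_some();
    (page, has_more)
}
```

Walking a descending listing two rows at a time with the last row of each page as the next cursor visits every row exactly once, which is the no-overlap, no-gap property the pagination tests should assert.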
4. Add DB-level entity integrity invariants (not just response invariants)
Rationale: Response-side filtering is good, but DB correctness should also be guarded. This prevents silent corruption and bad joins from ingestion or future migrations.
```diff
@@ Contract Invariants (NEW)
+### Entity Integrity Invariants (DB + Ingest)
+1. `discussions` must belong to exactly one parent (`issue_id XOR merge_request_id`).
+2. `discussions.noteable_type` must match the populated parent column.
+3. Natural-key uniqueness is enforced where valid:
+ - `(project_id, gitlab_discussion_id)` unique for discussions.
+4. Ingestion must reject/quarantine rows violating invariants and report counts.
@@ Supporting Indexes (Cross-Cutting, Change D)
+CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
+ ON discussions(project_id, gitlab_discussion_id);
```
5. Switch bulk note loading to streaming grouping (avoid large intermediate vecs)
Rationale: Current bulk strategy still materializes all notes before grouping. Streaming into the map cuts peak memory and improves large-MR stability.
```diff
@@ Change 2e. Constructor — use bulk notes map
-let all_note_rows: Vec<MrNoteDetail> = ... // From bulk query above
-let notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> =
- all_note_rows.into_iter().fold(HashMap::new(), |mut map, note| {
- map.entry(note.discussion_id).or_insert_with(Vec::new).push(note);
- map
- });
+let mut notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> = HashMap::new();
+for row in bulk_note_stmt.query_map(params, map_note_row)? {
+ let note = row?;
+ notes_by_discussion.entry(note.discussion_id).or_default().push(note);
+}
```
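The same grouping loop can be exercised without a database. This sketch accepts any iterator of fallible rows, which is the shape a row-mapping query iterator typically has; the `body` field and the `group_notes` name are assumptions, only `discussion_id` comes from the plan.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the plan's `MrNoteDetail`.
#[derive(Debug, Clone, PartialEq)]
pub struct MrNoteDetail {
    pub discussion_id: i64,
    pub body: String,
}

/// Groups rows as they arrive and stops at the first row error,
/// so no intermediate `Vec` of all notes is built.
pub fn group_notes<I, E>(rows: I) -> Result<HashMap<i64, Vec<MrNoteDetail>>, E>
where
    I: IntoIterator<Item = Result<MrNoteDetail, E>>,
{
    let mut by_discussion: HashMap<i64, Vec<MrNoteDetail>> = HashMap::new();
    for row in rows {
        let note = row?;
        by_discussion
            .entry(note.discussion_id)
            .or_default()
            .push(note);
    }
    Ok(by_discussion)
}
```

Notes keep their arrival order within each discussion, so the bulk query's `ORDER BY` still determines the order inside every thread.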
6. Make freshness tri-state (`fresh|stale|unknown`) and fail closed on unknown with `--require-fresh`
Rationale: `stale: bool` alone cannot represent “never synced / unknown project freshness.” For write safety, unknown freshness should be explicit and reject under freshness constraints.
```diff
@@ Freshness Metadata & Staleness Guards
pub struct ResponseMeta {
pub elapsed_ms: i64,
pub data_as_of_iso: String,
pub sync_lag_seconds: i64,
pub stale: bool,
+ pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub freshness_reason: Option<String>,
pub incomplete_rows: i64,
@@
-if sync_lag_seconds > max_age_secs {
+if freshness_state == "unknown" || sync_lag_seconds > max_age_secs {
```
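A sketch of the tri-state logic, using an enum rather than the `String` field shown in the diff; that substitution is my assumption. It also assumes `None` means the project has never been synced. `classify` and `check_require_fresh` are illustrative names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FreshnessState {
    Fresh,
    Stale,
    Unknown,
}

impl FreshnessState {
    /// Wire value for `meta.freshness_state`.
    pub fn as_str(self) -> &'static str {
        match self {
            FreshnessState::Fresh => "fresh",
            FreshnessState::Stale => "stale",
            FreshnessState::Unknown => "unknown",
        }
    }
}

/// `sync_lag_seconds` is `None` when no sync has ever been recorded.
pub fn classify(sync_lag_seconds: Option<i64>, stale_after_secs: i64) -> FreshnessState {
    match sync_lag_seconds {
        None => FreshnessState::Unknown,
        Some(lag) if lag > stale_after_secs => FreshnessState::Stale,
        Some(_) => FreshnessState::Fresh,
    }
}

/// Guard for `--require-fresh`: fails closed when freshness is unknown.
pub fn check_require_fresh(sync_lag_seconds: Option<i64>, max_age_secs: i64) -> Result<(), String> {
    match sync_lag_seconds {
        None => Err("freshness unknown: no sync recorded".to_string()),
        Some(lag) if lag > max_age_secs => {
            Err(format!("data is {lag}s old, limit is {max_age_secs}s"))
        }
        Some(_) => Ok(()),
    }
}
```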
7. Tune indexes to match actual ORDER BY paths in window queries
Rationale: `idx_notes_discussion_position` is likely insufficient for the two window orderings. A covering-style index aligned with partition/order keys reduces random table lookups.
```diff
@@ Supporting Indexes (Cross-Cutting, Change D)
--- Notes: window function ORDER BY (discussion_id, position) for ROW_NUMBER()
-CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
- ON notes(discussion_id, position);
+-- Notes: support dual ROW_NUMBER() orderings and reduce table lookups
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_window
+ ON notes(discussion_id, is_system, position, created_at, gitlab_id);
```
8. Add a phased rollout gate before strict exclusion becomes default
Rationale: Enforcing `gitlab_* IS NOT NULL` immediately can hide data if existing rows are incomplete. A short observation gate prevents sudden regressions while preserving the end-state contract.
```diff
@@ Delivery Order
+Batch 0: Observability gate (NEW)
+- Ship `incomplete_rows` and freshness meta first
+- Measure incomplete rate across real datasets
+- If incomplete ratio <= threshold, enable strict exclusion defaults
+- If above threshold, block rollout and fix ingestion quality first
+
Change 1 (notes output) ──┐
```
9. Add property-based invariants for pagination/count correctness
Rationale: Your current tests are scenario-based and good, but randomized property tests are much better at catching edge-case cursor/count bugs.
```diff
@@ Tests (Change 3 / Change B)
+**Test 12**: Property-based pagination invariants (`proptest`)
+```rust
+#[test]
+fn prop_discussion_cursor_no_overlap_no_gap_under_random_data() { /* ... */ }
+```
+
+**Test 13**: Property-based count invariants
+```rust
+#[test]
+fn prop_total_count_and_incomplete_rows_match_filter_partition() { /* ... */ }
+```
```
If you want, I can now produce a fully consolidated “Plan v4” that applies these diffs cleanly into your original document so it reads as a single coherent spec.


@@ -0,0 +1,158 @@
I reviewed the whole plan and only proposed changes that are not in your `## Rejected Recommendations`.
1. **Fix plan-internal inconsistencies first**
Analysis: The plan currently has a few self-contradictions (`8` vs `9` cross-cutting improvements, `stale` still referenced after moving to tri-state freshness). Cleaning this prevents implementation drift and bad AC validation.
```diff
--- a/plan.md
+++ b/plan.md
@@
-**Scope**: 8 core changes + 8 cross-cutting architectural improvements across 3 tiers:
+**Scope**: 8 core changes + 9 cross-cutting architectural improvements across 3 tiers:
@@ AC-7: Freshness Metadata Present & Staleness Guards Work
-lore -J notes -n 1 | jq '.meta | {data_as_of_iso, sync_lag_seconds, stale}'
-# All fields present, stale=false if recently synced
+lore -J notes -n 1 | jq '.meta | {data_as_of_iso, sync_lag_seconds, freshness_state}'
+# All fields present, freshness_state is one of fresh|stale|unknown
@@ Change 6 Response Schema example
- "stale": false,
+ "freshness_state": "fresh",
```
2. **Require snapshot-consistent list responses (page + counts)**
Analysis: `total_count`, `incomplete_rows`, and page rows can drift if sync writes between queries. Enforcing a single read snapshot for all list commands makes pagination and counts deterministic.
```diff
--- a/plan.md
+++ b/plan.md
@@ Count Semantics (Cross-Cutting Convention)
All list commands use consistent count fields:
+All three queries (`page`, `total_count`, `incomplete_rows`) MUST execute inside one read transaction/snapshot.
+This guarantees count/page consistency under concurrent sync writes.
```
3. **Use RAII transactions instead of manual `BEGIN/COMMIT`**
Analysis: Manual `execute_batch("BEGIN...")` is fragile on early returns. `rusqlite::Transaction` guarantees rollback on error and removes transaction-leak risk.
```diff
--- a/plan.md
+++ b/plan.md
@@ Change 2: Consistency guarantee
-conn.execute_batch("BEGIN DEFERRED")?;
-// ... discussion query ...
-// ... bulk note query ...
-conn.execute_batch("COMMIT")?;
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
+// ... discussion query ...
+// ... bulk note query ...
+tx.commit()?;
```
4. **Allow small focused new modules for query infrastructure**
Analysis: Keeping everything in `list.rs`/`show.rs` will become a maintenance hotspot as filters/cursors/freshness expand. A small module split reduces coupling and regression risk.
```diff
--- a/plan.md
+++ b/plan.md
@@ Change 3: File Architecture
-**No new files.** Follow existing patterns:
+Allow focused infra modules for shared logic:
+- `src/cli/query/filters.rs` (CompiledFilters + builders)
+- `src/cli/query/cursor.rs` (encode/decode/validate v2 cursors)
+- `src/cli/query/freshness.rs` (freshness computation + guards)
+Command handlers remain in existing files.
```
5. **Add ingest-time `discussion_rollups` to avoid repeated heavy window scans**
Analysis: Window functions are good, but doing them on every read over large note volumes is still expensive. Precomputing rollups during ingest gives lower and more predictable p95 latency while keeping read paths simpler.
```diff
--- a/plan.md
+++ b/plan.md
@@ Architectural Improvements (Cross-Cutting)
+| J | Ingest-time discussion rollups (`discussion_rollups`) | Performance | Medium |
@@ Change 3 SQL strategy
-Use `ROW_NUMBER()` window function instead of correlated subqueries...
+Primary path: join precomputed `discussion_rollups` for `note_count`, `first_author`,
+`first_note_body`, `position_new_path`, `position_new_line`.
+Fallback path: window-function recompute if rollup row is missing (defensive correctness).
```
6. **Add deterministic numeric project selector `--project-id`**
Analysis: `-p group/repo` is human-friendly, but numeric project IDs are safer for robots and avoid fuzzy/project-path ambiguity. This reduces false ambiguity failures and lookup overhead.
```diff
--- a/plan.md
+++ b/plan.md
@@ DiscussionsArgs
#[arg(short = 'p', long, help_heading = "Filters")]
pub project: Option<String>,
+ #[arg(long, conflicts_with = "project", help_heading = "Filters")]
+ pub project_id: Option<i64>,
@@ Ambiguity handling
+If `--project-id` is provided, IID resolution is scoped directly to that project.
+`--project-id` takes precedence over path-based project matching.
```
7. **Make path filtering rename-aware (`old` + `new`)**
Analysis: Current `--path` strategy only using `position_new_path` misses deleted/renamed-file discussions. Supporting side selection makes the feature materially more useful for review workflows.
```diff
--- a/plan.md
+++ b/plan.md
@@ DiscussionsArgs
#[arg(long, help_heading = "Filters")]
pub path: Option<String>,
+ #[arg(long, value_parser = ["either", "new", "old"], default_value = "either", help_heading = "Filters")]
+ pub path_side: String,
@@ Change 3 filtering
-Path filter matches `position_new_path`.
+Path filter semantics:
+- `either` (default): match `position_new_path` OR `position_old_path`
+- `new`: match only `position_new_path`
+- `old`: match only `position_old_path`
```
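The side-selection semantics can be sketched as a typed enum plus a match function. This assumes exact path equality; the plan may intend prefix or glob matching instead. `PathSide` and `path_matches` are hypothetical names.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PathSide {
    Either,
    New,
    Old,
}

impl FromStr for PathSide {
    type Err = String;

    /// Parses the `--path-side` value.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "either" => Ok(PathSide::Either),
            "new" => Ok(PathSide::New),
            "old" => Ok(PathSide::Old),
            other => Err(format!("invalid path side: {other}")),
        }
    }
}

/// Exact-match check against `position_new_path` / `position_old_path`.
pub fn path_matches(
    side: PathSide,
    path: &str,
    new_path: Option<&str>,
    old_path: Option<&str>,
) -> bool {
    let new_hit = new_path == Some(path);
    let old_hit = old_path == Some(path);
    match side {
        PathSide::Either => new_hit || old_hit,
        PathSide::New => new_hit,
        PathSide::Old => old_hit,
    }
}
```

Parsing into an enum rather than carrying a `String` keeps invalid values from reaching the query builder.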
8. **Add explicit freshness behavior for empty-result queries + bootstrap backfill**
Analysis: Freshness based only on “participating rows” is undefined when results are empty. Define deterministic behavior and backfill `project_sync_state` on migration so `unknown` doesn't spike unexpectedly after deploy.
```diff
--- a/plan.md
+++ b/plan.md
@@ Freshness state logic
+Empty-result rules:
+- If query is project-scoped (`-p` or `--project-id`), freshness is computed from that project even when no rows match.
+- If query is unscoped and returns zero rows, freshness is computed from all tracked projects.
@@ A1. Track per-project sync timestamp
+Migration step: seed `project_sync_state` from latest known sync metadata where available
+to avoid mass `unknown` freshness immediately after rollout.
```
9. **Upgrade `--discussion-id` from filter-only to first-class thread retrieval**
Analysis: Filtering list output by discussion ID still returns list-shaped data and partial note context. A direct thread retrieval mode is faster for agent workflows and avoids extra commands.
```diff
--- a/plan.md
+++ b/plan.md
@@ Core Changes
-| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |
+| 8 | Add direct `--discussion-id` filter + single-thread retrieval mode | Core | Medium |
@@ Change 8
+lore -J discussions --discussion-id <id> --full-thread
+# Returns one discussion with full notes payload (same note schema as show command).
```
10. **Replace ad-hoc AC performance timing with repeatable perf harness**
Analysis: `time lore ...` is noisy and machine-dependent. A reproducible seeded benchmark test gives stable guardrails and catches regressions earlier.
```diff
--- a/plan.md
+++ b/plan.md
@@ AC-10: Performance Budget
-time lore -J discussions --for-mr <iid> -n 100
-# real 0m0.100s (p95 < 150ms)
+cargo test --test perf_discussions -- --ignored --nocapture
+# Uses seeded fixture DB and N repeated runs; asserts p95 < 150ms for target query shape.
```
If you want, I can also produce a fully merged “iteration 5” rewritten plan document with these edits applied end-to-end so it's directly executable by an implementation agent.


@@ -0,0 +1,143 @@
Strong plan overall. The biggest gaps I'd fix are around sync-health correctness, idempotency/integrity under repeated ingests, deleted-entity lifecycle, and reducing schema drift risk without heavy reflection machinery.
I avoided everything in your `## Rejected Recommendations` section.
**1. Add Sync Health Semantics (not just age)**
Time freshness alone can mislead after partial/failed syncs. Agents need to know whether data is both recent and complete.
```diff
@@ ## Freshness Metadata & Staleness Guards (Cross-Cutting, Change A/F/G)
- pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ pub sync_status: String, // "ok" | "partial" | "failed" | "never"
+ pub last_successful_sync_run_id: Option<i64>,
+ pub last_attempted_sync_run_id: Option<i64>,
@@
-#[arg(long, help_heading = "Freshness")]
-pub require_fresh: Option<String>,
+#[arg(long, help_heading = "Freshness")]
+pub require_fresh: Option<String>,
+#[arg(long, help_heading = "Freshness")]
+pub require_sync_ok: bool,
```
Rationale: this prevents false confidence when one project is fresh-by-time but latest sync actually failed or was partial.
---
**2. Add `--require-complete` Guard for Missing Required IDs**
You already expose `meta.incomplete_rows`; add a hard gate for automation.
```diff
@@ ## Count Semantics (Cross-Cutting Convention)
`incomplete_rows` is computed via a dedicated COUNT query...
+Add CLI guard:
+`--require-complete` fails with exit code 19 when `meta.incomplete_rows > 0`.
+Suggested action: `lore sync --full`.
```
Rationale: agents can fail fast instead of silently acting on partial datasets.
---
**3. Strengthen Ingestion Idempotency + Referential Integrity for Notes**
You added natural-key uniqueness for discussions; do the same for notes and enforce parent integrity at DB level.
```diff
@@ ## Supporting Indexes (Cross-Cutting, Change D)
CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
ON discussions(project_id, gitlab_discussion_id);
+CREATE UNIQUE INDEX IF NOT EXISTS idx_notes_project_gitlab_id
+ ON notes(project_id, gitlab_id);
+
+-- Referential integrity
+-- notes.discussion_id REFERENCES discussions(id)
+-- notes.project_id REFERENCES projects(id)
```
Rationale: repeated syncs and retries won't duplicate notes, and orphaned rows can't accumulate.
---
**4. Add Deleted/Tombstoned Entity Lifecycle**
The current plan excludes null IDs but doesn't define behavior when GitLab entities are deleted after sync.
```diff
@@ ## Contract Invariants (NEW)
+### Deletion Lifecycle Invariant
+1. Notes/discussions deleted upstream are tombstoned locally (`deleted_at`), not hard-deleted.
+2. All list/show commands exclude tombstoned rows by default.
+3. Optional flag `--include-deleted` exposes tombstoned rows for audit/debug.
```
Rationale: preserves auditability, prevents ghost actions on deleted objects, and avoids destructive resync behavior.
---
**5. Expand Discussions Payload for Rename Accuracy + Better Triage**
`--path-side old` is great, but output currently only returns `position_new_*`.
```diff
@@ ## Change 3: Add Standalone `discussions` List Command
pub position_new_path: Option<String>,
pub position_new_line: Option<i64>,
+ pub position_old_path: Option<String>,
+ pub position_old_line: Option<i64>,
+ pub last_author: Option<String>,
+ pub participant_usernames: Vec<String>,
```
Rationale: for renamed/deleted files, agents need old and new coordinates to act confidently; participants/last_author improve thread routing and prioritization.
---
**6. Add SQLite Busy Handling + Retry Policy**
Read transactions + concurrent sync writes can still produce `SQLITE_BUSY` under load.
```diff
@@ ## Count Semantics (Cross-Cutting Convention)
**Snapshot consistency**: All three queries ... inside a single read transaction ...
+**Busy handling**: set `PRAGMA busy_timeout` (e.g. 5000ms) and retry transient
+`SQLITE_BUSY` errors up to 3 times with jittered backoff for read commands.
```
Rationale: improves reliability in real multi-agent usage without changing semantics.
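A dependency-free sketch of the retry loop, under several assumptions: `DbError::Busy` stands in for the driver's `SQLITE_BUSY` error, the 50 ms base delay is arbitrary, and the jitter comes from a deterministic seed rather than a random source so the example needs no extra crates. The sleep function is injected so tests do not actually wait.

```rust
use std::time::Duration;

/// Hypothetical error type; `Busy` stands in for `SQLITE_BUSY`.
#[derive(Debug, PartialEq)]
pub enum DbError {
    Busy,
    Other(String),
}

/// Delay for a 0-based `attempt`: `base * 2^attempt` plus jitter below `base`.
pub fn backoff_delay(attempt: u32, base_ms: u64, jitter_seed: u64) -> Duration {
    let exp = base_ms.saturating_mul(1u64 << attempt.min(16));
    let jitter = if base_ms == 0 { 0 } else { jitter_seed % base_ms };
    Duration::from_millis(exp.saturating_add(jitter))
}

/// Runs `op`, retrying only on `Busy`, at most `max_retries` times.
pub fn retry_on_busy<T>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, DbError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, DbError> {
    let mut attempt: u32 = 0;
    loop {
        match op() {
            Err(DbError::Busy) if attempt < max_retries => {
                // Stand-in for random jitter; a real version would use an RNG.
                let seed = u64::from(attempt) * 37 + 11;
                sleep(backoff_delay(attempt, 50, seed));
                attempt += 1;
            }
            other => return other,
        }
    }
}
```

In production the `sleep` argument would be `std::thread::sleep`; non-busy errors pass through untouched on the first occurrence.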
---
**7. Make Field Definitions Single-Source (Lightweight Drift Prevention)**
You rejected full schema generation from code; a lower-cost middle ground is shared field manifests used by both docs and `--fields` validation.
```diff
@@ ## Change 7: Fix Robot-Docs Response Schemas
+#### 7h. Single-source field manifests (no reflection)
+Define per-command field constants (e.g. `NOTES_FIELDS`, `DISCUSSIONS_FIELDS`)
+used by:
+1) `--fields` validation/filtering
+2) `--fields minimal` expansion
+3) `robot-docs` schema rendering
```
Rationale: cuts drift risk materially while staying much simpler than reflection/snapshot infra.
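A minimal sketch of what a shared manifest could look like. Apart from `gitlab_discussion_id`, the field names listed here are placeholders and not the actual schema; `NOTES_MINIMAL` and `resolve_fields` are hypothetical.

```rust
/// Hypothetical manifest; field names other than `gitlab_discussion_id` are placeholders.
pub const NOTES_FIELDS: &[&str] = &[
    "gitlab_note_id",
    "gitlab_discussion_id",
    "author_username",
    "body",
];

/// Hypothetical expansion of `--fields minimal`.
pub const NOTES_MINIMAL: &[&str] = &["gitlab_discussion_id", "body"];

/// Validates a comma-separated `--fields` value against the manifest.
pub fn resolve_fields(
    requested: &str,
    all: &'static [&'static str],
    minimal: &'static [&'static str],
) -> Result<Vec<&'static str>, String> {
    if requested == "minimal" {
        return Ok(minimal.to_vec());
    }
    requested
        .split(',')
        .map(str::trim)
        .map(|name| {
            all.iter()
                .copied()
                .find(|field| *field == name)
                .ok_or_else(|| format!("unknown field: {name}"))
        })
        .collect()
}
```

The `robot-docs` renderer would iterate the same constants, so a field added to the manifest shows up in validation and documentation together.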
---
**8. De-duplicate and Upgrade Test Strategy Around Concurrency**
There are duplicated tests across Change 2 and Change 3; add explicit race tests where sync writes happen between list subqueries to prove tx consistency.
```diff
@@ ## Tests
-**Test 6**: `--project-id` scopes IID resolution directly
-**Test 7**: `--path-side old` matches renamed file discussions
-**Test 8**: `--path-side either` matches both old and new paths
+Move shared discussion-filter tests to a single section under Change 3.
+Add concurrency tests:
+1) count/page/incomplete consistency under concurrent sync writes
+2) show discussion+notes snapshot consistency under concurrent writes
```
Rationale: less maintenance noise, better coverage of your highest-risk correctness path.
---
If you want, I can also produce a single consolidated patch block that rewrites your plan text end-to-end with these edits applied in-place.

File diff suppressed because it is too large


@@ -0,0 +1,169 @@
Below are the strongest **new** revisions I'd make (excluding everything in your rejected list), with rationale and plan-level diffs.
### 1. Add a durable run ledger (`sync_runs`) with phase state
This makes surgical sync crash-resumable, auditable, and safer under Ctrl+C. Right now `run_id` is mostly ephemeral; persisting phase state removes ambiguity about what completed.
```diff
@@ Design Constraints
+9. **Durable run state**: Surgical sync MUST persist a `sync_runs` row keyed by `run_id`
+ with phase transitions (`preflight`, `ingest`, `dependents`, `docs`, `embed`, `done`, `failed`).
+ This is required for crash recovery, observability, and deterministic retries.
@@ Step 9: Create `run_sync_surgical`
+Before Stage 0, insert `sync_runs(run_id, project_id, mode='surgical', requested_counts, started_at)`.
+After each stage, update `sync_runs.phase`, counters, and `last_error` if present.
+On success/failure, set terminal state (`done`/`failed`) and `finished_at`.
```
### 2. Add `--preflight-only` (network validation without writes)
`--dry-run` is intentionally zero-network, so it cannot validate IIDs. `--preflight-only` is high-value for agents: verifies existence/permissions quickly with no DB mutation.
```diff
@@ CLI Interface
lore sync --dry-run --issue 123 -p myproject
+lore sync --preflight-only --issue 123 -p myproject
@@ Step 2: Add `--issue`, `--mr`, `-p` to `SyncArgs`
+ /// Validate remote entities and auth without any DB writes
+ #[arg(long, default_value_t = false)]
+ pub preflight_only: bool,
@@ Step 10: Add branch in `run_sync`
+if options.preflight_only && options.is_surgical() {
+ return run_sync_surgical_preflight_only(config, &options, run_id, signal).await;
+}
```
### 3. Preflight should aggregate all missing/failed IIDs, not fail-fast
Fail-fast causes repeated reruns. Aggregating errors gives one-shot correction and better robot automation.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-/// Returns the fetched payloads. If ANY fetch fails, the entire operation should abort.
+/// Returns fetched payloads plus per-IID failures; caller aborts writes if failures exist.
pub async fn preflight_fetch(...) -> Result<PreflightResult> {
@@
#[derive(Debug, Default)]
pub struct PreflightResult {
pub issues: Vec<GitLabIssue>,
pub merge_requests: Vec<GitLabMergeRequest>,
+ pub failures: Vec<EntityFailure>, // stage="fetch"
}
@@ Step 9: Create `run_sync_surgical`
-let preflight = preflight_fetch(...).await?;
+let preflight = preflight_fetch(...).await?;
+if !preflight.failures.is_empty() {
+ result.entity_failures = preflight.failures;
+ return Err(LoreError::Other("Surgical preflight failed for one or more IIDs".into()).into());
+}
```
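The aggregation pattern in isolation, as a synchronous and generic sketch. The plan's `preflight_fetch` is async and returns GitLab types; the fields on `EntityFailure` and the injected `fetch` closure here are assumptions made so the example runs without a network.

```rust
/// Hypothetical failure record; only the `stage = "fetch"` value comes from the plan.
#[derive(Debug, PartialEq)]
pub struct EntityFailure {
    pub iid: i64,
    pub stage: &'static str,
    pub message: String,
}

#[derive(Debug)]
pub struct PreflightResult<T> {
    pub fetched: Vec<T>,
    pub failures: Vec<EntityFailure>,
}

/// Tries every IID and records failures instead of returning on the first one.
pub fn preflight_fetch<T>(
    iids: &[i64],
    mut fetch: impl FnMut(i64) -> Result<T, String>,
) -> PreflightResult<T> {
    let mut result = PreflightResult {
        fetched: Vec::new(),
        failures: Vec::new(),
    };
    for &iid in iids {
        match fetch(iid) {
            Ok(entity) => result.fetched.push(entity),
            Err(message) => result.failures.push(EntityFailure {
                iid,
                stage: "fetch",
                message,
            }),
        }
    }
    result
}
```

The caller still aborts before any write when `failures` is non-empty; the difference is that the error report names every bad IID in one pass.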
### 4. Stop filtering scoped queue drains with raw `json_extract` scans
`json_extract(payload_json, '$.scope_run_id')` in hot drain queries will degrade as queue grows. Use indexed scope metadata.
```diff
@@ Step 9b: Implement scoped drain helpers
-// claim query adds:
-// AND json_extract(payload_json, '$.scope_run_id') = ?
+// Add migration:
+// 1) Add `scope_run_id` generated/stored column derived from payload_json (or explicit column)
+// 2) Create index on (project_id, job_type, scope_run_id, status, id)
+// Scoped drains filter by indexed `scope_run_id`, not full-table JSON extraction.
```
### 5. Replace `dirty_source_ids` collection-by-query with explicit run scoping
Current approach can accidentally include prior dirty rows for same source and can duplicate work. Tag dirty rows with `origin_run_id` and consume by run.
```diff
@@ Design Constraints
-2. **Dirty queue scoping**: ... MUST call ... `run_generate_docs_for_dirty_ids`
+2. **Dirty queue scoping**: Surgical sync MUST scope docs by `origin_run_id` on `dirty_sources`
+ (or equivalent exact run marker) and MUST NOT drain unrelated dirty rows.
@@ Step 7: `SurgicalIngestResult`
- pub dirty_source_ids: Vec<i64>,
+ pub origin_run_id: String,
@@ Step 9a: Implement `run_generate_docs_for_dirty_ids`
-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<...>
+pub fn run_generate_docs_for_run_id(config: &Config, run_id: &str) -> Result<...>
```
### 6. Enforce transaction safety at the type boundary
Combining `unchecked_transaction()` with `&Connection` signatures is fragile. Accept `&Transaction` for ingest internals and use `TransactionBehavior::Immediate` for deterministic lock behavior.
```diff
@@ Step 7: Create `src/ingestion/surgical.rs`
-pub fn ingest_issue_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_issue_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-pub fn ingest_mr_by_iid_from_payload(conn: &Connection, ...)
+pub fn ingest_mr_by_iid_from_payload(tx: &rusqlite::Transaction<'_>, ...)
-let tx = conn.unchecked_transaction()?;
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Immediate)?;
```
### 7. Acquire sync lock only for mutation phases, not remote preflight
This materially reduces lock contention and keeps normal sync throughput higher, while still guaranteeing mutation serialization.
```diff
@@ Design Constraints
+10. **Lock window minimization**: Preflight fetch runs without sync lock; lock is acquired immediately
+ before first DB mutation and held through all mutation stages.
@@ Step 9: Create `run_sync_surgical`
-// ── Acquire sync lock ──
-...
-// ── Stage 0: Preflight fetch ──
+// ── Stage 0: Preflight fetch (no lock, no writes) ──
...
+// ── Acquire sync lock just before Stage 1 mutation ──
```
### 8. Add explicit transient retry policy beyond 429
Client already handles rate limits; surgical reliability improves a lot if 5xx/timeouts are retried with bounded backoff.
```diff
@@ Design Constraints
+11. **Transient retry policy**: Preflight and dependent remote fetches MUST retry boundedly on
+ timeout/5xx with jittered backoff; permanent errors (404/401/403) fail immediately.
@@ Step 5: Add `get_issue_by_iid` / `get_mr_by_iid`
+Document retry behavior for transient transport/server failures.
```
### 9. Tighten automated tests around scoping invariants
You already list manual checks; these should be enforced in unit/integration tests to prevent regressions.
```diff
@@ Step 1: TDD — Write Failing Tests First
+### 1d. New invariants tests
+- `surgical_docs_scope_ignores_preexisting_dirty_rows`
+- `scoped_queue_drain_ignores_orphaned_jobs`
+- `preflight_aggregates_multiple_missing_iids`
+- `preflight_only_performs_zero_writes`
+- `dry_run_performs_zero_network_calls`
+- `lock_window_does_not_block_during_preflight`
@@ Acceptance Criteria
+32. Scoped queue/docs invariants are covered by automated tests (not manual-only verification).
```
### 10. Make robot-mode surgical output first-class
For agent workflows, include full stage telemetry and actionable recovery commands.
```diff
@@ Step 15: Update `SyncResult` for robot mode structured output
+ /// Per-stage elapsed ms for deterministic performance tracking
+ pub stage_timings_ms: std::collections::BTreeMap<String, u64>,
+ /// Suggested recovery commands (robot ergonomics)
+ pub recovery_actions: Vec<String>,
@@ Step 14: Update `robot-docs` manifest
+Document surgical-specific error codes and `actions` schema for automated recovery.
```
If you want, I can now produce a fully rewritten **iteration 3** plan that merges these into your current structure end-to-end.


@@ -0,0 +1,212 @@
1. **Resolve the current contract contradictions (`preflight-only`, `dry-run`, `sync_runs`)**
Why this improves the plan:
- Right now constraints conflict: “zero DB writes before commit” vs inserting `sync_runs` during preflight.
- This ambiguity will cause implementation drift and flaky acceptance tests.
- Splitting control-plane writes from content-plane writes keeps safety guarantees strict while preserving observability.
```diff
@@ ## Design Constraints
-6. **Preflight-then-commit**: All remote fetches happen BEFORE any DB writes. If any IID fetch fails (404, network error), the entire operation aborts with zero DB mutations.
+6. **Preflight-then-commit (content-plane)**: All remote fetches happen BEFORE any writes to content tables (`issues`, `merge_requests`, `discussions`, `resource_events`, `documents`, `embeddings`).
+7. **Control-plane exception**: `sync_runs` / `sync_run_entities` writes are allowed during preflight for observability and crash diagnostics.
@@
-11. **Preflight-only mode**: `--preflight-only` validates remote entity existence and permissions with zero DB writes.
+11. **Preflight-only mode**: `--preflight-only` performs zero content writes; control-plane run-ledger writes are allowed.
@@ ### For me to evaluate (functional):
-24. **Preflight-only mode** ... no DB mutations beyond the sync_runs ledger entry
+24. **Preflight-only mode** ... no content DB mutations; only run-ledger rows may be written
```
---
2. **Add stale-write protection to avoid TOCTOU regressions during unlocked preflight**
Why this improves the plan:
- You intentionally preflight without lock; that's good for throughput but introduces race risk.
- Without a guard, a slower surgical run can overwrite newer data ingested by a concurrent normal sync.
- This is a correctness bug under contention, not a nice-to-have.
```diff
@@ ## Design Constraints
+12. **Stale-write protection**: Surgical ingest MUST NOT overwrite fresher local rows. If local `updated_at` is newer than the preflight payload's `updated_at`, skip that entity and record `skipped_stale`.
@@ ## Step 7: Create `src/ingestion/surgical.rs`
- let labels_created = process_single_issue(conn, config, project_id, issue)?;
+ // Skip stale payloads to avoid TOCTOU overwrite after unlocked preflight.
+ if is_local_newer_issue(conn, project_id, issue.iid, issue.updated_at)? {
+ result.skipped_stale += 1;
+ return Ok(result);
+ }
+ let labels_created = process_single_issue(conn, config, project_id, issue)?;
@@
+// same guard for MR path
@@ ## Step 15: Update `SyncResult`
+ /// Entities skipped because local row was newer than preflight payload
+ pub skipped_stale: usize,
@@ ### Edge cases to verify:
+38. **TOCTOU safety**: if a normal sync updates entity after preflight but before ingest, surgical run skips stale payload (no overwrite)
```
---
3. **Make dirty-source scoping exact (do not capture pre-existing rows for same entity)**
Why this improves the plan:
- Current “query dirty rows by `source_id` after ingest” can accidentally include older dirty rows for the same entity.
- That silently violates strict run scoping and can delete unrelated backlog rows.
- You can fix this without adding `origin_run_id` to `dirty_sources` (which you already rejected).
```diff
@@ ## Step 7: Create `src/ingestion/surgical.rs`
- // Collect dirty_source rows for this entity
- let mut stmt = conn.prepare(
- "SELECT id FROM dirty_sources WHERE source_type = 'issue' AND source_id = ?1"
- )?;
+ // Capture only rows inserted by THIS call using high-water mark.
+ let before_dirty_id: i64 = conn.query_row(
+ "SELECT COALESCE(MAX(id), 0) FROM dirty_sources",
+ [], |r| r.get(0),
+ )?;
+ // ... call process_single_issue ...
+ let mut stmt = conn.prepare(
+ "SELECT id FROM dirty_sources
+ WHERE id > ?1 AND source_type = 'issue' AND source_id = ?2"
+ )?;
@@
+ // same pattern for MR
@@ ### 1d. Scoping invariant tests
+#[test]
+fn surgical_docs_scope_ignores_preexisting_dirty_rows_for_same_entity() {
+ // pre-insert dirty row for iid=7, then surgical ingest iid=7
+ // assert result.dirty_source_ids only contains newly inserted rows
+}
```
---
4. **Fix embed-stage leakage when `--no-docs` is used in surgical mode**
Why this improves the plan:
- Current design can run global embed even when docs stage is skipped, which may embed unrelated backlog docs.
- That breaks the surgical “scope only this run” promise.
- This is critical for both correctness and operator trust.
```diff
@@ ## Step 9: Create `run_sync_surgical`
- if !options.no_embed {
+ // Surgical embed only runs when surgical docs actually regenerated docs in this run.
+ if !options.no_embed && !options.no_docs && result.documents_regenerated > 0 {
@@ ## Step 4: Wire new fields in `handle_sync_cmd`
+ if options.is_surgical() && options.no_docs && !options.no_embed {
+ return Err(Box::new(LoreError::Other(
+ "In surgical mode, --no-docs requires --no-embed (to preserve scoping guarantees)".to_string()
+ )));
+ }
@@ ### For me to evaluate
+39. **No embed leakage**: `sync --issue X --no-docs` never embeds unrelated unembedded docs
```
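The two guards in this revision can be sketched as pure functions, which makes them easy to unit test without a database. This is a minimal sketch: the `SyncOptions` fields shown here and the helper names `validate_surgical_flags` and `should_run_surgical_embed` are assumptions for illustration, not names taken from the plan.

```rust
/// Hypothetical subset of the sync options discussed above.
#[derive(Debug, Default, Clone)]
pub struct SyncOptions {
    pub issue_iids: Vec<u64>,
    pub mr_iids: Vec<u64>,
    pub no_docs: bool,
    pub no_embed: bool,
}

impl SyncOptions {
    /// A run is surgical when at least one explicit IID was requested.
    pub fn is_surgical(&self) -> bool {
        !self.issue_iids.is_empty() || !self.mr_iids.is_empty()
    }
}

/// Rejects the flag combination that would let a global embed run
/// without scoped docs having been regenerated first.
pub fn validate_surgical_flags(options: &SyncOptions) -> Result<(), String> {
    if options.is_surgical() && options.no_docs && !options.no_embed {
        return Err(
            "In surgical mode, --no-docs requires --no-embed (to preserve scoping guarantees)"
                .to_string(),
        );
    }
    Ok(())
}

/// Embed only when this run actually regenerated documents.
pub fn should_run_surgical_embed(options: &SyncOptions, documents_regenerated: usize) -> bool {
    !options.no_embed && !options.no_docs && documents_regenerated > 0
}
```

Keeping the checks free of I/O means test `surgical_no_docs_requires_no_embed` could exercise them directly.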
---
5. **Add queue-failure hygiene so scoped jobs do not leak forever**
Why this improves the plan:
- Scoped drains prevent accidental processing, but failed runs can strand pending jobs permanently.
- You need explicit terminalization (`aborted`) and optional replay mechanics.
- Otherwise queue bloat and confusing diagnostics accumulate.
```diff
@@ ## Step 8a: Add `sync_runs` table migration
+ALTER TABLE dependent_queue ADD COLUMN aborted_reason TEXT;
+-- status domain now includes: pending, claimed, done, failed, aborted
@@ ## Step 9: run_sync_surgical failure paths
+// On run failure/cancel:
+conn.execute(
+ "UPDATE dependent_queue
+ SET status='aborted', aborted_reason=?1
+ WHERE project_id=?2 AND scope_run_id=?3 AND status='pending'",
+ rusqlite::params![failure_summary, project_id, run_id],
+)?;
@@ ## Acceptance Criteria
+40. **No stranded scoped jobs**: failed surgical runs leave no `pending` rows for their `scope_run_id`
```
---
6. **Persist per-entity lifecycle (`sync_run_entities`) for real observability and deterministic retry**
Why this improves the plan:
- `sync_runs` alone gives aggregate counters but not which IID failed at which stage.
- Per-entity records make retries deterministic and robot output far more useful.
- This is the missing piece for your stated “deterministic retry decisions.”
```diff
@@ ## Step 8a: Add `sync_runs` table migration
+CREATE TABLE IF NOT EXISTS sync_run_entities (
+ id INTEGER PRIMARY KEY,
+ run_id TEXT NOT NULL REFERENCES sync_runs(run_id),
+ entity_type TEXT NOT NULL CHECK(entity_type IN ('issue','merge_request')),
+ iid INTEGER NOT NULL,
+ stage TEXT NOT NULL,
+ status TEXT NOT NULL CHECK(status IN ('ok','failed','skipped_stale')),
+ error_code TEXT,
+ error_message TEXT,
+ updated_at INTEGER NOT NULL
+);
+CREATE INDEX IF NOT EXISTS idx_sync_run_entities_run ON sync_run_entities(run_id, entity_type, iid);
@@ ## Step 15: Update `SyncResult`
+ pub failed_iids: Vec<(String, u64)>,
+ pub skipped_stale_iids: Vec<(String, u64)>,
@@ ## CLI Interface
+lore --robot sync-runs --run-id <id>
+lore --robot sync-runs --run-id <id> --retry-failed
```
---
7. **Use explicit error type for surgical preflight failures (not `LoreError::Other`)**
Why this improves the plan:
- `Other(String)` loses machine semantics, weakens robot mode, and leads to bad exit-code behavior.
- A typed error preserves structured failures and enables actionable recovery commands.
```diff
@@ ## Step 9: run_sync_surgical
- return Err(LoreError::Other(
- format!("Surgical preflight failed for {} of {} IIDs: {}", ...)
- ).into());
+ return Err(LoreError::SurgicalPreflightFailed {
+ run_id: run_id.to_string(),
+ total: total_items,
+ failures: preflight.failures.clone(),
+ }.into());
@@ ## Step 15: Update `SyncResult`
+ /// Machine-actionable error summary for robot mode
+ pub error_code: Option<String>,
@@ ## Acceptance Criteria
+41. **Typed failure**: preflight failures serialize structured errors (not generic `Other`) with machine-usable codes/actions
```
---
8. **Strengthen tests for rollback, contention, and stale-skip guarantees**
Why this improves the plan:
- Current tests cover many happy-paths and scoping invariants, but key race/rollback behaviors are still under-tested.
- These are exactly where regressions will appear first in production.
```diff
@@ ## Step 1: TDD — Write Failing Tests First
+### 1f. Transactional rollback + TOCTOU tests
+1. `preflight_success_then_ingest_failure_rolls_back_all_content_writes`
+2. `stale_payload_is_skipped_when_local_updated_at_is_newer`
+3. `failed_run_aborts_pending_scoped_jobs`
+4. `surgical_no_docs_requires_no_embed`
@@ ### Automated scoping invariants
-38. **Scoped queue/docs invariants are enforced by automated tests**
+42. **Rollback and race invariants are enforced by automated tests** (no partial writes on ingest failure, no stale overwrite)
```
---
These eight revisions keep your core approach intact, avoid your explicitly rejected ideas, and close the biggest correctness/operability gaps before implementation.


@@ -0,0 +1,130 @@
**Critical Gaps In Current Plan**
1. `dirty_sources` scoping is based on `id`, but `dirty_sources` has no `id` column and uses `(source_type, source_id)` UPSERT semantics.
2. Plan assumes a new `dependent_queue` with `status`, but current code uses `pending_dependent_fetches` (delete-on-complete), so queue-scoping design conflicts with existing invariants.
3. Constraint 6 says all remote fetches happen before any content writes, but the proposed surgical flow fetches discussions/events/diffs after ingest writes.
4. `sync_runs` is already an existing table and already used by `SyncRunRecorder`; the plan currently treats it like a new table.
**Best Revisions**
1. **Fix dirty-source scoping to match real schema (queued-at watermark, not `id` high-water).**
Why this is better: This removes a correctness bug and makes same-entity re-ingest deterministic under UPSERT behavior.
```diff
@@ Design Constraints
-2. Dirty queue scoping: ... capture MAX(id) FROM dirty_sources ... run_generate_docs_for_dirty_ids ...
+2. Dirty queue scoping: `dirty_sources` is keyed by `(source_type, source_id)` and updated via UPSERT.
+ Surgical scoping MUST use:
+ 1) a run-level `run_dirty_floor_ms` captured before surgical ingest, and
+ 2) explicit touched source keys from ingest (`(source_type, source_id)`).
+ Surgical docs MUST call a scoped API (e.g. `run_generate_docs_for_sources`) and MUST NOT drain global dirty queue.
@@ Step 9a
-pub fn run_generate_docs_for_dirty_ids(config: &Config, dirty_source_ids: &[i64]) -> Result<GenerateDocsResult>
+pub fn run_generate_docs_for_sources(config: &Config, sources: &[(SourceType, i64)]) -> Result<GenerateDocsResult>
```
2. **Bypass shared dependent queue in surgical mode; run dependents inline per target.**
Why this is better: Avoids queue migration churn, avoids run-scope conflicts with existing unique constraints, and removes orphan-job hygiene complexity entirely.
```diff
@@ Design Constraints
-4. Dependent queue scoping: ... scope_run_id indexed column on dependent_queue ...
+4. Surgical dependent execution: surgical mode MUST bypass `pending_dependent_fetches`.
+ Dependents (resource_events, mr_closes_issues, mr_diffs) run inline for targeted entities only.
+ Global queue remains for normal sync only.
@@ Design Constraints
-14. Queue failure hygiene: ... pending scoped jobs ... terminalized to aborted ...
+14. Surgical failure hygiene: surgical mode MUST leave no queue artifacts because it does not enqueue dependent jobs.
@@ Step 9b / 9c / Step 13
-Implement scoped drain helpers and enqueue_job scope_run_id plumbing
+Replace with direct per-entity helpers in ingestion layer:
+ - sync_issue_resource_events_direct(...)
+ - sync_mr_resource_events_direct(...)
+ - sync_mr_closes_issues_direct(...)
+ - sync_mr_diffs_direct(...)
```
3. **Clarify atomicity contract to “primary-entity atomicity” (remove contradiction).**
Why this is better: Keeps strong zero-write guarantees for missing IIDs while matching practical staged pipeline behavior.
```diff
@@ Design Constraints
-6. Preflight-then-commit (content-plane): All remote fetches happen BEFORE any writes to content tables ...
+6. Primary-entity atomicity: all requested issue/MR payload fetches complete before first content write.
+ If any primary IID fetch fails, primary ingest does zero content writes.
+ Dependent stages (discussions/events/diffs/closes) are post-ingest and best-effort, with structured per-stage failure reporting.
```
4. **Extend existing `sync_runs` schema instead of redefining it.**
Why this is better: Preserves compatibility with current `SyncRunRecorder`, `sync_status`, and existing historical data.
```diff
@@ Step 8a
-Add `sync_runs` table migration (CREATE TABLE sync_runs ...)
+Add migration 027 to extend existing `sync_runs` table:
+ - ADD COLUMN mode TEXT NULL -- 'standard' | 'surgical'
+ - ADD COLUMN phase TEXT NULL -- preflight|ingest|dependents|docs|embed|done|failed
+ - ADD COLUMN surgical_summary_json TEXT NULL
+Reuse `SyncRunRecorder` row lifecycle; do not introduce a parallel run-ledger model.
```
5. **Strengthen TOCTOU stale protection for equal timestamps.**
Why this is better: Prevents regressions when `updated_at` is equal but a fresher local fetch already happened.
```diff
@@ Design Constraints
-13. ... If local `updated_at` is newer than preflight payload `updated_at`, skip ...
+13. ... Skip stale when:
+ a) local.updated_at > payload.updated_at, OR
+ b) local.updated_at == payload.updated_at AND local.last_seen_at > preflight_started_at_ms.
+ This prevents equal-timestamp regressions under concurrent sync.
@@ Step 1f tests
+Add test: `equal_updated_at_but_newer_last_seen_is_skipped`.
```
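The two-part stale rule above can be expressed as a small pure predicate. This is a sketch under stated assumptions: the `LocalRow` struct and the function name `is_payload_stale` are hypothetical, and all timestamps are assumed to be milliseconds since the Unix epoch.

```rust
/// Hypothetical view of the locally stored row (timestamps in ms since epoch).
#[derive(Debug, Clone, Copy)]
pub struct LocalRow {
    pub updated_at: i64,
    pub last_seen_at: i64,
}

/// Returns true when the preflight payload must be skipped:
///   a) the local row is strictly newer, or
///   b) timestamps are equal but the local row was seen after preflight began.
pub fn is_payload_stale(
    local: Option<LocalRow>,
    payload_updated_at: i64,
    preflight_started_at_ms: i64,
) -> bool {
    match local {
        // No local row yet: nothing to protect, so the payload is not stale.
        None => false,
        Some(row) => {
            row.updated_at > payload_updated_at
                || (row.updated_at == payload_updated_at
                    && row.last_seen_at > preflight_started_at_ms)
        }
    }
}
```

An entity with no local row is always ingested, which keeps first-time surgical fetches working.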
6. **Shrink lock window further: release `sync` lock before embed; use dedicated embed lock.**
Why this is better: Prevents long embedding from blocking unrelated syncs and avoids concurrent embed writers.
```diff
@@ Design Constraints
-11. Lock ... held through all mutation stages.
+11. Lock ... held through ingest/dependents/docs only.
+ Release `AppLock("sync")` before embed.
+ Embed stage uses `AppLock("embed")` for single-flight embedding writes.
@@ Step 9
-Embed runs inside the same sync lock window
+Embed runs after sync lock release, under dedicated embed lock
```
7. **Add the missing `sync-runs` robot read path (the plan references it but doesn't define it).**
Why this is better: Makes durable run-state actually useful for recovery automation and observability.
```diff
@@ Step 14 (new)
+## Step 14a: Add `sync-runs` read command
+
+CLI:
+ lore --robot sync-runs --limit 20
+ lore --robot sync-runs --run-id <id>
+ lore --robot sync-runs --state failed
+
+Robot response fields:
+ run_id, mode, phase, status, started_at, finished_at, counters, failures, suggested_retry_command
```
8. **Add URL-native surgical targets (`--issue-url`, `--mr-url`) with project inference.**
Why this is better: Much more agent-friendly and reduces project-resolution errors from copy/paste workflows.
```diff
@@ CLI Interface
lore sync --issue 123 --issue 456 -p myproject
+lore sync --issue-url https://gitlab.example.com/group/proj/-/issues/123
+lore sync --mr-url https://gitlab.example.com/group/proj/-/merge_requests/789
@@ Step 2
+Add repeatable flags:
+ --issue-url <url>
+ --mr-url <url>
+Parse URL into (project_path, iid). If all targets are URL-derived and same project, `-p` is optional.
+If mixed projects are provided in one command, reject with clear error.
```
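The URL parsing and mixed-project rejection described above might look roughly like the following. It is a sketch only: `UrlTarget`, `TargetKind`, `parse_target_url`, and `infer_project` are hypothetical names, and the parser assumes the `/-/issues/<iid>` and `/-/merge_requests/<iid>` URL shapes shown in the CLI examples.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TargetKind {
    Issue,
    MergeRequest,
}

#[derive(Debug, PartialEq, Eq)]
pub struct UrlTarget {
    pub project_path: String,
    pub kind: TargetKind,
    pub iid: u64,
}

/// Parses e.g. "https://host/group/proj/-/issues/123" into (project_path, kind, iid).
pub fn parse_target_url(url: &str) -> Option<UrlTarget> {
    // Drop the scheme and host, keeping only the path portion.
    let after_scheme = url.split_once("://")?.1;
    let path = after_scheme.split_once('/')?.1;
    // Ignore any query string or fragment (such as "#note_42").
    let path = path.split(|c: char| c == '?' || c == '#').next()?;
    let (project_path, rest) = path.split_once("/-/")?;
    if project_path.is_empty() {
        return None;
    }
    // The first segment names the entity type, the second is the IID;
    // trailing segments such as "/diffs" are ignored.
    let mut segments = rest.split('/');
    let kind = match segments.next()? {
        "issues" => TargetKind::Issue,
        "merge_requests" => TargetKind::MergeRequest,
        _ => return None,
    };
    let iid = segments.next()?.parse().ok()?;
    Some(UrlTarget { project_path: project_path.to_string(), kind, iid })
}

/// Returns the single project shared by all targets, or an error if they differ.
pub fn infer_project(targets: &[UrlTarget]) -> Result<Option<&str>, String> {
    let mut project: Option<&str> = None;
    for target in targets {
        match project {
            None => project = Some(target.project_path.as_str()),
            Some(p) if p != target.project_path.as_str() => {
                return Err(format!(
                    "targets span multiple projects: {} and {}",
                    p, target.project_path
                ));
            }
            Some(_) => {}
        }
    }
    Ok(project)
}
```

Returning `Ok(None)` for an empty target list leaves the caller free to fall back to `-p`.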
If you want, I can produce a single consolidated patched version of your plan (iteration 5 draft) with these revisions already merged.


@@ -0,0 +1,152 @@
Highest-impact revisions after reviewing your v5 plan:
1. **Fix a real scoping hole: embed can still process unrelated docs**
Rationale: Current plan assumes scoped docs implies scoped embed, but that only holds while no other run creates unembedded docs. You explicitly release sync lock before embed, so another sync can enqueue/regenerate docs in between, and `run_embed` may embed unrelated backlog. This breaks surgical isolation and can hide backlog debt.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-3. Embed scoping: Embedding runs only for documents regenerated by this surgical run. Because `run_embed` processes only unembedded docs, scoping is automatic IF docs are scoped correctly...
+3. Embed scoping: Embedding MUST be explicitly scoped to documents regenerated by this surgical run.
+ `run_generate_docs_for_sources` returns regenerated `document_ids`; surgical mode calls
+ `run_embed_for_document_ids(document_ids)` and never global `run_embed`.
+ This remains true even after lock release and under concurrent normal sync activity.
@@ Step 9a: Implement `run_generate_docs_for_sources`
-pub fn run_generate_docs_for_sources(...) -> Result<GenerateDocsResult> {
+pub fn run_generate_docs_for_sources(...) -> Result<GenerateDocsResult> {
+ // Return regenerated document IDs for scoped embedding.
+ // GenerateDocsResult { regenerated, errored, regenerated_document_ids: Vec<i64> }
@@ Step 9: Embed stage
- match run_embed(config, false, false, None, signal).await {
+ match run_embed_for_document_ids(config, &result.regenerated_document_ids, signal).await {
```
2. **Make run-ledger lifecycle actually durable (and consistent with your own constraint 10)**
Rationale: Plan text says “reuse `SyncRunRecorder`”, but Step 9 writes raw SQL directly. That creates lifecycle drift, missing heartbeats, and inconsistent failure handling as code evolves.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-10. Durable run state: ... Reuses `SyncRunRecorder` row lifecycle ...
+10. Durable run state: surgical sync MUST use `SyncRunRecorder` end-to-end (no ad-hoc SQL updates).
+ Add recorder APIs for `set_mode`, `set_phase`, `set_counters`, `finish_succeeded`,
+ `finish_failed`, `finish_cancelled`, and periodic `heartbeat`.
@@ Step 9: Create `run_sync_surgical`
- conn.execute("INSERT INTO sync_runs ...")
- conn.execute("UPDATE sync_runs SET phase = ...")
+ let mut recorder = SyncRunRecorder::start_surgical(...)?;
+ recorder.set_phase("preflight")?;
+ recorder.heartbeat_if_due()?;
+ recorder.set_phase("ingest")?;
+ ...
+ recorder.finish_succeeded_with_warnings(...)?;
```
3. **Add explicit `cancelled` terminal state**
Rationale: Current early cancellation branches return `Ok(result)` without guaranteed run-row finalization. That leaves misleading `running` rows and weak crash diagnostics.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
+15. Cancellation semantics: If shutdown is observed after run start, phase is set to `cancelled`,
+ status is `cancelled`, `finished_at` is written, and lock is released before return.
@@ Step 8a migration
+ALTER TABLE sync_runs ADD COLUMN warnings_count INTEGER NOT NULL DEFAULT 0;
+ALTER TABLE sync_runs ADD COLUMN cancelled_at INTEGER;
@@ Acceptance Criteria
+47. Cancellation durability: Ctrl+C during surgical sync records `status='cancelled'`,
+ `phase='cancelled'`, and `finished_at` in `sync_runs`.
```
4. **Reduce lock contention further by separating dependent fetch and dependent write**
Rationale: You currently hold lock through network-heavy dependent stages. That maximizes contention and increases lock timeout risk. Better: fetch dependents unlocked, write in short locked transactions with per-entity freshness guards.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
-11. Lock window minimization: ... held through ingest, dependents, and docs stages.
+11. Lock window minimization: lock is held only for DB mutation windows.
+ Dependents run in two phases:
+ (a) fetch from GitLab without lock,
+ (b) write results under lock in short transactions.
+ Apply per-entity freshness checks before dependent writes.
@@ Step 9: Dependent stages
- // All dependents run INLINE per-entity ... while lock is held
+ // Dependents fetch outside lock, then write under lock with CAS-style watermark guards.
```
5. **Introduce stage timeout budgets to prevent hung surgical runs**
Rationale: A single slow GitLab endpoint can stall the whole run and hold resources too long. Timeout budgets plus per-entity failure recording keep the run bounded and predictable.
```diff
diff --git a/plan.md b/plan.md
@@ Design Constraints
+16. Stage timeout budgets: each dependent fetch has a per-entity timeout and a global stage budget.
+ Timed-out entities are recorded in `entity_failures` with code `TIMEOUT` and run continues best-effort.
@@ Step 9 notes
+ - Wrap dependent network calls with `tokio::time::timeout`.
+ - Add config knobs:
+ `sync.surgical_entity_timeout_seconds` (default 20),
+ `sync.surgical_dependents_budget_seconds` (default 120).
```
6. **Add payload integrity checks (project mismatch hard-fail)**
Rationale: Surgical mode is precision tooling. If API/proxy misconfiguration returns payloads from wrong project, you should fail preflight loudly, not trust downstream assumptions.
```diff
diff --git a/plan.md b/plan.md
@@ Step 7: preflight_fetch
+ // Integrity check: payload.project_id must equal requested gitlab_project_id.
+ // On mismatch, record EntityFailure { code: "PROJECT_MISMATCH", stage: "fetch" }.
@@ Step 9d: error codes
+PROJECT_MISMATCH -> usage/config data integrity failure (typed, machine-readable)
@@ Acceptance Criteria
+48. Project integrity: payloads with unexpected `project_id` are rejected in preflight
+ and produce zero content writes.
```
7. **Upgrade robot output from aggregate-only to per-entity lifecycle**
Rationale: `entity_failures` alone is not enough for robust automation. Agents need a complete entity outcome map (fetched, ingested, stale-skipped, dependent failures) to retry deterministically.
```diff
diff --git a/plan.md b/plan.md
@@ Step 15: Update `SyncResult`
+pub struct EntityOutcome {
+ pub entity_type: String,
+ pub iid: u64,
+ pub fetched: bool,
+ pub ingested: bool,
+ pub stale_skipped: bool,
+ pub dependent_failures: Vec<EntityFailure>,
+}
@@
+pub entity_outcomes: Vec<EntityOutcome>,
+pub completion_status: String, // succeeded | succeeded_with_warnings | failed | cancelled
@@ Robot mode
- enables agents to detect partial failures via `entity_failures`
+ enables deterministic, per-IID retry and richer UI messaging.
```
8. **Index `sync_runs` for real observability at scale**
Rationale: You're adding mode/phase/counters and then querying recent surgical runs. Without indexes, this degrades as run history grows.
```diff
diff --git a/plan.md b/plan.md
@@ Step 8a migration
+CREATE INDEX IF NOT EXISTS idx_sync_runs_mode_started
+ ON sync_runs(mode, started_at DESC);
+CREATE INDEX IF NOT EXISTS idx_sync_runs_status_phase_started
+ ON sync_runs(status, phase, started_at DESC);
```
9. **Add tests specifically for the new failure-prone paths**
Rationale: Current tests are strong on ingest and scoping, but still miss new high-risk runtime behavior (cancel state, timeout handling, scoped embed under concurrency).
```diff
diff --git a/plan.md b/plan.md
@@ Step 1f tests
+#[tokio::test]
+async fn cancellation_marks_sync_run_cancelled() { ... }
+
+#[tokio::test]
+async fn dependent_timeout_records_entity_failure_and_continues() { ... }
+
+#[tokio::test]
+async fn scoped_embed_does_not_embed_unrelated_docs_created_after_docs_stage() { ... }
@@ Acceptance Criteria
+49. Scoped embed isolation under concurrency is verified by automated test.
+50. Timeout path is verified (TIMEOUT code + continued processing).
```
These revisions keep your core direction intact, avoid every rejected recommendation, and materially improve correctness under concurrency, operational observability, and agent automation quality.

docs/plan-surgical-sync.md (2240 changed lines)

File diff suppressed because it is too large.


@@ -1,174 +0,0 @@
Highest-impact gaps I see in the current plan:
1. `for-issue` / `for-mr` filtering is ambiguous across projects and can return incorrect rows.
2. `lore notes` has no pagination contract, so large exports and deterministic resumption are weak.
3. Migration `022` is high-risk (table rebuild + FTS + junction tables) without explicit integrity gates.
4. Note-doc freshness is incomplete for upstream note deletions and parent metadata changes (labels/title).
Below are my best revisions, each with rationale and a git-diff-style plan edit.
---
1. **Add gated rollout + rollback controls**
Rationale: You can still “ship together” while reducing blast radius. This makes recovery fast if note-doc generation causes DB/embedding pressure.
```diff
@@ ## Design
-Two phases, shipped together as one feature:
+Two phases, shipped together as one feature, but with runtime gates:
+
+- `feature.notes_cli` (Phase 1 surface)
+- `feature.note_documents` (Phase 2 indexing/extraction path)
+
+Rollout order:
+1) Enable `notes_cli`
+2) Run note-doc backfill in bounded batches
+3) Enable `note_documents` for continuous updates
+
+Rollback:
+- Disabling `feature.note_documents` stops new note-doc generation without affecting issue/MR/discussion docs.
```
2. **Add keyset pagination + deterministic ordering**
Rationale: Needed for year-long reviewer analysis and reliable “continue where I left off” behavior under concurrent updates.
```diff
@@ pub struct NoteListFilters<'a> {
pub limit: usize,
+ pub cursor: Option<&'a str>, // keyset token "<sort_ms>:<id>"
+ pub include_total_count: bool, // avoid COUNT(*) in hot paths
@@
- pub sort: &'a str, // "created" (default) | "updated"
+ pub sort: &'a str, // "created" | "updated"
@@ query_notes SQL
-ORDER BY {sort_column} {order}
+ORDER BY {sort_column} {order}, n.id {order}
LIMIT ?
```
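A keyset token in the `<sort_ms>:<id>` form proposed above could be encoded and decoded as follows. This is a sketch: the `Cursor` type and `keyset_predicate` helper are hypothetical, and the predicate relies on SQLite row-value comparison (available since SQLite 3.15). The `sort_column` argument is assumed to come from a fixed allowlist, never from user input.

```rust
/// Hypothetical keyset cursor: the sort key (ms since epoch) plus the row id tiebreaker.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cursor {
    pub sort_ms: i64,
    pub id: i64,
}

impl Cursor {
    /// Serializes as "<sort_ms>:<id>".
    pub fn encode(&self) -> String {
        format!("{}:{}", self.sort_ms, self.id)
    }

    /// Parses "<sort_ms>:<id>"; returns None on any malformed token.
    pub fn parse(token: &str) -> Option<Cursor> {
        let (sort_ms, id) = token.split_once(':')?;
        Some(Cursor {
            sort_ms: sort_ms.parse().ok()?,
            id: id.parse().ok()?,
        })
    }
}

/// Builds the WHERE fragment that resumes strictly after the cursor row.
/// Bind the cursor's `sort_ms` and `id` to the two placeholders, in that order.
pub fn keyset_predicate(sort_column: &str, descending: bool) -> String {
    let op = if descending { "<" } else { ">" };
    format!("({}, n.id) {} (?, ?)", sort_column, op)
}
```

Because the ordering includes `n.id` as a tiebreaker, rows sharing the same timestamp are neither skipped nor repeated across pages.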
3. **Make `for-issue` / `for-mr` project-scoped**
Rationale: IIDs are not globally unique. Requiring project avoids false positives and hard-to-debug cross-project leakage.
```diff
@@ pub struct NotesArgs {
- #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr")]
+ #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr", requires = "project")]
pub for_issue: Option<i64>,
@@
- #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue")]
+ #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue", requires = "project")]
pub for_mr: Option<i64>,
```
4. **Upgrade path filtering semantics**
Rationale: Review comments often reference renames/moves. Restricting to `position_new_path` misses relevant notes.
```diff
@@ pub struct NotesArgs {
- /// Filter by file path (trailing / for prefix match)
+ /// Filter by file path
#[arg(long, help_heading = "Filters")]
pub path: Option<String>,
+ /// Path mode: exact|prefix|glob
+ #[arg(long = "path-mode", value_parser = ["exact","prefix","glob"], default_value = "exact", help_heading = "Filters")]
+ pub path_mode: String,
+ /// Match against old path as well as new path
+ #[arg(long = "match-old-path", help_heading = "Filters")]
+ pub match_old_path: bool,
@@ query_notes filter mappings
-- `path` ... n.position_new_path ...
+- `path` applies to `n.position_new_path` and optionally `n.position_old_path`.
+- `glob` mode translates `*`/`?` to SQL LIKE with escaping.
```
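The `glob` mode translation mentioned above can be sketched as a single pass over the pattern. The function name `glob_to_like` is an assumption, and the result is intended for use with `LIKE ? ESCAPE '\'`. Note that SQLite's `LIKE` is case-insensitive for ASCII by default, so this would not give case-sensitive path matching on its own.

```rust
/// Translates a simple glob (`*`, `?`) into a SQL LIKE pattern.
/// Literal `%`, `_`, and `\` are escaped with a backslash, so the query
/// must declare the escape character: `... LIKE ?1 ESCAPE '\'`.
pub fn glob_to_like(glob: &str) -> String {
    let mut out = String::with_capacity(glob.len());
    for ch in glob.chars() {
        match ch {
            // Glob wildcards map to their LIKE equivalents.
            '*' => out.push('%'),
            '?' => out.push('_'),
            // LIKE metacharacters that appear literally must be escaped.
            '%' | '_' | '\\' => {
                out.push('\\');
                out.push(ch);
            }
            _ => out.push(ch),
        }
    }
    out
}
```

Escaping matters in practice because underscores are common in file names such as `trace_tests.rs`.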
5. **Add explicit performance indexes (new migration)**
Rationale: `notes` becomes a first-class query surface; without indexes, filters degrade quickly at 10k+ note scale.
```diff
@@ ## Phase 1: `lore notes` Command
+### Work Chunk 1E: Query Performance Indexes
+**Files:** `migrations/023_notes_query_indexes.sql`, `src/core/db.rs`
+
+Add indexes:
+- `notes(project_id, created_at DESC, id DESC)`
+- `notes(author_username, created_at DESC, id DESC) WHERE is_system = 0`
+- `notes(discussion_id)`
+- `notes(position_new_path)`
+- `notes(position_old_path)`
+- `discussions(issue_id)`
+- `discussions(merge_request_id)`
```
6. **Harden migration 022 with transactional integrity checks**
Rationale: This is the riskiest part of the plan. Add hard fail-fast checks so corruption cannot silently pass.
```diff
@@ ### Work Chunk 2A: Schema Migration (022)
+Migration safety requirements:
+- Execute in a single `BEGIN IMMEDIATE ... COMMIT` transaction.
+- Capture and compare pre/post row counts for `documents`, `document_labels`, `document_paths`, `dirty_sources`.
+- Run `PRAGMA foreign_key_check` and abort on any violation.
+- Run `PRAGMA integrity_check` and abort on non-`ok`.
+- Rebuild FTS and assert `documents_fts` rowcount equals `documents` rowcount.
```
7. **Add note deletion + parent-change propagation**
Rationale: Current plan handles create/update ingestion but not all staleness paths. Without this, note documents drift.
```diff
@@ ## Phase 2: Per-Note Documents
+### Work Chunk 2G: Freshness Propagation
+**Files:** `src/ingestion/discussions.rs`, `src/ingestion/mr_discussions.rs`, `src/documents/regenerator.rs`
+
+Rules:
+- If a previously stored note is missing from upstream payload, delete local note row and enqueue `(note, id)` for document deletion.
+- When parent issue/MR title or labels change, enqueue descendant note docs dirty (notes inherit parent metadata).
+- Keep idempotent behavior for repeated syncs.
```
8. **Separate FTS coverage from embedding coverage**
Rationale: Biggest cost/perf risk is embeddings. Index all notes in FTS, but embed selectively with policy knobs.
```diff
@@ ## Estimated Document Volume Impact
-FTS5 handles this comfortably. Embedding generation time scales linearly (~4x increase).
+FTS5 handles this comfortably. Embedding generation is policy-controlled:
+- FTS: index all non-system note docs
+- Embeddings default: only notes with body length >= 40 chars (configurable)
+- Add config: `documents.note_embeddings.min_chars`, `documents.note_embeddings.enabled`
+- Prioritize unresolved DiffNotes before other notes during embedding backfill
```
9. **Bring structured reviewer profiling into scope (not narrative reporting)**
Rationale: This directly serves the stated use case and makes the feature compelling immediately.
```diff
@@ ## Non-Goals
-- Adding a "reviewer profile" report command (that's a downstream use case built on this infrastructure)
+- Generating free-form narrative reviewer reports.
+ A structured profiling command is in scope.
+
+## Phase 3: Structured Reviewer Profiling
+Add `lore notes profile --author <user> --since <window>` returning:
+- top commented paths
+- top parent labels
+- unresolved-comment ratio
+- note-type distribution
+- median comment length
```
10. **Add operational SLOs + robot-mode status for note pipeline**
Rationale: Reliability improves when regressions are observable, not inferred from failures.
```diff
@@ ## Verification Checklist
+Operational checks:
+- `lore -J stats` includes per-`source_type` document counts (including `note`)
+- Add queue lag metrics: oldest dirty note age, retry backlog size
+- Add extraction error breakdown by `source_type`
+- Add smoke assertion: disabling `feature.note_documents` leaves other source regeneration unaffected
```
---
If you want, I can produce a single consolidated revised PRD draft (fully merged text, not just diffs) as the next step.


@@ -1,200 +0,0 @@
Below are the strongest revisions I'd make, excluding everything in your `## Rejected Recommendations` list.
1. **Add a Phase 0 for stable note identity before any note-doc generation**
Rationale: your current plan still allows note document churn because Issue discussion ingestion is delete/reinsert-based. That makes local `notes.id` unstable, causing unnecessary dirtying/regeneration and potential stale-doc edge cases. Stabilizing identity first (upsert-by-GitLab-ID + sweep stale) improves correctness and cuts repeated work.
```diff
@@ ## Design
-Two phases, shipped together as one feature:
+Three phases, shipped together as one feature:
+- **Phase 0 (Foundation):** Stable note identity in local DB (upsert + sweep, no delete/reinsert churn)
- **Phase 1 (Option A):** `lore notes` command — direct SQL query over the `notes` table with rich filtering
- **Phase 2 (Option B):** Per-note documents — each non-system note becomes its own searchable document in the FTS/embedding pipeline
@@
+## Phase 0: Stable Note Identity
+
+### Work Chunk 0A: Upsert/Sweep for Issue Discussion Notes
+**Files:** `src/ingestion/discussions.rs`, `migrations/022_notes_identity_index.sql`, `src/core/db.rs`
+**Implementation:**
+- Add unique index: `UNIQUE(project_id, gitlab_id)` on `notes`
+- Replace delete/reinsert issue-note flow with upsert + `last_seen_at` sweep (same durability model as MR note sweep)
+- Ensure `insert_note/upsert_note` returns the stable local row id for both insert and update paths
```
2. **Replace `source_type` CHECK constraints with a registry table + FK in migration**
Rationale: table CHECKs force full table rebuild for every new source type forever. A `source_types` table with FK keeps DB-level integrity and future extensibility without rebuilding `documents`/`dirty_sources` every time. This is a major architecture hardening win.
```diff
@@ ### Work Chunk 2A: Schema Migration (023)
-Current migration ... CHECK constraints limiting `source_type` ...
+Current migration ... CHECK constraints limiting `source_type` ...
+Revision: migrate to `source_types` registry table + FK constraints.
@@
-1. `dirty_sources` — add `'note'` to source_type CHECK
-2. `documents` — add `'note'` to source_type CHECK
+1. Create `source_types(name TEXT PRIMARY KEY)` and seed: `issue, merge_request, discussion, note`
+2. Rebuild `dirty_sources` and `documents` to replace CHECK with `REFERENCES source_types(name)`
+3. Future source-type additions become `INSERT INTO source_types(name) VALUES (?)` (no table rebuild)
@@
+#### Additional integrity tests
+#[test]
+fn test_source_types_registry_contains_note() { ... }
+#[test]
+fn test_documents_source_type_fk_enforced() { ... }
+#[test]
+fn test_dirty_sources_source_type_fk_enforced() { ... }
```
3. **Mark note documents dirty only when note semantics actually changed**
Rationale: current loops mark every non-system note dirty every sync. With 8k+ notes this creates avoidable queue pressure and regeneration time. Change-aware dirtying (inserted/changed only) gives major performance and stability improvements.
```diff
@@ ### Work Chunk 2D: Regenerator & Dirty Tracking Integration
-for note in notes {
- let local_note_id = insert_note(&tx, local_discussion_id, &note, None)?;
- if !note.is_system {
- dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, local_note_id)?;
- }
-}
+for note in notes {
+ let outcome = upsert_note(&tx, local_discussion_id, &note, None)?;
+ if !note.is_system && outcome.changed_semantics {
+ dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
+ }
+}
@@
+// changed_semantics should include: body, note_type, path/line positions, resolvable/resolved/resolved_by, updated_at
```
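A minimal sketch of how `changed_semantics` could be derived, assuming the previously stored row can be loaded before the upsert. `NoteSemantics` and `semantics_changed` are hypothetical names used only for illustration; the field list mirrors the comment in the diff above.

```rust
/// Hypothetical snapshot of the fields that feed a note document.
/// Housekeeping fields such as `last_seen_at` are deliberately left out.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NoteSemantics {
    pub body: String,
    pub note_type: Option<String>,
    pub position_new_path: Option<String>,
    pub position_new_line: Option<i64>,
    pub resolvable: bool,
    pub resolved: bool,
    pub resolved_by: Option<String>,
    pub updated_at: i64,
}

/// Returns true when the note is new (no stored row) or when any
/// semantic field differs from the stored row.
pub fn semantics_changed(stored: Option<&NoteSemantics>, incoming: &NoteSemantics) -> bool {
    match stored {
        None => true,
        Some(prev) => prev != incoming,
    }
}
```

Deriving `PartialEq` keeps the comparison in one place: adding a field to the struct automatically makes it part of the dirty check.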
4. **Expand filters to support real analysis windows and resolution state**
Rationale: reviewer profiling usually needs bounded windows and both resolved/unresolved views. Current `unresolved: bool` is too narrow and one-sided. Add `--until` and tri-state resolution filtering for better analytical power.
```diff
@@ pub struct NoteListFilters<'a> {
- pub since: Option<&'a str>,
+ pub since: Option<&'a str>,
+ pub until: Option<&'a str>,
@@
- pub unresolved: bool,
+ pub resolution: &'a str, // "any" (default) | "unresolved" | "resolved"
@@
- pub author: Option<&'a str>,
+ pub author: Option<&'a str>, // case-insensitive match
@@
- // Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
+ // Filter by start time (7d, 2w, 1m, or YYYY-MM-DD)
pub since: Option<String>,
+ /// Filter by end time (7d, 2w, 1m, or YYYY-MM-DD)
+ #[arg(long, help_heading = "Filters")]
+ pub until: Option<String>,
@@
- /// Only show unresolved review comments
- pub unresolved: bool,
+ /// Resolution filter: any, unresolved, resolved
+ #[arg(long, value_parser = ["any", "unresolved", "resolved"], default_value = "any", help_heading = "Filters")]
+ pub resolution: String,
```
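One way to keep the tri-state filter type-safe past the CLI boundary is to parse the string into an enum once. This is a sketch under assumptions: the `Resolution` type and `sql_predicate` helper are hypothetical, and the column names assume the `notes n` alias used elsewhere in the plan.

```rust
use std::str::FromStr;

/// Tri-state resolution filter; variants mirror the proposed CLI values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum Resolution {
    #[default]
    Any,
    Unresolved,
    Resolved,
}

impl FromStr for Resolution {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "any" => Ok(Self::Any),
            "unresolved" => Ok(Self::Unresolved),
            "resolved" => Ok(Self::Resolved),
            other => Err(format!("invalid resolution filter: {other}")),
        }
    }
}

impl Resolution {
    /// SQL predicate to append to the WHERE clause, or None for "any".
    pub fn sql_predicate(self) -> Option<&'static str> {
        match self {
            Self::Any => None,
            Self::Unresolved => Some("n.resolvable = 1 AND n.resolved = 0"),
            Self::Resolved => Some("n.resolvable = 1 AND n.resolved = 1"),
        }
    }
}
```

With this shape the query layer never compares raw strings, and an unknown value fails at parse time rather than silently matching nothing.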
5. **Broaden index strategy to match actual query shapes, not just author queries**
Rationale: `idx_notes_user_created` helps one path, but common usage also includes project+time scans and unresolved filters. Add two more partial composites for high-selectivity paths.
```diff
@@ ### Work Chunk 1E: Composite Query Index
CREATE INDEX IF NOT EXISTS idx_notes_user_created
ON notes(project_id, author_username, created_at DESC, id DESC)
WHERE is_system = 0;
+
+CREATE INDEX IF NOT EXISTS idx_notes_project_created
+ON notes(project_id, created_at DESC, id DESC)
+WHERE is_system = 0;
+
+CREATE INDEX IF NOT EXISTS idx_notes_unresolved_project_created
+ON notes(project_id, created_at DESC, id DESC)
+WHERE is_system = 0 AND resolvable = 1 AND resolved = 0;
@@
+#[test]
+fn test_notes_query_plan_uses_project_created_index_for_default_listing() { ... }
+#[test]
+fn test_notes_query_plan_uses_unresolved_index_when_resolution_unresolved() { ... }
```
6. **Improve per-note document payload with structured metadata header + minimal thread context**
Rationale: isolated single-note docs can lose meaning. A small structured header plus lightweight context (parent + one preceding note excerpt) improves semantic retrieval quality substantially without re-bundling full threads.
```diff
@@ ### Work Chunk 2C: Note Document Extractor
-// 6. Format content:
-// [[Note]] {note_type or "Comment"} on {parent_type_prefix}: {parent_title}
-// Project: {path_with_namespace}
-// URL: {url}
-// Author: @{author}
-// Date: {format_date(created_at)}
-// Labels: {labels_json}
-// File: {position_new_path}:{position_new_line} (if DiffNote)
-//
-// --- Body ---
-//
-// {body}
+// 6. Format content with machine-readable header:
+// [[Note]]
+// source_type: note
+// note_gitlab_id: {gitlab_id}
+// project: {path_with_namespace}
+// parent_type: {Issue|MergeRequest}
+// parent_iid: {iid}
+// note_type: {DiffNote|DiscussionNote|Comment}
+// author: @{author}
+// created_at: {iso8601}
+// resolved: {true|false}
+// path: {position_new_path}:{position_new_line}
+// url: {url}
+//
+// --- Context ---
+// parent_title: {title}
+// previous_note_excerpt: {optional, max 200 chars}
+//
+// --- Body ---
+// {body}
```
7. **Add first-class export modes for downstream profiling pipelines**
Rationale: this makes the feature much more useful immediately (LLM prompts, notebook analysis, external scripts) without adding a profiling command. It stays within your non-goals and increases adoption.
```diff
@@ pub struct NotesArgs {
+ /// Output format
+ #[arg(long, value_parser = ["table", "json", "jsonl", "csv"], default_value = "table", help_heading = "Output")]
+ pub format: String,
@@
- if robot_mode {
+ if robot_mode || args.format == "json" || args.format == "jsonl" || args.format == "csv" {
print_list_notes_json(...)
} else {
print_list_notes(&result);
}
@@ ### Work Chunk 1C: Human & Robot Output Formatting
+Add `print_list_notes_csv()` and `print_list_notes_jsonl()`:
+- CSV columns mirror `NoteListRowJson` field names
+- JSONL emits one note object per line for streaming pipelines
```
8. **Strengthen verification with idempotence + migration data-preservation checks**
Rationale: this feature touches ingestion, migrations, indexing, and regeneration. Add explicit idempotence/perf checks so regressions surface early.
```diff
@@ ## Verification Checklist
cargo test
cargo clippy --all-targets -- -D warnings
cargo fmt --check
+cargo test test_note_ingestion_idempotent_across_two_syncs
+cargo test test_note_document_count_stable_after_second_generate_docs_full
@@
+lore sync
+lore generate-docs --full
+lore -J stats > /tmp/stats1.json
+lore generate-docs --full
+lore -J stats > /tmp/stats2.json
+# assert note doc count unchanged and dirty queue drains to zero
```
If you want, I can turn this into a fully rewritten PRD v2 draft with these changes merged in-place and renumbered work chunks end-to-end.


@@ -1,162 +0,0 @@
These are the highest-impact revisions I'd make. They avoid everything in your `## Rejected Recommendations` list.
1. Add immediate note-document deletion propagation (don't wait for `generate-docs --full`)
Why: right now, deleted notes can leave stale `source_type='note'` documents until a full rebuild. That creates incorrect search/reporting results and weakens trust in the dataset.
```diff
@@ Phase 0: Stable Note Identity
+### Work Chunk 0B: Immediate Deletion Propagation
+
+When sweep deletes stale notes, propagate deletion to documents in the same transaction.
+Do not rely on eventual cleanup via `generate-docs --full`.
+
+#### Tests to Write First
+#[test]
+fn test_issue_note_sweep_deletes_note_documents_immediately() { ... }
+#[test]
+fn test_mr_note_sweep_deletes_note_documents_immediately() { ... }
+
+#### Implementation
+Use `DELETE ... RETURNING id, is_system` in note sweep functions.
+For returned non-system note ids:
+1) `DELETE FROM documents WHERE source_type='note' AND source_id=?`
+2) `DELETE FROM dirty_sources WHERE source_type='note' AND source_id=?`
```
2. Add one-time upgrade backfill for existing notes (migration 024)
Why: existing DBs will otherwise only get note-documents for changed/new notes. Historical notes remain invisible unless users manually run full rebuild.
```diff
@@ Phase 2: Per-Note Documents
+### Work Chunk 2H: Backfill Existing Notes After Upgrade (Migration 024)
+
+Create migration `024_note_dirty_backfill.sql`:
+INSERT INTO dirty_sources (source_type, source_id, queued_at)
+SELECT 'note', n.id, unixepoch('now') * 1000
+FROM notes n
+LEFT JOIN documents d
+ ON d.source_type='note' AND d.source_id=n.id
+WHERE n.is_system=0 AND d.id IS NULL
+ON CONFLICT(source_type, source_id) DO NOTHING;
+
+Add migration test asserting idempotence and expected queue size.
```
3. Fix `--since/--until` semantics and validation
Why: reusing `parse_since` for `until` creates ambiguous windows and off-by-boundary behavior; your own example `--since 90d --until 180d` is chronologically reversed.
```diff
@@ Work Chunk 1A: Data Types & Query Layer
- since: parse_since(since_str) then n.created_at >= ?
- until: parse_since(until_str) then n.created_at <= ?
+ since: parse_since_start_bound(since_str) then n.created_at >= ?
+ until: parse_until_end_bound(until_str) then n.created_at <= ?
+ Validate since <= until; otherwise return a clear user error.
+
+#### Tests to Write First
+#[test] fn test_query_notes_invalid_time_window_rejected() { ... }
+#[test] fn test_query_notes_until_date_is_end_of_day_inclusive() { ... }
```
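To make the reversed-window problem concrete, here is a rough sketch of relative-spec parsing plus the `since <= until` check. The helper names are hypothetical, and treating `m` as 30 days is an assumption; the real `parse_since` may define it differently.

```rust
const DAY_MS: i64 = 86_400_000;

/// Converts a relative spec such as "7d", "2w" or "1m" into an absolute
/// millisecond timestamp measured back from `now_ms`.
/// Assumption: "m" means 30 days.
pub fn relative_to_ms(spec: &str, now_ms: i64) -> Option<i64> {
    let unit = spec.chars().last()?;
    let digits = &spec[..spec.len() - unit.len_utf8()];
    let count = i64::from(digits.parse::<u32>().ok()?);
    let days = match unit {
        'd' => count,
        'w' => count * 7,
        'm' => count * 30,
        _ => return None,
    };
    now_ms.checked_sub(days.checked_mul(DAY_MS)?)
}

/// Rejects windows whose start lies after their end.
pub fn validate_window(since_ms: Option<i64>, until_ms: Option<i64>) -> Result<(), String> {
    match (since_ms, until_ms) {
        (Some(s), Some(u)) if s > u => Err(format!(
            "invalid time window: --since ({s}) is later than --until ({u})"
        )),
        _ => Ok(()),
    }
}
```

Under these rules `--since 90d --until 180d` resolves to a start that is later than the end, so `validate_window` returns an error instead of an empty result set.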
4. Separate semantic-change detection from housekeeping updates
Why: current proposed `WHERE` includes `updated_at`, which will cause unnecessary dirty churn. You want `last_seen_at` to always refresh, but regeneration only when searchable semantics changed.
```diff
@@ Work Chunk 0A: Upsert/Sweep for Issue Discussion Notes
- OR notes.updated_at IS NOT excluded.updated_at
+ -- updated_at-only changes should not mark semantic dirty
+
+Perform two-step logic:
+1) Upsert always updates persistence/housekeeping fields (`updated_at`, `last_seen_at`).
+2) `changed_semantics` is computed only from fields used by note documents/search filters
+ (body, note_type, resolved flags, paths, author, parent linkage).
+
+#### Tests to Write First
+#[test]
+fn test_issue_note_upsert_updated_at_only_does_not_mark_semantic_change() { ... }
```
5. Make indexes align with actual query collation and join strategy
Why: `author` uses `COLLATE NOCASE`; without a collation-aware index, SQLite cannot use the index for that comparison. Also, IID filters via scalar subqueries are harder for the planner to optimize than direct join predicates.
```diff
@@ Work Chunk 1E: Composite Query Index
-CREATE INDEX ... ON notes(project_id, author_username, created_at DESC, id DESC) WHERE is_system = 0;
+CREATE INDEX ... ON notes(project_id, author_username COLLATE NOCASE, created_at DESC, id DESC) WHERE is_system = 0;
+
+CREATE INDEX IF NOT EXISTS idx_discussions_issue_id ON discussions(issue_id);
+CREATE INDEX IF NOT EXISTS idx_discussions_mr_id ON discussions(merge_request_id);
```
```diff
@@ Work Chunk 1A: query_notes()
- d.issue_id = (SELECT id FROM issues WHERE iid = ? AND project_id = ?)
+ i.iid = ? AND i.project_id = ?
- d.merge_request_id = (SELECT id FROM merge_requests WHERE iid = ? AND project_id = ?)
+ m.iid = ? AND m.project_id = ?
```
6. Replace manual CSV escaping with `csv` crate
Why: manual RFC4180 escaping is fragile (quotes/newlines/multi-byte edge cases). This is exactly where a mature library reduces long-term bug risk.
```diff
@@ Work Chunk 1C: Human & Robot Output Formatting
- Uses a minimal CSV writer (no external dependency — the format is simple enough for manual escaping).
+ Uses `csv::Writer` for RFC4180-compliant escaping and stable output across edge cases.
+
+#### Tests to Write First
+#[test] fn test_csv_output_multiline_and_quotes_roundtrip() { ... }
```
7. Add `--contains` lexical body filter to `lore notes`
Why: useful middle ground between metadata filtering and semantic search; great for reviewer-pattern mining without requiring FTS query syntax.
```diff
@@ Work Chunk 1B: CLI Arguments & Command Wiring
+/// Filter by case-insensitive substring in note body
+#[arg(long, help_heading = "Filters")]
+pub contains: Option<String>;
```
```diff
@@ Work Chunk 1A: NoteListFilters
+ pub contains: Option<&'a str>,
@@ query_notes dynamic filters
+ if let Some(needle) = contains {
+ // Assumes escape_like() escapes %, _ and \ with a backslash.
+ where_clauses.push("n.body LIKE ? COLLATE NOCASE ESCAPE '\\'");
+ params.push(format!("%{}%", escape_like(needle)));
+ }
```
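The diff above calls `escape_like` without defining it. A possible shape for that helper, assuming backslash is the escape character and that the query pairs the pattern with `ESCAPE '\'` (without that clause SQLite treats the backslashes as literal characters):

```rust
/// Escapes LIKE metacharacters so user input is matched literally.
/// Assumes the SQL uses `ESCAPE '\'` alongside the bound pattern.
pub fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

/// Builds the bound parameter for a substring match.
pub fn contains_pattern(needle: &str) -> String {
    format!("%{}%", escape_like(needle))
}
```

The backslash itself is escaped first-class, so a search term that already contains `\` cannot accidentally escape the character after it.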
8. Reduce note-document embedding noise by slimming metadata header
Why: current verbose key-value header repeats low-signal tokens and consumes embedding budget. Keep context, but bias tokens toward actual review text.
```diff
@@ Work Chunk 2C: Note Document Extractor
- Build content with structured metadata header:
- [[Note]]
- source_type: note
- note_gitlab_id: ...
- project: ...
- ...
- --- Body ---
- {body}
+ Build content with compact, high-signal layout:
+ [[Note]]
+ @{author} on {Issue#|MR!}{iid} in {project_path}
+ path: {path:line} (only when available)
+ state: {resolved|unresolved} (only when resolvable)
+
+ {body}
+
+Keep detailed metadata in structured document columns/labels/paths/url,
+not repeated in verbose text.
```
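A sketch of the compact layout as a pure formatting function. `NoteDocInput` and `render_note_document` are hypothetical names, and the exact field set is an assumption based on the layout in the diff above.

```rust
/// Hypothetical input for rendering; field names are illustrative.
pub struct NoteDocInput<'a> {
    pub author: &'a str,
    pub parent_is_mr: bool,
    pub parent_iid: i64,
    pub project_path: &'a str,
    pub path: Option<&'a str>,
    pub line: Option<i64>,
    pub resolvable: bool,
    pub resolved: bool,
    pub body: &'a str,
}

/// Renders the compact header followed by the note body.
/// Optional lines are emitted only when the data is present.
pub fn render_note_document(n: &NoteDocInput<'_>) -> String {
    let parent = if n.parent_is_mr {
        format!("MR!{}", n.parent_iid)
    } else {
        format!("Issue#{}", n.parent_iid)
    };
    let mut out = format!("[[Note]]\n@{} on {} in {}\n", n.author, parent, n.project_path);
    if let Some(path) = n.path {
        match n.line {
            Some(line) => out.push_str(&format!("path: {path}:{line}\n")),
            None => out.push_str(&format!("path: {path}\n")),
        }
    }
    if n.resolvable {
        let state = if n.resolved { "resolved" } else { "unresolved" };
        out.push_str(&format!("state: {state}\n"));
    }
    out.push('\n');
    out.push_str(n.body);
    out
}
```

Keeping the renderer free of database access makes the output easy to snapshot-test and keeps the document hash deterministic for identical inputs.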
9. Add explicit performance regression checks for the new hot paths
Why: this feature increases document volume roughly 4x; you should pin acceptable query behavior now so future changes don't silently degrade it.
```diff
@@ Verification Checklist
+Performance/plan checks:
+1) `EXPLAIN QUERY PLAN` for:
+ - author+since query
+ - project+date query
+ - for-mr / for-issue query
+2) Seed 50k-note synthetic fixture and assert:
+ - `lore notes --author ... --limit 100` stays under agreed local threshold
+ - `lore search --type note ...` remains deterministic and completes successfully
```
If you want, I can also provide a fully merged “iteration 3” PRD text with these edits applied end-to-end so you can drop it in directly.


@@ -1,187 +0,0 @@
1. **Canonical note identity for documents: use `notes.gitlab_id` as `source_id`**
Why this is better: the current plan still couples document identity to local row IDs. Even with upsert+sweep, local IDs are a storage artifact and can be reused in edge cases. Using GitLab note IDs as canonical document IDs makes regeneration, backfill, and deletion propagation more stable and portable.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 0: Stable Note Identity
-Phase 2 depends on `notes.id` as the `source_id` for note documents.
+Phase 2 uses `notes.gitlab_id` as the `source_id` for note documents.
+`notes.id` remains an internal relational key only.
@@ Work Chunk 0A
pub struct NoteUpsertOutcome {
pub local_note_id: i64,
+ pub document_source_id: i64, // notes.gitlab_id
pub changed_semantics: bool,
}
@@ Work Chunk 2D
-if !note.is_system && outcome.changed_semantics {
- dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
+if !note.is_system && outcome.changed_semantics {
+ dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.document_source_id)?;
}
@@ Work Chunk 2E
-SELECT 'note', n.id, ?1
+SELECT 'note', n.gitlab_id, ?1
@@ Work Chunk 2H
-ON d.source_type = 'note' AND d.source_id = n.id
+ON d.source_type = 'note' AND d.source_id = n.gitlab_id
```
2. **Prevent false deletions on partial/incomplete syncs**
Why this is better: sweep-based deletion is correct only when a discussion's notes were fully fetched. If a page fails mid-fetch, current logic can incorrectly delete valid notes. Add an explicit “fetch complete” guard before sweep.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 0
+### Work Chunk 0C: Sweep Safety Guard (Partial Fetch Protection)
+
+Only run stale-note sweep when note pagination completed successfully for that discussion.
+If fetch is partial/interrupted, skip sweep and keep prior notes intact.
+#### Tests to Write First
+#[test]
+fn test_partial_fetch_does_not_sweep_notes() { /* ... */ }
+
+#[test]
+fn test_complete_fetch_runs_sweep_notes() { /* ... */ }
+#### Implementation
+if discussion_fetch_complete {
+ sweep_stale_issue_notes(...)?;
+} else {
+ tracing::warn!("Skipping stale sweep for discussion {} due to partial fetch", discussion_gitlab_id);
+}
```
3. **Make deletion propagation set-based (not per-note loop)**
Why this is better: the current per-note DELETE loop is O(N) statements and gets slow on large threads. A temp-table/CTE set-based delete is faster, simpler to reason about, and remains atomic.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 0B Implementation
- for note_id in stale_note_ids {
- conn.execute("DELETE FROM documents WHERE source_type = 'note' AND source_id = ?", [note_id])?;
- conn.execute("DELETE FROM dirty_sources WHERE source_type = 'note' AND source_id = ?", [note_id])?;
- }
+ CREATE TEMP TABLE _stale_note_source_ids(source_id INTEGER PRIMARY KEY) WITHOUT ROWID;
+ INSERT INTO _stale_note_source_ids
+ SELECT gitlab_id
+ FROM notes
+ WHERE discussion_id = ? AND last_seen_at < ? AND is_system = 0;
+
+ DELETE FROM notes
+ WHERE discussion_id = ? AND last_seen_at < ?;
+
+ DELETE FROM documents
+ WHERE source_type = 'note'
+ AND source_id IN (SELECT source_id FROM _stale_note_source_ids);
+
+ DELETE FROM dirty_sources
+ WHERE source_type = 'note'
+ AND source_id IN (SELECT source_id FROM _stale_note_source_ids);
+
+ DROP TABLE _stale_note_source_ids;
```
4. **Fix project-scoping and time-window semantics in `lore notes`**
Why this is better: the plan currently has a contradiction: clap `requires = "project"` blocks use of `defaultProject`, while query layer says default fallback is allowed. Also, `since/until` parsing should use one shared “now” to avoid subtle drift and inverted windows.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 1B NotesArgs
-#[arg(long = "for-issue", ..., requires = "project")]
+#[arg(long = "for-issue", ...)]
pub for_issue: Option<i64>;
-#[arg(long = "for-mr", ..., requires = "project")]
+#[arg(long = "for-mr", ...)]
pub for_mr: Option<i64>;
@@ Work Chunk 1A Query Notes
-- `since`: `parse_since(since_str)` then `n.created_at >= ?`
-- `until`: `parse_since(until_str)` then `n.created_at <= ?`
+- Parse `since` and `until` with a single anchored `now_ms` captured once per command.
+- If user supplies `YYYY-MM-DD` for `--until`, interpret as end-of-day (23:59:59.999 UTC).
+- Validate `since <= until` after both parse with same anchor.
```
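The end-of-day rule for `YYYY-MM-DD` can be sketched without a date crate. The function names below are hypothetical, and the sketch covers only absolute dates; relative specs such as `7d` would be subtracted from the single `now_ms` captured per command.

```rust
const DAY_MS: i64 = 86_400_000;

/// Days since 1970-01-01 for a proleptic Gregorian date (UTC).
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (month + 9) % 12; // March-based month index
    let doy = (153 * mp + 2) / 5 + day - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parses "YYYY-MM-DD" and validates month and day-of-month ranges.
fn parse_ymd(s: &str) -> Option<(i64, i64, i64)> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) {
        return None;
    }
    let leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    let days_in_month = match m {
        2 => if leap { 29 } else { 28 },
        4 | 6 | 9 | 11 => 30,
        _ => 31,
    };
    (1..=days_in_month).contains(&d).then_some((y, m, d))
}

/// Start bound: 00:00:00.000 UTC of the given day.
pub fn since_start_bound(date: &str) -> Option<i64> {
    let (y, m, d) = parse_ymd(date)?;
    Some(days_from_civil(y, m, d) * DAY_MS)
}

/// End bound: 23:59:59.999 UTC of the given day, so `<=` is inclusive.
pub fn until_end_bound(date: &str) -> Option<i64> {
    since_start_bound(date).map(|start| start + DAY_MS - 1)
}
```

Because both bounds come from the same date arithmetic, `--since D --until D` always yields a non-empty, exactly one-day window.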
5. **Add an analytics mode (not a profile command): `lore notes --aggregate`**
Why this is better: this directly supports the stated use case (review patterns) without introducing the rejected “profile report” command. It keeps scope narrow and reuses existing filters.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 1
+### Work Chunk 1F: Aggregation Mode for Notes Listing
+
+Add optional aggregation on top of `lore notes`:
+- `--aggregate author|note_type|path|resolution`
+- `--top N` (default 20)
+
+Behavior:
+- Reuses all existing filters (`--since`, `--project`, `--for-mr`, etc.)
+- Returns grouped counts (+ percentage of filtered corpus)
+- Works in table/json/jsonl/csv
+
+Non-goal alignment:
+- This is not a narrative “reviewer profile” command.
+- It is a query primitive for downstream analysis.
```
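As a rough illustration of the grouped-count output, the aggregation could be expressed as below. This is an in-memory sketch with hypothetical names (`GroupCount`, `aggregate_top`); a real implementation would more likely push the `GROUP BY` into SQL.

```rust
use std::collections::HashMap;

/// One row of aggregate output.
#[derive(Debug, PartialEq)]
pub struct GroupCount {
    pub key: String,
    pub count: usize,
    pub percent: f64,
}

/// Counts occurrences of each key and returns the `top` largest groups.
/// Percentages are relative to the whole filtered corpus, not the top N.
pub fn aggregate_top<'a, I>(keys: I, top: usize) -> Vec<GroupCount>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    let mut total = 0usize;
    for key in keys {
        *counts.entry(key).or_insert(0) += 1;
        total += 1;
    }
    let mut rows: Vec<(&str, usize)> = counts.into_iter().collect();
    // Count descending, then key ascending, so output order is deterministic.
    rows.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    rows.into_iter()
        .take(top)
        .map(|(key, count)| GroupCount {
            key: key.to_string(),
            count,
            percent: 100.0 * count as f64 / total as f64,
        })
        .collect()
}
```

The explicit tie-break on the key matters for CSV and JSONL consumers, which would otherwise see rows reorder between runs.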
6. **Prevent note backfill from starving other document regeneration**
Why this is better: after migration/backfill, note dirty entries can dominate the queue and delay issue/MR/discussion updates. Add source-type fairness in regenerator scheduling.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 2D
+#### Scheduling Revision
+Process dirty sources with weighted fairness instead of strict FIFO:
+- issue: 3
+- merge_request: 3
+- discussion: 2
+- note: 1
+
+Implementation sketch:
+- fetch next batch by source_type buckets
+- interleave according to weights
+- preserve retry semantics per source
+#### Tests to Write First
+#[test]
+fn test_note_backfill_does_not_starve_issue_and_mr_regeneration() { /* ... */ }
```
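The interleaving step in the implementation sketch could look like the following weighted round-robin. `Bucket` and `interleave_weighted` are hypothetical names; fetching the buckets from `dirty_sources` and the retry handling are out of scope here.

```rust
use std::collections::VecDeque;

/// Pending work for one source type, with its scheduling weight.
pub struct Bucket<T> {
    pub weight: usize,
    pub items: VecDeque<T>,
}

/// Weighted round-robin: each pass takes up to `weight` items from every
/// bucket, until `limit` items are collected or all buckets are empty.
pub fn interleave_weighted<T>(buckets: &mut [Bucket<T>], limit: usize) -> Vec<T> {
    let mut batch = Vec::new();
    loop {
        let mut took_any = false;
        for bucket in buckets.iter_mut() {
            for _ in 0..bucket.weight {
                if batch.len() == limit {
                    return batch;
                }
                match bucket.items.pop_front() {
                    Some(item) => {
                        batch.push(item);
                        took_any = true;
                    }
                    None => break,
                }
            }
        }
        if !took_any {
            return batch;
        }
    }
}
```

With weights 3 (issues) and 1 (notes), a large note backlog still gets one slot per pass, while issues drain three times as fast until their bucket is empty.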
7. **Harden migration 023: remove invalid SQL assertions and move integrity checks to tests**
Why this is better: `RAISE(ABORT, ...)` in a standalone `SELECT` is not valid SQLite; `RAISE()` may only be used inside a trigger program. Keep migration SQL minimal/portable and enforce invariants in migration tests.
```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 2A Migration SQL
--- Step 10: Integrity verification
-SELECT CASE
- WHEN ... THEN RAISE(ABORT, '...')
-END;
+-- Step 10 removed from SQL migration.
+-- Integrity verification is enforced in migration tests:
+-- 1) pre/post row-count equality
+-- 2) `PRAGMA foreign_key_check` is empty
+-- 3) documents_fts row count matches documents row count after rebuild
@@ Work Chunk 2A Tests
+#[test]
+fn test_migration_023_integrity_checks_pass() {
+ // pre/post counts, foreign_key_check empty, fts parity
+}
```
These 7 revisions improve correctness under failure, reduce churn risk, improve large-sync performance, and make the feature materially more useful for reviewer-analysis workflows without reintroducing any rejected recommendations.


@@ -1,190 +0,0 @@
Here are the highest-impact revisions I'd make. None of these repeat anything in your `## Rejected Recommendations`.
1. **Add immutable reviewer identity (`author_id`) as a first-class key**
Why this improves the plan: the PRD's core use case is year-scale reviewer profiling. Usernames are mutable in GitLab, so username-only filtering will fragment one reviewer into multiple identities over time. Adding `author_id` closes that correctness hole and makes historical analysis reliable.
```diff
@@ Problem Statement
-1. **Query individual notes by author** — the `--author` filter on `lore search` only matches the first note's author per discussion thread
+1. **Query individual notes by reviewer identity** — support both mutable username and immutable GitLab `author_id` for stable longitudinal analysis
@@ Phase 0: Stable Note Identity
+### Work Chunk 0D: Immutable Author Identity Capture
+**Files:** `migrations/025_notes_author_id.sql`, `src/ingestion/discussions.rs`, `src/ingestion/mr_discussions.rs`, `src/cli/commands/list.rs`
+
+#### Implementation
+- Add nullable `notes.author_id INTEGER` and backfill from future syncs.
+- Populate `author_id` from GitLab note payload (`note.author.id`) on both issue and MR note ingestion paths.
+- Add `--author-id <int>` filter to `lore notes`.
+- Keep `--author` for ergonomics; when both provided, require both to match.
+
+#### Indexing
+- Add `idx_notes_author_id_created ON notes(project_id, author_id, created_at DESC, id DESC) WHERE is_system = 0;`
+
+#### Tests
+- `test_query_notes_filter_author_id_survives_username_change`
+- `test_query_notes_author_and_author_id_intersection`
```
2. **Strengthen partial-fetch safety from a boolean to an explicit fetch state contract**
Why this improves the plan: `fetch_complete: bool` is easy to misuse and fragile under retries/crashes. A run-scoped state model makes sweep correctness auditable and prevents accidental deletions when ingestion aborts midway.
```diff
@@ Phase 0: Stable Note Identity
-### Work Chunk 0C: Sweep Safety Guard (Partial Fetch Protection)
+### Work Chunk 0C: Sweep Safety Guard with Run-Scoped Fetch State
@@ Implementation
-Add a `fetch_complete` parameter to the discussion ingestion functions. Only run the stale-note sweep when the fetch completed successfully:
+Add a run-scoped fetch state:
+- `FetchState::Complete`
+- `FetchState::Partial`
+- `FetchState::Failed`
+
+Only run sweep on `FetchState::Complete`.
+Persist `run_seen_at` once per sync run and pass unchanged through all discussion/note upserts.
+Require `run_seen_at` monotonicity per discussion before sweep (skip and warn otherwise).
@@ Tests to Write First
+#[test]
+fn test_failed_fetch_never_sweeps_even_after_partial_upserts() { ... }
+#[test]
+fn test_non_monotonic_run_seen_at_skips_sweep() { ... }
+#[test]
+fn test_retry_after_failed_fetch_then_complete_sweeps_correctly() { ... }
```
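A minimal sketch of the sweep gate, assuming the previous run's marker is available per discussion. `SweepDecision` and `decide_sweep` are hypothetical names, and treating an equal `run_seen_at` as acceptable (non-decreasing rather than strictly increasing) is an assumption.

```rust
/// Outcome of fetching one discussion's notes in the current sync run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FetchState {
    Complete,
    Partial,
    Failed,
}

/// Whether the stale-note sweep may run, with a reason when it may not.
#[derive(Debug, PartialEq, Eq)]
pub enum SweepDecision {
    Sweep,
    Skip(&'static str),
}

/// Sweeps only after a complete fetch whose `run_seen_at` does not go
/// backwards relative to the marker recorded for this discussion.
pub fn decide_sweep(
    state: FetchState,
    run_seen_at: i64,
    last_run_seen_at: Option<i64>,
) -> SweepDecision {
    if state != FetchState::Complete {
        return SweepDecision::Skip("fetch did not complete");
    }
    match last_run_seen_at {
        Some(prev) if run_seen_at < prev => SweepDecision::Skip("run_seen_at is not monotonic"),
        _ => SweepDecision::Sweep,
    }
}
```

Returning a reason alongside the skip makes the warning log line trivial to produce and gives the tests something precise to assert on.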
3. **Add DB-level cleanup triggers for note-document referential integrity**
Why this improves the plan: Work Chunk 0B handles the sweep path, but not every possible delete path. DB triggers give defense-in-depth so stale note docs cannot survive even if a future code path deletes notes differently.
```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
-Update both sweep functions to propagate deletion to documents and dirty_sources using set-based SQL
+Keep set-based SQL in sweep functions, and add DB-level cleanup triggers as a safety net.
@@ Work Chunk 2A: Schema Migration (023)
+-- Cleanup trigger: deleting a non-system note must delete note document + dirty queue row
+CREATE TRIGGER notes_ad_cleanup AFTER DELETE ON notes
+WHEN old.is_system = 0
+BEGIN
+ DELETE FROM documents
+ WHERE source_type = 'note' AND source_id = old.id;
+ DELETE FROM dirty_sources
+ WHERE source_type = 'note' AND source_id = old.id;
+END;
+
+-- Cleanup trigger: if note flips to system, remove its document artifacts
+CREATE TRIGGER notes_au_system_cleanup AFTER UPDATE OF is_system ON notes
+WHEN old.is_system = 0 AND new.is_system = 1
+BEGIN
+ DELETE FROM documents
+ WHERE source_type = 'note' AND source_id = new.id;
+ DELETE FROM dirty_sources
+ WHERE source_type = 'note' AND source_id = new.id;
+END;
```
4. **Eliminate N+1 extraction cost with parent metadata caching in regeneration**
Why this improves the plan: backfilling ~8k notes with per-note parent/label lookups creates avoidable query amplification. Batch caching turns repeated joins into one-time lookups per parent entity and materially reduces rebuild time.
```diff
@@ Phase 2: Per-Note Documents
+### Work Chunk 2I: Batch Parent Metadata Cache for Note Regeneration
+**Files:** `src/documents/regenerator.rs`, `src/documents/extractor.rs`
+
+#### Implementation
+- Add `NoteExtractionContext` cache keyed by `(noteable_type, parent_id)` containing:
+ - parent iid/title/url
+ - parent labels
+ - project path
+- In batch regeneration, prefetch parent metadata for note IDs in the current chunk.
+- Use cached metadata in `extract_note_document()` to avoid repeated parent/label queries.
+
+#### Tests
+- `test_note_regeneration_uses_parent_cache_consistently`
+- `test_note_regeneration_cache_hit_preserves_hash_determinism`
```
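The cache itself can be sketched as a keyed map with a lazy loader. `ParentMeta`, the `loads` counter, and `get_or_load` are hypothetical; the loader closure stands in for whatever query fetches parent metadata.

```rust
use std::collections::HashMap;

/// Hypothetical parent metadata needed to render a note document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParentMeta {
    pub iid: i64,
    pub title: String,
    pub url: String,
    pub labels: Vec<String>,
    pub project_path: String,
}

/// Per-batch cache keyed by (noteable_type, parent_id).
pub struct NoteExtractionContext {
    cache: HashMap<(String, i64), ParentMeta>,
    /// Number of times the loader actually ran (cache misses).
    pub loads: usize,
}

impl NoteExtractionContext {
    pub fn new() -> Self {
        Self { cache: HashMap::new(), loads: 0 }
    }

    /// Returns cached metadata, invoking `load` only on the first request
    /// for a given parent.
    pub fn get_or_load<F>(&mut self, noteable_type: &str, parent_id: i64, load: F) -> &ParentMeta
    where
        F: FnOnce() -> ParentMeta,
    {
        let loads = &mut self.loads;
        self.cache
            .entry((noteable_type.to_string(), parent_id))
            .or_insert_with(|| {
                *loads += 1;
                load()
            })
    }
}
```

Scoping the context to one regeneration batch avoids stale metadata across syncs while still collapsing repeated lookups for notes on the same MR or issue.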
5. **Add embedding dedup cache keyed by semantic text hash**
Why this improves the plan: note docs will contain repeated short comments (“LGTM”, “nit: …”). Current doc-level hashing includes metadata, so identical semantic comments still re-embed many times. A semantic embedding hash cache cuts cost and speeds full rebuild/backfill without changing search behavior.
```diff
@@ Phase 2: Per-Note Documents
+### Work Chunk 2J: Semantic Embedding Dedup for Notes
+**Files:** `migrations/026_embedding_cache.sql`, embedding pipeline module(s), `src/documents/extractor.rs`
+
+#### Implementation
+- Compute `embedding_text` for notes as: normalized note body + compact stable context (`parent_type`, `path`, `resolution`), excluding volatile fields.
+- Compute `embedding_hash = sha256(embedding_text)`.
+- Before embedding generation, lookup existing vector by `(model, embedding_hash)`.
+- Reuse cached vector when present; only call embedding model on misses.
+
+#### Tests
+- `test_identical_note_bodies_reuse_embedding_vector`
+- `test_embedding_hash_changes_when_semantic_context_changes`
```
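A sketch of the dedup lookup, under loud assumptions: `DefaultHasher` is used here only as a std-only stand-in for `sha256` and is not collision-resistant, so real code would use a cryptographic hash; `EmbeddingCache` and its methods are hypothetical, and the embed closure stands in for the model call.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Builds the text that determines the embedding: stable context plus the
/// whitespace-normalized body. Volatile fields are intentionally excluded.
pub fn embedding_text(body: &str, parent_type: &str, path: Option<&str>, resolution: &str) -> String {
    let normalized = body.split_whitespace().collect::<Vec<_>>().join(" ");
    format!("{parent_type}|{}|{resolution}\n{normalized}", path.unwrap_or(""))
}

/// Stand-in for sha256(embedding_text). NOT collision-resistant.
pub fn embedding_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// Vector cache keyed by (model, embedding_hash).
pub struct EmbeddingCache {
    vectors: HashMap<(String, u64), Vec<f32>>,
    /// Number of cache misses, i.e. actual model invocations.
    pub model_calls: usize,
}

impl EmbeddingCache {
    pub fn new() -> Self {
        Self { vectors: HashMap::new(), model_calls: 0 }
    }

    /// Reuses a cached vector when present; calls `embed` only on a miss.
    pub fn get_or_embed<F>(&mut self, model: &str, text: &str, embed: F) -> Vec<f32>
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        let key = (model.to_string(), embedding_hash(text));
        if let Some(vector) = self.vectors.get(&key) {
            return vector.clone();
        }
        self.model_calls += 1;
        let vector = embed(text);
        self.vectors.insert(key, vector.clone());
        vector
    }
}
```

Including the model name in the key means a model upgrade naturally invalidates old vectors without a separate cache flush.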
6. **Add deterministic review-signal tags as derived labels**
Why this improves the plan: this makes output immediately more useful for reviewer-pattern analysis without adding a profile command (which is explicitly out of scope). It increases practical value of both `lore notes` and `lore search --type note` with low complexity.
```diff
@@ Non-Goals
-- Adding a "reviewer profile" report command (that's a downstream use case built on this infrastructure)
+- Adding a "reviewer profile" report command (downstream), while allowing low-level derived signal tags as indexing primitives
@@ Phase 2: Per-Note Documents
+### Work Chunk 2K: Derived Review Signal Labels
+**Files:** `src/documents/extractor.rs`
+
+#### Implementation
+- Derive deterministic labels from note text + metadata:
+ - `signal:nit`
+ - `signal:blocking`
+ - `signal:security`
+ - `signal:performance`
+ - `signal:testing`
+- Attach via existing `document_labels` flow for note documents.
+- No new CLI mode required; existing label filters can consume these labels.
+
+#### Tests
+- `test_note_document_derives_signal_labels_nit`
+- `test_note_document_derives_signal_labels_security`
+- `test_signal_label_derivation_is_deterministic`
```
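One possible shape for the derivation is a fixed keyword table scanned in order. The keyword lists below are illustrative assumptions, not part of the plan, and plain substring matching is deliberately naive.

```rust
/// Label-to-keyword rules, checked in table order so output is stable.
/// The keywords are illustrative assumptions only.
const SIGNAL_RULES: &[(&str, &[&str])] = &[
    ("signal:nit", &["nit:", "nitpick"]),
    ("signal:blocking", &["blocking", "must fix"]),
    ("signal:security", &["security", "injection", "xss"]),
    ("signal:performance", &["performance", "slow", "n+1"]),
    ("signal:testing", &["test", "coverage"]),
];

/// Returns every label whose keyword list matches the lowercased body.
pub fn derive_signal_labels(body: &str) -> Vec<&'static str> {
    let lower = body.to_lowercase();
    SIGNAL_RULES
        .iter()
        .filter(|(_, keywords)| keywords.iter().any(|k| lower.contains(*k)))
        .map(|(label, _)| *label)
        .collect()
}
```

Substring matching will produce false positives (for example, `test` also matches `latest`), so a production version would likely want word-boundary matching; the point here is only that a static table keeps the derivation deterministic.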
7. **Add high-precision note targeting filters (`--note-id`, `--gitlab-note-id`, `--discussion-id`)**
Why this improves the plan: debugging, incident response, and reproducibility all benefit from exact addressing. This is especially useful when validating sync correctness and cross-checking a specific note/document lifecycle.
```diff
@@ Work Chunk 1B: CLI Arguments & Command Wiring
pub struct NotesArgs {
+ /// Filter by local note row id
+ #[arg(long = "note-id", help_heading = "Filters")]
+ pub note_id: Option<i64>,
+
+ /// Filter by GitLab note id
+ #[arg(long = "gitlab-note-id", help_heading = "Filters")]
+ pub gitlab_note_id: Option<i64>,
+
+ /// Filter by local discussion id
+ #[arg(long = "discussion-id", help_heading = "Filters")]
+ pub discussion_id: Option<i64>,
}
@@ Work Chunk 1A: Filter struct
pub struct NoteListFilters<'a> {
+ pub note_id: Option<i64>,
+ pub gitlab_note_id: Option<i64>,
+ pub discussion_id: Option<i64>,
}
@@ Tests to Write First
+#[test]
+fn test_query_notes_filter_note_id_exact() { ... }
+#[test]
+fn test_query_notes_filter_gitlab_note_id_exact() { ... }
+#[test]
+fn test_query_notes_filter_discussion_id_exact() { ... }
```
If you want, I can produce a single consolidated “iteration 5” PRD diff that merges these into your exact section ordering and updates the dependency graph/migration numbering end-to-end.


@@ -1,434 +0,0 @@
Below are the highest-leverage revisions I'd make to this plan. I'm focusing on correctness pitfalls, SQLite gotchas, query performance on 280K notes, and reducing “dynamic SQL + param juggling” complexity, without turning this into a new ingestion project.
Change 1 — Fix a hard SQLite bug in --active (GROUP_CONCAT DISTINCT + separator)
Why
SQLite does not allow GROUP_CONCAT(DISTINCT x, sep). With DISTINCT, SQLite only permits a single argument (GROUP_CONCAT(DISTINCT x)). Your current query will error at runtime in many SQLite versions.
Revision
Use a subquery that selects distinct participants, then GROUP_CONCAT with your separator.
diff --git a/Plan.md b/Plan.md
@@ fn query_active(...)
- (SELECT GROUP_CONCAT(DISTINCT n.author_username, X'1F')
- FROM notes n
- WHERE n.discussion_id = d.id
- AND n.is_system = 0
- AND n.author_username IS NOT NULL) AS participants
+ (SELECT GROUP_CONCAT(username, X'1F') FROM (
+ SELECT DISTINCT n.author_username AS username
+ FROM notes n
+ WHERE n.discussion_id = d.id
+ AND n.is_system = 0
+ AND n.author_username IS NOT NULL
+ ORDER BY username
+ )) AS participants
Change 2 — Replace “contains('.') => exact file match” with segment-aware path classification
Why
path.contains('.') misclassifies directories like:
.github/workflows/
src/v1.2/auth/
It also fails the “root file” case (README.md) because your mode discriminator only treats paths as paths if they contain /.
Revision
Add explicit --path to force Expert mode (covers root files cleanly).
Classify file-vs-dir by checking last path segment for a dot, and whether the input ends with /.
diff --git a/Plan.md b/Plan.md
@@ pub struct WhoArgs {
- /// Username or file path (path if contains /)
- pub target: Option<String>,
+ /// Username or file path shorthand (ambiguous for root files like README.md)
+ pub target: Option<String>,
+
+ /// Force expert mode for a file/directory path (supports root files like README.md)
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["active", "overlap", "reviews"])]
+ pub path: Option<String>,
@@ fn resolve_mode<'a>(args: &'a WhoArgs) -> Result<WhoMode<'a>> {
- if let Some(target) = &args.target {
+ if let Some(p) = &args.path {
+ return Ok(WhoMode::Expert { path: p });
+ }
+ if let Some(target) = &args.target {
let clean = target.strip_prefix('@').unwrap_or(target);
if args.reviews {
return Ok(WhoMode::Reviews { username: clean });
}
- // Disambiguation: if target contains '/', it's a file path.
- // GitLab usernames never contain '/'.
- if target.contains('/') {
+ // Disambiguation:
+ // - treat as path if it contains '/'
+ // - otherwise treat as username (root files require --path)
+ if target.contains('/') {
return Ok(WhoMode::Expert { path: target });
}
return Ok(WhoMode::Workload { username: clean });
}
And update the path pattern logic used by Expert/Overlap:
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
- // Normalize path for LIKE matching: add trailing % if no extension
- let path_pattern = if path.contains('.') {
- path.to_string() // Exact file match
- } else {
- let trimmed = path.trim_end_matches('/');
- format!("{trimmed}/%")
- };
+ // Normalize:
+ // - if ends_with('/') => directory prefix
+ // - else if last segment contains '.' => file exact match
+ // - else => directory prefix
+ let trimmed = path.trim_end_matches('/');
+ let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
+ let is_file = !path.ends_with('/') && last.contains('.');
+ let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
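The classification rule above can be pulled into a standalone function to make its edge cases easy to check. This is a sketch; the function name `path_pattern` is an assumption, and the body simply restates the logic from the diff.

```rust
/// Builds the LIKE pattern for path matching:
/// - a trailing '/' always means a directory prefix
/// - otherwise a '.' in the last segment means an exact file match
/// - otherwise the input is treated as a directory prefix
pub fn path_pattern(path: &str) -> String {
    let trimmed = path.trim_end_matches('/');
    let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_file = !path.ends_with('/') && last.contains('.');
    if is_file {
        trimmed.to_string()
    } else {
        format!("{trimmed}/%")
    }
}
```

Note the remaining ambiguity: a dotted directory given without a trailing slash, such as `src/v1.2` or `.github`, is still classified as a file, so callers need the trailing `/` to force directory matching in those cases.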
Change 3 — Stop building dynamic SQL strings for optional filters; always bind params
Why
Right now you're mixing:
dynamic project_clause string fragments
ad-hoc param vectors
placeholder renumbering by branch
That's brittle and easy to regress (especially when you add more conditions later). SQLite/rusqlite can bind Option<T> to NULL, which enables a simple pattern:
AND (?3 IS NULL OR n.project_id = ?3)
Revision (representative; apply to all queries)
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
- let project_clause = if project_id.is_some() {
- "AND n.project_id = ?3"
- } else {
- ""
- };
-
- let sql = format!(
+ let sql = format!(
"SELECT username, role, activity_count, last_active_at FROM (
@@
FROM notes n
WHERE n.position_new_path LIKE ?1
AND n.is_system = 0
AND n.author_username IS NOT NULL
AND n.created_at >= ?2
- {project_clause}
+ AND (?3 IS NULL OR n.project_id = ?3)
@@
WHERE n.position_new_path LIKE ?1
AND m.author_username IS NOT NULL
AND m.updated_at >= ?2
- {project_clause}
+ AND (?3 IS NULL OR n.project_id = ?3)
GROUP BY m.author_username
- )"
+ ) t"
);
-
- let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
- params.push(Box::new(path_pattern.clone()));
- params.push(Box::new(since_ms));
- if let Some(pid) = project_id {
- params.push(Box::new(pid));
- }
- let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
+ let param_refs = rusqlite::params![path_pattern, since_ms, project_id];
Notes:
Adds a derived-table alias t. SQLite itself accepts an unaliased subquery in FROM, but the alias keeps the query portable and easier to reference.
Eliminates the dynamic param vector and placeholder gymnastics.
Change 4 — Filter “path touch” queries to DiffNotes and escape LIKE properly
Why
Only DiffNotes reliably have position_new_path; including other note types can skew counts and harm performance.
LIKE treats % and _ as wildcards—rare in file paths, but not impossible (generated files, templates). Escaping is a low-cost robustness win.
Revision
Add note_type='DiffNote' and LIKE ... ESCAPE '\' plus a tiny escape helper.
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
- FROM notes n
- WHERE n.position_new_path LIKE ?1
+ FROM notes n
+ WHERE n.note_type = 'DiffNote'
+ AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND n.is_system = 0
@@
diff --git a/Plan.md b/Plan.md
@@ Helper Functions
+fn escape_like(input: &str) -> String {
+ input.replace('\\', "\\\\").replace('%', "\\%").replace('_', "\\_")
+}
And when building patterns:
- let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
+ let base = escape_like(trimmed);
+ let path_pattern = if is_file { base } else { format!("{base}/%") };
Apply the same changes to query_overlap and any other position_new_path LIKE ....
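The escaping and classification rules from Changes 2 and 4 can be combined into one std-only helper. This is a sketch rather than the plan's final code: the name `build_path_pattern` is taken from later iterations of the plan, and the returned string is assumed to be bound to a `LIKE ?1 ESCAPE '\'` predicate.

```rust
/// Escape LIKE metacharacters so they match literally under `ESCAPE '\'`.
/// The backslash is escaped first so later replacements are not re-escaped.
pub fn escape_like(input: &str) -> String {
    input
        .replace('\\', "\\\\")
        .replace('%', "\\%")
        .replace('_', "\\_")
}

/// Classify a user-supplied path and build the LIKE pattern:
/// - trailing '/'                      => directory prefix
/// - last segment contains '.'         => exact file match
/// - anything else                     => directory prefix
pub fn build_path_pattern(path: &str) -> String {
    let trimmed = path.trim_end_matches('/');
    let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_file = !path.ends_with('/') && last.contains('.');
    let base = escape_like(trimmed);
    if is_file {
        base
    } else {
        format!("{base}/%")
    }
}
```

Note that a dotless file name such as `LICENSE` is still classified as a directory by this heuristic and becomes `LICENSE/%`.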
Change 5 — Use note timestamps for “touch since” semantics (Expert/Overlap author branch)
Why
In Expert/Overlap “author” branches you filter by m.updated_at >= since. That answers “MR updated recently” rather than “MR touched at this path recently”, which can surface stale ownership.
Revision
Filter by the note creation time (and use it for “last touch” where relevant). You can still compute author activity, but anchor it to note activity.
diff --git a/Plan.md b/Plan.md
@@ fn query_overlap(...)
- WHERE n.position_new_path LIKE ?1
+ WHERE n.note_type = 'DiffNote'
+ AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND m.state IN ('opened', 'merged')
AND m.author_username IS NOT NULL
- AND m.updated_at >= ?2
+ AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
The same idea applies to Expert mode's “MR authors” branch.
Change 6 — Workload mode: apply --since consistently to unresolved discussions
Why
Workload's unresolved-discussions query ignores since_ms. That makes --since partially misleading and can dump very old threads.
Revision
Filter on d.last_note_at when since_ms is set.
diff --git a/Plan.md b/Plan.md
@@ fn query_workload(...)
- let disc_sql = format!(
+ let disc_since = if since_ms.is_some() {
+ "AND d.last_note_at >= ?2"
+ } else { "" };
+ let disc_sql = format!(
"SELECT d.noteable_type,
@@
WHERE d.resolvable = 1 AND d.resolved = 0
AND EXISTS (
@@
)
{disc_project_filter}
+ {disc_since}
ORDER BY d.last_note_at DESC
LIMIT {limit}"
);
@@
- // Rebuild params for discussion query (only username + optional project_id)
- let mut disc_params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
- disc_params.push(Box::new(username.to_string()));
- if let Some(pid) = project_id {
- disc_params.push(Box::new(pid));
- }
+ // Params: username, since_ms, project_id (NULLs ok)
+ let disc_param_refs = rusqlite::params![username, since_ms, project_id];
(If you adopt Change 3 fully, this becomes very clean.)
Change 7 — Make Overlap results represent “both roles” instead of collapsing to one
Why
Collapsing to a single role loses valuable info (“they authored and reviewed”). Also your current “prefer author” rule is arbitrary for the “who else is touching this” question.
Revision
Track role counts separately and render as A, R, or A+R.
diff --git a/Plan.md b/Plan.md
@@ pub struct OverlapUser {
pub username: String,
- pub role: String,
- pub touch_count: u32,
+ pub author_touch_count: u32,
+ pub review_touch_count: u32,
+ pub touch_count: u32,
pub last_touch_at: i64,
pub mr_iids: Vec<i64>,
}
@@ fn query_overlap(...)
- let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
+ let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
username: username.clone(),
- role: role.clone(),
+ author_touch_count: 0,
+ review_touch_count: 0,
touch_count: 0,
last_touch_at: 0,
mr_iids: Vec::new(),
});
entry.touch_count += count;
+ if role == "author" { entry.author_touch_count += count; }
+ if role == "reviewer" { entry.review_touch_count += count; }
@@ human output
- println!(
- " {:<16} {:<8} {:>7} {:<12} {}",
+ println!(
+ " {:<16} {:<6} {:>7} {:<12} {}",
...
);
@@
- user.role,
+ format_roles(user.author_touch_count, user.review_touch_count),
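The diff above calls `format_roles` without defining it. One possible shape, shown as a sketch: the `A` / `R` / `A+R` labels come from the revision text, while the `-` placeholder for a user with no touches in either role is an assumption made here.

```rust
/// Render the role column for Overlap output.
/// "-" for the no-activity case is a hypothetical placeholder, not from the plan.
pub fn format_roles(author_touch_count: u32, review_touch_count: u32) -> &'static str {
    match (author_touch_count > 0, review_touch_count > 0) {
        (true, true) => "A+R",
        (true, false) => "A",
        (false, true) => "R",
        (false, false) => "-",
    }
}
```

Returning `&'static str` keeps the helper allocation-free and fits the `{:<6}` column width used in the human output.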
Change 8 — Add an “Index Audit + optional migration” step (big perf win, low blast radius)
Why
With 280K notes, the path/timestamp queries will degrade quickly without indexes. This isn't “scope creep”; it's making the feature usable.
Revision (plan-level)
Add a non-breaking migration that only creates indexes if missing.
Optionally add a runtime check: if EXPLAIN QUERY PLAN indicates full table scan on notes, print a dim warning in human mode.
diff --git a/Plan.md b/Plan.md
@@ Implementation Order
-| Step | What | Files |
+| Step | What | Files |
| 1 | CLI skeleton: `WhoArgs` + `Commands::Who` + dispatch + stub | `cli/mod.rs`, `commands/mod.rs`, `main.rs` |
+| 1.5 | Index audit + add `CREATE INDEX IF NOT EXISTS` migration for who hot paths | `migrations/0xx_who_indexes.sql` |
@@
Suggested indexes (tune names to your conventions):
notes(note_type, position_new_path, created_at)
notes(discussion_id, is_system, author_username)
discussions(resolvable, resolved, last_note_at, project_id)
merge_requests(project_id, state, updated_at, author_username)
issue_assignees(username, issue_id)
Even if SQLite can't perfectly index LIKE, these still help with join and timestamp filters.
Change 9 — Make robot JSON reproducible by echoing the effective query inputs
Why
Agent workflows benefit from a stable “query record”: what mode ran, what path/user, resolved project, effective since, limit.
Revision
Include an input object in JSON output.
diff --git a/Plan.md b/Plan.md
@@ struct WhoJsonData {
mode: String,
+ input: serde_json::Value,
#[serde(flatten)]
result: serde_json::Value,
}
@@ pub fn print_who_json(...)
- let output = WhoJsonEnvelope {
+ let input = serde_json::json!({
+ "project": /* resolved or raw args.project */,
+ "since": /* resolved since ISO */,
+ "limit": /* args.limit */,
+ });
+ let output = WhoJsonEnvelope {
ok: true,
data: WhoJsonData {
mode: mode.to_string(),
+ input,
result: data,
},
meta: RobotMeta { elapsed_ms },
};
Change 10 — Tighten clap constraints so invalid combinations never reach resolve_mode
Why
Right now conflicts are enforced manually (or not at all). Clamp the invalid combos at the CLI layer:
--active should conflict with target, --overlap, --reviews, --path
--reviews should require a username (and should conflict with Expert path modes)
diff --git a/Plan.md b/Plan.md
@@ pub struct WhoArgs {
- pub active: bool,
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "overlap", "reviews", "path"])]
+ pub active: bool,
@@
- pub overlap: Option<String>,
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "active", "reviews", "path"])]
+ pub overlap: Option<String>,
@@
- pub reviews: bool,
+ #[arg(long, help_heading = "Mode", requires = "target", conflicts_with_all = ["active", "overlap", "path"])]
+ pub reviews: bool,
Summary of what I'd definitely change
If you do nothing else, do these first:
Fix GROUP_CONCAT(DISTINCT ..., sep) in Active mode (runtime error).
Path classification: add --path, and stop using contains('.') globally.
Remove dynamic SQL + param vectors: always bind project_id as nullable and use (? IS NULL OR ...).
Filter to DiffNotes + LIKE escaping for correctness and fewer rows scanned.
Optional index migration: without it, queries will feel slow, or unpredictably slow, depending on local DB state.
If you want, I can also provide a consolidated “v2 plan” as a single unified patch (one diff) rather than per-change snippets.


@@ -1,303 +0,0 @@
Below are the highest-leverage revisions I'd make to iteration 1 to tighten correctness, performance, and “agent usefulness” without blowing up scope. For each change: (1) rationale, (2) a focused unified diff against the plan you pasted.
Change 1 — Make robot “input echo” actually resolved (project_id, project_path, since_ms/iso, mode)
Why
Your Design Principle #5 says the robot envelope should echo resolved inputs (“effective since, resolved project”), but the current input object echoes only raw CLI strings. Agents can't reliably reproduce or compare runs (e.g., fuzzy project resolution may map differently over time).
This is also a reliability improvement: “what ran” should be computed once and propagated, not recomputed in output.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-5. **Robot-first reproducibility.** Robot JSON output includes an `input` object echoing the resolved query parameters (effective since, resolved project, limit) so agents can trace exactly what ran.
+5. **Robot-first reproducibility.** Robot JSON output includes a `resolved_input` object (mode, since_ms + since_iso, resolved project_id + project_path, limit, db_path) so agents can trace exactly what ran.
@@
-/// Main entry point. Resolves mode from args and dispatches.
-pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoResult> {
+/// Main entry point. Resolves mode + resolved inputs once, then dispatches.
+pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoRun> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
- let project_id = args
+ let project_id = args
.project
.as_deref()
.map(|p| resolve_project(&conn, p))
.transpose()?;
+ let project_path = project_id
+ .map(|id| lookup_project_path(&conn, id))
+ .transpose()?;
let mode = resolve_mode(args)?;
match mode {
WhoMode::Expert { path } => {
let since_ms = resolve_since(args.since.as_deref(), "6m")?;
let result = query_expert(&conn, path, project_id, since_ms, args.limit)?;
- Ok(WhoResult::Expert(result))
+ Ok(WhoRun::new("expert", &db_path, project_id, project_path, since_ms, args.limit, WhoResult::Expert(result)))
}
@@
}
}
+
+/// Wrapper that carries resolved inputs for reproducible output.
+pub struct WhoRun {
+ pub mode: String,
+ pub resolved_input: WhoResolvedInput,
+ pub result: WhoResult,
+}
+
+pub struct WhoResolvedInput {
+ pub db_path: String,
+ pub project_id: Option<i64>,
+ pub project_path: Option<String>,
+ pub since_ms: i64,
+ pub since_iso: String,
+ pub limit: usize,
+}
@@
-pub fn print_who_json(result: &WhoResult, args: &WhoArgs, elapsed_ms: u64) {
- let (mode, data) = match result {
+pub fn print_who_json(run: &WhoRun, args: &WhoArgs, elapsed_ms: u64) {
+ let (mode, data) = match &run.result {
WhoResult::Expert(r) => ("expert", expert_to_json(r)),
@@
- let input = serde_json::json!({
+ let input = serde_json::json!({
"target": args.target,
"path": args.path,
"project": args.project,
"since": args.since,
"limit": args.limit,
});
+
+ let resolved_input = serde_json::json!({
+ "mode": run.mode,
+ "db_path": run.resolved_input.db_path,
+ "project_id": run.resolved_input.project_id,
+ "project_path": run.resolved_input.project_path,
+ "since_ms": run.resolved_input.since_ms,
+ "since_iso": run.resolved_input.since_iso,
+ "limit": run.resolved_input.limit,
+ });
@@
- data: WhoJsonData {
- mode: mode.to_string(),
- input,
- result: data,
- },
+ data: WhoJsonData { mode: mode.to_string(), input, resolved_input, result: data },
meta: RobotMeta { elapsed_ms },
};
@@
struct WhoJsonData {
mode: String,
input: serde_json::Value,
+ resolved_input: serde_json::Value,
#[serde(flatten)]
result: serde_json::Value,
}
Change 2 — Remove dynamic SQL format!(..LIMIT {limit}) and parameterize LIMIT everywhere
Why
You explicitly prefer static SQL ((?N IS NULL OR ...)) to avoid subtle bugs, but Workload/Active use format! for LIMIT. Even though limit is typed, it's an inconsistency that complicates statement caching and encourages future string assembly creep.
SQLite supports LIMIT ? with bound parameters; rusqlite can bind an i64.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
- let issues_sql = format!(
- "SELECT ...
- ORDER BY i.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&issues_sql)?;
+ let issues_sql =
+ "SELECT ...
+ ORDER BY i.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(issues_sql)?;
let assigned_issues: Vec<WorkloadIssue> = stmt
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let authored_sql = format!(
- "SELECT ...
- ORDER BY m.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&authored_sql)?;
+ let authored_sql =
+ "SELECT ...
+ ORDER BY m.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(authored_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let reviewing_sql = format!(
- "SELECT ...
- ORDER BY m.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&reviewing_sql)?;
+ let reviewing_sql =
+ "SELECT ...
+ ORDER BY m.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(reviewing_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let disc_sql = format!(
- "SELECT ...
- ORDER BY d.last_note_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&disc_sql)?;
+ let disc_sql =
+ "SELECT ...
+ ORDER BY d.last_note_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(disc_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let sql = format!(
- "SELECT ...
- ORDER BY d.last_note_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&sql)?;
+ let sql =
+ "SELECT ...
+ ORDER BY d.last_note_at DESC
+ LIMIT ?3";
+ let mut stmt = conn.prepare(sql)?;
@@
- .query_map(rusqlite::params![since_ms, project_id], |row| {
+ .query_map(rusqlite::params![since_ms, project_id, limit as i64], |row| {
Change 3 — Fix path matching for dotless files (LICENSE/Makefile) via “exact OR prefix” (no new flags)
Why
Your improved “dot only in last segment” heuristic still fails on dotless files (LICENSE, Makefile, Dockerfile) which are common, especially at repo root. Right now they'll be treated as directories (LICENSE/%) and silently return nothing.
Best minimal UX: if the user provides a path that's ambiguous (no trailing slash), match either exact file OR directory prefix.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-/// Build a LIKE pattern from a user-supplied path, with proper LIKE escaping.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix → `escaped_path%`
-/// - If the last path segment contains `.`, it's a file → exact match
-/// - Otherwise, it's a directory prefix → `escaped_path/%`
+/// Build an exact + prefix match from a user-supplied path, with proper LIKE escaping.
+///
+/// Rules:
+/// - If the path ends with `/`, treat as directory-only (prefix match)
+/// - Otherwise, treat as ambiguous: exact match OR directory prefix
+/// (fixes dotless files like LICENSE/Makefile without requiring new flags)
@@
-fn build_path_pattern(path: &str) -> String {
+struct PathMatch {
+ exact: String,
+ prefix: String,
+ dir_only: bool,
+}
+
+fn build_path_match(path: &str) -> PathMatch {
let trimmed = path.trim_end_matches('/');
- let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
- let is_file = !path.ends_with('/') && last_segment.contains('.');
let escaped = escape_like(trimmed);
-
- if is_file {
- escaped
- } else {
- format!("{escaped}/%")
- }
+ PathMatch {
+ exact: escaped.clone(),
+ prefix: format!("{escaped}/%"),
+ dir_only: path.ends_with('/'),
+ }
}
@@
- let path_pattern = build_path_pattern(path);
+ let pm = build_path_match(path);
@@
- AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND (
+ (?4 = 1 AND n.position_new_path LIKE ?2 ESCAPE '\\')
+ OR (?4 = 0 AND (n.position_new_path = ?1 OR n.position_new_path LIKE ?2 ESCAPE '\\'))
+ )
@@
- let rows: Vec<(String, String, u32, i64)> = stmt
- .query_map(rusqlite::params![path_pattern, since_ms, project_id], |row| {
+ let rows: Vec<(String, String, u32, i64)> = stmt
+ .query_map(rusqlite::params![pm.exact, pm.prefix, since_ms, i32::from(pm.dir_only), project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
})?
(Apply the same pattern to Overlap mode.)
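The “exact OR prefix” semantics can be checked without a database. The sketch below is an in-memory model only: it compares plain strings instead of issuing SQL, so no LIKE escaping is involved, and its `prefix` field holds a plain directory prefix (`dir/`) rather than the LIKE pattern (`dir/%`) used in the plan. The `matches` function is a hypothetical helper introduced for illustration.

```rust
/// In-memory model of the plan's `PathMatch` (plain strings, no LIKE escaping).
pub struct PathMatch {
    pub exact: String,
    pub prefix: String,
    pub dir_only: bool,
}

pub fn build_path_match(path: &str) -> PathMatch {
    let trimmed = path.trim_end_matches('/');
    PathMatch {
        exact: trimmed.to_string(),
        prefix: format!("{trimmed}/"),
        dir_only: path.ends_with('/'),
    }
}

/// Mirrors the SQL predicate:
///   dir_only     => prefix match only
///   not dir_only => exact match OR prefix match
pub fn matches(pm: &PathMatch, candidate: &str) -> bool {
    let under_dir = candidate.starts_with(&pm.prefix);
    if pm.dir_only {
        under_dir
    } else {
        candidate == pm.exact || under_dir
    }
}
```

Because the prefix always ends in `/`, a query for `src/auth` does not accidentally match a sibling such as `src/authz/x.rs`.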
Change 4 — Consistently exclude system notes in all DiffNote-based branches (Expert/Overlap author branches currently don't)
Why
You filter n.is_system = 0 for reviewer branches, but not in the author branches of Expert/Overlap. That can skew “author touch” via system-generated diff notes or bot activity.
Consistency here improves correctness and also enables more aggressive partial indexing.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND n.is_system = 0
AND m.author_username IS NOT NULL
AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND n.is_system = 0
AND m.state IN ('opened', 'merged')
AND m.author_username IS NOT NULL
AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
Change 5 — Rework Migration 017 indexes to match real predicates + add one critical notes index for discussion participation
Why
(a) idx_notes_diffnote_path_created currently leads with note_type even though it's constant via the partial index. You want the leading columns to match your most selective predicates: position_new_path prefix + created_at range, with optional project_id.
(b) Active + Workload discussion participation repeatedly hits notes by (discussion_id, author_username); you only guarantee notes(discussion_id) is indexed. Adding a narrow partial composite index pays off immediately for both “participants” and “EXISTS user participated” checks.
(c) The discussions index should focus on (project_id, last_note_at) with a partial predicate; resolvable/resolved a_


@@ -1,471 +0,0 @@
Below are the revisions I'd make to iteration 2 to improve correctness, determinism, query-plan quality, and multi-project usability without turning this into a bigger product.
I'm treating your plan as the “source of truth” and showing git-diff style patches against the plan text/code blocks you included.
Change 1 — Fix project scoping to hit the right index (DiffNote branches)
Why
Your hot-path index is:
idx_notes_diffnote_path_created ON notes(position_new_path, created_at, project_id) WHERE note_type='DiffNote' AND is_system=0
But in Expert/Overlap you sometimes scope by m.project_id = ?3 (MR table), not n.project_id = ?3 (notes table). That weakens the optimizer's ability to use the composite notes index (and can force broader joins before filtering).
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Expert Mode @@
- AND (?3 IS NULL OR m.project_id = ?3)
+ -- IMPORTANT: scope on notes.project_id to maximize use of
+ -- idx_notes_diffnote_path_created (notes is the selective table)
+ AND (?3 IS NULL OR n.project_id = ?3)
@@ Query: Overlap Mode @@
- AND (?3 IS NULL OR m.project_id = ?3)
+ AND (?3 IS NULL OR n.project_id = ?3)
@@ Query: Overlap Mode (author branch) @@
- AND (?3 IS NULL OR m.project_id = ?3)
+ AND (?3 IS NULL OR n.project_id = ?3)
Change 2 — Introduce a “prefix vs exact” path query to avoid LIKE when you don't need it
Why
For exact file paths (e.g. src/auth/login.rs), you currently do:
position_new_path LIKE ?1 ESCAPE '\' where ?1 has no wildcard
That's logically fine, but it's a worse signal to the planner than = and can degrade performance depending on collation/case settings.
This doesn't violate “static SQL”: you can pick between two static query strings.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Helper: Path Pattern Construction @@
-fn build_path_pattern(path: &str) -> String {
+struct PathQuery {
+ /// The parameter value to bind.
+ value: String,
+ /// If true: bind as a LIKE pattern (value already ends in "/%"). If false: compare with '='.
+ is_prefix: bool,
+}
+
+fn build_path_query(path: &str) -> PathQuery {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
let is_file = !path.ends_with('/') && last_segment.contains('.');
let escaped = escape_like(trimmed);
if is_file {
- escaped
+ PathQuery { value: escaped, is_prefix: false }
} else {
- format!("{escaped}/%")
+ PathQuery { value: format!("{escaped}/%"), is_prefix: true }
}
}
And then (example for DiffNote predicates):
@@ Query: Expert Mode @@
- let path_pattern = build_path_pattern(path);
+ let pq = build_path_query(path);
- let sql = " ... n.position_new_path LIKE ?1 ESCAPE '\\' ... ";
+ let sql_prefix = " ... n.position_new_path LIKE ?1 ESCAPE '\\' ... ";
+ let sql_exact = " ... n.position_new_path = ?1 ... ";
- let mut stmt = conn.prepare(sql)?;
+ let mut stmt = if pq.is_prefix { conn.prepare_cached(sql_prefix)? }
+ else { conn.prepare_cached(sql_exact)? };
let rows = stmt.query_map(params![... pq.value ...], ...);
Change 3 — Push Expert aggregation into SQL (less Rust, fewer rows, SQL-level LIMIT)
Why
Right now Expert does:
UNION ALL
return per-role rows
HashMap merge
score compute
sort/truncate
You can do all of that in SQL deterministically, then LIMIT ?N actually works.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Expert Mode @@
- let sql = "SELECT username, role, activity_count, last_active_at FROM (
- ...
- )";
+ let sql = "
+ WITH activity AS (
+ SELECT
+ n.author_username AS username,
+ 'reviewer' AS role,
+ COUNT(*) AS cnt,
+ MAX(n.created_at) AS last_active_at
+ FROM notes n
+ WHERE n.note_type = 'DiffNote'
+ AND n.is_system = 0
+ AND n.author_username IS NOT NULL
+ AND n.created_at >= ?2
+ AND (?3 IS NULL OR n.project_id = ?3)
+ AND (
+ (?4 = 1 AND n.position_new_path LIKE ?1 ESCAPE '\\') OR
+ (?4 = 0 AND n.position_new_path = ?1)
+ )
+ GROUP BY n.author_username
+
+ UNION ALL
+
+ SELECT
+ m.author_username AS username,
+ 'author' AS role,
+ COUNT(DISTINCT m.id) AS cnt,
+ MAX(n.created_at) AS last_active_at
+ FROM merge_requests m
+ JOIN discussions d ON d.merge_request_id = m.id
+ JOIN notes n ON n.discussion_id = d.id
+ WHERE n.note_type = 'DiffNote'
+ AND n.is_system = 0
+ AND m.author_username IS NOT NULL
+ AND n.created_at >= ?2
+ AND (?3 IS NULL OR n.project_id = ?3)
+ AND (
+ (?4 = 1 AND n.position_new_path LIKE ?1 ESCAPE '\\') OR
+ (?4 = 0 AND n.position_new_path = ?1)
+ )
+ GROUP BY m.author_username
+ )
+ SELECT
+ username,
+ SUM(CASE WHEN role='reviewer' THEN cnt ELSE 0 END) AS review_count,
+ SUM(CASE WHEN role='author' THEN cnt ELSE 0 END) AS author_count,
+ MAX(last_active_at) AS last_active_at,
+ (SUM(CASE WHEN role='reviewer' THEN cnt ELSE 0 END) * 3.0) +
+ (SUM(CASE WHEN role='author' THEN cnt ELSE 0 END) * 2.0) AS score
+ FROM activity
+ GROUP BY username
+ ORDER BY score DESC, last_active_at DESC, username ASC
+ LIMIT ?5
+ ";
- // Aggregate by username: combine reviewer + author counts
- let mut user_map: HashMap<...> = HashMap::new();
- ...
- experts.sort_by(...); experts.truncate(limit);
+ // No Rust-side merge/sort needed; SQL already returns final rows.
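When the ranking moves into SQL, a small pure-Rust reference of the same rule is useful for unit tests. The sketch below assumes the weights from the query above (3.0 per review touch, 2.0 per authored MR) and the same ordering (score descending, then last activity descending, then username ascending). The `Expert` struct and the `score` and `rank` names are illustrative, not taken from the plan.

```rust
/// Illustrative row shape matching the final SELECT's columns.
#[derive(Debug, Clone, PartialEq)]
pub struct Expert {
    pub username: String,
    pub review_count: u32,
    pub author_count: u32,
    pub last_active_at: i64,
}

/// Same weighting as the SQL: reviews * 3.0 + authored * 2.0.
pub fn score(e: &Expert) -> f64 {
    f64::from(e.review_count) * 3.0 + f64::from(e.author_count) * 2.0
}

/// ORDER BY score DESC, last_active_at DESC, username ASC, then LIMIT.
pub fn rank(mut experts: Vec<Expert>, limit: usize) -> Vec<Expert> {
    experts.sort_by(|a, b| {
        score(b)
            .total_cmp(&score(a))
            .then(b.last_active_at.cmp(&a.last_active_at))
            .then_with(|| a.username.cmp(&b.username))
    });
    experts.truncate(limit);
    experts
}
```

The username tiebreaker is what makes the output deterministic when two users have the same score and the same last-activity timestamp.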
Change 4 — Overlap output is ambiguous across projects: include stable MR refs (project_path!iid)
Why
mr_iids: Vec<i64> is ambiguous in a multi-project DB. !123 only means something with a project.
Also: your MR IID dedup is currently Vec.contains() inside a loop (O(n²)). Use a HashSet.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ OverlapResult @@
pub struct OverlapUser {
pub username: String,
@@
- pub mr_iids: Vec<i64>,
+ /// Stable MR references like "group/project!123"
+ pub mr_refs: Vec<String>,
}
@@ Query: Overlap Mode (SQL) @@
- GROUP_CONCAT(DISTINCT m.iid) AS mr_iids
+ GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN merge_requests m ON d.merge_request_id = m.id
+ JOIN projects p ON m.project_id = p.id
@@
- GROUP_CONCAT(DISTINCT m.iid) AS mr_iids
+ GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
FROM merge_requests m
JOIN discussions d ON d.merge_request_id = m.id
JOIN notes n ON n.discussion_id = d.id
+ JOIN projects p ON m.project_id = p.id
@@ Query: Overlap Mode (Rust merge) @@
- let mr_iids: Vec<i64> = mr_iids_csv ...
+ let mr_refs: Vec<String> = mr_refs_csv
+ .as_deref()
+ .map(|csv| csv.split(',').map(|s| s.trim().to_string()).collect())
+ .unwrap_or_default();
@@
- // Merge MR IIDs, deduplicate
- for iid in &mr_iids {
- if !entry.mr_iids.contains(iid) {
- entry.mr_iids.push(*iid);
- }
- }
+ // Merge MR refs, deduplicate
+ use std::collections::HashSet;
+ let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
+ for r in mr_refs { set.insert(r); }
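+ entry.mr_refs = set.into_iter().collect();
+ // HashSet iteration order is unspecified; sort so output stays deterministic.
+ entry.mr_refs.sort();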
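The parse-and-merge step can also be written with a `BTreeSet`, which deduplicates and yields a stable order in one pass. This is a sketch with hypothetical helper names (`parse_mr_refs`, `merge_mr_refs`); it assumes refs arrive as the comma-separated `GROUP_CONCAT` output shown above.

```rust
use std::collections::BTreeSet;

/// Split a GROUP_CONCAT result into trimmed, non-empty MR refs.
pub fn parse_mr_refs(csv: Option<&str>) -> Vec<String> {
    csv.map(|s| {
        s.split(',')
            .map(|r| r.trim().to_string())
            .filter(|r| !r.is_empty())
            .collect()
    })
    .unwrap_or_default()
}

/// Merge incoming refs into the existing list: deduplicated and sorted.
pub fn merge_mr_refs(existing: &mut Vec<String>, incoming: Vec<String>) {
    let mut set: BTreeSet<String> = existing.drain(..).collect();
    set.extend(incoming);
    *existing = set.into_iter().collect();
}
```

The resulting order is lexicographic, so `g/p!12` sorts before `g/p!3`; if numeric IID order matters for display, a separate sort key would be needed.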
Change 5 — Active mode: avoid correlated subqueries by preselecting discussions, then aggregating notes once
Why
Your Active query does two correlated subqueries per discussion row:
note_count
participants
With LIMIT 20 it's not catastrophic, but it is still unnecessary work and creates “spiky” behavior if the planner chooses poorly.
Pattern to use:
CTE selects the limited set of discussions
Join notes once, aggregate with GROUP BY
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Active Mode @@
- let sql =
- "SELECT
- d.noteable_type,
- ...
- (SELECT COUNT(*) FROM notes n
- WHERE n.discussion_id = d.id AND n.is_system = 0) AS note_count,
- (SELECT GROUP_CONCAT(username, X'1F') FROM (
- SELECT DISTINCT n.author_username AS username
- FROM notes n
- WHERE n.discussion_id = d.id
- AND n.is_system = 0
- AND n.author_username IS NOT NULL
- ORDER BY username
- )) AS participants
- FROM discussions d
- ...
- LIMIT ?3";
+ let sql = "
+ WITH picked AS (
+ SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id, d.project_id, d.last_note_at
+ FROM discussions d
+ WHERE d.resolvable = 1 AND d.resolved = 0
+ AND d.last_note_at >= ?1
+ AND (?2 IS NULL OR d.project_id = ?2)
+ ORDER BY d.last_note_at DESC
+ LIMIT ?3
+ ),
+ note_counts AS (
+ -- One row per picked discussion: number of non-system notes.
+ SELECT n.discussion_id, COUNT(*) AS note_count
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0
+ GROUP BY n.discussion_id
+ ),
+ participants AS (
+ -- Distinct authors per picked discussion, joined with the unit separator.
+ SELECT x.discussion_id, GROUP_CONCAT(x.author_username, X'1F') AS participants
+ FROM (
+ SELECT DISTINCT n.discussion_id, n.author_username
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0 AND n.author_username IS NOT NULL
+ ) x
+ GROUP BY x.discussion_id
+ )
+ SELECT
+ p.noteable_type,
+ COALESCE(i.iid, m.iid) AS entity_iid,
+ COALESCE(i.title, m.title) AS entity_title,
+ proj.path_with_namespace,
+ p.last_note_at,
+ COALESCE(nc.note_count, 0) AS note_count,
+ COALESCE(pa.participants, '') AS participants
+ FROM picked p
+ JOIN projects proj ON p.project_id = proj.id
+ LEFT JOIN issues i ON p.issue_id = i.id
+ LEFT JOIN merge_requests m ON p.merge_request_id = m.id
+ LEFT JOIN note_counts nc ON nc.discussion_id = p.id
+ LEFT JOIN participants pa ON pa.discussion_id = p.id
+ ORDER BY p.last_note_at DESC
+ ";
Change 6 — Use prepare_cached() everywhere (cheap perf win, no scope creep)
Why
You already worked hard to keep SQL static. Taking advantage of rusqlite's statement caching completes the loop.
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query functions @@
- let mut stmt = conn.prepare(sql)?;
+ let mut stmt = conn.prepare_cached(sql)?;
Apply in all query fns (query_workload, query_reviews, query_active, query_expert, query_overlap, lookup_project_path).
Change 7 — Human output: show project_path where ambiguity exists (Workload + Overlap)
Why
When not project-scoped, #42 and !100 aren't unique. You already have project paths in the query results; you're just not printing them.
Diff
diff
Copy code
--- a/who-command-design.md
+++ b/who-command-design.md
@@ print_workload_human @@
- println!(
- " {} {} {}",
+ println!(
+ " {} {} {} {}",
style(format!("#{:<5}", item.iid)).cyan(),
truncate_str(&item.title, 45),
style(format_relative_time(item.updated_at)).dim(),
+ style(&item.project_path).dim(),
);
@@ print_workload_human (MRs) @@
- println!(
- " {} {}{} {}",
+ println!(
+ " {} {}{} {} {}",
style(format!("!{:<5}", mr.iid)).cyan(),
truncate_str(&mr.title, 40),
style(draft).dim(),
style(format_relative_time(mr.updated_at)).dim(),
+ style(&mr.project_path).dim(),
);
@@ print_overlap_human @@
- let mr_str = user.mr_iids.iter().take(5).map(|iid| format!("!{iid}")).collect::<Vec<_>>().join(", ");
+ let mr_str = user.mr_refs.iter().take(5).cloned().collect::<Vec<_>>().join(", ");
Change 8 — Robot JSON: add stable IDs + “defaulted” flags for reproducibility
Why
You already added resolved_input — good. Two more reproducibility gaps remain:
Agents can't reliably “open” an entity without IDs (discussion_id, mr_id, issue_id).
Agents can't tell whether since was user-provided vs defaulted (important when replaying intent).
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ WhoResolvedInput @@
pub struct WhoResolvedInput {
@@
pub since_ms: Option<i64>,
pub since_iso: Option<String>,
+ pub since_was_default: bool,
pub limit: usize,
}
@@ run_who @@
- let since_ms = resolve_since(args.since.as_deref(), "6m")?;
+ let since_was_default = args.since.is_none();
+ let since_ms = resolve_since(args.since.as_deref(), "6m")?;
Ok(WhoRun {
resolved_input: WhoResolvedInput {
@@
since_ms: Some(since_ms),
since_iso: Some(ms_to_iso(since_ms)),
+ since_was_default,
limit: args.limit,
},
@@ print_who_json resolved_input @@
let resolved_input = serde_json::json!({
@@
"since_ms": run.resolved_input.since_ms,
"since_iso": run.resolved_input.since_iso,
+ "since_was_default": run.resolved_input.since_was_default,
"limit": run.resolved_input.limit,
});
And for Active/Workload discussion items, add IDs in SQL and JSON:
@@ ActiveDiscussion @@
pub struct ActiveDiscussion {
+ pub discussion_id: i64,
@@
}
@@ query_active SELECT @@
- SELECT
- p.noteable_type,
+ SELECT
+ p.id AS discussion_id,
+ p.noteable_type,
@@ active_to_json @@
- "discussions": r.discussions.iter().map(|d| json!({
+ "discussions": r.discussions.iter().map(|d| json!({
+ "discussion_id": d.discussion_id,
...
}))
Change 9 — Make performance verification explicit: require EXPLAIN QUERY PLAN checks for each mode
Why
You're adding indexes specifically for these queries. The only way to ensure the planner is doing what you think is to lock in a short perf checklist (especially after schema drift or SQLite version differences).
Diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Verification @@
# Manual verification against real data
cargo run --release -- who src/features/global-search/
@@
cargo run --release -- who src/features/global-search/ -p typescript # project scoped
+
+# Perf verification (required before merge):
+# Confirm idx_notes_diffnote_path_created is used for Expert/Overlap and
+# idx_discussions_unresolved_recent is used for Active.
+sqlite3 path/to/db.sqlite "
+ EXPLAIN QUERY PLAN
+ SELECT ... -- paste final Expert SQL with representative bindings
+";
(Keep it lightweight: one representative query per mode is enough.)
Net effect
Correctness: project scoping hits the notes index; IDs added for agent workflows.
Performance: fewer rows/materialization in Expert; statement caching everywhere; Active avoids correlated subqueries.
UX: human output no longer ambiguous across projects; Overlap MR references become actionable.
Reproducibility: agents can distinguish defaults vs explicit inputs; can dereference entities reliably.
If you want one “highest ROI” subset to implement first: Change 1 + Change 4 + Change 6 + Change 7. That's where the real operational value lands.

@@ -1,3 +0,0 @@
ChatGPT said:
Proposing code revisions for performance and determinism
Answer now

@@ -1,356 +0,0 @@
Below are the highest-leverage revisions I'd make. They're tightly scoped (no new tables/APIs), but fix a few real correctness issues and make the outputs more actionable.
1) Fix a correctness bug in PathQuery: don't escape for =, and make --path Makefile actually work
Why
Bug: build_path_query() currently runs escape_like() even when is_prefix = false (exact match). That will break exact matches for paths containing _, %, or \ because = does not treat those as metacharacters (so the escaped string won't equal the stored path).
UX mismatch: The plan says --path handles dotless root files (Makefile/LICENSE), but the current logic still treats them as directory prefixes (Makefile/%) → zero results.
Change
Only escape for LIKE.
Treat root paths (no /) passed via --path as exact matches by default (unless they end with /).
diff --git a/plan.md b/plan.md
@@
-/// Build a path query from a user-supplied path.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix -> `escaped_path%` (LIKE)
-/// - If the last path segment contains `.`, it's a file -> exact match (=)
-/// - Otherwise, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// Build a path query from a user-supplied path.
+///
+/// Rules:
+/// - If the path ends with `/`, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// - If the path is a root path (no `/`) and does NOT end with `/`, treat as exact (=)
+/// (this makes `--path Makefile` and `--path LICENSE` work as intended)
+/// - Else if the last path segment contains `.`, treat as exact (=)
+/// - Otherwise, treat as directory prefix -> `escaped_path/%` (LIKE)
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(path: &str) -> PathQuery {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
- let is_file = !path.ends_with('/') && last_segment.contains('.');
- let escaped = escape_like(trimmed);
+ let is_root = !trimmed.contains('/');
+ let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
if is_file {
PathQuery {
- value: escaped,
+ // IMPORTANT: do NOT escape for exact match (=)
+ value: trimmed.to_string(),
is_prefix: false,
}
} else {
+ let escaped = escape_like(trimmed);
PathQuery {
value: format!("{escaped}/%"),
is_prefix: true,
}
}
}
@@
-/// **Known limitation:** Dotless root files (LICENSE, Makefile, Dockerfile)
-/// without a trailing `/` will be treated as directory prefixes. Use `--path`
-/// for these — the `--path` flag passes through to Expert mode directly,
-/// and the `build_path_query` output for "LICENSE" is a prefix `LICENSE/%`
-/// which will simply return zero results (a safe, obvious failure mode that the
-/// help text addresses).
+/// Note: Root file paths passed via `--path` (including dotless files like Makefile/LICENSE)
+/// are treated as exact matches unless they end with `/`.
Also update the --path help text to be explicit:
diff --git a/plan.md b/plan.md
@@
- /// Force expert mode for a file/directory path (handles root files like
- /// README.md, LICENSE, Makefile that lack a / and can't be auto-detected)
+ /// Force expert mode for a file/directory path.
+ /// Root files (README.md, LICENSE, Makefile) are treated as exact matches.
+ /// Use a trailing `/` to force directory-prefix matching.
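For reference, the rules in this change can be sketched as one self-contained function. The body of `escape_like` is an assumption here (the plan only names it); this version escapes `\`, `%` and `_` for use with `ESCAPE '\'`.

```rust
/// Assumed implementation of the plan's `escape_like`: prefix each LIKE
/// metacharacter (and the escape character itself) with a backslash.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

#[derive(Debug, PartialEq)]
struct PathQuery {
    /// The parameter value to bind.
    value: String,
    /// true: bind to `LIKE ? ESCAPE '\'`; false: bind to `= ?`.
    is_prefix: bool,
}

fn build_path_query(path: &str) -> PathQuery {
    let trimmed = path.trim_end_matches('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_root = !trimmed.contains('/');
    // A trailing '/' always forces directory-prefix matching.
    let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
    if is_file {
        // Exact match: bind the raw path, because `=` has no metacharacters.
        PathQuery { value: trimmed.to_string(), is_prefix: false }
    } else {
        // Prefix match: escape first, then append the wildcard.
        PathQuery { value: format!("{}/%", escape_like(trimmed)), is_prefix: true }
    }
}
```

Under these rules `Makefile` binds as an exact match while `Makefile/` binds as the prefix `Makefile/%`.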
2) Fix Active mode: your note_count is currently counting participants, and the CTE scans too broadly
Why
In note_agg, you do SELECT DISTINCT discussion_id, author_username and then COUNT(*) AS note_count. That's participant count, not note count.
The current note_agg also builds the DISTINCT set from all notes then joins to picked. It's avoidable work.
Change
Split into two aggregations scoped to picked:
note_counts: counts non-system notes per picked discussion.
participants: distinct usernames per picked discussion, then GROUP_CONCAT.
diff --git a/plan.md b/plan.md
@@
- note_agg AS (
- SELECT
- n.discussion_id,
- COUNT(*) AS note_count,
- GROUP_CONCAT(n.author_username, X'1F') AS participants
- FROM (
- SELECT DISTINCT discussion_id, author_username
- FROM notes
- WHERE is_system = 0 AND author_username IS NOT NULL
- ) n
- JOIN picked p ON p.id = n.discussion_id
- GROUP BY n.discussion_id
- )
+ note_counts AS (
+ SELECT
+ n.discussion_id,
+ COUNT(*) AS note_count
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0
+ GROUP BY n.discussion_id
+ ),
+ participants AS (
+ SELECT
+ x.discussion_id,
+ GROUP_CONCAT(x.author_username, X'1F') AS participants
+ FROM (
+ SELECT DISTINCT n.discussion_id, n.author_username
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0 AND n.author_username IS NOT NULL
+ ) x
+ GROUP BY x.discussion_id
+ )
@@
- LEFT JOIN note_agg na ON na.discussion_id = p.id
+ LEFT JOIN note_counts nc ON nc.discussion_id = p.id
+ LEFT JOIN participants pa ON pa.discussion_id = p.id
@@
- COALESCE(na.note_count, 0) AS note_count,
- COALESCE(na.participants, '') AS participants
+ COALESCE(nc.note_count, 0) AS note_count,
+ COALESCE(pa.participants, '') AS participants
Net effect: correctness fix + more predictable perf.
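To make the bug concrete, here is a small in-memory sketch (not the plan's SQL; the tuple layout is an assumption for illustration) showing that counting rows after de-duplicating on (discussion, author) yields participants, while counting the raw rows yields notes:

```rust
use std::collections::BTreeSet;

/// Each tuple is (discussion_id, author_username) for one non-system note.
fn note_count(notes: &[(i64, &str)], discussion_id: i64) -> usize {
    // Counts every note in the discussion (what `note_counts` computes).
    notes.iter().filter(|(d, _)| *d == discussion_id).count()
}

fn participant_count(notes: &[(i64, &str)], discussion_id: i64) -> usize {
    // De-duplicates authors first (what the old `note_agg` effectively counted).
    notes
        .iter()
        .filter(|(d, _)| *d == discussion_id)
        .map(|(_, author)| *author)
        .collect::<BTreeSet<&str>>()
        .len()
}
```

With two notes from reviewer_b and one from author_a in the same discussion, the two functions disagree (3 notes, 2 participants), which is the case the new test below is meant to pin down.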
Add a test that would have failed before:
diff --git a/plan.md b/plan.md
@@
#[test]
fn test_active_query() {
@@
- insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+ insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+ insert_diffnote(&conn, 2, 1, 1, "reviewer_b", "src/foo.rs", "follow-up");
@@
- assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+ assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+ assert_eq!(result.discussions[0].note_count, 2);
3) Index fix: idx_discussions_unresolved_recent won't help global --active ordering
Why
Your index is (project_id, last_note_at) with WHERE resolvable=1 AND resolved=0.
When --active is not project-scoped (common default), SQLite can't use (project_id, last_note_at) to satisfy ORDER BY last_note_at DESC efficiently because project_id isn't constrained.
This can turn into a scan+sort over potentially large unresolved sets.
Change
Keep the project-scoped index, but add a global ordering index (partial, still small):
diff --git a/plan.md b/plan.md
@@
CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent
ON discussions(project_id, last_note_at)
WHERE resolvable = 1 AND resolved = 0;
+
+-- Active (global): unresolved discussions by recency (no project scope).
+-- Supports ORDER BY last_note_at DESC LIMIT N when project_id is unconstrained.
+CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global
+ ON discussions(last_note_at)
+ WHERE resolvable = 1 AND resolved = 0;
4) Make Overlap “touches” coherent: count MRs for reviewers, not DiffNotes
Why
Overlap's question is “Who else has MRs touching my files?” but:
reviewer branch uses COUNT(*) (DiffNotes)
author branch uses COUNT(DISTINCT m.id) (MRs)
Those are different units; summing them into touch_count is misleading.
Change
Count distinct MRs on the reviewer branch too:
diff --git a/plan.md b/plan.md
@@
- COUNT(*) AS touch_count,
+ COUNT(DISTINCT m.id) AS touch_count,
MAX(n.created_at) AS last_touch_at,
GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
Also update human output labeling:
diff --git a/plan.md b/plan.md
@@
- style("Touches").bold(),
+ style("MRs").bold(),
(You still preserve “strength” via mr_refs and last_touch_at.)
5) Make outputs more actionable: add a canonical ref field (group/project!iid, group/project#iid)
Why
You already do this for Overlap (mr_refs). Doing the same for Workload and Active reduces friction for both humans and agents:
humans can copy/paste a single token
robots don't need to stitch project_path + iid + prefix
Change (Workload structs + SQL)
diff --git a/plan.md b/plan.md
@@
pub struct WorkloadIssue {
pub iid: i64,
+ pub ref_: String,
pub title: String,
pub project_path: String,
pub updated_at: i64,
}
@@
pub struct WorkloadMr {
pub iid: i64,
+ pub ref_: String,
pub title: String,
pub draft: bool,
pub project_path: String,
@@
- let issues_sql =
- "SELECT i.iid, i.title, p.path_with_namespace, i.updated_at
+ let issues_sql =
+ "SELECT i.iid,
+ (p.path_with_namespace || '#' || i.iid) AS ref,
+ i.title, p.path_with_namespace, i.updated_at
@@
- iid: row.get(0)?,
- title: row.get(1)?,
- project_path: row.get(2)?,
- updated_at: row.get(3)?,
+ iid: row.get(0)?,
+ ref_: row.get(1)?,
+ title: row.get(2)?,
+ project_path: row.get(3)?,
+ updated_at: row.get(4)?,
})
@@
- let authored_sql =
- "SELECT m.iid, m.title, m.draft, p.path_with_namespace, m.updated_at
+ let authored_sql =
+ "SELECT m.iid,
+ (p.path_with_namespace || '!' || m.iid) AS ref,
+ m.title, m.draft, p.path_with_namespace, m.updated_at
@@
- iid: row.get(0)?,
- title: row.get(1)?,
- draft: row.get::<_, i32>(2)? != 0,
- project_path: row.get(3)?,
+ iid: row.get(0)?,
+ ref_: row.get(1)?,
+ title: row.get(2)?,
+ draft: row.get::<_, i32>(3)? != 0,
+ project_path: row.get(4)?,
author_username: None,
- updated_at: row.get(4)?,
+ updated_at: row.get(5)?,
})
Then use ref_ in human output + robot JSON.
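If you ever need the same token outside SQL (for example in tests), the format is easy to mirror in Rust. The helper names below are hypothetical; the `!` and `#` separators are the ones used in the SQL above.

```rust
/// Hypothetical helper: canonical merge request reference, e.g. `group/project!42`.
fn mr_ref(project_path: &str, iid: i64) -> String {
    format!("{project_path}!{iid}")
}

/// Hypothetical helper: canonical issue reference, e.g. `group/project#42`.
fn issue_ref(project_path: &str, iid: i64) -> String {
    format!("{project_path}#{iid}")
}
```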
6) Reviews mode: tolerate leading whitespace before **prefix**
Why
Many people write " **suggestion**: ...". Current LIKE '**%**%' misses that.
Change
Use ltrim(n.body) consistently:
diff --git a/plan.md b/plan.md
@@
- AND n.body LIKE '**%**%'
+ AND ltrim(n.body) LIKE '**%**%'
@@
- SUBSTR(n.body, 3, INSTR(SUBSTR(n.body, 3), '**') - 1) AS raw_prefix,
+ SUBSTR(ltrim(n.body), 3, INSTR(SUBSTR(ltrim(n.body), 3), '**') - 1) AS raw_prefix,
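The SQL above is dense, so here is a rough Rust equivalent of the extraction, useful for reasoning about edge cases. The function name `raw_prefix` is hypothetical. Note the assumption: one-argument ltrim in SQLite strips spaces only, so this sketch strips only spaces too rather than all whitespace.

```rust
/// Hypothetical mirror of the SQL prefix extraction: returns the text between
/// a leading `**` and the next `**`, ignoring leading spaces.
fn raw_prefix(body: &str) -> Option<&str> {
    // Mirror one-argument ltrim(): strip leading spaces only.
    let trimmed = body.trim_start_matches(' ');
    // Must start with `**` ...
    let rest = trimmed.strip_prefix("**")?;
    // ... and contain a closing `**` somewhere after it.
    let end = rest.find("**")?;
    Some(&rest[..end])
}
```

A body such as `  **suggestion**: rename this` yields `suggestion`; a body with no closing `**` yields nothing, matching the LIKE filter.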
7) Add two small tests that catch the above regressions
Why
These are exactly the kind of issues that slip through without targeted tests.
diff --git a/plan.md b/plan.md
@@
#[test]
fn test_escape_like() {
@@
}
+
+ #[test]
+ fn test_build_path_query_exact_does_not_escape() {
+ // '_' must not be escaped for '='
+ let pq = build_path_query("README_with_underscore.md");
+ assert_eq!(pq.value, "README_with_underscore.md");
+ assert!(!pq.is_prefix);
+ }
+
+ #[test]
+ fn test_path_flag_dotless_root_file_is_exact() {
+ let pq = build_path_query("Makefile");
+ assert_eq!(pq.value, "Makefile");
+ assert!(!pq.is_prefix);
+ }
Summary of net effect
Correctness fixes: exact-path escaping bug; Active.note_count bug.
Perf fixes: global --active index; avoid broad note scans in Active.
Usefulness upgrades: coherent overlap “touch” metric; canonical refs everywhere; reviews prefix more robust.
If you want one extra “stretch” that still isn't scope creep: add an unscoped warning line in human output when project_id == None (e.g., “Aggregated across projects; use -p to scope”) for Expert/Overlap/Active. That's pure presentation, but prevents misinterpretation in multi-project DBs.

@@ -1,471 +0,0 @@
Proposed revisions (Iteration 6)
Below are the highest-leverage changes I'd make on top of your current Iteration 5 plan, with rationale and git-diff style edits to the plan text/snippets.
1) Fix a real edge case: dotless non-root files (src/Dockerfile, infra/Makefile, etc.)
Why
Your current build_path_query() treats dotless last segments as directories (prefix match) unless the path is root. That misclassifies legitimate dotless files inside directories and silently produces path/% (zero hits or wrong hits).
Best minimal fix: keep your static SQL approach, but add a DB existence probe (static SQL) for path queries:
If the user didn't force directory (/), and the exact path exists in DiffNotes, treat as exact =.
Otherwise use prefix LIKE 'dir/%'.
This avoids new CLI flags, avoids heuristics lists, and uses your existing partial index (idx_notes_diffnote_path_created) efficiently.
Diff
diff --git a/Plan.md b/Plan.md
@@
-struct PathQuery {
+struct PathQuery {
/// The parameter value to bind.
value: String,
/// If true: use `LIKE value ESCAPE '\'`. If false: use `= value`.
is_prefix: bool,
}
-/// Build a path query from a user-supplied path.
+/// Build a path query from a user-supplied path, with a DB probe for dotless files.
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(conn: &Connection, path: &str) -> Result<PathQuery> {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
let is_root = !trimmed.contains('/');
- let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
+ let forced_dir = path.ends_with('/');
+ let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
+
+ // If it doesn't "look like a file" but the exact path exists in DiffNotes,
+ // treat as exact (handles src/Dockerfile, infra/Makefile, etc.).
+ let exact_exists = if !looks_like_file && !forced_dir {
+ conn.query_row(
+ "SELECT 1
+ FROM notes
+ WHERE note_type = 'DiffNote'
+ AND is_system = 0
+ AND position_new_path = ?1
+ LIMIT 1",
+ rusqlite::params![trimmed],
+ |_| Ok(()),
+ ).is_ok()
+ } else {
+ false
+ };
+
+ let is_file = looks_like_file || exact_exists;
if is_file {
PathQuery {
value: trimmed.to_string(),
is_prefix: false,
}
} else {
let escaped = escape_like(trimmed);
PathQuery {
value: format!("{escaped}/%"),
is_prefix: true,
}
}
}
Also update callers:
@@
- let pq = build_path_query(path);
+ let pq = build_path_query(conn, path)?;
@@
- let pq = build_path_query(path);
+ let pq = build_path_query(conn, path)?;
And tests:
@@
- fn test_build_path_query() {
+ fn test_build_path_query() {
@@
- // Dotless root file -> exact match (root path without '/')
+ // Dotless root file -> exact match (root path without '/')
let pq = build_path_query("Makefile");
assert_eq!(pq.value, "Makefile");
assert!(!pq.is_prefix);
+
+ // Dotless file in subdir should become exact if DB contains it (probe)
+ // (set up: insert one DiffNote with position_new_path = "src/Dockerfile")
2) Make “reviewer” semantics correct: exclude MR authors commenting on their own diffs
Why
Right now, Overlap (and Expert reviewer branch) will count MR authors as “reviewers” if they leave DiffNotes in their own MR (clarifications / replies), inflating A+R and contaminating “who reviewed here” signals.
You already enforce this in --reviews mode (m.author_username != ?1). Apply the same principle consistently:
Reviewer branch: only count notes where n.author_username != m.author_username (when both non-NULL).
Diff (Overlap reviewer branch)
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND n.is_system = 0
AND n.author_username IS NOT NULL
+ AND (m.author_username IS NULL OR n.author_username != m.author_username)
AND n.created_at >= ?2
AND (?3 IS NULL OR n.project_id = ?3)
Same change for sql_exact.
3) Expert mode scoring: align units + reduce single-MR “comment storms”
Why
Expert currently mixes units:
reviewer side: DiffNote count
author side: distinct MR count
That makes score noisy and can crown “someone who wrote 30 comments on one MR” as top expert.
Fix: make both sides primarily MR-breadth:
reviewer: COUNT(DISTINCT m.id) as review_mr_count
author: COUNT(DISTINCT m.id) as author_mr_count
Optionally keep review_note_count as a secondary intensity signal (but not the main driver).
Diff (types + SQL)
@@
pub struct Expert {
pub username: String,
- pub score: f64,
- pub review_count: u32,
- pub author_count: u32,
+ pub score: i64,
+ pub review_mr_count: u32,
+ pub review_note_count: u32,
+ pub author_mr_count: u32,
pub last_active_ms: i64,
}
Reviewer branch now joins to MR so it can count distinct MRs and exclude self-comments:
@@
- SELECT
- n.author_username AS username,
- 'reviewer' AS role,
- COUNT(*) AS cnt,
- MAX(n.created_at) AS last_active_at
- FROM notes n
+ SELECT
+ n.author_username AS username,
+ 'reviewer' AS role,
+ COUNT(DISTINCT m.id) AS mr_cnt,
+ COUNT(*) AS note_cnt,
+ MAX(n.created_at) AS last_active_at
+ FROM notes n
+ JOIN discussions d ON n.discussion_id = d.id
+ JOIN merge_requests m ON d.merge_request_id = m.id
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username IS NOT NULL
+ AND (m.author_username IS NULL OR n.author_username != m.author_username)
AND n.position_new_path LIKE ?1 ESCAPE '\\'
AND n.created_at >= ?2
AND (?3 IS NULL OR n.project_id = ?3)
GROUP BY n.author_username
Update author branch payload to match shape:
@@
SELECT
m.author_username AS username,
'author' AS role,
- COUNT(DISTINCT m.id) AS cnt,
+ COUNT(DISTINCT m.id) AS mr_cnt,
+ 0 AS note_cnt,
MAX(n.created_at) AS last_active_at
Aggregate:
@@
SELECT
username,
- SUM(CASE WHEN role = 'reviewer' THEN cnt ELSE 0 END) AS review_count,
- SUM(CASE WHEN role = 'author' THEN cnt ELSE 0 END) AS author_count,
+ SUM(CASE WHEN role = 'reviewer' THEN mr_cnt ELSE 0 END) AS review_mr_count,
+ SUM(CASE WHEN role = 'reviewer' THEN note_cnt ELSE 0 END) AS review_note_count,
+ SUM(CASE WHEN role = 'author' THEN mr_cnt ELSE 0 END) AS author_mr_count,
MAX(last_active_at) AS last_active_at,
- (SUM(CASE WHEN role = 'reviewer' THEN cnt ELSE 0 END) * 3.0) +
- (SUM(CASE WHEN role = 'author' THEN cnt ELSE 0 END) * 2.0) AS score
+ (
+ (SUM(CASE WHEN role = 'reviewer' THEN mr_cnt ELSE 0 END) * 20) +
+ (SUM(CASE WHEN role = 'author' THEN mr_cnt ELSE 0 END) * 12) +
+ (SUM(CASE WHEN role = 'reviewer' THEN note_cnt ELSE 0 END) * 1)
+ ) AS score
Human header:
@@
- style("Reviews").bold(),
- style("Authored").bold(),
+ style("Reviewed(MRs)").bold(),
+ style("Notes").bold(),
+ style("Authored(MRs)").bold(),
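A quick sanity check of the proposed weights (20 per reviewed MR, 12 per authored MR, 1 per review note), written as a standalone sketch. The `ExpertCounts` struct is hypothetical and only groups the three counters from the revised Expert type.

```rust
/// Hypothetical grouping of the three counters used by the score.
struct ExpertCounts {
    review_mr_count: u32,
    review_note_count: u32,
    author_mr_count: u32,
}

/// Same arithmetic as the SQL expression: MR breadth dominates,
/// note volume is only a small secondary signal.
fn expert_score(c: &ExpertCounts) -> i64 {
    i64::from(c.review_mr_count) * 20
        + i64::from(c.author_mr_count) * 12
        + i64::from(c.review_note_count)
}
```

With these weights, 30 comments on a single MR scores 50, while one comment on each of four MRs scores 84, so the comment storm no longer wins.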
4) Deterministic output: participants + MR refs + tie-breakers
Why
You've correctly focused on reproducibility (resolved_input), but you still have nondeterministic lists:
participants: GROUP_CONCAT order is undefined → vector order changes run-to-run.
mr_refs: you dedup via HashSet then iterate → undefined order.
user sorting in overlap is missing stable tie-breakers.
This is a real “robot mode flake” source.
Diff (Active participants sort)
@@
- let participants: Vec<String> = participants_csv
+ let mut participants: Vec<String> = participants_csv
.as_deref()
.filter(|s| !s.is_empty())
.map(|csv| csv.split('\x1F').map(String::from).collect())
.unwrap_or_default();
+ participants.sort(); // stable, deterministic
Diff (Overlap MR refs sort + stable user sort)
@@
- users.sort_by(|a, b| b.touch_count.cmp(&a.touch_count));
+ users.sort_by(|a, b| {
+ b.touch_count.cmp(&a.touch_count)
+ .then_with(|| b.last_touch_at.cmp(&a.last_touch_at))
+ .then_with(|| a.username.cmp(&b.username))
+ });
@@
- entry.mr_refs = set.into_iter().collect();
+ let mut v: Vec<String> = set.into_iter().collect();
+ v.sort();
+ entry.mr_refs = v;
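The tie-breaker chain can be exercised in isolation. This is a minimal sketch with a trimmed-down `OverlapUser` (only the three fields the comparator reads; the real struct has more):

```rust
/// Trimmed-down stand-in for the plan's OverlapUser (assumption: only the
/// fields needed for ordering are included here).
#[derive(Debug)]
struct OverlapUser {
    username: String,
    touch_count: u32,
    last_touch_at: i64,
}

/// Order: most touches first, then most recent touch, then username A-Z.
/// The final username key makes the order total, so equal rows cannot swap.
fn sort_users(users: &mut [OverlapUser]) {
    users.sort_by(|a, b| {
        b.touch_count
            .cmp(&a.touch_count)
            .then_with(|| b.last_touch_at.cmp(&a.last_touch_at))
            .then_with(|| a.username.cmp(&b.username))
    });
}
```

Because usernames are unique per row after the merge, no two rows compare equal, which is what makes the output reproducible regardless of HashMap iteration order.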
5) Make --limit actionable: surface truncation explicitly (human + robot)
Why
Agents (and humans) need to know if results were cut off so they can rerun with a bigger -n.
Right now there's no signal.
Minimal pattern: query limit + 1, set truncated = true if you got > limit, then truncate.
Diff (result types)
@@
pub struct ExpertResult {
pub path_query: String,
pub experts: Vec<Expert>,
+ pub truncated: bool,
}
@@
pub struct ActiveResult {
pub discussions: Vec<ActiveDiscussion>,
pub total_unresolved: u32,
+ pub truncated: bool,
}
@@
pub struct OverlapResult {
pub path_query: String,
pub users: Vec<OverlapUser>,
+ pub truncated: bool,
}
Diff (query pattern example)
@@
- let limit_i64 = limit as i64;
+ let limit_plus_one = (limit + 1) as i64;
@@
- LIMIT ?4
+ LIMIT ?4
@@
- rusqlite::params![pq.value, since_ms, project_id, limit_i64],
+ rusqlite::params![pq.value, since_ms, project_id, limit_plus_one],
@@
- Ok(ExpertResult {
+ let truncated = experts.len() > limit;
+ let experts = experts.into_iter().take(limit).collect();
+ Ok(ExpertResult {
path_query: path.to_string(),
experts,
+ truncated,
})
Human output hint:
@@
if r.experts.is_empty() { ... }
+ if r.truncated {
+ println!(" {}", style("(showing first -n; rerun with a higher --limit)").dim());
+ }
Robot output field:
@@
fn expert_to_json(r: &ExpertResult) -> serde_json::Value {
serde_json::json!({
"path_query": r.path_query,
+ "truncated": r.truncated,
"experts": ...
})
}
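Since the same limit + 1 pattern repeats across modes, it may be worth factoring out. A minimal sketch, assuming the rows were fetched with LIMIT limit + 1 (the helper name `apply_limit` is hypothetical):

```rust
/// Hypothetical helper: given rows fetched with `LIMIT limit + 1`, return at
/// most `limit` rows plus a flag saying whether more rows were available.
fn apply_limit<T>(mut rows: Vec<T>, limit: usize) -> (Vec<T>, bool) {
    // Getting more than `limit` rows back proves at least one was cut off.
    let truncated = rows.len() > limit;
    rows.truncate(limit);
    (rows, truncated)
}
```

Exactly `limit` rows coming back means nothing was hidden, so the flag stays false in that case.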
6) Overlap merge hot loop: avoid repeated HashSet rebuild per row
Why
This line is expensive in a UNION result with many rows:
let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
It reallocates and rehashes every time.
Fix: store an accumulator with HashSet during merge, convert once at end.
Diff (internal accumulator)
@@
- let mut user_map: HashMap<String, OverlapUser> = HashMap::new();
+ struct OverlapAcc {
+ username: String,
+ author_touch_count: u32,
+ review_touch_count: u32,
+ touch_count: u32,
+ last_touch_at: i64,
+ mr_refs: HashSet<String>,
+ }
+ let mut user_map: HashMap<String, OverlapAcc> = HashMap::new();
@@
- let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
+ let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapAcc {
username: username.clone(),
author_touch_count: 0,
review_touch_count: 0,
touch_count: 0,
last_touch_at: 0,
- mr_refs: Vec::new(),
+ mr_refs: HashSet::new(),
});
@@
- let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
- for r in mr_refs { set.insert(r); }
- entry.mr_refs = set.into_iter().collect();
+ for r in mr_refs { entry.mr_refs.insert(r); }
@@
- let mut users: Vec<OverlapUser> = user_map.into_values().collect();
+ let mut users: Vec<OverlapUser> = user_map.into_values().map(|a| {
+ let mut mr_refs: Vec<String> = a.mr_refs.into_iter().collect();
+ mr_refs.sort();
+ OverlapUser {
+ username: a.username,
+ author_touch_count: a.author_touch_count,
+ review_touch_count: a.review_touch_count,
+ touch_count: a.touch_count,
+ last_touch_at: a.last_touch_at,
+ mr_refs,
+ }
+ }).collect();
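Stripped of the other fields, the accumulate-then-convert idea looks like the sketch below. The function name and the row shape (username plus a list of refs) are assumptions for illustration only.

```rust
use std::collections::{HashMap, HashSet};

/// Merge per-row MR refs into one de-duplicated, sorted list per user.
/// The set lives for the whole merge and is converted to a Vec only once.
fn merge_mr_refs(rows: &[(&str, Vec<&str>)]) -> Vec<(String, Vec<String>)> {
    let mut acc: HashMap<String, HashSet<String>> = HashMap::new();
    for (username, refs) in rows {
        let set = acc.entry(username.to_string()).or_default();
        for r in refs {
            set.insert(r.to_string());
        }
    }
    let mut out: Vec<(String, Vec<String>)> = acc
        .into_iter()
        .map(|(user, set)| {
            let mut refs: Vec<String> = set.into_iter().collect();
            refs.sort(); // deterministic order for robot output
            (user, refs)
        })
        .collect();
    out.sort_by(|a, b| a.0.cmp(&b.0));
    out
}
```

A BTreeSet would give the sorted order for free at the cost of slower inserts; either choice avoids the per-row rebuild.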
7) Tests to lock these behaviors
Add tests (high value)
dotless subdir file uses DB probe → exact match
self-review exclusion prevents MR author showing up as reviewer
deterministic ordering for participants and mr_refs (sort)
Diff (test additions outline)
@@
#[test]
+ fn test_build_path_query_dotless_subdir_file_uses_probe() {
+ let conn = setup_test_db();
+ insert_project(&conn, 1, "team/backend");
+ insert_mr(&conn, 1, 1, 100, "author_a", "opened");
+ insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+ insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/Dockerfile", "note");
+
+ let pq = build_path_query(&conn, "src/Dockerfile").unwrap();
+ assert_eq!(pq.value, "src/Dockerfile");
+ assert!(!pq.is_prefix);
+ }
+
+ #[test]
+ fn test_overlap_excludes_self_review_notes() {
+ let conn = setup_test_db();
+ insert_project(&conn, 1, "team/backend");
+ insert_mr(&conn, 1, 1, 100, "author_a", "opened");
+ insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+ // author_a comments on their own MR diff
+ insert_diffnote(&conn, 1, 1, 1, "author_a", "src/auth/login.rs", "clarification");
+
+ let result = query_overlap(&conn, "src/auth/", None, 0, 20).unwrap();
+ let u = result.users.iter().find(|u| u.username == "author_a");
+ // should not be credited as reviewer touch
+ assert!(u.map(|x| x.review_touch_count).unwrap_or(0) == 0);
+ }
Net effect
Correctness: fixes dotless subdir files + self-review pollution.
Signal quality: Expert ranking becomes harder to game by comment volume.
Robot reproducibility: deterministic ordering + explicit truncation.
Performance: avoids rehash loops in overlap merges; path probe uses indexed equality.
If you want one “single best” change: #1 (DB probe exact-match) is the most likely to prevent confusing “why is this empty?” behavior without adding any user-facing complexity.

@@ -1,353 +0,0 @@
Below are the highest-leverage revisions I'd make to iteration 6 to improve correctness (multi-project edge cases), robot-mode reliability (bounded payloads + truncation), and signal quality, without changing the fundamental scope (still pure SQL over existing tables).
1) Make build_path_query project-aware and two-way probe (exact and prefix)
Why
Your DB probe currently answers: “does this exact file exist anywhere in DiffNotes?” That can misclassify in a project-scoped run:
Path exists as a dotless file in Project A → probe returns true
User runs -p Project B where the path is a directory (or different shape) → you switch to exact, return empty, and miss valid prefix hits.
Also, you still have a minor heuristic fragility for dot directories when the user omits trailing / (e.g., .github/workflows): last segment has a dot → you treat as file unless forced dir.
Revision
Thread project_id into build_path_query(conn, path, project_id)
Probe exact first (scoped), then probe prefix (scoped)
Only fall back to heuristics if both probes fail
This keeps “static SQL, no dynamic assembly,” and costs at most 2 indexed existence queries per invocation.
diff --git a/who-command-design.md b/who-command-design.md
@@
- fn build_path_query(conn: &Connection, path: &str) -> Result<PathQuery> {
+ fn build_path_query(conn: &Connection, path: &str, project_id: Option<i64>) -> Result<PathQuery> {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
let is_root = !trimmed.contains('/');
let forced_dir = path.ends_with('/');
- let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
+ // Heuristic is now only a fallback; probes decide first.
+ let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
- let exact_exists = if !looks_like_file && !forced_dir {
- conn.query_row(
- "SELECT 1 FROM notes
- WHERE note_type = 'DiffNote'
- AND is_system = 0
- AND position_new_path = ?1
- LIMIT 1",
- rusqlite::params![trimmed],
- |_| Ok(()),
- )
- .is_ok()
- } else {
- false
- };
+ // Probe 1: exact file exists (scoped)
+ let exact_exists = conn.query_row(
+ "SELECT 1 FROM notes
+ WHERE note_type = 'DiffNote'
+ AND is_system = 0
+ AND position_new_path = ?1
+ AND (?2 IS NULL OR project_id = ?2)
+ LIMIT 1",
+ rusqlite::params![trimmed, project_id],
+ |_| Ok(()),
+ ).is_ok();
+
+ // Probe 2: directory prefix exists (scoped)
+ let prefix_exists = if !forced_dir {
+ let escaped = escape_like(trimmed);
+ let pat = format!("{escaped}/%");
+ conn.query_row(
+ "SELECT 1 FROM notes
+ WHERE note_type = 'DiffNote'
+ AND is_system = 0
+ AND position_new_path LIKE ?1 ESCAPE '\\'
+ AND (?2 IS NULL OR project_id = ?2)
+ LIMIT 1",
+ rusqlite::params![pat, project_id],
+ |_| Ok(()),
+ ).is_ok()
+ } else { false };
- let is_file = looks_like_file || exact_exists;
+ // Forced directory always wins; otherwise: exact > prefix > heuristic
+ let is_file = if forced_dir { false }
+ else if exact_exists { true }
+ else if prefix_exists { false }
+ else { looks_like_file };
if is_file {
Ok(PathQuery { value: trimmed.to_string(), is_prefix: false })
} else {
let escaped = escape_like(trimmed);
Ok(PathQuery { value: format!("{escaped}/%"), is_prefix: true })
}
}
@@
- let pq = build_path_query(conn, path)?;
+ let pq = build_path_query(conn, path, project_id)?;
Add test coverage for the multi-project misclassification case:
diff --git a/who-command-design.md b/who-command-design.md
@@
#[test]
fn test_build_path_query_dotless_subdir_file_uses_db_probe() {
@@
- let pq = build_path_query(&conn, "src/Dockerfile").unwrap();
+ let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();
@@
- let pq2 = build_path_query(&conn2, "src/Dockerfile").unwrap();
+ let pq2 = build_path_query(&conn2, "src/Dockerfile", None).unwrap();
}
+
+ #[test]
+ fn test_build_path_query_probe_is_project_scoped() {
+ // Path exists as a dotless file in project 1; project 2 should not
+ // treat it as an exact file unless it exists there too.
+ let conn = setup_test_db();
+ insert_project(&conn, 1, "team/a");
+ insert_project(&conn, 2, "team/b");
+ insert_mr(&conn, 1, 1, 10, "author_a", "opened");
+ insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+ insert_diffnote(&conn, 1, 1, 1, "rev", "infra/Makefile", "note");
+
+ let pq_scoped = build_path_query(&conn, "infra/Makefile", Some(2)).unwrap();
+ assert!(pq_scoped.is_prefix); // should fall back to prefix in project 2
+ }
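The precedence rule (forced directory, then exact probe, then prefix probe, then heuristic) is easier to test once separated from the database calls. A minimal sketch, with the probe results passed in as plain booleans (the function name `classify_is_file` is hypothetical):

```rust
/// Hypothetical pure version of the decision in build_path_query:
/// forced directory always wins; otherwise exact > prefix > heuristic.
fn classify_is_file(
    forced_dir: bool,
    exact_exists: bool,
    prefix_exists: bool,
    looks_like_file: bool,
) -> bool {
    if forced_dir {
        false
    } else if exact_exists {
        true
    } else if prefix_exists {
        false
    } else {
        looks_like_file
    }
}
```

For example, `.github/workflows` without a trailing slash looks like a file to the heuristic, but a successful prefix probe overrides that and it is treated as a directory.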
2) Bound robot payload sizes for participants and mr_refs (with totals + truncation)
Why
mr_refs and participants can become unbounded arrays in robot mode, which is a real operational hazard:
huge JSON → slow, noisy diffs, brittle downstream pipelines
potential SQLite group_concat truncation becomes invisible (and you can't distinguish “no refs” vs “refs truncated”)
Revision
Introduce hard caps and explicit metadata:
participants_total, participants_truncated
mr_refs_total, mr_refs_truncated
This is not scope creep; it's defensive output hygiene.
diff --git a/who-command-design.md b/who-command-design.md
@@
pub struct ActiveDiscussion {
@@
pub participants: Vec<String>,
+ pub participants_total: u32,
+ pub participants_truncated: bool,
}
@@
pub struct OverlapUser {
@@
pub mr_refs: Vec<String>,
+ pub mr_refs_total: u32,
+ pub mr_refs_truncated: bool,
}
Implementation sketch (Rust-side, deterministic):
diff --git a/who-command-design.md b/who-command-design.md
@@
fn query_active(...) -> Result<ActiveResult> {
+ const MAX_PARTICIPANTS: usize = 50;
@@
- participants.sort();
+ participants.sort();
+ let participants_total = participants.len() as u32;
+ let participants_truncated = participants.len() > MAX_PARTICIPANTS;
+ if participants_truncated {
+ participants.truncate(MAX_PARTICIPANTS);
+ }
@@
Ok(ActiveDiscussion {
@@
participants,
+ participants_total,
+ participants_truncated,
})
@@
fn query_overlap(...) -> Result<OverlapResult> {
+ const MAX_MR_REFS_PER_USER: usize = 50;
@@
.map(|a| {
let mut mr_refs: Vec<String> = a.mr_refs.into_iter().collect();
mr_refs.sort();
+ let mr_refs_total = mr_refs.len() as u32;
+ let mr_refs_truncated = mr_refs.len() > MAX_MR_REFS_PER_USER;
+ if mr_refs_truncated {
+ mr_refs.truncate(MAX_MR_REFS_PER_USER);
+ }
OverlapUser {
@@
mr_refs,
+ mr_refs_total,
+ mr_refs_truncated,
}
})
Update robot JSON accordingly:
diff --git a/who-command-design.md b/who-command-design.md
@@
fn active_to_json(r: &ActiveResult) -> serde_json::Value {
@@
"participants": d.participants,
+ "participants_total": d.participants_total,
+ "participants_truncated": d.participants_truncated,
}))
@@
fn overlap_to_json(r: &OverlapResult) -> serde_json::Value {
@@
"mr_refs": u.mr_refs,
+ "mr_refs_total": u.mr_refs_total,
+ "mr_refs_truncated": u.mr_refs_truncated,
}))
Also update robot-docs manifest schema snippet for who.active.discussions[] and who.overlap.users[].
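The capping step is identical for participants and mr_refs, so it could be factored into one helper. A minimal sketch, assuming a hypothetical `cap_list` function and `Capped` struct (neither name appears in the plan): it records the full count before trimming, so the `*_total` and `*_truncated` fields always describe the untrimmed list.

```rust
/// Hypothetical helper (not part of the plan): a capped list plus the
/// metadata needed to tell "no items" apart from "items truncated".
#[derive(Debug, PartialEq)]
pub struct Capped {
    pub items: Vec<String>,
    pub total: u32,
    pub truncated: bool,
}

/// Sort for deterministic output, record the full count, then trim to `max`.
pub fn cap_list(mut items: Vec<String>, max: usize) -> Capped {
    items.sort();
    let total = items.len() as u32;
    let truncated = items.len() > max;
    if truncated {
        items.truncate(max);
    }
    Capped { items, total, truncated }
}
```

With this shape, `query_active` and `query_overlap` would each call the helper once with their own cap (`MAX_PARTICIPANTS`, `MAX_MR_REFS_PER_USER`) and copy the three fields into the result struct.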
3) Add truncation metadata to Workload sections (same LIMIT+1 pattern)
Why
Workload is the mode most likely to be consumed by agents, and right now it has silent truncation (each section is LIMIT N with no signal). Your plan already treats truncation as a first-class contract elsewhere; Workload should match.
Revision
For each workload query:
request LIMIT + 1
set *_truncated booleans
trim to requested limit
diff --git a/who-command-design.md b/who-command-design.md
@@
pub struct WorkloadResult {
pub username: String,
pub assigned_issues: Vec<WorkloadIssue>,
pub authored_mrs: Vec<WorkloadMr>,
pub reviewing_mrs: Vec<WorkloadMr>,
pub unresolved_discussions: Vec<WorkloadDiscussion>,
+ pub assigned_issues_truncated: bool,
+ pub authored_mrs_truncated: bool,
+ pub reviewing_mrs_truncated: bool,
+ pub unresolved_discussions_truncated: bool,
}
And in JSON include the booleans (plus you already have summary.counts).
This is mechanically repetitive but extremely valuable for automation.
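The LIMIT + 1 pattern can be sketched as a small generic helper, assuming the query was already issued with `LIMIT limit + 1`. The name `trim_to_limit` is hypothetical and not part of the plan.

```rust
/// Hypothetical helper (not part of the plan): given rows fetched with
/// `LIMIT limit + 1`, report whether more rows existed and drop the extra one.
pub fn trim_to_limit<T>(mut rows: Vec<T>, limit: usize) -> (Vec<T>, bool) {
    // One row beyond the requested limit means the section was truncated.
    let truncated = rows.len() > limit;
    rows.truncate(limit);
    (rows, truncated)
}
```

Each Workload section would pass its fetched rows through the helper and store the returned flag in the matching `*_truncated` field.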
4) Rename “Last Active” → “Last Seen” for Expert/Overlap
Why
For “author” rows, the timestamp is derived from review activity on their MR (via MAX(n.created_at)), not necessarily that person's direct action. Calling that “active” is semantically misleading. “Last seen” is accurate across both reviewer+author branches.
diff --git a/who-command-design.md b/who-command-design.md
@@
pub struct Expert {
@@
- pub last_active_ms: i64,
+ pub last_seen_ms: i64,
}
@@
pub struct OverlapUser {
@@
- pub last_touch_at: i64,
+ pub last_seen_at: i64,
@@
fn print_expert_human(...) {
@@
- style("Last Active").bold(),
+ style("Last Seen").bold(),
@@
- style(format_relative_time(expert.last_active_ms)).dim(),
+ style(format_relative_time(expert.last_seen_ms)).dim(),
(Keep internal SQL aliases consistent: last_seen_at everywhere.)
5) Make MR state filtering consistent in Expert/Overlap reviewer branches
Why
You already restrict Overlap author branch to opened|merged, but reviewer branches can include closed/unmerged noise. Consistency improves signal quality and can reduce scan churn.
Low-risk revision: apply the same state filter to reviewer branches (Expert + Overlap). You can keep “closed” excluded by default without adding new flags.
diff --git a/who-command-design.md b/who-command-design.md
@@
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
@@
- AND n.created_at >= ?2
+ AND m.state IN ('opened','merged')
+ AND n.created_at >= ?2
This is a semantic choice; if you later want archaeology across closed/unmerged, that belongs in a separate mode/flag, but I would not add it now.
6) Add a design principle for bounded outputs (aligns with robot-first reproducibility)
diff --git a/who-command-design.md b/who-command-design.md
@@
10. **Truncation transparency.** Result types carry a `truncated: bool` flag...
+11. **Bounded payloads.** Robot JSON must never emit unbounded arrays (participants, refs).
+ Large list fields are capped with `*_total` + `*_truncated` so agents can page/retry.
Consolidated plan metadata bump (Iteration 7)
diff --git a/who-command-design.md b/who-command-design.md
@@
-iteration: 6
+iteration: 7
updated: 2026-02-07
Net effect (what you get)
Correct path classification under -p scoping (no cross-project probe leakage)
Deterministic + bounded robot payloads (no giant JSON surprises)
Uniform truncation contract across all modes (Workload no longer silently truncates)
Clearer semantics (“Last Seen” avoids misinterpretation)
Cleaner signals (reviewer branches ignore closed/unmerged by default)
If you want, I can also produce a second diff that updates the robot-docs schema block and the Verification EXPLAIN expectations to reflect the new probe queries and the state filter.


@@ -0,0 +1,21 @@
-- Migration 022: Composite query indexes for notes + author_id column
-- Optimizes author-scoped and project-scoped date-range queries on notes.
-- Adds discussion JOIN indexes and immutable author identity column.
-- Composite index for author-scoped queries (who command, notes --author)
CREATE INDEX IF NOT EXISTS idx_notes_user_created
ON notes(project_id, author_username COLLATE NOCASE, created_at DESC, id DESC)
WHERE is_system = 0;
-- Composite index for project-scoped date-range queries
CREATE INDEX IF NOT EXISTS idx_notes_project_created
ON notes(project_id, created_at DESC, id DESC)
WHERE is_system = 0;
-- Discussion JOIN indexes
CREATE INDEX IF NOT EXISTS idx_discussions_issue_id ON discussions(issue_id);
CREATE INDEX IF NOT EXISTS idx_discussions_mr_id ON discussions(merge_request_id);
-- Immutable author identity column (GitLab numeric user ID)
ALTER TABLE notes ADD COLUMN author_id INTEGER;
CREATE INDEX IF NOT EXISTS idx_notes_author_id ON notes(author_id) WHERE author_id IS NOT NULL;


@@ -0,0 +1,153 @@
-- Migration 024: Add 'note' source_type to documents and dirty_sources
-- SQLite does not support ALTER CONSTRAINT, so we use the table-rebuild pattern.
-- ============================================================
-- 1. Rebuild dirty_sources with updated CHECK constraint
-- ============================================================
CREATE TABLE dirty_sources_new (
source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
source_id INTEGER NOT NULL,
queued_at INTEGER NOT NULL,
attempt_count INTEGER NOT NULL DEFAULT 0,
last_attempt_at INTEGER,
last_error TEXT,
next_attempt_at INTEGER,
PRIMARY KEY(source_type, source_id)
);
INSERT INTO dirty_sources_new SELECT * FROM dirty_sources;
DROP TABLE dirty_sources;
ALTER TABLE dirty_sources_new RENAME TO dirty_sources;
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
-- ============================================================
-- 2. Rebuild documents with updated CHECK constraint
-- ============================================================
-- 2a. Backup junction table data
CREATE TEMP TABLE _doc_labels_backup AS SELECT * FROM document_labels;
CREATE TEMP TABLE _doc_paths_backup AS SELECT * FROM document_paths;
-- 2b. Drop all triggers that reference documents
DROP TRIGGER IF EXISTS documents_ai;
DROP TRIGGER IF EXISTS documents_ad;
DROP TRIGGER IF EXISTS documents_au;
DROP TRIGGER IF EXISTS documents_embeddings_ad;
-- 2c. Drop junction tables (they have FK references to documents)
DROP TABLE IF EXISTS document_labels;
DROP TABLE IF EXISTS document_paths;
-- 2d. Create new documents table with 'note' in CHECK constraint
CREATE TABLE documents_new (
id INTEGER PRIMARY KEY,
source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
source_id INTEGER NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id),
author_username TEXT,
label_names TEXT,
created_at INTEGER,
updated_at INTEGER,
url TEXT,
title TEXT,
content_text TEXT NOT NULL,
content_hash TEXT NOT NULL,
labels_hash TEXT NOT NULL DEFAULT '',
paths_hash TEXT NOT NULL DEFAULT '',
is_truncated INTEGER NOT NULL DEFAULT 0,
truncated_reason TEXT CHECK (
truncated_reason IN (
'token_limit_middle_drop','single_note_oversized','first_last_oversized',
'hard_cap_oversized'
)
OR truncated_reason IS NULL
),
UNIQUE(source_type, source_id)
);
-- 2e. Copy all existing data
INSERT INTO documents_new SELECT * FROM documents;
-- 2f. Swap tables
DROP TABLE documents;
ALTER TABLE documents_new RENAME TO documents;
-- 2g. Recreate all indexes on documents
CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
CREATE INDEX idx_documents_author ON documents(author_username);
CREATE INDEX idx_documents_source ON documents(source_type, source_id);
CREATE INDEX idx_documents_hash ON documents(content_hash);
-- 2h. Recreate junction tables
CREATE TABLE document_labels (
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
label_name TEXT NOT NULL,
PRIMARY KEY(document_id, label_name)
) WITHOUT ROWID;
CREATE INDEX idx_document_labels_label ON document_labels(label_name);
CREATE TABLE document_paths (
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
path TEXT NOT NULL,
PRIMARY KEY(document_id, path)
) WITHOUT ROWID;
CREATE INDEX idx_document_paths_path ON document_paths(path);
-- 2i. Restore junction table data from backups
INSERT INTO document_labels SELECT * FROM _doc_labels_backup;
INSERT INTO document_paths SELECT * FROM _doc_paths_backup;
-- 2j. Recreate FTS triggers (from migration 008)
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
INSERT INTO documents_fts(rowid, title, content_text)
VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;
CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
END;
CREATE TRIGGER documents_au AFTER UPDATE ON documents
WHEN old.title IS NOT new.title OR old.content_text != new.content_text
BEGIN
INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
INSERT INTO documents_fts(rowid, title, content_text)
VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;
-- 2k. Recreate embeddings cleanup trigger (from migration 009)
CREATE TRIGGER documents_embeddings_ad AFTER DELETE ON documents BEGIN
DELETE FROM embeddings
WHERE rowid >= old.id * 1000
AND rowid < (old.id + 1) * 1000;
END;
-- 2l. Rebuild FTS index to ensure consistency after table swap
INSERT INTO documents_fts(documents_fts) VALUES('rebuild');
-- ============================================================
-- 3. Defense triggers: clean up documents when notes are
-- deleted or flipped to system notes
-- ============================================================
CREATE TRIGGER notes_ad_cleanup AFTER DELETE ON notes
WHEN old.is_system = 0
BEGIN
DELETE FROM documents WHERE source_type = 'note' AND source_id = old.id;
END;
CREATE TRIGGER notes_au_system_cleanup AFTER UPDATE OF is_system ON notes
WHEN NEW.is_system = 1 AND OLD.is_system = 0
BEGIN
DELETE FROM documents WHERE source_type = 'note' AND source_id = OLD.id;
END;
-- ============================================================
-- 4. Drop temp backup tables
-- ============================================================
DROP TABLE IF EXISTS _doc_labels_backup;
DROP TABLE IF EXISTS _doc_paths_backup;


@@ -0,0 +1,8 @@
-- Backfill existing non-system notes into dirty queue for document generation.
-- Only seeds notes that don't already have documents and aren't already queued.
INSERT INTO dirty_sources (source_type, source_id, queued_at)
SELECT 'note', n.id, CAST(strftime('%s', 'now') AS INTEGER) * 1000
FROM notes n
LEFT JOIN documents d ON d.source_type = 'note' AND d.source_id = n.id
WHERE n.is_system = 0 AND d.id IS NULL
ON CONFLICT(source_type, source_id) DO NOTHING;


@@ -0,0 +1,20 @@
-- Indexes for time-decay expert scoring: dual-path matching and reviewer participation.
CREATE INDEX IF NOT EXISTS idx_notes_old_path_author
ON notes(position_old_path, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
ON mr_file_changes(old_path, project_id, merge_request_id)
WHERE old_path IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
ON mr_file_changes(new_path, project_id, merge_request_id);
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
ON notes(discussion_id, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0;
CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
ON notes(position_old_path, project_id, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;


@@ -4,7 +4,7 @@ title: ""
status: iterating
iteration: 6
target_iterations: 8
- beads_revision: 1
+ beads_revision: 2
related_plans: []
created: 2026-02-08
updated: 2026-02-12


@@ -21,6 +21,10 @@ pub enum CorrectionRule {
SingleDashLongFlag,
CaseNormalization,
FuzzyFlag,
SubcommandAlias,
ValueNormalization,
ValueFuzzy,
FlagPrefix,
}
/// Result of the correction pass over raw args.
@@ -40,6 +44,7 @@ const GLOBAL_FLAGS: &[&str] = &[
"--robot",
"--json",
"--color",
"--icons",
"--quiet",
"--no-quiet",
"--verbose",
@@ -119,8 +124,10 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--no-docs",
"--no-events",
"--no-file-changes",
"--no-status",
"--dry-run",
"--no-dry-run",
"--timings",
],
),
(
@@ -162,7 +169,7 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--project",
"--since",
"--depth",
- "--expand-mentions",
+ "--no-mentions",
"--limit",
"--fields",
"--max-seeds",
@@ -183,6 +190,36 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--fields",
"--detail",
"--no-detail",
"--as-of",
"--explain-score",
"--include-bots",
"--all-history",
],
),
("drift", &["--threshold", "--project"]),
(
"notes",
&[
"--limit",
"--fields",
"--format",
"--author",
"--note-type",
"--contains",
"--note-id",
"--gitlab-note-id",
"--discussion-id",
"--include-system",
"--for-issue",
"--for-mr",
"--project",
"--since",
"--until",
"--path",
"--resolution",
"--sort",
"--asc",
"--open",
],
),
(
@@ -196,6 +233,25 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--default-project",
],
),
(
"file-history",
&[
"--project",
"--discussions",
"--no-follow-renames",
"--merged",
"--limit",
],
),
(
"trace",
&[
"--project",
"--discussions",
"--no-follow-renames",
"--limit",
],
),
("generate-docs", &["--full", "--project"]),
("completions", &[]),
("robot-docs", &["--brief"]),
@@ -231,18 +287,47 @@ pub const ENUM_VALUES: &[(&str, &[&str])] = &[
("--state", &["opened", "closed", "merged", "locked", "all"]),
("--mode", &["lexical", "hybrid", "semantic"]),
("--sort", &["updated", "created", "iid"]),
- ("--type", &["issue", "mr", "discussion"]),
+ ("--type", &["issue", "mr", "discussion", "note"]),
("--fts-mode", &["safe", "raw"]),
("--color", &["auto", "always", "never"]),
("--log-format", &["text", "json"]),
("--for", &["issue", "mr"]),
];
// ---------------------------------------------------------------------------
// Subcommand alias map (for forms clap aliases can't express)
// ---------------------------------------------------------------------------
/// Subcommand aliases for non-standard forms (underscores, no separators).
/// Clap `visible_alias`/`alias` handles hyphenated forms (`merge-requests`);
/// this map catches the rest.
const SUBCOMMAND_ALIASES: &[(&str, &str)] = &[
("merge_requests", "mrs"),
("merge_request", "mrs"),
("mergerequests", "mrs"),
("mergerequest", "mrs"),
("generate_docs", "generate-docs"),
("generatedocs", "generate-docs"),
("gendocs", "generate-docs"),
("gen-docs", "generate-docs"),
("robot_docs", "robot-docs"),
("robotdocs", "robot-docs"),
("sync_status", "status"),
("syncstatus", "status"),
("auth_test", "auth"),
("authtest", "auth"),
("file_history", "file-history"),
("filehistory", "file-history"),
];
// ---------------------------------------------------------------------------
// Correction thresholds
// ---------------------------------------------------------------------------
const FUZZY_FLAG_THRESHOLD: f64 = 0.8;
/// Stricter threshold for robot mode — only high-confidence corrections to
/// avoid misleading agents. Still catches obvious typos like `--projct`.
const FUZZY_FLAG_THRESHOLD_STRICT: f64 = 0.9;
// ---------------------------------------------------------------------------
// Core logic
@@ -302,20 +387,29 @@ fn valid_flags_for(subcommand: Option<&str>) -> Vec<&'static str> {
/// Run the pre-clap correction pass on raw args.
///
- /// When `strict` is true (robot mode), only deterministic corrections are applied
- /// (single-dash long flags, case normalization). Fuzzy matching is disabled to
- /// prevent misleading agents with speculative corrections.
+ /// Three-phase pipeline:
+ /// - Phase A: Subcommand alias correction (case-insensitive alias map)
+ /// - Phase B: Per-arg flag corrections (single-dash, case, prefix, fuzzy)
+ /// - Phase C: Enum value normalization (case + fuzzy + prefix on known values)
+ ///
+ /// When `strict` is true (robot mode), fuzzy matching uses a higher threshold
+ /// (0.9 vs 0.8) to avoid speculative corrections while still catching obvious
+ /// typos like `--projct` → `--project`.
///
/// Returns the (possibly modified) args and any corrections applied.
pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
- let subcommand = detect_subcommand(&raw);
- let valid = valid_flags_for(subcommand);
- let mut corrected = Vec::with_capacity(raw.len());
let mut corrections = Vec::new();
+ // Phase A: Subcommand alias correction
+ let args = correct_subcommand(raw, &mut corrections);
+ // Phase B: Per-arg flag corrections
+ let valid = valid_flags_for(detect_subcommand(&args));
+ let mut corrected = Vec::with_capacity(args.len());
let mut past_terminator = false;
- for arg in raw {
+ for arg in args {
// B1: Stop correcting after POSIX `--` option terminator
if arg == "--" {
past_terminator = true;
@@ -337,12 +431,177 @@ pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
}
}
// Phase C: Enum value normalization
normalize_enum_values(&mut corrected, &mut corrections);
CorrectionResult {
args: corrected,
corrections,
}
}
/// Phase A: Replace subcommand aliases with their canonical names.
///
/// Handles forms that can't be expressed as clap `alias`/`visible_alias`
/// (underscores, no-separator forms). Case-insensitive matching.
fn correct_subcommand(mut args: Vec<String>, corrections: &mut Vec<Correction>) -> Vec<String> {
// Find the subcommand position index, then check the alias map.
// Can't use iterators easily because we need to mutate args[i].
let mut skip_next = false;
let mut subcmd_idx = None;
for (i, arg) in args.iter().enumerate().skip(1) {
if skip_next {
skip_next = false;
continue;
}
if arg.starts_with('-') {
if arg.contains('=') {
continue;
}
if matches!(arg.as_str(), "--config" | "-c" | "--color" | "--log-format") {
skip_next = true;
}
continue;
}
subcmd_idx = Some(i);
break;
}
if let Some(i) = subcmd_idx
&& let Some((_, canonical)) = SUBCOMMAND_ALIASES
.iter()
.find(|(alias, _)| alias.eq_ignore_ascii_case(&args[i]))
{
corrections.push(Correction {
original: args[i].clone(),
corrected: (*canonical).to_string(),
rule: CorrectionRule::SubcommandAlias,
confidence: 1.0,
});
args[i] = (*canonical).to_string();
}
args
}
/// Phase C: Normalize enum values for flags with known valid values.
///
/// Handles both `--flag value` and `--flag=value` forms. Corrections are:
/// 1. Case normalization: `Opened` → `opened`
/// 2. Prefix expansion: `open` → `opened` (only if unambiguous)
/// 3. Fuzzy matching: `opend` → `opened`
fn normalize_enum_values(args: &mut [String], corrections: &mut Vec<Correction>) {
let mut i = 0;
while i < args.len() {
// Respect POSIX `--` option terminator — don't normalize values after it
if args[i] == "--" {
break;
}
// Handle --flag=value form
if let Some(eq_pos) = args[i].find('=') {
let flag = args[i][..eq_pos].to_string();
let value = args[i][eq_pos + 1..].to_string();
if let Some(valid_vals) = lookup_enum_values(&flag)
&& let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals)
{
let original = args[i].clone();
let corrected = format!("{flag}={corrected_val}");
args[i] = corrected.clone();
corrections.push(Correction {
original,
corrected,
rule: if is_case_only {
CorrectionRule::ValueNormalization
} else {
CorrectionRule::ValueFuzzy
},
confidence: 0.95,
});
}
i += 1;
continue;
}
// Handle --flag value form
if args[i].starts_with("--")
&& let Some(valid_vals) = lookup_enum_values(&args[i])
&& i + 1 < args.len()
&& !args[i + 1].starts_with('-')
{
let value = args[i + 1].clone();
if let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals) {
let original = args[i + 1].clone();
args[i + 1] = corrected_val.to_string();
corrections.push(Correction {
original,
corrected: corrected_val.to_string(),
rule: if is_case_only {
CorrectionRule::ValueNormalization
} else {
CorrectionRule::ValueFuzzy
},
confidence: 0.95,
});
}
i += 2;
continue;
}
i += 1;
}
}
/// Look up valid enum values for a flag (case-insensitive flag name match).
fn lookup_enum_values(flag: &str) -> Option<&'static [&'static str]> {
let lower = flag.to_lowercase();
ENUM_VALUES
.iter()
.find(|(f, _)| f.to_lowercase() == lower)
.map(|(_, vals)| *vals)
}
/// Try to normalize a value against a set of valid values.
///
/// Returns `Some((corrected, is_case_only))` if a correction is needed:
/// - `is_case_only = true` for pure case normalization
/// - `is_case_only = false` for prefix/fuzzy corrections
///
/// Returns `None` if the value is already valid or no match is found.
fn normalize_value(input: &str, valid_values: &[&str]) -> Option<(String, bool)> {
// Already valid (exact match)? No correction needed.
if valid_values.contains(&input) {
return None;
}
let lower = input.to_lowercase();
// Case-insensitive exact match
if let Some(&val) = valid_values.iter().find(|v| v.to_lowercase() == lower) {
return Some((val.to_string(), true));
}
// Prefix match (e.g., "open" → "opened") — only if unambiguous
let prefix_matches: Vec<&&str> = valid_values
.iter()
.filter(|v| v.starts_with(&*lower))
.collect();
if prefix_matches.len() == 1 {
return Some(((*prefix_matches[0]).to_string(), false));
}
// Fuzzy match
let best = valid_values
.iter()
.map(|v| (*v, jaro_winkler(&lower, v)))
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
if let Some((val, score)) = best
&& score >= 0.8
{
return Some((val.to_string(), false));
}
None
}
/// Clap built-in flags that should never be corrected. These are handled by clap
/// directly and are not in our GLOBAL_FLAGS registry.
const CLAP_BUILTINS: &[&str] = &["--help", "--version"];
@@ -461,10 +720,34 @@ fn try_correct(arg: &str, valid_flags: &[&str], strict: bool) -> Option<Correcti
});
}
- // Rule 3: Fuzzy flag match — `--staate` -> `--state` (skip in strict mode)
- if !strict
- && let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
- && score >= FUZZY_FLAG_THRESHOLD
+ // Rule 3: Prefix match — `--proj` -> `--project` (only if unambiguous)
+ let prefix_matches: Vec<&str> = valid_flags
+ .iter()
+ .filter(|f| f.starts_with(&*lower) && f.to_lowercase() != lower)
+ .copied()
+ .collect();
+ if prefix_matches.len() == 1 {
+ let matched = prefix_matches[0];
+ let corrected = match value_suffix {
+ Some(suffix) => format!("{matched}{suffix}"),
+ None => matched.to_string(),
+ };
+ return Some(Correction {
+ original: arg.to_string(),
+ corrected,
+ rule: CorrectionRule::FlagPrefix,
+ confidence: 0.95,
+ });
+ }
+ // Rule 4: Fuzzy flag match — higher threshold in strict/robot mode
+ let threshold = if strict {
+ FUZZY_FLAG_THRESHOLD_STRICT
+ } else {
+ FUZZY_FLAG_THRESHOLD
+ };
+ if let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
+ && score >= threshold
{
let corrected = match value_suffix {
Some(suffix) => format!("{best_flag}{suffix}"),
@@ -538,6 +821,30 @@ pub fn format_teaching_note(correction: &Correction) -> String {
correction.corrected, correction.original
)
}
CorrectionRule::SubcommandAlias => {
format!(
"Use canonical command name: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::ValueNormalization => {
format!(
"Values are lowercase: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::ValueFuzzy => {
format!(
"Correct value spelling: {} (not {})",
correction.corrected, correction.original
)
}
CorrectionRule::FlagPrefix => {
format!(
"Use full flag name: {} (not {})",
correction.corrected, correction.original
)
}
}
}
@@ -721,17 +1028,20 @@ mod tests {
assert_eq!(result.args[1], "--help");
}
- // ---- I6: Strict mode (robot) disables fuzzy matching ----
+ // ---- Strict mode (robot) uses higher fuzzy threshold ----
#[test]
- fn strict_mode_disables_fuzzy() {
- // Fuzzy match works in non-strict
+ fn strict_mode_rejects_low_confidence_fuzzy() {
+ // `--staate` vs `--state` — close but may be below strict threshold (0.9)
+ // The exact score depends on Jaro-Winkler; this tests that the strict
+ // threshold is higher than non-strict.
let non_strict = correct_args(args("lore --robot issues --staate opened"), false);
assert_eq!(non_strict.corrections.len(), 1);
assert_eq!(non_strict.corrections[0].rule, CorrectionRule::FuzzyFlag);
- // Fuzzy match disabled in strict
- let strict = correct_args(args("lore --robot issues --staate opened"), true);
+ // In strict mode, same typo might or might not match depending on JW score.
+ // We verify that at least wildly wrong flags are still rejected.
+ let strict = correct_args(args("lore --robot issues --xyzzy foo"), true);
assert!(strict.corrections.is_empty());
}
@@ -750,6 +1060,155 @@ mod tests {
assert_eq!(result.corrections[0].corrected, "--robot");
}
// ---- Subcommand alias correction ----
#[test]
fn subcommand_alias_merge_requests_underscore() {
let result = correct_args(args("lore --robot merge_requests -n 10"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::SubcommandAlias && c.corrected == "mrs")
);
assert!(result.args.contains(&"mrs".to_string()));
}
#[test]
fn subcommand_alias_mergerequests_no_sep() {
let result = correct_args(args("lore --robot mergerequests"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}
#[test]
fn subcommand_alias_generate_docs_underscore() {
let result = correct_args(args("lore generate_docs"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "generate-docs")
);
}
#[test]
fn subcommand_alias_case_insensitive() {
let result = correct_args(args("lore Merge_Requests"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}
#[test]
fn subcommand_alias_valid_command_untouched() {
let result = correct_args(args("lore issues -n 10"), false);
assert!(result.corrections.is_empty());
}
// ---- Enum value normalization ----
#[test]
fn value_case_normalization() {
let result = correct_args(args("lore issues --state Opened"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::ValueNormalization && c.corrected == "opened")
);
assert!(result.args.contains(&"opened".to_string()));
}
#[test]
fn value_case_normalization_eq_form() {
let result = correct_args(args("lore issues --state=Opened"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "--state=opened")
);
}
#[test]
fn value_prefix_expansion() {
// "open" is a unique prefix of "opened"
let result = correct_args(args("lore issues --state open"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "opened" && c.rule == CorrectionRule::ValueFuzzy)
);
}
#[test]
fn value_fuzzy_typo() {
let result = correct_args(args("lore issues --state opend"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "opened"));
}
#[test]
fn value_already_valid_untouched() {
let result = correct_args(args("lore issues --state opened"), false);
// No value corrections expected (flag corrections may still exist)
assert!(!result.corrections.iter().any(|c| matches!(
c.rule,
CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
)));
}
#[test]
fn value_mode_case() {
let result = correct_args(args("lore search --mode Hybrid query"), false);
assert!(result.corrections.iter().any(|c| c.corrected == "hybrid"));
}
#[test]
fn value_normalization_respects_option_terminator() {
// Values after `--` are positional and must not be corrected
let result = correct_args(args("lore search -- --state Opened"), false);
assert!(!result.corrections.iter().any(|c| matches!(
c.rule,
CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
)));
assert_eq!(result.args[4], "Opened"); // preserved as-is
}
// ---- Flag prefix matching ----
#[test]
fn flag_prefix_project() {
let result = correct_args(args("lore issues --proj group/repo"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::FlagPrefix && c.corrected == "--project")
);
}
#[test]
fn flag_prefix_ambiguous_not_corrected() {
// --s could be --state, --since, --sort, --status — ambiguous
let result = correct_args(args("lore issues --s opened"), false);
assert!(
!result
.corrections
.iter()
.any(|c| c.rule == CorrectionRule::FlagPrefix)
);
}
#[test]
fn flag_prefix_with_eq_value() {
let result = correct_args(args("lore issues --proj=group/repo"), false);
assert!(
result
.corrections
.iter()
.any(|c| c.corrected == "--project=group/repo")
);
}
// ---- Teaching notes ----
#[test]
@@ -789,6 +1248,43 @@ mod tests {
assert!(note.contains("spelling"));
}
#[test]
fn teaching_note_subcommand_alias() {
let c = Correction {
original: "merge_requests".to_string(),
corrected: "mrs".to_string(),
rule: CorrectionRule::SubcommandAlias,
confidence: 1.0,
};
let note = format_teaching_note(&c);
assert!(note.contains("canonical"));
assert!(note.contains("mrs"));
}
#[test]
fn teaching_note_value_normalization() {
let c = Correction {
original: "Opened".to_string(),
corrected: "opened".to_string(),
rule: CorrectionRule::ValueNormalization,
confidence: 0.95,
};
let note = format_teaching_note(&c);
assert!(note.contains("lowercase"));
}
#[test]
fn teaching_note_flag_prefix() {
let c = Correction {
original: "--proj".to_string(),
corrected: "--project".to_string(),
rule: CorrectionRule::FlagPrefix,
confidence: 0.95,
};
let note = format_teaching_note(&c);
assert!(note.contains("full flag name"));
}
// ---- Post-clap suggestion helpers ----
#[test]


@@ -1,4 +1,4 @@
- use console::style;
+ use crate::cli::render::{self, Theme};
use rusqlite::Connection;
use serde::Serialize;
@@ -178,27 +178,6 @@ fn count_notes(conn: &Connection, type_filter: Option<&str>) -> Result<CountResu
})
}
- fn format_number(n: i64) -> String {
- let (prefix, abs) = if n < 0 {
- ("-", n.unsigned_abs())
- } else {
- ("", n.unsigned_abs())
- };
- let s = abs.to_string();
- let chars: Vec<char> = s.chars().collect();
- let mut result = String::from(prefix);
- for (i, c) in chars.iter().enumerate() {
- if i > 0 && (chars.len() - i).is_multiple_of(3) {
- result.push(',');
- }
- result.push(*c);
- }
- result
- }
#[derive(Serialize)]
struct CountJsonOutput {
ok: bool,
@@ -284,10 +263,10 @@ pub fn print_event_count_json(counts: &EventCounts, elapsed_ms: u64) {
pub fn print_event_count(counts: &EventCounts) {
println!(
"{:<20} {:>8} {:>8} {:>8}",
- style("Event Type").cyan().bold(),
- style("Issues").bold(),
- style("MRs").bold(),
- style("Total").bold()
+ Theme::info().bold().render("Event Type"),
+ Theme::bold().render("Issues"),
+ Theme::bold().render("MRs"),
+ Theme::bold().render("Total")
);
let state_total = counts.state_issue + counts.state_mr;
@@ -297,33 +276,33 @@ pub fn print_event_count(counts: &EventCounts) {
println!(
"{:<20} {:>8} {:>8} {:>8}",
"State events",
- format_number(counts.state_issue as i64),
- format_number(counts.state_mr as i64),
- format_number(state_total as i64)
+ render::format_number(counts.state_issue as i64),
+ render::format_number(counts.state_mr as i64),
+ render::format_number(state_total as i64)
);
println!(
"{:<20} {:>8} {:>8} {:>8}",
"Label events",
- format_number(counts.label_issue as i64),
- format_number(counts.label_mr as i64),
- format_number(label_total as i64)
+ render::format_number(counts.label_issue as i64),
+ render::format_number(counts.label_mr as i64),
+ render::format_number(label_total as i64)
);
println!(
"{:<20} {:>8} {:>8} {:>8}",
"Milestone events",
- format_number(counts.milestone_issue as i64),
- format_number(counts.milestone_mr as i64),
- format_number(milestone_total as i64)
+ render::format_number(counts.milestone_issue as i64),
+ render::format_number(counts.milestone_mr as i64),
+ render::format_number(milestone_total as i64)
);
let total_issues = counts.state_issue + counts.label_issue + counts.milestone_issue;
let total_mrs = counts.state_mr + counts.label_mr + counts.milestone_mr;
println!(
"{:<20} {:>8} {:>8} {:>8}",
style("Total").bold(),
format_number(total_issues as i64),
format_number(total_mrs as i64),
style(format_number(counts.total() as i64)).bold()
Theme::bold().render("Total"),
render::format_number(total_issues as i64),
render::format_number(total_mrs as i64),
Theme::bold().render(&render::format_number(counts.total() as i64))
);
}
@@ -350,57 +329,56 @@ pub fn print_count_json(result: &CountResult, elapsed_ms: u64) {
}
pub fn print_count(result: &CountResult) {
let count_str = format_number(result.count);
let count_str = render::format_number(result.count);
if let Some(system_count) = result.system_count {
println!(
"{}: {} {}",
style(&result.entity).cyan(),
style(&count_str).bold(),
style(format!(
"{}: {:>10} {}",
Theme::info().render(&result.entity),
Theme::bold().render(&count_str),
Theme::dim().render(&format!(
"(excluding {} system)",
format_number(system_count)
render::format_number(system_count)
))
.dim()
);
} else {
println!(
"{}: {}",
style(&result.entity).cyan(),
style(&count_str).bold()
"{}: {:>10}",
Theme::info().render(&result.entity),
Theme::bold().render(&count_str)
);
}
if let Some(breakdown) = &result.state_breakdown {
println!(" opened: {}", format_number(breakdown.opened));
println!(" opened: {:>10}", render::format_number(breakdown.opened));
if let Some(merged) = breakdown.merged {
println!(" merged: {}", format_number(merged));
println!(" merged: {:>10}", render::format_number(merged));
}
println!(" closed: {}", format_number(breakdown.closed));
println!(" closed: {:>10}", render::format_number(breakdown.closed));
if let Some(locked) = breakdown.locked
&& locked > 0
{
println!(" locked: {}", format_number(locked));
println!(" locked: {:>10}", render::format_number(locked));
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::cli::render;
#[test]
fn format_number_handles_small_numbers() {
assert_eq!(format_number(0), "0");
assert_eq!(format_number(1), "1");
assert_eq!(format_number(100), "100");
assert_eq!(format_number(999), "999");
assert_eq!(render::format_number(0), "0");
assert_eq!(render::format_number(1), "1");
assert_eq!(render::format_number(100), "100");
assert_eq!(render::format_number(999), "999");
}
#[test]
fn format_number_adds_thousands_separators() {
assert_eq!(format_number(1000), "1,000");
assert_eq!(format_number(12345), "12,345");
assert_eq!(format_number(1234567), "1,234,567");
assert_eq!(render::format_number(1000), "1,000");
assert_eq!(render::format_number(12345), "12,345");
assert_eq!(render::format_number(1234567), "1,234,567");
}
}
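
The hunks above drop the local `format_number` helper in favor of `render::format_number`, whose body is not shown in this diff. A minimal standalone sketch, assuming the shared helper keeps the removed function's behavior (comma thousands separators, sign preserved), might look like this:

```rust
/// Format an integer with comma thousands separators.
/// Sketch only: assumed to mirror the helper removed in this diff.
fn format_number(n: i64) -> String {
    // unsigned_abs avoids overflow when n == i64::MIN.
    let digits = n.unsigned_abs().to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3 + 1);
    if n < 0 {
        out.push('-');
    }
    for (i, c) in digits.chars().enumerate() {
        // Insert a separator whenever the number of remaining digits
        // is a multiple of three (but never before the first digit).
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(c);
    }
    out
}
```

The sketch uses `% 3 == 0` rather than `is_multiple_of` so that it also compiles on older toolchains.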

@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::{Icons, Theme};
use serde::Serialize;
use crate::core::config::Config;
@@ -530,7 +530,7 @@ fn check_logging(config: Option<&Config>) -> LoggingCheck {
}
pub fn print_doctor_results(result: &DoctorResult) {
println!("\nlore doctor\n");
println!();
print_check("Config", &result.checks.config.result);
print_check("Database", &result.checks.database.result);
@@ -539,38 +539,61 @@ pub fn print_doctor_results(result: &DoctorResult) {
print_check("Ollama", &result.checks.ollama.result);
print_check("Logging", &result.checks.logging.result);
// Count statuses
let checks = [
&result.checks.config.result,
&result.checks.database.result,
&result.checks.gitlab.result,
&result.checks.projects.result,
&result.checks.ollama.result,
&result.checks.logging.result,
];
let passed = checks
.iter()
.filter(|c| c.status == CheckStatus::Ok)
.count();
let warnings = checks
.iter()
.filter(|c| c.status == CheckStatus::Warning)
.count();
let failed = checks
.iter()
.filter(|c| c.status == CheckStatus::Error)
.count();
println!();
let mut summary_parts = Vec::new();
if result.success {
let ollama_ok = result.checks.ollama.result.status == CheckStatus::Ok;
if ollama_ok {
println!("{}", style("Status: Ready").green());
summary_parts.push(Theme::success().render("Ready"));
} else {
println!(
"{} {}",
style("Status: Ready").green(),
style("(lexical search available, semantic search requires Ollama)").yellow()
);
summary_parts.push(Theme::error().render("Not ready"));
}
} else {
println!("{}", style("Status: Not ready").red());
summary_parts.push(format!("{passed} passed"));
if warnings > 0 {
summary_parts.push(Theme::warning().render(&format!("{warnings} warning")));
}
if failed > 0 {
summary_parts.push(Theme::error().render(&format!("{failed} failed")));
}
println!(" {}", summary_parts.join(" \u{b7} "));
println!();
}
fn print_check(name: &str, result: &CheckResult) {
let symbol = match result.status {
CheckStatus::Ok => style("").green(),
CheckStatus::Warning => style("").yellow(),
CheckStatus::Error => style("").red(),
let icon = match result.status {
CheckStatus::Ok => Theme::success().render(Icons::success()),
CheckStatus::Warning => Theme::warning().render(Icons::warning()),
CheckStatus::Error => Theme::error().render(Icons::error()),
};
let message = result.message.as_deref().unwrap_or("");
let message_styled = match result.status {
CheckStatus::Ok => message.to_string(),
CheckStatus::Warning => style(message).yellow().to_string(),
CheckStatus::Error => style(message).red().to_string(),
CheckStatus::Warning => Theme::warning().render(message),
CheckStatus::Error => Theme::error().render(message),
};
println!(" {symbol} {:<10} {message_styled}", name);
println!(" {icon} {:<10} {message_styled}", name);
}
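
The new doctor summary collects parts into a vector, skips zero counts, and joins them with a middle dot. A minimal sketch of that pattern with styling omitted (the `summary_line` name and its parameters are hypothetical, and `Theme` is left out because its API is not shown here):

```rust
/// Build a one-line summary such as "Ready · 5 passed · 1 warning".
/// Hypothetical helper: plain strings stand in for the themed output.
fn summary_line(ready: bool, passed: usize, warnings: usize, failed: usize) -> String {
    let mut parts: Vec<String> = Vec::new();
    parts.push(if ready { "Ready" } else { "Not ready" }.to_string());
    parts.push(format!("{passed} passed"));
    // Zero-suppressed: only mention warnings and failures when present.
    if warnings > 0 {
        parts.push(format!("{warnings} warning"));
    }
    if failed > 0 {
        parts.push(format!("{failed} failed"));
    }
    // U+00B7 MIDDLE DOT as the separator, as in the diff.
    parts.join(" \u{b7} ")
}
```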

@@ -1,10 +1,10 @@
use std::collections::HashMap;
use std::sync::LazyLock;
use console::style;
use regex::Regex;
use serde::Serialize;
use crate::cli::render::{Icons, Theme};
use crate::cli::robot::RobotMeta;
use crate::core::config::Config;
use crate::core::db::create_connection;
@@ -420,7 +420,7 @@ pub fn print_drift_human(response: &DriftResponse) {
"Drift Analysis: {} #{}",
response.entity.entity_type, response.entity.iid
);
println!("{}", style(&header).bold());
println!("{}", Theme::bold().render(&header));
println!("{}", "-".repeat(header.len().min(60)));
println!("Title: {}", response.entity.title);
println!("Threshold: {:.2}", response.threshold);
@@ -428,7 +428,11 @@ pub fn print_drift_human(response: &DriftResponse) {
println!();
if response.drift_detected {
println!("{}", style("DRIFT DETECTED").red().bold());
println!(
"{} {}",
Theme::error().render(Icons::error()),
Theme::error().bold().render("DRIFT DETECTED")
);
if let Some(dp) = &response.drift_point {
println!(
" At note #{} by @{} ({}) - similarity {:.2}",
@@ -439,7 +443,11 @@ pub fn print_drift_human(response: &DriftResponse) {
println!(" Topics: {}", response.drift_topics.join(", "));
}
} else {
println!("{}", style("No drift detected").green());
println!(
"{} {}",
Theme::success().render(Icons::success()),
Theme::success().render("No drift detected")
);
}
println!();
@@ -447,10 +455,10 @@ pub fn print_drift_human(response: &DriftResponse) {
if !response.similarity_curve.is_empty() {
println!();
println!("{}", style("Similarity Curve:").bold());
println!("{}", Theme::bold().render("Similarity Curve:"));
for pt in &response.similarity_curve {
let bar_len = ((pt.similarity.max(0.0)) * 30.0) as usize;
let bar: String = "#".repeat(bar_len);
let bar: String = "\u{2588}".repeat(bar_len);
println!(
" {:>3} {:.2} {} @{}",
pt.note_index, pt.similarity, bar, pt.author

@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::Theme;
use serde::Serialize;
use crate::Config;
@@ -96,16 +96,31 @@ pub async fn run_embed(
}
pub fn print_embed(result: &EmbedCommandResult) {
println!("{} Embedding complete", style("done").green().bold(),);
if result.docs_embedded == 0 && result.failed == 0 && result.skipped == 0 {
println!(
" Embedded: {} documents ({} chunks)",
result.docs_embedded, result.chunks_embedded
"\n {} nothing to embed",
Theme::success().bold().render("Embedding")
);
return;
}
println!(
"\n {} {} documents ({} chunks)",
Theme::success().bold().render("Embedded"),
Theme::bold().render(&result.docs_embedded.to_string()),
result.chunks_embedded
);
if result.failed > 0 {
println!(" Failed: {}", style(result.failed).red());
println!(
" {}",
Theme::error().render(&format!("{} failed", result.failed))
);
}
if result.skipped > 0 {
println!(" Skipped: {}", result.skipped);
println!(
" {}",
Theme::dim().render(&format!("{} skipped", result.skipped))
);
}
}

@@ -0,0 +1,334 @@
use serde::Serialize;
use crate::Config;
use crate::cli::render::{self, Icons, Theme};
use crate::core::db::create_connection;
use crate::core::error::Result;
use crate::core::file_history::resolve_rename_chain;
use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::ms_to_iso;
/// Maximum rename chain BFS depth.
const MAX_RENAME_HOPS: usize = 10;
/// A single MR that touched the file.
#[derive(Debug, Serialize)]
pub struct FileHistoryMr {
pub iid: i64,
pub title: String,
pub state: String,
pub author_username: String,
pub change_type: String,
pub merged_at_iso: Option<String>,
pub updated_at_iso: String,
pub merge_commit_sha: Option<String>,
pub web_url: Option<String>,
}
/// A DiffNote discussion snippet on the file.
#[derive(Debug, Serialize)]
pub struct FileDiscussion {
pub discussion_id: String,
pub author_username: String,
pub body_snippet: String,
pub path: String,
pub created_at_iso: String,
}
/// Full result of a file-history query.
#[derive(Debug, Serialize)]
pub struct FileHistoryResult {
pub path: String,
pub rename_chain: Vec<String>,
pub renames_followed: bool,
pub merge_requests: Vec<FileHistoryMr>,
pub discussions: Vec<FileDiscussion>,
pub total_mrs: usize,
pub paths_searched: usize,
}
/// Run the file-history query.
pub fn run_file_history(
config: &Config,
path: &str,
project: Option<&str>,
no_follow_renames: bool,
merged_only: bool,
include_discussions: bool,
limit: usize,
) -> Result<FileHistoryResult> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
let project_id = project.map(|p| resolve_project(&conn, p)).transpose()?;
// Resolve rename chain unless disabled
let (all_paths, renames_followed) = if no_follow_renames {
(vec![path.to_string()], false)
} else if let Some(pid) = project_id {
let chain = resolve_rename_chain(&conn, pid, path, MAX_RENAME_HOPS)?;
let followed = chain.len() > 1;
(chain, followed)
} else {
// Without a project scope, can't resolve renames (need project_id)
(vec![path.to_string()], false)
};
let paths_searched = all_paths.len();
// Build placeholders for IN clause
let placeholders: Vec<String> = (0..all_paths.len())
.map(|i| format!("?{}", i + 2))
.collect();
let in_clause = placeholders.join(", ");
let merged_filter = if merged_only {
" AND mr.state = 'merged'"
} else {
""
};
let project_filter = if project_id.is_some() {
"AND mfc.project_id = ?1"
} else {
""
};
let sql = format!(
"SELECT DISTINCT \
mr.iid, mr.title, mr.state, mr.author_username, \
mfc.change_type, mr.merged_at, mr.updated_at, mr.merge_commit_sha, mr.web_url \
FROM mr_file_changes mfc \
JOIN merge_requests mr ON mr.id = mfc.merge_request_id \
WHERE mfc.new_path IN ({in_clause}) {project_filter} {merged_filter} \
ORDER BY COALESCE(mr.merged_at, mr.updated_at) DESC \
LIMIT ?{}",
all_paths.len() + 2
);
let mut stmt = conn.prepare(&sql)?;
// Bind parameters: ?1 = project_id (or 0 placeholder), ?2..?N+1 = paths, ?N+2 = limit
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = Vec::new();
params.push(Box::new(project_id.unwrap_or(0)));
for p in &all_paths {
params.push(Box::new(p.clone()));
}
params.push(Box::new(limit as i64));
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let merge_requests: Vec<FileHistoryMr> = stmt
.query_map(param_refs.as_slice(), |row| {
let merged_at: Option<i64> = row.get(5)?;
let updated_at: i64 = row.get(6)?;
Ok(FileHistoryMr {
iid: row.get(0)?,
title: row.get(1)?,
state: row.get(2)?,
author_username: row.get(3)?,
change_type: row.get(4)?,
merged_at_iso: merged_at.map(ms_to_iso),
updated_at_iso: ms_to_iso(updated_at),
merge_commit_sha: row.get(7)?,
web_url: row.get(8)?,
})
})?
// Propagate row-mapping errors instead of silently dropping bad rows.
.collect::<std::result::Result<Vec<_>, _>>()?;
let total_mrs = merge_requests.len();
// Optionally fetch DiffNote discussions on this file
let discussions = if include_discussions && !merge_requests.is_empty() {
fetch_file_discussions(&conn, &all_paths, project_id)?
} else {
Vec::new()
};
Ok(FileHistoryResult {
path: path.to_string(),
rename_chain: all_paths,
renames_followed,
merge_requests,
discussions,
total_mrs,
paths_searched,
})
}
/// Fetch DiffNote discussions that reference the given file paths.
fn fetch_file_discussions(
conn: &rusqlite::Connection,
paths: &[String],
project_id: Option<i64>,
) -> Result<Vec<FileDiscussion>> {
let placeholders: Vec<String> = (0..paths.len()).map(|i| format!("?{}", i + 2)).collect();
let in_clause = placeholders.join(", ");
let project_filter = if project_id.is_some() {
"AND d.project_id = ?1"
} else {
""
};
let sql = format!(
"SELECT d.gitlab_discussion_id, n.author_username, n.body, n.position_new_path, n.created_at \
FROM notes n \
JOIN discussions d ON d.id = n.discussion_id \
WHERE n.position_new_path IN ({in_clause}) {project_filter} \
AND n.is_system = 0 \
ORDER BY n.created_at DESC \
LIMIT 50"
);
let mut stmt = conn.prepare(&sql)?;
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = Vec::new();
params.push(Box::new(project_id.unwrap_or(0)));
for p in paths {
params.push(Box::new(p.clone()));
}
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let discussions: Vec<FileDiscussion> = stmt
.query_map(param_refs.as_slice(), |row| {
let body: String = row.get(2)?;
let snippet = if body.len() > 200 {
format!("{}...", &body[..body.floor_char_boundary(200)])
} else {
body
};
let created_at: i64 = row.get(4)?;
Ok(FileDiscussion {
discussion_id: row.get(0)?,
author_username: row.get(1)?,
body_snippet: snippet,
path: row.get(3)?,
created_at_iso: ms_to_iso(created_at),
})
})?
// Propagate row-mapping errors instead of silently dropping bad rows.
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(discussions)
}
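
The snippet logic above cuts the note body at a UTF-8 character boundary so that slicing never panics in the middle of a code point. A minimal sketch of the same idea using only `str::is_char_boundary`, for toolchains where `floor_char_boundary` is unavailable (the `snippet` name and `max_bytes` parameter are hypothetical):

```rust
/// Truncate `body` to at most `max_bytes` bytes, backing up to the nearest
/// character boundary, and append "..." when anything was cut.
fn snippet(body: &str, max_bytes: usize) -> String {
    if body.len() <= max_bytes {
        return body.to_string();
    }
    // Walk back until the cut point sits on a char boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !body.is_char_boundary(end) {
        end -= 1;
    }
    format!("{}...", &body[..end])
}
```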
// ── Human output ────────────────────────────────────────────────────────────
pub fn print_file_history(result: &FileHistoryResult) {
// Header
let paths_info = if result.paths_searched > 1 {
format!(
" (via {} paths, {} MRs)",
result.paths_searched, result.total_mrs
)
} else {
format!(" ({} MRs)", result.total_mrs)
};
println!();
println!(
"{}",
Theme::bold().render(&format!("File History: {}{}", result.path, paths_info))
);
// Rename chain
if result.renames_followed && result.rename_chain.len() > 1 {
let chain_str: Vec<&str> = result.rename_chain.iter().map(String::as_str).collect();
println!(
" Rename chain: {}",
Theme::dim().render(&chain_str.join(" -> "))
);
}
if result.merge_requests.is_empty() {
println!(
"\n {} {}",
Icons::info(),
Theme::dim().render("No merge requests found touching this file.")
);
println!(
" {}",
Theme::dim().render("Hint: Run 'lore sync' to fetch MR file changes.")
);
println!();
return;
}
println!();
for mr in &result.merge_requests {
let (icon, state_style) = match mr.state.as_str() {
"merged" => (Icons::mr_merged(), Theme::accent()),
"opened" => (Icons::mr_opened(), Theme::success()),
"closed" => (Icons::mr_closed(), Theme::warning()),
_ => (Icons::mr_opened(), Theme::dim()),
};
// Prefer the merge date; fall back to the last update. Keep only YYYY-MM-DD.
let date = mr
.merged_at_iso
.as_deref()
.unwrap_or(mr.updated_at_iso.as_str())
.split('T')
.next()
.unwrap_or("");
println!(
" {} {} {} {} @{} {} {}",
icon,
Theme::accent().render(&format!("!{}", mr.iid)),
render::truncate(&mr.title, 50),
state_style.render(&mr.state),
mr.author_username,
date,
Theme::dim().render(&mr.change_type),
);
}
// Discussions
if !result.discussions.is_empty() {
println!(
"\n {} File discussions ({}):",
Icons::note(),
result.discussions.len()
);
for d in &result.discussions {
let date = d.created_at_iso.split('T').next().unwrap_or("");
println!(
" @{} ({}) [{}]: {}",
d.author_username,
date,
Theme::dim().render(&d.path),
d.body_snippet
);
}
}
println!();
}
// ── Robot (JSON) output ─────────────────────────────────────────────────────
pub fn print_file_history_json(result: &FileHistoryResult, elapsed_ms: u64) {
let output = serde_json::json!({
"ok": true,
"data": {
"path": result.path,
"rename_chain": if result.renames_followed { Some(&result.rename_chain) } else { None },
"merge_requests": result.merge_requests,
"discussions": if result.discussions.is_empty() { None } else { Some(&result.discussions) },
},
"meta": {
"elapsed_ms": elapsed_ms,
"total_mrs": result.total_mrs,
"renames_followed": result.renames_followed,
"paths_searched": result.paths_searched,
}
});
println!("{}", serde_json::to_string(&output).unwrap_or_default());
}
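
Both queries in this file reserve `?1` for the project id, number the path parameters from `?2`, and (in `run_file_history`) put the limit right after the last path. A minimal sketch of that numbering scheme as a pure function (the `numbered_in_clause` name is hypothetical and the sketch does not touch the database):

```rust
/// Build the placeholder list for `IN (...)` starting at ?2, and return the
/// index of the next free parameter (used for LIMIT in the query above).
fn numbered_in_clause(path_count: usize) -> (String, usize) {
    let in_clause = (0..path_count)
        .map(|i| format!("?{}", i + 2)) // ?1 is reserved for project_id
        .collect::<Vec<_>>()
        .join(", ");
    (in_clause, path_count + 2)
}
```

With three paths this yields `?2, ?3, ?4` for the `IN` clause and `5` as the limit's parameter index, which matches the binding order `project_id, paths..., limit`.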

@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::Theme;
use rusqlite::Connection;
use serde::Serialize;
use tracing::info;
@@ -39,6 +39,7 @@ pub fn run_generate_docs(
result.seeded += seed_dirty(&conn, SourceType::Issue, project_filter)?;
result.seeded += seed_dirty(&conn, SourceType::MergeRequest, project_filter)?;
result.seeded += seed_dirty(&conn, SourceType::Discussion, project_filter)?;
result.seeded += seed_dirty_notes(&conn, project_filter)?;
}
let regen =
@@ -67,6 +68,10 @@ fn seed_dirty(
SourceType::Issue => "issues",
SourceType::MergeRequest => "merge_requests",
SourceType::Discussion => "discussions",
SourceType::Note => {
// Notes are seeded by seed_dirty_notes, which applies the is_system filter.
unreachable!("Note seeding handled by seed_dirty_notes, not seed_dirty")
}
};
let type_str = source_type.as_str();
let now = chrono::Utc::now().timestamp_millis();
@@ -125,25 +130,95 @@ fn seed_dirty(
Ok(total_seeded)
}
fn seed_dirty_notes(conn: &Connection, project_filter: Option<&str>) -> Result<usize> {
let now = chrono::Utc::now().timestamp_millis();
let mut total_seeded: usize = 0;
let mut last_id: i64 = 0;
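// Resolve the project once, outside the paging loop.
let project_id = project_filter
.map(|project| resolve_project(conn, project))
.transpose()?;
loop {
let inserted = if let Some(project_id) = project_id {
conn.execute(
"INSERT INTO dirty_sources (source_type, source_id, queued_at, attempt_count, last_attempt_at, last_error, next_attempt_at)
SELECT 'note', id, ?1, 0, NULL, NULL, NULL
FROM notes WHERE id > ?2 AND project_id = ?3 AND is_system = 0 ORDER BY id LIMIT ?4
ON CONFLICT(source_type, source_id) DO NOTHING",
rusqlite::params![now, last_id, project_id, FULL_MODE_CHUNK_SIZE],
)?
} else {
conn.execute(
"INSERT INTO dirty_sources (source_type, source_id, queued_at, attempt_count, last_attempt_at, last_error, next_attempt_at)
SELECT 'note', id, ?1, 0, NULL, NULL, NULL
FROM notes WHERE id > ?2 AND is_system = 0 ORDER BY id LIMIT ?3
ON CONFLICT(source_type, source_id) DO NOTHING",
rusqlite::params![now, last_id, FULL_MODE_CHUNK_SIZE],
)?
};
if inserted == 0 {
break;
}
// Advance the cursor over the same filtered set the INSERT selected from.
// An unfiltered cursor can lag behind the chunk, so a later pass inserts
// nothing new and the loop ends before all project notes are seeded.
let max_id: i64 = if let Some(project_id) = project_id {
conn.query_row(
"SELECT MAX(id) FROM (SELECT id FROM notes WHERE id > ?1 AND project_id = ?2 AND is_system = 0 ORDER BY id LIMIT ?3)",
rusqlite::params![last_id, project_id, FULL_MODE_CHUNK_SIZE],
|row| row.get(0),
)?
} else {
conn.query_row(
"SELECT MAX(id) FROM (SELECT id FROM notes WHERE id > ?1 AND is_system = 0 ORDER BY id LIMIT ?2)",
rusqlite::params![last_id, FULL_MODE_CHUNK_SIZE],
|row| row.get(0),
)?
};
total_seeded += inserted;
last_id = max_id;
}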
info!(
source_type = "note",
seeded = total_seeded,
"Seeded dirty_sources"
);
Ok(total_seeded)
}
pub fn print_generate_docs(result: &GenerateDocsResult) {
let mode = if result.full_mode {
"full"
} else {
"incremental"
};
if result.regenerated == 0 && result.errored == 0 {
println!(
"{} Document generation complete ({})",
style("done").green().bold(),
"\n {} no documents to update ({})",
Theme::success().bold().render("Docs"),
mode
);
return;
}
// Headline
println!(
"\n {} {} documents ({})",
Theme::success().bold().render("Generated"),
Theme::bold().render(&result.regenerated.to_string()),
mode
);
if result.full_mode {
println!(" Seeded: {}", result.seeded);
// Detail line: compact middle-dot format, zero-suppressed
let mut details: Vec<String> = Vec::new();
if result.full_mode && result.seeded > 0 {
details.push(format!("{} seeded", result.seeded));
}
if result.unchanged > 0 {
details.push(format!("{} unchanged", result.unchanged));
}
if !details.is_empty() {
println!(" {}", Theme::dim().render(&details.join(" \u{b7} ")));
}
println!(" Regenerated: {}", result.regenerated);
println!(" Unchanged: {}", result.unchanged);
if result.errored > 0 {
println!(" Errored: {}", style(result.errored).red());
println!(
" {}",
Theme::error().render(&format!("{} errored", result.errored))
);
}
}
@@ -186,3 +261,81 @@ pub fn print_generate_docs_json(result: &GenerateDocsResult, elapsed_ms: u64) {
};
println!("{}", serde_json::to_string(&output).unwrap());
}
#[cfg(test)]
mod tests {
use std::path::Path;
use crate::core::db::{create_connection, run_migrations};
use super::*;
fn setup_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url) VALUES (1, 100, 'group/project', 'https://gitlab.com/group/project')",
[],
).unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 1, 'Test', 'opened', 1000, 2000, 3000)",
[],
).unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
[],
).unwrap();
conn
}
fn insert_note(conn: &Connection, id: i64, gitlab_id: i64, is_system: bool) {
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (?1, ?2, 1, 1, 'alice', 'note body', 1000, 2000, 3000, ?3)",
rusqlite::params![id, gitlab_id, is_system as i32],
).unwrap();
}
#[test]
fn test_full_seed_includes_notes() {
let conn = setup_db();
insert_note(&conn, 1, 101, false);
insert_note(&conn, 2, 102, false);
insert_note(&conn, 3, 103, false);
insert_note(&conn, 4, 104, true); // system note — should be excluded
let seeded = seed_dirty_notes(&conn, None).unwrap();
assert_eq!(seeded, 3);
let count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(count, 3);
}
#[test]
fn test_note_document_count_stable_after_second_generate_docs_full() {
let conn = setup_db();
insert_note(&conn, 1, 101, false);
insert_note(&conn, 2, 102, false);
let first = seed_dirty_notes(&conn, None).unwrap();
assert_eq!(first, 2);
// Second run should be idempotent (ON CONFLICT DO NOTHING)
let second = seed_dirty_notes(&conn, None).unwrap();
assert_eq!(second, 0);
let count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(count, 2);
}
}
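
`seed_dirty_notes` pages through the notes table with a keyset cursor (`id > last_id ... ORDER BY id LIMIT n`). The cursor only works if it is advanced from the same filtered set that each chunk is drawn from. A minimal in-memory sketch of that loop, with a slice standing in for the table (the `Note` alias, `next_chunk`, and `seed_all` are hypothetical names, not part of the codebase):

```rust
/// Hypothetical stand-in for a notes row: (id, project_id, is_system).
type Note = (i64, i64, bool);

/// Return up to `chunk` eligible note ids with id > last_id, ascending.
/// Eligible means: not a system note, and in `project` when one is given.
fn next_chunk(notes: &[Note], last_id: i64, project: Option<i64>, chunk: usize) -> Vec<i64> {
    let mut ids: Vec<i64> = notes
        .iter()
        .filter(|(id, pid, sys)| *id > last_id && !*sys && project.map_or(true, |p| p == *pid))
        .map(|(id, _, _)| *id)
        .collect();
    ids.sort_unstable();
    ids.truncate(chunk);
    ids
}

/// Collect every eligible id, one chunk at a time.
fn seed_all(notes: &[Note], project: Option<i64>, chunk: usize) -> Vec<i64> {
    let mut seeded = Vec::new();
    let mut last_id = 0;
    loop {
        let ids = next_chunk(notes, last_id, project, chunk);
        // The cursor is the largest id of the chunk just processed,
        // taken from the filtered chunk itself.
        let Some(&max_id) = ids.last() else { break };
        seeded.extend(ids);
        last_id = max_id;
    }
    seeded
}
```

Here the loop stops when a chunk comes back empty rather than when an insert reports zero new rows, which keeps termination independent of what was already queued.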

@@ -1,7 +1,7 @@
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use console::style;
use crate::cli::render::Theme;
use indicatif::{ProgressBar, ProgressStyle};
use rusqlite::Connection;
use serde::Serialize;
@@ -46,10 +46,27 @@ pub struct IngestResult {
pub mr_diffs_failed: usize,
pub status_enrichment_errors: usize,
pub status_enrichment_projects: Vec<ProjectStatusEnrichment>,
pub project_summaries: Vec<ProjectSummary>,
}
/// Per-project summary for display in stage completion sub-rows.
#[derive(Debug, Default)]
pub struct ProjectSummary {
pub path: String,
pub items_upserted: usize,
pub discussions_synced: usize,
pub events_fetched: usize,
pub events_failed: usize,
pub statuses_enriched: usize,
pub statuses_seen: usize,
pub status_errors: usize,
pub mr_diffs_fetched: usize,
pub mr_diffs_failed: usize,
}
/// Per-project status enrichment result, collected during ingestion.
pub struct ProjectStatusEnrichment {
pub path: String,
pub mode: String,
pub reason: Option<String>,
pub seen: usize,
@@ -293,7 +310,7 @@ async fn run_ingest_inner(
if display.show_text {
println!(
"{}",
style("Full sync: resetting cursors to fetch all data...").yellow()
Theme::warning().render("Full sync: resetting cursors to fetch all data...")
);
}
for (local_project_id, _, path) in &projects {
@@ -341,7 +358,10 @@ async fn run_ingest_inner(
"merge requests"
};
if display.show_text {
println!("{}", style(format!("Ingesting {type_label}...")).blue());
println!(
"{}",
Theme::info().render(&format!("Ingesting {type_label}..."))
);
println!();
}
@@ -385,11 +405,11 @@ async fn run_ingest_inner(
let s = multi.add(ProgressBar::new_spinner());
s.set_style(
ProgressStyle::default_spinner()
.template("{spinner:.blue} {msg}")
.template("{spinner:.cyan} {msg}")
.unwrap(),
);
s.set_message(format!("Fetching {type_label} from {path}..."));
s.enable_steady_tick(std::time::Duration::from_millis(100));
s.enable_steady_tick(std::time::Duration::from_millis(60));
s
};
@@ -400,12 +420,13 @@ async fn run_ingest_inner(
b.set_style(
ProgressStyle::default_bar()
.template(
" {spinner:.blue} {prefix:.cyan} Syncing discussions [{bar:30.cyan/dim}] {pos}/{len}",
" {spinner:.dim} {prefix:.cyan} Syncing discussions [{bar:30.cyan/dark_gray}] {pos}/{len} {per_sec:.dim} {eta:.dim}",
)
.unwrap()
.progress_chars("=> "),
.progress_chars(crate::cli::render::Icons::progress_chars()),
);
b.set_prefix(path.clone());
b.enable_steady_tick(std::time::Duration::from_millis(60));
b
};
@@ -442,7 +463,7 @@ async fn run_ingest_inner(
spinner_clone.finish_and_clear();
let agg_total = agg_disc_total_clone.fetch_add(total, Ordering::Relaxed) + total;
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(format!(
"Syncing discussions... (0/{agg_total})"
));
@@ -462,7 +483,7 @@ async fn run_ingest_inner(
spinner_clone.finish_and_clear();
let agg_total = agg_disc_total_clone.fetch_add(total, Ordering::Relaxed) + total;
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(format!(
"Syncing discussions... (0/{agg_total})"
));
@@ -483,11 +504,11 @@ async fn run_ingest_inner(
disc_bar_clone.set_length(total as u64);
disc_bar_clone.set_style(
ProgressStyle::default_bar()
.template(" {spinner:.blue} {prefix:.cyan} Fetching resource events [{bar:30.cyan/dim}] {pos}/{len}")
.template(" {spinner:.dim} {prefix:.cyan} Fetching resource events [{bar:30.cyan/dark_gray}] {pos}/{len} {per_sec:.dim} {eta:.dim}")
.unwrap()
.progress_chars("=> "),
.progress_chars(crate::cli::render::Icons::progress_chars()),
);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
agg_events_total_clone.fetch_add(total, Ordering::Relaxed);
stage_bar_clone.set_message(
"Fetching resource events...".to_string()
@@ -507,7 +528,7 @@ async fn run_ingest_inner(
ProgressEvent::ClosesIssuesFetchStarted { total } => {
disc_bar_clone.reset();
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(
"Fetching closes-issues references...".to_string()
);
@@ -521,7 +542,7 @@ async fn run_ingest_inner(
ProgressEvent::MrDiffsFetchStarted { total } => {
disc_bar_clone.reset();
disc_bar_clone.set_length(total as u64);
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(
"Fetching MR file changes...".to_string()
);
@@ -532,35 +553,37 @@ async fn run_ingest_inner(
ProgressEvent::MrDiffsFetchComplete { .. } => {
disc_bar_clone.finish_and_clear();
}
ProgressEvent::StatusEnrichmentStarted => {
spinner_clone.set_message(format!(
"{path_for_cb}: Enriching work item statuses..."
));
ProgressEvent::StatusEnrichmentStarted { total } => {
spinner_clone.finish_and_clear();
disc_bar_clone.reset();
disc_bar_clone.set_length(total as u64);
disc_bar_clone.set_style(
ProgressStyle::default_bar()
.template(" {spinner:.dim} {prefix:.cyan} Statuses [{bar:30.cyan/dark_gray}] {pos}/{len} {per_sec:.dim} {eta:.dim}")
.unwrap()
.progress_chars(crate::cli::render::Icons::progress_chars()),
);
disc_bar_clone.set_prefix(path_for_cb.clone());
disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(60));
stage_bar_clone.set_message(
"Enriching work item statuses...".to_string()
);
}
ProgressEvent::StatusEnrichmentPageFetched { items_so_far } => {
spinner_clone.set_message(format!(
"{path_for_cb}: Fetching statuses... ({items_so_far} work items)"
));
disc_bar_clone.set_position(items_so_far as u64);
stage_bar_clone.set_message(format!(
"Enriching work item statuses... ({items_so_far} fetched)"
));
}
ProgressEvent::StatusEnrichmentWriting { total } => {
spinner_clone.set_message(format!(
"{path_for_cb}: Writing {total} statuses..."
));
disc_bar_clone.set_message(format!("Writing {total} statuses..."));
stage_bar_clone.set_message(format!(
"Writing {total} work item statuses..."
));
}
ProgressEvent::StatusEnrichmentComplete { enriched, cleared } => {
disc_bar_clone.finish_and_clear();
if enriched > 0 || cleared > 0 {
spinner_clone.set_message(format!(
"{path_for_cb}: {enriched} statuses enriched, {cleared} cleared"
));
stage_bar_clone.set_message(format!(
"Status enrichment: {enriched} enriched, {cleared} cleared"
));
@@ -643,6 +666,7 @@ async fn run_ingest_inner(
total
.status_enrichment_projects
.push(ProjectStatusEnrichment {
path: path.clone(),
mode: result.status_enrichment_mode.clone(),
reason: result.status_unsupported_reason.clone(),
seen: result.statuses_seen,
@@ -653,6 +677,19 @@ async fn run_ingest_inner(
first_partial_error: result.first_partial_error.clone(),
error: result.status_enrichment_error.clone(),
});
total.project_summaries.push(ProjectSummary {
path: path.clone(),
items_upserted: result.issues_upserted,
discussions_synced: result.discussions_fetched,
events_fetched: result.resource_events_fetched,
events_failed: result.resource_events_failed,
statuses_enriched: result.statuses_enriched,
statuses_seen: result.statuses_seen,
status_errors: result.partial_error_count
+ usize::from(result.status_enrichment_error.is_some()),
mr_diffs_fetched: 0,
mr_diffs_failed: 0,
});
}
Ok(ProjectIngestOutcome::Mrs {
ref path,
@@ -676,6 +713,18 @@ async fn run_ingest_inner(
total.resource_events_failed += result.resource_events_failed;
total.mr_diffs_fetched += result.mr_diffs_fetched;
total.mr_diffs_failed += result.mr_diffs_failed;
total.project_summaries.push(ProjectSummary {
path: path.clone(),
items_upserted: result.mrs_upserted,
discussions_synced: result.discussions_fetched,
events_fetched: result.resource_events_fetched,
events_failed: result.resource_events_failed,
statuses_enriched: 0,
statuses_seen: 0,
status_errors: 0,
mr_diffs_fetched: result.mr_diffs_fetched,
mr_diffs_failed: result.mr_diffs_failed,
});
}
}
}
@@ -746,7 +795,7 @@ fn print_issue_project_summary(path: &str, result: &IngestProjectResult) {
println!(
" {}: {} issues fetched{}",
style(path).cyan(),
Theme::info().render(path),
result.issues_upserted,
labels_str
);
@@ -761,7 +810,7 @@ fn print_issue_project_summary(path: &str, result: &IngestProjectResult) {
if result.issues_skipped_discussion_sync > 0 {
println!(
" {} unchanged issues (discussion sync skipped)",
style(result.issues_skipped_discussion_sync).dim()
Theme::dim().render(&result.issues_skipped_discussion_sync.to_string())
);
}
}
@@ -784,7 +833,7 @@ fn print_mr_project_summary(path: &str, result: &IngestMrProjectResult) {
println!(
" {}: {} MRs fetched{}{}",
style(path).cyan(),
Theme::info().render(path),
result.mrs_upserted,
labels_str,
assignees_str
@@ -808,7 +857,7 @@ fn print_mr_project_summary(path: &str, result: &IngestMrProjectResult) {
if result.mrs_skipped_discussion_sync > 0 {
println!(
" {} unchanged MRs (discussion sync skipped)",
style(result.mrs_skipped_discussion_sync).dim()
Theme::dim().render(&result.mrs_skipped_discussion_sync.to_string())
);
}
}
@@ -942,21 +991,19 @@ pub fn print_ingest_summary(result: &IngestResult) {
if result.resource_type == "issues" {
println!(
"{}",
style(format!(
Theme::success().render(&format!(
"Total: {} issues, {} discussions, {} notes",
result.issues_upserted, result.discussions_fetched, result.notes_upserted
))
.green()
);
if result.issues_skipped_discussion_sync > 0 {
println!(
"{}",
style(format!(
Theme::dim().render(&format!(
"Skipped discussion sync for {} unchanged issues.",
result.issues_skipped_discussion_sync
))
.dim()
);
}
} else {
@@ -968,24 +1015,22 @@ pub fn print_ingest_summary(result: &IngestResult) {
println!(
"{}",
style(format!(
Theme::success().render(&format!(
"Total: {} MRs, {} discussions, {} notes{}",
result.mrs_upserted,
result.discussions_fetched,
result.notes_upserted,
diffnotes_str
))
.green()
);
if result.mrs_skipped_discussion_sync > 0 {
println!(
"{}",
style(format!(
Theme::dim().render(&format!(
"Skipped discussion sync for {} unchanged MRs.",
result.mrs_skipped_discussion_sync
))
.dim()
);
}
}
@@ -1006,8 +1051,8 @@ pub fn print_ingest_summary(result: &IngestResult) {
pub fn print_dry_run_preview(preview: &DryRunPreview) {
println!(
"{} {}",
style("Dry Run Preview").cyan().bold(),
style("(no changes will be made)").yellow()
Theme::info().bold().render("Dry Run Preview"),
Theme::warning().render("(no changes will be made)")
);
println!();
@@ -1017,27 +1062,31 @@ pub fn print_dry_run_preview(preview: &DryRunPreview) {
"merge requests"
};
println!(" Resource type: {}", style(type_label).white().bold());
println!(" Resource type: {}", Theme::bold().render(type_label));
println!(
" Sync mode: {}",
if preview.sync_mode == "full" {
style("full (all data will be re-fetched)").yellow()
Theme::warning().render("full (all data will be re-fetched)")
} else {
style("incremental (only changes since last sync)").green()
Theme::success().render("incremental (only changes since last sync)")
}
);
println!(" Projects: {}", preview.projects.len());
println!();
println!("{}", style("Projects to sync:").cyan().bold());
println!("{}", Theme::info().bold().render("Projects to sync:"));
for project in &preview.projects {
let sync_status = if !project.has_cursor {
style("initial sync").yellow()
Theme::warning().render("initial sync")
} else {
style("incremental").green()
Theme::success().render("incremental")
};
println!(" {} ({})", style(&project.path).white(), sync_status);
println!(
" {} ({})",
Theme::bold().render(&project.path),
sync_status
);
println!(" Existing {}: {}", type_label, project.existing_count);
if let Some(ref last_synced) = project.last_synced {


@@ -1,4 +1,4 @@
use comfy_table::{Attribute, Cell, Color, ContentArrangement, Table};
use crate::cli::render::{self, Align, Icons, StyledCell, Table as LoreTable, Theme};
use rusqlite::Connection;
use serde::Serialize;
@@ -6,41 +6,10 @@ use crate::Config;
use crate::cli::robot::{RobotMeta, expand_fields_preset, filter_fields};
use crate::core::db::create_connection;
use crate::core::error::{LoreError, Result};
use crate::core::path_resolver::escape_like as note_escape_like;
use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::{ms_to_iso, now_ms, parse_since};
fn colored_cell(content: impl std::fmt::Display, color: Color) -> Cell {
let cell = Cell::new(content);
if console::colors_enabled() {
cell.fg(color)
} else {
cell
}
}
fn colored_cell_hex(content: &str, hex: Option<&str>) -> Cell {
if !console::colors_enabled() {
return Cell::new(content);
}
let Some(hex) = hex else {
return Cell::new(content);
};
let hex = hex.trim_start_matches('#');
if hex.len() != 6 {
return Cell::new(content);
}
let Ok(r) = u8::from_str_radix(&hex[0..2], 16) else {
return Cell::new(content);
};
let Ok(g) = u8::from_str_radix(&hex[2..4], 16) else {
return Cell::new(content);
};
let Ok(b) = u8::from_str_radix(&hex[4..6], 16) else {
return Cell::new(content);
};
Cell::new(content).fg(Color::Rgb { r, g, b })
}
use crate::core::time::{ms_to_iso, parse_since};
#[derive(Debug, Serialize)]
pub struct IssueListRow {
@@ -668,60 +637,6 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
Ok(MrListResult { mrs, total_count })
}
fn format_relative_time(ms_epoch: i64) -> String {
let now = now_ms();
let diff = now - ms_epoch;
if diff < 0 {
return "in the future".to_string();
}
match diff {
d if d < 60_000 => "just now".to_string(),
d if d < 3_600_000 => format!("{} min ago", d / 60_000),
d if d < 86_400_000 => {
let n = d / 3_600_000;
format!("{n} {} ago", if n == 1 { "hour" } else { "hours" })
}
d if d < 604_800_000 => {
let n = d / 86_400_000;
format!("{n} {} ago", if n == 1 { "day" } else { "days" })
}
d if d < 2_592_000_000 => {
let n = d / 604_800_000;
format!("{n} {} ago", if n == 1 { "week" } else { "weeks" })
}
_ => {
let n = diff / 2_592_000_000;
format!("{n} {} ago", if n == 1 { "month" } else { "months" })
}
}
}
fn truncate_with_ellipsis(s: &str, max_width: usize) -> String {
if s.chars().count() <= max_width {
s.to_string()
} else {
let truncated: String = s.chars().take(max_width.saturating_sub(3)).collect();
format!("{truncated}...")
}
}
fn format_labels(labels: &[String], max_shown: usize) -> String {
if labels.is_empty() {
return String::new();
}
let shown: Vec<&str> = labels.iter().take(max_shown).map(|s| s.as_str()).collect();
let overflow = labels.len().saturating_sub(max_shown);
if overflow > 0 {
format!("[{} +{}]", shown.join(", "), overflow)
} else {
format!("[{}]", shown.join(", "))
}
}
fn format_assignees(assignees: &[String]) -> String {
if assignees.is_empty() {
return "-".to_string();
@@ -731,7 +646,7 @@ fn format_assignees(assignees: &[String]) -> String {
let shown: Vec<String> = assignees
.iter()
.take(max_shown)
.map(|s| format!("@{}", truncate_with_ellipsis(s, 10)))
.map(|s| format!("@{}", render::truncate(s, 10)))
.collect();
let overflow = assignees.len().saturating_sub(max_shown);
@@ -742,21 +657,23 @@ fn format_assignees(assignees: &[String]) -> String {
}
}
fn format_discussions(total: i64, unresolved: i64) -> String {
fn format_discussions(total: i64, unresolved: i64) -> StyledCell {
if total == 0 {
return String::new();
return StyledCell::plain(String::new());
}
if unresolved > 0 {
format!("{total}/{unresolved}!")
let text = format!("{total}/");
let warn = Theme::warning().render(&format!("{unresolved}!"));
StyledCell::plain(format!("{text}{warn}"))
} else {
format!("{total}")
StyledCell::plain(format!("{total}"))
}
}
fn format_branches(target: &str, source: &str, max_width: usize) -> String {
let full = format!("{} <- {}", target, source);
truncate_with_ellipsis(&full, max_width)
render::truncate(&full, max_width)
}
pub fn print_list_issues(result: &ListResult) {
@@ -766,71 +683,64 @@ pub fn print_list_issues(result: &ListResult) {
}
println!(
"Issues (showing {} of {})\n",
"{} {} of {}\n",
Theme::bold().render("Issues"),
result.issues.len(),
result.total_count
);
let has_any_status = result.issues.iter().any(|i| i.status_name.is_some());
let mut header = vec![
Cell::new("IID").add_attribute(Attribute::Bold),
Cell::new("Title").add_attribute(Attribute::Bold),
Cell::new("State").add_attribute(Attribute::Bold),
];
let mut headers = vec!["IID", "Title", "State"];
if has_any_status {
header.push(Cell::new("Status").add_attribute(Attribute::Bold));
headers.push("Status");
}
header.extend([
Cell::new("Assignee").add_attribute(Attribute::Bold),
Cell::new("Labels").add_attribute(Attribute::Bold),
Cell::new("Disc").add_attribute(Attribute::Bold),
Cell::new("Updated").add_attribute(Attribute::Bold),
]);
headers.extend(["Assignee", "Labels", "Disc", "Updated"]);
let mut table = Table::new();
table
.set_content_arrangement(ContentArrangement::Dynamic)
.set_header(header);
let mut table = LoreTable::new().headers(&headers).align(0, Align::Right);
for issue in &result.issues {
let title = truncate_with_ellipsis(&issue.title, 45);
let relative_time = format_relative_time(issue.updated_at);
let labels = format_labels(&issue.labels, 2);
let title = render::truncate(&issue.title, 45);
let relative_time = render::format_relative_time_compact(issue.updated_at);
let labels = render::format_labels_bare(&issue.labels, 2);
let assignee = format_assignees(&issue.assignees);
let discussions = format_discussions(issue.discussion_count, issue.unresolved_count);
let state_cell = if issue.state == "opened" {
colored_cell(&issue.state, Color::Green)
let (icon, state_style) = if issue.state == "opened" {
(Icons::issue_opened(), Theme::success())
} else {
colored_cell(&issue.state, Color::DarkGrey)
(Icons::issue_closed(), Theme::dim())
};
let state_cell = StyledCell::styled(format!("{icon} {}", issue.state), state_style);
let mut row = vec![
colored_cell(format!("#{}", issue.iid), Color::Cyan),
Cell::new(title),
StyledCell::styled(format!("#{}", issue.iid), Theme::info()),
StyledCell::plain(title),
state_cell,
];
if has_any_status {
match &issue.status_name {
Some(status) => {
row.push(colored_cell_hex(status, issue.status_color.as_deref()));
row.push(StyledCell::plain(render::style_with_hex(
status,
issue.status_color.as_deref(),
)));
}
None => {
row.push(Cell::new(""));
row.push(StyledCell::plain(""));
}
}
}
row.extend([
colored_cell(assignee, Color::Magenta),
colored_cell(labels, Color::Yellow),
Cell::new(discussions),
colored_cell(relative_time, Color::DarkGrey),
StyledCell::styled(assignee, Theme::accent()),
StyledCell::styled(labels, Theme::warning()),
discussions,
StyledCell::styled(relative_time, Theme::dim()),
]);
table.add_row(row);
}
println!("{table}");
println!("{}", table.render());
}
pub fn print_list_issues_json(result: &ListResult, elapsed_ms: u64, fields: Option<&[String]>) {
@@ -877,58 +787,53 @@ pub fn print_list_mrs(result: &MrListResult) {
}
println!(
"Merge Requests (showing {} of {})\n",
"{} {} of {}\n",
Theme::bold().render("Merge Requests"),
result.mrs.len(),
result.total_count
);
let mut table = Table::new();
table
.set_content_arrangement(ContentArrangement::Dynamic)
.set_header(vec![
Cell::new("IID").add_attribute(Attribute::Bold),
Cell::new("Title").add_attribute(Attribute::Bold),
Cell::new("State").add_attribute(Attribute::Bold),
Cell::new("Author").add_attribute(Attribute::Bold),
Cell::new("Branches").add_attribute(Attribute::Bold),
Cell::new("Disc").add_attribute(Attribute::Bold),
Cell::new("Updated").add_attribute(Attribute::Bold),
]);
let mut table = LoreTable::new()
.headers(&[
"IID", "Title", "State", "Author", "Branches", "Disc", "Updated",
])
.align(0, Align::Right);
for mr in &result.mrs {
let title = if mr.draft {
format!("[DRAFT] {}", truncate_with_ellipsis(&mr.title, 38))
format!("{} {}", Icons::mr_draft(), render::truncate(&mr.title, 42))
} else {
truncate_with_ellipsis(&mr.title, 45)
render::truncate(&mr.title, 45)
};
let relative_time = format_relative_time(mr.updated_at);
let relative_time = render::format_relative_time_compact(mr.updated_at);
let branches = format_branches(&mr.target_branch, &mr.source_branch, 25);
let discussions = format_discussions(mr.discussion_count, mr.unresolved_count);
let state_cell = match mr.state.as_str() {
"opened" => colored_cell(&mr.state, Color::Green),
"merged" => colored_cell(&mr.state, Color::Magenta),
"closed" => colored_cell(&mr.state, Color::Red),
"locked" => colored_cell(&mr.state, Color::Yellow),
_ => colored_cell(&mr.state, Color::DarkGrey),
let (icon, style) = match mr.state.as_str() {
"opened" => (Icons::mr_opened(), Theme::success()),
"merged" => (Icons::mr_merged(), Theme::accent()),
"closed" => (Icons::mr_closed(), Theme::error()),
"locked" => (Icons::mr_opened(), Theme::warning()),
_ => (Icons::mr_opened(), Theme::dim()),
};
let state_cell = StyledCell::styled(format!("{icon} {}", mr.state), style);
table.add_row(vec![
colored_cell(format!("!{}", mr.iid), Color::Cyan),
Cell::new(title),
StyledCell::styled(format!("!{}", mr.iid), Theme::info()),
StyledCell::plain(title),
state_cell,
colored_cell(
format!("@{}", truncate_with_ellipsis(&mr.author_username, 12)),
Color::Magenta,
StyledCell::styled(
format!("@{}", render::truncate(&mr.author_username, 12)),
Theme::accent(),
),
colored_cell(branches, Color::Blue),
Cell::new(discussions),
colored_cell(relative_time, Color::DarkGrey),
StyledCell::styled(branches, Theme::info()),
discussions,
StyledCell::styled(relative_time, Theme::dim()),
]);
}
println!("{table}");
println!("{}", table.render());
}
pub fn print_list_mrs_json(result: &MrListResult, elapsed_ms: u64, fields: Option<&[String]>) {
@@ -966,77 +871,566 @@ pub fn open_mr_in_browser(result: &MrListResult) -> Option<String> {
}
}
#[cfg(test)]
mod tests {
use super::*;
// ---------------------------------------------------------------------------
// Note output formatting
// ---------------------------------------------------------------------------
#[test]
fn truncate_leaves_short_strings_alone() {
assert_eq!(truncate_with_ellipsis("short", 10), "short");
}
#[test]
fn truncate_adds_ellipsis_to_long_strings() {
assert_eq!(
truncate_with_ellipsis("this is a very long title", 15),
"this is a ve..."
);
}
#[test]
fn truncate_handles_exact_length() {
assert_eq!(truncate_with_ellipsis("exactly10!", 10), "exactly10!");
}
#[test]
fn relative_time_formats_correctly() {
let now = now_ms();
assert_eq!(format_relative_time(now - 30_000), "just now");
assert_eq!(format_relative_time(now - 120_000), "2 min ago");
assert_eq!(format_relative_time(now - 7_200_000), "2 hours ago");
assert_eq!(format_relative_time(now - 172_800_000), "2 days ago");
}
#[test]
fn format_labels_empty() {
assert_eq!(format_labels(&[], 2), "");
}
#[test]
fn format_labels_single() {
assert_eq!(format_labels(&["bug".to_string()], 2), "[bug]");
}
#[test]
fn format_labels_multiple() {
let labels = vec!["bug".to_string(), "urgent".to_string()];
assert_eq!(format_labels(&labels, 2), "[bug, urgent]");
}
#[test]
fn format_labels_overflow() {
let labels = vec![
"bug".to_string(),
"urgent".to_string(),
"wip".to_string(),
"blocked".to_string(),
];
assert_eq!(format_labels(&labels, 2), "[bug, urgent +2]");
}
#[test]
fn format_discussions_empty() {
assert_eq!(format_discussions(0, 0), "");
}
#[test]
fn format_discussions_no_unresolved() {
assert_eq!(format_discussions(5, 0), "5");
}
#[test]
fn format_discussions_with_unresolved() {
assert_eq!(format_discussions(5, 2), "5/2!");
fn truncate_body(body: &str, max_len: usize) -> String {
if body.chars().count() <= max_len {
body.to_string()
} else {
let truncated: String = body.chars().take(max_len).collect();
format!("{truncated}...")
}
}
fn format_note_type(note_type: Option<&str>) -> &str {
match note_type {
Some("DiffNote") => "Diff",
Some("DiscussionNote") => "Disc",
_ => "-",
}
}
fn format_note_path(path: Option<&str>, line: Option<i64>) -> String {
match (path, line) {
(Some(p), Some(l)) => format!("{p}:{l}"),
(Some(p), None) => p.to_string(),
_ => "-".to_string(),
}
}
fn format_note_parent(noteable_type: Option<&str>, parent_iid: Option<i64>) -> String {
match (noteable_type, parent_iid) {
(Some("Issue"), Some(iid)) => format!("Issue #{iid}"),
(Some("MergeRequest"), Some(iid)) => format!("MR !{iid}"),
_ => "-".to_string(),
}
}
pub fn print_list_notes(result: &NoteListResult) {
if result.notes.is_empty() {
println!("No notes found.");
return;
}
println!(
"{} {} of {}\n",
Theme::bold().render("Notes"),
result.notes.len(),
result.total_count
);
let mut table = LoreTable::new()
.headers(&[
"ID",
"Author",
"Type",
"Body",
"Path:Line",
"Parent",
"Created",
])
.align(0, Align::Right);
for note in &result.notes {
let body = note
.body
.as_deref()
.map(|b| truncate_body(b, 60))
.unwrap_or_default();
let path = format_note_path(note.position_new_path.as_deref(), note.position_new_line);
let parent = format_note_parent(note.noteable_type.as_deref(), note.parent_iid);
let relative_time = render::format_relative_time_compact(note.created_at);
let note_type = format_note_type(note.note_type.as_deref());
table.add_row(vec![
StyledCell::styled(note.gitlab_id.to_string(), Theme::info()),
StyledCell::styled(
format!("@{}", render::truncate(&note.author_username, 12)),
Theme::accent(),
),
StyledCell::plain(note_type),
StyledCell::plain(body),
StyledCell::plain(path),
StyledCell::plain(parent),
StyledCell::styled(relative_time, Theme::dim()),
]);
}
println!("{}", table.render());
}
pub fn print_list_notes_json(result: &NoteListResult, elapsed_ms: u64, fields: Option<&[String]>) {
let json_result = NoteListResultJson::from(result);
let meta = RobotMeta { elapsed_ms };
let output = serde_json::json!({
"ok": true,
"data": json_result,
"meta": meta,
});
let mut output = output;
if let Some(f) = fields {
let expanded = expand_fields_preset(f, "notes");
filter_fields(&mut output, "notes", &expanded);
}
match serde_json::to_string(&output) {
Ok(json) => println!("{json}"),
Err(e) => eprintln!("Error serializing to JSON: {e}"),
}
}
pub fn print_list_notes_jsonl(result: &NoteListResult) {
for note in &result.notes {
let json_row = NoteListRowJson::from(note);
match serde_json::to_string(&json_row) {
Ok(json) => println!("{json}"),
Err(e) => eprintln!("Error serializing to JSON: {e}"),
}
}
}
/// Escape a field for RFC 4180 CSV: quote fields containing commas, quotes, or newlines.
fn csv_escape(field: &str) -> String {
if field.contains(',') || field.contains('"') || field.contains('\n') || field.contains('\r') {
let escaped = field.replace('"', "\"\"");
format!("\"{escaped}\"")
} else {
field.to_string()
}
}
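To illustrate the quoting rule that `csv_escape` applies, the sketch below copies the function as it appears in this diff and runs it on a few sample fields. The sample strings and the `main` wrapper are illustrative assumptions and are not part of the commit.

```rust
/// Copy of `csv_escape` from the diff: wrap the field in double quotes when it
/// contains a comma, a double quote, CR, or LF, and double any embedded quotes.
fn csv_escape(field: &str) -> String {
    if field.contains(',') || field.contains('"') || field.contains('\n') || field.contains('\r') {
        let escaped = field.replace('"', "\"\"");
        format!("\"{escaped}\"")
    } else {
        field.to_string()
    }
}

fn main() {
    // Illustrative inputs (assumed, not taken from the repository).
    let samples = ["plain", "a,b", "say \"hi\"", "line1\nline2"];
    for s in samples {
        println!("{}", csv_escape(s));
    }
}
```

Fields without special characters pass through unchanged, so only the free-text columns (`body`, paths, usernames) pay the quoting cost.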
pub fn print_list_notes_csv(result: &NoteListResult) {
println!(
"id,gitlab_id,author_username,body,note_type,is_system,created_at,updated_at,position_new_path,position_new_line,noteable_type,parent_iid,project_path"
);
for note in &result.notes {
let body = note.body.as_deref().unwrap_or("");
let note_type = note.note_type.as_deref().unwrap_or("");
let path = note.position_new_path.as_deref().unwrap_or("");
let line = note
.position_new_line
.map_or(String::new(), |l| l.to_string());
let noteable = note.noteable_type.as_deref().unwrap_or("");
let parent_iid = note.parent_iid.map_or(String::new(), |i| i.to_string());
println!(
"{},{},{},{},{},{},{},{},{},{},{},{},{}",
note.id,
note.gitlab_id,
csv_escape(&note.author_username),
csv_escape(body),
csv_escape(note_type),
note.is_system,
note.created_at,
note.updated_at,
csv_escape(path),
line,
csv_escape(noteable),
parent_iid,
csv_escape(&note.project_path),
);
}
}
// ---------------------------------------------------------------------------
// Note query layer
// ---------------------------------------------------------------------------
#[derive(Debug, Serialize)]
pub struct NoteListRow {
pub id: i64,
pub gitlab_id: i64,
pub author_username: String,
pub body: Option<String>,
pub note_type: Option<String>,
pub is_system: bool,
pub created_at: i64,
pub updated_at: i64,
pub position_new_path: Option<String>,
pub position_new_line: Option<i64>,
pub position_old_path: Option<String>,
pub position_old_line: Option<i64>,
pub resolvable: bool,
pub resolved: bool,
pub resolved_by: Option<String>,
pub noteable_type: Option<String>,
pub parent_iid: Option<i64>,
pub parent_title: Option<String>,
pub project_path: String,
}
#[derive(Serialize)]
pub struct NoteListRowJson {
pub id: i64,
pub gitlab_id: i64,
pub author_username: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub body: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub note_type: Option<String>,
pub is_system: bool,
pub created_at_iso: String,
pub updated_at_iso: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub position_new_path: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub position_new_line: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")]
pub position_old_path: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub position_old_line: Option<i64>,
pub resolvable: bool,
pub resolved: bool,
#[serde(skip_serializing_if = "Option::is_none")]
pub resolved_by: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub noteable_type: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub parent_iid: Option<i64>,
#[serde(skip_serializing_if = "Option::is_none")]
pub parent_title: Option<String>,
pub project_path: String,
}
impl From<&NoteListRow> for NoteListRowJson {
fn from(row: &NoteListRow) -> Self {
Self {
id: row.id,
gitlab_id: row.gitlab_id,
author_username: row.author_username.clone(),
body: row.body.clone(),
note_type: row.note_type.clone(),
is_system: row.is_system,
created_at_iso: ms_to_iso(row.created_at),
updated_at_iso: ms_to_iso(row.updated_at),
position_new_path: row.position_new_path.clone(),
position_new_line: row.position_new_line,
position_old_path: row.position_old_path.clone(),
position_old_line: row.position_old_line,
resolvable: row.resolvable,
resolved: row.resolved,
resolved_by: row.resolved_by.clone(),
noteable_type: row.noteable_type.clone(),
parent_iid: row.parent_iid,
parent_title: row.parent_title.clone(),
project_path: row.project_path.clone(),
}
}
}
#[derive(Debug)]
pub struct NoteListResult {
pub notes: Vec<NoteListRow>,
pub total_count: i64,
}
#[derive(Serialize)]
pub struct NoteListResultJson {
pub notes: Vec<NoteListRowJson>,
pub total_count: i64,
pub showing: usize,
}
impl From<&NoteListResult> for NoteListResultJson {
fn from(result: &NoteListResult) -> Self {
Self {
notes: result.notes.iter().map(NoteListRowJson::from).collect(),
total_count: result.total_count,
showing: result.notes.len(),
}
}
}
pub struct NoteListFilters {
pub limit: usize,
pub project: Option<String>,
pub author: Option<String>,
pub note_type: Option<String>,
pub include_system: bool,
pub for_issue_iid: Option<i64>,
pub for_mr_iid: Option<i64>,
pub note_id: Option<i64>,
pub gitlab_note_id: Option<i64>,
pub discussion_id: Option<String>,
pub since: Option<String>,
pub until: Option<String>,
pub path: Option<String>,
pub contains: Option<String>,
pub resolution: Option<String>,
pub sort: String,
pub order: String,
}
pub fn query_notes(
conn: &Connection,
filters: &NoteListFilters,
config: &Config,
) -> Result<NoteListResult> {
let mut where_clauses: Vec<String> = Vec::new();
let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
// Project filter
if let Some(ref project) = filters.project {
let project_id = resolve_project(conn, project)?;
where_clauses.push("n.project_id = ?".to_string());
params.push(Box::new(project_id));
}
// Author filter (case-insensitive, strip leading @)
if let Some(ref author) = filters.author {
let username = author.strip_prefix('@').unwrap_or(author);
where_clauses.push("n.author_username = ? COLLATE NOCASE".to_string());
params.push(Box::new(username.to_string()));
}
// Note type filter
if let Some(ref note_type) = filters.note_type {
where_clauses.push("n.note_type = ?".to_string());
params.push(Box::new(note_type.clone()));
}
// System note filter (default: exclude system notes)
if !filters.include_system {
where_clauses.push("n.is_system = 0".to_string());
}
// Since filter
let since_ms = if let Some(ref since_str) = filters.since {
let ms = parse_since(since_str).ok_or_else(|| {
LoreError::Other(format!(
"Invalid --since value '{}'. Use relative (7d, 2w, 1m) or absolute (YYYY-MM-DD) format.",
since_str
))
})?;
where_clauses.push("n.created_at >= ?".to_string());
params.push(Box::new(ms));
Some(ms)
} else {
None
};
// Until filter (end of day for date-only input)
if let Some(ref until_str) = filters.until {
let until_ms = if until_str.len() == 10
&& until_str.chars().filter(|&c| c == '-').count() == 2
{
// Date-only: use end of day 23:59:59.999
let iso_full = format!("{until_str}T23:59:59.999Z");
crate::core::time::iso_to_ms(&iso_full).ok_or_else(|| {
LoreError::Other(format!(
"Invalid --until value '{}'. Use YYYY-MM-DD or relative format.",
until_str
))
})?
} else {
parse_since(until_str).ok_or_else(|| {
LoreError::Other(format!(
"Invalid --until value '{}'. Use relative (7d, 2w, 1m) or absolute (YYYY-MM-DD) format.",
until_str
))
})?
};
// Validate since <= until
if let Some(s) = since_ms
&& s > until_ms
{
return Err(LoreError::Other(
"Invalid time window: --since is after --until.".to_string(),
));
}
where_clauses.push("n.created_at <= ?".to_string());
params.push(Box::new(until_ms));
}
// Path filter (trailing / = prefix match, else exact)
if let Some(ref path) = filters.path {
if let Some(prefix) = path.strip_suffix('/') {
let escaped = note_escape_like(prefix);
where_clauses.push("n.position_new_path LIKE ? ESCAPE '\\'".to_string());
params.push(Box::new(format!("{escaped}%")));
} else {
where_clauses.push("n.position_new_path = ?".to_string());
params.push(Box::new(path.clone()));
}
}
// Contains filter (LIKE %term% on body, case-insensitive)
if let Some(ref contains) = filters.contains {
let escaped = note_escape_like(contains);
where_clauses.push("n.body LIKE ? ESCAPE '\\' COLLATE NOCASE".to_string());
params.push(Box::new(format!("%{escaped}%")));
}
// Resolution filter
if let Some(ref resolution) = filters.resolution {
match resolution.as_str() {
"unresolved" => {
where_clauses.push("n.resolvable = 1 AND n.resolved = 0".to_string());
}
"resolved" => {
where_clauses.push("n.resolvable = 1 AND n.resolved = 1".to_string());
}
other => {
return Err(LoreError::Other(format!(
"Invalid --resolution value '{}'. Use 'resolved' or 'unresolved'.",
other
)));
}
}
}
// For-issue-iid filter (requires project context)
if let Some(iid) = filters.for_issue_iid {
let project_str = filters.project.as_deref().or(config.default_project.as_deref()).ok_or_else(|| {
LoreError::Other(
"Cannot filter by issue IID without a project context. Use --project or set defaultProject in config."
.to_string(),
)
})?;
let project_id = resolve_project(conn, project_str)?;
where_clauses.push(
"d.issue_id = (SELECT id FROM issues WHERE project_id = ? AND iid = ?)".to_string(),
);
params.push(Box::new(project_id));
params.push(Box::new(iid));
}
// For-mr-iid filter (requires project context)
if let Some(iid) = filters.for_mr_iid {
let project_str = filters.project.as_deref().or(config.default_project.as_deref()).ok_or_else(|| {
LoreError::Other(
"Cannot filter by MR IID without a project context. Use --project or set defaultProject in config."
.to_string(),
)
})?;
let project_id = resolve_project(conn, project_str)?;
where_clauses.push(
"d.merge_request_id = (SELECT id FROM merge_requests WHERE project_id = ? AND iid = ?)"
.to_string(),
);
params.push(Box::new(project_id));
params.push(Box::new(iid));
}
// Note ID filter
if let Some(id) = filters.note_id {
where_clauses.push("n.id = ?".to_string());
params.push(Box::new(id));
}
// GitLab note ID filter
if let Some(gitlab_id) = filters.gitlab_note_id {
where_clauses.push("n.gitlab_id = ?".to_string());
params.push(Box::new(gitlab_id));
}
// Discussion ID filter
if let Some(ref disc_id) = filters.discussion_id {
where_clauses.push("d.gitlab_discussion_id = ?".to_string());
params.push(Box::new(disc_id.clone()));
}
let where_sql = if where_clauses.is_empty() {
String::new()
} else {
format!("WHERE {}", where_clauses.join(" AND "))
};
// Count query
let count_sql = format!(
"SELECT COUNT(*) FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN projects p ON n.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
{where_sql}"
);
let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let total_count: i64 = conn.query_row(&count_sql, param_refs.as_slice(), |row| row.get(0))?;
// Sort + order
let sort_column = match filters.sort.as_str() {
"updated" => "n.updated_at",
_ => "n.created_at",
};
let order = if filters.order == "asc" {
"ASC"
} else {
"DESC"
};
let query_sql = format!(
"SELECT
n.id,
n.gitlab_id,
n.author_username,
n.body,
n.note_type,
n.is_system,
n.created_at,
n.updated_at,
n.position_new_path,
n.position_new_line,
n.position_old_path,
n.position_old_line,
n.resolvable,
n.resolved,
n.resolved_by,
d.noteable_type,
COALESCE(i.iid, m.iid) AS parent_iid,
COALESCE(i.title, m.title) AS parent_title,
p.path_with_namespace AS project_path
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN projects p ON n.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
{where_sql}
ORDER BY {sort_column} {order}, n.id {order}
LIMIT ?"
);
params.push(Box::new(filters.limit as i64));
let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let mut stmt = conn.prepare(&query_sql)?;
let notes: Vec<NoteListRow> = stmt
.query_map(param_refs.as_slice(), |row| {
let is_system_int: i64 = row.get(5)?;
let resolvable_int: i64 = row.get(12)?;
let resolved_int: i64 = row.get(13)?;
Ok(NoteListRow {
id: row.get(0)?,
gitlab_id: row.get(1)?,
author_username: row.get::<_, Option<String>>(2)?.unwrap_or_default(),
body: row.get(3)?,
note_type: row.get(4)?,
is_system: is_system_int == 1,
created_at: row.get(6)?,
updated_at: row.get(7)?,
position_new_path: row.get(8)?,
position_new_line: row.get(9)?,
position_old_path: row.get(10)?,
position_old_line: row.get(11)?,
resolvable: resolvable_int == 1,
resolved: resolved_int == 1,
resolved_by: row.get(14)?,
noteable_type: row.get(15)?,
parent_iid: row.get(16)?,
parent_title: row.get(17)?,
project_path: row.get(18)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(NoteListResult { notes, total_count })
}
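The path filter in `query_notes` switches between a prefix match and an exact match depending on a trailing `/`. A minimal sketch of that decision follows, with no database involved. The body of `escape_like` is not shown in this diff, so `escape_like_sketch` is a hypothetical stand-in that assumes `\` is the escape character and that `%`, `_`, and `\` are the characters to escape. `path_clause` is likewise an illustrative name.

```rust
/// Hypothetical stand-in for `crate::core::path_resolver::escape_like`;
/// the real implementation is not shown here. Assumes `\` as ESCAPE char.
fn escape_like_sketch(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if matches!(c, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Mirrors the path filter: a trailing '/' means prefix match, otherwise exact.
/// Returns the SQL fragment and the value to bind to its `?` placeholder.
fn path_clause(path: &str) -> (&'static str, String) {
    match path.strip_suffix('/') {
        Some(prefix) => (
            "n.position_new_path LIKE ? ESCAPE '\\'",
            format!("{}%", escape_like_sketch(prefix)),
        ),
        None => ("n.position_new_path = ?", path.to_string()),
    }
}

fn main() {
    for p in ["src/core/", "src/main.rs", "my_dir/"] {
        let (sql, bound) = path_clause(p);
        println!("{p:<12} -> {sql}  [bind: {bound}]");
    }
}
```

Because the trailing slash is stripped before `%` is appended, a filter such as `src/` binds the pattern `src%`, which would also match a sibling path like `src2/lib.rs`. Whether that is intended is not stated in the commit.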
#[cfg(test)]
#[path = "list_tests.rs"]
mod tests;
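The `--until` handling in `query_notes` treats a ten-character value with exactly two hyphens as a date-only input and extends it to the end of that day. The sketch below isolates that check. `until_to_iso` is a hypothetical helper name, and the conversion to milliseconds (done by `iso_to_ms` in the diff) is left out because its implementation is not shown.

```rust
/// Returns the end-of-day ISO timestamp for date-only input, or `None` when
/// the value should instead go through the relative parser (e.g. "7d").
/// Same shape check as in the diff: length 10 and exactly two '-' characters.
fn until_to_iso(until: &str) -> Option<String> {
    let date_only = until.len() == 10 && until.chars().filter(|&c| c == '-').count() == 2;
    date_only.then(|| format!("{until}T23:59:59.999Z"))
}

fn main() {
    // Illustrative inputs (assumed).
    for input in ["2026-02-17", "7d", "2w"] {
        println!("{input:<12} -> {:?}", until_to_iso(input));
    }
}
```

The shape check is only a heuristic: a string like `ab-cd-efgh` also passes it, so rejecting malformed dates is left to the later ISO parsing step, which returns an error in the diff.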

File diff suppressed because it is too large


@@ -3,6 +3,7 @@ pub mod count;
pub mod doctor;
pub mod drift;
pub mod embed;
pub mod file_history;
pub mod generate_docs;
pub mod ingest;
pub mod init;
@@ -13,6 +14,7 @@ pub mod stats;
pub mod sync;
pub mod sync_status;
pub mod timeline;
pub mod trace;
pub mod who;
pub use auth_test::run_auth_test;
@@ -23,6 +25,7 @@ pub use count::{
pub use doctor::{DoctorChecks, print_doctor_results, run_doctor};
pub use drift::{DriftResponse, print_drift_human, print_drift_json, run_drift};
pub use embed::{print_embed, print_embed_json, run_embed};
pub use file_history::{print_file_history, print_file_history_json, run_file_history};
pub use generate_docs::{print_generate_docs, print_generate_docs_json, run_generate_docs};
pub use ingest::{
DryRunPreview, IngestDisplay, print_dry_run_preview, print_dry_run_preview_json,
@@ -30,8 +33,10 @@ pub use ingest::{
};
pub use init::{InitInputs, InitOptions, InitResult, run_init};
pub use list::{
ListFilters, MrListFilters, open_issue_in_browser, open_mr_in_browser, print_list_issues,
print_list_issues_json, print_list_mrs, print_list_mrs_json, run_list_issues, run_list_mrs,
ListFilters, MrListFilters, NoteListFilters, open_issue_in_browser, open_mr_in_browser,
print_list_issues, print_list_issues_json, print_list_mrs, print_list_mrs_json,
print_list_notes, print_list_notes_csv, print_list_notes_json, print_list_notes_jsonl,
query_notes, run_list_issues, run_list_mrs,
};
pub use search::{
SearchCliFilters, SearchResponse, print_search_results, print_search_results_json, run_search,
@@ -44,4 +49,5 @@ pub use stats::{print_stats, print_stats_json, run_stats};
pub use sync::{SyncOptions, SyncResult, print_sync, print_sync_json, run_sync};
pub use sync_status::{print_sync_status, print_sync_status_json, run_sync_status};
pub use timeline::{TimelineParams, print_timeline, print_timeline_json_with_meta, run_timeline};
pub use trace::{parse_trace_path, print_trace, print_trace_json};
pub use who::{WhoRun, print_who_human, print_who_json, run_who};


@@ -1,6 +1,6 @@
use std::collections::HashMap;
use console::style;
use crate::cli::render::Theme;
use serde::Serialize;
use crate::Config;
@@ -309,67 +309,93 @@ fn parse_json_array(json: &str) -> Vec<String> {
.collect()
}
/// Render FTS snippet with `<mark>` tags as terminal highlight style.
fn render_snippet(snippet: &str) -> String {
let mut result = String::new();
let mut remaining = snippet;
while let Some(start) = remaining.find("<mark>") {
result.push_str(&Theme::muted().render(&remaining[..start]));
remaining = &remaining[start + 6..];
if let Some(end) = remaining.find("</mark>") {
let highlighted = &remaining[..end];
result.push_str(&Theme::highlight().render(highlighted));
remaining = &remaining[end + 7..];
}
}
result.push_str(&Theme::muted().render(remaining));
result
}
pub fn print_search_results(response: &SearchResponse) {
if !response.warnings.is_empty() {
for w in &response.warnings {
eprintln!("{} {}", style("Warning:").yellow(), w);
eprintln!("{} {}", Theme::warning().render("Warning:"), w);
}
}
if response.results.is_empty() {
println!("No results found for '{}'", style(&response.query).bold());
println!(
"No results found for '{}'",
Theme::bold().render(&response.query)
);
return;
}
println!(
"{} results for '{}' ({})",
response.total_results,
style(&response.query).bold(),
response.mode
"\n {} results for '{}' {}",
Theme::bold().render(&response.total_results.to_string()),
Theme::bold().render(&response.query),
Theme::muted().render(&response.mode)
);
println!();
for (i, result) in response.results.iter().enumerate() {
let type_prefix = match result.source_type.as_str() {
"issue" => "Issue",
"merge_request" => "MR",
"discussion" => "Discussion",
_ => &result.source_type,
println!();
let type_badge = match result.source_type.as_str() {
"issue" => Theme::issue_ref().render("issue"),
"merge_request" => Theme::mr_ref().render(" mr "),
"discussion" => Theme::info().render(" disc"),
"note" => Theme::muted().render(" note"),
_ => Theme::muted().render(&format!("{:>5}", &result.source_type)),
};
// Title line: rank, type badge, title
println!(
"[{}] {} - {} (score: {:.2})",
i + 1,
style(type_prefix).cyan(),
result.title,
result.score
" {:>3}. {} {}",
Theme::muted().render(&(i + 1).to_string()),
type_badge,
Theme::bold().render(&result.title)
);
if let Some(ref url) = result.url {
println!(" {}", style(url).dim());
// Metadata: project, author, labels — compact middle-dot line
let sep = Theme::muted().render(" \u{b7} ");
let mut meta_parts: Vec<String> = Vec::new();
meta_parts.push(Theme::muted().render(&result.project_path));
if let Some(ref author) = result.author {
meta_parts.push(Theme::username().render(&format!("@{author}")));
}
println!(
" {} | {}",
style(&result.project_path).dim(),
result
.author
.as_deref()
.map(|a| format!("@{}", a))
.unwrap_or_default()
);
if !result.labels.is_empty() {
println!(" Labels: {}", result.labels.join(", "));
let label_str = if result.labels.len() <= 3 {
result.labels.join(", ")
} else {
format!(
"{} +{}",
result.labels[..2].join(", "),
result.labels.len() - 2
)
};
meta_parts.push(Theme::muted().render(&label_str));
}
println!(" {}", meta_parts.join(&sep));
let clean_snippet = result.snippet.replace("<mark>", "").replace("</mark>", "");
println!(" {}", style(clean_snippet).dim());
// Snippet with highlight styling
let rendered = render_snippet(&result.snippet);
println!(" {rendered}");
if let Some(ref explain) = result.explain {
println!(
" {} vector_rank={} fts_rank={} rrf_score={:.6}",
style("[explain]").magenta(),
" {} vec={} fts={} rrf={:.4}",
Theme::accent().render("explain"),
explain
.vector_rank
.map(|r| r.to_string())
@@ -381,9 +407,9 @@ pub fn print_search_results(response: &SearchResponse) {
explain.rrf_score
);
}
}
println!();
}
}
#[derive(Serialize)]
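
Note on the snippet rendering above: `render_snippet` walks the FTS snippet, styling text outside `<mark>` tags as muted and text inside them as highlighted. The following is a minimal standalone sketch of that same loop, assuming raw ANSI escape codes in place of `Theme::muted()` and `Theme::highlight()`. The constants and the names `render_snippet_ansi` and `push_styled` are hypothetical and are not part of this commit.

```rust
// Hypothetical stand-ins for Theme::muted() and Theme::highlight().
const DIM: &str = "\x1b[2m";
const HIGHLIGHT: &str = "\x1b[1;33m";
const RESET: &str = "\x1b[0m";

/// Append `text` wrapped in the given ANSI code; empty text adds nothing.
fn push_styled(out: &mut String, code: &str, text: &str) {
    if text.is_empty() {
        return;
    }
    out.push_str(code);
    out.push_str(text);
    out.push_str(RESET);
}

/// Convert `<mark>...</mark>` spans into highlighted terminal text.
/// Text outside the tags is dimmed, mirroring the loop in the diff.
fn render_snippet_ansi(snippet: &str) -> String {
    let mut out = String::new();
    let mut rest = snippet;
    while let Some(start) = rest.find("<mark>") {
        push_styled(&mut out, DIM, &rest[..start]);
        rest = &rest[start + "<mark>".len()..];
        // An unterminated <mark> falls through; the tail is dimmed below.
        if let Some(end) = rest.find("</mark>") {
            push_styled(&mut out, HIGHLIGHT, &rest[..end]);
            rest = &rest[end + "</mark>".len()..];
        }
    }
    push_styled(&mut out, DIM, rest);
    out
}
```

Using `"<mark>".len()` instead of the literal offsets 6 and 7 keeps the slicing tied to the tag strings, which is a stylistic choice of this sketch rather than something the commit does.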


@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::{self, Icons, Theme};
use rusqlite::Connection;
use serde::Serialize;
@@ -160,6 +160,7 @@ pub fn run_show_issue(
})
}
#[derive(Debug)]
struct IssueRow {
id: i64,
iid: i64,
@@ -194,7 +195,7 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
i.due_date, i.milestone_title,
(SELECT COUNT(*) FROM notes n
JOIN discussions d ON n.discussion_id = d.id
WHERE d.noteable_type = 'Issue' AND d.noteable_id = i.id AND n.is_system = 0) AS user_notes_count,
WHERE d.noteable_type = 'Issue' AND d.issue_id = i.id AND n.is_system = 0) AS user_notes_count,
i.status_name, i.status_category, i.status_color,
i.status_icon_name, i.status_synced_at
FROM issues i
@@ -210,7 +211,7 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
i.due_date, i.milestone_title,
(SELECT COUNT(*) FROM notes n
JOIN discussions d ON n.discussion_id = d.id
WHERE d.noteable_type = 'Issue' AND d.noteable_id = i.id AND n.is_system = 0) AS user_notes_count,
WHERE d.noteable_type = 'Issue' AND d.issue_id = i.id AND n.is_system = 0) AS user_notes_count,
i.status_name, i.status_category, i.status_color,
i.status_icon_name, i.status_synced_at
FROM issues i
@@ -605,69 +606,55 @@ fn get_mr_discussions(conn: &Connection, mr_id: i64) -> Result<Vec<MrDiscussionD
}
fn format_date(ms: i64) -> String {
let iso = ms_to_iso(ms);
iso.split('T').next().unwrap_or(&iso).to_string()
render::format_date(ms)
}
fn wrap_text(text: &str, width: usize, indent: &str) -> String {
let mut result = String::new();
let mut current_line = String::new();
for word in text.split_whitespace() {
if current_line.is_empty() {
current_line = word.to_string();
} else if current_line.len() + 1 + word.len() <= width {
current_line.push(' ');
current_line.push_str(word);
} else {
if !result.is_empty() {
result.push('\n');
result.push_str(indent);
}
result.push_str(&current_line);
current_line = word.to_string();
}
}
if !current_line.is_empty() {
if !result.is_empty() {
result.push('\n');
result.push_str(indent);
}
result.push_str(&current_line);
}
result
render::wrap_indent(text, width, indent)
}
pub fn print_show_issue(issue: &IssueDetail) {
let header = format!("Issue #{}: {}", issue.iid, issue.title);
println!("{}", style(&header).bold());
println!("{}", "".repeat(header.len().min(80)));
println!();
// Title line
println!(
" Issue #{}: {}",
issue.iid,
Theme::bold().render(&issue.title),
);
println!("Ref: {}", style(&issue.references_full).dim());
println!("Project: {}", style(&issue.project_path).cyan());
// Details section
println!("{}", render::section_divider("Details"));
let state_styled = if issue.state == "opened" {
style(&issue.state).green()
println!(
" Ref {}",
Theme::muted().render(&issue.references_full)
);
println!(
" Project {}",
Theme::info().render(&issue.project_path)
);
let (icon, state_style) = if issue.state == "opened" {
(Icons::issue_opened(), Theme::success())
} else {
style(&issue.state).dim()
(Icons::issue_closed(), Theme::dim())
};
println!("State: {}", state_styled);
if issue.confidential {
println!(" {}", style("CONFIDENTIAL").red().bold());
}
println!(
" State {}",
state_style.render(&format!("{icon} {}", issue.state))
);
if let Some(status) = &issue.status_name {
println!(
"Status: {}",
style_with_hex(status, issue.status_color.as_deref())
" Status {}",
render::style_with_hex(status, issue.status_color.as_deref())
);
}
println!("Author: @{}", issue.author_username);
if issue.confidential {
println!(" {}", Theme::error().bold().render("CONFIDENTIAL"));
}
println!(" Author @{}", issue.author_username);
if !issue.assignees.is_empty() {
let label = if issue.assignees.len() > 1 {
@@ -676,69 +663,82 @@ pub fn print_show_issue(issue: &IssueDetail) {
"Assignee"
};
println!(
"{}:{} {}",
" {}{} {}",
label,
" ".repeat(10 - label.len()),
" ".repeat(12 - label.len()),
issue
.assignees
.iter()
.map(|a| format!("@{}", a))
.map(|a| format!("@{a}"))
.collect::<Vec<_>>()
.join(", ")
);
}
println!("Created: {}", format_date(issue.created_at));
println!("Updated: {}", format_date(issue.updated_at));
println!(
" Created {} ({})",
format_date(issue.created_at),
render::format_relative_time_compact(issue.created_at),
);
println!(
" Updated {} ({})",
format_date(issue.updated_at),
render::format_relative_time_compact(issue.updated_at),
);
if let Some(closed_at) = &issue.closed_at {
println!("Closed: {}", closed_at);
println!(" Closed {closed_at}");
}
if let Some(due) = &issue.due_date {
println!("Due: {}", due);
println!(" Due {due}");
}
if let Some(ms) = &issue.milestone {
println!("Milestone: {}", ms);
println!(" Milestone {ms}");
}
if issue.labels.is_empty() {
println!("Labels: {}", style("(none)").dim());
} else {
println!("Labels: {}", issue.labels.join(", "));
}
if !issue.closing_merge_requests.is_empty() {
println!();
println!("{}", style("Development:").bold());
for mr in &issue.closing_merge_requests {
let state_indicator = match mr.state.as_str() {
"merged" => style(&mr.state).green(),
"opened" => style(&mr.state).cyan(),
"closed" => style(&mr.state).red(),
_ => style(&mr.state).dim(),
};
println!(" !{} {} ({})", mr.iid, mr.title, state_indicator);
}
if !issue.labels.is_empty() {
println!(
" Labels {}",
render::format_labels_bare(&issue.labels, issue.labels.len())
);
}
if let Some(url) = &issue.web_url {
println!("URL: {}", style(url).dim());
println!(" URL {}", Theme::muted().render(url));
}
println!();
// Development section
if !issue.closing_merge_requests.is_empty() {
println!("{}", render::section_divider("Development"));
for mr in &issue.closing_merge_requests {
let (mr_icon, mr_style) = match mr.state.as_str() {
"merged" => (Icons::mr_merged(), Theme::accent()),
"opened" => (Icons::mr_opened(), Theme::success()),
"closed" => (Icons::mr_closed(), Theme::error()),
_ => (Icons::mr_opened(), Theme::dim()),
};
println!(
" {} !{} {} {}",
mr_style.render(mr_icon),
mr.iid,
mr.title,
mr_style.render(&mr.state),
);
}
}
println!("{}", style("Description:").bold());
// Description section
println!("{}", render::section_divider("Description"));
if let Some(desc) = &issue.description {
let wrapped = wrap_text(desc, 76, " ");
println!(" {}", wrapped);
let wrapped = wrap_text(desc, 72, " ");
println!(" {wrapped}");
} else {
println!(" {}", style("(no description)").dim());
println!(" {}", Theme::muted().render("(no description)"));
}
println!();
// Discussions section
let user_discussions: Vec<&DiscussionDetail> = issue
.discussions
.iter()
@@ -746,13 +746,12 @@ pub fn print_show_issue(issue: &IssueDetail) {
.collect();
if user_discussions.is_empty() {
println!("{}", style("Discussions: (none)").dim());
println!("\n {}", Theme::muted().render("No discussions"));
} else {
println!(
"{}",
style(format!("Discussions ({}):", user_discussions.len())).bold()
render::section_divider(&format!("Discussions ({})", user_discussions.len()))
);
println!();
for discussion in user_discussions {
let user_notes: Vec<&NoteDetail> =
@@ -760,22 +759,22 @@ pub fn print_show_issue(issue: &IssueDetail) {
if let Some(first_note) = user_notes.first() {
println!(
" {} ({}):",
style(format!("@{}", first_note.author_username)).cyan(),
format_date(first_note.created_at)
" {} {}",
Theme::info().render(&format!("@{}", first_note.author_username)),
format_date(first_note.created_at),
);
let wrapped = wrap_text(&first_note.body, 72, " ");
println!(" {}", wrapped);
let wrapped = wrap_text(&first_note.body, 68, " ");
println!(" {wrapped}");
println!();
for reply in user_notes.iter().skip(1) {
println!(
" {} ({}):",
style(format!("@{}", reply.author_username)).cyan(),
format_date(reply.created_at)
" {} {}",
Theme::info().render(&format!("@{}", reply.author_username)),
format_date(reply.created_at),
);
let wrapped = wrap_text(&reply.body, 68, " ");
println!(" {}", wrapped);
let wrapped = wrap_text(&reply.body, 66, " ");
println!(" {wrapped}");
println!();
}
}
@@ -784,36 +783,49 @@ pub fn print_show_issue(issue: &IssueDetail) {
}
pub fn print_show_mr(mr: &MrDetail) {
let draft_prefix = if mr.draft { "[Draft] " } else { "" };
let header = format!("MR !{}: {}{}", mr.iid, draft_prefix, mr.title);
println!("{}", style(&header).bold());
println!("{}", "".repeat(header.len().min(80)));
println!();
println!("Project: {}", style(&mr.project_path).cyan());
let state_styled = match mr.state.as_str() {
"opened" => style(&mr.state).green(),
"merged" => style(&mr.state).magenta(),
"closed" => style(&mr.state).red(),
_ => style(&mr.state).dim(),
// Title line
let draft_prefix = if mr.draft {
format!("{} ", Icons::mr_draft())
} else {
String::new()
};
println!("State: {}", state_styled);
println!(
"Branches: {} -> {}",
style(&mr.source_branch).cyan(),
style(&mr.target_branch).yellow()
" MR !{}: {}{}",
mr.iid,
draft_prefix,
Theme::bold().render(&mr.title),
);
println!("Author: @{}", mr.author_username);
// Details section
println!("{}", render::section_divider("Details"));
println!(" Project {}", Theme::info().render(&mr.project_path));
let (icon, state_style) = match mr.state.as_str() {
"opened" => (Icons::mr_opened(), Theme::success()),
"merged" => (Icons::mr_merged(), Theme::accent()),
"closed" => (Icons::mr_closed(), Theme::error()),
_ => (Icons::mr_opened(), Theme::dim()),
};
println!(
" State {}",
state_style.render(&format!("{icon} {}", mr.state))
);
println!(
" Branches {} -> {}",
Theme::info().render(&mr.source_branch),
Theme::warning().render(&mr.target_branch)
);
println!(" Author @{}", mr.author_username);
if !mr.assignees.is_empty() {
println!(
"Assignees: {}",
" Assignees {}",
mr.assignees
.iter()
.map(|a| format!("@{}", a))
.map(|a| format!("@{a}"))
.collect::<Vec<_>>()
.join(", ")
);
@@ -821,48 +833,63 @@ pub fn print_show_mr(mr: &MrDetail) {
if !mr.reviewers.is_empty() {
println!(
"Reviewers: {}",
" Reviewers {}",
mr.reviewers
.iter()
.map(|r| format!("@{}", r))
.map(|r| format!("@{r}"))
.collect::<Vec<_>>()
.join(", ")
);
}
println!("Created: {}", format_date(mr.created_at));
println!("Updated: {}", format_date(mr.updated_at));
println!(
" Created {} ({})",
format_date(mr.created_at),
render::format_relative_time_compact(mr.created_at),
);
println!(
" Updated {} ({})",
format_date(mr.updated_at),
render::format_relative_time_compact(mr.updated_at),
);
if let Some(merged_at) = mr.merged_at {
println!("Merged: {}", format_date(merged_at));
println!(
" Merged {} ({})",
format_date(merged_at),
render::format_relative_time_compact(merged_at),
);
}
if let Some(closed_at) = mr.closed_at {
println!("Closed: {}", format_date(closed_at));
println!(
" Closed {} ({})",
format_date(closed_at),
render::format_relative_time_compact(closed_at),
);
}
if mr.labels.is_empty() {
println!("Labels: {}", style("(none)").dim());
} else {
println!("Labels: {}", mr.labels.join(", "));
if !mr.labels.is_empty() {
println!(
" Labels {}",
render::format_labels_bare(&mr.labels, mr.labels.len())
);
}
if let Some(url) = &mr.web_url {
println!("URL: {}", style(url).dim());
println!(" URL {}", Theme::muted().render(url));
}
println!();
println!("{}", style("Description:").bold());
// Description section
println!("{}", render::section_divider("Description"));
if let Some(desc) = &mr.description {
let wrapped = wrap_text(desc, 76, " ");
println!(" {}", wrapped);
let wrapped = wrap_text(desc, 72, " ");
println!(" {wrapped}");
} else {
println!(" {}", style("(no description)").dim());
println!(" {}", Theme::muted().render("(no description)"));
}
println!();
// Discussions section
let user_discussions: Vec<&MrDiscussionDetail> = mr
.discussions
.iter()
@@ -870,13 +897,12 @@ pub fn print_show_mr(mr: &MrDetail) {
.collect();
if user_discussions.is_empty() {
println!("{}", style("Discussions: (none)").dim());
println!("\n {}", Theme::muted().render("No discussions"));
} else {
println!(
"{}",
style(format!("Discussions ({}):", user_discussions.len())).bold()
render::section_divider(&format!("Discussions ({})", user_discussions.len()))
);
println!();
for discussion in user_discussions {
let user_notes: Vec<&MrNoteDetail> =
@@ -888,22 +914,22 @@ pub fn print_show_mr(mr: &MrDetail) {
}
println!(
" {} ({}):",
style(format!("@{}", first_note.author_username)).cyan(),
format_date(first_note.created_at)
" {} {}",
Theme::info().render(&format!("@{}", first_note.author_username)),
format_date(first_note.created_at),
);
let wrapped = wrap_text(&first_note.body, 72, " ");
println!(" {}", wrapped);
let wrapped = wrap_text(&first_note.body, 68, " ");
println!(" {wrapped}");
println!();
for reply in user_notes.iter().skip(1) {
println!(
" {} ({}):",
style(format!("@{}", reply.author_username)).cyan(),
format_date(reply.created_at)
" {} {}",
Theme::info().render(&format!("@{}", reply.author_username)),
format_date(reply.created_at),
);
let wrapped = wrap_text(&reply.body, 68, " ");
println!(" {}", wrapped);
let wrapped = wrap_text(&reply.body, 66, " ");
println!(" {wrapped}");
println!();
}
}
@@ -925,39 +951,13 @@ fn print_diff_position(pos: &DiffNotePosition) {
println!(
" {} {}{}",
style("📍").dim(),
style(file_path).yellow(),
style(line_str).dim()
Theme::dim().render("\u{1f4cd}"),
Theme::warning().render(file_path),
Theme::dim().render(&line_str)
);
}
}
fn style_with_hex<'a>(text: &'a str, hex: Option<&str>) -> console::StyledObject<&'a str> {
let styled = console::style(text);
let Some(hex) = hex else { return styled };
let hex = hex.trim_start_matches('#');
if hex.len() != 6 {
return styled;
}
let Ok(r) = u8::from_str_radix(&hex[0..2], 16) else {
return styled;
};
let Ok(g) = u8::from_str_radix(&hex[2..4], 16) else {
return styled;
};
let Ok(b) = u8::from_str_radix(&hex[4..6], 16) else {
return styled;
};
styled.color256(ansi256_from_rgb(r, g, b))
}
fn ansi256_from_rgb(r: u8, g: u8, b: u8) -> u8 {
let ri = (u16::from(r) * 5 + 127) / 255;
let gi = (u16::from(g) * 5 + 127) / 255;
let bi = (u16::from(b) * 5 + 127) / 255;
(16 + 36 * ri + 6 * gi + bi) as u8
}
#[derive(Serialize)]
pub struct IssueDetailJson {
pub id: i64,
@@ -1218,10 +1218,177 @@ mod tests {
.unwrap();
}
fn seed_second_project(conn: &Connection) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (2, 101, 'other/repo', 'https://gitlab.example.com/other', 1000, 2000)",
[],
)
.unwrap();
}
fn seed_discussion_with_notes(
conn: &Connection,
issue_id: i64,
project_id: i64,
user_notes: usize,
system_notes: usize,
) {
let disc_id: i64 = conn
.query_row(
"SELECT COALESCE(MAX(id), 0) + 1 FROM discussions",
[],
|r| r.get(0),
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, first_note_at, last_note_at, last_seen_at)
VALUES (?1, ?2, ?3, ?4, 'Issue', 1000, 2000, 2000)",
rusqlite::params![disc_id, format!("disc-{}", disc_id), project_id, issue_id],
)
.unwrap();
for i in 0..user_notes {
conn.execute(
"INSERT INTO notes (gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system, position)
VALUES (?1, ?2, ?3, 'user1', 'comment', 1000, 2000, 2000, 0, ?4)",
rusqlite::params![1000 + disc_id * 100 + i as i64, disc_id, project_id, i as i64],
)
.unwrap();
}
for i in 0..system_notes {
conn.execute(
"INSERT INTO notes (gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system, position)
VALUES (?1, ?2, ?3, 'system', 'status changed', 1000, 2000, 2000, 1, ?4)",
rusqlite::params![2000 + disc_id * 100 + i as i64, disc_id, project_id, (user_notes + i) as i64],
)
.unwrap();
}
}
// --- find_issue tests ---
#[test]
fn test_find_issue_basic() {
let conn = setup_test_db();
seed_issue(&conn);
let row = find_issue(&conn, 10, None).unwrap();
assert_eq!(row.iid, 10);
assert_eq!(row.title, "Test issue");
assert_eq!(row.state, "opened");
assert_eq!(row.author_username, "author");
assert_eq!(row.project_path, "group/repo");
}
#[test]
fn test_find_issue_with_project_filter() {
let conn = setup_test_db();
seed_issue(&conn);
let row = find_issue(&conn, 10, Some("group/repo")).unwrap();
assert_eq!(row.iid, 10);
assert_eq!(row.project_path, "group/repo");
}
#[test]
fn test_find_issue_not_found() {
let conn = setup_test_db();
seed_issue(&conn);
let err = find_issue(&conn, 999, None).unwrap_err();
assert!(matches!(err, LoreError::NotFound(_)));
}
#[test]
fn test_find_issue_wrong_project_filter() {
let conn = setup_test_db();
seed_issue(&conn);
seed_second_project(&conn);
// Issue 10 only exists in project 1, not project 2
let err = find_issue(&conn, 10, Some("other/repo")).unwrap_err();
assert!(matches!(err, LoreError::NotFound(_)));
}
#[test]
fn test_find_issue_ambiguous_without_project() {
let conn = setup_test_db();
seed_issue(&conn); // issue iid=10 in project 1
seed_second_project(&conn);
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, author_username,
created_at, updated_at, last_seen_at)
VALUES (2, 201, 10, 2, 'Same iid different project', 'opened', 'author', 1000, 2000, 2000)",
[],
)
.unwrap();
let err = find_issue(&conn, 10, None).unwrap_err();
assert!(matches!(err, LoreError::Ambiguous(_)));
}
#[test]
fn test_find_issue_ambiguous_resolved_with_project() {
let conn = setup_test_db();
seed_issue(&conn);
seed_second_project(&conn);
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, author_username,
created_at, updated_at, last_seen_at)
VALUES (2, 201, 10, 2, 'Same iid different project', 'opened', 'author', 1000, 2000, 2000)",
[],
)
.unwrap();
let row = find_issue(&conn, 10, Some("other/repo")).unwrap();
assert_eq!(row.title, "Same iid different project");
}
#[test]
fn test_find_issue_user_notes_count_zero() {
let conn = setup_test_db();
seed_issue(&conn);
let row = find_issue(&conn, 10, None).unwrap();
assert_eq!(row.user_notes_count, 0);
}
#[test]
fn test_find_issue_user_notes_count_excludes_system() {
let conn = setup_test_db();
seed_issue(&conn);
// 2 user notes + 3 system notes = should count only 2
seed_discussion_with_notes(&conn, 1, 1, 2, 3);
let row = find_issue(&conn, 10, None).unwrap();
assert_eq!(row.user_notes_count, 2);
}
#[test]
fn test_find_issue_user_notes_count_across_discussions() {
let conn = setup_test_db();
seed_issue(&conn);
seed_discussion_with_notes(&conn, 1, 1, 3, 0); // 3 user notes
seed_discussion_with_notes(&conn, 1, 1, 1, 2); // 1 user note + 2 system
let row = find_issue(&conn, 10, None).unwrap();
assert_eq!(row.user_notes_count, 4);
}
#[test]
fn test_find_issue_notes_count_ignores_other_issues() {
let conn = setup_test_db();
seed_issue(&conn);
// Add a second issue
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, author_username,
created_at, updated_at, last_seen_at)
VALUES (2, 201, 20, 1, 'Other issue', 'opened', 'author', 1000, 2000, 2000)",
[],
)
.unwrap();
// Notes on issue 2, not issue 1
seed_discussion_with_notes(&conn, 2, 1, 5, 0);
let row = find_issue(&conn, 10, None).unwrap();
assert_eq!(row.user_notes_count, 0); // Issue 10 has no notes
}
#[test]
fn test_ansi256_from_rgb() {
assert_eq!(ansi256_from_rgb(0, 0, 0), 16);
assert_eq!(ansi256_from_rgb(255, 255, 255), 231);
// Moved to render.rs — keeping basic hex sanity check
let result = render::style_with_hex("test", Some("#ff0000"));
assert!(!result.is_empty());
}
#[test]
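
Note on the color helpers removed from this file: `style_with_hex` and `ansi256_from_rgb` moved to `render.rs`, and the replacement test only checks that the styled output is non-empty. The sketch below restates the RGB to ANSI-256 mapping as a standalone pair of functions so the arithmetic can be checked in isolation. `parse_hex_rgb` is a hypothetical helper, not a function in this commit, and its `is_ascii` guard is an addition that the removed code did not have.

```rust
/// Parse "#rrggbb" or "rrggbb" into RGB components.
/// Hypothetical helper; returns None for anything that is not six hex digits.
fn parse_hex_rgb(hex: &str) -> Option<(u8, u8, u8)> {
    let hex = hex.trim_start_matches('#');
    // The ASCII check keeps the byte slicing below on char boundaries.
    if hex.len() != 6 || !hex.is_ascii() {
        return None;
    }
    let r = u8::from_str_radix(&hex[0..2], 16).ok()?;
    let g = u8::from_str_radix(&hex[2..4], 16).ok()?;
    let b = u8::from_str_radix(&hex[4..6], 16).ok()?;
    Some((r, g, b))
}

/// Map 8-bit RGB onto the 6x6x6 color cube of the ANSI 256-color palette.
/// Each channel is rounded to one of six levels (0..=5); the cube starts
/// at palette index 16.
fn ansi256_from_rgb(r: u8, g: u8, b: u8) -> u8 {
    let ri = (u16::from(r) * 5 + 127) / 255;
    let gi = (u16::from(g) * 5 + 127) / 255;
    let bi = (u16::from(b) * 5 + 127) / 255;
    (16 + 36 * ri + 6 * gi + bi) as u8
}
```

The two endpoint values (16 for black, 231 for white) match the assertions that this commit removes from `test_ansi256_from_rgb`.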


@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::{self, Theme};
use rusqlite::Connection;
use serde::Serialize;
@@ -322,77 +322,168 @@ fn table_exists(conn: &Connection, table: &str) -> bool {
> 0
}
fn section(title: &str) {
println!("{}", render::section_divider(title));
}
pub fn print_stats(result: &StatsResult) {
println!("{}", style("Documents").cyan().bold());
println!(" Total: {}", result.documents.total);
println!(" Issues: {}", result.documents.issues);
println!(" Merge Requests: {}", result.documents.merge_requests);
println!(" Discussions: {}", result.documents.discussions);
section("Documents");
let mut parts = vec![format!(
"{} total",
render::format_number(result.documents.total)
)];
if result.documents.issues > 0 {
parts.push(format!(
"{} issues",
render::format_number(result.documents.issues)
));
}
if result.documents.merge_requests > 0 {
parts.push(format!(
"{} MRs",
render::format_number(result.documents.merge_requests)
));
}
if result.documents.discussions > 0 {
parts.push(format!(
"{} discussions",
render::format_number(result.documents.discussions)
));
}
println!(" {}", parts.join(" \u{b7} "));
if result.documents.truncated > 0 {
println!(
" Truncated: {}",
style(result.documents.truncated).yellow()
" {}",
Theme::warning().render(&format!(
"{} truncated",
render::format_number(result.documents.truncated)
))
);
}
println!();
println!("{}", style("Search Index").cyan().bold());
println!(" FTS indexed: {}", result.fts.indexed);
section("Search Index");
println!(
" Embedding coverage: {:.1}% ({}/{})",
result.embeddings.coverage_pct,
result.embeddings.embedded_documents,
result.documents.total
" {} FTS indexed",
render::format_number(result.fts.indexed)
);
let coverage_color = if result.embeddings.coverage_pct >= 95.0 {
Theme::success().render(&format!("{:.0}%", result.embeddings.coverage_pct))
} else if result.embeddings.coverage_pct >= 50.0 {
Theme::warning().render(&format!("{:.0}%", result.embeddings.coverage_pct))
} else {
Theme::error().render(&format!("{:.0}%", result.embeddings.coverage_pct))
};
println!(
" {} embedding coverage ({}/{})",
coverage_color,
render::format_number(result.embeddings.embedded_documents),
render::format_number(result.documents.total),
);
if result.embeddings.total_chunks > 0 {
println!(" Total chunks: {}", result.embeddings.total_chunks);
println!(
" {}",
Theme::dim().render(&format!(
"{} chunks",
render::format_number(result.embeddings.total_chunks)
))
);
}
println!();
println!("{}", style("Queues").cyan().bold());
println!(
" Dirty sources: {} pending, {} failed",
result.queues.dirty_sources, result.queues.dirty_sources_failed
);
println!(
" Discussion fetch: {} pending, {} failed",
result.queues.pending_discussion_fetches, result.queues.pending_discussion_fetches_failed
// Queues: only show if there's anything to report
let has_queue_activity = result.queues.dirty_sources > 0
|| result.queues.dirty_sources_failed > 0
|| result.queues.pending_discussion_fetches > 0
|| result.queues.pending_discussion_fetches_failed > 0
|| result.queues.pending_dependent_fetches > 0
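|| result.queues.pending_dependent_fetches_failed > 0
// Stuck fetches also count as activity; otherwise a stuck-only queue prints "all clear"
|| result.queues.pending_dependent_fetches_stuck > 0;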
if has_queue_activity {
section("Queues");
if result.queues.dirty_sources > 0 || result.queues.dirty_sources_failed > 0 {
let mut q = Vec::new();
if result.queues.dirty_sources > 0 {
q.push(format!("{} pending", result.queues.dirty_sources));
}
if result.queues.dirty_sources_failed > 0 {
q.push(
Theme::error()
.render(&format!("{} failed", result.queues.dirty_sources_failed)),
);
}
println!(" dirty sources: {}", q.join(", "));
}
if result.queues.pending_discussion_fetches > 0
|| result.queues.pending_discussion_fetches_failed > 0
{
let mut q = Vec::new();
if result.queues.pending_discussion_fetches > 0 {
q.push(format!(
"{} pending",
result.queues.pending_discussion_fetches
));
}
if result.queues.pending_discussion_fetches_failed > 0 {
q.push(Theme::error().render(&format!(
"{} failed",
result.queues.pending_discussion_fetches_failed
)));
}
println!(" discussion fetch: {}", q.join(", "));
}
if result.queues.pending_dependent_fetches > 0
|| result.queues.pending_dependent_fetches_failed > 0
|| result.queues.pending_dependent_fetches_stuck > 0
{
println!(
" Dependent fetch: {} pending, {} failed, {} stuck",
result.queues.pending_dependent_fetches,
result.queues.pending_dependent_fetches_failed,
let mut q = Vec::new();
if result.queues.pending_dependent_fetches > 0 {
q.push(format!(
"{} pending",
result.queues.pending_dependent_fetches
));
}
if result.queues.pending_dependent_fetches_failed > 0 {
q.push(Theme::error().render(&format!(
"{} failed",
result.queues.pending_dependent_fetches_failed
)));
}
if result.queues.pending_dependent_fetches_stuck > 0 {
q.push(Theme::warning().render(&format!(
"{} stuck",
result.queues.pending_dependent_fetches_stuck
);
)));
}
println!(" dependent fetch: {}", q.join(", "));
}
} else {
section("Queues");
println!(" {}", Theme::success().render("all clear"));
}
if let Some(ref integrity) = result.integrity {
println!();
let status = if integrity.ok {
style("OK").green().bold()
section("Integrity");
if integrity.ok {
println!(
" {} all checks passed",
Theme::success().render("\u{2713}")
);
} else {
style("ISSUES FOUND").red().bold()
};
println!("{} Integrity: {}", style("Check").cyan().bold(), status);
if integrity.fts_doc_mismatch {
println!(" {} FTS/document count mismatch", style("!").red());
println!(
" {} FTS/document count mismatch",
Theme::error().render("\u{2717}")
);
}
if integrity.orphan_embeddings > 0 {
println!(
" {} {} orphan embeddings",
style("!").red(),
Theme::error().render("\u{2717}"),
integrity.orphan_embeddings
);
}
if integrity.stale_metadata > 0 {
println!(
" {} {} stale embedding metadata",
style("!").red(),
Theme::error().render("\u{2717}"),
integrity.stale_metadata
);
}
@@ -401,45 +492,36 @@ pub fn print_stats(result: &StatsResult) {
+ integrity.orphan_milestone_events;
if orphan_events > 0 {
println!(
" {} {} orphan resource events (state: {}, label: {}, milestone: {})",
style("!").red(),
orphan_events,
integrity.orphan_state_events,
integrity.orphan_label_events,
integrity.orphan_milestone_events
" {} {} orphan resource events",
Theme::error().render("\u{2717}"),
orphan_events
);
}
if integrity.queue_stuck_locks > 0 {
println!(
" {} {} stuck queue locks",
style("!").yellow(),
Theme::warning().render("!"),
integrity.queue_stuck_locks
);
}
if integrity.queue_max_attempts > 3 {
println!(
" {} max queue retry attempts: {}",
style("!").yellow(),
integrity.queue_max_attempts
);
}
if let Some(ref repair) = integrity.repair {
println!();
if repair.dry_run {
println!(
"{} {}",
style("Repair").cyan().bold(),
style("(dry run - no changes made)").yellow()
" {} {}",
Theme::bold().render("Repair"),
Theme::warning().render("(dry run)")
);
} else {
println!("{}", style("Repair").cyan().bold());
println!(" {}", Theme::bold().render("Repair"));
}
let action = if repair.dry_run {
style("would fix").yellow()
Theme::warning().render("would fix")
} else {
style("fixed").green()
Theme::success().render("fixed")
};
if repair.fts_rebuilt {
@@ -453,15 +535,17 @@ pub fn print_stats(result: &StatsResult) {
}
if repair.stale_cleared > 0 {
println!(
" {} {} stale metadata entries cleared",
" {} {} stale metadata cleared",
action, repair.stale_cleared
);
}
if !repair.fts_rebuilt && repair.orphans_deleted == 0 && repair.stale_cleared == 0 {
println!(" No issues to repair.");
println!(" {}", Theme::dim().render("nothing to repair"));
}
}
}
println!();
}
#[derive(Serialize)]
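
Note on number formatting: `print_stats` now routes counts through `render::format_number`, whose body is not shown in this diff. The sketch below assumes it behaves like the local `format_number` that this commit removes from the sync-status module (comma as the thousands separator, leading minus for negatives). It is an illustration of that grouping logic, not the actual `render.rs` implementation.

```rust
/// Format an integer with comma thousands separators, e.g. 1234567 -> "1,234,567".
/// Assumed behavior, modeled on the helper removed elsewhere in this commit.
fn format_number(n: i64) -> String {
    // unsigned_abs avoids overflow when n == i64::MIN.
    let digits = n.unsigned_abs().to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3 + 1);
    if n < 0 {
        out.push('-');
    }
    for (i, c) in digits.chars().enumerate() {
        // Insert a comma whenever the count of remaining digits is a multiple of three.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(c);
    }
    out
}
```

The sketch uses `% 3 == 0` rather than `is_multiple_of(3)` so that it also compiles on older toolchains; the grouping is otherwise the same.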

File diff suppressed because it is too large


@@ -1,4 +1,4 @@
use console::style;
use crate::cli::render::{self, Theme};
use rusqlite::Connection;
use serde::Serialize;
@@ -166,27 +166,6 @@ fn format_duration(ms: i64) -> String {
}
}
fn format_number(n: i64) -> String {
let is_negative = n < 0;
let abs_n = n.unsigned_abs();
let s = abs_n.to_string();
let chars: Vec<char> = s.chars().collect();
let mut result = String::new();
if is_negative {
result.push('-');
}
for (i, c) in chars.iter().enumerate() {
if i > 0 && (chars.len() - i).is_multiple_of(3) {
result.push(',');
}
result.push(*c);
}
result
}
#[derive(Serialize)]
struct SyncStatusJsonOutput {
ok: bool,
@@ -293,14 +272,14 @@ pub fn print_sync_status_json(result: &SyncStatusResult, elapsed_ms: u64) {
}
pub fn print_sync_status(result: &SyncStatusResult) {
println!("{}", style("Recent Sync Runs").bold().underlined());
println!("{}", Theme::bold().underline().render("Recent Sync Runs"));
println!();
if result.runs.is_empty() {
println!(" {}", style("No sync runs recorded yet.").dim());
println!(" {}", Theme::dim().render("No sync runs recorded yet."));
println!(
" {}",
style("Run 'lore sync' or 'lore ingest' to start.").dim()
Theme::dim().render("Run 'lore sync' or 'lore ingest' to start.")
);
} else {
for run in &result.runs {
@@ -310,16 +289,16 @@ pub fn print_sync_status(result: &SyncStatusResult) {
println!();
println!("{}", style("Cursor Positions").bold().underlined());
println!("{}", Theme::bold().underline().render("Cursor Positions"));
println!();
if result.cursors.is_empty() {
println!(" {}", style("No cursors recorded yet.").dim());
println!(" {}", Theme::dim().render("No cursors recorded yet."));
} else {
for cursor in &result.cursors {
println!(
" {} ({}):",
style(&cursor.project_path).cyan(),
Theme::info().render(&cursor.project_path),
cursor.resource_type
);
@@ -328,7 +307,10 @@ pub fn print_sync_status(result: &SyncStatusResult) {
println!(" Last updated_at: {}", ms_to_iso(ts));
}
_ => {
println!(" Last updated_at: {}", style("Not started").dim());
println!(
" Last updated_at: {}",
Theme::dim().render("Not started")
);
}
}
@@ -340,40 +322,39 @@ pub fn print_sync_status(result: &SyncStatusResult) {
println!();
println!("{}", style("Data Summary").bold().underlined());
println!("{}", Theme::bold().underline().render("Data Summary"));
println!();
println!(
" Issues: {}",
style(format_number(result.summary.issue_count)).bold()
Theme::bold().render(&render::format_number(result.summary.issue_count))
);
println!(
" MRs: {}",
style(format_number(result.summary.mr_count)).bold()
Theme::bold().render(&render::format_number(result.summary.mr_count))
);
println!(
" Discussions: {}",
style(format_number(result.summary.discussion_count)).bold()
Theme::bold().render(&render::format_number(result.summary.discussion_count))
);
let user_notes = result.summary.note_count - result.summary.system_note_count;
println!(
" Notes: {} {}",
style(format_number(user_notes)).bold(),
style(format!(
Theme::bold().render(&render::format_number(user_notes)),
Theme::dim().render(&format!(
"(excluding {} system)",
format_number(result.summary.system_note_count)
render::format_number(result.summary.system_note_count)
))
.dim()
);
}
fn print_run_line(run: &SyncRunInfo) {
let status_styled = match run.status.as_str() {
"succeeded" => style(&run.status).green(),
"failed" => style(&run.status).red(),
"running" => style(&run.status).yellow(),
_ => style(&run.status).dim(),
"succeeded" => Theme::success().render(&run.status),
"failed" => Theme::error().render(&run.status),
"running" => Theme::warning().render(&run.status),
_ => Theme::dim().render(&run.status),
};
let run_label = run
@@ -386,9 +367,9 @@ fn print_run_line(run: &SyncRunInfo) {
let time = format_full_datetime(run.started_at);
let mut parts = vec![
format!("{}", style(run_label).bold()),
format!("{status_styled}"),
format!("{}", style(&run.command).dim()),
Theme::bold().render(&run_label),
status_styled,
Theme::dim().render(&run.command),
time,
];
@@ -403,16 +384,13 @@ fn print_run_line(run: &SyncRunInfo) {
}
if run.total_errors > 0 {
parts.push(format!(
"{}",
style(format!("{} errors", run.total_errors)).red()
));
parts.push(Theme::error().render(&format!("{} errors", run.total_errors)));
}
println!(" {}", parts.join(" | "));
if let Some(error) = &run.error {
println!(" {}", style(error).red());
println!(" {}", Theme::error().render(error));
}
}
@@ -448,7 +426,7 @@ mod tests {
#[test]
fn format_number_adds_thousands_separators() {
assert_eq!(format_number(1000), "1,000");
assert_eq!(format_number(1234567), "1,234,567");
assert_eq!(render::format_number(1000), "1,000");
assert_eq!(render::format_number(1234567), "1,234,567");
}
}


@@ -1,7 +1,8 @@
use console::{Alignment, pad_str, style};
use crate::cli::render::{self, Icons, Theme};
use serde::Serialize;
use crate::Config;
use crate::cli::progress::stage_spinner_v2;
use crate::core::db::create_connection;
use crate::core::error::{LoreError, Result};
use crate::core::paths::get_db_path;
@@ -12,7 +13,8 @@ use crate::core::timeline::{
};
use crate::core::timeline_collect::collect_events;
use crate::core::timeline_expand::expand_timeline;
use crate::core::timeline_seed::seed_timeline;
use crate::core::timeline_seed::{seed_timeline, seed_timeline_direct};
use crate::embedding::ollama::{OllamaClient, OllamaConfig};
/// Parameters for running the timeline pipeline.
pub struct TimelineParams {
@@ -20,15 +22,53 @@ pub struct TimelineParams {
pub project: Option<String>,
pub since: Option<String>,
pub depth: u32,
pub expand_mentions: bool,
pub no_mentions: bool,
pub limit: usize,
pub max_seeds: usize,
pub max_entities: usize,
pub max_evidence: usize,
pub robot_mode: bool,
}
/// Parsed timeline query: either a search string or a direct entity reference.
enum TimelineQuery {
Search(String),
EntityDirect { entity_type: String, iid: i64 },
}
/// Parse the timeline query for entity-direct patterns.
///
/// Recognized patterns (case-insensitive prefix):
/// - `issue:N`, `i:N` -> issue
/// - `mr:N`, `m:N` -> merge_request
/// - Anything else -> search query
fn parse_timeline_query(query: &str) -> TimelineQuery {
let query = query.trim();
if let Some((prefix, rest)) = query.split_once(':') {
let prefix_lower = prefix.to_ascii_lowercase();
if let Ok(iid) = rest.trim().parse::<i64>() {
match prefix_lower.as_str() {
"issue" | "i" => {
return TimelineQuery::EntityDirect {
entity_type: "issue".to_owned(),
iid,
};
}
"mr" | "m" => {
return TimelineQuery::EntityDirect {
entity_type: "merge_request".to_owned(),
iid,
};
}
_ => {}
}
}
}
TimelineQuery::Search(query.to_owned())
}
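
One edge case in `parse_timeline_query` as written: `"-5".parse::<i64>()` succeeds, so `issue:-5` becomes an `EntityDirect` with a negative IID rather than falling back to search. The following is a hedged sketch of a stricter variant; the name `parse_timeline_query_strict` and the `iid > 0` rule are assumptions introduced here (GitLab IIDs are expected to start at 1), not part of the commit:

```rust
#[derive(Debug, PartialEq)]
enum TimelineQuery {
    Search(String),
    EntityDirect { entity_type: String, iid: i64 },
}

/// Hypothetical stricter parser: same prefixes as the diff, but only positive IIDs
/// are treated as direct entity references.
fn parse_timeline_query_strict(query: &str) -> TimelineQuery {
    let query = query.trim();
    if let Some((prefix, rest)) = query.split_once(':') {
        let entity_type = match prefix.to_ascii_lowercase().as_str() {
            "issue" | "i" => Some("issue"),
            "mr" | "m" => Some("merge_request"),
            _ => None,
        };
        if let (Some(entity_type), Ok(iid)) = (entity_type, rest.trim().parse::<i64>()) {
            // Assumption: zero and negative numbers are never valid IIDs.
            if iid > 0 {
                return TimelineQuery::EntityDirect {
                    entity_type: entity_type.to_owned(),
                    iid,
                };
            }
        }
    }
    TimelineQuery::Search(query.to_owned())
}
```

Whether the extra check is worth it depends on how `seed_timeline_direct` reports a missing entity; if it already returns a clear "not found" error, the looser parse may be acceptable.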
/// Run the full timeline pipeline: SEED -> EXPAND -> COLLECT.
pub fn run_timeline(config: &Config, params: &TimelineParams) -> Result<TimelineResult> {
pub async fn run_timeline(config: &Config, params: &TimelineParams) -> Result<TimelineResult> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
@@ -50,37 +90,90 @@ pub fn run_timeline(config: &Config, params: &TimelineParams) -> Result<Timeline
})
.transpose()?;
// Stage 1+2: SEED + HYDRATE
let seed_result = seed_timeline(
// Parse query for entity-direct syntax (issue:N, mr:N, i:N, m:N)
let parsed_query = parse_timeline_query(&params.query);
let seed_result = match parsed_query {
TimelineQuery::EntityDirect { entity_type, iid } => {
// Direct seeding: synchronous, no Ollama needed
let spinner = stage_spinner_v2(
Icons::search(),
"Resolve",
"Resolving entity...",
params.robot_mode,
);
let result = seed_timeline_direct(&conn, &entity_type, iid, project_id)?;
spinner.finish_and_clear();
result
}
TimelineQuery::Search(ref query) => {
// Construct OllamaClient for hybrid search (same pattern as run_search)
let ollama_cfg = &config.embedding;
let client = OllamaClient::new(OllamaConfig {
base_url: ollama_cfg.base_url.clone(),
model: ollama_cfg.model.clone(),
..OllamaConfig::default()
});
// Stage 1+2: SEED + HYDRATE (hybrid search with FTS fallback)
let spinner = stage_spinner_v2(
Icons::search(),
"Seed",
"Seeding timeline...",
params.robot_mode,
);
let result = seed_timeline(
&conn,
&params.query,
Some(&client),
query,
project_id,
since_ms,
params.max_seeds,
params.max_evidence,
)?;
)
.await?;
spinner.finish_and_clear();
result
}
};
// Stage 3: EXPAND
let spinner = stage_spinner_v2(
Icons::sync(),
"Expand",
"Expanding cross-references...",
params.robot_mode,
);
let expand_result = expand_timeline(
&conn,
&seed_result.seed_entities,
params.depth,
params.expand_mentions,
!params.no_mentions,
params.max_entities,
)?;
spinner.finish_and_clear();
// Stage 4: COLLECT
let spinner = stage_spinner_v2(
Icons::sync(),
"Collect",
"Collecting events...",
params.robot_mode,
);
let (events, total_before_limit) = collect_events(
&conn,
&seed_result.seed_entities,
&expand_result.expanded_entities,
&seed_result.evidence_notes,
&seed_result.matched_discussions,
since_ms,
params.limit,
)?;
spinner.finish_and_clear();
Ok(TimelineResult {
query: params.query.clone(),
search_mode: seed_result.search_mode,
events,
total_events_before_limit: total_before_limit,
seed_entities: seed_result.seed_entities,
@@ -98,19 +191,21 @@ pub fn print_timeline(result: &TimelineResult) {
println!();
println!(
"{}",
style(format!(
Theme::bold().render(&format!(
"Timeline: \"{}\" ({} events across {} entities)",
result.query,
result.events.len(),
entity_count,
))
.bold()
);
println!("{}", "".repeat(60));
println!("{}", "\u{2500}".repeat(60));
println!();
if result.events.is_empty() {
println!(" {}", style("No events found for this query.").dim());
println!(
" {}",
Theme::dim().render("No events found for this query.")
);
println!();
return;
}
@@ -120,13 +215,18 @@ pub fn print_timeline(result: &TimelineResult) {
}
println!();
println!("{}", "".repeat(60));
println!("{}", "\u{2500}".repeat(60));
print_timeline_footer(result);
}
fn print_timeline_event(event: &TimelineEvent) {
let date = format_date(event.timestamp);
let date = render::format_date(event.timestamp);
let tag = format_event_tag(&event.event_type);
let entity_icon = match event.entity_type.as_str() {
"issue" => Icons::issue_opened(),
"merge_request" => Icons::mr_opened(),
_ => "",
};
let entity_ref = format_entity_ref(&event.entity_type, event.entity_iid);
let actor = event
.actor
@@ -135,21 +235,41 @@ fn print_timeline_event(event: &TimelineEvent) {
.unwrap_or_default();
let expanded_marker = if event.is_seed { "" } else { " [expanded]" };
let summary = truncate_summary(&event.summary, 50);
let tag_padded = pad_str(&tag, 12, Alignment::Left, None);
println!("{date} {tag_padded} {entity_ref:7} {summary:50} {actor}{expanded_marker}");
let summary = render::truncate(&event.summary, 50);
println!("{date} {tag} {entity_icon}{entity_ref:7} {summary:50} {actor}{expanded_marker}");
// Show snippet for evidence notes
if let TimelineEventType::NoteEvidence { snippet, .. } = &event.event_type
&& !snippet.is_empty()
{
for line in wrap_snippet(snippet, 60) {
let mut lines = render::wrap_lines(snippet, 60);
lines.truncate(4);
for line in lines {
println!(
" \"{}\"",
style(line).dim()
Theme::dim().render(&line)
);
}
}
// Show full discussion thread
if let TimelineEventType::DiscussionThread { notes, .. } = &event.event_type {
let bar = "\u{2500}".repeat(44);
println!(" \u{2500}\u{2500} Discussion {bar}");
for note in notes {
let note_date = render::format_date(note.created_at);
let author = note
.author
.as_deref()
.map(|a| format!("@{a}"))
.unwrap_or_else(|| "unknown".to_owned());
println!(" {} ({note_date}):", Theme::bold().render(&author));
for line in render::wrap_lines(&note.body, 60) {
println!(" {line}");
}
}
println!(" {}", "\u{2500}".repeat(60));
}
}
fn print_timeline_footer(result: &TimelineResult) {
@@ -180,22 +300,33 @@ fn print_timeline_footer(result: &TimelineResult) {
println!();
}
/// Format event tag: pad plain text to TAG_WIDTH, then apply style.
const TAG_WIDTH: usize = 11;
fn format_event_tag(event_type: &TimelineEventType) -> String {
match event_type {
TimelineEventType::Created => style("CREATED").green().to_string(),
let (label, style) = match event_type {
TimelineEventType::Created => ("CREATED", Theme::success()),
TimelineEventType::StateChanged { state } => match state.as_str() {
"closed" => style("CLOSED").red().to_string(),
"reopened" => style("REOPENED").yellow().to_string(),
_ => style(state.to_uppercase()).dim().to_string(),
"closed" => ("CLOSED", Theme::error()),
"reopened" => ("REOPENED", Theme::warning()),
_ => return style_padded(&state.to_uppercase(), TAG_WIDTH, Theme::dim()),
},
TimelineEventType::LabelAdded { .. } => style("LABEL+").blue().to_string(),
TimelineEventType::LabelRemoved { .. } => style("LABEL-").blue().to_string(),
TimelineEventType::MilestoneSet { .. } => style("MILESTONE+").magenta().to_string(),
TimelineEventType::MilestoneRemoved { .. } => style("MILESTONE-").magenta().to_string(),
TimelineEventType::Merged => style("MERGED").cyan().to_string(),
TimelineEventType::NoteEvidence { .. } => style("NOTE").dim().to_string(),
TimelineEventType::CrossReferenced { .. } => style("REF").dim().to_string(),
}
TimelineEventType::LabelAdded { .. } => ("LABEL+", Theme::info()),
TimelineEventType::LabelRemoved { .. } => ("LABEL-", Theme::info()),
TimelineEventType::MilestoneSet { .. } => ("MILESTONE+", Theme::accent()),
TimelineEventType::MilestoneRemoved { .. } => ("MILESTONE-", Theme::accent()),
TimelineEventType::Merged => ("MERGED", Theme::info()),
TimelineEventType::NoteEvidence { .. } => ("NOTE", Theme::dim()),
TimelineEventType::DiscussionThread { .. } => ("THREAD", Theme::warning()),
TimelineEventType::CrossReferenced { .. } => ("REF", Theme::dim()),
};
style_padded(label, TAG_WIDTH, style)
}
/// Pad text to width, then apply lipgloss style (so ANSI codes don't break alignment).
fn style_padded(text: &str, width: usize, style: lipgloss::Style) -> String {
let padded = format!("{:<width$}", text);
style.render(&padded)
}
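
The doc comment on `style_padded` says padding must happen before styling so that ANSI codes do not break alignment. A small sketch of why, using raw SGR escape sequences instead of `lipgloss` (the `ansi`, `pad_after_style`, and `visible_width` helpers are hypothetical names introduced for this illustration):

```rust
/// Wrap text in an ANSI SGR color sequence (hypothetical stand-in for a theme style).
fn ansi(code: u8, text: &str) -> String {
    format!("\x1b[{code}m{text}\x1b[0m")
}

/// Pad first, then style: the visible text always occupies `width` columns.
fn style_padded(text: &str, width: usize, code: u8) -> String {
    ansi(code, &format!("{text:<width$}"))
}

/// Style first, then pad: the escape characters count toward the width,
/// so the formatter adds too little padding or none at all.
fn pad_after_style(text: &str, width: usize, code: u8) -> String {
    format!("{:<width$}", ansi(code, text))
}

/// Count characters outside escape sequences of the form ESC [ ... m.
fn visible_width(s: &str) -> usize {
    let mut count = 0;
    let mut in_escape = false;
    for c in s.chars() {
        if in_escape {
            if c == 'm' {
                in_escape = false;
            }
        } else if c == '\x1b' {
            in_escape = true;
        } else {
            count += 1;
        }
    }
    count
}
```

With a width of 11, `style_padded("MERGED", 11, 36)` shows 11 visible columns, while `pad_after_style("MERGED", 11, 36)` shows only 6, because the styled string is already longer than 11 characters once the escape bytes are counted.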
fn format_entity_ref(entity_type: &str, iid: i64) -> String {
@@ -206,44 +337,6 @@ fn format_entity_ref(entity_type: &str, iid: i64) -> String {
}
}
fn format_date(ms: i64) -> String {
let iso = ms_to_iso(ms);
iso.split('T').next().unwrap_or(&iso).to_string()
}
fn truncate_summary(s: &str, max: usize) -> String {
if s.chars().count() <= max {
s.to_owned()
} else {
let truncated: String = s.chars().take(max - 3).collect();
format!("{truncated}...")
}
}
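
The removed `truncate_summary` computes `max - 3`, which would underflow if it were ever called with `max < 3` on a long string; the call site in this diff passes 50, so this is only a latent concern. A hedged sketch of a width-safe variant follows; the name `truncate` mirrors the `render::truncate` call above, but that helper's body is not shown in the diff, so this is an assumption about its shape, not a copy of it:

```rust
/// Truncate to at most `max` characters, ending in dots when text was cut.
/// Illustrative stand-in; not the project's `render::truncate`.
fn truncate(s: &str, max: usize) -> String {
    if s.chars().count() <= max {
        return s.to_owned();
    }
    // Reserve up to three columns for the ellipsis without underflowing on tiny widths.
    let keep = max.saturating_sub(3);
    let mut out: String = s.chars().take(keep).collect();
    out.push_str(&".".repeat(max - keep));
    out
}
```

Counting with `chars()` keeps multi-byte text from being split mid-character, which the original also did.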
fn wrap_snippet(text: &str, width: usize) -> Vec<String> {
let mut lines = Vec::new();
let mut current = String::new();
for word in text.split_whitespace() {
if current.is_empty() {
current = word.to_string();
} else if current.len() + 1 + word.len() <= width {
current.push(' ');
current.push_str(word);
} else {
lines.push(current);
current = word.to_string();
}
}
if !current.is_empty() {
lines.push(current);
}
// Cap at 4 lines
lines.truncate(4);
lines
}
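
The removed `wrap_snippet` measures line length with `len()`, which counts bytes, so non-ASCII text wraps earlier than the requested width. Below is a minimal sketch of a greedy word wrap that counts characters instead. The name `wrap_lines` echoes the `render::wrap_lines` call in the new code, but the real helper is not visible in this diff, so treat this as an assumed equivalent:

```rust
/// Greedy word wrap measured in characters rather than bytes.
/// Illustrative stand-in; not the project's `render::wrap_lines`.
fn wrap_lines(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    let mut current_chars = 0usize;
    for word in text.split_whitespace() {
        let word_chars = word.chars().count();
        if current.is_empty() {
            current.push_str(word);
            current_chars = word_chars;
        } else if current_chars + 1 + word_chars <= width {
            // The word fits on the current line with one separating space.
            current.push(' ');
            current.push_str(word);
            current_chars += 1 + word_chars;
        } else {
            // Start a new line; a word longer than `width` stays on its own line.
            lines.push(std::mem::take(&mut current));
            current.push_str(word);
            current_chars = word_chars;
        }
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

The four-line cap is left to the caller, matching the new call site where `lines.truncate(4)` is applied after wrapping. That keeps the helper reusable for the full discussion thread output, which is not capped.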
// ─── Robot JSON output ───────────────────────────────────────────────────────
/// Render timeline as robot-mode JSON in {ok, data, meta} envelope.
@@ -251,19 +344,20 @@ pub fn print_timeline_json_with_meta(
result: &TimelineResult,
total_events_before_limit: usize,
depth: u32,
expand_mentions: bool,
include_mentions: bool,
fields: Option<&[String]>,
) {
let output = TimelineJsonEnvelope {
ok: true,
data: TimelineDataJson::from_result(result),
meta: TimelineMetaJson {
search_mode: "lexical".to_owned(),
search_mode: result.search_mode.clone(),
expansion_depth: depth,
expand_mentions,
include_mentions,
total_entities: result.seed_entities.len() + result.expanded_entities.len(),
total_events: total_events_before_limit,
evidence_notes_included: count_evidence_notes(&result.events),
discussion_threads_included: count_discussion_threads(&result.events),
unresolved_references: result.unresolved_references.len(),
showing: result.events.len(),
},
@@ -461,6 +555,22 @@ fn event_type_to_json(event_type: &TimelineEventType) -> (String, serde_json::Va
"discussion_id": discussion_id,
}),
),
TimelineEventType::DiscussionThread {
discussion_id,
notes,
} => (
"discussion_thread".to_owned(),
serde_json::json!({
"discussion_id": discussion_id,
"note_count": notes.len(),
"notes": notes.iter().map(|n| serde_json::json!({
"note_id": n.note_id,
"author": n.author,
"body": n.body,
"created_at": ms_to_iso(n.created_at),
})).collect::<Vec<_>>(),
}),
),
TimelineEventType::CrossReferenced { target } => (
"cross_referenced".to_owned(),
serde_json::json!({ "target": target }),
@@ -472,10 +582,11 @@ fn event_type_to_json(event_type: &TimelineEventType) -> (String, serde_json::Va
struct TimelineMetaJson {
search_mode: String,
expansion_depth: u32,
expand_mentions: bool,
include_mentions: bool,
total_entities: usize,
total_events: usize,
evidence_notes_included: usize,
discussion_threads_included: usize,
unresolved_references: usize,
showing: usize,
}
@@ -486,3 +597,91 @@ fn count_evidence_notes(events: &[TimelineEvent]) -> usize {
.filter(|e| matches!(e.event_type, TimelineEventType::NoteEvidence { .. }))
.count()
}
fn count_discussion_threads(events: &[TimelineEvent]) -> usize {
events
.iter()
.filter(|e| matches!(e.event_type, TimelineEventType::DiscussionThread { .. }))
.count()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_issue_colon_number() {
let q = parse_timeline_query("issue:42");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
}
#[test]
fn test_parse_i_colon_number() {
let q = parse_timeline_query("i:42");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
}
#[test]
fn test_parse_mr_colon_number() {
let q = parse_timeline_query("mr:99");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
);
}
#[test]
fn test_parse_m_colon_number() {
let q = parse_timeline_query("m:99");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
);
}
#[test]
fn test_parse_case_insensitive() {
let q = parse_timeline_query("ISSUE:42");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
let q = parse_timeline_query("MR:99");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
);
let q = parse_timeline_query("Issue:7");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 7)
);
}
#[test]
fn test_parse_search_fallback() {
let q = parse_timeline_query("switch health");
assert!(matches!(q, TimelineQuery::Search(ref s) if s == "switch health"));
}
#[test]
fn test_parse_non_numeric_falls_back_to_search() {
let q = parse_timeline_query("issue:abc");
assert!(matches!(q, TimelineQuery::Search(_)));
}
#[test]
fn test_parse_unknown_prefix_falls_back_to_search() {
let q = parse_timeline_query("foo:42");
assert!(matches!(q, TimelineQuery::Search(_)));
}
#[test]
fn test_parse_whitespace_trimmed() {
let q = parse_timeline_query(" issue:42 ");
assert!(
matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
);
}
}

src/cli/commands/trace.rs (new file, +242 lines)

@@ -0,0 +1,242 @@
use crate::cli::render::{Icons, Theme};
use crate::core::trace::{TraceChain, TraceResult};
/// Parse a path with optional `:line` suffix.
///
/// Handles Windows drive letters (e.g. `C:/foo.rs`) by checking that the
/// prefix before the colon is not a single ASCII letter.
pub fn parse_trace_path(input: &str) -> (String, Option<u32>) {
if let Some((path, suffix)) = input.rsplit_once(':')
&& !path.is_empty()
&& let Ok(line) = suffix.parse::<u32>()
// Reject Windows drive letters: single ASCII letter before colon
&& (path.len() > 1 || !path.chars().next().unwrap_or(' ').is_ascii_alphabetic())
{
return (path.to_string(), Some(line));
}
(input.to_string(), None)
}
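
`parse_trace_path` relies on `if let` chains, which, as far as I am aware, need a recent toolchain and the 2024 edition. A hedged sketch of the same logic with nested conditions, for readers on older editions (the `looks_like_drive` local is a name introduced here):

```rust
/// Parse a path with an optional `:line` suffix, without `if let` chains.
pub fn parse_trace_path(input: &str) -> (String, Option<u32>) {
    if let Some((path, suffix)) = input.rsplit_once(':') {
        // A lone ASCII letter before the colon looks like a Windows drive (`C:`).
        let looks_like_drive =
            path.len() == 1 && path.chars().all(|c| c.is_ascii_alphabetic());
        if !path.is_empty() && !looks_like_drive {
            if let Ok(line) = suffix.parse::<u32>() {
                return (path.to_string(), Some(line));
            }
        }
    }
    (input.to_string(), None)
}
```

Because `rsplit_once` splits at the last colon, `C:/foo.rs:10` still yields line 10: the drive-letter colon stays inside the path. Input such as `C:42` is returned unchanged with no line number.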
// ── Human output ────────────────────────────────────────────────────────────
pub fn print_trace(result: &TraceResult) {
let chain_info = if result.total_chains == 1 {
"1 chain".to_string()
} else {
format!("{} chains", result.total_chains)
};
let paths_info = if result.resolved_paths.len() > 1 {
format!(", {} paths", result.resolved_paths.len())
} else {
String::new()
};
println!();
println!(
"{}",
Theme::bold().render(&format!(
"Trace: {} ({}{})",
result.path, chain_info, paths_info
))
);
// Rename chain
if result.renames_followed && result.resolved_paths.len() > 1 {
let chain_str: Vec<&str> = result.resolved_paths.iter().map(String::as_str).collect();
println!(
" Rename chain: {}",
Theme::dim().render(&chain_str.join(" -> "))
);
}
if result.trace_chains.is_empty() {
println!(
"\n {} {}",
Icons::info(),
Theme::dim().render("No trace chains found for this file.")
);
println!(
" {}",
Theme::dim()
.render("Hint: Run 'lore sync' to fetch MR file changes and cross-references.")
);
println!();
return;
}
println!();
for chain in &result.trace_chains {
print_chain(chain);
}
println!();
}
fn print_chain(chain: &TraceChain) {
let (icon, state_style) = match chain.mr_state.as_str() {
"merged" => (Icons::mr_merged(), Theme::accent()),
"opened" => (Icons::mr_opened(), Theme::success()),
"closed" => (Icons::mr_closed(), Theme::warning()),
_ => (Icons::mr_opened(), Theme::dim()),
};
let date = chain
.merged_at_iso
.as_deref()
.or(Some(chain.updated_at_iso.as_str()))
.unwrap_or("")
.split('T')
.next()
.unwrap_or("");
println!(
" {} {} {} {} @{} {} {}",
icon,
Theme::accent().render(&format!("!{}", chain.mr_iid)),
chain.mr_title,
state_style.render(&chain.mr_state),
chain.mr_author,
date,
Theme::dim().render(&chain.change_type),
);
// Linked issues
for issue in &chain.issues {
let ref_icon = match issue.reference_type.as_str() {
"closes" => Icons::issue_closed(),
_ => Icons::issue_opened(),
};
println!(
" {} #{} {} {} [{}]",
ref_icon,
issue.iid,
issue.title,
Theme::dim().render(&issue.state),
Theme::dim().render(&issue.reference_type),
);
}
// Discussions
for disc in &chain.discussions {
let date = disc.created_at_iso.split('T').next().unwrap_or("");
println!(
" {} @{} ({}) [{}]: {}",
Icons::note(),
disc.author_username,
date,
Theme::dim().render(&disc.path),
disc.body
);
}
}
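
In `print_chain`, the date is derived through `.or(Some(..)).unwrap_or("")`, where the final fallback can never be reached because `or(Some(..))` always yields `Some`. A small sketch of an equivalent, shorter form; `display_date` is a hypothetical helper name, and the two parameters stand in for `chain.merged_at_iso` and `chain.updated_at_iso`:

```rust
/// Pick the merge timestamp when present, else the update timestamp,
/// and keep only the date part before the `T` separator.
fn display_date<'a>(merged_at_iso: Option<&'a str>, updated_at_iso: &'a str) -> &'a str {
    let iso = merged_at_iso.unwrap_or(updated_at_iso);
    // `split` always yields at least one item, so the fallback is only defensive.
    iso.split('T').next().unwrap_or(iso)
}
```

Behavior is unchanged from the diff; the sketch only removes the unreachable empty-string branch.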
// ── Robot (JSON) output ─────────────────────────────────────────────────────
/// Maximum body length in robot JSON output (token efficiency).
const ROBOT_BODY_SNIPPET_LEN: usize = 500;
fn truncate_body(body: &str, max: usize) -> String {
if body.len() <= max {
return body.to_string();
}
let boundary = body.floor_char_boundary(max);
format!("{}...", &body[..boundary])
}
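
`truncate_body` depends on `str::floor_char_boundary`, which may not be available on every toolchain. As a sketch under that assumption, the same boundary can be found with `is_char_boundary`, which has long been stable. Note that the limit is measured in bytes (`body.len()`), not characters, in both versions:

```rust
/// Truncate `body` to at most `max` bytes on a UTF-8 boundary, then append "...".
fn truncate_body(body: &str, max: usize) -> String {
    if body.len() <= max {
        return body.to_string();
    }
    // Walk back from `max` until the index sits on a character boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut boundary = max;
    while !body.is_char_boundary(boundary) {
        boundary -= 1;
    }
    format!("{}...", &body[..boundary])
}
```

As in the diff, the returned string can be up to three bytes longer than `max` because the ellipsis is appended after the cut.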
pub fn print_trace_json(result: &TraceResult, elapsed_ms: u64, line_requested: Option<u32>) {
// Truncate discussion bodies for token efficiency in robot mode
let chains: Vec<serde_json::Value> = result
.trace_chains
.iter()
.map(|chain| {
let discussions: Vec<serde_json::Value> = chain
.discussions
.iter()
.map(|d| {
serde_json::json!({
"discussion_id": d.discussion_id,
"mr_iid": d.mr_iid,
"author_username": d.author_username,
"body_snippet": truncate_body(&d.body, ROBOT_BODY_SNIPPET_LEN),
"path": d.path,
"created_at_iso": d.created_at_iso,
})
})
.collect();
serde_json::json!({
"mr_iid": chain.mr_iid,
"mr_title": chain.mr_title,
"mr_state": chain.mr_state,
"mr_author": chain.mr_author,
"change_type": chain.change_type,
"merged_at_iso": chain.merged_at_iso,
"updated_at_iso": chain.updated_at_iso,
"web_url": chain.web_url,
"issues": chain.issues,
"discussions": discussions,
})
})
.collect();
let output = serde_json::json!({
"ok": true,
"data": {
"path": result.path,
"resolved_paths": result.resolved_paths,
"trace_chains": chains,
},
"meta": {
"tier": "api_only",
"line_requested": line_requested,
"elapsed_ms": elapsed_ms,
"total_chains": result.total_chains,
"renames_followed": result.renames_followed,
}
});
println!("{}", serde_json::to_string(&output).unwrap_or_default());
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parse_trace_path_simple() {
let (path, line) = parse_trace_path("src/foo.rs");
assert_eq!(path, "src/foo.rs");
assert_eq!(line, None);
}
#[test]
fn test_parse_trace_path_with_line() {
let (path, line) = parse_trace_path("src/foo.rs:42");
assert_eq!(path, "src/foo.rs");
assert_eq!(line, Some(42));
}
#[test]
fn test_parse_trace_path_windows() {
let (path, line) = parse_trace_path("C:/foo.rs");
assert_eq!(path, "C:/foo.rs");
assert_eq!(line, None);
}
#[test]
fn test_parse_trace_path_directory() {
let (path, line) = parse_trace_path("src/auth/");
assert_eq!(path, "src/auth/");
assert_eq!(line, None);
}
#[test]
fn test_parse_trace_path_with_line_zero() {
let (path, line) = parse_trace_path("file.rs:0");
assert_eq!(path, "file.rs");
assert_eq!(line, Some(0));
}
}

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -1,6 +1,7 @@
pub mod autocorrect;
pub mod commands;
pub mod progress;
pub mod render;
pub mod robot;
use clap::{Parser, Subcommand};
@@ -10,6 +11,7 @@ use std::io::IsTerminal;
#[command(name = "lore")]
#[command(version = env!("LORE_VERSION"), about = "Local GitLab data management with semantic search", long_about = None)]
#[command(subcommand_required = false)]
#[command(infer_subcommands = true)]
#[command(after_long_help = "\x1b[1mEnvironment:\x1b[0m
GITLAB_TOKEN GitLab personal access token (or name set in config)
LORE_ROBOT Enable robot/JSON mode (non-empty, non-zero value)
@@ -42,6 +44,10 @@ pub struct Cli {
#[arg(long, global = true, value_parser = ["auto", "always", "never"], default_value = "auto", help = "Color output: auto (default), always, or never")]
pub color: String,
/// Icon set: nerd (Nerd Fonts), unicode, or ascii
#[arg(long, global = true, value_parser = ["nerd", "unicode", "ascii"], help = "Icon set: nerd (Nerd Fonts), unicode, or ascii")]
pub icons: Option<String>,
/// Suppress non-essential output
#[arg(
short = 'q',
@@ -107,11 +113,21 @@ impl Cli {
#[allow(clippy::large_enum_variant)]
pub enum Commands {
/// List or show issues
#[command(visible_alias = "issue")]
Issues(IssuesArgs),
/// List or show merge requests
#[command(
visible_alias = "mr",
alias = "merge-requests",
alias = "merge-request"
)]
Mrs(MrsArgs),
/// List notes from discussions
#[command(visible_alias = "note")]
Notes(NotesArgs),
/// Ingest data from GitLab
Ingest(IngestArgs),
@@ -119,6 +135,7 @@ pub enum Commands {
Count(CountArgs),
/// Show sync state
#[command(visible_alias = "st")]
Status,
/// Verify GitLab authentication
@@ -167,9 +184,11 @@ pub enum Commands {
},
/// Search indexed documents
#[command(visible_alias = "find", alias = "query")]
Search(SearchArgs),
/// Show document and index statistics
#[command(visible_alias = "stat")]
Stats(StatsArgs),
/// Generate searchable documents from ingested data
@@ -215,6 +234,13 @@ pub enum Commands {
/// People intelligence: experts, workload, active discussions, overlap
Who(WhoArgs),
/// Show MRs that touched a file, with linked discussions
#[command(name = "file-history")]
FileHistory(FileHistoryArgs),
/// Trace why code was introduced: file -> MR -> issue -> discussion
Trace(TraceArgs),
/// Detect discussion divergence from original intent
Drift {
/// Entity type (currently only "issues" supported)
@@ -489,6 +515,113 @@ pub struct MrsArgs {
pub no_open: bool,
}
#[derive(Parser)]
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore notes # List 50 most recent notes
lore notes --author alice --since 7d # Notes by alice in last 7 days
lore notes --for-issue 42 -p group/repo # Notes on issue #42
lore notes --path src/ --resolution unresolved # Unresolved diff notes in src/")]
pub struct NotesArgs {
/// Maximum results
#[arg(
short = 'n',
long = "limit",
default_value = "50",
help_heading = "Output"
)]
pub limit: usize,
/// Select output fields (comma-separated, or 'minimal' preset: id,author_username,body,created_at_iso)
#[arg(long, help_heading = "Output", value_delimiter = ',')]
pub fields: Option<Vec<String>>,
/// Output format (table, json, jsonl, csv)
#[arg(
long,
default_value = "table",
value_parser = ["table", "json", "jsonl", "csv"],
help_heading = "Output"
)]
pub format: String,
/// Filter by author username
#[arg(short = 'a', long, help_heading = "Filters")]
pub author: Option<String>,
/// Filter by note type (DiffNote, DiscussionNote)
#[arg(long, help_heading = "Filters")]
pub note_type: Option<String>,
/// Filter by body text (substring match)
#[arg(long, help_heading = "Filters")]
pub contains: Option<String>,
/// Filter by internal note ID
#[arg(long, help_heading = "Filters")]
pub note_id: Option<i64>,
/// Filter by GitLab note ID
#[arg(long, help_heading = "Filters")]
pub gitlab_note_id: Option<i64>,
/// Filter by discussion ID
#[arg(long, help_heading = "Filters")]
pub discussion_id: Option<String>,
/// Include system notes (excluded by default)
#[arg(long, help_heading = "Filters")]
pub include_system: bool,
/// Filter to notes on a specific issue IID (requires --project or default_project)
#[arg(long, conflicts_with = "for_mr", help_heading = "Filters")]
pub for_issue: Option<i64>,
/// Filter to notes on a specific MR IID (requires --project or default_project)
#[arg(long, conflicts_with = "for_issue", help_heading = "Filters")]
pub for_mr: Option<i64>,
/// Filter by project path
#[arg(short = 'p', long, help_heading = "Filters")]
pub project: Option<String>,
/// Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
#[arg(long, help_heading = "Filters")]
pub since: Option<String>,
/// Filter until date (YYYY-MM-DD, inclusive end-of-day)
#[arg(long, help_heading = "Filters")]
pub until: Option<String>,
/// Filter by file path (exact match or prefix with trailing /)
#[arg(long, help_heading = "Filters")]
pub path: Option<String>,
/// Filter by resolution status (any, unresolved, resolved)
#[arg(
long,
value_parser = ["any", "unresolved", "resolved"],
help_heading = "Filters"
)]
pub resolution: Option<String>,
/// Sort field (created, updated)
#[arg(
long,
value_parser = ["created", "updated"],
default_value = "created",
help_heading = "Sorting"
)]
pub sort: String,
/// Sort ascending (default: descending)
#[arg(long, help_heading = "Sorting")]
pub asc: bool,
/// Open first matching item in browser
#[arg(long, help_heading = "Actions")]
pub open: bool,
}
#[derive(Parser)]
pub struct IngestArgs {
/// Entity to ingest (issues, mrs). Omit to ingest everything
@@ -556,8 +689,8 @@ pub struct SearchArgs {
#[arg(long, default_value = "hybrid", value_parser = ["lexical", "hybrid", "semantic"], help_heading = "Mode")]
pub mode: String,
/// Filter by source type (issue, mr, discussion)
#[arg(long = "type", value_name = "TYPE", value_parser = ["issue", "mr", "discussion"], help_heading = "Filters")]
/// Filter by source type (issue, mr, discussion, note)
#[arg(long = "type", value_name = "TYPE", value_parser = ["issue", "mr", "discussion", "note"], help_heading = "Filters")]
pub source_type: Option<String>,
/// Filter by author username
@@ -624,6 +757,7 @@ pub struct GenerateDocsArgs {
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore sync # Full pipeline: ingest + docs + embed
lore sync --no-embed # Skip embedding step
lore sync --no-status # Skip work-item status enrichment
lore sync --full --force # Full re-sync, override stale lock
lore sync --dry-run # Preview what would change")]
pub struct SyncArgs {
@@ -657,12 +791,20 @@ pub struct SyncArgs {
#[arg(long = "no-file-changes")]
pub no_file_changes: bool,
/// Skip work-item status enrichment via GraphQL (overrides config)
#[arg(long = "no-status")]
pub no_status: bool,
/// Preview what would be synced without making changes
#[arg(long, overrides_with = "no_dry_run")]
pub dry_run: bool,
#[arg(long = "no-dry-run", hide = true, overrides_with = "dry_run")]
pub no_dry_run: bool,
/// Show detailed timing breakdown for sync stages
#[arg(short = 't', long = "timings")]
pub timings: bool,
}
#[derive(Parser)]
@@ -684,11 +826,15 @@ pub struct EmbedArgs {
#[derive(Parser)]
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore timeline 'deployment' # Events related to deployments
lore timeline 'deployment' # Search-based seeding
lore timeline issue:42 # Direct: issue #42 and related entities
lore timeline i:42 # Shorthand for issue:42
lore timeline mr:99 # Direct: MR !99 and related entities
lore timeline 'auth' --since 30d -p group/repo # Scoped to project and time
lore timeline 'migration' --depth 2 --expand-mentions # Deep cross-reference expansion")]
lore timeline 'migration' --depth 2 # Deep cross-reference expansion
lore timeline 'auth' --no-mentions # Only 'closes' and 'related' edges")]
pub struct TimelineArgs {
/// Search query (keywords to find in issues, MRs, and discussions)
/// Search text or entity reference (issue:N, i:N, mr:N, m:N)
pub query: String,
/// Scope to a specific project (fuzzy match)
@@ -703,9 +849,9 @@ pub struct TimelineArgs {
#[arg(long, default_value = "1", help_heading = "Expansion")]
pub depth: u32,
/// Also follow 'mentioned' edges during expansion (high fan-out)
#[arg(long = "expand-mentions", help_heading = "Expansion")]
pub expand_mentions: bool,
/// Skip 'mentioned' edges during expansion (only follow 'closes' and 'related')
#[arg(long = "no-mentions", help_heading = "Expansion")]
pub no_mentions: bool,
/// Maximum number of events to display
#[arg(
@@ -795,11 +941,104 @@ pub struct WhoArgs {
pub fields: Option<Vec<String>>,
/// Show per-MR detail breakdown (expert mode only)
#[arg(long, help_heading = "Output", overrides_with = "no_detail")]
#[arg(
long,
help_heading = "Output",
overrides_with = "no_detail",
conflicts_with = "explain_score"
)]
pub detail: bool,
#[arg(long = "no-detail", hide = true, overrides_with = "detail")]
pub no_detail: bool,
/// Score as if "now" is this date (ISO 8601 or duration like 30d). Expert mode only.
#[arg(long = "as-of", help_heading = "Scoring")]
pub as_of: Option<String>,
/// Show per-component score breakdown in output. Expert mode only.
#[arg(long = "explain-score", help_heading = "Scoring")]
pub explain_score: bool,
/// Include bot users in results (normally excluded via scoring.excluded_usernames).
#[arg(long = "include-bots", help_heading = "Scoring")]
pub include_bots: bool,
/// Remove the default time window (query all history). Conflicts with --since.
#[arg(
long = "all-history",
help_heading = "Filters",
conflicts_with = "since"
)]
pub all_history: bool,
}
#[derive(Parser)]
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore file-history src/main.rs # MRs that touched this file
lore file-history src/auth/ -p group/repo # Scoped to project
lore file-history src/foo.rs --discussions # Include DiffNote snippets
lore file-history src/bar.rs --no-follow-renames # Skip rename chain")]
pub struct FileHistoryArgs {
/// File path to trace history for
pub path: String,
/// Scope to a specific project (fuzzy match)
#[arg(short = 'p', long, help_heading = "Filters")]
pub project: Option<String>,
/// Include discussion snippets from DiffNotes on this file
#[arg(long, help_heading = "Output")]
pub discussions: bool,
/// Disable rename chain resolution
#[arg(long = "no-follow-renames", help_heading = "Filters")]
pub no_follow_renames: bool,
/// Only show merged MRs
#[arg(long, help_heading = "Filters")]
pub merged: bool,
/// Maximum results
#[arg(
short = 'n',
long = "limit",
default_value = "50",
help_heading = "Output"
)]
pub limit: usize,
}
#[derive(Parser)]
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore trace src/main.rs # Why was this file changed?
lore trace src/auth/ -p group/repo # Scoped to project
lore trace src/foo.rs --discussions # Include DiffNote context
lore trace src/bar.rs:42 # Line hint (Tier 2 warning)")]
pub struct TraceArgs {
/// File path to trace (supports :line suffix for future Tier 2)
pub path: String,
/// Scope to a specific project (fuzzy match)
#[arg(short = 'p', long, help_heading = "Filters")]
pub project: Option<String>,
/// Include DiffNote discussion snippets
#[arg(long, help_heading = "Output")]
pub discussions: bool,
/// Disable rename chain resolution
#[arg(long = "no-follow-renames", help_heading = "Filters")]
pub no_follow_renames: bool,
/// Maximum trace chains to display
#[arg(
short = 'n',
long = "limit",
default_value = "20",
help_heading = "Output"
)]
pub limit: usize,
}
#[derive(Parser)]
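The `:line` suffix accepted by `lore trace` (for example `src/bar.rs:42`) can be illustrated with a small parser. This is a minimal sketch, not the code from `src/cli/commands/trace.rs`: the function name `split_line_suffix` and its handling of edge cases are assumptions made for illustration.

```rust
/// Hypothetical helper (not from the commit): split an optional `:line`
/// suffix off a path argument such as `src/bar.rs:42`.
pub fn split_line_suffix(input: &str) -> (&str, Option<u32>) {
    // Only the text after the last ':' is considered a line number.
    if let Some((path, suffix)) = input.rsplit_once(':') {
        if !path.is_empty() {
            if let Ok(line) = suffix.parse::<u32>() {
                return (path, Some(line));
            }
        }
    }
    // No numeric suffix: treat the whole input as the path.
    (input, None)
}
```

In this sketch an input without a numeric suffix is returned unchanged, so a path that merely contains a colon is not misread as a line hint.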


@@ -1,14 +1,91 @@
-use indicatif::MultiProgress;
+use indicatif::{MultiProgress, ProgressBar, ProgressStyle};
use std::io::Write;
use std::sync::LazyLock;
use std::time::Duration;
use tracing_subscriber::fmt::MakeWriter;
use crate::cli::render::{Icons, Theme};
static MULTI: LazyLock<MultiProgress> = LazyLock::new(MultiProgress::new);
pub fn multi() -> &'static MultiProgress {
&MULTI
}
/// Stage spinner with icon prefix and a wide message.
/// Elapsed time is not part of this template; `finish_stage` adds it on completion.
///
/// Template: ` {spinner:.cyan} {prefix} {wide_msg}`
pub fn stage_spinner_v2(icon: &str, label: &str, msg: &str, robot_mode: bool) -> ProgressBar {
if robot_mode {
return ProgressBar::hidden();
}
let pb = multi().add(ProgressBar::new_spinner());
pb.set_style(
ProgressStyle::default_spinner()
.template(" {spinner:.cyan} {prefix} {wide_msg}")
.expect("valid template"),
);
pb.enable_steady_tick(Duration::from_millis(60));
pb.set_prefix(format!("{icon} {label}"));
pb.set_message(msg.to_string());
pb
}
/// Nested progress bar with count, throughput, and ETA.
///
/// Template: ` {spinner:.dim} {msg} {bar:30.cyan/dark_gray} {pos}/{len} {per_sec:.dim} {eta:.dim}`
pub fn nested_progress(msg: &str, len: u64, robot_mode: bool) -> ProgressBar {
if robot_mode {
return ProgressBar::hidden();
}
let pb = multi().add(ProgressBar::new(len));
pb.set_style(
ProgressStyle::default_bar()
.template(
" {spinner:.dim} {msg} {bar:30.cyan/dark_gray} {pos}/{len} {per_sec:.dim} {eta:.dim}",
)
.expect("valid template")
.progress_chars(Icons::progress_chars()),
);
pb.enable_steady_tick(Duration::from_millis(60));
pb.set_message(msg.to_string());
pb
}
/// Replace a spinner with a static completion line showing icon, label, summary, and elapsed.
///
/// Output: ` ✓ Label summary elapsed`
pub fn finish_stage(pb: &ProgressBar, icon: &str, label: &str, summary: &str, elapsed: Duration) {
let line = format_stage_line(icon, label, summary, elapsed);
pb.set_style(ProgressStyle::with_template("{msg}").expect("valid template"));
pb.finish_with_message(line);
}
/// Build a static stage line showing icon, label, summary, and elapsed.
///
/// Output: ` ✓ Label summary elapsed`
pub fn format_stage_line(icon: &str, label: &str, summary: &str, elapsed: Duration) -> String {
let elapsed_str = format_elapsed(elapsed);
let styled_label = Theme::info().bold().render(&format!("{label:<12}"));
let styled_elapsed = Theme::timing().render(&format!("{elapsed_str:>8}"));
format!(" {icon} {styled_label}{summary:>40} {styled_elapsed}")
}
/// Format a Duration as a compact human string (e.g. "1.2s", "42ms", "1m 5s").
fn format_elapsed(d: Duration) -> String {
let ms = d.as_millis();
if ms < 1000 {
format!("{ms}ms")
} else if ms < 60_000 {
format!("{:.1}s", ms as f64 / 1000.0)
} else {
let secs = d.as_secs();
let m = secs / 60;
let s = secs % 60;
format!("{m}m {s}s")
}
}
#[derive(Clone)]
pub struct SuspendingWriter;
@@ -50,7 +127,6 @@ impl<'a> MakeWriter<'a> for SuspendingWriter {
#[cfg(test)]
mod tests {
use super::*;
-use indicatif::ProgressBar;
#[test]
fn multi_returns_same_instance() {
@@ -88,4 +164,61 @@ mod tests {
let w = MakeWriter::make_writer(&writer);
drop(w);
}
// ── Progress API tests ──
#[test]
fn stage_spinner_v2_robot_mode_returns_hidden() {
let pb = stage_spinner_v2("\u{2714}", "Issues", "fetching...", true);
assert!(pb.is_hidden());
}
#[test]
fn stage_spinner_v2_human_mode_sets_properties() {
let pb = stage_spinner_v2("\u{2714}", "Issues", "fetching...", false);
assert!(pb.prefix().contains("Issues"));
assert_eq!(pb.message(), "fetching...");
pb.finish_and_clear();
}
#[test]
fn nested_progress_robot_mode_returns_hidden() {
let pb = nested_progress("Embedding...", 100, true);
assert!(pb.is_hidden());
}
#[test]
fn nested_progress_human_mode_sets_length() {
let pb = nested_progress("Embedding...", 100, false);
assert_eq!(pb.length(), Some(100));
assert_eq!(pb.message(), "Embedding...");
pb.finish_and_clear();
}
#[test]
fn format_elapsed_sub_second() {
assert_eq!(format_elapsed(Duration::from_millis(42)), "42ms");
assert_eq!(format_elapsed(Duration::from_millis(999)), "999ms");
}
#[test]
fn format_elapsed_seconds() {
assert_eq!(format_elapsed(Duration::from_millis(1200)), "1.2s");
assert_eq!(format_elapsed(Duration::from_millis(5000)), "5.0s");
}
#[test]
fn format_elapsed_minutes() {
assert_eq!(format_elapsed(Duration::from_secs(65)), "1m 5s");
assert_eq!(format_elapsed(Duration::from_secs(120)), "2m 0s");
}
#[test]
fn format_stage_line_includes_label_summary_and_elapsed() {
let line = format_stage_line("\u{2714}", "Issues", "10 issues", Duration::from_millis(4200));
assert!(line.contains("\u{2714}"));
assert!(line.contains("Issues"));
assert!(line.contains("10 issues"));
assert!(line.contains("4.2s"));
}
}
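The column layout produced by `format_stage_line` is easier to see without styling. The sketch below repeats the `format_elapsed` thresholds from the diff and adds a hypothetical `plain_stage_line` that applies the same width specifiers (`<12`, `>40`, `>8`) to plain strings; it leaves out the `Theme` calls, so it is an approximation of the real output rather than the project's function.

```rust
use std::time::Duration;

/// Same thresholds as `format_elapsed` in the diff: ms, seconds, minutes.
pub fn format_elapsed(d: Duration) -> String {
    let ms = d.as_millis();
    if ms < 1000 {
        format!("{ms}ms")
    } else if ms < 60_000 {
        format!("{:.1}s", ms as f64 / 1000.0)
    } else {
        let secs = d.as_secs();
        format!("{}m {}s", secs / 60, secs % 60)
    }
}

/// Hypothetical unstyled variant of `format_stage_line` (no `Theme` calls).
pub fn plain_stage_line(icon: &str, label: &str, summary: &str, elapsed: Duration) -> String {
    let elapsed_str = format_elapsed(elapsed);
    // Label is left-aligned to 12 columns, summary right-aligned to 40,
    // elapsed right-aligned to 8.
    format!(" {icon} {label:<12}{summary:>40} {elapsed_str:>8}")
}
```

One edge worth knowing: because the seconds branch rounds to one decimal, a duration of 59,999 ms renders as `60.0s` rather than switching to the minutes form.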

1392
src/cli/render.rs Normal file

File diff suppressed because it is too large


@@ -64,6 +64,10 @@ pub fn expand_fields_preset(fields: &[String], entity: &str) -> Vec<String> {
.iter()
.map(|s| (*s).to_string())
.collect(),
"notes" => ["id", "author_username", "body", "created_at_iso"]
.iter()
.map(|s| (*s).to_string())
.collect(),
_ => fields.to_vec(),
}
} else {
@@ -77,7 +81,30 @@ pub fn strip_schemas(commands: &mut serde_json::Value) {
for (_cmd_name, cmd) in map.iter_mut() {
if let Some(obj) = cmd.as_object_mut() {
obj.remove("response_schema");
obj.remove("example_output");
}
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_expand_fields_preset_notes() {
let fields = vec!["minimal".to_string()];
let expanded = expand_fields_preset(&fields, "notes");
assert_eq!(
expanded,
["id", "author_username", "body", "created_at_iso"]
);
}
#[test]
fn test_expand_fields_preset_passthrough() {
let fields = vec!["id".to_string(), "body".to_string()];
let expanded = expand_fields_preset(&fields, "notes");
assert_eq!(expanded, ["id", "body"]);
}
}


@@ -164,6 +164,38 @@ pub struct ScoringConfig {
/// Bonus points per individual inline review comment (DiffNote).
#[serde(rename = "noteBonus")]
pub note_bonus: i64,
/// Points per MR where the user was assigned as a reviewer.
#[serde(rename = "reviewerAssignmentWeight")]
pub reviewer_assignment_weight: i64,
/// Half-life in days for author contribution decay.
#[serde(rename = "authorHalfLifeDays")]
pub author_half_life_days: u32,
/// Half-life in days for reviewer contribution decay.
#[serde(rename = "reviewerHalfLifeDays")]
pub reviewer_half_life_days: u32,
/// Half-life in days for reviewer assignment decay.
#[serde(rename = "reviewerAssignmentHalfLifeDays")]
pub reviewer_assignment_half_life_days: u32,
/// Half-life in days for note/comment contribution decay.
#[serde(rename = "noteHalfLifeDays")]
pub note_half_life_days: u32,
/// Multiplier applied to scores from closed (not merged) MRs.
#[serde(rename = "closedMrMultiplier")]
pub closed_mr_multiplier: f64,
/// Minimum character count for a review note to earn note_bonus.
#[serde(rename = "reviewerMinNoteChars")]
pub reviewer_min_note_chars: u32,
/// Usernames excluded from expert/scoring results.
#[serde(rename = "excludedUsernames")]
pub excluded_usernames: Vec<String>,
}
impl Default for ScoringConfig {
@@ -172,6 +204,14 @@ impl Default for ScoringConfig {
author_weight: 25,
reviewer_weight: 10,
note_bonus: 1,
reviewer_assignment_weight: 3,
author_half_life_days: 180,
reviewer_half_life_days: 90,
reviewer_assignment_half_life_days: 45,
note_half_life_days: 45,
closed_mr_multiplier: 0.5,
reviewer_min_note_chars: 20,
excluded_usernames: vec![],
}
}
}
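The half-life fields suggest that each contribution loses weight as it ages. The diff does not show the scoring formula, so the following is only a sketch under the assumption of standard exponential decay (points halve every `half_life_days`); the function name `decayed_points` is hypothetical.

```rust
/// Hypothetical decay helper. The formula is an assumption (plain
/// exponential half-life decay), not taken from the diff.
pub fn decayed_points(base_points: f64, age_days: f64, half_life_days: u32) -> f64 {
    // After one half-life the contribution is worth half its base points.
    base_points * 0.5_f64.powf(age_days / f64::from(half_life_days))
}
```

Under that assumption, and with the defaults above, an authored MR worth 25 points would count for 12.5 points after 180 days, while a reviewer contribution worth 10 points would drop to 2.5 after the same period because its half-life is 90 days.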
@@ -287,6 +327,55 @@ fn validate_scoring(scoring: &ScoringConfig) -> Result<()> {
details: "scoring.noteBonus must be >= 0".to_string(),
});
}
if scoring.reviewer_assignment_weight < 0 {
return Err(LoreError::ConfigInvalid {
details: "scoring.reviewerAssignmentWeight must be >= 0".to_string(),
});
}
if scoring.author_half_life_days == 0 || scoring.author_half_life_days > 3650 {
return Err(LoreError::ConfigInvalid {
details: "scoring.authorHalfLifeDays must be in 1..=3650".to_string(),
});
}
if scoring.reviewer_half_life_days == 0 || scoring.reviewer_half_life_days > 3650 {
return Err(LoreError::ConfigInvalid {
details: "scoring.reviewerHalfLifeDays must be in 1..=3650".to_string(),
});
}
if scoring.reviewer_assignment_half_life_days == 0
|| scoring.reviewer_assignment_half_life_days > 3650
{
return Err(LoreError::ConfigInvalid {
details: "scoring.reviewerAssignmentHalfLifeDays must be in 1..=3650".to_string(),
});
}
if scoring.note_half_life_days == 0 || scoring.note_half_life_days > 3650 {
return Err(LoreError::ConfigInvalid {
details: "scoring.noteHalfLifeDays must be in 1..=3650".to_string(),
});
}
if !scoring.closed_mr_multiplier.is_finite()
|| scoring.closed_mr_multiplier <= 0.0
|| scoring.closed_mr_multiplier > 1.0
{
return Err(LoreError::ConfigInvalid {
details: "scoring.closedMrMultiplier must be finite and in (0.0, 1.0]".to_string(),
});
}
if scoring.reviewer_min_note_chars > 4096 {
return Err(LoreError::ConfigInvalid {
details: "scoring.reviewerMinNoteChars must be <= 4096".to_string(),
});
}
if scoring
.excluded_usernames
.iter()
.any(|u| u.trim().is_empty())
{
return Err(LoreError::ConfigInvalid {
details: "scoring.excludedUsernames entries must be non-empty".to_string(),
});
}
Ok(())
}
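The explicit `is_finite()` guard on `closedMrMultiplier` matters because every comparison against NaN is false, so a range check alone would let NaN through. The sketch below contrasts the check from the diff with a hypothetical naive variant; both function names are made up for illustration.

```rust
/// Mirrors the condition in the diff: finite and in (0.0, 1.0].
pub fn multiplier_is_valid(m: f64) -> bool {
    m.is_finite() && m > 0.0 && m <= 1.0
}

/// Hypothetical naive variant without the `is_finite` guard.
/// NaN makes both comparisons false, so it is wrongly accepted.
pub fn naive_multiplier_is_valid(m: f64) -> bool {
    !(m <= 0.0 || m > 1.0)
}
```

Infinity is already rejected by the `> 1.0` comparison, so the guard is there specifically to catch NaN.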
@@ -561,4 +650,140 @@ mod tests {
"set default_project should be present: {json}"
);
}
#[test]
fn test_config_validation_rejects_zero_half_life() {
let scoring = ScoringConfig {
author_half_life_days: 0,
..Default::default()
};
let err = validate_scoring(&scoring).unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("authorHalfLifeDays"),
"unexpected error: {msg}"
);
}
#[test]
fn test_config_validation_rejects_absurd_half_life() {
let scoring = ScoringConfig {
author_half_life_days: 5000,
..Default::default()
};
let err = validate_scoring(&scoring).unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("authorHalfLifeDays"),
"unexpected error: {msg}"
);
}
#[test]
fn test_config_validation_rejects_nan_multiplier() {
let scoring = ScoringConfig {
closed_mr_multiplier: f64::NAN,
..Default::default()
};
let err = validate_scoring(&scoring).unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("closedMrMultiplier"),
"unexpected error: {msg}"
);
}
#[test]
fn test_config_validation_rejects_zero_multiplier() {
let scoring = ScoringConfig {
closed_mr_multiplier: 0.0,
..Default::default()
};
let err = validate_scoring(&scoring).unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("closedMrMultiplier"),
"unexpected error: {msg}"
);
}
#[test]
fn test_config_validation_rejects_negative_reviewer_assignment_weight() {
let scoring = ScoringConfig {
reviewer_assignment_weight: -1,
..Default::default()
};
let err = validate_scoring(&scoring).unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("reviewerAssignmentWeight"),
"unexpected error: {msg}"
);
}
#[test]
fn test_config_validation_rejects_oversized_min_note_chars() {
let scoring = ScoringConfig {
reviewer_min_note_chars: 5000,
..Default::default()
};
let err = validate_scoring(&scoring).unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("reviewerMinNoteChars"),
"unexpected error: {msg}"
);
}
#[test]
fn test_config_validation_rejects_empty_excluded_username() {
let scoring = ScoringConfig {
excluded_usernames: vec!["valid".to_string(), " ".to_string()],
..Default::default()
};
let err = validate_scoring(&scoring).unwrap_err();
let msg = err.to_string();
assert!(msg.contains("excludedUsernames"), "unexpected error: {msg}");
}
#[test]
fn test_config_validation_accepts_valid_new_fields() {
let scoring = ScoringConfig {
author_half_life_days: 365,
reviewer_half_life_days: 180,
reviewer_assignment_half_life_days: 90,
note_half_life_days: 60,
closed_mr_multiplier: 0.5,
reviewer_min_note_chars: 20,
reviewer_assignment_weight: 3,
excluded_usernames: vec!["bot-user".to_string()],
..Default::default()
};
validate_scoring(&scoring).unwrap();
}
#[test]
fn test_config_validation_accepts_boundary_half_life() {
// 1 and 3650 are both valid boundaries
let scoring_min = ScoringConfig {
author_half_life_days: 1,
..Default::default()
};
validate_scoring(&scoring_min).unwrap();
let scoring_max = ScoringConfig {
author_half_life_days: 3650,
..Default::default()
};
validate_scoring(&scoring_max).unwrap();
}
#[test]
fn test_config_validation_accepts_multiplier_at_one() {
let scoring = ScoringConfig {
closed_mr_multiplier: 1.0,
..Default::default()
};
validate_scoring(&scoring).unwrap();
}
}


@@ -69,10 +69,26 @@ const MIGRATIONS: &[(&str, &str)] = &[
"021",
include_str!("../../migrations/021_work_item_status.sql"),
),
(
"022",
include_str!("../../migrations/022_notes_query_index.sql"),
),
(
"023",
include_str!("../../migrations/023_issue_detail_fields.sql"),
),
(
"024",
include_str!("../../migrations/024_note_documents.sql"),
),
(
"025",
include_str!("../../migrations/025_note_dirty_backfill.sql"),
),
(
"026",
include_str!("../../migrations/026_scoring_indexes.sql"),
),
];
pub fn create_connection(db_path: &Path) -> Result<Connection> {
@@ -316,3 +332,7 @@ pub fn get_schema_version(conn: &Connection) -> i32 {
)
.unwrap_or(0)
}
#[cfg(test)]
#[path = "db_tests.rs"]
mod tests;

632
src/core/db_tests.rs Normal file

@@ -0,0 +1,632 @@
use super::*;
fn setup_migrated_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn index_exists(conn: &Connection, index_name: &str) -> bool {
conn.query_row(
"SELECT COUNT(*) > 0 FROM sqlite_master WHERE type='index' AND name=?1",
[index_name],
|row| row.get(0),
)
.unwrap_or(false)
}
fn column_exists(conn: &Connection, table: &str, column: &str) -> bool {
let sql = format!("PRAGMA table_info({})", table);
let mut stmt = conn.prepare(&sql).unwrap();
let columns: Vec<String> = stmt
.query_map([], |row| row.get::<_, String>(1))
.unwrap()
.filter_map(|r| r.ok())
.collect();
columns.contains(&column.to_string())
}
#[test]
fn test_migration_022_indexes_exist() {
let conn = setup_migrated_db();
// New indexes from migration 022
assert!(
index_exists(&conn, "idx_notes_user_created"),
"idx_notes_user_created should exist"
);
assert!(
index_exists(&conn, "idx_notes_project_created"),
"idx_notes_project_created should exist"
);
assert!(
index_exists(&conn, "idx_notes_author_id"),
"idx_notes_author_id should exist"
);
// Discussion JOIN indexes (idx_discussions_issue_id is new;
// idx_discussions_mr_id already existed from migration 006 but
// IF NOT EXISTS makes it safe)
assert!(
index_exists(&conn, "idx_discussions_issue_id"),
"idx_discussions_issue_id should exist"
);
assert!(
index_exists(&conn, "idx_discussions_mr_id"),
"idx_discussions_mr_id should exist"
);
// author_id column on notes
assert!(
column_exists(&conn, "notes", "author_id"),
"notes.author_id column should exist"
);
}
// -- Helper: insert a minimal project for FK satisfaction --
fn insert_test_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) \
VALUES (1000, 'test/project', 'https://example.com/test/project')",
[],
)
.unwrap();
conn.last_insert_rowid()
}
// -- Helper: insert a minimal issue --
fn insert_test_issue(conn: &Connection, project_id: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, state, author_username, \
created_at, updated_at, last_seen_at) \
VALUES (100, ?1, 1, 'opened', 'alice', 1000, 1000, 1000)",
[project_id],
)
.unwrap();
conn.last_insert_rowid()
}
// -- Helper: insert a minimal discussion --
fn insert_test_discussion(conn: &Connection, project_id: i64, issue_id: i64) -> i64 {
conn.execute(
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, \
noteable_type, last_seen_at) \
VALUES ('disc-001', ?1, ?2, 'Issue', 1000)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
conn.last_insert_rowid()
}
// -- Helper: insert a minimal non-system note --
#[allow(clippy::too_many_arguments)]
fn insert_test_note(
conn: &Connection,
gitlab_id: i64,
discussion_id: i64,
project_id: i64,
is_system: bool,
) -> i64 {
conn.execute(
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, \
author_username, body, created_at, updated_at, last_seen_at) \
VALUES (?1, ?2, ?3, ?4, 'alice', 'note body', 1000, 1000, 1000)",
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32],
)
.unwrap();
conn.last_insert_rowid()
}
// -- Helper: insert a document --
fn insert_test_document(
conn: &Connection,
source_type: &str,
source_id: i64,
project_id: i64,
) -> i64 {
conn.execute(
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) \
VALUES (?1, ?2, ?3, 'test content', 'hash123')",
rusqlite::params![source_type, source_id, project_id],
)
.unwrap();
conn.last_insert_rowid()
}
#[test]
fn test_migration_024_allows_note_source_type() {
let conn = setup_migrated_db();
let pid = insert_test_project(&conn);
// Should succeed -- 'note' is now allowed
conn.execute(
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) \
VALUES ('note', 1, ?1, 'note content', 'hash-note')",
[pid],
)
.expect("INSERT with source_type='note' into documents should succeed");
// dirty_sources should also accept 'note'
conn.execute(
"INSERT INTO dirty_sources (source_type, source_id, queued_at) \
VALUES ('note', 1, 1000)",
[],
)
.expect("INSERT with source_type='note' into dirty_sources should succeed");
}
#[test]
fn test_migration_024_preserves_existing_data() {
// Run migrations up to 023 only, insert data, then apply 024
// Migration 024 is at index 23 (0-based). Use hardcoded index so adding
// later migrations doesn't silently shift what this test exercises.
let conn = create_connection(Path::new(":memory:")).unwrap();
// Apply migrations 001-023 (indices 0..23)
run_migrations_up_to(&conn, 23);
let pid = insert_test_project(&conn);
// Insert a document with existing source_type
conn.execute(
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash, title) \
VALUES ('issue', 1, ?1, 'issue content', 'hash-issue', 'Test Issue')",
[pid],
)
.unwrap();
let doc_id: i64 = conn.last_insert_rowid();
// Insert junction data
conn.execute(
"INSERT INTO document_labels (document_id, label_name) VALUES (?1, 'bug')",
[doc_id],
)
.unwrap();
conn.execute(
"INSERT INTO document_paths (document_id, path) VALUES (?1, 'src/main.rs')",
[doc_id],
)
.unwrap();
// Insert dirty_sources row
conn.execute(
"INSERT INTO dirty_sources (source_type, source_id, queued_at) VALUES ('issue', 1, 1000)",
[],
)
.unwrap();
// Now apply migration 024 (index 23) -- the table-rebuild migration
run_single_migration(&conn, 23);
// Verify document still exists with correct data
let (st, content, title): (String, String, String) = conn
.query_row(
"SELECT source_type, content_text, title FROM documents WHERE id = ?1",
[doc_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
)
.unwrap();
assert_eq!(st, "issue");
assert_eq!(content, "issue content");
assert_eq!(title, "Test Issue");
// Verify junction data preserved
let label_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM document_labels WHERE document_id = ?1",
[doc_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(label_count, 1);
let path_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM document_paths WHERE document_id = ?1",
[doc_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(path_count, 1);
// Verify dirty_sources preserved
let dirty_count: i64 = conn
.query_row("SELECT COUNT(*) FROM dirty_sources", [], |row| row.get(0))
.unwrap();
assert_eq!(dirty_count, 1);
}
#[test]
fn test_migration_024_fts_triggers_intact() {
let conn = setup_migrated_db();
let pid = insert_test_project(&conn);
// Insert a document after migration -- FTS trigger should fire
let doc_id = insert_test_document(&conn, "note", 1, pid);
// Verify FTS entry exists
let fts_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'test'",
[],
|row| row.get(0),
)
.unwrap();
assert!(fts_count > 0, "FTS trigger should have created an entry");
// Verify update trigger works
conn.execute(
"UPDATE documents SET content_text = 'updated content' WHERE id = ?1",
[doc_id],
)
.unwrap();
let fts_updated: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'updated'",
[],
|row| row.get(0),
)
.unwrap();
assert!(
fts_updated > 0,
"FTS update trigger should reflect new content"
);
// Verify delete trigger works
conn.execute("DELETE FROM documents WHERE id = ?1", [doc_id])
.unwrap();
let fts_after_delete: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'updated'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
fts_after_delete, 0,
"FTS delete trigger should remove the entry"
);
}
#[test]
fn test_migration_024_row_counts_preserved() {
let conn = setup_migrated_db();
// After full migration, tables should exist and be queryable
let doc_count: i64 = conn
.query_row("SELECT COUNT(*) FROM documents", [], |row| row.get(0))
.unwrap();
assert_eq!(doc_count, 0, "Fresh DB should have 0 documents");
let dirty_count: i64 = conn
.query_row("SELECT COUNT(*) FROM dirty_sources", [], |row| row.get(0))
.unwrap();
assert_eq!(dirty_count, 0, "Fresh DB should have 0 dirty_sources");
}
#[test]
fn test_migration_024_integrity_checks_pass() {
let conn = setup_migrated_db();
// PRAGMA integrity_check
let integrity: String = conn
.query_row("PRAGMA integrity_check", [], |row| row.get(0))
.unwrap();
assert_eq!(integrity, "ok", "Database integrity check should pass");
// PRAGMA foreign_key_check (returns rows only if there are violations)
let fk_violations: i64 = conn
.query_row("SELECT COUNT(*) FROM pragma_foreign_key_check", [], |row| {
row.get(0)
})
.unwrap();
assert_eq!(fk_violations, 0, "No foreign key violations should exist");
}
#[test]
fn test_migration_024_note_delete_trigger_cleans_document() {
let conn = setup_migrated_db();
let pid = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, pid);
let disc_id = insert_test_discussion(&conn, pid, issue_id);
let note_id = insert_test_note(&conn, 200, disc_id, pid, false);
// Create a document for this note
insert_test_document(&conn, "note", note_id, pid);
let doc_before: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
[note_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(doc_before, 1);
// Delete the note -- trigger should remove the document
conn.execute("DELETE FROM notes WHERE id = ?1", [note_id])
.unwrap();
let doc_after: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
[note_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(
doc_after, 0,
"notes_ad_cleanup trigger should delete the document"
);
}
#[test]
fn test_migration_024_note_system_flip_trigger_cleans_document() {
let conn = setup_migrated_db();
let pid = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, pid);
let disc_id = insert_test_discussion(&conn, pid, issue_id);
let note_id = insert_test_note(&conn, 201, disc_id, pid, false);
// Create a document for this note
insert_test_document(&conn, "note", note_id, pid);
let doc_before: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
[note_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(doc_before, 1);
// Flip is_system from 0 to 1 -- trigger should remove the document
conn.execute("UPDATE notes SET is_system = 1 WHERE id = ?1", [note_id])
.unwrap();
let doc_after: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
[note_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(
doc_after, 0,
"notes_au_system_cleanup trigger should delete the document"
);
}
#[test]
fn test_migration_024_system_note_delete_trigger_does_not_fire() {
let conn = setup_migrated_db();
let pid = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, pid);
let disc_id = insert_test_discussion(&conn, pid, issue_id);
// Insert a system note (is_system = true)
let note_id = insert_test_note(&conn, 202, disc_id, pid, true);
// Manually insert a document (shouldn't exist for system notes in practice,
// but we test the trigger guard)
insert_test_document(&conn, "note", note_id, pid);
let doc_before: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
[note_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(doc_before, 1);
// Delete system note -- trigger has WHEN old.is_system = 0 so it should NOT fire
conn.execute("DELETE FROM notes WHERE id = ?1", [note_id])
.unwrap();
let doc_after: i64 = conn
.query_row(
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
[note_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(
doc_after, 1,
"notes_ad_cleanup trigger should NOT fire for system notes"
);
}
/// Apply the first `up_to` entries of `MIGRATIONS` (indices `0..up_to`),
/// i.e. versions 001 through `up_to` inclusive.
fn run_migrations_up_to(conn: &Connection, up_to: usize) {
conn.execute_batch(
"CREATE TABLE IF NOT EXISTS schema_version ( \
version INTEGER PRIMARY KEY, applied_at INTEGER NOT NULL, description TEXT);",
)
.unwrap();
for (version_str, sql) in &MIGRATIONS[..up_to] {
let version: i32 = version_str.parse().unwrap();
conn.execute_batch(sql).unwrap();
conn.execute(
"INSERT OR REPLACE INTO schema_version (version, applied_at, description) \
VALUES (?1, strftime('%s', 'now') * 1000, ?2)",
rusqlite::params![version, version_str],
)
.unwrap();
}
}
/// Run a single migration by index (0-based).
fn run_single_migration(conn: &Connection, index: usize) {
let (version_str, sql) = MIGRATIONS[index];
let version: i32 = version_str.parse().unwrap();
conn.execute_batch(sql).unwrap();
conn.execute(
"INSERT OR REPLACE INTO schema_version (version, applied_at, description) \
VALUES (?1, strftime('%s', 'now') * 1000, ?2)",
rusqlite::params![version, version_str],
)
.unwrap();
}
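The two helpers above rely on the fact that migration versions are 1-based while slice indices are 0-based, which is why the tests call `run_migrations_up_to(&conn, 24)` and then `run_single_migration(&conn, 24)` to apply 025. A small sketch of that relationship, using a stand-in table `DEMO_MIGRATIONS` with placeholder SQL (both are assumptions, not project code):

```rust
/// Stand-in for the real `MIGRATIONS` table; the SQL strings are placeholders.
pub const DEMO_MIGRATIONS: &[(&str, &str)] = &[
    ("001", "-- sql"),
    ("002", "-- sql"),
    ("003", "-- sql"),
];

/// Versions covered by the slice `[..up_to]`, as in `run_migrations_up_to`.
pub fn versions_up_to(up_to: usize) -> Vec<i32> {
    DEMO_MIGRATIONS[..up_to]
        .iter()
        .map(|(v, _)| v.parse().unwrap())
        .collect()
}

/// Version stored at a 0-based index, as in `run_single_migration`.
pub fn version_at(index: usize) -> i32 {
    DEMO_MIGRATIONS[index].0.parse().unwrap()
}
```

With the same number `n` passed to both, the slice covers versions 1 through `n` and the single index selects version `n + 1`, so the pair applies consecutive migrations without overlap.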
#[test]
fn test_migration_025_backfills_existing_notes() {
let conn = create_connection(Path::new(":memory:")).unwrap();
// Run all migrations through 024 (index 0..24)
run_migrations_up_to(&conn, 24);
let pid = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, pid);
let disc_id = insert_test_discussion(&conn, pid, issue_id);
// Insert 5 non-system notes
for i in 1..=5 {
insert_test_note(&conn, 300 + i, disc_id, pid, false);
}
// Insert 2 system notes
for i in 1..=2 {
insert_test_note(&conn, 400 + i, disc_id, pid, true);
}
// Run migration 025
run_single_migration(&conn, 24);
let dirty_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
dirty_count, 5,
"Migration 025 should backfill 5 non-system notes"
);
// Collect every note id that was queued in dirty_sources
let queued_note_ids: Vec<i64> = {
let mut stmt = conn
.prepare(
"SELECT source_id FROM dirty_sources WHERE source_type = 'note' ORDER BY source_id",
)
.unwrap();
stmt.query_map([], |row| row.get(0))
.unwrap()
.collect::<std::result::Result<Vec<_>, _>>()
.unwrap()
};
// System note ids should not appear among the queued ids
let all_system_note_ids: Vec<i64> = {
let mut stmt = conn
.prepare("SELECT id FROM notes WHERE is_system = 1 ORDER BY id")
.unwrap();
stmt.query_map([], |row| row.get(0))
.unwrap()
.collect::<std::result::Result<Vec<_>, _>>()
.unwrap()
};
for sys_id in &all_system_note_ids {
assert!(
!queued_note_ids.contains(sys_id),
"System note id {} should not be in dirty_sources",
sys_id
);
}
}
#[test]
fn test_migration_025_idempotent_with_existing_documents() {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations_up_to(&conn, 24);
let pid = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, pid);
let disc_id = insert_test_discussion(&conn, pid, issue_id);
// Insert 3 non-system notes
let note_ids: Vec<i64> = (1..=3)
.map(|i| insert_test_note(&conn, 500 + i, disc_id, pid, false))
.collect();
// Create documents for 2 of 3 notes (simulating already-generated docs)
insert_test_document(&conn, "note", note_ids[0], pid);
insert_test_document(&conn, "note", note_ids[1], pid);
// Run migration 025
run_single_migration(&conn, 24);
let dirty_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
dirty_count, 1,
"Only the note without a document should be backfilled"
);
// Verify the correct note was queued
let queued_id: i64 = conn
.query_row(
"SELECT source_id FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(queued_id, note_ids[2]);
}
#[test]
fn test_migration_025_skips_notes_already_in_dirty_queue() {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations_up_to(&conn, 24);
let pid = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, pid);
let disc_id = insert_test_discussion(&conn, pid, issue_id);
// Insert 3 non-system notes
let note_ids: Vec<i64> = (1..=3)
.map(|i| insert_test_note(&conn, 600 + i, disc_id, pid, false))
.collect();
// Pre-queue one note in dirty_sources
conn.execute(
"INSERT INTO dirty_sources (source_type, source_id, queued_at) VALUES ('note', ?1, 999)",
[note_ids[0]],
)
.unwrap();
// Run migration 025
run_single_migration(&conn, 24);
let dirty_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
dirty_count, 3,
"All 3 notes should be in dirty_sources (1 pre-existing + 2 new)"
);
// Verify the pre-existing entry preserved its original queued_at
let original_queued_at: i64 = conn
.query_row(
"SELECT queued_at FROM dirty_sources WHERE source_type = 'note' AND source_id = ?1",
[note_ids[0]],
|row| row.get(0),
)
.unwrap();
assert_eq!(
original_queued_at, 999,
"ON CONFLICT DO NOTHING should preserve the original queued_at"
);
}
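
Reviewer note: the two migration 025 tests above pin down two properties of the backfill: notes that already have a document are skipped, and notes already present in `dirty_sources` keep their original `queued_at`. The migration SQL itself is not shown in this diff, so the following is only an in-memory model of that behaviour. The function name `backfill_dirty_notes` and the `HashMap` standing in for the `dirty_sources` table are assumptions made for illustration.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

/// In-memory model of the backfill (hypothetical, not the migration itself).
/// `dirty` maps (source_type, source_id) to queued_at, mimicking a table
/// with a uniqueness constraint on (source_type, source_id).
/// Returns how many notes were newly queued.
pub fn backfill_dirty_notes(
    dirty: &mut HashMap<(String, i64), i64>,
    note_ids: &[i64],
    documented: &HashSet<i64>,
    now: i64,
) -> usize {
    let mut newly_queued = 0;
    for &id in note_ids {
        // Notes that already have a document do not need regeneration.
        if documented.contains(&id) {
            continue;
        }
        // Equivalent in spirit to INSERT ... ON CONFLICT DO NOTHING:
        // an existing row is left untouched, including its queued_at.
        if let Entry::Vacant(slot) = dirty.entry(("note".to_string(), id)) {
            slot.insert(now);
            newly_queued += 1;
        }
    }
    newly_queued
}
```

With one of three notes pre-queued at `999`, the model queues the other two and leaves the `999` timestamp alone, which is the same outcome the second test asserts against the real table.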

src/core/file_history.rs

@@ -0,0 +1,71 @@
use std::collections::HashSet;
use std::collections::VecDeque;
use rusqlite::Connection;
use super::error::Result;
/// Resolves a file path through its rename history in `mr_file_changes`.
///
/// BFS in both directions: forward (`old_path` -> `new_path`) and backward
/// (`new_path` -> `old_path`). Returns all equivalent paths including the
/// original, sorted for determinism. Cycles are detected via a visited set.
///
/// `max_hops` limits the BFS depth (distance from the starting path).
pub fn resolve_rename_chain(
conn: &Connection,
project_id: i64,
path: &str,
max_hops: usize,
) -> Result<Vec<String>> {
let mut visited: HashSet<String> = HashSet::new();
visited.insert(path.to_string());
if max_hops == 0 {
return Ok(vec![path.to_string()]);
}
let mut queue: VecDeque<(String, usize)> = VecDeque::new();
queue.push_back((path.to_string(), 0));
let forward_sql = "\
SELECT DISTINCT mfc.new_path FROM mr_file_changes mfc \
WHERE mfc.project_id = ?1 AND mfc.old_path = ?2 AND mfc.change_type = 'renamed'";
let backward_sql = "\
SELECT DISTINCT mfc.old_path FROM mr_file_changes mfc \
WHERE mfc.project_id = ?1 AND mfc.new_path = ?2 AND mfc.change_type = 'renamed'";
while let Some((current, depth)) = queue.pop_front() {
if depth >= max_hops {
continue;
}
// Forward: current was the old name -> discover new names
let mut fwd_stmt = conn.prepare_cached(forward_sql)?;
let forward: Vec<String> = fwd_stmt
.query_map(rusqlite::params![project_id, &current], |row| row.get(0))?
.filter_map(std::result::Result::ok)
.collect();
// Backward: current was the new name -> discover old names
let mut bwd_stmt = conn.prepare_cached(backward_sql)?;
let backward: Vec<String> = bwd_stmt
.query_map(rusqlite::params![project_id, &current], |row| row.get(0))?
.filter_map(std::result::Result::ok)
.collect();
for discovered in forward.into_iter().chain(backward) {
if visited.insert(discovered.clone()) {
queue.push_back((discovered, depth + 1));
}
}
}
let mut paths: Vec<String> = visited.into_iter().collect();
paths.sort();
Ok(paths)
}
#[cfg(test)]
#[path = "file_history_tests.rs"]
mod tests;
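
Reviewer note: the traversal in `resolve_rename_chain` can be tried without a database. The sketch below replaces the two SQL lookups with a scan over an in-memory list of `(old_path, new_path)` pairs; the function name `resolve_rename_chain_in_memory` and the slice-based input are assumptions for illustration, while the visited set, the depth-tagged queue, and the `max_hops` cutoff follow the code above.

```rust
use std::collections::{HashSet, VecDeque};

/// Database-free sketch of the bidirectional rename BFS.
/// `renames` is a hypothetical stand-in for the `renamed` rows of
/// `mr_file_changes` within one project: each pair is (old_path, new_path).
pub fn resolve_rename_chain_in_memory(
    renames: &[(&str, &str)],
    path: &str,
    max_hops: usize,
) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new();
    visited.insert(path.to_string());

    // Each queue entry carries its distance from the starting path.
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    queue.push_back((path.to_string(), 0));

    while let Some((current, depth)) = queue.pop_front() {
        // Paths at the hop limit are kept but not expanded further.
        if depth >= max_hops {
            continue;
        }
        for (old, new) in renames {
            // Forward edge: current was the old name.
            // Backward edge: current was the new name.
            let neighbor = if *old == current.as_str() {
                Some(*new)
            } else if *new == current.as_str() {
                Some(*old)
            } else {
                None
            };
            if let Some(next) = neighbor {
                // `insert` returns false for already-seen paths, which is
                // what stops cycles such as a -> b -> a.
                if visited.insert(next.to_string()) {
                    queue.push_back((next.to_string(), depth + 1));
                }
            }
        }
    }

    let mut paths: Vec<String> = visited.into_iter().collect();
    paths.sort();
    paths
}
```

Because the starting path is seeded into the visited set before the loop, `max_hops = 0` returns just that path even without the early return used in the real function.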


@@ -0,0 +1,274 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (1, 300, 5, 1, 'Rename MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
[],
)
.unwrap();
1 // project_id
}
fn insert_rename(conn: &Connection, mr_id: i64, old_path: &str, new_path: &str) {
conn.execute(
"INSERT INTO mr_file_changes (merge_request_id, project_id, old_path, new_path, change_type)
VALUES (?1, 1, ?2, ?3, 'renamed')",
rusqlite::params![mr_id, old_path, new_path],
)
.unwrap();
}
#[test]
fn test_no_renames_returns_original_path() {
let conn = setup_test_db();
let project_id = seed_project(&conn);
let result = resolve_rename_chain(&conn, project_id, "src/auth.rs", 10).unwrap();
assert_eq!(result, ["src/auth.rs"]);
}
#[test]
fn test_forward_chain() {
// a.rs -> b.rs -> c.rs, starting from a.rs finds all three
let conn = setup_test_db();
let project_id = seed_project(&conn);
insert_rename(&conn, 1, "src/a.rs", "src/b.rs");
// Need a second MR for the next rename
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 6, 1, 'Rename MR 2', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
[],
)
.unwrap();
insert_rename(&conn, 2, "src/b.rs", "src/c.rs");
let mut result = resolve_rename_chain(&conn, project_id, "src/a.rs", 10).unwrap();
result.sort();
assert_eq!(result, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
#[test]
fn test_backward_chain() {
// a.rs -> b.rs -> c.rs, starting from c.rs finds all three
let conn = setup_test_db();
let project_id = seed_project(&conn);
insert_rename(&conn, 1, "src/a.rs", "src/b.rs");
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 6, 1, 'Rename MR 2', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
[],
)
.unwrap();
insert_rename(&conn, 2, "src/b.rs", "src/c.rs");
let mut result = resolve_rename_chain(&conn, project_id, "src/c.rs", 10).unwrap();
result.sort();
assert_eq!(result, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
#[test]
fn test_cycle_detection() {
// a -> b -> a: terminates without infinite loop
let conn = setup_test_db();
let project_id = seed_project(&conn);
insert_rename(&conn, 1, "src/a.rs", "src/b.rs");
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 6, 1, 'Rename back', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
[],
)
.unwrap();
insert_rename(&conn, 2, "src/b.rs", "src/a.rs");
let mut result = resolve_rename_chain(&conn, project_id, "src/a.rs", 10).unwrap();
result.sort();
assert_eq!(result, ["src/a.rs", "src/b.rs"]);
}
#[test]
fn test_max_hops_zero_returns_original() {
let conn = setup_test_db();
let project_id = seed_project(&conn);
insert_rename(&conn, 1, "src/a.rs", "src/b.rs");
let result = resolve_rename_chain(&conn, project_id, "src/a.rs", 0).unwrap();
assert_eq!(result, ["src/a.rs"]);
}
#[test]
fn test_max_hops_bounded() {
// Chain: a -> b -> c -> d -> e (4 hops)
// With max_hops=2, should find exactly {a, b, c} (original + 2 depth levels)
let conn = setup_test_db();
let project_id = seed_project(&conn);
let paths = ["src/a.rs", "src/b.rs", "src/c.rs", "src/d.rs", "src/e.rs"];
for (i, window) in paths.windows(2).enumerate() {
if i > 0 {
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (?1, ?2, ?3, 1, 'MR', 'merged', ?4, ?5, ?5, 'feat', 'main')",
rusqlite::params![
(i + 1) as i64,
(300 + i) as i64,
(5 + i) as i64,
(1000 * (i + 1)) as i64,
(2000 * (i + 1)) as i64,
],
)
.unwrap();
}
#[allow(clippy::cast_possible_wrap)]
insert_rename(&conn, (i + 1) as i64, window[0], window[1]);
}
let result = resolve_rename_chain(&conn, project_id, "src/a.rs", 2).unwrap();
assert_eq!(result, ["src/a.rs", "src/b.rs", "src/c.rs"]);
// Depth 1 should find only {a, b}
let result1 = resolve_rename_chain(&conn, project_id, "src/a.rs", 1).unwrap();
assert_eq!(result1, ["src/a.rs", "src/b.rs"]);
}
#[test]
fn test_diamond_pattern() {
// Diamond: a -> b, a -> c, b -> d, c -> d
// From a with max_hops=2, should find all four: {a, b, c, d}
let conn = setup_test_db();
let project_id = seed_project(&conn);
// MR 1: a -> b
insert_rename(&conn, 1, "src/a.rs", "src/b.rs");
// MR 2: a -> c
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 6, 1, 'MR 2', 'merged', 2000, 3000, 3000, 'feat2', 'main')",
[],
)
.unwrap();
insert_rename(&conn, 2, "src/a.rs", "src/c.rs");
// MR 3: b -> d
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (3, 302, 7, 1, 'MR 3', 'merged', 3000, 4000, 4000, 'feat3', 'main')",
[],
)
.unwrap();
insert_rename(&conn, 3, "src/b.rs", "src/d.rs");
// MR 4: c -> d
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (4, 303, 8, 1, 'MR 4', 'merged', 4000, 5000, 5000, 'feat4', 'main')",
[],
)
.unwrap();
insert_rename(&conn, 4, "src/c.rs", "src/d.rs");
// max_hops=2: a(0) -> {b,c}(1) -> {d}(2) — all four found
let result = resolve_rename_chain(&conn, project_id, "src/a.rs", 2).unwrap();
assert_eq!(result, ["src/a.rs", "src/b.rs", "src/c.rs", "src/d.rs"]);
// max_hops=1: a(0) -> {b,c}(1) — d at depth 2 excluded
let result1 = resolve_rename_chain(&conn, project_id, "src/a.rs", 1).unwrap();
assert_eq!(result1, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
#[test]
fn test_branching_renames() {
// a.rs was renamed to b.rs in one MR and c.rs in another
let conn = setup_test_db();
let project_id = seed_project(&conn);
insert_rename(&conn, 1, "src/a.rs", "src/b.rs");
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 6, 1, 'Rename MR 2', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
[],
)
.unwrap();
insert_rename(&conn, 2, "src/a.rs", "src/c.rs");
let mut result = resolve_rename_chain(&conn, project_id, "src/a.rs", 10).unwrap();
result.sort();
assert_eq!(result, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
#[test]
fn test_different_project_isolation() {
// Renames in project 2 should not leak into project 1 queries
let conn = setup_test_db();
let _project_id = seed_project(&conn);
// Create project 2
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (2, 200, 'other/repo', 'https://gitlab.example.com/other/repo', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 5, 2, 'Other MR', 'merged', 1000, 2000, 2000, 'feat', 'main')",
[],
)
.unwrap();
// Rename in project 1
insert_rename(&conn, 1, "src/a.rs", "src/b.rs");
// Rename in project 2 (different mr_id and project_id)
conn.execute(
"INSERT INTO mr_file_changes (merge_request_id, project_id, old_path, new_path, change_type)
VALUES (2, 2, 'src/a.rs', 'src/z.rs', 'renamed')",
[],
)
.unwrap();
// Query project 1 -- should NOT see z.rs
let mut result = resolve_rename_chain(&conn, 1, "src/a.rs", 10).unwrap();
result.sort();
assert_eq!(result, ["src/a.rs", "src/b.rs"]);
// Query project 2 -- should NOT see b.rs
let mut result2 = resolve_rename_chain(&conn, 2, "src/a.rs", 10).unwrap();
result2.sort();
assert_eq!(result2, ["src/a.rs", "src/z.rs"]);
}


@@ -4,7 +4,7 @@ use std::sync::Arc;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::thread;
use std::time::Duration;
-use tracing::{debug, error, info, warn};
+use tracing::{debug, error, warn};
use uuid::Uuid;
use super::db::create_connection;
@@ -75,7 +75,7 @@ impl AppLock {
"INSERT INTO app_locks (name, owner, acquired_at, heartbeat_at) VALUES (?, ?, ?, ?)",
(&self.name, &self.owner, now, now),
)?;
-info!(owner = %self.owner, "Lock acquired (new)");
+debug!(owner = %self.owner, "Lock acquired (new)");
}
Some((existing_owner, acquired_at, heartbeat_at)) => {
let is_stale = now - heartbeat_at > self.stale_lock_ms;
@@ -85,7 +85,7 @@ impl AppLock {
"UPDATE app_locks SET owner = ?, acquired_at = ?, heartbeat_at = ? WHERE name = ?",
(&self.owner, now, now, &self.name),
)?;
-info!(
+debug!(
owner = %self.owner,
previous_owner = %existing_owner,
was_stale = is_stale,
@@ -125,7 +125,7 @@ impl AppLock {
"DELETE FROM app_locks WHERE name = ? AND owner = ?",
(&self.name, &self.owner),
) {
-Ok(_) => info!(owner = %self.owner, "Lock released"),
+Ok(_) => debug!(owner = %self.owner, "Lock released"),
Err(e) => error!(
owner = %self.owner,
error = %e,


@@ -1,7 +1,45 @@
use std::fmt;
use std::fs;
use std::path::Path;
use tracing_subscriber::EnvFilter;
use tracing_subscriber::fmt::format::{FormatEvent, FormatFields};
use tracing_subscriber::registry::LookupSpan;
/// Compact stderr formatter: `HH:MM:SS LEVEL message key=value`
///
/// No span context, no full timestamps, no target — just the essentials.
/// The JSON file log is unaffected (it uses its own layer).
pub struct CompactHumanFormat;
impl<S, N> FormatEvent<S, N> for CompactHumanFormat
where
S: tracing::Subscriber + for<'a> LookupSpan<'a>,
N: for<'a> FormatFields<'a> + 'static,
{
fn format_event(
&self,
ctx: &tracing_subscriber::fmt::FmtContext<'_, S, N>,
mut writer: tracing_subscriber::fmt::format::Writer<'_>,
event: &tracing::Event<'_>,
) -> fmt::Result {
let now = chrono::Local::now();
let time = now.format("%H:%M:%S");
let level = *event.metadata().level();
let styled = match level {
tracing::Level::ERROR => console::style("ERROR").red().bold(),
tracing::Level::WARN => console::style(" WARN").yellow(),
tracing::Level::INFO => console::style(" INFO").green(),
tracing::Level::DEBUG => console::style("DEBUG").dim(),
tracing::Level::TRACE => console::style("TRACE").dim(),
};
write!(writer, "{time} {styled} ")?;
ctx.format_fields(writer.by_ref(), event)?;
writeln!(writer)
}
}
pub fn build_stderr_filter(verbose: u8, quiet: bool) -> EnvFilter {
if std::env::var("RUST_LOG").is_ok() {
@@ -13,8 +51,8 @@ pub fn build_stderr_filter(verbose: u8, quiet: bool) -> EnvFilter {
}
let directives = match verbose {
-0 => "lore=info,warn",
-1 => "lore=debug,warn",
+0 => "lore=warn",
+1 => "lore=info,warn",
2 => "lore=debug,info",
_ => "lore=trace,debug",
};
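
Reviewer note: the two logging changes in this file (the compact `HH:MM:SS LEVEL message key=value` line and the quieter verbosity mapping) can be sketched without the `tracing`, `chrono`, or `console` crates. The `Level` enum and the function names below are hypothetical stand-ins, and the mapping assumes that the second pair of `0 =>` / `1 =>` arms in the hunk above is the post-change side. Colour styling and the `RUST_LOG` / quiet handling are left out.

```rust
/// Hypothetical stand-in for `tracing::Level`, so the sketch needs no crates.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Error,
    Warn,
    Info,
    Debug,
    Trace,
}

/// Fixed-width (5 character) label, left-padded like the " WARN" and
/// " INFO" literals in the formatter above so columns line up.
pub fn level_label(level: Level) -> &'static str {
    match level {
        Level::Error => "ERROR",
        Level::Warn => " WARN",
        Level::Info => " INFO",
        Level::Debug => "DEBUG",
        Level::Trace => "TRACE",
    }
}

/// Builds `HH:MM:SS LEVEL fields`. The caller passes an already formatted
/// time and field string; the real formatter obtains them from chrono and
/// from the tracing field formatter.
pub fn format_compact_line(time: &str, level: Level, fields: &str) -> String {
    format!("{} {} {}", time, level_label(level), fields)
}

/// Verbosity count to filter directives, assuming the post-change arms.
pub fn stderr_directives(verbose: u8) -> &'static str {
    match verbose {
        0 => "lore=warn",
        1 => "lore=info,warn",
        2 => "lore=debug,info",
        _ => "lore=trace,debug",
    }
}
```

Under this mapping a plain invocation shows only warnings and errors from `lore`, which fits the lock messages in `lock.rs` being demoted from `info!` to `debug!` in the same commit.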


@@ -4,10 +4,12 @@ pub mod db;
pub mod dependent_queue;
pub mod error;
pub mod events_db;
pub mod file_history;
pub mod lock;
pub mod logging;
pub mod metrics;
pub mod note_parser;
pub mod path_resolver;
pub mod paths;
pub mod payloads;
pub mod project;
@@ -19,6 +21,7 @@ pub mod timeline;
pub mod timeline_collect;
pub mod timeline_expand;
pub mod timeline_seed;
pub mod trace;
pub use config::Config;
pub use error::{LoreError, Result};


@@ -22,20 +22,34 @@ pub struct ExtractResult {
pub parse_failures: usize,
}
// GitLab system notes include the entity type word: "mentioned in issue #5"
// or "mentioned in merge request !730". The word is mandatory in real data,
// but we also keep the old bare-sigil form as a fallback (no data uses it today,
// but other GitLab instances might differ).
static MENTIONED_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
-r"mentioned in (?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
+r"mentioned in (?:issue |merge request )?(?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
)
.expect("mentioned regex is valid")
});
static CLOSED_BY_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
-r"closed by (?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
+r"closed by (?:issue |merge request )?(?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
)
.expect("closed_by regex is valid")
});
/// Matches full GitLab URLs like:
/// `https://gitlab.example.com/group/project/-/issues/123`
/// `https://gitlab.example.com/group/sub/project/-/merge_requests/456`
static GITLAB_URL_RE: LazyLock<Regex> = LazyLock::new(|| {
Regex::new(
r"https?://[^\s/]+/(?P<project>[^\s]+?)/-/(?P<entity_type>issues|merge_requests)/(?P<iid>\d+)",
)
.expect("gitlab url regex is valid")
});
pub fn parse_cross_refs(body: &str) -> Vec<ParsedCrossRef> {
let mut refs = Vec::new();
@@ -54,6 +68,47 @@ pub fn parse_cross_refs(body: &str) -> Vec<ParsedCrossRef> {
refs
}
/// Extract cross-references from GitLab URLs in free-text bodies (descriptions, user notes).
pub fn parse_url_refs(body: &str) -> Vec<ParsedCrossRef> {
let mut refs = Vec::new();
let mut seen = std::collections::HashSet::new();
for caps in GITLAB_URL_RE.captures_iter(body) {
let Some(entity_type_raw) = caps.name("entity_type").map(|m| m.as_str()) else {
continue;
};
let Some(iid_str) = caps.name("iid").map(|m| m.as_str()) else {
continue;
};
let Some(project) = caps.name("project").map(|m| m.as_str()) else {
continue;
};
let Ok(iid) = iid_str.parse::<i64>() else {
continue;
};
let target_entity_type = match entity_type_raw {
"issues" => "issue",
"merge_requests" => "merge_request",
_ => continue,
};
let key = (target_entity_type, project.to_owned(), iid);
if !seen.insert(key) {
continue; // deduplicate within same body
}
refs.push(ParsedCrossRef {
reference_type: "mentioned".to_owned(),
target_entity_type: target_entity_type.to_owned(),
target_iid: iid,
target_project_path: Some(project.to_owned()),
});
}
refs
}
fn capture_to_cross_ref(
caps: &regex::Captures<'_>,
reference_type: &str,
@@ -233,331 +288,189 @@ fn resolve_cross_project_entity(
resolve_entity_id(conn, project_id, entity_type, iid)
}
#[cfg(test)]
mod tests {
use super::*;
/// Extract cross-references from issue and MR descriptions (GitLab URLs only).
pub fn extract_refs_from_descriptions(conn: &Connection, project_id: i64) -> Result<ExtractResult> {
let mut result = ExtractResult::default();
#[test]
fn test_parse_mentioned_in_mr() {
let refs = parse_cross_refs("mentioned in !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 567);
assert!(refs[0].target_project_path.is_none());
}
let mut insert_stmt = conn.prepare_cached(
"INSERT OR IGNORE INTO entity_references
(project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
target_project_path, target_entity_iid,
reference_type, source_method, created_at)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, 'description_parse', ?9)",
)?;
#[test]
fn test_parse_mentioned_in_issue() {
let refs = parse_cross_refs("mentioned in #234");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 234);
assert!(refs[0].target_project_path.is_none());
}
#[test]
fn test_parse_mentioned_cross_project() {
let refs = parse_cross_refs("mentioned in group/repo!789");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 789);
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
}
#[test]
fn test_parse_mentioned_cross_project_issue() {
let refs = parse_cross_refs("mentioned in group/repo#123");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 123);
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
}
#[test]
fn test_parse_closed_by_mr() {
let refs = parse_cross_refs("closed by !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 567);
assert!(refs[0].target_project_path.is_none());
}
#[test]
fn test_parse_closed_by_cross_project() {
let refs = parse_cross_refs("closed by group/repo!789");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 789);
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
}
#[test]
fn test_parse_multiple_refs() {
let refs = parse_cross_refs("mentioned in !123 and mentioned in #456");
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 123);
assert_eq!(refs[1].target_entity_type, "issue");
assert_eq!(refs[1].target_iid, 456);
}
#[test]
fn test_parse_no_refs() {
let refs = parse_cross_refs("Updated the description");
assert!(refs.is_empty());
}
#[test]
fn test_parse_non_english_note() {
let refs = parse_cross_refs("a ajout\u{00e9} l'\u{00e9}tiquette ~bug");
assert!(refs.is_empty());
}
#[test]
fn test_parse_multi_level_group_path() {
let refs = parse_cross_refs("mentioned in top/sub/project#123");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("top/sub/project")
);
assert_eq!(refs[0].target_iid, 123);
}
#[test]
fn test_parse_deeply_nested_group_path() {
let refs = parse_cross_refs("mentioned in a/b/c/d/e!42");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_project_path.as_deref(), Some("a/b/c/d/e"));
assert_eq!(refs[0].target_iid, 42);
}
#[test]
fn test_parse_hyphenated_project_path() {
let refs = parse_cross_refs("mentioned in my-group/my-project#99");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("my-group/my-project")
);
}
#[test]
fn test_parse_dotted_project_path() {
let refs = parse_cross_refs("mentioned in visiostack.io/backend#123");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("visiostack.io/backend")
);
assert_eq!(refs[0].target_iid, 123);
}
#[test]
fn test_parse_dotted_nested_project_path() {
let refs = parse_cross_refs("closed by my.org/sub.group/my.project!42");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("my.org/sub.group/my.project")
);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 42);
}
#[test]
fn test_parse_self_reference_is_valid() {
let refs = parse_cross_refs("mentioned in #123");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_iid, 123);
}
#[test]
fn test_parse_mixed_mentioned_and_closed() {
let refs = parse_cross_refs("mentioned in !10 and closed by !20");
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_iid, 10);
assert_eq!(refs[1].reference_type, "closes");
assert_eq!(refs[1].target_iid, 20);
}
fn setup_test_db() -> Connection {
use crate::core::db::{create_connection, run_migrations};
let conn = create_connection(std::path::Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_test_data(conn: &Connection) -> i64 {
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/test-project', 'https://gitlab.com/group/test-project', ?1, ?1)",
[now],
)
.unwrap();
// Issues with descriptions
let mut issue_stmt = conn.prepare_cached(
"SELECT id, iid, description FROM issues
WHERE project_id = ?1 AND description IS NOT NULL AND description != ''",
)?;
let issues: Vec<(i64, i64, String)> = issue_stmt
.query_map([project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 123, 'Test Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (11, 1001, 1, 456, 'Another Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 789, 'Test MR', 'opened', 'feat', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-aaa', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, merge_request_id, noteable_type, last_seen_at)
VALUES (31, 'disc-bbb', 1, 20, 'MergeRequest', ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 1, 'mentioned in !789', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (41, 4001, 31, 1, 1, 'mentioned in #456', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (42, 4002, 30, 1, 0, 'mentioned in !999', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (43, 4003, 30, 1, 1, 'added label ~bug', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (44, 4004, 30, 1, 1, 'mentioned in other/project#999', ?1, ?1, ?1)",
[now],
)
.unwrap();
1
for (entity_id, _iid, description) in &issues {
insert_url_refs(
conn,
&mut insert_stmt,
&mut result,
project_id,
"issue",
*entity_id,
description,
now,
)?;
}
#[test]
fn test_extract_refs_from_system_notes_integration() {
let conn = setup_test_db();
let project_id = seed_test_data(&conn);
// Merge requests with descriptions
let mut mr_stmt = conn.prepare_cached(
"SELECT id, iid, description FROM merge_requests
WHERE project_id = ?1 AND description IS NOT NULL AND description != ''",
)?;
let mrs: Vec<(i64, i64, String)> = mr_stmt
.query_map([project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
let result = extract_refs_from_system_notes(&conn, project_id).unwrap();
assert_eq!(result.inserted, 2, "Two same-project refs should resolve");
assert_eq!(
result.skipped_unresolvable, 1,
"One cross-project ref should be unresolvable"
);
assert_eq!(
result.parse_failures, 1,
"One system note has no cross-ref pattern"
);
let ref_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1 AND source_method = 'note_parse'",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(ref_count, 3, "Should have 3 entity_references rows total");
let unresolved_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE target_entity_id IS NULL AND source_method = 'note_parse'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
unresolved_count, 1,
"Should have 1 unresolved cross-project ref"
);
let (path, iid): (String, i64) = conn
.query_row(
"SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(path, "other/project");
assert_eq!(iid, 999);
for (entity_id, _iid, description) in &mrs {
insert_url_refs(
conn,
&mut insert_stmt,
&mut result,
project_id,
"merge_request",
*entity_id,
description,
now,
)?;
}
#[test]
fn test_extract_refs_idempotent() {
let conn = setup_test_db();
let project_id = seed_test_data(&conn);
let result1 = extract_refs_from_system_notes(&conn, project_id).unwrap();
let result2 = extract_refs_from_system_notes(&conn, project_id).unwrap();
assert_eq!(result2.inserted, 0);
assert_eq!(result2.skipped_unresolvable, 0);
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE source_method = 'note_parse'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
total,
(result1.inserted + result1.skipped_unresolvable) as i64
if result.inserted > 0 || result.skipped_unresolvable > 0 {
debug!(
inserted = result.inserted,
unresolvable = result.skipped_unresolvable,
"Description cross-reference extraction complete"
);
}
#[test]
fn test_extract_refs_empty_project() {
let conn = setup_test_db();
let result = extract_refs_from_system_notes(&conn, 999).unwrap();
assert_eq!(result.inserted, 0);
assert_eq!(result.skipped_unresolvable, 0);
assert_eq!(result.parse_failures, 0);
}
Ok(result)
}
/// Extract cross-references from user (non-system) notes (GitLab URLs only).
pub fn extract_refs_from_user_notes(conn: &Connection, project_id: i64) -> Result<ExtractResult> {
let mut result = ExtractResult::default();
let mut note_stmt = conn.prepare_cached(
"SELECT n.id, n.body, d.noteable_type,
COALESCE(d.issue_id, d.merge_request_id) AS entity_id
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
WHERE n.is_system = 0
AND n.project_id = ?1
AND n.body IS NOT NULL",
)?;
let notes: Vec<(i64, String, String, i64)> = note_stmt
.query_map([project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
if notes.is_empty() {
return Ok(result);
}
let mut insert_stmt = conn.prepare_cached(
"INSERT OR IGNORE INTO entity_references
(project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
target_project_path, target_entity_iid,
reference_type, source_method, created_at)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, 'note_parse', ?9)",
)?;
let now = now_ms();
for (_, body, noteable_type, entity_id) in &notes {
let source_entity_type = noteable_type_to_entity_type(noteable_type);
insert_url_refs(
conn,
&mut insert_stmt,
&mut result,
project_id,
source_entity_type,
*entity_id,
body,
now,
)?;
}
if result.inserted > 0 || result.skipped_unresolvable > 0 {
debug!(
inserted = result.inserted,
unresolvable = result.skipped_unresolvable,
"User note cross-reference extraction complete"
);
}
Ok(result)
}
/// Shared helper: parse URL refs from a body and insert into entity_references.
#[allow(clippy::too_many_arguments)]
fn insert_url_refs(
conn: &Connection,
insert_stmt: &mut rusqlite::CachedStatement<'_>,
result: &mut ExtractResult,
project_id: i64,
source_entity_type: &str,
source_entity_id: i64,
body: &str,
now: i64,
) -> Result<()> {
let url_refs = parse_url_refs(body);
for xref in &url_refs {
let target_entity_id = if let Some(ref path) = xref.target_project_path {
resolve_cross_project_entity(conn, path, &xref.target_entity_type, xref.target_iid)
} else {
resolve_entity_id(conn, project_id, &xref.target_entity_type, xref.target_iid)
};
let rows_changed = insert_stmt.execute(rusqlite::params![
project_id,
source_entity_type,
source_entity_id,
xref.target_entity_type,
target_entity_id,
xref.target_project_path,
if target_entity_id.is_none() {
Some(xref.target_iid)
} else {
None
},
xref.reference_type,
now,
])?;
if rows_changed > 0 {
if target_entity_id.is_none() {
result.skipped_unresolvable += 1;
} else {
result.inserted += 1;
}
}
}
Ok(())
}
#[cfg(test)]
#[path = "note_parser_tests.rs"]
mod tests;
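
Reviewer note: `parse_url_refs` extracts `(entity type, project path, iid)` from GitLab URLs and deduplicates within a single body. The sketch below reproduces that shape with the standard library only. It is a simplification, not a port of `GITLAB_URL_RE`: it inspects whitespace-separated tokens, recognises at most one URL per token, and splits at the first `/-/`. The `UrlRef` struct and both function names are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down counterpart of `ParsedCrossRef`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UrlRef {
    pub entity_type: &'static str,
    pub project: String,
    pub iid: i64,
}

/// Tries to read one GitLab issue or merge request URL out of a token.
fn parse_one(token: &str) -> Option<UrlRef> {
    // Tolerate leading punctuation such as "(" before the scheme.
    let start = token.find("https://").or_else(|| token.find("http://"))?;
    let url = &token[start..];
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;

    // Host is everything up to the first slash; the project path follows.
    let (host, path) = rest.split_once('/')?;
    if host.is_empty() {
        return None;
    }
    let (project, tail) = path.split_once("/-/")?;
    if project.is_empty() {
        return None;
    }

    // Map the URL segment to the singular entity type used in the diff.
    let (entity_type, after) = if let Some(a) = tail.strip_prefix("issues/") {
        ("issue", a)
    } else if let Some(a) = tail.strip_prefix("merge_requests/") {
        ("merge_request", a)
    } else {
        return None;
    };

    // Leading digits form the iid; trailing punctuation or anchors are ignored.
    let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
    let iid = digits.parse::<i64>().ok()?;

    Some(UrlRef {
        entity_type,
        project: project.to_owned(),
        iid,
    })
}

/// Collects unique references in order of first appearance.
pub fn parse_url_refs_sketch(body: &str) -> Vec<UrlRef> {
    let mut seen = HashSet::new();
    let mut refs = Vec::new();
    for token in body.split_whitespace() {
        if let Some(r) = parse_one(token) {
            if seen.insert((r.entity_type, r.project.clone(), r.iid)) {
                refs.push(r);
            }
        }
    }
    refs
}
```

Keying the `seen` set on the full `(entity type, project, iid)` triple mirrors the diff: the same iid in a different project, or an issue and a merge request sharing an iid, still count as separate references.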


@@ -0,0 +1,770 @@
use super::*;
// --- parse_cross_refs: real GitLab system note format ---
#[test]
fn test_parse_mentioned_in_mr() {
let refs = parse_cross_refs("mentioned in merge request !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 567);
assert!(refs[0].target_project_path.is_none());
}
#[test]
fn test_parse_mentioned_in_issue() {
let refs = parse_cross_refs("mentioned in issue #234");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 234);
assert!(refs[0].target_project_path.is_none());
}
#[test]
fn test_parse_mentioned_cross_project() {
let refs = parse_cross_refs("mentioned in merge request group/repo!789");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 789);
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
}
#[test]
fn test_parse_mentioned_cross_project_issue() {
let refs = parse_cross_refs("mentioned in issue group/repo#123");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 123);
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
}
#[test]
fn test_parse_closed_by_mr() {
let refs = parse_cross_refs("closed by merge request !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 567);
assert!(refs[0].target_project_path.is_none());
}
#[test]
fn test_parse_closed_by_cross_project() {
let refs = parse_cross_refs("closed by merge request group/repo!789");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 789);
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
}
#[test]
fn test_parse_multiple_refs() {
let refs = parse_cross_refs("mentioned in merge request !123 and mentioned in issue #456");
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 123);
assert_eq!(refs[1].target_entity_type, "issue");
assert_eq!(refs[1].target_iid, 456);
}
#[test]
fn test_parse_no_refs() {
let refs = parse_cross_refs("Updated the description");
assert!(refs.is_empty());
}
#[test]
fn test_parse_non_english_note() {
let refs = parse_cross_refs("a ajout\u{00e9} l'\u{00e9}tiquette ~bug");
assert!(refs.is_empty());
}
#[test]
fn test_parse_multi_level_group_path() {
let refs = parse_cross_refs("mentioned in issue top/sub/project#123");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("top/sub/project")
);
assert_eq!(refs[0].target_iid, 123);
}
#[test]
fn test_parse_deeply_nested_group_path() {
let refs = parse_cross_refs("mentioned in merge request a/b/c/d/e!42");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_project_path.as_deref(), Some("a/b/c/d/e"));
assert_eq!(refs[0].target_iid, 42);
}
#[test]
fn test_parse_hyphenated_project_path() {
let refs = parse_cross_refs("mentioned in issue my-group/my-project#99");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("my-group/my-project")
);
}
#[test]
fn test_parse_dotted_project_path() {
let refs = parse_cross_refs("mentioned in issue visiostack.io/backend#123");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("visiostack.io/backend")
);
assert_eq!(refs[0].target_iid, 123);
}
#[test]
fn test_parse_dotted_nested_project_path() {
let refs = parse_cross_refs("closed by merge request my.org/sub.group/my.project!42");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("my.org/sub.group/my.project")
);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 42);
}
// Bare-sigil fallback (no "issue"/"merge request" word) still works
#[test]
fn test_parse_bare_sigil_fallback() {
let refs = parse_cross_refs("mentioned in #123");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_iid, 123);
assert_eq!(refs[0].target_entity_type, "issue");
}
#[test]
fn test_parse_bare_sigil_closed_by() {
let refs = parse_cross_refs("closed by !567");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].reference_type, "closes");
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 567);
}
#[test]
fn test_parse_mixed_mentioned_and_closed() {
let refs = parse_cross_refs("mentioned in merge request !10 and closed by merge request !20");
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].reference_type, "mentioned");
assert_eq!(refs[0].target_iid, 10);
assert_eq!(refs[1].reference_type, "closes");
assert_eq!(refs[1].target_iid, 20);
}
// --- parse_url_refs ---
#[test]
fn test_url_ref_same_project_issue() {
let refs = parse_url_refs(
"See https://gitlab.visiostack.com/vs/typescript-code/-/issues/3537 for details",
);
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 3537);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("vs/typescript-code")
);
assert_eq!(refs[0].reference_type, "mentioned");
}
#[test]
fn test_url_ref_merge_request() {
let refs =
parse_url_refs("https://gitlab.visiostack.com/vs/typescript-code/-/merge_requests/3548");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 3548);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("vs/typescript-code")
);
}
#[test]
fn test_url_ref_cross_project() {
let refs = parse_url_refs(
"Related: https://gitlab.visiostack.com/vs/python-code/-/merge_requests/5203",
);
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 5203);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("vs/python-code")
);
}
#[test]
fn test_url_ref_with_anchor() {
let refs =
parse_url_refs("https://gitlab.visiostack.com/vs/typescript-code/-/issues/123#note_456");
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 123);
}
#[test]
fn test_url_ref_markdown_link() {
let refs = parse_url_refs(
"Check [this MR](https://gitlab.visiostack.com/vs/typescript-code/-/merge_requests/100) for context",
);
assert_eq!(refs.len(), 1);
assert_eq!(refs[0].target_entity_type, "merge_request");
assert_eq!(refs[0].target_iid, 100);
}
#[test]
fn test_url_ref_multiple_urls() {
let body =
"See https://gitlab.com/a/b/-/issues/1 and https://gitlab.com/a/b/-/merge_requests/2";
let refs = parse_url_refs(body);
assert_eq!(refs.len(), 2);
assert_eq!(refs[0].target_entity_type, "issue");
assert_eq!(refs[0].target_iid, 1);
assert_eq!(refs[1].target_entity_type, "merge_request");
assert_eq!(refs[1].target_iid, 2);
}
#[test]
fn test_url_ref_deduplicates() {
let body = "See https://gitlab.com/a/b/-/issues/1 and again https://gitlab.com/a/b/-/issues/1";
let refs = parse_url_refs(body);
assert_eq!(
refs.len(),
1,
"Duplicate URLs in same body should be deduplicated"
);
}
#[test]
fn test_url_ref_non_gitlab_urls_ignored() {
let refs = parse_url_refs(
"Check https://google.com/search?q=test and https://github.com/org/repo/issues/1",
);
assert!(refs.is_empty());
}
#[test]
fn test_url_ref_deeply_nested_project() {
let refs = parse_url_refs("https://gitlab.com/org/sub/deep/project/-/issues/42");
assert_eq!(refs.len(), 1);
assert_eq!(
refs[0].target_project_path.as_deref(),
Some("org/sub/deep/project")
);
assert_eq!(refs[0].target_iid, 42);
}
// --- Integration tests: system notes (updated for real format) ---
fn setup_test_db() -> Connection {
use crate::core::db::{create_connection, run_migrations};
let conn = create_connection(std::path::Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_test_data(conn: &Connection) -> i64 {
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/test-project', 'https://gitlab.com/group/test-project', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 123, 'Test Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (11, 1001, 1, 456, 'Another Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 789, 'Test MR', 'opened', 'feat', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-aaa', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, merge_request_id, noteable_type, last_seen_at)
VALUES (31, 'disc-bbb', 1, 20, 'MergeRequest', ?1)",
[now],
)
.unwrap();
// System note: real GitLab format "mentioned in merge request !789"
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 1, 'mentioned in merge request !789', ?1, ?1, ?1)",
[now],
)
.unwrap();
// System note: real GitLab format "mentioned in issue #456"
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (41, 4001, 31, 1, 1, 'mentioned in issue #456', ?1, ?1, ?1)",
[now],
)
.unwrap();
// User note (is_system=0) — should NOT be processed by system note extractor
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (42, 4002, 30, 1, 0, 'mentioned in merge request !999', ?1, ?1, ?1)",
[now],
)
.unwrap();
// System note with no cross-ref pattern
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (43, 4003, 30, 1, 1, 'added label ~bug', ?1, ?1, ?1)",
[now],
)
.unwrap();
// System note: cross-project ref
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (44, 4004, 30, 1, 1, 'mentioned in issue other/project#999', ?1, ?1, ?1)",
[now],
)
.unwrap();
1
}
#[test]
fn test_extract_refs_from_system_notes_integration() {
let conn = setup_test_db();
let project_id = seed_test_data(&conn);
let result = extract_refs_from_system_notes(&conn, project_id).unwrap();
assert_eq!(result.inserted, 2, "Two same-project refs should resolve");
assert_eq!(
result.skipped_unresolvable, 1,
"One cross-project ref should be unresolvable"
);
assert_eq!(
result.parse_failures, 1,
"One system note has no cross-ref pattern"
);
let ref_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1 AND source_method = 'note_parse'",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(ref_count, 3, "Should have 3 entity_references rows total");
let unresolved_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE target_entity_id IS NULL AND source_method = 'note_parse'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
unresolved_count, 1,
"Should have 1 unresolved cross-project ref"
);
let (path, iid): (String, i64) = conn
.query_row(
"SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(path, "other/project");
assert_eq!(iid, 999);
}
#[test]
fn test_extract_refs_idempotent() {
let conn = setup_test_db();
let project_id = seed_test_data(&conn);
let result1 = extract_refs_from_system_notes(&conn, project_id).unwrap();
let result2 = extract_refs_from_system_notes(&conn, project_id).unwrap();
assert_eq!(result2.inserted, 0);
assert_eq!(result2.skipped_unresolvable, 0);
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE source_method = 'note_parse'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
total,
(result1.inserted + result1.skipped_unresolvable) as i64
);
}
#[test]
fn test_extract_refs_empty_project() {
let conn = setup_test_db();
let result = extract_refs_from_system_notes(&conn, 999).unwrap();
assert_eq!(result.inserted, 0);
assert_eq!(result.skipped_unresolvable, 0);
assert_eq!(result.parse_failures, 0);
}
// --- Integration tests: description extraction ---
#[test]
fn test_extract_refs_from_descriptions_issue() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
[now],
)
.unwrap();
// Issue with MR reference in description
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 3537, 'Test Issue', 'opened',
'Related to https://gitlab.com/vs/typescript-code/-/merge_requests/3548',
?1, ?1, ?1)",
[now],
)
.unwrap();
// The target MR so it resolves
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 3548, 'Fix MR', 'merged', 'fix', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(result.inserted, 1, "Should insert 1 description ref");
assert_eq!(result.skipped_unresolvable, 0);
let method: String = conn
.query_row(
"SELECT source_method FROM entity_references WHERE project_id = 1",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(method, "description_parse");
}
#[test]
fn test_extract_refs_from_descriptions_mr() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 100, 'Target Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, description, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 200, 'Fixing MR', 'merged', 'fix', 'main', 'dev',
'Fixes https://gitlab.com/vs/typescript-code/-/issues/100',
?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(result.inserted, 1);
let (src_type, tgt_type): (String, String) = conn
.query_row(
"SELECT source_entity_type, target_entity_type FROM entity_references WHERE project_id = 1",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(src_type, "merge_request");
assert_eq!(tgt_type, "issue");
}
#[test]
fn test_extract_refs_from_descriptions_idempotent() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 1, 'Issue', 'opened',
'See https://gitlab.com/vs/code/-/merge_requests/2', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 2, 'MR', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
let r1 = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(r1.inserted, 1);
let r2 = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(r2.inserted, 0, "Second run should insert 0 (idempotent)");
}
#[test]
fn test_extract_refs_from_descriptions_cross_project_unresolved() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 1, 'Issue', 'opened',
'See https://gitlab.com/vs/other-project/-/merge_requests/99', ?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
assert_eq!(result.inserted, 0);
assert_eq!(
result.skipped_unresolvable, 1,
"Cross-project ref with no matching project should be unresolvable"
);
let (path, iid): (String, i64) = conn
.query_row(
"SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(path, "vs/other-project");
assert_eq!(iid, 99);
}
// --- Integration tests: user note extraction ---
#[test]
fn test_extract_refs_from_user_notes_with_url() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 50, 'Source Issue', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 60, 'Target MR', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-user', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
// User note with a URL
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 0,
'This is related to https://gitlab.com/vs/code/-/merge_requests/60',
?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(result.inserted, 1);
let method: String = conn
.query_row(
"SELECT source_method FROM entity_references WHERE project_id = 1",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(method, "note_parse");
}
#[test]
fn test_extract_refs_from_user_notes_no_system_note_patterns() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 50, 'Source', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 999, 'Target', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-x', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
// User note with system-note-like text but no URL — should NOT extract
// (user notes only use URL parsing, not system note pattern matching)
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 0, 'mentioned in merge request !999', ?1, ?1, ?1)",
[now],
)
.unwrap();
let result = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(
result.inserted, 0,
"User notes should only parse URLs, not system note patterns"
);
}
#[test]
fn test_extract_refs_from_user_notes_idempotent() {
let conn = setup_test_db();
let now = now_ms();
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 1000, 1, 1, 'Src', 'opened', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
VALUES (20, 2000, 1, 2, 'Tgt', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
VALUES (30, 'disc-y', 1, 10, 'Issue', ?1)",
[now],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
VALUES (40, 4000, 30, 1, 0,
'See https://gitlab.com/vs/code/-/merge_requests/2', ?1, ?1, ?1)",
[now],
)
.unwrap();
let r1 = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(r1.inserted, 1);
let r2 = extract_refs_from_user_notes(&conn, 1).unwrap();
assert_eq!(r2.inserted, 0, "Second extraction should be idempotent");
}
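
Reviewer note: the `parse_url_refs` tests above expect a GitLab web URL to yield a project path, an entity type, and an iid, with anchors such as `#note_456` ignored and non-GitLab URLs rejected. The following is a minimal std-only sketch of that idea for a single URL. It is an assumption about one workable approach, not the commit's implementation, and `parse_gitlab_url` is a hypothetical name. It does not scan free text or deduplicate.

```rust
/// Hypothetical helper (not from the commit): extract
/// (project_path, entity_type, iid) from one GitLab web URL.
pub fn parse_gitlab_url(url: &str) -> Option<(String, &'static str, i64)> {
    // Drop the scheme, then the host (everything up to the first `/`).
    let after_scheme = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    let (_host, path) = after_scheme.split_once('/')?;

    // GitLab separates the project path from the resource with `/-/`.
    let (project, resource) = path.split_once("/-/")?;
    if project.is_empty() {
        return None;
    }

    let (entity_type, tail) = if let Some(t) = resource.strip_prefix("issues/") {
        ("issue", t)
    } else if let Some(t) = resource.strip_prefix("merge_requests/") {
        ("merge_request", t)
    } else {
        return None;
    };

    // The iid is the leading run of digits; anything after it
    // (an anchor, a closing parenthesis, a query string) is ignored.
    let end = tail
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(tail.len());
    let iid = tail[..end].parse::<i64>().ok()?;

    Some((project.to_string(), entity_type, iid))
}
```

Requiring the `/-/` separator is what makes a URL like `https://github.com/org/repo/issues/1` fall through to `None` in this sketch.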

src/core/path_resolver.rs Normal file

@@ -0,0 +1,244 @@
use rusqlite::Connection;
use super::error::{LoreError, Result};
// ─── SQL Helpers ─────────────────────────────────────────────────────────────
/// Escape LIKE metacharacters (`%`, `_`, `\`).
/// All queries using this must include `ESCAPE '\'`.
pub fn escape_like(input: &str) -> String {
input
.replace('\\', "\\\\")
.replace('%', "\\%")
.replace('_', "\\_")
}
/// Normalize user-supplied repo paths to match stored DiffNote / file-change paths.
/// - trims whitespace
/// - strips leading "./" and "/" (repo-relative paths)
/// - converts '\' to '/' when no '/' present (Windows paste)
/// - collapses repeated "//"
pub fn normalize_repo_path(input: &str) -> String {
let mut s = input.trim().to_string();
// Windows backslash normalization (only when no forward slashes present)
if s.contains('\\') && !s.contains('/') {
s = s.replace('\\', "/");
}
// Strip leading ./
while s.starts_with("./") {
s = s[2..].to_string();
}
// Strip leading /
s = s.trim_start_matches('/').to_string();
// Collapse repeated //
while s.contains("//") {
s = s.replace("//", "/");
}
s
}
// ─── Path Query Resolution ──────────────────────────────────────────────────
/// Describes how to match a user-supplied path in SQL.
#[derive(Debug)]
pub struct PathQuery {
/// The parameter value to bind.
pub value: String,
/// If true: use `LIKE value ESCAPE '\'`. If false: use `= value`.
pub is_prefix: bool,
}
/// Result of a suffix probe against the DB.
pub enum SuffixResult {
/// Suffix probe was not attempted (conditions not met).
NotAttempted,
/// No paths matched the suffix.
NoMatch,
/// Exactly one distinct path matched — auto-resolve.
Unique(String),
/// Multiple distinct paths matched — user must disambiguate.
Ambiguous(Vec<String>),
}
/// Build a path query from a user-supplied path, with project-scoped DB probes.
///
/// Resolution strategy (in priority order):
/// 1. Trailing `/` → directory prefix (LIKE `path/%`)
/// 2. Exact match probe against notes + `mr_file_changes` → exact (= `path`)
/// 3. Directory prefix probe → prefix (LIKE `path/%`)
/// 4. Suffix probe (file-like input with no exact or prefix hit) → auto-resolve if unique, ambiguity error otherwise
/// 5. Heuristic fallback: `.` in last segment → file, else → directory prefix
pub fn build_path_query(
conn: &Connection,
path: &str,
project_id: Option<i64>,
) -> Result<PathQuery> {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
let is_root = !trimmed.contains('/');
let forced_dir = path.ends_with('/');
// Heuristic is now only a fallback; probes decide first when ambiguous.
let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
// Probe 1: exact file exists in DiffNotes OR mr_file_changes (project-scoped)
// Checks both new_path and old_path to support querying renamed files.
// Note: `.is_ok()` maps any query error (not just "no rows") to "no match".
let exact_exists = conn
.query_row(
"SELECT 1 FROM notes INDEXED BY idx_notes_diffnote_path_created
WHERE note_type = 'DiffNote'
AND is_system = 0
AND (position_new_path = ?1 OR position_old_path = ?1)
AND (?2 IS NULL OR project_id = ?2)
LIMIT 1",
rusqlite::params![trimmed, project_id],
|_| Ok(()),
)
.is_ok()
|| conn
.query_row(
"SELECT 1 FROM mr_file_changes
WHERE (new_path = ?1 OR old_path = ?1)
AND (?2 IS NULL OR project_id = ?2)
LIMIT 1",
rusqlite::params![trimmed, project_id],
|_| Ok(()),
)
.is_ok();
// Probe 2: directory prefix exists in DiffNotes OR mr_file_changes (project-scoped)
let prefix_exists = if !forced_dir && !exact_exists {
let escaped = escape_like(trimmed);
let pat = format!("{escaped}/%");
conn.query_row(
"SELECT 1 FROM notes INDEXED BY idx_notes_diffnote_path_created
WHERE note_type = 'DiffNote'
AND is_system = 0
AND (position_new_path LIKE ?1 ESCAPE '\\' OR position_old_path LIKE ?1 ESCAPE '\\')
AND (?2 IS NULL OR project_id = ?2)
LIMIT 1",
rusqlite::params![pat, project_id],
|_| Ok(()),
)
.is_ok()
|| conn
.query_row(
"SELECT 1 FROM mr_file_changes
WHERE (new_path LIKE ?1 ESCAPE '\\' OR old_path LIKE ?1 ESCAPE '\\')
AND (?2 IS NULL OR project_id = ?2)
LIMIT 1",
rusqlite::params![pat, project_id],
|_| Ok(()),
)
.is_ok()
} else {
false
};
// Probe 3: suffix match — user typed a bare filename or partial path that
// doesn't exist as-is. Search for full paths ending with /input (or equal to input).
// This handles "login.rs" matching "src/auth/login.rs".
let suffix_resolved = if !forced_dir && !exact_exists && !prefix_exists && looks_like_file {
suffix_probe(conn, trimmed, project_id)?
} else {
SuffixResult::NotAttempted
};
match suffix_resolved {
SuffixResult::Unique(full_path) => Ok(PathQuery {
value: full_path,
is_prefix: false,
}),
SuffixResult::Ambiguous(candidates) => {
let list = candidates
.iter()
.map(|p| format!(" {p}"))
.collect::<Vec<_>>()
.join("\n");
Err(LoreError::Ambiguous(format!(
"'{trimmed}' matches multiple paths. Use the full path or -p to scope:\n{list}"
)))
}
SuffixResult::NotAttempted | SuffixResult::NoMatch => {
// Fallback order: forced directory > exact > prefix > heuristic
let is_file = if forced_dir {
false
} else if exact_exists {
true
} else if prefix_exists {
false
} else {
looks_like_file
};
if is_file {
Ok(PathQuery {
value: trimmed.to_string(),
is_prefix: false,
})
} else {
let escaped = escape_like(trimmed);
Ok(PathQuery {
value: format!("{escaped}/%"),
is_prefix: true,
})
}
}
}
}
/// Probe both notes and mr_file_changes for paths ending with the given suffix.
/// Searches both new_path and old_path columns to support renamed file resolution.
/// Returns up to 11 distinct candidates (enough to detect ambiguity + show a useful list).
pub fn suffix_probe(
conn: &Connection,
suffix: &str,
project_id: Option<i64>,
) -> Result<SuffixResult> {
let escaped = escape_like(suffix);
let suffix_pat = format!("%/{escaped}");
let mut stmt = conn.prepare_cached(
"SELECT DISTINCT full_path FROM (
SELECT position_new_path AS full_path
FROM notes INDEXED BY idx_notes_diffnote_path_created
WHERE note_type = 'DiffNote'
AND is_system = 0
AND (position_new_path LIKE ?1 ESCAPE '\\' OR position_new_path = ?2)
AND (?3 IS NULL OR project_id = ?3)
UNION
SELECT new_path AS full_path FROM mr_file_changes
WHERE (new_path LIKE ?1 ESCAPE '\\' OR new_path = ?2)
AND (?3 IS NULL OR project_id = ?3)
UNION
SELECT position_old_path AS full_path FROM notes
WHERE note_type = 'DiffNote'
AND is_system = 0
AND position_old_path IS NOT NULL
AND (position_old_path LIKE ?1 ESCAPE '\\' OR position_old_path = ?2)
AND (?3 IS NULL OR project_id = ?3)
UNION
SELECT old_path AS full_path FROM mr_file_changes
WHERE old_path IS NOT NULL
AND (old_path LIKE ?1 ESCAPE '\\' OR old_path = ?2)
AND (?3 IS NULL OR project_id = ?3)
)
ORDER BY full_path
LIMIT 11",
)?;
let candidates: Vec<String> = stmt
.query_map(rusqlite::params![suffix_pat, suffix, project_id], |row| {
row.get(0)
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
match candidates.len() {
0 => Ok(SuffixResult::NoMatch),
1 => Ok(SuffixResult::Unique(candidates.into_iter().next().unwrap())),
_ => Ok(SuffixResult::Ambiguous(candidates)),
}
}
#[cfg(test)]
#[path = "path_resolver_tests.rs"]
mod tests;
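
Reviewer note: the precedence in `build_path_query` (trailing slash, exact probe, prefix probe, suffix probe, heuristic) is easier to follow with the database removed. The sketch below models it as a pure function, assuming the probe outcomes are passed in as plain values. `Resolution` and `resolve` are hypothetical names introduced for illustration; `escape_like` is copied from the file above.

```rust
/// Hypothetical result type (not from the commit).
#[derive(Debug, PartialEq)]
pub enum Resolution {
    /// Bind with `= value`.
    Exact(String),
    /// Bind with `LIKE value ESCAPE '\'`.
    Prefix(String),
    /// Several full paths end with the input; the caller must choose.
    Ambiguous(Vec<String>),
}

/// Escape LIKE metacharacters, as in the file above.
pub fn escape_like(input: &str) -> String {
    input
        .replace('\\', "\\\\")
        .replace('%', "\\%")
        .replace('_', "\\_")
}

/// Hypothetical pure model of the decision order. The three probe
/// results stand in for the SQL queries.
pub fn resolve(
    path: &str,
    exact_exists: bool,
    prefix_exists: bool,
    suffix_matches: &[&str],
) -> Resolution {
    let trimmed = path.trim_end_matches('/');
    let forced_dir = path.ends_with('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_root = !trimmed.contains('/');
    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));

    // The suffix probe only applies when nothing else matched and the
    // input looks like a file.
    if !exact_exists && !prefix_exists && looks_like_file {
        match suffix_matches {
            [] => {}
            [only] => return Resolution::Exact(only.to_string()),
            many => {
                return Resolution::Ambiguous(many.iter().map(|s| s.to_string()).collect());
            }
        }
    }

    // Fallback: forced directory > exact > prefix > heuristic.
    let is_file = if forced_dir {
        false
    } else if exact_exists {
        true
    } else if prefix_exists {
        false
    } else {
        looks_like_file
    };

    if is_file {
        Resolution::Exact(trimmed.to_string())
    } else {
        Resolution::Prefix(format!("{}/%", escape_like(trimmed)))
    }
}
```

For example, `resolve("src/Dockerfile", false, false, &[])` falls through to the prefix branch because the last segment has no dot, while passing `exact_exists = true` turns the same input into an exact match. That is the distinction `test_db_probe_detects_dotless_file` exercises against a real database.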


@@ -0,0 +1,290 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_project(conn: &Connection, id: i64) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (?1, ?1, 'group/repo', 'https://gl.example.com/group/repo', 1000, 2000)",
rusqlite::params![id],
)
.unwrap();
}
fn seed_mr(conn: &Connection, mr_id: i64, project_id: i64) {
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (?1, ?1, ?1, ?2, 'MR', 'merged', 1000, 2000, 2000, 'feat', 'main')",
rusqlite::params![mr_id, project_id],
)
.unwrap();
}
fn seed_file_change(conn: &Connection, mr_id: i64, project_id: i64, path: &str) {
conn.execute(
"INSERT INTO mr_file_changes (merge_request_id, project_id, new_path, change_type)
VALUES (?1, ?2, ?3, 'modified')",
rusqlite::params![mr_id, project_id, path],
)
.unwrap();
}
fn seed_diffnote(conn: &Connection, id: i64, project_id: i64, path: &str) {
// Need a discussion first (MergeRequest type, linked to mr_id=1)
conn.execute(
"INSERT OR IGNORE INTO discussions (id, gitlab_discussion_id, project_id, \
merge_request_id, noteable_type, resolvable, resolved, last_seen_at, last_note_at)
VALUES (?1, ?2, ?3, 1, 'MergeRequest', 1, 0, 2000, 2000)",
rusqlite::params![id, format!("disc-{id}"), project_id],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, note_type, is_system, \
author_username, body, created_at, updated_at, last_seen_at, position_new_path)
VALUES (?1, ?1, ?1, ?2, 'DiffNote', 0, 'user', 'note', 1000, 2000, 2000, ?3)",
rusqlite::params![id, project_id, path],
)
.unwrap();
}
// ─── escape_like ─────────────────────────────────────────────────────────────
#[test]
fn test_escape_like() {
assert_eq!(escape_like("normal/path"), "normal/path");
assert_eq!(escape_like("has_underscore"), "has\\_underscore");
assert_eq!(escape_like("has%percent"), "has\\%percent");
assert_eq!(escape_like("has\\backslash"), "has\\\\backslash");
}
// ─── normalize_repo_path ─────────────────────────────────────────────────────
#[test]
fn test_normalize_repo_path() {
assert_eq!(normalize_repo_path("./src/foo/"), "src/foo/");
assert_eq!(normalize_repo_path("/src/foo/"), "src/foo/");
assert_eq!(normalize_repo_path("././src/foo"), "src/foo");
assert_eq!(normalize_repo_path("src\\foo\\bar.rs"), "src/foo/bar.rs");
assert_eq!(normalize_repo_path("src/foo\\bar"), "src/foo\\bar");
assert_eq!(normalize_repo_path("src//foo//bar/"), "src/foo/bar/");
assert_eq!(normalize_repo_path(" src/foo/ "), "src/foo/");
assert_eq!(normalize_repo_path("src/foo/bar.rs"), "src/foo/bar.rs");
assert_eq!(normalize_repo_path(""), "");
}
// ─── build_path_query heuristics (no DB data) ──────────────────────────────
#[test]
fn test_trailing_slash_is_prefix() {
let conn = setup_test_db();
let pq = build_path_query(&conn, "src/auth/", None).unwrap();
assert_eq!(pq.value, "src/auth/%");
assert!(pq.is_prefix);
}
#[test]
fn test_no_dot_in_last_segment_is_prefix() {
let conn = setup_test_db();
let pq = build_path_query(&conn, "src/auth", None).unwrap();
assert_eq!(pq.value, "src/auth/%");
assert!(pq.is_prefix);
}
#[test]
fn test_file_extension_is_exact() {
let conn = setup_test_db();
let pq = build_path_query(&conn, "src/auth/login.rs", None).unwrap();
assert_eq!(pq.value, "src/auth/login.rs");
assert!(!pq.is_prefix);
}
#[test]
fn test_root_file_is_exact() {
let conn = setup_test_db();
let pq = build_path_query(&conn, "README.md", None).unwrap();
assert_eq!(pq.value, "README.md");
assert!(!pq.is_prefix);
}
#[test]
fn test_dotless_root_file_is_exact() {
let conn = setup_test_db();
let pq = build_path_query(&conn, "Makefile", None).unwrap();
assert_eq!(pq.value, "Makefile");
assert!(!pq.is_prefix);
let pq = build_path_query(&conn, "LICENSE", None).unwrap();
assert_eq!(pq.value, "LICENSE");
assert!(!pq.is_prefix);
}
#[test]
fn test_metacharacters_escaped_in_prefix() {
let conn = setup_test_db();
let pq = build_path_query(&conn, "src/test_files/", None).unwrap();
assert_eq!(pq.value, "src/test\\_files/%");
assert!(pq.is_prefix);
}
#[test]
fn test_exact_value_not_escaped() {
let conn = setup_test_db();
let pq = build_path_query(&conn, "README_with_underscore.md", None).unwrap();
assert_eq!(pq.value, "README_with_underscore.md");
assert!(!pq.is_prefix);
}
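Taken together, the heuristic tests above (the ones that run against an empty database) suggest a simple classification rule. The sketch below restates that rule without any database access. `PathQuerySketch` and `classify_path_sketch` are hypothetical names, and the real `build_path_query` additionally performs the DB probes and suffix resolution exercised in the later tests, which this sketch omits.

```rust
/// Hypothetical stand-in for the query struct used in the tests.
struct PathQuerySketch {
    value: String,
    is_prefix: bool,
}

/// Escape LIKE metacharacters; queries using the result need `ESCAPE '\'`.
fn escape_like(input: &str) -> String {
    input
        .replace('\\', "\\\\")
        .replace('%', "\\%")
        .replace('_', "\\_")
}

/// Sketch of the no-DB heuristics, inferred from the test cases.
fn classify_path_sketch(path: &str) -> PathQuerySketch {
    let forced_dir = path.ends_with('/');
    let trimmed = path.trim_end_matches('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_root = !trimmed.contains('/');

    // A path is treated as a file when it has no trailing slash and is
    // either a root-level name ("Makefile") or has a dot in its last segment.
    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));

    if looks_like_file {
        // Exact values are compared with `=`, so they are not escaped.
        PathQuerySketch { value: trimmed.to_string(), is_prefix: false }
    } else {
        // Prefix values feed a LIKE pattern, so metacharacters are escaped.
        PathQuerySketch {
            value: format!("{}/%", escape_like(trimmed)),
            is_prefix: true,
        }
    }
}
```

Escaping only on the prefix branch matches the two tests above: `src/test_files/` becomes `src/test\_files/%`, while `README_with_underscore.md` is returned unchanged.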
// ─── build_path_query DB probes ─────────────────────────────────────────────
#[test]
fn test_db_probe_detects_dotless_file() {
// "src/Dockerfile" has no dot in last segment -> normally prefix.
// DB probe detects it's actually a file.
let conn = setup_test_db();
seed_project(&conn, 1);
seed_mr(&conn, 1, 1);
seed_diffnote(&conn, 1, 1, "src/Dockerfile");
let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();
assert_eq!(pq.value, "src/Dockerfile");
assert!(!pq.is_prefix);
// Without DB data -> falls through to prefix
let empty = setup_test_db();
let pq2 = build_path_query(&empty, "src/Dockerfile", None).unwrap();
assert!(pq2.is_prefix);
}
#[test]
fn test_db_probe_via_file_changes() {
// Exact match via mr_file_changes even without notes
let conn = setup_test_db();
seed_project(&conn, 1);
seed_mr(&conn, 1, 1);
seed_file_change(&conn, 1, 1, "src/Dockerfile");
let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();
assert_eq!(pq.value, "src/Dockerfile");
assert!(!pq.is_prefix);
}
#[test]
fn test_db_probe_project_scoped() {
let conn = setup_test_db();
seed_project(&conn, 1);
seed_project(&conn, 2);
seed_mr(&conn, 1, 1);
seed_diffnote(&conn, 1, 1, "infra/Makefile");
// Unscoped: finds it
assert!(
!build_path_query(&conn, "infra/Makefile", None)
.unwrap()
.is_prefix
);
// Scoped to project 1: finds it
assert!(
!build_path_query(&conn, "infra/Makefile", Some(1))
.unwrap()
.is_prefix
);
// Scoped to project 2: no data -> prefix
assert!(
build_path_query(&conn, "infra/Makefile", Some(2))
.unwrap()
.is_prefix
);
}
// ─── suffix resolution ──────────────────────────────────────────────────────
#[test]
fn test_suffix_resolves_bare_filename() {
let conn = setup_test_db();
seed_project(&conn, 1);
seed_mr(&conn, 1, 1);
seed_file_change(&conn, 1, 1, "src/auth/login.rs");
let pq = build_path_query(&conn, "login.rs", None).unwrap();
assert_eq!(pq.value, "src/auth/login.rs");
assert!(!pq.is_prefix);
}
#[test]
fn test_suffix_resolves_partial_path() {
let conn = setup_test_db();
seed_project(&conn, 1);
seed_mr(&conn, 1, 1);
seed_file_change(&conn, 1, 1, "src/auth/login.rs");
let pq = build_path_query(&conn, "auth/login.rs", None).unwrap();
assert_eq!(pq.value, "src/auth/login.rs");
assert!(!pq.is_prefix);
}
#[test]
fn test_suffix_ambiguous_returns_error() {
let conn = setup_test_db();
seed_project(&conn, 1);
seed_mr(&conn, 1, 1);
seed_file_change(&conn, 1, 1, "src/auth/utils.rs");
seed_file_change(&conn, 1, 1, "src/db/utils.rs");
let err = build_path_query(&conn, "utils.rs", None).unwrap_err();
let msg = err.to_string();
assert!(msg.contains("src/auth/utils.rs"), "candidates: {msg}");
assert!(msg.contains("src/db/utils.rs"), "candidates: {msg}");
}
#[test]
fn test_suffix_scoped_to_project() {
let conn = setup_test_db();
seed_project(&conn, 1);
seed_project(&conn, 2);
seed_mr(&conn, 1, 1);
seed_mr(&conn, 2, 2);
seed_file_change(&conn, 1, 1, "src/utils.rs");
seed_file_change(&conn, 2, 2, "lib/utils.rs");
// Unscoped: ambiguous
assert!(build_path_query(&conn, "utils.rs", None).is_err());
// Scoped to project 1: resolves
let pq = build_path_query(&conn, "utils.rs", Some(1)).unwrap();
assert_eq!(pq.value, "src/utils.rs");
}
#[test]
fn test_suffix_deduplicates_across_sources() {
// Same path in notes AND file_changes -> single match, not ambiguous
let conn = setup_test_db();
seed_project(&conn, 1);
seed_mr(&conn, 1, 1);
seed_file_change(&conn, 1, 1, "src/auth/login.rs");
seed_diffnote(&conn, 1, 1, "src/auth/login.rs");
let pq = build_path_query(&conn, "login.rs", None).unwrap();
assert_eq!(pq.value, "src/auth/login.rs");
assert!(!pq.is_prefix);
}
#[test]
fn test_exact_match_preferred_over_suffix() {
let conn = setup_test_db();
seed_project(&conn, 1);
seed_mr(&conn, 1, 1);
seed_file_change(&conn, 1, 1, "README.md");
seed_file_change(&conn, 1, 1, "docs/README.md");
// "README.md" exists as exact match -> no ambiguity
let pq = build_path_query(&conn, "README.md", None).unwrap();
assert_eq!(pq.value, "README.md");
assert!(!pq.is_prefix);
}


@@ -95,110 +95,5 @@ pub fn read_payload(conn: &Connection, id: i64) -> Result<Option<serde_json::Val
}
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::create_connection;
use tempfile::tempdir;
fn setup_test_db() -> Connection {
let dir = tempdir().unwrap();
let db_path = dir.path().join("test.db");
let conn = create_connection(&db_path).unwrap();
conn.execute_batch(
"CREATE TABLE raw_payloads (
id INTEGER PRIMARY KEY,
source TEXT NOT NULL,
project_id INTEGER,
resource_type TEXT NOT NULL,
gitlab_id TEXT NOT NULL,
fetched_at INTEGER NOT NULL,
content_encoding TEXT NOT NULL DEFAULT 'identity',
payload_hash TEXT NOT NULL,
payload BLOB NOT NULL
);
CREATE UNIQUE INDEX uq_raw_payloads_dedupe
ON raw_payloads(project_id, resource_type, gitlab_id, payload_hash);",
)
.unwrap();
conn
}
#[test]
fn test_store_and_read_payload() {
let conn = setup_test_db();
let payload = serde_json::json!({"title": "Test Issue", "id": 123});
let json_bytes = serde_json::to_vec(&payload).unwrap();
let id = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "123",
json_bytes: &json_bytes,
compress: false,
},
)
.unwrap();
let result = read_payload(&conn, id).unwrap().unwrap();
assert_eq!(result["title"], "Test Issue");
}
#[test]
fn test_compression_roundtrip() {
let conn = setup_test_db();
let payload = serde_json::json!({"data": "x".repeat(1000)});
let json_bytes = serde_json::to_vec(&payload).unwrap();
let id = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "456",
json_bytes: &json_bytes,
compress: true,
},
)
.unwrap();
let result = read_payload(&conn, id).unwrap().unwrap();
assert_eq!(result["data"], "x".repeat(1000));
}
#[test]
fn test_deduplication() {
let conn = setup_test_db();
let payload = serde_json::json!({"id": 789});
let json_bytes = serde_json::to_vec(&payload).unwrap();
let id1 = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "789",
json_bytes: &json_bytes,
compress: false,
},
)
.unwrap();
let id2 = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "789",
json_bytes: &json_bytes,
compress: false,
},
)
.unwrap();
assert_eq!(id1, id2);
}
}
#[path = "payloads_tests.rs"]
mod tests;

src/core/payloads_tests.rs (new file, 105 lines)

@@ -0,0 +1,105 @@
use super::*;
use crate::core::db::create_connection;
use tempfile::tempdir;
fn setup_test_db() -> Connection {
let dir = tempdir().unwrap();
let db_path = dir.path().join("test.db");
let conn = create_connection(&db_path).unwrap();
conn.execute_batch(
"CREATE TABLE raw_payloads (
id INTEGER PRIMARY KEY,
source TEXT NOT NULL,
project_id INTEGER,
resource_type TEXT NOT NULL,
gitlab_id TEXT NOT NULL,
fetched_at INTEGER NOT NULL,
content_encoding TEXT NOT NULL DEFAULT 'identity',
payload_hash TEXT NOT NULL,
payload BLOB NOT NULL
);
CREATE UNIQUE INDEX uq_raw_payloads_dedupe
ON raw_payloads(project_id, resource_type, gitlab_id, payload_hash);",
)
.unwrap();
conn
}
#[test]
fn test_store_and_read_payload() {
let conn = setup_test_db();
let payload = serde_json::json!({"title": "Test Issue", "id": 123});
let json_bytes = serde_json::to_vec(&payload).unwrap();
let id = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "123",
json_bytes: &json_bytes,
compress: false,
},
)
.unwrap();
let result = read_payload(&conn, id).unwrap().unwrap();
assert_eq!(result["title"], "Test Issue");
}
#[test]
fn test_compression_roundtrip() {
let conn = setup_test_db();
let payload = serde_json::json!({"data": "x".repeat(1000)});
let json_bytes = serde_json::to_vec(&payload).unwrap();
let id = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "456",
json_bytes: &json_bytes,
compress: true,
},
)
.unwrap();
let result = read_payload(&conn, id).unwrap().unwrap();
assert_eq!(result["data"], "x".repeat(1000));
}
#[test]
fn test_deduplication() {
let conn = setup_test_db();
let payload = serde_json::json!({"id": 789});
let json_bytes = serde_json::to_vec(&payload).unwrap();
let id1 = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "789",
json_bytes: &json_bytes,
compress: false,
},
)
.unwrap();
let id2 = store_payload(
&conn,
StorePayloadOptions {
project_id: Some(1),
resource_type: "issue",
gitlab_id: "789",
json_bytes: &json_bytes,
compress: false,
},
)
.unwrap();
assert_eq!(id1, id2);
}


@@ -1,6 +1,7 @@
use rusqlite::Connection;
use super::error::{LoreError, Result};
use super::path_resolver::escape_like;
pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
let exact = conn.query_row(
@@ -106,169 +107,6 @@ pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
/// Escape LIKE metacharacters so `%` and `_` in user input are treated as
/// literals. All queries using this must include `ESCAPE '\'`.
fn escape_like(input: &str) -> String {
input
.replace('\\', "\\\\")
.replace('%', "\\%")
.replace('_', "\\_")
}
#[cfg(test)]
mod tests {
use super::*;
fn setup_db() -> Connection {
let conn = Connection::open_in_memory().unwrap();
conn.execute_batch(
"
CREATE TABLE projects (
id INTEGER PRIMARY KEY,
gitlab_project_id INTEGER UNIQUE NOT NULL,
path_with_namespace TEXT NOT NULL,
default_branch TEXT,
web_url TEXT,
created_at INTEGER,
updated_at INTEGER,
raw_payload_id INTEGER
);
CREATE INDEX idx_projects_path ON projects(path_with_namespace);
",
)
.unwrap();
conn
}
fn insert_project(conn: &Connection, id: i64, path: &str) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (?1, ?2, ?3)",
rusqlite::params![id, id * 100, path],
)
.unwrap();
}
#[test]
fn test_exact_match() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
let id = resolve_project(&conn, "backend/auth-service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_case_insensitive() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
let id = resolve_project(&conn, "Backend/Auth-Service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_suffix_unambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
insert_project(&conn, 2, "frontend/web-ui");
let id = resolve_project(&conn, "auth-service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_suffix_ambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
insert_project(&conn, 2, "frontend/auth-service");
let err = resolve_project(&conn, "auth-service").unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("ambiguous"),
"Expected ambiguous error, got: {}",
msg
);
assert!(msg.contains("backend/auth-service"));
assert!(msg.contains("frontend/auth-service"));
}
#[test]
fn test_substring_unambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "vs/python-code");
insert_project(&conn, 2, "vs/typescript-code");
let id = resolve_project(&conn, "typescript").unwrap();
assert_eq!(id, 2);
}
#[test]
fn test_substring_case_insensitive() {
let conn = setup_db();
insert_project(&conn, 1, "vs/python-code");
insert_project(&conn, 2, "vs/typescript-code");
let id = resolve_project(&conn, "TypeScript").unwrap();
assert_eq!(id, 2);
}
#[test]
fn test_substring_ambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "vs/python-code");
insert_project(&conn, 2, "vs/typescript-code");
let err = resolve_project(&conn, "code").unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("ambiguous"),
"Expected ambiguous error, got: {}",
msg
);
assert!(msg.contains("vs/python-code"));
assert!(msg.contains("vs/typescript-code"));
}
#[test]
fn test_suffix_preferred_over_substring() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
insert_project(&conn, 2, "backend/auth-service-v2");
let id = resolve_project(&conn, "auth-service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_no_match() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
let err = resolve_project(&conn, "nonexistent").unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("not found"),
"Expected not found error, got: {}",
msg
);
assert!(msg.contains("backend/auth-service"));
}
#[test]
fn test_empty_projects() {
let conn = setup_db();
let err = resolve_project(&conn, "anything").unwrap_err();
let msg = err.to_string();
assert!(msg.contains("No projects have been synced"));
}
#[test]
fn test_underscore_not_wildcard() {
let conn = setup_db();
insert_project(&conn, 1, "backend/my_project");
insert_project(&conn, 2, "backend/my-project");
// `_` in user input must not match `-` (LIKE wildcard behavior)
let id = resolve_project(&conn, "my_project").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_percent_not_wildcard() {
let conn = setup_db();
insert_project(&conn, 1, "backend/a%b");
insert_project(&conn, 2, "backend/axyzb");
// `%` in user input must not match arbitrary strings
let id = resolve_project(&conn, "a%b").unwrap();
assert_eq!(id, 1);
}
}
#[path = "project_tests.rs"]
mod tests;

src/core/project_tests.rs (new file, 156 lines)

@@ -0,0 +1,156 @@
use super::*;
fn setup_db() -> Connection {
let conn = Connection::open_in_memory().unwrap();
conn.execute_batch(
"
CREATE TABLE projects (
id INTEGER PRIMARY KEY,
gitlab_project_id INTEGER UNIQUE NOT NULL,
path_with_namespace TEXT NOT NULL,
default_branch TEXT,
web_url TEXT,
created_at INTEGER,
updated_at INTEGER,
raw_payload_id INTEGER
);
CREATE INDEX idx_projects_path ON projects(path_with_namespace);
",
)
.unwrap();
conn
}
fn insert_project(conn: &Connection, id: i64, path: &str) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (?1, ?2, ?3)",
rusqlite::params![id, id * 100, path],
)
.unwrap();
}
#[test]
fn test_exact_match() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
let id = resolve_project(&conn, "backend/auth-service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_case_insensitive() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
let id = resolve_project(&conn, "Backend/Auth-Service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_suffix_unambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
insert_project(&conn, 2, "frontend/web-ui");
let id = resolve_project(&conn, "auth-service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_suffix_ambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
insert_project(&conn, 2, "frontend/auth-service");
let err = resolve_project(&conn, "auth-service").unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("ambiguous"),
"Expected ambiguous error, got: {}",
msg
);
assert!(msg.contains("backend/auth-service"));
assert!(msg.contains("frontend/auth-service"));
}
#[test]
fn test_substring_unambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "vs/python-code");
insert_project(&conn, 2, "vs/typescript-code");
let id = resolve_project(&conn, "typescript").unwrap();
assert_eq!(id, 2);
}
#[test]
fn test_substring_case_insensitive() {
let conn = setup_db();
insert_project(&conn, 1, "vs/python-code");
insert_project(&conn, 2, "vs/typescript-code");
let id = resolve_project(&conn, "TypeScript").unwrap();
assert_eq!(id, 2);
}
#[test]
fn test_substring_ambiguous() {
let conn = setup_db();
insert_project(&conn, 1, "vs/python-code");
insert_project(&conn, 2, "vs/typescript-code");
let err = resolve_project(&conn, "code").unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("ambiguous"),
"Expected ambiguous error, got: {}",
msg
);
assert!(msg.contains("vs/python-code"));
assert!(msg.contains("vs/typescript-code"));
}
#[test]
fn test_suffix_preferred_over_substring() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
insert_project(&conn, 2, "backend/auth-service-v2");
let id = resolve_project(&conn, "auth-service").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_no_match() {
let conn = setup_db();
insert_project(&conn, 1, "backend/auth-service");
let err = resolve_project(&conn, "nonexistent").unwrap_err();
let msg = err.to_string();
assert!(
msg.contains("not found"),
"Expected not found error, got: {}",
msg
);
assert!(msg.contains("backend/auth-service"));
}
#[test]
fn test_empty_projects() {
let conn = setup_db();
let err = resolve_project(&conn, "anything").unwrap_err();
let msg = err.to_string();
assert!(msg.contains("No projects have been synced"));
}
#[test]
fn test_underscore_not_wildcard() {
let conn = setup_db();
insert_project(&conn, 1, "backend/my_project");
insert_project(&conn, 2, "backend/my-project");
// `_` in user input must not match `-` (LIKE wildcard behavior)
let id = resolve_project(&conn, "my_project").unwrap();
assert_eq!(id, 1);
}
#[test]
fn test_percent_not_wildcard() {
let conn = setup_db();
insert_project(&conn, 1, "backend/a%b");
insert_project(&conn, 2, "backend/axyzb");
// `%` in user input must not match arbitrary strings
let id = resolve_project(&conn, "a%b").unwrap();
assert_eq!(id, 1);
}
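The project tests above imply a matching cascade: case-insensitive exact match first, then a path-suffix match, then a substring match, with an error whenever a tier yields more than one candidate. The sketch below models that cascade over an in-memory slice instead of SQL. The function name and the exact error wording are assumptions; only the fragments "ambiguous", "not found" and "No projects have been synced" come from the tests.

```rust
/// Hypothetical in-memory sketch of the resolution order the tests imply.
/// The real `resolve_project` queries the `projects` table and returns an id.
fn resolve_project_sketch<'a>(projects: &[&'a str], query: &str) -> Result<&'a str, String> {
    if projects.is_empty() {
        return Err("No projects have been synced".to_string());
    }
    let q = query.to_lowercase();

    // Tier 1: case-insensitive exact match on the full path.
    if let Some(p) = projects.iter().copied().find(|p| p.to_lowercase() == q) {
        return Ok(p);
    }

    // Tier 2: the query matches the final path segment(s).
    let suffix = format!("/{q}");
    let by_suffix: Vec<&str> = projects
        .iter()
        .copied()
        .filter(|p| p.to_lowercase().ends_with(suffix.as_str()))
        .collect();
    match by_suffix.len() {
        0 => {}
        1 => return Ok(by_suffix[0]),
        _ => return Err(format!("'{query}' is ambiguous: {}", by_suffix.join(", "))),
    }

    // Tier 3: plain substring match. Because this is a string comparison and
    // not a LIKE pattern, `_` and `%` in the query are always literal here.
    let by_substring: Vec<&str> = projects
        .iter()
        .copied()
        .filter(|p| p.to_lowercase().contains(q.as_str()))
        .collect();
    match by_substring.len() {
        0 => Err(format!("'{query}' not found; available: {}", projects.join(", "))),
        1 => Ok(by_substring[0]),
        _ => Err(format!("'{query}' is ambiguous: {}", by_substring.join(", "))),
    }
}
```

Checking the suffix tier before the substring tier is what lets `auth-service` resolve to `backend/auth-service` even when `backend/auth-service-v2` is also present.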


@@ -122,430 +122,5 @@ pub fn count_references_for_source(
}
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_project_issue_mr(conn: &Connection) -> (i64, i64, i64) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (1, 200, 10, 1, 'Test issue', 'closed', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (1, 300, 5, 1, 'Test MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
[],
)
.unwrap();
(1, 1, 1)
}
#[test]
fn test_extract_refs_from_state_events_basic() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 1, "Should insert exactly one reference");
let (src_type, src_id, tgt_type, tgt_id, ref_type, method): (
String,
i64,
String,
i64,
String,
String,
) = conn
.query_row(
"SELECT source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
reference_type, source_method
FROM entity_references WHERE project_id = ?1",
[project_id],
|row| {
Ok((
row.get(0)?,
row.get(1)?,
row.get(2)?,
row.get(3)?,
row.get(4)?,
row.get(5)?,
))
},
)
.unwrap();
assert_eq!(src_type, "merge_request");
assert_eq!(src_id, mr_id, "Source should be the MR's local DB id");
assert_eq!(tgt_type, "issue");
assert_eq!(tgt_id, issue_id, "Target should be the issue's local DB id");
assert_eq!(ref_type, "closes");
assert_eq!(method, "api");
}
#[test]
fn test_extract_refs_dedup_with_closes_issues() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO entity_references
(project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
reference_type, source_method, created_at)
VALUES (?1, 'merge_request', ?2, 'issue', ?3, 'closes', 'api', 3000)",
rusqlite::params![project_id, mr_id, issue_id],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 0, "Should not insert duplicate reference");
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(total, 1, "Should still have exactly one reference");
}
#[test]
fn test_extract_refs_no_source_mr() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, NULL)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 0, "Should not create refs when no source MR");
}
#[test]
fn test_extract_refs_mr_not_synced() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, ?1, ?2, NULL, 'closed', 3000, 999)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(
count, 0,
"Should not create ref when MR is not synced locally"
);
}
#[test]
fn test_extract_refs_idempotent() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count1 = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count1, 1);
let count2 = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count2, 0, "Second run should insert nothing (idempotent)");
}
#[test]
fn test_extract_refs_multiple_events_same_mr_issue() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, ?1, ?2, NULL, 'closed', 4000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert!(count <= 2, "At most 2 inserts attempted");
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(
total, 1,
"Only one unique reference should exist for same MR->issue pair"
);
}
#[test]
fn test_extract_refs_scoped_to_project() {
let conn = setup_test_db();
seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (2, 101, 'group/other', 'https://gitlab.example.com/group/other', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (2, 201, 10, 2, 'Other issue', 'closed', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 5, 2, 'Other MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
[],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, 1, 1, NULL, 'closed', 3000, 5)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, 2, 2, NULL, 'closed', 3000, 5)",
[],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, 1).unwrap();
assert_eq!(count, 1);
let total: i64 = conn
.query_row("SELECT COUNT(*) FROM entity_references", [], |row| {
row.get(0)
})
.unwrap();
assert_eq!(total, 1, "Only project 1 refs should be created");
}
#[test]
fn test_insert_entity_reference_creates_row() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(issue_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
assert!(inserted);
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 1);
}
#[test]
fn test_insert_entity_reference_idempotent() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(issue_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
let first = insert_entity_reference(&conn, &ref_).unwrap();
assert!(first);
let second = insert_entity_reference(&conn, &ref_).unwrap();
assert!(!second, "Duplicate insert should be ignored");
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 1, "Still just one reference");
}
#[test]
fn test_insert_entity_reference_cross_project_unresolved() {
let conn = setup_test_db();
let (project_id, _issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: None,
target_project_path: Some("other-group/other-project"),
target_entity_iid: Some(99),
reference_type: "closes",
source_method: "api",
};
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
assert!(inserted);
let (target_id, target_path, target_iid): (Option<i64>, Option<String>, Option<i64>) = conn
.query_row(
"SELECT target_entity_id, target_project_path, target_entity_iid \
FROM entity_references WHERE source_entity_id = ?1",
[mr_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
)
.unwrap();
assert!(target_id.is_none());
assert_eq!(target_path, Some("other-group/other-project".to_string()));
assert_eq!(target_iid, Some(99));
}
#[test]
fn test_insert_multiple_closes_references() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 210, 11, ?1, 'Second issue', 'opened', 1000, 2000, 2000)",
rusqlite::params![project_id],
)
.unwrap();
let issue_id_2 = 10i64;
for target_id in [issue_id, issue_id_2] {
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(target_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
insert_entity_reference(&conn, &ref_).unwrap();
}
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 2);
}
#[test]
fn test_resolve_issue_local_id_found() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
let resolved = resolve_issue_local_id(&conn, project_id, 10).unwrap();
assert_eq!(resolved, Some(issue_id));
}
#[test]
fn test_resolve_issue_local_id_not_found() {
let conn = setup_test_db();
let (project_id, _issue_id, _mr_id) = seed_project_issue_mr(&conn);
let resolved = resolve_issue_local_id(&conn, project_id, 999).unwrap();
assert!(resolved.is_none());
}
#[test]
fn test_resolve_project_path_found() {
let conn = setup_test_db();
seed_project_issue_mr(&conn);
let path = resolve_project_path(&conn, 100).unwrap();
assert_eq!(path, Some("group/repo".to_string()));
}
#[test]
fn test_resolve_project_path_not_found() {
let conn = setup_test_db();
let path = resolve_project_path(&conn, 999).unwrap();
assert!(path.is_none());
}
}
#[path = "references_tests.rs"]
mod tests;


@@ -0,0 +1,425 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_project_issue_mr(conn: &Connection) -> (i64, i64, i64) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (1, 200, 10, 1, 'Test issue', 'closed', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (1, 300, 5, 1, 'Test MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
[],
)
.unwrap();
(1, 1, 1)
}
#[test]
fn test_extract_refs_from_state_events_basic() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 1, "Should insert exactly one reference");
let (src_type, src_id, tgt_type, tgt_id, ref_type, method): (
String,
i64,
String,
i64,
String,
String,
) = conn
.query_row(
"SELECT source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
reference_type, source_method
FROM entity_references WHERE project_id = ?1",
[project_id],
|row| {
Ok((
row.get(0)?,
row.get(1)?,
row.get(2)?,
row.get(3)?,
row.get(4)?,
row.get(5)?,
))
},
)
.unwrap();
assert_eq!(src_type, "merge_request");
assert_eq!(src_id, mr_id, "Source should be the MR's local DB id");
assert_eq!(tgt_type, "issue");
assert_eq!(tgt_id, issue_id, "Target should be the issue's local DB id");
assert_eq!(ref_type, "closes");
assert_eq!(method, "api");
}
#[test]
fn test_extract_refs_dedup_with_closes_issues() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO entity_references
(project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
reference_type, source_method, created_at)
VALUES (?1, 'merge_request', ?2, 'issue', ?3, 'closes', 'api', 3000)",
rusqlite::params![project_id, mr_id, issue_id],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 0, "Should not insert duplicate reference");
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(total, 1, "Should still have exactly one reference");
}
#[test]
fn test_extract_refs_no_source_mr() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, NULL)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 0, "Should not create refs when no source MR");
}
#[test]
fn test_extract_refs_mr_not_synced() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, ?1, ?2, NULL, 'closed', 3000, 999)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(
count, 0,
"Should not create ref when MR is not synced locally"
);
}
#[test]
fn test_extract_refs_idempotent() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count1 = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count1, 1);
let count2 = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count2, 0, "Second run should insert nothing (idempotent)");
}
#[test]
fn test_extract_refs_multiple_events_same_mr_issue() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, ?1, ?2, NULL, 'closed', 4000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert!(count <= 2, "At most 2 inserts attempted");
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(
total, 1,
"Only one unique reference should exist for same MR->issue pair"
);
}
#[test]
fn test_extract_refs_scoped_to_project() {
let conn = setup_test_db();
seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (2, 101, 'group/other', 'https://gitlab.example.com/group/other', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (2, 201, 10, 2, 'Other issue', 'closed', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 5, 2, 'Other MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
[],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, 1, 1, NULL, 'closed', 3000, 5)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, 2, 2, NULL, 'closed', 3000, 5)",
[],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, 1).unwrap();
assert_eq!(count, 1);
let total: i64 = conn
.query_row("SELECT COUNT(*) FROM entity_references", [], |row| {
row.get(0)
})
.unwrap();
assert_eq!(total, 1, "Only project 1 refs should be created");
}
#[test]
fn test_insert_entity_reference_creates_row() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(issue_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
assert!(inserted);
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 1);
}
#[test]
fn test_insert_entity_reference_idempotent() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(issue_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
let first = insert_entity_reference(&conn, &ref_).unwrap();
assert!(first);
let second = insert_entity_reference(&conn, &ref_).unwrap();
assert!(!second, "Duplicate insert should be ignored");
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 1, "Still just one reference");
}
#[test]
fn test_insert_entity_reference_cross_project_unresolved() {
let conn = setup_test_db();
let (project_id, _issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: None,
target_project_path: Some("other-group/other-project"),
target_entity_iid: Some(99),
reference_type: "closes",
source_method: "api",
};
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
assert!(inserted);
let (target_id, target_path, target_iid): (Option<i64>, Option<String>, Option<i64>) = conn
.query_row(
"SELECT target_entity_id, target_project_path, target_entity_iid \
FROM entity_references WHERE source_entity_id = ?1",
[mr_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
)
.unwrap();
assert!(target_id.is_none());
assert_eq!(target_path, Some("other-group/other-project".to_string()));
assert_eq!(target_iid, Some(99));
}
#[test]
fn test_insert_multiple_closes_references() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 210, 11, ?1, 'Second issue', 'opened', 1000, 2000, 2000)",
rusqlite::params![project_id],
)
.unwrap();
let issue_id_2 = 10i64;
for target_id in [issue_id, issue_id_2] {
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(target_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
insert_entity_reference(&conn, &ref_).unwrap();
}
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 2);
}
#[test]
fn test_resolve_issue_local_id_found() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
let resolved = resolve_issue_local_id(&conn, project_id, 10).unwrap();
assert_eq!(resolved, Some(issue_id));
}
#[test]
fn test_resolve_issue_local_id_not_found() {
let conn = setup_test_db();
let (project_id, _issue_id, _mr_id) = seed_project_issue_mr(&conn);
let resolved = resolve_issue_local_id(&conn, project_id, 999).unwrap();
assert!(resolved.is_none());
}
#[test]
fn test_resolve_project_path_found() {
let conn = setup_test_db();
seed_project_issue_mr(&conn);
let path = resolve_project_path(&conn, 100).unwrap();
assert_eq!(path, Some("group/repo".to_string()));
}
#[test]
fn test_resolve_project_path_not_found() {
let conn = setup_test_db();
let path = resolve_project_path(&conn, 999).unwrap();
assert!(path.is_none());
}


@@ -66,153 +66,5 @@ impl SyncRunRecorder {
}
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
#[test]
fn test_sync_run_recorder_start() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "sync", "abc12345").unwrap();
assert!(recorder.row_id > 0);
let (status, command, run_id): (String, String, String) = conn
.query_row(
"SELECT status, command, run_id FROM sync_runs WHERE id = ?1",
[recorder.row_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
)
.unwrap();
assert_eq!(status, "running");
assert_eq!(command, "sync");
assert_eq!(run_id, "abc12345");
}
#[test]
fn test_sync_run_recorder_succeed() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "sync", "def67890").unwrap();
let row_id = recorder.row_id;
let metrics = vec![StageTiming {
name: "ingest".to_string(),
project: None,
elapsed_ms: 1200,
items_processed: 50,
items_skipped: 0,
errors: 2,
rate_limit_hits: 0,
retries: 0,
sub_stages: vec![],
}];
recorder.succeed(&conn, &metrics, 50, 2).unwrap();
let (status, finished_at, metrics_json, total_items, total_errors): (
String,
Option<i64>,
Option<String>,
i64,
i64,
) = conn
.query_row(
"SELECT status, finished_at, metrics_json, total_items_processed, total_errors
FROM sync_runs WHERE id = ?1",
[row_id],
|row| {
Ok((
row.get(0)?,
row.get(1)?,
row.get(2)?,
row.get(3)?,
row.get(4)?,
))
},
)
.unwrap();
assert_eq!(status, "succeeded");
assert!(finished_at.is_some());
assert!(metrics_json.is_some());
assert_eq!(total_items, 50);
assert_eq!(total_errors, 2);
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
assert_eq!(parsed.len(), 1);
assert_eq!(parsed[0].name, "ingest");
}
#[test]
fn test_sync_run_recorder_fail() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "ingest issues", "fail0001").unwrap();
let row_id = recorder.row_id;
recorder.fail(&conn, "GitLab auth failed", None).unwrap();
let (status, finished_at, error, metrics_json): (
String,
Option<i64>,
Option<String>,
Option<String>,
) = conn
.query_row(
"SELECT status, finished_at, error, metrics_json
FROM sync_runs WHERE id = ?1",
[row_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)),
)
.unwrap();
assert_eq!(status, "failed");
assert!(finished_at.is_some());
assert_eq!(error.as_deref(), Some("GitLab auth failed"));
assert!(metrics_json.is_none());
}
#[test]
fn test_sync_run_recorder_fail_with_partial_metrics() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "sync", "part0001").unwrap();
let row_id = recorder.row_id;
let partial_metrics = vec![StageTiming {
name: "ingest_issues".to_string(),
project: Some("group/repo".to_string()),
elapsed_ms: 800,
items_processed: 30,
items_skipped: 0,
errors: 0,
rate_limit_hits: 1,
retries: 0,
sub_stages: vec![],
}];
recorder
.fail(&conn, "Embedding failed", Some(&partial_metrics))
.unwrap();
let (status, metrics_json): (String, Option<String>) = conn
.query_row(
"SELECT status, metrics_json FROM sync_runs WHERE id = ?1",
[row_id],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(status, "failed");
assert!(metrics_json.is_some());
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
assert_eq!(parsed.len(), 1);
assert_eq!(parsed[0].name, "ingest_issues");
}
}
#[path = "sync_run_tests.rs"]
mod tests;

src/core/sync_run_tests.rs (new file, 148 lines)

@@ -0,0 +1,148 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
#[test]
fn test_sync_run_recorder_start() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "sync", "abc12345").unwrap();
assert!(recorder.row_id > 0);
let (status, command, run_id): (String, String, String) = conn
.query_row(
"SELECT status, command, run_id FROM sync_runs WHERE id = ?1",
[recorder.row_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
)
.unwrap();
assert_eq!(status, "running");
assert_eq!(command, "sync");
assert_eq!(run_id, "abc12345");
}
#[test]
fn test_sync_run_recorder_succeed() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "sync", "def67890").unwrap();
let row_id = recorder.row_id;
let metrics = vec![StageTiming {
name: "ingest".to_string(),
project: None,
elapsed_ms: 1200,
items_processed: 50,
items_skipped: 0,
errors: 2,
rate_limit_hits: 0,
retries: 0,
sub_stages: vec![],
}];
recorder.succeed(&conn, &metrics, 50, 2).unwrap();
let (status, finished_at, metrics_json, total_items, total_errors): (
String,
Option<i64>,
Option<String>,
i64,
i64,
) = conn
.query_row(
"SELECT status, finished_at, metrics_json, total_items_processed, total_errors
FROM sync_runs WHERE id = ?1",
[row_id],
|row| {
Ok((
row.get(0)?,
row.get(1)?,
row.get(2)?,
row.get(3)?,
row.get(4)?,
))
},
)
.unwrap();
assert_eq!(status, "succeeded");
assert!(finished_at.is_some());
assert!(metrics_json.is_some());
assert_eq!(total_items, 50);
assert_eq!(total_errors, 2);
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
assert_eq!(parsed.len(), 1);
assert_eq!(parsed[0].name, "ingest");
}
#[test]
fn test_sync_run_recorder_fail() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "ingest issues", "fail0001").unwrap();
let row_id = recorder.row_id;
recorder.fail(&conn, "GitLab auth failed", None).unwrap();
let (status, finished_at, error, metrics_json): (
String,
Option<i64>,
Option<String>,
Option<String>,
) = conn
.query_row(
"SELECT status, finished_at, error, metrics_json
FROM sync_runs WHERE id = ?1",
[row_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)),
)
.unwrap();
assert_eq!(status, "failed");
assert!(finished_at.is_some());
assert_eq!(error.as_deref(), Some("GitLab auth failed"));
assert!(metrics_json.is_none());
}
#[test]
fn test_sync_run_recorder_fail_with_partial_metrics() {
let conn = setup_test_db();
let recorder = SyncRunRecorder::start(&conn, "sync", "part0001").unwrap();
let row_id = recorder.row_id;
let partial_metrics = vec![StageTiming {
name: "ingest_issues".to_string(),
project: Some("group/repo".to_string()),
elapsed_ms: 800,
items_processed: 30,
items_skipped: 0,
errors: 0,
rate_limit_hits: 1,
retries: 0,
sub_stages: vec![],
}];
recorder
.fail(&conn, "Embedding failed", Some(&partial_metrics))
.unwrap();
let (status, metrics_json): (String, Option<String>) = conn
.query_row(
"SELECT status, metrics_json FROM sync_runs WHERE id = ?1",
[row_id],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(status, "failed");
assert!(metrics_json.is_some());
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
assert_eq!(parsed.len(), 1);
assert_eq!(parsed[0].name, "ingest_issues");
}


@@ -17,21 +17,27 @@ pub fn now_ms() -> i64 {
}
pub fn parse_since(input: &str) -> Option<i64> {
parse_since_from(input, now_ms())
}
/// Like `parse_since` but durations are relative to `reference_ms` instead of now.
/// Absolute dates/timestamps are returned as-is regardless of `reference_ms`.
pub fn parse_since_from(input: &str, reference_ms: i64) -> Option<i64> {
let input = input.trim();
if let Some(num_str) = input.strip_suffix('d') {
let days: i64 = num_str.parse().ok()?;
return Some(now_ms() - (days * 24 * 60 * 60 * 1000));
return Some(reference_ms - (days * 24 * 60 * 60 * 1000));
}
if let Some(num_str) = input.strip_suffix('w') {
let weeks: i64 = num_str.parse().ok()?;
return Some(now_ms() - (weeks * 7 * 24 * 60 * 60 * 1000));
return Some(reference_ms - (weeks * 7 * 24 * 60 * 60 * 1000));
}
if let Some(num_str) = input.strip_suffix('m') {
let months: i64 = num_str.parse().ok()?;
return Some(now_ms() - (months * 30 * 24 * 60 * 60 * 1000));
return Some(reference_ms - (months * 30 * 24 * 60 * 60 * 1000));
}
if input.len() == 10 && input.chars().filter(|&c| c == '-').count() == 2 {
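
Note on the hunk above: the change swaps `now_ms()` for the `reference_ms` parameter, so relative durations become deterministic and can be tested against a fixed instant. The following is a minimal sketch of that relative-duration branch only, not the project's implementation. The name `relative_since` and the constant `DAY_MS` are hypothetical, absolute dates are left out, and the checked arithmetic is an addition of this sketch (the hunk uses plain subtraction and multiplication).

```rust
/// Milliseconds in one day (hypothetical helper constant for this sketch).
const DAY_MS: i64 = 24 * 60 * 60 * 1000;

/// Hypothetical stand-in for the relative-duration branch of `parse_since_from`.
/// Handles only the `d`, `w`, and `m` suffixes shown in the hunk.
pub fn relative_since(input: &str, reference_ms: i64) -> Option<i64> {
    let input = input.trim();
    // Map each suffix to its length in days; a month is approximated as 30 days,
    // matching the arithmetic in the hunk.
    let (num_str, days_per_unit) = if let Some(n) = input.strip_suffix('d') {
        (n, 1)
    } else if let Some(n) = input.strip_suffix('w') {
        (n, 7)
    } else if let Some(n) = input.strip_suffix('m') {
        (n, 30)
    } else {
        return None;
    };
    let count: i64 = num_str.parse().ok()?;
    // Checked arithmetic is an assumption of this sketch: overflow yields None
    // instead of wrapping or panicking.
    let span = count.checked_mul(days_per_unit)?.checked_mul(DAY_MS)?;
    reference_ms.checked_sub(span)
}
```

Because the reference instant is an argument, a caller can pass a fixed value in tests and `now_ms()` in production, which is what the new `parse_since` wrapper does.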


@@ -49,6 +49,21 @@ impl Ord for TimelineEvent {
}
}
/// Maximum characters per note body in a discussion thread.
pub const THREAD_NOTE_MAX_CHARS: usize = 2000;
/// Maximum notes per discussion thread before truncation.
pub const THREAD_MAX_NOTES: usize = 50;
/// A single note within a discussion thread.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize)]
pub struct ThreadNote {
pub note_id: i64,
pub author: Option<String>,
pub body: String,
pub created_at: i64,
}
/// Per spec Section 3.3. Serde tagged enum for JSON output.
///
/// Variant declaration order defines the sort order within a timestamp+entity
@@ -78,11 +93,39 @@ pub enum TimelineEventType {
snippet: String,
discussion_id: Option<i64>,
},
DiscussionThread {
discussion_id: i64,
notes: Vec<ThreadNote>,
},
CrossReferenced {
target: String,
},
}
/// Truncate a string to at most `max_chars` characters on a safe UTF-8 boundary.
pub(crate) fn truncate_to_chars(s: &str, max_chars: usize) -> String {
let char_count = s.chars().count();
if char_count <= max_chars {
return s.to_owned();
}
let byte_end = s
.char_indices()
.nth(max_chars)
.map(|(i, _)| i)
.unwrap_or(s.len());
s[..byte_end].to_owned()
}
/// A discussion matched during the seed phase, to be collected as a full thread.
#[derive(Debug, Clone)]
pub struct MatchedDiscussion {
pub discussion_id: i64,
pub entity_type: String,
pub entity_id: i64,
pub project_id: i64,
}
/// Internal entity reference used across pipeline stages.
#[derive(Debug, Clone, Serialize)]
pub struct EntityRef {
@@ -118,6 +161,8 @@ pub struct UnresolvedRef {
#[derive(Debug, Clone, Serialize)]
pub struct TimelineResult {
pub query: String,
/// The search mode actually used for seeding (e.g. "hybrid", "lexical", "lexical (hybrid fallback)").
pub search_mode: String,
pub events: Vec<TimelineEvent>,
/// Total events before the `--limit` was applied (for meta.total_events vs meta.showing).
#[serde(skip)]
@@ -166,6 +211,77 @@ pub fn resolve_entity_ref(
}
}
/// Resolve an entity by its user-facing IID (e.g. issue #42) to a full [`EntityRef`].
///
/// Unlike [`resolve_entity_ref`] which takes an internal DB id, this takes the
/// GitLab IID that users see. Used by entity-direct timeline seeding (`issue:42`).
///
/// When `project_id` is `Some`, the query is scoped to that project (disambiguates
/// duplicate IIDs across projects).
///
/// Returns `LoreError::NotFound` when no match exists, `LoreError::Ambiguous` when
/// the same IID exists in multiple projects (suggest `--project`).
pub fn resolve_entity_by_iid(
conn: &Connection,
entity_type: &str,
iid: i64,
project_id: Option<i64>,
) -> Result<EntityRef> {
let table = match entity_type {
"issue" => "issues",
"merge_request" => "merge_requests",
_ => {
return Err(super::error::LoreError::NotFound(format!(
"Unknown entity type: {entity_type}"
)));
}
};
let sql = format!(
"SELECT e.id, e.iid, p.path_with_namespace
FROM {table} e
JOIN projects p ON p.id = e.project_id
WHERE e.iid = ?1 AND (?2 IS NULL OR e.project_id = ?2)"
);
let mut stmt = conn.prepare(&sql)?;
let rows: Vec<(i64, i64, String)> = stmt
.query_map(rusqlite::params![iid, project_id], |row| {
Ok((
row.get::<_, i64>(0)?,
row.get::<_, i64>(1)?,
row.get::<_, String>(2)?,
))
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
match rows.len() {
0 => {
let sigil = if entity_type == "issue" { "#" } else { "!" };
Err(super::error::LoreError::NotFound(format!(
"{entity_type} {sigil}{iid} not found"
)))
}
1 => {
let (entity_id, entity_iid, project_path) = rows.into_iter().next().unwrap();
Ok(EntityRef {
entity_type: entity_type.to_owned(),
entity_id,
entity_iid,
project_path,
})
}
_ => {
let projects: Vec<&str> = rows.iter().map(|(_, _, p)| p.as_str()).collect();
let sigil = if entity_type == "issue" { "#" } else { "!" };
Err(super::error::LoreError::Ambiguous(format!(
"{entity_type} {sigil}{iid} exists in multiple projects: {}. Use --project to specify.",
projects.join(", ")
)))
}
}
}
#[cfg(test)]
mod tests {
use super::*;
@@ -248,7 +364,7 @@ mod tests {
#[test]
fn test_timeline_event_type_variant_count() {
// Verify all 9 variants serialize without panic
// Verify all 10 variants serialize without panic
let variants: Vec<TimelineEventType> = vec![
TimelineEventType::Created,
TimelineEventType::StateChanged {
@@ -272,13 +388,198 @@ mod tests {
snippet: "text".to_owned(),
discussion_id: None,
},
TimelineEventType::DiscussionThread {
discussion_id: 1,
notes: vec![ThreadNote {
note_id: 1,
author: Some("alice".to_owned()),
body: "hello".to_owned(),
created_at: 1000,
}],
},
TimelineEventType::CrossReferenced {
target: "!567".to_owned(),
},
];
assert_eq!(variants.len(), 9);
assert_eq!(variants.len(), 10);
for v in &variants {
serde_json::to_value(v).unwrap();
}
}
#[test]
fn test_discussion_thread_serializes_tagged() {
let event_type = TimelineEventType::DiscussionThread {
discussion_id: 42,
notes: vec![
ThreadNote {
note_id: 1,
author: Some("alice".to_owned()),
body: "first note".to_owned(),
created_at: 1000,
},
ThreadNote {
note_id: 2,
author: Some("bob".to_owned()),
body: "second note".to_owned(),
created_at: 2000,
},
],
};
let json = serde_json::to_value(&event_type).unwrap();
assert_eq!(json["kind"], "discussion_thread");
assert_eq!(json["discussion_id"], 42);
assert_eq!(json["notes"].as_array().unwrap().len(), 2);
assert_eq!(json["notes"][0]["note_id"], 1);
assert_eq!(json["notes"][0]["author"], "alice");
assert_eq!(json["notes"][0]["body"], "first note");
assert_eq!(json["notes"][1]["note_id"], 2);
}
#[test]
fn test_discussion_thread_sort_order() {
// DiscussionThread should sort after NoteEvidence, before CrossReferenced
let note_ev = TimelineEventType::NoteEvidence {
note_id: 1,
snippet: "a".to_owned(),
discussion_id: None,
};
let thread = TimelineEventType::DiscussionThread {
discussion_id: 1,
notes: vec![],
};
let cross_ref = TimelineEventType::CrossReferenced {
target: "!1".to_owned(),
};
assert!(note_ev < thread);
assert!(thread < cross_ref);
}
#[test]
fn test_thread_note_ord() {
let a = ThreadNote {
note_id: 1,
author: Some("alice".to_owned()),
body: "first".to_owned(),
created_at: 1000,
};
let b = ThreadNote {
note_id: 2,
author: Some("bob".to_owned()),
body: "second".to_owned(),
created_at: 2000,
};
// ThreadNote derives Ord, which compares fields in declaration order; note_id comes first and differs here, so it decides the result
assert!(a < b);
}
#[test]
fn test_truncate_to_chars() {
assert_eq!(truncate_to_chars("hello", 200), "hello");
let long = "a".repeat(300);
assert_eq!(truncate_to_chars(&long, 200).chars().count(), 200);
}
// ─── resolve_entity_by_iid tests ────────────────────────────────────────
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn insert_project(conn: &Connection, gitlab_id: i64, path: &str) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (?1, ?2, ?3)",
rusqlite::params![gitlab_id, path, format!("https://gitlab.com/{path}")],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test issue', 'opened', 'alice', 1000, 2000, 3000)",
rusqlite::params![project_id * 10000 + iid, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
rusqlite::params![project_id * 10000 + iid, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
#[test]
fn test_resolve_entity_by_iid_issue() {
let conn = setup_db();
let project_id = insert_project(&conn, 1, "group/project");
let entity_id = insert_issue(&conn, project_id, 42);
let result = resolve_entity_by_iid(&conn, "issue", 42, None).unwrap();
assert_eq!(result.entity_type, "issue");
assert_eq!(result.entity_id, entity_id);
assert_eq!(result.entity_iid, 42);
assert_eq!(result.project_path, "group/project");
}
#[test]
fn test_resolve_entity_by_iid_mr() {
let conn = setup_db();
let project_id = insert_project(&conn, 1, "group/project");
let entity_id = insert_mr(&conn, project_id, 99);
let result = resolve_entity_by_iid(&conn, "merge_request", 99, None).unwrap();
assert_eq!(result.entity_type, "merge_request");
assert_eq!(result.entity_id, entity_id);
assert_eq!(result.entity_iid, 99);
assert_eq!(result.project_path, "group/project");
}
#[test]
fn test_resolve_entity_by_iid_not_found() {
let conn = setup_db();
insert_project(&conn, 1, "group/project");
let result = resolve_entity_by_iid(&conn, "issue", 999, None);
assert!(result.is_err());
let err = result.unwrap_err();
assert!(matches!(err, crate::core::error::LoreError::NotFound(_)));
}
#[test]
fn test_resolve_entity_by_iid_ambiguous() {
let conn = setup_db();
let proj1 = insert_project(&conn, 1, "group/project-a");
let proj2 = insert_project(&conn, 2, "group/project-b");
insert_issue(&conn, proj1, 42);
insert_issue(&conn, proj2, 42);
let result = resolve_entity_by_iid(&conn, "issue", 42, None);
assert!(result.is_err());
let err = result.unwrap_err();
assert!(matches!(err, crate::core::error::LoreError::Ambiguous(_)));
}
#[test]
fn test_resolve_entity_by_iid_project_scoped() {
let conn = setup_db();
let proj1 = insert_project(&conn, 1, "group/project-a");
let proj2 = insert_project(&conn, 2, "group/project-b");
insert_issue(&conn, proj1, 42);
let entity_id_b = insert_issue(&conn, proj2, 42);
let result = resolve_entity_by_iid(&conn, "issue", 42, Some(proj2)).unwrap();
assert_eq!(result.entity_id, entity_id_b);
assert_eq!(result.project_path, "group/project-b");
}
}
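
Note on `truncate_to_chars` in the diff above: `test_truncate_to_chars` exercises ASCII input only, where bytes and characters coincide. The sketch below copies the function body from the hunk and adds a hypothetical helper, `naive_byte_prefix`, to show why counting characters matters once multi-byte text is involved. The helper is an illustration and not part of the change.

```rust
/// Copied from the hunk above: keep at most `max_chars` characters.
pub fn truncate_to_chars(s: &str, max_chars: usize) -> String {
    let char_count = s.chars().count();
    if char_count <= max_chars {
        return s.to_owned();
    }
    // Byte offset at which character number `max_chars` starts.
    let byte_end = s
        .char_indices()
        .nth(max_chars)
        .map(|(i, _)| i)
        .unwrap_or(s.len());
    s[..byte_end].to_owned()
}

/// Hypothetical contrast case: take the first `n` bytes.
/// Returns None when `n` falls inside a multi-byte character or past the end.
pub fn naive_byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}
```

With `"héllo"`, byte offset 2 lands inside the two-byte `é`, so the byte-based prefix is rejected, while `truncate_to_chars("héllo", 2)` yields the first two characters intact.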


@@ -1,20 +1,27 @@
use rusqlite::Connection;
use std::collections::HashSet;
use crate::core::error::{LoreError, Result};
use crate::core::timeline::{EntityRef, ExpandedEntityRef, TimelineEvent, TimelineEventType};
use crate::core::timeline::{
EntityRef, ExpandedEntityRef, MatchedDiscussion, THREAD_MAX_NOTES, THREAD_NOTE_MAX_CHARS,
ThreadNote, TimelineEvent, TimelineEventType, truncate_to_chars,
};
/// Collect all events for seed and expanded entities, interleave chronologically.
///
/// Steps 4-5 of the timeline pipeline:
/// 1. For each entity, collect Created, StateChanged, Label, Milestone, Merged events
/// 2. Merge in evidence notes from the seed phase
/// 3. Sort chronologically with stable tiebreak
/// 4. Apply --since filter and --limit
/// 2. Collect discussion threads from matched discussions
/// 3. Merge in evidence notes from the seed phase
/// 4. Sort chronologically with stable tiebreak
/// 5. Apply --since filter and --limit
pub fn collect_events(
conn: &Connection,
seed_entities: &[EntityRef],
expanded_entities: &[ExpandedEntityRef],
evidence_notes: &[TimelineEvent],
matched_discussions: &[MatchedDiscussion],
since_ms: Option<i64>,
limit: usize,
) -> Result<(Vec<TimelineEvent>, usize)> {
@@ -30,6 +37,10 @@ pub fn collect_events(
collect_entity_events(conn, &expanded.entity_ref, false, &mut all_events)?;
}
// Collect discussion threads
let entity_lookup = build_entity_lookup(seed_entities, expanded_entities);
collect_discussion_threads(conn, matched_discussions, &entity_lookup, &mut all_events)?;
// Add evidence notes from seed phase
all_events.extend(evidence_notes.iter().cloned());
@@ -369,327 +380,117 @@ fn entity_id_column(entity: &EntityRef) -> Result<(&'static str, i64)> {
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
/// Lookup key: (entity_type, entity_id) -> (iid, project_path)
type EntityLookup = std::collections::HashMap<(String, i64), (i64, String)>;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
fn build_entity_lookup(seeds: &[EntityRef], expanded: &[ExpandedEntityRef]) -> EntityLookup {
let mut lookup = EntityLookup::new();
for e in seeds {
lookup.insert(
(e.entity_type.clone(), e.entity_id),
(e.entity_iid, e.project_path.clone()),
);
}
for exp in expanded {
let e = &exp.entity_ref;
lookup.insert(
(e.entity_type.clone(), e.entity_id),
(e.entity_iid, e.project_path.clone()),
);
}
lookup
}
/// Collect full discussion threads for matched discussions.
fn collect_discussion_threads(
conn: &Connection,
matched_discussions: &[MatchedDiscussion],
entity_lookup: &EntityLookup,
events: &mut Vec<TimelineEvent>,
) -> Result<()> {
// Deduplicate by discussion_id
let mut seen = HashSet::new();
let mut stmt = conn.prepare(
"SELECT id, author_username, body, created_at FROM notes
WHERE discussion_id = ?1 AND is_system = 0
ORDER BY created_at ASC",
)?;
for disc in matched_discussions {
if !seen.insert(disc.discussion_id) {
continue;
}
fn insert_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
[],
)
.unwrap();
conn.last_insert_rowid()
let (iid, project_path) =
match entity_lookup.get(&(disc.entity_type.clone(), disc.entity_id)) {
Some(val) => val.clone(),
None => continue, // entity not in seed or expanded set
};
let rows = stmt.query_map(rusqlite::params![disc.discussion_id], |row| {
Ok((
row.get::<_, i64>(0)?, // id
row.get::<_, Option<String>>(1)?, // author_username
row.get::<_, Option<String>>(2)?, // body
row.get::<_, i64>(3)?, // created_at
))
})?;
let mut notes = Vec::new();
for row_result in rows {
let (note_id, author, body, created_at) = row_result?;
let body = truncate_to_chars(body.as_deref().unwrap_or(""), THREAD_NOTE_MAX_CHARS);
notes.push(ThreadNote {
note_id,
author,
body,
created_at,
});
}
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, web_url) VALUES (?1, ?2, ?3, 'Auth bug', 'opened', 'alice', 1000, 2000, 3000, 'https://gitlab.com/group/project/-/issues/1')",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
// Skip empty threads (all notes were system notes)
if notes.is_empty() {
continue;
}
fn insert_mr(conn: &Connection, project_id: i64, iid: i64, merged_at: Option<i64>) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, merged_at, merge_user_username, web_url) VALUES (?1, ?2, ?3, 'Fix auth', 'merged', 'bob', 1000, 5000, 6000, ?4, 'charlie', 'https://gitlab.com/group/project/-/merge_requests/10')",
rusqlite::params![iid * 100, project_id, iid, merged_at],
)
.unwrap();
conn.last_insert_rowid()
let first_created_at = notes[0].created_at;
// Cap notes per thread
let total_notes = notes.len();
if total_notes > THREAD_MAX_NOTES {
notes.truncate(THREAD_MAX_NOTES);
notes.push(ThreadNote {
note_id: -1,
author: None,
body: format!("[{} more notes not shown]", total_notes - THREAD_MAX_NOTES),
created_at: notes.last().map_or(first_created_at, |n| n.created_at),
});
}
fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
EntityRef {
entity_type: entity_type.to_owned(),
entity_id,
let note_count = notes.len();
let actor = notes.first().and_then(|n| n.author.clone());
events.push(TimelineEvent {
timestamp: first_created_at,
entity_type: disc.entity_type.clone(),
entity_id: disc.entity_id,
entity_iid: iid,
project_path: "group/project".to_owned(),
}
}
fn insert_state_event(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
state: &str,
created_at: i64,
) {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO resource_state_events (gitlab_id, project_id, issue_id, merge_request_id, state, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, 'alice', ?6)",
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, state, created_at],
)
.unwrap();
}
fn insert_label_event(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
action: &str,
label_name: Option<&str>,
created_at: i64,
) {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO resource_label_events (gitlab_id, project_id, issue_id, merge_request_id, action, label_name, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, label_name, created_at],
)
.unwrap();
}
fn insert_milestone_event(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
action: &str,
milestone_title: Option<&str>,
created_at: i64,
) {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO resource_milestone_events (gitlab_id, project_id, issue_id, merge_request_id, action, milestone_title, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, milestone_title, created_at],
)
.unwrap();
}
#[test]
fn test_collect_creation_event() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
assert_eq!(events.len(), 1);
assert!(matches!(events[0].event_type, TimelineEventType::Created));
assert_eq!(events[0].timestamp, 1000);
assert_eq!(events[0].actor, Some("alice".to_owned()));
assert!(events[0].is_seed);
}
#[test]
fn test_collect_state_events() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 4000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
// Created + 2 state changes = 3
assert_eq!(events.len(), 3);
assert!(matches!(events[0].event_type, TimelineEventType::Created));
assert!(matches!(
events[1].event_type,
TimelineEventType::StateChanged { ref state } if state == "closed"
));
assert!(matches!(
events[2].event_type,
TimelineEventType::StateChanged { ref state } if state == "reopened"
));
}
#[test]
fn test_collect_merged_dedup() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let mr_id = insert_mr(&conn, project_id, 10, Some(5000));
// Also add a state event for 'merged' — this should NOT produce a StateChanged
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
// Should have Created + Merged (not Created + StateChanged{merged} + Merged)
let merged_count = events
.iter()
.filter(|e| matches!(e.event_type, TimelineEventType::Merged))
.count();
let state_merged_count = events
.iter()
.filter(|e| matches!(&e.event_type, TimelineEventType::StateChanged { state } if state == "merged"))
.count();
assert_eq!(merged_count, 1);
assert_eq!(state_merged_count, 0);
}
#[test]
fn test_collect_null_label_fallback() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_label_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
let label_event = events.iter().find(|e| {
matches!(&e.event_type, TimelineEventType::LabelAdded { label } if label == "[deleted label]")
});
assert!(label_event.is_some());
}
#[test]
fn test_collect_null_milestone_fallback() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_milestone_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
let ms_event = events.iter().find(|e| {
matches!(&e.event_type, TimelineEventType::MilestoneSet { milestone } if milestone == "[deleted milestone]")
});
assert!(ms_event.is_some());
}
#[test]
fn test_collect_since_filter() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 5000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
// Since 4000: should exclude Created (1000) and closed (3000)
let (events, _) = collect_events(&conn, &seeds, &[], &[], Some(4000), 100).unwrap();
assert_eq!(events.len(), 1);
assert_eq!(events[0].timestamp, 5000);
}
#[test]
fn test_collect_chronological_sort() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10, Some(4000));
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
insert_label_event(
&conn,
project_id,
None,
Some(mr_id),
"add",
Some("bug"),
2000,
);
let seeds = vec![
make_entity_ref("issue", issue_id, 1),
make_entity_ref("merge_request", mr_id, 10),
];
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
// Verify chronological order
for window in events.windows(2) {
assert!(window[0].timestamp <= window[1].timestamp);
}
}
#[test]
fn test_collect_respects_limit() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
for i in 0..20 {
insert_state_event(
&conn,
project_id,
Some(issue_id),
None,
"closed",
3000 + i * 100,
);
}
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, total) = collect_events(&conn, &seeds, &[], &[], None, 5).unwrap();
assert_eq!(events.len(), 5);
// 20 state changes + 1 created = 21 total before limit
assert_eq!(total, 21);
}
#[test]
fn test_collect_evidence_notes_included() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let evidence = vec![TimelineEvent {
timestamp: 2500,
entity_type: "issue".to_owned(),
entity_id: issue_id,
entity_iid: 1,
project_path: "group/project".to_owned(),
event_type: TimelineEventType::NoteEvidence {
note_id: 42,
snippet: "relevant note".to_owned(),
discussion_id: Some(1),
project_path,
event_type: TimelineEventType::DiscussionThread {
discussion_id: disc.discussion_id,
notes,
},
summary: "Note by alice".to_owned(),
actor: Some("alice".to_owned()),
summary: format!("Discussion ({note_count} notes)"),
actor,
url: None,
is_seed: true,
}];
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &evidence, None, 100).unwrap();
let note_event = events.iter().find(|e| {
matches!(
&e.event_type,
TimelineEventType::NoteEvidence { note_id, .. } if *note_id == 42
)
});
assert!(note_event.is_some());
}
#[test]
fn test_collect_merged_fallback_to_state_event() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
// MR with merged_at = NULL
let mr_id = insert_mr(&conn, project_id, 10, None);
// But has a state event for 'merged'
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
let merged = events
.iter()
.find(|e| matches!(e.event_type, TimelineEventType::Merged));
assert!(merged.is_some());
assert_eq!(merged.unwrap().timestamp, 5000);
}
Ok(())
}
#[cfg(test)]
#[path = "timeline_collect_tests.rs"]
mod tests;
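
The thread-collection code in this hunk truncates each note body and caps the number of notes per thread, appending a synthetic "[N more notes not shown]" entry. The following is a minimal standalone sketch of those two steps, not the crate's actual implementation. `ThreadNote` here is a local stand-in mirroring the fields used above, `cap_notes` is a hypothetical helper name, and the body of `truncate_to_chars` is an assumption (the real helper is not shown in this diff and may, for example, append an ellipsis).

```rust
/// Local stand-in for the crate's `ThreadNote` (fields mirror the diff above).
#[derive(Debug, Clone, PartialEq)]
struct ThreadNote {
    note_id: i64,
    author: Option<String>,
    body: String,
    created_at: i64,
}

/// Assumed behavior: keep at most `max_chars` Unicode scalar values,
/// cutting on a char boundary so a multi-byte code point is never split.
fn truncate_to_chars(s: &str, max_chars: usize) -> String {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is the start of the first char that no longer fits.
        Some((byte_idx, _)) => s[..byte_idx].to_owned(),
        // Fewer than `max_chars + 1` chars: nothing to cut.
        None => s.to_owned(),
    }
}

/// Hypothetical helper: cap a thread at `max_notes` and append one
/// synthetic note that says how many were dropped.
fn cap_notes(mut notes: Vec<ThreadNote>, max_notes: usize) -> Vec<ThreadNote> {
    let total = notes.len();
    if total > max_notes {
        notes.truncate(max_notes);
        // Reuse the timestamp of the last kept note so the summary sorts after it.
        let created_at = notes.last().map_or(0, |n| n.created_at);
        notes.push(ThreadNote {
            note_id: -1, // sentinel: not a real note row
            author: None,
            body: format!("[{} more notes not shown]", total - max_notes),
            created_at,
        });
    }
    notes
}
```

Counting with `char_indices` rather than slicing by byte length is what lets the truncation test assert on `chars().count()` without risking a panic on non-ASCII bodies.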

@@ -0,0 +1,704 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn insert_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
[],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, web_url) VALUES (?1, ?2, ?3, 'Auth bug', 'opened', 'alice', 1000, 2000, 3000, 'https://gitlab.com/group/project/-/issues/1')",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_mr(conn: &Connection, project_id: i64, iid: i64, merged_at: Option<i64>) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, merged_at, merge_user_username, web_url) VALUES (?1, ?2, ?3, 'Fix auth', 'merged', 'bob', 1000, 5000, 6000, ?4, 'charlie', 'https://gitlab.com/group/project/-/merge_requests/10')",
rusqlite::params![iid * 100, project_id, iid, merged_at],
)
.unwrap();
conn.last_insert_rowid()
}
fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
EntityRef {
entity_type: entity_type.to_owned(),
entity_id,
entity_iid: iid,
project_path: "group/project".to_owned(),
}
}
fn insert_state_event(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
state: &str,
created_at: i64,
) {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO resource_state_events (gitlab_id, project_id, issue_id, merge_request_id, state, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, 'alice', ?6)",
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, state, created_at],
)
.unwrap();
}
fn insert_label_event(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
action: &str,
label_name: Option<&str>,
created_at: i64,
) {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO resource_label_events (gitlab_id, project_id, issue_id, merge_request_id, action, label_name, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, label_name, created_at],
)
.unwrap();
}
fn insert_milestone_event(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
action: &str,
milestone_title: Option<&str>,
created_at: i64,
) {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO resource_milestone_events (gitlab_id, project_id, issue_id, merge_request_id, action, milestone_title, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, milestone_title, created_at],
)
.unwrap();
}
#[test]
fn test_collect_creation_event() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
assert_eq!(events.len(), 1);
assert!(matches!(events[0].event_type, TimelineEventType::Created));
assert_eq!(events[0].timestamp, 1000);
assert_eq!(events[0].actor, Some("alice".to_owned()));
assert!(events[0].is_seed);
}
#[test]
fn test_collect_state_events() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 4000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
// Created + 2 state changes = 3
assert_eq!(events.len(), 3);
assert!(matches!(events[0].event_type, TimelineEventType::Created));
assert!(matches!(
events[1].event_type,
TimelineEventType::StateChanged { ref state } if state == "closed"
));
assert!(matches!(
events[2].event_type,
TimelineEventType::StateChanged { ref state } if state == "reopened"
));
}
#[test]
fn test_collect_merged_dedup() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let mr_id = insert_mr(&conn, project_id, 10, Some(5000));
// Also add a state event for 'merged' — this should NOT produce a StateChanged
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
// Should have Created + Merged (not Created + StateChanged{merged} + Merged)
let merged_count = events
.iter()
.filter(|e| matches!(e.event_type, TimelineEventType::Merged))
.count();
let state_merged_count = events
.iter()
.filter(|e| matches!(&e.event_type, TimelineEventType::StateChanged { state } if state == "merged"))
.count();
assert_eq!(merged_count, 1);
assert_eq!(state_merged_count, 0);
}
#[test]
fn test_collect_null_label_fallback() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_label_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
let label_event = events.iter().find(|e| {
matches!(&e.event_type, TimelineEventType::LabelAdded { label } if label == "[deleted label]")
});
assert!(label_event.is_some());
}
#[test]
fn test_collect_null_milestone_fallback() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_milestone_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
let ms_event = events.iter().find(|e| {
matches!(&e.event_type, TimelineEventType::MilestoneSet { milestone } if milestone == "[deleted milestone]")
});
assert!(ms_event.is_some());
}
#[test]
fn test_collect_since_filter() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 5000);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
// Since 4000: should exclude Created (1000) and closed (3000)
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], Some(4000), 100).unwrap();
assert_eq!(events.len(), 1);
assert_eq!(events[0].timestamp, 5000);
}
#[test]
fn test_collect_chronological_sort() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10, Some(4000));
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
insert_label_event(
&conn,
project_id,
None,
Some(mr_id),
"add",
Some("bug"),
2000,
);
let seeds = vec![
make_entity_ref("issue", issue_id, 1),
make_entity_ref("merge_request", mr_id, 10),
];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
// Verify chronological order
for window in events.windows(2) {
assert!(window[0].timestamp <= window[1].timestamp);
}
}
#[test]
fn test_collect_respects_limit() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
for i in 0..20 {
insert_state_event(
&conn,
project_id,
Some(issue_id),
None,
"closed",
3000 + i * 100,
);
}
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, total) = collect_events(&conn, &seeds, &[], &[], &[], None, 5).unwrap();
assert_eq!(events.len(), 5);
// 20 state changes + 1 created = 21 total before limit
assert_eq!(total, 21);
}
#[test]
fn test_collect_evidence_notes_included() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let evidence = vec![TimelineEvent {
timestamp: 2500,
entity_type: "issue".to_owned(),
entity_id: issue_id,
entity_iid: 1,
project_path: "group/project".to_owned(),
event_type: TimelineEventType::NoteEvidence {
note_id: 42,
snippet: "relevant note".to_owned(),
discussion_id: Some(1),
},
summary: "Note by alice".to_owned(),
actor: Some("alice".to_owned()),
url: None,
is_seed: true,
}];
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let (events, _) = collect_events(&conn, &seeds, &[], &evidence, &[], None, 100).unwrap();
let note_event = events.iter().find(|e| {
matches!(
&e.event_type,
TimelineEventType::NoteEvidence { note_id, .. } if *note_id == 42
)
});
assert!(note_event.is_some());
}
#[test]
fn test_collect_merged_fallback_to_state_event() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
// MR with merged_at = NULL
let mr_id = insert_mr(&conn, project_id, 10, None);
// But has a state event for 'merged'
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
let merged = events
.iter()
.find(|e| matches!(e.event_type, TimelineEventType::Merged));
assert!(merged.is_some());
assert_eq!(merged.unwrap().timestamp, 5000);
}
// ─── Discussion thread tests ────────────────────────────────────────────────
fn insert_discussion(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
) -> i64 {
let noteable_type = if issue_id.is_some() {
"Issue"
} else {
"MergeRequest"
};
conn.execute(
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, merge_request_id, noteable_type, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, 0)",
rusqlite::params![format!("disc_{}", rand::random::<u32>()), project_id, issue_id, mr_id, noteable_type],
)
.unwrap();
conn.last_insert_rowid()
}
#[allow(clippy::too_many_arguments)]
fn insert_note(
conn: &Connection,
discussion_id: i64,
project_id: i64,
author: &str,
body: &str,
is_system: bool,
created_at: i64,
) -> i64 {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, author_username, body, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?7, ?7)",
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32, author, body, created_at],
)
.unwrap();
conn.last_insert_rowid()
}
fn make_matched_discussion(
discussion_id: i64,
entity_type: &str,
entity_id: i64,
project_id: i64,
) -> MatchedDiscussion {
MatchedDiscussion {
discussion_id,
entity_type: entity_type.to_owned(),
entity_id,
project_id,
}
}
#[test]
fn test_collect_discussion_thread_basic() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_note(
&conn,
disc_id,
project_id,
"alice",
"First note",
false,
2000,
);
insert_note(&conn, disc_id, project_id, "bob", "Reply here", false, 3000);
insert_note(
&conn,
disc_id,
project_id,
"alice",
"Follow up",
false,
4000,
);
let seeds = [make_entity_ref("issue", issue_id, 1)];
let discussions = [make_matched_discussion(
disc_id, "issue", issue_id, project_id,
)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
let thread = events
.iter()
.find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }));
assert!(thread.is_some(), "Should have a DiscussionThread event");
let thread = thread.unwrap();
if let TimelineEventType::DiscussionThread {
discussion_id,
notes,
} = &thread.event_type
{
assert_eq!(*discussion_id, disc_id);
assert_eq!(notes.len(), 3);
assert_eq!(notes[0].author.as_deref(), Some("alice"));
assert_eq!(notes[0].body, "First note");
assert_eq!(notes[1].author.as_deref(), Some("bob"));
assert_eq!(notes[2].body, "Follow up");
} else {
panic!("Expected DiscussionThread variant");
}
}
#[test]
fn test_collect_discussion_thread_skips_system_notes() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_note(
&conn,
disc_id,
project_id,
"alice",
"User note",
false,
2000,
);
insert_note(
&conn,
disc_id,
project_id,
"system",
"added label ~bug",
true,
3000,
);
insert_note(
&conn,
disc_id,
project_id,
"bob",
"Another user note",
false,
4000,
);
let seeds = [make_entity_ref("issue", issue_id, 1)];
let discussions = [make_matched_discussion(
disc_id, "issue", issue_id, project_id,
)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
let thread = events
.iter()
.find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }));
assert!(thread.is_some());
if let TimelineEventType::DiscussionThread { notes, .. } = &thread.unwrap().event_type {
assert_eq!(notes.len(), 2, "System notes should be filtered out");
assert_eq!(notes[0].body, "User note");
assert_eq!(notes[1].body, "Another user note");
} else {
panic!("Expected DiscussionThread");
}
}
#[test]
fn test_collect_discussion_thread_empty_after_system_filter() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
// Only system notes
insert_note(
&conn,
disc_id,
project_id,
"system",
"added label",
true,
2000,
);
insert_note(
&conn,
disc_id,
project_id,
"system",
"removed label",
true,
3000,
);
let seeds = [make_entity_ref("issue", issue_id, 1)];
let discussions = [make_matched_discussion(
disc_id, "issue", issue_id, project_id,
)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
let thread_count = events
.iter()
.filter(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
.count();
assert_eq!(
thread_count, 0,
"All-system-note discussion should produce no thread"
);
}
#[test]
fn test_collect_discussion_thread_body_truncation() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
let long_body = "x".repeat(10_000);
insert_note(&conn, disc_id, project_id, "alice", &long_body, false, 2000);
let seeds = [make_entity_ref("issue", issue_id, 1)];
let discussions = [make_matched_discussion(
disc_id, "issue", issue_id, project_id,
)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
let thread = events
.iter()
.find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
.unwrap();
if let TimelineEventType::DiscussionThread { notes, .. } = &thread.event_type {
assert!(
notes[0].body.chars().count() <= crate::core::timeline::THREAD_NOTE_MAX_CHARS,
"Body should be truncated to THREAD_NOTE_MAX_CHARS"
);
} else {
panic!("Expected DiscussionThread");
}
}
#[test]
fn test_collect_discussion_thread_note_cap() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
// Insert 60 notes, exceeding THREAD_MAX_NOTES (50)
for i in 0..60 {
insert_note(
&conn,
disc_id,
project_id,
"alice",
&format!("Note {i}"),
false,
2000 + i * 100,
);
}
let seeds = [make_entity_ref("issue", issue_id, 1)];
let discussions = [make_matched_discussion(
disc_id, "issue", issue_id, project_id,
)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
let thread = events
.iter()
.find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
.unwrap();
if let TimelineEventType::DiscussionThread { notes, .. } = &thread.event_type {
// 50 notes + 1 synthetic summary = 51
assert_eq!(
notes.len(),
crate::core::timeline::THREAD_MAX_NOTES + 1,
"Should cap at THREAD_MAX_NOTES + synthetic summary"
);
let last = notes.last().unwrap();
assert!(last.body.contains("more notes not shown"));
} else {
panic!("Expected DiscussionThread");
}
}
#[test]
fn test_collect_discussion_thread_timestamp_is_first_note() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_note(&conn, disc_id, project_id, "alice", "First", false, 5000);
insert_note(&conn, disc_id, project_id, "bob", "Second", false, 8000);
let seeds = [make_entity_ref("issue", issue_id, 1)];
let discussions = [make_matched_discussion(
disc_id, "issue", issue_id, project_id,
)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
let thread = events
.iter()
.find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
.unwrap();
assert_eq!(
thread.timestamp, 5000,
"Thread timestamp should be first note's created_at"
);
}
#[test]
fn test_collect_discussion_thread_sort_position() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
// Note at t=2000 (between Created at t=1000 and state change at t=3000)
insert_note(
&conn,
disc_id,
project_id,
"alice",
"discussion",
false,
2000,
);
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
let seeds = [make_entity_ref("issue", issue_id, 1)];
let discussions = [make_matched_discussion(
disc_id, "issue", issue_id, project_id,
)];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
// Expected order: Created(1000), DiscussionThread(2000), StateChanged(3000)
assert!(events.len() >= 3);
assert!(matches!(events[0].event_type, TimelineEventType::Created));
assert!(matches!(
events[1].event_type,
TimelineEventType::DiscussionThread { .. }
));
assert!(matches!(
events[2].event_type,
TimelineEventType::StateChanged { .. }
));
}
#[test]
fn test_collect_discussion_thread_dedup() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_note(&conn, disc_id, project_id, "alice", "hello", false, 2000);
let seeds = [make_entity_ref("issue", issue_id, 1)];
// Same discussion_id twice
let discussions = [
make_matched_discussion(disc_id, "issue", issue_id, project_id),
make_matched_discussion(disc_id, "issue", issue_id, project_id),
];
let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();
let thread_count = events
.iter()
.filter(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
.count();
assert_eq!(
thread_count, 1,
"Duplicate discussion_id should produce one thread"
);
}
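
The last test above expects a discussion that is passed in twice to yield a single thread. One way to get that behavior is to filter the input by `discussion_id` before querying notes. The sketch below assumes that approach. `dedup_discussions` is a hypothetical helper name, `MatchedDiscussion` is a local stand-in with the fields that `make_matched_discussion` sets, and the diff does not show how `collect_events` actually deduplicates.

```rust
use std::collections::HashSet;

/// Local stand-in for the crate's `MatchedDiscussion`.
#[derive(Debug, Clone, PartialEq)]
struct MatchedDiscussion {
    discussion_id: i64,
    entity_type: String,
    entity_id: i64,
    project_id: i64,
}

/// Hypothetical helper: keep the first occurrence of each `discussion_id`,
/// preserving input order.
fn dedup_discussions(discussions: &[MatchedDiscussion]) -> Vec<&MatchedDiscussion> {
    let mut seen = HashSet::new();
    discussions
        .iter()
        // `HashSet::insert` returns false when the id was already present.
        .filter(|d| seen.insert(d.discussion_id))
        .collect()
}
```

Filtering up front keeps the per-discussion note query from running twice for the same thread, at the cost of one small set allocation.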

@@ -248,310 +248,5 @@ fn find_incoming(
}
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn insert_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
[],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test', 'opened', 'alice', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
#[allow(clippy::too_many_arguments)]
fn insert_ref(
conn: &Connection,
project_id: i64,
source_type: &str,
source_id: i64,
target_type: &str,
target_id: Option<i64>,
ref_type: &str,
source_method: &str,
) {
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, reference_type, source_method, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, 1000)",
rusqlite::params![project_id, source_type, source_id, target_type, target_id, ref_type, source_method],
)
.unwrap();
}
fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
EntityRef {
entity_type: entity_type.to_owned(),
entity_id,
entity_iid: iid,
project_path: "group/project".to_owned(),
}
}
#[test]
fn test_expand_depth_zero() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 0, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
assert!(result.unresolved_references.is_empty());
}
#[test]
fn test_expand_finds_linked_entity() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR closes issue
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
assert_eq!(
result.expanded_entities[0].entity_ref.entity_type,
"merge_request"
);
assert_eq!(result.expanded_entities[0].entity_ref.entity_iid, 10);
assert_eq!(result.expanded_entities[0].depth, 1);
}
#[test]
fn test_expand_bidirectional() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR closes issue (MR is source, issue is target)
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
// Starting from MR should find the issue (outgoing)
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
assert_eq!(result.expanded_entities[0].entity_ref.entity_type, "issue");
}
#[test]
fn test_expand_respects_max_entities() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
// Create 10 MRs that all close this issue
for i in 2..=11 {
let mr_id = insert_mr(&conn, project_id, i);
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
}
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 3).unwrap();
assert!(result.expanded_entities.len() <= 3);
}
#[test]
fn test_expand_skips_mentions_by_default() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR mentions issue (should be skipped by default)
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"mentioned",
"note_parse",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
}
#[test]
fn test_expand_includes_mentions_when_flagged() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR mentions issue
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"mentioned",
"note_parse",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, true, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
}
#[test]
fn test_expand_collects_unresolved() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
// Unresolved cross-project reference
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, target_project_path, target_entity_iid, reference_type, source_method, created_at) VALUES (?1, 'issue', ?2, 'issue', NULL, 'other/repo', 42, 'closes', 'description_parse', 1000)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
assert_eq!(result.unresolved_references.len(), 1);
assert_eq!(
result.unresolved_references[0].target_project,
Some("other/repo".to_owned())
);
assert_eq!(result.unresolved_references[0].target_iid, Some(42));
}
#[test]
fn test_expand_tracks_provenance() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
let expanded = &result.expanded_entities[0];
assert_eq!(expanded.via_reference_type, "closes");
assert_eq!(expanded.via_source_method, "api");
assert_eq!(expanded.via_from.entity_type, "issue");
assert_eq!(expanded.via_from.entity_id, issue_id);
}
#[test]
fn test_expand_no_duplicates() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// Two references from MR to same issue (different methods)
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"related",
"note_parse",
);
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
// Should only appear once (first-come wins)
assert_eq!(result.expanded_entities.len(), 1);
}
#[test]
fn test_expand_empty_seeds() {
let conn = setup_test_db();
let result = expand_timeline(&conn, &[], 1, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
}
}
#[path = "timeline_expand_tests.rs"]
mod tests;
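
Reviewer note: the tests moved out of this file pin down the expansion contract: depth 0 expands nothing, references are followed in both directions, `mentioned` references are skipped unless flagged, each entity appears once, and `max_entities` caps the result. The following is a minimal in-memory sketch of a breadth-first expansion that satisfies those same properties. It is not the `expand_timeline` implementation: the `Edge` and `Expanded` types and the `expand` function are hypothetical stand-ins for the `entity_references` table and the real result structs.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical in-memory stand-in for one `entity_references` row.
#[derive(Clone, Debug, PartialEq)]
struct Edge {
    source: (String, i64), // (entity_type, entity_id)
    target: (String, i64),
    reference_type: String,
}

/// One entity discovered during expansion, with provenance.
#[derive(Debug, PartialEq)]
struct Expanded {
    entity: (String, i64),
    depth: usize,
    via_from: (String, i64),
    via_reference_type: String,
}

/// Breadth-first expansion from `seeds`, following edges in both directions.
fn expand(
    edges: &[Edge],
    seeds: &[(String, i64)],
    depth: usize,
    include_mentions: bool,
    max_entities: usize,
) -> Vec<Expanded> {
    // Seeds count as visited so they are never reported as expanded entities.
    let mut visited: HashSet<(String, i64)> = seeds.iter().cloned().collect();
    let mut queue: VecDeque<((String, i64), usize)> =
        seeds.iter().cloned().map(|s| (s, 0)).collect();
    let mut out = Vec::new();

    while let Some((current, d)) = queue.pop_front() {
        if d >= depth {
            continue; // depth budget exhausted for this node
        }
        for edge in edges {
            if !include_mentions && edge.reference_type == "mentioned" {
                continue;
            }
            // Follow the edge whether `current` is its source or its target.
            let neighbor = if edge.source == current {
                &edge.target
            } else if edge.target == current {
                &edge.source
            } else {
                continue;
            };
            if !visited.insert(neighbor.clone()) {
                continue; // already seen: first reference wins
            }
            if out.len() >= max_entities {
                return out;
            }
            out.push(Expanded {
                entity: neighbor.clone(),
                depth: d + 1,
                via_from: current.clone(),
                via_reference_type: edge.reference_type.clone(),
            });
            queue.push_back((neighbor.clone(), d + 1));
        }
    }
    out
}
```

Marking the seeds as visited up front is what keeps a seed from reappearing as its own neighbor when an edge points back at it.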


@@ -0,0 +1,305 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn insert_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
[],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test', 'opened', 'alice', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
#[allow(clippy::too_many_arguments)]
fn insert_ref(
conn: &Connection,
project_id: i64,
source_type: &str,
source_id: i64,
target_type: &str,
target_id: Option<i64>,
ref_type: &str,
source_method: &str,
) {
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, reference_type, source_method, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, 1000)",
rusqlite::params![project_id, source_type, source_id, target_type, target_id, ref_type, source_method],
)
.unwrap();
}
fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
EntityRef {
entity_type: entity_type.to_owned(),
entity_id,
entity_iid: iid,
project_path: "group/project".to_owned(),
}
}
#[test]
fn test_expand_depth_zero() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 0, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
assert!(result.unresolved_references.is_empty());
}
#[test]
fn test_expand_finds_linked_entity() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR closes issue
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
assert_eq!(
result.expanded_entities[0].entity_ref.entity_type,
"merge_request"
);
assert_eq!(result.expanded_entities[0].entity_ref.entity_iid, 10);
assert_eq!(result.expanded_entities[0].depth, 1);
}
#[test]
fn test_expand_bidirectional() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR closes issue (MR is source, issue is target)
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
// Starting from MR should find the issue (outgoing)
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
assert_eq!(result.expanded_entities[0].entity_ref.entity_type, "issue");
}
#[test]
fn test_expand_respects_max_entities() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
// Create 10 MRs that all close this issue
for i in 2..=11 {
let mr_id = insert_mr(&conn, project_id, i);
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
}
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 3).unwrap();
assert!(result.expanded_entities.len() <= 3);
}
#[test]
fn test_expand_skips_mentions_by_default() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR mentions issue (should be skipped by default)
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"mentioned",
"note_parse",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
}
#[test]
fn test_expand_includes_mentions_when_flagged() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// MR mentions issue
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"mentioned",
"note_parse",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, true, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
}
#[test]
fn test_expand_collects_unresolved() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
// Unresolved cross-project reference
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, target_project_path, target_entity_iid, reference_type, source_method, created_at) VALUES (?1, 'issue', ?2, 'issue', NULL, 'other/repo', 42, 'closes', 'description_parse', 1000)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
assert_eq!(result.unresolved_references.len(), 1);
assert_eq!(
result.unresolved_references[0].target_project,
Some("other/repo".to_owned())
);
assert_eq!(result.unresolved_references[0].target_iid, Some(42));
}
#[test]
fn test_expand_tracks_provenance() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
assert_eq!(result.expanded_entities.len(), 1);
let expanded = &result.expanded_entities[0];
assert_eq!(expanded.via_reference_type, "closes");
assert_eq!(expanded.via_source_method, "api");
assert_eq!(expanded.via_from.entity_type, "issue");
assert_eq!(expanded.via_from.entity_id, issue_id);
}
#[test]
fn test_expand_no_duplicates() {
let conn = setup_test_db();
let project_id = insert_project(&conn);
let issue_id = insert_issue(&conn, project_id, 1);
let mr_id = insert_mr(&conn, project_id, 10);
// Two references from MR to same issue (different methods)
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"closes",
"api",
);
insert_ref(
&conn,
project_id,
"merge_request",
mr_id,
"issue",
Some(issue_id),
"related",
"note_parse",
);
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
// Should only appear once (first-come wins)
assert_eq!(result.expanded_entities.len(), 1);
}
#[test]
fn test_expand_empty_seeds() {
let conn = setup_test_db();
let result = expand_timeline(&conn, &[], 1, false, 100).unwrap();
assert!(result.expanded_entities.is_empty());
}


@@ -4,24 +4,34 @@ use rusqlite::Connection;
use tracing::debug;
use crate::core::error::Result;
use crate::core::timeline::{EntityRef, TimelineEvent, TimelineEventType, resolve_entity_ref};
use crate::search::{FtsQueryMode, to_fts_query};
use crate::core::timeline::{
EntityRef, MatchedDiscussion, TimelineEvent, TimelineEventType, resolve_entity_by_iid,
resolve_entity_ref, truncate_to_chars,
};
use crate::embedding::ollama::OllamaClient;
use crate::search::{FtsQueryMode, SearchFilters, SearchMode, search_hybrid, to_fts_query};
/// Result of the seed + hydrate phases.
pub struct SeedResult {
pub seed_entities: Vec<EntityRef>,
pub evidence_notes: Vec<TimelineEvent>,
/// Discussions matched during seeding, to be collected as full threads.
pub matched_discussions: Vec<MatchedDiscussion>,
/// The search mode actually used (hybrid with fallback info).
pub search_mode: String,
}
/// Run the SEED + HYDRATE phases of the timeline pipeline.
///
/// 1. SEED: FTS5 keyword search over documents -> matched document IDs
/// 1. SEED: Hybrid search (FTS + vector via RRF) over documents -> matched document IDs
/// 2. HYDRATE: Map document IDs -> source entities + top matched notes as evidence
///
/// When `client` is `None` or Ollama is unavailable, falls back to FTS-only search.
/// Discussion documents are resolved to their parent entity (issue or MR).
/// Entities are deduplicated. Evidence notes are capped at `max_evidence`.
pub fn seed_timeline(
pub async fn seed_timeline(
conn: &Connection,
client: Option<&OllamaClient>,
query: &str,
project_id: Option<i64>,
since_ms: Option<i64>,
@@ -33,81 +43,206 @@ pub fn seed_timeline(
return Ok(SeedResult {
seed_entities: Vec::new(),
evidence_notes: Vec::new(),
matched_discussions: Vec::new(),
search_mode: "lexical".to_owned(),
});
}
let seed_entities = find_seed_entities(conn, &fts_query, project_id, since_ms, max_seeds)?;
// Use hybrid search for seed entity discovery (better recall than FTS alone).
// search_hybrid gracefully falls back to FTS-only when Ollama is unavailable.
let filters = SearchFilters {
project_id,
updated_since: since_ms,
limit: max_seeds.saturating_mul(3),
..SearchFilters::default()
};
let (hybrid_results, warnings) = search_hybrid(
conn,
client,
query,
SearchMode::Hybrid,
&filters,
FtsQueryMode::Safe,
)
.await?;
let search_mode = if warnings
.iter()
.any(|w| w.contains("falling back") || w.contains("FTS only"))
{
"lexical (hybrid fallback)".to_owned()
} else if client.is_some() && !hybrid_results.is_empty() {
"hybrid".to_owned()
} else {
"lexical".to_owned()
};
for w in &warnings {
debug!(warning = %w, "hybrid search warning during timeline seeding");
}
let (seed_entities, matched_discussions) = resolve_documents_to_entities(
conn,
&hybrid_results
.iter()
.map(|r| r.document_id)
.collect::<Vec<_>>(),
max_seeds,
)?;
// Evidence notes stay FTS-only (supplementary context, not worth a second embedding call)
let evidence_notes = find_evidence_notes(conn, &fts_query, project_id, since_ms, max_evidence)?;
Ok(SeedResult {
seed_entities,
evidence_notes,
matched_discussions,
search_mode,
})
}
/// Find seed entities via FTS5 search, resolving discussions to their parent entity.
fn find_seed_entities(
/// Seed the timeline directly from an entity IID, bypassing search entirely.
///
/// Used for `issue:42` / `mr:99` syntax. Resolves the entity, gathers ALL its
/// discussions, and returns a `SeedResult` compatible with the rest of the pipeline.
pub fn seed_timeline_direct(
conn: &Connection,
fts_query: &str,
entity_type: &str,
iid: i64,
project_id: Option<i64>,
since_ms: Option<i64>,
max_seeds: usize,
) -> Result<Vec<EntityRef>> {
let sql = r"
) -> Result<SeedResult> {
let entity_ref = resolve_entity_by_iid(conn, entity_type, iid, project_id)?;
// Gather all discussions for this entity (not search-matched, ALL of them)
let entity_id_col = match entity_type {
"issue" => "issue_id",
"merge_request" => "merge_request_id",
_ => {
return Ok(SeedResult {
seed_entities: vec![entity_ref],
evidence_notes: Vec::new(),
matched_discussions: Vec::new(),
search_mode: "direct".to_owned(),
});
}
};
let sql = format!("SELECT id, project_id FROM discussions WHERE {entity_id_col} = ?1");
let mut stmt = conn.prepare(&sql)?;
let matched_discussions: Vec<MatchedDiscussion> = stmt
.query_map(rusqlite::params![entity_ref.entity_id], |row| {
Ok(MatchedDiscussion {
discussion_id: row.get(0)?,
entity_type: entity_type.to_owned(),
entity_id: entity_ref.entity_id,
project_id: row.get(1)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(SeedResult {
seed_entities: vec![entity_ref],
evidence_notes: Vec::new(),
matched_discussions,
search_mode: "direct".to_owned(),
})
}
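
Reviewer note: the doc comment on `seed_timeline_direct` says it serves the `issue:42` / `mr:99` syntax, but the code that recognizes that syntax is not part of this hunk. A caller might split the query as sketched below before choosing between the direct and search paths. The function name `parse_direct_seed`, the mapping of the `mr` prefix to `"merge_request"`, and the rejection of non-positive IIDs are all assumptions made for illustration, not taken from the commit.

```rust
/// Parse `issue:42` or `mr:99` into `(entity_type, iid)`.
/// Returns `None` for anything else so the caller can fall back to search.
/// (Hypothetical helper; the real CLI parsing is not shown in this diff.)
fn parse_direct_seed(query: &str) -> Option<(&'static str, i64)> {
    let (prefix, rest) = query.trim().split_once(':')?;
    let entity_type = match prefix {
        "issue" => "issue",
        "mr" => "merge_request", // assumed to match the entity_type strings used above
        _ => return None,
    };
    let iid: i64 = rest.parse().ok()?;
    // IIDs are assumed to be positive; reject zero and negatives.
    (iid > 0).then_some((entity_type, iid))
}
```

Returning `None` for unrecognized input lets ordinary free-text queries that happen to contain a colon flow through to the search path.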
/// Resolve a list of document IDs to deduplicated entity refs and matched discussions.
/// Discussion and note documents are resolved to their parent entity (issue or MR).
/// Returns (entities, matched_discussions).
fn resolve_documents_to_entities(
conn: &Connection,
document_ids: &[i64],
max_entities: usize,
) -> Result<(Vec<EntityRef>, Vec<MatchedDiscussion>)> {
if document_ids.is_empty() {
return Ok((Vec::new(), Vec::new()));
}
let placeholders: String = document_ids
.iter()
.map(|_| "?")
.collect::<Vec<_>>()
.join(",");
let sql = format!(
r"
SELECT d.source_type, d.source_id, d.project_id,
disc.issue_id, disc.merge_request_id
FROM documents_fts
JOIN documents d ON d.id = documents_fts.rowid
COALESCE(disc.issue_id, note_disc.issue_id) AS issue_id,
COALESCE(disc.merge_request_id, note_disc.merge_request_id) AS mr_id,
COALESCE(disc.id, note_disc.id) AS discussion_id
FROM documents d
LEFT JOIN discussions disc ON disc.id = d.source_id AND d.source_type = 'discussion'
WHERE documents_fts MATCH ?1
AND (?2 IS NULL OR d.project_id = ?2)
AND (?3 IS NULL OR d.updated_at >= ?3)
ORDER BY rank
LIMIT ?4
";
LEFT JOIN notes n ON n.id = d.source_id AND d.source_type = 'note'
LEFT JOIN discussions note_disc ON note_disc.id = n.discussion_id AND d.source_type = 'note'
WHERE d.id IN ({placeholders})
ORDER BY CASE d.id {order_clause} END
",
order_clause = document_ids
.iter()
.enumerate()
.map(|(i, id)| format!("WHEN {id} THEN {i}"))
.collect::<Vec<_>>()
.join(" "),
);
let mut stmt = conn.prepare(sql)?;
let rows = stmt.query_map(
rusqlite::params![
fts_query,
project_id,
since_ms,
max_seeds.saturating_mul(3) as i64
],
|row| {
let mut stmt = conn.prepare(&sql)?;
let params: Vec<&dyn rusqlite::types::ToSql> = document_ids
.iter()
.map(|id| id as &dyn rusqlite::types::ToSql)
.collect();
let rows = stmt.query_map(params.as_slice(), |row| {
Ok((
row.get::<_, String>(0)?,
row.get::<_, i64>(1)?,
row.get::<_, i64>(2)?,
row.get::<_, Option<i64>>(3)?,
row.get::<_, Option<i64>>(4)?,
row.get::<_, String>(0)?, // source_type
row.get::<_, i64>(1)?, // source_id
row.get::<_, i64>(2)?, // project_id
row.get::<_, Option<i64>>(3)?, // issue_id (coalesced)
row.get::<_, Option<i64>>(4)?, // mr_id (coalesced)
row.get::<_, Option<i64>>(5)?, // discussion_id (coalesced)
))
},
)?;
})?;
let mut seen = HashSet::new();
let mut seen_entities = HashSet::new();
let mut seen_discussions = HashSet::new();
let mut entities = Vec::new();
let mut matched_discussions = Vec::new();
for row_result in rows {
let (source_type, source_id, proj_id, disc_issue_id, disc_mr_id) = row_result?;
let (source_type, source_id, proj_id, disc_issue_id, disc_mr_id, discussion_id) =
row_result?;
let (entity_type, entity_id) = match source_type.as_str() {
"issue" => ("issue".to_owned(), source_id),
"merge_request" => ("merge_request".to_owned(), source_id),
"discussion" => {
"discussion" | "note" => {
if let Some(issue_id) = disc_issue_id {
("issue".to_owned(), issue_id)
} else if let Some(mr_id) = disc_mr_id {
("merge_request".to_owned(), mr_id)
} else {
continue; // orphaned discussion
continue; // orphaned discussion/note
}
}
_ => continue,
};
// Capture matched discussion (deduplicated)
if let Some(disc_id) = discussion_id
&& (source_type == "discussion" || source_type == "note")
&& seen_discussions.insert(disc_id)
{
matched_discussions.push(MatchedDiscussion {
discussion_id: disc_id,
entity_type: entity_type.clone(),
entity_id,
project_id: proj_id,
});
}
// Entity dedup
let key = (entity_type.clone(), entity_id);
if !seen.insert(key) {
if !seen_entities.insert(key) {
continue;
}
@@ -116,12 +251,12 @@ fn find_seed_entities(
entities.push(entity_ref);
}
if entities.len() >= max_seeds {
if entities.len() >= max_entities {
break;
}
}
Ok(entities)
Ok((entities, matched_discussions))
}
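
Reviewer note: `resolve_documents_to_entities` keeps the hybrid search ranking by pairing an `IN (...)` filter with an `ORDER BY CASE` expression, because `IN` alone returns rows in whatever order the database chooses. The string-building part can be looked at in isolation, as in the sketch below. The function name `build_rank_preserving_sql` and the shortened `SELECT` are illustrative assumptions; only the placeholder and `CASE` construction mirror the diff.

```rust
/// Build a query that returns rows for `document_ids` in the same order as the slice.
/// Returns `None` for an empty slice, since a `CASE` with no `WHEN` arms is not valid SQL.
/// (Hypothetical helper with a simplified SELECT, for illustration only.)
fn build_rank_preserving_sql(document_ids: &[i64]) -> Option<String> {
    if document_ids.is_empty() {
        return None;
    }
    // One positional placeholder per id, bound at execution time.
    let placeholders = vec!["?"; document_ids.len()].join(",");
    // Map each id to its position in the ranked list. The ids are i64 values,
    // so formatting them into the SQL text cannot inject anything.
    let order_clause = document_ids
        .iter()
        .enumerate()
        .map(|(i, id)| format!("WHEN {id} THEN {i}"))
        .collect::<Vec<_>>()
        .join(" ");
    Some(format!(
        "SELECT d.id FROM documents d WHERE d.id IN ({placeholders}) \
         ORDER BY CASE d.id {order_clause} END"
    ))
}
```

The statement text grows with the number of ids, which is acceptable here because the caller bounds the list via `max_seeds.saturating_mul(3)`.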
/// Find evidence notes: FTS5-matched discussion notes that provide context.
@@ -217,336 +352,6 @@ fn find_evidence_notes(
Ok(events)
}
/// Truncate a string to at most `max_chars` characters on a safe UTF-8 boundary.
fn truncate_to_chars(s: &str, max_chars: usize) -> String {
let char_count = s.chars().count();
if char_count <= max_chars {
return s.to_owned();
}
let byte_end = s
.char_indices()
.nth(max_chars)
.map(|(i, _)| i)
.unwrap_or(s.len());
s[..byte_end].to_owned()
}
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn insert_test_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
[],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_test_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test issue', 'opened', 'alice', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_test_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_document(
conn: &Connection,
source_type: &str,
source_id: i64,
project_id: i64,
content: &str,
) -> i64 {
conn.execute(
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) VALUES (?1, ?2, ?3, ?4, ?5)",
rusqlite::params![source_type, source_id, project_id, content, format!("hash_{source_id}")],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_discussion(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
) -> i64 {
let noteable_type = if issue_id.is_some() {
"Issue"
} else {
"MergeRequest"
};
conn.execute(
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, merge_request_id, noteable_type, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, 0)",
rusqlite::params![format!("disc_{}", rand::random::<u32>()), project_id, issue_id, mr_id, noteable_type],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_note(
conn: &Connection,
discussion_id: i64,
project_id: i64,
body: &str,
is_system: bool,
) -> i64 {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, author_username, body, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, ?4, 'alice', ?5, 5000, 5000, 5000)",
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32, body],
)
.unwrap();
conn.last_insert_rowid()
}
#[test]
fn test_seed_empty_query_returns_empty() {
let conn = setup_test_db();
let result = seed_timeline(&conn, "", None, None, 50, 10).unwrap();
assert!(result.seed_entities.is_empty());
assert!(result.evidence_notes.is_empty());
}
#[test]
fn test_seed_no_matches_returns_empty() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
insert_document(
&conn,
"issue",
issue_id,
project_id,
"unrelated content here",
);
let result = seed_timeline(&conn, "nonexistent_xyzzy_query", None, None, 50, 10).unwrap();
assert!(result.seed_entities.is_empty());
}
#[test]
fn test_seed_finds_issue() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 42);
insert_document(
&conn,
"issue",
issue_id,
project_id,
"authentication error in login flow",
);
let result = seed_timeline(&conn, "authentication", None, None, 50, 10).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "issue");
assert_eq!(result.seed_entities[0].entity_iid, 42);
assert_eq!(result.seed_entities[0].project_path, "group/project");
}
#[test]
fn test_seed_finds_mr() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let mr_id = insert_test_mr(&conn, project_id, 99);
insert_document(
&conn,
"merge_request",
mr_id,
project_id,
"fix authentication bug",
);
let result = seed_timeline(&conn, "authentication", None, None, 50, 10).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "merge_request");
assert_eq!(result.seed_entities[0].entity_iid, 99);
}
#[test]
fn test_seed_deduplicates_entities() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 10);
// Two documents referencing the same issue
insert_document(
&conn,
"issue",
issue_id,
project_id,
"authentication error first doc",
);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"authentication error second doc",
);
let result = seed_timeline(&conn, "authentication", None, None, 50, 10).unwrap();
// Should deduplicate: both map to the same issue
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_iid, 10);
}
#[test]
fn test_seed_resolves_discussion_to_parent() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 7);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"deployment pipeline failed",
);
let result = seed_timeline(&conn, "deployment", None, None, 50, 10).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "issue");
assert_eq!(result.seed_entities[0].entity_iid, 7);
}
#[test]
fn test_seed_evidence_capped() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
// Create 15 discussion documents with notes about "deployment"
for i in 0..15 {
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
&format!("deployment issue number {i}"),
);
insert_note(
&conn,
disc_id,
project_id,
&format!("deployment note {i}"),
false,
);
}
let result = seed_timeline(&conn, "deployment", None, None, 50, 5).unwrap();
assert!(result.evidence_notes.len() <= 5);
}
#[test]
fn test_seed_evidence_snippet_truncated() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"deployment configuration",
);
let long_body = "x".repeat(500);
insert_note(&conn, disc_id, project_id, &long_body, false);
let result = seed_timeline(&conn, "deployment", None, None, 50, 10).unwrap();
assert!(!result.evidence_notes.is_empty());
if let TimelineEventType::NoteEvidence { snippet, .. } =
&result.evidence_notes[0].event_type
{
assert!(snippet.chars().count() <= 200);
} else {
panic!("Expected NoteEvidence");
}
}
#[test]
fn test_seed_respects_project_filter() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
// Insert a second project
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (2, 'other/repo', 'https://gitlab.com/other/repo')",
[],
)
.unwrap();
let project2_id = conn.last_insert_rowid();
let issue1_id = insert_test_issue(&conn, project_id, 1);
insert_document(
&conn,
"issue",
issue1_id,
project_id,
"authentication error",
);
let issue2_id = insert_test_issue(&conn, project2_id, 2);
insert_document(
&conn,
"issue",
issue2_id,
project2_id,
"authentication error",
);
// Filter to project 1 only
let result =
seed_timeline(&conn, "authentication", Some(project_id), None, 50, 10).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].project_path, "group/project");
}
#[test]
fn test_truncate_to_chars_short() {
assert_eq!(truncate_to_chars("hello", 200), "hello");
}
#[test]
fn test_truncate_to_chars_long() {
let long = "a".repeat(300);
let result = truncate_to_chars(&long, 200);
assert_eq!(result.chars().count(), 200);
}
#[test]
fn test_truncate_to_chars_multibyte() {
let s = "\u{1F600}".repeat(300); // emoji
let result = truncate_to_chars(&s, 200);
assert_eq!(result.chars().count(), 200);
// Verify valid UTF-8
assert!(std::str::from_utf8(result.as_bytes()).is_ok());
}
}
#[path = "timeline_seed_tests.rs"]
mod tests;
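
Reviewer note: this change removes the local `truncate_to_chars` and imports it from `crate::core::timeline` instead. The technique it relies on, cutting at a character index instead of a byte index so the slice never lands inside a multi-byte code point, can be shown with a borrowing variant. The sketch below is an alternative written for illustration under the hypothetical name `truncate_chars`; it is not the helper that now lives in `core::timeline`.

```rust
/// Return at most `max_chars` characters of `s` without splitting a code point.
/// Borrows from the input instead of allocating. (Hypothetical variant for illustration.)
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // `char_indices` yields the byte offset at which each character starts,
    // so the offset of character number `max_chars` is a valid slice boundary.
    match s.char_indices().nth(max_chars) {
        Some((byte_end, _)) => &s[..byte_end],
        None => s, // the string has `max_chars` characters or fewer
    }
}
```

Slicing by a raw byte count such as `&s[..200]` would panic on input like the emoji test above whenever the offset falls inside a code point, which is why the cut is computed from `char_indices`.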


@@ -0,0 +1,512 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn insert_test_project(conn: &Connection) -> i64 {
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
[],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_test_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test issue', 'opened', 'alice', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_test_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
conn.execute(
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
rusqlite::params![iid * 100, project_id, iid],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_document(
conn: &Connection,
source_type: &str,
source_id: i64,
project_id: i64,
content: &str,
) -> i64 {
conn.execute(
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) VALUES (?1, ?2, ?3, ?4, ?5)",
rusqlite::params![source_type, source_id, project_id, content, format!("hash_{source_id}")],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_discussion(
conn: &Connection,
project_id: i64,
issue_id: Option<i64>,
mr_id: Option<i64>,
) -> i64 {
let noteable_type = if issue_id.is_some() {
"Issue"
} else {
"MergeRequest"
};
conn.execute(
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, merge_request_id, noteable_type, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, 0)",
rusqlite::params![format!("disc_{}", rand::random::<u32>()), project_id, issue_id, mr_id, noteable_type],
)
.unwrap();
conn.last_insert_rowid()
}
fn insert_note(
conn: &Connection,
discussion_id: i64,
project_id: i64,
body: &str,
is_system: bool,
) -> i64 {
let gitlab_id: i64 = rand::random::<u32>().into();
conn.execute(
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, author_username, body, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, ?4, 'alice', ?5, 5000, 5000, 5000)",
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32, body],
)
.unwrap();
conn.last_insert_rowid()
}
#[tokio::test]
async fn test_seed_empty_query_returns_empty() {
let conn = setup_test_db();
let result = seed_timeline(&conn, None, "", None, None, 50, 10)
.await
.unwrap();
assert!(result.seed_entities.is_empty());
assert!(result.evidence_notes.is_empty());
}
#[tokio::test]
async fn test_seed_no_matches_returns_empty() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
insert_document(
&conn,
"issue",
issue_id,
project_id,
"unrelated content here",
);
let result = seed_timeline(&conn, None, "nonexistent_xyzzy_query", None, None, 50, 10)
.await
.unwrap();
assert!(result.seed_entities.is_empty());
}
#[tokio::test]
async fn test_seed_finds_issue() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 42);
insert_document(
&conn,
"issue",
issue_id,
project_id,
"authentication error in login flow",
);
let result = seed_timeline(&conn, None, "authentication", None, None, 50, 10)
.await
.unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "issue");
assert_eq!(result.seed_entities[0].entity_iid, 42);
assert_eq!(result.seed_entities[0].project_path, "group/project");
}
#[tokio::test]
async fn test_seed_finds_mr() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let mr_id = insert_test_mr(&conn, project_id, 99);
insert_document(
&conn,
"merge_request",
mr_id,
project_id,
"fix authentication bug",
);
let result = seed_timeline(&conn, None, "authentication", None, None, 50, 10)
.await
.unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "merge_request");
assert_eq!(result.seed_entities[0].entity_iid, 99);
}
#[tokio::test]
async fn test_seed_deduplicates_entities() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 10);
// Two documents referencing the same issue
insert_document(
&conn,
"issue",
issue_id,
project_id,
"authentication error first doc",
);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"authentication error second doc",
);
let result = seed_timeline(&conn, None, "authentication", None, None, 50, 10)
.await
.unwrap();
// Should deduplicate: both map to the same issue
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_iid, 10);
}
#[tokio::test]
async fn test_seed_resolves_discussion_to_parent() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 7);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"deployment pipeline failed",
);
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
.await
.unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "issue");
assert_eq!(result.seed_entities[0].entity_iid, 7);
}
#[tokio::test]
async fn test_seed_evidence_capped() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
// Create 15 discussion documents with notes about "deployment"
for i in 0..15 {
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
&format!("deployment issue number {i}"),
);
insert_note(
&conn,
disc_id,
project_id,
&format!("deployment note {i}"),
false,
);
}
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 5)
.await
.unwrap();
assert!(result.evidence_notes.len() <= 5);
}
#[tokio::test]
async fn test_seed_evidence_snippet_truncated() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"deployment configuration",
);
let long_body = "x".repeat(500);
insert_note(&conn, disc_id, project_id, &long_body, false);
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
.await
.unwrap();
assert!(!result.evidence_notes.is_empty());
if let TimelineEventType::NoteEvidence { snippet, .. } = &result.evidence_notes[0].event_type {
assert!(snippet.chars().count() <= 200);
} else {
panic!("Expected NoteEvidence");
}
}
#[tokio::test]
async fn test_seed_respects_project_filter() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
// Insert a second project
conn.execute(
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (2, 'other/repo', 'https://gitlab.com/other/repo')",
[],
)
.unwrap();
let project2_id = conn.last_insert_rowid();
let issue1_id = insert_test_issue(&conn, project_id, 1);
insert_document(
&conn,
"issue",
issue1_id,
project_id,
"authentication error",
);
let issue2_id = insert_test_issue(&conn, project2_id, 2);
insert_document(
&conn,
"issue",
issue2_id,
project2_id,
"authentication error",
);
// Filter to project 1 only
let result = seed_timeline(
&conn,
None,
"authentication",
Some(project_id),
None,
50,
10,
)
.await
.unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].project_path, "group/project");
}
// ─── Matched discussion tests ───────────────────────────────────────────────
#[tokio::test]
async fn test_seed_captures_matched_discussions_from_discussion_doc() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"deployment pipeline authentication",
);
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
.await
.unwrap();
assert_eq!(result.matched_discussions.len(), 1);
assert_eq!(result.matched_discussions[0].discussion_id, disc_id);
assert_eq!(result.matched_discussions[0].entity_type, "issue");
assert_eq!(result.matched_discussions[0].entity_id, issue_id);
}
#[tokio::test]
async fn test_seed_captures_matched_discussions_from_note_doc() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
let note_id = insert_note(&conn, disc_id, project_id, "note about deployment", false);
insert_document(
&conn,
"note",
note_id,
project_id,
"deployment configuration details",
);
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
.await
.unwrap();
assert_eq!(
result.matched_discussions.len(),
1,
"Note doc should resolve to parent discussion"
);
assert_eq!(result.matched_discussions[0].discussion_id, disc_id);
assert_eq!(result.matched_discussions[0].entity_type, "issue");
}
#[tokio::test]
async fn test_seed_deduplicates_matched_discussions() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 1);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
// Two docs referencing the same discussion
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"deployment pipeline first doc",
);
let note_id = insert_note(&conn, disc_id, project_id, "deployment note", false);
insert_document(
&conn,
"note",
note_id,
project_id,
"deployment pipeline second doc",
);
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
.await
.unwrap();
assert_eq!(
result.matched_discussions.len(),
1,
"Same discussion_id from two docs should deduplicate"
);
}
#[tokio::test]
async fn test_seed_matched_discussions_have_correct_parent_entity() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let mr_id = insert_test_mr(&conn, project_id, 99);
let disc_id = insert_discussion(&conn, project_id, None, Some(mr_id));
insert_document(
&conn,
"discussion",
disc_id,
project_id,
"deployment pipeline for merge request",
);
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
.await
.unwrap();
assert_eq!(result.matched_discussions.len(), 1);
assert_eq!(result.matched_discussions[0].entity_type, "merge_request");
assert_eq!(result.matched_discussions[0].entity_id, mr_id);
}
// ─── seed_timeline_direct tests ─────────────────────────────────────────────
#[test]
fn test_direct_seed_resolves_entity() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
insert_test_issue(&conn, project_id, 42);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "issue");
assert_eq!(result.seed_entities[0].entity_iid, 42);
assert_eq!(result.seed_entities[0].project_path, "group/project");
}
#[test]
fn test_direct_seed_gathers_all_discussions() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 42);
// Create 3 discussions for this issue
let disc1 = insert_discussion(&conn, project_id, Some(issue_id), None);
let disc2 = insert_discussion(&conn, project_id, Some(issue_id), None);
let disc3 = insert_discussion(&conn, project_id, Some(issue_id), None);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert_eq!(result.matched_discussions.len(), 3);
let disc_ids: Vec<i64> = result
.matched_discussions
.iter()
.map(|d| d.discussion_id)
.collect();
assert!(disc_ids.contains(&disc1));
assert!(disc_ids.contains(&disc2));
assert!(disc_ids.contains(&disc3));
}
#[test]
fn test_direct_seed_no_evidence_notes() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let issue_id = insert_test_issue(&conn, project_id, 42);
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
insert_note(&conn, disc_id, project_id, "some note body", false);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert!(
result.evidence_notes.is_empty(),
"Direct seeding should not produce evidence notes"
);
}
#[test]
fn test_direct_seed_search_mode_is_direct() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
insert_test_issue(&conn, project_id, 42);
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
assert_eq!(result.search_mode, "direct");
}
#[test]
fn test_direct_seed_not_found() {
let conn = setup_test_db();
insert_test_project(&conn);
let result = seed_timeline_direct(&conn, "issue", 999, None);
assert!(result.is_err());
}
#[test]
fn test_direct_seed_mr() {
let conn = setup_test_db();
let project_id = insert_test_project(&conn);
let mr_id = insert_test_mr(&conn, project_id, 99);
let disc_id = insert_discussion(&conn, project_id, None, Some(mr_id));
let result = seed_timeline_direct(&conn, "merge_request", 99, None).unwrap();
assert_eq!(result.seed_entities.len(), 1);
assert_eq!(result.seed_entities[0].entity_type, "merge_request");
assert_eq!(result.seed_entities[0].entity_iid, 99);
assert_eq!(result.matched_discussions.len(), 1);
assert_eq!(result.matched_discussions[0].discussion_id, disc_id);
}
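
Note on `test_seed_evidence_snippet_truncated`: the assertion bounds `snippet.chars().count()` rather than the byte length, which matters for note bodies containing multi-byte characters. A minimal sketch of character-based truncation follows. `truncate_chars` is a hypothetical helper name used only for illustration; the truncation code in the seeding module is not shown in this diff and may differ.

```rust
/// Hypothetical helper (not part of this diff): keep at most `max_chars`
/// Unicode scalar values, so a multi-byte character is never split.
fn truncate_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}
```

Slicing by bytes (`&s[..200]`) would panic if byte 200 fell inside a multi-byte character, which is why counting `chars()` is the safer basis for a cap like this.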

src/core/trace.rs Normal file

@@ -0,0 +1,262 @@
use serde::Serialize;
use super::error::Result;
use super::file_history::resolve_rename_chain;
use super::time::ms_to_iso;
/// Maximum rename chain BFS depth.
const MAX_RENAME_HOPS: usize = 10;
/// A linked issue found via entity_references on the MR.
#[derive(Debug, Serialize)]
pub struct TraceIssue {
pub iid: i64,
pub title: String,
pub state: String,
pub reference_type: String,
pub web_url: Option<String>,
}
/// A DiffNote discussion relevant to the traced file.
#[derive(Debug, Serialize)]
pub struct TraceDiscussion {
pub discussion_id: String,
pub mr_iid: i64,
pub author_username: String,
pub body: String,
pub path: String,
pub created_at_iso: String,
}
/// A single trace chain: an MR that touched the file, plus linked issues and discussions.
#[derive(Debug, Serialize)]
pub struct TraceChain {
pub mr_iid: i64,
pub mr_title: String,
pub mr_state: String,
pub mr_author: String,
pub change_type: String,
pub merged_at_iso: Option<String>,
pub updated_at_iso: String,
pub web_url: Option<String>,
pub issues: Vec<TraceIssue>,
pub discussions: Vec<TraceDiscussion>,
}
/// Result of a trace query.
#[derive(Debug, Serialize)]
pub struct TraceResult {
pub path: String,
pub resolved_paths: Vec<String>,
pub renames_followed: bool,
pub trace_chains: Vec<TraceChain>,
/// Number of entries in `trace_chains`, counted after `limit` is applied.
pub total_chains: usize,
}
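/// Run the trace query, building file -> MR -> issue chains and, when
/// `include_discussions` is set, attaching DiffNote discussions to each MR.
/// Renames are only followed when `project_id` is `Some`; `limit` caps the
/// number of chains returned.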
pub fn run_trace(
conn: &rusqlite::Connection,
project_id: Option<i64>,
path: &str,
follow_renames: bool,
include_discussions: bool,
limit: usize,
) -> Result<TraceResult> {
// Resolve rename chain
let (all_paths, renames_followed) = match (follow_renames, project_id) {
(true, Some(pid)) => {
let chain = resolve_rename_chain(conn, pid, path, MAX_RENAME_HOPS)?;
let followed = chain.len() > 1;
(chain, followed)
}
_ => (vec![path.to_string()], false),
};
// Build placeholders for IN clause
let placeholders: Vec<String> = (0..all_paths.len())
.map(|i| format!("?{}", i + 2))
.collect();
let in_clause = placeholders.join(", ");
let project_filter = if project_id.is_some() {
"AND mfc.project_id = ?1"
} else {
""
};
// Step 1: Find MRs that touched the file
let mr_sql = format!(
"SELECT DISTINCT \
mr.id, mr.iid, mr.title, mr.state, mr.author_username, \
mfc.change_type, mr.merged_at, mr.updated_at, mr.web_url \
FROM mr_file_changes mfc \
JOIN merge_requests mr ON mr.id = mfc.merge_request_id \
WHERE mfc.new_path IN ({in_clause}) {project_filter} \
ORDER BY COALESCE(mr.merged_at, mr.updated_at) DESC \
LIMIT ?{}",
all_paths.len() + 2
);
let mut stmt = conn.prepare(&mr_sql)?;
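let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = Vec::new();
// ?1 is always bound (0 when no project filter is given) so that the path
// placeholders (?2..) and the LIMIT placeholder keep fixed indices.
params.push(Box::new(project_id.unwrap_or(0)));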
for p in &all_paths {
params.push(Box::new(p.clone()));
}
params.push(Box::new(limit as i64));
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
struct MrRow {
id: i64,
iid: i64,
title: String,
state: String,
author: String,
change_type: String,
merged_at: Option<i64>,
updated_at: i64,
web_url: Option<String>,
}
let mr_rows: Vec<MrRow> = stmt
.query_map(param_refs.as_slice(), |row| {
Ok(MrRow {
id: row.get(0)?,
iid: row.get(1)?,
title: row.get(2)?,
state: row.get(3)?,
author: row.get(4)?,
change_type: row.get(5)?,
merged_at: row.get(6)?,
updated_at: row.get(7)?,
web_url: row.get(8)?,
})
})?
// Propagate row decoding errors instead of silently dropping rows.
.collect::<std::result::Result<Vec<_>, _>>()?;
// Step 2: For each MR, find linked issues + optional discussions
let mut trace_chains = Vec::with_capacity(mr_rows.len());
for mr in &mr_rows {
let issues = fetch_linked_issues(conn, mr.id)?;
let discussions = if include_discussions {
fetch_trace_discussions(conn, mr.id, mr.iid, &all_paths)?
} else {
Vec::new()
};
trace_chains.push(TraceChain {
mr_iid: mr.iid,
mr_title: mr.title.clone(),
mr_state: mr.state.clone(),
mr_author: mr.author.clone(),
change_type: mr.change_type.clone(),
merged_at_iso: mr.merged_at.map(ms_to_iso),
updated_at_iso: ms_to_iso(mr.updated_at),
web_url: mr.web_url.clone(),
issues,
discussions,
});
}
let total_chains = trace_chains.len();
Ok(TraceResult {
path: path.to_string(),
resolved_paths: all_paths,
renames_followed,
trace_chains,
total_chains,
})
}
/// Fetch issues linked to an MR via entity_references.
/// source = merge_request -> target = issue (closes/mentioned/related)
fn fetch_linked_issues(conn: &rusqlite::Connection, mr_id: i64) -> Result<Vec<TraceIssue>> {
let sql = "SELECT DISTINCT i.iid, i.title, i.state, er.reference_type, i.web_url \
FROM entity_references er \
JOIN issues i ON i.id = er.target_entity_id \
WHERE er.source_entity_type = 'merge_request' \
AND er.source_entity_id = ?1 \
AND er.target_entity_type = 'issue' \
AND er.target_entity_id IS NOT NULL \
ORDER BY \
CASE er.reference_type WHEN 'closes' THEN 0 WHEN 'related' THEN 1 ELSE 2 END, \
i.iid";
let mut stmt = conn.prepare(sql)?;
let issues: Vec<TraceIssue> = stmt
.query_map(rusqlite::params![mr_id], |row| {
Ok(TraceIssue {
iid: row.get(0)?,
title: row.get(1)?,
state: row.get(2)?,
reference_type: row.get(3)?,
web_url: row.get(4)?,
})
})?
// Propagate row decoding errors instead of silently dropping rows.
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(issues)
}
/// Fetch DiffNote discussions on a specific MR that reference the traced paths.
/// Returns one entry per matching non-system note, newest first, capped at 20.
fn fetch_trace_discussions(
conn: &rusqlite::Connection,
mr_id: i64,
mr_iid: i64,
paths: &[String],
) -> Result<Vec<TraceDiscussion>> {
let placeholders: Vec<String> = (0..paths.len()).map(|i| format!("?{}", i + 2)).collect();
let in_clause = placeholders.join(", ");
let sql = format!(
"SELECT d.gitlab_discussion_id, n.author_username, n.body, n.position_new_path, n.created_at \
FROM notes n \
JOIN discussions d ON d.id = n.discussion_id \
WHERE d.merge_request_id = ?1 \
AND n.position_new_path IN ({in_clause}) \
AND n.is_system = 0 \
ORDER BY n.created_at DESC \
LIMIT 20"
);
let mut stmt = conn.prepare(&sql)?;
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = Vec::new();
params.push(Box::new(mr_id));
for p in paths {
params.push(Box::new(p.clone()));
}
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let discussions: Vec<TraceDiscussion> = stmt
.query_map(param_refs.as_slice(), |row| {
let created_at: i64 = row.get(4)?;
Ok(TraceDiscussion {
discussion_id: row.get(0)?,
mr_iid,
author_username: row.get(1)?,
body: row.get(2)?,
path: row.get(3)?,
created_at_iso: ms_to_iso(created_at),
})
})?
// Propagate row decoding errors instead of silently dropping rows.
.collect::<std::result::Result<Vec<_>, _>>()?;
Ok(discussions)
}
#[cfg(test)]
#[path = "trace_tests.rs"]
mod tests;
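
Note on placeholder numbering: both queries in `src/core/trace.rs` build a numbered placeholder list for the `IN (...)` clause and reserve `?1` for the project or MR id. The indexing can be sketched in isolation as below. `numbered_placeholders` is a hypothetical name for illustration and is not a function in this commit.

```rust
/// Hypothetical helper: build "?a, ?b, ..." for `count` values, numbering
/// from `first_index` so that lower indices stay free for other parameters.
fn numbered_placeholders(count: usize, first_index: usize) -> String {
    (0..count)
        .map(|i| format!("?{}", i + first_index))
        .collect::<Vec<_>>()
        .join(", ")
}
```

With three resolved paths and `first_index = 2`, the paths occupy `?2` through `?4`, which leaves `?5` for `LIMIT` and matches the `all_paths.len() + 2` index used in `run_trace`.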

src/core/trace_tests.rs Normal file

@@ -0,0 +1,260 @@
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> rusqlite::Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_project(conn: &rusqlite::Connection) -> i64 {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
[],
)
.unwrap();
1
}
fn insert_mr(
conn: &rusqlite::Connection,
id: i64,
iid: i64,
title: &str,
state: &str,
merged_at: Option<i64>,
) {
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, author_username, \
created_at, updated_at, last_seen_at, source_branch, target_branch, merged_at, web_url)
VALUES (?1, ?2, ?3, 1, ?4, ?5, 'dev', 1000, 2000, 2000, 'feature', 'main', ?6, \
'https://gitlab.example.com/group/repo/-/merge_requests/' || ?3)",
rusqlite::params![id, 300 + id, iid, title, state, merged_at],
)
.unwrap();
}
fn insert_file_change(
conn: &rusqlite::Connection,
mr_id: i64,
old_path: Option<&str>,
new_path: &str,
change_type: &str,
) {
conn.execute(
"INSERT INTO mr_file_changes (merge_request_id, project_id, old_path, new_path, change_type)
VALUES (?1, 1, ?2, ?3, ?4)",
rusqlite::params![mr_id, old_path, new_path, change_type],
)
.unwrap();
}
fn insert_entity_ref(
conn: &rusqlite::Connection,
source_type: &str,
source_id: i64,
target_type: &str,
target_id: i64,
ref_type: &str,
) {
conn.execute(
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, \
target_entity_type, target_entity_id, reference_type, source_method, created_at)
VALUES (1, ?1, ?2, ?3, ?4, ?5, 'api', 1000)",
rusqlite::params![source_type, source_id, target_type, target_id, ref_type],
)
.unwrap();
}
fn insert_issue(conn: &rusqlite::Connection, id: i64, iid: i64, title: &str, state: &str) {
conn.execute(
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, \
last_seen_at, web_url)
VALUES (?1, ?2, 1, ?3, ?4, ?5, 1000, 2000, 2000, \
'https://gitlab.example.com/group/repo/-/issues/' || ?3)",
rusqlite::params![id, 400 + id, iid, title, state],
)
.unwrap();
}
fn insert_discussion_and_note(
conn: &rusqlite::Connection,
discussion_id: i64,
mr_id: i64,
note_id: i64,
author: &str,
body: &str,
position_new_path: Option<&str>,
) {
conn.execute(
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, merge_request_id, \
noteable_type, last_seen_at)
VALUES (?1, 'disc-' || ?1, 1, ?2, 'MergeRequest', 2000)",
rusqlite::params![discussion_id, mr_id],
)
.unwrap();
conn.execute(
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, \
is_system, created_at, updated_at, last_seen_at, position_new_path)
VALUES (?1, ?2, ?3, 1, ?4, ?5, 0, 1500, 1500, 2000, ?6)",
rusqlite::params![
note_id,
500 + note_id,
discussion_id,
author,
body,
position_new_path
],
)
.unwrap();
}
#[test]
fn test_trace_empty_file() {
let conn = setup_test_db();
seed_project(&conn);
let result = run_trace(&conn, Some(1), "src/nonexistent.rs", false, false, 10).unwrap();
assert!(result.trace_chains.is_empty());
assert_eq!(result.resolved_paths, ["src/nonexistent.rs"]);
}
#[test]
fn test_trace_finds_mr() {
let conn = setup_test_db();
seed_project(&conn);
insert_mr(&conn, 1, 10, "Add auth module", "merged", Some(3000));
insert_file_change(&conn, 1, None, "src/auth.rs", "added");
let result = run_trace(&conn, Some(1), "src/auth.rs", false, false, 10).unwrap();
assert_eq!(result.trace_chains.len(), 1);
let chain = &result.trace_chains[0];
assert_eq!(chain.mr_iid, 10);
assert_eq!(chain.mr_title, "Add auth module");
assert_eq!(chain.mr_state, "merged");
assert_eq!(chain.change_type, "added");
assert!(chain.merged_at_iso.is_some());
}
#[test]
fn test_trace_follows_renames() {
let conn = setup_test_db();
seed_project(&conn);
// MR 1: added old_auth.rs
insert_mr(&conn, 1, 10, "Add old auth", "merged", Some(1000));
insert_file_change(&conn, 1, None, "src/old_auth.rs", "added");
// MR 2: renamed old_auth.rs -> auth.rs
insert_mr(&conn, 2, 11, "Rename auth", "merged", Some(2000));
insert_file_change(&conn, 2, Some("src/old_auth.rs"), "src/auth.rs", "renamed");
// Query auth.rs with follow_renames -- should find both MRs
let result = run_trace(&conn, Some(1), "src/auth.rs", true, false, 10).unwrap();
assert!(result.renames_followed);
assert!(
result
.resolved_paths
.contains(&"src/old_auth.rs".to_string())
);
assert!(result.resolved_paths.contains(&"src/auth.rs".to_string()));
// MR 2 matches on new_path = auth.rs; MR 1 matches on new_path = old_auth.rs,
// which is only in the search set because the rename chain was resolved.
assert_eq!(result.trace_chains.len(), 2);
}
#[test]
fn test_trace_links_issues() {
let conn = setup_test_db();
seed_project(&conn);
insert_mr(&conn, 1, 10, "Fix login bug", "merged", Some(3000));
insert_file_change(&conn, 1, None, "src/login.rs", "modified");
insert_issue(&conn, 1, 42, "Login broken on mobile", "closed");
insert_entity_ref(&conn, "merge_request", 1, "issue", 1, "closes");
let result = run_trace(&conn, Some(1), "src/login.rs", false, false, 10).unwrap();
assert_eq!(result.trace_chains.len(), 1);
assert_eq!(result.trace_chains[0].issues.len(), 1);
let issue = &result.trace_chains[0].issues[0];
assert_eq!(issue.iid, 42);
assert_eq!(issue.title, "Login broken on mobile");
assert_eq!(issue.reference_type, "closes");
}
#[test]
fn test_trace_limits_chains() {
let conn = setup_test_db();
seed_project(&conn);
for i in 1..=3 {
insert_mr(
&conn,
i,
10 + i,
&format!("MR {i}"),
"merged",
Some(1000 * i),
);
insert_file_change(&conn, i, None, "src/shared.rs", "modified");
}
let result = run_trace(&conn, Some(1), "src/shared.rs", false, false, 1).unwrap();
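assert_eq!(result.trace_chains.len(), 1);
// Chains are ordered by merge time descending, so the most recently merged MR (iid 13) is kept.
assert_eq!(result.trace_chains[0].mr_iid, 13);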
}
#[test]
fn test_trace_no_follow_renames() {
let conn = setup_test_db();
seed_project(&conn);
// MR 1: added old_name.rs
insert_mr(&conn, 1, 10, "Add old file", "merged", Some(1000));
insert_file_change(&conn, 1, None, "src/old_name.rs", "added");
// MR 2: renamed old_name.rs -> new_name.rs
insert_mr(&conn, 2, 11, "Rename file", "merged", Some(2000));
insert_file_change(
&conn,
2,
Some("src/old_name.rs"),
"src/new_name.rs",
"renamed",
);
// Without follow_renames -- should only find MR 2 (new_path = new_name.rs)
let result = run_trace(&conn, Some(1), "src/new_name.rs", false, false, 10).unwrap();
assert_eq!(result.resolved_paths, ["src/new_name.rs"]);
assert!(!result.renames_followed);
assert_eq!(result.trace_chains.len(), 1);
assert_eq!(result.trace_chains[0].mr_iid, 11);
}
#[test]
fn test_trace_includes_discussions() {
let conn = setup_test_db();
seed_project(&conn);
insert_mr(&conn, 1, 10, "Refactor auth", "merged", Some(3000));
insert_file_change(&conn, 1, None, "src/auth.rs", "modified");
insert_discussion_and_note(
&conn,
1,
1,
1,
"reviewer",
"This function should handle the error case.",
Some("src/auth.rs"),
);
let result = run_trace(&conn, Some(1), "src/auth.rs", false, true, 10).unwrap();
assert_eq!(result.trace_chains.len(), 1);
assert_eq!(result.trace_chains[0].discussions.len(), 1);
let disc = &result.trace_chains[0].discussions[0];
assert_eq!(disc.author_username, "reviewer");
assert!(disc.body.contains("error case"));
assert_eq!(disc.mr_iid, 10);
}
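
Note on rename resolution: `test_trace_follows_renames` relies on `resolve_rename_chain`, whose body is not part of this diff. As a rough sketch of the idea only, assuming renames are available as in-memory `(old_path, new_path)` pairs rather than rows in `mr_file_changes`, a hop-limited breadth-first search could look like the following. The function name, signature, and hop semantics are assumptions made for illustration and may not match the real implementation.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical sketch: collect every path connected to `start` through
/// rename edges, walking both directions, at most `max_hops` renames away.
fn rename_chain(renames: &[(&str, &str)], start: &str, max_hops: usize) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut order: Vec<String> = Vec::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();

    seen.insert(start.to_string());
    order.push(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((path, hops)) = queue.pop_front() {
        if hops >= max_hops {
            continue; // hop budget exhausted for this branch
        }
        for (old, new) in renames {
            // Follow the edge in whichever direction leads away from `path`.
            let next = if *old == path {
                *new
            } else if *new == path {
                *old
            } else {
                continue;
            };
            // `insert` returns false for already-visited paths, which breaks cycles.
            if seen.insert(next.to_string()) {
                order.push(next.to_string());
                queue.push_back((next.to_string(), hops + 1));
            }
        }
    }
    order
}
```

In this sketch a result longer than one element corresponds to the `renames_followed` flag that `run_trace` derives from `chain.len() > 1`.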

File diff suppressed because it is too large

@@ -3,8 +3,9 @@ mod regenerator;
mod truncation;
pub use extractor::{
-DocumentData, SourceType, compute_content_hash, compute_list_hash, extract_discussion_document,
-extract_issue_document, extract_mr_document,
+DocumentData, ParentMetadataCache, SourceType, compute_content_hash, compute_list_hash,
+extract_discussion_document, extract_issue_document, extract_mr_document,
+extract_note_document, extract_note_document_cached,
};
pub use regenerator::{RegenerateResult, regenerate_dirty_documents};
pub use truncation::{

Some files were not shown because too many files have changed in this diff