I avoided everything already listed in your `Rejected Ideas` section and focused on net-new upgrades.

1. Centralize MR temporal semantics in one `mr_activity` CTE (architecture + correctness)

Why this improves the plan: right now the state-aware timestamp logic is repeated across multiple signal branches, while `closed_mr_multiplier` is applied later in Rust by string state checks. That split is brittle. A single `mr_activity` CTE removes drift risk, simplifies query maintenance, and avoids per-row state-string handling in Rust.

```diff
diff --git a/plan.md b/plan.md
@@ SQL Restructure
+mr_activity AS (
+  SELECT
+    m.id AS mr_id,
+    m.project_id,
+    m.author_username,
+    m.state,
+    CASE
+      WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
+      WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
+      ELSE COALESCE(m.updated_at, m.created_at)
+    END AS activity_ts,
+    CASE
+      WHEN m.state = 'closed' THEN ?5
+      ELSE 1.0
+    END AS state_mult
+  FROM merge_requests m
+  WHERE m.state IN ('opened','merged','closed')
+),
@@
-... {state_aware_ts} AS seen_at, m.state AS mr_state
+... a.activity_ts AS seen_at, a.state_mult
@@
-SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated
+SELECT username, signal, mr_id, qty, ts, state_mult FROM aggregated
```

2. Parameterize `reviewer_min_note_chars` and tighten config validation (robustness)

Why this improves the plan: inlining `reviewer_min_note_chars` into SQL text creates statement-cache churn and avoidable SQL-text variability. Also, current validation misses finite-range guards (`NaN`, absurd half-lives). Parameterization plus stronger validation reduces weird failure modes.

```diff
diff --git a/plan.md b/plan.md
@@ 1. ScoringConfig (config.rs)
- reviewer_min_note_chars must be >= 0
+ reviewer_min_note_chars must be <= 4096
+ all half-life values must be <= 3650 (10 years safety cap)
+ closed_mr_multiplier must be finite and in (0.0, 1.0]
@@ SQL Restructure
-AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= {reviewer_min_note_chars}
+AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= ?6
```

3. Add path canonicalization before probes/scoring (correctness + UX)

Why this improves the plan: rename-awareness helps only after path resolution succeeds. Inputs like `./src//foo.rs` or inconsistent trailing slashes can still miss. Canonicalizing query paths up front reduces false negatives and ambiguous suffix behavior.

```diff
diff --git a/plan.md b/plan.md
@@ 3a. Path Resolution Probes (who.rs)
+Add `normalize_query_path()` before `build_path_query()`:
+- strip leading `./`
+- collapse repeated `/`
+- trim whitespace
+- preserve trailing `/` only for explicit prefix intent
+Expose both `path_input_original` and `path_input_normalized` in `resolved_input`.
@@ New Tests
+test_path_normalization_handles_dot_and_double_slash
+test_path_normalization_preserves_explicit_prefix_semantics
```

4. Add epsilon-based tie buckets for stable ranking (determinism)

Why this improves the plan: even with deterministic summation order, tiny `powf` platform differences can reorder near-equal scores. Tie bucketing keeps ordering stable and user-meaningful.

```diff
diff --git a/plan.md b/plan.md
@@ 4. Rust-Side Aggregation (who.rs)
-Sort on raw `f64` score — `(raw_score DESC, last_seen DESC, username ASC)`.
+Sort using a tie bucket:
+`score_bucket = (raw_score / 1e-9).floor() as i64`
+Order by `(score_bucket DESC, last_seen DESC, username ASC)`.
+Do not use `raw_score` as a secondary key: the bucket is monotonic in it, so re-adding it would reproduce the raw ordering.
+This keeps full-precision scores in the output while preventing meaningless micro-delta reorderings.
@@ New Tests
+test_near_equal_scores_use_stable_tie_bucket_order
```

5. Add `--diagnose-score` aggregated diagnostics (operability)

Why this improves the plan: `--explain-score` tells “why this user scored”, but not “why this query behaved oddly” (path ambiguity, dedup collapse, old_path contribution share, filtered bots, window exclusions). Lightweight aggregate diagnostics are high-value without per-MR drill-down complexity.

```diff
diff --git a/plan.md b/plan.md
@@ CLI changes (who.rs)
+Add `--diagnose-score` flag (compatible with `--explain-score`, incompatible with `--detail`).
+When enabled, include:
+- matched_notes_raw_count
+- matched_notes_dedup_count
+- matched_file_changes_raw_count
+- matched_file_changes_dedup_count
+- rows_excluded_by_window_upper_bound
+- users_filtered_by_excluded_usernames
+- query_elapsed_ms
@@ Robot output
+`diagnostics` object emitted only when `--diagnose-score` is set.
```

6. Add probe-optimized indexes for path resolution (performance)

Why this improves the plan: the currently proposed indexes are optimized for scoring joins, but `build_path_query()` and `suffix_probe()` run existence/path-only probes where `author_username` is not constrained. Dedicated probe indexes will materially reduce latency for path lookup modes.

```diff
diff --git a/plan.md b/plan.md
@@ 6. Index Migration (db.rs)
+-- Fast exact/prefix/suffix path probes on notes (no author predicate)
+CREATE INDEX IF NOT EXISTS idx_notes_new_path_project_created
+  ON notes(position_new_path, project_id, created_at)
+  WHERE note_type = 'DiffNote' AND is_system = 0 AND position_new_path IS NOT NULL;
+
+CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
+  ON notes(position_old_path, project_id, created_at)
+  WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
```

7. Add multi-path expert scoring (`--path` repeatable) with dedup across paths (feature + utility)

Why this improves the plan: the current model is single-path centric. Real ownership questions are usually subsystem-level. Repeatable paths/prefixes let users ask “who knows auth stack?” in one call. Dedup by `(username, signal, mr_id)` avoids double-counting the same MR when it touches multiple requested paths.

```diff
diff --git a/plan.md b/plan.md
@@ CLI/feature scope
+Add repeatable `--path` in expert mode:
+`lore who --expert --path src/auth/ --path src/session/`
+Optional `--path-file ` for large path sets (one per line).
@@ SQL Restructure
+Add `requested_paths` CTE and match each source against that set.
+Ensure dedup key includes `(username, signal, mr_id)` so one MR contributes once per signal even if multiple paths match.
@@ New Tests
+test_multi_path_query_unions_results_without_double_counting
+test_multi_path_with_overlap_prefixes_is_idempotent
```

These 7 revisions keep your current model direction intact, but reduce correctness drift risk, harden edge handling, improve query observability, and make the feature materially more useful for real ownership workflows.
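
To make revision 3 more concrete, here is a minimal sketch of what `normalize_query_path()` could look like. The signature is an assumption, since the plan only names the function. So are two behaviors the plan does not specify: a trailing `/` in the input is read as "explicit prefix intent", and a leading `/` is dropped on the assumption that stored paths are repo-relative.

```rust
/// Hypothetical sketch of `normalize_query_path()`. The signature and the
/// edge-case handling are assumptions, not taken from who.rs.
fn normalize_query_path(input: &str) -> String {
    // Rule: trim whitespace.
    let trimmed = input.trim();

    // Assumption: a trailing '/' in the user's input signals prefix intent.
    let wants_prefix = trimmed.ends_with('/');

    // Rule: strip leading "./" (repeated, so "././src" also becomes "src").
    let mut rest = trimmed;
    while let Some(stripped) = rest.strip_prefix("./") {
        rest = stripped;
    }

    // Rule: collapse repeated '/'. Splitting on '/' and dropping empty
    // segments also removes any leading or trailing slash.
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    let mut normalized = segments.join("/");

    // Rule: preserve a single trailing '/' only when the caller asked for it.
    if wants_prefix && !normalized.is_empty() {
        normalized.push('/');
    }
    normalized
}
```

With these rules, `./src//foo.rs` normalizes to `src/foo.rs`, and ` src/auth// ` normalizes to `src/auth/`. The second case keeps its prefix meaning while still matching stored paths. If the real path columns can hold absolute paths, the leading-slash handling would need to change.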