**Critical Plan Findings First**

1. The proposed index `idx_notes_mr_path_author ON notes(noteable_id, ...)` will fail: `notes.noteable_id` does not exist in schema (`migrations/002_issues.sql:74`).
2. Rename awareness is only applied in scoring queries, not in path resolution probes; today `build_path_query()` and `suffix_probe()` only inspect `position_new_path`/`new_path` (`src/cli/commands/who.rs:465`, `src/cli/commands/who.rs:591`), so old-path queries can still miss.
3. A fixed `"24m"` default window is brittle once half-lives become configurable; it can silently truncate meaningful history for larger half-lives.

Below are the revisions I’d make to your plan.

1. **Fix migration/index architecture (blocking correctness + perf)**
Rationale: prevents migration failure and aligns indexes to actual query shapes.

```diff
diff --git a/plan.md b/plan.md
@@ ### 6. Index Migration (db.rs)
- -- Support EXISTS subquery for reviewer participation check
- CREATE INDEX IF NOT EXISTS idx_notes_mr_path_author
-   ON notes(noteable_id, position_new_path, author_username)
-   WHERE note_type = 'DiffNote' AND is_system = 0;
+ -- Support reviewer participation joins (notes -> discussions -> MR)
+ CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author_created
+   ON notes(discussion_id, author_username, created_at)
+   WHERE note_type = 'DiffNote' AND is_system = 0;
+
+ -- Path-first indexes for global and project-scoped path lookups
+ CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
+   ON mr_file_changes(new_path, project_id, merge_request_id);
+ CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
+   ON mr_file_changes(old_path, project_id, merge_request_id)
+   WHERE old_path IS NOT NULL;
@@
- -- Support state-aware timestamp selection
- CREATE INDEX IF NOT EXISTS idx_mr_state_timestamps
-   ON merge_requests(state, merged_at, closed_at, updated_at, created_at);
+ -- Removed: low-selectivity timestamp composite index; joins are MR-id driven.
```

2. **Restructure SQL around `matched_mrs` CTE instead of repeating OR path clauses**
Rationale: better index use, less duplicated logic, cleaner maintenance.

```diff
diff --git a/plan.md b/plan.md
@@ ### 3. SQL Restructure (who.rs)
- WITH raw AS (
-   -- 5 UNION ALL subqueries (signals 1, 2, 3, 4a, 4b)
- ),
+ WITH matched_notes AS (
+   -- DiffNotes matching new_path
+   ...
+   UNION ALL
+   -- DiffNotes matching old_path
+   ...
+ ),
+ matched_file_changes AS (
+   -- file changes matching new_path
+   ...
+   UNION ALL
+   -- file changes matching old_path
+   ...
+ ),
+ matched_mrs AS (
+   SELECT DISTINCT mr_id, project_id FROM matched_notes
+   UNION
+   SELECT DISTINCT mr_id, project_id FROM matched_file_changes
+ ),
+ raw AS (
+   -- signals sourced from matched_mrs + matched_notes
+ ),
```

3. **Replace correlated `EXISTS/NOT EXISTS` reviewer split with one precomputed participation set**
Rationale: same semantics, lower query cost, easier reasoning.

```diff
diff --git a/plan.md b/plan.md
@@ Signal 4 splits into two
- Signal 4a uses an EXISTS subquery ...
- Signal 4b uses NOT EXISTS ...
+ Build `reviewer_participation(mr_id, username)` once from matched DiffNotes.
+ Then classify `mr_reviewers` rows via LEFT JOIN:
+ - participated: `rp.username IS NOT NULL`
+ - assigned-only: `rp.username IS NULL`
+ This avoids correlated EXISTS scans per reviewer row.
```

4. **Make default `--since` derived from half-life + decay floor, not hardcoded 24m**
Rationale: remains mathematically consistent when config changes.

```diff
diff --git a/plan.md b/plan.md
@@ ### 1. ScoringConfig (config.rs)
+ pub decay_floor: f64, // default: 0.05
@@ ### 5. Default --since Change
- Expert mode: "6m" -> "24m"
+ Expert mode default window is computed:
+ default_since_days = ceil(max_half_life_days * log2(1.0 / decay_floor))
+ With defaults (max_half_life=180, floor=0.05), this is ~26 months.
+ CLI `--since` still overrides; `--all-history` still disables windowing.
```

5. **Use `log2(1+count)` for notes instead of `ln(1+count)`**
Rationale: keeps 1 note ~= 1 unit (with `note_bonus=1`) while preserving diminishing returns.

```diff
diff --git a/plan.md b/plan.md
@@ Scoring Formula
- note_contribution(mr) = note_bonus * ln(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
+ note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
```

6. **Guarantee deterministic float aggregation and expose `score_raw`**
Rationale: avoids hash-order drift and explainability mismatch vs rounded integer score.

```diff
diff --git a/plan.md b/plan.md
@@ ### 4. Rust-Side Aggregation (who.rs)
- HashMap
+ BTreeMap (or sort keys before accumulation) for deterministic summation order
+ Use compensated summation (Kahan/Neumaier) for stable f64 totals
@@
- Sort on raw `f64` score ... round only for display
+ Keep `score_raw` internally and expose when `--explain-score` is active.
+ `score` remains integer for backward compatibility.
```

7. **Extend rename awareness to query resolution (not only scoring)**
Rationale: fixes user-facing misses for old path input and suffix lookup.

```diff
diff --git a/plan.md b/plan.md
@@ Path rename awareness
- All signal subqueries match both old and new path columns
+ Also update `build_path_query()` probes and suffix probe:
+ - exact_exists: new_path OR old_path (notes + mr_file_changes)
+ - prefix_exists: new_path LIKE OR old_path LIKE
+ - suffix_probe: union of notes.position_new_path, notes.position_old_path,
+   mr_file_changes.new_path, mr_file_changes.old_path
```

8. **Tighten CLI/output contracts for new flags**
Rationale: avoids payload bloat/ambiguity and keeps robot clients stable.

```diff
diff --git a/plan.md b/plan.md
@@ ### 5b. Score Explainability via `--explain-score`
+ `--explain-score` conflicts with `--detail` (mutually exclusive)
+ `resolved_input` includes `as_of_ms`, `as_of_iso`, `scoring_model_version`
+ robot output includes `score_raw` and `components` only when explain is enabled
```

9. **Add confidence metadata (promote from rejected to accepted)**
Rationale: makes ranking more actionable and trustworthy with sparse evidence.

```diff
diff --git a/plan.md b/plan.md
@@ Rejected Ideas (with rationale)
- Confidence/coverage metadata: ... Deferred to avoid scope creep
+ Confidence/coverage metadata: ACCEPTED (minimal v1)
+ Add per-user `confidence: low|medium|high` based on evidence breadth + recency.
+ Keep implementation lightweight (no extra SQL pass).
```

10. **Upgrade test and verification scope to include query-plan and clock semantics**
Rationale: catches regressions your current tests won’t.

```diff
diff --git a/plan.md b/plan.md
@@ 8. New Tests (TDD)
+ test_old_path_probe_exact_and_prefix
+ test_suffix_probe_uses_old_path_sources
+ test_since_relative_to_as_of_clock
+ test_explain_and_detail_are_mutually_exclusive
+ test_null_timestamp_fallback_to_created_at
+ test_query_plan_uses_path_indexes (EXPLAIN QUERY PLAN)
@@ Verification
+ 7. EXPLAIN QUERY PLAN snapshots for expert query (exact + prefix) confirm index usage
```

If you want, I can produce a single consolidated “revision 3” plan document that fully merges all of the above into your original structure.
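
For reference, here is a minimal Rust sketch of the arithmetic behind revisions 4 to 6, in case it helps sanity-check the numbers before merging. The function names (`default_since_days`, `note_weight_log2`, `neumaier_sum`) are hypothetical and do not come from `who.rs` or `config.rs`; treat this as an illustration under those assumptions, not as the intended implementation.

```rust
/// Days of history to keep before a decayed signal drops below `decay_floor`.
/// Hypothetical helper: solves 2^(-d / h) = floor for d, i.e. d = h * log2(1 / floor).
fn default_since_days(max_half_life_days: f64, decay_floor: f64) -> u32 {
    (max_half_life_days * (1.0 / decay_floor).log2()).ceil() as u32
}

/// Count weight under the proposed log2 form (hypothetical helper).
/// A single note yields log2(2) = 1, so one note is about one unit.
fn note_weight_log2(note_count: u32) -> f64 {
    (1.0 + f64::from(note_count)).log2()
}

/// Neumaier compensated summation: tracks the low-order bits lost by each
/// addition in `comp` and adds them back at the end.
fn neumaier_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut comp = 0.0_f64;
    for &v in values {
        let t = sum + v;
        if sum.abs() >= v.abs() {
            // Low-order bits of `v` were lost in the addition.
            comp += (sum - t) + v;
        } else {
            // Low-order bits of `sum` were lost in the addition.
            comp += (v - t) + sum;
        }
        sum = t;
    }
    sum + comp
}

fn main() {
    // Revision 4: defaults from the plan (max_half_life=180, floor=0.05).
    let days = default_since_days(180.0, 0.05);
    println!("default window: {days} days");

    // Revision 5: compare the log2 weight with the ln weight for one note.
    let ln_weight = (1.0_f64 + 1.0).ln();
    println!("1 note: log2 = {:.3}, ln = {:.3}", note_weight_log2(1), ln_weight);

    // Revision 6: the naive sum loses both small terms, the compensated one keeps them.
    let values = [1.0, 1e100, 1.0, -1e100];
    let naive: f64 = values.iter().sum();
    println!("naive = {naive}, neumaier = {}", neumaier_sum(&values));
}
```

With the plan's defaults the window comes out at 778 days, which matches the "~26 months" figure in revision 4. Compensated summation reduces rounding error but does not by itself make the result order-independent, so it complements the sorted-key accumulation in revision 6 rather than replacing it.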