Your plan is already strong. The biggest remaining gaps are temporal correctness, indexability at scale, and ranking reliability under sparse/noisy evidence. These are the revisions I’d make. 1. **Fix temporal correctness for `--as-of` (critical)** Analysis: Right now the plan describes `--as-of`, but the SQL only enforces lower bounds (`>= since`). If `as_of` is in the past, “future” events can still enter and get full weight (because elapsed is clamped). This breaks reproducibility. ```diff @@ 3. SQL Restructure - AND n.created_at >= ?2 + AND n.created_at BETWEEN ?2 AND ?4 @@ Signal 3/4a/4b - AND {state_aware_ts} >= ?2 + AND {state_aware_ts} BETWEEN ?2 AND ?4 @@ 5a. Reproducible Scoring via --as-of - All decay computations use as_of_ms instead of SystemTime::now() + All event selection and decay computations are bounded by as_of_ms. + Query window is [since_ms, as_of_ms], never [since_ms, now_ms]. + Add test: test_as_of_excludes_future_events. ``` 2. **Resolve `closed`-state inconsistency** Analysis: The CASE handles `closed`, but all signal queries filter to `('opened','merged')`, making the `closed_at` branch dead code. Either include closed MRs intentionally or remove that logic. I’d include closed with a reduced multiplier. ```diff @@ ScoringConfig (config.rs) + pub closed_mr_multiplier: f64, // default: 0.5 @@ 3. SQL Restructure - AND m.state IN ('opened','merged') + AND m.state IN ('opened','merged','closed') @@ 4. Rust-Side Aggregation + if state == "closed" { contribution *= closed_mr_multiplier; } ``` 3. **Replace `OR` path predicates with index-friendly `UNION ALL` branches** Analysis: `(new_path ... OR old_path ...)` often degrades index usage in SQLite. Split into two indexed branches and dedupe once. This improves planner stability and latency on large datasets. ```diff @@ 3. SQL Restructure -WITH matched_notes AS ( - ... AND (n.position_new_path {path_op} OR n.position_old_path {path_op}) -), +WITH matched_notes AS ( + SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op} + UNION ALL + SELECT ... FROM notes n WHERE ... AND n.position_old_path {path_op} +), +matched_notes_dedup AS ( + SELECT DISTINCT id, discussion_id, author_username, created_at, project_id + FROM matched_notes +), @@ - JOIN matched_notes mn ... + JOIN matched_notes_dedup mn ... ``` 4. **Add canonical path identity (rename-chain support)** Analysis: Direct `old_path/new_path` matching only handles one-hop rename scenarios. A small alias graph/table built at ingest time gives robust expertise continuity across A→B→C chains and avoids repeated SQL complexity. ```diff @@ Files to Modify - 3. src/core/db.rs — Add migration for indexes... + 3. src/core/db.rs — Add migration for indexes + path_identity table + 4. src/core/ingest/*.rs — populate path_identity on rename events + 5. src/cli/commands/who.rs — resolve query path to canonical path_id first @@ Context - The fix has three parts: + The fix has four parts: + - Introduce canonical path identity so multi-hop renames preserve expertise ``` 5. **Split scoring engine into a versioned core module** Analysis: `who.rs` is becoming a mixed CLI/query/math/output surface. Move scoring math and event normalization into `src/core/scoring/` with explicit model versions. This reduces regression risk and enables future model experiments. ```diff @@ Files to Modify + 4. src/core/scoring/mod.rs — model interface + shared types + 5. src/core/scoring/model_v2_decay.rs — current implementation + 6. src/cli/commands/who.rs — orchestration only @@ 5b. Score Explainability + resolved_input includes scoring_model_version and scoring_model_name ``` 6. **Add evidence confidence to reduce sparse-data rank spikes** Analysis: One recent MR can outrank broader, steadier expertise. Add a confidence factor derived from number of distinct evidence MRs and expose both `score_raw` and `score_adjusted`. ```diff @@ Scoring Formula + confidence(user) = 1 - exp(-evidence_mr_count / 6.0) + score_adjusted = score_raw * confidence @@ 4. Rust-Side Aggregation + compute evidence_mr_count from unique MR ids across all signals + sort by score_adjusted DESC, then score_raw DESC, then last_seen DESC @@ 5b. --explain-score + include confidence and evidence_mr_count ``` 7. **Add first-class bot/service-account filtering** Analysis: Reviewer inflation is not just assignment; bots and automation users can still pollute rankings. Make exclusion explicit and configurable. ```diff @@ ScoringConfig (config.rs) + pub excluded_username_patterns: Vec, // defaults include "*bot*", "renovate", "dependabot" @@ 3. SQL Restructure + AND username NOT MATCHES excluded patterns (applied in Rust post-query or SQL where feasible) @@ CLI + --include-bots (override exclusions) ``` 8. **Tighten reviewer “participated” with substantive-note threshold** Analysis: A single “LGTM” note shouldn’t classify someone as engaged reviewer equivalent to real inline review. Use a minimum substantive threshold. ```diff @@ ScoringConfig (config.rs) + pub reviewer_min_note_chars: u32, // default: 20 @@ reviewer_participation CTE - SELECT DISTINCT ... FROM matched_notes + SELECT DISTINCT ... FROM matched_notes + WHERE LENGTH(TRIM(body)) >= ?reviewer_min_note_chars ``` 9. **Add rollout safety: model compare mode + rank-delta diagnostics** Analysis: This is a scoring-model migration. You need safe rollout mechanics, not just tests. Add a compare mode so you can inspect rank deltas before forcing v2. ```diff @@ CLI (who) + --scoring-model v1|v2|compare (default: v2) + --max-rank-delta-report N (compare mode diagnostics) @@ Robot output + include v1_score, v2_score, rank_delta when --scoring-model compare ``` If you want, I can produce a single consolidated “plan v4” document that applies all nine diffs cleanly into your original markdown.