Your plan is already strong. The biggest remaining gaps are temporal correctness, indexability at scale, and ranking reliability under sparse/noisy evidence. These are the revisions I’d make.

1. **Fix temporal correctness for `--as-of` (critical)**
Analysis: Right now the plan describes `--as-of`, but the SQL only enforces lower bounds (`>= since`). If `as_of` is in the past, “future” events can still enter and get full weight (because elapsed is clamped). This breaks reproducibility.
```diff
@@ 3. SQL Restructure
- AND n.created_at >= ?2
+ AND n.created_at BETWEEN ?2 AND ?4
@@ Signal 3/4a/4b
- AND {state_aware_ts} >= ?2
+ AND {state_aware_ts} BETWEEN ?2 AND ?4

@@ 5a. Reproducible Scoring via --as-of
- All decay computations use as_of_ms instead of SystemTime::now()
+ All event selection and decay computations are bounded by as_of_ms.
+ Query window is [since_ms, as_of_ms], never [since_ms, now_ms].
+ Add test: test_as_of_excludes_future_events.
```

2. **Resolve `closed`-state inconsistency**
Analysis: The CASE handles `closed`, but all signal queries filter to `('opened','merged')`, making the `closed_at` branch dead code. Either include closed MRs intentionally or remove that logic. I’d include closed with a reduced multiplier.
```diff
@@ ScoringConfig (config.rs)
+ pub closed_mr_multiplier: f64, // default: 0.5

@@ 3. SQL Restructure
- AND m.state IN ('opened','merged')
+ AND m.state IN ('opened','merged','closed')

@@ 4. Rust-Side Aggregation
+ if state == "closed" { contribution *= closed_mr_multiplier; }
```

3. **Replace `OR` path predicates with index-friendly `UNION ALL` branches**
Analysis: `(new_path ... OR old_path ...)` often degrades index usage in SQLite. Split into two indexed branches and dedupe once. This improves planner stability and latency on large datasets.
```diff
@@ 3. SQL Restructure
-WITH matched_notes AS (
-  ... AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
-),
+WITH matched_notes AS (
+  SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
+  UNION ALL
+  SELECT ... FROM notes n WHERE ... AND n.position_old_path {path_op}
+),
+matched_notes_dedup AS (
+  SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
+  FROM matched_notes
+),
@@
- JOIN matched_notes mn ...
+ JOIN matched_notes_dedup mn ...
```

4. **Add canonical path identity (rename-chain support)**
Analysis: Direct `old_path/new_path` matching only handles one-hop rename scenarios. A small alias graph/table built at ingest time gives robust expertise continuity across A→B→C chains and avoids repeated SQL complexity.
```diff
@@ Files to Modify
- 3. src/core/db.rs — Add migration for indexes...
+ 3. src/core/db.rs — Add migration for indexes + path_identity table
+ 4. src/core/ingest/*.rs — populate path_identity on rename events
+ 5. src/cli/commands/who.rs — resolve query path to canonical path_id first

@@ Context
- The fix has three parts:
+ The fix has four parts:
+ - Introduce canonical path identity so multi-hop renames preserve expertise
```

5. **Split scoring engine into a versioned core module**
Analysis: `who.rs` is becoming a mixed CLI/query/math/output surface. Move scoring math and event normalization into `src/core/scoring/` with explicit model versions. This reduces regression risk and enables future model experiments.
```diff
@@ Files to Modify
+ 4. src/core/scoring/mod.rs — model interface + shared types
+ 5. src/core/scoring/model_v2_decay.rs — current implementation
+ 6. src/cli/commands/who.rs — orchestration only

@@ 5b. Score Explainability
+ resolved_input includes scoring_model_version and scoring_model_name
```

6. **Add evidence confidence to reduce sparse-data rank spikes**
Analysis: One recent MR can outrank broader, steadier expertise. Add a confidence factor derived from number of distinct evidence MRs and expose both `score_raw` and `score_adjusted`.
```diff
@@ Scoring Formula
+ confidence(user) = 1 - exp(-evidence_mr_count / 6.0)
+ score_adjusted = score_raw * confidence

@@ 4. Rust-Side Aggregation
+ compute evidence_mr_count from unique MR ids across all signals
+ sort by score_adjusted DESC, then score_raw DESC, then last_seen DESC

@@ 5b. --explain-score
+ include confidence and evidence_mr_count
```

7. **Add first-class bot/service-account filtering**
Analysis: Reviewer inflation is not just assignment; bots and automation users can still pollute rankings. Make exclusion explicit and configurable.
```diff
@@ ScoringConfig (config.rs)
+ pub excluded_username_patterns: Vec<String>, // defaults include "*bot*", "renovate", "dependabot"

@@ 3. SQL Restructure
+ AND username NOT MATCHES excluded patterns (applied in Rust post-query or SQL where feasible)

@@ CLI
+ --include-bots (override exclusions)
```

8. **Tighten reviewer “participated” with substantive-note threshold**
Analysis: A single “LGTM” note shouldn’t classify someone as engaged reviewer equivalent to real inline review. Use a minimum substantive threshold.
```diff
@@ ScoringConfig (config.rs)
+ pub reviewer_min_note_chars: u32, // default: 20

@@ reviewer_participation CTE
- SELECT DISTINCT ... FROM matched_notes
+ SELECT DISTINCT ... FROM matched_notes
+ WHERE LENGTH(TRIM(body)) >= ?reviewer_min_note_chars
```

9. **Add rollout safety: model compare mode + rank-delta diagnostics**
Analysis: This is a scoring-model migration. You need safe rollout mechanics, not just tests. Add a compare mode so you can inspect rank deltas before forcing v2.
```diff
@@ CLI (who)
+ --scoring-model v1|v2|compare   (default: v2)
+ --max-rank-delta-report N       (compare mode diagnostics)

@@ Robot output
+ include v1_score, v2_score, rank_delta when --scoring-model compare
```

If you want, I can produce a single consolidated “plan v4” document that applies all nine diffs cleanly into your original markdown.