gitlore/plans/time-decay-expert-scoring.feedback-6.md

I avoided everything already listed in your `Rejected Ideas` section and focused on net-new upgrades.

1. Centralize MR temporal semantics in one `mr_activity` CTE (architecture + correctness)
Why this improves the plan: right now the state-aware timestamp logic is repeated across multiple signal branches, while `closed_mr_multiplier` is applied later in Rust by string state checks. That split is brittle. A single `mr_activity` CTE removes drift risk, simplifies query maintenance, and avoids per-row state-string handling in Rust.

```diff
diff --git a/plan.md b/plan.md
@@ SQL Restructure
+mr_activity AS (
+  SELECT
+    m.id AS mr_id,
+    m.project_id,
+    m.author_username,
+    m.state,
+    CASE
+      WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
+      WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
+      ELSE COALESCE(m.updated_at, m.created_at)
+    END AS activity_ts,
+    CASE
+      WHEN m.state = 'closed' THEN ?5
+      ELSE 1.0
+    END AS state_mult
+  FROM merge_requests m
+  WHERE m.state IN ('opened','merged','closed')
+),
@@
-... {state_aware_ts} AS seen_at, m.state AS mr_state
+... a.activity_ts AS seen_at, a.state_mult
@@
-SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated
+SELECT username, signal, mr_id, qty, ts, state_mult FROM aggregated
```
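To show what the Rust side could look like once `state_mult` arrives as a column, here is a minimal sketch. The `SignalRow` struct, the millisecond timestamp unit, the `contribution` helper, and the `0.5^(age / half_life)` decay form are all assumptions for illustration, not taken from your plan; the point is only that no state-string check remains in Rust.

```rust
/// Hypothetical row shape mirroring the proposed final SELECT columns.
#[derive(Debug, Clone)]
pub struct SignalRow {
    pub username: String,
    pub signal: String,
    pub mr_id: i64,
    pub qty: f64,
    /// Assumed to be epoch milliseconds (`activity_ts` from the CTE).
    pub ts_ms: i64,
    /// Computed in SQL: the closed-MR multiplier for closed MRs, else 1.0.
    pub state_mult: f64,
}

/// Assumed decay form: the weight halves every `half_life_days`.
pub fn decay(age_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(age_days / half_life_days)
}

/// Contribution of one row. The multiplier comes straight from the row,
/// so there is no `match` on a state string here.
pub fn contribution(row: &SignalRow, weight: f64, half_life_days: f64, now_ms: i64) -> f64 {
    // Clamp negative ages (timestamps in the future) to zero.
    let age_days = (now_ms - row.ts_ms).max(0) as f64 / 86_400_000.0;
    weight * row.qty * row.state_mult * decay(age_days, half_life_days)
}
```

With this shape, changing how closed MRs are discounted only touches the CTE and the bound parameter.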

2. Parameterize `reviewer_min_note_chars` and tighten config validation (robustness)
Why this improves the plan: inlining `reviewer_min_note_chars` into the SQL text makes the statement text vary with config, which causes prepared-statement cache churn. The current validation also lacks finiteness and range guards (`NaN` multipliers, absurd half-lives). Parameterization plus stronger validation closes off both failure modes.

```diff
diff --git a/plan.md b/plan.md
@@ 1. ScoringConfig (config.rs)
- reviewer_min_note_chars must be >= 0
+ reviewer_min_note_chars must be <= 4096
+ all half-life values must be <= 3650 (10 years safety cap)
+ closed_mr_multiplier must be finite and in (0.0, 1.0]
@@ SQL Restructure
-AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= {reviewer_min_note_chars}
+AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= ?6
```
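A minimal sketch of the tightened validation, under stated assumptions: the field names on `ScoringConfig`, the integer day type for half-lives, the "must be positive" lower bound, and the `String` error type are all hypothetical. Only the upper bounds and the `(0.0, 1.0]` range come from the diff above.

```rust
/// Hypothetical subset of the scoring config; real field names may differ.
#[derive(Debug, Clone)]
pub struct ScoringConfig {
    pub author_half_life_days: u32,
    pub reviewer_half_life_days: u32,
    pub reviewer_min_note_chars: u32,
    pub closed_mr_multiplier: f64,
}

const MAX_HALF_LIFE_DAYS: u32 = 3650; // 10-year safety cap
const MAX_MIN_NOTE_CHARS: u32 = 4096;

pub fn validate(cfg: &ScoringConfig) -> Result<(), String> {
    let half_lives = [
        ("author_half_life_days", cfg.author_half_life_days),
        ("reviewer_half_life_days", cfg.reviewer_half_life_days),
    ];
    for (name, days) in half_lives {
        // Assumption: a zero half-life is rejected because it would divide by zero.
        if days == 0 || days > MAX_HALF_LIFE_DAYS {
            return Err(format!("{name} must be in 1..={MAX_HALF_LIFE_DAYS}, got {days}"));
        }
    }
    if cfg.reviewer_min_note_chars > MAX_MIN_NOTE_CHARS {
        return Err(format!(
            "reviewer_min_note_chars must be <= {MAX_MIN_NOTE_CHARS}, got {}",
            cfg.reviewer_min_note_chars
        ));
    }
    let m = cfg.closed_mr_multiplier;
    // `is_finite` rejects NaN and both infinities before the range check.
    if !m.is_finite() || m <= 0.0 || m > 1.0 {
        return Err(format!("closed_mr_multiplier must be finite and in (0.0, 1.0], got {m}"));
    }
    Ok(())
}
```

Checking `is_finite()` first matters because every ordered comparison against `NaN` is false, so a range check written as a pair of "inside the bounds" comparisons can be inverted by accident and let `NaN` through.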

3. Add path canonicalization before probes/scoring (correctness + UX)
Why this improves the plan: rename-awareness helps only after path resolution succeeds. Inputs like `./src//foo.rs` or inconsistent trailing slashes can still miss. Canonicalizing query paths up front reduces false negatives and ambiguous suffix behavior.

```diff
diff --git a/plan.md b/plan.md
@@ 3a. Path Resolution Probes (who.rs)
+Add `normalize_query_path()` before `build_path_query()`:
+- strip leading `./`
+- collapse repeated `/`
+- trim whitespace
+- preserve trailing `/` only for explicit prefix intent
+Expose both `path_input_original` and `path_input_normalized` in `resolved_input`.
@@ New Tests
+test_path_normalization_handles_dot_and_double_slash
+test_path_normalization_preserves_explicit_prefix_semantics
```
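The normalization rules listed above can be sketched as follows. This is one possible reading, not a definitive implementation: it treats a trailing `/` on the trimmed input as the "explicit prefix intent" signal, and it also drops a leading `/`, on the assumption that stored paths are repository-relative.

```rust
/// Sketch of `normalize_query_path()`; the exact semantics are assumptions.
pub fn normalize_query_path(input: &str) -> String {
    // Trim surrounding whitespace first so the trailing-slash check is reliable.
    let trimmed = input.trim();
    let keep_trailing_slash = trimmed.ends_with('/');

    // Strip any number of leading "./" prefixes.
    let mut rest = trimmed;
    while let Some(stripped) = rest.strip_prefix("./") {
        rest = stripped;
    }

    // Splitting on '/' and dropping empty segments collapses repeated slashes.
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    let mut out = segments.join("/");

    // Restore a single trailing slash only when the caller supplied one.
    if keep_trailing_slash && !out.is_empty() {
        out.push('/');
    }
    out
}
```

For example, `./src//foo.rs` becomes `src/foo.rs`, while `src/auth//` stays a prefix query as `src/auth/`. Interior `.` or `..` segments are left untouched here, since the list above does not mention them.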

4. Add epsilon-based tie buckets for stable ranking (determinism)
Why this improves the plan: even with deterministic summation order, tiny `powf` platform differences can reorder near-equal scores. Tie bucketing keeps ordering stable and user-meaningful.

```diff
diff --git a/plan.md b/plan.md
@@ 4. Rust-Side Aggregation (who.rs)
-Sort on raw `f64` score — `(raw_score DESC, last_seen DESC, username ASC)`.
+Sort using a tie bucket:
+`score_bucket = (raw_score / 1e-9).floor() as i64`
+Order by `(score_bucket DESC, last_seen DESC, username ASC)`.
+`raw_score` keeps full precision for output but is deliberately not a sort key: ordering by it inside a bucket would reduce to a plain raw-score sort and reintroduce micro-delta reorderings.
@@ New Tests
+test_near_equal_scores_use_stable_tie_bucket_order
```
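A minimal sketch of a tie-bucket comparator, assuming a hypothetical `Expert` struct and millisecond `last_seen` values. In this sketch the comparator falls through from the bucket directly to `last_seen` and `username`, so scores inside one bucket are treated as ties; using `raw_score` as a secondary key would make the result identical to sorting by raw score alone.

```rust
/// Hypothetical aggregated result row.
#[derive(Debug, Clone, PartialEq)]
pub struct Expert {
    pub username: String,
    pub raw_score: f64,
    pub last_seen_ms: i64,
}

/// Bucket width from the proposal above.
const BUCKET_WIDTH: f64 = 1e-9;

pub fn score_bucket(raw_score: f64) -> i64 {
    (raw_score / BUCKET_WIDTH).floor() as i64
}

/// Sort by (score_bucket DESC, last_seen DESC, username ASC).
pub fn rank(experts: &mut [Expert]) {
    experts.sort_by(|a, b| {
        score_bucket(b.raw_score)
            .cmp(&score_bucket(a.raw_score))
            .then_with(|| b.last_seen_ms.cmp(&a.last_seen_ms))
            .then_with(|| a.username.cmp(&b.username))
    });
}
```

One caveat with any floor-based bucket: two scores that straddle a bucket boundary still land in different buckets, so this narrows the window for platform-dependent reordering rather than eliminating it.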

5. Add `--diagnose-score` aggregated diagnostics (operability)
Why this improves the plan: `--explain-score` tells “why this user scored”, but not “why this query behaved oddly” (path ambiguity, dedup collapse, old_path contribution share, filtered bots, window exclusions). Lightweight aggregate diagnostics are high-value without per-MR drill-down complexity.

```diff
diff --git a/plan.md b/plan.md
@@ CLI changes (who.rs)
+Add `--diagnose-score` flag (compatible with `--explain-score`, incompatible with `--detail`).
+When enabled, include:
+- matched_notes_raw_count
+- matched_notes_dedup_count
+- matched_file_changes_raw_count
+- matched_file_changes_dedup_count
+- rows_excluded_by_window_upper_bound
+- users_filtered_by_excluded_usernames
+- query_elapsed_ms
@@ Robot output
+`diagnostics` object emitted only when `--diagnose-score` is set.
```

6. Add probe-optimized indexes for path resolution (performance)
Why this improves the plan: current proposed indexes are optimized for scoring joins, but `build_path_query()` and `suffix_probe()` run existence/path-only probes where `author_username` is not constrained. Dedicated probe indexes will materially reduce latency for path lookup modes.

```diff
diff --git a/plan.md b/plan.md
@@ 6. Index Migration (db.rs)
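+-- Path probes on notes (no author predicate): exact and prefix lookups can seek on the path column;
+-- suffix (leading-wildcard) probes cannot seek and at best scan the smaller partial index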
+CREATE INDEX IF NOT EXISTS idx_notes_new_path_project_created
+  ON notes(position_new_path, project_id, created_at)
+  WHERE note_type = 'DiffNote' AND is_system = 0 AND position_new_path IS NOT NULL;
+
+CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
+  ON notes(position_old_path, project_id, created_at)
+  WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
```

7. Add multi-path expert scoring (`--path` repeatable) with dedup across paths (feature + utility)
Why this improves the plan: the current model is single-path centric, but real ownership questions are usually subsystem-level. Repeatable paths/prefixes let users ask “who knows the auth stack?” in one call. Dedup by `(username, signal, mr_id)` avoids double-counting the same MR when it touches multiple requested paths.

```diff
diff --git a/plan.md b/plan.md
@@ CLI/feature scope
+Add repeatable `--path` in expert mode:
+`lore who --expert --path src/auth/ --path src/session/`
+Optional `--path-file <file>` for large path sets (one per line).
@@ SQL Restructure
+Add `requested_paths` CTE and match each source against that set.
+Ensure dedup key includes `(username, signal, mr_id)` so one MR contributes once per signal even if multiple paths match.
@@ New Tests
+test_multi_path_query_unions_results_without_double_counting
+test_multi_path_with_overlap_prefixes_is_idempotent
```
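The dedup key semantics can be illustrated in Rust, although the plan would do this in SQL. This is a sketch under assumptions: `MatchRow` and `dedup_across_paths` are hypothetical names, and keeping the first row seen per key is an arbitrary choice (the plan may prefer, say, the most recent timestamp).

```rust
use std::collections::HashSet;

/// Hypothetical row produced when a source matches one requested path.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MatchRow {
    pub username: String,
    pub signal: String,
    pub mr_id: i64,
    pub matched_path: String,
}

/// Keep one row per (username, signal, mr_id), regardless of how many
/// requested paths the MR matched. Input order decides which row survives.
pub fn dedup_across_paths(rows: Vec<MatchRow>) -> Vec<MatchRow> {
    let mut seen: HashSet<(String, String, i64)> = HashSet::new();
    rows.into_iter()
        // `insert` returns false when the key is already present.
        .filter(|r| seen.insert((r.username.clone(), r.signal.clone(), r.mr_id)))
        .collect()
}
```

Because `matched_path` is not part of the key, an MR that touches both `src/auth/` and `src/session/` contributes once per signal, which is also what makes overlapping prefixes idempotent.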

These 7 revisions keep your current model direction intact, but reduce correctness drift risk, harden edge handling, improve query observability, and make the feature materially more useful for real ownership workflows.