docs: add lore-service, work-item-status-graphql, and time-decay plans

Three implementation plans with iterative cross-model refinement: lore-service (5 iterations): HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers. work-item-status-graphql (7 iterations + TDD appendix): Detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot mode output. time-decay-expert-scoring (iteration 5 feedback): Updates to the existing time-decay scoring plan incorporating feedback on decay curve parameterization, recency weighting for discussion contributions, and staleness detection thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00
parent 1161edb212
commit 2c9de1a6c3
15 changed files with 9261 additions and 33 deletions
--- a/plans/time-decay-expert-scoring.md
+++ b/plans/time-decay-expert-scoring.md
@@ -2,12 +2,12 @@
 plan: true
 title: ""
 status: iterating
-iteration: 4
+iteration: 5
 target_iterations: 8
-beads_revision: 0
+beads_revision: 1
 related_plans: []
 created: 2026-02-08
-updated: 2026-02-08
+updated: 2026-02-09
 ---

 # Time-Decay Expert Scoring Model
@@ -78,6 +78,7 @@ Author/reviewer signals are deduplicated per MR (one signal per distinct MR). No
   - Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals)
   - Add `--as-of` flag for reproducible scoring at a fixed timestamp
   - Add `--explain-score` flag for per-user score component breakdown
+   - Add `--include-bots` flag to disable bot/service-account filtering
   - Sort on raw f64 score, round only for display
   - Update tests
 3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)
@@ -100,6 +101,7 @@ pub struct ScoringConfig {
    pub note_half_life_days: u32,              // default: 45
    pub closed_mr_multiplier: f64,             // default: 0.5  (applied to closed-without-merge MRs)
    pub reviewer_min_note_chars: u32,          // default: 20   (minimum note body length to count as participation)
+    pub excluded_usernames: Vec<String>,       // default: []    (exact-match usernames to exclude, e.g. ["renovate-bot", "gitlab-ci"])
 }
 ```

@@ -108,6 +110,7 @@ pub struct ScoringConfig {
 - All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores)
 - `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them)
 - `reviewer_min_note_chars` must be >= 0 (0 disables the filter; typical useful values: 10-50)
+- `excluded_usernames` entries must be non-empty strings (no blank entries)
 - Return `LoreError::ConfigInvalid` with a clear message on failure

 ### 2. Decay Function (who.rs)
@@ -128,25 +131,51 @@ The SQL uses **CTE-based dual-path matching** and **hybrid aggregation**. Rather
 MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`.

 ```sql
-WITH matched_notes AS (
-  -- Centralize dual-path matching for DiffNotes
-  SELECT n.id, n.discussion_id, n.author_username, n.created_at,
-         n.position_new_path, n.position_old_path, n.project_id
+WITH matched_notes_raw AS (
+  -- Branch 1: match on new_path (uses idx_notes_new_path or equivalent)
+  SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
  FROM notes n
  WHERE n.note_type = 'DiffNote'
    AND n.is_system = 0
    AND n.author_username IS NOT NULL
    AND n.created_at >= ?2
-    AND n.created_at <= ?4
+    AND n.created_at < ?4
    AND (?3 IS NULL OR n.project_id = ?3)
-    AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
+    AND n.position_new_path {path_op}
+  UNION ALL
+  -- Branch 2: match on old_path (uses idx_notes_old_path_author)
+  SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
+  FROM notes n
+  WHERE n.note_type = 'DiffNote'
+    AND n.is_system = 0
+    AND n.author_username IS NOT NULL
+    AND n.created_at >= ?2
+    AND n.created_at < ?4
+    AND (?3 IS NULL OR n.project_id = ?3)
+    AND n.position_old_path {path_op}
 ),
-matched_file_changes AS (
-  -- Centralize dual-path matching for file changes
+matched_notes AS (
+  -- Dedup: prevent double-counting when old_path = new_path (no rename)
+  SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
+  FROM matched_notes_raw
+),
+matched_file_changes_raw AS (
+  -- Branch 1: match on new_path (uses idx_mfc_new_path_project_mr)
  SELECT fc.merge_request_id, fc.project_id
  FROM mr_file_changes fc
  WHERE (?3 IS NULL OR fc.project_id = ?3)
-    AND (fc.new_path {path_op} OR fc.old_path {path_op})
+    AND fc.new_path {path_op}
+  UNION ALL
+  -- Branch 2: match on old_path (uses idx_mfc_old_path_project_mr)
+  SELECT fc.merge_request_id, fc.project_id
+  FROM mr_file_changes fc
+  WHERE (?3 IS NULL OR fc.project_id = ?3)
+    AND fc.old_path {path_op}
+),
+matched_file_changes AS (
+  -- Dedup: prevent double-counting when old_path = new_path (no rename)
+  SELECT DISTINCT merge_request_id, project_id
+  FROM matched_file_changes_raw
 ),
 reviewer_participation AS (
  -- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
@@ -196,7 +225,7 @@ raw AS (
  WHERE m.author_username IS NOT NULL
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
-    AND {state_aware_ts} <= ?4
+    AND {state_aware_ts} < ?4

  UNION ALL

@@ -212,7 +241,7 @@ raw AS (
    AND (m.author_username IS NULL OR r.username != m.author_username)
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
-    AND {state_aware_ts} <= ?4
+    AND {state_aware_ts} < ?4

  UNION ALL

@@ -229,7 +258,7 @@ raw AS (
    AND (m.author_username IS NULL OR r.username != m.author_username)
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
-    AND {state_aware_ts} <= ?4
+    AND {state_aware_ts} < ?4
 ),
 aggregated AS (
  -- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
@@ -245,11 +274,11 @@ aggregated AS (
 SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL
 ```

-Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `BETWEEN ?2 AND ?4` pattern ensures that when `--as-of` is set to a past date, events after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility.
+Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` exclusive upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `>= ?2 AND < ?4` pattern (half-open interval) ensures that when `--as-of` is set to a past date, events at or after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. The exclusive upper bound avoids edge-case ambiguity when events have timestamps exactly equal to the as-of value.

-**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into `matched_notes` and `matched_file_changes` CTEs means path matching is defined once, the indexes are hit once, and adding future path resolution logic (e.g., alias chains) only requires changes in one place.
+**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into foundational CTEs (`matched_notes_raw` → `matched_notes`, `matched_file_changes_raw` → `matched_file_changes`) means path matching is defined once, each index branch is explicit, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. The UNION ALL + dedup pattern ensures SQLite uses the optimal index for each path column independently.

-**Index optimization fallback (UNION ALL split)**: SQLite's query planner sometimes struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. If EXPLAIN QUERY PLAN shows this during step 6 verification, replace the `OR`-based CTEs with a `UNION ALL` split + dedup pattern:
+**Dual-path matching strategy (UNION ALL split)**: SQLite's query planner commonly struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. Rather than starting with `OR` and hoping the planner cooperates, use `UNION ALL` + dedup as the default strategy:
 ```sql
 matched_notes AS (
  SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
@@ -261,18 +290,18 @@ matched_notes_dedup AS (
  FROM matched_notes
 ),
 ```
-This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm the degradation.
+This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). The same pattern applies to `matched_file_changes`. The simpler `OR` variant is retained as a comment for benchmarking — if a future SQLite version handles `OR` well, the split can be collapsed.

 **Rationale for precomputed participation set**: The previous approach used correlated `EXISTS`/`NOT EXISTS` subqueries to classify reviewers. The `reviewer_participation` CTE materializes the set of `(mr_id, username)` pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with `IS NULL` (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every `mr_reviewers` row falls into exactly one bucket.

 **Rationale for hybrid over fully-raw**: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density.

-**Path rename awareness**: Both `matched_notes` and `matched_file_changes` CTEs match against both old and new path columns:
+**Path rename awareness**: Both `matched_notes` and `matched_file_changes` use UNION ALL + dedup to match against both old and new path columns independently, ensuring each branch uses its respective index:

- Notes: `(n.position_new_path {path_op} OR n.position_old_path {path_op})`
- File changes: `(fc.new_path {path_op} OR fc.old_path {path_op})`
+- Notes: branch 1 matches `position_new_path`, branch 2 matches `position_old_path`, deduped by `notes.id`
+- File changes: branch 1 matches `new_path`, branch 2 matches `old_path`, deduped by `(merge_request_id, project_id)`

-Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The `OR` match ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.
+Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The UNION ALL approach ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.

 **Signal 4 splits into two**: The current signal 4 (`file_reviewer`) joins `mr_reviewers` but doesn't distinguish participation. In the new plan:

@@ -332,7 +361,7 @@ For each username, accumulate into a struct with:

 The `mr_state` field from each SQL row is stored alongside the timestamp so the Rust-side can apply `closed_mr_multiplier` when `mr_state == "closed"`.

-Compute score as `f64`. Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
+Compute score as `f64` with **deterministic contribution ordering**: within each signal type, sort contributions by `(mr_id ASC)` before summing. This eliminates platform-dependent HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the complexity of compensated summation (Neumaier/Kahan). Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
 ```
 state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 }

@@ -352,6 +381,8 @@ Compute counts from the accumulated data:
 - `review_note_count = notes_per_mr.values().map(|(count, _)| count).sum()`
 - `author_mr_count = author_mrs.len()`

+**Bot/service-account filtering**: After accumulating all user scores and before sorting, filter out any username that appears in `config.scoring.excluded_usernames` (exact match, case-insensitive). This is applied in Rust post-query (not SQL) to keep the SQL clean and avoid parameter explosion. When `--include-bots` is active, the filter is skipped entirely. The robot JSON `resolved_input` includes `excluded_usernames_applied: true|false` to indicate whether filtering was active.
+
 Truncate to limit after sorting.

 ### 5. Default --since Change
@@ -364,10 +395,11 @@ At 2 years, author decay = 6%, reviewer decay = 0.4%, note decay = 0.006% — ne
 ### 5a. Reproducible Scoring via `--as-of`

 Add `--as-of <RFC3339|YYYY-MM-DD>` flag that overrides the `now_ms` reference point used for decay calculations. When set:
- All event selection is bounded by `[since_ms, as_of_ms]` — events after `as_of_ms` are excluded from SQL results entirely (not just decayed)
+- All event selection is bounded by `[since_ms, as_of_ms)` — exclusive upper bound; events at or after `as_of_ms` are excluded from SQL results entirely (not just decayed). The SQL uses `< ?4` (strict less-than), not `<= ?4`.
+- `YYYY-MM-DD` input (without time component) is interpreted as end-of-day UTC: `T23:59:59.999Z`. This matches user intuition that `--as-of 2025-06-01` means "as of the end of June 1st" rather than "as of midnight at the start of June 1st" which would exclude the entire day's activity.
 - All decay computations use `as_of_ms` instead of `SystemTime::now()`
 - The `--since` window is calculated relative to `as_of_ms` (not wall clock)
- Robot JSON `resolved_input` includes `as_of_ms` and `as_of_iso` fields
+- Robot JSON `resolved_input` includes `as_of_ms`, `as_of_iso`, `window_start_iso`, `window_end_iso`, and `window_end_exclusive: true` fields — making the exact query window unambiguous in output

 **Rationale**: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. `--as-of` pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since `elapsed.max(0.0)` clamps negative elapsed time to zero), breaking the reproducibility guarantee.

@@ -484,7 +516,13 @@ Add timestamp-aware variants:

 **`test_closed_mr_multiplier`**: Two identical MRs (same author, same age, same path). One is `merged`, one is `closed`. The merged MR should contribute `author_weight * decay(...)`, the closed MR should contribute `author_weight * closed_mr_multiplier * decay(...)`. With default multiplier 0.5, the closed MR contributes half.

-**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the upper-bound filtering in SQL.
+**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the exclusive upper-bound (`< ?4`) filtering in SQL.
+
+**`test_as_of_exclusive_upper_bound`**: Insert an event with timestamp exactly equal to the `as_of_ms` value. Verify it is excluded from results (strict less-than, not less-than-or-equal). This validates the half-open interval `[since, as_of)` semantics.
+
+**`test_excluded_usernames_filters_bots`**: Insert signals for a user named "renovate-bot" and a user named "jsmith", both with the same activity. With `excluded_usernames: ["renovate-bot"]` in config, only "jsmith" should appear in results. Validates the Rust-side post-query filtering.
+
+**`test_include_bots_flag_disables_filtering`**: Same setup as above, but with `--include-bots` active. Both "renovate-bot" and "jsmith" should appear in results.

 **`test_null_timestamp_fallback_to_created_at`**: Insert a merged MR with `merged_at = NULL` (edge case: old data before the column was populated). The state-aware timestamp should fall back to `created_at`. Verify the score reflects `created_at`, not 0 or a panic.

@@ -496,11 +534,13 @@ Add timestamp-aware variants:

 **`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.

+**`test_deterministic_accumulation_order`**: Insert signals for a user with contributions at many different timestamps (10+ MRs with varied ages). Run `query_expert` 100 times in a loop. All 100 runs must produce the exact same `f64` score (bit-identical). Validates that the sorted contribution ordering eliminates HashMap-iteration-order nondeterminism.
+
 ### 9. Existing Test Compatibility

 All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.

-The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, and `reviewer_min_note_chars` fields.
+The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, `reviewer_min_note_chars`, and `excluded_usernames` fields.

 ## Verification

@@ -511,11 +551,17 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
 5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings
 6. Manual query plan verification (not automated — SQLite planner varies across versions):
   - Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database
-   - Confirm that `matched_notes` CTE uses `idx_notes_old_path_author` or the existing new_path index (not a full table scan)
-   - Confirm that `matched_file_changes` CTE uses `idx_mfc_old_path_project_mr` or `idx_mfc_new_path_project_mr`
+   - Confirm that `matched_notes_raw` branch 1 uses the existing new_path index and branch 2 uses `idx_notes_old_path_author` (not a full table scan on either branch)
+   - Confirm that `matched_file_changes_raw` branch 1 uses `idx_mfc_new_path_project_mr` and branch 2 uses `idx_mfc_old_path_project_mr`
   - Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author`
   - Document the observed plan in a comment near the SQL for future regression reference
-7. Real-world validation:
+7. Performance baseline (manual, not CI-gated):
+   - Run `time cargo run --release -- who --path <exact-path>` on the real database for exact, prefix, and suffix modes
+   - Target SLOs: p95 exact path < 200ms, prefix < 300ms, suffix < 500ms on development hardware
+   - Record baseline timings as a comment near the SQL for regression reference
+   - If any mode exceeds 2x the baseline after future changes, investigate before merging
+   - Note: These are soft targets for developer awareness, not automated CI gates. Automated benchmarking with synthetic fixtures (100k/1M/5M notes) is a v2 investment if performance becomes a real concern.
+8. Real-world validation:
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors
@@ -524,6 +570,7 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs
   - Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
   - Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score`
+   - If the project has known bot accounts (e.g., renovate-bot), add them to `excluded_usernames` config and verify they no longer appear in results. Run again with `--include-bots` to confirm they reappear.

 ## Accepted from External Review

@@ -553,12 +600,20 @@ Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genu
 - **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).

 **From feedback-4:**
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `<= ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms]`. Without this, `--as-of` reproducibility was fundamentally broken.
+- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `< ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms)`. Without this, `--as-of` reproducibility was fundamentally broken. (Refined to exclusive upper bound in feedback-5.)
 - **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
 - **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments.
- **UNION ALL optimization fallback for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Added documentation of a `UNION ALL` split + dedup fallback pattern to use if EXPLAIN QUERY PLAN shows degradation during verification. Start with the simpler `OR` approach; switch only if needed.
+- **UNION ALL optimization for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Originally documented as a fallback; promoted to default strategy in feedback-5 iteration. The UNION ALL + dedup approach ensures each index branch is used independently.
 - **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round.

+**From feedback-5 (ChatGPT review):**
+- **Exclusive upper bound for `--as-of`**: Changed from `[since_ms, as_of_ms]` (inclusive) to `[since_ms, as_of_ms)` (exclusive). Half-open intervals are the standard convention in temporal systems — they eliminate edge-case ambiguity when events have timestamps exactly at the boundary. Also added `YYYY-MM-DD` → end-of-day UTC parsing and window metadata in robot output.
+- **UNION ALL as default for dual-path matching**: Promoted from "fallback if planner regresses" to default strategy. SQLite `OR`-across-indexed-columns degradation is common enough that the predictable UNION ALL + dedup approach is the safer starting point. The simpler `OR` variant is retained as a comment for benchmarking.
+- **Deterministic contribution ordering**: Within each signal type, sort contributions by `mr_id` before summing. This eliminates HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the overhead of compensated summation (Neumaier/Kahan was rejected as overkill at this scale).
+- **Minimal bot/service-account filtering**: Added `excluded_usernames` (exact match, case-insensitive) to `ScoringConfig` and `--include-bots` CLI flag. Applied as a Rust-side post-filter (not SQL) to keep queries clean. Scope is deliberately minimal — no regex patterns, no heuristic detection. Users configure the list for their team's specific bots.
+- **Performance baseline SLOs**: Added manual performance baseline step to verification — record timings for exact/prefix/suffix modes and flag >2x regressions. Kept lightweight (no CI gating, no synthetic benchmarks) to match the project's current maturity.
+- **New tests**: `test_as_of_exclusive_upper_bound`, `test_excluded_usernames_filters_bots`, `test_include_bots_flag_disables_filtering`, `test_deterministic_accumulation_order` — cover the newly-accepted features.
+
 ## Rejected Ideas (with rationale)

 These suggestions were considered during review but explicitly excluded from this iteration:
@@ -573,3 +628,10 @@ These suggestions were considered during review but explicitly excluded from thi
 - **Split scoring engine into core module** (feedback-4 #5): Proposed extracting scoring math from `who.rs` into `src/core/scoring/model_v2_decay.rs`. Premature modularization — `who.rs` is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.
 - **Bot/service-account filtering** (feedback-4 #7): Real concern but orthogonal to time-decay scoring. This is a general data quality feature that belongs in its own issue — it affects all `who` modes, not just expert scoring. Adding `excluded_username_patterns` config and `--include-bots` flag is scope expansion that should be designed and tested independently.
 - **Model compare mode / rank-delta diagnostics** (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The `--explain-score` + `--as-of` combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.
+- **Canonical path identity graph** (feedback-5 #1, also feedback-2 #2, feedback-4 #4): Third time proposed, third time rejected. Building a rename graph from `mr_file_changes(old_path, new_path)` with identity resolution requires new schema (`path_identities`, `path_aliases` tables), ingestion pipeline changes, graph traversal at query time, and backfill logic for existing data. The UNION ALL dual-path matching already covers the 80%+ case (direct renames). Multi-hop rename chains (A→B→C) are rare in practice and can be addressed in v2 with real usage data showing the gap matters.
+- **Normalized `expertise_events` table** (feedback-5 #2): Proposes shifting from query-time CTE joins to a precomputed `expertise_events` table populated at ingest time. While architecturally appealing for read performance, this doubles the data surface area (raw tables + derived events), requires new ingestion pipelines with incremental upsert logic, backfill tooling for existing databases, and introduces consistency risks when raw data is corrected/re-synced. The CTE approach is correct, maintainable, and performant at our current scale. If query latency becomes a real bottleneck (see performance baseline SLOs), materialized views or derived tables become a v2 optimization.
+- **Reviewer engagement model upgrade** (feedback-5 #3): Proposes adding `approved`/`changes_requested` review-state signals and trivial-comment pattern matching (`["lgtm","+1","nit","ship it"]`). Expands the signal type count from 4 to 6 and adds a fragile pattern-matching layer (what about "don't ship it"? "lgtm but..."?). The `reviewer_min_note_chars` threshold is imperfect but pragmatic — it's a single configurable number with no false-positive risk from substring matching. Review-state signals may be worth adding later as a separate enhancement when we have data on how often they diverge from DiffNote participation.
+- **Contribution-floor auto cutoff for `--since`** (feedback-5 #5): Proposes `--since auto` computing the earliest relevant timestamp from `min_contribution_floor` (e.g., 0.01 points). Adds a non-obvious config parameter for minimal benefit — the 24m default is already mathematically justified from the decay curves (author: 6%, reviewer: 0.4% at 2 years) and easily overridden with `--since` or `--all-history`. The auto-derivation formula (`ceil(max_half_life * log2(1/floor))`) is opaque to users who just want to understand why a certain time range was selected.
+- **Full evidence drill-down in `--explain-score`** (feedback-5 #8): Proposes `--explain-score=summary|full` with per-MR evidence rows. Already rejected in feedback-2 #7. Component totals are sufficient for v1 debugging — they answer "which signal type drives this user's score." Per-MR drill-down requires additional SQL queries and significant output format complexity. Deferred unless component breakdowns prove insufficient.
+- **Neumaier compensated summation** (feedback-5 #7 partial): Accepted the sorting aspect for deterministic ordering, but rejected Neumaier/Kahan compensated summation. At the scale of dozens to low hundreds of contributions per user, the rounding error from naive f64 summation is on the order of 1e-14 — several orders of magnitude below any meaningful score difference. Compensated summation adds code complexity and a maintenance burden for no practical benefit at this scale.
+- **Automated CI benchmark gate** (feedback-5 #10 partial): Accepted manual performance baselines, but rejected automated CI regression gating with synthetic fixtures (100k/1M/5M notes). Building and maintaining benchmark infrastructure is a significant investment that's premature for a CLI tool with ~3 users. Manual timing checks during development are sufficient until performance becomes a real concern.