gitlore/plans/time-decay-expert-scoring.md
Taylor Eernisse 2c9de1a6c3 docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for
  integration with external tools (dashboards, IDE extensions, chat
  agents). Covers authentication, rate limiting, caching strategy, and
  webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status
  enrichment feature (now implemented). Includes the TDD appendix with
  test-first development specifications covering GraphQL client, adaptive
  pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback
  on decay curve parameterization, recency weighting for discussion
  contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00


---
plan: true
title: ""
status: iterating
iteration: 5
target_iterations: 8
beads_revision: 1
related_plans: []
created: 2026-02-08
updated: 2026-02-09
---
# Time-Decay Expert Scoring Model
## Context
The `lore who --path` command currently uses flat weights to score expertise: each authored MR counts as 25 points, each reviewed MR as 10, each inline note as 1 — regardless of when the activity happened. This produces three compounding problems:
1. **Temporal blindness**: Old activity counts the same as recent activity. Someone who authored a file 2 years ago ranks equivalently to someone who wrote it last week.
2. **Reviewer inflation**: Senior reviewers (jdefting, zhayes) who rubber-stamp every MR via assignment accumulate inflated scores indistinguishable from reviewers who actually left substantive inline feedback. The `mr_reviewers` table captures assignment, not engagement.
3. **Path-history blindness**: Renamed or moved files lose historical expertise because signal matching relies on `position_new_path` and `mr_file_changes.new_path` only. A developer who authored the file under its previous name gets zero credit after a rename.
The fix has three parts:
- Apply **exponential half-life decay** to each signal, grounded in cognitive science research
- **Split the reviewer signal** into "participated" (left DiffNotes) vs "assigned-only" (in `mr_reviewers` but no inline comments), with different weights and decay rates
- **Match both old and new paths** in all signal queries AND path resolution probes so expertise survives file renames
## Research Foundation
- **Ebbinghaus Forgetting Curve (1885)**: Memory retention follows exponential decay: `R = 2^(-t/h)` where h is the half-life
- **Generation Effect (Slamecka & Graf, 1978)**: Producing information (authoring code) creates ~2x more durable memory traces than reading it (reviewing)
- **Levels of Processing (Craik & Lockhart, 1972)**: Deeper cognitive engagement creates more durable memories — authoring > reviewing > commenting
- **Half-Life Regression (Settles & Meeder, 2016, Duolingo)**: Exponential decay with per-signal-type half-lives is practical and effective at scale. Chosen over power law for additivity, bounded behavior, and intuitive parameterization
- **Fritz et al. (2010, ICSE)**: "Degree-of-knowledge" model for code familiarity considers both authoring and interaction events with time-based decay
## Scoring Formula
```
score(user, path) = Sum_i( weight_i * 2^(-days_elapsed_i / half_life_i) )
```
For note signals grouped per MR, a diminishing-returns function caps comment storms:
```
note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
```
**Why `log2` instead of `ln`?** With `log2`, a single note contributes exactly `note_bonus * 1.0` (since `log2(2) = 1`), making the `note_bonus` weight directly interpretable as "points per note at count=1." With `ln`, one note contributes `note_bonus * 0.69`, which is unintuitive and means `note_bonus=1` doesn't actually mean "1 point per note." The diminishing-returns curve shape is identical — only the scale factor differs.
Per-signal contributions (each signal is either per-MR or per-note-group):
| Signal Type | Base Weight | Half-Life | Rationale |
|-------------|-------------|-----------|-----------|
| **Author** (authored MR touching path) | 25 | 180 days | Deep generative engagement; ~50% retention at 6 months |
| **Reviewer Participated** (left DiffNote on MR/path) | 10 | 90 days | Active review engagement; ~50% at 3 months |
| **Reviewer Assigned-Only** (in `mr_reviewers`, no DiffNote on path) | 3 | 45 days | Passive assignment; minimal cognitive engagement, fades fast |
| **Note** (inline DiffNotes on path, grouped per MR) | 1 | 45 days | `log2(1+count)` per MR; diminishing returns prevent comment storms |
**Why split reviewers?** The `mr_reviewers` table records assignment, not engagement. A reviewer who left 5 inline comments on a file has demonstrably more expertise than one who was merely assigned and clicked "approve." The participated signal keeps the existing reviewer weight (10) and gains a 90-day half-life; the assigned-only signal gets a reduced weight (3) and a faster 45-day half-life, enough to register but not enough to outrank actual contributors.
**Why require substantive notes?** Participation is qualified by a minimum note body length (`reviewer_min_note_chars`, default 20). Without this, a single "LGTM" or "+1" comment would promote a reviewer from the 3-point assigned-only tier to the 10-point participated tier — a 3.3x weight increase for zero substantive engagement. The threshold is configurable to accommodate teams with different review conventions.
**Why cap notes per MR?** Without diminishing returns, a back-and-forth thread of 30 comments on a single MR would score 30 note points — disproportionate to the expertise gained. `log2(1 + 30) ≈ 4.95` vs `log2(1 + 1) = 1.0` preserves the signal that more comments = more engagement while preventing outlier MRs from dominating. The 30-note reviewer gets ~5x the credit of a 1-note reviewer, not 30x.
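The diminishing-returns term can be sketched in a few lines of Rust. This is an illustration only: `note_multiplier` and `note_points` are hypothetical helper names, not functions from `who.rs`, and decay is left out so the `log2(1 + count)` scaling is visible on its own.

```rust
/// Diminishing-returns multiplier for `note_count` DiffNotes on a single MR.
/// Hypothetical helper: log2(1 + count), so one note yields exactly 1.0.
fn note_multiplier(note_count: u32) -> f64 {
    (1.0 + f64::from(note_count)).log2()
}

/// Undecayed note contribution for one MR (decay is applied separately).
fn note_points(note_bonus: i64, note_count: u32) -> f64 {
    note_bonus as f64 * note_multiplier(note_count)
}
```

With `note_bonus = 1`, one note is worth 1 point and thirty notes on the same MR are worth roughly 4.95 points, matching the figures above.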
Author/reviewer signals are deduplicated per MR (one signal per distinct MR). Note signals are grouped per (user, MR) and use `log2(1 + count)` scaling.
**Why include closed MRs?** Closed-without-merge MRs represent real review effort and code familiarity even though the code was abandoned. All signals from closed MRs are multiplied by `closed_mr_multiplier` (default 0.5) to reflect this reduced but non-zero contribution. This applies uniformly to author, reviewer, and note signals on closed MRs.
## Files to Modify
1. **`src/core/config.rs`** — Add half-life fields + assigned-only reviewer config to `ScoringConfig`; add config validation
2. **`src/cli/commands/who.rs`** — Core changes:
   - Add `half_life_decay()` pure function
   - Restructure `query_expert()`: SQL returns hybrid-aggregated signal rows with timestamps (MR-level for author/reviewer, note-count-per-MR for notes), Rust applies decay + `log2(1+count)` + final ranking
   - Match both `new_path` and `old_path` in all signal queries (rename awareness)
   - Extend rename awareness to `build_path_query()` probes and `suffix_probe()` (not just scoring)
   - Split reviewer signal into participated vs assigned-only
   - Use state-aware timestamps (`merged_at` for merged MRs, `closed_at` for closed MRs, `updated_at` for open MRs)
   - Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals)
   - Add `--as-of` flag for reproducible scoring at a fixed timestamp
   - Add `--explain-score` flag for per-user score component breakdown
   - Add `--include-bots` flag to disable bot/service-account filtering
   - Sort on raw f64 score, round only for display
   - Update tests
3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)
## Implementation Details
### 1. ScoringConfig (config.rs)
Add half-life fields and the new assigned-only reviewer signal. All new fields use `#[serde(default)]` for backward compatibility:
```rust
pub struct ScoringConfig {
    pub author_weight: i64,                      // default: 25
    pub reviewer_weight: i64,                    // default: 10 (participated: left DiffNotes)
    pub reviewer_assignment_weight: i64,         // default: 3 (assigned-only: no DiffNotes on path)
    pub note_bonus: i64,                         // default: 1
    pub author_half_life_days: u32,              // default: 180
    pub reviewer_half_life_days: u32,            // default: 90 (participated)
    pub reviewer_assignment_half_life_days: u32, // default: 45 (assigned-only)
    pub note_half_life_days: u32,                // default: 45
    pub closed_mr_multiplier: f64,               // default: 0.5 (applied to closed-without-merge MRs)
    pub reviewer_min_note_chars: u32,            // default: 20 (minimum note body length to count as participation)
    pub excluded_usernames: Vec<String>,         // default: [] (exact-match usernames to exclude, e.g. ["renovate-bot", "gitlab-ci"])
}
```
**Config validation**: Add a `validate_scoring()` call in `Config::load_from_path()` after deserialization:
- All `*_half_life_days` must be > 0 (prevents division by zero in decay function)
- All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores)
- `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them)
- `reviewer_min_note_chars` needs no lower-bound check because `u32` cannot be negative (0 disables the filter; typical useful values: 10-50)
- `excluded_usernames` entries must be non-empty strings (no blank entries)
- Return `LoreError::ConfigInvalid` with a clear message on failure
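A minimal sketch of what `validate_scoring()` could look like under these rules. The `ConfigInvalid` struct here is a stand-in for `LoreError::ConfigInvalid`, whose real shape is not shown in this plan, and `default_scoring()` is a hypothetical helper that only exists to make the example self-contained.

```rust
/// Stand-in for `LoreError::ConfigInvalid` (assumed shape: carries a message).
#[derive(Debug, PartialEq)]
struct ConfigInvalid(String);

/// Subset of `ScoringConfig` needed to illustrate the checks.
struct ScoringConfig {
    author_weight: i64,
    reviewer_weight: i64,
    reviewer_assignment_weight: i64,
    note_bonus: i64,
    author_half_life_days: u32,
    reviewer_half_life_days: u32,
    reviewer_assignment_half_life_days: u32,
    note_half_life_days: u32,
    closed_mr_multiplier: f64,
    excluded_usernames: Vec<String>,
}

/// Hypothetical helper returning the defaults listed in the struct comments.
fn default_scoring() -> ScoringConfig {
    ScoringConfig {
        author_weight: 25,
        reviewer_weight: 10,
        reviewer_assignment_weight: 3,
        note_bonus: 1,
        author_half_life_days: 180,
        reviewer_half_life_days: 90,
        reviewer_assignment_half_life_days: 45,
        note_half_life_days: 45,
        closed_mr_multiplier: 0.5,
        excluded_usernames: Vec::new(),
    }
}

fn validate_scoring(s: &ScoringConfig) -> Result<(), ConfigInvalid> {
    let half_lives = [
        ("author_half_life_days", s.author_half_life_days),
        ("reviewer_half_life_days", s.reviewer_half_life_days),
        ("reviewer_assignment_half_life_days", s.reviewer_assignment_half_life_days),
        ("note_half_life_days", s.note_half_life_days),
    ];
    for (name, value) in half_lives {
        if value == 0 {
            return Err(ConfigInvalid(format!("scoring.{} must be > 0", name)));
        }
    }
    let weights = [
        ("author_weight", s.author_weight),
        ("reviewer_weight", s.reviewer_weight),
        ("reviewer_assignment_weight", s.reviewer_assignment_weight),
        ("note_bonus", s.note_bonus),
    ];
    for (name, value) in weights {
        if value < 0 {
            return Err(ConfigInvalid(format!("scoring.{} must be >= 0", name)));
        }
    }
    // Written as a negated range test so that NaN is rejected too
    // (NaN fails every comparison).
    let m = s.closed_mr_multiplier;
    if !(m > 0.0 && m <= 1.0) {
        return Err(ConfigInvalid(
            "scoring.closed_mr_multiplier must be in (0.0, 1.0]".to_string(),
        ));
    }
    if s.excluded_usernames.iter().any(|u| u.trim().is_empty()) {
        return Err(ConfigInvalid(
            "scoring.excluded_usernames must not contain blank entries".to_string(),
        ));
    }
    Ok(())
}
```

Returning on the first failure keeps the error message specific; collecting all failures into one message would be an equally reasonable choice.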
### 2. Decay Function (who.rs)
```rust
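/// Fraction of a signal's weight remaining after `elapsed_ms`, given a half-life in days.
fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    // Clamp negative elapsed time (event timestamp after the reference point) to zero.
    let days = (elapsed_ms as f64 / 86_400_000.0).max(0.0);
    let hl = f64::from(half_life_days);
    // Config validation rejects 0, but guard against division by zero anyway.
    if hl <= 0.0 {
        return 0.0;
    }
    2.0_f64.powf(-days / hl)
}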
```
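The boundary behavior of the decay function is easiest to see with a few concrete inputs. The sketch below repeats the function so it is self-contained; `decayed_points` and `MS_PER_DAY` are hypothetical names added for the example.

```rust
const MS_PER_DAY: i64 = 86_400_000;

fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    let days = (elapsed_ms as f64 / MS_PER_DAY as f64).max(0.0);
    let hl = f64::from(half_life_days);
    if hl <= 0.0 {
        return 0.0;
    }
    2.0_f64.powf(-days / hl)
}

/// Hypothetical convenience wrapper: weight times decay for an event `age_days` old.
fn decayed_points(weight: i64, age_days: i64, half_life_days: u32) -> f64 {
    weight as f64 * half_life_decay(age_days * MS_PER_DAY, half_life_days)
}
```

An authored MR exactly one half-life old (180 days) contributes 12.5 of its 25 points, an event at the reference instant contributes the full weight, and an event with a timestamp after the reference point is clamped to full weight rather than amplified.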
### 3. SQL Restructure (who.rs)
The SQL uses **CTE-based dual-path matching** and **hybrid aggregation**. Rather than repeating `OR old_path` in every signal subquery, two foundational CTEs (`matched_notes`, `matched_file_changes`) centralize path matching. A third CTE (`reviewer_participation`) precomputes which reviewers actually left DiffNotes, avoiding correlated `EXISTS`/`NOT EXISTS` subqueries.
MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`.
```sql
WITH matched_notes_raw AS (
-- Branch 1: match on new_path (uses idx_notes_new_path or equivalent)
SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
FROM notes n
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username IS NOT NULL
AND n.created_at >= ?2
AND n.created_at < ?4
AND (?3 IS NULL OR n.project_id = ?3)
AND n.position_new_path {path_op}
UNION ALL
-- Branch 2: match on old_path (uses idx_notes_old_path_author)
SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
FROM notes n
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username IS NOT NULL
AND n.created_at >= ?2
AND n.created_at < ?4
AND (?3 IS NULL OR n.project_id = ?3)
AND n.position_old_path {path_op}
),
matched_notes AS (
-- Dedup: prevent double-counting when old_path = new_path (no rename)
SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
FROM matched_notes_raw
),
matched_file_changes_raw AS (
-- Branch 1: match on new_path (uses idx_mfc_new_path_project_mr)
SELECT fc.merge_request_id, fc.project_id
FROM mr_file_changes fc
WHERE (?3 IS NULL OR fc.project_id = ?3)
AND fc.new_path {path_op}
UNION ALL
-- Branch 2: match on old_path (uses idx_mfc_old_path_project_mr)
SELECT fc.merge_request_id, fc.project_id
FROM mr_file_changes fc
WHERE (?3 IS NULL OR fc.project_id = ?3)
AND fc.old_path {path_op}
),
matched_file_changes AS (
-- Dedup: prevent double-counting when old_path = new_path (no rename)
SELECT DISTINCT merge_request_id, project_id
FROM matched_file_changes_raw
),
reviewer_participation AS (
-- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
-- Materialized once, then joined against mr_reviewers to classify.
-- The LENGTH filter excludes trivial notes ("LGTM", "+1", emoji-only) from qualifying
-- a reviewer as "participated." Without this, a single "LGTM" would promote an assigned
-- reviewer from 3-point to 10-point weight, defeating the purpose of the split.
-- Note: mn.id refers back to notes.id, so we join notes to access the body column
-- (not carried in matched_notes to avoid bloating that CTE with body text).
SELECT DISTINCT d.merge_request_id AS mr_id, mn.author_username AS username
FROM matched_notes mn
JOIN discussions d ON mn.discussion_id = d.id
JOIN notes n_body ON mn.id = n_body.id
WHERE d.merge_request_id IS NOT NULL
AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= {reviewer_min_note_chars}
),
raw AS (
-- Signal 1: DiffNote reviewer (individual notes for note_cnt)
SELECT mn.author_username AS username, 'diffnote_reviewer' AS signal,
m.id AS mr_id, mn.id AS note_id, mn.created_at AS seen_at, m.state AS mr_state
FROM matched_notes mn
JOIN discussions d ON mn.discussion_id = d.id
JOIN merge_requests m ON d.merge_request_id = m.id
WHERE (m.author_username IS NULL OR mn.author_username != m.author_username)
AND m.state IN ('opened','merged','closed')
UNION ALL
-- Signal 2: DiffNote MR author
SELECT m.author_username AS username, 'diffnote_author' AS signal,
m.id AS mr_id, NULL AS note_id, MAX(mn.created_at) AS seen_at, m.state AS mr_state
FROM merge_requests m
JOIN discussions d ON d.merge_request_id = m.id
JOIN matched_notes mn ON mn.discussion_id = d.id
WHERE m.author_username IS NOT NULL
AND m.state IN ('opened','merged','closed')
GROUP BY m.author_username, m.id
UNION ALL
-- Signal 3: MR author via file changes (state-aware timestamp)
SELECT m.author_username AS username, 'file_author' AS signal,
m.id AS mr_id, NULL AS note_id,
{state_aware_ts} AS seen_at, m.state AS mr_state
FROM matched_file_changes mfc
JOIN merge_requests m ON mfc.merge_request_id = m.id
WHERE m.author_username IS NOT NULL
AND m.state IN ('opened','merged','closed')
AND {state_aware_ts} >= ?2
AND {state_aware_ts} < ?4
UNION ALL
-- Signal 4a: Reviewer participated (in mr_reviewers AND left DiffNotes on path)
SELECT r.username AS username, 'file_reviewer_participated' AS signal,
m.id AS mr_id, NULL AS note_id,
{state_aware_ts} AS seen_at, m.state AS mr_state
FROM matched_file_changes mfc
JOIN merge_requests m ON mfc.merge_request_id = m.id
JOIN mr_reviewers r ON r.merge_request_id = m.id
JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
WHERE r.username IS NOT NULL
AND (m.author_username IS NULL OR r.username != m.author_username)
AND m.state IN ('opened','merged','closed')
AND {state_aware_ts} >= ?2
AND {state_aware_ts} < ?4
UNION ALL
-- Signal 4b: Reviewer assigned-only (in mr_reviewers, NO DiffNotes on path)
SELECT r.username AS username, 'file_reviewer_assigned' AS signal,
m.id AS mr_id, NULL AS note_id,
{state_aware_ts} AS seen_at, m.state AS mr_state
FROM matched_file_changes mfc
JOIN merge_requests m ON mfc.merge_request_id = m.id
JOIN mr_reviewers r ON r.merge_request_id = m.id
LEFT JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
WHERE rp.username IS NULL -- NOT in participation set
AND r.username IS NOT NULL
AND (m.author_username IS NULL OR r.username != m.author_username)
AND m.state IN ('opened','merged','closed')
AND {state_aware_ts} >= ?2
AND {state_aware_ts} < ?4
),
aggregated AS (
-- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
SELECT username, signal, mr_id, 1 AS qty, MAX(seen_at) AS ts, mr_state
FROM raw WHERE signal != 'diffnote_reviewer'
GROUP BY username, signal, mr_id
UNION ALL
-- Note signals: 1 row per (username, mr_id) with note_count and max_ts
SELECT username, 'note_group' AS signal, mr_id, COUNT(*) AS qty, MAX(seen_at) AS ts, mr_state
FROM raw WHERE signal = 'diffnote_reviewer' AND note_id IS NOT NULL
GROUP BY username, mr_id
)
SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL
```
Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` exclusive upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `>= ?2 AND < ?4` pattern (half-open interval) ensures that when `--as-of` is set to a past date, events at or after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. The exclusive upper bound avoids edge-case ambiguity when events have timestamps exactly equal to the as-of value.
**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into foundational CTEs (`matched_notes_raw` -> `matched_notes`, `matched_file_changes_raw` -> `matched_file_changes`) means path matching is defined once, each index branch is explicit, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. The UNION ALL + dedup pattern lets SQLite pick the best index for each path column independently.
**Dual-path matching strategy (UNION ALL split)**: SQLite's query planner commonly struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. Rather than starting with `OR` and hoping the planner cooperates, use `UNION ALL` + dedup as the default strategy:
```sql
matched_notes_raw AS (
  SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
  UNION ALL
  SELECT ... FROM notes n WHERE ... AND n.position_old_path {path_op}
),
matched_notes AS (
  SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
  FROM matched_notes_raw
),
```
This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). The same pattern applies to `matched_file_changes`. The simpler `OR` variant is retained as a comment for benchmarking — if a future SQLite version handles `OR` well, the split can be collapsed.
**Rationale for precomputed participation set**: The previous approach used correlated `EXISTS`/`NOT EXISTS` subqueries to classify reviewers. The `reviewer_participation` CTE materializes the set of `(mr_id, username)` pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with `IS NULL` (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every `mr_reviewers` row falls into exactly one bucket.
**Rationale for hybrid over fully-raw**: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals collapse to one row per (username, signal, MR): `matched_file_changes` is deduplicated with `DISTINCT`, and the `aggregated` CTE groups by `(username, signal, mr_id)`. This keeps memory and latency predictable regardless of review activity density.
**Path rename awareness**: Both `matched_notes` and `matched_file_changes` use UNION ALL + dedup to match against both old and new path columns independently, ensuring each branch uses its respective index:
- Notes: branch 1 matches `position_new_path`, branch 2 matches `position_old_path`, deduped by `notes.id`
- File changes: branch 1 matches `new_path`, branch 2 matches `old_path`, deduped by `(merge_request_id, project_id)`
Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The UNION ALL approach ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.
**Signal 4 splits into two**: The current signal 4 (`file_reviewer`) joins `mr_reviewers` but doesn't distinguish participation. In the new plan:
- **Signal 4a** (`file_reviewer_participated`): User is in `mr_reviewers` AND appears in the `reviewer_participation` CTE (left DiffNotes on the path for that MR). Gets `reviewer_weight` (10) and `reviewer_half_life_days` (90).
- **Signal 4b** (`file_reviewer_assigned`): User is in `mr_reviewers` but NOT in the `reviewer_participation` CTE. Gets `reviewer_assignment_weight` (3) and `reviewer_assignment_half_life_days` (45).
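The 4a/4b split is an exhaustive partition of non-author `mr_reviewers` rows. A small Rust model of the same rule, useful for reasoning about edge cases, might look like the sketch below. The `ReviewerSignal` enum and `classify_reviewer` function are hypothetical; in the plan itself this classification happens in SQL via the JOIN / LEFT JOIN pair.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ReviewerSignal {
    Participated, // signal 4a: reviewer_weight, reviewer_half_life_days
    AssignedOnly, // signal 4b: reviewer_assignment_weight, reviewer_assignment_half_life_days
}

/// Classify one `mr_reviewers` row. `participation` models the
/// `reviewer_participation` CTE as a set of (mr_id, username) pairs.
/// Returns `None` when the reviewer is the MR author, mirroring the
/// `r.username != m.author_username` filter in both SQL branches.
fn classify_reviewer(
    mr_id: i64,
    username: &str,
    mr_author: Option<&str>,
    participation: &HashSet<(i64, String)>,
) -> Option<ReviewerSignal> {
    if mr_author == Some(username) {
        return None;
    }
    if participation.contains(&(mr_id, username.to_string())) {
        Some(ReviewerSignal::Participated)
    } else {
        Some(ReviewerSignal::AssignedOnly)
    }
}
```

Because the second branch is simply "everything not in the set", no reviewer row can land in both buckets or in neither.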
### 3a. Path Resolution Probes (who.rs)
Rename awareness must extend beyond scoring queries to the path resolution layer. Currently `build_path_query()` (line 457) and `suffix_probe()` (line 584) only check `position_new_path` and `new_path`. If a user queries an old path name, these probes return "not found" and the scoring query never runs.
**Changes to `build_path_query()`**:
- **Probe 1 (exact_exists)**: Add `OR position_old_path = ?1` to the notes query and `OR old_path = ?1` to the `mr_file_changes` query. This detects files that existed under the queried name even if they've since been renamed.
- **Probe 2 (prefix_exists)**: Add `OR position_old_path LIKE ?1 ESCAPE '\\'` and `OR old_path LIKE ?1 ESCAPE '\\'` to the respective queries.
**Changes to `suffix_probe()`**:
The UNION query inside `suffix_probe()` currently only selects `position_new_path` from notes and `new_path` from file changes. Add two additional UNION branches:
```sql
UNION
SELECT position_old_path AS full_path FROM notes
WHERE note_type = 'DiffNote' AND is_system = 0
AND position_old_path IS NOT NULL
AND (position_old_path LIKE ?1 ESCAPE '\\' OR position_old_path = ?2)
AND (?3 IS NULL OR project_id = ?3)
UNION
SELECT old_path AS full_path FROM mr_file_changes
WHERE old_path IS NOT NULL
AND (old_path LIKE ?1 ESCAPE '\\' OR old_path = ?2)
AND (?3 IS NULL OR project_id = ?3)
```
This ensures that querying by an old filename (e.g., `login.rs` after it was renamed to `auth.rs`) still resolves to a usable path for scoring. The UNION deduplicates so the same path appearing in both old and new columns doesn't cause false ambiguity.
**State-aware timestamps for file-change signals (signals 3, 4a, 4b)**: Replace `m.updated_at` with a state-aware expression:
```sql
CASE
  WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
  WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
  ELSE COALESCE(m.updated_at, m.created_at) -- opened / other
END -- no alias here: the expression is spliced into SELECT lists (AS seen_at) and WHERE clauses
```
**Rationale**: `updated_at` is noisy for merged MRs — it changes on label edits, title changes, rebases, and metadata touches, creating false recency. `merged_at` is the best indicator of when code expertise was formed (the moment the code entered the branch). But for **open MRs**, `updated_at` is actually the right signal because it reflects ongoing active work. `closed_at` anchors closed-without-merge MRs to their closure time (these represent review effort even if the code was abandoned). Each state gets the timestamp that best represents when expertise was last exercised.
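For unit tests or documentation it can help to have the same selection rule in Rust. The sketch below mirrors the CASE expression; `MrTimestamps` is a hypothetical row shape, and the assumption that `created_at` is always present comes from its use as the final fallback in the SQL.

```rust
/// Hypothetical projection of the `merge_requests` columns the CASE expression reads.
struct MrTimestamps<'a> {
    state: &'a str,
    merged_at: Option<i64>,
    closed_at: Option<i64>,
    updated_at: Option<i64>,
    created_at: i64,
}

/// Rust mirror of the state-aware timestamp expression (milliseconds since epoch).
fn activity_ts(mr: &MrTimestamps) -> i64 {
    match mr.state {
        "merged" => mr.merged_at.unwrap_or(mr.created_at),
        "closed" => mr.closed_at.unwrap_or(mr.created_at),
        // "opened" and any unexpected state fall back to updated_at.
        _ => mr.updated_at.unwrap_or(mr.created_at),
    }
}
```

A merged MR whose labels were edited long after merge still reports its `merged_at`, which is the false-recency case the rationale describes.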
### 4. Rust-Side Aggregation (who.rs)
For each username, accumulate into a struct with:
- **Author MRs**: `HashMap<i64, (i64, String)>` (mr_id -> (max timestamp, mr_state)) from `diffnote_author` + `file_author` signals
- **Reviewer Participated MRs**: `HashMap<i64, (i64, String)>` from `diffnote_reviewer` + `file_reviewer_participated` signals
- **Reviewer Assigned-Only MRs**: `HashMap<i64, (i64, String)>` from `file_reviewer_assigned` signals (excluding any MR already in participated set)
- **Notes per MR**: `HashMap<i64, (u32, i64, String)>` (mr_id -> (count, max_ts, mr_state)) from `note_group` rows in the aggregated query (already grouped per user+MR with note_count in `qty`). Used for `log2(1 + count)` diminishing returns.
- **Last seen**: max of all timestamps
- **Components** (when `--explain-score`): Track per-component f64 subtotals for `author`, `reviewer_participated`, `reviewer_assigned`, `notes`
The `mr_state` field from each SQL row is stored alongside the timestamp so the Rust-side can apply `closed_mr_multiplier` when `mr_state == "closed"`.
Compute score as `f64` with **deterministic contribution ordering**: within each signal type, sort contributions by `(mr_id ASC)` before summing. This eliminates platform-dependent HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the complexity of compensated summation (Neumaier/Kahan). Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
```
state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 }
raw_score =
sum(author_weight * state_mult(mr) * decay(now - ts, author_hl) for (mr, ts) in author_mrs)
+ sum(reviewer_weight * state_mult(mr) * decay(now - ts, reviewer_hl) for (mr, ts) in reviewer_participated)
+ sum(reviewer_assignment_weight * state_mult(mr) * decay(now - ts, reviewer_assignment_hl) for (mr, ts) in reviewer_assigned)
+ sum(note_bonus * state_mult(mr) * log2(1 + count) * decay(now - ts, note_hl) for (mr, count, ts) in notes_per_mr)
```
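One term of the pseudocode above could be implemented roughly as follows. This is a sketch: `sum_signal` is a hypothetical helper name, and the map type is taken from the per-user accumulator description (mr_id -> (timestamp, mr_state)).

```rust
use std::collections::HashMap;

fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    let days = (elapsed_ms as f64 / 86_400_000.0).max(0.0);
    let hl = f64::from(half_life_days);
    if hl <= 0.0 {
        return 0.0;
    }
    2.0_f64.powf(-days / hl)
}

/// Sum one signal type's contributions in ascending `mr_id` order so the
/// floating-point result does not depend on HashMap iteration order.
fn sum_signal(
    mrs: &HashMap<i64, (i64, String)>,
    weight: i64,
    half_life_days: u32,
    closed_mr_multiplier: f64,
    as_of_ms: i64,
) -> f64 {
    let mut entries: Vec<(&i64, &(i64, String))> = mrs.iter().collect();
    entries.sort_by_key(|(mr_id, _)| **mr_id);
    entries
        .into_iter()
        .map(|(_, (ts, state))| {
            let state_mult = if state.as_str() == "closed" {
                closed_mr_multiplier
            } else {
                1.0
            };
            weight as f64 * state_mult * half_life_decay(as_of_ms - *ts, half_life_days)
        })
        .sum()
}
```

For example, an author with one merged MR at the reference instant and one closed MR 180 days old scores 25 + 25 * 0.5 * 0.5 = 31.25 with the default weights.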
**Why include closed MRs?** A closed-without-merge MR still represents review effort and code familiarity — the reviewer read the diff, left comments, and engaged with the code even though it was ultimately abandoned. Excluding closed MRs entirely (the previous plan's approach) discarded this signal. The `closed_mr_multiplier` (default 0.5) halves the contribution, reflecting that the code never landed but the reviewer's cognitive engagement was real. This also eliminates the dead-code inconsistency where the state-aware CASE expression handled `closed` but the WHERE clause excluded it.
**Sort on raw `f64` score**: `(raw_score DESC, last_seen DESC, username ASC)`. This prevents false ties from premature rounding. Only round to `i64` for the `Expert.score` display field after sorting and truncation. The robot JSON `score` field stays integer for backward compatibility. When `--explain-score` is active, also include `score_raw` (the unrounded f64) alongside `score` so the component totals can be verified without rounding noise.
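A possible comparator for this ordering is sketched below. `ScoredUser` and `rank_experts` are hypothetical names; `f64::total_cmp` is used because it gives a total order over floats, which is one reasonable way to keep the sort deterministic.

```rust
/// Hypothetical per-user accumulator output.
struct ScoredUser {
    username: String,
    raw_score: f64,
    last_seen_ms: i64,
}

/// Sort by (raw_score DESC, last_seen DESC, username ASC), truncate to `limit`,
/// and only then round the score for display.
fn rank_experts(mut users: Vec<ScoredUser>, limit: usize) -> Vec<(String, i64)> {
    users.sort_by(|a, b| {
        b.raw_score
            .total_cmp(&a.raw_score)
            .then_with(|| b.last_seen_ms.cmp(&a.last_seen_ms))
            .then_with(|| a.username.cmp(&b.username))
    });
    users.truncate(limit);
    users
        .into_iter()
        .map(|u| (u.username, u.raw_score.round() as i64))
        .collect()
}
```

Two users with raw scores 42.4 and 41.6 both display as 42, but the one with 42.4 is still ranked first because rounding happens after the sort.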
Compute counts from the accumulated data:
- `review_mr_count = reviewer_participated.len() + reviewer_assigned.len()`
- `review_note_count = notes_per_mr.values().map(|(count, _, _)| count).sum()`
- `author_mr_count = author_mrs.len()`
**Bot/service-account filtering**: After accumulating all user scores and before sorting, filter out any username that appears in `config.scoring.excluded_usernames` (exact match, case-insensitive). This is applied in Rust post-query (not SQL) to keep the SQL clean and avoid parameter explosion. When `--include-bots` is active, the filter is skipped entirely. The robot JSON `resolved_input` includes `excluded_usernames_applied: true|false` to indicate whether filtering was active.
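The post-query filter could be as small as the following sketch. `filter_excluded` is a hypothetical helper operating on plain usernames; the real code would presumably filter whatever per-user accumulator struct `query_expert()` builds.

```rust
use std::collections::HashSet;

/// Drop usernames listed in `excluded` (case-insensitive exact match),
/// unless `include_bots` is set, in which case nothing is filtered.
fn filter_excluded(usernames: Vec<String>, excluded: &[String], include_bots: bool) -> Vec<String> {
    if include_bots {
        return usernames;
    }
    // Lowercase the exclusion list once instead of once per candidate.
    let excluded_lower: HashSet<String> = excluded.iter().map(|u| u.to_lowercase()).collect();
    usernames
        .into_iter()
        .filter(|u| !excluded_lower.contains(&u.to_lowercase()))
        .collect()
}
```

Filtering before the sort and truncation matters: if bots were removed after truncating to the limit, the result could contain fewer experts than requested.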
Truncate to limit after sorting.
### 5. Default --since Change
Expert mode: `"6m"` -> `"24m"` (line 289 in who.rs).
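At 2 years (730 days), the remaining weight is about 6% for author signals, about 0.4% for participated-reviewer signals, and about 0.001% for note and assigned-only signals (both use a 45-day half-life). These residuals are negligible, so 24 months is a reasonable cutoff.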
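The residual weight at the cutoff follows directly from the decay formula and can be checked with a few lines of Rust. `retained_fraction` and `days_until_fraction` are hypothetical helpers written for this check, and the 730-day figure assumes a 24-month window of two 365-day years.

```rust
/// Fraction of a signal's weight left after `age_days`, for a given half-life.
fn retained_fraction(age_days: f64, half_life_days: f64) -> f64 {
    2.0_f64.powf(-age_days / half_life_days)
}

/// Inverse view: how many days until a signal decays to `fraction` of its weight.
fn days_until_fraction(fraction: f64, half_life_days: f64) -> f64 {
    half_life_days * (1.0 / fraction).log2()
}
```

At 730 days this gives roughly 0.060 for the 180-day half-life, 0.0036 for the 90-day half-life, and 0.000013 for the 45-day half-life. The inverse helper shows the same thing from the other direction: a 180-day half-life signal reaches one quarter of its weight after 360 days.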
**Diagnostic escape hatch**: Add `--all-history` flag (conflicts with `--since`) that sets `since_ms = 0`, capturing all data regardless of age. Useful for debugging scoring anomalies and validating the decay model against known experts. The `since_mode` field in robot JSON reports `"all"` when this flag is active.
### 5a. Reproducible Scoring via `--as-of`
Add `--as-of <RFC3339|YYYY-MM-DD>` flag that overrides the `now_ms` reference point used for decay calculations. When set:
- All event selection is bounded by `[since_ms, as_of_ms)` — exclusive upper bound; events at or after `as_of_ms` are excluded from SQL results entirely (not just decayed). The SQL uses `< ?4` (strict less-than), not `<= ?4`.
- `YYYY-MM-DD` input (without time component) is interpreted as end-of-day UTC: `T23:59:59.999Z`. This matches user intuition that `--as-of 2025-06-01` means "as of the end of June 1st" rather than "as of midnight at the start of June 1st" which would exclude the entire day's activity.
- All decay computations use `as_of_ms` instead of `SystemTime::now()`
- The `--since` window is calculated relative to `as_of_ms` (not wall clock)
- Robot JSON `resolved_input` includes `as_of_ms`, `as_of_iso`, `window_start_iso`, `window_end_iso`, and `window_end_exclusive: true` fields — making the exact query window unambiguous in output
**Rationale**: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. `--as-of` pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since `elapsed.max(0.0)` clamps negative elapsed time to zero), breaking the reproducibility guarantee.
Implementation: Parse the flag in `run_who()`, compute `as_of_ms: i64`, and thread it through to `query_expert()` where it replaces `now_ms()` and is bound as `?4` in all SQL queries. When the flag is absent, `?4` defaults to `now_ms()` (wall clock), which makes the upper bound transparent — all events are within the window by definition. The flag is compatible with all modes but primarily useful in expert mode.
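A dependency-free sketch of the date-only branch and the half-open window is shown below. It assumes `--as-of` values in `YYYY-MM-DD` form only (RFC 3339 parsing is not covered), uses the well-known days-from-civil calendar algorithm, and does not reject impossible days such as February 30. All function names are hypothetical; the real implementation may well use a date library instead.

```rust
const MS_PER_DAY: i64 = 86_400_000;

/// Days since 1970-01-01 for a proleptic Gregorian date (days-from-civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let mp = (m + 9) % 12; // March = 0, so the leap day falls at the end of the cycle
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Parse `YYYY-MM-DD` as end-of-day UTC (T23:59:59.999Z), in milliseconds since epoch.
fn parse_as_of_date(s: &str) -> Option<i64> {
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 {
        return None;
    }
    let y: i64 = parts[0].parse().ok()?;
    let m: i64 = parts[1].parse().ok()?;
    let d: i64 = parts[2].parse().ok()?;
    if !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d) * MS_PER_DAY + MS_PER_DAY - 1)
}

/// Half-open query window `[since_ms, as_of_ms)`, computed relative to `as_of_ms`.
fn query_window(as_of_ms: i64, since_days: i64) -> (i64, i64) {
    (as_of_ms - since_days * MS_PER_DAY, as_of_ms)
}

/// Same test the SQL applies with `>= ?2 AND < ?4`.
fn in_window(ts_ms: i64, window: (i64, i64)) -> bool {
    ts_ms >= window.0 && ts_ms < window.1
}
```

Note the interaction between the two rules: because the upper bound is exclusive, an event stamped exactly at `23:59:59.999` on the as-of date is excluded, while everything earlier that day is included.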
### 5b. Score Explainability via `--explain-score`
Add `--explain-score` flag that augments each expert result with a per-user component breakdown:
```json
{
"username": "jsmith",
"score": 42,
"score_raw": 42.0,
"components": {
"author": 28.5,
"reviewer_participated": 8.2,
"reviewer_assigned": 1.8,
"notes": 3.5
}
}
```
**Scope for this iteration**: Component breakdown only (4 floats per user). No top-evidence MRs, no decay curves, no per-MR drill-down. Those are v2 features if scoring disputes arise frequently.
**Flag conflicts**: `--explain-score` is mutually exclusive with `--detail`. Both augment per-user output in different ways; combining them would produce confusing overlapping output. Clap `conflicts_with` enforces this at parse time.
**Human output**: When `--explain-score` is active in human mode, append a parenthetical after each score: `42 (author:28.5 review:10.0 notes:3.5)`.
**Robot output**: Add `score_raw` (unrounded f64) and `components` object to each expert entry. Only present when `--explain-score` is active (no payload bloat by default). The `resolved_input` section also includes `scoring_model_version: 2` to distinguish from the v1 flat-weight model, enabling robot clients to adapt parsing.
**Rationale**: Multi-signal decayed ranking will be disputed without decomposition. Showing which signal drives a user's score makes results actionable and builds trust in the model. Keeping scope minimal avoids the output format complexity that originally motivated deferral.
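The component breakdown and the human-mode parenthetical could be modeled as below. `ScoreComponents` and its methods are hypothetical; merging the two reviewer components into a single `review` figure is inferred from the human-output example, where `review:10.0` equals 8.2 + 1.8.

```rust
/// Per-user subtotals tracked when `--explain-score` is active.
struct ScoreComponents {
    author: f64,
    reviewer_participated: f64,
    reviewer_assigned: f64,
    notes: f64,
}

impl ScoreComponents {
    /// Unrounded total, reported as `score_raw` in robot output.
    fn score_raw(&self) -> f64 {
        self.author + self.reviewer_participated + self.reviewer_assigned + self.notes
    }

    /// Human-mode rendering: rounded score followed by the breakdown.
    fn human(&self) -> String {
        format!(
            "{} (author:{:.1} review:{:.1} notes:{:.1})",
            self.score_raw().round() as i64,
            self.author,
            self.reviewer_participated + self.reviewer_assigned,
            self.notes
        )
    }
}
```

Using the values from the JSON example (28.5, 8.2, 1.8, 3.5), `human()` renders `42 (author:28.5 review:10.0 notes:3.5)`.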
### 6. Index Migration (db.rs)
Add a new migration to support the restructured query patterns. The dual-path matching CTEs and `reviewer_participation` CTE introduce query shapes that need index coverage:
```sql
-- Support dual-path matching on DiffNotes (old_path branch of the UNION ALL in matched_notes_raw)
CREATE INDEX IF NOT EXISTS idx_notes_old_path_author
ON notes(position_old_path, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
-- Support dual-path matching on file changes (old_path branch of the UNION ALL in matched_file_changes_raw)
CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
ON mr_file_changes(old_path, project_id, merge_request_id)
WHERE old_path IS NOT NULL;
-- Support new_path matching on file changes (ensure index parity with old_path)
-- Existing indexes may not have optimal column order for the CTE pattern.
CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
ON mr_file_changes(new_path, project_id, merge_request_id);
-- Support reviewer_participation CTE: joining matched_notes -> discussions -> mr_reviewers
-- notes.discussion_id (NOT noteable_id, which doesn't exist in the schema) is the FK to discussions
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
ON notes(discussion_id, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0;
```
**Rationale**: The existing indexes cover `position_new_path` and `new_path` but not their `old_path` counterparts. Without these, the `OR old_path` clauses would force table scans on renamed files. The `reviewer_participation` CTE joins `matched_notes` -> `discussions` -> `merge_requests`, so `idx_notes_diffnote_discussion_author` on `(discussion_id, author_username, created_at)` speeds up the CTE materialization.
**Schema note**: The `notes` table uses `discussion_id` as its FK to `discussions`, which in turn has `merge_request_id`. There is no `noteable_id` column on `notes`. The previous plan revision incorrectly referenced `noteable_id` — this is corrected.
**Removed**: The `idx_mr_state_timestamps` composite index on `merge_requests(state, merged_at, closed_at, updated_at, created_at)` was removed. MR lookups in the scoring query are always id-driven (joining from `matched_file_changes` or `discussions`), so the state-aware CASE expression operates on rows already fetched by primary key. A low-selectivity composite index on 5 columns would consume space without improving any query path.
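The three partial indexes (those with `WHERE` clauses) keep index size minimal: they cover only non-system DiffNote rows and rows with a non-null `old_path`. `idx_mfc_new_path_project_mr` is the only full (non-partial) index in this migration.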
### 7. Test Helpers
Add timestamp-aware variants:
- `insert_mr_at(conn, id, project_id, iid, author, state, updated_at_ms)`
- `insert_diffnote_at(conn, id, discussion_id, project_id, author, file_path, body, created_at_ms)`
### 8. New Tests (TDD)
#### Example-based tests
**`test_half_life_decay_math`**: Verify the pure function:
- elapsed=0 -> 1.0
- elapsed=half_life -> 0.5
- elapsed=2*half_life -> 0.25
- half_life_days=0 -> 0.0 (guard against div-by-zero)
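The four cases above pin down the shape of the pure function. A minimal sketch, assuming a hypothetical name `half_life_decay` and millisecond timestamps (the plan's helpers use `_ms` suffixes), might look like this:

```rust
/// Exponential half-life decay: 1.0 at elapsed = 0, 0.5 after one half-life,
/// 0.25 after two. The function name and signature are hypothetical.
pub fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    // Guard: a zero half-life would divide by zero, so return 0.0 instead.
    if half_life_days == 0 {
        return 0.0;
    }
    const MS_PER_DAY: f64 = 86_400_000.0;
    // Clamp negative elapsed time (an event "in the future") to zero.
    let elapsed_days = (elapsed_ms as f64 / MS_PER_DAY).max(0.0);
    0.5_f64.powf(elapsed_days / f64::from(half_life_days))
}
```

As a sanity check on the expected values in the tests that follow: under an assumed 180-day author half-life, a 25-point signal decays to roughly 24.1 after 10 days and to 6.25 after 360 days.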
**`test_expert_scores_decay_with_time`**: Two authors, one recent (10 days), one old (360 days). Recent author should score ~24, old author ~6.
**`test_expert_reviewer_decays_faster_than_author`**: Same MR, same age (90 days). Author retains ~18 points, reviewer retains ~5 points. Author dominates clearly.
**`test_reviewer_participated_vs_assigned_only`**: Two reviewers on the same MR at the same age. One left DiffNotes (participated), one didn't (assigned-only). Participated reviewer should score ~10 * decay, assigned-only should score ~3 * decay. Verifies the split works end-to-end.
**`test_note_diminishing_returns_per_mr`**: One reviewer with 1 note on MR-A and another with 20 notes on MR-B, both at the same age. The 20-note reviewer should score `log2(1+20)/log2(1+1) = log2(21) ≈ 4.4x` the 1-note reviewer, NOT 20x. Validates the `log2(1+count)` dampening of per-MR note counts.
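The ratio in this test follows directly from the logarithmic note formula. The sketch below uses a hypothetical helper `note_score` and leaves decay out so the dampening is visible on its own:

```rust
/// Note contribution for one (user, MR) pair before decay is applied:
/// `note_bonus * log2(1 + note_count)`. The helper name is hypothetical.
pub fn note_score(note_count: u32, note_bonus: f64) -> f64 {
    note_bonus * (1.0 + f64::from(note_count)).log2()
}
```

With `note_bonus = 1.0`, one note contributes 1.0 and twenty notes contribute about 4.39, so the 20-note reviewer scores roughly 4.4x the 1-note reviewer.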
**`test_config_validation_rejects_zero_half_life`**: `ScoringConfig` with `author_half_life_days = 0` should return `ConfigInvalid` error.
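One possible shape for this validation is sketched below. Only `ScoringConfig`, `author_half_life_days`, `reviewer_assignment_half_life_days`, and the `ConfigInvalid` error name come from the plan. The error enum, its string payload, and the `validate` method are assumptions made for illustration, and the real struct carries more fields than shown here.

```rust
/// Hypothetical error type; only the `ConfigInvalid` name is from the plan.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    ConfigInvalid(String),
}

/// Trimmed-down stand-in for the real config (most fields omitted).
#[derive(Debug, Clone)]
pub struct ScoringConfig {
    pub author_half_life_days: u32,
    pub reviewer_assignment_half_life_days: u32,
}

impl ScoringConfig {
    /// Rejects any half-life of zero, which would make decay undefined.
    pub fn validate(&self) -> Result<(), ConfigError> {
        let half_lives = [
            ("author_half_life_days", self.author_half_life_days),
            (
                "reviewer_assignment_half_life_days",
                self.reviewer_assignment_half_life_days,
            ),
        ];
        for (name, days) in half_lives {
            if days == 0 {
                return Err(ConfigError::ConfigInvalid(format!(
                    "{} must be greater than 0",
                    name
                )));
            }
        }
        Ok(())
    }
}
```

Rejecting the value at config load time complements the runtime guard in the decay function: the guard prevents a panic, while validation surfaces the misconfiguration to the user.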
**`test_file_change_timestamp_uses_merged_at`**: An MR with `merged_at` set and `state = 'merged'` should use `merged_at` timestamp, not `updated_at`. Verify by setting `merged_at` to old date and `updated_at` to recent date — score should reflect the old date.
**`test_open_mr_uses_updated_at`**: An MR with `state = 'opened'` should use `updated_at` (not `created_at`). Verify that an open MR with recent `updated_at` scores higher than one with the same `created_at` but older `updated_at`.
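The plan expresses the state-aware timestamp as a SQL CASE expression. The Rust mirror below is only meant to show the selection rule these two tests exercise. The type and function names are hypothetical, and the `closed` branch (use `closed_at`, falling back to `created_at`) is an assumption, since this section only spells out the merged and opened cases plus the NULL `merged_at` fallback.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MrState {
    Opened,
    Merged,
    Closed,
}

/// Minimal view of the MR columns that feed the activity timestamp (ms).
#[derive(Debug, Clone, Copy)]
pub struct MrTimestamps {
    pub state: MrState,
    pub created_at: i64,
    pub updated_at: i64,
    pub merged_at: Option<i64>,
    pub closed_at: Option<i64>,
}

/// Picks the timestamp that anchors decay for an MR-level signal.
pub fn activity_timestamp_ms(mr: &MrTimestamps) -> i64 {
    match mr.state {
        // Merged MRs anchor on merge time; NULL merged_at falls back to created_at.
        MrState::Merged => mr.merged_at.unwrap_or(mr.created_at),
        // Open MRs reflect ongoing work, so updated_at is used.
        MrState::Opened => mr.updated_at,
        // Assumption: closed MRs use closed_at with the same fallback.
        MrState::Closed => mr.closed_at.unwrap_or(mr.created_at),
    }
}
```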
**`test_old_path_match_credits_expertise`**: Insert a DiffNote with `position_old_path = "src/old.rs"` and `position_new_path = "src/new.rs"`. Query `--path src/old.rs` — the author should appear. Query `--path src/new.rs` — same author should also appear. Validates dual-path matching.
**`test_explain_score_components_sum_to_total`**: With `--explain-score`, verify that `components.author + components.reviewer_participated + components.reviewer_assigned + components.notes` equals the reported `score_raw` (within f64 rounding tolerance). Note: the closed_mr_multiplier is already folded into the per-component subtotals, not tracked as a separate component.
**`test_as_of_produces_deterministic_results`**: Insert data at known timestamps. Run `query_expert` twice with the same `--as-of` value — results must be identical. Then run with a later `--as-of` — scores should be lower (more decay).
**`test_old_path_probe_exact_and_prefix`**: Insert a DiffNote with `position_old_path = "src/old/foo.rs"` and `position_new_path = "src/new/foo.rs"`. Call `build_path_query(conn, "src/old/foo.rs")` — should resolve as exact file (not "not found"). Call `build_path_query(conn, "src/old/")` — should resolve as prefix. Validates that the path resolution probes now check old_path columns.
**`test_suffix_probe_uses_old_path_sources`**: Insert a file change with `old_path = "legacy/utils.rs"` and `new_path = "src/utils.rs"`. Call `build_path_query(conn, "legacy/utils.rs")` — should resolve via exact probe on old_path. Call `build_path_query(conn, "utils.rs")` — suffix probe should find both `legacy/utils.rs` and `src/utils.rs` and either resolve uniquely (if deduplicated) or report ambiguity.
**`test_since_relative_to_as_of_clock`**: Insert data at timestamps T1 and T2 (T2 > T1). With `--as-of T2` and `--since 30d`, the window is `[T2 - 30d, T2]`, not `[now - 30d, now]`. Verify that data at T1 is included or excluded based on the as-of-relative window, not the wall clock window.
**`test_explain_and_detail_are_mutually_exclusive`**: Parsing `--explain-score --detail` should fail with a conflict error from clap.
**`test_trivial_note_does_not_count_as_participation`**: A reviewer who left only a short note ("LGTM", 4 chars) on an MR should be classified as assigned-only, not participated, when `reviewer_min_note_chars = 20`. A reviewer who left a substantive note (>= 20 chars) should be classified as participated. Validates the LENGTH threshold in the `reviewer_participation` CTE.
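The classification rule can be sketched in Rust as follows. In the plan the check lives in SQL (`LENGTH(TRIM(body))` inside the `reviewer_participation` CTE); the enum and function below are hypothetical and only illustrate the threshold logic.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReviewerKind {
    Participated,
    AssignedOnly,
}

/// A reviewer counts as participated only if at least one of their DiffNotes
/// on the MR has a trimmed length of `reviewer_min_note_chars` or more.
pub fn classify_reviewer(diffnote_bodies: &[&str], reviewer_min_note_chars: usize) -> ReviewerKind {
    let has_substantive_note = diffnote_bodies
        .iter()
        // Count characters (not bytes), analogous to SQLite's LENGTH on text.
        .any(|body| body.trim().chars().count() >= reviewer_min_note_chars);
    if has_substantive_note {
        ReviewerKind::Participated
    } else {
        ReviewerKind::AssignedOnly
    }
}
```

Because the function always returns exactly one variant, a reviewer can never land in both buckets or in neither, which is the property the exhaustiveness invariant test checks.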
**`test_closed_mr_multiplier`**: Two identical MRs (same author, same age, same path). One is `merged`, one is `closed`. The merged MR should contribute `author_weight * decay(...)`, the closed MR should contribute `author_weight * closed_mr_multiplier * decay(...)`. With default multiplier 0.5, the closed MR contributes half.
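As a small illustration of the arithmetic in this test, the multiplier can be treated as one extra factor on the decayed weight. The function name and the boolean parameter are hypothetical.

```rust
/// Author contribution for one MR. `decay` is the already-computed decay
/// factor; the multiplier applies only to closed-without-merge MRs.
pub fn author_contribution(
    author_weight: f64,
    decay: f64,
    closed_without_merge: bool,
    closed_mr_multiplier: f64,
) -> f64 {
    let state_factor = if closed_without_merge {
        closed_mr_multiplier
    } else {
        1.0
    };
    author_weight * state_factor * decay
}
```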
**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the exclusive upper-bound (`< ?4`) filtering in SQL.
**`test_as_of_exclusive_upper_bound`**: Insert an event with timestamp exactly equal to the `as_of_ms` value. Verify it is excluded from results (strict less-than, not less-than-or-equal). This validates the half-open interval `[since, as_of)` semantics.
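The window semantics tested here and in `test_since_relative_to_as_of_clock` can be summarized in a few lines. The `Window` type below is a hypothetical sketch; in the plan the bounds are enforced by SQL predicates.

```rust
const MS_PER_DAY: i64 = 86_400_000;

/// Half-open scoring window `[since_ms, as_of_ms)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Window {
    pub since_ms: i64,
    pub as_of_ms: i64,
}

impl Window {
    /// Anchors a relative `--since` (in days) to the as-of clock rather than
    /// the wall clock.
    pub fn relative(as_of_ms: i64, since_days: i64) -> Self {
        Window {
            since_ms: as_of_ms - since_days * MS_PER_DAY,
            as_of_ms,
        }
    }

    /// Inclusive lower bound, exclusive upper bound.
    pub fn contains(&self, ts_ms: i64) -> bool {
        self.since_ms <= ts_ms && ts_ms < self.as_of_ms
    }
}
```

An event stamped exactly at `as_of_ms` is outside the window, while one stamped exactly at `since_ms` is inside it.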
**`test_excluded_usernames_filters_bots`**: Insert signals for a user named "renovate-bot" and a user named "jsmith", both with the same activity. With `excluded_usernames: ["renovate-bot"]` in config, only "jsmith" should appear in results. Validates the Rust-side post-query filtering.
**`test_include_bots_flag_disables_filtering`**: Same setup as above, but with `--include-bots` active. Both "renovate-bot" and "jsmith" should appear in results.
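A minimal sketch of the Rust-side post-filter these two tests describe is shown below. The function name and signature are hypothetical, and the comparison is ASCII case-insensitive, which is a simplifying assumption.

```rust
/// Drops usernames that appear in `excluded_usernames` (exact match,
/// ASCII case-insensitive) unless `include_bots` is set.
pub fn filter_excluded(
    usernames: Vec<String>,
    excluded_usernames: &[&str],
    include_bots: bool,
) -> Vec<String> {
    if include_bots {
        return usernames;
    }
    usernames
        .into_iter()
        .filter(|user| {
            !excluded_usernames
                .iter()
                .any(|excluded| excluded.eq_ignore_ascii_case(user))
        })
        .collect()
}
```

Filtering after the query keeps the SQL unchanged, at the cost of scoring rows for users that are then discarded.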
**`test_null_timestamp_fallback_to_created_at`**: Insert a merged MR with `merged_at = NULL` (edge case: old data before the column was populated). The state-aware timestamp should fall back to `created_at`. Verify the score reflects `created_at`, not 0 or a panic.
#### Invariant tests (regression safety for ranking systems)
**`test_score_monotonicity_by_age`**: For any single signal type, an older timestamp must never produce a higher score than a newer timestamp with the same weight and half-life. Generate N random (age, half_life) pairs and assert `decay(older) <= decay(newer)` for all.
**`test_row_order_independence`**: Insert the same set of signals in two different orders (e.g., reversed). Run `query_expert` on both — the resulting rankings (username order + scores) must be identical. Validates that neither SQL ordering nor HashMap iteration order affects final output.
**`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.
**`test_deterministic_accumulation_order`**: Insert signals for a user with contributions at many different timestamps (10+ MRs with varied ages). Run `query_expert` 100 times in a loop. All 100 runs must produce the exact same `f64` score (bit-identical). Validates that the sorted contribution ordering eliminates HashMap-iteration-order nondeterminism.
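The ordering guarantee behind this test can be sketched as a sort followed by a plain sum. The function name and the `(mr_id, contribution)` tuple layout are hypothetical. The tie-break on the value is an addition beyond the plan's "sort by `mr_id`", included so the result stays order-independent even if one MR contributes more than one entry.

```rust
/// Sums contributions for one signal type in a fixed order so the f64 result
/// does not depend on the order in which rows or map entries arrived.
pub fn sum_contributions(mut contributions: Vec<(i64, f64)>) -> f64 {
    // Sort by mr_id; tie-break on the value (an assumption beyond the plan)
    // so duplicate mr_ids cannot reintroduce input-order dependence.
    contributions.sort_by(|a, b| a.0.cmp(&b.0).then(a.1.total_cmp(&b.1)));
    contributions.iter().map(|&(_, value)| value).sum()
}
```

Floating-point addition is not associative, so summing the same values in a different order can change the last bits of the result. Fixing the order removes that source of variance without compensated summation.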
### 9. Existing Test Compatibility
All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.
The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added so that the new fields are filled in: the half-life fields, `reviewer_assignment_weight`, `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, `reviewer_min_note_chars`, and `excluded_usernames`.
## Verification
1. `cargo check --all-targets` — no compiler errors
2. `cargo clippy --all-targets -- -D warnings` — no lints
3. `cargo fmt --check` — formatting clean
4. `cargo test` — all existing + new tests pass (including invariant tests)
5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings
6. Manual query plan verification (not automated — SQLite planner varies across versions):
- Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database
- Confirm that `matched_notes_raw` branch 1 uses the existing new_path index and branch 2 uses `idx_notes_old_path_author` (not a full table scan on either branch)
- Confirm that `matched_file_changes_raw` branch 1 uses `idx_mfc_new_path_project_mr` and branch 2 uses `idx_mfc_old_path_project_mr`
- Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author`
- Document the observed plan in a comment near the SQL for future regression reference
7. Performance baseline (manual, not CI-gated):
- Run `time cargo run --release -- who --path <exact-path>` on the real database for exact, prefix, and suffix modes
- Target SLOs: p95 exact path < 200ms, prefix < 300ms, suffix < 500ms on development hardware
- Record baseline timings as a comment near the SQL for regression reference
- If any mode exceeds 2x the baseline after future changes, investigate before merging
- Note: These are soft targets for developer awareness, not automated CI gates. Automated benchmarking with synthetic fixtures (100k/1M/5M notes) is a v2 investment if performance becomes a real concern.
8. Real-world validation:
- `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors
- Spot-check that assigned-only reviewers (those who never left DiffNotes) rank below participated reviewers on the same MR
- Test a known renamed file path — verify expertise from the old name carries forward
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs
- Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
- Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score`
- If the project has known bot accounts (e.g., renovate-bot), add them to `excluded_usernames` config and verify they no longer appear in results. Run again with `--include-bots` to confirm they reappear.
## Accepted from External Review
Ideas incorporated from ChatGPT review (feedback-1 through feedback-5) that genuinely improved the plan:
**From feedback-1 and feedback-2:**
- **Path rename awareness (old_path matching)**: Real correctness gap. Both `position_old_path` and `mr_file_changes.old_path` exist in the schema. Simple `OR` clause addition with high value — expertise now survives file renames.
- **Hybrid SQL pre-aggregation**: Revised from "fully raw rows" to pre-aggregate note counts per (user, MR) in SQL. MR-level signals were already 1-per-MR; the note rows were the actual scalability risk. Bounded row counts with predictable memory.
- **State-aware timestamps**: Improved from our overly-simple `COALESCE(merged_at, created_at)` to a state-aware CASE expression. Open MRs genuinely need `updated_at` to reflect ongoing work; merged MRs need `merged_at` to anchor expertise formation.
- **Index migration**: The dual-path matching and CTE patterns need index support. Added partial indexes to keep size minimal.
- **Invariant tests**: `test_score_monotonicity_by_age`, `test_row_order_independence`, `test_reviewer_split_is_exhaustive` catch subtle ranking regressions that example-based tests miss.
- **`--as-of` flag**: Simple clock-pinning for reproducible decay scoring. Essential for debugging and test determinism.
- **`--explain-score` flag**: Moved from rejected to included with minimal scope (component breakdown only, no per-MR drill-down). Multi-signal scoring needs decomposition to build trust.
**From feedback-3:**
- **Fix `noteable_id` index bug (critical)**: The `notes` table uses `discussion_id` as FK to `discussions`, not `noteable_id` (which doesn't exist). The proposed `idx_notes_mr_path_author` index would fail at migration time. Fixed to use `(discussion_id, author_username, created_at)`.
- **CTE-based dual-path matching (`matched_notes`, `matched_file_changes`)**: Rather than repeating `OR old_path` in every signal subquery, centralize path matching in foundational CTEs. Defined once, indexed once, maintained once. Cleaner and more extensible.
- **Precomputed `reviewer_participation` CTE**: Replaced correlated `EXISTS`/`NOT EXISTS` subqueries with a materialized set of `(mr_id, username)` pairs. Same semantics, lower query cost, simpler reasoning about the reviewer split.
- **`log2(1+count)` over `ln(1+count)` for notes**: With `log2`, one note contributes exactly 1.0 unit (since `log2(2) = 1`), making `note_bonus=1` directly interpretable. `ln` gives 0.69 per note, which is unintuitive.
- **Path resolution probe rename awareness**: The plan added `old_path` matching to scoring queries but missed the upstream path resolution layer (`build_path_query()` probes and `suffix_probe()`). Without this, querying an old path name fails at resolution and never reaches scoring. Now both probes check old_path columns.
- **Removed low-selectivity `idx_mr_state_timestamps`**: MR lookups in scoring are id-driven (from file_changes or discussions), so a 5-column composite on state/timestamps adds no query benefit.
- **Added `idx_mfc_new_path_project_mr`**: Ensures index parity between old and new path columns on `mr_file_changes`.
- **`--explain-score` conflicts with `--detail`**: Prevents confusing overlapping output from two per-user augmentation flags.
- **`scoring_model_version` in resolved_input**: Lets robot clients distinguish v1 (flat weights) from v2 (decayed) output schemas.
- **`score_raw` in explain mode**: Exposes the unrounded f64 so component totals can be verified without rounding noise.
- **New tests**: `test_old_path_probe_exact_and_prefix`, `test_suffix_probe_uses_old_path_sources`, `test_since_relative_to_as_of_clock`, `test_explain_and_detail_are_mutually_exclusive`, `test_null_timestamp_fallback_to_created_at` — cover the newly-identified gaps in path resolution, clock semantics, and edge cases.
- **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).
**From feedback-4:**
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added an upper bound on `?4` to all SQL timestamp filters so that later events are excluded. The bound was inclusive in this round and became the exclusive `< ?4` in feedback-5, giving the final query window `[since_ms, as_of_ms)`. Without an upper bound, `--as-of` reproducibility was fundamentally broken.
- **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
- **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments.
- **UNION ALL optimization for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Originally documented as a fallback; promoted to default strategy in feedback-5 iteration. The UNION ALL + dedup approach ensures each index branch is used independently.
- **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round.
**From feedback-5 (ChatGPT review):**
- **Exclusive upper bound for `--as-of`**: Changed from `[since_ms, as_of_ms]` (inclusive) to `[since_ms, as_of_ms)` (exclusive). Half-open intervals are the standard convention in temporal systems — they eliminate edge-case ambiguity when events have timestamps exactly at the boundary. Also added `YYYY-MM-DD` → end-of-day UTC parsing and window metadata in robot output.
- **UNION ALL as default for dual-path matching**: Promoted from "fallback if planner regresses" to default strategy. SQLite `OR`-across-indexed-columns degradation is common enough that the predictable UNION ALL + dedup approach is the safer starting point. The simpler `OR` variant is retained as a comment for benchmarking.
- **Deterministic contribution ordering**: Within each signal type, sort contributions by `mr_id` before summing. This eliminates HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the overhead of compensated summation (Neumaier/Kahan was rejected as overkill at this scale).
- **Minimal bot/service-account filtering**: Added `excluded_usernames` (exact match, case-insensitive) to `ScoringConfig` and `--include-bots` CLI flag. Applied as a Rust-side post-filter (not SQL) to keep queries clean. Scope is deliberately minimal — no regex patterns, no heuristic detection. Users configure the list for their team's specific bots.
- **Performance baseline SLOs**: Added manual performance baseline step to verification — record timings for exact/prefix/suffix modes and flag >2x regressions. Kept lightweight (no CI gating, no synthetic benchmarks) to match the project's current maturity.
- **New tests**: `test_as_of_exclusive_upper_bound`, `test_excluded_usernames_filters_bots`, `test_include_bots_flag_disables_filtering`, `test_deterministic_accumulation_order` — cover the newly-accepted features.
## Rejected Ideas (with rationale)
These suggestions were considered during review but explicitly excluded from this iteration:
- **Rename alias chain expansion (A->B->C traversal)** (feedback-2 #2, feedback-4 #4): Over-engineered for v1. The old_path `OR` match covers the 80% case (direct renames). Building a canonical path identity table at ingest time adds schema, ingestion logic, and graph traversal complexity for rare multi-hop renames. If real-world usage shows fragmented expertise on multi-rename files, this becomes a v2 feature.
- **Config-driven `max_age_days`** (feedback-1 #5, feedback-2 #5): We already have `--since` (explicit window), `--all-history` (no window), and the 24m default (mathematically justified). Adding a config field that derives the default since window creates confusing interaction between config and CLI flags. If half-lives change, updating the default constant is trivial.
- **Config-driven `decay_floor` for derived `--since` default** (feedback-3 #4): Proposed computing the default since window as `ceil(max_half_life * log2(1/floor))` so it auto-adjusts when half-lives change. Rejected: the formula is non-obvious to users, adds a config param (`decay_floor`) with no intuitive meaning, and the benefit is negligible — half-life changes are rare, and updating a constant is trivial. The 24m default is already mathematically justified and easy to override with `--since` or `--all-history`.
- **BTreeMap + Kahan/Neumaier compensated summation** (feedback-3 #6): Proposed deterministic iteration order and numerically stable summation. Rejected at the time for this scale: the accumulator processes dozens to low hundreds of entries per user, where HashMap iteration order doesn't measurably affect f64 sums. Compensated summation adds code complexity for zero practical benefit at this magnitude. If we eventually aggregate thousands of signals per user, revisit. (The ordering half was later accepted in a lighter form in feedback-5: contributions are sorted by `mr_id` before summing. Compensated summation remains rejected.)
- **Confidence/coverage metadata** (feedback-1 #8, feedback-2 #8, feedback-3 #9, feedback-4 #6): Repeatedly proposed across reviews with variations (score_adjusted with confidence factor, low/medium/high labels, evidence_mr_count weighting). Still scope creep. The `--explain-score` component breakdown already tells users which signal drives the score. Defining "sparse evidence" thresholds (how many MRs is "low"? what's the right exponential saturation constant?) is domain-specific guesswork without user feedback data. A single recent MR "outranking broader expertise" is the *correct* behavior of time-decay — the model intentionally weights recency. If real-world usage shows this is a problem, confidence becomes a v2 feature informed by actual threshold data.
- **Automated EXPLAIN QUERY PLAN tests** (feedback-3 #10 partial): SQLite's query planner changes across versions and can use different plans on different data distributions. Automated assertions on plan output are brittle. Instead, we document EXPLAIN QUERY PLAN as a manual verification step during development and include the observed plan as a comment near the SQL.
- **Per-MR evidence drill-down in `--explain-score`** (feedback-2 #7 promoted this): The v1 `--explain-score` shows component totals only. Listing top-evidence MRs per user would require additional SQL queries and significant output format work. Deferred unless component breakdowns prove insufficient for debugging.
- **Split scoring engine into core module** (feedback-4 #5): Proposed extracting scoring math from `who.rs` into `src/core/scoring/model_v2_decay.rs`. Premature modularization — `who.rs` is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.
- **Bot/service-account filtering** (feedback-4 #7): Real concern but orthogonal to time-decay scoring. As proposed, this is a general data quality feature that belongs in its own issue, since it affects all `who` modes and not just expert scoring. Adding `excluded_username_patterns` config and an `--include-bots` flag is scope expansion that should be designed and tested independently. (Partly superseded: feedback-5 accepted a minimal exact-match `excluded_usernames` list with `--include-bots`; pattern-based matching remains rejected.)
- **Model compare mode / rank-delta diagnostics** (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The `--explain-score` + `--as-of` combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.
- **Canonical path identity graph** (feedback-5 #1, also feedback-2 #2, feedback-4 #4): Third time proposed, third time rejected. Building a rename graph from `mr_file_changes(old_path, new_path)` with identity resolution requires new schema (`path_identities`, `path_aliases` tables), ingestion pipeline changes, graph traversal at query time, and backfill logic for existing data. The UNION ALL dual-path matching already covers the 80%+ case (direct renames). Multi-hop rename chains (A→B→C) are rare in practice and can be addressed in v2 with real usage data showing the gap matters.
- **Normalized `expertise_events` table** (feedback-5 #2): Proposes shifting from query-time CTE joins to a precomputed `expertise_events` table populated at ingest time. While architecturally appealing for read performance, this doubles the data surface area (raw tables + derived events), requires new ingestion pipelines with incremental upsert logic, backfill tooling for existing databases, and introduces consistency risks when raw data is corrected/re-synced. The CTE approach is correct, maintainable, and performant at our current scale. If query latency becomes a real bottleneck (see performance baseline SLOs), materialized views or derived tables become a v2 optimization.
- **Reviewer engagement model upgrade** (feedback-5 #3): Proposes adding `approved`/`changes_requested` review-state signals and trivial-comment pattern matching (`["lgtm","+1","nit","ship it"]`). Expands the signal type count from 4 to 6 and adds a fragile pattern-matching layer (what about "don't ship it"? "lgtm but..."?). The `reviewer_min_note_chars` threshold is imperfect but pragmatic — it's a single configurable number with no false-positive risk from substring matching. Review-state signals may be worth adding later as a separate enhancement when we have data on how often they diverge from DiffNote participation.
- **Contribution-floor auto cutoff for `--since`** (feedback-5 #5): Proposes `--since auto` computing the earliest relevant timestamp from `min_contribution_floor` (e.g., 0.01 points). Adds a non-obvious config parameter for minimal benefit — the 24m default is already mathematically justified from the decay curves (author: 6%, reviewer: 0.4% at 2 years) and easily overridden with `--since` or `--all-history`. The auto-derivation formula (`ceil(max_half_life * log2(1/floor))`) is opaque to users who just want to understand why a certain time range was selected.
- **Full evidence drill-down in `--explain-score`** (feedback-5 #8): Proposes `--explain-score=summary|full` with per-MR evidence rows. Already rejected in feedback-2 #7. Component totals are sufficient for v1 debugging — they answer "which signal type drives this user's score." Per-MR drill-down requires additional SQL queries and significant output format complexity. Deferred unless component breakdowns prove insufficient.
- **Neumaier compensated summation** (feedback-5 #7 partial): Accepted the sorting aspect for deterministic ordering, but rejected Neumaier/Kahan compensated summation. At the scale of dozens to low hundreds of contributions per user, the rounding error from naive f64 summation is on the order of 1e-14 — several orders of magnitude below any meaningful score difference. Compensated summation adds code complexity and a maintenance burden for no practical benefit at this scale.
- **Automated CI benchmark gate** (feedback-5 #10 partial): Accepted manual performance baselines, but rejected automated CI regression gating with synthetic fixtures (100k/1M/5M notes). Building and maintaining benchmark infrastructure is a significant investment that's premature for a CLI tool with ~3 users. Manual timing checks during development are sufficient until performance becomes a real concern.