gitlore/plans/time-decay-expert-scoring.md
teernisse ffd074499a (2026-02-12 11:21:32 -05:00)

Time-decay expert scoring: Round 6 feedback on the weighted scoring
model for the `who` command's expert mode, covering decay curves,
activity normalization, and bot filtering thresholds.

| plan | title | status | iteration | target_iterations | beads_revision | related_plans | created | updated |
|------|-------|--------|-----------|-------------------|----------------|---------------|---------|---------|
| true |       | iterating | 6 | 8 | 1 |  | 2026-02-08 | 2026-02-12 |

# Time-Decay Expert Scoring Model

## Context

The `lore who --path` command currently uses flat weights to score expertise: each authored MR counts as 25 points, each reviewed MR as 10, and each inline note as 1 — regardless of when the activity happened. This produces three compounding problems:

  1. Temporal blindness: Old activity counts the same as recent activity. Someone who authored a file 2 years ago ranks equivalently to someone who wrote it last week.
  2. Reviewer inflation: Senior reviewers (jdefting, zhayes) who rubber-stamp every MR via assignment accumulate inflated scores indistinguishable from reviewers who actually left substantive inline feedback. The mr_reviewers table captures assignment, not engagement.
  3. Path-history blindness: Renamed or moved files lose historical expertise because signal matching relies on position_new_path and mr_file_changes.new_path only. A developer who authored the file under its previous name gets zero credit after a rename.

The fix has three parts:

  • Apply exponential half-life decay to each signal, grounded in cognitive science research
  • Split the reviewer signal into "participated" (left DiffNotes) vs "assigned-only" (in mr_reviewers but no inline comments), with different weights and decay rates
  • Match both old and new paths in all signal queries AND path resolution probes so expertise survives file renames

## Research Foundation

  • Ebbinghaus Forgetting Curve (1885): Memory retention follows exponential decay: R = 2^(-t/h) where h is the half-life
  • Generation Effect (Slamecka & Graf, 1978): Producing information (authoring code) creates ~2x more durable memory traces than reading it (reviewing)
  • Levels of Processing (Craik & Lockhart, 1972): Deeper cognitive engagement creates more durable memories — authoring > reviewing > commenting
  • Half-Life Regression (Settles & Meeder, 2016, Duolingo): Exponential decay with per-signal-type half-lives is practical and effective at scale. Chosen over power law for additivity, bounded behavior, and intuitive parameterization
  • Fritz et al. (2010, ICSE): "Degree-of-knowledge" model for code familiarity considers both authoring and interaction events with time-based decay

## Scoring Formula

```
score(user, path) = Sum_i( weight_i * 2^(-days_elapsed_i / half_life_i) )
```

For note signals grouped per MR, a diminishing-returns function caps comment storms:

```
note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
```

Why log2 instead of ln? With log2, a single note contributes exactly note_bonus * 1.0 (since log2(2) = 1), making the note_bonus weight directly interpretable as "points per note at count=1." With ln, one note contributes note_bonus * 0.69, which is unintuitive and means note_bonus=1 doesn't actually mean "1 point per note." The diminishing-returns curve shape is identical — only the scale factor differs.
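The log2 choice can be made concrete with a small sketch. The formula and defaults follow this plan; the function name and float-typed parameters are illustrative, not the actual who.rs code:

```rust
// Sketch of the per-MR note contribution described above.
// note_bonus=1.0 and half_life=45 days are the plan's defaults.
fn note_contribution(note_bonus: f64, note_count: u32, days_elapsed: f64, half_life_days: f64) -> f64 {
    let diminishing = (1.0 + f64::from(note_count)).log2(); // log2(1 + count)
    let decay = 2.0_f64.powf(-days_elapsed / half_life_days);
    note_bonus * diminishing * decay
}

fn main() {
    // One fresh note contributes exactly note_bonus, since log2(1 + 1) = 1.
    assert!((note_contribution(1.0, 1, 0.0, 45.0) - 1.0).abs() < 1e-12);
    // A 30-comment thread on one MR earns ~4.95x the credit, not 30x.
    let storm = note_contribution(1.0, 30, 0.0, 45.0);
    assert!(storm > 4.9 && storm < 5.0);
    // One half-life later, the contribution halves.
    assert!((note_contribution(1.0, 1, 45.0, 45.0) - 0.5).abs() < 1e-12);
}
```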

Per-signal contributions (each signal is either per-MR or per-note-group):

| Signal Type | Base Weight | Half-Life | Rationale |
|-------------|-------------|-----------|-----------|
| Author (authored MR touching path) | 25 | 180 days | Deep generative engagement; ~50% retention at 6 months |
| Reviewer Participated (left DiffNote on MR/path) | 10 | 90 days | Active review engagement; ~50% at 3 months |
| Reviewer Assigned-Only (in mr_reviewers, no DiffNote on path) | 3 | 45 days | Passive assignment; minimal cognitive engagement, fades fast |
| Note (inline DiffNotes on path, grouped per MR) | 1 | 45 days | log2(1+count) per MR; diminishing returns prevent comment storms |

Why split reviewers? The mr_reviewers table records assignment, not engagement. A reviewer who left 5 inline comments on a file has demonstrably more expertise than one who was merely assigned and clicked "approve." The participated signal inherits the old reviewer weight (10) and decay (90 days); the assigned-only signal gets reduced weight (3) and faster decay (45 days) — enough to register but not enough to inflate past actual contributors.

Why require substantive notes? Participation is qualified by a minimum note body length (reviewer_min_note_chars, default 20). Without this, a single "LGTM" or "+1" comment would promote a reviewer from the 3-point assigned-only tier to the 10-point participated tier — a 3.3x weight increase for zero substantive engagement. The threshold is configurable to accommodate teams with different review conventions.

Why cap notes per MR? Without diminishing returns, a back-and-forth thread of 30 comments on a single MR would score 30 note points — disproportionate to the expertise gained. log2(1 + 30) ≈ 4.95 vs log2(1 + 1) = 1.0 preserves the signal that more comments = more engagement while preventing outlier MRs from dominating. The 30-note reviewer gets ~5x the credit of a 1-note reviewer, not 30x.

Author/reviewer signals are deduplicated per MR (one signal per distinct MR). Note signals are grouped per (user, MR) and use log2(1 + count) scaling.

Why include closed MRs? Closed-without-merge MRs represent real review effort and code familiarity even though the code was abandoned. All signals from closed MRs are multiplied by closed_mr_multiplier (default 0.5) to reflect this reduced but non-zero contribution. This applies uniformly to author, reviewer, and note signals on closed MRs.

## Files to Modify

  1. src/core/config.rs — Add half-life fields + assigned-only reviewer config to ScoringConfig; add config validation
  2. src/cli/commands/who.rs — Core changes:
    • Add half_life_decay() pure function
    • Add normalize_query_path() for input canonicalization before path resolution
    • Restructure query_expert(): SQL returns hybrid-aggregated signal rows with timestamps and state multiplier (MR-level for author/reviewer, note-count-per-MR for notes), Rust applies decay + log2(1+count) + final ranking
    • Match both new_path and old_path in all signal queries (rename awareness)
    • Extend rename awareness to build_path_query() probes and suffix_probe() (not just scoring)
    • Split reviewer signal into participated vs assigned-only
    • Use state-aware timestamps (merged_at for merged MRs, updated_at for open MRs)
    • Change default --since from "6m" to "24m" (2 years captures all meaningful decayed signals)
    • Add --as-of flag for reproducible scoring at a fixed timestamp
    • Add --explain-score flag for per-user score component breakdown
    • Add --include-bots flag to disable bot/service-account filtering
    • Sort on raw f64 score, round only for display
    • Update tests
  3. src/core/db.rs — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)

## Implementation Details

### 1. ScoringConfig (config.rs)

Add half-life fields and the new assigned-only reviewer signal. All new fields use #[serde(default)] for backward compatibility:

```rust
pub struct ScoringConfig {
    pub author_weight: i64,                    // default: 25
    pub reviewer_weight: i64,                  // default: 10  (participated — left DiffNotes)
    pub reviewer_assignment_weight: i64,       // default: 3   (assigned-only — no DiffNotes on path)
    pub note_bonus: i64,                       // default: 1
    pub author_half_life_days: u32,            // default: 180
    pub reviewer_half_life_days: u32,          // default: 90  (participated)
    pub reviewer_assignment_half_life_days: u32, // default: 45 (assigned-only)
    pub note_half_life_days: u32,              // default: 45
    pub closed_mr_multiplier: f64,             // default: 0.5  (applied to closed-without-merge MRs)
    pub reviewer_min_note_chars: u32,          // default: 20   (minimum note body length to count as participation)
    pub excluded_usernames: Vec<String>,       // default: []    (exact-match usernames to exclude, e.g. ["renovate-bot", "gitlab-ci"])
}
```

Config validation: Add a validate_scoring() call in Config::load_from_path() after deserialization:

  • All *_half_life_days must be > 0 and <= 3650 (prevents division by zero in decay function; rejects absurd 10+ year half-lives that would effectively disable decay)
  • All *_weight / *_bonus must be >= 0 (negative weights produce nonsensical scores)
  • closed_mr_multiplier must be finite (not NaN/Inf) and in (0.0, 1.0] (0 would discard closed MRs entirely; >1 would over-weight them; NaN/Inf would propagate through all scores)
  • reviewer_min_note_chars must be >= 0 and <= 4096 (0 disables the filter; 4096 is a sane upper bound — no real review comment needs to be longer to qualify; typical useful values: 10-50)
  • excluded_usernames entries must be non-empty strings (no blank entries)
  • Return LoreError::ConfigInvalid with a clear message on failure
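A minimal sketch of these rules (a standalone subset: field names follow the plan, but the String error type stands in for LoreError::ConfigInvalid, and only one half-life field is shown):

```rust
// Illustrative subset of ScoringConfig with the validation rules above.
struct ScoringConfig {
    author_half_life_days: u32,
    closed_mr_multiplier: f64,
    reviewer_min_note_chars: u32,
    excluded_usernames: Vec<String>,
}

fn validate_scoring(c: &ScoringConfig) -> Result<(), String> {
    // Half-lives: > 0 (no division by zero) and <= 3650 (decay stays meaningful).
    if c.author_half_life_days == 0 || c.author_half_life_days > 3650 {
        return Err("author_half_life_days must be in 1..=3650".into());
    }
    // Multiplier: finite and in (0.0, 1.0]; NaN/Inf would poison every score.
    if !c.closed_mr_multiplier.is_finite()
        || c.closed_mr_multiplier <= 0.0
        || c.closed_mr_multiplier > 1.0
    {
        return Err("closed_mr_multiplier must be finite and in (0.0, 1.0]".into());
    }
    // 0 disables the participation filter; 4096 is the sane upper bound.
    if c.reviewer_min_note_chars > 4096 {
        return Err("reviewer_min_note_chars must be <= 4096".into());
    }
    // No blank exclusion entries.
    if c.excluded_usernames.iter().any(|u| u.trim().is_empty()) {
        return Err("excluded_usernames entries must be non-empty".into());
    }
    Ok(())
}

fn main() {
    let ok = ScoringConfig {
        author_half_life_days: 180,
        closed_mr_multiplier: 0.5,
        reviewer_min_note_chars: 20,
        excluded_usernames: vec!["renovate-bot".into()],
    };
    assert!(validate_scoring(&ok).is_ok());
    assert!(validate_scoring(&ScoringConfig { closed_mr_multiplier: f64::NAN, ..ok }).is_err());
}
```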

### 2. Decay Function (who.rs)

```rust
fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    let days = (elapsed_ms as f64 / 86_400_000.0).max(0.0);
    let hl = f64::from(half_life_days);
    if hl <= 0.0 { return 0.0; }
    2.0_f64.powf(-days / hl)
}
```
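A few spot checks of the decay function's expected behavior. The assertions restate properties claimed elsewhere in this plan (full weight at t = 0, 50% at one half-life, ~6% residual author weight at 2 years); the function body is reproduced from above so the sketch is self-contained:

```rust
fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    let days = (elapsed_ms as f64 / 86_400_000.0).max(0.0);
    let hl = f64::from(half_life_days);
    if hl <= 0.0 { return 0.0; }
    2.0_f64.powf(-days / hl)
}

fn main() {
    const DAY_MS: i64 = 86_400_000;
    // Fresh signal: full weight.
    assert_eq!(half_life_decay(0, 180), 1.0);
    // Exactly one half-life: 50% weight.
    assert!((half_life_decay(180 * DAY_MS, 180) - 0.5).abs() < 1e-12);
    // Negative elapsed (clock skew / future timestamp) clamps to full weight.
    assert_eq!(half_life_decay(-5 * DAY_MS, 180), 1.0);
    // At 2 years an author signal retains ~6%; this motivates the 24m --since default.
    let two_years = half_life_decay(730 * DAY_MS, 180);
    assert!(two_years > 0.05 && two_years < 0.07);
}
```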

### 3. SQL Restructure (who.rs)

The SQL uses CTE-based dual-path matching, a centralized mr_activity CTE, and hybrid aggregation. Rather than repeating OR old_path in every signal subquery, two foundational CTEs (matched_notes, matched_file_changes) centralize path matching. A mr_activity CTE centralizes the state-aware timestamp and state multiplier in one place, eliminating repetition of the CASE expression across signals 3, 4a, 4b. A fourth CTE (reviewer_participation) precomputes which reviewers actually left DiffNotes, avoiding correlated EXISTS/NOT EXISTS subqueries.

MR-level signals return one row per (username, signal, mr_id) with a timestamp and state multiplier; note signals return one row per (username, mr_id) with note_count and max_ts. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and log2(1+count).

```sql
WITH matched_notes_raw AS (
  -- Branch 1: match on new_path (uses idx_notes_new_path or equivalent)
  SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
  FROM notes n
  WHERE n.note_type = 'DiffNote'
    AND n.is_system = 0
    AND n.author_username IS NOT NULL
    AND n.created_at >= ?2
    AND n.created_at < ?4
    AND (?3 IS NULL OR n.project_id = ?3)
    AND n.position_new_path {path_op}
  UNION ALL
  -- Branch 2: match on old_path (uses idx_notes_old_path_author)
  SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
  FROM notes n
  WHERE n.note_type = 'DiffNote'
    AND n.is_system = 0
    AND n.author_username IS NOT NULL
    AND n.created_at >= ?2
    AND n.created_at < ?4
    AND (?3 IS NULL OR n.project_id = ?3)
    AND n.position_old_path {path_op}
),
matched_notes AS (
  -- Dedup: prevent double-counting when old_path = new_path (no rename)
  SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
  FROM matched_notes_raw
),
matched_file_changes_raw AS (
  -- Branch 1: match on new_path (uses idx_mfc_new_path_project_mr)
  SELECT fc.merge_request_id, fc.project_id
  FROM mr_file_changes fc
  WHERE (?3 IS NULL OR fc.project_id = ?3)
    AND fc.new_path {path_op}
  UNION ALL
  -- Branch 2: match on old_path (uses idx_mfc_old_path_project_mr)
  SELECT fc.merge_request_id, fc.project_id
  FROM mr_file_changes fc
  WHERE (?3 IS NULL OR fc.project_id = ?3)
    AND fc.old_path {path_op}
),
matched_file_changes AS (
  -- Dedup: prevent double-counting when old_path = new_path (no rename)
  SELECT DISTINCT merge_request_id, project_id
  FROM matched_file_changes_raw
),
mr_activity AS (
  -- Centralized state-aware timestamps and state multiplier.
  -- Defined once, referenced by all file-change-based signals (3, 4a, 4b).
  -- Scoped to MRs matched by file changes to avoid materializing the full MR table.
  SELECT DISTINCT
    m.id AS mr_id,
    m.author_username,
    m.state,
    CASE
      WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
      WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
      ELSE COALESCE(m.updated_at, m.created_at)
    END AS activity_ts,
    CASE WHEN m.state = 'closed' THEN ?5 ELSE 1.0 END AS state_mult
  FROM merge_requests m
  JOIN matched_file_changes mfc ON mfc.merge_request_id = m.id
  WHERE m.state IN ('opened','merged','closed')
),
reviewer_participation AS (
  -- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
  -- Materialized once, then joined against mr_reviewers to classify.
  -- The LENGTH filter excludes trivial notes ("LGTM", "+1", emoji-only) from qualifying
  -- a reviewer as "participated." Without this, a single "LGTM" would promote an assigned
  -- reviewer from 3-point to 10-point weight, defeating the purpose of the split.
  -- Note: mn.id refers back to notes.id, so we join notes to access the body column
  -- (not carried in matched_notes to avoid bloating that CTE with body text).
  -- ?6 is the configured reviewer_min_note_chars value (default 20).
  SELECT DISTINCT d.merge_request_id AS mr_id, mn.author_username AS username
  FROM matched_notes mn
  JOIN discussions d ON mn.discussion_id = d.id
  JOIN notes n_body ON mn.id = n_body.id
  WHERE d.merge_request_id IS NOT NULL
    AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= ?6
),
raw AS (
  -- Signal 1: DiffNote reviewer (individual notes for note_cnt)
  -- Computes state_mult inline (not via mr_activity) because this joins through discussions, not file changes.
  SELECT mn.author_username AS username, 'diffnote_reviewer' AS signal,
         m.id AS mr_id, mn.id AS note_id, mn.created_at AS seen_at,
         CASE WHEN m.state = 'closed' THEN ?5 ELSE 1.0 END AS state_mult
  FROM matched_notes mn
  JOIN discussions d ON mn.discussion_id = d.id
  JOIN merge_requests m ON d.merge_request_id = m.id
  WHERE (m.author_username IS NULL OR mn.author_username != m.author_username)
    AND m.state IN ('opened','merged','closed')

  UNION ALL

  -- Signal 2: DiffNote MR author
  -- Computes state_mult inline (same reason as signal 1).
  SELECT m.author_username AS username, 'diffnote_author' AS signal,
         m.id AS mr_id, NULL AS note_id, MAX(mn.created_at) AS seen_at,
         CASE WHEN m.state = 'closed' THEN ?5 ELSE 1.0 END AS state_mult
  FROM merge_requests m
  JOIN discussions d ON d.merge_request_id = m.id
  JOIN matched_notes mn ON mn.discussion_id = d.id
  WHERE m.author_username IS NOT NULL
    AND m.state IN ('opened','merged','closed')
  GROUP BY m.author_username, m.id

  UNION ALL

  -- Signal 3: MR author via file changes (uses mr_activity CTE for timestamp + state_mult)
  SELECT a.author_username AS username, 'file_author' AS signal,
         a.mr_id, NULL AS note_id,
         a.activity_ts AS seen_at, a.state_mult
  FROM mr_activity a
  WHERE a.author_username IS NOT NULL
    AND a.activity_ts >= ?2
    AND a.activity_ts < ?4

  UNION ALL

  -- Signal 4a: Reviewer participated (in mr_reviewers AND left DiffNotes on path)
  SELECT r.username AS username, 'file_reviewer_participated' AS signal,
         a.mr_id, NULL AS note_id,
         a.activity_ts AS seen_at, a.state_mult
  FROM mr_activity a
  JOIN mr_reviewers r ON r.merge_request_id = a.mr_id
  JOIN reviewer_participation rp ON rp.mr_id = a.mr_id AND rp.username = r.username
  WHERE r.username IS NOT NULL
    AND (a.author_username IS NULL OR r.username != a.author_username)
    AND a.activity_ts >= ?2
    AND a.activity_ts < ?4

  UNION ALL

  -- Signal 4b: Reviewer assigned-only (in mr_reviewers, NO DiffNotes on path)
  SELECT r.username AS username, 'file_reviewer_assigned' AS signal,
         a.mr_id, NULL AS note_id,
         a.activity_ts AS seen_at, a.state_mult
  FROM mr_activity a
  JOIN mr_reviewers r ON r.merge_request_id = a.mr_id
  LEFT JOIN reviewer_participation rp ON rp.mr_id = a.mr_id AND rp.username = r.username
  WHERE rp.username IS NULL  -- NOT in participation set
    AND r.username IS NOT NULL
    AND (a.author_username IS NULL OR r.username != a.author_username)
    AND a.activity_ts >= ?2
    AND a.activity_ts < ?4
),
aggregated AS (
  -- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
  SELECT username, signal, mr_id, 1 AS qty, MAX(seen_at) AS ts, MAX(state_mult) AS state_mult
  FROM raw WHERE signal != 'diffnote_reviewer'
  GROUP BY username, signal, mr_id
  UNION ALL
  -- Note signals: 1 row per (username, mr_id) with note_count and max_ts
  SELECT username, 'note_group' AS signal, mr_id, COUNT(*) AS qty, MAX(seen_at) AS ts, MAX(state_mult) AS state_mult
  FROM raw WHERE signal = 'diffnote_reviewer' AND note_id IS NOT NULL
  GROUP BY username, mr_id
)
SELECT username, signal, mr_id, qty, ts, state_mult FROM aggregated WHERE username IS NOT NULL
```

Where {path_op} is either = ?1 or LIKE ?1 ESCAPE '\\' depending on the path query type, ?2 is since_ms, ?3 is the optional project_id, ?4 is the as_of_ms exclusive upper bound (defaults to now_ms when --as-of is not specified), ?5 is the closed_mr_multiplier (default 0.5, bound as a parameter), and ?6 is the configured reviewer_min_note_chars value (default 20, bound as a parameter). The >= ?2 AND < ?4 pattern (half-open interval) ensures that when --as-of is set to a past date, events at or after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. The exclusive upper bound avoids edge-case ambiguity when events have timestamps exactly equal to the as-of value.
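The half-open window guarantee is easy to state in isolation. A hypothetical event filter (not the actual query code) mirrors the `>= ?2 AND < ?4` predicate:

```rust
// Hypothetical illustration of the half-open [since_ms, as_of_ms) window:
// events at or after as_of_ms are excluded entirely, never decayed to zero.
fn in_window(ts: i64, since_ms: i64, as_of_ms: i64) -> bool {
    ts >= since_ms && ts < as_of_ms
}

fn main() {
    let (since, as_of) = (1_000, 2_000);
    assert!(in_window(1_000, since, as_of));  // at the lower bound: included
    assert!(!in_window(2_000, since, as_of)); // exactly at as-of: excluded (strict <)
    assert!(!in_window(2_500, since, as_of)); // "future" event: cannot leak in with full weight
}
```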

Rationale for CTE-based dual-path matching: The previous approach (repeating OR old_path in every signal subquery) duplicated the path-matching logic five times. Factoring it into foundational CTEs (matched_notes_raw → matched_notes, matched_file_changes_raw → matched_file_changes) means path matching is defined once, each index branch is explicit, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. The UNION ALL + dedup pattern ensures SQLite uses the optimal index for each path column independently.

Dual-path matching strategy (UNION ALL split): SQLite's query planner commonly struggles with OR across two indexed columns, falling back to a full table scan instead of using either index. Rather than starting with OR and hoping the planner cooperates, use UNION ALL + dedup as the default strategy:

```sql
matched_notes_raw AS (
  SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
  UNION ALL
  SELECT ... FROM notes n WHERE ... AND n.position_old_path {path_op}
),
matched_notes AS (
  SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
  FROM matched_notes_raw
),
```

This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when old_path = new_path (no rename). The same pattern applies to matched_file_changes. The simpler OR variant is retained as a comment for benchmarking — if a future SQLite version handles OR well, the split can be collapsed.

Rationale for precomputed participation set: The previous approach used correlated EXISTS/NOT EXISTS subqueries to classify reviewers. The reviewer_participation CTE materializes the set of (mr_id, username) pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with IS NULL (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every mr_reviewers row falls into exactly one bucket.

Rationale for hybrid over fully-raw: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density.

Path rename awareness: Both matched_notes and matched_file_changes use UNION ALL + dedup to match against both old and new path columns independently, ensuring each branch uses its respective index:

  • Notes: branch 1 matches position_new_path, branch 2 matches position_old_path, deduped by notes.id
  • File changes: branch 1 matches new_path, branch 2 matches old_path, deduped by (merge_request_id, project_id)

Both columns already exist in the schema (notes.position_old_path from migration 002, mr_file_changes.old_path from migration 016). The UNION ALL approach ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (--path src/foo/), the LIKE operator applies to both columns identically.

Signal 4 splits into two: The current signal 4 (file_reviewer) joins mr_reviewers but doesn't distinguish participation. In the new plan:

  • Signal 4a (file_reviewer_participated): User is in mr_reviewers AND appears in the reviewer_participation CTE (left DiffNotes on the path for that MR). Gets reviewer_weight (10) and reviewer_half_life_days (90).
  • Signal 4b (file_reviewer_assigned): User is in mr_reviewers but NOT in the reviewer_participation CTE. Gets reviewer_assignment_weight (3) and reviewer_assignment_half_life_days (45).

Rationale for mr_activity CTE: The previous approach repeated the state-aware CASE expression and m.state column in signals 3, 4a, and 4b, with the closed_mr_multiplier applied later in Rust by string-matching on mr_state. This split was brittle — the CASE expression could drift between signal branches, and per-row state-string handling in Rust was unnecessary indirection. The mr_activity CTE defines the timestamp and multiplier once, scoped to matched MRs only (via JOIN with matched_file_changes) to avoid materializing the full MR table. Signals 3, 4a, 4b now reference a.activity_ts and a.state_mult directly. Signals 1 and 2 (DiffNote-based) still compute state_mult inline because they join through discussions, not matched_file_changes, and adding them to mr_activity would require a second join path that doesn't simplify anything.

Rationale for parameterized reviewer_min_note_chars and closed_mr_multiplier: Previous iterations inlined reviewer_min_note_chars as a literal in the SQL string and kept closed_mr_multiplier in Rust only. Binding both as SQL parameters (?5 for closed_mr_multiplier, ?6 for reviewer_min_note_chars) eliminates statement-cache churn (the SQL text is identical regardless of config values), avoids SQL-text variability that complicates EXPLAIN QUERY PLAN analysis, and centralizes the multiplier application in SQL for file-change signals. The DiffNote signals (1, 2) still compute state_mult inline because they don't go through mr_activity.

### 3a. Path Canonicalization and Resolution Probes (who.rs)

Path canonicalization: Before any path resolution or scoring, normalize the user's input path via normalize_query_path():

  • Strip leading ./ (e.g., ./src/foo.rs → src/foo.rs)
  • Collapse repeated / (e.g., src//foo.rs → src/foo.rs)
  • Trim leading/trailing whitespace
  • Preserve trailing / only when present — it signals explicit prefix intent

This is applied once at the top of run_who() before build_path_query(). The robot JSON resolved_input includes both path_input_original (raw user input) and path_input_normalized (after canonicalization) for debugging transparency. The normalization is purely syntactic — no filesystem lookups, no canonicalization against the database.
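A minimal sketch of the purely syntactic normalization described above; the real normalize_query_path() may differ in details:

```rust
// Sketch of normalize_query_path(): trim whitespace, strip leading "./",
// collapse repeated '/', preserve a single trailing '/' (explicit prefix intent).
// No filesystem or database lookups.
fn normalize_query_path(input: &str) -> String {
    let trimmed = input.trim();
    let had_trailing_slash = trimmed.ends_with('/') && trimmed.len() > 1;
    let stripped = trimmed.strip_prefix("./").unwrap_or(trimmed);
    let mut out = String::with_capacity(stripped.len());
    for part in stripped.split('/').filter(|p| !p.is_empty()) {
        if !out.is_empty() { out.push('/'); }
        out.push_str(part);
    }
    if had_trailing_slash { out.push('/'); }
    out
}

fn main() {
    assert_eq!(normalize_query_path("./src/foo.rs"), "src/foo.rs");
    assert_eq!(normalize_query_path("src//foo.rs"), "src/foo.rs");
    assert_eq!(normalize_query_path("  src/foo/  "), "src/foo/"); // trailing '/' preserved
}
```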

Path resolution probes: Rename awareness must extend beyond scoring queries to the path resolution layer. Currently build_path_query() (line 457) and suffix_probe() (line 584) only check position_new_path and new_path. If a user queries an old path name, these probes return "not found" and the scoring query never runs.


Changes to build_path_query():

  • Probe 1 (exact_exists): Add OR position_old_path = ?1 to the notes query and OR old_path = ?1 to the mr_file_changes query. This detects files that existed under the queried name even if they've since been renamed.
  • Probe 2 (prefix_exists): Add OR position_old_path LIKE ?1 ESCAPE '\\' and OR old_path LIKE ?1 ESCAPE '\\' to the respective queries.

Changes to suffix_probe():

The UNION query inside suffix_probe() currently only selects position_new_path from notes and new_path from file changes. Add two additional UNION branches:

```sql
UNION
SELECT position_old_path AS full_path FROM notes
WHERE note_type = 'DiffNote' AND is_system = 0
  AND position_old_path IS NOT NULL
  AND (position_old_path LIKE ?1 ESCAPE '\\' OR position_old_path = ?2)
  AND (?3 IS NULL OR project_id = ?3)
UNION
SELECT old_path AS full_path FROM mr_file_changes
WHERE old_path IS NOT NULL
  AND (old_path LIKE ?1 ESCAPE '\\' OR old_path = ?2)
  AND (?3 IS NULL OR project_id = ?3)
```

This ensures that querying by an old filename (e.g., login.rs after it was renamed to auth.rs) still resolves to a usable path for scoring. The UNION deduplicates so the same path appearing in both old and new columns doesn't cause false ambiguity.

State-aware timestamps for file-change signals (signals 3, 4a, 4b): Centralized in the mr_activity CTE (see section 3). The CASE expression uses merged_at for merged MRs, closed_at for closed MRs, and updated_at for open MRs, with created_at as fallback when the preferred timestamp is NULL.

Rationale: updated_at is noisy for merged MRs — it changes on label edits, title changes, rebases, and metadata touches, creating false recency. merged_at is the best indicator of when code expertise was formed (the moment the code entered the branch). But for open MRs, updated_at is actually the right signal because it reflects ongoing active work. closed_at anchors closed-without-merge MRs to their closure time (these represent review effort even if the code was abandoned). Each state gets the timestamp that best represents when expertise was last exercised.

### 4. Rust-Side Aggregation (who.rs)

For each username, accumulate into a struct with:

  • Author MRs: HashMap<i64, (i64, f64)> (mr_id -> (max timestamp, state_mult)) from diffnote_author + file_author signals
  • Reviewer Participated MRs: HashMap<i64, (i64, f64)> from diffnote_reviewer + file_reviewer_participated signals
  • Reviewer Assigned-Only MRs: HashMap<i64, (i64, f64)> from file_reviewer_assigned signals (excluding any MR already in participated set)
  • Notes per MR: HashMap<i64, (u32, i64, f64)> (mr_id -> (count, max_ts, state_mult)) from note_group rows in the aggregated query (already grouped per user+MR with note_count in qty). Used for log2(1 + count) diminishing returns.
  • Last seen: max of all timestamps
  • Components (when --explain-score): Track per-component f64 subtotals for author, reviewer_participated, reviewer_assigned, notes

The state_mult field from each SQL row (already computed in SQL as 1.0 for merged/open or closed_mr_multiplier for closed) is stored alongside the timestamp — no string-matching on MR state needed in Rust.

Compute score as f64 with deterministic contribution ordering: within each signal type, sort contributions by (mr_id ASC) before summing. This eliminates platform-dependent HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the complexity of compensated summation (Neumaier/Kahan). Each MR-level contribution is multiplied by its state_mult (already computed in SQL):

```
raw_score =
    sum(author_weight * state_mult * decay(now - ts, author_hl) for (mr, ts, state_mult) in author_mrs)
  + sum(reviewer_weight * state_mult * decay(now - ts, reviewer_hl) for (mr, ts, state_mult) in reviewer_participated)
  + sum(reviewer_assignment_weight * state_mult * decay(now - ts, reviewer_assignment_hl) for (mr, ts, state_mult) in reviewer_assigned)
  + sum(note_bonus * state_mult * log2(1 + count) * decay(now - ts, note_hl) for (mr, count, ts, state_mult) in notes_per_mr)
```
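The deterministic-ordering rule can be sketched for one signal type. Types and names are simplified; the real accumulator struct carries more fields:

```rust
use std::collections::HashMap;

// Illustrative: sum one signal type's contributions in ascending mr_id order
// so the f64 total is identical across platforms, independent of HashMap
// iteration order.
fn sum_signal(
    mrs: &HashMap<i64, (i64, f64)>, // mr_id -> (timestamp_ms, state_mult)
    weight: f64,
    half_life_days: u32,
    now_ms: i64,
) -> f64 {
    let mut entries: Vec<(i64, i64, f64)> =
        mrs.iter().map(|(&mr, &(ts, m))| (mr, ts, m)).collect();
    entries.sort_by_key(|&(mr, _, _)| mr); // deterministic contribution order
    entries
        .iter()
        .map(|&(_, ts, state_mult)| {
            let days = ((now_ms - ts) as f64 / 86_400_000.0).max(0.0);
            weight * state_mult * 2.0_f64.powf(-days / f64::from(half_life_days))
        })
        .sum()
}

fn main() {
    let mut mrs = HashMap::new();
    mrs.insert(7, (1_000, 1.0)); // merged MR: full weight
    mrs.insert(3, (1_000, 0.5)); // closed MR: halved by state_mult
    // Zero elapsed time so decay = 1.0: 25*1.0 + 25*0.5 = 37.5
    assert_eq!(sum_signal(&mrs, 25.0, 180, 1_000), 37.5);
}
```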

Why include closed MRs? A closed-without-merge MR still represents review effort and code familiarity — the reviewer read the diff, left comments, and engaged with the code even though it was ultimately abandoned. Excluding closed MRs entirely (the previous plan's approach) discarded this signal. The closed_mr_multiplier (default 0.5) halves the contribution, reflecting that the code never landed but the reviewer's cognitive engagement was real. This also eliminates the dead-code inconsistency where the state-aware CASE expression handled closed but the WHERE clause excluded it.

Sort on the raw f64 score: (raw_score DESC, last_seen DESC, username ASC). This prevents false ties from premature rounding. Only round to i64 for the Expert.score display field after sorting and truncation. The robot JSON score field stays integer for backward compatibility. When --explain-score is active, also include score_raw (the unrounded f64) alongside score so the component totals can be verified without rounding noise.

Compute counts from the accumulated data:

  • review_mr_count = reviewer_participated.len() + reviewer_assigned.len()
  • review_note_count = notes_per_mr.values().map(|(count, _, _)| count).sum()
  • author_mr_count = author_mrs.len()

Bot/service-account filtering: After accumulating all user scores and before sorting, filter out any username that appears in config.scoring.excluded_usernames (exact match, case-insensitive). This is applied in Rust post-query (not SQL) to keep the SQL clean and avoid parameter explosion. When --include-bots is active, the filter is skipped entirely. The robot JSON resolved_input includes excluded_usernames_applied: true|false to indicate whether filtering was active.
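The post-query filter might look like this (a sketch; the real code filters scored Expert structs, not bare usernames):

```rust
// Case-insensitive exact-match filtering against excluded_usernames,
// skipped entirely when --include-bots is set. Names are illustrative.
fn filter_bots(usernames: Vec<String>, excluded: &[String], include_bots: bool) -> Vec<String> {
    if include_bots {
        return usernames;
    }
    let excluded_lower: Vec<String> = excluded.iter().map(|u| u.to_lowercase()).collect();
    usernames
        .into_iter()
        .filter(|u| !excluded_lower.contains(&u.to_lowercase()))
        .collect()
}

fn main() {
    let users = vec!["jsmith".to_string(), "Renovate-Bot".to_string()];
    let excluded = vec!["renovate-bot".to_string()];
    // Exact match is case-insensitive, so "Renovate-Bot" is dropped.
    assert_eq!(filter_bots(users.clone(), &excluded, false), vec!["jsmith".to_string()]);
    // --include-bots skips the filter entirely.
    assert_eq!(filter_bots(users, &excluded, true).len(), 2);
}
```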

Truncate to limit after sorting.

### 5. Default --since Change

Expert mode: "6m" -> "24m" (line 289 in who.rs). At 2 years the residual weight is ~6% for author signals, ~0.4% for participated-reviewer signals, and ~0.001% for note signals — negligible, so 24 months is a good cutoff.

Diagnostic escape hatch: Add --all-history flag (conflicts with --since) that sets since_ms = 0, capturing all data regardless of age. Useful for debugging scoring anomalies and validating the decay model against known experts. The since_mode field in robot JSON reports "all" when this flag is active.

### 5a. Reproducible Scoring via --as-of

Add --as-of <RFC3339|YYYY-MM-DD> flag that overrides the now_ms reference point used for decay calculations. When set:

  • All event selection is bounded by [since_ms, as_of_ms) — exclusive upper bound; events at or after as_of_ms are excluded from SQL results entirely (not just decayed). The SQL uses < ?4 (strict less-than), not <= ?4.
  • YYYY-MM-DD input (without time component) is interpreted as end-of-day UTC: T23:59:59.999Z. This matches user intuition that --as-of 2025-06-01 means "as of the end of June 1st" rather than "as of midnight at the start of June 1st" which would exclude the entire day's activity.
  • All decay computations use as_of_ms instead of SystemTime::now()
  • The --since window is calculated relative to as_of_ms (not wall clock)
  • Robot JSON resolved_input includes as_of_ms, as_of_iso, window_start_iso, window_end_iso, and window_end_exclusive: true fields — making the exact query window unambiguous in output

Rationale: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. --as-of pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since elapsed.max(0.0) clamps negative elapsed time to zero), breaking the reproducibility guarantee.

Implementation: Parse the flag in run_who(), compute as_of_ms: i64, and thread it through to query_expert() where it replaces now_ms() and is bound as ?4 in all SQL queries. When the flag is absent, ?4 defaults to now_ms() (wall clock), which makes the upper bound transparent — all events are within the window by definition. The flag is compatible with all modes but primarily useful in expert mode.

5b. Score Explainability via --explain-score

Add --explain-score flag that augments each expert result with a per-user component breakdown:

{
  "username": "jsmith",
  "score": 42,
  "score_raw": 42.0,
  "components": {
    "author": 28.5,
    "reviewer_participated": 8.2,
    "reviewer_assigned": 1.8,
    "notes": 3.5
  }
}

Scope for this iteration: Component breakdown only (4 floats per user). No top-evidence MRs, no decay curves, no per-MR drill-down. Those are v2 features if scoring disputes arise frequently.

Flag conflicts: --explain-score is mutually exclusive with --detail. Both augment per-user output in different ways; combining them would produce confusing overlapping output. Clap conflicts_with enforces this at parse time.

Human output: When --explain-score is active in human mode, append a parenthetical after each score: 42 (author:28.5 review:10.0 notes:3.5). The review figure collapses reviewer_participated and reviewer_assigned into a single sum to keep the line compact; the robot components object retains the split.

Robot output: Add score_raw (unrounded f64) and components object to each expert entry. Only present when --explain-score is active (no payload bloat by default). The resolved_input section also includes scoring_model_version: 2 to distinguish from the v1 flat-weight model, enabling robot clients to adapt parsing.

Rationale: Multi-signal decayed ranking will be disputed without decomposition. Showing which signal drives a user's score makes results actionable and builds trust in the model. Keeping scope minimal avoids the output format complexity that originally motivated deferral.
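A sketch of the breakdown shape, under assumed names (`ScoreComponents` mirrors the components object above; the human parenthetical collapses the two reviewer components into one review figure):

```rust
/// Per-user component breakdown; field names mirror the robot JSON.
struct ScoreComponents {
    author: f64,
    reviewer_participated: f64,
    reviewer_assigned: f64,
    notes: f64,
}

impl ScoreComponents {
    /// score_raw is just the component sum. closed_mr_multiplier is already
    /// folded into each subtotal, so there is no separate component for it.
    fn score_raw(&self) -> f64 {
        self.author + self.reviewer_participated + self.reviewer_assigned + self.notes
    }

    /// Human-mode suffix, e.g. "(author:28.5 review:10.0 notes:3.5)".
    fn human_suffix(&self) -> String {
        format!(
            "(author:{:.1} review:{:.1} notes:{:.1})",
            self.author,
            self.reviewer_participated + self.reviewer_assigned,
            self.notes
        )
    }
}
```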

6. Index Migration (db.rs)

Add a new migration to support the restructured query patterns. The dual-path matching CTEs and reviewer_participation CTE introduce query shapes that need index coverage:

-- Support dual-path matching on DiffNotes (old_path leg of the OR in matched_notes CTE)
CREATE INDEX IF NOT EXISTS idx_notes_old_path_author
  ON notes(position_old_path, author_username, created_at)
  WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;

-- Support dual-path matching on file changes (old_path leg of the OR in matched_file_changes CTE)
CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
  ON mr_file_changes(old_path, project_id, merge_request_id)
  WHERE old_path IS NOT NULL;

-- Support new_path matching on file changes (ensure index parity with old_path)
-- Existing indexes may not have optimal column order for the CTE pattern.
CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
  ON mr_file_changes(new_path, project_id, merge_request_id);

-- Support reviewer_participation CTE: joining matched_notes -> discussions -> mr_reviewers
-- notes.discussion_id (NOT noteable_id, which doesn't exist in the schema) is the FK to discussions
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
  ON notes(discussion_id, author_username, created_at)
  WHERE note_type = 'DiffNote' AND is_system = 0;

-- Support path resolution probes on old_path (build_path_query() and suffix_probe())
-- The existing idx_notes_diffnote_path_created covers new_path probes, but old_path probes
-- need their own index since probes don't constrain author_username.
CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
  ON notes(position_old_path, project_id, created_at)
  WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;

Rationale: The existing indexes cover position_new_path and new_path but not their old_path counterparts. Without these, the OR old_path clauses would force table scans on renamed files. The reviewer_participation CTE joins matched_notes -> discussions -> merge_requests, so an index on (discussion_id, author_username) speeds up the CTE materialization. The idx_notes_old_path_project_created index supports path resolution probes (build_path_query() and suffix_probe()) which run existence/path-only checks without constraining author_username — the scoring-oriented idx_notes_old_path_author has author_username as the second column, which is suboptimal for these probes.

Schema note: The notes table uses discussion_id as its FK to discussions, which in turn has merge_request_id. There is no noteable_id column on notes. The previous plan revision incorrectly referenced noteable_id — this is corrected.

Removed: The idx_mr_state_timestamps composite index on merge_requests(state, merged_at, closed_at, updated_at, created_at) was removed. MR lookups in the scoring query are always id-driven (joining from matched_file_changes or discussions), so the state-aware CASE expression operates on rows already fetched by primary key. A low-selectivity composite index on 5 columns would consume space without improving any query path.

Partial indexes (with WHERE clauses) keep the index size minimal — only DiffNote rows and non-null old_path rows are indexed.

7. Test Helpers

Add timestamp-aware variants:

  • insert_mr_at(conn, id, project_id, iid, author, state, updated_at_ms)
  • insert_diffnote_at(conn, id, discussion_id, project_id, author, file_path, body, created_at_ms)

8. New Tests (TDD)

Example-based tests

test_half_life_decay_math: Verify the pure function:

  • elapsed=0 -> 1.0
  • elapsed=half_life -> 0.5
  • elapsed=2*half_life -> 0.25
  • half_life_days=0 -> 0.0 (guard against div-by-zero)
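A minimal sketch of the pure function these cases pin down (`half_life_decay` is an assumed name, and the 180-day value in the tests is illustrative; the `max(0.0)` clamp matches the behavior discussed under --as-of):

```rust
/// Exponential half-life decay: 0.5^(elapsed / half_life).
fn half_life_decay(elapsed_days: f64, half_life_days: f64) -> f64 {
    if half_life_days <= 0.0 {
        return 0.0; // guard against div-by-zero on a zero (or negative) half-life
    }
    let elapsed = elapsed_days.max(0.0); // clamp future timestamps to zero elapsed
    0.5_f64.powf(elapsed / half_life_days)
}
```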

test_expert_scores_decay_with_time: Two authors, one recent (10 days), one old (360 days). Recent author should score ~24, old author ~6.

test_expert_reviewer_decays_faster_than_author: Same MR, same age (90 days). Author retains ~18 points, reviewer retains ~5 points. Author dominates clearly.

test_reviewer_participated_vs_assigned_only: Two reviewers on the same MR at the same age. One left DiffNotes (participated), one didn't (assigned-only). Participated reviewer should score ~10 * decay, assigned-only should score ~3 * decay. Verifies the split works end-to-end.

test_note_diminishing_returns_per_mr: One reviewer with 1 note on MR-A and another with 20 notes on MR-B, both at same age. The 20-note reviewer should score log2(21)/log2(2) ≈ 4.4x the 1-note reviewer, NOT 20x. Validates the log2(1+count) cap.
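The cap this ratio tests can be sketched as (`note_bonus` is an assumed name):

```rust
/// Diminishing returns for notes on a single MR: log2(1 + count).
/// One note contributes exactly 1.0 unit (log2(2) = 1), which is the
/// interpretability argument for log2 over ln noted later in the plan.
fn note_bonus(note_count: u32) -> f64 {
    ((1 + note_count) as f64).log2()
}
```

With this shape, 20 notes yield log2(21) ≈ 4.39 units rather than 20.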

test_config_validation_rejects_zero_half_life: ScoringConfig with author_half_life_days = 0 should return ConfigInvalid error.

test_file_change_timestamp_uses_merged_at: An MR with merged_at set and state = 'merged' should use merged_at timestamp, not updated_at. Verify by setting merged_at to old date and updated_at to recent date — score should reflect the old date.

test_open_mr_uses_updated_at: An MR with state = 'opened' should use updated_at (not created_at). Verify that an open MR with recent updated_at scores higher than one with the same created_at but older updated_at.

test_old_path_match_credits_expertise: Insert a DiffNote with position_old_path = "src/old.rs" and position_new_path = "src/new.rs". Query --path src/old.rs — the author should appear. Query --path src/new.rs — same author should also appear. Validates dual-path matching.

test_explain_score_components_sum_to_total: With --explain-score, verify that components.author + components.reviewer_participated + components.reviewer_assigned + components.notes equals the reported score_raw (within f64 rounding tolerance). Note: the closed_mr_multiplier is already folded into the per-component subtotals, not tracked as a separate component.

test_as_of_produces_deterministic_results: Insert data at known timestamps. Run query_expert twice with the same --as-of value — results must be identical. Then run with a later --as-of — scores should be lower (more decay).

test_old_path_probe_exact_and_prefix: Insert a DiffNote with position_old_path = "src/old/foo.rs" and position_new_path = "src/new/foo.rs". Call build_path_query(conn, "src/old/foo.rs") — should resolve as exact file (not "not found"). Call build_path_query(conn, "src/old/") — should resolve as prefix. Validates that the path resolution probes now check old_path columns.

test_suffix_probe_uses_old_path_sources: Insert a file change with old_path = "legacy/utils.rs" and new_path = "src/utils.rs". Call build_path_query(conn, "legacy/utils.rs") — should resolve via exact probe on old_path. Call build_path_query(conn, "utils.rs") — suffix probe should find both legacy/utils.rs and src/utils.rs and either resolve uniquely (if deduplicated) or report ambiguity.

test_since_relative_to_as_of_clock: Insert data at timestamps T1 and T2 (T2 > T1). With --as-of T2 and --since 30d, the window is [T2 - 30d, T2], not [now - 30d, now]. Verify that data at T1 is included or excluded based on the as-of-relative window, not the wall clock window.

test_explain_and_detail_are_mutually_exclusive: Parsing --explain-score --detail should fail with a conflict error from clap.

test_trivial_note_does_not_count_as_participation: A reviewer who left only a short note ("LGTM", 4 chars) on an MR should be classified as assigned-only, not participated, when reviewer_min_note_chars = 20. A reviewer who left a substantive note (>= 20 chars) should be classified as participated. Validates the LENGTH threshold in the reviewer_participation CTE.

test_closed_mr_multiplier: Two identical MRs (same author, same age, same path). One is merged, one is closed. The merged MR should contribute author_weight * decay(...), the closed MR should contribute author_weight * closed_mr_multiplier * decay(...). With default multiplier 0.5, the closed MR contributes half.

test_as_of_excludes_future_events: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With --as-of set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the exclusive upper-bound (< ?4) filtering in SQL.

test_as_of_exclusive_upper_bound: Insert an event with timestamp exactly equal to the as_of_ms value. Verify it is excluded from results (strict less-than, not less-than-or-equal). This validates the half-open interval [since, as_of) semantics.

test_excluded_usernames_filters_bots: Insert signals for a user named "renovate-bot" and a user named "jsmith", both with the same activity. With excluded_usernames: ["renovate-bot"] in config, only "jsmith" should appear in results. Validates the Rust-side post-query filtering.

test_include_bots_flag_disables_filtering: Same setup as above, but with --include-bots active. Both "renovate-bot" and "jsmith" should appear in results.

test_null_timestamp_fallback_to_created_at: Insert a merged MR with merged_at = NULL (edge case: old data before the column was populated). The state-aware timestamp should fall back to created_at. Verify the score reflects created_at, not 0 or a panic.

test_path_normalization_handles_dot_and_double_slash: Call normalize_query_path("./src//foo.rs") — should return "src/foo.rs". Call normalize_query_path(" src/bar.rs ") — should return "src/bar.rs". Call normalize_query_path("src/foo.rs") — should return unchanged (already normalized). Call normalize_query_path("") — should return "" (empty input passes through).

test_path_normalization_preserves_prefix_semantics: Call normalize_query_path("./src/dir/") — should return "src/dir/" (trailing slash preserved for prefix intent). Call normalize_query_path("src/dir") — should return "src/dir" (no trailing slash = file, not prefix).
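A sketch of normalize_query_path consistent with these cases (purely syntactic, as specified; no filesystem or database lookups):

```rust
/// Trim whitespace, strip leading "./", collapse "//", and preserve a trailing "/"
/// (a trailing slash signals prefix intent rather than an exact file).
fn normalize_query_path(input: &str) -> String {
    let mut s = input.trim();
    while let Some(rest) = s.strip_prefix("./") {
        s = rest;
    }
    let trailing_slash = s.ends_with('/');
    // Splitting on '/' and dropping empty segments collapses any "//" runs.
    let segments: Vec<&str> = s.split('/').filter(|seg| !seg.is_empty()).collect();
    let mut normalized = segments.join("/");
    if trailing_slash && !normalized.is_empty() {
        normalized.push('/');
    }
    normalized
}
```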

test_config_validation_rejects_absurd_half_life: ScoringConfig with author_half_life_days = 5000 (>3650 cap) should return ConfigInvalid error. Similarly, reviewer_min_note_chars = 5000 (>4096 cap) should fail.

test_config_validation_rejects_nan_multiplier: ScoringConfig with closed_mr_multiplier = f64::NAN should return ConfigInvalid error. Same for f64::INFINITY.

Invariant tests (regression safety for ranking systems)

test_score_monotonicity_by_age: For any single signal type, an older timestamp must never produce a higher score than a newer timestamp with the same weight and half-life. Generate N random (age, half_life) pairs and assert decay(older) <= decay(newer) for all.

test_row_order_independence: Insert the same set of signals in two different orders (e.g., reversed). Run query_expert on both — the resulting rankings (username order + scores) must be identical. Validates that neither SQL ordering nor HashMap iteration order affects final output.

test_reviewer_split_is_exhaustive: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting reviewer_min_note_chars) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.

test_deterministic_accumulation_order: Insert signals for a user with contributions at many different timestamps (10+ MRs with varied ages). Run query_expert 100 times in a loop. All 100 runs must produce the exact same f64 score (bit-identical). Validates that the sorted contribution ordering eliminates HashMap-iteration-order nondeterminism.
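The sorted-accumulation rule this test locks in can be sketched as (`sum_for_signal` is an assumed name; signals are pre-aggregated per (user, MR), so mr_id is unique within one signal type):

```rust
/// Sum one signal type's contributions in mr_id order so the f64 result is
/// bit-identical across runs, regardless of HashMap iteration order upstream.
fn sum_for_signal(mut contributions: Vec<(i64, f64)>) -> f64 {
    contributions.sort_by_key(|&(mr_id, _)| mr_id);
    contributions.iter().map(|&(_, value)| value).sum()
}
```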

9. Existing Test Compatibility

All existing tests insert data with now_ms(). With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.

The test_expert_scoring_weights_are_configurable test needs ..Default::default() added to fill the new fields: the half-life fields, reviewer_assignment_weight / reviewer_assignment_half_life_days, closed_mr_multiplier, reviewer_min_note_chars, and excluded_usernames.

Verification

  1. cargo check --all-targets — no compiler errors
  2. cargo clippy --all-targets -- -D warnings — no lints
  3. cargo fmt --check — formatting clean
  4. cargo test — all existing + new tests pass (including invariant tests)
  5. ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs — no bug scanner findings
  6. Manual query plan verification (not automated — SQLite planner varies across versions):
    • Run EXPLAIN QUERY PLAN on the expert query (both exact and prefix modes) against a real database
    • Confirm that matched_notes_raw branch 1 uses the existing new_path index and branch 2 uses idx_notes_old_path_author (not a full table scan on either branch)
    • Confirm that matched_file_changes_raw branch 1 uses idx_mfc_new_path_project_mr and branch 2 uses idx_mfc_old_path_project_mr
    • Confirm that reviewer_participation CTE uses idx_notes_diffnote_discussion_author
    • Confirm that mr_activity CTE joins merge_requests via primary key from matched_file_changes
    • Confirm that path resolution probes (old_path leg) use idx_notes_old_path_project_created
    • Document the observed plan in a comment near the SQL for future regression reference
  7. Performance baseline (manual, not CI-gated):
    • Run time cargo run --release -- who --path <exact-path> on the real database for exact, prefix, and suffix modes
    • Target SLOs: p95 exact path < 200ms, prefix < 300ms, suffix < 500ms on development hardware
    • Record baseline timings as a comment near the SQL for regression reference
    • If any mode exceeds 2x the baseline after future changes, investigate before merging
    • Note: These are soft targets for developer awareness, not automated CI gates. Automated benchmarking with synthetic fixtures (100k/1M/5M notes) is a v2 investment if performance becomes a real concern.
  8. Real-world validation:
    • cargo run --release -- who --path MeasurementQualityDialog.tsx — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
    • cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history — compare full history vs 24m window to validate cutoff is reasonable
    • cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score — verify component breakdown sums to total and authored signal dominates for known authors
    • Spot-check that assigned-only reviewers (those who never left DiffNotes) rank below participated reviewers on the same MR
    • Test a known renamed file path — verify expertise from the old name carries forward
    • cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01 — verify deterministic output across repeated runs
    • Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
    • Verify closed MRs contribute at ~50% of equivalent merged MR scores via --explain-score
    • If the project has known bot accounts (e.g., renovate-bot), add them to excluded_usernames config and verify they no longer appear in results. Run again with --include-bots to confirm they reappear.
    • Test path normalization: who --path ./src//foo.rs and who --path src/foo.rs should produce identical results

Accepted from External Review

Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genuinely improved the plan:

From feedback-1 and feedback-2:

  • Path rename awareness (old_path matching): Real correctness gap. Both position_old_path and mr_file_changes.old_path exist in the schema. Simple OR clause addition with high value — expertise now survives file renames.
  • Hybrid SQL pre-aggregation: Revised from "fully raw rows" to pre-aggregate note counts per (user, MR) in SQL. MR-level signals were already 1-per-MR; the note rows were the actual scalability risk. Bounded row counts with predictable memory.
  • State-aware timestamps: Improved from our overly-simple COALESCE(merged_at, created_at) to a state-aware CASE expression. Open MRs genuinely need updated_at to reflect ongoing work; merged MRs need merged_at to anchor expertise formation.
  • Index migration: The dual-path matching and CTE patterns need index support. Added partial indexes to keep size minimal.
  • Invariant tests: test_score_monotonicity_by_age, test_row_order_independence, test_reviewer_split_is_exhaustive catch subtle ranking regressions that example-based tests miss.
  • --as-of flag: Simple clock-pinning for reproducible decay scoring. Essential for debugging and test determinism.
  • --explain-score flag: Moved from rejected to included with minimal scope (component breakdown only, no per-MR drill-down). Multi-signal scoring needs decomposition to build trust.

From feedback-3:

  • Fix noteable_id index bug (critical): The notes table uses discussion_id as FK to discussions, not noteable_id (which doesn't exist). The proposed idx_notes_mr_path_author index would fail at migration time. Fixed to use (discussion_id, author_username, created_at).
  • CTE-based dual-path matching (matched_notes, matched_file_changes): Rather than repeating OR old_path in every signal subquery, centralize path matching in foundational CTEs. Defined once, indexed once, maintained once. Cleaner and more extensible.
  • Precomputed reviewer_participation CTE: Replaced correlated EXISTS/NOT EXISTS subqueries with a materialized set of (mr_id, username) pairs. Same semantics, lower query cost, simpler reasoning about the reviewer split.
  • log2(1+count) over ln(1+count) for notes: With log2, one note contributes exactly 1.0 unit (since log2(2) = 1), making note_bonus=1 directly interpretable. ln gives 0.69 per note, which is unintuitive.
  • Path resolution probe rename awareness: The plan added old_path matching to scoring queries but missed the upstream path resolution layer (build_path_query() probes and suffix_probe()). Without this, querying an old path name fails at resolution and never reaches scoring. Now both probes check old_path columns.
  • Removed low-selectivity idx_mr_state_timestamps: MR lookups in scoring are id-driven (from file_changes or discussions), so a 5-column composite on state/timestamps adds no query benefit.
  • Added idx_mfc_new_path_project_mr: Ensures index parity between old and new path columns on mr_file_changes.
  • --explain-score conflicts with --detail: Prevents confusing overlapping output from two per-user augmentation flags.
  • scoring_model_version in resolved_input: Lets robot clients distinguish v1 (flat weights) from v2 (decayed) output schemas.
  • score_raw in explain mode: Exposes the unrounded f64 so component totals can be verified without rounding noise.
  • New tests: test_old_path_probe_exact_and_prefix, test_suffix_probe_uses_old_path_sources, test_since_relative_to_as_of_clock, test_explain_and_detail_are_mutually_exclusive, test_null_timestamp_fallback_to_created_at — cover the newly-identified gaps in path resolution, clock semantics, and edge cases.
  • EXPLAIN QUERY PLAN verification step: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).

From feedback-4:

  • --as-of temporal correctness (critical): The plan described --as-of but the SQL only enforced a lower bound (>= ?2). Events after the as-of date would leak in with full weight (because elapsed.max(0.0) clamps negative elapsed time to zero). Added < ?4 upper bound to all SQL timestamp filters, making the query window [since_ms, as_of_ms). Without this, --as-of reproducibility was fundamentally broken. (Refined to exclusive upper bound in feedback-5.)
  • Closed-state inconsistency resolution: The state-aware CASE expression handled closed state but the WHERE clause filtered to ('opened','merged') only — dead code. Resolved by including 'closed' in state filters and adding a closed_mr_multiplier (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
  • Substantive note threshold for reviewer participation: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added reviewer_min_note_chars (default 20) config field and LENGTH(TRIM(body)) filter in the reviewer_participation CTE. This raises the bar for participation classification to actual substantive review comments.
  • UNION ALL optimization for path predicates: SQLite's planner can degrade OR across two indexed columns to a table scan. Originally documented as a fallback; promoted to the default strategy in the feedback-5 iteration. The UNION ALL + dedup approach ensures each index branch is used independently.
  • New tests: test_trivial_note_does_not_count_as_participation, test_closed_mr_multiplier, test_as_of_excludes_future_events — cover the three new features added from this review round.

From feedback-5 (ChatGPT review):

  • Exclusive upper bound for --as-of: Changed from [since_ms, as_of_ms] (inclusive) to [since_ms, as_of_ms) (exclusive). Half-open intervals are the standard convention in temporal systems — they eliminate edge-case ambiguity when events have timestamps exactly at the boundary. Also added YYYY-MM-DD → end-of-day UTC parsing and window metadata in robot output.
  • UNION ALL as default for dual-path matching: Promoted from "fallback if planner regresses" to default strategy. SQLite OR-across-indexed-columns degradation is common enough that the predictable UNION ALL + dedup approach is the safer starting point. The simpler OR variant is retained as a comment for benchmarking.
  • Deterministic contribution ordering: Within each signal type, sort contributions by mr_id before summing. This eliminates HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the overhead of compensated summation (Neumaier/Kahan was rejected as overkill at this scale).
  • Minimal bot/service-account filtering: Added excluded_usernames (exact match, case-insensitive) to ScoringConfig and --include-bots CLI flag. Applied as a Rust-side post-filter (not SQL) to keep queries clean. Scope is deliberately minimal — no regex patterns, no heuristic detection. Users configure the list for their team's specific bots.
  • Performance baseline SLOs: Added manual performance baseline step to verification — record timings for exact/prefix/suffix modes and flag >2x regressions. Kept lightweight (no CI gating, no synthetic benchmarks) to match the project's current maturity.
  • New tests: test_as_of_exclusive_upper_bound, test_excluded_usernames_filters_bots, test_include_bots_flag_disables_filtering, test_deterministic_accumulation_order — cover the newly-accepted features.

From feedback-6 (ChatGPT review):

  • Centralized mr_activity CTE: The state-aware timestamp CASE expression and closed_mr_multiplier were repeated across signals 3, 4a, 4b with the multiplier applied later in Rust via string-matching on mr_state. This was brittle — the CASE could drift between branches and the Rust-side string matching was unnecessary indirection. A single mr_activity CTE defines both activity_ts and state_mult once, scoped to matched MRs only (via JOIN with matched_file_changes). Signals 1 and 2 still compute state_mult inline because they join through discussions, not matched_file_changes.
  • Parameterized reviewer_min_note_chars and closed_mr_multiplier: Previously reviewer_min_note_chars was inlined as a literal in the SQL string and closed_mr_multiplier was applied only in Rust. Binding both as SQL parameters (?5 for closed_mr_multiplier, ?6 for reviewer_min_note_chars) eliminates statement-cache churn, ensures identical SQL text regardless of config values, and simplifies EXPLAIN QUERY PLAN analysis.
  • Tightened config validation: Added upper bounds — *_half_life_days <= 3650 (10-year safety cap), reviewer_min_note_chars <= 4096, and closed_mr_multiplier must be finite (not NaN/Inf). These prevent absurd configurations from silently producing nonsensical results.
  • Path canonicalization via normalize_query_path(): Inputs like ./src//foo.rs or whitespace-padded paths could fail path resolution even when the file exists in the database. A simple syntactic normalization (strip ./, collapse //, trim whitespace, preserve trailing /) runs before build_path_query() to reduce false negatives. No filesystem or database lookups — purely string manipulation.
  • Probe-optimized idx_notes_old_path_project_created index: The scoring-oriented idx_notes_old_path_author index has author_username as its second column, which is suboptimal for path resolution probes that don't constrain author. A dedicated probe index on (position_old_path, project_id, created_at) ensures build_path_query() and suffix_probe() old_path lookups are efficient.
  • New tests: test_path_normalization_handles_dot_and_double_slash, test_path_normalization_preserves_prefix_semantics, test_config_validation_rejects_absurd_half_life, test_config_validation_rejects_nan_multiplier — cover the path canonicalization and tightened validation logic.

Rejected Ideas (with rationale)

These suggestions were considered during review but explicitly excluded from this iteration:

  • Rename alias chain expansion (A->B->C traversal) (feedback-2 #2, feedback-4 #4): Over-engineered for v1. The old_path OR match covers the 80% case (direct renames). Building a canonical path identity table at ingest time adds schema, ingestion logic, and graph traversal complexity for rare multi-hop renames. If real-world usage shows fragmented expertise on multi-rename files, this becomes a v2 feature.
  • Config-driven max_age_days (feedback-1 #5, feedback-2 #5): We already have --since (explicit window), --all-history (no window), and the 24m default (mathematically justified). Adding a config field that derives the default since window creates confusing interaction between config and CLI flags. If half-lives change, updating the default constant is trivial.
  • Config-driven decay_floor for derived --since default (feedback-3 #4): Proposed computing the default since window as ceil(max_half_life * log2(1/floor)) so it auto-adjusts when half-lives change. Rejected: the formula is non-obvious to users, adds a config param (decay_floor) with no intuitive meaning, and the benefit is negligible — half-life changes are rare, and updating a constant is trivial. The 24m default is already mathematically justified and easy to override with --since or --all-history.
  • BTreeMap + Kahan/Neumaier compensated summation (feedback-3 #6): Proposed deterministic iteration order plus numerically stable summation. Compensated summation is rejected at this scale: the accumulator processes dozens to low hundreds of entries per user, where f64 rounding error stays far below anything that survives integer rounding of scores, so Kahan/Neumaier adds code complexity for zero practical benefit. The determinism half of the proposal was later accepted in feedback-5 in a lighter form (sorting contributions by mr_id before summing). If we eventually aggregate thousands of signals per user, revisit.
  • Confidence/coverage metadata (feedback-1 #8, feedback-2 #8, feedback-3 #9, feedback-4 #6): Repeatedly proposed across reviews with variations (score_adjusted with confidence factor, low/medium/high labels, evidence_mr_count weighting). Still scope creep. The --explain-score component breakdown already tells users which signal drives the score. Defining "sparse evidence" thresholds (how many MRs is "low"? what's the right exponential saturation constant?) is domain-specific guesswork without user feedback data. A single recent MR "outranking broader expertise" is the correct behavior of time-decay — the model intentionally weights recency. If real-world usage shows this is a problem, confidence becomes a v2 feature informed by actual threshold data.
  • Automated EXPLAIN QUERY PLAN tests (feedback-3 #10 partial): SQLite's query planner changes across versions and can use different plans on different data distributions. Automated assertions on plan output are brittle. Instead, we document EXPLAIN QUERY PLAN as a manual verification step during development and include the observed plan as a comment near the SQL.
  • Per-MR evidence drill-down in --explain-score (feedback-2 #7 promoted this): The v1 --explain-score shows component totals only. Listing top-evidence MRs per user would require additional SQL queries and significant output format work. Deferred unless component breakdowns prove insufficient for debugging.
  • Split scoring engine into core module (feedback-4 #5): Proposed extracting scoring math from who.rs into src/core/scoring/model_v2_decay.rs. Premature modularization — who.rs is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.
  • Bot/service-account filtering at pattern scope (feedback-4 #7): Real concern but orthogonal to time-decay scoring at the scope proposed here — regex excluded_username_patterns plus heuristic detection affecting all who modes — which should be designed and tested independently. A deliberately minimal variant (exact-match excluded_usernames config plus --include-bots) was subsequently accepted in feedback-5; only the broader pattern-based design remains rejected.
  • Model compare mode / rank-delta diagnostics (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The --explain-score + --as-of combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.
  • Canonical path identity graph (feedback-5 #1, also feedback-2 #2, feedback-4 #4): Third time proposed, third time rejected. Building a rename graph from mr_file_changes(old_path, new_path) with identity resolution requires new schema (path_identities, path_aliases tables), ingestion pipeline changes, graph traversal at query time, and backfill logic for existing data. The UNION ALL dual-path matching already covers the 80%+ case (direct renames). Multi-hop rename chains (A→B→C) are rare in practice and can be addressed in v2 with real usage data showing the gap matters.
  • Normalized expertise_events table (feedback-5 #2): Proposes shifting from query-time CTE joins to a precomputed expertise_events table populated at ingest time. While architecturally appealing for read performance, this doubles the data surface area (raw tables + derived events), requires new ingestion pipelines with incremental upsert logic, backfill tooling for existing databases, and introduces consistency risks when raw data is corrected/re-synced. The CTE approach is correct, maintainable, and performant at our current scale. If query latency becomes a real bottleneck (see performance baseline SLOs), materialized views or derived tables become a v2 optimization.
  • Reviewer engagement model upgrade (feedback-5 #3): Proposes adding approved/changes_requested review-state signals and trivial-comment pattern matching (["lgtm","+1","nit","ship it"]). Expands the signal type count from 4 to 6 and adds a fragile pattern-matching layer (what about "don't ship it"? "lgtm but..."?). The reviewer_min_note_chars threshold is imperfect but pragmatic — it's a single configurable number with no false-positive risk from substring matching. Review-state signals may be worth adding later as a separate enhancement when we have data on how often they diverge from DiffNote participation.
  • Contribution-floor auto cutoff for --since (feedback-5 #5): Proposes a --since auto mode that computes the earliest relevant timestamp from a min_contribution_floor config value (e.g., 0.01 points). Adds a non-obvious config parameter for minimal benefit — the 24m default is already mathematically justified by the decay curves (residual weight at 2 years: author ~6%, reviewer ~0.4%) and is easily overridden with --since or --all-history. The auto-derivation formula (ceil(max_half_life * log2(1/floor))) is opaque to users who just want to understand why a certain time range was selected.
  • Full evidence drill-down in --explain-score (feedback-5 #8): Proposes --explain-score=summary|full with per-MR evidence rows. Already rejected in feedback-2 #7. Component totals are sufficient for v1 debugging — they answer "which signal type drives this user's score." Per-MR drill-down requires additional SQL queries and significant output format complexity. Deferred unless component breakdowns prove insufficient.
  • Neumaier compensated summation (feedback-5 #7 partial): Accepted the sorting aspect for deterministic ordering, but rejected Neumaier/Kahan compensated summation. At the scale of dozens to low hundreds of contributions per user, the rounding error from naive f64 summation is on the order of 1e-14 — several orders of magnitude below any meaningful score difference. Compensated summation adds code complexity and a maintenance burden for no practical benefit at this scale.
  • Automated CI benchmark gate (feedback-5 #10 partial): Accepted manual performance baselines, but rejected automated CI regression gating with synthetic fixtures (100k/1M/5M notes). Building and maintaining benchmark infrastructure is a significant investment that's premature for a CLI tool with ~3 users. Manual timing checks during development are sufficient until performance becomes a real concern.
  • Epsilon-based tie buckets for ranking (feedback-6 #4): Rejected because the plan already has deterministic contribution ordering by mr_id within each signal type, which eliminates HashMap-iteration nondeterminism. Platform-dependent powf differences at the scale of dozens to hundreds of contributions per user are sub-epsilon (order of 1e-15). If two users genuinely score within 1e-9 of each other, the existing tiebreak by (last_seen DESC, username ASC) is already meaningful and deterministic. Adding a bucketing layer introduces a magic epsilon constant and floor operation for a problem that doesn't manifest in practice.
  • --diagnose-score aggregated diagnostics flag (feedback-6 #5): Rejected because this is diagnostic/debugging tooling that adds a new flag, new output format, and new counting logic (matched_notes_raw_count, dedup_count, window exclusions, etc.) across the SQL pipeline. The existing --explain-score component breakdown + manual EXPLAIN QUERY PLAN verification already covers the debugging need. The additional SQL instrumentation required (counting rows at each CTE stage) would complicate the query for a feature with unclear demand. A v2 addition if operational debugging becomes a recurring need.
  • Multi-path expert scoring (--path repeatable) (feedback-6 #7): Rejected because this is a feature expansion, not a plan improvement for the time-decay model. Multi-path requires a requested_paths CTE, modified dedup logic keyed on (username, signal, mr_id) across paths, CLI parsing changes for repeatable --path and --path-file, and new test cases for overlap/prefix/dedup semantics. This is a separate bead/feature that should be designed independently — it's orthogonal to time-decay scoring and can be added later without requiring any changes to the decay model.
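
The fragility that the reviewer-engagement rejection cites is easy to demonstrate concretely: naive substring matching against a trivial-comment list misclassifies negations and qualified approvals. A minimal sketch; `is_trivial_comment` is hypothetical illustration, not code from who.rs:

```rust
// Hypothetical sketch of the rejected trivial-comment filter.
// Naive substring matching cannot tell "ship it" from "don't ship it",
// which is why the plan keeps the simpler reviewer_min_note_chars
// length threshold instead.
fn is_trivial_comment(body: &str) -> bool {
    const TRIVIAL_PATTERNS: [&str; 4] = ["lgtm", "+1", "nit", "ship it"];
    let lower = body.to_lowercase();
    TRIVIAL_PATTERNS.iter().any(|p| lower.contains(p))
}

fn main() {
    // Intended matches work...
    assert!(is_trivial_comment("LGTM"));
    assert!(is_trivial_comment("ship it"));
    // ...but substantive feedback trips the same patterns.
    assert!(is_trivial_comment("don't ship it until the race is fixed"));
    assert!(is_trivial_comment("lgtm but the decay constant looks wrong"));
    println!("substring matching misclassifies negated/qualified reviews");
}
```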
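
The "mathematically justified 24m default" claim in the contribution-floor rejection can be checked numerically. This sketch assumes half-lives of 6 months (author) and 3 months (reviewer) — constants chosen to reproduce the ~6% / ~0.4% residuals quoted above, not read from who.rs — and also evaluates the rejected auto-cutoff formula, treating the floor as a fraction of undecayed weight:

```rust
// Exponential half-life decay: weight remaining after `months`.
fn residual_weight(months: f64, half_life_months: f64) -> f64 {
    0.5_f64.powf(months / half_life_months)
}

fn main() {
    // Residual weight at the 24-month default window, assuming
    // illustrative half-lives of 6m (author) / 3m (reviewer).
    let author = residual_weight(24.0, 6.0); // 0.5^4 = 0.0625 (~6%)
    let reviewer = residual_weight(24.0, 3.0); // 0.5^8 ≈ 0.0039 (~0.4%)
    assert!((author - 0.0625).abs() < 1e-12);
    assert!((reviewer - 0.00390625).abs() < 1e-12);

    // The rejected auto cutoff: ceil(max_half_life * log2(1/floor)).
    // With floor = 0.01 it stretches the window to ~40 months — a
    // non-obvious result that illustrates the opacity the rejection cites.
    let floor = 0.01_f64;
    let auto_cutoff_months = (6.0 * (1.0 / floor).log2()).ceil();
    assert_eq!(auto_cutoff_months, 40.0);
    println!("author residual {author:.4}, reviewer residual {reviewer:.6}, auto cutoff {auto_cutoff_months}m");
}
```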
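
The Neumaier rejection's error-magnitude claim can be sanity-checked the same way. This sketch compares a naive f64 sum of a few hundred synthetic decayed weights against Neumaier compensated summation; the weight distribution is illustrative:

```rust
// Neumaier compensated summation: tracks the low-order bits lost
// at each addition and folds them back in at the end.
fn neumaier_sum(values: &[f64]) -> f64 {
    let (mut sum, mut comp) = (0.0_f64, 0.0_f64);
    for &v in values {
        let t = sum + v;
        // Recover the rounding error from whichever operand was smaller.
        comp += if sum.abs() >= v.abs() { (sum - t) + v } else { (v - t) + sum };
        sum = t;
    }
    sum + comp
}

fn main() {
    // ~300 synthetic weights: 25-point author signals decayed over
    // 0..300 months at an illustrative 6-month half-life.
    let weights: Vec<f64> = (0..300)
        .map(|m| 25.0 * 0.5_f64.powf(m as f64 / 6.0))
        .collect();
    let naive: f64 = weights.iter().sum();
    let compensated = neumaier_sum(&weights);
    let err = (naive - compensated).abs();
    // Orders of magnitude below any meaningful score difference.
    assert!(err < 1e-9, "expected negligible rounding error, got {err}");
    println!("naive {naive}, compensated {compensated}, |diff| {err:e}");
}
```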
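
The deterministic tiebreak the epsilon-bucket rejection relies on — score DESC, then (last_seen DESC, username ASC) — can be written as a plain total-order comparator. A sketch with assumed field names, not the actual who.rs types:

```rust
use std::cmp::Ordering;

// Illustrative expert row; field names are assumptions for this sketch.
struct Expert {
    username: String,
    score: f64,
    last_seen: i64, // unix timestamp of most recent matched activity
}

// Total order: score DESC, last_seen DESC, username ASC.
// f64::total_cmp is a total order (well-defined even for NaN),
// so the sort is deterministic without any epsilon bucketing.
fn rank_cmp(a: &Expert, b: &Expert) -> Ordering {
    b.score
        .total_cmp(&a.score)
        .then(b.last_seen.cmp(&a.last_seen))
        .then(a.username.cmp(&b.username))
}

fn main() {
    let mut experts = vec![
        Expert { username: "zhayes".into(), score: 41.2, last_seen: 1_700_000_000 },
        Expert { username: "jdefting".into(), score: 41.2, last_seen: 1_700_000_000 },
        Expert { username: "teernisse".into(), score: 97.5, last_seen: 1_690_000_000 },
    ];
    experts.sort_by(rank_cmp);
    let order: Vec<&str> = experts.iter().map(|e| e.username.as_str()).collect();
    // Equal score and last_seen fall back to username ASC.
    assert_eq!(order, ["teernisse", "jdefting", "zhayes"]);
    println!("{order:?}");
}
```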