Files

Taylor Eernisse 4185abe05d docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc

Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-09 10:16:48 -05:00

49 KiB

Raw Blame History

plan, title, status, iteration, target_iterations, beads_revision, related_plans, created, updated

plan

title

status

iteration

target_iterations

beads_revision

related_plans

created

updated

true

iterating

2026-02-08

Time-Decay Expert Scoring Model

Context

The lore who --path command currently uses flat weights to score expertise: each authored MR counts as 25 points, each reviewed MR as 10, each inline note as 1 — regardless of when the activity happened. This produces three compounding problems:

Temporal blindness: Old activity counts the same as recent activity. Someone who authored a file 2 years ago ranks equivalently to someone who wrote it last week.
Reviewer inflation: Senior reviewers (jdefting, zhayes) who rubber-stamp every MR via assignment accumulate inflated scores indistinguishable from reviewers who actually left substantive inline feedback. The mr_reviewers table captures assignment, not engagement.
Path-history blindness: Renamed or moved files lose historical expertise because signal matching relies on position_new_path and mr_file_changes.new_path only. A developer who authored the file under its previous name gets zero credit after a rename.

The fix has three parts:

Apply exponential half-life decay to each signal, grounded in cognitive science research
Split the reviewer signal into "participated" (left DiffNotes) vs "assigned-only" (in mr_reviewers but no inline comments), with different weights and decay rates
Match both old and new paths in all signal queries AND path resolution probes so expertise survives file renames

Research Foundation

Ebbinghaus Forgetting Curve (1885): Memory retention follows exponential decay: R = 2^(-t/h) where h is the half-life
Generation Effect (Slamecka & Graf, 1978): Producing information (authoring code) creates ~2x more durable memory traces than reading it (reviewing)
Levels of Processing (Craik & Lockhart, 1972): Deeper cognitive engagement creates more durable memories — authoring > reviewing > commenting
Half-Life Regression (Settles & Meeder, 2016, Duolingo): Exponential decay with per-signal-type half-lives is practical and effective at scale. Chosen over power law for additivity, bounded behavior, and intuitive parameterization
Fritz et al. (2010, ICSE): "Degree-of-knowledge" model for code familiarity considers both authoring and interaction events with time-based decay

Scoring Formula

score(user, path) = Sum_i( weight_i * 2^(-days_elapsed_i / half_life_i) )

For note signals grouped per MR, a diminishing-returns function caps comment storms:

note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)

Why log2 instead of ln? With log2, a single note contributes exactly note_bonus * 1.0 (since log2(2) = 1), making the note_bonus weight directly interpretable as "points per note at count=1." With ln, one note contributes note_bonus * 0.69, which is unintuitive and means note_bonus=1 doesn't actually mean "1 point per note." The diminishing-returns curve shape is identical — only the scale factor differs.

Per-signal contributions (each signal is either per-MR or per-note-group):

Signal Type	Base Weight	Half-Life	Rationale
Author (authored MR touching path)	25	180 days	Deep generative engagement; ~50% retention at 6 months
Reviewer Participated (left DiffNote on MR/path)	10	90 days	Active review engagement; ~50% at 3 months
Reviewer Assigned-Only (in `mr_reviewers`, no DiffNote on path)	3	45 days	Passive assignment; minimal cognitive engagement, fades fast
Note (inline DiffNotes on path, grouped per MR)	1	45 days	`log2(1+count)` per MR; diminishing returns prevent comment storms

Why split reviewers? The mr_reviewers table records assignment, not engagement. A reviewer who left 5 inline comments on a file has demonstrably more expertise than one who was merely assigned and clicked "approve." The participated signal inherits the old reviewer weight (10) and decay (90 days); the assigned-only signal gets reduced weight (3) and faster decay (45 days) — enough to register but not enough to inflate past actual contributors.

Why require substantive notes? Participation is qualified by a minimum note body length (reviewer_min_note_chars, default 20). Without this, a single "LGTM" or "+1" comment would promote a reviewer from the 3-point assigned-only tier to the 10-point participated tier — a 3.3x weight increase for zero substantive engagement. The threshold is configurable to accommodate teams with different review conventions.

Why cap notes per MR? Without diminishing returns, a back-and-forth thread of 30 comments on a single MR would score 30 note points — disproportionate to the expertise gained. log2(1 + 30) ≈ 4.95 vs log2(1 + 1) = 1.0 preserves the signal that more comments = more engagement while preventing outlier MRs from dominating. The 30-note reviewer gets ~5x the credit of a 1-note reviewer, not 30x.

Author/reviewer signals are deduplicated per MR (one signal per distinct MR). Note signals are grouped per (user, MR) and use log2(1 + count) scaling.

Why include closed MRs? Closed-without-merge MRs represent real review effort and code familiarity even though the code was abandoned. All signals from closed MRs are multiplied by closed_mr_multiplier (default 0.5) to reflect this reduced but non-zero contribution. This applies uniformly to author, reviewer, and note signals on closed MRs.

Files to Modify

src/core/config.rs — Add half-life fields + assigned-only reviewer config to ScoringConfig; add config validation
src/cli/commands/who.rs — Core changes:
- Add half_life_decay() pure function
- Restructure query_expert(): SQL returns hybrid-aggregated signal rows with timestamps (MR-level for author/reviewer, note-count-per-MR for notes), Rust applies decay + log2(1+count) + final ranking
- Match both new_path and old_path in all signal queries (rename awareness)
- Extend rename awareness to build_path_query() probes and suffix_probe() (not just scoring)
- Split reviewer signal into participated vs assigned-only
- Use state-aware timestamps (merged_at for merged MRs, updated_at for open MRs)
- Change default --since from "6m" to "24m" (2 years captures all meaningful decayed signals)
- Add --as-of flag for reproducible scoring at a fixed timestamp
- Add --explain-score flag for per-user score component breakdown
- Sort on raw f64 score, round only for display
- Update tests
src/core/db.rs — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)

Implementation Details

1. ScoringConfig (config.rs)

Add half-life fields and the new assigned-only reviewer signal. All new fields use #[serde(default)] for backward compatibility:

pub struct ScoringConfig {
    pub author_weight: i64,                    // default: 25
    pub reviewer_weight: i64,                  // default: 10  (participated — left DiffNotes)
    pub reviewer_assignment_weight: i64,       // default: 3   (assigned-only — no DiffNotes on path)
    pub note_bonus: i64,                       // default: 1
    pub author_half_life_days: u32,            // default: 180
    pub reviewer_half_life_days: u32,          // default: 90  (participated)
    pub reviewer_assignment_half_life_days: u32, // default: 45 (assigned-only)
    pub note_half_life_days: u32,              // default: 45
    pub closed_mr_multiplier: f64,             // default: 0.5  (applied to closed-without-merge MRs)
    pub reviewer_min_note_chars: u32,          // default: 20   (minimum note body length to count as participation)
}

Config validation: Add a validate_scoring() call in Config::load_from_path() after deserialization:

All *_half_life_days must be > 0 (prevents division by zero in decay function)
All *_weight / *_bonus must be >= 0 (negative weights produce nonsensical scores)
closed_mr_multiplier must be in (0.0, 1.0] (0 would discard closed MRs entirely; >1 would over-weight them)
reviewer_min_note_chars must be >= 0 (0 disables the filter; typical useful values: 10-50)
Return LoreError::ConfigInvalid with a clear message on failure

2. Decay Function (who.rs)

fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
    let days = (elapsed_ms as f64 / 86_400_000.0).max(0.0);
    let hl = f64::from(half_life_days);
    if hl <= 0.0 { return 0.0; }
    2.0_f64.powf(-days / hl)
}

3. SQL Restructure (who.rs)

The SQL uses CTE-based dual-path matching and hybrid aggregation. Rather than repeating OR old_path in every signal subquery, two foundational CTEs (matched_notes, matched_file_changes) centralize path matching. A third CTE (reviewer_participation) precomputes which reviewers actually left DiffNotes, avoiding correlated EXISTS/NOT EXISTS subqueries.

MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with note_count and max_ts. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and log2(1+count).

WITH matched_notes AS (
  -- Centralize dual-path matching for DiffNotes
  SELECT n.id, n.discussion_id, n.author_username, n.created_at,
         n.position_new_path, n.position_old_path, n.project_id
  FROM notes n
  WHERE n.note_type = 'DiffNote'
    AND n.is_system = 0
    AND n.author_username IS NOT NULL
    AND n.created_at >= ?2
    AND n.created_at <= ?4
    AND (?3 IS NULL OR n.project_id = ?3)
    AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
),
matched_file_changes AS (
  -- Centralize dual-path matching for file changes
  SELECT fc.merge_request_id, fc.project_id
  FROM mr_file_changes fc
  WHERE (?3 IS NULL OR fc.project_id = ?3)
    AND (fc.new_path {path_op} OR fc.old_path {path_op})
),
reviewer_participation AS (
  -- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
  -- Materialized once, then joined against mr_reviewers to classify.
  -- The LENGTH filter excludes trivial notes ("LGTM", "+1", emoji-only) from qualifying
  -- a reviewer as "participated." Without this, a single "LGTM" would promote an assigned
  -- reviewer from 3-point to 10-point weight, defeating the purpose of the split.
  -- Note: mn.id refers back to notes.id, so we join notes to access the body column
  -- (not carried in matched_notes to avoid bloating that CTE with body text).
  SELECT DISTINCT d.merge_request_id AS mr_id, mn.author_username AS username
  FROM matched_notes mn
  JOIN discussions d ON mn.discussion_id = d.id
  JOIN notes n_body ON mn.id = n_body.id
  WHERE d.merge_request_id IS NOT NULL
    AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= {reviewer_min_note_chars}
),
raw AS (
  -- Signal 1: DiffNote reviewer (individual notes for note_cnt)
  SELECT mn.author_username AS username, 'diffnote_reviewer' AS signal,
         m.id AS mr_id, mn.id AS note_id, mn.created_at AS seen_at, m.state AS mr_state
  FROM matched_notes mn
  JOIN discussions d ON mn.discussion_id = d.id
  JOIN merge_requests m ON d.merge_request_id = m.id
  WHERE (m.author_username IS NULL OR mn.author_username != m.author_username)
    AND m.state IN ('opened','merged','closed')

  UNION ALL

  -- Signal 2: DiffNote MR author
  SELECT m.author_username AS username, 'diffnote_author' AS signal,
         m.id AS mr_id, NULL AS note_id, MAX(mn.created_at) AS seen_at, m.state AS mr_state
  FROM merge_requests m
  JOIN discussions d ON d.merge_request_id = m.id
  JOIN matched_notes mn ON mn.discussion_id = d.id
  WHERE m.author_username IS NOT NULL
    AND m.state IN ('opened','merged','closed')
  GROUP BY m.author_username, m.id

  UNION ALL

  -- Signal 3: MR author via file changes (state-aware timestamp)
  SELECT m.author_username AS username, 'file_author' AS signal,
         m.id AS mr_id, NULL AS note_id,
         {state_aware_ts} AS seen_at, m.state AS mr_state
  FROM matched_file_changes mfc
  JOIN merge_requests m ON mfc.merge_request_id = m.id
  WHERE m.author_username IS NOT NULL
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
    AND {state_aware_ts} <= ?4

  UNION ALL

  -- Signal 4a: Reviewer participated (in mr_reviewers AND left DiffNotes on path)
  SELECT r.username AS username, 'file_reviewer_participated' AS signal,
         m.id AS mr_id, NULL AS note_id,
         {state_aware_ts} AS seen_at, m.state AS mr_state
  FROM matched_file_changes mfc
  JOIN merge_requests m ON mfc.merge_request_id = m.id
  JOIN mr_reviewers r ON r.merge_request_id = m.id
  JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
  WHERE r.username IS NOT NULL
    AND (m.author_username IS NULL OR r.username != m.author_username)
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
    AND {state_aware_ts} <= ?4

  UNION ALL

  -- Signal 4b: Reviewer assigned-only (in mr_reviewers, NO DiffNotes on path)
  SELECT r.username AS username, 'file_reviewer_assigned' AS signal,
         m.id AS mr_id, NULL AS note_id,
         {state_aware_ts} AS seen_at, m.state AS mr_state
  FROM matched_file_changes mfc
  JOIN merge_requests m ON mfc.merge_request_id = m.id
  JOIN mr_reviewers r ON r.merge_request_id = m.id
  LEFT JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
  WHERE rp.username IS NULL  -- NOT in participation set
    AND r.username IS NOT NULL
    AND (m.author_username IS NULL OR r.username != m.author_username)
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
    AND {state_aware_ts} <= ?4
),
aggregated AS (
  -- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
  SELECT username, signal, mr_id, 1 AS qty, MAX(seen_at) AS ts, mr_state
  FROM raw WHERE signal != 'diffnote_reviewer'
  GROUP BY username, signal, mr_id
  UNION ALL
  -- Note signals: 1 row per (username, mr_id) with note_count and max_ts
  SELECT username, 'note_group' AS signal, mr_id, COUNT(*) AS qty, MAX(seen_at) AS ts, mr_state
  FROM raw WHERE signal = 'diffnote_reviewer' AND note_id IS NOT NULL
  GROUP BY username, mr_id
)
SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL

Where {state_aware_ts} is the state-aware timestamp expression (defined in the next section), {path_op} is either = ?1 or LIKE ?1 ESCAPE '\\' depending on the path query type, ?4 is the as_of_ms upper bound (defaults to now_ms when --as-of is not specified), and {reviewer_min_note_chars} is the configured reviewer_min_note_chars value (default 20, inlined as a literal in the SQL string). The BETWEEN ?2 AND ?4 pattern ensures that when --as-of is set to a past date, events after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility.

Rationale for CTE-based dual-path matching: The previous approach (repeating OR old_path in every signal subquery) duplicated the path matching logic 5 times. Factoring it into matched_notes and matched_file_changes CTEs means path matching is defined once, the indexes are hit once, and adding future path resolution logic (e.g., alias chains) only requires changes in one place.

Index optimization fallback (UNION ALL split): SQLite's query planner sometimes struggles with OR across two indexed columns, falling back to a full table scan instead of using either index. If EXPLAIN QUERY PLAN shows this during step 6 verification, replace the OR-based CTEs with a UNION ALL split + dedup pattern:

matched_notes AS (
  SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
  UNION ALL
  SELECT ... FROM notes n WHERE ... AND n.position_old_path {path_op}
),
matched_notes_dedup AS (
  SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
  FROM matched_notes
),

This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when old_path = new_path (no rename). Start with the simpler OR approach and only switch to UNION ALL if query plans confirm the degradation.

Rationale for precomputed participation set: The previous approach used correlated EXISTS/NOT EXISTS subqueries to classify reviewers. The reviewer_participation CTE materializes the set of (mr_id, username) pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with IS NULL (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every mr_reviewers row falls into exactly one bucket.

Rationale for hybrid over fully-raw: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density.

Path rename awareness: Both matched_notes and matched_file_changes CTEs match against both old and new path columns:

Notes: (n.position_new_path {path_op} OR n.position_old_path {path_op})
File changes: (fc.new_path {path_op} OR fc.old_path {path_op})

Both columns already exist in the schema (notes.position_old_path from migration 002, mr_file_changes.old_path from migration 016). The OR match ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (--path src/foo/), the LIKE operator applies to both columns identically.

Signal 4 splits into two: The current signal 4 (file_reviewer) joins mr_reviewers but doesn't distinguish participation. In the new plan:

Signal 4a (file_reviewer_participated): User is in mr_reviewers AND appears in the reviewer_participation CTE (left DiffNotes on the path for that MR). Gets reviewer_weight (10) and reviewer_half_life_days (90).
Signal 4b (file_reviewer_assigned): User is in mr_reviewers but NOT in the reviewer_participation CTE. Gets reviewer_assignment_weight (3) and reviewer_assignment_half_life_days (45).

3a. Path Resolution Probes (who.rs)

Rename awareness must extend beyond scoring queries to the path resolution layer. Currently build_path_query() (line 457) and suffix_probe() (line 584) only check position_new_path and new_path. If a user queries an old path name, these probes return "not found" and the scoring query never runs.

Changes to build_path_query():

Probe 1 (exact_exists): Add OR position_old_path = ?1 to the notes query and OR old_path = ?1 to the mr_file_changes query. This detects files that existed under the queried name even if they've since been renamed.
Probe 2 (prefix_exists): Add OR position_old_path LIKE ?1 ESCAPE '\\' and OR old_path LIKE ?1 ESCAPE '\\' to the respective queries.

Changes to suffix_probe():

The UNION query inside suffix_probe() currently only selects position_new_path from notes and new_path from file changes. Add two additional UNION branches:

UNION
SELECT position_old_path AS full_path FROM notes
WHERE note_type = 'DiffNote' AND is_system = 0
  AND position_old_path IS NOT NULL
  AND (position_old_path LIKE ?1 ESCAPE '\\' OR position_old_path = ?2)
  AND (?3 IS NULL OR project_id = ?3)
UNION
SELECT old_path AS full_path FROM mr_file_changes
WHERE old_path IS NOT NULL
  AND (old_path LIKE ?1 ESCAPE '\\' OR old_path = ?2)
  AND (?3 IS NULL OR project_id = ?3)

This ensures that querying by an old filename (e.g., login.rs after it was renamed to auth.rs) still resolves to a usable path for scoring. The UNION deduplicates so the same path appearing in both old and new columns doesn't cause false ambiguity.

State-aware timestamps for file-change signals (signals 3, 4a, 4b): Replace m.updated_at with a state-aware expression:

CASE
  WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
  WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
  ELSE COALESCE(m.updated_at, m.created_at)  -- opened / other
END AS activity_ts

Rationale: updated_at is noisy for merged MRs — it changes on label edits, title changes, rebases, and metadata touches, creating false recency. merged_at is the best indicator of when code expertise was formed (the moment the code entered the branch). But for open MRs, updated_at is actually the right signal because it reflects ongoing active work. closed_at anchors closed-without-merge MRs to their closure time (these represent review effort even if the code was abandoned). Each state gets the timestamp that best represents when expertise was last exercised.

4. Rust-Side Aggregation (who.rs)

For each username, accumulate into a struct with:

Author MRs: HashMap<i64, (i64, String)> (mr_id -> (max timestamp, mr_state)) from diffnote_author + file_author signals
Reviewer Participated MRs: HashMap<i64, (i64, String)> from diffnote_reviewer + file_reviewer_participated signals
Reviewer Assigned-Only MRs: HashMap<i64, (i64, String)> from file_reviewer_assigned signals (excluding any MR already in participated set)
Notes per MR: HashMap<i64, (u32, i64, String)> (mr_id -> (count, max_ts, mr_state)) from note_group rows in the aggregated query (already grouped per user+MR with note_count in qty). Used for log2(1 + count) diminishing returns.
Last seen: max of all timestamps
Components (when --explain-score): Track per-component f64 subtotals for author, reviewer_participated, reviewer_assigned, notes

The mr_state field from each SQL row is stored alongside the timestamp so the Rust-side can apply closed_mr_multiplier when mr_state == "closed".

Compute score as f64. Each MR-level contribution is multiplied by closed_mr_multiplier (default 0.5) when the MR's state is "closed":

state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 }

raw_score =
    sum(author_weight * state_mult(mr) * decay(now - ts, author_hl) for (mr, ts) in author_mrs)
  + sum(reviewer_weight * state_mult(mr) * decay(now - ts, reviewer_hl) for (mr, ts) in reviewer_participated)
  + sum(reviewer_assignment_weight * state_mult(mr) * decay(now - ts, reviewer_assignment_hl) for (mr, ts) in reviewer_assigned)
  + sum(note_bonus * state_mult(mr) * log2(1 + count) * decay(now - ts, note_hl) for (mr, count, ts) in notes_per_mr)

Why include closed MRs? A closed-without-merge MR still represents review effort and code familiarity — the reviewer read the diff, left comments, and engaged with the code even though it was ultimately abandoned. Excluding closed MRs entirely (the previous plan's approach) discarded this signal. The closed_mr_multiplier (default 0.5) halves the contribution, reflecting that the code never landed but the reviewer's cognitive engagement was real. This also eliminates the dead-code inconsistency where the state-aware CASE expression handled closed but the WHERE clause excluded it.

Sort on raw f64 score — (raw_score DESC, last_seen DESC, username ASC). This prevents false ties from premature rounding. Only round to i64 for the Expert.score display field after sorting and truncation. The robot JSON score field stays integer for backward compatibility. When --explain-score is active, also include score_raw (the unrounded f64) alongside score so the component totals can be verified without rounding noise.

Compute counts from the accumulated data:

review_mr_count = reviewer_participated.len() + reviewer_assigned.len()
review_note_count = notes_per_mr.values().map(|(count, _)| count).sum()
author_mr_count = author_mrs.len()

Truncate to limit after sorting.

5. Default --since Change

Expert mode: "6m" -> "24m" (line 289 in who.rs). At 2 years, author decay = 6%, reviewer decay = 0.4%, note decay = 0.006% — negligible, good cutoff.

Diagnostic escape hatch: Add --all-history flag (conflicts with --since) that sets since_ms = 0, capturing all data regardless of age. Useful for debugging scoring anomalies and validating the decay model against known experts. The since_mode field in robot JSON reports "all" when this flag is active.

5a. Reproducible Scoring via `--as-of`

Add --as-of <RFC3339|YYYY-MM-DD> flag that overrides the now_ms reference point used for decay calculations. When set:

All event selection is bounded by [since_ms, as_of_ms] — events after as_of_ms are excluded from SQL results entirely (not just decayed)
All decay computations use as_of_ms instead of SystemTime::now()
The --since window is calculated relative to as_of_ms (not wall clock)
Robot JSON resolved_input includes as_of_ms and as_of_iso fields

Rationale: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. --as-of pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since elapsed.max(0.0) clamps negative elapsed time to zero), breaking the reproducibility guarantee.

Implementation: Parse the flag in run_who(), compute as_of_ms: i64, and thread it through to query_expert() where it replaces now_ms() and is bound as ?4 in all SQL queries. When the flag is absent, ?4 defaults to now_ms() (wall clock), which makes the upper bound transparent — all events are within the window by definition. The flag is compatible with all modes but primarily useful in expert mode.

5b. Score Explainability via `--explain-score`

Add --explain-score flag that augments each expert result with a per-user component breakdown:

{
  "username": "jsmith",
  "score": 42,
  "score_raw": 42.0,
  "components": {
    "author": 28.5,
    "reviewer_participated": 8.2,
    "reviewer_assigned": 1.8,
    "notes": 3.5
  }
}

Scope for this iteration: Component breakdown only (4 floats per user). No top-evidence MRs, no decay curves, no per-MR drill-down. Those are v2 features if scoring disputes arise frequently.

Flag conflicts: --explain-score is mutually exclusive with --detail. Both augment per-user output in different ways; combining them would produce confusing overlapping output. Clap conflicts_with enforces this at parse time.

Human output: When --explain-score is active in human mode, append a parenthetical after each score: 42 (author:28.5 review:10.0 notes:3.5).

Robot output: Add score_raw (unrounded f64) and components object to each expert entry. Only present when --explain-score is active (no payload bloat by default). The resolved_input section also includes scoring_model_version: 2 to distinguish from the v1 flat-weight model, enabling robot clients to adapt parsing.

Rationale: Multi-signal decayed ranking will be disputed without decomposition. Showing which signal drives a user's score makes results actionable and builds trust in the model. Keeping scope minimal avoids the output format complexity that originally motivated deferral.

6. Index Migration (db.rs)

Add a new migration to support the restructured query patterns. The dual-path matching CTEs and reviewer_participation CTE introduce query shapes that need index coverage:

-- Support dual-path matching on DiffNotes (old_path leg of the OR in matched_notes CTE)
CREATE INDEX IF NOT EXISTS idx_notes_old_path_author
  ON notes(position_old_path, author_username, created_at)
  WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;

-- Support dual-path matching on file changes (old_path leg of the OR in matched_file_changes CTE)
CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
  ON mr_file_changes(old_path, project_id, merge_request_id)
  WHERE old_path IS NOT NULL;

-- Support new_path matching on file changes (ensure index parity with old_path)
-- Existing indexes may not have optimal column order for the CTE pattern.
CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
  ON mr_file_changes(new_path, project_id, merge_request_id);

-- Support reviewer_participation CTE: joining matched_notes -> discussions -> mr_reviewers
-- notes.discussion_id (NOT noteable_id, which doesn't exist in the schema) is the FK to discussions
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
  ON notes(discussion_id, author_username, created_at)
  WHERE note_type = 'DiffNote' AND is_system = 0;

Rationale: The existing indexes cover position_new_path and new_path but not their old_path counterparts. Without these, the OR old_path clauses would force table scans on renamed files. The reviewer_participation CTE joins matched_notes -> discussions -> merge_requests, so an index on (discussion_id, author_username) speeds up the CTE materialization.

Schema note: The notes table uses discussion_id as its FK to discussions, which in turn has merge_request_id. There is no noteable_id column on notes. The previous plan revision incorrectly referenced noteable_id — this is corrected.

Removed: The idx_mr_state_timestamps composite index on merge_requests(state, merged_at, closed_at, updated_at, created_at) was removed. MR lookups in the scoring query are always id-driven (joining from matched_file_changes or discussions), so the state-aware CASE expression operates on rows already fetched by primary key. A low-selectivity composite index on 5 columns would consume space without improving any query path.

Partial indexes (with WHERE clauses) keep the index size minimal — only DiffNote rows and non-null old_path rows are indexed.

7. Test Helpers

Add timestamp-aware variants:

insert_mr_at(conn, id, project_id, iid, author, state, updated_at_ms)
insert_diffnote_at(conn, id, discussion_id, project_id, author, file_path, body, created_at_ms)

8. New Tests (TDD)

Example-based tests

test_half_life_decay_math: Verify the pure function:

elapsed=0 -> 1.0
elapsed=half_life -> 0.5
elapsed=2*half_life -> 0.25
half_life_days=0 -> 0.0 (guard against div-by-zero)

test_expert_scores_decay_with_time: Two authors, one recent (10 days), one old (360 days). Recent author should score ~24, old author ~6.

test_expert_reviewer_decays_faster_than_author: Same MR, same age (90 days). Author retains ~18 points, reviewer retains ~5 points. Author dominates clearly.

test_reviewer_participated_vs_assigned_only: Two reviewers on the same MR at the same age. One left DiffNotes (participated), one didn't (assigned-only). Participated reviewer should score ~10 * decay, assigned-only should score ~3 * decay. Verifies the split works end-to-end.

test_note_diminishing_returns_per_mr: One reviewer with 1 note on MR-A and another with 20 notes on MR-B, both at same age. The 20-note reviewer should score log2(21)/log2(2) ≈ 4.4x the 1-note reviewer, NOT 20x. Validates the log2(1+count) cap.

test_config_validation_rejects_zero_half_life: ScoringConfig with author_half_life_days = 0 should return ConfigInvalid error.

test_file_change_timestamp_uses_merged_at: An MR with merged_at set and state = 'merged' should use merged_at timestamp, not updated_at. Verify by setting merged_at to old date and updated_at to recent date — score should reflect the old date.

test_open_mr_uses_updated_at: An MR with state = 'opened' should use updated_at (not created_at). Verify that an open MR with recent updated_at scores higher than one with the same created_at but older updated_at.

test_old_path_match_credits_expertise: Insert a DiffNote with position_old_path = "src/old.rs" and position_new_path = "src/new.rs". Query --path src/old.rs — the author should appear. Query --path src/new.rs — same author should also appear. Validates dual-path matching.

test_explain_score_components_sum_to_total: With --explain-score, verify that components.author + components.reviewer_participated + components.reviewer_assigned + components.notes equals the reported score_raw (within f64 rounding tolerance). Note: the closed_mr_multiplier is already folded into the per-component subtotals, not tracked as a separate component.

test_as_of_produces_deterministic_results: Insert data at known timestamps. Run query_expert twice with the same --as-of value — results must be identical. Then run with a later --as-of — scores should be lower (more decay).

test_old_path_probe_exact_and_prefix: Insert a DiffNote with position_old_path = "src/old/foo.rs" and position_new_path = "src/new/foo.rs". Call build_path_query(conn, "src/old/foo.rs") — should resolve as exact file (not "not found"). Call build_path_query(conn, "src/old/") — should resolve as prefix. Validates that the path resolution probes now check old_path columns.

test_suffix_probe_uses_old_path_sources: Insert a file change with old_path = "legacy/utils.rs" and new_path = "src/utils.rs". Call build_path_query(conn, "legacy/utils.rs") — should resolve via exact probe on old_path. Call build_path_query(conn, "utils.rs") — suffix probe should find both legacy/utils.rs and src/utils.rs and either resolve uniquely (if deduplicated) or report ambiguity.

test_since_relative_to_as_of_clock: Insert data at timestamps T1 and T2 (T2 > T1). With --as-of T2 and --since 30d, the window is [T2 - 30d, T2], not [now - 30d, now]. Verify that data at T1 is included or excluded based on the as-of-relative window, not the wall clock window.

test_explain_and_detail_are_mutually_exclusive: Parsing --explain-score --detail should fail with a conflict error from clap.

test_trivial_note_does_not_count_as_participation: A reviewer who left only a short note ("LGTM", 4 chars) on an MR should be classified as assigned-only, not participated, when reviewer_min_note_chars = 20. A reviewer who left a substantive note (>= 20 chars) should be classified as participated. Validates the LENGTH threshold in the reviewer_participation CTE.

test_closed_mr_multiplier: Two identical MRs (same author, same age, same path). One is merged, one is closed. The merged MR should contribute author_weight * decay(...), the closed MR should contribute author_weight * closed_mr_multiplier * decay(...). With default multiplier 0.5, the closed MR contributes half.

test_as_of_excludes_future_events: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With --as-of set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the upper-bound filtering in SQL.

test_null_timestamp_fallback_to_created_at: Insert a merged MR with merged_at = NULL (edge case: old data before the column was populated). The state-aware timestamp should fall back to created_at. Verify the score reflects created_at, not 0 or a panic.

Invariant tests (regression safety for ranking systems)

test_score_monotonicity_by_age: For any single signal type, an older timestamp must never produce a higher score than a newer timestamp with the same weight and half-life. Generate N random (age, half_life) pairs and assert decay(older) <= decay(newer) for all.

test_row_order_independence: Insert the same set of signals in two different orders (e.g., reversed). Run query_expert on both — the resulting rankings (username order + scores) must be identical. Validates that neither SQL ordering nor HashMap iteration order affects final output.

test_reviewer_split_is_exhaustive: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting reviewer_min_note_chars) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.

9. Existing Test Compatibility

All existing tests insert data with now_ms(). With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.

The test_expert_scoring_weights_are_configurable test needs ..Default::default() added to fill the new half-life fields, reviewer_assignment_weight / reviewer_assignment_half_life_days, closed_mr_multiplier, and reviewer_min_note_chars fields.

Verification

cargo check --all-targets — no compiler errors
cargo clippy --all-targets -- -D warnings — no lints
cargo fmt --check — formatting clean
cargo test — all existing + new tests pass (including invariant tests)
ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs — no bug scanner findings
Manual query plan verification (not automated — SQLite planner varies across versions):
- Run EXPLAIN QUERY PLAN on the expert query (both exact and prefix modes) against a real database
- Confirm that matched_notes CTE uses idx_notes_old_path_author or the existing new_path index (not a full table scan)
- Confirm that matched_file_changes CTE uses idx_mfc_old_path_project_mr or idx_mfc_new_path_project_mr
- Confirm that reviewer_participation CTE uses idx_notes_diffnote_discussion_author
- Document the observed plan in a comment near the SQL for future regression reference
Real-world validation:
- cargo run --release -- who --path MeasurementQualityDialog.tsx — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
- cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history — compare full history vs 24m window to validate cutoff is reasonable
- cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score — verify component breakdown sums to total and authored signal dominates for known authors
- Spot-check that assigned-only reviewers (those who never left DiffNotes) rank below participated reviewers on the same MR
- Test a known renamed file path — verify expertise from the old name carries forward
- cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01 — verify deterministic output across repeated runs
- Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
- Verify closed MRs contribute at ~50% of equivalent merged MR scores via --explain-score

Accepted from External Review

Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genuinely improved the plan:

From feedback-1 and feedback-2:

Path rename awareness (old_path matching): Real correctness gap. Both position_old_path and mr_file_changes.old_path exist in the schema. Simple OR clause addition with high value — expertise now survives file renames.
Hybrid SQL pre-aggregation: Revised from "fully raw rows" to pre-aggregate note counts per (user, MR) in SQL. MR-level signals were already 1-per-MR; the note rows were the actual scalability risk. Bounded row counts with predictable memory.
State-aware timestamps: Improved from our overly-simple COALESCE(merged_at, created_at) to a state-aware CASE expression. Open MRs genuinely need updated_at to reflect ongoing work; merged MRs need merged_at to anchor expertise formation.
Index migration: The dual-path matching and CTE patterns need index support. Added partial indexes to keep size minimal.
Invariant tests: test_score_monotonicity_by_age, test_row_order_independence, test_reviewer_split_is_exhaustive catch subtle ranking regressions that example-based tests miss.
--as-of flag: Simple clock-pinning for reproducible decay scoring. Essential for debugging and test determinism.
--explain-score flag: Moved from rejected to included with minimal scope (component breakdown only, no per-MR drill-down). Multi-signal scoring needs decomposition to build trust.

From feedback-3:

Fix noteable_id index bug (critical): The notes table uses discussion_id as FK to discussions, not noteable_id (which doesn't exist). The proposed idx_notes_mr_path_author index would fail at migration time. Fixed to use (discussion_id, author_username, created_at).
CTE-based dual-path matching (matched_notes, matched_file_changes): Rather than repeating OR old_path in every signal subquery, centralize path matching in foundational CTEs. Defined once, indexed once, maintained once. Cleaner and more extensible.
Precomputed reviewer_participation CTE: Replaced correlated EXISTS/NOT EXISTS subqueries with a materialized set of (mr_id, username) pairs. Same semantics, lower query cost, simpler reasoning about the reviewer split.
log2(1+count) over ln(1+count) for notes: With log2, one note contributes exactly 1.0 unit (since log2(2) = 1), making note_bonus=1 directly interpretable. ln gives 0.69 per note, which is unintuitive.
Path resolution probe rename awareness: The plan added old_path matching to scoring queries but missed the upstream path resolution layer (build_path_query() probes and suffix_probe()). Without this, querying an old path name fails at resolution and never reaches scoring. Now both probes check old_path columns.
Removed low-selectivity idx_mr_state_timestamps: MR lookups in scoring are id-driven (from file_changes or discussions), so a 5-column composite on state/timestamps adds no query benefit.
Added idx_mfc_new_path_project_mr: Ensures index parity between old and new path columns on mr_file_changes.
--explain-score conflicts with --detail: Prevents confusing overlapping output from two per-user augmentation flags.
scoring_model_version in resolved_input: Lets robot clients distinguish v1 (flat weights) from v2 (decayed) output schemas.
score_raw in explain mode: Exposes the unrounded f64 so component totals can be verified without rounding noise.
New tests: test_old_path_probe_exact_and_prefix, test_suffix_probe_uses_old_path_sources, test_since_relative_to_as_of_clock, test_explain_and_detail_are_mutually_exclusive, test_null_timestamp_fallback_to_created_at — cover the newly-identified gaps in path resolution, clock semantics, and edge cases.
EXPLAIN QUERY PLAN verification step: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).

From feedback-4:

--as-of temporal correctness (critical): The plan described --as-of but the SQL only enforced a lower bound (>= ?2). Events after the as-of date would leak in with full weight (because elapsed.max(0.0) clamps negative elapsed time to zero). Added <= ?4 upper bound to all SQL timestamp filters, making the query window [since_ms, as_of_ms]. Without this, --as-of reproducibility was fundamentally broken.
Closed-state inconsistency resolution: The state-aware CASE expression handled closed state but the WHERE clause filtered to ('opened','merged') only — dead code. Resolved by including 'closed' in state filters and adding a closed_mr_multiplier (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
Substantive note threshold for reviewer participation: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added reviewer_min_note_chars (default 20) config field and LENGTH(TRIM(body)) filter in the reviewer_participation CTE. This raises the bar for participation classification to actual substantive review comments.
UNION ALL optimization fallback for path predicates: SQLite's planner can degrade OR across two indexed columns to a table scan. Added documentation of a UNION ALL split + dedup fallback pattern to use if EXPLAIN QUERY PLAN shows degradation during verification. Start with the simpler OR approach; switch only if needed.
New tests: test_trivial_note_does_not_count_as_participation, test_closed_mr_multiplier, test_as_of_excludes_future_events — cover the three new features added from this review round.

Rejected Ideas (with rationale)

These suggestions were considered during review but explicitly excluded from this iteration:

Rename alias chain expansion (A->B->C traversal) (feedback-2 #2, feedback-4 #4): Over-engineered for v1. The old_path OR match covers the 80% case (direct renames). Building a canonical path identity table at ingest time adds schema, ingestion logic, and graph traversal complexity for rare multi-hop renames. If real-world usage shows fragmented expertise on multi-rename files, this becomes a v2 feature.
Config-driven max_age_days (feedback-1 #5, feedback-2 #5): We already have --since (explicit window), --all-history (no window), and the 24m default (mathematically justified). Adding a config field that derives the default since window creates confusing interaction between config and CLI flags. If half-lives change, updating the default constant is trivial.
Config-driven decay_floor for derived --since default (feedback-3 #4): Proposed computing the default since window as ceil(max_half_life * log2(1/floor)) so it auto-adjusts when half-lives change. Rejected: the formula is non-obvious to users, adds a config param (decay_floor) with no intuitive meaning, and the benefit is negligible — half-life changes are rare, and updating a constant is trivial. The 24m default is already mathematically justified and easy to override with --since or --all-history.
BTreeMap + Kahan/Neumaier compensated summation (feedback-3 #6): Proposed deterministic iteration order and numerically stable summation. Rejected for this scale: the accumulator processes dozens to low hundreds of entries per user, where HashMap iteration order doesn't measurably affect f64 sums. Compensated summation adds code complexity for zero practical benefit at this magnitude. If we eventually aggregate thousands of signals per user, revisit.
Confidence/coverage metadata (feedback-1 #8, feedback-2 #8, feedback-3 #9, feedback-4 #6): Repeatedly proposed across reviews with variations (score_adjusted with confidence factor, low/medium/high labels, evidence_mr_count weighting). Still scope creep. The --explain-score component breakdown already tells users which signal drives the score. Defining "sparse evidence" thresholds (how many MRs is "low"? what's the right exponential saturation constant?) is domain-specific guesswork without user feedback data. A single recent MR "outranking broader expertise" is the correct behavior of time-decay — the model intentionally weights recency. If real-world usage shows this is a problem, confidence becomes a v2 feature informed by actual threshold data.
Automated EXPLAIN QUERY PLAN tests (feedback-3 #10 partial): SQLite's query planner changes across versions and can use different plans on different data distributions. Automated assertions on plan output are brittle. Instead, we document EXPLAIN QUERY PLAN as a manual verification step during development and include the observed plan as a comment near the SQL.
Per-MR evidence drill-down in --explain-score (feedback-2 #7 promoted this): The v1 --explain-score shows component totals only. Listing top-evidence MRs per user would require additional SQL queries and significant output format work. Deferred unless component breakdowns prove insufficient for debugging.
Split scoring engine into core module (feedback-4 #5): Proposed extracting scoring math from who.rs into src/core/scoring/model_v2_decay.rs. Premature modularization — who.rs is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.
Bot/service-account filtering (feedback-4 #7): Real concern but orthogonal to time-decay scoring. This is a general data quality feature that belongs in its own issue — it affects all who modes, not just expert scoring. Adding excluded_username_patterns config and --include-bots flag is scope expansion that should be designed and tested independently.
Model compare mode / rank-delta diagnostics (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The --explain-score + --as-of combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.

49 KiB Raw Blame History