docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc
Ideas catalog (docs/ideas/): 25 feature concept documents covering future lore capabilities including bottleneck detection, churn analysis, expert scoring, collaboration patterns, milestone risk, knowledge silos, and more. Each doc includes motivation, implementation sketch, data requirements, and dependencies on existing infrastructure. README.md provides an overview and SYSTEM-PROPOSAL.md presents the unified analytics vision. Plans (plans/): Time-decay expert scoring design with four rounds of review feedback exploring decay functions, scoring algebra, and integration points with the existing who-expert pipeline. Issue doc (docs/issues/001): Documents the timeline pipeline bug where EntityRef was missing project context, causing ambiguous cross-project references during the EXPAND stage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
114
plans/time-decay-expert-scoring.feedback-1.md
Normal file
114
plans/time-decay-expert-scoring.feedback-1.md
Normal file
@@ -0,0 +1,114 @@
|
||||
Your plan is strong directionally, but I’d revise it in 8 key places to avoid regressions and make it significantly more useful in production.
|
||||
|
||||
1. **Split reviewer signals into “participated” vs “assigned-only”**
|
||||
Reason: today’s inflation problem is often assignment noise. Treating `mr_reviewers` equal to real review activity still over-ranks passive reviewers.
|
||||
|
||||
```diff
|
||||
@@ Per-signal contributions
|
||||
-| Reviewer (reviewed MR touching path) | 10 | 90 days |
|
||||
+| ReviewerParticipated (left DiffNote on MR/path) | 10 | 90 days |
|
||||
+| ReviewerAssignedOnly (in mr_reviewers, no DiffNote by that user on MR/path) | 3 | 45 days |
|
||||
```
|
||||
|
||||
```diff
|
||||
@@ Scoring Formula
|
||||
-score = reviewer_mr * reviewer_weight + ...
|
||||
+score = reviewer_participated * reviewer_weight
|
||||
+ + reviewer_assigned_only * reviewer_assignment_weight
|
||||
+ + ...
|
||||
```
|
||||
|
||||
2. **Cap/saturate note intensity per MR**
|
||||
Reason: raw per-note addition can still reward “comment storms.” Use diminishing returns.
|
||||
|
||||
```diff
|
||||
@@ Rust-Side Aggregation
|
||||
-- Notes: Vec<i64> (timestamps) from diffnote_reviewer
|
||||
+-- Notes grouped per (username, mr_id): note_count + max_ts
|
||||
+-- Note contribution per MR uses diminishing returns:
|
||||
+-- note_score_mr = note_bonus * ln(1 + note_count) * decay(now - ts, note_hl)
|
||||
```
|
||||
|
||||
3. **Use better event timestamps than `m.updated_at` for file-change signals**
|
||||
Reason: `updated_at` is noisy (title edits, metadata touches) and creates false recency.
|
||||
|
||||
```diff
|
||||
@@ SQL Restructure
|
||||
- signal 3/4 seen_at = m.updated_at
|
||||
+ signal 3/4 activity_ts = COALESCE(m.merged_at, m.closed_at, m.created_at, m.updated_at)
|
||||
```
|
||||
|
||||
4. **Don’t stream raw note rows to Rust; pre-aggregate in SQL**
|
||||
Reason: current plan removes SQL grouping and can blow up memory/latency on large repos.
|
||||
|
||||
```diff
|
||||
@@ SQL Restructure
|
||||
-SELECT username, signal, mr_id, note_id, ts FROM signals
|
||||
+WITH raw_signals AS (...),
|
||||
+aggregated AS (
|
||||
+ -- 1 row per (username, signal_class, mr_id) for MR-level signals
|
||||
+ -- 1 row per (username, mr_id) for note_count + max_ts
|
||||
+)
|
||||
+SELECT username, signal_class, mr_id, qty, ts FROM aggregated
|
||||
```
|
||||
|
||||
5. **Replace fixed `"24m"` with model-driven cutoff**
|
||||
Reason: hardcoded 24m is arbitrary and tied to current weights/half-lives only.
|
||||
|
||||
```diff
|
||||
@@ Default --since Change
|
||||
-Expert mode: "6m" -> "24m"
|
||||
+Expert mode default window derived from scoring.max_age_days (default 1095 days / 36m).
|
||||
+Formula guidance: choose max_age where max possible single-event contribution < epsilon (e.g. 0.25 points).
|
||||
+Add `--all-history` to disable cutoff for diagnostics.
|
||||
```
|
||||
|
||||
6. **Validate scoring config explicitly**
|
||||
Reason: silent bad configs (`half_life_days = 0`, negative weights) create undefined behavior.
|
||||
|
||||
```diff
|
||||
@@ ScoringConfig (config.rs)
|
||||
pub struct ScoringConfig {
|
||||
pub author_weight: i64,
|
||||
pub reviewer_weight: i64,
|
||||
pub note_bonus: i64,
|
||||
+ pub reviewer_assignment_weight: i64, // default: 3
|
||||
pub author_half_life_days: u32,
|
||||
pub reviewer_half_life_days: u32,
|
||||
pub note_half_life_days: u32,
|
||||
+ pub reviewer_assignment_half_life_days: u32, // default: 45
|
||||
+ pub max_age_days: u32, // default: 1095
|
||||
}
|
||||
@@ Config::load_from_path
|
||||
+validate_scoring(&config.scoring)?; // weights >= 0, half_life_days > 0, max_age_days >= 30
|
||||
```
|
||||
|
||||
7. **Keep raw float score internally; round only for display**
|
||||
Reason: rounding before sort causes avoidable ties/rank instability.
|
||||
|
||||
```diff
|
||||
@@ Rust-Side Aggregation
|
||||
-Round to i64 for Expert.score field
|
||||
+Compute `raw_score: f64`, sort by raw_score DESC.
|
||||
+Expose integer `score` for existing UX.
|
||||
+Optionally expose `score_raw` and `score_components` in robot JSON when `--explain-score`.
|
||||
```
|
||||
|
||||
8. **Add confidence + data-completeness metadata**
|
||||
Reason: rankings are misleading if `mr_file_changes` coverage is poor.
|
||||
|
||||
```diff
|
||||
@@ ExpertResult / Output
|
||||
+confidence: "high" | "medium" | "low"
|
||||
+coverage: { mrs_with_file_changes, total_mrs_in_window, percent }
|
||||
+warning when coverage < threshold (e.g. 70%)
|
||||
```
|
||||
|
||||
```diff
|
||||
@@ Verification
|
||||
4. cargo test
|
||||
+5. ubs src/cli/commands/who.rs src/core/config.rs
|
||||
+6. Benchmark query_expert on representative DB (latency + rows scanned before/after)
|
||||
```
|
||||
|
||||
If you want, I can rewrite your full plan document into a clean “v2” version that already incorporates these diffs end-to-end.
|
||||
132
plans/time-decay-expert-scoring.feedback-2.md
Normal file
132
plans/time-decay-expert-scoring.feedback-2.md
Normal file
@@ -0,0 +1,132 @@
|
||||
The plan is strong, but I’d revise it in 10 places to improve correctness, scalability, and operator trust.
|
||||
|
||||
1. **Add rename/old-path awareness (correctness gap)**
|
||||
Analysis: right now both existing code and your plan still center on `position_new_path` / `new_path` matches (`src/cli/commands/who.rs:643`, `src/cli/commands/who.rs:681`). That misses expertise on renamed/deleted paths and under-ranks long-time owners after refactors.
|
||||
|
||||
```diff
|
||||
@@ ## Context
|
||||
-This produces two compounding problems:
|
||||
+This produces three compounding problems:
|
||||
@@
|
||||
2. **Reviewer inflation**: ...
|
||||
+3. **Path-history blindness**: Renamed/moved files lose historical expertise because matching relies on current-path fields only.
|
||||
|
||||
@@ ### 3. SQL Restructure (who.rs)
|
||||
-AND n.position_new_path {path_op}
|
||||
+AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
|
||||
|
||||
-AND fc.new_path {path_op}
|
||||
+AND (fc.new_path {path_op} OR fc.old_path {path_op})
|
||||
```
|
||||
|
||||
2. **Follow rename chains for queried paths**
|
||||
Analysis: matching `old_path` helps, but true continuity needs alias expansion (A→B→C). Without this, expertise before multi-hop renames is fragmented.
|
||||
|
||||
```diff
|
||||
@@ ### 3. SQL Restructure (who.rs)
|
||||
+**Path alias expansion**: Before scoring, resolve a bounded rename alias set (default max depth: 20)
|
||||
+from `mr_file_changes(change_type='renamed')`. Query signals against all aliases.
|
||||
+Output includes `path_aliases_used` for transparency.
|
||||
```
|
||||
|
||||
3. **Use hybrid SQL pre-aggregation instead of fully raw rows**
|
||||
Analysis: the “raw row” design is simpler but will degrade on large repos with heavy DiffNote volume. Pre-aggregating to `(user, mr)` for MR signals and `(user, mr, note_count)` for note signals keeps memory/latency predictable.
|
||||
|
||||
```diff
|
||||
@@ ### 3. SQL Restructure (who.rs)
|
||||
-The SQL CTE ... removes the outer GROUP BY aggregation. Instead, it returns raw signal rows:
|
||||
-SELECT username, signal, mr_id, note_id, ts FROM signals
|
||||
+Use hybrid aggregation:
|
||||
+- SQL returns MR-level rows for author/reviewer signals (1 row per user+MR+signal_class)
|
||||
+- SQL returns note groups (1 row per user+MR with note_count, max_ts)
|
||||
+- Rust applies decay + ln(1+count) + final ranking.
|
||||
```
|
||||
|
||||
4. **Make timestamp policy state-aware (merged vs opened)**
|
||||
Analysis: replacing `updated_at` with only `COALESCE(merged_at, created_at)` over-decays long-running open MRs. Open MRs need recency from active lifecycle; merged MRs should anchor to merge time.
|
||||
|
||||
```diff
|
||||
@@ ### 3. SQL Restructure (who.rs)
|
||||
-Replace m.updated_at with COALESCE(m.merged_at, m.created_at)
|
||||
+Use state-aware timestamp:
|
||||
+activity_ts =
|
||||
+ CASE
|
||||
+ WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.updated_at, m.created_at, m.last_seen_at)
|
||||
+ WHEN m.state = 'opened' THEN COALESCE(m.updated_at, m.created_at, m.last_seen_at)
|
||||
+ END
|
||||
```
|
||||
|
||||
5. **Replace fixed `24m` with config-driven max age**
|
||||
Analysis: `24m` is reasonable now, but brittle after tuning weights/half-lives. Tie cutoff to config so model behavior remains coherent as parameters evolve.
|
||||
|
||||
```diff
|
||||
@@ ### 1. ScoringConfig (config.rs)
|
||||
+pub max_age_days: u32, // default: 730 (or 1095)
|
||||
|
||||
@@ ### 5. Default --since Change
|
||||
-Expert mode: "6m" -> "24m"
|
||||
+Expert mode default window derives from `scoring.max_age_days`
|
||||
+unless user passes `--since` or `--all-history`.
|
||||
```
|
||||
|
||||
6. **Add reproducible scoring time via `--as-of`**
|
||||
Analysis: decayed ranking is time-sensitive; debugging and tests become flaky without a fixed reference clock. This improves reliability and incident triage.
|
||||
|
||||
```diff
|
||||
@@ ## Files to Modify
|
||||
-2. src/cli/commands/who.rs
|
||||
+2. src/cli/commands/who.rs
|
||||
+3. src/cli/mod.rs
|
||||
+4. src/main.rs
|
||||
|
||||
@@ ### 5. Default --since Change
|
||||
+Add `--as-of <RFC3339|YYYY-MM-DD>` to score at a fixed timestamp.
|
||||
+`resolved_input` includes `as_of_ms` and `as_of_iso`.
|
||||
```
|
||||
|
||||
7. **Add explainability output (`--explain-score`)**
|
||||
Analysis: decayed multi-signal ranking will be disputed without decomposition. Show components and top evidence MRs to make results actionable and debuggable.
|
||||
|
||||
```diff
|
||||
@@ ## Rejected Ideas (with rationale)
|
||||
-- **`--explain-score` flag with component breakdown**: ... deferred
|
||||
+**Included in this iteration**: `--explain-score` adds per-user score components
|
||||
+(`author`, `review_participated`, `review_assigned`, `notes`) plus top evidence MRs.
|
||||
```
|
||||
|
||||
8. **Add confidence/coverage metadata**
|
||||
Analysis: rankings can look precise while data is incomplete (`mr_file_changes` gaps, sparse DiffNotes). Confidence fields prevent false certainty.
|
||||
|
||||
```diff
|
||||
@@ ### 4. Rust-Side Aggregation (who.rs)
|
||||
+Compute and emit:
|
||||
+- `coverage`: {mrs_considered, mrs_with_file_changes, mrs_with_diffnotes, percent}
|
||||
+- `confidence`: high|medium|low (threshold-based)
|
||||
```
|
||||
|
||||
9. **Add index migration for new query shapes**
|
||||
Analysis: your new `EXISTS/NOT EXISTS` reviewer split and path dual-matching will need better indexes; current `who` indexes are not enough for author+path+time combinations.
|
||||
|
||||
```diff
|
||||
@@ ## Files to Modify
|
||||
+3. **`migrations/021_who_decay_indexes.sql`** — indexes for decay query patterns:
|
||||
+ - notes(diffnote path + author + created_at + discussion_id) partial
|
||||
+ - notes(old_path variant) partial
|
||||
+ - mr_file_changes(project_id, new_path, merge_request_id)
|
||||
+ - mr_file_changes(project_id, old_path, merge_request_id) partial
|
||||
+ - merge_requests(state, merged_at, updated_at, created_at)
|
||||
```
|
||||
|
||||
10. **Expand tests to invariants and determinism**
|
||||
Analysis: example-based tests are good, but ranking systems need invariant tests to avoid subtle regressions.
|
||||
|
||||
```diff
|
||||
@@ ### 7. New Tests (TDD)
|
||||
+**`test_score_monotonicity_by_age`**: same signal, older timestamp never scores higher
|
||||
+**`test_row_order_independence`**: shuffled SQL row order yields identical ranking
|
||||
+**`test_as_of_reproducibility`**: same data + same `--as-of` => identical output
|
||||
+**`test_rename_alias_chain_scoring`**: expertise carries across A->B->C rename chain
|
||||
+**`test_overlap_participated_vs_assigned_counts`**: overlap reflects split reviewer semantics
|
||||
```
|
||||
|
||||
If you want, I can produce a full consolidated `v2` plan doc patch (single unified diff against `plans/time-decay-expert-scoring.md`) rather than per-change snippets.
|
||||
167
plans/time-decay-expert-scoring.feedback-3.md
Normal file
167
plans/time-decay-expert-scoring.feedback-3.md
Normal file
@@ -0,0 +1,167 @@
|
||||
**Critical Plan Findings First**
|
||||
1. The proposed index `idx_notes_mr_path_author ON notes(noteable_id, ...)` will fail: `notes.noteable_id` does not exist in schema (`migrations/002_issues.sql:74`).
|
||||
2. Rename awareness is only applied in scoring queries, not in path resolution probes; today `build_path_query()` and `suffix_probe()` only inspect `position_new_path`/`new_path` (`src/cli/commands/who.rs:465`, `src/cli/commands/who.rs:591`), so old-path queries can still miss.
|
||||
3. A fixed `"24m"` default window is brittle once half-lives become configurable; it can silently truncate meaningful history for larger half-lives.
|
||||
|
||||
Below are the revisions I’d make to your plan.
|
||||
|
||||
1. **Fix migration/index architecture (blocking correctness + perf)**
|
||||
Rationale: prevents migration failure and aligns indexes to actual query shapes.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ ### 6. Index Migration (db.rs)
|
||||
- -- Support EXISTS subquery for reviewer participation check
|
||||
- CREATE INDEX IF NOT EXISTS idx_notes_mr_path_author
|
||||
- ON notes(noteable_id, position_new_path, author_username)
|
||||
- WHERE note_type = 'DiffNote' AND is_system = 0;
|
||||
+ -- Support reviewer participation joins (notes -> discussions -> MR)
|
||||
+ CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author_created
|
||||
+ ON notes(discussion_id, author_username, created_at)
|
||||
+ WHERE note_type = 'DiffNote' AND is_system = 0;
|
||||
+
|
||||
+ -- Path-first indexes for global and project-scoped path lookups
|
||||
+ CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
|
||||
+ ON mr_file_changes(new_path, project_id, merge_request_id);
|
||||
+ CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
|
||||
+ ON mr_file_changes(old_path, project_id, merge_request_id)
|
||||
+ WHERE old_path IS NOT NULL;
|
||||
@@
|
||||
- -- Support state-aware timestamp selection
|
||||
- CREATE INDEX IF NOT EXISTS idx_mr_state_timestamps
|
||||
- ON merge_requests(state, merged_at, closed_at, updated_at, created_at);
|
||||
+ -- Removed: low-selectivity timestamp composite index; joins are MR-id driven.
|
||||
```
|
||||
|
||||
2. **Restructure SQL around `matched_mrs` CTE instead of repeating OR path clauses**
|
||||
Rationale: better index use, less duplicated logic, cleaner maintenance.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ ### 3. SQL Restructure (who.rs)
|
||||
- WITH raw AS (
|
||||
- -- 5 UNION ALL subqueries (signals 1, 2, 3, 4a, 4b)
|
||||
- ),
|
||||
+ WITH matched_notes AS (
|
||||
+ -- DiffNotes matching new_path
|
||||
+ ...
|
||||
+ UNION ALL
|
||||
+ -- DiffNotes matching old_path
|
||||
+ ...
|
||||
+ ),
|
||||
+ matched_file_changes AS (
|
||||
+ -- file changes matching new_path
|
||||
+ ...
|
||||
+ UNION ALL
|
||||
+ -- file changes matching old_path
|
||||
+ ...
|
||||
+ ),
|
||||
+ matched_mrs AS (
|
||||
+ SELECT DISTINCT mr_id, project_id FROM matched_notes
|
||||
+ UNION
|
||||
+ SELECT DISTINCT mr_id, project_id FROM matched_file_changes
|
||||
+ ),
|
||||
+ raw AS (
|
||||
+ -- signals sourced from matched_mrs + matched_notes
|
||||
+ ),
|
||||
```
|
||||
|
||||
3. **Replace correlated `EXISTS/NOT EXISTS` reviewer split with one precomputed participation set**
|
||||
Rationale: same semantics, lower query cost, easier reasoning.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ Signal 4 splits into two
|
||||
- Signal 4a uses an EXISTS subquery ...
|
||||
- Signal 4b uses NOT EXISTS ...
|
||||
+ Build `reviewer_participation(mr_id, username)` once from matched DiffNotes.
|
||||
+ Then classify `mr_reviewers` rows via LEFT JOIN:
|
||||
+ - participated: `rp.username IS NOT NULL`
|
||||
+ - assigned-only: `rp.username IS NULL`
|
||||
+ This avoids correlated EXISTS scans per reviewer row.
|
||||
```
|
||||
|
||||
4. **Make default `--since` derived from half-life + decay floor, not hardcoded 24m**
|
||||
Rationale: remains mathematically consistent when config changes.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ ### 1. ScoringConfig (config.rs)
|
||||
+ pub decay_floor: f64, // default: 0.05
|
||||
@@ ### 5. Default --since Change
|
||||
- Expert mode: "6m" -> "24m"
|
||||
+ Expert mode default window is computed:
|
||||
+ default_since_days = ceil(max_half_life_days * log2(1.0 / decay_floor))
|
||||
+ With defaults (max_half_life=180, floor=0.05), this is ~26 months.
|
||||
+ CLI `--since` still overrides; `--all-history` still disables windowing.
|
||||
```
|
||||
|
||||
5. **Use `log2(1+count)` for notes instead of `ln(1+count)`**
|
||||
Rationale: keeps 1 note ~= 1 unit (with `note_bonus=1`) while preserving diminishing returns.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ Scoring Formula
|
||||
- note_contribution(mr) = note_bonus * ln(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
|
||||
+ note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
|
||||
```
|
||||
|
||||
6. **Guarantee deterministic float aggregation and expose `score_raw`**
|
||||
Rationale: avoids hash-order drift and explainability mismatch vs rounded integer score.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ ### 4. Rust-Side Aggregation (who.rs)
|
||||
- HashMap<i64, ...>
|
||||
+ BTreeMap<i64, ...> (or sort keys before accumulation) for deterministic summation order
|
||||
+ Use compensated summation (Kahan/Neumaier) for stable f64 totals
|
||||
@@
|
||||
- Sort on raw `f64` score ... round only for display
|
||||
+ Keep `score_raw` internally and expose when `--explain-score` is active.
|
||||
+ `score` remains integer for backward compatibility.
|
||||
```
|
||||
|
||||
7. **Extend rename awareness to query resolution (not only scoring)**
|
||||
Rationale: fixes user-facing misses for old path input and suffix lookup.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ Path rename awareness
|
||||
- All signal subqueries match both old and new path columns
|
||||
+ Also update `build_path_query()` probes and suffix probe:
|
||||
+ - exact_exists: new_path OR old_path (notes + mr_file_changes)
|
||||
+ - prefix_exists: new_path LIKE OR old_path LIKE
|
||||
+ - suffix_probe: union of notes.position_new_path, notes.position_old_path,
|
||||
+ mr_file_changes.new_path, mr_file_changes.old_path
|
||||
```
|
||||
|
||||
8. **Tighten CLI/output contracts for new flags**
|
||||
Rationale: avoids payload bloat/ambiguity and keeps robot clients stable.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ ### 5b. Score Explainability via `--explain-score`
|
||||
+ `--explain-score` conflicts with `--detail` (mutually exclusive)
|
||||
+ `resolved_input` includes `as_of_ms`, `as_of_iso`, `scoring_model_version`
|
||||
+ robot output includes `score_raw` and `components` only when explain is enabled
|
||||
```
|
||||
|
||||
9. **Add confidence metadata (promote from rejected to accepted)**
|
||||
Rationale: makes ranking more actionable and trustworthy with sparse evidence.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ Rejected Ideas (with rationale)
|
||||
- Confidence/coverage metadata: ... Deferred to avoid scope creep
|
||||
+ Confidence/coverage metadata: ACCEPTED (minimal v1)
|
||||
+ Add per-user `confidence: low|medium|high` based on evidence breadth + recency.
|
||||
+ Keep implementation lightweight (no extra SQL pass).
|
||||
```
|
||||
|
||||
10. **Upgrade test and verification scope to include query-plan and clock semantics**
|
||||
Rationale: catches regressions your current tests won’t.
|
||||
```diff
|
||||
diff --git a/plan.md b/plan.md
|
||||
@@ 8. New Tests (TDD)
|
||||
+ test_old_path_probe_exact_and_prefix
|
||||
+ test_suffix_probe_uses_old_path_sources
|
||||
+ test_since_relative_to_as_of_clock
|
||||
+ test_explain_and_detail_are_mutually_exclusive
|
||||
+ test_null_timestamp_fallback_to_created_at
|
||||
+ test_query_plan_uses_path_indexes (EXPLAIN QUERY PLAN)
|
||||
@@ Verification
|
||||
+ 7. EXPLAIN QUERY PLAN snapshots for expert query (exact + prefix) confirm index usage
|
||||
```
|
||||
|
||||
If you want, I can produce a single consolidated “revision 3” plan document that fully merges all of the above into your original structure.
|
||||
133
plans/time-decay-expert-scoring.feedback-4.md
Normal file
133
plans/time-decay-expert-scoring.feedback-4.md
Normal file
@@ -0,0 +1,133 @@
|
||||
Your plan is already strong. The biggest remaining gaps are temporal correctness, indexability at scale, and ranking reliability under sparse/noisy evidence. These are the revisions I’d make.
|
||||
|
||||
1. **Fix temporal correctness for `--as-of` (critical)**
|
||||
Analysis: Right now the plan describes `--as-of`, but the SQL only enforces lower bounds (`>= since`). If `as_of` is in the past, “future” events can still enter and get full weight (because elapsed is clamped). This breaks reproducibility.
|
||||
```diff
|
||||
@@ 3. SQL Restructure
|
||||
- AND n.created_at >= ?2
|
||||
+ AND n.created_at BETWEEN ?2 AND ?4
|
||||
@@ Signal 3/4a/4b
|
||||
- AND {state_aware_ts} >= ?2
|
||||
+ AND {state_aware_ts} BETWEEN ?2 AND ?4
|
||||
|
||||
@@ 5a. Reproducible Scoring via --as-of
|
||||
- All decay computations use as_of_ms instead of SystemTime::now()
|
||||
+ All event selection and decay computations are bounded by as_of_ms.
|
||||
+ Query window is [since_ms, as_of_ms], never [since_ms, now_ms].
|
||||
+ Add test: test_as_of_excludes_future_events.
|
||||
```
|
||||
|
||||
2. **Resolve `closed`-state inconsistency**
|
||||
Analysis: The CASE handles `closed`, but all signal queries filter to `('opened','merged')`, making the `closed_at` branch dead code. Either include closed MRs intentionally or remove that logic. I’d include closed with a reduced multiplier.
|
||||
```diff
|
||||
@@ ScoringConfig (config.rs)
|
||||
+ pub closed_mr_multiplier: f64, // default: 0.5
|
||||
|
||||
@@ 3. SQL Restructure
|
||||
- AND m.state IN ('opened','merged')
|
||||
+ AND m.state IN ('opened','merged','closed')
|
||||
|
||||
@@ 4. Rust-Side Aggregation
|
||||
+ if state == "closed" { contribution *= closed_mr_multiplier; }
|
||||
```
|
||||
|
||||
3. **Replace `OR` path predicates with index-friendly `UNION ALL` branches**
|
||||
Analysis: `(new_path ... OR old_path ...)` often degrades index usage in SQLite. Split into two indexed branches and dedupe once. This improves planner stability and latency on large datasets.
|
||||
```diff
|
||||
@@ 3. SQL Restructure
|
||||
-WITH matched_notes AS (
|
||||
- ... AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
|
||||
-),
|
||||
+WITH matched_notes AS (
|
||||
+ SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
|
||||
+ UNION ALL
|
||||
+ SELECT ... FROM notes n WHERE ... AND n.position_old_path {path_op}
|
||||
+),
|
||||
+matched_notes_dedup AS (
|
||||
+ SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
|
||||
+ FROM matched_notes
|
||||
+),
|
||||
@@
|
||||
- JOIN matched_notes mn ...
|
||||
+ JOIN matched_notes_dedup mn ...
|
||||
```
|
||||
|
||||
4. **Add canonical path identity (rename-chain support)**
|
||||
Analysis: Direct `old_path/new_path` matching only handles one-hop rename scenarios. A small alias graph/table built at ingest time gives robust expertise continuity across A→B→C chains and avoids repeated SQL complexity.
|
||||
```diff
|
||||
@@ Files to Modify
|
||||
- 3. src/core/db.rs — Add migration for indexes...
|
||||
+ 3. src/core/db.rs — Add migration for indexes + path_identity table
|
||||
+ 4. src/core/ingest/*.rs — populate path_identity on rename events
|
||||
+ 5. src/cli/commands/who.rs — resolve query path to canonical path_id first
|
||||
|
||||
@@ Context
|
||||
- The fix has three parts:
|
||||
+ The fix has four parts:
|
||||
+ - Introduce canonical path identity so multi-hop renames preserve expertise
|
||||
```
|
||||
|
||||
5. **Split scoring engine into a versioned core module**
|
||||
Analysis: `who.rs` is becoming a mixed CLI/query/math/output surface. Move scoring math and event normalization into `src/core/scoring/` with explicit model versions. This reduces regression risk and enables future model experiments.
|
||||
```diff
|
||||
@@ Files to Modify
|
||||
+ 4. src/core/scoring/mod.rs — model interface + shared types
|
||||
+ 5. src/core/scoring/model_v2_decay.rs — current implementation
|
||||
+ 6. src/cli/commands/who.rs — orchestration only
|
||||
|
||||
@@ 5b. Score Explainability
|
||||
+ resolved_input includes scoring_model_version and scoring_model_name
|
||||
```
|
||||
|
||||
6. **Add evidence confidence to reduce sparse-data rank spikes**
|
||||
Analysis: One recent MR can outrank broader, steadier expertise. Add a confidence factor derived from number of distinct evidence MRs and expose both `score_raw` and `score_adjusted`.
|
||||
```diff
|
||||
@@ Scoring Formula
|
||||
+ confidence(user) = 1 - exp(-evidence_mr_count / 6.0)
|
||||
+ score_adjusted = score_raw * confidence
|
||||
|
||||
@@ 4. Rust-Side Aggregation
|
||||
+ compute evidence_mr_count from unique MR ids across all signals
|
||||
+ sort by score_adjusted DESC, then score_raw DESC, then last_seen DESC
|
||||
|
||||
@@ 5b. --explain-score
|
||||
+ include confidence and evidence_mr_count
|
||||
```
|
||||
|
||||
7. **Add first-class bot/service-account filtering**
|
||||
Analysis: Reviewer inflation is not just assignment; bots and automation users can still pollute rankings. Make exclusion explicit and configurable.
|
||||
```diff
|
||||
@@ ScoringConfig (config.rs)
|
||||
+ pub excluded_username_patterns: Vec<String>, // defaults include "*bot*", "renovate", "dependabot"
|
||||
|
||||
@@ 3. SQL Restructure
|
||||
+ AND username NOT MATCHES excluded patterns (applied in Rust post-query or SQL where feasible)
|
||||
|
||||
@@ CLI
|
||||
+ --include-bots (override exclusions)
|
||||
```
|
||||
|
||||
8. **Tighten reviewer “participated” with substantive-note threshold**
|
||||
Analysis: A single “LGTM” note shouldn’t classify someone as engaged reviewer equivalent to real inline review. Use a minimum substantive threshold.
|
||||
```diff
|
||||
@@ ScoringConfig (config.rs)
|
||||
+ pub reviewer_min_note_chars: u32, // default: 20
|
||||
|
||||
@@ reviewer_participation CTE
|
||||
- SELECT DISTINCT ... FROM matched_notes
|
||||
+ SELECT DISTINCT ... FROM matched_notes
|
||||
+ WHERE LENGTH(TRIM(body)) >= ?reviewer_min_note_chars
|
||||
```
|
||||
|
||||
9. **Add rollout safety: model compare mode + rank-delta diagnostics**
|
||||
Analysis: This is a scoring-model migration. You need safe rollout mechanics, not just tests. Add a compare mode so you can inspect rank deltas before forcing v2.
|
||||
```diff
|
||||
@@ CLI (who)
|
||||
+ --scoring-model v1|v2|compare (default: v2)
|
||||
+ --max-rank-delta-report N (compare mode diagnostics)
|
||||
|
||||
@@ Robot output
|
||||
+ include v1_score, v2_score, rank_delta when --scoring-model compare
|
||||
```
|
||||
|
||||
If you want, I can produce a single consolidated “plan v4” document that applies all nine diffs cleanly into your original markdown.
|
||||
575
plans/time-decay-expert-scoring.md
Normal file
575
plans/time-decay-expert-scoring.md
Normal file
@@ -0,0 +1,575 @@
|
||||
---
|
||||
plan: true
|
||||
title: ""
|
||||
status: iterating
|
||||
iteration: 4
|
||||
target_iterations: 8
|
||||
beads_revision: 0
|
||||
related_plans: []
|
||||
created: 2026-02-08
|
||||
updated: 2026-02-08
|
||||
---
|
||||
|
||||
# Time-Decay Expert Scoring Model
|
||||
|
||||
## Context
|
||||
|
||||
The `lore who --path` command currently uses flat weights to score expertise: each authored MR counts as 25 points, each reviewed MR as 10, each inline note as 1 — regardless of when the activity happened. This produces three compounding problems:
|
||||
|
||||
1. **Temporal blindness**: Old activity counts the same as recent activity. Someone who authored a file 2 years ago ranks equivalently to someone who wrote it last week.
|
||||
2. **Reviewer inflation**: Senior reviewers (jdefting, zhayes) who rubber-stamp every MR via assignment accumulate inflated scores indistinguishable from reviewers who actually left substantive inline feedback. The `mr_reviewers` table captures assignment, not engagement.
|
||||
3. **Path-history blindness**: Renamed or moved files lose historical expertise because signal matching relies on `position_new_path` and `mr_file_changes.new_path` only. A developer who authored the file under its previous name gets zero credit after a rename.
|
||||
|
||||
The fix has three parts:
|
||||
- Apply **exponential half-life decay** to each signal, grounded in cognitive science research
|
||||
- **Split the reviewer signal** into "participated" (left DiffNotes) vs "assigned-only" (in `mr_reviewers` but no inline comments), with different weights and decay rates
|
||||
- **Match both old and new paths** in all signal queries AND path resolution probes so expertise survives file renames
|
||||
|
||||
## Research Foundation
|
||||
|
||||
- **Ebbinghaus Forgetting Curve (1885)**: Memory retention follows exponential decay: `R = 2^(-t/h)` where h is the half-life
|
||||
- **Generation Effect (Slamecka & Graf, 1978)**: Producing information (authoring code) creates ~2x more durable memory traces than reading it (reviewing)
|
||||
- **Levels of Processing (Craik & Lockhart, 1972)**: Deeper cognitive engagement creates more durable memories — authoring > reviewing > commenting
|
||||
- **Half-Life Regression (Settles & Meeder, 2016, Duolingo)**: Exponential decay with per-signal-type half-lives is practical and effective at scale. Chosen over power law for additivity, bounded behavior, and intuitive parameterization
|
||||
- **Fritz et al. (2010, ICSE)**: "Degree-of-knowledge" model for code familiarity considers both authoring and interaction events with time-based decay
|
||||
|
||||
## Scoring Formula
|
||||
|
||||
```
|
||||
score(user, path) = Sum_i( weight_i * 2^(-days_elapsed_i / half_life_i) )
|
||||
```
|
||||
|
||||
For note signals grouped per MR, a diminishing-returns function caps comment storms:
|
||||
```
|
||||
note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
|
||||
```
|
||||
|
||||
**Why `log2` instead of `ln`?** With `log2`, a single note contributes exactly `note_bonus * 1.0` (since `log2(2) = 1`), making the `note_bonus` weight directly interpretable as "points per note at count=1." With `ln`, one note contributes `note_bonus * 0.69`, which is unintuitive and means `note_bonus=1` doesn't actually mean "1 point per note." The diminishing-returns curve shape is identical — only the scale factor differs.
|
||||
|
||||
Per-signal contributions (each signal is either per-MR or per-note-group):
|
||||
|
||||
| Signal Type | Base Weight | Half-Life | Rationale |
|
||||
|-------------|-------------|-----------|-----------|
|
||||
| **Author** (authored MR touching path) | 25 | 180 days | Deep generative engagement; ~50% retention at 6 months |
|
||||
| **Reviewer Participated** (left DiffNote on MR/path) | 10 | 90 days | Active review engagement; ~50% at 3 months |
|
||||
| **Reviewer Assigned-Only** (in `mr_reviewers`, no DiffNote on path) | 3 | 45 days | Passive assignment; minimal cognitive engagement, fades fast |
|
||||
| **Note** (inline DiffNotes on path, grouped per MR) | 1 | 45 days | `log2(1+count)` per MR; diminishing returns prevent comment storms |
|
||||
|
||||
**Why split reviewers?** The `mr_reviewers` table records assignment, not engagement. A reviewer who left 5 inline comments on a file has demonstrably more expertise than one who was merely assigned and clicked "approve." The participated signal inherits the old reviewer weight (10) and decay (90 days); the assigned-only signal gets reduced weight (3) and faster decay (45 days) — enough to register but not enough to inflate past actual contributors.
|
||||
|
||||
**Why require substantive notes?** Participation is qualified by a minimum note body length (`reviewer_min_note_chars`, default 20). Without this, a single "LGTM" or "+1" comment would promote a reviewer from the 3-point assigned-only tier to the 10-point participated tier — a 3.3x weight increase for zero substantive engagement. The threshold is configurable to accommodate teams with different review conventions.
|
||||
|
||||
**Why cap notes per MR?** Without diminishing returns, a back-and-forth thread of 30 comments on a single MR would score 30 note points — disproportionate to the expertise gained. `log2(1 + 30) ≈ 4.95` vs `log2(1 + 1) = 1.0` preserves the signal that more comments = more engagement while preventing outlier MRs from dominating. The 30-note reviewer gets ~5x the credit of a 1-note reviewer, not 30x.
|
||||
|
||||
Author/reviewer signals are deduplicated per MR (one signal per distinct MR). Note signals are grouped per (user, MR) and use `log2(1 + count)` scaling.
|
||||
|
||||
**Why include closed MRs?** Closed-without-merge MRs represent real review effort and code familiarity even though the code was abandoned. All signals from closed MRs are multiplied by `closed_mr_multiplier` (default 0.5) to reflect this reduced but non-zero contribution. This applies uniformly to author, reviewer, and note signals on closed MRs.
|
||||
|
||||
## Files to Modify
|
||||
|
||||
1. **`src/core/config.rs`** — Add half-life fields + assigned-only reviewer config to `ScoringConfig`; add config validation
|
||||
2. **`src/cli/commands/who.rs`** — Core changes:
|
||||
- Add `half_life_decay()` pure function
|
||||
- Restructure `query_expert()`: SQL returns hybrid-aggregated signal rows with timestamps (MR-level for author/reviewer, note-count-per-MR for notes), Rust applies decay + `log2(1+count)` + final ranking
|
||||
- Match both `new_path` and `old_path` in all signal queries (rename awareness)
|
||||
- Extend rename awareness to `build_path_query()` probes and `suffix_probe()` (not just scoring)
|
||||
- Split reviewer signal into participated vs assigned-only
|
||||
- Use state-aware timestamps (`merged_at` for merged MRs, `updated_at` for open MRs)
|
||||
- Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals)
|
||||
- Add `--as-of` flag for reproducible scoring at a fixed timestamp
|
||||
- Add `--explain-score` flag for per-user score component breakdown
|
||||
- Sort on raw f64 score, round only for display
|
||||
- Update tests
|
||||
3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)
|
||||
|
||||
## Implementation Details
|
||||
|
||||
### 1. ScoringConfig (config.rs)
|
||||
|
||||
Add half-life fields and the new assigned-only reviewer signal. All new fields use `#[serde(default)]` for backward compatibility:
|
||||
|
||||
```rust
|
||||
pub struct ScoringConfig {
|
||||
pub author_weight: i64, // default: 25
|
||||
pub reviewer_weight: i64, // default: 10 (participated — left DiffNotes)
|
||||
pub reviewer_assignment_weight: i64, // default: 3 (assigned-only — no DiffNotes on path)
|
||||
pub note_bonus: i64, // default: 1
|
||||
pub author_half_life_days: u32, // default: 180
|
||||
pub reviewer_half_life_days: u32, // default: 90 (participated)
|
||||
pub reviewer_assignment_half_life_days: u32, // default: 45 (assigned-only)
|
||||
pub note_half_life_days: u32, // default: 45
|
||||
pub closed_mr_multiplier: f64, // default: 0.5 (applied to closed-without-merge MRs)
|
||||
pub reviewer_min_note_chars: u32, // default: 20 (minimum note body length to count as participation)
|
||||
}
|
||||
```
|
||||
|
||||
**Config validation**: Add a `validate_scoring()` call in `Config::load_from_path()` after deserialization:
|
||||
- All `*_half_life_days` must be > 0 (prevents division by zero in decay function)
|
||||
- All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores)
|
||||
- `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them)
|
||||
- `reviewer_min_note_chars` must be >= 0 (0 disables the filter; typical useful values: 10-50)
|
||||
- Return `LoreError::ConfigInvalid` with a clear message on failure
|
||||
|
||||
### 2. Decay Function (who.rs)
|
||||
|
||||
```rust
|
||||
fn half_life_decay(elapsed_ms: i64, half_life_days: u32) -> f64 {
|
||||
let days = (elapsed_ms as f64 / 86_400_000.0).max(0.0);
|
||||
let hl = f64::from(half_life_days);
|
||||
if hl <= 0.0 { return 0.0; }
|
||||
2.0_f64.powf(-days / hl)
|
||||
}
|
||||
```
|
||||
|
||||
### 3. SQL Restructure (who.rs)
|
||||
|
||||
The SQL uses **CTE-based dual-path matching** and **hybrid aggregation**. Rather than repeating `OR old_path` in every signal subquery, two foundational CTEs (`matched_notes`, `matched_file_changes`) centralize path matching. A third CTE (`reviewer_participation`) precomputes which reviewers actually left DiffNotes, avoiding correlated `EXISTS`/`NOT EXISTS` subqueries.
|
||||
|
||||
MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`.
|
||||
|
||||
```sql
|
||||
WITH matched_notes AS (
|
||||
-- Centralize dual-path matching for DiffNotes
|
||||
SELECT n.id, n.discussion_id, n.author_username, n.created_at,
|
||||
n.position_new_path, n.position_old_path, n.project_id
|
||||
FROM notes n
|
||||
WHERE n.note_type = 'DiffNote'
|
||||
AND n.is_system = 0
|
||||
AND n.author_username IS NOT NULL
|
||||
AND n.created_at >= ?2
|
||||
AND n.created_at <= ?4
|
||||
AND (?3 IS NULL OR n.project_id = ?3)
|
||||
AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
|
||||
),
|
||||
matched_file_changes AS (
|
||||
-- Centralize dual-path matching for file changes
|
||||
SELECT fc.merge_request_id, fc.project_id
|
||||
FROM mr_file_changes fc
|
||||
WHERE (?3 IS NULL OR fc.project_id = ?3)
|
||||
AND (fc.new_path {path_op} OR fc.old_path {path_op})
|
||||
),
|
||||
reviewer_participation AS (
|
||||
-- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
|
||||
-- Materialized once, then joined against mr_reviewers to classify.
|
||||
-- The LENGTH filter excludes trivial notes ("LGTM", "+1", emoji-only) from qualifying
|
||||
-- a reviewer as "participated." Without this, a single "LGTM" would promote an assigned
|
||||
-- reviewer from 3-point to 10-point weight, defeating the purpose of the split.
|
||||
-- Note: mn.id refers back to notes.id, so we join notes to access the body column
|
||||
-- (not carried in matched_notes to avoid bloating that CTE with body text).
|
||||
SELECT DISTINCT d.merge_request_id AS mr_id, mn.author_username AS username
|
||||
FROM matched_notes mn
|
||||
JOIN discussions d ON mn.discussion_id = d.id
|
||||
JOIN notes n_body ON mn.id = n_body.id
|
||||
WHERE d.merge_request_id IS NOT NULL
|
||||
AND LENGTH(TRIM(COALESCE(n_body.body, ''))) >= {reviewer_min_note_chars}
|
||||
),
|
||||
raw AS (
|
||||
-- Signal 1: DiffNote reviewer (individual notes for note_cnt)
|
||||
SELECT mn.author_username AS username, 'diffnote_reviewer' AS signal,
|
||||
m.id AS mr_id, mn.id AS note_id, mn.created_at AS seen_at, m.state AS mr_state
|
||||
FROM matched_notes mn
|
||||
JOIN discussions d ON mn.discussion_id = d.id
|
||||
JOIN merge_requests m ON d.merge_request_id = m.id
|
||||
WHERE (m.author_username IS NULL OR mn.author_username != m.author_username)
|
||||
AND m.state IN ('opened','merged','closed')
|
||||
|
||||
UNION ALL
|
||||
|
||||
-- Signal 2: DiffNote MR author
|
||||
SELECT m.author_username AS username, 'diffnote_author' AS signal,
|
||||
m.id AS mr_id, NULL AS note_id, MAX(mn.created_at) AS seen_at, m.state AS mr_state
|
||||
FROM merge_requests m
|
||||
JOIN discussions d ON d.merge_request_id = m.id
|
||||
JOIN matched_notes mn ON mn.discussion_id = d.id
|
||||
WHERE m.author_username IS NOT NULL
|
||||
AND m.state IN ('opened','merged','closed')
|
||||
GROUP BY m.author_username, m.id
|
||||
|
||||
UNION ALL
|
||||
|
||||
-- Signal 3: MR author via file changes (state-aware timestamp)
|
||||
SELECT m.author_username AS username, 'file_author' AS signal,
|
||||
m.id AS mr_id, NULL AS note_id,
|
||||
{state_aware_ts} AS seen_at, m.state AS mr_state
|
||||
FROM matched_file_changes mfc
|
||||
JOIN merge_requests m ON mfc.merge_request_id = m.id
|
||||
WHERE m.author_username IS NOT NULL
|
||||
AND m.state IN ('opened','merged','closed')
|
||||
AND {state_aware_ts} >= ?2
|
||||
AND {state_aware_ts} <= ?4
|
||||
|
||||
UNION ALL
|
||||
|
||||
-- Signal 4a: Reviewer participated (in mr_reviewers AND left DiffNotes on path)
|
||||
SELECT r.username AS username, 'file_reviewer_participated' AS signal,
|
||||
m.id AS mr_id, NULL AS note_id,
|
||||
{state_aware_ts} AS seen_at, m.state AS mr_state
|
||||
FROM matched_file_changes mfc
|
||||
JOIN merge_requests m ON mfc.merge_request_id = m.id
|
||||
JOIN mr_reviewers r ON r.merge_request_id = m.id
|
||||
JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
|
||||
WHERE r.username IS NOT NULL
|
||||
AND (m.author_username IS NULL OR r.username != m.author_username)
|
||||
AND m.state IN ('opened','merged','closed')
|
||||
AND {state_aware_ts} >= ?2
|
||||
AND {state_aware_ts} <= ?4
|
||||
|
||||
UNION ALL
|
||||
|
||||
-- Signal 4b: Reviewer assigned-only (in mr_reviewers, NO DiffNotes on path)
|
||||
SELECT r.username AS username, 'file_reviewer_assigned' AS signal,
|
||||
m.id AS mr_id, NULL AS note_id,
|
||||
{state_aware_ts} AS seen_at, m.state AS mr_state
|
||||
FROM matched_file_changes mfc
|
||||
JOIN merge_requests m ON mfc.merge_request_id = m.id
|
||||
JOIN mr_reviewers r ON r.merge_request_id = m.id
|
||||
LEFT JOIN reviewer_participation rp ON rp.mr_id = m.id AND rp.username = r.username
|
||||
WHERE rp.username IS NULL -- NOT in participation set
|
||||
AND r.username IS NOT NULL
|
||||
AND (m.author_username IS NULL OR r.username != m.author_username)
|
||||
AND m.state IN ('opened','merged','closed')
|
||||
AND {state_aware_ts} >= ?2
|
||||
AND {state_aware_ts} <= ?4
|
||||
),
|
||||
aggregated AS (
|
||||
-- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
|
||||
SELECT username, signal, mr_id, 1 AS qty, MAX(seen_at) AS ts, mr_state
|
||||
FROM raw WHERE signal != 'diffnote_reviewer'
|
||||
GROUP BY username, signal, mr_id
|
||||
UNION ALL
|
||||
-- Note signals: 1 row per (username, mr_id) with note_count and max_ts
|
||||
SELECT username, 'note_group' AS signal, mr_id, COUNT(*) AS qty, MAX(seen_at) AS ts, mr_state
|
||||
FROM raw WHERE signal = 'diffnote_reviewer' AND note_id IS NOT NULL
|
||||
GROUP BY username, mr_id
|
||||
)
|
||||
SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL
|
||||
```
|
||||
|
||||
Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `BETWEEN ?2 AND ?4` pattern ensures that when `--as-of` is set to a past date, events after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility.
|
||||
|
||||
**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into `matched_notes` and `matched_file_changes` CTEs means path matching is defined once, the indexes are hit once, and adding future path resolution logic (e.g., alias chains) only requires changes in one place.
|
||||
|
||||
**Index optimization fallback (UNION ALL split)**: SQLite's query planner sometimes struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. If EXPLAIN QUERY PLAN shows this during step 6 verification, replace the `OR`-based CTEs with a `UNION ALL` split + dedup pattern:
|
||||
```sql
|
||||
matched_notes AS (
|
||||
SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
|
||||
UNION ALL
|
||||
SELECT ... FROM notes n WHERE ... AND n.position_old_path {path_op}
|
||||
),
|
||||
matched_notes_dedup AS (
|
||||
SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
|
||||
FROM matched_notes
|
||||
),
|
||||
```
|
||||
This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm the degradation.
|
||||
|
||||
**Rationale for precomputed participation set**: The previous approach used correlated `EXISTS`/`NOT EXISTS` subqueries to classify reviewers. The `reviewer_participation` CTE materializes the set of `(mr_id, username)` pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with `IS NULL` (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every `mr_reviewers` row falls into exactly one bucket.
|
||||
|
||||
**Rationale for hybrid over fully-raw**: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density.
|
||||
|
||||
**Path rename awareness**: Both `matched_notes` and `matched_file_changes` CTEs match against both old and new path columns:
|
||||
|
||||
- Notes: `(n.position_new_path {path_op} OR n.position_old_path {path_op})`
|
||||
- File changes: `(fc.new_path {path_op} OR fc.old_path {path_op})`
|
||||
|
||||
Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The `OR` match ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.
|
||||
|
||||
**Signal 4 splits into two**: The current signal 4 (`file_reviewer`) joins `mr_reviewers` but doesn't distinguish participation. In the new plan:
|
||||
|
||||
- **Signal 4a** (`file_reviewer_participated`): User is in `mr_reviewers` AND appears in the `reviewer_participation` CTE (left DiffNotes on the path for that MR). Gets `reviewer_weight` (10) and `reviewer_half_life_days` (90).
|
||||
- **Signal 4b** (`file_reviewer_assigned`): User is in `mr_reviewers` but NOT in the `reviewer_participation` CTE. Gets `reviewer_assignment_weight` (3) and `reviewer_assignment_half_life_days` (45).
|
||||
|
||||
### 3a. Path Resolution Probes (who.rs)
|
||||
|
||||
Rename awareness must extend beyond scoring queries to the path resolution layer. Currently `build_path_query()` (line 457) and `suffix_probe()` (line 584) only check `position_new_path` and `new_path`. If a user queries an old path name, these probes return "not found" and the scoring query never runs.
|
||||
|
||||
**Changes to `build_path_query()`**:
|
||||
|
||||
- **Probe 1 (exact_exists)**: Add `OR position_old_path = ?1` to the notes query and `OR old_path = ?1` to the `mr_file_changes` query. This detects files that existed under the queried name even if they've since been renamed.
|
||||
- **Probe 2 (prefix_exists)**: Add `OR position_old_path LIKE ?1 ESCAPE '\\'` and `OR old_path LIKE ?1 ESCAPE '\\'` to the respective queries.
|
||||
|
||||
**Changes to `suffix_probe()`**:
|
||||
|
||||
The UNION query inside `suffix_probe()` currently only selects `position_new_path` from notes and `new_path` from file changes. Add two additional UNION branches:
|
||||
|
||||
```sql
|
||||
UNION
|
||||
SELECT position_old_path AS full_path FROM notes
|
||||
WHERE note_type = 'DiffNote' AND is_system = 0
|
||||
AND position_old_path IS NOT NULL
|
||||
AND (position_old_path LIKE ?1 ESCAPE '\\' OR position_old_path = ?2)
|
||||
AND (?3 IS NULL OR project_id = ?3)
|
||||
UNION
|
||||
SELECT old_path AS full_path FROM mr_file_changes
|
||||
WHERE old_path IS NOT NULL
|
||||
AND (old_path LIKE ?1 ESCAPE '\\' OR old_path = ?2)
|
||||
AND (?3 IS NULL OR project_id = ?3)
|
||||
```
|
||||
|
||||
This ensures that querying by an old filename (e.g., `login.rs` after it was renamed to `auth.rs`) still resolves to a usable path for scoring. The UNION deduplicates so the same path appearing in both old and new columns doesn't cause false ambiguity.
|
||||
|
||||
**State-aware timestamps for file-change signals (signals 3, 4a, 4b)**: Replace `m.updated_at` with a state-aware expression:
|
||||
|
||||
```sql
|
||||
CASE
|
||||
WHEN m.state = 'merged' THEN COALESCE(m.merged_at, m.created_at)
|
||||
WHEN m.state = 'closed' THEN COALESCE(m.closed_at, m.created_at)
|
||||
ELSE COALESCE(m.updated_at, m.created_at) -- opened / other
|
||||
END AS activity_ts
|
||||
```
|
||||
|
||||
**Rationale**: `updated_at` is noisy for merged MRs — it changes on label edits, title changes, rebases, and metadata touches, creating false recency. `merged_at` is the best indicator of when code expertise was formed (the moment the code entered the branch). But for **open MRs**, `updated_at` is actually the right signal because it reflects ongoing active work. `closed_at` anchors closed-without-merge MRs to their closure time (these represent review effort even if the code was abandoned). Each state gets the timestamp that best represents when expertise was last exercised.
|
||||
|
||||
### 4. Rust-Side Aggregation (who.rs)
|
||||
|
||||
For each username, accumulate into a struct with:
|
||||
- **Author MRs**: `HashMap<i64, (i64, String)>` (mr_id -> (max timestamp, mr_state)) from `diffnote_author` + `file_author` signals
|
||||
- **Reviewer Participated MRs**: `HashMap<i64, (i64, String)>` from `diffnote_reviewer` + `file_reviewer_participated` signals
|
||||
- **Reviewer Assigned-Only MRs**: `HashMap<i64, (i64, String)>` from `file_reviewer_assigned` signals (excluding any MR already in participated set)
|
||||
- **Notes per MR**: `HashMap<i64, (u32, i64, String)>` (mr_id -> (count, max_ts, mr_state)) from `note_group` rows in the aggregated query (already grouped per user+MR with note_count in `qty`). Used for `log2(1 + count)` diminishing returns.
|
||||
- **Last seen**: max of all timestamps
|
||||
- **Components** (when `--explain-score`): Track per-component f64 subtotals for `author`, `reviewer_participated`, `reviewer_assigned`, `notes`
|
||||
|
||||
The `mr_state` field from each SQL row is stored alongside the timestamp so the Rust-side can apply `closed_mr_multiplier` when `mr_state == "closed"`.
|
||||
|
||||
Compute score as `f64`. Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
|
||||
```
|
||||
state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 }
|
||||
|
||||
raw_score =
|
||||
sum(author_weight * state_mult(mr) * decay(now - ts, author_hl) for (mr, ts) in author_mrs)
|
||||
+ sum(reviewer_weight * state_mult(mr) * decay(now - ts, reviewer_hl) for (mr, ts) in reviewer_participated)
|
||||
+ sum(reviewer_assignment_weight * state_mult(mr) * decay(now - ts, reviewer_assignment_hl) for (mr, ts) in reviewer_assigned)
|
||||
+ sum(note_bonus * state_mult(mr) * log2(1 + count) * decay(now - ts, note_hl) for (mr, count, ts) in notes_per_mr)
|
||||
```
|
||||
|
||||
**Why include closed MRs?** A closed-without-merge MR still represents review effort and code familiarity — the reviewer read the diff, left comments, and engaged with the code even though it was ultimately abandoned. Excluding closed MRs entirely (the previous plan's approach) discarded this signal. The `closed_mr_multiplier` (default 0.5) halves the contribution, reflecting that the code never landed but the reviewer's cognitive engagement was real. This also eliminates the dead-code inconsistency where the state-aware CASE expression handled `closed` but the WHERE clause excluded it.
|
||||
|
||||
**Sort on raw `f64` score** — `(raw_score DESC, last_seen DESC, username ASC)`. This prevents false ties from premature rounding. Only round to `i64` for the `Expert.score` display field after sorting and truncation. The robot JSON `score` field stays integer for backward compatibility. When `--explain-score` is active, also include `score_raw` (the unrounded f64) alongside `score` so the component totals can be verified without rounding noise.
|
||||
|
||||
Compute counts from the accumulated data:
|
||||
- `review_mr_count = reviewer_participated.len() + reviewer_assigned.len()`
|
||||
- `review_note_count = notes_per_mr.values().map(|(count, _)| count).sum()`
|
||||
- `author_mr_count = author_mrs.len()`
|
||||
|
||||
Truncate to limit after sorting.
|
||||
|
||||
### 5. Default --since Change
|
||||
|
||||
Expert mode: `"6m"` -> `"24m"` (line 289 in who.rs).
|
||||
At 2 years, author decay = 6%, reviewer decay = 0.4%, note decay = 0.006% — negligible, good cutoff.
|
||||
|
||||
**Diagnostic escape hatch**: Add `--all-history` flag (conflicts with `--since`) that sets `since_ms = 0`, capturing all data regardless of age. Useful for debugging scoring anomalies and validating the decay model against known experts. The `since_mode` field in robot JSON reports `"all"` when this flag is active.
|
||||
|
||||
### 5a. Reproducible Scoring via `--as-of`
|
||||
|
||||
Add `--as-of <RFC3339|YYYY-MM-DD>` flag that overrides the `now_ms` reference point used for decay calculations. When set:
|
||||
- All event selection is bounded by `[since_ms, as_of_ms]` — events after `as_of_ms` are excluded from SQL results entirely (not just decayed)
|
||||
- All decay computations use `as_of_ms` instead of `SystemTime::now()`
|
||||
- The `--since` window is calculated relative to `as_of_ms` (not wall clock)
|
||||
- Robot JSON `resolved_input` includes `as_of_ms` and `as_of_iso` fields
|
||||
|
||||
**Rationale**: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. `--as-of` pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since `elapsed.max(0.0)` clamps negative elapsed time to zero), breaking the reproducibility guarantee.
|
||||
|
||||
Implementation: Parse the flag in `run_who()`, compute `as_of_ms: i64`, and thread it through to `query_expert()` where it replaces `now_ms()` and is bound as `?4` in all SQL queries. When the flag is absent, `?4` defaults to `now_ms()` (wall clock), which makes the upper bound transparent — all events are within the window by definition. The flag is compatible with all modes but primarily useful in expert mode.
|
||||
|
||||
### 5b. Score Explainability via `--explain-score`
|
||||
|
||||
Add `--explain-score` flag that augments each expert result with a per-user component breakdown:
|
||||
|
||||
```json
|
||||
{
|
||||
"username": "jsmith",
|
||||
"score": 42,
|
||||
"score_raw": 42.0,
|
||||
"components": {
|
||||
"author": 28.5,
|
||||
"reviewer_participated": 8.2,
|
||||
"reviewer_assigned": 1.8,
|
||||
"notes": 3.5
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Scope for this iteration**: Component breakdown only (4 floats per user). No top-evidence MRs, no decay curves, no per-MR drill-down. Those are v2 features if scoring disputes arise frequently.
|
||||
|
||||
**Flag conflicts**: `--explain-score` is mutually exclusive with `--detail`. Both augment per-user output in different ways; combining them would produce confusing overlapping output. Clap `conflicts_with` enforces this at parse time.
|
||||
|
||||
**Human output**: When `--explain-score` is active in human mode, append a parenthetical after each score: `42 (author:28.5 review:10.0 notes:3.5)`.
|
||||
|
||||
**Robot output**: Add `score_raw` (unrounded f64) and `components` object to each expert entry. Only present when `--explain-score` is active (no payload bloat by default). The `resolved_input` section also includes `scoring_model_version: 2` to distinguish from the v1 flat-weight model, enabling robot clients to adapt parsing.
|
||||
|
||||
**Rationale**: Multi-signal decayed ranking will be disputed without decomposition. Showing which signal drives a user's score makes results actionable and builds trust in the model. Keeping scope minimal avoids the output format complexity that originally motivated deferral.
|
||||
|
||||
### 6. Index Migration (db.rs)
|
||||
|
||||
Add a new migration to support the restructured query patterns. The dual-path matching CTEs and `reviewer_participation` CTE introduce query shapes that need index coverage:
|
||||
|
||||
```sql
|
||||
-- Support dual-path matching on DiffNotes (old_path leg of the OR in matched_notes CTE)
|
||||
CREATE INDEX IF NOT EXISTS idx_notes_old_path_author
|
||||
ON notes(position_old_path, author_username, created_at)
|
||||
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
|
||||
|
||||
-- Support dual-path matching on file changes (old_path leg of the OR in matched_file_changes CTE)
|
||||
CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
|
||||
ON mr_file_changes(old_path, project_id, merge_request_id)
|
||||
WHERE old_path IS NOT NULL;
|
||||
|
||||
-- Support new_path matching on file changes (ensure index parity with old_path)
|
||||
-- Existing indexes may not have optimal column order for the CTE pattern.
|
||||
CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
|
||||
ON mr_file_changes(new_path, project_id, merge_request_id);
|
||||
|
||||
-- Support reviewer_participation CTE: joining matched_notes -> discussions -> mr_reviewers
|
||||
-- notes.discussion_id (NOT noteable_id, which doesn't exist in the schema) is the FK to discussions
|
||||
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
|
||||
ON notes(discussion_id, author_username, created_at)
|
||||
WHERE note_type = 'DiffNote' AND is_system = 0;
|
||||
```
|
||||
|
||||
**Rationale**: The existing indexes cover `position_new_path` and `new_path` but not their `old_path` counterparts. Without these, the `OR old_path` clauses would force table scans on renamed files. The `reviewer_participation` CTE joins `matched_notes` -> `discussions` -> `merge_requests`, so an index on `(discussion_id, author_username)` speeds up the CTE materialization.
|
||||
|
||||
**Schema note**: The `notes` table uses `discussion_id` as its FK to `discussions`, which in turn has `merge_request_id`. There is no `noteable_id` column on `notes`. The previous plan revision incorrectly referenced `noteable_id` — this is corrected.
|
||||
|
||||
**Removed**: The `idx_mr_state_timestamps` composite index on `merge_requests(state, merged_at, closed_at, updated_at, created_at)` was removed. MR lookups in the scoring query are always id-driven (joining from `matched_file_changes` or `discussions`), so the state-aware CASE expression operates on rows already fetched by primary key. A low-selectivity composite index on 5 columns would consume space without improving any query path.
|
||||
|
||||
Partial indexes (with `WHERE` clauses) keep the index size minimal — only DiffNote rows and non-null old_path rows are indexed.
|
||||
|
||||
### 7. Test Helpers
|
||||
|
||||
Add timestamp-aware variants:
|
||||
- `insert_mr_at(conn, id, project_id, iid, author, state, updated_at_ms)`
|
||||
- `insert_diffnote_at(conn, id, discussion_id, project_id, author, file_path, body, created_at_ms)`
|
||||
|
||||
### 8. New Tests (TDD)
|
||||
|
||||
#### Example-based tests
|
||||
|
||||
**`test_half_life_decay_math`**: Verify the pure function:
|
||||
- elapsed=0 -> 1.0
|
||||
- elapsed=half_life -> 0.5
|
||||
- elapsed=2*half_life -> 0.25
|
||||
- half_life_days=0 -> 0.0 (guard against div-by-zero)
|
||||
|
||||
**`test_expert_scores_decay_with_time`**: Two authors, one recent (10 days), one old (360 days). Recent author should score ~24, old author ~6.
|
||||
|
||||
**`test_expert_reviewer_decays_faster_than_author`**: Same MR, same age (90 days). Author retains ~18 points, reviewer retains ~5 points. Author dominates clearly.
|
||||
|
||||
**`test_reviewer_participated_vs_assigned_only`**: Two reviewers on the same MR at the same age. One left DiffNotes (participated), one didn't (assigned-only). Participated reviewer should score ~10 * decay, assigned-only should score ~3 * decay. Verifies the split works end-to-end.
|
||||
|
||||
**`test_note_diminishing_returns_per_mr`**: One reviewer with 1 note on MR-A and another with 20 notes on MR-B, both at same age. The 20-note reviewer should score `log2(21)/log2(2) ≈ 4.4x` the 1-note reviewer, NOT 20x. Validates the `log2(1+count)` cap.
|
||||
|
||||
**`test_config_validation_rejects_zero_half_life`**: `ScoringConfig` with `author_half_life_days = 0` should return `ConfigInvalid` error.
|
||||
|
||||
**`test_file_change_timestamp_uses_merged_at`**: An MR with `merged_at` set and `state = 'merged'` should use `merged_at` timestamp, not `updated_at`. Verify by setting `merged_at` to old date and `updated_at` to recent date — score should reflect the old date.
|
||||
|
||||
**`test_open_mr_uses_updated_at`**: An MR with `state = 'opened'` should use `updated_at` (not `created_at`). Verify that an open MR with recent `updated_at` scores higher than one with the same `created_at` but older `updated_at`.
|
||||
|
||||
**`test_old_path_match_credits_expertise`**: Insert a DiffNote with `position_old_path = "src/old.rs"` and `position_new_path = "src/new.rs"`. Query `--path src/old.rs` — the author should appear. Query `--path src/new.rs` — same author should also appear. Validates dual-path matching.
|
||||
|
||||
**`test_explain_score_components_sum_to_total`**: With `--explain-score`, verify that `components.author + components.reviewer_participated + components.reviewer_assigned + components.notes` equals the reported `score_raw` (within f64 rounding tolerance). Note: the closed_mr_multiplier is already folded into the per-component subtotals, not tracked as a separate component.
|
||||
|
||||
**`test_as_of_produces_deterministic_results`**: Insert data at known timestamps. Run `query_expert` twice with the same `--as-of` value — results must be identical. Then run with a later `--as-of` — scores should be lower (more decay).
|
||||
|
||||
**`test_old_path_probe_exact_and_prefix`**: Insert a DiffNote with `position_old_path = "src/old/foo.rs"` and `position_new_path = "src/new/foo.rs"`. Call `build_path_query(conn, "src/old/foo.rs")` — should resolve as exact file (not "not found"). Call `build_path_query(conn, "src/old/")` — should resolve as prefix. Validates that the path resolution probes now check old_path columns.
|
||||
|
||||
**`test_suffix_probe_uses_old_path_sources`**: Insert a file change with `old_path = "legacy/utils.rs"` and `new_path = "src/utils.rs"`. Call `build_path_query(conn, "legacy/utils.rs")` — should resolve via exact probe on old_path. Call `build_path_query(conn, "utils.rs")` — suffix probe should find both `legacy/utils.rs` and `src/utils.rs` and either resolve uniquely (if deduplicated) or report ambiguity.
|
||||
|
||||
**`test_since_relative_to_as_of_clock`**: Insert data at timestamps T1 and T2 (T2 > T1). With `--as-of T2` and `--since 30d`, the window is `[T2 - 30d, T2]`, not `[now - 30d, now]`. Verify that data at T1 is included or excluded based on the as-of-relative window, not the wall clock window.
|
||||
|
||||
**`test_explain_and_detail_are_mutually_exclusive`**: Parsing `--explain-score --detail` should fail with a conflict error from clap.
|
||||
|
||||
**`test_trivial_note_does_not_count_as_participation`**: A reviewer who left only a short note ("LGTM", 4 chars) on an MR should be classified as assigned-only, not participated, when `reviewer_min_note_chars = 20`. A reviewer who left a substantive note (>= 20 chars) should be classified as participated. Validates the LENGTH threshold in the `reviewer_participation` CTE.
|
||||
|
||||
**`test_closed_mr_multiplier`**: Two identical MRs (same author, same age, same path). One is `merged`, one is `closed`. The merged MR should contribute `author_weight * decay(...)`, the closed MR should contribute `author_weight * closed_mr_multiplier * decay(...)`. With default multiplier 0.5, the closed MR contributes half.
|
||||
|
||||
**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the upper-bound filtering in SQL.
|
||||
|
||||
**`test_null_timestamp_fallback_to_created_at`**: Insert a merged MR with `merged_at = NULL` (edge case: old data before the column was populated). The state-aware timestamp should fall back to `created_at`. Verify the score reflects `created_at`, not 0 or a panic.
|
||||
|
||||
#### Invariant tests (regression safety for ranking systems)
|
||||
|
||||
**`test_score_monotonicity_by_age`**: For any single signal type, an older timestamp must never produce a higher score than a newer timestamp with the same weight and half-life. Generate N random (age, half_life) pairs and assert `decay(older) <= decay(newer)` for all.
|
||||
|
||||
**`test_row_order_independence`**: Insert the same set of signals in two different orders (e.g., reversed). Run `query_expert` on both — the resulting rankings (username order + scores) must be identical. Validates that neither SQL ordering nor HashMap iteration order affects final output.
|
||||
|
||||
**`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.
|
||||
|
||||
### 9. Existing Test Compatibility
|
||||
|
||||
All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.
|
||||
|
||||
The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, and `reviewer_min_note_chars` fields.
|
||||
|
||||
## Verification
|
||||
|
||||
1. `cargo check --all-targets` — no compiler errors
|
||||
2. `cargo clippy --all-targets -- -D warnings` — no lints
|
||||
3. `cargo fmt --check` — formatting clean
|
||||
4. `cargo test` — all existing + new tests pass (including invariant tests)
|
||||
5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings
|
||||
6. Manual query plan verification (not automated — SQLite planner varies across versions):
|
||||
- Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database
|
||||
- Confirm that `matched_notes` CTE uses `idx_notes_old_path_author` or the existing new_path index (not a full table scan)
|
||||
- Confirm that `matched_file_changes` CTE uses `idx_mfc_old_path_project_mr` or `idx_mfc_new_path_project_mr`
|
||||
- Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author`
|
||||
- Document the observed plan in a comment near the SQL for future regression reference
|
||||
7. Real-world validation:
|
||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
|
||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable
|
||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors
|
||||
- Spot-check that assigned-only reviewers (those who never left DiffNotes) rank below participated reviewers on the same MR
|
||||
- Test a known renamed file path — verify expertise from the old name carries forward
|
||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs
|
||||
- Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
|
||||
- Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score`
|
||||
|
||||
## Accepted from External Review
|
||||
|
||||
Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genuinely improved the plan:
|
||||
|
||||
**From feedback-1 and feedback-2:**
|
||||
- **Path rename awareness (old_path matching)**: Real correctness gap. Both `position_old_path` and `mr_file_changes.old_path` exist in the schema. Simple `OR` clause addition with high value — expertise now survives file renames.
|
||||
- **Hybrid SQL pre-aggregation**: Revised from "fully raw rows" to pre-aggregate note counts per (user, MR) in SQL. MR-level signals were already 1-per-MR; the note rows were the actual scalability risk. Bounded row counts with predictable memory.
|
||||
- **State-aware timestamps**: Improved from our overly-simple `COALESCE(merged_at, created_at)` to a state-aware CASE expression. Open MRs genuinely need `updated_at` to reflect ongoing work; merged MRs need `merged_at` to anchor expertise formation.
|
||||
- **Index migration**: The dual-path matching and CTE patterns need index support. Added partial indexes to keep size minimal.
|
||||
- **Invariant tests**: `test_score_monotonicity_by_age`, `test_row_order_independence`, `test_reviewer_split_is_exhaustive` catch subtle ranking regressions that example-based tests miss.
|
||||
- **`--as-of` flag**: Simple clock-pinning for reproducible decay scoring. Essential for debugging and test determinism.
|
||||
- **`--explain-score` flag**: Moved from rejected to included with minimal scope (component breakdown only, no per-MR drill-down). Multi-signal scoring needs decomposition to build trust.
|
||||
|
||||
**From feedback-3:**
|
||||
- **Fix `noteable_id` index bug (critical)**: The `notes` table uses `discussion_id` as FK to `discussions`, not `noteable_id` (which doesn't exist). The proposed `idx_notes_mr_path_author` index would fail at migration time. Fixed to use `(discussion_id, author_username, created_at)`.
|
||||
- **CTE-based dual-path matching (`matched_notes`, `matched_file_changes`)**: Rather than repeating `OR old_path` in every signal subquery, centralize path matching in foundational CTEs. Defined once, indexed once, maintained once. Cleaner and more extensible.
|
||||
- **Precomputed `reviewer_participation` CTE**: Replaced correlated `EXISTS`/`NOT EXISTS` subqueries with a materialized set of `(mr_id, username)` pairs. Same semantics, lower query cost, simpler reasoning about the reviewer split.
|
||||
- **`log2(1+count)` over `ln(1+count)` for notes**: With `log2`, one note contributes exactly 1.0 unit (since `log2(2) = 1`), making `note_bonus=1` directly interpretable. `ln` gives 0.69 per note, which is unintuitive.
|
||||
- **Path resolution probe rename awareness**: The plan added `old_path` matching to scoring queries but missed the upstream path resolution layer (`build_path_query()` probes and `suffix_probe()`). Without this, querying an old path name fails at resolution and never reaches scoring. Now both probes check old_path columns.
|
||||
- **Removed low-selectivity `idx_mr_state_timestamps`**: MR lookups in scoring are id-driven (from file_changes or discussions), so a 5-column composite on state/timestamps adds no query benefit.
|
||||
- **Added `idx_mfc_new_path_project_mr`**: Ensures index parity between old and new path columns on `mr_file_changes`.
|
||||
- **`--explain-score` conflicts with `--detail`**: Prevents confusing overlapping output from two per-user augmentation flags.
|
||||
- **`scoring_model_version` in resolved_input**: Lets robot clients distinguish v1 (flat weights) from v2 (decayed) output schemas.
|
||||
- **`score_raw` in explain mode**: Exposes the unrounded f64 so component totals can be verified without rounding noise.
|
||||
- **New tests**: `test_old_path_probe_exact_and_prefix`, `test_suffix_probe_uses_old_path_sources`, `test_since_relative_to_as_of_clock`, `test_explain_and_detail_are_mutually_exclusive`, `test_null_timestamp_fallback_to_created_at` — cover the newly-identified gaps in path resolution, clock semantics, and edge cases.
|
||||
- **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).
|
||||
|
||||
**From feedback-4:**
|
||||
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `<= ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms]`. Without this, `--as-of` reproducibility was fundamentally broken.
|
||||
- **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
|
||||
- **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments.
|
||||
- **UNION ALL optimization fallback for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Added documentation of a `UNION ALL` split + dedup fallback pattern to use if EXPLAIN QUERY PLAN shows degradation during verification. Start with the simpler `OR` approach; switch only if needed.
|
||||
- **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round.
|
||||
|
||||
## Rejected Ideas (with rationale)
|
||||
|
||||
These suggestions were considered during review but explicitly excluded from this iteration:
|
||||
|
||||
- **Rename alias chain expansion (A->B->C traversal)** (feedback-2 #2, feedback-4 #4): Over-engineered for v1. The old_path `OR` match covers the 80% case (direct renames). Building a canonical path identity table at ingest time adds schema, ingestion logic, and graph traversal complexity for rare multi-hop renames. If real-world usage shows fragmented expertise on multi-rename files, this becomes a v2 feature.
|
||||
- **Config-driven `max_age_days`** (feedback-1 #5, feedback-2 #5): We already have `--since` (explicit window), `--all-history` (no window), and the 24m default (mathematically justified). Adding a config field that derives the default since window creates confusing interaction between config and CLI flags. If half-lives change, updating the default constant is trivial.
|
||||
- **Config-driven `decay_floor` for derived `--since` default** (feedback-3 #4): Proposed computing the default since window as `ceil(max_half_life * log2(1/floor))` so it auto-adjusts when half-lives change. Rejected: the formula is non-obvious to users, adds a config param (`decay_floor`) with no intuitive meaning, and the benefit is negligible — half-life changes are rare, and updating a constant is trivial. The 24m default is already mathematically justified and easy to override with `--since` or `--all-history`.
|
||||
- **BTreeMap + Kahan/Neumaier compensated summation** (feedback-3 #6): Proposed deterministic iteration order and numerically stable summation. Rejected for this scale: the accumulator processes dozens to low hundreds of entries per user, where HashMap iteration order doesn't measurably affect f64 sums. Compensated summation adds code complexity for zero practical benefit at this magnitude. If we eventually aggregate thousands of signals per user, revisit.
|
||||
- **Confidence/coverage metadata** (feedback-1 #8, feedback-2 #8, feedback-3 #9, feedback-4 #6): Repeatedly proposed across reviews with variations (score_adjusted with confidence factor, low/medium/high labels, evidence_mr_count weighting). Still scope creep. The `--explain-score` component breakdown already tells users which signal drives the score. Defining "sparse evidence" thresholds (how many MRs is "low"? what's the right exponential saturation constant?) is domain-specific guesswork without user feedback data. A single recent MR "outranking broader expertise" is the *correct* behavior of time-decay — the model intentionally weights recency. If real-world usage shows this is a problem, confidence becomes a v2 feature informed by actual threshold data.
|
||||
- **Automated EXPLAIN QUERY PLAN tests** (feedback-3 #10 partial): SQLite's query planner changes across versions and can use different plans on different data distributions. Automated assertions on plan output are brittle. Instead, we document EXPLAIN QUERY PLAN as a manual verification step during development and include the observed plan as a comment near the SQL.
|
||||
- **Per-MR evidence drill-down in `--explain-score`** (feedback-2 #7 promoted this): The v1 `--explain-score` shows component totals only. Listing top-evidence MRs per user would require additional SQL queries and significant output format work. Deferred unless component breakdowns prove insufficient for debugging.
|
||||
- **Split scoring engine into core module** (feedback-4 #5): Proposed extracting scoring math from `who.rs` into `src/core/scoring/model_v2_decay.rs`. Premature modularization — `who.rs` is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.
|
||||
- **Bot/service-account filtering** (feedback-4 #7): Real concern but orthogonal to time-decay scoring. This is a general data quality feature that belongs in its own issue — it affects all `who` modes, not just expert scoring. Adding `excluded_username_patterns` config and `--include-bots` flag is scope expansion that should be designed and tested independently.
|
||||
- **Model compare mode / rank-delta diagnostics** (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The `--explain-score` + `--as-of` combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.
|
||||
Reference in New Issue
Block a user