gitlore/plans/time-decay-expert-scoring.feedback-3.md
Taylor Eernisse 4185abe05d docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc
Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:48 -05:00


**Critical Plan Findings First**
1. The proposed index `idx_notes_mr_path_author ON notes(noteable_id, ...)` will fail: `notes.noteable_id` does not exist in the schema (`migrations/002_issues.sql:74`).
2. Rename awareness is applied only in the scoring queries, not in the path-resolution probes. Today `build_path_query()` and `suffix_probe()` inspect only `position_new_path`/`new_path` (`src/cli/commands/who.rs:465`, `src/cli/commands/who.rs:591`), so queries that pass an old path can still miss.
3. A fixed `"24m"` default window is brittle once half-lives become configurable; it can silently truncate meaningful history for larger half-lives.
Below are the revisions I'd make to your plan.
1. **Fix migration/index architecture (blocking correctness + perf)**
Rationale: prevents migration failure and aligns indexes to actual query shapes.
```diff
diff --git a/plan.md b/plan.md
@@ ### 6. Index Migration (db.rs)
- -- Support EXISTS subquery for reviewer participation check
- CREATE INDEX IF NOT EXISTS idx_notes_mr_path_author
- ON notes(noteable_id, position_new_path, author_username)
- WHERE note_type = 'DiffNote' AND is_system = 0;
+ -- Support reviewer participation joins (notes -> discussions -> MR)
+ CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author_created
+ ON notes(discussion_id, author_username, created_at)
+ WHERE note_type = 'DiffNote' AND is_system = 0;
+
+ -- Path-first indexes for global and project-scoped path lookups
+ CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
+ ON mr_file_changes(new_path, project_id, merge_request_id);
+ CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
+ ON mr_file_changes(old_path, project_id, merge_request_id)
+ WHERE old_path IS NOT NULL;
@@
- -- Support state-aware timestamp selection
- CREATE INDEX IF NOT EXISTS idx_mr_state_timestamps
- ON merge_requests(state, merged_at, closed_at, updated_at, created_at);
+ -- Removed: low-selectivity timestamp composite index; joins are MR-id driven.
```
2. **Restructure SQL around `matched_mrs` CTE instead of repeating OR path clauses**
Rationale: better index use, less duplicated logic, cleaner maintenance.
```diff
diff --git a/plan.md b/plan.md
@@ ### 3. SQL Restructure (who.rs)
- WITH raw AS (
- -- 5 UNION ALL subqueries (signals 1, 2, 3, 4a, 4b)
- ),
+ WITH matched_notes AS (
+ -- DiffNotes matching new_path
+ ...
+ UNION ALL
+ -- DiffNotes matching old_path
+ ...
+ ),
+ matched_file_changes AS (
+ -- file changes matching new_path
+ ...
+ UNION ALL
+ -- file changes matching old_path
+ ...
+ ),
+ matched_mrs AS (
+ SELECT DISTINCT mr_id, project_id FROM matched_notes
+ UNION
+ SELECT DISTINCT mr_id, project_id FROM matched_file_changes
+ ),
+ raw AS (
+ -- signals sourced from matched_mrs + matched_notes
+ ),
```
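To make the intended semantics concrete, here is a minimal in-memory Rust model of what `matched_mrs` computes. It is a sketch, not the SQL. The `DiffNote`/`FileChange` structs and the `matched_mrs` function are hypothetical, and the note rows are flattened to carry `mr_id` directly, even though the real schema reaches the MR through `discussions`.

```rust
use std::collections::BTreeSet;

/// Hypothetical flattened DiffNote row. In the real schema the MR is reached
/// through notes -> discussions -> merge_requests, not stored on the note.
struct DiffNote {
    mr_id: i64,
    project_id: i64,
    position_new_path: Option<String>,
    position_old_path: Option<String>,
}

/// Hypothetical mr_file_changes row.
struct FileChange {
    merge_request_id: i64,
    project_id: i64,
    new_path: String,
    old_path: Option<String>,
}

/// Models `matched_mrs`: every (mr_id, project_id) whose DiffNotes or file
/// changes touch `path` under either its new or its old name. Inserting into
/// a BTreeSet deduplicates, like SQL UNION, and yields a stable order.
fn matched_mrs(path: &str, notes: &[DiffNote], changes: &[FileChange]) -> BTreeSet<(i64, i64)> {
    let mut out = BTreeSet::new();
    for n in notes {
        // matched_notes: new_path branch OR old_path branch
        if n.position_new_path.as_deref() == Some(path)
            || n.position_old_path.as_deref() == Some(path)
        {
            out.insert((n.mr_id, n.project_id));
        }
    }
    for c in changes {
        // matched_file_changes: new_path branch OR old_path branch
        if c.new_path == path || c.old_path.as_deref() == Some(path) {
            out.insert((c.merge_request_id, c.project_id));
        }
    }
    out
}
```

`UNION` already deduplicates, so the `DISTINCT` inside each branch of the SQL `matched_mrs` CTE is redundant. The set insert above plays the same role.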
3. **Replace correlated `EXISTS/NOT EXISTS` reviewer split with one precomputed participation set**
Rationale: same semantics, lower query cost, easier reasoning.
```diff
diff --git a/plan.md b/plan.md
@@ Signal 4 splits into two
- Signal 4a uses an EXISTS subquery ...
- Signal 4b uses NOT EXISTS ...
+ Build `reviewer_participation(mr_id, username)` once from matched DiffNotes.
+ Then classify `mr_reviewers` rows via LEFT JOIN:
+ - participated: `rp.username IS NOT NULL`
+ - assigned-only: `rp.username IS NULL`
+ This avoids correlated EXISTS scans per reviewer row.
```
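Here is a Rust sketch of the precomputed-set approach. It assumes the matched DiffNote authors and the `mr_reviewers` rows have already been fetched as `(mr_id, username)` pairs. `ReviewerKind`, `reviewer_participation`, and `classify_reviewers` are hypothetical names. The point is that the participation set is built once, and each reviewer row then costs one hash lookup. That is what the LEFT JOIN buys over a per-row correlated EXISTS.

```rust
use std::collections::HashSet;

/// Hypothetical classification of an mr_reviewers row.
#[derive(Debug, PartialEq)]
enum ReviewerKind {
    Participated,
    AssignedOnly,
}

/// Builds `reviewer_participation(mr_id, username)` once from
/// (mr_id, author_username) pairs of matched DiffNotes; duplicates collapse.
fn reviewer_participation(diffnote_authors: &[(i64, &str)]) -> HashSet<(i64, String)> {
    diffnote_authors
        .iter()
        .map(|&(mr, user)| (mr, user.to_string()))
        .collect()
}

/// Classifies each mr_reviewers row with a single set lookup, mirroring the
/// LEFT JOIN: a hit means `rp.username IS NOT NULL`, a miss means `IS NULL`.
fn classify_reviewers(
    reviewers: &[(i64, &str)],
    rp: &HashSet<(i64, String)>,
) -> Vec<(i64, String, ReviewerKind)> {
    reviewers
        .iter()
        .map(|&(mr, user)| {
            let kind = if rp.contains(&(mr, user.to_string())) {
                ReviewerKind::Participated
            } else {
                ReviewerKind::AssignedOnly
            };
            (mr, user.to_string(), kind)
        })
        .collect()
}
```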
4. **Make default `--since` derived from half-life + decay floor, not hardcoded 24m**
Rationale: remains mathematically consistent when config changes.
```diff
diff --git a/plan.md b/plan.md
@@ ### 1. ScoringConfig (config.rs)
+ pub decay_floor: f64, // default: 0.05
@@ ### 5. Default --since Change
- Expert mode: "6m" -> "24m"
+ Expert mode default window is computed:
+ default_since_days = ceil(max_half_life_days * log2(1.0 / decay_floor))
+ With defaults (max_half_life=180, floor=0.05), this is 778 days (~26 months).
+ CLI `--since` still overrides; `--all-history` still disables windowing.
```
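The formula follows from requiring the oldest in-window signal to have decayed to the floor: 2^(-d/h) <= floor is equivalent to d >= h * log2(1/floor). Here is a minimal Rust sketch. The function name and the input checks are assumptions, not part of the plan.

```rust
/// Smallest whole-day window after which a signal with half-life
/// `max_half_life_days` has decayed to `decay_floor` or below:
/// 2^(-d / h) <= floor  <=>  d >= h * log2(1 / floor).
/// Hypothetical helper; the input checks are an added assumption.
fn default_since_days(max_half_life_days: f64, decay_floor: f64) -> u32 {
    assert!(max_half_life_days > 0.0, "half-life must be positive");
    assert!(decay_floor > 0.0 && decay_floor < 1.0, "decay_floor must be in (0, 1)");
    (max_half_life_days * (1.0 / decay_floor).log2()).ceil() as u32
}
```

With the defaults this gives 778 days. At 777 days, a signal with a 180-day half-life would still sit just above the 0.05 floor, which is why the result is rounded up.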
5. **Use `log2(1+count)` for notes instead of `ln(1+count)`**
Rationale: with `note_bonus=1`, a single note contributes exactly 1 unit (log2(2) = 1, versus ln(2) ≈ 0.69), while additional notes still have diminishing returns.
```diff
diff --git a/plan.md b/plan.md
@@ Scoring Formula
- note_contribution(mr) = note_bonus * ln(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
+ note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
```
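Here is a small Rust sketch of the revised per-MR term. `note_contribution` and its parameter names are hypothetical.

```rust
/// Per-MR note contribution from the revised formula (hypothetical helper):
/// note_bonus * log2(1 + note_count) * 2^(-days_elapsed / note_half_life).
fn note_contribution(note_bonus: f64, note_count: u32, days_elapsed: f64, note_half_life: f64) -> f64 {
    let volume = (1.0 + note_count as f64).log2(); // diminishing returns; log2(2) == 1
    let decay = 2f64.powf(-days_elapsed / note_half_life); // halves every half-life
    note_bonus * volume * decay
}
```

For example, one fresh note yields exactly `note_bonus`, and three notes yield `2 * note_bonus` because log2(4) = 2. Each elapsed `note_half_life` halves the result. Under `ln`, the single-note case would be ln(2) ≈ 0.69 instead.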
6. **Guarantee deterministic float aggregation and expose `score_raw`**
Rationale: avoids hash-order-dependent drift in float sums, and avoids a mismatch between the explained components and the rounded integer score.
```diff
diff --git a/plan.md b/plan.md
@@ ### 4. Rust-Side Aggregation (who.rs)
- HashMap<i64, ...>
+ BTreeMap<i64, ...> (or sort keys before accumulation) for deterministic summation order
+ Use compensated summation (Kahan/Neumaier) for stable f64 totals
@@
- Sort on raw `f64` score ... round only for display
+ Keep `score_raw` internally and expose when `--explain-score` is active.
+ `score` remains integer for backward compatibility.
```
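Here is one way to realize both points, sketched in Rust under these assumptions:
- Rows arrive as hypothetical `(username, mr_id, contribution)` triples, at most one per (user, MR) pair.
- Per-user totals are summed in ascending `mr_id` order via a `BTreeMap`.
- A Neumaier accumulator (the improved Kahan variant) compensates for rounding.

`NeumaierSum` and `aggregate` are illustrative names, not existing code.

```rust
use std::collections::BTreeMap;

/// Neumaier (improved Kahan) compensated summation.
#[derive(Default, Clone, Copy)]
struct NeumaierSum {
    sum: f64,
    comp: f64, // running compensation for lost low-order bits
}

impl NeumaierSum {
    fn add(&mut self, x: f64) {
        let t = self.sum + x;
        if self.sum.abs() >= x.abs() {
            self.comp += (self.sum - t) + x; // low-order bits of x were lost
        } else {
            self.comp += (x - t) + self.sum; // low-order bits of sum were lost
        }
        self.sum = t;
    }

    fn total(&self) -> f64 {
        self.sum + self.comp
    }
}

/// Hypothetical aggregation: returns (username, score_raw, score) sorted by
/// score_raw descending, ties broken by username. Duplicate (user, MR) rows,
/// if any, are combined in input order.
fn aggregate(rows: &[(&str, i64, f64)]) -> Vec<(String, f64, i64)> {
    // Ordered maps make iteration order independent of hashing and input order.
    let mut per_user: BTreeMap<&str, BTreeMap<i64, f64>> = BTreeMap::new();
    for &(user, mr_id, contribution) in rows {
        *per_user.entry(user).or_default().entry(mr_id).or_insert(0.0) += contribution;
    }
    let mut out: Vec<(String, f64, i64)> = per_user
        .into_iter()
        .map(|(user, by_mr)| {
            let mut s = NeumaierSum::default();
            for (_, c) in by_mr {
                s.add(c); // ascending mr_id: fixed summation order
            }
            let raw = s.total();
            (user.to_string(), raw, raw.round() as i64)
        })
        .collect();
    // Sort on the raw f64; the integer score is only for display.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

Two users can share the same integer `score` while differing in `score_raw`; for example, 0.6 and 0.7 both round to 1. In that case, exposing `score_raw` under `--explain-score` accounts for the ordering.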
7. **Extend rename awareness to query resolution (not only scoring)**
Rationale: fixes user-facing misses for old path input and suffix lookup.
```diff
diff --git a/plan.md b/plan.md
@@ Path rename awareness
- All signal subqueries match both old and new path columns
+ Also update `build_path_query()` probes and suffix probe:
+ - exact_exists: new_path OR old_path (notes + mr_file_changes)
+ - prefix_exists: new_path LIKE OR old_path LIKE
+ - suffix_probe: union of notes.position_new_path, notes.position_old_path,
+ mr_file_changes.new_path, mr_file_changes.old_path
```
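Here is an in-memory Rust stand-in for the three probes. It assumes the candidate paths from the four columns have already been loaded into slices. `PathSources` and its methods are hypothetical; the real probes would stay in SQL (`=` and `LIKE`). The boundary rule in `suffix_matches`, which matches only whole trailing path segments, is an assumption about the desired behavior.

```rust
use std::collections::BTreeSet;

/// Hypothetical candidate paths loaded from the four columns named in the plan.
struct PathSources<'a> {
    note_new: &'a [&'a str], // notes.position_new_path
    note_old: &'a [&'a str], // notes.position_old_path
    mfc_new: &'a [&'a str],  // mr_file_changes.new_path
    mfc_old: &'a [&'a str],  // mr_file_changes.old_path
}

impl<'a> PathSources<'a> {
    /// Union of all four sources, so old paths are never skipped.
    fn all(&self) -> Vec<&'a str> {
        [self.note_new, self.note_old, self.mfc_new, self.mfc_old].concat()
    }

    /// exact_exists: path equals a new OR an old path.
    fn exact_exists(&self, path: &str) -> bool {
        self.all().iter().any(|p| *p == path)
    }

    /// prefix_exists: models `LIKE 'dir/%'` against new AND old paths.
    fn prefix_exists(&self, dir: &str) -> bool {
        let prefix = format!("{}/", dir.trim_end_matches('/'));
        self.all().iter().any(|p| p.starts_with(&prefix))
    }

    /// Suffix lookup over all four sources; whole-segment matching is assumed.
    fn suffix_matches(&self, suffix: &str) -> BTreeSet<&'a str> {
        let tail = format!("/{}", suffix);
        self.all()
            .into_iter()
            .filter(|p| *p == suffix || p.ends_with(&tail))
            .collect()
    }
}
```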
8. **Tighten CLI/output contracts for new flags**
Rationale: avoids payload bloat/ambiguity and keeps robot clients stable.
```diff
diff --git a/plan.md b/plan.md
@@ ### 5b. Score Explainability via `--explain-score`
+ `--explain-score` conflicts with `--detail` (mutually exclusive)
+ `resolved_input` includes `as_of_ms`, `as_of_iso`, `scoring_model_version`
+ robot output includes `score_raw` and `components` only when explain is enabled
```
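Here is a plain-Rust sketch of both contracts that assumes no particular CLI or serialization library. `WhoFlags`, `validate`, `ExpertRow`, and `to_row` are hypothetical names, and the component labels are placeholders.

```rust
/// Hypothetical parsed flags mirroring `--explain-score` and `--detail`.
struct WhoFlags {
    explain_score: bool,
    detail: bool,
}

/// Rejects the mutually exclusive combination before any query runs.
fn validate(flags: &WhoFlags) -> Result<(), String> {
    if flags.explain_score && flags.detail {
        return Err("--explain-score cannot be used with --detail".to_string());
    }
    Ok(())
}

/// Hypothetical robot row: `score_raw` and `components` stay `None` unless
/// explain is enabled, so a serializer that skips `None` fields
/// (e.g. serde's `skip_serializing_if = "Option::is_none"`) omits them.
#[derive(Debug)]
struct ExpertRow {
    username: String,
    score: i64,
    score_raw: Option<f64>,
    components: Option<Vec<(String, f64)>>,
}

/// Builds a row; uses a plain sum for brevity (see compensated summation above).
fn to_row(username: &str, components: Vec<(String, f64)>, explain: bool) -> ExpertRow {
    let raw: f64 = components.iter().map(|(_, v)| v).sum();
    ExpertRow {
        username: username.to_string(),
        score: raw.round() as i64, // integer score kept for backward compatibility
        score_raw: explain.then_some(raw),
        components: explain.then_some(components),
    }
}
```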
9. **Add confidence metadata (promote from rejected to accepted)**
Rationale: makes ranking more actionable and trustworthy with sparse evidence.
```diff
diff --git a/plan.md b/plan.md
@@ Rejected Ideas (with rationale)
- Confidence/coverage metadata: ... Deferred to avoid scope creep
+ Confidence/coverage metadata: ACCEPTED (minimal v1)
+ Add per-user `confidence: low|medium|high` based on evidence breadth + recency.
+ Keep implementation lightweight (no extra SQL pass).
```
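Here is a lightweight Rust sketch of the confidence rule. It reads only values the Rust-side aggregation already holds, so it needs no extra SQL pass. Two choices are placeholders rather than values from the plan:
- the inputs: distinct contributing MRs and days since the latest activity;
- the thresholds: 5 MRs, and windows of one and two half-lives.

```rust
/// Hypothetical confidence levels attached to each expert row.
#[derive(Debug, PartialEq)]
enum Confidence {
    Low,
    Medium,
    High,
}

/// Breadth = distinct MRs with evidence; recency = days since the newest
/// evidence. Thresholds below are illustrative placeholders.
fn confidence(distinct_mrs: u32, days_since_latest: f64, half_life_days: f64) -> Confidence {
    if distinct_mrs >= 5 && days_since_latest <= half_life_days {
        Confidence::High
    } else if distinct_mrs >= 2 && days_since_latest <= 2.0 * half_life_days {
        Confidence::Medium
    } else {
        Confidence::Low
    }
}
```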
10. **Upgrade test and verification scope to include query-plan and clock semantics**
Rationale: catches regressions your current tests won't.
```diff
diff --git a/plan.md b/plan.md
@@ 8. New Tests (TDD)
+ test_old_path_probe_exact_and_prefix
+ test_suffix_probe_uses_old_path_sources
+ test_since_relative_to_as_of_clock
+ test_explain_and_detail_are_mutually_exclusive
+ test_null_timestamp_fallback_to_created_at
+ test_query_plan_uses_path_indexes (EXPLAIN QUERY PLAN)
@@ Verification
+ 7. EXPLAIN QUERY PLAN snapshots for expert query (exact + prefix) confirm index usage
```
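The two clock-related tests pin down behavior that is easy to sketch. `since_cutoff_ms` measures the `--since` window from the injected `as_of_ms` rather than the wall clock. `activity_ts` shows one possible state-aware timestamp choice with a `created_at` fallback. Both functions, the state strings, and the state-to-column mapping are assumptions for illustration.

```rust
const MS_PER_DAY: i64 = 86_400_000;

/// Lower bound of the `--since` window, measured from the injected `as_of_ms`
/// clock instead of the wall clock, so a fixed as_of reproduces results.
fn since_cutoff_ms(as_of_ms: i64, since_days: i64) -> i64 {
    as_of_ms - since_days * MS_PER_DAY
}

/// Hypothetical state-aware timestamp choice (epoch ms) with the fallback
/// exercised by `test_null_timestamp_fallback_to_created_at`. The state
/// strings and the state-to-column mapping are assumptions.
fn activity_ts(
    state: &str,
    merged_at: Option<i64>,
    closed_at: Option<i64>,
    updated_at: Option<i64>,
    created_at: i64,
) -> i64 {
    let preferred = match state {
        "merged" => merged_at,
        "closed" => closed_at,
        _ => updated_at,
    };
    preferred.unwrap_or(created_at) // NULL timestamp -> created_at
}
```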
If you want, I can produce a single consolidated “revision 3” plan document that fully merges all of the above into your original structure.