gitlore/plans/time-decay-expert-scoring.feedback-3.md at 386dd884ecb3131e684e8eefb424b6aee66c85e0

Taylor Eernisse 4185abe05d docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc

Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-09 10:16:48 -05:00

Critical Plan Findings First

The proposed index idx_notes_mr_path_author ON notes(noteable_id, ...) will fail: notes.noteable_id does not exist in the schema (migrations/002_issues.sql:74).
Rename awareness is applied only in scoring queries, not in path resolution probes. Today build_path_query() and suffix_probe() inspect only position_new_path/new_path (src/cli/commands/who.rs:465, src/cli/commands/who.rs:591), so a query that uses an old path can still miss.
A fixed "24m" default window is brittle once half-lives become configurable: with larger half-lives it can silently truncate history that still carries meaningful weight.

Below are the revisions I’d make to your plan.

Fix migration/index architecture (blocking correctness + perf)
Rationale: prevents migration failure and aligns indexes to actual query shapes.

diff --git a/plan.md b/plan.md
@@ ### 6. Index Migration (db.rs)
- -- Support EXISTS subquery for reviewer participation check
- CREATE INDEX IF NOT EXISTS idx_notes_mr_path_author
-   ON notes(noteable_id, position_new_path, author_username)
-   WHERE note_type = 'DiffNote' AND is_system = 0;
+ -- Support reviewer participation joins (notes -> discussions -> MR)
+ CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author_created
+   ON notes(discussion_id, author_username, created_at)
+   WHERE note_type = 'DiffNote' AND is_system = 0;
+
+ -- Path-first indexes for global and project-scoped path lookups
+ CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
+   ON mr_file_changes(new_path, project_id, merge_request_id);
+ CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
+   ON mr_file_changes(old_path, project_id, merge_request_id)
+   WHERE old_path IS NOT NULL;
@@
- -- Support state-aware timestamp selection
- CREATE INDEX IF NOT EXISTS idx_mr_state_timestamps
-   ON merge_requests(state, merged_at, closed_at, updated_at, created_at);
+ -- Removed: low-selectivity timestamp composite index; joins are MR-id driven.

Restructure SQL around matched_mrs CTE instead of repeating OR path clauses
Rationale: better index use, less duplicated logic, cleaner maintenance.

diff --git a/plan.md b/plan.md
@@ ### 3. SQL Restructure (who.rs)
- WITH raw AS (
-   -- 5 UNION ALL subqueries (signals 1, 2, 3, 4a, 4b)
- ),
+ WITH matched_notes AS (
+   -- DiffNotes matching new_path
+   ...
+   UNION ALL
+   -- DiffNotes matching old_path
+   ...
+ ),
+ matched_file_changes AS (
+   -- file changes matching new_path
+   ...
+   UNION ALL
+   -- file changes matching old_path
+   ...
+ ),
+ matched_mrs AS (
+   SELECT DISTINCT mr_id, project_id FROM matched_notes
+   UNION
+   SELECT DISTINCT mr_id, project_id FROM matched_file_changes
+ ),
+ raw AS (
+   -- signals sourced from matched_mrs + matched_notes
+ ),

Replace correlated EXISTS/NOT EXISTS reviewer split with one precomputed participation set
Rationale: same semantics, lower query cost, easier reasoning.

diff --git a/plan.md b/plan.md
@@ Signal 4 splits into two
- Signal 4a uses an EXISTS subquery ...
- Signal 4b uses NOT EXISTS ...
+ Build `reviewer_participation(mr_id, username)` once from matched DiffNotes.
+ Then classify `mr_reviewers` rows via LEFT JOIN:
+ - participated: `rp.username IS NOT NULL`
+ - assigned-only: `rp.username IS NULL`
+ This avoids correlated EXISTS scans per reviewer row.
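
The participated / assigned-only split above can be illustrated in memory. This is only a sketch of the semantics, assuming reviewer rows and DiffNote authors are available as `(mr_id, username)` pairs; `ReviewerKind` and `classify_reviewers` are hypothetical names, and the plan itself performs this step in SQL via the LEFT JOIN.

```rust
use std::collections::HashSet;

/// How a reviewer row relates to DiffNote activity on the same MR.
/// (Hypothetical type, used only for this illustration.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ReviewerKind {
    Participated,
    AssignedOnly,
}

/// Classify each (mr_id, username) reviewer row against a participation set
/// that is built once, mirroring `rp.username IS NOT NULL` / `IS NULL`.
fn classify_reviewers(
    reviewers: &[(i64, String)],
    diffnote_authors: &[(i64, String)],
) -> Vec<(i64, String, ReviewerKind)> {
    // Build the participation set a single time; duplicate notes collapse.
    let participation: HashSet<(i64, &str)> = diffnote_authors
        .iter()
        .map(|(mr_id, user)| (*mr_id, user.as_str()))
        .collect();

    reviewers
        .iter()
        .map(|(mr_id, user)| {
            // One hash lookup per reviewer row instead of a correlated scan.
            let kind = if participation.contains(&(*mr_id, user.as_str())) {
                ReviewerKind::Participated
            } else {
                ReviewerKind::AssignedOnly
            };
            (*mr_id, user.clone(), kind)
        })
        .collect()
}
```

A reviewer who commented on a different MR than the one they were assigned to still counts as assigned-only for that MR, because the set is keyed on the pair rather than on the username alone.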

Make default --since derived from half-life + decay floor, not hardcoded 24m
Rationale: remains mathematically consistent when config changes.

diff --git a/plan.md b/plan.md
@@ ### 1. ScoringConfig (config.rs)
+ pub decay_floor: f64, // default: 0.05
@@ ### 5. Default --since Change
- Expert mode: "6m" -> "24m"
+ Expert mode default window is computed:
+ default_since_days = ceil(max_half_life_days * log2(1.0 / decay_floor))
+ With defaults (max_half_life=180, floor=0.05), this is ~26 months.
+ CLI `--since` still overrides; `--all-history` still disables windowing.
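
As a rough check of the formula above, here is a minimal sketch of the window computation. The function name and the `Option` return for invalid inputs are assumptions for illustration, not part of the plan.

```rust
/// Days until a signal with the longest half-life decays to `decay_floor`.
/// Returns None where the formula is undefined. (Hypothetical helper.)
fn default_since_days(max_half_life_days: f64, decay_floor: f64) -> Option<u32> {
    // The half-life must be positive and finite (this also rejects NaN).
    if !(max_half_life_days > 0.0) || !max_half_life_days.is_finite() {
        return None;
    }
    // The floor must be a proper fraction: at 1.0 the window collapses to
    // zero days, and at 0.0 the logarithm diverges.
    if !(decay_floor > 0.0 && decay_floor < 1.0) {
        return None;
    }
    // Solve 2^(-d / half_life) = floor for d, then round up to whole days.
    let days = (max_half_life_days * (1.0 / decay_floor).log2()).ceil();
    Some(days as u32)
}
```

With the defaults quoted in the diff (180 and 0.05) this gives 778 days, since log2(20) is about 4.32. That is roughly 25.6 months, consistent with the "~26 months" figure.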

Use log2(1+count) for notes instead of ln(1+count)
Rationale: keeps 1 note ~= 1 unit (with note_bonus=1) while preserving diminishing returns.

diff --git a/plan.md b/plan.md
@@ Scoring Formula
- note_contribution(mr) = note_bonus * ln(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
+ note_contribution(mr) = note_bonus * log2(1 + note_count_in_mr) * 2^(-days_elapsed / note_half_life)
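
A small sketch of the revised formula, assuming the parameters are plain `f64`/`u32` values (the real types in who.rs may differ):

```rust
/// note_bonus * log2(1 + note_count) * 2^(-days_elapsed / note_half_life)
fn note_contribution(
    note_bonus: f64,
    note_count_in_mr: u32,
    days_elapsed: f64,
    note_half_life: f64,
) -> f64 {
    // Diminishing returns on volume: 1 note -> 1.0, 3 notes -> 2.0, 7 -> 3.0.
    let volume = (1.0 + f64::from(note_count_in_mr)).log2();
    // Exponential time decay: the weight halves every `note_half_life` days.
    let decay = 2f64.powf(-days_elapsed / note_half_life);
    note_bonus * volume * decay
}
```

With `note_bonus = 1` and no elapsed time, a single note contributes 1.0 under log2, whereas ln(2) would give about 0.693. After exactly one half-life the same note contributes 0.5.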

Guarantee deterministic float aggregation and expose score_raw
Rationale: avoids hash-order drift and explainability mismatch vs rounded integer score.

diff --git a/plan.md b/plan.md
@@ ### 4. Rust-Side Aggregation (who.rs)
- HashMap<i64, ...>
+ BTreeMap<i64, ...> (or sort keys before accumulation) for deterministic summation order
+ Use compensated summation (Kahan/Neumaier) for stable f64 totals
@@
- Sort on raw `f64` score ... round only for display
+ Keep `score_raw` internally and expose when `--explain-score` is active.
+ `score` remains integer for backward compatibility.
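
One way the deterministic aggregation could look, as a sketch only: it assumes contributions arrive as `(key, value)` pairs with an `i64` key, and both function names are hypothetical. Sorting each key's values before summing is an extra step beyond what the diff asks for, added so the result does not depend on row order.

```rust
use std::collections::BTreeMap;

/// Neumaier's variant of Kahan compensated summation.
fn neumaier_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut compensation = 0.0_f64;
    for &x in values {
        let t = sum + x;
        // Recover the low-order bits lost when adding the smaller operand.
        if sum.abs() >= x.abs() {
            compensation += (sum - t) + x;
        } else {
            compensation += (x - t) + sum;
        }
        sum = t;
    }
    sum + compensation
}

/// Group contributions by key and total them in a reproducible order.
fn aggregate_scores(contributions: &[(i64, f64)]) -> Vec<(i64, f64)> {
    // BTreeMap iterates in ascending key order, unlike HashMap.
    let mut per_key: BTreeMap<i64, Vec<f64>> = BTreeMap::new();
    for &(key, value) in contributions {
        per_key.entry(key).or_default().push(value);
    }
    per_key
        .into_iter()
        .map(|(key, mut values)| {
            // Fix the summation order so input row order cannot change the total.
            values.sort_by(|a, b| a.total_cmp(b));
            (key, neumaier_sum(&values))
        })
        .collect()
}
```

For the input `[1.0, 1e100, 1.0, -1e100]`, naive left-to-right summation yields 0.0, while `neumaier_sum` returns 2.0.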

Extend rename awareness to query resolution (not only scoring)
Rationale: fixes user-facing misses for old path input and suffix lookup.

diff --git a/plan.md b/plan.md
@@ Path rename awareness
- All signal subqueries match both old and new path columns
+ Also update `build_path_query()` probes and suffix probe:
+ - exact_exists: new_path OR old_path (notes + mr_file_changes)
+ - prefix_exists: new_path LIKE OR old_path LIKE
+ - suffix_probe: union of notes.position_new_path, notes.position_old_path,
+   mr_file_changes.new_path, mr_file_changes.old_path

Tighten CLI/output contracts for new flags
Rationale: avoids payload bloat/ambiguity and keeps robot clients stable.

diff --git a/plan.md b/plan.md
@@ ### 5b. Score Explainability via `--explain-score`
+ `--explain-score` conflicts with `--detail` (mutually exclusive)
+ `resolved_input` includes `as_of_ms`, `as_of_iso`, `scoring_model_version`
+ robot output includes `score_raw` and `components` only when explain is enabled

Add confidence metadata (promote from rejected to accepted)
Rationale: makes ranking more actionable and trustworthy with sparse evidence.

diff --git a/plan.md b/plan.md
@@ Rejected Ideas (with rationale)
- Confidence/coverage metadata: ... Deferred to avoid scope creep
+ Confidence/coverage metadata: ACCEPTED (minimal v1)
+ Add per-user `confidence: low|medium|high` based on evidence breadth + recency.
+ Keep implementation lightweight (no extra SQL pass).
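
To make "evidence breadth + recency" concrete, here is one possible shape for the classification. Everything in it is an assumption for illustration: the input fields, the thresholds (5 and 2 distinct MRs, one half-life for recency), and the names. It uses only values the Rust-side aggregation would plausibly already hold, so no extra SQL pass is needed.

```rust
/// Per-user confidence bucket. (Hypothetical type.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Confidence {
    Low,
    Medium,
    High,
}

/// Classify confidence from breadth (distinct MRs with any signal) and
/// recency (age of the newest signal). All thresholds are placeholders.
fn confidence(
    distinct_mrs: u32,
    days_since_last_signal: f64,
    half_life_days: f64,
) -> Confidence {
    let broad = distinct_mrs >= 5; // assumed threshold
    let moderate = distinct_mrs >= 2; // assumed threshold
    // "Recent" here means the newest signal retains at least half its weight.
    let recent = days_since_last_signal <= half_life_days;

    match (broad, moderate, recent) {
        (true, _, true) => Confidence::High,
        (true, _, false) | (false, true, true) => Confidence::Medium,
        _ => Confidence::Low,
    }
}
```

Tying recency to the half-life rather than to a fixed day count keeps the buckets consistent if the scoring configuration changes.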

Upgrade test and verification scope to include query-plan and clock semantics
Rationale: catches regressions your current tests won’t.

diff --git a/plan.md b/plan.md
@@ 8. New Tests (TDD)
+ test_old_path_probe_exact_and_prefix
+ test_suffix_probe_uses_old_path_sources
+ test_since_relative_to_as_of_clock
+ test_explain_and_detail_are_mutually_exclusive
+ test_null_timestamp_fallback_to_created_at
+ test_query_plan_uses_path_indexes (EXPLAIN QUERY PLAN)
@@ Verification
+ 7. EXPLAIN QUERY PLAN snapshots for expert query (exact + prefix) confirm index usage

If you want, I can produce a single consolidated “revision 3” plan document that fully merges all of the above into your original structure.
