docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement:

- lore-service (5 iterations): HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers.
- work-item-status-graphql (7 iterations + TDD appendix): Detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot mode output.
- time-decay-expert-scoring (iteration 5 feedback): Updates to the existing time-decay scoring plan incorporating feedback on decay curve parameterization, recency weighting for discussion contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
**plans/time-decay-expert-scoring.feedback-5.md** (new file, 128 lines)
**Best Revisions to Strengthen the Plan**
1. **[Critical] Replace one-hop rename matching with canonical path identities**
Analysis and rationale: Current `old_path OR new_path` fixes direct renames, but it still breaks on rename chains (`a.rs -> b.rs -> c.rs`) and split/move patterns. A canonical `path_identity` graph built from `mr_file_changes(old_path,new_path)` gives stable identity over time, which is the right architectural boundary for expertise history.
```diff
@@ ## Context
-- Match both old and new paths in all signal queries AND path resolution probes so expertise survives file renames
+- Build canonical path identities from rename edges and score by identity, not raw path strings, so expertise survives multi-hop renames and moves

@@ ## Files to Modify
 2. **`src/cli/commands/who.rs`** — Core changes:
   ...
- - Match both `new_path` and `old_path` in all signal queries (rename awareness)
+ - Resolve queried paths to `path_identity_id` and match all aliases in that identity set
+4. **`src/core/path_identity.rs`** — New module:
+ - Build/maintain rename graph from `mr_file_changes`
+ - Resolve path -> identity + aliases for probes/scoring
```
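A canonical identity over rename edges is essentially union-find on path strings. A minimal std-only sketch of what the proposed `path_identity` module could do (the `PathIdentities` type and `aliases` method are illustrative names, not from the plan):

```rust
use std::collections::HashMap;

// Union-find over path strings: every path in the same rename chain
// (or move pattern) collapses to one canonical identity.
struct PathIdentities {
    parent: Vec<usize>,
    index: HashMap<String, usize>,
    names: Vec<String>,
}

impl PathIdentities {
    fn new() -> Self {
        Self { parent: Vec::new(), index: HashMap::new(), names: Vec::new() }
    }

    fn id(&mut self, path: &str) -> usize {
        if let Some(&i) = self.index.get(path) {
            return i;
        }
        let i = self.parent.len();
        self.parent.push(i);
        self.index.insert(path.to_string(), i);
        self.names.push(path.to_string());
        i
    }

    fn find(&mut self, mut i: usize) -> usize {
        while self.parent[i] != i {
            self.parent[i] = self.parent[self.parent[i]]; // path halving
            i = self.parent[i];
        }
        i
    }

    // One rename edge from mr_file_changes: old_path -> new_path.
    fn add_rename(&mut self, old: &str, new: &str) {
        let (a, b) = (self.id(old), self.id(new));
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }

    // All aliases sharing the queried path's identity, for signal queries.
    fn aliases(&mut self, path: &str) -> Vec<String> {
        let i = self.id(path);
        let root = self.find(i);
        let mut out = Vec::new();
        for j in 0..self.names.len() {
            if self.find(j) == root {
                out.push(self.names[j].clone());
            }
        }
        out.sort();
        out
    }
}

fn main() {
    let mut ids = PathIdentities::new();
    // Multi-hop rename chain a.rs -> b.rs -> c.rs collapses to one identity,
    // which the one-hop `old_path OR new_path` match cannot do.
    ids.add_rename("a.rs", "b.rs");
    ids.add_rename("b.rs", "c.rs");
    assert_eq!(ids.aliases("c.rs"), vec!["a.rs", "b.rs", "c.rs"]);
    println!("{:?}", ids.aliases("c.rs"));
}
```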
2. **[Critical] Shift scoring input from runtime CTE joins to a normalized `expertise_events` table**
Analysis and rationale: Your SQL is correct but complex and expensive at query time. Precomputing normalized events at ingestion gives simpler, faster, and more reliable scoring queries; it also enables model versioning/backfills without touching raw MR/note tables each request.
```diff
@@ ## Files to Modify
-3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes
+3. **`src/core/db.rs`** — Add migrations for:
+ - `expertise_events` table (normalized scoring events)
+ - supporting indexes
+4. **`src/core/ingest/expertise_events.rs`** — New:
+ - Incremental upsert of events during sync/ingest

@@ ## SQL Restructure (who.rs)
-The SQL uses CTE-based dual-path matching and hybrid aggregation...
+Runtime SQL reads precomputed `expertise_events` filtered by path identity + time window.
+Heavy joins/aggregation move to ingest-time normalization.
```
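With normalized events, runtime scoring reduces to a filter plus a decay-weighted sum. A sketch of that read path (the `ExpertiseEvent` row shape is an assumption; the plan only names the `expertise_events` table):

```rust
// Hypothetical normalized row: one scoring event per (user, signal, MR).
struct ExpertiseEvent {
    username: &'static str,
    weight: f64,         // base points for the signal
    half_life_days: f64, // decay half-life for the signal
    ts_ms: i64,          // event timestamp, UTC milliseconds
}

// Score = sum of weight * 0.5^(age / half_life) over events in the window.
fn decayed_score(events: &[ExpertiseEvent], user: &str, as_of_ms: i64) -> f64 {
    const DAY_MS: f64 = 86_400_000.0;
    events
        .iter()
        .filter(|e| e.username == user && e.ts_ms < as_of_ms) // exclusive bound
        .map(|e| {
            let age_days = (as_of_ms - e.ts_ms) as f64 / DAY_MS;
            e.weight * 0.5_f64.powf(age_days / e.half_life_days)
        })
        .sum()
}

fn main() {
    let as_of: i64 = 200 * 86_400_000;
    let events = [
        ExpertiseEvent { username: "alice", weight: 10.0, half_life_days: 90.0, ts_ms: as_of - 90 * 86_400_000 },
        ExpertiseEvent { username: "alice", weight: 4.0, half_life_days: 75.0, ts_ms: as_of - 75 * 86_400_000 },
    ];
    // Exactly one half-life elapsed per event: 10/2 + 4/2 = 7.0.
    let s = decayed_score(&events, "alice", as_of);
    assert!((s - 7.0).abs() < 1e-9);
    println!("{s}");
}
```

The heavy part (joining MR, note, and file-change tables) runs once at ingest when rows are upserted, not per query.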
3. **[High] Upgrade reviewer engagement model beyond char-count threshold**
Analysis and rationale: `min_note_chars` is a useful guardrail but brittle (easy to game, penalizes concise high-quality comments). Add explicit review-state signals (`approved`, `changes_requested`) and trivial-comment pattern filtering to better capture real reviewer expertise.
```diff
@@ ## Scoring Formula
-| **Reviewer Participated** (left DiffNote on MR/path) | 10 | 90 days |
+| **Reviewer Participated** (substantive DiffNote and/or formal review action) | 10 | 90 days |
+| **Review Decision: changes_requested** | 6 | 120 days |
+| **Review Decision: approved** | 4 | 75 days |

@@ ### 1. ScoringConfig (config.rs)
     pub reviewer_min_note_chars: u32,
+    pub reviewer_trivial_note_patterns: Vec<String>, // default: ["lgtm","+1","nit","ship it","👍"]
+    pub review_approved_weight: i64,          // default: 4
+    pub review_changes_requested_weight: i64, // default: 6
```
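The combined check could look like the following sketch. Matching here is deliberately simple (lowercased, trailing punctuation stripped, exact match against a pattern); a real implementation might compile `reviewer_trivial_note_patterns` as regexes instead:

```rust
// Decide whether a review note counts as substantive engagement:
// long enough AND not just a trivial stock phrase.
fn is_substantive(note: &str, min_chars: usize, trivial_patterns: &[&str]) -> bool {
    let trimmed = note.trim().to_lowercase();
    if trimmed.chars().count() < min_chars {
        return false;
    }
    // Strip trailing punctuation so "LGTM!" still matches the "lgtm" pattern.
    let normalized = trimmed.trim_end_matches(|c: char| c == '!' || c == '.' || c == ':' || c == ' ');
    !trivial_patterns.iter().any(|p| normalized == *p)
}

fn main() {
    let patterns = ["lgtm", "+1", "nit", "ship it"];
    // Passes the char-count guardrail but is still trivial.
    assert!(!is_substantive("LGTM!", 4, &patterns));
    // Concise but substantive feedback survives the filter.
    assert!(is_substantive("This lock order can deadlock with the sync worker", 4, &patterns));
    println!("ok");
}
```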
4. **[High] Make temporal semantics explicit and deterministic**
Analysis and rationale: `--as-of` is good, but day parsing and boundary semantics can still cause subtle reproducibility issues. Define window as `[since_ms, as_of_ms)` and parse `YYYY-MM-DD` as end-of-day UTC (or explicit timezone) so user expectations match outputs.
```diff
@@ ### 5a. Reproducible Scoring via `--as-of`
-- All event selection is bounded by `[since_ms, as_of_ms]`
+- All event selection is bounded by `[since_ms, as_of_ms)` (exclusive upper bound)
+- `YYYY-MM-DD` is interpreted as `23:59:59.999Z` unless `--timezone` is provided
+- Robot output includes `window_start_iso`, `window_end_iso`, `window_end_exclusive: true`
```
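Both rules can be implemented without a date crate. A std-only sketch using the standard days-from-civil calendar algorithm: `23:59:59.999Z` inclusive plus one millisecond is exactly the start of the next UTC day, so the exclusive bound is `(day + 1) * 86_400_000`:

```rust
// Days since 1970-01-01 for a proleptic Gregorian civil date (UTC).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;                              // year of era [0, 399]
    let doy = (153 * ((m + 9) % 12) + 2) / 5 + d - 1;     // day of year, Mar-based
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;      // day of era
    era * 146097 + doe - 719468
}

// Exclusive upper bound in ms for a `YYYY-MM-DD` --as-of value:
// start of the following UTC day == 23:59:59.999Z inclusive.
fn as_of_ms_exclusive(y: i64, m: i64, d: i64) -> i64 {
    (days_from_civil(y, m, d) + 1) * 86_400_000
}

fn main() {
    assert_eq!(days_from_civil(1970, 1, 1), 0);
    assert_eq!(as_of_ms_exclusive(1970, 1, 1), 86_400_000);
    // Event selection window: since_ms <= ts < as_of_ms.
    let (since, as_of) = (0_i64, as_of_ms_exclusive(1970, 1, 1));
    let in_window = |ts: i64| since <= ts && ts < as_of;
    assert!(in_window(86_399_999) && !in_window(86_400_000));
    println!("ok");
}
```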
5. **[High] Replace fixed default `--since 24m` with contribution-floor auto cutoff**
Analysis and rationale: A static window is simple but often over-scans data. Compute a model-derived horizon from a minimum contribution floor (for example `0.01` points) per signal; this keeps results equivalent while reducing query cost.
```diff
@@ ### 5. Default --since Change
-Expert mode: `"6m"` -> `"24m"`
+Expert mode default: `--since auto`
+`auto` computes earliest relevant timestamp from configured weights/half-lives and `min_contribution_floor`
+Add config: `min_contribution_floor` (default: 0.01)
+`--since` still overrides, `--all-history` still bypasses cutoff
```
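The horizon falls out of the decay model in closed form: a signal with weight `w` and half-life `h` stays above the floor only while `age <= h * log2(w / floor)`, so `auto` can take the maximum such age across configured signals. A sketch under that assumption:

```rust
// Derive the `--since auto` cutoff from (weight, half_life_days) pairs and
// the minimum contribution floor: the earliest timestamp at which any
// configured signal could still contribute >= floor points.
fn auto_since_ms(signals: &[(f64, f64)], floor: f64, as_of_ms: i64) -> i64 {
    const DAY_MS: f64 = 86_400_000.0;
    let max_age_days = signals
        .iter()
        .map(|&(weight, half_life_days)| half_life_days * (weight / floor).log2())
        .fold(0.0_f64, f64::max);
    as_of_ms - (max_age_days * DAY_MS) as i64
}

fn main() {
    // e.g. authored = weight 10, 90-day half-life; approved = weight 4, 75-day.
    let signals = [(10.0, 90.0), (4.0, 75.0)];
    let as_of = 1_700_000_000_000_i64;
    let since = auto_since_ms(&signals, 0.01, as_of);
    // 90 * log2(1000) ≈ 897 days dominates 75 * log2(400) ≈ 648 days,
    // so auto scans ~897 days instead of a fixed 24 months.
    let horizon_days = (as_of - since) as f64 / 86_400_000.0;
    assert!((horizon_days - 897.0).abs() < 1.0);
    println!("{horizon_days:.1}");
}
```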
6. **[High] Add bot/service-account filtering now (not later)**
Analysis and rationale: Bot activity can materially distort expertise rankings in real repos. This is low implementation cost with high quality gain and should be in v1 of the scoring revamp, not deferred.
```diff
@@ ### 1. ScoringConfig (config.rs)
+    pub excluded_username_patterns: Vec<String>, // default: ["bot","\\[bot\\]","service-account","ci-"]

@@ ### 2. SQL Restructure (who.rs)
+Apply username exclusion in all signal sources unless `--include-bots` is set

@@ ### 5b. Score Explainability via `--explain-score`
+Add `filtered_events` counts in robot output metadata
```
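A minimal sketch of the exclusion check. This version treats patterns as case-insensitive substrings, which is deliberately coarse (it would also catch usernames like "abbott"); since the defaults above look regex-flavored (`\\[bot\\]`), a real implementation would likely compile them as regexes:

```rust
// Exclude bot/service accounts from all signal sources unless --include-bots.
fn is_excluded(username: &str, patterns: &[&str], include_bots: bool) -> bool {
    if include_bots {
        return false;
    }
    let u = username.to_lowercase();
    patterns.iter().any(|p| u.contains(&p.to_lowercase()))
}

fn main() {
    let patterns = ["bot", "service-account", "ci-"];
    assert!(is_excluded("renovate-bot", &patterns, false));
    assert!(is_excluded("ci-runner", &patterns, false));
    assert!(!is_excluded("alice", &patterns, false));
    assert!(!is_excluded("renovate-bot", &patterns, true)); // --include-bots
    println!("ok");
}
```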
7. **[Medium] Enforce deterministic floating-point accumulation**
Analysis and rationale: Even with small sets, unordered `HashMap` iteration can cause tiny platform-dependent ranking differences near ties. Sorting contributions and using Neumaier summation removes nondeterminism and stabilizes tests/CI.
```diff
@@ ### 4. Rust-Side Aggregation (who.rs)
-Compute score as `f64`.
+Compute score as `f64` using deterministic contribution ordering:
+1) sort by (username, signal, mr_id, ts)
+2) sum with Neumaier compensation
+Tie-break remains `(raw_score DESC, last_seen DESC, username ASC)`
```
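Both steps fit in a few lines of std-only Rust. The contrived magnitudes below are just to show what the compensation buys; real contributions are small, but the same ordering guarantee is what stabilizes near-tie rankings across platforms:

```rust
// Neumaier (improved Kahan) compensated summation: tracks low-order bits
// lost when adding values of very different magnitude.
fn neumaier_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut comp = 0.0; // running compensation
    for &x in xs {
        let t = sum + x;
        if sum.abs() >= x.abs() {
            comp += (sum - t) + x; // low-order bits of x were lost
        } else {
            comp += (x - t) + sum; // low-order bits of sum were lost
        }
        sum = t;
    }
    sum + comp
}

fn main() {
    // (username, signal, mr_id, ts, contribution): fixed sort key first,
    // so iteration order never depends on HashMap ordering.
    let mut rows = vec![
        ("alice", "author", 3_i64, 100_i64, 1e100_f64),
        ("alice", "author", 1, 100, 1.0),
        ("alice", "review", 4, 200, -1e100),
        ("alice", "author", 2, 100, 1.0),
    ];
    rows.sort_by(|a, b| (a.0, a.1, a.2, a.3).cmp(&(b.0, b.1, b.2, b.3)));
    let contributions: Vec<f64> = rows.iter().map(|r| r.4).collect();
    // Naive left-to-right summation of this order yields 0.0 (the two 1.0s
    // vanish into 1e100); Neumaier recovers the exact 2.0.
    assert_eq!(neumaier_sum(&contributions), 2.0);
    println!("{}", neumaier_sum(&contributions));
}
```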
8. **[Medium] Strengthen explainability with evidence, not just totals**
Analysis and rationale: Component totals help, but disputes usually need “why this user got this score now.” Add compact top evidence rows per component (`mr_id`, `ts`, `raw_contribution`) behind an optional mode.
```diff
@@ ### 5b. Score Explainability via `--explain-score`
-Component breakdown only (4 floats per user).
+Add `--explain-score=summary|full`:
+`summary`: current 4-component totals
+`full`: adds top N evidence rows per component (default N=3)
+Robot output includes per-evidence `mr_id`, `signal`, `ts`, `contribution`
```
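Selecting the evidence rows is a sort-and-truncate per component. A small sketch with an illustrative row type (field names follow the robot-output fields above, the rest is assumed):

```rust
// One evidence row for `full` mode output.
#[derive(Debug)]
struct Evidence {
    mr_id: i64,
    signal: &'static str,
    ts: i64,
    contribution: f64,
}

// Keep the N largest contributions for one component.
fn top_evidence(mut rows: Vec<Evidence>, n: usize) -> Vec<Evidence> {
    rows.sort_by(|a, b| b.contribution.partial_cmp(&a.contribution).unwrap());
    rows.truncate(n);
    rows
}

fn main() {
    let rows = vec![
        Evidence { mr_id: 1, signal: "author", ts: 10, contribution: 2.5 },
        Evidence { mr_id: 2, signal: "author", ts: 20, contribution: 9.1 },
        Evidence { mr_id: 3, signal: "author", ts: 30, contribution: 4.0 },
        Evidence { mr_id: 4, signal: "author", ts: 40, contribution: 0.3 },
    ];
    let top = top_evidence(rows, 3);
    let ids: Vec<i64> = top.iter().map(|e| e.mr_id).collect();
    assert_eq!(ids, vec![2, 3, 1]); // highest contributions first
    println!("{top:?}");
}
```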
9. **[Medium] Make query plan strategy explicit: `UNION ALL` default for dual-path scans**
Analysis and rationale: You currently treat `UNION ALL` as a fallback for when the query plan regresses. For SQLite, OR-across-indexed-columns regressions are common enough that defaulting to branch-split queries is often more predictable.
```diff
@@ **Index optimization fallback (UNION ALL split)**
-Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm degradation.
+Use `UNION ALL` + dedup as default for dual-path matching.
+Keep `OR` variant as optional strategy flag for benchmarking/regression checks.
```
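For concreteness, the branch-split shape might look like this. The `notes` table and join columns here are assumptions (the plan names only `mr_file_changes(old_path, new_path)`); the point is one indexed-equality branch per path column, unioned and deduplicated, rather than a single `OR` predicate the planner may refuse to drive from an index:

```rust
// Hypothetical query shape for the UNION ALL + dedup default.
const DUAL_PATH_QUERY: &str = "
SELECT note_id, author, ts FROM (
    SELECT n.note_id, n.author, n.ts
    FROM notes n JOIN mr_file_changes f ON f.mr_id = n.mr_id
    WHERE f.new_path = ?1
    UNION ALL
    SELECT n.note_id, n.author, n.ts
    FROM notes n JOIN mr_file_changes f ON f.mr_id = n.mr_id
    WHERE f.old_path = ?1
)
GROUP BY note_id  -- dedup rows matched by both branches
";

fn main() {
    assert!(DUAL_PATH_QUERY.contains("UNION ALL"));
    println!("{DUAL_PATH_QUERY}");
}
```

Each branch can use a single-column index on `new_path` or `old_path` directly, which is what makes the plan shape predictable across SQLite versions.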
10. **[Medium] Add explicit performance SLO + benchmark gate**
Analysis and rationale: This plan is query-heavy and ranking-critical; add measurable performance budgets so future edits do not silently degrade UX. Include synthetic fixture benchmarks for exact, prefix, and suffix path modes.
```diff
@@ ## Verification
+8. Performance regression gate:
+ - `cargo bench --bench who_expert_scoring`
+ - Dataset tiers: 100k, 1M, 5M notes
+ - SLOs: p95 exact path < 150ms, prefix < 250ms, suffix < 400ms on reference hardware
+ - Fail CI if regression > 20% vs stored baseline
```
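The gate itself is two comparisons: measured p95 against the absolute SLO, and against 120% of a stored baseline. A std-only sketch of that check (the harness and baseline format are illustrative; the thresholds are from the verification item above):

```rust
// p95 over raw latency samples in milliseconds.
fn p95_ms(mut samples: Vec<f64>) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((samples.len() as f64) * 0.95).ceil() as usize - 1;
    samples[idx.min(samples.len() - 1)]
}

// Fail if the measured p95 breaks the SLO or regresses >20% vs baseline.
fn gate(measured_p95: f64, baseline_p95: f64, slo_ms: f64) -> Result<(), String> {
    if measured_p95 > slo_ms {
        return Err(format!("p95 {measured_p95}ms exceeds SLO {slo_ms}ms"));
    }
    if measured_p95 > baseline_p95 * 1.20 {
        return Err(format!(
            "p95 {measured_p95}ms regressed >20% vs baseline {baseline_p95}ms"
        ));
    }
    Ok(())
}

fn main() {
    let samples: Vec<f64> = (1..=100).map(|i| i as f64).collect();
    assert_eq!(p95_ms(samples), 95.0);
    // Exact-path SLO 150ms, stored baseline 90ms (allows up to 108ms):
    assert!(gate(95.0, 90.0, 150.0).is_ok());
    assert!(gate(120.0, 90.0, 150.0).is_err()); // >20% regression
    assert!(gate(160.0, 140.0, 150.0).is_err()); // SLO breach
    println!("ok");
}
```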
If you want, I can produce a single consolidated “iteration 5” plan document with these changes already merged into your current structure.