docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc
Ideas catalog (docs/ideas/): 25 feature concept documents covering future lore capabilities including bottleneck detection, churn analysis, expert scoring, collaboration patterns, milestone risk, knowledge silos, and more. Each doc includes motivation, implementation sketch, data requirements, and dependencies on existing infrastructure. README.md provides an overview and SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review feedback exploring decay functions, scoring algebra, and integration points with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where EntityRef was missing project context, causing ambiguous cross-project references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
81
docs/ideas/recurring-patterns.md
Normal file
@@ -0,0 +1,81 @@
# Recurring Bug Pattern Detector

- **Command:** `lore recurring-patterns [--min-cluster <N>]`
- **Confidence:** 76%
- **Tier:** 3
- **Status:** proposed
- **Effort:** high (vector clustering, threshold tuning)

## What

Cluster closed issues by embedding similarity. Identify clusters of 3+ issues that
are semantically similar. Such clusters represent recurring problems that need a
systemic fix rather than one-off patches.

## Why

Finding the same bug filed 5 different ways is one of the most impactful things you
can surface. This is a sophisticated use of the embedding pipeline that no competing
tool offers. It turns "we keep having auth issues" from a gut feeling into data.

## Data Required

All required data exists today:
- `documents` (source_type='issue', content_text)
- `embeddings` (768-dim vectors)
- `issues` (state='closed' for filtering)

## Implementation Sketch

```
1. Collect all embeddings for closed issue documents
2. For each issue, find K nearest neighbors (K=10)
3. Build adjacency graph: edge exists if similarity > threshold (e.g., 0.80)
4. Find connected components (simple DFS/BFS)
5. Filter to components with >= min-cluster members (default 3)
6. For each cluster:
   a. Extract common terms (TF-IDF or simple word frequency)
   b. Sort by recency (most recent issue first)
   c. Report cluster with: theme, member issues, time span
```
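
Steps 2 through 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not lore's implementation: the function names (`cosine`, `find_clusters`) and the in-memory `dict` of embeddings are hypothetical, and the neighbor search is brute force rather than backed by a vector index.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_clusters(embeddings, k=10, threshold=0.80, min_cluster=3):
    """embeddings: dict mapping issue id -> vector (hypothetical input shape).

    Returns a list of clusters, each a sorted list of issue ids.
    """
    ids = list(embeddings)
    adjacency = defaultdict(set)

    # Steps 2-3: brute-force K nearest neighbors, keeping only edges above threshold.
    for i in ids:
        sims = [(cosine(embeddings[i], embeddings[j]), j) for j in ids if j != i]
        sims.sort(key=lambda pair: pair[0], reverse=True)
        for sim, j in sims[:k]:
            if sim > threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)

    # Step 4: connected components via iterative DFS.
    seen = set()
    clusters = []
    for start in ids:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # Step 5: drop components smaller than min_cluster.
        if len(component) >= min_cluster:
            clusters.append(sorted(component))
    return clusters
```

Because connected components are transitive, two issues can land in the same cluster without being directly similar to each other, as long as a chain of above-threshold edges links them. That is one reason the threshold matters so much.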

### Similarity Threshold Tuning

This is the critical parameter. Set too low, it produces noisy clusters; set too high, it misses real patterns.
- Start at 0.80 cosine similarity
- Expose as `--threshold` flag for user tuning
- Report cluster cohesion score for transparency
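
One plausible definition of the cohesion score is the mean pairwise cosine similarity among a cluster's members. The sketch below assumes that definition; the name `cluster_cohesion` is hypothetical, and returning 0.0 for a cluster with fewer than two members is an arbitrary choice made here.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_cohesion(vectors):
    """Mean cosine similarity over all unordered pairs of member vectors.

    Assumption: cohesion is defined as average pairwise similarity.
    """
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two members: no pairs to average
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

A cohesion value close to the threshold suggests a cluster held together by a few borderline edges, which is a hint to raise `--threshold` for that repository.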

## Human Output

```
Recurring Patterns (3+ similar closed issues)

Cluster 1: "Authentication timeout errors" (5 issues, spanning 6 months)
  #89 Login timeout on slow networks (closed 3d ago)
  #72 Auth flow hangs on cellular (closed 2mo ago)
  #58 Token refresh timeout (closed 3mo ago)
  #45 SSO login timeout for remote users (closed 5mo ago)
  #31 Connection timeout in auth middleware (closed 6mo ago)
  Avg similarity: 0.87 | Suggested: systemic fix for auth timeout handling

Cluster 2: "Cache invalidation issues" (3 issues, spanning 2 months)
  #85 Stale cache after deploy (closed 2w ago)
  #77 Cache headers not updated (closed 1mo ago)
  #69 Dashboard shows old data after settings change (closed 2mo ago)
  Avg similarity: 0.82 | Suggested: review cache invalidation strategy
```
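
The quoted cluster themes in this mock output have to be derived from the member issues. As a rough illustration of the "simple word frequency" option from the implementation sketch, the following counts how many titles in a cluster contain each term. The function name `common_terms` and the tiny stopword list are assumptions made for this example, and the result is a list of candidate keywords, not a finished theme label.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "on", "in", "for", "of", "to", "after", "not", "and", "is"}


def common_terms(titles, top_n=3):
    """Return the top_n terms ranked by how many titles contain them."""
    doc_freq = Counter()
    for title in titles:
        # Use a set so a word repeated within one title counts once.
        words = set(re.findall(r"[a-z0-9]+", title.lower()))
        doc_freq.update(words - STOPWORDS)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]
```

Applied to the five titles in Cluster 1, this ranks "timeout" first, followed by "auth" and "login", which lines up with the hand-written theme in the mock output.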

## Downsides

- Clustering quality depends on embedding quality and threshold tuning
- May produce false clusters (issues that mention similar terms but describe different problems)
- Computationally expensive for large issue counts (N^2 pairwise comparisons)
- Needs to handle multi-chunk documents (aggregate embeddings)
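
For the multi-chunk case, one common way to get a single vector per issue is to average the chunk embeddings and re-normalize. The sketch below assumes mean pooling is acceptable here; the document does not say which aggregation lore would use, and `aggregate_chunks` is a hypothetical name.

```python
import math


def aggregate_chunks(chunk_vectors):
    """Mean-pool chunk embeddings into one L2-normalized document vector.

    Assumption: all chunks of a document are weighted equally.
    """
    if not chunk_vectors:
        raise ValueError("document has no chunk embeddings")
    dim = len(chunk_vectors[0])
    count = len(chunk_vectors)
    mean = [sum(vec[d] for vec in chunk_vectors) / count for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    # Leave an all-zero mean untouched to avoid dividing by zero.
    return [x / norm for x in mean] if norm else mean
```

Mean pooling can blur a long issue whose chunks cover unrelated topics, so alternatives such as comparing only the first chunk or taking the maximum chunk-to-chunk similarity may be worth evaluating.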

## Extensions

- `lore recurring-patterns --open`: find clusters in open issues (likely duplicates to merge)
- `lore recurring-patterns --cross-project`: find patterns across repos
- Trend detection: check whether cluster sizes are growing over time (an escalating problem)
- Export as a report for engineering retrospectives