docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc

Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Taylor Eernisse
2026-02-09 10:16:48 -05:00
parent d54f669c5e
commit 4185abe05d
32 changed files with 4170 additions and 0 deletions

docs/ideas/README.md

@@ -0,0 +1,66 @@
# Gitlore Feature Ideas
Central registry of potential features. Each idea leverages data already ingested
into the local SQLite database (issues, MRs, discussions, notes, resource events,
entity references, embeddings, file changes).
## Priority Tiers
**Tier 1 — High confidence, low effort, immediate value:**
| # | Idea | File | Confidence |
|---|------|------|------------|
| 9 | Similar Issues Finder | [similar-issues.md](similar-issues.md) | 95% |
| 17 | "What Changed?" Digest | [digest.md](digest.md) | 93% |
| 5 | Who Knows About X? | [experts.md](experts.md) | 92% |
| -- | Multi-Project Ergonomics | [project-ergonomics.md](project-ergonomics.md) | 90% |
| 27 | Weekly Digest Generator | [weekly-digest.md](weekly-digest.md) | 90% |
| 4 | Stale Discussion Finder | [stale-discussions.md](stale-discussions.md) | 90% |
**Tier 2 — Strong ideas, moderate effort:**
| # | Idea | File | Confidence |
|---|------|------|------------|
| 19 | MR-to-Issue Closure Gap | [closure-gaps.md](closure-gaps.md) | 88% |
| 1 | Contributor Heatmap | [contributors.md](contributors.md) | 88% |
| 21 | Knowledge Silo Detection | [silos.md](silos.md) | 87% |
| 2 | Review Bottleneck Detector | [bottlenecks.md](bottlenecks.md) | 85% |
| 14 | File Hotspot Report | [hotspots.md](hotspots.md) | 85% |
| 26 | Unlinked MR Finder | [unlinked.md](unlinked.md) | 83% |
| 6 | Decision Archaeology | [decisions.md](decisions.md) | 82% |
| 18 | Label Hygiene Audit | [label-audit.md](label-audit.md) | 82% |
**Tier 3 — Promising, needs more design work:**
| # | Idea | File | Confidence |
|---|------|------|------------|
| 29 | Entity Relationship Explorer | [graph.md](graph.md) | 80% |
| 12 | Milestone Risk Report | [milestone-risk.md](milestone-risk.md) | 78% |
| 3 | Label Velocity | [label-flow.md](label-flow.md) | 78% |
| 24 | Recurring Bug Patterns | [recurring-patterns.md](recurring-patterns.md) | 76% |
| 7 | Cross-Project Impact Graph | [impact-graph.md](impact-graph.md) | 75% |
| 16 | Idle Work Detector | [idle.md](idle.md) | 73% |
| 8 | MR Churn Analysis | [churn.md](churn.md) | 72% |
| 15 | Author Collaboration Network | [collaboration.md](collaboration.md) | 70% |
| 28 | DiffNote Coverage Map | [review-coverage.md](review-coverage.md) | 75% |
| 25 | MR Pipeline Efficiency | [mr-pipeline.md](mr-pipeline.md) | 78% |
## Rejected Ideas (with reasons)
| # | Idea | Reason |
|---|------|--------|
| 10 | Sprint Burndown from Labels | Too opinionated about label semantics |
| 11 | Code Review Quality Score | Subjective "quality" scoring creates perverse incentives |
| 13 | Discussion Sentiment Drift | Unreliable heuristic sentiment on technical text |
| 20 | Response Time Leaderboard | Toxic "leaderboard" framing; metric folded into #2 |
| 22 | Timeline Diff | Niche use case; timeline already interleaves events |
| 23 | Discussion Thread Summarizer | Requires LLM inference; out of scope for local-first tool |
| 30 | NL Query Interface | Over-engineered; existing filters cover this |
## How to use this list
1. Pick an idea from Tier 1 or Tier 2
2. Read its detail file for implementation plan and SQL sketches
3. Create a bead (`br create`) referencing the idea file
4. Implement following TDD (test first, then minimal impl)
5. Update the idea file with `status: implemented` when done


@@ -0,0 +1,555 @@
# Project Manager System — Design Proposal
## The Problem
We have a growing backlog of ideas and issues in markdown files. Agents can ship
features in under an hour. The constraint isn't execution speed — it's knowing
WHAT to execute NEXT, in what ORDER, and detecting when the plan needs to change.
We need a system that:
1. Automatically scores and sequences work items
2. Detects when scope changes during spec generation
3. Tracks the full lifecycle: idea → spec → beads → shipped
4. Re-triages instantly when the dependency graph changes
5. Runs in seconds, not minutes
## Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ docs/ideas/*.md │
│ docs/issues/*.md │
│ (YAML frontmatter) │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ IDEA TRIAGE SKILL │
│ │
│ Phase 1: INGEST — parse all frontmatter │
│ Phase 2: VALIDATE — check refs, detect staleness │
│ Phase 3: EVALUATE — detect scope changes since last run │
│ Phase 4: SCORE — compute priority with unlock graph │
│ Phase 5: SEQUENCE — topological sort by dependency + score │
│ Phase 6: RECOMMEND — top 3 + unlock advisories + warnings │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ HUMAN DECIDES │
│ (picks from top 3, takes seconds) │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ SPEC GENERATION (Claude/GPT) │
│ Takes the idea doc, generates detailed implementation spec │
│ ALSO: re-evaluates frontmatter fields based on deeper │
│ understanding. Updates effort, blocked-by, components. │
│ This is the SCOPE CHANGE DETECTION point. │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ PLAN-TO-BEADS (existing skill) │
│ Spec → granular beads with dependencies via br CLI │
│ Links bead IDs back into the idea frontmatter │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ AGENT IMPLEMENTATION │
│ Works beads via br/bv workflow │
│ bv --robot-triage handles execution-phase prioritization │
└──────────────────────────┬──────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ COMPLETION & RE-TRIAGE │
│ Beads close → idea status updates to implemented │
│ Skill re-runs → newly unblocked ideas surface │
│ Loop back to top │
└─────────────────────────────────────────────────────────────┘
```
## The Two Systems and Their Boundary
| Concern | Ideas System (new) | Beads System (existing) |
|---------|-------------------|------------------------|
| Phase | Pre-commitment (what to build) | Execution (how to build) |
| Data | docs/ideas/*.md, docs/issues/*.md | .beads/issues.jsonl |
| Triage | Idea triage skill | bv --robot-triage |
| Tracking | YAML frontmatter | JSONL records |
| Granularity | Feature-level | Task-level |
| Lifecycle | proposed → specced → promoted | open → in_progress → closed |
**The handoff point is promotion.** An idea becomes one or more beads. After that,
the ideas system only tracks the idea's status (promoted/implemented). Beads owns
execution.
An idea file is NEVER deleted. It's a permanent design record. Even after
implementation, it documents WHY the feature was built and what tradeoffs were made.
---
## Data Model
### Frontmatter Schema
```yaml
---
# ── Identity ──
id: idea-009 # stable unique identifier
title: Similar Issues Finder
type: idea # idea | issue
status: proposed # see lifecycle below
# ── Timestamps ──
created: 2026-02-09
updated: 2026-02-09
eval-hash: null # SHA of scoring fields at last triage run
# ── Scoring Inputs ──
impact: high # high | medium | low
effort: small # small | medium | large | xlarge
severity: null # critical | high | medium | low (issues only)
autonomy: full # full | needs-design | needs-human
# ── Dependency Graph ──
blocked-by: [] # IDs of ideas/issues that must complete first
unlocks: # IDs that become possible/better after this ships
- idea-recurring-patterns
requires: [] # external prerequisites (gate names)
related: # soft links, not blocking
- issue-001
# ── Implementation Context ──
components: # source code paths this will touch
- src/search/
- src/embedding/
command: lore similar # proposed CLI command (null for issues)
has-spec: false # detailed spec has been generated
spec-path: null # path to spec doc if it exists
beads: [] # bead IDs after promotion
# ── Classification ──
tags:
- embeddings
- search
---
```
### Status Lifecycle
```
IDEA lifecycle:
proposed ──→ accepted ──→ specced ──→ promoted ──→ implemented
│ │
└──→ rejected └──→ (scope changed, back to accepted)
ISSUE lifecycle:
open ──→ accepted ──→ specced ──→ promoted ──→ resolved
└──→ wontfix
```
Transitions:
- `proposed → accepted`: Human confirms this is worth building
- `accepted → specced`: Detailed implementation spec has been generated
- `specced → promoted`: Beads created from the spec
- `promoted → implemented`: All beads closed
- Any → `rejected`/`wontfix`: Decided not to build (with reason in body)
- `specced → accepted`: Scope changed during spec, needs re-evaluation
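
The transition list above can be checked mechanically. The following is a minimal sketch, assuming the idea lifecycle only; the table name `IDEA_TRANSITIONS` and the helper `can_transition` are hypothetical, and treating `implemented` and `rejected` as terminal is an assumption ("Any → rejected" is read here as "any non-terminal state").

```python
# Hypothetical transition table for the IDEA lifecycle; not existing code.
IDEA_TRANSITIONS = {
    "proposed": {"accepted", "rejected"},
    "accepted": {"specced", "rejected"},
    "specced": {"promoted", "accepted", "rejected"},  # back to accepted on scope change
    "promoted": {"implemented", "rejected"},
    "implemented": set(),  # assumed terminal
    "rejected": set(),     # assumed terminal
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the lifecycle allows moving from `current` to `target`."""
    return target in IDEA_TRANSITIONS.get(current, set())


print(can_transition("specced", "accepted"))   # scope changed during spec
print(can_transition("proposed", "promoted"))  # skips acceptance and spec
```

A table like this could back the Phase 2 check that status values are consistent, although the skill itself is described as prompt-driven rather than scripted.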
### Effort Calibration (Agent-Executed)
| Level | Wall Clock | Autonomy | Example |
|-------|-----------|----------|---------|
| small | ~30 min | Agent ships end-to-end | stale-discussions, closure-gaps |
| medium | ~1 hour | Agent ships end-to-end | similar-issues, digest |
| large | 1-2 hours | May need one design decision | recurring-patterns, experts |
| xlarge | 2+ hours | Needs human architecture input | project groups |
### Gates Registry (docs/gates.yaml)
```yaml
gates:
gate-1:
title: Resource Events Ingestion
status: complete
completed: 2025-12-15
gate-2:
title: Cross-References & Entity Graph
status: complete
completed: 2026-01-10
gate-3:
title: Timeline Pipeline
status: complete
completed: 2026-01-25
gate-4:
title: MR File Changes Ingestion
status: partial
notes: Schema ready (migration 016), ingestion code exists but untested
tracks: mr_file_changes table population
gate-5:
title: Code Trace (file:line → commit → MR → issue)
status: not-started
blocked-by: gate-4
notes: Requires git log parsing + commit SHA matching
```
The skill reads this file to determine which `requires` entries are satisfied.
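
As a rough illustration of how `requires` entries might be resolved, here is a sketch that uses a plain dictionary in place of the parsed YAML. The helper name `unmet_requires` is hypothetical, and the rule that only `complete` satisfies a requirement (so `partial` does not) is an assumption based on the "is the gate complete?" wording in Phase 4.

```python
# Statuses copied from the registry above; the helper name is an assumption.
GATE_STATUS = {
    "gate-1": "complete",
    "gate-2": "complete",
    "gate-3": "complete",
    "gate-4": "partial",
    "gate-5": "not-started",
}


def unmet_requires(requires, gate_status):
    """Return the gates in `requires` that are missing or not yet complete."""
    return [gate for gate in requires if gate_status.get(gate) != "complete"]


print(unmet_requires(["gate-1", "gate-4"], GATE_STATUS))
```

An item with a non-empty result would be treated as blocked; a gate name that does not appear in the registry is also reported, which doubles as a typo check.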
---
## Scoring Algorithm
### Priority Score
```
For ideas:
base = impact_weight # high=3, medium=2, low=1
unlock = 1 + (0.5 × count_of_unlocks) # items this directly enables
readiness = 0 if blocked, 1 if ready
priority = base × unlock × readiness
For issues:
base = severity_weight × 1.5 # critical=6, high=4.5, medium=3, low=1.5
unlock = 1 + (0.5 × count_of_unlocks) # (bugs rarely unlock, but can)
readiness = 0 if blocked, 1 if ready
priority = base × unlock × readiness
Tiebreak (among equal priority):
1. Prefer smaller effort (ships faster, starts next cycle sooner)
2. Prefer autonomy:full over needs-design over needs-human
3. Prefer older items (FIFO within same score)
```
### Why This Works
- High-impact items that unlock other items float to the top
- Blocked items score 0 regardless of impact (can't be worked)
- Effort is a tiebreaker, not a primary factor (since execution is fast)
- Issues with severity get a 1.5× multiplier (bugs degrade existing value)
- Unlock multiplier captures the "do Gate 4 first" insight automatically
### Example Rankings
| Item | Impact | Unlocks | Readiness | Score |
|------|--------|---------|-----------|-------|
| project-ergonomics | high(3) | 10 | ready(1) | 3 × 6.0 = 18.0 |
| gate-4-completion | med(2) | 5 | ready(1) | 2 × 3.5 = 7.0 |
| similar-issues | high(3) | 1 | ready(1) | 3 × 1.5 = 4.5 |
| stale-discussions | high(3) | 0 | ready(1) | 3 × 1.0 = 3.0 |
| hotspots | high(3) | 1 | blocked(0) | 0.0 |
Project-ergonomics dominates because it unlocks 10 downstream items. This is the
correct recommendation — it's the highest-leverage work even though "stale-discussions"
is simpler.
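
The formula and the example table can be reproduced with a few lines of Python. This is a sketch only: the function name `priority` and the weight dictionaries are illustrative, and the severity weights (critical=4 down to low=1) are inferred by dividing the documented products by 1.5.

```python
# Illustrative sketch only: names and structure are assumptions, not existing code.
IMPACT_WEIGHT = {"high": 3, "medium": 2, "low": 1}
# Inferred from the documented products (critical=6, high=4.5, ...) divided by 1.5.
SEVERITY_WEIGHT = {"critical": 4, "high": 3, "medium": 2, "low": 1}


def priority(kind: str, level: str, unlock_count: int, blocked: bool) -> float:
    """Return base x unlock x readiness for an idea or an issue."""
    if kind == "issue":
        base = SEVERITY_WEIGHT[level] * 1.5
    else:
        base = float(IMPACT_WEIGHT[level])
    unlock = 1 + 0.5 * unlock_count
    readiness = 0 if blocked else 1
    return base * unlock * readiness


# Inputs taken from the example rankings table.
rankings = {
    "project-ergonomics": priority("idea", "high", 10, blocked=False),
    "gate-4-completion": priority("idea", "medium", 5, blocked=False),
    "similar-issues": priority("idea", "high", 1, blocked=False),
    "stale-discussions": priority("idea", "high", 0, blocked=False),
    "hotspots": priority("idea", "high", 1, blocked=True),
}
for name, score in rankings.items():
    print(f"{name:20s} {score:.1f}")
```

Effort does not appear in the function because the design uses it only as a tiebreaker.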
---
## Scope Change Detection
This is the hardest problem. An idea's scope can change in three ways:
### 1. During Spec Generation (Primary Detection Point)
When Claude/GPT generates a detailed implementation spec from an idea doc, it
understands the idea more deeply than the original sketch. The spec process should
be instructed to:
- Re-evaluate effort (now that implementation is understood in detail)
- Discover new dependencies (need to change schema first, need a new config option)
- Identify component changes (touches more modules than originally thought)
- Assess impact more accurately (this is actually higher/lower value than estimated)
**Mechanism:** The spec generation prompt includes an explicit "re-evaluate frontmatter"
step. The spec output includes an updated frontmatter block. If scoring-relevant
fields changed, the skill flags it:
```
SCOPE CHANGE DETECTED:
idea-009 (Similar Issues Finder)
- effort: small → medium (needs embedding aggregation strategy)
- blocked-by: [] → [gate-embeddings-populated]
- components: +src/cli/commands/similar.rs (new file)
Previous score: 4.5 → New score: 3.0
Recommendation: Still top-3, but sequencing may change.
```
### 2. During Implementation (Discovered Complexity)
An agent working on beads may discover the spec was wrong:
- "This requires a database migration I didn't anticipate"
- "This module doesn't expose the API I need"
**Mechanism:** When a bead is blocked or takes significantly longer than estimated,
the agent should update the idea's frontmatter. The skill detects the change on
next triage run via eval-hash comparison.
### 3. External Changes (Gate Completion, New Ideas)
When a gate completes or a new idea is added that changes the dependency graph:
- Gate 4 completes → 5 ideas become unblocked
- New idea added that's higher priority than current top-3
- Two ideas discovered to be duplicates
**Mechanism:** The skill detects these automatically by re-computing the full graph
on every run. The eval-hash tracks what the scoring fields looked like last time;
if they haven't changed but the SCORE changed (because a dependency was resolved),
the skill flags it as "newly unblocked."
### The eval-hash Field
```yaml
eval-hash: "a1b2c3d4" # SHA-256 of: impact + effort + blocked-by + unlocks + requires
```
Computed by hashing the concatenation of all scoring-relevant fields. When the skill
runs, it compares:
- If eval-hash matches AND score is same → no change, skip
- If eval-hash matches BUT score changed → external change (dependency resolved)
- If eval-hash differs → item was modified, re-evaluate
This avoids re-announcing unchanged items on every run.
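
One possible way to compute and compare the hash is sketched below. Several details are assumptions rather than part of the design: the fields are serialized as canonical JSON instead of a raw concatenation (so key order cannot change the digest), the digest is truncated to 8 hex characters to match the example value, and the names `eval_hash` and `classify` are hypothetical.

```python
import hashlib
import json

# Scoring-relevant fields, as listed in the eval-hash comment above.
SCORING_FIELDS = ("impact", "effort", "blocked-by", "unlocks", "requires")


def eval_hash(frontmatter: dict) -> str:
    """Hash the scoring fields; canonical JSON is an assumed serialization."""
    payload = json.dumps(
        {field: frontmatter.get(field) for field in SCORING_FIELDS},
        sort_keys=True,
    )
    # Truncating to 8 hex characters is an assumption based on the example value.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]


def classify(stored_hash, current_hash, old_score, new_score) -> str:
    """Apply the three comparison rules described above."""
    if stored_hash != current_hash:
        return "SCOPE_CHANGED"      # item was modified, re-evaluate
    if old_score != new_score:
        return "EXTERNAL_CHANGE"    # e.g. a dependency was resolved
    return "UNCHANGED"              # nothing to announce


before = {"impact": "high", "effort": "small", "blocked-by": [], "unlocks": ["idea-recurring-patterns"], "requires": []}
after = dict(before, effort="medium")
print(classify(eval_hash(before), eval_hash(after), 4.5, 4.5))
```

Fields outside the scoring set, such as `tags` or `updated`, do not affect the digest, so editing them would not trigger a scope-change flag.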
---
## Skill Design
### Location
`.claude/skills/idea-triage/SKILL.md` (project-local)
### Trigger Phrases
- "triage ideas" / "what should I build next?"
- "idea triage" / "prioritize ideas"
- "what's the highest value work?"
- `/idea-triage`
### Workflow Phases
**Phase 1: INGEST**
- Glob docs/ideas/*.md and docs/issues/*.md
- Parse YAML frontmatter from each file
- Read docs/gates.yaml for capability status
- Collect: id, title, type, status, impact, effort, severity, autonomy,
blocked-by, unlocks, requires, has-spec, beads, eval-hash
**Phase 2: VALIDATE**
- Required fields present (id, title, type, status, impact, effort)
- All blocked-by IDs reference existing files
- All unlocks IDs reference existing files
- All requires entries exist in gates.yaml
- No dependency cycles (blocked-by graph is a DAG)
- Status transitions are valid (no "proposed" with beads linked)
- Output: list of validation errors/warnings
**Phase 3: EVALUATE (Scope Change Detection)**
- For each item, compute current eval-hash from scoring fields
- Compare against stored eval-hash in frontmatter
- If different: flag as SCOPE_CHANGED with field-level diff
- If same but score changed (due to external dep resolution): flag as NEWLY_UNBLOCKED
- If status is specced but has-spec is false: flag as INCONSISTENT
**Phase 4: SCORE**
- Resolve requires against gates.yaml (is the gate complete?)
- Resolve blocked-by against other items (is the blocker done?)
- Compute readiness: 0 if any hard blocker is unresolved, 1 otherwise
- Compute unlock count: count items whose blocked-by includes this ID
- Apply scoring formula:
- Ideas: impact_weight × (1 + 0.5 × unlock_count) × readiness
- Issues: severity_weight × 1.5 × (1 + 0.5 × unlock_count) × readiness
- Apply tiebreak: effort_weight, autonomy, created date
**Phase 5: SEQUENCE**
- Separate into: actionable (score > 0) vs blocked (score = 0)
- Among actionable: sort by score descending with tiebreak
- Among blocked: sort by "what-if score" (score if blockers were resolved)
- Compute unlock advisories: "completing X unblocks Y items worth Z total score"
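
The ordering rules in Phase 5 map naturally onto a composite sort key. The sketch below assumes each item is a dictionary with precomputed `score` and `what_if` values and an ISO-formatted `created` date; those field names, the rank tables, and the choice to apply the same tiebreak to blocked items are all assumptions.

```python
# Lower rank sorts first; the tables mirror the tiebreak rules in the scoring section.
EFFORT_RANK = {"small": 0, "medium": 1, "large": 2, "xlarge": 3}
AUTONOMY_RANK = {"full": 0, "needs-design": 1, "needs-human": 2}


def sort_key(item, score_field):
    """Score descending, then smaller effort, more autonomy, older first."""
    return (
        -item[score_field],
        EFFORT_RANK[item["effort"]],
        AUTONOMY_RANK[item["autonomy"]],
        item["created"],  # ISO dates sort correctly as strings
    )


def sequence(items):
    """Split items into actionable and blocked lists, each in recommended order."""
    actionable = [i for i in items if i["score"] > 0]
    blocked = [i for i in items if i["score"] == 0]
    actionable.sort(key=lambda i: sort_key(i, "score"))
    blocked.sort(key=lambda i: sort_key(i, "what_if"))
    return actionable, blocked


# Made-up sample items for illustration.
items = [
    {"id": "a", "score": 3.0, "what_if": 3.0, "effort": "medium", "autonomy": "full", "created": "2026-02-01"},
    {"id": "b", "score": 3.0, "what_if": 3.0, "effort": "small", "autonomy": "full", "created": "2026-02-05"},
    {"id": "c", "score": 18.0, "what_if": 18.0, "effort": "medium", "autonomy": "full", "created": "2026-02-09"},
    {"id": "d", "score": 0.0, "what_if": 4.5, "effort": "small", "autonomy": "full", "created": "2026-02-09"},
    {"id": "e", "score": 0.0, "what_if": 3.0, "effort": "small", "autonomy": "full", "created": "2026-02-09"},
]
actionable, blocked = sequence(items)
print([i["id"] for i in actionable], [i["id"] for i in blocked])
```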
**Phase 6: RECOMMEND**
Output structured report:
```
== IDEA TRIAGE ==
Run: 2026-02-09T14:30:00Z
Items: 22 (18 proposed, 2 accepted, 1 specced, 1 implemented)
RECOMMENDED SEQUENCE:
1. [idea-project-ergonomics] Multi-Project Ergonomics
impact:high effort:medium autonomy:full score:18.0
WHY FIRST: Unlocks 10 downstream ideas. Highest leverage.
COMPONENTS: src/core/config.rs, src/core/project.rs, src/cli/
2. [idea-009] Similar Issues Finder
impact:high effort:small autonomy:full score:4.5
WHY NEXT: Highest standalone impact. Ships in ~30 min.
UNLOCKS: idea-recurring-patterns
3. [idea-004] Stale Discussion Finder
impact:high effort:small autonomy:full score:3.0
WHY NEXT: Quick win, no dependencies, immediate user value.
BLOCKED (would rank high if unblocked):
idea-014 File Hotspots score-if-unblocked:4.5 BLOCKED BY: gate-4
idea-021 Knowledge Silos score-if-unblocked:3.0 BLOCKED BY: gate-4
UNLOCK ADVISORY: Completing gate-4 unblocks 5 items (combined: 15.0)
SCOPE CHANGES DETECTED:
idea-009: effort changed small→medium (eval-hash mismatch)
idea-017: now has spec (has-spec flipped to true)
NEWLY UNBLOCKED:
(none this run)
WARNINGS:
idea-016: status=proposed, unchanged for 30+ days
idea-008: blocked-by references "idea-gate4" which doesn't exist (typo?)
HEALTH:
Proposed: 18 | Accepted: 2 | Specced: 1 | Promoted: 0 | Implemented: 1
Blocked: 6 | Actionable: 16
Backlog runway at ~5/day: ~3 days
```
### What the Skill Does NOT Do
- **Never modifies files.** Read-only triage. The agent or human updates frontmatter.
Exception: the skill CAN update eval-hash after a triage run (opt-in).
- **Never creates beads.** That's plan-to-beads skill territory.
- **Never replaces bv.** Once work is in beads, bv --robot-triage handles execution
prioritization. This skill owns pre-commitment only.
- **Never generates specs.** That's a separate step with Claude/GPT.
---
## Integration Points
### With Spec Generation
The spec generation prompt (separate from this skill) should include:
```
After generating the implementation spec, re-evaluate the idea's frontmatter:
1. Is the effort estimate still accurate? (small/medium/large/xlarge)
2. Did you discover new dependencies? (add to blocked-by)
3. Are there components not listed? (add to components)
4. Has the impact assessment changed?
5. Can an agent ship this autonomously? (autonomy: full/needs-design/needs-human)
Output an UPDATED frontmatter block at the end of the spec.
If any scoring field changed, explain what changed and why.
```
### With plan-to-beads
When promoting an idea to beads:
1. Run plan-to-beads on the spec
2. Capture the created bead IDs
3. Update the idea's frontmatter: status → promoted, beads → [bd-xxx, bd-yyy]
4. Run br sync --flush-only && git add .beads/
### With bv --robot-triage
These systems don't talk to each other directly. The boundary is:
- Idea triage skill → "build idea-009 next"
- Human/agent generates spec → plan-to-beads → beads created
- bv --robot-triage → "work on bd-xxx next"
- Beads close → human/agent updates idea frontmatter → idea triage re-runs
### With New Item Ingestion
When someone adds a new file to docs/ideas/ or docs/issues/:
- If it has valid frontmatter: picked up automatically on next triage run
- If it has no/invalid frontmatter: flagged in WARNINGS section
- Skill can suggest default frontmatter based on content analysis
---
## Failure Modes and Mitigations
### 1. Frontmatter Rot
**Risk:** Fields don't get updated. Status says "proposed" but it's actually shipped.
**Mitigation:** Cross-reference with beads. If an idea has beads and all beads are
closed, flag that the idea should be "implemented" even if frontmatter says otherwise.
The skill detects this inconsistency.
### 2. Score Gaming
**Risk:** Someone inflates impact or unlocks count to make their idea rank higher.
**Mitigation:** Unlocks are verified — the skill checks that the referenced items
actually have this idea in their blocked-by. Impact is subjective but reviewed during
spec generation (second opinion from a different model/session).
### 3. Stale Gates Registry
**Risk:** gate-4 is actually complete but gates.yaml wasn't updated.
**Mitigation:** Skill warns when a gate has been "partial" for a long time. Could
also probe the codebase (check if mr_file_changes ingestion code exists and has tests).
### 4. Circular Dependencies
**Risk:** A blocks B blocks A.
**Mitigation:** Phase 2 validation explicitly checks for cycles in the blocked-by
graph and reports them as errors.
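
A depth-first search with three node states is one common way to implement this check. The sketch below assumes the blocked-by graph has already been collected into a dictionary mapping each ID to the IDs it waits on; the function name `find_cycle` is hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(blocked_by: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of IDs, or None if the graph is a DAG."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in blocked_by}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in blocked_by.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so we found a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(blocked_by):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle({"idea-a": ["idea-b"], "idea-b": ["idea-a"]}))
print(find_cycle({"idea-a": ["idea-b"], "idea-b": []}))
```

Returning the cycle path rather than a boolean lets the validation report name the items involved.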
### 5. Unlock Count Inflation
**Risk:** An item claims to unlock 20 things, making it score astronomically.
**Mitigation:** Unlock count is VERIFIED by checking reverse blocked-by references.
If idea-X says it unlocks idea-Y, but idea-Y's blocked-by doesn't include idea-X,
the claim is discounted. Both explicit unlocks and reverse blocked-by contribute to
the count, but unverified claims are flagged.
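
The verification step can be sketched as follows. This takes one reading of the rule above, matching Phase 4: only reverse blocked-by references count toward the score, and any `unlocks` claim without a matching reverse reference is reported. The function name `verify_unlocks` and the input shape are assumptions.

```python
def verify_unlocks(items):
    """items maps id -> frontmatter dict with optional "unlocks" and "blocked-by" lists.

    Returns (verified_counts, unverified_claims).
    """
    verified_counts = {}
    unverified_claims = []
    for item_id, frontmatter in items.items():
        # Items that really wait on this one, according to their own blocked-by.
        dependents = {
            other_id
            for other_id, other in items.items()
            if item_id in other.get("blocked-by", [])
        }
        verified_counts[item_id] = len(dependents)
        for claimed in frontmatter.get("unlocks", []):
            if claimed not in dependents:
                unverified_claims.append((item_id, claimed))
    return verified_counts, unverified_claims


# Made-up sample: idea-x claims two unlocks, but only idea-y confirms it.
sample = {
    "idea-x": {"unlocks": ["idea-y", "idea-z"], "blocked-by": []},
    "idea-y": {"unlocks": [], "blocked-by": ["idea-x"]},
    "idea-z": {"unlocks": [], "blocked-by": []},
}
counts, unverified = verify_unlocks(sample)
print(counts, unverified)
```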
### 6. Scope Creep During Spec
**Risk:** Spec generation reveals the idea is actually 5× harder than estimated.
The score drops, but the human has already mentally committed.
**Mitigation:** The scope change detection makes this VISIBLE. The triage output
explicitly shows "effort changed small→xlarge, score dropped from 4.5 to 0.75."
Human can then decide: proceed anyway, or switch to a different top-3 pick.
### 7. Orphaned Ideas
**Risk:** Ideas get promoted to beads, beads get implemented, but the idea file
never gets updated. It sits in "promoted" forever.
**Mitigation:** Skill checks: for each idea with status=promoted, look up the
linked beads. If all beads are closed, flag: "idea-009 appears complete, update
status to implemented."
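
A minimal sketch of this check, assuming bead states have already been loaded into a dictionary keyed by bead ID. The function name `completion_flags`, the input shapes, and the bead IDs in the sample are hypothetical.

```python
def completion_flags(ideas, bead_status):
    """Return IDs of promoted ideas whose linked beads are all closed."""
    flagged = []
    for idea in ideas:
        beads = idea.get("beads", [])
        if idea.get("status") != "promoted" or not beads:
            continue  # only promoted ideas with linked beads are candidates
        if all(bead_status.get(bead) == "closed" for bead in beads):
            flagged.append(idea["id"])
    return flagged


# Made-up sample data for illustration.
ideas = [
    {"id": "idea-009", "status": "promoted", "beads": ["bd-1", "bd-2"]},
    {"id": "idea-004", "status": "promoted", "beads": ["bd-3"]},
    {"id": "idea-017", "status": "specced", "beads": []},
]
bead_status = {"bd-1": "closed", "bd-2": "closed", "bd-3": "in_progress"}
print(completion_flags(ideas, bead_status))
```

A bead that is missing from the status map is treated as not closed, so the idea stays unflagged until every linked bead can be confirmed.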
---
## Implementation Plan
### Step 1: Create the Frontmatter Schema (this doc → applied to all files)
- Define the exact YAML schema (above)
- Create docs/gates.yaml
- Apply frontmatter to all 22 existing files in docs/ideas/ and docs/issues/
### Step 2: Build the Skill
- Create .claude/skills/idea-triage/SKILL.md
- Implement all 6 phases in the skill prompt
- The skill uses Glob, Read, and text processing — no external scripts needed
(25 files is small enough for Claude to process directly)
### Step 3: Test the System
- Run the skill against current files
- Verify scoring matches manual expectations
- Check that project-ergonomics ranks #1 (it should, due to unlock count)
- Verify blocked items score 0
- Check validation catches intentional errors
### Step 4: Run One Full Cycle
- Pick the top recommendation
- Generate a spec (separate session)
- Verify scope change detection works (spec should update frontmatter)
- Promote to beads via plan-to-beads
- Implement
- Verify completion detection works
### Step 5: Iterate
- Run triage again after implementation
- Verify newly unblocked items surface
- Adjust scoring weights if rankings feel wrong
- Add new ideas as they emerge

docs/ideas/bottlenecks.md

@@ -0,0 +1,88 @@
# Review Bottleneck Detector
- **Command:** `lore bottlenecks [--since <date>]`
- **Confidence:** 85%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — join MRs with first review note, compute percentiles
## What
For MRs in a given time window, compute:
1. **Time to first review** — created_at to first non-author DiffNote
2. **Review cycles** — count of discussion resolution rounds
3. **Time to merge** — created_at to merged_at
Flag MRs above P90 thresholds as bottlenecks.
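
The P90 flagging step could look like the sketch below. It uses the nearest-rank percentile definition, which is an assumption (other definitions interpolate between values), and the sample hours are made up apart from the two MRs that also appear in the example output of this document.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_bottlenecks(hours_by_mr, pct=90):
    """Return the threshold and the MRs strictly above it."""
    threshold = percentile(hours_by_mr.values(), pct)
    flagged = [mr for mr, hours in hours_by_mr.items() if hours > threshold]
    return threshold, flagged


# Hours to first review per MR (illustrative values).
hours_to_first_review = {
    "!201": 1.0, "!205": 2.0, "!209": 3.0, "!212": 4.0, "!215": 5.0,
    "!219": 6.0, "!222": 8.0, "!225": 12.0, "!228": 48.0, "!234": 72.0,
}
threshold, flagged = flag_bottlenecks(hours_to_first_review)
print(threshold, flagged)
```

MRs with no review yet have no value to rank, so they would need separate handling, for example by listing them first as the SQL sketch in this document does with `NULLS FIRST`.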
## Why
Slow reviews are a major drag on developer productivity. Making them visible
and measurable is the first step toward fixing them, and the resulting numbers
give process retrospectives concrete data to work from.
## Data Required
All exists today:
- `merge_requests` (created_at, merged_at, author_username)
- `notes` (note_type='DiffNote', author_username, created_at)
- `discussions` (resolved, resolvable)
## Implementation Sketch
```sql
-- Time to first review per MR
SELECT
mr.id,
mr.iid,
mr.title,
mr.author_username,
mr.created_at,
mr.merged_at,
p.path_with_namespace,
MIN(n.created_at) as first_review_at,
(MIN(n.created_at) - mr.created_at) / 3600000.0 as hours_to_first_review,
(mr.merged_at - mr.created_at) / 3600000.0 as hours_to_merge
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN discussions d ON d.merge_request_id = mr.id
LEFT JOIN notes n ON n.discussion_id = d.id
AND n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username != mr.author_username
WHERE mr.created_at >= ?1
AND mr.state IN ('merged', 'opened')
GROUP BY mr.id
ORDER BY hours_to_first_review DESC NULLS FIRST;
```
## Human Output
```
Review Bottlenecks (last 30 days)
P50 time to first review: 4.2h
P90 time to first review: 28.1h
P50 time to merge: 2.1d
P90 time to merge: 8.3d
Slowest to review:
!234 Refactor auth 72h to first review (alice, still open)
!228 Database migration 48h to first review (bob, merged in 5d)
Most review cycles:
!234 Refactor auth 8 discussion threads, 4 resolved
!225 API versioning 6 discussion threads, 6 resolved
```
## Downsides
- Doesn't capture review done outside GitLab (Slack, in-person)
- DiffNote timestamp != when reviewer started reading
- Large MRs naturally take longer; no size normalization
## Extensions
- `lore bottlenecks --reviewer alice` — how fast does alice review?
- Per-project comparison: which project has the fastest review cycle?
- Trend line: is review speed improving or degrading over time?

docs/ideas/churn.md

@@ -0,0 +1,77 @@
# MR Churn Analysis
- **Command:** `lore churn [--since <date>]`
- **Confidence:** 72%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — multi-table aggregation with composite scoring
## What
For merged MRs, compute a "contentiousness score" based on: number of review
discussions, number of DiffNotes, resolution cycles, file count. Flag high-churn
MRs as candidates for architectural review.
## Why
High-churn MRs often indicate architectural disagreements, unclear requirements,
or code that's hard to review. Surfacing them post-merge enables retrospectives
and identifies areas that need better design upfront.
## Data Required
All exists today:
- `merge_requests` (state='merged')
- `discussions` (merge_request_id, resolved, resolvable)
- `notes` (note_type='DiffNote', discussion_id)
- `mr_file_changes` (file count per MR)
## Implementation Sketch
```sql
SELECT
mr.iid,
mr.title,
mr.author_username,
p.path_with_namespace,
COUNT(DISTINCT d.id) as discussion_count,
COUNT(DISTINCT CASE WHEN n.note_type = 'DiffNote' THEN n.id END) as diffnote_count,
COUNT(DISTINCT CASE WHEN d.resolvable = 1 AND d.resolved = 1 THEN d.id END) as resolved_threads,
COUNT(DISTINCT mfc.id) as files_changed,
-- Composite score: unnormalized weighted sum (discussions count double; all non-system notes count once)
(COUNT(DISTINCT d.id) * 2 + COUNT(DISTINCT n.id) + COUNT(DISTINCT mfc.id)) as churn_score
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN discussions d ON d.merge_request_id = mr.id AND d.noteable_type = 'MergeRequest'
LEFT JOIN notes n ON n.discussion_id = d.id AND n.is_system = 0
LEFT JOIN mr_file_changes mfc ON mfc.merge_request_id = mr.id
WHERE mr.state = 'merged'
AND mr.merged_at >= ?1
GROUP BY mr.id
ORDER BY churn_score DESC
LIMIT ?2;
```
## Human Output
```
High-Churn MRs (last 90 days)
MR Discussions DiffNotes Files Score Title
!234 12 28 8 60 Refactor auth middleware
!225 8 19 5 40 API versioning v2
!218 6 15 12 39 Database schema migration
!210 5 8 3 21 Update logging framework
```
## Downsides
- High discussion count could mean thorough review, not contention
- Composite scoring weights are arbitrary; needs calibration per team
- Large MRs naturally score higher regardless of contention
## Extensions
- Normalize by file count (discussions per file changed)
- Compare against team averages (flag outliers, not absolute values)
- `lore churn --author alice` — which of alice's MRs generate the most discussion?


@@ -0,0 +1,73 @@
# MR-to-Issue Closure Gap
- **Command:** `lore closure-gaps`
- **Confidence:** 88%
- **Tier:** 2
- **Status:** proposed
- **Effort:** small — single join query
## What
Find entity_references where reference_type='closes' AND the target issue is still
open AND the source MR is merged. These represent broken auto-close links where a
merge should have closed an issue but didn't.
## Why
Simple, definitive, actionable. If a merged MR says "closes #42" but #42 is still
open, something is wrong. Either auto-close failed (wrong target branch), the
reference was incorrect, or the issue needs manual attention.
## Data Required
All exists today:
- `entity_references` (reference_type='closes')
- `merge_requests` (state='merged')
- `issues` (state='opened')
## Implementation Sketch
```sql
SELECT
mr.iid as mr_iid,
mr.title as mr_title,
mr.merged_at,
mr.target_branch,
i.iid as issue_iid,
i.title as issue_title,
i.state as issue_state,
p.path_with_namespace
FROM entity_references er
JOIN merge_requests mr ON er.source_entity_type = 'merge_request'
AND er.source_entity_id = mr.id
JOIN issues i ON er.target_entity_type = 'issue'
AND er.target_entity_id = i.id
JOIN projects p ON er.project_id = p.id
WHERE er.reference_type = 'closes'
AND mr.state = 'merged'
AND i.state = 'opened';
```
## Human Output
```
Closure Gaps — merged MRs that didn't close their referenced issues
group/backend !234 merged 3d ago → #42 still OPEN
"Refactor auth middleware" should have closed "Login timeout bug"
Target branch: develop (default: main) — possible branch mismatch
group/frontend !45 merged 1w ago → #38 still OPEN
"Update dashboard" should have closed "Dashboard layout broken"
```
## Downsides
- Could be intentional (MR merged to a non-default branch, issue tracked across branches)
- Cross-project references may not be resolvable if target project not synced
- GitLab auto-close only works when merging to default branch
## Extensions
- Flag likely cause: branch mismatch (target_branch != project.default_branch)
- `lore closure-gaps --auto-close` — actually close the issues via API (dangerous, needs confirmation)

docs/ideas/collaboration.md

@@ -0,0 +1,101 @@
# Author Collaboration Network
- **Command:** `lore collaboration [--since <date>]`
- **Confidence:** 70%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — self-join on notes, graph construction
## What
Build a weighted graph of author pairs: (author_A, author_B, weight) where weight =
number of times A reviewed B's MR + B reviewed A's MR + they both commented on the
same entity.
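
The edge weight described above can be assembled from the two query results in the implementation sketch of this document. The following is a sketch under assumptions: the function name `build_edges` is hypothetical, review counts and shared-entity counts are summed with equal weight, and the sample rows are made up.

```python
from collections import Counter


def build_edges(reviews, co_participation):
    """Combine directed review counts and shared-entity counts into undirected weights.

    reviews: iterable of (author, reviewer, review_count)
    co_participation: iterable of (person_a, person_b, shared_entities)
    """
    weights = Counter()
    for author, reviewer, count in reviews:
        # A reviewing B and B reviewing A land on the same undirected edge.
        weights[tuple(sorted((author, reviewer)))] += count
    for person_a, person_b, shared in co_participation:
        weights[tuple(sorted((person_a, person_b)))] += shared
    return weights


# Made-up sample rows in the shape the two SQL queries would return.
reviews = [("bob", "alice", 10), ("alice", "bob", 5), ("charlie", "bob", 3)]
co_participation = [("alice", "bob", 8), ("bob", "charlie", 12)]
edges = build_edges(reviews, co_participation)
print(edges.most_common())
```

Authors who appear in neither input produce no edge at all, so finding isolated members would require comparing the edge set against a separate list of known authors.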
## Why
Reveals team structure empirically. Shows who collaborates across team boundaries
and where knowledge transfer happens. Useful for re-orgs, onboarding planning,
and identifying isolated team members.
## Data Required
All exists today:
- `merge_requests` (author_username)
- `notes` (author_username, note_type='DiffNote')
- `discussions` (for co-participation)
## Implementation Sketch
```sql
-- Review relationships: who reviews whose MRs
SELECT
mr.author_username as author,
n.author_username as reviewer,
COUNT(*) as review_count
FROM merge_requests mr
JOIN discussions d ON d.merge_request_id = mr.id
JOIN notes n ON n.discussion_id = d.id
WHERE n.note_type = 'DiffNote'
AND n.is_system = 0
AND n.author_username != mr.author_username
AND mr.created_at >= ?1
GROUP BY mr.author_username, n.author_username;
-- Co-participation: who comments on the same entities
WITH entity_participants AS (
SELECT
COALESCE(d.issue_id, d.merge_request_id) as entity_id,
d.noteable_type,
n.author_username
FROM discussions d
JOIN notes n ON n.discussion_id = d.id
WHERE n.is_system = 0
AND n.created_at >= ?1
)
SELECT
a.author_username as person_a,
b.author_username as person_b,
COUNT(DISTINCT a.entity_id) as shared_entities
FROM entity_participants a
JOIN entity_participants b
ON a.entity_id = b.entity_id
AND a.noteable_type = b.noteable_type
AND a.author_username < b.author_username -- avoid duplicates
GROUP BY a.author_username, b.author_username;
```
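The two result sets above still have to be folded into one undirected edge per author pair, as the weight definition in "What" describes. The following is a minimal sketch of that merge step, assuming the rows have already been read into tuples; the `Edge` struct and the `key` and `build_edges` helpers are hypothetical names, not existing gitlore code.

```rust
use std::collections::BTreeMap;

/// Undirected edge between two authors (hypothetical shape for this sketch).
#[derive(Debug, Default, PartialEq)]
struct Edge {
    reviews: u32,
    co_participated: u32,
}

/// Order the two usernames so (alice, bob) and (bob, alice) share one key.
fn key(a: &str, b: &str) -> (String, String) {
    if a <= b {
        (a.to_string(), b.to_string())
    } else {
        (b.to_string(), a.to_string())
    }
}

/// `reviews` rows are (author, reviewer, review_count) from the first query;
/// `shared` rows are (person_a, person_b, shared_entities) from the second.
fn build_edges(
    reviews: &[(&str, &str, u32)],
    shared: &[(&str, &str, u32)],
) -> BTreeMap<(String, String), Edge> {
    let mut edges: BTreeMap<(String, String), Edge> = BTreeMap::new();
    for &(author, reviewer, n) in reviews {
        // A reviewing B and B reviewing A accumulate on the same edge.
        edges.entry(key(author, reviewer)).or_default().reviews += n;
    }
    for &(a, b, n) in shared {
        edges.entry(key(a, b)).or_default().co_participated += n;
    }
    edges
}
```

A `BTreeMap` keeps the edges in a stable order, which makes the JSON and human output deterministic. Thresholds for labels such as "strong" or "weak" are left out because the document does not define them.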
## Output Formats
### JSON (for further analysis)
```json
{
"nodes": ["alice", "bob", "charlie"],
"edges": [
{ "source": "alice", "target": "bob", "reviews": 15, "co_participated": 8 },
{ "source": "bob", "target": "charlie", "reviews": 3, "co_participated": 12 }
]
}
```
### Human
```
Collaboration Network (last 90 days)
alice <-> bob 15 reviews, 8 shared discussions [strong]
bob <-> charlie 3 reviews, 12 shared discussions [moderate]
alice <-> charlie 1 review, 2 shared discussions [weak]
dave <-> (none) 0 reviews, 0 shared discussions [isolated]
```
## Downsides
- Interpretation requires context; high collaboration might mean dependency
- Doesn't capture collaboration outside GitLab
- Self-join can be slow with many notes
## Extensions
- `lore collaboration --format dot` — GraphViz network diagram
- `lore collaboration --isolated` — find team members with no collaboration edges
- Team boundary detection via graph clustering algorithms

View File

@@ -0,0 +1,86 @@
# Contributor Heatmap
- **Command:** `lore contributors [--since <date>]`
- **Confidence:** 88%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — multiple aggregation queries
## What
Rank team members by activity across configurable time windows (7d, 30d, 90d). Shows
issues authored, MRs authored, MRs merged, review comments made, discussions
participated in.
## Why
Team leads constantly ask "who's been active?" or "who's contributing to reviews?"
This answers it from local data without GitLab Premium analytics. Also useful for
identifying team members who may be overloaded or disengaged.
## Data Required
All exists today:
- `issues` (author_username, created_at)
- `merge_requests` (author_username, created_at, merged_at)
- `notes` (author_username, created_at, note_type, is_system)
- `discussions` (for participation counting)
## Implementation Sketch
```sql
-- Combined activity per author
WITH activity AS (
SELECT author_username, 'issue_authored' as activity_type, created_at
FROM issues WHERE created_at >= ?1
UNION ALL
SELECT author_username, 'mr_authored', created_at
FROM merge_requests WHERE created_at >= ?1
UNION ALL
SELECT author_username, 'mr_merged', merged_at
FROM merge_requests WHERE merged_at >= ?1 AND state = 'merged'
UNION ALL
SELECT author_username, 'review_comment', created_at
FROM notes WHERE created_at >= ?1 AND note_type = 'DiffNote' AND is_system = 0
UNION ALL
SELECT author_username, 'discussion_comment', created_at
FROM notes WHERE created_at >= ?1 AND note_type != 'DiffNote' AND is_system = 0
)
SELECT
author_username,
COUNT(*) FILTER (WHERE activity_type = 'issue_authored') as issues,
COUNT(*) FILTER (WHERE activity_type = 'mr_authored') as mrs_authored,
COUNT(*) FILTER (WHERE activity_type = 'mr_merged') as mrs_merged,
COUNT(*) FILTER (WHERE activity_type = 'review_comment') as reviews,
COUNT(*) FILTER (WHERE activity_type = 'discussion_comment') as comments,
COUNT(*) as total
FROM activity
GROUP BY author_username
ORDER BY total DESC;
```
Note: SQLite supports the `FILTER` clause on aggregate functions only from version 3.30.0; on older versions, use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` instead.
## Human Output
```
Contributors (last 30 days)
Username Issues MRs Merged Reviews Comments Total
alice 3 8 7 23 12 53
bob 1 5 4 31 8 49
charlie 5 3 2 4 15 29
dave 0 1 0 2 3 6
```
## Downsides
- Could be used for surveillance; frame as team health, not individual tracking
- Activity volume != productivity (one thoughtful review > ten "LGTM"s)
- Doesn't capture work done outside GitLab
## Extensions
- `lore contributors --project group/backend` — scoped to project
- `lore contributors --type reviews` — focus on review activity only
- Trend comparison: `--compare 30d,90d` shows velocity changes

94
docs/ideas/decisions.md Normal file

@@ -0,0 +1,94 @@
# Decision Archaeology
- **Command:** `lore decisions <query>`
- **Confidence:** 82%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — search pipeline + regex pattern matching on notes
## What
Search for discussion notes that contain decision-making language. Use the existing
search pipeline but boost notes containing patterns like "decided", "agreed",
"will go with", "tradeoff", "because we", "rationale", "the approach is", "we chose".
Return the surrounding discussion context.
## Why
This is gitlore's unique value proposition — "why was this decision made?" is the
question that no other tool answers well. Architecture Decision Records are rarely
maintained; the real decisions live in discussion threads. This mines them.
## Data Required
All exists today:
- `documents` + search pipeline (for finding relevant entities)
- `notes` (body text for pattern matching)
- `discussions` (for thread context)
## Implementation Sketch
```
1. Run existing hybrid search to find entities matching the query topic
2. For each result entity, query all discussion notes
3. Score each note against decision-language patterns:
- Strong signals (weight 3): "decided to", "agreed on", "the decision is",
"we will go with", "approved approach"
- Medium signals (weight 2): "tradeoff", "because", "rationale", "chosen",
"opted for", "rejected", "alternative"
- Weak signals (weight 1): "should we", "proposal", "option A", "option B",
"pros and cons"
4. Return notes scoring above threshold, with surrounding context (previous and
next note in discussion thread)
5. Sort by: search relevance * decision score
```
### Decision Patterns (regex)
```rust
const STRONG_PATTERNS: &[&str] = &[
r"(?i)\b(decided|agreed|approved)\s+(to|on|that)\b",
r"(?i)\bthe\s+(decision|approach|plan)\s+is\b",
r"(?i)\bwe('ll| will| are going to)\s+(go with|use|implement)\b",
r"(?i)\blet'?s\s+(go with|use|do)\b",
];
const MEDIUM_PATTERNS: &[&str] = &[
r"(?i)\b(tradeoff|trade-off|rationale|because we|opted for)\b",
r"(?i)\b(rejected|ruled out|won't work|not viable)\b",
r"(?i)\b(chosen|selected|picked)\b.{0,20}\b(over|instead of)\b",
];
```
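Step 3 of the implementation sketch (weighted scoring) could look roughly like the following. This is a dependency-free sketch that swaps the regexes for case-insensitive substring matching, so it is looser than the patterns above; the `decision_score` function name is an assumption, while the phrase lists and the 3 / 2 / 1 weights come from the sketch.

```rust
/// Phrase lists from the implementation sketch, stored lowercase.
const STRONG: &[&str] = &[
    "decided to", "agreed on", "the decision is", "we will go with", "approved approach",
];
const MEDIUM: &[&str] = &[
    "tradeoff", "because", "rationale", "chosen", "opted for", "rejected", "alternative",
];
const WEAK: &[&str] = &["should we", "proposal", "option a", "option b", "pros and cons"];

/// Sum the weight of every phrase that occurs at least once in the note body.
/// Substring matching is a simplification: it ignores word boundaries.
fn decision_score(note_body: &str) -> u32 {
    let body = note_body.to_lowercase();
    let tiers: [(&[&str], u32); 3] = [(STRONG, 3), (MEDIUM, 2), (WEAK, 1)];
    tiers
        .iter()
        .map(|(phrases, weight)| {
            let hits = phrases.iter().filter(|p| body.contains(**p)).count() as u32;
            hits * weight
        })
        .sum()
}
```

Each phrase counts once per note regardless of how often it repeats, which keeps one long note from outscoring several concise ones. A real implementation would likely use the regex patterns to avoid matches inside unrelated words.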
## Human Output
```
Decisions related to "authentication"
group/backend !234 — "Refactor auth middleware"
Discussion #a1b2c3 (alice, 3w ago):
"We decided to use JWT with short-lived tokens instead of session cookies.
The tradeoff is more complexity in the refresh flow, but we get stateless
auth which scales better."
Decision confidence: HIGH (3 strong pattern matches)
group/backend #42 — "Auth architecture review"
Discussion #d4e5f6 (bob, 2mo ago):
"After discussing with the security team, we'll go with bcrypt for password
hashing. Argon2 is theoretically better but bcrypt has wider library support."
Decision confidence: HIGH (2 strong pattern matches)
```
## Downsides
- Pattern matching is imperfect; may miss decisions phrased differently
- May surface "discussion about deciding" rather than actual decisions
- Non-English discussions won't match
- Requires good search results as input (garbage in, garbage out)
## Extensions
- `lore decisions --recent` — decisions made in last 30 days
- `lore decisions --author alice` — decisions made by specific person
- Export as ADR (Architecture Decision Record) format
- Combine with timeline for chronological decision history

131
docs/ideas/digest.md Normal file

@@ -0,0 +1,131 @@
# "What Changed?" Digest
- **Command:** `lore digest --since <date>`
- **Confidence:** 93%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — multiple queries across event tables, formatting logic
## What
Generate a structured summary of all activity since a given date: issues
opened/closed, MRs merged, labels changed, milestones updated, key discussions.
Group by project and sort by significance (state changes > merges > label changes >
new comments).
Default `--since` is 1 day (last 24 hours). Supports `7d`, `2w`, `YYYY-MM-DD`.
## Why
"What happened while I was on PTO?" is the most universal developer question. This
is a killer feature that leverages ALL the event data gitlore has ingested. No other
local tool provides this.
## Data Required
All exists today:
- `resource_state_events` (opened/closed/merged/reopened)
- `resource_label_events` (label add/remove)
- `resource_milestone_events` (milestone add/remove)
- `merge_requests` (merged_at for merge events)
- `issues` (created_at for new issues)
- `discussions` (last_note_at for active discussions)
## Implementation Sketch
```
1. Parse --since into ms epoch timestamp
2. Query each event table WHERE created_at >= since
3. Query new issues WHERE created_at >= since
4. Query merged MRs WHERE merged_at >= since
5. Query active discussions WHERE last_note_at >= since
6. Group all events by project
7. Within each project, sort by: state changes first, then merges, then labels
8. Format as human-readable sections or robot JSON
```
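Step 1 above (parsing `--since` into a millisecond epoch timestamp) might be sketched as follows for the relative forms `7d` and `2w`. The function name `since_to_epoch_ms` is hypothetical, and absolute `YYYY-MM-DD` dates are deliberately left out of this sketch.

```rust
const MS_PER_DAY: i64 = 24 * 60 * 60 * 1000;

/// Convert a relative `--since` value such as "7d" or "2w" into an epoch
/// timestamp in milliseconds, counted back from `now_ms`.
/// Returns None for anything else, including `YYYY-MM-DD` dates.
fn since_to_epoch_ms(spec: &str, now_ms: i64) -> Option<i64> {
    let spec = spec.trim();
    let unit = spec.chars().last()?;
    let days_per_unit: i64 = match unit {
        'd' => 1,
        'w' => 7,
        _ => return None,
    };
    // The unit is a single ASCII byte, so slicing off one byte is safe here.
    let count: i64 = spec[..spec.len() - 1].parse().ok()?;
    if count < 0 {
        return None;
    }
    let span = count.checked_mul(days_per_unit)?.checked_mul(MS_PER_DAY)?;
    now_ms.checked_sub(span)
}
```

Passing `now_ms` in as a parameter, rather than reading the clock inside the function, keeps the parser deterministic and easy to test.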
### SQL Queries
```sql
-- State changes in window
SELECT rse.*, i.iid as issue_iid, mr.iid as mr_iid,
COALESCE(i.title, mr.title) as title,
p.path_with_namespace
FROM resource_state_events rse
LEFT JOIN issues i ON rse.issue_id = i.id
LEFT JOIN merge_requests mr ON rse.merge_request_id = mr.id
JOIN projects p ON rse.project_id = p.id
WHERE rse.created_at >= ?1
ORDER BY rse.created_at DESC;
-- Newly merged MRs
SELECT mr.iid, mr.title, mr.author_username, mr.merged_at,
p.path_with_namespace
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
WHERE mr.merged_at >= ?1
ORDER BY mr.merged_at DESC;
-- New issues
SELECT i.iid, i.title, i.author_username, i.created_at,
p.path_with_namespace
FROM issues i
JOIN projects p ON i.project_id = p.id
WHERE i.created_at >= ?1
ORDER BY i.created_at DESC;
```
## Human Output Format
```
=== What Changed (last 7 days) ===
group/backend (12 events)
Merged:
!234 Refactor auth middleware (alice, 2d ago)
!231 Fix connection pool leak (bob, 5d ago)
Closed:
#89 Login timeout on slow networks (closed by alice, 3d ago)
Opened:
#95 Rate limiting returns 500 (charlie, 1d ago)
Labels:
#90 +priority::high (dave, 4d ago)
group/frontend (3 events)
Merged:
!45 Update dashboard layout (eve, 6d ago)
```
## Robot Mode Output
```json
{
"ok": true,
"data": {
"since": "2025-01-20T00:00:00Z",
"projects": [
{
"path": "group/backend",
"merged": [ { "iid": 234, "title": "...", "author": "alice" } ],
"closed": [ { "iid": 89, "title": "...", "actor": "alice" } ],
"opened": [ { "iid": 95, "title": "...", "author": "charlie" } ],
"label_changes": [ { "iid": 90, "label": "priority::high", "action": "add" } ]
}
],
"summary": { "total_events": 15, "projects_active": 2 }
}
}
```
## Downsides
- Can be overwhelming for very active repos; needs `--limit` per category
- Doesn't capture nuance (a 200-comment MR merge is more significant than a typo fix)
- Only shows what gitlore has synced; stale data = stale digest
## Extensions
- `lore digest --author alice` — personal activity digest
- `lore digest --project group/backend` — single project scope
- `lore digest --format markdown` — paste-ready for Slack/email
- Combine with weekly-digest for scheduled summaries

120
docs/ideas/experts.md Normal file

@@ -0,0 +1,120 @@
# Who Knows About X?
- **Command:** `lore experts <path-or-topic>`
- **Confidence:** 92%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — two query paths (file-based, topic-based)
## What
Given a file path, find people who have authored MRs touching that file, left
DiffNotes on that file, or discussed issues referencing that file. Given a topic
string, use search to find relevant entities then extract the active participants.
## Why
"Who should I ask about the auth module?" is one of the most common questions in
large teams. This answers it empirically from actual contribution and review data.
No guessing, no out-of-date wiki pages.
## Data Required
All exists today:
- `mr_file_changes` (new_path, merge_request_id) — who changed the file
- `notes` (position_new_path, author_username) — who reviewed the file
- `merge_requests` (author_username) — MR authorship
- `documents` + search pipeline — for topic-based queries
- `discussions` + `notes` — for participant extraction
## Implementation Sketch
### Path Mode: `lore experts src/auth/`
```
1. Query mr_file_changes WHERE new_path LIKE 'src/auth/%'
2. Join merge_requests to get author_username for each MR
3. Query notes WHERE position_new_path LIKE 'src/auth/%'
4. Collect all usernames with activity counts
5. Rank by: MR authorship (weight 3) + DiffNote authorship (weight 2) + discussion participation (weight 1)
6. Apply recency decay (recent activity weighted higher)
```
### Topic Mode: `lore experts "authentication timeout"`
```
1. Run existing hybrid search for the topic
2. Collect top N document results
3. For each document, extract author_username
4. For each document's entity, query discussions and collect note authors
5. Rank by frequency and recency
```
### SQL (Path Mode)
```sql
-- Authors who changed files matching pattern
SELECT mr.author_username, COUNT(*) as changes, MAX(mr.merged_at) as last_active
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
WHERE mfc.new_path LIKE ?1
AND mr.state = 'merged'
GROUP BY mr.author_username
ORDER BY changes DESC;
-- Reviewers who commented on files matching pattern
SELECT n.author_username, COUNT(*) as reviews, MAX(n.created_at) as last_active
FROM notes n
WHERE n.position_new_path LIKE ?1
AND n.note_type = 'DiffNote'
AND n.is_system = 0
GROUP BY n.author_username
ORDER BY reviews DESC;
```
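The ranking and recency-decay steps of path mode (steps 5 and 6) could combine the query results along these lines. This is a sketch only: the 3 / 2 / 1 weights come from the steps above, but the exponential half-life decay, the `Activity` struct, and the `expert_score` function are assumptions made for illustration.

```rust
/// Activity counts for one user on the queried path (hypothetical shape).
struct Activity {
    mr_changes: u32,
    diff_notes: u32,
    discussions: u32,
    /// Days since the user's most recent activity on the path.
    days_since_last_active: f64,
}

/// Weighted activity (3 / 2 / 1) scaled by a recency factor that halves
/// every `half_life_days`. Assumes `half_life_days` is positive.
fn expert_score(a: &Activity, half_life_days: f64) -> f64 {
    let base = 3.0 * a.mr_changes as f64 + 2.0 * a.diff_notes as f64 + a.discussions as f64;
    let decay = 0.5_f64.powf(a.days_since_last_active.max(0.0) / half_life_days);
    base * decay
}
```

This version decays the whole score by the most recent activity date. Decaying each contribution by its own age would be more precise but needs per-event timestamps rather than the aggregated counts the queries return.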
## Human Output Format
```
Experts for: src/auth/
alice 12 changes, 8 reviews (last active 3d ago) [top contributor]
bob 3 changes, 15 reviews (last active 1d ago) [top reviewer]
charlie 5 changes, 2 reviews (last active 2w ago)
dave 1 change, 0 reviews (last active 3mo ago) [stale]
```
## Robot Mode Output
```json
{
"ok": true,
"data": {
"query": "src/auth/",
"query_type": "path",
"experts": [
{
"username": "alice",
"changes": 12,
"reviews": 8,
"discussions": 3,
"score": 62,
"last_active": "2025-01-25T10:00:00Z",
"role": "top_contributor"
}
]
}
}
```
## Downsides
- Historical data may be stale (people leave teams, change roles)
- Path mode requires `mr_file_changes` to be populated (Gate 4 ingestion)
- Topic mode quality depends on search quality
- Doesn't account for org chart / actual ownership
## Extensions
- `lore experts --since 90d` — recency filter
- `lore experts --min-activity 3` — noise filter
- Combine with `lore silos` to highlight when an expert is the ONLY expert

75
docs/ideas/graph.md Normal file

@@ -0,0 +1,75 @@
# Entity Relationship Explorer
- **Command:** `lore graph <entity-type> <iid>`
- **Confidence:** 80%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — BFS traversal (similar to timeline expand), output formatting
## What
Given an issue or MR, traverse `entity_references` and display all connected
entities with relationship types and depths. Output as tree, JSON, or Mermaid diagram.
## Why
The entity_references graph is already built (Gate 2) but has no dedicated
exploration command. Timeline shows events over time; this shows the relationship
structure. "What's connected to this issue?" is a different question from "what
happened to this issue?"
## Data Required
All exists today:
- `entity_references` (source/target entity, reference_type)
- `issues` / `merge_requests` (for entity context)
- Timeline expand stage already implements BFS over this graph
## Implementation Sketch
```
1. Resolve entity type + iid to local ID
2. BFS over entity_references:
- Follow source→target AND target→source (bidirectional)
- Track depth (--depth flag, default 2)
- Track reference_type for edge labels
3. Hydrate each discovered entity with title, state, URL
4. Format as tree / JSON / Mermaid
```
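The traversal in step 2 can be sketched as a depth-limited breadth-first search over an in-memory edge list. The `Node` alias and the `connected` function are hypothetical; a real implementation would presumably reuse the timeline expand stage and query `entity_references` incrementally rather than load every edge.

```rust
use std::collections::{HashMap, VecDeque};

/// A node is (entity_type, local_id), e.g. ("issue", 42).
type Node = (&'static str, i64);

/// Breadth-first traversal that follows each reference in both directions.
/// Returns every reachable node with its depth, up to `max_depth`.
fn connected(edges: &[(Node, Node)], start: Node, max_depth: usize) -> HashMap<Node, usize> {
    // Build an undirected adjacency list from the directed references.
    let mut adj: HashMap<Node, Vec<Node>> = HashMap::new();
    for &(src, tgt) in edges {
        adj.entry(src).or_default().push(tgt);
        adj.entry(tgt).or_default().push(src);
    }
    let mut depths: HashMap<Node, usize> = HashMap::from([(start, 0)]);
    let mut queue = VecDeque::from([start]);
    while let Some(node) = queue.pop_front() {
        let depth = depths[&node];
        if depth == max_depth {
            continue; // do not expand beyond the requested depth
        }
        if let Some(neighbors) = adj.get(&node) {
            for &next in neighbors {
                if !depths.contains_key(&next) {
                    depths.insert(next, depth + 1);
                    queue.push_back(next);
                }
            }
        }
    }
    depths
}
```

Recording the depth on first visit gives each entity its shortest distance from the start, which is what the tree output needs for indentation.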
## Human Output (Tree)
```
#42 Login timeout bug (CLOSED)
├── closes ── !234 Refactor auth middleware (MERGED)
│ ├── mentioned ── #38 Connection timeout in auth flow (CLOSED)
│ └── mentioned ── #51 Token refresh improvements (OPEN)
├── related ── #45 Auth module documentation (OPEN)
└── mentioned ── !228 Database migration (MERGED)
└── closes ── #35 Schema version drift (CLOSED)
```
## Mermaid Output
```mermaid
graph LR
I42["#42 Login timeout"] -->|closes| MR234["!234 Refactor auth"]
MR234 -->|mentioned| I38["#38 Connection timeout"]
MR234 -->|mentioned| I51["#51 Token refresh"]
I42 -->|related| I45["#45 Auth docs"]
I42 -->|mentioned| MR228["!228 DB migration"]
MR228 -->|closes| I35["#35 Schema drift"]
```
## Downsides
- Overlaps somewhat with timeline (but different focus: structure vs chronology)
- High fan-out for popular entities (need depth + limit controls)
- Unresolved cross-project references appear as dead ends
## Extensions
- `lore graph --format dot` — GraphViz DOT output
- `lore graph --format mermaid` — Mermaid diagram
- `lore graph --include-discussions` — show discussion threads as nodes
- Interactive HTML visualization (future web UI)

70
docs/ideas/hotspots.md Normal file

@@ -0,0 +1,70 @@
# File Hotspot Report
- **Command:** `lore hotspots [--since <date>]`
- **Confidence:** 85%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — single query on mr_file_changes (requires Gate 4 population)
## What
Rank files by frequency of appearance in merged MRs over a time window. Show
change_type breakdown (modified vs added vs deleted). Optionally filter by project.
## Why
Hot files are where bugs live. This is a proven engineering metric (see "Your Code
as a Crime Scene" by Adam Tornhill). High-churn files deserve extra test coverage,
better documentation, and architectural review.
## Data Required
- `mr_file_changes` (new_path, change_type, merge_request_id) — needs Gate 4 population
- `merge_requests` (merged_at, state='merged')
## Implementation Sketch
```sql
SELECT
mfc.new_path,
p.path_with_namespace,
COUNT(*) as total_changes,
SUM(CASE WHEN mfc.change_type = 'modified' THEN 1 ELSE 0 END) as modifications,
SUM(CASE WHEN mfc.change_type = 'added' THEN 1 ELSE 0 END) as additions,
SUM(CASE WHEN mfc.change_type = 'deleted' THEN 1 ELSE 0 END) as deletions,
SUM(CASE WHEN mfc.change_type = 'renamed' THEN 1 ELSE 0 END) as renames,
COUNT(DISTINCT mr.author_username) as unique_authors
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
JOIN projects p ON mfc.project_id = p.id
WHERE mr.state = 'merged'
AND mr.merged_at >= ?1
GROUP BY mfc.new_path, p.path_with_namespace
ORDER BY total_changes DESC
LIMIT ?2;
```
## Human Output
```
File Hotspots (last 90 days, top 20)
File Changes Authors Type Breakdown
src/auth/middleware.rs 18 4 14 mod, 3 add, 1 del
src/api/routes.rs 15 3 12 mod, 2 add, 1 rename
src/db/migrations.rs 12 2 8 mod, 4 add
tests/integration/auth_test.rs 11 3 9 mod, 2 add
```
## Downsides
- Requires `mr_file_changes` to be populated (Gate 4 ingestion)
- Doesn't distinguish meaningful changes from trivial ones (formatting, imports)
- Configuration files (CI, Cargo.toml) will rank high but aren't risky
## Extensions
- `lore hotspots --exclude "*.toml,*.yml"` — filter out config files
- `lore hotspots --dir src/auth/` — scope to directory
- Combine with `lore silos` for risk scoring: high churn + bus factor 1 = critical
- Complexity trend: correlate with discussion count (churn + many discussions = problematic)

69
docs/ideas/idle.md Normal file

@@ -0,0 +1,69 @@
# Idle Work Detector
- **Command:** `lore idle [--days <N>] [--labels <pattern>]`
- **Confidence:** 73%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — label event querying with configurable patterns
## What
Find entities that received an "in progress" or similar label but have had no
discussion activity for N days. Cross-reference with assignee to show who might
have forgotten about something.
## Why
Forgotten WIP is invisible waste. Developers start work, get pulled to something
urgent, and the original task sits idle. This makes it visible before it becomes
a problem.
## Data Required
All exists today:
- `resource_label_events` (label_name, action='add', created_at)
- `discussions` (last_note_at for entity activity)
- `issues` / `merge_requests` (state, assignees)
- `issue_assignees` / `mr_assignees`
## Implementation Sketch
```
1. Query resource_label_events for labels matching "in progress" patterns
Default patterns: "in-progress", "in_progress", "doing", "wip",
"workflow::in-progress", "status::in-progress"
Configurable via --labels flag
2. For each entity with an "in progress" label still applied:
a. Check if the label was subsequently removed (if so, skip)
b. Get last_note_at from discussions for that entity
c. Flag if last_note_at is older than threshold
3. Join with assignees for attribution
```
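Steps 2a to 2c above reduce to two small checks, sketched here over events already loaded for one entity and one label. The `Action` enum, the `LabelEvent` alias, and both function names are assumptions for illustration; timestamps are taken to be epoch milliseconds.

```rust
#[derive(Clone, Copy, PartialEq)]
enum Action {
    Add,
    Remove,
}

/// One label event for a single entity and label: (action, created_at in ms).
type LabelEvent = (Action, i64);

/// True when the most recent event is an `add`, i.e. the label was not
/// subsequently removed (step 2a).
fn label_still_applied(events: &[LabelEvent]) -> bool {
    events
        .iter()
        .max_by_key(|(_, at)| *at)
        .map_or(false, |(action, _)| *action == Action::Add)
}

/// Whole days since the last note if that meets the threshold, else None
/// (steps 2b and 2c).
fn idle_days(last_note_at_ms: i64, now_ms: i64, threshold_days: i64) -> Option<i64> {
    let days = (now_ms - last_note_at_ms) / 86_400_000;
    (days >= threshold_days).then_some(days)
}
```

Looking only at the latest event handles labels that were added, removed, and added again. Entities with no discussion notes at all need a separate fallback timestamp, which this sketch does not cover.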
## Human Output
```
Idle Work (labeled "in progress" but no activity for 14+ days)
group/backend
#90 Rate limiting design assigned to: charlie idle 18 days
Last activity: label +priority::high by dave
#85 Cache invalidation fix assigned to: alice idle 21 days
Last activity: discussion comment by bob
group/frontend
!230 Dashboard redesign assigned to: eve idle 14 days
Last activity: DiffNote by dave
```
## Downsides
- Requires label naming conventions; no universal standard
- Work may be happening outside GitLab (local branch, design doc)
- "Idle" threshold is subjective; 14 days may be normal for large features
## Extensions
- `lore idle --assignee alice` — personal idle work check
- `lore idle --notify` — generate message templates for nudging owners
- Configurable label patterns in config.json for team-specific workflows

View File

@@ -0,0 +1,92 @@
# Cross-Project Impact Graph
- **Command:** `lore impact-graph [--format json|dot|mermaid]`
- **Confidence:** 75%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — aggregation over entity_references, graph output formatting
## What
Aggregate `entity_references` by project pair to produce a weighted adjacency matrix
showing how projects reference each other. Output as JSON, DOT, or Mermaid for
visualization.
## Why
Makes invisible architectural coupling visible. "Backend and frontend repos have
47 cross-references this quarter" tells you about tight coupling that may need
architectural attention.
## Data Required
All exists today:
- `entity_references` (source/target entity IDs)
- `issues` / `merge_requests` (project_id for source/target)
- `projects` (path_with_namespace)
## Implementation Sketch
```sql
-- Project-to-project reference counts
WITH ref_projects AS (
SELECT
CASE er.source_entity_type
WHEN 'issue' THEN i_src.project_id
WHEN 'merge_request' THEN mr_src.project_id
END as source_project_id,
CASE er.target_entity_type
WHEN 'issue' THEN i_tgt.project_id
WHEN 'merge_request' THEN mr_tgt.project_id
END as target_project_id,
er.reference_type
FROM entity_references er
LEFT JOIN issues i_src ON er.source_entity_type = 'issue' AND er.source_entity_id = i_src.id
LEFT JOIN merge_requests mr_src ON er.source_entity_type = 'merge_request' AND er.source_entity_id = mr_src.id
LEFT JOIN issues i_tgt ON er.target_entity_type = 'issue' AND er.target_entity_id = i_tgt.id
LEFT JOIN merge_requests mr_tgt ON er.target_entity_type = 'merge_request' AND er.target_entity_id = mr_tgt.id
WHERE er.target_entity_id IS NOT NULL -- resolved references only
)
SELECT
p_src.path_with_namespace as source_project,
p_tgt.path_with_namespace as target_project,
rp.reference_type,
COUNT(*) as weight
FROM ref_projects rp
JOIN projects p_src ON rp.source_project_id = p_src.id
JOIN projects p_tgt ON rp.target_project_id = p_tgt.id
WHERE rp.source_project_id != rp.target_project_id -- cross-project only
GROUP BY p_src.path_with_namespace, p_tgt.path_with_namespace, rp.reference_type
ORDER BY weight DESC;
```
## Output Formats
### Mermaid
```mermaid
graph LR
Backend -->|closes 23| Frontend
Backend -->|mentioned 47| Infrastructure
Frontend -->|mentioned 12| Backend
```
### DOT
```dot
digraph impact {
"group/backend" -> "group/frontend" [label="closes: 23"];
"group/backend" -> "group/infra" [label="mentioned: 47"];
}
```
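Rendering the aggregated rows as DOT is mostly string formatting. A minimal sketch, assuming rows shaped like the query output `(source_project, target_project, reference_type, weight)`; the `to_dot` and `esc` names are hypothetical.

```rust
/// Escape backslashes and double quotes so a project path cannot break
/// out of a quoted DOT identifier.
fn esc(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Render (source_project, target_project, reference_type, weight) rows
/// as a DOT digraph, one edge per row.
fn to_dot(rows: &[(&str, &str, &str, u32)]) -> String {
    let mut out = String::from("digraph impact {\n");
    for &(src, tgt, kind, weight) in rows {
        out.push_str(&format!(
            "  \"{}\" -> \"{}\" [label=\"{}: {}\"];\n",
            esc(src),
            esc(tgt),
            esc(kind),
            weight
        ));
    }
    out.push_str("}\n");
    out
}
```

Because the query already orders rows by weight, the heaviest edges come first in the output, which helps when the file is truncated or read by hand.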
## Downsides
- Requires multiple projects synced; limited value for single-project users
- "Mentioned" references are noisy (high volume, low signal)
- Doesn't capture coupling through shared libraries or APIs (code-level coupling)
## Extensions
- `lore impact-graph --since 90d` — time-scoped coupling analysis
- `lore impact-graph --type closes` — only meaningful reference types
- Include unresolved references to show dependencies on un-synced projects
- Coupling trend: is cross-project coupling increasing over time?

97
docs/ideas/label-audit.md Normal file

@@ -0,0 +1,97 @@
# Label Hygiene Audit
- **Command:** `lore label-audit`
- **Confidence:** 82%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — straightforward aggregation queries
## What
Report on label health:
- Labels used only once (may be typos or abandoned experiments)
- Labels applied and removed within 1 hour (likely mistakes)
- Labels with no active issues/MRs (orphaned)
- Label name collisions across projects (same name, different meaning)
- Labels never used at all (defined but not applied)
## Why
Label sprawl is real and makes filtering useless over time. Teams create labels
ad-hoc and never clean them up. This simple audit surfaces maintenance tasks.
## Data Required
All exists today:
- `labels` (name, project_id)
- `issue_labels` / `mr_labels` (usage counts)
- `resource_label_events` (add/remove pairs for mistake detection)
- `issues` / `merge_requests` (state for "active" filtering)
## Implementation Sketch
```sql
-- Labels used only once
SELECT l.name, p.path_with_namespace, COUNT(*) as usage
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
GROUP BY l.id
HAVING COUNT(il.issue_id) + COUNT(ml.merge_request_id) = 1;
-- Flash labels (applied and removed within 1 hour)
SELECT
rle1.label_name,
rle1.created_at as added_at,
rle2.created_at as removed_at,
(rle2.created_at - rle1.created_at) / 60000 as minutes_active
FROM resource_label_events rle1
JOIN resource_label_events rle2
ON (rle1.issue_id = rle2.issue_id OR rle1.merge_request_id = rle2.merge_request_id)
AND rle1.label_name = rle2.label_name
AND rle1.action = 'add'
AND rle2.action = 'remove'
AND rle2.created_at > rle1.created_at
AND (rle2.created_at - rle1.created_at) < 3600000;
-- Unused labels (defined but never applied)
SELECT l.name, p.path_with_namespace
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
WHERE il.issue_id IS NULL AND ml.merge_request_id IS NULL;
```
## Human Output
```
Label Audit
Unused Labels (4):
group/backend: deprecated-v1, needs-triage, wontfix-maybe
group/frontend: old-design
Single-Use Labels (3):
group/backend: perf-regression (1 issue)
group/frontend: ux-debt (1 MR), mobile-only (1 issue)
Flash Labels (applied < 1hr, 2):
group/backend #90: +priority::critical then -priority::critical (12 min)
group/backend #85: +blocked then -blocked (5 min)
Cross-Project Collisions (1):
"needs-review" used in group/backend (32 uses) AND group/frontend (8 uses)
```
## Downsides
- Low glamour; this is janitorial work
- Single-use labels may be legitimate (one-off categorization)
- Cross-project collisions may be intentional (shared vocabulary)
## Extensions
- `lore label-audit --fix` — suggest deletions for unused labels
- Trend: label count over time (is sprawl increasing?)

74
docs/ideas/label-flow.md Normal file

@@ -0,0 +1,74 @@
# Label Velocity
- **Command:** `lore label-flow <from-label> <to-label>`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — self-join on resource_label_events, percentile computation
## What
For a given label pair (e.g., "needs-review" to "approved"), compute median and P90
transition times using `resource_label_events`. Shows how fast work moves through
your process labels.
Also supports: single label dwell time (how long does "in-progress" stay applied?).
## Why
Process bottlenecks become quantifiable. "Our code review takes a median of 3 days"
is actionable data for retrospectives and process improvement.
## Data Required
All exists today:
- `resource_label_events` (label_name, action, created_at, issue_id, merge_request_id)
## Implementation Sketch
```sql
-- Label A → Label B transition time
WITH add_a AS (
SELECT issue_id, merge_request_id, MIN(created_at) as added_at
FROM resource_label_events
WHERE label_name = ?1 AND action = 'add'
GROUP BY issue_id, merge_request_id
),
add_b AS (
SELECT issue_id, merge_request_id, MIN(created_at) as added_at
FROM resource_label_events
WHERE label_name = ?2 AND action = 'add'
GROUP BY issue_id, merge_request_id
)
SELECT
(b.added_at - a.added_at) / 3600000.0 as hours_transition
FROM add_a a
JOIN add_b b ON a.issue_id = b.issue_id OR a.merge_request_id = b.merge_request_id
WHERE b.added_at > a.added_at;
```
Then compute percentiles in Rust (median, P75, P90).
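A minimal sketch of that percentile step, using linear interpolation between neighbouring ranks. The `percentile` function is a hypothetical helper, and the interpolation method is one common choice rather than something the document specifies.

```rust
/// Linear-interpolated percentile over transition times (in hours).
/// `p` is in the range 0.0..=100.0. Returns None for empty input or bad `p`.
fn percentile(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = values.to_vec();
    // total_cmp gives a well-defined order even if a NaN slips in.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Fractional rank into the sorted slice, then interpolate.
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;
    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}
```

Calling it with 50.0, 75.0, and 90.0 over the `hours_transition` column yields the three figures in the report. With only a handful of transitions the upper percentiles are dominated by single outliers, so the output should also show the sample size.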
## Human Output
```
Label Flow: "needs-review" → "approved"
Transitions: 42 issues/MRs in last 90 days
Median: 18.5 hours
P75: 36.2 hours
P90: 72.8 hours
Slowest: !234 Refactor auth (168 hours)
```
## Downsides
- Only works if teams use label-based workflows consistently
- Labels may be applied out of order or skipped
- Self-join performance could be slow with many events
## Extensions
- `lore label-flow --dwell "in-progress"` — how long does a label stay?
- `lore label-flow --all` — auto-discover common transitions from event data
- Visualization: label state machine with median transition times on edges

View File

@@ -0,0 +1,81 @@
# Milestone Risk Report
- **Command:** `lore milestone-risk [title]`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — milestone + issue aggregation with scope change detection
## What
For each active milestone (or a specific one): show total issues, % closed, issues
added after milestone creation (scope creep), issues with no assignee, issues with
overdue due_date. Flag milestones where completion rate is below expected trajectory.
## Why
Milestone health is usually assessed by gut feel. This provides objective signals
from data already ingested. Project managers can spot risks early.
## Data Required
All exists today:
- `milestones` (title, state, due_date)
- `issues` (milestone_id, state, created_at, due_date, assignee)
- `issue_assignees` (for unassigned detection)
## Implementation Sketch
```sql
SELECT
m.title,
m.state,
m.due_date,
-- DISTINCT on i.id: the assignee join yields one row per assignee, not per issue
COUNT(DISTINCT i.id) as total_issues,
COUNT(DISTINCT CASE WHEN i.state = 'closed' THEN i.id END) as closed,
COUNT(DISTINCT CASE WHEN i.state = 'opened' THEN i.id END) as open,
COUNT(DISTINCT CASE WHEN i.created_at > m.created_at THEN i.id END) as scope_creep,
COUNT(DISTINCT CASE WHEN ia.username IS NULL AND i.state = 'opened' THEN i.id END) as unassigned,
COUNT(DISTINCT CASE WHEN i.due_date < DATE('now') AND i.state = 'opened' THEN i.id END) as overdue
FROM milestones m
JOIN issues i ON i.milestone_id = m.id
LEFT JOIN issue_assignees ia ON ia.issue_id = i.id
WHERE m.state = 'active'
GROUP BY m.id;
```
Note: the `created_at` comparison only approximates scope creep. It misses issues
that existed before the milestone and were moved into it later.
`resource_milestone_events` records when an issue was added to or removed from a
milestone, so use those events for precise scope change detection.
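
The "below expected trajectory" check can be sketched as a comparison between the closed fraction and the fraction of the milestone window that has elapsed. This is a minimal sketch, not a settled design: the `MilestoneStatus` enum, both function names, and the use of epoch-millisecond timestamps are assumptions made for illustration.

```rust
/// Hypothetical status labels, mirroring the wording used in the report.
#[derive(Debug, PartialEq)]
pub enum MilestoneStatus {
    OnTrack,
    AtRisk,
}

/// Fraction of the milestone window that has elapsed, clamped to [0, 1].
/// Timestamps are assumed to be epoch milliseconds.
pub fn time_elapsed_fraction(start_ms: i64, due_ms: i64, now_ms: i64) -> f64 {
    if due_ms <= start_ms {
        return 1.0; // degenerate window: treat it as fully elapsed
    }
    let fraction = (now_ms - start_ms) as f64 / (due_ms - start_ms) as f64;
    fraction.clamp(0.0, 1.0)
}

/// A milestone is on track when the closed fraction keeps pace with elapsed time.
pub fn milestone_status(closed: u32, total: u32, elapsed_fraction: f64) -> MilestoneStatus {
    if total == 0 {
        return MilestoneStatus::OnTrack; // nothing scheduled, nothing at risk
    }
    let completion = f64::from(closed) / f64::from(total);
    if completion >= elapsed_fraction {
        MilestoneStatus::OnTrack
    } else {
        MilestoneStatus::AtRisk
    }
}
```

A linear trajectory is the simplest possible baseline. Teams that close most issues near the deadline would likely need a tolerance band rather than a strict comparison.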
## Human Output
```
Milestone Risk Report
v2.0 (due Feb 15, 2025)
Progress: 14/20 closed (70%)
Scope: +3 issues added after milestone start
Risks: 2 issues overdue, 1 issue unassigned
Status: ON TRACK (70% complete, 60% time elapsed)
v2.1 (due Mar 30, 2025)
Progress: 2/15 closed (13%)
Scope: +8 issues added after milestone start
Risks: 5 issues unassigned
Status: AT RISK (13% complete, scope still growing)
```
## Downsides
- Milestone semantics vary wildly between teams
- "Scope creep" detection is noisy if teams batch-add issues to milestones
- due_date comparison assumes consistent timezone handling
## Extensions
- `lore milestone-risk --history` — show scope changes over time
- Velocity estimation: at current closure rate, will the milestone finish on time?
- Combine with label-flow for "how fast are milestone issues moving through workflow"

67
docs/ideas/mr-pipeline.md Normal file

@@ -0,0 +1,67 @@
# MR Pipeline Efficiency
- **Command:** `lore mr-pipeline [--since <date>]`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — builds on bottleneck detector with more stages
## What
Track the full MR lifecycle: creation, first review, all reviews complete (threads
resolved), approval, merge. Compute time spent in each stage across all MRs.
Identify which stage is the bottleneck.
## Why
"Our merge process is slow" is vague. This breaks it into stages so teams can target
the actual bottleneck. Maybe creation-to-review is fast but review-to-merge is slow
(merge queue issues). Maybe first review is fast but resolution takes forever
(contentious code).
## Data Required
All exists today:
- `merge_requests` (created_at, merged_at)
- `notes` (note_type='DiffNote', created_at, author_username)
- `discussions` (resolved, resolvable, merge_request_id)
- `resource_state_events` (state changes with timestamps)
## Implementation Sketch
For each merged MR, compute:
1. **Created → First Review**: MIN(DiffNote.created_at) - mr.created_at
2. **First Review → All Resolved**: MAX(discussion.resolved_at) - MIN(DiffNote.created_at)
3. **All Resolved → Merged**: mr.merged_at - MAX(discussion.resolved_at)
Note: "resolved_at" isn't directly stored but can be approximated from the last
note in resolved discussions, or from state events.
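
The stage arithmetic and the Median/P75/P90 columns could be computed along these lines. This is a sketch under assumptions: timestamps are epoch milliseconds, `last_resolved_at` is whatever approximation of "resolved_at" is chosen, and the `StageDurations` type and both function names are hypothetical.

```rust
/// Time spent in each stage for one merged MR, in milliseconds.
#[derive(Debug, PartialEq)]
pub struct StageDurations {
    pub created_to_first_review: i64,
    pub first_review_to_resolved: i64,
    pub resolved_to_merged: i64,
}

/// Split an MR's lifetime into the three stages. Returns None when the MR
/// has no review note or when the timestamps are out of order.
pub fn stage_durations(
    created_at: i64,
    first_review_at: Option<i64>,
    last_resolved_at: Option<i64>,
    merged_at: i64,
) -> Option<StageDurations> {
    let first_review = first_review_at?;
    // With no resolved threads, treat resolution as instantaneous.
    let resolved = last_resolved_at.unwrap_or(first_review);
    if created_at <= first_review && first_review <= resolved && resolved <= merged_at {
        Some(StageDurations {
            created_to_first_review: first_review - created_at,
            first_review_to_resolved: resolved - first_review,
            resolved_to_merged: merged_at - resolved,
        })
    } else {
        None
    }
}

/// Nearest-rank percentile (p in 0..=100) over unsorted samples.
pub fn percentile(samples: &[i64], p: u32) -> Option<i64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    // rank = ceil(p / 100 * n), kept within 1..=n
    let rank = (p as usize * n + 99) / 100;
    Some(sorted[rank.clamp(1, n) - 1])
}
```

Dropping out-of-order MRs instead of clamping them keeps the approximation errors in "resolved_at" from producing negative stage times.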
## Human Output
```
MR Pipeline (last 30 days, 24 merged MRs)
Stage Median P75 P90
Created → First Review 4.2h 12.1h 28.3h
First Review → Resolved 8.1h 24.5h 72.0h <-- BOTTLENECK
Resolved → Merged 0.5h 1.2h 3.1h
Total (Created → Merged) 18.4h 48.2h 96.1h
Biggest bottleneck: Review resolution (median 8.1h)
Suggestion: Consider breaking large MRs into smaller reviewable chunks
```
## Downsides
- "Resolved" timestamp approximation may be inaccurate
- Pipeline assumes linear flow; real MRs have back-and-forth cycles
- Draft MRs skew metrics (created early, reviewed late intentionally)
## Extensions
- `lore mr-pipeline --exclude-drafts` — cleaner metrics
- Per-project comparison: which project has the fastest pipeline?
- Trend line: weekly pipeline speed over time
- Break down by MR size (files changed) to normalize

@@ -0,0 +1,265 @@
# Multi-Project Ergonomics
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium (multiple small improvements that compound)
## The Problem
Every command that touches project-scoped data requires `-p group/subgroup/project`
to disambiguate. For users with 5+ projects synced, this is:
- Repetitive: typing `-p infra/platform/auth-service` on every query
- Error-prone: mistyping long paths
- Discoverable only by failure: you don't know you need `-p` until you hit an
ambiguity error
The fuzzy matching in `resolve_project` is already good (suffix, substring,
case-insensitive) but it only kicks in on the `-p` value itself. There's no way to
set a default, group projects, or scope a whole session.
## Proposed Improvements
### 1. Project Aliases in Config
Let users define short aliases for long project paths.
```json
{
"projects": [
{ "path": "infra/platform/auth-service", "alias": "auth" },
{ "path": "infra/platform/billing-service", "alias": "billing" },
{ "path": "frontend/customer-portal", "alias": "portal" },
{ "path": "frontend/admin-dashboard", "alias": "admin" }
]
}
```
Then: `lore issues -p auth` resolves via alias before falling through to fuzzy match.
**Implementation:** Add optional `alias` field to `ProjectConfig`. In
`resolve_project`, check aliases before the existing exact/suffix/substring cascade.
```rust
#[derive(Debug, Clone, Deserialize)]
pub struct ProjectConfig {
pub path: String,
#[serde(default)]
pub alias: Option<String>,
}
```
Resolution order becomes:
1. Exact alias match (new)
2. Exact path match
3. Case-insensitive path match
4. Suffix match
5. Substring match
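
The cascade above might look roughly like the following. This is a sketch, not the existing `resolve_project`: the real signature and error handling are not shown in this document, so the `unique_match` helper, the `Option<&str>` return type, and the "accept a step only when exactly one project matches" rule are assumptions. `ProjectConfig` is redefined without serde so the block stands alone.

```rust
#[derive(Debug, Clone)]
pub struct ProjectConfig {
    pub path: String,
    pub alias: Option<String>,
}

/// Hypothetical helper: return the project path only if exactly one project matches.
fn unique_match<'a>(
    projects: &'a [ProjectConfig],
    pred: impl Fn(&ProjectConfig) -> bool,
) -> Option<&'a str> {
    let mut hits = projects.iter().filter(|p| pred(*p));
    match (hits.next(), hits.next()) {
        (Some(hit), None) => Some(hit.path.as_str()),
        _ => None, // zero matches or ambiguous
    }
}

/// Try each resolution step in order; the first step with a unique match wins.
pub fn resolve_project<'a>(projects: &'a [ProjectConfig], query: &str) -> Option<&'a str> {
    let q = query.to_lowercase();
    let suffix = format!("/{q}");
    unique_match(projects, |p| p.alias.as_deref() == Some(query))
        .or_else(|| unique_match(projects, |p| p.path == query))
        .or_else(|| unique_match(projects, |p| p.path.to_lowercase() == q))
        .or_else(|| unique_match(projects, |p| p.path.to_lowercase().ends_with(&suffix)))
        .or_else(|| unique_match(projects, |p| p.path.to_lowercase().contains(&q)))
}
```

With the config above, `resolve_project(&projects, "auth")` would stop at step 1 and never reach the fuzzy steps, which is what makes aliases predictable even when a substring is ambiguous.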
### 2. Default Project (`LORE_PROJECT` env var)
Set a default project for your shell session so you don't need `-p` at all.
```bash
export LORE_PROJECT=auth
lore issues # scoped to auth-service
lore mrs --state opened # scoped to auth-service
lore search "timeout bug" # scoped to auth-service
lore issues -p billing # explicit -p overrides the env var
```
**Implementation:** In every command that accepts `-p`, fall back to
`std::env::var("LORE_PROJECT")` when the flag is absent. The `-p` flag always wins.
Could also support a config-level default:
```json
{
"defaultProject": "auth"
}
```
Precedence: CLI flag > env var > config default > (no filter).
### 3. `lore use <project>` — Session Context Switcher
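A command that persists the project context for later `lore` invocations by writing it to a state file. It cannot set `LORE_PROJECT` itself, because a child process cannot change its parent shell's environment.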
```bash
lore use auth
# writes ~/.local/state/lore/current-project containing "auth"
lore issues # reads current-project file, scopes to auth
lore use --clear # removes the file, back to all-project mode
lore use # shows current project context
```
This is similar to `kubectl config use-context`, `nvm use`, or `tfenv use`.
**Implementation:** Write a one-line file at a known state path. Each command reads
it as a fallback that ranks below the CLI flag and env var but above the config default.
Precedence: CLI flag > env var > `lore use` state file > config default > (no filter).
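
The precedence chain reduces to "first non-empty source wins". A minimal sketch, assuming the caller has already gathered each source (for example with `std::env::var("LORE_PROJECT").ok()` and a read of the state file); the function name and the choice to treat blank values as unset are assumptions.

```rust
/// Pick the effective project scope. Earlier arguments take priority.
/// Blank values count as unset, so `LORE_PROJECT=` does not scope to nothing
/// and a trailing newline in the state file is harmless.
pub fn effective_project(
    cli_flag: Option<&str>,
    env_var: Option<&str>,
    state_file: Option<&str>,
    config_default: Option<&str>,
) -> Option<String> {
    [cli_flag, env_var, state_file, config_default]
        .iter()
        .copied()
        .flatten() // skip sources that are absent
        .map(str::trim)
        .find(|value| !value.is_empty())
        .map(str::to_owned)
}
```

Returning `None` corresponds to the "(no filter)" end of the chain. Keeping the sources as plain arguments makes the ordering testable without touching the environment or the filesystem.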
### 4. `lore projects` — Project Listing and Discovery
A dedicated command to see what's synced, with aliases and activity stats.
```bash
$ lore projects
Alias Path Issues MRs Last Sync
auth infra/platform/auth-service 142 87 2h ago
billing infra/platform/billing-service 56 34 2h ago
portal frontend/customer-portal 203 112 2h ago
admin frontend/admin-dashboard 28 15 3d ago
- data/ml-pipeline 89 45 2h ago
```
Robot mode returns the same as JSON with alias, path, counts, and last sync time.
**Implementation:** Query `projects` joined with `COUNT(issues)`, `COUNT(mrs)`,
and `MAX(sync_runs.finished_at)`. Overlay aliases from config.
### 5. Project Groups in Config
Let users define named groups of projects for batch scoping.
```json
{
"projectGroups": {
"backend": ["auth", "billing", "data/ml-pipeline"],
"frontend": ["portal", "admin"],
"all-infra": ["auth", "billing"]
}
}
```
Then: `lore issues -p @backend` (or `--group backend`) queries across all projects
in the group.
**Implementation:** When `-p` value starts with `@`, look up the group and resolve
each member project. Pass as a `Vec<i64>` of project IDs to the query layer.
This is especially powerful for:
- `lore search "auth bug" -p @backend` — search across related repos
- `lore digest --since 7d -p @frontend` — team-scoped activity digest
- `lore timeline "deployment" -p @all-infra` — cross-repo timeline
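
The `@` lookup could be sketched as below. It stops one step short of the `Vec<i64>` of project IDs: each returned member would still go through `resolve_project`, since members may be aliases or paths. The function name and the "unknown group returns `None`" behavior are assumptions.

```rust
use std::collections::HashMap;

/// Expand a `-p` value into the project references it stands for.
/// `@name` looks up a group; anything else is passed through unchanged.
/// Returns None when the group does not exist.
pub fn expand_project_arg(
    groups: &HashMap<String, Vec<String>>,
    arg: &str,
) -> Option<Vec<String>> {
    match arg.strip_prefix('@') {
        Some(group_name) => groups.get(group_name).cloned(),
        None => Some(vec![arg.to_string()]),
    }
}
```

Treating a plain value as a one-element list lets the query layer accept a list of projects in every case, so single-project and group queries share one code path.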
### 6. Git-Aware Project Detection
When running `lore` from inside a git repo that matches a synced project, auto-scope
to that project without any flags.
```bash
cd ~/code/auth-service
lore issues # auto-detects this is infra/platform/auth-service
```
**Implementation:** Read `.git/config` for the remote URL, extract the project path,
check if it matches a synced project. Only activate when exactly one project matches.
Detection logic:
```
1. Check if cwd is inside a git repo (find .git)
2. Parse git remote origin URL
3. Extract path component (e.g., "infra/platform/auth-service.git" → "infra/platform/auth-service")
4. Match against synced projects
5. If exactly one match, use as implicit -p
6. If ambiguous or no match, do nothing (fall through to normal behavior)
```
Precedence: CLI flag > env var > `lore use` > config default > git detection > (no filter).
This is similar to how `gh` (GitHub CLI) auto-detects the repo you're in.
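
Step 3 of the detection logic (extracting the path from the remote URL) is the part most likely to hide edge cases. A sketch that handles the common URL shapes, assuming the URL has already been read (for example from `git remote get-url origin`) and that GitLab is served from the host root rather than a sub-path; the function name is hypothetical.

```rust
/// Extract the project path from a git remote URL.
/// Handles `https://host/group/project.git`, `ssh://git@host:port/group/project.git`,
/// and scp-style `git@host:group/project.git`.
pub fn project_path_from_remote(url: &str) -> Option<String> {
    let url = url.trim();
    let path = if let Some((_scheme, rest)) = url.split_once("://") {
        // URL form: drop the host (with any user or port), keep what follows the first '/'.
        rest.split_once('/')?.1
    } else {
        // scp form: the path follows the first ':'.
        url.split_once(':')?.1
    };
    let path = path.trim_matches('/');
    let path = path.strip_suffix(".git").unwrap_or(path);
    if path.is_empty() {
        None
    } else {
        Some(path.to_string())
    }
}
```

The result would then be matched against synced project paths, and only a single exact match should activate the implicit `-p`.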
### 7. Prompt Integration / Shell Function
Provide a shell function that shows the current project context in the prompt.
```bash
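# In .zshrc (for bash, set PS1 in .bashrc instead)
eval "$(lore completions zsh)"
setopt PROMPT_SUBST   # zsh expands $(...) in PROMPT only when this is set
PROMPT='$(lore-prompt)%~ %# '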
```
Output: `[lore:auth] ~/code/auth-service %`
Shows which project `lore` commands will scope to, using the same precedence chain.
Helps users understand what context they're in before running a query.
### 8. Short Project References in Output
Once aliases exist, use them everywhere in output for brevity:
**Before:**
```
infra/platform/auth-service#42 Login timeout bug
infra/platform/auth-service!234 Refactor auth middleware
```
**After:**
```
auth#42 Login timeout bug
auth!234 Refactor auth middleware
```
With `--full-paths` flag to get the verbose form when needed.
## Combined UX Flow
With all improvements, a typical session looks like:
```bash
# One-time config
lore init # sets up aliases during interactive setup
# Daily use
lore use auth # set context
lore issues --state opened # no -p needed
lore search "timeout" # scoped to auth
lore timeline "login flow" # scoped to auth
lore issues -p @backend # cross-repo query via group
lore mrs -p billing # quick alias switch
lore use --clear # back to global
```
Or for the power user who never wants to type `lore use`:
```bash
cd ~/code/auth-service
lore issues # git-aware auto-detection
```
Or for the scripter:
```bash
LORE_PROJECT=auth lore --robot issues -n 50 # env var for automation
```
## Priority Order
Implement in this order for maximum incremental value:
1. **Project aliases** — smallest change, biggest daily friction reduction
2. **`LORE_PROJECT` env var** — trivial to implement, enables scripting
3. **`lore projects` command** — discoverability, completes the alias story
4. **`lore use` context** — nice-to-have for heavy users
5. **Project groups** — high value for multi-repo teams
6. **Git-aware detection** — polish, "it just works" feel
7. **Short refs in output** — ties into timeline issue #001
8. **Prompt integration** — extra polish
## Relationship to Issue #001
The timeline entity-ref ambiguity (issue #001) is solved naturally by short project
references (improvement 8 above), which build on aliases (improvement 1). Once aliases
exist, `format_entity_ref` can use the alias as the short project identifier in
multi-project output:
```
auth#42 instead of infra/platform/auth-service#42
```
And in single-project timelines (detected via `lore use` or git-aware detection), the
project prefix is omitted entirely, matching the current behavior but now by design.

@@ -0,0 +1,81 @@
# Recurring Bug Pattern Detector
- **Command:** `lore recurring-patterns [--min-cluster <N>]`
- **Confidence:** 76%
- **Tier:** 3
- **Status:** proposed
- **Effort:** high — vector clustering, threshold tuning
## What
Cluster closed issues by embedding similarity. Identify clusters of 3+ issues that
are semantically similar — these represent recurring problems that need a systemic
fix rather than one-off patches.
## Why
Finding the same bug filed 5 different ways is one of the most impactful things you
can surface. This is a sophisticated use of the embedding pipeline that no competing
tool offers. It turns "we keep having auth issues" from a gut feeling into data.
## Data Required
All exists today:
- `documents` (source_type='issue', content_text)
- `embeddings` (768-dim vectors)
- `issues` (state='closed' for filtering)
## Implementation Sketch
```
1. Collect all embeddings for closed issue documents
2. For each issue, find K nearest neighbors (K=10)
3. Build adjacency graph: edge exists if similarity > threshold (e.g., 0.80)
4. Find connected components (simple DFS/BFS)
5. Filter to components with >= min-cluster members (default 3)
6. For each cluster:
a. Extract common terms (TF-IDF or simple word frequency)
b. Sort by recency (most recent issue first)
c. Report cluster with: theme, member issues, time span
```
### Similarity Threshold Tuning
This is the critical parameter. Too low = noise, too high = misses.
- Start at 0.80 cosine similarity
- Expose as `--threshold` flag for user tuning
- Report cluster cohesion score for transparency
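
Steps 3 to 5 of the sketch (threshold graph, connected components, size filter) could be prototyped as follows. This is an illustration under assumptions: it compares all pairs instead of using the K-nearest-neighbor lookup from step 2, it takes one vector per issue (so multi-chunk documents would need aggregating first), and both function names are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Group item indices into connected components of the "similarity > threshold"
/// graph and keep components with at least `min_cluster` members.
pub fn cluster(embeddings: &[Vec<f32>], threshold: f32, min_cluster: usize) -> Vec<Vec<usize>> {
    let n = embeddings.len();
    // Build the adjacency list; all-pairs comparison is O(n^2).
    let mut adjacent = vec![Vec::new(); n];
    for i in 0..n {
        for j in (i + 1)..n {
            if cosine_similarity(&embeddings[i], &embeddings[j]) > threshold {
                adjacent[i].push(j);
                adjacent[j].push(i);
            }
        }
    }
    let mut seen = vec![false; n];
    let mut clusters = Vec::new();
    for start in 0..n {
        if seen[start] {
            continue;
        }
        // Iterative depth-first search from `start`.
        let mut stack = vec![start];
        let mut component = Vec::new();
        seen[start] = true;
        while let Some(node) = stack.pop() {
            component.push(node);
            for &next in &adjacent[node] {
                if !seen[next] {
                    seen[next] = true;
                    stack.push(next);
                }
            }
        }
        if component.len() >= min_cluster {
            component.sort_unstable();
            clusters.push(component);
        }
    }
    clusters
}
```

Connected components can chain loosely related issues together (A resembles B, B resembles C, but A and C differ), which is one reason the cohesion score is worth reporting alongside each cluster.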
## Human Output
```
Recurring Patterns (3+ similar closed issues)
Cluster 1: "Authentication timeout errors" (5 issues, spanning 6 months)
#89 Login timeout on slow networks (closed 3d ago)
#72 Auth flow hangs on cellular (closed 2mo ago)
#58 Token refresh timeout (closed 3mo ago)
#45 SSO login timeout for remote users (closed 5mo ago)
#31 Connection timeout in auth middleware (closed 6mo ago)
Avg similarity: 0.87 | Suggested: systemic fix for auth timeout handling
Cluster 2: "Cache invalidation issues" (3 issues, spanning 2 months)
#85 Stale cache after deploy (closed 2w ago)
#77 Cache headers not updated (closed 1mo ago)
#69 Dashboard shows old data after settings change (closed 2mo ago)
Avg similarity: 0.82 | Suggested: review cache invalidation strategy
```
## Downsides
- Clustering quality depends on embedding quality and threshold tuning
- May produce false clusters (issues that mention similar terms but are different problems)
- Computationally expensive for large issue counts (N^2 comparisons)
- Need to handle multi-chunk documents (aggregate embeddings)
## Extensions
- `lore recurring-patterns --open` — find clusters in open issues (duplicates to merge)
- `lore recurring-patterns --cross-project` — patterns across repos
- Trend detection: are cluster sizes growing? (escalating problem)
- Export as report for engineering retrospectives

@@ -0,0 +1,78 @@
# DiffNote Coverage Map
- **Command:** `lore review-coverage <mr-iid>`
- **Confidence:** 75%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — join DiffNote positions with mr_file_changes
## What
For a specific MR, show which files received review comments (DiffNotes) vs. which
files were changed but received no review attention. Highlights blind spots in code
review.
## Why
Large MRs often have files that get reviewed thoroughly and files that slip through
with no comments. This makes the review coverage visible so teams can decide if
un-reviewed files need a second look.
## Data Required
All exists today:
- `mr_file_changes` (new_path per MR)
- `notes` (position_new_path, note_type='DiffNote', discussion_id)
- `discussions` (merge_request_id)
## Implementation Sketch
```sql
SELECT
mfc.new_path,
mfc.change_type,
COUNT(DISTINCT n.id) as review_comments,
COUNT(DISTINCT n.discussion_id) as review_threads,
CASE WHEN COUNT(n.id) = 0 THEN 'NOT REVIEWED' ELSE 'REVIEWED' END as status
FROM mr_file_changes mfc
-- Join discussions first so that only notes from this MR are counted;
-- matching on path alone would pull in comments from other MRs touching the same file
LEFT JOIN discussions d ON d.merge_request_id = mfc.merge_request_id
LEFT JOIN notes n ON n.discussion_id = d.id
AND n.position_new_path = mfc.new_path
AND n.note_type = 'DiffNote'
AND n.is_system = 0
WHERE mfc.merge_request_id = ?1
GROUP BY mfc.new_path
ORDER BY review_comments DESC;
```
## Human Output
```
Review Coverage for !234 — Refactor auth middleware
REVIEWED (5 files, 23 comments)
src/auth/middleware.rs 12 comments, 4 threads
src/auth/jwt.rs 6 comments, 2 threads
src/auth/session.rs 3 comments, 1 thread
tests/auth/middleware_test.rs 1 comment, 1 thread
src/auth/mod.rs 1 comment, 1 thread
NOT REVIEWED (3 files)
src/auth/types.rs modified [no review comments]
src/api/routes.rs modified [no review comments]
Cargo.toml modified [no review comments]
Coverage: 5/8 files (62.5%)
```
## Downsides
- Reviewers may have reviewed a file without leaving comments (approval by silence)
- position_new_path matching may not cover all DiffNote position formats
- Config files (Cargo.toml) not being reviewed is usually fine
## Extensions
- `lore review-coverage --all --since 30d` — aggregate coverage across all MRs
- Per-reviewer breakdown: which reviewers cover which files?
- Coverage heatmap: files that consistently escape review across multiple MRs

90
docs/ideas/silos.md Normal file

@@ -0,0 +1,90 @@
# Knowledge Silo Detection
- **Command:** `lore silos [--min-changes <N>]`
- **Confidence:** 87%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — requires mr_file_changes population (Gate 4)
## What
For each file path (or directory), count unique MR authors. Flag paths where only
1 person has ever authored changes (bus factor = 1). Aggregate by directory to show
silo areas.
## Why
Bus factor analysis is critical for team resilience. If only one person has ever
touched the auth module, that's a risk. Once `mr_file_changes` is populated, this
surfaces knowledge concentration that's otherwise invisible.
## Data Required
- `mr_file_changes` (new_path, merge_request_id) — needs Gate 4 ingestion
- `merge_requests` (author_username, state='merged')
- `projects` (path_with_namespace)
## Implementation Sketch
```sql
-- Find directories with bus factor = 1
WITH file_authors AS (
SELECT
mfc.new_path,
mr.author_username,
p.path_with_namespace,
mfc.project_id
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
JOIN projects p ON mfc.project_id = p.id
WHERE mr.state = 'merged'
),
directory_authors AS (
SELECT
project_id,
path_with_namespace,
-- Extract directory: everything up to and including the last '/'.
-- RTRIM strips trailing characters found in the slash-free copy of the path,
-- so it stops at the last '/'.
CASE
WHEN INSTR(new_path, '/') > 0
THEN RTRIM(new_path, REPLACE(new_path, '/', ''))
ELSE '.'
END as directory,
COUNT(DISTINCT author_username) as unique_authors,
COUNT(*) as total_changes,
GROUP_CONCAT(DISTINCT author_username) as authors
FROM file_authors
GROUP BY project_id, directory
)
SELECT * FROM directory_authors
WHERE unique_authors = 1
AND total_changes >= ?1 -- min-changes threshold
ORDER BY total_changes DESC;
```
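
For prototyping or unit-testing the aggregation outside SQL, the same logic can be sketched in Rust. The input shape (pairs of file path and author, already limited to merged MRs within one project) and both function names are assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Directory part of a path including the trailing '/', or "." for top-level files.
pub fn directory_of(path: &str) -> &str {
    match path.rfind('/') {
        Some(idx) => &path[..=idx],
        None => ".",
    }
}

/// Return directories touched by exactly one author with at least `min_changes`
/// changes, as (directory, author, change count), most changes first.
pub fn find_silos<'a>(
    changes: &[(&'a str, &'a str)],
    min_changes: usize,
) -> Vec<(&'a str, &'a str, usize)> {
    // directory -> (distinct authors, total changes)
    let mut by_dir: BTreeMap<&'a str, (BTreeSet<&'a str>, usize)> = BTreeMap::new();
    for &(path, author) in changes {
        let entry = by_dir.entry(directory_of(path)).or_default();
        entry.0.insert(author);
        entry.1 += 1;
    }
    let mut silos: Vec<_> = by_dir
        .into_iter()
        .filter(|(_, (authors, count))| authors.len() == 1 && *count >= min_changes)
        .map(|(dir, (authors, count))| (dir, authors.into_iter().next().unwrap(), count))
        .collect();
    silos.sort_by(|a, b| b.2.cmp(&a.2));
    silos
}
```

`directory_of` returns the immediate parent directory only, so deeply nested trees produce many small buckets. A depth-limited variant would be the natural place to address that.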
## Human Output
```
Knowledge Silos (bus factor = 1, min 3 changes)
group/backend
src/auth/ alice (8 changes) HIGH RISK
src/billing/ bob (5 changes) HIGH RISK
src/utils/cache/ charlie (3 changes) MODERATE RISK
group/frontend
src/admin/ dave (12 changes) HIGH RISK
```
## Downsides
- Historical authors may have left the team; needs recency weighting
- Requires `mr_file_changes` to be populated (Gate 4)
- Single-author directories may be intentional (ownership model)
- Directory aggregation heuristic is imperfect for deep nesting
## Extensions
- `lore silos --since 180d` — only count recent activity
- `lore silos --depth 2` — aggregate at directory depth N
- Combine with `lore experts` to show both silos and experts in one view
- Risk scoring: weight by directory size, change frequency, recency

@@ -0,0 +1,95 @@
# Similar Issues Finder
- **Command:** `lore similar <iid>`
- **Confidence:** 95%
- **Tier:** 1
- **Status:** proposed
- **Effort:** low — infrastructure exists, needs one new query path
## What
Given an issue IID, find the N most semantically similar issues using the existing
vector embeddings. Show similarity score and overlapping keywords.
Can also work with MRs: `lore similar --mr <iid>`.
## Why
Duplicate detection is a constant problem on active projects. "Is this bug already
filed?" becomes a one-liner. This is the most natural use of the embedding pipeline
and the feature people expect when they hear "semantic search."
## Data Required
All exists today:
- `documents` table (source_type, source_id, content_text)
- `embeddings` virtual table (768-dim vectors via sqlite-vec)
- `embedding_metadata` (document_hash for staleness check)
## Implementation Sketch
```
1. Resolve IID → issue.id → document.id (via source_type='issue', source_id)
2. Look up embedding vector(s) for that document
3. Query sqlite-vec for K nearest neighbors (K = limit * 2 for headroom)
4. Filter to source_type='issue' (or 'merge_request' if --include-mrs)
5. Exclude self
6. Rank by cosine similarity
7. Return top N with: iid, title, project, similarity_score, url
```
### SQL Core
```sql
-- Get the embedding for target document (chunk 0 = representative)
SELECT embedding FROM embeddings WHERE rowid = ?1 * 1000;
-- Find nearest neighbors
SELECT
rowid,
distance
FROM embeddings
WHERE embedding MATCH ?1
AND k = ?2
ORDER BY distance;
-- Resolve back to entities
SELECT d.source_type, d.source_id, d.title, d.url, i.iid, i.state
FROM documents d
JOIN issues i ON d.source_id = i.id AND d.source_type = 'issue'
WHERE d.id = ?;
```
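
Between the nearest-neighbor query and the final entity lookup, the rows need to be collapsed from chunks to documents, stripped of the query document, and truncated. A sketch of that step, assuming the rowid encoding implied by `?1 * 1000` above (rowid = document id * 1000 + chunk index) and ranking by the raw distance instead of converting to a similarity score; the `Neighbor` type and function names are hypothetical.

```rust
use std::collections::HashSet;

/// One row from the nearest-neighbor query.
#[derive(Debug, Clone, Copy)]
pub struct Neighbor {
    pub rowid: i64,
    pub distance: f64,
}

/// Assumed encoding: rowid = document_id * 1000 + chunk_index.
pub fn document_id(rowid: i64) -> i64 {
    rowid / 1000
}

/// Collapse chunk-level neighbors to documents, drop the query document,
/// and return the `limit` closest document ids with their best distance.
pub fn top_documents(neighbors: &[Neighbor], self_doc: i64, limit: usize) -> Vec<(i64, f64)> {
    let mut sorted = neighbors.to_vec();
    sorted.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    let mut seen = HashSet::new();
    sorted
        .into_iter()
        .map(|n| (document_id(n.rowid), n.distance))
        // `insert` returns false for a document already kept via a closer chunk.
        .filter(|&(doc, _)| doc != self_doc && seen.insert(doc))
        .take(limit)
        .collect()
}
```

Because several chunks of one document can appear among the neighbors, fetching more rows than the final limit gives this step room to deduplicate. The source-type filter from step 4 is left to the SQL lookup that follows.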
## Robot Mode Output
```json
{
"ok": true,
"data": {
"query_issue": { "iid": 42, "title": "Login timeout on slow networks" },
"similar": [
{
"iid": 38,
"title": "Connection timeout in auth flow",
"project": "group/backend",
"similarity": 0.87,
"state": "closed",
"url": "https://gitlab.com/group/backend/-/issues/38"
}
]
},
"meta": { "elapsed_ms": 45, "candidates_scanned": 200 }
}
```
## Downsides
- Embedding quality depends on description quality; short issues may not match well
- Multi-chunk documents need aggregation strategy (use chunk 0 or average?)
- Requires embeddings to be generated first (`lore embed`)
## Extensions
- `lore similar --open-only` to filter to unresolved issues (duplicate triage)
- `lore similar --text "free text query"` to find issues similar to arbitrary text
- Batch mode: find all potential duplicate clusters across the entire database

@@ -0,0 +1,100 @@
# Stale Discussion Finder
- **Command:** `lore stale-discussions [--days <N>]`
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** low — single query, minimal formatting
## What
List unresolved, resolvable discussions where `last_note_at` is older than a
threshold (default 14 days), grouped by parent entity. Prioritize by discussion
count per entity (more stale threads = more urgent).
## Why
Unresolved discussions are silent blockers. They prevent MR merges, stall
decision-making, and represent forgotten conversations. This surfaces them so teams
can take action: resolve, respond, or explicitly mark as won't-fix.
## Data Required
All exists today:
- `discussions` (resolved, resolvable, last_note_at)
- `issues` / `merge_requests` (for parent entity context)
## Implementation Sketch
```sql
SELECT
d.id,
d.noteable_type,
CASE WHEN d.issue_id IS NOT NULL THEN i.iid ELSE mr.iid END as entity_iid,
CASE WHEN d.issue_id IS NOT NULL THEN i.title ELSE mr.title END as entity_title,
p.path_with_namespace,
d.last_note_at,
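-- ?1 = cutoff (now minus threshold) and ?2 = now, both in epoch ms;
-- measuring from the cutoff would understate staleness by the threshold
((?2 - d.last_note_at) / 86400000) as days_stale,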
COUNT(*) OVER (PARTITION BY COALESCE(d.issue_id, d.merge_request_id), d.noteable_type) as stale_count_for_entity
FROM discussions d
JOIN projects p ON d.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests mr ON d.merge_request_id = mr.id
WHERE d.resolved = 0
AND d.resolvable = 1
AND d.last_note_at < ?1
ORDER BY days_stale DESC;
```
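
The query filters on a cutoff timestamp while staleness is reported relative to the current time, so the two values are worth computing separately. A small sketch, assuming `last_note_at` is stored as epoch milliseconds (as the division by 86400000 suggests); the function names are hypothetical.

```rust
const MS_PER_DAY: i64 = 86_400_000;

/// Cutoff for the query: discussions whose last note is older than this are stale.
pub fn stale_cutoff_ms(now_ms: i64, threshold_days: i64) -> i64 {
    now_ms - threshold_days * MS_PER_DAY
}

/// Whole days since the last note, measured from `now_ms`.
/// Timestamps in the future (clock skew) count as zero days.
pub fn days_stale(now_ms: i64, last_note_at_ms: i64) -> i64 {
    (now_ms - last_note_at_ms).max(0) / MS_PER_DAY
}
```

With the default threshold of 14 days, a thread whose last note is 28 days old passes the cutoff filter and reports `28`, matching the first entry in the sample output.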
## Human Output Format
```
Stale Discussions (14+ days without activity)
group/backend !234 — Refactor auth middleware (3 stale threads)
Discussion #a1b2c3 (28d stale) "Should we use JWT or session tokens?"
Discussion #d4e5f6 (21d stale) "Error handling for expired tokens"
Discussion #g7h8i9 (14d stale) "Performance implications of per-request validation"
group/backend #90 — Rate limiting design (1 stale thread)
Discussion #j0k1l2 (18d stale) "Redis vs in-memory rate counter"
```
## Robot Mode Output
```json
{
"ok": true,
"data": {
"threshold_days": 14,
"total_stale": 4,
"entities": [
{
"type": "merge_request",
"iid": 234,
"title": "Refactor auth middleware",
"project": "group/backend",
"stale_discussions": [
{
"discussion_id": "a1b2c3",
"days_stale": 28,
"first_note_preview": "Should we use JWT or session tokens?"
}
]
}
]
}
}
```
## Downsides
- Some discussions are intentionally left open (design docs, long-running threads)
- Could produce noise in repos with loose discussion hygiene
- Doesn't distinguish "stale and blocking" from "stale and irrelevant"
## Extensions
- `lore stale-discussions --mr-only` — focus on MR review threads (most actionable)
- `lore stale-discussions --author alice` — "threads I started that went quiet"
- `lore stale-discussions --assignee bob` — "threads on my MRs that need attention"

82
docs/ideas/unlinked.md Normal file

@@ -0,0 +1,82 @@
# Unlinked MR Finder
- **Command:** `lore unlinked [--since <date>]`
- **Confidence:** 83%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — LEFT JOIN queries
## What
Two reports:
1. Merged MRs with no entity_references at all (no "closes", no "mentioned",
no "related") — orphan MRs with no issue traceability
2. Closed issues with no MR reference — issues closed manually without code change
## Why
Process compliance metric. Unlinked MRs mean lost traceability — you can't trace
a code change back to a requirement. Manually closed issues might mean work was done
outside the tracked process, or issues were closed prematurely.
## Data Required
All exists today:
- `merge_requests` (state, merged_at)
- `issues` (state, closed/updated_at)
- `entity_references` (for join/anti-join)
## Implementation Sketch
```sql
-- Orphan merged MRs (no references at all)
SELECT mr.iid, mr.title, mr.author_username, mr.merged_at,
p.path_with_namespace
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN entity_references er
ON er.source_entity_type = 'merge_request' AND er.source_entity_id = mr.id
WHERE mr.state = 'merged'
AND mr.merged_at >= ?1
AND er.id IS NULL
ORDER BY mr.merged_at DESC;
-- Closed issues with no MR reference
SELECT i.iid, i.title, i.author_username, i.updated_at,
p.path_with_namespace
FROM issues i
JOIN projects p ON i.project_id = p.id
LEFT JOIN entity_references er
ON er.target_entity_type = 'issue' AND er.target_entity_id = i.id
AND er.source_entity_type = 'merge_request'
WHERE i.state = 'closed'
AND i.updated_at >= ?1
AND er.id IS NULL
ORDER BY i.updated_at DESC;
```
## Human Output
```
Unlinked MRs (merged with no issue reference, last 30 days)
!245 Fix typo in README (alice, merged 2d ago)
!239 Update CI pipeline (bob, merged 1w ago)
!236 Bump dependency versions (charlie, merged 2w ago)
Orphan Closed Issues (closed without any MR, last 30 days)
#92 Update documentation for v2 (closed by dave, 3d ago)
#88 Investigate memory usage (closed by eve, 2w ago)
```
## Downsides
- Some MRs legitimately don't reference issues (chores, CI fixes, dependency bumps)
- Some issues are legitimately closed without code (questions, duplicates, won't-fix)
- Noise level depends on team discipline
## Extensions
- `lore unlinked --ignore-labels "chore,ci"` — filter out expected orphans
- Compliance score: % of MRs with issue links over time (trend metric)

102
docs/ideas/weekly-digest.md Normal file

@@ -0,0 +1,102 @@
# Weekly Digest Generator
- **Command:** `lore weekly [--since <date>]`
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — builds on digest infrastructure, adds markdown formatting
## What
Auto-generate a markdown document summarizing the week: MRs merged (grouped by
project), issues closed, new issues opened, ongoing discussions, milestone progress.
Formatted for pasting into Slack, email, or team standup notes.
Default window is 7 days. `--since` overrides.
## Why
Every team lead writes a weekly status update. This writes itself from the data.
Leverages everything gitlore has ingested. Saves 30-60 minutes of manual summarization
per week.
## Data Required
Same as digest (all exists today):
- `resource_state_events`, `merge_requests`, `issues`, `discussions`
- `milestones` for progress tracking
## Implementation Sketch
This is essentially `lore digest --since 7d --format markdown` with:
1. Section headers for each category
2. Milestone progress bars (X/Y issues closed)
3. "Highlights" section with the most-discussed items
4. "Risks" section with overdue issues and stale MRs
### Markdown Template
```markdown
# Weekly Summary — Jan 20-27, 2025
## Highlights
- **!234** Refactor auth middleware merged (12 discussions, 4 reviewers)
- **#95** New critical bug: Rate limiting returns 500
## Merged (3)
| MR | Title | Author | Reviewers |
|----|-------|--------|-----------|
| !234 | Refactor auth middleware | alice | bob, charlie |
| !231 | Fix connection pool leak | bob | alice |
| !45 | Update dashboard layout | eve | dave |
## Closed Issues (2)
- **#89** Login timeout on slow networks (closed by alice)
- **#87** Stale cache headers (closed by bob)
## New Issues (3)
- **#95** Rate limiting returns 500 (priority::high, assigned to charlie)
- **#94** Add rate limit documentation (priority::low)
- **#93** Flaky test in CI pipeline (assigned to dave)
## Milestone Progress
- **v2.0** — 14/20 issues closed (70%) — due Feb 15
- **v1.9-hotfix** — 3/3 issues closed (100%) — COMPLETE
## Active Discussions
- **#90** 8 new comments this week (needs-review)
- **!230** 5 review threads unresolved
```
## Robot Mode Output
```json
{
"ok": true,
"data": {
"period": { "from": "2025-01-20", "to": "2025-01-27" },
"merged_count": 3,
"closed_count": 2,
"opened_count": 3,
"highlights": [...],
"merged": [...],
"closed": [...],
"opened": [...],
"milestones": [...],
"active_discussions": [...]
}
}
```
## Downsides
- Formatting preferences vary by team; hard to please everyone
- "Highlights" ranking is heuristic (discussion count as proxy for importance)
- Doesn't capture work done outside GitLab
## Extensions
- `lore weekly --project group/backend` — single project scope
- `lore weekly --author alice` — personal weekly summary
- `lore weekly --output weekly.md` — write to file
- Scheduled generation via cron + robot mode