gitlore/docs/ideas/label-audit.md
Taylor Eernisse 4185abe05d docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc
Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:48 -05:00

Label Hygiene Audit

  • Command: lore label-audit
  • Confidence: 82%
  • Tier: 2
  • Status: proposed
  • Effort: low — straightforward aggregation queries

What

Report on label health:

  • Labels used only once (may be typos or abandoned experiments)
  • Labels applied and removed within 1 hour (likely mistakes)
  • Labels with no active issues/MRs (orphaned)
  • Label name collisions across projects (same name, different meaning)
  • Labels never used at all (defined but not applied)

Why

Label sprawl makes filtering less useful over time: teams create labels ad hoc and rarely clean them up. This audit surfaces the cleanup candidates so maintainers can act on them.

Data Required

All of the required data exists today:

  • labels (name, project_id)
  • issue_labels / mr_labels (usage counts)
  • resource_label_events (add/remove pairs for mistake detection)
  • issues / merge_requests (state for "active" filtering)

Implementation Sketch

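-- Labels used only once (exactly one issue or MR in total)
SELECT l.name, p.path_with_namespace,
       COUNT(DISTINCT il.issue_id) + COUNT(DISTINCT ml.merge_request_id) AS usage
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
GROUP BY l.id, l.name, p.path_with_namespace
-- DISTINCT keeps the two LEFT JOINs from multiplying each other's counts
HAVING COUNT(DISTINCT il.issue_id) + COUNT(DISTINCT ml.merge_request_id) = 1;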

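-- Flash labels (applied and removed within 1 hour)
-- Timestamps are treated as epoch milliseconds: 60000 ms = 1 minute, 3600000 ms = 1 hour
SELECT
    rle1.label_name,
    rle1.created_at AS added_at,
    rle2.created_at AS removed_at,
    (rle2.created_at - rle1.created_at) / 60000 AS minutes_active
FROM resource_label_events rle1
JOIN resource_label_events rle2
    -- Pair events on the same issue or the same MR. The merge_request_id
    -- column is an assumption; joining on issue_id alone would skip MR events.
    ON (rle1.issue_id = rle2.issue_id
        OR rle1.merge_request_id = rle2.merge_request_id)
    AND rle1.label_name = rle2.label_name
    AND rle1.action = 'add'
    AND rle2.action = 'remove'
    AND rle2.created_at > rle1.created_at
    AND (rle2.created_at - rle1.created_at) < 3600000;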

-- Unused labels (defined but never applied)
SELECT l.name, p.path_with_namespace
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
WHERE il.issue_id IS NULL AND ml.merge_request_id IS NULL;
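
The queries above cover three of the five checks; orphaned labels and cross-project collisions have no query yet. The following is a minimal sketch of how those two might look, wrapped in Python's `sqlite3` so it can be run against an in-memory database. Everything outside the SQL idea itself is an assumption: the reduced schema, the `state = 'opened'` value (borrowed from GitLab's naming), and the helper names `demo_db`, `find_orphaned`, and `find_collisions` are hypothetical and may not match lore's real tables.

```python
import sqlite3

# Hypothetical minimal schema: only the columns these two queries touch.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, path_with_namespace TEXT);
CREATE TABLE labels (id INTEGER PRIMARY KEY, project_id INTEGER, name TEXT);
CREATE TABLE issues (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE merge_requests (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE issue_labels (issue_id INTEGER, label_id INTEGER);
CREATE TABLE mr_labels (merge_request_id INTEGER, label_id INTEGER);
"""

# Orphaned: applied at least once, but to no open issue or MR.
# 'opened' as the active state is an assumption.
ORPHANED_SQL = """
SELECT l.name, p.path_with_namespace
FROM labels l
JOIN projects p ON p.id = l.project_id
WHERE (EXISTS (SELECT 1 FROM issue_labels il WHERE il.label_id = l.id)
    OR EXISTS (SELECT 1 FROM mr_labels ml WHERE ml.label_id = l.id))
  AND NOT EXISTS (
      SELECT 1 FROM issue_labels il
      JOIN issues i ON i.id = il.issue_id
      WHERE il.label_id = l.id AND i.state = 'opened')
  AND NOT EXISTS (
      SELECT 1 FROM mr_labels ml
      JOIN merge_requests m ON m.id = ml.merge_request_id
      WHERE ml.label_id = l.id AND m.state = 'opened')
ORDER BY p.path_with_namespace, l.name
"""

# Collisions: the same label name defined in more than one project.
COLLISIONS_SQL = """
SELECT l.name, COUNT(DISTINCT l.project_id) AS project_count
FROM labels l
GROUP BY l.name
HAVING COUNT(DISTINCT l.project_id) > 1
ORDER BY l.name
"""


def demo_db():
    """Build an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO projects VALUES (?, ?)",
                     [(1, "group/backend"), (2, "group/frontend")])
    conn.executemany("INSERT INTO labels VALUES (?, ?, ?)",
                     [(1, 1, "needs-review"), (2, 2, "needs-review"),
                      (3, 1, "legacy"), (4, 1, "never-applied")])
    conn.executemany("INSERT INTO issues VALUES (?, ?)",
                     [(10, "opened"), (11, "closed")])
    conn.executemany("INSERT INTO merge_requests VALUES (?, ?)",
                     [(20, "merged")])
    conn.executemany("INSERT INTO issue_labels VALUES (?, ?)",
                     [(10, 1), (11, 3)])
    conn.executemany("INSERT INTO mr_labels VALUES (?, ?)",
                     [(20, 2), (20, 3)])
    return conn


def find_orphaned(conn):
    return conn.execute(ORPHANED_SQL).fetchall()


def find_collisions(conn):
    return conn.execute(COLLISIONS_SQL).fetchall()


if __name__ == "__main__":
    db = demo_db()
    print(find_orphaned(db))
    print(find_collisions(db))
```

The orphaned query uses `EXISTS` subqueries rather than stacked `LEFT JOIN`s so that issue and MR rows cannot multiply each other. It also excludes labels that were never applied, since the unused-labels query already reports those.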

Human Output

Label Audit

Unused Labels (4):
  group/backend:  deprecated-v1, needs-triage, wontfix-maybe
  group/frontend: old-design

Single-Use Labels (3):
  group/backend:  perf-regression (1 issue)
  group/frontend: ux-debt (1 MR), mobile-only (1 issue)

Flash Labels (applied < 1hr, 2):
  group/backend #90: +priority::critical then -priority::critical (12 min)
  group/backend #85: +blocked then -blocked (5 min)

Cross-Project Collisions (1):
  "needs-review" used in group/backend (32 uses) AND group/frontend (8 uses)

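The grouped, column-aligned layout shown above could be produced by a small formatter. This is only a sketch of one way to do it: `render_section` and its `(project_path, label_text)` row shape are hypothetical and not part of lore.

```python
from collections import defaultdict


def render_section(title, rows):
    """Render one audit section.

    rows is a list of (project_path, label_text) tuples; labels are
    grouped per project and the project column is padded to align.
    """
    by_project = defaultdict(list)
    for project, label in rows:
        by_project[project].append(label)

    lines = [f"{title} ({len(rows)}):"]
    # Width of the longest "project:" prefix, including the colon.
    width = max((len(p) for p in by_project), default=0) + 1
    for project in sorted(by_project):
        prefix = (project + ":").ljust(width)
        lines.append(f"  {prefix} {', '.join(by_project[project])}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_section("Unused Labels", [
        ("group/backend", "deprecated-v1"),
        ("group/backend", "needs-triage"),
        ("group/backend", "wontfix-maybe"),
        ("group/frontend", "old-design"),
    ]))
```

Run with the sample rows, this reproduces the "Unused Labels" block from the output above.
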
Downsides

  • Low glamour; this is janitorial work
  • Single-use labels may be legitimate (one-off categorization)
  • Cross-project collisions may be intentional (shared vocabulary)

Extensions

  • lore label-audit --fix — suggest deletions for unused labels
  • Trend: label count over time (is sprawl increasing?)