Ideas catalog (docs/ideas/): 25 feature concept documents covering future lore capabilities including bottleneck detection, churn analysis, expert scoring, collaboration patterns, milestone risk, knowledge silos, and more. Each doc includes motivation, implementation sketch, data requirements, and dependencies on existing infrastructure. README.md provides an overview and SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review feedback exploring decay functions, scoring algebra, and integration points with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where EntityRef was missing project context, causing ambiguous cross-project references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Knowledge Silo Detection
- Command: lore silos [--min-changes <N>]
- Confidence: 87%
- Tier: 2
- Status: proposed
- Effort: medium — requires mr_file_changes population (Gate 4)
What
For each file path (or directory), count unique MR authors. Flag paths where only 1 person has ever authored changes (bus factor = 1). Aggregate by directory to show silo areas.
Why
Bus factor analysis is critical for team resilience. If only one person has ever touched the auth module, that's a risk. This uses data already ingested to surface knowledge concentration that's otherwise invisible.
Data Required
- mr_file_changes(new_path, merge_request_id): needs Gate 4 ingestion
- merge_requests(author_username, state='merged')
- projects(path_with_namespace)
Implementation Sketch
-- Find directories with bus factor = 1
WITH file_authors AS (
    -- One row per (file change, author), restricted to merged MRs
    SELECT
        mfc.new_path,
        mr.author_username,
        p.path_with_namespace,
        mfc.project_id
    FROM mr_file_changes mfc
    JOIN merge_requests mr ON mfc.merge_request_id = mr.id
    JOIN projects p ON mfc.project_id = p.id
    WHERE mr.state = 'merged'
),
directory_authors AS (
    SELECT
        project_id,
        path_with_namespace,
        -- Extract directory: everything up to and including the last '/'.
        -- RTRIM strips trailing characters that appear in the second argument,
        -- which here is the path with every '/' removed, so it stops at the last '/'.
        CASE
            WHEN INSTR(new_path, '/') > 0
                THEN RTRIM(new_path, REPLACE(new_path, '/', ''))
            ELSE '.'
        END AS directory,
        COUNT(DISTINCT author_username) AS unique_authors,
        COUNT(*) AS total_changes,
        GROUP_CONCAT(DISTINCT author_username) AS authors
    FROM file_authors
    GROUP BY project_id, path_with_namespace, directory
)
SELECT * FROM directory_authors
WHERE unique_authors = 1
  AND total_changes >= ?1 -- min-changes threshold
ORDER BY total_changes DESC;
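To sanity-check the directory extraction and the bus-factor filter, the query can be run against an in-memory SQLite database. The following is a minimal sketch, not the project's implementation: the schema is a hypothetical reduction to the columns the query touches, the sample rows are invented for illustration, and `find_silos` and `demo_connection` are made-up helper names. It uses `RTRIM(new_path, REPLACE(new_path, '/', ''))` to get the directory prefix (trailing slash included) and a plain `?` placeholder for the threshold.

```python
import sqlite3

# Hypothetical minimal schema: only the columns the silo query reads.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY, path_with_namespace TEXT);
CREATE TABLE merge_requests (id INTEGER PRIMARY KEY, author_username TEXT, state TEXT);
CREATE TABLE mr_file_changes (merge_request_id INTEGER, project_id INTEGER, new_path TEXT);
"""

SILO_QUERY = """
WITH file_authors AS (
    SELECT mfc.new_path, mr.author_username, p.path_with_namespace, mfc.project_id
    FROM mr_file_changes mfc
    JOIN merge_requests mr ON mfc.merge_request_id = mr.id
    JOIN projects p ON mfc.project_id = p.id
    WHERE mr.state = 'merged'
)
SELECT
    path_with_namespace,
    CASE
        WHEN INSTR(new_path, '/') > 0
            THEN RTRIM(new_path, REPLACE(new_path, '/', ''))
        ELSE '.'
    END AS directory,
    GROUP_CONCAT(DISTINCT author_username) AS authors,
    COUNT(*) AS total_changes
FROM file_authors
GROUP BY project_id, path_with_namespace, directory
HAVING COUNT(DISTINCT author_username) = 1 AND COUNT(*) >= ?
ORDER BY total_changes DESC
"""


def find_silos(conn, min_changes):
    """Return (project, directory, author, total_changes) rows with bus factor 1."""
    return conn.execute(SILO_QUERY, (min_changes,)).fetchall()


def demo_connection():
    """Build an in-memory database with invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO projects VALUES (1, 'group/backend')")
    conn.executemany(
        "INSERT INTO merge_requests VALUES (?, ?, ?)",
        [
            (1, "alice", "merged"), (2, "alice", "merged"), (3, "alice", "merged"),
            (4, "bob", "merged"), (5, "carol", "merged"), (6, "bob", "opened"),
        ],
    )
    conn.executemany(
        "INSERT INTO mr_file_changes VALUES (?, 1, ?)",
        [
            (1, "src/auth/login.rs"), (2, "src/auth/token.rs"), (3, "src/auth/login.rs"),
            (4, "src/api/routes.rs"), (5, "src/api/routes.rs"),
            (6, "src/auth/login.rs"),  # unmerged MR: excluded by the state filter
            (4, "README.md"),          # no '/', so it is grouped under '.'
        ],
    )
    return conn


if __name__ == "__main__":
    for row in find_silos(demo_connection(), 3):
        print(row)
```

In this sample, `src/api/` has two merged authors and drops out, while bob's unmerged change to `src/auth/` does not dilute alice's sole authorship there.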
Human Output
Knowledge Silos (bus factor = 1, min 3 changes)

  group/backend
    src/auth/          alice    (8 changes)   HIGH RISK
    src/billing/       bob      (5 changes)   HIGH RISK
    src/utils/cache/   charlie  (3 changes)   MODERATE RISK

  group/frontend
    src/admin/         dave     (12 changes)  HIGH RISK
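A rough sketch of how rows of the shape `(project, directory, author, total_changes)` could be rendered into a report like the one above. The function name `format_silos` is hypothetical, and the risk cutoff is an assumption: the document does not define where MODERATE ends and HIGH begins, so the value 5 is inferred only from the sample (3 changes is MODERATE, 5 and up is HIGH).

```python
from collections import defaultdict

# Assumption: inferred from the sample output only, not a documented threshold.
HIGH_RISK_MIN_CHANGES = 5


def format_silos(rows, min_changes):
    """Render (project, directory, author, total_changes) rows as a text report."""
    by_project = defaultdict(list)
    for project, directory, author, changes in rows:
        by_project[project].append((directory, author, changes))

    lines = [f"Knowledge Silos (bus factor = 1, min {min_changes} changes)"]
    for project, entries in by_project.items():
        lines.append(project)
        # Most-changed directories first within each project.
        for directory, author, changes in sorted(entries, key=lambda e: -e[2]):
            risk = "HIGH RISK" if changes >= HIGH_RISK_MIN_CHANGES else "MODERATE RISK"
            lines.append(f"  {directory:<20} {author} ({changes} changes)  {risk}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        ("group/backend", "src/auth/", "alice", 8),
        ("group/backend", "src/utils/cache/", "charlie", 3),
        ("group/frontend", "src/admin/", "dave", 12),
    ]
    print(format_silos(sample, 3))
```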
Downsides
- Historical authors may have left the team; needs recency weighting
- Requires mr_file_changes to be populated (Gate 4)
- Single-author directories may be intentional (ownership model)
- Directory aggregation heuristic is imperfect for deep nesting
Extensions
- lore silos --since 180d: only count recent activity
- lore silos --depth 2: aggregate at directory depth N
- Combine with lore experts to show both silos and experts in one view
- Risk scoring: weight by directory size, change frequency, recency
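The `--depth` and `--since` ideas can be sketched outside SQL as follows. This is an illustration under stated assumptions rather than a proposed implementation: `truncate_to_depth` and `bus_factor_by_directory` are hypothetical names, and the `merged_at` timestamp is assumed to be available per change even though it is not listed under Data Required.

```python
from datetime import datetime, timedelta, timezone


def truncate_to_depth(path, depth):
    """Return the directory of `path`, cut to at most `depth` components."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    parts = path.split("/")[:-1]  # drop the file name
    if not parts:
        return "."  # top-level file
    return "/".join(parts[:depth]) + "/"


def bus_factor_by_directory(changes, depth, since_days=None, now=None):
    """Count unique authors per depth-limited directory.

    changes: iterable of (new_path, author, merged_at) with timezone-aware
    datetimes. `merged_at` is an assumed field, not part of the documented data.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=since_days) if since_days is not None else None
    authors = {}
    for path, author, merged_at in changes:
        if cutoff is not None and merged_at < cutoff:
            continue  # outside the --since window
        authors.setdefault(truncate_to_depth(path, depth), set()).add(author)
    return {directory: len(names) for directory, names in authors.items()}
```

With a recency window, a directory that had two authors historically can collapse to a bus factor of 1 when one of them has not merged a change recently, which is the case the first Downsides bullet describes.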