gitlore/docs/ideas/silos.md
Taylor Eernisse 4185abe05d docs: add feature ideas catalog, time-decay scoring plan, and timeline issue doc
Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.

Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.

Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 10:16:48 -05:00


Knowledge Silo Detection

  • Command: lore silos [--min-changes <N>]
  • Confidence: 87%
  • Tier: 2
  • Status: proposed
  • Effort: medium — requires mr_file_changes population (Gate 4)

What

For each file path (or directory), count the unique authors of merged MRs that touched it. Flag paths where only one person has ever authored merged changes (bus factor = 1), then aggregate by directory to show silo areas.
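The counting rule can be sketched in memory before any SQL exists. This assumes the merged file changes are already loaded as (project, path, author) tuples; find_silos is a hypothetical helper, not part of lore, and it aggregates by immediate parent directory only.

```python
from collections import defaultdict
from posixpath import dirname


def find_silos(changes, min_changes=1):
    """changes: (project, file_path, author) tuples, one per merged file change.
    Returns (project, directory, author, change_count) for single-author directories."""
    authors = defaultdict(set)  # (project, directory) -> distinct authors
    counts = defaultdict(int)   # (project, directory) -> file changes seen
    for project, path, author in changes:
        key = (project, dirname(path) or ".")  # top-level files go to "."
        authors[key].add(author)
        counts[key] += 1
    silos = [
        (project, directory, next(iter(names)), counts[(project, directory)])
        for (project, directory), names in authors.items()
        if len(names) == 1 and counts[(project, directory)] >= min_changes
    ]
    return sorted(silos, key=lambda s: s[3], reverse=True)  # most-changed first


changes = [
    ("group/backend", "src/auth/login.rs", "alice"),
    ("group/backend", "src/auth/token.rs", "alice"),
    ("group/backend", "src/api/routes.rs", "alice"),
    ("group/backend", "src/api/routes.rs", "bob"),
]
print(find_silos(changes, min_changes=2))  # [('group/backend', 'src/auth', 'alice', 2)]
```

src/api is not flagged because two people have merged changes there.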

Why

Bus factor analysis is critical for team resilience: if only one person has ever touched the auth module, losing that person leaves nobody who knows it. Once mr_file_changes is populated, this reuses data lore already ingests to surface knowledge concentration that is otherwise invisible.

Data Required

  • mr_file_changes (new_path, merge_request_id, project_id), which needs Gate 4 ingestion
  • merge_requests (author_username, state='merged')
  • projects (path_with_namespace)

Implementation Sketch

-- Find directories with bus factor = 1
WITH file_authors AS (
    SELECT
        mfc.new_path,
        mr.author_username,
        p.path_with_namespace,
        mfc.project_id
    FROM mr_file_changes mfc
    JOIN merge_requests mr ON mfc.merge_request_id = mr.id
    JOIN projects p ON mfc.project_id = p.id
    WHERE mr.state = 'merged'
),
directory_authors AS (
    SELECT
        project_id,
        path_with_namespace,
        -- Directory = everything up to and including the last '/'.
        -- RTRIM strips trailing characters found in the slash-free path,
        -- so it stops at the last '/' ('src/auth/login.rs' -> 'src/auth/').
        CASE
            WHEN INSTR(new_path, '/') > 0
            THEN RTRIM(new_path, REPLACE(new_path, '/', ''))
            ELSE '.'
        END as directory,
        COUNT(DISTINCT author_username) as unique_authors,
        COUNT(*) as total_changes,  -- file-level changes, not distinct MRs
        GROUP_CONCAT(DISTINCT author_username) as authors
    FROM file_authors
    GROUP BY project_id, path_with_namespace, directory
)
SELECT * FROM directory_authors
WHERE unique_authors = 1
  AND total_changes >= ?1  -- min-changes threshold
ORDER BY path_with_namespace, total_changes DESC;  -- one block per project
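To sanity-check the query, the sketch below runs it through Python's built-in sqlite3 module against a tiny in-memory fixture. The three tables carry only the columns the query reads; their definitions are assumptions, not lore's real schema. The directory is computed with RTRIM(new_path, REPLACE(new_path, '/', '')), which keeps everything up to and including the last '/'.

```python
import sqlite3

SILO_SQL = """
WITH file_authors AS (
    SELECT mfc.new_path, mr.author_username, p.path_with_namespace, mfc.project_id
    FROM mr_file_changes mfc
    JOIN merge_requests mr ON mfc.merge_request_id = mr.id
    JOIN projects p ON mfc.project_id = p.id
    WHERE mr.state = 'merged'
),
directory_authors AS (
    SELECT
        project_id,
        path_with_namespace,
        CASE WHEN INSTR(new_path, '/') > 0
             THEN RTRIM(new_path, REPLACE(new_path, '/', ''))  -- up to last '/'
             ELSE '.' END AS directory,
        COUNT(DISTINCT author_username) AS unique_authors,
        COUNT(*) AS total_changes,
        GROUP_CONCAT(DISTINCT author_username) AS authors
    FROM file_authors
    GROUP BY project_id, path_with_namespace, directory
)
SELECT path_with_namespace, directory, authors, total_changes
FROM directory_authors
WHERE unique_authors = 1 AND total_changes >= ?1
ORDER BY path_with_namespace, total_changes DESC;
"""

# Hypothetical fixture: only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, path_with_namespace TEXT);
    CREATE TABLE merge_requests (id INTEGER PRIMARY KEY, author_username TEXT, state TEXT);
    CREATE TABLE mr_file_changes (merge_request_id INTEGER, project_id INTEGER, new_path TEXT);
    INSERT INTO projects VALUES (1, 'group/backend');
    INSERT INTO merge_requests VALUES
        (10, 'alice', 'merged'), (11, 'bob', 'merged'), (12, 'bob', 'closed');
    INSERT INTO mr_file_changes VALUES
        (10, 1, 'src/auth/login.rs'), (10, 1, 'src/auth/token.rs'),
        (10, 1, 'src/auth/session.rs'), (10, 1, 'src/api/routes.rs'),
        (11, 1, 'src/api/handlers.rs'), (11, 1, 'README.md'),
        (12, 1, 'src/auth/login.rs');  -- closed MR: ignored by the query
""")

rows = conn.execute(SILO_SQL, (3,)).fetchall()  # min-changes = 3
print(rows)  # [('group/backend', 'src/auth/', 'alice', 3)]
```

bob's closed MR on src/auth/login.rs does not break alice's silo, and src/api/ drops out because it has two authors.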

Human Output

Knowledge Silos (bus factor = 1, min 3 changes)

group/backend
  src/auth/         alice (8 changes)    HIGH RISK
  src/billing/      bob (5 changes)      HIGH RISK
  src/utils/cache/  charlie (3 changes)  MODERATE RISK

group/frontend
  src/admin/        dave (12 changes)    HIGH RISK
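A possible renderer for this layout, assuming rows arrive as (project, directory, author, changes) tuples already sorted by project. The proposal does not say where HIGH ends and MODERATE begins; the cutoff of 5 changes below is a hypothetical value picked only because it reproduces the sample, and render_silos is an illustrative name.

```python
from itertools import groupby

HIGH_RISK_MIN_CHANGES = 5  # hypothetical cutoff, not defined by the proposal


def render_silos(rows, min_changes):
    """rows: (project, directory, author, changes) tuples, pre-sorted by project."""
    lines = [f"Knowledge Silos (bus factor = 1, min {min_changes} changes)"]
    # groupby only merges adjacent rows, hence the pre-sorting requirement
    for project, group in groupby(rows, key=lambda row: row[0]):
        lines += ["", project]
        for _project, directory, author, changes in group:
            risk = "HIGH RISK" if changes >= HIGH_RISK_MIN_CHANGES else "MODERATE RISK"
            label = f"{author} ({changes} changes)"
            lines.append(f"  {directory:<18}{label:<21}{risk}")
    return "\n".join(lines)


sample_rows = [
    ("group/backend", "src/auth/", "alice", 8),
    ("group/backend", "src/billing/", "bob", 5),
    ("group/backend", "src/utils/cache/", "charlie", 3),
    ("group/frontend", "src/admin/", "dave", 12),
]
print(render_silos(sample_rows, min_changes=3))
```

The fixed column widths (18 and 21) simply match the sample; a real renderer would size columns from the longest directory and label.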

Downsides

  • Historical authors may have left the team; needs recency weighting
  • Requires mr_file_changes to be populated (Gate 4)
  • Single-author directories may be intentional (ownership model)
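  • Grouping by immediate parent directory fragments deep trees: changes under src/auth/oauth/ are counted separately from src/auth/, so each piece may fall below the threshold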

Extensions

  • lore silos --since 180d: only count recent activity
  • lore silos --depth <N>: aggregate at directory depth N (e.g. --depth 2 groups everything under src/auth/)
  • Combine with lore experts to show both silos and experts in one view
  • Risk scoring: weight by directory size, change frequency, recency
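The --depth and --since extensions could be prototyped as below. This assumes each change carries a merged_at timestamp (not in the Data Required list) and that --depth N keeps the first N directory segments; directory_at_depth, recent_changes, and silos_at_depth are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def directory_at_depth(path, depth):
    """First `depth` directory segments of a file path, with a trailing '/'.
    Top-level files map to '.'; depth=2: 'src/auth/oauth/google.rs' -> 'src/auth/'."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    dirs = path.split("/")[:-1]  # drop the file name
    if not dirs:
        return "."
    return "/".join(dirs[:depth]) + "/"


def recent_changes(changes, since_days, now):
    """Keep (path, author, merged_at) changes merged within the last `since_days` days."""
    cutoff = now - timedelta(days=since_days)
    return [c for c in changes if c[2] >= cutoff]


def silos_at_depth(changes, depth, min_changes=1):
    """(directory, author, change_count) for directories with exactly one author."""
    authors, counts = defaultdict(set), defaultdict(int)
    for path, author, _merged_at in changes:
        directory = directory_at_depth(path, depth)
        authors[directory].add(author)
        counts[directory] += 1
    return sorted(
        (directory, next(iter(names)), counts[directory])
        for directory, names in authors.items()
        if len(names) == 1 and counts[directory] >= min_changes
    )


now = datetime(2026, 2, 9, tzinfo=timezone.utc)
changes = [
    ("src/auth/oauth/google.rs", "alice", now - timedelta(days=10)),
    ("src/auth/session.rs", "alice", now - timedelta(days=40)),
    ("src/auth/legacy/ldap.rs", "bob", now - timedelta(days=400)),
]
print(silos_at_depth(changes, depth=2))                             # []
print(silos_at_depth(recent_changes(changes, 180, now), depth=2))  # [('src/auth/', 'alice', 2)]
```

Here bob's year-old change hides the fact that only alice has worked in src/auth/ recently; the 180-day window exposes it, which is the recency problem listed under Downsides.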