docs: add proposed code file reorganization plan

Planning document for the ongoing test extraction and code organization effort. Covers module-by-module analysis, proposed file splits, and a phased execution plan.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 10:54:56 -05:00
parent 48fbd4bfdb
commit 11fe02fac9
1 changed files with 425 additions and 0 deletions
--- a/PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md
+++ b/PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md
@@ -0,0 +1,425 @@
 # Proposed Code File Reorganization Plan
 ## Executive Summary
 The codebase is 79 Rust source files / 46K lines across 7 top-level modules. Most modules (`gitlab/`, `embedding/`, `search/`, `documents/`, `ingestion/`) are well-organized. The pain points are:
 1. **`core/` is a grab-bag** — 22 files mixing infrastructure, domain logic, DB operations, and an entire timeline pipeline
 2. **`main.rs` is 2713 lines** — ~30 handler functions that bridge CLI args to commands
 3. **`cli/mod.rs` is 949 lines** — every clap argument struct is packed into one file
 4. **Giant command files** — `who.rs` (6067 lines), `list.rs` (2931 lines) are unwieldy
 This plan is organized into **three tiers** based on impact-to-risk ratio. Tier 1 changes are "no-brainers" — they reduce confusion with minimal import churn. Tier 2 changes are valuable but involve more cross-cutting import updates. Tier 3 changes are "maybe later" — they'd be nice but the juice might not be worth the squeeze right now.
 ---
 ## Current Structure (Annotated)
 ```
 src/
 ├── main.rs              (2713 lines) ← dispatch + ~30 handler functions + error helpers
 ├── lib.rs               (9 lines)
 ├── cli/
 │   ├── mod.rs           (949 lines)  ← ALL clap arg structs crammed here
 │   ├── autocorrect.rs   (945 lines)
 │   ├── progress.rs      (92 lines)
 │   ├── robot.rs         (111 lines)
 │   └── commands/
 │       ├── mod.rs       (50 lines) — re-exports
 │       ├── auth_test.rs
 │       ├── count.rs     (406 lines)
 │       ├── doctor.rs    (576 lines)
 │       ├── drift.rs     (642 lines)
 │       ├── embed.rs
 │       ├── generate_docs.rs (320 lines)
 │       ├── ingest.rs    (1064 lines)
 │       ├── init.rs      (174 lines)
 │       ├── list.rs      (2931 lines) ← handles issues, MRs, AND notes listing
 │       ├── search.rs    (418 lines)
 │       ├── show.rs      (1377 lines)
 │       ├── stats.rs     (505 lines)
 │       ├── sync_status.rs (454 lines)
 │       ├── sync.rs      (576 lines)
 │       ├── timeline.rs  (488 lines)
 │       └── who.rs       (6067 lines) ← 5 sub-modes: expert, workload, active, overlap, reviews
 ├── core/
 │   ├── mod.rs           (25 lines)
 │   ├── backoff.rs       ← retry logic (used by ingestion)
 │   ├── config.rs        (789 lines) ← configuration types
 │   ├── db.rs            (970 lines) ← connection + 22 migrations
 │   ├── dependent_queue.rs (330 lines) ← job queue (used by ingestion orchestrator)
 │   ├── error.rs         (295 lines) ← error enum + exit codes
 │   ├── events_db.rs     (199 lines) ← resource event upserts (used by ingestion)
 │   ├── lock.rs          (228 lines) ← filesystem sync lock
 │   ├── logging.rs       (179 lines) ← tracing filter builders
 │   ├── metrics.rs       (566 lines) ← tracing-based stage timing
 │   ├── note_parser.rs   (563 lines) ← cross-ref extraction from note bodies
 │   ├── paths.rs         ← config/db/log file path resolution
 │   ├── payloads.rs      (204 lines) ← raw JSON payload storage
 │   ├── project.rs       (274 lines) ← fuzzy project resolution from DB
 │   ├── references.rs    (551 lines) ← entity cross-reference extraction
 │   ├── shutdown.rs      ← graceful shutdown via tokio signal
 │   ├── sync_run.rs      (218 lines) ← sync run recording to DB
 │   ├── time.rs          ← time conversion utilities
 │   ├── timeline.rs      (284 lines) ← timeline types + EntityRef
 │   ├── timeline_collect.rs (695 lines) ← Stage 4: collect events from DB
 │   ├── timeline_expand.rs (557 lines) ← Stage 3: expand via cross-refs
 │   └── timeline_seed.rs (552 lines) ← Stage 1: FTS search seeding
 ├── documents/           ← well-organized, 3 focused files
 ├── embedding/           ← well-organized, 6 focused files
 ├── gitlab/              ← well-organized, with transformers/ subdir
 ├── ingestion/           ← well-organized, 8 focused files
 └── search/              ← well-organized, 5 focused files
 ```
 ---
 ## Tier 1: No-Brainers (Do First)
 ### 1.1 Extract `timeline/` from `core/`
 **What:** Move the 4 timeline files into their own top-level module `src/timeline/`.
 **Current location:**
 - `core/timeline.rs` (284 lines) — types: `EntityRef`, `ExpandedEntityRef`, `TimelineEvent`, `TimelineEventType`, etc.
 - `core/timeline_seed.rs` (552 lines) — Stage 1: FTS-based seeding
 - `core/timeline_expand.rs` (557 lines) — Stage 3: cross-reference expansion
 - `core/timeline_collect.rs` (695 lines) — Stage 4: event collection from DB
 **New structure:**
 ```
 src/timeline/
 ├── mod.rs       ← types (from timeline.rs) + re-exports
 ├── seed.rs      ← from timeline_seed.rs
 ├── expand.rs    ← from timeline_expand.rs
 └── collect.rs   ← from timeline_collect.rs
 ```
 **Rationale:** These 4 files hold the shared types plus Stages 1, 3, and 4 of a cohesive 5-stage pipeline (SEED→HYDRATE→EXPAND→COLLECT→RENDER). They have nothing to do with "core" infrastructure like `db.rs`, `config.rs`, or `error.rs`. They import only from `core::error`, `core::time`, and `search::fts`, all of which remain reachable as `crate::core::*` and `crate::search::*` after the move.
 **Import changes needed:**
 - `cli/commands/timeline.rs`: `use crate::core::timeline::*` → `use crate::timeline::*`, same for `timeline_seed`, `timeline_expand`, `timeline_collect`
 - `core/mod.rs`: remove the 4 `pub mod timeline*` lines
 - `lib.rs`: add `pub mod timeline;`
 **Risk: LOW** — Only 1 consumer (`cli/commands/timeline.rs`) + internal cross-references between the 4 files.
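The layout above can be sketched in a single file, with inline modules standing in for the separate files. This is a minimal sketch under stated assumptions: the names `EntityRef`, `seed_timeline`, `expand_timeline`, and `collect_events` come from this plan, but the struct fields, function signatures, and bodies are hypothetical placeholders.

```rust
// Single-file sketch of the proposed `src/timeline/` layout. Inline modules
// stand in for the separate files, and every function body is a placeholder.
pub mod timeline {
    // mod.rs: shared types moved from core/timeline.rs (fields are hypothetical).
    #[derive(Debug, Clone, PartialEq)]
    pub struct EntityRef {
        pub kind: String,
        pub iid: i64,
    }

    pub mod seed {
        // Under core/ this was `use super::timeline::*`; the types now sit in the parent.
        use super::EntityRef;

        // Stage 1 placeholder: the real function seeds from an FTS search.
        pub fn seed_timeline(query: &str) -> Vec<EntityRef> {
            vec![EntityRef { kind: query.to_string(), iid: 1 }]
        }
    }

    pub mod expand {
        use super::EntityRef;

        // Stage 3 placeholder: pretends each seed references one neighbour.
        pub fn expand_timeline(seeds: &[EntityRef]) -> Vec<EntityRef> {
            let mut out = seeds.to_vec();
            for s in seeds {
                out.push(EntityRef { kind: s.kind.clone(), iid: s.iid + 1 });
            }
            out
        }
    }

    pub mod collect {
        use super::EntityRef;

        // Stage 4 placeholder: returns how many entities would be queried.
        pub fn collect_events(entities: &[EntityRef]) -> usize {
            entities.len()
        }
    }

    // Re-exports keep the short `crate::timeline::seed_timeline` form available.
    pub use self::collect::collect_events;
    pub use self::expand::expand_timeline;
    pub use self::seed::seed_timeline;
}
```

With the re-exports in `mod.rs`, the one consumer can use either the short path (`timeline::collect_events`) or the explicit one (`timeline::collect::collect_events`), so the choice of import style in `cli/commands/timeline.rs` can be made independently of the move.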
 ---
 ### 1.2 Extract `xref/` (cross-reference extraction) from `core/`
 **What:** Move `note_parser.rs` and `references.rs` into `src/xref/`.
 **Current location:**
 - `core/note_parser.rs` (563 lines) — parses note bodies for "mentioned in group/repo#123" patterns, persists to `note_cross_references` table
 - `core/references.rs` (551 lines) — extracts entity references from state events and closing MRs, writes to `entity_references` table
 **New structure:**
 ```
 src/xref/
 ├── mod.rs           ← re-exports
 ├── note_parser.rs   ← from core/note_parser.rs
 └── references.rs    ← from core/references.rs
 ```
 **Rationale:** These files implement a specific domain concept — extracting and persisting cross-references between issues and MRs. They are not "core infrastructure." They're consumed by `ingestion/orchestrator.rs` for the cross-reference extraction phase, and the data they produce is consumed by the timeline pipeline. Putting them in their own module makes the data flow clearer: `ingestion → xref → timeline`.
 **Import changes needed:**
 - `ingestion/orchestrator.rs`: `use crate::core::references::*` → `use crate::xref::references::*`
 - `ingestion/orchestrator.rs`: `use crate::core::note_parser::*` (if used directly — needs verification) → `use crate::xref::*`
 - `core/mod.rs`: remove `pub mod note_parser; pub mod references;`
 - `lib.rs`: add `pub mod xref;`
 - Internal: the files use `super::error::Result` and `super::time::now_ms` which become `crate::core::error::Result` and `crate::core::time::now_ms`
 **Risk: LOW** — 2-3 consumers at most. The files already use `super::` internally which just needs updating to `crate::core::`.
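The `super::` to `crate::core::` rewrite can be illustrated with a small single-file sketch. Only the module names, `Result`, and `now_ms` are taken from this plan; `LoreError`, `CrossRef`, and `parse_mention` are hypothetical stand-ins, and the toy parser is not the real extraction logic.

```rust
// Stand-ins for the infrastructure that stays in `core/`.
pub mod core {
    pub mod error {
        // Hypothetical error type; the real enum lives in core/error.rs.
        #[derive(Debug)]
        pub struct LoreError(pub String);
        pub type Result<T> = ::std::result::Result<T, LoreError>;
    }

    pub mod time {
        use std::time::{SystemTime, UNIX_EPOCH};

        // Milliseconds since the Unix epoch (0 if the clock is before the epoch).
        pub fn now_ms() -> i64 {
            SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .map(|d| d.as_millis() as i64)
                .unwrap_or(0)
        }
    }
}

pub mod xref {
    pub mod note_parser {
        // Inside core/ these were `super::error::Result` and `super::time::now_ms`.
        // After the move, `super` is `xref`, so the paths must be absolute.
        use crate::core::error::{LoreError, Result};
        use crate::core::time::now_ms;

        // Hypothetical record type for one extracted cross-reference.
        #[derive(Debug, PartialEq)]
        pub struct CrossRef {
            pub target: String,
            pub seen_at_ms: i64,
        }

        // Toy parser for bodies of the form "mentioned in group/repo#123".
        pub fn parse_mention(body: &str) -> Result<CrossRef> {
            let target = body
                .strip_prefix("mentioned in ")
                .ok_or_else(|| LoreError(format!("no cross-reference in: {}", body)))?;
            Ok(CrossRef { target: target.to_string(), seen_at_ms: now_ms() })
        }
    }
}
```

Because `crate::core::...` paths are valid from any module, switching to them before the move (while the files are still in `core/`) would let the file relocation itself be a pure rename.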
 ---
 ## Tier 2: Good Improvements (Do After Tier 1)
 ### 2.1 Group ingestion-adjacent DB operations
 **What:** Move `events_db.rs`, `dependent_queue.rs`, `payloads.rs`, and `sync_run.rs` from `core/` into `ingestion/`, since all four exist to support the ingestion pipeline (even where their direct callers are CLI commands).
 **Current consumers:**
 - `events_db.rs` → only used by `cli/commands/count.rs` (for event counts)
 - `dependent_queue.rs` → only used by `ingestion/orchestrator.rs` and `main.rs` (to release locked jobs)
 - `payloads.rs` → only used by `ingestion/discussions.rs`, `ingestion/issues.rs`, `ingestion/merge_requests.rs`, `ingestion/mr_discussions.rs`
 - `sync_run.rs` → only used by `cli/commands/sync.rs` and `cli/commands/sync_status.rs`
 **New structure:**
 ```
 src/ingestion/
 ├── (existing files...)
 ├── events_db.rs       ← from core/events_db.rs
 ├── dependent_queue.rs ← from core/dependent_queue.rs
 ├── payloads.rs        ← from core/payloads.rs
 └── sync_run.rs        ← from core/sync_run.rs
 ```
 **Rationale:** All 4 files exist to support the ingestion pipeline:
 - `events_db.rs` upserts resource state/label/milestone events fetched during ingestion
 - `dependent_queue.rs` manages the job queue that drives incremental discussion fetching
 - `payloads.rs` stores the raw JSON payloads fetched from GitLab
 - `sync_run.rs` records when syncs start/finish and their metrics
 When you're looking for "how does ingestion work?", you'd naturally look in `ingestion/`. Having these scattered in `core/` requires knowing the hidden dependency.
 **Import changes needed:**
 - `events_db.rs`: 1 consumer in `cli/commands/count.rs` changes from `crate::core::events_db` → `crate::ingestion::events_db`
 - `dependent_queue.rs`: 2 consumers — `ingestion/orchestrator.rs` (becomes `super::dependent_queue`) and `main.rs`
 - `payloads.rs`: 4 consumers in `ingestion/*.rs` (become `super::payloads`)
 - `sync_run.rs`: 2 consumers in `cli/commands/sync.rs` and `sync_status.rs`
 - Internal references change from `super::error` / `super::time` to `crate::core::error` / `crate::core::time`
 **Risk: MEDIUM** — More import changes, but all straightforward. The internal `super::` references need the most attention.
 **Alternatively:** If moving feels like too much churn, a lighter option is to create `core/ingestion_db.rs` that re-exports from these 4 files, making the grouping visible without moving files. But I think the move is cleaner.
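For comparison, the lighter re-export option might look roughly like the sketch below. The four module names and `release_all_locked_jobs` come from this plan; every other function, all signatures, and the `cli` consumer are hypothetical placeholders.

```rust
pub mod core {
    pub mod events_db {
        // Placeholder; the real function would query the DB.
        pub fn count_events() -> usize {
            3
        }
    }

    pub mod dependent_queue {
        // Name taken from the plan; the signature is an assumption.
        pub fn release_all_locked_jobs() -> usize {
            0
        }
    }

    pub mod payloads {
        // Placeholder that reports the payload size instead of storing it.
        pub fn store_payload(json: &str) -> usize {
            json.len()
        }
    }

    pub mod sync_run {
        // Placeholder for recording the start of a sync run.
        pub fn record_start() -> &'static str {
            "running"
        }
    }

    // core/ingestion_db.rs: groups the four modules without moving any file.
    pub mod ingestion_db {
        pub use super::dependent_queue;
        pub use super::events_db;
        pub use super::payloads;
        pub use super::sync_run;
    }
}

pub mod cli {
    // A consumer can opt into the grouped path...
    pub fn event_count() -> usize {
        crate::core::ingestion_db::events_db::count_events()
    }

    // ...while existing imports keep compiling unchanged.
    pub fn event_count_old_path() -> usize {
        crate::core::events_db::count_events()
    }
}
```

The trade-off is visible here: the facade adds a second valid path to every item rather than replacing the old one, so the grouping is documented but not enforced.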
 ---
 ### 2.2 Split `cli/mod.rs` — move arg structs to their command files
 **What:** Move each `*Args` struct from `cli/mod.rs` into the corresponding `cli/commands/*.rs` file. Keep `Cli` struct, `Commands` enum, and `detect_robot_mode_from_env()` in `cli/mod.rs`.
 **Currently `cli/mod.rs` (949 lines) contains:**
 - `Cli` struct (81 lines) — the root clap parser
 - `Commands` enum (193 lines) — all subcommand variants
 - `IssuesArgs` (86 lines) → move to `commands/list.rs` or stay near issues handling
 - `MrsArgs` (93 lines) → move to `commands/list.rs` or stay near MRs handling
 - `NotesArgs` (99 lines) → move to `commands/list.rs`
 - `IngestArgs` (33 lines) → move to `commands/ingest.rs`
 - `StatsArgs` (19 lines) → move to `commands/stats.rs`
 - `SearchArgs` (58 lines) → move to `commands/search.rs`
 - `GenerateDocsArgs` (9 lines) → move to `commands/generate_docs.rs`
 - `SyncArgs` (39 lines) → move to `commands/sync.rs`
 - `EmbedArgs` (15 lines) → move to `commands/embed.rs`
 - `TimelineArgs` (53 lines) → move to `commands/timeline.rs`
 - `WhoArgs` (76 lines) → move to `commands/who.rs`
 - `CountArgs` (9 lines) → move to `commands/count.rs`
 **After refactoring, `cli/mod.rs` shrinks to ~300 lines** (just `Cli` + `Commands` + the inlined variants like `Init`, `Drift`, `Backup`, `Reset`).
 **Rationale:** When adding a new flag to the `who` command, you currently have to edit three files: `cli/mod.rs` (the args struct), `cli/commands/who.rs` (the implementation), and `main.rs` (the dispatch). If the args struct lives in `commands/who.rs`, you only need two. Keeping each args struct next to its command is a common pattern in clap-based Rust CLIs.
 **Import changes needed:**
 - `main.rs` currently does `use lore::cli::{..., WhoArgs, ...}` — these would become `use lore::cli::commands::{..., WhoArgs, ...}` or the `commands/mod.rs` re-exports them
 - Each `commands/*.rs` gets its own `#[derive(Parser)]` struct
 - `Commands` enum in `cli/mod.rs` keeps using the types but imports from `commands::*`
 **Risk: MEDIUM** — Lots of `use` path changes in `main.rs`, but purely mechanical. No logic changes.
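A std-only sketch of the resulting ownership is shown below. The clap derives are deliberately left out so the snippet compiles without dependencies; in the real code each struct would keep its existing clap attributes. The struct fields, `run_who`, `run_count`, and `dispatch` are hypothetical.

```rust
pub mod cli {
    pub mod commands {
        pub mod who {
            // The args struct now lives next to the code that consumes it.
            // (clap derive omitted; fields are hypothetical.)
            #[derive(Debug, Default, PartialEq)]
            pub struct WhoArgs {
                pub path: Option<String>,
                pub limit: usize,
            }

            pub fn run_who(args: &WhoArgs) -> String {
                format!("who path={:?} limit={}", args.path, args.limit)
            }
        }

        pub mod count {
            #[derive(Debug, Default, PartialEq)]
            pub struct CountArgs {
                pub entity: String,
            }

            pub fn run_count(args: &CountArgs) -> String {
                format!("count {}", args.entity)
            }
        }

        // commands/mod.rs re-exports the structs so import paths stay short.
        pub use self::count::CountArgs;
        pub use self::who::WhoArgs;
    }

    // cli/mod.rs keeps only the top-level enum and imports the arg types.
    use self::commands::{CountArgs, WhoArgs};

    #[derive(Debug, PartialEq)]
    pub enum Commands {
        Who(WhoArgs),
        Count(CountArgs),
    }

    // Stand-in for the dispatch that currently lives in main.rs.
    pub fn dispatch(cmd: &Commands) -> String {
        match cmd {
            Commands::Who(args) => commands::who::run_who(args),
            Commands::Count(args) => commands::count::run_count(args),
        }
    }
}
```

If `commands/mod.rs` re-exports the structs as sketched, the change in `main.rs` reduces to swapping `cli::{...}` for `cli::commands::{...}` in the import list.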
 ---
 ## Tier 3: Consider Later
 ### 3.1 Split `main.rs` (2713 lines)
 **The problem:** `main.rs` contains `main()`, ~30 `handle_*` functions, error handling, clap error formatting, fuzzy command matching, and the `robot-docs` JSON manifest (a 400+ line inline JSON literal).
 **Possible approach:**
 - Extract `handle_*` functions into `cli/dispatch.rs` (the routing layer)
 - Extract error handling into `cli/errors.rs`
 - Extract `handle_robot_docs` + the JSON manifest into `cli/robot_docs.rs`
 - Keep `main()` in `main.rs` at ~150 lines (just the tracing setup + dispatch call)
 **Why Tier 3:** This is the messiest split. The handler functions depend on the `cli::commands::*` functions AND the `cli::robot::*` helpers AND direct `std::process::exit` calls. Making this work cleanly requires careful thought about the error boundary between `main.rs` (binary) and `lib.rs` (library).
 **Risk: HIGH** — Every handler function touches `robot_mode`, constructs its own timer, opens the DB, and manages error display. The boilerplate is high but consistent, so splitting would just move it around without reducing complexity.
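One possible shape for that error boundary, offered only as a sketch: handlers return a `Result` and never exit, and a single function maps the result to an exit code. `CliError`, `handle_count`, `exit_code`, and the specific code `2` are all hypothetical and do not reflect the current `main.rs`.

```rust
use std::fmt;

// Hypothetical error type carrying the exit code a handler wants to report.
#[derive(Debug, PartialEq)]
pub struct CliError {
    pub code: i32,
    pub message: String,
}

impl fmt::Display for CliError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "{} (exit {})", self.message, self.code)
    }
}

// Handler shape for a hypothetical `cli/dispatch.rs`: it returns a Result
// and contains no `std::process::exit` call.
pub fn handle_count(robot_mode: bool, entity: &str) -> Result<String, CliError> {
    if entity.is_empty() {
        return Err(CliError { code: 2, message: "missing entity".to_string() });
    }
    if robot_mode {
        Ok(format!("{{\"entity\":\"{}\"}}", entity))
    } else {
        Ok(format!("entity: {}", entity))
    }
}

// The single place that prints the outcome and chooses the exit code.
pub fn exit_code(result: &Result<String, CliError>, robot_mode: bool) -> i32 {
    match result {
        Ok(out) => {
            println!("{}", out);
            0
        }
        Err(err) => {
            if robot_mode {
                eprintln!("{{\"error\":\"{}\"}}", err.message);
            } else {
                eprintln!("error: {}", err);
            }
            err.code
        }
    }
}
```

Under this shape, `main()` would end with something like `std::process::exit(exit_code(&result, robot_mode))`, keeping the process exit in the binary and the handlers testable from the library side.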
 ---
 ### 3.2 Split `cli/commands/who.rs` (6067 lines)
 **The problem:** This file implements 5 distinct modes (expert, workload, active, overlap, reviews), each with its own query, scoring model, and output formatting. It also includes the time-decay scoring model (~500 lines) and per-MR detail breakdown logic.
 **Possible split:**
 ```
 src/cli/commands/who/
 ├── mod.rs         ← WhoRun dispatcher, shared types
 ├── expert.rs      ← expert mode (path-based file expertise lookup)
 ├── workload.rs    ← workload mode (user's assigned issues/MRs)
 ├── active.rs      ← active discussions mode
 ├── overlap.rs     ← file overlap between users
 ├── reviews.rs     ← review pattern analysis
 └── scoring.rs     ← time-decay expert scoring model
 ```
 **Why Tier 3:** The 5 modes share many helper functions, database connection patterns, and output formatting logic. Splitting would require carefully identifying the shared helpers and deciding where they live. The file is big but internally consistent — the modes use a shared dispatcher pattern and common types.
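The shared-helper question can be made concrete with a small sketch of the split. Everything here is an assumption for illustration: `WhoMode`, the `run` functions, and especially the half-life decay formula, which stands in for the real scoring model without claiming to match it.

```rust
pub mod who {
    // mod.rs: shared types and the dispatcher (enum name is hypothetical).
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub enum WhoMode {
        Expert,
        Workload,
        Active,
        Overlap,
        Reviews,
    }

    pub mod scoring {
        // Assumed model: a contribution's weight halves every `half_life_days`.
        pub fn decay(age_days: f64, half_life_days: f64) -> f64 {
            0.5f64.powf(age_days / half_life_days)
        }
    }

    pub mod expert {
        // A mode module pulls the shared helper from its sibling.
        use super::scoring::decay;

        pub fn run(age_days: f64) -> String {
            format!("expert score={:.2}", decay(age_days, 30.0))
        }
    }

    pub mod workload {
        pub fn run() -> String {
            "workload".to_string()
        }
    }

    // Dispatcher: routes each mode to its own file.
    pub fn run(mode: WhoMode, age_days: f64) -> String {
        match mode {
            WhoMode::Expert => expert::run(age_days),
            WhoMode::Workload => workload::run(),
            other => format!("{:?}: not sketched", other),
        }
    }
}
```

The sketch shows why the shared pieces need to be identified first: any helper used by more than one mode (here `scoring::decay`) has to land in `mod.rs` or a sibling module before the mode files can be separated.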
 ---
 ### 3.3 Split `cli/commands/list.rs` (2931 lines)
 **The problem:** This file handles issue listing, MR listing, AND note listing — three related but distinct operations with separate query builders, output formatters, and test suites.
 **Possible split:**
 ```
 src/cli/commands/
 ├── list_issues.rs   ← issue listing + query builder
 ├── list_mrs.rs      ← MR listing + query builder
 ├── list_notes.rs    ← note listing + query builder
 └── list.rs          ← shared types (ListFilters, etc.) + re-exports
 ```
 **Why Tier 3:** Same issue as `who.rs` — the three listing modes share query building patterns, field selection logic, and sorting code. Splitting requires identifying and extracting the shared pieces first.
 ---
 ## Files NOT Recommended to Move
 These files belong exactly where they are:
 | File | Why it belongs in `core/` |
 |------|--------------------------|
 | `config.rs` | Config types used by nearly everything |
 | `db.rs` | Database connection + migrations — foundational |
 | `error.rs` | Error types used by every module |
 | `paths.rs` | File path resolution — infrastructure |
 | `logging.rs` | Tracing setup — infrastructure |
 | `lock.rs` | Filesystem sync lock — infrastructure |
 | `shutdown.rs` | Graceful shutdown signal — infrastructure |
 | `backoff.rs` | Retry math — infrastructure |
 | `time.rs` | Time conversion — used everywhere |
 | `metrics.rs` | Tracing metrics layer — infrastructure |
 | `project.rs` | Fuzzy project resolution — used by 8+ consumers across modules |
 These files are legitimate "core infrastructure" used across multiple modules. Moving them would create import churn with no clarity gain.
 ---
 ## Files NOT Recommended to Split/Merge
 | File | Why leave it alone |
 |------|-------------------|
 | `documents/extractor.rs` (2341 lines) | One cohesive extractor per entity type — the size comes from per-type formatting logic, not mixed concerns |
 | `ingestion/orchestrator.rs` (1703 lines) | Single orchestration flow — splitting would scatter the pipeline |
 | `gitlab/graphql.rs` (1293 lines) | GraphQL client with adaptive paging — cohesive |
 | `gitlab/client.rs` (851 lines) | REST client with all endpoints — cohesive |
 | `cli/autocorrect.rs` (945 lines) | Correction registry + fuzzy matching — splitting gains nothing |
 ---
 ## Proposed Final Structure (Tiers 1+2)
 ```
 src/
 ├── main.rs              (2713 lines — unchanged for now)
 ├── lib.rs               (adds: pub mod timeline; pub mod xref;)
 ├── cli/
 │   ├── mod.rs           (~300 lines — Cli + Commands only, args moved out)
 │   ├── autocorrect.rs   (unchanged)
 │   ├── progress.rs      (unchanged)
 │   ├── robot.rs         (unchanged)
 │   └── commands/
 │       ├── mod.rs       (re-exports, including WhoArgs, IssuesArgs, etc.)
 │       ├── (all existing files — unchanged but with args structs moved in)
 │       └── ...
 ├── core/                (slimmed: 22 → 12 files, infrastructure only)
 │   ├── mod.rs
 │   ├── backoff.rs
 │   ├── config.rs
 │   ├── db.rs
 │   ├── error.rs
 │   ├── lock.rs
 │   ├── logging.rs
 │   ├── metrics.rs
 │   ├── paths.rs
 │   ├── project.rs
 │   ├── shutdown.rs
 │   └── time.rs
 ├── timeline/            (NEW — extracted from core/)
 │   ├── mod.rs           (types from core/timeline.rs)
 │   ├── seed.rs          (from core/timeline_seed.rs)
 │   ├── expand.rs        (from core/timeline_expand.rs)
 │   └── collect.rs       (from core/timeline_collect.rs)
 ├── xref/                (NEW — extracted from core/)
 │   ├── mod.rs
 │   ├── note_parser.rs   (from core/note_parser.rs)
 │   └── references.rs    (from core/references.rs)
 ├── ingestion/           (gains 4 files from core/)
 │   ├── (existing files...)
 │   ├── events_db.rs     (from core/events_db.rs)
 │   ├── dependent_queue.rs (from core/dependent_queue.rs)
 │   ├── payloads.rs      (from core/payloads.rs)
 │   └── sync_run.rs      (from core/sync_run.rs)
 ├── documents/           (unchanged)
 ├── embedding/           (unchanged)
 ├── gitlab/              (unchanged)
 └── search/              (unchanged)
 ```
 ---
 ## Import Change Tracking
 ### Tier 1.1: Timeline extraction
 | Consumer file | Old import | New import |
 |---------------|-----------|------------|
 | `cli/commands/timeline.rs:10-15` | `crate::core::timeline::*` | `crate::timeline::*` |
 | `cli/commands/timeline.rs:13` | `crate::core::timeline_collect::collect_events` | `crate::timeline::collect_events` (or `crate::timeline::collect::collect_events`) |
 | `cli/commands/timeline.rs:14` | `crate::core::timeline_expand::expand_timeline` | `crate::timeline::expand_timeline` |
 | `cli/commands/timeline.rs:15` | `crate::core::timeline_seed::seed_timeline` | `crate::timeline::seed_timeline` |
 | `core/timeline_seed.rs:7-8` | `super::timeline::*` | `super::*` (or `crate::timeline::*` depending on structure) |
 | `core/timeline_expand.rs:6` | `super::timeline::*` | `super::*` |
 | `core/timeline_collect.rs:4` | `super::timeline::*` | `super::*` |
 | `core/timeline_seed.rs:8` | `crate::search::*` | `crate::search::*` (no change) |
 | `core/timeline_seed.rs:6-7` | `super::error::Result` | `crate::core::error::Result` |
 | `core/timeline_expand.rs:5` | `super::error::Result` | `crate::core::error::Result` |
 | `core/timeline_collect.rs:3` | `super::error::*` | `crate::core::error::*` |
 ### Tier 1.2: Cross-reference extraction
 | Consumer file | Old import | New import |
 |---------------|-----------|------------|
 | `ingestion/orchestrator.rs:10-12` | `crate::core::references::*` | `crate::xref::references::*` |
 | `core/note_parser.rs:7-8` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
 | `core/references.rs:4-5` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
 ### Tier 2.1: Ingestion-adjacent DB ops
 | Consumer file | Old import | New import |
 |---------------|-----------|------------|
 | `cli/commands/count.rs:9` | `crate::core::events_db::*` | `crate::ingestion::events_db::*` |
 | `ingestion/orchestrator.rs:6-8` | `crate::core::dependent_queue::*` | `super::dependent_queue::*` |
 | `main.rs:37` | `lore::core::dependent_queue::release_all_locked_jobs` | `lore::ingestion::dependent_queue::release_all_locked_jobs` |
 | `ingestion/discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
 | `ingestion/issues.rs:9` | `crate::core::payloads::*` | `super::payloads::*` |
 | `ingestion/merge_requests.rs:8` | `crate::core::payloads::*` | `super::payloads::*` |
 | `ingestion/mr_discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
 | `cli/commands/sync.rs` | (uses `crate::core::sync_run::*`) | `crate::ingestion::sync_run::*` |
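 | `cli/commands/sync_status.rs` | (uses `crate::core::sync_run::*` or `crate::core::metrics::*`; needs verification) | `crate::ingestion::sync_run::*` for any `sync_run` import; `crate::core::metrics::*` stays unchanged |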
 | Internal: `events_db.rs:4-5` | `super::error::*`, `super::time::*` | `crate::core::error::*`, `crate::core::time::*` |
 | Internal: `dependent_queue.rs:5-6` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
 | Internal: `payloads.rs:9-10` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
 | Internal: `sync_run.rs:2-4` | `super::error::*`, `super::metrics::*`, `super::time::*` | `crate::core::error::*`, `crate::core::metrics::*`, `crate::core::time::*` |
 ---
 ## Execution Order
 1. **Tier 1.1** — Extract timeline → `src/timeline/` (LOW risk, 1 consumer)
 2. **Tier 1.2** — Extract xref → `src/xref/` (LOW risk, 1-2 consumers)
 3. **Cargo check + clippy + test** after each of the two Tier 1 extractions
 4. **Tier 2.1** — Move ingestion DB ops (MEDIUM risk, more consumers)
 5. **Cargo check + clippy + test**
 6. **Tier 2.2** — Split `cli/mod.rs` args (MEDIUM risk, mostly mechanical)
 7. **Cargo check + clippy + test + fmt**
 Each tier should be its own commit for easy rollback.
 ---
 ## What This Achieves
 **Before:** A developer looking at `core/` sees 22 files and has to mentally sort "infrastructure vs. domain logic vs. pipeline stage." The timeline pipeline is invisible unless you know to look in `core/`.
 **After:**
 - `core/` has 12 files, all clearly infrastructure (db, config, error, paths, logging, lock, shutdown, backoff, time, metrics, project)
 - `timeline/` is a discoverable first-class module showing the 5-stage pipeline
 - `xref/` makes the cross-reference extraction domain visible
 - `ingestion/` contains everything related to data fetching: the orchestrator, entity ingestors, AND their supporting DB operations
 - `cli/mod.rs` is lean — just the top-level Cli struct and Commands enum
 A new developer (or coding agent) can now answer "where is the timeline code?" → `src/timeline/`, "where is ingestion?" → `src/ingestion/`, "where is cross-reference extraction?" → `src/xref/`, without needing institutional knowledge.