# Proposed Code File Reorganization Plan ## 1. Scope, Audit Method, and Constraints This plan is based on a full audit of the `src/` tree (all 131 Rust files) plus integration tests in `tests/` that import `src` modules. What I audited: - module/file inventory (`src/**.rs`) - line counts and hotspot analysis - crate-internal import graph (`use crate::...`) - public API surface (public structs/enums/functions by file) - command routing and re-export topology (`main.rs`, `lib.rs`, `cli/mod.rs`, `cli/commands/mod.rs`) - cross-module coupling and test coupling Constraints followed for this proposal: - no implementation yet (plan only) - keep nesting shallow and intuitive - optimize for discoverability for humans and coding agents - no compatibility shims as a long-term strategy - every structural change includes explicit call-site update tracking --- ## 2. Current State (Measured) ### 2.1 Size by top-level module (`src/`) | Module | Files | Lines | Prod Files | Prod Lines | Test Files | Test Lines | |---|---:|---:|---:|---:|---:|---:| | `cli` | 41 | 29,131 | 37 | 23,068 | 4 | 6,063 | | `core` | 39 | 12,493 | 27 | 7,599 | 12 | 4,894 | | `ingestion` | 15 | 6,935 | 10 | 5,259 | 5 | 1,676 | | `documents` | 6 | 3,657 | 4 | 1,749 | 2 | 1,908 | | `gitlab` | 11 | 3,607 | 8 | 2,391 | 3 | 1,216 | | `embedding` | 10 | 1,878 | 7 | 1,327 | 3 | 551 | | `search` | 6 | 1,115 | 6 | 1,115 | 0 | 0 | | `main.rs` | 1 | 3,744 | 1 | 3,744 | 0 | 0 | | `lib.rs` | 1 | 9 | 1 | 9 | 0 | 0 | Total in `src/`: **131 files / 62,569 lines**. ### 2.2 Largest production hotspots | File | Lines | Why it matters | |---|---:|---| | `src/main.rs` | 3,744 | Binary entrypoint is doing too much dispatch and formatting work | | `src/cli/autocorrect.rs` | 1,865 | Large parsing/correction ruleset in one file | | `src/ingestion/orchestrator.rs` | 1,753 | Multi-stage ingestion orchestration and persistence mixed together | | `src/cli/commands/show.rs` | 1,544 | Issue/MR retrieval + rendering + JSON conversion all in one file | | `src/cli/render.rs` | 1,482 | Theme, table layout, formatting utilities bundled together | | `src/cli/commands/list.rs` | 1,383 | Issues + MRs + notes listing/query/printing in one file | | `src/cli/mod.rs` | 1,268 | Clap root parser plus every args struct | | `src/cli/commands/sync.rs` | 1,201 | Sync flow + human rendering + JSON output | | `src/cli/commands/me/queries.rs` | 1,135 | Multiple query families and post-processing logic | | `src/cli/commands/ingest.rs` | 1,116 | Ingest flow + dry-run + presentation concerns | | `src/documents/extractor.rs` | 1,059 | Four document source extractors in one file | ### 2.3 High-level dependency flow (top modules) Observed module coupling from imports: - `cli -> core` (very heavy, 33 files) - `cli -> documents/embedding/gitlab/ingestion/search` (command-dependent) - `ingestion -> core` (12 files), `ingestion -> gitlab` (10 files) - `search -> core` and `search -> embedding` - `timeline` logic currently located under `core/*timeline*` but semantically acts as its own subsystem ### 2.4 Structural pain points 1. `main.rs` is overloaded with command handlers, robot output envelope types, clap error mapping, and domain invocation. 2. `cli/mod.rs` mixes root parser concerns with command-specific argument schemas. 3. `core/` still holds domain-specific subsystems (`timeline`, cross-reference extraction, ingestion persistence helpers) that are not truly "core infra". 4. Several large command files combine query/build/fetch/render/json responsibilities. 5. Test helper setup is duplicated heavily in large test files (`who_tests`, `list_tests`, `me_tests`). --- ## 3. Reorganization Principles 1. Keep top-level domains explicit: `cli`, `core` (infra), `gitlab`, `ingestion`, `documents`, `embedding`, `search`, plus extracted domain modules where justified. 2. Keep nesting shallow: max 2-3 levels in normal workflow paths. 3. Co-locate command-specific args/types/rendering with the command implementation. 4. Separate orchestration from formatting from data-access code. 5. Prefer module boundaries that map to runtime pipeline boundaries. 6. Make import paths reveal ownership directly. --- ## 4. Proposed Target Structure (End State) ```text src/ main.rs # thin binary entrypoint lib.rs app/ # NEW: runtime dispatch/orchestration glue mod.rs dispatch.rs errors.rs robot_docs.rs cli/ mod.rs # Cli + Commands only args.rs # shared args structs used by Commands variants render/ mod.rs format.rs table.rs theme.rs autocorrect/ mod.rs flags.rs enums.rs fuzzy.rs commands/ mod.rs list/ mod.rs issues.rs mrs.rs notes.rs render.rs show/ mod.rs issue.rs mr.rs render.rs me/ # keep existing folder, retain split style who/ # keep existing folder, retain split style ingest/ mod.rs run.rs dry_run.rs render.rs sync/ mod.rs run.rs render.rs surgical.rs # smaller focused commands can stay single-file for now core/ # infra-only boundary after moves mod.rs backoff.rs config.rs cron.rs cursor.rs db.rs error.rs file_history.rs lock.rs logging.rs metrics.rs path_resolver.rs paths.rs project.rs shutdown.rs time.rs trace.rs timeline/ # NEW: extracted domain subsystem mod.rs types.rs seed.rs expand.rs collect.rs xref/ # NEW: extracted cross-reference subsystem mod.rs note_parser.rs references.rs ingestion/ mod.rs issues.rs merge_requests.rs discussions.rs mr_discussions.rs mr_diffs.rs dirty_tracker.rs discussion_queue.rs orchestrator/ mod.rs issues_flow.rs mrs_flow.rs resource_events.rs closes_issues.rs diff_jobs.rs progress.rs storage/ # NEW: ingestion-owned persistence helpers mod.rs payloads.rs # from core/payloads.rs events.rs # from core/events_db.rs queue.rs # from core/dependent_queue.rs sync_run.rs # from core/sync_run.rs documents/ mod.rs extractor/ mod.rs issues.rs mrs.rs discussions.rs notes.rs common.rs regenerator.rs truncation.rs embedding/ mod.rs change_detector.rs chunks.rs # merge chunk_ids.rs + chunking.rs ollama.rs pipeline.rs similarity.rs gitlab/ # mostly keep as-is (already coherent) search/ # mostly keep as-is (already coherent) ``` Notes: - `gitlab/` and `search/` are already cohesive and should largely remain unchanged. - `who/` and `me/` command families are already split well relative to other commands. --- ## 5. Detailed Change Plan (Phased) ## Phase 1: Domain Boundary Extraction (lowest conceptual risk, high clarity gain) ### 5.1 Extract timeline subsystem from `core` Move: - `src/core/timeline.rs` -> `src/timeline/types.rs` - `src/core/timeline_seed.rs` -> `src/timeline/seed.rs` - `src/core/timeline_expand.rs` -> `src/timeline/expand.rs` - `src/core/timeline_collect.rs` -> `src/timeline/collect.rs` - add `src/timeline/mod.rs` Why: - Timeline is a full pipeline domain (seed -> expand -> collect), not core infra. - Improves discoverability for `lore timeline` and timeline tests. Calling-code updates required: - `src/cli/commands/timeline.rs` - `crate::core::timeline::*` -> `crate::timeline::*` - `crate::core::timeline_seed::*` -> `crate::timeline::seed::*` - `crate::core::timeline_expand::*` -> `crate::timeline::expand::*` - `crate::core::timeline_collect::*` -> `crate::timeline::collect::*` - `tests/timeline_pipeline_tests.rs` - `lore::core::timeline*` imports -> `lore::timeline::*` - internal references among moved files update from `crate::core::timeline` to `crate::timeline::types` - `src/core/mod.rs`: remove `timeline*` module declarations - `src/lib.rs`: add `pub mod timeline;` ### 5.2 Extract cross-reference subsystem from `core` Move: - `src/core/note_parser.rs` -> `src/xref/note_parser.rs` - `src/core/references.rs` -> `src/xref/references.rs` - add `src/xref/mod.rs` Why: - Cross-reference extraction is a domain subsystem feeding ingestion and timeline. - Current placement in `core/` obscures data flow. Calling-code updates required: - `src/ingestion/orchestrator.rs` - `crate::core::references::*` -> `crate::xref::references::*` - `crate::core::note_parser::*` -> `crate::xref::note_parser::*` - `src/core/mod.rs`: remove `note_parser` and `references` - `src/lib.rs`: add `pub mod xref;` - tests referencing old paths update to `crate::xref::*` ### 5.3 Move ingestion-owned persistence helpers out of `core` Move: - `src/core/payloads.rs` -> `src/ingestion/storage/payloads.rs` - `src/core/events_db.rs` -> `src/ingestion/storage/events.rs` - `src/core/dependent_queue.rs` -> `src/ingestion/storage/queue.rs` - `src/core/sync_run.rs` -> `src/ingestion/storage/sync_run.rs` - add `src/ingestion/storage/mod.rs` Why: - These files primarily support ingestion/sync runtime behavior and ingestion persistence. - Consolidates ingestion runtime + ingestion storage into one domain area. Calling-code updates required: - `src/ingestion/discussions.rs`, `issues.rs`, `merge_requests.rs`, `mr_discussions.rs` - `core::payloads::*` -> `ingestion::storage::payloads::*` - `src/ingestion/orchestrator.rs` - `core::dependent_queue::*` -> `ingestion::storage::queue::*` - `core::events_db::*` -> `ingestion::storage::events::*` - `src/main.rs` - `core::dependent_queue::release_all_locked_jobs` -> `ingestion::storage::queue::release_all_locked_jobs` - `core::sync_run::SyncRunRecorder` -> `ingestion::storage::sync_run::SyncRunRecorder` - `src/cli/commands/count.rs` - `core::events_db::*` -> `ingestion::storage::events::*` - `src/cli/commands/sync_surgical.rs` - `core::sync_run::SyncRunRecorder` -> `ingestion::storage::sync_run::SyncRunRecorder` - `src/core/mod.rs`: remove moved modules - `src/ingestion/mod.rs`: export `pub mod storage;` --- ## Phase 2: CLI Structure Cleanup (high dev ergonomics impact) ### 5.4 Split `cli/mod.rs` responsibilities Current: - root parser (`Cli`, `Commands`) - all args structs (`IssuesArgs`, `WhoArgs`, `MeArgs`, etc.) Proposed: - `src/cli/mod.rs`: only `Cli`, `Commands`, top-level parser behavior - `src/cli/args.rs`: all args structs and command-local enums (`CronAction`, `TokenAction`) Why: - keeps parser root small and readable - one canonical place for args schemas Calling-code updates required: - `src/main.rs` - `use lore::cli::{..., WhoArgs, ...}` -> `use lore::cli::args::{...}` (or re-export from `cli/mod.rs`) - `src/cli/commands/who/mod.rs` - `use crate::cli::WhoArgs;` -> `use crate::cli::args::WhoArgs;` - `src/cli/commands/me/mod.rs` - `use crate::cli::MeArgs;` -> `use crate::cli::args::MeArgs;` ### 5.5 Make `main.rs` thin by moving dispatch logic to `app/` Proposed splits from `main.rs`: - `app/dispatch.rs`: all `handle_*` command handlers - `app/errors.rs`: clap error mapping, correction warning formatting - `app/robot_docs.rs`: robot docs schema/data envelope generation - keep `main.rs`: startup, logging init, parse, delegate to dispatcher Why: - reduces entrypoint complexity and improves testability of dispatch behavior - isolates robot docs machinery from runtime bootstrapping Calling-code updates required: - `main.rs`: replace direct handler function definitions with calls into `app::*` - `lib.rs`: add `pub mod app;` if shared imports needed by tests --- ## Phase 3: Split Large Command Files by Responsibility ### 5.6 Split `cli/commands/list.rs` Proposed: - `commands/list/issues.rs` (issue queries + issue output) - `commands/list/mrs.rs` (MR queries + MR output) - `commands/list/notes.rs` (note queries + note output) - `commands/list/render.rs` (shared formatting helpers) - `commands/list/mod.rs` (public API and re-exports) Why: - list concerns are already logically tripartite - better locality for bugfixes and feature additions Calling-code updates required: - `src/cli/commands/mod.rs`: import module folder and re-export unchanged API names - `src/main.rs`: ideally no change if `commands/mod.rs` re-exports remain stable ### 5.7 Split `cli/commands/show.rs` Proposed: - `commands/show/issue.rs` - `commands/show/mr.rs` - `commands/show/render.rs` - `commands/show/mod.rs` Why: - issue and MR detail assembly have separate SQL and shape logic - rendering concerns can be isolated from data retrieval Calling-code updates required: - `src/cli/commands/mod.rs` re-exports preserved (`run_show_issue`, `run_show_mr`, printers) - `src/main.rs` remains stable if re-exports preserved ### 5.8 Split `cli/commands/ingest.rs` and `cli/commands/sync.rs` Proposed: - `commands/ingest/run.rs`, `dry_run.rs`, `render.rs`, `mod.rs` - `commands/sync/run.rs`, `render.rs`, `surgical.rs`, `mod.rs` Why: - orchestration, preview generation, and output rendering are currently intertwined - surgical sync is semantically part of sync command family Calling-code updates required: - update `src/cli/commands/mod.rs` exports - update `src/cli/commands/sync_surgical.rs` path if merged into `commands/sync/surgical.rs` - no CLI UX changes expected if external API names remain ### 5.9 Split `documents/extractor.rs` Proposed: - `documents/extractor/issues.rs` - `documents/extractor/mrs.rs` - `documents/extractor/discussions.rs` - `documents/extractor/notes.rs` - `documents/extractor/common.rs` - `documents/extractor/mod.rs` Why: - extractor currently contains four independent source-type extraction paths - per-source unit tests become easier to target Calling-code updates required: - `src/documents/mod.rs` re-export surface remains stable - `src/documents/regenerator.rs` imports update only if internal re-export paths change --- ## Phase 4: Opportunistic Consolidations ### 5.10 Merge tiny embedding chunk helpers Merge: - `src/embedding/chunk_ids.rs` - `src/embedding/chunking.rs` - into `src/embedding/chunks.rs` Why: - both represent one conceptual concern: chunk partitioning and chunk identity mapping - avoids tiny-file scattering Calling-code updates required: - `src/embedding/pipeline.rs` - `src/embedding/change_detector.rs` - `src/search/vector.rs` - `src/embedding/mod.rs` exports ### 5.11 Test helper de-duplication Add a shared test support module for repeated DB fixture setup currently duplicated in: - `src/cli/commands/who_tests.rs` - `src/cli/commands/list_tests.rs` - `src/cli/commands/me/me_tests.rs` - multiple `core/*_tests.rs` Why: - lower maintenance cost and fewer fixture drift bugs Calling-code updates required: - test-only imports in affected files --- ## 6. File-Level Recommendation Matrix Legend: - `KEEP`: structure is already coherent - `MOVE`: relocate without major logic split - `SPLIT`: divide into focused files/modules - `MERGE`: consolidate tiny related files ### 6.1 `core/` - `backoff.rs` -> KEEP - `config.rs` -> KEEP (large but cohesive) - `cron.rs` -> KEEP - `cursor.rs` -> KEEP - `db.rs` -> KEEP - `dependent_queue.rs` -> MOVE to `ingestion/storage/queue.rs` - `error.rs` -> KEEP - `events_db.rs` -> MOVE to `ingestion/storage/events.rs` - `file_history.rs` -> KEEP - `lock.rs` -> KEEP - `logging.rs` -> KEEP - `metrics.rs` -> KEEP - `note_parser.rs` -> MOVE to `xref/note_parser.rs` - `path_resolver.rs` -> KEEP - `paths.rs` -> KEEP - `payloads.rs` -> MOVE to `ingestion/storage/payloads.rs` - `project.rs` -> KEEP - `references.rs` -> MOVE to `xref/references.rs` - `shutdown.rs` -> KEEP - `sync_run.rs` -> MOVE to `ingestion/storage/sync_run.rs` - `time.rs` -> KEEP - `timeline.rs`, `timeline_seed.rs`, `timeline_expand.rs`, `timeline_collect.rs` -> MOVE to `timeline/` - `trace.rs` -> KEEP ### 6.2 `cli/` - `mod.rs` -> SPLIT (`mod.rs` + `args.rs`) - `autocorrect.rs` -> SPLIT into `autocorrect/` submodules - `render.rs` -> SPLIT into `render/` submodules - `commands/list.rs` -> SPLIT into `commands/list/` - `commands/show.rs` -> SPLIT into `commands/show/` - `commands/ingest.rs` -> SPLIT into `commands/ingest/` - `commands/sync.rs` + `commands/sync_surgical.rs` -> SPLIT/MERGE into `commands/sync/` - `commands/me/*` -> KEEP (already good shape) - `commands/who/*` -> KEEP (already good shape) - small focused commands (`auth_test`, `embed`, `trace`, etc.) -> KEEP ### 6.3 `documents/` - `extractor.rs` -> SPLIT into extractor folder - `regenerator.rs` -> KEEP - `truncation.rs` -> KEEP ### 6.4 `embedding/` - `change_detector.rs` -> KEEP - `chunk_ids.rs` + `chunking.rs` -> MERGE into `chunks.rs` - `ollama.rs` -> KEEP - `pipeline.rs` -> KEEP for now (already a pipeline-centric file) - `similarity.rs` -> KEEP ### 6.5 `gitlab/`, `search/` - KEEP as-is except minor internal refactors only when touched by feature work --- ## 7. Import/Call-Site Impact Tracker (must-update list) This section tracks files that must be updated when moves happen to avoid broken builds. ### 7.1 For timeline extraction Must update: - `src/cli/commands/timeline.rs` - `tests/timeline_pipeline_tests.rs` - moved timeline module internals (`seed`, `expand`, `collect`) - `src/core/mod.rs` - `src/lib.rs` ### 7.2 For xref extraction Must update: - `src/ingestion/orchestrator.rs` (all `core::references` and `core::note_parser` paths) - tests importing moved modules - `src/core/mod.rs` - `src/lib.rs` ### 7.3 For ingestion storage move Must update: - `src/ingestion/discussions.rs` - `src/ingestion/issues.rs` - `src/ingestion/merge_requests.rs` - `src/ingestion/mr_discussions.rs` - `src/ingestion/orchestrator.rs` - `src/cli/commands/count.rs` - `src/cli/commands/sync_surgical.rs` - `src/main.rs` - `src/core/mod.rs` - `src/ingestion/mod.rs` ### 7.4 For CLI args split Must update: - `src/main.rs` - `src/cli/commands/who/mod.rs` - `src/cli/commands/me/mod.rs` - any command file importing args directly from `crate::cli::*Args` ### 7.5 For command file splits Must update: - `src/cli/commands/mod.rs` re-exports - tests that import command internals by file/module path - `src/main.rs` only if re-export names change (recommended: keep names stable) --- ## 8. Execution Strategy (Safe Order) Recommended order: 1. Phase 1 (`timeline`, `xref`, `ingestion/storage`) with no behavior changes. 2. Phase 2 (`cli/mod.rs` split, `main.rs` thinning) while preserving command signatures. 3. Phase 3 (`list`, `show`, `ingest`, `sync`, `extractor` splits). 4. Phase 4 opportunistic merges and test helper dedupe. For each phase: - complete file moves/splits and import rewrites in one cohesive change - run quality gates - only then proceed to next phase --- ## 9. Verification and Non-Regression Checklist After each phase, run: ```bash cargo check --all-targets cargo clippy --all-targets -- -D warnings cargo fmt --check cargo test cargo test -- --nocapture ``` Targeted suites to run when relevant: - timeline moves: `cargo test timeline_pipeline_tests` - who/me/list splits: `cargo test who_tests`, `cargo test list_tests`, `cargo test me_tests` - ingestion storage moves: `cargo test ingestion` Before each commit, run UBS on changed files: ```bash ubs ``` --- ## 10. Risks and Mitigations Primary risks: 1. Import path churn causing compile errors. 2. Accidental visibility changes (`pub`/`pub(crate)`) during file splits. 3. Re-export drift breaking `main.rs` or tests. 4. Behavioral drift from mixed refactor + logic changes. Mitigations: - refactor-only phases (no feature changes) - keep public API names stable during directory reshapes - preserve command re-exports in `cli/commands/mod.rs` - run full quality gates after each phase --- ## 11. Recommendation Start with **Phase 1 only** in the first implementation pass. It yields major clarity gains with relatively constrained blast radius. If Phase 1 lands cleanly, proceed with Phase 2. Phase 3 should be done in smaller PR-sized chunks (`list` first, then `show`, then `ingest/sync`, then `documents/extractor`). No code/file moves have been executed yet; this document is the proposal for review and approval.