Files
gitlore/PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md

637 lines
20 KiB
Markdown

# Proposed Code File Reorganization Plan
## 1. Scope, Audit Method, and Constraints
This plan is based on a full audit of the `src/` tree (all 131 Rust files) plus integration tests in `tests/` that import `src` modules.
What I audited:
- module/file inventory (`src/**.rs`)
- line counts and hotspot analysis
- crate-internal import graph (`use crate::...`)
- public API surface (public structs/enums/functions by file)
- command routing and re-export topology (`main.rs`, `lib.rs`, `cli/mod.rs`, `cli/commands/mod.rs`)
- cross-module coupling and test coupling
Constraints followed for this proposal:
- no implementation yet (plan only)
- keep nesting shallow and intuitive
- optimize for discoverability for humans and coding agents
- no compatibility shims as a long-term strategy
- every structural change includes explicit call-site update tracking
---
## 2. Current State (Measured)
### 2.1 Size by top-level module (`src/`)
| Module | Files | Lines | Prod Files | Prod Lines | Test Files | Test Lines |
|---|---:|---:|---:|---:|---:|---:|
| `cli` | 41 | 29,131 | 37 | 23,068 | 4 | 6,063 |
| `core` | 39 | 12,493 | 27 | 7,599 | 12 | 4,894 |
| `ingestion` | 15 | 6,935 | 10 | 5,259 | 5 | 1,676 |
| `documents` | 6 | 3,657 | 4 | 1,749 | 2 | 1,908 |
| `gitlab` | 11 | 3,607 | 8 | 2,391 | 3 | 1,216 |
| `embedding` | 10 | 1,878 | 7 | 1,327 | 3 | 551 |
| `search` | 6 | 1,115 | 6 | 1,115 | 0 | 0 |
| `main.rs` | 1 | 3,744 | 1 | 3,744 | 0 | 0 |
| `lib.rs` | 1 | 9 | 1 | 9 | 0 | 0 |
Total in `src/`: **131 files / 62,569 lines**.
### 2.2 Largest production hotspots
| File | Lines | Why it matters |
|---|---:|---|
| `src/main.rs` | 3,744 | Binary entrypoint is doing too much dispatch and formatting work |
| `src/cli/autocorrect.rs` | 1,865 | Large parsing/correction ruleset in one file |
| `src/ingestion/orchestrator.rs` | 1,753 | Multi-stage ingestion orchestration and persistence mixed together |
| `src/cli/commands/show.rs` | 1,544 | Issue/MR retrieval + rendering + JSON conversion all in one file |
| `src/cli/render.rs` | 1,482 | Theme, table layout, formatting utilities bundled together |
| `src/cli/commands/list.rs` | 1,383 | Issues + MRs + notes listing/query/printing in one file |
| `src/cli/mod.rs` | 1,268 | Clap root parser plus every args struct |
| `src/cli/commands/sync.rs` | 1,201 | Sync flow + human rendering + JSON output |
| `src/cli/commands/me/queries.rs` | 1,135 | Multiple query families and post-processing logic |
| `src/cli/commands/ingest.rs` | 1,116 | Ingest flow + dry-run + presentation concerns |
| `src/documents/extractor.rs` | 1,059 | Four document source extractors in one file |
### 2.3 High-level dependency flow (top modules)
Observed module coupling from imports:
- `cli -> core` (very heavy, 33 files)
- `cli -> documents/embedding/gitlab/ingestion/search` (command-dependent)
- `ingestion -> core` (12 files), `ingestion -> gitlab` (10 files)
- `search -> core` and `search -> embedding`
- `timeline` logic currently located under `core/*timeline*` but semantically acts as its own subsystem
### 2.4 Structural pain points
1. `main.rs` is overloaded with command handlers, robot output envelope types, clap error mapping, and domain invocation.
2. `cli/mod.rs` mixes root parser concerns with command-specific argument schemas.
3. `core/` still holds domain-specific subsystems (`timeline`, cross-reference extraction, ingestion persistence helpers) that are not truly "core infra".
4. Several large command files combine query/build/fetch/render/json responsibilities.
5. Test helper setup is duplicated heavily in large test files (`who_tests`, `list_tests`, `me_tests`).
---
## 3. Reorganization Principles
1. Keep top-level domains explicit: `cli`, `core` (infra), `gitlab`, `ingestion`, `documents`, `embedding`, `search`, plus extracted domain modules where justified.
2. Keep nesting shallow: max 2-3 levels in normal workflow paths.
3. Co-locate command-specific args/types/rendering with the command implementation.
4. Separate orchestration from formatting from data-access code.
5. Prefer module boundaries that map to runtime pipeline boundaries.
6. Make import paths reveal ownership directly.
---
## 4. Proposed Target Structure (End State)
```text
src/
main.rs # thin binary entrypoint
lib.rs
app/ # NEW: runtime dispatch/orchestration glue
mod.rs
dispatch.rs
errors.rs
robot_docs.rs
cli/
mod.rs # Cli + Commands only
args.rs # shared args structs used by Commands variants
render/
mod.rs
format.rs
table.rs
theme.rs
autocorrect/
mod.rs
flags.rs
enums.rs
fuzzy.rs
commands/
mod.rs
list/
mod.rs
issues.rs
mrs.rs
notes.rs
render.rs
show/
mod.rs
issue.rs
mr.rs
render.rs
me/ # keep existing folder, retain split style
who/ # keep existing folder, retain split style
ingest/
mod.rs
run.rs
dry_run.rs
render.rs
sync/
mod.rs
run.rs
render.rs
surgical.rs
# smaller focused commands can stay single-file for now
core/ # infra-only boundary after moves
mod.rs
backoff.rs
config.rs
cron.rs
cursor.rs
db.rs
error.rs
file_history.rs
lock.rs
logging.rs
metrics.rs
path_resolver.rs
paths.rs
project.rs
shutdown.rs
time.rs
trace.rs
timeline/ # NEW: extracted domain subsystem
mod.rs
types.rs
seed.rs
expand.rs
collect.rs
xref/ # NEW: extracted cross-reference subsystem
mod.rs
note_parser.rs
references.rs
ingestion/
mod.rs
issues.rs
merge_requests.rs
discussions.rs
mr_discussions.rs
mr_diffs.rs
dirty_tracker.rs
discussion_queue.rs
orchestrator/
mod.rs
issues_flow.rs
mrs_flow.rs
resource_events.rs
closes_issues.rs
diff_jobs.rs
progress.rs
storage/ # NEW: ingestion-owned persistence helpers
mod.rs
payloads.rs # from core/payloads.rs
events.rs # from core/events_db.rs
queue.rs # from core/dependent_queue.rs
sync_run.rs # from core/sync_run.rs
documents/
mod.rs
extractor/
mod.rs
issues.rs
mrs.rs
discussions.rs
notes.rs
common.rs
regenerator.rs
truncation.rs
embedding/
mod.rs
change_detector.rs
chunks.rs # merge chunk_ids.rs + chunking.rs
ollama.rs
pipeline.rs
similarity.rs
gitlab/
# mostly keep as-is (already coherent)
search/
# mostly keep as-is (already coherent)
```
Notes:
- `gitlab/` and `search/` are already cohesive and should largely remain unchanged.
- `who/` and `me/` command families are already split well relative to other commands.
---
## 5. Detailed Change Plan (Phased)
## Phase 1: Domain Boundary Extraction (lowest conceptual risk, high clarity gain)
### 5.1 Extract timeline subsystem from `core`
Move:
- `src/core/timeline.rs` -> `src/timeline/types.rs`
- `src/core/timeline_seed.rs` -> `src/timeline/seed.rs`
- `src/core/timeline_expand.rs` -> `src/timeline/expand.rs`
- `src/core/timeline_collect.rs` -> `src/timeline/collect.rs`
- add `src/timeline/mod.rs`
Why:
- Timeline is a full pipeline domain (seed -> expand -> collect), not core infra.
- Improves discoverability for `lore timeline` and timeline tests.
Calling-code updates required:
- `src/cli/commands/timeline.rs`
- `crate::core::timeline::*` -> `crate::timeline::*`
- `crate::core::timeline_seed::*` -> `crate::timeline::seed::*`
- `crate::core::timeline_expand::*` -> `crate::timeline::expand::*`
- `crate::core::timeline_collect::*` -> `crate::timeline::collect::*`
- `tests/timeline_pipeline_tests.rs`
- `lore::core::timeline*` imports -> `lore::timeline::*`
- internal references among moved files update from `crate::core::timeline` to `crate::timeline::types`
- `src/core/mod.rs`: remove `timeline*` module declarations
- `src/lib.rs`: add `pub mod timeline;`
### 5.2 Extract cross-reference subsystem from `core`
Move:
- `src/core/note_parser.rs` -> `src/xref/note_parser.rs`
- `src/core/references.rs` -> `src/xref/references.rs`
- add `src/xref/mod.rs`
Why:
- Cross-reference extraction is a domain subsystem feeding ingestion and timeline.
- Current placement in `core/` obscures data flow.
Calling-code updates required:
- `src/ingestion/orchestrator.rs`
- `crate::core::references::*` -> `crate::xref::references::*`
- `crate::core::note_parser::*` -> `crate::xref::note_parser::*`
- `src/core/mod.rs`: remove `note_parser` and `references`
- `src/lib.rs`: add `pub mod xref;`
- tests referencing old paths update to `crate::xref::*`
### 5.3 Move ingestion-owned persistence helpers out of `core`
Move:
- `src/core/payloads.rs` -> `src/ingestion/storage/payloads.rs`
- `src/core/events_db.rs` -> `src/ingestion/storage/events.rs`
- `src/core/dependent_queue.rs` -> `src/ingestion/storage/queue.rs`
- `src/core/sync_run.rs` -> `src/ingestion/storage/sync_run.rs`
- add `src/ingestion/storage/mod.rs`
Why:
- These files primarily support ingestion/sync runtime behavior and ingestion persistence.
- Consolidates ingestion runtime + ingestion storage into one domain area.
Calling-code updates required:
- `src/ingestion/discussions.rs`, `issues.rs`, `merge_requests.rs`, `mr_discussions.rs`
- `core::payloads::*` -> `ingestion::storage::payloads::*`
- `src/ingestion/orchestrator.rs`
- `core::dependent_queue::*` -> `ingestion::storage::queue::*`
- `core::events_db::*` -> `ingestion::storage::events::*`
- `src/main.rs`
- `core::dependent_queue::release_all_locked_jobs` -> `ingestion::storage::queue::release_all_locked_jobs`
- `core::sync_run::SyncRunRecorder` -> `ingestion::storage::sync_run::SyncRunRecorder`
- `src/cli/commands/count.rs`
- `core::events_db::*` -> `ingestion::storage::events::*`
- `src/cli/commands/sync_surgical.rs`
- `core::sync_run::SyncRunRecorder` -> `ingestion::storage::sync_run::SyncRunRecorder`
- `src/core/mod.rs`: remove moved modules
- `src/ingestion/mod.rs`: export `pub mod storage;`
---
## Phase 2: CLI Structure Cleanup (high dev ergonomics impact)
### 5.4 Split `cli/mod.rs` responsibilities
Current:
- root parser (`Cli`, `Commands`)
- all args structs (`IssuesArgs`, `WhoArgs`, `MeArgs`, etc.)
Proposed:
- `src/cli/mod.rs`: only `Cli`, `Commands`, top-level parser behavior
- `src/cli/args.rs`: all args structs and command-local enums (`CronAction`, `TokenAction`)
Why:
- keeps parser root small and readable
- one canonical place for args schemas
Calling-code updates required:
- `src/main.rs`
- `use lore::cli::{..., WhoArgs, ...}` -> `use lore::cli::args::{...}` (or re-export from `cli/mod.rs`)
- `src/cli/commands/who/mod.rs`
- `use crate::cli::WhoArgs;` -> `use crate::cli::args::WhoArgs;`
- `src/cli/commands/me/mod.rs`
- `use crate::cli::MeArgs;` -> `use crate::cli::args::MeArgs;`
### 5.5 Make `main.rs` thin by moving dispatch logic to `app/`
Proposed splits from `main.rs`:
- `app/dispatch.rs`: all `handle_*` command handlers
- `app/errors.rs`: clap error mapping, correction warning formatting
- `app/robot_docs.rs`: robot docs schema/data envelope generation
- keep `main.rs`: startup, logging init, parse, delegate to dispatcher
Why:
- reduces entrypoint complexity and improves testability of dispatch behavior
- isolates robot docs machinery from runtime bootstrapping
Calling-code updates required:
- `main.rs`: replace direct handler function definitions with calls into `app::*`
- `lib.rs`: add `pub mod app;` if shared imports needed by tests
---
## Phase 3: Split Large Command Files by Responsibility
### 5.6 Split `cli/commands/list.rs`
Proposed:
- `commands/list/issues.rs` (issue queries + issue output)
- `commands/list/mrs.rs` (MR queries + MR output)
- `commands/list/notes.rs` (note queries + note output)
- `commands/list/render.rs` (shared formatting helpers)
- `commands/list/mod.rs` (public API and re-exports)
Why:
- list concerns are already logically tripartite
- better locality for bugfixes and feature additions
Calling-code updates required:
- `src/cli/commands/mod.rs`: import module folder and re-export unchanged API names
- `src/main.rs`: ideally no change if `commands/mod.rs` re-exports remain stable
### 5.7 Split `cli/commands/show.rs`
Proposed:
- `commands/show/issue.rs`
- `commands/show/mr.rs`
- `commands/show/render.rs`
- `commands/show/mod.rs`
Why:
- issue and MR detail assembly have separate SQL and shape logic
- rendering concerns can be isolated from data retrieval
Calling-code updates required:
- `src/cli/commands/mod.rs` re-exports preserved (`run_show_issue`, `run_show_mr`, printers)
- `src/main.rs` remains stable if re-exports preserved
### 5.8 Split `cli/commands/ingest.rs` and `cli/commands/sync.rs`
Proposed:
- `commands/ingest/run.rs`, `dry_run.rs`, `render.rs`, `mod.rs`
- `commands/sync/run.rs`, `render.rs`, `surgical.rs`, `mod.rs`
Why:
- orchestration, preview generation, and output rendering are currently intertwined
- surgical sync is semantically part of sync command family
Calling-code updates required:
- update `src/cli/commands/mod.rs` exports
- update `src/cli/commands/sync_surgical.rs` path if merged into `commands/sync/surgical.rs`
- no CLI UX changes expected if external API names remain
### 5.9 Split `documents/extractor.rs`
Proposed:
- `documents/extractor/issues.rs`
- `documents/extractor/mrs.rs`
- `documents/extractor/discussions.rs`
- `documents/extractor/notes.rs`
- `documents/extractor/common.rs`
- `documents/extractor/mod.rs`
Why:
- extractor currently contains four independent source-type extraction paths
- per-source unit tests become easier to target
Calling-code updates required:
- `src/documents/mod.rs` re-export surface remains stable
- `src/documents/regenerator.rs` imports update only if internal re-export paths change
---
## Phase 4: Opportunistic Consolidations
### 5.10 Merge tiny embedding chunk helpers
Merge:
- `src/embedding/chunk_ids.rs`
- `src/embedding/chunking.rs`
- into `src/embedding/chunks.rs`
Why:
- both represent one conceptual concern: chunk partitioning and chunk identity mapping
- avoids tiny-file scattering
Calling-code updates required:
- `src/embedding/pipeline.rs`
- `src/embedding/change_detector.rs`
- `src/search/vector.rs`
- `src/embedding/mod.rs` exports
### 5.11 Test helper de-duplication
Add a shared test support module for repeated DB fixture setup currently duplicated in:
- `src/cli/commands/who_tests.rs`
- `src/cli/commands/list_tests.rs`
- `src/cli/commands/me/me_tests.rs`
- multiple `core/*_tests.rs`
Why:
- lower maintenance cost and fewer fixture drift bugs
Calling-code updates required:
- test-only imports in affected files
---
## 6. File-Level Recommendation Matrix
Legend:
- `KEEP`: structure is already coherent
- `MOVE`: relocate without major logic split
- `SPLIT`: divide into focused files/modules
- `MERGE`: consolidate tiny related files
### 6.1 `core/`
- `backoff.rs` -> KEEP
- `config.rs` -> KEEP (large but cohesive)
- `cron.rs` -> KEEP
- `cursor.rs` -> KEEP
- `db.rs` -> KEEP
- `dependent_queue.rs` -> MOVE to `ingestion/storage/queue.rs`
- `error.rs` -> KEEP
- `events_db.rs` -> MOVE to `ingestion/storage/events.rs`
- `file_history.rs` -> KEEP
- `lock.rs` -> KEEP
- `logging.rs` -> KEEP
- `metrics.rs` -> KEEP
- `note_parser.rs` -> MOVE to `xref/note_parser.rs`
- `path_resolver.rs` -> KEEP
- `paths.rs` -> KEEP
- `payloads.rs` -> MOVE to `ingestion/storage/payloads.rs`
- `project.rs` -> KEEP
- `references.rs` -> MOVE to `xref/references.rs`
- `shutdown.rs` -> KEEP
- `sync_run.rs` -> MOVE to `ingestion/storage/sync_run.rs`
- `time.rs` -> KEEP
- `timeline.rs`, `timeline_seed.rs`, `timeline_expand.rs`, `timeline_collect.rs` -> MOVE to `timeline/`
- `trace.rs` -> KEEP
### 6.2 `cli/`
- `mod.rs` -> SPLIT (`mod.rs` + `args.rs`)
- `autocorrect.rs` -> SPLIT into `autocorrect/` submodules
- `render.rs` -> SPLIT into `render/` submodules
- `commands/list.rs` -> SPLIT into `commands/list/`
- `commands/show.rs` -> SPLIT into `commands/show/`
- `commands/ingest.rs` -> SPLIT into `commands/ingest/`
- `commands/sync.rs` + `commands/sync_surgical.rs` -> SPLIT/MERGE into `commands/sync/`
- `commands/me/*` -> KEEP (already good shape)
- `commands/who/*` -> KEEP (already good shape)
- small focused commands (`auth_test`, `embed`, `trace`, etc.) -> KEEP
### 6.3 `documents/`
- `extractor.rs` -> SPLIT into extractor folder
- `regenerator.rs` -> KEEP
- `truncation.rs` -> KEEP
### 6.4 `embedding/`
- `change_detector.rs` -> KEEP
- `chunk_ids.rs` + `chunking.rs` -> MERGE into `chunks.rs`
- `ollama.rs` -> KEEP
- `pipeline.rs` -> KEEP for now (already a pipeline-centric file)
- `similarity.rs` -> KEEP
### 6.5 `gitlab/`, `search/`
- KEEP as-is except minor internal refactors only when touched by feature work
---
## 7. Import/Call-Site Impact Tracker (must-update list)
This section tracks files that must be updated when moves happen to avoid broken builds.
### 7.1 For timeline extraction
Must update:
- `src/cli/commands/timeline.rs`
- `tests/timeline_pipeline_tests.rs`
- moved timeline module internals (`seed`, `expand`, `collect`)
- `src/core/mod.rs`
- `src/lib.rs`
### 7.2 For xref extraction
Must update:
- `src/ingestion/orchestrator.rs` (all `core::references` and `core::note_parser` paths)
- tests importing moved modules
- `src/core/mod.rs`
- `src/lib.rs`
### 7.3 For ingestion storage move
Must update:
- `src/ingestion/discussions.rs`
- `src/ingestion/issues.rs`
- `src/ingestion/merge_requests.rs`
- `src/ingestion/mr_discussions.rs`
- `src/ingestion/orchestrator.rs`
- `src/cli/commands/count.rs`
- `src/cli/commands/sync_surgical.rs`
- `src/main.rs`
- `src/core/mod.rs`
- `src/ingestion/mod.rs`
### 7.4 For CLI args split
Must update:
- `src/main.rs`
- `src/cli/commands/who/mod.rs`
- `src/cli/commands/me/mod.rs`
- any command file importing args directly from `crate::cli::*Args`
### 7.5 For command file splits
Must update:
- `src/cli/commands/mod.rs` re-exports
- tests that import command internals by file/module path
- `src/main.rs` only if re-export names change (recommended: keep names stable)
---
## 8. Execution Strategy (Safe Order)
Recommended order:
1. Phase 1 (`timeline`, `xref`, `ingestion/storage`) with no behavior changes.
2. Phase 2 (`cli/mod.rs` split, `main.rs` thinning) while preserving command signatures.
3. Phase 3 (`list`, `show`, `ingest`, `sync`, `extractor` splits).
4. Phase 4 opportunistic merges and test helper dedupe.
For each phase:
- complete file moves/splits and import rewrites in one cohesive change
- run quality gates
- only then proceed to next phase
---
## 9. Verification and Non-Regression Checklist
After each phase, run:
```bash
cargo check --all-targets
cargo clippy --all-targets -- -D warnings
cargo fmt --check
cargo test
cargo test -- --nocapture
```
Targeted suites to run when relevant:
- timeline moves: `cargo test timeline_pipeline_tests`
- who/me/list splits: `cargo test who_tests`, `cargo test list_tests`, `cargo test me_tests`
- ingestion storage moves: `cargo test ingestion`
Before each commit, run UBS on changed files:
```bash
ubs <changed-files>
```
---
## 10. Risks and Mitigations
Primary risks:
1. Import path churn causing compile errors.
2. Accidental visibility changes (`pub`/`pub(crate)`) during file splits.
3. Re-export drift breaking `main.rs` or tests.
4. Behavioral drift from mixed refactor + logic changes.
Mitigations:
- refactor-only phases (no feature changes)
- keep public API names stable during directory reshapes
- preserve command re-exports in `cli/commands/mod.rs`
- run full quality gates after each phase
---
## 11. Recommendation
Start with **Phase 1 only** in the first implementation pass. It yields major clarity gains with relatively constrained blast radius.
If Phase 1 lands cleanly, proceed with Phase 2. Phase 3 should be done in smaller PR-sized chunks (`list` first, then `show`, then `ingest/sync`, then `documents/extractor`).
No code/file moves have been executed yet; this document is the proposal for review and approval.