Proposed Code File Reorganization Plan
1. Scope, Audit Method, and Constraints
This plan is based on a full audit of the src/ tree (all 131 Rust files) plus integration tests in tests/ that import src modules.
What I audited:
- module/file inventory (src/**.rs)
- line counts and hotspot analysis
- crate-internal import graph (use crate::...)
- public API surface (public structs/enums/functions by file)
- command routing and re-export topology (main.rs, lib.rs, cli/mod.rs, cli/commands/mod.rs)
- cross-module coupling and test coupling
Constraints followed for this proposal:
- no implementation yet (plan only)
- keep nesting shallow and intuitive
- optimize for discoverability for humans and coding agents
- no compatibility shims as a long-term strategy
- every structural change includes explicit call-site update tracking
2. Current State (Measured)
2.1 Size by top-level module (src/)
| Module | Files | Lines | Prod Files | Prod Lines | Test Files | Test Lines |
|---|---|---|---|---|---|---|
| cli | 41 | 29,131 | 37 | 23,068 | 4 | 6,063 |
| core | 39 | 12,493 | 27 | 7,599 | 12 | 4,894 |
| ingestion | 15 | 6,935 | 10 | 5,259 | 5 | 1,676 |
| documents | 6 | 3,657 | 4 | 1,749 | 2 | 1,908 |
| gitlab | 11 | 3,607 | 8 | 2,391 | 3 | 1,216 |
| embedding | 10 | 1,878 | 7 | 1,327 | 3 | 551 |
| search | 6 | 1,115 | 6 | 1,115 | 0 | 0 |
| main.rs | 1 | 3,744 | 1 | 3,744 | 0 | 0 |
| lib.rs | 1 | 9 | 1 | 9 | 0 | 0 |
Total in src/: 131 files / 62,569 lines.
2.2 Largest production hotspots
| File | Lines | Why it matters |
|---|---|---|
| src/main.rs | 3,744 | Binary entrypoint is doing too much dispatch and formatting work |
| src/cli/autocorrect.rs | 1,865 | Large parsing/correction ruleset in one file |
| src/ingestion/orchestrator.rs | 1,753 | Multi-stage ingestion orchestration and persistence mixed together |
| src/cli/commands/show.rs | 1,544 | Issue/MR retrieval + rendering + JSON conversion all in one file |
| src/cli/render.rs | 1,482 | Theme, table layout, formatting utilities bundled together |
| src/cli/commands/list.rs | 1,383 | Issues + MRs + notes listing/query/printing in one file |
| src/cli/mod.rs | 1,268 | Clap root parser plus every args struct |
| src/cli/commands/sync.rs | 1,201 | Sync flow + human rendering + JSON output |
| src/cli/commands/me/queries.rs | 1,135 | Multiple query families and post-processing logic |
| src/cli/commands/ingest.rs | 1,116 | Ingest flow + dry-run + presentation concerns |
| src/documents/extractor.rs | 1,059 | Four document source extractors in one file |
2.3 High-level dependency flow (top modules)
Observed module coupling from imports:
- cli -> core (very heavy, 33 files)
- cli -> documents/embedding/gitlab/ingestion/search (command-dependent)
- ingestion -> core (12 files), ingestion -> gitlab (10 files)
- search -> core and search -> embedding
- timeline logic currently located under core/*timeline* but semantically acts as its own subsystem
2.4 Structural pain points
- main.rs is overloaded with command handlers, robot output envelope types, clap error mapping, and domain invocation.
- cli/mod.rs mixes root parser concerns with command-specific argument schemas.
- core/ still holds domain-specific subsystems (timeline, cross-reference extraction, ingestion persistence helpers) that are not truly "core infra".
- Several large command files combine query/build/fetch/render/json responsibilities.
- Test helper setup is duplicated heavily in large test files (who_tests, list_tests, me_tests).
3. Reorganization Principles
- Keep top-level domains explicit: cli, core (infra), gitlab, ingestion, documents, embedding, search, plus extracted domain modules where justified.
- Keep nesting shallow: max 2-3 levels in normal workflow paths.
- Co-locate command-specific args/types/rendering with the command implementation.
- Separate orchestration from formatting from data-access code.
- Prefer module boundaries that map to runtime pipeline boundaries.
- Make import paths reveal ownership directly.
4. Proposed Target Structure (End State)
    src/
      main.rs  # thin binary entrypoint
      lib.rs
      app/  # NEW: runtime dispatch/orchestration glue
        mod.rs
        dispatch.rs
        errors.rs
        robot_docs.rs
      cli/
        mod.rs  # Cli + Commands only
        args.rs  # shared args structs used by Commands variants
        render/
          mod.rs
          format.rs
          table.rs
          theme.rs
        autocorrect/
          mod.rs
          flags.rs
          enums.rs
          fuzzy.rs
        commands/
          mod.rs
          list/
            mod.rs
            issues.rs
            mrs.rs
            notes.rs
            render.rs
          show/
            mod.rs
            issue.rs
            mr.rs
            render.rs
          me/  # keep existing folder, retain split style
          who/  # keep existing folder, retain split style
          ingest/
            mod.rs
            run.rs
            dry_run.rs
            render.rs
          sync/
            mod.rs
            run.rs
            render.rs
            surgical.rs
          # smaller focused commands can stay single-file for now
      core/  # infra-only boundary after moves
        mod.rs
        backoff.rs
        config.rs
        cron.rs
        cursor.rs
        db.rs
        error.rs
        file_history.rs
        lock.rs
        logging.rs
        metrics.rs
        path_resolver.rs
        paths.rs
        project.rs
        shutdown.rs
        time.rs
        trace.rs
      timeline/  # NEW: extracted domain subsystem
        mod.rs
        types.rs
        seed.rs
        expand.rs
        collect.rs
      xref/  # NEW: extracted cross-reference subsystem
        mod.rs
        note_parser.rs
        references.rs
      ingestion/
        mod.rs
        issues.rs
        merge_requests.rs
        discussions.rs
        mr_discussions.rs
        mr_diffs.rs
        dirty_tracker.rs
        discussion_queue.rs
        orchestrator/
          mod.rs
          issues_flow.rs
          mrs_flow.rs
          resource_events.rs
          closes_issues.rs
          diff_jobs.rs
          progress.rs
        storage/  # NEW: ingestion-owned persistence helpers
          mod.rs
          payloads.rs  # from core/payloads.rs
          events.rs  # from core/events_db.rs
          queue.rs  # from core/dependent_queue.rs
          sync_run.rs  # from core/sync_run.rs
      documents/
        mod.rs
        extractor/
          mod.rs
          issues.rs
          mrs.rs
          discussions.rs
          notes.rs
          common.rs
        regenerator.rs
        truncation.rs
      embedding/
        mod.rs
        change_detector.rs
        chunks.rs  # merge chunk_ids.rs + chunking.rs
        ollama.rs
        pipeline.rs
        similarity.rs
      gitlab/
        # mostly keep as-is (already coherent)
      search/
        # mostly keep as-is (already coherent)
Notes:
- gitlab/ and search/ are already cohesive and should largely remain unchanged.
- who/ and me/ command families are already split well relative to other commands.
5. Detailed Change Plan (Phased)
Phase 1: Domain Boundary Extraction (lowest conceptual risk, high clarity gain)
5.1 Extract timeline subsystem from core
Move:
- src/core/timeline.rs -> src/timeline/types.rs
- src/core/timeline_seed.rs -> src/timeline/seed.rs
- src/core/timeline_expand.rs -> src/timeline/expand.rs
- src/core/timeline_collect.rs -> src/timeline/collect.rs
- add src/timeline/mod.rs
Why:
- Timeline is a full pipeline domain (seed -> expand -> collect), not core infra.
- Improves discoverability for lore timeline and timeline tests.
Calling-code updates required:
- src/cli/commands/timeline.rs
  - crate::core::timeline::* -> crate::timeline::*
  - crate::core::timeline_seed::* -> crate::timeline::seed::*
  - crate::core::timeline_expand::* -> crate::timeline::expand::*
  - crate::core::timeline_collect::* -> crate::timeline::collect::*
- tests/timeline_pipeline_tests.rs
  - lore::core::timeline* imports -> lore::timeline::*
- internal references among moved files update from crate::core::timeline to crate::timeline::types
- src/core/mod.rs: remove timeline* module declarations
- src/lib.rs: add pub mod timeline;
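The path rewrites listed above can be sketched in miniature. In the block below, inline modules stand in for the files under src/timeline/, and every item name (TimelineEvent, seed_events, expand_events, run_timeline) is a hypothetical placeholder rather than the real API; only the module paths come from this plan.

```rust
// Inline modules stand in for the files under src/timeline/.
// Item names are hypothetical; only the module paths follow the plan.
pub mod timeline {
    // src/timeline/types.rs (moved from src/core/timeline.rs)
    pub mod types {
        #[derive(Debug, Clone, PartialEq)]
        pub struct TimelineEvent {
            pub entity_id: u64,
            pub label: String,
        }
    }

    // src/timeline/seed.rs (moved from src/core/timeline_seed.rs)
    pub mod seed {
        // Before the move: use crate::core::timeline::TimelineEvent;
        use crate::timeline::types::TimelineEvent;

        pub fn seed_events(ids: &[u64]) -> Vec<TimelineEvent> {
            ids.iter()
                .map(|&id| TimelineEvent {
                    entity_id: id,
                    label: format!("seed:{}", id),
                })
                .collect()
        }
    }

    // src/timeline/expand.rs (moved from src/core/timeline_expand.rs)
    pub mod expand {
        use crate::timeline::types::TimelineEvent;

        // Appends one "related" event per seed event.
        pub fn expand_events(mut events: Vec<TimelineEvent>) -> Vec<TimelineEvent> {
            let related: Vec<TimelineEvent> = events
                .iter()
                .map(|e| TimelineEvent {
                    entity_id: e.entity_id,
                    label: format!("related:{}", e.entity_id),
                })
                .collect();
            events.extend(related);
            events
        }
    }
}

// Call site in the style of src/cli/commands/timeline.rs after the move.
use crate::timeline::{expand::expand_events, seed::seed_events};

pub fn run_timeline(ids: &[u64]) -> usize {
    expand_events(seed_events(ids)).len()
}
```

The only edits a moved file should need are its use lines; function bodies stay untouched, which keeps the phase refactor-only.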
5.2 Extract cross-reference subsystem from core
Move:
- src/core/note_parser.rs -> src/xref/note_parser.rs
- src/core/references.rs -> src/xref/references.rs
- add src/xref/mod.rs
Why:
- Cross-reference extraction is a domain subsystem feeding ingestion and timeline.
- Current placement in core/ obscures data flow.
Calling-code updates required:
- src/ingestion/orchestrator.rs
  - crate::core::references::* -> crate::xref::references::*
  - crate::core::note_parser::* -> crate::xref::note_parser::*
- src/core/mod.rs: remove note_parser and references
- src/lib.rs: add pub mod xref;
- tests referencing old paths update to crate::xref::*
5.3 Move ingestion-owned persistence helpers out of core
Move:
- src/core/payloads.rs -> src/ingestion/storage/payloads.rs
- src/core/events_db.rs -> src/ingestion/storage/events.rs
- src/core/dependent_queue.rs -> src/ingestion/storage/queue.rs
- src/core/sync_run.rs -> src/ingestion/storage/sync_run.rs
- add src/ingestion/storage/mod.rs
Why:
- These files primarily support ingestion/sync runtime behavior and ingestion persistence.
- Consolidates ingestion runtime + ingestion storage into one domain area.
Calling-code updates required:
- src/ingestion/discussions.rs, issues.rs, merge_requests.rs, mr_discussions.rs
  - core::payloads::* -> ingestion::storage::payloads::*
- src/ingestion/orchestrator.rs
  - core::dependent_queue::* -> ingestion::storage::queue::*
  - core::events_db::* -> ingestion::storage::events::*
- src/main.rs
  - core::dependent_queue::release_all_locked_jobs -> ingestion::storage::queue::release_all_locked_jobs
  - core::sync_run::SyncRunRecorder -> ingestion::storage::sync_run::SyncRunRecorder
- src/cli/commands/count.rs
  - core::events_db::* -> ingestion::storage::events::*
- src/cli/commands/sync_surgical.rs
  - core::sync_run::SyncRunRecorder -> ingestion::storage::sync_run::SyncRunRecorder
- src/core/mod.rs: remove moved modules
- src/ingestion/mod.rs: export pub mod storage;
Phase 2: CLI Structure Cleanup (high dev ergonomics impact)
5.4 Split cli/mod.rs responsibilities
Current:
- root parser (Cli, Commands)
- all args structs (IssuesArgs, WhoArgs, MeArgs, etc.)
Proposed:
- src/cli/mod.rs: only Cli, Commands, top-level parser behavior
- src/cli/args.rs: all args structs and command-local enums (CronAction, TokenAction)
Why:
- keeps parser root small and readable
- one canonical place for args schemas
Calling-code updates required:
- src/main.rs
  - use lore::cli::{..., WhoArgs, ...} -> use lore::cli::args::{...} (or re-export from cli/mod.rs)
- src/cli/commands/who/mod.rs
  - use crate::cli::WhoArgs; -> use crate::cli::args::WhoArgs;
- src/cli/commands/me/mod.rs
  - use crate::cli::MeArgs; -> use crate::cli::args::MeArgs;
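A minimal sketch of the two import options named above, assuming plain structs in place of the real clap-derived types. The fields on WhoArgs and MeArgs and the describe_who helper are invented for illustration; only the module paths come from this plan.

```rust
// Inline modules stand in for src/cli/mod.rs and src/cli/args.rs.
pub mod cli {
    // src/cli/args.rs: the single home for args schemas.
    pub mod args {
        // Fields are hypothetical; the real structs derive clap traits.
        #[derive(Debug, Default, PartialEq)]
        pub struct WhoArgs {
            pub username: Option<String>,
        }

        #[derive(Debug, Default, PartialEq)]
        pub struct MeArgs {
            pub robot: bool,
        }
    }

    // Optional re-export so `cli::WhoArgs` keeps resolving while call
    // sites migrate. Treat it as transitional, not a long-term shim.
    pub use self::args::{MeArgs, WhoArgs};
}

// Preferred import after the split: the path names the owning file.
use crate::cli::args::WhoArgs;

// Hypothetical consumer, in the style of src/cli/commands/who/mod.rs.
pub fn describe_who(args: &WhoArgs) -> String {
    match &args.username {
        Some(name) => format!("who: {}", name),
        None => "who: (all)".to_string(),
    }
}
```

Both paths name the same type, so call sites can migrate file by file and the re-export can be deleted once nothing depends on it.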
5.5 Make main.rs thin by moving dispatch logic to app/
Proposed splits from main.rs:
- app/dispatch.rs: all handle_* command handlers
- app/errors.rs: clap error mapping, correction warning formatting
- app/robot_docs.rs: robot docs schema/data envelope generation
- keep main.rs: startup, logging init, parse, delegate to dispatcher
Why:
- reduces entrypoint complexity and improves testability of dispatch behavior
- isolates robot docs machinery from runtime bootstrapping
Calling-code updates required:
- main.rs: replace direct handler function definitions with calls into app::*
- lib.rs: add pub mod app; if shared imports needed by tests
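One way the thinned entrypoint could look, as a rough sketch. The Commands variants, handler names, and the run wrapper are hypothetical; the real Commands enum is clap-derived and much larger.

```rust
// Hypothetical stand-in for the clap-derived Commands enum in src/cli/mod.rs.
#[derive(Debug)]
pub enum Commands {
    Issues { limit: usize },
    Timeline { query: String },
}

pub mod app {
    // src/app/dispatch.rs: owns every handle_* function.
    pub mod dispatch {
        use crate::Commands;

        pub fn dispatch(cmd: Commands) -> Result<String, String> {
            match cmd {
                Commands::Issues { limit } => handle_issues(limit),
                Commands::Timeline { query } => handle_timeline(&query),
            }
        }

        fn handle_issues(limit: usize) -> Result<String, String> {
            if limit == 0 {
                return Err("limit must be greater than zero".to_string());
            }
            Ok(format!("listing up to {} issues", limit))
        }

        fn handle_timeline(query: &str) -> Result<String, String> {
            Ok(format!("timeline for {}", query))
        }
    }
}

// Roughly what main.rs reduces to after startup and argument parsing:
// delegate, print, and map the result to an exit code.
pub fn run(cmd: Commands) -> i32 {
    match app::dispatch::dispatch(cmd) {
        Ok(output) => {
            println!("{}", output);
            0
        }
        Err(message) => {
            eprintln!("error: {}", message);
            1
        }
    }
}
```

Because dispatch returns a value instead of printing and exiting, dispatch behavior can be asserted in tests without spawning the binary.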
Phase 3: Split Large Command Files by Responsibility
5.6 Split cli/commands/list.rs
Proposed:
- commands/list/issues.rs (issue queries + issue output)
- commands/list/mrs.rs (MR queries + MR output)
- commands/list/notes.rs (note queries + note output)
- commands/list/render.rs (shared formatting helpers)
- commands/list/mod.rs (public API and re-exports)
Why:
- list concerns are already logically tripartite
- better locality for bugfixes and feature additions
Calling-code updates required:
- src/cli/commands/mod.rs: import module folder and re-export unchanged API names
- src/main.rs: ideally no change if commands/mod.rs re-exports remain stable
5.7 Split cli/commands/show.rs
Proposed:
- commands/show/issue.rs
- commands/show/mr.rs
- commands/show/render.rs
- commands/show/mod.rs
Why:
- issue and MR detail assembly have separate SQL and shape logic
- rendering concerns can be isolated from data retrieval
Calling-code updates required:
- src/cli/commands/mod.rs re-exports preserved (run_show_issue, run_show_mr, printers)
- src/main.rs remains stable if re-exports preserved
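The retrieval/rendering separation might look like the sketch below. run_show_issue is a name from the list above, but its signature here, the IssueDetail struct, and render_issue are assumptions made for illustration, and a hard-coded record stands in for the SQL lookup.

```rust
// Inline modules stand in for the files under src/cli/commands/show/.
pub mod show {
    // commands/show/issue.rs: data assembly only, no printing.
    pub mod issue {
        // Hypothetical shape; the real detail type is not shown in this plan.
        #[derive(Debug, Clone, PartialEq)]
        pub struct IssueDetail {
            pub iid: u64,
            pub title: String,
            pub labels: Vec<String>,
        }

        // A hard-coded record stands in for the SQL-backed lookup.
        pub fn run_show_issue(iid: u64) -> Option<IssueDetail> {
            if iid == 42 {
                Some(IssueDetail {
                    iid: 42,
                    title: "Fix login".to_string(),
                    labels: vec!["bug".to_string(), "auth".to_string()],
                })
            } else {
                None
            }
        }
    }

    // commands/show/render.rs: formatting only, no data access.
    pub mod render {
        use super::issue::IssueDetail;

        pub fn render_issue(detail: &IssueDetail) -> String {
            format!(
                "#{} {} [{}]",
                detail.iid,
                detail.title,
                detail.labels.join(", ")
            )
        }
    }

    // commands/show/mod.rs: re-exports keep the public names stable,
    // so callers do not need to know about the split.
    pub use self::issue::{run_show_issue, IssueDetail};
    pub use self::render::render_issue;
}
```

With this shape, a rendering bug can be reproduced from a hand-built IssueDetail, without a database fixture.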
5.8 Split cli/commands/ingest.rs and cli/commands/sync.rs
Proposed:
- commands/ingest/run.rs, dry_run.rs, render.rs, mod.rs
- commands/sync/run.rs, render.rs, surgical.rs, mod.rs
Why:
- orchestration, preview generation, and output rendering are currently intertwined
- surgical sync is semantically part of sync command family
Calling-code updates required:
- update src/cli/commands/mod.rs exports
- update src/cli/commands/sync_surgical.rs path if merged into commands/sync/surgical.rs
- no CLI UX changes expected if external API names remain unchanged
5.9 Split documents/extractor.rs
Proposed:
- documents/extractor/issues.rs
- documents/extractor/mrs.rs
- documents/extractor/discussions.rs
- documents/extractor/notes.rs
- documents/extractor/common.rs
- documents/extractor/mod.rs
Why:
- extractor currently contains four independent source-type extraction paths
- per-source unit tests become easier to target
Calling-code updates required:
- src/documents/mod.rs re-export surface remains stable
- src/documents/regenerator.rs imports update only if internal re-export paths change
Phase 4: Opportunistic Consolidations
5.10 Merge tiny embedding chunk helpers
Merge:
- src/embedding/chunk_ids.rs
- src/embedding/chunking.rs
- into src/embedding/chunks.rs
Why:
- both represent one conceptual concern: chunk partitioning and chunk identity mapping
- avoids tiny-file scattering
Calling-code updates required:
- src/embedding/pipeline.rs
- src/embedding/change_detector.rs
- src/search/vector.rs
- src/embedding/mod.rs exports
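To illustrate why the two helpers read as one concern, here is a hypothetical merged chunks module. None of the function names, the constant, or the ID encoding scheme are taken from the existing chunk_ids.rs or chunking.rs; they are assumptions chosen only to show partitioning and identity mapping side by side.

```rust
// Hypothetical stand-in for src/embedding/chunks.rs after the merge.
pub mod chunks {
    // Assumed upper bound on chunks per document (illustrative only).
    pub const CHUNKS_PER_DOCUMENT: i64 = 1_000;

    // Partitioning: split text into pieces of at most `max_chars` characters.
    pub fn split_into_chunks(text: &str, max_chars: usize) -> Vec<String> {
        assert!(max_chars > 0, "max_chars must be positive");
        let chars: Vec<char> = text.chars().collect();
        chars
            .chunks(max_chars)
            .map(|piece| piece.iter().collect::<String>())
            .collect()
    }

    // Identity: pack (document_id, chunk_index) into a single integer id.
    pub fn encode_chunk_id(document_id: i64, chunk_index: i64) -> i64 {
        document_id * CHUNKS_PER_DOCUMENT + chunk_index
    }

    // Identity: recover (document_id, chunk_index) from a packed id.
    pub fn decode_chunk_id(chunk_id: i64) -> (i64, i64) {
        (chunk_id / CHUNKS_PER_DOCUMENT, chunk_id % CHUNKS_PER_DOCUMENT)
    }
}
```

Keeping the index produced by partitioning next to the function that encodes it makes the coupling between the two visible in one file.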
5.11 Test helper de-duplication
Add a shared test support module for repeated DB fixture setup currently duplicated in:
- src/cli/commands/who_tests.rs
- src/cli/commands/list_tests.rs
- src/cli/commands/me/me_tests.rs
- multiple core/*_tests.rs
Why:
- lower maintenance cost and fewer fixture drift bugs
Calling-code updates required:
- test-only imports in affected files
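A shared support module could take roughly the following shape. FixtureDb is a hypothetical in-memory stand-in for the real database fixture, and all names here are invented; the point is only that seeding lives in one place and each test file imports it.

```rust
// In the repository this module would likely be gated with #[cfg(test)];
// it is left ungated here so the sketch compiles on its own.
pub mod test_support {
    // Hypothetical in-memory stand-in for the real DB fixture.
    #[derive(Debug, Default)]
    pub struct FixtureDb {
        pub projects: Vec<(i64, String)>,    // (id, path)
        pub issues: Vec<(i64, i64, String)>, // (project_id, iid, title)
    }

    impl FixtureDb {
        pub fn new() -> Self {
            Self::default()
        }

        // Inserts a project and returns its generated id.
        pub fn insert_project(&mut self, path: &str) -> i64 {
            let id = self.projects.len() as i64 + 1;
            self.projects.push((id, path.to_string()));
            id
        }

        pub fn insert_issue(&mut self, project_id: i64, iid: i64, title: &str) {
            self.issues.push((project_id, iid, title.to_string()));
        }
    }

    // One shared seeding routine instead of a copy per test file.
    pub fn seeded_db() -> FixtureDb {
        let mut db = FixtureDb::new();
        let project = db.insert_project("group/project");
        db.insert_issue(project, 1, "First issue");
        db.insert_issue(project, 2, "Second issue");
        db
    }
}
```

A test in who_tests or list_tests would then start from test_support::seeded_db() and add only the rows specific to its scenario.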
6. File-Level Recommendation Matrix
Legend:
- KEEP: structure is already coherent
- MOVE: relocate without major logic split
- SPLIT: divide into focused files/modules
- MERGE: consolidate tiny related files
6.1 core/
- backoff.rs -> KEEP
- config.rs -> KEEP (large but cohesive)
- cron.rs -> KEEP
- cursor.rs -> KEEP
- db.rs -> KEEP
- dependent_queue.rs -> MOVE to ingestion/storage/queue.rs
- error.rs -> KEEP
- events_db.rs -> MOVE to ingestion/storage/events.rs
- file_history.rs -> KEEP
- lock.rs -> KEEP
- logging.rs -> KEEP
- metrics.rs -> KEEP
- note_parser.rs -> MOVE to xref/note_parser.rs
- path_resolver.rs -> KEEP
- paths.rs -> KEEP
- payloads.rs -> MOVE to ingestion/storage/payloads.rs
- project.rs -> KEEP
- references.rs -> MOVE to xref/references.rs
- shutdown.rs -> KEEP
- sync_run.rs -> MOVE to ingestion/storage/sync_run.rs
- time.rs -> KEEP
- timeline.rs, timeline_seed.rs, timeline_expand.rs, timeline_collect.rs -> MOVE to timeline/
- trace.rs -> KEEP
6.2 cli/
- mod.rs -> SPLIT (mod.rs + args.rs)
- autocorrect.rs -> SPLIT into autocorrect/ submodules
- render.rs -> SPLIT into render/ submodules
- commands/list.rs -> SPLIT into commands/list/
- commands/show.rs -> SPLIT into commands/show/
- commands/ingest.rs -> SPLIT into commands/ingest/
- commands/sync.rs + commands/sync_surgical.rs -> SPLIT/MERGE into commands/sync/
- commands/me/* -> KEEP (already good shape)
- commands/who/* -> KEEP (already good shape)
- small focused commands (auth_test, embed, trace, etc.) -> KEEP
6.3 documents/
- extractor.rs -> SPLIT into extractor folder
- regenerator.rs -> KEEP
- truncation.rs -> KEEP
6.4 embedding/
- change_detector.rs -> KEEP
- chunk_ids.rs + chunking.rs -> MERGE into chunks.rs
- ollama.rs -> KEEP
- pipeline.rs -> KEEP for now (already a pipeline-centric file)
- similarity.rs -> KEEP
6.5 gitlab/, search/
- KEEP as-is except minor internal refactors only when touched by feature work
7. Import/Call-Site Impact Tracker (must-update list)
This section tracks files that must be updated when moves happen to avoid broken builds.
7.1 For timeline extraction
Must update:
- src/cli/commands/timeline.rs
- tests/timeline_pipeline_tests.rs
- moved timeline module internals (seed, expand, collect)
- src/core/mod.rs
- src/lib.rs
7.2 For xref extraction
Must update:
- src/ingestion/orchestrator.rs (all core::references and core::note_parser paths)
- tests importing moved modules
- src/core/mod.rs
- src/lib.rs
7.3 For ingestion storage move
Must update:
- src/ingestion/discussions.rs
- src/ingestion/issues.rs
- src/ingestion/merge_requests.rs
- src/ingestion/mr_discussions.rs
- src/ingestion/orchestrator.rs
- src/cli/commands/count.rs
- src/cli/commands/sync_surgical.rs
- src/main.rs
- src/core/mod.rs
- src/ingestion/mod.rs
7.4 For CLI args split
Must update:
- src/main.rs
- src/cli/commands/who/mod.rs
- src/cli/commands/me/mod.rs
- any command file importing args directly from crate::cli::*Args
7.5 For command file splits
Must update:
- src/cli/commands/mod.rs re-exports
- tests that import command internals by file/module path
- src/main.rs only if re-export names change (recommended: keep names stable)
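The must-update lists in this section could be cross-checked mechanically. The sketch below is a hypothetical helper, not an existing tool in the repository: it scans source text for the old Phase 1 paths and reports the replacement prefix. The path pairs come from sections 5.1 to 5.3; the function name and the substring-matching approach are assumptions.

```rust
// Old path prefix -> new path prefix, taken from the Phase 1 moves.
// The longer timeline_* prefixes are listed before core::timeline so
// that the first match is the most specific one.
const MOVES: &[(&str, &str)] = &[
    ("core::timeline_seed", "timeline::seed"),
    ("core::timeline_expand", "timeline::expand"),
    ("core::timeline_collect", "timeline::collect"),
    ("core::timeline", "timeline"),
    ("core::note_parser", "xref::note_parser"),
    ("core::references", "xref::references"),
    ("core::payloads", "ingestion::storage::payloads"),
    ("core::events_db", "ingestion::storage::events"),
    ("core::dependent_queue", "ingestion::storage::queue"),
    ("core::sync_run", "ingestion::storage::sync_run"),
];

// Returns (1-based line number, old prefix, new prefix) for every line
// that still mentions a moved module. Plain substring matching: good
// enough as a checklist aid, not a substitute for `cargo check`.
pub fn find_stale_imports(source: &str) -> Vec<(usize, &'static str, &'static str)> {
    let mut hits = Vec::new();
    for (index, line) in source.lines().enumerate() {
        if let Some(&(old, new)) = MOVES.iter().find(|(old, _)| line.contains(*old)) {
            hits.push((index + 1, old, new));
        }
    }
    hits
}
```

Run over each file in the lists above, an empty result suggests the file no longer references a moved module; the compiler remains the final authority.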
8. Execution Strategy (Safe Order)
Recommended order:
- Phase 1 (timeline, xref, ingestion/storage) with no behavior changes.
- Phase 2 (cli/mod.rs split, main.rs thinning) while preserving command signatures.
- Phase 3 (list, show, ingest, sync, extractor splits).
- Phase 4 opportunistic merges and test helper dedupe.
For each phase:
- complete file moves/splits and import rewrites in one cohesive change
- run quality gates
- only then proceed to next phase
9. Verification and Non-Regression Checklist
After each phase, run:
cargo check --all-targets
cargo clippy --all-targets -- -D warnings
cargo fmt --check
cargo test
cargo test -- --nocapture
Targeted suites to run when relevant:
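- timeline moves: cargo test --test timeline_pipeline_tests (an integration-test file under tests/ is selected with --test; a bare name only filters test function names)
- who/me/list splits: cargo test who_tests, cargo test list_tests, cargo test me_tests
- ingestion storage moves: cargo test ingestion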
Before each commit, run UBS on changed files:
ubs <changed-files>
10. Risks and Mitigations
Primary risks:
- Import path churn causing compile errors.
- Accidental visibility changes (pub/pub(crate)) during file splits.
- Re-export drift breaking main.rs or tests.
- Behavioral drift from mixed refactor + logic changes.
Mitigations:
- refactor-only phases (no feature changes)
- keep public API names stable during directory reshapes
- preserve command re-exports in cli/commands/mod.rs
- run full quality gates after each phase
11. Recommendation
Start with Phase 1 only in the first implementation pass. It yields major clarity gains with relatively constrained blast radius.
If Phase 1 lands cleanly, proceed with Phase 2. Phase 3 should be done in smaller PR-sized chunks (list first, then show, then ingest/sync, then documents/extractor).
No code/file moves have been executed yet; this document is the proposal for review and approval.