Files
gitlore/PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md

20 KiB

Proposed Code File Reorganization Plan

1. Scope, Audit Method, and Constraints

This plan is based on a full audit of the src/ tree (all 131 Rust files) plus integration tests in tests/ that import src modules.

What I audited:

  • module/file inventory (src/**.rs)
  • line counts and hotspot analysis
  • crate-internal import graph (use crate::...)
  • public API surface (public structs/enums/functions by file)
  • command routing and re-export topology (main.rs, lib.rs, cli/mod.rs, cli/commands/mod.rs)
  • cross-module coupling and test coupling

Constraints followed for this proposal:

  • no implementation yet (plan only)
  • keep nesting shallow and intuitive
  • optimize for discoverability for humans and coding agents
  • no compatibility shims as a long-term strategy
  • every structural change includes explicit call-site update tracking

2. Current State (Measured)

2.1 Size by top-level module (src/)

Module Files Lines Prod Files Prod Lines Test Files Test Lines
cli 41 29,131 37 23,068 4 6,063
core 39 12,493 27 7,599 12 4,894
ingestion 15 6,935 10 5,259 5 1,676
documents 6 3,657 4 1,749 2 1,908
gitlab 11 3,607 8 2,391 3 1,216
embedding 10 1,878 7 1,327 3 551
search 6 1,115 6 1,115 0 0
main.rs 1 3,744 1 3,744 0 0
lib.rs 1 9 1 9 0 0

Total in src/: 131 files / 62,569 lines.

2.2 Largest production hotspots

File Lines Why it matters
src/main.rs 3,744 Binary entrypoint is doing too much dispatch and formatting work
src/cli/autocorrect.rs 1,865 Large parsing/correction ruleset in one file
src/ingestion/orchestrator.rs 1,753 Multi-stage ingestion orchestration and persistence mixed together
src/cli/commands/show.rs 1,544 Issue/MR retrieval + rendering + JSON conversion all in one file
src/cli/render.rs 1,482 Theme, table layout, formatting utilities bundled together
src/cli/commands/list.rs 1,383 Issues + MRs + notes listing/query/printing in one file
src/cli/mod.rs 1,268 Clap root parser plus every args struct
src/cli/commands/sync.rs 1,201 Sync flow + human rendering + JSON output
src/cli/commands/me/queries.rs 1,135 Multiple query families and post-processing logic
src/cli/commands/ingest.rs 1,116 Ingest flow + dry-run + presentation concerns
src/documents/extractor.rs 1,059 Four document source extractors in one file

2.3 High-level dependency flow (top modules)

Observed module coupling from imports:

  • cli -> core (very heavy, 33 files)
  • cli -> documents/embedding/gitlab/ingestion/search (command-dependent)
  • ingestion -> core (12 files), ingestion -> gitlab (10 files)
  • search -> core and search -> embedding
  • timeline logic currently located under core/*timeline* but semantically acts as its own subsystem

2.4 Structural pain points

  1. main.rs is overloaded with command handlers, robot output envelope types, clap error mapping, and domain invocation.
  2. cli/mod.rs mixes root parser concerns with command-specific argument schemas.
  3. core/ still holds domain-specific subsystems (timeline, cross-reference extraction, ingestion persistence helpers) that are not truly "core infra".
  4. Several large command files combine query/build/fetch/render/json responsibilities.
  5. Test helper setup is duplicated heavily in large test files (who_tests, list_tests, me_tests).

3. Reorganization Principles

  1. Keep top-level domains explicit: cli, core (infra), gitlab, ingestion, documents, embedding, search, plus extracted domain modules where justified.
  2. Keep nesting shallow: max 2-3 levels in normal workflow paths.
  3. Co-locate command-specific args/types/rendering with the command implementation.
  4. Separate orchestration from formatting from data-access code.
  5. Prefer module boundaries that map to runtime pipeline boundaries.
  6. Make import paths reveal ownership directly.

4. Proposed Target Structure (End State)

src/
  main.rs                      # thin binary entrypoint
  lib.rs

  app/                         # NEW: runtime dispatch/orchestration glue
    mod.rs
    dispatch.rs
    errors.rs
    robot_docs.rs

  cli/
    mod.rs                     # Cli + Commands only
    args.rs                    # shared args structs used by Commands variants
    render/
      mod.rs
      format.rs
      table.rs
      theme.rs
    autocorrect/
      mod.rs
      flags.rs
      enums.rs
      fuzzy.rs
    commands/
      mod.rs
      list/
        mod.rs
        issues.rs
        mrs.rs
        notes.rs
        render.rs
      show/
        mod.rs
        issue.rs
        mr.rs
        render.rs
      me/                      # keep existing folder, retain split style
      who/                     # keep existing folder, retain split style
      ingest/
        mod.rs
        run.rs
        dry_run.rs
        render.rs
      sync/
        mod.rs
        run.rs
        render.rs
        surgical.rs
      # smaller focused commands can stay single-file for now

  core/                        # infra-only boundary after moves
    mod.rs
    backoff.rs
    config.rs
    cron.rs
    cursor.rs
    db.rs
    error.rs
    file_history.rs
    lock.rs
    logging.rs
    metrics.rs
    path_resolver.rs
    paths.rs
    project.rs
    shutdown.rs
    time.rs
    trace.rs

  timeline/                    # NEW: extracted domain subsystem
    mod.rs
    types.rs
    seed.rs
    expand.rs
    collect.rs

  xref/                        # NEW: extracted cross-reference subsystem
    mod.rs
    note_parser.rs
    references.rs

  ingestion/
    mod.rs
    issues.rs
    merge_requests.rs
    discussions.rs
    mr_discussions.rs
    mr_diffs.rs
    dirty_tracker.rs
    discussion_queue.rs
    orchestrator/
      mod.rs
      issues_flow.rs
      mrs_flow.rs
      resource_events.rs
      closes_issues.rs
      diff_jobs.rs
      progress.rs
    storage/                   # NEW: ingestion-owned persistence helpers
      mod.rs
      payloads.rs              # from core/payloads.rs
      events.rs                # from core/events_db.rs
      queue.rs                 # from core/dependent_queue.rs
      sync_run.rs              # from core/sync_run.rs

  documents/
    mod.rs
    extractor/
      mod.rs
      issues.rs
      mrs.rs
      discussions.rs
      notes.rs
      common.rs
    regenerator.rs
    truncation.rs

  embedding/
    mod.rs
    change_detector.rs
    chunks.rs                  # merge chunk_ids.rs + chunking.rs
    ollama.rs
    pipeline.rs
    similarity.rs

  gitlab/
    # mostly keep as-is (already coherent)

  search/
    # mostly keep as-is (already coherent)

Notes:

  • gitlab/ and search/ are already cohesive and should largely remain unchanged.
  • who/ and me/ command families are already split well relative to other commands.

5. Detailed Change Plan (Phased)

Phase 1: Domain Boundary Extraction (lowest conceptual risk, high clarity gain)

5.1 Extract timeline subsystem from core

Move:

  • src/core/timeline.rs -> src/timeline/types.rs
  • src/core/timeline_seed.rs -> src/timeline/seed.rs
  • src/core/timeline_expand.rs -> src/timeline/expand.rs
  • src/core/timeline_collect.rs -> src/timeline/collect.rs
  • add src/timeline/mod.rs

Why:

  • Timeline is a full pipeline domain (seed -> expand -> collect), not core infra.
  • Improves discoverability for lore timeline and timeline tests.

Calling-code updates required:

  • src/cli/commands/timeline.rs
    • crate::core::timeline::* -> crate::timeline::*
    • crate::core::timeline_seed::* -> crate::timeline::seed::*
    • crate::core::timeline_expand::* -> crate::timeline::expand::*
    • crate::core::timeline_collect::* -> crate::timeline::collect::*
  • tests/timeline_pipeline_tests.rs
    • lore::core::timeline* imports -> lore::timeline::*
  • internal references among moved files update from crate::core::timeline to crate::timeline::types
  • src/core/mod.rs: remove timeline* module declarations
  • src/lib.rs: add pub mod timeline;

5.2 Extract cross-reference subsystem from core

Move:

  • src/core/note_parser.rs -> src/xref/note_parser.rs
  • src/core/references.rs -> src/xref/references.rs
  • add src/xref/mod.rs

Why:

  • Cross-reference extraction is a domain subsystem feeding ingestion and timeline.
  • Current placement in core/ obscures data flow.

Calling-code updates required:

  • src/ingestion/orchestrator.rs
    • crate::core::references::* -> crate::xref::references::*
    • crate::core::note_parser::* -> crate::xref::note_parser::*
  • src/core/mod.rs: remove note_parser and references
  • src/lib.rs: add pub mod xref;
  • tests referencing old paths update to crate::xref::*

5.3 Move ingestion-owned persistence helpers out of core

Move:

  • src/core/payloads.rs -> src/ingestion/storage/payloads.rs
  • src/core/events_db.rs -> src/ingestion/storage/events.rs
  • src/core/dependent_queue.rs -> src/ingestion/storage/queue.rs
  • src/core/sync_run.rs -> src/ingestion/storage/sync_run.rs
  • add src/ingestion/storage/mod.rs

Why:

  • These files primarily support ingestion/sync runtime behavior and ingestion persistence.
  • Consolidates ingestion runtime + ingestion storage into one domain area.

Calling-code updates required:

  • src/ingestion/discussions.rs, issues.rs, merge_requests.rs, mr_discussions.rs
    • core::payloads::* -> ingestion::storage::payloads::*
  • src/ingestion/orchestrator.rs
    • core::dependent_queue::* -> ingestion::storage::queue::*
    • core::events_db::* -> ingestion::storage::events::*
  • src/main.rs
    • core::dependent_queue::release_all_locked_jobs -> ingestion::storage::queue::release_all_locked_jobs
    • core::sync_run::SyncRunRecorder -> ingestion::storage::sync_run::SyncRunRecorder
  • src/cli/commands/count.rs
    • core::events_db::* -> ingestion::storage::events::*
  • src/cli/commands/sync_surgical.rs
    • core::sync_run::SyncRunRecorder -> ingestion::storage::sync_run::SyncRunRecorder
  • src/core/mod.rs: remove moved modules
  • src/ingestion/mod.rs: export pub mod storage;

Phase 2: CLI Structure Cleanup (high dev ergonomics impact)

5.4 Split cli/mod.rs responsibilities

Current:

  • root parser (Cli, Commands)
  • all args structs (IssuesArgs, WhoArgs, MeArgs, etc.)

Proposed:

  • src/cli/mod.rs: only Cli, Commands, top-level parser behavior
  • src/cli/args.rs: all args structs and command-local enums (CronAction, TokenAction)

Why:

  • keeps parser root small and readable
  • one canonical place for args schemas

Calling-code updates required:

  • src/main.rs
    • use lore::cli::{..., WhoArgs, ...} -> use lore::cli::args::{...} (or re-export from cli/mod.rs)
  • src/cli/commands/who/mod.rs
    • use crate::cli::WhoArgs; -> use crate::cli::args::WhoArgs;
  • src/cli/commands/me/mod.rs
    • use crate::cli::MeArgs; -> use crate::cli::args::MeArgs;

5.5 Make main.rs thin by moving dispatch logic to app/

Proposed splits from main.rs:

  • app/dispatch.rs: all handle_* command handlers
  • app/errors.rs: clap error mapping, correction warning formatting
  • app/robot_docs.rs: robot docs schema/data envelope generation
  • keep main.rs: startup, logging init, parse, delegate to dispatcher

Why:

  • reduces entrypoint complexity and improves testability of dispatch behavior
  • isolates robot docs machinery from runtime bootstrapping

Calling-code updates required:

  • main.rs: replace direct handler function definitions with calls into app::*
  • lib.rs: add pub mod app; if shared imports needed by tests

Phase 3: Split Large Command Files by Responsibility

5.6 Split cli/commands/list.rs

Proposed:

  • commands/list/issues.rs (issue queries + issue output)
  • commands/list/mrs.rs (MR queries + MR output)
  • commands/list/notes.rs (note queries + note output)
  • commands/list/render.rs (shared formatting helpers)
  • commands/list/mod.rs (public API and re-exports)

Why:

  • list concerns are already logically tripartite
  • better locality for bugfixes and feature additions

Calling-code updates required:

  • src/cli/commands/mod.rs: import module folder and re-export unchanged API names
  • src/main.rs: ideally no change if commands/mod.rs re-exports remain stable

5.7 Split cli/commands/show.rs

Proposed:

  • commands/show/issue.rs
  • commands/show/mr.rs
  • commands/show/render.rs
  • commands/show/mod.rs

Why:

  • issue and MR detail assembly have separate SQL and shape logic
  • rendering concerns can be isolated from data retrieval

Calling-code updates required:

  • src/cli/commands/mod.rs re-exports preserved (run_show_issue, run_show_mr, printers)
  • src/main.rs remains stable if re-exports preserved

5.8 Split cli/commands/ingest.rs and cli/commands/sync.rs

Proposed:

  • commands/ingest/run.rs, dry_run.rs, render.rs, mod.rs
  • commands/sync/run.rs, render.rs, surgical.rs, mod.rs

Why:

  • orchestration, preview generation, and output rendering are currently intertwined
  • surgical sync is semantically part of sync command family

Calling-code updates required:

  • update src/cli/commands/mod.rs exports
  • update src/cli/commands/sync_surgical.rs path if merged into commands/sync/surgical.rs
  • no CLI UX changes expected if external API names remain

5.9 Split documents/extractor.rs

Proposed:

  • documents/extractor/issues.rs
  • documents/extractor/mrs.rs
  • documents/extractor/discussions.rs
  • documents/extractor/notes.rs
  • documents/extractor/common.rs
  • documents/extractor/mod.rs

Why:

  • extractor currently contains four independent source-type extraction paths
  • per-source unit tests become easier to target

Calling-code updates required:

  • src/documents/mod.rs re-export surface remains stable
  • src/documents/regenerator.rs imports update only if internal re-export paths change

Phase 4: Opportunistic Consolidations

5.10 Merge tiny embedding chunk helpers

Merge:

  • src/embedding/chunk_ids.rs
  • src/embedding/chunking.rs
  • into src/embedding/chunks.rs

Why:

  • both represent one conceptual concern: chunk partitioning and chunk identity mapping
  • avoids tiny-file scattering

Calling-code updates required:

  • src/embedding/pipeline.rs
  • src/embedding/change_detector.rs
  • src/search/vector.rs
  • src/embedding/mod.rs exports

5.11 Test helper de-duplication

Add a shared test support module for repeated DB fixture setup currently duplicated in:

  • src/cli/commands/who_tests.rs
  • src/cli/commands/list_tests.rs
  • src/cli/commands/me/me_tests.rs
  • multiple core/*_tests.rs

Why:

  • lower maintenance cost and fewer fixture drift bugs

Calling-code updates required:

  • test-only imports in affected files

6. File-Level Recommendation Matrix

Legend:

  • KEEP: structure is already coherent
  • MOVE: relocate without major logic split
  • SPLIT: divide into focused files/modules
  • MERGE: consolidate tiny related files

6.1 core/

  • backoff.rs -> KEEP
  • config.rs -> KEEP (large but cohesive)
  • cron.rs -> KEEP
  • cursor.rs -> KEEP
  • db.rs -> KEEP
  • dependent_queue.rs -> MOVE to ingestion/storage/queue.rs
  • error.rs -> KEEP
  • events_db.rs -> MOVE to ingestion/storage/events.rs
  • file_history.rs -> KEEP
  • lock.rs -> KEEP
  • logging.rs -> KEEP
  • metrics.rs -> KEEP
  • note_parser.rs -> MOVE to xref/note_parser.rs
  • path_resolver.rs -> KEEP
  • paths.rs -> KEEP
  • payloads.rs -> MOVE to ingestion/storage/payloads.rs
  • project.rs -> KEEP
  • references.rs -> MOVE to xref/references.rs
  • shutdown.rs -> KEEP
  • sync_run.rs -> MOVE to ingestion/storage/sync_run.rs
  • time.rs -> KEEP
  • timeline.rs, timeline_seed.rs, timeline_expand.rs, timeline_collect.rs -> MOVE to timeline/
  • trace.rs -> KEEP

6.2 cli/

  • mod.rs -> SPLIT (mod.rs + args.rs)
  • autocorrect.rs -> SPLIT into autocorrect/ submodules
  • render.rs -> SPLIT into render/ submodules
  • commands/list.rs -> SPLIT into commands/list/
  • commands/show.rs -> SPLIT into commands/show/
  • commands/ingest.rs -> SPLIT into commands/ingest/
  • commands/sync.rs + commands/sync_surgical.rs -> SPLIT/MERGE into commands/sync/
  • commands/me/* -> KEEP (already good shape)
  • commands/who/* -> KEEP (already good shape)
  • small focused commands (auth_test, embed, trace, etc.) -> KEEP

6.3 documents/

  • extractor.rs -> SPLIT into extractor folder
  • regenerator.rs -> KEEP
  • truncation.rs -> KEEP

6.4 embedding/

  • change_detector.rs -> KEEP
  • chunk_ids.rs + chunking.rs -> MERGE into chunks.rs
  • ollama.rs -> KEEP
  • pipeline.rs -> KEEP for now (already a pipeline-centric file)
  • similarity.rs -> KEEP
  • KEEP as-is except minor internal refactors only when touched by feature work

7. Import/Call-Site Impact Tracker (must-update list)

This section tracks files that must be updated when moves happen to avoid broken builds.

7.1 For timeline extraction

Must update:

  • src/cli/commands/timeline.rs
  • tests/timeline_pipeline_tests.rs
  • moved timeline module internals (seed, expand, collect)
  • src/core/mod.rs
  • src/lib.rs

7.2 For xref extraction

Must update:

  • src/ingestion/orchestrator.rs (all core::references and core::note_parser paths)
  • tests importing moved modules
  • src/core/mod.rs
  • src/lib.rs

7.3 For ingestion storage move

Must update:

  • src/ingestion/discussions.rs
  • src/ingestion/issues.rs
  • src/ingestion/merge_requests.rs
  • src/ingestion/mr_discussions.rs
  • src/ingestion/orchestrator.rs
  • src/cli/commands/count.rs
  • src/cli/commands/sync_surgical.rs
  • src/main.rs
  • src/core/mod.rs
  • src/ingestion/mod.rs

7.4 For CLI args split

Must update:

  • src/main.rs
  • src/cli/commands/who/mod.rs
  • src/cli/commands/me/mod.rs
  • any command file importing args directly from crate::cli::*Args

7.5 For command file splits

Must update:

  • src/cli/commands/mod.rs re-exports
  • tests that import command internals by file/module path
  • src/main.rs only if re-export names change (recommended: keep names stable)

8. Execution Strategy (Safe Order)

Recommended order:

  1. Phase 1 (timeline, xref, ingestion/storage) with no behavior changes.
  2. Phase 2 (cli/mod.rs split, main.rs thinning) while preserving command signatures.
  3. Phase 3 (list, show, ingest, sync, extractor splits).
  4. Phase 4 opportunistic merges and test helper dedupe.

For each phase:

  • complete file moves/splits and import rewrites in one cohesive change
  • run quality gates
  • only then proceed to next phase

9. Verification and Non-Regression Checklist

After each phase, run:

cargo check --all-targets
cargo clippy --all-targets -- -D warnings
cargo fmt --check
cargo test
cargo test -- --nocapture

Targeted suites to run when relevant:

  • timeline moves: cargo test timeline_pipeline_tests
  • who/me/list splits: cargo test who_tests, cargo test list_tests, cargo test me_tests
  • ingestion storage moves: cargo test ingestion

Before each commit, run UBS on changed files:

ubs <changed-files>

10. Risks and Mitigations

Primary risks:

  1. Import path churn causing compile errors.
  2. Accidental visibility changes (pub/pub(crate)) during file splits.
  3. Re-export drift breaking main.rs or tests.
  4. Behavioral drift from mixed refactor + logic changes.

Mitigations:

  • refactor-only phases (no feature changes)
  • keep public API names stable during directory reshapes
  • preserve command re-exports in cli/commands/mod.rs
  • run full quality gates after each phase

11. Recommendation

Start with Phase 1 only in the first implementation pass. It yields major clarity gains with relatively constrained blast radius.

If Phase 1 lands cleanly, proceed with Phase 2. Phase 3 should be done in smaller PR-sized chunks (list first, then show, then ingest/sync, then documents/extractor).

No code/file moves have been executed yet; this document is the proposal for review and approval.