gitlore/PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md

# Proposed Code File Reorganization Plan

## 1. Scope, Audit Method, and Constraints

This plan is based on a full audit of the `src/` tree (all 131 Rust files) plus integration tests in `tests/` that import `src` modules.

What I audited:
- module/file inventory (`src/**/*.rs`)
- line counts and hotspot analysis
- crate-internal import graph (`use crate::...`)
- public API surface (public structs/enums/functions by file)
- command routing and re-export topology (`main.rs`, `lib.rs`, `cli/mod.rs`, `cli/commands/mod.rs`)
- cross-module coupling and test coupling

Constraints followed for this proposal:
- no implementation yet (plan only)
- keep nesting shallow and intuitive
- optimize for discoverability for humans and coding agents
- no compatibility shims as a long-term strategy
- every structural change includes explicit call-site update tracking

---

## 2. Current State (Measured)

### 2.1 Size by top-level module (`src/`)

| Module | Files | Lines | Prod Files | Prod Lines | Test Files | Test Lines |
|---|---:|---:|---:|---:|---:|---:|
| `cli` | 41 | 29,131 | 37 | 23,068 | 4 | 6,063 |
| `core` | 39 | 12,493 | 27 | 7,599 | 12 | 4,894 |
| `ingestion` | 15 | 6,935 | 10 | 5,259 | 5 | 1,676 |
| `documents` | 6 | 3,657 | 4 | 1,749 | 2 | 1,908 |
| `gitlab` | 11 | 3,607 | 8 | 2,391 | 3 | 1,216 |
| `embedding` | 10 | 1,878 | 7 | 1,327 | 3 | 551 |
| `search` | 6 | 1,115 | 6 | 1,115 | 0 | 0 |
| `main.rs` | 1 | 3,744 | 1 | 3,744 | 0 | 0 |
| `lib.rs` | 1 | 9 | 1 | 9 | 0 | 0 |

Total in `src/`: **131 files / 62,569 lines**.

### 2.2 Largest production hotspots

| File | Lines | Why it matters |
|---|---:|---|
| `src/main.rs` | 3,744 | Binary entrypoint is doing too much dispatch and formatting work |
| `src/cli/autocorrect.rs` | 1,865 | Large parsing/correction ruleset in one file |
| `src/ingestion/orchestrator.rs` | 1,753 | Multi-stage ingestion orchestration and persistence mixed together |
| `src/cli/commands/show.rs` | 1,544 | Issue/MR retrieval + rendering + JSON conversion all in one file |
| `src/cli/render.rs` | 1,482 | Theme, table layout, formatting utilities bundled together |
| `src/cli/commands/list.rs` | 1,383 | Issues + MRs + notes listing/query/printing in one file |
| `src/cli/mod.rs` | 1,268 | Clap root parser plus every args struct |
| `src/cli/commands/sync.rs` | 1,201 | Sync flow + human rendering + JSON output |
| `src/cli/commands/me/queries.rs` | 1,135 | Multiple query families and post-processing logic |
| `src/cli/commands/ingest.rs` | 1,116 | Ingest flow + dry-run + presentation concerns |
| `src/documents/extractor.rs` | 1,059 | Four document source extractors in one file |

### 2.3 High-level dependency flow (top modules)

Observed module coupling from imports:
- `cli -> core` (very heavy, 33 files)
- `cli -> documents/embedding/gitlab/ingestion/search` (command-dependent)
- `ingestion -> core` (12 files), `ingestion -> gitlab` (10 files)
- `search -> core` and `search -> embedding`
- `timeline` logic is currently located under `core/*timeline*` but semantically acts as its own subsystem

### 2.4 Structural pain points

1. `main.rs` is overloaded with command handlers, robot output envelope types, clap error mapping, and domain invocation.
2. `cli/mod.rs` mixes root parser concerns with command-specific argument schemas.
3. `core/` still holds domain-specific subsystems (`timeline`, cross-reference extraction, ingestion persistence helpers) that are not truly "core infra".
4. Several large command files combine query/build/fetch/render/json responsibilities.
5. Test helper setup is duplicated heavily in large test files (`who_tests`, `list_tests`, `me_tests`).

---

## 3. Reorganization Principles

1. Keep top-level domains explicit: `cli`, `core` (infra), `gitlab`, `ingestion`, `documents`, `embedding`, `search`, plus extracted domain modules where justified.
2. Keep nesting shallow: max 2-3 levels in normal workflow paths.
3. Co-locate command-specific args/types/rendering with the command implementation.
4. Separate orchestration from formatting from data-access code.
5. Prefer module boundaries that map to runtime pipeline boundaries.
6. Make import paths reveal ownership directly.

---

## 4. Proposed Target Structure (End State)

```text
src/
  main.rs                      # thin binary entrypoint
  lib.rs

  app/                         # NEW: runtime dispatch/orchestration glue
    mod.rs
    dispatch.rs
    errors.rs
    robot_docs.rs

  cli/
    mod.rs                     # Cli + Commands only
    args.rs                    # shared args structs used by Commands variants
    render/
      mod.rs
      format.rs
      table.rs
      theme.rs
    autocorrect/
      mod.rs
      flags.rs
      enums.rs
      fuzzy.rs
    commands/
      mod.rs
      list/
        mod.rs
        issues.rs
        mrs.rs
        notes.rs
        render.rs
      show/
        mod.rs
        issue.rs
        mr.rs
        render.rs
      me/                      # keep existing folder, retain split style
      who/                     # keep existing folder, retain split style
      ingest/
        mod.rs
        run.rs
        dry_run.rs
        render.rs
      sync/
        mod.rs
        run.rs
        render.rs
        surgical.rs
      # smaller focused commands can stay single-file for now

  core/                        # infra-only boundary after moves
    mod.rs
    backoff.rs
    config.rs
    cron.rs
    cursor.rs
    db.rs
    error.rs
    file_history.rs
    lock.rs
    logging.rs
    metrics.rs
    path_resolver.rs
    paths.rs
    project.rs
    shutdown.rs
    time.rs
    trace.rs

  timeline/                    # NEW: extracted domain subsystem
    mod.rs
    types.rs
    seed.rs
    expand.rs
    collect.rs

  xref/                        # NEW: extracted cross-reference subsystem
    mod.rs
    note_parser.rs
    references.rs

  ingestion/
    mod.rs
    issues.rs
    merge_requests.rs
    discussions.rs
    mr_discussions.rs
    mr_diffs.rs
    dirty_tracker.rs
    discussion_queue.rs
    orchestrator/
      mod.rs
      issues_flow.rs
      mrs_flow.rs
      resource_events.rs
      closes_issues.rs
      diff_jobs.rs
      progress.rs
    storage/                   # NEW: ingestion-owned persistence helpers
      mod.rs
      payloads.rs              # from core/payloads.rs
      events.rs                # from core/events_db.rs
      queue.rs                 # from core/dependent_queue.rs
      sync_run.rs              # from core/sync_run.rs

  documents/
    mod.rs
    extractor/
      mod.rs
      issues.rs
      mrs.rs
      discussions.rs
      notes.rs
      common.rs
    regenerator.rs
    truncation.rs

  embedding/
    mod.rs
    change_detector.rs
    chunks.rs                  # merge chunk_ids.rs + chunking.rs
    ollama.rs
    pipeline.rs
    similarity.rs

  gitlab/
    # mostly keep as-is (already coherent)

  search/
    # mostly keep as-is (already coherent)
```

Notes:
- `gitlab/` and `search/` are already cohesive and should largely remain unchanged.
- `who/` and `me/` command families are already split well relative to other commands.

---

## 5. Detailed Change Plan (Phased)

## Phase 1: Domain Boundary Extraction (lowest conceptual risk, high clarity gain)

### 5.1 Extract timeline subsystem from `core`

Move:
- `src/core/timeline.rs` -> `src/timeline/types.rs`
- `src/core/timeline_seed.rs` -> `src/timeline/seed.rs`
- `src/core/timeline_expand.rs` -> `src/timeline/expand.rs`
- `src/core/timeline_collect.rs` -> `src/timeline/collect.rs`
- add `src/timeline/mod.rs`

Why:
- Timeline is a full pipeline domain (seed -> expand -> collect), not core infra.
- Improves discoverability for `lore timeline` and timeline tests.

Calling-code updates required:
- `src/cli/commands/timeline.rs`
  - `crate::core::timeline::*` -> `crate::timeline::types::*` (or `crate::timeline::*` if `timeline/mod.rs` re-exports the types)
  - `crate::core::timeline_seed::*` -> `crate::timeline::seed::*`
  - `crate::core::timeline_expand::*` -> `crate::timeline::expand::*`
  - `crate::core::timeline_collect::*` -> `crate::timeline::collect::*`
- `tests/timeline_pipeline_tests.rs`
  - `lore::core::timeline*` imports -> `lore::timeline::*`
- internal references among moved files update from `crate::core::timeline` to `crate::timeline::types`
- `src/core/mod.rs`: remove `timeline*` module declarations
- `src/lib.rs`: add `pub mod timeline;`
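
To make the path changes above concrete, here is a minimal sketch of how the extracted module could be laid out. It uses inline modules so it compiles as a single file; in the real tree each inner module would be its own file under `src/timeline/`. `TimelineEvent`, its fields, and the `seed` function body are hypothetical stand-ins, since this plan does not list the actual items in `core/timeline*.rs`. The `pub use` line is optional and only needed if callers should be able to write `crate::timeline::TimelineEvent` rather than `crate::timeline::types::TimelineEvent`.

```rust
// Sketch of `src/timeline/` as inline modules (assumed item names throughout).
mod timeline {
    pub mod types {
        // Stand-in for whatever `core/timeline.rs` defines today.
        #[derive(Debug, Clone, PartialEq)]
        pub struct TimelineEvent {
            pub entity_id: u64,
            pub summary: String,
        }
    }

    pub mod seed {
        // Before the move this import would have read `crate::core::timeline::...`.
        use super::types::TimelineEvent;

        // Placeholder seed step: turn matching entity ids into initial events.
        pub fn seed(ids: &[u64]) -> Vec<TimelineEvent> {
            ids.iter()
                .map(|&id| TimelineEvent {
                    entity_id: id,
                    summary: format!("seed #{id}"),
                })
                .collect()
        }
    }

    // Optional re-export so the shorter `timeline::TimelineEvent` path resolves.
    pub use types::TimelineEvent;
}

fn main() {
    let events = timeline::seed::seed(&[1, 2]);
    let first: &timeline::TimelineEvent = &events[0];
    println!("{} {}", first.entity_id, first.summary);
}
```

The `expand` and `collect` stages are omitted here; they would follow the same pattern as `seed`.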

### 5.2 Extract cross-reference subsystem from `core`

Move:
- `src/core/note_parser.rs` -> `src/xref/note_parser.rs`
- `src/core/references.rs` -> `src/xref/references.rs`
- add `src/xref/mod.rs`

Why:
- Cross-reference extraction is a domain subsystem feeding ingestion and timeline.
- Current placement in `core/` obscures data flow.

Calling-code updates required:
- `src/ingestion/orchestrator.rs`
  - `crate::core::references::*` -> `crate::xref::references::*`
  - `crate::core::note_parser::*` -> `crate::xref::note_parser::*`
- `src/core/mod.rs`: remove `note_parser` and `references`
- `src/lib.rs`: add `pub mod xref;`
- tests referencing old paths update to `crate::xref::*` (or `lore::xref::*` from integration tests under `tests/`)

### 5.3 Move ingestion-owned persistence helpers out of `core`

Move:
- `src/core/payloads.rs` -> `src/ingestion/storage/payloads.rs`
- `src/core/events_db.rs` -> `src/ingestion/storage/events.rs`
- `src/core/dependent_queue.rs` -> `src/ingestion/storage/queue.rs`
- `src/core/sync_run.rs` -> `src/ingestion/storage/sync_run.rs`
- add `src/ingestion/storage/mod.rs`

Why:
- These files primarily support ingestion/sync runtime behavior and ingestion persistence.
- Consolidates ingestion runtime + ingestion storage into one domain area.

Calling-code updates required:
- `src/ingestion/discussions.rs`, `issues.rs`, `merge_requests.rs`, `mr_discussions.rs`
  - `core::payloads::*` -> `ingestion::storage::payloads::*`
- `src/ingestion/orchestrator.rs`
  - `core::dependent_queue::*` -> `ingestion::storage::queue::*`
  - `core::events_db::*` -> `ingestion::storage::events::*`
- `src/main.rs`
  - `core::dependent_queue::release_all_locked_jobs` -> `ingestion::storage::queue::release_all_locked_jobs`
  - `core::sync_run::SyncRunRecorder` -> `ingestion::storage::sync_run::SyncRunRecorder`
- `src/cli/commands/count.rs`
  - `core::events_db::*` -> `ingestion::storage::events::*`
- `src/cli/commands/sync_surgical.rs`
  - `core::sync_run::SyncRunRecorder` -> `ingestion::storage::sync_run::SyncRunRecorder`
- `src/core/mod.rs`: remove moved modules
- `src/ingestion/mod.rs`: export `pub mod storage;`

---

## Phase 2: CLI Structure Cleanup (high dev ergonomics impact)

### 5.4 Split `cli/mod.rs` responsibilities

Current:
- root parser (`Cli`, `Commands`)
- all args structs (`IssuesArgs`, `WhoArgs`, `MeArgs`, etc.)

Proposed:
- `src/cli/mod.rs`: only `Cli`, `Commands`, top-level parser behavior
- `src/cli/args.rs`: all args structs and command-local enums (`CronAction`, `TokenAction`)

Why:
- keeps parser root small and readable
- one canonical place for args schemas

Calling-code updates required:
- `src/main.rs`
  - `use lore::cli::{..., WhoArgs, ...}` -> `use lore::cli::args::{...}` (or re-export from `cli/mod.rs`)
- `src/cli/commands/who/mod.rs`
  - `use crate::cli::WhoArgs;` -> `use crate::cli::args::WhoArgs;`
- `src/cli/commands/me/mod.rs`
  - `use crate::cli::MeArgs;` -> `use crate::cli::args::MeArgs;`

### 5.5 Make `main.rs` thin by moving dispatch logic to `app/`

Proposed splits from `main.rs`:
- `app/dispatch.rs`: all `handle_*` command handlers
- `app/errors.rs`: clap error mapping, correction warning formatting
- `app/robot_docs.rs`: robot docs schema/data envelope generation
- keep `main.rs`: startup, logging init, parse, delegate to dispatcher

Why:
- reduces entrypoint complexity and improves testability of dispatch behavior
- isolates robot docs machinery from runtime bootstrapping

Calling-code updates required:
- `main.rs`: replace direct handler function definitions with calls into `app::*`
- `lib.rs`: add `pub mod app;` if tests need to import from it
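
As a rough illustration of the intended shape, the sketch below shows a thin `main` that only builds a command and delegates to a dispatcher. Everything here is an assumption made for illustration: the `Commands` variants, `AppError`, `run`, and `handle_count` are hypothetical, and the real `Commands` enum is clap-derived (the derive is omitted so the sketch stays std-only).

```rust
// Stand-in for the clap-derived `Commands` enum in `cli/mod.rs`.
pub enum Commands {
    Count { entity: String },
    Version,
}

mod app {
    pub mod errors {
        use std::fmt;

        // Hypothetical error type; the real mapping lives in `main.rs` today.
        #[derive(Debug, PartialEq)]
        pub enum AppError {
            UnknownEntity(String),
        }

        impl fmt::Display for AppError {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                match self {
                    AppError::UnknownEntity(name) => write!(f, "unknown entity: {name}"),
                }
            }
        }
    }

    pub mod dispatch {
        use super::errors::AppError;
        use crate::Commands;

        // Single entry point: route each command to its handler.
        pub fn run(command: Commands) -> Result<String, AppError> {
            match command {
                Commands::Count { entity } => handle_count(&entity),
                Commands::Version => Ok("lore (sketch)".to_string()),
            }
        }

        // Handlers return data instead of printing, which keeps them testable.
        fn handle_count(entity: &str) -> Result<String, AppError> {
            match entity {
                "issues" | "mrs" => Ok(format!("counting {entity}")),
                other => Err(AppError::UnknownEntity(other.to_string())),
            }
        }
    }
}

fn main() {
    // Real code would parse CLI arguments and initialize logging here.
    let command = Commands::Count { entity: "issues".to_string() };
    match app::dispatch::run(command) {
        Ok(output) => println!("{output}"),
        Err(err) => {
            eprintln!("error: {err}");
            std::process::exit(1);
        }
    }
}
```

Because `run` returns a `Result` instead of printing and exiting, dispatch behavior can be asserted in tests without spawning the binary.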

---

## Phase 3: Split Large Command Files by Responsibility

### 5.6 Split `cli/commands/list.rs`

Proposed:
- `commands/list/issues.rs` (issue queries + issue output)
- `commands/list/mrs.rs` (MR queries + MR output)
- `commands/list/notes.rs` (note queries + note output)
- `commands/list/render.rs` (shared formatting helpers)
- `commands/list/mod.rs` (public API and re-exports)

Why:
- list concerns already divide three ways (issues, MRs, notes)
- better locality for bugfixes and feature additions

Calling-code updates required:
- `src/cli/commands/mod.rs`: import module folder and re-export unchanged API names
- `src/main.rs`: ideally no change if `commands/mod.rs` re-exports remain stable

### 5.7 Split `cli/commands/show.rs`

Proposed:
- `commands/show/issue.rs`
- `commands/show/mr.rs`
- `commands/show/render.rs`
- `commands/show/mod.rs`

Why:
- issue and MR detail assembly have separate SQL and shape logic
- rendering concerns can be isolated from data retrieval

Calling-code updates required:
- `src/cli/commands/mod.rs` re-exports preserved (`run_show_issue`, `run_show_mr`, printers)
- `src/main.rs` remains stable if re-exports preserved
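
The separation of data retrieval from rendering could look roughly like the sketch below. It is illustrative only: `IssueDetail`, `fetch_issue`, `human`, and `json` are hypothetical names, the lookup is hard-coded instead of running SQL, and the JSON is hand-formatted where real code would presumably use a JSON library.

```rust
mod show {
    // Would be `commands/show/issue.rs`: data assembly only, no printing.
    pub mod issue {
        #[derive(Debug, Clone, PartialEq)]
        pub struct IssueDetail {
            pub iid: u64,
            pub title: String,
            pub state: String,
        }

        // Stand-in for the SQL-backed lookup; a real version would take a DB handle.
        pub fn fetch_issue(iid: u64) -> Option<IssueDetail> {
            (iid == 42).then(|| IssueDetail {
                iid,
                title: "Fix sync cursor".into(),
                state: "opened".into(),
            })
        }
    }

    // Would be `commands/show/render.rs`: formatting only, no data access.
    pub mod render {
        use super::issue::IssueDetail;

        pub fn human(detail: &IssueDetail) -> String {
            format!("#{} {} [{}]", detail.iid, detail.title, detail.state)
        }

        // Hand-rolled for the sketch; it does not escape quotes in the title.
        pub fn json(detail: &IssueDetail) -> String {
            format!(
                r#"{{"iid":{},"title":"{}","state":"{}"}}"#,
                detail.iid, detail.title, detail.state
            )
        }
    }

    // `mod.rs` keeps the public entry point name stable for callers.
    pub use issue::fetch_issue;
}

fn main() {
    if let Some(detail) = show::fetch_issue(42) {
        println!("{}", show::render::human(&detail));
        println!("{}", show::render::json(&detail));
    }
}
```

With this split, human and JSON output are two views over the same `IssueDetail`, so a change to the query cannot silently diverge between output modes.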

### 5.8 Split `cli/commands/ingest.rs` and `cli/commands/sync.rs`

Proposed:
- `commands/ingest/run.rs`, `dry_run.rs`, `render.rs`, `mod.rs`
- `commands/sync/run.rs`, `render.rs`, `surgical.rs`, `mod.rs`

Why:
- orchestration, preview generation, and output rendering are currently intertwined
- surgical sync is semantically part of sync command family

Calling-code updates required:
- update `src/cli/commands/mod.rs` exports
- update `src/cli/commands/sync_surgical.rs` path if merged into `commands/sync/surgical.rs`
- no CLI UX changes expected if external API names remain

### 5.9 Split `documents/extractor.rs`

Proposed:
- `documents/extractor/issues.rs`
- `documents/extractor/mrs.rs`
- `documents/extractor/discussions.rs`
- `documents/extractor/notes.rs`
- `documents/extractor/common.rs`
- `documents/extractor/mod.rs`

Why:
- extractor currently contains four independent source-type extraction paths
- per-source unit tests become easier to target

Calling-code updates required:
- `src/documents/mod.rs` re-export surface remains stable
- `src/documents/regenerator.rs` imports update only if internal re-export paths change

---

## Phase 4: Opportunistic Consolidations

### 5.10 Merge tiny embedding chunk helpers

Merge:
- `src/embedding/chunk_ids.rs`
- `src/embedding/chunking.rs`
- into `src/embedding/chunks.rs`

Why:
- both represent one conceptual concern: chunk partitioning and chunk identity mapping
- avoids tiny-file scattering

Calling-code updates required:
- `src/embedding/pipeline.rs`
- `src/embedding/change_detector.rs`
- `src/search/vector.rs`
- `src/embedding/mod.rs` exports
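
To show why the two helpers read naturally as one file, here is a sketch of what a merged `chunks.rs` might contain. The function names, the `CHUNK_ID_MULTIPLIER` constant, and the id encoding scheme are all assumptions made for illustration; the actual contents of `chunk_ids.rs` and `chunking.rs` are not described in this plan.

```rust
/// Assumed upper bound on chunks per document (illustrative value only).
pub const CHUNK_ID_MULTIPLIER: i64 = 1_000;

/// Pack a document id and a chunk index into one integer id (assumed scheme).
pub fn encode_chunk_id(document_id: i64, chunk_index: i64) -> i64 {
    document_id * CHUNK_ID_MULTIPLIER + chunk_index
}

/// Inverse of `encode_chunk_id`: returns `(document_id, chunk_index)`.
pub fn decode_chunk_id(chunk_id: i64) -> (i64, i64) {
    (chunk_id / CHUNK_ID_MULTIPLIER, chunk_id % CHUNK_ID_MULTIPLIER)
}

/// Split text into chunks of at most `max_chars` characters.
/// Splitting is done on `char` boundaries, so multi-byte text stays valid.
pub fn split_into_chunks(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chars)
        .map(|piece| piece.iter().collect())
        .collect()
}

fn main() {
    // Partition a document, then derive a stable id for each chunk.
    let document_id = 5;
    for (index, chunk) in split_into_chunks("abcdefg", 3).iter().enumerate() {
        let id = encode_chunk_id(document_id, index as i64);
        println!("{id}: {chunk}");
    }
}
```

Partitioning produces the chunk index that the id mapping consumes, which is the coupling that motivates keeping both in one module.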

### 5.11 Test helper de-duplication

Add a shared test support module for repeated DB fixture setup currently duplicated in:
- `src/cli/commands/who_tests.rs`
- `src/cli/commands/list_tests.rs`
- `src/cli/commands/me/me_tests.rs`
- multiple `core/*_tests.rs`

Why:
- lower maintenance cost and fewer fixture drift bugs

Calling-code updates required:
- test-only imports in affected files

---

## 6. File-Level Recommendation Matrix

Legend:
- `KEEP`: structure is already coherent
- `MOVE`: relocate without major logic split
- `SPLIT`: divide into focused files/modules
- `MERGE`: consolidate tiny related files

### 6.1 `core/`

- `backoff.rs` -> KEEP
- `config.rs` -> KEEP (large but cohesive)
- `cron.rs` -> KEEP
- `cursor.rs` -> KEEP
- `db.rs` -> KEEP
- `dependent_queue.rs` -> MOVE to `ingestion/storage/queue.rs`
- `error.rs` -> KEEP
- `events_db.rs` -> MOVE to `ingestion/storage/events.rs`
- `file_history.rs` -> KEEP
- `lock.rs` -> KEEP
- `logging.rs` -> KEEP
- `metrics.rs` -> KEEP
- `note_parser.rs` -> MOVE to `xref/note_parser.rs`
- `path_resolver.rs` -> KEEP
- `paths.rs` -> KEEP
- `payloads.rs` -> MOVE to `ingestion/storage/payloads.rs`
- `project.rs` -> KEEP
- `references.rs` -> MOVE to `xref/references.rs`
- `shutdown.rs` -> KEEP
- `sync_run.rs` -> MOVE to `ingestion/storage/sync_run.rs`
- `time.rs` -> KEEP
- `timeline.rs`, `timeline_seed.rs`, `timeline_expand.rs`, `timeline_collect.rs` -> MOVE to `timeline/`
- `trace.rs` -> KEEP

### 6.2 `cli/`

- `mod.rs` -> SPLIT (`mod.rs` + `args.rs`)
- `autocorrect.rs` -> SPLIT into `autocorrect/` submodules
- `render.rs` -> SPLIT into `render/` submodules
- `commands/list.rs` -> SPLIT into `commands/list/`
- `commands/show.rs` -> SPLIT into `commands/show/`
- `commands/ingest.rs` -> SPLIT into `commands/ingest/`
- `commands/sync.rs` + `commands/sync_surgical.rs` -> SPLIT/MERGE into `commands/sync/`
- `commands/me/*` -> KEEP (already good shape)
- `commands/who/*` -> KEEP (already good shape)
- small focused commands (`auth_test`, `embed`, `trace`, etc.) -> KEEP

### 6.3 `documents/`

- `extractor.rs` -> SPLIT into extractor folder
- `regenerator.rs` -> KEEP
- `truncation.rs` -> KEEP

### 6.4 `embedding/`

- `change_detector.rs` -> KEEP
- `chunk_ids.rs` + `chunking.rs` -> MERGE into `chunks.rs`
- `ollama.rs` -> KEEP
- `pipeline.rs` -> KEEP for now (already a pipeline-centric file)
- `similarity.rs` -> KEEP

### 6.5 `gitlab/`, `search/`

- KEEP as-is; make minor internal refactors only when touched by feature work

---

## 7. Import/Call-Site Impact Tracker (must-update list)

This section tracks files that must be updated when moves happen to avoid broken builds.

### 7.1 For timeline extraction

Must update:
- `src/cli/commands/timeline.rs`
- `tests/timeline_pipeline_tests.rs`
- moved timeline module internals (`seed`, `expand`, `collect`)
- `src/core/mod.rs`
- `src/lib.rs`

### 7.2 For xref extraction

Must update:
- `src/ingestion/orchestrator.rs` (all `core::references` and `core::note_parser` paths)
- tests importing moved modules
- `src/core/mod.rs`
- `src/lib.rs`

### 7.3 For ingestion storage move

Must update:
- `src/ingestion/discussions.rs`
- `src/ingestion/issues.rs`
- `src/ingestion/merge_requests.rs`
- `src/ingestion/mr_discussions.rs`
- `src/ingestion/orchestrator.rs`
- `src/cli/commands/count.rs`
- `src/cli/commands/sync_surgical.rs`
- `src/main.rs`
- `src/core/mod.rs`
- `src/ingestion/mod.rs`

### 7.4 For CLI args split

Must update:
- `src/main.rs`
- `src/cli/commands/who/mod.rs`
- `src/cli/commands/me/mod.rs`
- any command file importing args directly from `crate::cli::*Args`

### 7.5 For command file splits

Must update:
- `src/cli/commands/mod.rs` re-exports
- tests that import command internals by file/module path
- `src/main.rs` only if re-export names change (recommended: keep names stable)
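
The compiler will catch most missed call sites, but a pre-flight scan can list them before a move starts. The sketch below is one possible approach, not an existing tool in this repo: `find_stale_paths` is a hypothetical helper that does a plain substring match per line, and the item names in the sample source (`store_payload`, `claim_jobs`) are made up.

```rust
/// Returns `(1-based line number, stale path)` for every stale path found.
/// Matching is a plain substring check, so comments and strings match too.
pub fn find_stale_paths(source: &str, stale_paths: &[&str]) -> Vec<(usize, String)> {
    let mut hits = Vec::new();
    for (index, line) in source.lines().enumerate() {
        for path in stale_paths {
            if line.contains(path) {
                hits.push((index + 1, (*path).to_string()));
            }
        }
    }
    hits
}

fn main() {
    // Sample file content; the imported item names are illustrative.
    let source = "use crate::core::payloads::store_payload;\n\
                  use crate::ingestion::storage::queue::claim_jobs;\n";

    // Old module paths from the ingestion storage move in Phase 1.
    let stale = [
        "core::payloads",
        "core::events_db",
        "core::dependent_queue",
        "core::sync_run",
    ];

    for (line, path) in find_stale_paths(source, &stale) {
        println!("line {line}: stale import of `{path}`");
    }
}
```

Since the check is textual, it also flags old paths in doc comments and string literals, which `cargo check` would not report.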

---

## 8. Execution Strategy (Safe Order)

Recommended order:
1. Phase 1 (`timeline`, `xref`, `ingestion/storage`) with no behavior changes.
2. Phase 2 (`cli/mod.rs` split, `main.rs` thinning) while preserving command signatures.
3. Phase 3 (`list`, `show`, `ingest`, `sync`, `extractor` splits).
4. Phase 4 opportunistic merges and test helper dedupe.

For each phase:
- complete file moves/splits and import rewrites in one cohesive change
- run quality gates
- only then proceed to next phase

---

## 9. Verification and Non-Regression Checklist

After each phase, run:

```bash
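cargo check --all-targets                  # type-check lib, bins, tests, examples, benches
cargo clippy --all-targets -- -D warnings  # lints, with warnings treated as errors
cargo fmt --check                          # fail if formatting differs
cargo test                                 # full unit + integration suite
cargo test -- --nocapture                  # same suite with test output shown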
```

Targeted suites to run when relevant:
- timeline moves: `cargo test --test timeline_pipeline_tests` (selects the integration test target `tests/timeline_pipeline_tests.rs`)
- who/me/list splits: `cargo test who_tests`, `cargo test list_tests`, `cargo test me_tests`
- ingestion storage moves: `cargo test ingestion`

Before each commit, run UBS on changed files:

```bash
ubs <changed-files>
```

---

## 10. Risks and Mitigations

Primary risks:
1. Import path churn causing compile errors.
2. Accidental visibility changes (`pub`/`pub(crate)`) during file splits.
3. Re-export drift breaking `main.rs` or tests.
4. Behavioral drift from mixed refactor + logic changes.

Mitigations:
- refactor-only phases (no feature changes)
- keep public API names stable during directory reshapes
- preserve command re-exports in `cli/commands/mod.rs`
- run full quality gates after each phase
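
Risk 2 is easiest to see with a small example. When a single file becomes a folder, helpers that were private to the file must be widened just enough for sibling files to reach them. The sketch below assumes a hypothetical `truncate_title` helper and `format_issue_row` function; it shows `pub(super)` keeping the helper inside `list` while the public name is re-exported unchanged.

```rust
mod list {
    // Would be `commands/list/render.rs`.
    mod render {
        // Previously a private `fn` in `list.rs`. After the split it needs
        // `pub(super)` so sibling modules under `list` can call it, while it
        // stays invisible outside `list`.
        pub(super) fn truncate_title(title: &str, max_chars: usize) -> String {
            if title.chars().count() <= max_chars {
                title.to_string()
            } else {
                let head: String = title.chars().take(max_chars).collect();
                format!("{head}...")
            }
        }
    }

    // Would be `commands/list/issues.rs`.
    mod issues {
        use super::render::truncate_title;

        pub fn format_issue_row(iid: u64, title: &str) -> String {
            format!("#{iid} {}", truncate_title(title, 10))
        }
    }

    // `mod.rs` re-exports the same public name callers used before the split.
    pub use issues::format_issue_row;
}

fn main() {
    println!("{}", list::format_issue_row(7, "A very long issue title"));
    // `list::render::truncate_title` is not reachable from here, by design.
}
```

Reaching for plain `pub` during a split would compile just as well, which is why accidental widening is easy to miss in review.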

---

## 11. Recommendation

Start with **Phase 1 only** in the first implementation pass. It yields major clarity gains with relatively constrained blast radius.

If Phase 1 lands cleanly, proceed with Phase 2. Phase 3 should be done in smaller PR-sized chunks (`list` first, then `show`, then `ingest/sync`, then `documents/extractor`).

No code/file moves have been executed yet; this document is the proposal for review and approval.