# swagger-cli Refactoring Plan: 5 System Weakness Remediations

## Epic 1: Output Sink Abstraction

### Problem

Every command handler takes `robot: bool`, manually branches on it 29+ times, and contains its own formatting logic. The `output/` module is a hollow shell (`human.rs` is 8 lines). Adding new output formats (YAML is on the roadmap) requires touching every command. Output concerns are tangled into business logic.

### Solution

Introduce an `OutputMode` enum and an `emit()` function that each command calls once with its typed data. Commands produce data; the output layer decides presentation.

### Design

**New type in `src/output/mod.rs`:**

```rust
pub enum OutputMode {
    Robot,
    Human,
    // Future: Yaml, Table, etc.
}
```

**New `emit` function pattern:**

Each command either implements a `CommandOutput` trait or goes through a generic `emit<T: Serialize + HumanDisplay>()` function. The `HumanDisplay` trait provides the human-readable rendering.

```rust
pub trait HumanDisplay {
    fn display_human(&self, w: &mut dyn Write) -> std::io::Result<()>;
}

pub fn emit<T: Serialize + HumanDisplay>(
    data: &T,
    mode: OutputMode,
    command: &str,
    duration: Duration,
) {
    match mode {
        OutputMode::Robot => robot::robot_success(data, command, duration),
        OutputMode::Human => {
            let mut stdout = std::io::stdout().lock();
            data.display_human(&mut stdout).expect("stdout write");
        }
    }
}
```
### Implementation Steps

1. **Create the `OutputMode` enum and `HumanDisplay` trait** in `src/output/mod.rs`
2. **Create the `emit()` function** in `src/output/mod.rs`
3. **Convert the `tags` command first** (simplest command, 168 lines) as proof of concept:
   - Impl `HumanDisplay` for `TagsOutput`
   - Replace `if robot_mode { ... } else { ... }` with `emit(&output, mode, "tags", duration)`
   - Remove the `robot: bool` parameter from `execute()`, replacing it with `OutputMode`
4. **Propagate `OutputMode` through `main.rs`** -- replace `let robot = resolve_robot_mode(&cli)` with `let mode = resolve_output_mode(&cli)` returning `OutputMode`
5. **Convert the remaining simple commands** one at a time (show, schemas, diff, search, cache_cmd, aliases, doctor, list)
6. **Convert fetch and sync** (most complex, last)
7. **Delete dead code** -- `robot: bool` parameters and inline formatting blocks
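
The trait impl in step 3 can be sketched ahead of time. `TagsOutput` here is a hypothetical stand-in with one field (the real struct lives in `src/cli/tags.rs` and also derives `Serialize`); the sketch only shows the intended shape of a `HumanDisplay` impl writing a small table to any `Write` sink:

```rust
use std::io::Write;

// Trait as defined in the Design section above.
pub trait HumanDisplay {
    fn display_human(&self, w: &mut dyn Write) -> std::io::Result<()>;
}

// Hypothetical stand-in for the real TagsOutput in src/cli/tags.rs.
pub struct TagsOutput {
    pub tags: Vec<(String, usize)>, // (tag name, endpoint count)
}

impl HumanDisplay for TagsOutput {
    fn display_human(&self, w: &mut dyn Write) -> std::io::Result<()> {
        // Simple fixed-width table: header row, then one row per tag.
        writeln!(w, "{:<20} {:>9}", "TAG", "ENDPOINTS")?;
        for (name, count) in &self.tags {
            writeln!(w, "{:<20} {:>9}", name, count)?;
        }
        Ok(())
    }
}

fn main() {
    let out = TagsOutput {
        tags: vec![("pets".to_string(), 12), ("store".to_string(), 4)],
    };
    let mut buf = Vec::new();
    out.display_human(&mut buf).unwrap();
    print!("{}", String::from_utf8(buf).unwrap());
}
```

With this in place, the `if robot_mode` branch in `tags.rs` collapses to a single `emit()` call.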

### TDD Plan

**RED tests to write first (in `src/output/mod.rs` tests):**

- `test_emit_robot_mode_produces_valid_json` -- call `emit()` with Robot mode, capture stdout, parse it as a `RobotEnvelope`
- `test_emit_human_mode_produces_readable_text` -- call `emit()` with Human mode, verify the output contains no JSON
- `test_human_display_tags_output` -- verify `TagsOutput::display_human()` produces a table with the correct headers

**GREEN:** Implement `OutputMode`, `HumanDisplay`, and `emit()` to pass the tests.

**REFACTOR:** Convert each command one at a time, running the full test suite between conversions.

### Acceptance Criteria

- Zero `if robot` / `if !robot` branches in command handlers
- `output/human.rs` contains all human formatting logic (not 8 lines)
- Adding a new output format requires zero changes to command handlers
- All existing integration tests and golden tests pass unchanged

### Files Changed

- `src/output/mod.rs` -- add `OutputMode`, `HumanDisplay`, `emit()`
- `src/output/human.rs` -- move all human formatting here
- `src/main.rs` -- `resolve_output_mode()` replaces `resolve_robot_mode()`
- `src/cli/tags.rs` -- proof-of-concept conversion
- `src/cli/show.rs`, `src/cli/schemas.rs`, `src/cli/diff.rs`, `src/cli/search.rs`, `src/cli/cache_cmd.rs`, `src/cli/aliases.rs`, `src/cli/doctor.rs`, `src/cli/list.rs` -- convert each
- `src/cli/fetch.rs`, `src/cli/sync_cmd.rs` -- convert last (most complex)
- `src/cli/mod.rs` -- no changes needed (the `Cli` struct keeps its `robot` flag; `OutputMode` is resolved from it)

### Dependency Note

This epic should be completed before Epic 2 (sync split), since the sync_cmd conversion is part of this work.

---

## Epic 2: Split sync_cmd.rs

### Problem

`sync_cmd.rs` is 1,843 lines containing eight distinct responsibilities: CLI args, diff computation, checkpoint/resume, per-host rate limiting, a concurrent job pool, single-alias sync, batch sync, and output formatting. It is the most complex module and the hardest to reason about.

### Solution

Split it into a `sync/` directory of focused modules while preserving the existing public API (the `execute()` function).

### Design

**New directory: `src/cli/sync/`**

| File | Responsibility | Lines (est.) |
|------|---------------|-------------|
| `mod.rs` | Re-exports, `Args` struct, `execute()` entry point | ~100 |
| `types.rs` | All Serialize structs: `SyncOutput`, `AliasSyncResult`, `SyncAllOutput`, `ChangeSummary`, `ChangeDetails`, `EndpointKey`, `SchemaDiff`, `EndpointDiff` | ~100 |
| `diff.rs` | `compute_diff()`, `endpoint_key()`, `endpoint_fingerprint()` | ~120 |
| `checkpoint.rs` | `SyncCheckpoint`, `load_checkpoint()`, `save_checkpoint()`, `remove_checkpoint()` | ~60 |
| `throttle.rs` | `PerHostThrottle`, `extract_host()` | ~50 |
| `single.rs` | `sync_inner()`, `sync_one_alias()`, `sync_one_alias_inner()`, `output_no_changes()`, `output_changes()` | ~250 |
| `batch.rs` | `sync_all_inner()` with the concurrent stream logic | ~300 |

### Implementation Steps

1. **Create the `src/cli/sync/` directory**
2. **Move types first** -- extract all `#[derive(Serialize)]` structs into `types.rs`
3. **Extract `diff.rs`** -- `compute_diff()`, `endpoint_key()`, `endpoint_fingerprint()`, `MAX_DETAIL_ITEMS`
4. **Extract `checkpoint.rs`** -- checkpoint load/save/remove plus the `CHECKPOINT_FILE` const
5. **Extract `throttle.rs`** -- `PerHostThrottle` and `extract_host()`
6. **Extract `single.rs`** -- single-alias sync logic and output helpers
7. **Extract `batch.rs`** -- `sync_all_inner()` concurrent execution
8. **Create `mod.rs`** -- `Args` struct, `execute()`, re-exports
9. **Update `src/cli/mod.rs`** -- replace `pub mod sync_cmd` with `pub mod sync`
10. **Update `src/main.rs`** -- `Commands::Sync` references
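
The bodies of `compute_diff()` and its helpers move verbatim, but as an illustration of what `diff.rs` will own, here is a minimal sketch of an `endpoint_key()` / `endpoint_fingerprint()` pair. The `Endpoint` struct and both function bodies are assumptions for illustration; the real signatures in `sync_cmd.rs` may differ:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical minimal endpoint shape; the real IndexedEndpoint has more fields.
pub struct Endpoint {
    pub method: String,
    pub path: String,
    pub summary: Option<String>,
}

// Stable identity: an endpoint is "the same" endpoint if method + path match.
pub fn endpoint_key(ep: &Endpoint) -> String {
    format!("{} {}", ep.method, ep.path)
}

// Content fingerprint: changes when any compared field changes, so the
// differ can flag a modified endpoint under an unchanged key.
pub fn endpoint_fingerprint(ep: &Endpoint) -> u64 {
    let mut h = DefaultHasher::new();
    ep.method.hash(&mut h);
    ep.path.hash(&mut h);
    ep.summary.hash(&mut h);
    h.finish()
}

fn main() {
    let a = Endpoint { method: "GET".into(), path: "/pets".into(), summary: None };
    let b = Endpoint { method: "GET".into(), path: "/pets".into(), summary: Some("List pets".into()) };
    assert_eq!(endpoint_key(&a), endpoint_key(&b)); // same identity...
    assert_ne!(endpoint_fingerprint(&a), endpoint_fingerprint(&b)); // ...but modified
    println!("ok");
}
```

Keeping identity and fingerprint separate is what lets `compute_diff()` distinguish added/removed endpoints (a key appears or disappears) from modified ones (same key, different fingerprint). Note that `DefaultHasher` is only stable within one process run, so it suits an in-memory diff, not a persisted checksum.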

### TDD Plan

**This is a pure refactor -- no new behavior.** The TDD approach is:

- Run the full test suite before starting: `cargo test`
- After each extraction step, verify: `cargo test && cargo clippy --all-targets -- -D warnings`
- No new tests are needed (the existing tests in `sync_cmd::tests` move to `sync/diff.rs`)

**Specific test verification after each step:**

- `test_diff_no_changes`, `test_diff_added_endpoint`, `test_diff_removed_endpoint`, `test_diff_modified_endpoint`, `test_diff_added_schema`, `test_diff_removed_schema`, `test_diff_endpoint_modified_by_params` -- move to `sync/diff.rs`
- Integration tests (`tests/integration_test.rs`) that exercise the `sync` command -- must pass unchanged

### Acceptance Criteria

- `sync_cmd.rs` is deleted, replaced by the `sync/` directory
- No file in `sync/` exceeds 350 lines
- All existing tests pass with zero modifications
- The `pub async fn execute()` signature is unchanged
- `cargo clippy --all-targets -- -D warnings` passes

### Files Changed

- **Delete:** `src/cli/sync_cmd.rs`
- **Create:** `src/cli/sync/mod.rs`, `types.rs`, `diff.rs`, `checkpoint.rs`, `throttle.rs`, `single.rs`, `batch.rs`
- **Modify:** `src/cli/mod.rs` (module declaration), `src/main.rs` (import path if needed)

### Dependency Note

This epic should be done after Epic 1 (output sink), since that will already have simplified the output formatting within sync. If done before Epic 1, the output formatting moves into `single.rs` and `batch.rs` and then has to be re-extracted during Epic 1.

---

## Epic 3: spawn_blocking for Cache I/O

### Problem

`CacheManager` uses `std::fs` operations and `fs2::FileExt` (flock) -- all blocking I/O. But the command handlers are `async fn`s running on tokio. Under `sync --all --jobs=4`, multiple concurrent tasks hit blocking cache I/O, potentially starving the tokio runtime.

### Solution

Wrap cache operations in `tokio::task::spawn_blocking()`. Either create a new `AsyncCache` wrapper with async methods, or add async variants directly to `CacheManager`.

### Design

**Option A (recommended): async wrapper struct**

```rust
pub struct AsyncCache {
    inner: CacheManager,
}

impl AsyncCache {
    pub async fn load_index(&self, alias: &str) -> Result<(SpecIndex, CacheMetadata), SwaggerCliError> {
        let inner = self.inner.clone(); // CacheManager is just a PathBuf, cheap to clone
        let alias = alias.to_string();
        tokio::task::spawn_blocking(move || inner.load_index(&alias))
            .await
            .map_err(|e| SwaggerCliError::Cache(format!("task join error: {e}")))?
    }

    // ... same pattern for write_cache, load_raw, list_aliases, etc.
}
```

This preserves the existing sync `CacheManager` for tests and simple cases while providing an async-safe interface for the runtime.

### Implementation Steps

1. **Derive `Clone` for `CacheManager`** (it's just a `PathBuf` wrapper -- trivial)
2. **Create `AsyncCache` in `src/core/cache.rs`** with an async wrapper for each public method:
   - `load_index()`, `load_raw()`, `write_cache()`, `list_aliases()`, `ensure_dirs()`, `alias_dir()`, `delete_alias_dir()`, `update_last_accessed()`
3. **Write tests for `AsyncCache`** verifying it produces results identical to the sync `CacheManager`
4. **Convert the command handlers** to use `AsyncCache` instead of `CacheManager`:
   - Start with `tags.rs` (simplest)
   - Then `list.rs`, `show.rs`, `search.rs`, `schemas.rs`
   - Then `sync_cmd.rs` / `sync/` (most critical -- this is where the contention happens)
   - Then `fetch.rs`, `doctor.rs`, `cache_cmd.rs`, `aliases.rs`
### TDD Plan

**RED tests first:**

```rust
#[tokio::test]
async fn test_async_cache_load_index_matches_sync() {
    // Setup: write a spec to the cache using the sync CacheManager
    // Act: load it via AsyncCache
    // Assert: the result matches sync CacheManager::load_index()
}

#[tokio::test]
async fn test_async_cache_write_then_load_roundtrip() {
    // Write via AsyncCache, load via AsyncCache, verify integrity
}

#[tokio::test]
async fn test_async_cache_concurrent_reads_no_panic() {
    // Spawn 10 concurrent load_index tasks on the same alias
    // All should succeed (shared read lock)
}
```

**GREEN:** Implement the `AsyncCache` wrapper.

**REFACTOR:** Convert the command handlers one at a time.

### Acceptance Criteria

- All cache I/O from async command handlers goes through `spawn_blocking`
- The existing sync `CacheManager` is preserved for non-async tests
- `cargo test` passes (including `lock_contention_test.rs`)
- No direct `std::fs` calls from async contexts in command handlers

### Files Changed

- `src/core/cache.rs` -- add the `Clone` derive and the `AsyncCache` struct
- All `src/cli/*.rs` files -- replace `CacheManager::new()` with `AsyncCache::new()`

### Dependency Note

Independent of Epics 1 and 2. Can be done in parallel.

---

## Epic 4: Property Tests for Search Engine and Parser

### Problem

Property tests currently cover only four trivial properties (hash determinism, JSON roundtrip, index ordering, hash format). The search engine (722 LOC, with scoring, tokenization, and Unicode snippet handling) and the OpenAPI parser/indexer (659 LOC) have zero property-test coverage.

### Solution

Add targeted property tests, using proptest, for the search engine and for format detection/parsing.

### Design

**New test file: `tests/property_search_test.rs`**

Property tests for the search engine:

1. **Score monotonicity:** adding a matching field to an endpoint never decreases its score
2. **Deterministic ordering:** same query + same index = same result order, always
3. **Limit respected:** the result count is <= `opts.limit` for any query
4. **Coverage boost:** matching N/N terms scores >= matching 1/N terms
5. **Case insensitivity:** a lowercased query over lowercased data yields the same results as mixed case
6. **Empty-query safety:** any whitespace-only query returns empty results
7. **Unicode safety:** search never panics on arbitrary Unicode input (including emoji, RTL, zero-width chars)
8. **Snippet bounds:** snippet length never exceeds 50 chars plus ellipsis markers
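
Before wiring up proptest, the shape of such a property can be illustrated dependency-free. The `search()` below is a toy stand-in (the real engine lives in `src/core/search.rs` and has a different API); the nested loops play the role proptest's runner will play, checking property 3 across many inputs:

```rust
// Toy stand-in for the real search engine, used only to illustrate
// the "limit respected" property; the real API and scoring differ.
fn search(haystack: &[String], query: &str, limit: usize) -> Vec<String> {
    let q = query.to_lowercase();
    haystack
        .iter()
        .filter(|s| s.to_lowercase().contains(&q))
        .take(limit) // the property under test: never more than `limit` results
        .cloned()
        .collect()
}

fn main() {
    // Poor man's property runner: sweep small inputs exhaustively.
    // proptest replaces this with randomized generation plus shrinking.
    let corpus: Vec<String> = (0..40).map(|i| format!("/pets/{i}")).collect();
    for limit in 0..10 {
        for query in ["pets", "PETS", "9", "", "∅"] {
            let results = search(&corpus, query, limit);
            assert!(results.len() <= limit, "limit violated for {query:?}");
        }
    }
    println!("limit property held on all cases");
}
```

proptest replaces the loops with generated inputs and shrinks any counterexample to a minimal one; the assertion itself stays the same.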

**New test file: `tests/property_indexer_test.rs`**

Property tests for the indexer:

1. **Format detection is idempotent:** `detect_format(bytes, hint, ct)` called twice gives the same result
2. **JSON sniffing is correct:** bytes starting with `{` or `[` are always detected as JSON
3. **Normalize roundtrip:** `normalize_to_json(bytes, Json)` preserves semantic content
4. **`build_index` never panics on valid OpenAPI:** generate random valid-ish OpenAPI structures and verify `build_index` produces a valid `SpecIndex`
5. **Index endpoint count matches paths:** the number of `IndexedEndpoint`s equals the sum of methods across all paths
6. **Content hash is deterministic:** same bytes -> same hash, every time
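
Properties 1 and 2 can likewise be pinned down against a toy model first. This `detect_format` is a guess at the sniffing logic (the real function also takes a filename hint and Content-Type, which this sketch drops); idempotence falls out of it being a pure function of its input:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpecFormat {
    Json,
    Yaml,
}

// Hypothetical sniffing logic: first non-whitespace byte of '{' or '['
// means JSON; everything else falls back to YAML.
fn detect_format(bytes: &[u8]) -> SpecFormat {
    match bytes.iter().copied().find(|b| !b.is_ascii_whitespace()) {
        Some(b'{') | Some(b'[') => SpecFormat::Json,
        _ => SpecFormat::Yaml,
    }
}

fn main() {
    let cases: [&[u8]; 4] = [b"{\"openapi\":\"3.0\"}", b"  [1]", b"openapi: 3.0", b""];
    for bytes in cases {
        // Property 1: calling twice yields the same result (pure function).
        assert_eq!(detect_format(bytes), detect_format(bytes));
    }
    // Property 2: a leading '{' or '[' is always JSON.
    assert_eq!(detect_format(b"{}"), SpecFormat::Json);
    assert_eq!(detect_format(b"[]"), SpecFormat::Json);
    println!("ok");
}
```

The real test would feed proptest-generated byte strings through the actual `detect_format` and assert the same two properties.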

### Implementation Steps

1. **Create `tests/property_search_test.rs`** with proptest strategies for generating `SpecIndex` and `SearchOptions` values
2. **Implement the search property tests** (the 8 above)
3. **Create `tests/property_indexer_test.rs`** with proptest strategies for generating OpenAPI-like JSON
4. **Implement the indexer property tests** (the 6 above)
5. **Run with an extended case count** to verify: `cargo test -- --ignored` or `PROPTEST_CASES=1000 cargo test`

### TDD Plan

These ARE the tests. Write them, then verify they pass against the existing implementation. If any fail, that's a real bug to fix.

**Proptest strategy for `SpecIndex` generation:**

```rust
fn arb_indexed_endpoint() -> impl Strategy<Value = IndexedEndpoint> {
    (
        "/[a-z]{1,5}(/[a-z]{1,5}){0,3}",                      // path
        prop_oneof!["GET", "POST", "PUT", "DELETE", "PATCH"], // method
        proptest::option::of("[A-Za-z ]{1,50}"),              // summary
        proptest::option::of("[A-Za-z ]{1,100}"),             // description
    )
        .prop_map(|(path, method, summary, description)| {
            IndexedEndpoint { path, method, summary, description, /* ... defaults ... */ }
        })
}

fn arb_spec_index() -> impl Strategy<Value = SpecIndex> {
    proptest::collection::vec(arb_indexed_endpoint(), 0..50)
        .prop_map(|endpoints| SpecIndex { endpoints, /* ... */ })
}
```

### Acceptance Criteria

- 14 new property tests (8 search + 6 indexer)
- All pass with the default proptest case count (256)
- No new dependencies (proptest is already in dev-deps)
- The tests catch at least one real edge case (most likely in Unicode snippet handling)

### Files Changed

- **Create:** `tests/property_search_test.rs`, `tests/property_indexer_test.rs`
- **Possibly modify:** `src/core/search.rs`, if the property tests reveal bugs

### Dependency Note

Fully independent. Can be done in parallel with all other epics.

---

## Epic 5: Shared Command Pipeline

### Problem

Every command handler independently captures `Instant::now()`, creates `CacheManager::new(cache_dir())`, loads the index with `cm.load_index(alias)`, does its work, formats output, and passes the duration along. That is ~15-20 lines of identical scaffolding per command. Adding cross-cutting concerns (telemetry, logging, caching headers) requires touching every command.

### Solution

Create a `CommandContext` struct that encapsulates the common setup, and a `run_command()` helper that handles timing and output.

### Design

```rust
/// Shared context created once per command invocation.
pub struct CommandContext {
    pub cache: AsyncCache, // or CacheManager
    pub mode: OutputMode,
    pub start: Instant,
    pub network_policy: NetworkPolicy,
    pub config_override: Option<PathBuf>,
}

impl CommandContext {
    pub fn new(mode: OutputMode, network_flag: &str, config_override: Option<&Path>) -> Result<Self, SwaggerCliError> {
        Ok(Self {
            cache: AsyncCache::new(cache_dir()),
            mode,
            start: Instant::now(),
            network_policy: resolve_policy(network_flag)?,
            config_override: config_override.map(PathBuf::from),
        })
    }

    pub async fn load_index(&self, alias: &str) -> Result<(SpecIndex, CacheMetadata), SwaggerCliError> {
        self.cache.load_index(alias).await
    }

    pub fn elapsed(&self) -> Duration {
        self.start.elapsed()
    }
}
```

**Usage in commands:**

```rust
pub async fn execute(args: &Args, ctx: &CommandContext) -> Result<(), SwaggerCliError> {
    let (index, _meta) = ctx.load_index(&args.alias).await?;
    let output = build_output(&index);
    emit(&output, ctx.mode, "tags", ctx.elapsed());
    Ok(())
}
```
### Implementation Steps
|
||||
|
||||
1. **Define `CommandContext` struct** in `src/cli/context.rs`
|
||||
2. **Write tests** for `CommandContext::new()` and helper methods
|
||||
3. **Update `main.rs`** to create `CommandContext` and pass to commands
|
||||
4. **Convert `tags` first** (simplest) as proof of concept
|
||||
5. **Convert all other commands** to accept `&CommandContext`
|
||||
6. **Remove duplicated `CacheManager::new(cache_dir())` + `Instant::now()` from each handler**
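
Step 3 is the only place the context gets built. The wiring can be sketched with trimmed-down stand-in types (the real `CommandContext` also carries the cache handle, network policy, and config override); the point is simply that `main` constructs one context and every handler borrows it:

```rust
use std::time::Instant;

// Stand-ins for the real types, trimmed to what the wiring needs.
enum OutputMode {
    Robot,
    Human,
}

struct CommandContext {
    mode: OutputMode,
    start: Instant,
}

enum Commands {
    Tags,
    List,
}

// Handlers take &CommandContext instead of separate robot/Instant arguments.
fn dispatch(cmd: &Commands, ctx: &CommandContext) -> String {
    let name = match cmd {
        Commands::Tags => "tags",
        Commands::List => "list",
    };
    let fmt = match ctx.mode {
        OutputMode::Robot => "json",
        OutputMode::Human => "text",
    };
    format!("{name} ({fmt}) in {:?}", ctx.start.elapsed())
}

fn main() {
    // Built once, up front, then shared by reference with every command.
    let ctx = CommandContext { mode: OutputMode::Human, start: Instant::now() };
    println!("{}", dispatch(&Commands::Tags, &ctx));
}
```

With this shape, adding a cross-cutting concern such as telemetry means touching `CommandContext` and the dispatch site once, instead of every handler.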

### TDD Plan

**RED tests first:**

```rust
#[test]
fn test_command_context_creates_valid_cache() {
    let ctx = CommandContext::new(OutputMode::Robot, "auto", None).unwrap();
    // The cache dir resolves to the platform default but may not exist
    // yet in tests, so assert on the path rather than the directory.
    assert!(!ctx.cache.cache_dir().as_os_str().is_empty());
}

#[test]
fn test_command_context_elapsed_increases() {
    let ctx = CommandContext::new(OutputMode::Human, "auto", None).unwrap();
    std::thread::sleep(Duration::from_millis(10));
    assert!(ctx.elapsed().as_millis() >= 10);
}

#[test]
fn test_command_context_offline_policy() {
    let ctx = CommandContext::new(OutputMode::Robot, "offline", None).unwrap();
    assert_eq!(ctx.network_policy, NetworkPolicy::Offline);
}
```

### Acceptance Criteria

- `CommandContext` is created once in `main.rs` and passed to all commands
- No command handler creates its own `CacheManager` or `Instant`
- Each command's `execute()` signature includes `&CommandContext`
- All tests pass unchanged

### Files Changed

- **Create:** `src/cli/context.rs`
- **Modify:** `src/cli/mod.rs` (add `pub mod context`)
- **Modify:** `src/main.rs` (create the `CommandContext`, pass it to dispatch)
- **Modify:** all `src/cli/*.rs` execute functions (accept `&CommandContext`)

### Dependency Note

Depends on Epic 1 (`OutputMode`) and Epic 3 (`AsyncCache`). Should be done last.

---

## Execution Order

```
Epic 4 (property tests) ─────────────────────────────────────┐
Epic 3 (spawn_blocking) ─────────────────────────────────────┤
Epic 1 (output sink) ──> Epic 2 (split sync) ──> Epic 5 (pipeline)
```

Epics 3 and 4 are fully independent and can run in parallel with everything else.

Epic 1 should precede Epic 2 (it simplifies sync's output formatting before the split).

Epic 5 depends on Epics 1 and 3 (it needs `OutputMode` and `AsyncCache`).