diff --git a/.beads/issues.jsonl b/.beads/issues.jsonl index 8a33518..61bb310 100644 --- a/.beads/issues.jsonl +++ b/.beads/issues.jsonl @@ -24,11 +24,11 @@ {"id":"bd-30a","title":"Implement aliases command with list, rename, delete, set-default","description":"## Background\nThe aliases command manages multiple API specs. List all aliases with stats, show details, rename, delete, set default. All operations except delete are metadata-only (fast). Delete removes the entire alias directory after acquiring lock.\n\n## Approach\nImplement src/cli/aliases.rs with AliasesArgs and execute():\n\n**AliasesArgs:** list (bool, default), show (Option), rename (Option> — [old, new], requires 2 values), delete (Option), set_default (Option).\n\n**Operations:**\n- **List:** CacheManager.list_aliases() -> display name, url, version, is_default, cached_at, size, endpoint/schema counts\n- **Show:** load meta for specific alias, display full details\n- **Rename:** validate new alias name format (same rules as alias creation — alphanumeric, hyphens, underscores), check new name does not already exist, rename directory atomically using `std::fs::rename()` syscall (atomic on same filesystem), if renamed alias was the default alias in config, update config.default_alias to the new name and save\n- **Delete:** acquire lock on alias directory, remove entire alias directory (`std::fs::remove_dir_all`), if deleted alias was the default alias, clear default_alias in config (set to None), save config. No confirmation prompt — CLI is non-interactive for agent compatibility. PRD says \"explicit delete required\" meaning the user must explicitly pass --delete, but no interactive Y/N prompt.\n- **Set-default:** verify alias exists in cache before setting, update config.default_alias, save config. 
If alias does not exist, return error with suggestion listing available aliases.\n\n## Error Handling Details\n\n**Rename errors:**\n- New name fails format validation -> error with INVALID_ALIAS_NAME code and suggestion showing valid format\n- New name already exists -> error with ALIAS_EXISTS code\n- Rename to same name -> no-op, return success (idempotent, do not error)\n- Old alias does not exist -> error with ALIAS_NOT_FOUND code\n- Filesystem rename fails -> error with IO_ERROR code\n\n**Delete errors:**\n- Alias does not exist -> error with ALIAS_NOT_FOUND code\n- Lock contention (e.g., sync running) -> error with LOCK_CONTENTION code and suggestion to retry\n- Deleting the only alias -> allowed (leaves empty aliases state, no special handling)\n\n**Set-default errors:**\n- Alias does not exist -> error with ALIAS_NOT_FOUND code and suggestion listing available aliases\n\n## Acceptance Criteria\n- [ ] `aliases --robot` lists all aliases with correct metadata (name, url, version, is_default, cached_at, size, endpoint_count, schema_count)\n- [ ] `aliases --show petstore` shows full details for one alias\n- [ ] `aliases --rename old new` renames directory atomically and updates config if renamed alias was default\n- [ ] `aliases --rename old old` (same name) is a no-op, returns success\n- [ ] `aliases --delete old-api` removes alias directory and clears default if it was default\n- [ ] Delete does NOT prompt for confirmation (non-interactive CLI)\n- [ ] `aliases --set-default petstore` updates config, errors if alias does not exist\n- [ ] Rename validates new alias format (alphanumeric, hyphens, underscores)\n- [ ] Rename checks new name does not already exist\n- [ ] Delete of default alias clears default_alias in config\n- [ ] Robot output for each operation is well-structured with ok/data/meta envelope\n- [ ] Error responses include appropriate error codes and suggestions\n\n## Edge Cases\n- **Rename to same name:** No-op, return success (idempotent 
behavior).\n- **Delete the only alias:** Allowed. Leaves cache in empty state. Subsequent commands that need an alias will error with ALIAS_NOT_FOUND suggesting the user fetch a spec.\n- **Delete while sync is running:** Lock contention. Return LOCK_CONTENTION error with suggestion to wait or retry. Do not force-delete.\n- **Set-default to non-existent alias:** Error with ALIAS_NOT_FOUND and suggestion listing available aliases from cache.\n- **Rename when target name has invalid characters:** Error with INVALID_ALIAS_NAME showing the format rules.\n\n## Files\n- MODIFY: src/cli/aliases.rs (AliasesArgs, execute, rename/delete/set-default logic)\n- MODIFY: src/output/robot.rs (add output_aliases)\n- MODIFY: src/output/human.rs (add output_aliases)\n\n## TDD Anchor\nRED: Write `test_aliases_list` — fetch two specs, run aliases --robot, assert data.aliases has length 2.\nRED: Write `test_aliases_rename` — rename an alias, verify directory moved and config updated.\nRED: Write `test_aliases_rename_same_name` — rename to same name, verify no-op success.\nRED: Write `test_aliases_delete` — delete alias, verify directory removed and config cleared.\nRED: Write `test_aliases_delete_lock_contention` — hold lock on alias, attempt delete, assert LOCK_CONTENTION error.\nGREEN: Implement list_aliases and output.\nVERIFY: `cargo test test_aliases_list`\n\n## Dependency Context\nUses CacheManager (list_aliases, delete_alias) from bd-3ea. 
Uses Config (default_alias, save) from bd-1sb.","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-12T16:28:47.390765Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:46:14.418127Z","compaction_level":0,"original_size":0,"labels":["management","phase2"],"dependencies":[{"issue_id":"bd-30a","depends_on_id":"bd-1sb","type":"blocks","created_at":"2026-02-12T16:28:47.395669Z","created_by":"tayloreernisse"},{"issue_id":"bd-30a","depends_on_id":"bd-2pl","type":"parent-child","created_at":"2026-02-12T16:28:47.394226Z","created_by":"tayloreernisse"},{"issue_id":"bd-30a","depends_on_id":"bd-3d2","type":"blocks","created_at":"2026-02-12T16:28:47.396077Z","created_by":"tayloreernisse"},{"issue_id":"bd-30a","depends_on_id":"bd-3ea","type":"blocks","created_at":"2026-02-12T16:28:47.394978Z","created_by":"tayloreernisse"}]} {"id":"bd-37c","title":"Add SBOM generation and cosign attestation","description":"## What\nGenerate SBOM (CycloneDX or SPDX) during CI build. Sign release artifacts with cosign for provenance attestation.\n\n## Acceptance Criteria\n- [ ] SBOM generated in CI pipeline\n- [ ] cosign attestation attached to release artifacts\n- [ ] Verifiable with cosign verify\n\n## Files\n- MODIFY: .gitlab-ci.yml (add SBOM + cosign steps)","status":"open","priority":4,"issue_type":"task","created_at":"2026-02-12T16:31:57.365996Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:42:45.149708Z","compaction_level":0,"original_size":0,"labels":["future","phase3"],"dependencies":[{"issue_id":"bd-37c","depends_on_id":"bd-2e4","type":"blocks","created_at":"2026-02-12T16:42:45.149692Z","created_by":"tayloreernisse"},{"issue_id":"bd-37c","depends_on_id":"bd-3aq","type":"parent-child","created_at":"2026-02-12T16:31:57.367010Z","created_by":"tayloreernisse"}]} {"id":"bd-3aq","title":"Epic: Phase 3 
Future","status":"open","priority":1,"issue_type":"task","created_at":"2026-02-12T16:22:29.339564Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:22:29.340289Z","compaction_level":0,"original_size":0,"labels":["epic"]} -{"id":"bd-3b6","title":"Build async HTTP client with SSRF protection and streaming download","description":"## Background\nswagger-cli fetches OpenAPI specs over HTTPS with strict security controls. The HTTP client must enforce SSRF protection (blocking private/loopback/link-local/multicast IPs), require HTTPS by default, support streaming downloads with max-bytes enforcement, and handle retries with backoff. This is async (tokio + reqwest).\n\n## Approach\nCreate src/core/http.rs with:\n\n**AsyncHttpClient struct:** Wraps reqwest::Client configured with rustls-tls, connect timeout (5s), overall timeout (configurable, default 10s), redirect policy (max 5). Provides `fetch_spec()` async method.\n\n**SSRF Protection:** Before connecting, resolve the hostname and check the IP against blocked CIDR ranges: 127.0.0.0/8, ::1, 169.254.0.0/16, fe80::/10, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, multicast (224.0.0.0/4, ff00::/8). Also check resolved IP AFTER redirects (DNS rebinding defense). Return PolicyBlocked error for violations. Accept --allow-private-host exceptions.\n\n**HTTPS Enforcement:** Reject http:// URLs unless --allow-insecure-http is set. Return PolicyBlocked.\n\n**Streaming Download:** Use response.chunk() in a loop, counting bytes. Abort when exceeding max_bytes (default 25MB). This prevents OOM on huge specs.\n\n**Retries:** Retry on 5xx and network errors up to N times (default 2) with exponential backoff + jitter. Honor Retry-After header. Do NOT retry 4xx (except 429 rate limit).\n\n**Auth Headers:** Accept Vec of (name, value) header pairs. 
Redact auth values in any error messages.\n\n## Acceptance Criteria\n- [ ] fetch_spec(\"https://...\") returns body bytes on success\n- [ ] Loopback IP (127.0.0.1) is blocked with PolicyBlocked error\n- [ ] Private IP (10.0.0.1) is blocked with PolicyBlocked error\n- [ ] Link-local (169.254.169.254) is blocked with PolicyBlocked error\n- [ ] http:// URL without --allow-insecure-http returns PolicyBlocked\n- [ ] Download exceeding max_bytes aborts with InvalidSpec error\n- [ ] 401/403 returns Auth error (not retried)\n- [ ] 500 is retried up to retry count\n- [ ] Auth header values are not included in error messages\n\n## Files\n- CREATE: src/core/http.rs (AsyncHttpClient, SSRF checks, streaming download, retries)\n- MODIFY: src/core/mod.rs (pub mod http;)\n\n## TDD Anchor\nRED: Write `test_ssrf_blocks_loopback` — call the IP validation function with 127.0.0.1, assert it returns Err(PolicyBlocked).\nGREEN: Implement CIDR range checking.\nVERIFY: `cargo test test_ssrf_blocks`\n\nAdditional tests (use mockito for HTTP):\n- test_fetch_success_https\n- test_fetch_rejects_http\n- test_fetch_max_bytes_abort\n- test_fetch_retries_on_500\n- test_fetch_no_retry_on_401\n- test_auth_header_redacted_in_errors\n\n## Edge Cases\n- DNS resolution must happen BEFORE connecting — use `tokio::net::lookup_host()` or reqwest's resolve API\n- DNS rebinding: a hostname might resolve to public IP initially, then private IP on redirect. 
Check IP at EACH hop.\n- IPv6 mapped IPv4 addresses (::ffff:127.0.0.1) must also be caught\n- Retry-After header may be seconds or HTTP-date — parse both formats\n- connect_timeout (5s) is separate from overall timeout (10s)\n\n## Dependency Context\nUses SwaggerCliError variants (Network, Auth, PolicyBlocked, InvalidSpec) from bd-ilo (error types and core data models).","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-12T16:26:35.163338Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:55:53.972326Z","compaction_level":0,"original_size":0,"labels":["fetch","phase1","security"],"dependencies":[{"issue_id":"bd-3b6","depends_on_id":"bd-3d2","type":"blocks","created_at":"2026-02-12T16:26:35.167736Z","created_by":"tayloreernisse"},{"issue_id":"bd-3b6","depends_on_id":"bd-3ny","type":"parent-child","created_at":"2026-02-12T16:26:35.167093Z","created_by":"tayloreernisse"}]} +{"id":"bd-3b6","title":"Build async HTTP client with SSRF protection and streaming download","description":"## Background\nswagger-cli fetches OpenAPI specs over HTTPS with strict security controls. The HTTP client must enforce SSRF protection (blocking private/loopback/link-local/multicast IPs), require HTTPS by default, support streaming downloads with max-bytes enforcement, and handle retries with backoff. This is async (tokio + reqwest).\n\n## Approach\nCreate src/core/http.rs with:\n\n**AsyncHttpClient struct:** Wraps reqwest::Client configured with rustls-tls, connect timeout (5s), overall timeout (configurable, default 10s), redirect policy (max 5). Provides `fetch_spec()` async method.\n\n**SSRF Protection:** Before connecting, resolve the hostname and check the IP against blocked CIDR ranges: 127.0.0.0/8, ::1, 169.254.0.0/16, fe80::/10, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, multicast (224.0.0.0/4, ff00::/8). Also check resolved IP AFTER redirects (DNS rebinding defense). Return PolicyBlocked error for violations. 
Accept --allow-private-host exceptions.\n\n**HTTPS Enforcement:** Reject http:// URLs unless --allow-insecure-http is set. Return PolicyBlocked.\n\n**Streaming Download:** Use response.chunk() in a loop, counting bytes. Abort when exceeding max_bytes (default 25MB). This prevents OOM on huge specs.\n\n**Retries:** Retry on 5xx and network errors up to N times (default 2) with exponential backoff + jitter. Honor Retry-After header. Do NOT retry 4xx (except 429 rate limit).\n\n**Auth Headers:** Accept Vec of (name, value) header pairs. Redact auth values in any error messages.\n\n## Acceptance Criteria\n- [ ] fetch_spec(\"https://...\") returns body bytes on success\n- [ ] Loopback IP (127.0.0.1) is blocked with PolicyBlocked error\n- [ ] Private IP (10.0.0.1) is blocked with PolicyBlocked error\n- [ ] Link-local (169.254.169.254) is blocked with PolicyBlocked error\n- [ ] http:// URL without --allow-insecure-http returns PolicyBlocked\n- [ ] Download exceeding max_bytes aborts with InvalidSpec error\n- [ ] 401/403 returns Auth error (not retried)\n- [ ] 500 is retried up to retry count\n- [ ] Auth header values are not included in error messages\n\n## Files\n- CREATE: src/core/http.rs (AsyncHttpClient, SSRF checks, streaming download, retries)\n- MODIFY: src/core/mod.rs (pub mod http;)\n\n## TDD Anchor\nRED: Write `test_ssrf_blocks_loopback` — call the IP validation function with 127.0.0.1, assert it returns Err(PolicyBlocked).\nGREEN: Implement CIDR range checking.\nVERIFY: `cargo test test_ssrf_blocks`\n\nAdditional tests (use mockito for HTTP):\n- test_fetch_success_https\n- test_fetch_rejects_http\n- test_fetch_max_bytes_abort\n- test_fetch_retries_on_500\n- test_fetch_no_retry_on_401\n- test_auth_header_redacted_in_errors\n\n## Edge Cases\n- DNS resolution must happen BEFORE connecting — use `tokio::net::lookup_host()` or reqwest's resolve API\n- DNS rebinding: a hostname might resolve to public IP initially, then private IP on redirect. 
Check IP at EACH hop.\n- IPv6 mapped IPv4 addresses (::ffff:127.0.0.1) must also be caught\n- Retry-After header may be seconds or HTTP-date — parse both formats\n- connect_timeout (5s) is separate from overall timeout (10s)\n\n## Dependency Context\nUses SwaggerCliError variants (Network, Auth, PolicyBlocked, InvalidSpec) from bd-ilo (error types and core data models).","status":"closed","priority":2,"issue_type":"task","created_at":"2026-02-12T16:26:35.163338Z","created_by":"tayloreernisse","updated_at":"2026-02-12T17:46:06.061487Z","closed_at":"2026-02-12T17:46:06.061440Z","close_reason":"Async HTTP client with SSRF protection, HTTPS enforcement, DNS resolution checks, streaming download with retries","compaction_level":0,"original_size":0,"labels":["fetch","phase1","security"],"dependencies":[{"issue_id":"bd-3b6","depends_on_id":"bd-3d2","type":"blocks","created_at":"2026-02-12T16:26:35.167736Z","created_by":"tayloreernisse"},{"issue_id":"bd-3b6","depends_on_id":"bd-3ny","type":"parent-child","created_at":"2026-02-12T16:26:35.167093Z","created_by":"tayloreernisse"}]} {"id":"bd-3bl","title":"Implement tags command","description":"## Background\nThe tags command is simple — it lists OpenAPI tags with their endpoint counts and descriptions. Pure index-backed, fast.\n\n## Approach\nImplement src/cli/tags.rs with TagsArgs (alias only) and execute(). Load index, output tags from index.tags (already sorted and counted during index building). Robot: data.tags[] with name, description, endpoint_count. 
Human: formatted list with \"X total\" in header.\n\n## Acceptance Criteria\n- [ ] `tags petstore --robot` returns JSON with data.tags array\n- [ ] Each tag has name (string), description (string|null), endpoint_count (integer)\n- [ ] Tags sorted by name ASC (pre-sorted in index)\n- [ ] Human output shows tag name, count, and description\n- [ ] Human output shows \"X total\" count in header line\n- [ ] Tags with no description show null in robot output, empty/omitted in human output\n- [ ] Empty tags list (spec with no tags defined) returns ok:true with data.tags as empty array\n- [ ] Robot meta includes standard fields (schema_version, tool_version, command, duration_ms)\n\n## Edge Cases\n- **Spec with no tags:** Return ok:true, data.tags: [], meta.total: 0. Human output: \"0 total\" header, no rows.\n- **Tags with empty descriptions:** Tag defined in spec with `description: \"\"` — treat as null/empty in output (same as missing description).\n- **Orphaned tags:** Tags defined at root level in the OpenAPI spec but not referenced by any operation. These should still appear in output with endpoint_count: 0 (they exist in the spec, the command reports what the spec declares).\n\n## Files\n- MODIFY: src/cli/tags.rs (TagsArgs, execute)\n- MODIFY: src/output/robot.rs (add output_tags)\n- MODIFY: src/output/human.rs (add output_tags)\n\n## TDD Anchor\nRED: Write `test_tags_list` — fetch petstore, run tags --robot, assert data.tags has expected tag count.\nRED: Write `test_tags_empty` — use a spec with no tags, assert data.tags is empty array.\nRED: Write `test_tags_no_description` — use a spec with a tag that has no description, assert description is null in robot output.\nGREEN: Implement tags command.\nVERIFY: `cargo test test_tags_list`\n\n## Dependency Context\nUses SpecIndex and IndexedTag types from bd-ilo (error types and core data models). 
Uses CacheManager.load_index from bd-3ea (cache read path).","status":"open","priority":3,"issue_type":"task","created_at":"2026-02-12T16:28:05.366529Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:55:58.303891Z","compaction_level":0,"original_size":0,"labels":["phase2","query"],"dependencies":[{"issue_id":"bd-3bl","depends_on_id":"bd-3d2","type":"blocks","created_at":"2026-02-12T16:28:05.368603Z","created_by":"tayloreernisse"},{"issue_id":"bd-3bl","depends_on_id":"bd-3ea","type":"blocks","created_at":"2026-02-12T16:28:05.368039Z","created_by":"tayloreernisse"},{"issue_id":"bd-3bl","depends_on_id":"bd-jek","type":"parent-child","created_at":"2026-02-12T16:28:05.367634Z","created_by":"tayloreernisse"}]} {"id":"bd-3d2","title":"Build CLI skeleton with clap and output formatting framework","description":"## Background\nswagger-cli needs a CLI parser (clap) that routes to subcommands and an output framework that handles both human-readable and robot JSON formatting. The CLI skeleton defines the top-level Cli struct with global flags (--robot, --pretty, --network, --config) and the Commands enum with all 11 subcommands. The output framework provides consistent formatting for all commands.\n\n## Approach\n**CLI (src/cli/mod.rs):**\n- Define `Cli` struct with `#[derive(Parser)]`: command (Commands subcommand), robot (bool, global), pretty (bool, global), network (String, global, default \"auto\"), config (Option, global, env SWAGGER_CLI_CONFIG)\n- Define `Commands` enum with all 11 variants: Fetch, List, Show, Search, Schemas, Tags, Aliases, Sync, Doctor, Cache, Diff\n- Create stub modules for each command (src/cli/fetch.rs, list.rs, etc.) with empty `FetchArgs` structs and `pub async fn execute()` signatures returning `Result<(), SwaggerCliError>`\n- Note: ALL execute functions use `async fn` signatures (we use tokio runtime throughout). Fetch, sync, and diff perform actual async I/O; query commands (list, show, search, etc.) 
are async in signature but may not await internally.\n\n**Main (src/main.rs):**\n- Pre-scan argv for `--robot` before clap parsing (handles parse errors in robot JSON)\n- `#[tokio::main] async fn main()` that tries Cli::try_parse_from, routes to command execute, handles errors\n- `output_robot_error()` and `output_human_error()` functions per PRD\n\n**Output (src/output/):**\n- `mod.rs`: Common traits/helpers, `RobotEnvelope` struct with ok, data, meta fields\n- `robot.rs`: Generic `robot_success()` and `robot_error()` functions that build the envelope with schema_version=1, tool_version from CARGO_PKG_VERSION, command name, duration_ms\n- `human.rs`: Stub formatters, TTY detection for color/unicode\n- `table.rs`: Table formatting helpers using `tabled` crate\n\n## Acceptance Criteria\n- [ ] `cargo build` succeeds with the full CLI skeleton\n- [ ] `swagger-cli --help` shows all 11 subcommands\n- [ ] `swagger-cli --version` prints version from Cargo.toml\n- [ ] `swagger-cli nonexistent --robot` outputs JSON error to stderr with code USAGE_ERROR and exits 2\n- [ ] `swagger-cli fetch --robot` (missing required args) outputs JSON error to stderr\n- [ ] RobotEnvelope serializes with ok, data, meta.schema_version, meta.tool_version, meta.command, meta.duration_ms\n- [ ] All command stubs return `Ok(())` (do nothing yet)\n\n## Files\n- CREATE: src/cli/mod.rs (Cli, Commands, pub mods)\n- CREATE: src/cli/fetch.rs, list.rs, show.rs, search.rs, schemas.rs, tags.rs, aliases.rs, sync.rs, doctor.rs, cache.rs, diff.rs (stub args + execute)\n- CREATE: src/output/mod.rs, src/output/robot.rs, src/output/human.rs, src/output/table.rs\n- MODIFY: src/main.rs (full entry point with error handling)\n- MODIFY: src/lib.rs (pub mod cli; pub mod output;)\n\n## TDD Anchor\nRED: Write test `test_robot_error_on_bad_command` using `assert_cmd` -- run `swagger-cli nonexistent --robot`, parse stderr as JSON, assert the JSON contains code=\"USAGE_ERROR\" and the process exits with code 2.\nGREEN: 
Implement main.rs pre-scan and robot error output.\nVERIFY: `cargo test test_robot_error_on_bad_command`\n\n## Edge Cases\n- Pre-scan argv for --robot MUST happen before clap parsing, otherwise clap's error output is plaintext even when agent expects JSON\n- Global --robot flag must be accessible from all subcommands (use `#[arg(long, global = true)]`)\n- duration_ms in robot envelope: use `std::time::Instant::now()` at start, elapsed at output\n- BTreeMap (not HashMap) for any constructed JSON objects to ensure deterministic key ordering\n\n## Dependency Context\nUses SwaggerCliError (exit_code, code, suggestion) from bd-ilo (error types and core data models).","status":"closed","priority":2,"issue_type":"task","created_at":"2026-02-12T16:24:12.604507Z","created_by":"tayloreernisse","updated_at":"2026-02-12T17:41:12.843580Z","closed_at":"2026-02-12T17:41:12.843535Z","close_reason":"CLI skeleton with 11 subcommands, robot/human output, main.rs routing","compaction_level":0,"original_size":0,"labels":["foundation","phase1"],"dependencies":[{"issue_id":"bd-3d2","depends_on_id":"bd-3e0","type":"parent-child","created_at":"2026-02-12T16:24:12.606348Z","created_by":"tayloreernisse"},{"issue_id":"bd-3d2","depends_on_id":"bd-ilo","type":"blocks","created_at":"2026-02-12T16:24:12.607227Z","created_by":"tayloreernisse"}]} {"id":"bd-3e0","title":"Epic: Project Foundation","status":"open","priority":1,"issue_type":"task","created_at":"2026-02-12T16:22:16.954888Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:22:16.956168Z","compaction_level":0,"original_size":0,"labels":["epic"]} -{"id":"bd-3ea","title":"Implement cache read path with integrity validation","description":"## Background\nThe cache read path is used by every query command (list, search, tags, schemas, aliases, doctor). It loads index.json and meta.json, validates their integrity (generation match, hash match, index_version match), and provides the data to commands. 
The `show` command additionally loads raw.json. Coalesced last_accessed updates reduce write amplification for hot-read bursts.\n\n## Approach\nImplement on CacheManager:\n- `load_index(alias) -> Result<(SpecIndex, CacheMetadata)>`: Read meta.json first (commit marker). If missing -> AliasNotFound or CacheIntegrity. Read index.json. Validate: meta.index_version == index.index_version, meta.generation == index.generation, meta.index_hash == sha256(index.json bytes). Mismatch -> CacheIntegrity error. Update last_accessed if stale >10min (best-effort, no lock required).\n- `load_raw(alias, meta: &CacheMetadata) -> Result`: Read raw.json, parse as Value. Validate meta.raw_hash == sha256(raw.json bytes). Return Value.\n- `list_aliases() -> Result>`: Iterate alias directories, read meta.json from each. Skip broken/partial aliases (log warning).\n- `delete_alias(alias) -> Result<()>`: Remove alias directory after acquiring lock.\n- `default_alias() -> Option`: Load config, return default_alias.\n- `alias_exists(alias) -> bool`: Check if meta.json exists in alias dir.\n\nCoalesced last_accessed: When load_index reads meta, check if meta.last_accessed is >10min old. 
If so, update only the last_accessed field in meta.json (best-effort write, no lock, ignore errors).\n\n## Acceptance Criteria\n- [ ] load_index succeeds for a valid cache (all 4 files present, hashes match)\n- [ ] load_index returns CacheIntegrity when generation mismatches\n- [ ] load_index returns CacheIntegrity when index_hash mismatches\n- [ ] load_index returns AliasNotFound when alias directory doesn't exist\n- [ ] load_raw validates raw_hash and returns parsed Value\n- [ ] list_aliases returns metadata for all valid aliases, skips broken ones\n- [ ] delete_alias removes the entire alias directory\n- [ ] last_accessed is updated at most once per 10 minutes\n\n## Files\n- MODIFY: src/core/cache.rs (add load_index, load_raw, list_aliases, delete_alias, default_alias, alias_exists, coalesced last_accessed)\n\n## TDD Anchor\nRED: Write `test_load_index_integrity_check` -- create a valid cache, then tamper with index.json (change a byte). Assert load_index returns CacheIntegrity error.\nGREEN: Implement hash validation in load_index().\nVERIFY: `cargo test test_load_index_integrity_check`\n\nAdditional tests:\n- test_load_index_success\n- test_load_index_missing_meta\n- test_load_raw_validates_hash\n- test_list_aliases_skips_broken\n- test_coalesced_last_accessed (verify not written within 10min window)\n\n## Edge Cases\n- list_aliases must not panic if an alias directory has no meta.json -- skip with warning\n- Coalesced write is best-effort: if it fails (permissions, concurrent write), silently ignore\n- load_raw is only called by show and schemas --show -- never by list/search/tags\n- Empty cache directory (no aliases) should return empty Vec, not error\n\n## Dependency Context\nUses CacheManager and write_cache() from bd-1ie (cache write path). 
Uses CacheMetadata, SpecIndex, and SwaggerCliError from bd-ilo (error types and core data models).","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-12T16:25:15.526245Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:55:44.766573Z","compaction_level":0,"original_size":0,"labels":["infrastructure","phase1"],"dependencies":[{"issue_id":"bd-3ea","depends_on_id":"bd-1ie","type":"blocks","created_at":"2026-02-12T16:34:05.956814Z","created_by":"tayloreernisse"},{"issue_id":"bd-3ea","depends_on_id":"bd-hcb","type":"parent-child","created_at":"2026-02-12T16:25:15.528127Z","created_by":"tayloreernisse"},{"issue_id":"bd-3ea","depends_on_id":"bd-ilo","type":"blocks","created_at":"2026-02-12T16:25:15.528786Z","created_by":"tayloreernisse"}]} +{"id":"bd-3ea","title":"Implement cache read path with integrity validation","description":"## Background\nThe cache read path is used by every query command (list, search, tags, schemas, aliases, doctor). It loads index.json and meta.json, validates their integrity (generation match, hash match, index_version match), and provides the data to commands. The `show` command additionally loads raw.json. Coalesced last_accessed updates reduce write amplification for hot-read bursts.\n\n## Approach\nImplement on CacheManager:\n- `load_index(alias) -> Result<(SpecIndex, CacheMetadata)>`: Read meta.json first (commit marker). If missing -> AliasNotFound or CacheIntegrity. Read index.json. Validate: meta.index_version == index.index_version, meta.generation == index.generation, meta.index_hash == sha256(index.json bytes). Mismatch -> CacheIntegrity error. Update last_accessed if stale >10min (best-effort, no lock required).\n- `load_raw(alias, meta: &CacheMetadata) -> Result`: Read raw.json, parse as Value. Validate meta.raw_hash == sha256(raw.json bytes). Return Value.\n- `list_aliases() -> Result>`: Iterate alias directories, read meta.json from each. 
Skip broken/partial aliases (log warning).\n- `delete_alias(alias) -> Result<()>`: Remove alias directory after acquiring lock.\n- `default_alias() -> Option`: Load config, return default_alias.\n- `alias_exists(alias) -> bool`: Check if meta.json exists in alias dir.\n\nCoalesced last_accessed: When load_index reads meta, check if meta.last_accessed is >10min old. If so, update only the last_accessed field in meta.json (best-effort write, no lock, ignore errors).\n\n## Acceptance Criteria\n- [ ] load_index succeeds for a valid cache (all 4 files present, hashes match)\n- [ ] load_index returns CacheIntegrity when generation mismatches\n- [ ] load_index returns CacheIntegrity when index_hash mismatches\n- [ ] load_index returns AliasNotFound when alias directory doesn't exist\n- [ ] load_raw validates raw_hash and returns parsed Value\n- [ ] list_aliases returns metadata for all valid aliases, skips broken ones\n- [ ] delete_alias removes the entire alias directory\n- [ ] last_accessed is updated at most once per 10 minutes\n\n## Files\n- MODIFY: src/core/cache.rs (add load_index, load_raw, list_aliases, delete_alias, default_alias, alias_exists, coalesced last_accessed)\n\n## TDD Anchor\nRED: Write `test_load_index_integrity_check` -- create a valid cache, then tamper with index.json (change a byte). 
Assert load_index returns CacheIntegrity error.\nGREEN: Implement hash validation in load_index().\nVERIFY: `cargo test test_load_index_integrity_check`\n\nAdditional tests:\n- test_load_index_success\n- test_load_index_missing_meta\n- test_load_raw_validates_hash\n- test_list_aliases_skips_broken\n- test_coalesced_last_accessed (verify not written within 10min window)\n\n## Edge Cases\n- list_aliases must not panic if an alias directory has no meta.json -- skip with warning\n- Coalesced write is best-effort: if it fails (permissions, concurrent write), silently ignore\n- load_raw is only called by show and schemas --show -- never by list/search/tags\n- Empty cache directory (no aliases) should return empty Vec, not error\n\n## Dependency Context\nUses CacheManager and write_cache() from bd-1ie (cache write path). Uses CacheMetadata, SpecIndex, and SwaggerCliError from bd-ilo (error types and core data models).","status":"closed","priority":2,"issue_type":"task","created_at":"2026-02-12T16:25:15.526245Z","created_by":"tayloreernisse","updated_at":"2026-02-12T17:46:04.150028Z","closed_at":"2026-02-12T17:46:04.149980Z","close_reason":"Cache read with integrity validation implemented - load_index, load_raw, list_aliases, alias_exists, delete_alias with hash verification","compaction_level":0,"original_size":0,"labels":["infrastructure","phase1"],"dependencies":[{"issue_id":"bd-3ea","depends_on_id":"bd-1ie","type":"blocks","created_at":"2026-02-12T16:34:05.956814Z","created_by":"tayloreernisse"},{"issue_id":"bd-3ea","depends_on_id":"bd-hcb","type":"parent-child","created_at":"2026-02-12T16:25:15.528127Z","created_by":"tayloreernisse"},{"issue_id":"bd-3ea","depends_on_id":"bd-ilo","type":"blocks","created_at":"2026-02-12T16:25:15.528786Z","created_by":"tayloreernisse"}]} {"id":"bd-3f4","title":"Implement sync command with change detection and index-based diffs","description":"## Background\nThe sync command checks if a remote spec has changed and re-fetches if needed. 
Change detection uses ETag/Last-Modified headers and content hash comparison. Index-based diffs compute added/removed/modified endpoints and schemas by comparing old vs new indexes. This is async (network calls).\n\n## Approach\nImplement src/cli/sync.rs with SyncArgs and async execute():\n\n**Single alias sync flow:**\n1. Load existing meta (get ETag, Last-Modified, content_hash)\n2. Fetch remote spec with conditional headers (If-None-Match, If-Modified-Since)\n3. If 304 Not Modified → report \"no changes\"\n4. If 200: compute new content_hash, compare with stored\n5. If hash matches → report \"no changes\" (content identical despite no 304)\n6. If changed: normalize, build new index, compare old vs new index\n7. Compute diff: added/removed/modified endpoints and schemas (compare by path+method / name)\n8. If not --dry-run: write new cache\n9. Output change summary\n\n**SyncArgs:** alias (Option), all (bool), dry_run (bool), force (bool — re-fetch regardless), details (bool — include change lists), jobs (usize, default 4), per_host (usize, default 2), resume (bool), max_failures (Option).\n\n**--details output:** Capped at 200 items per category. Include truncated:bool flag.\n\n## Acceptance Criteria\n- [ ] Sync detects \"no changes\" via 304 response\n- [ ] Sync detects changes via content hash mismatch\n- [ ] --dry-run checks without writing\n- [ ] --force re-fetches regardless\n- [ ] --details includes added/removed/modified endpoint/schema lists (capped at 200)\n- [ ] Robot output: changed (bool), local_version, remote_version, changes.endpoints/schemas counts\n- [ ] --force with --dry-run: fetches and computes diff but doesn't write\n\n## Edge Cases\n- **304 Not Modified but content actually changed:** Server returns 304 incorrectly. 
Content hash comparison catches this (hash mismatch despite 304 = treat as changed).\n- **Huge index diff:** If thousands of endpoints changed, --details output must respect the 200-item cap with truncated:true flag.\n- **Server returns different Content-Type:** e.g., returns HTML error page instead of JSON. Format detection catches this — invalid spec error.\n- **ETag/Last-Modified missing on first fetch:** meta.etag and meta.last_modified will be None. Sync without conditional headers — always downloads full content.\n- **--force with --dry-run:** Fetch the content but don't write. Report what would change. This combo must be supported.\n\n## Files\n- MODIFY: src/cli/sync.rs (SyncArgs, execute, single_alias_sync, compute_index_diff)\n- MODIFY: src/output/robot.rs (add output_sync)\n- MODIFY: src/output/human.rs (add output_sync)\n\n## TDD Anchor\nRED: Write `test_sync_no_changes` — fetch petstore fixture locally, run sync --robot (mock server returns same content), assert changed==false.\nGREEN: Implement hash-based change detection.\nVERIFY: `cargo test test_sync_no_changes`\n\n## Dependency Context\nUses AsyncHttpClient from bd-3b6 for fetching. Uses indexer from bd-189 for rebuilding. Uses CacheManager from bd-3ea/bd-1ie for read/write. 
Requires a fetched spec in cache.","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-12T16:28:47.430949Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:58:08.755348Z","compaction_level":0,"original_size":0,"labels":["phase2","sync"],"dependencies":[{"issue_id":"bd-3f4","depends_on_id":"bd-161","type":"parent-child","created_at":"2026-02-12T16:28:47.432443Z","created_by":"tayloreernisse"},{"issue_id":"bd-3f4","depends_on_id":"bd-16o","type":"blocks","created_at":"2026-02-12T16:28:47.432950Z","created_by":"tayloreernisse"},{"issue_id":"bd-3f4","depends_on_id":"bd-189","type":"blocks","created_at":"2026-02-12T16:28:47.433412Z","created_by":"tayloreernisse"}]} {"id":"bd-3km","title":"Implement list command with index-backed filtering and sorting","description":"## Background\nThe list command is the most-used query command. It loads only index.json (never raw.json) and filters/sorts endpoints by method, tag, and path regex. This is the command that must hit <50ms for cached queries, even on large specs like Stripe (312 endpoints) and GitHub (800+ endpoints).\n\n## Approach\nImplement src/cli/list.rs with ListArgs and execute():\n\n**ListArgs:** alias (Option), method (Option, value_parser GET/POST/PUT/DELETE/PATCH), tag (Option), path (Option — regex), sort (String, default \"path\", values: path/method/tag), limit (usize, default 50), all (bool — show all, no limit), all_aliases (bool — cross-alias search, Phase 2 bead).\n\n**Execute flow:**\n1. Resolve alias (use default if not specified)\n2. CacheManager::load_index(alias) — loads index.json + meta.json only\n3. Build regex from --path (fail fast with USAGE_ERROR on invalid regex)\n4. Filter endpoints: method match (case-insensitive), tag contains, path regex match\n5. Sort using canonical method ranking: GET=0, POST=1, PUT=2, PATCH=3, DELETE=4, OPTIONS=5, HEAD=6, TRACE=7\n6. Apply limit (unless --all)\n7. 
Output robot JSON or human table\n\n**Robot output format:** Per PRD — ok:true, data.endpoints[], data.total, data.filtered, data.applied_filters, meta with alias/spec_version/cached_at/duration_ms.\n\n**Human output:** Tabular format with METHOD PATH SUMMARY, header with API title and total count, footer with \"Showing X of Y endpoints (filtered)\".\n\n## Acceptance Criteria\n- [ ] `swagger-cli list petstore --robot` returns all endpoints as JSON\n- [ ] --method POST filters to POST-only endpoints\n- [ ] --tag pet filters to pet-tagged endpoints\n- [ ] --path \"store.*\" filters by regex\n- [ ] Invalid regex returns USAGE_ERROR (exit 2)\n- [ ] Combined filters work (--method POST --tag pet)\n- [ ] Default limit is 50; --all shows all\n- [ ] Endpoints sorted by path ASC, method_rank ASC by default\n- [ ] --sort method sorts by method rank first\n- [ ] No alias + no default → CONFIG_ERROR with suggestion\n- [ ] Command completes in <50ms for 500+ endpoint index\n\n## Files\n- MODIFY: src/cli/list.rs (ListArgs, execute, filter_endpoints, sort_endpoints, method_rank)\n- MODIFY: src/output/robot.rs (add output_list)\n- MODIFY: src/output/human.rs (add output_list)\n\n## TDD Anchor\nRED: Write `test_list_filter_by_method` — set up test cache with petstore, run list --method POST --robot, parse JSON, assert all endpoints have method==POST.\nGREEN: Implement filter_endpoints with method check.\nVERIFY: `cargo test test_list_filter_by_method`\n\n## Edge Cases\n- Empty result set (no matches) should return ok:true with empty endpoints array, not error\n- Path regex with special chars (e.g., `/pet/{petId}` — the braces are regex special) — users must escape\n- Method comparison must be case-insensitive (user passes \"post\", spec has \"POST\")\n\n## Dependency Context\nUses CacheManager.load_index from bd-3ea (cache read). Uses SpecIndex/IndexedEndpoint types from bd-ilo. Uses CLI skeleton from bd-3d2. 
Requires a fetched spec in cache to work (tested via test helper that calls fetch with local fixture).","status":"open","priority":2,"issue_type":"task","created_at":"2026-02-12T16:27:27.054531Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:27:27.058996Z","compaction_level":0,"original_size":0,"labels":["phase1","query"],"dependencies":[{"issue_id":"bd-3km","depends_on_id":"bd-3d2","type":"blocks","created_at":"2026-02-12T16:27:27.058985Z","created_by":"tayloreernisse"},{"issue_id":"bd-3km","depends_on_id":"bd-3ea","type":"blocks","created_at":"2026-02-12T16:27:27.058492Z","created_by":"tayloreernisse"},{"issue_id":"bd-3km","depends_on_id":"bd-epk","type":"parent-child","created_at":"2026-02-12T16:27:27.057878Z","created_by":"tayloreernisse"}]} {"id":"bd-3ll","title":"Epic: Global Features","status":"open","priority":1,"issue_type":"task","created_at":"2026-02-12T16:22:25.182608Z","created_by":"tayloreernisse","updated_at":"2026-02-12T16:22:25.183512Z","compaction_level":0,"original_size":0,"labels":["epic"]} diff --git a/Cargo.lock b/Cargo.lock index 09d632f..a8fa0fc 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -522,6 +522,23 @@ version = "0.3.31" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "05f29059c0c2090612e8d742178b0580d2dc940c837851ad723096f87af6663e" +[[package]] +name = "futures-io" +version = "0.3.31" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9e5c1b78ca4aae1ac06c48a526a655760685149f0d465d21f37abfe57ce075c6" + +[[package]] +name = "futures-macro" +version = "0.3.31" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "162ee34ebcb7c64a8abebc059ce0fee27c2262618d7b60ed8faf72fef13c3650" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.115", +] + [[package]] name = "futures-sink" version = "0.3.31" @@ -541,7 +558,11 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"9fa08315bb612088cc391249efdc3bc77536f16c91f6cf495e6fbe85b20a4a81" dependencies = [ "futures-core", + "futures-io", + "futures-macro", + "futures-sink", "futures-task", + "memchr", "pin-project-lite", "pin-utils", "slab", @@ -1452,6 +1473,7 @@ dependencies = [ "base64", "bytes", "futures-core", + "futures-util", "http", "http-body", "http-body-util", @@ -1471,12 +1493,14 @@ dependencies = [ "sync_wrapper", "tokio", "tokio-rustls", + "tokio-util", "tower", "tower-http", "tower-service", "url", "wasm-bindgen", "wasm-bindgen-futures", + "wasm-streams", "web-sys", "webpki-roots", ] @@ -2259,6 +2283,19 @@ dependencies = [ "wasmparser", ] +[[package]] +name = "wasm-streams" +version = "0.4.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "15053d8d85c7eccdbefef60f06769760a563c7f0a9d6902a13d35c7800b0ad65" +dependencies = [ + "futures-util", + "js-sys", + "wasm-bindgen", + "wasm-bindgen-futures", + "web-sys", +] + [[package]] name = "wasmparser" version = "0.244.0" diff --git a/Cargo.toml b/Cargo.toml index bc1d852..7b3a3ac 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -11,7 +11,7 @@ colored = "3" directories = "6" fs2 = "0.4" regex = "1" -reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] } +reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls", "stream"] } serde = { version = "1", features = ["derive"] } serde_json = "1" serde_yaml = "0.9" diff --git a/src/core/cache.rs b/src/core/cache.rs index 07c3d00..ed886b1 100644 --- a/src/core/cache.rs +++ b/src/core/cache.rs @@ -229,6 +229,176 @@ impl CacheManager { Ok(meta) } + + /// Load a cached spec index with integrity validation. + /// + /// Reads `meta.json` first (as commit marker), then `index.json`. + /// Validates that index_version, generation, and index_hash all match + /// between meta and the on-disk index. Returns `AliasNotFound` if + /// meta.json is missing, `CacheIntegrity` on any mismatch. 
+ pub fn load_index( + &self, + alias: &str, + ) -> Result<(SpecIndex, CacheMetadata), SwaggerCliError> { + validate_alias(alias)?; + let dir = self.alias_dir(alias); + + let meta_path = dir.join("meta.json"); + let meta_bytes = fs::read(&meta_path).map_err(|e| { + if e.kind() == std::io::ErrorKind::NotFound { + SwaggerCliError::AliasNotFound(alias.to_string()) + } else { + SwaggerCliError::Cache(format!( + "Failed to read {}: {e}", + meta_path.display() + )) + } + })?; + let meta: CacheMetadata = serde_json::from_slice(&meta_bytes).map_err(|e| { + SwaggerCliError::CacheIntegrity(format!( + "Corrupt meta.json for alias '{alias}': {e}" + )) + })?; + + let index_path = dir.join("index.json"); + let index_bytes = fs::read(&index_path).map_err(|e| { + SwaggerCliError::Cache(format!( + "Failed to read {}: {e}", + index_path.display() + )) + })?; + + let actual_hash = compute_hash(&index_bytes); + if meta.index_hash != actual_hash { + return Err(SwaggerCliError::CacheIntegrity(format!( + "Index hash mismatch for '{alias}': expected {}, got {actual_hash}", + meta.index_hash + ))); + } + + let index: SpecIndex = serde_json::from_slice(&index_bytes).map_err(|e| { + SwaggerCliError::CacheIntegrity(format!( + "Corrupt index.json for alias '{alias}': {e}" + )) + })?; + + if meta.index_version != index.index_version { + return Err(SwaggerCliError::CacheIntegrity(format!( + "index_version mismatch for '{alias}': meta={}, index={}", + meta.index_version, index.index_version + ))); + } + + if meta.generation != index.generation { + return Err(SwaggerCliError::CacheIntegrity(format!( + "generation mismatch for '{alias}': meta={}, index={}", + meta.generation, index.generation + ))); + } + + // Best-effort coalesced last_accessed update (no lock, ignore errors) + let age = Utc::now() - meta.last_accessed; + if age.num_minutes() > 10 { + let mut updated_meta = meta.clone(); + updated_meta.last_accessed = Utc::now(); + if let Ok(bytes) = serde_json::to_vec_pretty(&updated_meta) { + 
let _ = write_atomic(&meta_path, &bytes); + } + } + + Ok((index, meta)) + } + + /// Load the raw spec JSON with hash validation against metadata. + pub fn load_raw( + &self, + alias: &str, + meta: &CacheMetadata, + ) -> Result<serde_json::Value, SwaggerCliError> { + let raw_path = self.alias_dir(alias).join("raw.json"); + let raw_bytes = fs::read(&raw_path).map_err(|e| { + SwaggerCliError::Cache(format!( + "Failed to read {}: {e}", + raw_path.display() + )) + })?; + + let actual_hash = compute_hash(&raw_bytes); + if meta.raw_hash != actual_hash { + return Err(SwaggerCliError::CacheIntegrity(format!( + "Raw hash mismatch for '{}': expected {}, got {actual_hash}", + alias, meta.raw_hash + ))); + } + + let value: serde_json::Value = + serde_json::from_slice(&raw_bytes).map_err(|e| { + SwaggerCliError::Cache(format!( + "Failed to parse raw.json for '{}': {e}", + alias + )) + })?; + + Ok(value) + } + + /// List all cached aliases by reading meta.json from each subdirectory. + /// + /// Skips directories with missing or unreadable metadata (no panic). + pub fn list_aliases(&self) -> Result<Vec<CacheMetadata>, SwaggerCliError> { + let entries = fs::read_dir(&self.cache_dir).map_err(|e| { + SwaggerCliError::Cache(format!( + "Failed to read cache directory {}: {e}", + self.cache_dir.display() + )) + })?; + + let mut results = Vec::new(); + for entry in entries { + let entry = match entry { + Ok(e) => e, + Err(_) => continue, + }; + let path = entry.path(); + if !path.is_dir() { + continue; + } + let meta_path = path.join("meta.json"); + let bytes = match fs::read(&meta_path) { + Ok(b) => b, + Err(_) => continue, + }; + if let Ok(meta) = serde_json::from_slice::<CacheMetadata>(&bytes) { + results.push(meta); + } + } + + Ok(results) + } + + /// Check whether a cached alias exists (meta.json present). + pub fn alias_exists(&self, alias: &str) -> bool { + self.alias_dir(alias).join("meta.json").exists() + } + + /// Delete a cached alias directory (requires lock).
+ pub fn delete_alias(&self, alias: &str) -> Result<(), SwaggerCliError> { + validate_alias(alias)?; + let dir = self.alias_dir(alias); + if !dir.exists() { + return Err(SwaggerCliError::AliasNotFound(alias.to_string())); + } + + let _lock = self.acquire_lock(alias)?; + fs::remove_dir_all(&dir).map_err(|e| { + SwaggerCliError::Cache(format!( + "Failed to delete alias directory {}: {e}", + dir.display() + )) + })?; + + Ok(()) + } } /// Write `data` to `path.tmp`, fsync, then rename to `path`. @@ -433,4 +603,124 @@ mod tests { assert_eq!(meta.source_format, "yaml"); assert!(meta.content_hash.starts_with("sha256:")); } + + /// Helper: write a cache entry and return the manager + tempdir for further testing. + fn write_test_cache(alias: &str) -> (CacheManager, tempfile::TempDir, CacheMetadata) { + let tmp = tempfile::tempdir().unwrap(); + let manager = CacheManager::new(tmp.path().to_path_buf()); + let index = make_test_index(); + + let meta = manager + .write_cache( + alias, + b"openapi: 3.0.3", + b"{\"openapi\":\"3.0.3\"}", + &index, + Some("https://example.com/api.json".into()), + "1.0.0", + "Test API", + "yaml", + None, + None, + None, + ) + .unwrap(); + + (manager, tmp, meta) + } + + #[test] + fn test_load_index_success() { + let (manager, _tmp, written_meta) = write_test_cache("loadtest"); + + let (index, loaded_meta) = manager.load_index("loadtest").unwrap(); + assert_eq!(loaded_meta.alias, "loadtest"); + assert_eq!(loaded_meta.generation, written_meta.generation); + assert_eq!(loaded_meta.index_hash, written_meta.index_hash); + assert_eq!(index.index_version, 1); + assert_eq!(index.openapi, "3.0.3"); + } + + #[test] + fn test_load_index_integrity_check() { + let (manager, tmp, _meta) = write_test_cache("tampered"); + + // Tamper with index.json + let index_path = tmp.path().join("tampered").join("index.json"); + fs::write(&index_path, b"{\"corrupted\": true}").unwrap(); + + let result = manager.load_index("tampered"); + assert!(result.is_err()); + let err = 
result.unwrap_err(); + assert!( + matches!(err, SwaggerCliError::CacheIntegrity(_)), + "Expected CacheIntegrity, got: {err:?}" + ); + } + + #[test] + fn test_load_index_missing_meta() { + let tmp = tempfile::tempdir().unwrap(); + let manager = CacheManager::new(tmp.path().to_path_buf()); + + let result = manager.load_index("nonexistent"); + assert!(result.is_err()); + let err = result.unwrap_err(); + assert!( + matches!(err, SwaggerCliError::AliasNotFound(_)), + "Expected AliasNotFound, got: {err:?}" + ); + } + + #[test] + fn test_load_raw_validates_hash() { + let (manager, tmp, meta) = write_test_cache("rawtest"); + + // Tamper with raw.json + let raw_path = tmp.path().join("rawtest").join("raw.json"); + fs::write(&raw_path, b"{\"tampered\": true}").unwrap(); + + let result = manager.load_raw("rawtest", &meta); + assert!(result.is_err()); + let err = result.unwrap_err(); + assert!( + matches!(err, SwaggerCliError::CacheIntegrity(_)), + "Expected CacheIntegrity, got: {err:?}" + ); + } + + #[test] + fn test_list_aliases() { + let tmp = tempfile::tempdir().unwrap(); + let manager = CacheManager::new(tmp.path().to_path_buf()); + let index = make_test_index(); + + manager + .write_cache( + "api1", b"src1", b"{}", &index, None, "1.0", "API 1", "json", + None, None, None, + ) + .unwrap(); + manager + .write_cache( + "api2", b"src2", b"{}", &index, None, "2.0", "API 2", "yaml", + None, None, None, + ) + .unwrap(); + + let aliases = manager.list_aliases().unwrap(); + assert_eq!(aliases.len(), 2); + + let names: Vec<&str> = aliases.iter().map(|m| m.alias.as_str()).collect(); + assert!(names.contains(&"api1")); + assert!(names.contains(&"api2")); + } + + #[test] + fn test_alias_exists() { + let (manager, _tmp, _meta) = write_test_cache("exists"); + + assert!(manager.alias_exists("exists")); + assert!(!manager.alias_exists("nope")); + } } diff --git a/src/core/http.rs b/src/core/http.rs new file mode 100644 index 0000000..ac6f2b6 --- /dev/null +++ b/src/core/http.rs @@ -0,0 
+1,512 @@ +use std::net::IpAddr; +use std::time::Duration; + +use reqwest::{StatusCode, Url}; +use tokio::net::lookup_host; + +use crate::errors::SwaggerCliError; + +const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(5); +const DEFAULT_OVERALL_TIMEOUT: Duration = Duration::from_secs(10); +const DEFAULT_MAX_BYTES: u64 = 25 * 1024 * 1024; // 25 MB +const DEFAULT_MAX_RETRIES: u32 = 2; +const RETRY_BASE_DELAY: Duration = Duration::from_millis(500); + +// --------------------------------------------------------------------------- +// SSRF protection +// --------------------------------------------------------------------------- + +fn is_ip_blocked(ip: &IpAddr) -> bool { + match ip { + IpAddr::V4(v4) => { + v4.is_loopback() // 127.0.0.0/8 + || v4.is_link_local() // 169.254.0.0/16 + || v4.is_broadcast() // 255.255.255.255 + || v4.is_unspecified() // 0.0.0.0 + || v4.is_multicast() // 224.0.0.0/4 + || is_private_v4(v4) + } + IpAddr::V6(v6) => { + v6.is_loopback() // ::1 + || v6.is_unspecified() // :: + || v6.is_multicast() // ff00::/8 + || is_link_local_v6(v6) // fe80::/10 + || is_blocked_mapped_v4(v6) + } + } +} + +fn is_private_v4(ip: &std::net::Ipv4Addr) -> bool { + let octets = ip.octets(); + // 10.0.0.0/8 + octets[0] == 10 + // 172.16.0.0/12 + || (octets[0] == 172 && (16..=31).contains(&octets[1])) + // 192.168.0.0/16 + || (octets[0] == 192 && octets[1] == 168) +} + +fn is_link_local_v6(ip: &std::net::Ipv6Addr) -> bool { + let segments = ip.segments(); + // fe80::/10 — first 10 bits are 1111_1110_10 + (segments[0] & 0xffc0) == 0xfe80 +} + +fn is_blocked_mapped_v4(v6: &std::net::Ipv6Addr) -> bool { + // ::ffff:x.x.x.x — IPv4-mapped IPv6 + let segments = v6.segments(); + if segments[0..5] == [0, 0, 0, 0, 0] && segments[5] == 0xffff { + let v4 = std::net::Ipv4Addr::new( + (segments[6] >> 8) as u8, + segments[6] as u8, + (segments[7] >> 8) as u8, + segments[7] as u8, + ); + return is_ip_blocked(&IpAddr::V4(v4)); + } + false +} + +// 
--------------------------------------------------------------------------- +// URL validation +// --------------------------------------------------------------------------- + +fn validate_url(url: &str, allow_insecure_http: bool) -> Result<Url, SwaggerCliError> { + let parsed = Url::parse(url).map_err(|e| { + SwaggerCliError::InvalidSpec(format!("invalid URL '{url}': {e}")) + })?; + + match parsed.scheme() { + "https" => Ok(parsed), + "http" if allow_insecure_http => Ok(parsed), + "http" => Err(SwaggerCliError::PolicyBlocked( + format!("HTTP is not allowed for '{url}'. Use --allow-insecure-http to override."), + )), + other => Err(SwaggerCliError::InvalidSpec( + format!("unsupported scheme '{other}' in URL '{url}'"), + )), + } +} + +// --------------------------------------------------------------------------- +// DNS resolution + SSRF check +// --------------------------------------------------------------------------- + +async fn resolve_and_check( + host: &str, + port: u16, + allowed_private_hosts: &[String], +) -> Result<(), SwaggerCliError> { + if allowed_private_hosts.iter().any(|h| h == host) { + return Ok(()); + } + + let addr = format!("{host}:{port}"); + let addrs: Vec<_> = match lookup_host(&addr).await { + Ok(iter) => iter.collect(), + Err(e) => { + return Err(SwaggerCliError::InvalidSpec( + format!("DNS resolution failed for '{host}': {e}"), + )); + } + }; + + if addrs.is_empty() { + return Err(SwaggerCliError::InvalidSpec( + format!("DNS resolution returned no addresses for '{host}'"), + )); + } + + for socket_addr in &addrs { + if is_ip_blocked(&socket_addr.ip()) { + return Err(SwaggerCliError::PolicyBlocked(format!( + "resolved IP {} for host '{host}' is in a blocked range. 
\ + Use --allow-private-host {host} to override.", + socket_addr.ip() + ))); + } + } + + Ok(()) +} + +// --------------------------------------------------------------------------- +// FetchResult +// --------------------------------------------------------------------------- + +#[derive(Debug, Clone)] +pub struct FetchResult { + pub bytes: Vec<u8>, + pub content_type: Option<String>, + pub etag: Option<String>, + pub last_modified: Option<String>, +} + +// --------------------------------------------------------------------------- +// AsyncHttpClient builder +// --------------------------------------------------------------------------- + +pub struct AsyncHttpClient { + connect_timeout: Duration, + overall_timeout: Duration, + max_bytes: u64, + max_retries: u32, + allow_insecure_http: bool, + allowed_private_hosts: Vec<String>, + auth_headers: Vec<(String, String)>, +} + +impl Default for AsyncHttpClient { + fn default() -> Self { + Self { + connect_timeout: DEFAULT_CONNECT_TIMEOUT, + overall_timeout: DEFAULT_OVERALL_TIMEOUT, + max_bytes: DEFAULT_MAX_BYTES, + max_retries: DEFAULT_MAX_RETRIES, + allow_insecure_http: false, + allowed_private_hosts: Vec::new(), + auth_headers: Vec::new(), + } + } +} + +impl AsyncHttpClient { + pub fn builder() -> AsyncHttpClientBuilder { + AsyncHttpClientBuilder::default() + } + + pub async fn fetch_spec(&self, url: &str) -> Result<FetchResult, SwaggerCliError> { + let parsed = validate_url(url, self.allow_insecure_http)?; + + let host = parsed.host_str().ok_or_else(|| { + SwaggerCliError::InvalidSpec(format!("URL '{url}' has no host")) + })?; + let port = parsed.port_or_known_default().unwrap_or(443); + + resolve_and_check(host, port, &self.allowed_private_hosts).await?; + + let client = self.build_reqwest_client()?; + + let mut attempts = 0u32; + loop { + let mut request = client.get(parsed.clone()); + for (name, value) in &self.auth_headers { + request = request.header(name.as_str(), value.as_str()); + } + + let response = request.send().await.map_err(SwaggerCliError::Network)?; + let status =
response.status(); + + match status { + s if s.is_success() => { + return self.read_response(response).await; + } + StatusCode::UNAUTHORIZED | StatusCode::FORBIDDEN => { + return Err(SwaggerCliError::Auth(format!( + "server returned {status} for '{url}'" + ))); + } + StatusCode::NOT_FOUND => { + return Err(SwaggerCliError::InvalidSpec(format!( + "spec not found at '{url}' (404)" + ))); + } + s if s == StatusCode::TOO_MANY_REQUESTS || s.is_server_error() => { + attempts += 1; + if attempts > self.max_retries { + // Retries exhausted: surface the final 429/5xx response as + // the error without re-sending the request. Status is 429 + // or 5xx here, so error_for_status() is guaranteed Err. + return Err(SwaggerCliError::Network( + response.error_for_status().unwrap_err(), + )); + } + let delay = self.retry_delay(&response, attempts); + tokio::time::sleep(delay).await; + } + _ => { + return Err(SwaggerCliError::InvalidSpec(format!( + "unexpected status {status} fetching '{url}'" + ))); + } + } + } + } + + fn build_reqwest_client(&self) -> Result<reqwest::Client, SwaggerCliError> { + reqwest::Client::builder() + .connect_timeout(self.connect_timeout) + .timeout(self.overall_timeout) + .https_only(!self.allow_insecure_http) + .build() + .map_err(SwaggerCliError::Network) + } + + async fn read_response( + &self, + response: reqwest::Response, + ) -> Result<FetchResult, SwaggerCliError> { + let content_type = response + .headers() + .get(reqwest::header::CONTENT_TYPE) + .and_then(|v| v.to_str().ok()) + .map(String::from); + + let etag = response + .headers() + .get(reqwest::header::ETAG) + .and_then(|v| v.to_str().ok()) + .map(String::from); + + let last_modified = response + .headers() + .get(reqwest::header::LAST_MODIFIED) + .and_then(|v| v.to_str().ok()) + .map(String::from); + + // Stream the body with a size limit + let mut bytes = Vec::new(); + let mut stream = response; + while let Some(chunk) = stream.chunk().await.map_err(SwaggerCliError::Network)?
{ + bytes.extend_from_slice(&chunk); + if bytes.len() as u64 > self.max_bytes { + return Err(SwaggerCliError::PolicyBlocked(format!( + "response exceeds maximum size of {} bytes", + self.max_bytes + ))); + } + } + + Ok(FetchResult { + bytes, + content_type, + etag, + last_modified, + }) + } + + fn retry_delay(&self, _response: &reqwest::Response, attempt: u32) -> Duration { + // TODO: parse Retry-After header when present + RETRY_BASE_DELAY * 2u32.saturating_pow(attempt.saturating_sub(1)) + } +} + +// --------------------------------------------------------------------------- +// Builder +// --------------------------------------------------------------------------- + +#[derive(Default)] +pub struct AsyncHttpClientBuilder { + connect_timeout: Option<Duration>, + overall_timeout: Option<Duration>, + max_bytes: Option<u64>, + max_retries: Option<u32>, + allow_insecure_http: bool, + allowed_private_hosts: Vec<String>, + auth_headers: Vec<(String, String)>, +} + +impl AsyncHttpClientBuilder { + pub fn connect_timeout(mut self, d: Duration) -> Self { + self.connect_timeout = Some(d); + self + } + + pub fn overall_timeout(mut self, d: Duration) -> Self { + self.overall_timeout = Some(d); + self + } + + pub fn max_bytes(mut self, n: u64) -> Self { + self.max_bytes = Some(n); + self + } + + pub fn max_retries(mut self, n: u32) -> Self { + self.max_retries = Some(n); + self + } + + pub fn allow_insecure_http(mut self, allow: bool) -> Self { + self.allow_insecure_http = allow; + self + } + + pub fn allowed_private_hosts(mut self, hosts: Vec<String>) -> Self { + self.allowed_private_hosts = hosts; + self + } + + pub fn auth_header(mut self, name: String, value: String) -> Self { + self.auth_headers.push((name, value)); + self + } + + pub fn build(self) -> AsyncHttpClient { + AsyncHttpClient { + connect_timeout: self.connect_timeout.unwrap_or(DEFAULT_CONNECT_TIMEOUT), + overall_timeout: self.overall_timeout.unwrap_or(DEFAULT_OVERALL_TIMEOUT), + max_bytes: self.max_bytes.unwrap_or(DEFAULT_MAX_BYTES), + max_retries:
self.max_retries.unwrap_or(DEFAULT_MAX_RETRIES), + allow_insecure_http: self.allow_insecure_http, + allowed_private_hosts: self.allowed_private_hosts, + auth_headers: self.auth_headers, + } + } +} + +// --------------------------------------------------------------------------- +// Tests +// --------------------------------------------------------------------------- + +#[cfg(test)] +mod tests { + use super::*; + use std::net::{Ipv4Addr, Ipv6Addr}; + + // -- SSRF IP blocking --------------------------------------------------- + + #[test] + fn test_ssrf_blocks_loopback() { + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(127, 0, 0, 1)))); + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(127, 255, 255, 254)))); + assert!(is_ip_blocked(&IpAddr::V6(Ipv6Addr::LOCALHOST))); + } + + #[test] + fn test_ssrf_blocks_private() { + // 10.0.0.0/8 + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(10, 0, 0, 1)))); + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(10, 255, 255, 255)))); + + // 172.16.0.0/12 + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(172, 16, 0, 1)))); + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(172, 31, 255, 255)))); + + // 192.168.0.0/16 + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(192, 168, 1, 1)))); + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(192, 168, 0, 1)))); + } + + #[test] + fn test_ssrf_blocks_link_local() { + // IPv4 link-local (169.254.x.x) -- includes the AWS metadata endpoint + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(169, 254, 169, 254)))); + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(169, 254, 0, 1)))); + + // IPv6 link-local (fe80::/10) + assert!(is_ip_blocked(&IpAddr::V6(Ipv6Addr::new( + 0xfe80, 0, 0, 0, 0, 0, 0, 1 + )))); + } + + #[test] + fn test_ssrf_blocks_multicast() { + assert!(is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(224, 0, 0, 1)))); + assert!(is_ip_blocked(&IpAddr::V6(Ipv6Addr::new( + 0xff02, 0, 0, 0, 0, 0, 0, 1 + )))); + } + + #[test] + fn test_ssrf_blocks_mapped_v4() { + // ::ffff:127.0.0.1 + let 
mapped = Ipv6Addr::new(0, 0, 0, 0, 0, 0xffff, 0x7f00, 0x0001); + assert!(is_ip_blocked(&IpAddr::V6(mapped))); + + // ::ffff:10.0.0.1 + let mapped_private = Ipv6Addr::new(0, 0, 0, 0, 0, 0xffff, 0x0a00, 0x0001); + assert!(is_ip_blocked(&IpAddr::V6(mapped_private))); + } + + #[test] + fn test_ssrf_allows_public() { + assert!(!is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(8, 8, 8, 8)))); + assert!(!is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(1, 1, 1, 1)))); + assert!(!is_ip_blocked(&IpAddr::V4(Ipv4Addr::new(93, 184, 216, 34)))); + } + + // -- URL validation ----------------------------------------------------- + + #[test] + fn test_url_rejects_http() { + let result = validate_url("http://example.com/spec.json", false); + assert!(result.is_err()); + let err = result.unwrap_err(); + assert!(matches!(err, SwaggerCliError::PolicyBlocked(_))); + } + + #[test] + fn test_url_allows_https() { + let result = validate_url("https://example.com/spec.json", false); + assert!(result.is_ok()); + assert_eq!( + result.unwrap().as_str(), + "https://example.com/spec.json" + ); + } + + #[test] + fn test_url_allows_http_when_opted_in() { + let result = validate_url("http://example.com/spec.json", true); + assert!(result.is_ok()); + } + + #[test] + fn test_url_rejects_unsupported_scheme() { + let result = validate_url("ftp://example.com/spec.json", false); + assert!(result.is_err()); + assert!(matches!(result.unwrap_err(), SwaggerCliError::InvalidSpec(_))); + } + + #[test] + fn test_url_rejects_garbage() { + let result = validate_url("not a url at all", false); + assert!(result.is_err()); + } + + // -- Builder defaults --------------------------------------------------- + + #[test] + fn test_builder_defaults() { + let client = AsyncHttpClient::builder().build(); + assert_eq!(client.connect_timeout, DEFAULT_CONNECT_TIMEOUT); + assert_eq!(client.overall_timeout, DEFAULT_OVERALL_TIMEOUT); + assert_eq!(client.max_bytes, DEFAULT_MAX_BYTES); + assert_eq!(client.max_retries, DEFAULT_MAX_RETRIES); + 
assert!(!client.allow_insecure_http); + assert!(client.allowed_private_hosts.is_empty()); + assert!(client.auth_headers.is_empty()); + } + + #[test] + fn test_builder_custom() { + let client = AsyncHttpClient::builder() + .connect_timeout(Duration::from_secs(3)) + .overall_timeout(Duration::from_secs(30)) + .max_bytes(1024) + .max_retries(5) + .allow_insecure_http(true) + .allowed_private_hosts(vec!["internal.corp".into()]) + .auth_header("Authorization".into(), "Bearer tok".into()) + .build(); + + assert_eq!(client.connect_timeout, Duration::from_secs(3)); + assert_eq!(client.overall_timeout, Duration::from_secs(30)); + assert_eq!(client.max_bytes, 1024); + assert_eq!(client.max_retries, 5); + assert!(client.allow_insecure_http); + assert_eq!(client.allowed_private_hosts, vec!["internal.corp"]); + assert_eq!(client.auth_headers.len(), 1); + } + + // -- DNS + SSRF integration (async) ------------------------------------- + + #[tokio::test] + async fn test_resolve_and_check_skips_allowed_host() { + let result = + resolve_and_check("localhost", 80, &["localhost".into()]).await; + assert!(result.is_ok()); + } +} diff --git a/src/core/mod.rs b/src/core/mod.rs index d0201fa..b01c7a1 100644 --- a/src/core/mod.rs +++ b/src/core/mod.rs @@ -1,4 +1,5 @@ pub mod cache; pub mod config; +pub mod http; pub mod indexer; pub mod spec;
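Note: the canonical method ranking that bd-3km specifies for the list command (GET=0, POST=1, PUT=2, PATCH=3, DELETE=4, OPTIONS=5, HEAD=6, TRACE=7, case-insensitive) can be sketched standalone. This is an illustrative sketch, not the crate's actual `method_rank` implementation; ranking unknown methods last (8) is an assumption of the sketch.

```rust
/// Canonical method ranking from bd-3km. Case-insensitive so a user's
/// "post" matches the spec's "POST". Unknown methods sort last (rank 8
/// is an assumption of this sketch, not taken from the PRD).
fn method_rank(method: &str) -> u8 {
    match method.to_ascii_uppercase().as_str() {
        "GET" => 0,
        "POST" => 1,
        "PUT" => 2,
        "PATCH" => 3,
        "DELETE" => 4,
        "OPTIONS" => 5,
        "HEAD" => 6,
        "TRACE" => 7,
        _ => 8,
    }
}

fn main() {
    // Default list order per bd-3km: path ASC, then method rank ASC.
    let mut endpoints =
        vec![("POST", "/pets"), ("GET", "/pets/{id}"), ("GET", "/pets")];
    endpoints.sort_by(|a, b| {
        a.1.cmp(b.1)
            .then_with(|| method_rank(a.0).cmp(&method_rank(b.0)))
    });
    assert_eq!(
        endpoints,
        vec![("GET", "/pets"), ("POST", "/pets"), ("GET", "/pets/{id}")]
    );
}
```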
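Note: the fallback schedule in `retry_delay` (500ms base, doubled per attempt, saturating arithmetic) can be exercised in isolation. A minimal sketch assuming the constants shown in the diff; `backoff_delay` is a hypothetical free-function name, not part of the crate:

```rust
use std::time::Duration;

const RETRY_BASE_DELAY: Duration = Duration::from_millis(500);

/// Mirrors the fallback in `retry_delay`: 500ms doubled per attempt
/// (attempt 1 -> 500ms, 2 -> 1s, 3 -> 2s). Saturating arithmetic keeps
/// a huge attempt count from overflowing the u32 multiplier.
fn backoff_delay(attempt: u32) -> Duration {
    RETRY_BASE_DELAY * 2u32.saturating_pow(attempt.saturating_sub(1))
}

fn main() {
    // With DEFAULT_MAX_RETRIES = 2, only attempts 1 and 2 ever sleep.
    assert_eq!(backoff_delay(1), Duration::from_millis(500));
    assert_eq!(backoff_delay(2), Duration::from_millis(1000));
    assert_eq!(backoff_delay(3), Duration::from_millis(2000));
}
```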
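Note: `is_blocked_mapped_v4` unpacks an IPv4-mapped IPv6 address (`::ffff:a.b.c.d`) by splitting segments 6 and 7 into octets. A self-contained sketch of that unpacking; `unmap_v4` is an illustrative helper name (std's `Ipv6Addr::to_ipv4_mapped` performs the same conversion):

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Recover the embedded IPv4 address from an IPv4-mapped IPv6 address:
/// the first five segments are zero, segment 5 is 0xffff, and segments
/// 6 and 7 each carry two octets of the IPv4 address.
fn unmap_v4(v6: &Ipv6Addr) -> Option<Ipv4Addr> {
    let s = v6.segments();
    if s[0..5] == [0, 0, 0, 0, 0] && s[5] == 0xffff {
        Some(Ipv4Addr::new(
            (s[6] >> 8) as u8,
            s[6] as u8,
            (s[7] >> 8) as u8,
            s[7] as u8,
        ))
    } else {
        None
    }
}

fn main() {
    // ::ffff:127.0.0.1 unpacks to the IPv4 loopback, which the SSRF
    // check then rejects via the IPv4 rules.
    let mapped = Ipv6Addr::new(0, 0, 0, 0, 0, 0xffff, 0x7f00, 0x0001);
    assert_eq!(unmap_v4(&mapped), Some(Ipv4Addr::new(127, 0, 0, 1)));
    // Plain ::1 is not IPv4-mapped.
    assert_eq!(unmap_v4(&Ipv6Addr::LOCALHOST), None);
}
```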