---
plan: true
title: ""
status: iterating
iteration: 2
target_iterations: 8
beads_revision: 0
related_plans: []
created: 2026-02-12
updated: 2026-02-12
---
# PRD: swagger-cli - OpenAPI Specification CLI Tool
**Version:** 1.6
**Status:** Draft (revised: integrates SSRF/transport policy, alias format validation, robot JSON Schema artifacts, coalesced LRU writes, resumable sync, credential source abstraction, installer portability, diff command, dependency audit CI)
**Created:** 2026-02-02
**Target Audience:** Engineering teams, AI agents
## Executive Summary
Build a fast, agent-optimized CLI tool for querying OpenAPI specifications. Enables instant endpoint discovery, schema browsing, and API exploration without repeated web fetches.
**Key Metrics:**
- Query latency: <50ms for cached specs (index-backed, no raw spec parsing)
- First query: <2s including fetch + index build
- Robot mode JSON: 100% structured output with versioned schema contract and published JSON Schema artifacts
- Exit codes: Consistent, actionable, concurrency-safe
---
## Problem Statement
### Current State
**Agents querying APIs today:**
1. Use `WebFetch` with OpenAPI URLs
2. Requires prompt engineering to extract specific data
3. Multiple round-trips to find information
4. Re-fetches on every query (no caching)
5. Inconsistent output format
**Example pain point (observed):**
```bash
# Took 3 attempts to get endpoint info
WebFetch(url1) → config code only
WebFetch(url2) → 404 error
WebFetch(url3) → finally got spec
# Total: ~8-10 seconds, 3 tool calls
```
### Desired State
**With swagger-cli:**
```bash
# First query (with fetch)
swagger-cli fetch https://petstore3.swagger.io/api/v3/openapi.json --alias petstore
# ~2 seconds
# All subsequent queries
swagger-cli list petstore --tag "pet" --robot
# <50ms, structured JSON
```
---
## Goals and Non-Goals
### Goals
1. **Performance:** <50ms cached queries, <2s first fetch
2. **Agent ergonomics:** Robot mode with structured JSON, meaningful exit codes
3. **Reliability:** Offline-capable after initial fetch
4. **Flexibility:** Support multiple APIs with aliases
5. **Maintainability:** Clear code, comprehensive tests, documented decisions
6. **Determinism:** Global network policy control for reproducible offline/CI execution
7. **Operational hygiene:** Cache lifecycle management (prune, stats, size caps)
### Non-Goals
1. **API execution:** Not a replacement for curl/httpie (never sends requests to the described API; the only network traffic is fetching specs)
2. **Code generation:** Not generating client SDKs (maybe v2)
3. **Spec validation:** Not validating OpenAPI correctness (trust upstream)
4. **UI/TUI:** CLI only, no interactive interfaces
---
## User Personas
### Primary: AI Agents
**Needs:**
- Instant endpoint lookup
- Structured JSON output
- Predictable behavior
- Meaningful errors
- No interactivity
**Usage patterns:**
```bash
swagger-cli fetch $URL --alias api --robot
swagger-cli search api "create user" --robot | jq '.data[0].path'
swagger-cli show api "/users" --robot
```
### Secondary: Human Developers
**Needs:**
- Quick API reference
- Readable output
- Easy setup
- Multiple API management
**Usage patterns:**
```bash
swagger-cli list petstore --method POST
swagger-cli show petstore "/pets/{petId}"
swagger-cli search petstore "order" --limit 5
```
---
## Functional Requirements
### FR-1: Spec Fetching and Caching
**Description:** Download and cache OpenAPI specs locally.
**Acceptance Criteria:**
- ✓ Fetch from URL with optional auth headers
- ✓ Store with user-defined alias
- ✓ Validate alias format: `^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$` — rejects path separators, `..`, reserved device names (CON, PRN, NUL, etc.), and leading dots
- ✓ Canonicalize local file paths before ingest; fail fast on unreadable targets
- ✓ Cache in XDG cache dir (default `~/.cache/swagger-cli/`) with per-alias directories
- ✓ Store four files per alias: original upstream bytes (`raw.source`), canonical JSON (`raw.json`), derived index (`index.json`), metadata (`meta.json`)
- ✓ Validate spec is parseable JSON or YAML before caching (no full OpenAPI structural validation)
- ✓ YAML input is normalized to JSON during ingestion (raw.json always stores canonical JSON; original bytes preserved in raw.source)
- ✓ Enforce `--max-bytes` during streaming download (fail before full buffering; do not buffer entire response in memory)
- ✓ All query commands (list/search/tags/schemas/aliases/doctor) read index only -- never parse raw spec
- ✓ Handle redirects (follow up to 5)
- ✓ Default timeout is an overall request timeout (10s). Connect timeout is capped at 5s.
- ✓ Max spec size guardrail (default 25MB, configurable)
- ✓ Enforce fetch network policy by default: block loopback (127.0.0.0/8, ::1), link-local (169.254.0.0/16, fe80::/10), RFC1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), and multicast targets for remote URL fetches — mitigates SSRF
- ✓ Require HTTPS for remote URLs by default; plain HTTP requires explicit `--allow-insecure-http` opt-in
- ✓ Resolve and validate final connected IP after redirects to mitigate DNS-rebinding bypasses (check resolved IP against blocked ranges before sending request body)
- ✓ Support local specs: `file://...` and `./openapi.json` paths (network policy does not apply to local files)
- ✓ Support stdin: `swagger-cli fetch - --alias x` reads spec from stdin (useful for piping from other tools)
- ✓ Support OpenAPI 3.0.x and 3.1.x
- ✓ Cache writes are crash-consistent (meta.json written LAST as commit marker; generation + index_hash validated on read)
- ✓ Cache is safe under concurrent processes (per-alias file lock)
- ✓ Readers detect torn/partial cache state (meta missing or generation/hash mismatch → CACHE_INTEGRITY error)
- ✓ Never print auth tokens in output; `--auth-header` values are redacted in logs/errors
- ✓ Support `--auth-profile <NAME>` to load auth config from `config.toml` (preferred over raw tokens in shell history)
- ✓ Optional fetch-time external ref bundling (`--resolve-external-refs`) with host allowlist and depth/size limits
- ✓ Validate all index pointers (operation_ptr, schema_ptr) resolve against raw.json during fetch/sync; invalid pointers fail the fetch
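
The network-policy criteria above can be illustrated with a small check on the resolved IP. This is a minimal Python sketch, not the tool's implementation: the function names are hypothetical, the multicast blocks (`224.0.0.0/4`, `ff00::/8`) are the standard ranges rather than values stated in this PRD, and unwrapping IPv4-mapped IPv6 addresses is an added assumption.

```python
import ipaddress

# Ranges named in the acceptance criteria, plus the standard multicast blocks (assumed).
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16", "fe80::/10",                     # link-local
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC 1918
        "224.0.0.0/4", "ff00::/8",                         # multicast
    )
]


def is_blocked_ip(resolved_ip: str) -> bool:
    """Return True if the resolved address falls in a blocked range."""
    addr = ipaddress.ip_address(resolved_ip)
    # Assumption: treat ::ffff:a.b.c.d like the IPv4 address it wraps.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return any(addr in net for net in BLOCKED_NETWORKS)


def check_fetch_policy(scheme, host, resolved_ip,
                       allow_private_hosts=(), allow_insecure_http=False):
    """Return "POLICY_BLOCKED" if the target violates policy, else None."""
    if scheme == "http" and not allow_insecure_http:
        return "POLICY_BLOCKED"
    if is_blocked_ip(resolved_ip) and host not in allow_private_hosts:
        return "POLICY_BLOCKED"
    return None
```

To cover the DNS-rebinding criterion, the check would need to run against the address actually connected to, and again after every redirect hop, rather than once against the hostname.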
**Command:**
```bash
swagger-cli fetch <url> [OPTIONS]
OPTIONS:
--alias <NAME> Set alias for this spec (required; must match ^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$)
--header <HEADER> Add request header (repeatable; e.g., "X-API-Key: token", "Accept: application/json")
--auth-header <HEADER> Alias for --header (kept for discoverability; same behavior)
--bearer <TOKEN> Shorthand for Bearer auth
--auth-profile <NAME> Load auth header/token from config.toml profile (preferred over raw tokens)
--force Overwrite existing alias
--timeout-ms <N> Request timeout in ms (default: 10000)
--max-bytes <N> Max download size in bytes (default: 26214400 / 25MB)
--retries <N> Retry transient failures (default: 2)
--input-format <FORMAT> Input format hint: auto (default), json, yaml
--resolve-external-refs Resolve and bundle external $ref targets at fetch-time (opt-in)
--ref-allow-host <HOST> Allowlist host for external ref resolution (repeatable; required with --resolve-external-refs)
--ref-max-depth <N> Max external ref chain depth (default: 3)
--ref-max-bytes <N> Total bytes cap for all external ref fetches (default: 10MB)
--allow-private-host <HOST> Allow specific private/internal host for SSRF policy bypass (repeatable)
--allow-insecure-http Permit http:// URLs (default: reject remote URLs without HTTPS)
--robot Machine-readable output
```
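
The `--alias` rule above combines a regex with extra rejections (`..` and reserved device names). A minimal Python sketch of that validation follows; `validate_alias` is a hypothetical name, and the device-name set beyond CON, PRN, and NUL is an assumption based on the usual Windows list.

```python
import re

ALIAS_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$")

# The PRD names CON, PRN, NUL "etc."; the remaining entries are assumed.
RESERVED_DEVICE_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def validate_alias(alias: str) -> bool:
    """Return True if the alias is safe to use as a directory name."""
    # fullmatch avoids Python's "$ matches before a trailing newline" quirk.
    if not ALIAS_RE.fullmatch(alias):
        return False
    # The character class permits dots, so ".." needs its own check.
    if ".." in alias:
        return False
    # Assumption: "NUL.json" is as unsafe as "NUL", so compare the stem only.
    stem = alias.split(".", 1)[0].upper()
    return stem not in RESERVED_DEVICE_NAMES
```

Leading dots and path separators are already excluded by the regex itself; only `..` and the device names need separate handling.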
**Examples:**
```bash
# Basic fetch
swagger-cli fetch https://api.example.com/openapi.json --alias example
# With authentication
swagger-cli fetch https://internal.api/spec.json --alias internal \
--auth-header "X-API-Key: secret123"
# Bearer token shorthand
swagger-cli fetch https://api.com/spec --alias api --bearer $TOKEN
# Preferred: auth profile (avoids shell history leakage)
swagger-cli fetch https://internal.api/spec.json --alias internal \
--auth-profile corp-internal
# Force overwrite
swagger-cli fetch $URL --alias petstore --force
# Local file (useful for testing and offline workflows)
swagger-cli fetch ./openapi.json --alias local-api
swagger-cli fetch file:///absolute/path/to/spec.json --alias local-api
# Stdin (useful for piping from other tools or CI)
curl -s https://api.example.com/spec.json | swagger-cli fetch - --alias piped-api
# YAML spec (auto-detected or hinted)
swagger-cli fetch https://api.example.com/openapi.yaml --alias yaml-api
swagger-cli fetch ./spec.yaml --alias local-yaml --input-format yaml
# External ref bundling (opt-in, with host allowlist)
swagger-cli fetch https://api.example.com/openapi.json --alias bundled \
--resolve-external-refs \
--ref-allow-host api.example.com \
--ref-allow-host schemas.example.com
```
**Robot output:**
```json
{
"ok": true,
"data": {
"alias": "petstore",
"url": "https://petstore3.swagger.io/api/v3/openapi.json",
"version": "1.0.17",
"title": "Swagger Petstore",
"endpoint_count": 19,
"schema_count": 8,
"cached_at": "2026-02-02T20:45:00Z",
"source_format": "json",
"cache_dir": "/Users/user/.cache/swagger-cli/aliases/petstore/",
"files": {
"raw_source": "/Users/user/.cache/swagger-cli/aliases/petstore/raw.source",
"raw": "/Users/user/.cache/swagger-cli/aliases/petstore/raw.json",
"index": "/Users/user/.cache/swagger-cli/aliases/petstore/index.json",
"meta": "/Users/user/.cache/swagger-cli/aliases/petstore/meta.json"
},
"content_hash": "sha256:abc123..."
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "fetch",
"duration_ms": 1847
}
}
```
**Error cases:**
| Error | Exit Code | JSON Error Code |
|-------|-----------|-----------------|
| Network failure | 4 | `NETWORK_ERROR` |
| Invalid JSON/YAML | 5 | `INVALID_SPEC` |
| Alias exists | 6 | `ALIAS_EXISTS` |
| Auth failed | 7 | `AUTH_FAILED` |
| Policy blocked (SSRF, insecure transport) | 16 | `POLICY_BLOCKED` |
**HTTP error classification (normative):**
- `401/403` → `AUTH_FAILED` (exit 7)
- `404` → `INVALID_SPEC` (exit 5) — spec not found at URL
- Other `4xx` → `NETWORK_ERROR` (exit 4, not retryable)
- `5xx` → `NETWORK_ERROR` (exit 4, retryable with backoff)
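
The classification above maps directly onto a lookup function. The Python sketch below is illustrative only: `classify_http_status` is a hypothetical name, and marking 401/403/404 as non-retryable is an assumption, since the list states retry behavior only for other 4xx and for 5xx.

```python
def classify_http_status(status: int):
    """Map an HTTP status to (error_code, exit_code, retryable), or None if not an error."""
    if status in (401, 403):
        return ("AUTH_FAILED", 7, False)       # retryable=False is assumed
    if status == 404:
        return ("INVALID_SPEC", 5, False)      # retryable=False is assumed
    if 400 <= status < 500:
        return ("NETWORK_ERROR", 4, False)
    if 500 <= status < 600:
        return ("NETWORK_ERROR", 4, True)
    return None
```

The specific statuses are tested before the generic 4xx range; reversing the order would turn every 401, 403, and 404 into `NETWORK_ERROR`.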
**Decision rationale:**
- **Cache location:** XDG cache dir (`~/.cache/`) for ephemeral data; config in XDG config dir (`~/.config/`). Proper separation per XDG spec.
- **Four-file cache:** `raw.source` preserves exact upstream bytes (lossless provenance, may be JSON or YAML); `raw.json` stores canonical normalized JSON (all query/show logic operates on this); `index.json` is a precomputed, small, fast-loading structure for query commands; `meta.json` stores fetch metadata. The raw.source/raw.json split enables YAML input support without any impact on query performance or internal logic.
- **Alias format validation:** Aliases map directly to directory names, so path traversal (`../`), reserved device names (`CON`, `NUL`), and shell-hostile characters must be rejected at parse time. The regex `^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$` covers all portable filesystem-safe names. 64 chars is generous but prevents absurdly long directory names.
- **SSRF protection:** Agent-facing CLI tools may run in privileged environments with access to cloud metadata endpoints (169.254.169.254) and internal services. Blocking private/loopback/link-local/multicast ranges by default prevents accidental SSRF. `--allow-private-host` permits explicit exceptions for legitimate internal API specs. DNS rebinding check validates the resolved IP after redirects, not just the hostname.
- **HTTPS-by-default:** Agents downloading API specs over plaintext HTTP risk MITM injection of malicious spec content. Requiring HTTPS by default with explicit `--allow-insecure-http` opt-in makes the secure path the default.
- **Alias requirement:** Forces explicit naming, prevents confusion with multiple specs
- **No auto-sync:** Manual sync required (`swagger-cli sync`) to prevent unexpected network calls
- **Content hash:** SHA256 for change detection, stored in metadata
- **Crash-consistent writes:** Cache writes use a multi-file commit protocol: raw.source, raw.json, and index.json are written and renamed first, then meta.json is written LAST as the commit marker. Readers validate meta.generation + meta.index_hash against index.json (and meta.raw_hash against raw.json for show commands) to detect torn state. Per-alias file locks prevent concurrent write corruption (justifies exit code 9, `CACHE_LOCKED`).
- **Auth redaction:** Auth tokens are never printed in output to prevent accidental secret leakage in logs or piped output.
- **Auth profiles:** `--auth-profile <NAME>` loads auth config from `config.toml`, avoiding raw tokens in shell history, process lists, and CI logs. Explicit `--header`/`--bearer` flags merge with (and override) profile headers for one-off use cases.
- **Repeatable headers:** `--header` is repeatable for multi-header fetches (common in corporate environments). `--auth-header` is kept as an alias for discoverability.
- **YAML input support:** Many real-world OpenAPI specs are authored in YAML. Input format is auto-detected (file extension, Content-Type header) or hinted via `--input-format`. YAML is normalized to JSON during ingestion; `raw.source` preserves original bytes, `raw.json` stores canonical JSON. This keeps all query/index logic JSON-only while accepting YAML at the boundary.
- **Streaming max-bytes:** `--max-bytes` is enforced during download via streaming byte counting, not post-download file size check. Prevents OOM on malicious/huge specs.
- **External ref bundling:** Opt-in at fetch time only. Requires explicit `--ref-allow-host` allowlist to prevent fetching from arbitrary hosts. Bundled result is stored in `raw.json` (all refs inlined). Preserves offline guarantee for all query commands. Default behavior (no `--resolve-external-refs`) is unchanged.
- **Pointer validation:** All `operation_ptr` and `schema_ptr` values in the index are validated against `raw.json` at fetch/sync time. Invalid pointers fail the fetch rather than silently producing broken `show` commands later. `doctor` re-validates all pointers.
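
The crash-consistent write protocol described in the rationale above can be sketched as follows. This is a simplified Python illustration under stated assumptions: the function names are hypothetical, the index is assumed to carry its own `generation` field, and per-alias locking is omitted.

```python
import hashlib
import json
import os


class CacheIntegrityError(Exception):
    """Raised when the cache looks torn or partially written."""


def _sha256(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def _atomic_write(path: str, data: bytes) -> None:
    # Write to a temp file, flush to disk, then rename over the target.
    tmp = path + ".tmp"
    with open(tmp, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def commit_alias(alias_dir, raw_source: bytes, raw_json: bytes,
                 index: dict, generation: int) -> None:
    os.makedirs(alias_dir, exist_ok=True)
    index_bytes = json.dumps({**index, "generation": generation},
                             sort_keys=True).encode("utf-8")
    _atomic_write(os.path.join(alias_dir, "raw.source"), raw_source)
    _atomic_write(os.path.join(alias_dir, "raw.json"), raw_json)
    _atomic_write(os.path.join(alias_dir, "index.json"), index_bytes)
    # meta.json goes LAST: its presence and hashes act as the commit marker.
    meta = {"generation": generation,
            "index_hash": _sha256(index_bytes),
            "raw_hash": _sha256(raw_json)}
    _atomic_write(os.path.join(alias_dir, "meta.json"),
                  json.dumps(meta).encode("utf-8"))


def read_index(alias_dir) -> dict:
    try:
        with open(os.path.join(alias_dir, "meta.json"), "rb") as fh:
            meta = json.load(fh)
        with open(os.path.join(alias_dir, "index.json"), "rb") as fh:
            index_bytes = fh.read()
        index = json.loads(index_bytes)
    except (OSError, ValueError) as exc:
        raise CacheIntegrityError("meta.json or index.json missing or unreadable") from exc
    if (meta.get("index_hash") != _sha256(index_bytes)
            or meta.get("generation") != index.get("generation")):
        raise CacheIntegrityError("generation/hash mismatch")
    return index
```

If a writer crashes after replacing `index.json` but before replacing `meta.json`, the old `meta.json` no longer matches the new index hash, so a reader reports `CACHE_INTEGRITY` instead of serving mixed state.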
### FR-2: Endpoint Listing
**Description:** List all endpoints with filtering and sorting.
**Acceptance Criteria:**
- ✓ Display path, method, summary
- ✓ Filter by HTTP method
- ✓ Filter by tag/category
- ✓ Filter by path pattern (regex)
- ✓ Sort by path (default) or method
- ✓ Limit results
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli list [ALIAS] [OPTIONS]
OPTIONS:
--method <METHOD> Filter by HTTP method (GET, POST, PUT, DELETE, PATCH)
--tag <TAG> Filter by OpenAPI tag
--path <PATTERN> Filter by path regex (invalid regex → USAGE_ERROR; no silent fallback)
--sort <FIELD> Sort by: path (default), method, tag
--limit <N> Limit results (default: 50)
--all Show all results (no limit)
--all-aliases Run query across all aliases; include `alias` field per result in output
--robot Machine-readable output
```
**Examples:**
```bash
# All endpoints
swagger-cli list petstore
# POST endpoints only
swagger-cli list petstore --method POST
# Tagged endpoints
swagger-cli list petstore --tag "pet"
# Path pattern
swagger-cli list petstore --path "store.*"
# Combined filters
swagger-cli list petstore --method POST --tag pet --limit 10
# All results
swagger-cli list petstore --all
```
**Human output:**
```
Petstore API (Swagger Petstore v1.0.17) - 19 endpoints
GET /pet/{petId} Find pet by ID
GET /pet/findByStatus Finds pets by status
GET /pet/findByTags Finds pets by tags
POST /pet Add a new pet to the store
PUT /pet Update an existing pet
DELETE /pet/{petId} Deletes a pet
GET /store/inventory Returns pet inventories
POST /store/order Place an order for a pet
Showing 8 of 19 endpoints (filtered)
```
**Robot output:**
```json
{
"ok": true,
"data": {
"endpoints": [
{
"path": "/pet",
"method": "POST",
"summary": "Add a new pet to the store",
"description": "Add a new pet to the store",
"tags": ["pet"],
"operation_id": "addPet",
"deprecated": false,
"parameters": [],
"request_body_required": true,
"security": ["petstore_auth"]
}
],
"total": 19,
"filtered": 3,
"applied_filters": {
"method": "POST",
"tag": "pet"
}
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "list",
"alias": "petstore",
"spec_version": "1.0.17",
"cached_at": "2026-02-02T20:45:00Z",
"duration_ms": 12
}
}
```
**Decision rationale:**
- **Default limit:** 50 prevents overwhelming output
- **Regex patterns:** More flexible than glob patterns for paths
- **Tag filtering:** Maps to OpenAPI native tags, not custom grouping
- **Index-backed:** All filtering and sorting operates on `index.json`, never parses raw spec. Index contains *effective* security scheme names (root + operation override semantics) so agents can accurately detect auth requirements without loading raw.json.
- **Deterministic ordering:** Index arrays are pre-sorted (endpoints by path+method, schemas by name, tags by name). Robot output preserves this ordering for stable diffs and predictable agent parsing.
- **Strict validation:** Invalid regex in `--path` or `--name` fails with `USAGE_ERROR` immediately. No silent fallback to "no filter." Invalid `--in` field selectors similarly fail fast.
- **Color/unicode:** Controlled by TTY detection independently of `--robot`. Piping to `| less` or `> file` disables color but does NOT switch to JSON output.
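
A minimal Python sketch of index-backed filtering with strict regex validation and deterministic ordering, as described in the rationale above. The function name and the index entry shape (`path`, `method`, `tags`) are assumptions, and unanchored matching is inferred from the `--path "store.*"` example rather than stated.

```python
import re


class UsageError(Exception):
    """Maps to USAGE_ERROR: invalid input fails fast instead of being ignored."""


def list_endpoints(index_endpoints, method=None, tag=None,
                   path_pattern=None, limit=50):
    # An invalid regex is an error, never a silent "no filter".
    try:
        path_re = re.compile(path_pattern) if path_pattern is not None else None
    except re.error as exc:
        raise UsageError(f"invalid --path regex: {exc}") from exc

    matched = [
        ep for ep in index_endpoints
        if (method is None or ep["method"] == method.upper())
        and (tag is None or tag in ep["tags"])
        and (path_re is None or path_re.search(ep["path"]))
    ]
    # Stable ordering: path first, then method.
    matched.sort(key=lambda ep: (ep["path"], ep["method"]))
    shown = matched if limit is None else matched[:limit]
    return {"endpoints": shown,
            "total": len(index_endpoints),
            "filtered": len(matched)}
```

Passing `limit=None` corresponds to `--all`; `filtered` counts matches before the limit is applied, so callers can tell when output was cut short.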
### FR-3: Endpoint Details
**Description:** Show complete details for a specific endpoint.
**Acceptance Criteria:**
- ✓ Display path, method, summary, description
- ✓ Show all parameters (path, query, header, cookie)
- ✓ Show request body schema (if applicable)
- ✓ Show response schemas with status codes
- ✓ Show security requirements
- ✓ Pretty-print JSON schemas
- ✓ Robot mode with full structured data
**Command:**
```bash
swagger-cli show [ALIAS] <path> [OPTIONS]
OPTIONS:
--method <METHOD> Required if path has multiple methods; otherwise optional
--format <FORMAT> Output format: pretty (default), json
--expand-refs Expand $ref pointers inline
--max-depth <N> Max expansion depth (default: 3; only with --expand-refs)
--robot Machine-readable output (implies --format json)
```
**Examples:**
```bash
# Show endpoint
swagger-cli show petstore "/pet/{petId}"
# Specific method
swagger-cli show petstore "/pet" --method POST
# With ref expansion (bounded)
swagger-cli show petstore "/store/order" --expand-refs --max-depth 5
# Robot mode
swagger-cli show petstore "/pet" --method POST --robot
```
**Human output:**
```
POST /pet
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Summary: Add a new pet to the store
Description: Add a new pet to the store
Tags: pet
Operation ID: addPet
Security: petstore_auth (required)
━━ Request Body (application/json) ━━━━━━━━━━━━━━━━━━━━━━━━━
Schema: Pet (required)
{
"id": 10,
"name": "doggie",
"category": { "id": 1, "name": "Dogs" },
"photoUrls": ["string"],
"tags": [{ "id": 0, "name": "string" }],
"status": "available"
}
━━ Responses ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
200 OK Successful operation
Schema: Pet
405 Invalid Input Invalid input
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
**Robot output:**
```json
{
"ok": true,
"data": {
"path": "/pet",
"method": "POST",
"summary": "Add a new pet to the store",
"description": "Add a new pet to the store",
"tags": ["pet"],
"operation_id": "addPet",
"deprecated": false,
"parameters": [],
"request_body": {
"required": true,
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Pet"
}
}
}
},
"responses": {
"200": {
"description": "Successful operation",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Pet"
}
}
}
},
"405": {
"description": "Invalid input"
}
},
"security": [
{
"petstore_auth": ["write:pets", "read:pets"]
}
]
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "show",
"alias": "petstore",
"spec_version": "1.0.17",
"duration_ms": 23
}
}
```
**Decision rationale:**
- **Default no-expand:** Refs are useful for understanding schema reuse; expand optional
- **Bounded expand:** `--expand-refs` MUST be bounded by `--max-depth` (default 3) and MUST detect cycles. Cycles are annotated in output with `{"$circular_ref": "#/components/schemas/Foo"}` rather than causing infinite recursion.
- **External refs are NOT fetched (no network).** If `--expand-refs` encounters a non-internal `$ref` (anything not starting with `#/`), it MUST leave it unexpanded and annotate with `{"$external_ref": "<ref>"}` plus a warning in `meta.warnings[]` (robot) or a warning line (human). This preserves the offline guarantee.
- **Multiple methods:** If multiple methods exist and `--method` is not provided, return `USAGE_ERROR` with available methods listed in the suggestion field. This avoids nondeterminism and makes agent retries straightforward
- **Full schema inclusion:** Agents need complete data for code generation potential
- **Index-then-raw:** `show` first loads `index.json` to find the operation's JSON pointer, then loads `raw.json` as `serde_json::Value` and extracts the operation subtree. `show` and `schemas --show` (which extracts a schema subtree the same way) are the only query commands that read `raw.json`.
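
The bounded-expansion rules above (depth limit, cycle annotation, external refs left alone) can be sketched in Python as follows. The helper names are hypothetical, and the sketch omits the `meta.warnings[]` entries that the rationale requires for external refs.

```python
def resolve_pointer(doc, ref):
    """Resolve an internal ref such as "#/components/schemas/Pet" (RFC 6901 escapes)."""
    node = doc
    for token in ref[2:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def expand_refs(node, doc, max_depth=3, _depth=0, _stack=()):
    """Inline internal $ref targets up to max_depth, annotating cycles and external refs."""
    if isinstance(node, list):
        return [expand_refs(item, doc, max_depth, _depth, _stack) for item in node]
    if not isinstance(node, dict):
        return node
    ref = node.get("$ref")
    if isinstance(ref, str):
        if not ref.startswith("#/"):
            return {"$external_ref": ref}      # never fetched: no network access
        if ref in _stack:
            return {"$circular_ref": ref}      # already being expanded on this path
        if _depth >= max_depth:
            return node                        # depth budget spent: leave the $ref as-is
        target = resolve_pointer(doc, ref)
        return expand_refs(target, doc, max_depth, _depth + 1, _stack + (ref,))
    return {key: expand_refs(value, doc, max_depth, _depth, _stack)
            for key, value in node.items()}
```

The `_stack` tuple tracks refs on the current expansion path only, so a schema referenced twice in sibling positions is expanded both times and only true self-reference is marked circular.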
### FR-4: Schema Browser
**Description:** Browse and inspect component schemas.
**Acceptance Criteria:**
- ✓ List all schemas
- ✓ Filter by name pattern
- ✓ Show schema details
- ✓ Expand nested refs recursively
- ✓ Detect circular references
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli schemas [ALIAS] [OPTIONS]
OPTIONS:
--name <PATTERN> Filter by schema name (regex; invalid → USAGE_ERROR)
--list List all schema names (default)
--show <NAME> Show specific schema details
--expand-refs Recursively expand $ref pointers
--max-depth <N> Max recursion depth (default: 3)
--robot Machine-readable output
```
**Examples:**
```bash
# List all schemas
swagger-cli schemas petstore
# Filter schemas
swagger-cli schemas petstore --name ".*Pet.*"
# Show specific schema
swagger-cli schemas petstore --show Pet
# Show with expansion
swagger-cli schemas petstore --show Order --expand-refs
# Robot mode
swagger-cli schemas petstore --show Pet --robot
```
**Human output:**
```
Petstore API Schemas (8 total)
Pet
Category
Tag
Order
User
Address
Customer
ApiResponse
Showing 8 schemas
```
**Schema detail output:**
```
Schema: Pet
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Type: object
Required: name, photoUrls
Properties:
id (integer) Pet ID
name (string) Pet name
category ($ref) Reference to Category
photoUrls (array) Array of image URLs
tags (array) Array of Tag references
status (string) Pet status (enum: available, pending, sold)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
**Robot output:**
```json
{
"ok": true,
"data": {
"name": "Pet",
"type": "object",
"required": ["name", "photoUrls"],
"properties": {
"id": {
"type": "integer",
"format": "int64",
"description": "Pet ID"
},
"name": {
"type": "string",
"description": "Pet name"
},
"category": {
"$ref": "#/components/schemas/Category"
},
"photoUrls": {
"type": "array",
"items": { "type": "string" }
},
"tags": {
"type": "array",
"items": { "$ref": "#/components/schemas/Tag" }
},
"status": {
"type": "string",
"enum": ["available", "pending", "sold"],
"description": "Pet status in the store"
}
}
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "schemas",
"alias": "petstore",
"circular_refs": false,
"duration_ms": 8
}
}
```
**Decision rationale:**
- **Max depth:** Prevents infinite recursion with circular refs
- **Circular detection:** Warns rather than errors (valid OpenAPI pattern)
- **Regex filter:** More powerful than substring matching
- **Index-backed listing:** Schema names and pointers come from `index.json`. Schema details (`--show`) load the relevant subtree from `raw.json` via JSON pointer.
- **Parameter details intentionally shallow:** name/location/required/desc in index preserves <50ms queries while improving agent usefulness for planning.
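
Circular references are reported as a flag rather than an error (`meta.circular_refs` in the robot output). One way to compute such a flag is a depth-first search over the schema reference graph. The Python sketch below is an assumption about how this could work, not the tool's algorithm; it takes the `components.schemas` mapping as input and uses recursion, so very deep graphs would need an iterative version.

```python
SCHEMA_PREFIX = "#/components/schemas/"


def iter_refs(node):
    """Yield every $ref string found anywhere inside a schema body."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            yield ref
        for value in node.values():
            yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def has_circular_refs(schemas: dict) -> bool:
    """Return True if any chain of internal schema refs leads back to itself."""
    graph = {
        name: {r[len(SCHEMA_PREFIX):] for r in iter_refs(body)
               if r.startswith(SCHEMA_PREFIX)}
        for name, body in schemas.items()
    }
    state = {}  # name -> "visiting" or "done"

    def visit(name) -> bool:
        state[name] = "visiting"
        for dep in graph.get(name, ()):
            if state.get(dep) == "visiting":
                return True                    # back edge: cycle found
            if dep not in state and visit(dep):
                return True
        state[name] = "done"
        return False

    return any(visit(name) for name in sorted(graph) if name not in state)
```

Because the result is only a boolean, callers can still list and show schemas normally and attach a warning when the flag is set.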
### FR-5: Text Search
**Description:** Search across endpoint paths, summaries, descriptions, and schema names.
**Acceptance Criteria:**
- ✓ Full-text search across all searchable fields
- ✓ Case-insensitive by default
- ✓ Rank results by relevance
- ✓ Show context snippets
- ✓ Limit results
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli search [ALIAS] <query> [OPTIONS]
OPTIONS:
--case-sensitive Case-sensitive matching
--exact Exact phrase matching
--in <FIELDS> Search in: all (default), paths, descriptions, schemas (invalid field → USAGE_ERROR)
--limit <N> Limit results (default: 20)
--all-aliases Search across all aliases; include `alias` field per result in output
--robot Machine-readable output
```
**Examples:**
```bash
# Basic search
swagger-cli search petstore "pet status"
# Case-sensitive
swagger-cli search petstore "ID" --case-sensitive
# Exact phrase
swagger-cli search petstore "find pet" --exact
# Scoped search
swagger-cli search petstore "order" --in paths,descriptions
# Robot mode
swagger-cli search petstore "store inventory" --robot --limit 5
```
**Human output:**
```
Petstore API Search: "pet status" (4 matches)
━━ Endpoints ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GET /pet/findByStatus
Finds pets by status
Multiple status values can be provided with comma separated strings...
PUT /pet
Update an existing pet
Update an existing pet by Id...
━━ Schemas ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Pet
Schema for pet objects with status field
Order
Schema for store orders with status tracking
```
**Robot output:**
```json
{
"ok": true,
"data": {
"query": "pet status",
"results": [
{
"type": "endpoint",
"path": "/pet/findByStatus",
"method": "GET",
"summary": "Finds pets by status",
"description": "Multiple status values can be provided...",
"rank": 1,
"score": 9500,
"matches": [
{
"field": "summary",
"snippet": "Finds pets by status"
},
{
"field": "path",
"snippet": "/pet/findByStatus"
}
]
},
{
"type": "schema",
"name": "Pet",
"description": "Schema for pet objects",
"rank": 2,
"score": 8700,
"matches": [
{
"field": "name",
"snippet": "Pet"
}
]
}
],
"total": 4,
"query_time_ms": 12
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "search",
"alias": "petstore",
"duration_ms": 12
}
}
```
**Decision rationale:**
- **Tokenized scoring:** Query is split into terms; score combines field weights (path > summary > description) with term coverage boost. `--exact` treats query as single phrase. Scores are quantized to integer (basis points) for cross-platform determinism and stable golden tests. Results include `rank` (1..N) for easy agent filtering.
- **Index-backed:** Search runs over `index.json` for speed and consistency; never loads `raw.json`
- **No semantic search:** Adds complexity; text search sufficient for MVP
- **Context snippets:** 50 chars before/after match, using char-boundary-safe slicing (Unicode-safe)
- **Options honored:** `--case-sensitive` and `--exact` flags affect matching behavior as expected
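
The tokenized scoring described above can be illustrated with integer-only arithmetic. In this Python sketch the weights (3/2/1) and the exact formula are hypothetical; the PRD fixes only the field ordering (path > summary > description), the term-coverage boost, integer basis-point scores, and the 1..N rank.

```python
# Hypothetical weights; only the ordering path > summary > description comes from the PRD.
FIELD_WEIGHTS = {"path": 3, "summary": 2, "description": 1}


def score_entry(entry, query, exact=False, case_sensitive=False):
    """Return an integer score in basis points (0..10000)."""
    terms = [query] if exact else query.split()
    if not case_sensitive:
        terms = [t.lower() for t in terms]
    terms = list(dict.fromkeys(t for t in terms if t))  # drop empties and duplicates
    if not terms:
        return 0
    matched, weight_hit = set(), 0
    for field, weight in FIELD_WEIGHTS.items():
        text = entry.get(field) or ""
        if not case_sensitive:
            text = text.lower()
        hits = {t for t in terms if t in text}
        if hits:
            weight_hit += weight
            matched |= hits
    # Field-weight fraction times term-coverage fraction, kept in integers
    # so results are identical on every platform.
    return (10000 * weight_hit * len(matched)) // (sum(FIELD_WEIGHTS.values()) * len(terms))


def search(entries, query, limit=20, **options):
    scored = [(score_entry(e, query, **options), e) for e in entries]
    hits = sorted(
        ((s, e) for s, e in scored if s > 0),
        key=lambda se: (-se[0], se[1].get("path", ""), se[1].get("method", "")),
    )
    return [{**e, "rank": i, "score": s}
            for i, (s, e) in enumerate(hits[:limit], start=1)]
```

Ties are broken by path and method so that identical inputs always produce the same ranking, which is what keeps golden tests stable.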
### FR-6: Alias Management
**Description:** Manage multiple API aliases.
**Acceptance Criteria:**
- ✓ List all configured aliases
- ✓ Show alias details (URL, version, stats)
- ✓ Rename aliases
- ✓ Delete aliases
- ✓ Set default alias
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli aliases [OPTIONS]
OPTIONS:
--list List all aliases (default)
--show <ALIAS> Show alias details
--rename <OLD> <NEW> Rename alias
--delete <ALIAS> Delete alias
--set-default <ALIAS> Set default alias
--robot Machine-readable output
```
**Examples:**
```bash
# List all
swagger-cli aliases
# Details
swagger-cli aliases --show petstore
# Rename
swagger-cli aliases --rename petstore pets
# Delete
swagger-cli aliases --delete old-api
# Set default
swagger-cli aliases --set-default petstore
```
**Human output:**
```
Configured APIs (3 total)
petstore * (default)
URL: https://petstore3.swagger.io/api/v3/openapi.json
Version: 1.0.17
Cached: 2026-02-02 15:45
Size: 45 KB
stripe
URL: https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.json
Version: 2023-10-16
Cached: 2026-01-28 10:22
Size: 2.4 MB
github
URL: https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/api.github.com/api.github.com.json
Version: 1.1.4
Cached: 2026-02-01 09:15
Size: 8.2 MB
```
**Robot output:**
```json
{
"ok": true,
"data": {
"aliases": [
{
"name": "petstore",
"url": "https://petstore3.swagger.io/api/v3/openapi.json",
"version": "1.0.17",
"is_default": true,
"cached_at": "2026-02-02T20:45:00Z",
"cache_size_bytes": 46080,
"endpoint_count": 19,
"schema_count": 8
},
{
"name": "stripe",
"url": "https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.json",
"version": "2023-10-16",
"is_default": false,
"cached_at": "2026-01-28T15:22:00Z",
"cache_size_bytes": 2516582,
"endpoint_count": 312,
"schema_count": 456
}
],
"default_alias": "petstore"
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "aliases",
"duration_ms": 5
}
}
```
**Decision rationale:**
- **Default alias:** Allows `swagger-cli list` without specifying alias
- **Size tracking:** Useful for cache management
- **No auto-delete:** Explicit delete required to prevent accidents
### FR-7: Sync and Updates
**Description:** Check for and apply spec updates.
**Acceptance Criteria:**
- ✓ Check if remote spec changed
- ✓ Re-fetch if changed
- ✓ Preserve alias and config
- ✓ Report differences
- ✓ Support --all flag
- ✓ `sync --all` persists per-alias progress checkpoint for resumable execution (`--resume`)
- ✓ Supports controlled abort via failure budget (`--max-failures`) to limit blast radius from noisy upstream incidents
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli sync [ALIAS] [OPTIONS]
OPTIONS:
--all Sync all aliases
--dry-run Check without updating
--force Force re-fetch regardless of changes
--details Include capped lists of added/removed/modified items in robot output (max 200 items; truncated flag)
--jobs <N> Parallel aliases to sync with --all (default: 4, bounded)
--per-host <N> Max concurrent requests per host (default: 2)
--resume Resume from last interrupted --all sync checkpoint
--max-failures <N> Abort run after N alias failures (default: unlimited)
--robot Machine-readable output
```
**Examples:**
```bash
# Sync one
swagger-cli sync petstore
# Check without updating
swagger-cli sync petstore --dry-run
# Sync all
swagger-cli sync --all
# Force re-fetch
swagger-cli sync petstore --force
```
**Human output:**
```
Syncing petstore...
Remote: https://petstore3.swagger.io/api/v3/openapi.json
Local: 1.0.17 (cached 2026-02-02 15:45)
Remote: 1.0.17 (checked 2026-02-02 16:14)
✓ No changes detected
Cache is up to date.
```
**Change detected output:**
```
Syncing petstore...
Remote: https://petstore3.swagger.io/api/v3/openapi.json
Local: 1.0.17 (cached 2026-02-02 15:45)
Remote: 1.0.18 (checked 2026-02-02 16:14)
⚠ Changes detected
Version: 1.0.17 → 1.0.18
Endpoints: 19 → 21 (+2)
Schemas: 8 → 9 (+1)
✓ Updated cache
Use `swagger-cli sync petstore --dry-run --robot` to retrieve the summary change set.
```
**Robot output:**
```json
{
"ok": true,
"data": {
"alias": "petstore",
"changed": true,
"local_version": "1.0.17",
"remote_version": "1.0.18",
"changes": {
"version_changed": true,
"endpoints": {
"before": 19,
"after": 21,
"added": 2,
"removed": 0,
"modified": 1
},
"endpoint_details": {
"added": [["POST", "/pets/batch"], ["GET", "/inventory/summary"]],
"removed": [],
"modified": [["PUT", "/pet"]],
"truncated": false
},
"schemas": {
"before": 8,
"after": 9,
"added": 1,
"removed": 0
},
"schema_details": {
"added": ["BatchRequest"],
"removed": [],
"truncated": false
}
},
"updated_at": "2026-02-02T21:14:00Z"
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "sync",
"duration_ms": 1523
}
}
```
**Decision rationale:**
- **ETag + Last-Modified support:** Prefer `If-None-Match` (ETag); fall back to `If-Modified-Since` (Last-Modified) when the ETag is absent. Covers more real-world servers.
- **Default behavior:** `sync` checks and applies updates when changed. `--dry-run` performs check-only without writing.
- **Index-based diffs:** Change detection (added/removed endpoints, schemas) computed by comparing old vs new `index.json`, not raw spec text. Fast and deterministic.
- **Actionable details:** `--details` includes capped (max 200) lists of added/removed/modified endpoints and schemas with `truncated` flag. Makes `sync --dry-run --robot --details` immediately useful to agents for understanding what changed.
- **Concurrency-safe:** Sync acquires the per-alias lock before writing.
- **Bounded parallelism:** `sync --all` uses bounded concurrency (default 4 aliases) with per-host throttling (default 2 per host). Prevents abusive request patterns against upstream servers. Retries honor `Retry-After` header when present; otherwise exponential backoff + jitter.
- **Partial failure reporting:** `sync --all --robot` reports per-alias success/failure without aborting the entire run. Agents can parse which aliases failed and retry selectively.
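
The retry rule in the bounded-parallelism bullet can be sketched as a pure function. This is a minimal illustration, not the planned implementation: the name `retry_delay`, the 500 ms base, the 30 s cap, the "up to 50% extra" jitter range, and passing jitter in as a caller-supplied fraction are all assumptions, chosen so the logic can be tested without an HTTP client or an RNG.

```rust
use std::time::Duration;

/// Hypothetical helper: how long to wait before retry number `attempt` (0-based).
/// `retry_after_secs` is the parsed `Retry-After` header, if the server sent one.
/// `jitter` is a caller-supplied fraction in [0.0, 1.0] (for example from an RNG).
pub fn retry_delay(attempt: u32, retry_after_secs: Option<u64>, jitter: f64) -> Duration {
    const BASE_MS: u64 = 500; // assumed base delay
    const CAP_MS: u64 = 30_000; // assumed upper bound

    // A server-provided Retry-After always wins.
    if let Some(secs) = retry_after_secs {
        return Duration::from_secs(secs);
    }

    // Exponential backoff: base * 2^attempt, saturating, then capped.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let backoff = BASE_MS.saturating_mul(factor).min(CAP_MS);

    // Add up to 50% extra delay, scaled by the jitter fraction.
    let extra = (backoff as f64 * 0.5 * jitter.clamp(0.0, 1.0)) as u64;
    Duration::from_millis(backoff + extra)
}
```

Keeping the function free of I/O means the per-host throttle and the alias worker pool can share it regardless of how requests are actually issued.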
### FR-8: Tags Browser
**Description:** Browse OpenAPI tags/categories.
**Acceptance Criteria:**
- ✓ List all tags
- ✓ Show endpoint count per tag
- ✓ Show tag descriptions
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli tags [ALIAS] [OPTIONS]
OPTIONS:
--robot Machine-readable output
```
**Examples:**
```bash
# List all tags
swagger-cli tags petstore
# Robot mode
swagger-cli tags petstore --robot
```
**Human output:**
```
Petstore API Tags (3 total)
pet (8 endpoints)
Everything about your Pets
store (4 endpoints)
Access to Petstore orders
user (7 endpoints)
Operations about user
```
**Robot output:**
```json
{
"ok": true,
"data": {
"tags": [
{
"name": "pet",
"description": "Everything about your Pets",
"endpoint_count": 8
},
{
"name": "store",
"description": "Access to Petstore orders",
"endpoint_count": 4
},
{
"name": "user",
"description": "Operations about user",
"endpoint_count": 7
}
]
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "tags",
"alias": "petstore",
"total_tags": 3,
"duration_ms": 3
}
}
```
### FR-9: Health Check and Doctor
**Description:** Validate installation and cache health.
**Acceptance Criteria:**
- ✓ Check config directory exists
- ✓ Validate cached specs
- ✓ Detect corrupted files
- ✓ Detect partial/in-progress caches (e.g., raw/index present but meta missing; generation/hash mismatch between meta and index)
- ✓ Warn if config file permissions are insecure (auth tokens present + file is group/world readable)
- ✓ Verify all index pointers (operation_ptr, schema_ptr) resolve to existing JSON nodes in raw.json
- ✓ If raw exists but index is missing/invalid or index_version mismatched, rebuild index (when `--fix` enabled)
- ✓ Check for stale caches
- ✓ Report disk usage
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli doctor [OPTIONS]
OPTIONS:
--fix Auto-fix issues after acquiring per-alias lock:
rebuild index.json from raw if possible (preferred);
remove alias only if raw is unreadable/unparseable
--robot Machine-readable output
```
**Examples:**
```bash
# Check health
swagger-cli doctor
# Auto-fix
swagger-cli doctor --fix
# Robot mode
swagger-cli doctor --robot
```
**Human output:**
```
swagger-cli Health Check
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✓ Config directory: /Users/user/.config/swagger-cli
✓ Cache directory: /Users/user/.cache/swagger-cli/aliases
✓ Permissions: Read/write OK
Cached APIs (3):
✓ petstore 45 KB Fetched 29 minutes ago
✓ stripe 2.4 MB Fetched 5 days ago
⚠ github 8.2 MB Fetched 45 days ago (stale)
Disk Usage:
Total cache: 10.6 MB
Health: HEALTHY
Warnings: 1 stale cache (github)
```
**Robot output:**
```json
{
"ok": true,
"data": {
"health": "healthy",
"config_dir": "/Users/user/.config/swagger-cli",
"cache_dir": "/Users/user/.cache/swagger-cli/aliases",
"permissions": {
"config_readable": true,
"config_writable": true,
"cache_readable": true,
"cache_writable": true
},
"aliases": [
{
"name": "petstore",
"status": "healthy",
"size_bytes": 46080,
"age_days": 0.02,
"is_stale": false
},
{
"name": "github",
"status": "stale",
"size_bytes": 8601600,
"age_days": 45,
"is_stale": true
}
],
"disk_usage": {
"total_bytes": 11141120,
"total_mb": 10.6
},
"warnings": [
{
"code": "STALE_CACHE",
"alias": "github",
"message": "Cache is 45 days old",
"suggestion": "Run: swagger-cli sync github"
}
]
}
}
```
**Decision rationale:**
- **Stale threshold:** 30 days (warning only, not error)
- **Auto-fix scope:** Prefer repair over deletion; delete alias only as last resort when raw is unreadable. Never deletes stale caches. Repair modes:
1. If raw exists but index is missing/invalid or `index_version` mismatched → rebuild index from raw.
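2. If raw + index are valid but meta is missing → reconstruct meta from raw + index (compute content_hash and raw_hash, copy generation from index.json so the read-protocol check `meta.generation == index.generation` still passes, compute index_hash).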
3. If raw is unreadable/unparseable → delete alias (last resort).
- **Integrity validation:** Uses generation + index_hash from meta.json to detect torn/partial cache state. If meta is missing but raw/index exist, treat as incomplete (fixable via repair mode 2).
- **Security hygiene:** If auth tokens exist in config, doctor warns on insecure config permissions (group/world readable). Suggests `chmod 600`.
- **Disk usage tracking:** Useful for large deployments
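
The three repair modes above amount to a small decision table. The sketch below is one way to express it; `FileState`, `RepairAction`, and `plan_repair` are hypothetical names, and two choices are assumptions rather than stated requirements: a missing `raw.json` is treated like an unreadable one, and an invalid `meta.json` is treated like a missing one.

```rust
/// Observed state of one file in an alias directory (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileState {
    Valid,
    Missing,
    /// Unreadable, unparseable, or failing a version/hash check.
    Invalid,
}

/// What `doctor --fix` would do for an alias (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum RepairAction {
    NoAction,
    RebuildIndex,
    ReconstructMeta,
    DeleteAlias,
}

/// Maps the state of raw.json, index.json, and meta.json to a repair action.
pub fn plan_repair(raw: FileState, index: FileState, meta: FileState) -> RepairAction {
    match (raw, index, meta) {
        // Mode 3: without a usable raw spec nothing can be rebuilt (last resort).
        (FileState::Missing | FileState::Invalid, _, _) => RepairAction::DeleteAlias,
        // Mode 1: raw is fine but the index is missing, invalid, or mismatched.
        (FileState::Valid, FileState::Missing | FileState::Invalid, _) => {
            RepairAction::RebuildIndex
        }
        // Mode 2: raw and index are fine but the commit marker is not.
        (FileState::Valid, FileState::Valid, FileState::Missing | FileState::Invalid) => {
            RepairAction::ReconstructMeta
        }
        (FileState::Valid, FileState::Valid, FileState::Valid) => RepairAction::NoAction,
    }
}
```

Staleness is deliberately absent from the inputs, which matches the rule that `--fix` never deletes stale caches.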
### FR-10: Cache Lifecycle Management
**Description:** Manage cache growth, retention, and disk usage.
**Acceptance Criteria:**
- ✓ Show per-alias and total cache usage statistics
- ✓ Prune stale aliases older than configurable threshold
- ✓ Enforce global cache size cap via LRU eviction (optional)
- ✓ Robot mode JSON
**Command:**
```bash
swagger-cli cache [OPTIONS]
OPTIONS:
--stats Show per-alias and total cache usage (default action)
--prune-stale Delete aliases older than stale threshold (default: 90 days)
--prune-threshold <DAYS> Override stale threshold for pruning (default: 90)
--max-total-mb <N> Enforce global cache cap via LRU eviction (evicts oldest-accessed aliases first)
--dry-run Show what would be pruned/evicted without deleting
--robot Machine-readable output
```
**Examples:**
```bash
# Show cache stats
swagger-cli cache --stats
# Prune stale caches
swagger-cli cache --prune-stale
# Enforce 500MB cap
swagger-cli cache --max-total-mb 500
# Preview what would be pruned
swagger-cli cache --prune-stale --dry-run --robot
```
**Robot output:**
```json
{
"ok": true,
"data": {
"aliases": [
{
"name": "petstore",
"size_bytes": 46080,
"last_accessed": "2026-02-12T10:00:00Z",
"age_days": 10
}
],
"total_bytes": 11141120,
"total_mb": 10.6,
"pruned": [],
"evicted": []
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "cache",
"duration_ms": 5
}
}
```
**Decision rationale:**
- **Separate from doctor:** Doctor validates health; cache manages lifecycle. Different concerns.
- **LRU eviction:** Uses coalesced `last_accessed` writes (the timestamp is rewritten only when the stored value is more than 10 minutes old) to reduce lock contention and write amplification under frequent agent queries. LRU ordering stays accurate to within that 10-minute window while avoiding per-query metadata rewrites.
- **Conservative defaults:** Prune threshold is 90 days (not 30 — that's doctor's stale warning). No automatic eviction; `--max-total-mb` must be explicitly requested.
- **Dry-run support:** Critical for agents to preview impact before destructive operations.
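
A minimal sketch of the two mechanisms in this rationale: the coalesced `last_accessed` check and LRU eviction under `--max-total-mb`. The names `AliasUsage`, `should_touch`, and `plan_eviction` are hypothetical, timestamps are plain Unix seconds to keep the example std-only, and the name-based tie-break is an assumption added for determinism.

```rust
/// Minimal stand-in for the per-alias fields the cache command needs.
#[derive(Debug, Clone)]
pub struct AliasUsage {
    pub name: String,
    pub size_bytes: u64,
    /// Unix seconds of the last recorded access.
    pub last_accessed: u64,
}

/// Coalescing window: 10 minutes, per the rationale above.
const TOUCH_INTERVAL_SECS: u64 = 600;

/// True when the stored `last_accessed` is old enough to be rewritten.
pub fn should_touch(stored_last_accessed: u64, now: u64) -> bool {
    now.saturating_sub(stored_last_accessed) > TOUCH_INTERVAL_SECS
}

/// Returns alias names to evict, least recently accessed first,
/// until the remaining total fits under `max_total_bytes`.
pub fn plan_eviction(aliases: &[AliasUsage], max_total_bytes: u64) -> Vec<String> {
    let mut total: u64 = aliases.iter().map(|a| a.size_bytes).sum();
    let mut by_age: Vec<&AliasUsage> = aliases.iter().collect();
    // Oldest access first; the name breaks ties so the plan is deterministic.
    by_age.sort_by(|a, b| {
        a.last_accessed
            .cmp(&b.last_accessed)
            .then_with(|| a.name.cmp(&b.name))
    });

    let mut evicted = Vec::new();
    for alias in by_age {
        if total <= max_total_bytes {
            break;
        }
        total -= alias.size_bytes;
        evicted.push(alias.name.clone());
    }
    evicted
}
```

Because `plan_eviction` only returns names, the same result can back both `--dry-run` output and the real deletion path.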
### FR-11: Spec Diff (Phase 2)
**Description:** Compare two spec states and report structural changes.
**Acceptance Criteria:**
- ✓ Compare alias vs alias, alias vs URL, or alias generation vs generation
- ✓ Report added, removed, and modified endpoints and schemas
- ✓ Robot mode emits machine-actionable diff summary
- ✓ `--fail-on breaking` exits non-zero when breaking changes are detected (useful for CI gates)
**Command:**
```bash
swagger-cli diff <LEFT> <RIGHT> [OPTIONS]
OPTIONS:
--fail-on <LEVEL> Exit non-zero if changes at this level: breaking (default: none)
--details Include per-item change descriptions
--robot Machine-readable output
```
**Examples:**
```bash
# Compare two aliases
swagger-cli diff petstore-v1 petstore-v2 --robot
# Compare alias against remote URL (fetches RIGHT as temp)
swagger-cli diff petstore https://api.example.com/openapi.json --robot
# CI gate: fail if breaking changes
swagger-cli diff petstore-prod petstore-staging --fail-on breaking --robot
```
**Robot output:**
```json
{
"ok": true,
"data": {
"left": "petstore-v1",
"right": "petstore-v2",
"changes": {
"endpoints": {
"added": [["POST", "/pets/batch"]],
"removed": [],
"modified": [["PUT", "/pet"]]
},
"schemas": {
"added": ["BatchRequest"],
"removed": [],
"modified": ["Pet"]
},
"summary": {
"total_changes": 3,
"has_breaking": false
}
}
},
"meta": {
"schema_version": 1,
"tool_version": "1.0.0",
"command": "diff",
"duration_ms": 45
}
}
```
**Decision rationale:**
- **Structural diff, not text diff:** Compares normalized index structures (added/removed/modified endpoints and schemas), not raw JSON text. Produces actionable output for agents and CI.
- **Breaking-change classification (Phase 3 enhancement):** Full breaking/non-breaking/unknown classification requires heuristics (e.g., removed required field = breaking, added optional field = non-breaking, changed type = breaking). Phase 2 reports structural changes; Phase 3 adds semantic classification.
- **Leverages sync infrastructure:** Uses the same index-comparison logic already built for `sync --details`.
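
The structural comparison described above can be sketched as a set difference over endpoint keys plus a fingerprint check for modifications. This is an illustration under stated assumptions: `EndpointMap`, `EndpointDiff`, and `diff_endpoints` are hypothetical names, and the per-endpoint fingerprint string (any stable hash or serialization of the indexed fields) is not something the index format defines.

```rust
use std::collections::BTreeMap;

/// (method, path) identifies an endpoint; the value is a fingerprint of its
/// indexed fields (hypothetical; the index format does not define one).
pub type EndpointMap = BTreeMap<(String, String), String>;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct EndpointDiff {
    pub added: Vec<(String, String)>,
    pub removed: Vec<(String, String)>,
    pub modified: Vec<(String, String)>,
}

/// Compares two endpoint maps structurally (no raw JSON text involved).
pub fn diff_endpoints(left: &EndpointMap, right: &EndpointMap) -> EndpointDiff {
    let mut diff = EndpointDiff::default();
    for (key, right_fp) in right {
        match left.get(key) {
            // Present only on the right side.
            None => diff.added.push(key.clone()),
            // Present on both sides with different content.
            Some(left_fp) if left_fp != right_fp => diff.modified.push(key.clone()),
            Some(_) => {}
        }
    }
    for key in left.keys() {
        if !right.contains_key(key) {
            diff.removed.push(key.clone());
        }
    }
    diff
}
```

Output order here follows `BTreeMap` key order, which is deterministic but is not the normative (path, method_rank, method) ordering; a real implementation would sort the lists before emitting them.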
---
## Technical Architecture
### Technology Stack
**Language:** Rust 1.93+ (stable baseline, Jan 2026) -- all CI/Docker builds must pin to the same toolchain
**Core dependencies:**
```toml
[dependencies]
# HTTP client — rustls for portable musl/alpine builds (no OpenSSL toolchain needed)
reqwest = { version = "0.13", default-features = false, features = ["json", "blocking", "rustls-tls"] }
# JSON + YAML processing
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
# CLI framework
clap = { version = "4.5", features = ["derive", "env"] }
# Error handling
anyhow = "1.0"
thiserror = "1.0"
# Config management
toml = "0.8"
directories = "5.0"
# Output formatting
colored = "2.0"
tabled = "0.15"
# Date/time
chrono = { version = "0.4", features = ["serde"] }
# Regex
regex = "1.10"
# Hashing
sha2 = "0.10"
# File locking (for concurrent cache safety)
fs2 = "0.4"
[dev-dependencies]
assert_cmd = "2.0"
predicates = "3.0"
tempfile = "3.8"
mockito = "1.2"
criterion = "0.5"
```
**Rationale:**
- **Rust 1.93+:** Current stable; enables modern std features for terminal detection (no `atty` dependency needed)
- **reqwest 0.13 + rustls:** Current release line with `default-features = false` + `rustls-tls` to avoid OpenSSL dependency. This is critical for the Alpine/musl Dockerfile and simplifies cross-compilation.
- **clap v4:** Modern CLI framework with derive macros
- **serde:** De facto serialization framework in the Rust ecosystem; `serde_json` and `serde_yaml` build on it
- **toml:** Config serialization (note: crate name is `toml`, not `serde_toml`)
- **fs2:** Cross-platform file locking for concurrent cache safety
### Project Structure
```
swagger-cli/
├── Cargo.toml
├── Cargo.lock
├── README.md
├── LICENSE
├── .gitlab-ci.yml
├── src/
│ ├── main.rs # Entry point, CLI setup
│ ├── lib.rs # Public library interface
│ │
│ ├── cli/
│ │ ├── mod.rs # CLI command definitions
│ │ ├── fetch.rs # Fetch command
│ │ ├── list.rs # List command
│ │ ├── show.rs # Show command
│ │ ├── search.rs # Search command
│ │ ├── schemas.rs # Schema browsing
│ │ ├── tags.rs # Tag browsing
│ │ ├── aliases.rs # Alias management
│ │ ├── sync.rs # Sync command
│ │ ├── doctor.rs # Health check
│ │ ├── cache.rs # Cache lifecycle management
│ │ └── diff.rs # Spec diff (Phase 2)
│ │
│ ├── core/
│ │ ├── mod.rs # Core types and traits
│ │ ├── spec.rs # OpenAPI spec parsing
│ │ ├── cache.rs # Cache management
│ │ ├── config.rs # Configuration
│ │ └── search.rs # Search engine
│ │
│ ├── output/
│ │ ├── mod.rs # Output formatting
│ │ ├── human.rs # Human-readable output
│ │ ├── robot.rs # Robot mode JSON
│ │ └── table.rs # Table formatting
│ │
│ ├── errors.rs # Error types and codes
│ └── utils.rs # Shared utilities
├── benches/
│ └── perf.rs # Criterion benchmarks
├── tests/
│ ├── integration/
│ │ ├── fetch_test.rs
│ │ ├── list_test.rs
│ │ ├── show_test.rs
│ │ ├── search_test.rs
│ │ ├── aliases_test.rs
│ │ └── golden/ # Golden robot output snapshots
│ │ ├── fetch_success.json
│ │ ├── list_success.json
│ │ ├── show_success.json
│ │ └── error_alias_not_found.json
│ │
│ └── fixtures/
│ ├── petstore.json # Sample OpenAPI spec (JSON)
│ ├── petstore.yaml # Same spec in YAML (for format normalization tests)
│ ├── github.json # Large spec (8MB+) for perf testing
│ ├── external-refs.json # Spec with external $ref pointers (for bundling tests)
│ └── fastapi.json
├── docs/
│ ├── architecture.md # This file
│ ├── contributing.md
│ ├── examples.md
│ └── robot-schema/
│ └── v1/
│ ├── success.schema.json # JSON Schema for robot success responses
│ └── error.schema.json # JSON Schema for robot error responses
└── deny.toml # cargo-deny configuration (license + advisory policies)
```
### Data Models
**Design principle:** The raw OpenAPI spec is stored losslessly as JSON bytes (`raw.json`). Commands operate primarily on a normalized index (`index.json`). This avoids the fragility of a full typed OpenAPI model (which would break across 3.0/3.1 differences, extensions, and spec variations) while keeping query commands fast by avoiding multi-MB JSON deserialization.
**Index types (what query commands load):**
```rust
// src/core/spec.rs
use serde::{Deserialize, Serialize};
/// Precomputed index derived from the raw spec on fetch.
/// This is what list/search/tags/schemas/aliases/doctor load.
/// Typically 10-50KB even for large specs.
///
/// Determinism requirements (normative):
/// - endpoints MUST be sorted by (path ASC, method_rank ASC, method ASC)
/// - schemas MUST be sorted by (name ASC)
/// - tags MUST be sorted by (name ASC)
/// This guarantees stable robot output ordering, stable sync diffs, and meaningful golden tests.
///
/// Canonical HTTP method order (normative):
/// GET=0, POST=1, PUT=2, PATCH=3, DELETE=4, OPTIONS=5, HEAD=6, TRACE=7
/// Unknown methods sort last (rank=99), then lexicographically.
///
/// Tie-breaking for search results (normative):
/// - Primary: score DESC
/// - Secondary: type (endpoint before schema)
/// - Tertiary: path/name ASC, method_rank ASC
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SpecIndex {
/// Bump when index format changes (forces re-index on load)
pub index_version: u32,
/// Mirrors CacheMetadata.generation for torn-write detection.
/// Read protocol validates: meta.generation == index.generation.
pub generation: u64,
/// Mirrors CacheMetadata.content_hash for change detection / sanity checks.
pub content_hash: String,
pub openapi: String,
pub info: IndexInfo,
pub endpoints: Vec<IndexedEndpoint>,
pub schemas: Vec<IndexedSchema>,
pub tags: Vec<IndexedTag>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IndexInfo {
pub title: String,
pub version: String,
pub description: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IndexedEndpoint {
pub path: String,
pub method: String,
pub summary: Option<String>,
pub description: Option<String>,
pub operation_id: Option<String>,
pub tags: Vec<String>,
pub deprecated: bool,
/// Minimal parameter descriptors for agent planning (no schema expansion here).
pub parameters: Vec<IndexedParam>,
/// True if requestBody exists and is required.
pub request_body_required: bool,
/// Media types present under requestBody.content (e.g. ["application/json"]).
pub request_body_content_types: Vec<String>,
/// Effective security for this operation, applying OpenAPI override semantics:
/// - if operation.security is absent → inherit root-level security
/// - if operation.security == [] → explicitly no auth
/// Flattened to scheme names for compactness.
pub security_schemes: Vec<String>,
/// True if auth is required (i.e., effective security is non-empty).
pub security_required: bool,
/// JSON pointer into raw.json (e.g. "/paths/~1pet/get")
/// Used by `show` command to extract full operation details
pub operation_ptr: String,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IndexedParam {
pub name: String,
/// "path" | "query" | "header" | "cookie"
pub location: String,
pub required: bool,
/// Optional short description (truncated during indexing to keep index small)
pub description: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IndexedSchema {
pub name: String,
/// JSON pointer into raw.json (e.g. "/components/schemas/Pet")
pub schema_ptr: String,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct IndexedTag {
pub name: String,
pub description: Option<String>,
pub endpoint_count: usize,
}
```
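
The normative endpoint ordering in the `SpecIndex` doc comment can be sketched as a comparator. `method_rank` and `compare_endpoints` are hypothetical helper names; the rank table is copied from the canonical method order above, and uppercasing the method before ranking is an assumption.

```rust
use std::cmp::Ordering;

/// Rank from the canonical HTTP method order; unknown methods get 99.
pub fn method_rank(method: &str) -> u8 {
    match method.to_ascii_uppercase().as_str() {
        "GET" => 0,
        "POST" => 1,
        "PUT" => 2,
        "PATCH" => 3,
        "DELETE" => 4,
        "OPTIONS" => 5,
        "HEAD" => 6,
        "TRACE" => 7,
        _ => 99,
    }
}

/// Compares two (path, method) pairs by (path ASC, method_rank ASC, method ASC).
pub fn compare_endpoints(a: (&str, &str), b: (&str, &str)) -> Ordering {
    a.0.cmp(b.0)
        .then_with(|| method_rank(a.1).cmp(&method_rank(b.1)))
        // Unknown methods share rank 99, so fall back to lexicographic order.
        .then_with(|| a.1.cmp(b.1))
}
```

Sorting once at index-build time means query commands can emit endpoints in stored order without re-sorting.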
**Why not a full typed OpenAPI model?**
A typed `OpenApiSpec` struct (even a complete one) will either:
- Reject real-world specs when deserialization hits unexpected fields or 3.1 JSON Schema constructs
- Silently drop unknown fields (serde ignores them by default), losing data agents might need, unless every struct carries a `#[serde(flatten)]` catch-all map
Instead:
- **Raw spec** is stored as exact bytes, parsed as `serde_json::Value` only when `show` or `schemas --show` needs full operation/schema details
- **Index** extracts only the fields needed for listing, filtering, searching, and linking back to raw via JSON pointers
- This makes the tool compatible with any valid OpenAPI 3.0.x or 3.1.x spec, including those with custom extensions
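
Linking the index back to raw via JSON pointers depends on escaping path segments correctly, since OpenAPI paths contain `/`. A minimal sketch of building `operation_ptr` and `schema_ptr` values follows; the function names are hypothetical, and the escaping follows RFC 6901 (`~` becomes `~0`, `/` becomes `~1`). Resolving a pointer against the parsed spec can then use `serde_json::Value::pointer`.

```rust
/// Escapes one JSON Pointer reference token per RFC 6901.
/// `~` is replaced first so the `~1` produced for `/` is not escaped again.
pub fn escape_pointer_token(token: &str) -> String {
    token.replace('~', "~0").replace('/', "~1")
}

/// Builds the pointer for an operation, e.g. ("/pet", "GET") -> "/paths/~1pet/get".
pub fn operation_ptr(path: &str, method: &str) -> String {
    format!(
        "/paths/{}/{}",
        escape_pointer_token(path),
        method.to_ascii_lowercase()
    )
}

/// Builds the pointer for a named component schema.
pub fn schema_ptr(name: &str) -> String {
    format!("/components/schemas/{}", escape_pointer_token(name))
}
```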
**Cache metadata and management:**
```rust
// src/core/cache.rs
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::path::PathBuf;
/// Stored as meta.json in each alias directory
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct CacheMetadata {
pub alias: String,
pub url: String,
pub fetched_at: DateTime<Utc>,
/// Updated on query commands for LRU eviction ordering.
/// Coalesced: written only when >10min stale to reduce write amplification.
pub last_accessed: DateTime<Utc>,
pub content_hash: String,
/// SHA256 hash of raw.json bytes. Commands that load raw.json (show, schemas --show)
/// validate this to detect raw file corruption.
pub raw_hash: String,
pub etag: Option<String>,
pub last_modified: Option<String>,
pub spec_version: String,
pub spec_title: String,
pub endpoint_count: usize,
pub schema_count: usize,
pub raw_size_bytes: u64,
/// Original input format: "json" | "yaml"
pub source_format: String,
pub index_version: u32,
/// Monotonically increasing; bumped on every successful fetch/sync commit.
pub generation: u64,
/// SHA256 hash of index.json bytes for integrity checking.
/// Readers validate this against the actual index.json to detect torn state.
pub index_hash: String,
}
impl CacheMetadata {
pub fn is_stale(&self, threshold_days: u32) -> bool {
let age = Utc::now() - self.fetched_at;
age.num_days() > threshold_days as i64
}
}
/// Cache directory layout:
/// ~/.cache/swagger-cli/aliases/<alias>/
/// ├── raw.source # Original upstream bytes as fetched (json, yaml, or gz — lossless provenance)
/// ├── raw.json # Canonical normalized JSON (always JSON; YAML normalized at ingest)
/// ├── index.json # Precomputed SpecIndex (small, fast)
/// ├── meta.json # CacheMetadata
/// └── .lock # File lock (held during writes)
///
/// Write protocol (crash-consistent, multi-file):
/// 1. Acquire exclusive lock on .lock
/// - Lock acquisition is bounded (default <= 1000ms). On timeout: CACHE_LOCKED (exit 9).
/// - Prevents permanent stalls from dead processes holding locks.
/// 2. Compute content_hash + raw_hash + next generation number + index_hash
/// 3. Write raw.source.tmp + raw.json.tmp + index.json.tmp
/// - raw.source.tmp: original bytes as fetched (JSON or YAML)
/// - raw.json.tmp: canonical normalized JSON (YAML normalized at this step)
/// - MUST call sync_all() on each tmp file before rename (ensures durability)
/// 4. Rename raw.source.tmp → raw.source; raw.json.tmp → raw.json; index.json.tmp → index.json
/// - MUST call sync_all() on each file after rename
/// 5. Write meta.json.tmp LAST (acts as commit marker; includes generation + hashes)
/// - MUST call sync_all() on meta.json.tmp before rename
/// 6. Rename meta.json.tmp → meta.json
/// - MUST call sync_all() on meta.json after rename
/// 6b. Best-effort: fsync the alias directory fd on Unix after renames
/// (ensures directory entries are durable; no-op on platforms that don't support it)
/// 7. Release lock
///
/// Read protocol:
/// - Read meta.json first (commit marker). If missing → alias incomplete/partial.
/// - Read index.json. Validate ALL THREE match meta.json:
/// 1. meta.index_version == index.index_version
/// 2. meta.generation == index.generation
/// 3. meta.index_hash == sha256(index.json bytes)
/// - If any mismatch → surface CACHE_INTEGRITY error (doctor --fix can rebuild from raw).
/// - `show` command: additionally reads raw.json as serde_json::Value
/// and validates meta.raw_hash == sha256(raw.json bytes) to detect raw corruption.
/// - Coalesce last_accessed updates: write only when the stored value is more than 10 minutes older than the current time.
/// This reduces write amplification and lock contention for hot-read bursts (best-effort, no lock required).
///
/// Hash computation:
pub fn compute_hash(raw_bytes: &[u8]) -> String {
use sha2::{Sha256, Digest};
let mut hasher = Sha256::new();
hasher.update(raw_bytes);
format!("sha256:{:x}", hasher.finalize())
}
```
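
The core of the write protocol (temp file, `sync_all`, rename, meta last) can be sketched with the standard library alone. This is a simplified illustration, not the full protocol: `write_durable` and `commit_alias` are hypothetical names, and the sketch omits lock acquisition (step 1), `raw.source`, hash computation, and the post-rename `sync_all` calls.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `<dir>/<name>` via a temp file: write, fsync, rename.
pub fn write_durable(dir: &Path, name: &str, bytes: &[u8]) -> io::Result<()> {
    let tmp_path = dir.join(format!("{name}.tmp"));
    let final_path = dir.join(name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(bytes)?;
    // Flush data and metadata before the rename makes the file visible.
    tmp.sync_all()?;
    drop(tmp);

    fs::rename(&tmp_path, &final_path)
}

/// Commits payload files first and meta.json last, so meta acts as the commit marker.
pub fn commit_alias(dir: &Path, raw: &[u8], index: &[u8], meta: &[u8]) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    write_durable(dir, "raw.json", raw)?;
    write_durable(dir, "index.json", index)?;
    write_durable(dir, "meta.json", meta)?;
    // Best-effort directory fsync (step 6b); failures are ignored on purpose.
    if let Ok(d) = File::open(dir) {
        let _ = d.sync_all();
    }
    Ok(())
}
```

A crash before the final rename leaves the previous `meta.json` (or none) in place, so readers following the read protocol see either the old generation or an incomplete alias, never a mix presented as valid.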
**Configuration:**
```rust
// src/core/config.rs
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::path::PathBuf;
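// Container-level `serde(default)` fills any field missing from config.toml
// from `Config::default()`, so a partial config file still parses.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(default)]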
pub struct Config {
pub default_alias: Option<String>,
pub stale_threshold_days: u32,
pub auth_profiles: HashMap<String, AuthConfig>,
#[serde(default)]
pub display: DisplayConfig,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AuthConfig {
pub url_pattern: String,
pub auth_type: AuthType,
pub credential: CredentialSource,
}
/// Credential resolution: Literal for backward compat, EnvVar for CI/agent use,
/// Keyring for desktop environments (Phase 2).
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "source", rename_all = "snake_case")]
pub enum CredentialSource {
/// Inline token (backward-compatible; doctor warns on insecure perms)
Literal { token: String },
/// Read token from environment variable at runtime
EnvVar { var_name: String },
/// OS keychain lookup (Phase 2 — macOS Keychain, Linux Secret Service)
Keyring { service: String, account: String },
}
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum AuthType {
Bearer,
ApiKey { header: String },
}
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct DisplayConfig {
pub color: bool,
pub unicode: bool,
pub max_width: Option<usize>,
}
impl Config {
pub fn load() -> anyhow::Result<Self> {
let config_path = Self::config_path()?;
if config_path.exists() {
let content = std::fs::read_to_string(&config_path)?;
Ok(toml::from_str(&content)?)
} else {
Ok(Self::default())
}
}
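pub fn save(&self) -> anyhow::Result<()> {
let config_path = Self::config_path()?;
// Create the config directory on first save; `fs::write` does not create parents.
if let Some(parent) = config_path.parent() {
std::fs::create_dir_all(parent)?;
}
let content = toml::to_string_pretty(self)?;
std::fs::write(&config_path, content)?;
Ok(())
}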
/// Implements D7 override precedence:
/// 1. --config <path> / SWAGGER_CLI_CONFIG (handled at call site)
/// 2. SWAGGER_CLI_HOME (base dir; implies config/cache under it)
/// 3. XDG defaults via directories::ProjectDirs
pub fn config_path() -> anyhow::Result<PathBuf> {
if let Ok(home) = std::env::var("SWAGGER_CLI_HOME") {
return Ok(PathBuf::from(home).join("config").join("config.toml"));
}
let dirs = directories::ProjectDirs::from("", "", "swagger-cli")
.ok_or_else(|| anyhow::anyhow!("Cannot determine config directory"))?;
Ok(dirs.config_dir().join("config.toml"))
}
/// Implements D7 override precedence:
/// 1. SWAGGER_CLI_CACHE (cache dir only; highest for cache)
/// 2. SWAGGER_CLI_HOME (base dir; implies cache under it)
/// 3. XDG defaults via directories::ProjectDirs
pub fn cache_dir() -> anyhow::Result<PathBuf> {
if let Ok(cache) = std::env::var("SWAGGER_CLI_CACHE") {
return Ok(PathBuf::from(cache));
}
if let Ok(home) = std::env::var("SWAGGER_CLI_HOME") {
return Ok(PathBuf::from(home).join("cache"));
}
let dirs = directories::ProjectDirs::from("", "", "swagger-cli")
.ok_or_else(|| anyhow::anyhow!("Cannot determine cache directory"))?;
Ok(dirs.cache_dir().to_path_buf())
}
}
impl Default for Config {
fn default() -> Self {
Self {
default_alias: None,
stale_threshold_days: 30,
auth_profiles: HashMap::new(),
display: DisplayConfig::default(),
}
}
}
```
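
How a `CredentialSource` might be turned into a token at request time can be sketched as follows. This is an assumption-laden illustration: the enum is a serde-free local copy so the example is std-only, `resolve_credential` is a hypothetical name, errors are plain strings rather than `SwaggerCliError`, and the environment lookup is injected as a closure so the logic can be tested without touching the real environment.

```rust
/// Local copy of the credential enum, without serde derives.
#[derive(Debug, Clone)]
pub enum CredentialSource {
    Literal { token: String },
    EnvVar { var_name: String },
    Keyring { service: String, account: String },
}

/// Resolves a credential to a token string.
/// `env_lookup` is injected (hypothetical design) to keep the function pure.
pub fn resolve_credential<F>(source: &CredentialSource, env_lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    match source {
        CredentialSource::Literal { token } => Ok(token.clone()),
        CredentialSource::EnvVar { var_name } => env_lookup(var_name)
            // Treat an empty variable the same as an unset one.
            .filter(|v| !v.is_empty())
            .ok_or_else(|| format!("environment variable '{var_name}' is not set or empty")),
        CredentialSource::Keyring { service, account } => Err(format!(
            "keyring lookup for {service}/{account} is not available until Phase 2"
        )),
    }
}
```

In production code the closure would typically be `|name| std::env::var(name).ok()`.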
**Error types:**
```rust
// src/errors.rs
use thiserror::Error;
#[derive(Error, Debug)]
pub enum SwaggerCliError {
#[error("Usage error: {0}")]
Usage(String),
#[error("Cache locked for alias '{0}'")]
CacheLocked(String),
#[error("Network error: {0}")]
Network(#[from] reqwest::Error),
#[error("Invalid OpenAPI specification: {0}")]
InvalidSpec(String),
#[error("Alias '{0}' not found")]
AliasNotFound(String),
#[error("Alias '{0}' already exists")]
AliasExists(String),
#[error("Cache error: {0}")]
Cache(String),
#[error("Cache integrity error: {0}")]
CacheIntegrity(String),
#[error("Config error: {0}")]
Config(String),
#[error("Authentication failed: {0}")]
Auth(String),
#[error("IO error: {0}")]
Io(#[from] std::io::Error),
#[error("JSON error: {0}")]
Json(#[from] serde_json::Error),
#[error("Offline mode: {0}")]
OfflineMode(String),
#[error("Policy blocked: {0}")]
PolicyBlocked(String),
}
impl SwaggerCliError {
/// Convert to exit code
pub fn exit_code(&self) -> i32 {
match self {
Self::Usage(_) => 2,
Self::CacheLocked(_) => 9,
Self::Network(_) => 4,
Self::InvalidSpec(_) => 5,
Self::AliasExists(_) => 6,
Self::Auth(_) => 7,
Self::AliasNotFound(_) => 8,
Self::Cache(_) => 10,
Self::CacheIntegrity(_) => 14,
Self::Config(_) => 11,
Self::Io(_) => 12,
Self::Json(_) => 13,
Self::OfflineMode(_) => 15,
Self::PolicyBlocked(_) => 16,
}
}
/// Error code for robot mode
pub fn code(&self) -> &'static str {
match self {
Self::Usage(_) => "USAGE_ERROR",
Self::CacheLocked(_) => "CACHE_LOCKED",
Self::Network(_) => "NETWORK_ERROR",
Self::InvalidSpec(_) => "INVALID_SPEC",
Self::AliasExists(_) => "ALIAS_EXISTS",
Self::Auth(_) => "AUTH_FAILED",
Self::AliasNotFound(_) => "ALIAS_NOT_FOUND",
Self::Cache(_) => "CACHE_ERROR",
Self::CacheIntegrity(_) => "CACHE_INTEGRITY",
Self::Config(_) => "CONFIG_ERROR",
Self::Io(_) => "IO_ERROR",
Self::Json(_) => "JSON_ERROR",
Self::OfflineMode(_) => "OFFLINE_MODE",
Self::PolicyBlocked(_) => "POLICY_BLOCKED",
}
}
/// Suggestion for robot mode
pub fn suggestion(&self) -> Option<String> {
match self {
Self::Usage(_) => {
Some("Run: swagger-cli --help".to_string())
}
Self::CacheLocked(alias) => {
Some(format!("Retry later; another process is updating '{alias}'"))
}
Self::AliasNotFound(_) => {
Some("Run: swagger-cli aliases --list".to_string())
}
Self::AliasExists(_) => {
Some("Use --force to overwrite".to_string())
}
Self::CacheIntegrity(_) => {
Some("Run: swagger-cli doctor --fix".to_string())
}
Self::Auth(_) => {
Some("Check authentication credentials".to_string())
}
Self::OfflineMode(_) => {
Some("Remove --network offline or set SWAGGER_CLI_NETWORK=auto".to_string())
}
Self::PolicyBlocked(_) => {
Some("Use --allow-private-host <HOST> or --allow-insecure-http to bypass".to_string())
}
_ => None,
}
}
}
```
### Search Engine
**Index-backed text search with multi-term scoring:**
```rust
// src/core/search.rs
use crate::core::spec::{SpecIndex, IndexedEndpoint, IndexedSchema};
pub struct SearchEngine {
index: SpecIndex,
}
#[derive(Debug, Clone)]
pub struct SearchResult {
pub result_type: SearchResultType,
/// Relevance score quantized to an integer (hundredths of a point) for cross-platform determinism.
/// Computed internally as a float, then quantized at output: (raw_score * 100.0).round() as u32
pub score: u32,
/// 1-based rank in the result set (assigned after sorting).
pub rank: usize,
pub matches: Vec<Match>,
}
#[derive(Debug, Clone)]
pub enum SearchResultType {
Endpoint {
path: String,
method: String,
summary: Option<String>,
description: Option<String>,
tags: Vec<String>,
operation_id: Option<String>,
},
Schema {
name: String,
},
}
#[derive(Debug, Clone)]
pub struct Match {
pub field: String,
pub snippet: String,
}
impl SearchEngine {
/// Search engine operates on the precomputed index, never on raw.json.
pub fn new(index: SpecIndex) -> Self {
Self { index }
}
pub fn search(&self, query: &str, options: &SearchOptions) -> Vec<SearchResult> {
// Normalize query based on options
let normalized = if options.case_sensitive {
query.to_string()
} else {
query.to_lowercase()
};
// Tokenize unless exact phrase matching is requested
let terms = if options.exact {
vec![normalized.clone()]
} else {
tokenize(&normalized)
};
let mut results = Vec::new();
// Search endpoints
if options.search_paths || options.search_descriptions {
for endpoint in &self.index.endpoints {
if let Some(result) = self.match_endpoint(endpoint, &terms, options) {
results.push(result);
}
}
}
// Search schemas
if options.search_schemas {
for schema in &self.index.schemas {
if let Some(result) = self.match_schema(&schema.name, &terms, options) {
results.push(result);
}
}
}
// Sort by score descending with deterministic tie-breaking:
// (score DESC, type ordinal ASC, path/name ASC, method_rank ASC)
results.sort_by(|a, b| {
b.score.cmp(&a.score)
.then_with(|| a.result_type.type_ordinal().cmp(&b.result_type.type_ordinal()))
.then_with(|| a.result_type.sort_key().cmp(&b.result_type.sort_key()))
});
// Assign 1-based ranks after sorting
for (i, result) in results.iter_mut().enumerate() {
result.rank = i + 1;
}
// Apply limit
if let Some(limit) = options.limit {
results.truncate(limit);
}
results
}
fn match_endpoint(
&self,
endpoint: &IndexedEndpoint,
terms: &[String],
options: &SearchOptions,
) -> Option<SearchResult> {
let mut total_score = 0.0;
let mut matches = Vec::new();
let mut terms_matched = 0;
for term in terms {
let mut term_matched = false;
// Match path (highest weight: 10); skipped unless path search is enabled
let path_norm = self.normalize(&endpoint.path, options.case_sensitive);
if options.search_paths && path_norm.contains(term.as_str()) {
total_score += 10.0;
term_matched = true;
matches.push(Match {
field: "path".to_string(),
snippet: safe_snippet(&endpoint.path, term, 50),
});
}
// Match summary (weight: 5); gated by search_descriptions (assumption: summary counts as description text)
if let Some(summary) = endpoint.summary.as_ref().filter(|_| options.search_descriptions) {
let summary_norm = self.normalize(summary, options.case_sensitive);
if summary_norm.contains(term.as_str()) {
total_score += 5.0;
term_matched = true;
matches.push(Match {
field: "summary".to_string(),
snippet: safe_snippet(summary, term, 50),
});
}
}
// Match description (weight: 2); skipped unless description search is enabled
if let Some(description) = endpoint.description.as_ref().filter(|_| options.search_descriptions) {
let desc_norm = self.normalize(description, options.case_sensitive);
if desc_norm.contains(term.as_str()) {
total_score += 2.0;
term_matched = true;
matches.push(Match {
field: "description".to_string(),
snippet: safe_snippet(description, term, 50),
});
}
}
if term_matched {
terms_matched += 1;
}
}
if total_score > 0.0 {
// Coverage boost: matching all terms scores higher
let coverage = terms_matched as f32 / terms.len() as f32;
total_score *= 1.0 + coverage;
Some(SearchResult {
result_type: SearchResultType::Endpoint {
path: endpoint.path.clone(),
method: endpoint.method.clone(),
summary: endpoint.summary.clone(),
description: endpoint.description.clone(),
tags: endpoint.tags.clone(),
operation_id: endpoint.operation_id.clone(),
},
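// Quantize the float score to the integer form declared on SearchResult.
score: f32::round(total_score * 100.0) as u32,
// Placeholder; search() assigns the final 1-based rank after sorting.
rank: 0,
matches,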
})
} else {
None
}
}
fn normalize(&self, text: &str, case_sensitive: bool) -> String {
if case_sensitive { text.to_string() } else { text.to_lowercase() }
}
}
/// Tokenize query into whitespace-separated terms.
/// `split_whitespace` never yields empty strings, so no extra filtering is needed.
fn tokenize(query: &str) -> Vec<String> {
query.split_whitespace().map(str::to_string).collect()
}
/// Create snippet with char-boundary-safe slicing (Unicode-safe).
/// Uses char_indices to find safe boundaries instead of byte offsets.
fn safe_snippet(text: &str, query: &str, context_chars: usize) -> String {
let text_lower = text.to_lowercase();
let query_lower = query.to_lowercase();
if let Some(byte_pos) = text_lower.find(&query_lower) {
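// Find char-safe boundaries. byte_pos comes from the lowercased copy, so for text
// whose lowercase form has a different byte length the window is approximate,
// but slicing always lands on char boundaries of the original text.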
let char_positions: Vec<(usize, char)> = text.char_indices().collect();
let match_char_idx = char_positions.iter()
.position(|(b, _)| *b >= byte_pos)
.unwrap_or(0);
let start_char = match_char_idx.saturating_sub(context_chars);
let end_char = (match_char_idx + query.chars().count() + context_chars)
.min(char_positions.len());
let start_byte = char_positions[start_char].0;
let end_byte = if end_char >= char_positions.len() {
text.len()
} else {
char_positions[end_char].0
};
let mut snippet = text[start_byte..end_byte].to_string();
if start_char > 0 {
snippet = format!("...{}", snippet);
}
if end_char < char_positions.len() {
snippet = format!("{}...", snippet);
}
snippet
} else {
text.chars().take(100).collect()
}
}
pub struct SearchOptions {
pub search_paths: bool,
pub search_descriptions: bool,
pub search_schemas: bool,
pub case_sensitive: bool,
pub exact: bool,
pub limit: Option<usize>,
}
impl Default for SearchOptions {
fn default() -> Self {
Self {
search_paths: true,
search_descriptions: true,
search_schemas: true,
case_sensitive: false,
exact: false,
limit: Some(20),
}
}
}
```
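To make the weighting concrete, the following is a minimal stand-alone sketch of the same arithmetic: path matches add 10, summary matches add 5, description matches add 2, and the total is multiplied by `1 + coverage`. `score_fields` is a hypothetical helper written for illustration only, and lowercasing both the fields and the terms is an assumption about the case-insensitive path.

```rust
/// Hypothetical stand-alone version of the scoring arithmetic above.
/// Weights (10 / 5 / 2) and the coverage multiplier follow the sketch;
/// lowercasing both sides is an assumption for case-insensitive search.
fn score_fields(
    path: &str,
    summary: Option<&str>,
    description: Option<&str>,
    terms: &[&str],
) -> f32 {
    if terms.is_empty() {
        return 0.0;
    }
    let fields: [(Option<String>, f32); 3] = [
        (Some(path.to_lowercase()), 10.0),
        (summary.map(str::to_lowercase), 5.0),
        (description.map(str::to_lowercase), 2.0),
    ];
    let mut total = 0.0;
    let mut terms_matched = 0usize;
    for term in terms {
        let term = term.to_lowercase();
        let mut matched = false;
        for (text, weight) in &fields {
            if let Some(text) = text {
                if text.contains(&term) {
                    total += *weight;
                    matched = true;
                }
            }
        }
        if matched {
            terms_matched += 1;
        }
    }
    // Coverage boost: the multiplier reaches 2.0 only when every term matched.
    let coverage = terms_matched as f32 / terms.len() as f32;
    total * (1.0 + coverage)
}
```

For the query `create user` against path `/users` with summary `Create user`, "create" contributes 5 and "user" contributes 10 + 5, so the base score is 20 and full coverage doubles it to 40. Replacing "user" with a term that matches nothing leaves a base of 5 and a multiplier of 1.5.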
### CLI Implementation
**Main entry point:**
```rust
// src/main.rs
use clap::Parser;
use swagger_cli::{cli, errors::SwaggerCliError};
fn main() {
// Pre-scan argv so parse/usage errors can still honor --robot.
// This prevents "sometimes JSON, sometimes clap text" output for agents.
let argv: Vec<String> = std::env::args().collect();
let robot_requested = argv.iter().any(|a| a == "--robot");
let cli = match cli::Cli::try_parse_from(&argv) {
Ok(v) => v,
Err(e) => {
if robot_requested {
let json = serde_json::json!({
"ok": false,
"error": {
"code": "USAGE_ERROR",
"message": e.to_string(),
"suggestion": "Run: swagger-cli --help"
},
"meta": { "schema_version": 1, "tool_version": env!("CARGO_PKG_VERSION"), "command": "parse", "duration_ms": 0 }
});
eprintln!("{}", serde_json::to_string(&json).unwrap());
std::process::exit(2);
}
e.exit();
}
};
// Robot mode is explicit only -- never auto-enabled by TTY detection.
// TTY detection is used only for color/unicode formatting decisions.
let robot_mode = cli.robot;
// Execute command
let result = match cli.command {
cli::Commands::Fetch(args) => cli::fetch::execute(args, robot_mode),
cli::Commands::List(args) => cli::list::execute(args, robot_mode),
cli::Commands::Show(args) => cli::show::execute(args, robot_mode),
cli::Commands::Search(args) => cli::search::execute(args, robot_mode),
cli::Commands::Schemas(args) => cli::schemas::execute(args, robot_mode),
cli::Commands::Tags(args) => cli::tags::execute(args, robot_mode),
cli::Commands::Aliases(args) => cli::aliases::execute(args, robot_mode),
cli::Commands::Sync(args) => cli::sync::execute(args, robot_mode),
cli::Commands::Doctor(args) => cli::doctor::execute(args, robot_mode),
cli::Commands::Cache(args) => cli::cache::execute(args, robot_mode),
cli::Commands::Diff(args) => cli::diff::execute(args, robot_mode),
};
// Handle result
match result {
Ok(()) => std::process::exit(0),
Err(e) => {
if robot_mode {
output_robot_error(&e);
} else {
output_human_error(&e);
}
std::process::exit(e.exit_code());
}
}
}
fn output_robot_error(error: &SwaggerCliError) {
let json = serde_json::json!({
"ok": false,
"error": {
"code": error.code(),
"message": error.to_string(),
"suggestion": error.suggestion(),
},
"meta": {
"schema_version": 1,
"tool_version": env!("CARGO_PKG_VERSION"),
"command": "error",
"duration_ms": 0
}
});
eprintln!("{}", serde_json::to_string(&json).unwrap());
}
fn output_human_error(error: &SwaggerCliError) {
use colored::Colorize;
eprintln!("{} {}", "Error:".red().bold(), error);
if let Some(suggestion) = error.suggestion() {
eprintln!("{} {}", "Suggestion:".yellow(), suggestion);
}
}
```
**CLI structure:**
```rust
// src/cli/mod.rs
use clap::{Parser, Subcommand};
#[derive(Parser)]
#[command(name = "swagger-cli")]
#[command(about = "Fast OpenAPI specification CLI tool", long_about = None)]
#[command(version)]
pub struct Cli {
#[command(subcommand)]
pub command: Commands,
/// Output JSON for machine parsing
#[arg(long, global = true)]
pub robot: bool,
/// Pretty-print output (human debugging). If combined with --robot, prints pretty JSON.
#[arg(long, global = true)]
pub pretty: bool,
/// Network policy: auto (default), offline, online-only
#[arg(long, global = true, default_value = "auto", value_parser = ["auto", "offline", "online-only"])]
pub network: String,
/// Path to config file
#[arg(long, global = true, env = "SWAGGER_CLI_CONFIG")]
pub config: Option<std::path::PathBuf>,
}
#[derive(Subcommand)]
pub enum Commands {
/// Fetch and cache OpenAPI spec
Fetch(fetch::FetchArgs),
/// List endpoints
List(list::ListArgs),
/// Show endpoint details
Show(show::ShowArgs),
/// Search across spec
Search(search::SearchArgs),
/// Browse schemas
Schemas(schemas::SchemasArgs),
/// List tags
Tags(tags::TagsArgs),
/// Manage aliases
Aliases(aliases::AliasesArgs),
/// Sync specs
Sync(sync::SyncArgs),
/// Health check
Doctor(doctor::DoctorArgs),
/// Cache lifecycle management
Cache(cache::CacheArgs),
/// Compare two spec states (Phase 2)
Diff(diff::DiffArgs),
}
pub mod fetch;
pub mod list;
pub mod show;
pub mod search;
pub mod schemas;
pub mod tags;
pub mod aliases;
pub mod sync;
pub mod doctor;
pub mod cache;
pub mod diff;
```
**Example command implementation (index-backed):**
```rust
// src/cli/list.rs
use clap::Args;
use crate::{
core::{cache::CacheManager, spec::{SpecIndex, IndexedEndpoint}},
errors::SwaggerCliError,
output::{human, robot},
};
#[derive(Args)]
pub struct ListArgs {
/// Alias to query (uses default if not specified)
alias: Option<String>,
/// Filter by HTTP method
#[arg(long, value_parser = ["GET", "POST", "PUT", "DELETE", "PATCH"])]
method: Option<String>,
/// Filter by tag
#[arg(long)]
tag: Option<String>,
/// Filter by path pattern (regex)
#[arg(long)]
path: Option<String>,
/// Sort by field
#[arg(long, default_value = "path", value_parser = ["path", "method", "tag"])]
sort: String,
/// Limit results
#[arg(long, default_value = "50")]
limit: usize,
/// Show all results (no limit)
#[arg(long)]
all: bool,
}
pub fn execute(args: ListArgs, robot_mode: bool) -> Result<(), SwaggerCliError> {
let cache = CacheManager::new()?;
let alias = args.alias.or_else(|| cache.default_alias())
.ok_or_else(|| SwaggerCliError::Config("No alias specified and no default set".into()))?;
// Load index only -- never touches raw.json
let (index, meta) = cache.load_index(&alias)?;
// Build filters
let filters = Filters {
method: args.method,
tag: args.tag,
path_pattern: args.path,
};
// Filter directly from index endpoints (already structured; fails fast on invalid regex)
let mut endpoints = filter_endpoints(&index.endpoints, &filters)?;
// Sort
sort_endpoints(&mut endpoints, &args.sort);
// Apply limit
let total = endpoints.len();
if !args.all {
endpoints.truncate(args.limit);
}
// Output
if robot_mode {
robot::output_list(&alias, &meta, &endpoints, total)?;
} else {
human::output_list(&alias, &meta, &endpoints, total)?;
}
Ok(())
}
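// Default is derived so callers (e.g. benchmarks) can build an empty filter set.
#[derive(Default)]
struct Filters {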
method: Option<String>,
tag: Option<String>,
path_pattern: Option<String>,
}
fn filter_endpoints(endpoints: &[IndexedEndpoint], filters: &Filters) -> Result<Vec<IndexedEndpoint>, SwaggerCliError> {
let path_regex = match &filters.path_pattern {
Some(p) => Some(regex::Regex::new(p).map_err(|e| {
SwaggerCliError::Usage(format!("Invalid --path regex '{}': {}", p, e))
})?),
None => None,
};
Ok(endpoints.iter()
.filter(|ep| {
// Filter by path pattern
if let Some(regex) = &path_regex {
if !regex.is_match(&ep.path) {
return false;
}
}
// Filter by method
if let Some(filter_method) = &filters.method {
if ep.method.to_uppercase() != filter_method.to_uppercase() {
return false;
}
}
// Filter by tag
if let Some(filter_tag) = &filters.tag {
if !ep.tags.contains(filter_tag) {
return false;
}
}
true
})
.cloned()
.collect())
}
/// Canonical HTTP method ranking for deterministic ordering.
/// GET=0, POST=1, PUT=2, PATCH=3, DELETE=4, OPTIONS=5, HEAD=6, TRACE=7.
/// Unknown methods sort last (99), then lexicographically.
fn method_rank(method: &str) -> u8 {
match method {
"GET" => 0, "POST" => 1, "PUT" => 2, "PATCH" => 3,
"DELETE" => 4, "OPTIONS" => 5, "HEAD" => 6, "TRACE" => 7,
_ => 99,
}
}
fn sort_endpoints(endpoints: &mut [IndexedEndpoint], sort_by: &str) {
match sort_by {
"method" => endpoints.sort_by(|a, b| {
method_rank(&a.method).cmp(&method_rank(&b.method))
.then_with(|| a.path.cmp(&b.path))
.then_with(|| a.operation_id.cmp(&b.operation_id))
}),
"tag" => endpoints.sort_by(|a, b| {
let a_tag = a.tags.first().map(|s| s.as_str()).unwrap_or("");
let b_tag = b.tags.first().map(|s| s.as_str()).unwrap_or("");
a_tag.cmp(b_tag)
.then_with(|| a.path.cmp(&b.path))
.then_with(|| method_rank(&a.method).cmp(&method_rank(&b.method)))
}),
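_ => endpoints.sort_by(|a, b| {
a.path.cmp(&b.path)
.then_with(|| method_rank(&a.method).cmp(&method_rank(&b.method)))
// Unknown methods all rank 99; fall back to the method string as documented above.
.then_with(|| a.method.cmp(&b.method))
}),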
}
}
```
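The doc comment on `method_rank` promises that unknown methods sort last and then lexicographically. A minimal sketch of a comparator that makes that fallback explicit is shown below; it operates on plain `(path, method)` tuples rather than `IndexedEndpoint`, and `by_path_then_method` and `sorted` are hypothetical names used only for this illustration.

```rust
use std::cmp::Ordering;

/// Same ranking as `method_rank` above, reproduced so the example stands alone.
fn rank(method: &str) -> u8 {
    match method {
        "GET" => 0, "POST" => 1, "PUT" => 2, "PATCH" => 3,
        "DELETE" => 4, "OPTIONS" => 5, "HEAD" => 6, "TRACE" => 7,
        _ => 99,
    }
}

/// Hypothetical comparator for the default ("path") sort: path first, then
/// method rank, then the method string so that two unknown methods (both
/// rank 99) still have a fixed relative order.
fn by_path_then_method(a: &(&str, &str), b: &(&str, &str)) -> Ordering {
    a.0.cmp(b.0)
        .then_with(|| rank(a.1).cmp(&rank(b.1)))
        .then_with(|| a.1.cmp(b.1))
}

/// Sort a list of (path, method) pairs with the comparator above.
fn sorted(mut endpoints: Vec<(&'static str, &'static str)>) -> Vec<(&'static str, &'static str)> {
    endpoints.sort_by(by_path_then_method);
    endpoints
}
```

Because every tie is broken by a total order on the data itself, the result does not depend on the order in which endpoints were discovered, which is the property the determinism claims rely on.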
---
## Testing Strategy
### Unit Tests
**Coverage targets:**
- Core logic: 90%+
- CLI commands: 80%+
- Error handling: 100%
**Example tests:**
```rust
// tests/core/cache_test.rs
#[cfg(test)]
mod tests {
use super::*;
use tempfile::TempDir;
#[test]
fn test_cache_save_and_load() {
let temp = TempDir::new().unwrap();
let cache = CacheManager::new_with_path(temp.path()).unwrap();
let spec_value = create_test_spec_value();
let index = build_index_from_value(&spec_value).unwrap();
let cached = CachedSpec {
alias: "test".to_string(),
url: "https://example.com/spec.json".to_string(),
spec: spec_value,
metadata: CacheMetadata {
fetched_at: Utc::now(),
content_hash: "abc123".to_string(),
etag: None,
spec_version: "1.0.0".to_string(),
endpoint_count: 5,
schema_count: 3,
},
};
cache.save(&cached).unwrap();
let loaded = cache.load("test").unwrap();
assert_eq!(loaded.alias, "test");
assert_eq!(loaded.url, "https://example.com/spec.json");
}
#[test]
fn test_cache_not_found() {
let temp = TempDir::new().unwrap();
let cache = CacheManager::new_with_path(temp.path()).unwrap();
let result = cache.load("nonexistent");
assert!(matches!(result, Err(SwaggerCliError::AliasNotFound(_))));
}
#[test]
fn test_is_stale() {
let mut cached = create_test_cached_spec();
// Fresh
assert!(!cached.is_stale(30));
// Old
cached.metadata.fetched_at = Utc::now() - chrono::Duration::days(45);
assert!(cached.is_stale(30));
}
fn create_test_spec_value() -> serde_json::Value {
// Minimal valid OpenAPI as Value (tolerant parsing architecture)
serde_json::json!({
"openapi": "3.0.0",
"info": { "title": "Test API", "version": "1.0.0" },
"paths": {}
})
}
}
```
### Integration Tests
**Test scenarios:**
```rust
// tests/integration/fetch_test.rs
use assert_cmd::Command;
use predicates::prelude::*;
use mockito::mock;
use tempfile::TempDir;
#[test]
fn test_fetch_success() {
let _m = mock("GET", "/openapi.json")
.with_status(200)
.with_header("content-type", "application/json")
.with_body(include_str!("../fixtures/petstore.json"))
.create();
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", &format!("{}/openapi.json", mockito::server_url()), "--alias", "test", "--robot"])
.assert()
.success()
.stdout(predicate::str::contains(r#""ok":true"#));
}
#[test]
fn test_fetch_invalid_json() {
let _m = mock("GET", "/openapi.json")
.with_status(200)
.with_body("not json")
.create();
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", &format!("{}/openapi.json", mockito::server_url()), "--alias", "test", "--robot"])
.assert()
.failure()
.code(5) // INVALID_SPEC
.stderr(predicate::str::contains("INVALID_SPEC"));
}
#[test]
fn test_fetch_network_error() {
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", "http://localhost:1", "--alias", "test", "--robot"])
.assert()
.failure()
.code(4) // NETWORK_ERROR
.stderr(predicate::str::contains("NETWORK_ERROR"));
}
#[test]
fn test_alias_exists() {
let temp = TempDir::new().unwrap();
std::env::set_var("SWAGGER_CLI_HOME", temp.path());
let fixture = std::fs::canonicalize("tests/fixtures/petstore.json").unwrap();
let fixture_str = fixture.to_str().unwrap();
// First fetch (uses local fixture — no network)
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", fixture_str, "--alias", "test"])
.assert()
.success();
// Second fetch without --force
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", fixture_str, "--alias", "test", "--robot"])
.assert()
.failure()
.code(6) // ALIAS_EXISTS
.stderr(predicate::str::contains("ALIAS_EXISTS"))
.stderr(predicate::str::contains("--force"));
}
```
```rust
// tests/integration/list_test.rs
use assert_cmd::Command;
use predicates::prelude::*;
use tempfile::TempDir;
#[test]
fn test_list_all_endpoints() {
setup_test_cache("petstore");
let output = Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["list", "petstore", "--robot"])
.output()
.unwrap();
assert!(output.status.success());
// Parse stdout as JSON rather than string predicates (more robust, catches shape issues)
let json: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
assert_eq!(json["ok"], true);
assert!(json["data"]["endpoints"].is_array());
assert!(json["meta"]["schema_version"].is_number());
}
#[test]
fn test_list_filter_by_method() {
setup_test_cache("petstore");
let output = Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["list", "petstore", "--method", "POST", "--robot"])
.output()
.unwrap();
let json: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
let endpoints = json["data"]["endpoints"].as_array().unwrap();
for endpoint in endpoints {
assert_eq!(endpoint["method"], "POST");
}
}
#[test]
fn test_list_filter_by_tag() {
setup_test_cache("petstore");
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["list", "petstore", "--tag", "pets", "--robot"])
.assert()
.success()
.stdout(predicate::str::contains(r#""tags":["pets"]"#));
}
fn setup_test_cache(alias: &str) {
// Helper to set up test cache with fixtures.
// All tests MUST run under SWAGGER_CLI_HOME for hermetic behavior.
// Uses canonicalize() for absolute paths — file:// requires absolute per RFC 8089.
let path = std::fs::canonicalize(format!("tests/fixtures/{alias}.json"))
.expect("fixture file must exist");
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", path.to_str().unwrap(), "--alias", alias])
.assert()
.success();
}
/// Invariant test: list and search MUST NOT read raw.json.
/// This validates the core performance promise of index-backed queries.
#[test]
fn test_list_does_not_read_raw_json() {
let temp = TempDir::new().unwrap();
std::env::set_var("SWAGGER_CLI_HOME", temp.path());
setup_test_cache("petstore");
// Remove raw.json after cache setup -- list should still work from index alone
let raw_path = temp.path().join("cache/aliases/petstore/raw.json");
std::fs::remove_file(&raw_path).unwrap();
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["list", "petstore", "--robot"])
.assert()
.success(); // Must succeed without raw.json
}
#[test]
fn test_search_does_not_read_raw_json() {
let temp = TempDir::new().unwrap();
std::env::set_var("SWAGGER_CLI_HOME", temp.path());
setup_test_cache("petstore");
let raw_path = temp.path().join("cache/aliases/petstore/raw.json");
std::fs::remove_file(&raw_path).unwrap();
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["search", "petstore", "pet", "--robot"])
.assert()
.success(); // Must succeed without raw.json
}
```
### End-to-End Tests
**Workflow tests:**
```rust
// tests/integration/workflow_test.rs
use assert_cmd::Command;
use predicates::prelude::*;
use tempfile::TempDir;
#[test]
fn test_complete_workflow() {
let temp = TempDir::new().unwrap();
std::env::set_var("SWAGGER_CLI_HOME", temp.path());
// 1. Fetch spec (uses absolute path — no network, no invalid file:// URL)
let fixture = std::fs::canonicalize("tests/fixtures/petstore.json").unwrap();
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", fixture.to_str().unwrap(), "--alias", "pet"])
.assert()
.success();
// 2. List endpoints
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["list", "pet"])
.assert()
.success()
.stdout(predicate::str::contains("/pets"));
// 3. Show specific endpoint
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["show", "pet", "/pets", "--method", "GET"])
.assert()
.success()
.stdout(predicate::str::contains("List all pets"));
// 4. Search
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["search", "pet", "create"])
.assert()
.success();
// 5. Check aliases
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["aliases"])
.assert()
.success()
.stdout(predicate::str::contains("pet"));
// 6. Health check
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["doctor"])
.assert()
.success()
.stdout(predicate::str::contains("HEALTHY"));
}
```
### Reliability Stress Tests
**Purpose:** Validate crash-consistency, lock behavior, and determinism claims under adversarial conditions.
**Fault injection tests:**
```rust
// tests/reliability/crash_consistency_test.rs
/// Simulate crash at each write step to prove recoverability.
/// For each step in the write protocol (before/after fsync, before/after rename),
/// interrupt the write and verify:
/// - Partial state is detected by the read protocol
/// - doctor --fix can recover from any interrupted state
/// - No data corruption or silent degradation
#[test]
fn test_crash_before_meta_rename() {
// Write raw.json + index.json successfully, but don't write meta.json
// Verify: read protocol detects missing meta → CACHE_INTEGRITY
// Verify: doctor --fix rebuilds from raw
}
#[test]
fn test_crash_after_raw_before_index() {
// Write raw.json but crash before index.json
// Verify: meta.json (from previous generation) still valid
// Verify: doctor --fix detects stale index and rebuilds
}
```
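The stubs above assume a write protocol in which each file is written to a temp file, fsynced, and renamed into place, with `meta.json` written last as the commit marker. A minimal sketch of that protocol using only the standard library follows. The function names and the `crash_before_meta` switch are hypothetical, and the read-side check is deliberately simplified: it only looks for the marker, whereas the protocol described in this document also validates generation and hashes.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dir/name` via a temp file: write, fsync, then rename.
/// A crash before the rename leaves only `name.tmp` behind, never a torn `name`.
fn write_atomic(dir: &Path, name: &str, bytes: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!("{name}.tmp"));
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush file contents to disk before publishing
    fs::rename(&tmp, dir.join(name))?;
    // Directory fsync so the rename itself is durable (Unix only).
    if cfg!(unix) {
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}

/// Publish one generation. meta.json goes last and acts as the commit marker.
/// `crash_before_meta` is a hypothetical fault-injection switch for tests.
fn publish(
    dir: &Path,
    raw: &[u8],
    index: &[u8],
    meta: &[u8],
    crash_before_meta: bool,
) -> io::Result<()> {
    write_atomic(dir, "raw.json", raw)?;
    write_atomic(dir, "index.json", index)?;
    if crash_before_meta {
        return Ok(()); // simulate the process dying at this point
    }
    write_atomic(dir, "meta.json", meta)
}

/// Simplified read-side check: no commit marker means the write never completed.
fn is_committed(dir: &Path) -> bool {
    dir.join("meta.json").is_file()
}
```

A fault-injection test can call `publish` with `crash_before_meta = true`, assert that `is_committed` is false even though `raw.json` and `index.json` exist, then publish again and assert that the alias is now committed.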
**Multi-process lock contention tests:**
```rust
// tests/reliability/lock_contention_test.rs
use assert_cmd::Command;
use tempfile::TempDir;
/// Spawn N concurrent processes (N>=32) all attempting to fetch/sync the same alias.
/// Verify:
/// - Lock timeout is bounded (no deadlocks)
/// - Exactly one process succeeds per generation
/// - Failed processes get CACHE_LOCKED with actionable suggestion
/// - Final cache state is consistent (generation matches, hashes valid)
#[test]
fn test_concurrent_fetch_32_processes() {
let temp = TempDir::new().unwrap();
let fixture = std::fs::canonicalize("tests/fixtures/petstore.json").unwrap();
let handles: Vec<_> = (0..32).map(|_| {
// Each thread gets its own copies; the TempDir stays owned by the test.
let home = temp.path().to_path_buf();
let fixture = fixture.clone();
std::thread::spawn(move || {
Command::cargo_bin("swagger-cli")
.unwrap()
.args(&["fetch", fixture.to_str().unwrap(), "--alias", "contended", "--force", "--robot"])
.env("SWAGGER_CLI_HOME", &home)
.output()
.unwrap()
})
}).collect();
// Verify: all exit 0 or 9 (CACHE_LOCKED), no panics, no corruption
for handle in handles {
let output = handle.join().expect("worker thread panicked");
assert!(matches!(output.status.code(), Some(0) | Some(9)));
}
// Verify: final cache state passes doctor validation
}
```
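The bounded-timeout behaviour that this test exercises can be sketched with a lock file created via `create_new`, retried until a deadline. This is an illustrative assumption, not the planned mechanism: a lock file left behind by a crashed process would block later callers, which is one reason an implementation might prefer OS advisory locks that are released when the process exits. `AliasLock`, `LockError`, and `acquire_lock` are hypothetical names.

```rust
use std::fs::{self, OpenOptions};
use std::io::ErrorKind;
use std::path::{Path, PathBuf};
use std::thread;
use std::time::{Duration, Instant};

/// Guard that removes the lock file when dropped.
struct AliasLock {
    path: PathBuf,
}

impl Drop for AliasLock {
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}

/// Outcome the caller would map to the CACHE_LOCKED error (exit code 9).
#[derive(Debug, PartialEq)]
enum LockError {
    TimedOut,
    Io(ErrorKind),
}

/// Try to take an exclusive per-alias lock, giving up after `timeout`.
/// `create_new` fails if the file already exists, so only one caller wins.
fn acquire_lock(alias_dir: &Path, timeout: Duration) -> Result<AliasLock, LockError> {
    let path = alias_dir.join(".lock");
    let deadline = Instant::now() + timeout;
    loop {
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(_) => return Ok(AliasLock { path }),
            Err(e) if e.kind() == ErrorKind::AlreadyExists => {
                if Instant::now() >= deadline {
                    return Err(LockError::TimedOut);
                }
                thread::sleep(Duration::from_millis(10)); // brief backoff, then retry
            }
            Err(e) => return Err(LockError::Io(e.kind())),
        }
    }
}
```

While one guard is alive, a second `acquire_lock` call on the same directory returns `LockError::TimedOut` once the timeout elapses instead of blocking indefinitely; dropping the guard lets the next caller succeed.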
**Property-based tests:**
```rust
// tests/reliability/property_test.rs
/// Use proptest to verify:
/// - Index ordering is deterministic regardless of input order
/// - Search tie-breaking is stable across runs
/// - All generated pointers resolve in raw.json
/// - Content hash is deterministic for same input bytes
#[cfg(test)]
mod prop_tests {
use proptest::prelude::*;
proptest! {
#[test]
fn index_ordering_deterministic(endpoints in arb_endpoints(1..100)) {
let index1 = build_index(endpoints.clone());
let index2 = build_index(endpoints.into_iter().rev().collect());
assert_eq!(
serde_json::to_string(&index1).unwrap(),
serde_json::to_string(&index2).unwrap()
);
}
}
}
```
### Performance Tests
```rust
// benches/perf.rs (Criterion convention: benches/ directory)
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn bench_load_index(c: &mut Criterion) {
let index_path = setup_large_spec_index(); // 500+ endpoints
c.bench_function("load_index_json", |b| {
b.iter(|| {
let bytes = std::fs::read(black_box(&index_path)).unwrap();
let _index: SpecIndex = serde_json::from_slice(&bytes).unwrap();
});
});
}
fn bench_list_endpoints(c: &mut Criterion) {
let index = load_large_spec_index(); // 500+ endpoints
c.bench_function("list_all_endpoints", |b| {
b.iter(|| {
filter_endpoints(black_box(&index.endpoints), &Filters::default())
});
});
}
fn bench_search(c: &mut Criterion) {
let index = load_large_spec_index();
let search_engine = SearchEngine::new(index);
c.bench_function("search_query", |b| {
b.iter(|| {
search_engine.search(black_box("create user"), &Default::default())
});
});
}
criterion_group!(benches, bench_load_index, bench_list_endpoints, bench_search);
criterion_main!(benches);
```
**Performance targets:**
- Index load: <5ms (index.json is typically 10-50KB)
- List 500 endpoints: <50ms (filter + sort on in-memory index)
- Search 500 endpoints: <100ms (tokenized multi-term scoring)
- Fetch + cache 2MB spec: <2s (includes index build)
- `show` command: <100ms (loads raw.json as Value, extracts subtree via JSON pointer)
### Golden Robot Output Tests
**Purpose:** The #1 regression risk for agent tooling is "robot JSON changed shape". Golden tests prevent this.
```rust
// tests/integration/golden_test.rs
#[test]
fn test_robot_output_schema_stability() {
// For each command, verify robot JSON matches golden snapshot
let commands = vec![
("list", vec!["list", "petstore", "--robot"]),
("show", vec!["show", "petstore", "/pet/{petId}", "--robot"]),
("search", vec!["search", "petstore", "pet", "--robot"]),
("schemas", vec!["schemas", "petstore", "--robot"]),
("tags", vec!["tags", "petstore", "--robot"]),
("aliases", vec!["aliases", "--robot"]),
];
for (name, args) in commands {
let output = run_command(&args);
let json: serde_json::Value = serde_json::from_str(&output).unwrap();
// Verify structural invariants
assert!(json["ok"].is_boolean(), "{name}: missing ok field");
assert!(json["data"].is_object() || json["data"].is_null(), "{name}: bad data");
assert!(json["meta"]["schema_version"].is_number(), "{name}: missing schema_version");
assert!(json["meta"]["tool_version"].is_string(), "{name}: missing tool_version");
assert!(json["meta"]["command"].is_string(), "{name}: missing command");
assert!(json["meta"]["duration_ms"].is_number(), "{name}: missing duration_ms");
// Snapshot test: compare against golden file
// Fail CI on breaking changes unless schema_version is incremented
let golden_path = format!("tests/integration/golden/{name}_success.json");
assert_schema_compatible(&json, &golden_path);
}
}
```
---
## Distribution and Deployment
### GitLab CI/CD
```yaml
# .gitlab-ci.yml
stages:
  - test
  - build
  - release

variables:
  CARGO_HOME: $CI_PROJECT_DIR/.cargo

cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - .cargo/
    - target/

# Test stage
test:unit:
  stage: test
  image: rust:1.93
  script:
    - cargo test --lib
    - cargo test --doc
  coverage: '/^\d+\.\d+% coverage/'

test:integration:
  stage: test
  image: rust:1.93
  script:
    - cargo test --test '*'

lint:
  stage: test
  image: rust:1.93
  script:
    - rustup component add clippy rustfmt
    - cargo fmt -- --check
    - cargo clippy -- -D warnings

security:deps:
  stage: test
  image: rust:1.93
  script:
    - cargo install cargo-deny cargo-audit
    - cargo deny check
    - cargo audit
  allow_failure: false

# Build stage
.build_template: &build_template
  stage: build
  image: rust:1.93
  script:
    - cargo build --release --locked --target $TARGET
    - cp target/$TARGET/release/swagger-cli swagger-cli-$TARGET
  artifacts:
    paths:
      - swagger-cli-$TARGET
    expire_in: 30 days

build:macos-arm64:
  <<: *build_template
  tags:
    - macos
    - arm64
  variables:
    TARGET: aarch64-apple-darwin

build:macos-x86_64:
  <<: *build_template
  tags:
    - macos
    - x86_64
  variables:
    TARGET: x86_64-apple-darwin

build:linux-x86_64:
  <<: *build_template
  image: rust:1.93
  variables:
    TARGET: x86_64-unknown-linux-gnu

build:linux-arm64:
  <<: *build_template
  image: rust:1.93
  before_script:
    - apt-get update && apt-get install -y gcc-aarch64-linux-gnu
    - rustup target add aarch64-unknown-linux-gnu
  variables:
    TARGET: aarch64-unknown-linux-gnu
    CARGO_TARGET_AARCH64_UNKNOWN_LINUX_GNU_LINKER: aarch64-linux-gnu-gcc

# Release stage (only on tags)
release:
  stage: release
  image: curlimages/curl:latest
  only:
    - tags
  dependencies:
    - build:macos-arm64
    - build:macos-x86_64
    - build:linux-x86_64
    - build:linux-arm64
  script:
    - |
      # Generate checksums and sign
      sha256sum swagger-cli-* > SHA256SUMS
      minisign -Sm SHA256SUMS -s /run/secrets/minisign_key
      # Upload binaries + integrity artifacts
      for file in swagger-cli-* SHA256SUMS SHA256SUMS.minisig; do
        curl --header "JOB-TOKEN: $CI_JOB_TOKEN" \
          --upload-file $file \
          "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/packages/generic/swagger-cli/${CI_COMMIT_TAG}/$file"
      done

# Docker image
docker:
  stage: release
  image: docker:latest
  services:
    - docker:dind
  only:
    - tags
  script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG .
    - docker tag $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG $CI_REGISTRY_IMAGE:latest
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_TAG
    - docker push $CI_REGISTRY_IMAGE:latest
```
### Dockerfile
```dockerfile
# Multi-stage build for minimal image
FROM rust:1.93-alpine as builder
WORKDIR /build
# Install build dependencies
RUN apk add --no-cache musl-dev
# Copy source
COPY Cargo.toml Cargo.lock ./
COPY src ./src
# Build release
RUN cargo build --release --locked --target x86_64-unknown-linux-musl
# Runtime image
FROM alpine:latest
RUN apk add --no-cache ca-certificates
COPY --from=builder /build/target/x86_64-unknown-linux-musl/release/swagger-cli /usr/local/bin/
# Create XDG-compliant directories
RUN mkdir -p /root/.config/swagger-cli /root/.cache/swagger-cli/aliases
ENTRYPOINT ["swagger-cli"]
```
### Installation Script
```bash
#!/bin/bash
# install.sh - Universal installer
set -euo pipefail
# Secure temp directory with cleanup trap
TMP_DIR="$(mktemp -d)"
trap 'rm -rf "$TMP_DIR"' EXIT
# Detect OS and architecture
OS="$(uname -s)"
ARCH="$(uname -m)"
case "$OS" in
Darwin)
OS_LOWER="macos"
OS_TRIPLE="apple-darwin"
;;
Linux)
OS_LOWER="linux"
OS_TRIPLE="unknown-linux-gnu"
;;
*)
echo "Unsupported OS: $OS"
exit 1
;;
esac
case "$ARCH" in
arm64|aarch64)
ARCH_LOWER="arm64"
TARGET="aarch64"
;;
x86_64|amd64)
ARCH_LOWER="x86_64"
TARGET="x86_64"
;;
*)
echo "Unsupported architecture: $ARCH"
exit 1
;;
esac
VERSION="${VERSION:-latest}"
INSTALL_DIR="${INSTALL_DIR:-$HOME/.local/bin}"
# Must match the artifact names uploaded by CI: swagger-cli-<rust target triple>
BINARY_NAME="swagger-cli-${TARGET}-${OS_TRIPLE}"
# TODO: Replace with actual GitLab Package Registry URL (matches .gitlab-ci.yml release target)
DOWNLOAD_URL="https://<your-gitlab-host>/api/v4/projects/<id>/packages/generic/swagger-cli/${VERSION}/${BINARY_NAME}"
echo "Installing swagger-cli..."
echo " Version: $VERSION"
echo " Platform: $OS_LOWER-$ARCH_LOWER"
echo " Install dir: $INSTALL_DIR"
# Create install directory
mkdir -p "$INSTALL_DIR"
# Download binary + integrity artifacts
echo "Downloading..."
# -f makes curl fail on HTTP errors instead of saving the error page as the binary
curl -fsSL "$DOWNLOAD_URL" -o "$INSTALL_DIR/swagger-cli"
VERIFY="${VERIFY:-true}"
if [ "$VERIFY" = "true" ]; then
CHECKSUMS_URL="${DOWNLOAD_URL%/*}/SHA256SUMS"
SIGNATURE_URL="${DOWNLOAD_URL%/*}/SHA256SUMS.minisig"
curl -fsSL "$CHECKSUMS_URL" -o "$TMP_DIR/SHA256SUMS"
curl -fsSL "$SIGNATURE_URL" -o "$TMP_DIR/SHA256SUMS.minisig"
# Verify signature if minisign is available
if command -v minisign >/dev/null 2>&1; then
echo "Verifying signature..."
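# MINISIGN_PUBKEY is the path to the public key file; if unset, verification fails and the install aborts
minisign -Vm "$TMP_DIR/SHA256SUMS" -p "${MINISIGN_PUBKEY:-}" || {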
echo "ERROR: Signature verification failed. Aborting."
rm -f "$INSTALL_DIR/swagger-cli"
exit 1
}
else
echo "Note: minisign not found, skipping signature verification (checksum only)"
fi
# Verify checksum (portable: sha256sum on Linux, shasum on macOS)
echo "Verifying checksum..."
EXPECTED=$(grep "$BINARY_NAME" "$TMP_DIR/SHA256SUMS" | awk '{print $1}')
if command -v sha256sum >/dev/null 2>&1; then
ACTUAL=$(sha256sum "$INSTALL_DIR/swagger-cli" | awk '{print $1}')
else
ACTUAL=$(shasum -a 256 "$INSTALL_DIR/swagger-cli" | awk '{print $1}')
fi
if [ "$EXPECTED" != "$ACTUAL" ]; then
echo "ERROR: Checksum mismatch. Expected $EXPECTED, got $ACTUAL"
rm -f "$INSTALL_DIR/swagger-cli"
exit 1
fi
echo "Integrity verified."
fi
# Make executable
chmod +x "$INSTALL_DIR/swagger-cli"
echo "✓ Installed to $INSTALL_DIR/swagger-cli"
# Check if in PATH
if ! echo "$PATH" | grep -q "$INSTALL_DIR"; then
echo ""
echo "Warning: $INSTALL_DIR is not in your PATH"
echo "Add this to your shell profile:"
echo " export PATH=\"\$PATH:$INSTALL_DIR\""
fi
echo ""
echo "Get started:"
echo " swagger-cli fetch https://api.example.com/openapi.json --alias myapi"
echo " swagger-cli list myapi"
```
---
## Success Metrics
### Phase 1 (MVP) - Week 1 (CP0-CP2)
**Must have:**
- ✓ Fetch and cache specs (four-file format: raw.source/raw.json/index/meta, with YAML normalization)
- ✓ True crash-consistent writes with fsync, per-alias file locking, bounded lock timeout (meta.json as commit marker)
- ✓ List endpoints with basic filtering (index-backed)
- ✓ Show endpoint details (index pointer + raw Value extraction)
- ✓ Robot mode JSON output with versioned schema contract (unified envelope for success AND errors)
- ✓ Exit codes and error handling (with Usage and CacheIntegrity variants)
- ✓ Strict option validation (invalid regex/options fail with USAGE_ERROR, no silent fallbacks)
- ✓ 80%+ test coverage
**Success criteria:**
- Query latency <50ms (cached, index-backed) -- validated against 8MB+ spec
- Fetch time <2s (2MB spec)
- Works with 3 real-world specs (Petstore, Stripe, GitHub)
- `list` and `search` never load raw.json (verified by removal test: delete raw.json, commands still work)
- Robot JSON includes schema_version, tool_version, command, duration_ms (on BOTH success and error)
- Index build is deterministic (sorted endpoints/schemas/tags with canonical method ordering; verified by golden tests)
- Cache writes are crash-consistent (fsync before renames, generation embedded in index.json, validated on read)
- All integration tests hermetic (SWAGGER_CLI_HOME set, no real network calls, absolute fixture paths)
- Crash-consistency claim validated by automated fault-injection test suite (not only unit tests)
- YAML input accepted and normalized to JSON during fetch (validated with YAML fixture)
- Alias format validated (rejects path traversal, reserved names, shell-hostile characters)
- SSRF policy enforced by default (loopback/private/link-local blocked; HTTPS required for remote URLs)
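As an illustration of the alias-format criterion above, the following is a minimal sketch of an allow-list validator. The permitted character set, the 64-character cap, and the reserved-name list are assumptions made for this example and are not the rules specified elsewhere in this PRD; `validate_alias` is a hypothetical name.

```rust
/// Hypothetical alias check. The character set, the 64-char cap, and the
/// reserved list are assumptions for illustration, not the specified rules.
fn validate_alias(alias: &str) -> Result<(), String> {
    const RESERVED: [&str; 4] = ["con", "nul", "prn", "aux"];
    if alias.is_empty() || alias.len() > 64 {
        return Err("alias must be 1-64 characters".into());
    }
    let mut chars = alias.chars();
    let first = chars.next().unwrap(); // non-empty was checked above
    if !first.is_ascii_alphanumeric() {
        return Err("alias must start with a letter or digit".into());
    }
    // Allow-list rather than deny-list: '/', '\\', spaces, '$', ';' never pass.
    if !chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.')) {
        return Err("alias may contain only letters, digits, '-', '_' and '.'".into());
    }
    // Reject traversal-looking sequences even though '/' is already excluded.
    if alias.contains("..") {
        return Err("alias must not contain '..'".into());
    }
    if RESERVED.contains(&alias.to_ascii_lowercase().as_str()) {
        return Err(format!("'{alias}' is a reserved name"));
    }
    Ok(())
}
```

An allow-list keeps the check independent of any particular shell or filesystem: anything not explicitly permitted is rejected, so path separators and shell metacharacters never reach the cache directory layout.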
### Phase 2 (Polish) - Week 2 (CP3-CP5)
**Should have:**
- ✓ Text search (tokenized multi-term, Unicode-safe)
- ✓ Schema browsing (index-backed listing, raw-backed details)
- ✓ Tag browser (index-backed)
- ✓ Alias management
- ✓ Cross-alias discovery (`--all-aliases` on list and search)
- ✓ Sync with change detection (index-based diffs) + optional `--details` with capped change lists
- ✓ Sync concurrency (`--jobs`, `--per-host`, Retry-After handling)
- ✓ Health check (integrity validation via generation/hash, pointer validation, permission checks, `--fix` with locking)
- ✓ Cache lifecycle management (stats, prune, size caps)
- ✓ Global network policy (`--network offline/auto/online-only`)
- ✓ CI/CD pipeline with supply chain hardening (SHA256SUMS + minisign + cargo-deny/cargo-audit)
- ✓ Multi-platform binaries
- ✓ Golden robot output tests validated against published JSON Schema artifacts
- ✓ Reliability stress tests (fault injection, lock contention, property-based)
- ✓ Diff command with structural comparison for CI gates (FR-11)
**Success criteria:**
- Search 500 endpoints <100ms
- All commands have robot mode with stable schema
- Installable via curl script from correct hosting URL
- Golden tests prevent robot JSON shape regressions
- Sync `--details --robot` provides actionable added/removed/modified lists (capped, with truncated flag)
- Doctor detects partial caches + warns on insecure config permissions
- `--all-aliases` works for list and search (verified with 3+ aliases)
- `cache --stats` and `cache --prune-stale` operational
- `sync --all --jobs 4` completes faster than sequential (verified with 4+ aliases)
- Release artifacts include SHA256SUMS + minisign signatures
- `cargo deny check` and `cargo audit` pass in CI (dependency policy enforcement)
- `--network offline` blocks all network commands with OFFLINE_MODE error
- SSRF policy blocks loopback/private ranges by default; `--allow-private-host` permits exceptions
- Lock contention tests pass with 32 concurrent processes
- `diff` command reports structural changes between two alias states
### Phase 3 (Advanced) - Future
**Nice to have:**
- YAML output format (`--format yaml` with serde_yaml, determinism rules, golden fixtures)
- Semantic search with embeddings
- Diff breaking-change classification (heuristic-based: removed field = breaking, added optional = non-breaking, type change = breaking)
- Generate curl commands from endpoints
- Schema validation (validate JSON against schema)
- Import/export aliases
- Compressed cache storage
- OS keychain credential backend (`CredentialSource::Keyring` — macOS Keychain, Linux Secret Service)
- SBOM generation + cosign attestation for release artifacts
- Web UI (local server)
---
## Decision Log
### D1: Rust vs Go vs TypeScript
**Decision:** Rust
**Rationale:**
- **Performance:** Faster than Go/TS for parsing large specs
- **Type safety:** Stronger than TS, excellent serde ecosystem
- **Distribution:** Single binary, no runtime required
- **Ecosystem:** Rich CLI tooling (clap, colored, tabled)
**Alternatives considered:**
- Go: Easier concurrency, but slower JSON parsing
- TypeScript: Easier for team, but requires Node.js runtime
### D2: SQLite vs JSON Cache
**Decision:** JSON files
**Rationale:**
- **Simplicity:** No schema migrations, easy debugging
- **Portability:** No binary database files
- **Performance:** Adequate for <100 specs (target: 3-10)
- **Atomicity:** Atomic writes with temp files
**Alternatives considered:**
- SQLite: Better for complex queries, but overkill for MVP
- SQLite could be added later if search becomes a bottleneck
### D3: Text Search vs Semantic Search
**Decision:** Text search (MVP), semantic later
**Rationale:**
- **Complexity:** Text search: 100 LOC; semantic: 1000+ LOC + embeddings
- **Dependencies:** No ML dependencies, smaller binary
- **Accuracy:** 90% of queries answered by text search
- **Upgrade path:** Can add semantic in v2 without breaking API
### D4: Sync Strategy (Manual vs Auto)
**Decision:** Manual sync with `swagger-cli sync`
**Rationale:**
- **Predictability:** Agents know when network calls happen
- **Offline:** Works offline after first fetch
- **Control:** User chooses when to check for updates
**Alternatives considered:**
- Auto-sync on first query: Unpredictable latency
- TTL-based: Still unpredictable, cache stampede risk
### D5: Ref Expansion (Inline vs Separate)
**Decision:** Keep refs by default, expand with `--expand-refs`
**Rationale:**
- **Understanding:** Refs show reuse patterns
- **Size:** Inline expansion = 3-5x larger output
- **Performance:** Expansion = recursive resolution = slower
- **Flexibility:** Opt-in when needed
### D6: Error Handling (Panic vs Result)
**Decision:** All public APIs return `Result`, no panics
**Rationale:**
- **Robustness:** Agents can handle errors programmatically
- **Robot mode:** All errors serializable to JSON
- **Recovery:** Network errors shouldn't crash CLI
**Implementation:**
- Use `anyhow::Result` internally
- Convert to typed `SwaggerCliError` at boundaries
- Map errors to exit codes
### D7: Config and Cache Location
**Decision:** Config in XDG config dir; cache in XDG cache dir (separate)
**Rationale:**
- **Standard:** Follows XDG Base Directory spec correctly -- config is durable settings, cache is regenerable data
- **Override precedence (highest to lowest; config path and cache dir resolve independently, per the normative clarification below):**
1. `--config <path>` (config file path)
2. `SWAGGER_CLI_CONFIG` (config file path)
3. `SWAGGER_CLI_HOME` (base dir; implies config/cache under it for hermetic runs, CI, and tests)
4. `SWAGGER_CLI_CACHE` (cache dir only; overrides `SWAGGER_CLI_HOME` for the cache location)
5. XDG defaults via `directories::ProjectDirs`
- **Normative clarification:**
- Config path resolution ignores `SWAGGER_CLI_CACHE`.
- Cache dir resolution order is: `SWAGGER_CLI_CACHE` > `SWAGGER_CLI_HOME` > XDG.
- Config path resolution order is: `--config` > `SWAGGER_CLI_CONFIG` > `SWAGGER_CLI_HOME` > XDG.
- **Operations:** Cache can be safely deleted without losing config. Backup tools can exclude cache. Container volume mounts are cleaner.
- **Rationale:** This keeps tests and CI hermetic (via `SWAGGER_CLI_HOME`) while still allowing direct overrides in production.
- **Implementation note (normative):** A single PathResolver function applies this precedence. `Config::config_path()` and `Config::cache_dir()` implement the env var checks. All tests MUST run under `SWAGGER_CLI_HOME` to ensure hermetic behavior and avoid contaminating real user caches.
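
The two resolution orders above can be sketched as a pair of pure functions. This is a minimal illustration, not the actual `PathResolver`: the `PathInputs` struct, its field names, and the `config/config.toml` and `cache/` layout under `SWAGGER_CLI_HOME` are assumptions made for the example.

```rust
use std::path::PathBuf;

/// Inputs gathered once at startup (hypothetical struct; field names are illustrative).
#[derive(Default)]
pub struct PathInputs {
    pub config_flag: Option<PathBuf>, // --config <path>
    pub env_config: Option<PathBuf>,  // SWAGGER_CLI_CONFIG
    pub env_home: Option<PathBuf>,    // SWAGGER_CLI_HOME
    pub env_cache: Option<PathBuf>,   // SWAGGER_CLI_CACHE
    pub xdg_config_dir: PathBuf,      // e.g. ~/.config/swagger-cli
    pub xdg_cache_dir: PathBuf,       // e.g. ~/.cache/swagger-cli
}

/// Config path: --config > SWAGGER_CLI_CONFIG > SWAGGER_CLI_HOME > XDG.
/// SWAGGER_CLI_CACHE is deliberately ignored here.
pub fn resolve_config_path(i: &PathInputs) -> PathBuf {
    if let Some(p) = &i.config_flag {
        return p.clone();
    }
    if let Some(p) = &i.env_config {
        return p.clone();
    }
    if let Some(home) = &i.env_home {
        // Assumed layout under the hermetic base dir.
        return home.join("config").join("config.toml");
    }
    i.xdg_config_dir.join("config.toml")
}

/// Cache dir: SWAGGER_CLI_CACHE > SWAGGER_CLI_HOME > XDG.
pub fn resolve_cache_dir(i: &PathInputs) -> PathBuf {
    if let Some(p) = &i.env_cache {
        return p.clone();
    }
    if let Some(home) = &i.env_home {
        // Assumed layout under the hermetic base dir.
        return home.join("cache");
    }
    i.xdg_cache_dir.clone()
}
```

Keeping the resolver free of direct environment reads makes it easy to unit-test every precedence combination without touching the process environment.
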
**Structure:**
```
~/.config/swagger-cli/
└── config.toml # User config (aliases, default, auth)
~/.cache/swagger-cli/
└── aliases/
├── petstore/
│ ├── meta.json # Fetch metadata + integrity hashes
│ ├── raw.source # Original upstream bytes (json/yaml as fetched)
│ ├── raw.json # Canonical normalized JSON
│ ├── index.json # Precomputed query index
│ └── .lock # File lock for writes
└── stripe/
├── meta.json
├── raw.source
├── raw.json
├── index.json
└── .lock
```
### D8: Index-Backed Cache (Source + Raw + Index + Meta)
**Decision:** Four-file cache per alias instead of single combined file
**Rationale:**
- **Performance:** Query commands (list, search, tags, schemas) load only `index.json` (10-50KB), avoiding deserialization of multi-MB raw specs. This is the key design choice that makes <50ms realistic for large specs like Stripe (2.4MB) and GitHub (8.2MB).
- **Correctness:** The spec is never forced through a lossy typed model: `raw.source` keeps the exact upstream bytes and `raw.json` keeps the normalized JSON. `show` and `schemas --show` use JSON pointers from the index to extract subtrees from the raw `Value`.
- **Compatibility:** No typed OpenAPI model to break across 3.0/3.1 differences, extensions, or spec variations.
- **Simplicity:** Index is rebuilt on every fetch/sync (cheap operation). If index format changes, bump `index_version` and re-index.
**Alternatives considered:**
- Single `CachedSpec` file with embedded spec: Simpler code, but loads entire spec for every command. Fails latency target for large specs.
- SQLite with FTS: Better for very large specs, but overkill for MVP (typically 3-10 specs).
### D9: Tolerant Parsing (serde_json::Value vs Typed Model)
**Decision:** Parse raw spec as `serde_json::Value`, extract normalized index
**Rationale:**
- OpenAPI 3.1 aligns with JSON Schema Draft 2020-12, introducing many new schema constructs
- Real-world specs contain custom `x-*` extensions, vendor-specific fields, and structural variations
- A typed model either rejects valid specs (deserialization failure) or silently drops fields
- `serde_json::Value` accepts any valid JSON; index extraction pulls only what's needed
- `show` command returns raw JSON subtrees, preserving all upstream data
**Alternatives considered:**
- Full typed `OpenApiSpec` struct: Clean code, but fragile against real-world spec diversity
- `openapiv3` crate: Good coverage, but adds dependency weight and may still lag spec evolution
### D10: Robot Mode Contract
**Decision:** `--robot` is the only switch for JSON output; TTY detection controls only color/unicode
**Rationale:**
- Auto-enabling robot mode when stdout is not a TTY (the original design) is hostile to humans piping output (`| less`, `> file`)
- Makes behavior environment-dependent and unpredictable
- Agents should explicitly request robot mode; humans piping should get human-readable text without color
**Implementation:**
- `--robot` flag: switches output to JSON
- TTY detection: controls color and unicode characters only
- Every robot payload includes: `meta.schema_version`, `meta.tool_version`, `meta.command`, `meta.duration_ms`
- `schema_version` is bumped when JSON shape changes, enabling agents to detect breaking changes
- Robot JSON must be deterministic:
- Use `BTreeMap` (not `HashMap`) for any constructed objects/maps to ensure canonical key ordering
- Index arrays are emitted in their pre-sorted order (endpoints by path+method_rank, schemas by name, tags by name)
- Compact output by default; pretty-print only with `--pretty`
- This guarantees: stable golden tests, meaningful sync diffs, predictable agent parsing
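
The determinism rules above can be illustrated with a small sketch. The `Endpoint` struct and the concrete `method_rank` order are assumptions for the example; this section names `method_rank` but does not list the ranking.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Endpoint {
    pub path: String,
    pub method: String,
}

/// Hypothetical ranking; the real order is whatever the index builder defines.
fn method_rank(method: &str) -> u8 {
    match method.to_ascii_uppercase().as_str() {
        "GET" => 0,
        "POST" => 1,
        "PUT" => 2,
        "PATCH" => 3,
        "DELETE" => 4,
        "OPTIONS" => 5,
        "HEAD" => 6,
        "TRACE" => 7,
        _ => 8,
    }
}

/// Sort by path, then method rank, then method name as a final tie-breaker,
/// so the same input set always yields the same output order.
pub fn sort_endpoints(endpoints: &mut [Endpoint]) {
    endpoints.sort_by(|a, b| {
        a.path
            .cmp(&b.path)
            .then_with(|| method_rank(&a.method).cmp(&method_rank(&b.method)))
            .then_with(|| a.method.cmp(&b.method))
    });
}

/// A BTreeMap iterates in key order regardless of insertion order,
/// which is what makes constructed objects serialize canonically.
pub fn ordered_keys(entries: &[(&str, &str)]) -> Vec<String> {
    let map: BTreeMap<&str, &str> = entries.iter().copied().collect();
    map.keys().map(|k| k.to_string()).collect()
}
```
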
### D11: Distribution Strategy
**Decision:** GitLab Releases + Docker (hybrid)
**Rationale:**
- **Humans:** Prefer native binaries (faster)
- **Agents:** Prefer Docker (reproducible)
- **CI/CD:** Multi-platform builds in GitLab
- **Updates:** Manual for now (future: auto-update)
**Release checklist:**
1. Tag version: `git tag v1.0.0`
2. Push: `git push --tags`
3. CI builds binaries for all platforms
4. Upload to GitLab Package Registry
5. Build and push Docker image
### D12: YAML Output De-scoped from MVP (Input Supported)
**Decision:** YAML *input* is accepted and normalized to JSON at ingest time. YAML *output* format deferred to v1.1+.
**Rationale:**
- `serde_yaml` is now a dependency for input normalization (see D16)
- YAML *output* determinism is harder to snapshot-test (key ordering, multiline strings, anchor handling)
- Agents overwhelmingly want JSON output; humans want the pretty formatter
- YAML output format (`--format yaml`) requires its own golden fixtures and normalization rules
- Input support covers the primary adoption friction; output format is a separate concern
### D13: External Refs — Annotate at Query Time, Bundle at Fetch Time (Opt-in)
**Decision:** Query-time `--expand-refs` expands internal refs only (`#/...`). External refs are annotated, never fetched at query time. Optional `--resolve-external-refs` at fetch time bundles external refs with explicit host allowlist.
**Rationale:**
- Query-time external ref fetching would cause unexpected network calls, violating the offline guarantee
- Fetch-time bundling is opt-in with explicit `--ref-allow-host` allowlist (see D19)
- Annotation (`{"$external_ref": "..."}`) at query time is transparent and agents can follow up if needed
- Warning in `meta.warnings[]` (robot) or stderr (human) makes the behavior discoverable
- Bundled specs (from fetch-time resolution) have all refs internalized, so `--expand-refs` at query time works fully
### D14: Auth Profiles — Preferred Over Raw Tokens
**Decision:** `--auth-profile <NAME>` loads auth config from `config.toml` profiles. Raw `--bearer`/`--header` flags kept for one-offs.
**Rationale:**
- Shell history, process lists, and CI logs can leak raw tokens
- Config already has `auth_profiles` structure — connecting fetch to it is low effort
- Explicit flags merge with and override profile headers (explicit wins)
- Doctor already checks config permissions — auth profiles integrate naturally
### D15: Portable Builds — rustls Over native-tls
**Decision:** Use `reqwest` with `rustls-tls` feature, not default native-tls (OpenSSL).
**Rationale:**
- Dockerfile uses `rust:1.93-alpine` with musl target — OpenSSL is a known pain point
- rustls is pure Rust, no system library dependency, simpler cross-compilation
- Smaller binary, no runtime linking issues in containers
- Industry trend: many Rust CLI tools have moved to rustls
### D16: YAML Input Support (Ingest, Not Output)
**Decision:** Accept YAML as input format during fetch; normalize to JSON internally. YAML *output* remains de-scoped.
**Rationale:**
- Many real-world OpenAPI specs are authored in YAML (especially in the GitLab and Kubernetes ecosystems)
- Rejecting YAML input creates unnecessary adoption friction
- Normalization to JSON at ingest means all internal logic (indexing, querying, show) remains JSON-only
- `raw.source` preserves original bytes for provenance; `raw.json` stores normalized JSON
- YAML output is a different concern (serialization determinism, golden test complexity) and stays in Phase 3
### D17: SSRF Protection and Transport Policy
**Decision:** Block loopback/RFC1918/link-local/multicast targets by default; require HTTPS for remote URLs; validate resolved IP after redirects
**Rationale:**
- Agent-facing CLI tools may run in privileged environments (cloud VMs, CI runners) with access to metadata endpoints (e.g., `169.254.169.254`) and internal services
- Default-deny for private ranges blocks the common SSRF targets without affecting fetches of public specs
- `--allow-private-host <HOST>` permits explicit exceptions for internal API specs
- HTTPS-by-default prevents MITM injection of malicious spec content into the cache
- DNS rebinding check: re-validating the resolved IP after each redirect prevents a bypass in which a hostname first resolves to a public IP and then, on redirect, to a private one
- Low implementation cost: resolve DNS before connecting, compare against blocked CIDR ranges
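
A minimal sketch of the blocked-range comparison, using only `std::net`. The function names and the `host_is_allowlisted` parameter (standing in for a match against `--allow-private-host`) are assumptions for illustration; a real implementation would run this check on every resolved address, for the initial request and for each redirect hop.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// True for loopback, RFC 1918 private, link-local, multicast, and unspecified IPv4.
pub fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_loopback()
        || ip.is_private()
        || ip.is_link_local() // 169.254.0.0/16, includes cloud metadata endpoints
        || ip.is_multicast()
        || ip.is_unspecified()
}

/// IPv6 equivalent; IPv4-mapped addresses are checked as IPv4.
pub fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_multicast()
        || ip.is_unspecified()
        || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
        || (first & 0xffc0) == 0xfe80 // fe80::/10 link-local
}

/// Returns the robot error code on a policy violation.
pub fn check_target(ip: IpAddr, host_is_allowlisted: bool) -> Result<(), &'static str> {
    let blocked = match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    };
    if blocked && !host_is_allowlisted {
        Err("POLICY_BLOCKED")
    } else {
        Ok(())
    }
}
```
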
### D18: Global Network Policy
**Decision:** `--network auto|offline|online-only` global flag
**Rationale:**
- Manual `sync` is good, but agents/CI need a hard guarantee of no network calls
- `--network offline` makes any network-requiring command fail with `OFFLINE_MODE` (exit 15) instead of silently attempting a connection
- `--network online-only` is the inverse: useful for environments where offline behavior should be flagged
- Default `auto` preserves all existing behavior
- Low implementation cost (check flag before any HTTP call)
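
The "check flag before any HTTP call" gate might look like the sketch below. The type and function names are assumptions; only the flag values, the `OFFLINE_MODE` code, and exit code 15 come from this document. The sketch only covers the gate for network-requiring commands and leaves any further `online-only` behavior unspecified.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NetworkPolicy {
    Auto,
    Offline,
    OnlineOnly,
}

impl FromStr for NetworkPolicy {
    type Err = String;

    /// Parses the value passed to `--network`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(Self::Auto),
            "offline" => Ok(Self::Offline),
            "online-only" => Ok(Self::OnlineOnly),
            other => Err(format!("invalid --network value: {other}")),
        }
    }
}

pub const EXIT_OFFLINE_MODE: i32 = 15;

/// Call this before any HTTP request (fetch, sync).
/// Returns the robot error code and exit code when the policy forbids network use.
pub fn ensure_network_allowed(policy: NetworkPolicy) -> Result<(), (&'static str, i32)> {
    match policy {
        NetworkPolicy::Offline => Err(("OFFLINE_MODE", EXIT_OFFLINE_MODE)),
        NetworkPolicy::Auto | NetworkPolicy::OnlineOnly => Ok(()),
    }
}
```
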
### D19: External Ref Bundling (Fetch-Time Only)
**Decision:** Opt-in `--resolve-external-refs` at fetch time with explicit host allowlist
**Rationale:**
- The original D13 correctly avoids query-time network calls for external refs
- But many production specs depend on external refs for core operations (especially microservice architectures)
- Fetch-time bundling with host allowlist preserves the offline guarantee for all query commands
- Explicit `--ref-allow-host` prevents fetching from arbitrary hosts (security)
- Depth and size limits prevent unbounded resolution chains
- Bundled result stored in `raw.json`; original (with $ref pointers) in `raw.source`
### D20: Supply Chain Hardening (Checksums + Signatures)
**Decision:** SHA256SUMS + minisign signatures for release artifacts
**Rationale:**
- Current installer downloads and executes binaries without verification
- SHA256SUMS provides integrity checking; minisign provides provenance verification
- minisign is lightweight, fast, and commonly used in the Rust ecosystem (unlike GPG)
- Installer verifies by default (`VERIFY=true`); can be skipped with `VERIFY=false` for trusted environments
- Low implementation cost (one CI step + installer enhancement)
### D21: Alias Format Validation
**Decision:** Strict regex validation `^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$` for alias names
**Rationale:**
- Alias names map directly to directory names under `~/.cache/swagger-cli/aliases/`
- Without validation, aliases like `../../../etc/passwd`, `CON`, or names with shell metacharacters could cause path traversal, filesystem issues on Windows, or injection in scripts
- The regex allows alphanumeric starts, permits dots/hyphens/underscores (common in API naming), and caps at 64 chars; names such as `CON` pass the regex and are caught by a separate reserved-name check
- Validation happens at parse time (before any filesystem operation), failing with `USAGE_ERROR`
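
A dependency-free sketch of the alias check. It mirrors the regex `^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$` by hand and adds a reserved-name check. The `RESERVED` list (Windows device names) and the error strings are assumptions for illustration; the document mentions `CON` but does not enumerate the reserved names.

```rust
/// Hypothetical reserved-name list: Windows device names, compared case-insensitively.
const RESERVED: [&str; 22] = [
    "CON", "PRN", "AUX", "NUL",
    "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
    "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9",
];

/// Validates an alias before any filesystem operation.
/// An Err here would be surfaced as USAGE_ERROR.
pub fn validate_alias(alias: &str) -> Result<(), String> {
    let bytes = alias.as_bytes();
    if bytes.is_empty() {
        return Err("alias must not be empty".to_string());
    }
    if bytes.len() > 64 {
        return Err("alias must be at most 64 characters".to_string());
    }
    if !bytes[0].is_ascii_alphanumeric() {
        return Err("alias must start with a letter or digit".to_string());
    }
    // Any non-ASCII byte fails this check, so '/', '\\', spaces, and
    // shell metacharacters can never reach a path join.
    let rest_ok = bytes[1..]
        .iter()
        .all(|&b| b.is_ascii_alphanumeric() || matches!(b, b'.' | b'_' | b'-'));
    if !rest_ok {
        return Err("alias may contain only letters, digits, '.', '_' and '-'".to_string());
    }
    // Windows treats "CON.txt" like "CON", so compare the part before the first dot.
    let stem = alias.split('.').next().unwrap_or(alias).to_ascii_uppercase();
    if RESERVED.contains(&stem.as_str()) {
        return Err(format!("alias '{alias}' is a reserved name"));
    }
    Ok(())
}
```
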
### D22: Credential Source Abstraction
**Decision:** `CredentialSource` enum (Literal, EnvVar, Keyring) for auth profile tokens
**Rationale:**
- Plaintext tokens in config files are a security risk, especially in shared or version-controlled environments
- `EnvVar` source avoids persisting tokens on disk entirely — preferred for CI and agent environments
- `Keyring` source (Phase 3) delegates to OS-native secure storage
- Backward-compatible: `Literal` still works for simple setups
- `doctor` already warns on insecure config permissions, so the Literal path is covered
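
One possible shape for the enum and its resolution step, sketched with `std` only. The variant payloads other than `var_name`, the `CredentialError` type, and the `resolve_with` helper are assumptions for illustration; the keyring backend is left unimplemented here.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CredentialSource {
    /// Inline token stored in config.toml.
    Literal { token: String },
    /// Token read from an environment variable at runtime; nothing persisted on disk.
    EnvVar { var_name: String },
    /// OS keychain lookup; field names are hypothetical.
    Keyring { service: String, account: String },
}

#[derive(Debug, PartialEq, Eq)]
pub enum CredentialError {
    MissingEnvVar(String),
    KeyringUnsupported,
}

impl CredentialSource {
    /// Resolves the token using an injectable lookup, which keeps tests
    /// independent of the real process environment.
    pub fn resolve_with<F>(&self, lookup: F) -> Result<String, CredentialError>
    where
        F: Fn(&str) -> Option<String>,
    {
        match self {
            Self::Literal { token } => Ok(token.clone()),
            Self::EnvVar { var_name } => lookup(var_name)
                .filter(|v| !v.is_empty()) // treat an empty variable as missing
                .ok_or_else(|| CredentialError::MissingEnvVar(var_name.clone())),
            Self::Keyring { .. } => Err(CredentialError::KeyringUnsupported),
        }
    }

    /// Production entry point: reads from the process environment.
    pub fn resolve(&self) -> Result<String, CredentialError> {
        self.resolve_with(|name| std::env::var(name).ok())
    }
}
```
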
### D23: Coalesced LRU Writes
**Decision:** `last_accessed` metadata writes coalesced with 10-minute minimum interval
**Rationale:**
- Updating metadata on every query creates unnecessary I/O, file lock contention, and SSD write amplification
- With frequent agent queries (potentially multiple per second), per-query writes add measurable latency
- 10-minute coalescing preserves LRU ordering accuracy for eviction while reducing writes by orders of magnitude
- Best-effort, no lock required — acceptable if a coalesced write is lost
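
The coalescing rule reduces to a single comparison. A minimal sketch follows; the function and constant names are assumptions, and skipping the write when the clock appears to have moved backwards is one possible best-effort choice.

```rust
use std::time::{Duration, SystemTime};

/// Minimum interval between persisted `last_accessed` updates (10 minutes).
pub const MIN_TOUCH_INTERVAL: Duration = Duration::from_secs(10 * 60);

/// Decides whether a query should rewrite `last_accessed`.
/// `last_persisted` is the timestamp currently stored in metadata, if any.
pub fn should_persist_access(last_persisted: Option<SystemTime>, now: SystemTime) -> bool {
    match last_persisted {
        // Never recorded: write once so eviction has a baseline.
        None => true,
        Some(prev) => match now.duration_since(prev) {
            Ok(elapsed) => elapsed >= MIN_TOUCH_INTERVAL,
            // Stored timestamp is in the future (clock moved backwards):
            // skip the write, since losing an update is acceptable.
            Err(_) => false,
        },
    }
}
```
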
### D24: Robot JSON Schema Artifacts
**Decision:** Publish versioned JSON Schema files alongside the binary
**Rationale:**
- Golden tests catch regressions during development, but external consumers (agent frameworks, CI pipelines) need a machine-readable contract
- JSON Schema files (`docs/robot-schema/v1/`) serve as the authoritative specification for robot mode output
- Integration tests validate golden fixtures against these schemas, ensuring the published contract matches reality
- Enables external tooling to validate swagger-cli output without custom parsers
### D25: Structural Diff Command (Phase 2)
**Decision:** `swagger-cli diff` compares two spec states using index-based structural comparison
**Rationale:**
- Sync already computes index diffs (added/removed/modified endpoints and schemas) — diff reuses this logic
- Agents and CI need actionable change reports, not raw text diffs
- `--fail-on breaking` enables CI gate usage (non-zero exit on breaking changes)
- Full breaking-change classification (heuristic semantic analysis) deferred to Phase 3 — Phase 2 reports structural changes only
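
A sketch of the index-based structural comparison. It assumes each index entry can be reduced to a stable key (for example `"GET /pets"` or a schema name) and a content fingerprint; both the keying scheme and the `IndexDiff` struct are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Default, PartialEq, Eq)]
pub struct IndexDiff {
    pub added: Vec<String>,
    pub removed: Vec<String>,
    pub modified: Vec<String>,
}

/// Compares two index snapshots keyed by entry identity, with a fingerprint
/// (e.g. a hash of the entry) as the value.
/// BTreeMap iteration keeps all three lists sorted, so the output is deterministic.
pub fn diff_index(old: &BTreeMap<String, String>, new: &BTreeMap<String, String>) -> IndexDiff {
    let mut diff = IndexDiff::default();
    for (key, new_fp) in new {
        match old.get(key) {
            None => diff.added.push(key.clone()),
            Some(old_fp) if old_fp != new_fp => diff.modified.push(key.clone()),
            Some(_) => {} // unchanged
        }
    }
    for key in old.keys() {
        if !new.contains_key(key) {
            diff.removed.push(key.clone());
        }
    }
    diff
}
```

Because both `sync` and `diff` would call the same function, their change reports stay consistent by construction.
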
---
## Open Questions
### Q1: Should we support OpenAPI 2.0 (Swagger)?
**Status:** Deferred to v2
**Rationale:**
- OpenAPI 3.0+ has been the industry standard since 2017
- Legacy support adds complexity (different schema)
- Can convert 2.0 → 3.0 with external tools if needed
**Decision:** MVP targets OpenAPI 3.0.x and 3.1.x only
### Q2: How to handle huge specs (10MB+, 1000+ endpoints)?
**Status:** Largely addressed by index-backed architecture
**Current mitigations:**
- Index-backed queries: list/search/tags/schemas load only index.json (10-50KB even for huge specs)
- Raw spec parsed only by `show` command, and only the relevant subtree via JSON pointer
- Max download size guardrail (default 25MB) prevents accidental huge fetches
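
The download guardrail can be enforced while streaming, so an oversized response fails as soon as the limit is crossed. The sketch below works on any `std::io::Read`; the function name and the error kind chosen are assumptions.

```rust
use std::io::{self, Read};

/// Reads `reader` to the end, failing as soon as more than `max_bytes` have arrived.
/// At most one 8 KiB chunk beyond the limit is ever read.
pub fn read_capped<R: Read>(mut reader: R, max_bytes: u64) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = [0u8; 8192];
    let mut total: u64 = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(out), // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        total += n as u64;
        if total > max_bytes {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "response exceeds --max-bytes",
            ));
        }
        out.extend_from_slice(&buf[..n]);
    }
}
```
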
**Remaining potential issues:**
- `show` on a spec with deeply nested schemas + `--expand-refs` could be slow
- Cache size: 10MB × 10 specs = 100MB (acceptable)
**Optimization if needed:**
- SQLite FTS for search across very large specs
- Lazy raw.json loading with mmap
- Pagination for list output
**Decision:** Index architecture handles the common case; monitor `show` performance on large specs
### Q3: Should aliases support wildcards/regex?
**Status:** Partially addressed — `--all-aliases` flag added
**Use case:**
```bash
swagger-cli list --all-aliases --tag users # Query all aliased APIs
swagger-cli search --all-aliases "create user" # Search across all specs
```
**Resolution:**
- Default remains single alias per query (unchanged behavior)
- `--all-aliases` flag added to `list` and `search` for explicit cross-alias discovery
- Each result includes `alias` field so agents can disambiguate origin
- Wildcard/regex alias patterns (e.g., `prod-*`) deferred to future — `--all-aliases` covers the common case
**Decision:** Single alias by default; `--all-aliases` for federated discovery across all specs
### Q4: Authentication token storage security?
**Status:** Multi-source credential resolution (Literal, EnvVar, Keyring)
**Security model:**
- `CredentialSource::Literal`: inline token in config.toml (backward-compatible; doctor warns on insecure perms)
- `CredentialSource::EnvVar`: reads token from environment variable at runtime (preferred for CI/agents; no token persisted on disk)
- `CredentialSource::Keyring`: OS keychain lookup (Phase 3; macOS Keychain, Linux Secret Service via `keyring` crate)
- File permissions: 0600 (user-only read/write); doctor warns if weaker
- No encryption of config file itself (adds complexity; EnvVar/Keyring avoid the need)
**Config example:**
```toml
[auth_profiles.corp-internal]
url_pattern = "https://internal.api/*"
auth_type = "bearer"
[auth_profiles.corp-internal.credential]
source = "env_var"
var_name = "CORP_API_TOKEN"
```
**Decision:** `CredentialSource` enum supports Literal (MVP, backward compatibility), EnvVar (MVP, preferred for agents), and Keyring (Phase 3)
---
## Appendix A: Sample OpenAPI Spec
```json
{
"openapi": "3.0.0",
"info": {
"title": "Pet Store API",
"version": "1.0.0"
},
"paths": {
"/pets": {
"get": {
"summary": "List all pets",
"operationId": "listPets",
"tags": ["pets"],
"parameters": [
{
"name": "limit",
"in": "query",
"schema": {
"type": "integer",
"format": "int32"
}
}
],
"responses": {
"200": {
"description": "A list of pets",
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Pets"
}
}
}
}
}
},
"post": {
"summary": "Create a pet",
"operationId": "createPet",
"tags": ["pets"],
"requestBody": {
"required": true,
"content": {
"application/json": {
"schema": {
"$ref": "#/components/schemas/Pet"
}
}
}
},
"responses": {
"201": {
"description": "Pet created"
}
}
}
}
},
"components": {
"schemas": {
"Pet": {
"type": "object",
"required": ["id", "name"],
"properties": {
"id": {
"type": "integer",
"format": "int64"
},
"name": {
"type": "string"
},
"tag": {
"type": "string"
}
}
},
"Pets": {
"type": "array",
"items": {
"$ref": "#/components/schemas/Pet"
}
}
},
"securitySchemes": {
"ApiKeyAuth": {
"type": "apiKey",
"in": "header",
"name": "X-API-Key"
}
}
}
}
```
---
## Appendix B: Exit Code Reference
| Code | Meaning | JSON Code | Retry? | Agent Action |
|------|---------|-----------|--------|--------------|
| 0 | Success | N/A | N/A | Process data |
| 1 | Generic error | `INTERNAL_ERROR` | Maybe | Report bug |
| 2 | Usage error (bad args, invalid regex) | `USAGE_ERROR` | No | Fix command syntax |
| 4 | Network error | `NETWORK_ERROR` | Yes | Retry with backoff (5xx only; 4xx not retryable) |
| 5 | Invalid spec | `INVALID_SPEC` | No | Check spec URL |
| 6 | Alias exists | `ALIAS_EXISTS` | No | Use --force |
| 7 | Auth failed | `AUTH_FAILED` | No | Check credentials |
| 8 | Not found | `ALIAS_NOT_FOUND` | No | Run aliases --list |
| 9 | Cache locked (file lock busy) | `CACHE_LOCKED` | Yes | Retry after delay |
| 10 | Cache error | `CACHE_ERROR` | Maybe | Run doctor --fix |
| 11 | Config error | `CONFIG_ERROR` | No | Check config file |
| 12 | IO error | `IO_ERROR` | Maybe | Check permissions |
| 13 | JSON error | `JSON_ERROR` | No | Check input data |
| 14 | Cache integrity (torn/partial state) | `CACHE_INTEGRITY` | Yes | Run doctor --fix |
| 15 | Offline mode blocked network operation | `OFFLINE_MODE` | No | Retry without `--network offline` |
| 16 | Policy blocked (SSRF, insecure transport) | `POLICY_BLOCKED` | No | Use `--allow-private-host` or `--allow-insecure-http` |
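
The table above maps naturally onto a single lookup, which keeps exit codes, JSON codes, and retry hints from drifting apart. This is a sketch: the `ErrorKind` and `Retry` names are assumptions and are not the `SwaggerCliError` type itself.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    Internal, Usage, Network, InvalidSpec, AliasExists, AuthFailed, AliasNotFound,
    CacheLocked, Cache, Config, Io, Json, CacheIntegrity, OfflineMode, PolicyBlocked,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Retry {
    Yes,
    No,
    Maybe,
}

impl ErrorKind {
    /// (exit code, robot JSON code, retry hint), one row per table entry.
    pub fn contract(self) -> (i32, &'static str, Retry) {
        use ErrorKind::*;
        match self {
            Internal => (1, "INTERNAL_ERROR", Retry::Maybe),
            Usage => (2, "USAGE_ERROR", Retry::No),
            Network => (4, "NETWORK_ERROR", Retry::Yes),
            InvalidSpec => (5, "INVALID_SPEC", Retry::No),
            AliasExists => (6, "ALIAS_EXISTS", Retry::No),
            AuthFailed => (7, "AUTH_FAILED", Retry::No),
            AliasNotFound => (8, "ALIAS_NOT_FOUND", Retry::No),
            CacheLocked => (9, "CACHE_LOCKED", Retry::Yes),
            Cache => (10, "CACHE_ERROR", Retry::Maybe),
            Config => (11, "CONFIG_ERROR", Retry::No),
            Io => (12, "IO_ERROR", Retry::Maybe),
            Json => (13, "JSON_ERROR", Retry::No),
            CacheIntegrity => (14, "CACHE_INTEGRITY", Retry::Yes),
            OfflineMode => (15, "OFFLINE_MODE", Retry::No),
            PolicyBlocked => (16, "POLICY_BLOCKED", Retry::No),
        }
    }

    pub fn exit_code(self) -> i32 {
        self.contract().0
    }

    pub fn json_code(self) -> &'static str {
        self.contract().1
    }

    pub fn retry(self) -> Retry {
        self.contract().2
    }
}
```
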
---
## Appendix C: Robot Mode Output Schemas
**Success response:**
```json
{
"ok": true,
"data": { ... },
"meta": {
"schema_version": 1,
"tool_version": "semver",
"command": "string",
"alias": "string",
"spec_version": "string",
"cached_at": "ISO-8601",
"duration_ms": 0
}
}
```
**Error response (stderr):**
```json
{
"ok": false,
"error": {
"code": "ERROR_CODE",
"message": "Human-readable error message",
"suggestion": "Actionable suggestion (optional)"
},
"meta": {
"schema_version": 1,
"tool_version": "semver",
"command": "string",
"duration_ms": 0
}
}
```
**Contract guarantee:** Both success and error envelopes ALWAYS include `meta`. Agents need only one JSON parser, not two.
**JSON Schema artifacts (normative):**
Canonical JSON Schema files are versioned and published alongside the binary:
- `docs/robot-schema/v1/success.schema.json` — validates all success responses
- `docs/robot-schema/v1/error.schema.json` — validates all error responses
These schemas are the authoritative contract for robot mode output. Integration tests validate all golden fixtures against these schemas. External agent frameworks can fetch and use them for input validation.
**Compatibility policy:**
- **Additive fields** (new optional fields in `data` or `meta`): no `schema_version` bump required
- **Removed, renamed, or type-changed fields**: MUST bump `schema_version`
- `meta.command_version` (optional) tracks command-specific payload evolution, for cases where a single command's data shape changes independently of the others
---
## Implementation Phases
### Week 1: Core Infrastructure (CP0-CP2)
**CP0: Repo + CLI skeleton**
- Initialize Cargo project with correct dependencies
- Create directory structure
- Wire global flags (`--robot`, `--config`)
- Set up CI skeleton (fmt, clippy, test gates)
- **Exit criteria:** `swagger-cli --help` works; CI runs fmt/clippy/tests; all global flags parsed
**CP1: Cache + Index format**
- Implement XDG-compliant cache directory layout (`~/.cache/swagger-cli/aliases/<alias>/`)
- Implement config management (`~/.config/swagger-cli/config.toml`)
- Implement PathResolver applying D7 precedence (SWAGGER_CLI_HOME, SWAGGER_CLI_CACHE, XDG)
- Build index extraction from raw JSON spec (parse as `serde_json::Value`, extract `SpecIndex`)
- Index build MUST be deterministic (sorted endpoints/schemas/tags)
- Compute effective security per operation (root + operation-level override semantics)
- Implement crash-consistent writes (raw+index first, meta LAST as commit marker, generation + index_hash)
- Implement per-alias file locking
- Implement canonical ingest pipeline: detect format (JSON/YAML via extension + Content-Type), parse, normalize to JSON
- Store original bytes as `raw.source`, normalized JSON as `raw.json`
- Enforce `--max-bytes` via streaming byte count during download (fail before full buffering)
- Validate all index pointers (operation_ptr, schema_ptr) resolve against raw.json after index build
- Implement alias format validation (regex check, reserved name rejection, path traversal prevention)
- Implement SSRF protection (block private/loopback/link-local/multicast ranges, HTTPS-by-default, DNS rebinding check, `--allow-private-host`, `--allow-insecure-http`)
- Implement `fetch` command (URL + file:// + local path + stdin, repeatable --header, --bearer, --auth-profile, --input-format, --resolve-external-refs, timeouts, size caps, retries)
- Write error types with exit codes (including Usage and CacheIntegrity variants)
- **Exit criteria:**
- `fetch` writes raw.source + raw.json + index/meta crash-consistently with fsync; meta.json is commit marker; generation/index_hash validated on read
- Lock acquisition bounded (<=1s default), fails fast with CACHE_LOCKED on timeout
- Index build is deterministic (sorted by path+method / name)
- All index pointers (operation_ptr, schema_ptr) validated against raw.json at build time
- Index contains all fields required by FR-2 robot output (parameters, request_body_required, effective security_schemes + security_required)
- Alias format validation rejects path traversal, reserved names, empty, and overlong inputs
- SSRF policy enforced: loopback/RFC1918/link-local/multicast blocked by default; `--allow-private-host` permits exceptions; `--allow-insecure-http` required for HTTP URLs
- DNS rebinding check validates resolved IP after redirects
- `file://` and local paths work; YAML input accepted and normalized
- Streaming max-bytes enforcement (does not buffer entire response)
- All tests run under SWAGGER_CLI_HOME for hermetic behavior
- Unit tests validate index extraction against fixtures (petstore JSON + petstore YAML + large fixture)
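
The write ordering in CP1 (data files first, `meta.json` last as the commit marker) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the per-alias lock is assumed to be held by the caller, and generation/index_hash bookkeeping is left out.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to a sibling temp file, fsyncs it, then renames it over `path`.
/// Readers see either the old file or the complete new one, never a partial write.
fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let mut tmp_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = path.with_file_name(tmp_name);

    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush data and metadata before the rename
    fs::rename(&tmp, path)
}

/// Commits one alias directory. `meta.json` is written last, so a crash at any
/// earlier step leaves the previous meta (or none) in place for `doctor` to detect.
pub fn commit_alias(
    dir: &Path,
    source: &[u8],
    raw_json: &[u8],
    index: &[u8],
    meta: &[u8],
) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    write_atomic(&dir.join("raw.source"), source)?;
    write_atomic(&dir.join("raw.json"), raw_json)?;
    write_atomic(&dir.join("index.json"), index)?;
    write_atomic(&dir.join("meta.json"), meta)?; // commit marker
    if cfg!(unix) {
        // Persist the renames themselves by syncing the directory entry.
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```
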
**CP2: Core queries**
- Implement `list` command (filters, sort, limit -- reads index only)
- Implement `show` command (loads index for pointer, then raw.json as Value for full details)
- `show` returns raw subtrees deterministically (no ref expansion in CP2 — deferred to CP3 shared utility)
- Implement robot mode output with `schema_version`, `tool_version`, `command`, `duration_ms`
- Human output formatting (color/unicode controlled by TTY, independent of `--robot`)
- Implement global `--network` policy enforcement (offline blocks fetch/sync with OFFLINE_MODE error)
- **Exit criteria:**
- `list` + `show` + robot output stable with schema_version=1
- Robot error envelope includes meta (schema_version, tool_version, command, duration_ms) -- unified with success envelope
- Invalid regex/options fail with USAGE_ERROR (no silent fallbacks)
- `show` is deterministic: ambiguous path (multiple methods) requires `--method` and returns `USAGE_ERROR` in robot mode with available methods in suggestion
- Latency target validated against GitHub spec fixture (8MB+)
- Golden output test for both commands
- Raw.json removal test: `list` succeeds without raw.json present
**Deliverable:** Working fetch, list, show with index-backed performance and stable robot contract
### Week 2: Polish and Distribution (CP3-CP5)
**CP3: Discovery commands**
- Implement `search` command (tokenized multi-term scoring, Unicode-safe snippets, honors `--case-sensitive` and `--exact`)
- Implement `schemas` command (list from index, `--show` loads raw via JSON pointer)
- Implement `tags` command (reads index only)
- Implement shared `$ref` expansion utility used by:
- `show --expand-refs`
- `schemas --show --expand-refs`
- Bounded depth + cycle detection (`$circular_ref`) + external-ref annotation (`$external_ref`)
- Add golden robot output tests for all commands
- **Exit criteria:** All three commands index-backed; options honored; golden tests added and passing; ref expansion works with cycles and external refs annotated
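
The shared `$ref` expansion utility could follow the shape below. To stay dependency-free, the sketch uses a small stand-in `Json` enum instead of `serde_json::Value`; the `expand`, `resolve`, and `obj` names are assumptions, and percent-decoding of pointer fragments is omitted.

```rust
use std::collections::BTreeMap;

/// Stand-in for serde_json::Value, only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Builds an object from key/value pairs.
pub fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Obj(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

/// Resolves an internal pointer such as "#/components/schemas/Pet".
fn resolve<'a>(root: &'a Json, reference: &str) -> Option<&'a Json> {
    let mut node = root;
    for raw in reference.strip_prefix("#/")?.split('/') {
        // JSON Pointer unescaping: "~1" becomes "/", then "~0" becomes "~".
        let key = raw.replace("~1", "/").replace("~0", "~");
        node = match node {
            Json::Obj(map) => map.get(&key)?,
            Json::Arr(items) => items.get(key.parse::<usize>().ok()?)?,
            _ => return None,
        };
    }
    Some(node)
}

/// Expands internal refs. `stack` holds the refs currently being expanded
/// (cycle detection); `depth_left` bounds the number of nested ref hops.
pub fn expand(root: &Json, node: &Json, stack: &mut Vec<String>, depth_left: usize) -> Json {
    match node {
        Json::Obj(map) => {
            if let Some(Json::Str(target)) = map.get("$ref") {
                if !target.starts_with('#') {
                    // External refs are annotated, never fetched at query time.
                    return obj(&[("$external_ref", Json::Str(target.clone()))]);
                }
                if stack.iter().any(|seen| seen == target) {
                    return obj(&[("$circular_ref", Json::Str(target.clone()))]);
                }
                if depth_left == 0 {
                    return node.clone(); // depth bound reached: leave the ref as-is
                }
                return match resolve(root, target) {
                    Some(found) => {
                        stack.push(target.clone());
                        let out = expand(root, found, stack, depth_left - 1);
                        stack.pop();
                        out
                    }
                    None => node.clone(), // dangling pointer: leave untouched
                };
            }
            Json::Obj(
                map.iter()
                    .map(|(k, v)| (k.clone(), expand(root, v, stack, depth_left)))
                    .collect(),
            )
        }
        Json::Arr(items) => {
            Json::Arr(items.iter().map(|v| expand(root, v, stack, depth_left)).collect())
        }
        other => other.clone(),
    }
}
```

Tracking the active ref chain in `stack`, rather than a global visited set, lets the same schema be expanded in sibling positions while still stopping true cycles.
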
**CP4: Alias + Sync + Doctor + Cache**
- Implement `aliases` command (list, show, rename, delete, set-default)
- Implement `sync` command (change detection via index comparison, atomic cache update)
- Bounded concurrency for `--all` (`--jobs`, `--per-host`, Retry-After handling)
- Per-alias partial failure reporting in robot output
- Implement `doctor` command (validate cache integrity, pointer validation, stale detection, disk usage, `--fix` with locking)
- Implement `cache` command (stats, prune-stale, max-total-mb with LRU eviction, dry-run)
- Add `--all-aliases` support to `list` and `search` commands
- Implement `diff` command (structural comparison of two spec states, leveraging sync index-diff logic)
- **Exit criteria:**
- All commands concurrency-safe; sync diffs computed from old vs new index
- Sync supports `--details` with capped change lists (agent-actionable added/removed/modified arrays)
- Sync `--all --jobs 4` faster than sequential (verified with 4+ aliases)
- Sync `--resume` resumes interrupted `--all` runs from checkpoint; `--max-failures` aborts after N failures
- Doctor detects partial caches (meta missing, generation/hash mismatch) + validates pointers + warns on insecure config permissions (tokens present)
- Cache stats/prune/eviction operational
- `--all-aliases` returns results with `alias` field from multiple specs
- `diff` command reports added/removed/modified endpoints and schemas
- Integration tests pass
**CP5: Release artifacts + Reliability**
- GitLab CI/CD pipeline (multi-platform builds)
- Supply chain hardening: SHA256SUMS + minisign signatures in release pipeline; cargo-deny + cargo-audit in CI
- Publish robot JSON Schema artifacts (`docs/robot-schema/v1/`); golden tests validate against schemas
- Docker image (correct XDG paths: `/root/.cache/swagger-cli/aliases` + `/root/.config/swagger-cli`)
- Installation script with checksum/signature verification (URLs match chosen hosting -- GitLab, not GitHub placeholder)
- Performance benchmarks in `benches/perf.rs` with Criterion
- Reliability stress tests:
- Fault injection at each write protocol step (verify recoverability)
- Multi-process lock contention (N>=32, verify bounded timeout + no deadlocks)
- Property-based tests for deterministic ordering and pointer validity
- Final golden test validation
- **Exit criteria:** Multi-platform binaries build in CI; Docker image runs correctly; install script downloads from correct URL and verifies integrity (portable sha256sum/shasum); cargo-deny + cargo-audit pass; all benchmarks meet targets; fault injection + lock contention + property tests pass; robot JSON Schema artifacts published and golden tests validate against them
**Deliverable:** Production-ready v1.0.0
### Future (v2+)
**Advanced features:**
- Semantic search
- Diff breaking-change classification (heuristic-based semantic analysis)
- Schema validation
- curl command generation
- Import/export
- OS keychain credential backend
- SBOM/provenance attestation
- Auto-update
---
## Conclusion
This PRD provides a complete blueprint for building swagger-cli, a fast, agent-optimized CLI tool for querying OpenAPI specifications. The design prioritizes:
1. **Performance:** <50ms cached queries via index-backed architecture (query commands never parse raw specs)
2. **Reliability:** Crash-consistent cache writes (fsync) and concurrency-safe locking with bounded lock acquisition, both validated by fault-injection and lock-contention stress tests; strict option validation; meaningful exit codes
3. **Determinism:** Sorted indexes with canonical method ordering, deterministic tie-breaking in search, integer scores, stable golden tests, predictable sync diffs, global network policy for reproducible CI/agent runs
4. **Contract stability:** Unified JSON envelope (meta on both success AND error), versioned schema, strict error taxonomy aligned between code and docs, HTTP error classification
5. **Security:** SSRF protection (default-deny for private ranges, HTTPS required, DNS rebinding check), credential source abstraction (Literal/EnvVar/Keyring), alias format validation, auth profiles, token redaction, insecure config permission warnings, supply chain hardening (SHA256SUMS + minisign + cargo-deny/cargo-audit)
6. **Compatibility:** Tolerant JSON/YAML parsing handles any valid OpenAPI 3.0/3.1 spec, including extensions; effective security semantics; external refs annotated (never fetched) or bundled at fetch time (opt-in)
7. **Portability:** Rustls TLS for Alpine/musl builds, hermetic test suite, no OpenSSL dependency
8. **Simplicity:** Single binary, no runtime dependencies
9. **Extensibility:** Clean architecture for future features (semantic search, diff breaking-change classification, schema validation, YAML output)
10. **Operational hygiene:** Cache lifecycle management (prune, stats, size caps), cross-alias discovery
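
The default-deny rule for private ranges in the Security item can be sketched with the standard library's address types. This is a minimal illustration under assumptions: the function names are hypothetical, the set of blocked ranges is a guess at a reasonable baseline rather than the policy this PRD specifies, and a real check would have to run against the addresses actually resolved at connection time to address DNS rebinding.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// IPv4 ranges treated as non-public in this sketch.
fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16 (includes cloud metadata addresses)
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
}

/// IPv6 ranges treated as non-public in this sketch.
fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    let first = ip.segments()[0];
    ip.is_loopback()                    // ::1
        || ip.is_unspecified()          // ::
        || (first & 0xfe00) == 0xfc00   // unique local, fc00::/7
        || (first & 0xffc0) == 0xfe80   // link-local, fe80::/10
}

/// Returns true when a resolved address should be refused by default.
/// IPv4-mapped IPv6 addresses are unwrapped so they cannot bypass the IPv4 rules.
pub fn is_blocked_address(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => is_blocked_v4(v4),
            None => is_blocked_v6(v6),
        },
    }
}
```

For example, `is_blocked_address("169.254.169.254".parse().unwrap())` returns `true`, while a public address such as `93.184.216.34` passes the check.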
**Next steps:**
1. Review and approve PRD
2. Create repository
3. Begin CP0 (repo + CLI skeleton)
4. Checkpoint reviews at CP0, CP1, CP2 (Week 1) and CP3, CP4, CP5 (Week 2)
---
**Document Status:** Ready for review
**Target Start Date:** TBD
**Target Release:** v1.0.0, 2 weeks after the start date
---
## Rejected Recommendations
- **Precomputed token postings index + fuzzy search in MVP (ChatGPT #6)** — rejected because the original plan's tokenized multi-term scoring over the SpecIndex is already fast enough for the target use case (3-10 specs, <500 endpoints). Precomputed postings add index format complexity and churn. Fuzzy/typo-tolerant matching adds edge-case complexity (edit distance thresholds, false positives). Both are valid Phase 3 enhancements if search performance degrades with larger specs, but not justified for MVP.
- **Gzip/compressed input support (ChatGPT #1 partial)** — rejected because OpenAPI specs are rarely served gzip-compressed at the application layer (HTTP transport handles this transparently via `Accept-Encoding`). Adding explicit `.json.gz` / `.yaml.gz` handling increases ingestion complexity for a rare case. If needed, users can decompress before piping to stdin.
- **Silent auto-rebuild of index from raw during reads (ChatGPT #4 partial)** — rejected because transparent repair during reads masks bugs and creates unpredictable latency spikes. The current design surfaces CACHE_INTEGRITY errors explicitly and relies on `doctor --fix` for repair. Explicit > implicit for agent reliability. Pointer validation at fetch/sync time and doctor re-validation are accepted (see D19 and FR-9 updates).
- **Async HTTP + explicit service layering with tokio/tracing/src/app/src/infra structure (ChatGPT feedback-2 #6)** — rejected because the MVP scope (10 commands, 3-10 specs, `sync --all` with `--jobs` bounded thread pool) does not justify the complexity of an async runtime (tokio), structured logging framework (tracing), and three-layer architecture (app/infra/cli). `sync --all` already uses bounded thread-based concurrency. The flat `cli/` + `core/` structure is appropriate at this scale. If async becomes needed for Phase 3 (e.g., streaming, WebSocket-based live sync), it can be introduced then without breaking the CLI interface.
- **SBOM generation + cosign provenance attestation in v1 release pipeline (ChatGPT feedback-2 #9 partial)** — rejected for v1 because it adds CI complexity (cargo-cyclonedx, cosign toolchain, attestation workflow) for marginal benefit at initial release. cargo-deny + cargo-audit dependency auditing is accepted and provides the primary supply-chain benefit. SBOM/provenance is tracked in Phase 3 for when the tool has external consumers who need formal software bill of materials.
- **Full breaking-change classification in Phase 2 diff command (ChatGPT feedback-2 #3 partial)** — rejected for Phase 2 because heuristic breaking-change analysis (field removal = breaking, type change = breaking, new optional field = non-breaking) requires substantial domain logic and edge-case handling. Phase 2 diff reports structural changes (added/removed/modified); Phase 3 adds semantic classification. This avoids shipping unreliable breaking-change verdicts.