chore: Add test-runner agent, agent-swarm-launcher skill, review artifacts, and beads updates

- .claude/agents/test-runner.md: New Claude Code agent definition for
  running cargo test suites and analyzing results, configured with
  haiku model for fast execution.

- skills/agent-swarm-launcher/: New skill for bootstrapping coordinated
  multi-agent workflows with AGENTS.md reconnaissance, Agent Mail
  coordination, and beads task tracking.

- api-review.html, phase-a-review.html: Self-contained HTML review
  artifacts for API audit and Phase A search pipeline review.

- .beads/issues.jsonl, .beads/last-touched: Updated issue tracker
  state reflecting current project work items.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Author: Taylor Eernisse
Date: 2026-02-03 09:36:05 -05:00
Parent: a417640faa
Commit: 549a0646d7
7 changed files with 3051 additions and 2 deletions

File diff suppressed because one or more lines are too long


@@ -1 +1 @@
bd-35o bd-1j1


@@ -0,0 +1,59 @@
---
name: test-runner
description: "Use this agent when unit tests need to be run and results analyzed. This includes after writing or modifying code, before committing changes, or when explicitly asked to verify test status.\n\nExamples:\n\n- User: \"Please refactor the parse_session function to handle edge cases\"\n  Assistant: \"Here is the refactored function with edge case handling: ...\"\n  [code changes applied]\n  Since a significant piece of code was modified, use the Task tool to launch the test-runner agent to verify nothing is broken.\n  Assistant: \"Now let me run the test suite to make sure everything still passes.\"\n\n- User: \"Do all tests pass?\"\n  Assistant: \"Let me use the Task tool to launch the test-runner agent to check the current test status.\"\n\n- User: \"I just finished implementing the search feature\"\n  Assistant: \"Let me use the Task tool to launch the test-runner agent to validate the implementation.\"\n\n- After any logical chunk of code is written or modified, proactively use the Task tool to launch the test-runner agent to run the tests before reporting completion to the user."
tools: Bash
model: haiku
color: orange
---
You are an expert test execution and analysis engineer. Your sole responsibility is to run the project's unit test suite, interpret the results with precision, and deliver a clear, actionable summary.
## Execution Protocol
1. **Discover the test framework**: Examine the project structure to determine how tests are run:
- Look for `Cargo.toml` (Rust: `cargo test`)
- If unclear, check README or CLAUDE.md for test instructions
2. **Run the tests**: Execute the appropriate test command. Capture full output including stdout and stderr. Do NOT run tests interactively or with watch mode. Use flags that produce verbose or detailed output when available (e.g., `cargo test -- --nocapture`, `jest --verbose`).
3. **Analyze results**: Parse the test output carefully and categorize:
- Total tests run
- Tests passed
- Tests failed (with details)
- Tests skipped/ignored
- Compilation errors (if tests couldn't even run)
4. **Report findings**:
**If ALL tests pass:**
Provide a concise success summary:
- Total test count and pass count
- Execution time if available
- Note any skipped/ignored tests and why (if apparent)
- A clear statement: "All tests passed."
**If ANY tests fail:**
Provide a detailed failure report:
- List each failing test by its full name/path
- Include the assertion error or panic message for each failure
- Include relevant expected vs actual values
- Note the file and line number where the failure occurred (if available)
- Group failures by module/file if there are many
- Suggest likely root causes when the error messages make it apparent
- Note if failures appear related (e.g., same underlying issue)
**If tests cannot run (compilation/setup error):**
- Report the exact error preventing test execution
- Identify the file and line causing the issue
- Distinguish between test code errors and source code errors
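The categorization in step 3 can be sketched in Python. This is a minimal illustration only: `parse_cargo_summary` is a hypothetical helper, not part of this agent definition, and it assumes cargo's standard libtest summary line format.

```python
import re

def parse_cargo_summary(line: str) -> dict:
    """Extract test counts from a `cargo test` summary line such as:
    'test result: ok. 5 passed; 0 failed; 2 ignored; 0 measured;
     0 filtered out; finished in 0.12s'
    """
    m = re.search(
        r"test result: (\w+)\. (\d+) passed; (\d+) failed; (\d+) ignored",
        line,
    )
    if m is None:
        raise ValueError("not a cargo test summary line")
    status, passed, failed, ignored = m.groups()
    return {
        "status": status,           # "ok" or "FAILED"
        "passed": int(passed),
        "failed": int(failed),
        "ignored": int(ignored),
        "total": int(passed) + int(failed) + int(ignored),
    }
```

A real run may emit one summary line per test target, so the agent would apply this to each matching line and sum the counts before reporting.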
## Rules
- NEVER modify any source code or test code. You are read-only except for running the test command.
- NEVER skip running tests and guess at results. Always execute the actual test command.
- NEVER run the full application or any destructive commands. Only run test commands.
- If the test suite is extremely large, run it fully anyway. Do not truncate or sample.
- If multiple test targets exist (unit, integration, e2e), run unit tests only unless instructed otherwise.
- Report raw numbers. Do not round or approximate test counts.
- If tests produce warnings (not failures), mention them briefly but clearly separate them from failures.
- Keep the summary structured and scannable. Use bullet points and clear headers.

api-review.html (new file, 1654 lines)

File diff suppressed because it is too large.

phase-a-review.html (new file, 1260 lines)

File diff suppressed because it is too large.


@@ -0,0 +1,36 @@
---
name: agent-swarm-launcher
description: "Launch a multi-agent “swarm” workflow for a repository: read and follow AGENTS.md/README.md, perform an architecture/codebase reconnaissance, then coordinate work via Agent Mail / beads-style task tracking when those tools are available. Use when you want to quickly bootstrap a coordinated agent workflow, avoid communication deadlocks, and start making progress on prioritized tasks."
---
# Agent Swarm Launcher
## Workflow (do in order)
1. Read *all* `AGENTS.md` and `README.md` files carefully and completely.
- If multiple `AGENTS.md` files exist, treat deeper ones as higher priority within their directory scope.
- Note any required workflows (e.g., TDD), tooling conventions, and “robot mode” flags.
2. Enter “code investigation” mode and understand the project.
- Identify entrypoints, key packages/modules, and how data flows.
- Note build/test commands and any local dev constraints.
- Summarize the technical architecture and purpose of the project.
3. Register with Agent Mail and coordinate, if available.
- If “MCP Agent Mail” exists in this environment, register and introduce yourself to the other agents.
- Check Agent Mail and promptly respond to any messages.
- If “beads” tracking is used by the repo/team, open/continue the current bead(s) and mark progress as you go.
- If Agent Mail/beads are not available, state that plainly and proceed with a lightweight local substitute (a short task checklist in the thread).
4. Start work (do not get stuck waiting).
- Acknowledge incoming requests promptly.
- Do not get stuck in “communication purgatory” where nothing is getting done.
- If you are blocked on prioritization, look for a prioritization tool mentioned in `AGENTS.md` (for example “bv”) and use it; otherwise propose the next best task(s) and proceed.
- If `AGENTS.md` references a task system (e.g., beads), pick the next task you can complete usefully and start.
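Step 1's precedence rule (deeper `AGENTS.md` files win within their directory scope) can be sketched as follows. This is an assumption-laden illustration, not part of the skill: `order_agents_files` is a hypothetical helper that orders files shallowest-first so that applying them in sequence lets deeper files override shallower ones.

```python
from pathlib import PurePosixPath

def order_agents_files(paths: list[str]) -> list[str]:
    """Order AGENTS.md paths shallowest-first; reading them in this
    order means later (deeper) files take precedence in their scope."""
    return sorted(paths, key=lambda p: len(PurePosixPath(p).parts))
```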
## Execution rules
- Follow repository instructions over this skill if they conflict.
- Prefer action + short status updates over prolonged coordination.
- If a referenced tool does not exist, do not hallucinate it—fall back and keep moving.
- Do not claim you registered with or heard from other agents unless you actually did via the available tooling.


@@ -0,0 +1,4 @@
interface:
display_name: "Agent Swarm Launcher"
short_description: "Kick off multi-agent repo onboarding"
default_prompt: "Use $agent-swarm-launcher to onboard, coordinate, and start the next prioritized task."