diff --git a/BACKBURNER.md b/BACKBURNER.md
new file mode 100644
index 0000000..dd73ba9
--- /dev/null
+++ b/BACKBURNER.md
@@ -0,0 +1,10 @@
+# Backburner Ideas
+
+Last updated: February 23, 2026
+
+## Deferred
+
+- Separate pricing semantics for subscription vs API usage:
+  - `cburn` should model Claude subscription economics independently from Admin/API token pricing.
+  - Admin Cost API integration is still valuable for API-key workflows, but should not be treated as the canonical source for subscription usage.
+  - Revisit after daemon/event infrastructure work stabilizes.
diff --git a/CEO_PITCH_DECKS.md b/CEO_PITCH_DECKS.md
new file mode 100644
index 0000000..7794176
--- /dev/null
+++ b/CEO_PITCH_DECKS.md
@@ -0,0 +1,236 @@
+# CEO Meeting Script: Four Daemon-Enabled Product Bets
+
+## Opening
+Today I want to walk through four product bets that become possible because we now have a continuously running daemon that can observe usage as it happens, not just after the fact.
+
+I am not presenting speculative AI magic. I am presenting four concrete products, each with a hard-nosed view of utility, feasibility, risk, and build path.
+
+The four bets are:
+1. Cognitive Flight Recorder
+2. Runaway Loop Quencher
+3. Session Archeology Engine
+4. Latent Tool ROI Scanner
+
+My recommendation is not “build all at once.” My recommendation is staged execution with clear kill criteria.
+
+---
+
+## 1) Cognitive Flight Recorder
+
+### The pitch
+If our AI spend spikes tomorrow, leadership will ask two questions immediately: what happened, and why did it happen. Right now, we can answer “how much,” but we cannot reliably answer “why.”
+
+The Cognitive Flight Recorder solves that. It turns a costly session into a replayable incident timeline. Not a dashboard snapshot. A sequence: where cost accelerated, where model behavior changed, where cache efficiency collapsed, and where the session crossed from productive to expensive.
+
+This product creates operational trust. When AI systems are expensive, trust depends on explainability under stress.
+
+### Why we need it
+Postmortems are currently slow and anecdotal. Engineers reconstruct stories manually. Finance gets numbers without causality. Leadership gets noise.
+
+A Flight Recorder makes AI spend investigable the same way we investigate reliability incidents. That is a major enterprise unlock.
+
+### How it works
+The daemon continuously captures session telemetry and emits timeline events. The recorder layers on top:
+- It builds per-session event streams with timestamps, token deltas, model transitions, and cache transitions.
+- It detects inflection points where cost trajectory changes materially.
+- It generates a concise incident report: “what changed,” “likely causes,” and “which preventive policy would have helped.”
+
+The output is not just visual. It is operational: replay + evidence + recommended guardrail.
+
+### Downstream effects
+If we execute this well, we get:
+- Faster incident resolution for spend spikes.
+- Better policy tuning because we can pinpoint the moment of failure.
+- Stronger executive confidence in scaling agent usage.
+- A compelling enterprise story: “we can explain every anomaly.”
+
+### Skeptical view
+Is it actually useful? Only if it leads to action. A pretty timeline that no one uses is dead weight.
+
+Is it feasible with current harnesses? Partially. We can do strong metadata-level replay now. Deep semantic replay depends on richer telemetry and raises privacy concerns.
+
+The critical risk is false causality: users may confuse sequence with cause. We mitigate this by attaching confidence levels and explicit evidence for every claim.
+
+### Implementation roadmap
+Phase 1 (2-3 weeks): metadata replay and “Top Cost Incidents” report.  
+Phase 2 (3-5 weeks): inflection detection and root-cause ranking with confidence scoring.  
+Phase 3 (4+ weeks): optional deep replay, privacy controls, and incident workflow integrations.
+
+### Decision rule
+Proceed if incident reports result in measurable policy changes. Kill or narrow if they remain passive observability artifacts.
+
+---
+
+## 2) Runaway Loop Quencher
+
+### The pitch
+Most bad AI spend is not one bad call. It is a loop: repeated expensive behavior with little progress. If we only detect this after the session, we are too late.
+
+The Runaway Loop Quencher is an active safety layer. It watches live telemetry, identifies likely runaway patterns, and intervenes before the burn compounds.
+
+This is the direct path to cost containment at runtime.
+
+### Why we need it
+Without active containment, scaling agent autonomy is financially unsafe. Teams become conservative. Leaders reduce usage. Innovation slows.
+
+If we can intervene mid-flight, we convert catastrophic sessions into manageable sessions.
+
+### How it works
+The daemon computes rolling risk signals:
+- accelerating cost per minute
+- repetitive call signatures
+- degrading cache performance
+- high token growth with weak progress proxies
+
+A policy engine converts those signals into action tiers:
+- Soft: alert and suggest a reset strategy
+- Guarded: require confirmation before continuing expensive patterns
+- Hard: stop execution for supported harnesses
+
+We start advisory-first, then move toward control where integrations allow.
+
+### Downstream effects
+If successful:
+- fewer runaway incidents
+- lower variance in daily spend
+- greater confidence in letting agents run longer on valuable tasks
+- ability to define budget safety SLOs
+
+### Skeptical view
+Is it actually useful? Yes, but only if precision is good. A high false-positive rate will cause immediate distrust, and teams will disable it.
+
+Is it feasible given harness reality? Detection and alerting are feasible now. Hard-stop control is integration-dependent and not universally available.
+
+The hard technical challenge is measuring “progress.” We can estimate risk, but progress is not always machine-observable. That means we should not over-automate too early.
+
+### Implementation roadmap
+Phase 1 (2-4 weeks): risk scoring, alerting, and daemon risk endpoint.  
+Phase 2 (4-6 weeks): human confirmation gates and cooldown policies.  
+Phase 3 (6+ weeks): optional hard-stop integrations and policy simulation.
+
+### Decision rule
+Ship only if we can keep false positives low enough that teams keep it enabled. If intervention is frequently wrong, this product should remain advisory.
+
+---
+
+## 3) Session Archeology Engine
+
+### The pitch
+Right now we can tell teams they spent too much. We cannot tell them which recurring behavior patterns caused it.
+
+The Session Archeology Engine classifies sessions into behavioral archetypes and ties each archetype to practical intervention playbooks.
+
+This turns raw telemetry into behavior change.
+
+### Why we need it
+People do not improve from aggregate numbers. They improve from named patterns and concrete alternatives.
+
+If we can say, “These two session archetypes account for most avoidable spend, and here is exactly how to run them differently,” we create durable cost literacy.
+
+### How it works
+We extract session-level feature vectors:
+- session shape and duration profile
+- token composition and burstiness
+- cache behavior
+- model mix and switch behavior
+- retry and repetition patterns
+
+We cluster sessions and assign human-readable archetypes, then connect each archetype to:
+- likely waste mechanism
+- recommended policy/routing pattern
+- suggested prompt and workflow changes
+
+The output is both analytical and prescriptive.
+
+### Downstream effects
+If this works:
+- managers coach with evidence instead of intuition
+- teams adopt archetype-specific best practices
+- routing policies improve faster because they target behaviors, not averages
+- executives get clean narrative reporting on spend dynamics
+
+### Skeptical view
+Is it actually useful? It is useful only if archetypes stay stable and map to actions. Otherwise it becomes taxonomy theater.
+
+Is it feasible? Yes, a baseline version is feasible with existing metadata. The advanced versions become more valuable as we add richer tool and outcome signals.
+
+Main risk: labels can drift as models and workflows change. We mitigate with periodic retraining, versioned labels, and strict “action attached” requirements.
+
+### Implementation roadmap
+Phase 1 (2-3 weeks): clustering baseline and weekly archetype report.  
+Phase 2 (3-5 weeks): intervention playbooks and policy recommendations per archetype.  
+Phase 3 (4+ weeks): team benchmarking and archetype drift alerts.
+
+### Decision rule
+Keep investing only if archetypes produce measurable improvements in behavior and cost, not just better reporting.
+
+---
+
+## 4) Latent Tool ROI Scanner
+
+### The pitch
+Model choice is not the only cost lever. Tool behavior often dominates spend efficiency, and today that layer is mostly invisible.
+
+The Latent Tool ROI Scanner identifies which tools and workflows consume disproportionate cost relative to useful outcome, and recommends what to constrain, replace, or redesign.
+
+This is potentially the highest-upside concept, but it also carries the highest epistemic risk.
+
+### Why we need it
+Optimization efforts usually target visible levers. Hidden tool-level waste can remain untouched for months.
+
+If we can reveal negative-ROI tool patterns, we unlock savings without reducing strategic AI adoption.
+
+### How it works
+The scanner combines daemon telemetry with richer tool-event instrumentation:
+- per-tool invocation frequency and cost footprint
+- failure and retry signatures
+- outcome proxies from delivery systems (tests, merges, ticket transitions)
+
+It then computes conservative ROI scores and counterfactual scenarios:
+- “If we reduce this pattern by 30%, estimated impact is X with confidence band Y.”
+
+Recommendations are always evidence-backed and confidence-scored.
+
+### Downstream effects
+If accurate:
+- identifies hidden spend sinks
+- informs platform/tooling investments
+- enables high-leverage policy changes with limited developer friction
+- strengthens unit economics of agent operations
+
+### Skeptical view
+Is it actually useful today? Not fully. Without stronger outcome labeling, ROI claims can become fragile or misleading.
+
+Is it feasible with current harnesses? Partially. We can pilot scoring frameworks, but high-confidence production decisions require instrumentation we do not yet have.
+
+This is exactly where we should avoid overclaiming.
+
+### Implementation roadmap
+Phase 0 (1-2 weeks): instrumentation gap audit and schema design.  
+Phase 1 (3-4 weeks): tool-event ingestion and normalization pipeline.  
+Phase 2 (4-6 weeks): conservative ROI scoring + confidence intervals.  
+Phase 3 (4+ weeks): recommendation engine and controlled experiments.
+
+### Decision rule
+Treat this as a pilot until precision is validated against human review and external outcomes. If precision is weak, keep this as exploratory analytics.
+
+---
+
+## Portfolio recommendation and sequencing
+
+If we prioritize by impact multiplied by feasibility:
+1. Cognitive Flight Recorder
+2. Session Archeology Engine
+3. Runaway Loop Quencher (advisory first, control later)
+4. Latent Tool ROI Scanner (pilot behind instrumentation gate)
+
+This sequencing gives us near-term value while building the telemetry foundation needed for the harder products.
+
+The overarching principle: every insight must be tied to an action, every action must be measurable, and every high-stakes claim must carry confidence.
+
+## Closing
+The daemon turns our system from retrospective analytics into a live control surface. These four products are how we monetize and operationalize that shift.
+
+The question is not whether these ideas are interesting. The question is whether we can ship them with enough truthfulness that teams trust them.
+
+With staged delivery and strict kill criteria, we can.