gitlore/specs/SPEC_discussion_analysis.md

# Spec: Discussion Analysis — LLM-Powered Discourse Enrichment

**Parent:** SPEC_explain.md (replaces key_decisions heuristic, line 270)
**Created:** 2026-03-11
**Status:** DRAFT — iterating with user

## Spec Status
| Section | Status | Notes |
|---------|--------|-------|
| Objective | draft | Core vision defined, success metrics TBD |
| Tech Stack | draft | Bedrock + Anthropic API dual-backend |
| Architecture | draft | Pre-computed enrichment pipeline |
| Schema | draft | `discussion_analysis` table with staleness detection |
| CLI Command | draft | `lore enrich discussions` |
| LLM Provider | draft | Configurable backend abstraction |
| Explain Integration | draft | Replaces heuristic with DB lookup |
| Prompt Design | draft | Thread-level discourse classification |
| Testing Strategy | draft | Includes mock LLM for deterministic tests |
| Boundaries | draft | |
| Tasks | not started | Blocked on spec approval |

**Definition of Complete:** All sections `complete`, Open Questions empty,
every user journey has tasks, every task has TDD workflow and acceptance criteria.

---

## Open Questions (Resolve Before Implementation)

1. **Bedrock model ID**: Which exact Bedrock model will be used? (Assuming `anthropic.claude-3-haiku-*` — need the org-approved ARN or model ID.)
2. **Auth mechanism**: Does the Bedrock setup use IAM role assumption, SSO profile, or explicit access keys? This affects the SDK configuration.
3. **Rate limiting**: What's the org's Bedrock rate limit? This determines batch concurrency.
4. **Cost ceiling**: Should there be a per-run token budget or discussion count cap? (e.g., `--max-threads 200`)
5. **Confidence thresholds**: Below what confidence should we discard an analysis vs. store it with low confidence?
6. **explain integration field name**: Replace `key_decisions` entirely, or add a new `discourse_analysis` section alongside it? (Recommendation: replace `key_decisions` — the heuristic is acknowledged as inadequate.)

---

## Objective

**Goal:** Pre-compute structured discourse analysis for discussion threads using an LLM (Claude Haiku via Bedrock or Anthropic API), storing results locally so that `lore explain` and future commands can surface meaningful decisions, answered questions, and consensus without runtime LLM calls.

**Problem:** The current `key_decisions` heuristic in `explain` correlates state-change events with notes by the same actor within 60 minutes. This produces mostly empty results because real decisions happen in discussion threads, not at the moment of state changes. The heuristic cannot understand conversational semantics — whether a comment confirms a proposal, answers a question, or represents consensus.

**What this enables:**
- `lore explain issues 42` shows *actual* decisions extracted from discussion threads, not event-note temporal coincidences
- Reusable across commands — any command can query `discussion_analysis` for pre-computed insights
- Fully offline at query time — LLM enrichment is a batch pre-computation step
- Incremental — only re-analyzes threads whose notes have changed (staleness via `notes_hash`)

**Success metrics:**
- `lore enrich discussions` processes 100 threads in <60s with Haiku
- `lore explain` key_decisions section populated from enrichment data in <500ms (no LLM calls)
- Staleness detection: re-running enrichment skips unchanged threads
- Zero impact on users without LLM configuration — graceful degradation to empty key_decisions

---

## Tech Stack & Constraints

| Layer | Technology | Notes |
|-------|-----------|-------|
| Language | Rust | nightly-2026-03-01 |
| LLM (primary) | Claude Haiku via AWS Bedrock | Org-approved, security-compliant |
| LLM (fallback) | Claude Haiku via Anthropic API | For personal/non-org use |
| HTTP | asupersync `HttpClient` | Existing wrapper in `src/http.rs` |
| Database | SQLite via rusqlite | New migration for `discussion_analysis` table |
| Config | `~/.config/lore/config.json` | New `enrichment` section |

**Constraints:**
- Bedrock is the primary backend (org security requirement for Taylor's work context)
- Anthropic API is an alternative for non-org users
- `lore explain` must NEVER make runtime LLM calls — all enrichment is pre-computed
- `lore explain` performance budget unchanged: <500ms
- Enrichment is an explicit opt-in step (`lore enrich`), never runs during `sync`
- Must work when no LLM is configured — `key_decisions` degrades to empty array (or falls back to heuristic as transitional behavior)

---

## Architecture

### System Overview

```
┌─────────────────────────────────────────────────┐
│                  lore enrich                     │
│  (explicit user/agent command, batch operation)  │
└──────────────────────┬──────────────────────────┘
                       │
         ┌─────────────▼─────────────┐
         │    Enrichment Pipeline     │
         │  1. Select stale threads   │
         │  2. Build LLM prompts      │
         │  3. Call LLM (batched)     │
         │  4. Parse responses        │
         │  5. Store in DB            │
         └─────────────┬─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │   discussion_analysis     │
         │   (SQLite table)          │
         └─────────────┬─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │   lore explain / other    │
         │   (simple SELECT query)   │
         └───────────────────────────┘
```

### Data Flow

1. **Staleness detection**: For each discussion, compute `SHA-256(sorted note IDs + note bodies)`. Compare against stored `notes_hash`. Skip if unchanged.
2. **Prompt construction**: Extract the last N notes (configurable, default 5) from the thread. Build a structured prompt asking for discourse classification.
3. **LLM call**: Send to configured backend (Bedrock or Anthropic API). Parse structured JSON response.
4. **Storage**: Upsert into `discussion_analysis` with analysis results, model ID, timestamp, and notes_hash.

### Pre-computation vs Runtime Trade-offs

| Concern | Pre-computed (chosen) | Runtime |
|---------|----------------------|---------|
| explain latency | <500ms (DB query) | 2-5s per thread (LLM call) |
| Offline capability | Full | None |
| Bedrock compliance | Clean separation | Leaks into explain path |
| Reusability | Any command can query | Tied to explain |
| Freshness | Stale until re-enriched | Always current |
| Cost | Batch (predictable) | Per-query (unbounded) |

---

## Schema

### New Migration (next available version)

```sql
CREATE TABLE discussion_analysis (
    id INTEGER PRIMARY KEY,
    discussion_id INTEGER NOT NULL REFERENCES discussions(id),
    analysis_type TEXT NOT NULL,  -- 'decision', 'question_answered', 'consensus', 'open_debate', 'informational'
    confidence REAL NOT NULL,     -- 0.0 to 1.0
    summary TEXT NOT NULL,        -- LLM-generated 1-2 sentence summary
    evidence_note_ids TEXT,       -- JSON array of note IDs that support this analysis
    model_id TEXT NOT NULL,       -- e.g. 'anthropic.claude-3-haiku-20240307-v1:0'
    analyzed_at INTEGER NOT NULL, -- ms epoch
    notes_hash TEXT NOT NULL,     -- SHA-256 of thread content for staleness detection

    UNIQUE(discussion_id, analysis_type)
);

CREATE INDEX idx_discussion_analysis_discussion
    ON discussion_analysis(discussion_id);

CREATE INDEX idx_discussion_analysis_type
    ON discussion_analysis(analysis_type);
```

**Design decisions:**
- `UNIQUE(discussion_id, analysis_type)`: A thread can have at most one analysis per type. Re-enrichment upserts.
- `evidence_note_ids` is a JSON array (not a junction table) because it's read-only metadata, never queried by note ID.
- `notes_hash` enables O(1) staleness checks without re-reading all notes.
- `confidence` allows filtering in queries (e.g., only show decisions with confidence > 0.7).
- `analysis_type` uses lowercase snake_case strings, not an enum constraint, for forward compatibility.

### Analysis Types

| Type | Description | Example |
|------|-------------|---------|
| `decision` | A concrete decision was made or confirmed | "Team agreed to use Redis for caching" |
| `question_answered` | A question was asked and definitively answered | "Confirmed: the API supports pagination via cursor" |
| `consensus` | Multiple participants converged on an approach | "All reviewers approved the retry-with-backoff strategy" |
| `open_debate` | Active disagreement or unresolved discussion | "Disagreement on whether to use gRPC vs REST" |
| `informational` | Thread is purely informational, no actionable discourse | "Status update on deployment progress" |

### Notes Hash Computation

```
notes_hash = SHA-256(
    note_1_id + ":" + note_1_body + "\n" +
    note_2_id + ":" + note_2_body + "\n" +
    ...
)
```

Notes sorted by `id` (insertion order) before hashing. This means:
- New note added → hash changes → re-enrich
- Note edited (body changes) → hash changes → re-enrich
- No changes → hash matches → skip

---

## CLI Command

### `lore enrich discussions`

```bash
# Enrich all stale discussions across all projects
lore enrich discussions

# Scope to a project
lore enrich discussions -p group/repo

# Scope to a single entity's discussions
lore enrich discussions --issue 42 -p group/repo
lore enrich discussions --mr 99 -p group/repo

# Force re-enrichment (ignore staleness)
lore enrich discussions --force

# Dry run (show what would be enriched, don't call LLM)
lore enrich discussions --dry-run

# Limit batch size
lore enrich discussions --max-threads 50

# Robot mode
lore -J enrich discussions
```

### Robot Mode Output

```json
{
  "ok": true,
  "data": {
    "total_discussions": 1200,
    "stale": 45,
    "enriched": 45,
    "skipped_unchanged": 1155,
    "errors": 0,
    "tokens_used": {
      "input": 23400,
      "output": 4500
    }
  },
  "meta": { "elapsed_ms": 32000 }
}
```

### Human Mode Output

```
Enriching discussions...

  Project: vs/typescript-code
    Discussions: 1,200 total, 45 stale
    Enriching: ████████████████████ 45/45
    Results: 12 decisions, 8 questions answered, 5 consensus, 3 debates, 17 informational
    Tokens: 23.4K input, 4.5K output

  Done in 32s
```

### Command Registration

```rust
/// Pre-compute discourse analysis for discussion threads using LLM
#[command(after_help = "\x1b[1mExamples:\x1b[0m
  lore enrich discussions                      # Enrich all stale discussions
  lore enrich discussions -p group/repo        # Scope to project
  lore enrich discussions --issue 42           # Single issue's discussions
  lore -J enrich discussions --dry-run         # Preview what would be enriched")]
Enrich {
    /// What to enrich: "discussions"
    #[arg(value_parser = ["discussions"])]
    target: String,

    /// Scope to project (fuzzy match)
    #[arg(short, long)]
    project: Option<String>,

    /// Scope to a specific issue's discussions
    #[arg(long, conflicts_with = "mr")]
    issue: Option<i64>,

    /// Scope to a specific MR's discussions
    #[arg(long, conflicts_with = "issue")]
    mr: Option<i64>,

    /// Re-enrich all threads regardless of staleness
    #[arg(long)]
    force: bool,

    /// Show what would be enriched without calling LLM
    #[arg(long)]
    dry_run: bool,

    /// Maximum threads to enrich in one run
    #[arg(long, default_value = "500")]
    max_threads: usize,
},
```

---

## LLM Provider Abstraction

### Config Schema

New `enrichment` section in `~/.config/lore/config.json`:

```json
{
  "enrichment": {
    "provider": "bedrock",
    "bedrock": {
      "region": "us-east-1",
      "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
      "profile": "default"
    },
    "anthropicApi": {
      "modelId": "claude-3-haiku-20240307"
    },
    "concurrency": 4,
    "maxNotesPerThread": 5,
    "minConfidence": 0.6
  }
}
```

**Provider selection:**
- `"bedrock"` — AWS Bedrock (uses AWS SDK credential chain: env vars → profile → IAM role)
- `"anthropic"` — Anthropic API (uses `ANTHROPIC_API_KEY` env var)
- `null` or absent — enrichment disabled, `lore enrich` exits with informative message

### Rust Abstraction

```rust
/// Trait for LLM backends. Implementations handle auth, serialization, and API specifics.
#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Send a prompt and get a structured response.
    async fn complete(&self, prompt: &str, max_tokens: u32) -> Result<LlmResponse>;

    /// Provider name for logging/storage (e.g., "bedrock", "anthropic")
    fn provider_name(&self) -> &str;

    /// Model identifier for storage (e.g., "anthropic.claude-3-haiku-20240307-v1:0")
    fn model_id(&self) -> &str;
}

pub struct LlmResponse {
    pub content: String,
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub stop_reason: String,
}
```

### Bedrock Implementation Notes

- Uses AWS SDK `InvokeModel` API (not Converse) for Anthropic models on Bedrock
- Request body follows Anthropic Messages API format, wrapped in Bedrock's envelope
- Auth: AWS credential chain (env → profile → IMDS)
- Region from config or `AWS_REGION` env var
- Content type: `application/json`, accept: `application/json`

### Anthropic API Implementation Notes

- Standard Messages API (`POST /v1/messages`)
- Auth: `x-api-key` header from `ANTHROPIC_API_KEY` env var
- Model ID from config `enrichment.anthropicApi.modelId`

---

## Prompt Design

### Thread-Level Analysis Prompt

The prompt receives the last N notes from a discussion thread and classifies the discourse.

```
You are analyzing a discussion thread from a software project's issue tracker.

Thread context:
- Entity: {entity_type} #{iid} "{title}"
- Thread started: {first_note_at}
- Total notes in thread: {note_count}

Notes (most recent {N} shown):

[Note by @{author} at {timestamp}]
{body}

[Note by @{author} at {timestamp}]
{body}

...

Classify this thread's discourse. Respond with JSON only:

{
  "analysis_type": "decision" | "question_answered" | "consensus" | "open_debate" | "informational",
  "confidence": 0.0-1.0,
  "summary": "1-2 sentence summary of what was decided/answered/debated",
  "evidence_note_indices": [0, 2]  // indices of notes that most support this classification
}

Classification guide:
- "decision": A concrete choice was made. Look for: "let's go with", "agreed", "approved", explicit confirmation of an approach.
- "question_answered": A question was asked and definitively answered. Look for: question mark followed by a clear factual response.
- "consensus": Multiple people converged. Look for: multiple approvals, "+1", "LGTM", agreement from different authors.
- "open_debate": Active disagreement or unresolved alternatives. Look for: "but", "alternatively", "I disagree", competing proposals without resolution.
- "informational": Status updates, FYI notes, no actionable discourse.

If the thread is ambiguous, prefer "informational" with lower confidence over guessing.
```

### Prompt Design Principles

1. **Structured JSON output** — Haiku is reliable at JSON generation with clear schema
2. **Evidence-backed** — `evidence_note_indices` ties the classification to specific notes, enabling the UI to show "why"
3. **Conservative default** — "informational" is the fallback, preventing false-positive decisions
4. **Limited context window** — Last 5 notes (configurable) keeps token usage low per thread
5. **No system prompt tricks** — Straightforward classification task within Haiku's strengths

### Token Budget Estimation

| Component | Tokens (approx) |
|-----------|-----------------|
| System/instruction prompt | ~300 |
| Thread metadata | ~50 |
| 5 notes (avg 100 words each) | ~750 |
| Response | ~100 |
| **Total per thread** | **~1,200** |

At Haiku pricing (~$0.25/1M input, ~$1.25/1M output):
- 100 threads ≈ $0.03 input + $0.01 output = **~$0.04**
- 1,000 threads ≈ **~$0.40**

---

## Explain Integration

### Current Behavior (to be replaced)

`explain.rs:650` — `extract_key_decisions()` uses the 60-minute same-actor heuristic.

### New Behavior

When `discussion_analysis` table has data for the entity's discussions:

```rust
fn fetch_key_decisions_from_enrichment(
    conn: &Connection,
    entity_type: &str,
    entity_id: i64,
    max_decisions: usize,
) -> Result<Vec<KeyDecision>> {
    let id_col = id_column_for(entity_type);
    let sql = format!(
        "SELECT da.analysis_type, da.confidence, da.summary, da.evidence_note_ids,
                da.analyzed_at, d.gitlab_discussion_id
         FROM discussion_analysis da
         JOIN discussions d ON da.discussion_id = d.id
         WHERE d.{id_col} = ?1
           AND da.analysis_type IN ('decision', 'question_answered', 'consensus')
           AND da.confidence >= ?2
         ORDER BY da.confidence DESC, da.analyzed_at DESC
         LIMIT ?3"
    );
    // ... map to KeyDecision structs
}
```

### Fallback Strategy

```
if discussion_analysis table has rows for this entity:
    use enrichment data → key_decisions
else if enrichment is not configured:
    fall back to heuristic (existing behavior)
else:
    return empty key_decisions with a hint: "Run 'lore enrich discussions' to populate"
```

This preserves backwards compatibility during rollout. The heuristic can be removed entirely once enrichment is the established workflow.

### KeyDecision Struct Changes

```rust
#[derive(Debug, Serialize)]
pub struct KeyDecision {
    pub timestamp: String,           // ISO 8601 (analyzed_at or note timestamp)
    pub actor: Option<String>,       // May not be single-actor for consensus
    pub action: String,              // analysis_type: "decision", "question_answered", "consensus"
    pub summary: String,             // LLM-generated summary (replaces context_note)
    pub confidence: f64,             // 0.0-1.0
    pub discussion_id: Option<String>, // gitlab_discussion_id for linking
    #[serde(skip_serializing_if = "Option::is_none")]
    pub source: Option<String>,      // "enrichment" or "heuristic" (transitional)
}
```

---

## Testing Strategy

### Unit Tests (Mock LLM)

The LLM provider trait enables deterministic testing with a mock:

```rust
struct MockLlmProvider {
    responses: Vec<String>,  // pre-canned JSON responses
    call_count: AtomicUsize,
}

impl LlmProvider for MockLlmProvider {
    async fn complete(&self, _prompt: &str, _max_tokens: u32) -> Result<LlmResponse> {
        let idx = self.call_count.fetch_add(1, Ordering::SeqCst);
        Ok(LlmResponse {
            content: self.responses[idx].clone(),
            input_tokens: 100,
            output_tokens: 50,
            stop_reason: "end_turn".to_string(),
        })
    }
}
```

### Test Cases

| Test | What it validates |
|------|-------------------|
| `test_staleness_hash_changes_on_new_note` | notes_hash differs when note added |
| `test_staleness_hash_stable_no_changes` | notes_hash identical on re-computation |
| `test_enrichment_skips_unchanged_threads` | Threads with matching hash are not re-enriched |
| `test_enrichment_force_ignores_hash` | `--force` re-enriches all threads |
| `test_enrichment_stores_analysis` | Results persisted to `discussion_analysis` table |
| `test_enrichment_upserts_on_rereun` | Re-enrichment updates existing rows |
| `test_enrichment_dry_run_no_writes` | `--dry-run` produces count but writes nothing |
| `test_enrichment_respects_max_threads` | Caps at `--max-threads` value |
| `test_enrichment_scopes_to_project` | `-p` limits to project's discussions |
| `test_enrichment_scopes_to_entity` | `--issue 42` limits to that issue's discussions |
| `test_explain_uses_enrichment_data` | explain returns enrichment-sourced key_decisions |
| `test_explain_falls_back_to_heuristic` | No enrichment data → heuristic results |
| `test_explain_empty_when_no_data` | No enrichment, no heuristic matches → empty array |
| `test_prompt_construction` | Prompt includes correct notes, metadata, and instruction |
| `test_response_parsing_valid_json` | Well-formed LLM response parsed correctly |
| `test_response_parsing_malformed` | Malformed response logged, thread skipped (not crash) |
| `test_confidence_filter` | Only analysis above `minConfidence` shown in explain |
| `test_provider_config_bedrock` | Bedrock config parsed and provider instantiated |
| `test_provider_config_anthropic` | Anthropic API config parsed correctly |
| `test_no_enrichment_config_graceful` | Missing enrichment config → informative message, exit 0 |

### Integration Tests

- **Real Bedrock call** (gated behind `#[ignore]` + env var `LORE_TEST_BEDROCK=1`): Sends one real prompt to Bedrock, asserts valid JSON response with expected schema.
- **Full pipeline**: In-memory DB → insert discussions + notes → enrich with mock → verify `discussion_analysis` populated → run explain → verify key_decisions sourced from enrichment.

---

## Boundaries

### Always (autonomous)
- Run `cargo test` and `cargo clippy` after every code change
- Use `MockLlmProvider` in all non-integration tests
- Respect `--dry-run` flag — never call LLM in dry-run mode
- Log token usage for every enrichment run
- Graceful degradation when no enrichment config exists

### Ask First (needs approval)
- Adding AWS SDK or HTTP dependencies to Cargo.toml
- Choosing between `aws-sdk-bedrockruntime` crate vs raw HTTP to Bedrock
- Modifying the `Config` struct (new `enrichment` field)
- Changing `KeyDecision` struct shape (affects robot mode API contract)

### Never (hard stops)
- No LLM calls in `lore explain` path — enrichment is pre-computed only
- No storing API keys in config file — use env vars / credential chain
- No automatic enrichment during `lore sync` — enrichment is always explicit
- No sending discussion content to any service other than the configured LLM provider

---

## Non-Goals

- **No real-time streaming** — Enrichment is batch, not streaming
- **No multi-model ensemble** — Single model per run, configurable per config
- **No custom fine-tuning** — Uses Haiku as-is with prompt engineering
- **No enrichment of individual notes** — Thread-level only (the unit of discourse)
- **No automatic re-enrichment on sync** — User/agent must explicitly run `lore enrich`
- **No modification of discussion/notes tables** — Enrichment data lives in its own table
- **No embedding-based approach** — This is classification, not similarity search

---

## User Journeys

### P1 — Critical
- **UJ-1: Agent enriches discussions before explain**
  - Actor: AI agent (via robot mode)
  - Flow: `lore -J enrich discussions -p group/repo` → JSON summary of enrichment run → `lore -J explain issues 42` → key_decisions populated from enrichment
  - Error paths: No enrichment config (exit with suggestion), Bedrock auth failure (exit 5), rate limited (exit 7)
  - Implemented by: Tasks 1-5

### P2 — Important
- **UJ-2: Human runs enrichment and checks results**
  - Actor: Developer at terminal
  - Flow: `lore enrich discussions` → progress bar → summary → `lore explain issues 42` → sees decisions in narrative
  - Error paths: Same as UJ-1 but with human-readable messages
  - Implemented by: Tasks 1-5

- **UJ-3: Incremental enrichment after sync**
  - Actor: AI agent or human
  - Flow: `lore sync` → new notes ingested → `lore enrich discussions` → only stale threads re-enriched → fast completion
  - Implemented by: Task 2 (staleness detection)

### P3 — Nice to Have
- **UJ-4: Dry-run to estimate cost**
  - Actor: Cost-conscious user
  - Flow: `lore enrich discussions --dry-run` → see thread count and estimated tokens → decide whether to proceed
  - Implemented by: Task 4

---

## Tasks

### Phase 1: Schema & Provider Abstraction

- [ ] **Task 1:** Database migration + LLM provider trait
  - **Implements:** Infrastructure (all UJs)
  - **Files:** `src/core/db.rs` (migration), NEW `src/enrichment/mod.rs`, NEW `src/enrichment/provider.rs`
  - **Depends on:** Nothing
  - **Test-first:**
    1. Write `test_migration_creates_discussion_analysis_table`: run migrations, verify table exists with correct columns
    2. Write `test_provider_config_bedrock`: parse config JSON with bedrock enrichment section
    3. Write `test_provider_config_anthropic`: parse config JSON with anthropic enrichment section
    4. Write `test_no_enrichment_config_graceful`: parse config without enrichment section, verify `None`
    5. Run tests — all FAIL (red)
    6. Implement migration + `LlmProvider` trait + `EnrichmentConfig` struct + config parsing
    7. Run tests — all PASS (green)
  - **Acceptance:** Migration creates table. Config parses both provider variants. Missing config returns `None`.

### Phase 2: Staleness & Prompt Pipeline

- [ ] **Task 2:** Notes hash computation + staleness detection
  - **Implements:** UJ-3 (incremental enrichment)
  - **Files:** `src/enrichment/staleness.rs`
  - **Depends on:** Task 1
  - **Test-first:**
    1. Write `test_staleness_hash_changes_on_new_note`
    2. Write `test_staleness_hash_stable_no_changes`
    3. Write `test_enrichment_skips_unchanged_threads`
    4. Run tests — all FAIL (red)
    5. Implement `compute_notes_hash()` + `find_stale_discussions()` query
    6. Run tests — all PASS (green)
  - **Acceptance:** Hash deterministic. Stale detection correct. Unchanged threads skipped.

- [ ] **Task 3:** Prompt construction + response parsing
  - **Implements:** Core enrichment logic
  - **Files:** `src/enrichment/prompt.rs`, `src/enrichment/parser.rs`
  - **Depends on:** Task 1
  - **Test-first:**
    1. Write `test_prompt_construction`: verify prompt includes notes, metadata, instruction
    2. Write `test_response_parsing_valid_json`: well-formed response parsed
    3. Write `test_response_parsing_malformed`: malformed response returns error (not panic)
    4. Run tests — all FAIL (red)
    5. Implement `build_prompt()` + `parse_analysis_response()`
    6. Run tests — all PASS (green)
  - **Acceptance:** Prompt is well-formed. Parser handles valid and invalid responses gracefully.

### Phase 3: CLI Command & Pipeline

- [ ] **Task 4:** `lore enrich discussions` command + enrichment pipeline
  - **Implements:** UJ-1, UJ-2, UJ-4
  - **Files:** NEW `src/cli/commands/enrich.rs`, `src/cli/mod.rs`, `src/main.rs`
  - **Depends on:** Tasks 1, 2, 3
  - **Test-first:**
    1. Write `test_enrichment_stores_analysis`: mock LLM → verify rows in `discussion_analysis`
    2. Write `test_enrichment_upserts_on_rerun`: enrich → re-enrich → verify single row updated
    3. Write `test_enrichment_dry_run_no_writes`: dry-run → verify zero rows written
    4. Write `test_enrichment_respects_max_threads`: 10 stale, max=3 → only 3 enriched
    5. Write `test_enrichment_scopes_to_project`: verify project filter
    6. Write `test_enrichment_scopes_to_entity`: verify --issue/--mr filter
    7. Run tests — all FAIL (red)
    8. Implement: command registration, pipeline orchestration, mock-based tests
    9. Run tests — all PASS (green)
  - **Acceptance:** Full pipeline works with mock. Dry-run safe. Scoping correct. Robot JSON matches schema.

### Phase 4: LLM Backend Implementations

- [ ] **Task 5:** Bedrock + Anthropic API provider implementations
  - **Implements:** UJ-1, UJ-2 (actual LLM connectivity)
  - **Files:** `src/enrichment/bedrock.rs`, `src/enrichment/anthropic.rs`
  - **Depends on:** Task 4
  - **Test-first:**
    1. Write `test_bedrock_request_format`: verify request body matches Bedrock InvokeModel schema
    2. Write `test_anthropic_request_format`: verify request body matches Messages API schema
    3. Write integration test (gated `#[ignore]`): real Bedrock call, assert valid response
    4. Run tests — unit FAIL (red), integration skipped
    5. Implement both providers
    6. Run tests — all PASS (green)
  - **Acceptance:** Both providers construct valid requests. Auth works via standard credential chains. Integration test passes when enabled.

### Phase 5: Explain Integration

- [ ] **Task 6:** Replace heuristic with enrichment data in explain
  - **Implements:** UJ-1, UJ-2 (the payoff)
  - **Files:** `src/cli/commands/explain.rs`
  - **Depends on:** Task 4
  - **Test-first:**
    1. Write `test_explain_uses_enrichment_data`: insert mock enrichment rows → explain returns them as key_decisions
    2. Write `test_explain_falls_back_to_heuristic`: no enrichment rows → returns heuristic results
    3. Write `test_confidence_filter`: insert rows with varying confidence → only high-confidence shown
    4. Run tests — all FAIL (red)
    5. Implement `fetch_key_decisions_from_enrichment()` + fallback logic
    6. Run tests — all PASS (green)
  - **Acceptance:** Explain uses enrichment when available. Falls back gracefully. Confidence threshold respected.

---

## Dependencies (New Crates — Needs Discussion)

| Crate | Purpose | Alternative |
|-------|---------|-------------|
| `aws-sdk-bedrockruntime` | Bedrock InvokeModel API | Raw HTTP via existing `HttpClient` |
| `sha2` | SHA-256 for notes_hash | Already in dependency tree? Check. |

**Decision needed:** Use AWS SDK crate (heavier but handles auth/signing) vs. raw HTTP with SigV4 signing (lighter but more implementation work)?

---

## Session Log

### Session 1 — 2026-03-11
- Identified key_decisions heuristic as fundamentally inadequate (60-min same-actor window)
- User vision: LLM-powered discourse analysis, pre-computed for offline explain
- Key constraint: Bedrock required for org security compliance
- Designed pre-computed enrichment architecture
- Wrote initial spec draft for iteration