gitlore/specs/SPEC_discussion_analysis.md
teernisse a57bff0646 docs(specs): add discussion analysis spec for LLM-powered discourse enrichment
SPEC_discussion_analysis.md defines a pre-computed enrichment pipeline that
replaces the current key_decisions heuristic in explain with actual
LLM-extracted discourse analysis (decisions, questions, consensus).

Key design choices:
- Dual LLM backend: Claude Haiku via AWS Bedrock (primary) or Anthropic API
- Pre-computed batch enrichment (lore enrich), never runtime LLM calls
- Staleness detection via notes_hash to skip unchanged threads
- New discussion_analysis SQLite table with structured JSON results
- Configurable via config.json enrichment section

Status: DRAFT — open questions on Bedrock model ID, auth mechanism, rate
limits, cost ceiling, and confidence thresholds.
2026-03-12 10:08:22 -04:00


Spec: Discussion Analysis — LLM-Powered Discourse Enrichment

Parent: SPEC_explain.md (replaces key_decisions heuristic, line 270)
Created: 2026-03-11
Status: DRAFT — iterating with user

Spec Status

Section Status Notes
Objective draft Core vision defined, success metrics TBD
Tech Stack draft Bedrock + Anthropic API dual-backend
Architecture draft Pre-computed enrichment pipeline
Schema draft discussion_analysis table with staleness detection
CLI Command draft lore enrich discussions
LLM Provider draft Configurable backend abstraction
Explain Integration draft Replaces heuristic with DB lookup
Prompt Design draft Thread-level discourse classification
Testing Strategy draft Includes mock LLM for deterministic tests
Boundaries draft
Tasks not started Blocked on spec approval

Definition of Complete: All sections complete, Open Questions empty, every user journey has tasks, every task has TDD workflow and acceptance criteria.


Open Questions (Resolve Before Implementation)

  1. Bedrock model ID: Which exact Bedrock model will be used? (Assuming anthropic.claude-3-haiku-* — need the org-approved ARN or model ID.)
  2. Auth mechanism: Does the Bedrock setup use IAM role assumption, SSO profile, or explicit access keys? This affects the SDK configuration.
  3. Rate limiting: What's the org's Bedrock rate limit? This determines batch concurrency.
  4. Cost ceiling: Should there be a per-run token budget or discussion count cap? (e.g., --max-threads 200)
  5. Confidence thresholds: Below what confidence should we discard an analysis vs. store it with low confidence?
  6. explain integration field name: Replace key_decisions entirely, or add a new discourse_analysis section alongside it? (Recommendation: replace key_decisions — the heuristic is acknowledged as inadequate.)

Objective

Goal: Pre-compute structured discourse analysis for discussion threads using an LLM (Claude Haiku via Bedrock or Anthropic API), storing results locally so that lore explain and future commands can surface meaningful decisions, answered questions, and consensus without runtime LLM calls.

Problem: The current key_decisions heuristic in explain correlates state-change events with notes by the same actor within 60 minutes. This produces mostly empty results because real decisions happen in discussion threads, not at the moment of state changes. The heuristic cannot understand conversational semantics — whether a comment confirms a proposal, answers a question, or represents consensus.

What this enables:

  • lore explain issues 42 shows actual decisions extracted from discussion threads, not event-note temporal coincidences
  • Reusable across commands — any command can query discussion_analysis for pre-computed insights
  • Fully offline at query time — LLM enrichment is a batch pre-computation step
  • Incremental — only re-analyzes threads whose notes have changed (staleness via notes_hash)

Success metrics:

  • lore enrich discussions processes 100 threads in <60s with Haiku
  • lore explain key_decisions section populated from enrichment data in <500ms (no LLM calls)
  • Staleness detection: re-running enrichment skips unchanged threads
  • Zero impact on users without LLM configuration — graceful degradation to empty key_decisions

Tech Stack & Constraints

Layer Technology Notes
Language Rust nightly-2026-03-01
LLM (primary) Claude Haiku via AWS Bedrock Org-approved, security-compliant
LLM (fallback) Claude Haiku via Anthropic API For personal/non-org use
HTTP asupersync HttpClient Existing wrapper in src/http.rs
Database SQLite via rusqlite New migration for discussion_analysis table
Config ~/.config/lore/config.json New enrichment section

Constraints:

  • Bedrock is the primary backend (org security requirement for Taylor's work context)
  • Anthropic API is an alternative for non-org users
  • lore explain must NEVER make runtime LLM calls — all enrichment is pre-computed
  • lore explain performance budget unchanged: <500ms
  • Enrichment is an explicit opt-in step (lore enrich), never runs during sync
  • Must work when no LLM is configured — key_decisions degrades to empty array (or falls back to heuristic as transitional behavior)

Architecture

System Overview

┌─────────────────────────────────────────────────┐
│                  lore enrich                     │
│  (explicit user/agent command, batch operation)  │
└──────────────────────┬──────────────────────────┘
                       │
         ┌─────────────▼─────────────┐
         │    Enrichment Pipeline     │
         │  1. Select stale threads   │
         │  2. Build LLM prompts      │
         │  3. Call LLM (batched)     │
         │  4. Parse responses        │
         │  5. Store in DB            │
         └─────────────┬─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │   discussion_analysis     │
         │   (SQLite table)          │
         └─────────────┬─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │   lore explain / other    │
         │   (simple SELECT query)   │
         └───────────────────────────┘

Data Flow

  1. Staleness detection: For each discussion, compute SHA-256(sorted note IDs + note bodies). Compare against stored notes_hash. Skip if unchanged.
  2. Prompt construction: Extract the last N notes (configurable, default 5) from the thread. Build a structured prompt asking for discourse classification.
  3. LLM call: Send to configured backend (Bedrock or Anthropic API). Parse structured JSON response.
  4. Storage: Upsert into discussion_analysis with analysis results, model ID, timestamp, and notes_hash.

Pre-computation vs Runtime Trade-offs

Concern Pre-computed (chosen) Runtime
explain latency <500ms (DB query) 2-5s per thread (LLM call)
Offline capability Full None
Bedrock compliance Clean separation Leaks into explain path
Reusability Any command can query Tied to explain
Freshness Stale until re-enriched Always current
Cost Batch (predictable) Per-query (unbounded)

Schema

New Migration (next available version)

CREATE TABLE discussion_analysis (
    id INTEGER PRIMARY KEY,
    discussion_id INTEGER NOT NULL REFERENCES discussions(id),
    analysis_type TEXT NOT NULL,  -- 'decision', 'question_answered', 'consensus', 'open_debate', 'informational'
    confidence REAL NOT NULL,     -- 0.0 to 1.0
    summary TEXT NOT NULL,        -- LLM-generated 1-2 sentence summary
    evidence_note_ids TEXT,       -- JSON array of note IDs that support this analysis
    model_id TEXT NOT NULL,       -- e.g. 'anthropic.claude-3-haiku-20240307-v1:0'
    analyzed_at INTEGER NOT NULL, -- ms epoch
    notes_hash TEXT NOT NULL,     -- SHA-256 of thread content for staleness detection

    UNIQUE(discussion_id, analysis_type)
);

CREATE INDEX idx_discussion_analysis_discussion
    ON discussion_analysis(discussion_id);

CREATE INDEX idx_discussion_analysis_type
    ON discussion_analysis(analysis_type);

Design decisions:

  • UNIQUE(discussion_id, analysis_type): A thread can have at most one analysis per type. Re-enrichment upserts.
  • evidence_note_ids is a JSON array (not a junction table) because it's read-only metadata, never queried by note ID.
  • notes_hash enables O(1) staleness checks without re-reading all notes.
  • confidence allows filtering in queries (e.g., only show decisions with confidence > 0.7).
  • analysis_type uses lowercase snake_case strings, not an enum constraint, for forward compatibility.
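Because re-enrichment upserts on the UNIQUE constraint, the write path can be a single SQLite UPSERT statement. A sketch of that statement follows (column order and parameter numbering are illustrative, not the final migration's contract):

```rust
/// Upsert keyed on UNIQUE(discussion_id, analysis_type): a fresh analysis
/// for the same (thread, type) pair replaces the old row in place.
const UPSERT_ANALYSIS: &str = "\
INSERT INTO discussion_analysis
    (discussion_id, analysis_type, confidence, summary,
     evidence_note_ids, model_id, analyzed_at, notes_hash)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8)
ON CONFLICT(discussion_id, analysis_type) DO UPDATE SET
    confidence        = excluded.confidence,
    summary           = excluded.summary,
    evidence_note_ids = excluded.evidence_note_ids,
    model_id          = excluded.model_id,
    analyzed_at       = excluded.analyzed_at,
    notes_hash        = excluded.notes_hash";
```

This keeps the `id` primary key stable across re-enrichment runs, since the conflict resolves to an UPDATE rather than a delete-and-reinsert.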

Analysis Types

Type Description Example
decision A concrete decision was made or confirmed "Team agreed to use Redis for caching"
question_answered A question was asked and definitively answered "Confirmed: the API supports pagination via cursor"
consensus Multiple participants converged on an approach "All reviewers approved the retry-with-backoff strategy"
open_debate Active disagreement or unresolved discussion "Disagreement on whether to use gRPC vs REST"
informational Thread is purely informational, no actionable discourse "Status update on deployment progress"

Notes Hash Computation

notes_hash = SHA-256(
    note_1_id + ":" + note_1_body + "\n" +
    note_2_id + ":" + note_2_body + "\n" +
    ...
)

Notes sorted by id (insertion order) before hashing. This means:

  • New note added → hash changes → re-enrich
  • Note edited (body changes) → hash changes → re-enrich
  • No changes → hash matches → skip
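The hash construction can be sketched as below. The spec calls for SHA-256 (via the `sha2` crate, pending the dependency check); std's `DefaultHasher` stands in here so the example stays dependency-free — only the assembly of the hashed string is the point:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch of compute_notes_hash(). Real implementation would swap
/// DefaultHasher for sha2::Sha256; the id-sorted "id:body\n" layout
/// is what the spec defines.
fn compute_notes_hash(notes: &[(i64, &str)]) -> String {
    // Sort by note id so retrieval order never affects the hash.
    let mut sorted: Vec<(i64, &str)> = notes.to_vec();
    sorted.sort_by_key(|(id, _)| *id);

    let mut joined = String::new();
    for (id, body) in &sorted {
        joined.push_str(&format!("{id}:{body}\n"));
    }

    let mut h = DefaultHasher::new();
    joined.hash(&mut h);
    format!("{:016x}", h.finish())
}
```

Sorting before hashing is what makes the check deterministic: two syncs that ingest the same notes in different orders still produce the same hash, so only a genuinely new or edited note triggers re-enrichment.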

CLI Command

lore enrich discussions

# Enrich all stale discussions across all projects
lore enrich discussions

# Scope to a project
lore enrich discussions -p group/repo

# Scope to a single entity's discussions
lore enrich discussions --issue 42 -p group/repo
lore enrich discussions --mr 99 -p group/repo

# Force re-enrichment (ignore staleness)
lore enrich discussions --force

# Dry run (show what would be enriched, don't call LLM)
lore enrich discussions --dry-run

# Limit batch size
lore enrich discussions --max-threads 50

# Robot mode
lore -J enrich discussions

Robot Mode Output

{
  "ok": true,
  "data": {
    "total_discussions": 1200,
    "stale": 45,
    "enriched": 45,
    "skipped_unchanged": 1155,
    "errors": 0,
    "tokens_used": {
      "input": 23400,
      "output": 4500
    }
  },
  "meta": { "elapsed_ms": 32000 }
}

Human Mode Output

Enriching discussions...

  Project: vs/typescript-code
    Discussions: 1,200 total, 45 stale
    Enriching: ████████████████████ 45/45
    Results: 12 decisions, 8 questions answered, 5 consensus, 3 debates, 17 informational
    Tokens: 23.4K input, 4.5K output

  Done in 32s

Command Registration

/// Pre-compute discourse analysis for discussion threads using LLM
#[command(after_help = "\x1b[1mExamples:\x1b[0m
  lore enrich discussions                      # Enrich all stale discussions
  lore enrich discussions -p group/repo        # Scope to project
  lore enrich discussions --issue 42           # Single issue's discussions
  lore -J enrich discussions --dry-run         # Preview what would be enriched")]
Enrich {
    /// What to enrich: "discussions"
    #[arg(value_parser = ["discussions"])]
    target: String,

    /// Scope to project (fuzzy match)
    #[arg(short, long)]
    project: Option<String>,

    /// Scope to a specific issue's discussions
    #[arg(long, conflicts_with = "mr")]
    issue: Option<i64>,

    /// Scope to a specific MR's discussions
    #[arg(long, conflicts_with = "issue")]
    mr: Option<i64>,

    /// Re-enrich all threads regardless of staleness
    #[arg(long)]
    force: bool,

    /// Show what would be enriched without calling LLM
    #[arg(long)]
    dry_run: bool,

    /// Maximum threads to enrich in one run
    #[arg(long, default_value = "500")]
    max_threads: usize,
},

LLM Provider Abstraction

Config Schema

New enrichment section in ~/.config/lore/config.json:

{
  "enrichment": {
    "provider": "bedrock",
    "bedrock": {
      "region": "us-east-1",
      "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
      "profile": "default"
    },
    "anthropicApi": {
      "modelId": "claude-3-haiku-20240307"
    },
    "concurrency": 4,
    "maxNotesPerThread": 5,
    "minConfidence": 0.6
  }
}

Provider selection:

  • "bedrock" — AWS Bedrock (uses AWS SDK credential chain: env vars → profile → IAM role)
  • "anthropic" — Anthropic API (uses ANTHROPIC_API_KEY env var)
  • null or absent — enrichment disabled, lore enrich exits with informative message
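The three-way selection above can be sketched as a small parse step (names are illustrative; the real code would live alongside the existing config.json loader):

```rust
/// Sketch of provider selection from the config's `provider` field.
#[derive(Debug, PartialEq)]
enum Provider {
    Bedrock,
    Anthropic,
    Disabled, // enrichment off: `lore enrich` prints a hint and exits 0
}

fn select_provider(value: Option<&str>) -> Result<Provider, String> {
    match value {
        Some("bedrock") => Ok(Provider::Bedrock),
        Some("anthropic") => Ok(Provider::Anthropic),
        None => Ok(Provider::Disabled), // absent config is not an error
        Some(other) => Err(format!("unknown enrichment provider: {other}")),
    }
}
```

Treating an absent section as `Disabled` rather than an error is what gives users without LLM configuration the graceful-degradation behavior required by the constraints.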

Rust Abstraction

/// Trait for LLM backends. Implementations handle auth, serialization, and API specifics.
#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Send a prompt and get a structured response.
    async fn complete(&self, prompt: &str, max_tokens: u32) -> Result<LlmResponse>;

    /// Provider name for logging/storage (e.g., "bedrock", "anthropic")
    fn provider_name(&self) -> &str;

    /// Model identifier for storage (e.g., "anthropic.claude-3-haiku-20240307-v1:0")
    fn model_id(&self) -> &str;
}

pub struct LlmResponse {
    pub content: String,
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub stop_reason: String,
}

Bedrock Implementation Notes

  • Uses AWS SDK InvokeModel API (not Converse) for Anthropic models on Bedrock
  • Request body follows Anthropic Messages API format, wrapped in Bedrock's envelope
  • Auth: AWS credential chain (env → profile → IMDS)
  • Region from config or AWS_REGION env var
  • Content type: application/json, accept: application/json

Anthropic API Implementation Notes

  • Standard Messages API (POST /v1/messages)
  • Auth: x-api-key header from ANTHROPIC_API_KEY env var
  • Model ID from config enrichment.anthropicApi.modelId

Prompt Design

Thread-Level Analysis Prompt

The prompt receives the last N notes from a discussion thread and classifies the discourse.

You are analyzing a discussion thread from a software project's issue tracker.

Thread context:
- Entity: {entity_type} #{iid} "{title}"
- Thread started: {first_note_at}
- Total notes in thread: {note_count}

Notes (most recent {N} shown):

[Note by @{author} at {timestamp}]
{body}

[Note by @{author} at {timestamp}]
{body}

...

Classify this thread's discourse. Respond with JSON only:

{
  "analysis_type": "decision" | "question_answered" | "consensus" | "open_debate" | "informational",
  "confidence": 0.0-1.0,
  "summary": "1-2 sentence summary of what was decided/answered/debated",
  "evidence_note_indices": [0, 2]  // indices of notes that most support this classification
}

Classification guide:
- "decision": A concrete choice was made. Look for: "let's go with", "agreed", "approved", explicit confirmation of an approach.
- "question_answered": A question was asked and definitively answered. Look for: question mark followed by a clear factual response.
- "consensus": Multiple people converged. Look for: multiple approvals, "+1", "LGTM", agreement from different authors.
- "open_debate": Active disagreement or unresolved alternatives. Look for: "but", "alternatively", "I disagree", competing proposals without resolution.
- "informational": Status updates, FYI notes, no actionable discourse.

If the thread is ambiguous, prefer "informational" with lower confidence over guessing.
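On the response side, the contract is "malformed output is skipped, never a crash". The real parser would use serde_json; the dependency-free sketch below only illustrates that contract, validating the shape and the `analysis_type` value by hand:

```rust
/// Illustrative, hand-rolled validation of an LLM response. Production
/// code would deserialize with serde_json; this shows only the
/// skip-on-malformed behavior.
fn extract_analysis_type(content: &str) -> Option<String> {
    let trimmed = content.trim();
    if !trimmed.starts_with('{') || !trimmed.ends_with('}') {
        return None; // not a JSON object — log and skip this thread
    }
    // Pull out the quoted value following `"analysis_type":`.
    let key_pos = trimmed.find("\"analysis_type\"")?;
    let rest = &trimmed[key_pos + "\"analysis_type\"".len()..];
    let colon = rest.find(':')?;
    let after = rest[colon + 1..].trim_start();
    let value: String = after
        .strip_prefix('"')?
        .chars()
        .take_while(|&c| c != '"')
        .collect();
    const ALLOWED: [&str; 5] = [
        "decision", "question_answered", "consensus",
        "open_debate", "informational",
    ];
    ALLOWED.contains(&value.as_str()).then(|| value)
}
```

Rejecting unknown `analysis_type` values (rather than storing them) keeps the table's vocabulary closed even though the schema deliberately avoids an enum constraint.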

Prompt Design Principles

  1. Structured JSON output — Haiku is reliable at JSON generation with clear schema
  2. Evidence-backed — evidence_note_indices ties the classification to specific notes, enabling the UI to show "why"
  3. Conservative default — "informational" is the fallback, preventing false-positive decisions
  4. Limited context window — Last 5 notes (configurable) keeps token usage low per thread
  5. No system prompt tricks — Straightforward classification task within Haiku's strengths
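The template and the note-windowing from principle 4 can be sketched as follows (struct and function names are illustrative, not the final `src/enrichment/prompt.rs` API):

```rust
/// One note as fed into the prompt; fields mirror the template's
/// [Note by @{author} at {timestamp}] blocks.
struct NoteRef<'a> {
    author: &'a str,
    timestamp: &'a str,
    body: &'a str,
}

/// Sketch of build_prompt(): window to the last `max_notes` notes and
/// interpolate them into the template above (abbreviated here).
fn build_prompt(
    entity: &str,
    iid: i64,
    title: &str,
    notes: &[NoteRef<'_>],
    max_notes: usize,
) -> String {
    let shown = &notes[notes.len().saturating_sub(max_notes)..];
    let mut p = String::new();
    p.push_str("You are analyzing a discussion thread from a software project's issue tracker.\n\n");
    p.push_str(&format!("Thread context:\n- Entity: {entity} #{iid} \"{title}\"\n"));
    p.push_str(&format!("- Total notes in thread: {}\n\n", notes.len()));
    p.push_str(&format!("Notes (most recent {} shown):\n", shown.len()));
    for n in shown {
        p.push_str(&format!("\n[Note by @{} at {}]\n{}\n", n.author, n.timestamp, n.body));
    }
    p.push_str("\nClassify this thread's discourse. Respond with JSON only: ...");
    p
}
```

Stating both the total note count and the shown count in the prompt lets the model know it is seeing a suffix of a longer thread, which matters for the "open_debate vs decision" distinction.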

Token Budget Estimation

Component Tokens (approx)
System/instruction prompt ~300
Thread metadata ~50
5 notes (avg 100 words each) ~750
Response ~100
Total per thread ~1,200

At Haiku pricing (~$0.25/1M input, ~$1.25/1M output):

  • 100 threads ≈ $0.03 input + $0.01 output = ~$0.04
  • 1,000 threads ≈ ~$0.40
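A back-of-envelope cost function matching the figures above (the per-token rates are the approximate Haiku prices quoted in this spec, not authoritative pricing):

```rust
/// Estimate run cost in dollars from thread count and per-thread token
/// budgets, at the ~$0.25/1M input, ~$1.25/1M output rates assumed above.
fn estimate_cost(threads: u64, input_tokens_per: u64, output_tokens_per: u64) -> f64 {
    const INPUT_PER_TOKEN: f64 = 0.25 / 1_000_000.0;
    const OUTPUT_PER_TOKEN: f64 = 1.25 / 1_000_000.0;
    (threads * input_tokens_per) as f64 * INPUT_PER_TOKEN
        + (threads * output_tokens_per) as f64 * OUTPUT_PER_TOKEN
}
```

With the table's ~1,100 input and ~100 output tokens per thread, 100 threads come to roughly $0.04 — small enough that a `--dry-run` estimate (UJ-4) is mostly about token counts, not dollars.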

Explain Integration

Current Behavior (to be replaced)

explain.rs:650 — extract_key_decisions() uses the 60-minute same-actor heuristic.

New Behavior

When discussion_analysis table has data for the entity's discussions:

fn fetch_key_decisions_from_enrichment(
    conn: &Connection,
    entity_type: &str,
    entity_id: i64,
    max_decisions: usize,
) -> Result<Vec<KeyDecision>> {
    let id_col = id_column_for(entity_type);
    let sql = format!(
        "SELECT da.analysis_type, da.confidence, da.summary, da.evidence_note_ids,
                da.analyzed_at, d.gitlab_discussion_id
         FROM discussion_analysis da
         JOIN discussions d ON da.discussion_id = d.id
         WHERE d.{id_col} = ?1
           AND da.analysis_type IN ('decision', 'question_answered', 'consensus')
           AND da.confidence >= ?2
         ORDER BY da.confidence DESC, da.analyzed_at DESC
         LIMIT ?3"
    );
    // ... map to KeyDecision structs
}

Fallback Strategy

if discussion_analysis table has rows for this entity:
    use enrichment data → key_decisions
else if enrichment is not configured:
    fall back to heuristic (existing behavior)
else:
    return empty key_decisions with a hint: "Run 'lore enrich discussions' to populate"

This preserves backwards compatibility during rollout. The heuristic can be removed entirely once enrichment is the established workflow.
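The three-branch fallback above reduces to a single pure decision, which is easy to test exhaustively. A sketch (names illustrative):

```rust
/// Where key_decisions should come from for a given entity.
#[derive(Debug, PartialEq)]
enum DecisionSource {
    Enrichment,    // discussion_analysis rows exist for this entity
    Heuristic,     // transitional: enrichment not configured at all
    EmptyWithHint, // configured but not yet run: suggest `lore enrich`
}

fn choose_source(has_enrichment_rows: bool, enrichment_configured: bool) -> DecisionSource {
    if has_enrichment_rows {
        DecisionSource::Enrichment
    } else if !enrichment_configured {
        DecisionSource::Heuristic // preserve existing behavior
    } else {
        DecisionSource::EmptyWithHint
    }
}
```

Note the asymmetry: a user who has configured enrichment but not run it gets the hint rather than heuristic noise, nudging them toward the better data source.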

KeyDecision Struct Changes

#[derive(Debug, Serialize)]
pub struct KeyDecision {
    pub timestamp: String,           // ISO 8601 (analyzed_at or note timestamp)
    pub actor: Option<String>,       // May not be single-actor for consensus
    pub action: String,              // analysis_type: "decision", "question_answered", "consensus"
    pub summary: String,             // LLM-generated summary (replaces context_note)
    pub confidence: f64,             // 0.0-1.0
    pub discussion_id: Option<String>, // gitlab_discussion_id for linking
    #[serde(skip_serializing_if = "Option::is_none")]
    pub source: Option<String>,      // "enrichment" or "heuristic" (transitional)
}
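The filtering and ordering that the SQL in the previous section applies can be mirrored in plain Rust, which is also how the confidence-threshold tests can exercise it without a database (row shape simplified for illustration):

```rust
/// Simplified enrichment row for illustrating the key_decisions filter.
struct AnalysisRow {
    analysis_type: &'static str,
    confidence: f64,
    summary: &'static str,
}

/// Keep only decision-like types above the confidence floor, highest
/// confidence first, capped at `max` — mirroring the SQL's WHERE/ORDER/LIMIT.
fn key_decisions(mut rows: Vec<AnalysisRow>, min_confidence: f64, max: usize) -> Vec<AnalysisRow> {
    rows.retain(|r| {
        matches!(r.analysis_type, "decision" | "question_answered" | "consensus")
            && r.confidence >= min_confidence
    });
    rows.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap());
    rows.truncate(max);
    rows
}
```

Excluding "open_debate" and "informational" here is deliberate: those types are stored for future commands but are not decisions, no matter how confident the model was.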

Testing Strategy

Unit Tests (Mock LLM)

The LLM provider trait enables deterministic testing with a mock:

use std::sync::atomic::{AtomicUsize, Ordering};

struct MockLlmProvider {
    responses: Vec<String>,  // pre-canned JSON responses
    call_count: AtomicUsize,
}

#[async_trait]
impl LlmProvider for MockLlmProvider {
    async fn complete(&self, _prompt: &str, _max_tokens: u32) -> Result<LlmResponse> {
        let idx = self.call_count.fetch_add(1, Ordering::SeqCst);
        Ok(LlmResponse {
            content: self.responses[idx].clone(),
            input_tokens: 100,
            output_tokens: 50,
            stop_reason: "end_turn".to_string(),
        })
    }

    fn provider_name(&self) -> &str { "mock" }
    fn model_id(&self) -> &str { "mock-model" }
}

Test Cases

Test What it validates
test_staleness_hash_changes_on_new_note notes_hash differs when note added
test_staleness_hash_stable_no_changes notes_hash identical on re-computation
test_enrichment_skips_unchanged_threads Threads with matching hash are not re-enriched
test_enrichment_force_ignores_hash --force re-enriches all threads
test_enrichment_stores_analysis Results persisted to discussion_analysis table
test_enrichment_upserts_on_rerun Re-enrichment updates existing rows
test_enrichment_dry_run_no_writes --dry-run produces count but writes nothing
test_enrichment_respects_max_threads Caps at --max-threads value
test_enrichment_scopes_to_project -p limits to project's discussions
test_enrichment_scopes_to_entity --issue 42 limits to that issue's discussions
test_explain_uses_enrichment_data explain returns enrichment-sourced key_decisions
test_explain_falls_back_to_heuristic No enrichment data → heuristic results
test_explain_empty_when_no_data No enrichment, no heuristic matches → empty array
test_prompt_construction Prompt includes correct notes, metadata, and instruction
test_response_parsing_valid_json Well-formed LLM response parsed correctly
test_response_parsing_malformed Malformed response logged, thread skipped (not crash)
test_confidence_filter Only analysis above minConfidence shown in explain
test_provider_config_bedrock Bedrock config parsed and provider instantiated
test_provider_config_anthropic Anthropic API config parsed correctly
test_no_enrichment_config_graceful Missing enrichment config → informative message, exit 0

Integration Tests

  • Real Bedrock call (gated behind #[ignore] + env var LORE_TEST_BEDROCK=1): Sends one real prompt to Bedrock, asserts valid JSON response with expected schema.
  • Full pipeline: In-memory DB → insert discussions + notes → enrich with mock → verify discussion_analysis populated → run explain → verify key_decisions sourced from enrichment.

Boundaries

Always (autonomous)

  • Run cargo test and cargo clippy after every code change
  • Use MockLlmProvider in all non-integration tests
  • Respect --dry-run flag — never call LLM in dry-run mode
  • Log token usage for every enrichment run
  • Graceful degradation when no enrichment config exists

Ask First (needs approval)

  • Adding AWS SDK or HTTP dependencies to Cargo.toml
  • Choosing between aws-sdk-bedrockruntime crate vs raw HTTP to Bedrock
  • Modifying the Config struct (new enrichment field)
  • Changing KeyDecision struct shape (affects robot mode API contract)

Never (hard stops)

  • No LLM calls in lore explain path — enrichment is pre-computed only
  • No storing API keys in config file — use env vars / credential chain
  • No automatic enrichment during lore sync — enrichment is always explicit
  • No sending discussion content to any service other than the configured LLM provider

Non-Goals

  • No real-time streaming — Enrichment is batch, not streaming
  • No multi-model ensemble — Single model per run, configurable per config
  • No custom fine-tuning — Uses Haiku as-is with prompt engineering
  • No enrichment of individual notes — Thread-level only (the unit of discourse)
  • No automatic re-enrichment on sync — User/agent must explicitly run lore enrich
  • No modification of discussion/notes tables — Enrichment data lives in its own table
  • No embedding-based approach — This is classification, not similarity search

User Journeys

P1 — Critical

  • UJ-1: Agent enriches discussions before explain
    • Actor: AI agent (via robot mode)
    • Flow: lore -J enrich discussions -p group/repo → JSON summary of enrichment run → lore -J explain issues 42 → key_decisions populated from enrichment
    • Error paths: No enrichment config (exit with suggestion), Bedrock auth failure (exit 5), rate limited (exit 7)
    • Implemented by: Tasks 1-5

P2 — Important

  • UJ-2: Human runs enrichment and checks results

    • Actor: Developer at terminal
    • Flow: lore enrich discussions → progress bar → summary → lore explain issues 42 → sees decisions in narrative
    • Error paths: Same as UJ-1 but with human-readable messages
    • Implemented by: Tasks 1-5
  • UJ-3: Incremental enrichment after sync

    • Actor: AI agent or human
    • Flow: lore sync → new notes ingested → lore enrich discussions → only stale threads re-enriched → fast completion
    • Implemented by: Task 2 (staleness detection)

P3 — Nice to Have

  • UJ-4: Dry-run to estimate cost
    • Actor: Cost-conscious user
    • Flow: lore enrich discussions --dry-run → see thread count and estimated tokens → decide whether to proceed
    • Implemented by: Task 4

Tasks

Phase 1: Schema & Provider Abstraction

  • Task 1: Database migration + LLM provider trait
    • Implements: Infrastructure (all UJs)
    • Files: src/core/db.rs (migration), NEW src/enrichment/mod.rs, NEW src/enrichment/provider.rs
    • Depends on: Nothing
    • Test-first:
      1. Write test_migration_creates_discussion_analysis_table: run migrations, verify table exists with correct columns
      2. Write test_provider_config_bedrock: parse config JSON with bedrock enrichment section
      3. Write test_provider_config_anthropic: parse config JSON with anthropic enrichment section
      4. Write test_no_enrichment_config_graceful: parse config without enrichment section, verify None
      5. Run tests — all FAIL (red)
      6. Implement migration + LlmProvider trait + EnrichmentConfig struct + config parsing
      7. Run tests — all PASS (green)
    • Acceptance: Migration creates table. Config parses both provider variants. Missing config returns None.

Phase 2: Staleness & Prompt Pipeline

  • Task 2: Notes hash computation + staleness detection

    • Implements: UJ-3 (incremental enrichment)
    • Files: src/enrichment/staleness.rs
    • Depends on: Task 1
    • Test-first:
      1. Write test_staleness_hash_changes_on_new_note
      2. Write test_staleness_hash_stable_no_changes
      3. Write test_enrichment_skips_unchanged_threads
      4. Run tests — all FAIL (red)
      5. Implement compute_notes_hash() + find_stale_discussions() query
      6. Run tests — all PASS (green)
    • Acceptance: Hash deterministic. Stale detection correct. Unchanged threads skipped.
  • Task 3: Prompt construction + response parsing

    • Implements: Core enrichment logic
    • Files: src/enrichment/prompt.rs, src/enrichment/parser.rs
    • Depends on: Task 1
    • Test-first:
      1. Write test_prompt_construction: verify prompt includes notes, metadata, instruction
      2. Write test_response_parsing_valid_json: well-formed response parsed
      3. Write test_response_parsing_malformed: malformed response returns error (not panic)
      4. Run tests — all FAIL (red)
      5. Implement build_prompt() + parse_analysis_response()
      6. Run tests — all PASS (green)
    • Acceptance: Prompt is well-formed. Parser handles valid and invalid responses gracefully.

Phase 3: CLI Command & Pipeline

  • Task 4: lore enrich discussions command + enrichment pipeline
    • Implements: UJ-1, UJ-2, UJ-4
    • Files: NEW src/cli/commands/enrich.rs, src/cli/mod.rs, src/main.rs
    • Depends on: Tasks 1, 2, 3
    • Test-first:
      1. Write test_enrichment_stores_analysis: mock LLM → verify rows in discussion_analysis
      2. Write test_enrichment_upserts_on_rerun: enrich → re-enrich → verify single row updated
      3. Write test_enrichment_dry_run_no_writes: dry-run → verify zero rows written
      4. Write test_enrichment_respects_max_threads: 10 stale, max=3 → only 3 enriched
      5. Write test_enrichment_scopes_to_project: verify project filter
      6. Write test_enrichment_scopes_to_entity: verify --issue/--mr filter
      7. Run tests — all FAIL (red)
      8. Implement: command registration, pipeline orchestration, mock-based tests
      9. Run tests — all PASS (green)
    • Acceptance: Full pipeline works with mock. Dry-run safe. Scoping correct. Robot JSON matches schema.

Phase 4: LLM Backend Implementations

  • Task 5: Bedrock + Anthropic API provider implementations
    • Implements: UJ-1, UJ-2 (actual LLM connectivity)
    • Files: src/enrichment/bedrock.rs, src/enrichment/anthropic.rs
    • Depends on: Task 4
    • Test-first:
      1. Write test_bedrock_request_format: verify request body matches Bedrock InvokeModel schema
      2. Write test_anthropic_request_format: verify request body matches Messages API schema
      3. Write integration test (gated #[ignore]): real Bedrock call, assert valid response
      4. Run tests — unit FAIL (red), integration skipped
      5. Implement both providers
      6. Run tests — all PASS (green)
    • Acceptance: Both providers construct valid requests. Auth works via standard credential chains. Integration test passes when enabled.

Phase 5: Explain Integration

  • Task 6: Replace heuristic with enrichment data in explain
    • Implements: UJ-1, UJ-2 (the payoff)
    • Files: src/cli/commands/explain.rs
    • Depends on: Task 4
    • Test-first:
      1. Write test_explain_uses_enrichment_data: insert mock enrichment rows → explain returns them as key_decisions
      2. Write test_explain_falls_back_to_heuristic: no enrichment rows → returns heuristic results
      3. Write test_confidence_filter: insert rows with varying confidence → only high-confidence shown
      4. Run tests — all FAIL (red)
      5. Implement fetch_key_decisions_from_enrichment() + fallback logic
      6. Run tests — all PASS (green)
    • Acceptance: Explain uses enrichment when available. Falls back gracefully. Confidence threshold respected.

Dependencies (New Crates — Needs Discussion)

Crate Purpose Alternative
aws-sdk-bedrockruntime Bedrock InvokeModel API Raw HTTP via existing HttpClient
sha2 SHA-256 for notes_hash Already in dependency tree? Check.

Decision needed: Use AWS SDK crate (heavier but handles auth/signing) vs. raw HTTP with SigV4 signing (lighter but more implementation work)?


Session Log

Session 1 — 2026-03-11

  • Identified key_decisions heuristic as fundamentally inadequate (60-min same-actor window)
  • User vision: LLM-powered discourse analysis, pre-computed for offline explain
  • Key constraint: Bedrock required for org security compliance
  • Designed pre-computed enrichment architecture
  • Wrote initial spec draft for iteration