gitlore/specs/SPEC_discussion_analysis.md
teernisse a57bff0646 docs(specs): add discussion analysis spec for LLM-powered discourse enrichment
SPEC_discussion_analysis.md defines a pre-computed enrichment pipeline that
replaces the current key_decisions heuristic in explain with actual
LLM-extracted discourse analysis (decisions, questions, consensus).

Key design choices:
- Dual LLM backend: Claude Haiku via AWS Bedrock (primary) or Anthropic API
- Pre-computed batch enrichment (lore enrich), never runtime LLM calls
- Staleness detection via notes_hash to skip unchanged threads
- New discussion_analysis SQLite table with structured JSON results
- Configurable via config.json enrichment section

Status: DRAFT — open questions on Bedrock model ID, auth mechanism, rate
limits, cost ceiling, and confidence thresholds.
2026-03-12 10:08:22 -04:00


Spec: Discussion Analysis — LLM-Powered Discourse Enrichment

Parent: SPEC_explain.md (replaces key_decisions heuristic, line 270)
Created: 2026-03-11
Status: DRAFT — iterating with user

Spec Status

Section Status Notes
Objective draft Core vision defined, success metrics TBD
Tech Stack draft Bedrock + Anthropic API dual-backend
Architecture draft Pre-computed enrichment pipeline
Schema draft discussion_analysis table with staleness detection
CLI Command draft lore enrich discussions
LLM Provider draft Configurable backend abstraction
Explain Integration draft Replaces heuristic with DB lookup
Prompt Design draft Thread-level discourse classification
Testing Strategy draft Includes mock LLM for deterministic tests
Boundaries draft
Tasks not started Blocked on spec approval

Definition of Complete: All sections complete, Open Questions empty, every user journey has tasks, every task has TDD workflow and acceptance criteria.


Open Questions (Resolve Before Implementation)

  1. Bedrock model ID: Which exact Bedrock model will be used? (Assuming anthropic.claude-3-haiku-* — need the org-approved ARN or model ID.)
  2. Auth mechanism: Does the Bedrock setup use IAM role assumption, SSO profile, or explicit access keys? This affects the SDK configuration.
  3. Rate limiting: What's the org's Bedrock rate limit? This determines batch concurrency.
  4. Cost ceiling: Should there be a per-run token budget or discussion count cap? (e.g., --max-threads 200)
  5. Confidence thresholds: Below what confidence should we discard an analysis vs. store it with low confidence?
  6. explain integration field name: Replace key_decisions entirely, or add a new discourse_analysis section alongside it? (Recommendation: replace key_decisions — the heuristic is acknowledged as inadequate.)

Objective

Goal: Pre-compute structured discourse analysis for discussion threads using an LLM (Claude Haiku via Bedrock or Anthropic API), storing results locally so that lore explain and future commands can surface meaningful decisions, answered questions, and consensus without runtime LLM calls.

Problem: The current key_decisions heuristic in explain correlates state-change events with notes by the same actor within 60 minutes. This produces mostly empty results because real decisions happen in discussion threads, not at the moment of state changes. The heuristic cannot understand conversational semantics — whether a comment confirms a proposal, answers a question, or represents consensus.

What this enables:

  • lore explain issues 42 shows actual decisions extracted from discussion threads, not event-note temporal coincidences
  • Reusable across commands — any command can query discussion_analysis for pre-computed insights
  • Fully offline at query time — LLM enrichment is a batch pre-computation step
  • Incremental — only re-analyzes threads whose notes have changed (staleness via notes_hash)

Success metrics:

  • lore enrich discussions processes 100 threads in <60s with Haiku
  • lore explain key_decisions section populated from enrichment data in <500ms (no LLM calls)
  • Staleness detection: re-running enrichment skips unchanged threads
  • Zero impact on users without LLM configuration — graceful degradation to empty key_decisions

Tech Stack & Constraints

Layer Technology Notes
Language Rust nightly-2026-03-01
LLM (primary) Claude Haiku via AWS Bedrock Org-approved, security-compliant
LLM (fallback) Claude Haiku via Anthropic API For personal/non-org use
HTTP asupersync HttpClient Existing wrapper in src/http.rs
Database SQLite via rusqlite New migration for discussion_analysis table
Config ~/.config/lore/config.json New enrichment section

Constraints:

  • Bedrock is the primary backend (org security requirement for Taylor's work context)
  • Anthropic API is an alternative for non-org users
  • lore explain must NEVER make runtime LLM calls — all enrichment is pre-computed
  • lore explain performance budget unchanged: <500ms
  • Enrichment is an explicit opt-in step (lore enrich), never runs during sync
  • Must work when no LLM is configured — key_decisions degrades to empty array (or falls back to heuristic as transitional behavior)

Architecture

System Overview

┌─────────────────────────────────────────────────┐
│                  lore enrich                     │
│  (explicit user/agent command, batch operation)  │
└──────────────────────┬──────────────────────────┘
                       │
         ┌─────────────▼─────────────┐
         │    Enrichment Pipeline     │
         │  1. Select stale threads   │
         │  2. Build LLM prompts      │
         │  3. Call LLM (batched)     │
         │  4. Parse responses        │
         │  5. Store in DB            │
         └─────────────┬─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │   discussion_analysis     │
         │   (SQLite table)          │
         └─────────────┬─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │   lore explain / other    │
         │   (simple SELECT query)   │
         └───────────────────────────┘

Data Flow

  1. Staleness detection: For each discussion, compute SHA-256(sorted note IDs + note bodies). Compare against stored notes_hash. Skip if unchanged.
  2. Prompt construction: Extract the last N notes (configurable, default 5) from the thread. Build a structured prompt asking for discourse classification.
  3. LLM call: Send to configured backend (Bedrock or Anthropic API). Parse structured JSON response.
  4. Storage: Upsert into discussion_analysis with analysis results, model ID, timestamp, and notes_hash.

Pre-computation vs Runtime Trade-offs

Concern Pre-computed (chosen) Runtime
explain latency <500ms (DB query) 2-5s per thread (LLM call)
Offline capability Full None
Bedrock compliance Clean separation Leaks into explain path
Reusability Any command can query Tied to explain
Freshness Stale until re-enriched Always current
Cost Batch (predictable) Per-query (unbounded)

Schema

New Migration (next available version)

CREATE TABLE discussion_analysis (
    id INTEGER PRIMARY KEY,
    discussion_id INTEGER NOT NULL REFERENCES discussions(id),
    analysis_type TEXT NOT NULL,  -- 'decision', 'question_answered', 'consensus', 'open_debate', 'informational'
    confidence REAL NOT NULL,     -- 0.0 to 1.0
    summary TEXT NOT NULL,        -- LLM-generated 1-2 sentence summary
    evidence_note_ids TEXT,       -- JSON array of note IDs that support this analysis
    model_id TEXT NOT NULL,       -- e.g. 'anthropic.claude-3-haiku-20240307-v1:0'
    analyzed_at INTEGER NOT NULL, -- ms epoch
    notes_hash TEXT NOT NULL,     -- SHA-256 of thread content for staleness detection

    UNIQUE(discussion_id, analysis_type)
);

CREATE INDEX idx_discussion_analysis_discussion
    ON discussion_analysis(discussion_id);

CREATE INDEX idx_discussion_analysis_type
    ON discussion_analysis(analysis_type);

Design decisions:

  • UNIQUE(discussion_id, analysis_type): A thread can have at most one analysis per type. Re-enrichment upserts.
  • evidence_note_ids is a JSON array (not a junction table) because it's read-only metadata, never queried by note ID.
  • notes_hash enables O(1) staleness checks without re-reading all notes.
  • confidence allows filtering in queries (e.g., only show decisions with confidence > 0.7).
  • analysis_type uses lowercase snake_case strings, not an enum constraint, for forward compatibility.
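Because re-enrichment upserts on the UNIQUE constraint, the write path can be a single SQLite UPSERT statement. A sketch of that statement follows (column order and parameter numbering are illustrative, not the final migration's contract):

```rust
/// Upsert keyed on UNIQUE(discussion_id, analysis_type): a fresh analysis
/// for the same (thread, type) pair replaces the old row in place.
const UPSERT_ANALYSIS: &str = "\
INSERT INTO discussion_analysis
    (discussion_id, analysis_type, confidence, summary,
     evidence_note_ids, model_id, analyzed_at, notes_hash)
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8)
ON CONFLICT(discussion_id, analysis_type) DO UPDATE SET
    confidence        = excluded.confidence,
    summary           = excluded.summary,
    evidence_note_ids = excluded.evidence_note_ids,
    model_id          = excluded.model_id,
    analyzed_at       = excluded.analyzed_at,
    notes_hash        = excluded.notes_hash";
```

This keeps the `id` primary key stable across re-enrichment runs, since the conflict resolves to an UPDATE rather than a delete-and-reinsert.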

Analysis Types

Type Description Example
decision A concrete decision was made or confirmed "Team agreed to use Redis for caching"
question_answered A question was asked and definitively answered "Confirmed: the API supports pagination via cursor"
consensus Multiple participants converged on an approach "All reviewers approved the retry-with-backoff strategy"
open_debate Active disagreement or unresolved discussion "Disagreement on whether to use gRPC vs REST"
informational Thread is purely informational, no actionable discourse "Status update on deployment progress"

Notes Hash Computation

notes_hash = SHA-256(
    note_1_id + ":" + note_1_body + "\n" +
    note_2_id + ":" + note_2_body + "\n" +
    ...
)

Notes sorted by id (insertion order) before hashing. This means:

  • New note added → hash changes → re-enrich
  • Note edited (body changes) → hash changes → re-enrich
  • No changes → hash matches → skip
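The hash construction can be sketched as below. The spec calls for SHA-256 (via the `sha2` crate, pending the dependency check); std's `DefaultHasher` stands in here so the example stays dependency-free — only the assembly of the hashed string is the point:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch of compute_notes_hash(). Real implementation would swap
/// DefaultHasher for sha2::Sha256; the id-sorted "id:body\n" layout
/// is what the spec defines.
fn compute_notes_hash(notes: &[(i64, &str)]) -> String {
    // Sort by note id so retrieval order never affects the hash.
    let mut sorted: Vec<(i64, &str)> = notes.to_vec();
    sorted.sort_by_key(|(id, _)| *id);

    let mut joined = String::new();
    for (id, body) in &sorted {
        joined.push_str(&format!("{id}:{body}\n"));
    }

    let mut h = DefaultHasher::new();
    joined.hash(&mut h);
    format!("{:016x}", h.finish())
}
```

Sorting before hashing is what makes the check deterministic: two syncs that ingest the same notes in different orders still produce the same hash, so only a genuinely new or edited note triggers re-enrichment.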

CLI Command

lore enrich discussions

# Enrich all stale discussions across all projects
lore enrich discussions

# Scope to a project
lore enrich discussions -p group/repo

# Scope to a single entity's discussions
lore enrich discussions --issue 42 -p group/repo
lore enrich discussions --mr 99 -p group/repo

# Force re-enrichment (ignore staleness)
lore enrich discussions --force

# Dry run (show what would be enriched, don't call LLM)
lore enrich discussions --dry-run

# Limit batch size
lore enrich discussions --max-threads 50

# Robot mode
lore -J enrich discussions

Robot Mode Output

{
  "ok": true,
  "data": {
    "total_discussions": 1200,
    "stale": 45,
    "enriched": 45,
    "skipped_unchanged": 1155,
    "errors": 0,
    "tokens_used": {
      "input": 23400,
      "output": 4500
    }
  },
  "meta": { "elapsed_ms": 32000 }
}

Human Mode Output

Enriching discussions...

  Project: vs/typescript-code
    Discussions: 1,200 total, 45 stale
    Enriching: ████████████████████ 45/45
    Results: 12 decisions, 8 questions answered, 5 consensus, 3 debates, 17 informational
    Tokens: 23.4K input, 4.5K output

  Done in 32s

Command Registration

/// Pre-compute discourse analysis for discussion threads using LLM
#[command(after_help = "\x1b[1mExamples:\x1b[0m
  lore enrich discussions                      # Enrich all stale discussions
  lore enrich discussions -p group/repo        # Scope to project
  lore enrich discussions --issue 42           # Single issue's discussions
  lore -J enrich discussions --dry-run         # Preview what would be enriched")]
Enrich {
    /// What to enrich: "discussions"
    #[arg(value_parser = ["discussions"])]
    target: String,

    /// Scope to project (fuzzy match)
    #[arg(short, long)]
    project: Option<String>,

    /// Scope to a specific issue's discussions
    #[arg(long, conflicts_with = "mr")]
    issue: Option<i64>,

    /// Scope to a specific MR's discussions
    #[arg(long, conflicts_with = "issue")]
    mr: Option<i64>,

    /// Re-enrich all threads regardless of staleness
    #[arg(long)]
    force: bool,

    /// Show what would be enriched without calling LLM
    #[arg(long)]
    dry_run: bool,

    /// Maximum threads to enrich in one run
    #[arg(long, default_value = "500")]
    max_threads: usize,
},

LLM Provider Abstraction

Config Schema

New enrichment section in ~/.config/lore/config.json:

{
  "enrichment": {
    "provider": "bedrock",
    "bedrock": {
      "region": "us-east-1",
      "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
      "profile": "default"
    },
    "anthropicApi": {
      "modelId": "claude-3-haiku-20240307"
    },
    "concurrency": 4,
    "maxNotesPerThread": 5,
    "minConfidence": 0.6
  }
}

Provider selection:

  • "bedrock" — AWS Bedrock (uses AWS SDK credential chain: env vars → profile → IAM role)
  • "anthropic" — Anthropic API (uses ANTHROPIC_API_KEY env var)
  • null or absent — enrichment disabled, lore enrich exits with informative message
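The three-way selection above can be sketched as a small parse step (names are illustrative; the real code would live alongside the existing config.json loader):

```rust
/// Sketch of provider selection from the config's `provider` field.
#[derive(Debug, PartialEq)]
enum Provider {
    Bedrock,
    Anthropic,
    Disabled, // enrichment off: `lore enrich` prints a hint and exits 0
}

fn select_provider(value: Option<&str>) -> Result<Provider, String> {
    match value {
        Some("bedrock") => Ok(Provider::Bedrock),
        Some("anthropic") => Ok(Provider::Anthropic),
        None => Ok(Provider::Disabled), // absent config is not an error
        Some(other) => Err(format!("unknown enrichment provider: {other}")),
    }
}
```

Treating an absent section as `Disabled` rather than an error is what gives users without LLM configuration the graceful-degradation behavior required by the constraints.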

Rust Abstraction

/// Trait for LLM backends. Implementations handle auth, serialization, and API specifics.
#[async_trait]
pub trait LlmProvider: Send + Sync {
    /// Send a prompt and get a structured response.
    async fn complete(&self, prompt: &str, max_tokens: u32) -> Result<LlmResponse>;

    /// Provider name for logging/storage (e.g., "bedrock", "anthropic")
    fn provider_name(&self) -> &str;

    /// Model identifier for storage (e.g., "anthropic.claude-3-haiku-20240307-v1:0")
    fn model_id(&self) -> &str;
}

pub struct LlmResponse {
    pub content: String,
    pub input_tokens: u32,
    pub output_tokens: u32,
    pub stop_reason: String,
}

Bedrock Implementation Notes

  • Uses AWS SDK InvokeModel API (not Converse) for Anthropic models on Bedrock
  • Request body follows Anthropic Messages API format, wrapped in Bedrock's envelope
  • Auth: AWS credential chain (env → profile → IMDS)
  • Region from config or AWS_REGION env var
  • Content type: application/json, accept: application/json

Anthropic API Implementation Notes

  • Standard Messages API (POST /v1/messages)
  • Auth: x-api-key header from ANTHROPIC_API_KEY env var
  • Model ID from config enrichment.anthropicApi.modelId

Prompt Design

Thread-Level Analysis Prompt

The prompt receives the last N notes from a discussion thread and classifies the discourse.

You are analyzing a discussion thread from a software project's issue tracker.

Thread context:
- Entity: {entity_type} #{iid} "{title}"
- Thread started: {first_note_at}
- Total notes in thread: {note_count}

Notes (most recent {N} shown):

[Note by @{author} at {timestamp}]
{body}

[Note by @{author} at {timestamp}]
{body}

...

Classify this thread's discourse. Respond with JSON only:

{
  "analysis_type": "decision" | "question_answered" | "consensus" | "open_debate" | "informational",
  "confidence": 0.0-1.0,
  "summary": "1-2 sentence summary of what was decided/answered/debated",
  "evidence_note_indices": [0, 2]  // indices of notes that most support this classification
}

Classification guide:
- "decision": A concrete choice was made. Look for: "let's go with", "agreed", "approved", explicit confirmation of an approach.
- "question_answered": A question was asked and definitively answered. Look for: question mark followed by a clear factual response.
- "consensus": Multiple people converged. Look for: multiple approvals, "+1", "LGTM", agreement from different authors.
- "open_debate": Active disagreement or unresolved alternatives. Look for: "but", "alternatively", "I disagree", competing proposals without resolution.
- "informational": Status updates, FYI notes, no actionable discourse.

If the thread is ambiguous, prefer "informational" with lower confidence over guessing.
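On the response side, the contract is "malformed output is skipped, never a crash". The real parser would use serde_json; the dependency-free sketch below only illustrates that contract, validating the shape and the `analysis_type` value by hand:

```rust
/// Illustrative, hand-rolled validation of an LLM response. Production
/// code would deserialize with serde_json; this shows only the
/// skip-on-malformed behavior.
fn extract_analysis_type(content: &str) -> Option<String> {
    let trimmed = content.trim();
    if !trimmed.starts_with('{') || !trimmed.ends_with('}') {
        return None; // not a JSON object — log and skip this thread
    }
    // Pull out the quoted value following `"analysis_type":`.
    let key_pos = trimmed.find("\"analysis_type\"")?;
    let rest = &trimmed[key_pos + "\"analysis_type\"".len()..];
    let colon = rest.find(':')?;
    let after = rest[colon + 1..].trim_start();
    let value: String = after
        .strip_prefix('"')?
        .chars()
        .take_while(|&c| c != '"')
        .collect();
    const ALLOWED: [&str; 5] = [
        "decision", "question_answered", "consensus",
        "open_debate", "informational",
    ];
    ALLOWED.contains(&value.as_str()).then(|| value)
}
```

Rejecting unknown `analysis_type` values (rather than storing them) keeps the table's vocabulary closed even though the schema deliberately avoids an enum constraint.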

Prompt Design Principles

  1. Structured JSON output — Haiku is reliable at JSON generation with clear schema
  2. Evidence-backed — evidence_note_indices ties the classification to specific notes, enabling the UI to show "why"
  3. Conservative default — "informational" is the fallback, preventing false-positive decisions
  4. Limited context window — Last 5 notes (configurable) keeps token usage low per thread
  5. No system prompt tricks — Straightforward classification task within Haiku's strengths
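The template and the note-windowing from principle 4 can be sketched as follows (struct and function names are illustrative, not the final `src/enrichment/prompt.rs` API):

```rust
/// One note as fed into the prompt; fields mirror the template's
/// [Note by @{author} at {timestamp}] blocks.
struct NoteRef<'a> {
    author: &'a str,
    timestamp: &'a str,
    body: &'a str,
}

/// Sketch of build_prompt(): window to the last `max_notes` notes and
/// interpolate them into the template above (abbreviated here).
fn build_prompt(
    entity: &str,
    iid: i64,
    title: &str,
    notes: &[NoteRef<'_>],
    max_notes: usize,
) -> String {
    let shown = &notes[notes.len().saturating_sub(max_notes)..];
    let mut p = String::new();
    p.push_str("You are analyzing a discussion thread from a software project's issue tracker.\n\n");
    p.push_str(&format!("Thread context:\n- Entity: {entity} #{iid} \"{title}\"\n"));
    p.push_str(&format!("- Total notes in thread: {}\n\n", notes.len()));
    p.push_str(&format!("Notes (most recent {} shown):\n", shown.len()));
    for n in shown {
        p.push_str(&format!("\n[Note by @{} at {}]\n{}\n", n.author, n.timestamp, n.body));
    }
    p.push_str("\nClassify this thread's discourse. Respond with JSON only: ...");
    p
}
```

Stating both the total note count and the shown count in the prompt lets the model know it is seeing a suffix of a longer thread, which matters for the "open_debate vs decision" distinction.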

Token Budget Estimation

Component Tokens (approx)
System/instruction prompt ~300
Thread metadata ~50
5 notes (avg 100 words each) ~750
Response ~100
Total per thread ~1,200

At Haiku pricing (~$0.25/1M input, ~$1.25/1M output):

  • 100 threads ≈ $0.03 input + $0.01 output = ~$0.04
  • 1,000 threads ≈ ~$0.40
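A back-of-envelope cost function matching the figures above (the per-token rates are the approximate Haiku prices quoted in this spec, not authoritative pricing):

```rust
/// Estimate run cost in dollars from thread count and per-thread token
/// budgets, at the ~$0.25/1M input, ~$1.25/1M output rates assumed above.
fn estimate_cost(threads: u64, input_tokens_per: u64, output_tokens_per: u64) -> f64 {
    const INPUT_PER_TOKEN: f64 = 0.25 / 1_000_000.0;
    const OUTPUT_PER_TOKEN: f64 = 1.25 / 1_000_000.0;
    (threads * input_tokens_per) as f64 * INPUT_PER_TOKEN
        + (threads * output_tokens_per) as f64 * OUTPUT_PER_TOKEN
}
```

With the table's ~1,100 input and ~100 output tokens per thread, 100 threads come to roughly $0.04 — small enough that a `--dry-run` estimate (UJ-4) is mostly about token counts, not dollars.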

Explain Integration

Current Behavior (to be replaced)

explain.rs:650 — extract_key_decisions() uses the 60-minute same-actor heuristic.

New Behavior

When discussion_analysis table has data for the entity's discussions:

fn fetch_key_decisions_from_enrichment(
    conn: &Connection,
    entity_type: &str,
    entity_id: i64,
    max_decisions: usize,
) -> Result<Vec<KeyDecision>> {
    let id_col = id_column_for(entity_type);
    let sql = format!(
        "SELECT da.analysis_type, da.confidence, da.summary, da.evidence_note_ids,
                da.analyzed_at, d.gitlab_discussion_id
         FROM discussion_analysis da
         JOIN discussions d ON da.discussion_id = d.id
         WHERE d.{id_col} = ?1
           AND da.analysis_type IN ('decision', 'question_answered', 'consensus')
           AND da.confidence >= ?2
         ORDER BY da.confidence DESC, da.analyzed_at DESC
         LIMIT ?3"
    );
    // ... map to KeyDecision structs
}

Fallback Strategy

if discussion_analysis table has rows for this entity:
    use enrichment data → key_decisions
else if enrichment is not configured:
    fall back to heuristic (existing behavior)
else:
    return empty key_decisions with a hint: "Run 'lore enrich discussions' to populate"

This preserves backwards compatibility during rollout. The heuristic can be removed entirely once enrichment is the established workflow.
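The three-branch fallback above reduces to a single pure decision, which is easy to test exhaustively. A sketch (names illustrative):

```rust
/// Where key_decisions should come from for a given entity.
#[derive(Debug, PartialEq)]
enum DecisionSource {
    Enrichment,    // discussion_analysis rows exist for this entity
    Heuristic,     // transitional: enrichment not configured at all
    EmptyWithHint, // configured but not yet run: suggest `lore enrich`
}

fn choose_source(has_enrichment_rows: bool, enrichment_configured: bool) -> DecisionSource {
    if has_enrichment_rows {
        DecisionSource::Enrichment
    } else if !enrichment_configured {
        DecisionSource::Heuristic // preserve existing behavior
    } else {
        DecisionSource::EmptyWithHint
    }
}
```

Note the asymmetry: a user who has configured enrichment but not run it gets the hint rather than heuristic noise, nudging them toward the better data source.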

KeyDecision Struct Changes

#[derive(Debug, Serialize)]
pub struct KeyDecision {
    pub timestamp: String,           // ISO 8601 (analyzed_at or note timestamp)
    pub actor: Option<String>,       // May not be single-actor for consensus
    pub action: String,              // analysis_type: "decision", "question_answered", "consensus"
    pub summary: String,             // LLM-generated summary (replaces context_note)
    pub confidence: f64,             // 0.0-1.0
    pub discussion_id: Option<String>, // gitlab_discussion_id for linking
    #[serde(skip_serializing_if = "Option::is_none")]
    pub source: Option<String>,      // "enrichment" or "heuristic" (transitional)
}
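The filtering and ordering that the SQL in the previous section applies can be mirrored in plain Rust, which is also how the confidence-threshold tests can exercise it without a database (row shape simplified for illustration):

```rust
/// Simplified enrichment row for illustrating the key_decisions filter.
struct AnalysisRow {
    analysis_type: &'static str,
    confidence: f64,
    summary: &'static str,
}

/// Keep only decision-like types above the confidence floor, highest
/// confidence first, capped at `max` — mirroring the SQL's WHERE/ORDER/LIMIT.
fn key_decisions(mut rows: Vec<AnalysisRow>, min_confidence: f64, max: usize) -> Vec<AnalysisRow> {
    rows.retain(|r| {
        matches!(r.analysis_type, "decision" | "question_answered" | "consensus")
            && r.confidence >= min_confidence
    });
    rows.sort_by(|a, b| b.confidence.partial_cmp(&a.confidence).unwrap());
    rows.truncate(max);
    rows
}
```

Excluding "open_debate" and "informational" here is deliberate: those types are stored for future commands but are not decisions, no matter how confident the model was.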

Testing Strategy

Unit Tests (Mock LLM)

The LLM provider trait enables deterministic testing with a mock:

use std::sync::atomic::{AtomicUsize, Ordering};

struct MockLlmProvider {
    responses: Vec<String>,  // pre-canned JSON responses
    call_count: AtomicUsize,
}

#[async_trait]
impl LlmProvider for MockLlmProvider {
    async fn complete(&self, _prompt: &str, _max_tokens: u32) -> Result<LlmResponse> {
        let idx = self.call_count.fetch_add(1, Ordering::SeqCst);
        Ok(LlmResponse {
            content: self.responses[idx].clone(),
            input_tokens: 100,
            output_tokens: 50,
            stop_reason: "end_turn".to_string(),
        })
    }

    fn provider_name(&self) -> &str { "mock" }
    fn model_id(&self) -> &str { "mock-model" }
}

Test Cases

Test What it validates
test_staleness_hash_changes_on_new_note notes_hash differs when note added
test_staleness_hash_stable_no_changes notes_hash identical on re-computation
test_enrichment_skips_unchanged_threads Threads with matching hash are not re-enriched
test_enrichment_force_ignores_hash --force re-enriches all threads
test_enrichment_stores_analysis Results persisted to discussion_analysis table
test_enrichment_upserts_on_rerun Re-enrichment updates existing rows
test_enrichment_dry_run_no_writes --dry-run produces count but writes nothing
test_enrichment_respects_max_threads Caps at --max-threads value
test_enrichment_scopes_to_project -p limits to project's discussions
test_enrichment_scopes_to_entity --issue 42 limits to that issue's discussions
test_explain_uses_enrichment_data explain returns enrichment-sourced key_decisions
test_explain_falls_back_to_heuristic No enrichment data → heuristic results
test_explain_empty_when_no_data No enrichment, no heuristic matches → empty array
test_prompt_construction Prompt includes correct notes, metadata, and instruction
test_response_parsing_valid_json Well-formed LLM response parsed correctly
test_response_parsing_malformed Malformed response logged, thread skipped (not crash)
test_confidence_filter Only analysis above minConfidence shown in explain
test_provider_config_bedrock Bedrock config parsed and provider instantiated
test_provider_config_anthropic Anthropic API config parsed correctly
test_no_enrichment_config_graceful Missing enrichment config → informative message, exit 0

Integration Tests

  • Real Bedrock call (gated behind #[ignore] + env var LORE_TEST_BEDROCK=1): Sends one real prompt to Bedrock, asserts valid JSON response with expected schema.
  • Full pipeline: In-memory DB → insert discussions + notes → enrich with mock → verify discussion_analysis populated → run explain → verify key_decisions sourced from enrichment.

Boundaries

Always (autonomous)

  • Run cargo test and cargo clippy after every code change
  • Use MockLlmProvider in all non-integration tests
  • Respect --dry-run flag — never call LLM in dry-run mode
  • Log token usage for every enrichment run
  • Graceful degradation when no enrichment config exists

Ask First (needs approval)

  • Adding AWS SDK or HTTP dependencies to Cargo.toml
  • Choosing between aws-sdk-bedrockruntime crate vs raw HTTP to Bedrock
  • Modifying the Config struct (new enrichment field)
  • Changing KeyDecision struct shape (affects robot mode API contract)

Never (hard stops)

  • No LLM calls in lore explain path — enrichment is pre-computed only
  • No storing API keys in config file — use env vars / credential chain
  • No automatic enrichment during lore sync — enrichment is always explicit
  • No sending discussion content to any service other than the configured LLM provider

Non-Goals

  • No real-time streaming — Enrichment is batch, not streaming
  • No multi-model ensemble — Single model per run, configurable per config
  • No custom fine-tuning — Uses Haiku as-is with prompt engineering
  • No enrichment of individual notes — Thread-level only (the unit of discourse)
  • No automatic re-enrichment on sync — User/agent must explicitly run lore enrich
  • No modification of discussion/notes tables — Enrichment data lives in its own table
  • No embedding-based approach — This is classification, not similarity search

User Journeys

P1 — Critical

  • UJ-1: Agent enriches discussions before explain
    • Actor: AI agent (via robot mode)
    • Flow: lore -J enrich discussions -p group/repo → JSON summary of enrichment run → lore -J explain issues 42 → key_decisions populated from enrichment
    • Error paths: No enrichment config (exit with suggestion), Bedrock auth failure (exit 5), rate limited (exit 7)
    • Implemented by: Tasks 1-5

P2 — Important

  • UJ-2: Human runs enrichment and checks results

    • Actor: Developer at terminal
    • Flow: lore enrich discussions → progress bar → summary → lore explain issues 42 → sees decisions in narrative
    • Error paths: Same as UJ-1 but with human-readable messages
    • Implemented by: Tasks 1-5
  • UJ-3: Incremental enrichment after sync

    • Actor: AI agent or human
    • Flow: lore sync → new notes ingested → lore enrich discussions → only stale threads re-enriched → fast completion
    • Implemented by: Task 2 (staleness detection)

P3 — Nice to Have

  • UJ-4: Dry-run to estimate cost
    • Actor: Cost-conscious user
    • Flow: lore enrich discussions --dry-run → see thread count and estimated tokens → decide whether to proceed
    • Implemented by: Task 4

Tasks

Phase 1: Schema & Provider Abstraction

  • Task 1: Database migration + LLM provider trait
    • Implements: Infrastructure (all UJs)
    • Files: src/core/db.rs (migration), NEW src/enrichment/mod.rs, NEW src/enrichment/provider.rs
    • Depends on: Nothing
    • Test-first:
      1. Write test_migration_creates_discussion_analysis_table: run migrations, verify table exists with correct columns
      2. Write test_provider_config_bedrock: parse config JSON with bedrock enrichment section
      3. Write test_provider_config_anthropic: parse config JSON with anthropic enrichment section
      4. Write test_no_enrichment_config_graceful: parse config without enrichment section, verify None
      5. Run tests — all FAIL (red)
      6. Implement migration + LlmProvider trait + EnrichmentConfig struct + config parsing
      7. Run tests — all PASS (green)
    • Acceptance: Migration creates table. Config parses both provider variants. Missing config returns None.

Phase 2: Staleness & Prompt Pipeline

  • Task 2: Notes hash computation + staleness detection

    • Implements: UJ-3 (incremental enrichment)
    • Files: src/enrichment/staleness.rs
    • Depends on: Task 1
    • Test-first:
      1. Write test_staleness_hash_changes_on_new_note
      2. Write test_staleness_hash_stable_no_changes
      3. Write test_enrichment_skips_unchanged_threads
      4. Run tests — all FAIL (red)
      5. Implement compute_notes_hash() + find_stale_discussions() query
      6. Run tests — all PASS (green)
    • Acceptance: Hash deterministic. Stale detection correct. Unchanged threads skipped.
  • Task 3: Prompt construction + response parsing

    • Implements: Core enrichment logic
    • Files: src/enrichment/prompt.rs, src/enrichment/parser.rs
    • Depends on: Task 1
    • Test-first:
      1. Write test_prompt_construction: verify prompt includes notes, metadata, instruction
      2. Write test_response_parsing_valid_json: well-formed response parsed
      3. Write test_response_parsing_malformed: malformed response returns error (not panic)
      4. Run tests — all FAIL (red)
      5. Implement build_prompt() + parse_analysis_response()
      6. Run tests — all PASS (green)
    • Acceptance: Prompt is well-formed. Parser handles valid and invalid responses gracefully.

Phase 3: CLI Command & Pipeline

  • Task 4: lore enrich discussions command + enrichment pipeline
    • Implements: UJ-1, UJ-2, UJ-4
    • Files: NEW src/cli/commands/enrich.rs, src/cli/mod.rs, src/main.rs
    • Depends on: Tasks 1, 2, 3
    • Test-first:
      1. Write test_enrichment_stores_analysis: mock LLM → verify rows in discussion_analysis
      2. Write test_enrichment_upserts_on_rerun: enrich → re-enrich → verify single row updated
      3. Write test_enrichment_dry_run_no_writes: dry-run → verify zero rows written
      4. Write test_enrichment_respects_max_threads: 10 stale, max=3 → only 3 enriched
      5. Write test_enrichment_scopes_to_project: verify project filter
      6. Write test_enrichment_scopes_to_entity: verify --issue/--mr filter
      7. Run tests — all FAIL (red)
      8. Implement: command registration, pipeline orchestration, mock-based tests
      9. Run tests — all PASS (green)
    • Acceptance: Full pipeline works with mock. Dry-run safe. Scoping correct. Robot JSON matches schema.

Phase 4: LLM Backend Implementations

  • Task 5: Bedrock + Anthropic API provider implementations
    • Implements: UJ-1, UJ-2 (actual LLM connectivity)
    • Files: src/enrichment/bedrock.rs, src/enrichment/anthropic.rs
    • Depends on: Task 4
    • Test-first:
      1. Write test_bedrock_request_format: verify request body matches Bedrock InvokeModel schema
      2. Write test_anthropic_request_format: verify request body matches Messages API schema
      3. Write integration test (gated #[ignore]): real Bedrock call, assert valid response
      4. Run tests — unit FAIL (red), integration skipped
      5. Implement both providers
      6. Run tests — all PASS (green)
    • Acceptance: Both providers construct valid requests. Auth works via standard credential chains. Integration test passes when enabled.

Phase 5: Explain Integration

  • Task 6: Replace heuristic with enrichment data in explain
    • Implements: UJ-1, UJ-2 (the payoff)
    • Files: src/cli/commands/explain.rs
    • Depends on: Task 4
    • Test-first:
      1. Write test_explain_uses_enrichment_data: insert mock enrichment rows → explain returns them as key_decisions
      2. Write test_explain_falls_back_to_heuristic: no enrichment rows → returns heuristic results
      3. Write test_confidence_filter: insert rows with varying confidence → only high-confidence shown
      4. Run tests — all FAIL (red)
      5. Implement fetch_key_decisions_from_enrichment() + fallback logic
      6. Run tests — all PASS (green)
    • Acceptance: Explain uses enrichment when available. Falls back gracefully. Confidence threshold respected.

Dependencies (New Crates — Needs Discussion)

Crate Purpose Alternative
aws-sdk-bedrockruntime Bedrock InvokeModel API Raw HTTP via existing HttpClient
sha2 SHA-256 for notes_hash Already in dependency tree? Check.

Decision needed: Use AWS SDK crate (heavier but handles auth/signing) vs. raw HTTP with SigV4 signing (lighter but more implementation work)?


Session Log

Session 1 — 2026-03-11

  • Identified key_decisions heuristic as fundamentally inadequate (60-min same-actor window)
  • User vision: LLM-powered discourse analysis, pre-computed for offline explain
  • Key constraint: Bedrock required for org security compliance
  • Designed pre-computed enrichment architecture
  • Wrote initial spec draft for iteration