docs: add lore-service, work-item-status-graphql, and time-decay plans

Three implementation plans with iterative cross-model refinement:

- lore-service (5 iterations): HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers.
- work-item-status-graphql (7 iterations + TDD appendix): detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering the GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot-mode output.
- time-decay-expert-scoring (iteration 5 feedback): updates to the existing time-decay scoring plan incorporating feedback on decay-curve parameterization, recency weighting for discussion contributions, and staleness-detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
186  plans/lore-service.feedback-1.md  Normal file
@@ -0,0 +1,186 @@
1. **Isolate scheduled behavior from manual `sync`**

Reasoning: Your current plan injects backoff into `handle_sync_cmd`, which affects all `lore sync` calls (including manual recovery runs). Scheduled behavior should be isolated so humans aren't unexpectedly blocked by service backoff.

```diff
@@ Context
-`lore sync` runs a 4-stage pipeline (issues, MRs, docs, embeddings) that takes 2-4 minutes.
+`lore sync` remains the manual/operator command.
+`lore service run` (hidden/internal) is the scheduled execution entrypoint.

@@ Commands & User Journeys
+### `lore service run` (hidden/internal)
+**What it does:** Executes one scheduled sync attempt with service-only policy:
+- applies service backoff policy
+- records service run state
+- invokes sync pipeline with configured profile
+- updates retry state on success/failure
+
+**Invocation:** scheduler always runs:
+`lore --robot service run --reason timer`

@@ Backoff Integration into `handle_sync_cmd`
-Insert **after** config load but **before** the dry_run check:
+Do not add backoff checks to `handle_sync_cmd`.
+Backoff logic lives only in `handle_service_run`.
```
2. **Use the DB as source of truth for service state (not a standalone JSON status file)**

Reasoning: You already have `sync_runs` in SQLite. A separate JSON status file creates split-brain and race/corruption risk. Keep JSON as an optional cache/export only.

```diff
@@ Status File
-Location: `{get_data_dir()}/sync-status.json`
+Primary state location: SQLite (`service_state` table) + existing `sync_runs`.
+Optional mirror file: `{get_data_dir()}/sync-status.json` (best-effort export only).

@@ File-by-File Implementation Details
-### `src/core/sync_status.rs` (NEW)
+### `migrations/015_service_state.sql` (NEW)
+CREATE TABLE service_state (
+    id INTEGER PRIMARY KEY CHECK (id = 1),
+    installed INTEGER NOT NULL DEFAULT 0,
+    platform TEXT,
+    interval_seconds INTEGER,
+    profile TEXT NOT NULL DEFAULT 'balanced',
+    consecutive_failures INTEGER NOT NULL DEFAULT 0,
+    next_retry_at_ms INTEGER,
+    last_error_code TEXT,
+    last_error_message TEXT,
+    updated_at_ms INTEGER NOT NULL
+);
+
+### `src/core/service_state.rs` (NEW)
+- read/write the state row
+- derive backoff/next_retry
+- join with latest `sync_runs` for status output
```
3. **Backoff policy should be configurable, jittered, and error-aware**

Reasoning: A fixed hardcoded backoff (`base=1800`) is wrong when the user sets a different interval. Permanent failures (bad token/config) should also not burn retries forever; they should enter a paused/error state.

```diff
@@ Backoff Logic
-// Exponential: base * 2^failures, capped at 4 hours
+// Exponential with jitter: base * 2^(failures-1), capped, ±20% jitter
+// Applies only to transient errors.
+// Permanent errors set `paused_reason` and stop retries until user action.

@@ CLI Definition Changes
+ServiceCommand::Resume, // clear paused state / failures
+ServiceCommand::Run,    // hidden

@@ Error Types
+ServicePaused,        // scheduler paused due to permanent error
+ServiceCommandFailed, // OS command failure with stderr context
```
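A minimal sketch of this policy in Rust. The function and type names (`next_retry`, `RetryDecision`) are illustrative, not from the plan; the jitter sample is injected so tests stay deterministic, and permanence is assumed to be classified upstream from the error code:

```rust
#[derive(PartialEq, Debug)]
enum RetryDecision {
    RetryAfterSecs(u64),
    Pause, // permanent error: stop retrying until user action
}

fn next_retry(
    base_secs: u64,  // from config, not hardcoded
    cap_secs: u64,   // e.g. 4 hours
    failures: u32,   // consecutive transient failures (>= 1)
    permanent: bool, // classified upstream from the error code
    jitter: f64,     // injected RNG sample in [0, 1)
) -> RetryDecision {
    if permanent {
        return RetryDecision::Pause;
    }
    // base * 2^(failures-1), with the shift clamped to avoid overflow
    let exp = base_secs.saturating_mul(1u64 << failures.saturating_sub(1).min(16));
    let capped = exp.min(cap_secs) as f64;
    // ±20% jitter around the capped delay
    let jittered = capped * (0.8 + 0.4 * jitter);
    RetryDecision::RetryAfterSecs(jittered as u64)
}

fn main() {
    // 2nd consecutive transient failure, base 1800s: 3600s at midpoint jitter
    println!("{:?}", next_retry(1800, 14_400, 2, false, 0.5));
    println!("{:?}", next_retry(1800, 14_400, 3, true, 0.5));
}
```

Injecting `jitter` rather than calling an RNG inside the function is what makes the "jitter bounds" tests in the testing section deterministic.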
4. **Add a pipeline-level single-flight lock**

Reasoning: Current locking is in the ingest stages; there's still overlap risk across full sync pipelines (docs/embed can overlap with another run). Add a top-level lock for scheduled and manual sync pipeline execution.

```diff
@@ Architecture
+Add `sync_pipeline` lock at top-level sync execution.
+Keep existing ingest lock (`sync`) for ingest internals.

@@ Backoff Integration into `handle_sync_cmd`
+Before starting the sync pipeline, acquire `AppLock` with:
+name = "sync_pipeline"
+stale_lock_minutes = config.sync.stale_lock_minutes
+heartbeat_interval_seconds = config.sync.heartbeat_interval_seconds
```
5. **Don't embed the token in service files by default**

Reasoning: Embedding a PAT into a unit/plist is a high-risk secret-leak path. Make secure storage explicit and default-safe.

```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--token-source env-file|embedded]`
+Default: `env-file` (0600 perms, user-owned)
+`embedded` allowed only with explicit opt-in and a warning

@@ Robot output
- "token_embedded": true
+ "token_source": "env_file"

@@ Human output
- Note: Your GITLAB_TOKEN is embedded in the service file.
+ Note: Token is stored in a user-private env file (0600).
```
6. **Introduce a command-runner abstraction with timeout + stderr capture**

Reasoning: `launchctl`/`systemctl`/`schtasks` calls are failure-prone; you need consistent error mapping and deterministic tests.

```diff
@@ Platform Backends
-exports free functions that dispatch via `#[cfg(target_os)]`
+exports backend + shared `CommandRunner`:
+- run(cmd, args, timeout)
+- capture stdout/stderr/exit code
+- map failure to `ServiceCommandFailed { cmd, exit_code, stderr }`
```
7. **Persist an install manifest to avoid brittle file parsing**

Reasoning: Parsing the timer/plist for interval/state is fragile and platform-format dependent. Persist a manifest with checksums and expected artifacts.

```diff
@@ Platform Backends
-Same pattern for ... `get_interval_seconds()`
+Add manifest: `{data_dir}/service-manifest.json`
+Stores platform, interval, profile, generated files, and command.
+`service status` reads the manifest first, then verifies platform state.

@@ Acceptance criteria
+Install is idempotent:
+- if manifest+files already match, report `no_change: true`
+- if drift is detected, reconcile and rewrite
```
8. **Make the schedule profile explicit (`fast|balanced|full`)**

Reasoning: This makes the feature more useful and performance-tunable without requiring users to understand internal flags.

```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--profile fast|balanced|full]`
+
+Profiles:
+- fast: `sync --no-docs --no-embed`
+- balanced (default): `sync --no-embed`
+- full: `sync`
```
9. **Upgrade `service status` to include scheduler health + a recent-run summary**

Reasoning: A single last-sync snapshot is too shallow. Include recent attempts and whether the scheduler is paused, backing off, or running.

```diff
@@ `lore service status`
-What it does: Shows whether the service is installed, its configuration, last sync result, and next scheduled run.
+What it does: Shows install state, scheduler state (running/backoff/paused), recent runs, and next run estimate.

@@ Robot output
- "last_sync": { ... },
- "backoff": null
+ "scheduler_state": "running|backoff|paused|idle",
+ "last_sync": { ... },
+ "recent_runs": [{"run_id":"...","status":"...","started_at_iso":"..."}],
+ "backoff": null,
+ "paused_reason": null
```
10. **Strengthen tests around determinism and cross-platform generation**

Reasoning: Time-based backoff and shell quoting are classic flaky points. Add a fake clock and a fake command runner for deterministic tests.

```diff
@@ Testing Strategy
+Add deterministic test seams:
+- `Clock` trait for backoff/now calculations
+- `CommandRunner` trait for backend command execution
+
+Add tests:
+- transient vs permanent error classification
+- backoff schedule with jitter bounds
+- manifest drift reconciliation
+- quoting/escaping for paths with spaces and special chars
+- `service run` does not modify manual `sync` behavior
```
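The `Clock` seam could be sketched as follows; the trait, `FakeClock`, and the `retry_due` consumer are illustrative names, not from the plan:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Test seam: production code asks a `Clock` for "now" instead of
/// reading SystemTime directly, so backoff math is deterministic in tests.
trait Clock {
    fn now_ms(&self) -> i64;
}

struct SystemClock;
impl Clock for SystemClock {
    fn now_ms(&self) -> i64 {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .unwrap()
            .as_millis() as i64
    }
}

/// Fixed clock for tests.
struct FakeClock(i64);
impl Clock for FakeClock {
    fn now_ms(&self) -> i64 {
        self.0
    }
}

/// Example consumer: has the persisted retry deadline passed?
fn retry_due(clock: &dyn Clock, next_retry_at_ms: Option<i64>) -> bool {
    match next_retry_at_ms {
        None => true, // no backoff recorded: run is due
        Some(t) => clock.now_ms() >= t,
    }
}

fn main() {
    let clock = FakeClock(1_000);
    println!("{}", retry_due(&clock, Some(2_000))); // false: deadline not reached
    println!("{}", retry_due(&clock, None)); // true: nothing pending
}
```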
If you want, I can rewrite your full plan as a single clean revised document with these changes already integrated (instead of patch fragments).
182  plans/lore-service.feedback-2.md  Normal file
@@ -0,0 +1,182 @@
**High-Impact Revisions (ordered by priority)**
1. **Make service identity project-scoped (avoid collisions across repos/users)**

Analysis: The current fixed names (`com.gitlore.sync`, `LoreSync`, `lore-sync.timer`) will collide when users run multiple gitlore workspaces. This causes silent overwrites and broken uninstall/status behavior.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Commands & User Journeys / install
- lore service install [--interval 30m] [--profile balanced] [--token-source env-file]
+ lore service install [--interval 30m] [--profile balanced] [--token-source auto] [--name <optional>]
@@ Install Manifest Schema
+ /// Stable per-install identity (default derived from project root hash)
+ pub service_id: String,
@@ Platform Backends
- Label: com.gitlore.sync
+ Label: com.gitlore.sync.{service_id}
- Task name: LoreSync
+ Task name: LoreSync-{service_id}
- ~/.config/systemd/user/lore-sync.service
+ ~/.config/systemd/user/lore-sync-{service_id}.service
```
2. **Replace the token model with secure per-OS defaults**

Analysis: The current "env-file default" is not actually secure on macOS launchd (the token still ends up in the plist). On Windows, assumptions about the inherited environment are fragile. Use OS-native secure stores by default and keep `embedded` as explicit opt-in only.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Token storage strategies
-| env-file (default) | ...
+| auto (default) | macOS: Keychain, Linux: env-file (0600), Windows: Credential Manager |
+| env-file | Linux/systemd only |
 | embedded | ... explicit warning ...
@@ macOS launchd section
- env-file strategy stores canonical token in service-env but embeds token in plist
+ default strategy is Keychain lookup at runtime; no token persisted in plist
+ env-file is not offered on macOS
@@ Windows schtasks section
- token must be in user's system environment
+ default strategy stores token in Windows Credential Manager and injects at runtime
```
3. **Version and atomically persist the manifest/status**

Analysis: `Option<Self>` on read hides corruption, and non-atomic writes risk truncated JSON on crashes. This will create false "not installed" states and scheduler confusion.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Install Manifest Schema
+ pub schema_version: u32, // start at 1
+ pub updated_at_iso: String,
@@ Status File Schema
+ pub schema_version: u32, // start at 1
+ pub updated_at_iso: String,
@@ Read/Write
- read(path) -> Option<Self>
+ read(path) -> Result<Option<Self>, LoreError>
- write(...) -> std::io::Result<()>
+ write_atomic(...) -> std::io::Result<()> // tmp file + fsync + rename
```
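The tmp-file + fsync + rename pattern can be sketched as below; the sibling `.tmp` naming and the target path are illustrative assumptions, not from the plan:

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Write `data` to `path` atomically: write a sibling tmp file, fsync it,
/// then rename over the target. Readers never observe a truncated file.
fn write_atomic(path: &Path, data: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(data)?;
        f.sync_all()?; // flush to disk before the rename becomes visible
    }
    fs::rename(&tmp, path) // atomic on POSIX within one filesystem
}

fn main() -> std::io::Result<()> {
    let target = std::env::temp_dir().join("sync-status.json");
    write_atomic(&target, br#"{"schema_version":1}"#)?;
    println!("{}", fs::read_to_string(&target)?);
    Ok(())
}
```

Note the rename is only atomic when the tmp file lives on the same filesystem as the target, which is why the sketch uses a sibling path rather than the system temp directory for the tmp file.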
4. **Persist `next_retry_at_ms` instead of recomputing jitter**

Analysis: Deterministic jitter from a timestamp modulo is predictable and can herd retries. Persisting `next_retry_at_ms` at failure time makes status accurate, stable, and cheap to compute.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ SyncStatusFile
 pub consecutive_failures: u32,
+ pub next_retry_at_ms: Option<i64>,
@@ Backoff Logic
- compute backoff from last_run.timestamp_ms and deterministic jitter each read
+ compute backoff once on failure, store next_retry_at_ms, read-only comparison afterward
+ jitter algorithm: full jitter in [0, cap], injectable RNG for tests
```
5. **Add a circuit breaker for repeated transient failures**

Analysis: Infinite transient retries can run forever on systemic failures (DB corruption, bad network policy). After N transient failures, pause with an actionable reason.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Scheduler states
- backoff — transient failures, waiting to retry
+ backoff — transient failures, waiting to retry
+ paused — permanent error OR circuit breaker tripped after N transient failures
@@ Service run flow
- On transient failure: increment failures, compute backoff
+ On transient failure: increment failures, compute backoff, if failures >= max_transient_failures -> pause
```
6. **Stage-aware outcome policy (core freshness over all-or-nothing)**

Analysis: Failing embeddings/docs should not block issues/MRs freshness. Split stage outcomes and only treat core stages as hard-fail by default. This improves reliability and practical usefulness.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Context
- lore sync runs a 4-stage pipeline ... treated as one run result
+ lore service run records per-stage outcomes (issues, mrs, docs, embeddings)
@@ Status File Schema
+ pub stage_results: Vec<StageResult>,
@@ service run flow
- Execute sync pipeline with flags derived from profile
+ Execute stage-by-stage and classify severity:
+ - critical: issues, mrs
+ - optional: docs, embeddings
+ optional stage failures mark run as degraded, not failed
```
7. **Replace the cfg free-function backend with a trait-based backend**

Analysis: The current backend API is hard to test end-to-end without real OS commands. A `SchedulerBackend` trait enables deterministic integration tests and a cleaner architecture.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Platform Backends / Architecture
- exports free functions dispatched via #[cfg]
+ define trait SchedulerBackend { install, uninstall, state, file_paths, next_run }
+ provide LaunchdBackend, SystemdBackend, SchtasksBackend implementations
+ include FakeBackend for integration tests
```
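A minimal sketch of the trait-plus-fake shape, with the method set reduced for brevity; signatures and error types are illustrative assumptions, not the plan's API:

```rust
/// Reduced sketch of a `SchedulerBackend` trait with an in-memory fake.
/// Real implementations (launchd/systemd/schtasks) would shell out;
/// the fake lets install/uninstall/status paths run in tests untouched.
#[derive(Clone, Copy, PartialEq, Debug)]
enum BackendState {
    NotInstalled,
    Installed,
}

trait SchedulerBackend {
    fn install(&mut self, interval_seconds: u64) -> Result<(), String>;
    fn uninstall(&mut self) -> Result<(), String>;
    fn state(&self) -> BackendState;
}

/// In-memory fake: records what was requested, touches no OS scheduler.
struct FakeBackend {
    state: BackendState,
    interval_seconds: Option<u64>,
}

impl FakeBackend {
    fn new() -> Self {
        Self { state: BackendState::NotInstalled, interval_seconds: None }
    }
}

impl SchedulerBackend for FakeBackend {
    fn install(&mut self, interval_seconds: u64) -> Result<(), String> {
        self.state = BackendState::Installed;
        self.interval_seconds = Some(interval_seconds);
        Ok(())
    }
    fn uninstall(&mut self) -> Result<(), String> {
        self.state = BackendState::NotInstalled;
        self.interval_seconds = None;
        Ok(())
    }
    fn state(&self) -> BackendState {
        self.state
    }
}

fn main() {
    let mut b = FakeBackend::new();
    b.install(1800).unwrap();
    println!("{:?}", b.state()); // Installed
    b.uninstall().unwrap();
    println!("{:?}", b.state()); // NotInstalled
}
```

Command handlers would take `&mut dyn SchedulerBackend` (or a generic), so the same code path is exercised with real and fake backends.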
8. **Harden platform units and detect scheduler prerequisites**

Analysis: systemd user timers often fail silently without a user manager/linger; the launchd context can be wrong in headless sessions. Add explicit diagnostics and unit hardening.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Linux systemd unit
 [Service]
 Type=oneshot
 ExecStart=...
+TimeoutStartSec=900
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=read-only
@@ Linux install/status
+ detect user manager availability and linger state; surface warning/action
@@ macOS install/status
+ detect non-GUI bootstrap context and return actionable error
```
9. **Add operational commands: `trigger`, `doctor`, and a non-interactive log tail**

Analysis: `logs` opening an editor is weak for automation and incident response. Operators need a preflight check and an immediate controlled run.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ ServiceCommand
+ Trigger, // run one attempt through service policy now
+ Doctor,  // validate scheduler, token, paths, permissions
@@ logs
- opens editor
+ supports --tail <n> and --follow in human mode
+ robot mode can optionally return the last n lines
```
10. **Fix plan inconsistencies and edge-case correctness**

Analysis: There are internal mismatches that will cause implementation drift.

Diff:

```diff
--- a/plan.md
+++ b/plan.md
@@ Interval Parsing
- supports 's' suffix
+ remove 's' suffix (acceptance only allows 5m..24h)
@@ uninstall acceptance
- removes ALL service files only
+ explicitly also remove service-manifest and service-env (status/logs retained)
@@ SyncStatusFile schema
- pub last_run: SyncRunRecord
+ pub last_run: Option<SyncRunRecord> // matches idle/no-runs state
```
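The tightened interval rule ('m'/'h' suffixes only, 5m..24h acceptance range) could look like this; the function name `parse_interval` is hypothetical:

```rust
/// Parse "30m" / "2h" into seconds, rejecting the 's' suffix and
/// enforcing the 5m..24h acceptance range.
fn parse_interval(s: &str) -> Result<u64, String> {
    let (num, unit) = s.split_at(s.len().saturating_sub(1));
    let n: u64 = num.parse().map_err(|_| format!("invalid interval: {s}"))?;
    let secs = match unit {
        "m" => n * 60,
        "h" => n * 3600,
        _ => return Err(format!("unsupported unit in {s:?} (use m or h)")),
    };
    // acceptance range: 5 minutes .. 24 hours
    if !(300..=86_400).contains(&secs) {
        return Err(format!("interval out of range (5m..24h): {s}"));
    }
    Ok(secs)
}

fn main() {
    println!("{:?}", parse_interval("30m")); // Ok(1800)
    println!("{:?}", parse_interval("30s")); // Err: unsupported unit
}
```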
---
**Recommended Architecture Upgrade Summary**
The strongest improvement set is: **(1) project-scoped IDs, (2) secure token defaults, (3) atomic/versioned state, (4) persisted retry schedule + circuit breaker, (5) stage-aware outcomes**. That combination materially improves correctness, multi-repo safety, security, operability, and real-world reliability without changing your core manual-vs-scheduled separation principle.
174  plans/lore-service.feedback-3.md  Normal file
@@ -0,0 +1,174 @@
Below are the highest-impact revisions I’d make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX.
1. **Fix multi-install ambiguity (`service_id` exists, but commands can't target one explicitly)**

Analysis: The plan introduces `service-manifest-{service_id}.json`, but `status`/`uninstall`/`resume`/`logs` have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery.

```diff
@@ ## Commands & User Journeys
+### `lore service list`
+Lists installed services discovered from `{data_dir}/service-manifest-*.json`.
+Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`.

@@ ### `lore service uninstall`
-### `lore service uninstall`
+### `lore service uninstall [--service <service_id|name>] [--all]`
@@
-2. CLI reads install manifest to find `service_id`
+2. CLI resolves target service via `--service` or current-project-derived default.
+3. If multiple candidates and no selector, return actionable error.

@@ ### `lore service status`
-### `lore service status`
+### `lore service status [--service <service_id|name>]`
```
2. **Make status state service-scoped (not global)**

Analysis: A single `sync-status.json` for all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep the lock global, but state per service.

```diff
@@ ## Status File
-### Location
-`{get_data_dir()}/sync-status.json`
+### Location
+`{get_data_dir()}/sync-status-{service_id}.json`

@@ ## Paths Module Additions
-pub fn get_service_status_path() -> PathBuf {
-    get_data_dir().join("sync-status.json")
+pub fn get_service_status_path(service_id: &str) -> PathBuf {
+    get_data_dir().join(format!("sync-status-{service_id}.json"))
 }
@@
-Note: `sync-status.json` is NOT scoped by `service_id`
+Note: status is scoped by `service_id`; the lock remains global (`sync_pipeline`) to prevent overlapping writes.
```
3. **Stop classifying permanence via string matching**

Analysis: Matching `"401 Unauthorized"` in strings is brittle and will misclassify edge cases. Carry machine codes through stage results and classify by `ErrorCode` only.

```diff
@@ pub struct StageResult {
- pub error: Option<String>,
+ pub error: Option<String>,
+ pub error_code: Option<String>, // e.g., AUTH_FAILED, NETWORK_ERROR
 }
@@ Error classification helpers
-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... }
+fn is_permanent_error_code(code: Option<&str>) -> bool {
+    matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED"))
+}
```
4. **Install should be transactional (manifest written last)**

Analysis: The current order writes the manifest before the scheduler is enabled. If enable fails, you persist a false "installed" state. Use a two-phase install with rollback.

```diff
@@ ### `lore service install` User journey
-9. CLI writes install manifest ...
-10. CLI runs the platform-specific enable command
+9. CLI runs the platform-specific enable command
+10. On success, CLI writes the install manifest atomically
+11. On failure, CLI removes generated files and returns `ServiceCommandFailed`
```
5. **Fix the launchd token security gap (env-file currently still embeds the token)**

Analysis: The current "env-file" on macOS still writes the token into the plist, defeating the main security goal. Generate a private wrapper script that reads the env file at runtime and execs `lore`.

```diff
@@ ### macOS: launchd
-<key>ProgramArguments</key>
-<array>
-    <string>{binary_path}</string>
-    <string>--robot</string>
-    <string>service</string>
-    <string>run</string>
-</array>
+<key>ProgramArguments</key>
+<array>
+    <string>{data_dir}/service-run-{service_id}.sh</string>
+</array>
@@
-`env-file`: ... token value must still appear in plist ...
+`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime.
```
6. **Improve backoff math and add half-open circuit recovery**

Analysis: The current jitter + min clamp makes the first retry deterministic and can over-pause. The circuit breaker also requires manual resume forever. Add a cooldown + half-open probe to self-heal.

```diff
@@ Backoff Logic
-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64;
-let backoff_secs = backoff_secs.max(base_interval_seconds);
+let max_backoff = base_backoff;
+let min_backoff = base_interval_seconds;
+let span = max_backoff.saturating_sub(min_backoff);
+let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64;

@@ Scheduler states
-- `paused` — permanent error ... OR circuit breaker tripped ...
+- `paused` — permanent error requiring intervention
+- `half_open` — probe state after circuit cooldown; one trial run allowed

@@ Circuit breaker
-... transitions to `paused` ... Run: lore service resume
+... transitions to `half_open` after cooldown (default 30m). A successful probe closes the breaker automatically; a failed probe returns to backoff/paused.
```
7. **Promote the backend trait to v1 (not v2) for deterministic integration tests**

Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors.

```diff
@@ ### Platform Backends
-> Future architecture note: A `SchedulerBackend` trait ... for v2.
+Adopt the `SchedulerBackend` trait in v1 with real backends (`launchd`/`systemd`/`schtasks`) and a `FakeBackend` for tests.
+This enables deterministic install/uninstall/status/run-path integration tests without touching the host scheduler.
```
8. **Harden `run_cmd` timeout behavior**

Analysis: If a timeout occurs, the child process must be killed and reaped. Otherwise you leak processes and can wedge repeated runs.

```diff
@@ fn run_cmd(...)
-// Wait with timeout
-let output = wait_with_timeout(output, timeout_secs)?;
+// Wait with timeout; on timeout kill child and wait to reap
+let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?;
```
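A kill-and-reap helper can be done with std alone by polling `try_wait`; this is a sketch (the polling interval and the `Option<ExitStatus>` return convention are assumptions, and the demo relies on a Unix `sleep` binary being present):

```rust
use std::process::{Child, Command, ExitStatus};
use std::time::{Duration, Instant};

/// Poll the child until it exits or `timeout` elapses; on timeout,
/// kill it and still call `wait()` so the zombie is reaped.
fn wait_with_timeout_kill_and_reap(
    mut child: Child,
    timeout: Duration,
) -> std::io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status)); // exited in time
        }
        if Instant::now() >= deadline {
            child.kill()?; // SIGKILL / TerminateProcess
            child.wait()?; // reap so no zombie process is left behind
            return Ok(None); // caller maps this to a timeout error
        }
        std::thread::sleep(Duration::from_millis(50));
    }
}

fn main() -> std::io::Result<()> {
    // Demo: a child that would run 10s is killed after ~200ms.
    let child = Command::new("sleep").arg("10").spawn()?;
    let res = wait_with_timeout_kill_and_reap(child, Duration::from_millis(200))?;
    println!("timed out: {}", res.is_none());
    Ok(())
}
```

The explicit `wait()` after `kill()` is the part the plan's revision calls out: without it, the killed child stays as a zombie and repeated scheduled runs accumulate them.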
9. **Add manual control commands (`pause`, `trigger`, `repair`)**

Analysis: These are high-utility operational controls. `trigger` allows an immediate sync without waiting for the interval. `pause` supports maintenance windows. `repair` avoids manual file deletion for corrupt state.

```diff
@@ pub enum ServiceCommand {
+ /// Pause scheduled execution without uninstalling
+ Pause { #[arg(long)] reason: Option<String> },
+ /// Trigger an immediate one-off run using the installed profile
+ Trigger { #[arg(long)] ignore_backoff: bool },
+ /// Repair corrupt manifest/status by backing up and reinitializing
+ Repair { #[arg(long)] service: Option<String> },
 }
```
10. **Make `logs` non-interactive by default and add a rotation policy**

Analysis: Opening an editor by default is awkward for automation/SSH and slower for normal diagnosis. Defaulting to `tail` is more practical; `--open` can preserve the editor behavior.

```diff
@@ ### `lore service logs`
-By default, opens in the user's preferred editor.
+By default, prints the last 100 lines to stdout.
+Use `--open` to open an editor.
@@
+Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files.
```
11. **Remove the destructive/shell-unsafe suggested action**

Analysis: `actions(): ["rm {path}", ...]` is unsafe (shell injection + destructive guidance). Replace it with a safe command path.

```diff
@@ LoreError::actions()
-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"],
+Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"],
```
12. **Tighten scheduler units for real-world reliability**

Analysis: Add an explicit working directory and success-exit handling to reduce environment drift and edge-case failures.

```diff
@@ systemd service unit
 [Service]
 Type=oneshot
 ExecStart={binary_path} --robot service run
+WorkingDirectory={data_dir}
+SuccessExitStatus=0
 TimeoutStartSec=900
```
If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure.
190  plans/lore-service.feedback-4.md  Normal file
@@ -0,0 +1,190 @@
No `## Rejected Recommendations` section was present in the plan you shared, so the proposals below are all net-new.
1. **Make scheduled runs explicitly target a single service instance**

Analysis: Right now `service run` has no selector, but the plan supports multiple installed services. That creates ambiguity and incorrect manifest/status selection. This is the most important architectural fix.

```diff
@@ `lore service install` What it does
- runs `lore --robot service run` at the specified interval
+ runs `lore --robot service run --service-id <service_id>` at the specified interval

@@ Robot output (`install`)
- "sync_command": "/usr/local/bin/lore --robot service run",
+ "sync_command": "/usr/local/bin/lore --robot service run --service-id a1b2c3d4",

@@ `ServiceCommand` enum
- #[command(hide = true)]
- Run,
+ #[command(hide = true)]
+ Run {
+     /// Internal selector injected by the scheduler backend
+     #[arg(long, hide = true)]
+     service_id: String,
+ },

@@ `handle_service_run` signature
-pub fn handle_service_run(start: std::time::Instant) -> Result<(), Box<dyn std::error::Error>>
+pub fn handle_service_run(service_id: &str, start: std::time::Instant) -> Result<(), Box<dyn std::error::Error>>

@@ run flow step 1
- Read install manifest
+ Read install manifest for `service_id`
```
2. **Strengthen `service_id` derivation to avoid cross-workspace collisions**
Analysis: hashing the config path alone can collide when many workspaces share one global config. Identity should represent what is being synced, not only where the config lives.

```diff
@@ Key Design Principles / Project-Scoped Service Identity
- derive from a stable hash of the config file path
+ derive from a stable fingerprint of:
+ - canonical workspace root
+ - normalized configured GitLab project URLs
+ - canonical config path
+ then take first 12 hex chars of SHA-256

@@ `compute_service_id`
- Returns first 8 hex chars of SHA-256 of the canonical config path.
+ Returns first 12 hex chars of SHA-256 of a canonical identity tuple
+ (workspace_root + sorted project URLs + config_path).
```

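The fingerprint derivation can be sketched as follows (Python here for brevity; the target codebase is Rust). The field ordering and NUL separator are illustrative assumptions, not the plan's final format:

```python
import hashlib

def compute_identity_hash(workspace_root: str, config_path: str, project_urls: list[str]) -> str:
    """First 12 hex chars of SHA-256 over a canonical identity tuple."""
    # Normalize URLs so trailing slashes and casing do not change identity.
    normalized_urls = sorted(u.rstrip("/").lower() for u in project_urls)
    # NUL separators keep the tuple unambiguous even if a field contains spaces.
    material = "\0".join([workspace_root, config_path, *normalized_urls])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:12]
```

Two workspaces sharing one global config now hash to different ids because the workspace root differs, which is the collision the revision targets.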
3. **Introduce a service-state machine with a dedicated admin lock**
Analysis: install/uninstall/pause/resume/repair/status can race each other. A lock and an explicit transition table prevent invalid states and file races.

```diff
@@ New section: Service State Model
+ All state mutations are serialized by `AppLock("service-admin-{service_id}")`.
+ Legal transitions:
+ - idle -> running -> success|degraded|backoff|paused
+ - backoff -> running|paused
+ - paused -> half_open|running (resume)
+ - half_open -> running|paused
+ Any invalid transition is rejected with `ServiceCorruptState`.

@@ `handle_install`, `handle_uninstall`, `handle_pause`, `handle_resume`, `handle_repair`
+ Acquire `service-admin-{service_id}` before mutating manifest/status/service files.
```

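The transition table is small enough to express as data; a sketch follows. The re-entry edges from finished run outcomes (`success`/`degraded` back to `running` on the next scheduled tick) are an assumption not spelled out in the diff:

```python
# Legal transitions from the proposed service-state model.
TRANSITIONS = {
    "idle": {"running"},
    "running": {"success", "degraded", "backoff", "paused"},
    "backoff": {"running", "paused"},
    "paused": {"half_open", "running"},
    "half_open": {"running", "paused"},
    # Assumed: a finished run re-enters "running" on the next scheduled tick.
    "success": {"running"},
    "degraded": {"running"},
}

class ServiceCorruptState(Exception):
    """Raised when a state mutation does not match the transition table."""

def transition(current: str, target: str) -> str:
    """Validate and perform one state transition."""
    if target not in TRANSITIONS.get(current, set()):
        raise ServiceCorruptState(f"illegal transition {current} -> {target}")
    return target
```

Keeping the table as data makes the "reject with `ServiceCorruptState`" rule trivially testable.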
4. **Unify manual and scheduled sync execution behind one orchestrator**
Analysis: the plan currently duplicates stage logic and error classification in `service run`, increasing drift risk. A shared orchestrator gives one authoritative pipeline behavior.

```diff
@@ Key Design Principles
+ #### 6. Single Sync Orchestrator
+ Both `lore sync` and `lore service run` call `SyncOrchestrator`.
+ Service mode adds policy (backoff/circuit-breaker); manual mode bypasses policy.

@@ Service Run Implementation
- execute_sync_stages(&sync_args)
+ SyncOrchestrator::run(SyncMode::Service { profile, policy })

@@ manual sync
- separate pipeline path
+ SyncOrchestrator::run(SyncMode::Manual { flags })
```

5. **Add bounded in-run retries for transient core-stage failures**
Analysis: single-shot failure handling will over-trigger backoff on temporary network blips. One short retry per core stage significantly improves freshness without much extra runtime.

```diff
@@ Stage-aware execution
+ Core stages (`issues`, `mrs`) get up to 1 immediate retry on transient errors
+ (jittered 1-5s). Permanent errors are never retried.
+ Optional stages keep best-effort semantics.

@@ Acceptance criteria (`service run`)
+ Retries transient core stage failures once before counting the run as failed.
```

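The retry rule can be sketched as a small wrapper; `is_transient` stands in for the plan's error classifier, and the injectable `sleep` parameter exists purely for testability:

```python
import random
import time

def run_core_stage(stage_fn, is_transient, sleep=time.sleep) -> None:
    """Run a core stage with at most one immediate retry on transient errors."""
    try:
        stage_fn()
    except Exception as err:
        if not is_transient(err):
            raise  # permanent errors are never retried
        sleep(random.uniform(1.0, 5.0))  # jittered 1-5s pause
        stage_fn()  # a second failure propagates and fails the run
```

Bounding retries to one attempt keeps worst-case run time predictable for the scheduler interval.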
6. **Harden persistence with full crash-safety semantics**
Analysis: the current atomic-write description is good but incomplete for power-loss durability. You should fsync the parent directory after rename and include lightweight integrity metadata.

```diff
@@ `write_atomic`
- tmp file + fsync + rename
+ tmp file + fsync(file) + rename + fsync(parent_dir)

@@ `ServiceManifest` and `SyncStatusFile`
+ pub write_seq: u64,
+ pub content_sha256: String, // optional integrity guard for repair/doctor
```

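A POSIX-oriented sketch of the upgraded `write_atomic`; the directory `fsync` after the rename is the step the diff adds (non-POSIX platforms would need their own handling):

```python
import os
import tempfile

def write_atomic(path: str, data: bytes) -> None:
    """tmp file + fsync(file) + rename + fsync(parent dir)."""
    parent = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # durable file contents
        os.replace(tmp, path)     # atomic rename within one filesystem
        dir_fd = os.open(parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)      # durable directory entry after rename
        finally:
            os.close(dir_fd)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Without the final directory fsync, a power loss can leave the rename itself unrecorded even though the new file's contents were synced.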
7. **Fix token handling to avoid shell/env injection and add secure-store mode**
Analysis: sourcing env files in shell is brittle if the token contains special chars/newlines. Also, secure OS credential stores should be first-class for production reliability/security.

```diff
@@ Token storage strategies
-| `env-file` (default) ...
+| `auto` (default) | use secure-store when available, else env-file |
+| `secure-store` | macOS Keychain / libsecret / Windows Credential Manager |
+| `env-file` | explicit fallback |

@@ macOS wrapper script
-. "{data_dir}/service-env-{service_id}"
-export {token_env_var}
+TOKEN_VALUE="$(cat "{data_dir}/service-token-{service_id}")"
+export {token_env_var}="$TOKEN_VALUE"

@@ Acceptance criteria
+ Reject token values containing `\0` or newline for env-file mode.
+ Never eval/source untrusted token content.
```

8. **Correct platform/runtime implementation hazards**
Analysis: there are a few correctness risks that should be fixed in-plan now.

```diff
@@ macOS install steps
- Get UID via `unsafe { libc::getuid() }`
+ Get UID via safe API (`nix::unistd::Uid::current()` or equivalent safe helper)

@@ Command Runner Helper
- poll try_wait and read stdout/stderr after exit
+ avoid potential pipe backpressure deadlock:
+ use wait-with-timeout + concurrent stdout/stderr draining

@@ Linux timer
- OnUnitActiveSec={interval_seconds}s
+ OnUnitInactiveSec={interval_seconds}s
+ AccuracySec=1min
```

9. **Make logs fully service-scoped**
Analysis: you already scoped manifest/status by `service_id`; logs are still global in several places. Multi-service installs will overwrite each other’s logs.

```diff
@@ Paths Module Additions
-pub fn get_service_log_path() -> PathBuf
+pub fn get_service_log_path(service_id: &str, stream: LogStream) -> PathBuf

@@ log filenames
- logs/service-stderr.log
- logs/service-stdout.log
+ logs/service-{service_id}-stderr.log
+ logs/service-{service_id}-stdout.log

@@ `service logs`
- default path: `{data_dir}/logs/service-stderr.log`
+ default path: `{data_dir}/logs/service-{service_id}-stderr.log`
```

10. **Resolve internal spec contradictions and rollback gaps**
Analysis: there are a few contradictory statements and incomplete rollback behavior that will cause implementation churn.

```diff
@@ `service logs` behavior
- default (no flags): open in editor (human)
+ default (no flags): print last 100 lines (human and robot metadata mode)
+ `--open` is explicit opt-in

@@ install rollback
- On failure: removes generated service files
+ On failure: removes generated service files, env file, wrapper script, and temp manifest

@@ `handle_service_run` sample code
- let manifest_path = get_service_manifest_path();
+ let manifest_path = get_service_manifest_path(service_id);
```

If you want, I can take these revisions and produce a single consolidated “Iteration 4” replacement plan block with all sections rewritten coherently so it’s ready to hand to an implementer.

196	plans/lore-service.feedback-5.md	Normal file
@@ -0,0 +1,196 @@

I reviewed the full plan and avoided everything already listed in `## Rejected Recommendations`. These are the highest-impact revisions I’d make.

1. **Fix identity model inconsistency and prevent `--name` alias collisions**

Why this improves the plan: your text says identity includes workspace root, but the current derivation code does not. Also, using `--name` as the actual `service_id` risks accidental cross-project collisions and destructive updates.

```diff
--- a/plan.md
+++ b/plan.md
@@ Key Design Principles / 2. Project-Scoped Service Identity
- Each installed service gets a unique `service_id` derived from a canonical identity tuple: the config file path, sorted GitLab project URLs, and workspace root.
+ Each installed service gets an immutable `identity_hash` derived from a canonical identity tuple:
+ workspace root + canonical config path + sorted normalized project URLs.
+ `service_id` remains the scheduler identifier; `--name` is a human alias only.
+ If `--name` collides with an existing service that has a different `identity_hash`, install fails with an actionable error.

@@ Install Manifest / Schema
+ /// Immutable identity hash for collision-safe matching across reinstalls
+ pub identity_hash: String,
+ /// Optional human-readable alias passed via --name
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub service_alias: Option<String>,
+ /// Canonical workspace root used in identity derivation
+ pub workspace_root: String,

@@ service_id derivation
-pub fn compute_service_id(config_path: &Path, project_urls: &[&str]) -> String
+pub fn compute_identity_hash(workspace_root: &Path, config_path: &Path, project_urls: &[&str]) -> String
```

2. **Add lock protocol to eliminate uninstall/run race conditions**

Why this improves the plan: today `service run` does not take the admin lock, and admin commands do not take the pipeline lock. `uninstall` can race with an active run and remove files mid-execution.

```diff
--- a/plan.md
+++ b/plan.md
@@ Key Design Principles / 6. Serialized Admin Mutations
- The `service run` entrypoint does NOT acquire the admin lock — it only acquires the `sync_pipeline` lock
+ The `service run` entrypoint acquires only `sync_pipeline`.
+ Destructive admin operations (`install` overwrite, `uninstall`, `repair --regenerate`) must:
+ 1) acquire `service-admin-{service_id}`
+ 2) disable scheduler backend entrypoint
+ 3) acquire `sync_pipeline` lock with timeout
+ 4) mutate/remove files
+ This lock ordering is mandatory to prevent deadlocks and run/delete races.

@@ lore service uninstall / User journey
- 4. Runs platform-specific disable command
- 5. Removes service files from disk
+ 4. Acquires `sync_pipeline` lock (after disabling scheduler) with bounded wait
+ 5. Removes service files from disk only after lock acquisition
```

3. **Make transient handling `Retry-After` aware**

Why this improves the plan: rate-limit and 503 responses often carry retry hints. Ignoring them causes useless retries and longer outages.

```diff
--- a/plan.md
+++ b/plan.md
@@ Transient vs permanent error classification
-| Transient | Retry with backoff | Network timeout, rate limited, DB locked, 5xx from GitLab |
+| Transient | Retry with adaptive backoff | Network timeout, DB locked, 5xx from GitLab |
+| Transient (hinted) | Respect server retry hint | Rate limited with Retry-After/X-RateLimit-Reset |

@@ Backoff Logic
+ If an error includes a retry hint (e.g., `Retry-After`), set:
+ `next_retry_at_ms = max(computed_backoff, hinted_retry_at_ms)`.
+ Persist `backoff_reason` for status visibility.
```

4. **Decouple optional stage cadence from core sync interval**

Why this improves the plan: running docs/embeddings every 5–30 minutes is expensive and unnecessary. Separate freshness windows reduce cost/latency while keeping core data fresh.

```diff
--- a/plan.md
+++ b/plan.md
@@ Sync profiles
-| `balanced` (default) | `--no-embed` | Issues + MRs + doc generation |
-| `full` | (none) | Full pipeline including embeddings |
+| `balanced` (default) | core every interval, docs every 60m, no embeddings | Fast + useful docs |
+| `full` | core every interval, docs every interval, embeddings every 6h (default) | Full freshness with bounded cost |

@@ Status File / StageResult
+ /// true when stage intentionally skipped due to its freshness window
+ #[serde(default)]
+ pub skipped: bool,

@@ lore service run / Stage-aware execution
+ Optional stages may be skipped when their last successful run is within configured freshness windows.
+ Skipped optional stages do not count as failures and are recorded explicitly.
```

5. **Give Windows parity for secure token handling (env-file + wrapper)**

Why this improves the plan: the current Windows path requires a global/system env var and has poor UX. A wrapper+env-file model gives platform parity and avoids global token exposure.

```diff
--- a/plan.md
+++ b/plan.md
@@ Token storage strategies
-| On Windows, neither strategy applies — the token must be in the user's system environment
+| On Windows, `env-file` is supported via a generated wrapper script (`service-run-{service_id}.cmd` or `.ps1`)
+| that reads `{data_dir}/service-env-{service_id}` and launches `lore --robot service run ...`.
+| `embedded` remains opt-in and warned as less secure.

@@ Windows: schtasks
- Token handling on Windows: The env var must be set system-wide via `setx`
+ Token handling on Windows:
+ - `env-file` (default): wrapper script reads token from private file at runtime
+ - `embedded`: token passed via wrapper-set environment variable
+ - `system_env`: still supported as fallback
```

6. **Add run heartbeat and stale-run detection**

Why this improves the plan: the `running` state can become misleading after crashes or stale locks. Heartbeat metadata makes status accurate and improves incident triage.

```diff
--- a/plan.md
+++ b/plan.md
@@ Status File / Schema
+ /// In-flight run metadata for crash/stale detection
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub current_run: Option<CurrentRunState>,
+
+pub struct CurrentRunState {
+    pub run_id: String,
+    pub started_at_ms: i64,
+    pub last_heartbeat_ms: i64,
+    pub pid: u32,
+}

@@ lore service status
- - `running` — currently executing (sync_pipeline lock held)
+ - `running` — currently executing with live heartbeat
+ - `running_stale` — in-flight metadata exists but heartbeat exceeded stale threshold
```

7. **Upgrade drift detection from “loaded/unloaded” to spec-level drift**

Why this improves the plan: platform state alone misses manual edits to unit/plist/wrapper files. Spec-hash drift gives reliable “what changed?” diagnostics and safe regeneration.

```diff
--- a/plan.md
+++ b/plan.md
@@ Install Manifest / Schema
+ /// Hash of generated scheduler artifacts and command spec
+ pub spec_hash: String,

@@ lore service status
- Detects manifest/platform drift and reports it
+ Detects:
+ - platform drift (loaded/unloaded mismatch)
+ - spec drift (artifact content hash mismatch)
+ - command drift (sync command differs from manifest)

@@ lore service repair
+ Add `--regenerate` to rewrite scheduler artifacts from manifest when spec drift is detected.
+ This is non-destructive and does not delete status/log history.
```

8. **Add safe operational modes: `install --dry-run` and `doctor --fix`**

Why this improves the plan: dry-run reduces risk before writing OS scheduler files; fix-mode improves operator ergonomics and lowers support burden.

```diff
--- a/plan.md
+++ b/plan.md
@@ lore service install
+ Add `--dry-run`:
+ - validates config/token/prereqs
+ - renders service files and planned commands
+ - writes nothing, executes nothing

@@ lore service doctor
+ Add `--fix` for safe, non-destructive remediations:
+ - create missing dirs
+ - correct file permissions on env/wrapper files
+ - run `systemctl --user daemon-reload` when applicable
+ - report applied fixes in robot output
```

9. **Define explicit schema migration behavior (not just `schema_version` fields)**

Why this improves the plan: version fields without a migration policy become an operational risk during upgrades.

```diff
--- a/plan.md
+++ b/plan.md
@@ ServiceManifest Read/Write
- `ServiceManifest::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `ServiceManifest::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
+ - Migrates known older schema versions to current in-memory model
+ - Rewrites migrated file atomically
+ - Fails with actionable `ServiceCorruptState` for unknown future major versions

@@ SyncStatusFile Read/Write
- `SyncStatusFile::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `SyncStatusFile::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
```

If you want, I can produce a fully rewritten v5 plan text that integrates all nine changes coherently section-by-section.

3759	plans/lore-service.md	Normal file
File diff suppressed because it is too large

128	plans/time-decay-expert-scoring.feedback-5.md	Normal file
@@ -0,0 +1,128 @@

**Best Revisions To Strengthen The Plan**

1. **[Critical] Replace one-hop rename matching with canonical path identities**

Analysis and rationale: Current `old_path OR new_path` fixes direct renames, but it still breaks on rename chains (`a.rs -> b.rs -> c.rs`) and split/move patterns. A canonical `path_identity` graph built from `mr_file_changes(old_path, new_path)` gives stable identity over time, which is the right architectural boundary for expertise history.

```diff
@@ ## Context
-- Match both old and new paths in all signal queries AND path resolution probes so expertise survives file renames
+- Build canonical path identities from rename edges and score by identity, not raw path strings, so expertise survives multi-hop renames and moves

@@ ## Files to Modify
 2. **`src/cli/commands/who.rs`** — Core changes:
 ...
-   - Match both `new_path` and `old_path` in all signal queries (rename awareness)
+   - Resolve queried paths to `path_identity_id` and match all aliases in that identity set
+4. **`src/core/path_identity.rs`** — New module:
+   - Build/maintain rename graph from `mr_file_changes`
+   - Resolve path -> identity + aliases for probes/scoring
```

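The rename graph reduces to union-find over `(old_path, new_path)` edges; a minimal sketch, assuming rename edges are replayed in commit order (the Rust module would persist this mapping in SQLite):

```python
class PathIdentity:
    """Union-find over rename edges: paths linked by renames share one identity."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, path: str) -> str:
        self.parent.setdefault(path, path)
        while self.parent[path] != path:
            self.parent[path] = self.parent[self.parent[path]]  # path halving
            path = self.parent[path]
        return path

    def add_rename(self, old_path: str, new_path: str) -> None:
        self.parent[self.find(old_path)] = self.find(new_path)

    def aliases(self, path: str) -> set[str]:
        """All historical names of this file, to be matched in signal queries."""
        root = self.find(path)
        return {p for p in list(self.parent) if self.find(p) == root}
```

A query for `c.rs` then matches notes and MR signals recorded against any alias in the identity set, which is exactly what the one-hop `old_path OR new_path` predicate cannot do for chains.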
2. **[Critical] Shift scoring input from runtime CTE joins to a normalized `expertise_events` table**

Analysis and rationale: Your SQL is correct but complex and expensive at query time. Precomputing normalized events at ingestion gives simpler, faster, and more reliable scoring queries; it also enables model versioning/backfills without touching raw MR/note tables on each request.

```diff
@@ ## Files to Modify
-3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes
+3. **`src/core/db.rs`** — Add migrations for:
+   - `expertise_events` table (normalized scoring events)
+   - supporting indexes
+4. **`src/core/ingest/expertise_events.rs`** — New:
+   - Incremental upsert of events during sync/ingest

@@ ## SQL Restructure (who.rs)
-The SQL uses CTE-based dual-path matching and hybrid aggregation...
+Runtime SQL reads precomputed `expertise_events` filtered by path identity + time window.
+Heavy joins/aggregation move to ingest-time normalization.
```

3. **[High] Upgrade reviewer engagement model beyond char-count threshold**

Analysis and rationale: `min_note_chars` is a useful guardrail but brittle (easy to game, and it penalizes concise high-quality comments). Add explicit review-state signals (`approved`, `changes_requested`) and trivial-comment pattern filtering to better capture real reviewer expertise.

```diff
@@ ## Scoring Formula
-| **Reviewer Participated** (left DiffNote on MR/path) | 10 | 90 days |
+| **Reviewer Participated** (substantive DiffNote and/or formal review action) | 10 | 90 days |
+| **Review Decision: changes_requested** | 6 | 120 days |
+| **Review Decision: approved** | 4 | 75 days |

@@ ### 1. ScoringConfig (config.rs)
 pub reviewer_min_note_chars: u32,
+ pub reviewer_trivial_note_patterns: Vec<String>, // default: ["lgtm","+1","nit","ship it","👍"]
+ pub review_approved_weight: i64, // default: 4
+ pub review_changes_requested_weight: i64, // default: 6
```

4. **[High] Make temporal semantics explicit and deterministic**

Analysis and rationale: `--as-of` is good, but day parsing and boundary semantics can still cause subtle reproducibility issues. Define the window as `[since_ms, as_of_ms)` and parse `YYYY-MM-DD` as end-of-day UTC (or an explicit timezone) so user expectations match outputs.

```diff
@@ ### 5a. Reproducible Scoring via `--as-of`
-- All event selection is bounded by `[since_ms, as_of_ms]`
+- All event selection is bounded by `[since_ms, as_of_ms)` (exclusive upper bound)
+- `YYYY-MM-DD` is interpreted as end of that day in UTC, i.e. an exclusive bound at the next day's `00:00:00.000Z`, unless `--timezone` is provided
+- Robot output includes `window_start_iso`, `window_end_iso`, `window_end_exclusive: true`
```

5. **[High] Replace fixed default `--since 24m` with contribution-floor auto cutoff**

Analysis and rationale: A static window is simple but often over-scans data. Compute a model-derived horizon from a minimum contribution floor (for example `0.01` points) per signal; this keeps results equivalent while reducing query cost.

```diff
@@ ### 5. Default --since Change
-Expert mode: `"6m"` -> `"24m"`
+Expert mode default: `--since auto`
+`auto` computes the earliest relevant timestamp from configured weights/half-lives and `min_contribution_floor`
+Add config: `min_contribution_floor` (default: 0.01)
+`--since` still overrides, `--all-history` still bypasses cutoff
```

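With exponential decay `w * 0.5^(age / half_life)`, a signal drops below the floor once `age > half_life * log2(w / floor)`, so the auto horizon is just the maximum of that expression over configured signals. A sketch (function and parameter names are assumptions):

```python
import math

def auto_since_days(signals: list[tuple[float, float]], floor: float = 0.01) -> float:
    """Horizon in days beyond which every (weight, half_life_days) signal decays below the floor."""
    horizons = [
        half_life * math.log2(weight / floor)
        for weight, half_life in signals
        if weight > floor  # weights at or under the floor never contribute
    ]
    return max(horizons, default=0.0)
```

For example, a weight-10 signal with a 90-day half-life stays above a 0.01 floor for `90 * log2(1000)`, roughly 897 days, so the auto cutoff lands near the old 24-month default without hard-coding it.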
6. **[High] Add bot/service-account filtering now (not later)**

Analysis and rationale: Bot activity can materially distort expertise rankings in real repos. This is low implementation cost with high quality gain and should be in v1 of the scoring revamp, not deferred.

```diff
@@ ### 1. ScoringConfig (config.rs)
+ pub excluded_username_patterns: Vec<String>, // default: ["bot","\\[bot\\]","service-account","ci-"]

@@ ### 2. SQL Restructure (who.rs)
+Apply username exclusion in all signal sources unless `--include-bots` is set

@@ ### 5b. Score Explainability via `--explain-score`
+Add `filtered_events` counts in robot output metadata
```

7. **[Medium] Enforce deterministic floating-point accumulation**

Analysis and rationale: Even with small sets, unordered `HashMap` iteration can cause tiny platform-dependent ranking differences near ties. Sorting contributions and using Neumaier summation removes nondeterminism and stabilizes tests/CI.

```diff
@@ ### 4. Rust-Side Aggregation (who.rs)
-Compute score as `f64`.
+Compute score as `f64` using deterministic contribution ordering:
+1) sort by (username, signal, mr_id, ts)
+2) sum with Neumaier compensation
+Tie-break remains `(raw_score DESC, last_seen DESC, username ASC)`
```

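Neumaier summation is Kahan's compensated sum extended to handle addends larger than the running total; the accumulation step translates directly to Rust. A sketch:

```python
def neumaier_sum(values: list[float]) -> float:
    """Compensated summation: recovers low-order bits a naive sum discards."""
    total = 0.0
    compensation = 0.0  # running account of lost low-order bits
    for v in values:
        t = total + v
        if abs(total) >= abs(v):
            compensation += (total - t) + v  # v's low bits were lost
        else:
            compensation += (v - t) + total  # total's low bits were lost
        total = t
    return total + compensation
```

Combined with the sorted contribution order, the final score is bit-identical across platforms, which is what makes the tie-break rule reproducible in CI.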
8. **[Medium] Strengthen explainability with evidence, not just totals**

Analysis and rationale: Component totals help, but disputes usually need “why this user got this score now.” Add compact top evidence rows per component (`mr_id`, `ts`, `raw_contribution`) behind an optional mode.

```diff
@@ ### 5b. Score Explainability via `--explain-score`
-Component breakdown only (4 floats per user).
+Add `--explain-score=summary|full`:
+`summary`: current 4-component totals
+`full`: adds top N evidence rows per component (default N=3)
+Robot output includes per-evidence `mr_id`, `signal`, `ts`, `contribution`
```

9. **[Medium] Make query plan strategy explicit: `UNION ALL` default for dual-path scans**

Analysis and rationale: You currently treat `UNION ALL` as a fallback if the planner regresses. For SQLite, OR-across-indexed-columns regressions are common enough that defaulting to branch-split queries is often more predictable.

```diff
@@ **Index optimization fallback (UNION ALL split)**
-Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm degradation.
+Use `UNION ALL` + dedup as default for dual-path matching.
+Keep `OR` variant as optional strategy flag for benchmarking/regression checks.
```

10. **[Medium] Add explicit performance SLO + benchmark gate**

Analysis and rationale: This plan is query-heavy and ranking-critical; add measurable performance budgets so future edits do not silently degrade UX. Include synthetic fixture benchmarks for exact, prefix, and suffix path modes.

```diff
@@ ## Verification
+8. Performance regression gate:
+   - `cargo bench --bench who_expert_scoring`
+   - Dataset tiers: 100k, 1M, 5M notes
+   - SLOs: p95 exact path < 150ms, prefix < 250ms, suffix < 400ms on reference hardware
+   - Fail CI if regression > 20% vs stored baseline
```

If you want, I can produce a single consolidated “iteration 5” plan document with these changes already merged into your current structure.

@@ -2,12 +2,12 @@
|
|||||||
plan: true
|
plan: true
|
||||||
title: ""
|
title: ""
|
||||||
status: iterating
|
status: iterating
|
||||||
iteration: 4
|
iteration: 5
|
||||||
target_iterations: 8
|
target_iterations: 8
|
||||||
beads_revision: 0
|
beads_revision: 1
|
||||||
related_plans: []
|
related_plans: []
|
||||||
created: 2026-02-08
|
created: 2026-02-08
|
||||||
updated: 2026-02-08
|
updated: 2026-02-09
|
||||||
---
|
---
|
||||||
|
|
||||||
# Time-Decay Expert Scoring Model
|
# Time-Decay Expert Scoring Model
|
||||||
@@ -78,6 +78,7 @@ Author/reviewer signals are deduplicated per MR (one signal per distinct MR). No
|
|||||||
- Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals)
|
- Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals)
|
||||||
- Add `--as-of` flag for reproducible scoring at a fixed timestamp
|
- Add `--as-of` flag for reproducible scoring at a fixed timestamp
|
||||||
- Add `--explain-score` flag for per-user score component breakdown
|
- Add `--explain-score` flag for per-user score component breakdown
|
||||||
|
- Add `--include-bots` flag to disable bot/service-account filtering
|
||||||
- Sort on raw f64 score, round only for display
|
- Sort on raw f64 score, round only for display
|
||||||
- Update tests
|
- Update tests
|
||||||
3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)
|
3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)
|
||||||
@@ -100,6 +101,7 @@ pub struct ScoringConfig {
|
|||||||
pub note_half_life_days: u32, // default: 45
|
pub note_half_life_days: u32, // default: 45
|
||||||
pub closed_mr_multiplier: f64, // default: 0.5 (applied to closed-without-merge MRs)
|
pub closed_mr_multiplier: f64, // default: 0.5 (applied to closed-without-merge MRs)
|
||||||
pub reviewer_min_note_chars: u32, // default: 20 (minimum note body length to count as participation)
|
pub reviewer_min_note_chars: u32, // default: 20 (minimum note body length to count as participation)
|
||||||
|
pub excluded_usernames: Vec<String>, // default: [] (exact-match usernames to exclude, e.g. ["renovate-bot", "gitlab-ci"])
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -108,6 +110,7 @@ pub struct ScoringConfig {
|
|||||||
- All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores)
|
- All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores)
|
||||||
- `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them)
|
- `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them)
|
||||||
- `reviewer_min_note_chars` must be >= 0 (0 disables the filter; typical useful values: 10-50)
|
- `reviewer_min_note_chars` must be >= 0 (0 disables the filter; typical useful values: 10-50)
|
||||||
|
- `excluded_usernames` entries must be non-empty strings (no blank entries)
|
||||||
- Return `LoreError::ConfigInvalid` with a clear message on failure
|
- Return `LoreError::ConfigInvalid` with a clear message on failure
|
||||||
|
|
||||||
### 2. Decay Function (who.rs)

@@ -128,25 +131,51 @@ The SQL uses **CTE-based dual-path matching** and **hybrid aggregation**. Rather

MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`.

```sql
WITH matched_notes_raw AS (
    -- Branch 1: match on new_path (uses idx_notes_new_path or equivalent)
    SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
    FROM notes n
    WHERE n.note_type = 'DiffNote'
      AND n.is_system = 0
      AND n.author_username IS NOT NULL
      AND n.created_at >= ?2
      AND n.created_at < ?4
      AND (?3 IS NULL OR n.project_id = ?3)
      AND n.position_new_path {path_op}
    UNION ALL
    -- Branch 2: match on old_path (uses idx_notes_old_path_author)
    SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
    FROM notes n
    WHERE n.note_type = 'DiffNote'
      AND n.is_system = 0
      AND n.author_username IS NOT NULL
      AND n.created_at >= ?2
      AND n.created_at < ?4
      AND (?3 IS NULL OR n.project_id = ?3)
      AND n.position_old_path {path_op}
),
matched_notes AS (
    -- Dedup: prevent double-counting when old_path = new_path (no rename)
    SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
    FROM matched_notes_raw
),
matched_file_changes_raw AS (
    -- Branch 1: match on new_path (uses idx_mfc_new_path_project_mr)
    SELECT fc.merge_request_id, fc.project_id
    FROM mr_file_changes fc
    WHERE (?3 IS NULL OR fc.project_id = ?3)
      AND fc.new_path {path_op}
    UNION ALL
    -- Branch 2: match on old_path (uses idx_mfc_old_path_project_mr)
    SELECT fc.merge_request_id, fc.project_id
    FROM mr_file_changes fc
    WHERE (?3 IS NULL OR fc.project_id = ?3)
      AND fc.old_path {path_op}
),
matched_file_changes AS (
    -- Dedup: prevent double-counting when old_path = new_path (no rename)
    SELECT DISTINCT merge_request_id, project_id
    FROM matched_file_changes_raw
),
reviewer_participation AS (
    -- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
@@ -196,7 +225,7 @@ raw AS (
WHERE m.author_username IS NOT NULL
  AND m.state IN ('opened','merged','closed')
  AND {state_aware_ts} >= ?2
  AND {state_aware_ts} < ?4

UNION ALL

@@ -212,7 +241,7 @@ raw AS (
  AND (m.author_username IS NULL OR r.username != m.author_username)
  AND m.state IN ('opened','merged','closed')
  AND {state_aware_ts} >= ?2
  AND {state_aware_ts} < ?4

UNION ALL

@@ -229,7 +258,7 @@ raw AS (
  AND (m.author_username IS NULL OR r.username != m.author_username)
  AND m.state IN ('opened','merged','closed')
  AND {state_aware_ts} >= ?2
  AND {state_aware_ts} < ?4
),
aggregated AS (
    -- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
@@ -245,11 +274,11 @@ aggregated AS (
SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL
```

Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` exclusive upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `>= ?2 AND < ?4` pattern (half-open interval) ensures that when `--as-of` is set to a past date, events at or after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. The exclusive upper bound avoids edge-case ambiguity when events have timestamps exactly equal to the as-of value.

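As a minimal illustration of the Rust-side decay applied to these rows, assuming exponential half-life decay over millisecond timestamps (the `decay` function shown here is a sketch, not the plan's actual API):

```rust
const MS_PER_DAY: f64 = 86_400_000.0;

// Exponential half-life decay: weight halves every `half_life_days`.
// The `.max(0.0)` clamp gives events "after" the reference point full
// weight, which is exactly why the SQL must exclude them with `< ?4`.
fn decay(event_ms: i64, as_of_ms: i64, half_life_days: f64) -> f64 {
    let elapsed_days = ((as_of_ms - event_ms) as f64 / MS_PER_DAY).max(0.0);
    0.5_f64.powf(elapsed_days / half_life_days)
}
```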
**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into foundational CTEs (`matched_notes_raw` → `matched_notes`, `matched_file_changes_raw` → `matched_file_changes`) means path matching is defined once, each index branch is explicit, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. The UNION ALL + dedup pattern ensures SQLite uses the optimal index for each path column independently.

**Dual-path matching strategy (UNION ALL split)**: SQLite's query planner commonly struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. Rather than starting with `OR` and hoping the planner cooperates, use `UNION ALL` + dedup as the default strategy:
```sql
matched_notes AS (
    SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
@@ -261,18 +290,18 @@ matched_notes_dedup AS (
    FROM matched_notes
),
```
This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). The same pattern applies to `matched_file_changes`. The simpler `OR` variant is retained as a comment for benchmarking — if a future SQLite version handles `OR` well, the split can be collapsed.

**Rationale for precomputed participation set**: The previous approach used correlated `EXISTS`/`NOT EXISTS` subqueries to classify reviewers. The `reviewer_participation` CTE materializes the set of `(mr_id, username)` pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with `IS NULL` (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every `mr_reviewers` row falls into exactly one bucket.

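The exhaustive split can be sketched as a membership test against the materialized participation set; the Rust types below are invented for illustration (the real split happens in SQL via JOIN vs. LEFT JOIN ... IS NULL):

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum ReviewerBucket {
    Participated, // has substantive DiffNotes on the MR
    AssignedOnly, // assigned, but no substantive DiffNotes
}

// Mirrors the reviewer_participation CTE: every (mr_id, username) reviewer
// row falls into exactly one bucket, never both, never neither.
fn classify(
    mr_id: i64,
    username: &str,
    participation: &HashSet<(i64, String)>,
) -> ReviewerBucket {
    if participation.contains(&(mr_id, username.to_string())) {
        ReviewerBucket::Participated
    } else {
        ReviewerBucket::AssignedOnly
    }
}
```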
**Rationale for hybrid over fully-raw**: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density.

**Path rename awareness**: Both `matched_notes` and `matched_file_changes` use UNION ALL + dedup to match against both old and new path columns independently, ensuring each branch uses its respective index:

- Notes: branch 1 matches `position_new_path`, branch 2 matches `position_old_path`, deduped by `notes.id`
- File changes: branch 1 matches `new_path`, branch 2 matches `old_path`, deduped by `(merge_request_id, project_id)`

Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The UNION ALL approach ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.

**Signal 4 splits into two**: The current signal 4 (`file_reviewer`) joins `mr_reviewers` but doesn't distinguish participation. In the new plan:

@@ -332,7 +361,7 @@ For each username, accumulate into a struct with:

The `mr_state` field from each SQL row is stored alongside the timestamp so the Rust-side can apply `closed_mr_multiplier` when `mr_state == "closed"`.

Compute score as `f64` with **deterministic contribution ordering**: within each signal type, sort contributions by `(mr_id ASC)` before summing. This eliminates platform-dependent HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the complexity of compensated summation (Neumaier/Kahan). Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
```
state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 }
```

@@ -352,6 +381,8 @@ Compute counts from the accumulated data:
- `review_note_count = notes_per_mr.values().map(|(count, _)| count).sum()`
- `author_mr_count = author_mrs.len()`

**Bot/service-account filtering**: After accumulating all user scores and before sorting, filter out any username that appears in `config.scoring.excluded_usernames` (exact match, case-insensitive). This is applied in Rust post-query (not SQL) to keep the SQL clean and avoid parameter explosion. When `--include-bots` is active, the filter is skipped entirely. The robot JSON `resolved_input` includes `excluded_usernames_applied: true|false` to indicate whether filtering was active.
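A minimal sketch of that post-query filter, with invented names (`UserScore`, `filter_bots`) standing in for the real types:

```rust
struct UserScore {
    username: String,
    score: f64,
}

// Case-insensitive exact-match filter; skipped entirely under --include-bots.
fn filter_bots(scores: Vec<UserScore>, excluded: &[String], include_bots: bool) -> Vec<UserScore> {
    if include_bots {
        return scores;
    }
    let excluded_lower: Vec<String> = excluded.iter().map(|u| u.to_lowercase()).collect();
    scores
        .into_iter()
        .filter(|s| !excluded_lower.contains(&s.username.to_lowercase()))
        .collect()
}
```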

Truncate to limit after sorting.

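The deterministic contribution ordering described above can be sketched as follows; the `Contribution` type is hypothetical, and the point is only that sorting by `mr_id` fixes the f64 accumulation order:

```rust
struct Contribution {
    mr_id: i64,
    weighted_value: f64, // weight * state_mult * decay, precomputed upstream
}

// Sort by mr_id before summing so the f64 accumulation order is fixed,
// regardless of HashMap iteration order upstream.
fn deterministic_sum(mut contributions: Vec<Contribution>) -> f64 {
    contributions.sort_by_key(|c| c.mr_id);
    contributions.iter().map(|c| c.weighted_value).sum()
}
```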
### 5. Default --since Change
@@ -364,10 +395,11 @@ At 2 years, author decay = 6%, reviewer decay = 0.4%, note decay = 0.006% — ne
### 5a. Reproducible Scoring via `--as-of`

Add `--as-of <RFC3339|YYYY-MM-DD>` flag that overrides the `now_ms` reference point used for decay calculations. When set:
- All event selection is bounded by `[since_ms, as_of_ms)` — exclusive upper bound; events at or after `as_of_ms` are excluded from SQL results entirely (not just decayed). The SQL uses `< ?4` (strict less-than), not `<= ?4`.
- `YYYY-MM-DD` input (without time component) is interpreted as end-of-day UTC: `T23:59:59.999Z`. This matches user intuition that `--as-of 2025-06-01` means "as of the end of June 1st" rather than "as of midnight at the start of June 1st" which would exclude the entire day's activity.
- All decay computations use `as_of_ms` instead of `SystemTime::now()`
- The `--since` window is calculated relative to `as_of_ms` (not wall clock)
- Robot JSON `resolved_input` includes `as_of_ms`, `as_of_iso`, `window_start_iso`, `window_end_iso`, and `window_end_exclusive: true` fields — making the exact query window unambiguous in output

**Rationale**: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. `--as-of` pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since `elapsed.max(0.0)` clamps negative elapsed time to zero), breaking the reproducibility guarantee.

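A stdlib-only sketch of the `YYYY-MM-DD` to end-of-day-UTC conversion, using the standard civil-date-to-epoch-days algorithm (illustrative; the real implementation would likely lean on a date library):

```rust
const MS_PER_DAY: i64 = 86_400_000;

// Days since 1970-01-01 for a civil date (Howard Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400;                          // [0, 399]
    let mp = (m + 9) % 12;                            // Mar=0 ... Feb=11
    let doy = (153 * mp + 2) / 5 + d - 1;             // [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // [0, 146096]
    era * 146_097 + doe - 719_468
}

// "YYYY-MM-DD" -> epoch ms at 23:59:59.999 UTC (end of day, so the
// exclusive `< ?4` bound still includes that day's activity).
fn as_of_end_of_day_ms(date: &str) -> Option<i64> {
    let mut parts = date.splitn(3, '-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    Some(days_from_civil(y, m, d) * MS_PER_DAY + (MS_PER_DAY - 1))
}
```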
@@ -484,7 +516,13 @@ Add timestamp-aware variants:

**`test_closed_mr_multiplier`**: Two identical MRs (same author, same age, same path). One is `merged`, one is `closed`. The merged MR should contribute `author_weight * decay(...)`, the closed MR should contribute `author_weight * closed_mr_multiplier * decay(...)`. With default multiplier 0.5, the closed MR contributes half.

**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the exclusive upper-bound (`< ?4`) filtering in SQL.

**`test_as_of_exclusive_upper_bound`**: Insert an event with timestamp exactly equal to the `as_of_ms` value. Verify it is excluded from results (strict less-than, not less-than-or-equal). This validates the half-open interval `[since, as_of)` semantics.

**`test_excluded_usernames_filters_bots`**: Insert signals for a user named "renovate-bot" and a user named "jsmith", both with the same activity. With `excluded_usernames: ["renovate-bot"]` in config, only "jsmith" should appear in results. Validates the Rust-side post-query filtering.

**`test_include_bots_flag_disables_filtering`**: Same setup as above, but with `--include-bots` active. Both "renovate-bot" and "jsmith" should appear in results.

**`test_null_timestamp_fallback_to_created_at`**: Insert a merged MR with `merged_at = NULL` (edge case: old data before the column was populated). The state-aware timestamp should fall back to `created_at`. Verify the score reflects `created_at`, not 0 or a panic.

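The fallback this test exercises can be sketched as a Rust helper; in the plan the logic lives in the SQL `{state_aware_ts}` CASE expression, and which timestamp each state prefers is an assumption here:

```rust
// State-aware timestamp: prefer the state's own timestamp, fall back to
// created_at when it is NULL (e.g. old rows from before merged_at existed).
// The closed_at column is assumed for illustration.
fn state_aware_ts(state: &str, merged_at: Option<i64>, closed_at: Option<i64>, created_at: i64) -> i64 {
    match state {
        "merged" => merged_at.unwrap_or(created_at),
        "closed" => closed_at.unwrap_or(created_at),
        _ => created_at,
    }
}
```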
@@ -496,11 +534,13 @@ Add timestamp-aware variants:
|
|||||||
|
|
||||||
**`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.
|
**`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.
|
||||||
|
|
||||||
|
**`test_deterministic_accumulation_order`**: Insert signals for a user with contributions at many different timestamps (10+ MRs with varied ages). Run `query_expert` 100 times in a loop. All 100 runs must produce the exact same `f64` score (bit-identical). Validates that the sorted contribution ordering eliminates HashMap-iteration-order nondeterminism.
|
||||||
|
|
||||||
### 9. Existing Test Compatibility
|
### 9. Existing Test Compatibility
|
||||||
|
|
||||||
All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.
|
All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.
|
||||||
|
|
||||||
The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, and `reviewer_min_note_chars` fields.
|
The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, `reviewer_min_note_chars`, and `excluded_usernames` fields.
|
||||||
|
|
||||||
## Verification
|
## Verification
|
||||||
|
|
||||||
@@ -511,11 +551,17 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
|
|||||||
5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings
|
5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings
|
||||||
6. Manual query plan verification (not automated — SQLite planner varies across versions):
|
6. Manual query plan verification (not automated — SQLite planner varies across versions):
|
||||||
- Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database
|
- Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database
|
||||||
- Confirm that `matched_notes` CTE uses `idx_notes_old_path_author` or the existing new_path index (not a full table scan)
|
- Confirm that `matched_notes_raw` branch 1 uses the existing new_path index and branch 2 uses `idx_notes_old_path_author` (not a full table scan on either branch)
|
||||||
- Confirm that `matched_file_changes` CTE uses `idx_mfc_old_path_project_mr` or `idx_mfc_new_path_project_mr`
|
- Confirm that `matched_file_changes_raw` branch 1 uses `idx_mfc_new_path_project_mr` and branch 2 uses `idx_mfc_old_path_project_mr`
|
||||||
- Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author`
|
- Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author`
|
||||||
- Document the observed plan in a comment near the SQL for future regression reference
|
- Document the observed plan in a comment near the SQL for future regression reference
|
||||||
7. Real-world validation:
|
7. Performance baseline (manual, not CI-gated):
|
||||||
|
- Run `time cargo run --release -- who --path <exact-path>` on the real database for exact, prefix, and suffix modes
|
||||||
|
- Target SLOs: p95 exact path < 200ms, prefix < 300ms, suffix < 500ms on development hardware
|
||||||
|
- Record baseline timings as a comment near the SQL for regression reference
|
||||||
|
- If any mode exceeds 2x the baseline after future changes, investigate before merging
|
||||||
|
- Note: These are soft targets for developer awareness, not automated CI gates. Automated benchmarking with synthetic fixtures (100k/1M/5M notes) is a v2 investment if performance becomes a real concern.
|
||||||
|
8. Real-world validation:
|
||||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
|
- `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
|
||||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable
|
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable
|
||||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors
|
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors
|
||||||
@@ -524,6 +570,7 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
|
|||||||
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs
|
- `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs
|
||||||
- Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
|
- Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
|
||||||
- Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score`
|
- Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score`
|
||||||
|
- If the project has known bot accounts (e.g., renovate-bot), add them to `excluded_usernames` config and verify they no longer appear in results. Run again with `--include-bots` to confirm they reappear.
|
||||||
|
|
||||||
## Accepted from External Review
|
## Accepted from External Review
|
||||||
|
|
||||||
@@ -553,12 +600,20 @@ Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genu
|
|||||||
- **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).
|
- **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).
|
||||||
|
|
||||||
**From feedback-4:**
|
**From feedback-4:**
|
||||||
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `<= ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms]`. Without this, `--as-of` reproducibility was fundamentally broken.
|
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `< ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms)`. Without this, `--as-of` reproducibility was fundamentally broken. (Refined to exclusive upper bound in feedback-5.)
|
||||||
- **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
|
- **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
|
||||||
- **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments.
|
- **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments.
|
||||||
- **UNION ALL optimization fallback for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Added documentation of a `UNION ALL` split + dedup fallback pattern to use if EXPLAIN QUERY PLAN shows degradation during verification. Start with the simpler `OR` approach; switch only if needed.
|
- **UNION ALL optimization for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Originally documented as a fallback; promoted to default strategy in feedback-5 iteration. The UNION ALL + dedup approach ensures each index branch is used independently.
|
||||||
- **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round.
|
- **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round.
|

**From feedback-5 (ChatGPT review):**

- **Exclusive upper bound for `--as-of`**: Changed from `[since_ms, as_of_ms]` (inclusive) to `[since_ms, as_of_ms)` (exclusive). Half-open intervals are the standard convention in temporal systems — they eliminate edge-case ambiguity when events have timestamps exactly at the boundary. Also added `YYYY-MM-DD` → end-of-day UTC parsing and window metadata in robot output.
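The half-open convention can be sketched as a plain predicate (illustrative only; the function name and millisecond parameter names are not from the plan):

```rust
/// Half-open window check over millisecond timestamps: an event at exactly
/// `since_ms` is included, while one at exactly `as_of_ms` is excluded,
/// so adjacent windows never double-count a boundary event.
fn window_contains(since_ms: i64, as_of_ms: i64, event_ms: i64) -> bool {
    since_ms <= event_ms && event_ms < as_of_ms
}
```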

- **UNION ALL as default for dual-path matching**: Promoted from "fallback if planner regresses" to default strategy. SQLite `OR`-across-indexed-columns degradation is common enough that the predictable UNION ALL + dedup approach is the safer starting point. The simpler `OR` variant is retained as a comment for benchmarking.

- **Deterministic contribution ordering**: Within each signal type, sort contributions by `mr_id` before summing. This eliminates HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the overhead of compensated summation (Neumaier/Kahan was rejected as overkill at this scale).
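The sort-before-sum rule is a few lines; this sketch assumes a per-MR contribution map (the function name is illustrative):

```rust
use std::collections::HashMap;

/// Sum per-MR contributions in ascending `mr_id` order so the f64
/// accumulation order is stable across runs, regardless of how the
/// HashMap happens to iterate.
fn sum_contributions(by_mr: &HashMap<i64, f64>) -> f64 {
    let mut entries: Vec<(i64, f64)> = by_mr.iter().map(|(k, v)| (*k, *v)).collect();
    entries.sort_by_key(|(mr_id, _)| *mr_id);
    entries.into_iter().map(|(_, v)| v).sum()
}
```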

- **Minimal bot/service-account filtering**: Added `excluded_usernames` (exact match, case-insensitive) to `ScoringConfig` and `--include-bots` CLI flag. Applied as a Rust-side post-filter (not SQL) to keep queries clean. Scope is deliberately minimal — no regex patterns, no heuristic detection. Users configure the list for their team's specific bots.
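A minimal sketch of that post-filter, assuming ASCII-case comparison is acceptable (the function signature is illustrative, not the plan's API):

```rust
/// Post-query bot filter: case-insensitive exact match against the
/// configured `excluded_usernames` list; passing `include_bots = true`
/// (the `--include-bots` flag) disables filtering entirely.
fn filter_bots<'a>(users: Vec<&'a str>, excluded: &[&str], include_bots: bool) -> Vec<&'a str> {
    if include_bots {
        return users;
    }
    users
        .into_iter()
        .filter(|u| !excluded.iter().any(|e| e.eq_ignore_ascii_case(u)))
        .collect()
}
```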

- **Performance baseline SLOs**: Added manual performance baseline step to verification — record timings for exact/prefix/suffix modes and flag >2x regressions. Kept lightweight (no CI gating, no synthetic benchmarks) to match the project's current maturity.

- **New tests**: `test_as_of_exclusive_upper_bound`, `test_excluded_usernames_filters_bots`, `test_include_bots_flag_disables_filtering`, `test_deterministic_accumulation_order` — cover the newly-accepted features.

## Rejected Ideas (with rationale)

These suggestions were considered during review but explicitly excluded from this iteration:
@@ -573,3 +628,10 @@ These suggestions were considered during review but explicitly excluded from thi

- **Split scoring engine into core module** (feedback-4 #5): Proposed extracting scoring math from `who.rs` into `src/core/scoring/model_v2_decay.rs`. Premature modularization — `who.rs` is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.

- **Bot/service-account filtering** (feedback-4 #7): Real concern but orthogonal to time-decay scoring. This is a general data quality feature that belongs in its own issue — it affects all `who` modes, not just expert scoring. Adding `excluded_username_patterns` config and `--include-bots` flag is scope expansion that should be designed and tested independently.

- **Model compare mode / rank-delta diagnostics** (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The `--explain-score` + `--as-of` combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.

- **Canonical path identity graph** (feedback-5 #1, also feedback-2 #2, feedback-4 #4): Third time proposed, third time rejected. Building a rename graph from `mr_file_changes(old_path, new_path)` with identity resolution requires new schema (`path_identities`, `path_aliases` tables), ingestion pipeline changes, graph traversal at query time, and backfill logic for existing data. The UNION ALL dual-path matching already covers the 80%+ case (direct renames). Multi-hop rename chains (A→B→C) are rare in practice and can be addressed in v2 with real usage data showing the gap matters.

- **Normalized `expertise_events` table** (feedback-5 #2): Proposes shifting from query-time CTE joins to a precomputed `expertise_events` table populated at ingest time. While architecturally appealing for read performance, this doubles the data surface area (raw tables + derived events), requires new ingestion pipelines with incremental upsert logic, backfill tooling for existing databases, and introduces consistency risks when raw data is corrected/re-synced. The CTE approach is correct, maintainable, and performant at our current scale. If query latency becomes a real bottleneck (see performance baseline SLOs), materialized views or derived tables become a v2 optimization.

- **Reviewer engagement model upgrade** (feedback-5 #3): Proposes adding `approved`/`changes_requested` review-state signals and trivial-comment pattern matching (`["lgtm","+1","nit","ship it"]`). Expands the signal type count from 4 to 6 and adds a fragile pattern-matching layer (what about "don't ship it"? "lgtm but..."?). The `reviewer_min_note_chars` threshold is imperfect but pragmatic — it's a single configurable number with no false-positive risk from substring matching. Review-state signals may be worth adding later as a separate enhancement when we have data on how often they diverge from DiffNote participation.

- **Contribution-floor auto cutoff for `--since`** (feedback-5 #5): Proposes `--since auto` computing the earliest relevant timestamp from `min_contribution_floor` (e.g., 0.01 points). Adds a non-obvious config parameter for minimal benefit — the 24m default is already mathematically justified from the decay curves (author: 6%, reviewer: 0.4% at 2 years) and easily overridden with `--since` or `--all-history`. The auto-derivation formula (`ceil(max_half_life * log2(1/floor))`) is opaque to users who just want to understand why a certain time range was selected.

- **Full evidence drill-down in `--explain-score`** (feedback-5 #8): Proposes `--explain-score=summary|full` with per-MR evidence rows. Already rejected in feedback-2 #7. Component totals are sufficient for v1 debugging — they answer "which signal type drives this user's score." Per-MR drill-down requires additional SQL queries and significant output format complexity. Deferred unless component breakdowns prove insufficient.

- **Neumaier compensated summation** (feedback-5 #7 partial): Accepted the sorting aspect for deterministic ordering, but rejected Neumaier/Kahan compensated summation. At the scale of dozens to low hundreds of contributions per user, the rounding error from naive f64 summation is on the order of 1e-14 — several orders of magnitude below any meaningful score difference. Compensated summation adds code complexity and a maintenance burden for no practical benefit at this scale.

- **Automated CI benchmark gate** (feedback-5 #10 partial): Accepted manual performance baselines, but rejected automated CI regression gating with synthetic fixtures (100k/1M/5M notes). Building and maintaining benchmark infrastructure is a significant investment that's premature for a CLI tool with ~3 users. Manual timing checks during development are sufficient until performance becomes a real concern.

157
plans/work-item-status-graphql.feedback-3.md
Normal file

@@ -0,0 +1,157 @@
**Top Revisions I Recommend**

1. **Fix auth semantics + a real inconsistency in your test plan**

Your ACs require graceful handling for `403`, but the test list says the “403” test returns `401`. That hides the exact behavior you care about and can let permission regressions slip through.

```diff
@@ AC-1: GraphQL Client (Unit)
- [ ] HTTP 401 → `LoreError::GitLabAuthFailed`
+ [ ] HTTP 401 → `LoreError::GitLabAuthFailed`
+ [ ] HTTP 403 → `LoreError::GitLabForbidden`

@@ AC-3: Status Fetcher (Integration)
- [ ] GraphQL 403 → returns `Ok(HashMap::new())` with warning log
+ [ ] GraphQL 403 (`GitLabForbidden`) → returns `Ok(HashMap::new())` with warning log

@@ TDD Plan (RED)
- 13. `test_fetch_statuses_403_graceful` — mock returns 401 → `Ok(HashMap::new())`
+ 13. `test_fetch_statuses_403_graceful` — mock returns 403 → `Ok(HashMap::new())`
```

2. **Make enrichment atomic and stale-safe**

Current plan can leave stale status values forever when a widget disappears or status becomes null. Make writes transactional and clear status fields for fetched scope before upserts.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] Enrichment DB writes are transactional per project (all-or-nothing)
+ [ ] Status fields are cleared for fetched issue scope before applying new statuses
+ [ ] If enrichment fails mid-project, prior persisted statuses are unchanged (rollback)

@@ File 6: `src/ingestion/orchestrator.rs`
- fn enrich_issue_statuses(...)
+ fn enrich_issue_statuses_txn(...)
+ // BEGIN TRANSACTION
+ // clear status columns for fetched issue scope
+ // apply updates
+ // COMMIT
```
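The clear-vs-apply scope split inside that transaction is a pure set computation; a sketch under the assumption that fetched IIDs and statuses arrive as a set and a map (`plan_status_writes` is a hypothetical helper, not from the plan):

```rust
use std::collections::{HashMap, HashSet};

/// Split the fetched issue scope into IIDs whose status columns must be
/// cleared (no status widget returned) and IIDs that receive a new value.
/// Both lists are sorted so the SQL apply order is deterministic.
fn plan_status_writes(
    fetched_iids: &HashSet<i64>,
    statuses: &HashMap<i64, String>,
) -> (Vec<i64>, Vec<(i64, String)>) {
    let mut clear: Vec<i64> = fetched_iids
        .iter()
        .copied()
        .filter(|iid| !statuses.contains_key(iid))
        .collect();
    clear.sort_unstable();

    let mut apply: Vec<(i64, String)> = statuses
        .iter()
        .filter(|(iid, _)| fetched_iids.contains(*iid))
        .map(|(iid, name)| (*iid, name.clone()))
        .collect();
    apply.sort_unstable_by_key(|(iid, _)| *iid);

    (clear, apply)
}
```

Statuses for IIDs outside the fetched scope are ignored, which keeps the transaction from touching rows the sync never saw.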

3. **Add transient retry/backoff (429/5xx/network)**

Right now one transient failure loses status enrichment for that sync. Retrying with bounded backoff gives much better reliability at low cost.

```diff
@@ AC-1: GraphQL Client (Unit)
+ [ ] Retries 429/502/503/504/network errors with bounded exponential backoff + jitter (max 3 attempts)
+ [ ] Honors `Retry-After` on 429 before retrying

@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] Cancellation signal is checked before each retry sleep and between paginated calls
```
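The delay schedule can be kept as one pure function; a sketch with assumed base/cap constants (500 ms base, 8 s cap), and jitter omitted to stay deterministic even though the AC above calls for it:

```rust
use std::time::Duration;

/// Bounded backoff sketch: base 500 ms doubled per attempt, capped at 8 s,
/// at most 3 attempts. A server-provided `Retry-After` value (in seconds,
/// from a 429) overrides the computed delay.
fn retry_delay(attempt: u32, retry_after_secs: Option<u64>) -> Option<Duration> {
    const MAX_ATTEMPTS: u32 = 3;
    if attempt >= MAX_ATTEMPTS {
        return None; // retries exhausted; caller surfaces the error
    }
    if let Some(secs) = retry_after_secs {
        return Some(Duration::from_secs(secs)); // honor 429 Retry-After
    }
    Some(Duration::from_millis((500u64 << attempt).min(8_000)))
}
```

Returning `None` rather than sleeping forever keeps the cancellation check above meaningful: the caller re-checks the signal before every `Some` delay.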

4. **Stop full GraphQL scans when nothing changed**

Running full pagination on every sync will dominate runtime on large repos. Trigger enrichment only when issue ingestion reports changes, with a manual override.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Runs when issue ingestion changed at least one issue in the project
+ [ ] New override flag `--refresh-status` forces enrichment even with zero issue deltas
+ [ ] Optional periodic full refresh (e.g. every N syncs) to prevent long-tail drift
```

5. **Do not expose raw token via `client.token()`**

Architecturally cleaner and safer: keep token encapsulated and expose a GraphQL-ready client factory from `GitLabClient`.

```diff
@@ File 13: `src/gitlab/client.rs`
- pub fn token(&self) -> &str
+ pub fn graphql_client(&self) -> crate::gitlab::graphql::GraphqlClient

@@ File 6: `src/ingestion/orchestrator.rs`
- let graphql_client = GraphqlClient::new(&config.gitlab.base_url, client.token());
+ let graphql_client = client.graphql_client();
```
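The factory shape can be sketched with stub types (everything here is illustrative: the field layout, the example host, and the `/api/graphql` path are assumptions, not the project's actual code):

```rust
/// GraphQL client stub: holds the derived endpoint and a copy of the token.
struct GraphqlClient {
    endpoint: String,
    token: String,
}

/// REST client stub: the token stays a private field with no accessor,
/// so callers can only obtain a fully configured GraphQL client.
struct GitLabClient {
    base_url: String,
    token: String,
}

impl GitLabClient {
    fn graphql_client(&self) -> GraphqlClient {
        GraphqlClient {
            endpoint: format!("{}/api/graphql", self.base_url),
            token: self.token.clone(),
        }
    }
}
```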

6. **Add indexes for new status filters**

`--status` on large tables will otherwise full-scan `issues`. Add compound indexes aligned with project-scoped list queries.

```diff
@@ AC-4: Migration 021 (Unit)
+ [ ] Adds index `idx_issues_project_status_name(project_id, status_name)`
+ [ ] Adds index `idx_issues_project_status_category(project_id, status_category)`

@@ File 14: `migrations/021_work_item_status.sql`
 ALTER TABLE issues ADD COLUMN status_name TEXT;
 ALTER TABLE issues ADD COLUMN status_category TEXT;
 ALTER TABLE issues ADD COLUMN status_color TEXT;
 ALTER TABLE issues ADD COLUMN status_icon_name TEXT;
+CREATE INDEX IF NOT EXISTS idx_issues_project_status_name
+  ON issues(project_id, status_name);
+CREATE INDEX IF NOT EXISTS idx_issues_project_status_category
+  ON issues(project_id, status_category);
```

7. **Improve filter UX: add category filter + case-insensitive status**

Case-sensitive exact name matches are brittle with custom lifecycle names. Category filter is stable and useful for automation.

```diff
@@ AC-9: List Issues Filter (E2E)
- [ ] Filter is case-sensitive (matches GitLab's exact status name)
+ [ ] `--status` uses case-insensitive exact match by default (`COLLATE NOCASE`)
+ [ ] New filter `--status-category` supports `triage|to_do|in_progress|done|canceled`
+ [ ] `--status-exact` enables strict case-sensitive behavior when needed
```

8. **Add capability probe/cache to avoid pointless calls**

Free tier / old GitLab versions will never return the status widget. Cache that capability per project (with TTL) to reduce noise and wasted requests.

```diff
@@ GitLab API Constraints
+### Capability Probe
+On first sync per project, detect status-widget support and cache result for 24h.
+If unsupported, skip enrichment silently (debug log) until TTL expiry.

@@ AC-3: Status Fetcher (Integration)
+ [ ] Unsupported capability state bypasses GraphQL fetch and warning spam
```

9. **Use a nested robot `status` object instead of 4 top-level fields**

This is cleaner schema design and scales better as status metadata grows (IDs, lifecycle, timestamps, etc.).

```diff
@@ AC-7: Show Issue Display (Robot)
- [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields
- [ ] Fields are `null` (not absent) when status not available
+ [ ] JSON includes `status` object:
+     `{ "name": "...", "category": "...", "color": "...", "icon_name": "..." }` or `null`

@@ AC-8: List Issues Display (Robot)
- [ ] JSON includes `status_name`, `status_category` fields on each issue
+ [ ] JSON includes `status` object (or `null`) on each issue
```
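The object-or-`null` contract can be shown with a hand-rolled renderer (real code would derive this with serde and an `Option<Status>` field; only two fields are shown and no string escaping is done in this sketch):

```rust
/// Minimal status payload for the sketch.
struct Status {
    name: String,
    category: String,
}

/// Render the nested robot-mode `status` field: an object when present,
/// literal JSON `null` when absent, never an omitted key.
fn status_json(status: Option<&Status>) -> String {
    match status {
        Some(s) => format!("{{\"name\":\"{}\",\"category\":\"{}\"}}", s.name, s.category),
        None => "null".to_string(),
    }
}
```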

10. **Add one compelling feature: status analytics, not just status display**

Right now this is mostly a transport/display enhancement. Make it genuinely useful with “stale in-progress” detection and age-in-status filters.

```diff
@@ Acceptance Criteria
+### AC-11: Status Aging & Triage Value (E2E)
+- [ ] `lore list issues --status-category in_progress --stale-days 14` filters to stale work
+- [ ] Human table shows `Status Age` (days) when status exists
+- [ ] Robot output includes `status_age_days` (nullable integer)
```

11. **Harden test plan around failure modes you’ll actually hit**

The current tests are good, but miss rollback/staleness/retry behavior that drives real reliability.

```diff
@@ TDD Plan (RED) additions
+21. `test_enrich_clears_removed_status`
+22. `test_enrich_transaction_rolls_back_on_failure`
+23. `test_graphql_retry_429_then_success`
+24. `test_graphql_retry_503_then_success`
+25. `test_cancel_during_backoff_aborts_cleanly`
+26. `test_status_filter_query_uses_project_status_index` (EXPLAIN smoke test)
```

If you want, I can produce a fully revised v3 plan document end-to-end (frontmatter + reordered ACs + updated file list + updated TDD matrix) so it is ready to implement directly.

159
plans/work-item-status-graphql.feedback-4.md
Normal file

@@ -0,0 +1,159 @@
Your plan is already strong, but I’d revise it in 10 places to reduce risk at scale and make it materially more useful.

1. Shared transport + retries for GraphQL (must-have)

Reasoning: `REST` already has throttling/retry in `src/gitlab/client.rs`; your proposed GraphQL client would bypass that and can spike rate limits under concurrent project ingest (`src/cli/commands/ingest.rs`). Unifying transport prevents split behavior and cuts production incidents.

```diff
@@ AC-1: GraphQL Client (Unit)
- [ ] Network error → `LoreError::Other`
+ [ ] GraphQL requests use shared GitLab transport (same timeout, rate limiter, retry policy as REST)
+ [ ] Retries 429/502/503/504/network errors (max 3) with exponential backoff + jitter
+ [ ] 429 honors `Retry-After` before retrying
+ [ ] Exhausted network retries → `LoreError::GitLabNetworkError`

@@ Decisions
- 8. **No retry/backoff in v1** — DEFER.
+ 8. **Retry/backoff in v1** — YES (shared REST+GraphQL reliability policy).

@@ Implementation Detail
+ File 15: `src/gitlab/transport.rs` (NEW) — shared HTTP execution and retry/backoff policy.
```

2. Capability cache for unsupported projects (must-have)

Reasoning: Free tier / older GitLab will repeatedly emit warning noise every sync and waste calls. Cache support status per project and re-probe on TTL.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ [ ] Unsupported capability responses (missing endpoint/type/widget) are cached per project
+ [ ] While cached unsupported, enrichment is skipped without repeated warning spam
+ [ ] Capability cache auto-expires (default 24h) and is re-probed

@@ Migration Numbering
- This feature uses **migration 021**.
+ This feature uses **migrations 021-022**.

@@ Files Changed (Summary)
+ `migrations/022_project_capabilities.sql` | NEW — support cache table for project capabilities
```

3. Delta-first enrichment with periodic full reconcile (must-have)

Reasoning: Full GraphQL scan every sync is expensive for large projects. You already compute issue deltas in ingestion; use that as the fast path and keep a periodic full sweep as a safety net.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Fast path: skip status enrichment when issue ingestion upserted 0 issues for that project
+ [ ] Safety net: run full reconciliation every `status_full_reconcile_hours` (default 24)
+ [ ] `--full` always forces reconciliation

@@ AC-5: Config Toggle (Unit)
+ [ ] `SyncConfig` has `status_full_reconcile_hours: u32` (default 24)
```

4. Strongly typed widget parsing via `__typename` (must-have)

Reasoning: the current “deserialize arbitrary widget JSON into `StatusWidget`” approach is fragile. Query/type by `__typename` for forward compatibility and fewer silent parse mistakes.

```diff
@@ AC-3: Status Fetcher (Integration)
- [ ] Extracts status from `widgets` array by matching `WorkItemWidgetStatus` fragment
+ [ ] Query includes `widgets { __typename ... }` and parser matches `__typename == "WorkItemWidgetStatus"`
+ [ ] Non-status widgets are ignored deterministically (no heuristic JSON-deserialize attempts)

@@ GraphQL Query
+ widgets {
+   __typename
+   ... on WorkItemWidgetStatus { ... }
+ }
```
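The deterministic selection step reduces to matching one tag; a dependency-free sketch where `(typename, payload)` string pairs stand in for parsed GraphQL JSON (the function name is illustrative):

```rust
/// Deterministic widget selection: only the entry whose `__typename` equals
/// `WorkItemWidgetStatus` is treated as the status widget; all other widget
/// types are skipped without any speculative deserialization attempt.
fn find_status_widget<'a>(widgets: &[(&'a str, &'a str)]) -> Option<&'a str> {
    widgets
        .iter()
        .find(|(typename, _)| *typename == "WorkItemWidgetStatus")
        .map(|(_, payload)| *payload)
}
```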

5. Set-based transactional DB apply (must-have)

Reasoning: row-by-row clear/update loops will be slow on large projects and hold write locks longer. Temp-table + set-based SQL inside one txn is faster and easier to reason about rollback.

```diff
@@ AC-3: Status Fetcher (Integration)
- `all_fetched_iids: Vec<i64>`
+ `all_fetched_iids: HashSet<i64>`

@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] Before applying updates, NULL out status fields ... (loop per IID)
- [ ] UPDATE SQL: `SET status_name=?, ... WHERE project_id=? AND iid=?`
+ [ ] Use temp tables and set-based SQL in one transaction:
+ [ ] (1) clear stale statuses for fetched IIDs absent from status rows
+ [ ] (2) apply status values for fetched IIDs with status
+ [ ] One commit per project; rollback leaves prior state intact
```

6. Fix index strategy for `COLLATE NOCASE` + default sorting (must-have)

Reasoning: your proposed `(project_id, status_name)` index may not fully help `COLLATE NOCASE` + `ORDER BY updated_at`. Tune the index to the real query shape in `src/cli/commands/list.rs`.

```diff
@@ AC-4: Migration 021 (Unit)
- [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)` for `--status` filter performance
+ [ ] Adds covering NOCASE-aware index:
+ [ ] `idx_issues_project_status_name_nocase_updated(project_id, status_name COLLATE NOCASE, updated_at DESC)`
+ [ ] Adds category index:
+ [ ] `idx_issues_project_status_category_nocase(project_id, status_category COLLATE NOCASE)`
```

7. Add stable/automation-friendly filters now (high-value feature)

Reasoning: status names are user-customizable and renameable; category is more stable. Also add `--no-status` for quality checks and migration visibility.

```diff
@@ AC-9: List Issues Filter (E2E)
+ [ ] `lore list issues --status-category in_progress` filters by category (case-insensitive)
+ [ ] `lore list issues --no-status` returns only issues where `status_name IS NULL`
+ [ ] `--status` + `--status-category` combine with AND logic

@@ File 9: `src/cli/mod.rs`
+ Add flags: `--status-category`, `--no-status`

@@ File 11: `src/cli/autocorrect.rs`
+ Register `--status-category` and `--no-status` for `issues`
```

8. Better enrichment observability and failure accounting (must-have ops)

Reasoning: only tracking `statuses_enriched` hides skipped/cleared/errors, and auth failures become silent partial data-quality issues. Add counters and explicit progress events.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
- [ ] `IngestProjectResult` gains `statuses_enriched: usize` counter
- [ ] Progress event: `ProgressEvent::StatusEnrichmentComplete { enriched: usize }`
+ [ ] `IngestProjectResult` gains:
+ [ ] `statuses_enriched`, `statuses_cleared`, `status_enrichment_skipped`, `status_enrichment_failed`
+ [ ] Progress events:
+ [ ] `StatusEnrichmentStarted`, `StatusEnrichmentSkipped`, `StatusEnrichmentComplete`, `StatusEnrichmentFailed`
+ [ ] End-of-sync summary includes per-project enrichment outcome counts
```

9. Add `status_changed_at` for immediately useful workflow analytics (high-value feature)

Reasoning: without change timestamp, you can’t answer “how long has this been in progress?” which is one of the most useful agent/human queries.

```diff
@@ AC-4: Migration 021 (Unit)
+ [ ] Adds nullable INTEGER column `status_changed_at` (ms epoch UTC)

@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] If status_name/category changes, update `status_changed_at = now_ms()`
+ [ ] If status is cleared, set `status_changed_at = NULL`

@@ AC-9: List Issues Filter (E2E)
+ [ ] `lore list issues --stale-status-days N` filters by `status_changed_at <= now - N days`
```
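The update rule can be captured as one pure function; a sketch where statuses are `(name, category)` pairs and the function name is illustrative:

```rust
/// Decide the next `status_changed_at` value: the timestamp moves only when
/// the (name, category) pair actually changes, is kept when the status is
/// unchanged, and is cleared together with the status itself.
fn next_status_changed_at(
    old: Option<(&str, &str)>,
    new: Option<(&str, &str)>,
    prev_changed_at: Option<i64>,
    now_ms: i64,
) -> Option<i64> {
    match (old, new) {
        (_, None) => None,                   // status cleared
        (o, n) if o == n => prev_changed_at, // unchanged: keep old timestamp
        _ => Some(now_ms),                   // first status, or a real change
    }
}
```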

10. Expand test matrix for real-world failure/perf paths (must-have)

Reasoning: current tests are good, but the highest-risk failures are retry behavior, capability caching, idempotency under repeated runs, and large-project performance.

```diff
@@ TDD Plan — RED Phase
+ 26. `test_graphql_retries_429_with_retry_after_then_succeeds`
+ 27. `test_graphql_retries_503_then_fails_after_max_attempts`
+ 28. `test_capability_cache_skips_unsupported_project_until_ttl_expiry`
+ 29. `test_delta_skip_when_no_issue_upserts`
+ 30. `test_periodic_full_reconcile_runs_after_threshold`
+ 31. `test_set_based_enrichment_scales_10k_issues_without_timeout`
+ 32. `test_enrichment_idempotent_across_two_runs`
+ 33. `test_status_changed_at_updates_only_on_actual_status_change`
```

If you want, I can now produce a single consolidated revised plan document (full rewritten Markdown) with these changes merged in-place so it’s ready to execute.

124
plans/work-item-status-graphql.feedback-5.md
Normal file

@@ -0,0 +1,124 @@
Your plan is already strong and implementation-aware. The best upgrades are mostly about reliability under real-world API instability, large-scale performance, and making the feature more useful for automation.

1. Promote retry/backoff from deferred to in-scope now.

Reason: Right now, transient failures cause silent status gaps until a later sync. Bounded retries with jitter and a time budget dramatically improve successful enrichment without making syncs hang.

```diff
@@ AC-1: GraphQL Client (Unit) @@
- [ ] Network error → `LoreError::Other`
+ [ ] Transient failures (`429`, `502`, `503`, `504`, timeout, connect reset) retry with exponential backoff + jitter (max 3 attempts)
+ [ ] `Retry-After` supports both delta-seconds and HTTP-date formats
+ [ ] Per-request retry budget capped (e.g. 120s total) to preserve cancellation responsiveness

@@ AC-6: Enrichment in Orchestrator (Integration) @@
- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ [ ] On transient GraphQL errors: retry policy applied before warning/skip behavior

@@ Decisions @@
- 8. **No retry/backoff in v1** — DEFER.
+ 8. **Retry/backoff in v1** — YES. Required for reliable enrichment under normal GitLab/API turbulence.
```

2. Add a capability cache so unsupported projects stop paying repeated GraphQL cost.

Reason: Free tier / older instances will never return status widgets. Re-querying every sync is wasted time and noisy logs.

```diff
@@ Acceptance Criteria @@
+ ### AC-11: Capability Probe & Cache (Integration)
+ - [ ] Add `project_capabilities` cache with `supports_work_item_status`, `checked_at`, `cooldown_until`
+ - [ ] 404/403/known-unsupported responses update capability cache and suppress repeated warnings until TTL expires
+ - [ ] Supported projects still enrich every run (subject to normal schedule)

@@ Future Enhancements (Not in Scope) @@
- **Capability probe/cache**: Detect status-widget support per project ... (deferred)
+ (moved into scope as AC-11)
```
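The cache semantics fit in a small struct; an in-memory sketch with an injected clock for testability (the suggested design persists this in a `project_capabilities` table instead, and the type and method names here are illustrative):

```rust
use std::collections::HashMap;

/// Per-project capability cache: the last probe result plus its expiry.
struct CapabilityCache {
    ttl_ms: i64,
    entries: HashMap<i64, (bool, i64)>, // project_id -> (supported, expires_at_ms)
}

impl CapabilityCache {
    fn new(ttl_ms: i64) -> Self {
        CapabilityCache { ttl_ms, entries: HashMap::new() }
    }

    fn record(&mut self, project_id: i64, supported: bool, now_ms: i64) {
        self.entries.insert(project_id, (supported, now_ms + self.ttl_ms));
    }

    /// `None` means unknown or expired: probe the project again.
    fn supported(&self, project_id: i64, now_ms: i64) -> Option<bool> {
        self.entries
            .get(&project_id)
            .filter(|(_, expires_at)| now_ms < *expires_at)
            .map(|(supported, _)| *supported)
    }
}
```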

3. Make enrichment delta-aware with periodic forced reconciliation.

Reason: Full pagination every sync is expensive on large projects. You can skip unnecessary status fetches when no issue changes occurred, while still doing periodic safety sweeps.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration) @@
- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Runs when issue ingestion reports project issue deltas OR reconcile window elapsed
+ [ ] New config: `status_reconcile_hours` (default: 24) for periodic full sweep
+ [ ] `--refresh-status` forces enrichment regardless of delta/reconcile window
```

4. Replace row-by-row update loops with set-based SQL via temp table.

Reason: Current per-IID loops are simple but slow at scale and hold locks longer. Set-based updates are much faster and reduce lock contention.

```diff
@@ File 6: `src/ingestion/orchestrator.rs` (MODIFY) @@
- for iid in all_fetched_iids { ... UPDATE issues ... }
- for (iid, status) in statuses { ... UPDATE issues ... }
+ CREATE TEMP TABLE temp_issue_status_updates(...)
+ bulk INSERT temp rows (iid, name, category, color, icon_name)
+ single set-based UPDATE for enriched rows
+ single set-based NULL-clear for fetched-without-status rows
+ commit transaction
```

5. Add strict mode and explicit partial-failure reporting.

Reason: “Warn and continue” is a good default UX, but automation needs a fail-fast option and machine-readable failure output.

```diff
@@ AC-5: Config Toggle (Unit) @@
+ - [ ] `SyncConfig` adds `status_enrichment_strict: bool` (default false)

@@ AC-6: Enrichment in Orchestrator (Integration) @@
- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ [ ] Default mode: warn + continue
+ [ ] Strict mode: status enrichment error fails sync for that run

@@ AC-6: IngestProjectResult @@
+ - [ ] Adds `status_enrichment_error: Option<String>`

@@ AC-8 / Robot sync envelope @@
+ - [ ] Robot output includes `partial_failures` array with per-project enrichment failures
```

6. Fix case-insensitive matching robustness and track freshness.

Reason: SQLite `COLLATE NOCASE` is ASCII-centric; custom statuses may be non-ASCII. Also you need visibility into staleness.

```diff
@@ AC-4: Migration 021 (Unit) @@
- [ ] Migration adds 4 nullable TEXT columns to `issues`
+ [ ] Migration adds 6 columns:
+ `status_name`, `status_category`, `status_color`, `status_icon_name`,
+ `status_name_fold`, `status_synced_at`
- [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)`
+ [ ] Adds compound index `idx_issues_project_status_name_fold(project_id, status_name_fold)`

@@ AC-9: List Issues Filter (E2E) @@
- [ ] Filter uses case-insensitive matching (`COLLATE NOCASE`)
+ [ ] Filter uses `status_name_fold` (Unicode-safe fold normalization done at write time)
```
|
||||||
|
|
||||||
|
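The ASCII limitation is easy to demonstrate with stdlib SQLite. Column names follow the feedback; using Python's `str.casefold()` as the write-time fold rule is an assumption (the Rust implementation would pick its own Unicode fold).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (iid INTEGER, status_name TEXT, status_name_fold TEXT)")

def insert_issue(iid, status_name):
    # Fold once at write time; reads then use plain indexed equality.
    conn.execute("INSERT INTO issues VALUES (?, ?, ?)",
                 (iid, status_name, status_name.casefold()))

insert_issue(1, "İşlemde")  # non-ASCII status name with dotted capital İ
insert_issue(2, "Done")

# COLLATE NOCASE folds only ASCII A-Z, so the dotted-İ row is missed:
nocase = conn.execute(
    "SELECT iid FROM issues WHERE status_name = 'işlemde' COLLATE NOCASE").fetchall()
# The write-time fold column matches, because casefold() is full Unicode:
folded = conn.execute(
    "SELECT iid FROM issues WHERE status_name_fold = ?",
    ("İşlemde".casefold(),)).fetchall()
print(nocase, folded)  # → [] [(1,)]
```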
7. Expand filtering to category and missing-status workflows.

Reason: Name filters are useful, but automation works better on semantic categories and “missing data” detection.

```diff
@@ AC-9: List Issues Filter (E2E) @@
+ - [ ] `--status-category in_progress` filters by `status_category` (case-insensitive)
+ - [ ] `--no-status` returns only issues where `status_name IS NULL`
+ - [ ] `--status` and `--status-category` can be combined with AND logic
```

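The AND composition of these flags can be sketched as a small WHERE-clause builder. Flag and column names follow the feedback; the builder itself is illustrative, not the project's CLI code.

```python
import sqlite3

def build_issue_filter(status=None, status_category=None, no_status=False):
    clauses, params = [], []
    if status is not None:
        clauses.append("status_name = ? COLLATE NOCASE")
        params.append(status)
    if status_category is not None:
        clauses.append("status_category = ? COLLATE NOCASE")
        params.append(status_category)
    if no_status:
        clauses.append("status_name IS NULL")  # missing-data detection
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT iid FROM issues WHERE {where} ORDER BY iid", params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (iid INTEGER, status_name TEXT, status_category TEXT)")
conn.executemany("INSERT INTO issues VALUES (?, ?, ?)",
                 [(1, "In progress", "in_progress"), (2, "To do", "to_do"), (3, None, None)])

sql, params = build_issue_filter(status_category="IN_PROGRESS")
print(conn.execute(sql, params).fetchall())  # → [(1,)]
sql, params = build_issue_filter(no_status=True)
print(conn.execute(sql, params).fetchall())  # → [(3,)]
```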
8. Change the robot payload from flat status fields to a nested `status` object.

Reason: Better schema evolution and less top-level field sprawl as you add metadata (`synced_at`, future lifecycle fields).

```diff
@@ AC-7: Show Issue Display (E2E) @@
- [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields
- [ ] Fields are `null` (not absent) when status not available
+ [ ] JSON includes `status` object:
+   `{ "name", "category", "color", "icon_name", "synced_at" }`
+ [ ] `status: null` when not available

@@ AC-8: List Issues Display (E2E) @@
- [ ] `--fields` supports: `status_name`, `status_category`, `status_color`, `status_icon_name`
+ [ ] `--fields` supports: `status.name,status.category,status.color,status.icon_name,status.synced_at`
```

If you want, I can produce a fully rewritten “Iteration 5” plan document with these changes integrated end-to-end (ACs, files, migrations, TDD batches, and updated decisions/future-scope).
130
plans/work-item-status-graphql.feedback-6.md
Normal file
@@ -0,0 +1,130 @@
Your iteration-5 plan is strong. The biggest remaining gaps are outcome ambiguity, cancellation safety, and long-term status identity. These are the revisions I’d make.
1. **Make enrichment outcomes explicit (not “empty success”)**

Analysis:
Right now `404/403 -> Ok(empty)` is operationally ambiguous: “project has no statuses” vs “feature unavailable/auth issue.” Agents and dashboards need that distinction to make correct decisions.
This improves reliability and observability without making sync fail hard.

```diff
@@ AC-3: Status Fetcher (Integration)
-- [ ] `fetch_issue_statuses()` returns `FetchStatusResult` containing:
+- [ ] `fetch_issue_statuses()` returns `FetchStatusOutcome`:
+  - `Fetched(FetchStatusResult)`
+  - `Unsupported { reason: UnsupportedReason }`
+  - `CancelledPartial(FetchStatusResult)`
@@
-- [ ] GraphQL 404 → returns `Ok(FetchStatusResult)` with empty collections + warning log
-- [ ] GraphQL 403 (`GitLabAuthFailed`) → returns `Ok(FetchStatusResult)` with empty collections + warning log
+- [ ] GraphQL 404 → `Unsupported { reason: GraphqlEndpointMissing }` + warning log
+- [ ] GraphQL 403 (`GitLabAuthFailed`) → `Unsupported { reason: AuthForbidden }` + warning log

@@ AC-10: Robot Sync Envelope (E2E)
-- [ ] `status_enrichment` object: `{ "enriched": N, "cleared": N, "error": null | "message" }`
+- [ ] `status_enrichment` object: `{ "mode": "fetched|unsupported|cancelled_partial", "reason": null|"...", "enriched": N, "cleared": N, "error": null|"message" }`
```

2. **Add cancellation and pagination loop safety**

Analysis:
Large projects can run long. The current flow checks cancellation only before enrichment starts; the pagination and per-row update loops can ignore cancellation for too long. GraphQL cursor bugs can also create infinite loops (`hasNextPage=true` with an unchanged cursor).
This is a robustness must-have.

```diff
@@ AC-3: Status Fetcher (Integration)
+ [ ] `fetch_issue_statuses()` accepts cancellation signal and checks it between page requests
+ [ ] Pagination guard: if `hasNextPage=true` but `endCursor` is `None` or unchanged, abort loop with warning and return partial outcome
+ [ ] Emits `pages_fetched` count for diagnostics

@@ File 1: `src/gitlab/graphql.rs`
-- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str) -> Result<FetchStatusResult>
+- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str, signal: &CancellationSignal) -> Result<FetchStatusOutcome>
```

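The guarded page loop can be sketched language-agnostically; this Python version uses a `threading.Event` as a stand-in cancellation signal and a hypothetical `fetch_page` callable returning the connection's `hasNextPage`/`endCursor` fields.

```python
import threading

def fetch_all(fetch_page, cancel: threading.Event):
    """fetch_page(cursor) -> {"nodes": [...], "hasNextPage": bool, "endCursor": str|None}."""
    items, cursor, pages = [], None, 0
    while True:
        if cancel.is_set():            # checked between page requests
            return items, pages, "cancelled_partial"
        page = fetch_page(cursor)
        items.extend(page["nodes"])
        pages += 1
        if not page["hasNextPage"]:
            return items, pages, "fetched"
        if page["endCursor"] is None or page["endCursor"] == cursor:
            # Server bug: this would loop forever; abort with a partial outcome.
            return items, pages, "aborted_bad_cursor"
        cursor = page["endCursor"]

# A buggy server that always reports hasNextPage=true with a stuck cursor:
def stuck(cursor):
    return {"nodes": [1], "hasNextPage": True, "endCursor": "c1"}

result = fetch_all(stuck, threading.Event())
print(result)  # → ([1, 1], 2, 'aborted_bad_cursor')
```

The loop terminates after the second page because the cursor did not advance; without the guard it would spin indefinitely.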
3. **Persist stable `status_id` in addition to name**

Analysis:
`status_name` is display-oriented and mutable (renames, custom lifecycle changes). A stable status identifier is critical for durable automations, analytics, and future migrations.
This is a schema decision that is cheap now and expensive later if skipped.

```diff
@@ AC-2: Status Types (Unit)
-- [ ] `WorkItemStatus` struct has `name`, `category`, `color`, `icon_name`
+- [ ] `WorkItemStatus` struct has `id: String`, `name`, `category`, `color`, `icon_name`

@@ AC-4: Migration 021 (Unit)
-- [ ] Migration adds 5 nullable columns to `issues`: `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at`
+- [ ] Migration adds 6 nullable columns to `issues`: `status_id`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at`
+ [ ] Adds index `idx_issues_project_status_id(project_id, status_id)` for stable machine filters

@@ GraphQL query
- status { name category color iconName }
+ status { id name category color iconName }

@@ AC-7 / AC-8 Robot
+ [ ] JSON includes `status_id` (null when unavailable)
```

4. **Handle GraphQL partial-data responses correctly**

Analysis:
GraphQL can return both `data` and `errors` in the same response. The current plan treats any `errors` as a hard failure, which can discard valid data and reduce reliability.
Use partial-data semantics: keep the data, log/report warnings.

```diff
@@ AC-1: GraphQL Client (Unit)
-- [ ] Error response: if top-level `errors` array is non-empty, returns `LoreError` with first error message
+- [ ] If `errors` non-empty and `data` missing: return `LoreError` with first error message
+- [ ] If `errors` non-empty and `data` present: return `data` + warning metadata (do not fail the whole fetch)

@@ TDD Plan (RED)
+ 33. `test_graphql_partial_data_with_errors_returns_data_and_warning`
```

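The response-envelope shape (`data` plus optional `errors`) is defined by the GraphQL spec; the handler below is a minimal sketch of the proposed partial-data semantics, with an illustrative result dict rather than the plan's Rust types.

```python
def handle_graphql_body(body):
    """Partial-data semantics: fail only when there is no usable data."""
    errors = body.get("errors") or []
    data = body.get("data")
    if errors and data is None:
        # Hard failure: errors and nothing usable.
        raise RuntimeError(errors[0]["message"])
    # Keep the data; surface errors as warning metadata instead of failing.
    return {"data": data,
            "had_errors": bool(errors),
            "first_error_message": errors[0]["message"] if errors else None}

ok = handle_graphql_body(
    {"data": {"project": {"workItems": []}}, "errors": [{"message": "field X deprecated"}]})
print(ok["had_errors"], ok["first_error_message"])  # → True field X deprecated
```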
5. **Extract status enrichment from the orchestrator into a dedicated module**

Analysis:
`orchestrator.rs` already has many phases. Putting status transport, parsing, and transaction policy directly there increases coupling and test friction.
A dedicated module improves architectural clarity and makes future enhancements safer.

```diff
@@ Implementation Detail
+- File 15: `src/ingestion/enrichment/status.rs` (NEW)
+  - `run_status_enrichment(...)`
+  - `enrich_issue_statuses_txn(...)`
+  - outcome mapping + telemetry

@@ File 6: `src/ingestion/orchestrator.rs`
-- Inline Phase 1.5 logic + helper function
+- Delegates to `enrichment::status::run_status_enrichment(...)` and records returned stats
```

6. **Add status/state consistency checks**

Analysis:
GitLab intends status categories and issue state to stay in sync, but ingestion drift or API edge cases can violate this. Detecting mismatches is high-signal for data-integrity issues.
This is compelling for agents because it catches “looks correct but isn’t” problems.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] Enrichment computes `status_state_mismatches` count:
+   - `DONE|CANCELED` with `state=open` or `TO_DO|IN_PROGRESS|TRIAGE` with `state=closed`
+ [ ] Logs warning summary when mismatches > 0

@@ AC-10: Robot Sync Envelope (E2E)
+ [ ] `status_enrichment` includes `state_mismatches: N`
```

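The mismatch rule above is just set membership plus a state check; a sketch, with the category names taken from the diff and purely illustrative rows:

```python
# Terminal categories should pair with closed issues; active ones with open issues.
TERMINAL = {"DONE", "CANCELED"}
ACTIVE = {"TO_DO", "IN_PROGRESS", "TRIAGE"}

def count_mismatches(rows):
    """rows: (issue_state, status_category) pairs."""
    n = 0
    for state, category in rows:
        if category in TERMINAL and state == "open":
            n += 1
        elif category in ACTIVE and state == "closed":
            n += 1
    return n

rows = [("open", "DONE"),          # mismatch: terminal status on open issue
        ("closed", "IN_PROGRESS"), # mismatch: active status on closed issue
        ("open", "TO_DO"),         # consistent
        ("closed", "DONE")]        # consistent
print(count_mismatches(rows))  # → 2
```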
7. **Add an explicit performance-envelope acceptance criterion**

Analysis:
The plan claims large-project handling, but defines no hard validation target. Add a bounded, reproducible performance criterion to prevent regressions.
This is especially important with pagination + per-row writes.

```diff
@@ Acceptance Criteria
+ ### AC-12: Performance Envelope (Integration)
+ - [ ] 10k-issue fixture completes status fetch + apply within defined budget on CI baseline machine
+ - [ ] Memory usage remains O(page_size), not O(total_issues)
+ - [ ] Cancellation during large sync exits within a bounded latency target

@@ TDD Plan (RED)
+ 34. `test_enrichment_large_project_budget`
+ 35. `test_fetch_statuses_memory_bound_by_page`
+ 36. `test_cancellation_latency_during_pagination`
```

If you want, I can next produce a single consolidated “iteration 6” plan draft with these diffs fully merged so it’s ready to execute.
118
plans/work-item-status-graphql.feedback-7.md
Normal file
@@ -0,0 +1,118 @@
**Highest-Impact Revisions (new, not in your rejected list)**
1. **Critical: Preserve GraphQL partial-error metadata end-to-end (don’t just log it)**

Rationale: Right now partial GraphQL errors are warning-only. Agents get no machine-readable signal that status data may be incomplete, which can silently corrupt downstream automation decisions. Exposing partial-error metadata in `FetchStatusResult` and the robot sync output makes reliability observable and actionable.

```diff
@@ AC-1: GraphQL Client (Unit)
- [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and logs warning with first error message
+ [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and warning metadata (`had_errors=true`, `first_error_message`)
+ [ ] `GraphqlClient::query()` returns `GraphqlQueryResult { data, had_errors, first_error_message }`

@@ AC-3: Status Fetcher (Integration)
+ [ ] `FetchStatusResult` includes `partial_error_count: usize` and `first_partial_error: Option<String>`
+ [ ] Partial GraphQL errors increment `partial_error_count` and are surfaced to orchestrator result

@@ AC-10: Robot Sync Envelope (E2E)
- { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ... }
+ { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ..., "partial_errors": N, "first_partial_error": null|"..." }

@@ File 1: src/gitlab/graphql.rs
- pub async fn query(...) -> Result<serde_json::Value>
+ pub async fn query(...) -> Result<GraphqlQueryResult>
+ pub struct GraphqlQueryResult { pub data: serde_json::Value, pub had_errors: bool, pub first_error_message: Option<String> }
```

2. **High: Add adaptive page-size fallback for GraphQL complexity/timeout failures**

Rationale: A fixed `first: 100` is brittle on self-hosted instances with stricter complexity/time limits. An adaptive page size (100→50→25→10) improves the success rate without a generic retry/backoff layer, and avoids failing an entire project due to one tunable server constraint.

```diff
@@ Query Path
-query($projectPath: ID!, $after: String) { ... workItems(types: [ISSUE], first: 100, after: $after) ... }
+query($projectPath: ID!, $after: String, $first: Int!) { ... workItems(types: [ISSUE], first: $first, after: $after) ... }

@@ AC-3: Status Fetcher (Integration)
+ [ ] Starts with `first=100`; on GraphQL complexity/timeout errors, retries same cursor with smaller page size (50, 25, 10)
+ [ ] If smallest page size still fails, returns error as today
+ [ ] Emits warning including page size downgrade event

@@ TDD Plan (RED)
+ 36. `test_fetch_statuses_complexity_error_reduces_page_size`
+ 37. `test_fetch_statuses_timeout_error_reduces_page_size`
```

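The downgrade ladder can be sketched as a loop over page sizes that retries the same cursor. `ComplexityError`, `run_query`, and the picky server are hypothetical stand-ins for the real GraphQL transport and its error classification.

```python
class ComplexityError(Exception):
    """Stand-in for a server-side complexity/timeout rejection."""

PAGE_SIZES = [100, 50, 25, 10]

def fetch_page_adaptive(run_query, cursor):
    last_err = None
    for first in PAGE_SIZES:
        try:
            return run_query(cursor, first), first
        except ComplexityError as e:
            last_err = e  # downgrade page size and retry the SAME cursor
    raise last_err  # smallest page size still failed: surface the error

def picky_server(cursor, first):
    # Rejects anything above first=25, like a stricter self-hosted instance.
    if first > 25:
        raise ComplexityError(f"query too complex at first={first}")
    return {"nodes": list(range(first))}

page, used = fetch_page_adaptive(picky_server, None)
print(used, len(page["nodes"]))  # → 25 25
```

Because the cursor is unchanged across retries, no items are skipped when the page size drops mid-project.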
3. **High: Make project path lookup failure non-fatal for the sync**

Rationale: Enrichment is optional. If the `projects.path_with_namespace` lookup fails for any reason, sync should continue with a structured enrichment error instead of risking full project pipeline failure.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] If project path lookup fails/missing, status enrichment is skipped for that project, warning logged, and sync continues
+ [ ] `status_enrichment_error` captures `"project_path_missing"` (or DB error text)

@@ File 6: src/ingestion/orchestrator.rs
- let project_path: String = conn.query_row(...)?;
+ let project_path = conn.query_row(...).optional()?;
+ if project_path.is_none() {
+     result.status_enrichment_error = Some("project_path_missing".to_string());
+     result.status_enrichment_mode = "fetched".to_string(); // attempted but unavailable locally
+     emit(ProgressEvent::StatusEnrichmentComplete { enriched: 0, cleared: 0 });
+     // continue to discussion sync
+ }
```

4. **Medium: Upgrade `--status` from a single-value to a repeatable multi-value filter**

Rationale: Practical usage often needs “active buckets” (`To do` OR `In progress`). A repeatable `--status` with OR semantics dramatically improves usefulness without adding new conceptual surface area.

```diff
@@ AC-9: List Issues Filter (E2E)
- [ ] `lore list issues --status "In progress"` → only issues where `status_name = 'In progress'`
+ [ ] `lore list issues --status "In progress"` → unchanged single-value behavior
+ [ ] Repeatable flags supported: `--status "In progress" --status "To do"` (OR semantics across status values)
+ [ ] Repeated `--status` remains AND-composed with other filters

@@ File 9: src/cli/mod.rs
- pub status: Option<String>,
+ pub status: Vec<String>, // repeatable flag

@@ File 8: src/cli/commands/list.rs
- if let Some(status) = filters.status { where_clauses.push("i.status_name = ? COLLATE NOCASE"); ... }
+ if !filters.statuses.is_empty() { /* dynamic OR/IN clause with case-insensitive matching */ }
```

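The “dynamic OR/IN clause” can be sketched with stdlib SQLite: build one placeholder per repeated value and fold both sides for a case-insensitive match. SQLite's `lower()` is ASCII-only, which mirrors the `COLLATE NOCASE` caveat raised earlier; names and data here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (iid INTEGER, status_name TEXT)")
conn.executemany("INSERT INTO issues VALUES (?, ?)",
                 [(1, "In progress"), (2, "To do"), (3, "Done")])

def list_issues(statuses):
    """statuses: values collected from repeated --status flags (OR semantics)."""
    if not statuses:
        return conn.execute("SELECT iid FROM issues ORDER BY iid").fetchall()
    placeholders = ",".join("?" * len(statuses))  # one ? per repeated flag
    sql = (f"SELECT iid FROM issues WHERE lower(status_name) IN ({placeholders}) "
           "ORDER BY iid")
    return conn.execute(sql, [s.lower() for s in statuses]).fetchall()

print(list_issues(["in progress", "TO DO"]))  # → [(1,), (2,)]
```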
5. **Medium: Add coverage telemetry (`seen`, `with_status`, `without_status`)**

Rationale: `enriched`/`cleared` alone is not enough to judge enrichment health. Coverage counters make it obvious whether a project truly has no statuses, is unsupported, or has unexpectedly low status population.

```diff
@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] `IngestProjectResult` gains `statuses_seen: usize` and `statuses_without_widget: usize`
+ [ ] Enrichment log includes `seen`, `enriched`, `cleared`, `without_widget`

@@ AC-10: Robot Sync Envelope (E2E)
- status_enrichment: { mode, reason, enriched, cleared, error }
+ status_enrichment: { mode, reason, seen, enriched, cleared, without_widget, error, partial_errors }

@@ File 6: src/ingestion/orchestrator.rs
+ result.statuses_seen = fetch_result.all_fetched_iids.len();
+ result.statuses_without_widget = result.statuses_seen.saturating_sub(result.statuses_enriched);
```

6. **Medium: Centralize color parsing/render decisions (single helper used by show/list)**

Rationale: Color parsing is duplicated in `show.rs` and `list.rs`, which invites drift and inconsistent behavior. One shared helper gives consistent fallback behavior and simpler tests.

```diff
@@ File 7: src/cli/commands/show.rs
- fn style_with_hex(...) { ...hex parse logic... }
+ use crate::cli::commands::color::style_with_hex;

@@ File 8: src/cli/commands/list.rs
- fn colored_cell_hex(...) { ...hex parse logic... }
+ use crate::cli::commands::color::colored_cell_hex;

@@ Files Changed (Summary)
+ `src/cli/commands/color.rs` (NEW) — shared hex parsing + styling helpers
- duplicated hex parsing blocks removed from show/list
```

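A sketch of what the shared helper buys: parse `#RRGGBB` in exactly one place and give every caller the same plain-text fallback on bad input. The function names echo the feedback, but the ANSI truecolor rendering is an assumption about how the helper would style output.

```python
def parse_hex(color):
    """Parse '#RRGGBB' into an (r, g, b) tuple, or None on any malformed input."""
    if not (isinstance(color, str) and len(color) == 7 and color.startswith("#")):
        return None
    try:
        return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))
    except ValueError:
        return None

def style_with_hex(text, color):
    rgb = parse_hex(color)
    if rgb is None:
        return text  # one consistent fallback, shared by show and list
    r, g, b = rgb
    return f"\x1b[38;2;{r};{g};{b}m{text}\x1b[0m"

print(parse_hex("#1f75cb"), style_with_hex("In progress", "oops"))
# → (31, 117, 203) In progress
```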
---

If you want, I can produce a **single consolidated patch-style diff of the plan document itself** (all section edits merged, ready to paste as iteration 7).
1627
plans/work-item-status-graphql.md
Normal file
File diff suppressed because it is too large
2036
plans/work-item-status-graphql.tdd-appendix.md
Normal file
File diff suppressed because it is too large