docs: add lore-service, work-item-status-graphql, and time-decay plans

Three implementation plans with iterative cross-model refinement: lore-service (5 iterations): HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers. work-item-status-graphql (7 iterations + TDD appendix): Detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot mode output. time-decay-expert-scoring (iteration 5 feedback): Updates to the existing time-decay scoring plan incorporating feedback on decay curve parameterization, recency weighting for discussion contributions, and staleness detection thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00
parent 1161edb212
commit 2c9de1a6c3
15 changed files with 9261 additions and 33 deletions
--- a/plans/lore-service.feedback-1.md
+++ b/plans/lore-service.feedback-1.md
@@ -0,0 +1,186 @@
+1. **Isolate scheduled behavior from manual `sync`**
+Reasoning: Your current plan injects backoff into `handle_sync_cmd`, which affects all `lore sync` calls (including manual recovery runs). Scheduled behavior should be isolated so humans aren’t unexpectedly blocked by service backoff.
+
+```diff
+@@ Context
+-`lore sync` runs a 4-stage pipeline (issues, MRs, docs, embeddings) that takes 2-4 minutes.
+`lore sync` remains the manual/operator command.
+`lore service run` (hidden/internal) is the scheduled execution entrypoint.
+
+@@ Commands & User Journeys
+### `lore service run` (hidden/internal)
+**What it does:** Executes one scheduled sync attempt with service-only policy:
+- applies service backoff policy
+- records service run state
+- invokes sync pipeline with configured profile
+- updates retry state on success/failure
+
+**Invocation:** scheduler always runs:
+`lore --robot service run --reason timer`
+
+@@ Backoff Integration into `handle_sync_cmd`
+-Insert **after** config load but **before** the dry_run check:
+Do not add backoff checks to `handle_sync_cmd`.
+Backoff logic lives only in `handle_service_run`.
+```
+
+2. **Use DB as source-of-truth for service state (not a standalone JSON status file)**
+Reasoning: You already have `sync_runs` in SQLite. A separate JSON status file creates split-brain and race/corruption risk. Keep JSON as optional cache/export only.
+
+```diff
+@@ Status File
+-Location: `{get_data_dir()}/sync-status.json`
+Primary state location: SQLite (`service_state` table) + existing `sync_runs`.
+Optional mirror file: `{get_data_dir()}/sync-status.json` (best-effort export only).
+
+@@ File-by-File Implementation Details
+-### `src/core/sync_status.rs` (NEW)
+### `migrations/015_service_state.sql` (NEW)
+CREATE TABLE service_state (
+  id INTEGER PRIMARY KEY CHECK (id = 1),
+  installed INTEGER NOT NULL DEFAULT 0,
+  platform TEXT,
+  interval_seconds INTEGER,
+  profile TEXT NOT NULL DEFAULT 'balanced',
+  consecutive_failures INTEGER NOT NULL DEFAULT 0,
+  next_retry_at_ms INTEGER,
+  last_error_code TEXT,
+  last_error_message TEXT,
+  updated_at_ms INTEGER NOT NULL
+);
+
+### `src/core/service_state.rs` (NEW)
+- read/write state row
+- derive backoff/next_retry
+- join with latest `sync_runs` for status output
+```
+
+3. **Backoff policy should be configurable, jittered, and error-aware**
+Reasoning: Fixed hardcoded backoff (`base=1800`) is wrong when user sets another interval. Also permanent failures (bad token/config) should not burn retries forever; they should enter paused/error state.
+
+```diff
+@@ Backoff Logic
+-// Exponential: base * 2^failures, capped at 4 hours
+// Exponential with jitter: base * 2^(failures-1), capped, ±20% jitter
+// Applies only to transient errors.
+// Permanent errors set `paused_reason` and stop retries until user action.
+
+@@ CLI Definition Changes
+ServiceCommand::Resume,   // clear paused state / failures
+ServiceCommand::Run,      // hidden
+
+@@ Error Types
+ServicePaused,            // scheduler paused due to permanent error
+ServiceCommandFailed,     // OS command failure with stderr context
+```
+
+4. **Add a pipeline-level single-flight lock**
+Reasoning: Current locking is in ingest stages; there’s still overlap risk across full sync pipelines (docs/embed can overlap with another run). Add a top-level lock for scheduled/manual sync pipeline execution.
+
+```diff
+@@ Architecture
+Add `sync_pipeline` lock at top-level sync execution.
+Keep existing ingest lock (`sync`) for ingest internals.
+
+@@ Backoff Integration into `handle_sync_cmd`
+Before starting sync pipeline, acquire `AppLock` with:
+name = "sync_pipeline"
+stale_lock_minutes = config.sync.stale_lock_minutes
+heartbeat_interval_seconds = config.sync.heartbeat_interval_seconds
+```
+
+5. **Don’t embed token in service files by default**
+Reasoning: Embedding PAT into unit/plist is a high-risk secret leak path. Make secure storage explicit and default-safe.
+
+```diff
+@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--token-source env-file|embedded]`
+Default: `env-file` (0600 perms, user-owned)
+`embedded` allowed only with explicit opt-in and warning
+
+@@ Robot output
+-    "token_embedded": true
+    "token_source": "env_file"
+
+@@ Human output
+-  Note: Your GITLAB_TOKEN is embedded in the service file.
+  Note: Token is stored in a user-private env file (0600).
+```
+
+6. **Introduce a command-runner abstraction with timeout + stderr capture**
+Reasoning: `launchctl/systemctl/schtasks` calls are failure-prone; you need consistent error mapping and deterministic tests.
+
+```diff
+@@ Platform Backends
+-exports free functions that dispatch via `#[cfg(target_os)]`
+exports backend + shared `CommandRunner`:
+- run(cmd, args, timeout)
+- capture stdout/stderr/exit code
+- map failure to `ServiceCommandFailed { cmd, exit_code, stderr }`
+```
+
+7. **Persist install manifest to avoid brittle file parsing**
+Reasoning: Parsing timer/plist for interval/state is fragile and platform-format dependent. Persist a manifest with checksums and expected artifacts.
+
+```diff
+@@ Platform Backends
+-Same pattern for ... `get_interval_seconds()`
+Add manifest: `{data_dir}/service-manifest.json`
+Stores platform, interval, profile, generated files, and command.
+`service status` reads manifest first, then verifies platform state.
+
+@@ Acceptance criteria
+Install is idempotent:
+- if manifest+files already match, report `no_change: true`
+- if drift detected, reconcile and rewrite
+```
+
+8. **Make schedule profile explicit (`fast|balanced|full`)**
+Reasoning: This makes the feature more useful and performance-tunable without requiring users to understand internal flags.
+
+```diff
+@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--profile fast|balanced|full]`
+
+Profiles:
+- fast: `sync --no-docs --no-embed`
+- balanced (default): `sync --no-embed`
+- full: `sync`
+```
+
+9. **Upgrade `service status` to include scheduler health + recent run summary**
+Reasoning: Single last-sync snapshot is too shallow. Include recent attempts and whether scheduler is paused/backing off/running.
+
+```diff
+@@ `lore service status`
+-What it does: Shows whether the service is installed, its configuration, last sync result, and next scheduled run.
+What it does: Shows install state, scheduler state (running/backoff/paused), recent runs, and next run estimate.
+
+@@ Robot output
+-    "last_sync": { ... },
+-    "backoff": null
+    "scheduler_state": "running|backoff|paused|idle",
+    "last_sync": { ... },
+    "recent_runs": [{"run_id":"...","status":"...","started_at_iso":"..."}],
+    "backoff": null,
+    "paused_reason": null
+```
+
+10. **Strengthen tests around determinism and cross-platform generation**
+Reasoning: Time-based backoff and shell quoting are classic flaky points. Add fake clock + fake command runner for deterministic tests.
+
+```diff
+@@ Testing Strategy
+Add deterministic test seams:
+- `Clock` trait for backoff/now calculations
+- `CommandRunner` trait for backend command execution
+
+Add tests:
+- transient vs permanent error classification
+- backoff schedule with jitter bounds
+- manifest drift reconciliation
+- quoting/escaping for paths with spaces and special chars
+- `service run` does not modify manual `sync` behavior
+```
+
+If you want, I can rewrite your full plan as a single clean revised document with these changes already integrated (instead of patch fragments).
--- a/plans/lore-service.feedback-2.md
+++ b/plans/lore-service.feedback-2.md
@@ -0,0 +1,182 @@
+**High-Impact Revisions (ordered by priority)**
+
+1. **Make service identity project-scoped (avoid collisions across repos/users)**
+Analysis: Current fixed names (`com.gitlore.sync`, `LoreSync`, `lore-sync.timer`) will collide when users run multiple gitlore workspaces. This causes silent overwrites and broken uninstall/status behavior.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Commands & User Journeys / install
+- lore service install [--interval 30m] [--profile balanced] [--token-source env-file]
+ lore service install [--interval 30m] [--profile balanced] [--token-source auto] [--name <optional>]
+@@ Install Manifest Schema
+    /// Stable per-install identity (default derived from project root hash)
+    pub service_id: String,
+@@ Platform Backends
+- Label: com.gitlore.sync
+ Label: com.gitlore.sync.{service_id}
+- Task name: LoreSync
+ Task name: LoreSync-{service_id}
+- ~/.config/systemd/user/lore-sync.service
+ ~/.config/systemd/user/lore-sync-{service_id}.service
+```
+
+2. **Replace token model with secure per-OS defaults**
+Analysis: The current “env-file default” is not actually secure on macOS launchd (token still ends up in plist). On Windows, assumptions about inherited environment are fragile. Use OS-native secure stores by default and keep `embedded` as explicit opt-in only.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Token storage strategies
+-| env-file (default) | ...
+| auto (default) | macOS: Keychain, Linux: env-file (0600), Windows: Credential Manager |
+| env-file | Linux/systemd only |
+ | embedded | ... explicit warning ...
+@@ macOS launchd section
+- env-file strategy stores canonical token in service-env but embeds token in plist
+ default strategy is Keychain lookup at runtime; no token persisted in plist
+ env-file is not offered on macOS
+@@ Windows schtasks section
+- token must be in user's system environment
+ default strategy stores token in Windows Credential Manager and injects at runtime
+```
+
+3. **Version and atomically persist manifest/status**
+Analysis: `Option<Self>` on read hides corruption, and non-atomic writes risk truncated JSON on crashes. This will create false “not installed” and scheduler confusion.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Install Manifest Schema
+    pub schema_version: u32, // start at 1
+    pub updated_at_iso: String,
+@@ Status File Schema
+    pub schema_version: u32, // start at 1
+    pub updated_at_iso: String,
+@@ Read/Write
+- read(path) -> Option<Self>
+ read(path) -> Result<Option<Self>, LoreError>
+- write(...) -> std::io::Result<()>
+ write_atomic(...) -> std::io::Result<()> // tmp file + fsync + rename
+```
+
+4. **Persist `next_retry_at_ms` instead of recomputing jitter**
+Analysis: Deterministic jitter from timestamp modulo is predictable and can herd retries. Persisting `next_retry_at_ms` at failure time makes status accurate, stable, and cheap to compute.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ SyncStatusFile
+     pub consecutive_failures: u32,
+    pub next_retry_at_ms: Option<i64>,
+@@ Backoff Logic
+- compute backoff from last_run.timestamp_ms and deterministic jitter each read
+ compute backoff once on failure, store next_retry_at_ms, read-only comparison afterward
+ jitter algorithm: full jitter in [0, cap], injectable RNG for tests
+```
+
+5. **Add circuit breaker for repeated transient failures**
+Analysis: Infinite transient retries can run forever on systemic failures (DB corruption, bad network policy). After N transient failures, pause with actionable reason.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Scheduler states
+- backoff — transient failures, waiting to retry
+ backoff — transient failures, waiting to retry
+ paused — permanent error OR circuit breaker tripped after N transient failures
+@@ Service run flow
+- On transient failure: increment failures, compute backoff
+ On transient failure: increment failures, compute backoff, if failures >= max_transient_failures -> pause
+```
+
+6. **Stage-aware outcome policy (core freshness over all-or-nothing)**
+Analysis: Failing embeddings/docs should not block issues/MRs freshness. Split stage outcomes and only treat core stages as hard-fail by default. This improves reliability and practical usefulness.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Context
+- lore sync runs a 4-stage pipeline ... treated as one run result
+ lore service run records per-stage outcomes (issues, mrs, docs, embeddings)
+@@ Status File Schema
+    pub stage_results: Vec<StageResult>,
+@@ service run flow
+- Execute sync pipeline with flags derived from profile
+ Execute stage-by-stage and classify severity:
+   - critical: issues, mrs
+   - optional: docs, embeddings
+ optional stage failures mark run as degraded, not failed
+```
+
+7. **Replace cfg free-function backend with trait-based backend**
+Analysis: Current backend API is hard to test end-to-end without real OS commands. A `SchedulerBackend` trait enables deterministic integration tests and cleaner architecture.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Platform Backends / Architecture
+- exports free functions dispatched via #[cfg]
+ define trait SchedulerBackend { install, uninstall, state, file_paths, next_run }
+ provide LaunchdBackend, SystemdBackend, SchtasksBackend implementations
+ include FakeBackend for integration tests
+```
+
+8. **Harden platform units and detect scheduler prerequisites**
+Analysis: systemd user timers often fail silently without user manager/linger; launchd context can be wrong in headless sessions. Add explicit diagnostics and unit hardening.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Linux systemd unit
+ [Service]
+ Type=oneshot
+ ExecStart=...
+TimeoutStartSec=900
+NoNewPrivileges=true
+PrivateTmp=true
+ProtectSystem=strict
+ProtectHome=read-only
+@@ Linux install/status
+ detect user manager availability and linger state; surface warning/action
+@@ macOS install/status
+ detect non-GUI bootstrap context and return actionable error
+```
+
+9. **Add operational commands: `trigger`, `doctor`, and non-interactive log tail**
+Analysis: `logs` opening an editor is weak for automation and incident response. Operators need a preflight and immediate controlled run.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ ServiceCommand
+ Trigger, // run one attempt through service policy now
+ Doctor,  // validate scheduler, token, paths, permissions
+@@ logs
+- opens editor
+ supports --tail <n> and --follow in human mode
+ robot mode can return last_n lines optionally
+```
+
+10. **Fix plan inconsistencies and edge-case correctness**
+Analysis: There are internal mismatches that will cause implementation drift.  
+Diff:
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Interval Parsing
+- supports 's' suffix
+ remove 's' suffix (acceptance only allows 5m..24h)
+@@ uninstall acceptance
+- removes ALL service files only
+ explicitly also remove service-manifest and service-env (status/logs retained)
+@@ SyncStatusFile schema
+- pub last_run: SyncRunRecord
+ pub last_run: Option<SyncRunRecord> // matches idle/no runs state
+```
+
+---
+
+**Recommended Architecture Upgrade Summary**
+
+The strongest improvement set is: **(1) project-scoped IDs, (2) secure token defaults, (3) atomic/versioned state, (4) persisted retry schedule + circuit breaker, (5) stage-aware outcomes**. That combination materially improves correctness, multi-repo safety, security, operability, and real-world reliability without changing your core manual-vs-scheduled separation principle.
--- a/plans/lore-service.feedback-3.md
+++ b/plans/lore-service.feedback-3.md
@@ -0,0 +1,174 @@
+Below are the highest-impact revisions I’d make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX.
+
+1. **Fix multi-install ambiguity (`service_id` exists, but commands can’t target one explicitly)**
+Analysis: The plan introduces `service-manifest-{service_id}.json`, but `status/uninstall/resume/logs` have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery.
+```diff
+@@ ## Commands & User Journeys
+### `lore service list`
+Lists installed services discovered from `{data_dir}/service-manifest-*.json`.
+Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`.
+
+@@ ### `lore service uninstall`
+-### `lore service uninstall`
+### `lore service uninstall [--service <service_id|name>] [--all]`
+@@
+-2. CLI reads install manifest to find `service_id`
+2. CLI resolves target service via `--service` or current-project-derived default.
+3. If multiple candidates and no selector, return actionable error.
+
+@@ ### `lore service status`
+-### `lore service status`
+### `lore service status [--service <service_id|name>]`
+```
+
+2. **Make status state service-scoped (not global)**
+Analysis: A single `sync-status.json` for all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep lock global, but state per service.
+```diff
+@@ ## Status File
+-### Location
+-`{get_data_dir()}/sync-status.json`
+### Location
+`{get_data_dir()}/sync-status-{service_id}.json`
+
+@@ ## Paths Module Additions
+-pub fn get_service_status_path() -> PathBuf {
+-    get_data_dir().join("sync-status.json")
+pub fn get_service_status_path(service_id: &str) -> PathBuf {
+    get_data_dir().join(format!("sync-status-{service_id}.json"))
+}
+@@
+-Note: `sync-status.json` is NOT scoped by `service_id`
+Note: status is scoped by `service_id`; lock remains global (`sync_pipeline`) to prevent overlapping writes.
+```
+
+3. **Stop classifying permanence via string matching**
+Analysis: Matching `"401 Unauthorized"` in strings is brittle and will misclassify edge cases. Carry machine codes through stage results and classify by `ErrorCode` only.
+```diff
+@@ pub struct StageResult {
+-    pub error: Option<String>,
+    pub error: Option<String>,
+    pub error_code: Option<String>, // e.g., AUTH_FAILED, NETWORK_ERROR
+}
+@@ Error classification helpers
+-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... }
+fn is_permanent_error_code(code: Option<&str>) -> bool {
+    matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED"))
+}
+```
+
+4. **Install should be transactional (manifest written last)**
+Analysis: Current order writes manifest before scheduler enable. If enable fails, you persist a false “installed” state. Use two-phase install with rollback.
+```diff
+@@ ### `lore service install` User journey
+-9. CLI writes install manifest ...
+-10. CLI runs the platform-specific enable command
+9. CLI runs the platform-specific enable command
+10. On success, CLI writes install manifest atomically
+11. On failure, CLI removes generated files and returns `ServiceCommandFailed`
+```
+
+5. **Fix launchd token security gap (env-file currently still embeds token)**
+Analysis: Current “env-file” on macOS still writes token into plist, defeating the main security goal. Generate a private wrapper script that reads env file at runtime and execs `lore`.
+```diff
+@@ ### macOS: launchd
+-<key>ProgramArguments</key>
+-<array>
+-    <string>{binary_path}</string>
+-    <string>--robot</string>
+-    <string>service</string>
+-    <string>run</string>
+-</array>
+<key>ProgramArguments</key>
+<array>
+    <string>{data_dir}/service-run-{service_id}.sh</string>
+</array>
+@@
+-`env-file`: ... token value must still appear in plist ...
+`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime.
+```
+
+6. **Improve backoff math and add half-open circuit recovery**
+Analysis: Current jitter + min clamp makes first retry deterministic and can over-pause. Also circuit-breaker requires manual resume forever. Add cooldown + half-open probe to self-heal.
+```diff
+@@ Backoff Logic
+-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64;
+-let backoff_secs = backoff_secs.max(base_interval_seconds);
+let max_backoff = base_backoff;
+let min_backoff = base_interval_seconds;
+let span = max_backoff.saturating_sub(min_backoff);
+let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64;
+
+@@ Scheduler states
+-- `paused` — permanent error ... OR circuit breaker tripped ...
+- `paused` — permanent error requiring intervention
+- `half_open` — probe state after circuit cooldown; one trial run allowed
+
+@@ Circuit breaker
+-... transitions to `paused` ... Run: lore service resume
+... transitions to `half_open` after cooldown (default 30m). Successful probe closes breaker automatically; failed probe returns to backoff/paused.
+```
+
+7. **Promote backend trait to v1 (not v2) for deterministic integration tests**
+Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors.
+```diff
+@@ ### Platform Backends
+-> Future architecture note: A `SchedulerBackend` trait ... for v2.
+Adopt `SchedulerBackend` trait in v1 with real backends (`launchd/systemd/schtasks`) and `FakeBackend` for tests.
+This enables deterministic install/uninstall/status/run-path integration tests without touching host scheduler.
+```
+
+8. **Harden `run_cmd` timeout behavior**
+Analysis: If timeout occurs, child process must be killed and reaped. Otherwise you leak processes and can wedge repeated runs.
+```diff
+@@ fn run_cmd(...)
+-// Wait with timeout
+-let output = wait_with_timeout(output, timeout_secs)?;
+// Wait with timeout; on timeout kill child and wait to reap
+let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?;
+```
+
+9. **Add manual control commands (`pause`, `trigger`, `repair`)**
+Analysis: These are high-utility operational controls. `trigger` helps immediate sync without waiting interval. `pause` supports maintenance windows. `repair` avoids manual file deletion for corrupt state.
+```diff
+@@ pub enum ServiceCommand {
+    /// Pause scheduled execution without uninstalling
+    Pause { #[arg(long)] reason: Option<String> },
+    /// Trigger an immediate one-off run using installed profile
+    Trigger { #[arg(long)] ignore_backoff: bool },
+    /// Repair corrupt manifest/status by backing up and reinitializing
+    Repair { #[arg(long)] service: Option<String> },
+}
+```
+
+10. **Make `logs` default non-interactive and add rotation policy**
+Analysis: Opening editor by default is awkward for automation/SSH and slower for normal diagnosis. Defaulting to `tail` is more practical; `--open` can preserve editor behavior.
+```diff
+@@ ### `lore service logs`
+-By default, opens in the user's preferred editor.
+By default, prints last 100 lines to stdout.
+Use `--open` to open editor.
+@@
+Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files.
+```
+
+11. **Remove destructive/shell-unsafe suggested action**
+Analysis: `actions(): ["rm {path}", ...]` is unsafe (shell injection + destructive guidance). Replace with safe command path.
+```diff
+@@ LoreError::actions()
+-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"],
+Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"],
+```
+
+12. **Tighten scheduler units for real-world reliability**
+Analysis: Add explicit working directory and success-exit handling to reduce environment drift and edge failures.
+```diff
+@@ systemd service unit
+ [Service]
+ Type=oneshot
+ ExecStart={binary_path} --robot service run
+WorkingDirectory={data_dir}
+SuccessExitStatus=0
+ TimeoutStartSec=900
+```
+
+If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure.
--- a/plans/lore-service.feedback-4.md
+++ b/plans/lore-service.feedback-4.md
@@ -0,0 +1,190 @@
+No `## Rejected Recommendations` section was present in the plan you shared, so the proposals below are all net-new.
+
+1. **Make scheduled runs explicitly target a single service instance**
+Analysis: right now `service run` has no selector, but the plan supports multiple installed services. That creates ambiguity and incorrect manifest/status selection. This is the most important architectural fix.
+
+```diff
+@@ `lore service install` What it does
+- runs `lore --robot service run` at the specified interval
+ runs `lore --robot service run --service-id <service_id>` at the specified interval
+
+@@ Robot output (`install`)
+- "sync_command": "/usr/local/bin/lore --robot service run",
+ "sync_command": "/usr/local/bin/lore --robot service run --service-id a1b2c3d4",
+
+@@ `ServiceCommand` enum
+- #[command(hide = true)]
+- Run,
+ #[command(hide = true)]
+ Run {
+     /// Internal selector injected by scheduler backend
+     #[arg(long, hide = true)]
+     service_id: String,
+ },
+
+@@ `handle_service_run` signature
+-pub fn handle_service_run(start: std::time::Instant) -> Result<(), Box<dyn std::error::Error>>
+pub fn handle_service_run(service_id: &str, start: std::time::Instant) -> Result<(), Box<dyn std::error::Error>>
+
+@@ run flow step 1
+- Read install manifest
+ Read install manifest for `service_id`
+```
+
+2. **Strengthen `service_id` derivation to avoid cross-workspace collisions**
+Analysis: hashing config path alone can collide when many workspaces share one global config. Identity should represent what is being synced, not only where config lives.
+
+```diff
+@@ Key Design Principles / Project-Scoped Service Identity
+- derive from a stable hash of the config file path
+ derive from a stable fingerprint of:
+ - canonical workspace root
+ - normalized configured GitLab project URLs
+ - canonical config path
+ then take first 12 hex chars of SHA-256
+
+@@ `compute_service_id`
+- Returns first 8 hex chars of SHA-256 of the canonical config path.
+ Returns first 12 hex chars of SHA-256 of a canonical identity tuple
+ (workspace_root + sorted project URLs + config_path).
+```
+
+3. **Introduce a service-state machine with a dedicated admin lock**
+Analysis: install/uninstall/pause/resume/repair/status can race each other. A lock and explicit transition table prevents invalid states and file races.
+
+```diff
+@@ New section: Service State Model
+ All state mutations are serialized by `AppLock("service-admin-{service_id}")`.
+ Legal transitions:
+ - idle -> running -> success|degraded|backoff|paused
+ - backoff -> running|paused
+ - paused -> half_open|running (resume)
+ - half_open -> running|paused
+ Any invalid transition is rejected with `ServiceCorruptState`.
+
+@@ `handle_install`, `handle_uninstall`, `handle_pause`, `handle_resume`, `handle_repair`
+ Acquire `service-admin-{service_id}` before mutating manifest/status/service files.
+```
+
+4. **Unify manual and scheduled sync execution behind one orchestrator**
+Analysis: the plan currently duplicates stage logic and error classification in `service run`, increasing drift risk. A shared orchestrator gives one authoritative pipeline behavior.
+
+```diff
+@@ Key Design Principles
+ #### 6. Single Sync Orchestrator
+ Both `lore sync` and `lore service run` call `SyncOrchestrator`.
+ Service mode adds policy (backoff/circuit-breaker); manual mode bypasses policy.
+
+@@ Service Run Implementation
+- execute_sync_stages(&sync_args)
+ SyncOrchestrator::run(SyncMode::Service { profile, policy })
+
+@@ manual sync
+- separate pipeline path
+ SyncOrchestrator::run(SyncMode::Manual { flags })
+```
+
+5. **Add bounded in-run retries for transient core-stage failures**
+Analysis: single-shot failure handling will over-trigger backoff on temporary network blips. One short retry per core stage significantly improves freshness without much extra runtime.
+
+```diff
+@@ Stage-aware execution
+ Core stages (`issues`, `mrs`) get up to 1 immediate retry on transient errors
+ (jittered 1-5s). Permanent errors are never retried.
+ Optional stages keep best-effort semantics.
+
+@@ Acceptance criteria (`service run`)
+ Retries transient core stage failures once before counting run as failed.
+```
+
+6. **Harden persistence with full crash-safety semantics**
+Analysis: current atomic write description is good but incomplete for power-loss durability. You should fsync parent directory after rename and include lightweight integrity metadata.
+
+```diff
+@@ `write_atomic`
+- tmp file + fsync + rename
+ tmp file + fsync(file) + rename + fsync(parent_dir)
+
+@@ `ServiceManifest` and `SyncStatusFile`
+ pub write_seq: u64,
+ pub content_sha256: String, // optional integrity guard for repair/doctor
+```
+
+7. **Fix token handling to avoid shell/env injection and add secure-store mode**
+Analysis: sourcing env files in shell is brittle if token contains special chars/newlines. Also, secure OS credential stores should be first-class for production reliability/security.
+
+```diff
+@@ Token storage strategies
+-| `env-file` (default) ...
+| `auto` (default) | use secure-store when available, else env-file |
+| `secure-store` | macOS Keychain / libsecret / Windows Credential Manager |
+| `env-file` | explicit fallback |
+
+@@ macOS wrapper script
+-. "{data_dir}/service-env-{service_id}"
+-export {token_env_var}
+TOKEN_VALUE="$(cat "{data_dir}/service-token-{service_id}" )"
+export {token_env_var}="$TOKEN_VALUE"
+
+@@ Acceptance criteria
+ Reject token values containing `\0` or newline for env-file mode.
+ Never eval/source untrusted token content.
+```
+
+8. **Correct platform/runtime implementation hazards**
+Analysis: there are a few correctness risks that should be fixed in-plan now.
+
+```diff
+@@ macOS install steps
+- Get UID via `unsafe { libc::getuid() }`
+ Get UID via safe API (`nix::unistd::Uid::current()` or equivalent safe helper)
+
+@@ Command Runner Helper
+- poll try_wait and read stdout/stderr after exit
+ avoid potential pipe backpressure deadlock:
+ use wait-with-timeout + concurrent stdout/stderr draining
+
+@@ Linux timer
+- OnUnitActiveSec={interval_seconds}s
+ OnUnitInactiveSec={interval_seconds}s
+ AccuracySec=1min
+```
+
+9. **Make logs fully service-scoped**
+Analysis: you already scoped manifest/status by `service_id`; logs are still global in several places. Multi-service installs will overwrite each other’s logs.
+
+```diff
+@@ Paths Module Additions
+-pub fn get_service_log_path() -> PathBuf
+pub fn get_service_log_path(service_id: &str, stream: LogStream) -> PathBuf
+
+@@ log filenames
+- logs/service-stderr.log
+- logs/service-stdout.log
+ logs/service-{service_id}-stderr.log
+ logs/service-{service_id}-stdout.log
+
+@@ `service logs`
+- default path: `{data_dir}/logs/service-stderr.log`
+ default path: `{data_dir}/logs/service-{service_id}-stderr.log`
+```
+
+10. **Resolve internal spec contradictions and rollback gaps**
+Analysis: there are a few contradictory statements and incomplete rollback behavior that will cause implementation churn.
+
+```diff
+@@ `service logs` behavior
+- default (no flags): open in editor (human)
+ default (no flags): print last 100 lines (human and robot metadata mode)
+ `--open` is explicit opt-in
+
+@@ install rollback
+- On failure: removes generated service files
+ On failure: removes generated service files, env file, wrapper script, and temp manifest
+
+@@ `handle_service_run` sample code
+- let manifest_path = get_service_manifest_path();
+ let manifest_path = get_service_manifest_path(service_id);
+```
+
+If you want, I can take these revisions and produce a single consolidated “Iteration 4” replacement plan block with all sections rewritten coherently so it’s ready to hand to an implementer.
--- a/plans/lore-service.feedback-5.md
+++ b/plans/lore-service.feedback-5.md
@@ -0,0 +1,196 @@
+I reviewed the full plan and avoided everything already listed in `## Rejected Recommendations`. These are the highest-impact revisions I’d make.
+
+1. **Fix identity model inconsistency and prevent `--name` alias collisions**
+Why this improves the plan: your text says identity includes workspace root, but the current derivation code does not. Also, using `--name` as the actual `service_id` risks accidental cross-project collisions and destructive updates.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Key Design Principles / 2. Project-Scoped Service Identity
+- Each installed service gets a unique `service_id` derived from a canonical identity tuple: the config file path, sorted GitLab project URLs, and workspace root.
+ Each installed service gets an immutable `identity_hash` derived from a canonical identity tuple:
+ workspace root + canonical config path + sorted normalized project URLs.
+ `service_id` remains the scheduler identifier; `--name` is a human alias only.
+ If `--name` collides with an existing service that has a different `identity_hash`, install fails with an actionable error.
+
+@@ Install Manifest / Schema
+ /// Immutable identity hash for collision-safe matching across reinstalls
+ pub identity_hash: String,
+ /// Optional human-readable alias passed via --name
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub service_alias: Option<String>,
+ /// Canonical workspace root used in identity derivation
+ pub workspace_root: String,
+
+@@ service_id derivation
+-pub fn compute_service_id(config_path: &Path, project_urls: &[&str]) -> String
+pub fn compute_identity_hash(workspace_root: &Path, config_path: &Path, project_urls: &[&str]) -> String
+```
+
+2. **Add lock protocol to eliminate uninstall/run race conditions**
+Why this improves the plan: today `service run` does not take admin lock, and admin commands do not take pipeline lock. `uninstall` can race with an active run and remove files mid-execution.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Key Design Principles / 6. Serialized Admin Mutations
+- The `service run` entrypoint does NOT acquire the admin lock — it only acquires the `sync_pipeline` lock
+ The `service run` entrypoint acquires only `sync_pipeline`.
+ Destructive admin operations (`install` overwrite, `uninstall`, `repair --regenerate`) must:
+ 1) acquire `service-admin-{service_id}`
+ 2) disable scheduler backend entrypoint
+ 3) acquire `sync_pipeline` lock with timeout
+ 4) mutate/remove files
+ This lock ordering is mandatory to prevent deadlocks and run/delete races.
+
+@@ lore service uninstall / User journey
+- 4. Runs platform-specific disable command
+- 5. Removes service files from disk
+ 4. Acquires `sync_pipeline` lock (after disabling scheduler) with bounded wait
+ 5. Removes service files from disk only after lock acquisition
+```
+
+3. **Make transient handling `Retry-After` aware**
+Why this improves the plan: rate-limit and 503 responses often carry retry hints. Ignoring them causes useless retries and longer outages.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Transient vs permanent error classification
+-| Transient | Retry with backoff | Network timeout, rate limited, DB locked, 5xx from GitLab |
+| Transient | Retry with adaptive backoff | Network timeout, DB locked, 5xx from GitLab |
+| Transient (hinted) | Respect server retry hint | Rate limited with Retry-After/X-RateLimit-Reset |
+
+@@ Backoff Logic
+ If an error includes a retry hint (e.g., `Retry-After`), set:
+ `next_retry_at_ms = max(computed_backoff, hinted_retry_at_ms)`.
+ Persist `backoff_reason` for status visibility.
+```
+
+4. **Decouple optional stage cadence from core sync interval**
+Why this improves the plan: running docs/embeddings every 5–30 minutes is expensive and unnecessary. Separate freshness windows reduce cost/latency while keeping core data fresh.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Sync profiles
+-| `balanced` (default) | `--no-embed` | Issues + MRs + doc generation |
+-| `full` | (none) | Full pipeline including embeddings |
+| `balanced` (default) | core every interval, docs every 60m, no embeddings | Fast + useful docs |
+| `full` | core every interval, docs every interval, embeddings every 6h (default) | Full freshness with bounded cost |
+
+@@ Status File / StageResult
+ /// true when stage intentionally skipped due freshness window
+ #[serde(default)]
+ pub skipped: bool,
+
+@@ lore service run / Stage-aware execution
+ Optional stages may be skipped when their last successful run is within configured freshness windows.
+ Skipped optional stages do not count as failures and are recorded explicitly.
+```
+
+5. **Give Windows parity for secure token handling (env-file + wrapper)**
+Why this improves the plan: current Windows path requires global/system env and has poor UX. A wrapper+env-file model gives platform parity and avoids global token exposure.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Token storage strategies
+-| On Windows, neither strategy applies — the token must be in the user's system environment
+| On Windows, `env-file` is supported via a generated wrapper script (`service-run-{service_id}.cmd` or `.ps1`)
+| that reads `{data_dir}/service-env-{service_id}` and launches `lore --robot service run ...`.
+| `embedded` remains opt-in and warned as less secure.
+
+@@ Windows: schtasks
+- Token handling on Windows: The env var must be set system-wide via `setx`
+ Token handling on Windows:
+ - `env-file` (default): wrapper script reads token from private file at runtime
+ - `embedded`: token passed via wrapper-set environment variable
+ - `system_env`: still supported as fallback
+```
+
+6. **Add run heartbeat and stale-run detection**
+Why this improves the plan: `running` state can become misleading after crashes or stale locks. Heartbeat metadata makes status accurate and improves incident triage.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Status File / Schema
+ /// In-flight run metadata for crash/stale detection
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub current_run: Option<CurrentRunState>,
+
+pub struct CurrentRunState {
+    pub run_id: String,
+    pub started_at_ms: i64,
+    pub last_heartbeat_ms: i64,
+    pub pid: u32,
+}
+
+@@ lore service status
+- - `running` — currently executing (sync_pipeline lock held)
+ - `running` — currently executing with live heartbeat
+ - `running_stale` — in-flight metadata exists but heartbeat exceeded stale threshold
+```
+
+7. **Upgrade drift detection from “loaded/unloaded” to spec-level drift**
+Why this improves the plan: platform state alone misses manual edits to unit/plist/wrapper files. Spec-hash drift gives reliable “what changed?” diagnostics and safe regeneration.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ Install Manifest / Schema
+ /// Hash of generated scheduler artifacts and command spec
+ pub spec_hash: String,
+
+@@ lore service status
+- Detects manifest/platform drift and reports it
+ Detects:
+ - platform drift (loaded/unloaded mismatch)
+ - spec drift (artifact content hash mismatch)
+ - command drift (sync command differs from manifest)
+
+@@ lore service repair
+ Add `--regenerate` to rewrite scheduler artifacts from manifest when spec drift is detected.
+ This is non-destructive and does not delete status/log history.
+```
+
+8. **Add safe operational modes: `install --dry-run` and `doctor --fix`**
+Why this improves the plan: dry-run reduces risk before writing OS scheduler files; fix-mode improves operator ergonomics and lowers support burden.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ lore service install
+ Add `--dry-run`:
+ - validates config/token/prereqs
+ - renders service files and planned commands
+ - writes nothing, executes nothing
+
+@@ lore service doctor
+ Add `--fix` for safe, non-destructive remediations:
+ - create missing dirs
+ - correct file permissions on env/wrapper files
+ - run `systemctl --user daemon-reload` when applicable
+ - report applied fixes in robot output
+```
+
+9. **Define explicit schema migration behavior (not just `schema_version` fields)**
+Why this improves the plan: version fields without migration policy become operational risk during upgrades.
+
+```diff
+--- a/plan.md
+++ b/plan.md
+@@ ServiceManifest Read/Write
+- `ServiceManifest::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `ServiceManifest::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
+ - Migrates known older schema versions to current in-memory model
+ - Rewrites migrated file atomically
+ - Fails with actionable `ServiceCorruptState` for unknown future major versions
+
+@@ SyncStatusFile Read/Write
+- `SyncStatusFile::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `SyncStatusFile::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
+```
+
+If you want, I can produce a fully rewritten v5 plan text that integrates all nine changes coherently section-by-section.
--- a/plans/lore-service.md
+++ b/plans/lore-service.md
--- a/plans/time-decay-expert-scoring.feedback-5.md
+++ b/plans/time-decay-expert-scoring.feedback-5.md
@@ -0,0 +1,128 @@
+**Best Revisions To Strengthen The Plan**
+
+1. **[Critical] Replace one-hop rename matching with canonical path identities**
+Analysis and rationale: Current `old_path OR new_path` fixes direct renames, but it still breaks on rename chains (`a.rs -> b.rs -> c.rs`) and split/move patterns. A canonical `path_identity` graph built from `mr_file_changes(old_path,new_path)` gives stable identity over time, which is the right architectural boundary for expertise history.
+```diff
+@@ ## Context
+-- Match both old and new paths in all signal queries AND path resolution probes so expertise survives file renames
+- Build canonical path identities from rename edges and score by identity, not raw path strings, so expertise survives multi-hop renames and moves
+
+@@ ## Files to Modify
+-2. **`src/cli/commands/who.rs`** — Core changes:
+2. **`src/cli/commands/who.rs`** — Core changes:
+    ...
+-   - Match both `new_path` and `old_path` in all signal queries (rename awareness)
+   - Resolve queried paths to `path_identity_id` and match all aliases in that identity set
+4. **`src/core/path_identity.rs`** — New module:
+   - Build/maintain rename graph from `mr_file_changes`
+   - Resolve path -> identity + aliases for probes/scoring
+```
+
+2. **[Critical] Shift scoring input from runtime CTE joins to a normalized `expertise_events` table**
+Analysis and rationale: Your SQL is correct but complex and expensive at query time. Precomputing normalized events at ingestion gives simpler, faster, and more reliable scoring queries; it also enables model versioning/backfills without touching raw MR/note tables each request.
+```diff
+@@ ## Files to Modify
+-3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes
+3. **`src/core/db.rs`** — Add migrations for:
+   - `expertise_events` table (normalized scoring events)
+   - supporting indexes
+4. **`src/core/ingest/expertise_events.rs`** — New:
+   - Incremental upsert of events during sync/ingest
+
+@@ ## SQL Restructure (who.rs)
+-The SQL uses CTE-based dual-path matching and hybrid aggregation...
+Runtime SQL reads precomputed `expertise_events` filtered by path identity + time window.
+Heavy joins/aggregation move to ingest-time normalization.
+```
+
+3. **[High] Upgrade reviewer engagement model beyond char-count threshold**
+Analysis and rationale: `min_note_chars` is a useful guardrail but brittle (easy to game, penalizes concise high-quality comments). Add explicit review-state signals (`approved`, `changes_requested`) and trivial-comment pattern filtering to better capture real reviewer expertise.
+```diff
+@@ ## Scoring Formula
+-| **Reviewer Participated** (left DiffNote on MR/path) | 10 | 90 days |
+| **Reviewer Participated** (substantive DiffNote and/or formal review action) | 10 | 90 days |
+| **Review Decision: changes_requested** | 6 | 120 days |
+| **Review Decision: approved** | 4 | 75 days |
+
+@@ ### 1. ScoringConfig (config.rs)
+     pub reviewer_min_note_chars: u32,
+    pub reviewer_trivial_note_patterns: Vec<String>, // default: ["lgtm","+1","nit","ship it","👍"]
+    pub review_approved_weight: i64,                 // default: 4
+    pub review_changes_requested_weight: i64,        // default: 6
+```
+
+4. **[High] Make temporal semantics explicit and deterministic**
+Analysis and rationale: `--as-of` is good, but day parsing and boundary semantics can still cause subtle reproducibility issues. Define window as `[since_ms, as_of_ms)` and parse `YYYY-MM-DD` as end-of-day UTC (or explicit timezone) so user expectations match outputs.
+```diff
+@@ ### 5a. Reproducible Scoring via `--as-of`
+-- All event selection is bounded by `[since_ms, as_of_ms]`
+- All event selection is bounded by `[since_ms, as_of_ms)` (exclusive upper bound)
+- `YYYY-MM-DD` is interpreted as `23:59:59.999Z` unless `--timezone` is provided
+- Robot output includes `window_start_iso`, `window_end_iso`, `window_end_exclusive: true`
+```
+
+5. **[High] Replace fixed default `--since 24m` with contribution-floor auto cutoff**
+Analysis and rationale: A static window is simple but often over-scans data. Compute a model-derived horizon from a minimum contribution floor (for example `0.01` points) per signal; this keeps results equivalent while reducing query cost.
+```diff
+@@ ### 5. Default --since Change
+-Expert mode: `"6m"` -> `"24m"`
+Expert mode default: `--since auto`
+`auto` computes earliest relevant timestamp from configured weights/half-lives and `min_contribution_floor`
+Add config: `min_contribution_floor` (default: 0.01)
+`--since` still overrides, `--all-history` still bypasses cutoff
+```
+
+6. **[High] Add bot/service-account filtering now (not later)**
+Analysis and rationale: Bot activity can materially distort expertise rankings in real repos. This is low implementation cost with high quality gain and should be in v1 of the scoring revamp, not deferred.
+```diff
+@@ ### 1. ScoringConfig (config.rs)
+    pub excluded_username_patterns: Vec<String>, // default: ["bot","\\[bot\\]","service-account","ci-"]
+@@ ### 2. SQL Restructure (who.rs)
+Apply username exclusion in all signal sources unless `--include-bots` is set
+@@ ### 5b. Score Explainability via `--explain-score`
+Add `filtered_events` counts in robot output metadata
+```
+
+7. **[Medium] Enforce deterministic floating-point accumulation**
+Analysis and rationale: Even with small sets, unordered `HashMap` iteration can cause tiny platform-dependent ranking differences near ties. Sorting contributions and using Neumaier summation removes nondeterminism and stabilizes tests/CI.
+```diff
+@@ ### 4. Rust-Side Aggregation (who.rs)
+-Compute score as `f64`.
+Compute score as `f64` using deterministic contribution ordering:
+1) sort by (username, signal, mr_id, ts)
+2) sum with Neumaier compensation
+Tie-break remains `(raw_score DESC, last_seen DESC, username ASC)`
+```
+
+8. **[Medium] Strengthen explainability with evidence, not just totals**
+Analysis and rationale: Component totals help, but disputes usually need “why this user got this score now.” Add compact top evidence rows per component (`mr_id`, `ts`, `raw_contribution`) behind an optional mode.
+```diff
+@@ ### 5b. Score Explainability via `--explain-score`
+-Component breakdown only (4 floats per user).
+Add `--explain-score=summary|full`:
+`summary`: current 4-component totals
+`full`: adds top N evidence rows per component (default N=3)
+Robot output includes per-evidence `mr_id`, `signal`, `ts`, `contribution`
+```
+
+9. **[Medium] Make query plan strategy explicit: `UNION ALL` default for dual-path scans**
+Analysis and rationale: You currently treat `UNION ALL` as fallback if planner regresses. For SQLite, OR-across-indexed-columns regressions are common enough that defaulting to branch-split queries is often more predictable.
+```diff
+@@ **Index optimization fallback (UNION ALL split)**
+-Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm degradation.
+Use `UNION ALL` + dedup as default for dual-path matching.
+Keep `OR` variant as optional strategy flag for benchmarking/regression checks.
+```
+
+10. **[Medium] Add explicit performance SLO + benchmark gate**
+Analysis and rationale: This plan is query-heavy and ranking-critical; add measurable performance budgets so future edits do not silently degrade UX. Include synthetic fixture benchmarks for exact, prefix, and suffix path modes.
+```diff
+@@ ## Verification
+8. Performance regression gate:
+   - `cargo bench --bench who_expert_scoring`
+   - Dataset tiers: 100k, 1M, 5M notes
+   - SLOs: p95 exact path < 150ms, prefix < 250ms, suffix < 400ms on reference hardware
+   - Fail CI if regression > 20% vs stored baseline
+```
+
+If you want, I can produce a single consolidated “iteration 5” plan document with these changes already merged into your current structure.
--- a/plans/time-decay-expert-scoring.md
+++ b/plans/time-decay-expert-scoring.md
@@ -2,12 +2,12 @@
 plan: true
 title: ""
 status: iterating
-iteration: 4
+iteration: 5
 target_iterations: 8
-beads_revision: 0
+beads_revision: 1
 related_plans: []
 created: 2026-02-08
-updated: 2026-02-08
+updated: 2026-02-09
 ---

 # Time-Decay Expert Scoring Model
@@ -78,6 +78,7 @@ Author/reviewer signals are deduplicated per MR (one signal per distinct MR). No
   - Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals)
   - Add `--as-of` flag for reproducible scoring at a fixed timestamp
   - Add `--explain-score` flag for per-user score component breakdown
+   - Add `--include-bots` flag to disable bot/service-account filtering
   - Sort on raw f64 score, round only for display
   - Update tests
 3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes)
@@ -100,6 +101,7 @@ pub struct ScoringConfig {
    pub note_half_life_days: u32,              // default: 45
    pub closed_mr_multiplier: f64,             // default: 0.5  (applied to closed-without-merge MRs)
    pub reviewer_min_note_chars: u32,          // default: 20   (minimum note body length to count as participation)
+    pub excluded_usernames: Vec<String>,       // default: []    (exact-match usernames to exclude, e.g. ["renovate-bot", "gitlab-ci"])
 }
 ```

@@ -108,6 +110,7 @@ pub struct ScoringConfig {
 - All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores)
 - `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them)
 - `reviewer_min_note_chars` must be >= 0 (0 disables the filter; typical useful values: 10-50)
+- `excluded_usernames` entries must be non-empty strings (no blank entries)
 - Return `LoreError::ConfigInvalid` with a clear message on failure

 ### 2. Decay Function (who.rs)
@@ -128,25 +131,51 @@ The SQL uses **CTE-based dual-path matching** and **hybrid aggregation**. Rather
 MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`.

 ```sql
-WITH matched_notes AS (
-  -- Centralize dual-path matching for DiffNotes
-  SELECT n.id, n.discussion_id, n.author_username, n.created_at,
-         n.position_new_path, n.position_old_path, n.project_id
+WITH matched_notes_raw AS (
+  -- Branch 1: match on new_path (uses idx_notes_new_path or equivalent)
+  SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
  FROM notes n
  WHERE n.note_type = 'DiffNote'
    AND n.is_system = 0
    AND n.author_username IS NOT NULL
    AND n.created_at >= ?2
-    AND n.created_at <= ?4
+    AND n.created_at < ?4
    AND (?3 IS NULL OR n.project_id = ?3)
-    AND (n.position_new_path {path_op} OR n.position_old_path {path_op})
+    AND n.position_new_path {path_op}
+  UNION ALL
+  -- Branch 2: match on old_path (uses idx_notes_old_path_author)
+  SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id
+  FROM notes n
+  WHERE n.note_type = 'DiffNote'
+    AND n.is_system = 0
+    AND n.author_username IS NOT NULL
+    AND n.created_at >= ?2
+    AND n.created_at < ?4
+    AND (?3 IS NULL OR n.project_id = ?3)
+    AND n.position_old_path {path_op}
 ),
-matched_file_changes AS (
-  -- Centralize dual-path matching for file changes
+matched_notes AS (
+  -- Dedup: prevent double-counting when old_path = new_path (no rename)
+  SELECT DISTINCT id, discussion_id, author_username, created_at, project_id
+  FROM matched_notes_raw
+),
+matched_file_changes_raw AS (
+  -- Branch 1: match on new_path (uses idx_mfc_new_path_project_mr)
  SELECT fc.merge_request_id, fc.project_id
  FROM mr_file_changes fc
  WHERE (?3 IS NULL OR fc.project_id = ?3)
-    AND (fc.new_path {path_op} OR fc.old_path {path_op})
+    AND fc.new_path {path_op}
+  UNION ALL
+  -- Branch 2: match on old_path (uses idx_mfc_old_path_project_mr)
+  SELECT fc.merge_request_id, fc.project_id
+  FROM mr_file_changes fc
+  WHERE (?3 IS NULL OR fc.project_id = ?3)
+    AND fc.old_path {path_op}
+),
+matched_file_changes AS (
+  -- Dedup: prevent double-counting when old_path = new_path (no rename)
+  SELECT DISTINCT merge_request_id, project_id
+  FROM matched_file_changes_raw
 ),
 reviewer_participation AS (
  -- Precompute which (mr_id, username) pairs have substantive DiffNote participation.
@@ -196,7 +225,7 @@ raw AS (
  WHERE m.author_username IS NOT NULL
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
-    AND {state_aware_ts} <= ?4
+    AND {state_aware_ts} < ?4

  UNION ALL

@@ -212,7 +241,7 @@ raw AS (
    AND (m.author_username IS NULL OR r.username != m.author_username)
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
-    AND {state_aware_ts} <= ?4
+    AND {state_aware_ts} < ?4

  UNION ALL

@@ -229,7 +258,7 @@ raw AS (
    AND (m.author_username IS NULL OR r.username != m.author_username)
    AND m.state IN ('opened','merged','closed')
    AND {state_aware_ts} >= ?2
-    AND {state_aware_ts} <= ?4
+    AND {state_aware_ts} < ?4
 ),
 aggregated AS (
  -- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts)
@@ -245,11 +274,11 @@ aggregated AS (
 SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL
 ```

-Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `BETWEEN ?2 AND ?4` pattern ensures that when `--as-of` is set to a past date, events after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility.
+Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` exclusive upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `>= ?2 AND < ?4` pattern (half-open interval) ensures that when `--as-of` is set to a past date, events at or after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. The exclusive upper bound avoids edge-case ambiguity when events have timestamps exactly equal to the as-of value.

-**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into `matched_notes` and `matched_file_changes` CTEs means path matching is defined once, the indexes are hit once, and adding future path resolution logic (e.g., alias chains) only requires changes in one place.
+**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into foundational CTEs (`matched_notes_raw` → `matched_notes`, `matched_file_changes_raw` → `matched_file_changes`) means path matching is defined once, each index branch is explicit, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. The UNION ALL + dedup pattern ensures SQLite uses the optimal index for each path column independently.

-**Index optimization fallback (UNION ALL split)**: SQLite's query planner sometimes struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. If EXPLAIN QUERY PLAN shows this during step 6 verification, replace the `OR`-based CTEs with a `UNION ALL` split + dedup pattern:
+**Dual-path matching strategy (UNION ALL split)**: SQLite's query planner commonly struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. Rather than starting with `OR` and hoping the planner cooperates, use `UNION ALL` + dedup as the default strategy:
 ```sql
 matched_notes AS (
  SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op}
@@ -261,18 +290,18 @@ matched_notes_dedup AS (
  FROM matched_notes
 ),
 ```
-This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm the degradation.
+This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). The same pattern applies to `matched_file_changes`. The simpler `OR` variant is retained as a comment for benchmarking — if a future SQLite version handles `OR` well, the split can be collapsed.

 **Rationale for precomputed participation set**: The previous approach used correlated `EXISTS`/`NOT EXISTS` subqueries to classify reviewers. The `reviewer_participation` CTE materializes the set of `(mr_id, username)` pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with `IS NULL` (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every `mr_reviewers` row falls into exactly one bucket.

 **Rationale for hybrid over fully-raw**: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density.

-**Path rename awareness**: Both `matched_notes` and `matched_file_changes` CTEs match against both old and new path columns:
+**Path rename awareness**: Both `matched_notes` and `matched_file_changes` use UNION ALL + dedup to match against both old and new path columns independently, ensuring each branch uses its respective index:

- Notes: `(n.position_new_path {path_op} OR n.position_old_path {path_op})`
- File changes: `(fc.new_path {path_op} OR fc.old_path {path_op})`
+- Notes: branch 1 matches `position_new_path`, branch 2 matches `position_old_path`, deduped by `notes.id`
+- File changes: branch 1 matches `new_path`, branch 2 matches `old_path`, deduped by `(merge_request_id, project_id)`

-Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The `OR` match ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.
+Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The UNION ALL approach ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically.

 **Signal 4 splits into two**: The current signal 4 (`file_reviewer`) joins `mr_reviewers` but doesn't distinguish participation. In the new plan:

@@ -332,7 +361,7 @@ For each username, accumulate into a struct with:

 The `mr_state` field from each SQL row is stored alongside the timestamp so the Rust-side can apply `closed_mr_multiplier` when `mr_state == "closed"`.

-Compute score as `f64`. Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
+Compute score as `f64` with **deterministic contribution ordering**: within each signal type, sort contributions by `(mr_id ASC)` before summing. This eliminates platform-dependent HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the complexity of compensated summation (Neumaier/Kahan). Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`:
 ```
 state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 }

@@ -352,6 +381,8 @@ Compute counts from the accumulated data:
 - `review_note_count = notes_per_mr.values().map(|(count, _)| count).sum()`
 - `author_mr_count = author_mrs.len()`

+**Bot/service-account filtering**: After accumulating all user scores and before sorting, filter out any username that appears in `config.scoring.excluded_usernames` (exact match, case-insensitive). This is applied in Rust post-query (not SQL) to keep the SQL clean and avoid parameter explosion. When `--include-bots` is active, the filter is skipped entirely. The robot JSON `resolved_input` includes `excluded_usernames_applied: true|false` to indicate whether filtering was active.
+
 Truncate to limit after sorting.

 ### 5. Default --since Change
@@ -364,10 +395,11 @@ At 2 years, author decay = 6%, reviewer decay = 0.4%, note decay = 0.006% — ne
 ### 5a. Reproducible Scoring via `--as-of`

 Add `--as-of <RFC3339|YYYY-MM-DD>` flag that overrides the `now_ms` reference point used for decay calculations. When set:
- All event selection is bounded by `[since_ms, as_of_ms]` — events after `as_of_ms` are excluded from SQL results entirely (not just decayed)
+- All event selection is bounded by `[since_ms, as_of_ms)` — exclusive upper bound; events at or after `as_of_ms` are excluded from SQL results entirely (not just decayed). The SQL uses `< ?4` (strict less-than), not `<= ?4`.
+- `YYYY-MM-DD` input (without time component) is interpreted as end-of-day UTC: `T23:59:59.999Z`. This matches user intuition that `--as-of 2025-06-01` means "as of the end of June 1st" rather than "as of midnight at the start of June 1st" which would exclude the entire day's activity.
 - All decay computations use `as_of_ms` instead of `SystemTime::now()`
 - The `--since` window is calculated relative to `as_of_ms` (not wall clock)
- Robot JSON `resolved_input` includes `as_of_ms` and `as_of_iso` fields
+- Robot JSON `resolved_input` includes `as_of_ms`, `as_of_iso`, `window_start_iso`, `window_end_iso`, and `window_end_exclusive: true` fields — making the exact query window unambiguous in output

 **Rationale**: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. `--as-of` pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since `elapsed.max(0.0)` clamps negative elapsed time to zero), breaking the reproducibility guarantee.

@@ -484,7 +516,13 @@ Add timestamp-aware variants:

 **`test_closed_mr_multiplier`**: Two identical MRs (same author, same age, same path). One is `merged`, one is `closed`. The merged MR should contribute `author_weight * decay(...)`, the closed MR should contribute `author_weight * closed_mr_multiplier * decay(...)`. With default multiplier 0.5, the closed MR contributes half.

-**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the upper-bound filtering in SQL.
+**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the exclusive upper-bound (`< ?4`) filtering in SQL.
+
+**`test_as_of_exclusive_upper_bound`**: Insert an event with timestamp exactly equal to the `as_of_ms` value. Verify it is excluded from results (strict less-than, not less-than-or-equal). This validates the half-open interval `[since, as_of)` semantics.
+
+**`test_excluded_usernames_filters_bots`**: Insert signals for a user named "renovate-bot" and a user named "jsmith", both with the same activity. With `excluded_usernames: ["renovate-bot"]` in config, only "jsmith" should appear in results. Validates the Rust-side post-query filtering.
+
+**`test_include_bots_flag_disables_filtering`**: Same setup as above, but with `--include-bots` active. Both "renovate-bot" and "jsmith" should appear in results.

 **`test_null_timestamp_fallback_to_created_at`**: Insert a merged MR with `merged_at = NULL` (edge case: old data before the column was populated). The state-aware timestamp should fall back to `created_at`. Verify the score reflects `created_at`, not 0 or a panic.

@@ -496,11 +534,13 @@ Add timestamp-aware variants:

 **`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only.

+**`test_deterministic_accumulation_order`**: Insert signals for a user with contributions at many different timestamps (10+ MRs with varied ages). Run `query_expert` 100 times in a loop. All 100 runs must produce the exact same `f64` score (bit-identical). Validates that the sorted contribution ordering eliminates HashMap-iteration-order nondeterminism.
+
 ### 9. Existing Test Compatibility

 All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break.

-The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, and `reviewer_min_note_chars` fields.
+The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, `reviewer_min_note_chars`, and `excluded_usernames` fields.

 ## Verification

@@ -511,11 +551,17 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
 5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings
 6. Manual query plan verification (not automated — SQLite planner varies across versions):
   - Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database
-   - Confirm that `matched_notes` CTE uses `idx_notes_old_path_author` or the existing new_path index (not a full table scan)
-   - Confirm that `matched_file_changes` CTE uses `idx_mfc_old_path_project_mr` or `idx_mfc_new_path_project_mr`
+   - Confirm that `matched_notes_raw` branch 1 uses the existing new_path index and branch 2 uses `idx_notes_old_path_author` (not a full table scan on either branch)
+   - Confirm that `matched_file_changes_raw` branch 1 uses `idx_mfc_new_path_project_mr` and branch 2 uses `idx_mfc_old_path_project_mr`
   - Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author`
   - Document the observed plan in a comment near the SQL for future regression reference
-7. Real-world validation:
+7. Performance baseline (manual, not CI-gated):
+   - Run `time cargo run --release -- who --path <exact-path>` on the real database for exact, prefix, and suffix modes
+   - Target SLOs: p95 exact path < 200ms, prefix < 300ms, suffix < 500ms on development hardware
+   - Record baseline timings as a comment near the SQL for regression reference
+   - If any mode exceeds 2x the baseline after future changes, investigate before merging
+   - Note: These are soft targets for developer awareness, not automated CI gates. Automated benchmarking with synthetic fixtures (100k/1M/5M notes) is a v2 investment if performance becomes a real concern.
+8. Real-world validation:
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors
@@ -524,6 +570,7 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul
   - `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs
   - Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated)
   - Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score`
+   - If the project has known bot accounts (e.g., renovate-bot), add them to `excluded_usernames` config and verify they no longer appear in results. Run again with `--include-bots` to confirm they reappear.

 ## Accepted from External Review

@@ -553,12 +600,20 @@ Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genu
 - **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions).

 **From feedback-4:**
- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `<= ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms]`. Without this, `--as-of` reproducibility was fundamentally broken.
+- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `< ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms)`. Without this, `--as-of` reproducibility was fundamentally broken. (Refined to exclusive upper bound in feedback-5.)
 - **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it.
 - **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments.
- **UNION ALL optimization fallback for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Added documentation of a `UNION ALL` split + dedup fallback pattern to use if EXPLAIN QUERY PLAN shows degradation during verification. Start with the simpler `OR` approach; switch only if needed.
+- **UNION ALL optimization for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Originally documented as a fallback; promoted to default strategy in feedback-5 iteration. The UNION ALL + dedup approach ensures each index branch is used independently.
 - **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round.

+**From feedback-5 (ChatGPT review):**
+- **Exclusive upper bound for `--as-of`**: Changed from `[since_ms, as_of_ms]` (inclusive) to `[since_ms, as_of_ms)` (exclusive). Half-open intervals are the standard convention in temporal systems — they eliminate edge-case ambiguity when events have timestamps exactly at the boundary. Also added `YYYY-MM-DD` → end-of-day UTC parsing and window metadata in robot output.
+- **UNION ALL as default for dual-path matching**: Promoted from "fallback if planner regresses" to default strategy. SQLite `OR`-across-indexed-columns degradation is common enough that the predictable UNION ALL + dedup approach is the safer starting point. The simpler `OR` variant is retained as a comment for benchmarking.
+- **Deterministic contribution ordering**: Within each signal type, sort contributions by `mr_id` before summing. This eliminates HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the overhead of compensated summation (Neumaier/Kahan was rejected as overkill at this scale).
+- **Minimal bot/service-account filtering**: Added `excluded_usernames` (exact match, case-insensitive) to `ScoringConfig` and `--include-bots` CLI flag. Applied as a Rust-side post-filter (not SQL) to keep queries clean. Scope is deliberately minimal — no regex patterns, no heuristic detection. Users configure the list for their team's specific bots.
+- **Performance baseline SLOs**: Added manual performance baseline step to verification — record timings for exact/prefix/suffix modes and flag >2x regressions. Kept lightweight (no CI gating, no synthetic benchmarks) to match the project's current maturity.
+- **New tests**: `test_as_of_exclusive_upper_bound`, `test_excluded_usernames_filters_bots`, `test_include_bots_flag_disables_filtering`, `test_deterministic_accumulation_order` — cover the newly-accepted features.
+
 ## Rejected Ideas (with rationale)

 These suggestions were considered during review but explicitly excluded from this iteration:
@@ -573,3 +628,10 @@ These suggestions were considered during review but explicitly excluded from thi
 - **Split scoring engine into core module** (feedback-4 #5): Proposed extracting scoring math from `who.rs` into `src/core/scoring/model_v2_decay.rs`. Premature modularization — `who.rs` is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit.
 - **Bot/service-account filtering** (feedback-4 #7): Real concern but orthogonal to time-decay scoring. This is a general data quality feature that belongs in its own issue — it affects all `who` modes, not just expert scoring. Adding `excluded_username_patterns` config and `--include-bots` flag is scope expansion that should be designed and tested independently.
 - **Model compare mode / rank-delta diagnostics** (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The `--explain-score` + `--as-of` combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then.
+- **Canonical path identity graph** (feedback-5 #1, also feedback-2 #2, feedback-4 #4): Third time proposed, third time rejected. Building a rename graph from `mr_file_changes(old_path, new_path)` with identity resolution requires new schema (`path_identities`, `path_aliases` tables), ingestion pipeline changes, graph traversal at query time, and backfill logic for existing data. The UNION ALL dual-path matching already covers the 80%+ case (direct renames). Multi-hop rename chains (A→B→C) are rare in practice and can be addressed in v2 with real usage data showing the gap matters.
+- **Normalized `expertise_events` table** (feedback-5 #2): Proposes shifting from query-time CTE joins to a precomputed `expertise_events` table populated at ingest time. While architecturally appealing for read performance, this doubles the data surface area (raw tables + derived events), requires new ingestion pipelines with incremental upsert logic, backfill tooling for existing databases, and introduces consistency risks when raw data is corrected/re-synced. The CTE approach is correct, maintainable, and performant at our current scale. If query latency becomes a real bottleneck (see performance baseline SLOs), materialized views or derived tables become a v2 optimization.
+- **Reviewer engagement model upgrade** (feedback-5 #3): Proposes adding `approved`/`changes_requested` review-state signals and trivial-comment pattern matching (`["lgtm","+1","nit","ship it"]`). Expands the signal type count from 4 to 6 and adds a fragile pattern-matching layer (what about "don't ship it"? "lgtm but..."?). The `reviewer_min_note_chars` threshold is imperfect but pragmatic — it's a single configurable number with no false-positive risk from substring matching. Review-state signals may be worth adding later as a separate enhancement when we have data on how often they diverge from DiffNote participation.
+- **Contribution-floor auto cutoff for `--since`** (feedback-5 #5): Proposes `--since auto` computing the earliest relevant timestamp from `min_contribution_floor` (e.g., 0.01 points). Adds a non-obvious config parameter for minimal benefit — the 24m default is already mathematically justified from the decay curves (author: 6%, reviewer: 0.4% at 2 years) and easily overridden with `--since` or `--all-history`. The auto-derivation formula (`ceil(max_half_life * log2(1/floor))`) is opaque to users who just want to understand why a certain time range was selected.
+- **Full evidence drill-down in `--explain-score`** (feedback-5 #8): Proposes `--explain-score=summary|full` with per-MR evidence rows. Already rejected in feedback-2 #7. Component totals are sufficient for v1 debugging — they answer "which signal type drives this user's score." Per-MR drill-down requires additional SQL queries and significant output format complexity. Deferred unless component breakdowns prove insufficient.
+- **Neumaier compensated summation** (feedback-5 #7 partial): Accepted the sorting aspect for deterministic ordering, but rejected Neumaier/Kahan compensated summation. At the scale of dozens to low hundreds of contributions per user, the rounding error from naive f64 summation is on the order of 1e-14 — several orders of magnitude below any meaningful score difference. Compensated summation adds code complexity and a maintenance burden for no practical benefit at this scale.
+- **Automated CI benchmark gate** (feedback-5 #10 partial): Accepted manual performance baselines, but rejected automated CI regression gating with synthetic fixtures (100k/1M/5M notes). Building and maintaining benchmark infrastructure is a significant investment that's premature for a CLI tool with ~3 users. Manual timing checks during development are sufficient until performance becomes a real concern.
--- a/plans/work-item-status-graphql.feedback-3.md
+++ b/plans/work-item-status-graphql.feedback-3.md
@@ -0,0 +1,157 @@
+**Top Revisions I Recommend**
+
+1. **Fix auth semantics + a real inconsistency in your test plan**
+Your ACs require graceful handling for `403`, but the test list says the “403” test returns `401`. That hides the exact behavior you care about and can let permission regressions slip through.
+
+```diff
+@@ AC-1: GraphQL Client (Unit)
+ - [ ] HTTP 401 → `LoreError::GitLabAuthFailed`
+ + [ ] HTTP 401 → `LoreError::GitLabAuthFailed`
+ + [ ] HTTP 403 → `LoreError::GitLabForbidden`
+
+@@ AC-3: Status Fetcher (Integration)
+ - [ ] GraphQL 403 → returns `Ok(HashMap::new())` with warning log
+ + [ ] GraphQL 403 (`GitLabForbidden`) → returns `Ok(HashMap::new())` with warning log
+
+@@ TDD Plan (RED)
+ - 13. `test_fetch_statuses_403_graceful` — mock returns 401 → `Ok(HashMap::new())`
+ + 13. `test_fetch_statuses_403_graceful` — mock returns 403 → `Ok(HashMap::new())`
+```
+
+2. **Make enrichment atomic and stale-safe**
+Current plan can leave stale status values forever when a widget disappears or status becomes null. Make writes transactional and clear status fields for fetched scope before upserts.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+ + [ ] Enrichment DB writes are transactional per project (all-or-nothing)
+ + [ ] Status fields are cleared for fetched issue scope before applying new statuses
+ + [ ] If enrichment fails mid-project, prior persisted statuses are unchanged (rollback)
+
+@@ File 6: `src/ingestion/orchestrator.rs`
+- fn enrich_issue_statuses(...)
+ fn enrich_issue_statuses_txn(...)
+ // BEGIN TRANSACTION
+ // clear status columns for fetched issue scope
+ // apply updates
+ // COMMIT
+```
+
+3. **Add transient retry/backoff (429/5xx/network)**
+Right now one transient failure loses status enrichment for that sync. Retrying with bounded backoff gives much better reliability at low cost.
+
+```diff
+@@ AC-1: GraphQL Client (Unit)
+ + [ ] Retries 429/502/503/504/network errors with bounded exponential backoff + jitter (max 3 attempts)
+ + [ ] Honors `Retry-After` on 429 before retrying
+
+@@ AC-6: Enrichment in Orchestrator (Integration)
+ + [ ] Cancellation signal is checked before each retry sleep and between paginated calls
+```
+
+4. **Stop full GraphQL scans when nothing changed**
+Running full pagination on every sync will dominate runtime on large repos. Trigger enrichment only when issue ingestion reports changes, with a manual override.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Runs when issue ingestion changed at least one issue in the project
+ [ ] New override flag `--refresh-status` forces enrichment even with zero issue deltas
+ [ ] Optional periodic full refresh (e.g. every N syncs) to prevent long-tail drift
+```
+
+5. **Do not expose raw token via `client.token()`**
+Architecturally cleaner and safer: keep token encapsulated and expose a GraphQL-ready client factory from `GitLabClient`.
+
+```diff
+@@ File 13: `src/gitlab/client.rs`
+- pub fn token(&self) -> &str
+ pub fn graphql_client(&self) -> crate::gitlab::graphql::GraphqlClient
+
+@@ File 6: `src/ingestion/orchestrator.rs`
+- let graphql_client = GraphqlClient::new(&config.gitlab.base_url, client.token());
+ let graphql_client = client.graphql_client();
+```
+
+6. **Add indexes for new status filters**
+`--status` on large tables will otherwise full-scan `issues`. Add compound indexes aligned with project-scoped list queries.
+
+```diff
+@@ AC-4: Migration 021 (Unit)
+ + [ ] Adds index `idx_issues_project_status_name(project_id, status_name)`
+ + [ ] Adds index `idx_issues_project_status_category(project_id, status_category)`
+
+@@ File 14: `migrations/021_work_item_status.sql`
+ ALTER TABLE issues ADD COLUMN status_name TEXT;
+ ALTER TABLE issues ADD COLUMN status_category TEXT;
+ ALTER TABLE issues ADD COLUMN status_color TEXT;
+ ALTER TABLE issues ADD COLUMN status_icon_name TEXT;
+CREATE INDEX IF NOT EXISTS idx_issues_project_status_name
+  ON issues(project_id, status_name);
+CREATE INDEX IF NOT EXISTS idx_issues_project_status_category
+  ON issues(project_id, status_category);
+```
+
+7. **Improve filter UX: add category filter + case-insensitive status**
+Case-sensitive exact name matches are brittle with custom lifecycle names. Category filter is stable and useful for automation.
+
+```diff
+@@ AC-9: List Issues Filter (E2E)
+- [ ] Filter is case-sensitive (matches GitLab's exact status name)
+ [ ] `--status` uses case-insensitive exact match by default (`COLLATE NOCASE`)
+ [ ] New filter `--status-category` supports `triage|to_do|in_progress|done|canceled`
+ [ ] `--status-exact` enables strict case-sensitive behavior when needed
+```
+
+8. **Add capability probe/cache to avoid pointless calls**
+Free tier / old GitLab versions will never return status widget. Cache that capability per project (with TTL) to reduce noise and wasted requests.
+
+```diff
+@@ GitLab API Constraints
+### Capability Probe
+On first sync per project, detect status-widget support and cache result for 24h.
+If unsupported, skip enrichment silently (debug log) until TTL expiry.
+
+@@ AC-3: Status Fetcher (Integration)
+ [ ] Unsupported capability state bypasses GraphQL fetch and warning spam
+```
+
+9. **Use a nested robot `status` object instead of 4 top-level fields**
+This is cleaner schema design and scales better as status metadata grows (IDs, lifecycle, timestamps, etc.).
+
+```diff
+@@ AC-7: Show Issue Display (Robot)
+- [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields
+- [ ] Fields are `null` (not absent) when status not available
+ [ ] JSON includes `status` object:
+       `{ "name": "...", "category": "...", "color": "...", "icon_name": "..." }` or `null`
+
+@@ AC-8: List Issues Display (Robot)
+- [ ] JSON includes `status_name`, `status_category` fields on each issue
+ [ ] JSON includes `status` object (or `null`) on each issue
+```
+
+10. **Add one compelling feature: status analytics, not just status display**
+Right now this is mostly a transport/display enhancement. Make it genuinely useful with “stale in-progress” detection and age-in-status filters.
+
+```diff
+@@ Acceptance Criteria
+### AC-11: Status Aging & Triage Value (E2E)
+- [ ] `lore list issues --status-category in_progress --stale-days 14` filters to stale work
+- [ ] Human table shows `Status Age` (days) when status exists
+- [ ] Robot output includes `status_age_days` (nullable integer)
+```
+
+11. **Harden test plan around failure modes you’ll actually hit**
+The current tests are good, but miss rollback/staleness/retry behavior that drives real reliability.
+
+```diff
+@@ TDD Plan (RED) additions
+21. `test_enrich_clears_removed_status`
+22. `test_enrich_transaction_rolls_back_on_failure`
+23. `test_graphql_retry_429_then_success`
+24. `test_graphql_retry_503_then_success`
+25. `test_cancel_during_backoff_aborts_cleanly`
+26. `test_status_filter_query_uses_project_status_index` (EXPLAIN smoke test)
+```
+
+If you want, I can produce a fully revised v3 plan document end-to-end (frontmatter + reordered ACs + updated file list + updated TDD matrix) so it is ready to implement directly.
--- a/plans/work-item-status-graphql.feedback-4.md
+++ b/plans/work-item-status-graphql.feedback-4.md
@@ -0,0 +1,159 @@
+Your plan is already strong, but I’d revise it in 10 places to reduce risk at scale and make it materially more useful.
+
+1. Shared transport + retries for GraphQL (must-have)
+Reasoning: `REST` already has throttling/retry in `src/gitlab/client.rs`; your proposed GraphQL client would bypass that and can spike rate limits under concurrent project ingest (`src/cli/commands/ingest.rs`). Unifying transport prevents split behavior and cuts production incidents.
+
+```diff
+@@ AC-1: GraphQL Client (Unit)
+- [ ] Network error → `LoreError::Other`
+ [ ] GraphQL requests use shared GitLab transport (same timeout, rate limiter, retry policy as REST)
+ [ ] Retries 429/502/503/504/network errors (max 3) with exponential backoff + jitter
+ [ ] 429 honors `Retry-After` before retrying
+ [ ] Exhausted network retries → `LoreError::GitLabNetworkError`
+
+@@ Decisions
+- 8. **No retry/backoff in v1** — DEFER.
+ 8. **Retry/backoff in v1** — YES (shared REST+GraphQL reliability policy).
+
+@@ Implementation Detail
+ File 15: `src/gitlab/transport.rs` (NEW) — shared HTTP execution and retry/backoff policy.
+```
+
+2. Capability cache for unsupported projects (must-have)
+Reasoning: Free tier / older GitLab will repeatedly emit warning noise every sync and waste calls. Cache support status per project and re-probe on TTL.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ [ ] Unsupported capability responses (missing endpoint/type/widget) are cached per project
+ [ ] While cached unsupported, enrichment is skipped without repeated warning spam
+ [ ] Capability cache auto-expires (default 24h) and is re-probed
+
+@@ Migration Numbering
+- This feature uses **migration 021**.
+ This feature uses **migrations 021-022**.
+
+@@ Files Changed (Summary)
+ `migrations/022_project_capabilities.sql` | NEW — support cache table for project capabilities
+```
+
+3. Delta-first enrichment with periodic full reconcile (must-have)
+Reasoning: Full GraphQL scan every sync is expensive for large projects. You already compute issue deltas in ingestion; use that as fast path and keep a periodic full sweep as safety net.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+- [ ] Runs on every sync (not gated by `--full`)
+ [ ] Fast path: skip status enrichment when issue ingestion upserted 0 issues for that project
+ [ ] Safety net: run full reconciliation every `status_full_reconcile_hours` (default 24)
+ [ ] `--full` always forces reconciliation
+
+@@ AC-5: Config Toggle (Unit)
+ [ ] `SyncConfig` has `status_full_reconcile_hours: u32` (default 24)
+```
+
+4. Strongly typed widget parsing via `__typename` (must-have)
+Reasoning: current “deserialize arbitrary widget JSON into `StatusWidget`” is fragile. Query/type by `__typename` for forward compatibility and fewer silent parse mistakes.
+
+```diff
+@@ AC-3: Status Fetcher (Integration)
+- [ ] Extracts status from `widgets` array by matching `WorkItemWidgetStatus` fragment
+ [ ] Query includes `widgets { __typename ... }` and parser matches `__typename == "WorkItemWidgetStatus"`
+ [ ] Non-status widgets are ignored deterministically (no heuristic JSON-deserialize attempts)
+
+@@ GraphQL Query
+       widgets {
+         __typename
+         ... on WorkItemWidgetStatus { ... }
+       }
+```
+
+5. Set-based transactional DB apply (must-have)
+Reasoning: row-by-row clear/update loops will be slow on large projects and hold write locks longer. Temp-table + set-based SQL inside one txn is faster and easier to reason about rollback.
+
+```diff
+@@ AC-3: Status Fetcher (Integration)
+- `all_fetched_iids: Vec<i64>`
+ `all_fetched_iids: HashSet<i64>`
+
+@@ AC-6: Enrichment in Orchestrator (Integration)
+- [ ] Before applying updates, NULL out status fields ... (loop per IID)
+- [ ] UPDATE SQL: `SET status_name=?, ... WHERE project_id=? AND iid=?`
+ [ ] Use temp tables and set-based SQL in one transaction:
+ [ ]   (1) clear stale statuses for fetched IIDs absent from status rows
+ [ ]   (2) apply status values for fetched IIDs with status
+ [ ] One commit per project; rollback leaves prior state intact
+```
+
+6. Fix index strategy for `COLLATE NOCASE` + default sorting (must-have)
+Reasoning: your proposed `(project_id, status_name)` index may not fully help `COLLATE NOCASE` + `ORDER BY updated_at`. Tune index to real query shape in `src/cli/commands/list.rs`.
+
+```diff
+@@ AC-4: Migration 021 (Unit)
+- [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)` for `--status` filter performance
+ [ ] Adds covering NOCASE-aware index:
+ [ ]   `idx_issues_project_status_name_nocase_updated(project_id, status_name COLLATE NOCASE, updated_at DESC)`
+ [ ] Adds category index:
+ [ ]   `idx_issues_project_status_category_nocase(project_id, status_category COLLATE NOCASE)`
+```
+
+7. Add stable/automation-friendly filters now (high-value feature)
+Reasoning: status names are user-customizable and renameable; category is more stable. Also add `--no-status` for quality checks and migration visibility.
+
+```diff
+@@ AC-9: List Issues Filter (E2E)
+ [ ] `lore list issues --status-category in_progress` filters by category (case-insensitive)
+ [ ] `lore list issues --no-status` returns only issues where `status_name IS NULL`
+ [ ] `--status` + `--status-category` combine with AND logic
+
+@@ File 9: `src/cli/mod.rs`
+ Add flags: `--status-category`, `--no-status`
+
+@@ File 11: `src/cli/autocorrect.rs`
+ Register `--status-category` and `--no-status` for `issues`
+```
+
+8. Better enrichment observability and failure accounting (must-have ops)
+Reasoning: only tracking `statuses_enriched` hides skipped/cleared/errors, and auth failures become silent partial data quality issues. Add counters and explicit progress events.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+- [ ] `IngestProjectResult` gains `statuses_enriched: usize` counter
+- [ ] Progress event: `ProgressEvent::StatusEnrichmentComplete { enriched: usize }`
+ [ ] `IngestProjectResult` gains:
+ [ ]   `statuses_enriched`, `statuses_cleared`, `status_enrichment_skipped`, `status_enrichment_failed`
+ [ ] Progress events:
+ [ ]   `StatusEnrichmentStarted`, `StatusEnrichmentSkipped`, `StatusEnrichmentComplete`, `StatusEnrichmentFailed`
+ [ ] End-of-sync summary includes per-project enrichment outcome counts
+```
+
+9. Add `status_changed_at` for immediately useful workflow analytics (high-value feature)
+Reasoning: without change timestamp, you can’t answer “how long has this been in progress?” which is one of the most useful agent/human queries.
+
+```diff
+@@ AC-4: Migration 021 (Unit)
+ [ ] Adds nullable INTEGER column `status_changed_at` (ms epoch UTC)
+
+@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] If status_name/category changes, update `status_changed_at = now_ms()`
+ [ ] If status is cleared, set `status_changed_at = NULL`
+
+@@ AC-9: List Issues Filter (E2E)
+ [ ] `lore list issues --stale-status-days N` filters by `status_changed_at <= now - N days`
+```
+
+10. Expand test matrix for real-world failure/perf paths (must-have)
+Reasoning: current tests are good, but the highest-risk failures are retry behavior, capability caching, idempotency under repeated runs, and large-project performance.
+
+```diff
+@@ TDD Plan — RED Phase
+ 26. `test_graphql_retries_429_with_retry_after_then_succeeds`
+ 27. `test_graphql_retries_503_then_fails_after_max_attempts`
+ 28. `test_capability_cache_skips_unsupported_project_until_ttl_expiry`
+ 29. `test_delta_skip_when_no_issue_upserts`
+ 30. `test_periodic_full_reconcile_runs_after_threshold`
+ 31. `test_set_based_enrichment_scales_10k_issues_without_timeout`
+ 32. `test_enrichment_idempotent_across_two_runs`
+ 33. `test_status_changed_at_updates_only_on_actual_status_change`
+```
+
+If you want, I can now produce a single consolidated revised plan document (full rewritten Markdown) with these changes merged in-place so it’s ready to execute.
--- a/plans/work-item-status-graphql.feedback-5.md
+++ b/plans/work-item-status-graphql.feedback-5.md
@@ -0,0 +1,124 @@
+Your plan is already strong and implementation-aware. The best upgrades are mostly about reliability under real-world API instability, large-scale performance, and making the feature more useful for automation.
+
+1. Promote retry/backoff from deferred to in-scope now.
+Reason: Right now, transient failures cause silent status gaps until a later sync. Bounded retries with jitter and a time budget dramatically improve successful enrichment without making syncs hang.
+
+```diff
+@@ AC-1: GraphQL Client (Unit) @@
+ - [ ] Network error → `LoreError::Other`
+ + [ ] Transient failures (`429`, `502`, `503`, `504`, timeout, connect reset) retry with exponential backoff + jitter (max 3 attempts)
+ + [ ] `Retry-After` supports both delta-seconds and HTTP-date formats
+ + [ ] Per-request retry budget capped (e.g. 120s total) to preserve cancellation responsiveness
+
+@@ AC-6: Enrichment in Orchestrator (Integration) @@
+ - [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ + [ ] On transient GraphQL errors: retry policy applied before warning/skip behavior
+
+@@ Decisions @@
+ - 8. **No retry/backoff in v1** — DEFER.
+ + 8. **Retry/backoff in v1** — YES. Required for reliable enrichment under normal GitLab/API turbulence.
+```
+
+2. Add a capability cache so unsupported projects stop paying repeated GraphQL cost.
+Reason: Free tier / older instances will never return status widgets. Re-querying every sync is wasted time and noisy logs.
+
+```diff
+@@ Acceptance Criteria @@
+ + ### AC-11: Capability Probe & Cache (Integration)
+ + - [ ] Add `project_capabilities` cache with `supports_work_item_status`, `checked_at`, `cooldown_until`
+ + - [ ] 404/403/known-unsupported responses update capability cache and suppress repeated warnings until TTL expires
+ + - [ ] Supported projects still enrich every run (subject to normal schedule)
+
+@@ Future Enhancements (Not in Scope) @@
+ - **Capability probe/cache**: Detect status-widget support per project ... (deferred)
+ + (moved into scope as AC-11)
+```
+
+3. Make enrichment delta-aware with periodic forced reconciliation.
+Reason: Full pagination every sync is expensive on large projects. You can skip unnecessary status fetches when no issue changes occurred, while still doing periodic safety sweeps.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration) @@
+ - [ ] Runs on every sync (not gated by `--full`)
+ + [ ] Runs when issue ingestion reports project issue deltas OR reconcile window elapsed
+ + [ ] New config: `status_reconcile_hours` (default: 24) for periodic full sweep
+ + [ ] `--refresh-status` forces enrichment regardless of delta/reconcile window
+```
+
+4. Replace row-by-row update loops with set-based SQL via temp table.
+Reason: Current per-IID loops are simple but slow at scale and hold locks longer. Set-based updates are much faster and reduce lock contention.
+
+```diff
+@@ File 6: `src/ingestion/orchestrator.rs` (MODIFY) @@
+ - for iid in all_fetched_iids { ... UPDATE issues ... }
+ - for (iid, status) in statuses { ... UPDATE issues ... }
+ + CREATE TEMP TABLE temp_issue_status_updates(...)
+ + bulk INSERT temp rows (iid, name, category, color, icon_name)
+ + single set-based UPDATE for enriched rows
+ + single set-based NULL-clear for fetched-without-status rows
+ + commit transaction
+```
+
+5. Add strict mode and explicit partial-failure reporting.
+Reason: “Warn and continue” is good default UX, but automation needs a fail-fast option and machine-readable failure output.
+
+```diff
+@@ AC-5: Config Toggle (Unit) @@
+ + - [ ] `SyncConfig` adds `status_enrichment_strict: bool` (default false)
+
+@@ AC-6: Enrichment in Orchestrator (Integration) @@
+ - [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync)
+ + [ ] Default mode: warn + continue
+ + [ ] Strict mode: status enrichment error fails sync for that run
+
+@@ AC-6: IngestProjectResult @@
+ + - [ ] Adds `status_enrichment_error: Option<String>`
+
+@@ AC-8 / Robot sync envelope @@
+ + - [ ] Robot output includes `partial_failures` array with per-project enrichment failures
+```
+
+6. Fix case-insensitive matching robustness and track freshness.
+Reason: SQLite `COLLATE NOCASE` is ASCII-centric; custom statuses may be non-ASCII. Also you need visibility into staleness.
+
+```diff
+@@ AC-4: Migration 021 (Unit) @@
+ - [ ] Migration adds 4 nullable TEXT columns to `issues`
+ + [ ] Migration adds 6 columns:
+ +     `status_name`, `status_category`, `status_color`, `status_icon_name`,
+ +     `status_name_fold`, `status_synced_at`
+ - [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)`
+ + [ ] Adds compound index `idx_issues_project_status_name_fold(project_id, status_name_fold)`
+
+@@ AC-9: List Issues Filter (E2E) @@
+ - [ ] Filter uses case-insensitive matching (`COLLATE NOCASE`)
+ + [ ] Filter uses `status_name_fold` (Unicode-safe fold normalization done at write time)
+```
+
+7. Expand filtering to category and missing-status workflows.
+Reason: Name filters are useful, but automation is better on semantic categories and “missing data” detection.
+
+```diff
+@@ AC-9: List Issues Filter (E2E) @@
+ + - [ ] `--status-category in_progress` filters by `status_category` (case-insensitive)
+ + - [ ] `--no-status` returns only issues where `status_name IS NULL`
+ + - [ ] `--status` and `--status-category` can be combined with AND logic
+```
+
+8. Change robot payload from flat status fields to a nested `status` object.
+Reason: Better schema evolution and less top-level field sprawl as you add metadata (`synced_at`, future lifecycle fields).
+
+```diff
+@@ AC-7: Show Issue Display (E2E) @@
+ - [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields
+ - [ ] Fields are `null` (not absent) when status not available
+ + [ ] JSON includes `status` object:
+ +     `{ "name", "category", "color", "icon_name", "synced_at" }`
+ + [ ] `status: null` when not available
+
+@@ AC-8: List Issues Display (E2E) @@
+ - [ ] `--fields` supports: `status_name`, `status_category`, `status_color`, `status_icon_name`
+ + [ ] `--fields` supports: `status.name,status.category,status.color,status.icon_name,status.synced_at`
+```
+
+If you want, I can produce a fully rewritten “Iteration 5” plan document with these changes integrated end-to-end (ACs, files, migrations, TDD batches, and updated decisions/future-scope).
--- a/plans/work-item-status-graphql.feedback-6.md
+++ b/plans/work-item-status-graphql.feedback-6.md
@@ -0,0 +1,130 @@
+Your iteration-5 plan is strong. The biggest remaining gaps are outcome ambiguity, cancellation safety, and long-term status identity. These are the revisions I’d make.
+
+1. **Make enrichment outcomes explicit (not “empty success”)**
+Analysis:
+Right now `404/403 -> Ok(empty)` is operationally ambiguous: “project has no statuses” vs “feature unavailable/auth issue.” Agents and dashboards need that distinction to make correct decisions.
+This improves reliability and observability without making sync fail-hard.
+
+```diff
+@@ AC-3: Status Fetcher (Integration)
+-- [ ] `fetch_issue_statuses()` returns `FetchStatusResult` containing:
+- [ ] `fetch_issue_statuses()` returns `FetchStatusOutcome`:
+  - `Fetched(FetchStatusResult)`
+  - `Unsupported { reason: UnsupportedReason }`
+  - `CancelledPartial(FetchStatusResult)`
+@@
+-- [ ] GraphQL 404 → returns `Ok(FetchStatusResult)` with empty collections + warning log
+-- [ ] GraphQL 403 (`GitLabAuthFailed`) → returns `Ok(FetchStatusResult)` with empty collections + warning log
+- [ ] GraphQL 404 → `Unsupported { reason: GraphqlEndpointMissing }` + warning log
+- [ ] GraphQL 403 (`GitLabAuthFailed`) → `Unsupported { reason: AuthForbidden }` + warning log
+
+@@ AC-10: Robot Sync Envelope (E2E)
+-- [ ] `status_enrichment` object: `{ "enriched": N, "cleared": N, "error": null | "message" }`
+- [ ] `status_enrichment` object: `{ "mode": "fetched|unsupported|cancelled_partial", "reason": null|"...", "enriched": N, "cleared": N, "error": null|"message" }`
+```
+
+2. **Add cancellation and pagination loop safety**
+Analysis:
+Large projects can run long. Current flow checks cancellation only before enrichment starts; pagination and per-row update loops can ignore cancellation for too long. Also, GraphQL cursor bugs can create infinite loops (`hasNextPage=true` with unchanged cursor).
+This is a robustness must-have.
+
+```diff
+@@ AC-3: Status Fetcher (Integration)
+ [ ] `fetch_issue_statuses()` accepts cancellation signal and checks it between page requests
+ [ ] Pagination guard: if `hasNextPage=true` but `endCursor` is `None` or unchanged, abort loop with warning and return partial outcome
+ [ ] Emits `pages_fetched` count for diagnostics
+
+@@ File 1: `src/gitlab/graphql.rs`
+-- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str) -> Result<FetchStatusResult>
+- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str, signal: &CancellationSignal) -> Result<FetchStatusOutcome>
+```
+
+3. **Persist stable `status_id` in addition to name**
+Analysis:
+`status_name` is display-oriented and mutable (rename/custom lifecycle changes). A stable status identifier is critical for durable automations, analytics, and future migrations.
+This is a schema decision that is cheap now and expensive later if skipped.
+
+```diff
+@@ AC-2: Status Types (Unit)
+-- [ ] `WorkItemStatus` struct has `name`, `category`, `color`, `icon_name`
+- [ ] `WorkItemStatus` struct has `id: String`, `name`, `category`, `color`, `icon_name`
+
+@@ AC-4: Migration 021 (Unit)
+-- [ ] Migration adds 5 nullable columns to `issues`: `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at`
+- [ ] Migration adds 6 nullable columns to `issues`: `status_id`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at`
+ [ ] Adds index `idx_issues_project_status_id(project_id, status_id)` for stable-machine filters
+
+@@ GraphQL query
+-            status { name category color iconName }
+            status { id name category color iconName }
+
+@@ AC-7 / AC-8 Robot
+ [ ] JSON includes `status_id` (null when unavailable)
+```
+
+4. **Handle GraphQL partial-data responses correctly**
+Analysis:
+GraphQL can return both `data` and `errors` in the same response. Current plan treats any `errors` as hard failure, which can discard valid data and reduce reliability.
+Use partial-data semantics: keep data, log/report warnings.
+
+```diff
+@@ AC-1: GraphQL Client (Unit)
+-- [ ] Error response: if top-level `errors` array is non-empty, returns `LoreError` with first error message
+- [ ] If `errors` non-empty and `data` missing: return `LoreError` with first error message
+- [ ] If `errors` non-empty and `data` present: return `data` + warning metadata (do not fail the whole fetch)
+
+@@ TDD Plan (RED)
+ 33. `test_graphql_partial_data_with_errors_returns_data_and_warning`
+```
+
+5. **Extract status enrichment from orchestrator into dedicated module**
+Analysis:
+`orchestrator.rs` already has many phases. Putting status transport/parsing/transaction policy directly there increases coupling and test friction.
+A dedicated module improves architecture clarity and makes future enhancements safer.
+
+```diff
+@@ Implementation Detail
+- File 15: `src/ingestion/enrichment/status.rs` (NEW)
+  - `run_status_enrichment(...)`
+  - `enrich_issue_statuses_txn(...)`
+  - outcome mapping + telemetry
+
+@@ File 6: `src/ingestion/orchestrator.rs`
+-- Inline Phase 1.5 logic + helper function
+- Delegates to `enrichment::status::run_status_enrichment(...)` and records returned stats
+```
+
+6. **Add status/state consistency checks**
+Analysis:
+GitLab states status categories and issue state should synchronize, but ingestion drift or API edge cases can violate this. Detecting mismatch is high-signal for data integrity issues.
+This is compelling for agents because it catches “looks correct but isn’t” problems.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] Enrichment computes `status_state_mismatches` count:
+   - `DONE|CANCELED` with `state=open` or `TO_DO|IN_PROGRESS|TRIAGE` with `state=closed`
+ [ ] Logs warning summary when mismatches > 0
+
+@@ AC-10: Robot Sync Envelope (E2E)
+ [ ] `status_enrichment` includes `state_mismatches: N`
+```
+
+7. **Add explicit performance envelope acceptance criterion**
+Analysis:
+Plan claims large-project handling, but no hard validation target is defined. Add a bounded, reproducible performance criterion to prevent regressions.
+This is especially important with pagination + per-row writes.
+
+```diff
+@@ Acceptance Criteria
+ ### AC-12: Performance Envelope (Integration)
+ - [ ] 10k-issue fixture completes status fetch + apply within defined budget on CI baseline machine
+ - [ ] Memory usage remains O(page_size), not O(total_issues)
+ - [ ] Cancellation during large sync exits within a bounded latency target
+
+@@ TDD Plan (RED)
+ 34. `test_enrichment_large_project_budget`
+ 35. `test_fetch_statuses_memory_bound_by_page`
+ 36. `test_cancellation_latency_during_pagination`
+```
+
+If you want, I can next produce a single consolidated “iteration 6” plan draft with these diffs fully merged so it’s ready to execute.
--- a/plans/work-item-status-graphql.feedback-7.md
+++ b/plans/work-item-status-graphql.feedback-7.md
@@ -0,0 +1,118 @@
+**Highest-Impact Revisions (new, not in your rejected list)**
+
+1. **Critical: Preserve GraphQL partial-error metadata end-to-end (don’t just log it)**
+Rationale: Right now partial GraphQL errors are warning-only. Agents get no machine-readable signal that status data may be incomplete, which can silently corrupt downstream automation decisions. Exposing partial-error metadata in `FetchStatusResult` and robot sync output makes reliability observable and actionable.
+
+```diff
+@@ AC-1: GraphQL Client (Unit)
+- [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and logs warning with first error message
+ [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and warning metadata (`had_errors=true`, `first_error_message`)
+ [ ] `GraphqlClient::query()` returns `GraphqlQueryResult { data, had_errors, first_error_message }`
+
+@@ AC-3: Status Fetcher (Integration)
+ [ ] `FetchStatusResult` includes `partial_error_count: usize` and `first_partial_error: Option<String>`
+ [ ] Partial GraphQL errors increment `partial_error_count` and are surfaced to orchestrator result
+
+@@ AC-10: Robot Sync Envelope (E2E)
+- { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ... }
+ { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ..., "partial_errors": N, "first_partial_error": null|"..." }
+
+@@ File 1: src/gitlab/graphql.rs
+- pub async fn query(...) -> Result<serde_json::Value>
+ pub async fn query(...) -> Result<GraphqlQueryResult>
+ pub struct GraphqlQueryResult { pub data: serde_json::Value, pub had_errors: bool, pub first_error_message: Option<String> }
+```
+
+2. **High: Add adaptive page-size fallback for GraphQL complexity/timeout failures**
+Rationale: Fixed `first: 100` is brittle on self-hosted instances with stricter complexity/time limits. Adaptive page size (100→50→25→10) improves success rate without retries/backoff and avoids failing an entire project due to one tunable server constraint.
+
+```diff
+@@ Query Path
+-query($projectPath: ID!, $after: String) { ... workItems(types: [ISSUE], first: 100, after: $after) ... }
+query($projectPath: ID!, $after: String, $first: Int!) { ... workItems(types: [ISSUE], first: $first, after: $after) ... }
+
+@@ AC-3: Status Fetcher (Integration)
+ [ ] Starts with `first=100`; on GraphQL complexity/timeout errors, retries same cursor with smaller page size (50, 25, 10)
+ [ ] If smallest page size still fails, returns error as today
+ [ ] Emits warning including page size downgrade event
+
+@@ TDD Plan (RED)
+ 36. `test_fetch_statuses_complexity_error_reduces_page_size`
+ 37. `test_fetch_statuses_timeout_error_reduces_page_size`
+```
+
+3. **High: Make project path lookup failure non-fatal for the sync**
+Rationale: Enrichment is optional. If `projects.path_with_namespace` lookup fails for any reason, sync should continue with a structured enrichment error instead of risking full project pipeline failure.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] If project path lookup fails/missing, status enrichment is skipped for that project, warning logged, and sync continues
+ [ ] `status_enrichment_error` captures `"project_path_missing"` (or DB error text)
+
+@@ File 6: src/ingestion/orchestrator.rs
+- let project_path: String = conn.query_row(...)?;
+ let project_path = conn.query_row(...).optional()?;
+ if project_path.is_none() {
+   result.status_enrichment_error = Some("project_path_missing".to_string());
+   result.status_enrichment_mode = "fetched".to_string(); // attempted but unavailable locally
+   emit(ProgressEvent::StatusEnrichmentComplete { enriched: 0, cleared: 0 });
+   // continue to discussion sync
+ }
+```
+
+4. **Medium: Upgrade `--status` from single-value to repeatable multi-value filter**
+Rationale: Practical usage often needs “active buckets” (`To do` OR `In progress`). Repeatable `--status` with OR semantics dramatically improves usefulness without adding new conceptual surface area.
+
+```diff
+@@ AC-9: List Issues Filter (E2E)
+- [ ] `lore list issues --status "In progress"` → only issues where `status_name = 'In progress'`
+ [ ] `lore list issues --status "In progress"` → unchanged single-value behavior
+ [ ] Repeatable flags supported: `--status "In progress" --status "To do"` (OR semantics across status values)
+ [ ] Repeated `--status` remains AND-composed with other filters
+
+@@ File 9: src/cli/mod.rs
+- pub status: Option<String>,
+ pub status: Vec<String>,   // repeatable flag
+
+@@ File 8: src/cli/commands/list.rs
+- if let Some(status) = filters.status { where_clauses.push("i.status_name = ? COLLATE NOCASE"); ... }
+ if !filters.statuses.is_empty() { /* dynamic OR/IN clause with case-insensitive matching */ }
+```
+
+5. **Medium: Add coverage telemetry (`seen`, `with_status`, `without_status`)**
+Rationale: `enriched`/`cleared` alone is not enough to judge enrichment health. Coverage counters make it obvious whether a project truly has no statuses, is unsupported, or has unexpectedly low status population.
+
+```diff
+@@ AC-6: Enrichment in Orchestrator (Integration)
+ [ ] `IngestProjectResult` gains `statuses_seen: usize` and `statuses_without_widget: usize`
+ [ ] Enrichment log includes `seen`, `enriched`, `cleared`, `without_widget`
+
+@@ AC-10: Robot Sync Envelope (E2E)
+- status_enrichment: { mode, reason, enriched, cleared, error }
+ status_enrichment: { mode, reason, seen, enriched, cleared, without_widget, error, partial_errors }
+
+@@ File 6: src/ingestion/orchestrator.rs
+ result.statuses_seen = fetch_result.all_fetched_iids.len();
+ result.statuses_without_widget = result.statuses_seen.saturating_sub(result.statuses_enriched);
+```
+
+6. **Medium: Centralize color parsing/render decisions (single helper used by show/list)**
+Rationale: Color parsing is duplicated in `show.rs` and `list.rs`, which invites drift and inconsistent behavior. One shared helper gives consistent fallback behavior and simpler tests.
+
+```diff
+@@ File 7: src/cli/commands/show.rs
+- fn style_with_hex(...) { ...hex parse logic... }
+ use crate::cli::commands::color::style_with_hex;
+
+@@ File 8: src/cli/commands/list.rs
+- fn colored_cell_hex(...) { ...hex parse logic... }
+ use crate::cli::commands::color::colored_cell_hex;
+
+@@ Files Changed (Summary)
+ `src/cli/commands/color.rs` (NEW) — shared hex parsing + styling helpers
+- duplicated hex parsing blocks removed from show/list
+```
+
+---
+
+If you want, I can produce a **single consolidated patch-style diff of the plan document itself** (all section edits merged, ready to paste as iteration 7).
--- a/plans/work-item-status-graphql.md
+++ b/plans/work-item-status-graphql.md
--- a/plans/work-item-status-graphql.tdd-appendix.md
+++ b/plans/work-item-status-graphql.tdd-appendix.md