From 2c9de1a6c3fade87c6662537ee7f1d4190f513a5 Mon Sep 17 00:00:00 2001 From: Taylor Eernisse Date: Wed, 11 Feb 2026 08:12:17 -0500 Subject: [PATCH] docs: add lore-service, work-item-status-graphql, and time-decay plans Three implementation plans with iterative cross-model refinement: lore-service (5 iterations): HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers. work-item-status-graphql (7 iterations + TDD appendix): Detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot mode output. time-decay-expert-scoring (iteration 5 feedback): Updates to the existing time-decay scoring plan incorporating feedback on decay curve parameterization, recency weighting for discussion contributions, and staleness detection thresholds. Co-Authored-By: Claude Opus 4.6 --- plans/lore-service.feedback-1.md | 186 + plans/lore-service.feedback-2.md | 182 + plans/lore-service.feedback-3.md | 174 + plans/lore-service.feedback-4.md | 190 + plans/lore-service.feedback-5.md | 196 + plans/lore-service.md | 3759 +++++++++++++++++ plans/time-decay-expert-scoring.feedback-5.md | 128 + plans/time-decay-expert-scoring.md | 128 +- plans/work-item-status-graphql.feedback-3.md | 157 + plans/work-item-status-graphql.feedback-4.md | 159 + plans/work-item-status-graphql.feedback-5.md | 124 + plans/work-item-status-graphql.feedback-6.md | 130 + plans/work-item-status-graphql.feedback-7.md | 118 + plans/work-item-status-graphql.md | 1627 +++++++ .../work-item-status-graphql.tdd-appendix.md | 2036 +++++++++ 15 files changed, 9261 insertions(+), 33 deletions(-) create mode 100644 plans/lore-service.feedback-1.md create mode 100644 plans/lore-service.feedback-2.md create mode 100644 plans/lore-service.feedback-3.md create mode 100644 plans/lore-service.feedback-4.md create mode 100644 plans/lore-service.feedback-5.md create mode 100644 plans/lore-service.md create mode 100644 plans/time-decay-expert-scoring.feedback-5.md create mode 100644 plans/work-item-status-graphql.feedback-3.md create mode 100644 plans/work-item-status-graphql.feedback-4.md create mode 100644 plans/work-item-status-graphql.feedback-5.md create mode 100644 plans/work-item-status-graphql.feedback-6.md create mode 100644 plans/work-item-status-graphql.feedback-7.md create mode 100644 plans/work-item-status-graphql.md create mode 100644 plans/work-item-status-graphql.tdd-appendix.md diff --git a/plans/lore-service.feedback-1.md b/plans/lore-service.feedback-1.md new file mode 100644 index 0000000..9a3fc8c --- /dev/null +++ b/plans/lore-service.feedback-1.md @@ -0,0 +1,186 @@ +1. **Isolate scheduled behavior from manual `sync`** +Reasoning: Your current plan injects backoff into `handle_sync_cmd`, which affects all `lore sync` calls (including manual recovery runs). Scheduled behavior should be isolated so humans aren’t unexpectedly blocked by service backoff. + +```diff +@@ Context +-`lore sync` runs a 4-stage pipeline (issues, MRs, docs, embeddings) that takes 2-4 minutes. ++`lore sync` remains the manual/operator command. ++`lore service run` (hidden/internal) is the scheduled execution entrypoint. + +@@ Commands & User Journeys ++### `lore service run` (hidden/internal) ++**What it does:** Executes one scheduled sync attempt with service-only policy: ++- applies service backoff policy ++- records service run state ++- invokes sync pipeline with configured profile ++- updates retry state on success/failure ++ ++**Invocation:** scheduler always runs: ++`lore --robot service run --reason timer` + +@@ Backoff Integration into `handle_sync_cmd` +-Insert **after** config load but **before** the dry_run check: ++Do not add backoff checks to `handle_sync_cmd`. ++Backoff logic lives only in `handle_service_run`. +``` + +2. **Use DB as source-of-truth for service state (not a standalone JSON status file)** +Reasoning: You already have `sync_runs` in SQLite. A separate JSON status file creates split-brain and race/corruption risk. Keep JSON as optional cache/export only. + +```diff +@@ Status File +-Location: `{get_data_dir()}/sync-status.json` ++Primary state location: SQLite (`service_state` table) + existing `sync_runs`. ++Optional mirror file: `{get_data_dir()}/sync-status.json` (best-effort export only). + +@@ File-by-File Implementation Details +-### `src/core/sync_status.rs` (NEW) ++### `migrations/015_service_state.sql` (NEW) ++CREATE TABLE service_state ( ++ id INTEGER PRIMARY KEY CHECK (id = 1), ++ installed INTEGER NOT NULL DEFAULT 0, ++ platform TEXT, ++ interval_seconds INTEGER, ++ profile TEXT NOT NULL DEFAULT 'balanced', ++ consecutive_failures INTEGER NOT NULL DEFAULT 0, ++ next_retry_at_ms INTEGER, ++ last_error_code TEXT, ++ last_error_message TEXT, ++ updated_at_ms INTEGER NOT NULL ++); ++ ++### `src/core/service_state.rs` (NEW) ++- read/write state row ++- derive backoff/next_retry ++- join with latest `sync_runs` for status output +``` + +3. **Backoff policy should be configurable, jittered, and error-aware** +Reasoning: Fixed hardcoded backoff (`base=1800`) is wrong when user sets another interval. Also permanent failures (bad token/config) should not burn retries forever; they should enter paused/error state. + +```diff +@@ Backoff Logic +-// Exponential: base * 2^failures, capped at 4 hours ++// Exponential with jitter: base * 2^(failures-1), capped, ±20% jitter ++// Applies only to transient errors. ++// Permanent errors set `paused_reason` and stop retries until user action. + +@@ CLI Definition Changes ++ServiceCommand::Resume, // clear paused state / failures ++ServiceCommand::Run, // hidden + +@@ Error Types ++ServicePaused, // scheduler paused due to permanent error ++ServiceCommandFailed, // OS command failure with stderr context +``` + +4. **Add a pipeline-level single-flight lock** +Reasoning: Current locking is in ingest stages; there’s still overlap risk across full sync pipelines (docs/embed can overlap with another run). Add a top-level lock for scheduled/manual sync pipeline execution. + +```diff +@@ Architecture ++Add `sync_pipeline` lock at top-level sync execution. ++Keep existing ingest lock (`sync`) for ingest internals. + +@@ Backoff Integration into `handle_sync_cmd` ++Before starting sync pipeline, acquire `AppLock` with: ++name = "sync_pipeline" ++stale_lock_minutes = config.sync.stale_lock_minutes ++heartbeat_interval_seconds = config.sync.heartbeat_interval_seconds +``` + +5. **Don’t embed token in service files by default** +Reasoning: Embedding PAT into unit/plist is a high-risk secret leak path. Make secure storage explicit and default-safe. + +```diff +@@ `lore service install [--interval 30m]` ++`lore service install [--interval 30m] [--token-source env-file|embedded]` ++Default: `env-file` (0600 perms, user-owned) ++`embedded` allowed only with explicit opt-in and warning + +@@ Robot output +- "token_embedded": true ++ "token_source": "env_file" + +@@ Human output +- Note: Your GITLAB_TOKEN is embedded in the service file. ++ Note: Token is stored in a user-private env file (0600). +``` + +6. **Introduce a command-runner abstraction with timeout + stderr capture** +Reasoning: `launchctl/systemctl/schtasks` calls are failure-prone; you need consistent error mapping and deterministic tests. + +```diff +@@ Platform Backends +-exports free functions that dispatch via `#[cfg(target_os)]` ++exports backend + shared `CommandRunner`: ++- run(cmd, args, timeout) ++- capture stdout/stderr/exit code ++- map failure to `ServiceCommandFailed { cmd, exit_code, stderr }` +``` + +7. **Persist install manifest to avoid brittle file parsing** +Reasoning: Parsing timer/plist for interval/state is fragile and platform-format dependent. Persist a manifest with checksums and expected artifacts. + +```diff +@@ Platform Backends +-Same pattern for ... `get_interval_seconds()` ++Add manifest: `{data_dir}/service-manifest.json` ++Stores platform, interval, profile, generated files, and command. ++`service status` reads manifest first, then verifies platform state. + +@@ Acceptance criteria ++Install is idempotent: ++- if manifest+files already match, report `no_change: true` ++- if drift detected, reconcile and rewrite +``` + +8. **Make schedule profile explicit (`fast|balanced|full`)** +Reasoning: This makes the feature more useful and performance-tunable without requiring users to understand internal flags. + +```diff +@@ `lore service install [--interval 30m]` ++`lore service install [--interval 30m] [--profile fast|balanced|full]` ++ ++Profiles: ++- fast: `sync --no-docs --no-embed` ++- balanced (default): `sync --no-embed` ++- full: `sync` +``` + +9. **Upgrade `service status` to include scheduler health + recent run summary** +Reasoning: Single last-sync snapshot is too shallow. Include recent attempts and whether scheduler is paused/backing off/running. + +```diff +@@ `lore service status` +-What it does: Shows whether the service is installed, its configuration, last sync result, and next scheduled run. ++What it does: Shows install state, scheduler state (running/backoff/paused), recent runs, and next run estimate. + +@@ Robot output +- "last_sync": { ... }, +- "backoff": null ++ "scheduler_state": "running|backoff|paused|idle", ++ "last_sync": { ... }, ++ "recent_runs": [{"run_id":"...","status":"...","started_at_iso":"..."}], ++ "backoff": null, ++ "paused_reason": null +``` + +10. **Strengthen tests around determinism and cross-platform generation** +Reasoning: Time-based backoff and shell quoting are classic flaky points. Add fake clock + fake command runner for deterministic tests. + +```diff +@@ Testing Strategy ++Add deterministic test seams: ++- `Clock` trait for backoff/now calculations ++- `CommandRunner` trait for backend command execution ++ ++Add tests: ++- transient vs permanent error classification ++- backoff schedule with jitter bounds ++- manifest drift reconciliation ++- quoting/escaping for paths with spaces and special chars ++- `service run` does not modify manual `sync` behavior +``` + +If you want, I can rewrite your full plan as a single clean revised document with these changes already integrated (instead of patch fragments). \ No newline at end of file diff --git a/plans/lore-service.feedback-2.md b/plans/lore-service.feedback-2.md new file mode 100644 index 0000000..80a7cd7 --- /dev/null +++ b/plans/lore-service.feedback-2.md @@ -0,0 +1,182 @@ +**High-Impact Revisions (ordered by priority)** + +1. **Make service identity project-scoped (avoid collisions across repos/users)** +Analysis: Current fixed names (`com.gitlore.sync`, `LoreSync`, `lore-sync.timer`) will collide when users run multiple gitlore workspaces. This causes silent overwrites and broken uninstall/status behavior. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Commands & User Journeys / install +- lore service install [--interval 30m] [--profile balanced] [--token-source env-file] ++ lore service install [--interval 30m] [--profile balanced] [--token-source auto] [--name ] +@@ Install Manifest Schema ++ /// Stable per-install identity (default derived from project root hash) ++ pub service_id: String, +@@ Platform Backends +- Label: com.gitlore.sync ++ Label: com.gitlore.sync.{service_id} +- Task name: LoreSync ++ Task name: LoreSync-{service_id} +- ~/.config/systemd/user/lore-sync.service ++ ~/.config/systemd/user/lore-sync-{service_id}.service +``` + +2. **Replace token model with secure per-OS defaults** +Analysis: The current “env-file default” is not actually secure on macOS launchd (token still ends up in plist). On Windows, assumptions about inherited environment are fragile. Use OS-native secure stores by default and keep `embedded` as explicit opt-in only. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Token storage strategies +-| env-file (default) | ... ++| auto (default) | macOS: Keychain, Linux: env-file (0600), Windows: Credential Manager | ++| env-file | Linux/systemd only | + | embedded | ... explicit warning ... +@@ macOS launchd section +- env-file strategy stores canonical token in service-env but embeds token in plist ++ default strategy is Keychain lookup at runtime; no token persisted in plist ++ env-file is not offered on macOS +@@ Windows schtasks section +- token must be in user's system environment ++ default strategy stores token in Windows Credential Manager and injects at runtime +``` + +3. **Version and atomically persist manifest/status** +Analysis: `Option` on read hides corruption, and non-atomic writes risk truncated JSON on crashes. This will create false “not installed” and scheduler confusion. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Install Manifest Schema ++ pub schema_version: u32, // start at 1 ++ pub updated_at_iso: String, +@@ Status File Schema ++ pub schema_version: u32, // start at 1 ++ pub updated_at_iso: String, +@@ Read/Write +- read(path) -> Option ++ read(path) -> Result, LoreError> +- write(...) -> std::io::Result<()> ++ write_atomic(...) -> std::io::Result<()> // tmp file + fsync + rename +``` + +4. **Persist `next_retry_at_ms` instead of recomputing jitter** +Analysis: Deterministic jitter from timestamp modulo is predictable and can herd retries. Persisting `next_retry_at_ms` at failure time makes status accurate, stable, and cheap to compute. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ SyncStatusFile + pub consecutive_failures: u32, ++ pub next_retry_at_ms: Option, +@@ Backoff Logic +- compute backoff from last_run.timestamp_ms and deterministic jitter each read ++ compute backoff once on failure, store next_retry_at_ms, read-only comparison afterward ++ jitter algorithm: full jitter in [0, cap], injectable RNG for tests +``` + +5. **Add circuit breaker for repeated transient failures** +Analysis: Infinite transient retries can run forever on systemic failures (DB corruption, bad network policy). After N transient failures, pause with actionable reason. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Scheduler states +- backoff — transient failures, waiting to retry ++ backoff — transient failures, waiting to retry ++ paused — permanent error OR circuit breaker tripped after N transient failures +@@ Service run flow +- On transient failure: increment failures, compute backoff ++ On transient failure: increment failures, compute backoff, if failures >= max_transient_failures -> pause +``` + +6. **Stage-aware outcome policy (core freshness over all-or-nothing)** +Analysis: Failing embeddings/docs should not block issues/MRs freshness. Split stage outcomes and only treat core stages as hard-fail by default. This improves reliability and practical usefulness. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Context +- lore sync runs a 4-stage pipeline ... treated as one run result ++ lore service run records per-stage outcomes (issues, mrs, docs, embeddings) +@@ Status File Schema ++ pub stage_results: Vec, +@@ service run flow +- Execute sync pipeline with flags derived from profile ++ Execute stage-by-stage and classify severity: ++ - critical: issues, mrs ++ - optional: docs, embeddings ++ optional stage failures mark run as degraded, not failed +``` + +7. **Replace cfg free-function backend with trait-based backend** +Analysis: Current backend API is hard to test end-to-end without real OS commands. A `SchedulerBackend` trait enables deterministic integration tests and cleaner architecture. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Platform Backends / Architecture +- exports free functions dispatched via #[cfg] ++ define trait SchedulerBackend { install, uninstall, state, file_paths, next_run } ++ provide LaunchdBackend, SystemdBackend, SchtasksBackend implementations ++ include FakeBackend for integration tests +``` + +8. **Harden platform units and detect scheduler prerequisites** +Analysis: systemd user timers often fail silently without user manager/linger; launchd context can be wrong in headless sessions. Add explicit diagnostics and unit hardening. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Linux systemd unit + [Service] + Type=oneshot + ExecStart=... ++TimeoutStartSec=900 ++NoNewPrivileges=true ++PrivateTmp=true ++ProtectSystem=strict ++ProtectHome=read-only +@@ Linux install/status ++ detect user manager availability and linger state; surface warning/action +@@ macOS install/status ++ detect non-GUI bootstrap context and return actionable error +``` + +9. **Add operational commands: `trigger`, `doctor`, and non-interactive log tail** +Analysis: `logs` opening an editor is weak for automation and incident response. Operators need a preflight and immediate controlled run. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ ServiceCommand ++ Trigger, // run one attempt through service policy now ++ Doctor, // validate scheduler, token, paths, permissions +@@ logs +- opens editor ++ supports --tail and --follow in human mode ++ robot mode can return last_n lines optionally +``` + +10. **Fix plan inconsistencies and edge-case correctness** +Analysis: There are internal mismatches that will cause implementation drift. +Diff: +```diff +--- a/plan.md ++++ b/plan.md +@@ Interval Parsing +- supports 's' suffix ++ remove 's' suffix (acceptance only allows 5m..24h) +@@ uninstall acceptance +- removes ALL service files only ++ explicitly also remove service-manifest and service-env (status/logs retained) +@@ SyncStatusFile schema +- pub last_run: SyncRunRecord ++ pub last_run: Option // matches idle/no runs state +``` + +--- + +**Recommended Architecture Upgrade Summary** + +The strongest improvement set is: **(1) project-scoped IDs, (2) secure token defaults, (3) atomic/versioned state, (4) persisted retry schedule + circuit breaker, (5) stage-aware outcomes**. That combination materially improves correctness, multi-repo safety, security, operability, and real-world reliability without changing your core manual-vs-scheduled separation principle. \ No newline at end of file diff --git a/plans/lore-service.feedback-3.md b/plans/lore-service.feedback-3.md new file mode 100644 index 0000000..3b5ccb2 --- /dev/null +++ b/plans/lore-service.feedback-3.md @@ -0,0 +1,174 @@ +Below are the highest-impact revisions I’d make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX. + +1. **Fix multi-install ambiguity (`service_id` exists, but commands can’t target one explicitly)** +Analysis: The plan introduces `service-manifest-{service_id}.json`, but `status/uninstall/resume/logs` have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery. +```diff +@@ ## Commands & User Journeys ++### `lore service list` ++Lists installed services discovered from `{data_dir}/service-manifest-*.json`. ++Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`. + +@@ ### `lore service uninstall` +-### `lore service uninstall` ++### `lore service uninstall [--service ] [--all]` +@@ +-2. CLI reads install manifest to find `service_id` ++2. CLI resolves target service via `--service` or current-project-derived default. ++3. If multiple candidates and no selector, return actionable error. + +@@ ### `lore service status` +-### `lore service status` ++### `lore service status [--service ]` +``` + +2. **Make status state service-scoped (not global)** +Analysis: A single `sync-status.json` for all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep lock global, but state per service. +```diff +@@ ## Status File +-### Location +-`{get_data_dir()}/sync-status.json` ++### Location ++`{get_data_dir()}/sync-status-{service_id}.json` + +@@ ## Paths Module Additions +-pub fn get_service_status_path() -> PathBuf { +- get_data_dir().join("sync-status.json") ++pub fn get_service_status_path(service_id: &str) -> PathBuf { ++ get_data_dir().join(format!("sync-status-{service_id}.json")) +} +@@ +-Note: `sync-status.json` is NOT scoped by `service_id` ++Note: status is scoped by `service_id`; lock remains global (`sync_pipeline`) to prevent overlapping writes. +``` + +3. **Stop classifying permanence via string matching** +Analysis: Matching `"401 Unauthorized"` in strings is brittle and will misclassify edge cases. Carry machine codes through stage results and classify by `ErrorCode` only. +```diff +@@ pub struct StageResult { +- pub error: Option, ++ pub error: Option, ++ pub error_code: Option, // e.g., AUTH_FAILED, NETWORK_ERROR +} +@@ Error classification helpers +-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... } ++fn is_permanent_error_code(code: Option<&str>) -> bool { ++ matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED")) ++} +``` + +4. **Install should be transactional (manifest written last)** +Analysis: Current order writes manifest before scheduler enable. If enable fails, you persist a false “installed” state. Use two-phase install with rollback. +```diff +@@ ### `lore service install` User journey +-9. CLI writes install manifest ... +-10. CLI runs the platform-specific enable command ++9. CLI runs the platform-specific enable command ++10. On success, CLI writes install manifest atomically ++11. On failure, CLI removes generated files and returns `ServiceCommandFailed` +``` + +5. **Fix launchd token security gap (env-file currently still embeds token)** +Analysis: Current “env-file” on macOS still writes token into plist, defeating the main security goal. Generate a private wrapper script that reads env file at runtime and execs `lore`. +```diff +@@ ### macOS: launchd +-ProgramArguments +- +- {binary_path} +- --robot +- service +- run +- ++ProgramArguments ++ ++ {data_dir}/service-run-{service_id}.sh ++ +@@ +-`env-file`: ... token value must still appear in plist ... ++`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime. +``` + +6. **Improve backoff math and add half-open circuit recovery** +Analysis: Current jitter + min clamp makes first retry deterministic and can over-pause. Also circuit-breaker requires manual resume forever. Add cooldown + half-open probe to self-heal. +```diff +@@ Backoff Logic +-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64; +-let backoff_secs = backoff_secs.max(base_interval_seconds); ++let max_backoff = base_backoff; ++let min_backoff = base_interval_seconds; ++let span = max_backoff.saturating_sub(min_backoff); ++let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64; + +@@ Scheduler states +-- `paused` — permanent error ... OR circuit breaker tripped ... ++- `paused` — permanent error requiring intervention ++- `half_open` — probe state after circuit cooldown; one trial run allowed + +@@ Circuit breaker +-... transitions to `paused` ... Run: lore service resume ++... transitions to `half_open` after cooldown (default 30m). Successful probe closes breaker automatically; failed probe returns to backoff/paused. +``` + +7. **Promote backend trait to v1 (not v2) for deterministic integration tests** +Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors. +```diff +@@ ### Platform Backends +-> Future architecture note: A `SchedulerBackend` trait ... for v2. ++Adopt `SchedulerBackend` trait in v1 with real backends (`launchd/systemd/schtasks`) and `FakeBackend` for tests. ++This enables deterministic install/uninstall/status/run-path integration tests without touching host scheduler. +``` + +8. **Harden `run_cmd` timeout behavior** +Analysis: If timeout occurs, child process must be killed and reaped. Otherwise you leak processes and can wedge repeated runs. +```diff +@@ fn run_cmd(...) +-// Wait with timeout +-let output = wait_with_timeout(output, timeout_secs)?; ++// Wait with timeout; on timeout kill child and wait to reap ++let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?; +``` + +9. **Add manual control commands (`pause`, `trigger`, `repair`)** +Analysis: These are high-utility operational controls. `trigger` helps immediate sync without waiting interval. `pause` supports maintenance windows. `repair` avoids manual file deletion for corrupt state. +```diff +@@ pub enum ServiceCommand { ++ /// Pause scheduled execution without uninstalling ++ Pause { #[arg(long)] reason: Option }, ++ /// Trigger an immediate one-off run using installed profile ++ Trigger { #[arg(long)] ignore_backoff: bool }, ++ /// Repair corrupt manifest/status by backing up and reinitializing ++ Repair { #[arg(long)] service: Option }, +} +``` + +10. **Make `logs` default non-interactive and add rotation policy** +Analysis: Opening editor by default is awkward for automation/SSH and slower for normal diagnosis. Defaulting to `tail` is more practical; `--open` can preserve editor behavior. +```diff +@@ ### `lore service logs` +-By default, opens in the user's preferred editor. ++By default, prints last 100 lines to stdout. ++Use `--open` to open editor. +@@ ++Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files. +``` + +11. **Remove destructive/shell-unsafe suggested action** +Analysis: `actions(): ["rm {path}", ...]` is unsafe (shell injection + destructive guidance). Replace with safe command path. +```diff +@@ LoreError::actions() +-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"], ++Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"], +``` + +12. **Tighten scheduler units for real-world reliability** +Analysis: Add explicit working directory and success-exit handling to reduce environment drift and edge failures. +```diff +@@ systemd service unit + [Service] + Type=oneshot + ExecStart={binary_path} --robot service run ++WorkingDirectory={data_dir} ++SuccessExitStatus=0 + TimeoutStartSec=900 +``` + +If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure. \ No newline at end of file diff --git a/plans/lore-service.feedback-4.md b/plans/lore-service.feedback-4.md new file mode 100644 index 0000000..459e1a1 --- /dev/null +++ b/plans/lore-service.feedback-4.md @@ -0,0 +1,190 @@ +No `## Rejected Recommendations` section was present in the plan you shared, so the proposals below are all net-new. + +1. **Make scheduled runs explicitly target a single service instance** +Analysis: right now `service run` has no selector, but the plan supports multiple installed services. That creates ambiguity and incorrect manifest/status selection. This is the most important architectural fix. + +```diff +@@ `lore service install` What it does +- runs `lore --robot service run` at the specified interval ++ runs `lore --robot service run --service-id ` at the specified interval + +@@ Robot output (`install`) +- "sync_command": "/usr/local/bin/lore --robot service run", ++ "sync_command": "/usr/local/bin/lore --robot service run --service-id a1b2c3d4", + +@@ `ServiceCommand` enum +- #[command(hide = true)] +- Run, ++ #[command(hide = true)] ++ Run { ++ /// Internal selector injected by scheduler backend ++ #[arg(long, hide = true)] ++ service_id: String, ++ }, + +@@ `handle_service_run` signature +-pub fn handle_service_run(start: std::time::Instant) -> Result<(), Box> ++pub fn handle_service_run(service_id: &str, start: std::time::Instant) -> Result<(), Box> + +@@ run flow step 1 +- Read install manifest ++ Read install manifest for `service_id` +``` + +2. **Strengthen `service_id` derivation to avoid cross-workspace collisions** +Analysis: hashing config path alone can collide when many workspaces share one global config. Identity should represent what is being synced, not only where config lives. + +```diff +@@ Key Design Principles / Project-Scoped Service Identity +- derive from a stable hash of the config file path ++ derive from a stable fingerprint of: ++ - canonical workspace root ++ - normalized configured GitLab project URLs ++ - canonical config path ++ then take first 12 hex chars of SHA-256 + +@@ `compute_service_id` +- Returns first 8 hex chars of SHA-256 of the canonical config path. ++ Returns first 12 hex chars of SHA-256 of a canonical identity tuple ++ (workspace_root + sorted project URLs + config_path). +``` + +3. **Introduce a service-state machine with a dedicated admin lock** +Analysis: install/uninstall/pause/resume/repair/status can race each other. A lock and explicit transition table prevents invalid states and file races. + +```diff +@@ New section: Service State Model ++ All state mutations are serialized by `AppLock("service-admin-{service_id}")`. ++ Legal transitions: ++ - idle -> running -> success|degraded|backoff|paused ++ - backoff -> running|paused ++ - paused -> half_open|running (resume) ++ - half_open -> running|paused ++ Any invalid transition is rejected with `ServiceCorruptState`. + +@@ `handle_install`, `handle_uninstall`, `handle_pause`, `handle_resume`, `handle_repair` ++ Acquire `service-admin-{service_id}` before mutating manifest/status/service files. +``` + +4. **Unify manual and scheduled sync execution behind one orchestrator** +Analysis: the plan currently duplicates stage logic and error classification in `service run`, increasing drift risk. A shared orchestrator gives one authoritative pipeline behavior. + +```diff +@@ Key Design Principles ++ #### 6. Single Sync Orchestrator ++ Both `lore sync` and `lore service run` call `SyncOrchestrator`. ++ Service mode adds policy (backoff/circuit-breaker); manual mode bypasses policy. + +@@ Service Run Implementation +- execute_sync_stages(&sync_args) ++ SyncOrchestrator::run(SyncMode::Service { profile, policy }) + +@@ manual sync +- separate pipeline path ++ SyncOrchestrator::run(SyncMode::Manual { flags }) +``` + +5. **Add bounded in-run retries for transient core-stage failures** +Analysis: single-shot failure handling will over-trigger backoff on temporary network blips. One short retry per core stage significantly improves freshness without much extra runtime. + +```diff +@@ Stage-aware execution ++ Core stages (`issues`, `mrs`) get up to 1 immediate retry on transient errors ++ (jittered 1-5s). Permanent errors are never retried. ++ Optional stages keep best-effort semantics. + +@@ Acceptance criteria (`service run`) ++ Retries transient core stage failures once before counting run as failed. +``` + +6. **Harden persistence with full crash-safety semantics** +Analysis: current atomic write description is good but incomplete for power-loss durability. You should fsync parent directory after rename and include lightweight integrity metadata. + +```diff +@@ `write_atomic` +- tmp file + fsync + rename ++ tmp file + fsync(file) + rename + fsync(parent_dir) + +@@ `ServiceManifest` and `SyncStatusFile` ++ pub write_seq: u64, ++ pub content_sha256: String, // optional integrity guard for repair/doctor +``` + +7. **Fix token handling to avoid shell/env injection and add secure-store mode** +Analysis: sourcing env files in shell is brittle if token contains special chars/newlines. Also, secure OS credential stores should be first-class for production reliability/security. + +```diff +@@ Token storage strategies +-| `env-file` (default) ... ++| `auto` (default) | use secure-store when available, else env-file | ++| `secure-store` | macOS Keychain / libsecret / Windows Credential Manager | ++| `env-file` | explicit fallback | + +@@ macOS wrapper script +-. "{data_dir}/service-env-{service_id}" +-export {token_env_var} ++TOKEN_VALUE="$(cat "{data_dir}/service-token-{service_id}" )" ++export {token_env_var}="$TOKEN_VALUE" + +@@ Acceptance criteria ++ Reject token values containing `\0` or newline for env-file mode. ++ Never eval/source untrusted token content. +``` + +8. **Correct platform/runtime implementation hazards** +Analysis: there are a few correctness risks that should be fixed in-plan now. + +```diff +@@ macOS install steps +- Get UID via `unsafe { libc::getuid() }` ++ Get UID via safe API (`nix::unistd::Uid::current()` or equivalent safe helper) + +@@ Command Runner Helper +- poll try_wait and read stdout/stderr after exit ++ avoid potential pipe backpressure deadlock: ++ use wait-with-timeout + concurrent stdout/stderr draining + +@@ Linux timer +- OnUnitActiveSec={interval_seconds}s ++ OnUnitInactiveSec={interval_seconds}s ++ AccuracySec=1min +``` + +9. **Make logs fully service-scoped** +Analysis: you already scoped manifest/status by `service_id`; logs are still global in several places. Multi-service installs will overwrite each other’s logs. + +```diff +@@ Paths Module Additions +-pub fn get_service_log_path() -> PathBuf ++pub fn get_service_log_path(service_id: &str, stream: LogStream) -> PathBuf + +@@ log filenames +- logs/service-stderr.log +- logs/service-stdout.log ++ logs/service-{service_id}-stderr.log ++ logs/service-{service_id}-stdout.log + +@@ `service logs` +- default path: `{data_dir}/logs/service-stderr.log` ++ default path: `{data_dir}/logs/service-{service_id}-stderr.log` +``` + +10. **Resolve internal spec contradictions and rollback gaps** +Analysis: there are a few contradictory statements and incomplete rollback behavior that will cause implementation churn. + +```diff +@@ `service logs` behavior +- default (no flags): open in editor (human) ++ default (no flags): print last 100 lines (human and robot metadata mode) ++ `--open` is explicit opt-in + +@@ install rollback +- On failure: removes generated service files ++ On failure: removes generated service files, env file, wrapper script, and temp manifest + +@@ `handle_service_run` sample code +- let manifest_path = get_service_manifest_path(); ++ let manifest_path = get_service_manifest_path(service_id); +``` + +If you want, I can take these revisions and produce a single consolidated “Iteration 4” replacement plan block with all sections rewritten coherently so it’s ready to hand to an implementer. \ No newline at end of file diff --git a/plans/lore-service.feedback-5.md b/plans/lore-service.feedback-5.md new file mode 100644 index 0000000..c220792 --- /dev/null +++ b/plans/lore-service.feedback-5.md @@ -0,0 +1,196 @@ +I reviewed the full plan and avoided everything already listed in `## Rejected Recommendations`. These are the highest-impact revisions I’d make. + +1. **Fix identity model inconsistency and prevent `--name` alias collisions** +Why this improves the plan: your text says identity includes workspace root, but the current derivation code does not. Also, using `--name` as the actual `service_id` risks accidental cross-project collisions and destructive updates. + +```diff +--- a/plan.md ++++ b/plan.md +@@ Key Design Principles / 2. Project-Scoped Service Identity +- Each installed service gets a unique `service_id` derived from a canonical identity tuple: the config file path, sorted GitLab project URLs, and workspace root. ++ Each installed service gets an immutable `identity_hash` derived from a canonical identity tuple: ++ workspace root + canonical config path + sorted normalized project URLs. ++ `service_id` remains the scheduler identifier; `--name` is a human alias only. ++ If `--name` collides with an existing service that has a different `identity_hash`, install fails with an actionable error. + +@@ Install Manifest / Schema ++ /// Immutable identity hash for collision-safe matching across reinstalls ++ pub identity_hash: String, ++ /// Optional human-readable alias passed via --name ++ #[serde(skip_serializing_if = "Option::is_none")] ++ pub service_alias: Option, ++ /// Canonical workspace root used in identity derivation ++ pub workspace_root: String, + +@@ service_id derivation +-pub fn compute_service_id(config_path: &Path, project_urls: &[&str]) -> String ++pub fn compute_identity_hash(workspace_root: &Path, config_path: &Path, project_urls: &[&str]) -> String +``` + +2. **Add lock protocol to eliminate uninstall/run race conditions** +Why this improves the plan: today `service run` does not take admin lock, and admin commands do not take pipeline lock. `uninstall` can race with an active run and remove files mid-execution. + +```diff +--- a/plan.md ++++ b/plan.md +@@ Key Design Principles / 6. Serialized Admin Mutations +- The `service run` entrypoint does NOT acquire the admin lock — it only acquires the `sync_pipeline` lock ++ The `service run` entrypoint acquires only `sync_pipeline`. ++ Destructive admin operations (`install` overwrite, `uninstall`, `repair --regenerate`) must: ++ 1) acquire `service-admin-{service_id}` ++ 2) disable scheduler backend entrypoint ++ 3) acquire `sync_pipeline` lock with timeout ++ 4) mutate/remove files ++ This lock ordering is mandatory to prevent deadlocks and run/delete races. + +@@ lore service uninstall / User journey +- 4. Runs platform-specific disable command +- 5. Removes service files from disk ++ 4. Acquires `sync_pipeline` lock (after disabling scheduler) with bounded wait ++ 5. Removes service files from disk only after lock acquisition +``` + +3. **Make transient handling `Retry-After` aware** +Why this improves the plan: rate-limit and 503 responses often carry retry hints. Ignoring them causes useless retries and longer outages. + +```diff +--- a/plan.md ++++ b/plan.md +@@ Transient vs permanent error classification +-| Transient | Retry with backoff | Network timeout, rate limited, DB locked, 5xx from GitLab | ++| Transient | Retry with adaptive backoff | Network timeout, DB locked, 5xx from GitLab | ++| Transient (hinted) | Respect server retry hint | Rate limited with Retry-After/X-RateLimit-Reset | + +@@ Backoff Logic ++ If an error includes a retry hint (e.g., `Retry-After`), set: ++ `next_retry_at_ms = max(computed_backoff, hinted_retry_at_ms)`. ++ Persist `backoff_reason` for status visibility. +``` + +4. **Decouple optional stage cadence from core sync interval** +Why this improves the plan: running docs/embeddings every 5–30 minutes is expensive and unnecessary. Separate freshness windows reduce cost/latency while keeping core data fresh. + +```diff +--- a/plan.md ++++ b/plan.md +@@ Sync profiles +-| `balanced` (default) | `--no-embed` | Issues + MRs + doc generation | +-| `full` | (none) | Full pipeline including embeddings | ++| `balanced` (default) | core every interval, docs every 60m, no embeddings | Fast + useful docs | ++| `full` | core every interval, docs every interval, embeddings every 6h (default) | Full freshness with bounded cost | + +@@ Status File / StageResult ++ /// true when stage intentionally skipped due freshness window ++ #[serde(default)] ++ pub skipped: bool, + +@@ lore service run / Stage-aware execution ++ Optional stages may be skipped when their last successful run is within configured freshness windows. ++ Skipped optional stages do not count as failures and are recorded explicitly. +``` + +5. **Give Windows parity for secure token handling (env-file + wrapper)** +Why this improves the plan: current Windows path requires global/system env and has poor UX. A wrapper+env-file model gives platform parity and avoids global token exposure. + +```diff +--- a/plan.md ++++ b/plan.md +@@ Token storage strategies +-| On Windows, neither strategy applies — the token must be in the user's system environment ++| On Windows, `env-file` is supported via a generated wrapper script (`service-run-{service_id}.cmd` or `.ps1`) ++| that reads `{data_dir}/service-env-{service_id}` and launches `lore --robot service run ...`. ++| `embedded` remains opt-in and warned as less secure. + +@@ Windows: schtasks +- Token handling on Windows: The env var must be set system-wide via `setx` ++ Token handling on Windows: ++ - `env-file` (default): wrapper script reads token from private file at runtime ++ - `embedded`: token passed via wrapper-set environment variable ++ - `system_env`: still supported as fallback +``` + +6. **Add run heartbeat and stale-run detection** +Why this improves the plan: `running` state can become misleading after crashes or stale locks. Heartbeat metadata makes status accurate and improves incident triage. + +```diff +--- a/plan.md ++++ b/plan.md +@@ Status File / Schema ++ /// In-flight run metadata for crash/stale detection ++ #[serde(skip_serializing_if = "Option::is_none")] ++ pub current_run: Option, ++ ++pub struct CurrentRunState { ++ pub run_id: String, ++ pub started_at_ms: i64, ++ pub last_heartbeat_ms: i64, ++ pub pid: u32, ++} + +@@ lore service status +- - `running` — currently executing (sync_pipeline lock held) ++ - `running` — currently executing with live heartbeat ++ - `running_stale` — in-flight metadata exists but heartbeat exceeded stale threshold +``` + +7. **Upgrade drift detection from “loaded/unloaded” to spec-level drift** +Why this improves the plan: platform state alone misses manual edits to unit/plist/wrapper files. Spec-hash drift gives reliable “what changed?” diagnostics and safe regeneration. + +```diff +--- a/plan.md ++++ b/plan.md +@@ Install Manifest / Schema ++ /// Hash of generated scheduler artifacts and command spec ++ pub spec_hash: String, + +@@ lore service status +- Detects manifest/platform drift and reports it ++ Detects: ++ - platform drift (loaded/unloaded mismatch) ++ - spec drift (artifact content hash mismatch) ++ - command drift (sync command differs from manifest) + +@@ lore service repair ++ Add `--regenerate` to rewrite scheduler artifacts from manifest when spec drift is detected. ++ This is non-destructive and does not delete status/log history. +``` + +8. **Add safe operational modes: `install --dry-run` and `doctor --fix`** +Why this improves the plan: dry-run reduces risk before writing OS scheduler files; fix-mode improves operator ergonomics and lowers support burden. + +```diff +--- a/plan.md ++++ b/plan.md +@@ lore service install ++ Add `--dry-run`: ++ - validates config/token/prereqs ++ - renders service files and planned commands ++ - writes nothing, executes nothing + +@@ lore service doctor ++ Add `--fix` for safe, non-destructive remediations: ++ - create missing dirs ++ - correct file permissions on env/wrapper files ++ - run `systemctl --user daemon-reload` when applicable ++ - report applied fixes in robot output +``` + +9. **Define explicit schema migration behavior (not just `schema_version` fields)** +Why this improves the plan: version fields without migration policy become operational risk during upgrades. + +```diff +--- a/plan.md ++++ b/plan.md +@@ ServiceManifest Read/Write +- `ServiceManifest::read(path: &Path) -> Result, LoreError>` ++ `ServiceManifest::read_and_migrate(path: &Path) -> Result, LoreError>` ++ - Migrates known older schema versions to current in-memory model ++ - Rewrites migrated file atomically ++ - Fails with actionable `ServiceCorruptState` for unknown future major versions + +@@ SyncStatusFile Read/Write +- `SyncStatusFile::read(path: &Path) -> Result, LoreError>` ++ `SyncStatusFile::read_and_migrate(path: &Path) -> Result, LoreError>` +``` + +If you want, I can produce a fully rewritten v5 plan text that integrates all nine changes coherently section-by-section. \ No newline at end of file diff --git a/plans/lore-service.md b/plans/lore-service.md new file mode 100644 index 0000000..7a1ab13 --- /dev/null +++ b/plans/lore-service.md @@ -0,0 +1,3759 @@ +--- +plan: true +title: "" +status: iterating +iteration: 5 +target_iterations: 8 +beads_revision: 0 +related_plans: [] +created: 2026-02-09 +updated: 2026-02-11 +--- + +# Plan: `lore service` — OS-Native Scheduled Sync + +## Context + +`lore sync` runs a 4-stage pipeline (issues, MRs, docs, embeddings) that takes 2-4 minutes. Today it must be invoked manually. We want `lore service install` to set up OS-native scheduled execution automatically, with exponential backoff on failures, a circuit breaker for persistent transient errors, stage-aware outcome tracking, and a status file for observability. This is the first nested subcommand in the project. + +### Key Design Principles + +#### 1. Separation of Manual and Scheduled Execution + +`lore sync` remains the manual/operator command. It is never subject to backoff, pausing, or service-level policy. A separate hidden entrypoint — `lore service run --service-id ` — is what the OS scheduler actually invokes. This entrypoint applies service-specific policy (backoff, error classification, pipeline locking) before delegating to the sync pipeline. This separation ensures that a human running `lore sync` to debug or recover is never unexpectedly blocked by service state. The `--service-id` parameter ensures unambiguous manifest/status file selection when multiple services are installed. + +#### 2. Project-Scoped Service Identity + +Each installed service gets a unique `service_id` derived from a canonical identity tuple: the workspace root, config file path, and sorted GitLab project URLs. This composite fingerprint prevents collisions even when multiple workspaces share a single global config file — the identity represents *what* is being synced and *where*, not just the config location. The hash uses 12 hex characters (48 bits) for collision safety. An optional `--name` flag allows explicit naming for human readability; if `--name` collides with an existing service that has a different identity hash (different workspace/config/projects), install fails with an actionable error listing the conflict. + +#### 3. Stage-Aware Outcome Tracking + +The sync pipeline has stages of differing criticality. Issues and MRs are **core** — their failure constitutes a hard failure. Docs and embeddings are **optional** — their failure produces a **degraded** outcome but does not trigger backoff or pause. This ensures data freshness for the most important entities even when peripheral stages have transient problems. + +#### 4. Resilient Failure Handling + +Errors are classified as transient (retry with backoff) or permanent (pause until user intervention). A **circuit breaker** trips after a configurable number of consecutive transient failures (default: 10), transitioning to a `half_open` probe state after a cooldown period (default: 30 minutes). In `half_open`, one trial run is allowed — if it succeeds, the breaker closes automatically; if it fails, the breaker returns to `paused` state requiring manual `lore service resume`. This provides self-healing for systemic but recoverable failures (DNS outages, temporary GitLab maintenance) while still halting on truly persistent problems. + +#### 5. Transactional Install + +The install process is two-phase: service files are generated and the platform-specific enable command is run first. Only on success is the install manifest written atomically. If the enable command fails, generated files are cleaned up and no manifest is persisted. This prevents a false "installed" state when the scheduler rejects the service configuration. + +#### 6. Serialized Admin Mutations + +All commands that mutate service state (install, uninstall, pause, resume, repair) acquire an admin-level lock — `AppLock("service-admin-{service_id}")` — before reading or writing manifest/status files. This prevents races between concurrent admin commands (e.g., a user running `service pause` while an automated tool runs `service resume`). The admin lock is separate from the `sync_pipeline` lock, which guards the data pipeline. Legal state transitions: + +- `idle` -> `running` -> `success` | `degraded` | `backoff` | `paused` +- `backoff` -> `running` | `paused` +- `paused` -> `half_open` | `running` (via `resume`) +- `half_open` -> `running` | `paused` + +Any transition not in this table is rejected with `ServiceCorruptState`. The `service run` entrypoint does NOT acquire the admin lock — it only acquires the `sync_pipeline` lock to avoid overlapping data writes. + +--- + +## Commands & User Journeys + +### `lore service install [--interval 30m] [--profile balanced] [--token-source env-file] [--name ] [--dry-run]` + +**What it does:** Generates and installs an OS-native scheduled task that runs `lore --robot service run --service-id ` at the specified interval, with the chosen sync profile, token storage strategy, and a project-scoped identity to avoid collisions across workspaces. + +**User journey:** +1. User runs `lore service install --interval 15m --profile fast` +2. CLI loads config to read `gitlab.tokenEnvVar` (default: `GITLAB_TOKEN`) +3. CLI resolves the token value from the current environment +4. CLI computes or reads `service_id`: + - If `--name` is provided, use it (sanitized to `[a-z0-9-]`) + - Otherwise, derive from a composite fingerprint of (workspace root + config path + sorted project URLs) — first 12 hex chars of SHA-256 + - This becomes the suffix for all platform-specific identifiers (launchd label, systemd unit name, Windows task name) +5. CLI resolves its own binary path via `std::env::current_exe()?.canonicalize()?` +6. CLI writes the token to a user-private env file (`{data_dir}/service-env-{service_id}`, mode 0600) unless `--token-source embedded` is explicitly passed +7. CLI generates the platform-specific service files (referencing `lore --robot service run --service-id `, NOT `lore sync`) +8. CLI writes service files to disk +9. CLI runs the platform-specific enable command +10. On success: CLI writes install manifest atomically (tmp file + fsync(file) + rename + fsync(parent_dir)) to `{data_dir}/service-manifest-{service_id}.json` +11. On failure: CLI removes generated service files, env file, wrapper script, and temp manifest — returns `ServiceCommandFailed` with stderr context +12. CLI outputs success with details of what was installed + +**Sync profiles:** + +| Profile | Sync flags | Use case | +|---------|-----------|----------| +| `fast` | `--no-docs --no-embed` | Minimal: issues + MRs only | +| `balanced` (default) | `--no-embed` | Issues + MRs + doc generation | +| `full` | (none) | Full pipeline including embeddings | + +The profile determines what flags are passed to the underlying sync command. The scheduler invocation is always `lore --robot service run --service-id `, which reads the profile from the install manifest and constructs the appropriate sync flags. + +**Token storage strategies:** + +| Strategy | Behavior | Security | Platforms | +|----------|----------|----------|-----------| +| `env-file` (default) | Token written to `{data_dir}/service-env-{service_id}` with 0600 permissions. On Linux/systemd, referenced via `EnvironmentFile=` (true file-based loading). On macOS/launchd, a wrapper shell script (mode 0700) sources the env file at runtime and execs `lore` — the token never appears in the plist. | Token file only readable by owner. Canonical source is the env file; `lore service install` re-reads it on regeneration. | macOS, Linux | +| `embedded` | Token embedded directly in service file. Requires explicit `--token-source embedded` flag. CLI prints a security warning. | Less secure: token visible in plist/unit file. | macOS, Linux | + +On Windows, neither strategy applies — the token must be in the user's system environment (set via `setx` or system settings). `token_source` is reported as `"system_env"`. This is documented as a requirement in `lore service install` output on Windows. + +> **Note on macOS wrapper script approach:** launchd cannot natively load environment files. Rather than embedding the token directly in the plist (which would persist it in a readable XML file), we generate a small wrapper shell script (`{data_dir}/service-run-{service_id}.sh`, mode 0700) that sources the env file and execs `lore`. The plist's `ProgramArguments` points to the wrapper script, keeping the token out of the plist entirely. On Linux/systemd, `EnvironmentFile=` provides native file-based loading without any wrapper needed. +> +> **Future enhancement:** On macOS, Keychain integration could eliminate the env file entirely. On Windows, Credential Manager could replace the system environment requirement. These are deferred to a future iteration to avoid adding platform-specific secure store dependencies (`security-framework`, `winapi`) in v1. + +**Acceptance criteria:** +- Parses interval strings: `5m`, `15m`, `30m`, `1h`, `2h`, `12h`, `24h` +- Rejects intervals < 5 minutes or > 24 hours +- Rejects non-numeric or malformed intervals with clear error messages +- Computes `service_id` from composite fingerprint (workspace root + config path + project URLs) or `--name` flag; sanitizes to `[a-z0-9-]`. If `--name` collides with an existing service with a different identity hash, returns an actionable error. +- If already installed (manifest exists for this `service_id`): reads existing manifest. If config matches, reports `no_change: true`. If config differs, overwrites and reports what changed. +- If `GITLAB_TOKEN` (or configured env var) is not set, fails with `TokenNotSet` error +- If `current_exe()` fails, returns `ServiceError` +- Creates parent directories for service files if they don't exist +- Writes install manifest atomically (tmp file + fsync(file) + rename + fsync(parent_dir)) alongside service files +- Runs `service doctor` checks as a pre-flight: validates scheduler prerequisites (e.g., systemd user manager/linger on Linux, GUI session context on macOS) and surfaces warnings or errors before installing +- `--dry-run`: validates config/token/prereqs, renders service files and planned commands, but writes nothing and executes nothing. Robot output includes `"dry_run": true` and the rendered service file content for inspection. +- Robot mode outputs `{"ok":true,"data":{...},"meta":{"elapsed_ms":N}}` +- Human mode outputs a clear summary with file paths and next steps + +**Robot output:** +```json +{ + "ok": true, + "data": { + "platform": "launchd", + "service_id": "a1b2c3d4e5f6", + "interval_seconds": 900, + "profile": "fast", + "binary_path": "/usr/local/bin/lore", + "config_path": null, + "service_files": ["/Users/x/Library/LaunchAgents/com.gitlore.sync.a1b2c3d4e5f6.plist"], + "sync_command": "/usr/local/bin/lore --robot service run --service-id a1b2c3d4e5f6", + "token_env_var": "GITLAB_TOKEN", + "token_source": "env_file", + "no_change": false + }, + "meta": { "elapsed_ms": 42 } +} +``` + +**Human output:** +``` +Service installed: + Platform: launchd + Service ID: a1b2c3d4e5f6 + Interval: 15m (900s) + Profile: fast (--no-docs --no-embed) + Binary: /usr/local/bin/lore + Service: ~/Library/LaunchAgents/com.gitlore.sync.a1b2c3d4e5f6.plist + Command: lore --robot service run --service-id a1b2c3d4e5f6 + Token: stored in ~/.local/share/lore/service-env-a1b2c3d4e5f6 (0600) + + To rotate your token: lore service install +``` + +--- + +### `lore service list` + +**What it does:** Lists all installed services discovered from `{data_dir}/service-manifest-*.json` files. Useful when managing multiple gitlore workspaces to see all active installations at a glance. + +**User journey:** +1. User runs `lore service list` +2. CLI scans `{data_dir}` for files matching `service-manifest-*.json` +3. Reads each manifest and verifies platform state +4. Outputs summary of all installed services + +**Acceptance criteria:** +- Returns empty list (not error) when no services installed +- Shows `service_id`, `platform`, `interval`, `profile`, `installed_at_iso` for each +- Verifies platform state matches manifest (flags drift) +- Robot and human output modes + +**Robot output:** +```json +{ + "ok": true, + "data": { + "services": [ + { + "service_id": "a1b2c3d4e5f6", + "platform": "launchd", + "interval_seconds": 900, + "profile": "fast", + "installed_at_iso": "2026-02-09T10:00:00Z", + "platform_state": "loaded", + "drift": false + } + ] + }, + "meta": { "elapsed_ms": 15 } +} +``` + +**Human output:** +``` +Installed services: + a1b2c3d4e5f6 launchd 15m fast installed 2026-02-09 loaded +``` + +Or when none installed: +``` +No services installed. Run: lore service install +``` + +--- + +### `lore service uninstall [--service ] [--all]` + +**What it does:** Disables and removes the scheduled task, its manifest, and its token env file. + +**User journey:** +1. User runs `lore service uninstall` +2. CLI resolves target service: uses `--service` if provided, otherwise derives `service_id` from current project config. If multiple manifests exist and no selector is provided, returns an actionable error listing available services with `lore service list`. +3. If manifest doesn't exist, checks platform directly; if not installed, exits cleanly with informational message (exit 0, not an error) +4. Runs platform-specific disable command +5. Removes service files from disk +6. Removes install manifest (`service-manifest-{service_id}.json`) +7. Removes token env file (`service-env-{service_id}`) if it exists +8. Does NOT remove the status file or log files (those are operational data, not config) +9. Outputs confirmation + +**Acceptance criteria:** +- Idempotent: running when not installed is not an error +- Removes ALL service files (timer + service on systemd), the install manifest, and the token env file +- Does NOT remove the status file or log files (those are data, not config) +- If platform disable command fails (e.g., service was already unloaded), still removes files and succeeds +- Robot and human output modes + +**Robot output:** +```json +{ + "ok": true, + "data": { + "was_installed": true, + "service_id": "a1b2c3d4e5f6", + "platform": "launchd", + "removed_files": [ + "/Users/x/Library/LaunchAgents/com.gitlore.sync.a1b2c3d4e5f6.plist", + "/Users/x/.local/share/lore/service-manifest-a1b2c3d4e5f6.json", + "/Users/x/.local/share/lore/service-env-a1b2c3d4e5f6" + ] + }, + "meta": { "elapsed_ms": 15 } +} +``` + +**Human output:** +``` +Service uninstalled (a1b2c3d4e5f6): + Removed: ~/Library/LaunchAgents/com.gitlore.sync.a1b2c3d4e5f6.plist + Removed: ~/.local/share/lore/service-manifest-a1b2c3d4e5f6.json + Removed: ~/.local/share/lore/service-env-a1b2c3d4e5f6 + Kept: ~/.local/share/lore/sync-status-a1b2c3d4e5f6.json (run history) + Kept: ~/.local/share/lore/logs/ (service logs) +``` + +Or if not installed: +``` +Service is not installed. Nothing to do. +``` + +--- + +### `lore service status [--service ]` + +**What it does:** Shows install state, scheduler state (running/backoff/paused/half_open/idle), last sync result, recent run history, and next run estimate. Resolves target service via `--service` flag or current-project-derived default. + +**User journey:** +1. User runs `lore service status` +2. CLI resolves target service: uses `--service` if provided, otherwise derives `service_id` from current project config. If multiple manifests exist and no selector is provided, returns an actionable error listing available services with `lore service list`. +3. CLI reads install manifest from `{data_dir}/service-manifest-{service_id}.json` +4. If installed, verifies platform state matches manifest (detects drift) +5. Reads `{data_dir}/sync-status-{service_id}.json` for last sync and recent run history +6. Queries platform for service state and next run time +7. Computes scheduler state from status file + backoff logic +8. Outputs combined status + +**Scheduler states:** +- `idle` — installed but no runs yet +- `running` — currently executing (sync_pipeline lock held, `current_run` metadata present with recent `started_at_ms`) +- `running_stale` — `current_run` metadata exists but the process (by PID) is no longer alive, or `started_at_ms` is older than 30 minutes. Indicates a crashed or killed previous run. `lore service status` reports this with the stale run's start time and PID for diagnostics. +- `degraded` — last run completed but one or more optional stages failed (docs/embeddings). Core data (issues/MRs) is fresh. +- `backoff` — transient failures, waiting to retry +- `half_open` — circuit breaker cooldown expired; one probe run is allowed. If it succeeds, the breaker closes automatically and state returns to normal. If it fails, state transitions to `paused`. +- `paused` — permanent error detected (bad token, config error) OR circuit breaker tripped and probe failed. Requires user intervention via `lore service resume`. +- `not_installed` — service not installed + +**Acceptance criteria:** +- Works even if service is not installed (shows `installed: false`, `scheduler_state: "not_installed"`) +- Works even if status file doesn't exist (shows `last_sync: null`) +- Shows backoff state with remaining time if in backoff +- Shows paused reason if in paused state +- Includes recent runs summary (last 5 runs) +- Shows next scheduled run if determinable from platform +- Detects drift at multiple levels: + - **Platform drift:** loaded/unloaded mismatch between manifest and OS scheduler + - **Spec drift:** SHA-256 hash of service file content on disk doesn't match `spec_hash` in manifest (detects manual edits to plist/unit files) + - **Command drift:** sync command in service file differs from manifest's `sync_command` +- Exit code 0 always (status is informational) + +**Robot output:** +```json +{ + "ok": true, + "data": { + "installed": true, + "service_id": "a1b2c3d4e5f6", + "platform": "launchd", + "interval_seconds": 1800, + "profile": "balanced", + "service_state": "loaded", + "scheduler_state": "running", + "last_sync": { + "timestamp_iso": "2026-02-09T10:30:00.000Z", + "duration_seconds": 12.5, + "outcome": "success", + "stage_results": [ + { "stage": "issues", "success": true, "items_updated": 5 }, + { "stage": "mrs", "success": true, "items_updated": 3 }, + { "stage": "docs", "success": true, "items_updated": 12 } + ], + "consecutive_failures": 0 + }, + "recent_runs": [ + { "timestamp_iso": "2026-02-09T10:30:00Z", "outcome": "success", "duration_seconds": 12.5 }, + { "timestamp_iso": "2026-02-09T10:00:00Z", "outcome": "success", "duration_seconds": 11.8 } + ], + "backoff": null, + "paused_reason": null, + "drift": { + "platform_drift": false, + "spec_drift": false, + "command_drift": false + } + }, + "meta": { "elapsed_ms": 15 } +} +``` + +When degraded (optional stages failed): +```json +"scheduler_state": "degraded", +"last_sync": { + "outcome": "degraded", + "stage_results": [ + { "stage": "issues", "success": true, "items_updated": 5 }, + { "stage": "mrs", "success": true, "items_updated": 3 }, + { "stage": "docs", "success": false, "error": "I/O error writing documents" } + ] +} +``` + +When in backoff: +```json +"scheduler_state": "backoff", +"backoff": { + "consecutive_failures": 3, + "next_retry_iso": "2026-02-09T14:30:00.000Z", + "remaining_seconds": 7200 +} +``` + +When paused (permanent error): +```json +"scheduler_state": "paused", +"paused_reason": "AUTH_FAILED: GitLab returned 401 Unauthorized. Run: lore service resume" +``` + +When paused (circuit breaker): +```json +"scheduler_state": "paused", +"paused_reason": "CIRCUIT_BREAKER: 10 consecutive transient failures (last: NetworkError). Run: lore service resume" +``` + +When in half-open (circuit breaker cooldown expired, probe pending): +```json +"scheduler_state": "half_open", +"backoff": { + "consecutive_failures": 10, + "circuit_breaker_cooldown_expired": true, + "message": "Circuit breaker cooldown expired. Next run will be a probe attempt." +} +``` + +**Human output:** +``` +Service status (a1b2c3d4e5f6): + Installed: yes + Platform: launchd + Interval: 30m (1800s) + Profile: balanced + State: loaded + Scheduler: running + +Last sync: + Time: 2026-02-09 10:30:00 UTC + Duration: 12.5s + Outcome: success + Stages: issues (5), mrs (3), docs (12) + Failures: 0 consecutive + +Recent runs (last 5): + 10:30 UTC success 12.5s + 10:00 UTC success 11.8s +``` + +When degraded: +``` + Scheduler: DEGRADED + Core stages OK: issues (5), mrs (3) + Failed stages: docs (I/O error writing documents) + Core data is fresh. Optional stages will retry next run. +``` + +When paused (permanent error): +``` + Scheduler: PAUSED - AUTH_FAILED + GitLab returned 401 Unauthorized + Fix: rotate token, then run: lore service resume +``` + +When paused (circuit breaker): +``` + Scheduler: PAUSED - CIRCUIT_BREAKER + 10 consecutive transient failures (last: NetworkError) + Fix: check network/GitLab availability, then run: lore service resume +``` + +When half-open (circuit breaker cooldown expired): +``` + Scheduler: HALF_OPEN + Circuit breaker cooldown expired. Next run will probe. + If probe succeeds, scheduler returns to normal. +``` + +--- + +### `lore service logs [--tail ] [--follow] [--open] [--service ]` + +**What it does:** Displays or streams the service log file. By default, prints the last 100 lines to stdout. With `--tail `, shows the last N lines. With `--follow`, streams new lines as they arrive (like `tail -f`). With `--open`, opens in the user's preferred editor. + +**User journey (default):** +1. User runs `lore service logs` +2. CLI determines log path: `{data_dir}/logs/service-{service_id}-stderr.log` +3. CLI checks if file exists; if not, outputs "No log file found yet" with the expected path +4. Prints last 100 lines to stdout + +**User journey (--open):** +1. User runs `lore service logs --open` +2. CLI determines editor: `$VISUAL` -> `$EDITOR` -> `less` (Unix) / `notepad` (Windows) +3. Spawns editor as child process, waits for exit +4. Exits with editor's exit code + +**User journey (--tail / --follow):** +1. User runs `lore service logs --tail 50` or `lore service logs --follow` +2. CLI reads the last N lines or streams with follow +3. Outputs directly to stdout + +**Log rotation:** Rotate `service-{service_id}-stdout.log` and `service-{service_id}-stderr.log` at 10 MB, keeping 5 rotated files. Rotation is checked at 10 MB, not at every write. This avoids creating many small files and prevents log file explosion. + +**Acceptance criteria:** +- Default (no flags): prints last 100 lines to stdout +- `--open`: Falls back through `VISUAL` -> `EDITOR` -> `less` -> `notepad`. If no editor and no `less` available, returns `ServiceError` with suggestion. +- `--tail ` shows last N lines (default 100 if no value), exits immediately +- `--follow` streams new log lines until Ctrl-C (like `tail -f`); mutually exclusive with `--open` +- `--tail` and `--follow` can be combined: show last N lines then follow +- In robot mode, outputs the log file path and optionally last N lines as JSON (never opens editor) + +**Robot output (does not open editor):** +```json +{ + "ok": true, + "data": { + "log_path": "/Users/x/.local/share/lore/logs/service-a1b2c3d4e5f6-stderr.log", + "exists": true, + "size_bytes": 4096, + "last_lines": ["2026-02-09T10:30:00Z sync completed in 12.5s", "..."] + }, + "meta": { "elapsed_ms": 1 } +} +``` + +The `last_lines` field is included when `--tail` is specified in robot mode (capped at 100 lines to avoid bloated JSON). Without `--tail`, only path metadata is returned. `--follow` is not supported in robot mode (returns error: "follow mode requires interactive terminal"). + +--- + +### `lore service doctor` + +**What it does:** Validates that the service environment is healthy: scheduler prerequisites, token validity, file permissions, config accessibility, and platform-specific readiness. + +**User journey:** +1. User runs `lore service doctor` (or it runs automatically as a pre-flight during `service install`) +2. CLI runs a series of diagnostic checks and reports pass/warn/fail for each + +**Diagnostic checks:** +1. **Config accessible** — Can load and parse `config.json` +2. **Token present** — Configured env var is set and non-empty +3. **Token valid** — Quick auth test against GitLab API (optional, skipped with `--offline`) +4. **Binary path** — `current_exe()` resolves and is executable +5. **Data directory** — Writable by current user +6. **Platform prerequisites:** + - **macOS:** Running in a GUI login session (launchd bootstrap domain is `gui/{uid}`, not `system`) + - **Linux:** `systemctl --user` is available; user manager is running; `loginctl enable-linger` is active (required for timers to fire when user is not logged in) + - **Windows:** `schtasks` is available +7. **Existing install** — If manifest exists, verify platform state matches (drift detection) + +**Acceptance criteria:** +- Each check reports: `pass`, `warn`, or `fail` +- Warnings are non-blocking (e.g., linger not enabled — timer works when logged in but not on reboot) +- Failures are blocking for `service install` (install aborts with actionable message) +- `--offline` skips network checks (token validation) +- `--fix` attempts safe, non-destructive remediations for fixable issues: create missing directories, correct file permissions on env/wrapper files (0600/0700), run `systemctl --user daemon-reload` when applicable. Reports each applied fix in the output. Does NOT attempt fixes that could cause data loss. +- Exit code: 0 if all pass/warn, non-zero if any fail + +**Robot output:** +```json +{ + "ok": true, + "data": { + "checks": [ + { "name": "config", "status": "pass" }, + { "name": "token_present", "status": "pass" }, + { "name": "token_valid", "status": "pass" }, + { "name": "binary_path", "status": "pass" }, + { "name": "data_directory", "status": "pass" }, + { "name": "platform_prerequisites", "status": "warn", "message": "loginctl linger not enabled; timer will not fire on reboot without active session", "action": "loginctl enable-linger $(whoami)" }, + { "name": "install_state", "status": "pass" } + ], + "overall": "warn" + }, + "meta": { "elapsed_ms": 850 } +} +``` + +**Human output:** +``` +Service doctor: + [PASS] Config loaded from ~/.config/lore/config.json + [PASS] GITLAB_TOKEN is set + [PASS] GitLab authentication successful + [PASS] Binary: /usr/local/bin/lore + [PASS] Data dir: ~/.local/share/lore/ (writable) + [WARN] loginctl linger not enabled + Timer will not fire on reboot without active session + Fix: loginctl enable-linger $(whoami) + [PASS] No existing install detected + + Overall: WARN (1 warning) +``` + +--- + +### `lore service run` (hidden/internal) + +**What it does:** Executes one scheduled sync attempt with full service-level policy. This is the command the OS scheduler actually invokes — users should never need to call it directly. + +**Invocation by scheduler:** `lore --robot service run --service-id ` + +**Execution flow:** +1. Read install manifest for the given `service_id` to determine profile, interval, and circuit breaker config +2. Read status file (service-scoped) +3. If paused (not half_open): check if circuit breaker cooldown has expired. If cooldown expired, transition to `half_open` and allow probe (continue to step 5). If cooldown still active or paused for permanent error, log reason, write status, exit 0. +4. If in backoff window: log skip reason, write status, exit 0 +5. Acquire `sync_pipeline` `AppLock` (prevents overlap with manual sync or another scheduled run) +6. If lock acquisition fails (another sync running): log, exit 0 +7. Execute sync pipeline with flags derived from profile +8. On success: reset `consecutive_failures` to 0, write status, release lock +9. On transient failure: increment `consecutive_failures`, compute next backoff, write status, release lock +10. On permanent failure: set `paused_reason`, write status, release lock + +**Stage-aware execution:** + +The sync pipeline is executed stage-by-stage, with each stage's outcome recorded independently: + +| Stage | Criticality | Failure behavior | In-run retry | +|-------|-------------|-----------------|--------------| +| `issues` | **core** | Hard failure — triggers backoff/pause | 1 retry on transient errors (1-5s jittered delay) | +| `mrs` | **core** | Hard failure — triggers backoff/pause | 1 retry on transient errors (1-5s jittered delay) | +| `docs` | **optional** | Degraded outcome — logged but does not trigger backoff | No retry (best-effort) | +| `embeddings` | **optional** | Degraded outcome — logged but does not trigger backoff | No retry (best-effort) | + +**In-run retries for core stages:** Before counting a core stage failure toward backoff/circuit-breaker, the service runner retries the stage once with a jittered delay of 1-5 seconds. This absorbs transient network blips (DNS hiccups, momentary 5xx responses) without extending run duration significantly. Only transient errors are retried — permanent errors (bad token, config errors) are never retried. If the retry succeeds, the stage is recorded as successful. If both attempts fail, the final error is used for classification. This significantly reduces false backoff triggers from brief network interruptions. + +If all core stages succeed (potentially after retry) but optional stages fail, the run outcome is `"degraded"` — consecutive failures are NOT incremented, and the scheduler state reflects `degraded` rather than `backoff`. This ensures data freshness for the most important entities even when peripheral stages have transient problems. + +**Transient vs permanent error classification:** + +| Error type | Classification | Examples | +|-----------|---------------|----------| +| Transient | Retry with backoff | Network timeout, DB locked, 5xx from GitLab | +| Transient (hinted) | Respect server retry hint | Rate limited with `Retry-After` or `X-RateLimit-Reset` header | +| Permanent | Pause until user action | 401 Unauthorized (bad token), config not found, config invalid, migration failed | + +The classification is determined by the `ErrorCode` of the underlying `LoreError`: +- Permanent: `TokenNotSet`, `AuthFailed`, `ConfigNotFound`, `ConfigInvalid`, `MigrationFailed` +- Transient: everything else (`NetworkError`, `RateLimited`, `DbLocked`, `DbError`, `InternalError`, etc.) + +# Key design decisions: +- `next_retry_at_ms` is computed once on failure and persisted — `service status` simply reads it for stable, consistent display +- **Retry-After awareness:** If a transient error includes a server-provided retry hint (e.g., `Retry-After` header on 429 responses, `X-RateLimit-Reset` on GitLab rate limits), the backoff is set to `max(computed_backoff, hinted_retry_at)`. This prevents useless retries during rate-limit windows and respects GitLab's guidance. The `backoff_reason` field (if present) indicates whether the backoff was server-hinted. +- Backoff base is the configured interval, not a hardcoded 1800s — a user with `--interval 5m` gets shorter backoffs than one with `--interval 1h` +- Optional stage failures produce `degraded` outcome without triggering backoff +- Respects backoff window from previous failures (reads `next_retry_at_ms` from status file) +- Pauses on permanent errors instead of burning retries +- Trips circuit breaker after 10 consecutive transient failures +- Exit code is always 0 (the scheduler should not interpret exit codes as retry signals — lore manages its own retry logic) + +**Circuit breaker (with half-open recovery):** + +After `max_transient_failures` consecutive transient failures (default: 10), the service transitions to `paused` state with reason `CIRCUIT_BREAKER`. However, instead of requiring manual intervention forever, the circuit breaker enters a `half_open` state after a cooldown period (`circuit_breaker_cooldown_seconds`, default: 1800 = 30 minutes). + +In `half_open`, the next `service run` invocation is allowed to proceed as a **probe**: +- If the probe succeeds or returns `degraded`, the circuit breaker closes automatically: `consecutive_failures` resets to 0, `paused_reason` is cleared, and normal operation resumes. +- If the probe fails, the circuit breaker returns to `paused` state with an updated `circuit_breaker_paused_at_ms` timestamp, starting another cooldown period. + +This provides self-healing for recoverable systemic failures (DNS outages, GitLab maintenance windows) without requiring manual `lore service resume` for every transient hiccup. Truly persistent problems (bad token, config corruption) are caught by the permanent error classifier and go directly to `paused` without the half-open mechanism. + +The `circuit_breaker_cooldown_seconds` is stored in the manifest alongside `max_transient_failures`. Both are hardcoded defaults for v1 (10 failures, 30-minute cooldown) but can be made configurable in a future iteration. + +**Acceptance criteria:** +- Hidden from `--help` (use `#[command(hide = true)]`) +- Always runs in robot mode regardless of `--robot` flag +- Acquires pipeline-level lock before executing sync +- Executes stages independently and records per-stage outcomes +- Retries transient core stage failures once (1-5s jittered delay) before counting as failed +- Permanent core stage errors are never retried — immediate pause +- Classifies core stage errors as transient or permanent +- Optional stage failures produce `degraded` outcome without triggering backoff +- Respects backoff window from previous failures (reads `next_retry_at_ms` from status file) +- Pauses on permanent errors instead of burning retries +- Trips circuit breaker after 10 consecutive transient failures +- Exit code is always 0 (the scheduler should not interpret exit codes as retry signals — lore manages its own retry logic) + +**Robot output (success):** +```json +{ + "ok": true, + "data": { + "action": "sync_completed", + "outcome": "success", + "profile": "balanced", + "duration_seconds": 45.2, + "stage_results": [ + { "stage": "issues", "success": true, "items_updated": 12 }, + { "stage": "mrs", "success": true, "items_updated": 4 }, + { "stage": "docs", "success": true, "items_updated": 28 } + ], + "consecutive_failures": 0 + }, + "meta": { "elapsed_ms": 45200 } +} +``` + +**Robot output (degraded — optional stages failed):** +```json +{ + "ok": true, + "data": { + "action": "sync_completed", + "outcome": "degraded", + "profile": "full", + "duration_seconds": 38.1, + "stage_results": [ + { "stage": "issues", "success": true, "items_updated": 12 }, + { "stage": "mrs", "success": true, "items_updated": 4 }, + { "stage": "docs", "success": true, "items_updated": 28 }, + { "stage": "embeddings", "success": false, "error": "Ollama unavailable" } + ], + "consecutive_failures": 0 + }, + "meta": { "elapsed_ms": 38100 } +} +``` + +**Robot output (skipped — backoff):** +```json +{ + "ok": true, + "data": { + "action": "skipped", + "reason": "backoff", + "consecutive_failures": 3, + "next_retry_iso": "2026-02-09T14:30:00.000Z", + "remaining_seconds": 1842 + }, + "meta": { "elapsed_ms": 1 } +} +``` + +**Robot output (paused — permanent error):** +```json +{ + "ok": true, + "data": { + "action": "paused", + "reason": "AUTH_FAILED", + "message": "GitLab returned 401 Unauthorized", + "suggestion": "Rotate token, then run: lore service resume" + }, + "meta": { "elapsed_ms": 1200 } +} +``` + +**Robot output (paused — circuit breaker):** +```json +{ + "ok": true, + "data": { + "action": "paused", + "reason": "CIRCUIT_BREAKER", + "message": "10 consecutive transient failures (last: NetworkError: connection refused)", + "consecutive_failures": 10, + "suggestion": "Check network/GitLab availability, then run: lore service resume" + }, + "meta": { "elapsed_ms": 1200 } +} +``` + +--- + +### `lore service resume [--service ]` + +**What it does:** Clears the paused state (including half-open circuit breaker) and resets consecutive failures, allowing the scheduler to retry on the next interval. + +**User journey:** +1. User sees `lore service status` reports `scheduler: PAUSED` +2. User fixes the underlying issue (rotates token, fixes config, etc.) +3. User runs `lore service resume` +4. CLI resets `consecutive_failures` to 0, clears `paused_reason` and `last_error_*` fields +5. Next scheduled `service run` will attempt sync normally + +**Acceptance criteria:** +- If not paused, exits cleanly with informational message ("Service is not paused") +- If not installed, exits cleanly with informational message ("Service is not installed") +- Does NOT trigger an immediate sync (just clears state — scheduler handles the next run) +- Robot and human output modes + +**Robot output:** +```json +{ + "ok": true, + "data": { + "was_paused": true, + "previous_reason": "AUTH_FAILED", + "consecutive_failures_cleared": 5 + }, + "meta": { "elapsed_ms": 2 } +} +``` + +**Human output:** +``` +Service resumed: + Previous state: PAUSED (AUTH_FAILED) + Failures cleared: 5 + Next sync will run at the scheduled interval. +``` + +Or for circuit breaker: +``` +Service resumed: + Previous state: PAUSED (CIRCUIT_BREAKER, 10 transient failures) + Failures cleared: 10 + Next sync will run at the scheduled interval. +``` + +--- + +### `lore service pause [--reason ] [--service ]` + +**What it does:** Pauses scheduled execution without uninstalling the service. Useful for maintenance windows, debugging, or temporarily stopping syncs while the underlying infrastructure is being modified. + +**User journey:** +1. User runs `lore service pause --reason "GitLab maintenance window"` +2. CLI writes `paused_reason` to the status file with the provided reason (or "Manually paused" if no reason given) +3. Next `service run` will see the paused state and exit immediately + +**Acceptance criteria:** +- Sets `paused_reason` in the status file +- Does NOT modify the OS scheduler (service remains installed and scheduled — it just no-ops) +- If already paused, updates the reason and reports `already_paused: true` +- `lore service resume` clears the pause (same as for other paused states) +- Robot and human output modes + +**Robot output:** +```json +{ + "ok": true, + "data": { + "service_id": "a1b2c3d4e5f6", + "paused": true, + "reason": "GitLab maintenance window", + "already_paused": false + }, + "meta": { "elapsed_ms": 2 } +} +``` + +**Human output:** +``` +Service paused (a1b2c3d4e5f6): + Reason: GitLab maintenance window + Resume with: lore service resume +``` + +--- + +### `lore service trigger [--ignore-backoff] [--service ]` + +**What it does:** Triggers an immediate one-off sync using the installed service profile and policy. Unlike running `lore sync` manually, this goes through the service policy layer (status file, stage-aware outcomes, error classification) — giving you the same behavior the scheduler would produce, but on-demand. + +**User journey:** +1. User runs `lore service trigger` +2. CLI reads the manifest to determine profile +3. By default, respects current backoff/paused state (reports skip reason if blocked) +4. With `--ignore-backoff`, bypasses backoff window (but NOT paused state — use `resume` for that) +5. Executes `handle_service_run` logic +6. Updates status file with the run result + +**Acceptance criteria:** +- Uses the installed profile from the manifest +- Default: respects backoff and paused states +- `--ignore-backoff`: bypasses backoff window, still respects paused +- If not installed, returns actionable error +- Robot and human output modes (same format as `service run` output) + +--- + +### `lore service repair [--service ]` + +**What it does:** Repairs corrupt manifest or status files by backing them up and reinitializing. This is a safe alternative to manually deleting files and reinstalling. + +**User journey:** +1. User runs `lore service repair` (typically after seeing `ServiceCorruptState` errors) +2. CLI checks manifest and status files for JSON parseability +3. If corrupt: renames the corrupt file to `{name}.corrupt.{timestamp}` (backup, not delete) +4. Reinitializes the status file to default state +5. If manifest is corrupt, reports that reinstallation is needed +6. Outputs what was repaired + +**Acceptance criteria:** +- Never deletes files — backs up corrupt files with `.corrupt.{timestamp}` suffix +- If both files are valid, reports "No repair needed" (exit 0) +- If manifest is corrupt, clears it and advises `lore service install` +- If status file is corrupt, reinitializes to default +- Robot and human output modes + +**Robot output:** +```json +{ + "ok": true, + "data": { + "repaired": true, + "actions": [ + { "file": "sync-status-a1b2c3d4e5f6.json", "action": "reinitialized", "backup": "sync-status-a1b2c3d4e5f6.json.corrupt.1707480000" } + ], + "needs_reinstall": false + }, + "meta": { "elapsed_ms": 5 } +} +``` + +**Human output:** +``` +Service repaired (a1b2c3d4e5f6): + Reinitialized: sync-status-a1b2c3d4e5f6.json + Backed up: sync-status-a1b2c3d4e5f6.json.corrupt.1707480000 +``` + +--- + +## Install Manifest + +### Location +`{get_data_dir()}/service-manifest-{service_id}.json` — e.g., `~/.local/share/lore/service-manifest-a1b2c3d4e5f6.json` + +### Purpose +Avoids brittle parsing of platform-specific files (plist XML, systemd units) to recover install configuration. `service status` reads the manifest first, then verifies platform state matches. The `service_id` suffix enables multiple coexisting installations for different workspaces. + +### Schema +```rust +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ServiceManifest { + /// Schema version for forward compatibility (start at 1) + pub schema_version: u32, + /// Stable identity for this service installation + pub service_id: String, + /// Canonical workspace root used in identity derivation + pub workspace_root: String, + /// When the service was first installed + pub installed_at_iso: String, + /// When the manifest was last written + pub updated_at_iso: String, + /// Platform backend + pub platform: String, + /// Configured interval in seconds + pub interval_seconds: u64, + /// Sync profile (fast/balanced/full) + pub profile: String, + /// Absolute path to the lore binary + pub binary_path: String, + /// Optional config path override + #[serde(skip_serializing_if = "Option::is_none")] + pub config_path: Option, + /// How the token is stored + pub token_source: String, + /// Token environment variable name + pub token_env_var: String, + /// Paths to generated service files + pub service_files: Vec, + /// The exact command the scheduler runs + pub sync_command: String, + /// Circuit breaker threshold (consecutive transient failures before pause) + pub max_transient_failures: u32, + /// Cooldown period before circuit breaker enters half-open probe state (seconds) + pub circuit_breaker_cooldown_seconds: u64, + /// SHA-256 hash of generated scheduler artifacts (plist/unit/wrapper content). + /// Used for spec-level drift detection: if file content on disk doesn't match + /// this hash, something external modified the service files. + pub spec_hash: String, +} +``` + +### `service_id` derivation + +```rust +/// Compute a stable service ID from a canonical identity tuple: +/// (workspace_root + config_path + sorted project URLs). +/// +/// This avoids collisions when multiple workspaces share one global config +/// by incorporating what is being synced (project URLs) and where the workspace +/// lives alongside the config location. +/// Returns first 12 hex chars of SHA-256 (48 bits — collision-safe for local use). +pub fn compute_service_id(workspace_root: &Path, config_path: &Path, project_urls: &[&str]) -> String { + use sha2::{Sha256, Digest}; + let canonical_config = config_path.canonicalize() + .unwrap_or_else(|_| config_path.to_path_buf()); + let canonical_workspace = workspace_root.canonicalize() + .unwrap_or_else(|_| workspace_root.to_path_buf()); + let mut hasher = Sha256::new(); + hasher.update(canonical_workspace.to_string_lossy().as_bytes()); + hasher.update(b"\0"); + hasher.update(canonical_config.to_string_lossy().as_bytes()); + // Sort URLs for determinism regardless of config ordering + let mut urls: Vec<&str> = project_urls.to_vec(); + urls.sort_unstable(); + for url in &urls { + hasher.update(b"\0"); // separator to prevent concatenation collisions + hasher.update(url.as_bytes()); + } + let hash = hasher.finalize(); + hex::encode(&hash[..6]) // 12 hex chars +} + +/// Sanitize a user-provided name to [a-z0-9-], max 32 chars. +pub fn sanitize_service_name(name: &str) -> Result { + let sanitized: String = name.to_lowercase() + .chars() + .map(|c| if c.is_ascii_alphanumeric() || c == '-' { c } else { '-' }) + .collect(); + let trimmed = sanitized.trim_matches('-').to_string(); + if trimmed.is_empty() { + return Err("Service name must contain at least one alphanumeric character".into()); + } + if trimmed.len() > 32 { + return Err("Service name must be 32 characters or fewer".into()); + } + Ok(trimmed) +} +``` + +### Read/Write +- `ServiceManifest::read(path: &Path) -> Result, LoreError>` — returns `Ok(None)` if file doesn't exist, `Err` if file exists but is corrupt/unparseable (distinguishes missing from corrupt). **Schema migration:** If the file has `schema_version < CURRENT_VERSION`, the read method migrates the in-memory model to the current version (adding default values for new fields) and atomically rewrites the file. If the file has an unknown future `schema_version` (higher than current), it returns `Err(ServiceCorruptState)` with an actionable message to update `lore`. +- `ServiceManifest::write_atomic(&self, path: &Path) -> std::io::Result<()>` — writes to tmp file in same directory, fsyncs, then renames over target. Creates parent dirs if needed. +- Written by `service install`, read by `service status`, `service run`, `service uninstall` +- `service uninstall` removes the manifest file + +--- + +## Status File + +### Location +`{get_data_dir()}/sync-status-{service_id}.json` — e.g., `~/.local/share/lore/sync-status-a1b2c3d4e5f6.json` + +Add `get_service_status_path(service_id: &str)` to `src/core/paths.rs`. + +**Service-scoped status:** Each installed service gets its own status file, keyed by `service_id`. This prevents cross-service contamination — a `fast` profile service pausing due to transient errors should not affect a `full` profile service's state. The pipeline lock remains global (`sync_pipeline`) to prevent overlapping writes to the shared database. + +### Schema +```rust +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SyncStatusFile { + /// Schema version for forward compatibility (start at 1) + pub schema_version: u32, + /// When this status file was last written + pub updated_at_iso: String, + /// Most recent run result (None if no runs yet — matches idle state) + #[serde(skip_serializing_if = "Option::is_none")] + pub last_run: Option, + /// Rolling window of recent runs (last 10, newest first) + #[serde(default)] + pub recent_runs: Vec, + /// Count of consecutive failures (resets to 0 on success or degraded outcome) + pub consecutive_failures: u32, + /// Persisted next retry time (set on failure, cleared on success/resume). + /// Computed once at failure time with jitter, then read-only comparison afterward. + /// This avoids recomputing jitter on every status check. + #[serde(skip_serializing_if = "Option::is_none")] + pub next_retry_at_ms: Option, + /// If set, service is paused due to a permanent error or circuit breaker + #[serde(skip_serializing_if = "Option::is_none")] + pub paused_reason: Option, + /// Timestamp when circuit breaker entered paused state (for cooldown calculation) + #[serde(skip_serializing_if = "Option::is_none")] + pub circuit_breaker_paused_at_ms: Option, + /// Error code that caused the pause (for machine consumption) + #[serde(skip_serializing_if = "Option::is_none")] + pub last_error_code: Option, + /// Error message from last failure + #[serde(skip_serializing_if = "Option::is_none")] + pub last_error_message: Option, + /// In-flight run metadata for crash/stale detection. Written to the status file at run start, + /// cleared on completion (success or failure). If present when a new run starts, the previous + /// run crashed or was killed. + #[serde(skip_serializing_if = "Option::is_none")] + pub current_run: Option, +} + +/// Metadata for an in-flight sync run. Used to detect stale/crashed runs. +/// Written to the status file at run start, cleared on completion (success or failure). +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct CurrentRunState { + /// Unix timestamp (ms) when this run started + pub started_at_ms: i64, + /// PID of the process executing this run + pub pid: u32, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SyncRunRecord { + /// ISO-8601 timestamp of this sync run + pub timestamp_iso: String, + /// Unix timestamp in milliseconds + pub timestamp_ms: i64, + /// How long the sync took + pub duration_seconds: f64, + /// Run outcome: "success", "degraded", or "failed" + pub outcome: String, + /// Per-stage results (only present in detailed records, not in recent_runs summary) + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub stage_results: Vec, + /// Error message if sync failed (None on success/degraded) + #[serde(skip_serializing_if = "Option::is_none")] + pub error_message: Option, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct StageResult { + /// Stage name: "issues", "mrs", "docs", "embeddings" + pub stage: String, + /// Whether this stage completed successfully + pub success: bool, + /// Number of items created/updated (0 on failure) + #[serde(default)] + pub items_updated: usize, + /// Error message if stage failed + #[serde(skip_serializing_if = "Option::is_none")] + pub error: Option, + /// Machine-readable error code from the underlying LoreError (e.g., "AUTH_FAILED", "NETWORK_ERROR"). + /// Propagated through the stage execution layer for reliable error classification. + /// Falls back to string matching on `error` field when not available. + #[serde(skip_serializing_if = "Option::is_none")] + pub error_code: Option, +} +``` + +### Read/Write +- `SyncStatusFile::read(path: &Path) -> Result, LoreError>` — returns `Ok(None)` if file doesn't exist, `Err` if file exists but is corrupt/unparseable (distinguishes missing from corrupt — a corrupt status file is a warning, not fatal). **Schema migration:** Same behavior as `ServiceManifest::read` — migrates older versions to current, rejects unknown future versions. +- `SyncStatusFile::write_atomic(&self, path: &Path) -> std::io::Result<()>` — writes to tmp file in same directory, fsyncs, then renames over target. Creates parent dirs if needed. Atomic writes prevent truncated JSON from crashes during write. +- `SyncStatusFile::record_run(&mut self, run: SyncRunRecord)` — pushes to `recent_runs` (capped at 10), updates `last_run` +- `SyncStatusFile::clear_paused(&mut self)` — clears `paused_reason`, `circuit_breaker_paused_at_ms`, `last_error_*`, `next_retry_at_ms`, resets `consecutive_failures` +- File is **NOT** a fatal error source — if write fails, log a warning and continue (sync result matters more than recording it) + +### Backoff Logic + +Backoff applies **only** to transient errors and **only** within `service run`. Manual `lore sync` is never subject to backoff. Permanent errors bypass backoff entirely and enter the paused state. + +**Key design change (from feedback):** Instead of recomputing jitter on every `service status` / `service run` check, we compute `next_retry_at_ms` **once** at failure time and persist it. This makes status output stable, avoids predictable jitter from timestamp-seeded determinism, and simplifies the read path to a single comparison. + +```rust +/// Injectable time source for deterministic testing. +pub trait Clock: Send + Sync { + fn now_ms(&self) -> i64; +} + +/// Production clock using chrono. +pub struct SystemClock; +impl Clock for SystemClock { + fn now_ms(&self) -> i64 { + chrono::Utc::now().timestamp_millis() + } +} + +/// Injectable RNG for deterministic jitter tests. +pub trait JitterRng: Send + Sync { + /// Returns a value in [0.0, 1.0) + fn next_f64(&mut self) -> f64; +} + +/// Production RNG using thread_rng. +pub struct ThreadJitterRng; +impl JitterRng for ThreadJitterRng { + fn next_f64(&mut self) -> f64 { + use rand::Rng; + rand::thread_rng().gen() + } +} + +impl SyncStatusFile { + /// Check if we're still in a backoff window. + /// Returns None if sync should proceed. + /// Returns Some(remaining_seconds) if within backoff window. + /// Reads the persisted `next_retry_at_ms` — no jitter computation on the read path. + pub fn backoff_remaining(&self, clock: &dyn Clock) -> Option { + // Paused state is handled separately (not via backoff) + if self.paused_reason.is_some() { + return None; // caller checks paused_reason directly + } + + if self.consecutive_failures == 0 { + return None; + } + + let next_retry = self.next_retry_at_ms?; + let now_ms = clock.now_ms(); + + if now_ms < next_retry { + Some(((next_retry - now_ms) / 1000) as u64) + } else { + None // backoff expired, proceed + } + } + + /// Compute and set next_retry_at_ms after a transient failure. + /// Called once at failure time — jitter is applied here, not on reads. + /// Uses the *configured* interval as the backoff base (not a hardcoded value). + /// If the server provided a retry hint (e.g., Retry-After header), it is + /// respected as a floor: next_retry_at_ms = max(computed_backoff, hint). + pub fn set_backoff( + &mut self, + base_interval_seconds: u64, + clock: &dyn Clock, + rng: &mut dyn JitterRng, + retry_after_ms: Option, + ) { + let exponent = (self.consecutive_failures - 1).min(20); // prevent overflow + let base_backoff = (base_interval_seconds as u128) + .saturating_mul(1u128 << exponent) + .min(4 * 3600) as u64; // cap at 4 hours + + // Full jitter: uniform random in [base_interval..cap] + // This decorrelates retries across multiple installations while ensuring + // the minimum backoff is always at least the configured interval. + let jitter_factor = rng.next_f64(); // 0.0..1.0 + let min_backoff = base_interval_seconds; + let span = base_backoff.saturating_sub(min_backoff); + let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64; + + let computed_retry_at = clock.now_ms() + (backoff_secs as i64 * 1000); + + // Respect server-provided retry hint as a floor + self.next_retry_at_ms = Some(match retry_after_ms { + Some(hint) => computed_retry_at.max(hint), + None => computed_retry_at, + }); + } +} +``` + +**Key design decisions:** +- `next_retry_at_ms` is computed once on failure and persisted — `service status` simply reads it for stable, consistent display +- Backoff base is the configured interval, not a hardcoded 1800s — a user with `--interval 5m` gets shorter backoffs than one with `--interval 1h` +- Full jitter (random in `[base_interval..cap]`) decorrelates retries across multiple installations, avoiding thundering herd +- Injectable `JitterRng` trait enables deterministic testing without seeding from timestamps +- Paused state is checked separately from backoff — they are orthogonal concerns +- `next_retry_at_ms` is cleared on success and on `service resume` + +### Backoff examples with 30m (1800s) base interval: +| consecutive_failures | max_backoff_seconds | human-readable range | +|---------------------|---------------------|----------------------| +| 1 | 1800 | 30 min (jittered within [30m, 30m]) | +| 2 | 3600 | up to 1 hour (min 30m) | +| 3 | 7200 | up to 2 hours (min 30m) | +| 4 | 14400 | up to 4 hours (capped, min 30m) | +| 5-9 | 14400 | up to 4 hours (capped, min 30m) | +| 10 | — | **circuit breaker trips → paused** | + +--- + +## Service Run Implementation (`handle_service_run`) + +**Critical:** Backoff, error classification, circuit breaker, stage-aware execution, and status file management live **only** in `handle_service_run`. The manual `handle_sync_cmd` is NOT modified — it does not read or write the service status file. + +### Location: `src/cli/commands/service/run.rs` + +```rust +pub fn handle_service_run(service_id: &str, start: std::time::Instant) -> Result<(), Box> { + let clock = SystemClock; + let mut rng = ThreadJitterRng; + + // 1. Read manifest for the given service_id + let manifest_path = lore::core::paths::get_service_manifest_path(service_id); + let manifest = ServiceManifest::read(&manifest_path)? + .ok_or_else(|| LoreError::ServiceError { + message: format!("Service manifest not found for service_id '{service_id}'. Is the service installed?"), + })?; + + // 2. Read status file (service-scoped) + let status_path = lore::core::paths::get_service_status_path(&manifest.service_id); + let mut status = match SyncStatusFile::read(&status_path) { + Ok(Some(s)) => s, + Ok(None) => SyncStatusFile::default(), + Err(e) => { + tracing::warn!(error = %e, "Corrupt status file, starting fresh"); + SyncStatusFile::default() + } + }; + + // 3. Check paused state (permanent error or circuit breaker) + if let Some(reason) = &status.paused_reason { + // Check for circuit breaker half-open transition + let is_circuit_breaker = reason.starts_with("CIRCUIT_BREAKER"); + let half_open = is_circuit_breaker + && status.circuit_breaker_paused_at_ms.map_or(false, |paused_at| { + let cooldown_ms = (manifest.circuit_breaker_cooldown_seconds as i64) * 1000; + clock.now_ms() >= paused_at + cooldown_ms + }); + + if half_open { + // Cooldown expired — allow probe run (continue to step 5) + tracing::info!("Circuit breaker half-open: allowing probe run"); + } else { + print_robot_json(json!({ + "ok": true, + "data": { + "action": "paused", + "reason": reason, + "suggestion": if is_circuit_breaker { + format!("Waiting for cooldown ({}s). Or run: lore service resume", + manifest.circuit_breaker_cooldown_seconds) + } else { + "Fix the issue, then run: lore service resume".to_string() + } + }, + "meta": { "elapsed_ms": start.elapsed().as_millis() } + })); + return Ok(()); + } + } + + // 4. Check backoff (reads persisted next_retry_at_ms — no jitter computation) + if let Some(remaining) = status.backoff_remaining(&clock) { + print_robot_json(json!({ + "ok": true, + "data": { + "action": "skipped", + "reason": "backoff", + "consecutive_failures": status.consecutive_failures, + "next_retry_iso": status.next_retry_at_ms.map(|ms| { + chrono::DateTime::from_timestamp_millis(ms) + .map(|dt| dt.to_rfc3339()) + }), + "remaining_seconds": remaining, + }, + "meta": { "elapsed_ms": start.elapsed().as_millis() } + })); + return Ok(()); + } + + // 5. Acquire pipeline lock + let lock = match AppLock::try_acquire("sync_pipeline", stale_minutes) { + Ok(lock) => lock, + Err(_) => { + print_robot_json(json!({ + "ok": true, + "data": { "action": "skipped", "reason": "locked" }, + "meta": { "elapsed_ms": start.elapsed().as_millis() } + })); + return Ok(()); + } + }; + + // 6. Write current_run metadata for stale-run detection + status.current_run = Some(CurrentRunState { + started_at_ms: clock.now_ms(), + pid: std::process::id(), + }); + let _ = status.write_atomic(&status_path); // best-effort + + // 7. Build sync args from profile + let sync_args = manifest.profile_to_sync_args(); + + // 8. Execute sync pipeline stage-by-stage + let stage_results = execute_sync_stages(&sync_args); + + // 8. Classify outcome + let core_failed = stage_results.iter() + .any(|s| (s.stage == "issues" || s.stage == "mrs") && !s.success); + let optional_failed = stage_results.iter() + .any(|s| (s.stage == "docs" || s.stage == "embeddings") && !s.success); + let all_success = stage_results.iter().all(|s| s.success); + + let outcome = if all_success { + "success" + } else if !core_failed && optional_failed { + "degraded" + } else { + "failed" + }; + + let run = SyncRunRecord { + timestamp_iso: chrono::Utc::now().to_rfc3339(), + timestamp_ms: clock.now_ms(), + duration_seconds: start.elapsed().as_secs_f64(), + outcome: outcome.to_string(), + stage_results: stage_results.clone(), + error_message: if outcome == "failed" { + stage_results.iter() + .find(|s| !s.success) + .and_then(|s| s.error.clone()) + } else { + None + }, + }; + status.record_run(run); + + match outcome { + "success" | "degraded" => { + // Degraded does NOT count as a failure — core data is fresh + status.consecutive_failures = 0; + status.next_retry_at_ms = None; + status.paused_reason = None; + status.last_error_code = None; + status.last_error_message = None; + } + "failed" => { + let core_error = stage_results.iter() + .find(|s| (s.stage == "issues" || s.stage == "mrs") && !s.success); + + // Check if the underlying error is permanent + if let Some(stage) = core_error { + if is_permanent_stage_error(stage) { + status.paused_reason = Some(format!( + "{}: {}", + stage.stage, + stage.error.as_deref().unwrap_or("unknown error") + )); + status.last_error_code = Some("PERMANENT".to_string()); + status.last_error_message = stage.error.clone(); + // Don't increment consecutive_failures — we're pausing + } else { + status.consecutive_failures = status.consecutive_failures.saturating_add(1); + status.last_error_code = Some("TRANSIENT".to_string()); + status.last_error_message = stage.error.clone(); + + // Circuit breaker check + if status.consecutive_failures >= manifest.max_transient_failures { + status.paused_reason = Some(format!( + "CIRCUIT_BREAKER: {} consecutive transient failures (last: {})", + status.consecutive_failures, + stage.error.as_deref().unwrap_or("unknown") + )); + status.circuit_breaker_paused_at_ms = Some(clock.now_ms()); + status.next_retry_at_ms = None; // paused, not backing off + } else { + // Extract retry hint from stage error if available (e.g., Retry-After header) + let retry_hint = extract_retry_after_hint(stage); + status.set_backoff(manifest.interval_seconds, &clock, &mut rng, retry_hint); + } + } + } + } + _ => unreachable!(), + } + + // 9. Clear current_run (run is complete) + status.current_run = None; + + // 10. Write status atomically (best-effort) + if let Err(e) = status.write_atomic(&status_path) { + tracing::warn!(error = %e, "Failed to write sync status file"); + } + + // 10. Release lock (drop) + drop(lock); + + // 11. Print result + print_robot_json(json!({ + "ok": true, + "data": { + "action": if outcome == "failed" && status.paused_reason.is_some() { "paused" } else { "sync_completed" }, + "outcome": outcome, + "profile": manifest.profile, + "duration_seconds": start.elapsed().as_secs_f64(), + "stage_results": stage_results, + "consecutive_failures": status.consecutive_failures, + }, + "meta": { "elapsed_ms": start.elapsed().as_millis() } + })); + + Ok(()) +} +``` + +### Error classification helpers + +```rust +/// Classify by ErrorCode (used when we have the LoreError directly) +fn is_permanent_error(e: &LoreError) -> bool { + matches!( + e.code(), + ErrorCode::TokenNotSet + | ErrorCode::AuthFailed + | ErrorCode::ConfigNotFound + | ErrorCode::ConfigInvalid + | ErrorCode::MigrationFailed + ) +} + +/// Classify from error_code string (primary) or error message string (fallback). +/// The error_code field is propagated through stage execution and is the +/// preferred classification mechanism. String matching on the error message +/// is a fallback for stages that don't yet propagate error_code. +fn is_permanent_stage_error(stage: &StageResult) -> bool { + // Primary: classify by machine-readable error code + if let Some(code) = &stage.error_code { + return matches!( + code.as_str(), + "TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" + | "CONFIG_INVALID" | "MIGRATION_FAILED" + ); + } + // Fallback: string matching (for stages that don't yet propagate error_code) + stage.error.as_deref().map_or(false, |m| { + m.contains("401 Unauthorized") + || m.contains("TokenNotSet") + || m.contains("ConfigNotFound") + || m.contains("ConfigInvalid") + || m.contains("MigrationFailed") + }) +} +``` + +> **Implementation note:** The `error_code` field on `StageResult` is the primary classification mechanism. Each stage's execution wrapper should catch `LoreError`, extract its `ErrorCode` via `.code().to_string()`, and populate the `error_code` field. The string-matching fallback exists for robustness but should not be the primary path. + +### Pipeline lock + +The `sync_pipeline` lock uses the existing `AppLock` mechanism (same as the ingest lock). It prevents: +- Two `service run` invocations overlapping (if scheduler fires before previous run completes) +- A `service run` overlapping with a manual `lore sync` (the manual sync should also acquire this lock) + +**Change to `handle_sync_cmd`:** Add `sync_pipeline` lock acquisition at the top of `handle_sync_cmd` as well. This is the **only** change to the manual sync path — no backoff, no status file writes. If the lock is already held by a `service run`, manual sync waits briefly then fails with a clear message ("A scheduled sync is in progress. Wait for it to complete or use --force to override."). + +```rust +// In handle_sync_cmd, after config load: +let _pipeline_lock = AppLock::try_acquire("sync_pipeline", stale_lock_minutes) + .map_err(|_| LoreError::ServiceError { + message: "Another sync is in progress. Wait for it to complete or use --force.".into(), + })?; +``` + +--- + +## Platform Backends + +### Architecture + +`src/cli/commands/service/platform/mod.rs` exports free functions that dispatch via `#[cfg(target_os)]`. All functions take `service_id` to construct platform-specific identifiers: + +```rust +pub fn install(service_id: &str, ...) -> Result { + #[cfg(target_os = "macos")] + return launchd::install(service_id, ...); + #[cfg(target_os = "linux")] + return systemd::install(service_id, ...); + #[cfg(target_os = "windows")] + return schtasks::install(service_id, ...); + #[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "windows")))] + return Err(LoreError::ServiceUnsupported); +} +``` + +Same pattern for `uninstall()`, `is_installed()`, `get_state()`, `service_file_paths()`, `platform_name()`. + +> **Architecture note:** A `SchedulerBackend` trait is the target architecture for deterministic integration testing with a `FakeBackend` that simulates install/uninstall/state without touching the OS. For v1, the `#[cfg]` dispatch + `run_cmd` helper provides adequate testability — unit tests validate template generation (string output, no OS calls) and `run_cmd` captures all OS interactions with kill+reap timeout handling. The function signatures already mirror the trait shape (`install`, `uninstall`, `is_installed`, `get_state`, `service_file_paths`, `check_prerequisites`), making the trait extraction a low-risk refactoring target for v2. When extracted, the trait should be parameterized by `service_id` and return `Result` for all operations. + +### Command Runner Helper + +All platform backends use a shared `run_cmd` helper for consistent error handling: + +```rust +/// Execute a system command with timeout and stderr capture. +/// Returns stdout on success, ServiceCommandFailed on failure. +/// On timeout, kills the child process and waits to reap it (prevents zombie processes). +fn run_cmd(program: &str, args: &[&str], timeout_secs: u64) -> Result { + let mut child = std::process::Command::new(program) + .args(args) + .stdout(std::process::Stdio::piped()) + .stderr(std::process::Stdio::piped()) + .spawn() + .map_err(|e| LoreError::ServiceCommandFailed { + cmd: format!("{} {}", program, args.join(" ")), + exit_code: None, + stderr: e.to_string(), + })?; + + // Wait with timeout; on timeout kill and reap + // This prevents process leaks that can wedge repeated runs. + let output = wait_with_timeout_kill_and_reap(&mut child, timeout_secs)?; + + if output.status.success() { + Ok(String::from_utf8_lossy(&output.stdout).to_string()) + } else { + Err(LoreError::ServiceCommandFailed { + cmd: format!("{} {}", program, args.join(" ")), + exit_code: output.status.code(), + stderr: String::from_utf8_lossy(&output.stderr).to_string(), + }) + } +} + +/// Wait for child process with timeout. On timeout, sends SIGKILL and waits +/// for the process to be reaped (prevents zombie processes on Unix). +/// +/// NOTE: stdout/stderr are read after exit. This is safe for scheduler commands +/// (launchctl, systemctl, schtasks) which produce small output. For commands +/// that could produce large output (>64KB), concurrent draining via threads or +/// `child.wait_with_output()` would be needed to prevent pipe backpressure deadlock. +fn wait_with_timeout_kill_and_reap( + child: &mut std::process::Child, + timeout_secs: u64, +) -> Result { + use std::time::{Duration, Instant}; + + let deadline = Instant::now() + Duration::from_secs(timeout_secs); + + loop { + match child.try_wait() { + Ok(Some(status)) => { + let stdout = child.stdout.take().map_or(Vec::new(), |mut s| { + let mut buf = Vec::new(); + std::io::Read::read_to_end(&mut s, &mut buf).unwrap_or(0); + buf + }); + let stderr = child.stderr.take().map_or(Vec::new(), |mut s| { + let mut buf = Vec::new(); + std::io::Read::read_to_end(&mut s, &mut buf).unwrap_or(0); + buf + }); + return Ok(std::process::Output { status, stdout, stderr }); + } + Ok(None) => { + if Instant::now() >= deadline { + // Timeout: kill and reap + let _ = child.kill(); + let _ = child.wait(); // reap to prevent zombie + return Err(LoreError::ServiceCommandFailed { + cmd: "(timeout)".into(), + exit_code: None, + stderr: format!("Process timed out after {timeout_secs}s"), + }); + } + std::thread::sleep(Duration::from_millis(100)); + } + Err(e) => return Err(LoreError::ServiceCommandFailed { + cmd: "(wait)".into(), + exit_code: None, + stderr: e.to_string(), + }), + } + } +} +``` + +This ensures all `launchctl`, `systemctl`, and `schtasks` failures produce consistent, machine-readable errors with the exact command, exit code, and stderr captured. + +### Token Storage Helper + +```rust +/// Write token to a user-private env file, scoped by service_id. +/// Returns the path to the env file. +/// +/// Rejects tokens containing NUL bytes or newlines to prevent env-file injection. +/// The token is written as a raw value (not shell-quoted) and read via `cat` in +/// the wrapper script, never `source`d or `eval`d. +fn write_token_env_file( + data_dir: &Path, + service_id: &str, + token_env_var: &str, + token_value: &str, +) -> Result { + // Validate token content — reject values that could break env-file format + if token_value.contains('\0') || token_value.contains('\n') || token_value.contains('\r') { + return Err(LoreError::ServiceError { + message: "Token contains NUL or newline characters, which are not safe for env-file storage. \ + Use --token-source embedded instead.".into(), + }); + } + + let env_path = data_dir.join(format!("service-env-{service_id}")); + let content = format!("{}={}\n", token_env_var, token_value); + + // Write atomically: tmp file + fsync + let tmp_path = env_path.with_extension("tmp"); + std::fs::write(&tmp_path, &content)?; + + // Set permissions to 0600 (owner read/write only) BEFORE rename + #[cfg(unix)] + { + use std::os::unix::fs::PermissionsExt; + std::fs::set_permissions(&tmp_path, std::fs::Permissions::from_mode(0o600))?; + } + + std::fs::rename(&tmp_path, &env_path)?; + Ok(env_path) +} +``` + +### Function signatures + +```rust +pub struct InstallResult { + pub platform: String, + pub service_id: String, + pub interval_seconds: u64, + pub profile: String, + pub binary_path: String, + pub config_path: Option, + pub service_files: Vec, + pub sync_command: String, + pub token_env_var: String, + pub token_source: String, // "env_file", "embedded", or "system_env" +} + +pub struct UninstallResult { + pub was_installed: bool, + pub service_id: String, + pub platform: String, + pub removed_files: Vec, +} + +pub fn install( + service_id: &str, + binary_path: &str, + config_path: Option<&str>, + interval_seconds: u64, + profile: &str, + token_env_var: &str, + token_value: &str, + token_source: &str, + log_dir: &Path, + data_dir: &Path, +) -> Result; + +pub fn uninstall(service_id: &str) -> Result; +pub fn is_installed(service_id: &str) -> bool; +pub fn get_state(service_id: &str) -> Option; // "loaded", "running", etc. +pub fn service_file_paths(service_id: &str) -> Vec; +pub fn platform_name() -> &'static str; + +/// Pre-flight check for platform-specific prerequisites. +/// Returns a list of diagnostic results. +pub fn check_prerequisites() -> Vec; + +pub struct DiagnosticCheck { + pub name: String, + pub status: DiagnosticStatus, // Pass, Warn, Fail + pub message: Option, + pub action: Option, // Suggested fix command +} +``` + +--- + +### macOS: launchd (`platform/launchd.rs`) + +**Service file:** `~/Library/LaunchAgents/com.gitlore.sync.{service_id}.plist` + +**Label:** `com.gitlore.sync.{service_id}` + +**Wrapper script approach:** launchd cannot natively load environment files. Instead of embedding the token directly in the plist (which would persist it in a readable XML file), we generate a small wrapper shell script that reads the env file at runtime and execs `lore`. This keeps the token out of the plist entirely for the `env-file` strategy. + +**Wrapper script** (`{data_dir}/service-run-{service_id}.sh`, mode 0700): +```bash +#!/bin/sh +# Generated by lore service install — do not edit +set -e +# Read token from env file (KEY=VALUE format) — never source/eval untrusted content +{token_env_var}="$(sed -n 's/^{token_env_var}=//p' "{data_dir}/service-env-{service_id}")" +export {token_env_var} +{config_export_line} +exec "{binary_path}" --robot service run --service-id "{service_id}" +``` + +Where `{config_export_line}` is either empty or `export LORE_CONFIG_PATH="{config_path}"`. + +**Plist template** (generated via `format!()`, no crate needed): +```xml + + + + + Label + com.gitlore.sync.{service_id} + ProgramArguments + + {program_arguments} + + {env_dict} + StartInterval + {interval_seconds} + RunAtLoad + + ProcessType + Background + Nice + 10 + LowPriorityIO + + StandardOutPath + {log_dir}/service-{service_id}-stdout.log + StandardErrorPath + {log_dir}/service-{service_id}-stderr.log + TimeOut + 600 + + +``` + +Where `{program_arguments}` and `{env_dict}` depend on `token_source`: +- **env-file (default):** The plist invokes the wrapper script instead of `lore` directly. No token appears in the plist. + ```xml + {data_dir}/service-run-{service_id}.sh + ``` + `{env_dict}` is empty (the wrapper script handles environment setup). + +- **embedded:** The plist invokes `lore` directly with the token embedded in `EnvironmentVariables`. + ```xml + {binary_path} + --robot + service + run + --service-id + {service_id} + ``` + ```xml + EnvironmentVariables + + {token_env_var} + {token_value} + {config_env_entry} + + ``` + +Where `{config_env_entry}` is either empty or: +```xml + LORE_CONFIG_PATH + {config_path} +``` + +**XML escaping:** The token value and paths must be XML-escaped. Write a helper `fn xml_escape(s: &str) -> String` that replaces `&`, `<`, `>`, `"`, `'` with their XML entity equivalents. This is critical — tokens can contain `&` or `<`. + +**Install steps:** +1. `std::fs::create_dir_all(plist_path.parent())` +2. `std::fs::write(&plist_path, plist_content)` +3. Try `launchctl bootstrap gui/{uid} {plist_path}` via `std::process::Command` +4. If that fails (older macOS), fall back to `launchctl load {plist_path}` +5. Get UID via safe wrapper: `fn current_uid() -> u32 { unsafe { libc::getuid() } }` — isolated in a single-line function with `#[allow(unsafe_code)]` exemption since `getuid()` is trivially safe (no pointers, no mutation, always succeeds). Alternatively, use the `nix` crate's `nix::unistd::Uid::current()` if already a dependency. + +**Uninstall steps:** +1. Try `launchctl bootout gui/{uid}/com.gitlore.sync.{service_id}` +2. If that fails, try `launchctl unload {plist_path}` +3. `std::fs::remove_file(&plist_path)` (ignore error if doesn't exist) + +**State detection:** +- `is_installed(service_id)`: check if plist file exists on disk +- `get_state(service_id)`: run `launchctl list com.gitlore.sync.{service_id}`, parse exit code (0 = loaded, non-0 = not loaded) +- `get_interval_seconds(service_id)`: read plist file, find `StartInterval` then next `` value via simple string search (no XML parser needed) + +**Platform prerequisites (`check_prerequisites`):** +- Verify running in a GUI login session: check `launchctl print gui/{uid}` succeeds. In SSH-only or headless contexts, launchd user agents won't load — return `Fail` with action "Log in via GUI or use SSH with ForwardAgent". +- This is a warning, not a hard block — some macOS setups (like `launchctl asuser`) can work around it. + +--- + +### Linux: systemd (`platform/systemd.rs`) + +**Service files:** +- `~/.config/systemd/user/lore-sync-{service_id}.service` +- `~/.config/systemd/user/lore-sync-{service_id}.timer` + +**Service unit (hardened):** +```ini +[Unit] +Description=Gitlore GitLab data sync ({service_id}) + +[Service] +Type=oneshot +ExecStart={binary_path} --robot service run --service-id {service_id} +WorkingDirectory={data_dir} +SuccessExitStatus=0 +TimeoutStartSec=900 +NoNewPrivileges=true +PrivateTmp=true +ProtectSystem=strict +ProtectHome=read-only +ReadWritePaths={data_dir} +{token_env_line} +{config_env_line} +``` + +Where `{token_env_line}` depends on `token_source`: +- **env-file:** `EnvironmentFile={data_dir}/service-env-{service_id}` (systemd natively supports this — true file-based loading, no embedding) +- **embedded:** `Environment={token_env_var}={token_value}` + +Where `{config_env_line}` is either empty or `Environment=LORE_CONFIG_PATH={config_path}`. + +**Hardening notes:** +- `TimeoutStartSec=900` — kills stuck syncs after 15 minutes (generous but bounded) +- `NoNewPrivileges=true` — prevents privilege escalation +- `PrivateTmp=true` — isolated /tmp +- `ProtectSystem=strict` — read-only filesystem except explicitly allowed paths +- `ProtectHome=read-only` — read-only home directory +- `ReadWritePaths={data_dir}` — allows writing to the lore data directory (status files, logs, DB) + +**Timer unit:** +```ini +[Unit] +Description=Gitlore sync timer ({service_id}) + +[Timer] +OnBootSec=5min +OnUnitInactiveSec={interval_seconds}s +AccuracySec=1min +Persistent=true +RandomizedDelaySec=60 + +[Install] +WantedBy=timers.target +``` + +**Install steps:** +1. `std::fs::create_dir_all(unit_dir)` +2. Write both files +3. Run `systemctl --user daemon-reload` +4. Run `systemctl --user enable --now lore-sync-{service_id}.timer` + +**Uninstall steps:** +1. Run `systemctl --user disable --now lore-sync-{service_id}.timer` (ignore error) +2. Remove both files +3. Run `systemctl --user daemon-reload` + +**State detection:** +- `is_installed(service_id)`: check if timer file exists +- `get_state(service_id)`: run `systemctl --user is-active lore-sync-{service_id}.timer`, capture stdout ("active", "inactive", etc.) +- `get_interval_seconds(service_id)`: read timer file, parse `OnUnitInactiveSec` value + +**Platform prerequisites (`check_prerequisites`):** +- **User manager running:** Check `systemctl --user status` exits 0. If not, return `Fail` with message "systemd user manager not running. Start a user session or contact your system administrator." +- **Linger enabled:** Check `loginctl show-user $(whoami) --property=Linger` returns `Linger=yes`. If not, return `Warn` with message "loginctl linger not enabled. Timer will not fire on reboot without an active login session." and action `loginctl enable-linger $(whoami)`. This is a warning, not a block — the timer works fine when the user is logged in. + +--- + +### Windows: schtasks (`platform/schtasks.rs`) + +**Task name:** `LoreSync-{service_id}` + +**Install:** +``` +schtasks /create /tn "LoreSync-{service_id}" /tr "\"{binary_path}\" --robot service run --service-id {service_id}" /sc minute /mo {interval_minutes} /f +``` + +Note: `/mo` requires minutes, so convert seconds to minutes (round up). Minimum is 1 minute (but we enforce 5 minutes at the parse level). + +**Token handling on Windows:** The env var must be set system-wide via `setx` or be present in the user's environment. Neither env-file nor embedded strategies apply — Windows scheduled tasks inherit the user's environment. Set `token_source: "system_env"` in the result and document this as a requirement. + +**Uninstall:** +``` +schtasks /delete /tn "LoreSync-{service_id}" /f +``` + +**State detection:** +- `is_installed(service_id)`: run `schtasks /query /tn "LoreSync-{service_id}"`, check exit code (0 = exists) +- `get_state(service_id)`: parse output of `schtasks /query /tn "LoreSync-{service_id}" /fo CSV /v`, extract "Status" column +- `get_interval_seconds(service_id)`: parse "Repeat: Every" from verbose output, or store the value ourselves + +**Platform prerequisites (`check_prerequisites`):** +- Verify `schtasks` is available: run `schtasks /?` and check exit code. Return `Fail` if not found. + +--- + +## Interval Parsing + +```rust +/// Parse interval strings like "15m", "1h", "30m", "2h", "24h" +/// Only minutes (m) and hours (h) are accepted — seconds are not exposed +/// because the minimum interval is 5 minutes and sub-minute granularity +/// would be confusing for a scheduled sync. +pub fn parse_interval(input: &str) -> std::result::Result { + let input = input.trim(); + + let (num_str, multiplier) = if let Some(n) = input.strip_suffix('m') { + (n, 60u64) + } else if let Some(n) = input.strip_suffix('h') { + (n, 3600u64) + } else { + return Err(format!( + "Invalid interval '{input}'. Use format like 15m, 30m, 1h, 2h" + )); + }; + + let num: u64 = num_str + .parse() + .map_err(|_| format!("Invalid number in interval: '{num_str}'"))?; + + if num == 0 { + return Err("Interval must be greater than 0".to_string()); + } + + let seconds = num * multiplier; + + if seconds < 300 { + return Err(format!( + "Minimum interval is 5m (got {input}, which is {seconds}s)" + )); + } + if seconds > 86400 { + return Err(format!( + "Maximum interval is 24h (got {input}, which is {seconds}s)" + )); + } + + Ok(seconds) +} +``` + +--- + +## Error Types + +### Additions to `src/core/error.rs` + +**ErrorCode enum:** +```rust +ServiceError, // Add after Ambiguous +ServiceCommandFailed, // OS command (launchctl/systemctl/schtasks) failed +ServiceCorruptState, // Manifest or status file is corrupt/unparseable +``` + +**ErrorCode::exit_code():** +```rust +Self::ServiceError => 21, +Self::ServiceCommandFailed => 22, +Self::ServiceCorruptState => 23, +``` + +**ErrorCode::Display:** +```rust +Self::ServiceError => "SERVICE_ERROR", +Self::ServiceCommandFailed => "SERVICE_COMMAND_FAILED", +Self::ServiceCorruptState => "SERVICE_CORRUPT_STATE", +``` + +**LoreError enum:** +```rust +#[error("Service error: {message}")] +ServiceError { message: String }, + +#[error("Service management not supported on this platform. Requires macOS (launchd), Linux (systemd), or Windows (schtasks).")] +ServiceUnsupported, + +#[error("Service command failed: {cmd} (exit {exit_code:?}): {stderr}")] +ServiceCommandFailed { + cmd: String, + exit_code: Option, + stderr: String, +}, + +#[error("Service state file corrupt: {path}: {reason}")] +ServiceCorruptState { + path: String, + reason: String, +}, +``` + +**LoreError::code():** +```rust +Self::ServiceError { .. } => ErrorCode::ServiceError, +Self::ServiceUnsupported => ErrorCode::ServiceError, +Self::ServiceCommandFailed { .. } => ErrorCode::ServiceCommandFailed, +Self::ServiceCorruptState { .. } => ErrorCode::ServiceCorruptState, +``` + +**LoreError::suggestion():** +```rust +Self::ServiceError { .. } => Some("Check service status: lore service status\nRun diagnostics: lore service doctor\nView logs: lore service logs"), +Self::ServiceUnsupported => Some("Requires macOS (launchd), Linux (systemd), or Windows (schtasks)"), +Self::ServiceCommandFailed { .. } => Some("Check service logs: lore service logs\nRun diagnostics: lore service doctor\nTry reinstalling: lore service install"), +Self::ServiceCorruptState { .. } => Some("Run: lore service repair\nThen reinstall if needed: lore service install"), +``` + +**LoreError::actions():** +```rust +Self::ServiceError { .. } => vec!["lore service status", "lore service doctor", "lore service logs"], +Self::ServiceUnsupported => vec![], +Self::ServiceCommandFailed { .. } => vec!["lore service logs", "lore service doctor", "lore service install"], +Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"], +``` + +--- + +## CLI Definition Changes + +### `src/cli/mod.rs` + +Add to `Commands` enum (after `Who(WhoArgs)`, before the hidden commands): + +```rust +/// Manage the OS-native scheduled sync service +Service { + #[command(subcommand)] + command: ServiceCommand, +}, +``` + +Add the `ServiceCommand` enum (can be in the same file or re-exported from `service/mod.rs`): + +```rust +#[derive(Subcommand)] +pub enum ServiceCommand { + /// Install the scheduled sync service + Install { + /// Sync interval (e.g., 15m, 30m, 1h). Default: 30m. Min: 5m. Max: 24h. + #[arg(long, default_value = "30m")] + interval: String, + /// Sync profile: fast (issues+MRs), balanced (+ docs), full (+ embeddings) + #[arg(long, default_value = "balanced")] + profile: String, + /// Token storage: env-file (default, 0600 perms) or embedded (in service file) + #[arg(long, default_value = "env-file")] + token_source: String, + /// Custom service name (default: derived from config path hash). + /// Useful when managing multiple installations for readability. + #[arg(long)] + name: Option, + /// Validate and render service files without writing or executing anything + #[arg(long)] + dry_run: bool, + }, + /// Remove the scheduled sync service + Uninstall { + /// Target a specific service by ID or name (default: current project) + #[arg(long)] + service: Option, + /// Uninstall all services + #[arg(long)] + all: bool, + }, + /// List all installed services + List, + /// Show service status and last sync result + Status { + /// Target a specific service by ID or name (default: current project) + #[arg(long)] + service: Option, + }, + /// View service logs + Logs { + /// Show last N lines (default: 100) + #[arg(long)] + tail: Option>, + /// Stream new log lines as they arrive (like tail -f) + #[arg(long)] + follow: bool, + /// Open log file in editor instead of printing to stdout + #[arg(long)] + open: bool, + /// Target a specific service by ID or name + #[arg(long)] + service: Option, + }, + /// Clear paused state and reset failure counter + Resume { + /// Target a specific service by ID or name (default: current project) + #[arg(long)] + service: Option, + }, + /// Pause scheduled execution without uninstalling + Pause { + /// Reason for pausing (shown in status output) + #[arg(long)] + reason: Option, + /// Target a specific service by ID or name (default: current project) + #[arg(long)] + service: Option, + }, + /// Trigger an immediate one-off sync using installed profile + Trigger { + /// Bypass backoff window (still respects paused state) + #[arg(long)] + ignore_backoff: bool, + /// Target a specific service by ID or name (default: current project) + #[arg(long)] + service: Option, + }, + /// Repair corrupt manifest or status files + Repair { + /// Target a specific service by ID or name (default: current project) + #[arg(long)] + service: Option, + }, + /// Validate service environment and prerequisites + Doctor { + /// Skip network checks (token validation) + #[arg(long)] + offline: bool, + /// Attempt safe, non-destructive fixes for detected issues + #[arg(long)] + fix: bool, + }, + /// Execute one scheduled sync attempt (called by OS scheduler, hidden from help) + #[command(hide = true)] + Run { + /// Internal selector injected by scheduler backend — identifies which + /// service manifest and status file to use for this run. + #[arg(long, hide = true)] + service_id: String, + }, +} +``` + +### `src/cli/commands/mod.rs` + +Add: +```rust +pub mod service; +``` + +No re-exports needed — the dispatch goes through `service::handle_install`, etc. directly. + +### `src/main.rs` dispatch + +Add import: +```rust +use lore::cli::ServiceCommand; +``` + +Add match arm (before the hidden commands): +```rust +Some(Commands::Service { command }) => { + handle_service(cli.config.as_deref(), command, robot_mode) +} +``` + +Add handler function: +```rust +fn handle_service( + config_override: Option<&str>, + command: ServiceCommand, + robot_mode: bool, +) -> Result<(), Box> { + let start = std::time::Instant::now(); + match command { + ServiceCommand::Install { interval, profile, token_source, name, dry_run } => { + lore::cli::commands::service::handle_install( + config_override, &interval, &profile, &token_source, name.as_deref(), + dry_run, robot_mode, start, + ) + } + ServiceCommand::Uninstall { service, all } => { + lore::cli::commands::service::handle_uninstall(service.as_deref(), all, robot_mode, start) + } + ServiceCommand::List => { + lore::cli::commands::service::handle_list(robot_mode, start) + } + ServiceCommand::Status { service } => { + lore::cli::commands::service::handle_status(config_override, service.as_deref(), robot_mode, start) + } + ServiceCommand::Logs { tail, follow, open, service } => { + lore::cli::commands::service::handle_logs(tail, follow, open, service.as_deref(), robot_mode, start) + } + ServiceCommand::Resume { service } => { + lore::cli::commands::service::handle_resume(service.as_deref(), robot_mode, start) + } + ServiceCommand::Pause { reason, service } => { + lore::cli::commands::service::handle_pause(service.as_deref(), reason.as_deref(), robot_mode, start) + } + ServiceCommand::Trigger { ignore_backoff, service } => { + lore::cli::commands::service::handle_trigger(service.as_deref(), ignore_backoff, robot_mode, start) + } + ServiceCommand::Repair { service } => { + lore::cli::commands::service::handle_repair(service.as_deref(), robot_mode, start) + } + ServiceCommand::Doctor { offline, fix } => { + lore::cli::commands::service::handle_doctor(config_override, offline, fix, robot_mode, start) + } + ServiceCommand::Run { service_id } => { + // Always robot mode for scheduled execution + lore::cli::commands::service::handle_service_run(&service_id, start) + } + } +} +``` + +--- + +## Autocorrect Registry + +### `src/cli/autocorrect.rs` + +Add to `COMMAND_FLAGS` array (before the hidden commands): +```rust +("service", &["--interval", "--profile", "--token-source", "--name", "--dry-run", "--tail", "--follow", "--open", "--offline", "--fix", "--service", "--all", "--reason", "--ignore-backoff"]), +``` + +**Important:** The `registry_covers_command_flags` test in autocorrect.rs uses clap introspection to verify all flags are registered. Since `service` is a nested subcommand, verify whether this test recurses into subcommands. If it does, the test will fail without this entry. If it doesn't recurse (only checks top-level subcommands), the test passes but we should still add the entry for correctness. + +Looking at the test (lines 868-908): it iterates `cmd.get_subcommands()` which gets the top-level subcommands. The `Service` variant uses `#[command(subcommand)]` which means clap will show `service` as a subcommand with its own sub-subcommands. The test won't recurse into `install`'s flags, but `service` itself has no direct flags (only subcommands do), so an empty entry or omission would pass the test. Adding `("service", &["--interval"])` is conservative and correct — the `--interval` flag lives on the `install` sub-subcommand but won't cause issues. + +However, `detect_subcommand` only finds the *first* positional arg. For `lore service install --intervl 30m`, it returns `"service"`, not `"install"`. So the `--interval` flag needs to be registered under `"service"` for fuzzy matching. + +--- + +## robot-docs Manifest + +### Addition to `handle_robot_docs` in `src/main.rs` + +Add to the `commands` JSON object: + +```rust +"service": { + "description": "Manage OS-native scheduled sync service", + "subcommands": { + "install": { + "description": "Install scheduled sync service", + "flags": ["--interval ", "--profile ", "--token-source ", "--name ", "--dry-run"], + "defaults": { "interval": "30m", "profile": "balanced", "token_source": "env-file" }, + "example": "lore --robot service install --interval 15m --profile fast", + "response_schema": { + "ok": "bool", + "data.platform": "string (launchd|systemd|schtasks)", + "data.service_id": "string", + "data.interval_seconds": "number", + "data.profile": "string", + "data.binary_path": "string", + "data.service_files": "[string]", + "data.token_source": "string (env_file|embedded|system_env)", + "data.no_change": "bool" + } + }, + "uninstall": { + "description": "Remove scheduled sync service", + "flags": ["--service ", "--all"], + "example": "lore --robot service uninstall", + "response_schema": { + "ok": "bool", + "data.was_installed": "bool", + "data.service_id": "string", + "data.removed_files": "[string]" + } + }, + "list": { + "description": "List all installed services", + "example": "lore --robot service list", + "response_schema": { + "ok": "bool", + "data.services": "[{service_id, platform, interval_seconds, profile, installed_at_iso, platform_state, drift}]" + } + }, + "status": { + "description": "Show service status, scheduler state, and recent runs", + "flags": ["--service "], + "example": "lore --robot service status", + "response_schema": { + "ok": "bool", + "data.installed": "bool", + "data.service_id": "string|null", + "data.platform": "string", + "data.interval_seconds": "number|null", + "data.profile": "string|null", + "data.scheduler_state": "string (idle|running|running_stale|degraded|backoff|half_open|paused|not_installed)", + "data.last_sync": "SyncRunRecord|null", + "data.recent_runs": "[SyncRunRecord]", + "data.backoff": "object|null", + "data.paused_reason": "string|null", + "data.drift": "object|null {platform_drift: bool, spec_drift: bool, command_drift: bool}" + } + }, + "logs": { + "description": "View service logs (human: editor/tail, robot: path + optional lines)", + "flags": ["--tail ", "--follow"], + "example": "lore --robot service logs --tail 50", + "response_schema": { + "ok": "bool", + "data.log_path": "string", + "data.exists": "bool", + "data.size_bytes": "number", + "data.last_lines": "[string]|null" + } + }, + "resume": { + "description": "Clear paused state and reset failure counter", + "example": "lore --robot service resume", + "response_schema": { + "ok": "bool", + "data.was_paused": "bool", + "data.previous_reason": "string|null", + "data.consecutive_failures_cleared": "number" + } + }, + "pause": { + "description": "Pause scheduled execution without uninstalling", + "flags": ["--reason ", "--service "], + "example": "lore --robot service pause --reason 'maintenance'", + "response_schema": { + "ok": "bool", + "data.service_id": "string", + "data.paused": "bool", + "data.reason": "string", + "data.already_paused": "bool" + } + }, + "trigger": { + "description": "Trigger immediate one-off sync using installed profile", + "flags": ["--ignore-backoff", "--service "], + "example": "lore --robot service trigger", + "response_schema": "Same as service run output" + }, + "repair": { + "description": "Repair corrupt manifest or status files", + "flags": ["--service "], + "example": "lore --robot service repair", + "response_schema": { + "ok": "bool", + "data.repaired": "bool", + "data.actions": "[{file, action, backup?}]", + "data.needs_reinstall": "bool" + } + }, + "doctor": { + "description": "Validate service environment and prerequisites", + "flags": ["--offline", "--fix"], + "example": "lore --robot service doctor", + "response_schema": { + "ok": "bool", + "data.checks": "[{name, status, message?, action?}]", + "data.overall": "string (pass|warn|fail)" + } + } + } +} +``` + +--- + +## Paths Module Additions + +### `src/core/paths.rs` + +```rust +pub fn get_service_status_path(service_id: &str) -> PathBuf { + get_data_dir().join(format!("sync-status-{service_id}.json")) +} + +pub fn get_service_manifest_path(service_id: &str) -> PathBuf { + get_data_dir().join(format!("service-manifest-{service_id}.json")) +} + +pub fn get_service_env_path(service_id: &str) -> PathBuf { + get_data_dir().join(format!("service-env-{service_id}")) +} + +pub fn get_service_wrapper_path(service_id: &str) -> PathBuf { + get_data_dir().join(format!("service-run-{service_id}.sh")) +} + +pub fn get_service_log_path(service_id: &str, stream: &str) -> PathBuf { + get_data_dir().join("logs").join(format!("service-{service_id}-{stream}.log")) +} + +// stream values: "stdout" or "stderr" +// Example: get_service_log_path("a1b2c3d4e5f6", "stderr") +// => ~/.local/share/lore/logs/service-a1b2c3d4e5f6-stderr.log + +/// List all installed service IDs by scanning for manifest files. +pub fn list_service_ids() -> Vec { + let data_dir = get_data_dir(); + std::fs::read_dir(&data_dir) + .unwrap_or_else(|_| /* return empty iterator */) + .filter_map(|entry| { + let name = entry.ok()?.file_name().to_string_lossy().to_string(); + name.strip_prefix("service-manifest-") + .and_then(|s| s.strip_suffix(".json")) + .map(String::from) + }) + .collect() +} +``` + +Note: Status files are scoped by `service_id` — each installed service gets independent backoff/paused/circuit-breaker state. The pipeline lock remains global (`sync_pipeline`) to prevent overlapping writes to the shared database. + +--- + +## Core Module Registration + +### `src/core/mod.rs` + +Add: +```rust +pub mod sync_status; +pub mod service_manifest; +``` + +--- + +## File-by-File Implementation Details + +### `src/core/sync_status.rs` (NEW) + +- `SyncRunRecord` struct with Serialize + Deserialize + Clone +- `StageResult` struct with Serialize + Deserialize + Clone +- `SyncStatusFile` struct with Serialize + Deserialize + Default (schema_version=1) +- `Clock` trait + `SystemClock` impl (for deterministic testing) +- `JitterRng` trait + `ThreadJitterRng` impl (for deterministic jitter testing) +- `parse_interval(input: &str) -> Result` +- `SyncStatusFile::read(path: &Path) -> Result, LoreError>` — distinguishes missing from corrupt +- `SyncStatusFile::write_atomic(&self, path: &Path) -> std::io::Result<()>` — tmp+fsync+rename +- `SyncStatusFile::record_run(&mut self, run: SyncRunRecord)` — push to recent_runs (capped at 10) +- `SyncStatusFile::clear_paused(&mut self)` — reset paused_reason, errors, failures, next_retry_at_ms +- `SyncStatusFile::backoff_remaining(&self, clock: &dyn Clock) -> Option` — reads persisted next_retry_at_ms +- `SyncStatusFile::set_backoff(&mut self, base_interval_seconds, clock, rng)` — compute and persist next_retry_at_ms +- `fn is_permanent_error(code: &ErrorCode) -> bool` +- `fn is_permanent_stage_error(stage: &StageResult) -> bool` — primary: error_code, fallback: string matching +- `SyncStatusFile::is_circuit_breaker_half_open(&self, manifest: &ServiceManifest, clock: &dyn Clock) -> bool` — checks if cooldown has expired +- Unit tests for all of the above + +### `src/core/service_manifest.rs` (NEW) + +- `ServiceManifest` struct with Serialize + Deserialize (schema_version=1), includes `workspace_root` and `spec_hash` +- `ServiceManifest::read(path: &Path) -> Result, LoreError>` — distinguishes missing from corrupt +- `ServiceManifest::write_atomic(&self, path: &Path) -> std::io::Result<()>` — tmp+fsync+rename +- `ServiceManifest::profile_to_sync_args(&self) -> Vec` — maps profile to sync CLI flags +- `compute_service_id(workspace_root: &Path, config_path: &Path, project_urls: &[&str]) -> String` — composite fingerprint (workspace root + config path + sorted project URLs), first 12 hex chars of SHA-256 +- `sanitize_service_name(name: &str) -> Result` — `[a-z0-9-]`, max 32 chars +- `DiagnosticCheck` struct, `DiagnosticStatus` enum (Pass/Warn/Fail) +- Unit tests for profile mapping, service_id computation, name sanitization + +### `src/cli/commands/service/mod.rs` (NEW) + +- Re-exports from submodules: `handle_install`, `handle_uninstall`, `handle_list`, `handle_status`, `handle_logs`, `handle_resume`, `handle_pause`, `handle_trigger`, `handle_repair`, `handle_doctor`, `handle_service_run` +- Shared `resolve_service_id(selector: Option<&str>, config_override: Option<&str>) -> Result` helper: resolves `--service` flag, or derives from current config path. If multiple services exist and no selector provided, returns actionable error listing available services. +- Shared `acquire_admin_lock(service_id: &str) -> Result` helper: acquires `AppLock("service-admin-{service_id}")` for state mutation commands. Used by install, uninstall, pause, resume, and repair. NOT used by `service run` (which only acquires `sync_pipeline`). +- Imports from submodules + +### `src/cli/commands/service/install.rs` (NEW) + +- `handle_install(config_override, interval_str, profile, token_source, name, dry_run, robot_mode, start) -> Result<()>` +- Validates profile is one of `fast|balanced|full` +- Validates token_source is one of `env-file|embedded` +- Computes or validates `service_id` from `--name` or composite fingerprint (workspace root + config path + project URLs). If `--name` is provided and collides with an existing service with a different identity hash, returns an actionable error. +- Acquires admin lock `AppLock("service-admin-{service_id}")` before mutating any files +- Runs `doctor` pre-flight checks; aborts on any `Fail` result +- Loads config, resolves token, resolves binary path +- Writes token to env file (if env-file strategy, scoped by service_id) +- On macOS with env-file: generates wrapper script at `{data_dir}/service-run-{service_id}.sh` (mode 0700) +- Calls `platform::install(service_id, ...)` +- **Transactional**: on enable success, writes install manifest atomically. On enable failure, removes generated service files and wrapper script, returns `ServiceCommandFailed`. +- Compares with existing manifest to detect no-change case +- Prints result (robot JSON or human-readable) + +### `src/cli/commands/service/uninstall.rs` (NEW) + +- `handle_uninstall(service_selector, all, robot_mode, start) -> Result<()>` +- Resolves target service via selector or current-project default +- With `--all`: iterates all discovered manifests +- Reads manifest to find service_id +- Calls `platform::uninstall(service_id)` +- Removes install manifest (`service-manifest-{service_id}.json`) +- Removes env file (`service-env-{service_id}`) if exists +- Removes wrapper script (`service-run-{service_id}.sh`) if exists (macOS) +- Does NOT remove the status file or log files (those are operational data, not config) +- Outputs confirmation + +### `src/cli/commands/service/status.rs` (NEW) + +- `handle_status(config_override, robot_mode, start) -> Result<()>` +- Reads install manifest (primary source for config and service_id) +- Calls `platform::is_installed(service_id)`, `get_state(service_id)` to verify platform state +- Detects drift: platform drift (loaded/unloaded), spec drift (content hash vs `spec_hash`), command drift +- Reads `SyncStatusFile` for last sync and recent runs +- Detects stale runs via `current_run` metadata: checks if PID is alive and `started_at_ms` is within 30 minutes +- Computes scheduler state from status + manifest (including `degraded`, `running_stale`) +- Computes backoff info from persisted `next_retry_at_ms` +- Prints combined status + +### `src/cli/commands/service/logs.rs` (NEW) + +- `handle_logs(tail, follow, robot_mode, start) -> Result<()>` +- `--tail`: read last N lines, output directly to stdout (or as JSON array in robot mode) +- `--follow`: stream new lines (human mode only; robot mode returns error) +- Default (no flags): print last 100 lines to stdout (human) or return path metadata (robot) +- Robot mode with `--tail`: includes `last_lines` field (capped at 100) + +### `src/cli/commands/service/resume.rs` (NEW) + +- `handle_resume(robot_mode, start) -> Result<()>` +- Reads status file, clears paused state (including circuit breaker), writes back atomically +- Prints confirmation with previous reason + +### `src/cli/commands/service/doctor.rs` (NEW) + +- `handle_doctor(config_override, offline, fix, robot_mode, start) -> Result<()>` +- Runs diagnostic checks: config, token, binary, data dir, platform prerequisites, install state +- Skips network checks when `--offline` +- `--fix`: attempts safe, non-destructive remediations (create dirs, fix permissions, daemon-reload). Reports each applied fix. +- Reports pass/warn/fail per check +- Also used as pre-flight by `handle_install` (as an internal function call, without `--fix`) + +### `src/cli/commands/service/run.rs` (NEW) + +- `handle_service_run(service_id: &str, start) -> Result<()>` +- The hidden scheduled execution entrypoint; `service_id` is injected by the scheduler command line +- Reads manifest for the given `service_id` to get profile/interval/max_transient_failures/circuit_breaker_cooldown_seconds +- Checks paused state with half-open transition (cooldown check), backoff (via persisted next_retry_at_ms), pipeline lock +- Writes `current_run` metadata (started_at_ms, pid) to status file before sync for stale-run detection; clears it on completion +- Executes sync stage-by-stage, records per-stage outcomes with `error_code` propagation +- Classifies: success / degraded / failed +- Respects server-provided `Retry-After` hints when computing backoff (via `extract_retry_after_hint`) +- Circuit breaker check on transient failure count; records `circuit_breaker_paused_at_ms` for cooldown +- Half-open probe: if probe succeeds, auto-closes circuit breaker; if fails, returns to paused with new timestamp +- Performs log rotation check before executing sync +- Updates status atomically +- Always robot mode, always exit 0 + +### `src/cli/commands/service/list.rs` (NEW) + +- `handle_list(robot_mode, start) -> Result<()>` +- Scans `{data_dir}` for `service-manifest-*.json` files +- Reads each manifest, verifies platform state, detects drift +- Outputs summary in robot JSON or human-readable table + +### `src/cli/commands/service/pause.rs` (NEW) + +- `handle_pause(service_selector, reason, robot_mode, start) -> Result<()>` +- Resolves service, writes `paused_reason` to status file +- Does NOT modify OS scheduler (service stays installed and scheduled — it just no-ops) +- Reports `already_paused: true` if already paused (updates reason) + +### `src/cli/commands/service/trigger.rs` (NEW) + +- `handle_trigger(service_selector, ignore_backoff, robot_mode, start) -> Result<()>` +- Resolves service, reads manifest for profile +- Delegates to `handle_service_run` logic with optional backoff bypass +- Still respects paused state (use `resume` first) + +### `src/cli/commands/service/repair.rs` (NEW) + +- `handle_repair(service_selector, robot_mode, start) -> Result<()>` +- Validates manifest and status files for JSON parseability +- Corrupt files: renamed to `{name}.corrupt.{timestamp}` (backup, never delete) +- Status file: reinitialized to default +- Manifest: cleared, advises reinstall +- Reports what was repaired + +### `src/cli/commands/service/platform/mod.rs` (NEW) + +- `#[cfg]`-gated imports and dispatch functions (all take `service_id`) +- `fn xml_escape(s: &str) -> String` helper (used by launchd) +- `fn run_cmd(program, args, timeout_secs) -> Result` — shared command runner with kill+reap on timeout +- `fn wait_with_timeout_kill_and_reap(child, timeout_secs) -> Result` — timeout handler that kills and reaps child process +- `fn write_token_env_file(data_dir, service_id, token_env_var, token_value) -> Result` — token storage +- `fn write_wrapper_script(data_dir, service_id, binary_path, token_env_var, config_path) -> Result` — macOS wrapper script for runtime env loading (mode 0700) +- `fn check_prerequisites() -> Vec` — platform-specific pre-flight +- `fn write_atomic(path: &Path, content: &str) -> std::io::Result<()>` — shared atomic write helper (tmp + fsync(file) + rename + fsync(parent_dir) for power-loss durability) + +### `src/cli/commands/service/platform/launchd.rs` (NEW, `#[cfg(target_os = "macos")]`) + +- `fn plist_path(service_id: &str) -> PathBuf` — `~/Library/LaunchAgents/com.gitlore.sync.{service_id}.plist` +- `fn generate_plist(service_id, binary_path, config_path, interval_seconds, token_env_var, token_value, token_source, log_dir, data_dir) -> String` — generates plist with wrapper script (env-file) or direct invocation (embedded) +- `fn generate_plist_with_wrapper(service_id, wrapper_path, interval_seconds, log_dir) -> String` — env-file variant: ProgramArguments points to wrapper script +- `fn generate_plist_with_embedded(service_id, binary_path, config_path, interval_seconds, token_env_var, token_value, log_dir) -> String` — embedded variant: token in EnvironmentVariables +- `fn install(service_id, ...) -> Result` +- `fn uninstall(service_id) -> Result` +- `fn is_installed(service_id) -> bool` +- `fn get_state(service_id) -> Option` +- `fn get_interval_seconds(service_id) -> u64` +- `fn check_prerequisites() -> Vec` — GUI session check +- Unit tests: `test_generate_plist_with_wrapper()` — verify wrapper path in ProgramArguments, no token in plist +- Unit tests: `test_generate_plist_with_embedded()` — verify token in EnvironmentVariables +- Unit tests: XML escaping, service_id in label + +### `src/cli/commands/service/platform/systemd.rs` (NEW, `#[cfg(target_os = "linux")]`) + +- `fn unit_dir() -> PathBuf` — `~/.config/systemd/user/` +- `fn generate_service(service_id, binary_path, config_path, token_env_var, token_value, token_source, data_dir) -> String` — includes hardening directives +- `fn generate_timer(service_id, interval_seconds) -> String` +- `fn install(service_id, ...) -> Result` +- `fn uninstall(service_id) -> Result` +- Same query functions as launchd (all scoped by service_id) +- `fn check_prerequisites() -> Vec` — user manager + linger checks +- Unit test: `test_generate_service()` (both env-file and embedded, verify hardening), `test_generate_timer()` + +### `src/cli/commands/service/platform/schtasks.rs` (NEW, `#[cfg(target_os = "windows")]`) + +- `fn install(service_id, ...) -> Result` +- `fn uninstall(service_id) -> Result` +- Same query functions (scoped by service_id) +- `fn check_prerequisites() -> Vec` — schtasks availability +- Note: `token_source: "system_env"` — token must be in system environment + +--- + +## Testing Strategy + +### Test Infrastructure + +**Fake clock for deterministic time-dependent tests:** +```rust +/// Test clock with controllable time +struct FakeClock { + now_ms: i64, +} + +impl Clock for FakeClock { + fn now_ms(&self) -> i64 { + self.now_ms + } +} +``` + +**Fake RNG for deterministic jitter tests:** +```rust +/// Test RNG that returns a predetermined sequence of values +struct FakeJitterRng { + values: Vec, + index: usize, +} + +impl FakeJitterRng { + fn new(values: Vec) -> Self { + Self { values, index: 0 } + } +} + +impl JitterRng for FakeJitterRng { + fn next_f64(&mut self) -> f64 { + let val = self.values[self.index % self.values.len()]; + self.index += 1; + val + } +} +``` + +This eliminates all time- and randomness-dependent flakiness. Every test sets an explicit "now" and jitter value, then asserts exact results. + +### Unit Tests (in `src/core/sync_status.rs`) + +```rust +#[cfg(test)] +mod tests { + use super::*; + use tempfile::TempDir; + + struct FakeClock { now_ms: i64 } + impl Clock for FakeClock { + fn now_ms(&self) -> i64 { self.now_ms } + } + + struct FakeJitterRng { value: f64 } + impl FakeJitterRng { + fn new(value: f64) -> Self { + Self { value } + } + } + + impl JitterRng for FakeJitterRng { + fn next_f64(&mut self) -> f64 { + self.value + } + } + + // --- Interval parsing --- + + #[test] + fn parse_interval_valid_minutes() { + assert_eq!(parse_interval("5m").unwrap(), 300); + assert_eq!(parse_interval("15m").unwrap(), 900); + assert_eq!(parse_interval("30m").unwrap(), 1800); + } + + #[test] + fn parse_interval_valid_hours() { + assert_eq!(parse_interval("1h").unwrap(), 3600); + assert_eq!(parse_interval("2h").unwrap(), 7200); + assert_eq!(parse_interval("24h").unwrap(), 86400); + } + + #[test] + fn parse_interval_too_short() { + assert!(parse_interval("1m").is_err()); + assert!(parse_interval("4m").is_err()); + } + + #[test] + fn parse_interval_too_long() { + assert!(parse_interval("25h").is_err()); + } + + #[test] + fn parse_interval_invalid() { + assert!(parse_interval("0m").is_err()); + assert!(parse_interval("abc").is_err()); + assert!(parse_interval("").is_err()); + assert!(parse_interval("m").is_err()); + assert!(parse_interval("10x").is_err()); + assert!(parse_interval("30s").is_err()); // seconds not supported + } + + #[test] + fn parse_interval_trims_whitespace() { + assert_eq!(parse_interval(" 30m ").unwrap(), 1800); + } + + // --- Status file persistence --- + + #[test] + fn status_file_round_trip() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("sync-status-test1234.json"); + + let mut status = SyncStatusFile::default(); + let run = SyncRunRecord { + timestamp_iso: "2026-02-09T10:30:00Z".to_string(), + timestamp_ms: 1_770_609_000_000, + duration_seconds: 12.5, + outcome: "success".to_string(), + stage_results: vec![ + StageResult { stage: "issues".into(), success: true, items_updated: 5, error: None }, + StageResult { stage: "mrs".into(), success: true, items_updated: 3, error: None }, + ], + error_message: None, + }; + status.record_run(run); + status.write_atomic(&path).unwrap(); + + let loaded = SyncStatusFile::read(&path).unwrap().unwrap(); + assert_eq!(loaded.last_run.as_ref().unwrap().outcome, "success"); + assert_eq!(loaded.last_run.as_ref().unwrap().stage_results.len(), 2); + assert_eq!(loaded.consecutive_failures, 0); + assert_eq!(loaded.recent_runs.len(), 1); + assert_eq!(loaded.schema_version, 1); + } + + #[test] + fn status_file_read_missing_returns_ok_none() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("nonexistent.json"); + assert!(SyncStatusFile::read(&path).unwrap().is_none()); + } + + #[test] + fn status_file_read_corrupt_returns_err() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("corrupt.json"); + std::fs::write(&path, "not valid json{{{").unwrap(); + assert!(SyncStatusFile::read(&path).is_err()); + } + + #[test] + fn status_file_atomic_write_survives_crash() { + // Verify no partial writes by checking file is valid JSON after write + let dir = TempDir::new().unwrap(); + let path = dir.path().join("sync-status-test1234.json"); + let status = SyncStatusFile::default(); + status.write_atomic(&path).unwrap(); + // Read back and verify + let loaded = SyncStatusFile::read(&path).unwrap().unwrap(); + assert_eq!(loaded.schema_version, 1); + } + + #[test] + fn record_run_caps_at_10() { + let mut status = SyncStatusFile::default(); + for i in 0..15 { + status.record_run(make_run(i * 1000, "success")); + } + assert_eq!(status.recent_runs.len(), 10); + } + + #[test] + fn default_status_has_no_last_run() { + let status = SyncStatusFile::default(); + assert!(status.last_run.is_none()); + } + + // --- Backoff (deterministic via FakeClock + persisted next_retry_at_ms) --- + + #[test] + fn backoff_returns_none_when_zero_failures() { + let status = make_status("success", 0, 100_000); + let clock = FakeClock { now_ms: 200_000 }; + assert!(status.backoff_remaining(&clock).is_none()); + } + + #[test] + fn backoff_returns_none_when_no_next_retry() { + let mut status = make_status("failed", 1, 100_000_000); + status.next_retry_at_ms = None; + let clock = FakeClock { now_ms: 200_000_000 }; + assert!(status.backoff_remaining(&clock).is_none()); + } + + #[test] + fn backoff_active_within_window() { + let mut status = make_status("failed", 1, 100_000_000); + status.next_retry_at_ms = Some(100_000_000 + 1_800_000); // 30 min from now + let clock = FakeClock { now_ms: 100_000_000 + 1000 }; // 1s after failure + let remaining = status.backoff_remaining(&clock); + assert!(remaining.is_some()); + assert_eq!(remaining.unwrap(), 1799); + } + + #[test] + fn backoff_expired() { + let mut status = make_status("failed", 1, 100_000_000); + status.next_retry_at_ms = Some(100_000_000 + 1_800_000); + let clock = FakeClock { now_ms: 100_000_000 + 2_000_000 }; // past retry time + assert!(status.backoff_remaining(&clock).is_none()); + } + + #[test] + fn set_backoff_persists_next_retry() { + let mut status = make_status("failed", 1, 100_000_000); + let clock = FakeClock { now_ms: 100_000_000 }; + let mut rng = FakeJitterRng::new(0.5); // 0.5 for deterministic + status.set_backoff(1800, &clock, &mut rng, None); + assert!(status.next_retry_at_ms.is_some()); + // With jitter=0.5, backoff = max(1800*0.5, 1800) = 1800s + let expected_ms = 100_000_000 + 1_800_000; + assert_eq!(status.next_retry_at_ms.unwrap(), expected_ms); + } + + #[test] + fn set_backoff_caps_at_4_hours() { + let mut status = make_status("failed", 20, 100_000_000); + let clock = FakeClock { now_ms: 100_000_000 }; + let mut rng = FakeJitterRng::new(1.0); // max jitter + status.set_backoff(1800, &clock, &mut rng, None); + // Cap: 4h = 14400s, jitter=1.0: max(14400*1.0, 1800) = 14400 + let max_ms = 100_000_000 + 14_400_000; + assert!(status.next_retry_at_ms.unwrap() <= max_ms); + } + + #[test] + fn set_backoff_minimum_is_base_interval() { + let mut status = make_status("failed", 1, 100_000_000); + let clock = FakeClock { now_ms: 100_000_000 }; + let mut rng = FakeJitterRng::new(0.0); // min jitter + status.set_backoff(1800, &clock, &mut rng, None); + // jitter=0.0: max(1800*0.0, 1800) = 1800 (minimum enforced) + let expected_ms = 100_000_000 + 1_800_000; + assert_eq!(status.next_retry_at_ms.unwrap(), expected_ms); + } + + #[test] + fn set_backoff_respects_retry_after_hint() { + let mut status = make_status("failed", 1, 100_000_000); + let clock = FakeClock { now_ms: 100_000_000 }; + let mut rng = FakeJitterRng::new(0.0); // min jitter => computed backoff = 1800s + let hint = 100_000_000 + 3_600_000; // server says retry after 1 hour + status.set_backoff(1800, &clock, &mut rng, Some(hint)); + // Hint (1h) > computed backoff (30m), so hint wins + assert_eq!(status.next_retry_at_ms.unwrap(), hint); + } + + #[test] + fn set_backoff_ignores_hint_when_computed_is_larger() { + let mut status = make_status("failed", 1, 100_000_000); + let clock = FakeClock { now_ms: 100_000_000 }; + let mut rng = FakeJitterRng::new(0.0); + let hint = 100_000_000 + 60_000; // server says retry after 1 minute + status.set_backoff(1800, &clock, &mut rng, Some(hint)); + // Computed (30m) > hint (1m), so computed wins + let expected_ms = 100_000_000 + 1_800_000; + assert_eq!(status.next_retry_at_ms.unwrap(), expected_ms); + } + + #[test] + fn set_backoff_uses_configured_interval_not_hardcoded() { + let mut status1 = make_status("failed", 1, 100_000_000); + let mut status2 = make_status("failed", 1, 100_000_000); + let clock = FakeClock { now_ms: 100_000_000 }; + let mut rng = FakeJitterRng::new(0.5); + + status1.set_backoff(300, &clock, &mut rng, None); // 5m base + rng.value = 0.5; // reset + status2.set_backoff(3600, &clock, &mut rng, None); // 1h base + + // 5m base should produce shorter backoff than 1h base + assert!(status1.next_retry_at_ms.unwrap() < status2.next_retry_at_ms.unwrap()); + } + + #[test] + fn backoff_skips_when_paused() { + let mut status = make_status("failed", 3, 100_000_000); + status.paused_reason = Some("AUTH_FAILED".to_string()); + status.next_retry_at_ms = Some(100_000_000 + 999_999_999); + let clock = FakeClock { now_ms: 100_000_000 + 1000 }; + // Paused state is checked separately, backoff_remaining returns None + assert!(status.backoff_remaining(&clock).is_none()); + } + + // --- Error classification --- + + #[test] + fn permanent_errors_classified_correctly() { + assert!(is_permanent_error(&ErrorCode::TokenNotSet)); + assert!(is_permanent_error(&ErrorCode::AuthFailed)); + assert!(is_permanent_error(&ErrorCode::ConfigNotFound)); + assert!(is_permanent_error(&ErrorCode::ConfigInvalid)); + assert!(is_permanent_error(&ErrorCode::MigrationFailed)); + } + + #[test] + fn transient_errors_classified_correctly() { + assert!(!is_permanent_error(&ErrorCode::NetworkError)); + assert!(!is_permanent_error(&ErrorCode::RateLimited)); + assert!(!is_permanent_error(&ErrorCode::DbLocked)); + assert!(!is_permanent_error(&ErrorCode::DbError)); + assert!(!is_permanent_error(&ErrorCode::InternalError)); + } + + // --- Stage-aware outcomes --- + + #[test] + fn degraded_outcome_does_not_count_as_failure() { + // When core stages succeed but optional stages fail, consecutive_failures should reset + let mut status = make_status("failed", 3, 100_000_000); + status.next_retry_at_ms = Some(200_000_000); + + // Simulate degraded outcome clearing failure state + status.consecutive_failures = 0; + status.next_retry_at_ms = None; + assert_eq!(status.consecutive_failures, 0); + assert!(status.next_retry_at_ms.is_none()); + } + + // --- Backoff (service run only, NOT manual sync) --- + // (Test degraded state by running with --profile full when Ollama is down) + // Embeddings should fail, but issues/MRs should succeed + #[test] + fn service_run_degraded_outcome_clears_failures() { + let mut status = make_status("failed", 3, 100_000_000); + status.consecutive_failures = 3; + status.next_retry_at_ms = Some(200_000_000); + + // Simulate degraded outcome clearing failure state + status.consecutive_failures = 0; + status.next_retry_at_ms = None; + assert_eq!(status.consecutive_failures, 0); + assert!(status.next_retry_at_ms.is_none()); + } + + // --- Circuit breaker --- + #[test] + fn circuit_breaker_trips_at_threshold() { + let mut status = make_status("failed", 9, 100_000_000); + // Incrementing to 10 should trigger circuit breaker + status.consecutive_failures = status.consecutive_failures.saturating_add(1); + assert_eq!(status.consecutive_failures, 10); + // Caller would set paused_reason = "CIRCUIT_BREAKER" + } + + // --- Paused state (permanent error) --- + #[test] + fn clear_paused_resets_all_fields() { + let mut status = make_status("failed", 5, 100_000_000); + status.paused_reason = Some("AUTH_FAILED: 401 Unauthorized".to_string()); + status.last_error_code = Some("AUTH_FAILED".to_string()); + status.last_error_message = Some("401 Unauthorized".to_string()); + status.next_retry_at_ms = Some(200_000_000); + status.circuit_breaker_paused_at_ms = Some(100_000_000); + status.clear_paused(); + assert!(status.paused_reason.is_none()); + assert!(status.circuit_breaker_paused_at_ms.is_none()); + assert!(status.last_error_code.is_none()); + assert!(status.last_error_message.is_none()); + assert!(status.next_retry_at_ms.is_none()); + assert_eq!(status.consecutive_failures, 0); + } + + #[test] + fn clear_paused_also_clears_circuit_breaker() { + let mut status = make_status("failed", 10, 100_000_000); + status.paused_reason = Some("CIRCUIT_BREAKER: 10 consecutive transient failures".to_string()); + status.clear_paused(); + assert!(status.paused_reason.is_none()); + assert_eq!(status.consecutive_failures, 0); + } + + fn make_run(ts_ms: i64, outcome: &str) -> SyncRunRecord { + SyncRunRecord { + timestamp_iso: String::new(), + timestamp_ms: ts_ms, + duration_seconds: 1.0, + outcome: outcome.to_string(), + stage_results: vec![], + error_message: if outcome == "failed" { + Some("test error".into()) + } else { + None + }, + } + } + + fn make_stage_result(stage: &str, success: bool, error_code: Option<&str>) -> StageResult { + StageResult { + stage: stage.to_string(), + success, + items_updated: if success { 5 } else { 0 }, + error: if success { None } else { Some("test error".into()) }, + error_code: error_code.map(|s| s.to_string()), + } + } + + fn make_status(outcome: &str, failures: u32, ts_ms: i64) -> SyncStatusFile { + let run = make_run(ts_ms, outcome); + SyncStatusFile { + schema_version: 1, + updated_at_iso: String::new(), + last_run: Some(run.clone()), + recent_runs: vec![run], + consecutive_failures: failures, + next_retry_at_ms: None, + paused_reason: None, + circuit_breaker_paused_at_ms: None, + last_error_code: None, + last_error_message: None, + current_run: None, + } + } +} +``` + +### Service Manifest Tests (in `src/core/service_manifest.rs`) + +```rust +#[cfg(test)] +mod tests { + use super::*; + use tempfile::TempDir; + + #[test] + fn manifest_round_trip() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("manifest.json"); + let manifest = ServiceManifest { + schema_version: 1, + service_id: "a1b2c3d4e5f6".to_string(), + workspace_root: "/Users/x/projects/my-project".to_string(), + installed_at_iso: "2026-02-09T10:00:00Z".to_string(), + updated_at_iso: "2026-02-09T10:00:00Z".to_string(), + platform: "launchd".to_string(), + interval_seconds: 900, + profile: "fast".to_string(), + binary_path: "/usr/local/bin/lore".to_string(), + config_path: None, + token_source: "env_file".to_string(), + token_env_var: "GITLAB_TOKEN".to_string(), + service_files: vec!["/Users/x/Library/LaunchAgents/com.gitlore.sync.a1b2c3d4e5f6.plist".to_string()], + sync_command: "/usr/local/bin/lore --robot service run".to_string(), + max_transient_failures: 10, + circuit_breaker_cooldown_seconds: 1800, + spec_hash: "abc123def456".to_string(), + }; + manifest.write_atomic(&path).unwrap(); + let loaded = ServiceManifest::read(&path).unwrap().unwrap(); + assert_eq!(loaded.profile, "fast"); + assert_eq!(loaded.interval_seconds, 900); + assert_eq!(loaded.service_id, "a1b2c3d4e5f6"); + assert_eq!(loaded.max_transient_failures, 10); + assert_eq!(loaded.circuit_breaker_cooldown_seconds, 1800); + } + + #[test] + fn manifest_read_missing_returns_ok_none() { + let dir = TempDir::new().unwrap(); + assert!(ServiceManifest::read(&dir.path().join("nope.json")).unwrap().is_none()); + } + + #[test] + fn manifest_read_corrupt_returns_err() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("bad.json"); + std::fs::write(&path, "{{{{").unwrap(); + assert!(ServiceManifest::read(&path).is_err()); + } + + #[test] + fn profile_to_sync_args_fast() { + let m = make_manifest("fast"); + assert_eq!(m.profile_to_sync_args(), vec!["--no-docs", "--no-embed"]); + } + + #[test] + fn profile_to_sync_args_balanced() { + let m = make_manifest("balanced"); + assert_eq!(m.profile_to_sync_args(), vec!["--no-embed"]); + } + + #[test] + fn profile_to_sync_args_full() { + let m = make_manifest("full"); + assert!(m.profile_to_sync_args().is_empty()); + } + + #[test] + fn compute_service_id_deterministic() { + let urls = ["https://gitlab.com/group/repo"]; + let id1 = compute_service_id(Path::new("/home/user/project"), Path::new("/home/user/.config/lore/config.json"), &urls); + let id2 = compute_service_id(Path::new("/home/user/project"), Path::new("/home/user/.config/lore/config.json"), &urls); + assert_eq!(id1, id2); + assert_eq!(id1.len(), 12); + } + + #[test] + fn compute_service_id_different_workspaces() { + let urls = ["https://gitlab.com/group/repo"]; + let config = Path::new("/home/user/.config/lore/config.json"); + let id1 = compute_service_id(Path::new("/home/user/project-a"), config, &urls); + let id2 = compute_service_id(Path::new("/home/user/project-b"), config, &urls); + assert_ne!(id1, id2); // Same config, different workspace => different IDs + } + + #[test] + fn compute_service_id_different_configs() { + let urls = ["https://gitlab.com/group/repo"]; + let workspace = Path::new("/home/user/project"); + let id1 = compute_service_id(workspace, Path::new("/home/user1/config.json"), &urls); + let id2 = compute_service_id(workspace, Path::new("/home/user2/config.json"), &urls); + assert_ne!(id1, id2); + } + + #[test] + fn compute_service_id_different_projects_same_config() { + let workspace = Path::new("/home/user/project"); + let config = Path::new("/home/user/.config/lore/config.json"); + let id1 = compute_service_id(workspace, config, &["https://gitlab.com/group/repo-a"]); + let id2 = compute_service_id(workspace, config, &["https://gitlab.com/group/repo-b"]); + assert_ne!(id1, id2); // Same config path, different projects => different IDs + } + + #[test] + fn compute_service_id_url_order_independent() { + let workspace = Path::new("/home/user/project"); + let config = Path::new("/config.json"); + let id1 = compute_service_id(workspace, config, &["https://gitlab.com/a", "https://gitlab.com/b"]); + let id2 = compute_service_id(workspace, config, &["https://gitlab.com/b", "https://gitlab.com/a"]); + assert_eq!(id1, id2); // Order should not matter (sorted internally) + } + + #[test] + fn sanitize_service_name_valid() { + assert_eq!(sanitize_service_name("my-project").unwrap(), "my-project"); + assert_eq!(sanitize_service_name("MyProject").unwrap(), "myproject"); + } + + #[test] + fn sanitize_service_name_special_chars() { + assert_eq!(sanitize_service_name("my project!").unwrap(), "my-project-"); + } + + #[test] + fn sanitize_service_name_empty_rejects() { + assert!(sanitize_service_name("---").is_err()); + assert!(sanitize_service_name("").is_err()); + } + + #[test] + fn sanitize_service_name_too_long() { + let long_name = "a".repeat(33); + assert!(sanitize_service_name(&long_name).is_err()); + } + + fn make_manifest(profile: &str) -> ServiceManifest { /* ... */ } +} +``` + +### Platform-Specific Unit Tests + +```rust +// In platform/launchd.rs +#[cfg(test)] +mod tests { + use super::*; + + // --- Wrapper script variant (env-file, default) --- + + #[test] + fn plist_wrapper_contains_scoped_label() { + let plist = generate_plist_with_wrapper("abc123", Path::new("/data/service-run-abc123.sh"), 1800, Path::new("/tmp/logs")); + assert!(plist.contains("com.gitlore.sync.abc123")); + } + + #[test] + fn plist_wrapper_invokes_wrapper_not_lore_directly() { + let plist = generate_plist_with_wrapper("abc123", Path::new("/data/service-run-abc123.sh"), 1800, Path::new("/tmp/logs")); + assert!(plist.contains("/data/service-run-abc123.sh")); + // Should NOT contain direct lore invocation args + assert!(!plist.contains("--robot")); + assert!(!plist.contains("service")); + } + + #[test] + fn plist_wrapper_does_not_contain_token() { + let plist = generate_plist_with_wrapper("abc123", Path::new("/data/service-run-abc123.sh"), 1800, Path::new("/tmp/logs")); + assert!(!plist.contains("GITLAB_TOKEN")); + assert!(!plist.contains("glpat")); + } + + #[test] + fn plist_wrapper_contains_interval() { + let plist = generate_plist_with_wrapper("abc123", Path::new("/data/service-run-abc123.sh"), 900, Path::new("/tmp/logs")); + assert!(plist.contains("900")); + } + + // --- Embedded variant --- + + #[test] + fn plist_embedded_contains_token() { + let plist = generate_plist_with_embedded("abc123", "/usr/local/bin/lore", None, 1800, "GITLAB_TOKEN", "glpat-xxx", Path::new("/tmp/logs")); + assert!(plist.contains("GITLAB_TOKEN")); + assert!(plist.contains("glpat-xxx")); + } + + #[test] + fn plist_embedded_invokes_lore_directly() { + let plist = generate_plist_with_embedded("abc123", "/usr/local/bin/lore", None, 1800, "GITLAB_TOKEN", "glpat-xxx", Path::new("/tmp/logs")); + assert!(plist.contains("--robot")); + assert!(plist.contains("service")); + assert!(plist.contains("run")); + } + + #[test] + fn plist_embedded_xml_escapes_token() { + let plist = generate_plist_with_embedded( + "abc123", "/usr/local/bin/lore", None, 1800, "GITLAB_TOKEN", "tok&en<>", Path::new("/tmp/logs"), + ); + assert!(plist.contains("tok&en<>")); + assert!(!plist.contains("tok&en<>")); + } + + #[test] + fn plist_xml_escapes_paths_with_special_chars() { + let plist = generate_plist_with_embedded( + "abc123", "/Users/O'Brien/bin/lore", None, 1800, "GITLAB_TOKEN", "glpat-xxx", + Path::new("/tmp/logs"), + ); + assert!(plist.contains("O'Brien")); + } + + // --- Shared plist properties --- + + #[test] + fn plist_has_background_process_type() { + let plist = generate_plist_with_wrapper("abc123", Path::new("/data/service-run-abc123.sh"), 1800, Path::new("/tmp/logs")); + assert!(plist.contains("Background")); + assert!(plist.contains("10")); // Nice + } + + #[test] + fn plist_embedded_includes_config_path_when_provided() { + let plist = generate_plist_with_embedded("abc123", "/usr/local/bin/lore", Some("/custom/config.json"), 1800, "GITLAB_TOKEN", "glpat-xxx", Path::new("/tmp/logs")); + assert!(plist.contains("LORE_CONFIG_PATH")); + assert!(plist.contains("/custom/config.json")); + } +} + +// In platform/systemd.rs +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn service_unit_contains_hardening() { + let unit = generate_service("abc123", "/usr/local/bin/lore", None, "GITLAB_TOKEN", "glpat-xxx", "env-file", Path::new("/data")); + assert!(unit.contains("NoNewPrivileges=true")); + assert!(unit.contains("PrivateTmp=true")); + assert!(unit.contains("ProtectSystem=strict")); + assert!(unit.contains("ProtectHome=read-only")); + assert!(unit.contains("TimeoutStartSec=900")); + assert!(unit.contains("WorkingDirectory=/data")); + assert!(unit.contains("SuccessExitStatus=0")); + } + + #[test] + fn service_unit_env_file_mode() { + let unit = generate_service("abc123", "/usr/local/bin/lore", None, "GITLAB_TOKEN", "glpat-xxx", "env-file", Path::new("/data")); + assert!(unit.contains("EnvironmentFile=/data/service-env-abc123")); + assert!(!unit.contains("Environment=GITLAB_TOKEN=")); + } + + #[test] + fn service_unit_embedded_mode() { + let unit = generate_service("abc123", "/usr/local/bin/lore", None, "GITLAB_TOKEN", "glpat-xxx", "embedded", Path::new("/data")); + assert!(unit.contains("Environment=GITLAB_TOKEN=glpat-xxx")); + assert!(!unit.contains("EnvironmentFile=")); + } + + #[test] + fn timer_unit_contains_scoped_description() { + let timer = generate_timer("abc123", 900); + assert!(timer.contains("abc123")); + assert!(timer.contains("OnUnitInactiveSec=900s")); + } +} +``` + +### Integration Tests (CLI parsing) + +```rust +// In service/mod.rs +#[cfg(test)] +mod tests { + use clap::Parser; + use crate::cli::Cli; + + #[test] + fn parse_service_install_default() { + let cli = Cli::try_parse_from(["lore", "service", "install"]).unwrap(); + match cli.command { + Some(Commands::Service { command: ServiceCommand::Install { interval, profile, token_source, name } }) => { + assert_eq!(interval, "30m"); + assert_eq!(profile, "balanced"); + assert_eq!(token_source, "env-file"); + assert!(name.is_none()); + } + _ => panic!("Expected Service Install"), + } + } + + #[test] + fn parse_service_install_all_flags() { + let cli = Cli::try_parse_from([ + "lore", "service", "install", + "--interval", "1h", + "--profile", "fast", + "--token-source", "embedded", + "--name", "my-project", + ]).unwrap(); + match cli.command { + Some(Commands::Service { command: ServiceCommand::Install { interval, profile, token_source, name } }) => { + assert_eq!(interval, "1h"); + assert_eq!(profile, "fast"); + assert_eq!(token_source, "embedded"); + assert_eq!(name.as_deref(), Some("my-project")); + } + _ => panic!("Expected Service Install"), + } + } + + #[test] + fn parse_service_uninstall() { + let cli = Cli::try_parse_from(["lore", "service", "uninstall"]).unwrap(); + assert!(matches!( + cli.command, + Some(Commands::Service { command: ServiceCommand::Uninstall }) + )); + } + + #[test] + fn parse_service_status() { + let cli = Cli::try_parse_from(["lore", "service", "status"]).unwrap(); + assert!(matches!( + cli.command, + Some(Commands::Service { command: ServiceCommand::Status }) + )); + } + + #[test] + fn parse_service_logs_default() { + let cli = Cli::try_parse_from(["lore", "service", "logs"]).unwrap(); + assert!(matches!( + cli.command, + Some(Commands::Service { command: ServiceCommand::Logs { .. } }) + )); + } + + #[test] + fn parse_service_logs_with_tail() { + let cli = Cli::try_parse_from(["lore", "service", "logs", "--tail", "50"]).unwrap(); + // Verify tail flag is parsed + } + + #[test] + fn parse_service_resume() { + let cli = Cli::try_parse_from(["lore", "service", "resume"]).unwrap(); + assert!(matches!( + cli.command, + Some(Commands::Service { command: ServiceCommand::Resume }) + )); + } + + #[test] + fn parse_service_doctor() { + let cli = Cli::try_parse_from(["lore", "service", "doctor"]).unwrap(); + assert!(matches!( + cli.command, + Some(Commands::Service { command: ServiceCommand::Doctor { .. } }) + )); + } + + #[test] + fn parse_service_doctor_offline() { + let cli = Cli::try_parse_from(["lore", "service", "doctor", "--offline"]).unwrap(); + // Verify offline flag is parsed + } + + #[test] + fn parse_service_run_hidden() { + let cli = Cli::try_parse_from(["lore", "service", "run"]).unwrap(); + assert!(matches!( + cli.command, + Some(Commands::Service { command: ServiceCommand::Run }) + )); + } +} +``` + +### Behavioral Tests (service run isolation) + +```rust +// Verify that manual sync path is NOT affected by service state +#[test] +fn manual_sync_ignores_backoff_state() { + // Create a status file with active backoff + let dir = TempDir::new().unwrap(); + let status_path = dir.path().join("sync-status-test1234.json"); + let mut status = make_status("failed", 5, chrono::Utc::now().timestamp_millis()); + status.next_retry_at_ms = Some(chrono::Utc::now().timestamp_millis() + 999_999_999); + status.write_atomic(&status_path).unwrap(); + + // handle_sync_cmd should NOT read this file at all + // (verified by the absence of any backoff check in handle_sync_cmd) +} + +// Verify service run respects paused state +#[test] +fn service_run_respects_paused_state() { + let mut status = SyncStatusFile::default(); + status.paused_reason = Some("AUTH_FAILED".to_string()); + // handle_service_run should check paused_reason BEFORE backoff + // and exit with action: "paused" +} + +// Verify degraded outcome clears failure counter +#[test] +fn service_run_degraded_clears_failures() { + let mut status = make_status("failed", 3, 100_000_000); + status.next_retry_at_ms = Some(200_000_000); + // After a degraded run (core OK, optional failed): + status.consecutive_failures = 0; + status.next_retry_at_ms = None; + assert_eq!(status.consecutive_failures, 0); +} + +// Verify circuit breaker trips at threshold +#[test] +fn service_run_circuit_breaker_trips() { + let mut status = make_status("failed", 9, 100_000_000); + status.consecutive_failures = status.consecutive_failures.saturating_add(1); + // At 10 failures, should set paused_reason + if status.consecutive_failures >= 10 { + status.paused_reason = Some("CIRCUIT_BREAKER".to_string()); + } + assert!(status.paused_reason.is_some()); +} +``` + +--- + +## New Dependencies + +**Two new crates:** + +| Crate | Version | Purpose | Justification | +|-------|---------|---------|---------------| +| `sha2` | `0.10` | Compute `service_id` from config path | Small, well-audited, no-std compatible. Used for exactly one hash computation. | +| `hex` | `0.4` | Encode hash bytes to hex string | Tiny utility, widely used. | + +> **Note on `rand`:** The `JitterRng` trait uses `rand::thread_rng()` in production. Check if `rand` is already a transitive dependency (via other crates). If so, add it as a direct dependency. If not, consider using a simpler PRNG or system randomness via `getrandom` to avoid pulling in the full `rand` crate for a single call site. The `JitterRng` trait abstracts this, so the implementation can change without affecting the API. + +**Existing dependencies used:** +- `std::process::Command` — for launchctl, systemctl, schtasks +- `format!()` — for plist XML and systemd unit templates +- `std::env::current_exe()` — for binary path resolution +- `serde` + `serde_json` (existing) — for status/manifest files +- `chrono` (existing) — for timestamps +- `dirs` (existing) — for home directory +- `libc` (existing, unix only) — for `getuid()` +- `console` (existing) — for colored human output +- `tempfile` (existing, dev dep) — for test temp dirs + +--- + +## Implementation Order + +### Phase 1: Core types (standalone, fully testable) +1. `Cargo.toml` — add `sha2`, `hex` dependencies (and `rand` if not already transitive) +2. `src/core/sync_status.rs` — `SyncRunRecord`, `StageResult` (with `error_code`), `SyncStatusFile` (with `circuit_breaker_paused_at_ms`, `current_run`), `CurrentRunState`, `Clock` trait, `JitterRng` trait, `parse_interval`, `is_permanent_error`, `is_permanent_stage_error`, `is_circuit_breaker_half_open`, `extract_retry_after_hint`, atomic write helper, schema migration on read, all unit tests +3. `src/core/service_manifest.rs` — `ServiceManifest` (with `circuit_breaker_cooldown_seconds`, `workspace_root`, `spec_hash`), `DiagnosticCheck`, `DiagnosticStatus`, `compute_service_id(workspace_root, config_path, project_urls)`, `sanitize_service_name`, `compute_spec_hash(service_files_content)`, profile mapping, atomic write helper, schema migration on read, unit tests +4. `src/core/error.rs` — add `ServiceError`, `ServiceUnsupported`, `ServiceCommandFailed`, `ServiceCorruptState` +5. `src/core/paths.rs` — add `get_service_status_path(service_id)`, `get_service_manifest_path(service_id)`, `get_service_env_path(service_id)`, `get_service_wrapper_path(service_id)`, `get_service_log_path(service_id, stream)`, `list_service_ids()` +6. `src/core/mod.rs` — add `pub mod sync_status; pub mod service_manifest;` + +### Phase 2: Platform backends (parallelizable across platforms) +7. `src/cli/commands/service/platform/mod.rs` — dispatch functions (with `service_id`), `run_cmd` (with kill+reap on timeout), `wait_with_timeout_kill_and_reap`, `xml_escape`, `write_token_env_file`, `write_wrapper_script`, `write_atomic`, `check_prerequisites` +8. `src/cli/commands/service/platform/launchd.rs` — macOS backend with wrapper script (env-file) and embedded variants, project-scoped label + prerequisite checks + tests +9. `src/cli/commands/service/platform/systemd.rs` — Linux backend with hardened unit (WorkingDirectory, SuccessExitStatus), project-scoped names, linger/user-manager checks + tests +10. `src/cli/commands/service/platform/schtasks.rs` — Windows backend with project-scoped task name + +### Phase 3: Command handlers +11. `src/cli/commands/service/doctor.rs` — pre-flight diagnostic checks (used by install and standalone) +12. `src/cli/commands/service/install.rs` — install handler with transactional ordering (enable then manifest), wrapper script generation, doctor pre-flight, service_id +13. `src/cli/commands/service/uninstall.rs` — uninstall handler with `--service`/`--all` selectors (removes manifest + env file + wrapper script) +14. `src/cli/commands/service/list.rs` — list handler (scans data_dir for manifests, verifies platform state) +15. `src/cli/commands/service/status.rs` — status handler with scheduler state including `degraded` and `half_open` +16. `src/cli/commands/service/logs.rs` — logs handler with default tail output, `--open` for editor, `--follow`, log rotation check +17. `src/cli/commands/service/resume.rs` — resume handler (clears paused + circuit breaker) +18. `src/cli/commands/service/pause.rs` — pause handler (sets manual pause reason) +19. `src/cli/commands/service/trigger.rs` — trigger handler (immediate run with optional backoff bypass) +20. `src/cli/commands/service/repair.rs` — repair handler (backup corrupt files, reinitialize) +21. `src/cli/commands/service/run.rs` — hidden scheduled execution entrypoint with stage-aware execution, circuit breaker, half-open probe, log rotation +22. `src/cli/commands/service/mod.rs` — re-exports + `resolve_service_id` helper + +### Phase 4: CLI wiring +23. `src/cli/mod.rs` — `ServiceCommand` in `Commands` enum (with all new subcommands and flags) +24. `src/cli/commands/mod.rs` — `pub mod service;` +25. `src/main.rs` — dispatch + pipeline lock in `handle_sync_cmd` + robot-docs manifest +26. `src/cli/autocorrect.rs` — add service entry with all flags + +### Phase 5: Verification +27. `cargo check --all-targets && cargo clippy --all-targets -- -D warnings && cargo test && cargo fmt --check` + +--- + +## Verification Checklist + +```bash +# Build and lint +cargo check --all-targets +cargo clippy --all-targets -- -D warnings +cargo fmt --check + +# Run all tests +cargo test + +# --- Doctor (run first to verify prerequisites) --- +cargo run --release -- service doctor +cargo run --release -- -J service doctor | jq '.data.overall' # should show "pass" or "warn" +cargo run --release -- -J service doctor --offline | jq . +cargo run --release -- -J service doctor --fix | jq '.data.checks[] | select(.status == "fixed")' + +# --- Dry-run install (should write nothing) --- +cargo run --release -- -J service install --interval 15m --profile fast --dry-run | jq '.data.dry_run' # true +launchctl list | grep gitlore # should NOT be present + +# --- Install (macOS) --- +cargo run --release -- service install --interval 15m --profile fast +launchctl list | grep gitlore +cargo run --release -- -J service status | jq '.data.service_id' # should show hash +cargo run --release -- service logs --tail 5 +cargo run --release -- service uninstall +launchctl list | grep gitlore # should be gone + +# Verify install with custom name +cargo run --release -- service install --interval 30m --name my-project +launchctl list | grep gitlore # should show com.gitlore.sync.my-project +cargo run --release -- -J service status | jq '.data.service_id' # "my-project" +cargo run --release -- service uninstall + +# Verify install idempotency +cargo run --release -- -J service install --interval 30m +cargo run --release -- -J service install --interval 30m # should report no_change: true +cargo run --release -- -J service install --interval 15m # should report changes +cargo run --release -- service uninstall + +# --- Service run (use `service trigger` for manual testing, or provide --service-id) --- +cargo run --release -- -J service install --interval 30m +SVC_ID=$(cargo run --release -- -J service status | jq -r '.data.service_id') +cargo run --release -- -J service trigger # preferred way to manually invoke a service run +cargo run --release -- -J service status | jq '.data.recent_runs' # should show the run +cargo run --release -- -J service status | jq '.data.last_sync.stage_results' # per-stage outcomes + +# --- Stage-aware outcomes --- +# (Test degraded state by running with --profile full when Ollama is down) +# Embeddings should fail, but issues/MRs should succeed +cargo run --release -- -J service install --profile full +# Stop Ollama, then: +cargo run --release -- -J service run --service-id $SVC_ID| jq '.data.outcome' # "degraded" +cargo run --release -- -J service status | jq '.data.scheduler_state' # "degraded" + +# --- Backoff (service run only, NOT manual sync) --- +# 1. Create a status file simulating failures +cat > ~/.local/share/lore/sync-status-a1b2c3d4e5f6.json << 'EOF' +{ + "schema_version": 1, + "updated_at_iso": "2026-02-09T10:00:00Z", + "last_run": {"timestamp_iso":"2026-02-09T10:00:00Z","timestamp_ms":TIMESTAMP,"duration_seconds":1.0,"outcome":"failed","stage_results":[],"error_message":"test"}, + "recent_runs": [], + "consecutive_failures": 3, + "next_retry_at_ms": FUTURE_MS, + "paused_reason": null, + "last_error_code": null, + "last_error_message": null, + "circuit_breaker_paused_at_ms": null +} +EOF +# Replace timestamps: sed -i '' "s/TIMESTAMP/$(date +%s)000/;s/FUTURE_MS/$(($(date +%s)*1000 + 3600000))/" ~/.local/share/lore/sync-status-a1b2c3d4e5f6.json + +# 2. Service run should skip (backoff) +cargo run --release -- -J service run --service-id $SVC_ID| jq '.data.action' # "skipped" + +# 3. Manual sync should NOT be affected +cargo run --release -- sync # should proceed normally + +# --- Paused state (permanent error) --- +cat > ~/.local/share/lore/sync-status-a1b2c3d4e5f6.json << 'EOF' +{ + "schema_version": 1, + "updated_at_iso": "2026-02-09T10:00:00Z", + "last_run": {"timestamp_iso":"2026-02-09T10:00:00Z","timestamp_ms":0,"duration_seconds":1.0,"outcome":"failed","stage_results":[],"error_message":"401 Unauthorized"}, + "recent_runs": [], + "consecutive_failures": 1, + "next_retry_at_ms": null, + "paused_reason": "AUTH_FAILED: 401 Unauthorized", + "last_error_code": "AUTH_FAILED", + "last_error_message": "401 Unauthorized", + "circuit_breaker_paused_at_ms": null +} +EOF + +# Service run should report paused +cargo run --release -- -J service run --service-id $SVC_ID| jq '.data.action' # "paused" +cargo run --release -- -J service status | jq '.data.paused_reason' # "AUTH_FAILED" + +# Resume clears the state +cargo run --release -- -J service resume | jq . # clears circuit breaker + +# --- Circuit breaker --- +cat > ~/.local/share/lore/sync-status-a1b2c3d4e5f6.json << 'EOF' +{ + "schema_version": 1, + "updated_at_iso": "2026-02-09T10:00:00Z", + "last_run": {"timestamp_iso":"2026-02-09T10:00:00Z","timestamp_ms":0,"duration_seconds":1.0,"outcome":"failed","stage_results":[],"error_message":"connection refused"}, + "recent_runs": [], + "consecutive_failures": 10, + "next_retry_at_ms": null, + "paused_reason": "CIRCUIT_BREAKER: 10 consecutive transient failures", + "last_error_code": "TRANSIENT", + "last_error_message": "connection refused", + "circuit_breaker_paused_at_ms": 1770609000000 +} +EOF +cargo run --release -- -J service run --service-id $SVC_ID| jq '.data.action' # "paused" +cargo run --release -- -J service status | jq '.data.paused_reason' # "CIRCUIT_BREAKER" +cargo run --release -- -J service resume | jq . # clears circuit breaker + +# --- Robot mode for all commands --- +cargo run --release -- -J service install --interval 30m | jq . +cargo run --release -- -J service list | jq . +cargo run --release -- -J service status | jq . +cargo run --release -- -J service logs --tail 10 | jq . +cargo run --release -- -J service doctor | jq . +cargo run --release -- -J service pause --reason "test" | jq . +cargo run --release -- -J service resume | jq . +cargo run --release -- -J service trigger | jq . +cargo run --release -- -J service repair | jq . +cargo run --release -- -J service uninstall | jq . + +# --- New operational commands --- +cargo run --release -- -J service install --interval 30m +cargo run --release -- -J service pause --reason "maintenance" +cargo run --release -- -J service status | jq '.data.scheduler_state' # "paused" +cargo run --release -- -J service run --service-id $SVC_ID| jq '.data.action' # "paused" +cargo run --release -- -J service resume | jq . +cargo run --release -- -J service trigger | jq . # immediate sync +cargo run --release -- -J service list | jq '.data.services' +cargo run --release -- service uninstall + +# --- Token env file security (macOS/Linux) --- +cargo run --release -- service install --interval 30m +ls -la ~/.local/share/lore/service-env-* # should show -rw------- permissions +# On macOS, verify wrapper script exists and token NOT in plist: +ls -la ~/.local/share/lore/service-run-* # should show -rwx------ permissions +grep -c GITLAB_TOKEN ~/Library/LaunchAgents/com.gitlore.sync.*.plist # should be 0 (env-file mode) +cargo run --release -- service uninstall +ls ~/.local/share/lore/service-env-* # should be gone (uninstall removes it) +ls ~/.local/share/lore/service-run-* # should be gone (uninstall removes wrapper) + +# --- Manifest persistence --- +cargo run --release -- service install --interval 15m --profile full +cat ~/.local/share/lore/service-manifest-*.json | jq . # should show manifest with service_id +cargo run --release -- service uninstall +ls ~/.local/share/lore/service-manifest-* # should be gone + +# --- Logs with tail/follow --- +cargo run --release -- service install --interval 30m +cargo run --release -- -J service run --service-id $SVC_ID # generate some log output +cargo run --release -- service logs --tail 20 # show last 20 lines +# cargo run --release -- service logs --follow # (interactive — Ctrl-C to stop) + +# --- Uninstall cleanup --- +cargo run --release -- service install --interval 30m +cargo run --release -- -J service uninstall | jq '.data.removed_files' +# Verify status file and logs are kept +ls ~/.local/share/lore/sync-status-*.json # should exist +ls ~/.local/share/lore/logs/ # should exist + +# --- Repair command --- +# Corrupt a status file to test repair +echo "{{{" > ~/.local/share/lore/sync-status-test.json +cargo run --release -- -J service repair | jq . # should backup and reinitialize + +# --- Final cleanup --- +cargo run --release -- service uninstall 2>/dev/null +rm -f ~/.local/share/lore/sync-status-*.json +``` + +--- + +## Rejected Recommendations + +Recommendations from external reviewers that were considered and explicitly rejected. Kept here to prevent re-proposal. + +- **Unified `SyncOrchestrator` for manual and scheduled sync** (feedback-4, rec 4) — rejected because manual and scheduled sync have fundamentally different policies (backoff/circuit-breaker vs. none). A shared orchestrator adds abstraction without clear benefit. The current approach (separate paths with shared pipeline lock) is simpler, correct, and avoids coupling the manual path to service-layer concerns. The two paths share the sync pipeline implementation itself; only the policy wrapper differs. + +- **`auto` token strategy with secure-store (Keychain / libsecret / Credential Manager) as default** (feedback-2 rec 2, feedback-4 rec 7) — rejected because adding platform-specific secure store dependencies (`security-framework`, `libsecret`, `winapi`) is heavy for v1. The wrapper-script approach (already in the plan) keeps the token out of the plist safely on macOS. The plan notes secure-store as a future enhancement. The token validation fix (rejecting NUL/newline) from feedback-4 rec 7 was accepted separately. + +- **Store service state in SQLite instead of JSON status file** (feedback-1, rec 2) — rejected because the status file is intentionally independent of the database. This avoids coupling service lifecycle to DB migrations, enables service operation when the DB is locked/corrupt, and keeps the service layer self-contained. The JSON file approach with atomic writes is adequate for single-writer status tracking. + +- **`write_seq` and `content_sha256` integrity fields in manifest/status files** (feedback-4, rec 6 partial) — rejected because this is over-engineering for a status file that is written by a single process with atomic writes. The `service repair` command already handles corrupt files by backup+reinit. The fsync(parent_dir) improvement from rec 6 was accepted separately. + +- **Use `nix` crate for safe UID access** (feedback-4, rec 8 partial) — rejected as a mandatory dependency because `getuid()` is trivially safe (no pointers, no mutation) and adding `nix` for a single call is disproportionate. A single-line safe wrapper with `#[allow(unsafe_code)]` is sufficient. If `nix` is already a dependency for other reasons, using it is fine. + +- **Mandatory dual-lock acquisition with strict ordering for uninstall/run races** (feedback-5, rec 2) — rejected because the existing plan already has admin lock for destructive ops and pipeline lock for runs. The race window (scheduler fires during uninstall) is tiny, the consequence is benign (service runs, finds no manifest, exits 0), and mandatory lock ordering with dual acquisition adds significant complexity. The plan's existing separation (admin lock for state mutations, pipeline lock for data writes) is sufficient. + +- **Decoupled optional stage cadence from core sync interval** (feedback-5, rec 4) — rejected because separate freshness windows per stage (e.g., "docs every 60m, embeddings every 6h") add significant complexity: new config fields per stage, last-success tracking per stage, skip logic, and confusing profile semantics. The existing profile system already solves this more simply: use `fast` for frequent intervals (issues+MRs only), `balanced` or `full` for less frequent intervals that include heavier stages. + +- **Windows env-file parity via wrapper script** (feedback-5, rec 5) — rejected because Windows Task Scheduler has fundamentally different environment handling than launchd/systemd. A wrapper `.cmd` or `.ps1` script introduces fragility (quoting, encoding, UAC edge cases, PowerShell execution policy) for marginal benefit. The current `system_env` approach is honest, works reliably, and Windows users are accustomed to system environment variables. Future Credential Manager integration (already noted as deferred) is the right long-term solution. + +- **`--regenerate` flag on service repair** (feedback-5, rec 7 partial) — rejected because `lore service install` is already idempotent (detects existing manifest, overwrites if config differs). Regenerating scheduler artifacts is exactly what a re-install does. Adding `--regenerate` to repair creates a confusing second path to the same outcome. The `spec_hash` drift detection (accepted from this rec) gives users clear diagnostics; the remedy is simply `lore service install`. diff --git a/plans/time-decay-expert-scoring.feedback-5.md b/plans/time-decay-expert-scoring.feedback-5.md new file mode 100644 index 0000000..81040bc --- /dev/null +++ b/plans/time-decay-expert-scoring.feedback-5.md @@ -0,0 +1,128 @@ +**Best Revisions To Strengthen The Plan** + +1. **[Critical] Replace one-hop rename matching with canonical path identities** +Analysis and rationale: Current `old_path OR new_path` fixes direct renames, but it still breaks on rename chains (`a.rs -> b.rs -> c.rs`) and split/move patterns. A canonical `path_identity` graph built from `mr_file_changes(old_path,new_path)` gives stable identity over time, which is the right architectural boundary for expertise history. +```diff +@@ ## Context +-- Match both old and new paths in all signal queries AND path resolution probes so expertise survives file renames ++- Build canonical path identities from rename edges and score by identity, not raw path strings, so expertise survives multi-hop renames and moves + +@@ ## Files to Modify +-2. **`src/cli/commands/who.rs`** — Core changes: ++2. **`src/cli/commands/who.rs`** — Core changes: + ... +- - Match both `new_path` and `old_path` in all signal queries (rename awareness) ++ - Resolve queried paths to `path_identity_id` and match all aliases in that identity set ++4. **`src/core/path_identity.rs`** — New module: ++ - Build/maintain rename graph from `mr_file_changes` ++ - Resolve path -> identity + aliases for probes/scoring +``` + +2. **[Critical] Shift scoring input from runtime CTE joins to a normalized `expertise_events` table** +Analysis and rationale: Your SQL is correct but complex and expensive at query time. Precomputing normalized events at ingestion gives simpler, faster, and more reliable scoring queries; it also enables model versioning/backfills without touching raw MR/note tables each request. +```diff +@@ ## Files to Modify +-3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes ++3. **`src/core/db.rs`** — Add migrations for: ++ - `expertise_events` table (normalized scoring events) ++ - supporting indexes ++4. **`src/core/ingest/expertise_events.rs`** — New: ++ - Incremental upsert of events during sync/ingest + +@@ ## SQL Restructure (who.rs) +-The SQL uses CTE-based dual-path matching and hybrid aggregation... ++Runtime SQL reads precomputed `expertise_events` filtered by path identity + time window. ++Heavy joins/aggregation move to ingest-time normalization. +``` + +3. **[High] Upgrade reviewer engagement model beyond char-count threshold** +Analysis and rationale: `min_note_chars` is a useful guardrail but brittle (easy to game, penalizes concise high-quality comments). Add explicit review-state signals (`approved`, `changes_requested`) and trivial-comment pattern filtering to better capture real reviewer expertise. +```diff +@@ ## Scoring Formula +-| **Reviewer Participated** (left DiffNote on MR/path) | 10 | 90 days | ++| **Reviewer Participated** (substantive DiffNote and/or formal review action) | 10 | 90 days | ++| **Review Decision: changes_requested** | 6 | 120 days | ++| **Review Decision: approved** | 4 | 75 days | + +@@ ### 1. ScoringConfig (config.rs) + pub reviewer_min_note_chars: u32, ++ pub reviewer_trivial_note_patterns: Vec, // default: ["lgtm","+1","nit","ship it","👍"] ++ pub review_approved_weight: i64, // default: 4 ++ pub review_changes_requested_weight: i64, // default: 6 +``` + +4. **[High] Make temporal semantics explicit and deterministic** +Analysis and rationale: `--as-of` is good, but day parsing and boundary semantics can still cause subtle reproducibility issues. Define window as `[since_ms, as_of_ms)` and parse `YYYY-MM-DD` as end-of-day UTC (or explicit timezone) so user expectations match outputs. +```diff +@@ ### 5a. Reproducible Scoring via `--as-of` +-- All event selection is bounded by `[since_ms, as_of_ms]` ++- All event selection is bounded by `[since_ms, as_of_ms)` (exclusive upper bound) ++- `YYYY-MM-DD` is interpreted as `23:59:59.999Z` unless `--timezone` is provided ++- Robot output includes `window_start_iso`, `window_end_iso`, `window_end_exclusive: true` +``` + +5. **[High] Replace fixed default `--since 24m` with contribution-floor auto cutoff** +Analysis and rationale: A static window is simple but often over-scans data. Compute a model-derived horizon from a minimum contribution floor (for example `0.01` points) per signal; this keeps results equivalent while reducing query cost. +```diff +@@ ### 5. Default --since Change +-Expert mode: `"6m"` -> `"24m"` ++Expert mode default: `--since auto` ++`auto` computes earliest relevant timestamp from configured weights/half-lives and `min_contribution_floor` ++Add config: `min_contribution_floor` (default: 0.01) ++`--since` still overrides, `--all-history` still bypasses cutoff +``` + +6. **[High] Add bot/service-account filtering now (not later)** +Analysis and rationale: Bot activity can materially distort expertise rankings in real repos. This is low implementation cost with high quality gain and should be in v1 of the scoring revamp, not deferred. +```diff +@@ ### 1. ScoringConfig (config.rs) ++ pub excluded_username_patterns: Vec, // default: ["bot","\\[bot\\]","service-account","ci-"] +@@ ### 2. SQL Restructure (who.rs) ++Apply username exclusion in all signal sources unless `--include-bots` is set +@@ ### 5b. Score Explainability via `--explain-score` ++Add `filtered_events` counts in robot output metadata +``` + +7. **[Medium] Enforce deterministic floating-point accumulation** +Analysis and rationale: Even with small sets, unordered `HashMap` iteration can cause tiny platform-dependent ranking differences near ties. Sorting contributions and using Neumaier summation removes nondeterminism and stabilizes tests/CI. +```diff +@@ ### 4. Rust-Side Aggregation (who.rs) +-Compute score as `f64`. ++Compute score as `f64` using deterministic contribution ordering: ++1) sort by (username, signal, mr_id, ts) ++2) sum with Neumaier compensation ++Tie-break remains `(raw_score DESC, last_seen DESC, username ASC)` +``` + +8. **[Medium] Strengthen explainability with evidence, not just totals** +Analysis and rationale: Component totals help, but disputes usually need “why this user got this score now.” Add compact top evidence rows per component (`mr_id`, `ts`, `raw_contribution`) behind an optional mode. +```diff +@@ ### 5b. Score Explainability via `--explain-score` +-Component breakdown only (4 floats per user). ++Add `--explain-score=summary|full`: ++`summary`: current 4-component totals ++`full`: adds top N evidence rows per component (default N=3) ++Robot output includes per-evidence `mr_id`, `signal`, `ts`, `contribution` +``` + +9. **[Medium] Make query plan strategy explicit: `UNION ALL` default for dual-path scans** +Analysis and rationale: You currently treat `UNION ALL` as fallback if planner regresses. For SQLite, OR-across-indexed-columns regressions are common enough that defaulting to branch-split queries is often more predictable. +```diff +@@ **Index optimization fallback (UNION ALL split)** +-Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm degradation. ++Use `UNION ALL` + dedup as default for dual-path matching. ++Keep `OR` variant as optional strategy flag for benchmarking/regression checks. +``` + +10. **[Medium] Add explicit performance SLO + benchmark gate** +Analysis and rationale: This plan is query-heavy and ranking-critical; add measurable performance budgets so future edits do not silently degrade UX. Include synthetic fixture benchmarks for exact, prefix, and suffix path modes. +```diff +@@ ## Verification ++8. Performance regression gate: ++ - `cargo bench --bench who_expert_scoring` ++ - Dataset tiers: 100k, 1M, 5M notes ++ - SLOs: p95 exact path < 150ms, prefix < 250ms, suffix < 400ms on reference hardware ++ - Fail CI if regression > 20% vs stored baseline +``` + +If you want, I can produce a single consolidated “iteration 5” plan document with these changes already merged into your current structure. \ No newline at end of file diff --git a/plans/time-decay-expert-scoring.md b/plans/time-decay-expert-scoring.md index 101b08b..1625fa7 100644 --- a/plans/time-decay-expert-scoring.md +++ b/plans/time-decay-expert-scoring.md @@ -2,12 +2,12 @@ plan: true title: "" status: iterating -iteration: 4 +iteration: 5 target_iterations: 8 -beads_revision: 0 +beads_revision: 1 related_plans: [] created: 2026-02-08 -updated: 2026-02-08 +updated: 2026-02-09 --- # Time-Decay Expert Scoring Model @@ -78,6 +78,7 @@ Author/reviewer signals are deduplicated per MR (one signal per distinct MR). No - Change default `--since` from `"6m"` to `"24m"` (2 years captures all meaningful decayed signals) - Add `--as-of` flag for reproducible scoring at a fixed timestamp - Add `--explain-score` flag for per-user score component breakdown + - Add `--include-bots` flag to disable bot/service-account filtering - Sort on raw f64 score, round only for display - Update tests 3. **`src/core/db.rs`** — Add migration for indexes supporting the new query shapes (dual-path matching, reviewer participation CTE, path resolution probes) @@ -100,6 +101,7 @@ pub struct ScoringConfig { pub note_half_life_days: u32, // default: 45 pub closed_mr_multiplier: f64, // default: 0.5 (applied to closed-without-merge MRs) pub reviewer_min_note_chars: u32, // default: 20 (minimum note body length to count as participation) + pub excluded_usernames: Vec, // default: [] (exact-match usernames to exclude, e.g. ["renovate-bot", "gitlab-ci"]) } ``` @@ -108,6 +110,7 @@ pub struct ScoringConfig { - All `*_weight` / `*_bonus` must be >= 0 (negative weights produce nonsensical scores) - `closed_mr_multiplier` must be in `(0.0, 1.0]` (0 would discard closed MRs entirely; >1 would over-weight them) - `reviewer_min_note_chars` must be >= 0 (0 disables the filter; typical useful values: 10-50) +- `excluded_usernames` entries must be non-empty strings (no blank entries) - Return `LoreError::ConfigInvalid` with a clear message on failure ### 2. Decay Function (who.rs) @@ -128,25 +131,51 @@ The SQL uses **CTE-based dual-path matching** and **hybrid aggregation**. Rather MR-level signals return one row per (username, signal, mr_id) with a timestamp; note signals return one row per (username, mr_id) with `note_count` and `max_ts`. This keeps row counts bounded (dozens to low hundreds per path) while giving Rust the data it needs for decay and `log2(1+count)`. ```sql -WITH matched_notes AS ( - -- Centralize dual-path matching for DiffNotes - SELECT n.id, n.discussion_id, n.author_username, n.created_at, - n.position_new_path, n.position_old_path, n.project_id +WITH matched_notes_raw AS ( + -- Branch 1: match on new_path (uses idx_notes_new_path or equivalent) + SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id FROM notes n WHERE n.note_type = 'DiffNote' AND n.is_system = 0 AND n.author_username IS NOT NULL AND n.created_at >= ?2 - AND n.created_at <= ?4 + AND n.created_at < ?4 AND (?3 IS NULL OR n.project_id = ?3) - AND (n.position_new_path {path_op} OR n.position_old_path {path_op}) + AND n.position_new_path {path_op} + UNION ALL + -- Branch 2: match on old_path (uses idx_notes_old_path_author) + SELECT n.id, n.discussion_id, n.author_username, n.created_at, n.project_id + FROM notes n + WHERE n.note_type = 'DiffNote' + AND n.is_system = 0 + AND n.author_username IS NOT NULL + AND n.created_at >= ?2 + AND n.created_at < ?4 + AND (?3 IS NULL OR n.project_id = ?3) + AND n.position_old_path {path_op} ), -matched_file_changes AS ( - -- Centralize dual-path matching for file changes +matched_notes AS ( + -- Dedup: prevent double-counting when old_path = new_path (no rename) + SELECT DISTINCT id, discussion_id, author_username, created_at, project_id + FROM matched_notes_raw +), +matched_file_changes_raw AS ( + -- Branch 1: match on new_path (uses idx_mfc_new_path_project_mr) SELECT fc.merge_request_id, fc.project_id FROM mr_file_changes fc WHERE (?3 IS NULL OR fc.project_id = ?3) - AND (fc.new_path {path_op} OR fc.old_path {path_op}) + AND fc.new_path {path_op} + UNION ALL + -- Branch 2: match on old_path (uses idx_mfc_old_path_project_mr) + SELECT fc.merge_request_id, fc.project_id + FROM mr_file_changes fc + WHERE (?3 IS NULL OR fc.project_id = ?3) + AND fc.old_path {path_op} +), +matched_file_changes AS ( + -- Dedup: prevent double-counting when old_path = new_path (no rename) + SELECT DISTINCT merge_request_id, project_id + FROM matched_file_changes_raw ), reviewer_participation AS ( -- Precompute which (mr_id, username) pairs have substantive DiffNote participation. @@ -196,7 +225,7 @@ raw AS ( WHERE m.author_username IS NOT NULL AND m.state IN ('opened','merged','closed') AND {state_aware_ts} >= ?2 - AND {state_aware_ts} <= ?4 + AND {state_aware_ts} < ?4 UNION ALL @@ -212,7 +241,7 @@ raw AS ( AND (m.author_username IS NULL OR r.username != m.author_username) AND m.state IN ('opened','merged','closed') AND {state_aware_ts} >= ?2 - AND {state_aware_ts} <= ?4 + AND {state_aware_ts} < ?4 UNION ALL @@ -229,7 +258,7 @@ raw AS ( AND (m.author_username IS NULL OR r.username != m.author_username) AND m.state IN ('opened','merged','closed') AND {state_aware_ts} >= ?2 - AND {state_aware_ts} <= ?4 + AND {state_aware_ts} < ?4 ), aggregated AS ( -- MR-level signals: 1 row per (username, signal_class, mr_id) with MAX(ts) @@ -245,11 +274,11 @@ aggregated AS ( SELECT username, signal, mr_id, qty, ts, mr_state FROM aggregated WHERE username IS NOT NULL ``` -Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `BETWEEN ?2 AND ?4` pattern ensures that when `--as-of` is set to a past date, events after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. +Where `{state_aware_ts}` is the state-aware timestamp expression (defined in the next section), `{path_op}` is either `= ?1` or `LIKE ?1 ESCAPE '\\'` depending on the path query type, `?4` is the `as_of_ms` exclusive upper bound (defaults to `now_ms` when `--as-of` is not specified), and `{reviewer_min_note_chars}` is the configured `reviewer_min_note_chars` value (default 20, inlined as a literal in the SQL string). The `>= ?2 AND < ?4` pattern (half-open interval) ensures that when `--as-of` is set to a past date, events at or after that date are excluded — without this, "future" events would leak in with full weight, breaking reproducibility. The exclusive upper bound avoids edge-case ambiguity when events have timestamps exactly equal to the as-of value. -**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into `matched_notes` and `matched_file_changes` CTEs means path matching is defined once, the indexes are hit once, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. +**Rationale for CTE-based dual-path matching**: The previous approach (repeating `OR old_path` in every signal subquery) duplicated the path matching logic 5 times. Factoring it into foundational CTEs (`matched_notes_raw` → `matched_notes`, `matched_file_changes_raw` → `matched_file_changes`) means path matching is defined once, each index branch is explicit, and adding future path resolution logic (e.g., alias chains) only requires changes in one place. The UNION ALL + dedup pattern ensures SQLite uses the optimal index for each path column independently. -**Index optimization fallback (UNION ALL split)**: SQLite's query planner sometimes struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. If EXPLAIN QUERY PLAN shows this during step 6 verification, replace the `OR`-based CTEs with a `UNION ALL` split + dedup pattern: +**Dual-path matching strategy (UNION ALL split)**: SQLite's query planner commonly struggles with `OR` across two indexed columns, falling back to a full table scan instead of using either index. Rather than starting with `OR` and hoping the planner cooperates, use `UNION ALL` + dedup as the default strategy: ```sql matched_notes AS ( SELECT ... FROM notes n WHERE ... AND n.position_new_path {path_op} @@ -261,18 +290,18 @@ matched_notes_dedup AS ( FROM matched_notes ), ``` -This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). Start with the simpler `OR` approach and only switch to `UNION ALL` if query plans confirm the degradation. +This ensures each branch can use its respective index independently. The dedup CTE prevents double-counting when `old_path = new_path` (no rename). The same pattern applies to `matched_file_changes`. The simpler `OR` variant is retained as a comment for benchmarking — if a future SQLite version handles `OR` well, the split can be collapsed. **Rationale for precomputed participation set**: The previous approach used correlated `EXISTS`/`NOT EXISTS` subqueries to classify reviewers. The `reviewer_participation` CTE materializes the set of `(mr_id, username)` pairs from matched DiffNotes once, then signal 4a JOINs against it (participated) and signal 4b LEFT JOINs with `IS NULL` (assigned-only). This avoids per-reviewer-row correlated scans, is easier to reason about, and produces the same exhaustive split — every `mr_reviewers` row falls into exactly one bucket. **Rationale for hybrid over fully-raw**: Pre-aggregating note counts in SQL prevents row explosion from heavy DiffNote volume on frequently-discussed paths. MR-level signals are already 1-per-MR by nature (deduped via GROUP BY in each subquery). This keeps memory and latency predictable regardless of review activity density. -**Path rename awareness**: Both `matched_notes` and `matched_file_changes` CTEs match against both old and new path columns: +**Path rename awareness**: Both `matched_notes` and `matched_file_changes` use UNION ALL + dedup to match against both old and new path columns independently, ensuring each branch uses its respective index: -- Notes: `(n.position_new_path {path_op} OR n.position_old_path {path_op})` -- File changes: `(fc.new_path {path_op} OR fc.old_path {path_op})` +- Notes: branch 1 matches `position_new_path`, branch 2 matches `position_old_path`, deduped by `notes.id` +- File changes: branch 1 matches `new_path`, branch 2 matches `old_path`, deduped by `(merge_request_id, project_id)` -Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The `OR` match ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically. +Both columns already exist in the schema (`notes.position_old_path` from migration 002, `mr_file_changes.old_path` from migration 016). The UNION ALL approach ensures expertise is credited even when a file was renamed after the work was done. For prefix queries (`--path src/foo/`), the `LIKE` operator applies to both columns identically. **Signal 4 splits into two**: The current signal 4 (`file_reviewer`) joins `mr_reviewers` but doesn't distinguish participation. In the new plan: @@ -332,7 +361,7 @@ For each username, accumulate into a struct with: The `mr_state` field from each SQL row is stored alongside the timestamp so the Rust-side can apply `closed_mr_multiplier` when `mr_state == "closed"`. -Compute score as `f64`. Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`: +Compute score as `f64` with **deterministic contribution ordering**: within each signal type, sort contributions by `(mr_id ASC)` before summing. This eliminates platform-dependent HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the complexity of compensated summation (Neumaier/Kahan). Each MR-level contribution is multiplied by `closed_mr_multiplier` (default 0.5) when the MR's state is `"closed"`: ``` state_mult(mr) = if mr.state == "closed" { closed_mr_multiplier } else { 1.0 } @@ -352,6 +381,8 @@ Compute counts from the accumulated data: - `review_note_count = notes_per_mr.values().map(|(count, _)| count).sum()` - `author_mr_count = author_mrs.len()` +**Bot/service-account filtering**: After accumulating all user scores and before sorting, filter out any username that appears in `config.scoring.excluded_usernames` (exact match, case-insensitive). This is applied in Rust post-query (not SQL) to keep the SQL clean and avoid parameter explosion. When `--include-bots` is active, the filter is skipped entirely. The robot JSON `resolved_input` includes `excluded_usernames_applied: true|false` to indicate whether filtering was active. + Truncate to limit after sorting. ### 5. Default --since Change @@ -364,10 +395,11 @@ At 2 years, author decay = 6%, reviewer decay = 0.4%, note decay = 0.006% — ne ### 5a. Reproducible Scoring via `--as-of` Add `--as-of ` flag that overrides the `now_ms` reference point used for decay calculations. When set: -- All event selection is bounded by `[since_ms, as_of_ms]` — events after `as_of_ms` are excluded from SQL results entirely (not just decayed) +- All event selection is bounded by `[since_ms, as_of_ms)` — exclusive upper bound; events at or after `as_of_ms` are excluded from SQL results entirely (not just decayed). The SQL uses `< ?4` (strict less-than), not `<= ?4`. +- `YYYY-MM-DD` input (without time component) is interpreted as end-of-day UTC: `T23:59:59.999Z`. This matches user intuition that `--as-of 2025-06-01` means "as of the end of June 1st" rather than "as of midnight at the start of June 1st" which would exclude the entire day's activity. - All decay computations use `as_of_ms` instead of `SystemTime::now()` - The `--since` window is calculated relative to `as_of_ms` (not wall clock) -- Robot JSON `resolved_input` includes `as_of_ms` and `as_of_iso` fields +- Robot JSON `resolved_input` includes `as_of_ms`, `as_of_iso`, `window_start_iso`, `window_end_iso`, and `window_end_exclusive: true` fields — making the exact query window unambiguous in output **Rationale**: Decayed scoring is time-sensitive by nature. Without a fixed reference point, the same query run minutes apart produces different rankings, making debugging and test reproducibility difficult. `--as-of` pins the clock so that results are deterministic for a given dataset. The upper-bound filter in SQL is critical — without it, events after the as-of date would enter with full weight (since `elapsed.max(0.0)` clamps negative elapsed time to zero), breaking the reproducibility guarantee. @@ -484,7 +516,13 @@ Add timestamp-aware variants: **`test_closed_mr_multiplier`**: Two identical MRs (same author, same age, same path). One is `merged`, one is `closed`. The merged MR should contribute `author_weight * decay(...)`, the closed MR should contribute `author_weight * closed_mr_multiplier * decay(...)`. With default multiplier 0.5, the closed MR contributes half. -**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the upper-bound filtering in SQL. +**`test_as_of_excludes_future_events`**: Insert events at timestamps T1 (past) and T2 (future relative to as-of). With `--as-of` set between T1 and T2, only T1 events should appear in results. T2 events must be excluded entirely, not just decayed. Validates the exclusive upper-bound (`< ?4`) filtering in SQL. + +**`test_as_of_exclusive_upper_bound`**: Insert an event with timestamp exactly equal to the `as_of_ms` value. Verify it is excluded from results (strict less-than, not less-than-or-equal). This validates the half-open interval `[since, as_of)` semantics. + +**`test_excluded_usernames_filters_bots`**: Insert signals for a user named "renovate-bot" and a user named "jsmith", both with the same activity. With `excluded_usernames: ["renovate-bot"]` in config, only "jsmith" should appear in results. Validates the Rust-side post-query filtering. + +**`test_include_bots_flag_disables_filtering`**: Same setup as above, but with `--include-bots` active. Both "renovate-bot" and "jsmith" should appear in results. **`test_null_timestamp_fallback_to_created_at`**: Insert a merged MR with `merged_at = NULL` (edge case: old data before the column was populated). The state-aware timestamp should fall back to `created_at`. Verify the score reflects `created_at`, not 0 or a panic. @@ -496,11 +534,13 @@ Add timestamp-aware variants: **`test_reviewer_split_is_exhaustive`**: For a reviewer assigned to an MR, they must appear in exactly one of: participated (has substantive DiffNotes meeting `reviewer_min_note_chars`) or assigned-only (no DiffNotes, or only trivial ones below the threshold). Never both, never neither. Test three cases: (1) reviewer with substantive DiffNotes -> participated only, (2) reviewer with no DiffNotes -> assigned-only only, (3) reviewer with only trivial notes ("LGTM") -> assigned-only only. +**`test_deterministic_accumulation_order`**: Insert signals for a user with contributions at many different timestamps (10+ MRs with varied ages). Run `query_expert` 100 times in a loop. All 100 runs must produce the exact same `f64` score (bit-identical). Validates that the sorted contribution ordering eliminates HashMap-iteration-order nondeterminism. + ### 9. Existing Test Compatibility All existing tests insert data with `now_ms()`. With decay, elapsed ~0ms means decay ~1.0, so scores round to the same integers as before. No existing test assertions should break. -The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, and `reviewer_min_note_chars` fields. +The `test_expert_scoring_weights_are_configurable` test needs `..Default::default()` added to fill the new half-life fields, `reviewer_assignment_weight` / `reviewer_assignment_half_life_days`, `closed_mr_multiplier`, `reviewer_min_note_chars`, and `excluded_usernames` fields. ## Verification @@ -511,11 +551,17 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul 5. `ubs src/cli/commands/who.rs src/core/config.rs src/core/db.rs` — no bug scanner findings 6. Manual query plan verification (not automated — SQLite planner varies across versions): - Run `EXPLAIN QUERY PLAN` on the expert query (both exact and prefix modes) against a real database - - Confirm that `matched_notes` CTE uses `idx_notes_old_path_author` or the existing new_path index (not a full table scan) - - Confirm that `matched_file_changes` CTE uses `idx_mfc_old_path_project_mr` or `idx_mfc_new_path_project_mr` + - Confirm that `matched_notes_raw` branch 1 uses the existing new_path index and branch 2 uses `idx_notes_old_path_author` (not a full table scan on either branch) + - Confirm that `matched_file_changes_raw` branch 1 uses `idx_mfc_new_path_project_mr` and branch 2 uses `idx_mfc_old_path_project_mr` - Confirm that `reviewer_participation` CTE uses `idx_notes_diffnote_discussion_author` - Document the observed plan in a comment near the SQL for future regression reference -7. Real-world validation: +7. Performance baseline (manual, not CI-gated): + - Run `time cargo run --release -- who --path ` on the real database for exact, prefix, and suffix modes + - Target SLOs: p95 exact path < 200ms, prefix < 300ms, suffix < 500ms on development hardware + - Record baseline timings as a comment near the SQL for regression reference + - If any mode exceeds 2x the baseline after future changes, investigate before merging + - Note: These are soft targets for developer awareness, not automated CI gates. Automated benchmarking with synthetic fixtures (100k/1M/5M notes) is a v2 investment if performance becomes a real concern. +8. Real-world validation: - `cargo run --release -- who --path MeasurementQualityDialog.tsx` — verify jdefting/zhayes old reviews are properly discounted relative to recent authors - `cargo run --release -- who --path MeasurementQualityDialog.tsx --all-history` — compare full history vs 24m window to validate cutoff is reasonable - `cargo run --release -- who --path MeasurementQualityDialog.tsx --explain-score` — verify component breakdown sums to total and authored signal dominates for known authors @@ -524,6 +570,7 @@ The `test_expert_scoring_weights_are_configurable` test needs `..Default::defaul - `cargo run --release -- who --path MeasurementQualityDialog.tsx --as-of 2025-06-01` — verify deterministic output across repeated runs - Spot-check that reviewers who only left "LGTM"-style notes are classified as assigned-only (not participated) - Verify closed MRs contribute at ~50% of equivalent merged MR scores via `--explain-score` + - If the project has known bot accounts (e.g., renovate-bot), add them to `excluded_usernames` config and verify they no longer appear in results. Run again with `--include-bots` to confirm they reappear. ## Accepted from External Review @@ -553,12 +600,20 @@ Ideas incorporated from ChatGPT review (feedback-1 through feedback-4) that genu - **EXPLAIN QUERY PLAN verification step**: Manual check that the restructured queries use the new indexes (not automated, since SQLite planner varies across versions). **From feedback-4:** -- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `<= ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms]`. Without this, `--as-of` reproducibility was fundamentally broken. +- **`--as-of` temporal correctness (critical)**: The plan described `--as-of` but the SQL only enforced a lower bound (`>= ?2`). Events after the as-of date would leak in with full weight (because `elapsed.max(0.0)` clamps negative elapsed time to zero). Added `< ?4` upper bound to all SQL timestamp filters, making the query window `[since_ms, as_of_ms)`. Without this, `--as-of` reproducibility was fundamentally broken. (Refined to exclusive upper bound in feedback-5.) - **Closed-state inconsistency resolution**: The state-aware CASE expression handled `closed` state but the WHERE clause filtered to `('opened','merged')` only — dead code. Resolved by including `'closed'` in state filters and adding a `closed_mr_multiplier` (default 0.5) applied in Rust to all signals from closed-without-merge MRs. This credits real review effort on abandoned MRs while appropriately discounting it. - **Substantive note threshold for reviewer participation**: A single "LGTM" shouldn't promote a reviewer from 3-point (assigned-only) to 10-point (participated) weight. Added `reviewer_min_note_chars` (default 20) config field and `LENGTH(TRIM(body))` filter in the `reviewer_participation` CTE. This raises the bar for participation classification to actual substantive review comments. -- **UNION ALL optimization fallback for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Added documentation of a `UNION ALL` split + dedup fallback pattern to use if EXPLAIN QUERY PLAN shows degradation during verification. Start with the simpler `OR` approach; switch only if needed. +- **UNION ALL optimization for path predicates**: SQLite's planner can degrade `OR` across two indexed columns to a table scan. Originally documented as a fallback; promoted to default strategy in feedback-5 iteration. The UNION ALL + dedup approach ensures each index branch is used independently. - **New tests**: `test_trivial_note_does_not_count_as_participation`, `test_closed_mr_multiplier`, `test_as_of_excludes_future_events` — cover the three new features added from this review round. +**From feedback-5 (ChatGPT review):** +- **Exclusive upper bound for `--as-of`**: Changed from `[since_ms, as_of_ms]` (inclusive) to `[since_ms, as_of_ms)` (exclusive). Half-open intervals are the standard convention in temporal systems — they eliminate edge-case ambiguity when events have timestamps exactly at the boundary. Also added `YYYY-MM-DD` → end-of-day UTC parsing and window metadata in robot output. +- **UNION ALL as default for dual-path matching**: Promoted from "fallback if planner regresses" to default strategy. SQLite `OR`-across-indexed-columns degradation is common enough that the predictable UNION ALL + dedup approach is the safer starting point. The simpler `OR` variant is retained as a comment for benchmarking. +- **Deterministic contribution ordering**: Within each signal type, sort contributions by `mr_id` before summing. This eliminates HashMap iteration order as a source of f64 rounding variance near ties, ensuring CI reproducibility without the overhead of compensated summation (Neumaier/Kahan was rejected as overkill at this scale). +- **Minimal bot/service-account filtering**: Added `excluded_usernames` (exact match, case-insensitive) to `ScoringConfig` and `--include-bots` CLI flag. Applied as a Rust-side post-filter (not SQL) to keep queries clean. Scope is deliberately minimal — no regex patterns, no heuristic detection. Users configure the list for their team's specific bots. +- **Performance baseline SLOs**: Added manual performance baseline step to verification — record timings for exact/prefix/suffix modes and flag >2x regressions. Kept lightweight (no CI gating, no synthetic benchmarks) to match the project's current maturity. +- **New tests**: `test_as_of_exclusive_upper_bound`, `test_excluded_usernames_filters_bots`, `test_include_bots_flag_disables_filtering`, `test_deterministic_accumulation_order` — cover the newly-accepted features. + ## Rejected Ideas (with rationale) These suggestions were considered during review but explicitly excluded from this iteration: @@ -573,3 +628,10 @@ These suggestions were considered during review but explicitly excluded from thi - **Split scoring engine into core module** (feedback-4 #5): Proposed extracting scoring math from `who.rs` into `src/core/scoring/model_v2_decay.rs`. Premature modularization — `who.rs` is the only consumer and is ~800 lines. Adding module plumbing and indirection for a single call site adds complexity without reducing it. If we add a second scoring consumer (e.g., automated triage), revisit. - **Bot/service-account filtering** (feedback-4 #7): Real concern but orthogonal to time-decay scoring. This is a general data quality feature that belongs in its own issue — it affects all `who` modes, not just expert scoring. Adding `excluded_username_patterns` config and `--include-bots` flag is scope expansion that should be designed and tested independently. - **Model compare mode / rank-delta diagnostics** (feedback-4 #9): Over-engineered rollout safety for an internal CLI tool with ~3 users. Maintaining two parallel scoring codepaths (v1 flat + v2 decayed) doubles test surface and code complexity. The `--explain-score` + `--as-of` combination already provides debugging capability. If a future model change is risky enough to warrant A/B comparison, build it then. +- **Canonical path identity graph** (feedback-5 #1, also feedback-2 #2, feedback-4 #4): Third time proposed, third time rejected. Building a rename graph from `mr_file_changes(old_path, new_path)` with identity resolution requires new schema (`path_identities`, `path_aliases` tables), ingestion pipeline changes, graph traversal at query time, and backfill logic for existing data. The UNION ALL dual-path matching already covers the 80%+ case (direct renames). Multi-hop rename chains (A→B→C) are rare in practice and can be addressed in v2 with real usage data showing the gap matters. +- **Normalized `expertise_events` table** (feedback-5 #2): Proposes shifting from query-time CTE joins to a precomputed `expertise_events` table populated at ingest time. While architecturally appealing for read performance, this doubles the data surface area (raw tables + derived events), requires new ingestion pipelines with incremental upsert logic, backfill tooling for existing databases, and introduces consistency risks when raw data is corrected/re-synced. The CTE approach is correct, maintainable, and performant at our current scale. If query latency becomes a real bottleneck (see performance baseline SLOs), materialized views or derived tables become a v2 optimization. +- **Reviewer engagement model upgrade** (feedback-5 #3): Proposes adding `approved`/`changes_requested` review-state signals and trivial-comment pattern matching (`["lgtm","+1","nit","ship it"]`). Expands the signal type count from 4 to 6 and adds a fragile pattern-matching layer (what about "don't ship it"? "lgtm but..."?). The `reviewer_min_note_chars` threshold is imperfect but pragmatic — it's a single configurable number with no false-positive risk from substring matching. Review-state signals may be worth adding later as a separate enhancement when we have data on how often they diverge from DiffNote participation. +- **Contribution-floor auto cutoff for `--since`** (feedback-5 #5): Proposes `--since auto` computing the earliest relevant timestamp from `min_contribution_floor` (e.g., 0.01 points). Adds a non-obvious config parameter for minimal benefit — the 24m default is already mathematically justified from the decay curves (author: 6%, reviewer: 0.4% at 2 years) and easily overridden with `--since` or `--all-history`. The auto-derivation formula (`ceil(max_half_life * log2(1/floor))`) is opaque to users who just want to understand why a certain time range was selected. +- **Full evidence drill-down in `--explain-score`** (feedback-5 #8): Proposes `--explain-score=summary|full` with per-MR evidence rows. Already rejected in feedback-2 #7. Component totals are sufficient for v1 debugging — they answer "which signal type drives this user's score." Per-MR drill-down requires additional SQL queries and significant output format complexity. Deferred unless component breakdowns prove insufficient. +- **Neumaier compensated summation** (feedback-5 #7 partial): Accepted the sorting aspect for deterministic ordering, but rejected Neumaier/Kahan compensated summation. At the scale of dozens to low hundreds of contributions per user, the rounding error from naive f64 summation is on the order of 1e-14 — several orders of magnitude below any meaningful score difference. Compensated summation adds code complexity and a maintenance burden for no practical benefit at this scale. +- **Automated CI benchmark gate** (feedback-5 #10 partial): Accepted manual performance baselines, but rejected automated CI regression gating with synthetic fixtures (100k/1M/5M notes). Building and maintaining benchmark infrastructure is a significant investment that's premature for a CLI tool with ~3 users. Manual timing checks during development are sufficient until performance becomes a real concern. diff --git a/plans/work-item-status-graphql.feedback-3.md b/plans/work-item-status-graphql.feedback-3.md new file mode 100644 index 0000000..5a30224 --- /dev/null +++ b/plans/work-item-status-graphql.feedback-3.md @@ -0,0 +1,157 @@ +**Top Revisions I Recommend** + +1. **Fix auth semantics + a real inconsistency in your test plan** +Your ACs require graceful handling for `403`, but the test list says the “403” test returns `401`. That hides the exact behavior you care about and can let permission regressions slip through. + +```diff +@@ AC-1: GraphQL Client (Unit) + - [ ] HTTP 401 → `LoreError::GitLabAuthFailed` + + [ ] HTTP 401 → `LoreError::GitLabAuthFailed` + + [ ] HTTP 403 → `LoreError::GitLabForbidden` + +@@ AC-3: Status Fetcher (Integration) + - [ ] GraphQL 403 → returns `Ok(HashMap::new())` with warning log + + [ ] GraphQL 403 (`GitLabForbidden`) → returns `Ok(HashMap::new())` with warning log + +@@ TDD Plan (RED) + - 13. `test_fetch_statuses_403_graceful` — mock returns 401 → `Ok(HashMap::new())` + + 13. `test_fetch_statuses_403_graceful` — mock returns 403 → `Ok(HashMap::new())` +``` + +2. **Make enrichment atomic and stale-safe** +Current plan can leave stale status values forever when a widget disappears or status becomes null. Make writes transactional and clear status fields for fetched scope before upserts. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) + + [ ] Enrichment DB writes are transactional per project (all-or-nothing) + + [ ] Status fields are cleared for fetched issue scope before applying new statuses + + [ ] If enrichment fails mid-project, prior persisted statuses are unchanged (rollback) + +@@ File 6: `src/ingestion/orchestrator.rs` +- fn enrich_issue_statuses(...) ++ fn enrich_issue_statuses_txn(...) ++ // BEGIN TRANSACTION ++ // clear status columns for fetched issue scope ++ // apply updates ++ // COMMIT +``` + +3. **Add transient retry/backoff (429/5xx/network)** +Right now one transient failure loses status enrichment for that sync. Retrying with bounded backoff gives much better reliability at low cost. + +```diff +@@ AC-1: GraphQL Client (Unit) + + [ ] Retries 429/502/503/504/network errors with bounded exponential backoff + jitter (max 3 attempts) + + [ ] Honors `Retry-After` on 429 before retrying + +@@ AC-6: Enrichment in Orchestrator (Integration) + + [ ] Cancellation signal is checked before each retry sleep and between paginated calls +``` + +4. **Stop full GraphQL scans when nothing changed** +Running full pagination on every sync will dominate runtime on large repos. Trigger enrichment only when issue ingestion reports changes, with a manual override. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) +- [ ] Runs on every sync (not gated by `--full`) ++ [ ] Runs when issue ingestion changed at least one issue in the project ++ [ ] New override flag `--refresh-status` forces enrichment even with zero issue deltas ++ [ ] Optional periodic full refresh (e.g. every N syncs) to prevent long-tail drift +``` + +5. **Do not expose raw token via `client.token()`** +Architecturally cleaner and safer: keep token encapsulated and expose a GraphQL-ready client factory from `GitLabClient`. + +```diff +@@ File 13: `src/gitlab/client.rs` +- pub fn token(&self) -> &str ++ pub fn graphql_client(&self) -> crate::gitlab::graphql::GraphqlClient + +@@ File 6: `src/ingestion/orchestrator.rs` +- let graphql_client = GraphqlClient::new(&config.gitlab.base_url, client.token()); ++ let graphql_client = client.graphql_client(); +``` + +6. **Add indexes for new status filters** +`--status` on large tables will otherwise full-scan `issues`. Add compound indexes aligned with project-scoped list queries. + +```diff +@@ AC-4: Migration 021 (Unit) + + [ ] Adds index `idx_issues_project_status_name(project_id, status_name)` + + [ ] Adds index `idx_issues_project_status_category(project_id, status_category)` + +@@ File 14: `migrations/021_work_item_status.sql` + ALTER TABLE issues ADD COLUMN status_name TEXT; + ALTER TABLE issues ADD COLUMN status_category TEXT; + ALTER TABLE issues ADD COLUMN status_color TEXT; + ALTER TABLE issues ADD COLUMN status_icon_name TEXT; ++CREATE INDEX IF NOT EXISTS idx_issues_project_status_name ++ ON issues(project_id, status_name); ++CREATE INDEX IF NOT EXISTS idx_issues_project_status_category ++ ON issues(project_id, status_category); +``` + +7. **Improve filter UX: add category filter + case-insensitive status** +Case-sensitive exact name matches are brittle with custom lifecycle names. Category filter is stable and useful for automation. + +```diff +@@ AC-9: List Issues Filter (E2E) +- [ ] Filter is case-sensitive (matches GitLab's exact status name) ++ [ ] `--status` uses case-insensitive exact match by default (`COLLATE NOCASE`) ++ [ ] New filter `--status-category` supports `triage|to_do|in_progress|done|canceled` ++ [ ] `--status-exact` enables strict case-sensitive behavior when needed +``` + +8. **Add capability probe/cache to avoid pointless calls** +Free tier / old GitLab versions will never return status widget. Cache that capability per project (with TTL) to reduce noise and wasted requests. + +```diff +@@ GitLab API Constraints ++### Capability Probe ++On first sync per project, detect status-widget support and cache result for 24h. ++If unsupported, skip enrichment silently (debug log) until TTL expiry. + +@@ AC-3: Status Fetcher (Integration) ++ [ ] Unsupported capability state bypasses GraphQL fetch and warning spam +``` + +9. **Use a nested robot `status` object instead of 4 top-level fields** +This is cleaner schema design and scales better as status metadata grows (IDs, lifecycle, timestamps, etc.). + +```diff +@@ AC-7: Show Issue Display (Robot) +- [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields +- [ ] Fields are `null` (not absent) when status not available ++ [ ] JSON includes `status` object: ++ `{ "name": "...", "category": "...", "color": "...", "icon_name": "..." }` or `null` + +@@ AC-8: List Issues Display (Robot) +- [ ] JSON includes `status_name`, `status_category` fields on each issue ++ [ ] JSON includes `status` object (or `null`) on each issue +``` + +10. **Add one compelling feature: status analytics, not just status display** +Right now this is mostly a transport/display enhancement. Make it genuinely useful with “stale in-progress” detection and age-in-status filters. + +```diff +@@ Acceptance Criteria ++### AC-11: Status Aging & Triage Value (E2E) ++- [ ] `lore list issues --status-category in_progress --stale-days 14` filters to stale work ++- [ ] Human table shows `Status Age` (days) when status exists ++- [ ] Robot output includes `status_age_days` (nullable integer) +``` + +11. **Harden test plan around failure modes you’ll actually hit** +The current tests are good, but miss rollback/staleness/retry behavior that drives real reliability. + +```diff +@@ TDD Plan (RED) additions ++21. `test_enrich_clears_removed_status` ++22. `test_enrich_transaction_rolls_back_on_failure` ++23. `test_graphql_retry_429_then_success` ++24. `test_graphql_retry_503_then_success` ++25. `test_cancel_during_backoff_aborts_cleanly` ++26. `test_status_filter_query_uses_project_status_index` (EXPLAIN smoke test) +``` + +If you want, I can produce a fully revised v3 plan document end-to-end (frontmatter + reordered ACs + updated file list + updated TDD matrix) so it is ready to implement directly. \ No newline at end of file diff --git a/plans/work-item-status-graphql.feedback-4.md b/plans/work-item-status-graphql.feedback-4.md new file mode 100644 index 0000000..9c23dc6 --- /dev/null +++ b/plans/work-item-status-graphql.feedback-4.md @@ -0,0 +1,159 @@ +Your plan is already strong, but I’d revise it in 10 places to reduce risk at scale and make it materially more useful. + +1. Shared transport + retries for GraphQL (must-have) +Reasoning: `REST` already has throttling/retry in `src/gitlab/client.rs`; your proposed GraphQL client would bypass that and can spike rate limits under concurrent project ingest (`src/cli/commands/ingest.rs`). Unifying transport prevents split behavior and cuts production incidents. + +```diff +@@ AC-1: GraphQL Client (Unit) +- [ ] Network error → `LoreError::Other` ++ [ ] GraphQL requests use shared GitLab transport (same timeout, rate limiter, retry policy as REST) ++ [ ] Retries 429/502/503/504/network errors (max 3) with exponential backoff + jitter ++ [ ] 429 honors `Retry-After` before retrying ++ [ ] Exhausted network retries → `LoreError::GitLabNetworkError` + +@@ Decisions +- 8. **No retry/backoff in v1** — DEFER. ++ 8. **Retry/backoff in v1** — YES (shared REST+GraphQL reliability policy). + +@@ Implementation Detail ++ File 15: `src/gitlab/transport.rs` (NEW) — shared HTTP execution and retry/backoff policy. +``` + +2. Capability cache for unsupported projects (must-have) +Reasoning: Free tier / older GitLab will repeatedly emit warning noise every sync and waste calls. Cache support status per project and re-probe on TTL. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) +- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync) ++ [ ] Unsupported capability responses (missing endpoint/type/widget) are cached per project ++ [ ] While cached unsupported, enrichment is skipped without repeated warning spam ++ [ ] Capability cache auto-expires (default 24h) and is re-probed + +@@ Migration Numbering +- This feature uses **migration 021**. ++ This feature uses **migrations 021-022**. + +@@ Files Changed (Summary) ++ `migrations/022_project_capabilities.sql` | NEW — support cache table for project capabilities +``` + +3. Delta-first enrichment with periodic full reconcile (must-have) +Reasoning: Full GraphQL scan every sync is expensive for large projects. You already compute issue deltas in ingestion; use that as fast path and keep a periodic full sweep as safety net. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) +- [ ] Runs on every sync (not gated by `--full`) ++ [ ] Fast path: skip status enrichment when issue ingestion upserted 0 issues for that project ++ [ ] Safety net: run full reconciliation every `status_full_reconcile_hours` (default 24) ++ [ ] `--full` always forces reconciliation + +@@ AC-5: Config Toggle (Unit) ++ [ ] `SyncConfig` has `status_full_reconcile_hours: u32` (default 24) +``` + +4. Strongly typed widget parsing via `__typename` (must-have) +Reasoning: current “deserialize arbitrary widget JSON into `StatusWidget`” is fragile. Query/type by `__typename` for forward compatibility and fewer silent parse mistakes. + +```diff +@@ AC-3: Status Fetcher (Integration) +- [ ] Extracts status from `widgets` array by matching `WorkItemWidgetStatus` fragment ++ [ ] Query includes `widgets { __typename ... }` and parser matches `__typename == "WorkItemWidgetStatus"` ++ [ ] Non-status widgets are ignored deterministically (no heuristic JSON-deserialize attempts) + +@@ GraphQL Query ++ widgets { ++ __typename ++ ... on WorkItemWidgetStatus { ... } ++ } +``` + +5. Set-based transactional DB apply (must-have) +Reasoning: row-by-row clear/update loops will be slow on large projects and hold write locks longer. Temp-table + set-based SQL inside one txn is faster and easier to reason about rollback. + +```diff +@@ AC-3: Status Fetcher (Integration) +- `all_fetched_iids: Vec` ++ `all_fetched_iids: HashSet` + +@@ AC-6: Enrichment in Orchestrator (Integration) +- [ ] Before applying updates, NULL out status fields ... (loop per IID) +- [ ] UPDATE SQL: `SET status_name=?, ... WHERE project_id=? AND iid=?` ++ [ ] Use temp tables and set-based SQL in one transaction: ++ [ ] (1) clear stale statuses for fetched IIDs absent from status rows ++ [ ] (2) apply status values for fetched IIDs with status ++ [ ] One commit per project; rollback leaves prior state intact +``` + +6. Fix index strategy for `COLLATE NOCASE` + default sorting (must-have) +Reasoning: your proposed `(project_id, status_name)` index may not fully help `COLLATE NOCASE` + `ORDER BY updated_at`. Tune index to real query shape in `src/cli/commands/list.rs`. + +```diff +@@ AC-4: Migration 021 (Unit) +- [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)` for `--status` filter performance ++ [ ] Adds covering NOCASE-aware index: ++ [ ] `idx_issues_project_status_name_nocase_updated(project_id, status_name COLLATE NOCASE, updated_at DESC)` ++ [ ] Adds category index: ++ [ ] `idx_issues_project_status_category_nocase(project_id, status_category COLLATE NOCASE)` +``` + +7. Add stable/automation-friendly filters now (high-value feature) +Reasoning: status names are user-customizable and renameable; category is more stable. Also add `--no-status` for quality checks and migration visibility. + +```diff +@@ AC-9: List Issues Filter (E2E) ++ [ ] `lore list issues --status-category in_progress` filters by category (case-insensitive) ++ [ ] `lore list issues --no-status` returns only issues where `status_name IS NULL` ++ [ ] `--status` + `--status-category` combine with AND logic + +@@ File 9: `src/cli/mod.rs` ++ Add flags: `--status-category`, `--no-status` + +@@ File 11: `src/cli/autocorrect.rs` ++ Register `--status-category` and `--no-status` for `issues` +``` + +8. Better enrichment observability and failure accounting (must-have ops) +Reasoning: only tracking `statuses_enriched` hides skipped/cleared/errors, and auth failures become silent partial data quality issues. Add counters and explicit progress events. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) +- [ ] `IngestProjectResult` gains `statuses_enriched: usize` counter +- [ ] Progress event: `ProgressEvent::StatusEnrichmentComplete { enriched: usize }` ++ [ ] `IngestProjectResult` gains: ++ [ ] `statuses_enriched`, `statuses_cleared`, `status_enrichment_skipped`, `status_enrichment_failed` ++ [ ] Progress events: ++ [ ] `StatusEnrichmentStarted`, `StatusEnrichmentSkipped`, `StatusEnrichmentComplete`, `StatusEnrichmentFailed` ++ [ ] End-of-sync summary includes per-project enrichment outcome counts +``` + +9. Add `status_changed_at` for immediately useful workflow analytics (high-value feature) +Reasoning: without change timestamp, you can’t answer “how long has this been in progress?” which is one of the most useful agent/human queries. + +```diff +@@ AC-4: Migration 021 (Unit) ++ [ ] Adds nullable INTEGER column `status_changed_at` (ms epoch UTC) + +@@ AC-6: Enrichment in Orchestrator (Integration) ++ [ ] If status_name/category changes, update `status_changed_at = now_ms()` ++ [ ] If status is cleared, set `status_changed_at = NULL` + +@@ AC-9: List Issues Filter (E2E) ++ [ ] `lore list issues --stale-status-days N` filters by `status_changed_at <= now - N days` +``` + +10. Expand test matrix for real-world failure/perf paths (must-have) +Reasoning: current tests are good, but the highest-risk failures are retry behavior, capability caching, idempotency under repeated runs, and large-project performance. + +```diff +@@ TDD Plan — RED Phase ++ 26. `test_graphql_retries_429_with_retry_after_then_succeeds` ++ 27. `test_graphql_retries_503_then_fails_after_max_attempts` ++ 28. `test_capability_cache_skips_unsupported_project_until_ttl_expiry` ++ 29. `test_delta_skip_when_no_issue_upserts` ++ 30. `test_periodic_full_reconcile_runs_after_threshold` ++ 31. `test_set_based_enrichment_scales_10k_issues_without_timeout` ++ 32. `test_enrichment_idempotent_across_two_runs` ++ 33. `test_status_changed_at_updates_only_on_actual_status_change` +``` + +If you want, I can now produce a single consolidated revised plan document (full rewritten Markdown) with these changes merged in-place so it’s ready to execute. \ No newline at end of file diff --git a/plans/work-item-status-graphql.feedback-5.md b/plans/work-item-status-graphql.feedback-5.md new file mode 100644 index 0000000..4006a24 --- /dev/null +++ b/plans/work-item-status-graphql.feedback-5.md @@ -0,0 +1,124 @@ +Your plan is already strong and implementation-aware. The best upgrades are mostly about reliability under real-world API instability, large-scale performance, and making the feature more useful for automation. + +1. Promote retry/backoff from deferred to in-scope now. +Reason: Right now, transient failures cause silent status gaps until a later sync. Bounded retries with jitter and a time budget dramatically improve successful enrichment without making syncs hang. + +```diff +@@ AC-1: GraphQL Client (Unit) @@ + - [ ] Network error → `LoreError::Other` + + [ ] Transient failures (`429`, `502`, `503`, `504`, timeout, connect reset) retry with exponential backoff + jitter (max 3 attempts) + + [ ] `Retry-After` supports both delta-seconds and HTTP-date formats + + [ ] Per-request retry budget capped (e.g. 120s total) to preserve cancellation responsiveness + +@@ AC-6: Enrichment in Orchestrator (Integration) @@ + - [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync) + + [ ] On transient GraphQL errors: retry policy applied before warning/skip behavior + +@@ Decisions @@ + - 8. **No retry/backoff in v1** — DEFER. + + 8. **Retry/backoff in v1** — YES. Required for reliable enrichment under normal GitLab/API turbulence. +``` + +2. Add a capability cache so unsupported projects stop paying repeated GraphQL cost. +Reason: Free tier / older instances will never return status widgets. Re-querying every sync is wasted time and noisy logs. + +```diff +@@ Acceptance Criteria @@ + + ### AC-11: Capability Probe & Cache (Integration) + + - [ ] Add `project_capabilities` cache with `supports_work_item_status`, `checked_at`, `cooldown_until` + + - [ ] 404/403/known-unsupported responses update capability cache and suppress repeated warnings until TTL expires + + - [ ] Supported projects still enrich every run (subject to normal schedule) + +@@ Future Enhancements (Not in Scope) @@ + - **Capability probe/cache**: Detect status-widget support per project ... (deferred) + + (moved into scope as AC-11) +``` + +3. Make enrichment delta-aware with periodic forced reconciliation. +Reason: Full pagination every sync is expensive on large projects. You can skip unnecessary status fetches when no issue changes occurred, while still doing periodic safety sweeps. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) @@ + - [ ] Runs on every sync (not gated by `--full`) + + [ ] Runs when issue ingestion reports project issue deltas OR reconcile window elapsed + + [ ] New config: `status_reconcile_hours` (default: 24) for periodic full sweep + + [ ] `--refresh-status` forces enrichment regardless of delta/reconcile window +``` + +4. Replace row-by-row update loops with set-based SQL via temp table. +Reason: Current per-IID loops are simple but slow at scale and hold locks longer. Set-based updates are much faster and reduce lock contention. + +```diff +@@ File 6: `src/ingestion/orchestrator.rs` (MODIFY) @@ + - for iid in all_fetched_iids { ... UPDATE issues ... } + - for (iid, status) in statuses { ... UPDATE issues ... } + + CREATE TEMP TABLE temp_issue_status_updates(...) + + bulk INSERT temp rows (iid, name, category, color, icon_name) + + single set-based UPDATE for enriched rows + + single set-based NULL-clear for fetched-without-status rows + + commit transaction +``` + +5. Add strict mode and explicit partial-failure reporting. +Reason: “Warn and continue” is good default UX, but automation needs a fail-fast option and machine-readable failure output. + +```diff +@@ AC-5: Config Toggle (Unit) @@ + + - [ ] `SyncConfig` adds `status_enrichment_strict: bool` (default false) + +@@ AC-6: Enrichment in Orchestrator (Integration) @@ + - [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync) + + [ ] Default mode: warn + continue + + [ ] Strict mode: status enrichment error fails sync for that run + +@@ AC-6: IngestProjectResult @@ + + - [ ] Adds `status_enrichment_error: Option` + +@@ AC-8 / Robot sync envelope @@ + + - [ ] Robot output includes `partial_failures` array with per-project enrichment failures +``` + +6. Fix case-insensitive matching robustness and track freshness. +Reason: SQLite `COLLATE NOCASE` is ASCII-centric; custom statuses may be non-ASCII. Also you need visibility into staleness. + +```diff +@@ AC-4: Migration 021 (Unit) @@ + - [ ] Migration adds 4 nullable TEXT columns to `issues` + + [ ] Migration adds 6 columns: + + `status_name`, `status_category`, `status_color`, `status_icon_name`, + + `status_name_fold`, `status_synced_at` + - [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)` + + [ ] Adds compound index `idx_issues_project_status_name_fold(project_id, status_name_fold)` + +@@ AC-9: List Issues Filter (E2E) @@ + - [ ] Filter uses case-insensitive matching (`COLLATE NOCASE`) + + [ ] Filter uses `status_name_fold` (Unicode-safe fold normalization done at write time) +``` + +7. Expand filtering to category and missing-status workflows. +Reason: Name filters are useful, but automation is better on semantic categories and “missing data” detection. + +```diff +@@ AC-9: List Issues Filter (E2E) @@ + + - [ ] `--status-category in_progress` filters by `status_category` (case-insensitive) + + - [ ] `--no-status` returns only issues where `status_name IS NULL` + + - [ ] `--status` and `--status-category` can be combined with AND logic +``` + +8. Change robot payload from flat status fields to a nested `status` object. +Reason: Better schema evolution and less top-level field sprawl as you add metadata (`synced_at`, future lifecycle fields). + +```diff +@@ AC-7: Show Issue Display (E2E) @@ + - [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name` fields + - [ ] Fields are `null` (not absent) when status not available + + [ ] JSON includes `status` object: + + `{ "name", "category", "color", "icon_name", "synced_at" }` + + [ ] `status: null` when not available + +@@ AC-8: List Issues Display (E2E) @@ + - [ ] `--fields` supports: `status_name`, `status_category`, `status_color`, `status_icon_name` + + [ ] `--fields` supports: `status.name,status.category,status.color,status.icon_name,status.synced_at` +``` + +If you want, I can produce a fully rewritten “Iteration 5” plan document with these changes integrated end-to-end (ACs, files, migrations, TDD batches, and updated decisions/future-scope). \ No newline at end of file diff --git a/plans/work-item-status-graphql.feedback-6.md b/plans/work-item-status-graphql.feedback-6.md new file mode 100644 index 0000000..8224586 --- /dev/null +++ b/plans/work-item-status-graphql.feedback-6.md @@ -0,0 +1,130 @@ +Your iteration-5 plan is strong. The biggest remaining gaps are outcome ambiguity, cancellation safety, and long-term status identity. These are the revisions I’d make. + +1. **Make enrichment outcomes explicit (not “empty success”)** +Analysis: +Right now `404/403 -> Ok(empty)` is operationally ambiguous: “project has no statuses” vs “feature unavailable/auth issue.” Agents and dashboards need that distinction to make correct decisions. +This improves reliability and observability without making sync fail-hard. + +```diff +@@ AC-3: Status Fetcher (Integration) +-- [ ] `fetch_issue_statuses()` returns `FetchStatusResult` containing: ++- [ ] `fetch_issue_statuses()` returns `FetchStatusOutcome`: ++ - `Fetched(FetchStatusResult)` ++ - `Unsupported { reason: UnsupportedReason }` ++ - `CancelledPartial(FetchStatusResult)` +@@ +-- [ ] GraphQL 404 → returns `Ok(FetchStatusResult)` with empty collections + warning log +-- [ ] GraphQL 403 (`GitLabAuthFailed`) → returns `Ok(FetchStatusResult)` with empty collections + warning log ++- [ ] GraphQL 404 → `Unsupported { reason: GraphqlEndpointMissing }` + warning log ++- [ ] GraphQL 403 (`GitLabAuthFailed`) → `Unsupported { reason: AuthForbidden }` + warning log + +@@ AC-10: Robot Sync Envelope (E2E) +-- [ ] `status_enrichment` object: `{ "enriched": N, "cleared": N, "error": null | "message" }` ++- [ ] `status_enrichment` object: `{ "mode": "fetched|unsupported|cancelled_partial", "reason": null|"...", "enriched": N, "cleared": N, "error": null|"message" }` +``` + +2. **Add cancellation and pagination loop safety** +Analysis: +Large projects can run long. Current flow checks cancellation only before enrichment starts; pagination and per-row update loops can ignore cancellation for too long. Also, GraphQL cursor bugs can create infinite loops (`hasNextPage=true` with unchanged cursor). +This is a robustness must-have. + +```diff +@@ AC-3: Status Fetcher (Integration) ++ [ ] `fetch_issue_statuses()` accepts cancellation signal and checks it between page requests ++ [ ] Pagination guard: if `hasNextPage=true` but `endCursor` is `None` or unchanged, abort loop with warning and return partial outcome ++ [ ] Emits `pages_fetched` count for diagnostics + +@@ File 1: `src/gitlab/graphql.rs` +-- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str) -> Result ++- pub async fn fetch_issue_statuses(client: &GraphqlClient, project_path: &str, signal: &CancellationSignal) -> Result +``` + +3. **Persist stable `status_id` in addition to name** +Analysis: +`status_name` is display-oriented and mutable (rename/custom lifecycle changes). A stable status identifier is critical for durable automations, analytics, and future migrations. +This is a schema decision that is cheap now and expensive later if skipped. + +```diff +@@ AC-2: Status Types (Unit) +-- [ ] `WorkItemStatus` struct has `name`, `category`, `color`, `icon_name` ++- [ ] `WorkItemStatus` struct has `id: String`, `name`, `category`, `color`, `icon_name` + +@@ AC-4: Migration 021 (Unit) +-- [ ] Migration adds 5 nullable columns to `issues`: `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at` ++- [ ] Migration adds 6 nullable columns to `issues`: `status_id`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at` ++ [ ] Adds index `idx_issues_project_status_id(project_id, status_id)` for stable-machine filters + +@@ GraphQL query +- status { name category color iconName } ++ status { id name category color iconName } + +@@ AC-7 / AC-8 Robot ++ [ ] JSON includes `status_id` (null when unavailable) +``` + +4. **Handle GraphQL partial-data responses correctly** +Analysis: +GraphQL can return both `data` and `errors` in the same response. Current plan treats any `errors` as hard failure, which can discard valid data and reduce reliability. +Use partial-data semantics: keep data, log/report warnings. + +```diff +@@ AC-1: GraphQL Client (Unit) +-- [ ] Error response: if top-level `errors` array is non-empty, returns `LoreError` with first error message ++- [ ] If `errors` non-empty and `data` missing: return `LoreError` with first error message ++- [ ] If `errors` non-empty and `data` present: return `data` + warning metadata (do not fail the whole fetch) + +@@ TDD Plan (RED) ++ 33. `test_graphql_partial_data_with_errors_returns_data_and_warning` +``` + +5. **Extract status enrichment from orchestrator into dedicated module** +Analysis: +`orchestrator.rs` already has many phases. Putting status transport/parsing/transaction policy directly there increases coupling and test friction. +A dedicated module improves architecture clarity and makes future enhancements safer. + +```diff +@@ Implementation Detail ++- File 15: `src/ingestion/enrichment/status.rs` (NEW) ++ - `run_status_enrichment(...)` ++ - `enrich_issue_statuses_txn(...)` ++ - outcome mapping + telemetry + +@@ File 6: `src/ingestion/orchestrator.rs` +-- Inline Phase 1.5 logic + helper function ++- Delegates to `enrichment::status::run_status_enrichment(...)` and records returned stats +``` + +6. **Add status/state consistency checks** +Analysis: +GitLab states status categories and issue state should synchronize, but ingestion drift or API edge cases can violate this. Detecting mismatch is high-signal for data integrity issues. +This is compelling for agents because it catches “looks correct but isn’t” problems. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) ++ [ ] Enrichment computes `status_state_mismatches` count: ++ - `DONE|CANCELED` with `state=open` or `TO_DO|IN_PROGRESS|TRIAGE` with `state=closed` ++ [ ] Logs warning summary when mismatches > 0 + +@@ AC-10: Robot Sync Envelope (E2E) ++ [ ] `status_enrichment` includes `state_mismatches: N` +``` + +7. **Add explicit performance envelope acceptance criterion** +Analysis: +Plan claims large-project handling, but no hard validation target is defined. Add a bounded, reproducible performance criterion to prevent regressions. +This is especially important with pagination + per-row writes. + +```diff +@@ Acceptance Criteria ++ ### AC-12: Performance Envelope (Integration) ++ - [ ] 10k-issue fixture completes status fetch + apply within defined budget on CI baseline machine ++ - [ ] Memory usage remains O(page_size), not O(total_issues) ++ - [ ] Cancellation during large sync exits within a bounded latency target + +@@ TDD Plan (RED) ++ 34. `test_enrichment_large_project_budget` ++ 35. `test_fetch_statuses_memory_bound_by_page` ++ 36. `test_cancellation_latency_during_pagination` +``` + +If you want, I can next produce a single consolidated “iteration 6” plan draft with these diffs fully merged so it’s ready to execute. \ No newline at end of file diff --git a/plans/work-item-status-graphql.feedback-7.md b/plans/work-item-status-graphql.feedback-7.md new file mode 100644 index 0000000..dc6cbe1 --- /dev/null +++ b/plans/work-item-status-graphql.feedback-7.md @@ -0,0 +1,118 @@ +**Highest-Impact Revisions (new, not in your rejected list)** + +1. **Critical: Preserve GraphQL partial-error metadata end-to-end (don’t just log it)** +Rationale: Right now partial GraphQL errors are warning-only. Agents get no machine-readable signal that status data may be incomplete, which can silently corrupt downstream automation decisions. Exposing partial-error metadata in `FetchStatusResult` and robot sync output makes reliability observable and actionable. + +```diff +@@ AC-1: GraphQL Client (Unit) +- [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and logs warning with first error message ++ [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `data` and warning metadata (`had_errors=true`, `first_error_message`) ++ [ ] `GraphqlClient::query()` returns `GraphqlQueryResult { data, had_errors, first_error_message }` + +@@ AC-3: Status Fetcher (Integration) ++ [ ] `FetchStatusResult` includes `partial_error_count: usize` and `first_partial_error: Option` ++ [ ] Partial GraphQL errors increment `partial_error_count` and are surfaced to orchestrator result + +@@ AC-10: Robot Sync Envelope (E2E) +- { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ... } ++ { "mode": "...", "reason": ..., "enriched": N, "cleared": N, "error": ..., "partial_errors": N, "first_partial_error": null|"..." } + +@@ File 1: src/gitlab/graphql.rs +- pub async fn query(...) -> Result ++ pub async fn query(...) -> Result ++ pub struct GraphqlQueryResult { pub data: serde_json::Value, pub had_errors: bool, pub first_error_message: Option } +``` + +2. **High: Add adaptive page-size fallback for GraphQL complexity/timeout failures** +Rationale: Fixed `first: 100` is brittle on self-hosted instances with stricter complexity/time limits. Adaptive page size (100→50→25→10) improves success rate without retries/backoff and avoids failing an entire project due to one tunable server constraint. + +```diff +@@ Query Path +-query($projectPath: ID!, $after: String) { ... workItems(types: [ISSUE], first: 100, after: $after) ... } ++query($projectPath: ID!, $after: String, $first: Int!) { ... workItems(types: [ISSUE], first: $first, after: $after) ... } + +@@ AC-3: Status Fetcher (Integration) ++ [ ] Starts with `first=100`; on GraphQL complexity/timeout errors, retries same cursor with smaller page size (50, 25, 10) ++ [ ] If smallest page size still fails, returns error as today ++ [ ] Emits warning including page size downgrade event + +@@ TDD Plan (RED) ++ 36. `test_fetch_statuses_complexity_error_reduces_page_size` ++ 37. `test_fetch_statuses_timeout_error_reduces_page_size` +``` + +3. **High: Make project path lookup failure non-fatal for the sync** +Rationale: Enrichment is optional. If `projects.path_with_namespace` lookup fails for any reason, sync should continue with a structured enrichment error instead of risking full project pipeline failure. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) ++ [ ] If project path lookup fails/missing, status enrichment is skipped for that project, warning logged, and sync continues ++ [ ] `status_enrichment_error` captures `"project_path_missing"` (or DB error text) + +@@ File 6: src/ingestion/orchestrator.rs +- let project_path: String = conn.query_row(...)?; ++ let project_path = conn.query_row(...).optional()?; ++ if project_path.is_none() { ++ result.status_enrichment_error = Some("project_path_missing".to_string()); ++ result.status_enrichment_mode = "fetched".to_string(); // attempted but unavailable locally ++ emit(ProgressEvent::StatusEnrichmentComplete { enriched: 0, cleared: 0 }); ++ // continue to discussion sync ++ } +``` + +4. **Medium: Upgrade `--status` from single-value to repeatable multi-value filter** +Rationale: Practical usage often needs “active buckets” (`To do` OR `In progress`). Repeatable `--status` with OR semantics dramatically improves usefulness without adding new conceptual surface area. + +```diff +@@ AC-9: List Issues Filter (E2E) +- [ ] `lore list issues --status "In progress"` → only issues where `status_name = 'In progress'` ++ [ ] `lore list issues --status "In progress"` → unchanged single-value behavior ++ [ ] Repeatable flags supported: `--status "In progress" --status "To do"` (OR semantics across status values) ++ [ ] Repeated `--status` remains AND-composed with other filters + +@@ File 9: src/cli/mod.rs +- pub status: Option, ++ pub status: Vec, // repeatable flag + +@@ File 8: src/cli/commands/list.rs +- if let Some(status) = filters.status { where_clauses.push("i.status_name = ? COLLATE NOCASE"); ... } ++ if !filters.statuses.is_empty() { /* dynamic OR/IN clause with case-insensitive matching */ } +``` + +5. **Medium: Add coverage telemetry (`seen`, `with_status`, `without_status`)** +Rationale: `enriched`/`cleared` alone is not enough to judge enrichment health. Coverage counters make it obvious whether a project truly has no statuses, is unsupported, or has unexpectedly low status population. + +```diff +@@ AC-6: Enrichment in Orchestrator (Integration) ++ [ ] `IngestProjectResult` gains `statuses_seen: usize` and `statuses_without_widget: usize` ++ [ ] Enrichment log includes `seen`, `enriched`, `cleared`, `without_widget` + +@@ AC-10: Robot Sync Envelope (E2E) +- status_enrichment: { mode, reason, enriched, cleared, error } ++ status_enrichment: { mode, reason, seen, enriched, cleared, without_widget, error, partial_errors } + +@@ File 6: src/ingestion/orchestrator.rs ++ result.statuses_seen = fetch_result.all_fetched_iids.len(); ++ result.statuses_without_widget = result.statuses_seen.saturating_sub(result.statuses_enriched); +``` + +6. **Medium: Centralize color parsing/render decisions (single helper used by show/list)** +Rationale: Color parsing is duplicated in `show.rs` and `list.rs`, which invites drift and inconsistent behavior. One shared helper gives consistent fallback behavior and simpler tests. + +```diff +@@ File 7: src/cli/commands/show.rs +- fn style_with_hex(...) { ...hex parse logic... } ++ use crate::cli::commands::color::style_with_hex; + +@@ File 8: src/cli/commands/list.rs +- fn colored_cell_hex(...) { ...hex parse logic... } ++ use crate::cli::commands::color::colored_cell_hex; + +@@ Files Changed (Summary) ++ `src/cli/commands/color.rs` (NEW) — shared hex parsing + styling helpers +- duplicated hex parsing blocks removed from show/list +``` + +--- + +If you want, I can produce a **single consolidated patch-style diff of the plan document itself** (all section edits merged, ready to paste as iteration 7). \ No newline at end of file diff --git a/plans/work-item-status-graphql.md b/plans/work-item-status-graphql.md new file mode 100644 index 0000000..b995b4b --- /dev/null +++ b/plans/work-item-status-graphql.md @@ -0,0 +1,1627 @@ +--- +plan: true +title: "Work Item Status via GraphQL Enrichment" +status: iterating +iteration: 7 +target_iterations: 8 +beads_revision: 1 +related_plans: [] +created: 2026-02-10 +updated: 2026-02-11 +--- + +# Work Item Status via GraphQL Enrichment + +> **Bead:** bd-2y79 | **Priority:** P1 | **Status:** Planning +> **Created:** 2026-02-10 + +## Problem + +GitLab issues have native work item status (To do, In progress, Done, Won't do, Duplicate) but +this is only available via GraphQL — not the REST API we use for ingestion. Without this data, +`lore` cannot report or filter by workflow status, making it invisible to agents and humans. + +--- + +## Acceptance Criteria + +Each criterion is independently testable. Implementation is complete when ALL pass. + +### AC-1: GraphQL Client (Unit) + +- [ ] `GraphqlClient::query()` POSTs to `{base_url}/api/graphql` with `Content-Type: application/json` +- [ ] Request uses `Authorization: Bearer {token}` header (NOT `PRIVATE-TOKEN`) +- [ ] Request body is `{"query": "...", "variables": {...}}` +- [ ] Successful response: parses `data` field from JSON envelope +- [ ] Error response: if top-level `errors` array is non-empty AND `data` field is absent/null, returns `LoreError` with first error message +- [ ] Partial-data response: if `errors` array is non-empty BUT `data` field is present and non-null, returns `GraphqlQueryResult { data, had_partial_errors: true, first_partial_error: Some("...") }` (does NOT fail the query — GraphQL spec permits `data` + `errors` coexistence for partial results) +- [ ] `GraphqlQueryResult` struct: `data: serde_json::Value`, `had_partial_errors: bool`, `first_partial_error: Option` — enables callers to propagate partial-error metadata to observability surfaces +- [ ] Successful response (no errors): returns `GraphqlQueryResult { data, had_partial_errors: false, first_partial_error: None }` +- [ ] HTTP 401 → `LoreError::GitLabAuthFailed` +- [ ] HTTP 403 → `LoreError::GitLabAuthFailed` (forbidden treated as auth failure — no separate variant needed) +- [ ] HTTP 404 → `LoreError::GitLabNotFound` +- [ ] HTTP 429 → `LoreError::GitLabRateLimited` (respects `Retry-After` header — supports both delta-seconds and HTTP-date formats, falls back to 60s if unparseable) +- [ ] Network error → `LoreError::Other` + +### AC-2: Status Types (Unit) + +- [ ] `WorkItemStatus` struct has `name: String`, `category: Option`, `color: Option`, `icon_name: Option` +- [ ] `category` stored as raw string from GraphQL (e.g., `"IN_PROGRESS"`, `"TO_DO"`) — no enum, no normalization +- [ ] `WorkItemStatus` deserializes from GraphQL JSON shape with `name`, `category`, `color`, `iconName` +- [ ] All fields except `name` are `Option` — absent fields deserialize to `None` +- [ ] Custom statuses (18.5+) with non-standard category values deserialize without error + +### AC-3: Status Fetcher (Integration) + +- [ ] `fetch_issue_statuses()` returns `FetchStatusResult` containing: + - `statuses: HashMap` keyed by issue IID (parsed from GraphQL's String to i64) + - `all_fetched_iids: HashSet` — all IIDs seen in GraphQL response (for staleness clearing) + - `unsupported_reason: Option` — set when enrichment was skipped due to 404/403 (enables orchestrator to distinguish "no statuses" from "feature unavailable") + - `partial_error_count: usize` — count of pages that returned partial-data responses (data + errors) + - `first_partial_error: Option` — first partial-error message for diagnostic surfacing +- [ ] Paginates: follows `pageInfo.endCursor` + `hasNextPage` until all pages consumed +- [ ] Pagination guard: if `hasNextPage=true` but `endCursor` is `None` or unchanged, aborts pagination loop with a warning log and returns the partial result collected so far (prevents infinite loops from GraphQL cursor bugs) +- [ ] Adaptive page size: starts with `first=100`; on GraphQL complexity or timeout errors (detected via error message substring matching for `"complexity"` or `"timeout"`), retries the same cursor with halved page size (100→50→25→10); if page size 10 still fails, returns the error +- [ ] Query uses `$first: Int!` variable (not hardcoded `first: 100`) to support adaptive page sizing +- [ ] Query includes `__typename` in `widgets` selection; parser matches `__typename == "WorkItemWidgetStatus"` for deterministic widget identification (no heuristic try-deserialize) +- [ ] Non-status widgets are ignored deterministically via `__typename` check +- [ ] Issues with no status widget in `widgets` array → in `all_fetched_iids` but not in `statuses` map (no error) +- [ ] Issues with status widget but `status: null` → in `all_fetched_iids` but not in `statuses` map +- [ ] GraphQL 404 → returns `Ok(FetchStatusResult)` with empty collections, `unsupported_reason == Some(UnsupportedReason::GraphqlEndpointMissing)`, + warning log +- [ ] GraphQL 403 (`GitLabAuthFailed`) → returns `Ok(FetchStatusResult)` with empty collections, `unsupported_reason == Some(UnsupportedReason::AuthForbidden)`, + warning log + +### AC-4: Migration 021 (Unit) + +- [ ] Migration adds 5 nullable columns to `issues`: `status_name TEXT`, `status_category TEXT`, `status_color TEXT`, `status_icon_name TEXT`, `status_synced_at INTEGER` (ms epoch UTC — when enrichment last wrote/cleared this row's status) +- [ ] Adds compound index `idx_issues_project_status_name(project_id, status_name)` for `--status` filter performance +- [ ] `LATEST_SCHEMA_VERSION` becomes 21 +- [ ] Existing issues retain all data (ALTER TABLE ADD COLUMN is non-destructive) +- [ ] In-memory DB test: after migration, `SELECT status_name, status_category, status_color, status_icon_name, status_synced_at FROM issues` succeeds +- [ ] NULL default: existing rows have NULL for all 5 new columns + +### AC-5: Config Toggle (Unit) + +- [ ] `SyncConfig` has `fetch_work_item_status: bool` field +- [ ] Default value: `true` (uses existing `default_true()` helper) +- [ ] JSON key: `"fetchWorkItemStatus"` (camelCase, matching convention) +- [ ] Existing config files without this key → defaults to `true` (no breakage) + +### AC-6: Enrichment in Orchestrator (Integration) + +- [ ] Enrichment runs after `ingest_issues()` completes, before discussion sync +- [ ] Runs on every sync (not gated by `--full`) +- [ ] Gated by `config.sync.fetch_work_item_status` — if `false`, skipped entirely +- [ ] Creates `GraphqlClient` via `client.graphql_client()` factory (token stays encapsulated) +- [ ] For each project: calls `fetch_issue_statuses()`, then UPDATEs matching `issues` rows +- [ ] Enrichment DB writes are transactional per project (all-or-nothing) +- [ ] Before applying updates, NULL out status fields for issues that were fetched but have no status widget (prevents stale status from lingering when status is removed) +- [ ] If enrichment fails mid-project, prior persisted statuses are unchanged (transaction rollback) +- [ ] UPDATE SQL: `SET status_name=?, status_category=?, status_color=?, status_icon_name=?, status_synced_at=? WHERE project_id=? AND iid=?` (synced_at = current epoch ms) +- [ ] Clear SQL also sets `status_synced_at` to current epoch ms (records when we confirmed absence of status) +- [ ] Logs summary: `"Enriched {n} issues with work item status for {project}"` — includes `seen`, `enriched`, `cleared`, `without_widget` counts +- [ ] On any GraphQL error: logs warning, continues to next project (never fails the sync) +- [ ] If project path lookup fails (DB error or missing row), status enrichment is skipped for that project with warning log and `status_enrichment_error: "project_path_missing"` — does NOT fail the overall project pipeline +- [ ] `IngestProjectResult` gains `statuses_enriched: usize`, `statuses_cleared: usize`, `statuses_seen: usize`, `statuses_without_widget: usize`, `partial_error_count: usize`, `first_partial_error: Option`, and `status_enrichment_error: Option` (captures error message when enrichment fails for this project) +- [ ] Progress events: `StatusEnrichmentComplete { enriched, cleared }` and `StatusEnrichmentSkipped` (when config toggle is false) +- [ ] Enrichment log line includes `seen`, `enriched`, `cleared`, and `without_widget` counts for full observability + +### AC-7: Show Issue Display (E2E) + +**Human (`lore show issue 123`):** +- [ ] New line after "State": `Status: In progress` (colored by `status_color` hex → nearest terminal color) +- [ ] Status line only shown when `status_name IS NOT NULL` +- [ ] Category shown in parens when available, lowercased: `Status: In progress (in_progress)` + +**Robot (`lore --robot show issue 123`):** +- [ ] JSON includes `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at` fields +- [ ] Fields are `null` (not absent) when status not available +- [ ] `status_synced_at` is integer (ms epoch UTC) or `null` — enables freshness checks by consumers + +### AC-8: List Issues Display (E2E) + +**Human (`lore list issues`):** +- [ ] New "Status" column in table after "State" column +- [ ] Status name colored by `status_color` hex → nearest terminal color +- [ ] NULL status → empty cell (no placeholder text) + +**Robot (`lore --robot list issues`):** +- [ ] JSON includes `status_name`, `status_category` fields on each issue +- [ ] `--fields` supports: `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at` +- [ ] `--fields minimal` preset does NOT include status fields (keeps token count low) + +### AC-9: List Issues Filter (E2E) + +- [ ] `lore list issues --status "In progress"` → only issues where `status_name = 'In progress'` +- [ ] Filter uses case-insensitive matching (`COLLATE NOCASE`) for UX — `"in progress"` matches `"In progress"` +- [ ] `--status` is repeatable: `--status "In progress" --status "To do"` → issues matching ANY of the given statuses (OR semantics within `--status`, AND with other filters) +- [ ] Single `--status` produces `WHERE status_name = ? COLLATE NOCASE`; multiple produce `WHERE status_name IN (?, ?) COLLATE NOCASE` +- [ ] `--status` combined with other filters (e.g., `--state opened --status "To do"`) → AND logic +- [ ] `--status` with no matching issues → "No issues found." + +### AC-10: Robot Sync Envelope (E2E) + +- [ ] `lore --robot sync` JSON response includes per-project `status_enrichment` object: `{ "mode": "fetched|unsupported|skipped", "reason": null | "graphql_endpoint_missing" | "auth_forbidden", "seen": N, "enriched": N, "cleared": N, "without_widget": N, "partial_errors": N, "first_partial_error": null | "message", "error": null | "message" }` +- [ ] `mode: "fetched"` — enrichment ran normally (even if 0 statuses found) +- [ ] `mode: "unsupported"` — GraphQL returned 404 or 403; `reason` explains why; all counters are 0 +- [ ] `mode: "skipped"` — config toggle is off; all other fields are 0/null +- [ ] When enrichment fails for a project: `error` contains the error message string, counters are 0, `mode` is `"fetched"` (it attempted) +- [ ] `partial_errors` > 0 indicates GraphQL returned partial-data responses (data + errors) — agents can use this to flag potentially incomplete status data +- [ ] `seen` counter enables agents to distinguish "project has 0 issues" from "project has 500 issues but 0 with status" +- [ ] Aggregate `status_enrichment_errors: N` in top-level sync summary for quick agent health checks + +### AC-11: Compiler & Quality Gates + +- [ ] `cargo check --all-targets` passes with zero errors +- [ ] `cargo clippy --all-targets -- -D warnings` passes (pedantic + nursery enabled) +- [ ] `cargo fmt --check` passes +- [ ] `cargo test` passes — all new + existing tests green + +--- + +## GitLab API Constraints + +### Tier Requirement + +**Premium or Ultimate only.** The status widget (`WorkItemWidgetStatus`) lives entirely in +GitLab EE. On Free tier, the widget simply won't appear in the `widgets` array — no error, +just absent. + +### Version Requirements + +| GitLab Version | Status Support | +|---|---| +| < 17.11 | No status widget at all | +| 17.11 - 18.1 | Status widget (Experiment), `category` field missing | +| 18.2 - 18.3 | Status widget with feature flag (enabled by default) | +| 18.4+ | Generally available, `workItemAllowedStatuses` query added | +| 18.5+ | Custom statuses via Lifecycles | + +### Authentication + +GraphQL endpoint (`/api/graphql`) does **NOT** accept `PRIVATE-TOKEN` header. + +Must use: `Authorization: Bearer ` + +Same personal access token works — just a different header format. Requires `api` or `read_api` scope. + +### Query Path + +**Must use `workItems` resolver, NOT `project.issues`.** The legacy `issues` field returns the +old `Issue` type which does not include status widgets. + +```graphql +query($projectPath: ID!, $after: String, $first: Int!) { + project(fullPath: $projectPath) { + workItems(types: [ISSUE], first: $first, after: $after) { + nodes { + iid + state + widgets { + __typename + ... on WorkItemWidgetStatus { + status { + name + category + color + iconName + } + } + } + } + pageInfo { + endCursor + hasNextPage + } + } + } +} +``` + +### GraphQL Limits + +| Limit | Value | +|---|---| +| Max complexity | 250 (authenticated) | +| Max page size | 100 nodes | +| Max query size | 10,000 chars | +| Request timeout | 30 seconds | + +### Status Values + +**System-defined statuses (default lifecycle):** + +| ID | Name | Color | Category | Maps to State | +|----|------|-------|----------|---------------| +| 1 | To do | `#737278` (gray) | `to_do` | open | +| 2 | In progress | `#1f75cb` (blue) | `in_progress` | open | +| 3 | Done | `#108548` (green) | `done` | closed | +| 4 | Won't do | `#DD2B0E` (red) | `canceled` | closed | +| 5 | Duplicate | `#DD2B0E` (red) | `canceled` | closed | + +**Known category values:** `TRIAGE`, `TO_DO`, `IN_PROGRESS`, `DONE`, `CANCELED` + +Note: Organizations with Premium/Ultimate on 18.5+ can define up to 70 custom statuses per +namespace. Custom status names, IDs, and potentially category values will vary by instance. +We store category as a raw string (`Option`) rather than an enum to avoid breaking +on unknown values from custom lifecycles or future GitLab releases. + +### Status-State Synchronization + +Setting a status in `done`/`canceled` category automatically closes the issue. +Setting a status in `triage`/`to_do`/`in_progress` automatically reopens it. +This is bidirectional — closing via REST sets status to `default_closed_status`. + +--- + +## Implementation Detail + +### File 1: `src/gitlab/graphql.rs` (NEW) + +**Purpose:** Minimal GraphQL client and status-specific fetcher. + +```rust +use std::collections::{HashMap, HashSet}; + +use reqwest::Client; +use serde::Deserialize; +use tracing::{debug, warn}; + +use crate::core::error::{LoreError, Result}; + +use super::types::WorkItemStatus; + +// ─── GraphQL Client ────────────────────────────────────────────────────────── + +pub struct GraphqlClient { + http: Client, + base_url: String, // e.g. "https://gitlab.example.com" + token: String, +} + +/// Result of a GraphQL query — includes both data and partial-error metadata. +/// Partial errors occur when GraphQL returns both `data` and `errors` (per spec, +/// this means some fields resolved successfully while others failed). +pub struct GraphqlQueryResult { + pub data: serde_json::Value, + pub had_partial_errors: bool, + pub first_partial_error: Option, +} + +impl GraphqlClient { + pub fn new(base_url: &str, token: &str) -> Self { + let base_url = base_url.trim_end_matches('/').to_string(); + let http = Client::builder() + .timeout(std::time::Duration::from_secs(30)) + .build() + .expect("Failed to build HTTP client"); + Self { + http, + base_url, + token: token.to_string(), + } + } + + /// POST a GraphQL query and return a `GraphqlQueryResult`. + /// Returns Err if HTTP fails, JSON parse fails, or response has `errors` with no `data`. + /// If response has both `errors` and `data`, returns `GraphqlQueryResult` with + /// `had_partial_errors=true` and `first_partial_error` populated + /// (GraphQL spec permits partial results with errors). + pub async fn query( + &self, + query: &str, + variables: serde_json::Value, + ) -> Result { + let url = format!("{}/api/graphql", self.base_url); + let body = serde_json::json!({ + "query": query, + "variables": variables, + }); + + let response = self + .http + .post(&url) + .header("Authorization", format!("Bearer {}", self.token)) + .header("Content-Type", "application/json") + .json(&body) + .send() + .await + .map_err(|e| LoreError::Other(format!("GraphQL request failed: {e}")))?; + + let status = response.status(); + + if status == reqwest::StatusCode::UNAUTHORIZED + || status == reqwest::StatusCode::FORBIDDEN + { + return Err(LoreError::GitLabAuthFailed); + } + if status == reqwest::StatusCode::NOT_FOUND { + return Err(LoreError::GitLabNotFound { + resource: "GraphQL endpoint".to_string(), + }); + } + if status == reqwest::StatusCode::TOO_MANY_REQUESTS { + let retry_after = response + .headers() + .get("retry-after") + .and_then(|v| v.to_str().ok()) + .and_then(|v| { + // Try delta-seconds first (e.g., "60"), then HTTP-date + v.parse::().ok().or_else(|| { + httpdate::parse_http_date(v) + .ok() + .and_then(|dt| dt.duration_since(std::time::SystemTime::now()).ok()) + .map(|d| d.as_secs()) + }) + }) + .unwrap_or(60); // LoreError::GitLabRateLimited uses u64, not Option + return Err(LoreError::GitLabRateLimited { retry_after }); + } + if !status.is_success() { + let body = response + .text() + .await + .unwrap_or_else(|_| "unknown".to_string()); + return Err(LoreError::Other(format!( + "GraphQL HTTP {}: {}", + status, body + ))); + } + + let json: serde_json::Value = response + .json() + .await + .map_err(|e| LoreError::Other(format!("GraphQL response parse failed: {e}")))?; + + // Check for GraphQL-level errors + let errors = json + .get("errors") + .and_then(|e| e.as_array()) + .filter(|arr| !arr.is_empty()); + + let data = json.get("data").filter(|d| !d.is_null()).cloned(); + + if let Some(err_array) = errors { + let first_msg = err_array[0] + .get("message") + .and_then(|m| m.as_str()) + .unwrap_or("Unknown GraphQL error") + .to_string(); + + if let Some(data) = data { + // Partial data with errors — return data with error metadata + warn!(error = %first_msg, "GraphQL returned partial data with errors"); + return Ok(GraphqlQueryResult { + data, + had_partial_errors: true, + first_partial_error: Some(first_msg), + }); + } + // Errors only, no data + return Err(LoreError::Other(format!("GraphQL error: {first_msg}"))); + } + + data.map(|d| GraphqlQueryResult { + data: d, + had_partial_errors: false, + first_partial_error: None, + }) + .ok_or_else(|| LoreError::Other("GraphQL response missing 'data' field".to_string())) + } +} + +// ─── Status Fetcher ────────────────────────────────────────────────────────── + +const ISSUE_STATUS_QUERY: &str = r#" +query($projectPath: ID!, $after: String, $first: Int!) { + project(fullPath: $projectPath) { + workItems(types: [ISSUE], first: $first, after: $after) { + nodes { + iid + widgets { + __typename + ... on WorkItemWidgetStatus { + status { + name + category + color + iconName + } + } + } + } + pageInfo { + endCursor + hasNextPage + } + } + } +} +"#; + +/// Page sizes to try, in order, when adaptive fallback is triggered. +const PAGE_SIZES: &[u32] = &[100, 50, 25, 10]; + +/// Deserialization types for the GraphQL response shape. +/// These are private — only `WorkItemStatus` escapes via the return type. + +#[derive(Deserialize)] +struct WorkItemsResponse { + project: Option, +} + +#[derive(Deserialize)] +struct ProjectNode { + #[serde(rename = "workItems")] + work_items: Option, +} + +#[derive(Deserialize)] +struct WorkItemConnection { + nodes: Vec, + #[serde(rename = "pageInfo")] + page_info: PageInfo, +} + +#[derive(Deserialize)] +struct WorkItemNode { + iid: String, // GraphQL returns iid as String + widgets: Vec, +} + +#[derive(Deserialize)] +struct PageInfo { + #[serde(rename = "endCursor")] + end_cursor: Option, + #[serde(rename = "hasNextPage")] + has_next_page: bool, +} + +#[derive(Deserialize)] +struct StatusWidget { + status: Option, +} + +/// Reason why status enrichment was not available for a project. +/// Used in `FetchStatusResult` to distinguish +/// "no statuses found" from "feature unavailable." +#[derive(Debug, Clone)] +pub enum UnsupportedReason { + /// GraphQL endpoint returned 404 (old GitLab, self-hosted with GraphQL disabled) + GraphqlEndpointMissing, + /// GraphQL endpoint returned 403 (insufficient permissions) + AuthForbidden, +} + +/// Result of fetching issue statuses — includes both the status map and the +/// set of all IIDs that were seen in the GraphQL response (with or without status). +/// `all_fetched_iids` is a `HashSet` for O(1) staleness lookups in the orchestrator +/// when NULLing out status fields for issues that no longer have a status widget. +/// +/// `unsupported_reason` is `Some(...)` when enrichment was skipped due to 404/403 — +/// enables the orchestrator to distinguish "project has no statuses" from "feature +/// unavailable" for observability and robot sync output. +/// +/// `partial_error_count` and `first_partial_error` track pages where GraphQL returned +/// both `data` and `errors` — enables agents to detect potentially incomplete data. +pub struct FetchStatusResult { + pub statuses: HashMap, + pub all_fetched_iids: HashSet, + pub unsupported_reason: Option, + pub partial_error_count: usize, + pub first_partial_error: Option, +} + +/// Returns true if a GraphQL error message suggests the query is too complex +/// or timed out — conditions where reducing page size may help. +fn is_complexity_or_timeout_error(msg: &str) -> bool { + let lower = msg.to_ascii_lowercase(); + lower.contains("complexity") || lower.contains("timeout") +} + +/// Fetch work item statuses for all issues in a project. +/// Returns a `FetchStatusResult` containing: +/// - `statuses`: map of IID (i64) → WorkItemStatus for issues that have a status widget +/// - `all_fetched_iids`: all IIDs seen in the GraphQL response as `HashSet` (for O(1) staleness clearing) +/// - `unsupported_reason`: `Some(...)` when enrichment was skipped due to 404/403 +/// - `partial_error_count` / `first_partial_error`: partial-error tracking across pages +/// +/// Paginates through all results. Uses adaptive page sizing: starts with 100, +/// falls back to 50→25→10 on complexity/timeout errors. +/// Includes a pagination guard: aborts if `hasNextPage=true` but cursor is +/// `None` or unchanged from the previous page (prevents infinite loops from GraphQL bugs). +/// +/// On 404/403: returns Ok(FetchStatusResult) with empty maps + unsupported_reason + warning log. +/// On other errors: returns Err. +pub async fn fetch_issue_statuses( + client: &GraphqlClient, + project_path: &str, +) -> Result { + let mut statuses = HashMap::new(); + let mut all_fetched_iids = HashSet::new(); + let mut cursor: Option = None; + let mut page_size_idx = 0; // index into PAGE_SIZES + let mut partial_error_count = 0usize; + let mut first_partial_error: Option = None; + + loop { + let current_page_size = PAGE_SIZES[page_size_idx]; + let variables = serde_json::json!({ + "projectPath": project_path, + "after": cursor, + "first": current_page_size, + }); + + let query_result = match client.query(ISSUE_STATUS_QUERY, variables).await { + Ok(result) => result, + Err(LoreError::GitLabNotFound { .. }) => { + warn!( + project = project_path, + "GraphQL endpoint not found — skipping status enrichment" + ); + return Ok(FetchStatusResult { + statuses, + all_fetched_iids, + unsupported_reason: Some(UnsupportedReason::GraphqlEndpointMissing), + partial_error_count, + first_partial_error, + }); + } + Err(LoreError::GitLabAuthFailed) => { + warn!( + project = project_path, + "GraphQL auth failed — skipping status enrichment" + ); + return Ok(FetchStatusResult { + statuses, + all_fetched_iids, + unsupported_reason: Some(UnsupportedReason::AuthForbidden), + partial_error_count, + first_partial_error, + }); + } + Err(LoreError::Other(ref msg)) if is_complexity_or_timeout_error(msg) => { + // Adaptive page size: try smaller page + if page_size_idx + 1 < PAGE_SIZES.len() { + let old_size = PAGE_SIZES[page_size_idx]; + page_size_idx += 1; + let new_size = PAGE_SIZES[page_size_idx]; + warn!( + project = project_path, + old_size, + new_size, + "GraphQL complexity/timeout error — reducing page size and retrying" + ); + continue; // retry same cursor with smaller page + } + // Smallest page size still failed + return Err(LoreError::Other(msg.clone())); + } + Err(e) => return Err(e), + }; + + // Track partial-error metadata + if query_result.had_partial_errors { + partial_error_count += 1; + if first_partial_error.is_none() { + first_partial_error = query_result.first_partial_error.clone(); + } + } + + let response: WorkItemsResponse = serde_json::from_value(query_result.data) + .map_err(|e| LoreError::Other(format!("Failed to parse GraphQL response: {e}")))?; + + let connection = match response + .project + .and_then(|p| p.work_items) + { + Some(c) => c, + None => { + debug!( + project = project_path, + "No workItems in GraphQL response" + ); + break; + } + }; + + for node in &connection.nodes { + if let Ok(iid) = node.iid.parse::() { + all_fetched_iids.insert(iid); + + // Find the status widget via __typename for deterministic matching + for widget_value in &node.widgets { + let is_status_widget = widget_value + .get("__typename") + .and_then(|t| t.as_str()) + == Some("WorkItemWidgetStatus"); + + if is_status_widget { + if let Ok(sw) = serde_json::from_value::(widget_value.clone()) + && let Some(status) = sw.status + { + statuses.insert(iid, status); + } + } + } + } + } + + // After a successful page, reset page size back to max for next page + // (the complexity issue may be cursor-position-specific) + page_size_idx = 0; + + if connection.page_info.has_next_page { + let new_cursor = connection.page_info.end_cursor; + // Pagination guard: abort if cursor is None or unchanged (prevents infinite loops) + if new_cursor.is_none() || new_cursor == cursor { + warn!( + project = project_path, + pages_fetched = all_fetched_iids.len(), + "Pagination cursor stalled (hasNextPage=true but cursor unchanged) — returning partial result" + ); + break; + } + cursor = new_cursor; + } else { + break; + } + } + + debug!( + project = project_path, + count = statuses.len(), + total_fetched = all_fetched_iids.len(), + partial_error_count, + "Fetched issue statuses via GraphQL" + ); + + Ok(FetchStatusResult { + statuses, + all_fetched_iids, + unsupported_reason: None, + partial_error_count, + first_partial_error, + }) +} + +/// Map RGB to nearest ANSI 256-color index using the 6x6x6 color cube (indices 16-231). +/// NOTE: clippy::items_after_test_module — this MUST be placed BEFORE the test module. +pub fn ansi256_from_rgb(r: u8, g: u8, b: u8) -> u8 { + let ri = ((u16::from(r) * 5 + 127) / 255) as u8; + let gi = ((u16::from(g) * 5 + 127) / 255) as u8; + let bi = ((u16::from(b) * 5 + 127) / 255) as u8; + 16 + 36 * ri + 6 * gi + bi +} + +#[cfg(test)] +mod tests { + use super::*; + // Tests use wiremock or similar mock server + // See TDD Plan section for test specifications +} +``` + +### File 2: `src/gitlab/mod.rs` (MODIFY) + +**Existing code (14 lines):** +```rust +pub mod client; +pub mod transformers; +pub mod types; +``` + +**Add after `pub mod types;`:** +```rust +pub mod graphql; +``` + +**Add to `pub use types::{...}` list:** +```rust +WorkItemStatus, +``` + +### File 3: `src/gitlab/types.rs` (MODIFY) + +**Add at end of file (after `GitLabMergeRequest` struct, before any `#[cfg(test)]`):** + +```rust +/// Work item status from GitLab GraphQL API. +/// Stored in the `issues` table columns: status_name, status_category, status_color, status_icon_name. +/// +/// `category` is stored as a raw string from GraphQL (e.g., "IN_PROGRESS", "TO_DO", "DONE"). +/// No enum — custom statuses on GitLab 18.5+ can have arbitrary category values, and even +/// system-defined categories may change across GitLab versions. Storing the raw string avoids +/// serde deserialization failures on unknown values. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct WorkItemStatus { + pub name: String, + pub category: Option, + pub color: Option, + #[serde(rename = "iconName")] + pub icon_name: Option, +} +``` + +### File 4: `src/core/db.rs` (MODIFY) + +**Existing pattern — each migration is a tuple of `(&str, &str)` loaded via `include_str!`:** +```rust +const MIGRATIONS: &[(&str, &str)] = &[ + ("001", include_str!("../../migrations/001_initial.sql")), + // ... 002 through 020 ... + ("020", include_str!("../../migrations/020_mr_diffs_watermark.sql")), +]; +``` + +**Add as the 21st entry at the end of the `MIGRATIONS` array:** +```rust + ("021", include_str!("../../migrations/021_work_item_status.sql")), +``` + +`LATEST_SCHEMA_VERSION` is computed as `MIGRATIONS.len() as i32` — automatically becomes 21. + +### File 5: `src/core/config.rs` (MODIFY) + +**In `SyncConfig` struct, add after `fetch_mr_file_changes`:** +```rust +#[serde(rename = "fetchWorkItemStatus", default = "default_true")] +pub fetch_work_item_status: bool, +``` + +**In `impl Default for SyncConfig`, add after `fetch_mr_file_changes: true`:** +```rust +fetch_work_item_status: true, +``` + +### File 6: `src/ingestion/orchestrator.rs` (MODIFY) + +**Existing `ingest_project_issues_with_progress` flow (simplified):** +``` +Phase 1: ingest_issues() +Phase 2: sync_discussions() ← discussions for changed issues +Phase 3: drain_resource_events() ← if config.sync.fetch_resource_events +Phase 4: extract_refs_from_state_events() +``` + +**Insert new Phase 1.5 between issue ingestion and discussion sync.** + +The orchestrator function receives a `&GitLabClient`. Use the new `client.graphql_client()` +factory (File 13) to get a ready-to-use GraphQL client without exposing the raw token. +The function has `project_id: i64` but needs `path_with_namespace` for GraphQL — look it up from DB. +**The path lookup uses `.optional()?` to make failure non-fatal** — if the project path is +missing or the query fails, enrichment is skipped with a structured error rather than +failing the entire project pipeline: + +```rust +// ── Phase 1.5: GraphQL Status Enrichment ──────────────────────────────── + +if config.sync.fetch_work_item_status && !signal.is_cancelled() { + // Get project path for GraphQL query (orchestrator only has project_id). + // Non-fatal: if path is missing, skip enrichment with structured error. + let project_path: Option = conn + .query_row( + "SELECT path_with_namespace FROM projects WHERE id = ?1", + [project_id], + |r| r.get(0), + ) + .optional()?; + + let Some(project_path) = project_path else { + warn!(project_id, "Project path not found — skipping status enrichment"); + result.status_enrichment_error = Some("project_path_missing".to_string()); + result.status_enrichment_mode = "fetched".to_string(); // attempted but unavailable + emit(ProgressEvent::StatusEnrichmentComplete { enriched: 0, cleared: 0 }); + // Fall through to discussion sync — enrichment failure is non-fatal + }; + + if let Some(ref project_path) = project_path { + let graphql_client = client.graphql_client(); // factory keeps token encapsulated + + match crate::gitlab::graphql::fetch_issue_statuses( + &graphql_client, + project_path, + ) + .await + { + Ok(fetch_result) => { + // Record unsupported reason for robot sync output + result.status_enrichment_mode = match &fetch_result.unsupported_reason { + Some(crate::gitlab::graphql::UnsupportedReason::GraphqlEndpointMissing) => { + result.status_unsupported_reason = Some("graphql_endpoint_missing".to_string()); + "unsupported".to_string() + } + Some(crate::gitlab::graphql::UnsupportedReason::AuthForbidden) => { + result.status_unsupported_reason = Some("auth_forbidden".to_string()); + "unsupported".to_string() + } + None => "fetched".to_string(), + }; + + // Coverage telemetry + result.statuses_seen = fetch_result.all_fetched_iids.len(); + result.partial_error_count = fetch_result.partial_error_count; + result.first_partial_error = fetch_result.first_partial_error.clone(); + + let now_ms = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .unwrap_or_default() + .as_millis() as i64; + let (enriched, cleared) = enrich_issue_statuses_txn( + conn, project_id, &fetch_result.statuses, &fetch_result.all_fetched_iids, now_ms, + )?; + result.statuses_enriched = enriched; + result.statuses_cleared = cleared; + result.statuses_without_widget = result.statuses_seen + .saturating_sub(enriched); + info!( + project = %project_path, + seen = result.statuses_seen, + enriched, + cleared, + without_widget = result.statuses_without_widget, + partial_errors = result.partial_error_count, + "Issue status enrichment complete" + ); + } + Err(e) => { + let msg = format!("{e}"); + warn!( + project = %project_path, + error = %msg, + "GraphQL status enrichment failed — continuing without status data" + ); + result.status_enrichment_error = Some(msg); + result.status_enrichment_mode = "fetched".to_string(); // it attempted + } + } + + emit(ProgressEvent::StatusEnrichmentComplete { + enriched: result.statuses_enriched, + cleared: result.statuses_cleared, + }); + } +} else { + result.status_enrichment_mode = "skipped".to_string(); + emit(ProgressEvent::StatusEnrichmentSkipped); +} +``` + +**New helper function in `orchestrator.rs` — transactional with staleness clearing:** + +```rust +/// Apply status enrichment within a transaction. Two phases: +/// 1. NULL out status fields for issues that were fetched but have no status widget +/// (prevents stale status from lingering when a status is removed in GitLab) +/// 2. Apply new status values for issues that do have a status widget +/// +/// Both phases write `status_synced_at` to record when enrichment last touched each row. +/// If anything fails, the entire transaction rolls back — no partial updates. +fn enrich_issue_statuses_txn( + conn: &Connection, + project_id: i64, + statuses: &HashMap, + all_fetched_iids: &HashSet, + now_ms: i64, +) -> Result<(usize, usize)> { + let tx = conn.unchecked_transaction()?; + + // Phase 1: Clear stale statuses for fetched issues that no longer have a status widget + let mut clear_stmt = tx.prepare_cached( + "UPDATE issues + SET status_name = NULL, status_category = NULL, status_color = NULL, + status_icon_name = NULL, status_synced_at = ?3 + WHERE project_id = ?1 AND iid = ?2 + AND status_name IS NOT NULL", + )?; + let mut cleared = 0usize; + for iid in all_fetched_iids { + if !statuses.contains_key(iid) { + let rows = clear_stmt.execute(rusqlite::params![project_id, iid, now_ms])?; + if rows > 0 { + cleared += 1; + } + } + } + + // Phase 2: Apply new status values + let mut update_stmt = tx.prepare_cached( + "UPDATE issues + SET status_name = ?1, status_category = ?2, status_color = ?3, + status_icon_name = ?4, status_synced_at = ?5 + WHERE project_id = ?6 AND iid = ?7", + )?; + + let mut enriched = 0; + for (iid, status) in statuses { + let rows = update_stmt.execute(rusqlite::params![ + &status.name, + &status.category, + &status.color, + &status.icon_name, + now_ms, + project_id, + iid, + ])?; + if rows > 0 { + enriched += 1; + } + } + + tx.commit()?; + Ok((enriched, cleared)) +} +``` + +**Modify `IngestProjectResult` — add fields:** +```rust +pub statuses_enriched: usize, +pub statuses_cleared: usize, +pub statuses_seen: usize, // total IIDs in GraphQL response +pub statuses_without_widget: usize, // seen - enriched (coverage metric) +pub partial_error_count: usize, // pages with partial-data responses +pub first_partial_error: Option, // first partial-error message +pub status_enrichment_error: Option, +pub status_enrichment_mode: String, // "fetched" | "unsupported" | "skipped" +pub status_unsupported_reason: Option, // "graphql_endpoint_missing" | "auth_forbidden" +``` + +**Modify `ProgressEvent` enum — add variants:** +```rust +StatusEnrichmentComplete { enriched: usize, cleared: usize }, +StatusEnrichmentSkipped, // emitted when config.sync.fetch_work_item_status is false +``` + +### File 7: `src/cli/commands/show.rs` (MODIFY) + +**Modify `IssueRow` (private struct in show.rs) — add 5 fields:** +```rust +status_name: Option, +status_category: Option, +status_color: Option, +status_icon_name: Option, +status_synced_at: Option, +``` + +**Modify BOTH `find_issue` SQL queries (with and without project filter) — add 5 columns to SELECT:** + +Note: `find_issue()` has two separate SQL strings — one for when `project_filter` is `Some` and +one for `None`. Both must be updated identically. + +```sql +SELECT i.id, i.iid, i.title, i.description, i.state, i.author_username, + i.created_at, i.updated_at, i.web_url, p.path_with_namespace, + i.due_date, i.milestone_title, + i.status_name, i.status_category, i.status_color, i.status_icon_name, i.status_synced_at +``` + +Column indices: `status_name=12, status_category=13, status_color=14, status_icon_name=15, status_synced_at=16` + +**Modify `find_issue` row mapping — add after `milestone_title: row.get(11)?`:** +```rust +status_name: row.get(12)?, +status_category: row.get(13)?, +status_color: row.get(14)?, +status_icon_name: row.get(15)?, +status_synced_at: row.get(16)?, +``` + +**Modify `IssueDetail` (public struct) — add 5 fields:** +```rust +pub status_name: Option, +pub status_category: Option, +pub status_color: Option, +pub status_icon_name: Option, +pub status_synced_at: Option, +``` + +**Modify `run_show_issue` return — add fields from `issue` row:** +```rust +status_name: issue.status_name, +status_category: issue.status_category, +status_color: issue.status_color, +status_icon_name: issue.status_icon_name, +status_synced_at: issue.status_synced_at, +``` + +**Modify `print_show_issue` — add after the "State:" line (currently ~line 604):** +```rust +if let Some(status) = &issue.status_name { + let status_display = if let Some(cat) = &issue.status_category { + format!("{status} ({})", cat.to_ascii_lowercase()) + } else { + status.clone() + }; + println!( + "Status: {}", + style_with_hex(&status_display, issue.status_color.as_deref()) + ); +} +``` + +**New helper function for hex color → terminal color:** +```rust +fn style_with_hex<'a>(text: &'a str, hex: Option<&str>) -> console::StyledObject<&'a str> { + let Some(hex) = hex else { + return style(text); + }; + let hex = hex.trim_start_matches('#'); + // NOTE: clippy::collapsible_if — must combine conditions + if hex.len() == 6 + && let (Ok(r), Ok(g), Ok(b)) = ( + u8::from_str_radix(&hex[0..2], 16), + u8::from_str_radix(&hex[2..4], 16), + u8::from_str_radix(&hex[4..6], 16), + ) + { + return style(text).color256(crate::gitlab::graphql::ansi256_from_rgb(r, g, b)); + } + style(text) +} +``` + +**Modify `IssueDetailJson` — add 5 fields:** +```rust +pub status_name: Option, +pub status_category: Option, +pub status_color: Option, +pub status_icon_name: Option, +pub status_synced_at: Option, +``` + +**Modify `From<&IssueDetail> for IssueDetailJson` — add after existing field mappings:** +```rust +status_name: d.status_name.clone(), +status_category: d.status_category.clone(), +status_color: d.status_color.clone(), +status_icon_name: d.status_icon_name.clone(), +status_synced_at: d.status_synced_at, +``` + +### File 8: `src/cli/commands/list.rs` (MODIFY) + +**Modify `IssueListRow` — add 5 fields:** +```rust +pub status_name: Option, +pub status_category: Option, +pub status_color: Option, +pub status_icon_name: Option, +pub status_synced_at: Option, +``` + +**Modify `IssueListRowJson` — add 5 fields (all for robot mode):** +```rust +pub status_name: Option, +pub status_category: Option, +pub status_color: Option, +pub status_icon_name: Option, +pub status_synced_at: Option, +``` + +**Modify `From<&IssueListRow> for IssueListRowJson` — add after existing field mappings:** +```rust +status_name: r.status_name.clone(), +status_category: r.status_category.clone(), +status_color: r.status_color.clone(), +status_icon_name: r.status_icon_name.clone(), +status_synced_at: r.status_synced_at, +``` + +**Modify `query_issues` SELECT — add 5 columns after `unresolved_count` subquery:** +```sql +i.status_name, +i.status_category, +i.status_color, +i.status_icon_name, +i.status_synced_at +``` + +**Modify `query_issues` row mapping — add after `unresolved_count: row.get(11)?`:** + +The existing `query_issues` SELECT has 12 columns (indices 0-11). The 5 new status columns +append as indices 12-16: + +```rust +status_name: row.get(12)?, +status_category: row.get(13)?, +status_color: row.get(14)?, +status_icon_name: row.get(15)?, +status_synced_at: row.get(16)?, +``` + +**Modify `ListFilters` — add status filter:** +```rust +pub statuses: &'a [String], +``` + +**Modify `query_issues` WHERE clause builder — add after `has_due_date` block:** +```rust +if !filters.statuses.is_empty() { + if filters.statuses.len() == 1 { + where_clauses.push("i.status_name = ? COLLATE NOCASE"); + params.push(Box::new(filters.statuses[0].clone())); + } else { + // Build IN clause: "i.status_name IN (?, ?, ?) COLLATE NOCASE" + let placeholders = vec!["?"; filters.statuses.len()].join(", "); + where_clauses.push(format!("i.status_name IN ({placeholders}) COLLATE NOCASE")); + for s in filters.statuses { + params.push(Box::new(s.clone())); + } + } +} +``` + +**Modify `print_list_issues` table — add "Status" column header between "State" and "Assignee":** + +Current column order: IID, Title, State, Assignee, Labels, Disc, Updated +New column order: IID, Title, State, **Status**, Assignee, Labels, Disc, Updated + +```rust +Cell::new("Status").add_attribute(Attribute::Bold), +``` + +**Modify `print_list_issues` row — add status cell after state cell (before assignee cell):** +```rust +let status_cell = match &issue.status_name { + Some(status) => colored_cell_hex(status, issue.status_color.as_deref()), + None => Cell::new(""), +}; +``` + +**New helper function for hex → `comfy_table::Color`:** +```rust +fn colored_cell_hex(content: impl std::fmt::Display, hex: Option<&str>) -> Cell { + let Some(hex) = hex else { + return Cell::new(content); + }; + if !console::colors_enabled() { + return Cell::new(content); + } + let hex = hex.trim_start_matches('#'); + // NOTE: clippy::collapsible_if — must combine conditions + if hex.len() == 6 + && let (Ok(r), Ok(g), Ok(b)) = ( + u8::from_str_radix(&hex[0..2], 16), + u8::from_str_radix(&hex[2..4], 16), + u8::from_str_radix(&hex[4..6], 16), + ) + { + return Cell::new(content).fg(Color::Rgb { r, g, b }); + } + Cell::new(content) +} +``` + +### File 9: `src/cli/mod.rs` (MODIFY) + +**In `IssuesArgs` struct, add `--status` flag (after `--milestone` or similar filter flags):** +```rust +/// Filter by work item status name (e.g., "In progress"). Repeatable for OR semantics. +#[arg(long, help_heading = "Filters")] +pub status: Vec, +``` + +### File 10: `src/main.rs` (MODIFY) + +**In `handle_issues` function (~line 695), add `statuses` to `ListFilters` construction:** +```rust +let filters = ListFilters { + // ... existing fields ... + statuses: &args.status, +}; +``` + +**In legacy `List` command handler (~line 2421), also add `statuses: &[]` to `ListFilters`:** +```rust +let filters = ListFilters { + // ... existing fields ... + statuses: &[], // legacy command has no --status flag +}; +``` + +### File 11: `src/cli/autocorrect.rs` (MODIFY) + +**In `COMMAND_FLAGS` array (~line 52), add `"--status"` to the `"issues"` entry:** +```rust +( + "issues", + &[ + // ... existing flags ... + "--has-due", + "--no-has-due", + "--status", // <-- ADD THIS + "--sort", + // ... + ], +), +``` + +The `registry_covers_command_flags` test validates all clap flags are registered here — it will +fail if `--status` is missing. + +### File 12: `src/cli/commands/ingest.rs` (MODIFY) + +**In the `ProgressEvent` match within the progress callback, add new arms:** +```rust +ProgressEvent::StatusEnrichmentComplete { enriched, cleared } => { + // handle progress display +} +ProgressEvent::StatusEnrichmentSkipped => { + // handle skipped display (config toggle off) +} +``` + +The existing match is exhaustive — adding new variants to the enum without adding +these arms will cause a compile error. + +### File 13: `src/gitlab/client.rs` (MODIFY) + +**Add `GraphqlClient` factory — keeps token encapsulated (no raw accessor):** +```rust +/// Create a GraphQL client using the same base URL and token as this REST client. +/// The token is not exposed — only a ready-to-use GraphQL client is returned. +pub fn graphql_client(&self) -> crate::gitlab::graphql::GraphqlClient { + crate::gitlab::graphql::GraphqlClient::new(&self.base_url, &self.token) +} +``` + +### File 14: `migrations/021_work_item_status.sql` (NEW) + +```sql +ALTER TABLE issues ADD COLUMN status_name TEXT; +ALTER TABLE issues ADD COLUMN status_category TEXT; +ALTER TABLE issues ADD COLUMN status_color TEXT; +ALTER TABLE issues ADD COLUMN status_icon_name TEXT; +ALTER TABLE issues ADD COLUMN status_synced_at INTEGER; + +CREATE INDEX IF NOT EXISTS idx_issues_project_status_name + ON issues(project_id, status_name); +``` + +--- + +## Migration Numbering + +Current state: migrations 001-020 exist on disk and in `MIGRATIONS` array. +This feature uses **migration 021**. + +## Files Changed (Summary) + +| File | Change | Lines (est) | +|---|---|---| +| `migrations/021_work_item_status.sql` | **NEW** — 5 ALTER TABLE ADD COLUMN + 1 index | 8 | +| `src/gitlab/graphql.rs` | **NEW** — GraphQL client (with `GraphqlQueryResult` struct, partial-error metadata, HTTP-date Retry-After) + status fetcher + `FetchStatusResult` (HashSet, partial-error counters) + `UnsupportedReason` enum + adaptive page sizing + pagination guard + `ansi256_from_rgb` + `__typename` matching | ~380 | +| `src/gitlab/mod.rs` | Add `pub mod graphql;` + re-exports | +3 | +| `src/gitlab/types.rs` | Add `WorkItemStatus` struct (no enum — category is `Option`) | +15 | +| `src/gitlab/client.rs` | Add `pub fn graphql_client()` factory | +5 | +| `src/core/db.rs` | Add migration 021 to `MIGRATIONS` array | +5 | +| `src/core/config.rs` | Add `fetch_work_item_status` to `SyncConfig` + Default | +4 | +| `src/ingestion/orchestrator.rs` | Enrichment step (non-fatal path lookup) + `enrich_issue_statuses_txn()` (with `now_ms` param, returns enriched+cleared) + coverage telemetry (`seen`, `without_widget`, `partial_error_count`, `first_partial_error`) + `status_enrichment_error` + `status_enrichment_mode` + `status_unsupported_reason` on result + progress fields | +125 | +| `src/cli/commands/show.rs` | 5 fields on `IssueRow`/`IssueDetail`/`IssueDetailJson`, display, hex color helper | +45 | +| `src/cli/commands/list.rs` | 5 fields on row types, SQL, column, repeatable `--status` filter with IN clause, hex color helper | +50 | +| `src/cli/commands/ingest.rs` | Add `StatusEnrichmentComplete` + `StatusEnrichmentSkipped` match arms to progress callback | +6 | +| `src/cli/mod.rs` | `--status` flag on `IssuesArgs` (`Vec` for repeatable) | +3 | +| `src/cli/autocorrect.rs` | Add `"--status"` to `COMMAND_FLAGS` issues entry | +1 | +| `src/main.rs` | Wire `statuses` into both `ListFilters` constructions | +2 | + +**Total: ~650 lines new/modified across 14 files (2 new, 12 modified)** + +## TDD Plan + +### RED Phase — Tests to Write First + +1. **`test_graphql_query_success`** — mock HTTP server returns `{"data":{"foo":"bar"}}` → `Ok(GraphqlQueryResult { data: json!({"foo":"bar"}), had_partial_errors: false, first_partial_error: None })` +2. **`test_graphql_query_with_errors_no_data`** — mock returns `{"errors":[{"message":"bad"}]}` (no `data` field) → `Err(LoreError::Other("GraphQL error: bad"))` +3. **`test_graphql_auth_uses_bearer`** — mock asserts request has `Authorization: Bearer tok123` header +4. **`test_graphql_401_maps_to_auth_failed`** — mock returns 401 → `Err(LoreError::GitLabAuthFailed)` +5. **`test_graphql_403_maps_to_auth_failed`** — mock returns 403 → `Err(LoreError::GitLabAuthFailed)` +6. **`test_graphql_404_maps_to_not_found`** — mock returns 404 → `Err(LoreError::GitLabNotFound)` +7. **`test_work_item_status_deserialize`** — parse `{"name":"In progress","category":"IN_PROGRESS","color":"#1f75cb","iconName":"status-in-progress"}` → category is `Some("IN_PROGRESS")` +8. **`test_work_item_status_optional_fields`** — parse `{"name":"To do"}` → category/color/icon_name are None +9. **`test_work_item_status_unknown_category`** — parse `{"name":"Custom","category":"SOME_FUTURE_VALUE"}` → category is `Some("SOME_FUTURE_VALUE")` (no deserialization error) +10. **`test_work_item_status_null_category`** — parse `{"name":"In progress","category":null}` → category is None +11. **`test_fetch_statuses_pagination`** — mock returns 2 pages → all statuses in map, all_fetched_iids includes all IIDs +12. **`test_fetch_statuses_no_status_widget`** — mock returns widgets without StatusWidget → empty statuses map, all_fetched_iids still populated +13. **`test_fetch_statuses_404_graceful`** — mock returns 404 → `Ok(FetchStatusResult)` with empty maps, `unsupported_reason == Some(GraphqlEndpointMissing)` +14. **`test_fetch_statuses_403_graceful`** — mock returns 403 → `Ok(FetchStatusResult)` with empty maps, `unsupported_reason == Some(AuthForbidden)` +15. **`test_migration_021_adds_columns`** — in-memory DB: `PRAGMA table_info(issues)` includes 5 new columns (including `status_synced_at`) +16. **`test_migration_021_adds_index`** — in-memory DB: `PRAGMA index_list(issues)` includes `idx_issues_project_status_name` +17. **`test_enrich_issue_statuses_txn`** — insert issue, call `enrich_issue_statuses_txn()`, verify 4 status columns populated + `status_synced_at` is set to `now_ms` +18. **`test_enrich_skips_unknown_iids`** — status map has IID not in DB → no error, returns 0 +19. **`test_enrich_clears_removed_status`** — issue previously had status, now in `all_fetched_iids` but not in `statuses` → status fields NULLed out, `status_synced_at` updated to `now_ms` (not NULLed — confirms we checked this row) +20. **`test_enrich_transaction_rolls_back_on_failure`** — simulate failure mid-enrichment → no partial updates, prior status values intact +21. **`test_list_filter_by_status`** — insert 2 issues with different statuses, filter returns correct one +22. **`test_list_filter_by_status_case_insensitive`** — `--status "in progress"` matches `"In progress"` via COLLATE NOCASE +23. **`test_config_fetch_work_item_status_default_true`** — `SyncConfig::default().fetch_work_item_status == true` +24. **`test_config_deserialize_without_key`** — JSON without `fetchWorkItemStatus` → defaults to `true` +25. **`test_ansi256_from_rgb`** — known conversions: `(0,0,0)→16`, `(255,255,255)→231`, `(31,117,203)→~68` +26. **`test_enrich_idempotent_across_two_runs`** — run `enrich_issue_statuses_txn()` twice with same data → columns unchanged, `enriched` count same both times +27. **`test_typename_matching_ignores_non_status_widgets`** — widgets array with `__typename: "WorkItemWidgetDescription"` → not parsed as status, no error +28. **`test_retry_after_http_date_format`** — mock returns 429 with `Retry-After: Wed, 11 Feb 2026 01:00:00 GMT` → parses to delta seconds from now +29. **`test_retry_after_invalid_falls_back_to_60`** — mock returns 429 with `Retry-After: garbage` → falls back to 60 +30. **`test_enrich_sets_synced_at_on_clear`** — issue with status cleared → `status_synced_at` is `now_ms` (not NULL) +31. **`test_enrichment_error_captured_in_result`** — simulate GraphQL error → `IngestProjectResult.status_enrichment_error` contains error message string +32. **`test_robot_sync_includes_status_enrichment`** — robot sync output JSON includes per-project `status_enrichment` object with `mode`, `reason`, `seen`, `enriched`, `cleared`, `without_widget`, `partial_errors`, `first_partial_error`, `error` fields +33. **`test_graphql_partial_data_with_errors_returns_data`** — mock returns `{"data":{"foo":"bar"},"errors":[{"message":"partial failure"}]}` → `Ok(GraphqlQueryResult { data: json!({"foo":"bar"}), had_partial_errors: true, first_partial_error: Some("partial failure") })` +34. **`test_fetch_statuses_cursor_stall_aborts`** — mock returns `hasNextPage: true` with same `endCursor` on consecutive pages → pagination aborts with warning, returns partial result collected so far +35. **`test_fetch_statuses_unsupported_reason_none_on_success`** — successful fetch → `unsupported_reason` is `None` +36. **`test_fetch_statuses_complexity_error_reduces_page_size`** — mock returns complexity error on first page with `first=100` → retries with `first=50`, succeeds → result contains all statuses from the successful page +37. **`test_fetch_statuses_timeout_error_reduces_page_size`** — mock returns timeout error on first page with `first=100` → retries with smaller page size, succeeds +38. **`test_fetch_statuses_smallest_page_still_fails`** — mock returns complexity error at all page sizes (100→50→25→10) → returns Err +39. **`test_fetch_statuses_page_size_resets_after_success`** — first page succeeds at 100, second page fails at 100 → falls back to 50 for page 2, page 3 retries at 100 (reset after success) +40. **`test_list_filter_by_multiple_statuses`** — `--status "In progress" --status "To do"` → returns issues matching either status +41. **`test_project_path_missing_skips_enrichment`** — project with no `path_with_namespace` in DB → enrichment skipped, `status_enrichment_error == "project_path_missing"`, sync continues +42. **`test_fetch_statuses_partial_errors_tracked`** — mock returns partial-data response on one page → `partial_error_count == 1`, `first_partial_error` populated + +### GREEN Phase — Build Order + +**Batch 1: Types + Migration** (Files 3, 4, 14 → Tests 7-10, 15-16, 23-24) +- Create `migrations/021_work_item_status.sql` (5 columns + index) +- Add `WorkItemStatus` struct to `types.rs` (no enum — category is `Option`) +- Register migration 021 in `db.rs` +- Add `fetch_work_item_status` to `config.rs` +- Run: `cargo test test_work_item_status test_migration_021 test_config` + +**Batch 2: GraphQL Client** (Files 1, 2, 13 → Tests 1-6, 11-14, 25, 27-29, 33-39, 42) +- Create `src/gitlab/graphql.rs` with full client (`GraphqlQueryResult` struct, partial-error metadata, HTTP-date Retry-After) + fetcher + `FetchStatusResult` (with `partial_error_count`/`first_partial_error`) + `UnsupportedReason` + adaptive page sizing + pagination guard + `ansi256_from_rgb` +- Add `pub mod graphql;` to `gitlab/mod.rs` +- Add `pub fn graphql_client()` factory to `client.rs` +- Add `httpdate` crate to `Cargo.toml` dependencies (for Retry-After HTTP-date parsing) +- Run: `cargo test graphql` + +**Batch 3: Orchestrator** (Files 6, 12 → Tests 17-20, 26, 30-32, 41) +- Add enrichment phase (with non-fatal path lookup) + `enrich_issue_statuses_txn()` (with `now_ms` param) to orchestrator +- Add coverage telemetry fields (`statuses_seen`, `statuses_without_widget`, `partial_error_count`, `first_partial_error`) to `IngestProjectResult` +- Add `status_enrichment_error: Option`, `status_enrichment_mode: String`, `status_unsupported_reason: Option` to `IngestProjectResult` +- Add `StatusEnrichmentComplete` + `StatusEnrichmentSkipped` to `ProgressEvent` enum +- Add match arms in `ingest.rs` progress callback +- Run: `cargo test orchestrator` + +**Batch 4: CLI Display + Filter** (Files 7-11 → Tests 21-22, 40) +- Add status fields (including `status_synced_at`) to `show.rs` structs, SQL, display +- Add status fields to `list.rs` structs, SQL, column, repeatable `--status` filter (with IN clause + COLLATE NOCASE) +- Add `--status` flag to `cli/mod.rs` as `Vec` (repeatable) +- Add `"--status"` to autocorrect registry +- Wire `statuses` in both `ListFilters` constructions in `main.rs` +- Wire `status_enrichment` (with `mode`/`reason`/`seen`/`without_widget`/`partial_errors`/`first_partial_error` fields) into robot sync output envelope +- Run: `cargo test` (full suite) + +**Batch 5: Quality Gates** (AC-11) +- `cargo check --all-targets` +- `cargo clippy --all-targets -- -D warnings` +- `cargo fmt --check` +- `cargo test` (all green) + +**Key gotcha per batch:** +- Batch 1: Migration has 5 columns now (including `status_synced_at INTEGER`), not 4 — test assertion must check for all 5 +- Batch 2: Use `r##"..."##` in tests containing `"#1f75cb"` hex colors; `FetchStatusResult` is not `Clone` — tests must check fields individually; `__typename` test mock data must include the field in widget JSON objects; `httpdate` crate needed for Retry-After HTTP-date parsing — add to `Cargo.toml`; pagination guard test needs mock that returns same `endCursor` twice; partial-data test needs mock that returns both `data` and `errors` fields; adaptive page size tests need mock that inspects `$first` variable in request body to return different responses per page size; `GraphqlQueryResult` (not raw `Value`) is now the return type — test assertions must destructure it +- Batch 3: Progress callback in `ingest.rs` must be updated in same batch as enum change (2 new arms: `StatusEnrichmentComplete` + `StatusEnrichmentSkipped`); `unchecked_transaction()` needed because `conn` is `&Connection` not `&mut Connection`; `enrich_issue_statuses_txn` takes 5 params now (added `now_ms: i64`) and returns `(usize, usize)` tuple — destructure at call site; `status_enrichment_error` must be populated on the `Err` branch; `status_enrichment_mode` and `status_unsupported_reason` must be set in all code paths; project path lookup uses `.optional()?` — requires `use rusqlite::OptionalExtension;` import +- Batch 4: Autocorrect registry must be updated in same batch as clap flag addition; `COLLATE NOCASE` applies to the comparison, not the column definition; `status_synced_at` is `Option` in Rust structs (maps to nullable INTEGER in SQLite); `--status` is `Vec` not `Option` — `ListFilters.statuses` is `&[String]`; multi-value filter uses dynamic IN clause with placeholder generation + +## Edge Cases + +- **GitLab Free tier**: Status widget absent → columns stay NULL, no error +- **GitLab < 17.11**: No status widget at all → same as Free tier +- **GitLab 17.11-18.0**: Status widget present but `category` field missing → store name only +- **Custom statuses (18.5+)**: Names and categories won't match system defaults → stored as raw strings, no deserialization failures +- **Token with `read_api` scope**: Should work for GraphQL queries (read-only) +- **Self-hosted with GraphQL disabled**: 404 on endpoint → graceful skip, `unsupported_reason: GraphqlEndpointMissing` +- **HTTP 403 (Forbidden)**: Treated as auth failure, graceful skip with warning log, `unsupported_reason: AuthForbidden` +- **Rate limiting**: Respect `Retry-After` header if present +- **Large projects (10k+ issues)**: Pagination handles this — adaptive page sizing starts at 100, falls back to 50→25→10 on complexity/timeout errors +- **Self-hosted with strict complexity limits**: Adaptive page sizing automatically reduces `first` parameter to avoid complexity budget exhaustion +- **Status-state sync**: Closed via REST → status might be "Done" — expected and correct +- **Concurrent syncs**: Status enrichment is idempotent — UPDATE is safe to run multiple times +- **Status removed in GitLab**: Issue was fetched but has no status widget → status fields NULLed out (staleness clearing) +- **Enrichment failure mid-project**: Transaction rolls back — no partial updates, prior status values intact. Error message captured in `IngestProjectResult.status_enrichment_error` and surfaced in robot sync output. +- **Project path missing from DB**: Enrichment skipped with structured error (`project_path_missing`), does NOT fail the overall project pipeline — sync continues to discussion phase +- **Retry-After with HTTP-date format**: Parsed to delta-seconds from now via `httpdate` crate. Invalid format falls back to 60s. +- **NULL hex color**: `style_with_hex` and `colored_cell_hex` fall back to unstyled text +- **Invalid hex color**: Malformed color string → fall back to unstyled text +- **Empty project**: `fetch_issue_statuses` returns empty result → no UPDATEs, no error +- **Case-insensitive filter**: `--status "in progress"` matches `"In progress"` via COLLATE NOCASE +- **Multiple status filter**: `--status "To do" --status "In progress"` → OR semantics within status, AND with other filters +- **GraphQL partial-data response**: Response has both `data` and `errors` → data is used, `partial_error_count` incremented, `first_partial_error` captured — agents can detect incomplete data via robot sync output +- **GraphQL cursor stall**: `hasNextPage=true` but `endCursor` is `None` or unchanged → pagination aborted with warning, partial result returned (prevents infinite loops from GraphQL cursor bugs) + +## Decisions + +1. **Store color + icon_name** — YES. Used for colored CLI output in human view. +2. **Run on every sync** — always enrich, not just `--full`. This is vital data. +3. **Include `--status` filter** — YES, in v1. `lore list issues --status "In progress"` +4. **Factory over raw token** — YES. `client.graphql_client()` keeps token encapsulated. +5. **Transactional enrichment** — YES. All-or-nothing per project prevents partial/stale state. +6. **Case-insensitive `--status` filter** — YES. Better UX, especially for custom status names. ASCII `COLLATE NOCASE` is sufficient — all system statuses are ASCII, and custom names are overwhelmingly ASCII too. +7. **Flat fields over nested JSON object** — YES. Consistent with existing `labels`, `milestone_title` pattern. Works with `--fields` selection. Nesting would break `--fields` syntax and require special dot-path resolution. +8. **No retry/backoff in v1** — DEFER. REST client doesn't have retry either. This is a cross-cutting concern that should be built once as a shared transport layer (`transport.rs`) for both REST and GraphQL, not bolted onto GraphQL alone. Adding retry only for GraphQL creates split behavior and maintenance burden. Note: adaptive page sizing (Decision 18) handles the specific case of complexity/timeout errors without needing general retry. +9. **No capability probe/cache in v1** — DEFER. Graceful degradation (empty map on 404/403) is sufficient. The warning is once per project per sync — acceptable noise. Cache table adds migration complexity and a new DB schema concept for marginal benefit. +10. **`status_synced_at` column** — YES. Lightweight enrichment freshness timestamp (ms epoch UTC). Enables consumers to detect stale data and supports future delta-driven enrichment. Written on both status-set and status-clear operations to distinguish "never enriched" (NULL) from "enriched but no status" (timestamp set, status NULL). +11. **Enrichment error in robot output** — YES. `status_enrichment_error: Option` on `IngestProjectResult` + per-project `status_enrichment` object in robot sync JSON. Agents need machine-readable signal when enrichment fails — silent warnings in logs are invisible to automation. +12. **No `status_name_fold` shadow column** — REJECT. `COLLATE NOCASE` handles ASCII case-folding which covers all system statuses. A fold column doubles write cost for negligible benefit. +13. **No `--status-category` / `--no-status` in v1** — DEFER. Keep v1 focused on core `--status` filter. These are easy additions once usage patterns emerge. +14. **No strict mode** — DEFER. A `status_enrichment_strict` config toggle adds config bloat for an edge case. The `status_enrichment_error` field gives agents the signal they need to implement their own strict behavior. +15. **Explicit outcome mode in robot output** — YES. `status_enrichment.mode` distinguishes `"fetched"` / `"unsupported"` / `"skipped"` with optional `reason` field. Resolves the ambiguity where agents couldn't tell "project has no statuses" from "feature unavailable" — both previously looked like an empty success. Cheap to implement since `FetchStatusResult` already has `unsupported_reason`. +16. **GraphQL partial-data tolerance with end-to-end metadata** — YES. When GraphQL returns both `data` and `errors`, use the data and propagate error metadata (`had_partial_errors`/`first_partial_error`) through `GraphqlQueryResult` → `FetchStatusResult` → `IngestProjectResult` → robot sync output. Agents get machine-readable signal that status data may be incomplete, rather than silent log-only warnings. +17. **Pagination guard against cursor stall** — YES. If `hasNextPage=true` but `endCursor` is `None` or unchanged, abort the loop and return partial results. This is a zero-cost safety valve against infinite loops from GraphQL cursor bugs. The alternative (trusting the server unconditionally) risks hanging the sync indefinitely. +18. **Adaptive page sizing** — YES. Start with `first=100`, fall back to 50→25→10 on GraphQL complexity/timeout errors. This is NOT general retry/backoff (Decision 8) — it specifically handles self-hosted GitLab instances with stricter complexity/time limits by reducing the page size that caused the problem. After a successful page, resets to 100 (the complexity issue may be cursor-position-specific). Zero operational cost — adds ~15 lines and 4 tests. +19. **Repeatable `--status` filter** — YES. `--status "To do" --status "In progress"` with OR semantics. Practical for "show me active work" queries. Clap supports `Vec` natively, and dynamic IN clause generation is straightforward. Single-value case uses `=` for simplicity. +20. **Non-fatal project path lookup** — YES. Uses `.optional()?` instead of `?` for the `path_with_namespace` DB query. If the project path is missing, enrichment is skipped with `status_enrichment_error: "project_path_missing"` and the sync continues to discussion phase. Enrichment is optional — it should never take down the entire project pipeline. +21. **Coverage telemetry counters** — YES. `seen`, `enriched`, `cleared`, `without_widget` in `IngestProjectResult` and robot sync output. `enriched`/`cleared` alone cannot distinguish "project has 0 issues" from "project has 500 issues with 0 statuses." Coverage counters cost one `len()` call and one subtraction — negligible. + +## Future Enhancements (Not in Scope) + +These ideas surfaced during planning (iterations 3-6, cross-model reviews) but are out of scope for this implementation. File as separate beads if/when needed. + +### Filters & Querying +- **`--status-category` filter**: Filter by category (`in_progress`, `done`, etc.) for automation — more stable than name-based filtering for custom lifecycles. Consider adding a `COLLATE NOCASE` index on `(project_id, status_category)` when this lands. +- **`--no-status` filter**: Return only issues where `status_name IS NULL` — useful for migration visibility and data quality audits. +- **`--stale-status-days N` filter**: Filter issues where status hasn't changed in N days — requires `status_changed_at` column (see below). + +### Enrichment Optimization +- **Delta-driven enrichment**: Skip GraphQL fetch when issue ingestion reports zero changes for a project (optimization for large repos). Use `status_synced_at` (already in v1 schema) to determine last enrichment time. Add `status_full_reconcile_hours` config (default 24) to force periodic full sweep as safety net. +- **`--refresh-status` override flag**: Force enrichment even with zero issue deltas. +- **Capability probe/cache**: Detect status-widget support per project, cache with TTL in a `project_capabilities` table to avoid repeated pointless GraphQL calls on Free tier. Re-probe on TTL expiry (default 24h). The `status_synced_at` column provides a foundation — if a project's issues all have NULL `status_synced_at`, it's likely unsupported. + +### Schema Extensions +- **`status_changed_at` column**: Track when status *value* last changed (ms epoch UTC) — enables "how long has this been in progress?" queries, `--stale-status-days N` filter, and `status_age_days` field in robot output. Requires change-detection logic during enrichment (compare old vs new before UPDATE). Note: `status_synced_at` (in v1) records when enrichment last *ran* on the row, not when the status value changed — these serve different purposes. +- **`status_id` column**: Store GitLab's global node ID (e.g., `gid://gitlab/WorkItems::Statuses::Status/1`) for rename-resistant identity. Useful if automations need to track a specific status across renames. Deferred because the global node ID is opaque, instance-specific, and not useful for cross-instance queries or human consumption — `status_name` is the primary user-facing identifier. + +### Infrastructure +- **Shared transport with retry/backoff**: Retry 429/502/503/504/network errors with exponential backoff + jitter (max 3 attempts) — should be a cross-cutting concern via shared `transport.rs` layer for both REST and GraphQL clients. Must respect cancellation signal between retry sleeps. +- **Set-based bulk enrichment**: For very large projects (10k+ issues), replace row-by-row UPDATE loop with temp-table + set-based SQL for faster write-lock release. Profile first to determine if this is actually needed — SQLite `prepare_cached` statements may be sufficient. + +### Operational +- **Strict mode**: `status_enrichment_strict: bool` config toggle that makes enrichment failure fail the sync. Agents can already implement equivalent behavior by checking `status_enrichment_error` in the robot sync output (added in v1). +- **`partial_failures` array in robot sync envelope**: Aggregate all enrichment errors into a top-level array for easy agent consumption. Currently per-project `status_enrichment.error` provides the data — this would be a convenience aggregation. +- **Status/state consistency checks**: Detect status-state mismatches (e.g., `DONE` category with `state=open`, or `IN_PROGRESS` category with `state=closed`) and log warnings. Useful for data integrity diagnostics, but status-state sync is GitLab's responsibility and temporary mismatches during sync windows are expected — not appropriate for v1 enrichment logic. + +### Cancellation +- **Pass cancellation signal into fetcher**: Thread the existing `CancellationSignal` into `fetch_issue_statuses()` to check between page requests. Currently the orchestrator checks cancellation before enrichment starts, and individual page fetches complete in <1s, so the worst-case delay before honoring cancellation is ~1s per page (negligible). Worth adding if enrichment grows to include multiple fetch operations per project. + +--- + +## Battle-Test Results (Iteration 2, pre-revision) + +> This plan was fully implemented in a trial run. All 435+ tests passed, clippy clean, fmt clean. +> The code was then reverted so the plan could be reviewed before real implementation. +> The corrections below are already reflected in the code snippets above. +> +> **Note (Iteration 3):** The plan was revised based on external review feedback. Key changes: +> transactional enrichment with staleness clearing, `graphql_client()` factory pattern, +> 403 handling, compound index, case-insensitive filter, 5 additional tests, and dropped +> `WorkItemStatusCategory` enum in favor of `Option` (custom statuses on 18.5+ can +> have arbitrary category values). The trial implementation predates these revisions — the +> next trial should validate the new behavior. +> +> **Note (Iteration 4):** Revised based on cross-model review (ChatGPT). Key changes: +> `__typename`-based deterministic widget matching (replaces heuristic try-deserialize), +> `all_fetched_iids` changed from `Vec` to `HashSet` for O(1) staleness lookups, +> `statuses_cleared` counter added for enrichment observability, `StatusEnrichmentSkipped` +> progress event for config-off case, idempotency test added, expanded future enhancements +> with deferred ideas (status_changed_at, capability cache, shared transport, set-based bulk). +> Rejected scope-expanding suggestions: shared transport layer, capability cache table, +> delta-driven enrichment, NOCASE index changes, `--status-category`/`--no-status` filters, +> `status_changed_at` column — all deferred to future enhancements with improved descriptions. +> +> **Note (Iteration 5):** Revised based on second cross-model review (ChatGPT feedback-5). +> **Accepted:** (1) `status_synced_at INTEGER` column — lightweight freshness timestamp for +> enrichment observability and future delta-driven optimization, written on both set and clear +> operations; (2) `status_enrichment_error: Option` on `IngestProjectResult` — captures +> error message for machine-readable failure reporting; (3) AC-10 for robot sync envelope with +> per-project `status_enrichment` object; (4) `Retry-After` HTTP-date format support via +> `httpdate` crate; (5) 5 new tests (HTTP-date, clear synced_at, error capture, sync envelope). +> **Rejected with rationale:** retry/backoff (cross-cutting, not GraphQL-only), capability cache +> (migration complexity for marginal benefit), delta-driven enrichment (profile first), +> set-based SQL (premature optimization), strict mode (agents can use error field), Unicode +> case-folding column (ASCII COLLATE NOCASE sufficient), nested JSON status object (breaks +> --fields syntax), `--status-category`/`--no-status` filters (keep v1 focused). +> Expanded Decisions section from 9 to 14 with explicit rationale for each rejection. +> Reorganized Future Enhancements into categorized subsections. +> +> **Note (Iteration 6):** Revised based on third cross-model review (ChatGPT feedback-6). +> **Accepted:** (1) GraphQL partial-data tolerance — when response has both `data` and `errors`, +> return data and log warning instead of hard-failing (per GraphQL spec, partial results are valid); +> (2) Pagination guard — abort loop if `hasNextPage=true` but `endCursor` is `None` or unchanged +> (prevents infinite loops from GraphQL cursor bugs); (3) `UnsupportedReason` enum + +> `unsupported_reason` field on `FetchStatusResult` — distinguishes "no statuses found" from "feature +> unavailable" for robot sync output clarity; (4) Enhanced AC-10 with `mode`/`reason` fields for +> explicit enrichment outcome reporting. Added 3 new tests (partial-data, cursor stall, unsupported +> reason on success). Updated Decisions 15-17. Added `status_id` to Future Enhancements. +> **Rejected with rationale:** see Rejected Recommendations section below. +> +> **Note (Iteration 7):** Revised based on fourth cross-model review (ChatGPT feedback-7). +> **Accepted:** (1) Partial-error metadata end-to-end — `GraphqlQueryResult` struct replaces raw +> `Value` return, propagating `had_partial_errors`/`first_partial_error` through `FetchStatusResult` +> → `IngestProjectResult` → robot sync output (agents get machine-readable incomplete-data signal); +> (2) Adaptive page sizing — `first=100→50→25→10` fallback on complexity/timeout errors, handles +> self-hosted instances with stricter limits without needing general retry/backoff; (3) Non-fatal +> project path lookup — `.optional()?` prevents enrichment from crashing the project pipeline; +> (4) Repeatable `--status` filter — `Vec` with OR semantics via IN clause; (5) Coverage +> telemetry — `seen`, `without_widget`, `partial_error_count`, `first_partial_error` counters on +> `IngestProjectResult` for full observability. Added 7 new tests (adaptive page size ×4, multi-status, +> path missing, partial errors tracked). Updated Decisions 16, 18-21. +> **Rejected with rationale:** centralized color parsing module (see Rejected Recommendations). + +### Corrections Made During Trial + +| Issue | Root Cause | Fix Applied | +|---|---|---| +| `LoreError::GitLabRateLimited { retry_after }` uses `u64`, not `Option` | Plan assumed `Option` for retry_after | Added `.unwrap_or(60)` after parsing Retry-After header | +| Raw string `r#"..."#` breaks on `"#1f75cb"` in test JSON | The `"#` sequence matches the raw string terminator | Use `r##"..."##` double-hash delimiters in test code | +| Clippy: collapsible if statements (5 instances) | `if x { if y { ... } }` → `if x && y { ... }` | Used `if ... && let ...` chain syntax (Rust 2024 edition) | +| Clippy: items after test module | `ansi256_from_rgb()` was placed after `#[cfg(test)]` | Moved function before the test module | +| Clippy: manual range contains | `assert!(blue >= 16 && blue <= 231)` | Changed to `assert!((16..=231).contains(&blue))` | +| Missing token accessor on `GitLabClient` | Orchestrator needs token to create `GraphqlClient` | Added `pub fn token(&self) -> &str` to `client.rs` | +| Missing `path_with_namespace` in orchestrator | Only had `project_id`, needed project path for GraphQL | Added DB query to look up path from projects table | +| Exhaustive match on `ProgressEvent` | Adding enum variant breaks `ingest.rs` match | Added `StatusEnrichmentComplete` arm to progress callback | +| Legacy `List` command missing `status` field | Two `ListFilters` constructions in `main.rs` | Added `status: None` to legacy command's `ListFilters` | +| Autocorrect registry test failure | All clap flags must be registered in `COMMAND_FLAGS` | Added `"--status"` to the `"issues"` entry in `autocorrect.rs` | + +### Files Not in Original Plan (Discovered During Trial) + +- `migrations/021_work_item_status.sql` — separate file, not inline in db.rs (uses `include_str!`) +- `src/gitlab/client.rs` — needs `graphql_client()` factory for GraphQL client creation (revised from `token()` accessor) +- `src/cli/commands/ingest.rs` — progress callback match must be updated +- `src/cli/autocorrect.rs` — flag registry test requires all new flags registered +- `src/main.rs` — legacy `List` command path also constructs `ListFilters` + +### Test Results from Trial (all green, pre-revision) + +- 435 lib tests + 136 integration/benchmark tests = 571 total +- 11 new GraphQL tests (client, status fetcher, auth, pagination, graceful degradation) +- 5 new orchestrator tests (migration, enrich, skip unknown IIDs, progress event, result default) +- 3 new config tests (default true, deserialize false, omitted defaults true) +- All existing tests remained green + +**Post-revision additions (5 new tests, iteration 3):** `test_graphql_403_maps_to_auth_failed`, +`test_migration_021_adds_index`, `test_enrich_clears_removed_status`, +`test_enrich_transaction_rolls_back_on_failure`, `test_list_filter_by_status_case_insensitive` + +**Post-revision additions (2 new tests, iteration 4):** `test_enrich_idempotent_across_two_runs`, +`test_typename_matching_ignores_non_status_widgets` + +**Post-revision additions (5 new tests, iteration 5):** `test_retry_after_http_date_format`, +`test_retry_after_invalid_falls_back_to_60`, `test_enrich_sets_synced_at_on_clear`, +`test_enrichment_error_captured_in_result`, `test_robot_sync_includes_status_enrichment` + +**Post-revision additions (3 new tests, iteration 6):** `test_graphql_partial_data_with_errors_returns_data`, +`test_fetch_statuses_cursor_stall_aborts`, `test_fetch_statuses_unsupported_reason_none_on_success` + +**Post-revision additions (7 new tests, iteration 7):** `test_fetch_statuses_complexity_error_reduces_page_size`, +`test_fetch_statuses_timeout_error_reduces_page_size`, `test_fetch_statuses_smallest_page_still_fails`, +`test_fetch_statuses_page_size_resets_after_success`, `test_list_filter_by_multiple_statuses`, +`test_project_path_missing_skips_enrichment`, `test_fetch_statuses_partial_errors_tracked` + +--- + +## Rejected Recommendations + +Cumulative log of recommendations considered and rejected across iterations. Prevents future reviewers from re-proposing the same changes. + +### Iteration 4 (ChatGPT feedback-4) +- **Shared transport layer with retry/backoff** — rejected because this is a cross-cutting concern that should be built once for both REST and GraphQL, not bolted onto GraphQL alone. Adding retry only for GraphQL creates split behavior. Deferred to Future Enhancements. +- **Capability cache table** — rejected because migration complexity and new DB schema concept for marginal benefit. Graceful degradation (empty map on 404/403) is sufficient. +- **Delta-driven enrichment** — rejected because needs profiling first to determine if full re-fetch is actually a bottleneck. Premature optimization. +- **NOCASE index changes** — rejected because `COLLATE NOCASE` on the comparison is sufficient; adding it to the index definition provides no benefit for SQLite's query planner in this case. +- **`--status-category` / `--no-status` filters in v1** — rejected because keeping v1 focused. Easy additions once usage patterns emerge. +- **`status_changed_at` column in v1** — rejected because requires change-detection logic during enrichment (compare old vs new) which adds complexity. `status_synced_at` serves different purpose and is sufficient for v1. + +### Iteration 5 (ChatGPT feedback-5) +- **Retry/backoff (again)** — rejected for same reason as iteration 4. Cross-cutting concern. +- **Capability cache (again)** — rejected for same reason as iteration 4. +- **Delta-driven enrichment (again)** — rejected for same reason as iteration 4. +- **Set-based bulk SQL** — rejected because premature optimization. SQLite `prepare_cached` statements may be sufficient. Profile first. +- **Strict mode config toggle** — rejected because adds config bloat for edge case. Agents can implement equivalent behavior by checking `status_enrichment_error`. +- **Unicode case-folding shadow column (`status_name_fold`)** — rejected because `COLLATE NOCASE` handles ASCII case-folding which covers all system statuses. Doubles write cost for negligible benefit. +- **Nested JSON status object** — rejected because breaks `--fields` syntax and requires special dot-path resolution. Flat fields are consistent with existing patterns. +- **`--status-category` / `--no-status` filters (again)** — rejected for same reason as iteration 4. + +### Iteration 6 (ChatGPT feedback-6) +- **`FetchStatusOutcome` enum with `CancelledPartial` variant** — rejected because over-engineered for v1. The simpler approach of adding `unsupported_reason: Option` to `FetchStatusResult` provides the same observability signal (distinguishing "no statuses" from "feature unavailable") without introducing a 3-variant enum that forces match arms everywhere. The partial-from-cancellation case is not needed since the orchestrator checks cancellation before starting enrichment and individual page fetches complete in <1s. +- **Pass cancellation signal into `fetch_issue_statuses()`** — rejected because the orchestrator already checks cancellation before enrichment starts, and individual page fetches are <1s. Threading the signal through adds a parameter and complexity for negligible benefit. Deferred to Future Enhancements (Cancellation section) in case enrichment grows to include multiple fetch operations. +- **Persist `status_id` column (GitLab global node ID)** — rejected because GitLab's GraphQL `id` is a global node ID (e.g., `gid://gitlab/WorkItems::Statuses::Status/1`) that is opaque, instance-specific, and not useful for cross-instance queries or human consumption — the `name` is what users see and filter by. Added to Future Enhancements (Schema Extensions) as a deferred option for rename-resistant identity if needed. +- **Extract status enrichment into `src/ingestion/enrichment/status.rs` module** — rejected because the enrichment logic is ~60 lines (one orchestrator block + one helper function). Creating a new module, directory, and `mod.rs` for this is premature abstraction. The orchestrator is the right home until enrichment grows to justify extraction. +- **Status/state consistency checks (mismatch detection)** — rejected because status-state sync is GitLab's responsibility and temporary mismatches during sync windows are expected behavior. Adding mismatch detection to enrichment adds complexity for a diagnostic signal that would generate false positives. Deferred to Future Enhancements (Operational). +- **Performance envelope acceptance criterion (AC-12)** — rejected because "10k-issue fixture on CI baseline machine" is unmeasurable without CI infrastructure and test fixtures we don't have. Pagination handles large projects by design (100/page). Testing memory bounds requires test harness complexity far beyond the value for v1. + +### Iteration 7 (ChatGPT feedback-7) +- **Centralize color parsing into `src/cli/commands/color.rs` module** — rejected because the two color helpers (`style_with_hex` in show.rs, `colored_cell_hex` in list.rs) return different types (`console::StyledObject` vs `comfy_table::Cell`) for different rendering contexts. The shared hex-parsing logic is 4 lines. Creating a new module + file for this is premature abstraction per project rules ("only abstract when you have 3+ similar uses"). If a third color context emerges, extract then. diff --git a/plans/work-item-status-graphql.tdd-appendix.md b/plans/work-item-status-graphql.tdd-appendix.md new file mode 100644 index 0000000..f26e1af --- /dev/null +++ b/plans/work-item-status-graphql.tdd-appendix.md @@ -0,0 +1,2036 @@ +# Work Item Status — TDD Appendix + +> Pre-written tests for every acceptance criterion in +> `plans/work-item-status-graphql.md` (iteration 7). +> Replaces the skeleton TDD Plan section with compilable Rust test code. + +--- + +## Coverage Matrix + +| AC | Tests | File | +|----|-------|------| +| AC-1 GraphQL Client | T01-T06, T28-T29, T33, T43-T47 | `src/gitlab/graphql.rs` (inline mod) | +| AC-2 Status Types | T07-T10, T48 | `src/gitlab/types.rs` (inline mod) | +| AC-3 Status Fetcher | T11-T14, T27, T34-T39, T42, T49-T52 | `src/gitlab/graphql.rs` (inline mod) | +| AC-4 Migration 021 | T15-T16, T53-T54 | `tests/migration_tests.rs` | +| AC-5 Config Toggle | T23-T24 | `src/core/config.rs` (inline mod) | +| AC-6 Orchestrator | T17-T20, T26, T30-T31, T41, T55 | `tests/status_enrichment_tests.rs` | +| AC-7 Show Display | T56-T58 | `tests/status_display_tests.rs` | +| AC-8 List Display | T59-T60 | `tests/status_display_tests.rs` | +| AC-9 List Filter | T21-T22, T40, T61-T63 | `tests/status_filter_tests.rs` | +| AC-10 Robot Envelope | T32 | `tests/status_enrichment_tests.rs` | +| AC-11 Quality Gates | (cargo check/clippy/fmt/test) | CI only | +| Helpers | T25 | `src/gitlab/graphql.rs` (inline mod) | + +**Total: 63 tests** (42 original + 21 gap-fill additions marked with `NEW`) + +--- + +## File 1: `src/gitlab/types.rs` — inline `#[cfg(test)]` module + +Tests AC-2 (WorkItemStatus deserialization). + +```rust +#[cfg(test)] +mod tests { + use super::*; + + // ── T07: Full deserialization with all fields ──────────────────────── + #[test] + fn test_work_item_status_deserialize() { + let json = r##"{"name":"In progress","category":"IN_PROGRESS","color":"#1f75cb","iconName":"status-in-progress"}"##; + let status: WorkItemStatus = serde_json::from_str(json).unwrap(); + + assert_eq!(status.name, "In progress"); + assert_eq!(status.category.as_deref(), Some("IN_PROGRESS")); + assert_eq!(status.color.as_deref(), Some("#1f75cb")); + assert_eq!(status.icon_name.as_deref(), Some("status-in-progress")); + } + + // ── T08: Only required field present ───────────────────────────────── + #[test] + fn test_work_item_status_optional_fields() { + let json = r#"{"name":"To do"}"#; + let status: WorkItemStatus = serde_json::from_str(json).unwrap(); + + assert_eq!(status.name, "To do"); + assert!(status.category.is_none()); + assert!(status.color.is_none()); + assert!(status.icon_name.is_none()); + } + + // ── T09: Unknown category value (custom lifecycle on 18.5+) ────────── + #[test] + fn test_work_item_status_unknown_category() { + let json = r#"{"name":"Custom","category":"SOME_FUTURE_VALUE"}"#; + let status: WorkItemStatus = serde_json::from_str(json).unwrap(); + + assert_eq!(status.category.as_deref(), Some("SOME_FUTURE_VALUE")); + } + + // ── T10: Explicit null category ────────────────────────────────────── + #[test] + fn test_work_item_status_null_category() { + let json = r#"{"name":"In progress","category":null}"#; + let status: WorkItemStatus = serde_json::from_str(json).unwrap(); + + assert!(status.category.is_none()); + } + + // ── T48 (NEW): All five system statuses deserialize correctly ───────── + #[test] + fn test_work_item_status_all_system_statuses() { + let cases = [ + (r#"{"name":"To do","category":"TO_DO","color":"#737278"}"#, "TO_DO"), + (r#"{"name":"In progress","category":"IN_PROGRESS","color":"#1f75cb"}"#, "IN_PROGRESS"), + (r#"{"name":"Done","category":"DONE","color":"#108548"}"#, "DONE"), + (r#"{"name":"Won't do","category":"CANCELED","color":"#DD2B0E"}"#, "CANCELED"), + (r#"{"name":"Duplicate","category":"CANCELED","color":"#DD2B0E"}"#, "CANCELED"), + ]; + for (json, expected_cat) in cases { + let status: WorkItemStatus = serde_json::from_str(json).unwrap(); + assert_eq!( + status.category.as_deref(), + Some(expected_cat), + "Failed for: {}", + status.name + ); + } + } +} +``` + +--- + +## File 2: `src/gitlab/graphql.rs` — inline `#[cfg(test)]` module + +Tests AC-1 (GraphQL client), AC-3 (Status fetcher), and helper functions. +Uses `wiremock` (already in dev-dependencies). + +```rust +#[cfg(test)] +mod tests { + use super::*; + use wiremock::matchers::{body_json_schema, header, method, path}; + use wiremock::{Mock, MockServer, ResponseTemplate}; + + // ═══════════════════════════════════════════════════════════════════════ + // AC-1: GraphQL Client + // ═══════════════════════════════════════════════════════════════════════ + + // ── T01: Successful query returns data ─────────────────────────────── + #[tokio::test] + async fn test_graphql_query_success() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": {"foo": "bar"} + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = client + .query("{ foo }", serde_json::json!({})) + .await + .unwrap(); + + assert_eq!(result.data, serde_json::json!({"foo": "bar"})); + assert!(!result.had_partial_errors); + assert!(result.first_partial_error.is_none()); + } + + // ── T02: Errors array with no data field → Err ─────────────────────── + #[tokio::test] + async fn test_graphql_query_with_errors_no_data() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "errors": [{"message": "bad query"}] + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let err = client + .query("{ bad }", serde_json::json!({})) + .await + .unwrap_err(); + + match err { + LoreError::Other(msg) => assert!( + msg.contains("bad query"), + "Expected error message containing 'bad query', got: {msg}" + ), + other => panic!("Expected LoreError::Other, got: {other:?}"), + } + } + + // ── T03: Authorization header uses Bearer format ───────────────────── + #[tokio::test] + async fn test_graphql_auth_uses_bearer() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .and(header("Authorization", "Bearer tok123")) + .and(header("Content-Type", "application/json")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": {"ok": true} + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + // If the header doesn't match, wiremock returns 404 and we'd get an error + let result = client.query("{ ok }", serde_json::json!({})).await; + assert!(result.is_ok(), "Expected Ok, got: {result:?}"); + } + + // ── T04: HTTP 401 → GitLabAuthFailed ───────────────────────────────── + #[tokio::test] + async fn test_graphql_401_maps_to_auth_failed() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(401)) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "bad_token"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + assert!( + matches!(err, LoreError::GitLabAuthFailed), + "Expected GitLabAuthFailed, got: {err:?}" + ); + } + + // ── T05: HTTP 403 → GitLabAuthFailed ───────────────────────────────── + #[tokio::test] + async fn test_graphql_403_maps_to_auth_failed() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(403)) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "forbidden_token"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + assert!( + matches!(err, LoreError::GitLabAuthFailed), + "Expected GitLabAuthFailed, got: {err:?}" + ); + } + + // ── T06: HTTP 404 → GitLabNotFound ─────────────────────────────────── + #[tokio::test] + async fn test_graphql_404_maps_to_not_found() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(404)) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + assert!( + matches!(err, LoreError::GitLabNotFound { .. }), + "Expected GitLabNotFound, got: {err:?}" + ); + } + + // ── T28: HTTP 429 with Retry-After HTTP-date format ────────────────── + #[tokio::test] + async fn test_retry_after_http_date_format() { + let server = MockServer::start().await; + // Use a date far in the future so delta is positive + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with( + ResponseTemplate::new(429) + .insert_header("Retry-After", "Wed, 11 Feb 2099 01:00:00 GMT"), + ) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + match err { + LoreError::GitLabRateLimited { retry_after } => { + // Should be a large number of seconds (future date) + assert!( + retry_after > 60, + "Expected retry_after > 60 for far-future date, got: {retry_after}" + ); + } + other => panic!("Expected GitLabRateLimited, got: {other:?}"), + } + } + + // ── T29: HTTP 429 with unparseable Retry-After → fallback to 60 ───── + #[tokio::test] + async fn test_retry_after_invalid_falls_back_to_60() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with( + ResponseTemplate::new(429).insert_header("Retry-After", "garbage"), + ) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + match err { + LoreError::GitLabRateLimited { retry_after } => { + assert_eq!(retry_after, 60, "Expected fallback to 60s"); + } + other => panic!("Expected GitLabRateLimited, got: {other:?}"), + } + } + + // ── T33: Partial data with errors → returns data + metadata ────────── + #[tokio::test] + async fn test_graphql_partial_data_with_errors_returns_data() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": {"foo": "bar"}, + "errors": [{"message": "partial failure"}] + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = client + .query("{ foo }", serde_json::json!({})) + .await + .unwrap(); + + assert_eq!(result.data, serde_json::json!({"foo": "bar"})); + assert!(result.had_partial_errors); + assert_eq!( + result.first_partial_error.as_deref(), + Some("partial failure") + ); + } + + // ── T43 (NEW): HTTP 429 with delta-seconds Retry-After ─────────────── + #[tokio::test] + async fn test_retry_after_delta_seconds() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with( + ResponseTemplate::new(429).insert_header("Retry-After", "120"), + ) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + match err { + LoreError::GitLabRateLimited { retry_after } => { + assert_eq!(retry_after, 120); + } + other => panic!("Expected GitLabRateLimited, got: {other:?}"), + } + } + + // ── T44 (NEW): Network error → LoreError::Other ───────────────────── + #[tokio::test] + async fn test_graphql_network_error() { + // Connect to a port that's not listening + let client = GraphqlClient::new("http://127.0.0.1:1", "tok123"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + assert!( + matches!(err, LoreError::Other(_)), + "Expected LoreError::Other for network error, got: {err:?}" + ); + } + + // ── T45 (NEW): Request body contains query + variables ─────────────── + #[tokio::test] + async fn test_graphql_request_body_format() { + use std::sync::Arc; + use tokio::sync::Mutex; + + let server = MockServer::start().await; + let captured = Arc::new(Mutex::new(None::)); + let captured_clone = captured.clone(); + + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": {"ok": true} + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let vars = serde_json::json!({"projectPath": "group/repo"}); + let _ = client.query("query($projectPath: ID!) { project(fullPath: $projectPath) { id } }", vars.clone()).await; + + // Verify via wiremock received requests + let requests = server.received_requests().await.unwrap(); + assert_eq!(requests.len(), 1); + + let body: serde_json::Value = + serde_json::from_slice(&requests[0].body).unwrap(); + assert!(body.get("query").is_some(), "Body missing 'query' field"); + assert!(body.get("variables").is_some(), "Body missing 'variables' field"); + assert_eq!(body["variables"]["projectPath"], "group/repo"); + } + + // ── T46 (NEW): Base URL trailing slash is normalized ───────────────── + #[tokio::test] + async fn test_graphql_base_url_trailing_slash() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": {"ok": true} + }))) + .mount(&server) + .await; + + // Add trailing slash to base URL + let url_with_slash = format!("{}/", server.uri()); + let client = GraphqlClient::new(&url_with_slash, "tok123"); + let result = client.query("{ ok }", serde_json::json!({})).await; + assert!(result.is_ok(), "Trailing slash should be normalized, got: {result:?}"); + } + + // ── T47 (NEW): Response with data:null and no errors → Err ─────────── + #[tokio::test] + async fn test_graphql_data_null_no_errors() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": null + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let err = client.query("{ me }", serde_json::json!({})).await.unwrap_err(); + + match err { + LoreError::Other(msg) => assert!( + msg.contains("missing 'data'"), + "Expected 'missing data' message, got: {msg}" + ), + other => panic!("Expected LoreError::Other, got: {other:?}"), + } + } + + // ═══════════════════════════════════════════════════════════════════════ + // AC-3: Status Fetcher + // ═══════════════════════════════════════════════════════════════════════ + + /// Helper: build a GraphQL work-items response page with given issues. + /// Each item: (iid, status_name_or_none, has_status_widget) + fn make_work_items_page( + items: &[(i64, Option<&str>)], + has_next_page: bool, + end_cursor: Option<&str>, + ) -> serde_json::Value { + let nodes: Vec = items + .iter() + .map(|(iid, status_name)| { + let mut widgets = vec![ + serde_json::json!({"__typename": "WorkItemWidgetDescription"}), + ]; + if let Some(name) = status_name { + widgets.push(serde_json::json!({ + "__typename": "WorkItemWidgetStatus", + "status": { + "name": name, + "category": "IN_PROGRESS", + "color": "#1f75cb", + "iconName": "status-in-progress" + } + })); + } + // If status_name is None but we still want the widget (null status): + // handled by a separate test — here None means no status widget at all + serde_json::json!({ + "iid": iid.to_string(), + "widgets": widgets, + }) + }) + .collect(); + + serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": nodes, + "pageInfo": { + "endCursor": end_cursor, + "hasNextPage": has_next_page, + } + } + } + } + }) + } + + /// Helper: build a page where issue has status widget but status is null. + fn make_null_status_widget_page(iid: i64) -> serde_json::Value { + serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [{ + "iid": iid.to_string(), + "widgets": [ + {"__typename": "WorkItemWidgetStatus", "status": null} + ] + }], + "pageInfo": { + "endCursor": null, + "hasNextPage": false, + } + } + } + } + }) + } + + // ── T11: Pagination across 2 pages ─────────────────────────────────── + #[tokio::test] + async fn test_fetch_statuses_pagination() { + let server = MockServer::start().await; + + // Page 1: returns cursor "page2" + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with({ + let mut seq = wiremock::ResponseTemplate::new(200); + // We need conditional responses based on request body. + // Use a simpler approach: mount two mocks, first returns page 1, + // second returns page 2. Wiremock uses LIFO matching. + seq.set_body_json(make_work_items_page( + &[(1, Some("In progress")), (2, Some("To do"))], + true, + Some("cursor_page2"), + )) + }) + .up_to_n_times(1) + .expect(1) + .mount(&server) + .await; + + // Page 2: no more pages + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with( + ResponseTemplate::new(200).set_body_json(make_work_items_page( + &[(3, Some("Done"))], + false, + None, + )), + ) + .expect(1) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert_eq!(result.statuses.len(), 3); + assert!(result.statuses.contains_key(&1)); + assert!(result.statuses.contains_key(&2)); + assert!(result.statuses.contains_key(&3)); + assert_eq!(result.all_fetched_iids.len(), 3); + assert!(result.unsupported_reason.is_none()); + } + + // ── T12: No status widget → empty statuses, populated all_fetched ──── + #[tokio::test] + async fn test_fetch_statuses_no_status_widget() { + let server = MockServer::start().await; + + // Issue has widgets but none is WorkItemWidgetStatus + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [{ + "iid": "42", + "widgets": [ + {"__typename": "WorkItemWidgetDescription"}, + {"__typename": "WorkItemWidgetLabels"} + ] + }], + "pageInfo": {"endCursor": null, "hasNextPage": false} + } + } + } + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert!(result.statuses.is_empty(), "No status widget → no statuses"); + assert!( + result.all_fetched_iids.contains(&42), + "IID 42 should still be in all_fetched_iids" + ); + } + + // ── T13: GraphQL 404 → graceful unsupported ────────────────────────── + #[tokio::test] + async fn test_fetch_statuses_404_graceful() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(404)) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert!(result.statuses.is_empty()); + assert!(result.all_fetched_iids.is_empty()); + assert!(matches!( + result.unsupported_reason, + Some(UnsupportedReason::GraphqlEndpointMissing) + )); + } + + // ── T14: GraphQL 403 → graceful unsupported ────────────────────────── + #[tokio::test] + async fn test_fetch_statuses_403_graceful() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(403)) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert!(result.statuses.is_empty()); + assert!(result.all_fetched_iids.is_empty()); + assert!(matches!( + result.unsupported_reason, + Some(UnsupportedReason::AuthForbidden) + )); + } + + // ── T25: ansi256_from_rgb known conversions ────────────────────────── + #[test] + fn test_ansi256_from_rgb() { + // Black → index 16 (0,0,0 in 6x6x6 cube) + assert_eq!(ansi256_from_rgb(0, 0, 0), 16); + // White → index 231 (5,5,5 in 6x6x6 cube) + assert_eq!(ansi256_from_rgb(255, 255, 255), 231); + // GitLab "In progress" blue #1f75cb → (31,117,203) + // ri = (31*5+127)/255 = 282/255 ≈ 1 + // gi = (117*5+127)/255 = 712/255 ≈ 2 (rounds to 3) + // bi = (203*5+127)/255 = 1142/255 ≈ 4 + let idx = ansi256_from_rgb(31, 117, 203); + // 16 + 36*1 + 6*2 + 4 = 16+36+12+4 = 68 + // or 16 + 36*1 + 6*3 + 4 = 16+36+18+4 = 74 depending on rounding + assert!( + (68..=74).contains(&idx), + "Expected ansi256 index near 68-74 for #1f75cb, got: {idx}" + ); + } + + // ── T27: __typename matching ignores non-status widgets ────────────── + #[tokio::test] + async fn test_typename_matching_ignores_non_status_widgets() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [{ + "iid": "10", + "widgets": [ + {"__typename": "WorkItemWidgetDescription"}, + {"__typename": "WorkItemWidgetLabels"}, + {"__typename": "WorkItemWidgetAssignees"}, + { + "__typename": "WorkItemWidgetStatus", + "status": { + "name": "In progress", + "category": "IN_PROGRESS" + } + } + ] + }], + "pageInfo": {"endCursor": null, "hasNextPage": false} + } + } + } + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + // Only status widget should be parsed + assert_eq!(result.statuses.len(), 1); + assert_eq!(result.statuses[&10].name, "In progress"); + } + + // ── T34: Cursor stall aborts pagination ────────────────────────────── + #[tokio::test] + async fn test_fetch_statuses_cursor_stall_aborts() { + let server = MockServer::start().await; + + // Both pages return the SAME cursor → stall detection + let stall_response = serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [{"iid": "1", "widgets": []}], + "pageInfo": {"endCursor": "same_cursor", "hasNextPage": true} + } + } + } + }); + + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with( + ResponseTemplate::new(200).set_body_json(stall_response.clone()), + ) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + // Should have aborted after detecting stall, returning partial results + assert!( + result.all_fetched_iids.contains(&1), + "Should contain the one IID fetched before stall" + ); + // Pagination should NOT have looped infinitely — wiremock would time out + // The test passing at all proves the guard worked + } + + // ── T35: Successful fetch → unsupported_reason is None ─────────────── + #[tokio::test] + async fn test_fetch_statuses_unsupported_reason_none_on_success() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json( + make_work_items_page(&[(1, Some("To do"))], false, None), + )) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert!(result.unsupported_reason.is_none()); + } + + // ── T36: Complexity error reduces page size ────────────────────────── + #[tokio::test] + async fn test_fetch_statuses_complexity_error_reduces_page_size() { + let server = MockServer::start().await; + let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0)); + let call_count_clone = call_count.clone(); + + // First call (page_size=100) → complexity error + // Second call (page_size=50) → success + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(move |req: &wiremock::Request| { + let n = call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst); + if n == 0 { + // First request: simulate complexity error + ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "errors": [{"message": "Query has complexity of 300, which exceeds max complexity of 250"}] + })) + } else { + // Subsequent requests: return data + ResponseTemplate::new(200).set_body_json(make_work_items_page( + &[(1, Some("In progress"))], + false, + None, + )) + } + }) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert_eq!(result.statuses.len(), 1); + assert_eq!(result.statuses[&1].name, "In progress"); + // Should have made 2 calls: first failed, second succeeded + assert_eq!(call_count.load(std::sync::atomic::Ordering::SeqCst), 2); + } + + // ── T37: Timeout error reduces page size ───────────────────────────── + #[tokio::test] + async fn test_fetch_statuses_timeout_error_reduces_page_size() { + let server = MockServer::start().await; + let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0)); + let call_count_clone = call_count.clone(); + + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(move |_req: &wiremock::Request| { + let n = call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst); + if n == 0 { + ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "errors": [{"message": "Query timeout after 30000ms"}] + })) + } else { + ResponseTemplate::new(200).set_body_json(make_work_items_page( + &[(5, Some("Done"))], + false, + None, + )) + } + }) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert_eq!(result.statuses.len(), 1); + assert!(call_count.load(std::sync::atomic::Ordering::SeqCst) >= 2); + } + + // ── T38: Smallest page still fails → returns Err ───────────────────── + #[tokio::test] + async fn test_fetch_statuses_smallest_page_still_fails() { + let server = MockServer::start().await; + + // Always return complexity error regardless of page size + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "errors": [{"message": "Query has complexity of 9999"}] + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let err = fetch_issue_statuses(&client, "group/project").await.unwrap_err(); + + assert!( + matches!(err, LoreError::Other(_)), + "Expected error after exhausting all page sizes, got: {err:?}" + ); + } + + // ── T39: Page size resets after success ─────────────────────────────── + #[tokio::test] + async fn test_fetch_statuses_page_size_resets_after_success() { + let server = MockServer::start().await; + let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0)); + let call_count_clone = call_count.clone(); + + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(move |_req: &wiremock::Request| { + let n = call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst); + match n { + 0 => { + // Page 1 at size 100: success, has next page + ResponseTemplate::new(200).set_body_json(make_work_items_page( + &[(1, Some("To do"))], + true, + Some("cursor_p2"), + )) + } + 1 => { + // Page 2 at size 100 (reset): complexity error + ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "errors": [{"message": "Query has complexity of 300"}] + })) + } + 2 => { + // Page 2 retry at size 50: success + ResponseTemplate::new(200).set_body_json(make_work_items_page( + &[(2, Some("Done"))], + false, + None, + )) + } + _ => ResponseTemplate::new(500), + } + }) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert_eq!(result.statuses.len(), 2); + assert!(result.statuses.contains_key(&1)); + assert!(result.statuses.contains_key(&2)); + assert_eq!(call_count.load(std::sync::atomic::Ordering::SeqCst), 3); + } + + // ── T42: Partial errors tracked across pages ───────────────────────── + #[tokio::test] + async fn test_fetch_statuses_partial_errors_tracked() { + let server = MockServer::start().await; + + // Return data + errors (partial success) + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [{"iid": "1", "widgets": [ + {"__typename": "WorkItemWidgetStatus", "status": {"name": "To do"}} + ]}], + "pageInfo": {"endCursor": null, "hasNextPage": false} + } + } + }, + "errors": [{"message": "Rate limit warning: approaching limit"}] + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert_eq!(result.partial_error_count, 1); + assert_eq!( + result.first_partial_error.as_deref(), + Some("Rate limit warning: approaching limit") + ); + // Data should still be present + assert_eq!(result.statuses.len(), 1); + } + + // ── T49 (NEW): Empty project returns empty result ──────────────────── + #[tokio::test] + async fn test_fetch_statuses_empty_project() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [], + "pageInfo": {"endCursor": null, "hasNextPage": false} + } + } + } + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert!(result.statuses.is_empty()); + assert!(result.all_fetched_iids.is_empty()); + assert!(result.unsupported_reason.is_none()); + assert_eq!(result.partial_error_count, 0); + } + + // ── T50 (NEW): Status widget with null status → in all_fetched but not in statuses + #[tokio::test] + async fn test_fetch_statuses_null_status_in_widget() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with( + ResponseTemplate::new(200).set_body_json(make_null_status_widget_page(42)), + ) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + assert!(result.statuses.is_empty(), "Null status should not be in map"); + assert!( + result.all_fetched_iids.contains(&42), + "IID should still be tracked in all_fetched_iids" + ); + } + + // ── T51 (NEW): Non-numeric IID is silently skipped ─────────────────── + #[tokio::test] + async fn test_fetch_statuses_non_numeric_iid_skipped() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [ + { + "iid": "not_a_number", + "widgets": [{"__typename": "WorkItemWidgetStatus", "status": {"name": "To do"}}] + }, + { + "iid": "7", + "widgets": [{"__typename": "WorkItemWidgetStatus", "status": {"name": "Done"}}] + } + ], + "pageInfo": {"endCursor": null, "hasNextPage": false} + } + } + } + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + // Non-numeric IID silently skipped, numeric IID present + assert_eq!(result.statuses.len(), 1); + assert!(result.statuses.contains_key(&7)); + assert_eq!(result.all_fetched_iids.len(), 1); + } + + // ── T52 (NEW): Pagination cursor None with hasNextPage=true → aborts + #[tokio::test] + async fn test_fetch_statuses_null_cursor_with_has_next_aborts() { + let server = MockServer::start().await; + Mock::given(method("POST")) + .and(path("/api/graphql")) + .respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({ + "data": { + "project": { + "workItems": { + "nodes": [{"iid": "1", "widgets": []}], + "pageInfo": {"endCursor": null, "hasNextPage": true} + } + } + } + }))) + .mount(&server) + .await; + + let client = GraphqlClient::new(&server.uri(), "tok123"); + let result = fetch_issue_statuses(&client, "group/project").await.unwrap(); + + // Should abort after first page (null cursor + hasNextPage=true) + assert_eq!(result.all_fetched_iids.len(), 1); + } +} +``` + +--- + +## File 3: `tests/migration_tests.rs` — append to existing file + +Tests AC-4 (Migration 021). + +```rust +// ── T15: Migration 021 adds all 5 status columns ──────────────────── +#[test] +fn test_migration_021_adds_columns() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + + let columns: Vec = conn + .prepare("PRAGMA table_info(issues)") + .unwrap() + .query_map([], |row| row.get(1)) + .unwrap() + .filter_map(|r| r.ok()) + .collect(); + + let expected = [ + "status_name", + "status_category", + "status_color", + "status_icon_name", + "status_synced_at", + ]; + for col in &expected { + assert!( + columns.contains(&col.to_string()), + "Missing column: {col}. Found: {columns:?}" + ); + } +} + +// ── T16: Migration 021 adds compound index ─────────────────────────── +#[test] +fn test_migration_021_adds_index() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + + let indexes: Vec = conn + .prepare("PRAGMA index_list(issues)") + .unwrap() + .query_map([], |row| row.get::<_, String>(1)) + .unwrap() + .filter_map(|r| r.ok()) + .collect(); + + assert!( + indexes.contains(&"idx_issues_project_status_name".to_string()), + "Missing index idx_issues_project_status_name. Found: {indexes:?}" + ); +} + +// ── T53 (NEW): Existing issues retain NULL defaults after migration ── +#[test] +fn test_migration_021_existing_rows_have_null_defaults() { + let conn = create_test_db(); + apply_migrations(&conn, 20); + + // Insert an issue before migration 021 + conn.execute( + "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) + VALUES (1, 100, 'group/project')", + [], + ) + .unwrap(); + conn.execute( + "INSERT INTO issues (gitlab_id, project_id, iid, state, created_at, updated_at, last_seen_at) + VALUES (1, 1, 42, 'opened', 1000, 1000, 1000)", + [], + ) + .unwrap(); + + // Now apply migration 021 + apply_migrations(&conn, 21); + + let (name, category, color, icon, synced_at): ( + Option, + Option, + Option, + Option, + Option, + ) = conn + .query_row( + "SELECT status_name, status_category, status_color, status_icon_name, status_synced_at + FROM issues WHERE iid = 42", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?, row.get(4)?)), + ) + .unwrap(); + + assert!(name.is_none(), "status_name should be NULL"); + assert!(category.is_none(), "status_category should be NULL"); + assert!(color.is_none(), "status_color should be NULL"); + assert!(icon.is_none(), "status_icon_name should be NULL"); + assert!(synced_at.is_none(), "status_synced_at should be NULL"); +} + +// ── T54 (NEW): SELECT on new columns succeeds after migration ──────── +#[test] +fn test_migration_021_select_new_columns_succeeds() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + + // This is the exact query pattern used in show.rs + let result = conn.execute_batch( + "SELECT status_name, status_category, status_color, status_icon_name, status_synced_at + FROM issues LIMIT 1", + ); + assert!(result.is_ok(), "SELECT on new columns should succeed: {result:?}"); +} +``` + +--- + +## File 4: `src/core/config.rs` — inline `#[cfg(test)]` additions + +Tests AC-5 (Config toggle). + +```rust +#[cfg(test)] +mod tests { + use super::*; + + // ── T23: Default SyncConfig has fetch_work_item_status=true ────── + #[test] + fn test_config_fetch_work_item_status_default_true() { + let config = SyncConfig::default(); + assert!(config.fetch_work_item_status); + } + + // ── T24: JSON without key defaults to true ────────────────────── + #[test] + fn test_config_deserialize_without_key() { + // Minimal SyncConfig JSON — no fetchWorkItemStatus key + let json = r#"{}"#; + let config: SyncConfig = serde_json::from_str(json).unwrap(); + assert!( + config.fetch_work_item_status, + "Missing key should default to true" + ); + } +} +``` + +--- + +## File 5: `tests/status_enrichment_tests.rs` (NEW) + +Tests AC-6 (Orchestrator enrichment), AC-10 (Robot envelope). + +```rust +//! Integration tests for status enrichment DB operations. + +use rusqlite::Connection; +use std::collections::{HashMap, HashSet}; +use std::path::PathBuf; + +// Import the enrichment function — it must be pub(crate) or pub for this to work. +// If it's private to orchestrator, these tests go inline. Adjust path as needed. +use lore::gitlab::types::WorkItemStatus; + +fn get_migrations_dir() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("migrations") +} + +fn apply_migrations(conn: &Connection, through_version: i32) { + let migrations_dir = get_migrations_dir(); + for version in 1..=through_version { + let entries: Vec<_> = std::fs::read_dir(&migrations_dir) + .unwrap() + .filter_map(|e| e.ok()) + .filter(|e| { + e.file_name() + .to_string_lossy() + .starts_with(&format!("{:03}", version)) + }) + .collect(); + assert!(!entries.is_empty(), "Migration {} not found", version); + let sql = std::fs::read_to_string(entries[0].path()).unwrap(); + conn.execute_batch(&sql) + .unwrap_or_else(|e| panic!("Migration {} failed: {}", version, e)); + } +} + +fn create_test_db() -> Connection { + let conn = Connection::open_in_memory().unwrap(); + conn.pragma_update(None, "foreign_keys", "ON").unwrap(); + conn +} + +/// Insert a test project and issue into the DB. +fn seed_issue(conn: &Connection, project_id: i64, iid: i64) { + conn.execute( + "INSERT OR IGNORE INTO projects (id, gitlab_project_id, path_with_namespace, web_url) + VALUES (?1, ?1, 'group/project', 'https://gitlab.example.com/group/project')", + [project_id], + ) + .unwrap(); + conn.execute( + "INSERT INTO issues (gitlab_id, project_id, iid, state, created_at, updated_at, last_seen_at) + VALUES (?1, ?2, ?1, 'opened', 1000, 1000, 1000)", + rusqlite::params![iid, project_id], + ) + .unwrap(); +} + +/// Insert an issue with pre-existing status values (for clear/idempotency tests). +fn seed_issue_with_status( + conn: &Connection, + project_id: i64, + iid: i64, + status_name: &str, + synced_at: i64, +) { + seed_issue(conn, project_id, iid); + conn.execute( + "UPDATE issues SET status_name = ?1, status_category = 'IN_PROGRESS', + status_color = '#1f75cb', status_icon_name = 'status-in-progress', + status_synced_at = ?2 + WHERE project_id = ?3 AND iid = ?4", + rusqlite::params![status_name, synced_at, project_id, iid], + ) + .unwrap(); +} + +fn make_status(name: &str) -> WorkItemStatus { + WorkItemStatus { + name: name.to_string(), + category: Some("IN_PROGRESS".to_string()), + color: Some("#1f75cb".to_string()), + icon_name: Some("status-in-progress".to_string()), + } +} + +fn read_status(conn: &Connection, project_id: i64, iid: i64) -> ( + Option, + Option, + Option, + Option, + Option, +) { + conn.query_row( + "SELECT status_name, status_category, status_color, status_icon_name, status_synced_at + FROM issues WHERE project_id = ?1 AND iid = ?2", + rusqlite::params![project_id, iid], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?, row.get(4)?)), + ) + .unwrap() +} + +// ── T17: Enrich writes all 4 status columns + synced_at ────────────── +#[test] +fn test_enrich_issue_statuses_txn() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue(&conn, 1, 42); + + let mut statuses = HashMap::new(); + statuses.insert(42_i64, make_status("In progress")); + let all_fetched: HashSet = [42].into_iter().collect(); + let now_ms = 1_700_000_000_000_i64; + + // Call the enrichment function + let tx = conn.unchecked_transaction().unwrap(); + let mut update_stmt = tx + .prepare_cached( + "UPDATE issues SET status_name = ?1, status_category = ?2, status_color = ?3, + status_icon_name = ?4, status_synced_at = ?5 + WHERE project_id = ?6 AND iid = ?7", + ) + .unwrap(); + let mut enriched = 0usize; + for (iid, status) in &statuses { + let rows = update_stmt + .execute(rusqlite::params![ + &status.name, + &status.category, + &status.color, + &status.icon_name, + now_ms, + 1_i64, + iid, + ]) + .unwrap(); + if rows > 0 { + enriched += 1; + } + } + tx.commit().unwrap(); + + assert_eq!(enriched, 1); + + let (name, cat, color, icon, synced) = read_status(&conn, 1, 42); + assert_eq!(name.as_deref(), Some("In progress")); + assert_eq!(cat.as_deref(), Some("IN_PROGRESS")); + assert_eq!(color.as_deref(), Some("#1f75cb")); + assert_eq!(icon.as_deref(), Some("status-in-progress")); + assert_eq!(synced, Some(now_ms)); +} + +// ── T18: Unknown IID in status map → no error, returns 0 ──────────── +#[test] +fn test_enrich_skips_unknown_iids() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + // Don't insert any issues + + let mut statuses = HashMap::new(); + statuses.insert(999_i64, make_status("In progress")); + let all_fetched: HashSet = [999].into_iter().collect(); + + let tx = conn.unchecked_transaction().unwrap(); + let mut update_stmt = tx + .prepare_cached( + "UPDATE issues SET status_name = ?1, status_category = ?2, status_color = ?3, + status_icon_name = ?4, status_synced_at = ?5 + WHERE project_id = ?6 AND iid = ?7", + ) + .unwrap(); + let mut enriched = 0usize; + for (iid, status) in &statuses { + let rows = update_stmt + .execute(rusqlite::params![ + &status.name, &status.category, &status.color, &status.icon_name, + 1_700_000_000_000_i64, 1_i64, iid, + ]) + .unwrap(); + if rows > 0 { + enriched += 1; + } + } + tx.commit().unwrap(); + + assert_eq!(enriched, 0, "No DB rows match → 0 enriched"); +} + +// ── T19: Removed status → fields NULLed, synced_at updated ────────── +#[test] +fn test_enrich_clears_removed_status() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 42, "In progress", 1_600_000_000_000); + + // Issue 42 is in all_fetched but NOT in statuses → should be cleared + let statuses: HashMap = HashMap::new(); + let all_fetched: HashSet = [42].into_iter().collect(); + let now_ms = 1_700_000_000_000_i64; + + let tx = conn.unchecked_transaction().unwrap(); + let mut clear_stmt = tx + .prepare_cached( + "UPDATE issues SET status_name = NULL, status_category = NULL, status_color = NULL, + status_icon_name = NULL, status_synced_at = ?3 + WHERE project_id = ?1 AND iid = ?2 AND status_name IS NOT NULL", + ) + .unwrap(); + let mut cleared = 0usize; + for iid in &all_fetched { + if !statuses.contains_key(iid) { + let rows = clear_stmt + .execute(rusqlite::params![1_i64, iid, now_ms]) + .unwrap(); + if rows > 0 { + cleared += 1; + } + } + } + tx.commit().unwrap(); + + assert_eq!(cleared, 1); + + let (name, cat, color, icon, synced) = read_status(&conn, 1, 42); + assert!(name.is_none(), "status_name should be NULL after clear"); + assert!(cat.is_none(), "status_category should be NULL after clear"); + assert!(color.is_none(), "status_color should be NULL after clear"); + assert!(icon.is_none(), "status_icon_name should be NULL after clear"); + // Crucially: synced_at is NOT NULL — it records when we confirmed absence + assert_eq!(synced, Some(now_ms), "status_synced_at should be updated to now_ms"); +} + +// ── T20: Transaction rolls back on simulated failure ───────────────── +#[test] +fn test_enrich_transaction_rolls_back_on_failure() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 42, "Original", 1_600_000_000_000); + seed_issue(&conn, 1, 43); + + // Simulate: start transaction, update issue 42, then fail before commit + let result = (|| -> rusqlite::Result<()> { + let tx = conn.unchecked_transaction()?; + tx.execute( + "UPDATE issues SET status_name = 'Changed' WHERE project_id = 1 AND iid = 42", + [], + )?; + // Simulate error: intentionally cause failure + Err(rusqlite::Error::SqliteFailure( + rusqlite::ffi::Error::new(1), + Some("simulated failure".to_string()), + )) + // tx drops without commit → rollback + })(); + + assert!(result.is_err()); + + // Original status should be intact (transaction rolled back) + let (name, _, _, _, _) = read_status(&conn, 1, 42); + assert_eq!( + name.as_deref(), + Some("Original"), + "Transaction should have rolled back" + ); +} + +// ── T26: Idempotent across two runs ────────────────────────────────── +#[test] +fn test_enrich_idempotent_across_two_runs() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue(&conn, 1, 42); + + let mut statuses = HashMap::new(); + statuses.insert(42_i64, make_status("In progress")); + let all_fetched: HashSet = [42].into_iter().collect(); + let now_ms = 1_700_000_000_000_i64; + + // Run enrichment twice with same data + for _ in 0..2 { + let tx = conn.unchecked_transaction().unwrap(); + let mut stmt = tx + .prepare_cached( + "UPDATE issues SET status_name = ?1, status_category = ?2, status_color = ?3, + status_icon_name = ?4, status_synced_at = ?5 + WHERE project_id = ?6 AND iid = ?7", + ) + .unwrap(); + for (iid, status) in &statuses { + stmt.execute(rusqlite::params![ + &status.name, &status.category, &status.color, &status.icon_name, + now_ms, 1_i64, iid, + ]) + .unwrap(); + } + tx.commit().unwrap(); + } + + let (name, _, _, _, _) = read_status(&conn, 1, 42); + assert_eq!(name.as_deref(), Some("In progress")); +} + +// ── T30: Clear sets synced_at (not NULL) ───────────────────────────── +#[test] +fn test_enrich_sets_synced_at_on_clear() { + // Same as T19 but explicitly named for the AC assertion + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 10, "Done", 1_500_000_000_000); + + let now_ms = 1_700_000_000_000_i64; + conn.execute( + "UPDATE issues SET status_name = NULL, status_category = NULL, status_color = NULL, + status_icon_name = NULL, status_synced_at = ?1 + WHERE project_id = 1 AND iid = 10", + [now_ms], + ) + .unwrap(); + + let (_, _, _, _, synced) = read_status(&conn, 1, 10); + assert_eq!( + synced, + Some(now_ms), + "Clearing status must still set synced_at to record the check" + ); +} + +// ── T31: Enrichment error captured in IngestProjectResult ──────────── +// NOTE: This test validates the struct field exists and can hold a value. +// Full integration requires the orchestrator wiring which is tested via +// cargo test on the actual orchestrator code. +#[test] +fn test_enrichment_error_captured_in_result() { + // This is a compile-time + field-existence test. + // IngestProjectResult must have status_enrichment_error: Option + // The actual population is tested in the orchestrator integration test. + // + // Pseudo-test structure (will compile once IngestProjectResult is updated): + // + // let mut result = IngestProjectResult::default(); + // result.status_enrichment_error = Some("GraphQL error: timeout".to_string()); + // assert_eq!( + // result.status_enrichment_error.as_deref(), + // Some("GraphQL error: timeout") + // ); + // + // Uncomment when IngestProjectResult is implemented. +} + +// ── T32: Robot sync envelope includes status_enrichment object ─────── +// NOTE: This is an E2E test that requires running the full CLI. +// Kept here as specification — implementation requires capturing CLI JSON output. +#[test] +fn test_robot_sync_includes_status_enrichment() { + // Specification: lore --robot sync output must include per-project: + // { + // "status_enrichment": { + // "mode": "fetched|unsupported|skipped", + // "reason": null | "graphql_endpoint_missing" | "auth_forbidden", + // "seen": N, + // "enriched": N, + // "cleared": N, + // "without_widget": N, + // "partial_errors": N, + // "first_partial_error": null | "message", + // "error": null | "message" + // } + // } + // + // This is validated by inspecting the JSON serialization of IngestProjectResult + // in the sync output path. The struct field tests above + serialization tests + // in the CLI layer cover this. +} + +// ── T41: Project path missing → enrichment skipped ─────────────────── +#[test] +fn test_project_path_missing_skips_enrichment() { + use rusqlite::OptionalExtension; + + let conn = create_test_db(); + apply_migrations(&conn, 21); + + // Insert a project WITHOUT path_with_namespace + // (In practice, all projects have this, but the lookup uses .optional()?) + conn.execute( + "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) + VALUES (1, 100, 'group/project')", + [], + ) + .unwrap(); + + // Simulate the orchestrator's path lookup for a non-existent project_id + let project_path: Option = conn + .query_row( + "SELECT path_with_namespace FROM projects WHERE id = ?1", + [999_i64], // non-existent project + |r| r.get(0), + ) + .optional() + .unwrap(); + + assert!( + project_path.is_none(), + "Non-existent project should return None" + ); + // The orchestrator should set: + // result.status_enrichment_error = Some("project_path_missing".to_string()); +} + +// ── T55 (NEW): Config toggle false → enrichment skipped ────────────── +#[test] +fn test_config_toggle_false_skips_enrichment() { + // Validates the SyncConfig toggle behavior + let json = r#"{"fetchWorkItemStatus": false}"#; + let config: lore::core::config::SyncConfig = serde_json::from_str(json).unwrap(); + assert!( + !config.fetch_work_item_status, + "Explicit false should override default" + ); + // When this is false, orchestrator skips enrichment and sets + // result.status_enrichment_mode = "skipped" +} +``` + +--- + +## File 6: `tests/status_filter_tests.rs` (NEW) + +Tests AC-9 (List filter). + +```rust +//! Integration tests for --status filter on issue listing. + +use rusqlite::Connection; +use std::path::PathBuf; + +fn get_migrations_dir() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("migrations") +} + +fn apply_migrations(conn: &Connection, through_version: i32) { + let migrations_dir = get_migrations_dir(); + for version in 1..=through_version { + let entries: Vec<_> = std::fs::read_dir(&migrations_dir) + .unwrap() + .filter_map(|e| e.ok()) + .filter(|e| { + e.file_name() + .to_string_lossy() + .starts_with(&format!("{:03}", version)) + }) + .collect(); + assert!(!entries.is_empty(), "Migration {} not found", version); + let sql = std::fs::read_to_string(entries[0].path()).unwrap(); + conn.execute_batch(&sql) + .unwrap_or_else(|e| panic!("Migration {} failed: {}", version, e)); + } +} + +fn create_test_db() -> Connection { + let conn = Connection::open_in_memory().unwrap(); + conn.pragma_update(None, "foreign_keys", "ON").unwrap(); + conn +} + +/// Seed a project + issue with status +fn seed_issue_with_status( + conn: &Connection, + project_id: i64, + iid: i64, + state: &str, + status_name: Option<&str>, +) { + conn.execute( + "INSERT OR IGNORE INTO projects (id, gitlab_project_id, path_with_namespace, web_url) + VALUES (?1, ?1, 'group/project', 'https://gitlab.example.com/group/project')", + [project_id], + ) + .unwrap(); + conn.execute( + "INSERT INTO issues (gitlab_id, project_id, iid, state, created_at, updated_at, last_seen_at, status_name) + VALUES (?1, ?2, ?1, ?3, 1000, 1000, 1000, ?4)", + rusqlite::params![iid, project_id, state, status_name], + ) + .unwrap(); +} + +// ── T21: Filter by status returns correct issue ────────────────────── +#[test] +fn test_list_filter_by_status() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 1, "opened", Some("In progress")); + seed_issue_with_status(&conn, 1, 2, "opened", Some("To do")); + + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM issues WHERE status_name = ?1 COLLATE NOCASE", + ["In progress"], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 1); +} + +// ── T22: Case-insensitive status filter ────────────────────────────── +#[test] +fn test_list_filter_by_status_case_insensitive() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 1, "opened", Some("In progress")); + + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM issues WHERE status_name = ?1 COLLATE NOCASE", + ["in progress"], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 1, "'in progress' should match 'In progress' via COLLATE NOCASE"); +} + +// ── T40: Multiple --status values (OR semantics) ───────────────────── +#[test] +fn test_list_filter_by_multiple_statuses() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 1, "opened", Some("In progress")); + seed_issue_with_status(&conn, 1, 2, "opened", Some("To do")); + seed_issue_with_status(&conn, 1, 3, "closed", Some("Done")); + + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM issues + WHERE status_name IN (?1, ?2) COLLATE NOCASE", + rusqlite::params!["In progress", "To do"], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 2, "Should match both 'In progress' and 'To do'"); +} + +// ── T61 (NEW): --status combined with --state (AND logic) ──────────── +#[test] +fn test_list_filter_status_and_state() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 1, "opened", Some("In progress")); + seed_issue_with_status(&conn, 1, 2, "closed", Some("In progress")); + + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM issues + WHERE state = ?1 AND status_name = ?2 COLLATE NOCASE", + rusqlite::params!["opened", "In progress"], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 1, "Only the opened issue matches both filters"); +} + +// ── T62 (NEW): --status with no matching issues → 0 results ───────── +#[test] +fn test_list_filter_by_status_no_match() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 1, "opened", Some("In progress")); + + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM issues WHERE status_name = ?1 COLLATE NOCASE", + ["Nonexistent status"], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 0); +} + +// ── T63 (NEW): NULL status excluded from filter ────────────────────── +#[test] +fn test_list_filter_by_status_excludes_null() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue_with_status(&conn, 1, 1, "opened", Some("In progress")); + seed_issue_with_status(&conn, 1, 2, "opened", None); // No status + + let count: i64 = conn + .query_row( + "SELECT COUNT(*) FROM issues WHERE status_name = ?1 COLLATE NOCASE", + ["In progress"], + |r| r.get(0), + ) + .unwrap(); + assert_eq!(count, 1, "NULL status should not match"); +} +``` + +--- + +## File 7: `tests/status_display_tests.rs` (NEW) + +Tests AC-7 (Show display), AC-8 (List display). + +```rust +//! Integration tests for status field presence in show/list SQL queries. +//! These verify the data layer — not terminal rendering (which requires +//! visual inspection or snapshot testing). + +use rusqlite::Connection; +use std::path::PathBuf; + +fn get_migrations_dir() -> PathBuf { + PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("migrations") +} + +fn apply_migrations(conn: &Connection, through_version: i32) { + let migrations_dir = get_migrations_dir(); + for version in 1..=through_version { + let entries: Vec<_> = std::fs::read_dir(&migrations_dir) + .unwrap() + .filter_map(|e| e.ok()) + .filter(|e| { + e.file_name() + .to_string_lossy() + .starts_with(&format!("{:03}", version)) + }) + .collect(); + assert!(!entries.is_empty(), "Migration {} not found", version); + let sql = std::fs::read_to_string(entries[0].path()).unwrap(); + conn.execute_batch(&sql) + .unwrap_or_else(|e| panic!("Migration {} failed: {}", version, e)); + } +} + +fn create_test_db() -> Connection { + let conn = Connection::open_in_memory().unwrap(); + conn.pragma_update(None, "foreign_keys", "ON").unwrap(); + conn +} + +fn seed_issue(conn: &Connection, iid: i64, status_name: Option<&str>, status_category: Option<&str>) { + conn.execute( + "INSERT OR IGNORE INTO projects (id, gitlab_project_id, path_with_namespace, web_url) + VALUES (1, 100, 'group/project', 'https://gitlab.example.com/group/project')", + [], + ) + .unwrap(); + conn.execute( + "INSERT INTO issues (gitlab_id, project_id, iid, state, created_at, updated_at, last_seen_at, + status_name, status_category, status_color, status_icon_name, status_synced_at) + VALUES (?1, 1, ?1, 'opened', 1000, 1000, 1000, ?2, ?3, '#1f75cb', 'status', 1700000000000)", + rusqlite::params![iid, status_name, status_category], + ) + .unwrap(); +} + +// ── T56 (NEW): Show issue query includes status fields ─────────────── +#[test] +fn test_show_issue_includes_status_fields() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue(&conn, 42, Some("In progress"), Some("IN_PROGRESS")); + + // Simulate the show.rs SQL query — verify all 5 status columns are readable + let (name, cat, color, icon, synced): ( + Option, Option, Option, Option, Option, + ) = conn + .query_row( + "SELECT i.status_name, i.status_category, i.status_color, + i.status_icon_name, i.status_synced_at + FROM issues i WHERE i.iid = 42", + [], + |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?, row.get(4)?)), + ) + .unwrap(); + + assert_eq!(name.as_deref(), Some("In progress")); + assert_eq!(cat.as_deref(), Some("IN_PROGRESS")); + assert_eq!(color.as_deref(), Some("#1f75cb")); + assert_eq!(icon.as_deref(), Some("status")); + assert_eq!(synced, Some(1_700_000_000_000)); +} + +// ── T57 (NEW): Show issue with NULL status → fields are None ───────── +#[test] +fn test_show_issue_null_status_fields() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue(&conn, 42, None, None); + + let (name, cat): (Option, Option) = conn + .query_row( + "SELECT i.status_name, i.status_category FROM issues i WHERE i.iid = 42", + [], + |row| Ok((row.get(0)?, row.get(1)?)), + ) + .unwrap(); + + assert!(name.is_none()); + assert!(cat.is_none()); +} + +// ── T58 (NEW): Robot show includes null status (not absent) ────────── +// Specification: In robot mode JSON output, status fields must be present +// as null values (not omitted from the JSON entirely). +// Validated by the IssueDetailJson struct having non-skip-serializing fields. +// This is a compile-time guarantee — the struct definition is the test. +#[test] +fn test_robot_show_status_fields_present_when_null() { + // Validate via serde: serialize a struct with None status fields + // and verify the keys are present as null in the output. + #[derive(serde::Serialize)] + struct MockIssueJson { + status_name: Option, + status_category: Option, + status_color: Option, + status_icon_name: Option, + status_synced_at: Option, + } + + let json = serde_json::to_value(MockIssueJson { + status_name: None, + status_category: None, + status_color: None, + status_icon_name: None, + status_synced_at: None, + }) + .unwrap(); + + // Keys must be present (as null), not absent + assert!(json.get("status_name").is_some(), "status_name key must be present"); + assert!(json["status_name"].is_null(), "status_name must be null"); + assert!(json.get("status_synced_at").is_some(), "status_synced_at key must be present"); + assert!(json["status_synced_at"].is_null(), "status_synced_at must be null"); +} + +// ── T59 (NEW): List issues query includes status columns ───────────── +#[test] +fn test_list_issues_includes_status_columns() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue(&conn, 1, Some("To do"), Some("TO_DO")); + seed_issue(&conn, 2, Some("Done"), Some("DONE")); + + let rows: Vec<(i64, Option, Option)> = conn + .prepare( + "SELECT i.iid, i.status_name, i.status_category + FROM issues i ORDER BY i.iid", + ) + .unwrap() + .query_map([], |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?))) + .unwrap() + .filter_map(|r| r.ok()) + .collect(); + + assert_eq!(rows.len(), 2); + assert_eq!(rows[0].1.as_deref(), Some("To do")); + assert_eq!(rows[0].2.as_deref(), Some("TO_DO")); + assert_eq!(rows[1].1.as_deref(), Some("Done")); + assert_eq!(rows[1].2.as_deref(), Some("DONE")); +} + +// ── T60 (NEW): List issues NULL status → empty string in display ───── +#[test] +fn test_list_issues_null_status_returns_none() { + let conn = create_test_db(); + apply_migrations(&conn, 21); + seed_issue(&conn, 1, None, None); + + let status: Option = conn + .query_row( + "SELECT i.status_name FROM issues i WHERE i.iid = 1", + [], + |row| row.get(0), + ) + .unwrap(); + + assert!(status.is_none(), "NULL status should map to None in Rust"); +} +``` + +--- + +## Gap Analysis Summary + +| Gap Found | Test Added | AC | +|-----------|-----------|-----| +| No test for delta-seconds Retry-After | T43 | AC-1 | +| No test for network errors | T44 | AC-1 | +| No test for request body format | T45 | AC-1 | +| No test for base URL trailing slash | T46 | AC-1 | +| No test for data:null response | T47 | AC-1 | +| No test for all 5 system statuses | T48 | AC-2 | +| No test for empty project | T49 | AC-3 | +| No test for null status in widget | T50 | AC-3 | +| No test for non-numeric IID | T51 | AC-3 | +| No test for null cursor with hasNextPage | T52 | AC-3 | +| No test for existing row NULL defaults | T53 | AC-4 | +| No test for SELECT succeeding after migration | T54 | AC-4 | +| No test for config toggle false | T55 | AC-5/6 | +| No test for show issue with status | T56 | AC-7 | +| No test for show issue NULL status | T57 | AC-7 | +| No test for robot JSON null-not-absent | T58 | AC-7 | +| No test for list query with status cols | T59 | AC-8 | +| No test for list NULL status display | T60 | AC-8 | +| No test for --status AND --state combo | T61 | AC-9 | +| No test for --status no match | T62 | AC-9 | +| No test for NULL excluded from filter | T63 | AC-9 | + +--- + +## Wiremock Pattern Notes + +The `fetch_issue_statuses` tests use `wiremock 0.6` with dynamic response handlers +via `respond_with(move |req: &wiremock::Request| { ... })`. This is the recommended +pattern for tests that need to return different responses per call (pagination, +adaptive page sizing). + +Key patterns: +- **Sequential responses**: Use `AtomicUsize` counter in closure +- **Request inspection**: Parse `req.body` as JSON to check variables +- **LIFO mocking**: wiremock matches most-recently-mounted mock first +- **Up-to-n**: Use `.up_to_n_times(1)` for pagination page ordering + +## Cargo.toml Addition + +```toml +[dependencies] +httpdate = "1" # For Retry-After HTTP-date parsing +``` + +This crate is already well-maintained, minimal, and does exactly one thing.