docs: add lore-service, work-item-status-graphql, and time-decay plans

Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback on decay curve parameterization, recency weighting for discussion contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00
parent 1161edb212
commit 2c9de1a6c3
15 changed files with 9261 additions and 33 deletions
--- a/plans/lore-service.feedback-3.md
+++ b/plans/lore-service.feedback-3.md
@@ -0,0 +1,174 @@
+Below are the highest-impact revisions I’d make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX.
+
+1. **Fix multi-install ambiguity (`service_id` exists, but commands can’t target one explicitly)**
+Analysis: The plan introduces `service-manifest-{service_id}.json`, but `status/uninstall/resume/logs` have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery.
+```diff
+@@ ## Commands & User Journeys
+### `lore service list`
+Lists installed services discovered from `{data_dir}/service-manifest-*.json`.
+Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`.
+
+@@ ### `lore service uninstall`
+-### `lore service uninstall`
+### `lore service uninstall [--service <service_id|name>] [--all]`
+@@
+-2. CLI reads install manifest to find `service_id`
+2. CLI resolves target service via `--service` or current-project-derived default.
+3. If multiple candidates and no selector, return actionable error.
+
+@@ ### `lore service status`
+-### `lore service status`
+### `lore service status [--service <service_id|name>]`
+```
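
The targeting rule in item 1 can be sketched in Rust as follows. This is a minimal illustration, not the plan's implementation: `service_id_from_manifest_name` and `resolve_target` are hypothetical helper names, the selector is matched against `service_id` only (the plan also allows a name), and the "current-project-derived default" is simplified to "the only installed service".

```rust
/// Extracts the service id from a manifest file name such as
/// `service-manifest-abc123.json`. Returns None for any other name.
/// (Hypothetical helper; the file-name pattern is taken from the plan.)
pub fn service_id_from_manifest_name(file_name: &str) -> Option<&str> {
    file_name
        .strip_prefix("service-manifest-")?
        .strip_suffix(".json")
        .filter(|id| !id.is_empty())
}

/// Picks the service a command should act on.
/// `installed` holds the discovered service ids; `selector` is the value of
/// `--service`, if one was passed. (Hypothetical helper, not from the plan.)
pub fn resolve_target<'a>(
    installed: &'a [String],
    selector: Option<&str>,
) -> Result<&'a str, String> {
    match selector {
        // Explicit selector: it must match an installed service exactly.
        Some(wanted) => installed
            .iter()
            .map(String::as_str)
            .find(|id| *id == wanted)
            .ok_or_else(|| {
                format!("no installed service matches '{wanted}'; run `lore service list`")
            }),
        // No selector: only unambiguous when exactly one service exists.
        None => match installed {
            [] => Err("no services installed; run `lore service install`".to_string()),
            [only] => Ok(only.as_str()),
            many => Err(format!(
                "{} services installed; pass --service <service_id> (one of: {})",
                many.len(),
                many.join(", ")
            )),
        },
    }
}
```

Returning the candidate list in the error text is one way to make the "actionable error" in step 3 concrete.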
+
+2. **Make status state service-scoped (not global)**
+Analysis: A single `sync-status.json` for all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep lock global, but state per service.
+```diff
+@@ ## Status File
+-### Location
+-`{get_data_dir()}/sync-status.json`
+### Location
+`{get_data_dir()}/sync-status-{service_id}.json`
+
+@@ ## Paths Module Additions
+-pub fn get_service_status_path() -> PathBuf {
+-    get_data_dir().join("sync-status.json")
+pub fn get_service_status_path(service_id: &str) -> PathBuf {
+    get_data_dir().join(format!("sync-status-{service_id}.json"))
+}
+@@
+-Note: `sync-status.json` is NOT scoped by `service_id`
+Note: status is scoped by `service_id`; lock remains global (`sync_pipeline`) to prevent overlapping writes.
+```
+
+3. **Stop classifying permanence via string matching**
+Analysis: Matching `"401 Unauthorized"` in strings is brittle and will misclassify edge cases. Carry machine codes through stage results and classify by `ErrorCode` only.
+```diff
+@@ pub struct StageResult {
+-    pub error: Option<String>,
+    pub error: Option<String>,
+    pub error_code: Option<String>, // e.g., AUTH_FAILED, NETWORK_ERROR
+}
+@@ Error classification helpers
+-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... }
+fn is_permanent_error_code(code: Option<&str>) -> bool {
+    matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED"))
+}
+```
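
A typed variant of the classification in item 3, as a hedged sketch. The diff above keeps `error_code` as a string; the enum here is an assumption about what the plan's `ErrorCode` might look like, with variant names derived from the string codes shown (`NETWORK_ERROR` comes from the field comment). Unknown or missing codes are treated as transient, which is a design choice of this sketch rather than something the plan states.

```rust
/// Machine-readable error codes. Variant names are assumptions derived from
/// the string codes in the plan; the real `ErrorCode` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    TokenNotSet,
    AuthFailed,
    ConfigNotFound,
    ConfigInvalid,
    MigrationFailed,
    NetworkError,
}

impl ErrorCode {
    /// Parses the wire form (e.g. "AUTH_FAILED"); None for unknown codes.
    pub fn parse(code: &str) -> Option<Self> {
        Some(match code {
            "TOKEN_NOT_SET" => Self::TokenNotSet,
            "AUTH_FAILED" => Self::AuthFailed,
            "CONFIG_NOT_FOUND" => Self::ConfigNotFound,
            "CONFIG_INVALID" => Self::ConfigInvalid,
            "MIGRATION_FAILED" => Self::MigrationFailed,
            "NETWORK_ERROR" => Self::NetworkError,
            _ => return None,
        })
    }

    /// Permanent errors need user intervention; retrying will not help.
    pub fn is_permanent(self) -> bool {
        matches!(
            self,
            Self::TokenNotSet
                | Self::AuthFailed
                | Self::ConfigNotFound
                | Self::ConfigInvalid
                | Self::MigrationFailed
        )
    }
}

/// Same signature as the helper in the diff, but routed through the enum so
/// free-form messages such as "401 Unauthorized" can never match.
pub fn is_permanent_error_code(code: Option<&str>) -> bool {
    code.and_then(ErrorCode::parse)
        .map_or(false, ErrorCode::is_permanent)
}
```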
+
+4. **Install should be transactional (manifest written last)**
+Analysis: Current order writes manifest before scheduler enable. If enable fails, you persist a false “installed” state. Use two-phase install with rollback.
+```diff
+@@ ### `lore service install` User journey
+-9. CLI writes install manifest ...
+-10. CLI runs the platform-specific enable command
+9. CLI runs the platform-specific enable command
+10. On success, CLI writes install manifest atomically
+11. On failure, CLI removes generated files and returns `ServiceCommandFailed`
+```
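
The enable-first, manifest-last ordering in item 4 might look like the sketch below. The function names, the closure-based `enable` parameter, and the `String` error type are all illustrative assumptions; only the ordering and the `ServiceCommandFailed` name come from the plan. The sketch does not undo a successful scheduler enable if the manifest write itself fails, which a full implementation would also need to handle.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to a sibling temp file, flushes it to disk, then renames
/// it over `path`. On POSIX filesystems the rename is atomic, so readers never
/// observe a half-written manifest.
pub fn write_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = fs::File::create(&tmp)?;
    file.write_all(contents)?;
    file.sync_all()?;
    drop(file);
    fs::rename(&tmp, path)
}

/// Runs the platform enable step first and records the manifest only on
/// success. `enable` stands in for the platform-specific command.
pub fn install_transactionally<E>(
    generated_files: &[&Path],
    manifest_path: &Path,
    manifest_json: &str,
    enable: E,
) -> Result<(), String>
where
    E: FnOnce() -> Result<(), String>,
{
    if let Err(reason) = enable() {
        // Roll back: remove unit files, wrapper scripts, etc. Removal errors
        // are ignored so the original failure is the one reported.
        for file in generated_files {
            let _ = fs::remove_file(file);
        }
        return Err(format!("ServiceCommandFailed: {reason}"));
    }
    // The manifest is written last, so its presence implies a live service.
    write_atomically(manifest_path, manifest_json.as_bytes())
        .map_err(|e| format!("manifest write failed: {e}"))
}
```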
+
+5. **Fix launchd token security gap (env-file currently still embeds token)**
+Analysis: Current “env-file” on macOS still writes token into plist, defeating the main security goal. Generate a private wrapper script that reads env file at runtime and execs `lore`.
+```diff
+@@ ### macOS: launchd
+-<key>ProgramArguments</key>
+-<array>
+-    <string>{binary_path}</string>
+-    <string>--robot</string>
+-    <string>service</string>
+-    <string>run</string>
+-</array>
+<key>ProgramArguments</key>
+<array>
+    <string>{data_dir}/service-run-{service_id}.sh</string>
+</array>
+@@
+-`env-file`: ... token value must still appear in plist ...
+`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime.
+```
+
+6. **Improve backoff math and add half-open circuit recovery**
+Analysis: Current jitter + min clamp makes first retry deterministic and can over-pause. Also circuit-breaker requires manual resume forever. Add cooldown + half-open probe to self-heal.
+```diff
+@@ Backoff Logic
+-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64;
+-let backoff_secs = backoff_secs.max(base_interval_seconds);
+let max_backoff = base_backoff;
+let min_backoff = base_interval_seconds;
+let span = max_backoff.saturating_sub(min_backoff);
+let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64;
+
+@@ Scheduler states
+-- `paused` — permanent error ... OR circuit breaker tripped ...
+- `paused` — permanent error requiring intervention
+- `half_open` — probe state after circuit cooldown; one trial run allowed
+
+@@ Circuit breaker
+-... transitions to `paused` ... Run: lore service resume
+... transitions to `half_open` after cooldown (default 30m). Successful probe closes breaker automatically; failed probe returns to backoff/paused.
+```
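
To make the revised backoff formula and the half-open transitions in item 6 concrete, here is a small sketch. It assumes `jitter_factor` lies in `[0.0, 1.0]` (the plan does not state its range), and the `Breaker` enum and both transition functions are hypothetical. Treating a failed probe as re-opening the breaker is one reading of "returns to backoff/paused".

```rust
/// Jittered backoff spread across [min_backoff, max_backoff], following the
/// formula in the diff. Assumes `jitter_factor` is in [0.0, 1.0]; values
/// outside that range are clamped.
pub fn jittered_backoff_secs(
    base_backoff: u64,
    base_interval_seconds: u64,
    jitter_factor: f64,
) -> u64 {
    let max_backoff = base_backoff;
    let min_backoff = base_interval_seconds;
    // saturating_sub keeps the span at 0 when base_backoff < the interval.
    let span = max_backoff.saturating_sub(min_backoff);
    let jitter = jitter_factor.clamp(0.0, 1.0);
    min_backoff + ((span as f64) * jitter) as u64
}

/// Hypothetical circuit-breaker states; times are seconds since any fixed epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Breaker {
    Closed,
    Open { tripped_at: u64 },
    HalfOpen,
}

/// Moves an open breaker to half-open once the cooldown has elapsed.
pub fn after_cooldown(state: Breaker, now: u64, cooldown_secs: u64) -> Breaker {
    match state {
        Breaker::Open { tripped_at } if now.saturating_sub(tripped_at) >= cooldown_secs => {
            Breaker::HalfOpen
        }
        other => other,
    }
}

/// Applies the outcome of the single trial run allowed in half-open state.
pub fn after_probe(state: Breaker, succeeded: bool, now: u64) -> Breaker {
    match (state, succeeded) {
        (Breaker::HalfOpen, true) => Breaker::Closed,
        (Breaker::HalfOpen, false) => Breaker::Open { tripped_at: now },
        (other, _) => other,
    }
}
```

With a 300-second interval and a 3600-second base backoff, a jitter factor of 0.5 gives 300 + 0.5 × 3300 = 1950 seconds, so the first retry is no longer pinned to the minimum.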
+
+7. **Promote backend trait to v1 (not v2) for deterministic integration tests**
+Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors.
+```diff
+@@ ### Platform Backends
+-> Future architecture note: A `SchedulerBackend` trait ... for v2.
+Adopt `SchedulerBackend` trait in v1 with real backends (`launchd/systemd/schtasks`) and `FakeBackend` for tests.
+This enables deterministic install/uninstall/status/run-path integration tests without touching host scheduler.
+```
+
+8. **Harden `run_cmd` timeout behavior**
+Analysis: If timeout occurs, child process must be killed and reaped. Otherwise you leak processes and can wedge repeated runs.
+```diff
+@@ fn run_cmd(...)
+-// Wait with timeout
+-let output = wait_with_timeout(output, timeout_secs)?;
+// Wait with timeout; on timeout kill child and wait to reap
+let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?;
+```
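
One way `wait_with_timeout_kill_and_reap` could be built on the standard library alone is sketched below. The signature is an assumption: it takes a `Duration` and returns only the exit status, whereas the plan's version takes `timeout_secs` and yields captured output. If stdout or stderr are piped, a real implementation would also need to drain them while waiting, since a child blocked on a full pipe would otherwise always run into the timeout.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Polls the child until it exits or `timeout` elapses.
/// Returns Ok(Some(status)) on normal exit and Ok(None) if the child had to
/// be killed. In both cases the child has been reaped before returning.
pub fn wait_with_timeout_kill_and_reap(
    mut child: Child,
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait reaps the child if it has already exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // kill can fail if the child exited after the try_wait above;
            // that is harmless, so the error is ignored.
            let _ = child.kill();
            // wait collects the exit status so no zombie process is left.
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(50));
    }
}
```

Polling with `try_wait` avoids extra threads at the cost of up to 50 ms of latency; the interval is an arbitrary choice for illustration.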
+
+9. **Add manual control commands (`pause`, `trigger`, `repair`)**
+Analysis: These are high-utility operational controls. `trigger` helps immediate sync without waiting interval. `pause` supports maintenance windows. `repair` avoids manual file deletion for corrupt state.
+```diff
+@@ pub enum ServiceCommand {
+    /// Pause scheduled execution without uninstalling
+    Pause { #[arg(long)] reason: Option<String> },
+    /// Trigger an immediate one-off run using installed profile
+    Trigger { #[arg(long)] ignore_backoff: bool },
+    /// Repair corrupt manifest/status by backing up and reinitializing
+    Repair { #[arg(long)] service: Option<String> },
+}
+```
+
+10. **Make `logs` default non-interactive and add rotation policy**
+Analysis: Opening editor by default is awkward for automation/SSH and slower for normal diagnosis. Defaulting to `tail` is more practical; `--open` can preserve editor behavior.
+```diff
+@@ ### `lore service logs`
+-By default, opens in the user's preferred editor.
+By default, prints last 100 lines to stdout.
+Use `--open` to open editor.
+@@
+Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files.
+```
+
+11. **Remove destructive/shell-unsafe suggested action**
+Analysis: `actions(): ["rm {path}", ...]` is unsafe (shell injection + destructive guidance). Replace with safe command path.
+```diff
+@@ LoreError::actions()
+-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"],
+Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"],
+```
+
+12. **Tighten scheduler units for real-world reliability**
+Analysis: Add explicit working directory and success-exit handling to reduce environment drift and edge failures.
+```diff
+@@ systemd service unit
+ [Service]
+ Type=oneshot
+ ExecStart={binary_path} --robot service run
+WorkingDirectory={data_dir}
+SuccessExitStatus=0
+ TimeoutStartSec=900
+```
+
+If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure.