docs: add lore-service, work-item-status-graphql, and time-decay plans

Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for
  integration with external tools (dashboards, IDE extensions, chat
  agents). Covers authentication, rate limiting, caching strategy, and
  webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status
  enrichment feature (now implemented). Includes the TDD appendix with
  test-first development specifications covering GraphQL client, adaptive
  pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback
  on decay curve parameterization, recency weighting for discussion
  contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Taylor Eernisse
2026-02-11 08:12:17 -05:00
parent 1161edb212
commit 2c9de1a6c3
15 changed files with 9261 additions and 33 deletions

View File

@@ -0,0 +1,174 @@
Below are the highest-impact revisions Id make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX.
1. **Fix multi-install ambiguity (`service_id` exists, but commands cant target one explicitly)**
Analysis: The plan introduces `service-manifest-{service_id}.json`, but `status/uninstall/resume/logs` have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery.
```diff
@@ ## Commands & User Journeys
+### `lore service list`
+Lists installed services discovered from `{data_dir}/service-manifest-*.json`.
+Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`.
@@ ### `lore service uninstall`
-### `lore service uninstall`
+### `lore service uninstall [--service <service_id|name>] [--all]`
@@
-2. CLI reads install manifest to find `service_id`
+2. CLI resolves target service via `--service` or current-project-derived default.
+3. If multiple candidates and no selector, return actionable error.
@@ ### `lore service status`
-### `lore service status`
+### `lore service status [--service <service_id|name>]`
```
2. **Make status state service-scoped (not global)**
Analysis: A single `sync-status.json` for all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep lock global, but state per service.
```diff
@@ ## Status File
-### Location
-`{get_data_dir()}/sync-status.json`
+### Location
+`{get_data_dir()}/sync-status-{service_id}.json`
@@ ## Paths Module Additions
-pub fn get_service_status_path() -> PathBuf {
- get_data_dir().join("sync-status.json")
+pub fn get_service_status_path(service_id: &str) -> PathBuf {
+ get_data_dir().join(format!("sync-status-{service_id}.json"))
}
@@
-Note: `sync-status.json` is NOT scoped by `service_id`
+Note: status is scoped by `service_id`; lock remains global (`sync_pipeline`) to prevent overlapping writes.
```
3. **Stop classifying permanence via string matching**
Analysis: Matching `"401 Unauthorized"` in strings is brittle and will misclassify edge cases. Carry machine codes through stage results and classify by `ErrorCode` only.
```diff
@@ pub struct StageResult {
- pub error: Option<String>,
+ pub error: Option<String>,
+ pub error_code: Option<String>, // e.g., AUTH_FAILED, NETWORK_ERROR
}
@@ Error classification helpers
-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... }
+fn is_permanent_error_code(code: Option<&str>) -> bool {
+ matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED"))
+}
```
4. **Install should be transactional (manifest written last)**
Analysis: Current order writes manifest before scheduler enable. If enable fails, you persist a false “installed” state. Use two-phase install with rollback.
```diff
@@ ### `lore service install` User journey
-9. CLI writes install manifest ...
-10. CLI runs the platform-specific enable command
+9. CLI runs the platform-specific enable command
+10. On success, CLI writes install manifest atomically
+11. On failure, CLI removes generated files and returns `ServiceCommandFailed`
```
5. **Fix launchd token security gap (env-file currently still embeds token)**
Analysis: Current “env-file” on macOS still writes token into plist, defeating the main security goal. Generate a private wrapper script that reads env file at runtime and execs `lore`.
```diff
@@ ### macOS: launchd
-<key>ProgramArguments</key>
-<array>
- <string>{binary_path}</string>
- <string>--robot</string>
- <string>service</string>
- <string>run</string>
-</array>
+<key>ProgramArguments</key>
+<array>
+ <string>{data_dir}/service-run-{service_id}.sh</string>
+</array>
@@
-`env-file`: ... token value must still appear in plist ...
+`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime.
```
6. **Improve backoff math and add half-open circuit recovery**
Analysis: Current jitter + min clamp makes first retry deterministic and can over-pause. Also circuit-breaker requires manual resume forever. Add cooldown + half-open probe to self-heal.
```diff
@@ Backoff Logic
-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64;
-let backoff_secs = backoff_secs.max(base_interval_seconds);
+let max_backoff = base_backoff;
+let min_backoff = base_interval_seconds;
+let span = max_backoff.saturating_sub(min_backoff);
+let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64;
@@ Scheduler states
-- `paused` — permanent error ... OR circuit breaker tripped ...
+- `paused` — permanent error requiring intervention
+- `half_open` — probe state after circuit cooldown; one trial run allowed
@@ Circuit breaker
-... transitions to `paused` ... Run: lore service resume
+... transitions to `half_open` after cooldown (default 30m). Successful probe closes breaker automatically; failed probe returns to backoff/paused.
```
7. **Promote backend trait to v1 (not v2) for deterministic integration tests**
Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors.
```diff
@@ ### Platform Backends
-> Future architecture note: A `SchedulerBackend` trait ... for v2.
+Adopt `SchedulerBackend` trait in v1 with real backends (`launchd/systemd/schtasks`) and `FakeBackend` for tests.
+This enables deterministic install/uninstall/status/run-path integration tests without touching host scheduler.
```
8. **Harden `run_cmd` timeout behavior**
Analysis: If timeout occurs, child process must be killed and reaped. Otherwise you leak processes and can wedge repeated runs.
```diff
@@ fn run_cmd(...)
-// Wait with timeout
-let output = wait_with_timeout(output, timeout_secs)?;
+// Wait with timeout; on timeout kill child and wait to reap
+let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?;
```
9. **Add manual control commands (`pause`, `trigger`, `repair`)**
Analysis: These are high-utility operational controls. `trigger` helps immediate sync without waiting interval. `pause` supports maintenance windows. `repair` avoids manual file deletion for corrupt state.
```diff
@@ pub enum ServiceCommand {
+ /// Pause scheduled execution without uninstalling
+ Pause { #[arg(long)] reason: Option<String> },
+ /// Trigger an immediate one-off run using installed profile
+ Trigger { #[arg(long)] ignore_backoff: bool },
+ /// Repair corrupt manifest/status by backing up and reinitializing
+ Repair { #[arg(long)] service: Option<String> },
}
```
10. **Make `logs` default non-interactive and add rotation policy**
Analysis: Opening editor by default is awkward for automation/SSH and slower for normal diagnosis. Defaulting to `tail` is more practical; `--open` can preserve editor behavior.
```diff
@@ ### `lore service logs`
-By default, opens in the user's preferred editor.
+By default, prints last 100 lines to stdout.
+Use `--open` to open editor.
@@
+Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files.
```
11. **Remove destructive/shell-unsafe suggested action**
Analysis: `actions(): ["rm {path}", ...]` is unsafe (shell injection + destructive guidance). Replace with safe command path.
```diff
@@ LoreError::actions()
-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"],
+Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"],
```
12. **Tighten scheduler units for real-world reliability**
Analysis: Add explicit working directory and success-exit handling to reduce environment drift and edge failures.
```diff
@@ systemd service unit
[Service]
Type=oneshot
ExecStart={binary_path} --robot service run
+WorkingDirectory={data_dir}
+SuccessExitStatus=0
TimeoutStartSec=900
```
If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure.