docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement: lore-service (5 iterations): HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers. work-item-status-graphql (7 iterations + TDD appendix): Detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot mode output. time-decay-expert-scoring (iteration 5 feedback): Updates to the existing time-decay scoring plan incorporating feedback on decay curve parameterization, recency weighting for discussion contributions, and staleness detection thresholds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
174
plans/lore-service.feedback-3.md
Normal file
174
plans/lore-service.feedback-3.md
Normal file
@@ -0,0 +1,174 @@
|
||||
Below are the highest-impact revisions I’d make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX.
|
||||
|
||||
1. **Fix multi-install ambiguity (`service_id` exists, but commands can’t target one explicitly)**
|
||||
Analysis: The plan introduces `service-manifest-{service_id}.json`, but `status/uninstall/resume/logs` have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery.
|
||||
```diff
|
||||
@@ ## Commands & User Journeys
|
||||
+### `lore service list`
|
||||
+Lists installed services discovered from `{data_dir}/service-manifest-*.json`.
|
||||
+Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`.
|
||||
|
||||
@@ ### `lore service uninstall`
|
||||
-### `lore service uninstall`
|
||||
+### `lore service uninstall [--service <service_id|name>] [--all]`
|
||||
@@
|
||||
-2. CLI reads install manifest to find `service_id`
|
||||
+2. CLI resolves target service via `--service` or current-project-derived default.
|
||||
+3. If multiple candidates and no selector, return actionable error.
|
||||
|
||||
@@ ### `lore service status`
|
||||
-### `lore service status`
|
||||
+### `lore service status [--service <service_id|name>]`
|
||||
```
|
||||
|
||||
2. **Make status state service-scoped (not global)**
|
||||
Analysis: A single `sync-status.json` for all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep lock global, but state per service.
|
||||
```diff
|
||||
@@ ## Status File
|
||||
-### Location
|
||||
-`{get_data_dir()}/sync-status.json`
|
||||
+### Location
|
||||
+`{get_data_dir()}/sync-status-{service_id}.json`
|
||||
|
||||
@@ ## Paths Module Additions
|
||||
-pub fn get_service_status_path() -> PathBuf {
|
||||
- get_data_dir().join("sync-status.json")
|
||||
+pub fn get_service_status_path(service_id: &str) -> PathBuf {
|
||||
+ get_data_dir().join(format!("sync-status-{service_id}.json"))
|
||||
}
|
||||
@@
|
||||
-Note: `sync-status.json` is NOT scoped by `service_id`
|
||||
+Note: status is scoped by `service_id`; lock remains global (`sync_pipeline`) to prevent overlapping writes.
|
||||
```
|
||||
|
||||
3. **Stop classifying permanence via string matching**
|
||||
Analysis: Matching `"401 Unauthorized"` in strings is brittle and will misclassify edge cases. Carry machine codes through stage results and classify by `ErrorCode` only.
|
||||
```diff
|
||||
@@ pub struct StageResult {
|
||||
- pub error: Option<String>,
|
||||
+ pub error: Option<String>,
|
||||
+ pub error_code: Option<String>, // e.g., AUTH_FAILED, NETWORK_ERROR
|
||||
}
|
||||
@@ Error classification helpers
|
||||
-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... }
|
||||
+fn is_permanent_error_code(code: Option<&str>) -> bool {
|
||||
+ matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED"))
|
||||
+}
|
||||
```
|
||||
|
||||
4. **Install should be transactional (manifest written last)**
|
||||
Analysis: Current order writes manifest before scheduler enable. If enable fails, you persist a false “installed” state. Use two-phase install with rollback.
|
||||
```diff
|
||||
@@ ### `lore service install` User journey
|
||||
-9. CLI writes install manifest ...
|
||||
-10. CLI runs the platform-specific enable command
|
||||
+9. CLI runs the platform-specific enable command
|
||||
+10. On success, CLI writes install manifest atomically
|
||||
+11. On failure, CLI removes generated files and returns `ServiceCommandFailed`
|
||||
```
|
||||
|
||||
5. **Fix launchd token security gap (env-file currently still embeds token)**
|
||||
Analysis: Current “env-file” on macOS still writes token into plist, defeating the main security goal. Generate a private wrapper script that reads env file at runtime and execs `lore`.
|
||||
```diff
|
||||
@@ ### macOS: launchd
|
||||
-<key>ProgramArguments</key>
|
||||
-<array>
|
||||
- <string>{binary_path}</string>
|
||||
- <string>--robot</string>
|
||||
- <string>service</string>
|
||||
- <string>run</string>
|
||||
-</array>
|
||||
+<key>ProgramArguments</key>
|
||||
+<array>
|
||||
+ <string>{data_dir}/service-run-{service_id}.sh</string>
|
||||
+</array>
|
||||
@@
|
||||
-`env-file`: ... token value must still appear in plist ...
|
||||
+`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime.
|
||||
```
|
||||
|
||||
6. **Improve backoff math and add half-open circuit recovery**
|
||||
Analysis: Current jitter + min clamp makes first retry deterministic and can over-pause. Also circuit-breaker requires manual resume forever. Add cooldown + half-open probe to self-heal.
|
||||
```diff
|
||||
@@ Backoff Logic
|
||||
-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64;
|
||||
-let backoff_secs = backoff_secs.max(base_interval_seconds);
|
||||
+let max_backoff = base_backoff;
|
||||
+let min_backoff = base_interval_seconds;
|
||||
+let span = max_backoff.saturating_sub(min_backoff);
|
||||
+let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64;
|
||||
|
||||
@@ Scheduler states
|
||||
-- `paused` — permanent error ... OR circuit breaker tripped ...
|
||||
+- `paused` — permanent error requiring intervention
|
||||
+- `half_open` — probe state after circuit cooldown; one trial run allowed
|
||||
|
||||
@@ Circuit breaker
|
||||
-... transitions to `paused` ... Run: lore service resume
|
||||
+... transitions to `half_open` after cooldown (default 30m). Successful probe closes breaker automatically; failed probe returns to backoff/paused.
|
||||
```
|
||||
|
||||
7. **Promote backend trait to v1 (not v2) for deterministic integration tests**
|
||||
Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors.
|
||||
```diff
|
||||
@@ ### Platform Backends
|
||||
-> Future architecture note: A `SchedulerBackend` trait ... for v2.
|
||||
+Adopt `SchedulerBackend` trait in v1 with real backends (`launchd/systemd/schtasks`) and `FakeBackend` for tests.
|
||||
+This enables deterministic install/uninstall/status/run-path integration tests without touching host scheduler.
|
||||
```
|
||||
|
||||
8. **Harden `run_cmd` timeout behavior**
|
||||
Analysis: If timeout occurs, child process must be killed and reaped. Otherwise you leak processes and can wedge repeated runs.
|
||||
```diff
|
||||
@@ fn run_cmd(...)
|
||||
-// Wait with timeout
|
||||
-let output = wait_with_timeout(output, timeout_secs)?;
|
||||
+// Wait with timeout; on timeout kill child and wait to reap
|
||||
+let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?;
|
||||
```
|
||||
|
||||
9. **Add manual control commands (`pause`, `trigger`, `repair`)**
|
||||
Analysis: These are high-utility operational controls. `trigger` helps immediate sync without waiting interval. `pause` supports maintenance windows. `repair` avoids manual file deletion for corrupt state.
|
||||
```diff
|
||||
@@ pub enum ServiceCommand {
|
||||
+ /// Pause scheduled execution without uninstalling
|
||||
+ Pause { #[arg(long)] reason: Option<String> },
|
||||
+ /// Trigger an immediate one-off run using installed profile
|
||||
+ Trigger { #[arg(long)] ignore_backoff: bool },
|
||||
+ /// Repair corrupt manifest/status by backing up and reinitializing
|
||||
+ Repair { #[arg(long)] service: Option<String> },
|
||||
}
|
||||
```
|
||||
|
||||
10. **Make `logs` default non-interactive and add rotation policy**
|
||||
Analysis: Opening editor by default is awkward for automation/SSH and slower for normal diagnosis. Defaulting to `tail` is more practical; `--open` can preserve editor behavior.
|
||||
```diff
|
||||
@@ ### `lore service logs`
|
||||
-By default, opens in the user's preferred editor.
|
||||
+By default, prints last 100 lines to stdout.
|
||||
+Use `--open` to open editor.
|
||||
@@
|
||||
+Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files.
|
||||
```
|
||||
|
||||
11. **Remove destructive/shell-unsafe suggested action**
|
||||
Analysis: `actions(): ["rm {path}", ...]` is unsafe (shell injection + destructive guidance). Replace with safe command path.
|
||||
```diff
|
||||
@@ LoreError::actions()
|
||||
-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"],
|
||||
+Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"],
|
||||
```
|
||||
|
||||
12. **Tighten scheduler units for real-world reliability**
|
||||
Analysis: Add explicit working directory and success-exit handling to reduce environment drift and edge failures.
|
||||
```diff
|
||||
@@ systemd service unit
|
||||
[Service]
|
||||
Type=oneshot
|
||||
ExecStart={binary_path} --robot service run
|
||||
+WorkingDirectory={data_dir}
|
||||
+SuccessExitStatus=0
|
||||
TimeoutStartSec=900
|
||||
```
|
||||
|
||||
If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure.
|
||||
Reference in New Issue
Block a user