gitlore/plans/lore-service.feedback-3.md at perf-audit


Taylor Eernisse 2c9de1a6c3 docs: add lore-service, work-item-status-graphql, and time-decay plans

Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for
  integration with external tools (dashboards, IDE extensions, chat
  agents). Covers authentication, rate limiting, caching strategy, and
  webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status
  enrichment feature (now implemented). Includes the TDD appendix with
  test-first development specifications covering GraphQL client, adaptive
  pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback
  on decay curve parameterization, recency weighting for discussion
  contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-11 08:12:17 -05:00


Below are the highest-impact revisions I’d make, ordered by severity/ROI. These focus on correctness first, then security, then operability and UX.

Fix multi-install ambiguity (service_id exists, but commands can’t target one explicitly)

Analysis: The plan introduces service-manifest-{service_id}.json, but status/uninstall/resume/logs have no selector. In a multi-workspace or multi-name install scenario, behavior becomes ambiguous and error-prone. Add explicit targeting plus discovery.

@@ ## Commands & User Journeys
+### `lore service list`
+Lists installed services discovered from `{data_dir}/service-manifest-*.json`.
+Robot output includes `service_id`, `platform`, `interval_seconds`, `profile`, `installed_at_iso`.

@@ ### `lore service uninstall`
-### `lore service uninstall`
+### `lore service uninstall [--service <service_id|name>] [--all]`
@@
-2. CLI reads install manifest to find `service_id`
+2. CLI resolves target service via `--service` or current-project-derived default.
+3. If multiple candidates and no selector, return actionable error.

@@ ### `lore service status`
-### `lore service status`
+### `lore service status [--service <service_id|name>]`
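
The resolution rule above (explicit `--service` first, then a project-derived default, then an actionable error when several candidates remain) could be sketched roughly as follows. `InstalledService`, `ResolveError`, and `resolve_service` are hypothetical names for illustration and are not taken from the plan.

```rust
/// Hypothetical stand-in for one parsed `service-manifest-{service_id}.json`.
#[derive(Debug, Clone, PartialEq)]
pub struct InstalledService {
    pub service_id: String,
    pub name: String,
}

/// Hypothetical error type; the real plan would map these onto its own errors.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NoneInstalled,
    NotFound(String),
    /// Carries the candidate ids so the CLI can print an actionable message.
    Ambiguous(Vec<String>),
}

/// Pick the service a command should act on.
/// An explicit selector wins over the project-derived default; either one
/// may match a `service_id` or a `name`.
pub fn resolve_service<'a>(
    installed: &'a [InstalledService],
    selector: Option<&str>,
    project_default: Option<&str>,
) -> Result<&'a InstalledService, ResolveError> {
    if installed.is_empty() {
        return Err(ResolveError::NoneInstalled);
    }
    if let Some(wanted) = selector.or(project_default) {
        let matches: Vec<&InstalledService> = installed
            .iter()
            .filter(|s| s.service_id == wanted || s.name == wanted)
            .collect();
        return match matches.as_slice() {
            [] => Err(ResolveError::NotFound(wanted.to_string())),
            [only] => Ok(*only),
            // Two installs can share a name; refuse to guess.
            many => Err(ResolveError::Ambiguous(
                many.iter().map(|s| s.service_id.clone()).collect(),
            )),
        };
    }
    // No selector and no default: only unambiguous if exactly one is installed.
    match installed {
        [only] => Ok(only),
        many => Err(ResolveError::Ambiguous(
            many.iter().map(|s| s.service_id.clone()).collect(),
        )),
    }
}
```

Returning the candidate ids in the `Ambiguous` case lets the error message suggest a concrete `--service <service_id>` value rather than just reporting a failure.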

Make status state service-scoped (not global)

Analysis: A single sync-status.json shared by all services causes cross-service contamination (pause/backoff/outcome from one profile affecting another). Keep the lock global, but store state per service.

@@ ## Status File
-### Location
-`{get_data_dir()}/sync-status.json`
+### Location
+`{get_data_dir()}/sync-status-{service_id}.json`

@@ ## Paths Module Additions
-pub fn get_service_status_path() -> PathBuf {
-    get_data_dir().join("sync-status.json")
+pub fn get_service_status_path(service_id: &str) -> PathBuf {
+    get_data_dir().join(format!("sync-status-{service_id}.json"))
}
@@
-Note: `sync-status.json` is NOT scoped by `service_id`
+Note: status is scoped by `service_id`; lock remains global (`sync_pipeline`) to prevent overlapping writes.

Stop classifying error permanence via string matching

Analysis: Matching "401 Unauthorized" in strings is brittle and will misclassify edge cases. Carry machine-readable codes through stage results and classify by ErrorCode only.

@@ pub struct StageResult {
     pub error: Option<String>,
+    pub error_code: Option<String>, // e.g., AUTH_FAILED, NETWORK_ERROR
}
@@ Error classification helpers
-fn is_permanent_error_message(msg: Option<&str>) -> bool { ...string contains... }
+fn is_permanent_error_code(code: Option<&str>) -> bool {
+    matches!(code, Some("TOKEN_NOT_SET" | "AUTH_FAILED" | "CONFIG_NOT_FOUND" | "CONFIG_INVALID" | "MIGRATION_FAILED"))
+}

Install should be transactional (manifest written last)

Analysis: The current order writes the manifest before enabling the scheduler. If the enable step fails, a false “installed” state is persisted. Use a two-phase install with rollback.

@@ ### `lore service install` User journey
-9. CLI writes install manifest ...
-10. CLI runs the platform-specific enable command
+9. CLI runs the platform-specific enable command
+10. On success, CLI writes install manifest atomically
+11. On failure, CLI removes generated files and returns `ServiceCommandFailed`
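
A minimal sketch of the reordered steps, assuming the platform enable command is passed in as a closure and errors are plain strings (the plan itself returns `ServiceCommandFailed`). `write_atomic` and `install_transactional` are hypothetical helper names.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `contents` to a sibling temp file, flush it to disk, then rename it
/// over `path`, so readers never observe a half-written manifest.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(contents)?;
        file.sync_all()?;
    }
    fs::rename(&tmp, path)
}

/// Run the enable step first and write the manifest only if it succeeds.
/// On any failure, remove the files generated earlier in the install.
/// Note: if the manifest write itself fails, a real implementation would
/// also need to run the platform disable command; that is omitted here.
pub fn install_transactional<F>(
    generated_files: &[PathBuf],
    manifest_path: &Path,
    manifest_json: &str,
    enable: F,
) -> Result<(), String>
where
    F: FnOnce() -> Result<(), String>,
{
    let outcome = enable().and_then(|()| {
        write_atomic(manifest_path, manifest_json.as_bytes()).map_err(|e| e.to_string())
    });
    if outcome.is_err() {
        for file in generated_files {
            // Best-effort rollback; a missing file is not an error here.
            let _ = fs::remove_file(file);
        }
    }
    outcome
}
```

Because the manifest is the last thing written, its presence can be treated as the single source of truth for "installed".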

Fix launchd token security gap (env-file currently still embeds token)

Analysis: The current “env-file” mode on macOS still writes the token into the plist, defeating the main security goal. Generate a private wrapper script that reads the env file at runtime and execs lore.

@@ ### macOS: launchd
-<key>ProgramArguments</key>
-<array>
-    <string>{binary_path}</string>
-    <string>--robot</string>
-    <string>service</string>
-    <string>run</string>
-</array>
+<key>ProgramArguments</key>
+<array>
+    <string>{data_dir}/service-run-{service_id}.sh</string>
+</array>
@@
-`env-file`: ... token value must still appear in plist ...
+`env-file`: token never appears in plist; wrapper loads `{data_dir}/service-env-{service_id}` at runtime.
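
One way the wrapper could be generated is sketched below. It assumes the env file consists of shell-compatible `KEY=value` lines and that both paths are absolute; `sh_quote`, `render_wrapper_script`, and `write_private_executable` are hypothetical helpers, not names from the plan.

```rust
/// Quote a value for POSIX sh by wrapping it in single quotes and escaping
/// any embedded single quote as '\''.
fn sh_quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', r"'\''"))
}

/// Render the wrapper script body. The token is never interpolated here;
/// it is only read from `env_file` when launchd runs the script.
/// Assumption: `env_file` holds shell-compatible `KEY=value` lines.
pub fn render_wrapper_script(binary_path: &str, env_file: &str) -> String {
    format!(
        "#!/bin/sh\n\
         set -eu\n\
         # Export every variable the env file defines, then stop auto-exporting.\n\
         set -a\n\
         . {env}\n\
         set +a\n\
         exec {bin} --robot service run\n",
        env = sh_quote(env_file),
        bin = sh_quote(binary_path),
    )
}

/// Create the script with owner-only permissions (0700). The mode is applied
/// only when the file is newly created; an existing file keeps its mode.
#[cfg(unix)]
pub fn write_private_executable(path: &std::path::Path, body: &str) -> std::io::Result<()> {
    use std::io::Write;
    use std::os::unix::fs::OpenOptionsExt;
    let mut file = std::fs::OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o700)
        .open(path)?;
    file.write_all(body.as_bytes())
}
```

Using `exec` replaces the shell with the lore process, so launchd supervises lore directly rather than an intermediate shell.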

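Improve backoff math and add half-open circuit recovery

Analysis: The current jitter plus minimum clamp makes the first retry deterministic: whenever base_backoff * jitter_factor falls below base_interval_seconds, the clamp returns exactly base_interval_seconds and the jitter is discarded. It can also over-pause. In addition, a tripped circuit breaker stays paused until someone resumes it manually. Add a cooldown plus a half-open probe so the service can self-heal.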

@@ Backoff Logic
-let backoff_secs = ((base_backoff as f64) * jitter_factor) as u64;
-let backoff_secs = backoff_secs.max(base_interval_seconds);
+let max_backoff = base_backoff;
+let min_backoff = base_interval_seconds;
+let span = max_backoff.saturating_sub(min_backoff);
+let backoff_secs = min_backoff + ((span as f64) * jitter_factor) as u64;

@@ Scheduler states
-- `paused` — permanent error ... OR circuit breaker tripped ...
+- `paused` — permanent error requiring intervention
+- `half_open` — probe state after circuit cooldown; one trial run allowed

@@ Circuit breaker
-... transitions to `paused` ... Run: lore service resume
+... transitions to `half_open` after cooldown (default 30m). Successful probe closes breaker automatically; failed probe returns to backoff/paused.
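
The revised backoff formula and the half-open transition can be sketched together. This assumes `jitter_factor` lies in `[0.0, 1.0]` and that timestamps are seconds since some epoch; the `Circuit` enum and the `on_tick` / `on_probe_result` functions are hypothetical, and a failed probe here simply reopens the breaker, which is only one reading of "returns to backoff/paused".

```rust
/// Spread the delay across [base_interval_seconds, base_backoff] instead of
/// clamping after the fact. Assumption: `jitter_factor` is in [0.0, 1.0].
pub fn backoff_secs(base_backoff: u64, base_interval_seconds: u64, jitter_factor: f64) -> u64 {
    let min_backoff = base_interval_seconds;
    // If base_backoff is below the minimum, the span is zero.
    let span = base_backoff.saturating_sub(min_backoff);
    let jitter = jitter_factor.clamp(0.0, 1.0);
    min_backoff + ((span as f64) * jitter) as u64
}

/// Default cooldown from the plan revision: 30 minutes.
pub const COOLDOWN_SECS: u64 = 30 * 60;

/// Hypothetical breaker state; timestamps are plain seconds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Circuit {
    Closed,
    Open { opened_at: u64 },
    HalfOpen,
}

/// Called whenever the scheduler wakes: an open breaker becomes half-open
/// once the cooldown has elapsed, which permits one trial run.
pub fn on_tick(state: Circuit, now: u64) -> Circuit {
    match state {
        Circuit::Open { opened_at } if now.saturating_sub(opened_at) >= COOLDOWN_SECS => {
            Circuit::HalfOpen
        }
        other => other,
    }
}

/// Called after the trial run: success closes the breaker, failure reopens
/// it and restarts the cooldown. Other states are left unchanged.
pub fn on_probe_result(state: Circuit, succeeded: bool, now: u64) -> Circuit {
    match (state, succeeded) {
        (Circuit::HalfOpen, true) => Circuit::Closed,
        (Circuit::HalfOpen, false) => Circuit::Open { opened_at: now },
        (other, _) => other,
    }
}
```

With a base interval of 300 s and a base backoff of 3600 s, a jitter factor of 0.5 yields 300 + 3300 × 0.5 = 1950 s, so the result varies with the jitter instead of collapsing onto the minimum.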

Promote backend trait to v1 (not v2) for deterministic integration tests

Analysis: This is a reliability-critical feature spanning OS schedulers. A trait abstraction now gives true behavior tests and safer refactors.

@@ ### Platform Backends
-> Future architecture note: A `SchedulerBackend` trait ... for v2.
+Adopt `SchedulerBackend` trait in v1 with real backends (`launchd/systemd/schtasks`) and `FakeBackend` for tests.
+This enables deterministic install/uninstall/status/run-path integration tests without touching host scheduler.
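
As a rough illustration of what the trait and its test double might look like: the method set, `ServiceSpec`, the string error type, and the `reinstall` helper are all assumptions made for this sketch; only the names `SchedulerBackend` and `FakeBackend` come from the plan.

```rust
use std::collections::BTreeMap;

/// Hypothetical description of what gets scheduled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ServiceSpec {
    pub service_id: String,
    pub interval_seconds: u64,
}

/// Assumed minimal surface; real backends would wrap launchd/systemd/schtasks.
pub trait SchedulerBackend {
    fn install(&mut self, spec: &ServiceSpec) -> Result<(), String>;
    fn uninstall(&mut self, service_id: &str) -> Result<(), String>;
    fn is_installed(&self, service_id: &str) -> bool;
}

/// In-memory backend for tests: records installs and can simulate a failure.
#[derive(Default)]
pub struct FakeBackend {
    installed: BTreeMap<String, ServiceSpec>,
    pub fail_next_install: bool,
}

impl SchedulerBackend for FakeBackend {
    fn install(&mut self, spec: &ServiceSpec) -> Result<(), String> {
        if self.fail_next_install {
            self.fail_next_install = false;
            return Err(format!("simulated enable failure for {}", spec.service_id));
        }
        self.installed.insert(spec.service_id.clone(), spec.clone());
        Ok(())
    }

    fn uninstall(&mut self, service_id: &str) -> Result<(), String> {
        self.installed
            .remove(service_id)
            .map(|_| ())
            .ok_or_else(|| format!("{service_id} is not installed"))
    }

    fn is_installed(&self, service_id: &str) -> bool {
        self.installed.contains_key(service_id)
    }
}

/// Example of command logic written against the trait rather than a platform.
pub fn reinstall(backend: &mut dyn SchedulerBackend, spec: &ServiceSpec) -> Result<(), String> {
    if backend.is_installed(&spec.service_id) {
        backend.uninstall(&spec.service_id)?;
    }
    backend.install(spec)
}
```

Command logic such as `reinstall` can then be exercised against `FakeBackend`, including the failure path, without touching the host scheduler.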

Harden run_cmd timeout behavior

Analysis: If a timeout occurs, the child process must be killed and reaped. Otherwise processes leak and repeated runs can wedge.

@@ fn run_cmd(...)
-// Wait with timeout
-let output = wait_with_timeout(output, timeout_secs)?;
+// Wait with timeout; on timeout kill child and wait to reap
+let output = wait_with_timeout_kill_and_reap(child, timeout_secs)?;
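
A minimal standard-library sketch of the kill-and-reap behavior follows. The signature is an assumption and differs from the plan's: it takes a `Duration` rather than `timeout_secs`, borrows the `Child`, and returns only the exit status (with `None` meaning the timeout fired) instead of captured output.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Poll the child until it exits or the deadline passes.
/// Returns `Ok(Some(status))` if it exited on its own, or `Ok(None)` if it
/// had to be killed. In both cases the child has been reaped on return.
/// Note: this sketch does not drain stdout/stderr; with piped output a
/// chatty child could block on a full pipe before the deadline.
pub fn wait_with_timeout_kill_and_reap(
    child: &mut Child,
    timeout: Duration,
) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        // try_wait reaps the child if it has already exited.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            // The child may exit between the check and the kill; ignore
            // that race and always wait so no zombie is left behind.
            let _ = child.kill();
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

Polling with `try_wait` keeps the sketch dependency-free; the 20 ms sleep is an arbitrary choice that trades a little latency for low CPU use.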

Add manual control commands (pause, trigger, repair)

Analysis: These are high-utility operational controls. trigger runs a sync immediately without waiting for the next interval. pause supports maintenance windows. repair avoids manual file deletion when state is corrupt.

@@ pub enum ServiceCommand {
+    /// Pause scheduled execution without uninstalling
+    Pause { #[arg(long)] reason: Option<String> },
+    /// Trigger an immediate one-off run using installed profile
+    Trigger { #[arg(long)] ignore_backoff: bool },
+    /// Repair corrupt manifest/status by backing up and reinitializing
+    Repair { #[arg(long)] service: Option<String> },
}

Make logs default non-interactive and add rotation policy

Analysis: Opening an editor by default is awkward for automation and SSH sessions, and slower for routine diagnosis. Defaulting to tail is more practical; --open can preserve the editor behavior.

@@ ### `lore service logs`
-By default, opens in the user's preferred editor.
+By default, prints last 100 lines to stdout.
+Use `--open` to open editor.
@@
+Log rotation: rotate `service-stdout.log` / `service-stderr.log` at 10 MB, keep 5 files.
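
The tail default and the rotation policy might be sketched as below. It assumes "keep 5 files" means five rotated copies named `<log>.1` through `<log>.5`, which the plan does not spell out; `tail_lines`, `should_rotate`, and `rotation_plan` are hypothetical helpers.

```rust
/// Rotation threshold from the plan revision: 10 MB.
pub const MAX_LOG_BYTES: u64 = 10 * 1024 * 1024;
/// Assumption: five rotated copies are kept alongside the live log.
pub const KEEP_FILES: usize = 5;

/// Return the last `n` lines of `contents` (all lines if there are fewer).
pub fn tail_lines(contents: &str, n: usize) -> Vec<&str> {
    let lines: Vec<&str> = contents.lines().collect();
    let start = lines.len().saturating_sub(n);
    lines[start..].to_vec()
}

/// Decide whether the live log has reached the rotation threshold.
pub fn should_rotate(current_len: u64) -> bool {
    current_len >= MAX_LOG_BYTES
}

/// List the renames to perform, oldest first, so nothing is overwritten
/// before it has been moved: log.4 -> log.5, ..., log.1 -> log.2, log -> log.1.
/// The previous log.5, if any, is replaced by the first rename.
pub fn rotation_plan(base: &str) -> Vec<(String, String)> {
    let mut plan: Vec<(String, String)> = (1..KEEP_FILES)
        .rev()
        .map(|i| (format!("{base}.{i}"), format!("{base}.{}", i + 1)))
        .collect();
    plan.push((base.to_string(), format!("{base}.1")));
    plan
}
```

Keeping the rename order as data makes the policy easy to test without touching the filesystem; the caller would apply each pair with `std::fs::rename`, skipping sources that do not exist.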

Remove destructive/shell-unsafe suggested action

Analysis: actions(): ["rm {path}", ...] is unsafe (shell injection + destructive guidance). Replace with safe command path.

@@ LoreError::actions()
-Self::ServiceCorruptState { path, .. } => vec![&format!("rm {path}"), "lore service install"],
+Self::ServiceCorruptState { .. } => vec!["lore service repair", "lore service install"],

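Tighten scheduler units for real-world reliability

Analysis: Add an explicit working directory and success-exit handling to reduce environment drift and edge-case failures. Note that systemd already treats exit status 0 as success, so SuccessExitStatus=0 is a no-op as written; the directive only matters if additional exit codes should count as success.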

@@ systemd service unit
 [Service]
 Type=oneshot
 ExecStart={binary_path} --robot service run
+WorkingDirectory={data_dir}
+SuccessExitStatus=0
 TimeoutStartSec=900

If you want, I can produce a single consolidated “v3 plan” markdown with these revisions already merged into your original structure.
