**High-Impact Revisions (ordered by priority)** 1. **Make service identity project-scoped (avoid collisions across repos/users)** Analysis: Current fixed names (`com.gitlore.sync`, `LoreSync`, `lore-sync.timer`) will collide when users run multiple gitlore workspaces. This causes silent overwrites and broken uninstall/status behavior. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Commands & User Journeys / install - lore service install [--interval 30m] [--profile balanced] [--token-source env-file] + lore service install [--interval 30m] [--profile balanced] [--token-source auto] [--name ] @@ Install Manifest Schema + /// Stable per-install identity (default derived from project root hash) + pub service_id: String, @@ Platform Backends - Label: com.gitlore.sync + Label: com.gitlore.sync.{service_id} - Task name: LoreSync + Task name: LoreSync-{service_id} - ~/.config/systemd/user/lore-sync.service + ~/.config/systemd/user/lore-sync-{service_id}.service ``` 2. **Replace token model with secure per-OS defaults** Analysis: The current “env-file default” is not actually secure on macOS launchd (token still ends up in plist). On Windows, assumptions about inherited environment are fragile. Use OS-native secure stores by default and keep `embedded` as explicit opt-in only. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Token storage strategies -| env-file (default) | ... +| auto (default) | macOS: Keychain, Linux: env-file (0600), Windows: Credential Manager | +| env-file | Linux/systemd only | | embedded | ... explicit warning ... @@ macOS launchd section - env-file strategy stores canonical token in service-env but embeds token in plist + default strategy is Keychain lookup at runtime; no token persisted in plist + env-file is not offered on macOS @@ Windows schtasks section - token must be in user's system environment + default strategy stores token in Windows Credential Manager and injects at runtime ``` 3. **Version and atomically persist manifest/status** Analysis: `Option` on read hides corruption, and non-atomic writes risk truncated JSON on crashes. This will create false “not installed” and scheduler confusion. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Install Manifest Schema + pub schema_version: u32, // start at 1 + pub updated_at_iso: String, @@ Status File Schema + pub schema_version: u32, // start at 1 + pub updated_at_iso: String, @@ Read/Write - read(path) -> Option + read(path) -> Result, LoreError> - write(...) -> std::io::Result<()> + write_atomic(...) -> std::io::Result<()> // tmp file + fsync + rename ``` 4. **Persist `next_retry_at_ms` instead of recomputing jitter** Analysis: Deterministic jitter from timestamp modulo is predictable and can herd retries. Persisting `next_retry_at_ms` at failure time makes status accurate, stable, and cheap to compute. Diff: ```diff --- a/plan.md +++ b/plan.md @@ SyncStatusFile pub consecutive_failures: u32, + pub next_retry_at_ms: Option, @@ Backoff Logic - compute backoff from last_run.timestamp_ms and deterministic jitter each read + compute backoff once on failure, store next_retry_at_ms, read-only comparison afterward + jitter algorithm: full jitter in [0, cap], injectable RNG for tests ``` 5. **Add circuit breaker for repeated transient failures** Analysis: Infinite transient retries can run forever on systemic failures (DB corruption, bad network policy). After N transient failures, pause with actionable reason. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Scheduler states - backoff — transient failures, waiting to retry + backoff — transient failures, waiting to retry + paused — permanent error OR circuit breaker tripped after N transient failures @@ Service run flow - On transient failure: increment failures, compute backoff + On transient failure: increment failures, compute backoff, if failures >= max_transient_failures -> pause ``` 6. **Stage-aware outcome policy (core freshness over all-or-nothing)** Analysis: Failing embeddings/docs should not block issues/MRs freshness. Split stage outcomes and only treat core stages as hard-fail by default. This improves reliability and practical usefulness. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Context - lore sync runs a 4-stage pipeline ... treated as one run result + lore service run records per-stage outcomes (issues, mrs, docs, embeddings) @@ Status File Schema + pub stage_results: Vec, @@ service run flow - Execute sync pipeline with flags derived from profile + Execute stage-by-stage and classify severity: + - critical: issues, mrs + - optional: docs, embeddings + optional stage failures mark run as degraded, not failed ``` 7. **Replace cfg free-function backend with trait-based backend** Analysis: Current backend API is hard to test end-to-end without real OS commands. A `SchedulerBackend` trait enables deterministic integration tests and cleaner architecture. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Platform Backends / Architecture - exports free functions dispatched via #[cfg] + define trait SchedulerBackend { install, uninstall, state, file_paths, next_run } + provide LaunchdBackend, SystemdBackend, SchtasksBackend implementations + include FakeBackend for integration tests ``` 8. **Harden platform units and detect scheduler prerequisites** Analysis: systemd user timers often fail silently without user manager/linger; launchd context can be wrong in headless sessions. Add explicit diagnostics and unit hardening. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Linux systemd unit [Service] Type=oneshot ExecStart=... +TimeoutStartSec=900 +NoNewPrivileges=true +PrivateTmp=true +ProtectSystem=strict +ProtectHome=read-only @@ Linux install/status + detect user manager availability and linger state; surface warning/action @@ macOS install/status + detect non-GUI bootstrap context and return actionable error ``` 9. **Add operational commands: `trigger`, `doctor`, and non-interactive log tail** Analysis: `logs` opening an editor is weak for automation and incident response. Operators need a preflight and immediate controlled run. Diff: ```diff --- a/plan.md +++ b/plan.md @@ ServiceCommand + Trigger, // run one attempt through service policy now + Doctor, // validate scheduler, token, paths, permissions @@ logs - opens editor + supports --tail and --follow in human mode + robot mode can return last_n lines optionally ``` 10. **Fix plan inconsistencies and edge-case correctness** Analysis: There are internal mismatches that will cause implementation drift. Diff: ```diff --- a/plan.md +++ b/plan.md @@ Interval Parsing - supports 's' suffix + remove 's' suffix (acceptance only allows 5m..24h) @@ uninstall acceptance - removes ALL service files only + explicitly also remove service-manifest and service-env (status/logs retained) @@ SyncStatusFile schema - pub last_run: SyncRunRecord + pub last_run: Option // matches idle/no runs state ``` --- **Recommended Architecture Upgrade Summary** The strongest improvement set is: **(1) project-scoped IDs, (2) secure token defaults, (3) atomic/versioned state, (4) persisted retry schedule + circuit breaker, (5) stage-aware outcomes**. That combination materially improves correctness, multi-repo safety, security, operability, and real-world reliability without changing your core manual-vs-scheduled separation principle.