Files
gitlore/plans/lore-service.feedback-1.md
Taylor Eernisse 2c9de1a6c3 docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for
  integration with external tools (dashboards, IDE extensions, chat
  agents). Covers authentication, rate limiting, caching strategy, and
  webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status
  enrichment feature (now implemented). Includes the TDD appendix with
  test-first development specifications covering GraphQL client, adaptive
  pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback
  on decay curve parameterization, recency weighting for discussion
  contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00

186 lines
7.3 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
1. **Isolate scheduled behavior from manual `sync`**
Reasoning: Your current plan injects backoff into `handle_sync_cmd`, which affects all `lore sync` calls (including manual recovery runs). Scheduled behavior should be isolated so humans arent unexpectedly blocked by service backoff.
```diff
@@ Context
-`lore sync` runs a 4-stage pipeline (issues, MRs, docs, embeddings) that takes 2-4 minutes.
+`lore sync` remains the manual/operator command.
+`lore service run` (hidden/internal) is the scheduled execution entrypoint.
@@ Commands & User Journeys
+### `lore service run` (hidden/internal)
+**What it does:** Executes one scheduled sync attempt with service-only policy:
+- applies service backoff policy
+- records service run state
+- invokes sync pipeline with configured profile
+- updates retry state on success/failure
+
+**Invocation:** scheduler always runs:
+`lore --robot service run --reason timer`
@@ Backoff Integration into `handle_sync_cmd`
-Insert **after** config load but **before** the dry_run check:
+Do not add backoff checks to `handle_sync_cmd`.
+Backoff logic lives only in `handle_service_run`.
```
2. **Use DB as source-of-truth for service state (not a standalone JSON status file)**
Reasoning: You already have `sync_runs` in SQLite. A separate JSON status file creates split-brain and race/corruption risk. Keep JSON as optional cache/export only.
```diff
@@ Status File
-Location: `{get_data_dir()}/sync-status.json`
+Primary state location: SQLite (`service_state` table) + existing `sync_runs`.
+Optional mirror file: `{get_data_dir()}/sync-status.json` (best-effort export only).
@@ File-by-File Implementation Details
-### `src/core/sync_status.rs` (NEW)
+### `migrations/015_service_state.sql` (NEW)
+CREATE TABLE service_state (
+ id INTEGER PRIMARY KEY CHECK (id = 1),
+ installed INTEGER NOT NULL DEFAULT 0,
+ platform TEXT,
+ interval_seconds INTEGER,
+ profile TEXT NOT NULL DEFAULT 'balanced',
+ consecutive_failures INTEGER NOT NULL DEFAULT 0,
+ next_retry_at_ms INTEGER,
+ last_error_code TEXT,
+ last_error_message TEXT,
+ updated_at_ms INTEGER NOT NULL
+);
+
+### `src/core/service_state.rs` (NEW)
+- read/write state row
+- derive backoff/next_retry
+- join with latest `sync_runs` for status output
```
3. **Backoff policy should be configurable, jittered, and error-aware**
Reasoning: Fixed hardcoded backoff (`base=1800`) is wrong when user sets another interval. Also permanent failures (bad token/config) should not burn retries forever; they should enter paused/error state.
```diff
@@ Backoff Logic
-// Exponential: base * 2^failures, capped at 4 hours
+// Exponential with jitter: base * 2^(failures-1), capped, ±20% jitter
+// Applies only to transient errors.
+// Permanent errors set `paused_reason` and stop retries until user action.
@@ CLI Definition Changes
+ServiceCommand::Resume, // clear paused state / failures
+ServiceCommand::Run, // hidden
@@ Error Types
+ServicePaused, // scheduler paused due to permanent error
+ServiceCommandFailed, // OS command failure with stderr context
```
4. **Add a pipeline-level single-flight lock**
Reasoning: Current locking is in ingest stages; theres still overlap risk across full sync pipelines (docs/embed can overlap with another run). Add a top-level lock for scheduled/manual sync pipeline execution.
```diff
@@ Architecture
+Add `sync_pipeline` lock at top-level sync execution.
+Keep existing ingest lock (`sync`) for ingest internals.
@@ Backoff Integration into `handle_sync_cmd`
+Before starting sync pipeline, acquire `AppLock` with:
+name = "sync_pipeline"
+stale_lock_minutes = config.sync.stale_lock_minutes
+heartbeat_interval_seconds = config.sync.heartbeat_interval_seconds
```
5. **Dont embed token in service files by default**
Reasoning: Embedding PAT into unit/plist is a high-risk secret leak path. Make secure storage explicit and default-safe.
```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--token-source env-file|embedded]`
+Default: `env-file` (0600 perms, user-owned)
+`embedded` allowed only with explicit opt-in and warning
@@ Robot output
- "token_embedded": true
+ "token_source": "env_file"
@@ Human output
- Note: Your GITLAB_TOKEN is embedded in the service file.
+ Note: Token is stored in a user-private env file (0600).
```
6. **Introduce a command-runner abstraction with timeout + stderr capture**
Reasoning: `launchctl/systemctl/schtasks` calls are failure-prone; you need consistent error mapping and deterministic tests.
```diff
@@ Platform Backends
-exports free functions that dispatch via `#[cfg(target_os)]`
+exports backend + shared `CommandRunner`:
+- run(cmd, args, timeout)
+- capture stdout/stderr/exit code
+- map failure to `ServiceCommandFailed { cmd, exit_code, stderr }`
```
7. **Persist install manifest to avoid brittle file parsing**
Reasoning: Parsing timer/plist for interval/state is fragile and platform-format dependent. Persist a manifest with checksums and expected artifacts.
```diff
@@ Platform Backends
-Same pattern for ... `get_interval_seconds()`
+Add manifest: `{data_dir}/service-manifest.json`
+Stores platform, interval, profile, generated files, and command.
+`service status` reads manifest first, then verifies platform state.
@@ Acceptance criteria
+Install is idempotent:
+- if manifest+files already match, report `no_change: true`
+- if drift detected, reconcile and rewrite
```
8. **Make schedule profile explicit (`fast|balanced|full`)**
Reasoning: This makes the feature more useful and performance-tunable without requiring users to understand internal flags.
```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--profile fast|balanced|full]`
+
+Profiles:
+- fast: `sync --no-docs --no-embed`
+- balanced (default): `sync --no-embed`
+- full: `sync`
```
9. **Upgrade `service status` to include scheduler health + recent run summary**
Reasoning: Single last-sync snapshot is too shallow. Include recent attempts and whether scheduler is paused/backing off/running.
```diff
@@ `lore service status`
-What it does: Shows whether the service is installed, its configuration, last sync result, and next scheduled run.
+What it does: Shows install state, scheduler state (running/backoff/paused), recent runs, and next run estimate.
@@ Robot output
- "last_sync": { ... },
- "backoff": null
+ "scheduler_state": "running|backoff|paused|idle",
+ "last_sync": { ... },
+ "recent_runs": [{"run_id":"...","status":"...","started_at_iso":"..."}],
+ "backoff": null,
+ "paused_reason": null
```
10. **Strengthen tests around determinism and cross-platform generation**
Reasoning: Time-based backoff and shell quoting are classic flaky points. Add fake clock + fake command runner for deterministic tests.
```diff
@@ Testing Strategy
+Add deterministic test seams:
+- `Clock` trait for backoff/now calculations
+- `CommandRunner` trait for backend command execution
+
+Add tests:
+- transient vs permanent error classification
+- backoff schedule with jitter bounds
+- manifest drift reconciliation
+- quoting/escaping for paths with spaces and special chars
+- `service run` does not modify manual `sync` behavior
```
If you want, I can rewrite your full plan as a single clean revised document with these changes already integrated (instead of patch fragments).