docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement:

- lore-service (5 iterations): HTTP service layer exposing lore's SQLite data via REST/SSE for integration with external tools (dashboards, IDE extensions, chat agents). Covers authentication, rate limiting, caching strategy, and webhook-driven sync triggers.
- work-item-status-graphql (7 iterations + TDD appendix): Detailed implementation plan for the GraphQL-based work item status enrichment feature (now implemented). Includes the TDD appendix with test-first development specifications covering the GraphQL client, adaptive pagination, ingestion orchestration, CLI display, and robot-mode output.
- time-decay-expert-scoring (iteration 5 feedback): Updates to the existing time-decay scoring plan incorporating feedback on decay-curve parameterization, recency weighting for discussion contributions, and staleness-detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
plans/lore-service.feedback-1.md (new file, 186 lines)
1. **Isolate scheduled behavior from manual `sync`**

Reasoning: Your current plan injects backoff into `handle_sync_cmd`, which affects all `lore sync` calls (including manual recovery runs). Scheduled behavior should be isolated so humans aren’t unexpectedly blocked by service backoff.

```diff
@@ Context
-`lore sync` runs a 4-stage pipeline (issues, MRs, docs, embeddings) that takes 2-4 minutes.
+`lore sync` remains the manual/operator command.
+`lore service run` (hidden/internal) is the scheduled execution entrypoint.

@@ Commands & User Journeys
+### `lore service run` (hidden/internal)
+**What it does:** Executes one scheduled sync attempt with service-only policy:
+- applies the service backoff policy
+- records service run state
+- invokes the sync pipeline with the configured profile
+- updates retry state on success/failure
+
+**Invocation:** the scheduler always runs:
+`lore --robot service run --reason timer`

@@ Backoff Integration into `handle_sync_cmd`
-Insert **after** config load but **before** the dry_run check:
+Do not add backoff checks to `handle_sync_cmd`.
+Backoff logic lives only in `handle_service_run`.
```
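The split above can be sketched as two thin entry points over one shared pipeline. This is illustrative only, not the real lore codebase: `ServiceState`, `run_sync_pipeline`, and the handler names are hypothetical.

```rust
// Sketch only: manual sync and scheduled service-run share one pipeline,
// but backoff policy lives exclusively on the scheduled path.

struct ServiceState {
    next_retry_at_ms: Option<u64>,
    consecutive_failures: u32,
}

fn run_sync_pipeline() -> Result<(), String> {
    // The shared 4-stage pipeline (issues, MRs, docs, embeddings) would run here.
    Ok(())
}

/// Manual path: no backoff checks, so operators are never blocked.
fn handle_sync_cmd() -> Result<(), String> {
    run_sync_pipeline()
}

/// Scheduled path: backoff applies only here. Returns Ok(false) when the
/// attempt is skipped because we are still inside the backoff window.
fn handle_service_run(state: &mut ServiceState, now_ms: u64) -> Result<bool, String> {
    if let Some(t) = state.next_retry_at_ms {
        if now_ms < t {
            return Ok(false); // still backing off, skip this attempt
        }
    }
    match run_sync_pipeline() {
        Ok(()) => {
            state.consecutive_failures = 0;
            state.next_retry_at_ms = None;
            Ok(true)
        }
        Err(e) => {
            state.consecutive_failures += 1;
            Err(e)
        }
    }
}

fn main() {
    let mut st = ServiceState { next_retry_at_ms: Some(10_000), consecutive_failures: 0 };
    // A scheduled run before next_retry_at_ms is skipped; manual sync still works.
    assert_eq!(handle_service_run(&mut st, 5_000), Ok(false));
    assert_eq!(handle_service_run(&mut st, 20_000), Ok(true));
    assert!(handle_sync_cmd().is_ok());
    println!("ok");
}
```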
2. **Use the DB as source of truth for service state (not a standalone JSON status file)**

Reasoning: You already have `sync_runs` in SQLite. A separate JSON status file creates split-brain and race/corruption risk. Keep JSON as an optional cache/export only.

```diff
@@ Status File
-Location: `{get_data_dir()}/sync-status.json`
+Primary state location: SQLite (`service_state` table) + existing `sync_runs`.
+Optional mirror file: `{get_data_dir()}/sync-status.json` (best-effort export only).

@@ File-by-File Implementation Details
-### `src/core/sync_status.rs` (NEW)
+### `migrations/015_service_state.sql` (NEW)
+CREATE TABLE service_state (
+  id INTEGER PRIMARY KEY CHECK (id = 1),
+  installed INTEGER NOT NULL DEFAULT 0,
+  platform TEXT,
+  interval_seconds INTEGER,
+  profile TEXT NOT NULL DEFAULT 'balanced',
+  consecutive_failures INTEGER NOT NULL DEFAULT 0,
+  next_retry_at_ms INTEGER,
+  last_error_code TEXT,
+  last_error_message TEXT,
+  updated_at_ms INTEGER NOT NULL
+);
+
+### `src/core/service_state.rs` (NEW)
+- read/write the state row
+- derive backoff/next_retry
+- join with the latest `sync_runs` for status output
```
3. **Backoff policy should be configurable, jittered, and error-aware**

Reasoning: A fixed hardcoded backoff (`base=1800`) is wrong when the user sets another interval. Also, permanent failures (bad token/config) should not burn retries forever; they should enter a paused/error state.

```diff
@@ Backoff Logic
-// Exponential: base * 2^failures, capped at 4 hours
+// Exponential with jitter: base * 2^(failures-1), capped, ±20% jitter
+// Applies only to transient errors.
+// Permanent errors set `paused_reason` and stop retries until user action.

@@ CLI Definition Changes
+ServiceCommand::Resume, // clear paused state / failures
+ServiceCommand::Run,    // hidden

@@ Error Types
+ServicePaused,        // scheduler paused due to permanent error
+ServiceCommandFailed, // OS command failure with stderr context
```
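The proposed policy can be sketched as a pure function, which also makes the jitter bound testable. A minimal sketch, assuming the names `backoff_secs`, `next_retry`, and an injected jitter value (the plan does not specify a jitter source):

```rust
// Sketch of the suggested backoff policy; names and the injected-jitter
// design are assumptions, not the plan's actual API.

const MAX_BACKOFF_SECS: u64 = 4 * 3600; // cap (e.g. 4 hours)

enum ErrorClass {
    Transient,
    Permanent,
}

/// Exponential backoff: base * 2^(failures-1), capped, with ±20% jitter.
/// `jitter` must be in [-0.2, 0.2]; callers inject it so tests are deterministic.
fn backoff_secs(base_secs: u64, failures: u32, jitter: f64) -> Option<u64> {
    if failures == 0 {
        return None;
    }
    let exp = base_secs.saturating_mul(1u64 << (failures - 1).min(16));
    let capped = exp.min(MAX_BACKOFF_SECS) as f64;
    Some((capped * (1.0 + jitter)) as u64)
}

/// Permanent errors never schedule a retry; they pause the scheduler instead.
fn next_retry(class: ErrorClass, base_secs: u64, failures: u32, jitter: f64) -> Option<u64> {
    match class {
        ErrorClass::Transient => backoff_secs(base_secs, failures, jitter),
        ErrorClass::Permanent => None, // the real code would set `paused_reason` here
    }
}

fn main() {
    assert_eq!(backoff_secs(1800, 1, 0.0), Some(1800));
    assert_eq!(backoff_secs(1800, 3, 0.0), Some(7200));
    assert_eq!(backoff_secs(1800, 10, 0.0), Some(MAX_BACKOFF_SECS)); // capped
    assert_eq!(next_retry(ErrorClass::Permanent, 1800, 5, 0.0), None);
    println!("ok");
}
```

Note the base comes from configuration (the user's chosen interval), not a hardcoded 1800, which is the point of the feedback item.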
4. **Add a pipeline-level single-flight lock**

Reasoning: Current locking is in the ingest stages; there’s still overlap risk across full sync pipelines (docs/embed can overlap with another run). Add a top-level lock for scheduled/manual sync pipeline execution.

```diff
@@ Architecture
+Add a `sync_pipeline` lock at top-level sync execution.
+Keep the existing ingest lock (`sync`) for ingest internals.

@@ Backoff Integration into `handle_sync_cmd`
+Before starting the sync pipeline, acquire `AppLock` with:
+name = "sync_pipeline"
+stale_lock_minutes = config.sync.stale_lock_minutes
+heartbeat_interval_seconds = config.sync.heartbeat_interval_seconds
```
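The single-flight shape can be sketched with an exclusive lock file. The plan's `AppLock` (stale-lock detection, heartbeats) is richer than this; the sketch below only shows the acquire/release contract, and the type and path are illustrative.

```rust
// Minimal single-flight lock sketch: `create_new` is an atomic
// "create only if absent", so exactly one pipeline run can hold the lock.

use std::fs::OpenOptions;
use std::path::{Path, PathBuf};

struct PipelineLock {
    path: PathBuf,
}

impl PipelineLock {
    /// Succeeds only if no other pipeline holds the lock.
    fn acquire(dir: &Path) -> Option<PipelineLock> {
        let path = dir.join("sync_pipeline.lock");
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(_) => Some(PipelineLock { path }),
            Err(_) => None, // already held (or I/O error); caller skips this run
        }
    }
}

impl Drop for PipelineLock {
    fn drop(&mut self) {
        let _ = std::fs::remove_file(&self.path);
    }
}

fn main() {
    let dir = std::env::temp_dir().join(format!("lock-demo-{}", std::process::id()));
    std::fs::create_dir_all(&dir).unwrap();
    let first = PipelineLock::acquire(&dir).expect("first acquire succeeds");
    assert!(PipelineLock::acquire(&dir).is_none()); // overlapping run is rejected
    drop(first);
    assert!(PipelineLock::acquire(&dir).is_some()); // released lock can be retaken
    println!("ok");
}
```

A lock file alone cannot detect a crashed holder, which is why the plan's stale-lock timeout and heartbeat parameters matter.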
5. **Don’t embed the token in service files by default**

Reasoning: Embedding a PAT into a unit/plist file is a high-risk secret-leak path. Make secure storage explicit and default-safe.

```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--token-source env-file|embedded]`
+Default: `env-file` (0600 perms, user-owned)
+`embedded` allowed only with explicit opt-in and a warning

@@ Robot output
- "token_embedded": true
+ "token_source": "env_file"

@@ Human output
- Note: Your GITLAB_TOKEN is embedded in the service file.
+ Note: Token is stored in a user-private env file (0600).
```
6. **Introduce a command-runner abstraction with timeout + stderr capture**

Reasoning: `launchctl`/`systemctl`/`schtasks` calls are failure-prone; you need consistent error mapping and deterministic tests.

```diff
@@ Platform Backends
-exports free functions that dispatch via `#[cfg(target_os)]`
+exports backend + shared `CommandRunner`:
+- run(cmd, args, timeout)
+- capture stdout/stderr/exit code
+- map failure to `ServiceCommandFailed { cmd, exit_code, stderr }`
```
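The seam might look like this sketch. `ServiceCommandFailed` mirrors the error variant suggested above; the trait shape and the fake are assumptions, and the real runner would also enforce the timeout (e.g. by polling `try_wait`), which is omitted here.

```rust
// Sketch of a `CommandRunner` seam: one real implementation that shells out,
// one fake that returns canned results for deterministic backend tests.

use std::process::Command;
use std::time::Duration;

#[derive(Debug, PartialEq, Clone)]
struct ServiceCommandFailed {
    cmd: String,
    exit_code: Option<i32>,
    stderr: String,
}

trait CommandRunner {
    fn run(&self, cmd: &str, args: &[&str], timeout: Duration)
        -> Result<String, ServiceCommandFailed>;
}

/// Real backend: runs the OS command and maps failure to a typed error.
struct OsRunner;
impl CommandRunner for OsRunner {
    fn run(&self, cmd: &str, args: &[&str], _timeout: Duration)
        -> Result<String, ServiceCommandFailed> {
        let out = Command::new(cmd).args(args).output().map_err(|e| ServiceCommandFailed {
            cmd: cmd.into(),
            exit_code: None,
            stderr: e.to_string(),
        })?;
        if out.status.success() {
            Ok(String::from_utf8_lossy(&out.stdout).into_owned())
        } else {
            Err(ServiceCommandFailed {
                cmd: cmd.into(),
                exit_code: out.status.code(),
                stderr: String::from_utf8_lossy(&out.stderr).into_owned(),
            })
        }
    }
}

/// Test fake: backend logic can be exercised without touching the OS.
struct FakeRunner {
    result: Result<String, ServiceCommandFailed>,
}
impl CommandRunner for FakeRunner {
    fn run(&self, _: &str, _: &[&str], _: Duration) -> Result<String, ServiceCommandFailed> {
        self.result.clone()
    }
}

fn main() {
    let fake = FakeRunner {
        result: Err(ServiceCommandFailed {
            cmd: "systemctl".into(),
            exit_code: Some(1),
            stderr: "unit not found".into(),
        }),
    };
    let err = fake
        .run("systemctl", &["--user", "start", "lore"], Duration::from_secs(5))
        .unwrap_err();
    assert_eq!(err.exit_code, Some(1));
    assert!(err.stderr.contains("unit not found"));
    println!("ok");
}
```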
7. **Persist an install manifest to avoid brittle file parsing**

Reasoning: Parsing the timer/plist for interval/state is fragile and platform-format dependent. Persist a manifest with checksums and the expected artifacts.

```diff
@@ Platform Backends
-Same pattern for ... `get_interval_seconds()`
+Add manifest: `{data_dir}/service-manifest.json`
+Stores platform, interval, profile, generated files, and command.
+`service status` reads the manifest first, then verifies platform state.

@@ Acceptance criteria
+Install is idempotent:
+- if manifest+files already match, report `no_change: true`
+- if drift is detected, reconcile and rewrite
```
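The idempotence check reduces to comparing a stored manifest against the desired one. A sketch under assumptions: the field names echo the suggestion above, and the std-hasher checksum stands in for whatever checksum scheme the plan adopts (serialization to JSON is omitted to keep this self-contained).

```rust
// Sketch: idempotent install decided by manifest comparison, with a checksum
// over generated service files so file drift is also detected.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    platform: String,
    interval_seconds: u64,
    profile: String,
    files_checksum: u64, // checksum over generated unit/plist/task files
}

fn checksum(unit_file_contents: &str) -> u64 {
    let mut h = DefaultHasher::new();
    unit_file_contents.hash(&mut h);
    h.finish()
}

/// `no_change: true` when the stored manifest matches the desired state;
/// any mismatch (or no manifest at all) means reconcile and rewrite.
fn install_is_no_change(stored: Option<&Manifest>, desired: &Manifest) -> bool {
    stored == Some(desired)
}

fn main() {
    let desired = Manifest {
        platform: "linux".into(),
        interval_seconds: 1800,
        profile: "balanced".into(),
        files_checksum: checksum("[Unit]\nDescription=lore sync\n"),
    };
    assert!(install_is_no_change(Some(&desired), &desired)); // re-install is a no-op
    let drifted = Manifest { interval_seconds: 900, ..desired.clone() };
    assert!(!install_is_no_change(Some(&drifted), &desired)); // drift -> reconcile
    assert!(!install_is_no_change(None, &desired)); // first install
    println!("ok");
}
```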
8. **Make the schedule profile explicit (`fast|balanced|full`)**

Reasoning: This makes the feature more useful and performance-tunable without requiring users to understand internal flags.

```diff
@@ `lore service install [--interval 30m]`
+`lore service install [--interval 30m] [--profile fast|balanced|full]`
+
+Profiles:
+- fast: `sync --no-docs --no-embed`
+- balanced (default): `sync --no-embed`
+- full: `sync`
```
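The profile table above maps naturally onto a small enum; the types below are illustrative, not the plan's actual code.

```rust
// Sketch: one enum is the single source of truth for both CLI parsing
// and the argv the scheduler passes to the sync command.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Profile {
    Fast,
    Balanced, // default
    Full,
}

fn parse_profile(s: &str) -> Option<Profile> {
    match s {
        "fast" => Some(Profile::Fast),
        "balanced" => Some(Profile::Balanced),
        "full" => Some(Profile::Full),
        _ => None, // unknown profiles are rejected, not silently defaulted
    }
}

fn sync_args(p: Profile) -> Vec<&'static str> {
    match p {
        Profile::Fast => vec!["sync", "--no-docs", "--no-embed"],
        Profile::Balanced => vec!["sync", "--no-embed"],
        Profile::Full => vec!["sync"],
    }
}

fn main() {
    assert_eq!(sync_args(Profile::Fast), vec!["sync", "--no-docs", "--no-embed"]);
    assert_eq!(sync_args(Profile::Full), vec!["sync"]);
    assert_eq!(parse_profile("balanced"), Some(Profile::Balanced));
    assert_eq!(parse_profile("turbo"), None);
    println!("ok");
}
```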
9. **Upgrade `service status` to include scheduler health + a recent-run summary**

Reasoning: A single last-sync snapshot is too shallow. Include recent attempts and whether the scheduler is paused/backing off/running.

```diff
@@ `lore service status`
-What it does: Shows whether the service is installed, its configuration, last sync result, and next scheduled run.
+What it does: Shows install state, scheduler state (running/backoff/paused), recent runs, and a next-run estimate.

@@ Robot output
- "last_sync": { ... },
- "backoff": null
+ "scheduler_state": "running|backoff|paused|idle",
+ "last_sync": { ... },
+ "recent_runs": [{"run_id":"...","status":"...","started_at_iso":"..."}],
+ "backoff": null,
+ "paused_reason": null
```
10. **Strengthen tests around determinism and cross-platform generation**

Reasoning: Time-based backoff and shell quoting are classic flaky points. Add a fake clock + fake command runner for deterministic tests.

```diff
@@ Testing Strategy
+Add deterministic test seams:
+- `Clock` trait for backoff/now calculations
+- `CommandRunner` trait for backend command execution
+
+Add tests:
+- transient vs permanent error classification
+- backoff schedule with jitter bounds
+- manifest drift reconciliation
+- quoting/escaping for paths with spaces and special chars
+- `service run` does not modify manual `sync` behavior
```
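The `Clock` seam could look like the following sketch (the trait name matches the suggestion; everything else, including the example unit under test, is illustrative):

```rust
// Sketch: a `Clock` trait with a real and a fake implementation, so
// backoff-window checks can be tested without sleeps or wall-clock flakiness.

use std::cell::Cell;
use std::time::{SystemTime, UNIX_EPOCH};

trait Clock {
    fn now_ms(&self) -> u64;
}

struct SystemClock;
impl Clock for SystemClock {
    fn now_ms(&self) -> u64 {
        SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_millis() as u64
    }
}

/// Test fake: time only moves when the test says so.
struct FakeClock {
    now: Cell<u64>,
}
impl FakeClock {
    fn advance(&self, ms: u64) {
        self.now.set(self.now.get() + ms);
    }
}
impl Clock for FakeClock {
    fn now_ms(&self) -> u64 {
        self.now.get()
    }
}

/// Example unit under test: is a scheduled run currently inside its backoff window?
fn in_backoff(clock: &dyn Clock, next_retry_at_ms: u64) -> bool {
    clock.now_ms() < next_retry_at_ms
}

fn main() {
    let clock = FakeClock { now: Cell::new(0) };
    assert!(in_backoff(&clock, 1_000));
    clock.advance(1_000);
    assert!(!in_backoff(&clock, 1_000)); // deterministic, no sleeps needed
    println!("ok");
}
```

Production code takes `&dyn Clock` (or a generic parameter), so `SystemClock` is only ever constructed at the entry point.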
If you want, I can rewrite your full plan as a single clean revised document with these changes already integrated (instead of patch fragments).