Files
gitlore/plans/lore-service.feedback-5.md
Taylor Eernisse 2c9de1a6c3 docs: add lore-service, work-item-status-graphql, and time-decay plans
Three implementation plans with iterative cross-model refinement:

lore-service (5 iterations):
  HTTP service layer exposing lore's SQLite data via REST/SSE for
  integration with external tools (dashboards, IDE extensions, chat
  agents). Covers authentication, rate limiting, caching strategy, and
  webhook-driven sync triggers.

work-item-status-graphql (7 iterations + TDD appendix):
  Detailed implementation plan for the GraphQL-based work item status
  enrichment feature (now implemented). Includes the TDD appendix with
  test-first development specifications covering GraphQL client, adaptive
  pagination, ingestion orchestration, CLI display, and robot mode output.

time-decay-expert-scoring (iteration 5 feedback):
  Updates to the existing time-decay scoring plan incorporating feedback
  on decay curve parameterization, recency weighting for discussion
  contributions, and staleness detection thresholds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 08:12:17 -05:00

196 lines
8.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
I reviewed the full plan and avoided everything already listed in `## Rejected Recommendations`. These are the highest-impact revisions Id make.
1. **Fix identity model inconsistency and prevent `--name` alias collisions**
Why this improves the plan: your text says identity includes workspace root, but the current derivation code does not. Also, using `--name` as the actual `service_id` risks accidental cross-project collisions and destructive updates.
```diff
--- a/plan.md
+++ b/plan.md
@@ Key Design Principles / 2. Project-Scoped Service Identity
- Each installed service gets a unique `service_id` derived from a canonical identity tuple: the config file path, sorted GitLab project URLs, and workspace root.
+ Each installed service gets an immutable `identity_hash` derived from a canonical identity tuple:
+ workspace root + canonical config path + sorted normalized project URLs.
+ `service_id` remains the scheduler identifier; `--name` is a human alias only.
+ If `--name` collides with an existing service that has a different `identity_hash`, install fails with an actionable error.
@@ Install Manifest / Schema
+ /// Immutable identity hash for collision-safe matching across reinstalls
+ pub identity_hash: String,
+ /// Optional human-readable alias passed via --name
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub service_alias: Option<String>,
+ /// Canonical workspace root used in identity derivation
+ pub workspace_root: String,
@@ service_id derivation
-pub fn compute_service_id(config_path: &Path, project_urls: &[&str]) -> String
+pub fn compute_identity_hash(workspace_root: &Path, config_path: &Path, project_urls: &[&str]) -> String
```
2. **Add lock protocol to eliminate uninstall/run race conditions**
Why this improves the plan: today `service run` does not take admin lock, and admin commands do not take pipeline lock. `uninstall` can race with an active run and remove files mid-execution.
```diff
--- a/plan.md
+++ b/plan.md
@@ Key Design Principles / 6. Serialized Admin Mutations
- The `service run` entrypoint does NOT acquire the admin lock — it only acquires the `sync_pipeline` lock
+ The `service run` entrypoint acquires only `sync_pipeline`.
+ Destructive admin operations (`install` overwrite, `uninstall`, `repair --regenerate`) must:
+ 1) acquire `service-admin-{service_id}`
+ 2) disable scheduler backend entrypoint
+ 3) acquire `sync_pipeline` lock with timeout
+ 4) mutate/remove files
+ This lock ordering is mandatory to prevent deadlocks and run/delete races.
@@ lore service uninstall / User journey
- 4. Runs platform-specific disable command
- 5. Removes service files from disk
+ 4. Acquires `sync_pipeline` lock (after disabling scheduler) with bounded wait
+ 5. Removes service files from disk only after lock acquisition
```
3. **Make transient handling `Retry-After` aware**
Why this improves the plan: rate-limit and 503 responses often carry retry hints. Ignoring them causes useless retries and longer outages.
```diff
--- a/plan.md
+++ b/plan.md
@@ Transient vs permanent error classification
-| Transient | Retry with backoff | Network timeout, rate limited, DB locked, 5xx from GitLab |
+| Transient | Retry with adaptive backoff | Network timeout, DB locked, 5xx from GitLab |
+| Transient (hinted) | Respect server retry hint | Rate limited with Retry-After/X-RateLimit-Reset |
@@ Backoff Logic
+ If an error includes a retry hint (e.g., `Retry-After`), set:
+ `next_retry_at_ms = max(computed_backoff, hinted_retry_at_ms)`.
+ Persist `backoff_reason` for status visibility.
```
4. **Decouple optional stage cadence from core sync interval**
Why this improves the plan: running docs/embeddings every 530 minutes is expensive and unnecessary. Separate freshness windows reduce cost/latency while keeping core data fresh.
```diff
--- a/plan.md
+++ b/plan.md
@@ Sync profiles
-| `balanced` (default) | `--no-embed` | Issues + MRs + doc generation |
-| `full` | (none) | Full pipeline including embeddings |
+| `balanced` (default) | core every interval, docs every 60m, no embeddings | Fast + useful docs |
+| `full` | core every interval, docs every interval, embeddings every 6h (default) | Full freshness with bounded cost |
@@ Status File / StageResult
+ /// true when stage intentionally skipped due freshness window
+ #[serde(default)]
+ pub skipped: bool,
@@ lore service run / Stage-aware execution
+ Optional stages may be skipped when their last successful run is within configured freshness windows.
+ Skipped optional stages do not count as failures and are recorded explicitly.
```
5. **Give Windows parity for secure token handling (env-file + wrapper)**
Why this improves the plan: current Windows path requires global/system env and has poor UX. A wrapper+env-file model gives platform parity and avoids global token exposure.
```diff
--- a/plan.md
+++ b/plan.md
@@ Token storage strategies
-| On Windows, neither strategy applies — the token must be in the user's system environment
+| On Windows, `env-file` is supported via a generated wrapper script (`service-run-{service_id}.cmd` or `.ps1`)
+| that reads `{data_dir}/service-env-{service_id}` and launches `lore --robot service run ...`.
+| `embedded` remains opt-in and warned as less secure.
@@ Windows: schtasks
- Token handling on Windows: The env var must be set system-wide via `setx`
+ Token handling on Windows:
+ - `env-file` (default): wrapper script reads token from private file at runtime
+ - `embedded`: token passed via wrapper-set environment variable
+ - `system_env`: still supported as fallback
```
6. **Add run heartbeat and stale-run detection**
Why this improves the plan: `running` state can become misleading after crashes or stale locks. Heartbeat metadata makes status accurate and improves incident triage.
```diff
--- a/plan.md
+++ b/plan.md
@@ Status File / Schema
+ /// In-flight run metadata for crash/stale detection
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub current_run: Option<CurrentRunState>,
+
+pub struct CurrentRunState {
+ pub run_id: String,
+ pub started_at_ms: i64,
+ pub last_heartbeat_ms: i64,
+ pub pid: u32,
+}
@@ lore service status
- - `running` — currently executing (sync_pipeline lock held)
+ - `running` — currently executing with live heartbeat
+ - `running_stale` — in-flight metadata exists but heartbeat exceeded stale threshold
```
7. **Upgrade drift detection from “loaded/unloaded” to spec-level drift**
Why this improves the plan: platform state alone misses manual edits to unit/plist/wrapper files. Spec-hash drift gives reliable “what changed?” diagnostics and safe regeneration.
```diff
--- a/plan.md
+++ b/plan.md
@@ Install Manifest / Schema
+ /// Hash of generated scheduler artifacts and command spec
+ pub spec_hash: String,
@@ lore service status
- Detects manifest/platform drift and reports it
+ Detects:
+ - platform drift (loaded/unloaded mismatch)
+ - spec drift (artifact content hash mismatch)
+ - command drift (sync command differs from manifest)
@@ lore service repair
+ Add `--regenerate` to rewrite scheduler artifacts from manifest when spec drift is detected.
+ This is non-destructive and does not delete status/log history.
```
8. **Add safe operational modes: `install --dry-run` and `doctor --fix`**
Why this improves the plan: dry-run reduces risk before writing OS scheduler files; fix-mode improves operator ergonomics and lowers support burden.
```diff
--- a/plan.md
+++ b/plan.md
@@ lore service install
+ Add `--dry-run`:
+ - validates config/token/prereqs
+ - renders service files and planned commands
+ - writes nothing, executes nothing
@@ lore service doctor
+ Add `--fix` for safe, non-destructive remediations:
+ - create missing dirs
+ - correct file permissions on env/wrapper files
+ - run `systemctl --user daemon-reload` when applicable
+ - report applied fixes in robot output
```
9. **Define explicit schema migration behavior (not just `schema_version` fields)**
Why this improves the plan: version fields without migration policy become operational risk during upgrades.
```diff
--- a/plan.md
+++ b/plan.md
@@ ServiceManifest Read/Write
- `ServiceManifest::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `ServiceManifest::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
+ - Migrates known older schema versions to current in-memory model
+ - Rewrites migrated file atomically
+ - Fails with actionable `ServiceCorruptState` for unknown future major versions
@@ SyncStatusFile Read/Write
- `SyncStatusFile::read(path: &Path) -> Result<Option<Self>, LoreError>`
+ `SyncStatusFile::read_and_migrate(path: &Path) -> Result<Option<Self>, LoreError>`
```
If you want, I can produce a fully rewritten v5 plan text that integrates all nine changes coherently section-by-section.