Compare commits
6 Commits
ee5c5f9645
...
a65ea2f56f
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
a65ea2f56f | ||
|
|
38da7ca47b | ||
|
|
86a51cddef | ||
|
|
f6d19a9467 | ||
|
|
362503d3bf | ||
|
|
329c8f4539 |
File diff suppressed because one or more lines are too long
@@ -1 +1 @@
|
||||
bd-1ep
|
||||
bd-3pz
|
||||
|
||||
93
Cargo.lock
generated
93
Cargo.lock
generated
@@ -296,6 +296,21 @@ dependencies = [
|
||||
"cfg-if",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "crossbeam-channel"
|
||||
version = "0.5.15"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "82b8f8f868b36967f9606790d1903570de9ceaf870a7bf9fbbd3016d636a2cb2"
|
||||
dependencies = [
|
||||
"crossbeam-utils",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "crossbeam-utils"
|
||||
version = "0.8.21"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
|
||||
|
||||
[[package]]
|
||||
name = "crossterm"
|
||||
version = "0.29.0"
|
||||
@@ -347,6 +362,15 @@ version = "0.1.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "092966b41edc516079bdf31ec78a2e0588d1d0c08f78b91d8307215928642b2b"
|
||||
|
||||
[[package]]
|
||||
name = "deranged"
|
||||
version = "0.5.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "ececcb659e7ba858fb4f10388c250a7252eb0a27373f1a72b8748afdd248e587"
|
||||
dependencies = [
|
||||
"powerfmt",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "dialoguer"
|
||||
version = "0.12.0"
|
||||
@@ -1108,6 +1132,7 @@ dependencies = [
|
||||
"thiserror",
|
||||
"tokio",
|
||||
"tracing",
|
||||
"tracing-appender",
|
||||
"tracing-subscriber",
|
||||
"url",
|
||||
"urlencoding",
|
||||
@@ -1183,6 +1208,12 @@ dependencies = [
|
||||
"windows-sys 0.61.2",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "num-conv"
|
||||
version = "0.2.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "cf97ec579c3c42f953ef76dbf8d55ac91fb219dde70e49aa4a6b7d74e9919050"
|
||||
|
||||
[[package]]
|
||||
name = "num-traits"
|
||||
version = "0.2.19"
|
||||
@@ -1343,6 +1374,12 @@ dependencies = [
|
||||
"zerovec",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "powerfmt"
|
||||
version = "0.2.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "439ee305def115ba05938db6eb1644ff94165c5ab5e9420d1c1bcedbba909391"
|
||||
|
||||
[[package]]
|
||||
name = "ppv-lite86"
|
||||
version = "0.2.21"
|
||||
@@ -1878,6 +1915,37 @@ dependencies = [
|
||||
"cfg-if",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "time"
|
||||
version = "0.3.46"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9da98b7d9b7dad93488a84b8248efc35352b0b2657397d4167e7ad67e5d535e5"
|
||||
dependencies = [
|
||||
"deranged",
|
||||
"itoa",
|
||||
"num-conv",
|
||||
"powerfmt",
|
||||
"serde_core",
|
||||
"time-core",
|
||||
"time-macros",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "time-core"
|
||||
version = "0.1.8"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
|
||||
|
||||
[[package]]
|
||||
name = "time-macros"
|
||||
version = "0.2.26"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "78cc610bac2dcee56805c99642447d4c5dbde4d01f752ffea0199aee1f601dc4"
|
||||
dependencies = [
|
||||
"num-conv",
|
||||
"time-core",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tinystr"
|
||||
version = "0.8.2"
|
||||
@@ -2003,6 +2071,18 @@ dependencies = [
|
||||
"tracing-core",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tracing-appender"
|
||||
version = "0.2.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "786d480bce6247ab75f005b14ae1624ad978d3029d9113f0a22fa1ac773faeaf"
|
||||
dependencies = [
|
||||
"crossbeam-channel",
|
||||
"thiserror",
|
||||
"time",
|
||||
"tracing-subscriber",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tracing-attributes"
|
||||
version = "0.1.31"
|
||||
@@ -2035,6 +2115,16 @@ dependencies = [
|
||||
"tracing-core",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tracing-serde"
|
||||
version = "0.2.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "704b1aeb7be0d0a84fc9828cae51dab5970fee5088f83d1dd7ee6f6246fc6ff1"
|
||||
dependencies = [
|
||||
"serde",
|
||||
"tracing-core",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "tracing-subscriber"
|
||||
version = "0.3.22"
|
||||
@@ -2045,12 +2135,15 @@ dependencies = [
|
||||
"nu-ansi-term",
|
||||
"once_cell",
|
||||
"regex-automata",
|
||||
"serde",
|
||||
"serde_json",
|
||||
"sharded-slab",
|
||||
"smallvec",
|
||||
"thread_local",
|
||||
"tracing",
|
||||
"tracing-core",
|
||||
"tracing-log",
|
||||
"tracing-serde",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
||||
@@ -52,7 +52,8 @@ libc = "0.2"
|
||||
|
||||
# Logging
|
||||
tracing = "0.1"
|
||||
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
|
||||
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
|
||||
tracing-appender = "0.2"
|
||||
|
||||
[dev-dependencies]
|
||||
tempfile = "3"
|
||||
|
||||
866
docs/prd-observability.md
Normal file
866
docs/prd-observability.md
Normal file
@@ -0,0 +1,866 @@
|
||||
# PRD: Observability Infrastructure for lore CLI
|
||||
|
||||
**Status:** Draft
|
||||
**Author:** Taylor + Claude
|
||||
**Date:** 2026-02-04
|
||||
|
||||
---
|
||||
|
||||
## 1. Problem Statement
|
||||
|
||||
lore currently has minimal observability. Logging is ephemeral (stderr only), there are no persistent log files, no performance metrics, no structured JSON log output, no verbosity controls beyond `RUST_LOG`, and no way to diagnose issues after the fact. When a sync fails at 3 AM in a cron job, or an embedding run takes 10x longer than usual, there is zero forensic data available.
|
||||
|
||||
### Current State
|
||||
|
||||
| Capability | Status |
|
||||
|---|---|
|
||||
| Log destination | stderr only, ephemeral |
|
||||
| Log persistence | None |
|
||||
| Structured output | Human-readable fmt only |
|
||||
| Verbosity control | `RUST_LOG` env var (no CLI flag) |
|
||||
| Performance metrics | Ad-hoc `Instant::now()` in 2 commands |
|
||||
| Timing in robot JSON | `elapsed_ms` in search and sync `meta` only |
|
||||
| Spans / correlation | None |
|
||||
| Log rotation | None |
|
||||
| Per-stage timing | None |
|
||||
| Rate limit / retry visibility | `tracing::warn!` only |
|
||||
| Error aggregation | None |
|
||||
| Historical comparison | None |
|
||||
|
||||
### What's Already in Place (to build on)
|
||||
|
||||
- `tracing` (0.1) + `tracing-subscriber` (0.3) with `env-filter` feature
|
||||
- Registry-based subscriber initialized in `src/main.rs:44-58` with a single `fmt::layer()` using `SuspendingWriter`
|
||||
- `SuspendingWriter` (`src/cli/progress.rs:25-73`) that coordinates log output with indicatif `MultiProgress` — buffers each log line, calls `MULTI.suspend()` on drop to clear progress bars before writing to stderr
|
||||
- `IngestDisplay` struct (`src/cli/commands/ingest.rs:65-104`) controlling UI verbosity with three modes: `interactive()` / `silent()` / `progress_only()`
|
||||
- Robot mode JSON envelope: `{ "ok": true, "data": {...}, "meta": {...} }` — used consistently in sync, search, sync-status, and doctor commands
|
||||
- XDG-compliant data directory at `~/.local/share/lore/`
|
||||
- `sync_runs` table (migration 001) with schema: `id`, `started_at`, `heartbeat_at`, `finished_at`, `status`, `command`, `error`, `metrics_json` — **exists but is never written to** (no INSERT anywhere in the codebase; `sync_status.rs` reads from it but always gets zero rows)
|
||||
- `uuid` crate (v1, v4 feature) already a dependency
|
||||
- Structured fields used in tracing calls (e.g., `info!(owner = %self.owner, ...)`)
|
||||
- `EnvFilter` currently hardcoded: `lore=info` + `warn` default directives
|
||||
- Global CLI flags in `src/cli/mod.rs:9-43`: `--config`, `--robot`, `-J`, `--color`, `--quiet` (all `global = true`)
|
||||
|
||||
---
|
||||
|
||||
## 2. Goals
|
||||
|
||||
### Primary
|
||||
|
||||
1. **Post-mortem debugging**: Any failed or slow run can be diagnosed after the fact from persistent, structured log files.
|
||||
2. **Performance visibility**: Every sync/ingest/embed/search operation reports granular stage-level timing, both to the terminal and to persistent storage.
|
||||
3. **Ergonomic verbosity**: Users and agents control log verbosity through CLI flags (`-v`, `-vv`, `-vvv`) without needing to know `RUST_LOG` syntax.
|
||||
4. **Machine-parseable logs**: A JSON log mode for piping into log aggregators (jq, Datadog, Loki, etc.).
|
||||
5. **Agent-friendly metrics**: Robot mode JSON output includes comprehensive timing breakdowns for every command, enabling automated monitoring.
|
||||
|
||||
### Secondary
|
||||
|
||||
6. **Log rotation and retention**: Log files don't grow unbounded; old logs are automatically cleaned up.
|
||||
7. **Correlation IDs**: Every sync run gets a unique ID that connects log lines, database records, and robot output.
|
||||
8. **Rate limit and retry transparency**: Every rate-limited request and retry is visible in logs with full context.
|
||||
9. **Sync history with metrics**: The `sync_runs` table is enriched with per-stage timing, item counts, and error counts so `lore sync-status` becomes a real dashboard.
|
||||
|
||||
### Non-Goals
|
||||
|
||||
- External telemetry export (OpenTelemetry, Prometheus) -- out of scope for v1.
|
||||
- Real-time log streaming / tailing UI.
|
||||
- Alerting or notification systems.
|
||||
- Distributed tracing across multiple lore instances.
|
||||
|
||||
---
|
||||
|
||||
## 3. Research Foundation
|
||||
|
||||
### 3.1 The Three Pillars of Observability
|
||||
|
||||
Academic and industry consensus (Gholamian & Ward 2021, "A Comprehensive Survey of Logging in Software") identifies three pillars:
|
||||
|
||||
1. **Logs** -- Discrete events with context. The foundation.
|
||||
2. **Metrics** -- Numerical measurements over time (counters, gauges, histograms).
|
||||
3. **Traces** -- Causally ordered spans representing operations.
|
||||
|
||||
For a CLI tool (not a long-running service), the mapping is:
|
||||
|
||||
| Pillar | CLI Equivalent |
|
||||
|---|---|
|
||||
| Logs | Structured log files per invocation |
|
||||
| Metrics | Per-stage timing, item counts, error counts stored in DB |
|
||||
| Traces | Span hierarchy within a single invocation (sync -> ingest issues -> fetch page N -> sync discussions) |
|
||||
|
||||
### 3.2 Structured Logging Best Practices
|
||||
|
||||
From Duan et al. 2025 ("PDLogger: Automated Logging Framework for Practical Software Development") and industry practice:
|
||||
|
||||
- **Always structured**: JSON or key=value, never free-form prose in production logs.
|
||||
- **Contextual fields propagate**: A sync_run_id set at the top level appears in every downstream log line.
|
||||
- **Levels have semantic meaning**:
|
||||
- `ERROR`: Operation failed, requires attention.
|
||||
- `WARN`: Degraded behavior (rate limited, retry, skip).
|
||||
- `INFO`: Significant state transitions (stage start/complete, items processed).
|
||||
- `DEBUG`: Detailed operational data (page boundaries, individual API calls).
|
||||
- `TRACE`: Wire-level detail (request/response bodies, SQL queries).
|
||||
|
||||
### 3.3 CLI Verbosity Conventions
|
||||
|
||||
From the GNU Coding Standards, POSIX conventions, and modern Rust CLI tools (ripgrep, fd, cargo):
|
||||
|
||||
| Pattern | Meaning | Precedent |
|
||||
|---|---|---|
|
||||
| (default) | INFO for app, WARN for deps | cargo, rustc |
|
||||
| `-q` / `--quiet` | Suppress non-error output | ripgrep, fd, cargo |
|
||||
| `-v` | DEBUG for app | ripgrep, fd |
|
||||
| `-vv` | DEBUG for app + deps | cargo |
|
||||
| `-vvv` | TRACE for everything | cargo, curl |
|
||||
| `RUST_LOG=...` | Fine-grained override | Universal in Rust |
|
||||
|
||||
The `-v` flag should feel familiar to anyone who has used cargo, curl, or ssh.
|
||||
|
||||
### 3.4 Log File Rotation
|
||||
|
||||
`tracing-appender` (from the tokio-rs/tracing ecosystem) provides:
|
||||
|
||||
- **Daily rotation**: New file per day, named `lore.2026-02-04.log`.
|
||||
- **Non-blocking writes**: Dedicated writer thread, zero impact on main async runtime.
|
||||
- **Configurable retention**: Delete files older than N days.
|
||||
|
||||
This is the canonical solution in the Rust tracing ecosystem and requires no custom code.
|
||||
|
||||
### 3.5 Performance Metrics for CLI Tools
|
||||
|
||||
Inspired by hyperfine's approach to benchmarking and cargo's `--timings` flag:
|
||||
|
||||
- Report wall-clock time per stage.
|
||||
- Report item throughput (items/sec).
|
||||
- Store historical runs for trend comparison.
|
||||
- Present timing data in both human-readable and machine-readable formats.
|
||||
|
||||
---
|
||||
|
||||
## 4. Design
|
||||
|
||||
### 4.1 Architecture Overview
|
||||
|
||||
```
|
||||
CLI Invocation
|
||||
|
|
||||
+----------+----------+
|
||||
| |
|
||||
Interactive Mode Robot Mode
|
||||
| |
|
||||
+---stderr (human fmt) stdout (JSON envelope)
|
||||
| | |
|
||||
| progress bars { ok, data, meta: {
|
||||
| colored output elapsed_ms,
|
||||
| stages: [...],
|
||||
| run_id
|
||||
| }}
|
||||
|
|
||||
Log Subscribers (layered)
|
||||
|
|
||||
+----+----+--------+
|
||||
| | |
|
||||
stderr file (future:
|
||||
(fmt) (JSON) OTLP)
|
||||
```
|
||||
|
||||
### 4.2 Subscriber Stack
|
||||
|
||||
Replace the current single-layer subscriber with a layered registry. Each layer has its own filter:
|
||||
|
||||
```
|
||||
registry()
|
||||
.with(stderr_layer.with_filter(stderr_filter)) // Human-readable, SuspendingWriter, -v controlled
|
||||
.with(file_layer.with_filter(file_filter)) // JSON, daily rotation, always DEBUG+
|
||||
```
|
||||
|
||||
**stderr layer**: Same `fmt::layer()` as today with `SuspendingWriter`, but level controlled by `-v` flags. When `--log-format json` is passed, this layer switches to `fmt::layer().json()` (same JSON format as file layer, but still routed through `SuspendingWriter` for progress bar coordination).
|
||||
|
||||
**file layer**: Always-on JSON output to `~/.local/share/lore/logs/`, daily rotation via `tracing-appender`. Uses its own `EnvFilter` set to `lore=debug,warn` regardless of `-v` flags, ensuring post-mortem data is always available. The file layer does NOT use `SuspendingWriter` — it writes to a file, not stderr, so progress bar coordination is unnecessary.
|
||||
|
||||
**Filter architecture**: Per-layer filtering (not a single shared `EnvFilter`) is required because the file layer must always be at DEBUG+ while stderr follows `-v`. `tracing-subscriber`'s `Layer::with_filter()` method enables this.
|
||||
|
||||
**`RUST_LOG` override**: When `RUST_LOG` is set, it overrides BOTH layer filters. This is the expert escape hatch.
|
||||
|
||||
**Current subscriber** (`src/main.rs:44-58`):
|
||||
```rust
|
||||
tracing_subscriber::registry()
|
||||
.with(
|
||||
tracing_subscriber::fmt::layer()
|
||||
.with_target(false)
|
||||
.with_writer(lore::cli::progress::SuspendingWriter),
|
||||
)
|
||||
.with(
|
||||
EnvFilter::from_default_env()
|
||||
.add_directive("lore=info".parse().unwrap())
|
||||
.add_directive("warn".parse().unwrap()),
|
||||
)
|
||||
.init();
|
||||
```
|
||||
|
||||
This will be replaced by the dual-layer setup. The `SuspendingWriter` integration and `with_target(false)` on the stderr layer remain unchanged.
|
||||
|
||||
### 4.3 Verbosity Levels
|
||||
|
||||
#### stderr layer (controlled by `-v` flags)
|
||||
|
||||
| Flags | App Level | Dep Level | Behavior |
|
||||
|---|---|---|---|
|
||||
| (none) | INFO | WARN | Default. Stage transitions, summaries. |
|
||||
| `-q` | WARN | ERROR | Errors and warnings only. |
|
||||
| `-v` | DEBUG | WARN | Detailed app behavior. API pages, skip reasons. |
|
||||
| `-vv` | DEBUG | INFO | App + dependency detail. HTTP client, SQLite. |
|
||||
| `-vvv` | TRACE | DEBUG | Everything. Wire-level detail. |
|
||||
| `RUST_LOG=...` | (overrides all) | (overrides all) | Expert escape hatch. |
|
||||
|
||||
Precedence: `RUST_LOG` > `-v` flags > defaults. This matches cargo's behavior.
|
||||
|
||||
#### file layer (independent of `-v` flags)
|
||||
|
||||
| Condition | App Level | Dep Level |
|
||||
|---|---|---|
|
||||
| Always (default) | DEBUG | WARN |
|
||||
| `RUST_LOG=...` set | (overrides) | (overrides) |
|
||||
|
||||
The file layer always captures DEBUG+ for the `lore` crate and WARN+ for dependencies. This ensures post-mortem data is available even when the user ran with default stderr verbosity. `RUST_LOG` overrides both layers when set.
|
||||
|
||||
#### New CLI flags
|
||||
|
||||
Add to the `Cli` struct (`src/cli/mod.rs`):
|
||||
|
||||
```rust
|
||||
/// Increase log verbosity (-v, -vv, -vvv)
|
||||
#[arg(short = 'v', long = "verbose", action = clap::ArgAction::Count, global = true)]
|
||||
pub verbose: u8,
|
||||
|
||||
/// Log format for stderr output: text (default) or json
|
||||
#[arg(long = "log-format", global = true, value_parser = ["text", "json"], default_value = "text")]
|
||||
pub log_format: String,
|
||||
```
|
||||
|
||||
The `-v` flag uses `clap::ArgAction::Count` to support `-v`, `-vv`, `-vvv` as a single flag with increasing count. The `--log-format` flag controls whether stderr emits human-readable or JSON-formatted log lines.
|
||||
|
||||
### 4.4 Structured Log File Output
|
||||
|
||||
**Location**: `~/.local/share/lore/logs/lore.YYYY-MM-DD.log`
|
||||
|
||||
**Format**: One JSON object per line (JSONL), produced by `tracing-subscriber`'s `fmt::layer().json()`:
|
||||
|
||||
```json
|
||||
{"timestamp":"2026-02-04T14:32:01.123Z","level":"INFO","target":"lore::ingestion","fields":{"message":"Discussion sync complete","project":"group/repo","issues_synced":42,"elapsed_ms":1234},"span":{"name":"ingest_issues"},"spans":[{"name":"sync","run_id":"a1b2c3"},{"name":"ingest_issues"}]}
|
||||
```
|
||||
|
||||
**Rotation**: Daily via `tracing-appender::rolling::daily()`.
|
||||
|
||||
**Retention**: Configurable, default 30 days. A `logs.retention_days` config field. Cleanup runs at startup (check directory, delete files older than N days).
|
||||
|
||||
### 4.5 Tracing Spans
|
||||
|
||||
Introduce spans for causal correlation within a single invocation:
|
||||
|
||||
```
|
||||
sync (run_id=uuid)
|
||||
+-- ingest_issues
|
||||
| +-- fetch_pages (project="group/repo")
|
||||
| +-- sync_discussions (project="group/repo")
|
||||
| +-- fetch_resource_events (project="group/repo")
|
||||
+-- ingest_mrs
|
||||
| +-- fetch_pages (project="group/repo")
|
||||
| +-- sync_discussions (project="group/repo")
|
||||
+-- generate_docs
|
||||
+-- embed
|
||||
```
|
||||
|
||||
Each span records `elapsed_ms` on close. The `run_id` is recorded on the root span and is visible in every log line via the span chain (`tracing-subscriber`'s JSON formatter emits the full ancestor chain under `"spans"`), enabling `jq 'select(.spans[]?.run_id == "a1b2c3")' lore.2026-02-04.log` to extract an entire run.
|
||||
|
||||
### 4.6 Performance Metrics
|
||||
|
||||
#### 4.6.1 Per-Stage Timing
|
||||
|
||||
Every command collects a `Vec<StageTiming>`:
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct StageTiming {
|
||||
pub name: String, // "ingest_issues", "fetch_pages", etc.
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
pub project: Option<String>, // Which project, if applicable
|
||||
pub elapsed_ms: u64,
|
||||
pub items_processed: usize,
|
||||
#[serde(skip_serializing_if = "is_zero")]
|
||||
pub items_skipped: usize,
|
||||
#[serde(skip_serializing_if = "is_zero")]
|
||||
pub errors: usize,
|
||||
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||
pub sub_stages: Vec<StageTiming>, // Nested child stages
|
||||
}

/// Serde helper referenced by the `skip_serializing_if` attributes above.
fn is_zero(n: &usize) -> bool {
    *n == 0
}
```
|
||||
|
||||
**Collection mechanism**: Stage timing is materialized from tracing spans, not plumbed manually through function signatures. Phase 2 adds `#[instrument]` spans to each sync stage. Phase 3 adds a custom `tracing-subscriber` layer that records span enter/exit times and structured fields, then extracts the span tree into `Vec<StageTiming>` when the root span closes.
|
||||
|
||||
This means:
|
||||
- No mutable timing collector threaded through `run_ingest` → `fetch_pages` → `sync_discussions`
|
||||
- Spans are the single source of truth for timing
|
||||
- `StageTiming` is a materialized view of the span tree
|
||||
- The custom layer implements `on_close` to capture `elapsed` and `on_record` to capture structured fields like `items_processed`
|
||||
|
||||
**Where to define**: `src/core/metrics.rs` (new file — genuinely new functionality that doesn't fit in any existing file)
|
||||
|
||||
#### 4.6.2 Robot JSON Meta Enhancement
|
||||
|
||||
Currently:
|
||||
```json
|
||||
{ "ok": true, "data": {...}, "meta": { "elapsed_ms": 1234 } }
|
||||
```
|
||||
|
||||
Proposed:
|
||||
```json
|
||||
{
|
||||
"ok": true,
|
||||
"data": { ... },
|
||||
"meta": {
|
||||
"run_id": "a1b2c3d4",
|
||||
"elapsed_ms": 45230,
|
||||
"stages": [
|
||||
{
|
||||
"name": "ingest_issues",
|
||||
"elapsed_ms": 12340,
|
||||
"items_processed": 150,
|
||||
"items_skipped": 30,
|
||||
"errors": 0,
|
||||
"sub_stages": [
|
||||
{ "name": "fetch_pages", "project": "group/repo", "elapsed_ms": 5200, "items_processed": 150 },
|
||||
{ "name": "sync_discussions", "project": "group/repo", "elapsed_ms": 6800, "items_processed": 42, "items_skipped": 108 }
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "ingest_mrs",
|
||||
"elapsed_ms": 18900,
|
||||
"items_processed": 85,
|
||||
"items_skipped": 12,
|
||||
"errors": 1
|
||||
},
|
||||
{ "name": "generate_docs", "elapsed_ms": 8500, "items_processed": 235 },
|
||||
{ "name": "embed", "elapsed_ms": 5490, "items_processed": 1024 }
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
#### 4.6.3 Sync History Enrichment
|
||||
|
||||
**Prerequisite bug fix**: The `sync_runs` table (migration 001) exists with columns `id`, `started_at`, `heartbeat_at`, `finished_at`, `status`, `command`, `error`, `metrics_json` — but **no code ever writes to it**. The `sync_status.rs` command reads from it but always gets zero rows. This must be fixed before enrichment.
|
||||
|
||||
**Step 1: Wire up sync_runs lifecycle** (prerequisite, in Phase 4)
|
||||
|
||||
Add INSERT/UPDATE calls to the sync and ingest command handlers:
|
||||
|
||||
```rust
|
||||
// On sync/ingest start:
|
||||
INSERT INTO sync_runs (started_at, heartbeat_at, status, command)
|
||||
VALUES (?now_ms, ?now_ms, 'running', ?command_name)
|
||||
RETURNING id;
|
||||
|
||||
// On sync/ingest success:
|
||||
UPDATE sync_runs
|
||||
SET finished_at = ?now_ms, status = 'succeeded', metrics_json = ?metrics
|
||||
WHERE id = ?run_id;
|
||||
|
||||
// On sync/ingest failure:
|
||||
UPDATE sync_runs
|
||||
SET finished_at = ?now_ms, status = 'failed', error = ?error_msg, metrics_json = ?metrics
|
||||
WHERE id = ?run_id;
|
||||
```
|
||||
|
||||
**Where**: Add a `SyncRunRecorder` helper in `src/core/db.rs` or `src/core/sync_run.rs` that encapsulates the INSERT/UPDATE lifecycle. Called from `run_sync()` in `src/cli/commands/sync.rs` and `run_ingest()` in `src/cli/commands/ingest.rs`.
|
||||
|
||||
**Step 2: Schema migration** (migration 014)
|
||||
|
||||
Add dedicated queryable columns alongside the existing `metrics_json`:
|
||||
|
||||
```sql
|
||||
-- Migration 014: sync_runs enrichment for observability
|
||||
ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
|
||||
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
|
||||
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
|
||||
|
||||
-- Index for correlation queries
|
||||
CREATE INDEX idx_sync_runs_run_id ON sync_runs(run_id);
|
||||
```
|
||||
|
||||
The existing `metrics_json` column stores the detailed `Vec<StageTiming>` as a JSON array. No need for a separate `stages_json` column.
|
||||
|
||||
**Step 3: Enhanced sync-status display**
|
||||
|
||||
`lore sync-status` (`src/cli/commands/sync_status.rs`) currently shows only the last run. Enhance to show recent runs with metrics:
|
||||
|
||||
```
|
||||
Recent sync runs:
|
||||
Run a1b2c3 | 2026-02-04 14:32 | 45.2s | 235 items | 1 error
|
||||
Run d4e5f6 | 2026-02-03 14:30 | 38.1s | 220 items | 0 errors
|
||||
Run g7h8i9 | 2026-02-02 14:29 | 42.7s | 228 items | 0 errors
|
||||
```
|
||||
|
||||
Robot mode (`lore --robot sync-status`):
|
||||
```json
|
||||
{
|
||||
"ok": true,
|
||||
"data": {
|
||||
"runs": [
|
||||
{
|
||||
"run_id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
|
||||
"started_at": "2026-02-04T14:32:01.123Z",
|
||||
"elapsed_ms": 45230,
|
||||
"status": "succeeded",
|
||||
"command": "sync",
|
||||
"total_items_processed": 235,
|
||||
"total_errors": 1,
|
||||
"stages": [...]
|
||||
}
|
||||
],
|
||||
"cursors": [...],
|
||||
"summary": {...}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `stages` array is parsed from `metrics_json` and included in the robot output. Interactive mode shows the summary table above; `lore --robot sync-status --run a1b2c3` shows a single run's full stage breakdown.
|
||||
|
||||
#### 4.6.4 Human-Readable Timing
|
||||
|
||||
At the end of `lore sync` (interactive mode), print a timing summary:
|
||||
|
||||
```
|
||||
Sync complete in 45.2s
|
||||
Ingest issues .... 12.3s (150 items, 42 discussions)
|
||||
Ingest MRs ....... 18.9s (85 items, 1 error)
|
||||
Generate docs .... 8.5s (235 documents)
|
||||
Embed ............ 5.5s (1024 chunks)
|
||||
```
|
||||
|
||||
Gated behind `display.show_text` so it doesn't appear in progress_only or silent modes.
|
||||
|
||||
### 4.7 Rate Limit and Retry Transparency
|
||||
|
||||
Currently, rate limits emit a `tracing::warn!`. Enhance to:
|
||||
|
||||
- Log at INFO level (not just WARN) with structured fields: `info!(path, attempt, retry_after_secs, "Rate limited, retrying")`.
|
||||
- Count total rate-limit hits per run and include in stage timing.
|
||||
- In `-v` mode, show retry progress on stderr: ` Retrying /api/v4/projects/123/issues (429, waiting 2s)`.
|
||||
|
||||
### 4.8 Configuration
|
||||
|
||||
Add a new `logging` section to `Config` (`src/core/config.rs`):
|
||||
|
||||
```rust
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
pub struct LoggingConfig {
|
||||
/// Directory for log files. Default: ~/.local/share/lore/logs/
|
||||
#[serde(default)]
|
||||
pub log_dir: Option<String>,
|
||||
|
||||
/// Days to retain log files. Default: 30. Set to 0 to disable file logging.
|
||||
#[serde(default = "default_retention_days")]
|
||||
pub retention_days: u32,
|
||||
|
||||
/// Enable JSON log files. Default: true.
|
||||
#[serde(default = "default_true")]
|
||||
pub file_logging: bool,
|
||||
}
|
||||
|
||||
fn default_retention_days() -> u32 { 30 }
|
||||
fn default_true() -> bool { true }
|
||||
```
|
||||
|
||||
Add to the `Config` struct:
|
||||
|
||||
```rust
|
||||
#[serde(default)]
|
||||
pub logging: LoggingConfig,
|
||||
```
|
||||
|
||||
With `config.json`:
|
||||
```json
|
||||
{
|
||||
"logging": {
|
||||
"log_dir": null,
|
||||
"retention_days": 30,
|
||||
"file_logging": true
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Defaults are sane so existing configs continue working with zero changes.
|
||||
|
||||
**CLI flags** (added to `Cli` struct in `src/cli/mod.rs`):
|
||||
|
||||
| Flag | Type | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `-v` / `--verbose` | count (u8) | 0 | Increase stderr log verbosity. Stacks: `-v`, `-vv`, `-vvv`. |
|
||||
| `--log-format` | text \| json | text | Stderr log format. `json` emits one JSON object per log line (same schema as file layer). |
|
||||
|
||||
These are global flags (`global = true`) consistent with the existing `--quiet`, `--robot`, etc.
|
||||
|
||||
---
|
||||
|
||||
## 5. Implementation Plan
|
||||
|
||||
### Phase 1: Verbosity Flags + Structured File Logging
|
||||
|
||||
**Scope**: CLI flags, dual-layer subscriber, file logging, rotation, retention, `--log-format`.
|
||||
|
||||
**Files touched**:
|
||||
- `Cargo.toml` — add `tracing-appender` dependency
|
||||
- `src/cli/mod.rs` — add `-v`/`--verbose` (count) and `--log-format` flags to `Cli` struct
|
||||
- `src/main.rs` — replace subscriber initialization (lines 44-58) with dual-layer setup
|
||||
- `src/core/config.rs` — add `LoggingConfig` struct and `logging` field to `Config`
|
||||
- `src/core/paths.rs` — add `get_log_dir()` helper (XDG data dir + `/logs/`)
|
||||
- `src/cli/commands/doctor.rs` — add log file location and disk usage check
|
||||
|
||||
**Implementation steps**:
|
||||
1. Add `-v` / `--verbose` (count, `u8`) and `--log-format` (text|json) flags to `Cli` struct.
|
||||
2. Add `tracing-appender` dependency to `Cargo.toml`.
|
||||
3. Add `LoggingConfig` to `Config` with `#[serde(default)]`.
|
||||
4. Add `get_log_dir()` to `src/core/paths.rs` (mirrors `get_db_path()` pattern).
|
||||
5. Replace subscriber init in `main.rs`:
|
||||
- Build `stderr_filter` from `-v` count (or `RUST_LOG` if set).
|
||||
- Build `file_filter` as `lore=debug,warn` (or `RUST_LOG` if set).
|
||||
- stderr layer: `fmt::layer().with_writer(SuspendingWriter)` with `stderr_filter`. When `--log-format json`, chain `.json()`.
|
||||
- file layer: `fmt::layer().json().with_writer(tracing_appender::rolling::daily(log_dir, "lore"))` with `file_filter`.
|
||||
- Combine via `registry().with(stderr_layer.with_filter(stderr_filter)).with(file_layer.with_filter(file_filter))`.
|
||||
6. Implement log retention at startup: scan `log_dir`, delete files matching `lore.*.log` pattern older than `retention_days`. Run before subscriber init so deleted files aren't held open.
|
||||
7. Add log file check to `lore doctor`: report log directory path, number of log files, total disk usage. In robot mode, add a `logging` field to `DoctorChecks` with `log_dir`, `file_count`, `total_bytes`, `oldest_file`.
|
||||
|
||||
**New dependencies**: `tracing-appender` (0.2)
|
||||
|
||||
**Interaction with `-q`/`--quiet`**: The existing `--quiet` flag suppresses non-error terminal output via `IngestDisplay::silent()`. It should NOT affect file logging (file layer is always on). When `-q` and `-v` are both passed, `-q` wins for stderr (set stderr filter to WARN+). File layer remains at DEBUG+.
|
||||
|
||||
**Tests** (see Section 6.1 for details):
|
||||
- Unit: `EnvFilter` construction from verbosity count (0→INFO, 1→DEBUG, 2→DEBUG+deps, 3→TRACE)
|
||||
- Unit: `RUST_LOG` overrides `-v` flags
|
||||
- Unit: `-q` + `-v` interaction (quiet wins)
|
||||
- Unit: `LoggingConfig` deserialization with missing/partial/full fields
|
||||
- Unit: Log retention deletes old files, preserves recent ones
|
||||
- Integration: Subscriber produces JSON lines to a test file
|
||||
- Integration: `SuspendingWriter` still works with dual-layer stack (no garbled output)
|
||||
|
||||
### Phase 2: Spans + Correlation IDs
|
||||
|
||||
**Scope**: Tracing spans, UUID-based `run_id`, span recording for JSON logs.
|
||||
|
||||
**Depends on**: Phase 1 (subscriber must support span recording).
|
||||
|
||||
**Files touched**:
|
||||
- `src/cli/commands/sync.rs` — add root span with `run_id` field to `run_sync()`
|
||||
- `src/cli/commands/ingest.rs` — add `#[instrument]` spans to `run_ingest()` and its stages
|
||||
- `src/ingestion/orchestrator.rs` — add spans for `fetch_pages`, `sync_discussions`, `fetch_resource_events`
|
||||
- `src/documents/regenerator.rs` — add span for `generate_docs` stage
|
||||
- `src/embedding/pipeline.rs` — add span for `embed` stage
|
||||
- `src/main.rs` — generate `run_id` before calling command handler, pass as field
|
||||
|
||||
**Implementation steps**:
|
||||
1. Generate `run_id` by taking the first 8 characters of a UUIDv4 string (`Uuid::new_v4().to_string()[..8].to_owned()` — the slice must be re-owned, since the `String` it borrows from is a temporary) at command entry in `main.rs`. No new dependency needed — `uuid` v1 with v4 feature is already in `Cargo.toml`.
|
||||
2. Create root span: `let _root = tracing::info_span!("sync", run_id = %run_id).entered();` (or equivalent for each command).
|
||||
3. Add `#[instrument(skip_all, fields(stage = "ingest_issues"))]` to ingest stages.
|
||||
4. Add `#[instrument(skip_all, fields(project = %project_path))]` to per-project functions.
|
||||
5. Ensure the file layer's JSON formatter includes span context. `tracing-subscriber`'s `fmt::layer().json()` includes the current span chain by default when the registry has span storage enabled.
|
||||
6. Verify: parse a log file, confirm every line includes `span.run_id`.
|
||||
|
||||
**New dependencies**: None (`uuid` already present).
|
||||
|
||||
**Tests**:
|
||||
- Unit: `run_id` is a valid 8-character hex string
|
||||
- Integration: Run a sync-like operation with spans, parse JSON log output, verify every line contains `run_id` in span context
|
||||
- Integration: Nested spans produce correct parent-child relationships in JSON output
|
||||
|
||||
### Phase 3: Performance Metrics Collection
|
||||
|
||||
**Scope**: `StageTiming` struct, span-to-metrics extraction, robot JSON enrichment, timing summary.
|
||||
|
||||
**Depends on**: Phase 2 (spans must exist to extract timing from).
|
||||
|
||||
**Files touched**:
|
||||
- `src/core/metrics.rs` — new file: `StageTiming` struct, `MetricsLayer` (custom tracing layer), span-to-timing extraction
|
||||
- `src/cli/commands/sync.rs` — consume `Vec<StageTiming>` from `MetricsLayer`, include in `SyncMeta`
|
||||
- `src/cli/commands/ingest.rs` — same pattern for standalone ingest
|
||||
- `src/main.rs` — register `MetricsLayer` in the subscriber stack
|
||||
|
||||
**Implementation steps**:
|
||||
1. Define `StageTiming` struct with `sub_stages: Vec<StageTiming>` in `src/core/metrics.rs`.
|
||||
2. Implement `MetricsLayer` as a custom `tracing_subscriber::Layer`:
|
||||
- `on_new_span`: Record span ID, name, parent, start time.
|
||||
- `on_record`: Capture structured fields (`items_processed`, `items_skipped`, `errors`) recorded via `Span::record()`.
|
||||
- `on_close`: Calculate `elapsed_ms`, build `StageTiming` entry, attach to parent.
|
||||
- Provide `fn extract_timings(&self, run_id: &str) -> Vec<StageTiming>` to materialize the span tree after the root span closes.
|
||||
3. Keep the layer's collected timing data in shared state (e.g. `Arc<Mutex<...>>`) held by both the `MetricsLayer` and the command handlers — the layer itself is moved into the subscriber at init and cannot be referenced afterward — so handlers can call `extract_timings()` after `run_sync()` completes.
|
||||
4. Extend `SyncMeta` and `SyncJsonOutput` to include `run_id: String` and `stages: Vec<StageTiming>`.
|
||||
5. Print human-readable timing summary at end of interactive sync (gated behind `IngestDisplay::show_text`).
|
||||
|
||||
**Span field recording**: Sync stages must record item counts as span fields for `MetricsLayer` to capture:
|
||||
```rust
|
||||
// Fields must be declared (as Empty) at span creation; `Span::record` on an undeclared field is a silent no-op.
let span = tracing::info_span!("ingest_issues", items_processed = tracing::field::Empty, items_skipped = tracing::field::Empty);
|
||||
let _guard = span.enter();
|
||||
// ... do work ...
|
||||
span.record("items_processed", count);
|
||||
span.record("items_skipped", skipped);
|
||||
```
|
||||
|
||||
**Tests**:
|
||||
- Unit: `StageTiming` serialization matches expected JSON (including nested `sub_stages`)
|
||||
- Unit: `MetricsLayer` correctly builds span tree from synthetic span events
|
||||
- Unit: `MetricsLayer` handles spans with no children (leaf stages like `embed`)
|
||||
- Unit: `MetricsLayer` handles concurrent spans (multiple projects in parallel)
|
||||
- Integration: `lore --robot sync` output includes `meta.stages` array with correct nesting
|
||||
- Integration: Interactive sync prints timing summary table to stderr
|
||||
|
||||
### Phase 4: Sync History Enrichment
|
||||
|
||||
**Scope**: Wire up `sync_runs` INSERT/UPDATE lifecycle, schema migration, enhanced sync-status.
|
||||
|
||||
**Depends on**: Phase 3 (needs `Vec<StageTiming>` to store in `metrics_json`).
|
||||
|
||||
**Files touched**:
|
||||
- `migrations/014_sync_runs_enrichment.sql` — new migration: add `run_id`, `total_items_processed`, `total_errors` columns + index
|
||||
- `src/core/sync_run.rs` — new file: `SyncRunRecorder` struct encapsulating INSERT on start, UPDATE on finish
|
||||
- `src/cli/commands/sync.rs` — create `SyncRunRecorder` before pipeline, finalize after
|
||||
- `src/cli/commands/ingest.rs` — same pattern for standalone ingest
|
||||
- `src/cli/commands/sync_status.rs` — enhance to show recent runs with metrics, parse `metrics_json`
|
||||
|
||||
**Implementation steps**:
|
||||
1. Create migration `014_sync_runs_enrichment.sql`:
|
||||
```sql
|
||||
ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
|
||||
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
|
||||
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
|
||||
CREATE INDEX IF NOT EXISTS idx_sync_runs_run_id ON sync_runs(run_id);
|
||||
```
|
||||
Note: Migration number 014 assumes no other migration is added before this phase. If concurrent work adds migration 014, renumber accordingly. Note also: SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` form, so the idempotency required by the acceptance criteria must come from the migration runner's applied-migrations tracking, not from the SQL itself.
|
||||
2. Implement `SyncRunRecorder`:
|
||||
```rust
|
||||
pub struct SyncRunRecorder<'a> { id: i64, conn: &'a Connection }
|
||||
impl<'a> SyncRunRecorder<'a> {
|
||||
pub fn start(conn: &'a Connection, command: &str, run_id: &str) -> Result<Self>;
|
||||
pub fn succeed(self, metrics: &[StageTiming], total_items: usize, total_errors: usize) -> Result<()>;
|
||||
pub fn fail(self, error: &str, metrics: Option<&[StageTiming]>) -> Result<()>;
|
||||
}
|
||||
```
|
||||
3. In `run_sync()`: create `SyncRunRecorder::start()` before pipeline, call `.succeed()` or `.fail()` after.
|
||||
4. In `run_ingest()`: same pattern.
|
||||
5. Enhance `sync_status.rs`:
|
||||
- Query last N runs (default 10) instead of just the last 1.
|
||||
- Parse `metrics_json` column to extract stage breakdown.
|
||||
- Show `run_id`, duration, item counts, error counts in both interactive and robot modes.
|
||||
- Add `--run <run_id>` flag to `sync-status` for single-run detail view.
|
||||
|
||||
**Tests**:
|
||||
- Unit: `SyncRunRecorder::start` inserts a row with status='running'
|
||||
- Unit: `SyncRunRecorder::succeed` updates status, sets finished_at, writes metrics_json
|
||||
- Unit: `SyncRunRecorder::fail` updates status, sets error, sets finished_at
|
||||
- Unit: Migration 014 applies cleanly on top of migration 013
|
||||
- Integration: `lore sync` creates a sync_runs row; `lore sync-status` displays it
|
||||
- Integration: `lore --robot sync-status` JSON includes `runs` array with stage breakdowns
|
||||
- Integration: Failed sync records error in sync_runs with partial metrics
|
||||
|
||||
### Phase 5: Rate Limit + Retry Instrumentation
|
||||
|
||||
**Scope**: Enhanced logging in GitLab client, retry counters in stage timing.
|
||||
|
||||
**Depends on**: Phase 2 (spans for context), Phase 3 (StageTiming for counters).
|
||||
|
||||
**Files touched**:
|
||||
- `src/gitlab/client.rs` (or wherever the HTTP client with retry logic lives) — add structured fields to retry/rate-limit log events
|
||||
- `src/core/metrics.rs` — add `rate_limit_hits` and `retries` fields to `StageTiming`
|
||||
|
||||
**Implementation steps**:
|
||||
1. Find the retry/rate-limit handling code (likely in the GitLab HTTP client). Add structured tracing fields:
|
||||
```rust
|
||||
info!(
|
||||
path = %request_path,
|
||||
attempt = attempt_number,
|
||||
retry_after_secs = retry_after,
|
||||
status_code = 429,
|
||||
"Rate limited, retrying"
|
||||
);
|
||||
```
|
||||
2. Add `rate_limit_hits: usize` and `retries: usize` fields to `StageTiming` (with `#[serde(skip_serializing_if = "is_zero")]`).
|
||||
3. In `MetricsLayer`, count rate-limit and retry events within each span and include in `StageTiming`.
|
||||
4. In `-v` mode, the existing stderr layer already shows INFO+ events, so retry activity becomes visible automatically. No additional work needed beyond step 1.
|
||||
|
||||
**Tests**:
|
||||
- Unit: Rate-limit log events include all required structured fields
|
||||
- Unit: `StageTiming` serialization includes `rate_limit_hits` and `retries` when non-zero, omits when zero
|
||||
- Integration: Simulate 429 response, verify log line has `path`, `attempt`, `retry_after_secs` fields
|
||||
- Integration: After simulated retries, `StageTiming` counts match expected values
|
||||
|
||||
---
|
||||
|
||||
## 6. Acceptance Criteria
|
||||
|
||||
### 6.1 Phase 1: Verbosity Flags + Structured File Logging
|
||||
|
||||
**Functional criteria**:
|
||||
- [ ] `lore sync` writes JSON log lines to `~/.local/share/lore/logs/lore.YYYY-MM-DD.log` with zero configuration.
|
||||
- [ ] `lore -v sync` shows DEBUG-level `lore::*` output on stderr; dependency output stays at WARN.
|
||||
- [ ] `lore -vv sync` shows DEBUG-level `lore::*` + INFO-level dependency output on stderr.
|
||||
- [ ] `lore -vvv sync` shows TRACE-level output for everything on stderr.
|
||||
- [ ] `RUST_LOG=lore::gitlab=trace lore sync` overrides `-v` flags for both stderr and file layers.
|
||||
- [ ] `lore --log-format json sync` emits JSON-formatted log lines on stderr (same schema as file layer).
|
||||
- [ ] Log files rotate daily (new file per calendar day).
|
||||
- [ ] Files matching `lore.*.log` older than `retention_days` are deleted on startup.
|
||||
- [ ] Existing behavior is unchanged when no new flags are passed (INFO on stderr, human-readable format).
|
||||
- [ ] `--quiet` suppresses non-error stderr output. `-q` + `-v` together: `-q` wins (stderr at WARN+).
|
||||
- [ ] `--quiet` does NOT affect file logging (file layer remains at DEBUG+).
|
||||
- [ ] `lore doctor` reports: log directory path, number of log files, total disk usage in bytes. Robot mode includes a `logging` field in the checks JSON.
|
||||
- [ ] File layer always logs at DEBUG+ for `lore::*` crate regardless of `-v` flags.
|
||||
|
||||
**Test specifications**:
|
||||
- `test_verbosity_filter_construction`: Given verbosity count 0/1/2/3, assert the resulting `EnvFilter` matches the expected directives table.
|
||||
- `test_rust_log_overrides_verbose`: Set `RUST_LOG=lore=trace`, pass `-v` (count=1), assert the filter uses TRACE (not DEBUG).
|
||||
- `test_quiet_overrides_verbose`: Pass `-q` and `-v` together, assert stderr filter is WARN+.
|
||||
- `test_logging_config_defaults`: Deserialize an empty `{}` JSON as `LoggingConfig`, assert `retention_days=30`, `file_logging=true`, `log_dir=None`.
|
||||
- `test_logging_config_partial`: Deserialize `{"retention_days": 7}`, assert `file_logging=true` default preserved.
|
||||
- `test_log_retention_cleanup`: Create temp dir with files named `lore.2026-01-01.log` through `lore.2026-02-04.log`. Run retention with `retention_days=7`. Assert files older than 7 days are deleted, recent files preserved.
|
||||
- `test_log_retention_ignores_non_log_files`: Create temp dir with `lore.2026-01-01.log` and `other.txt`. Run retention. Assert `other.txt` is NOT deleted.
|
||||
- `test_json_log_output_format`: Capture file layer output, parse each line as JSON, assert keys: `timestamp`, `level`, `target`, `fields`, `span`.
|
||||
- `test_suspending_writer_dual_layer`: Run a tracing event with both layers active and a progress bar. Assert no garbled output on stderr (no interleaved progress bar fragments in log lines).
|
||||
|
||||
### 6.2 Phase 2: Spans + Correlation IDs
|
||||
|
||||
**Functional criteria**:
|
||||
- [ ] Every log line within a sync run includes `run_id` in the JSON span context.
|
||||
- [ ] `jq 'select(.spans[]? | .run_id == "<run_id>")' lore.2026-02-04.log` extracts all lines from a given run (`[]?` skips lines with no span context instead of erroring).
|
||||
- [ ] Nested spans produce a chain: log lines inside `fetch_pages` include both the `fetch_pages` span and the parent `ingest_issues` span in their span context.
|
||||
- [ ] `run_id` is an 8-character hex string (truncated UUIDv4).
|
||||
- [ ] Spans are visible in `-vv` stderr output as bracketed context.
|
||||
|
||||
**Test specifications**:
|
||||
- `test_run_id_format`: Generate 100 run_ids, assert each is 8 chars, all hex characters.
|
||||
- `test_run_id_uniqueness`: Generate 1000 run_ids, assert no duplicates.
|
||||
- `test_span_context_in_json_logs`: Run a mock sync with spans, capture JSON log output, parse and verify each line has `spans` array containing `run_id`.
|
||||
- `test_nested_span_chain`: Create parent span "sync" with child "ingest_issues" with child "fetch_pages". Emit a log event inside "fetch_pages". Assert the JSON log line's span chain includes all three span names.
|
||||
- `test_span_elapsed_on_close`: Create a span, sleep 10ms, close it. Verify the close event records `elapsed_ms >= 10`.
|
||||
|
||||
### 6.3 Phase 3: Performance Metrics Collection
|
||||
|
||||
**Functional criteria**:
|
||||
- [ ] `lore --robot sync` JSON includes `meta.run_id` (string) and `meta.stages` (array).
|
||||
- [ ] Each stage in `meta.stages` has: `name`, `elapsed_ms`, `items_processed`.
|
||||
- [ ] Top-level stages (ingest_issues, ingest_mrs, generate_docs, embed) have `sub_stages` arrays.
|
||||
- [ ] Sub-stages include `project` field when applicable.
|
||||
- [ ] `lore sync` (interactive) prints a timing summary table on stderr, gated behind `IngestDisplay::show_text`.
|
||||
- [ ] `lore -q sync` does NOT print the timing summary.
|
||||
- [ ] Zero-value fields (`items_skipped: 0`, `errors: 0`) are omitted from JSON output.
|
||||
|
||||
**Test specifications**:
|
||||
- `test_stage_timing_serialization`: Create a `StageTiming` with sub_stages, serialize to JSON, assert structure matches PRD example.
|
||||
- `test_stage_timing_zero_fields_omitted`: Create `StageTiming` with `errors: 0`, serialize, assert no `errors` key in output.
|
||||
- `test_metrics_layer_single_span`: Create `MetricsLayer`, enter/exit one span with recorded fields, extract timings, assert one `StageTiming` entry.
|
||||
- `test_metrics_layer_nested_spans`: Create parent + child spans, extract timings, assert parent has child in `sub_stages`.
|
||||
- `test_metrics_layer_parallel_spans`: Create two sibling spans (simulating two projects), extract timings, assert both appear as sub_stages of parent.
|
||||
- `test_sync_meta_includes_stages`: Mock a sync pipeline, verify robot JSON output parses correctly with `meta.stages`.
|
||||
- `test_timing_summary_format`: Capture stderr during interactive sync, verify timing table format matches PRD example.
|
||||
|
||||
### 6.4 Phase 4: Sync History Enrichment
|
||||
|
||||
**Functional criteria**:
|
||||
- [ ] `lore sync` creates a row in `sync_runs` with status='running' at start, updated to 'succeeded'/'failed' at finish.
|
||||
- [ ] `lore ingest issues` also creates a `sync_runs` row.
|
||||
- [ ] `sync_runs.run_id` matches the `run_id` in log files and robot JSON.
|
||||
- [ ] `sync_runs.metrics_json` contains the serialized `Vec<StageTiming>`.
|
||||
- [ ] `sync_runs.total_items_processed` and `total_errors` are populated.
|
||||
- [ ] `lore sync-status` shows the last 10 runs with: run_id, timestamp, duration, item count, error count.
|
||||
- [ ] `lore --robot sync-status` JSON includes `runs` array with `stages` parsed from `metrics_json`.
|
||||
- [ ] Failed syncs record the error message and any partial metrics collected before failure.
|
||||
- [ ] Migration 014 applies cleanly and is idempotent (safe to re-run).
|
||||
|
||||
**Test specifications**:
|
||||
- `test_sync_run_recorder_start`: Call `start()`, query sync_runs, assert one row with status='running'.
|
||||
- `test_sync_run_recorder_succeed`: Call `start()` then `succeed()`, assert row has status='succeeded', finished_at set, metrics_json parseable.
|
||||
- `test_sync_run_recorder_fail`: Call `start()` then `fail()`, assert row has status='failed', error set.
|
||||
- `test_sync_run_recorder_fail_with_partial_metrics`: Call `start()`, collect some metrics, then `fail()`. Assert metrics_json contains partial data.
|
||||
- `test_migration_014_applies`: Apply all migrations 001-014 on a fresh DB. Assert `sync_runs` has `run_id`, `total_items_processed`, `total_errors` columns.
|
||||
- `test_migration_014_idempotent`: Apply migration 014 twice. Assert no error on second apply.
|
||||
- `test_sync_status_shows_runs`: Insert 3 sync_runs rows, run `print_sync_status()`, assert output includes all 3 with correct formatting.
|
||||
- `test_sync_status_json_includes_stages`: Insert a sync_runs row with metrics_json, run robot-mode sync-status, parse JSON, assert `runs[0].stages` is an array.
|
||||
|
||||
### 6.5 Phase 5: Rate Limit + Retry Instrumentation
|
||||
|
||||
**Functional criteria**:
|
||||
- [ ] Rate-limit events (HTTP 429) log at INFO with structured fields: `path`, `attempt`, `retry_after_secs`, `status_code`.
|
||||
- [ ] Retry events (non-429 transient errors) log with: `path`, `attempt`, `error`.
|
||||
- [ ] `StageTiming` includes `rate_limit_hits` and `retries` counts (omitted when zero).
|
||||
- [ ] `lore -v sync` shows retry activity on stderr (visible because it's INFO+).
|
||||
- [ ] Rate limit counts are included in `metrics_json` stored in `sync_runs`.
|
||||
|
||||
**Test specifications**:
|
||||
- `test_rate_limit_log_fields`: Simulate a 429 response, capture log output, parse JSON, assert fields: `path`, `attempt`, `retry_after_secs`, `status_code`.
|
||||
- `test_retry_log_fields`: Simulate a transient error + retry, capture log, assert fields: `path`, `attempt`, `error`.
|
||||
- `test_stage_timing_rate_limit_counts`: Simulate 3 rate-limit hits within a span, extract `StageTiming`, assert `rate_limit_hits == 3`.
|
||||
- `test_stage_timing_retry_counts`: Simulate 2 retries, extract `StageTiming`, assert `retries == 2`.
|
||||
- `test_rate_limit_fields_omitted_when_zero`: Create `StageTiming` with zero rate limits, serialize, assert no `rate_limit_hits` key.
|
||||
|
||||
---
|
||||
|
||||
## 7. Resolved Decisions
|
||||
|
||||
1. **Log format**: Use `tracing-subscriber`'s built-in JSON formatter (`fmt::layer().json()`). Zero custom code, battle-tested, and ecosystem tools (Grafana Loki, Datadog) already parse this format. The schema difference from our robot JSON envelope is cosmetic and not worth the maintenance burden of a custom formatter.
|
||||
|
||||
2. **Span recording**: Always-on. lore is I/O-bound (GitLab API + SQLite), so the nanosecond-level overhead of span storage and chain lookup is unmeasurable against our millisecond-scale operations. Conditional recording would add subscriber construction complexity for zero practical benefit.
|
||||
|
||||
3. **Log file location**: `~/.local/share/lore/logs/` (XDG data directory). Logs are NOT reproducible — you can generate new logs, but you cannot regenerate the exact diagnostic output from a past run. They are forensic artifacts that users would notice missing, so they belong in data, not cache.
|
||||
|
||||
4. **Retention**: In scope for Phase 1. Startup cleanup: scan log directory, delete files matching `lore.*.log` older than `retention_days` (default 30). Simple, no background threads, no external dependencies. Runs before subscriber initialization so deleted file handles aren't held.
|
||||
|
||||
5. **Stage timing granularity**: Per-project with nested sub-stages. When one project has 500 MRs and another has 3, knowing which one consumed the time budget is the difference between "sync was slow" and actionable diagnosis. The `StageTiming` struct includes an optional `project` field and a `sub_stages: Vec<StageTiming>` field for nesting.
|
||||
|
||||
6. **Stage timing collection mechanism**: Materialized from tracing spans, not plumbed manually. A custom `MetricsLayer` in the subscriber stack records span enter/exit/record events and builds the `StageTiming` tree. This avoids threading a mutable collector through every function signature and makes spans the single source of truth for timing data. Phase 2 adds spans; Phase 3 adds the layer that reads them.
|
||||
|
||||
7. **run_id format**: First 8 characters of `Uuid::new_v4().to_string()` (e.g., `"a1b2c3d4"`). The `uuid` crate (v1, v4 feature) is already a dependency. No new crate needed. 8 characters provide ~4 billion unique values — more than sufficient for local CLI invocations.
|
||||
|
||||
8. **File log level**: Always DEBUG+ for `lore::*` crate, WARN+ for dependencies, regardless of `-v` flags. This ensures post-mortem data is always richer than what was shown on stderr. `RUST_LOG` overrides both layers when set.
|
||||
|
||||
9. **sync_runs lifecycle**: The table exists (migration 001) but nothing writes to it. Phase 4 wires up the INSERT (on start) / UPDATE (on finish) lifecycle AND adds enrichment columns in a single migration. The existing `metrics_json` column stores the detailed `Vec<StageTiming>` array — no need for a separate `stages_json` column.
|
||||
|
||||
10. **JSON stderr via --log-format**: A `--log-format text|json` global flag controls stderr log format. Default is `text` (human-readable). When `json`, stderr uses the same JSON formatter as the file layer, routed through `SuspendingWriter` for progress bar coordination. This enables `lore sync 2>&1 | jq` workflows without reading log files.
|
||||
|
||||
---
|
||||
|
||||
## 8. Phase Dependency Graph
|
||||
|
||||
```
|
||||
Phase 1 (Subscriber + Flags)
|
||||
|
|
||||
v
|
||||
Phase 2 (Spans + run_id)
|
||||
|
|
||||
+------+------+
|
||||
| |
|
||||
v v
|
||||
Phase 3 Phase 5
|
||||
(Metrics) (Rate Limit Logging)
|
||||
| |
|
||||
v |
|
||||
Phase 4 |
|
||||
(Sync History) <--+
|
||||
```
|
||||
|
||||
**Parallelization opportunities**:
|
||||
- Phase 1 must complete before anything else.
|
||||
- Phase 2 must complete before Phase 3 or Phase 5.
|
||||
- Phase 3 and Phase 5 can run in parallel (Phase 5 only needs spans from Phase 2, not MetricsLayer from Phase 3).
|
||||
- Phase 4 depends on Phase 3 (needs `Vec<StageTiming>` to store). Phase 5's `rate_limit_hits`/`retries` fields on `StageTiming` can be added to Phase 4's stored data after Phase 5 completes, or Phase 4 can store them as zero initially.
|
||||
|
||||
**Agent assignment suggestion**:
|
||||
- Agent A: Phase 1 → Phase 2 (sequential, foundational infrastructure)
|
||||
- Agent B: Phase 3 (after Phase 2 completes)
|
||||
- Agent C: Phase 5 (after Phase 2 completes, parallel with Phase 3)
|
||||
- Agent B or D: Phase 4 (after Phase 3 completes)
|
||||
|
||||
---
|
||||
|
||||
## 9. References
|
||||
|
||||
- Gholamian, S. & Ward, P. (2021). "A Comprehensive Survey of Logging in Software." arXiv:2110.12489.
|
||||
- Duan, S. et al. (2025). "PDLogger: Automated Logging Framework for Practical Software Development." arXiv:2507.19951.
|
||||
- tokio-rs/tracing ecosystem: `tracing`, `tracing-subscriber`, `tracing-appender`.
|
||||
- GNU Coding Standards: Verbosity and diagnostic output conventions.
|
||||
- Rust CLI Working Group: Recommendations for error reporting and verbosity.
|
||||
844
gitlore-sync-explorer.html
Normal file
844
gitlore-sync-explorer.html
Normal file
@@ -0,0 +1,844 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>Gitlore Sync Pipeline Explorer</title>
|
||||
<style>
|
||||
:root {
|
||||
--bg: #0d1117;
|
||||
--bg-secondary: #161b22;
|
||||
--bg-tertiary: #1c2129;
|
||||
--border: #30363d;
|
||||
--text: #c9d1d9;
|
||||
--text-dim: #8b949e;
|
||||
--text-bright: #f0f6fc;
|
||||
--cyan: #58a6ff;
|
||||
--green: #3fb950;
|
||||
--amber: #d29922;
|
||||
--red: #f85149;
|
||||
--purple: #bc8cff;
|
||||
--pink: #f778ba;
|
||||
--cyan-dim: rgba(88,166,255,0.15);
|
||||
--green-dim: rgba(63,185,80,0.15);
|
||||
--amber-dim: rgba(210,153,34,0.15);
|
||||
--red-dim: rgba(248,81,73,0.15);
|
||||
--purple-dim: rgba(188,140,255,0.15);
|
||||
}
|
||||
* { margin: 0; padding: 0; box-sizing: border-box; }
|
||||
body {
|
||||
font-family: 'SF Mono', 'Cascadia Code', 'Fira Code', 'JetBrains Mono', monospace;
|
||||
background: var(--bg); color: var(--text);
|
||||
display: flex; height: 100vh; overflow: hidden;
|
||||
}
|
||||
.sidebar {
|
||||
width: 220px; min-width: 220px; background: var(--bg-secondary);
|
||||
border-right: 1px solid var(--border); display: flex; flex-direction: column; padding: 16px 0;
|
||||
}
|
||||
.sidebar-title {
|
||||
font-size: 11px; font-weight: 700; text-transform: uppercase;
|
||||
letter-spacing: 1.2px; color: var(--text-dim); padding: 0 16px 12px;
|
||||
}
|
||||
.logo {
|
||||
padding: 0 16px 20px; font-size: 15px; font-weight: 700; color: var(--cyan);
|
||||
display: flex; align-items: center; gap: 8px;
|
||||
}
|
||||
.logo svg { width: 20px; height: 20px; }
|
||||
.nav-item {
|
||||
padding: 10px 16px; cursor: pointer; font-size: 13px; color: var(--text-dim);
|
||||
transition: all 0.15s; border-left: 3px solid transparent;
|
||||
display: flex; align-items: center; gap: 10px;
|
||||
}
|
||||
.nav-item:hover { background: var(--bg-tertiary); color: var(--text); }
|
||||
.nav-item.active { background: var(--cyan-dim); color: var(--cyan); border-left-color: var(--cyan); }
|
||||
.nav-dot { width: 8px; height: 8px; border-radius: 50%; flex-shrink: 0; }
|
||||
.main { flex: 1; display: flex; flex-direction: column; overflow: hidden; }
|
||||
.header {
|
||||
padding: 16px 24px; border-bottom: 1px solid var(--border);
|
||||
display: flex; align-items: center; justify-content: space-between;
|
||||
}
|
||||
.header h1 { font-size: 16px; font-weight: 600; color: var(--text-bright); }
|
||||
.header-badge {
|
||||
font-size: 11px; padding: 3px 10px; border-radius: 12px;
|
||||
background: var(--cyan-dim); color: var(--cyan);
|
||||
}
|
||||
.canvas-wrapper { flex: 1; overflow: auto; position: relative; }
|
||||
.canvas { padding: 32px; min-height: 100%; }
|
||||
.flow-container { display: none; }
|
||||
.flow-container.active { display: block; }
|
||||
.phase { margin-bottom: 32px; }
|
||||
.phase-header { display: flex; align-items: center; gap: 12px; margin-bottom: 16px; }
|
||||
.phase-number {
|
||||
width: 28px; height: 28px; border-radius: 50%; display: flex; align-items: center;
|
||||
justify-content: center; font-size: 13px; font-weight: 700; flex-shrink: 0;
|
||||
}
|
||||
.phase-title { font-size: 14px; font-weight: 600; color: var(--text-bright); }
|
||||
.phase-subtitle { font-size: 11px; color: var(--text-dim); margin-left: 4px; font-weight: 400; }
|
||||
.flow-row {
|
||||
display: flex; align-items: stretch; gap: 0; flex-wrap: wrap;
|
||||
margin-left: 14px; padding-left: 26px; border-left: 2px solid var(--border);
|
||||
}
|
||||
.flow-row:last-child { border-left-color: transparent; }
|
||||
.node {
|
||||
position: relative; padding: 12px 16px; border-radius: 8px;
|
||||
border: 1px solid var(--border); background: var(--bg-secondary);
|
||||
font-size: 12px; cursor: pointer; transition: all 0.2s;
|
||||
min-width: 180px; max-width: 260px; margin: 4px 0;
|
||||
}
|
||||
.node:hover {
|
||||
border-color: var(--cyan); transform: translateY(-1px);
|
||||
box-shadow: 0 4px 12px rgba(0,0,0,0.3);
|
||||
}
|
||||
.node.selected {
|
||||
border-color: var(--cyan);
|
||||
box-shadow: 0 0 0 1px var(--cyan), 0 4px 16px rgba(88,166,255,0.15);
|
||||
}
|
||||
.node-title { font-weight: 600; font-size: 12px; margin-bottom: 4px; color: var(--text-bright); }
|
||||
.node-desc { font-size: 11px; color: var(--text-dim); line-height: 1.5; }
|
||||
.node.api { border-left: 3px solid var(--cyan); }
|
||||
.node.transform { border-left: 3px solid var(--purple); }
|
||||
.node.db { border-left: 3px solid var(--green); }
|
||||
.node.decision { border-left: 3px solid var(--amber); }
|
||||
.node.error { border-left: 3px solid var(--red); }
|
||||
.node.queue { border-left: 3px solid var(--pink); }
|
||||
.arrow {
|
||||
display: flex; align-items: center; padding: 0 6px;
|
||||
color: var(--text-dim); font-size: 16px; flex-shrink: 0;
|
||||
}
|
||||
.arrow-down {
|
||||
display: flex; justify-content: center; padding: 4px 0;
|
||||
color: var(--text-dim); font-size: 16px; margin-left: 14px;
|
||||
padding-left: 26px; border-left: 2px solid var(--border);
|
||||
}
|
||||
.branch-container {
|
||||
margin-left: 14px; padding-left: 26px;
|
||||
border-left: 2px solid var(--border); padding-bottom: 8px;
|
||||
}
|
||||
.branch-row { display: flex; gap: 12px; margin: 8px 0; flex-wrap: wrap; }
|
||||
.branch-label {
|
||||
font-size: 11px; font-weight: 600; margin: 8px 0 4px;
|
||||
display: flex; align-items: center; gap: 6px;
|
||||
}
|
||||
.branch-label.success { color: var(--green); }
|
||||
.branch-label.error { color: var(--red); }
|
||||
.branch-label.retry { color: var(--amber); }
|
||||
.diff-badge {
|
||||
display: inline-block; font-size: 10px; padding: 2px 6px;
|
||||
border-radius: 4px; margin-top: 6px; font-weight: 600;
|
||||
}
|
||||
.diff-badge.changed { background: var(--amber-dim); color: var(--amber); }
|
||||
.diff-badge.same { background: var(--green-dim); color: var(--green); }
|
||||
.detail-panel {
|
||||
position: fixed; right: 0; top: 0; bottom: 0; width: 380px;
|
||||
background: var(--bg-secondary); border-left: 1px solid var(--border);
|
||||
transform: translateX(100%); transition: transform 0.25s ease;
|
||||
z-index: 100; display: flex; flex-direction: column; overflow: hidden;
|
||||
}
|
||||
.detail-panel.open { transform: translateX(0); }
|
||||
.detail-header {
|
||||
padding: 16px 20px; border-bottom: 1px solid var(--border);
|
||||
display: flex; align-items: center; justify-content: space-between;
|
||||
}
|
||||
.detail-header h2 { font-size: 14px; font-weight: 600; color: var(--text-bright); }
|
||||
.detail-close {
|
||||
cursor: pointer; color: var(--text-dim); font-size: 18px;
|
||||
background: none; border: none; padding: 4px 8px; border-radius: 4px;
|
||||
}
|
||||
.detail-close:hover { background: var(--bg-tertiary); color: var(--text); }
|
||||
.detail-body { flex: 1; overflow-y: auto; padding: 20px; }
|
||||
.detail-section { margin-bottom: 20px; }
|
||||
.detail-section h3 {
|
||||
font-size: 11px; text-transform: uppercase; letter-spacing: 0.8px;
|
||||
color: var(--text-dim); margin-bottom: 8px;
|
||||
}
|
||||
.detail-section p { font-size: 12px; line-height: 1.7; color: var(--text); }
|
||||
.sql-block {
|
||||
background: var(--bg); border: 1px solid var(--border); border-radius: 6px;
|
||||
padding: 12px; font-size: 11px; line-height: 1.6; color: var(--green);
|
||||
overflow-x: auto; white-space: pre; margin-top: 8px;
|
||||
}
|
||||
.detail-tag {
|
||||
display: inline-block; font-size: 10px; padding: 2px 8px;
|
||||
border-radius: 10px; margin: 2px 4px 2px 0;
|
||||
}
|
||||
.detail-tag.file { background: var(--purple-dim); color: var(--purple); }
|
||||
.detail-tag.type-api { background: var(--cyan-dim); color: var(--cyan); }
|
||||
.detail-tag.type-db { background: var(--green-dim); color: var(--green); }
|
||||
.detail-tag.type-transform { background: var(--purple-dim); color: var(--purple); }
|
||||
.detail-tag.type-decision { background: var(--amber-dim); color: var(--amber); }
|
||||
.detail-tag.type-error { background: var(--red-dim); color: var(--red); }
|
||||
.detail-tag.type-queue { background: rgba(247,120,186,0.15); color: var(--pink); }
|
||||
.watermark-panel { border-top: 1px solid var(--border); background: var(--bg-secondary); }
|
||||
.watermark-toggle {
|
||||
padding: 10px 24px; cursor: pointer; font-size: 12px; color: var(--text-dim);
|
||||
display: flex; align-items: center; gap: 8px; user-select: none;
|
||||
}
|
||||
.watermark-toggle:hover { color: var(--text); }
|
||||
.watermark-toggle .chevron { transition: transform 0.2s; font-size: 10px; }
|
||||
.watermark-toggle .chevron.open { transform: rotate(180deg); }
|
||||
.watermark-content { display: none; padding: 0 24px 16px; max-height: 260px; overflow-y: auto; }
|
||||
.watermark-content.open { display: block; }
|
||||
.wm-table { width: 100%; border-collapse: collapse; font-size: 11px; }
|
||||
.wm-table th {
|
||||
text-align: left; padding: 6px 12px; color: var(--text-dim); font-weight: 600;
|
||||
border-bottom: 1px solid var(--border); font-size: 10px;
|
||||
text-transform: uppercase; letter-spacing: 0.5px;
|
||||
}
|
||||
.wm-table td { padding: 6px 12px; border-bottom: 1px solid var(--border); color: var(--text); }
|
||||
.wm-table td:first-child { color: var(--cyan); font-weight: 600; }
|
||||
.wm-table td:nth-child(2) { color: var(--green); }
|
||||
.overview-pipeline { display: flex; gap: 0; align-items: stretch; margin: 24px 0; flex-wrap: wrap; }
|
||||
.overview-stage {
|
||||
flex: 1; min-width: 200px; background: var(--bg-secondary);
|
||||
border: 1px solid var(--border); border-radius: 10px; padding: 20px;
|
||||
cursor: pointer; transition: all 0.2s;
|
||||
}
|
||||
.overview-stage:hover {
|
||||
border-color: var(--cyan); transform: translateY(-2px);
|
||||
box-shadow: 0 6px 20px rgba(0,0,0,0.3);
|
||||
}
|
||||
.overview-arrow { display: flex; align-items: center; padding: 0 8px; font-size: 20px; color: var(--text-dim); }
|
||||
.stage-num { font-size: 10px; font-weight: 700; text-transform: uppercase; letter-spacing: 1px; margin-bottom: 8px; }
|
||||
.stage-title { font-size: 15px; font-weight: 700; color: var(--text-bright); margin-bottom: 6px; }
|
||||
.stage-desc { font-size: 11px; color: var(--text-dim); line-height: 1.6; }
|
||||
.stage-detail {
|
||||
margin-top: 12px; padding-top: 12px; border-top: 1px solid var(--border);
|
||||
font-size: 11px; color: var(--text-dim); line-height: 1.6;
|
||||
}
|
||||
.stage-detail code {
|
||||
color: var(--amber); background: var(--amber-dim); padding: 1px 5px;
|
||||
border-radius: 3px; font-size: 10px;
|
||||
}
|
||||
.info-box {
|
||||
background: var(--bg-tertiary); border: 1px solid var(--border);
|
||||
border-radius: 8px; padding: 16px; margin: 16px 0; font-size: 12px; line-height: 1.7;
|
||||
}
|
||||
.info-box-title { font-weight: 600; color: var(--cyan); margin-bottom: 6px; display: flex; align-items: center; gap: 6px; }
|
||||
.info-box ul { margin-left: 16px; color: var(--text-dim); }
|
||||
.info-box li { margin: 4px 0; }
|
||||
.info-box code {
|
||||
color: var(--amber); background: var(--amber-dim);
|
||||
padding: 1px 5px; border-radius: 3px; font-size: 11px;
|
||||
}
|
||||
.legend {
|
||||
display: flex; gap: 16px; flex-wrap: wrap; margin-bottom: 24px;
|
||||
padding: 12px 16px; background: var(--bg-secondary);
|
||||
border: 1px solid var(--border); border-radius: 8px;
|
||||
}
|
||||
.legend-item { display: flex; align-items: center; gap: 6px; font-size: 11px; color: var(--text-dim); }
|
||||
.legend-color { width: 12px; height: 3px; border-radius: 2px; }
|
||||
::-webkit-scrollbar { width: 8px; height: 8px; }
|
||||
::-webkit-scrollbar-track { background: transparent; }
|
||||
::-webkit-scrollbar-thumb { background: var(--border); border-radius: 4px; }
|
||||
::-webkit-scrollbar-thumb:hover { background: var(--text-dim); }
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<div class="sidebar">
|
||||
<div class="logo">
|
||||
<svg viewBox="0 0 20 20" fill="none" stroke="currentColor" stroke-width="1.5">
|
||||
<circle cx="10" cy="10" r="8"/><path d="M10 6v4l3 2"/>
|
||||
</svg>
|
||||
lore sync
|
||||
</div>
|
||||
<div class="sidebar-title">Entity Flows</div>
|
||||
<div class="nav-item active" data-view="overview" onclick="switchView('overview')">
|
||||
<div class="nav-dot" style="background:var(--cyan)"></div>Full Sync Overview
|
||||
</div>
|
||||
<div class="nav-item" data-view="issues" onclick="switchView('issues')">
|
||||
<div class="nav-dot" style="background:var(--green)"></div>Issues
|
||||
</div>
|
||||
<div class="nav-item" data-view="mrs" onclick="switchView('mrs')">
|
||||
<div class="nav-dot" style="background:var(--purple)"></div>Merge Requests
|
||||
</div>
|
||||
<div class="nav-item" data-view="docs" onclick="switchView('docs')">
|
||||
<div class="nav-dot" style="background:var(--amber)"></div>Documents
|
||||
</div>
|
||||
<div class="nav-item" data-view="embed" onclick="switchView('embed')">
|
||||
<div class="nav-dot" style="background:var(--pink)"></div>Embeddings
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div class="main">
|
||||
<div class="header">
|
||||
<h1 id="view-title">Full Sync Overview</h1>
|
||||
<span class="header-badge" id="view-badge">4 stages</span>
|
||||
</div>
|
||||
|
||||
<div class="canvas-wrapper"><div class="canvas">
|
||||
|
||||
<!-- OVERVIEW -->
|
||||
<div class="flow-container active" id="view-overview">
|
||||
<div class="overview-pipeline">
|
||||
<div class="overview-stage" onclick="switchView('issues')">
|
||||
<div class="stage-num" style="color:var(--green)">Stage 1</div>
|
||||
<div class="stage-title">Ingest Issues</div>
|
||||
<div class="stage-desc">Fetch issues + discussions + resource events from GitLab API</div>
|
||||
<div class="stage-detail">Cursor-based incremental sync.<br>Sequential discussion fetch.<br>Queue-based resource events.</div>
|
||||
</div>
|
||||
<div class="overview-arrow">→</div>
|
||||
<div class="overview-stage" onclick="switchView('mrs')">
|
||||
<div class="stage-num" style="color:var(--purple)">Stage 2</div>
|
||||
<div class="stage-title">Ingest MRs</div>
|
||||
<div class="stage-desc">Fetch merge requests + discussions + resource events</div>
|
||||
<div class="stage-detail">Page-based incremental sync.<br>Parallel prefetch discussions.<br>Queue-based resource events.</div>
|
||||
</div>
|
||||
<div class="overview-arrow">→</div>
|
||||
<div class="overview-stage" onclick="switchView('docs')">
|
||||
<div class="stage-num" style="color:var(--amber)">Stage 3</div>
|
||||
<div class="stage-title">Generate Docs</div>
|
||||
<div class="stage-desc">Regenerate searchable documents for changed entities</div>
|
||||
<div class="stage-detail">Driven by <code>dirty_sources</code> table.<br>Triple-hash skip optimization.<br>FTS5 index auto-updated.</div>
|
||||
</div>
|
||||
<div class="overview-arrow">→</div>
|
||||
<div class="overview-stage" onclick="switchView('embed')">
|
||||
<div class="stage-num" style="color:var(--pink)">Stage 4</div>
|
||||
<div class="stage-title">Embed</div>
|
||||
<div class="stage-desc">Generate vector embeddings via Ollama for semantic search</div>
|
||||
<div class="stage-detail">Hash-based change detection.<br>Chunked, batched API calls.<br><b>Non-fatal</b> — graceful if Ollama down.</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="info-box">
|
||||
<div class="info-box-title">Concurrency Model</div>
|
||||
<ul>
|
||||
<li>Stages 1 & 2 process <b>projects concurrently</b> via <code>buffer_unordered(primary_concurrency)</code></li>
|
||||
<li>Each project gets its own <b>SQLite connection</b>; rate limiter is <b>shared</b></li>
|
||||
<li>Discussions: <b>sequential</b> (issues) or <b>batched parallel prefetch</b> (MRs)</li>
|
||||
<li>Resource events use a <b>persistent job queue</b> with atomic claim + exponential backoff</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="info-box">
|
||||
<div class="info-box-title">Sync Flags</div>
|
||||
<ul>
|
||||
<li><code>--full</code> — Resets all cursors & watermarks, forces complete re-fetch</li>
|
||||
<li><code>--no-docs</code> — Skips Stage 3 (document generation)</li>
|
||||
<li><code>--no-embed</code> — Skips Stage 4 (embedding generation)</li>
|
||||
<li><code>--force</code> — Overrides stale single-flight lock</li>
|
||||
<li><code>--project &lt;path&gt;</code> — Sync only one project (fuzzy matching)</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="info-box">
|
||||
<div class="info-box-title">Single-Flight Lock</div>
|
||||
<ul>
|
||||
<li>Table-based lock (<code>AppLock</code>) prevents concurrent syncs</li>
|
||||
<li>Heartbeat keeps the lock alive; stale locks auto-detected</li>
|
||||
<li>Use <code>--force</code> to override a stale lock</li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- ISSUES -->
|
||||
<div class="flow-container" id="view-issues">
|
||||
<div class="legend">
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API Call</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Transform</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error Path</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--pink)"></div>Queue</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
|
||||
<div class="phase-title">Fetch Issues <span class="phase-subtitle">Cursor-Based Incremental Sync</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node api" data-detail="issue-api-call"><div class="node-title">GitLab API Call</div><div class="node-desc">paginate_issues() with<br>updated_after = cursor - rewind</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node decision" data-detail="issue-cursor-filter"><div class="node-title">Cursor Filter</div><div class="node-desc">updated_at > cursor_ts<br>OR tie_breaker check</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node transform" data-detail="issue-transform"><div class="node-title">transform_issue()</div><div class="node-desc">GitLab API shape →<br>local DB row shape</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="issue-transaction"><div class="node-title">Transaction</div><div class="node-desc">store_payload → upsert →<br>mark_dirty → relink</div></div>
|
||||
</div>
|
||||
<div class="arrow-down">↓</div>
|
||||
<div class="flow-row">
|
||||
<div class="node db" data-detail="issue-cursor-update"><div class="node-title">Update Cursor</div><div class="node-desc">Every 100 issues + final<br>sync_cursors table</div></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--green-dim);color:var(--green)">2</div>
|
||||
<div class="phase-title">Discussion Sync <span class="phase-subtitle">Sequential, Watermark-Based</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node db" data-detail="issue-disc-query"><div class="node-title">Query Stale Issues</div><div class="node-desc">updated_at > COALESCE(<br>discussions_synced_for_<br>updated_at, 0)</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node api" data-detail="issue-disc-fetch"><div class="node-title">Paginate Discussions</div><div class="node-desc">Sequential per issue<br>paginate_issue_discussions()</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node transform" data-detail="issue-disc-transform"><div class="node-title">Transform</div><div class="node-desc">transform_discussion()<br>transform_notes()</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="issue-disc-write"><div class="node-title">Write Discussion</div><div class="node-desc">store_payload → upsert<br>DELETE notes → INSERT notes</div></div>
|
||||
</div>
|
||||
<div class="branch-container">
|
||||
<div class="branch-label success">✓ On Success (all pages fetched)</div>
|
||||
<div class="branch-row">
|
||||
<div class="node db" data-detail="issue-disc-stale"><div class="node-title">Remove Stale</div><div class="node-desc">DELETE discussions not<br>seen in this fetch</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="issue-disc-watermark"><div class="node-title">Advance Watermark</div><div class="node-desc">discussions_synced_for_<br>updated_at = updated_at</div></div>
|
||||
</div>
|
||||
<div class="branch-label error">✗ On Pagination Error</div>
|
||||
<div class="branch-row">
|
||||
<div class="node error" data-detail="issue-disc-fail"><div class="node-title">Skip Stale Removal</div><div class="node-desc">Watermark NOT advanced<br>Will retry next sync</div></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:rgba(247,120,186,0.15);color:var(--pink)">3</div>
|
||||
<div class="phase-title">Resource Events <span class="phase-subtitle">Queue-Based, Concurrent Fetch</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node queue" data-detail="re-cleanup"><div class="node-title">Cleanup Obsolete</div><div class="node-desc">DELETE jobs where entity<br>watermark is current</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node queue" data-detail="re-enqueue"><div class="node-title">Enqueue Jobs</div><div class="node-desc">INSERT for entities where<br>updated_at > watermark</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node queue" data-detail="re-claim"><div class="node-title">Claim Jobs</div><div class="node-desc">Atomic UPDATE...RETURNING<br>with lock acquisition</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node api" data-detail="re-fetch"><div class="node-title">Fetch Events</div><div class="node-desc">3 concurrent: state +<br>label + milestone</div></div>
|
||||
</div>
|
||||
<div class="branch-container">
|
||||
<div class="branch-label success">✓ On Success</div>
|
||||
<div class="branch-row">
|
||||
<div class="node db" data-detail="re-store"><div class="node-title">Store Events</div><div class="node-desc">Transaction: upsert all<br>3 event types</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="re-complete"><div class="node-title">Complete + Watermark</div><div class="node-desc">DELETE job row<br>Advance watermark</div></div>
|
||||
</div>
|
||||
<div class="branch-label error">✗ Permanent Error (404 / 403)</div>
|
||||
<div class="branch-row">
|
||||
<div class="node error" data-detail="re-permanent"><div class="node-title">Skip Permanently</div><div class="node-desc">complete_job + advance<br>watermark (coalesced)</div></div>
|
||||
</div>
|
||||
<div class="branch-label retry">↻ Transient Error</div>
|
||||
<div class="branch-row">
|
||||
<div class="node error" data-detail="re-transient"><div class="node-title">Backoff Retry</div><div class="node-desc">fail_job: 30s x 2^(n-1)<br>capped at 480s</div></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- MERGE REQUESTS -->
|
||||
<div class="flow-container" id="view-mrs">
|
||||
<div class="legend">
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API Call</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Transform</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Diff from Issues</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error Path</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--pink)"></div>Queue</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
|
||||
<div class="phase-title">Fetch MRs <span class="phase-subtitle">Page-Based Incremental Sync</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node api" data-detail="mr-api-call"><div class="node-title">GitLab API Call</div><div class="node-desc">fetch_merge_requests_page()<br>with cursor rewind</div><div class="diff-badge changed">Page-based, not streaming</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node decision" data-detail="mr-cursor-filter"><div class="node-title">Cursor Filter</div><div class="node-desc">Same logic as issues:<br>timestamp + tie-breaker</div><div class="diff-badge same">Same as issues</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node transform" data-detail="mr-transform"><div class="node-title">transform_merge_request()</div><div class="node-desc">Maps API shape →<br>local DB row</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="mr-transaction"><div class="node-title">Transaction</div><div class="node-desc">store → upsert → dirty →<br>labels + assignees + reviewers</div><div class="diff-badge changed">3 junction tables (not 2)</div></div>
|
||||
</div>
|
||||
<div class="arrow-down">↓</div>
|
||||
<div class="flow-row">
|
||||
<div class="node db" data-detail="mr-cursor-update"><div class="node-title">Update Cursor</div><div class="node-desc">Per page (not every 100)</div><div class="diff-badge changed">Per page boundary</div></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--green-dim);color:var(--green)">2</div>
|
||||
<div class="phase-title">MR Discussion Sync <span class="phase-subtitle">Parallel Prefetch + Serial Write</span></div>
|
||||
</div>
|
||||
<div class="info-box" style="margin-left:40px;margin-bottom:16px;">
|
||||
<div class="info-box-title">Key Differences from Issue Discussions</div>
|
||||
<ul>
|
||||
<li><b>Parallel prefetch</b> — fetches all discussions for a batch concurrently via <code>join_all()</code></li>
|
||||
<li><b>Upsert pattern</b> — notes use INSERT...ON CONFLICT (not delete-all + re-insert)</li>
|
||||
<li><b>Sweep stale</b> — uses <code>last_seen_at</code> timestamp comparison (not set difference)</li>
|
||||
<li><b>Sync health tracking</b> — records <code>discussions_sync_attempts</code> and <code>last_error</code></li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node db" data-detail="mr-disc-query"><div class="node-title">Query Stale MRs</div><div class="node-desc">updated_at > COALESCE(<br>discussions_synced_for_<br>updated_at, 0)</div><div class="diff-badge same">Same watermark logic</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node decision" data-detail="mr-disc-batch"><div class="node-title">Batch by Concurrency</div><div class="node-desc">dependent_concurrency<br>MRs per batch</div><div class="diff-badge changed">Batched processing</div></div>
|
||||
</div>
|
||||
<div class="arrow-down">↓</div>
|
||||
<div class="flow-row">
|
||||
<div class="node api" data-detail="mr-disc-prefetch"><div class="node-title">Parallel Prefetch</div><div class="node-desc">join_all() fetches all<br>discussions for batch</div><div class="diff-badge changed">Parallel (not sequential)</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node transform" data-detail="mr-disc-transform"><div class="node-title">Transform In-Memory</div><div class="node-desc">transform_mr_discussion()<br>+ diff position notes</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="mr-disc-write"><div class="node-title">Serial Write</div><div class="node-desc">upsert discussion<br>upsert notes (ON CONFLICT)</div><div class="diff-badge changed">Upsert, not delete+insert</div></div>
|
||||
</div>
|
||||
<div class="branch-container">
|
||||
<div class="branch-label success">✓ On Full Success</div>
|
||||
<div class="branch-row">
|
||||
<div class="node db" data-detail="mr-disc-sweep"><div class="node-title">Sweep Stale</div><div class="node-desc">DELETE WHERE last_seen_at<br>&lt; run_seen_at (disc + notes)</div><div class="diff-badge changed">last_seen_at sweep</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="mr-disc-watermark"><div class="node-title">Advance Watermark</div><div class="node-desc">discussions_synced_for_<br>updated_at = updated_at</div></div>
|
||||
</div>
|
||||
<div class="branch-label error">✗ On Failure</div>
|
||||
<div class="branch-row">
|
||||
<div class="node error" data-detail="mr-disc-fail"><div class="node-title">Record Sync Health</div><div class="node-desc">Watermark NOT advanced<br>Tracks attempts + last_error</div><div class="diff-badge changed">Health tracking</div></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:rgba(247,120,186,0.15);color:var(--pink)">3</div>
|
||||
<div class="phase-title">Resource Events <span class="phase-subtitle">Same as Issues</span></div>
|
||||
</div>
|
||||
<div class="info-box" style="margin-left:40px">
|
||||
<div class="info-box-title">Identical to Issue Resource Events</div>
|
||||
<ul>
|
||||
<li>Same queue-based approach: cleanup → enqueue → claim → fetch → store/fail</li>
|
||||
<li>Same watermark column: <code>resource_events_synced_for_updated_at</code></li>
|
||||
<li>Same error handling: 404/403 coalesced to empty, transient errors get backoff</li>
|
||||
<li>entity_type = <code>"merge_request"</code> instead of <code>"issue"</code></li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- DOCUMENTS -->
|
||||
<div class="flow-container" id="view-docs">
|
||||
<div class="legend">
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>Trigger</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Extract</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
|
||||
<div class="phase-title">Dirty Source Queue <span class="phase-subtitle">Populated During Ingestion</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node api" data-detail="doc-trigger"><div class="node-title">mark_dirty_tx()</div><div class="node-desc">Called during every issue/<br>MR/discussion upsert</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="doc-dirty-table"><div class="node-title">dirty_sources Table</div><div class="node-desc">INSERT (source_type, source_id)<br>ON CONFLICT reset backoff</div></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--amber-dim);color:var(--amber)">2</div>
|
||||
<div class="phase-title">Drain Loop <span class="phase-subtitle">Batch 500, Respects Backoff</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node db" data-detail="doc-drain"><div class="node-title">Get Dirty Sources</div><div class="node-desc">Batch 500, ORDER BY<br>attempt_count, queued_at</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node decision" data-detail="doc-dispatch"><div class="node-title">Dispatch by Type</div><div class="node-desc">issue / mr / discussion<br>→ extract function</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node decision" data-detail="doc-deleted-check"><div class="node-title">Source Exists?</div><div class="node-desc">If deleted: remove doc row<br>(cascade cleans FTS + embeds)</div></div>
|
||||
</div>
|
||||
<div class="arrow-down">↓</div>
|
||||
<div class="flow-row">
|
||||
<div class="node transform" data-detail="doc-extract"><div class="node-title">Extract Content</div><div class="node-desc">Structured text:<br>header + metadata + body</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node decision" data-detail="doc-triple-hash"><div class="node-title">Triple-Hash Check</div><div class="node-desc">content_hash + labels_hash<br>+ paths_hash all match?</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="doc-write"><div class="node-title">SAVEPOINT Write</div><div class="node-desc">Atomic: document row +<br>labels + paths</div></div>
|
||||
</div>
|
||||
<div class="branch-container">
|
||||
<div class="branch-label success">✓ On Success</div>
|
||||
<div class="branch-row">
|
||||
<div class="node db" data-detail="doc-clear"><div class="node-title">clear_dirty()</div><div class="node-desc">Remove from dirty_sources</div></div>
|
||||
</div>
|
||||
<div class="branch-label error">✗ On Error</div>
|
||||
<div class="branch-row">
|
||||
<div class="node error" data-detail="doc-error"><div class="node-title">record_dirty_error()</div><div class="node-desc">Increment attempt_count<br>Exponential backoff</div></div>
|
||||
</div>
|
||||
<div class="branch-label" style="color:var(--purple)">≡ Triple-Hash Match (skip)</div>
|
||||
<div class="branch-row">
|
||||
<div class="node db" data-detail="doc-skip"><div class="node-title">Skip Write</div><div class="node-desc">All 3 hashes match →<br>no WAL churn, clear dirty</div></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="info-box">
|
||||
<div class="info-box-title">Full Mode (<code>--full</code>)</div>
|
||||
<ul>
|
||||
<li>Seeds <b>ALL</b> entities into <code>dirty_sources</code> via keyset pagination</li>
|
||||
<li>Triple-hash optimization prevents redundant writes even in full mode</li>
|
||||
<li>Runs FTS <code>OPTIMIZE</code> after drain completes</li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- EMBEDDINGS -->
|
||||
<div class="flow-container" id="view-embed">
|
||||
<div class="legend">
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API (Ollama)</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Processing</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
|
||||
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--amber-dim);color:var(--amber)">1</div>
|
||||
<div class="phase-title">Change Detection <span class="phase-subtitle">Hash + Config Drift</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node decision" data-detail="embed-detect"><div class="node-title">find_pending_documents()</div><div class="node-desc">No metadata row? OR<br>document_hash mismatch? OR<br>config drift?</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="embed-paginate"><div class="node-title">Keyset Pagination</div><div class="node-desc">500 documents per page<br>ordered by doc ID</div></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--purple-dim);color:var(--purple)">2</div>
|
||||
<div class="phase-title">Chunking <span class="phase-subtitle">Split + Overflow Guard</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node transform" data-detail="embed-chunk"><div class="node-title">split_into_chunks()</div><div class="node-desc">Split by paragraph boundaries<br>with configurable overlap</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node decision" data-detail="embed-overflow"><div class="node-title">Overflow Guard</div><div class="node-desc">Too many chunks?<br>Skip to prevent rowid collision</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node transform" data-detail="embed-work"><div class="node-title">Build ChunkWork</div><div class="node-desc">Assign encoded chunk IDs<br>per document</div></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="phase">
|
||||
<div class="phase-header">
|
||||
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">3</div>
|
||||
<div class="phase-title">Ollama Embedding <span class="phase-subtitle">Batched API Calls</span></div>
|
||||
</div>
|
||||
<div class="flow-row">
|
||||
<div class="node api" data-detail="embed-batch"><div class="node-title">Batch Embed</div><div class="node-desc">32 chunks per Ollama<br>API call</div></div>
|
||||
<div class="arrow">→</div>
|
||||
<div class="node db" data-detail="embed-store"><div class="node-title">Store Vectors</div><div class="node-desc">sqlite-vec embeddings table<br>+ embedding_metadata</div></div>
|
||||
</div>
|
||||
<div class="branch-container">
|
||||
<div class="branch-label success">✓ On Success</div>
|
||||
<div class="branch-row">
|
||||
<div class="node db" data-detail="embed-success"><div class="node-title">SAVEPOINT Commit</div><div class="node-desc">Atomic per page:<br>clear old + write new</div></div>
|
||||
</div>
|
||||
<div class="branch-label retry">↻ Context-Length Error</div>
|
||||
<div class="branch-row">
|
||||
<div class="node error" data-detail="embed-ctx-error"><div class="node-title">Retry Individually</div><div class="node-desc">Re-embed each chunk solo<br>to isolate oversized one</div></div>
|
||||
</div>
|
||||
<div class="branch-label error">✗ Other Error</div>
|
||||
<div class="branch-row">
|
||||
<div class="node error" data-detail="embed-other-error"><div class="node-title">Record Error</div><div class="node-desc">Store in embedding_metadata<br>for retry next run</div></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="info-box">
|
||||
<div class="info-box-title">Full Mode (<code>--full</code>)</div>
|
||||
<ul>
|
||||
<li>DELETEs all <code>embedding_metadata</code> and <code>embeddings</code> rows first</li>
|
||||
<li>Every document re-processed from scratch</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="info-box">
|
||||
<div class="info-box-title">Non-Fatal in Sync</div>
|
||||
<ul>
|
||||
<li>Stage 4 failures (Ollama down, model missing) are <b>graceful</b></li>
|
||||
<li>Sync completes successfully; embeddings just won't be updated</li>
|
||||
<li>Semantic search degrades to FTS-only mode</li>
|
||||
</ul>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div></div>
|
||||
|
||||
<!-- Watermark Panel -->
|
||||
<div class="watermark-panel">
|
||||
<div class="watermark-toggle" onclick="toggleWatermarks()">
|
||||
<span class="chevron" id="wm-chevron">▲</span>
|
||||
Watermark & Cursor Reference
|
||||
</div>
|
||||
<div class="watermark-content" id="wm-content">
|
||||
<table class="wm-table">
|
||||
<thead><tr><th>Table</th><th>Column(s)</th><th>Purpose</th></tr></thead>
|
||||
<tbody>
|
||||
<tr><td>sync_cursors</td><td>updated_at_cursor + tie_breaker_id</td><td>Incremental fetch: "last entity we saw" per project+type</td></tr>
|
||||
<tr><td>issues</td><td>discussions_synced_for_updated_at</td><td>Per-issue discussion watermark</td></tr>
|
||||
<tr><td>issues</td><td>resource_events_synced_for_updated_at</td><td>Per-issue resource event watermark</td></tr>
|
||||
<tr><td>merge_requests</td><td>discussions_synced_for_updated_at</td><td>Per-MR discussion watermark</td></tr>
|
||||
<tr><td>merge_requests</td><td>resource_events_synced_for_updated_at</td><td>Per-MR resource event watermark</td></tr>
|
||||
<tr><td>dirty_sources</td><td>queued_at + next_attempt_at</td><td>Document regeneration queue with backoff</td></tr>
|
||||
<tr><td>embedding_metadata</td><td>document_hash + chunk_max_bytes + model + dims</td><td>Embedding staleness detection</td></tr>
|
||||
<tr><td>pending_dependent_fetches</td><td>locked_at + next_retry_at + attempts</td><td>Resource event job queue with backoff</td></tr>
|
||||
</tbody>
|
||||
</table>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Detail Panel -->
|
||||
<div class="detail-panel" id="detail-panel">
|
||||
<div class="detail-header">
|
||||
<h2 id="detail-title">Node Details</h2>
|
||||
<button class="detail-close" onclick="closeDetail()">×</button>
|
||||
</div>
|
||||
<div class="detail-body" id="detail-body"></div>
|
||||
</div>
|
||||
|
||||
<script>
|
||||
// Human-readable header titles for each sidebar view, keyed by view id.
const viewTitles = {
    overview: 'Full Sync Overview',
    issues: 'Issue Ingestion Flow',
    mrs: 'Merge Request Ingestion Flow',
    docs: 'Document Generation Flow',
    embed: 'Embedding Generation Flow',
};

// Stage/phase-count badge text shown next to the header title, per view id.
const viewBadges = {
    overview: '4 stages',
    issues: '3 phases',
    mrs: '3 phases',
    docs: '2 phases',
    embed: '3 phases',
};
|
||||
|
||||
// Activate the flow diagram for `view`, sync the sidebar highlight,
// refresh the header title/badge, and dismiss any open detail panel.
function switchView(view) {
    // Hide every flow container, then reveal only the requested one.
    var containers = document.querySelectorAll('.flow-container');
    for (var i = 0; i < containers.length; i++) {
        containers[i].classList.remove('active');
    }
    document.getElementById('view-' + view).classList.add('active');

    // Highlight the nav item whose data-view matches the selected view.
    var navItems = document.querySelectorAll('.nav-item');
    for (var j = 0; j < navItems.length; j++) {
        navItems[j].classList.toggle('active', navItems[j].dataset.view === view);
    }

    // Update the header text from the lookup tables and close the detail panel.
    document.getElementById('view-title').textContent = viewTitles[view];
    document.getElementById('view-badge').textContent = viewBadges[view];
    closeDetail();
}
|
||||
|
||||
// Expand or collapse the watermark/cursor reference table and
// flip the chevron indicator to match.
function toggleWatermarks() {
    var content = document.getElementById('wm-content');
    var chevron = document.getElementById('wm-chevron');
    content.classList.toggle('open');
    chevron.classList.toggle('open');
}
|
||||
|
||||
var details = {
|
||||
'issue-api-call': { title: 'GitLab API: Paginate Issues', type: 'api', file: 'src/ingestion/issues.rs:51-140', desc: 'Streams issues from the GitLab API using cursor-based incremental sync. The API is called with updated_after set to the last known cursor minus a configurable rewind window (to handle clock skew between GitLab and the local database).', sql: 'GET /api/v4/projects/{id}/issues\n ?updated_after={cursor - rewind_seconds}\n &order_by=updated_at&sort=asc\n &per_page=100' },
|
||||
'issue-cursor-filter': { title: 'Cursor Filter (Dedup)', type: 'decision', file: 'src/ingestion/issues.rs:95-110', desc: 'Because of the cursor rewind, some issues will be re-fetched that we already have. The cursor filter skips these using a two-part comparison: primary on updated_at timestamp, with gitlab_id as a tie-breaker when timestamps are equal.', sql: '// Pseudocode:\nif issue.updated_at > cursor_ts:\n ACCEPT // newer than cursor\nelif issue.updated_at == cursor_ts\n AND issue.gitlab_id > tie_breaker_id:\n ACCEPT // same timestamp, higher ID\nelse:\n SKIP // already processed' },
|
||||
'issue-transform': { title: 'Transform Issue', type: 'transform', file: 'src/gitlab/transformers/issue.rs', desc: 'Maps the GitLab API response shape to the local database row shape. Parses ISO 8601 timestamps to milliseconds-since-epoch, extracts label names, assignee usernames, milestone info, and due dates.' },
|
||||
'issue-transaction': { title: 'Issue Write Transaction', type: 'db', file: 'src/ingestion/issues.rs:190-220', desc: 'All operations for a single issue are wrapped in one SQLite transaction for atomicity. If any step fails, the entire issue write is rolled back.', sql: 'BEGIN;\n-- 1. Store raw JSON payload (compressed, deduped)\nINSERT INTO payloads ...;\n-- 2. Upsert issue row\nINSERT INTO issues ... ON CONFLICT(gitlab_id)\n DO UPDATE SET ...;\n-- 3. Mark dirty for document regen\nINSERT INTO dirty_sources ...;\n-- 4. Relink labels\nDELETE FROM issue_labels WHERE issue_id = ?;\nINSERT INTO labels ... ON CONFLICT DO UPDATE;\nINSERT INTO issue_labels ...;\n-- 5. Relink assignees\nDELETE FROM issue_assignees WHERE issue_id = ?;\nINSERT INTO issue_assignees ...;\nCOMMIT;' },
|
||||
'issue-cursor-update': { title: 'Update Sync Cursor', type: 'db', file: 'src/ingestion/issues.rs:130-140', desc: 'The sync cursor is updated every 100 issues (for crash recovery) and once at the end of the stream. If the process crashes mid-sync, it resumes from at most 100 issues back.', sql: 'INSERT INTO sync_cursors\n (project_id, resource_type,\n updated_at_cursor, tie_breaker_id)\nVALUES (?1, \'issues\', ?2, ?3)\nON CONFLICT(project_id, resource_type)\n DO UPDATE SET\n updated_at_cursor = ?2,\n tie_breaker_id = ?3;' },
|
||||
'issue-disc-query': { title: 'Query Issues Needing Discussion Sync', type: 'db', file: 'src/ingestion/issues.rs:450-471', desc: 'Finds all issues in this project whose updated_at timestamp exceeds their per-row discussion watermark. Issues that have not changed since their last discussion sync are skipped entirely.', sql: 'SELECT id, iid, updated_at\nFROM issues\nWHERE project_id = ?1\n AND updated_at > COALESCE(\n discussions_synced_for_updated_at, 0\n );' },
|
||||
'issue-disc-fetch': { title: 'Paginate Issue Discussions', type: 'api', file: 'src/ingestion/discussions.rs:73-205', desc: 'Discussions are fetched sequentially per issue (rusqlite Connection is not Send, so async parallelism is not possible here). Each issue\'s discussions are streamed page by page from the GitLab API.', sql: 'GET /api/v4/projects/{id}/issues/{iid}\n /discussions?per_page=100' },
|
||||
'issue-disc-transform': { title: 'Transform Discussion + Notes', type: 'transform', file: 'src/gitlab/transformers/discussion.rs', desc: 'Transforms the raw GitLab discussion payload into normalized rows. Sets NoteableRef::Issue. Computes resolvable/resolved status, first_note_at/last_note_at timestamps, and per-note position indices.' },
|
||||
'issue-disc-write': { title: 'Write Discussion (Full Refresh)', type: 'db', file: 'src/ingestion/discussions.rs:140-180', desc: 'Issue discussions use a full-refresh pattern: all existing notes for a discussion are deleted and re-inserted. This is simpler than upsert but means partial failures lose the previous state.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO discussions ... ON CONFLICT DO UPDATE;\nINSERT INTO dirty_sources ...;\n-- Full refresh: delete all then re-insert\nDELETE FROM notes WHERE discussion_id = ?;\nINSERT INTO notes VALUES (...);\nCOMMIT;' },
|
||||
'issue-disc-stale': { title: 'Remove Stale Discussions', type: 'db', file: 'src/ingestion/discussions.rs:185-195', desc: 'After successfully fetching ALL discussion pages for an issue, any discussions in the DB that were not seen in this fetch are deleted. Uses a temp table for >500 IDs to avoid SQLite\'s 999-variable limit.', sql: '-- For small sets (<= 500):\nDELETE FROM discussions\nWHERE issue_id = ?\n AND gitlab_id NOT IN (...);\n\n-- For large sets (> 500):\nCREATE TEMP TABLE seen_ids(id TEXT);\nINSERT INTO seen_ids ...;\nDELETE FROM discussions\nWHERE issue_id = ?\n AND gitlab_id NOT IN\n (SELECT id FROM seen_ids);\nDROP TABLE seen_ids;' },
|
||||
'issue-disc-watermark': { title: 'Advance Discussion Watermark', type: 'db', file: 'src/ingestion/discussions.rs:198', desc: 'Sets the per-issue watermark to the issue\'s current updated_at, signaling that discussions are now synced for this version of the issue.', sql: 'UPDATE issues\nSET discussions_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
|
||||
'issue-disc-fail': { title: 'Pagination Error Handling', type: 'error', file: 'src/ingestion/discussions.rs:182', desc: 'If pagination fails mid-stream, stale discussion removal is skipped (we don\'t know the full set) and the watermark is NOT advanced. The issue will be retried on the next sync run.' },
|
||||
're-cleanup': { title: 'Cleanup Obsolete Jobs', type: 'queue', file: 'src/ingestion/orchestrator.rs:490-520', desc: 'Before enqueuing new jobs, delete any existing jobs for entities whose watermark is already current. These are leftover from a previous run.', sql: 'DELETE FROM pending_dependent_fetches\nWHERE project_id = ?\n AND job_type = \'resource_events\'\n AND entity_local_id IN (\n SELECT id FROM issues\n WHERE project_id = ?\n AND updated_at <= COALESCE(\n resource_events_synced_for_updated_at, 0\n )\n );' },
|
||||
're-enqueue': { title: 'Enqueue Resource Event Jobs', type: 'queue', file: 'src/ingestion/orchestrator.rs:525-555', desc: 'For each entity whose updated_at exceeds its resource event watermark, insert a job into the queue. Uses INSERT OR IGNORE for idempotency.', sql: 'INSERT OR IGNORE INTO pending_dependent_fetches\n (project_id, entity_type, entity_iid,\n entity_local_id, job_type, enqueued_at)\nSELECT project_id, \'issue\', iid, id,\n \'resource_events\', ?now\nFROM issues\nWHERE project_id = ?\n AND updated_at > COALESCE(\n resource_events_synced_for_updated_at, 0\n );' },
|
||||
're-claim': { title: 'Claim Jobs (Atomic Lock)', type: 'queue', file: 'src/core/dependent_queue.rs', desc: 'Atomically claims a batch of unlocked jobs whose backoff period has elapsed. Uses UPDATE...RETURNING for lock acquisition in a single statement.', sql: 'UPDATE pending_dependent_fetches\nSET locked_at = ?now\nWHERE rowid IN (\n SELECT rowid\n FROM pending_dependent_fetches\n WHERE project_id = ?\n AND job_type = \'resource_events\'\n AND locked_at IS NULL\n AND (next_retry_at IS NULL\n OR next_retry_at <= ?now)\n ORDER BY enqueued_at ASC\n LIMIT ?batch_size\n)\nRETURNING *;' },
|
||||
're-fetch': { title: 'Fetch 3 Event Types Concurrently', type: 'api', file: 'src/gitlab/client.rs:732-771', desc: 'Uses tokio::join! (not try_join!) to fetch state, label, and milestone events concurrently. Permanent errors (404, 403) are coalesced to empty vecs via coalesce_inaccessible().', sql: 'tokio::join!(\n fetch_issue_state_events(proj, iid),\n fetch_issue_label_events(proj, iid),\n fetch_issue_milestone_events(proj, iid),\n)\n// Each: coalesce_inaccessible()\n// 404/403 -> Ok(vec![])\n// Other errors -> propagated' },
|
||||
're-store': { title: 'Store Resource Events', type: 'db', file: 'src/ingestion/orchestrator.rs:620-640', desc: 'All three event types are upserted in a single transaction.', sql: 'BEGIN;\nINSERT INTO resource_state_events ...\n ON CONFLICT DO UPDATE;\nINSERT INTO resource_label_events ...\n ON CONFLICT DO UPDATE;\nINSERT INTO resource_milestone_events ...\n ON CONFLICT DO UPDATE;\nCOMMIT;' },
|
||||
're-complete': { title: 'Complete Job + Advance Watermark', type: 'db', file: 'src/ingestion/orchestrator.rs:645-660', desc: 'After successful storage, the job row is deleted and the entity\'s watermark is advanced.', sql: 'DELETE FROM pending_dependent_fetches\n WHERE rowid = ?;\n\nUPDATE issues\nSET resource_events_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
|
||||
're-permanent': { title: 'Permanent Error: Skip Entity', type: 'error', file: 'src/ingestion/orchestrator.rs:665-680', desc: '404 (endpoint doesn\'t exist) and 403 (insufficient permissions) are permanent. The job is completed and watermark advanced, so this entity is permanently skipped until next updated on GitLab.' },
|
||||
're-transient': { title: 'Transient Error: Exponential Backoff', type: 'error', file: 'src/core/dependent_queue.rs', desc: 'Network errors, 500s, rate limits get exponential backoff. Formula: 30s * 2^(attempts-1), capped at 480s (8 minutes).', sql: 'UPDATE pending_dependent_fetches\nSET locked_at = NULL,\n attempts = attempts + 1,\n next_retry_at = ?now\n + 30000 * pow(2, attempts),\n -- capped at 480000ms (8 min)\n last_error = ?error_msg\nWHERE rowid = ?;' },
|
||||
'mr-api-call': { title: 'GitLab API: Fetch MR Pages', type: 'api', file: 'src/ingestion/merge_requests.rs:51-151', desc: 'Unlike issues which stream, MRs use explicit page-based pagination via fetch_merge_requests_page(). Each page returns items plus a next_page indicator.', sql: 'GET /api/v4/projects/{id}/merge_requests\n ?updated_after={cursor - rewind}\n &order_by=updated_at&sort=asc\n &per_page=100&page={n}' },
|
||||
'mr-cursor-filter': { title: 'Cursor Filter', type: 'decision', file: 'src/ingestion/merge_requests.rs:90-105', desc: 'Identical logic to issues: timestamp comparison with gitlab_id tie-breaker.' },
|
||||
'mr-transform': { title: 'Transform Merge Request', type: 'transform', file: 'src/gitlab/transformers/mr.rs', desc: 'Maps GitLab MR response to local row. Handles draft detection (prefers draft field, falls back to work_in_progress), detailed_merge_status, merge_user resolution, and reviewer extraction.' },
|
||||
'mr-transaction': { title: 'MR Write Transaction', type: 'db', file: 'src/ingestion/merge_requests.rs:170-210', desc: 'Same pattern as issues but with THREE junction tables: labels, assignees, AND reviewers.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO merge_requests ...\n ON CONFLICT DO UPDATE;\nINSERT INTO dirty_sources ...;\n-- 3 junction tables:\nDELETE FROM mr_labels WHERE mr_id = ?;\nINSERT INTO mr_labels ...;\nDELETE FROM mr_assignees WHERE mr_id = ?;\nINSERT INTO mr_assignees ...;\nDELETE FROM mr_reviewers WHERE mr_id = ?;\nINSERT INTO mr_reviewers ...;\nCOMMIT;' },
|
||||
'mr-cursor-update': { title: 'Update Cursor Per Page', type: 'db', file: 'src/ingestion/merge_requests.rs:140-150', desc: 'Unlike issues (every 100 items), MR cursor is updated at each page boundary for better crash recovery.' },
|
||||
'mr-disc-query': { title: 'Query MRs Needing Discussion Sync', type: 'db', file: 'src/ingestion/merge_requests.rs:430-451', desc: 'Same watermark pattern as issues. Runs AFTER MR ingestion to avoid memory growth.', sql: 'SELECT id, iid, updated_at\nFROM merge_requests\nWHERE project_id = ?1\n AND updated_at > COALESCE(\n discussions_synced_for_updated_at, 0\n );' },
|
||||
'mr-disc-batch': { title: 'Batch by Concurrency', type: 'decision', file: 'src/ingestion/orchestrator.rs:420-465', desc: 'MRs are processed in batches sized by dependent_concurrency. Each batch first prefetches all discussions in parallel, then writes serially.' },
|
||||
'mr-disc-prefetch': { title: 'Parallel Prefetch', type: 'api', file: 'src/ingestion/mr_discussions.rs:66-120', desc: 'All MRs in the batch have their discussions fetched concurrently via join_all(). Each MR\'s discussions are fetched in one call, transformed in memory, and returned as PrefetchedMrDiscussions.', sql: '// For each MR in batch, concurrently:\nGET /api/v4/projects/{id}/merge_requests\n /{iid}/discussions?per_page=100\n\n// All fetched + transformed in memory\n// before any DB writes happen' },
|
||||
'mr-disc-transform': { title: 'Transform MR Discussions', type: 'transform', file: 'src/ingestion/mr_discussions.rs:125-160', desc: 'Uses transform_mr_discussion() which additionally handles DiffNote positions (file paths, line ranges, SHA triplets).' },
|
||||
'mr-disc-write': { title: 'Serial Write (Upsert Pattern)', type: 'db', file: 'src/ingestion/mr_discussions.rs:165-220', desc: 'Unlike issue discussions (delete-all + re-insert), MR discussions use INSERT...ON CONFLICT DO UPDATE for both discussions and notes. Safer for partial failures.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO discussions ...\n ON CONFLICT DO UPDATE\n SET ..., last_seen_at = ?run_ts;\nINSERT INTO dirty_sources ...;\n-- Upsert notes (not delete+insert):\nINSERT INTO notes ...\n ON CONFLICT DO UPDATE\n SET ..., last_seen_at = ?run_ts;\nCOMMIT;' },
|
||||
'mr-disc-sweep': { title: 'Sweep Stale (last_seen_at)', type: 'db', file: 'src/ingestion/mr_discussions.rs:225-245', desc: 'Staleness detected via last_seen_at timestamps. Both discussions AND notes are swept independently.', sql: '-- Sweep stale discussions:\nDELETE FROM discussions\nWHERE merge_request_id = ?\n AND last_seen_at < ?run_seen_at;\n\n-- Sweep stale notes:\nDELETE FROM notes\nWHERE discussion_id IN (\n SELECT id FROM discussions\n WHERE merge_request_id = ?\n) AND last_seen_at < ?run_seen_at;' },
|
||||
'mr-disc-watermark': { title: 'Advance MR Discussion Watermark', type: 'db', file: 'src/ingestion/mr_discussions.rs:248', desc: 'Same as issues: stamps the per-MR watermark.', sql: 'UPDATE merge_requests\nSET discussions_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
|
||||
'mr-disc-fail': { title: 'Failure: Sync Health Tracking', type: 'error', file: 'src/ingestion/mr_discussions.rs:252-260', desc: 'Unlike issues, MR discussion failures are tracked: discussions_sync_attempts is incremented and discussions_sync_last_error is recorded. Watermark is NOT advanced.' },
|
||||
'doc-trigger': { title: 'mark_dirty_tx()', type: 'api', file: 'src/ingestion/dirty_tracker.rs', desc: 'Called during every upsert in ingestion. Inserts into dirty_sources, or on conflict resets backoff. This bridges ingestion (stages 1-2) and document generation (stage 3).', sql: 'INSERT INTO dirty_sources\n (source_type, source_id, queued_at)\nVALUES (?1, ?2, ?now)\nON CONFLICT(source_type, source_id)\n DO UPDATE SET\n queued_at = ?now,\n attempt_count = 0,\n next_attempt_at = NULL,\n last_error = NULL;' },
|
||||
'doc-dirty-table': { title: 'dirty_sources Table', type: 'db', file: 'src/ingestion/dirty_tracker.rs', desc: 'Persistent queue of entities needing document regeneration. Supports exponential backoff for failed extractions.' },
|
||||
'doc-drain': { title: 'Get Dirty Sources (Batched)', type: 'db', file: 'src/documents/regenerator.rs:35-45', desc: 'Fetches up to 500 dirty entries per batch, prioritizing fewer attempts. Respects exponential backoff.', sql: 'SELECT source_type, source_id\nFROM dirty_sources\nWHERE next_attempt_at IS NULL\n OR next_attempt_at <= ?now\nORDER BY attempt_count ASC,\n queued_at ASC\nLIMIT 500;' },
|
||||
'doc-dispatch': { title: 'Dispatch by Source Type', type: 'decision', file: 'src/documents/extractor.rs', desc: 'Routes to the appropriate extraction function: "issue" -> extract_issue_document(), "merge_request" -> extract_mr_document(), "discussion" -> extract_discussion_document().' },
|
||||
'doc-deleted-check': { title: 'Source Exists Check', type: 'decision', file: 'src/documents/regenerator.rs:48-55', desc: 'If the source entity was deleted, the extractor returns None. The regenerator deletes the document row. FK cascades clean up FTS and embeddings.' },
|
||||
'doc-extract': { title: 'Extract Structured Content', type: 'transform', file: 'src/documents/extractor.rs', desc: 'Builds searchable text:\n[[Issue]] #42: Title\nProject: group/repo\nURL: ...\nLabels: [bug, urgent]\nState: opened\n\n--- Description ---\n...\n\nDiscussions inherit parent labels and extract DiffNote file paths.' },
|
||||
'doc-triple-hash': { title: 'Triple-Hash Write Optimization', type: 'decision', file: 'src/documents/regenerator.rs:55-62', desc: 'Checks content_hash + labels_hash + paths_hash against existing document. If ALL three match, write is completely skipped. Critical for --full mode performance.' },
|
||||
'doc-write': { title: 'SAVEPOINT Atomic Write', type: 'db', file: 'src/documents/regenerator.rs:58-65', desc: 'Document, labels, and paths written inside a SAVEPOINT for atomicity.', sql: 'SAVEPOINT doc_write;\nINSERT INTO documents ...\n ON CONFLICT DO UPDATE SET\n content = ?, content_hash = ?,\n labels_hash = ?, paths_hash = ?;\nDELETE FROM document_labels\n WHERE doc_id = ?;\nINSERT INTO document_labels ...;\nDELETE FROM document_paths\n WHERE doc_id = ?;\nINSERT INTO document_paths ...;\nRELEASE doc_write;' },
|
||||
'doc-clear': { title: 'Clear Dirty Entry', type: 'db', file: 'src/ingestion/dirty_tracker.rs', desc: 'On success, the dirty_sources row is deleted.', sql: 'DELETE FROM dirty_sources\nWHERE source_type = ?\n AND source_id = ?;' },
|
||||
'doc-error': { title: 'Record Error + Backoff', type: 'error', file: 'src/ingestion/dirty_tracker.rs', desc: 'Increments attempt_count, sets next_attempt_at with exponential backoff. Entry stays for retry.', sql: 'UPDATE dirty_sources\nSET attempt_count = attempt_count + 1,\n next_attempt_at = ?now\n + compute_backoff(attempt_count),\n last_error = ?error_msg\nWHERE source_type = ?\n AND source_id = ?;' },
|
||||
'doc-skip': { title: 'Skip Write (Hash Match)', type: 'db', file: 'src/documents/regenerator.rs:57', desc: 'When all three hashes match, the document has not actually changed. Common when updated_at changes but content/labels/paths remain the same. Dirty entry is cleared without writes.' },
|
||||
'embed-detect': { title: 'Change Detection', type: 'decision', file: 'src/embedding/change_detector.rs', desc: 'Document needs re-embedding if: (1) No embedding_metadata row, (2) document_hash mismatch, (3) Config drift in chunk_max_bytes, model, or dims.', sql: 'SELECT d.id, d.content, d.content_hash\nFROM documents d\nLEFT JOIN embedding_metadata em\n ON em.document_id = d.id\nWHERE em.document_id IS NULL\n OR em.document_hash != d.content_hash\n OR em.chunk_max_bytes != ?config\n OR em.model != ?model\n OR em.dims != ?dims;' },
|
||||
'embed-paginate': { title: 'Keyset Pagination', type: 'db', file: 'src/embedding/pipeline.rs:80-100', desc: '500 documents per page using keyset pagination. Each page wrapped in a SAVEPOINT.' },
|
||||
'embed-chunk': { title: 'Split Into Chunks', type: 'transform', file: 'src/embedding/chunking.rs', desc: 'Splits content at paragraph boundaries with configurable max size and overlap.' },
|
||||
'embed-overflow': { title: 'Overflow Guard', type: 'decision', file: 'src/embedding/pipeline.rs:110-120', desc: 'If a document produces too many chunks, it is skipped to prevent rowid collisions in the encoded chunk ID scheme.' },
|
||||
'embed-work': { title: 'Build ChunkWork Items', type: 'transform', file: 'src/embedding/pipeline.rs:125-140', desc: 'Each chunk gets an encoded ID (document_id * 1000000 + chunk_index) for the sqlite-vec primary key.' },
|
||||
'embed-batch': { title: 'Batch Embed via Ollama', type: 'api', file: 'src/embedding/pipeline.rs:150-200', desc: 'Sends 32 chunks per Ollama API call. Model default: nomic-embed-text.', sql: 'POST http://localhost:11434/api/embed\n{\n "model": "nomic-embed-text",\n "input": ["chunk1...", "chunk2...", ...]\n}' },
|
||||
'embed-store': { title: 'Store Vectors', type: 'db', file: 'src/embedding/pipeline.rs:205-230', desc: 'Vectors stored in sqlite-vec virtual table. Metadata in embedding_metadata. Old embeddings cleared on first successful chunk.', sql: '-- Clear old embeddings:\nDELETE FROM embeddings\n WHERE rowid / 1000000 = ?doc_id;\n\n-- Insert new vector:\nINSERT INTO embeddings(rowid, embedding)\nVALUES (?chunk_id, ?vector_blob);\n\n-- Update metadata:\nINSERT INTO embedding_metadata ...\n ON CONFLICT DO UPDATE SET\n document_hash = ?,\n chunk_max_bytes = ?,\n model = ?, dims = ?;' },
|
||||
'embed-success': { title: 'SAVEPOINT Commit', type: 'db', file: 'src/embedding/pipeline.rs:240-250', desc: 'Each page of 500 documents wrapped in a SAVEPOINT. Completed pages survive crashes.' },
|
||||
'embed-ctx-error': { title: 'Context-Length Retry', type: 'error', file: 'src/embedding/pipeline.rs:260-280', desc: 'If Ollama returns context-length error for a batch, each chunk is retried individually to isolate the oversized one.' },
|
||||
'embed-other-error': { title: 'Record Error for Retry', type: 'error', file: 'src/embedding/pipeline.rs:285-295', desc: 'Network/model errors recorded in embedding_metadata. Document detected as pending again on next run.' },
|
||||
};
|
||||
|
||||
// Escape the five HTML-special characters so `str` can be embedded in
// markup safely.
//
// Bug fixed: the previous implementation appended a text node to a <div>
// and returned `div.textContent` — but textContent never entity-encodes,
// so the function returned its input UNCHANGED. The DOM property that
// performs encoding is `innerHTML`. Rather than keep the DOM round-trip,
// this version escapes via plain string replacement, which is correct,
// environment-independent, and trivially testable. `&` is replaced first
// so already-produced entities are not double-escaped.
function escapeHtml(str) {
  return String(str)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}
|
||||
|
||||
// Build the DOM fragment rendered in the detail panel for one node entry
// `d`. Produces: a tag row (type badge + optional file location), a
// description section, and — when `d.sql` is present — a preformatted
// query/code section. All text is inserted via textContent, so no
// escaping is required.
function buildDetailContent(d) {
  // Helper: element with a class and optional text content.
  function makeEl(tag, className, text) {
    const el = document.createElement(tag);
    el.className = className;
    if (text !== undefined) el.textContent = text;
    return el;
  }

  const frag = document.createDocumentFragment();

  // Tag row: type badge, plus the source file location when known.
  const tags = makeEl('div', 'detail-section');
  tags.appendChild(makeEl('span', 'detail-tag type-' + d.type, d.type.toUpperCase()));
  if (d.file) {
    tags.appendChild(makeEl('span', 'detail-tag file', d.file));
  }
  frag.appendChild(tags);

  // Description section.
  const desc = makeEl('div', 'detail-section');
  const descHeading = document.createElement('h3');
  descHeading.textContent = 'Description';
  desc.appendChild(descHeading);
  const descPara = document.createElement('p');
  descPara.textContent = d.desc;
  desc.appendChild(descPara);
  frag.appendChild(desc);

  // Optional SQL / code sample.
  if (d.sql) {
    const sqlSec = makeEl('div', 'detail-section');
    const sqlHeading = document.createElement('h3');
    sqlHeading.textContent = 'Key Query / Code';
    sqlSec.appendChild(sqlHeading);
    sqlSec.appendChild(makeEl('div', 'sql-block', d.sql));
    frag.appendChild(sqlSec);
  }

  return frag;
}
|
||||
|
||||
// Populate and open the detail panel for the node keyed by `key`.
// Unknown keys are ignored. Also moves the `.selected` highlight to the
// corresponding node element, when one exists in the DOM.
function showDetail(key) {
  const entry = details[key];
  if (!entry) return;

  const panel = document.getElementById('detail-panel');
  document.getElementById('detail-title').textContent = entry.title;

  // Clear previous content before inserting the fresh fragment.
  const body = document.getElementById('detail-body');
  while (body.firstChild) {
    body.removeChild(body.firstChild);
  }
  body.appendChild(buildDetailContent(entry));

  // Move the selection highlight to the clicked node.
  document.querySelectorAll('.node.selected').forEach((n) => {
    n.classList.remove('selected');
  });
  const clicked = document.querySelector('[data-detail="' + key + '"]');
  if (clicked) {
    clicked.classList.add('selected');
  }

  panel.classList.add('open');
}
|
||||
|
||||
// Hide the detail panel and clear any node selection highlight.
function closeDetail() {
  const panel = document.getElementById('detail-panel');
  panel.classList.remove('open');
  const selected = document.querySelectorAll('.node.selected');
  selected.forEach((n) => n.classList.remove('selected'));
}
|
||||
|
||||
// Click routing: a click on a node (or anything inside one) opens its
// detail entry; a click outside both the panel and any node closes the
// panel. Clicks inside the open panel are left alone.
document.addEventListener('click', (e) => {
  const node = e.target.closest('.node[data-detail]');
  if (node) {
    showDetail(node.dataset.detail);
    return;
  }
  const insidePanel = e.target.closest('.detail-panel');
  const insideNode = e.target.closest('.node');
  if (!insidePanel && !insideNode) closeDetail();
});

// Escape key closes the detail panel.
document.addEventListener('keydown', (e) => {
  if (e.key === 'Escape') closeDetail();
});
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
||||
10
migrations/013_resource_event_watermarks.sql
Normal file
10
migrations/013_resource_event_watermarks.sql
Normal file
@@ -0,0 +1,10 @@
|
||||
-- Migration 013: add resource-event sync watermarks.
--
-- Mirrors the existing discussions_synced_for_updated_at pattern: only
-- entities whose updated_at exceeds the last resource-event sync get
-- re-enqueued for resource-event fetching. NULL means "never synced".

ALTER TABLE issues
    ADD COLUMN resource_events_synced_for_updated_at INTEGER;

ALTER TABLE merge_requests
    ADD COLUMN resource_events_synced_for_updated_at INTEGER;

-- Record this migration (applied_at is milliseconds since epoch).
INSERT INTO schema_version (version, applied_at, description)
VALUES (13, strftime('%s', 'now') * 1000,
        'Add resource event sync watermarks to issues and merge_requests');
|
||||
12
migrations/014_sync_runs_enrichment.sql
Normal file
12
migrations/014_sync_runs_enrichment.sql
Normal file
@@ -0,0 +1,12 @@
|
||||
-- Migration 014: enrich sync_runs for observability.
--
-- Adds a correlation ID plus aggregate counters so sync history becomes
-- queryable and can be matched against run_ids emitted in the logs.

ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;

-- Correlation lookups: find the run row for a run_id seen in the logs.
CREATE INDEX IF NOT EXISTS idx_sync_runs_run_id ON sync_runs(run_id);

-- Record this migration (applied_at is milliseconds since epoch).
INSERT INTO schema_version (version, applied_at, description)
VALUES (14, strftime('%s', 'now') * 1000,
        'Sync runs enrichment for observability');
|
||||
@@ -6,7 +6,7 @@ use serde::Serialize;
|
||||
use crate::core::config::Config;
|
||||
use crate::core::db::{create_connection, get_schema_version, verify_pragmas};
|
||||
use crate::core::error::LoreError;
|
||||
use crate::core::paths::{get_config_path, get_db_path};
|
||||
use crate::core::paths::{get_config_path, get_db_path, get_log_dir};
|
||||
use crate::gitlab::GitLabClient;
|
||||
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
@@ -37,6 +37,7 @@ pub struct DoctorChecks {
|
||||
pub gitlab: GitLabCheck,
|
||||
pub projects: ProjectsCheck,
|
||||
pub ollama: OllamaCheck,
|
||||
pub logging: LoggingCheck,
|
||||
}
|
||||
|
||||
#[derive(Debug, Serialize)]
|
||||
@@ -87,6 +88,18 @@ pub struct OllamaCheck {
|
||||
pub model: Option<String>,
|
||||
}
|
||||
|
||||
/// Result of the log-directory health check reported by `lore doctor`.
///
/// Serialized into the doctor JSON output; the inner [`CheckResult`] is
/// flattened so `status`/`message` appear at this object's top level.
#[derive(Debug, Serialize)]
pub struct LoggingCheck {
    // Overall status + human-readable message, flattened into this object.
    #[serde(flatten)]
    pub result: CheckResult,
    // Resolved log directory path, when one could be determined.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub log_dir: Option<String>,
    // Number of `lore.*` log files found in that directory.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub file_count: Option<usize>,
    // Combined size of those log files, in bytes.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub total_bytes: Option<u64>,
}
|
||||
|
||||
/// Run the doctor command.
|
||||
pub async fn run_doctor(config_path: Option<&str>) -> DoctorResult {
|
||||
let config_path_buf = get_config_path(config_path);
|
||||
@@ -107,7 +120,10 @@ pub async fn run_doctor(config_path: Option<&str>) -> DoctorResult {
|
||||
// Check Ollama
|
||||
let ollama_check = check_ollama(config.as_ref()).await;
|
||||
|
||||
// Success if all required checks pass (ollama is optional)
|
||||
// Check logging
|
||||
let logging_check = check_logging(config.as_ref());
|
||||
|
||||
// Success if all required checks pass (ollama and logging are optional)
|
||||
let success = config_check.result.status == CheckStatus::Ok
|
||||
&& database_check.result.status == CheckStatus::Ok
|
||||
&& gitlab_check.result.status == CheckStatus::Ok
|
||||
@@ -121,6 +137,7 @@ pub async fn run_doctor(config_path: Option<&str>) -> DoctorResult {
|
||||
gitlab: gitlab_check,
|
||||
projects: projects_check,
|
||||
ollama: ollama_check,
|
||||
logging: logging_check,
|
||||
},
|
||||
}
|
||||
}
|
||||
@@ -457,6 +474,59 @@ async fn check_ollama(config: Option<&Config>) -> OllamaCheck {
|
||||
}
|
||||
}
|
||||
|
||||
/// Doctor check for the log directory: resolves the configured (or
/// default) log dir, counts `lore.*` log files, and sums their sizes.
///
/// Always reports `CheckStatus::Ok` — logging is informational only and
/// never fails the doctor run (a missing directory just means no log
/// files have been written yet).
fn check_logging(config: Option<&Config>) -> LoggingCheck {
    // Config may override the log dir; fall back to the default location.
    let log_dir = get_log_dir(config.and_then(|c| c.logging.log_dir.as_deref()));
    let log_dir_str = log_dir.display().to_string();

    // No directory yet: report Ok with zero files rather than an error.
    if !log_dir.exists() {
        return LoggingCheck {
            result: CheckResult {
                status: CheckStatus::Ok,
                message: Some(format!("{log_dir_str} (no log files yet)")),
            },
            log_dir: Some(log_dir_str),
            file_count: Some(0),
            total_bytes: Some(0),
        };
    }

    let mut file_count = 0usize;
    let mut total_bytes = 0u64;

    // Best-effort scan: read_dir or per-entry metadata failures are
    // silently skipped so doctor never fails on a flaky filesystem.
    if let Ok(entries) = std::fs::read_dir(&log_dir) {
        for entry in entries.flatten() {
            let name = entry.file_name();
            let name_str = name.to_string_lossy();
            // Only count files produced by our own logger ("lore." prefix).
            if name_str.starts_with("lore.") {
                file_count += 1;
                if let Ok(meta) = entry.metadata() {
                    total_bytes += meta.len();
                }
            }
        }
    }

    // Human-readable size: B below 1 KiB, KB below 1 MiB, MB otherwise.
    let size_display = if total_bytes < 1024 {
        format!("{total_bytes} B")
    } else if total_bytes < 1024 * 1024 {
        format!("{:.1} KB", total_bytes as f64 / 1024.0)
    } else {
        format!("{:.1} MB", total_bytes as f64 / (1024.0 * 1024.0))
    };

    LoggingCheck {
        result: CheckResult {
            status: CheckStatus::Ok,
            message: Some(format!(
                "{log_dir_str} ({file_count} files, {size_display})"
            )),
        },
        log_dir: Some(log_dir_str),
        file_count: Some(file_count),
        total_bytes: Some(total_bytes),
    }
}
|
||||
|
||||
/// Format and print doctor results to console.
|
||||
pub fn print_doctor_results(result: &DoctorResult) {
|
||||
println!("\nlore doctor\n");
|
||||
@@ -466,6 +536,7 @@ pub fn print_doctor_results(result: &DoctorResult) {
|
||||
print_check("GitLab", &result.checks.gitlab.result);
|
||||
print_check("Projects", &result.checks.projects.result);
|
||||
print_check("Ollama", &result.checks.ollama.result);
|
||||
print_check("Logging", &result.checks.logging.result);
|
||||
|
||||
println!();
|
||||
|
||||
|
||||
@@ -5,6 +5,8 @@ use indicatif::{ProgressBar, ProgressStyle};
|
||||
use rusqlite::Connection;
|
||||
use serde::Serialize;
|
||||
|
||||
use tracing::Instrument;
|
||||
|
||||
use crate::Config;
|
||||
use crate::core::db::create_connection;
|
||||
use crate::core::error::{LoreError, Result};
|
||||
@@ -111,6 +113,24 @@ pub async fn run_ingest(
|
||||
force: bool,
|
||||
full: bool,
|
||||
display: IngestDisplay,
|
||||
) -> Result<IngestResult> {
|
||||
let run_id = uuid::Uuid::new_v4().simple().to_string();
|
||||
let run_id = &run_id[..8];
|
||||
let span = tracing::info_span!("ingest", %run_id, %resource_type);
|
||||
|
||||
run_ingest_inner(config, resource_type, project_filter, force, full, display)
|
||||
.instrument(span)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Inner implementation of run_ingest, instrumented with a root span.
|
||||
async fn run_ingest_inner(
|
||||
config: &Config,
|
||||
resource_type: &str,
|
||||
project_filter: Option<&str>,
|
||||
force: bool,
|
||||
full: bool,
|
||||
display: IngestDisplay,
|
||||
) -> Result<IngestResult> {
|
||||
// Validate resource type early
|
||||
if resource_type != "issues" && resource_type != "mrs" {
|
||||
@@ -162,15 +182,15 @@ pub async fn run_ingest(
|
||||
}
|
||||
for (local_project_id, _, path) in &projects {
|
||||
if resource_type == "issues" {
|
||||
// Reset issue discussion watermarks first so discussions get re-synced
|
||||
// Reset issue discussion and resource event watermarks so everything gets re-synced
|
||||
conn.execute(
|
||||
"UPDATE issues SET discussions_synced_for_updated_at = NULL WHERE project_id = ?",
|
||||
"UPDATE issues SET discussions_synced_for_updated_at = NULL, resource_events_synced_for_updated_at = NULL WHERE project_id = ?",
|
||||
[*local_project_id],
|
||||
)?;
|
||||
} else if resource_type == "mrs" {
|
||||
// Reset MR discussion watermarks
|
||||
// Reset MR discussion and resource event watermarks
|
||||
conn.execute(
|
||||
"UPDATE merge_requests SET discussions_synced_for_updated_at = NULL WHERE project_id = ?",
|
||||
"UPDATE merge_requests SET discussions_synced_for_updated_at = NULL, resource_events_synced_for_updated_at = NULL WHERE project_id = ?",
|
||||
[*local_project_id],
|
||||
)?;
|
||||
}
|
||||
@@ -255,11 +275,12 @@ pub async fn run_ingest(
|
||||
b.set_style(
|
||||
ProgressStyle::default_bar()
|
||||
.template(
|
||||
" {spinner:.blue} Syncing discussions [{bar:30.cyan/dim}] {pos}/{len}",
|
||||
" {spinner:.blue} {prefix:.cyan} Syncing discussions [{bar:30.cyan/dim}] {pos}/{len}",
|
||||
)
|
||||
.unwrap()
|
||||
.progress_chars("=> "),
|
||||
);
|
||||
b.set_prefix(path.clone());
|
||||
b
|
||||
};
|
||||
|
||||
@@ -296,7 +317,7 @@ pub async fn run_ingest(
|
||||
disc_bar_clone.set_length(total as u64);
|
||||
disc_bar_clone.set_style(
|
||||
ProgressStyle::default_bar()
|
||||
.template(" {spinner:.blue} Fetching resource events [{bar:30.cyan/dim}] {pos}/{len}")
|
||||
.template(" {spinner:.blue} {prefix:.cyan} Fetching resource events [{bar:30.cyan/dim}] {pos}/{len}")
|
||||
.unwrap()
|
||||
.progress_chars("=> "),
|
||||
);
|
||||
|
||||
@@ -3,10 +3,12 @@
|
||||
use console::style;
|
||||
use indicatif::{ProgressBar, ProgressStyle};
|
||||
use serde::Serialize;
|
||||
use tracing::Instrument;
|
||||
use tracing::{info, warn};
|
||||
|
||||
use crate::Config;
|
||||
use crate::core::error::Result;
|
||||
use crate::core::metrics::{MetricsLayer, StageTiming};
|
||||
|
||||
use super::embed::run_embed;
|
||||
use super::generate_docs::run_generate_docs;
|
||||
@@ -26,6 +28,8 @@ pub struct SyncOptions {
|
||||
/// Result of the sync command.
|
||||
#[derive(Debug, Default, Serialize)]
|
||||
pub struct SyncResult {
|
||||
#[serde(skip)]
|
||||
pub run_id: String,
|
||||
pub issues_updated: usize,
|
||||
pub mrs_updated: usize,
|
||||
pub discussions_fetched: usize,
|
||||
@@ -52,8 +56,30 @@ fn stage_spinner(stage: u8, total: u8, msg: &str, robot_mode: bool) -> ProgressB
|
||||
}
|
||||
|
||||
/// Run the full sync pipeline: ingest -> generate-docs -> embed.
|
||||
pub async fn run_sync(config: &Config, options: SyncOptions) -> Result<SyncResult> {
|
||||
let mut result = SyncResult::default();
|
||||
///
|
||||
/// `run_id` is an optional correlation ID for log/metrics tracing.
|
||||
/// When called from `handle_sync_cmd`, this should be the same ID
|
||||
/// stored in the `sync_runs` table so logs and DB records correlate.
|
||||
pub async fn run_sync(
|
||||
config: &Config,
|
||||
options: SyncOptions,
|
||||
run_id: Option<&str>,
|
||||
) -> Result<SyncResult> {
|
||||
let generated_id;
|
||||
let run_id = match run_id {
|
||||
Some(id) => id,
|
||||
None => {
|
||||
generated_id = uuid::Uuid::new_v4().simple().to_string();
|
||||
&generated_id[..8]
|
||||
}
|
||||
};
|
||||
let span = tracing::info_span!("sync", %run_id);
|
||||
|
||||
async move {
|
||||
let mut result = SyncResult {
|
||||
run_id: run_id.to_string(),
|
||||
..SyncResult::default()
|
||||
};
|
||||
|
||||
let ingest_display = if options.robot_mode {
|
||||
IngestDisplay::silent()
|
||||
@@ -175,10 +201,17 @@ pub async fn run_sync(config: &Config, options: SyncOptions) -> Result<SyncResul
|
||||
);
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
.instrument(span)
|
||||
.await
|
||||
}
|
||||
|
||||
/// Print human-readable sync summary.
|
||||
pub fn print_sync(result: &SyncResult, elapsed: std::time::Duration) {
|
||||
pub fn print_sync(
|
||||
result: &SyncResult,
|
||||
elapsed: std::time::Duration,
|
||||
metrics: Option<&MetricsLayer>,
|
||||
) {
|
||||
println!("{} Sync complete:", style("done").green().bold(),);
|
||||
println!(" Issues updated: {}", result.issues_updated);
|
||||
println!(" MRs updated: {}", result.mrs_updated);
|
||||
@@ -204,6 +237,65 @@ pub fn print_sync(result: &SyncResult, elapsed: std::time::Duration) {
|
||||
);
|
||||
println!(" Documents embedded: {}", result.documents_embedded);
|
||||
println!(" Elapsed: {:.1}s", elapsed.as_secs_f64());
|
||||
|
||||
// Print per-stage timing breakdown if metrics are available
|
||||
if let Some(metrics) = metrics {
|
||||
let stages = metrics.extract_timings();
|
||||
if !stages.is_empty() {
|
||||
print_timing_summary(&stages);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Print per-stage timing breakdown for interactive users.
|
||||
fn print_timing_summary(stages: &[StageTiming]) {
|
||||
println!();
|
||||
println!("{}", style("Stage timing:").dim());
|
||||
for stage in stages {
|
||||
for sub in &stage.sub_stages {
|
||||
print_stage_line(sub, 1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Print a single stage timing line with indentation.
|
||||
fn print_stage_line(stage: &StageTiming, depth: usize) {
|
||||
let indent = " ".repeat(depth);
|
||||
let name = if let Some(ref project) = stage.project {
|
||||
format!("{} ({})", stage.name, project)
|
||||
} else {
|
||||
stage.name.clone()
|
||||
};
|
||||
let pad_width = 30_usize.saturating_sub(indent.len() + name.len());
|
||||
let dots = ".".repeat(pad_width.max(2));
|
||||
|
||||
let mut suffix = String::new();
|
||||
if stage.items_processed > 0 {
|
||||
suffix.push_str(&format!("{} items", stage.items_processed));
|
||||
}
|
||||
if stage.errors > 0 {
|
||||
if !suffix.is_empty() {
|
||||
suffix.push_str(", ");
|
||||
}
|
||||
suffix.push_str(&format!("{} errors", stage.errors));
|
||||
}
|
||||
if stage.rate_limit_hits > 0 {
|
||||
if !suffix.is_empty() {
|
||||
suffix.push_str(", ");
|
||||
}
|
||||
suffix.push_str(&format!("{} rate limits", stage.rate_limit_hits));
|
||||
}
|
||||
|
||||
let time_str = format!("{:.1}s", stage.elapsed_ms as f64 / 1000.0);
|
||||
if suffix.is_empty() {
|
||||
println!("{indent}{name} {dots} {time_str}");
|
||||
} else {
|
||||
println!("{indent}{name} {dots} {time_str} ({suffix})");
|
||||
}
|
||||
|
||||
for sub in &stage.sub_stages {
|
||||
print_stage_line(sub, depth + 1);
|
||||
}
|
||||
}
|
||||
|
||||
/// JSON output for sync.
|
||||
@@ -216,15 +308,23 @@ struct SyncJsonOutput<'a> {
|
||||
|
||||
#[derive(Serialize)]
|
||||
struct SyncMeta {
|
||||
run_id: String,
|
||||
elapsed_ms: u64,
|
||||
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||
stages: Vec<StageTiming>,
|
||||
}
|
||||
|
||||
/// Print JSON robot-mode sync output.
|
||||
pub fn print_sync_json(result: &SyncResult, elapsed_ms: u64) {
|
||||
/// Print JSON robot-mode sync output with optional metrics.
|
||||
pub fn print_sync_json(result: &SyncResult, elapsed_ms: u64, metrics: Option<&MetricsLayer>) {
|
||||
let stages = metrics.map_or_else(Vec::new, MetricsLayer::extract_timings);
|
||||
let output = SyncJsonOutput {
|
||||
ok: true,
|
||||
data: result,
|
||||
meta: SyncMeta { elapsed_ms },
|
||||
meta: SyncMeta {
|
||||
run_id: result.run_id.clone(),
|
||||
elapsed_ms,
|
||||
stages,
|
||||
},
|
||||
};
|
||||
println!("{}", serde_json::to_string(&output).unwrap());
|
||||
}
|
||||
|
||||
@@ -7,8 +7,11 @@ use serde::Serialize;
|
||||
use crate::Config;
|
||||
use crate::core::db::create_connection;
|
||||
use crate::core::error::Result;
|
||||
use crate::core::metrics::StageTiming;
|
||||
use crate::core::paths::get_db_path;
|
||||
use crate::core::time::ms_to_iso;
|
||||
use crate::core::time::{format_full_datetime, ms_to_iso};
|
||||
|
||||
const RECENT_RUNS_LIMIT: usize = 10;
|
||||
|
||||
/// Sync run information.
|
||||
#[derive(Debug)]
|
||||
@@ -19,6 +22,10 @@ pub struct SyncRunInfo {
|
||||
pub status: String,
|
||||
pub command: String,
|
||||
pub error: Option<String>,
|
||||
pub run_id: Option<String>,
|
||||
pub total_items_processed: i64,
|
||||
pub total_errors: i64,
|
||||
pub stages: Option<Vec<StageTiming>>,
|
||||
}
|
||||
|
||||
/// Cursor position information.
|
||||
@@ -34,6 +41,7 @@ pub struct CursorInfo {
|
||||
#[derive(Debug)]
|
||||
pub struct DataSummary {
|
||||
pub issue_count: i64,
|
||||
pub mr_count: i64,
|
||||
pub discussion_count: i64,
|
||||
pub note_count: i64,
|
||||
pub system_note_count: i64,
|
||||
@@ -42,7 +50,7 @@ pub struct DataSummary {
|
||||
/// Complete sync status result.
|
||||
#[derive(Debug)]
|
||||
pub struct SyncStatusResult {
|
||||
pub last_run: Option<SyncRunInfo>,
|
||||
pub runs: Vec<SyncRunInfo>,
|
||||
pub cursors: Vec<CursorInfo>,
|
||||
pub summary: DataSummary,
|
||||
}
|
||||
@@ -52,27 +60,33 @@ pub fn run_sync_status(config: &Config) -> Result<SyncStatusResult> {
|
||||
let db_path = get_db_path(config.storage.db_path.as_deref());
|
||||
let conn = create_connection(&db_path)?;
|
||||
|
||||
let last_run = get_last_sync_run(&conn)?;
|
||||
let runs = get_recent_sync_runs(&conn, RECENT_RUNS_LIMIT)?;
|
||||
let cursors = get_cursor_positions(&conn)?;
|
||||
let summary = get_data_summary(&conn)?;
|
||||
|
||||
Ok(SyncStatusResult {
|
||||
last_run,
|
||||
runs,
|
||||
cursors,
|
||||
summary,
|
||||
})
|
||||
}
|
||||
|
||||
/// Get the most recent sync run.
|
||||
fn get_last_sync_run(conn: &Connection) -> Result<Option<SyncRunInfo>> {
|
||||
/// Get the most recent sync runs.
|
||||
fn get_recent_sync_runs(conn: &Connection, limit: usize) -> Result<Vec<SyncRunInfo>> {
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT id, started_at, finished_at, status, command, error
|
||||
"SELECT id, started_at, finished_at, status, command, error,
|
||||
run_id, total_items_processed, total_errors, metrics_json
|
||||
FROM sync_runs
|
||||
ORDER BY started_at DESC
|
||||
LIMIT 1",
|
||||
LIMIT ?1",
|
||||
)?;
|
||||
|
||||
let result = stmt.query_row([], |row| {
|
||||
let runs: std::result::Result<Vec<_>, _> = stmt
|
||||
.query_map([limit as i64], |row| {
|
||||
let metrics_json: Option<String> = row.get(9)?;
|
||||
let stages: Option<Vec<StageTiming>> =
|
||||
metrics_json.and_then(|json| serde_json::from_str(&json).ok());
|
||||
|
||||
Ok(SyncRunInfo {
|
||||
id: row.get(0)?,
|
||||
started_at: row.get(1)?,
|
||||
@@ -80,14 +94,15 @@ fn get_last_sync_run(conn: &Connection) -> Result<Option<SyncRunInfo>> {
|
||||
status: row.get(3)?,
|
||||
command: row.get(4)?,
|
||||
error: row.get(5)?,
|
||||
run_id: row.get(6)?,
|
||||
total_items_processed: row.get::<_, Option<i64>>(7)?.unwrap_or(0),
|
||||
total_errors: row.get::<_, Option<i64>>(8)?.unwrap_or(0),
|
||||
stages,
|
||||
})
|
||||
});
|
||||
})?
|
||||
.collect();
|
||||
|
||||
match result {
|
||||
Ok(info) => Ok(Some(info)),
|
||||
Err(rusqlite::Error::QueryReturnedNoRows) => Ok(None),
|
||||
Err(e) => Err(e.into()),
|
||||
}
|
||||
Ok(runs?)
|
||||
}
|
||||
|
||||
/// Get cursor positions for all projects/resource types.
|
||||
@@ -119,6 +134,10 @@ fn get_data_summary(conn: &Connection) -> Result<DataSummary> {
|
||||
.query_row("SELECT COUNT(*) FROM issues", [], |row| row.get(0))
|
||||
.unwrap_or(0);
|
||||
|
||||
let mr_count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM merge_requests", [], |row| row.get(0))
|
||||
.unwrap_or(0);
|
||||
|
||||
let discussion_count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM discussions", [], |row| row.get(0))
|
||||
.unwrap_or(0);
|
||||
@@ -133,6 +152,7 @@ fn get_data_summary(conn: &Connection) -> Result<DataSummary> {
|
||||
|
||||
Ok(DataSummary {
|
||||
issue_count,
|
||||
mr_count,
|
||||
discussion_count,
|
||||
note_count,
|
||||
system_note_count,
|
||||
@@ -149,15 +169,17 @@ fn format_duration(ms: i64) -> String {
|
||||
format!("{}h {}m {}s", hours, minutes % 60, seconds % 60)
|
||||
} else if minutes > 0 {
|
||||
format!("{}m {}s", minutes, seconds % 60)
|
||||
} else if ms >= 1000 {
|
||||
format!("{:.1}s", ms as f64 / 1000.0)
|
||||
} else {
|
||||
format!("{}s", seconds)
|
||||
format!("{}ms", ms)
|
||||
}
|
||||
}
|
||||
|
||||
/// Format number with thousands separators.
|
||||
fn format_number(n: i64) -> String {
|
||||
let is_negative = n < 0;
|
||||
let abs_n = n.abs();
|
||||
let abs_n = n.unsigned_abs();
|
||||
let s = abs_n.to_string();
|
||||
let chars: Vec<char> = s.chars().collect();
|
||||
let mut result = String::new();
|
||||
@@ -176,7 +198,10 @@ fn format_number(n: i64) -> String {
|
||||
result
|
||||
}
|
||||
|
||||
/// JSON output structures for robot mode.
|
||||
// ============================================================================
|
||||
// JSON output structures for robot mode
|
||||
// ============================================================================
|
||||
|
||||
#[derive(Serialize)]
|
||||
struct SyncStatusJsonOutput {
|
||||
ok: bool,
|
||||
@@ -185,7 +210,7 @@ struct SyncStatusJsonOutput {
|
||||
|
||||
#[derive(Serialize)]
|
||||
struct SyncStatusJsonData {
|
||||
last_sync: Option<SyncRunJsonInfo>,
|
||||
runs: Vec<SyncRunJsonInfo>,
|
||||
cursors: Vec<CursorJsonInfo>,
|
||||
summary: SummaryJsonInfo,
|
||||
}
|
||||
@@ -201,7 +226,13 @@ struct SyncRunJsonInfo {
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
duration_ms: Option<i64>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
run_id: Option<String>,
|
||||
total_items_processed: i64,
|
||||
total_errors: i64,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
error: Option<String>,
|
||||
#[serde(skip_serializing_if = "Option::is_none")]
|
||||
stages: Option<Vec<StageTiming>>,
|
||||
}
|
||||
|
||||
#[derive(Serialize)]
|
||||
@@ -217,6 +248,7 @@ struct CursorJsonInfo {
|
||||
#[derive(Serialize)]
|
||||
struct SummaryJsonInfo {
|
||||
issues: i64,
|
||||
merge_requests: i64,
|
||||
discussions: i64,
|
||||
notes: i64,
|
||||
system_notes: i64,
|
||||
@@ -224,7 +256,10 @@ struct SummaryJsonInfo {
|
||||
|
||||
/// Print sync status as JSON (robot mode).
|
||||
pub fn print_sync_status_json(result: &SyncStatusResult) {
|
||||
let last_sync = result.last_run.as_ref().map(|run| {
|
||||
let runs = result
|
||||
.runs
|
||||
.iter()
|
||||
.map(|run| {
|
||||
let duration_ms = run.finished_at.map(|f| f - run.started_at);
|
||||
SyncRunJsonInfo {
|
||||
id: run.id,
|
||||
@@ -233,9 +268,14 @@ pub fn print_sync_status_json(result: &SyncStatusResult) {
|
||||
started_at: ms_to_iso(run.started_at),
|
||||
completed_at: run.finished_at.map(ms_to_iso),
|
||||
duration_ms,
|
||||
run_id: run.run_id.clone(),
|
||||
total_items_processed: run.total_items_processed,
|
||||
total_errors: run.total_errors,
|
||||
error: run.error.clone(),
|
||||
stages: run.stages.clone(),
|
||||
}
|
||||
});
|
||||
})
|
||||
.collect();
|
||||
|
||||
let cursors = result
|
||||
.cursors
|
||||
@@ -251,10 +291,11 @@ pub fn print_sync_status_json(result: &SyncStatusResult) {
|
||||
let output = SyncStatusJsonOutput {
|
||||
ok: true,
|
||||
data: SyncStatusJsonData {
|
||||
last_sync,
|
||||
runs,
|
||||
cursors,
|
||||
summary: SummaryJsonInfo {
|
||||
issues: result.summary.issue_count,
|
||||
merge_requests: result.summary.mr_count,
|
||||
discussions: result.summary.discussion_count,
|
||||
notes: result.summary.note_count - result.summary.system_note_count,
|
||||
system_notes: result.summary.system_note_count,
|
||||
@@ -265,41 +306,25 @@ pub fn print_sync_status_json(result: &SyncStatusResult) {
|
||||
println!("{}", serde_json::to_string(&output).unwrap());
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
// Human-readable output
|
||||
// ============================================================================
|
||||
|
||||
/// Print sync status result.
|
||||
pub fn print_sync_status(result: &SyncStatusResult) {
|
||||
// Last Sync section
|
||||
println!("{}", style("Last Sync").bold().underlined());
|
||||
// Recent Runs section
|
||||
println!("{}", style("Recent Sync Runs").bold().underlined());
|
||||
println!();
|
||||
|
||||
match &result.last_run {
|
||||
Some(run) => {
|
||||
let status_styled = match run.status.as_str() {
|
||||
"succeeded" => style(&run.status).green(),
|
||||
"failed" => style(&run.status).red(),
|
||||
"running" => style(&run.status).yellow(),
|
||||
_ => style(&run.status).dim(),
|
||||
};
|
||||
|
||||
println!(" Status: {}", status_styled);
|
||||
println!(" Command: {}", run.command);
|
||||
println!(" Started: {}", ms_to_iso(run.started_at));
|
||||
|
||||
if let Some(finished) = run.finished_at {
|
||||
println!(" Completed: {}", ms_to_iso(finished));
|
||||
let duration = finished - run.started_at;
|
||||
println!(" Duration: {}", format_duration(duration));
|
||||
}
|
||||
|
||||
if let Some(error) = &run.error {
|
||||
println!(" Error: {}", style(error).red());
|
||||
}
|
||||
}
|
||||
None => {
|
||||
if result.runs.is_empty() {
|
||||
println!(" {}", style("No sync runs recorded yet.").dim());
|
||||
println!(
|
||||
" {}",
|
||||
style("Run 'lore ingest --type=issues' to start.").dim()
|
||||
style("Run 'lore sync' or 'lore ingest' to start.").dim()
|
||||
);
|
||||
} else {
|
||||
for run in &result.runs {
|
||||
print_run_line(run);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -344,6 +369,10 @@ pub fn print_sync_status(result: &SyncStatusResult) {
|
||||
" Issues: {}",
|
||||
style(format_number(result.summary.issue_count)).bold()
|
||||
);
|
||||
println!(
|
||||
" MRs: {}",
|
||||
style(format_number(result.summary.mr_count)).bold()
|
||||
);
|
||||
println!(
|
||||
" Discussions: {}",
|
||||
style(format_number(result.summary.discussion_count)).bold()
|
||||
@@ -361,14 +390,63 @@ pub fn print_sync_status(result: &SyncStatusResult) {
|
||||
);
|
||||
}
|
||||
|
||||
/// Print a single run as a compact one-liner.
|
||||
fn print_run_line(run: &SyncRunInfo) {
|
||||
let status_styled = match run.status.as_str() {
|
||||
"succeeded" => style(&run.status).green(),
|
||||
"failed" => style(&run.status).red(),
|
||||
"running" => style(&run.status).yellow(),
|
||||
_ => style(&run.status).dim(),
|
||||
};
|
||||
|
||||
let run_label = run
|
||||
.run_id
|
||||
.as_deref()
|
||||
.map_or_else(|| format!("#{}", run.id), |id| format!("Run {id}"));
|
||||
|
||||
let duration = run.finished_at.map(|f| format_duration(f - run.started_at));
|
||||
|
||||
let time = format_full_datetime(run.started_at);
|
||||
|
||||
let mut parts = vec![
|
||||
format!("{}", style(run_label).bold()),
|
||||
format!("{status_styled}"),
|
||||
format!("{}", style(&run.command).dim()),
|
||||
time,
|
||||
];
|
||||
|
||||
if let Some(d) = duration {
|
||||
parts.push(d);
|
||||
} else {
|
||||
parts.push("in progress".to_string());
|
||||
}
|
||||
|
||||
if run.total_items_processed > 0 {
|
||||
parts.push(format!("{} items", run.total_items_processed));
|
||||
}
|
||||
|
||||
if run.total_errors > 0 {
|
||||
parts.push(format!(
|
||||
"{}",
|
||||
style(format!("{} errors", run.total_errors)).red()
|
||||
));
|
||||
}
|
||||
|
||||
println!(" {}", parts.join(" | "));
|
||||
|
||||
if let Some(error) = &run.error {
|
||||
println!(" {}", style(error).red());
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn format_duration_handles_seconds() {
|
||||
assert_eq!(format_duration(5_000), "5s");
|
||||
assert_eq!(format_duration(59_000), "59s");
|
||||
assert_eq!(format_duration(5_000), "5.0s");
|
||||
assert_eq!(format_duration(59_000), "59.0s");
|
||||
}
|
||||
|
||||
#[test]
|
||||
@@ -385,6 +463,12 @@ mod tests {
|
||||
assert_eq!(format_duration(3_723_000), "1h 2m 3s");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_duration_handles_milliseconds() {
|
||||
assert_eq!(format_duration(500), "500ms");
|
||||
assert_eq!(format_duration(0), "0ms");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_number_adds_thousands_separators() {
|
||||
assert_eq!(format_number(1000), "1,000");
|
||||
|
||||
@@ -31,6 +31,14 @@ pub struct Cli {
|
||||
#[arg(short = 'q', long, global = true)]
|
||||
pub quiet: bool,
|
||||
|
||||
/// Increase log verbosity (-v, -vv, -vvv)
|
||||
#[arg(short = 'v', long = "verbose", action = clap::ArgAction::Count, global = true)]
|
||||
pub verbose: u8,
|
||||
|
||||
/// Log format for stderr output: text (default) or json
|
||||
#[arg(long = "log-format", global = true, value_parser = ["text", "json"], default_value = "text")]
|
||||
pub log_format: String,
|
||||
|
||||
#[command(subcommand)]
|
||||
pub command: Commands,
|
||||
}
|
||||
|
||||
@@ -120,6 +120,41 @@ impl Default for EmbeddingConfig {
|
||||
}
|
||||
}
|
||||
|
||||
/// Logging and observability settings.
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
#[serde(default)]
|
||||
pub struct LoggingConfig {
|
||||
/// Directory for log files. Default: ~/.local/share/lore/logs/
|
||||
#[serde(rename = "logDir")]
|
||||
pub log_dir: Option<String>,
|
||||
|
||||
/// Days to retain log files. Default: 30. Set to 0 to disable file logging.
|
||||
#[serde(rename = "retentionDays", default = "default_retention_days")]
|
||||
pub retention_days: u32,
|
||||
|
||||
/// Enable JSON log files. Default: true.
|
||||
#[serde(rename = "fileLogging", default = "default_file_logging")]
|
||||
pub file_logging: bool,
|
||||
}
|
||||
|
||||
fn default_retention_days() -> u32 {
|
||||
30
|
||||
}
|
||||
|
||||
fn default_file_logging() -> bool {
|
||||
true
|
||||
}
|
||||
|
||||
impl Default for LoggingConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
log_dir: None,
|
||||
retention_days: default_retention_days(),
|
||||
file_logging: default_file_logging(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Main configuration structure.
|
||||
#[derive(Debug, Clone, Deserialize)]
|
||||
pub struct Config {
|
||||
@@ -134,6 +169,9 @@ pub struct Config {
|
||||
|
||||
#[serde(default)]
|
||||
pub embedding: EmbeddingConfig,
|
||||
|
||||
#[serde(default)]
|
||||
pub logging: LoggingConfig,
|
||||
}
|
||||
|
||||
impl Config {
|
||||
|
||||
@@ -47,6 +47,10 @@ const MIGRATIONS: &[(&str, &str)] = &[
|
||||
"013",
|
||||
include_str!("../../migrations/013_resource_event_watermarks.sql"),
|
||||
),
|
||||
(
|
||||
"014",
|
||||
include_str!("../../migrations/014_sync_runs_enrichment.sql"),
|
||||
),
|
||||
];
|
||||
|
||||
/// Create a database connection with production-grade pragmas.
|
||||
@@ -190,12 +194,34 @@ pub fn run_migrations_from_dir(conn: &Connection, migrations_dir: &Path) -> Resu
|
||||
|
||||
let sql = fs::read_to_string(entry.path())?;
|
||||
|
||||
conn.execute_batch(&sql)
|
||||
// Wrap each migration in a savepoint to prevent partial application,
|
||||
// matching the safety guarantees of run_migrations().
|
||||
let savepoint_name = format!("migration_{}", version);
|
||||
conn.execute_batch(&format!("SAVEPOINT {}", savepoint_name))
|
||||
.map_err(|e| LoreError::MigrationFailed {
|
||||
version,
|
||||
message: format!("Failed to create savepoint: {}", e),
|
||||
source: Some(e),
|
||||
})?;
|
||||
|
||||
match conn.execute_batch(&sql) {
|
||||
Ok(()) => {
|
||||
conn.execute_batch(&format!("RELEASE {}", savepoint_name))
|
||||
.map_err(|e| LoreError::MigrationFailed {
|
||||
version,
|
||||
message: format!("Failed to release savepoint: {}", e),
|
||||
source: Some(e),
|
||||
})?;
|
||||
}
|
||||
Err(e) => {
|
||||
let _ = conn.execute_batch(&format!("ROLLBACK TO {}", savepoint_name));
|
||||
return Err(LoreError::MigrationFailed {
|
||||
version,
|
||||
message: e.to_string(),
|
||||
source: Some(e),
|
||||
})?;
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
info!(version, file = %filename_str, "Migration applied");
|
||||
}
|
||||
|
||||
@@ -54,11 +54,16 @@ pub fn enqueue_job(
|
||||
Ok(changes > 0)
|
||||
}
|
||||
|
||||
/// Claim a batch of jobs for processing.
|
||||
/// Claim a batch of jobs for processing, scoped to a specific project.
|
||||
///
|
||||
/// Atomically selects and locks jobs within a transaction. Only claims jobs
|
||||
/// where `locked_at IS NULL` and `(next_retry_at IS NULL OR next_retry_at <= now)`.
|
||||
pub fn claim_jobs(conn: &Connection, job_type: &str, batch_size: usize) -> Result<Vec<PendingJob>> {
|
||||
pub fn claim_jobs(
|
||||
conn: &Connection,
|
||||
job_type: &str,
|
||||
project_id: i64,
|
||||
batch_size: usize,
|
||||
) -> Result<Vec<PendingJob>> {
|
||||
if batch_size == 0 {
|
||||
return Ok(Vec::new());
|
||||
}
|
||||
@@ -73,6 +78,7 @@ pub fn claim_jobs(conn: &Connection, job_type: &str, batch_size: usize) -> Resul
|
||||
WHERE id IN (
|
||||
SELECT id FROM pending_dependent_fetches
|
||||
WHERE job_type = ?2
|
||||
AND project_id = ?4
|
||||
AND locked_at IS NULL
|
||||
AND (next_retry_at IS NULL OR next_retry_at <= ?1)
|
||||
ORDER BY enqueued_at ASC
|
||||
@@ -83,7 +89,9 @@ pub fn claim_jobs(conn: &Connection, job_type: &str, batch_size: usize) -> Resul
|
||||
)?;
|
||||
|
||||
let jobs = stmt
|
||||
.query_map(rusqlite::params![now, job_type, batch_size as i64], |row| {
|
||||
.query_map(
|
||||
rusqlite::params![now, job_type, batch_size as i64, project_id],
|
||||
|row| {
|
||||
Ok(PendingJob {
|
||||
id: row.get(0)?,
|
||||
project_id: row.get(1)?,
|
||||
@@ -94,7 +102,8 @@ pub fn claim_jobs(conn: &Connection, job_type: &str, batch_size: usize) -> Resul
|
||||
payload_json: row.get(6)?,
|
||||
attempts: row.get(7)?,
|
||||
})
|
||||
})?
|
||||
},
|
||||
)?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?;
|
||||
|
||||
Ok(jobs)
|
||||
@@ -152,19 +161,69 @@ pub fn reclaim_stale_locks(conn: &Connection, stale_threshold_minutes: u32) -> R
|
||||
Ok(changes)
|
||||
}
|
||||
|
||||
/// Count pending jobs by job_type (for stats/progress).
|
||||
pub fn count_pending_jobs(conn: &Connection) -> Result<HashMap<String, usize>> {
|
||||
/// Count pending jobs by job_type, optionally scoped to a project.
|
||||
pub fn count_pending_jobs(
|
||||
conn: &Connection,
|
||||
project_id: Option<i64>,
|
||||
) -> Result<HashMap<String, usize>> {
|
||||
let mut counts = HashMap::new();
|
||||
|
||||
match project_id {
|
||||
Some(pid) => {
|
||||
let mut stmt = conn.prepare_cached(
|
||||
"SELECT job_type, COUNT(*) FROM pending_dependent_fetches \
|
||||
WHERE project_id = ?1 GROUP BY job_type",
|
||||
)?;
|
||||
let rows = stmt.query_map(rusqlite::params![pid], |row| {
|
||||
let job_type: String = row.get(0)?;
|
||||
let count: i64 = row.get(1)?;
|
||||
Ok((job_type, count as usize))
|
||||
})?;
|
||||
for row in rows {
|
||||
let (job_type, count) = row?;
|
||||
counts.insert(job_type, count);
|
||||
}
|
||||
}
|
||||
None => {
|
||||
let mut stmt = conn.prepare_cached(
|
||||
"SELECT job_type, COUNT(*) FROM pending_dependent_fetches GROUP BY job_type",
|
||||
)?;
|
||||
|
||||
let mut counts = HashMap::new();
|
||||
let rows = stmt.query_map([], |row| {
|
||||
let job_type: String = row.get(0)?;
|
||||
let count: i64 = row.get(1)?;
|
||||
Ok((job_type, count as usize))
|
||||
})?;
|
||||
for row in rows {
|
||||
let (job_type, count) = row?;
|
||||
counts.insert(job_type, count);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(counts)
|
||||
}
|
||||
|
||||
/// Count jobs that are actually claimable right now, by job_type.
|
||||
///
|
||||
/// Only counts jobs where `locked_at IS NULL` and `(next_retry_at IS NULL OR next_retry_at <= now)`,
|
||||
/// matching the exact WHERE clause used by [`claim_jobs`]. This gives an accurate total
|
||||
/// for progress bars — unlike [`count_pending_jobs`] which includes locked and backing-off jobs.
|
||||
pub fn count_claimable_jobs(conn: &Connection, project_id: i64) -> Result<HashMap<String, usize>> {
|
||||
let now = now_ms();
|
||||
let mut counts = HashMap::new();
|
||||
|
||||
let mut stmt = conn.prepare_cached(
|
||||
"SELECT job_type, COUNT(*) FROM pending_dependent_fetches \
|
||||
WHERE project_id = ?1 \
|
||||
AND locked_at IS NULL \
|
||||
AND (next_retry_at IS NULL OR next_retry_at <= ?2) \
|
||||
GROUP BY job_type",
|
||||
)?;
|
||||
let rows = stmt.query_map(rusqlite::params![project_id, now], |row| {
|
||||
let job_type: String = row.get(0)?;
|
||||
let count: i64 = row.get(1)?;
|
||||
Ok((job_type, count as usize))
|
||||
})?;
|
||||
for row in rows {
|
||||
let (job_type, count) = row?;
|
||||
counts.insert(job_type, count);
|
||||
|
||||
@@ -59,7 +59,7 @@ impl ErrorCode {
|
||||
pub fn exit_code(&self) -> i32 {
|
||||
match self {
|
||||
Self::InternalError => 1,
|
||||
Self::ConfigNotFound => 20,
|
||||
Self::ConfigNotFound => 2,
|
||||
Self::ConfigInvalid => 3,
|
||||
Self::TokenNotSet => 4,
|
||||
Self::GitLabAuthFailed => 5,
|
||||
@@ -240,6 +240,15 @@ impl LoreError {
|
||||
}
|
||||
}
|
||||
|
||||
/// Whether this error represents a permanent API failure that should not be retried.
|
||||
///
|
||||
/// Only 404 (not found) is truly permanent: the resource doesn't exist and never will.
|
||||
/// 403 and auth errors are NOT permanent — they may be environmental (VPN down,
|
||||
/// token rotation, temporary restrictions) and should be retried with backoff.
|
||||
pub fn is_permanent_api_error(&self) -> bool {
|
||||
matches!(self, Self::GitLabNotFound { .. })
|
||||
}
|
||||
|
||||
/// Get the exit code for this error.
|
||||
pub fn exit_code(&self) -> i32 {
|
||||
self.code().exit_code()
|
||||
|
||||
217
src/core/logging.rs
Normal file
217
src/core/logging.rs
Normal file
@@ -0,0 +1,217 @@
|
||||
//! Logging infrastructure: dual-layer subscriber setup and log file retention.
|
||||
//!
|
||||
//! Provides a layered tracing subscriber with:
|
||||
//! - **stderr layer**: Human-readable or JSON format, controlled by `-v` flags
|
||||
//! - **file layer**: Always-on JSON output to daily-rotated log files
|
||||
|
||||
use std::fs;
|
||||
use std::path::Path;
|
||||
|
||||
use tracing_subscriber::EnvFilter;
|
||||
|
||||
/// Build an `EnvFilter` from the verbosity count.
|
||||
///
|
||||
/// | Count | App Level | Dep Level |
|
||||
/// |-------|-----------|-----------|
|
||||
/// | 0 | INFO | WARN |
|
||||
/// | 1 | DEBUG | WARN |
|
||||
/// | 2 | DEBUG | INFO |
|
||||
/// | 3+ | TRACE | DEBUG |
|
||||
pub fn build_stderr_filter(verbose: u8, quiet: bool) -> EnvFilter {
|
||||
// RUST_LOG always wins if set
|
||||
if std::env::var("RUST_LOG").is_ok() {
|
||||
return EnvFilter::from_default_env();
|
||||
}
|
||||
|
||||
// -q overrides -v for stderr
|
||||
if quiet {
|
||||
return EnvFilter::new("lore=warn,error");
|
||||
}
|
||||
|
||||
let directives = match verbose {
|
||||
0 => "lore=info,warn",
|
||||
1 => "lore=debug,warn",
|
||||
2 => "lore=debug,info",
|
||||
_ => "lore=trace,debug",
|
||||
};
|
||||
|
||||
EnvFilter::new(directives)
|
||||
}
|
||||
|
||||
/// Build an `EnvFilter` for the file layer.
|
||||
///
|
||||
/// Always captures DEBUG+ for `lore::*` and WARN+ for dependencies,
|
||||
/// unless `RUST_LOG` is set (which overrides everything).
|
||||
pub fn build_file_filter() -> EnvFilter {
|
||||
if std::env::var("RUST_LOG").is_ok() {
|
||||
return EnvFilter::from_default_env();
|
||||
}
|
||||
|
||||
EnvFilter::new("lore=debug,warn")
|
||||
}
|
||||
|
||||
/// Delete log files older than `retention_days` from the given directory.
|
||||
///
|
||||
/// Only deletes files matching the `lore.YYYY-MM-DD.log` pattern.
|
||||
/// Returns the number of files deleted.
|
||||
pub fn cleanup_old_logs(log_dir: &Path, retention_days: u32) -> usize {
|
||||
if retention_days == 0 || !log_dir.exists() {
|
||||
return 0;
|
||||
}
|
||||
|
||||
let cutoff = chrono::Utc::now() - chrono::Duration::days(i64::from(retention_days));
|
||||
let cutoff_date = cutoff.format("%Y-%m-%d").to_string();
|
||||
let mut deleted = 0;
|
||||
|
||||
let entries = match fs::read_dir(log_dir) {
|
||||
Ok(entries) => entries,
|
||||
Err(_) => return 0,
|
||||
};
|
||||
|
||||
for entry in entries.flatten() {
|
||||
let file_name = entry.file_name();
|
||||
let name = file_name.to_string_lossy();
|
||||
|
||||
// Match pattern: lore.YYYY-MM-DD.log or lore.YYYY-MM-DD (tracing-appender format)
|
||||
if let Some(date_str) = extract_log_date(&name)
|
||||
&& date_str < cutoff_date
|
||||
&& fs::remove_file(entry.path()).is_ok()
|
||||
{
|
||||
deleted += 1;
|
||||
}
|
||||
}
|
||||
|
||||
deleted
|
||||
}
|
||||
|
||||
/// Extract the date portion from a log filename.
///
/// Matches: `lore.YYYY-MM-DD.log` or `lore.YYYY-MM-DD`
fn extract_log_date(filename: &str) -> Option<String> {
    let rest = filename.strip_prefix("lore.")?;

    // The date must occupy the first 10 bytes after the prefix. `get`
    // returns None (instead of panicking) when the remainder is shorter
    // or when byte 10 would split a non-ASCII character.
    let date_part = rest.get(..10)?;
    let suffix = rest.get(10..)?;

    // After the date, only end-of-string or a ".log" extension is valid.
    if !(suffix.is_empty() || suffix == ".log") {
        return None;
    }

    // Shape check: exactly three '-'-separated segments of widths 4/2/2,
    // each all ASCII digits (which also guarantees the slice is ASCII).
    let mut segments = date_part.split('-');
    let (y, m, d) = (segments.next()?, segments.next()?, segments.next()?);
    if segments.next().is_some() {
        return None;
    }
    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    if y.len() == 4 && m.len() == 2 && d.len() == 2 && all_digits(y) && all_digits(m) && all_digits(d)
    {
        Some(date_part.to_string())
    } else {
        None
    }
}
|
||||
|
||||
// Unit tests for filter construction, log-date parsing, and retention cleanup.
// NOTE(review): the filter tests only assert "does not panic"; if RUST_LOG is
// set in the test environment they exercise the from_default_env path instead —
// confirm CI runs with RUST_LOG unset.
#[cfg(test)]
mod tests {
    use super::*;
    use std::fs::File;
    use tempfile::TempDir;

    #[test]
    fn test_extract_log_date_with_extension() {
        assert_eq!(
            extract_log_date("lore.2026-02-04.log"),
            Some("2026-02-04".to_string())
        );
    }

    #[test]
    fn test_extract_log_date_without_extension() {
        // tracing-appender writes rotated files without the ".log" suffix.
        assert_eq!(
            extract_log_date("lore.2026-02-04"),
            Some("2026-02-04".to_string())
        );
    }

    #[test]
    fn test_extract_log_date_rejects_non_log_files() {
        assert_eq!(extract_log_date("other.txt"), None);
        assert_eq!(extract_log_date("lore.config.json"), None);
        assert_eq!(extract_log_date("lore.db"), None);
    }

    #[test]
    fn test_extract_log_date_rejects_invalid_dates() {
        assert_eq!(extract_log_date("lore.not-a-date.log"), None);
        assert_eq!(extract_log_date("lore.20260204.log"), None);
    }

    #[test]
    fn test_cleanup_old_logs_deletes_old_files() {
        let dir = TempDir::new().unwrap();

        // Create old log files (well before any reasonable retention)
        File::create(dir.path().join("lore.2020-01-01.log")).unwrap();
        File::create(dir.path().join("lore.2020-01-15.log")).unwrap();

        // Create a recent log file (today)
        let today = chrono::Utc::now().format("%Y-%m-%d").to_string();
        let recent_name = format!("lore.{today}.log");
        File::create(dir.path().join(&recent_name)).unwrap();

        // Create a non-log file that should NOT be deleted
        File::create(dir.path().join("other.txt")).unwrap();

        let deleted = cleanup_old_logs(dir.path(), 7);

        assert_eq!(deleted, 2);
        assert!(!dir.path().join("lore.2020-01-01.log").exists());
        assert!(!dir.path().join("lore.2020-01-15.log").exists());
        assert!(dir.path().join(&recent_name).exists());
        assert!(dir.path().join("other.txt").exists());
    }

    #[test]
    fn test_cleanup_old_logs_zero_retention_is_noop() {
        // retention_days == 0 is the documented "keep forever" sentinel.
        let dir = TempDir::new().unwrap();
        File::create(dir.path().join("lore.2020-01-01.log")).unwrap();

        let deleted = cleanup_old_logs(dir.path(), 0);
        assert_eq!(deleted, 0);
        assert!(dir.path().join("lore.2020-01-01.log").exists());
    }

    #[test]
    fn test_cleanup_old_logs_nonexistent_dir() {
        let deleted = cleanup_old_logs(Path::new("/nonexistent/dir"), 7);
        assert_eq!(deleted, 0);
    }

    #[test]
    fn test_build_stderr_filter_default() {
        // Can't easily assert filter contents, but verify it doesn't panic
        let _filter = build_stderr_filter(0, false);
    }

    #[test]
    fn test_build_stderr_filter_verbose_levels() {
        let _f0 = build_stderr_filter(0, false);
        let _f1 = build_stderr_filter(1, false);
        let _f2 = build_stderr_filter(2, false);
        let _f3 = build_stderr_filter(3, false);
    }

    #[test]
    fn test_build_stderr_filter_quiet_overrides_verbose() {
        // Quiet should win over verbose
        let _filter = build_stderr_filter(3, true);
    }

    #[test]
    fn test_build_file_filter() {
        let _filter = build_file_filter();
    }
}
|
||||
609
src/core/metrics.rs
Normal file
609
src/core/metrics.rs
Normal file
@@ -0,0 +1,609 @@
|
||||
//! Performance metrics types and tracing layer for sync pipeline observability.
|
||||
//!
|
||||
//! Provides:
|
||||
//! - [`StageTiming`]: Serializable timing/counter data for pipeline stages
|
||||
//! - [`MetricsLayer`]: Custom tracing subscriber layer that captures span timing
|
||||
|
||||
use std::collections::HashMap;
|
||||
use std::sync::{Arc, Mutex};
|
||||
use std::time::Instant;
|
||||
|
||||
use serde::{Deserialize, Serialize};
|
||||
use tracing::Subscriber;
|
||||
use tracing::span::{Attributes, Id, Record};
|
||||
use tracing_subscriber::layer::{Context, Layer};
|
||||
use tracing_subscriber::registry::LookupSpan;
|
||||
|
||||
/// Returns true when value is zero (for serde `skip_serializing_if`).
fn is_zero(v: &usize) -> bool {
    // Reference parameter is required by serde's skip_serializing_if signature.
    matches!(*v, 0)
}
|
||||
|
||||
/// Timing and counter data for a single pipeline stage.
///
/// Supports nested sub-stages for hierarchical timing breakdowns.
/// Fields with zero/empty values are omitted from JSON output to
/// keep robot-mode payloads compact.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct StageTiming {
    /// Span name as reported by the tracing layer (e.g. `"ingest_issues"`).
    pub name: String,
    /// Project path recorded on the span, when present.
    #[serde(skip_serializing_if = "Option::is_none", default)]
    pub project: Option<String>,
    /// Wall-clock duration of the span, in milliseconds.
    pub elapsed_ms: u64,
    /// Count taken from the span's `items_processed` field (0 if unset).
    pub items_processed: usize,
    #[serde(skip_serializing_if = "is_zero", default)]
    pub items_skipped: usize,
    #[serde(skip_serializing_if = "is_zero", default)]
    pub errors: usize,
    /// Events with `status_code = 429` observed while the span was current.
    #[serde(skip_serializing_if = "is_zero", default)]
    pub rate_limit_hits: usize,
    /// Events whose message contained "retrying"/"Retrying".
    #[serde(skip_serializing_if = "is_zero", default)]
    pub retries: usize,
    /// Child spans, mirroring the tracing span hierarchy.
    #[serde(skip_serializing_if = "Vec::is_empty", default)]
    pub sub_stages: Vec<StageTiming>,
}
|
||||
|
||||
// ============================================================================
|
||||
// MetricsLayer: custom tracing subscriber layer
|
||||
// ============================================================================
|
||||
|
||||
/// Internal data tracked per open span.
struct SpanData {
    /// Span name from the tracing metadata.
    name: String,
    /// Enclosing span's ID, or `None` for a root span.
    parent_id: Option<u64>,
    /// Captured at span creation; elapsed time is measured at close.
    start: Instant,
    /// Structured fields recorded on the span (initial attributes plus
    /// any values recorded later via `Span::record`).
    fields: HashMap<String, serde_json::Value>,
    /// Events with `status_code = 429` seen while this span was current.
    rate_limit_hits: usize,
    /// Events whose message mentioned "retrying"/"Retrying".
    retries: usize,
}
|
||||
|
||||
/// Completed span data with its original ID and parent ID.
struct CompletedSpan {
    /// The span's tracing ID (`Id::into_u64`), kept so children can be
    /// re-linked to parents when the timing tree is assembled.
    id: u64,
    /// Parent span ID, or `None` for a root span.
    parent_id: Option<u64>,
    /// Materialized timing snapshot taken when the span closed.
    timing: StageTiming,
}
|
||||
|
||||
/// Custom tracing layer that captures span timing and structured fields.
///
/// Collects data from `#[instrument]` spans and materializes it into
/// a `Vec<StageTiming>` tree via [`extract_timings`].
///
/// Thread-safe via `Arc<Mutex<>>` — suitable for concurrent span operations.
#[derive(Debug, Clone)]
pub struct MetricsLayer {
    /// Spans currently open, keyed by `Id::into_u64`; removed on close.
    spans: Arc<Mutex<HashMap<u64, SpanData>>>,
    /// Closed spans in close order (children close before their parents).
    completed: Arc<Mutex<Vec<CompletedSpan>>>,
}
|
||||
|
||||
// `Default` simply delegates to `new()` (empty span and completed stores).
impl Default for MetricsLayer {
    fn default() -> Self {
        Self::new()
    }
}
|
||||
|
||||
impl MetricsLayer {
    /// Create an empty layer with no recorded spans.
    pub fn new() -> Self {
        Self {
            spans: Arc::new(Mutex::new(HashMap::new())),
            completed: Arc::new(Mutex::new(Vec::new())),
        }
    }

    /// Extract timing tree for a completed run.
    ///
    /// Returns the top-level stages with sub-stages nested.
    /// Call after the root span closes.
    pub fn extract_timings(&self) -> Vec<StageTiming> {
        // Recover from lock poisoning: metrics are best-effort, and the
        // collected data remains usable even if a panicking thread held the lock.
        let completed = self.completed.lock().unwrap_or_else(|e| e.into_inner());
        if completed.is_empty() {
            return Vec::new();
        }

        // Build children map: parent_id -> Vec<StageTiming>
        let mut children_map: HashMap<u64, Vec<StageTiming>> = HashMap::new();
        let mut roots = Vec::new();
        let mut id_to_timing: HashMap<u64, StageTiming> = HashMap::new();

        // First pass: collect all timings by ID
        for entry in completed.iter() {
            id_to_timing.insert(entry.id, entry.timing.clone());
        }

        // Second pass: `completed` is in close order, and children close
        // before their parents, so a forward scan builds the tree bottom-up:
        // by the time a span is visited, all of its children are already in
        // `children_map`.
        for entry in completed.iter() {
            // Attach any children that were collected for this span
            if let Some(timing) = id_to_timing.get_mut(&entry.id)
                && let Some(children) = children_map.remove(&entry.id)
            {
                timing.sub_stages = children;
            }

            if let Some(parent_id) = entry.parent_id {
                // This is a child span — attach to parent's children
                if let Some(timing) = id_to_timing.remove(&entry.id) {
                    children_map.entry(parent_id).or_default().push(timing);
                }
            }
        }

        // Remaining entries in id_to_timing are roots
        for entry in completed.iter() {
            if entry.parent_id.is_none()
                && let Some(mut timing) = id_to_timing.remove(&entry.id)
            {
                // Defensive: children were normally attached in the second
                // pass, but pick up any still keyed to this root.
                if let Some(children) = children_map.remove(&entry.id) {
                    timing.sub_stages = children;
                }
                roots.push(timing);
            }
        }

        roots
    }
}
|
||||
|
||||
/// Visitor that extracts field values from span attributes.
///
/// Each typed `record_*` callback stores the value into the borrowed map
/// as an equivalent `serde_json::Value`; `record_debug` is the catch-all
/// for types without a dedicated callback (stored as their Debug string).
struct FieldVisitor<'a>(&'a mut HashMap<String, serde_json::Value>);

impl tracing::field::Visit for FieldVisitor<'_> {
    fn record_debug(&mut self, field: &tracing::field::Field, value: &dyn std::fmt::Debug) {
        self.0.insert(
            field.name().to_string(),
            serde_json::Value::String(format!("{value:?}")),
        );
    }

    fn record_i64(&mut self, field: &tracing::field::Field, value: i64) {
        self.0.insert(
            field.name().to_string(),
            serde_json::Value::Number(value.into()),
        );
    }

    fn record_u64(&mut self, field: &tracing::field::Field, value: u64) {
        self.0.insert(
            field.name().to_string(),
            serde_json::Value::Number(value.into()),
        );
    }

    fn record_str(&mut self, field: &tracing::field::Field, value: &str) {
        self.0.insert(
            field.name().to_string(),
            serde_json::Value::String(value.to_string()),
        );
    }

    fn record_bool(&mut self, field: &tracing::field::Field, value: bool) {
        self.0
            .insert(field.name().to_string(), serde_json::Value::Bool(value));
    }
}
|
||||
|
||||
/// Visitor that extracts event fields for rate-limit/retry detection.
///
/// Only two pieces of an event are of interest: a numeric `status_code`
/// field and the event's `message` text; everything else is ignored.
#[derive(Default)]
struct EventVisitor {
    /// Value of the event's `status_code` field, if recorded as a u64.
    status_code: Option<u64>,
    /// The event's `message` text ("" when absent).
    message: String,
}

impl tracing::field::Visit for EventVisitor {
    fn record_debug(&mut self, field: &tracing::field::Field, value: &dyn std::fmt::Debug) {
        // Messages built via format args arrive through the Debug path.
        if field.name() == "message" {
            self.message = format!("{value:?}");
        }
    }

    fn record_u64(&mut self, field: &tracing::field::Field, value: u64) {
        if field.name() == "status_code" {
            self.status_code = Some(value);
        }
    }

    fn record_str(&mut self, field: &tracing::field::Field, value: &str) {
        if field.name() == "message" {
            self.message = value.to_string();
        }
    }
}
|
||||
|
||||
impl<S> Layer<S> for MetricsLayer
where
    S: Subscriber + for<'a> LookupSpan<'a>,
{
    /// Record a new span: capture its name, parent, start time, and any
    /// fields supplied at creation.
    fn on_new_span(&self, attrs: &Attributes<'_>, id: &Id, ctx: Context<'_, S>) {
        // Resolve the parent from the registry (contextual parent included).
        let parent_id = ctx
            .span(id)
            .and_then(|s| s.parent().map(|p| p.id().into_u64()));

        let mut fields = HashMap::new();
        let mut visitor = FieldVisitor(&mut fields);
        attrs.record(&mut visitor);

        // `unwrap_or_else(into_inner)` recovers from lock poisoning —
        // metrics collection must never panic the host application.
        self.spans.lock().unwrap_or_else(|e| e.into_inner()).insert(
            id.into_u64(),
            SpanData {
                name: attrs.metadata().name().to_string(),
                parent_id,
                start: Instant::now(),
                fields,
                rate_limit_hits: 0,
                retries: 0,
            },
        );
    }

    /// Merge values recorded after span creation (e.g. `Span::record`)
    /// into the span's field map, overwriting earlier values per key.
    fn on_record(&self, id: &Id, values: &Record<'_>, _ctx: Context<'_, S>) {
        if let Some(data) = self
            .spans
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .get_mut(&id.into_u64())
        {
            let mut visitor = FieldVisitor(&mut data.fields);
            values.record(&mut visitor);
        }
    }

    /// Inspect each event and bump the current span's rate-limit/retry
    /// counters when the event matches.
    fn on_event(&self, event: &tracing::Event<'_>, ctx: Context<'_, S>) {
        // Count rate-limit and retry events on the current span
        if let Some(span_ref) = ctx.event_span(event) {
            let id = span_ref.id();
            if let Some(data) = self
                .spans
                .lock()
                .unwrap_or_else(|e| e.into_inner())
                .get_mut(&id.into_u64())
            {
                let mut visitor = EventVisitor::default();
                event.record(&mut visitor);

                // HTTP 429 responses are surfaced via a `status_code` field.
                if visitor.status_code == Some(429) {
                    data.rate_limit_hits += 1;
                }
                // Retries are detected by message text; covers both casings.
                if visitor.message.contains("retrying") || visitor.message.contains("Retrying") {
                    data.retries += 1;
                }
            }
        }
    }

    /// Finalize the span: compute elapsed time, materialize a
    /// [`StageTiming`] from the recorded fields, and move it to `completed`.
    fn on_close(&self, id: Id, _ctx: Context<'_, S>) {
        if let Some(data) = self
            .spans
            .lock()
            .unwrap_or_else(|e| e.into_inner())
            .remove(&id.into_u64())
        {
            let elapsed = data.start.elapsed();
            let timing = StageTiming {
                name: data.name,
                project: data
                    .fields
                    .get("project")
                    .and_then(|v| v.as_str())
                    .map(String::from),
                elapsed_ms: elapsed.as_millis() as u64,
                // Counter fields default to 0 when the span never recorded them.
                items_processed: data
                    .fields
                    .get("items_processed")
                    .and_then(|v| v.as_u64())
                    .unwrap_or(0) as usize,
                items_skipped: data
                    .fields
                    .get("items_skipped")
                    .and_then(|v| v.as_u64())
                    .unwrap_or(0) as usize,
                errors: data
                    .fields
                    .get("errors")
                    .and_then(|v| v.as_u64())
                    .unwrap_or(0) as usize,
                rate_limit_hits: data.rate_limit_hits,
                retries: data.retries,
                // Children are attached later by `extract_timings`.
                sub_stages: vec![],
            };
            self.completed
                .lock()
                .unwrap_or_else(|e| e.into_inner())
                .push(CompletedSpan {
                    id: id.into_u64(),
                    parent_id: data.parent_id,
                    timing,
                });
        }
    }
}
|
||||
|
||||
// Manual Debug impl since SpanData and CompletedSpan don't derive Debug
|
||||
impl std::fmt::Debug for SpanData {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
f.debug_struct("SpanData")
|
||||
.field("name", &self.name)
|
||||
.field("parent_id", &self.parent_id)
|
||||
.field("fields", &self.fields.keys().collect::<Vec<_>>())
|
||||
.finish()
|
||||
}
|
||||
}
|
||||
|
||||
// Manual Debug impl: `timing` is abbreviated to its stage name so debug
// output stays one line instead of dumping the whole StageTiming tree.
impl std::fmt::Debug for CompletedSpan {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("CompletedSpan")
            .field("id", &self.id)
            .field("parent_id", &self.parent_id)
            .field("timing", &self.timing.name)
            .finish()
    }
}
|
||||
|
||||
// Unit tests: StageTiming serde round-tripping (including zero-field
// omission) and MetricsLayer span capture via an in-process subscriber.
#[cfg(test)]
mod tests {
    use super::*;
    use tracing_subscriber::layer::SubscriberExt;

    #[test]
    fn test_stage_timing_serialization() {
        let timing = StageTiming {
            name: "sync".to_string(),
            project: None,
            elapsed_ms: 1500,
            items_processed: 42,
            items_skipped: 3,
            errors: 1,
            rate_limit_hits: 2,
            retries: 5,
            sub_stages: vec![StageTiming {
                name: "ingest_issues".to_string(),
                project: Some("group/repo".to_string()),
                elapsed_ms: 800,
                items_processed: 30,
                items_skipped: 0,
                errors: 0,
                rate_limit_hits: 0,
                retries: 0,
                sub_stages: vec![],
            }],
        };

        let json = serde_json::to_value(&timing).unwrap();
        assert_eq!(json["name"], "sync");
        assert_eq!(json["elapsed_ms"], 1500);
        assert_eq!(json["items_processed"], 42);
        assert_eq!(json["items_skipped"], 3);
        assert_eq!(json["errors"], 1);
        assert_eq!(json["rate_limit_hits"], 2);
        assert_eq!(json["retries"], 5);

        // Sub-stage present
        let sub = &json["sub_stages"][0];
        assert_eq!(sub["name"], "ingest_issues");
        assert_eq!(sub["project"], "group/repo");
        assert_eq!(sub["elapsed_ms"], 800);
    }

    #[test]
    fn test_stage_timing_zero_fields_omitted() {
        // Exercises the serde skip_serializing_if attributes (is_zero & co.).
        let timing = StageTiming {
            name: "embed".to_string(),
            project: None,
            elapsed_ms: 500,
            items_processed: 10,
            items_skipped: 0,
            errors: 0,
            rate_limit_hits: 0,
            retries: 0,
            sub_stages: vec![],
        };

        let json = serde_json::to_value(&timing).unwrap();
        let obj = json.as_object().unwrap();

        // Zero fields must be absent
        assert!(!obj.contains_key("items_skipped"));
        assert!(!obj.contains_key("errors"));
        assert!(!obj.contains_key("rate_limit_hits"));
        assert!(!obj.contains_key("retries"));
        assert!(!obj.contains_key("sub_stages"));
        assert!(!obj.contains_key("project"));

        // Required fields always present
        assert!(obj.contains_key("name"));
        assert!(obj.contains_key("elapsed_ms"));
        assert!(obj.contains_key("items_processed"));
    }

    #[test]
    fn test_stage_timing_empty_sub_stages() {
        let timing = StageTiming {
            name: "docs".to_string(),
            project: None,
            elapsed_ms: 200,
            items_processed: 5,
            items_skipped: 0,
            errors: 0,
            rate_limit_hits: 0,
            retries: 0,
            sub_stages: vec![],
        };

        let json_str = serde_json::to_string(&timing).unwrap();
        assert!(!json_str.contains("sub_stages"));
    }

    #[test]
    fn test_stage_timing_debug_clone() {
        let timing = StageTiming {
            name: "test".to_string(),
            project: None,
            elapsed_ms: 100,
            items_processed: 1,
            items_skipped: 0,
            errors: 0,
            rate_limit_hits: 0,
            retries: 0,
            sub_stages: vec![],
        };

        let cloned = timing.clone();
        assert_eq!(cloned.name, "test");

        let debug_str = format!("{timing:?}");
        assert!(debug_str.contains("test"));
    }

    #[test]
    fn test_stage_timing_nested_sub_stages() {
        // Three-level nesting: sync -> ingest -> discussions.
        let timing = StageTiming {
            name: "sync".to_string(),
            project: None,
            elapsed_ms: 3000,
            items_processed: 100,
            items_skipped: 0,
            errors: 0,
            rate_limit_hits: 0,
            retries: 0,
            sub_stages: vec![StageTiming {
                name: "ingest".to_string(),
                project: None,
                elapsed_ms: 2000,
                items_processed: 80,
                items_skipped: 0,
                errors: 0,
                rate_limit_hits: 0,
                retries: 0,
                sub_stages: vec![StageTiming {
                    name: "discussions".to_string(),
                    project: Some("a/b".to_string()),
                    elapsed_ms: 1000,
                    items_processed: 50,
                    items_skipped: 0,
                    errors: 0,
                    rate_limit_hits: 0,
                    retries: 0,
                    sub_stages: vec![],
                }],
            }],
        };

        let json = serde_json::to_value(&timing).unwrap();
        let nested = &json["sub_stages"][0]["sub_stages"][0];
        assert_eq!(nested["name"], "discussions");
        assert_eq!(nested["project"], "a/b");
    }

    #[test]
    fn test_rate_limit_fields_omitted_when_zero() {
        let timing = StageTiming {
            name: "ingest".to_string(),
            project: None,
            elapsed_ms: 100,
            items_processed: 5,
            items_skipped: 0,
            errors: 0,
            rate_limit_hits: 0,
            retries: 0,
            sub_stages: vec![],
        };

        let json_str = serde_json::to_string(&timing).unwrap();
        assert!(!json_str.contains("rate_limit_hits"));
        assert!(!json_str.contains("retries"));
    }

    #[test]
    fn test_rate_limit_fields_present_when_nonzero() {
        let timing = StageTiming {
            name: "ingest".to_string(),
            project: None,
            elapsed_ms: 100,
            items_processed: 5,
            items_skipped: 0,
            errors: 0,
            rate_limit_hits: 3,
            retries: 7,
            sub_stages: vec![],
        };

        let json = serde_json::to_value(&timing).unwrap();
        assert_eq!(json["rate_limit_hits"], 3);
        assert_eq!(json["retries"], 7);
    }

    #[test]
    fn test_metrics_layer_single_span() {
        let metrics = MetricsLayer::new();
        let subscriber = tracing_subscriber::registry().with(metrics.clone());

        tracing::subscriber::with_default(subscriber, || {
            let span = tracing::info_span!("test_stage");
            let _guard = span.enter();
            // Simulate work
        });

        let timings = metrics.extract_timings();
        assert_eq!(timings.len(), 1);
        assert_eq!(timings[0].name, "test_stage");
        // NOTE(review): timing bound could be flaky on a heavily loaded
        // CI machine — consider loosening if it ever flakes.
        assert!(timings[0].elapsed_ms < 100); // Should be near-instant
    }

    #[test]
    fn test_metrics_layer_nested_spans() {
        let metrics = MetricsLayer::new();
        let subscriber = tracing_subscriber::registry().with(metrics.clone());

        tracing::subscriber::with_default(subscriber, || {
            let parent = tracing::info_span!("parent");
            let _parent_guard = parent.enter();
            {
                let child = tracing::info_span!("child");
                let _child_guard = child.enter();
            }
        });

        let timings = metrics.extract_timings();
        assert_eq!(timings.len(), 1);
        assert_eq!(timings[0].name, "parent");
        assert_eq!(timings[0].sub_stages.len(), 1);
        assert_eq!(timings[0].sub_stages[0].name, "child");
    }

    #[test]
    fn test_metrics_layer_parallel_spans() {
        // Two sibling children under one parent (sequential here; "parallel"
        // refers to the tree shape, not thread concurrency).
        let metrics = MetricsLayer::new();
        let subscriber = tracing_subscriber::registry().with(metrics.clone());

        tracing::subscriber::with_default(subscriber, || {
            let parent = tracing::info_span!("parent");
            let _parent_guard = parent.enter();
            {
                let child_a = tracing::info_span!("child_a");
                let _a = child_a.enter();
            }
            {
                let child_b = tracing::info_span!("child_b");
                let _b = child_b.enter();
            }
        });

        let timings = metrics.extract_timings();
        assert_eq!(timings.len(), 1);
        assert_eq!(timings[0].sub_stages.len(), 2);

        let names: Vec<&str> = timings[0]
            .sub_stages
            .iter()
            .map(|s| s.name.as_str())
            .collect();
        assert!(names.contains(&"child_a"));
        assert!(names.contains(&"child_b"));
    }

    #[test]
    fn test_metrics_layer_empty() {
        let metrics = MetricsLayer::new();
        let timings = metrics.extract_timings();
        assert!(timings.is_empty());
    }
}
|
||||
@@ -7,9 +7,12 @@ pub mod dependent_queue;
|
||||
pub mod error;
|
||||
pub mod events_db;
|
||||
pub mod lock;
|
||||
pub mod logging;
|
||||
pub mod metrics;
|
||||
pub mod paths;
|
||||
pub mod payloads;
|
||||
pub mod project;
|
||||
pub mod sync_run;
|
||||
pub mod time;
|
||||
|
||||
pub use config::Config;
|
||||
|
||||
@@ -52,6 +52,15 @@ pub fn get_db_path(config_override: Option<&str>) -> PathBuf {
|
||||
get_data_dir().join("lore.db")
|
||||
}
|
||||
|
||||
/// Get the log directory path.
|
||||
/// Uses config override if provided, otherwise uses default in data dir.
|
||||
pub fn get_log_dir(config_override: Option<&str>) -> PathBuf {
|
||||
if let Some(path) = config_override {
|
||||
return PathBuf::from(path);
|
||||
}
|
||||
get_data_dir().join("logs")
|
||||
}
|
||||
|
||||
/// Get the backup directory path.
|
||||
/// Uses config override if provided, otherwise uses default in data dir.
|
||||
pub fn get_backup_dir(config_override: Option<&str>) -> PathBuf {
|
||||
|
||||
232
src/core/sync_run.rs
Normal file
232
src/core/sync_run.rs
Normal file
@@ -0,0 +1,232 @@
|
||||
//! Sync run lifecycle recorder.
|
||||
//!
|
||||
//! Encapsulates the INSERT-on-start, UPDATE-on-finish lifecycle for the
|
||||
//! `sync_runs` table, enabling sync history tracking and observability.
|
||||
|
||||
use rusqlite::Connection;
|
||||
|
||||
use super::error::Result;
|
||||
use super::metrics::StageTiming;
|
||||
use super::time::now_ms;
|
||||
|
||||
/// Records a single sync run's lifecycle in the `sync_runs` table.
///
/// Created via [`start`](Self::start), then finalized with either
/// [`succeed`](Self::succeed) or [`fail`](Self::fail). Both finalizers
/// consume `self` to enforce single-use at compile time.
pub struct SyncRunRecorder {
    /// Rowid of the row inserted by `start`; targeted by the finalizing UPDATE.
    row_id: i64,
}
|
||||
|
||||
impl SyncRunRecorder {
    /// Insert a new `sync_runs` row with `status='running'`.
    ///
    /// Both `started_at` and `heartbeat_at` are initialized to now.
    ///
    /// # Errors
    /// Returns an error if the INSERT fails.
    pub fn start(conn: &Connection, command: &str, run_id: &str) -> Result<Self> {
        let now = now_ms();
        conn.execute(
            "INSERT INTO sync_runs (started_at, heartbeat_at, status, command, run_id)
             VALUES (?1, ?2, 'running', ?3, ?4)",
            rusqlite::params![now, now, command, run_id],
        )?;
        let row_id = conn.last_insert_rowid();
        Ok(Self { row_id })
    }

    /// Mark run as succeeded with full metrics.
    ///
    /// # Errors
    /// Returns an error if the UPDATE fails. Metrics serialization is
    /// best-effort: on failure, "[]" is stored instead of erroring out.
    pub fn succeed(
        self,
        conn: &Connection,
        metrics: &[StageTiming],
        total_items: usize,
        total_errors: usize,
    ) -> Result<()> {
        let now = now_ms();
        // Fall back to an empty array so a serialization bug never loses the run record.
        let metrics_json = serde_json::to_string(metrics).unwrap_or_else(|_| "[]".to_string());
        conn.execute(
            "UPDATE sync_runs
             SET finished_at = ?1, status = 'succeeded',
                 metrics_json = ?2, total_items_processed = ?3, total_errors = ?4
             WHERE id = ?5",
            rusqlite::params![
                now,
                metrics_json,
                total_items as i64,
                total_errors as i64,
                self.row_id
            ],
        )?;
        Ok(())
    }

    /// Mark run as failed with error message and optional partial metrics.
    ///
    /// `metrics = None` leaves `metrics_json` NULL; `Some` stores whatever
    /// stages completed before the failure.
    ///
    /// # Errors
    /// Returns an error if the UPDATE fails.
    pub fn fail(
        self,
        conn: &Connection,
        error: &str,
        metrics: Option<&[StageTiming]>,
    ) -> Result<()> {
        let now = now_ms();
        let metrics_json =
            metrics.map(|m| serde_json::to_string(m).unwrap_or_else(|_| "[]".to_string()));
        conn.execute(
            "UPDATE sync_runs
             SET finished_at = ?1, status = 'failed', error = ?2,
                 metrics_json = ?3
             WHERE id = ?4",
            rusqlite::params![now, error, metrics_json, self.row_id],
        )?;
        Ok(())
    }
}
|
||||
|
||||
// Unit tests covering the full recorder lifecycle against an in-memory
// SQLite database: start, succeed, fail, and fail-with-partial-metrics.
#[cfg(test)]
mod tests {
    use super::*;
    use crate::core::db::{create_connection, run_migrations};
    use std::path::Path;

    // Fresh in-memory DB with the project's migrations applied.
    fn setup_test_db() -> Connection {
        let conn = create_connection(Path::new(":memory:")).unwrap();
        run_migrations(&conn).unwrap();
        conn
    }

    #[test]
    fn test_sync_run_recorder_start() {
        let conn = setup_test_db();
        let recorder = SyncRunRecorder::start(&conn, "sync", "abc12345").unwrap();
        assert!(recorder.row_id > 0);

        let (status, command, run_id): (String, String, String) = conn
            .query_row(
                "SELECT status, command, run_id FROM sync_runs WHERE id = ?1",
                [recorder.row_id],
                |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
            )
            .unwrap();

        assert_eq!(status, "running");
        assert_eq!(command, "sync");
        assert_eq!(run_id, "abc12345");
    }

    #[test]
    fn test_sync_run_recorder_succeed() {
        let conn = setup_test_db();
        let recorder = SyncRunRecorder::start(&conn, "sync", "def67890").unwrap();
        // Keep a copy: `succeed` consumes the recorder.
        let row_id = recorder.row_id;

        let metrics = vec![StageTiming {
            name: "ingest".to_string(),
            project: None,
            elapsed_ms: 1200,
            items_processed: 50,
            items_skipped: 0,
            errors: 2,
            rate_limit_hits: 0,
            retries: 0,
            sub_stages: vec![],
        }];

        recorder.succeed(&conn, &metrics, 50, 2).unwrap();

        let (status, finished_at, metrics_json, total_items, total_errors): (
            String,
            Option<i64>,
            Option<String>,
            i64,
            i64,
        ) = conn
            .query_row(
                "SELECT status, finished_at, metrics_json, total_items_processed, total_errors
                 FROM sync_runs WHERE id = ?1",
                [row_id],
                |row| {
                    Ok((
                        row.get(0)?,
                        row.get(1)?,
                        row.get(2)?,
                        row.get(3)?,
                        row.get(4)?,
                    ))
                },
            )
            .unwrap();

        assert_eq!(status, "succeeded");
        assert!(finished_at.is_some());
        assert!(metrics_json.is_some());
        assert_eq!(total_items, 50);
        assert_eq!(total_errors, 2);

        // Verify metrics_json is parseable
        let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
        assert_eq!(parsed.len(), 1);
        assert_eq!(parsed[0].name, "ingest");
    }

    #[test]
    fn test_sync_run_recorder_fail() {
        let conn = setup_test_db();
        let recorder = SyncRunRecorder::start(&conn, "ingest issues", "fail0001").unwrap();
        let row_id = recorder.row_id;

        recorder.fail(&conn, "GitLab auth failed", None).unwrap();

        let (status, finished_at, error, metrics_json): (
            String,
            Option<i64>,
            Option<String>,
            Option<String>,
        ) = conn
            .query_row(
                "SELECT status, finished_at, error, metrics_json
                 FROM sync_runs WHERE id = ?1",
                [row_id],
                |row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)),
            )
            .unwrap();

        assert_eq!(status, "failed");
        assert!(finished_at.is_some());
        assert_eq!(error.as_deref(), Some("GitLab auth failed"));
        // No metrics supplied -> column stays NULL.
        assert!(metrics_json.is_none());
    }

    #[test]
    fn test_sync_run_recorder_fail_with_partial_metrics() {
        let conn = setup_test_db();
        let recorder = SyncRunRecorder::start(&conn, "sync", "part0001").unwrap();
        let row_id = recorder.row_id;

        let partial_metrics = vec![StageTiming {
            name: "ingest_issues".to_string(),
            project: Some("group/repo".to_string()),
            elapsed_ms: 800,
            items_processed: 30,
            items_skipped: 0,
            errors: 0,
            rate_limit_hits: 1,
            retries: 0,
            sub_stages: vec![],
        }];

        recorder
            .fail(&conn, "Embedding failed", Some(&partial_metrics))
            .unwrap();

        let (status, metrics_json): (String, Option<String>) = conn
            .query_row(
                "SELECT status, metrics_json FROM sync_runs WHERE id = ?1",
                [row_id],
                |row| Ok((row.get(0)?, row.get(1)?)),
            )
            .unwrap();

        assert_eq!(status, "failed");
        assert!(metrics_json.is_some());

        let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
        assert_eq!(parsed.len(), 1);
        assert_eq!(parsed[0].name, "ingest_issues");
    }
}
|
||||
@@ -1,6 +1,6 @@
|
||||
use rusqlite::Connection;
|
||||
use rusqlite::OptionalExtension;
|
||||
use tracing::{debug, warn};
|
||||
use tracing::{debug, instrument, warn};
|
||||
|
||||
use crate::core::error::Result;
|
||||
use crate::documents::{
|
||||
@@ -21,6 +21,7 @@ pub struct RegenerateResult {
|
||||
///
|
||||
/// Uses per-item error handling (fail-soft) and drains the queue completely
|
||||
/// via a bounded batch loop. Each dirty item is processed independently.
|
||||
#[instrument(skip(conn), fields(items_processed, items_skipped, errors))]
|
||||
pub fn regenerate_dirty_documents(conn: &Connection) -> Result<RegenerateResult> {
|
||||
let mut result = RegenerateResult::default();
|
||||
|
||||
@@ -61,6 +62,10 @@ pub fn regenerate_dirty_documents(conn: &Connection) -> Result<RegenerateResult>
|
||||
"Document regeneration complete"
|
||||
);
|
||||
|
||||
tracing::Span::current().record("items_processed", result.regenerated);
|
||||
tracing::Span::current().record("items_skipped", result.unchanged);
|
||||
tracing::Span::current().record("errors", result.errored);
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
@@ -282,6 +287,7 @@ mod tests {
|
||||
updated_at INTEGER NOT NULL,
|
||||
last_seen_at INTEGER NOT NULL,
|
||||
discussions_synced_for_updated_at INTEGER,
|
||||
resource_events_synced_for_updated_at INTEGER,
|
||||
web_url TEXT,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
/// Multiplier for encoding (document_id, chunk_index) into a single rowid.
|
||||
/// Supports up to 1000 chunks per document. At CHUNK_MAX_BYTES=6000,
|
||||
/// a 2MB document (MAX_DOCUMENT_BYTES_HARD) produces ~333 chunks.
|
||||
/// The pipeline enforces chunk_count < CHUNK_ROWID_MULTIPLIER at runtime.
|
||||
/// The pipeline enforces chunk_count <= CHUNK_ROWID_MULTIPLIER at runtime.
|
||||
pub const CHUNK_ROWID_MULTIPLIER: i64 = 1000;
|
||||
|
||||
/// Encode (document_id, chunk_index) into a sqlite-vec rowid.
|
||||
|
||||
@@ -4,7 +4,7 @@ use std::collections::HashSet;
|
||||
|
||||
use rusqlite::Connection;
|
||||
use sha2::{Digest, Sha256};
|
||||
use tracing::{info, warn};
|
||||
use tracing::{info, instrument, warn};
|
||||
|
||||
use crate::core::error::Result;
|
||||
use crate::embedding::change_detector::{count_pending_documents, find_pending_documents};
|
||||
@@ -37,6 +37,7 @@ struct ChunkWork {
|
||||
///
|
||||
/// Processes batches of BATCH_SIZE texts per Ollama API call.
|
||||
/// Uses keyset pagination over documents (DB_PAGE_SIZE per page).
|
||||
#[instrument(skip(conn, client, progress_callback), fields(%model_name, items_processed, items_skipped, errors))]
|
||||
pub async fn embed_documents(
|
||||
conn: &Connection,
|
||||
client: &OllamaClient,
|
||||
@@ -87,7 +88,7 @@ pub async fn embed_documents(
|
||||
// Overflow guard: skip documents that produce too many chunks.
|
||||
// Must run BEFORE clear_document_embeddings so existing embeddings
|
||||
// are preserved when we skip.
|
||||
if total_chunks as i64 >= CHUNK_ROWID_MULTIPLIER {
|
||||
if total_chunks as i64 > CHUNK_ROWID_MULTIPLIER {
|
||||
warn!(
|
||||
doc_id = doc.document_id,
|
||||
chunk_count = total_chunks,
|
||||
@@ -295,6 +296,10 @@ pub async fn embed_documents(
|
||||
"Embedding pipeline complete"
|
||||
);
|
||||
|
||||
tracing::Span::current().record("items_processed", result.embedded);
|
||||
tracing::Span::current().record("items_skipped", result.skipped);
|
||||
tracing::Span::current().record("errors", result.failed);
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
|
||||
@@ -144,11 +144,12 @@ impl GitLabClient {
|
||||
|
||||
if response.status() == StatusCode::TOO_MANY_REQUESTS && attempt < Self::MAX_RETRIES {
|
||||
let retry_after = Self::parse_retry_after(&response);
|
||||
tracing::warn!(
|
||||
retry_after_secs = retry_after,
|
||||
tracing::info!(
|
||||
path = %path,
|
||||
attempt,
|
||||
path,
|
||||
"Rate limited by GitLab, retrying"
|
||||
retry_after_secs = retry_after,
|
||||
status_code = 429u16,
|
||||
"Rate limited, retrying"
|
||||
);
|
||||
sleep(Duration::from_secs(retry_after)).await;
|
||||
continue;
|
||||
@@ -565,11 +566,12 @@ impl GitLabClient {
|
||||
|
||||
if response.status() == StatusCode::TOO_MANY_REQUESTS && attempt < Self::MAX_RETRIES {
|
||||
let retry_after = Self::parse_retry_after(&response);
|
||||
tracing::warn!(
|
||||
retry_after_secs = retry_after,
|
||||
tracing::info!(
|
||||
path = %path,
|
||||
attempt,
|
||||
path,
|
||||
"Rate limited by GitLab, retrying"
|
||||
retry_after_secs = retry_after,
|
||||
status_code = 429u16,
|
||||
"Rate limited, retrying"
|
||||
);
|
||||
sleep(Duration::from_secs(retry_after)).await;
|
||||
continue;
|
||||
|
||||
@@ -1,5 +1,7 @@
|
||||
//! Discussion and note transformers: convert GitLab discussions to local schema.
|
||||
|
||||
use tracing::warn;
|
||||
|
||||
use crate::core::time::{iso_to_ms, iso_to_ms_strict, now_ms};
|
||||
use crate::gitlab::types::{GitLabDiscussion, GitLabNote};
|
||||
|
||||
@@ -60,7 +62,13 @@ pub struct NormalizedNote {
|
||||
|
||||
/// Parse ISO 8601 timestamp to milliseconds, defaulting to 0 on failure.
|
||||
fn parse_timestamp(ts: &str) -> i64 {
|
||||
iso_to_ms(ts).unwrap_or(0)
|
||||
match iso_to_ms(ts) {
|
||||
Some(ms) => ms,
|
||||
None => {
|
||||
warn!(timestamp = ts, "Invalid timestamp, defaulting to epoch 0");
|
||||
0
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Transform a GitLab discussion into normalized schema.
|
||||
|
||||
@@ -314,7 +314,6 @@ fn process_mr_in_transaction(
|
||||
})
|
||||
}
|
||||
|
||||
/// Upsert a label within a transaction, returning its ID.
|
||||
/// Upsert a label within a transaction, returning its ID.
|
||||
/// Uses INSERT...ON CONFLICT...RETURNING for a single round-trip.
|
||||
fn upsert_label_tx(
|
||||
|
||||
@@ -7,11 +7,11 @@
|
||||
|
||||
use futures::future::join_all;
|
||||
use rusqlite::Connection;
|
||||
use tracing::{debug, info, warn};
|
||||
use tracing::{debug, info, instrument, warn};
|
||||
|
||||
use crate::Config;
|
||||
use crate::core::dependent_queue::{
|
||||
claim_jobs, complete_job, count_pending_jobs, enqueue_job, fail_job, reclaim_stale_locks,
|
||||
claim_jobs, complete_job, count_claimable_jobs, enqueue_job, fail_job, reclaim_stale_locks,
|
||||
};
|
||||
use crate::core::error::Result;
|
||||
use crate::gitlab::GitLabClient;
|
||||
@@ -108,6 +108,10 @@ pub async fn ingest_project_issues(
|
||||
}
|
||||
|
||||
/// Ingest all issues and their discussions for a project with progress reporting.
|
||||
#[instrument(
|
||||
skip(conn, client, config, progress),
|
||||
fields(project_id, gitlab_project_id, items_processed, items_skipped, errors)
|
||||
)]
|
||||
pub async fn ingest_project_issues_with_progress(
|
||||
conn: &Connection,
|
||||
client: &GitLabClient,
|
||||
@@ -124,12 +128,17 @@ pub async fn ingest_project_issues_with_progress(
|
||||
};
|
||||
|
||||
// Step 1: Ingest issues
|
||||
emit(ProgressEvent::IssuesFetchStarted);
|
||||
let issue_result = ingest_issues(conn, client, config, project_id, gitlab_project_id).await?;
|
||||
|
||||
result.issues_fetched = issue_result.fetched;
|
||||
result.issues_upserted = issue_result.upserted;
|
||||
result.labels_created = issue_result.labels_created;
|
||||
|
||||
emit(ProgressEvent::IssuesFetchComplete {
|
||||
total: result.issues_fetched,
|
||||
});
|
||||
|
||||
// Step 2: Sync discussions for issues that need it
|
||||
let issues_needing_sync = issue_result.issues_needing_discussion_sync;
|
||||
|
||||
@@ -189,8 +198,15 @@ pub async fn ingest_project_issues_with_progress(
|
||||
}
|
||||
|
||||
// Drain the queue
|
||||
let drain_result =
|
||||
drain_resource_events(conn, client, config, gitlab_project_id, &progress).await?;
|
||||
let drain_result = drain_resource_events(
|
||||
conn,
|
||||
client,
|
||||
config,
|
||||
project_id,
|
||||
gitlab_project_id,
|
||||
&progress,
|
||||
)
|
||||
.await?;
|
||||
result.resource_events_fetched = drain_result.fetched;
|
||||
result.resource_events_failed = drain_result.failed;
|
||||
}
|
||||
@@ -208,6 +224,10 @@ pub async fn ingest_project_issues_with_progress(
|
||||
"Project ingestion complete"
|
||||
);
|
||||
|
||||
tracing::Span::current().record("items_processed", result.issues_upserted);
|
||||
tracing::Span::current().record("items_skipped", result.issues_skipped_discussion_sync);
|
||||
tracing::Span::current().record("errors", result.resource_events_failed);
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
@@ -280,6 +300,10 @@ pub async fn ingest_project_merge_requests(
|
||||
}
|
||||
|
||||
/// Ingest all merge requests and their discussions for a project with progress reporting.
|
||||
#[instrument(
|
||||
skip(conn, client, config, progress),
|
||||
fields(project_id, gitlab_project_id, items_processed, items_skipped, errors)
|
||||
)]
|
||||
pub async fn ingest_project_merge_requests_with_progress(
|
||||
conn: &Connection,
|
||||
client: &GitLabClient,
|
||||
@@ -380,8 +404,15 @@ pub async fn ingest_project_merge_requests_with_progress(
|
||||
debug!(enqueued, "Enqueued resource events jobs for MRs");
|
||||
}
|
||||
|
||||
let drain_result =
|
||||
drain_resource_events(conn, client, config, gitlab_project_id, &progress).await?;
|
||||
let drain_result = drain_resource_events(
|
||||
conn,
|
||||
client,
|
||||
config,
|
||||
project_id,
|
||||
gitlab_project_id,
|
||||
&progress,
|
||||
)
|
||||
.await?;
|
||||
result.resource_events_fetched = drain_result.fetched;
|
||||
result.resource_events_failed = drain_result.failed;
|
||||
}
|
||||
@@ -400,6 +431,10 @@ pub async fn ingest_project_merge_requests_with_progress(
|
||||
"MR project ingestion complete"
|
||||
);
|
||||
|
||||
tracing::Span::current().record("items_processed", result.mrs_upserted);
|
||||
tracing::Span::current().record("items_skipped", result.mrs_skipped_discussion_sync);
|
||||
tracing::Span::current().record("errors", result.resource_events_failed);
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
@@ -455,6 +490,7 @@ async fn sync_mr_discussions_sequential(
|
||||
pub struct DrainResult {
|
||||
pub fetched: usize,
|
||||
pub failed: usize,
|
||||
pub skipped_not_found: usize,
|
||||
}
|
||||
|
||||
/// Enqueue resource_events jobs for all entities of a given type in a project.
|
||||
@@ -466,21 +502,60 @@ fn enqueue_resource_events_for_entity_type(
|
||||
project_id: i64,
|
||||
entity_type: &str,
|
||||
) -> Result<usize> {
|
||||
// Query all entities for this project and enqueue resource_events jobs.
|
||||
// The UNIQUE constraint on pending_dependent_fetches makes this idempotent -
|
||||
// already-queued entities are silently skipped via INSERT OR IGNORE.
|
||||
// Clean up obsolete jobs: remove resource_events jobs for entities whose
|
||||
// watermark is already current (updated_at <= resource_events_synced_for_updated_at).
|
||||
// These are leftover from prior runs that failed after watermark-stamping but
|
||||
// before job deletion, or from entities that no longer need syncing.
|
||||
// We intentionally keep jobs for entities that still need syncing (including
|
||||
// in-progress or failed-with-backoff jobs) to preserve retry state.
|
||||
match entity_type {
|
||||
"issue" => {
|
||||
conn.execute(
|
||||
"DELETE FROM pending_dependent_fetches \
|
||||
WHERE project_id = ?1 AND entity_type = 'issue' AND job_type = 'resource_events' \
|
||||
AND entity_local_id IN ( \
|
||||
SELECT id FROM issues \
|
||||
WHERE project_id = ?1 \
|
||||
AND updated_at <= COALESCE(resource_events_synced_for_updated_at, 0) \
|
||||
)",
|
||||
[project_id],
|
||||
)?;
|
||||
}
|
||||
"merge_request" => {
|
||||
conn.execute(
|
||||
"DELETE FROM pending_dependent_fetches \
|
||||
WHERE project_id = ?1 AND entity_type = 'merge_request' AND job_type = 'resource_events' \
|
||||
AND entity_local_id IN ( \
|
||||
SELECT id FROM merge_requests \
|
||||
WHERE project_id = ?1 \
|
||||
AND updated_at <= COALESCE(resource_events_synced_for_updated_at, 0) \
|
||||
)",
|
||||
[project_id],
|
||||
)?;
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
|
||||
// Enqueue resource_events jobs only for entities whose updated_at exceeds
|
||||
// their last resource event sync watermark.
|
||||
//
|
||||
// Use separate hardcoded queries per entity type to avoid format!-based SQL.
|
||||
let entities: Vec<(i64, i64)> = match entity_type {
|
||||
"issue" => {
|
||||
let mut stmt =
|
||||
conn.prepare_cached("SELECT id, iid FROM issues WHERE project_id = ?1")?;
|
||||
let mut stmt = conn.prepare_cached(
|
||||
"SELECT id, iid FROM issues \
|
||||
WHERE project_id = ?1 \
|
||||
AND updated_at > COALESCE(resource_events_synced_for_updated_at, 0)",
|
||||
)?;
|
||||
stmt.query_map([project_id], |row| Ok((row.get(0)?, row.get(1)?)))?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?
|
||||
}
|
||||
"merge_request" => {
|
||||
let mut stmt =
|
||||
conn.prepare_cached("SELECT id, iid FROM merge_requests WHERE project_id = ?1")?;
|
||||
let mut stmt = conn.prepare_cached(
|
||||
"SELECT id, iid FROM merge_requests \
|
||||
WHERE project_id = ?1 \
|
||||
AND updated_at > COALESCE(resource_events_synced_for_updated_at, 0)",
|
||||
)?;
|
||||
stmt.query_map([project_id], |row| Ok((row.get(0)?, row.get(1)?)))?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?
|
||||
}
|
||||
@@ -509,10 +584,15 @@ fn enqueue_resource_events_for_entity_type(
|
||||
///
|
||||
/// Processes jobs sequentially since `rusqlite::Connection` is not `Send`.
|
||||
/// Uses exponential backoff on failure via `fail_job`.
|
||||
#[instrument(
|
||||
skip(conn, client, config, progress),
|
||||
fields(project_id, gitlab_project_id, items_processed, errors)
|
||||
)]
|
||||
async fn drain_resource_events(
|
||||
conn: &Connection,
|
||||
client: &GitLabClient,
|
||||
config: &Config,
|
||||
project_id: i64,
|
||||
gitlab_project_id: i64,
|
||||
progress: &Option<ProgressCallback>,
|
||||
) -> Result<DrainResult> {
|
||||
@@ -525,9 +605,15 @@ async fn drain_resource_events(
|
||||
info!(reclaimed, "Reclaimed stale resource event locks");
|
||||
}
|
||||
|
||||
// Count total pending jobs for progress reporting
|
||||
let pending_counts = count_pending_jobs(conn)?;
|
||||
let total_pending = pending_counts.get("resource_events").copied().unwrap_or(0);
|
||||
// Count only claimable jobs (unlocked, past retry backoff) for accurate progress.
|
||||
// Using count_pending_jobs here would inflate the total with locked/backing-off
|
||||
// jobs that can't be claimed in this drain run, causing the progress bar to
|
||||
// never reach 100%.
|
||||
let claimable_counts = count_claimable_jobs(conn, project_id)?;
|
||||
let total_pending = claimable_counts
|
||||
.get("resource_events")
|
||||
.copied()
|
||||
.unwrap_or(0);
|
||||
|
||||
if total_pending == 0 {
|
||||
return Ok(result);
|
||||
@@ -547,15 +633,19 @@ async fn drain_resource_events(
|
||||
let mut seen_job_ids = std::collections::HashSet::new();
|
||||
|
||||
loop {
|
||||
let jobs = claim_jobs(conn, "resource_events", batch_size)?;
|
||||
let jobs = claim_jobs(conn, "resource_events", project_id, batch_size)?;
|
||||
if jobs.is_empty() {
|
||||
break;
|
||||
}
|
||||
|
||||
// Track whether any job in this batch was actually new. If every
|
||||
// claimed job was already seen, break to avoid an infinite loop
|
||||
// (can happen with clock skew or zero-backoff edge cases).
|
||||
let mut any_new_in_batch = false;
|
||||
|
||||
for job in &jobs {
|
||||
// Guard against re-processing a job that was failed and re-claimed
|
||||
// within the same drain run (shouldn't happen due to backoff, but
|
||||
// defensive against clock skew or zero-backoff edge cases).
|
||||
// within the same drain run.
|
||||
if !seen_job_ids.insert(job.id) {
|
||||
warn!(
|
||||
job_id = job.id,
|
||||
@@ -563,6 +653,7 @@ async fn drain_resource_events(
|
||||
);
|
||||
continue;
|
||||
}
|
||||
any_new_in_batch = true;
|
||||
|
||||
match client
|
||||
.fetch_all_resource_events(gitlab_project_id, &job.entity_type, job.entity_iid)
|
||||
@@ -582,6 +673,11 @@ async fn drain_resource_events(
|
||||
match store_result {
|
||||
Ok(()) => {
|
||||
complete_job(conn, job.id)?;
|
||||
update_resource_event_watermark(
|
||||
conn,
|
||||
&job.entity_type,
|
||||
job.entity_local_id,
|
||||
)?;
|
||||
result.fetched += 1;
|
||||
}
|
||||
Err(e) => {
|
||||
@@ -597,6 +693,25 @@ async fn drain_resource_events(
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
// Only 404 (not found) is truly permanent -- the resource
|
||||
// events endpoint doesn't exist for this entity. Stamp the
|
||||
// watermark so we skip it next run. All other errors
|
||||
// (403, auth, network) get backoff retry.
|
||||
if e.is_permanent_api_error() {
|
||||
debug!(
|
||||
entity_type = %job.entity_type,
|
||||
entity_iid = job.entity_iid,
|
||||
error = %e,
|
||||
"Permanent API error for resource events, marking complete"
|
||||
);
|
||||
complete_job(conn, job.id)?;
|
||||
update_resource_event_watermark(
|
||||
conn,
|
||||
&job.entity_type,
|
||||
job.entity_local_id,
|
||||
)?;
|
||||
result.skipped_not_found += 1;
|
||||
} else {
|
||||
warn!(
|
||||
entity_type = %job.entity_type,
|
||||
entity_iid = job.entity_iid,
|
||||
@@ -607,6 +722,7 @@ async fn drain_resource_events(
|
||||
result.failed += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
processed += 1;
|
||||
emit(ProgressEvent::ResourceEventFetched {
|
||||
@@ -614,6 +730,12 @@ async fn drain_resource_events(
|
||||
total: total_pending,
|
||||
});
|
||||
}
|
||||
|
||||
// If every job in this batch was already seen, stop to prevent spinning.
|
||||
if !any_new_in_batch {
|
||||
warn!("All claimed jobs were already processed, breaking drain loop");
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
emit(ProgressEvent::ResourceEventsFetchComplete {
|
||||
@@ -629,6 +751,9 @@ async fn drain_resource_events(
|
||||
);
|
||||
}
|
||||
|
||||
tracing::Span::current().record("items_processed", result.fetched);
|
||||
tracing::Span::current().record("errors", result.failed);
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
@@ -680,6 +805,33 @@ fn store_resource_events(
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Update the resource event watermark for an entity after successful event fetch.
|
||||
///
|
||||
/// Sets `resource_events_synced_for_updated_at = updated_at` so the entity
|
||||
/// won't be re-enqueued until its `updated_at` advances again.
|
||||
fn update_resource_event_watermark(
|
||||
conn: &Connection,
|
||||
entity_type: &str,
|
||||
entity_local_id: i64,
|
||||
) -> Result<()> {
|
||||
match entity_type {
|
||||
"issue" => {
|
||||
conn.execute(
|
||||
"UPDATE issues SET resource_events_synced_for_updated_at = updated_at WHERE id = ?",
|
||||
[entity_local_id],
|
||||
)?;
|
||||
}
|
||||
"merge_request" => {
|
||||
conn.execute(
|
||||
"UPDATE merge_requests SET resource_events_synced_for_updated_at = updated_at WHERE id = ?",
|
||||
[entity_local_id],
|
||||
)?;
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
@@ -722,6 +874,7 @@ mod tests {
|
||||
let result = DrainResult::default();
|
||||
assert_eq!(result.fetched, 0);
|
||||
assert_eq!(result.failed, 0);
|
||||
assert_eq!(result.skipped_not_found, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
181
src/main.rs
181
src/main.rs
@@ -4,7 +4,7 @@ use clap::Parser;
|
||||
use console::style;
|
||||
use dialoguer::{Confirm, Input};
|
||||
use serde::Serialize;
|
||||
use tracing_subscriber::EnvFilter;
|
||||
use tracing_subscriber::Layer;
|
||||
use tracing_subscriber::layer::SubscriberExt;
|
||||
use tracing_subscriber::util::SubscriberInitExt;
|
||||
|
||||
@@ -30,8 +30,10 @@ use lore::core::db::{
|
||||
LATEST_SCHEMA_VERSION, create_connection, get_schema_version, run_migrations,
|
||||
};
|
||||
use lore::core::error::{LoreError, RobotErrorOutput};
|
||||
use lore::core::paths::get_config_path;
|
||||
use lore::core::paths::get_db_path;
|
||||
use lore::core::logging;
|
||||
use lore::core::metrics::MetricsLayer;
|
||||
use lore::core::paths::{get_config_path, get_db_path, get_log_dir};
|
||||
use lore::core::sync_run::SyncRunRecorder;
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() {
|
||||
@@ -41,25 +43,94 @@ async fn main() {
|
||||
libc::signal(libc::SIGPIPE, libc::SIG_DFL);
|
||||
}
|
||||
|
||||
// Initialize logging with progress-bar-aware writer.
|
||||
// SuspendingWriter suspends the shared MultiProgress before each log line,
|
||||
// preventing log output from interleaving with progress bar animations.
|
||||
tracing_subscriber::registry()
|
||||
.with(
|
||||
tracing_subscriber::fmt::layer()
|
||||
.with_target(false)
|
||||
.with_writer(lore::cli::progress::SuspendingWriter),
|
||||
)
|
||||
.with(
|
||||
EnvFilter::from_default_env()
|
||||
.add_directive("lore=info".parse().unwrap())
|
||||
.add_directive("warn".parse().unwrap()),
|
||||
)
|
||||
.init();
|
||||
|
||||
// Parse CLI first so we know verbosity settings before initializing the subscriber.
|
||||
let cli = Cli::parse();
|
||||
let robot_mode = cli.is_robot_mode();
|
||||
|
||||
// Try to load logging config for file layer settings.
|
||||
// If config isn't available yet (e.g. during `lore init`), use defaults.
|
||||
let logging_config = lore::Config::load(cli.config.as_deref())
|
||||
.map(|c| c.logging)
|
||||
.unwrap_or_default();
|
||||
|
||||
// Clean up old log files before initializing subscriber (so deleted handles aren't held open)
|
||||
let log_dir = get_log_dir(logging_config.log_dir.as_deref());
|
||||
if logging_config.file_logging && logging_config.retention_days > 0 {
|
||||
logging::cleanup_old_logs(&log_dir, logging_config.retention_days);
|
||||
}
|
||||
|
||||
// Build triple-layer subscriber:
|
||||
// - stderr layer: human-readable or JSON, controlled by -v flags
|
||||
// - file layer: always-on JSON to daily-rotated log files
|
||||
// - metrics layer: captures span timing for robot-mode performance data
|
||||
let stderr_filter = logging::build_stderr_filter(cli.verbose, cli.quiet);
|
||||
let metrics_layer = MetricsLayer::new();
|
||||
|
||||
let registry = tracing_subscriber::registry();
|
||||
|
||||
// Hold the file writer guard at function scope so it flushes on exit.
|
||||
// WorkerGuard::drop() flushes pending log entries — forgetting it loses them.
|
||||
let _file_guard: Option<tracing_appender::non_blocking::WorkerGuard>;
|
||||
|
||||
// stderr layer: format depends on --log-format flag
|
||||
if cli.log_format == "json" {
|
||||
let stderr_layer = tracing_subscriber::fmt::layer()
|
||||
.json()
|
||||
.with_writer(lore::cli::progress::SuspendingWriter)
|
||||
.with_filter(stderr_filter);
|
||||
|
||||
if logging_config.file_logging {
|
||||
let file_filter = logging::build_file_filter();
|
||||
std::fs::create_dir_all(&log_dir).ok();
|
||||
let file_appender = tracing_appender::rolling::daily(&log_dir, "lore");
|
||||
let (non_blocking, guard) = tracing_appender::non_blocking(file_appender);
|
||||
_file_guard = Some(guard);
|
||||
let file_layer = tracing_subscriber::fmt::layer()
|
||||
.json()
|
||||
.with_writer(non_blocking)
|
||||
.with_filter(file_filter);
|
||||
registry
|
||||
.with(stderr_layer)
|
||||
.with(file_layer)
|
||||
.with(metrics_layer.clone())
|
||||
.init();
|
||||
} else {
|
||||
_file_guard = None;
|
||||
registry
|
||||
.with(stderr_layer)
|
||||
.with(metrics_layer.clone())
|
||||
.init();
|
||||
}
|
||||
} else {
|
||||
let stderr_layer = tracing_subscriber::fmt::layer()
|
||||
.with_target(false)
|
||||
.with_writer(lore::cli::progress::SuspendingWriter)
|
||||
.with_filter(stderr_filter);
|
||||
|
||||
if logging_config.file_logging {
|
||||
let file_filter = logging::build_file_filter();
|
||||
std::fs::create_dir_all(&log_dir).ok();
|
||||
let file_appender = tracing_appender::rolling::daily(&log_dir, "lore");
|
||||
let (non_blocking, guard) = tracing_appender::non_blocking(file_appender);
|
||||
_file_guard = Some(guard);
|
||||
let file_layer = tracing_subscriber::fmt::layer()
|
||||
.json()
|
||||
.with_writer(non_blocking)
|
||||
.with_filter(file_filter);
|
||||
registry
|
||||
.with(stderr_layer)
|
||||
.with(file_layer)
|
||||
.with(metrics_layer.clone())
|
||||
.init();
|
||||
} else {
|
||||
_file_guard = None;
|
||||
registry
|
||||
.with(stderr_layer)
|
||||
.with(metrics_layer.clone())
|
||||
.init();
|
||||
}
|
||||
}
|
||||
|
||||
// Apply color settings (console crate handles NO_COLOR/CLICOLOR natively in "auto" mode)
|
||||
match cli.color.as_str() {
|
||||
"never" => console::set_colors_enabled(false),
|
||||
@@ -76,9 +147,18 @@ async fn main() {
|
||||
Commands::Search(args) => handle_search(cli.config.as_deref(), args, robot_mode).await,
|
||||
Commands::Stats(args) => handle_stats(cli.config.as_deref(), args, robot_mode).await,
|
||||
Commands::Embed(args) => handle_embed(cli.config.as_deref(), args, robot_mode).await,
|
||||
Commands::Sync(args) => handle_sync_cmd(cli.config.as_deref(), args, robot_mode).await,
|
||||
Commands::Sync(args) => {
|
||||
handle_sync_cmd(cli.config.as_deref(), args, robot_mode, &metrics_layer).await
|
||||
}
|
||||
Commands::Ingest(args) => {
|
||||
handle_ingest(cli.config.as_deref(), args, robot_mode, quiet).await
|
||||
handle_ingest(
|
||||
cli.config.as_deref(),
|
||||
args,
|
||||
robot_mode,
|
||||
quiet,
|
||||
&metrics_layer,
|
||||
)
|
||||
.await
|
||||
}
|
||||
Commands::Count(args) => handle_count(cli.config.as_deref(), args, robot_mode).await,
|
||||
Commands::Status => handle_sync_status_cmd(cli.config.as_deref(), robot_mode).await,
|
||||
@@ -389,6 +469,7 @@ async fn handle_ingest(
|
||||
args: IngestArgs,
|
||||
robot_mode: bool,
|
||||
quiet: bool,
|
||||
metrics: &MetricsLayer,
|
||||
) -> Result<(), Box<dyn std::error::Error>> {
|
||||
let config = Config::load(config_override)?;
|
||||
let display = if robot_mode || quiet {
|
||||
@@ -400,6 +481,16 @@ async fn handle_ingest(
|
||||
let force = args.force && !args.no_force;
|
||||
let full = args.full && !args.no_full;
|
||||
|
||||
// Record ingest run lifecycle in sync_runs table
|
||||
let entity_label = args.entity.as_deref().unwrap_or("all");
|
||||
let command = format!("ingest:{entity_label}");
|
||||
let db_path = get_db_path(config.storage.db_path.as_deref());
|
||||
let recorder_conn = create_connection(&db_path)?;
|
||||
let run_id = uuid::Uuid::new_v4().simple().to_string();
|
||||
let run_id_short = &run_id[..8];
|
||||
let recorder = SyncRunRecorder::start(&recorder_conn, &command, run_id_short)?;
|
||||
|
||||
let ingest_result: std::result::Result<(), Box<dyn std::error::Error>> = async {
|
||||
match args.entity.as_deref() {
|
||||
Some(resource_type) => {
|
||||
// Single entity ingest
|
||||
@@ -457,8 +548,24 @@ async fn handle_ingest(
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
.await;
|
||||
|
||||
match ingest_result {
|
||||
Ok(()) => {
|
||||
let stages = metrics.extract_timings();
|
||||
let total_items: usize = stages.iter().map(|s| s.items_processed).sum();
|
||||
let total_errors: usize = stages.iter().map(|s| s.errors).sum();
|
||||
let _ = recorder.succeed(&recorder_conn, &stages, total_items, total_errors);
|
||||
Ok(())
|
||||
}
|
||||
Err(e) => {
|
||||
let stages = metrics.extract_timings();
|
||||
let _ = recorder.fail(&recorder_conn, &e.to_string(), Some(&stages));
|
||||
Err(e)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// JSON output for combined ingest (issues + mrs).
|
||||
@@ -1148,6 +1255,7 @@ async fn handle_sync_cmd(
|
||||
config_override: Option<&str>,
|
||||
args: SyncArgs,
|
||||
robot_mode: bool,
|
||||
metrics: &MetricsLayer,
|
||||
) -> Result<(), Box<dyn std::error::Error>> {
|
||||
let mut config = Config::load(config_override)?;
|
||||
if args.no_events {
|
||||
@@ -1162,16 +1270,39 @@ async fn handle_sync_cmd(
|
||||
robot_mode,
|
||||
};
|
||||
|
||||
// Record sync run lifecycle in sync_runs table
|
||||
let db_path = get_db_path(config.storage.db_path.as_deref());
|
||||
let recorder_conn = create_connection(&db_path)?;
|
||||
let run_id = uuid::Uuid::new_v4().simple().to_string();
|
||||
let run_id_short = &run_id[..8];
|
||||
let recorder = SyncRunRecorder::start(&recorder_conn, "sync", run_id_short)?;
|
||||
|
||||
let start = std::time::Instant::now();
|
||||
let result = run_sync(&config, options).await?;
|
||||
match run_sync(&config, options, Some(run_id_short)).await {
|
||||
Ok(result) => {
|
||||
let elapsed = start.elapsed();
|
||||
let stages = metrics.extract_timings();
|
||||
let total_items = result.issues_updated
|
||||
+ result.mrs_updated
|
||||
+ result.documents_regenerated
|
||||
+ result.documents_embedded;
|
||||
let total_errors = result.resource_events_failed;
|
||||
// Best-effort: don't fail the command if recording fails
|
||||
let _ = recorder.succeed(&recorder_conn, &stages, total_items, total_errors);
|
||||
|
||||
if robot_mode {
|
||||
print_sync_json(&result, elapsed.as_millis() as u64);
|
||||
print_sync_json(&result, elapsed.as_millis() as u64, Some(metrics));
|
||||
} else {
|
||||
print_sync(&result, elapsed);
|
||||
print_sync(&result, elapsed, Some(metrics));
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
Err(e) => {
|
||||
let stages = metrics.extract_timings();
|
||||
let _ = recorder.fail(&recorder_conn, &e.to_string(), Some(&stages));
|
||||
Err(e.into())
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ============================================================================
|
||||
|
||||
@@ -60,11 +60,7 @@ pub fn rank_rrf(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Ve
|
||||
.collect();
|
||||
|
||||
// Sort descending by rrf_score
|
||||
results.sort_by(|a, b| {
|
||||
b.rrf_score
|
||||
.partial_cmp(&a.rrf_score)
|
||||
.unwrap_or(std::cmp::Ordering::Equal)
|
||||
});
|
||||
results.sort_by(|a, b| b.rrf_score.total_cmp(&a.rrf_score));
|
||||
|
||||
// Normalize: best = 1.0
|
||||
if let Some(max_score) = results.first().map(|r| r.rrf_score).filter(|&s| s > 0.0) {
|
||||
|
||||
Reference in New Issue
Block a user