feat(sync): Instrument pipeline with tracing spans, run_id correlation, and metrics

Add end-to-end observability to the sync and ingest pipelines:

Sync command:
- Generate UUID-based run_id for each sync invocation, propagated through
  all child spans for log correlation across stages
- Accept MetricsLayer reference to extract hierarchical StageTiming data
  after pipeline completion for robot-mode performance output
- Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle)
- Wrap entire sync execution in a root tracing span with run_id field

Ingest command:
- Wrap run_ingest in an instrumented root span with run_id and resource_type
- Add project path prefix to discussion progress bars for multi-project
  clarity
- Reset resource_events_synced_for_updated_at on --full re-sync

Sync status:
- Expand from single last_run to configurable recent runs list (default 10)
- Parse and expose StageTiming metrics from stored metrics_json
- Add run_id, total_items_processed, total_errors to SyncRunInfo
- Add mr_count to DataSummary for complete entity coverage

Orchestrator:
- Add #[instrument] with structured fields to issue and MR ingestion
  functions
- Record items_processed, items_skipped, errors on span close for
  MetricsLayer
- Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete)
- Pass project_id through to drain_resource_events for scoped job claiming

Document regenerator and embedding pipeline:
- Add #[instrument] spans with items_processed, items_skipped, errors fields
- Record final counts on span close for metrics extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 10:01:28 -05:00
parent 362503d3bf
commit f6d19a9467
6 changed files with 603 additions and 234 deletions
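
The commit message describes hierarchical StageTiming data and run-level totals (total_items_processed, total_errors) exposed through sync status. The sketch below is one possible way such totals could be derived from a stage tree, using only the standard library. Everything except the three counter names taken from the message (items_processed, items_skipped, errors) is an assumption made for illustration: the StageTiming field layout, the RunTotals type, the run_totals and example_stages helpers, and the choice that each stage records only its own work, so that a recursive sum does not double-count. The real types in this repository may look different.

```rust
/// Hypothetical shape of one node in the stage timing tree.
/// Only the three counter names come from the commit message.
#[derive(Debug, Clone, Default)]
pub struct StageTiming {
    pub name: String,
    pub elapsed_ms: u64,
    pub items_processed: usize,
    pub items_skipped: usize,
    pub errors: usize,
    pub sub_stages: Vec<StageTiming>,
}

impl StageTiming {
    /// A stage that did work itself and has no children.
    pub fn leaf(name: &str, elapsed_ms: u64, processed: usize, skipped: usize, errors: usize) -> Self {
        StageTiming {
            name: name.to_string(),
            elapsed_ms,
            items_processed: processed,
            items_skipped: skipped,
            errors,
            sub_stages: Vec::new(),
        }
    }

    /// A grouping stage whose own counters stay at zero (assumption).
    pub fn parent(name: &str, elapsed_ms: u64, sub_stages: Vec<StageTiming>) -> Self {
        StageTiming {
            name: name.to_string(),
            elapsed_ms,
            sub_stages,
            ..StageTiming::default()
        }
    }
}

/// Hypothetical run-level summary.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct RunTotals {
    pub items_processed: usize,
    pub items_skipped: usize,
    pub errors: usize,
}

/// Sum each stage's own counters plus those of all nested stages.
pub fn run_totals(stages: &[StageTiming]) -> RunTotals {
    let mut totals = RunTotals::default();
    for stage in stages {
        let nested = run_totals(&stage.sub_stages);
        totals.items_processed += stage.items_processed + nested.items_processed;
        totals.items_skipped += stage.items_skipped + nested.items_skipped;
        totals.errors += stage.errors + nested.errors;
    }
    totals
}

/// Made-up sample data, for illustration only.
pub fn example_stages() -> Vec<StageTiming> {
    vec![
        StageTiming::parent(
            "ingest",
            1200,
            vec![
                StageTiming::leaf("issues", 700, 40, 2, 0),
                StageTiming::leaf("merge_requests", 500, 15, 0, 1),
            ],
        ),
        StageTiming::leaf("regenerate_documents", 300, 55, 3, 0),
    ]
}
```

Calling run_totals(&example_stages()) walks both top-level stages and the two stages nested under "ingest". If parent stages in the real pipeline already aggregate their children's counts, the recursion would need to be dropped to avoid counting items twice.
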
--- a/src/documents/regenerator.rs
+++ b/src/documents/regenerator.rs
@@ -1,6 +1,6 @@
 use rusqlite::Connection;
 use rusqlite::OptionalExtension;
-use tracing::{debug, warn};
+use tracing::{debug, instrument, warn};

 use crate::core::error::Result;
 use crate::documents::{
@@ -21,6 +21,7 @@ pub struct RegenerateResult {
 ///
 /// Uses per-item error handling (fail-soft) and drains the queue completely
 /// via a bounded batch loop. Each dirty item is processed independently.
+#[instrument(skip(conn), fields(items_processed, items_skipped, errors))]
 pub fn regenerate_dirty_documents(conn: &Connection) -> Result<RegenerateResult> {
    let mut result = RegenerateResult::default();

@@ -61,6 +62,10 @@ pub fn regenerate_dirty_documents(conn: &Connection) -> Result<RegenerateResult>
        "Document regeneration complete"
    );

+    tracing::Span::current().record("items_processed", result.regenerated);
+    tracing::Span::current().record("items_skipped", result.unchanged);
+    tracing::Span::current().record("errors", result.errored);
+
    Ok(result)
 }

@@ -282,6 +287,7 @@ mod tests {
                updated_at INTEGER NOT NULL,
                last_seen_at INTEGER NOT NULL,
                discussions_synced_for_updated_at INTEGER,
+                resource_events_synced_for_updated_at INTEGER,
                web_url TEXT,
                raw_payload_id INTEGER
            );