feat(sync): Instrument pipeline with tracing spans, run_id correlation, and metrics
Add end-to-end observability to the sync and ingest pipelines:

Sync command:
- Generate UUID-based run_id for each sync invocation, propagated through
  all child spans for log correlation across stages
- Accept MetricsLayer reference to extract hierarchical StageTiming data
  after pipeline completion for robot-mode performance output
- Record sync runs in DB via SyncRunRecorder (start/succeed/fail lifecycle)
- Wrap entire sync execution in a root tracing span with run_id field

Ingest command:
- Wrap run_ingest in an instrumented root span with run_id and resource_type
- Add project path prefix to discussion progress bars for multi-project
  clarity
- Reset resource_events_synced_for_updated_at on --full re-sync

Sync status:
- Expand from single last_run to configurable recent runs list (default 10)
- Parse and expose StageTiming metrics from stored metrics_json
- Add run_id, total_items_processed, total_errors to SyncRunInfo
- Add mr_count to DataSummary for complete entity coverage

Orchestrator:
- Add #[instrument] with structured fields to issue and MR ingestion
  functions
- Record items_processed, items_skipped, errors on span close for
  MetricsLayer
- Emit granular progress events (IssuesFetchStarted, IssuesFetchComplete)
- Pass project_id through to drain_resource_events for scoped job claiming

Document regenerator and embedding pipeline:
- Add #[instrument] spans with items_processed, items_skipped, errors fields
- Record final counts on span close for metrics extraction

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -4,7 +4,7 @@ use std::collections::HashSet;
 
 use rusqlite::Connection;
 use sha2::{Digest, Sha256};
-use tracing::{info, warn};
+use tracing::{info, instrument, warn};
 
 use crate::core::error::Result;
 use crate::embedding::change_detector::{count_pending_documents, find_pending_documents};
@@ -37,6 +37,7 @@ struct ChunkWork {
 ///
 /// Processes batches of BATCH_SIZE texts per Ollama API call.
 /// Uses keyset pagination over documents (DB_PAGE_SIZE per page).
+#[instrument(skip(conn, client, progress_callback), fields(%model_name, items_processed, items_skipped, errors))]
 pub async fn embed_documents(
     conn: &Connection,
     client: &OllamaClient,
@@ -87,7 +88,7 @@ pub async fn embed_documents(
 // Overflow guard: skip documents that produce too many chunks.
 // Must run BEFORE clear_document_embeddings so existing embeddings
 // are preserved when we skip.
-if total_chunks as i64 >= CHUNK_ROWID_MULTIPLIER {
+if total_chunks as i64 > CHUNK_ROWID_MULTIPLIER {
     warn!(
         doc_id = doc.document_id,
         chunk_count = total_chunks,
@@ -295,6 +296,10 @@ pub async fn embed_documents(
         "Embedding pipeline complete"
     );
 
+    tracing::Span::current().record("items_processed", result.embedded);
+    tracing::Span::current().record("items_skipped", result.skipped);
+    tracing::Span::current().record("errors", result.failed);
+
     Ok(result)
 }
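
Note on the doc comment in the second hunk: "keyset pagination over documents (DB_PAGE_SIZE per page)" means each page is fetched by filtering on the last id seen rather than by OFFSET. The diff does not show the query, so the following is only a minimal in-memory sketch of the idea. `Doc`, `fetch_page`, `paginate`, and the page size of 3 are hypothetical stand-ins, not code from this repository.

```rust
/// Stand-in for a document row; only the id matters for paging.
#[derive(Debug, Clone, PartialEq)]
struct Doc {
    id: i64,
}

/// Hypothetical page size; the real DB_PAGE_SIZE value is not shown in the diff.
const DB_PAGE_SIZE: usize = 3;

/// Emulates `SELECT ... WHERE id > ?1 ORDER BY id LIMIT ?2` over an
/// in-memory table that is already sorted by id.
fn fetch_page(table: &[Doc], last_id: i64, limit: usize) -> Vec<Doc> {
    table
        .iter()
        .filter(|d| d.id > last_id)
        .take(limit)
        .cloned()
        .collect()
}

/// Walks the whole table page by page and returns the ids seen on each page.
fn paginate(table: &[Doc]) -> Vec<Vec<i64>> {
    let mut pages = Vec::new();
    let mut last_id = 0; // assumes ids are positive
    loop {
        let page = fetch_page(table, last_id, DB_PAGE_SIZE);
        // An empty page means the cursor has passed the final row.
        let Some(last) = page.last() else { break };
        last_id = last.id;
        pages.push(page.iter().map(|d| d.id).collect());
    }
    pages
}
```

Because the cursor is the last id rather than a row offset, rows inserted or deleted behind the cursor do not shift later pages, which is the usual reason to prefer this over OFFSET in a long-running pipeline.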
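
Note on the `>=` to `>` change in the overflow guard: the commit message does not mention it, so the reasoning here is an assumption. If chunk rowids are encoded as `document_id * CHUNK_ROWID_MULTIPLIER + chunk_index` with zero-based indices, a document with exactly `CHUNK_ROWID_MULTIPLIER` chunks still fits in its own rowid block, and only one more chunk would spill into the next document's block. The sketch below checks that boundary. The encoding, the helper names, and the value 1000 are all hypothetical; none of them appear in the diff.

```rust
/// Hypothetical value for illustration; the real constant is not shown in the diff.
const CHUNK_ROWID_MULTIPLIER: i64 = 1000;

/// Assumed encoding: each document owns a contiguous block of rowids.
fn encode_rowid(document_id: i64, chunk_index: i64) -> i64 {
    document_id * CHUNK_ROWID_MULTIPLIER + chunk_index
}

/// Inverse of `encode_rowid`: returns (document_id, chunk_index).
fn decode_rowid(rowid: i64) -> (i64, i64) {
    (rowid / CHUNK_ROWID_MULTIPLIER, rowid % CHUNK_ROWID_MULTIPLIER)
}

/// Mirrors the guard as it reads after this commit: true means "skip the document".
fn exceeds_chunk_limit(total_chunks: usize) -> bool {
    total_chunks as i64 > CHUNK_ROWID_MULTIPLIER
}
```

Under these assumptions, a document with 1000 chunks uses indices 0 through 999, and `encode_rowid(7, 999)` decodes back to `(7, 999)`. Index 1000 would produce the same rowid as chunk 0 of document 8, which is the case the guard has to reject.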