8 Commits

Author SHA1 Message Date
Taylor Eernisse
9c04b7fb1b chore(beads): Update issue tracker metadata
Syncs .beads/issues.jsonl and last-touched timestamp with current
project state.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:44 -05:00
Taylor Eernisse
dd2869fd98 test: Remove redundant comments from test files
Applies the same doc comment cleanup to test files:
- Removes test module headers (//! lines)
- Removes obvious test function comments
- Retains comments explaining non-obvious test scenarios

Test names should be descriptive enough to convey intent without
additional comments. Complex test setup or assertions that need
explanation retain their comments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:39 -05:00
Taylor Eernisse
65583ed5d6 refactor: Remove redundant doc comments throughout codebase
Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)

Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs

Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.

Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:32 -05:00
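The "why not what" policy above can be illustrated with a small hypothetical snippet (not from this codebase; the 999-variable figure is the classic SQLite default, cited here only as an example of a recordable constraint):

```rust
// Removed style -- a "what" doc comment duplicating the signature:
//   /// Formats a number with thousands separators.
//   fn format_number(n: i64) -> String { ... }

// Retained style -- a "why" comment recording a non-obvious constraint:
fn batch_size() -> usize {
    // Kept below SQLite's historical default variable limit (999) so a
    // single bulk INSERT never overflows the parameter list.
    900
}

fn main() {
    assert_eq!(batch_size(), 900);
    println!("batch size: {}", batch_size());
}
```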
Taylor Eernisse
976ad92ef0 test(gitlab): Add GitLabIssueRef deserialization tests
Adds test coverage for the new GitLabIssueRef type used by the
MR closes_issues API endpoint:

- deserializes_gitlab_issue_ref: Single object with all fields
- deserializes_gitlab_issue_ref_array: Array of refs (typical API response)

Validates that cross-project references (different project_id values)
deserialize correctly, which is important for cross-project close links.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:47 -05:00
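The tests themselves use serde; as a dependency-free illustration of the cross-project scenario they validate (different `project_id` values per ref), one can scan the field out of two refs by hand. `json_u64` is a hypothetical helper written for this sketch, not part of the test suite:

```rust
// Std-only stand-in for the serde-based deserialization the tests cover:
// pull a numeric field out of a small JSON object by string scanning.
fn json_u64(json: &str, key: &str) -> Option<u64> {
    let pat = format!("\"{key}\":");
    let start = json.find(&pat)? + pat.len();
    let rest = json[start..].trim_start();
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    rest[..end].parse().ok()
}

fn main() {
    // Two refs from different projects, as in the cross-project test case.
    let a = r#"{"id":10,"iid":42,"project_id":7}"#;
    let b = r#"{"id":11,"iid":5,"project_id":9}"#;
    assert_eq!(json_u64(a, "project_id"), Some(7));
    assert_eq!(json_u64(b, "project_id"), Some(9));
    assert_ne!(json_u64(a, "project_id"), json_u64(b, "project_id"));
}
```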
Taylor Eernisse
a76dc8089e feat(orchestrator): Integrate closes_issues fetching and cross-ref extraction
Extends the MR ingestion pipeline to populate the entity_references table
from multiple sources:

1. Resource state events (extract_refs_from_state_events):
   Called after draining the resource_events queue for both issues and MRs.
   Extracts "closes" relationships from the structured API data.

2. System notes (extract_refs_from_system_notes):
   Called during MR ingestion to parse "mentioned in" and "closed by"
   patterns from discussion note bodies.

3. MR closes_issues API (new):
   - enqueue_mr_closes_issues_jobs(): Queues jobs for all MRs
   - drain_mr_closes_issues(): Fetches closes_issues for each MR
   - Records cross-references with source_method='closes_issues_api'

New progress events:
- ClosesIssuesFetchStarted { total }
- ClosesIssueFetched { current, total }
- ClosesIssuesFetchComplete { fetched, failed }

New result fields on IngestMrProjectResult:
- closes_issues_fetched: Count of successful fetches
- closes_issues_failed: Count of failed fetches

The pipeline now comprehensively builds the relationship graph between
issues and MRs, enabling queries like "what will close this issue?"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:40 -05:00
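The progress events named above can be sketched as an enum plus a rendering function. The variant names and fields mirror the commit message; the rendering is a hypothetical example of how a CLI consumer might display them, not the actual implementation:

```rust
// Mirrors the progress events listed in the commit message.
enum ProgressEvent {
    ClosesIssuesFetchStarted { total: usize },
    ClosesIssueFetched { current: usize, total: usize },
    ClosesIssuesFetchComplete { fetched: usize, failed: usize },
}

// Hypothetical display logic for a progress spinner or log line.
fn render(ev: &ProgressEvent) -> String {
    match ev {
        ProgressEvent::ClosesIssuesFetchStarted { total } => {
            format!("fetching closes_issues for {total} MRs")
        }
        ProgressEvent::ClosesIssueFetched { current, total } => {
            format!("closes_issues {current}/{total}")
        }
        ProgressEvent::ClosesIssuesFetchComplete { fetched, failed } => {
            format!("closes_issues done: {fetched} fetched, {failed} failed")
        }
    }
}

fn main() {
    let ev = ProgressEvent::ClosesIssueFetched { current: 3, total: 10 };
    assert_eq!(render(&ev), "closes_issues 3/10");
    println!("{}", render(&ev));
}
```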
Taylor Eernisse
26cf13248d feat(gitlab): Add MR closes_issues API endpoint and GitLabIssueRef type
Extends the GitLab client to fetch the list of issues that an MR will close
when merged, using the /projects/:id/merge_requests/:iid/closes_issues endpoint.

New type:
- GitLabIssueRef: Lightweight issue reference with id, iid, project_id, title,
  state, and web_url. Used for the closes_issues response which returns a list
  of issue summaries rather than full GitLabIssue objects.

New client method:
- fetch_mr_closes_issues(gitlab_project_id, iid): Returns Vec<GitLabIssueRef>
  for all issues that the MR's description/commits indicate will be closed.

This enables building the entity_references table from API data in addition to
parsing system notes, providing more reliable cross-reference discovery.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:30 -05:00
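The request path for the endpoint above can be sketched as plain string formatting (`closes_issues_path` is a hypothetical helper; the real client presumably also handles the base URL, auth header, and pagination):

```rust
// Builds the GitLab API path documented in the commit message:
// /projects/:id/merge_requests/:iid/closes_issues
fn closes_issues_path(project_id: u64, mr_iid: u64) -> String {
    format!("/projects/{project_id}/merge_requests/{mr_iid}/closes_issues")
}

fn main() {
    assert_eq!(
        closes_issues_path(42, 7),
        "/projects/42/merge_requests/7/closes_issues"
    );
    println!("{}", closes_issues_path(42, 7));
}
```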
Taylor Eernisse
a2e26454dc build: Add regex dependency for cross-reference parsing
The note_parser module requires regex for extracting "mentioned in" and
"closed by" patterns from GitLab system notes. The regex crate provides:

- LazyLock-compatible lazy compilation (Regex::new at first use)
- Named capture groups for clean field extraction
- Efficient iteration over all matches via captures_iter()

Version 1.x is the current stable release with good compile times.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:21 -05:00
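The real module uses the regex crate with named capture groups; as a std-only sketch of the same shape (the `LazyLock` lazy-init idiom plus one "mentioned in" pattern, with hypothetical names and no external dependency):

```rust
use std::sync::LazyLock;

// Stand-in for a lazily compiled regex: the prefix is initialized once,
// on first use, just as Regex::new would be inside a LazyLock.
static MENTIONED_MR_PREFIX: LazyLock<&str> =
    LazyLock::new(|| "mentioned in merge request !");

// Extracts the MR iid from a "mentioned in merge request !N" system note.
fn mentioned_mr_iid(note: &str) -> Option<u64> {
    let rest = note.strip_prefix(*MENTIONED_MR_PREFIX)?;
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

fn main() {
    assert_eq!(mentioned_mr_iid("mentioned in merge request !123"), Some(123));
    assert_eq!(mentioned_mr_iid("approved this merge request"), None);
    println!("{:?}", mentioned_mr_iid("mentioned in merge request !123"));
}
```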
Taylor Eernisse
f748570d4d feat(core): Add cross-reference extraction infrastructure
Introduces two new modules for extracting and storing entity cross-references
from GitLab data:

note_parser.rs:
- Parses system notes for "mentioned in" and "closed by" patterns
- Extracts cross-project references (group/project#42, group/project!123)
- Uses lazy-compiled regexes for performance
- Handles both issue (#) and MR (!) sigils
- Provides extract_refs_from_system_notes() for batch processing

references.rs:
- Extracts refs from resource_state_events table (API-sourced closes links)
- Provides insert_entity_reference() for storing discovered references
- Includes resolution helpers: resolve_issue_local_id, resolve_mr_local_id,
  resolve_project_path for converting iids to internal IDs
- Enables cross-project reference resolution

These modules power the entity_references table, enabling features like
"find all MRs that close this issue" and "find all issues mentioned in this MR".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:03:13 -05:00
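The sigil handling described above can be sketched as follows: split an optional project path from an issue (`#`) or MR (`!`) reference token. The struct and function names here are illustrative, not the module's actual API:

```rust
// Parsed form of tokens like "group/project#42", "#42", or "!123".
#[derive(Debug, PartialEq)]
struct CrossRef {
    project_path: Option<String>, // None = same-project reference
    is_mr: bool,                  // '!' sigil for MRs, '#' for issues
    iid: u64,
}

fn parse_ref(token: &str) -> Option<CrossRef> {
    let sigil_pos = token.find(|c: char| c == '#' || c == '!')?;
    let (path, rest) = token.split_at(sigil_pos);
    let is_mr = rest.starts_with('!');
    let iid: u64 = rest[1..].parse().ok()?;
    Some(CrossRef {
        project_path: (!path.is_empty()).then(|| path.to_string()),
        is_mr,
        iid,
    })
}

fn main() {
    let r = parse_ref("group/project#42").unwrap();
    assert_eq!(r.project_path.as_deref(), Some("group/project"));
    assert!(!r.is_mr);
    assert_eq!(r.iid, 42);
    assert_eq!(parse_ref("!123").unwrap().iid, 123);
    println!("{r:?}");
}
```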
78 changed files with 1623 additions and 2171 deletions

File diff suppressed because one or more lines are too long


@@ -1 +1 @@
-bd-1ht
+bd-3ia

Cargo.lock generated

@@ -1122,6 +1122,7 @@ dependencies = [
 "libc",
 "open",
 "rand",
+"regex",
 "reqwest",
 "rusqlite",
 "serde",


@@ -46,6 +46,7 @@ sha2 = "0.10"
 flate2 = "1"
 chrono = { version = "0.4", features = ["serde"] }
 uuid = { version = "1", features = ["v4"] }
+regex = "1"

 [target.'cfg(unix)'.dependencies]
 libc = "0.2"


@@ -1,22 +1,16 @@
-//! Auth test command - verify GitLab authentication.
-
 use crate::core::config::Config;
 use crate::core::error::{LoreError, Result};
 use crate::gitlab::GitLabClient;

-/// Result of successful auth test.
 pub struct AuthTestResult {
     pub username: String,
     pub name: String,
     pub base_url: String,
 }

-/// Run the auth-test command.
 pub async fn run_auth_test(config_path: Option<&str>) -> Result<AuthTestResult> {
-    // 1. Load config
     let config = Config::load(config_path)?;

-    // 2. Get token from environment
     let token = std::env::var(&config.gitlab.token_env_var)
         .map(|t| t.trim().to_string())
         .map_err(|_| LoreError::TokenNotSet {
@@ -29,10 +23,8 @@ pub async fn run_auth_test(config_path: Option<&str>) -> Result<AuthTestResult>
         });
     }

-    // 3. Create client and test auth
     let client = GitLabClient::new(&config.gitlab.base_url, &token, None);

-    // 4. Get current user
     let user = client.get_current_user().await?;

     Ok(AuthTestResult {


@@ -1,5 +1,3 @@
-//! Count command - display entity counts from local database.
-
 use console::style;
 use rusqlite::Connection;
 use serde::Serialize;
@@ -10,23 +8,20 @@ use crate::core::error::Result;
 use crate::core::events_db::{self, EventCounts};
 use crate::core::paths::get_db_path;

-/// Result of count query.
 pub struct CountResult {
     pub entity: String,
     pub count: i64,
-    pub system_count: Option<i64>, // For notes only
-    pub state_breakdown: Option<StateBreakdown>, // For issues/MRs
+    pub system_count: Option<i64>,
+    pub state_breakdown: Option<StateBreakdown>,
 }

-/// State breakdown for issues or MRs.
 pub struct StateBreakdown {
     pub opened: i64,
     pub closed: i64,
-    pub merged: Option<i64>, // MRs only
-    pub locked: Option<i64>, // MRs only
+    pub merged: Option<i64>,
+    pub locked: Option<i64>,
 }

-/// Run the count command.
 pub fn run_count(config: &Config, entity: &str, type_filter: Option<&str>) -> Result<CountResult> {
     let db_path = get_db_path(config.storage.db_path.as_deref());
     let conn = create_connection(&db_path)?;
@@ -45,7 +40,6 @@ pub fn run_count(config: &Config, entity: &str, type_filter: Option<&str>) -> Re
     }
 }

-/// Count issues with state breakdown.
 fn count_issues(conn: &Connection) -> Result<CountResult> {
     let count: i64 = conn.query_row("SELECT COUNT(*) FROM issues", [], |row| row.get(0))?;
@@ -74,7 +68,6 @@ fn count_issues(conn: &Connection) -> Result<CountResult> {
     })
 }

-/// Count merge requests with state breakdown.
 fn count_mrs(conn: &Connection) -> Result<CountResult> {
     let count: i64 = conn.query_row("SELECT COUNT(*) FROM merge_requests", [], |row| row.get(0))?;
@@ -115,7 +108,6 @@ fn count_mrs(conn: &Connection) -> Result<CountResult> {
     })
 }

-/// Count discussions with optional noteable type filter.
 fn count_discussions(conn: &Connection, type_filter: Option<&str>) -> Result<CountResult> {
     let (count, entity_name) = match type_filter {
         Some("issue") => {
@@ -149,7 +141,6 @@ fn count_discussions(conn: &Connection, type_filter: Option<&str>) -> Result<Cou
     })
 }

-/// Count notes with optional noteable type filter.
 fn count_notes(conn: &Connection, type_filter: Option<&str>) -> Result<CountResult> {
     let (total, system_count, entity_name) = match type_filter {
         Some("issue") => {
@@ -184,7 +175,6 @@ fn count_notes(conn: &Connection, type_filter: Option<&str>) -> Result<CountResu
         }
     };

-    // Non-system notes count
     let non_system = total - system_count;

     Ok(CountResult {
@@ -195,7 +185,6 @@ fn count_notes(conn: &Connection, type_filter: Option<&str>) -> Result<CountResu
     })
 }

-/// Format number with thousands separators.
 fn format_number(n: i64) -> String {
     let s = n.to_string();
     let chars: Vec<char> = s.chars().collect();
@@ -211,7 +200,6 @@ fn format_number(n: i64) -> String {
     result
 }

-/// JSON output structure for count command.
 #[derive(Serialize)]
 struct CountJsonOutput {
     ok: bool,
@@ -238,14 +226,12 @@ struct CountJsonBreakdown {
     locked: Option<i64>,
 }

-/// Run the event count query.
 pub fn run_count_events(config: &Config) -> Result<EventCounts> {
     let db_path = get_db_path(config.storage.db_path.as_deref());
     let conn = create_connection(&db_path)?;
     events_db::count_events(&conn)
 }

-/// JSON output structure for event counts.
 #[derive(Serialize)]
 struct EventCountJsonOutput {
     ok: bool,
@@ -267,7 +253,6 @@ struct EventTypeCounts {
     total: usize,
 }

-/// Print event counts as JSON (robot mode).
 pub fn print_event_count_json(counts: &EventCounts) {
     let output = EventCountJsonOutput {
         ok: true,
@@ -294,7 +279,6 @@ pub fn print_event_count_json(counts: &EventCounts) {
     println!("{}", serde_json::to_string(&output).unwrap());
 }

-/// Print event counts (human-readable).
 pub fn print_event_count(counts: &EventCounts) {
     println!(
         "{:<20} {:>8} {:>8} {:>8}",
@@ -341,7 +325,6 @@ pub fn print_event_count(counts: &EventCounts) {
     );
 }

-/// Print count result as JSON (robot mode).
 pub fn print_count_json(result: &CountResult) {
     let breakdown = result.state_breakdown.as_ref().map(|b| CountJsonBreakdown {
         opened: b.opened,
@@ -363,7 +346,6 @@ pub fn print_count_json(result: &CountResult) {
     println!("{}", serde_json::to_string(&output).unwrap());
 }

-/// Print count result.
 pub fn print_count(result: &CountResult) {
     let count_str = format_number(result.count);
@@ -386,7 +368,6 @@ pub fn print_count(result: &CountResult) {
     );
 }

-    // Print state breakdown if available
     if let Some(breakdown) = &result.state_breakdown {
         println!("    opened: {}", format_number(breakdown.opened));
         if let Some(merged) = breakdown.merged {


@@ -1,5 +1,3 @@
-//! Doctor command - check environment health.
-
 use console::style;
 use serde::Serialize;
@@ -100,30 +98,22 @@ pub struct LoggingCheck {
     pub total_bytes: Option<u64>,
 }

-/// Run the doctor command.
 pub async fn run_doctor(config_path: Option<&str>) -> DoctorResult {
     let config_path_buf = get_config_path(config_path);
     let config_path_str = config_path_buf.display().to_string();

-    // Check config
     let (config_check, config) = check_config(&config_path_str);

-    // Check database
     let database_check = check_database(config.as_ref());

-    // Check GitLab
     let gitlab_check = check_gitlab(config.as_ref()).await;

-    // Check projects
     let projects_check = check_projects(config.as_ref());

-    // Check Ollama
     let ollama_check = check_ollama(config.as_ref()).await;

-    // Check logging
     let logging_check = check_logging(config.as_ref());

-    // Success if all required checks pass (ollama and logging are optional)
     let success = config_check.result.status == CheckStatus::Ok
         && database_check.result.status == CheckStatus::Ok
         && gitlab_check.result.status == CheckStatus::Ok
@@ -393,7 +383,6 @@ async fn check_ollama(config: Option<&Config>) -> OllamaCheck {
     let base_url = &config.embedding.base_url;
     let model = &config.embedding.model;

-    // Short timeout for Ollama check
     let client = reqwest::Client::builder()
         .timeout(std::time::Duration::from_secs(2))
         .build()
@@ -418,9 +407,6 @@ async fn check_ollama(config: Option<&Config>) -> OllamaCheck {
         .map(|m| m.name.split(':').next().unwrap_or(&m.name))
         .collect();

-    // Strip tag from configured model name too (e.g.
-    // "nomic-embed-text:v1.5" → "nomic-embed-text") so both
-    // sides are compared at the same granularity.
     let model_base = model.split(':').next().unwrap_or(model);
     if !model_names.contains(&model_base) {
         return OllamaCheck {
@@ -531,7 +517,6 @@ fn check_logging(config: Option<&Config>) -> LoggingCheck {
     }
 }

-/// Format and print doctor results to console.
 pub fn print_doctor_results(result: &DoctorResult) {
     println!("\nlore doctor\n");


@@ -1,5 +1,3 @@
-//! Embed command: generate vector embeddings for documents via Ollama.
-
 use console::style;
 use serde::Serialize;
@@ -10,7 +8,6 @@ use crate::core::paths::get_db_path;
 use crate::embedding::ollama::{OllamaClient, OllamaConfig};
 use crate::embedding::pipeline::embed_documents;

-/// Result of the embed command.
 #[derive(Debug, Default, Serialize)]
 pub struct EmbedCommandResult {
     pub embedded: usize,
@@ -18,9 +15,6 @@ pub struct EmbedCommandResult {
     pub skipped: usize,
 }

-/// Run the embed command.
-///
-/// `progress_callback` reports `(processed, total)` as documents are embedded.
 pub async fn run_embed(
     config: &Config,
     full: bool,
@@ -30,7 +24,6 @@ pub async fn run_embed(
     let db_path = get_db_path(config.storage.db_path.as_deref());
     let conn = create_connection(&db_path)?;

-    // Build Ollama config from user settings
     let ollama_config = OllamaConfig {
         base_url: config.embedding.base_url.clone(),
         model: config.embedding.model.clone(),
@@ -38,13 +31,9 @@ pub async fn run_embed(
     };
     let client = OllamaClient::new(ollama_config);

-    // Health check — fail fast if Ollama is down or model missing
     client.health_check().await?;

     if full {
-        // Clear ALL embeddings and metadata atomically for a complete re-embed.
-        // Wrapped in a transaction so a crash between the two DELETEs can't
-        // leave orphaned data.
         conn.execute_batch(
             "BEGIN;
             DELETE FROM embedding_metadata;
@@ -52,7 +41,6 @@ pub async fn run_embed(
             COMMIT;",
         )?;
     } else if retry_failed {
-        // Clear errors so they become pending again
         conn.execute(
             "UPDATE embedding_metadata SET last_error = NULL, attempt_count = 0
             WHERE last_error IS NOT NULL",
@@ -70,7 +58,6 @@ pub async fn run_embed(
     })
 }

-/// Print human-readable output.
 pub fn print_embed(result: &EmbedCommandResult) {
     println!("{} Embedding complete", style("done").green().bold(),);
     println!("  Embedded: {}", result.embedded);
@@ -82,14 +69,12 @@ pub fn print_embed(result: &EmbedCommandResult) {
     }
 }

-/// JSON output.
 #[derive(Serialize)]
 struct EmbedJsonOutput<'a> {
     ok: bool,
     data: &'a EmbedCommandResult,
 }

-/// Print JSON robot-mode output.
 pub fn print_embed_json(result: &EmbedCommandResult) {
     let output = EmbedJsonOutput {
         ok: true,


@@ -1,5 +1,3 @@
-//! Generate searchable documents from ingested GitLab data.
-
 use console::style;
 use rusqlite::Connection;
 use serde::Serialize;
@@ -14,7 +12,6 @@ use crate::documents::{SourceType, regenerate_dirty_documents};
 const FULL_MODE_CHUNK_SIZE: i64 = 2000;

-/// Result of a generate-docs run.
 #[derive(Debug, Default)]
 pub struct GenerateDocsResult {
     pub regenerated: usize,
@@ -24,12 +21,6 @@ pub struct GenerateDocsResult {
     pub full_mode: bool,
 }

-/// Run the generate-docs pipeline.
-///
-/// Default mode: process only existing dirty_sources entries.
-/// Full mode: seed dirty_sources with ALL entities, then drain.
-///
-/// `progress_callback` reports `(processed, estimated_total)` as documents are generated.
 pub fn run_generate_docs(
     config: &Config,
     full: bool,
@@ -56,7 +47,6 @@ pub fn run_generate_docs(
     result.errored = regen.errored;

     if full {
-        // Optimize FTS index after bulk rebuild
         let _ = conn.execute(
             "INSERT INTO documents_fts(documents_fts) VALUES('optimize')",
             [],
@@ -67,7 +57,6 @@ pub fn run_generate_docs(
     Ok(result)
 }

-/// Seed dirty_sources with all entities of the given type using keyset pagination.
 fn seed_dirty(
     conn: &Connection,
     source_type: SourceType,
@@ -113,7 +102,6 @@ fn seed_dirty(
         break;
     }

-    // Advance keyset cursor to the max id within the chunk window
     let max_id: i64 = conn.query_row(
         &format!(
             "SELECT MAX(id) FROM (SELECT id FROM {table} WHERE id > ?1 ORDER BY id LIMIT ?2)",
@@ -136,7 +124,6 @@ fn seed_dirty(
     Ok(total_seeded)
 }

-/// Print human-readable output.
 pub fn print_generate_docs(result: &GenerateDocsResult) {
     let mode = if result.full_mode {
         "full"
@@ -159,7 +146,6 @@ pub fn print_generate_docs(result: &GenerateDocsResult) {
     }
 }

-/// JSON output structures.
 #[derive(Serialize)]
 struct GenerateDocsJsonOutput {
     ok: bool,
@@ -176,7 +162,6 @@ struct GenerateDocsJsonData {
     errored: usize,
 }

-/// Print JSON robot-mode output.
 pub fn print_generate_docs_json(result: &GenerateDocsResult) {
     let output = GenerateDocsJsonOutput {
         ok: true,


@@ -1,5 +1,3 @@
//! Ingest command - fetch data from GitLab.
use std::sync::Arc; use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering}; use std::sync::atomic::{AtomicUsize, Ordering};
@@ -22,17 +20,14 @@ use crate::ingestion::{
ingest_project_merge_requests_with_progress, ingest_project_merge_requests_with_progress,
}; };
/// Result of ingest command for display.
#[derive(Default)] #[derive(Default)]
pub struct IngestResult { pub struct IngestResult {
pub resource_type: String, pub resource_type: String,
pub projects_synced: usize, pub projects_synced: usize,
// Issue-specific fields
pub issues_fetched: usize, pub issues_fetched: usize,
pub issues_upserted: usize, pub issues_upserted: usize,
pub issues_synced_discussions: usize, pub issues_synced_discussions: usize,
pub issues_skipped_discussion_sync: usize, pub issues_skipped_discussion_sync: usize,
// MR-specific fields
pub mrs_fetched: usize, pub mrs_fetched: usize,
pub mrs_upserted: usize, pub mrs_upserted: usize,
pub mrs_synced_discussions: usize, pub mrs_synced_discussions: usize,
@@ -40,17 +35,13 @@ pub struct IngestResult {
pub assignees_linked: usize, pub assignees_linked: usize,
pub reviewers_linked: usize, pub reviewers_linked: usize,
pub diffnotes_count: usize, pub diffnotes_count: usize,
// Shared fields
pub labels_created: usize, pub labels_created: usize,
pub discussions_fetched: usize, pub discussions_fetched: usize,
pub notes_upserted: usize, pub notes_upserted: usize,
// Resource events
pub resource_events_fetched: usize, pub resource_events_fetched: usize,
pub resource_events_failed: usize, pub resource_events_failed: usize,
} }
/// Outcome of ingesting a single project, used to aggregate results
/// from concurrent project processing.
enum ProjectIngestOutcome { enum ProjectIngestOutcome {
Issues { Issues {
path: String, path: String,
@@ -62,24 +53,14 @@ enum ProjectIngestOutcome {
}, },
} }
/// Controls what interactive UI elements `run_ingest` displays.
///
/// Separates progress indicators (spinners, bars) from text output (headers,
/// per-project summaries) so callers like `sync` can show progress without
/// duplicating summary text.
#[derive(Debug, Clone, Copy)] #[derive(Debug, Clone, Copy)]
pub struct IngestDisplay { pub struct IngestDisplay {
/// Show animated spinners and progress bars.
pub show_progress: bool, pub show_progress: bool,
/// Show the per-project spinner. When called from `sync`, the stage
/// spinner already covers this, so a second spinner causes flashing.
pub show_spinner: bool, pub show_spinner: bool,
/// Show text headers ("Ingesting...") and per-project summary lines.
pub show_text: bool, pub show_text: bool,
} }
impl IngestDisplay { impl IngestDisplay {
/// Interactive mode: everything visible.
pub fn interactive() -> Self { pub fn interactive() -> Self {
Self { Self {
show_progress: true, show_progress: true,
@@ -88,7 +69,6 @@ impl IngestDisplay {
} }
} }
/// Robot/JSON mode: everything hidden.
pub fn silent() -> Self { pub fn silent() -> Self {
Self { Self {
show_progress: false, show_progress: false,
@@ -97,8 +77,6 @@ impl IngestDisplay {
} }
} }
/// Progress bars only, no spinner or text (used by sync which provides its
/// own stage spinner).
pub fn progress_only() -> Self { pub fn progress_only() -> Self {
Self { Self {
show_progress: true, show_progress: true,
@@ -108,10 +86,6 @@ impl IngestDisplay {
} }
} }
/// Run the ingest command.
///
/// `stage_bar` is an optional `ProgressBar` (typically from sync's stage spinner)
/// that will be updated with aggregate progress across all projects.
pub async fn run_ingest( pub async fn run_ingest(
config: &Config, config: &Config,
resource_type: &str, resource_type: &str,
@@ -138,7 +112,6 @@ pub async fn run_ingest(
.await .await
} }
/// Inner implementation of run_ingest, instrumented with a root span.
async fn run_ingest_inner( async fn run_ingest_inner(
config: &Config, config: &Config,
resource_type: &str, resource_type: &str,
@@ -148,7 +121,6 @@ async fn run_ingest_inner(
display: IngestDisplay, display: IngestDisplay,
stage_bar: Option<ProgressBar>, stage_bar: Option<ProgressBar>,
) -> Result<IngestResult> { ) -> Result<IngestResult> {
// Validate resource type early
if resource_type != "issues" && resource_type != "mrs" { if resource_type != "issues" && resource_type != "mrs" {
return Err(LoreError::Other(format!( return Err(LoreError::Other(format!(
"Invalid resource type '{}'. Valid types: issues, mrs", "Invalid resource type '{}'. Valid types: issues, mrs",
@@ -156,11 +128,9 @@ async fn run_ingest_inner(
))); )));
} }
// Get database path and create connection
let db_path = get_db_path(config.storage.db_path.as_deref()); let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?; let conn = create_connection(&db_path)?;
// Acquire single-flight lock
let lock_conn = create_connection(&db_path)?; let lock_conn = create_connection(&db_path)?;
let mut lock = AppLock::new( let mut lock = AppLock::new(
lock_conn, lock_conn,
@@ -172,23 +142,19 @@ async fn run_ingest_inner(
); );
lock.acquire(force)?; lock.acquire(force)?;
// Get token from environment
let token = let token =
std::env::var(&config.gitlab.token_env_var).map_err(|_| LoreError::TokenNotSet { std::env::var(&config.gitlab.token_env_var).map_err(|_| LoreError::TokenNotSet {
env_var: config.gitlab.token_env_var.clone(), env_var: config.gitlab.token_env_var.clone(),
})?; })?;
// Create GitLab client
let client = GitLabClient::new( let client = GitLabClient::new(
&config.gitlab.base_url, &config.gitlab.base_url,
&token, &token,
Some(config.sync.requests_per_second), Some(config.sync.requests_per_second),
); );
// Get projects to sync
let projects = get_projects_to_sync(&conn, &config.projects, project_filter)?; let projects = get_projects_to_sync(&conn, &config.projects, project_filter)?;
// If --full flag is set, reset sync cursors and discussion watermarks for a complete re-fetch
if full { if full {
if display.show_text { if display.show_text {
println!( println!(
@@ -198,20 +164,17 @@ async fn run_ingest_inner(
         }
         for (local_project_id, _, path) in &projects {
             if resource_type == "issues" {
-                // Reset issue discussion and resource event watermarks so everything gets re-synced
                 conn.execute(
                     "UPDATE issues SET discussions_synced_for_updated_at = NULL, resource_events_synced_for_updated_at = NULL WHERE project_id = ?",
                     [*local_project_id],
                 )?;
             } else if resource_type == "mrs" {
-                // Reset MR discussion and resource event watermarks
                 conn.execute(
                     "UPDATE merge_requests SET discussions_synced_for_updated_at = NULL, resource_events_synced_for_updated_at = NULL WHERE project_id = ?",
                     [*local_project_id],
                 )?;
             }
-            // Then reset sync cursor
             conn.execute(
                 "DELETE FROM sync_cursors WHERE project_id = ? AND resource_type = ?",
                 (*local_project_id, resource_type),
@@ -248,12 +211,9 @@ async fn run_ingest_inner(
         println!();
     }
-    // Process projects concurrently. Each project gets its own DB connection
-    // while sharing the rate limiter through the cloned GitLabClient.
     let concurrency = config.sync.primary_concurrency as usize;
     let resource_type_owned = resource_type.to_string();
-    // Aggregate counters for stage_bar updates (shared across concurrent projects)
     let agg_fetched = Arc::new(AtomicUsize::new(0));
     let agg_discussions = Arc::new(AtomicUsize::new(0));
     let agg_disc_total = Arc::new(AtomicUsize::new(0));
@@ -328,7 +288,6 @@ async fn run_ingest_inner(
     } else {
         Box::new(move |event: ProgressEvent| match event {
             ProgressEvent::IssuesFetchStarted | ProgressEvent::MrsFetchStarted => {
-                // Spinner already showing fetch message
             }
             ProgressEvent::IssuesFetchComplete { total } | ProgressEvent::MrsFetchComplete { total } => {
                 let agg = agg_fetched_clone.fetch_add(total, Ordering::Relaxed) + total;
@@ -410,6 +369,20 @@ async fn run_ingest_inner(
             ProgressEvent::ResourceEventsFetchComplete { .. } => {
                 disc_bar_clone.finish_and_clear();
             }
+            ProgressEvent::ClosesIssuesFetchStarted { total } => {
+                disc_bar_clone.reset();
+                disc_bar_clone.set_length(total as u64);
+                disc_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
+                stage_bar_clone.set_message(
+                    "Fetching closes-issues references...".to_string()
+                );
+            }
+            ProgressEvent::ClosesIssueFetched { current, total: _ } => {
+                disc_bar_clone.set_position(current as u64);
+            }
+            ProgressEvent::ClosesIssuesFetchComplete { .. } => {
+                disc_bar_clone.finish_and_clear();
+            }
         })
     };
@@ -453,9 +426,6 @@ async fn run_ingest_inner(
         .collect()
         .await;
-    // Aggregate results and print per-project summaries.
-    // Process all successes first, then return the first error (if any)
-    // so that successful project summaries are always printed.
     let mut first_error: Option<LoreError> = None;
     for project_result in project_results {
         match project_result {
@@ -510,21 +480,17 @@ async fn run_ingest_inner(
         return Err(e);
     }
-    // Lock is released on drop
     Ok(total)
 }
-/// Get projects to sync from database, optionally filtered.
 fn get_projects_to_sync(
     conn: &Connection,
     configured_projects: &[crate::core::config::ProjectConfig],
     filter: Option<&str>,
 ) -> Result<Vec<(i64, i64, String)>> {
-    // If a filter is provided, resolve it to a specific project
     if let Some(filter_str) = filter {
         let project_id = resolve_project(conn, filter_str)?;
-        // Verify the resolved project is in our config
         let row: Option<(i64, String)> = conn
             .query_row(
                 "SELECT gitlab_project_id, path_with_namespace FROM projects WHERE id = ?1",
@@ -534,7 +500,6 @@ fn get_projects_to_sync(
             .ok();
         if let Some((gitlab_id, path)) = row {
-            // Confirm it's a configured project
             if configured_projects.iter().any(|p| p.path == path) {
                 return Ok(vec![(project_id, gitlab_id, path)]);
             }
@@ -550,7 +515,6 @@ fn get_projects_to_sync(
         )));
     }
-    // No filter: return all configured projects
     let mut projects = Vec::new();
     for project_config in configured_projects {
         let result: Option<(i64, i64)> = conn
@@ -569,7 +533,6 @@ fn get_projects_to_sync(
     Ok(projects)
 }
-/// Print summary for a single project (issues).
 fn print_issue_project_summary(path: &str, result: &IngestProjectResult) {
     let labels_str = if result.labels_created > 0 {
         format!(", {} new labels", result.labels_created)
@@ -599,7 +562,6 @@ fn print_issue_project_summary(path: &str, result: &IngestProjectResult) {
     }
 }
-/// Print summary for a single project (merge requests).
 fn print_mr_project_summary(path: &str, result: &IngestMrProjectResult) {
     let labels_str = if result.labels_created > 0 {
         format!(", {} new labels", result.labels_created)
@@ -647,7 +609,6 @@ fn print_mr_project_summary(path: &str, result: &IngestMrProjectResult) {
     }
 }
-/// JSON output structures for robot mode.
 #[derive(Serialize)]
 struct IngestJsonOutput {
     ok: bool,
@@ -688,7 +649,6 @@ struct IngestMrStats {
     diffnotes_count: usize,
 }
-/// Print final summary as JSON (robot mode).
 pub fn print_ingest_summary_json(result: &IngestResult) {
     let (issues, merge_requests) = if result.resource_type == "issues" {
         (
@@ -733,7 +693,6 @@ pub fn print_ingest_summary_json(result: &IngestResult) {
     println!("{}", serde_json::to_string(&output).unwrap());
 }
-/// Print final summary.
 pub fn print_ingest_summary(result: &IngestResult) {
     println!();

View File

@@ -1,5 +1,3 @@
-//! Init command - initialize configuration and database.
 use std::fs;
 use crate::core::config::{MinimalConfig, MinimalGitLabConfig, ProjectConfig};
@@ -8,21 +6,18 @@ use crate::core::error::{LoreError, Result};
 use crate::core::paths::{get_config_path, get_data_dir};
 use crate::gitlab::{GitLabClient, GitLabProject};
-/// Input data for init command.
 pub struct InitInputs {
     pub gitlab_url: String,
     pub token_env_var: String,
     pub project_paths: Vec<String>,
 }
-/// Options for init command.
 pub struct InitOptions {
     pub config_path: Option<String>,
     pub force: bool,
     pub non_interactive: bool,
 }
-/// Result of successful init.
 pub struct InitResult {
     pub config_path: String,
     pub data_dir: String,
@@ -40,12 +35,10 @@ pub struct ProjectInfo {
     pub name: String,
 }
-/// Run the init command programmatically.
 pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitResult> {
     let config_path = get_config_path(options.config_path.as_deref());
     let data_dir = get_data_dir();
-    // 1. Check if config exists (force takes precedence over non_interactive)
     if config_path.exists() && !options.force {
         if options.non_interactive {
             return Err(LoreError::Other(format!(
@@ -59,7 +52,6 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
         ));
     }
-    // 2. Validate GitLab URL format
     if url::Url::parse(&inputs.gitlab_url).is_err() {
         return Err(LoreError::Other(format!(
             "Invalid GitLab URL: {}",
@@ -67,12 +59,10 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
         )));
     }
-    // 3. Check token is set in environment
     let token = std::env::var(&inputs.token_env_var).map_err(|_| LoreError::TokenNotSet {
         env_var: inputs.token_env_var.clone(),
     })?;
-    // 4. Create GitLab client and test authentication
     let client = GitLabClient::new(&inputs.gitlab_url, &token, None);
     let gitlab_user = client.get_current_user().await.map_err(|e| {
@@ -88,7 +78,6 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
         name: gitlab_user.name,
     };
-    // 5. Validate each project path
     let mut validated_projects: Vec<(ProjectInfo, GitLabProject)> = Vec::new();
     for project_path in &inputs.project_paths {
@@ -115,14 +104,10 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
         ));
     }
-    // 6. All validations passed - now write config and setup DB
-    // Create config directory if needed
     if let Some(parent) = config_path.parent() {
         fs::create_dir_all(parent)?;
     }
-    // Write minimal config (rely on serde defaults)
     let config = MinimalConfig {
         gitlab: MinimalGitLabConfig {
             base_url: inputs.gitlab_url,
@@ -138,16 +123,13 @@ pub async fn run_init(inputs: InitInputs, options: InitOptions) -> Result<InitRe
     let config_json = serde_json::to_string_pretty(&config)?;
     fs::write(&config_path, format!("{config_json}\n"))?;
-    // 7. Create data directory and initialize database
     fs::create_dir_all(&data_dir)?;
     let db_path = data_dir.join("lore.db");
     let conn = create_connection(&db_path)?;
-    // Run embedded migrations
     run_migrations(&conn)?;
-    // 8. Insert validated projects
     for (_, gitlab_project) in &validated_projects {
         conn.execute(
             "INSERT INTO projects (gitlab_project_id, path_with_namespace, default_branch, web_url)

View File

@@ -1,5 +1,3 @@
-//! List command - display issues/MRs from local database.
 use comfy_table::{Attribute, Cell, Color, ContentArrangement, Table};
 use rusqlite::Connection;
 use serde::Serialize;
@@ -11,7 +9,6 @@ use crate::core::paths::get_db_path;
 use crate::core::project::resolve_project;
 use crate::core::time::{ms_to_iso, now_ms, parse_since};
-/// Apply foreground color to a Cell only if colors are enabled.
 fn colored_cell(content: impl std::fmt::Display, color: Color) -> Cell {
     let cell = Cell::new(content);
     if console::colors_enabled() {
@@ -21,7 +18,6 @@ fn colored_cell(content: impl std::fmt::Display, color: Color) -> Cell {
     }
 }
-/// Issue row for display.
 #[derive(Debug, Serialize)]
 pub struct IssueListRow {
     pub iid: i64,
@@ -39,7 +35,6 @@ pub struct IssueListRow {
     pub unresolved_count: i64,
 }
-/// Serializable version for JSON output.
 #[derive(Serialize)]
 pub struct IssueListRowJson {
     pub iid: i64,
@@ -76,14 +71,12 @@ impl From<&IssueListRow> for IssueListRowJson {
     }
 }
-/// Result of list query.
 #[derive(Serialize)]
 pub struct ListResult {
     pub issues: Vec<IssueListRow>,
     pub total_count: usize,
 }
-/// JSON output structure.
 #[derive(Serialize)]
 pub struct ListResultJson {
     pub issues: Vec<IssueListRowJson>,
@@ -101,7 +94,6 @@ impl From<&ListResult> for ListResultJson {
     }
 }
-/// MR row for display.
 #[derive(Debug, Serialize)]
 pub struct MrListRow {
     pub iid: i64,
@@ -123,7 +115,6 @@ pub struct MrListRow {
     pub unresolved_count: i64,
 }
-/// Serializable version for JSON output.
 #[derive(Serialize)]
 pub struct MrListRowJson {
     pub iid: i64,
@@ -168,14 +159,12 @@ impl From<&MrListRow> for MrListRowJson {
     }
 }
-/// Result of MR list query.
 #[derive(Serialize)]
 pub struct MrListResult {
     pub mrs: Vec<MrListRow>,
     pub total_count: usize,
 }
-/// JSON output structure for MRs.
 #[derive(Serialize)]
 pub struct MrListResultJson {
     pub mrs: Vec<MrListRowJson>,
@@ -193,7 +182,6 @@ impl From<&MrListResult> for MrListResultJson {
     }
 }
-/// Filter options for issue list query.
 pub struct ListFilters<'a> {
     pub limit: usize,
     pub project: Option<&'a str>,
@@ -209,7 +197,6 @@ pub struct ListFilters<'a> {
     pub order: &'a str,
 }
-/// Filter options for MR list query.
 pub struct MrListFilters<'a> {
     pub limit: usize,
     pub project: Option<&'a str>,
@@ -227,7 +214,6 @@ pub struct MrListFilters<'a> {
     pub order: &'a str,
 }
-/// Run the list issues command.
 pub fn run_list_issues(config: &Config, filters: ListFilters) -> Result<ListResult> {
     let db_path = get_db_path(config.storage.db_path.as_deref());
     let conn = create_connection(&db_path)?;
@@ -236,9 +222,7 @@ pub fn run_list_issues(config: &Config, filters: ListFilters) -> Result<ListResu
     Ok(result)
 }
-/// Query issues from database with enriched data.
 fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult> {
-    // Build WHERE clause
     let mut where_clauses = Vec::new();
     let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
@@ -255,14 +239,12 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
         params.push(Box::new(state.to_string()));
     }
-    // Handle author filter (strip leading @ if present)
     if let Some(author) = filters.author {
         let username = author.strip_prefix('@').unwrap_or(author);
         where_clauses.push("i.author_username = ?");
         params.push(Box::new(username.to_string()));
     }
-    // Handle assignee filter (strip leading @ if present)
     if let Some(assignee) = filters.assignee {
         let username = assignee.strip_prefix('@').unwrap_or(assignee);
         where_clauses.push(
@@ -272,7 +254,6 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
         params.push(Box::new(username.to_string()));
     }
-    // Handle since filter
     if let Some(since_str) = filters.since {
         let cutoff_ms = parse_since(since_str).ok_or_else(|| {
             LoreError::Other(format!(
@@ -284,7 +265,6 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
         params.push(Box::new(cutoff_ms));
     }
-    // Handle label filters (AND logic - all labels must be present)
     if let Some(labels) = filters.labels {
         for label in labels {
             where_clauses.push(
@@ -296,19 +276,16 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
         }
     }
-    // Handle milestone filter
     if let Some(milestone) = filters.milestone {
         where_clauses.push("i.milestone_title = ?");
         params.push(Box::new(milestone.to_string()));
     }
-    // Handle due_before filter
     if let Some(due_before) = filters.due_before {
         where_clauses.push("i.due_date IS NOT NULL AND i.due_date <= ?");
         params.push(Box::new(due_before.to_string()));
     }
-    // Handle has_due_date filter
     if filters.has_due_date {
         where_clauses.push("i.due_date IS NOT NULL");
     }
@@ -319,7 +296,6 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
         format!("WHERE {}", where_clauses.join(" AND "))
     };
-    // Get total count
     let count_sql = format!(
         "SELECT COUNT(*) FROM issues i
          JOIN projects p ON i.project_id = p.id
@@ -330,11 +306,10 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
     let total_count: i64 = conn.query_row(&count_sql, param_refs.as_slice(), |row| row.get(0))?;
     let total_count = total_count as usize;
-    // Build ORDER BY
     let sort_column = match filters.sort {
         "created" => "i.created_at",
         "iid" => "i.iid",
-        _ => "i.updated_at", // default
+        _ => "i.updated_at",
     };
     let order = if filters.order == "asc" {
         "ASC"
@@ -342,7 +317,6 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
         "DESC"
     };
-    // Get issues with enriched data
     let query_sql = format!(
         "SELECT
             i.iid,
@@ -416,7 +390,6 @@ fn query_issues(conn: &Connection, filters: &ListFilters) -> Result<ListResult>
     })
 }
-/// Run the list MRs command.
 pub fn run_list_mrs(config: &Config, filters: MrListFilters) -> Result<MrListResult> {
     let db_path = get_db_path(config.storage.db_path.as_deref());
     let conn = create_connection(&db_path)?;
@@ -425,9 +398,7 @@ pub fn run_list_mrs(config: &Config, filters: MrListFilters) -> Result<MrListRes
     Ok(result)
 }
-/// Query MRs from database with enriched data.
 fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult> {
-    // Build WHERE clause
     let mut where_clauses = Vec::new();
     let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
@@ -444,14 +415,12 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
         params.push(Box::new(state.to_string()));
     }
-    // Handle author filter (strip leading @ if present)
     if let Some(author) = filters.author {
         let username = author.strip_prefix('@').unwrap_or(author);
         where_clauses.push("m.author_username = ?");
         params.push(Box::new(username.to_string()));
     }
-    // Handle assignee filter (strip leading @ if present)
     if let Some(assignee) = filters.assignee {
         let username = assignee.strip_prefix('@').unwrap_or(assignee);
         where_clauses.push(
@@ -461,7 +430,6 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
         params.push(Box::new(username.to_string()));
     }
-    // Handle reviewer filter (strip leading @ if present)
     if let Some(reviewer) = filters.reviewer {
         let username = reviewer.strip_prefix('@').unwrap_or(reviewer);
         where_clauses.push(
@@ -471,7 +439,6 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
         params.push(Box::new(username.to_string()));
     }
-    // Handle since filter
     if let Some(since_str) = filters.since {
         let cutoff_ms = parse_since(since_str).ok_or_else(|| {
             LoreError::Other(format!(
@@ -483,7 +450,6 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
         params.push(Box::new(cutoff_ms));
     }
-    // Handle label filters (AND logic - all labels must be present)
     if let Some(labels) = filters.labels {
         for label in labels {
             where_clauses.push(
@@ -495,20 +461,17 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
         }
     }
-    // Handle draft filter
     if filters.draft {
         where_clauses.push("m.draft = 1");
     } else if filters.no_draft {
         where_clauses.push("m.draft = 0");
     }
-    // Handle target branch filter
     if let Some(target_branch) = filters.target_branch {
         where_clauses.push("m.target_branch = ?");
         params.push(Box::new(target_branch.to_string()));
     }
-    // Handle source branch filter
     if let Some(source_branch) = filters.source_branch {
         where_clauses.push("m.source_branch = ?");
         params.push(Box::new(source_branch.to_string()));
@@ -520,7 +483,6 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
         format!("WHERE {}", where_clauses.join(" AND "))
     };
-    // Get total count
     let count_sql = format!(
         "SELECT COUNT(*) FROM merge_requests m
          JOIN projects p ON m.project_id = p.id
@@ -531,11 +493,10 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
     let total_count: i64 = conn.query_row(&count_sql, param_refs.as_slice(), |row| row.get(0))?;
     let total_count = total_count as usize;
-    // Build ORDER BY
     let sort_column = match filters.sort {
         "created" => "m.created_at",
         "iid" => "m.iid",
-        _ => "m.updated_at", // default
+        _ => "m.updated_at",
     };
     let order = if filters.order == "asc" {
         "ASC"
@@ -543,7 +504,6 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
         "DESC"
     };
-    // Get MRs with enriched data
     let query_sql = format!(
         "SELECT
             m.iid,
@@ -631,7 +591,6 @@ fn query_mrs(conn: &Connection, filters: &MrListFilters) -> Result<MrListResult>
     Ok(MrListResult { mrs, total_count })
 }
-/// Format relative time from ms epoch.
 fn format_relative_time(ms_epoch: i64) -> String {
     let now = now_ms();
     let diff = now - ms_epoch;
@@ -662,7 +621,6 @@ fn format_relative_time(ms_epoch: i64) -> String {
     }
 }
-/// Truncate string to max width with ellipsis.
 fn truncate_with_ellipsis(s: &str, max_width: usize) -> String {
     if s.chars().count() <= max_width {
         s.to_string()
@@ -672,7 +630,6 @@ fn truncate_with_ellipsis(s: &str, max_width: usize) -> String {
     }
 }
-/// Format labels for display: [bug, urgent +2]
 fn format_labels(labels: &[String], max_shown: usize) -> String {
     if labels.is_empty() {
         return String::new();
@@ -688,7 +645,6 @@ fn format_labels(labels: &[String], max_shown: usize) -> String {
     }
 }
-/// Format assignees for display: @user1, @user2 +1
 fn format_assignees(assignees: &[String]) -> String {
     if assignees.is_empty() {
         return "-".to_string();
@@ -709,7 +665,6 @@ fn format_assignees(assignees: &[String]) -> String {
     }
 }
-/// Format discussion count: "3/1!" (3 total, 1 unresolved)
 fn format_discussions(total: i64, unresolved: i64) -> String {
     if total == 0 {
         return String::new();
@@ -722,13 +677,11 @@ fn format_discussions(total: i64, unresolved: i64) -> String {
     }
 }
-/// Format branch info: target <- source
 fn format_branches(target: &str, source: &str, max_width: usize) -> String {
     let full = format!("{} <- {}", target, source);
     truncate_with_ellipsis(&full, max_width)
 }
-/// Print issues list as a formatted table.
 pub fn print_list_issues(result: &ListResult) {
     if result.issues.is_empty() {
         println!("No issues found.");
@@ -781,7 +734,6 @@ pub fn print_list_issues(result: &ListResult) {
     println!("{table}");
 }
-/// Print issues list as JSON.
 pub fn print_list_issues_json(result: &ListResult) {
     let json_result = ListResultJson::from(result);
     match serde_json::to_string_pretty(&json_result) {
@@ -790,7 +742,6 @@ pub fn print_list_issues_json(result: &ListResult) {
     }
 }
-/// Open issue in browser. Returns the URL that was opened.
 pub fn open_issue_in_browser(result: &ListResult) -> Option<String> {
     let first_issue = result.issues.first()?;
     let url = first_issue.web_url.as_ref()?;
@@ -807,7 +758,6 @@ pub fn open_issue_in_browser(result: &ListResult) -> Option<String> {
     }
 }
-/// Print MRs list as a formatted table.
 pub fn print_list_mrs(result: &MrListResult) {
     if result.mrs.is_empty() {
         println!("No merge requests found.");
@@ -869,7 +819,6 @@ pub fn print_list_mrs(result: &MrListResult) {
     println!("{table}");
 }
-/// Print MRs list as JSON.
 pub fn print_list_mrs_json(result: &MrListResult) {
     let json_result = MrListResultJson::from(result);
     match serde_json::to_string_pretty(&json_result) {
@@ -878,7 +827,6 @@ pub fn print_list_mrs_json(result: &MrListResult) {
     }
 }
-/// Open MR in browser. Returns the URL that was opened.
 pub fn open_mr_in_browser(result: &MrListResult) -> Option<String> {
     let first_mr = result.mrs.first()?;
     let url = first_mr.web_url.as_ref()?;
@@ -921,10 +869,10 @@ mod tests {
     fn relative_time_formats_correctly() {
         let now = now_ms();
-        assert_eq!(format_relative_time(now - 30_000), "just now"); // 30s ago
-        assert_eq!(format_relative_time(now - 120_000), "2 min ago"); // 2 min ago
-        assert_eq!(format_relative_time(now - 7_200_000), "2 hours ago"); // 2 hours ago
-        assert_eq!(format_relative_time(now - 172_800_000), "2 days ago"); // 2 days ago
+        assert_eq!(format_relative_time(now - 30_000), "just now");
+        assert_eq!(format_relative_time(now - 120_000), "2 min ago");
+        assert_eq!(format_relative_time(now - 7_200_000), "2 hours ago");
+        assert_eq!(format_relative_time(now - 172_800_000), "2 days ago");
     }
     #[test]

View File

@@ -1,5 +1,3 @@
-//! CLI command implementations.
 pub mod auth_test;
 pub mod count;
 pub mod doctor;

View File

@@ -1,5 +1,3 @@
-//! Search command: lexical (FTS5) search with filter support and single-query hydration.
 use console::style;
 use serde::Serialize;
@@ -15,7 +13,6 @@ use crate::search::{
search_fts, search_fts,
}; };
/// Display-ready search result with all fields hydrated.
#[derive(Debug, Serialize)] #[derive(Debug, Serialize)]
pub struct SearchResultDisplay { pub struct SearchResultDisplay {
pub document_id: i64, pub document_id: i64,
@@ -34,7 +31,6 @@ pub struct SearchResultDisplay {
pub explain: Option<ExplainData>, pub explain: Option<ExplainData>,
} }
/// Ranking explanation for --explain output.
#[derive(Debug, Serialize)] #[derive(Debug, Serialize)]
pub struct ExplainData { pub struct ExplainData {
pub vector_rank: Option<usize>, pub vector_rank: Option<usize>,
@@ -42,7 +38,6 @@ pub struct ExplainData {
pub rrf_score: f64, pub rrf_score: f64,
} }
/// Search response wrapper.
#[derive(Debug, Serialize)] #[derive(Debug, Serialize)]
pub struct SearchResponse { pub struct SearchResponse {
pub query: String, pub query: String,
@@ -52,7 +47,6 @@ pub struct SearchResponse {
pub warnings: Vec<String>, pub warnings: Vec<String>,
} }
/// Build SearchFilters from CLI args.
pub struct SearchCliFilters { pub struct SearchCliFilters {
pub source_type: Option<String>, pub source_type: Option<String>,
pub author: Option<String>, pub author: Option<String>,
@@ -64,7 +58,6 @@ pub struct SearchCliFilters {
pub limit: usize, pub limit: usize,
} }
/// Run a lexical search query.
pub fn run_search( pub fn run_search(
config: &Config, config: &Config,
query: &str, query: &str,
@@ -75,7 +68,6 @@ pub fn run_search(
let db_path = get_db_path(config.storage.db_path.as_deref()); let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?; let conn = create_connection(&db_path)?;
// Check if any documents exist
let doc_count: i64 = conn let doc_count: i64 = conn
.query_row("SELECT COUNT(*) FROM documents", [], |row| row.get(0)) .query_row("SELECT COUNT(*) FROM documents", [], |row| row.get(0))
.unwrap_or(0); .unwrap_or(0);
@@ -90,7 +82,6 @@ pub fn run_search(
}); });
} }
// Build filters
let source_type = cli_filters let source_type = cli_filters
.source_type .source_type
.as_deref() .as_deref()
@@ -146,7 +137,6 @@ pub fn run_search(
limit: cli_filters.limit, limit: cli_filters.limit,
}; };
// Adaptive recall: wider initial fetch when filters applied
let requested = filters.clamp_limit(); let requested = filters.clamp_limit();
let top_k = if filters.has_any_filter() { let top_k = if filters.has_any_filter() {
(requested * 50).clamp(200, 1500) (requested * 50).clamp(200, 1500)
@@ -154,24 +144,20 @@ pub fn run_search(
(requested * 10).clamp(50, 1500) (requested * 10).clamp(50, 1500)
}; };
// FTS search
let fts_results = search_fts(&conn, query, top_k, fts_mode)?; let fts_results = search_fts(&conn, query, top_k, fts_mode)?;
let fts_tuples: Vec<(i64, f64)> = fts_results let fts_tuples: Vec<(i64, f64)> = fts_results
.iter() .iter()
.map(|r| (r.document_id, r.bm25_score)) .map(|r| (r.document_id, r.bm25_score))
.collect(); .collect();
// Build snippet map before ranking
let snippet_map: std::collections::HashMap<i64, String> = fts_results let snippet_map: std::collections::HashMap<i64, String> = fts_results
.iter() .iter()
.map(|r| (r.document_id, r.snippet.clone())) .map(|r| (r.document_id, r.snippet.clone()))
.collect(); .collect();
// RRF ranking (single-list for lexical mode)
let ranked = rank_rrf(&[], &fts_tuples); let ranked = rank_rrf(&[], &fts_tuples);
let ranked_ids: Vec<i64> = ranked.iter().map(|r| r.document_id).collect(); let ranked_ids: Vec<i64> = ranked.iter().map(|r| r.document_id).collect();
// Apply post-retrieval filters
let filtered_ids = apply_filters(&conn, &ranked_ids, &filters)?; let filtered_ids = apply_filters(&conn, &ranked_ids, &filters)?;
if filtered_ids.is_empty() { if filtered_ids.is_empty() {
@@ -184,10 +170,8 @@ pub fn run_search(
}); });
} }
// Hydrate results in single round-trip
let hydrated = hydrate_results(&conn, &filtered_ids)?; let hydrated = hydrate_results(&conn, &filtered_ids)?;
// Build display results preserving filter order
let rrf_map: std::collections::HashMap<i64, &crate::search::RrfResult> = let rrf_map: std::collections::HashMap<i64, &crate::search::RrfResult> =
ranked.iter().map(|r| (r.document_id, r)).collect(); ranked.iter().map(|r| (r.document_id, r)).collect();
@@ -233,7 +217,6 @@ pub fn run_search(
}) })
} }
/// Raw row from hydration query.
struct HydratedRow { struct HydratedRow {
document_id: i64, document_id: i64,
source_type: String, source_type: String,
@@ -248,10 +231,6 @@ struct HydratedRow {
paths: Vec<String>, paths: Vec<String>,
} }
/// Hydrate document IDs into full display rows in a single query.
///
/// Uses json_each() to pass ranked IDs and preserve ordering via ORDER BY j.key.
/// Labels and paths fetched via correlated json_group_array subqueries.
fn hydrate_results(conn: &rusqlite::Connection, document_ids: &[i64]) -> Result<Vec<HydratedRow>> { fn hydrate_results(conn: &rusqlite::Connection, document_ids: &[i64]) -> Result<Vec<HydratedRow>> {
if document_ids.is_empty() { if document_ids.is_empty() {
return Ok(Vec::new()); return Ok(Vec::new());
@@ -299,7 +278,6 @@ fn hydrate_results(conn: &rusqlite::Connection, document_ids: &[i64]) -> Result<
Ok(rows) Ok(rows)
} }
/// Parse a JSON array string into a Vec<String>, filtering out null/empty.
fn parse_json_array(json: &str) -> Vec<String> { fn parse_json_array(json: &str) -> Vec<String> {
serde_json::from_str::<Vec<serde_json::Value>>(json) serde_json::from_str::<Vec<serde_json::Value>>(json)
.unwrap_or_default() .unwrap_or_default()
@@ -309,7 +287,6 @@ fn parse_json_array(json: &str) -> Vec<String> {
.collect() .collect()
} }
/// Print human-readable search results.
pub fn print_search_results(response: &SearchResponse) { pub fn print_search_results(response: &SearchResponse) {
if !response.warnings.is_empty() { if !response.warnings.is_empty() {
for w in &response.warnings { for w in &response.warnings {
@@ -364,7 +341,6 @@ pub fn print_search_results(response: &SearchResponse) {
println!(" Labels: {}", result.labels.join(", ")); println!(" Labels: {}", result.labels.join(", "));
} }
// Strip HTML tags from snippet for terminal display
let clean_snippet = result.snippet.replace("<mark>", "").replace("</mark>", ""); let clean_snippet = result.snippet.replace("<mark>", "").replace("</mark>", "");
println!(" {}", style(clean_snippet).dim()); println!(" {}", style(clean_snippet).dim());
@@ -384,7 +360,6 @@ pub fn print_search_results(response: &SearchResponse) {
} }
} }
/// JSON output structures.
#[derive(Serialize)] #[derive(Serialize)]
struct SearchJsonOutput<'a> { struct SearchJsonOutput<'a> {
ok: bool, ok: bool,
@@ -397,7 +372,6 @@ struct SearchMeta {
elapsed_ms: u64, elapsed_ms: u64,
} }
/// Print JSON robot-mode output.
pub fn print_search_results_json(response: &SearchResponse, elapsed_ms: u64) { pub fn print_search_results_json(response: &SearchResponse, elapsed_ms: u64) {
let output = SearchJsonOutput { let output = SearchJsonOutput {
ok: true, ok: true,

View File
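The lexical path in this file feeds only the FTS list into `rank_rrf(&[], &fts_tuples)`. As a rough illustration of what single-list Reciprocal Rank Fusion does in that call, here is a sketch using the conventional `k = 60` constant and plain `(id, score)` tuples; the project's actual `rank_rrf` signature and constant are assumptions, not read from the source.

```rust
use std::collections::HashMap;

const K: f64 = 60.0; // conventional RRF damping constant (assumed)

// Fuse two ranked lists of (document_id, score) pairs by rank position.
// Once each list is ordered, only the rank matters; raw scores are ignored.
fn rank_rrf(vec_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in [vec_results, fts_results] {
        for (rank, (id, _)) in list.iter().enumerate() {
            // Contribution of a document at 1-based rank r is 1 / (K + r).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (K + (rank + 1) as f64);
        }
    }
    let mut ranked: Vec<(i64, f64)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked
}

fn main() {
    // Lexical-only mode: empty vector list, like rank_rrf(&[], &fts_tuples).
    let fts = vec![(10, -5.0), (20, -4.0), (30, -3.0)];
    let ranked = rank_rrf(&[], &fts);
    assert_eq!(ranked[0].0, 10); // with one list, FTS order is preserved
    println!("{ranked:?}");
}
```

With a single list the fusion is an order-preserving rescore, which is why the command can reuse the same ranking path for lexical and hybrid modes.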

@@ -1,5 +1,3 @@
-//! Show command - display detailed entity information from local database.
-
use console::style;
use rusqlite::Connection;
use serde::Serialize;
@@ -11,7 +9,6 @@ use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::ms_to_iso;

-/// Merge request metadata for display.
#[derive(Debug, Serialize)]
pub struct MrDetail {
    pub id: i64,
@@ -35,14 +32,12 @@ pub struct MrDetail {
    pub discussions: Vec<MrDiscussionDetail>,
}

-/// MR discussion detail for display.
#[derive(Debug, Serialize)]
pub struct MrDiscussionDetail {
    pub notes: Vec<MrNoteDetail>,
    pub individual_note: bool,
}

-/// MR note detail for display (includes DiffNote position).
#[derive(Debug, Serialize)]
pub struct MrNoteDetail {
    pub author_username: String,
@@ -52,7 +47,6 @@ pub struct MrNoteDetail {
    pub position: Option<DiffNotePosition>,
}

-/// DiffNote position context for display.
#[derive(Debug, Clone, Serialize)]
pub struct DiffNotePosition {
    pub old_path: Option<String>,
@@ -62,7 +56,6 @@ pub struct DiffNotePosition {
    pub position_type: Option<String>,
}

-/// Issue metadata for display.
#[derive(Debug, Serialize)]
pub struct IssueDetail {
    pub id: i64,
@@ -79,14 +72,12 @@ pub struct IssueDetail {
    pub discussions: Vec<DiscussionDetail>,
}

-/// Discussion detail for display.
#[derive(Debug, Serialize)]
pub struct DiscussionDetail {
    pub notes: Vec<NoteDetail>,
    pub individual_note: bool,
}

-/// Note detail for display.
#[derive(Debug, Serialize)]
pub struct NoteDetail {
    pub author_username: String,
@@ -95,7 +86,6 @@ pub struct NoteDetail {
    pub is_system: bool,
}

-/// Run the show issue command.
pub fn run_show_issue(
    config: &Config,
    iid: i64,
@@ -104,13 +94,10 @@ pub fn run_show_issue(
    let db_path = get_db_path(config.storage.db_path.as_deref());
    let conn = create_connection(&db_path)?;

-    // Find the issue
    let issue = find_issue(&conn, iid, project_filter)?;

-    // Load labels
    let labels = get_issue_labels(&conn, issue.id)?;

-    // Load discussions with notes
    let discussions = get_issue_discussions(&conn, issue.id)?;

    Ok(IssueDetail {
@@ -129,7 +116,6 @@ pub fn run_show_issue(
    })
}

-/// Internal issue row from query.
struct IssueRow {
    id: i64,
    iid: i64,
@@ -143,7 +129,6 @@ struct IssueRow {
    project_path: String,
}

-/// Find issue by iid, optionally filtered by project.
fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Result<IssueRow> {
    let (sql, params): (&str, Vec<Box<dyn rusqlite::ToSql>>) = match project_filter {
        Some(project) => {
@@ -201,7 +186,6 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
    }
}

-/// Get labels for an issue.
fn get_issue_labels(conn: &Connection, issue_id: i64) -> Result<Vec<String>> {
    let mut stmt = conn.prepare(
        "SELECT l.name FROM labels l
@@ -217,9 +201,7 @@ fn get_issue_labels(conn: &Connection, issue_id: i64) -> Result<Vec<String>> {
    Ok(labels)
}

-/// Get discussions with notes for an issue.
fn get_issue_discussions(conn: &Connection, issue_id: i64) -> Result<Vec<DiscussionDetail>> {
-    // First get all discussions
    let mut disc_stmt = conn.prepare(
        "SELECT id, individual_note FROM discussions
         WHERE issue_id = ?
@@ -233,7 +215,6 @@ fn get_issue_discussions(conn: &Connection, issue_id: i64) -> Result<Vec<Discuss
        })?
        .collect::<std::result::Result<Vec<_>, _>>()?;

-    // Then get notes for each discussion
    let mut note_stmt = conn.prepare(
        "SELECT author_username, body, created_at, is_system
         FROM notes
@@ -255,7 +236,6 @@ fn get_issue_discussions(conn: &Connection, issue_id: i64) -> Result<Vec<Discuss
        })?
        .collect::<std::result::Result<Vec<_>, _>>()?;

-        // Filter out discussions with only system notes
        let has_user_notes = notes.iter().any(|n| !n.is_system);
        if has_user_notes || notes.is_empty() {
            discussions.push(DiscussionDetail {
@@ -268,24 +248,18 @@ fn get_issue_discussions(conn: &Connection, issue_id: i64) -> Result<Vec<Discuss
    Ok(discussions)
}

-/// Run the show MR command.
pub fn run_show_mr(config: &Config, iid: i64, project_filter: Option<&str>) -> Result<MrDetail> {
    let db_path = get_db_path(config.storage.db_path.as_deref());
    let conn = create_connection(&db_path)?;

-    // Find the MR
    let mr = find_mr(&conn, iid, project_filter)?;

-    // Load labels
    let labels = get_mr_labels(&conn, mr.id)?;

-    // Load assignees
    let assignees = get_mr_assignees(&conn, mr.id)?;

-    // Load reviewers
    let reviewers = get_mr_reviewers(&conn, mr.id)?;

-    // Load discussions with notes
    let discussions = get_mr_discussions(&conn, mr.id)?;

    Ok(MrDetail {
@@ -311,7 +285,6 @@ pub fn run_show_mr(config: &Config, iid: i64, project_filter: Option<&str>) -> R
    })
}

-/// Internal MR row from query.
struct MrRow {
    id: i64,
    iid: i64,
@@ -330,7 +303,6 @@ struct MrRow {
    project_path: String,
}

-/// Find MR by iid, optionally filtered by project.
fn find_mr(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Result<MrRow> {
    let (sql, params): (&str, Vec<Box<dyn rusqlite::ToSql>>) = match project_filter {
        Some(project) => {
@@ -398,7 +370,6 @@ fn find_mr(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Result<
    }
}

-/// Get labels for an MR.
fn get_mr_labels(conn: &Connection, mr_id: i64) -> Result<Vec<String>> {
    let mut stmt = conn.prepare(
        "SELECT l.name FROM labels l
@@ -414,7 +385,6 @@ fn get_mr_labels(conn: &Connection, mr_id: i64) -> Result<Vec<String>> {
    Ok(labels)
}

-/// Get assignees for an MR.
fn get_mr_assignees(conn: &Connection, mr_id: i64) -> Result<Vec<String>> {
    let mut stmt = conn.prepare(
        "SELECT username FROM mr_assignees
@@ -429,7 +399,6 @@ fn get_mr_assignees(conn: &Connection, mr_id: i64) -> Result<Vec<String>> {
    Ok(assignees)
}

-/// Get reviewers for an MR.
fn get_mr_reviewers(conn: &Connection, mr_id: i64) -> Result<Vec<String>> {
    let mut stmt = conn.prepare(
        "SELECT username FROM mr_reviewers
@@ -444,9 +413,7 @@ fn get_mr_reviewers(conn: &Connection, mr_id: i64) -> Result<Vec<String>> {
    Ok(reviewers)
}

-/// Get discussions with notes for an MR.
fn get_mr_discussions(conn: &Connection, mr_id: i64) -> Result<Vec<MrDiscussionDetail>> {
-    // First get all discussions
    let mut disc_stmt = conn.prepare(
        "SELECT id, individual_note FROM discussions
         WHERE merge_request_id = ?
@@ -460,7 +427,6 @@ fn get_mr_discussions(conn: &Connection, mr_id: i64) -> Result<Vec<MrDiscussionD
        })?
        .collect::<std::result::Result<Vec<_>, _>>()?;

-    // Then get notes for each discussion (with DiffNote position fields)
    let mut note_stmt = conn.prepare(
        "SELECT author_username, body, created_at, is_system,
                position_old_path, position_new_path, position_old_line,
@@ -507,7 +473,6 @@ fn get_mr_discussions(conn: &Connection, mr_id: i64) -> Result<Vec<MrDiscussionD
        })?
        .collect::<std::result::Result<Vec<_>, _>>()?;

-        // Filter out discussions with only system notes
        let has_user_notes = notes.iter().any(|n| !n.is_system);
        if has_user_notes || notes.is_empty() {
            discussions.push(MrDiscussionDetail {
@@ -520,14 +485,11 @@ fn get_mr_discussions(conn: &Connection, mr_id: i64) -> Result<Vec<MrDiscussionD
    Ok(discussions)
}

-/// Format date from ms epoch.
fn format_date(ms: i64) -> String {
    let iso = ms_to_iso(ms);

-    // Extract just the date part (YYYY-MM-DD)
    iso.split('T').next().unwrap_or(&iso).to_string()
}

-/// Truncate text with ellipsis (character-safe for UTF-8).
fn truncate(s: &str, max_len: usize) -> String {
    if s.chars().count() <= max_len {
        s.to_string()
@@ -537,7 +499,6 @@ fn truncate(s: &str, max_len: usize) -> String {
    }
}

-/// Wrap text to width, with indent prefix on continuation lines.
fn wrap_text(text: &str, width: usize, indent: &str) -> String {
    let mut result = String::new();
    let mut current_line = String::new();
@@ -569,15 +530,12 @@ fn wrap_text(text: &str, width: usize, indent: &str) -> String {
    result
}

-/// Print issue detail.
pub fn print_show_issue(issue: &IssueDetail) {
-    // Header
    let header = format!("Issue #{}: {}", issue.iid, issue.title);
    println!("{}", style(&header).bold());
    println!("{}", "".repeat(header.len().min(80)));
    println!();

-    // Metadata
    println!("Project: {}", style(&issue.project_path).cyan());

    let state_styled = if issue.state == "opened" {
@@ -603,7 +561,6 @@ pub fn print_show_issue(issue: &IssueDetail) {
    println!();

-    // Description
    println!("{}", style("Description:").bold());
    if let Some(desc) = &issue.description {
        let truncated = truncate(desc, 500);
@@ -615,7 +572,6 @@ pub fn print_show_issue(issue: &IssueDetail) {
    println!();

-    // Discussions
    let user_discussions: Vec<&DiscussionDetail> = issue
        .discussions
        .iter()
@@ -636,7 +592,6 @@ pub fn print_show_issue(issue: &IssueDetail) {
            discussion.notes.iter().filter(|n| !n.is_system).collect();

        if let Some(first_note) = user_notes.first() {
-            // First note of discussion (not indented)
            println!(
                " {} ({}):",
                style(format!("@{}", first_note.author_username)).cyan(),
@@ -646,7 +601,6 @@ pub fn print_show_issue(issue: &IssueDetail) {
            println!(" {}", wrapped);
            println!();

-            // Replies (indented)
            for reply in user_notes.iter().skip(1) {
                println!(
                    " {} ({}):",
@@ -662,16 +616,13 @@ pub fn print_show_issue(issue: &IssueDetail) {
    }
}

-/// Print MR detail.
pub fn print_show_mr(mr: &MrDetail) {
-    // Header with draft indicator
    let draft_prefix = if mr.draft { "[Draft] " } else { "" };
    let header = format!("MR !{}: {}{}", mr.iid, draft_prefix, mr.title);
    println!("{}", style(&header).bold());
    println!("{}", "".repeat(header.len().min(80)));
    println!();

-    // Metadata
    println!("Project: {}", style(&mr.project_path).cyan());

    let state_styled = match mr.state.as_str() {
@@ -735,7 +686,6 @@ pub fn print_show_mr(mr: &MrDetail) {
    println!();

-    // Description
    println!("{}", style("Description:").bold());
    if let Some(desc) = &mr.description {
        let truncated = truncate(desc, 500);
@@ -747,7 +697,6 @@ pub fn print_show_mr(mr: &MrDetail) {
    println!();

-    // Discussions
    let user_discussions: Vec<&MrDiscussionDetail> = mr
        .discussions
        .iter()
@@ -768,12 +717,10 @@ pub fn print_show_mr(mr: &MrDetail) {
            discussion.notes.iter().filter(|n| !n.is_system).collect();

        if let Some(first_note) = user_notes.first() {
-            // Print DiffNote position context if present
            if let Some(pos) = &first_note.position {
                print_diff_position(pos);
            }

-            // First note of discussion (not indented)
            println!(
                " {} ({}):",
                style(format!("@{}", first_note.author_username)).cyan(),
@@ -783,7 +730,6 @@ pub fn print_show_mr(mr: &MrDetail) {
            println!(" {}", wrapped);
            println!();

-            // Replies (indented)
            for reply in user_notes.iter().skip(1) {
                println!(
                    " {} ({}):",
@@ -799,7 +745,6 @@ pub fn print_show_mr(mr: &MrDetail) {
    }
}

-/// Print DiffNote position context.
fn print_diff_position(pos: &DiffNotePosition) {
    let file = pos.new_path.as_ref().or(pos.old_path.as_ref());
@@ -821,11 +766,6 @@ fn print_diff_position(pos: &DiffNotePosition) {
    }
}

-// ============================================================================
-// JSON Output Structs (with ISO timestamps for machine consumption)
-// ============================================================================
-
-/// JSON output for issue detail.
#[derive(Serialize)]
pub struct IssueDetailJson {
    pub id: i64,
@@ -842,14 +782,12 @@ pub struct IssueDetailJson {
    pub discussions: Vec<DiscussionDetailJson>,
}

-/// JSON output for discussion detail.
#[derive(Serialize)]
pub struct DiscussionDetailJson {
    pub notes: Vec<NoteDetailJson>,
    pub individual_note: bool,
}

-/// JSON output for note detail.
#[derive(Serialize)]
pub struct NoteDetailJson {
    pub author_username: String,
@@ -897,7 +835,6 @@ impl From<&NoteDetail> for NoteDetailJson {
    }
}

-/// JSON output for MR detail.
#[derive(Serialize)]
pub struct MrDetailJson {
    pub id: i64,
@@ -921,14 +858,12 @@ pub struct MrDetailJson {
    pub discussions: Vec<MrDiscussionDetailJson>,
}

-/// JSON output for MR discussion detail.
#[derive(Serialize)]
pub struct MrDiscussionDetailJson {
    pub notes: Vec<MrNoteDetailJson>,
    pub individual_note: bool,
}

-/// JSON output for MR note detail.
#[derive(Serialize)]
pub struct MrNoteDetailJson {
    pub author_username: String,
@@ -985,7 +920,6 @@ impl From<&MrNoteDetail> for MrNoteDetailJson {
    }
}

-/// Print issue detail as JSON.
pub fn print_show_issue_json(issue: &IssueDetail) {
    let json_result = IssueDetailJson::from(issue);
    match serde_json::to_string_pretty(&json_result) {
@@ -994,7 +928,6 @@ pub fn print_show_issue_json(issue: &IssueDetail) {
    }
}

-/// Print MR detail as JSON.
pub fn print_show_mr_json(mr: &MrDetail) {
    let json_result = MrDetailJson::from(mr);
    match serde_json::to_string_pretty(&json_result) {
@@ -1030,7 +963,6 @@ mod tests {
    #[test]
    fn format_date_extracts_date_part() {
-        // 2024-01-15T00:00:00Z in milliseconds
        let ms = 1705276800000;
        let date = format_date(ms);
        assert!(date.starts_with("2024-01-15"));

View File
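One of the doc comments removed in this file described `truncate` as "character-safe for UTF-8", and the diff shows only part of its body. A minimal sketch of that behavior, with the `...` suffix and exact signature assumed rather than taken from show.rs:

```rust
// Character-safe truncation: count and take chars, never raw bytes.
// Slicing a &str at a byte index inside a multi-byte UTF-8 sequence
// would panic, so the char iterator is the safe path.
fn truncate(s: &str, max_len: usize) -> String {
    if s.chars().count() <= max_len {
        s.to_string()
    } else {
        let head: String = s.chars().take(max_len).collect();
        format!("{head}...")
    }
}

fn main() {
    // "héllo wörld" is 11 chars but more than 11 bytes; byte slicing at 5
    // could split the 'é' sequence, while char-based truncation cannot.
    assert_eq!(truncate("héllo wörld", 5), "héllo...");
    assert_eq!(truncate("hi", 5), "hi");
    println!("{}", truncate("héllo wörld", 5));
}
```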

@@ -1,5 +1,3 @@
-//! Stats command: document counts, embedding coverage, queue status, integrity checks.
-
use console::style;
use rusqlite::Connection;
use serde::Serialize;
@@ -9,7 +7,6 @@ use crate::core::db::create_connection;
use crate::core::error::Result;
use crate::core::paths::get_db_path;

-/// Result of the stats command.
#[derive(Debug, Default, Serialize)]
pub struct StatsResult {
    pub documents: DocumentStats,
@@ -74,14 +71,12 @@ pub struct RepairResult {
    pub stale_cleared: i64,
}

-/// Run the stats command.
pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResult> {
    let db_path = get_db_path(config.storage.db_path.as_deref());
    let conn = create_connection(&db_path)?;
    let mut result = StatsResult::default();

-    // Document counts
    result.documents.total = count_query(&conn, "SELECT COUNT(*) FROM documents")?;
    result.documents.issues = count_query(
        &conn,
@@ -100,7 +95,6 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
        "SELECT COUNT(*) FROM documents WHERE is_truncated = 1",
    )?;

-    // Embedding stats — skip gracefully if table doesn't exist (Gate A only)
    if table_exists(&conn, "embedding_metadata") {
        let embedded = count_query(
            &conn,
@@ -119,10 +113,8 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
        };
    }

-    // FTS stats
    result.fts.indexed = count_query(&conn, "SELECT COUNT(*) FROM documents_fts")?;

-    // Queue stats
    result.queues.dirty_sources = count_query(
        &conn,
        "SELECT COUNT(*) FROM dirty_sources WHERE last_error IS NULL",
@@ -158,15 +150,12 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
        )?;
    }

-    // Integrity check
    #[allow(clippy::field_reassign_with_default)]
    if check {
        let mut integrity = IntegrityResult::default();

-        // FTS/doc count mismatch
        integrity.fts_doc_mismatch = result.fts.indexed != result.documents.total;

-        // Orphan embeddings (rowid/1000 should match a document ID)
        if table_exists(&conn, "embeddings") {
            integrity.orphan_embeddings = count_query(
                &conn,
@@ -175,7 +164,6 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
            )?;
        }

-        // Stale metadata (document_hash != current content_hash)
        if table_exists(&conn, "embedding_metadata") {
            integrity.stale_metadata = count_query(
                &conn,
@@ -185,7 +173,6 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
            )?;
        }

-        // Orphaned resource events (FK targets missing)
        if table_exists(&conn, "resource_state_events") {
            integrity.orphan_state_events = count_query(
                &conn,
@@ -211,7 +198,6 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
            )?;
        }

-        // Queue health: stuck locks and max retry attempts
        if table_exists(&conn, "pending_dependent_fetches") {
            integrity.queue_stuck_locks = count_query(
                &conn,
@@ -232,7 +218,6 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
            && integrity.stale_metadata == 0
            && orphan_events == 0;

-        // Repair
        if repair {
            let mut repair_result = RepairResult::default();
@@ -252,7 +237,6 @@ pub fn run_stats(config: &Config, check: bool, repair: bool) -> Result<StatsResu
            )?;
            repair_result.orphans_deleted = deleted as i64;

-            // Also clean orphaned vectors if vec0 table exists
            if table_exists(&conn, "embeddings") {
                let _ = conn.execute(
                    "DELETE FROM embeddings
@@ -299,7 +283,6 @@ fn table_exists(conn: &Connection, table: &str) -> bool {
        > 0
}

-/// Print human-readable stats.
pub fn print_stats(result: &StatsResult) {
    println!("{}", style("Documents").cyan().bold());
    println!(" Total: {}", result.documents.total);
@@ -429,14 +412,12 @@ pub fn print_stats(result: &StatsResult) {
    }
}

-/// JSON output structures.
#[derive(Serialize)]
struct StatsJsonOutput {
    ok: bool,
    data: StatsResult,
}

-/// Print JSON robot-mode output.
pub fn print_stats_json(result: &StatsResult) {
    let output = StatsJsonOutput {
        ok: true,


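The orphan-embedding check above relies on a rowid convention (the removed comment says "rowid/1000 should match a document ID"). A minimal sketch of what such an encoding could look like, assuming chunk rowid = document ID × 1000 + chunk index (this encoding is an assumption for illustration, not confirmed by the diff):

```rust
// Hypothetical rowid convention: each embedding row packs a document ID and
// a chunk index into one integer, so rowid / 1000 recovers the document ID.
fn chunk_rowid(doc_id: i64, chunk_index: i64) -> i64 {
    // Assumes fewer than 1000 chunks per document.
    doc_id * 1000 + chunk_index
}

fn doc_id_for_rowid(rowid: i64) -> i64 {
    rowid / 1000
}

fn main() {
    let rowid = chunk_rowid(42, 7);
    // An embedding is "orphaned" when doc_id_for_rowid(rowid) matches no
    // row in the documents table.
    println!("rowid {rowid} belongs to document {}", doc_id_for_rowid(rowid));
}
```

Under this assumption, the integrity check reduces to counting embedding rowids whose decoded document ID has no matching document row.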
@@ -1,10 +1,8 @@
-//! Sync command: unified orchestrator for ingest -> generate-docs -> embed.
 use console::style;
 use indicatif::{ProgressBar, ProgressStyle};
 use serde::Serialize;
-use std::sync::atomic::{AtomicBool, Ordering};
 use std::sync::Arc;
+use std::sync::atomic::{AtomicBool, Ordering};
 use tracing::Instrument;
 use tracing::{info, warn};
@@ -16,7 +14,6 @@ use super::embed::run_embed;
 use super::generate_docs::run_generate_docs;
 use super::ingest::{IngestDisplay, run_ingest};
-/// Options for the sync command.
 #[derive(Debug, Default)]
 pub struct SyncOptions {
     pub full: bool,
@@ -27,7 +24,6 @@ pub struct SyncOptions {
     pub robot_mode: bool,
 }
-/// Result of the sync command.
 #[derive(Debug, Default, Serialize)]
 pub struct SyncResult {
     #[serde(skip)]
@@ -41,10 +37,6 @@ pub struct SyncResult {
     pub documents_embedded: usize,
 }
-/// Create a styled spinner for a sync stage.
-///
-/// Uses `{prefix}` for the `[N/M]` stage label so callers can update `{msg}`
-/// independently without losing the stage context.
 fn stage_spinner(stage: u8, total: u8, msg: &str, robot_mode: bool) -> ProgressBar {
     if robot_mode {
         return ProgressBar::hidden();
@@ -61,11 +53,6 @@ fn stage_spinner(stage: u8, total: u8, msg: &str, robot_mode: bool) -> ProgressB
     pb
 }
-/// Run the full sync pipeline: ingest -> generate-docs -> embed.
-///
-/// `run_id` is an optional correlation ID for log/metrics tracing.
-/// When called from `handle_sync_cmd`, this should be the same ID
-/// stored in the `sync_runs` table so logs and DB records correlate.
 pub async fn run_sync(
     config: &Config,
     options: SyncOptions,
@@ -102,7 +89,6 @@ pub async fn run_sync(
     };
     let mut current_stage: u8 = 0;
-    // Stage 1: Ingest issues
     current_stage += 1;
     let spinner = stage_spinner(
         current_stage,
@@ -127,7 +113,6 @@ pub async fn run_sync(
     result.resource_events_failed += issues_result.resource_events_failed;
     spinner.finish_and_clear();
-    // Stage 2: Ingest MRs
     current_stage += 1;
     let spinner = stage_spinner(
         current_stage,
@@ -152,7 +137,6 @@ pub async fn run_sync(
     result.resource_events_failed += mrs_result.resource_events_failed;
     spinner.finish_and_clear();
-    // Stage 3: Generate documents (unless --no-docs)
     if !options.no_docs {
         current_stage += 1;
         let spinner = stage_spinner(
@@ -163,7 +147,6 @@ pub async fn run_sync(
         );
         info!("Sync stage {current_stage}/{total_stages}: generating documents");
-        // Create a dedicated progress bar matching the ingest stage style
         let docs_bar = if options.robot_mode {
             ProgressBar::hidden()
         } else {
@@ -186,8 +169,6 @@ pub async fn run_sync(
             if !tick_started_clone.swap(true, Ordering::Relaxed) {
                 docs_bar_clone.enable_steady_tick(std::time::Duration::from_millis(100));
             }
-            // Update length every callback — the regenerator's estimated_total
-            // can grow if new dirty items are queued during processing.
             docs_bar_clone.set_length(total as u64);
             docs_bar_clone.set_position(processed as u64);
         }
@@ -200,7 +181,6 @@ pub async fn run_sync(
         info!("Sync: skipping document generation (--no-docs)");
     }
-    // Stage 4: Embed documents (unless --no-embed)
     if !options.no_embed {
         current_stage += 1;
         let spinner = stage_spinner(
@@ -211,7 +191,6 @@ pub async fn run_sync(
         );
         info!("Sync stage {current_stage}/{total_stages}: embedding documents");
-        // Create a dedicated progress bar matching the ingest stage style
         let embed_bar = if options.robot_mode {
             ProgressBar::hidden()
         } else {
@@ -245,7 +224,6 @@ pub async fn run_sync(
                 spinner.finish_and_clear();
             }
             Err(e) => {
-                // Graceful degradation: Ollama down is a warning, not an error
                 embed_bar.finish_and_clear();
                 spinner.finish_and_clear();
                 if !options.robot_mode {
@@ -275,7 +253,6 @@ pub async fn run_sync(
     .await
 }
-/// Print human-readable sync summary.
 pub fn print_sync(
     result: &SyncResult,
     elapsed: std::time::Duration,
@@ -307,7 +284,6 @@ pub fn print_sync(
     println!(" Documents embedded: {}", result.documents_embedded);
     println!(" Elapsed: {:.1}s", elapsed.as_secs_f64());
-    // Print per-stage timing breakdown if metrics are available
     if let Some(metrics) = metrics {
         let stages = metrics.extract_timings();
         if !stages.is_empty() {
@@ -316,7 +292,6 @@ pub fn print_sync(
     }
 }
-/// Print per-stage timing breakdown for interactive users.
 fn print_timing_summary(stages: &[StageTiming]) {
     println!();
     println!("{}", style("Stage timing:").dim());
@@ -327,7 +302,6 @@ fn print_timing_summary(stages: &[StageTiming]) {
     }
 }
-/// Print a single stage timing line with indentation.
 fn print_stage_line(stage: &StageTiming, depth: usize) {
     let indent = " ".repeat(depth);
     let name = if let Some(ref project) = stage.project {
@@ -367,7 +341,6 @@ fn print_stage_line(stage: &StageTiming, depth: usize) {
     }
 }
-/// JSON output for sync.
 #[derive(Serialize)]
 struct SyncJsonOutput<'a> {
     ok: bool,
@@ -383,7 +356,6 @@ struct SyncMeta {
     stages: Vec<StageTiming>,
 }
-/// Print JSON robot-mode sync output with optional metrics.
 pub fn print_sync_json(result: &SyncResult, elapsed_ms: u64, metrics: Option<&MetricsLayer>) {
     let stages = metrics.map_or_else(Vec::new, MetricsLayer::extract_timings);
     let output = SyncJsonOutput {

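The sync pipeline above numbers its stages by incrementing `current_stage` once per enabled stage, with `--no-docs` and `--no-embed` shrinking the total. A sketch of how the stage total could be derived from the skip flags (field names mirror `SyncOptions` above; the exact derivation is an assumption):

```rust
// Illustrative stand-in for the SyncOptions flags seen in the diff.
struct SyncOptions {
    no_docs: bool,
    no_embed: bool,
}

// Stages 1 and 2 (ingest issues, ingest MRs) always run; docs and embed
// are conditional, matching the `if !options.no_docs` / `if !options.no_embed`
// blocks in run_sync.
fn total_stages(opts: &SyncOptions) -> u8 {
    let mut total = 2;
    if !opts.no_docs {
        total += 1;
    }
    if !opts.no_embed {
        total += 1;
    }
    total
}

fn main() {
    let opts = SyncOptions { no_docs: false, no_embed: true };
    println!("[1/{}] ingesting issues...", total_stages(&opts));
}
```

Computing the total up front is what lets each `stage_spinner(current_stage, total_stages, ...)` call render a stable `[N/M]` prefix even when stages are skipped.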
@@ -1,5 +1,3 @@
-//! Sync status command - display synchronization state from local database.
 use console::style;
 use rusqlite::Connection;
 use serde::Serialize;
@@ -13,7 +11,6 @@ use crate::core::time::{format_full_datetime, ms_to_iso};
 const RECENT_RUNS_LIMIT: usize = 10;
-/// Sync run information.
 #[derive(Debug)]
 pub struct SyncRunInfo {
     pub id: i64,
@@ -28,7 +25,6 @@ pub struct SyncRunInfo {
     pub stages: Option<Vec<StageTiming>>,
 }
-/// Cursor position information.
 #[derive(Debug)]
 pub struct CursorInfo {
     pub project_path: String,
@@ -37,7 +33,6 @@ pub struct CursorInfo {
     pub tie_breaker_id: Option<i64>,
 }
-/// Data summary counts.
 #[derive(Debug)]
 pub struct DataSummary {
     pub issue_count: i64,
@@ -47,7 +42,6 @@ pub struct DataSummary {
     pub system_note_count: i64,
 }
-/// Complete sync status result.
 #[derive(Debug)]
 pub struct SyncStatusResult {
     pub runs: Vec<SyncRunInfo>,
@@ -55,7 +49,6 @@ pub struct SyncStatusResult {
     pub summary: DataSummary,
 }
-/// Run the sync-status command.
 pub fn run_sync_status(config: &Config) -> Result<SyncStatusResult> {
     let db_path = get_db_path(config.storage.db_path.as_deref());
     let conn = create_connection(&db_path)?;
@@ -71,7 +64,6 @@ pub fn run_sync_status(config: &Config) -> Result<SyncStatusResult> {
     })
 }
-/// Get the most recent sync runs.
 fn get_recent_sync_runs(conn: &Connection, limit: usize) -> Result<Vec<SyncRunInfo>> {
     let mut stmt = conn.prepare(
         "SELECT id, started_at, finished_at, status, command, error,
@@ -105,7 +97,6 @@ fn get_recent_sync_runs(conn: &Connection, limit: usize) -> Result<Vec<SyncRunIn
     Ok(runs?)
 }
-/// Get cursor positions for all projects/resource types.
 fn get_cursor_positions(conn: &Connection) -> Result<Vec<CursorInfo>> {
     let mut stmt = conn.prepare(
         "SELECT p.path_with_namespace, sc.resource_type, sc.updated_at_cursor, sc.tie_breaker_id
@@ -128,7 +119,6 @@ fn get_cursor_positions(conn: &Connection) -> Result<Vec<CursorInfo>> {
     Ok(cursors?)
 }
-/// Get data summary counts.
 fn get_data_summary(conn: &Connection) -> Result<DataSummary> {
     let issue_count: i64 = conn
         .query_row("SELECT COUNT(*) FROM issues", [], |row| row.get(0))
@@ -159,7 +149,6 @@ fn get_data_summary(conn: &Connection) -> Result<DataSummary> {
     })
 }
-/// Format duration in milliseconds to human-readable string.
 fn format_duration(ms: i64) -> String {
     let seconds = ms / 1000;
     let minutes = seconds / 60;
@@ -176,7 +165,6 @@ fn format_duration(ms: i64) -> String {
     }
 }
-/// Format number with thousands separators.
 fn format_number(n: i64) -> String {
     let is_negative = n < 0;
     let abs_n = n.unsigned_abs();
@@ -198,10 +186,6 @@ fn format_number(n: i64) -> String {
     result
 }
-// ============================================================================
-// JSON output structures for robot mode
-// ============================================================================
 #[derive(Serialize)]
 struct SyncStatusJsonOutput {
     ok: bool,
@@ -254,7 +238,6 @@ struct SummaryJsonInfo {
     system_notes: i64,
 }
-/// Print sync status as JSON (robot mode).
 pub fn print_sync_status_json(result: &SyncStatusResult) {
     let runs = result
         .runs
@@ -306,13 +289,7 @@ pub fn print_sync_status_json(result: &SyncStatusResult) {
     println!("{}", serde_json::to_string(&output).unwrap());
 }
-// ============================================================================
-// Human-readable output
-// ============================================================================
-/// Print sync status result.
 pub fn print_sync_status(result: &SyncStatusResult) {
-    // Recent Runs section
     println!("{}", style("Recent Sync Runs").bold().underlined());
     println!();
@@ -330,7 +307,6 @@ pub fn print_sync_status(result: &SyncStatusResult) {
     println!();
-    // Cursor Positions section
     println!("{}", style("Cursor Positions").bold().underlined());
     println!();
@@ -361,7 +337,6 @@ pub fn print_sync_status(result: &SyncStatusResult) {
     println!();
-    // Data Summary section
     println!("{}", style("Data Summary").bold().underlined());
     println!();
@@ -390,7 +365,6 @@ pub fn print_sync_status(result: &SyncStatusResult) {
     );
 }
-/// Print a single run as a compact one-liner.
 fn print_run_line(run: &SyncRunInfo) {
     let status_styled = match run.status.as_str() {
         "succeeded" => style(&run.status).green(),


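The `format_number` helper in the sync-status file above inserts thousands separators by hand over the unsigned absolute value, then re-applies the sign. A stdlib-only sketch of that same approach (an illustrative reimplementation, not the exact code from the diff):

```rust
// Group the digits of |n| in threes from the right, then restore the sign.
// Mirrors the shape visible in the diff: is_negative check, unsigned_abs,
// manual string assembly.
fn format_number(n: i64) -> String {
    let is_negative = n < 0;
    let digits = n.unsigned_abs().to_string().into_bytes();
    let mut out = String::new();
    for (i, d) in digits.iter().enumerate() {
        // A comma goes before every remaining complete group of three digits.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(*d as char);
    }
    if is_negative { format!("-{out}") } else { out }
}

fn main() {
    println!("{}", format_number(1234567)); // prints "1,234,567"
}
```

Using `unsigned_abs` before formatting avoids the overflow that `n.abs()` would hit on `i64::MIN`.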
@@ -1,41 +1,31 @@
-//! CLI module with clap command definitions.
 pub mod commands;
 pub mod progress;
 use clap::{Parser, Subcommand};
 use std::io::IsTerminal;
-/// Gitlore - Local GitLab data management with semantic search
 #[derive(Parser)]
 #[command(name = "lore")]
 #[command(version, about, long_about = None)]
 pub struct Cli {
-    /// Path to config file
     #[arg(short = 'c', long, global = true)]
     pub config: Option<String>,
-    /// Machine-readable JSON output (auto-enabled when piped)
     #[arg(long, global = true, env = "LORE_ROBOT")]
     pub robot: bool,
-    /// JSON output (global shorthand)
     #[arg(short = 'J', long = "json", global = true)]
     pub json: bool,
-    /// Color output: auto (default), always, or never
     #[arg(long, global = true, value_parser = ["auto", "always", "never"], default_value = "auto")]
     pub color: String,
-    /// Suppress non-essential output
     #[arg(short = 'q', long, global = true)]
     pub quiet: bool,
-    /// Increase log verbosity (-v, -vv, -vvv)
     #[arg(short = 'v', long = "verbose", action = clap::ArgAction::Count, global = true)]
     pub verbose: u8,
-    /// Log format for stderr output: text (default) or json
     #[arg(long = "log-format", global = true, value_parser = ["text", "json"], default_value = "text")]
     pub log_format: String,
@@ -44,7 +34,6 @@ pub struct Cli {
 }
 impl Cli {
-    /// Check if robot mode is active (explicit flag, env var, or non-TTY stdout)
     pub fn is_robot_mode(&self) -> bool {
         self.robot || self.json || !std::io::stdout().is_terminal()
     }
@@ -53,104 +42,74 @@ impl Cli {
 #[derive(Subcommand)]
 #[allow(clippy::large_enum_variant)]
 pub enum Commands {
-    /// List or show issues
     Issues(IssuesArgs),
-    /// List or show merge requests
     Mrs(MrsArgs),
-    /// Ingest data from GitLab
     Ingest(IngestArgs),
-    /// Count entities in local database
     Count(CountArgs),
-    /// Show sync state
     Status,
-    /// Verify GitLab authentication
     Auth,
-    /// Check environment health
     Doctor,
-    /// Show version information
     Version,
-    /// Initialize configuration and database
     Init {
-        /// Skip overwrite confirmation
         #[arg(short = 'f', long)]
         force: bool,
-        /// Fail if prompts would be shown
         #[arg(long)]
         non_interactive: bool,
-        /// GitLab base URL (required in robot mode)
         #[arg(long)]
         gitlab_url: Option<String>,
-        /// Environment variable name holding GitLab token (required in robot mode)
         #[arg(long)]
         token_env_var: Option<String>,
-        /// Comma-separated project paths (required in robot mode)
         #[arg(long)]
         projects: Option<String>,
     },
-    /// Create timestamped database backup
     #[command(hide = true)]
     Backup,
-    /// Delete database and reset all state
     #[command(hide = true)]
     Reset {
-        /// Skip confirmation prompt
         #[arg(short = 'y', long)]
         yes: bool,
     },
-    /// Search indexed documents
     Search(SearchArgs),
-    /// Show document and index statistics
     Stats(StatsArgs),
-    /// Generate searchable documents from ingested data
     #[command(name = "generate-docs")]
     GenerateDocs(GenerateDocsArgs),
-    /// Generate vector embeddings for documents via Ollama
     Embed(EmbedArgs),
-    /// Run full sync pipeline: ingest -> generate-docs -> embed
     Sync(SyncArgs),
-    /// Run pending database migrations
     Migrate,
-    /// Quick health check: config, database, schema version
     Health,
-    /// Machine-readable command manifest for agent self-discovery
     #[command(name = "robot-docs")]
     RobotDocs,
-    /// Generate shell completions
     #[command(hide = true)]
     Completions {
-        /// Shell to generate completions for
         #[arg(value_parser = ["bash", "zsh", "fish", "powershell"])]
         shell: String,
     },
-    // --- Hidden backward-compat aliases ---
-    /// List issues or MRs (deprecated: use 'lore issues' or 'lore mrs')
     #[command(hide = true)]
     List {
-        /// Entity type to list
         #[arg(value_parser = ["issues", "mrs"])]
         entity: String,
@@ -192,36 +151,28 @@ pub enum Commands {
         source_branch: Option<String>,
     },
-    /// Show detailed entity information (deprecated: use 'lore issues <IID>' or 'lore mrs <IID>')
     #[command(hide = true)]
     Show {
-        /// Entity type to show
         #[arg(value_parser = ["issue", "mr"])]
         entity: String,
-        /// Entity IID
         iid: i64,
         #[arg(long)]
         project: Option<String>,
     },
-    /// Verify GitLab authentication (deprecated: use 'lore auth')
     #[command(hide = true, name = "auth-test")]
     AuthTest,
-    /// Show sync state (deprecated: use 'lore status')
     #[command(hide = true, name = "sync-status")]
     SyncStatus,
 }
-/// Arguments for `lore issues [IID]`
 #[derive(Parser)]
 pub struct IssuesArgs {
-    /// Issue IID (omit to list, provide to show details)
     pub iid: Option<i64>,
-    /// Maximum results
     #[arg(
         short = 'n',
         long = "limit",
@@ -230,39 +181,30 @@ pub struct IssuesArgs {
     )]
     pub limit: usize,
-    /// Filter by state (opened, closed, all)
     #[arg(short = 's', long, help_heading = "Filters")]
     pub state: Option<String>,
-    /// Filter by project path
     #[arg(short = 'p', long, help_heading = "Filters")]
     pub project: Option<String>,
-    /// Filter by author username
     #[arg(short = 'a', long, help_heading = "Filters")]
     pub author: Option<String>,
-    /// Filter by assignee username
     #[arg(short = 'A', long, help_heading = "Filters")]
     pub assignee: Option<String>,
-    /// Filter by label (repeatable, AND logic)
     #[arg(short = 'l', long, help_heading = "Filters")]
     pub label: Option<Vec<String>>,
-    /// Filter by milestone title
     #[arg(short = 'm', long, help_heading = "Filters")]
     pub milestone: Option<String>,
-    /// Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
     #[arg(long, help_heading = "Filters")]
     pub since: Option<String>,
-    /// Filter by due date (before this date, YYYY-MM-DD)
     #[arg(long = "due-before", help_heading = "Filters")]
     pub due_before: Option<String>,
-    /// Show only issues with a due date
     #[arg(
         long = "has-due",
         help_heading = "Filters",
@@ -273,18 +215,15 @@ pub struct IssuesArgs {
     #[arg(long = "no-has-due", hide = true, overrides_with = "has_due")]
     pub no_has_due: bool,
-    /// Sort field (updated, created, iid)
     #[arg(long, value_parser = ["updated", "created", "iid"], default_value = "updated", help_heading = "Sorting")]
     pub sort: String,
-    /// Sort ascending (default: descending)
     #[arg(long, help_heading = "Sorting", overrides_with = "no_asc")]
     pub asc: bool,
     #[arg(long = "no-asc", hide = true, overrides_with = "asc")]
     pub no_asc: bool,
-    /// Open first matching item in browser
     #[arg(
         short = 'o',
         long,
@@ -297,13 +236,10 @@ pub struct IssuesArgs {
     pub no_open: bool,
 }
-/// Arguments for `lore mrs [IID]`
 #[derive(Parser)]
 pub struct MrsArgs {
-    /// MR IID (omit to list, provide to show details)
     pub iid: Option<i64>,
-    /// Maximum results
     #[arg(
         short = 'n',
         long = "limit",
@@ -312,35 +248,27 @@ pub struct MrsArgs {
     )]
     pub limit: usize,
-    /// Filter by state (opened, merged, closed, locked, all)
     #[arg(short = 's', long, help_heading = "Filters")]
     pub state: Option<String>,
-    /// Filter by project path
     #[arg(short = 'p', long, help_heading = "Filters")]
     pub project: Option<String>,
-    /// Filter by author username
     #[arg(short = 'a', long, help_heading = "Filters")]
     pub author: Option<String>,
-    /// Filter by assignee username
     #[arg(short = 'A', long, help_heading = "Filters")]
     pub assignee: Option<String>,
-    /// Filter by reviewer username
     #[arg(short = 'r', long, help_heading = "Filters")]
     pub reviewer: Option<String>,
-    /// Filter by label (repeatable, AND logic)
     #[arg(short = 'l', long, help_heading = "Filters")]
     pub label: Option<Vec<String>>,
-    /// Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
     #[arg(long, help_heading = "Filters")]
     pub since: Option<String>,
-    /// Show only draft MRs
     #[arg(
         short = 'd',
         long,
@@ -349,7 +277,6 @@ pub struct MrsArgs {
     )]
     pub draft: bool,
-    /// Exclude draft MRs
     #[arg(
         short = 'D',
         long = "no-draft",
@@ -358,26 +285,21 @@ pub struct MrsArgs {
     )]
     pub no_draft: bool,
-    /// Filter by target branch
     #[arg(long, help_heading = "Filters")]
     pub target: Option<String>,
-    /// Filter by source branch
     #[arg(long, help_heading = "Filters")]
     pub source: Option<String>,
-    /// Sort field (updated, created, iid)
     #[arg(long, value_parser = ["updated", "created", "iid"], default_value = "updated", help_heading = "Sorting")]
     pub sort: String,
-    /// Sort ascending (default: descending)
     #[arg(long, help_heading = "Sorting", overrides_with = "no_asc")]
     pub asc: bool,
     #[arg(long = "no-asc", hide = true, overrides_with = "asc")]
     pub no_asc: bool,
-    /// Open first matching item in browser
     #[arg(
         short = 'o',
         long,
@@ -390,25 +312,20 @@ pub struct MrsArgs {
     pub no_open: bool,
 }
-/// Arguments for `lore ingest [ENTITY]`
 #[derive(Parser)]
 pub struct IngestArgs {
-    /// Entity to ingest (issues, mrs). Omit to ingest everything.
     #[arg(value_parser = ["issues", "mrs"])]
     pub entity: Option<String>,
-    /// Filter to single project
     #[arg(short = 'p', long)]
     pub project: Option<String>,
-    /// Override stale sync lock
     #[arg(short = 'f', long, overrides_with = "no_force")]
     pub force: bool,
     #[arg(long = "no-force", hide = true, overrides_with = "force")]
     pub no_force: bool,
-    /// Full re-sync: reset cursors and fetch all data from scratch
     #[arg(long, overrides_with = "no_full")]
     pub full: bool,
@@ -416,60 +333,46 @@
     pub no_full: bool,
 }
-/// Arguments for `lore stats`
 #[derive(Parser)]
 pub struct StatsArgs {
-    /// Run integrity checks
     #[arg(long, overrides_with = "no_check")]
     pub check: bool,
     #[arg(long = "no-check", hide = true, overrides_with = "check")]
     pub no_check: bool,
-    /// Repair integrity issues (auto-enables --check)
     #[arg(long)]
     pub repair: bool,
 }
-/// Arguments for `lore search <QUERY>`
 #[derive(Parser)]
 pub struct SearchArgs {
-    /// Search query string
     pub query: String,
-    /// Search mode (lexical, hybrid, semantic)
     #[arg(long, default_value = "hybrid", value_parser = ["lexical", "hybrid", "semantic"], help_heading = "Output")]
     pub mode: String,
-    /// Filter by source type (issue, mr, discussion)
     #[arg(long = "type", value_name = "TYPE", value_parser = ["issue", "mr", "discussion"], help_heading = "Filters")]
     pub source_type: Option<String>,
-    /// Filter by author username
     #[arg(long, help_heading = "Filters")]
     pub author: Option<String>,
-    /// Filter by project path
     #[arg(short = 'p', long, help_heading = "Filters")]
     pub project: Option<String>,
-    /// Filter by label (repeatable, AND logic)
     #[arg(long, action = clap::ArgAction::Append, help_heading = "Filters")]
     pub label: Vec<String>,
-    /// Filter by file path (trailing / for prefix match)
     #[arg(long, help_heading = "Filters")]
     pub path: Option<String>,
-    /// Filter by created after (7d, 2w, or YYYY-MM-DD)
     #[arg(long, help_heading = "Filters")]
     pub after: Option<String>,
-    /// Filter by updated after (7d, 2w, or YYYY-MM-DD)
     #[arg(long = "updated-after", help_heading = "Filters")]
     pub updated_after: Option<String>,
-    /// Maximum results (default 20, max 100)
     #[arg(
         short = 'n',
         long = "limit",
@@ -478,71 +381,57 @@ pub struct SearchArgs {
     )]
     pub limit: usize,
-    /// Show ranking explanation per result
     #[arg(long, help_heading = "Output", overrides_with = "no_explain")]
     pub explain: bool,
     #[arg(long = "no-explain", hide = true, overrides_with = "explain")]
     pub no_explain: bool,
-    /// FTS query mode: safe (default) or raw
     #[arg(long = "fts-mode", default_value = "safe", value_parser = ["safe", "raw"], help_heading = "Output")]
     pub fts_mode: String,
 }
-/// Arguments for `lore generate-docs`
 #[derive(Parser)]
 pub struct GenerateDocsArgs {
-    /// Full rebuild: seed all entities into dirty queue, then drain
     #[arg(long)]
     pub full: bool,
-    /// Filter to single project
     #[arg(short = 'p', long)]
     pub project: Option<String>,
 }
-/// Arguments for `lore sync`
 #[derive(Parser)]
 pub struct SyncArgs {
-    /// Reset cursors, fetch everything
     #[arg(long, overrides_with = "no_full")]
     pub full: bool,
     #[arg(long = "no-full", hide = true, overrides_with = "full")]
     pub no_full: bool,
-    /// Override stale lock
     #[arg(long, overrides_with = "no_force")]
     pub force: bool,
     #[arg(long = "no-force", hide = true, overrides_with = "force")]
     pub no_force: bool,
-    /// Skip embedding step
     #[arg(long)]
     pub no_embed: bool,
-    /// Skip document regeneration
     #[arg(long)]
     pub no_docs: bool,
-    /// Skip resource event fetching (overrides config)
     #[arg(long = "no-events")]
     pub no_events: bool,
 }
-/// Arguments for `lore embed`
 #[derive(Parser)]
 pub struct EmbedArgs {
-    /// Re-embed all documents (clears existing embeddings first)
     #[arg(long, overrides_with = "no_full")]
pub full: bool, pub full: bool,
#[arg(long = "no-full", hide = true, overrides_with = "full")] #[arg(long = "no-full", hide = true, overrides_with = "full")]
pub no_full: bool, pub no_full: bool,
/// Retry previously failed embeddings
#[arg(long, overrides_with = "no_retry_failed")] #[arg(long, overrides_with = "no_retry_failed")]
pub retry_failed: bool, pub retry_failed: bool,
@@ -550,14 +439,11 @@ pub struct EmbedArgs {
pub no_retry_failed: bool, pub no_retry_failed: bool,
} }
/// Arguments for `lore count <ENTITY>`
#[derive(Parser)] #[derive(Parser)]
pub struct CountArgs { pub struct CountArgs {
/// Entity type to count (issues, mrs, discussions, notes, events)
#[arg(value_parser = ["issues", "mrs", "discussions", "notes", "events"])] #[arg(value_parser = ["issues", "mrs", "discussions", "notes", "events"])]
pub entity: String, pub entity: String,
/// Parent type filter: issue or mr (for discussions/notes)
#[arg(short = 'f', long = "for", value_parser = ["issue", "mr"])] #[arg(short = 'f', long = "for", value_parser = ["issue", "mr"])]
pub for_entity: Option<String>, pub for_entity: Option<String>,
} }
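The paired `--full` / `--no-full` (and `--explain` / `--no-explain`) flags above lean on clap's `overrides_with` so that whichever flag appears last wins and the two never conflict. A minimal sketch of the downstream resolution logic, assuming a hypothetical `resolve_flag` helper that is not part of this codebase:

```rust
// Hypothetical helper illustrating paired-flag resolution. With
// overrides_with, clap guarantees at most one of the pair is set, so the
// logic reduces to: explicit flag wins, otherwise fall back to the default.
fn resolve_flag(flag_set: bool, no_flag_set: bool, default: bool) -> bool {
    if flag_set {
        true
    } else if no_flag_set {
        false
    } else {
        default
    }
}
```

The hidden `--no-*` variants exist so scripts can force a behavior off even when a config file or alias turns it on by default.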

View File

@@ -1,41 +1,17 @@
-//! Shared progress bar infrastructure.
-//!
-//! All progress bars must be created via [`multi()`] to ensure coordinated
-//! rendering. The [`SuspendingWriter`] suspends the multi-progress before
-//! writing tracing output, preventing log lines from interleaving with
-//! progress bar animations.
 use indicatif::MultiProgress;
 use std::io::Write;
 use std::sync::LazyLock;
 use tracing_subscriber::fmt::MakeWriter;
-/// Global multi-progress that coordinates all progress bar rendering.
-///
-/// Every `ProgressBar` displayed to the user **must** be registered via
-/// `multi().add(bar)`. Standalone bars bypass the coordination and will
-/// fight with other bars for the terminal line, causing rapid flashing.
 static MULTI: LazyLock<MultiProgress> = LazyLock::new(MultiProgress::new);
-/// Returns the shared [`MultiProgress`] instance.
 pub fn multi() -> &'static MultiProgress {
 &MULTI
 }
-/// A tracing `MakeWriter` that suspends the shared [`MultiProgress`] while
-/// writing, so log output doesn't interleave with progress bar animations.
-///
-/// # How it works
-///
-/// `MultiProgress::suspend` temporarily clears all active progress bars from
-/// the terminal, executes the closure (which writes the log line), then
-/// redraws the bars. This ensures a clean, flicker-free display even when
-/// logging happens concurrently with progress updates.
 #[derive(Clone)]
 pub struct SuspendingWriter;
-/// Writer returned by [`SuspendingWriter`] that buffers a single log line
-/// and flushes it inside a `MultiProgress::suspend` call.
 pub struct SuspendingWriterInner {
 buf: Vec<u8>,
 }
@@ -47,7 +23,6 @@ impl Write for SuspendingWriterInner {
 }
 fn flush(&mut self) -> std::io::Result<()> {
-// Nothing to do — actual flush happens on drop.
 Ok(())
 }
 }
@@ -102,10 +77,8 @@ mod tests {
 fn suspending_writer_buffers_and_flushes() {
 let writer = SuspendingWriter;
 let mut w = MakeWriter::make_writer(&writer);
-// Write should succeed and buffer data
 let n = w.write(b"test log line\n").unwrap();
 assert_eq!(n, 14);
-// Drop flushes via suspend — no panic means it works
 drop(w);
 }
@@ -113,7 +86,6 @@ mod tests {
 fn suspending_writer_empty_does_not_flush() {
 let writer = SuspendingWriter;
 let w = MakeWriter::make_writer(&writer);
-// Drop with empty buffer — should be a no-op
 drop(w);
 }
 }
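The buffer-on-write, flush-on-drop pattern this file's writer uses can be sketched with std only. This is a simplified stand-in, not the real type: the `suspend_and_write` callback here stands in for `MultiProgress::suspend` plus the stderr write that the indicatif-backed implementation performs.

```rust
use std::io::{self, Write};

// Std-only stand-in for SuspendingWriterInner: buffers bytes, then flushes
// the whole line through a "suspend" callback exactly once, when dropped.
struct BufferedLine<F: FnMut(&[u8])> {
    buf: Vec<u8>,
    suspend_and_write: F,
}

impl<F: FnMut(&[u8])> Write for BufferedLine<F> {
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        self.buf.extend_from_slice(data);
        Ok(data.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(()) // nothing to do; the real flush happens on drop
    }
}

impl<F: FnMut(&[u8])> Drop for BufferedLine<F> {
    fn drop(&mut self) {
        if !self.buf.is_empty() {
            // Empty buffers stay a no-op, matching the test above.
            (self.suspend_and_write)(&self.buf);
        }
    }
}
```

Deferring the write to `Drop` means each log line hits the terminal in one shot while the progress bars are cleared, which is what prevents interleaving.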

View File

@@ -1,24 +1,10 @@
 use rand::Rng;
-/// Compute next_attempt_at with exponential backoff and jitter.
-///
-/// Formula: now + min(3600000, 1000 * 2^attempt_count) * (0.9 to 1.1)
-/// - Capped at 1 hour to prevent runaway delays
-/// - ±10% jitter prevents synchronized retries after outages
-///
-/// Used by:
-/// - `dirty_sources` retry scheduling (document regeneration failures)
-/// - `pending_discussion_fetches` retry scheduling (API fetch failures)
-///
-/// Having one implementation prevents subtle divergence between queues
-/// (e.g., different caps or jitter ranges).
 pub fn compute_next_attempt_at(now: i64, attempt_count: i64) -> i64 {
-// Cap attempt_count to prevent overflow (2^30 > 1 hour anyway)
 let capped_attempts = attempt_count.min(30) as u32;
 let base_delay_ms = 1000_i64.saturating_mul(1 << capped_attempts);
-let capped_delay_ms = base_delay_ms.min(3_600_000); // 1 hour cap
+let capped_delay_ms = base_delay_ms.min(3_600_000);
-// Add ±10% jitter
 let jitter_factor = rand::thread_rng().gen_range(0.9..=1.1);
 let delay_with_jitter = (capped_delay_ms as f64 * jitter_factor) as i64;
@@ -34,7 +20,6 @@ mod tests {
 #[test]
 fn test_exponential_curve() {
 let now = 1_000_000_000_i64;
-// Each attempt should roughly double the delay (within jitter)
 for attempt in 1..=10 {
 let result = compute_next_attempt_at(now, attempt);
 let delay = result - now;
@@ -65,7 +50,7 @@ mod tests {
 #[test]
 fn test_jitter_range() {
 let now = 1_000_000_000_i64;
-let attempt = 5; // base = 32000
+let attempt = 5;
 let base = 1000_i64 * (1 << attempt);
 let min_delay = (base as f64 * 0.89) as i64;
 let max_delay = (base as f64 * 1.11) as i64;
@@ -85,7 +70,6 @@ mod tests {
 let now = 1_000_000_000_i64;
 let result = compute_next_attempt_at(now, 1);
 let delay = result - now;
-// attempt 1: base = 2000ms, with jitter: 1800-2200ms
 assert!(
 (1800..=2200).contains(&delay),
 "first retry delay: {delay}ms"
@@ -95,7 +79,6 @@ mod tests {
 #[test]
 fn test_overflow_safety() {
 let now = i64::MAX / 2;
-// Should not panic even with very large attempt_count
 let result = compute_next_attempt_at(now, i64::MAX);
 assert!(result > now);
 }
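The formula the removed doc comment stated, now + min(3600000, 1000 * 2^attempt_count) scaled by ±10% jitter, can be sketched deterministically by passing the jitter factor in as a parameter. That parameter is an assumption made for testability; the real `compute_next_attempt_at` samples it from `rand::thread_rng()`.

```rust
// Deterministic sketch of the backoff formula: the caller supplies the
// jitter factor instead of sampling it from rand's 0.9..=1.1 range.
fn next_attempt_at(now_ms: i64, attempt_count: i64, jitter_factor: f64) -> i64 {
    // Clamp the exponent so the shift cannot overflow; 2^30 seconds already
    // exceeds the one-hour ceiling anyway.
    let capped_attempts = attempt_count.clamp(0, 30) as u32;
    let base_delay_ms = 1000_i64.saturating_mul(1 << capped_attempts);
    let capped_delay_ms = base_delay_ms.min(3_600_000); // one-hour ceiling
    now_ms + (capped_delay_ms as f64 * jitter_factor) as i64
}
```

With a fixed factor of 1.0 the curve is pure doubling (2s, 4s, 8s, …) until the one-hour cap; varying the factor across processes is what de-synchronizes retries after an outage.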

View File

@@ -1,7 +1,3 @@
-//! Configuration loading and validation.
-//!
-//! Config schema mirrors the TypeScript version with serde for deserialization.
 use serde::Deserialize;
 use std::fs;
 use std::path::Path;
@@ -9,7 +5,6 @@ use std::path::Path;
 use super::error::{LoreError, Result};
 use super::paths::get_config_path;
-/// GitLab connection settings.
 #[derive(Debug, Clone, Deserialize)]
 pub struct GitLabConfig {
 #[serde(rename = "baseUrl")]
@@ -23,13 +18,11 @@ fn default_token_env_var() -> String {
 "GITLAB_TOKEN".to_string()
 }
-/// Project to sync.
 #[derive(Debug, Clone, Deserialize)]
 pub struct ProjectConfig {
 pub path: String,
 }
-/// Sync behavior settings.
 #[derive(Debug, Clone, Deserialize)]
 #[serde(default)]
 pub struct SyncConfig {
@@ -77,7 +70,6 @@ impl Default for SyncConfig {
 }
 }
-/// Storage settings.
 #[derive(Debug, Clone, Deserialize, Default)]
 #[serde(default)]
 pub struct StorageConfig {
@@ -98,7 +90,6 @@ fn default_compress_raw_payloads() -> bool {
 true
 }
-/// Embedding provider settings.
 #[derive(Debug, Clone, Deserialize)]
 #[serde(default)]
 pub struct EmbeddingConfig {
@@ -120,19 +111,15 @@ impl Default for EmbeddingConfig {
 }
 }
-/// Logging and observability settings.
 #[derive(Debug, Clone, Deserialize)]
 #[serde(default)]
 pub struct LoggingConfig {
-/// Directory for log files. Default: ~/.local/share/lore/logs/
 #[serde(rename = "logDir")]
 pub log_dir: Option<String>,
-/// Days to retain log files. Default: 30. Set to 0 to disable file logging.
 #[serde(rename = "retentionDays", default = "default_retention_days")]
 pub retention_days: u32,
-/// Enable JSON log files. Default: true.
 #[serde(rename = "fileLogging", default = "default_file_logging")]
 pub file_logging: bool,
 }
@@ -155,7 +142,6 @@ impl Default for LoggingConfig {
 }
 }
-/// Main configuration structure.
 #[derive(Debug, Clone, Deserialize)]
 pub struct Config {
 pub gitlab: GitLabConfig,
@@ -175,7 +161,6 @@ pub struct Config {
 }
 impl Config {
-/// Load and validate configuration from file.
 pub fn load(cli_override: Option<&str>) -> Result<Self> {
 let config_path = get_config_path(cli_override);
@@ -188,7 +173,6 @@ impl Config {
 Self::load_from_path(&config_path)
 }
-/// Load configuration from a specific path.
 pub fn load_from_path(path: &Path) -> Result<Self> {
 let content = fs::read_to_string(path).map_err(|e| LoreError::ConfigInvalid {
 details: format!("Failed to read config file: {e}"),
@@ -199,7 +183,6 @@ impl Config {
 details: format!("Invalid JSON: {e}"),
 })?;
-// Validate required fields
 if config.projects.is_empty() {
 return Err(LoreError::ConfigInvalid {
 details: "At least one project is required".to_string(),
@@ -214,7 +197,6 @@ impl Config {
 }
 }
-// Validate URL format
 if url::Url::parse(&config.gitlab.base_url).is_err() {
 return Err(LoreError::ConfigInvalid {
 details: format!("Invalid GitLab URL: {}", config.gitlab.base_url),
@@ -225,7 +207,6 @@ impl Config {
 }
 }
-/// Minimal config for writing during init (relies on defaults when loaded).
 #[derive(Debug, serde::Serialize)]
 pub struct MinimalConfig {
 pub gitlab: MinimalGitLabConfig,

View File

@@ -1,7 +1,3 @@
-//! Database connection and migration management.
-//!
-//! Uses rusqlite with WAL mode for crash safety.
 use rusqlite::Connection;
 use sqlite_vec::sqlite3_vec_init;
 use std::fs;
@@ -10,11 +6,8 @@ use tracing::{debug, info};
 use super::error::{LoreError, Result};
-/// Latest schema version, derived from the embedded migrations count.
-/// Used by the health check to verify databases are up-to-date.
 pub const LATEST_SCHEMA_VERSION: i32 = MIGRATIONS.len() as i32;
-/// Embedded migrations - compiled into the binary.
 const MIGRATIONS: &[(&str, &str)] = &[
 ("001", include_str!("../../migrations/001_initial.sql")),
 ("002", include_str!("../../migrations/002_issues.sql")),
@@ -53,9 +46,7 @@ const MIGRATIONS: &[(&str, &str)] = &[
 ),
 ];
-/// Create a database connection with production-grade pragmas.
 pub fn create_connection(db_path: &Path) -> Result<Connection> {
-// Register sqlite-vec extension globally (safe to call multiple times)
 #[allow(clippy::missing_transmute_annotations)]
 unsafe {
 rusqlite::ffi::sqlite3_auto_extension(Some(std::mem::transmute(
@@ -63,30 +54,26 @@ pub fn create_connection(db_path: &Path) -> Result<Connection> {
 )));
 }
-// Ensure parent directory exists
 if let Some(parent) = db_path.parent() {
 fs::create_dir_all(parent)?;
 }
 let conn = Connection::open(db_path)?;
-// Production-grade pragmas for single-user CLI
 conn.pragma_update(None, "journal_mode", "WAL")?;
-conn.pragma_update(None, "synchronous", "NORMAL")?; // Safe for WAL on local disk
+conn.pragma_update(None, "synchronous", "NORMAL")?;
 conn.pragma_update(None, "foreign_keys", "ON")?;
-conn.pragma_update(None, "busy_timeout", 5000)?; // 5s wait on lock contention
+conn.pragma_update(None, "busy_timeout", 5000)?;
-conn.pragma_update(None, "temp_store", "MEMORY")?; // Small speed win
+conn.pragma_update(None, "temp_store", "MEMORY")?;
-conn.pragma_update(None, "cache_size", -64000)?; // 64MB cache (negative = KB)
+conn.pragma_update(None, "cache_size", -64000)?;
-conn.pragma_update(None, "mmap_size", 268_435_456)?; // 256MB memory-mapped I/O
+conn.pragma_update(None, "mmap_size", 268_435_456)?;
 debug!(db_path = %db_path.display(), "Database connection created");
 Ok(conn)
 }
-/// Run all pending migrations using embedded SQL.
 pub fn run_migrations(conn: &Connection) -> Result<()> {
-// Get current schema version
 let has_version_table: bool = conn
 .query_row(
 "SELECT COUNT(*) > 0 FROM sqlite_master WHERE type='table' AND name='schema_version'",
@@ -114,9 +101,6 @@ pub fn run_migrations(conn: &Connection) -> Result<()> {
 continue;
 }
-// Wrap each migration in a transaction to prevent partial application.
-// If the migration SQL already contains BEGIN/COMMIT, execute_batch handles
-// it, but wrapping in a savepoint ensures atomicity for those that don't.
 let savepoint_name = format!("migration_{}", version);
 conn.execute_batch(&format!("SAVEPOINT {}", savepoint_name))
 .map_err(|e| LoreError::MigrationFailed {
@@ -150,7 +134,6 @@ pub fn run_migrations(conn: &Connection) -> Result<()> {
 Ok(())
 }
-/// Run migrations from filesystem (for testing or custom migrations).
 #[allow(dead_code)]
 pub fn run_migrations_from_dir(conn: &Connection, migrations_dir: &Path) -> Result<()> {
 let has_version_table: bool = conn
@@ -194,8 +177,6 @@ pub fn run_migrations_from_dir(conn: &Connection, migrations_dir: &Path) -> Resu
 let sql = fs::read_to_string(entry.path())?;
-// Wrap each migration in a savepoint to prevent partial application,
-// matching the safety guarantees of run_migrations().
 let savepoint_name = format!("migration_{}", version);
 conn.execute_batch(&format!("SAVEPOINT {}", savepoint_name))
 .map_err(|e| LoreError::MigrationFailed {
@@ -229,8 +210,6 @@ pub fn run_migrations_from_dir(conn: &Connection, migrations_dir: &Path) -> Resu
 Ok(())
 }
-/// Verify database pragmas are set correctly.
-/// Used by lore doctor command.
 pub fn verify_pragmas(conn: &Connection) -> (bool, Vec<String>) {
 let mut issues = Vec::new();
@@ -258,7 +237,6 @@ pub fn verify_pragmas(conn: &Connection) -> (bool, Vec<String>) {
 let synchronous: i32 = conn
 .pragma_query_value(None, "synchronous", |row| row.get(0))
 .unwrap_or(0);
-// NORMAL = 1
 if synchronous != 1 {
 issues.push(format!("synchronous is {synchronous}, expected 1 (NORMAL)"));
 }
@@ -266,7 +244,6 @@ pub fn verify_pragmas(conn: &Connection) -> (bool, Vec<String>) {
 (issues.is_empty(), issues)
 }
-/// Get current schema version.
 pub fn get_schema_version(conn: &Connection) -> i32 {
 let has_version_table: bool = conn
 .query_row(

View File

@@ -1,8 +1,3 @@
-//! Generic dependent fetch queue for resource events, MR closes, and MR diffs.
-//!
-//! Provides enqueue, claim, complete, fail (with exponential backoff), and
-//! stale lock reclamation operations against the `pending_dependent_fetches` table.
 use std::collections::HashMap;
 use rusqlite::Connection;
@@ -10,7 +5,6 @@ use rusqlite::Connection;
 use super::error::Result;
 use super::time::now_ms;
-/// A pending job from the dependent fetch queue.
 #[derive(Debug)]
 pub struct PendingJob {
 pub id: i64,
@@ -23,9 +17,6 @@ pub struct PendingJob {
 pub attempts: i32,
 }
-/// Enqueue a dependent fetch job. Idempotent via UNIQUE constraint (INSERT OR IGNORE).
-///
-/// Returns `true` if actually inserted (not deduped).
 pub fn enqueue_job(
 conn: &Connection,
 project_id: i64,
@@ -54,10 +45,6 @@ pub fn enqueue_job(
 Ok(changes > 0)
 }
-/// Claim a batch of jobs for processing, scoped to a specific project.
-///
-/// Atomically selects and locks jobs within a transaction. Only claims jobs
-/// where `locked_at IS NULL` and `(next_retry_at IS NULL OR next_retry_at <= now)`.
 pub fn claim_jobs(
 conn: &Connection,
 job_type: &str,
@@ -70,8 +57,6 @@ pub fn claim_jobs(
 let now = now_ms();
-// Use UPDATE ... RETURNING to atomically select and lock in one statement.
-// This eliminates the race between SELECT and UPDATE.
 let mut stmt = conn.prepare_cached(
 "UPDATE pending_dependent_fetches
 SET locked_at = ?1
@@ -109,7 +94,6 @@ pub fn claim_jobs(
 Ok(jobs)
 }
-/// Mark a job as complete (DELETE the row).
 pub fn complete_job(conn: &Connection, job_id: i64) -> Result<()> {
 conn.execute(
 "DELETE FROM pending_dependent_fetches WHERE id = ?1",
@@ -119,17 +103,9 @@ pub fn complete_job(conn: &Connection, job_id: i64) -> Result<()> {
 Ok(())
 }
-/// Mark a job as failed. Increments attempts, sets next_retry_at with exponential
-/// backoff, clears locked_at, and records the error.
-///
-/// Backoff: 30s * 2^(attempts), capped at 480s. Uses a single atomic UPDATE
-/// to avoid a read-then-write race on the `attempts` counter.
 pub fn fail_job(conn: &Connection, job_id: i64, error: &str) -> Result<()> {
 let now = now_ms();
-// Atomic increment + backoff calculation in one UPDATE.
-// MIN(attempts, 4) caps the shift to prevent overflow; the overall
-// backoff is clamped to 480 000 ms via MIN(..., 480000).
 let changes = conn.execute(
 "UPDATE pending_dependent_fetches
 SET attempts = attempts + 1,
@@ -149,9 +125,6 @@ pub fn fail_job(conn: &Connection, job_id: i64, error: &str) -> Result<()> {
 Ok(())
 }
-/// Reclaim stale locks (locked_at older than threshold).
-///
-/// Returns count of reclaimed jobs.
 pub fn reclaim_stale_locks(conn: &Connection, stale_threshold_minutes: u32) -> Result<usize> {
 let threshold_ms = now_ms() - (i64::from(stale_threshold_minutes) * 60 * 1000);
@@ -163,7 +136,6 @@ pub fn reclaim_stale_locks(conn: &Connection, stale_threshold_minutes: u32) -> R
 Ok(changes)
 }
-/// Count pending jobs by job_type, optionally scoped to a project.
 pub fn count_pending_jobs(
 conn: &Connection,
 project_id: Option<i64>,
@@ -205,11 +177,6 @@ pub fn count_pending_jobs(
 Ok(counts)
 }
-/// Count jobs that are actually claimable right now, by job_type.
-///
-/// Only counts jobs where `locked_at IS NULL` and `(next_retry_at IS NULL OR next_retry_at <= now)`,
-/// matching the exact WHERE clause used by [`claim_jobs`]. This gives an accurate total
-/// for progress bars — unlike [`count_pending_jobs`] which includes locked and backing-off jobs.
 pub fn count_claimable_jobs(conn: &Connection, project_id: i64) -> Result<HashMap<String, usize>> {
 let now = now_ms();
 let mut counts = HashMap::new();
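The backoff that `fail_job` computes inside its UPDATE statement (30s * 2^attempts, with the shift capped via MIN(attempts, 4) and the total clamped to 480s) can be sketched in Rust. The `fetch_retry_delay_ms` name is illustrative only; in the real code this arithmetic runs in SQL so the increment and reschedule stay atomic.

```rust
// Mirrors the arithmetic fail_job performs in SQL: exponential backoff
// starting at 30s, shift capped at 4 to bound the doubling, result clamped
// to 480s. Note 30_000 << 4 == 480_000, so the cap and clamp agree.
fn fetch_retry_delay_ms(attempts: i64) -> i64 {
    let shift = attempts.clamp(0, 4) as u32;
    (30_000_i64 << shift).min(480_000)
}
```

Keeping the whole computation in one UPDATE is what avoids the read-then-write race on the `attempts` counter that the removed comment described.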

View File

@@ -1,11 +1,6 @@
-//! Custom error types for gitlore.
-//!
-//! Uses thiserror for ergonomic error definitions with structured error codes.
 use serde::Serialize;
 use thiserror::Error;
-/// Error codes for programmatic error handling.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum ErrorCode {
 ConfigNotFound,
@@ -55,7 +50,6 @@ impl std::fmt::Display for ErrorCode {
 }
 impl ErrorCode {
-/// Get the exit code for this error (for robot mode).
 pub fn exit_code(&self) -> i32 {
 match self {
 Self::InternalError => 1,
@@ -80,7 +74,6 @@ impl ErrorCode {
 }
 }
-/// Main error type for gitlore.
 #[derive(Error, Debug)]
 pub enum LoreError {
 #[error("Config file not found at {path}. Run \"lore init\" first.")]
@@ -163,7 +156,6 @@ pub enum LoreError {
 }
 impl LoreError {
-/// Get the error code for programmatic handling.
 pub fn code(&self) -> ErrorCode {
 match self {
 Self::ConfigNotFound { .. } => ErrorCode::ConfigNotFound,
@@ -190,7 +182,6 @@ impl LoreError {
 }
 }
-/// Get a suggestion for how to fix this error, including inline examples.
 pub fn suggestion(&self) -> Option<&'static str> {
 match self {
 Self::ConfigNotFound { .. } => Some(
@@ -240,21 +231,14 @@ impl LoreError {
 }
 }
-/// Whether this error represents a permanent API failure that should not be retried.
-///
-/// Only 404 (not found) is truly permanent: the resource doesn't exist and never will.
-/// 403 and auth errors are NOT permanent — they may be environmental (VPN down,
-/// token rotation, temporary restrictions) and should be retried with backoff.
 pub fn is_permanent_api_error(&self) -> bool {
 matches!(self, Self::GitLabNotFound { .. })
 }
-/// Get the exit code for this error.
 pub fn exit_code(&self) -> i32 {
 self.code().exit_code()
 }
-/// Convert to robot-mode JSON error output.
 pub fn to_robot_error(&self) -> RobotError {
 RobotError {
 code: self.code().to_string(),
@@ -264,7 +248,6 @@ impl LoreError {
 }
 }
-/// Structured error for robot mode JSON output.
 #[derive(Debug, Serialize)]
 pub struct RobotError {
 pub code: String,
@@ -273,7 +256,6 @@ pub struct RobotError {
 pub suggestion: Option<String>,
 }
-/// Wrapper for robot mode error output.
 #[derive(Debug, Serialize)]
 pub struct RobotErrorOutput {
 pub error: RobotError,

View File

@@ -1,15 +1,9 @@
-//! Database upsert functions for resource events (state, label, milestone).
 use rusqlite::Connection;
 use super::error::{LoreError, Result};
 use super::time::iso_to_ms_strict;
 use crate::gitlab::types::{GitLabLabelEvent, GitLabMilestoneEvent, GitLabStateEvent};
-/// Upsert state events for an entity.
-///
-/// Uses INSERT OR REPLACE keyed on UNIQUE(gitlab_id, project_id).
-/// Caller is responsible for wrapping in a transaction if atomicity is needed.
 pub fn upsert_state_events(
 conn: &Connection,
 project_id: i64,
@@ -52,8 +46,6 @@ pub fn upsert_state_events(
 Ok(count)
 }
-/// Upsert label events for an entity.
-/// Caller is responsible for wrapping in a transaction if atomicity is needed.
 pub fn upsert_label_events(
 conn: &Connection,
 project_id: i64,
@@ -93,8 +85,6 @@ pub fn upsert_label_events(
 Ok(count)
 }
-/// Upsert milestone events for an entity.
-/// Caller is responsible for wrapping in a transaction if atomicity is needed.
 pub fn upsert_milestone_events(
 conn: &Connection,
 project_id: i64,
@@ -135,8 +125,6 @@ pub fn upsert_milestone_events(
 Ok(count)
 }
-/// Resolve entity type string to (issue_id, merge_request_id) pair.
-/// Exactly one is Some, the other is None.
 fn resolve_entity_ids(
 entity_type: &str,
 entity_local_id: i64,
@@ -150,11 +138,9 @@ fn resolve_entity_ids(
 }
 }
-/// Count resource events by type for the count command.
 pub fn count_events(conn: &Connection) -> Result<EventCounts> {
 let mut counts = EventCounts::default();
-// State events
 let row: (i64, i64) = conn
 .query_row(
 "SELECT
@@ -168,7 +154,6 @@ pub fn count_events(conn: &Connection) -> Result<EventCounts> {
 counts.state_issue = row.0 as usize;
 counts.state_mr = row.1 as usize;
-// Label events
 let row: (i64, i64) = conn
 .query_row(
 "SELECT
@@ -182,7 +167,6 @@ pub fn count_events(conn: &Connection) -> Result<EventCounts> {
 counts.label_issue = row.0 as usize;
 counts.label_mr = row.1 as usize;
-// Milestone events
 let row: (i64, i64) = conn
 .query_row(
 "SELECT
@@ -199,7 +183,6 @@ pub fn count_events(conn: &Connection) -> Result<EventCounts> {
 Ok(counts)
 }
-/// Event counts broken down by type and entity.
 #[derive(Debug, Default)]
 pub struct EventCounts {
 pub state_issue: usize,

View File

@@ -1,7 +1,3 @@
//! Crash-safe single-flight lock using heartbeat pattern.
//!
//! Prevents concurrent sync operations and allows recovery from crashed processes.
use rusqlite::{Connection, TransactionBehavior};
use std::path::PathBuf;
use std::sync::Arc;
@@ -15,17 +11,14 @@ use super::db::create_connection;
use super::error::{LoreError, Result};
use super::time::{ms_to_iso, now_ms};

/// Maximum consecutive heartbeat failures before signaling error.
const MAX_HEARTBEAT_FAILURES: u32 = 3;

/// Lock configuration options.
pub struct LockOptions {
    pub name: String,
    pub stale_lock_minutes: u32,
    pub heartbeat_interval_seconds: u32,
}

/// App lock with heartbeat for crash recovery.
pub struct AppLock {
    conn: Connection,
    db_path: PathBuf,
@@ -40,7 +33,6 @@ pub struct AppLock {
}

impl AppLock {
    /// Create a new app lock instance.
    pub fn new(conn: Connection, options: LockOptions) -> Self {
        let db_path = conn.path().map(PathBuf::from).unwrap_or_default();
@@ -58,23 +50,17 @@ impl AppLock {
        }
    }

    /// Check if heartbeat has failed (indicates lock may be compromised).
    pub fn is_heartbeat_healthy(&self) -> bool {
        !self.heartbeat_failed.load(Ordering::SeqCst)
    }

    /// Attempt to acquire the lock atomically.
    ///
    /// Returns Ok(true) if lock acquired, Err if lock is held by another active process.
    pub fn acquire(&mut self, force: bool) -> Result<bool> {
        let now = now_ms();

        // Use IMMEDIATE transaction to prevent race conditions
        let tx = self
            .conn
            .transaction_with_behavior(TransactionBehavior::Immediate)?;

        // Check for existing lock within the transaction
        let existing: Option<(String, i64, i64)> = tx
            .query_row(
                "SELECT owner, acquired_at, heartbeat_at FROM app_locks WHERE name = ?",
@@ -85,7 +71,6 @@ impl AppLock {
        match existing {
            None => {
                // No lock exists, acquire it
                tx.execute(
                    "INSERT INTO app_locks (name, owner, acquired_at, heartbeat_at) VALUES (?, ?, ?, ?)",
                    (&self.name, &self.owner, now, now),
@@ -96,7 +81,6 @@ impl AppLock {
                let is_stale = now - heartbeat_at > self.stale_lock_ms;
                if is_stale || force {
                    // Lock is stale or force override, take it
                    tx.execute(
                        "UPDATE app_locks SET owner = ?, acquired_at = ?, heartbeat_at = ? WHERE name = ?",
                        (&self.owner, now, now, &self.name),
@@ -108,13 +92,11 @@ impl AppLock {
                        "Lock acquired (override)"
                    );
                } else if existing_owner == self.owner {
                    // Re-entrant, update heartbeat
                    tx.execute(
                        "UPDATE app_locks SET heartbeat_at = ? WHERE name = ?",
                        (now, &self.name),
                    )?;
                } else {
                    // Lock held by another active process - rollback and return error
                    drop(tx);
                    return Err(LoreError::DatabaseLocked {
                        owner: existing_owner,
@@ -124,20 +106,17 @@ impl AppLock {
            }
        }

        // Commit the transaction atomically
        tx.commit()?;
        self.start_heartbeat();
        Ok(true)
    }

    /// Release the lock.
    pub fn release(&mut self) {
        if self.released.swap(true, Ordering::SeqCst) {
            return; // Already released
        }

        // Stop heartbeat thread
        if let Some(handle) = self.heartbeat_handle.take() {
            let _ = handle.join();
        }
@@ -150,7 +129,6 @@ impl AppLock {
        info!(owner = %self.owner, "Lock released");
    }

    /// Start the heartbeat thread to keep the lock alive.
    fn start_heartbeat(&mut self) {
        let name = self.name.clone();
        let owner = self.owner.clone();
@@ -161,11 +139,10 @@ impl AppLock {
        let db_path = self.db_path.clone();

        if db_path.as_os_str().is_empty() {
            return; // In-memory database, skip heartbeat
        }

        self.heartbeat_handle = Some(thread::spawn(move || {
            // Open a new connection with proper pragmas
            let conn = match create_connection(&db_path) {
                Ok(c) => c,
                Err(e) => {
@@ -175,11 +152,9 @@ impl AppLock {
                }
            };

            // Poll frequently for early exit, but only update heartbeat at full interval
            const POLL_INTERVAL: Duration = Duration::from_millis(100);

            loop {
                // Sleep in small increments, checking released flag frequently
                let mut elapsed = Duration::ZERO;
                while elapsed < interval {
                    thread::sleep(POLL_INTERVAL);
@@ -189,7 +164,6 @@ impl AppLock {
                    }
                }

                // Check once more after full interval elapsed
                if released.load(Ordering::SeqCst) {
                    break;
                }
@@ -203,12 +177,10 @@ impl AppLock {
                match result {
                    Ok(rows_affected) => {
                        if rows_affected == 0 {
                            // Lock was stolen or deleted
                            warn!(owner = %owner, "Heartbeat failed: lock no longer held");
                            heartbeat_failed.store(true, Ordering::SeqCst);
                            break;
                        }

                        // Reset failure count on success
                        failure_count.store(0, Ordering::SeqCst);
                        debug!(owner = %owner, "Heartbeat updated");
                    }


@@ -1,29 +1,13 @@
//! Logging infrastructure: dual-layer subscriber setup and log file retention.
//!
//! Provides a layered tracing subscriber with:
//! - **stderr layer**: Human-readable or JSON format, controlled by `-v` flags
//! - **file layer**: Always-on JSON output to daily-rotated log files
use std::fs;
use std::path::Path;
use tracing_subscriber::EnvFilter;

/// Build an `EnvFilter` from the verbosity count.
///
/// | Count | App Level | Dep Level |
/// |-------|-----------|-----------|
/// | 0     | INFO      | WARN      |
/// | 1     | DEBUG     | WARN      |
/// | 2     | DEBUG     | INFO      |
/// | 3+    | TRACE     | DEBUG     |
pub fn build_stderr_filter(verbose: u8, quiet: bool) -> EnvFilter {
    // RUST_LOG always wins if set
    if std::env::var("RUST_LOG").is_ok() {
        return EnvFilter::from_default_env();
    }

    // -q overrides -v for stderr
    if quiet {
        return EnvFilter::new("lore=warn,error");
    }
@@ -38,10 +22,6 @@ pub fn build_stderr_filter(verbose: u8, quiet: bool) -> EnvFilter {
    EnvFilter::new(directives)
}

/// Build an `EnvFilter` for the file layer.
///
/// Always captures DEBUG+ for `lore::*` and WARN+ for dependencies,
/// unless `RUST_LOG` is set (which overrides everything).
pub fn build_file_filter() -> EnvFilter {
    if std::env::var("RUST_LOG").is_ok() {
        return EnvFilter::from_default_env();
    }
@@ -50,10 +30,6 @@ pub fn build_file_filter() -> EnvFilter {
    EnvFilter::new("lore=debug,warn")
}

/// Delete log files older than `retention_days` from the given directory.
///
/// Only deletes files matching the `lore.YYYY-MM-DD.log` pattern.
/// Returns the number of files deleted.
pub fn cleanup_old_logs(log_dir: &Path, retention_days: u32) -> usize {
    if retention_days == 0 || !log_dir.exists() {
        return 0;
    }
@@ -72,7 +48,6 @@ pub fn cleanup_old_logs(log_dir: &Path, retention_days: u32) -> usize {
        let file_name = entry.file_name();
        let name = file_name.to_string_lossy();

        // Match pattern: lore.YYYY-MM-DD.log or lore.YYYY-MM-DD (tracing-appender format)
        if let Some(date_str) = extract_log_date(&name)
            && date_str < cutoff_date
            && fs::remove_file(entry.path()).is_ok()
@@ -84,28 +59,20 @@ pub fn cleanup_old_logs(log_dir: &Path, retention_days: u32) -> usize {
    deleted
}

/// Extract the date portion from a log filename.
///
/// Matches: `lore.YYYY-MM-DD.log` or `lore.YYYY-MM-DD`
fn extract_log_date(filename: &str) -> Option<String> {
    let rest = filename.strip_prefix("lore.")?;

    // Must have at least YYYY-MM-DD (10 ASCII chars).
    // Use get() to avoid panicking on non-ASCII filenames.
    let date_part = rest.get(..10)?;

    // Validate it looks like a date
    let parts: Vec<&str> = date_part.split('-').collect();
    if parts.len() != 3 || parts[0].len() != 4 || parts[1].len() != 2 || parts[2].len() != 2 {
        return None;
    }

    // Check all parts are numeric (also ensures ASCII)
    if !parts.iter().all(|p| p.chars().all(|c| c.is_ascii_digit())) {
        return None;
    }

    // After the date, must be end-of-string or ".log"
    let suffix = rest.get(10..)?;
    if suffix.is_empty() || suffix == ".log" {
        Some(date_part.to_string())
@@ -153,16 +120,13 @@ mod tests {
    fn test_cleanup_old_logs_deletes_old_files() {
        let dir = TempDir::new().unwrap();

        // Create old log files (well before any reasonable retention)
        File::create(dir.path().join("lore.2020-01-01.log")).unwrap();
        File::create(dir.path().join("lore.2020-01-15.log")).unwrap();

        // Create a recent log file (today)
        let today = chrono::Utc::now().format("%Y-%m-%d").to_string();
        let recent_name = format!("lore.{today}.log");
        File::create(dir.path().join(&recent_name)).unwrap();

        // Create a non-log file that should NOT be deleted
        File::create(dir.path().join("other.txt")).unwrap();

        let deleted = cleanup_old_logs(dir.path(), 7);
@@ -192,7 +156,6 @@ mod tests {
    #[test]
    fn test_build_stderr_filter_default() {
        // Can't easily assert filter contents, but verify it doesn't panic
        let _filter = build_stderr_filter(0, false);
    }
@@ -206,7 +169,6 @@ mod tests {
    #[test]
    fn test_build_stderr_filter_quiet_overrides_verbose() {
        // Quiet should win over verbose
        let _filter = build_stderr_filter(3, true);
    }


@@ -1,9 +1,3 @@
//! Performance metrics types and tracing layer for sync pipeline observability.
//!
//! Provides:
//! - [`StageTiming`]: Serializable timing/counter data for pipeline stages
//! - [`MetricsLayer`]: Custom tracing subscriber layer that captures span timing
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::Instant;
@@ -14,16 +8,10 @@ use tracing::span::{Attributes, Id, Record};
use tracing_subscriber::layer::{Context, Layer};
use tracing_subscriber::registry::LookupSpan;

/// Returns true when value is zero (for serde `skip_serializing_if`).
fn is_zero(v: &usize) -> bool {
    *v == 0
}

/// Timing and counter data for a single pipeline stage.
///
/// Supports nested sub-stages for hierarchical timing breakdowns.
/// Fields with zero/empty values are omitted from JSON output to
/// keep robot-mode payloads compact.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct StageTiming {
    pub name: String,
@@ -43,11 +31,6 @@ pub struct StageTiming {
    pub sub_stages: Vec<StageTiming>,
}

// ============================================================================
// MetricsLayer: custom tracing subscriber layer
// ============================================================================

/// Internal data tracked per open span.
struct SpanData {
    name: String,
    parent_id: Option<u64>,
@@ -57,19 +40,12 @@ struct SpanData {
    retries: usize,
}

/// Completed span data with its original ID and parent ID.
struct CompletedSpan {
    id: u64,
    parent_id: Option<u64>,
    timing: StageTiming,
}

/// Custom tracing layer that captures span timing and structured fields.
///
/// Collects data from `#[instrument]` spans and materializes it into
/// a `Vec<StageTiming>` tree via [`extract_timings`].
///
/// Thread-safe via `Arc<Mutex<>>` — suitable for concurrent span operations.
#[derive(Debug, Clone)]
pub struct MetricsLayer {
    spans: Arc<Mutex<HashMap<u64, SpanData>>>,
@@ -90,45 +66,34 @@ impl MetricsLayer {
        }
    }

    /// Extract timing tree for a completed run.
    ///
    /// Returns the top-level stages with sub-stages nested.
    /// Call after the root span closes.
    pub fn extract_timings(&self) -> Vec<StageTiming> {
        let completed = self.completed.lock().unwrap_or_else(|e| e.into_inner());
        if completed.is_empty() {
            return Vec::new();
        }

        // Build children map: parent_id -> Vec<StageTiming>
        let mut children_map: HashMap<u64, Vec<StageTiming>> = HashMap::new();
        let mut roots = Vec::new();
        let mut id_to_timing: HashMap<u64, StageTiming> = HashMap::new();

        // First pass: collect all timings by ID
        for entry in completed.iter() {
            id_to_timing.insert(entry.id, entry.timing.clone());
        }

        // Second pass: process in reverse order (children close before parents)
        // to build the tree bottom-up
        for entry in completed.iter() {
            // Attach any children that were collected for this span
            if let Some(timing) = id_to_timing.get_mut(&entry.id)
                && let Some(children) = children_map.remove(&entry.id)
            {
                timing.sub_stages = children;
            }

            if let Some(parent_id) = entry.parent_id
                && let Some(timing) = id_to_timing.remove(&entry.id)
            {
                children_map.entry(parent_id).or_default().push(timing);
            }
        }

        // Remaining entries in id_to_timing are roots
        for entry in completed.iter() {
            if entry.parent_id.is_none()
                && let Some(mut timing) = id_to_timing.remove(&entry.id)
@@ -144,7 +109,6 @@ impl MetricsLayer {
    }
}

/// Visitor that extracts field values from span attributes.
struct FieldVisitor<'a>(&'a mut HashMap<String, serde_json::Value>);

impl tracing::field::Visit for FieldVisitor<'_> {
@@ -182,7 +146,6 @@ impl tracing::field::Visit for FieldVisitor<'_> {
    }
}

/// Visitor that extracts event fields for rate-limit/retry detection.
#[derive(Default)]
struct EventVisitor {
    status_code: Option<u64>,
@@ -248,7 +211,6 @@ where
    }

    fn on_event(&self, event: &tracing::Event<'_>, ctx: Context<'_, S>) {
        // Count rate-limit and retry events on the current span
        if let Some(span_ref) = ctx.event_span(event) {
            let id = span_ref.id();
            if let Some(data) = self
@@ -317,7 +279,6 @@ where
    }
}

// Manual Debug impl since SpanData and CompletedSpan don't derive Debug
impl std::fmt::Debug for SpanData {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("SpanData")
@@ -376,7 +337,6 @@ mod tests {
        assert_eq!(json["rate_limit_hits"], 2);
        assert_eq!(json["retries"], 5);

        // Sub-stage present
        let sub = &json["sub_stages"][0];
        assert_eq!(sub["name"], "ingest_issues");
        assert_eq!(sub["project"], "group/repo");
@@ -400,7 +360,6 @@ mod tests {
        let json = serde_json::to_value(&timing).unwrap();
        let obj = json.as_object().unwrap();

        // Zero fields must be absent
        assert!(!obj.contains_key("items_skipped"));
        assert!(!obj.contains_key("errors"));
        assert!(!obj.contains_key("rate_limit_hits"));
@@ -408,7 +367,6 @@ mod tests {
        assert!(!obj.contains_key("sub_stages"));
        assert!(!obj.contains_key("project"));

        // Required fields always present
        assert!(obj.contains_key("name"));
        assert!(obj.contains_key("elapsed_ms"));
        assert!(obj.contains_key("items_processed"));
@@ -539,13 +497,12 @@ mod tests {
        tracing::subscriber::with_default(subscriber, || {
            let span = tracing::info_span!("test_stage");
            let _guard = span.enter();
            // Simulate work
        });

        let timings = metrics.extract_timings();
        assert_eq!(timings.len(), 1);
        assert_eq!(timings[0].name, "test_stage");
        assert!(timings[0].elapsed_ms < 100); // Should be near-instant
    }

    #[test]


@@ -1,5 +1,3 @@
//! Core infrastructure modules.
pub mod backoff;
pub mod config;
pub mod db;
@@ -9,9 +7,11 @@ pub mod events_db;
pub mod lock;
pub mod logging;
pub mod metrics;
pub mod note_parser;
pub mod paths;
pub mod payloads;
pub mod project;
pub mod references;
pub mod sync_run;
pub mod time;

src/core/note_parser.rs Normal file

@@ -0,0 +1,561 @@
use std::sync::LazyLock;

use regex::Regex;
use rusqlite::Connection;
use tracing::debug;

use super::error::Result;
use super::time::now_ms;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedCrossRef {
    pub reference_type: String,
    pub target_entity_type: String,
    pub target_iid: i64,
    pub target_project_path: Option<String>,
}

#[derive(Debug, Default)]
pub struct ExtractResult {
    pub inserted: usize,
    pub skipped_unresolvable: usize,
    pub parse_failures: usize,
}

static MENTIONED_RE: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(
        r"mentioned in (?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
    )
    .expect("mentioned regex is valid")
});

static CLOSED_BY_RE: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(
        r"closed by (?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
    )
    .expect("closed_by regex is valid")
});

pub fn parse_cross_refs(body: &str) -> Vec<ParsedCrossRef> {
    let mut refs = Vec::new();
    for caps in MENTIONED_RE.captures_iter(body) {
        if let Some(parsed) = capture_to_cross_ref(&caps, "mentioned") {
            refs.push(parsed);
        }
    }
    for caps in CLOSED_BY_RE.captures_iter(body) {
        if let Some(parsed) = capture_to_cross_ref(&caps, "closes") {
            refs.push(parsed);
        }
    }
    refs
}

fn capture_to_cross_ref(
    caps: &regex::Captures<'_>,
    reference_type: &str,
) -> Option<ParsedCrossRef> {
    let sigil = caps.name("sigil")?.as_str();
    let iid_str = caps.name("iid")?.as_str();
    let iid: i64 = iid_str.parse().ok()?;
    let project = caps.name("project").map(|m| m.as_str().to_owned());
    let target_entity_type = match sigil {
        "#" => "issue",
        "!" => "merge_request",
        _ => return None,
    };
    Some(ParsedCrossRef {
        reference_type: reference_type.to_owned(),
        target_entity_type: target_entity_type.to_owned(),
        target_iid: iid,
        target_project_path: project,
    })
}

struct SystemNote {
    note_id: i64,
    body: String,
    noteable_type: String,
    entity_id: i64,
}

pub fn extract_refs_from_system_notes(conn: &Connection, project_id: i64) -> Result<ExtractResult> {
    let mut result = ExtractResult::default();

    let mut stmt = conn.prepare_cached(
        "SELECT n.id, n.body, d.noteable_type,
                COALESCE(d.issue_id, d.merge_request_id) AS entity_id
         FROM notes n
         JOIN discussions d ON n.discussion_id = d.id
         WHERE n.is_system = 1
           AND n.project_id = ?1
           AND n.body IS NOT NULL",
    )?;

    let notes: Vec<SystemNote> = stmt
        .query_map([project_id], |row| {
            Ok(SystemNote {
                note_id: row.get(0)?,
                body: row.get(1)?,
                noteable_type: row.get(2)?,
                entity_id: row.get(3)?,
            })
        })?
        .filter_map(|r| r.ok())
        .collect();

    if notes.is_empty() {
        return Ok(result);
    }

    let mut insert_stmt = conn.prepare_cached(
        "INSERT OR IGNORE INTO entity_references
            (project_id, source_entity_type, source_entity_id,
             target_entity_type, target_entity_id,
             target_project_path, target_entity_iid,
             reference_type, source_method, created_at)
         VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, 'note_parse', ?9)",
    )?;

    let now = now_ms();
    for note in &notes {
        let cross_refs = parse_cross_refs(&note.body);
        if cross_refs.is_empty() {
            debug!(
                note_id = note.note_id,
                body = %note.body,
                "System note did not match any cross-reference pattern"
            );
            result.parse_failures += 1;
            continue;
        }

        let source_entity_type = noteable_type_to_entity_type(&note.noteable_type);
        for xref in &cross_refs {
            let target_entity_id = if xref.target_project_path.is_none() {
                resolve_entity_id(conn, project_id, &xref.target_entity_type, xref.target_iid)
            } else {
                resolve_cross_project_entity(
                    conn,
                    xref.target_project_path.as_deref().unwrap_or_default(),
                    &xref.target_entity_type,
                    xref.target_iid,
                )
            };

            let rows_changed = insert_stmt.execute(rusqlite::params![
                project_id,
                source_entity_type,
                note.entity_id,
                xref.target_entity_type,
                target_entity_id,
                xref.target_project_path,
                if target_entity_id.is_none() {
                    Some(xref.target_iid)
                } else {
                    None
                },
                xref.reference_type,
                now,
            ])?;

            if rows_changed > 0 {
                if target_entity_id.is_none() {
                    result.skipped_unresolvable += 1;
                } else {
                    result.inserted += 1;
                }
            }
        }
    }

    if result.inserted > 0 || result.skipped_unresolvable > 0 {
        debug!(
            inserted = result.inserted,
            unresolvable = result.skipped_unresolvable,
            parse_failures = result.parse_failures,
            "System note cross-reference extraction complete"
        );
    }

    Ok(result)
}

fn noteable_type_to_entity_type(noteable_type: &str) -> &str {
    match noteable_type {
        "Issue" => "issue",
        "MergeRequest" => "merge_request",
        _ => "issue",
    }
}

fn resolve_entity_id(
    conn: &Connection,
    project_id: i64,
    entity_type: &str,
    iid: i64,
) -> Option<i64> {
    let (table, id_col) = match entity_type {
        "issue" => ("issues", "id"),
        "merge_request" => ("merge_requests", "id"),
        _ => return None,
    };
    let sql = format!("SELECT {id_col} FROM {table} WHERE project_id = ?1 AND iid = ?2");
    conn.query_row(&sql, rusqlite::params![project_id, iid], |row| row.get(0))
        .ok()
}

fn resolve_cross_project_entity(
    conn: &Connection,
    project_path: &str,
    entity_type: &str,
    iid: i64,
) -> Option<i64> {
    let project_id: i64 = conn
        .query_row(
            "SELECT id FROM projects WHERE path_with_namespace = ?1",
            [project_path],
            |row| row.get(0),
        )
        .ok()?;
    resolve_entity_id(conn, project_id, entity_type, iid)
}
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parse_mentioned_in_mr() {
        let refs = parse_cross_refs("mentioned in !567");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].reference_type, "mentioned");
        assert_eq!(refs[0].target_entity_type, "merge_request");
        assert_eq!(refs[0].target_iid, 567);
        assert!(refs[0].target_project_path.is_none());
    }

    #[test]
    fn test_parse_mentioned_in_issue() {
        let refs = parse_cross_refs("mentioned in #234");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].reference_type, "mentioned");
        assert_eq!(refs[0].target_entity_type, "issue");
        assert_eq!(refs[0].target_iid, 234);
        assert!(refs[0].target_project_path.is_none());
    }

    #[test]
    fn test_parse_mentioned_cross_project() {
        let refs = parse_cross_refs("mentioned in group/repo!789");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].reference_type, "mentioned");
        assert_eq!(refs[0].target_entity_type, "merge_request");
        assert_eq!(refs[0].target_iid, 789);
        assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
    }

    #[test]
    fn test_parse_mentioned_cross_project_issue() {
        let refs = parse_cross_refs("mentioned in group/repo#123");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].reference_type, "mentioned");
        assert_eq!(refs[0].target_entity_type, "issue");
        assert_eq!(refs[0].target_iid, 123);
        assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
    }

    #[test]
    fn test_parse_closed_by_mr() {
        let refs = parse_cross_refs("closed by !567");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].reference_type, "closes");
        assert_eq!(refs[0].target_entity_type, "merge_request");
        assert_eq!(refs[0].target_iid, 567);
        assert!(refs[0].target_project_path.is_none());
    }

    #[test]
    fn test_parse_closed_by_cross_project() {
        let refs = parse_cross_refs("closed by group/repo!789");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].reference_type, "closes");
        assert_eq!(refs[0].target_entity_type, "merge_request");
        assert_eq!(refs[0].target_iid, 789);
        assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
    }

    #[test]
    fn test_parse_multiple_refs() {
        let refs = parse_cross_refs("mentioned in !123 and mentioned in #456");
        assert_eq!(refs.len(), 2);
        assert_eq!(refs[0].target_entity_type, "merge_request");
        assert_eq!(refs[0].target_iid, 123);
        assert_eq!(refs[1].target_entity_type, "issue");
        assert_eq!(refs[1].target_iid, 456);
    }

    #[test]
    fn test_parse_no_refs() {
        let refs = parse_cross_refs("Updated the description");
        assert!(refs.is_empty());
    }

    #[test]
    fn test_parse_non_english_note() {
        let refs = parse_cross_refs("a ajout\u{00e9} l'\u{00e9}tiquette ~bug");
        assert!(refs.is_empty());
    }

    #[test]
    fn test_parse_multi_level_group_path() {
        let refs = parse_cross_refs("mentioned in top/sub/project#123");
        assert_eq!(refs.len(), 1);
        assert_eq!(
            refs[0].target_project_path.as_deref(),
            Some("top/sub/project")
        );
        assert_eq!(refs[0].target_iid, 123);
    }

    #[test]
    fn test_parse_deeply_nested_group_path() {
        let refs = parse_cross_refs("mentioned in a/b/c/d/e!42");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].target_project_path.as_deref(), Some("a/b/c/d/e"));
        assert_eq!(refs[0].target_iid, 42);
    }

    #[test]
    fn test_parse_hyphenated_project_path() {
        let refs = parse_cross_refs("mentioned in my-group/my-project#99");
        assert_eq!(refs.len(), 1);
        assert_eq!(
            refs[0].target_project_path.as_deref(),
            Some("my-group/my-project")
        );
    }

    #[test]
    fn test_parse_dotted_project_path() {
        let refs = parse_cross_refs("mentioned in visiostack.io/backend#123");
        assert_eq!(refs.len(), 1);
        assert_eq!(
            refs[0].target_project_path.as_deref(),
            Some("visiostack.io/backend")
        );
        assert_eq!(refs[0].target_iid, 123);
    }

    #[test]
    fn test_parse_dotted_nested_project_path() {
        let refs = parse_cross_refs("closed by my.org/sub.group/my.project!42");
        assert_eq!(refs.len(), 1);
        assert_eq!(
            refs[0].target_project_path.as_deref(),
            Some("my.org/sub.group/my.project")
        );
        assert_eq!(refs[0].target_entity_type, "merge_request");
        assert_eq!(refs[0].target_iid, 42);
    }

    #[test]
    fn test_parse_self_reference_is_valid() {
        let refs = parse_cross_refs("mentioned in #123");
        assert_eq!(refs.len(), 1);
        assert_eq!(refs[0].target_iid, 123);
    }

    #[test]
    fn test_parse_mixed_mentioned_and_closed() {
        let refs = parse_cross_refs("mentioned in !10 and closed by !20");
        assert_eq!(refs.len(), 2);
        assert_eq!(refs[0].reference_type, "mentioned");
        assert_eq!(refs[0].target_iid, 10);
        assert_eq!(refs[1].reference_type, "closes");
        assert_eq!(refs[1].target_iid, 20);
    }

    fn setup_test_db() -> Connection {
        use crate::core::db::{create_connection, run_migrations};
        let conn = create_connection(std::path::Path::new(":memory:")).unwrap();
        run_migrations(&conn).unwrap();
        conn
    }

    fn seed_test_data(conn: &Connection) -> i64 {
        let now = now_ms();
        conn.execute(
            "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
             VALUES (1, 100, 'group/test-project', 'https://gitlab.com/group/test-project', ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
             VALUES (10, 1000, 1, 123, 'Test Issue', 'opened', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
             VALUES (11, 1001, 1, 456, 'Another Issue', 'opened', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
             VALUES (20, 2000, 1, 789, 'Test MR', 'opened', 'feat', 'main', 'dev', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
             VALUES (30, 'disc-aaa', 1, 10, 'Issue', ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO discussions (id, gitlab_discussion_id, project_id, merge_request_id, noteable_type, last_seen_at)
             VALUES (31, 'disc-bbb', 1, 20, 'MergeRequest', ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
             VALUES (40, 4000, 30, 1, 1, 'mentioned in !789', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
             VALUES (41, 4001, 31, 1, 1, 'mentioned in #456', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
             VALUES (42, 4002, 30, 1, 0, 'mentioned in !999', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
             VALUES (43, 4003, 30, 1, 1, 'added label ~bug', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        conn.execute(
            "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
             VALUES (44, 4004, 30, 1, 1, 'mentioned in other/project#999', ?1, ?1, ?1)",
            [now],
        )
        .unwrap();
        1
    }

    #[test]
    fn test_extract_refs_from_system_notes_integration() {
        let conn = setup_test_db();
        let project_id = seed_test_data(&conn);

        let result = extract_refs_from_system_notes(&conn, project_id).unwrap();

        assert_eq!(result.inserted, 2, "Two same-project refs should resolve");
        assert_eq!(
            result.skipped_unresolvable, 1,
            "One cross-project ref should be unresolvable"
        );
        assert_eq!(
            result.parse_failures, 1,
            "One system note has no cross-ref pattern"
);
let ref_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1 AND source_method = 'note_parse'",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(ref_count, 3, "Should have 3 entity_references rows total");
let unresolved_count: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE target_entity_id IS NULL AND source_method = 'note_parse'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
unresolved_count, 1,
"Should have 1 unresolved cross-project ref"
);
let (path, iid): (String, i64) = conn
.query_row(
"SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
[],
|row| Ok((row.get(0)?, row.get(1)?)),
)
.unwrap();
assert_eq!(path, "other/project");
assert_eq!(iid, 999);
}
#[test]
fn test_extract_refs_idempotent() {
let conn = setup_test_db();
let project_id = seed_test_data(&conn);
let result1 = extract_refs_from_system_notes(&conn, project_id).unwrap();
let result2 = extract_refs_from_system_notes(&conn, project_id).unwrap();
assert_eq!(result2.inserted, 0);
assert_eq!(result2.skipped_unresolvable, 0);
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE source_method = 'note_parse'",
[],
|row| row.get(0),
)
.unwrap();
assert_eq!(
total,
(result1.inserted + result1.skipped_unresolvable) as i64
);
}
#[test]
fn test_extract_refs_empty_project() {
let conn = setup_test_db();
let result = extract_refs_from_system_notes(&conn, 999).unwrap();
assert_eq!(result.inserted, 0);
assert_eq!(result.skipped_unresolvable, 0);
assert_eq!(result.parse_failures, 0);
}
}
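The behaviour pinned down by the tests above can be illustrated with a standalone sketch. This is a hypothetical, simplified re-implementation of `parse_cross_refs` (the real parser is not shown in this diff): it scans for "mentioned in" / "closed by" followed by an optional project path and a `#iid` (issue) or `!iid` (merge request) token.

```rust
#[derive(Debug)]
struct CrossRef {
    reference_type: &'static str,      // "mentioned" or "closes"
    target_entity_type: &'static str,  // "issue" (#) or "merge_request" (!)
    target_project_path: Option<String>,
    target_iid: i64,
}

// Hypothetical sketch; names and exact semantics are assumptions based
// only on the test cases above.
fn parse_cross_refs(body: &str) -> Vec<CrossRef> {
    let mut refs = Vec::new();
    for (marker, kind) in [("mentioned in ", "mentioned"), ("closed by ", "closes")] {
        let mut rest = body;
        while let Some(pos) = rest.find(marker) {
            rest = &rest[pos + marker.len()..];
            // The reference token runs to the next whitespace,
            // e.g. "my.org/sub.group/my.project!42".
            let token = rest.split_whitespace().next().unwrap_or("");
            if let Some(r) = parse_token(token, kind) {
                refs.push(r);
            }
        }
    }
    refs
}

fn parse_token(token: &str, kind: &'static str) -> Option<CrossRef> {
    // Exactly one of '#' or '!' separates the path from the iid.
    let (sep_idx, entity) = match (token.rfind('#'), token.rfind('!')) {
        (Some(i), None) => (i, "issue"),
        (None, Some(i)) => (i, "merge_request"),
        _ => return None,
    };
    let iid: i64 = token[sep_idx + 1..]
        .trim_end_matches(|c: char| !c.is_ascii_digit())
        .parse()
        .ok()?;
    let path = &token[..sep_idx];
    Some(CrossRef {
        reference_type: kind,
        target_entity_type: entity,
        target_project_path: (!path.is_empty()).then(|| path.to_string()),
        target_iid: iid,
    })
}

fn main() {
    // Dotted nested path attached to an MR reference.
    let refs = parse_cross_refs("mentioned in my.org/sub.group/my.project!42");
    assert_eq!(refs.len(), 1);
    assert_eq!(refs[0].target_project_path.as_deref(), Some("my.org/sub.group/my.project"));
    assert_eq!(refs[0].target_entity_type, "merge_request");
    assert_eq!(refs[0].target_iid, 42);
}
```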

View File

@@ -1,50 +1,31 @@
-//! XDG-compliant path resolution for config and data directories.
 use std::path::PathBuf;
-/// Get the path to the config file.
-///
-/// Resolution order:
-/// 1. CLI flag override (if provided)
-/// 2. LORE_CONFIG_PATH environment variable
-/// 3. XDG default (~/.config/lore/config.json)
-/// 4. Local fallback (./lore.config.json) if exists
-/// 5. Returns XDG default even if not exists
 pub fn get_config_path(cli_override: Option<&str>) -> PathBuf {
-    // 1. CLI flag override
     if let Some(path) = cli_override {
         return PathBuf::from(path);
     }
-    // 2. Environment variable
     if let Ok(path) = std::env::var("LORE_CONFIG_PATH") {
         return PathBuf::from(path);
     }
-    // 3. XDG default
     let xdg_path = get_xdg_config_dir().join("lore").join("config.json");
     if xdg_path.exists() {
         return xdg_path;
     }
-    // 4. Local fallback (for development)
     let local_path = PathBuf::from("lore.config.json");
     if local_path.exists() {
         return local_path;
     }
-    // 5. Return XDG path (will trigger not-found error if missing)
     xdg_path
 }
-/// Get the data directory path.
-/// Uses XDG_DATA_HOME or defaults to ~/.local/share/lore
 pub fn get_data_dir() -> PathBuf {
     get_xdg_data_dir().join("lore")
 }
-/// Get the database file path.
-/// Uses config override if provided, otherwise uses default in data dir.
 pub fn get_db_path(config_override: Option<&str>) -> PathBuf {
     if let Some(path) = config_override {
         return PathBuf::from(path);
@@ -52,8 +33,6 @@ pub fn get_db_path(config_override: Option<&str>) -> PathBuf {
     get_data_dir().join("lore.db")
 }
-/// Get the log directory path.
-/// Uses config override if provided, otherwise uses default in data dir.
 pub fn get_log_dir(config_override: Option<&str>) -> PathBuf {
     if let Some(path) = config_override {
         return PathBuf::from(path);
@@ -61,8 +40,6 @@ pub fn get_log_dir(config_override: Option<&str>) -> PathBuf {
     get_data_dir().join("logs")
 }
-/// Get the backup directory path.
-/// Uses config override if provided, otherwise uses default in data dir.
 pub fn get_backup_dir(config_override: Option<&str>) -> PathBuf {
     if let Some(path) = config_override {
         return PathBuf::from(path);
@@ -70,7 +47,6 @@ pub fn get_backup_dir(config_override: Option<&str>) -> PathBuf {
     get_data_dir().join("backups")
 }
-/// Get XDG config directory, falling back to ~/.config
 fn get_xdg_config_dir() -> PathBuf {
     std::env::var("XDG_CONFIG_HOME")
         .map(PathBuf::from)
@@ -81,7 +57,6 @@ fn get_xdg_config_dir() -> PathBuf {
         })
 }
-/// Get XDG data directory, falling back to ~/.local/share
 fn get_xdg_data_dir() -> PathBuf {
     std::env::var("XDG_DATA_HOME")
         .map(PathBuf::from)
@@ -102,8 +77,4 @@ mod tests {
         let path = get_config_path(Some("/custom/path.json"));
         assert_eq!(path, PathBuf::from("/custom/path.json"));
     }
-    // Note: env var tests removed - mutating process-global env vars
-    // in parallel tests is unsafe in Rust 2024. The env var code path
-    // is trivial (std::env::var) and doesn't warrant the complexity.
 }
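The five-step resolution order being removed from the doc comment can be sketched standalone. This is a minimal sketch, not the project's code: the XDG path is hard-coded, and an `exists` predicate is injected so the cascade can be exercised without touching the filesystem or process environment.

```rust
use std::path::PathBuf;

// Sketch of the documented resolution order: CLI flag, then env var,
// then XDG default, then local fallback, then XDG default regardless.
fn resolve_config_path(
    cli_override: Option<&str>,
    env_value: Option<&str>,
    exists: impl Fn(&PathBuf) -> bool,
) -> PathBuf {
    if let Some(p) = cli_override {
        return PathBuf::from(p); // 1. CLI flag wins unconditionally
    }
    if let Some(p) = env_value {
        return PathBuf::from(p); // 2. LORE_CONFIG_PATH equivalent
    }
    let xdg = PathBuf::from("/home/user/.config/lore/config.json");
    if exists(&xdg) {
        return xdg; // 3. XDG default, if present
    }
    let local = PathBuf::from("lore.config.json");
    if exists(&local) {
        return local; // 4. local development fallback
    }
    xdg // 5. returned even if missing, so callers get a clear not-found error
}

fn main() {
    // CLI override wins regardless of what exists on disk.
    let p = resolve_config_path(Some("/custom/path.json"), None, |_| true);
    assert_eq!(p, PathBuf::from("/custom/path.json"));
    // Nothing set, nothing on disk: the XDG default is still returned.
    let p = resolve_config_path(None, None, |_| false);
    assert_eq!(p, PathBuf::from("/home/user/.config/lore/config.json"));
}
```

Injecting `exists` also sidesteps the parallel-test env-var mutation problem noted in the removed test comment.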

View File

@@ -1,5 +1,3 @@
-//! Raw payload storage with optional compression and deduplication.
 use flate2::Compression;
 use flate2::read::GzDecoder;
 use flate2::write::GzEncoder;
@@ -10,26 +8,21 @@ use std::io::{Read, Write};
 use super::error::Result;
 use super::time::now_ms;
-/// Options for storing a payload.
 pub struct StorePayloadOptions<'a> {
     pub project_id: Option<i64>,
-    pub resource_type: &'a str, // 'project' | 'issue' | 'mr' | 'note' | 'discussion'
-    pub gitlab_id: &'a str, // TEXT because discussion IDs are strings
+    pub resource_type: &'a str,
+    pub gitlab_id: &'a str,
     pub json_bytes: &'a [u8],
     pub compress: bool,
 }
-/// Store a raw API payload with optional compression and deduplication.
-/// Returns the row ID (either new or existing if duplicate).
 pub fn store_payload(conn: &Connection, options: StorePayloadOptions) -> Result<i64> {
     let json_bytes = options.json_bytes;
-    // 2. SHA-256 hash the JSON bytes (pre-compression)
     let mut hasher = Sha256::new();
     hasher.update(json_bytes);
     let payload_hash = format!("{:x}", hasher.finalize());
-    // 3. Check for duplicate by (project_id, resource_type, gitlab_id, payload_hash)
     let existing: Option<i64> = conn
         .query_row(
             "SELECT id FROM raw_payloads
@@ -44,12 +37,10 @@ pub fn store_payload(conn: &Connection, options: StorePayloadOptions) -> Result<
         )
         .ok();
-    // 4. If duplicate, return existing ID
     if let Some(id) = existing {
         return Ok(id);
     }
-    // 5. Compress if requested
     let (encoding, payload_bytes): (&str, std::borrow::Cow<'_, [u8]>) = if options.compress {
         let mut encoder = GzEncoder::new(Vec::new(), Compression::default());
         encoder.write_all(json_bytes)?;
@@ -58,7 +49,6 @@ pub fn store_payload(conn: &Connection, options: StorePayloadOptions) -> Result<
         ("identity", std::borrow::Cow::Borrowed(json_bytes))
     };
-    // 6. INSERT with content_encoding
     conn.execute(
         "INSERT INTO raw_payloads
          (source, project_id, resource_type, gitlab_id, fetched_at, content_encoding, payload_hash, payload)
@@ -77,8 +67,6 @@ pub fn store_payload(conn: &Connection, options: StorePayloadOptions) -> Result<
     Ok(conn.last_insert_rowid())
 }
-/// Read a raw payload by ID, decompressing if necessary.
-/// Returns None if not found.
 pub fn read_payload(conn: &Connection, id: i64) -> Result<Option<serde_json::Value>> {
     let row: Option<(String, Vec<u8>)> = conn
         .query_row(
@@ -92,7 +80,6 @@ pub fn read_payload(conn: &Connection, id: i64) -> Result<Option<serde_json::Val
         return Ok(None);
     };
-    // Decompress if needed
     let json_bytes = if encoding == "gzip" {
         let mut decoder = GzDecoder::new(&payload_bytes[..]);
         let mut decompressed = Vec::new();
@@ -117,7 +104,6 @@ mod tests {
         let db_path = dir.path().join("test.db");
         let conn = create_connection(&db_path).unwrap();
-        // Create minimal schema for testing
         conn.execute_batch(
             "CREATE TABLE raw_payloads (
                 id INTEGER PRIMARY KEY,
@@ -212,6 +198,6 @@ mod tests {
         )
         .unwrap();
-        assert_eq!(id1, id2); // Same payload returns same ID
+        assert_eq!(id1, id2);
     }
 }

View File

@@ -2,14 +2,7 @@ use rusqlite::Connection;
 use super::error::{LoreError, Result};
-/// Resolve a project string to a project_id using cascading match:
-/// 1. Exact match on path_with_namespace
-/// 2. Case-insensitive exact match
-/// 3. Suffix match (e.g., "auth-service" matches "group/auth-service") — only if unambiguous
-/// 4. Substring match (e.g., "typescript" matches "vs/typescript-code") — only if unambiguous
-/// 5. Error with available projects list
 pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
-    // Step 1: Exact match
     let exact = conn.query_row(
         "SELECT id FROM projects WHERE path_with_namespace = ?1",
         rusqlite::params![project_str],
@@ -19,7 +12,6 @@ pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
         return Ok(id);
     }
-    // Step 2: Case-insensitive exact match
     let ci = conn.query_row(
         "SELECT id FROM projects WHERE LOWER(path_with_namespace) = LOWER(?1)",
         rusqlite::params![project_str],
@@ -29,7 +21,6 @@ pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
         return Ok(id);
     }
-    // Step 3: Suffix match (unambiguous)
     let mut suffix_stmt = conn.prepare(
         "SELECT id, path_with_namespace FROM projects
          WHERE path_with_namespace LIKE '%/' || ?1
@@ -59,7 +50,6 @@ pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
         _ => {}
     }
-    // Step 4: Case-insensitive substring match (unambiguous)
     let mut substr_stmt = conn.prepare(
         "SELECT id, path_with_namespace FROM projects
          WHERE LOWER(path_with_namespace) LIKE '%' || LOWER(?1) || '%'",
@@ -88,7 +78,6 @@ pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
         _ => {}
     }
-    // Step 5: No match — list available projects
     let mut all_stmt =
         conn.prepare("SELECT path_with_namespace FROM projects ORDER BY path_with_namespace")?;
     let all_projects: Vec<String> = all_stmt
@@ -211,7 +200,6 @@ mod tests {
         let conn = setup_db();
         insert_project(&conn, 1, "vs/python-code");
         insert_project(&conn, 2, "vs/typescript-code");
-        // "code" matches both projects
        let err = resolve_project(&conn, "code").unwrap_err();
        let msg = err.to_string();
        assert!(
@@ -225,11 +213,9 @@ mod tests {
     #[test]
     fn test_suffix_preferred_over_substring() {
-        // Suffix match (step 3) should resolve before substring (step 4)
         let conn = setup_db();
         insert_project(&conn, 1, "backend/auth-service");
         insert_project(&conn, 2, "backend/auth-service-v2");
-        // "auth-service" is an exact suffix of project 1
         let id = resolve_project(&conn, "auth-service").unwrap();
         assert_eq!(id, 1);
     }
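The cascading match documented above (exact, case-insensitive, unambiguous suffix, unambiguous substring) can be sketched without SQLite. This is a hypothetical sketch over an in-memory slice rather than the `projects` table; the function name and shape are assumptions.

```rust
// Cascading resolution: each step only fires if earlier steps found nothing,
// and the fuzzy steps (3 and 4) only resolve when exactly one project matches.
fn resolve<'a>(projects: &[&'a str], query: &str) -> Option<&'a str> {
    // 1. Exact match
    if let Some(p) = projects.iter().copied().find(|p| *p == query) {
        return Some(p);
    }
    // 2. Case-insensitive exact match
    let q = query.to_lowercase();
    if let Some(p) = projects.iter().copied().find(|p| p.to_lowercase() == q) {
        return Some(p);
    }
    // 3. Suffix match ("auth-service" matches "group/auth-service"), unambiguous only
    let needle = format!("/{query}");
    let suffix: Vec<&str> = projects.iter().copied()
        .filter(|p| p.ends_with(needle.as_str()))
        .collect();
    if suffix.len() == 1 {
        return Some(suffix[0]);
    }
    // 4. Case-insensitive substring match, unambiguous only
    let sub: Vec<&str> = projects.iter().copied()
        .filter(|p| p.to_lowercase().contains(q.as_str()))
        .collect();
    if sub.len() == 1 {
        return Some(sub[0]);
    }
    None // step 5 in the real code: error listing available projects
}

fn main() {
    let projects = ["backend/auth-service", "backend/auth-service-v2"];
    // "auth-service" is an exact suffix of only the first project, so the
    // suffix step resolves before the (ambiguous) substring step.
    assert_eq!(resolve(&projects, "auth-service"), Some("backend/auth-service"));
    // "auth" is a substring of both, so resolution fails as ambiguous.
    assert_eq!(resolve(&projects, "auth"), None);
}
```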

src/core/references.rs (new file, 551 lines)

@@ -0,0 +1,551 @@
use rusqlite::{Connection, OptionalExtension};
use tracing::info;
use super::error::Result;
use super::time::now_ms;
pub fn extract_refs_from_state_events(conn: &Connection, project_id: i64) -> Result<usize> {
let changes = conn.execute(
"INSERT OR IGNORE INTO entity_references (
project_id,
source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
reference_type, source_method, created_at
)
SELECT
rse.project_id,
'merge_request',
mr.id,
'issue',
rse.issue_id,
'closes',
'api',
rse.created_at
FROM resource_state_events rse
JOIN merge_requests mr
ON mr.project_id = rse.project_id
AND mr.iid = rse.source_merge_request_iid
WHERE rse.source_merge_request_iid IS NOT NULL
AND rse.issue_id IS NOT NULL
AND rse.project_id = ?1",
rusqlite::params![project_id],
)?;
if changes > 0 {
info!(
project_id,
references_inserted = changes,
"Extracted cross-references from state events"
);
}
Ok(changes)
}
#[derive(Debug, Clone)]
pub struct EntityReference<'a> {
pub project_id: i64,
pub source_entity_type: &'a str,
pub source_entity_id: i64,
pub target_entity_type: &'a str,
pub target_entity_id: Option<i64>,
pub target_project_path: Option<&'a str>,
pub target_entity_iid: Option<i64>,
pub reference_type: &'a str,
pub source_method: &'a str,
}
pub fn insert_entity_reference(conn: &Connection, ref_: &EntityReference<'_>) -> Result<bool> {
let now = now_ms();
let changes = conn.execute(
"INSERT OR IGNORE INTO entity_references \
(project_id, source_entity_type, source_entity_id, \
target_entity_type, target_entity_id, target_project_path, target_entity_iid, \
reference_type, source_method, created_at) \
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10)",
rusqlite::params![
ref_.project_id,
ref_.source_entity_type,
ref_.source_entity_id,
ref_.target_entity_type,
ref_.target_entity_id,
ref_.target_project_path,
ref_.target_entity_iid,
ref_.reference_type,
ref_.source_method,
now,
],
)?;
Ok(changes > 0)
}
pub fn resolve_issue_local_id(
conn: &Connection,
project_id: i64,
issue_iid: i64,
) -> Result<Option<i64>> {
let mut stmt =
conn.prepare_cached("SELECT id FROM issues WHERE project_id = ?1 AND iid = ?2")?;
let result = stmt
.query_row(rusqlite::params![project_id, issue_iid], |row| row.get(0))
.optional()?;
Ok(result)
}
pub fn resolve_project_path(conn: &Connection, gitlab_project_id: i64) -> Result<Option<String>> {
let mut stmt = conn
.prepare_cached("SELECT path_with_namespace FROM projects WHERE gitlab_project_id = ?1")?;
let result = stmt
.query_row(rusqlite::params![gitlab_project_id], |row| row.get(0))
.optional()?;
Ok(result)
}
pub fn count_references_for_source(
conn: &Connection,
source_entity_type: &str,
source_entity_id: i64,
) -> Result<usize> {
let count: i64 = conn.query_row(
"SELECT COUNT(*) FROM entity_references \
WHERE source_entity_type = ?1 AND source_entity_id = ?2",
rusqlite::params![source_entity_type, source_entity_id],
|row| row.get(0),
)?;
Ok(count as usize)
}
#[cfg(test)]
mod tests {
use super::*;
use crate::core::db::{create_connection, run_migrations};
use std::path::Path;
fn setup_test_db() -> Connection {
let conn = create_connection(Path::new(":memory:")).unwrap();
run_migrations(&conn).unwrap();
conn
}
fn seed_project_issue_mr(conn: &Connection) -> (i64, i64, i64) {
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (1, 200, 10, 1, 'Test issue', 'closed', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (1, 300, 5, 1, 'Test MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
[],
)
.unwrap();
(1, 1, 1)
}
#[test]
fn test_extract_refs_from_state_events_basic() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 1, "Should insert exactly one reference");
let (src_type, src_id, tgt_type, tgt_id, ref_type, method): (
String,
i64,
String,
i64,
String,
String,
) = conn
.query_row(
"SELECT source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
reference_type, source_method
FROM entity_references WHERE project_id = ?1",
[project_id],
|row| {
Ok((
row.get(0)?,
row.get(1)?,
row.get(2)?,
row.get(3)?,
row.get(4)?,
row.get(5)?,
))
},
)
.unwrap();
assert_eq!(src_type, "merge_request");
assert_eq!(src_id, mr_id, "Source should be the MR's local DB id");
assert_eq!(tgt_type, "issue");
assert_eq!(tgt_id, issue_id, "Target should be the issue's local DB id");
assert_eq!(ref_type, "closes");
assert_eq!(method, "api");
}
#[test]
fn test_extract_refs_dedup_with_closes_issues() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO entity_references
(project_id, source_entity_type, source_entity_id,
target_entity_type, target_entity_id,
reference_type, source_method, created_at)
VALUES (?1, 'merge_request', ?2, 'issue', ?3, 'closes', 'api', 3000)",
rusqlite::params![project_id, mr_id, issue_id],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 0, "Should not insert duplicate reference");
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(total, 1, "Should still have exactly one reference");
}
#[test]
fn test_extract_refs_no_source_mr() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, NULL)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count, 0, "Should not create refs when no source MR");
}
#[test]
fn test_extract_refs_mr_not_synced() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, ?1, ?2, NULL, 'closed', 3000, 999)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(
count, 0,
"Should not create ref when MR is not synced locally"
);
}
#[test]
fn test_extract_refs_idempotent() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count1 = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count1, 1);
let count2 = extract_refs_from_state_events(&conn, project_id).unwrap();
assert_eq!(count2, 0, "Second run should insert nothing (idempotent)");
}
#[test]
fn test_extract_refs_multiple_events_same_mr_issue() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, ?1, ?2, NULL, 'closed', 4000, 5)",
rusqlite::params![project_id, issue_id],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
assert!(count <= 2, "At most 2 inserts attempted");
let total: i64 = conn
.query_row(
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
[project_id],
|row| row.get(0),
)
.unwrap();
assert_eq!(
total, 1,
"Only one unique reference should exist for same MR->issue pair"
);
}
#[test]
fn test_extract_refs_scoped_to_project() {
let conn = setup_test_db();
seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
VALUES (2, 101, 'group/other', 'https://gitlab.example.com/group/other', 1000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (2, 201, 10, 2, 'Other issue', 'closed', 1000, 2000, 2000)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
VALUES (2, 301, 5, 2, 'Other MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
[],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (1, 1, 1, NULL, 'closed', 3000, 5)",
[],
)
.unwrap();
conn.execute(
"INSERT INTO resource_state_events
(gitlab_id, project_id, issue_id, merge_request_id, state,
created_at, source_merge_request_iid)
VALUES (2, 2, 2, NULL, 'closed', 3000, 5)",
[],
)
.unwrap();
let count = extract_refs_from_state_events(&conn, 1).unwrap();
assert_eq!(count, 1);
let total: i64 = conn
.query_row("SELECT COUNT(*) FROM entity_references", [], |row| {
row.get(0)
})
.unwrap();
assert_eq!(total, 1, "Only project 1 refs should be created");
}
#[test]
fn test_insert_entity_reference_creates_row() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(issue_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
assert!(inserted);
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 1);
}
#[test]
fn test_insert_entity_reference_idempotent() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(issue_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
let first = insert_entity_reference(&conn, &ref_).unwrap();
assert!(first);
let second = insert_entity_reference(&conn, &ref_).unwrap();
assert!(!second, "Duplicate insert should be ignored");
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 1, "Still just one reference");
}
#[test]
fn test_insert_entity_reference_cross_project_unresolved() {
let conn = setup_test_db();
let (project_id, _issue_id, mr_id) = seed_project_issue_mr(&conn);
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: None,
target_project_path: Some("other-group/other-project"),
target_entity_iid: Some(99),
reference_type: "closes",
source_method: "api",
};
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
assert!(inserted);
let (target_id, target_path, target_iid): (Option<i64>, Option<String>, Option<i64>) = conn
.query_row(
"SELECT target_entity_id, target_project_path, target_entity_iid \
FROM entity_references WHERE source_entity_id = ?1",
[mr_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
)
.unwrap();
assert!(target_id.is_none());
assert_eq!(target_path, Some("other-group/other-project".to_string()));
assert_eq!(target_iid, Some(99));
}
#[test]
fn test_insert_multiple_closes_references() {
let conn = setup_test_db();
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
conn.execute(
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
VALUES (10, 210, 11, ?1, 'Second issue', 'opened', 1000, 2000, 2000)",
rusqlite::params![project_id],
)
.unwrap();
let issue_id_2 = 10i64;
for target_id in [issue_id, issue_id_2] {
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_id,
target_entity_type: "issue",
target_entity_id: Some(target_id),
target_project_path: None,
target_entity_iid: None,
reference_type: "closes",
source_method: "api",
};
insert_entity_reference(&conn, &ref_).unwrap();
}
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
assert_eq!(count, 2);
}
#[test]
fn test_resolve_issue_local_id_found() {
let conn = setup_test_db();
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
let resolved = resolve_issue_local_id(&conn, project_id, 10).unwrap();
assert_eq!(resolved, Some(issue_id));
}
#[test]
fn test_resolve_issue_local_id_not_found() {
let conn = setup_test_db();
let (project_id, _issue_id, _mr_id) = seed_project_issue_mr(&conn);
let resolved = resolve_issue_local_id(&conn, project_id, 999).unwrap();
assert!(resolved.is_none());
}
#[test]
fn test_resolve_project_path_found() {
let conn = setup_test_db();
seed_project_issue_mr(&conn);
let path = resolve_project_path(&conn, 100).unwrap();
assert_eq!(path, Some("group/repo".to_string()));
}
#[test]
fn test_resolve_project_path_not_found() {
let conn = setup_test_db();
let path = resolve_project_path(&conn, 999).unwrap();
assert!(path.is_none());
}
}

View File

@@ -1,25 +1,14 @@
-//! Sync run lifecycle recorder.
-//!
-//! Encapsulates the INSERT-on-start, UPDATE-on-finish lifecycle for the
-//! `sync_runs` table, enabling sync history tracking and observability.
 use rusqlite::Connection;
 use super::error::Result;
 use super::metrics::StageTiming;
 use super::time::now_ms;
-/// Records a single sync run's lifecycle in the `sync_runs` table.
-///
-/// Created via [`start`](Self::start), then finalized with either
-/// [`succeed`](Self::succeed) or [`fail`](Self::fail). Both finalizers
-/// consume `self` to enforce single-use at compile time.
 pub struct SyncRunRecorder {
     row_id: i64,
 }
 impl SyncRunRecorder {
-    /// Insert a new `sync_runs` row with `status='running'`.
     pub fn start(conn: &Connection, command: &str, run_id: &str) -> Result<Self> {
         let now = now_ms();
         conn.execute(
@@ -31,7 +20,6 @@ impl SyncRunRecorder {
         Ok(Self { row_id })
     }
-    /// Mark run as succeeded with full metrics.
     pub fn succeed(
         self,
         conn: &Connection,
@@ -57,7 +45,6 @@ impl SyncRunRecorder {
         Ok(())
     }
-    /// Mark run as failed with error message and optional partial metrics.
     pub fn fail(
         self,
         conn: &Connection,
@@ -158,7 +145,6 @@ mod tests {
         assert_eq!(total_items, 50);
         assert_eq!(total_errors, 2);
-        // Verify metrics_json is parseable
         let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
         assert_eq!(parsed.len(), 1);
         assert_eq!(parsed[0].name, "ingest");
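The removed doc comment noted that both finalizers consume `self`, enforcing single use at compile time. A minimal standalone sketch of that pattern, with a `Vec<String>` log standing in for the real `sync_runs` table (the `RunRecorder` name and log shape are illustrative, not the crate's actual types):

```rust
// Consume-self lifecycle pattern: `start` hands out a token, and the
// finalizers take `self` by value, so the compiler rejects calling a
// second finalizer on the same run.
struct RunRecorder {
    row_id: i64,
}

impl RunRecorder {
    fn start(log: &mut Vec<String>, command: &str) -> RunRecorder {
        log.push(format!("running: {command}"));
        RunRecorder { row_id: log.len() as i64 - 1 }
    }

    // Taking `self` (not `&self`) moves the recorder into the call.
    fn succeed(self, log: &mut Vec<String>, items: usize) {
        log[self.row_id as usize] = format!("succeeded: {items} items");
    }

    fn fail(self, log: &mut Vec<String>, err: &str) {
        log[self.row_id as usize] = format!("failed: {err}");
    }
}

fn main() {
    let mut log = Vec::new();
    let run = RunRecorder::start(&mut log, "sync");
    run.succeed(&mut log, 50);
    // run.fail(&mut log, "oops"); // would not compile: `run` already moved
    println!("{}", log[0]);
}
```

The move-based guarantee costs nothing at runtime; the row id is the only state carried between start and finish.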

View File

@@ -1,39 +1,24 @@
-//! Time utilities for consistent timestamp handling.
-//!
-//! All database *_at columns use milliseconds since epoch for consistency.
 use chrono::{DateTime, Utc};
-/// Convert GitLab API ISO 8601 timestamp to milliseconds since epoch.
 pub fn iso_to_ms(iso_string: &str) -> Option<i64> {
     DateTime::parse_from_rfc3339(iso_string)
         .ok()
         .map(|dt| dt.timestamp_millis())
 }
-/// Convert milliseconds since epoch to ISO 8601 string.
 pub fn ms_to_iso(ms: i64) -> String {
     DateTime::from_timestamp_millis(ms)
         .map(|dt| dt.to_rfc3339())
         .unwrap_or_else(|| "Invalid timestamp".to_string())
 }
-/// Get current time in milliseconds since epoch.
 pub fn now_ms() -> i64 {
     Utc::now().timestamp_millis()
 }
-/// Parse a relative time string (7d, 2w, 1m) or ISO date into ms epoch.
-///
-/// Returns the timestamp as of which to filter (cutoff point).
-/// - `7d` = 7 days ago
-/// - `2w` = 2 weeks ago
-/// - `1m` = 1 month ago (30 days)
-/// - `2024-01-15` = midnight UTC on that date
 pub fn parse_since(input: &str) -> Option<i64> {
     let input = input.trim();
-    // Try relative format: Nd, Nw, Nm
     if let Some(num_str) = input.strip_suffix('d') {
         let days: i64 = num_str.parse().ok()?;
         return Some(now_ms() - (days * 24 * 60 * 60 * 1000));
@@ -49,25 +34,20 @@ pub fn parse_since(input: &str) -> Option<i64> {
         return Some(now_ms() - (months * 30 * 24 * 60 * 60 * 1000));
     }
-    // Try ISO date: YYYY-MM-DD
     if input.len() == 10 && input.chars().filter(|&c| c == '-').count() == 2 {
         let iso_full = format!("{input}T00:00:00Z");
         return iso_to_ms(&iso_full);
     }
-    // Try full ISO 8601
     iso_to_ms(input)
 }
-/// Convert ISO 8601 timestamp to milliseconds with strict error handling.
-/// Returns Err with a descriptive message if the timestamp is invalid.
 pub fn iso_to_ms_strict(iso_string: &str) -> Result<i64, String> {
     DateTime::parse_from_rfc3339(iso_string)
         .map(|dt| dt.timestamp_millis())
         .map_err(|_| format!("Invalid timestamp: {}", iso_string))
 }
-/// Convert optional ISO 8601 timestamp to optional milliseconds (strict).
 pub fn iso_to_ms_opt_strict(iso_string: &Option<String>) -> Result<Option<i64>, String> {
     match iso_string {
         Some(s) => iso_to_ms_strict(s).map(Some),
@@ -75,7 +55,6 @@ pub fn iso_to_ms_opt_strict(iso_string: &Option<String>) -> Result<Option<i64>,
     }
 }
-/// Format milliseconds epoch to human-readable full datetime.
 pub fn format_full_datetime(ms: i64) -> String {
     DateTime::from_timestamp_millis(ms)
         .map(|dt| dt.format("%Y-%m-%d %H:%M UTC").to_string())
@@ -101,7 +80,7 @@ mod tests {
     #[test]
     fn test_now_ms() {
         let now = now_ms();
-        assert!(now > 1700000000000); // After 2023
+        assert!(now > 1700000000000);
     }
     #[test]
@@ -109,7 +88,7 @@ mod tests {
         let now = now_ms();
         let seven_days = parse_since("7d").unwrap();
         let expected = now - (7 * 24 * 60 * 60 * 1000);
-        assert!((seven_days - expected).abs() < 1000); // Within 1 second
+        assert!((seven_days - expected).abs() < 1000);
     }
     #[test]
@@ -132,7 +111,6 @@ mod tests {
     fn test_parse_since_iso_date() {
        let ms = parse_since("2024-01-15").unwrap();
        assert!(ms > 0);
-        // Should be midnight UTC on that date
        let expected = iso_to_ms("2024-01-15T00:00:00Z").unwrap();
        assert_eq!(ms, expected);
    }
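The removed `parse_since` docs spelled out the relative formats (`7d`, `2w`, and `1m` as a 30-day month). A stdlib-only sketch of just that relative-suffix branch, assuming the same millisecond arithmetic; the real function additionally falls through to chrono's RFC 3339 parsing, and `parse_since_relative` is a hypothetical name for this excerpt:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Milliseconds since epoch, like the chrono-based `now_ms` above.
fn now_ms() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("clock before epoch")
        .as_millis() as i64
}

// Relative-suffix branch only: Nd, Nw, Nm (a "month" is 30 days).
fn parse_since_relative(input: &str) -> Option<i64> {
    const DAY_MS: i64 = 24 * 60 * 60 * 1000;
    let input = input.trim();
    let (num_str, unit_ms) = if let Some(n) = input.strip_suffix('d') {
        (n, DAY_MS)
    } else if let Some(n) = input.strip_suffix('w') {
        (n, 7 * DAY_MS)
    } else if let Some(n) = input.strip_suffix('m') {
        (n, 30 * DAY_MS)
    } else {
        return None; // not relative; real code tries ISO formats here
    };
    let count: i64 = num_str.parse().ok()?;
    Some(now_ms() - count * unit_ms)
}

fn main() {
    let cutoff = parse_since_relative("7d").expect("valid relative spec");
    assert!(cutoff < now_ms());
    println!("7d cutoff: {cutoff}");
}
```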

View File

@@ -9,7 +9,6 @@ use super::truncation::{
 };
 use crate::core::error::Result;
-/// Source type for documents.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
 #[serde(rename_all = "snake_case")]
 pub enum SourceType {
@@ -27,10 +26,6 @@ impl SourceType {
         }
     }
-    /// Parse from CLI input, accepting common aliases.
-    ///
-    /// Accepts: "issue", "issues", "mr", "mrs", "merge_request", "merge_requests",
-    /// "discussion", "discussions"
     pub fn parse(s: &str) -> Option<Self> {
         match s.to_lowercase().as_str() {
             "issue" | "issues" => Some(Self::Issue),
@@ -47,7 +42,6 @@ impl std::fmt::Display for SourceType {
     }
 }
-/// Generated document ready for storage.
 #[derive(Debug, Clone)]
 pub struct DocumentData {
     pub source_type: SourceType,
@@ -68,16 +62,12 @@ pub struct DocumentData {
     pub truncated_reason: Option<String>,
 }
-/// Compute SHA-256 hash of content.
 pub fn compute_content_hash(content: &str) -> String {
     let mut hasher = Sha256::new();
     hasher.update(content.as_bytes());
     format!("{:x}", hasher.finalize())
 }
-/// Compute SHA-256 hash over a sorted list of strings.
-/// Used for labels_hash and paths_hash to detect changes efficiently.
-/// Sorts by index reference to avoid cloning, hashes incrementally to avoid join allocation.
 pub fn compute_list_hash(items: &[String]) -> String {
     let mut indices: Vec<usize> = (0..items.len()).collect();
     indices.sort_by(|a, b| items[*a].cmp(&items[*b]));
@@ -91,10 +81,7 @@ pub fn compute_list_hash(items: &[String]) -> String {
     format!("{:x}", hasher.finalize())
 }
-/// Extract a searchable document from an issue.
-/// Returns None if the issue has been deleted from the DB.
 pub fn extract_issue_document(conn: &Connection, issue_id: i64) -> Result<Option<DocumentData>> {
-    // Query main issue entity with project info
     let row = conn.query_row(
         "SELECT i.id, i.iid, i.title, i.description, i.state, i.author_username,
                 i.created_at, i.updated_at, i.web_url,
@@ -105,17 +92,17 @@ pub fn extract_issue_document(conn: &Connection, issue_id: i64) -> Result<Option
         rusqlite::params![issue_id],
         |row| {
             Ok((
-                row.get::<_, i64>(0)?, // id
-                row.get::<_, i64>(1)?, // iid
-                row.get::<_, Option<String>>(2)?, // title
-                row.get::<_, Option<String>>(3)?, // description
-                row.get::<_, String>(4)?, // state
-                row.get::<_, Option<String>>(5)?, // author_username
-                row.get::<_, i64>(6)?, // created_at
-                row.get::<_, i64>(7)?, // updated_at
-                row.get::<_, Option<String>>(8)?, // web_url
-                row.get::<_, String>(9)?, // path_with_namespace
-                row.get::<_, i64>(10)?, // project_id
+                row.get::<_, i64>(0)?,
+                row.get::<_, i64>(1)?,
+                row.get::<_, Option<String>>(2)?,
+                row.get::<_, Option<String>>(3)?,
+                row.get::<_, String>(4)?,
+                row.get::<_, Option<String>>(5)?,
+                row.get::<_, i64>(6)?,
+                row.get::<_, i64>(7)?,
+                row.get::<_, Option<String>>(8)?,
+                row.get::<_, String>(9)?,
+                row.get::<_, i64>(10)?,
             ))
         },
     );
@@ -138,7 +125,6 @@ pub fn extract_issue_document(conn: &Connection, issue_id: i64) -> Result<Option
         Err(e) => return Err(e.into()),
     };
-    // Query labels via junction table
     let mut label_stmt = conn.prepare_cached(
         "SELECT l.name FROM issue_labels il
          JOIN labels l ON l.id = il.label_id
@@ -149,10 +135,8 @@ pub fn extract_issue_document(conn: &Connection, issue_id: i64) -> Result<Option
         .query_map(rusqlite::params![id], |row| row.get(0))?
         .collect::<std::result::Result<Vec<_>, _>>()?;
-    // Build labels JSON array string
     let labels_json = serde_json::to_string(&labels).unwrap_or_else(|_| "[]".to_string());
-    // Format content_text per PRD template
     let display_title = title.as_deref().unwrap_or("(untitled)");
     let mut content = format!(
         "[[Issue]] #{}: {}\nProject: {}\n",
@@ -167,16 +151,14 @@ pub fn extract_issue_document(conn: &Connection, issue_id: i64) -> Result<Option
         content.push_str(&format!("Author: @{}\n", author));
     }
-    // Add description section only if description is Some
     if let Some(ref desc) = description {
         content.push_str("\n--- Description ---\n\n");
         content.push_str(desc);
     }
     let labels_hash = compute_list_hash(&labels);
-    let paths_hash = compute_list_hash(&[]); // Issues have no paths
-    // Apply hard cap truncation for safety, then hash the final stored content
+    let paths_hash = compute_list_hash(&[]);
     let hard_cap = truncate_hard_cap(&content);
     let content_hash = compute_content_hash(&hard_cap.content);
@@ -200,8 +182,6 @@ pub fn extract_issue_document(conn: &Connection, issue_id: i64) -> Result<Option
     }))
 }
-/// Extract a searchable document from a merge request.
-/// Returns None if the MR has been deleted from the DB.
 pub fn extract_mr_document(conn: &Connection, mr_id: i64) -> Result<Option<DocumentData>> {
     let row = conn.query_row(
         "SELECT m.id, m.iid, m.title, m.description, m.state, m.author_username,
@@ -214,19 +194,19 @@ pub fn extract_mr_document(conn: &Connection, mr_id: i64) -> Result<Option<Docum
         rusqlite::params![mr_id],
         |row| {
             Ok((
-                row.get::<_, i64>(0)?, // id
-                row.get::<_, i64>(1)?, // iid
-                row.get::<_, Option<String>>(2)?, // title
-                row.get::<_, Option<String>>(3)?, // description
-                row.get::<_, Option<String>>(4)?, // state
-                row.get::<_, Option<String>>(5)?, // author_username
-                row.get::<_, Option<String>>(6)?, // source_branch
-                row.get::<_, Option<String>>(7)?, // target_branch
-                row.get::<_, Option<i64>>(8)?, // created_at (nullable in schema)
-                row.get::<_, Option<i64>>(9)?, // updated_at (nullable in schema)
-                row.get::<_, Option<String>>(10)?, // web_url
-                row.get::<_, String>(11)?, // path_with_namespace
-                row.get::<_, i64>(12)?, // project_id
+                row.get::<_, i64>(0)?,
+                row.get::<_, i64>(1)?,
+                row.get::<_, Option<String>>(2)?,
+                row.get::<_, Option<String>>(3)?,
+                row.get::<_, Option<String>>(4)?,
+                row.get::<_, Option<String>>(5)?,
+                row.get::<_, Option<String>>(6)?,
+                row.get::<_, Option<String>>(7)?,
+                row.get::<_, Option<i64>>(8)?,
+                row.get::<_, Option<i64>>(9)?,
+                row.get::<_, Option<String>>(10)?,
+                row.get::<_, String>(11)?,
+                row.get::<_, i64>(12)?,
             ))
         },
     );
@@ -251,7 +231,6 @@ pub fn extract_mr_document(conn: &Connection, mr_id: i64) -> Result<Option<Docum
         Err(e) => return Err(e.into()),
     };
-    // Query labels via junction table
     let mut label_stmt = conn.prepare_cached(
         "SELECT l.name FROM mr_labels ml
          JOIN labels l ON l.id = ml.label_id
@@ -278,7 +257,6 @@ pub fn extract_mr_document(conn: &Connection, mr_id: i64) -> Result<Option<Docum
     if let Some(ref author) = author_username {
         content.push_str(&format!("Author: @{}\n", author));
     }
-    // Source line: source_branch -> target_branch
     if let (Some(src), Some(tgt)) = (&source_branch, &target_branch) {
         content.push_str(&format!("Source: {} -> {}\n", src, tgt));
     }
@@ -291,7 +269,6 @@ pub fn extract_mr_document(conn: &Connection, mr_id: i64) -> Result<Option<Docum
     let labels_hash = compute_list_hash(&labels);
     let paths_hash = compute_list_hash(&[]);
-    // Apply hard cap truncation for safety, then hash the final stored content
     let hard_cap = truncate_hard_cap(&content);
     let content_hash = compute_content_hash(&hard_cap.content);
@@ -315,20 +292,16 @@ pub fn extract_mr_document(conn: &Connection, mr_id: i64) -> Result<Option<Docum
     }))
 }
-/// Format ms epoch as YYYY-MM-DD date string.
 fn format_date(ms: i64) -> String {
     DateTime::from_timestamp_millis(ms)
         .map(|dt| dt.format("%Y-%m-%d").to_string())
         .unwrap_or_else(|| "unknown".to_string())
 }
-/// Extract a searchable document from a discussion thread.
-/// Returns None if the discussion or its parent has been deleted.
 pub fn extract_discussion_document(
     conn: &Connection,
     discussion_id: i64,
 ) -> Result<Option<DocumentData>> {
-    // Query discussion metadata
     let disc_row = conn.query_row(
         "SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id,
                 p.path_with_namespace, p.id AS project_id
@@ -338,12 +311,12 @@ pub fn extract_discussion_document(
         rusqlite::params![discussion_id],
         |row| {
             Ok((
-                row.get::<_, i64>(0)?, // id
-                row.get::<_, String>(1)?, // noteable_type
-                row.get::<_, Option<i64>>(2)?, // issue_id
-                row.get::<_, Option<i64>>(3)?, // merge_request_id
-                row.get::<_, String>(4)?, // path_with_namespace
-                row.get::<_, i64>(5)?, // project_id
+                row.get::<_, i64>(0)?,
+                row.get::<_, String>(1)?,
+                row.get::<_, Option<i64>>(2)?,
+                row.get::<_, Option<i64>>(3)?,
+                row.get::<_, String>(4)?,
+                row.get::<_, i64>(5)?,
             ))
         },
     );
@@ -355,7 +328,6 @@ pub fn extract_discussion_document(
         Err(e) => return Err(e.into()),
     };
-    // Query parent entity
     let (_parent_iid, parent_title, parent_web_url, parent_type_prefix, labels) =
         match noteable_type.as_str() {
             "Issue" => {
@@ -379,7 +351,6 @@ pub fn extract_discussion_document(
                 Err(rusqlite::Error::QueryReturnedNoRows) => return Ok(None),
                 Err(e) => return Err(e.into()),
             };
-                // Query parent labels
                 let mut label_stmt = conn.prepare_cached(
                     "SELECT l.name FROM issue_labels il
                      JOIN labels l ON l.id = il.label_id
@@ -413,7 +384,6 @@ pub fn extract_discussion_document(
                 Err(rusqlite::Error::QueryReturnedNoRows) => return Ok(None),
                 Err(e) => return Err(e.into()),
             };
-                // Query parent labels
                 let mut label_stmt = conn.prepare_cached(
                     "SELECT l.name FROM mr_labels ml
                      JOIN labels l ON l.id = ml.label_id
@@ -429,7 +399,6 @@ pub fn extract_discussion_document(
             _ => return Ok(None),
         };
-    // Query non-system notes in thread order
     let mut note_stmt = conn.prepare_cached(
         "SELECT n.author_username, n.body, n.created_at, n.gitlab_id,
                 n.note_type, n.position_old_path, n.position_new_path
@@ -454,7 +423,6 @@ pub fn extract_discussion_document(
                 body: row.get(1)?,
                 created_at: row.get(2)?,
                 gitlab_id: row.get(3)?,
-                // index 4 is note_type (unused here)
                 old_path: row.get(5)?,
                 new_path: row.get(6)?,
             })
@@ -465,7 +433,6 @@ pub fn extract_discussion_document(
         return Ok(None);
     }
-    // Extract DiffNote paths (deduplicated, sorted)
     let mut path_set = BTreeSet::new();
     for note in &notes {
         if let Some(ref p) = note.old_path
@@ -481,16 +448,13 @@ pub fn extract_discussion_document(
     }
     let paths: Vec<String> = path_set.into_iter().collect();
-    // Construct URL: parent_web_url#note_{first_note_gitlab_id}
     let first_note_gitlab_id = notes[0].gitlab_id;
     let url = parent_web_url
         .as_ref()
         .map(|wu| format!("{}#note_{}", wu, first_note_gitlab_id));
-    // First non-system note author
     let author_username = notes[0].author.clone();
-    // Build content
     let display_title = parent_title.as_deref().unwrap_or("(untitled)");
     let labels_json = serde_json::to_string(&labels).unwrap_or_else(|_| "[]".to_string());
     let paths_json = serde_json::to_string(&paths).unwrap_or_else(|_| "[]".to_string());
@@ -507,7 +471,6 @@ pub fn extract_discussion_document(
         content.push_str(&format!("Files: {}\n", paths_json));
     }
-    // Build NoteContent list for truncation-aware thread rendering
     let note_contents: Vec<NoteContent> = notes
         .iter()
         .map(|note| NoteContent {
@@ -517,7 +480,6 @@ pub fn extract_discussion_document(
         })
         .collect();
-    // Estimate header size to reserve budget for thread content
     let header_len = content.len() + "\n--- Thread ---\n\n".len();
     let thread_budget = MAX_DISCUSSION_BYTES.saturating_sub(header_len);
@@ -525,7 +487,6 @@ pub fn extract_discussion_document(
     content.push_str("\n--- Thread ---\n\n");
     content.push_str(&thread_result.content);
-    // Use first note's created_at and last note's created_at for timestamps
     let created_at = notes[0].created_at;
     let updated_at = notes.last().map(|n| n.created_at).unwrap_or(created_at);
@@ -545,7 +506,7 @@ pub fn extract_discussion_document(
         created_at,
         updated_at,
         url,
-        title: None, // Discussions don't have their own title
+        title: None,
         content_text: content,
         content_hash,
         is_truncated: thread_result.is_truncated,
@@ -580,7 +541,7 @@ mod tests {
             Some(SourceType::Discussion)
         );
         assert_eq!(SourceType::parse("invalid"), None);
-        assert_eq!(SourceType::parse("ISSUE"), Some(SourceType::Issue)); // case insensitive
+        assert_eq!(SourceType::parse("ISSUE"), Some(SourceType::Issue));
     }
     #[test]
@@ -603,8 +564,7 @@ mod tests {
         let hash2 = compute_content_hash("hello");
         assert_eq!(hash1, hash2);
         assert!(!hash1.is_empty());
-        // SHA-256 of "hello" is known
-        assert_eq!(hash1.len(), 64); // 256 bits = 64 hex chars
+        assert_eq!(hash1.len(), 64);
     }
     #[test]
@@ -631,12 +591,10 @@ mod tests {
     fn test_list_hash_empty() {
         let hash = compute_list_hash(&[]);
         assert_eq!(hash.len(), 64);
-        // Empty list hashes consistently
         let hash2 = compute_list_hash(&[]);
         assert_eq!(hash, hash2);
     }
-    // Helper to create an in-memory DB with the required tables for extraction tests
     fn setup_test_db() -> Connection {
         let conn = Connection::open_in_memory().unwrap();
         conn.execute_batch(
@@ -685,7 +643,6 @@ mod tests {
         )
         .unwrap();
-        // Insert a test project
         conn.execute(
             "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url) VALUES (1, 100, 'group/project-one', 'https://gitlab.example.com/group/project-one')",
             [],
@@ -871,12 +828,9 @@ mod tests {
         insert_issue(&conn, 1, 10, Some("Test"), Some(""), "opened", None, None);
         let doc = extract_issue_document(&conn, 1).unwrap().unwrap();
-        // Empty string description still includes the section header
         assert!(doc.content_text.contains("--- Description ---\n\n"));
     }
-    // --- MR extraction tests ---
     fn setup_mr_test_db() -> Connection {
         let conn = setup_test_db();
         conn.execute_batch(
@@ -1067,10 +1021,8 @@ mod tests {
         assert!(!doc.content_text.contains("Source:"));
     }
-    // --- Discussion extraction tests ---
     fn setup_discussion_test_db() -> Connection {
-        let conn = setup_mr_test_db(); // includes projects, issues schema, labels, mr tables
+        let conn = setup_mr_test_db();
         conn.execute_batch(
             "
             CREATE TABLE discussions (
@@ -1166,7 +1118,6 @@ mod tests {
         link_issue_label(&conn, 1, 1);
         link_issue_label(&conn, 1, 2);
         insert_discussion(&conn, 1, "Issue", Some(1), None);
-        // 1710460800000 = 2024-03-15T00:00:00Z
         insert_note(
             &conn,
             1,
@@ -1213,7 +1164,7 @@ mod tests {
                 .contains("@janedoe (2024-03-15):\nAgreed. What about refresh token strategy?")
         );
         assert_eq!(doc.author_username, Some("johndoe".to_string()));
-        assert!(doc.title.is_none()); // Discussions don't have their own title
+        assert!(doc.title.is_none());
     }
     #[test]
@@ -1226,7 +1177,6 @@ mod tests {
     #[test]
     fn test_discussion_parent_deleted() {
         let conn = setup_discussion_test_db();
-        // Insert issue, create discussion, then delete the issue
         insert_issue(
             &conn,
             99,
@@ -1250,8 +1200,6 @@ mod tests {
             None,
             None,
         );
-        // Delete the parent issue — FK cascade won't delete discussion in test since
-        // we used REFERENCES without ON DELETE CASCADE in test schema, so just delete from issues
         conn.execute("PRAGMA foreign_keys = OFF", []).unwrap();
         conn.execute("DELETE FROM issues WHERE id = 99", [])
             .unwrap();
@@ -1358,7 +1306,6 @@ mod tests {
         );
         let doc = extract_discussion_document(&conn, 1).unwrap().unwrap();
-        // Paths should be deduplicated and sorted
         assert_eq!(doc.paths, vec!["src/new.rs", "src/old.rs"]);
         assert!(
             doc.content_text
@@ -1498,7 +1445,6 @@ mod tests {
             None,
         );
-        // All notes are system notes -> no content -> returns None
         let result = extract_discussion_document(&conn, 1).unwrap();
         assert!(result.is_none());
     }
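The removed `compute_list_hash` docs explained sorting index references to avoid cloning and hashing incrementally to avoid a join allocation. A rough sketch of that idea, using the standard library's `DefaultHasher` in place of SHA-256 (the real code needs the `sha2` crate; `list_hash` here is an illustrative stand-in, not the crate's function):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sort indices (no String clones), then feed items into the hasher one by
// one with a separator byte so adjacent items cannot merge and collide.
fn list_hash(items: &[String]) -> u64 {
    let mut indices: Vec<usize> = (0..items.len()).collect();
    indices.sort_by(|&a, &b| items[a].cmp(&items[b]));
    let mut hasher = DefaultHasher::new();
    for i in indices {
        items[i].hash(&mut hasher);
        0u8.hash(&mut hasher); // separator between items
    }
    hasher.finish()
}

fn main() {
    let a = vec!["bug".to_string(), "auth".to_string()];
    let b = vec!["auth".to_string(), "bug".to_string()];
    // Label order in the DB doesn't matter; only membership does.
    assert_eq!(list_hash(&a), list_hash(&b));
    println!("{:x}", list_hash(&a));
}
```

The sort makes the hash order-independent, which is what lets `labels_hash` and `paths_hash` detect real changes rather than reorderings.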

View File

@@ -1,7 +1,3 @@
-//! Document generation and management.
-//!
-//! Extracts searchable documents from issues, MRs, and discussions.
 mod extractor;
 mod regenerator;
 mod truncation;

View File

@@ -9,7 +9,6 @@ use crate::documents::{
 };
 use crate::ingestion::dirty_tracker::{clear_dirty, get_dirty_sources, record_dirty_error};
-/// Result of a document regeneration run.
 #[derive(Debug, Default)]
 pub struct RegenerateResult {
     pub regenerated: usize,
@@ -17,12 +16,6 @@ pub struct RegenerateResult {
     pub errored: usize,
 }
-/// Drain the dirty_sources queue, regenerating documents for each entry.
-///
-/// Uses per-item error handling (fail-soft) and drains the queue completely
-/// via a bounded batch loop. Each dirty item is processed independently.
-///
-/// `progress_callback` reports `(processed, estimated_total)` after each item.
 #[instrument(
     skip(conn, progress_callback),
     fields(items_processed, items_skipped, errors)
@@ -33,10 +26,6 @@ pub fn regenerate_dirty_documents(
 ) -> Result<RegenerateResult> {
     let mut result = RegenerateResult::default();
-    // Estimated total for progress reporting. Recount each loop iteration
-    // so the denominator grows if new items are enqueued during processing
-    // (the queue can grow while we drain it). We use max() so the value
-    // never shrinks — preventing the progress fraction from going backwards.
     let mut estimated_total: usize = 0;
     loop {
@@ -45,7 +34,6 @@ pub fn regenerate_dirty_documents(
             break;
         }
-        // Recount remaining + already-processed to get the true total.
         let remaining: usize = conn
             .query_row("SELECT COUNT(*) FROM dirty_sources", [], |row| row.get(0))
             .unwrap_or(0_i64) as usize;
@@ -95,7 +83,6 @@ pub fn regenerate_dirty_documents(
     Ok(result)
 }
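The removed comment on `estimated_total` described recounting each loop iteration and taking `max` so the progress denominator never shrinks, even when new items are enqueued mid-drain. That logic in isolation, driven by a hard-coded sequence of simulated queue counts (all values here are invented for the demo):

```rust
fn main() {
    let mut estimated_total: usize = 0;
    let mut processed: usize = 0;
    // Simulated `SELECT COUNT(*) FROM dirty_sources` results per iteration;
    // the jump from 8 back up to 9 models new items enqueued mid-drain.
    for remaining in [10, 9, 8, 9, 8, 7] {
        // Recount the true total, but never let the denominator shrink:
        // the reported fraction processed/estimated_total stays monotone.
        estimated_total = estimated_total.max(processed + remaining);
        processed += 1;
        println!("{processed}/{estimated_total}");
    }
    assert_eq!(estimated_total, 12); // grew once: 3 processed + 9 remaining
}
```

Without the `max`, a recount after items were cleared could report a smaller total than a previous callback, making the progress bar jump backwards.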
-/// Regenerate a single document. Returns true if content_hash changed.
 fn regenerate_one(conn: &Connection, source_type: SourceType, source_id: i64) -> Result<bool> {
     let doc = match source_type {
         SourceType::Issue => extract_issue_document(conn, source_id)?,
@@ -104,7 +91,6 @@ fn regenerate_one(conn: &Connection, source_type: SourceType, source_id: i64) ->
     };
     let Some(doc) = doc else {
-        // Source was deleted — remove the document (cascade handles FTS/embeddings)
         delete_document(conn, source_type, source_id)?;
         return Ok(true);
     };
@@ -112,13 +98,11 @@ fn regenerate_one(conn: &Connection, source_type: SourceType, source_id: i64) ->
     let existing_hash = get_existing_hash(conn, source_type, source_id)?;
     let changed = existing_hash.as_ref() != Some(&doc.content_hash);
-    // Always upsert: labels/paths can change independently of content_hash
     upsert_document(conn, &doc)?;
     Ok(changed)
 }
-/// Get existing content hash for a document, if it exists.
 fn get_existing_hash(
     conn: &Connection,
     source_type: SourceType,
@@ -136,11 +120,6 @@ fn get_existing_hash(
     Ok(hash)
 }
-/// Upsert a document with triple-hash write optimization.
-///
-/// Wrapped in a SAVEPOINT to ensure atomicity of the multi-statement write
-/// (document row + labels + paths). Without this, a crash between statements
-/// could leave the document with a stale labels_hash but missing label rows.
 fn upsert_document(conn: &Connection, doc: &DocumentData) -> Result<()> {
     conn.execute_batch("SAVEPOINT upsert_doc")?;
     match upsert_document_inner(conn, doc) {
@@ -149,8 +128,6 @@ fn upsert_document(conn: &Connection, doc: &DocumentData) -> Result<()> {
             Ok(())
         }
         Err(e) => {
-            // ROLLBACK TO restores the savepoint but leaves it active.
-            // RELEASE removes it so the connection is clean for the next call.
             let _ = conn.execute_batch("ROLLBACK TO upsert_doc; RELEASE upsert_doc");
             Err(e)
         }
@@ -158,7 +135,6 @@ fn upsert_document(conn: &Connection, doc: &DocumentData) -> Result<()> {
} }
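The comments removed above documented the savepoint discipline (ROLLBACK TO leaves the savepoint active; RELEASE cleans the connection for the next call). A minimal sketch of the same run-then-rollback shape, using a hypothetical in-memory journal as a stand-in for SQLite (the `FakeDb` type is illustrative only, not part of this crate):

```rust
// Hypothetical stand-in for a SQLite connection: a savepoint records the
// journal length, rollback truncates back to it.
struct FakeDb {
    rows: Vec<String>,
    mark: usize,
}

impl FakeDb {
    fn savepoint(&mut self) {
        self.mark = self.rows.len();
    }
    fn rollback_to_savepoint(&mut self) {
        self.rows.truncate(self.mark);
    }
}

// Multi-statement write: either every part lands or none do.
fn upsert(db: &mut FakeDb, parts: &[Result<&str, &str>]) -> Result<(), String> {
    db.savepoint();
    for part in parts {
        match part {
            Ok(row) => db.rows.push(row.to_string()),
            Err(e) => {
                // Mirrors "ROLLBACK TO upsert_doc; RELEASE upsert_doc".
                db.rollback_to_savepoint();
                return Err(e.to_string());
            }
        }
    }
    Ok(())
}

fn main() {
    let mut db = FakeDb { rows: vec!["existing".into()], mark: 0 };
    // Second statement fails: the first write must not survive.
    let res = upsert(&mut db, &[Ok("doc row"), Err("label insert failed")]);
    assert!(res.is_err());
    assert_eq!(db.rows, vec!["existing".to_string()]);
}
```

The match-on-result shape matters: `?`-propagating out of the middle of the sequence would leak the savepoint, which is exactly the failure mode the deleted comment warned about.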
fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<()> {
-// Check existing hashes before writing
let existing: Option<(i64, String, String, String)> = conn
.query_row(
"SELECT id, content_hash, labels_hash, paths_hash FROM documents
@@ -168,7 +144,6 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<()> {
)
.optional()?;
-// Fast path: skip ALL writes when nothing changed (prevents WAL churn)
if let Some((_, ref old_content_hash, ref old_labels_hash, ref old_paths_hash)) = existing
&& old_content_hash == &doc.content_hash
&& old_labels_hash == &doc.labels_hash
@@ -179,7 +154,6 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<()> {
let labels_json = serde_json::to_string(&doc.labels).unwrap_or_else(|_| "[]".to_string());
-// Upsert document row
conn.execute(
"INSERT INTO documents
(source_type, source_id, project_id, author_username, label_names,
@@ -218,13 +192,11 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<()> {
],
)?;
-// Get document ID
let doc_id = match existing {
Some((id, _, _, _)) => id,
None => get_document_id(conn, doc.source_type, doc.source_id)?,
};
-// Only update labels if hash changed
let labels_changed = match &existing {
Some((_, _, old_hash, _)) => old_hash != &doc.labels_hash,
None => true,
@@ -242,7 +214,6 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<()> {
}
}
-// Only update paths if hash changed
let paths_changed = match &existing {
Some((_, _, _, old_hash)) => old_hash != &doc.paths_hash,
None => true,
@@ -263,7 +234,6 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<()> {
Ok(())
}
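The removed fast-path comment captures the core technique: compare all three hashes before touching the database, and skip every write when nothing changed. A self-contained sketch, with `DefaultHasher` standing in for the crate's real content hashes (an assumption; the actual hash format is not shown in this diff):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn fingerprint(s: &str) -> u64 {
    let mut h = DefaultHasher::new();
    s.hash(&mut h);
    h.finish()
}

/// True when any of content/labels/paths changed, i.e. a write is required.
/// When all three match, the caller skips every statement (no WAL churn).
fn needs_write(stored: (u64, u64, u64), content: &str, labels: &str, paths: &str) -> bool {
    stored != (fingerprint(content), fingerprint(labels), fingerprint(paths))
}

fn main() {
    let stored = (
        fingerprint("issue body"),
        fingerprint("[\"bug\"]"),
        fingerprint("src/"),
    );
    // Identical inputs: the fast path skips all writes.
    assert!(!needs_write(stored, "issue body", "[\"bug\"]", "src/"));
    // A label change alone forces a write even though content matches.
    assert!(needs_write(stored, "issue body", "[\"bug\",\"p1\"]", "src/"));
}
```

Tracking labels and paths under separate hashes is what lets the later `labels_changed` / `paths_changed` checks rewrite only the join tables that actually changed.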
-/// Delete a document by source identity.
fn delete_document(conn: &Connection, source_type: SourceType, source_id: i64) -> Result<()> {
conn.execute(
"DELETE FROM documents WHERE source_type = ?1 AND source_id = ?2",
@@ -272,7 +242,6 @@ fn delete_document(conn: &Connection, source_type: SourceType, source_id: i64) -
Ok(())
}
-/// Get document ID by source type and source ID.
fn get_document_id(conn: &Connection, source_type: SourceType, source_id: i64) -> Result<i64> {
let id: i64 = conn.query_row(
"SELECT id FROM documents WHERE source_type = ?1 AND source_id = ?2",
@@ -391,7 +360,6 @@ mod tests {
assert_eq!(result.unchanged, 0);
assert_eq!(result.errored, 0);
-// Verify document was created
let count: i64 = conn
.query_row("SELECT COUNT(*) FROM documents", [], |r| r.get(0))
.unwrap();
@@ -411,12 +379,10 @@ mod tests {
[],
).unwrap();
-// First regeneration creates the document
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
let r1 = regenerate_dirty_documents(&conn, None).unwrap();
assert_eq!(r1.regenerated, 1);
-// Second regeneration — same data, should be unchanged
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
let r2 = regenerate_dirty_documents(&conn, None).unwrap();
assert_eq!(r2.unchanged, 1);
@@ -433,14 +399,13 @@ mod tests {
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
regenerate_dirty_documents(&conn, None).unwrap();
-// Delete the issue and re-mark dirty
conn.execute("PRAGMA foreign_keys = OFF", []).unwrap();
conn.execute("DELETE FROM issues WHERE id = 1", []).unwrap();
conn.execute("PRAGMA foreign_keys = ON", []).unwrap();
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
let result = regenerate_dirty_documents(&conn, None).unwrap();
-assert_eq!(result.regenerated, 1); // Deletion counts as "changed"
+assert_eq!(result.regenerated, 1);
let count: i64 = conn
.query_row("SELECT COUNT(*) FROM documents", [], |r| r.get(0))
@@ -462,7 +427,6 @@ mod tests {
let result = regenerate_dirty_documents(&conn, None).unwrap();
assert_eq!(result.regenerated, 10);
-// Queue should be empty
let dirty = get_dirty_sources(&conn).unwrap();
assert!(dirty.is_empty());
}
@@ -485,16 +449,13 @@ mod tests {
)
.unwrap();
-// First run creates document
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
regenerate_dirty_documents(&conn, None).unwrap();
-// Second run — triple hash match, should skip ALL writes
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
let result = regenerate_dirty_documents(&conn, None).unwrap();
assert_eq!(result.unchanged, 1);
-// Labels should still be present (not deleted and re-inserted)
let label_count: i64 = conn
.query_row("SELECT COUNT(*) FROM document_labels", [], |r| r.get(0))
.unwrap();

View File

@@ -1,25 +1,19 @@
-/// Maximum byte limit for discussion documents (suitable for embedding chunking).
-/// Note: uses `.len()` (byte count), not char count — consistent with `CHUNK_MAX_BYTES`.
pub const MAX_DISCUSSION_BYTES: usize = 32_000;
-/// Hard safety cap (bytes) for any document type (pathological content: pasted logs, base64).
pub const MAX_DOCUMENT_BYTES_HARD: usize = 2_000_000;
-/// A single note's content for truncation processing.
pub struct NoteContent {
pub author: String,
pub date: String,
pub body: String,
}
-/// Result of truncation processing.
pub struct TruncationResult {
pub content: String,
pub is_truncated: bool,
pub reason: Option<TruncationReason>,
}
-/// Why a document was truncated (matches DB CHECK constraint values).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TruncationReason {
TokenLimitMiddleDrop,
@@ -29,7 +23,6 @@ pub enum TruncationReason {
}
impl TruncationReason {
-/// Returns the DB-compatible string matching the CHECK constraint.
pub fn as_str(&self) -> &'static str {
match self {
Self::TokenLimitMiddleDrop => "token_limit_middle_drop",
@@ -40,19 +33,14 @@ impl TruncationReason {
}
}
-/// Format a single note as `@author (date):\nbody\n\n`.
fn format_note(note: &NoteContent) -> String {
format!("@{} ({}):\n{}\n\n", note.author, note.date, note.body)
}
-/// Truncate a string at a UTF-8-safe byte boundary.
-/// Returns a slice no longer than `max_bytes` bytes, walking backward
-/// to find the nearest char boundary if needed.
pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
if s.len() <= max_bytes {
return s;
}
-// Walk backward from max_bytes to find a char boundary
let mut end = max_bytes;
while end > 0 && !s.is_char_boundary(end) {
end -= 1;
@@ -60,14 +48,6 @@ pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
&s[..end]
}
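The UTF-8-safe truncation above is self-contained, so it can be pulled out and run as-is (only the `main` demonstration below is added):

```rust
/// Truncate `s` to at most `max_bytes` bytes without splitting a UTF-8 sequence.
pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Walk backward from max_bytes to the nearest char boundary.
    let mut end = max_bytes;
    while end > 0 && !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    // 4-byte emoji: 10 bytes holds exactly two of them (8 bytes).
    assert_eq!(truncate_utf8(&"🎉".repeat(10), 10), "🎉🎉");
    // 3-byte CJK chars: 7 bytes holds two chars (6 bytes).
    assert_eq!(truncate_utf8("中文字符测试", 7), "中文");
    // Short strings pass through untouched.
    assert_eq!(truncate_utf8("abc", 10), "abc");
}
```

Slicing a `&str` at a non-boundary index panics in Rust, so the backward walk is what makes `&s[..end]` safe for arbitrary input.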
-/// Truncate discussion notes to fit within `max_bytes`.
-///
-/// Algorithm:
-/// 1. Format all notes
-/// 2. If total fits, return as-is
-/// 3. Single note: truncate at UTF-8 boundary, append [truncated]
-/// 4. Try to keep first N notes + last note + marker within limit
-/// 5. If first + last > limit: keep only first (truncated)
pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> TruncationResult {
if notes.is_empty() {
return TruncationResult {
@@ -80,7 +60,6 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
let formatted: Vec<String> = notes.iter().map(format_note).collect();
let total: String = formatted.concat();
-// Case 1: fits within limit
if total.len() <= max_bytes {
return TruncationResult {
content: total,
@@ -89,9 +68,8 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
};
}
-// Case 2: single note — truncate it
if notes.len() == 1 {
-let truncated = truncate_utf8(&total, max_bytes.saturating_sub(11)); // room for [truncated]
+let truncated = truncate_utf8(&total, max_bytes.saturating_sub(11));
let content = format!("{}[truncated]", truncated);
return TruncationResult {
content,
@@ -100,10 +78,8 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
};
}
-// Case 3: multiple notes — try first N + marker + last
let last_note = &formatted[formatted.len() - 1];
-// Binary search for max N where first N notes + marker + last note fit
let mut best_n = 0;
for n in 1..formatted.len() - 1 {
let first_n: usize = formatted[..n].iter().map(|s| s.len()).sum();
@@ -118,7 +94,6 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
}
if best_n > 0 {
-// We can keep first best_n notes + marker + last note
let first_part: String = formatted[..best_n].concat();
let omitted = formatted.len() - best_n - 1;
let marker = format!("\n\n[... {} notes omitted for length ...]\n\n", omitted);
@@ -130,7 +105,6 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
};
}
-// Case 4: even first + last don't fit — keep only first (truncated)
let first_note = &formatted[0];
if first_note.len() + last_note.len() > max_bytes {
let truncated = truncate_utf8(first_note, max_bytes.saturating_sub(11));
@@ -142,7 +116,6 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
};
}
-// Fallback: first + marker + last (0 middle notes kept)
let omitted = formatted.len() - 2;
let marker = format!("\n\n[... {} notes omitted for length ...]\n\n", omitted);
let content = format!("{}{}{}", formatted[0], marker, last_note);
@@ -153,8 +126,6 @@ pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> Truncatio
}
}
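The removed algorithm comment describes a middle-drop strategy: keep the opening context and the latest note, replace the middle with an omission marker. A deliberately simplified sketch that keeps only the first and last formatted note (the real `truncate_discussion` also keeps as many leading notes as fit, which this sketch skips):

```rust
/// Simplified middle-drop: first note + omission marker + last note.
/// Assumes `notes` are already formatted strings, as in the real code.
fn drop_middle(notes: &[String], max_bytes: usize) -> String {
    let total: usize = notes.iter().map(String::len).sum();
    if total <= max_bytes || notes.len() < 3 {
        // Fits, or there is no "middle" to drop.
        return notes.concat();
    }
    let omitted = notes.len() - 2;
    format!(
        "{}\n\n[... {} notes omitted for length ...]\n\n{}",
        notes[0],
        omitted,
        notes[notes.len() - 1]
    )
}

fn main() {
    let notes: Vec<String> = (0..5)
        .map(|i| format!("@user{} (2026-01-01):\nbody\n\n", i))
        .collect();
    let out = drop_middle(&notes, 40);
    // First and last notes survive; the three middle notes are summarized.
    assert!(out.contains("@user0"));
    assert!(out.contains("@user4"));
    assert!(out.contains("3 notes omitted"));
}
```

Keeping the ends rather than a prefix matters for discussions: the opening note states the problem and the last note usually carries the resolution.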
-/// Apply hard cap truncation to any document type.
-/// Truncates at UTF-8-safe boundary if content exceeds 2MB.
pub fn truncate_hard_cap(content: &str) -> TruncationResult {
if content.len() <= MAX_DOCUMENT_BYTES_HARD {
return TruncationResult {
@@ -201,7 +172,6 @@ mod tests {
#[test]
fn test_middle_notes_dropped() {
-// Create 10 notes where total exceeds limit
let big_body = "x".repeat(4000);
let notes: Vec<NoteContent> = (0..10)
.map(|i| make_note(&format!("user{}", i), &big_body))
@@ -209,11 +179,8 @@ mod tests {
let result = truncate_discussion(&notes, 10_000);
assert!(result.is_truncated);
assert_eq!(result.reason, Some(TruncationReason::TokenLimitMiddleDrop));
-// First note preserved
assert!(result.content.contains("@user0"));
-// Last note preserved
assert!(result.content.contains("@user9"));
-// Marker present
assert!(result.content.contains("notes omitted for length"));
}
@@ -256,20 +223,16 @@ mod tests {
#[test]
fn test_utf8_boundary_safety() {
-// Emoji are 4 bytes each
let emoji_content = "🎉".repeat(10);
let truncated = truncate_utf8(&emoji_content, 10);
-// 10 bytes should hold 2 emoji (8 bytes) with 2 bytes left over (not enough for another)
assert_eq!(truncated.len(), 8);
assert_eq!(truncated, "🎉🎉");
}
#[test]
fn test_utf8_boundary_cjk() {
-// CJK characters are 3 bytes each
let cjk = "中文字符测试";
let truncated = truncate_utf8(cjk, 7);
-// 7 bytes: 2 full chars (6 bytes), 1 byte left (not enough for another)
assert_eq!(truncated, "中文");
assert_eq!(truncated.len(), 6);
}
@@ -294,7 +257,6 @@ mod tests {
#[test]
fn test_marker_count_correct() {
-// 7 notes, keep first 1 + last 1, drop middle 5
let big_body = "x".repeat(5000);
let notes: Vec<NoteContent> = (0..7)
.map(|i| make_note(&format!("user{}", i), &big_body))

View File

@@ -1,11 +1,8 @@
-//! Detect documents needing (re-)embedding based on content hash changes.
use rusqlite::Connection;
use crate::core::error::Result;
use crate::embedding::chunking::{CHUNK_MAX_BYTES, EXPECTED_DIMS};
-/// A document that needs embedding or re-embedding.
#[derive(Debug)]
pub struct PendingDocument {
pub document_id: i64,
@@ -13,20 +10,12 @@ pub struct PendingDocument {
pub content_hash: String,
}
-/// Find documents that need embedding: new (no metadata), changed (hash mismatch),
-/// or config-drifted (chunk_max_bytes/model/dims mismatch).
-///
-/// Uses keyset pagination (WHERE d.id > last_id) and returns up to `page_size` results.
pub fn find_pending_documents(
conn: &Connection,
page_size: usize,
last_id: i64,
model_name: &str,
) -> Result<Vec<PendingDocument>> {
-// Documents that either:
-// 1. Have no embedding_metadata at all (new)
-// 2. Have metadata where document_hash != content_hash (changed)
-// 3. Config drift: chunk_max_bytes, model, or dims mismatch (or pre-migration NULL)
let sql = r#"
SELECT d.id, d.content_text, d.content_hash
FROM documents d
@@ -79,7 +68,6 @@ pub fn find_pending_documents(
Ok(rows)
}
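The removed doc comment named keyset pagination (`WHERE d.id > last_id ... LIMIT page_size`). The access pattern, sketched over a plain sorted slice instead of the documents table:

```rust
/// Keyset pagination: resume strictly after the last id seen, never by OFFSET.
fn next_page(sorted_ids: &[i64], last_id: i64, page_size: usize) -> Vec<i64> {
    sorted_ids
        .iter()
        .copied()
        .filter(|&id| id > last_id)
        .take(page_size)
        .collect()
}

fn main() {
    let ids = [1, 3, 4, 9, 12];
    assert_eq!(next_page(&ids, 0, 2), vec![1, 3]);
    // Resume from the last id of the previous page.
    assert_eq!(next_page(&ids, 3, 2), vec![4, 9]);
    assert_eq!(next_page(&ids, 9, 2), vec![12]);
    // Past the end: an empty page terminates the loop.
    assert!(next_page(&ids, 12, 2).is_empty());
}
```

Unlike OFFSET-based paging, the cursor stays correct even when rows are inserted or deleted between pages, which matters here because each page's embedding writes change the set of pending documents.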
-/// Count total documents that need embedding.
pub fn count_pending_documents(conn: &Connection, model_name: &str) -> Result<i64> {
let count: i64 = conn.query_row(
r#"

View File

@@ -1,17 +1,9 @@
-/// Multiplier for encoding (document_id, chunk_index) into a single rowid.
-/// Supports up to 1000 chunks per document. At CHUNK_MAX_BYTES=6000,
-/// a 2MB document (MAX_DOCUMENT_BYTES_HARD) produces ~333 chunks.
-/// The pipeline enforces chunk_count <= CHUNK_ROWID_MULTIPLIER at runtime.
pub const CHUNK_ROWID_MULTIPLIER: i64 = 1000;
-/// Encode (document_id, chunk_index) into a sqlite-vec rowid.
-///
-/// rowid = document_id * CHUNK_ROWID_MULTIPLIER + chunk_index
pub fn encode_rowid(document_id: i64, chunk_index: i64) -> i64 {
document_id * CHUNK_ROWID_MULTIPLIER + chunk_index
}
pub fn decode_rowid(rowid: i64) -> (i64, i64) {
let document_id = rowid / CHUNK_ROWID_MULTIPLIER;
let chunk_index = rowid % CHUNK_ROWID_MULTIPLIER;
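The rowid scheme is pure arithmetic and runs standalone (only the `main` demonstration is added):

```rust
pub const CHUNK_ROWID_MULTIPLIER: i64 = 1000;

/// rowid = document_id * 1000 + chunk_index
pub fn encode_rowid(document_id: i64, chunk_index: i64) -> i64 {
    document_id * CHUNK_ROWID_MULTIPLIER + chunk_index
}

pub fn decode_rowid(rowid: i64) -> (i64, i64) {
    (rowid / CHUNK_ROWID_MULTIPLIER, rowid % CHUNK_ROWID_MULTIPLIER)
}

fn main() {
    let rowid = encode_rowid(42, 7);
    assert_eq!(rowid, 42_007);
    assert_eq!(decode_rowid(rowid), (42, 7));
    // chunk_index must stay below the multiplier or decoding becomes
    // ambiguous, hence the runtime chunk_count <= 1000 guard mentioned
    // in the removed comment.
    assert_eq!(decode_rowid(encode_rowid(1, 999)), (1, 999));
}
```

Packing both ids into one rowid lets a sqlite-vec match map straight back to its document and chunk without a join table.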

View File

@@ -1,29 +1,9 @@
-//! Text chunking for embedding: split documents at paragraph boundaries with overlap.
-/// Maximum bytes per chunk.
-/// Named `_BYTES` because `str::len()` returns byte count; multi-byte UTF-8
-/// sequences mean byte length >= char count.
-///
-/// nomic-embed-text has an 8,192-token context window. English prose averages
-/// ~4 chars/token, but technical content (code, URLs, JSON) can be 1-2
-/// chars/token. We use 6,000 bytes as a conservative limit that stays safe
-/// even for code-heavy chunks (~6,000 tokens worst-case).
pub const CHUNK_MAX_BYTES: usize = 6_000;
-/// Expected embedding dimensions for nomic-embed-text.
pub const EXPECTED_DIMS: usize = 768;
-/// Character overlap between adjacent chunks.
pub const CHUNK_OVERLAP_CHARS: usize = 200;
-/// Split document content into chunks suitable for embedding.
-///
-/// Documents <= CHUNK_MAX_BYTES produce a single chunk.
-/// Longer documents are split at paragraph boundaries (`\n\n`), falling back
-/// to sentence boundaries, then word boundaries, then hard character cut.
-/// Adjacent chunks share CHUNK_OVERLAP_CHARS of overlap.
-///
-/// Returns Vec<(chunk_index, chunk_text)>.
pub fn split_into_chunks(content: &str) -> Vec<(usize, String)> {
if content.is_empty() {
return Vec::new();
@@ -44,11 +24,9 @@ pub fn split_into_chunks(content: &str) -> Vec<(usize, String)> {
break;
}
-// Find a split point within CHUNK_MAX_BYTES (char-boundary-safe)
let end = floor_char_boundary(content, start + CHUNK_MAX_BYTES);
let window = &content[start..end];
-// Try paragraph boundary (\n\n) — search backward from end
let split_at = find_paragraph_break(window)
.or_else(|| find_sentence_break(window))
.or_else(|| find_word_break(window))
@@ -57,9 +35,6 @@ pub fn split_into_chunks(content: &str) -> Vec<(usize, String)> {
let chunk_text = &content[start..start + split_at];
chunks.push((chunk_index, chunk_text.to_string()));
-// Advance with overlap, guaranteeing forward progress to prevent infinite loops.
-// If split_at <= CHUNK_OVERLAP_CHARS we skip overlap to avoid stalling.
-// The .max(1) ensures we always advance at least 1 byte.
let advance = if split_at > CHUNK_OVERLAP_CHARS {
split_at - CHUNK_OVERLAP_CHARS
} else {
@@ -73,10 +48,7 @@ pub fn split_into_chunks(content: &str) -> Vec<(usize, String)> {
chunks
}
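The removed forward-progress comment describes the loop's key invariant. Isolated, the advance computation is small enough to verify directly (constants copied from the file; only `main` is added):

```rust
const CHUNK_OVERLAP_CHARS: usize = 200;

/// Bytes to advance after emitting a chunk of `split_at` bytes: back off by
/// the overlap when the chunk is large enough, otherwise move past the whole
/// chunk. The `.max(1)` guarantees progress so the loop can never stall.
fn advance_for(split_at: usize) -> usize {
    let advance = if split_at > CHUNK_OVERLAP_CHARS {
        split_at - CHUNK_OVERLAP_CHARS
    } else {
        split_at
    };
    advance.max(1)
}

fn main() {
    assert_eq!(advance_for(6_000), 5_800); // normal chunk: overlap retained
    assert_eq!(advance_for(150), 150); // tiny chunk: overlap skipped
    assert_eq!(advance_for(0), 1); // degenerate split: still advance
}
```

Without the small-chunk branch, a `split_at` of 200 or less would advance by zero or wrap, and the chunking loop would re-emit the same window forever.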
-/// Find the last paragraph break (`\n\n`) in the window, preferring the
-/// last third for balanced chunks.
fn find_paragraph_break(window: &str) -> Option<usize> {
-// Search backward from 2/3 of the way through to find a good split
let search_start = window.len() * 2 / 3;
window[search_start..]
.rfind("\n\n")
@@ -84,7 +56,6 @@ fn find_paragraph_break(window: &str) -> Option<usize> {
.or_else(|| window[..search_start].rfind("\n\n").map(|pos| pos + 2))
}
-/// Find the last sentence boundary (`. `, `? `, `! `) in the window.
fn find_sentence_break(window: &str) -> Option<usize> {
let search_start = window.len() / 2;
for pat in &[". ", "? ", "! "] {
@@ -92,7 +63,6 @@ fn find_sentence_break(window: &str) -> Option<usize> {
return Some(search_start + pos + pat.len());
}
}
-// Try first half
for pat in &[". ", "? ", "! "] {
if let Some(pos) = window[..search_start].rfind(pat) {
return Some(pos + pat.len());
@@ -101,7 +71,6 @@ fn find_sentence_break(window: &str) -> Option<usize> {
None
}
-/// Find the last word boundary (space) in the window.
fn find_word_break(window: &str) -> Option<usize> {
let search_start = window.len() / 2;
window[search_start..]
@@ -110,8 +79,6 @@ fn find_word_break(window: &str) -> Option<usize> {
.or_else(|| window[..search_start].rfind(' ').map(|pos| pos + 1))
}
-/// Find the largest byte index <= `idx` that is a valid char boundary in `s`.
-/// Equivalent to `str::floor_char_boundary` (stabilized in Rust 1.82).
fn floor_char_boundary(s: &str, idx: usize) -> usize {
if idx >= s.len() {
return s.len();
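The diff hides the rest of `floor_char_boundary`; a complete standalone version of the same idea (the backward walk below is a reconstruction, not necessarily the crate's exact body):

```rust
/// Largest byte index <= `idx` that is a valid char boundary in `s`.
fn floor_char_boundary(s: &str, idx: usize) -> usize {
    if idx >= s.len() {
        return s.len();
    }
    let mut i = idx;
    // Index 0 is always a char boundary, so this cannot underflow.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

fn main() {
    // Past the end: clamp to the string length.
    assert_eq!(floor_char_boundary("abc", 10), 3);
    // "中" occupies bytes 0..3: byte 4 falls inside "文", floor back to 3.
    assert_eq!(floor_char_boundary("中文", 4), 3);
    // Already a boundary: returned unchanged.
    assert_eq!(floor_char_boundary("中文", 3), 3);
}
```

This is the guard that lets `split_into_chunks` slice `content[start..end]` on arbitrary UTF-8 without panicking.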
@@ -151,7 +118,6 @@ mod tests {
#[test]
fn test_long_document_multiple_chunks() {
-// Create content > CHUNK_MAX_BYTES with paragraph boundaries
let paragraph = "This is a paragraph of text.\n\n";
let mut content = String::new();
while content.len() < CHUNK_MAX_BYTES * 2 {
@@ -165,18 +131,15 @@ mod tests {
chunks.len()
);
-// Verify indices are sequential
for (i, (idx, _)) in chunks.iter().enumerate() {
assert_eq!(*idx, i);
}
-// Verify all content is covered (no gaps)
assert!(!chunks.last().unwrap().1.is_empty());
}
#[test]
fn test_chunk_overlap() {
-// Create content that will produce 2+ chunks
let paragraph = "This is paragraph content for testing chunk overlap behavior.\n\n";
let mut content = String::new();
while content.len() < CHUNK_MAX_BYTES + CHUNK_OVERLAP_CHARS + 1000 {
@@ -186,11 +149,9 @@ mod tests {
let chunks = split_into_chunks(&content);
assert!(chunks.len() >= 2);
-// Check that adjacent chunks share some content (overlap)
if chunks.len() >= 2 {
let end_of_first = &chunks[0].1;
let start_of_second = &chunks[1].1;
-// The end of first chunk should overlap with start of second
let overlap_region =
&end_of_first[end_of_first.len().saturating_sub(CHUNK_OVERLAP_CHARS)..];
assert!(
@@ -203,11 +164,9 @@ mod tests {
#[test]
fn test_no_paragraph_boundary() {
-// Create content without paragraph breaks
let content = "word ".repeat(CHUNK_MAX_BYTES / 5 * 3);
let chunks = split_into_chunks(&content);
assert!(chunks.len() >= 2);
-// Should still split (at word boundaries)
for (_, chunk) in &chunks {
assert!(!chunk.is_empty());
}

View File

@@ -4,7 +4,6 @@ use std::time::Duration;
use crate::core::error::{LoreError, Result};
-/// Configuration for Ollama embedding service.
pub struct OllamaConfig {
pub base_url: String,
pub model: String,
@@ -21,7 +20,6 @@ impl Default for OllamaConfig {
}
}
-/// Async client for Ollama embedding API.
pub struct OllamaClient {
client: Client,
config: OllamaConfig,
@@ -60,10 +58,6 @@ impl OllamaClient {
Self { client, config }
}
-/// Health check: verifies Ollama is reachable and the configured model exists.
-///
-/// Model matching uses `starts_with` so "nomic-embed-text" matches
-/// "nomic-embed-text:latest".
pub async fn health_check(&self) -> Result<()> {
let url = format!("{}/api/tags", self.config.base_url);
@@ -100,9 +94,6 @@ impl OllamaClient {
Ok(())
}
-/// Embed a batch of texts using the configured model.
-///
-/// Returns one embedding vector per input text.
pub async fn embed_batch(&self, texts: Vec<String>) -> Result<Vec<Vec<f32>>> {
let url = format!("{}/api/embed", self.config.base_url);
@@ -144,7 +135,6 @@ impl OllamaClient {
}
}
-/// Quick health check without creating a full client.
pub async fn check_ollama_health(base_url: &str) -> bool {
let client = Client::builder()
.timeout(Duration::from_secs(5))
@@ -173,12 +163,10 @@ mod tests {
#[test]
fn test_health_check_model_starts_with() {
-// Verify the matching logic: "nomic-embed-text" should match "nomic-embed-text:latest"
let model = "nomic-embed-text";
let tag_name = "nomic-embed-text:latest";
assert!(tag_name.starts_with(model));
-// Non-matching model
let wrong_model = "llama2";
assert!(!tag_name.starts_with(wrong_model));
}

View File

@@ -1,5 +1,3 @@
-//! Async embedding pipeline: chunk documents, embed via Ollama, store in sqlite-vec.
use std::collections::HashSet;
use rusqlite::Connection;
@@ -15,7 +13,6 @@ use crate::embedding::ollama::OllamaClient;
const BATCH_SIZE: usize = 32;
const DB_PAGE_SIZE: usize = 500;
-/// Result of an embedding run.
#[derive(Debug, Default)]
pub struct EmbedResult {
pub embedded: usize,
@@ -23,7 +20,6 @@ pub struct EmbedResult {
pub skipped: usize,
}
-/// Work item: a single chunk to embed.
struct ChunkWork {
doc_id: i64,
chunk_index: usize,
@@ -33,10 +29,6 @@ struct ChunkWork {
text: String,
}
-/// Run the embedding pipeline: find pending documents, chunk, embed, store.
-///
-/// Processes batches of BATCH_SIZE texts per Ollama API call.
-/// Uses keyset pagination over documents (DB_PAGE_SIZE per page).
#[instrument(skip(conn, client, progress_callback), fields(%model_name, items_processed, items_skipped, errors))]
pub async fn embed_documents(
conn: &Connection,
@@ -61,16 +53,6 @@ pub async fn embed_documents(
break;
}
-// Wrap all DB writes for this page in a savepoint so that
-// clear_document_embeddings + store_embedding are atomic. If the
-// process crashes mid-page, the savepoint is never released and
-// SQLite rolls back — preventing partial document states where old
-// embeddings are cleared but new ones haven't been written yet.
-//
-// We use a closure + match to ensure the savepoint is always
-// rolled back on error — bare `execute_batch("SAVEPOINT")` with `?`
-// propagation would leak the savepoint and leave the connection in
-// a broken transactional state.
conn.execute_batch("SAVEPOINT embed_page")?;
let page_result = embed_page(
conn,
@@ -109,10 +91,6 @@ pub async fn embed_documents(
Ok(result)
}
/// Process a single page of pending documents within an active savepoint.
///
/// All `?` propagation from this function is caught by the caller, which
/// rolls back the savepoint on error.
#[allow(clippy::too_many_arguments)] #[allow(clippy::too_many_arguments)]
async fn embed_page( async fn embed_page(
conn: &Connection, conn: &Connection,
@@ -125,12 +103,10 @@ async fn embed_page(
total: usize, total: usize,
progress_callback: &Option<Box<dyn Fn(usize, usize)>>, progress_callback: &Option<Box<dyn Fn(usize, usize)>>,
) -> Result<()> { ) -> Result<()> {
// Build chunk work items for this page
let mut all_chunks: Vec<ChunkWork> = Vec::new(); let mut all_chunks: Vec<ChunkWork> = Vec::new();
let mut page_normal_docs: usize = 0; let mut page_normal_docs: usize = 0;
for doc in pending { for doc in pending {
// Always advance the cursor, even for skipped docs, to avoid re-fetching
*last_id = doc.document_id; *last_id = doc.document_id;
if doc.content_text.is_empty() { if doc.content_text.is_empty() {
@@ -142,9 +118,6 @@ async fn embed_page(
let chunks = split_into_chunks(&doc.content_text); let chunks = split_into_chunks(&doc.content_text);
let total_chunks = chunks.len(); let total_chunks = chunks.len();
// Overflow guard: skip documents that produce too many chunks.
// Must run BEFORE clear_document_embeddings so existing embeddings
// are preserved when we skip.
if total_chunks as i64 > CHUNK_ROWID_MULTIPLIER { if total_chunks as i64 > CHUNK_ROWID_MULTIPLIER {
warn!( warn!(
doc_id = doc.document_id, doc_id = doc.document_id,
@@ -152,12 +125,10 @@ async fn embed_page(
max = CHUNK_ROWID_MULTIPLIER, max = CHUNK_ROWID_MULTIPLIER,
"Document produces too many chunks, skipping to prevent rowid collision" "Document produces too many chunks, skipping to prevent rowid collision"
); );
// Record a sentinel error so the document is not re-detected as
// pending on subsequent runs (prevents infinite re-processing).
record_embedding_error( record_embedding_error(
conn, conn,
doc.document_id, doc.document_id,
0, // sentinel chunk_index 0,
&doc.content_hash, &doc.content_hash,
"overflow-sentinel", "overflow-sentinel",
model_name, model_name,
@@ -174,10 +145,6 @@ async fn embed_page(
continue; continue;
} }
// Don't clear existing embeddings here — defer until the first
// successful chunk embedding so that if ALL chunks for a document
// fail, old embeddings survive instead of leaving zero data.
for (chunk_index, text) in chunks { for (chunk_index, text) in chunks {
all_chunks.push(ChunkWork { all_chunks.push(ChunkWork {
doc_id: doc.document_id, doc_id: doc.document_id,
@@ -190,15 +157,10 @@ async fn embed_page(
} }
page_normal_docs += 1; page_normal_docs += 1;
// Don't fire progress here — wait until embedding completes below.
} }
// Track documents whose old embeddings have been cleared.
// We defer clearing until the first successful chunk embedding so
// that if ALL chunks for a document fail, old embeddings survive.
let mut cleared_docs: HashSet<i64> = HashSet::new(); let mut cleared_docs: HashSet<i64> = HashSet::new();
// Process chunks in batches of BATCH_SIZE
for batch in all_chunks.chunks(BATCH_SIZE) { for batch in all_chunks.chunks(BATCH_SIZE) {
let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect(); let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect();
@@ -235,7 +197,6 @@ async fn embed_page(
continue; continue;
} }
// Clear old embeddings on first successful chunk for this document
if !cleared_docs.contains(&chunk.doc_id) { if !cleared_docs.contains(&chunk.doc_id) {
clear_document_embeddings(conn, chunk.doc_id)?; clear_document_embeddings(conn, chunk.doc_id)?;
cleared_docs.insert(chunk.doc_id); cleared_docs.insert(chunk.doc_id);
@@ -255,12 +216,8 @@ async fn embed_page(
} }
} }
Err(e) => { Err(e) => {
// Batch failed — retry each chunk individually so one
// oversized chunk doesn't poison the entire batch.
let err_str = e.to_string(); let err_str = e.to_string();
let err_lower = err_str.to_lowercase(); let err_lower = err_str.to_lowercase();
// Ollama error messages vary across versions. Match broadly
// against known patterns to detect context-window overflow.
let is_context_error = err_lower.contains("context length") let is_context_error = err_lower.contains("context length")
|| err_lower.contains("too long") || err_lower.contains("too long")
|| err_lower.contains("maximum context") || err_lower.contains("maximum context")
@@ -276,7 +233,6 @@ async fn embed_page(
if !embeddings.is_empty() if !embeddings.is_empty()
&& embeddings[0].len() == EXPECTED_DIMS => && embeddings[0].len() == EXPECTED_DIMS =>
{ {
// Clear old embeddings on first successful chunk
if !cleared_docs.contains(&chunk.doc_id) { if !cleared_docs.contains(&chunk.doc_id) {
clear_document_embeddings(conn, chunk.doc_id)?; clear_document_embeddings(conn, chunk.doc_id)?;
cleared_docs.insert(chunk.doc_id); cleared_docs.insert(chunk.doc_id);
@@ -333,8 +289,6 @@ async fn embed_page(
} }
} }
// Fire progress for all normal documents after embedding completes.
// This ensures progress reflects actual embedding work, not just chunking.
*processed += page_normal_docs; *processed += page_normal_docs;
if let Some(cb) = progress_callback { if let Some(cb) = progress_callback {
cb(*processed, total); cb(*processed, total);
@@ -343,7 +297,6 @@ async fn embed_page(
Ok(()) Ok(())
} }
/// Clear all embeddings and metadata for a document.
fn clear_document_embeddings(conn: &Connection, document_id: i64) -> Result<()> { fn clear_document_embeddings(conn: &Connection, document_id: i64) -> Result<()> {
conn.execute( conn.execute(
"DELETE FROM embedding_metadata WHERE document_id = ?1", "DELETE FROM embedding_metadata WHERE document_id = ?1",
@@ -360,7 +313,6 @@ fn clear_document_embeddings(conn: &Connection, document_id: i64) -> Result<()>
Ok(()) Ok(())
} }
/// Store an embedding vector and its metadata.
#[allow(clippy::too_many_arguments)] #[allow(clippy::too_many_arguments)]
fn store_embedding( fn store_embedding(
conn: &Connection, conn: &Connection,
@@ -384,7 +336,6 @@ fn store_embedding(
rusqlite::params![rowid, embedding_bytes], rusqlite::params![rowid, embedding_bytes],
)?; )?;
// Only store chunk_count on the sentinel row (chunk_index=0)
let chunk_count: Option<i64> = if chunk_index == 0 { let chunk_count: Option<i64> = if chunk_index == 0 {
Some(total_chunks as i64) Some(total_chunks as i64)
} else { } else {
@@ -413,7 +364,6 @@ fn store_embedding(
Ok(()) Ok(())
} }
/// Record an embedding error in metadata for later retry.
fn record_embedding_error( fn record_embedding_error(
conn: &Connection, conn: &Connection,
doc_id: i64, doc_id: i64,
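The savepoint comments removed in the hunk above describe a closure-plus-match pattern for keeping a page of writes atomic. Below is a minimal, hedged sketch of that pattern using a hypothetical `Db` stand-in (its `execute_batch` only records statements; rusqlite's real connection executes them), so the control flow can be shown self-contained:

```rust
// Hypothetical stand-in for a SQLite connection: records each statement
// instead of executing it, so the rollback path is observable.
struct Db {
    log: Vec<String>,
}

impl Db {
    fn execute_batch(&mut self, sql: &str) -> Result<(), String> {
        self.log.push(sql.to_string());
        Ok(())
    }
}

// Stand-in for the fallible per-page work (chunk + embed + store).
fn embed_page(db: &mut Db, fail: bool) -> Result<(), String> {
    if fail {
        return Err("chunk embedding failed".into());
    }
    db.execute_batch("INSERT INTO embeddings ...")
}

// The pattern itself: open a savepoint, then use `match` instead of `?`
// so the savepoint is always released or rolled back, never leaked.
fn embed_page_atomic(db: &mut Db, fail: bool) -> Result<(), String> {
    db.execute_batch("SAVEPOINT embed_page")?;
    match embed_page(db, fail) {
        Ok(()) => db.execute_batch("RELEASE embed_page"),
        Err(e) => {
            // Roll back partial writes, then release so the connection
            // is left in a clean transactional state.
            db.execute_batch("ROLLBACK TO embed_page")?;
            db.execute_batch("RELEASE embed_page")?;
            Err(e)
        }
    }
}
```

The same shape works against a real `rusqlite::Connection`; the names `Db` and `embed_page_atomic` are illustrative only.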

View File

@@ -1,5 +1,3 @@
-//! GitLab API client with rate limiting and error handling.
 use async_stream::stream;
 use chrono::{DateTime, Utc};
 use futures::Stream;
@@ -13,12 +11,11 @@ use tokio::time::sleep;
 use tracing::debug;
 use super::types::{
-    GitLabDiscussion, GitLabIssue, GitLabLabelEvent, GitLabMergeRequest, GitLabMilestoneEvent,
-    GitLabProject, GitLabStateEvent, GitLabUser, GitLabVersion,
+    GitLabDiscussion, GitLabIssue, GitLabIssueRef, GitLabLabelEvent, GitLabMergeRequest,
+    GitLabMilestoneEvent, GitLabProject, GitLabStateEvent, GitLabUser, GitLabVersion,
 };
 use crate::core::error::{LoreError, Result};
-/// Simple rate limiter with jitter to prevent thundering herd.
 struct RateLimiter {
     last_request: Instant,
     min_interval: Duration,
@@ -26,35 +23,28 @@ struct RateLimiter {
 impl RateLimiter {
     fn new(requests_per_second: f64) -> Self {
-        // Floor at 0.1 rps to prevent division-by-zero panic in Duration::from_secs_f64
         let rps = requests_per_second.max(0.1);
         Self {
-            last_request: Instant::now() - Duration::from_secs(1), // Allow immediate first request
+            last_request: Instant::now() - Duration::from_secs(1),
             min_interval: Duration::from_secs_f64(1.0 / rps),
         }
     }
-    /// Compute how long to wait and update last_request to the expected
-    /// request time (now, or now + delay). The caller sleeps *after*
-    /// releasing the mutex guard.
     fn check_delay(&mut self) -> Option<Duration> {
         let elapsed = self.last_request.elapsed();
         if elapsed < self.min_interval {
             let jitter = Duration::from_millis(rand_jitter());
             let delay = self.min_interval - elapsed + jitter;
-            // Set last_request to when the request will actually fire
             self.last_request = Instant::now() + delay;
             Some(delay)
         } else {
-            // No delay needed; request fires immediately
             self.last_request = Instant::now();
             None
         }
     }
 }
-/// Generate random jitter between 0-50ms using a lightweight atomic counter.
 fn rand_jitter() -> u64 {
     use std::sync::atomic::{AtomicU64, Ordering};
     static COUNTER: AtomicU64 = AtomicU64::new(0);
@@ -66,10 +56,6 @@ fn rand_jitter() -> u64 {
     (n ^ nanos) % 50
 }
-/// GitLab API client with rate limiting.
-///
-/// Cloning shares the underlying HTTP client and rate limiter,
-/// making it cheap and safe for concurrent use across projects.
 #[derive(Clone)]
 pub struct GitLabClient {
     client: Client,
@@ -79,7 +65,6 @@ pub struct GitLabClient {
 }
 impl GitLabClient {
-    /// Create a new GitLab client.
     pub fn new(base_url: &str, token: &str, requests_per_second: Option<f64>) -> Self {
         let mut headers = HeaderMap::new();
         headers.insert(ACCEPT, HeaderValue::from_static("application/json"));
@@ -100,26 +85,21 @@ impl GitLabClient {
         }
     }
-    /// Get the currently authenticated user.
     pub async fn get_current_user(&self) -> Result<GitLabUser> {
         self.request("/api/v4/user").await
     }
-    /// Get a project by its path.
     pub async fn get_project(&self, path_with_namespace: &str) -> Result<GitLabProject> {
         let encoded = urlencoding::encode(path_with_namespace);
         self.request(&format!("/api/v4/projects/{encoded}")).await
     }
-    /// Get GitLab server version.
     pub async fn get_version(&self) -> Result<GitLabVersion> {
         self.request("/api/v4/version").await
     }
-    /// Maximum number of retries on 429 Too Many Requests.
     const MAX_RETRIES: u32 = 3;
-    /// Make an authenticated API request with automatic 429 retry.
     async fn request<T: serde::de::DeserializeOwned>(&self, path: &str) -> Result<T> {
         let url = format!("{}{}", self.base_url, path);
         let mut last_response = None;
@@ -160,14 +140,10 @@ impl GitLabClient {
             break;
         }
-        // Safety: the loop always executes at least once (0..=MAX_RETRIES)
-        // and either sets last_response+break, or continues (only when
-        // attempt < MAX_RETRIES). The final iteration always reaches break.
         self.handle_response(last_response.expect("retry loop ran at least once"), path)
             .await
     }
-    /// Parse retry-after header from a 429 response, defaulting to 60s.
     fn parse_retry_after(response: &Response) -> u64 {
         response
             .headers()
@@ -177,7 +153,6 @@ impl GitLabClient {
             .unwrap_or(60)
     }
-    /// Handle API response, converting errors appropriately.
     async fn handle_response<T: serde::de::DeserializeOwned>(
         &self,
         response: Response,
@@ -217,15 +192,6 @@ impl GitLabClient {
         }
     }
-    /// Paginate through issues for a project.
-    ///
-    /// Returns an async stream of issues, handling pagination automatically.
-    /// Issues are ordered by updated_at ascending to support cursor-based sync.
-    ///
-    /// # Arguments
-    /// * `gitlab_project_id` - The GitLab project ID
-    /// * `updated_after` - Optional cursor (ms epoch) - only fetch issues updated after this
-    /// * `cursor_rewind_seconds` - Rewind cursor by this many seconds to handle edge cases
     pub fn paginate_issues(
         &self,
         gitlab_project_id: i64,
@@ -236,7 +202,6 @@ impl GitLabClient {
         let mut page = 1u32;
         let per_page = 100u32;
-        // Apply cursor rewind, clamping to 0
         let rewound_cursor = updated_after.map(|ts| {
             let rewind_ms = (cursor_rewind_seconds as i64) * 1000;
             (ts - rewind_ms).max(0)
@@ -252,7 +217,6 @@ impl GitLabClient {
                 ("page", page.to_string()),
             ];
-            // Add updated_after if we have a cursor
             if let Some(ts_ms) = rewound_cursor
                 && let Some(iso) = ms_to_iso8601(ts_ms)
             {
@@ -267,12 +231,10 @@ impl GitLabClient {
             let is_empty = issues.is_empty();
             let full_page = issues.len() as u32 == per_page;
-            // Yield each issue
             for issue in issues {
                 yield Ok(issue);
             }
-            // Check for next page
             let next_page = headers
                 .get("x-next-page")
                 .and_then(|v| v.to_str().ok())
@@ -286,7 +248,6 @@ impl GitLabClient {
             if is_empty || !full_page {
                 break;
             }
-            // Full page but no x-next-page header: try next page heuristically
             page += 1;
         }
     }
@@ -300,9 +261,6 @@ impl GitLabClient {
         })
     }
-    /// Paginate through discussions for an issue.
-    ///
-    /// Returns an async stream of discussions, handling pagination automatically.
     pub fn paginate_issue_discussions(
         &self,
         gitlab_project_id: i64,
@@ -346,7 +304,6 @@ impl GitLabClient {
             if is_empty || !full_page {
                 break;
             }
-            // Full page but no x-next-page header: try next page heuristically
             page += 1;
         }
     }
@@ -360,15 +317,6 @@ impl GitLabClient {
         })
     }
-    /// Paginate through merge requests for a project.
-    ///
-    /// Returns an async stream of merge requests, handling pagination automatically.
-    /// MRs are ordered by updated_at ascending to support cursor-based sync.
-    ///
-    /// # Arguments
-    /// * `gitlab_project_id` - The GitLab project ID
-    /// * `updated_after` - Optional cursor (ms epoch) - only fetch MRs updated after this
-    /// * `cursor_rewind_seconds` - Rewind cursor by this many seconds to handle edge cases
     pub fn paginate_merge_requests(
         &self,
         gitlab_project_id: i64,
@@ -414,7 +362,6 @@ impl GitLabClient {
         })
     }
-    /// Fetch a single page of merge requests with pagination metadata.
     pub async fn fetch_merge_requests_page(
         &self,
         gitlab_project_id: i64,
@@ -423,7 +370,6 @@ impl GitLabClient {
         page: u32,
         per_page: u32,
     ) -> Result<MergeRequestPage> {
-        // Apply cursor rewind, clamping to 0
         let rewound_cursor = updated_after.map(|ts| {
             let rewind_ms = (cursor_rewind_seconds as i64) * 1000;
             (ts - rewind_ms).max(0)
@@ -438,7 +384,6 @@ impl GitLabClient {
             ("page", page.to_string()),
         ];
-        // Add updated_after if we have a cursor
         if let Some(ts_ms) = rewound_cursor
             && let Some(iso) = ms_to_iso8601(ts_ms)
         {
@@ -450,7 +395,6 @@ impl GitLabClient {
             .request_with_headers::<Vec<GitLabMergeRequest>>(&path, &params)
             .await?;
-        // Pagination fallback chain: Link header > x-next-page > full-page heuristic
        let link_next = parse_link_header_next(&headers);
        let x_next_page = headers
            .get("x-next-page")
@@ -459,10 +403,10 @@ impl GitLabClient {
        let full_page = items.len() as u32 == per_page;
        let (next_page, is_last_page) = match (link_next.is_some(), x_next_page, full_page) {
-            (true, _, _) => (Some(page + 1), false), // Link header present: continue
-            (false, Some(np), _) => (Some(np), false), // x-next-page present: use it
-            (false, None, true) => (Some(page + 1), false), // Full page, no headers: try next
-            (false, None, false) => (None, true), // Partial page: we're done
+            (true, _, _) => (Some(page + 1), false),
+            (false, Some(np), _) => (Some(np), false),
+            (false, None, true) => (Some(page + 1), false),
+            (false, None, false) => (None, true),
        };
        Ok(MergeRequestPage {
@@ -472,9 +416,6 @@ impl GitLabClient {
        })
    }
-    /// Paginate through discussions for a merge request.
-    ///
-    /// Returns an async stream of discussions, handling pagination automatically.
    pub fn paginate_mr_discussions(
        &self,
        gitlab_project_id: i64,
@@ -505,7 +446,6 @@ impl GitLabClient {
                yield Ok(discussion);
            }
-            // Pagination fallback chain: Link header > x-next-page > full-page heuristic
            let link_next = parse_link_header_next(&headers);
            let x_next_page = headers
                .get("x-next-page")
@@ -514,18 +454,18 @@ impl GitLabClient {
            let should_continue = match (link_next.is_some(), x_next_page, full_page) {
                (true, _, _) => {
-                    page += 1; // Link header present: continue to next
+                    page += 1;
                    true
                }
                (false, Some(np), _) if np > page => {
-                    page = np; // x-next-page tells us exactly which page
+                    page = np;
                    true
                }
                (false, None, true) => {
-                    page += 1; // Full page, no headers: try next
+                    page += 1;
                    true
                }
-                _ => false, // Otherwise we're done
+                _ => false,
            };
            if !should_continue || is_empty {
@@ -541,8 +481,6 @@ impl GitLabClient {
        })
    }
-    /// Make an authenticated API request with query parameters, returning headers.
-    /// Automatically retries on 429 Too Many Requests.
    async fn request_with_headers<T: serde::de::DeserializeOwned>(
        &self,
        path: &str,
@@ -595,8 +533,6 @@ impl GitLabClient {
    }
 }
-/// Fetch all discussions for an MR (collects paginated results).
-/// This is useful for parallel prefetching where we want all data upfront.
 impl GitLabClient {
    pub async fn fetch_all_mr_discussions(
        &self,
@@ -616,12 +552,7 @@ impl GitLabClient {
    }
 }
-/// Resource events API methods.
-///
-/// These endpoints return per-entity events (not project-wide), so they collect
-/// all pages into a Vec rather than using streaming.
 impl GitLabClient {
-    /// Fetch all pages from a paginated endpoint, returning collected results.
    async fn fetch_all_pages<T: serde::de::DeserializeOwned>(&self, path: &str) -> Result<Vec<T>> {
        let mut results = Vec::new();
        let mut page = 1u32;
@@ -658,7 +589,16 @@ impl GitLabClient {
        Ok(results)
    }
-    /// Fetch state events for an issue.
+    pub async fn fetch_mr_closes_issues(
+        &self,
+        gitlab_project_id: i64,
+        iid: i64,
+    ) -> Result<Vec<GitLabIssueRef>> {
+        let path =
+            format!("/api/v4/projects/{gitlab_project_id}/merge_requests/{iid}/closes_issues");
+        self.fetch_all_pages(&path).await
+    }
    pub async fn fetch_issue_state_events(
        &self,
        gitlab_project_id: i64,
@@ -669,7 +609,6 @@ impl GitLabClient {
        self.fetch_all_pages(&path).await
    }
-    /// Fetch label events for an issue.
    pub async fn fetch_issue_label_events(
        &self,
        gitlab_project_id: i64,
@@ -680,7 +619,6 @@ impl GitLabClient {
        self.fetch_all_pages(&path).await
    }
-    /// Fetch milestone events for an issue.
    pub async fn fetch_issue_milestone_events(
        &self,
        gitlab_project_id: i64,
@@ -691,7 +629,6 @@ impl GitLabClient {
        self.fetch_all_pages(&path).await
    }
-    /// Fetch state events for a merge request.
    pub async fn fetch_mr_state_events(
        &self,
        gitlab_project_id: i64,
@@ -703,7 +640,6 @@ impl GitLabClient {
        self.fetch_all_pages(&path).await
    }
-    /// Fetch label events for a merge request.
    pub async fn fetch_mr_label_events(
        &self,
        gitlab_project_id: i64,
@@ -715,7 +651,6 @@ impl GitLabClient {
        self.fetch_all_pages(&path).await
    }
-    /// Fetch milestone events for a merge request.
    pub async fn fetch_mr_milestone_events(
        &self,
        gitlab_project_id: i64,
@@ -727,12 +662,6 @@ impl GitLabClient {
        self.fetch_all_pages(&path).await
    }
-    /// Fetch all three event types for an entity concurrently.
-    ///
-    /// Uses `tokio::join!` instead of `try_join!` so that a 404 on one event
-    /// type (e.g., labels) doesn't discard successfully-fetched data from the
-    /// others (e.g., state events). 404s are treated as "no events" (empty vec);
-    /// all other errors (including 403) are propagated for retry.
    pub async fn fetch_all_resource_events(
        &self,
        gitlab_project_id: i64,
@@ -765,8 +694,6 @@ impl GitLabClient {
            }
        };
-        // Treat 404 as "endpoint not available for this entity" → empty vec.
-        // All other errors (403, network, etc.) propagate for retry handling.
        let state = coalesce_not_found(state_res)?;
        let label = coalesce_not_found(label_res)?;
        let milestone = coalesce_not_found(milestone_res)?;
@@ -775,7 +702,6 @@ impl GitLabClient {
    }
 }
-/// Page result for merge request pagination.
 #[derive(Debug)]
 pub struct MergeRequestPage {
    pub items: Vec<GitLabMergeRequest>,
@@ -783,13 +709,11 @@ pub struct MergeRequestPage {
    pub is_last_page: bool,
 }
-/// Parse Link header to extract rel="next" URL (RFC 8288).
 fn parse_link_header_next(headers: &HeaderMap) -> Option<String> {
    headers
        .get("link")
        .and_then(|v| v.to_str().ok())
        .and_then(|link_str| {
-            // Format: <url>; rel="next", <url>; rel="last"
            for part in link_str.split(',') {
                let part = part.trim();
                if (part.contains("rel=\"next\"") || part.contains("rel=next"))
@@ -803,11 +727,6 @@ fn parse_link_header_next(headers: &HeaderMap) -> Option<String> {
        })
 }
-/// Convert a resource-event fetch result: 404 → empty vec, other errors propagated.
-///
-/// 404 means the endpoint doesn't exist for this entity type — truly permanent.
-/// 403 and other errors are NOT coalesced: they may be environmental (VPN, token
-/// rotation) and should be retried via the drain loop's backoff mechanism.
 fn coalesce_not_found<T>(result: Result<Vec<T>>) -> Result<Vec<T>> {
    match result {
        Ok(v) => Ok(v),
@@ -816,7 +735,6 @@ fn coalesce_not_found<T>(result: Result<Vec<T>>) -> Result<Vec<T>> {
    }
 }
-/// Convert milliseconds since epoch to ISO 8601 string.
 fn ms_to_iso8601(ms: i64) -> Option<String> {
    DateTime::<Utc>::from_timestamp_millis(ms)
        .map(|dt| dt.format("%Y-%m-%dT%H:%M:%S%.3fZ").to_string())
@@ -828,7 +746,6 @@ mod tests {
    #[test]
    fn ms_to_iso8601_converts_correctly() {
-        // 2024-01-15T10:00:00.000Z = 1705312800000 ms
        let result = ms_to_iso8601(1705312800000);
        assert_eq!(result, Some("2024-01-15T10:00:00.000Z".to_string()));
    }
@@ -841,10 +758,9 @@ mod tests {
    #[test]
    fn cursor_rewind_clamps_to_zero() {
-        let updated_after = 1000i64; // 1 second
-        let cursor_rewind_seconds = 10u32; // 10 seconds
-        // Rewind would be negative, should clamp to 0
+        let updated_after = 1000i64;
+        let cursor_rewind_seconds = 10u32;
        let rewind_ms = i64::from(cursor_rewind_seconds) * 1000;
        let rewound = (updated_after - rewind_ms).max(0);
@@ -853,13 +769,12 @@ mod tests {
    #[test]
    fn cursor_rewind_applies_correctly() {
-        let updated_after = 1705312800000i64; // 2024-01-15T10:00:00.000Z
-        let cursor_rewind_seconds = 60u32; // 1 minute
+        let updated_after = 1705312800000i64;
+        let cursor_rewind_seconds = 60u32;
        let rewind_ms = i64::from(cursor_rewind_seconds) * 1000;
        let rewound = (updated_after - rewind_ms).max(0);
-        // Should be 1 minute earlier
        assert_eq!(rewound, 1705312740000);
    }
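The pagination fallback chain used by `fetch_merge_requests_page` (Link header, then `x-next-page`, then the full-page heuristic) reduces to a small pure function. This is a hedged sketch under that reading; `decide_next_page` is a hypothetical helper, not part of the client above:

```rust
// Decide the next page and whether this was the last page, in priority order:
// 1. Link header present: continue to page + 1.
// 2. x-next-page header present: trust it exactly.
// 3. Full page with no headers: probe the next page heuristically.
// 4. Partial page: pagination is finished.
fn decide_next_page(
    link_next: Option<&str>,
    x_next_page: Option<u32>,
    full_page: bool,
    page: u32,
) -> (Option<u32>, bool) {
    match (link_next.is_some(), x_next_page, full_page) {
        (true, _, _) => (Some(page + 1), false),
        (false, Some(np), _) => (Some(np), false),
        (false, None, true) => (Some(page + 1), false),
        (false, None, false) => (None, true),
    }
}
```

Isolating the decision like this makes each fallback branch unit-testable without a live GitLab server.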

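`parse_link_header_next` above extracts the `rel="next"` URL from an RFC 8288 Link header. A self-contained sketch of the same idea, operating on a plain `&str` instead of reqwest's `HeaderMap` (the `link_next` name and the exact matching rules are assumptions for illustration):

```rust
// Extract the rel="next" target from a Link header value such as:
//   <https://host/api?page=2>; rel="next", <https://host/api?page=9>; rel="last"
fn link_next(link_str: &str) -> Option<String> {
    for part in link_str.split(',') {
        let part = part.trim();
        if part.contains("rel=\"next\"") || part.contains("rel=next") {
            // The URL segment precedes the first ';' and is wrapped in <>.
            let url = part.split(';').next()?.trim();
            let url = url.strip_prefix('<')?.strip_suffix('>')?;
            return Some(url.to_string());
        }
    }
    None
}
```

Note this naive split on ',' assumes URLs contain no commas, which holds for GitLab's pagination links but not for Link headers in general.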
View File

@@ -1,5 +1,3 @@
-//! GitLab API client and types.
 pub mod client;
 pub mod transformers;
 pub mod types;
@@ -10,7 +8,7 @@ pub use transformers::{
     transform_discussion, transform_issue, transform_notes,
 };
 pub use types::{
-    GitLabAuthor, GitLabDiscussion, GitLabIssue, GitLabLabelEvent, GitLabLabelRef,
+    GitLabAuthor, GitLabDiscussion, GitLabIssue, GitLabIssueRef, GitLabLabelEvent, GitLabLabelRef,
     GitLabMergeRequestRef, GitLabMilestoneEvent, GitLabMilestoneRef, GitLabNote,
     GitLabNotePosition, GitLabProject, GitLabStateEvent, GitLabUser, GitLabVersion,
 };

View File

@@ -1,66 +1,57 @@
//! Discussion and note transformers: convert GitLab discussions to local schema.
use tracing::warn; use tracing::warn;
use crate::core::time::{iso_to_ms, iso_to_ms_strict, now_ms}; use crate::core::time::{iso_to_ms, iso_to_ms_strict, now_ms};
use crate::gitlab::types::{GitLabDiscussion, GitLabNote}; use crate::gitlab::types::{GitLabDiscussion, GitLabNote};
/// Reference to the parent noteable (Issue or MergeRequest).
/// Uses an enum to prevent accidentally mixing up issue vs MR IDs at compile time.
#[derive(Debug, Clone, Copy)] #[derive(Debug, Clone, Copy)]
pub enum NoteableRef { pub enum NoteableRef {
Issue(i64), Issue(i64),
MergeRequest(i64), MergeRequest(i64),
} }
/// Normalized discussion for local storage.
#[derive(Debug, Clone)] #[derive(Debug, Clone)]
pub struct NormalizedDiscussion { pub struct NormalizedDiscussion {
pub gitlab_discussion_id: String, pub gitlab_discussion_id: String,
pub project_id: i64, pub project_id: i64,
pub issue_id: Option<i64>, pub issue_id: Option<i64>,
     pub merge_request_id: Option<i64>,
-    pub noteable_type: String, // "Issue" or "MergeRequest"
+    pub noteable_type: String,
     pub individual_note: bool,
-    pub first_note_at: Option<i64>, // min(note.created_at) in ms epoch
-    pub last_note_at: Option<i64>, // max(note.created_at) in ms epoch
+    pub first_note_at: Option<i64>,
+    pub last_note_at: Option<i64>,
     pub last_seen_at: i64,
-    pub resolvable: bool, // any note is resolvable
-    pub resolved: bool, // all resolvable notes are resolved
+    pub resolvable: bool,
+    pub resolved: bool,
 }
-/// Normalized note for local storage.
 #[derive(Debug, Clone)]
 pub struct NormalizedNote {
     pub gitlab_id: i64,
     pub project_id: i64,
-    pub note_type: Option<String>, // "DiscussionNote" | "DiffNote" | null
+    pub note_type: Option<String>,
     pub is_system: bool,
     pub author_username: String,
     pub body: String,
-    pub created_at: i64, // ms epoch
-    pub updated_at: i64, // ms epoch
+    pub created_at: i64,
+    pub updated_at: i64,
     pub last_seen_at: i64,
-    pub position: i32, // 0-indexed array position
+    pub position: i32,
     pub resolvable: bool,
     pub resolved: bool,
     pub resolved_by: Option<String>,
     pub resolved_at: Option<i64>,
-    // DiffNote position fields (CP1 - basic path/line)
     pub position_old_path: Option<String>,
     pub position_new_path: Option<String>,
     pub position_old_line: Option<i32>,
     pub position_new_line: Option<i32>,
-    // DiffNote extended position fields (CP2)
-    pub position_type: Option<String>, // "text" | "image" | "file"
-    pub position_line_range_start: Option<i32>, // multi-line comment start
-    pub position_line_range_end: Option<i32>, // multi-line comment end
-    pub position_base_sha: Option<String>, // Base commit SHA for diff
-    pub position_start_sha: Option<String>, // Start commit SHA for diff
-    pub position_head_sha: Option<String>, // Head commit SHA for diff
+    pub position_type: Option<String>,
+    pub position_line_range_start: Option<i32>,
+    pub position_line_range_end: Option<i32>,
+    pub position_base_sha: Option<String>,
+    pub position_start_sha: Option<String>,
+    pub position_head_sha: Option<String>,
 }
-/// Parse ISO 8601 timestamp to milliseconds, defaulting to 0 on failure.
 fn parse_timestamp(ts: &str) -> i64 {
     match iso_to_ms(ts) {
         Some(ms) => ms,
@@ -71,7 +62,6 @@ fn parse_timestamp(ts: &str) -> i64 {
     }
 }
-/// Transform a GitLab discussion into normalized schema.
 pub fn transform_discussion(
     gitlab_discussion: &GitLabDiscussion,
     local_project_id: i64,
@@ -79,13 +69,11 @@ pub fn transform_discussion(
 ) -> NormalizedDiscussion {
     let now = now_ms();
-    // Derive issue_id, merge_request_id, and noteable_type from the enum
     let (issue_id, merge_request_id, noteable_type) = match noteable {
         NoteableRef::Issue(id) => (Some(id), None, "Issue"),
         NoteableRef::MergeRequest(id) => (None, Some(id), "MergeRequest"),
     };
-    // Compute first_note_at and last_note_at from notes
     let note_timestamps: Vec<i64> = gitlab_discussion
         .notes
         .iter()
@@ -95,10 +83,8 @@ pub fn transform_discussion(
     let first_note_at = note_timestamps.iter().min().copied();
     let last_note_at = note_timestamps.iter().max().copied();
-    // Compute resolvable: any note is resolvable
     let resolvable = gitlab_discussion.notes.iter().any(|n| n.resolvable);
-    // Compute resolved: all resolvable notes are resolved
     let resolved = if resolvable {
         gitlab_discussion
             .notes
@@ -124,8 +110,6 @@ pub fn transform_discussion(
     }
 }
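The removed comments documented the aggregation rule: a discussion is resolvable if any note is resolvable, and resolved only when every resolvable note is resolved. A minimal sketch of that rule, using a stripped-down note type (the real `GitLabNote` carries many more fields):

```rust
// Stripped-down stand-in for GitLabNote.
struct Note {
    resolvable: bool,
    resolved: bool,
}

// A discussion is resolvable if any note is resolvable, and resolved only
// when every resolvable note has been resolved (never when none are resolvable).
fn discussion_resolution(notes: &[Note]) -> (bool, bool) {
    let resolvable = notes.iter().any(|n| n.resolvable);
    let resolved = resolvable
        && notes.iter().filter(|n| n.resolvable).all(|n| n.resolved);
    (resolvable, resolved)
}
```

Note the guard: with no resolvable notes, `resolved` stays `false`, which matches the `no_resolvable` test case further down.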
-/// Transform a GitLab discussion for MR context.
-/// Convenience wrapper that uses NoteableRef::MergeRequest internally.
 pub fn transform_mr_discussion(
     gitlab_discussion: &GitLabDiscussion,
     local_project_id: i64,
@@ -138,7 +122,6 @@ pub fn transform_mr_discussion(
     )
 }
-/// Transform notes from a GitLab discussion into normalized schema.
 pub fn transform_notes(
     gitlab_discussion: &GitLabDiscussion,
     local_project_id: i64,
@@ -159,7 +142,6 @@ fn transform_single_note(
     position: i32,
     now: i64,
 ) -> NormalizedNote {
-    // Extract DiffNote position fields if present
     let (
         position_old_path,
         position_new_path,
@@ -201,8 +183,6 @@ fn transform_single_note(
     }
 }
-/// Extract DiffNote position fields from GitLabNotePosition.
-/// Returns tuple of all position fields (all None if position is None).
 #[allow(clippy::type_complexity)]
 fn extract_position_fields(
     position: &Option<crate::gitlab::types::GitLabNotePosition>,
@@ -240,8 +220,6 @@ fn extract_position_fields(
     }
 }
-/// Transform notes from a GitLab discussion with strict timestamp parsing.
-/// Returns Err if any timestamp is invalid - no silent fallback to 0.
 pub fn transform_notes_with_diff_position(
     gitlab_discussion: &GitLabDiscussion,
     local_project_id: i64,
@@ -262,7 +240,6 @@ fn transform_single_note_strict(
     position: i32,
     now: i64,
 ) -> Result<NormalizedNote, String> {
-    // Parse timestamps with strict error handling
     let created_at = iso_to_ms_strict(&note.created_at)?;
     let updated_at = iso_to_ms_strict(&note.updated_at)?;
     let resolved_at = match &note.resolved_at {
@@ -270,7 +247,6 @@ fn transform_single_note_strict(
         None => None,
     };
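This file now has both timestamp strategies: the lenient `parse_timestamp` (falls back to 0 via `iso_to_ms`) and the strict path (`iso_to_ms_strict`, propagates an error). A sketch of the two shapes with a stand-in parser — `parse_ms` is hypothetical; the real code parses ISO 8601:

```rust
// Stand-in parser; the real code parses ISO 8601 via iso_to_ms / iso_to_ms_strict.
fn parse_ms(ts: &str) -> Option<i64> {
    ts.parse::<i64>().ok()
}

// Lenient variant: silently falls back to 0 on bad input.
fn lenient_ms(ts: &str) -> i64 {
    parse_ms(ts).unwrap_or(0)
}

// Strict variant: surfaces the bad input instead of masking it.
fn strict_ms(ts: &str) -> Result<i64, String> {
    parse_ms(ts).ok_or_else(|| format!("invalid timestamp: {ts}"))
}
```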
-    // Extract DiffNote position fields if present
     let (
         position_old_path,
         position_new_path,
@@ -448,7 +424,7 @@ mod tests {
             false,
             vec![
                 make_test_note(1, "2024-01-16T09:00:00.000Z", false, false, false),
-                make_test_note(2, "2024-01-16T09:00:00.000Z", true, false, false), // system note
+                make_test_note(2, "2024-01-16T09:00:00.000Z", true, false, false),
             ],
         );
@@ -482,16 +458,14 @@ mod tests {
             false,
             vec![
                 make_test_note(1, "2024-01-16T09:00:00.000Z", false, false, false),
-                make_test_note(2, "2024-01-16T11:00:00.000Z", false, false, false), // latest
+                make_test_note(2, "2024-01-16T11:00:00.000Z", false, false, false),
                 make_test_note(3, "2024-01-16T10:00:00.000Z", false, false, false),
             ],
         );
         let result = transform_discussion(&discussion, 100, NoteableRef::Issue(42));
-        // first_note_at should be 09:00 (note 1)
         assert_eq!(result.first_note_at, Some(1705395600000));
-        // last_note_at should be 11:00 (note 2)
         assert_eq!(result.last_note_at, Some(1705402800000));
     }
@@ -527,7 +501,7 @@ mod tests {
         let resolvable = make_test_discussion(
             false,
             vec![
-                make_test_note(1, "2024-01-16T09:00:00.000Z", false, true, false), // resolvable
+                make_test_note(1, "2024-01-16T09:00:00.000Z", false, true, false),
                 make_test_note(2, "2024-01-16T10:00:00.000Z", false, false, false),
             ],
         );
@@ -538,16 +512,14 @@ mod tests {
     #[test]
     fn computes_resolved_only_when_all_resolvable_notes_resolved() {
-        // Mix of resolved/unresolved - not resolved
         let partial = make_test_discussion(
             false,
             vec![
-                make_test_note(1, "2024-01-16T09:00:00.000Z", false, true, true), // resolved
-                make_test_note(2, "2024-01-16T10:00:00.000Z", false, true, false), // not resolved
+                make_test_note(1, "2024-01-16T09:00:00.000Z", false, true, true),
+                make_test_note(2, "2024-01-16T10:00:00.000Z", false, true, false),
             ],
         );
-        // All resolvable notes resolved
         let fully_resolved = make_test_discussion(
             false,
             vec![
@@ -556,7 +528,6 @@ mod tests {
             ],
         );
-        // No resolvable notes - resolved should be false
         let no_resolvable = make_test_discussion(
             false,
             vec![make_test_note(
@@ -1,5 +1,3 @@
-//! Issue transformer: converts GitLabIssue to local schema.
 use chrono::DateTime;
 use thiserror::Error;
@@ -11,7 +9,6 @@ pub enum TransformError {
     TimestampParse(String, String),
 }
-/// Local schema representation of an issue row.
 #[derive(Debug, Clone)]
 pub struct IssueRow {
     pub gitlab_id: i64,
@@ -21,14 +18,13 @@ pub struct IssueRow {
     pub description: Option<String>,
     pub state: String,
     pub author_username: String,
-    pub created_at: i64, // ms epoch UTC
-    pub updated_at: i64, // ms epoch UTC
+    pub created_at: i64,
+    pub updated_at: i64,
     pub web_url: String,
-    pub due_date: Option<String>, // YYYY-MM-DD
-    pub milestone_title: Option<String>, // Denormalized for quick display
+    pub due_date: Option<String>,
+    pub milestone_title: Option<String>,
 }
-/// Local schema representation of a milestone row.
 #[derive(Debug, Clone)]
 pub struct MilestoneRow {
     pub gitlab_id: i64,
@@ -41,7 +37,6 @@ pub struct MilestoneRow {
     pub web_url: Option<String>,
 }
-/// Issue bundled with extracted metadata.
 #[derive(Debug, Clone)]
 pub struct IssueWithMetadata {
     pub issue: IssueRow,
@@ -50,14 +45,12 @@ pub struct IssueWithMetadata {
     pub milestone: Option<MilestoneRow>,
 }
-/// Parse ISO 8601 timestamp to milliseconds since Unix epoch.
 fn parse_timestamp(ts: &str) -> Result<i64, TransformError> {
     DateTime::parse_from_rfc3339(ts)
         .map(|dt| dt.timestamp_millis())
         .map_err(|e| TransformError::TimestampParse(ts.to_string(), e.to_string()))
 }
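`parse_from_rfc3339` normalizes any UTC offset into a single epoch, which is what the `handles_timezone_offset_timestamps` test below relies on. The arithmetic behind that equivalence (illustrative helper, not part of this codebase): subtracting the offset from the local wall-clock epoch yields the UTC epoch, so "05:00 at -05:00" and "10:00Z" coincide.

```rust
// wall_ms: the wall-clock time read as if it were UTC (ms epoch);
// offset_minutes: the zone's UTC offset (e.g. -300 for -05:00).
fn wall_to_utc_ms(wall_ms: i64, offset_minutes: i64) -> i64 {
    wall_ms - offset_minutes * 60 * 1000
}
```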
-/// Transform a GitLab issue into local schema format.
 pub fn transform_issue(issue: &GitLabIssue) -> Result<IssueWithMetadata, TransformError> {
     let created_at = parse_timestamp(&issue.created_at)?;
     let updated_at = parse_timestamp(&issue.updated_at)?;
@@ -182,20 +175,16 @@ mod tests {
         let issue = make_test_issue();
         let result = transform_issue(&issue).unwrap();
-        // 2024-01-15T10:00:00.000Z = 1705312800000 ms
         assert_eq!(result.issue.created_at, 1705312800000);
-        // 2024-01-20T15:30:00.000Z = 1705764600000 ms
         assert_eq!(result.issue.updated_at, 1705764600000);
     }
     #[test]
     fn handles_timezone_offset_timestamps() {
         let mut issue = make_test_issue();
-        // GitLab can return timestamps with timezone offset
         issue.created_at = "2024-01-15T05:00:00-05:00".to_string();
         let result = transform_issue(&issue).unwrap();
-        // 05:00 EST = 10:00 UTC = same as original test
         assert_eq!(result.issue.created_at, 1705312800000);
     }
@@ -237,10 +226,8 @@ mod tests {
         let result = transform_issue(&issue).unwrap();
-        // Denormalized title on issue for quick display
         assert_eq!(result.issue.milestone_title, Some("v1.0".to_string()));
-        // Full milestone row for normalized storage
         let milestone = result.milestone.expect("should have milestone");
         assert_eq!(milestone.gitlab_id, 500);
         assert_eq!(milestone.iid, 5);
@@ -1,9 +1,6 @@
-//! Merge request transformer: converts GitLabMergeRequest to local schema.
 use crate::core::time::{iso_to_ms_opt_strict, iso_to_ms_strict, now_ms};
 use crate::gitlab::types::GitLabMergeRequest;
-/// Local schema representation of a merge request row.
 #[derive(Debug, Clone)]
 pub struct NormalizedMergeRequest {
     pub gitlab_id: i64,
@@ -21,15 +18,14 @@ pub struct NormalizedMergeRequest {
     pub references_full: Option<String>,
     pub detailed_merge_status: Option<String>,
     pub merge_user_username: Option<String>,
-    pub created_at: i64, // ms epoch UTC
-    pub updated_at: i64, // ms epoch UTC
-    pub merged_at: Option<i64>, // ms epoch UTC
-    pub closed_at: Option<i64>, // ms epoch UTC
-    pub last_seen_at: i64, // ms epoch UTC
+    pub created_at: i64,
+    pub updated_at: i64,
+    pub merged_at: Option<i64>,
+    pub closed_at: Option<i64>,
+    pub last_seen_at: i64,
     pub web_url: String,
 }
-/// Merge request bundled with extracted metadata.
 #[derive(Debug, Clone)]
 pub struct MergeRequestWithMetadata {
     pub merge_request: NormalizedMergeRequest,
@@ -38,61 +34,43 @@ pub struct MergeRequestWithMetadata {
     pub reviewer_usernames: Vec<String>,
 }
-/// Transform a GitLab merge request into local schema format.
-///
-/// # Arguments
-/// * `gitlab_mr` - The GitLab MR API response
-/// * `local_project_id` - The local database project ID (not GitLab's project_id)
-///
-/// # Returns
-/// * `Ok(MergeRequestWithMetadata)` - Transformed MR with extracted metadata
-/// * `Err(String)` - Error message if transformation fails (e.g., invalid timestamps)
 pub fn transform_merge_request(
     gitlab_mr: &GitLabMergeRequest,
     local_project_id: i64,
 ) -> Result<MergeRequestWithMetadata, String> {
-    // Parse required timestamps
     let created_at = iso_to_ms_strict(&gitlab_mr.created_at)?;
     let updated_at = iso_to_ms_strict(&gitlab_mr.updated_at)?;
-    // Parse optional timestamps
     let merged_at = iso_to_ms_opt_strict(&gitlab_mr.merged_at)?;
     let closed_at = iso_to_ms_opt_strict(&gitlab_mr.closed_at)?;
-    // Draft: prefer draft, fallback to work_in_progress
     let is_draft = gitlab_mr.draft || gitlab_mr.work_in_progress;
-    // Merge status: prefer detailed_merge_status over legacy
     let detailed_merge_status = gitlab_mr
         .detailed_merge_status
         .clone()
         .or_else(|| gitlab_mr.merge_status_legacy.clone());
-    // Merge user: prefer merge_user over merged_by
     let merge_user_username = gitlab_mr
         .merge_user
         .as_ref()
         .map(|u| u.username.clone())
         .or_else(|| gitlab_mr.merged_by.as_ref().map(|u| u.username.clone()));
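The chains above all follow one pattern: prefer the non-deprecated API field, fall back to the deprecated one. The same shape with plain `Option`s (illustrative helper; argument names only mirror the real fields):

```rust
// Prefer the newer field; fall back to the legacy field only when the newer
// one is absent. This is the shape of the detailed_merge_status and
// merge_user fallbacks above.
fn prefer_new(new: Option<&str>, legacy: Option<&str>) -> Option<String> {
    new.map(str::to_string).or_else(|| legacy.map(str::to_string))
}
```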
-    // References extraction
     let (references_short, references_full) = gitlab_mr
         .references
         .as_ref()
         .map(|r| (Some(r.short.clone()), Some(r.full.clone())))
         .unwrap_or((None, None));
-    // Head SHA
     let head_sha = gitlab_mr.sha.clone();
-    // Extract assignee usernames
     let assignee_usernames: Vec<String> = gitlab_mr
         .assignees
         .iter()
         .map(|a| a.username.clone())
         .collect();
-    // Extract reviewer usernames
     let reviewer_usernames: Vec<String> = gitlab_mr
         .reviewers
         .iter()
@@ -1,5 +1,3 @@
-//! Transformers for converting GitLab API responses to local schema.
 pub mod discussion;
 pub mod issue;
 pub mod merge_request;
@@ -1,5 +1,3 @@
-//! GitLab API response types.
 use serde::{Deserialize, Serialize};
 #[derive(Debug, Clone, Deserialize)]
@@ -34,10 +32,6 @@ pub struct GitLabVersion {
     pub revision: String,
 }
-// === Checkpoint 1: Issue/Discussion types ===
-/// Author information embedded in issues, notes, etc.
-/// Note: This is a simplified author - GitLabUser has more fields.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabAuthor {
     pub id: i64,
@@ -45,7 +39,6 @@ pub struct GitLabAuthor {
     pub name: String,
 }
-/// GitLab Milestone (embedded in issues).
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabMilestone {
     pub id: i64,
@@ -53,117 +46,79 @@ pub struct GitLabMilestone {
     pub project_id: Option<i64>,
     pub title: String,
     pub description: Option<String>,
-    /// "active" or "closed".
     pub state: Option<String>,
-    /// YYYY-MM-DD format.
     pub due_date: Option<String>,
     pub web_url: Option<String>,
 }
-/// GitLab Issue from /projects/:id/issues endpoint.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabIssue {
-    /// GitLab global ID (unique across all projects).
     pub id: i64,
-    /// Project-scoped issue number (the number shown in the UI).
     pub iid: i64,
-    /// The project this issue belongs to.
     pub project_id: i64,
     pub title: String,
     pub description: Option<String>,
-    /// "opened" or "closed".
     pub state: String,
-    /// ISO 8601 timestamp.
     pub created_at: String,
-    /// ISO 8601 timestamp.
     pub updated_at: String,
-    /// ISO 8601 timestamp when closed (null if open).
     pub closed_at: Option<String>,
     pub author: GitLabAuthor,
-    /// Assignees (can be multiple).
     #[serde(default)]
     pub assignees: Vec<GitLabAuthor>,
-    /// Array of label names (not label details).
     pub labels: Vec<String>,
-    /// Associated milestone (if any).
     pub milestone: Option<GitLabMilestone>,
-    /// Due date in YYYY-MM-DD format.
     pub due_date: Option<String>,
     pub web_url: String,
 }
-/// GitLab Discussion from /projects/:id/issues/:iid/discussions endpoint.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabDiscussion {
-    /// String ID (e.g., "6a9c1750b37d...").
     pub id: String,
-    /// True if this is a standalone comment, false if it's a threaded discussion.
     pub individual_note: bool,
-    /// Notes in this discussion (always at least one).
     pub notes: Vec<GitLabNote>,
 }
-/// A single note/comment within a discussion.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabNote {
     pub id: i64,
-    /// "DiscussionNote", "DiffNote", or null for simple notes.
-    /// Using rename because "type" is a reserved keyword in Rust.
     #[serde(rename = "type")]
     pub note_type: Option<String>,
     pub body: String,
     pub author: GitLabAuthor,
-    /// ISO 8601 timestamp.
     pub created_at: String,
-    /// ISO 8601 timestamp.
     pub updated_at: String,
-    /// True for system-generated notes (label changes, assignments, etc.).
     pub system: bool,
-    /// Whether this note can be resolved (MR discussions).
     #[serde(default)]
     pub resolvable: bool,
-    /// Whether this note has been resolved.
     #[serde(default)]
     pub resolved: bool,
-    /// Who resolved this note (if resolved).
     pub resolved_by: Option<GitLabAuthor>,
-    /// When this note was resolved (if resolved).
     pub resolved_at: Option<String>,
-    /// Position metadata for DiffNotes (code review comments).
     pub position: Option<GitLabNotePosition>,
 }
-/// Position metadata for DiffNotes (code review comments on specific lines).
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabNotePosition {
     pub old_path: Option<String>,
     pub new_path: Option<String>,
     pub old_line: Option<i32>,
     pub new_line: Option<i32>,
-    /// Position type: "text", "image", or "file".
     pub position_type: Option<String>,
-    /// Line range for multi-line comments (GitLab 13.6+).
     pub line_range: Option<GitLabLineRange>,
-    /// Base commit SHA for the diff.
     pub base_sha: Option<String>,
-    /// Start commit SHA for the diff.
     pub start_sha: Option<String>,
-    /// Head commit SHA for the diff.
     pub head_sha: Option<String>,
 }
-/// Line range for multi-line DiffNote comments.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabLineRange {
     pub start: GitLabLineRangePoint,
     pub end: GitLabLineRangePoint,
 }
-/// A point in a line range (start or end).
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabLineRangePoint {
     pub line_code: Option<String>,
-    /// "old" or "new".
     #[serde(rename = "type")]
     pub line_type: Option<String>,
     pub old_line: Option<i32>,
@@ -171,20 +126,15 @@ pub struct GitLabLineRangePoint {
 }
 impl GitLabLineRange {
-    /// Get the start line number (new_line preferred, falls back to old_line).
     pub fn start_line(&self) -> Option<i32> {
         self.start.new_line.or(self.start.old_line)
     }
-    /// Get the end line number (new_line preferred, falls back to old_line).
     pub fn end_line(&self) -> Option<i32> {
         self.end.new_line.or(self.end.old_line)
     }
 }
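The removed docs on `start_line`/`end_line` explained the lookup order: `new_line` is preferred (the comment sits on the post-change file), `old_line` is the fallback for lines that only exist pre-change. The same logic in isolation, with a pared-down point type (the real `GitLabLineRangePoint` also carries `line_code` and a type tag):

```rust
// Pared-down stand-in for GitLabLineRangePoint.
struct Point {
    old_line: Option<i32>,
    new_line: Option<i32>,
}

// Prefer new_line; fall back to old_line; None if the point has neither.
fn line_of(p: &Point) -> Option<i32> {
    p.new_line.or(p.old_line)
}
```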
-// === Resource Event types (Phase B - Gate 1) ===
-/// Reference to an MR in state event's source_merge_request field.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabMergeRequestRef {
     pub iid: i64,
@@ -192,7 +142,6 @@ pub struct GitLabMergeRequestRef {
     pub web_url: Option<String>,
 }
-/// Reference to a label in label event's label field.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabLabelRef {
     pub id: i64,
@@ -201,7 +150,6 @@ pub struct GitLabLabelRef {
     pub description: Option<String>,
 }
-/// Reference to a milestone in milestone event's milestone field.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabMilestoneRef {
     pub id: i64,
@@ -209,7 +157,6 @@ pub struct GitLabMilestoneRef {
     pub title: String,
 }
-/// State change event from the Resource State Events API.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabStateEvent {
     pub id: i64,
@@ -222,7 +169,6 @@ pub struct GitLabStateEvent {
     pub source_merge_request: Option<GitLabMergeRequestRef>,
 }
-/// Label change event from the Resource Label Events API.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabLabelEvent {
     pub id: i64,
@@ -234,7 +180,6 @@ pub struct GitLabLabelEvent {
     pub action: String,
 }
-/// Milestone change event from the Resource Milestone Events API.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabMilestoneEvent {
     pub id: i64,
@@ -246,18 +191,22 @@ pub struct GitLabMilestoneEvent {
     pub action: String,
 }
-// === Checkpoint 2: Merge Request types ===
+#[derive(Debug, Clone, Deserialize, Serialize)]
+pub struct GitLabIssueRef {
+    pub id: i64,
+    pub iid: i64,
+    pub project_id: i64,
+    pub title: String,
+    pub state: String,
+    pub web_url: String,
+}
-/// GitLab MR references (short and full reference strings).
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabReferences {
-    /// Short reference e.g. "!42".
     pub short: String,
-    /// Full reference e.g. "group/project!42".
     pub full: String,
 }
-/// GitLab Reviewer (can have approval state in future).
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabReviewer {
     pub id: i64,
@@ -265,58 +214,36 @@ pub struct GitLabReviewer {
     pub name: String,
 }
-/// GitLab Merge Request from /projects/:id/merge_requests endpoint.
-/// Note: Uses non-deprecated field names where possible (detailed_merge_status, merge_user).
-/// Falls back gracefully for older GitLab versions.
 #[derive(Debug, Clone, Deserialize, Serialize)]
 pub struct GitLabMergeRequest {
-    /// GitLab global ID (unique across all projects).
     pub id: i64,
-    /// Project-scoped MR number (the number shown in the UI).
     pub iid: i64,
-    /// The project this MR belongs to.
     pub project_id: i64,
     pub title: String,
     pub description: Option<String>,
-    /// "opened" | "merged" | "closed" | "locked".
     pub state: String,
-    /// Work-in-progress status (preferred over work_in_progress).
     #[serde(default)]
     pub draft: bool,
-    /// Deprecated; fallback for older instances.
     #[serde(default)]
     pub work_in_progress: bool,
     pub source_branch: String,
     pub target_branch: String,
-    /// Current commit SHA at head of source branch (CP3-ready).
     pub sha: Option<String>,
-    /// Short and full reference strings (CP3-ready).
     pub references: Option<GitLabReferences>,
-    /// Non-deprecated merge status. Prefer over merge_status.
     pub detailed_merge_status: Option<String>,
-    /// Deprecated merge_status field for fallback.
     #[serde(alias = "merge_status")]
     pub merge_status_legacy: Option<String>,
-    /// ISO 8601 timestamp.
     pub created_at: String,
-    /// ISO 8601 timestamp.
     pub updated_at: String,
-    /// ISO 8601 timestamp when merged (null if not merged).
     pub merged_at: Option<String>,
-    /// ISO 8601 timestamp when closed (null if not closed).
     pub closed_at: Option<String>,
     pub author: GitLabAuthor,
-    /// Non-deprecated; who merged this MR.
     pub merge_user: Option<GitLabAuthor>,
-    /// Deprecated; fallback for older instances.
     pub merged_by: Option<GitLabAuthor>,
-    /// Array of label names.
     #[serde(default)]
     pub labels: Vec<String>,
-    /// Assignees (can be multiple).
     #[serde(default)]
     pub assignees: Vec<GitLabAuthor>,
-    /// Reviewers (MR-specific).
     #[serde(default)]
     pub reviewers: Vec<GitLabReviewer>,
     pub web_url: String,
@@ -7,8 +7,6 @@ use crate::documents::SourceType;
 const DIRTY_SOURCES_BATCH_SIZE: usize = 500;
-/// Mark a source entity as dirty INSIDE an existing transaction.
-/// ON CONFLICT resets ALL backoff/error state so fresh updates are immediately eligible.
 pub fn mark_dirty_tx(
     tx: &rusqlite::Transaction<'_>,
     source_type: SourceType,
@@ -28,7 +26,6 @@ pub fn mark_dirty_tx(
     Ok(())
 }
-/// Convenience wrapper for non-transactional contexts.
 pub fn mark_dirty(conn: &Connection, source_type: SourceType, source_id: i64) -> Result<()> {
     conn.execute(
         "INSERT INTO dirty_sources (source_type, source_id, queued_at)
@@ -44,9 +41,6 @@ pub fn mark_dirty(conn: &Connection, source_type: SourceType, source_id: i64) ->
     Ok(())
 }
-/// Get dirty sources ready for processing.
-/// Returns entries where next_attempt_at is NULL or <= now.
-/// Orders by attempt_count ASC (fresh before failed), then queued_at ASC.
 pub fn get_dirty_sources(conn: &Connection) -> Result<Vec<(SourceType, i64)>> {
     let now = now_ms();
     let mut stmt = conn.prepare(
@@ -79,7 +73,6 @@ pub fn get_dirty_sources(conn: &Connection) -> Result<Vec<(SourceType, i64)>> {
     Ok(results)
 }
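The removed doc on `get_dirty_sources` stated the queue semantics: a row is eligible when `next_attempt_at` is NULL or has passed, and eligible rows are ordered by `attempt_count` ASC (fresh before failed) then `queued_at` ASC. A sketch of those two rules over simplified `(source_id, attempt_count, queued_at, next_attempt_at)` tuples — the real code does this in SQL:

```rust
// Eligibility: next_attempt_at unset or already passed.
// Ordering: fewest attempts first, then oldest queued_at.
fn ready_in_order(mut rows: Vec<(i64, i64, i64, Option<i64>)>, now: i64) -> Vec<i64> {
    rows.retain(|&(_, _, _, next)| next.map_or(true, |t| t <= now));
    rows.sort_by_key(|&(_, attempts, queued, _)| (attempts, queued));
    rows.into_iter().map(|(id, _, _, _)| id).collect()
}
```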
-/// Clear dirty entry after successful processing.
 pub fn clear_dirty(conn: &Connection, source_type: SourceType, source_id: i64) -> Result<()> {
     conn.execute(
         "DELETE FROM dirty_sources WHERE source_type = ?1 AND source_id = ?2",
@@ -88,7 +81,6 @@ pub fn clear_dirty(conn: &Connection, source_type: SourceType, source_id: i64) -
     Ok(())
 }
-/// Record an error for a dirty source, incrementing attempt_count and setting backoff.
 pub fn record_dirty_error(
     conn: &Connection,
     source_type: SourceType,
@@ -96,7 +88,6 @@ pub fn record_dirty_error(
     error: &str,
 ) -> Result<()> {
     let now = now_ms();
-    // Get current attempt_count first
     let attempt_count: i64 = conn.query_row(
         "SELECT attempt_count FROM dirty_sources WHERE source_type = ?1 AND source_id = ?2",
         rusqlite::params![source_type.as_str(), source_id],
@@ -176,7 +167,6 @@ mod tests {
     fn test_requeue_resets_backoff() {
         let conn = setup_db();
         mark_dirty(&conn, SourceType::Issue, 1).unwrap();
-        // Simulate error state
         record_dirty_error(&conn, SourceType::Issue, 1, "test error").unwrap();
         let attempt: i64 = conn
@@ -188,7 +178,6 @@ mod tests {
             .unwrap();
         assert_eq!(attempt, 1);
-        // Re-mark should reset
         mark_dirty(&conn, SourceType::Issue, 1).unwrap();
         let attempt: i64 = conn
             .query_row(
@@ -213,7 +202,6 @@ mod tests {
     fn test_get_respects_backoff() {
         let conn = setup_db();
         mark_dirty(&conn, SourceType::Issue, 1).unwrap();
-        // Set next_attempt_at far in the future
         conn.execute(
             "UPDATE dirty_sources SET next_attempt_at = 9999999999999 WHERE source_id = 1",
[], [],
@@ -227,20 +215,18 @@ mod tests {
#[test] #[test]
fn test_get_orders_by_attempt_count() { fn test_get_orders_by_attempt_count() {
let conn = setup_db(); let conn = setup_db();
// Insert issue 1 (failed, attempt_count=2)
mark_dirty(&conn, SourceType::Issue, 1).unwrap(); mark_dirty(&conn, SourceType::Issue, 1).unwrap();
conn.execute( conn.execute(
"UPDATE dirty_sources SET attempt_count = 2 WHERE source_id = 1", "UPDATE dirty_sources SET attempt_count = 2 WHERE source_id = 1",
[], [],
) )
.unwrap(); .unwrap();
// Insert issue 2 (fresh, attempt_count=0)
mark_dirty(&conn, SourceType::Issue, 2).unwrap(); mark_dirty(&conn, SourceType::Issue, 2).unwrap();
let results = get_dirty_sources(&conn).unwrap(); let results = get_dirty_sources(&conn).unwrap();
assert_eq!(results.len(), 2); assert_eq!(results.len(), 2);
assert_eq!(results[0].1, 2); // Fresh first assert_eq!(results[0].1, 2);
assert_eq!(results[1].1, 1); // Failed second assert_eq!(results[1].1, 1);
} }
#[test] #[test]
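The scheduling contract the removed doc comment spelled out (eligible when `next_attempt_at` is NULL or in the past; fresh entries before failed ones, then FIFO) can be sketched in plain Rust. `DirtyEntry` and `eligible_sorted` below are illustrative names for this sketch, not part of the codebase, which does this ordering in SQL:

```rust
// In-memory model of the dirty_sources queue ordering (illustrative only).
#[derive(Debug, Clone, PartialEq)]
struct DirtyEntry {
    source_id: i64,
    attempt_count: i64,
    queued_at: i64,
    next_attempt_at: Option<i64>,
}

fn eligible_sorted(mut entries: Vec<DirtyEntry>, now: i64) -> Vec<i64> {
    // Backed-off entries (next_attempt_at in the future) are not yet eligible.
    entries.retain(|e| e.next_attempt_at.map_or(true, |t| t <= now));
    // attempt_count ASC then queued_at ASC: fresh work drains before retries.
    entries.sort_by_key(|e| (e.attempt_count, e.queued_at));
    entries.into_iter().map(|e| e.source_id).collect()
}

fn main() {
    let entries = vec![
        DirtyEntry { source_id: 1, attempt_count: 2, queued_at: 10, next_attempt_at: None },
        DirtyEntry { source_id: 2, attempt_count: 0, queued_at: 20, next_attempt_at: None },
        DirtyEntry { source_id: 3, attempt_count: 0, queued_at: 5, next_attempt_at: Some(9_999) },
    ];
    // id 3 is backed off past `now`; fresh id 2 precedes failed id 1.
    assert_eq!(eligible_sorted(entries, 100), vec![2, 1]);
}
```

This mirrors what `test_get_orders_by_attempt_count` above asserts against the real SQL query.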

View File

@@ -4,7 +4,6 @@ use crate::core::backoff::compute_next_attempt_at;
 use crate::core::error::Result;
 use crate::core::time::now_ms;
-/// Noteable type for discussion queue.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum NoteableType {
     Issue,
@@ -28,7 +27,6 @@ impl NoteableType {
     }
 }
-/// A pending discussion fetch entry.
 pub struct PendingFetch {
     pub project_id: i64,
     pub noteable_type: NoteableType,
@@ -36,7 +34,6 @@ pub struct PendingFetch {
     pub attempt_count: i32,
 }
-/// Queue a discussion fetch. ON CONFLICT resets backoff (consistent with dirty_sources).
 pub fn queue_discussion_fetch(
     conn: &Connection,
     project_id: i64,
@@ -57,7 +54,6 @@ pub fn queue_discussion_fetch(
     Ok(())
 }
-/// Get next batch of pending fetches (WHERE next_attempt_at IS NULL OR <= now).
 pub fn get_pending_fetches(conn: &Connection, limit: usize) -> Result<Vec<PendingFetch>> {
     let now = now_ms();
     let mut stmt = conn.prepare(
@@ -96,7 +92,6 @@ pub fn get_pending_fetches(conn: &Connection, limit: usize) -> Result<Vec<Pendin
     Ok(results)
 }
-/// Mark fetch complete (remove from queue).
 pub fn complete_fetch(
     conn: &Connection,
     project_id: i64,
@@ -111,7 +106,6 @@ pub fn complete_fetch(
     Ok(())
 }
-/// Record fetch error with backoff.
 pub fn record_fetch_error(
     conn: &Connection,
     project_id: i64,
@@ -213,7 +207,6 @@ mod tests {
             .unwrap();
         assert_eq!(attempt, 1);
-        // Re-queue should reset
         queue_discussion_fetch(&conn, 1, NoteableType::Issue, 42).unwrap();
         let attempt: i32 = conn
             .query_row(
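Both queues retry failures via `compute_next_attempt_at` from `crate::core::backoff`. Its actual curve is not shown in this diff; the sketch below assumes a plausible shape (exponential growth from a 1-minute base, capped at 1 hour) purely for illustration:

```rust
// Hedged sketch of a compute_next_attempt_at-style helper. The constants and
// curve are assumptions; the real implementation in crate::core::backoff may
// differ.
fn next_attempt_at(now_ms: i64, attempt_count: u32) -> i64 {
    const BASE_DELAY_MS: i64 = 60_000; // 1 minute base (assumed)
    const MAX_DELAY_MS: i64 = 3_600_000; // 1 hour cap (assumed)
    let shift = attempt_count.min(20); // clamp to avoid shift overflow
    let delay = BASE_DELAY_MS.saturating_mul(1_i64 << shift);
    now_ms + delay.min(MAX_DELAY_MS)
}

fn main() {
    assert_eq!(next_attempt_at(0, 0), 60_000); // first retry after one minute
    assert_eq!(next_attempt_at(0, 1), 120_000); // doubles per attempt
    assert_eq!(next_attempt_at(0, 10), 3_600_000); // capped at one hour
}
```

Whatever the real curve is, the ON CONFLICT reset in `queue_discussion_fetch` and `mark_dirty` guarantees a genuinely fresh update bypasses any accumulated backoff.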

View File

@@ -1,11 +1,3 @@
-//! Discussion ingestion with full-refresh strategy.
-//!
-//! Fetches discussions for an issue and stores them locally with:
-//! - Raw payload storage with deduplication
-//! - Full discussion and note replacement per issue
-//! - Sync timestamp tracking per issue
-//! - Safe stale removal only after successful pagination
 use futures::StreamExt;
 use rusqlite::Connection;
 use tracing::{debug, warn};
@@ -20,7 +12,6 @@ use crate::ingestion::dirty_tracker;
 use super::issues::IssueForDiscussionSync;
-/// Result of discussion ingestion for a single issue.
 #[derive(Debug, Default)]
 pub struct IngestDiscussionsResult {
     pub discussions_fetched: usize,
@@ -29,7 +20,6 @@ pub struct IngestDiscussionsResult {
     pub stale_discussions_removed: usize,
 }
-/// Ingest discussions for a list of issues that need sync.
 pub async fn ingest_issue_discussions(
     conn: &Connection,
     client: &GitLabClient,
@@ -69,7 +59,6 @@ pub async fn ingest_issue_discussions(
     Ok(total_result)
 }
-/// Ingest discussions for a single issue.
 async fn ingest_discussions_for_issue(
     conn: &Connection,
     client: &GitLabClient,
@@ -86,16 +75,12 @@ async fn ingest_discussions_for_issue(
         "Fetching discussions for issue"
     );
-    // Stream discussions from GitLab
     let mut discussions_stream = client.paginate_issue_discussions(gitlab_project_id, issue.iid);
-    // Track discussions we've seen for stale removal
     let mut seen_discussion_ids: Vec<String> = Vec::new();
-    // Track if any error occurred during pagination
     let mut pagination_error: Option<crate::core::error::LoreError> = None;
     while let Some(disc_result) = discussions_stream.next().await {
-        // Handle errors - record but don't delete stale data
         let gitlab_discussion = match disc_result {
             Ok(d) => d,
             Err(e) => {
@@ -110,7 +95,6 @@ async fn ingest_discussions_for_issue(
         };
         result.discussions_fetched += 1;
-        // Store raw payload
         let payload_bytes = serde_json::to_vec(&gitlab_discussion)?;
         let payload_id = store_payload(
             conn,
@@ -123,55 +107,43 @@ async fn ingest_discussions_for_issue(
             },
         )?;
-        // Transform and store discussion
         let normalized = transform_discussion(
             &gitlab_discussion,
             local_project_id,
             NoteableRef::Issue(issue.local_issue_id),
         );
-        // Wrap all discussion+notes operations in a transaction for atomicity
         let tx = conn.unchecked_transaction()?;
         upsert_discussion(&tx, &normalized, payload_id)?;
-        // Get local discussion ID
         let local_discussion_id: i64 = tx.query_row(
             "SELECT id FROM discussions WHERE project_id = ? AND gitlab_discussion_id = ?",
             (local_project_id, &normalized.gitlab_discussion_id),
             |row| row.get(0),
         )?;
-        // Mark dirty for document regeneration (inside transaction)
         dirty_tracker::mark_dirty_tx(&tx, SourceType::Discussion, local_discussion_id)?;
-        // Transform and store notes
         let notes = transform_notes(&gitlab_discussion, local_project_id);
         let notes_count = notes.len();
-        // Delete existing notes for this discussion (full refresh)
         tx.execute(
             "DELETE FROM notes WHERE discussion_id = ?",
             [local_discussion_id],
         )?;
         for note in notes {
-            // Note: per-note raw payload storage is skipped because the discussion
-            // payload (already stored above) contains all notes. The full note
-            // content is also stored in the notes table itself.
             insert_note(&tx, local_discussion_id, &note, None)?;
         }
         tx.commit()?;
-        // Increment counters AFTER successful commit to keep metrics honest
         result.discussions_upserted += 1;
         result.notes_upserted += notes_count;
         seen_discussion_ids.push(normalized.gitlab_discussion_id.clone());
     }
-    // Only remove stale discussions and advance watermark if pagination completed
-    // without errors. Safe for both empty results and populated results.
     if pagination_error.is_none() {
         let removed = remove_stale_discussions(conn, issue.local_issue_id, &seen_discussion_ids)?;
         result.stale_discussions_removed = removed;
@@ -189,7 +161,6 @@ async fn ingest_discussions_for_issue(
     Ok(result)
 }
-/// Upsert a discussion.
 fn upsert_discussion(
     conn: &Connection,
     discussion: &crate::gitlab::transformers::NormalizedDiscussion,
@@ -226,7 +197,6 @@ fn upsert_discussion(
     Ok(())
 }
-/// Insert a note.
 fn insert_note(
     conn: &Connection,
     discussion_id: i64,
@@ -261,35 +231,26 @@ fn insert_note(
     Ok(())
 }
-/// Remove discussions that were not seen in this fetch (stale removal).
-/// Chunks large sets to avoid SQL query size limits.
 fn remove_stale_discussions(
     conn: &Connection,
     issue_id: i64,
     seen_ids: &[String],
 ) -> Result<usize> {
     if seen_ids.is_empty() {
-        // No discussions seen - remove all for this issue
         let deleted = conn.execute("DELETE FROM discussions WHERE issue_id = ?", [issue_id])?;
         return Ok(deleted);
     }
-    // SQLite has a limit of 999 variables per query by default
-    // Chunk the seen_ids to stay well under this limit
     const CHUNK_SIZE: usize = 500;
-    // For safety, use a temp table approach for large sets
     let total_deleted = if seen_ids.len() > CHUNK_SIZE {
-        // Create temp table for seen IDs
         conn.execute(
             "CREATE TEMP TABLE IF NOT EXISTS _temp_seen_discussions (id TEXT PRIMARY KEY)",
             [],
         )?;
-        // Clear any previous data
        conn.execute("DELETE FROM _temp_seen_discussions", [])?;
-        // Insert seen IDs in chunks
        for chunk in seen_ids.chunks(CHUNK_SIZE) {
            let placeholders: Vec<&str> = chunk.iter().map(|_| "(?)").collect();
            let sql = format!(
@@ -302,7 +263,6 @@ fn remove_stale_discussions(
            conn.execute(&sql, params.as_slice())?;
        }
-        // Delete discussions not in temp table
        let deleted = conn.execute(
            "DELETE FROM discussions
             WHERE issue_id = ?1
@@ -310,11 +270,9 @@ fn remove_stale_discussions(
            [issue_id],
        )?;
-        // Clean up temp table
        conn.execute("DROP TABLE IF EXISTS _temp_seen_discussions", [])?;
        deleted
    } else {
-        // Small set - use simple IN clause
        let placeholders: Vec<&str> = seen_ids.iter().map(|_| "?").collect();
        let sql = format!(
            "DELETE FROM discussions WHERE issue_id = ?1 AND gitlab_discussion_id NOT IN ({})",
@@ -333,7 +291,6 @@ fn remove_stale_discussions(
    Ok(total_deleted)
 }
-/// Update the discussions_synced_for_updated_at timestamp on an issue.
 fn update_issue_sync_timestamp(conn: &Connection, issue_id: i64, updated_at: i64) -> Result<()> {
     conn.execute(
         "UPDATE issues SET discussions_synced_for_updated_at = ? WHERE id = ?",

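The removed comments in `remove_stale_discussions` explained the one non-obvious constraint: SQLite's default limit of 999 bound parameters per statement, which is why seen-id lists are split into 500-id chunks. The batching arithmetic can be sketched in isolation (`insert_batches` and `row_placeholders` are illustrative names, not functions from the codebase):

```rust
// Why CHUNK_SIZE = 500: SQLite allows at most 999 bound parameters per
// statement by default, so large NOT IN (...) id lists must be batched.
const SQLITE_MAX_VARS: usize = 999;
const CHUNK_SIZE: usize = 500;

// Number of INSERT statements needed to load `n` ids into the temp table.
fn insert_batches(n: usize) -> usize {
    n.div_ceil(CHUNK_SIZE)
}

// Placeholder list for one chunk of row-value inserts, e.g. "(?), (?), (?)".
fn row_placeholders(n: usize) -> String {
    assert!(n <= SQLITE_MAX_VARS);
    vec!["(?)"; n].join(", ")
}

fn main() {
    assert_eq!(insert_batches(1_200), 3); // 500 + 500 + 200
    assert_eq!(row_placeholders(3), "(?), (?), (?)");
}
```

Small sets (≤ 500 ids) skip the temp table entirely and use a single `NOT IN (...)` clause, which is the else branch above.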
View File

@@ -1,12 +1,3 @@
-//! Issue ingestion with cursor-based incremental sync.
-//!
-//! Fetches issues from GitLab and stores them locally with:
-//! - Cursor-based pagination for incremental sync
-//! - Raw payload storage with deduplication
-//! - Label extraction and stale-link removal
-//! - Milestone normalization with dedicated table
-//! - Tracking of issues needing discussion sync
 use std::ops::Deref;
 use futures::StreamExt;
@@ -23,7 +14,6 @@ use crate::gitlab::transformers::{MilestoneRow, transform_issue};
 use crate::gitlab::types::GitLabIssue;
 use crate::ingestion::dirty_tracker;
-/// Result of issue ingestion.
 #[derive(Debug, Default)]
 pub struct IngestIssuesResult {
     pub fetched: usize,
@@ -32,36 +22,31 @@ pub struct IngestIssuesResult {
     pub issues_needing_discussion_sync: Vec<IssueForDiscussionSync>,
 }
-/// Issue that needs discussion sync.
 #[derive(Debug, Clone)]
 pub struct IssueForDiscussionSync {
     pub local_issue_id: i64,
     pub iid: i64,
-    pub updated_at: i64, // ms epoch
+    pub updated_at: i64,
 }
-/// Cursor state for incremental sync.
 #[derive(Debug, Default)]
 struct SyncCursor {
     updated_at_cursor: Option<i64>,
     tie_breaker_id: Option<i64>,
 }
-/// Ingest issues for a project.
 pub async fn ingest_issues(
     conn: &Connection,
     client: &GitLabClient,
     config: &Config,
-    project_id: i64, // Local DB project ID
-    gitlab_project_id: i64, // GitLab project ID
+    project_id: i64,
+    gitlab_project_id: i64,
 ) -> Result<IngestIssuesResult> {
     let mut result = IngestIssuesResult::default();
-    // 1. Get current cursor
     let cursor = get_sync_cursor(conn, project_id)?;
     debug!(?cursor, "Starting issue ingestion with cursor");
-    // 2. Stream issues with cursor rewind
     let mut issues_stream = client.paginate_issues(
         gitlab_project_id,
         cursor.updated_at_cursor,
@@ -72,12 +57,10 @@ pub async fn ingest_issues(
     let mut last_updated_at: Option<i64> = None;
     let mut last_gitlab_id: Option<i64> = None;
-    // 3. Process each issue
     while let Some(issue_result) = issues_stream.next().await {
         let issue = issue_result?;
         result.fetched += 1;
-        // Parse timestamp early - skip issues with invalid timestamps
         let issue_updated_at = match parse_timestamp(&issue.updated_at) {
             Ok(ts) => ts,
             Err(e) => {
@@ -90,23 +73,19 @@ pub async fn ingest_issues(
             }
         };
-        // Apply local cursor filter (skip already-processed due to rewind overlap)
         if !passes_cursor_filter_with_ts(issue.id, issue_updated_at, &cursor) {
             debug!(gitlab_id = issue.id, "Skipping already-processed issue");
             continue;
         }
-        // Transform and store
         let labels_created = process_single_issue(conn, config, project_id, &issue)?;
         result.upserted += 1;
         result.labels_created += labels_created;
-        // Track cursor position (use already-parsed timestamp)
         last_updated_at = Some(issue_updated_at);
         last_gitlab_id = Some(issue.id);
         batch_count += 1;
-        // Incremental cursor update every 100 issues
         if batch_count % 100 == 0
             && let (Some(ts), Some(id)) = (last_updated_at, last_gitlab_id)
         {
@@ -115,17 +94,12 @@ pub async fn ingest_issues(
         }
     }
-    // 4. Final cursor update
     if let (Some(ts), Some(id)) = (last_updated_at, last_gitlab_id) {
         update_sync_cursor(conn, project_id, ts, id)?;
     } else if result.fetched == 0 && cursor.updated_at_cursor.is_some() {
-        // No new issues returned, but we have an existing cursor.
-        // Update sync_attempted_at to track that we checked (useful for monitoring)
-        // The cursor itself stays the same since there's nothing newer to advance to.
         debug!("No new issues found, cursor unchanged");
     }
-    // 5. Find issues needing discussion sync
     result.issues_needing_discussion_sync = get_issues_needing_discussion_sync(conn, project_id)?;
     info!(
@@ -139,11 +113,9 @@ pub async fn ingest_issues(
     Ok(result)
 }
-/// Check if an issue passes the cursor filter (not already processed).
-/// Takes pre-parsed timestamp to avoid redundant parsing.
 fn passes_cursor_filter_with_ts(gitlab_id: i64, issue_ts: i64, cursor: &SyncCursor) -> bool {
     let Some(cursor_ts) = cursor.updated_at_cursor else {
-        return true; // No cursor = fetch all
+        return true;
     };
     if issue_ts < cursor_ts {
@@ -160,12 +132,10 @@ fn passes_cursor_filter_with_ts(gitlab_id: i64, issue_ts: i64, cursor: &SyncCurs
     true
 }
-// Keep the original function for backward compatibility with tests
-/// Check if an issue passes the cursor filter (not already processed).
 #[cfg(test)]
 fn passes_cursor_filter(issue: &GitLabIssue, cursor: &SyncCursor) -> Result<bool> {
     let Some(cursor_ts) = cursor.updated_at_cursor else {
-        return Ok(true); // No cursor = fetch all
+        return Ok(true);
     };
     let issue_ts = parse_timestamp(&issue.updated_at)?;
@@ -185,8 +155,6 @@ fn passes_cursor_filter(issue: &GitLabIssue, cursor: &SyncCursor) -> Result<bool
     Ok(true)
 }
-/// Process a single issue: store payload, upsert issue, handle labels.
-/// All operations are wrapped in a transaction for atomicity.
 fn process_single_issue(
     conn: &Connection,
     config: &Config,
@@ -195,12 +163,10 @@ fn process_single_issue(
 ) -> Result<usize> {
     let now = now_ms();
-    // Transform issue first (outside transaction - no DB access)
     let payload_bytes = serde_json::to_vec(issue)?;
     let transformed = transform_issue(issue)?;
     let issue_row = &transformed.issue;
-    // Wrap all DB operations in a transaction for atomicity
     let tx = conn.unchecked_transaction()?;
     let labels_created = process_issue_in_transaction(
         &tx,
@@ -219,7 +185,6 @@ fn process_single_issue(
     Ok(labels_created)
 }
-/// Inner function that performs all DB operations within a transaction.
 #[allow(clippy::too_many_arguments)]
 fn process_issue_in_transaction(
     tx: &Transaction<'_>,
@@ -235,7 +200,6 @@ fn process_issue_in_transaction(
 ) -> Result<usize> {
     let mut labels_created = 0;
-    // Store raw payload (deref Transaction to Connection for store_payload)
     let payload_id = store_payload(
         tx.deref(),
         StorePayloadOptions {
@@ -247,14 +211,12 @@ fn process_issue_in_transaction(
         },
     )?;
-    // Upsert milestone if present, get local ID
     let milestone_id: Option<i64> = if let Some(m) = milestone {
         Some(upsert_milestone_tx(tx, project_id, m)?)
     } else {
         None
     };
-    // Upsert issue (including new fields: due_date, milestone_id, milestone_title)
     tx.execute(
         "INSERT INTO issues (
             gitlab_id, project_id, iid, title, description, state,
@@ -292,35 +254,29 @@ fn process_issue_in_transaction(
         ),
     )?;
-    // Get local issue ID
     let local_issue_id: i64 = tx.query_row(
         "SELECT id FROM issues WHERE project_id = ? AND iid = ?",
         (project_id, issue_row.iid),
         |row| row.get(0),
     )?;
-    // Mark dirty for document regeneration (inside transaction)
     dirty_tracker::mark_dirty_tx(tx, SourceType::Issue, local_issue_id)?;
-    // Clear existing label links (stale removal)
     tx.execute(
         "DELETE FROM issue_labels WHERE issue_id = ?",
         [local_issue_id],
     )?;
-    // Upsert labels and create links
     for label_name in label_names {
         let label_id = upsert_label_tx(tx, project_id, label_name, &mut labels_created)?;
         link_issue_label_tx(tx, local_issue_id, label_id)?;
     }
-    // Clear existing assignee links (stale removal)
     tx.execute(
         "DELETE FROM issue_assignees WHERE issue_id = ?",
         [local_issue_id],
     )?;
-    // Insert assignees
     for username in assignee_usernames {
         tx.execute(
             "INSERT OR IGNORE INTO issue_assignees (issue_id, username) VALUES (?, ?)",
@@ -331,8 +287,6 @@ fn process_issue_in_transaction(
     Ok(labels_created)
 }
-/// Upsert a label within a transaction, returning its ID.
-/// Uses INSERT...ON CONFLICT...RETURNING for a single round-trip.
 fn upsert_label_tx(
     tx: &Transaction<'_>,
     project_id: i64,
@@ -347,7 +301,6 @@ fn upsert_label_tx(
         |row| row.get(0),
     )?;
-    // If the rowid matches last_insert_rowid, this was a new insert
     if tx.last_insert_rowid() == id {
         *created_count += 1;
     }
@@ -355,7 +308,6 @@ fn upsert_label_tx(
     Ok(id)
 }
-/// Link an issue to a label within a transaction.
 fn link_issue_label_tx(tx: &Transaction<'_>, issue_id: i64, label_id: i64) -> Result<()> {
     tx.execute(
         "INSERT OR IGNORE INTO issue_labels (issue_id, label_id) VALUES (?, ?)",
@@ -364,8 +316,6 @@ fn link_issue_label_tx(tx: &Transaction<'_>, issue_id: i64, label_id: i64) -> Re
     Ok(())
 }
-/// Upsert a milestone within a transaction, returning its local ID.
-/// Uses RETURNING to avoid a separate SELECT round-trip.
 fn upsert_milestone_tx(
     tx: &Transaction<'_>,
     project_id: i64,
@@ -398,7 +348,6 @@ fn upsert_milestone_tx(
     Ok(local_id)
 }
-/// Get the current sync cursor for issues.
 fn get_sync_cursor(conn: &Connection, project_id: i64) -> Result<SyncCursor> {
     let row: Option<(Option<i64>, Option<i64>)> = conn
         .query_row(
@@ -418,7 +367,6 @@ fn get_sync_cursor(conn: &Connection, project_id: i64) -> Result<SyncCursor> {
         })
 }
-/// Update the sync cursor.
 fn update_sync_cursor(
     conn: &Connection,
     project_id: i64,
@@ -436,7 +384,6 @@ fn update_sync_cursor(
     Ok(())
 }
-/// Get issues that need discussion sync (updated_at > discussions_synced_for_updated_at).
 fn get_issues_needing_discussion_sync(
     conn: &Connection,
     project_id: i64,
@@ -460,8 +407,6 @@ fn get_issues_needing_discussion_sync(
     Ok(issues?)
 }
-/// Parse ISO 8601 timestamp to milliseconds.
-/// Returns an error if parsing fails instead of silently returning 0.
 fn parse_timestamp(ts: &str) -> Result<i64> {
     chrono::DateTime::parse_from_rfc3339(ts)
         .map(|dt| dt.timestamp_millis())
@@ -500,11 +445,10 @@ mod tests {
     #[test]
     fn cursor_filter_allows_newer_issues() {
         let cursor = SyncCursor {
-            updated_at_cursor: Some(1705312800000), // 2024-01-15T10:00:00Z
+            updated_at_cursor: Some(1705312800000),
             tie_breaker_id: Some(100),
         };
-        // Issue with later timestamp passes
         let issue = make_test_issue(101, "2024-01-16T10:00:00.000Z");
         assert!(passes_cursor_filter(&issue, &cursor).unwrap_or(false));
     }
@@ -516,7 +460,6 @@ mod tests {
             tie_breaker_id: Some(100),
         };
-        // Issue with earlier timestamp blocked
         let issue = make_test_issue(99, "2024-01-14T10:00:00.000Z");
         assert!(!passes_cursor_filter(&issue, &cursor).unwrap_or(true));
     }
@@ -528,15 +471,12 @@ mod tests {
             tie_breaker_id: Some(100),
         };
-        // Same timestamp, higher ID passes
         let issue1 = make_test_issue(101, "2024-01-15T10:00:00.000Z");
         assert!(passes_cursor_filter(&issue1, &cursor).unwrap_or(false));
-        // Same timestamp, same ID blocked
         let issue2 = make_test_issue(100, "2024-01-15T10:00:00.000Z");
         assert!(!passes_cursor_filter(&issue2, &cursor).unwrap_or(true));
-        // Same timestamp, lower ID blocked
         let issue3 = make_test_issue(99, "2024-01-15T10:00:00.000Z");
         assert!(!passes_cursor_filter(&issue3, &cursor).unwrap_or(true));
     }
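The `(updated_at, id)` cursor predicate exercised by the tests above can be restated standalone. This is a simplification (the real `SyncCursor` keeps the timestamp and tie-breaker as separate `Option`s); `passes_cursor` is an illustrative name:

```rust
// Cursor filter for incremental sync: strictly newer timestamps pass; on a
// timestamp tie the GitLab id breaks it, so a pagination rewind never
// reprocesses the cursor row itself.
fn passes_cursor(issue_ts: i64, issue_id: i64, cursor: Option<(i64, i64)>) -> bool {
    match cursor {
        // No cursor yet: first sync fetches everything.
        None => true,
        Some((cursor_ts, tie_breaker_id)) => {
            issue_ts > cursor_ts || (issue_ts == cursor_ts && issue_id > tie_breaker_id)
        }
    }
}

fn main() {
    let cursor = Some((1_705_312_800_000, 100)); // (updated_at ms, gitlab id)
    assert!(passes_cursor(1_705_399_200_000, 101, cursor)); // newer timestamp
    assert!(passes_cursor(1_705_312_800_000, 101, cursor)); // tie, higher id
    assert!(!passes_cursor(1_705_312_800_000, 100, cursor)); // tie, same id
    assert!(!passes_cursor(1_705_226_400_000, 999, cursor)); // older timestamp
    assert!(passes_cursor(0, 1, None)); // no cursor: fetch all
}
```

The tie-breaker matters because the paginator rewinds to the cursor timestamp inclusively, so rows sharing that exact timestamp reappear and must be deduplicated locally.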

View File

@@ -1,12 +1,3 @@
//! Merge request ingestion with cursor-based incremental sync.
//!
//! Fetches merge requests from GitLab and stores them locally with:
//! - Cursor-based pagination for incremental sync
//! - Page-boundary cursor updates for crash recovery
//! - Raw payload storage with deduplication
//! - Label/assignee/reviewer extraction with clear-and-relink pattern
//! - Tracking of MRs needing discussion sync
use std::ops::Deref; use std::ops::Deref;
use rusqlite::{Connection, Transaction, params}; use rusqlite::{Connection, Transaction, params};
@@ -22,7 +13,6 @@ use crate::gitlab::transformers::merge_request::transform_merge_request;
use crate::gitlab::types::GitLabMergeRequest; use crate::gitlab::types::GitLabMergeRequest;
use crate::ingestion::dirty_tracker; use crate::ingestion::dirty_tracker;
/// Result of merge request ingestion.
#[derive(Debug, Default)] #[derive(Debug, Default)]
pub struct IngestMergeRequestsResult { pub struct IngestMergeRequestsResult {
pub fetched: usize, pub fetched: usize,
@@ -32,44 +22,38 @@ pub struct IngestMergeRequestsResult {
pub reviewers_linked: usize, pub reviewers_linked: usize,
} }
-/// MR that needs discussion sync.
 #[derive(Debug, Clone)]
 pub struct MrForDiscussionSync {
     pub local_mr_id: i64,
     pub iid: i64,
-    pub updated_at: i64, // ms epoch
+    pub updated_at: i64,
 }
-/// Cursor state for incremental sync.
 #[derive(Debug, Default)]
 struct SyncCursor {
     updated_at_cursor: Option<i64>,
     tie_breaker_id: Option<i64>,
 }
-/// Ingest merge requests for a project.
 pub async fn ingest_merge_requests(
     conn: &Connection,
     client: &GitLabClient,
     config: &Config,
-    project_id: i64, // Local DB project ID
-    gitlab_project_id: i64, // GitLab project ID
-    full_sync: bool, // Reset cursor if true
+    project_id: i64,
+    gitlab_project_id: i64,
+    full_sync: bool,
 ) -> Result<IngestMergeRequestsResult> {
     let mut result = IngestMergeRequestsResult::default();
-    // Handle full sync - reset cursor and discussion watermarks
     if full_sync {
         reset_sync_cursor(conn, project_id)?;
         reset_discussion_watermarks(conn, project_id)?;
         info!("Full sync: cursor and discussion watermarks reset");
     }
-    // 1. Get current cursor
     let cursor = get_sync_cursor(conn, project_id)?;
     debug!(?cursor, "Starting MR ingestion with cursor");
-    // 2. Fetch MRs page by page with cursor rewind
     let mut page = 1u32;
     let per_page = 100u32;
@@ -87,11 +71,9 @@ pub async fn ingest_merge_requests(
         let mut last_updated_at: Option<i64> = None;
         let mut last_gitlab_id: Option<i64> = None;
-        // 3. Process each MR
         for mr in &page_result.items {
             result.fetched += 1;
-            // Parse timestamp early
             let mr_updated_at = match parse_timestamp(&mr.updated_at) {
                 Ok(ts) => ts,
                 Err(e) => {
@@ -104,31 +86,26 @@ pub async fn ingest_merge_requests(
                 }
             };
-            // Apply local cursor filter (skip already-processed due to rewind overlap)
             if !passes_cursor_filter_with_ts(mr.id, mr_updated_at, &cursor) {
                 debug!(gitlab_id = mr.id, "Skipping already-processed MR");
                 continue;
             }
-            // Transform and store
             let mr_result = process_single_mr(conn, config, project_id, mr)?;
             result.upserted += 1;
             result.labels_created += mr_result.labels_created;
             result.assignees_linked += mr_result.assignees_linked;
             result.reviewers_linked += mr_result.reviewers_linked;
-            // Track cursor position
             last_updated_at = Some(mr_updated_at);
             last_gitlab_id = Some(mr.id);
         }
-        // 4. Page-boundary cursor update
         if let (Some(ts), Some(id)) = (last_updated_at, last_gitlab_id) {
             update_sync_cursor(conn, project_id, ts, id)?;
             debug!(page, "Page-boundary cursor update");
         }
-        // 5. Check for more pages
         if page_result.is_last_page {
             break;
         }
@@ -150,27 +127,22 @@ pub async fn ingest_merge_requests(
     Ok(result)
 }
-/// Result of processing a single MR.
 struct ProcessMrResult {
     labels_created: usize,
     assignees_linked: usize,
     reviewers_linked: usize,
 }
-/// Process a single MR: store payload, upsert MR, handle labels/assignees/reviewers.
-/// All operations are wrapped in a transaction for atomicity.
 fn process_single_mr(
     conn: &Connection,
     config: &Config,
     project_id: i64,
     mr: &GitLabMergeRequest,
 ) -> Result<ProcessMrResult> {
-    // Transform MR first (outside transaction - no DB access)
     let payload_bytes = serde_json::to_vec(mr)?;
     let transformed = transform_merge_request(mr, project_id)
         .map_err(|e| LoreError::Other(format!("MR transform failed: {}", e)))?;
-    // Wrap all DB operations in a transaction for atomicity
     let tx = conn.unchecked_transaction()?;
     let result =
         process_mr_in_transaction(&tx, config, project_id, mr, &payload_bytes, &transformed)?;
@@ -179,7 +151,6 @@ fn process_single_mr(
     Ok(result)
 }
-/// Inner function that performs all DB operations within a transaction.
 fn process_mr_in_transaction(
     tx: &Transaction<'_>,
     config: &Config,
@@ -192,7 +163,6 @@ fn process_mr_in_transaction(
     let mr_row = &transformed.merge_request;
     let now = now_ms();
-    // Store raw payload
     let payload_id = store_payload(
         tx.deref(),
         StorePayloadOptions {
@@ -204,7 +174,6 @@ fn process_mr_in_transaction(
         },
     )?;
-    // Upsert merge request
     tx.execute(
         "INSERT INTO merge_requests (
            gitlab_id, project_id, iid, title, description, state, draft,
@@ -258,17 +227,14 @@ fn process_mr_in_transaction(
         ],
     )?;
-    // Get local MR ID
     let local_mr_id: i64 = tx.query_row(
         "SELECT id FROM merge_requests WHERE project_id = ? AND iid = ?",
         (project_id, mr_row.iid),
         |row| row.get(0),
     )?;
-    // Mark dirty for document regeneration (inside transaction)
     dirty_tracker::mark_dirty_tx(tx, SourceType::MergeRequest, local_mr_id)?;
-    // Clear-and-relink labels
     tx.execute(
         "DELETE FROM mr_labels WHERE merge_request_id = ?",
         [local_mr_id],
@@ -281,7 +247,6 @@ fn process_mr_in_transaction(
         )?;
     }
-    // Clear-and-relink assignees
     tx.execute(
         "DELETE FROM mr_assignees WHERE merge_request_id = ?",
         [local_mr_id],
@@ -294,7 +259,6 @@ fn process_mr_in_transaction(
         )?;
     }
-    // Clear-and-relink reviewers
     tx.execute(
         "DELETE FROM mr_reviewers WHERE merge_request_id = ?",
         [local_mr_id],
@@ -314,8 +278,6 @@ fn process_mr_in_transaction(
     })
 }
-/// Upsert a label within a transaction, returning its ID.
-/// Uses INSERT...ON CONFLICT...RETURNING for a single round-trip.
 fn upsert_label_tx(
     tx: &Transaction<'_>,
     project_id: i64,
@@ -330,7 +292,6 @@ fn upsert_label_tx(
         |row| row.get(0),
     )?;
-    // If the rowid matches last_insert_rowid, this was a new insert
     if tx.last_insert_rowid() == id {
         *created_count += 1;
     }
@@ -338,11 +299,9 @@ fn upsert_label_tx(
     Ok(id)
 }
-/// Check if an MR passes the cursor filter (not already processed).
-/// Takes pre-parsed timestamp to avoid redundant parsing.
 fn passes_cursor_filter_with_ts(gitlab_id: i64, mr_ts: i64, cursor: &SyncCursor) -> bool {
     let Some(cursor_ts) = cursor.updated_at_cursor else {
-        return true; // No cursor = fetch all
+        return true;
     };
     if mr_ts < cursor_ts {
@@ -359,7 +318,6 @@ fn passes_cursor_filter_with_ts(gitlab_id: i64, mr_ts: i64, cursor: &SyncCursor)
     true
 }
-/// Get the current sync cursor for merge requests.
 fn get_sync_cursor(conn: &Connection, project_id: i64) -> Result<SyncCursor> {
     let row: Option<(Option<i64>, Option<i64>)> = conn
         .query_row(
@@ -379,7 +337,6 @@ fn get_sync_cursor(conn: &Connection, project_id: i64) -> Result<SyncCursor> {
     })
 }
-/// Update the sync cursor.
 fn update_sync_cursor(
     conn: &Connection,
     project_id: i64,
@@ -397,7 +354,6 @@ fn update_sync_cursor(
     Ok(())
 }
-/// Reset the sync cursor (for full sync).
 fn reset_sync_cursor(conn: &Connection, project_id: i64) -> Result<()> {
     conn.execute(
         "DELETE FROM sync_cursors WHERE project_id = ? AND resource_type = 'merge_requests'",
@@ -406,7 +362,6 @@ fn reset_sync_cursor(conn: &Connection, project_id: i64) -> Result<()> {
     Ok(())
 }
-/// Reset discussion and resource event watermarks for all MRs in project (for full sync).
 fn reset_discussion_watermarks(conn: &Connection, project_id: i64) -> Result<()> {
     conn.execute(
         "UPDATE merge_requests
@@ -420,7 +375,6 @@ fn reset_discussion_watermarks(conn: &Connection, project_id: i64) -> Result<()>
     Ok(())
 }
-/// Get MRs that need discussion sync (updated_at > discussions_synced_for_updated_at).
 pub fn get_mrs_needing_discussion_sync(
     conn: &Connection,
     project_id: i64,
@@ -444,7 +398,6 @@ pub fn get_mrs_needing_discussion_sync(
     Ok(mrs?)
 }
-/// Parse ISO 8601 timestamp to milliseconds.
 fn parse_timestamp(ts: &str) -> Result<i64> {
     chrono::DateTime::parse_from_rfc3339(ts)
         .map(|dt| dt.timestamp_millis())
@@ -468,12 +421,11 @@ mod tests {
     #[test]
     fn cursor_filter_allows_newer_mrs() {
         let cursor = SyncCursor {
-            updated_at_cursor: Some(1705312800000), // 2024-01-15T10:00:00Z
+            updated_at_cursor: Some(1705312800000),
             tie_breaker_id: Some(100),
         };
-        // MR with later timestamp passes
-        let later_ts = 1705399200000; // 2024-01-16T10:00:00Z
+        let later_ts = 1705399200000;
         assert!(passes_cursor_filter_with_ts(101, later_ts, &cursor));
     }
@@ -484,8 +436,7 @@
             tie_breaker_id: Some(100),
         };
-        // MR with earlier timestamp blocked
-        let earlier_ts = 1705226400000; // 2024-01-14T10:00:00Z
+        let earlier_ts = 1705226400000;
         assert!(!passes_cursor_filter_with_ts(99, earlier_ts, &cursor));
     }
@@ -496,20 +447,17 @@
             tie_breaker_id: Some(100),
         };
-        // Same timestamp, higher ID passes
         assert!(passes_cursor_filter_with_ts(101, 1705312800000, &cursor));
-        // Same timestamp, same ID blocked
         assert!(!passes_cursor_filter_with_ts(100, 1705312800000, &cursor));
-        // Same timestamp, lower ID blocked
         assert!(!passes_cursor_filter_with_ts(99, 1705312800000, &cursor));
     }
     #[test]
     fn cursor_filter_allows_all_when_no_cursor() {
         let cursor = SyncCursor::default();
-        let old_ts = 1577836800000; // 2020-01-01T00:00:00Z
+        let old_ts = 1577836800000;
         assert!(passes_cursor_filter_with_ts(1, old_ts, &cursor));
     }
 }
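The page-boundary cursor updates removed as comments above are what make a crash resumable: the cursor is persisted after each page, so a rerun re-fetches at most one page of overlap and the local filter skips the rows already processed. A hedged in-memory simulation (all names and the crash hook are hypothetical, not from the codebase):

```rust
// Rows are (updated_at_ms, gitlab_id) pairs sorted ascending, as GitLab
// returns them. The cursor is the last *processed* pair, persisted at each
// page boundary; crash_after_pages simulates dying before a given page.
fn run_pages(
    rows: &[(i64, i64)],
    page_size: usize,
    mut cursor: Option<(i64, i64)>,
    crash_after_pages: Option<usize>,
) -> (Vec<(i64, i64)>, Option<(i64, i64)>) {
    let mut processed = Vec::new();
    for (page_no, page) in rows.chunks(page_size).enumerate() {
        if crash_after_pages == Some(page_no) {
            break; // simulated crash: the cursor was already persisted
        }
        let mut last_processed = None;
        for &row in page {
            if cursor.map_or(false, |c| row <= c) {
                continue; // rewind overlap: already processed last run
            }
            processed.push(row);
            last_processed = Some(row);
        }
        if let Some(lp) = last_processed {
            cursor = Some(lp); // page-boundary cursor update
        }
    }
    (processed, cursor)
}
```

Note the update only advances to the last *processed* row; advancing to the last *seen* row would regress the cursor on a resume whose first pages are entirely skipped.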

View File

@@ -1,8 +1,3 @@
-//! Data ingestion modules for GitLab resources.
-//!
-//! This module handles fetching and storing issues, discussions, and notes
-//! from GitLab with cursor-based incremental sync.
 pub mod dirty_tracker;
 pub mod discussion_queue;
 pub mod discussions;

View File

@@ -1,15 +1,3 @@
-//! MR Discussion ingestion with atomicity guarantees.
-//!
-//! Critical requirements:
-//! - Parse notes BEFORE any destructive DB operations
-//! - Watermark advanced ONLY on full pagination success
-//! - Upsert + sweep pattern for data replacement
-//! - Sync health telemetry for debugging failures
-//!
-//! Supports two modes:
-//! - Streaming: fetch and write incrementally (memory efficient)
-//! - Prefetch: fetch all upfront, then write (enables parallel API calls)
 use futures::StreamExt;
 use rusqlite::{Connection, params};
 use tracing::{debug, info, warn};
@@ -29,7 +17,6 @@ use crate::ingestion::dirty_tracker;
 use super::merge_requests::MrForDiscussionSync;
-/// Result of MR discussion ingestion for a single MR.
 #[derive(Debug, Default)]
 pub struct IngestMrDiscussionsResult {
     pub discussions_fetched: usize,
@@ -40,20 +27,15 @@ pub struct IngestMrDiscussionsResult {
     pub pagination_succeeded: bool,
 }
-/// Prefetched discussions for an MR (ready for DB write).
-/// This separates the API fetch phase from the DB write phase to enable parallelism.
 #[derive(Debug)]
 pub struct PrefetchedMrDiscussions {
     pub mr: MrForDiscussionSync,
     pub discussions: Vec<PrefetchedDiscussion>,
     pub fetch_error: Option<String>,
-    /// True if any discussions failed to transform (skip sweep if true)
     pub had_transform_errors: bool,
-    /// Count of notes skipped due to transform errors
     pub notes_skipped_count: usize,
 }
-/// A single prefetched discussion with transformed data.
 #[derive(Debug)]
 pub struct PrefetchedDiscussion {
     pub raw: GitLabDiscussion,
@@ -61,8 +43,6 @@ pub struct PrefetchedDiscussion {
     pub notes: Vec<NormalizedNote>,
 }
-/// Fetch discussions for an MR without writing to DB.
-/// This can be called in parallel for multiple MRs.
 pub async fn prefetch_mr_discussions(
     client: &GitLabClient,
     gitlab_project_id: i64,
@@ -71,7 +51,6 @@ pub async fn prefetch_mr_discussions(
 ) -> PrefetchedMrDiscussions {
     debug!(mr_iid = mr.iid, "Prefetching discussions for MR");
-    // Fetch all discussions from GitLab
     let raw_discussions = match client
         .fetch_all_mr_discussions(gitlab_project_id, mr.iid)
         .await
@@ -88,13 +67,11 @@ pub async fn prefetch_mr_discussions(
         }
     };
-    // Transform each discussion
     let mut discussions = Vec::with_capacity(raw_discussions.len());
     let mut had_transform_errors = false;
     let mut notes_skipped_count = 0;
     for raw in raw_discussions {
-        // Transform notes
         let notes = match transform_notes_with_diff_position(&raw, local_project_id) {
             Ok(n) => n,
             Err(e) => {
@@ -104,14 +81,12 @@ pub async fn prefetch_mr_discussions(
                     error = %e,
                     "Note transform failed during prefetch"
                 );
-                // Track the failure - don't sweep stale data if transforms failed
                 had_transform_errors = true;
                 notes_skipped_count += raw.notes.len();
                 continue;
             }
         };
-        // Transform discussion
         let normalized = transform_mr_discussion(&raw, local_project_id, mr.local_mr_id);
         discussions.push(PrefetchedDiscussion {
@@ -130,15 +105,12 @@ pub async fn prefetch_mr_discussions(
     }
 }
-/// Write prefetched discussions to DB.
-/// This must be called serially (rusqlite Connection is not Send).
 pub fn write_prefetched_mr_discussions(
     conn: &Connection,
     config: &Config,
     local_project_id: i64,
     prefetched: PrefetchedMrDiscussions,
 ) -> Result<IngestMrDiscussionsResult> {
-    // Sync succeeds only if no fetch errors AND no transform errors
     let sync_succeeded = prefetched.fetch_error.is_none() && !prefetched.had_transform_errors;
     let mut result = IngestMrDiscussionsResult {
@@ -149,7 +121,6 @@ pub fn write_prefetched_mr_discussions(
     let mr = &prefetched.mr;
-    // Handle fetch errors
     if let Some(error) = &prefetched.fetch_error {
         warn!(mr_iid = mr.iid, error = %error, "Prefetch failed for MR");
         record_sync_health_error(conn, mr.local_mr_id, error)?;
@@ -158,9 +129,7 @@ pub fn write_prefetched_mr_discussions(
     let run_seen_at = now_ms();
-    // Write each discussion
     for disc in &prefetched.discussions {
-        // Count DiffNotes upfront (independent of transaction)
         let diffnotes_in_disc = disc
             .notes
             .iter()
@@ -168,10 +137,8 @@ pub fn write_prefetched_mr_discussions(
             .count();
         let notes_in_disc = disc.notes.len();
-        // Start transaction
         let tx = conn.unchecked_transaction()?;
-        // Store raw payload
         let payload_bytes = serde_json::to_vec(&disc.raw)?;
         let payload_id = Some(store_payload(
             &tx,
@@ -184,20 +151,16 @@ pub fn write_prefetched_mr_discussions(
             },
         )?);
-        // Upsert discussion
         upsert_discussion(&tx, &disc.normalized, run_seen_at, payload_id)?;
-        // Get local discussion ID
         let local_discussion_id: i64 = tx.query_row(
             "SELECT id FROM discussions WHERE project_id = ? AND gitlab_discussion_id = ?",
             params![local_project_id, &disc.normalized.gitlab_discussion_id],
             |row| row.get(0),
         )?;
-        // Mark dirty for document regeneration (inside transaction)
         dirty_tracker::mark_dirty_tx(&tx, SourceType::Discussion, local_discussion_id)?;
-        // Upsert notes
         for note in &disc.notes {
             let should_store_payload = !note.is_system
                 || note.position_new_path.is_some()
@@ -229,15 +192,12 @@ pub fn write_prefetched_mr_discussions(
         tx.commit()?;
-        // Increment counters AFTER successful commit to keep metrics honest
         result.discussions_fetched += 1;
         result.discussions_upserted += 1;
         result.notes_upserted += notes_in_disc;
         result.diffnotes_count += diffnotes_in_disc;
     }
-    // Only sweep stale data and advance watermark on full success
-    // If any discussions failed to transform, preserve existing data
     if sync_succeeded {
         sweep_stale_discussions(conn, mr.local_mr_id, run_seen_at)?;
         sweep_stale_notes(conn, local_project_id, mr.local_mr_id, run_seen_at)?;
@@ -259,7 +219,6 @@ pub fn write_prefetched_mr_discussions(
     Ok(result)
 }
-/// Ingest discussions for MRs that need sync.
 pub async fn ingest_mr_discussions(
     conn: &Connection,
     client: &GitLabClient,
@@ -269,7 +228,7 @@ pub async fn ingest_mr_discussions(
     mrs: &[MrForDiscussionSync],
 ) -> Result<IngestMrDiscussionsResult> {
     let mut total_result = IngestMrDiscussionsResult {
-        pagination_succeeded: true, // Start optimistic
+        pagination_succeeded: true,
         ..Default::default()
     };
@@ -289,7 +248,6 @@ pub async fn ingest_mr_discussions(
         total_result.notes_upserted += result.notes_upserted;
         total_result.notes_skipped_bad_timestamp += result.notes_skipped_bad_timestamp;
         total_result.diffnotes_count += result.diffnotes_count;
-        // Pagination failed for any MR means overall failure
        if !result.pagination_succeeded {
            total_result.pagination_succeeded = false;
        }
@@ -309,7 +267,6 @@ pub async fn ingest_mr_discussions(
     Ok(total_result)
 }
-/// Ingest discussions for a single MR.
 async fn ingest_discussions_for_mr(
     conn: &Connection,
     client: &GitLabClient,
@@ -329,13 +286,10 @@ async fn ingest_discussions_for_mr(
         "Fetching discussions for MR"
     );
-    // Record sync start time for sweep
     let run_seen_at = now_ms();
-    // Stream discussions from GitLab
     let mut discussions_stream = client.paginate_mr_discussions(gitlab_project_id, mr.iid);
-    // Track if we've received any response
     let mut received_first_response = false;
     while let Some(disc_result) = discussions_stream.next().await {
@@ -343,7 +297,6 @@ async fn ingest_discussions_for_mr(
             received_first_response = true;
         }
-        // Handle pagination errors - don't advance watermark
         let gitlab_discussion = match disc_result {
             Ok(d) => d,
             Err(e) => {
@@ -357,7 +310,6 @@ async fn ingest_discussions_for_mr(
                 break;
             }
         };
-        // CRITICAL: Parse notes BEFORE any destructive DB operations
         let notes = match transform_notes_with_diff_position(&gitlab_discussion, local_project_id) {
             Ok(notes) => notes,
             Err(e) => {
@@ -369,25 +321,21 @@ async fn ingest_discussions_for_mr(
                 );
                 result.notes_skipped_bad_timestamp += gitlab_discussion.notes.len();
                 result.pagination_succeeded = false;
-                continue; // Skip this discussion, preserve existing data
+                continue;
             }
         };
-        // Count DiffNotes upfront (independent of transaction)
         let diffnotes_in_disc = notes
             .iter()
             .filter(|n| n.position_new_path.is_some() || n.position_old_path.is_some())
             .count();
         let notes_count = notes.len();
-        // Transform discussion
         let normalized_discussion =
             transform_mr_discussion(&gitlab_discussion, local_project_id, mr.local_mr_id);
-        // Only NOW start transaction (after parse succeeded)
         let tx = conn.unchecked_transaction()?;
-        // Store raw payload
         let payload_bytes = serde_json::to_vec(&gitlab_discussion)?;
         let payload_id = Some(store_payload(
             &tx,
@@ -400,10 +348,8 @@ async fn ingest_discussions_for_mr(
             },
         )?);
-        // Upsert discussion with run_seen_at
         upsert_discussion(&tx, &normalized_discussion, run_seen_at, payload_id)?;
-        // Get local discussion ID
         let local_discussion_id: i64 = tx.query_row(
             "SELECT id FROM discussions WHERE project_id = ? AND gitlab_discussion_id = ?",
             params![
@@ -413,12 +359,9 @@ async fn ingest_discussions_for_mr(
             |row| row.get(0),
         )?;
-        // Mark dirty for document regeneration (inside transaction)
         dirty_tracker::mark_dirty_tx(&tx, SourceType::Discussion, local_discussion_id)?;
-        // Upsert notes (not delete-all-then-insert)
         for note in &notes {
-            // Selective payload storage: skip system notes without position
             let should_store_payload = !note.is_system
                 || note.position_new_path.is_some()
                 || note.position_old_path.is_some();
@@ -452,22 +395,17 @@ async fn ingest_discussions_for_mr(
         tx.commit()?;
-        // Increment counters AFTER successful commit to keep metrics honest
         result.discussions_fetched += 1;
         result.discussions_upserted += 1;
         result.notes_upserted += notes_count;
         result.diffnotes_count += diffnotes_in_disc;
     }
-    // Only sweep stale data and advance watermark on full success
     if result.pagination_succeeded && received_first_response {
-        // Sweep stale discussions for this MR
         sweep_stale_discussions(conn, mr.local_mr_id, run_seen_at)?;
-        // Sweep stale notes for this MR
         sweep_stale_notes(conn, local_project_id, mr.local_mr_id, run_seen_at)?;
-        // Advance watermark
         mark_discussions_synced(conn, mr.local_mr_id, mr.updated_at)?;
         clear_sync_health_error(conn, mr.local_mr_id)?;
@@ -476,7 +414,6 @@ async fn ingest_discussions_for_mr(
             "MR discussion sync complete, watermark advanced"
         );
     } else if result.pagination_succeeded && !received_first_response {
-        // Empty response (no discussions) - still safe to sweep and advance
         sweep_stale_discussions(conn, mr.local_mr_id, run_seen_at)?;
         sweep_stale_notes(conn, local_project_id, mr.local_mr_id, run_seen_at)?;
         mark_discussions_synced(conn, mr.local_mr_id, mr.updated_at)?;
@@ -493,7 +430,6 @@ async fn ingest_discussions_for_mr(
     Ok(result)
 }
-/// Upsert a discussion with last_seen_at for sweep.
 fn upsert_discussion(
     conn: &Connection,
     discussion: &crate::gitlab::transformers::NormalizedDiscussion,
@@ -531,7 +467,6 @@ fn upsert_discussion(
     Ok(())
 }
-/// Upsert a note with last_seen_at for sweep.
 fn upsert_note(
     conn: &Connection,
     discussion_id: i64,
@@ -601,7 +536,6 @@ fn upsert_note(
     Ok(())
 }
-/// Sweep stale discussions (not seen in this run).
 fn sweep_stale_discussions(conn: &Connection, local_mr_id: i64, run_seen_at: i64) -> Result<usize> {
     let deleted = conn.execute(
         "DELETE FROM discussions
@@ -614,7 +548,6 @@ fn sweep_stale_discussions(conn: &Connection, local_mr_id: i64, run_seen_at: i64
     Ok(deleted)
 }
-/// Sweep stale notes for discussions belonging to this MR.
 fn sweep_stale_notes(
     conn: &Connection,
     local_project_id: i64,
@@ -636,7 +569,6 @@ fn sweep_stale_notes(
     Ok(deleted)
 }
-/// Mark MR discussions as synced (advance watermark).
 fn mark_discussions_synced(conn: &Connection, local_mr_id: i64, updated_at: i64) -> Result<()> {
     conn.execute(
         "UPDATE merge_requests SET discussions_synced_for_updated_at = ? WHERE id = ?",
@@ -645,7 +577,6 @@ fn mark_discussions_synced(conn: &Connection, local_mr_id: i64, updated_at: i64)
     Ok(())
 }
-/// Record sync health error for debugging.
 fn record_sync_health_error(conn: &Connection, local_mr_id: i64, error: &str) -> Result<()> {
     conn.execute(
         "UPDATE merge_requests SET
@@ -658,7 +589,6 @@ fn record_sync_health_error(conn: &Connection, local_mr_id: i64, error: &str) ->
     Ok(())
 }
-/// Clear sync health error on success.
 fn clear_sync_health_error(conn: &Connection, local_mr_id: i64) -> Result<()> {
     conn.execute(
         "UPDATE merge_requests SET
View File

@@ -1,10 +1,3 @@
//! Ingestion orchestrator: coordinates issue/MR and discussion sync.
//!
//! Implements the canonical pattern:
//! 1. Fetch resources (issues or MRs) with cursor-based sync
//! 2. Identify resources needing discussion sync
//! 3. Execute discussion sync with parallel prefetch (fetch in parallel, write serially)
use futures::future::join_all;
use rusqlite::Connection;
use tracing::{debug, info, instrument, warn};
@@ -14,6 +7,9 @@ use crate::core::dependent_queue::{
claim_jobs, complete_job, count_claimable_jobs, enqueue_job, fail_job, reclaim_stale_locks,
};
use crate::core::error::Result;
use crate::core::references::{
EntityReference, insert_entity_reference, resolve_issue_local_id, resolve_project_path,
};
use crate::gitlab::GitLabClient;
use super::discussions::ingest_issue_discussions;
@@ -23,45 +19,30 @@ use super::merge_requests::{
};
use super::mr_discussions::{prefetch_mr_discussions, write_prefetched_mr_discussions};
/// Progress callback for ingestion operations.
pub type ProgressCallback = Box<dyn Fn(ProgressEvent) + Send + Sync>;
/// Progress events emitted during ingestion.
#[derive(Debug, Clone)]
pub enum ProgressEvent {
/// Issue fetching started
IssuesFetchStarted,
/// An issue was fetched (current count)
IssueFetched { count: usize },
/// Issue fetching complete
IssuesFetchComplete { total: usize },
/// Discussion sync started (total issues to sync)
DiscussionSyncStarted { total: usize },
/// Discussion synced for an issue (current/total)
DiscussionSynced { current: usize, total: usize },
/// Discussion sync complete
DiscussionSyncComplete,
/// MR fetching started
MrsFetchStarted,
/// An MR was fetched (current count)
MrFetched { count: usize },
/// MR fetching complete
MrsFetchComplete { total: usize },
/// MR discussion sync started (total MRs to sync)
MrDiscussionSyncStarted { total: usize },
/// MR discussion synced (current/total)
MrDiscussionSynced { current: usize, total: usize },
/// MR discussion sync complete
MrDiscussionSyncComplete,
/// Resource event fetching started (total jobs)
ResourceEventsFetchStarted { total: usize },
/// Resource event fetched for an entity (current/total)
ResourceEventFetched { current: usize, total: usize },
/// Resource event fetching complete
ResourceEventsFetchComplete { fetched: usize, failed: usize },
ClosesIssuesFetchStarted { total: usize },
ClosesIssueFetched { current: usize, total: usize },
ClosesIssuesFetchComplete { fetched: usize, failed: usize },
}
/// Result of full project ingestion (issues).
#[derive(Debug, Default)]
pub struct IngestProjectResult {
pub issues_fetched: usize,
@@ -76,7 +57,6 @@ pub struct IngestProjectResult {
pub resource_events_failed: usize,
}
/// Result of MR ingestion for a project.
#[derive(Debug, Default)]
pub struct IngestMrProjectResult {
pub mrs_fetched: usize,
@@ -93,9 +73,10 @@ pub struct IngestMrProjectResult {
pub mrs_skipped_discussion_sync: usize,
pub resource_events_fetched: usize,
pub resource_events_failed: usize,
pub closes_issues_fetched: usize,
pub closes_issues_failed: usize,
}
/// Ingest all issues and their discussions for a project.
pub async fn ingest_project_issues(
conn: &Connection,
client: &GitLabClient,
@@ -107,7 +88,6 @@ pub async fn ingest_project_issues(
.await
}
/// Ingest all issues and their discussions for a project with progress reporting.
#[instrument(
skip(conn, client, config, progress),
fields(project_id, gitlab_project_id, items_processed, items_skipped, errors)
@@ -127,7 +107,6 @@ pub async fn ingest_project_issues_with_progress(
}
};
// Step 1: Ingest issues
emit(ProgressEvent::IssuesFetchStarted);
let issue_result = ingest_issues(conn, client, config, project_id, gitlab_project_id).await?;
@@ -139,10 +118,8 @@ pub async fn ingest_project_issues_with_progress(
total: result.issues_fetched,
});
// Step 2: Sync discussions for issues that need it
let issues_needing_sync = issue_result.issues_needing_discussion_sync;
// Query actual total issues for accurate skip count (issues_upserted only counts this run)
let total_issues: i64 = conn
.query_row(
"SELECT COUNT(*) FROM issues WHERE project_id = ?",
@@ -153,7 +130,6 @@ pub async fn ingest_project_issues_with_progress(
let total_issues = total_issues as usize;
result.issues_skipped_discussion_sync = total_issues.saturating_sub(issues_needing_sync.len());
// Step 3: Sync discussions for issues that need it
if issues_needing_sync.is_empty() {
info!("No issues need discussion sync");
} else {
@@ -166,7 +142,6 @@ pub async fn ingest_project_issues_with_progress(
total: issues_needing_sync.len(),
});
// Execute sequential discussion sync (see function doc for why not concurrent)
let discussion_results = sync_discussions_sequential(
conn,
client,
@@ -180,7 +155,6 @@ pub async fn ingest_project_issues_with_progress(
emit(ProgressEvent::DiscussionSyncComplete);
// Aggregate discussion results
for disc_result in discussion_results {
result.discussions_fetched += disc_result.discussions_fetched;
result.discussions_upserted += disc_result.discussions_upserted;
@@ -189,15 +163,12 @@ pub async fn ingest_project_issues_with_progress(
}
}
// Step 4: Enqueue and drain resource events (if enabled)
if config.sync.fetch_resource_events {
// Enqueue resource_events jobs for all issues in this project
let enqueued = enqueue_resource_events_for_entity_type(conn, project_id, "issue")?;
if enqueued > 0 {
debug!(enqueued, "Enqueued resource events jobs for issues");
}
// Drain the queue
let drain_result = drain_resource_events(
conn,
client,
@@ -209,6 +180,15 @@ pub async fn ingest_project_issues_with_progress(
.await?;
result.resource_events_fetched = drain_result.fetched;
result.resource_events_failed = drain_result.failed;
let refs_inserted =
crate::core::references::extract_refs_from_state_events(conn, project_id)?;
if refs_inserted > 0 {
debug!(
refs_inserted,
"Extracted cross-references from state events"
);
}
}
info!(
@@ -231,12 +211,6 @@ pub async fn ingest_project_issues_with_progress(
Ok(result)
}
/// Sync discussions sequentially for each issue.
///
/// NOTE: Despite the config having `dependent_concurrency`, we process sequentially
/// because rusqlite's `Connection` is not `Send` and cannot be shared across tasks.
/// True concurrency would require connection pooling (r2d2, deadpool, etc.).
/// The batch_size from config is used for progress logging granularity.
async fn sync_discussions_sequential(
conn: &Connection,
client: &GitLabClient,
@@ -251,7 +225,6 @@ async fn sync_discussions_sequential(
let mut results = Vec::with_capacity(issues.len());
// Process in batches for progress feedback (actual processing is sequential)
for chunk in issues.chunks(batch_size) {
for issue in chunk {
let disc_result = ingest_issue_discussions(
@@ -265,7 +238,6 @@ async fn sync_discussions_sequential(
.await?;
results.push(disc_result);
// Emit progress
if let Some(cb) = progress {
cb(ProgressEvent::DiscussionSynced {
current: results.len(),
@@ -278,7 +250,6 @@ async fn sync_discussions_sequential(
Ok(results)
}
/// Ingest all merge requests and their discussions for a project.
pub async fn ingest_project_merge_requests(
conn: &Connection,
client: &GitLabClient,
@@ -299,7 +270,6 @@ pub async fn ingest_project_merge_requests(
.await
}
/// Ingest all merge requests and their discussions for a project with progress reporting.
#[instrument(
skip(conn, client, config, progress),
fields(project_id, gitlab_project_id, items_processed, items_skipped, errors)
@@ -320,7 +290,6 @@ pub async fn ingest_project_merge_requests_with_progress(
}
};
// Step 1: Ingest MRs
emit(ProgressEvent::MrsFetchStarted);
let mr_result = ingest_merge_requests(
conn,
@@ -342,11 +311,8 @@ pub async fn ingest_project_merge_requests_with_progress(
total: result.mrs_fetched,
});
// Step 2: Query DB for MRs needing discussion sync
// CRITICAL: Query AFTER ingestion to avoid memory growth during large ingests
let mrs_needing_sync = get_mrs_needing_discussion_sync(conn, project_id)?;
// Query total MRs for accurate skip count
let total_mrs: i64 = conn
.query_row(
"SELECT COUNT(*) FROM merge_requests WHERE project_id = ?",
@@ -357,7 +323,6 @@ pub async fn ingest_project_merge_requests_with_progress(
let total_mrs = total_mrs as usize;
result.mrs_skipped_discussion_sync = total_mrs.saturating_sub(mrs_needing_sync.len());
// Step 3: Sync discussions for MRs that need it
if mrs_needing_sync.is_empty() {
info!("No MRs need discussion sync");
} else {
@@ -370,7 +335,6 @@ pub async fn ingest_project_merge_requests_with_progress(
total: mrs_needing_sync.len(),
});
// Execute sequential MR discussion sync
let discussion_results = sync_mr_discussions_sequential(
conn,
client,
@@ -384,7 +348,6 @@ pub async fn ingest_project_merge_requests_with_progress(
emit(ProgressEvent::MrDiscussionSyncComplete);
// Aggregate discussion results
for disc_result in discussion_results {
result.discussions_fetched += disc_result.discussions_fetched;
result.discussions_upserted += disc_result.discussions_upserted;
@@ -397,7 +360,6 @@ pub async fn ingest_project_merge_requests_with_progress(
}
}
// Step 4: Enqueue and drain resource events (if enabled)
if config.sync.fetch_resource_events {
let enqueued = enqueue_resource_events_for_entity_type(conn, project_id, "merge_request")?;
if enqueued > 0 {
@@ -415,6 +377,44 @@ pub async fn ingest_project_merge_requests_with_progress(
.await?;
result.resource_events_fetched = drain_result.fetched;
result.resource_events_failed = drain_result.failed;
let refs_inserted =
crate::core::references::extract_refs_from_state_events(conn, project_id)?;
if refs_inserted > 0 {
debug!(
refs_inserted,
"Extracted cross-references from state events"
);
}
}
let note_refs = crate::core::note_parser::extract_refs_from_system_notes(conn, project_id)?;
if note_refs.inserted > 0 || note_refs.skipped_unresolvable > 0 {
debug!(
inserted = note_refs.inserted,
unresolvable = note_refs.skipped_unresolvable,
parse_failures = note_refs.parse_failures,
"Extracted cross-references from system notes (MRs)"
);
}
{
let enqueued = enqueue_mr_closes_issues_jobs(conn, project_id)?;
if enqueued > 0 {
debug!(enqueued, "Enqueued mr_closes_issues jobs");
}
let closes_result = drain_mr_closes_issues(
conn,
client,
config,
project_id,
gitlab_project_id,
&progress,
)
.await?;
result.closes_issues_fetched = closes_result.fetched;
result.closes_issues_failed = closes_result.failed;
}
info!(
@@ -428,6 +428,8 @@ pub async fn ingest_project_merge_requests_with_progress(
mrs_skipped = result.mrs_skipped_discussion_sync,
resource_events_fetched = result.resource_events_fetched,
resource_events_failed = result.resource_events_failed,
closes_issues_fetched = result.closes_issues_fetched,
closes_issues_failed = result.closes_issues_failed,
"MR project ingestion complete"
);
@@ -438,10 +440,6 @@ pub async fn ingest_project_merge_requests_with_progress(
Ok(result)
}
/// Sync discussions for MRs with parallel API prefetching.
///
/// Pattern: Fetch discussions for multiple MRs in parallel, then write serially.
/// This overlaps network I/O while respecting rusqlite's single-connection constraint.
async fn sync_mr_discussions_sequential(
conn: &Connection,
client: &GitLabClient,
@@ -457,22 +455,18 @@ async fn sync_mr_discussions_sequential(
let mut results = Vec::with_capacity(mrs.len());
let mut processed = 0;
// Process in batches: parallel API fetch, serial DB write
for chunk in mrs.chunks(batch_size) {
// Step 1: Prefetch discussions for all MRs in this batch in parallel
let prefetch_futures = chunk.iter().map(|mr| {
prefetch_mr_discussions(client, gitlab_project_id, local_project_id, mr.clone())
});
let prefetched_batch = join_all(prefetch_futures).await;
// Step 2: Write each prefetched result serially
for prefetched in prefetched_batch {
let disc_result =
write_prefetched_mr_discussions(conn, config, local_project_id, prefetched)?;
results.push(disc_result);
processed += 1;
// Emit progress
if let Some(cb) = progress {
cb(ProgressEvent::MrDiscussionSynced {
current: processed,
@@ -485,7 +479,6 @@ async fn sync_mr_discussions_sequential(
Ok(results)
}
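The doc comment deleted above described the batching pattern used here: fetch each batch's discussions in parallel, then write the results serially through the single rusqlite connection. A minimal std-only sketch of that shape, with a hypothetical `fetch_discussions` stand-in for the network call and scoped threads in place of `join_all`:

```rust
use std::thread;

// Hypothetical stand-in for the parallel-safe network fetch.
fn fetch_discussions(mr_iid: u32) -> String {
    format!("discussions for !{mr_iid}")
}

fn main() {
    let mrs: Vec<u32> = (1..=7).collect();
    let batch_size = 3;
    let mut written: Vec<String> = Vec::new();

    for chunk in mrs.chunks(batch_size) {
        // Step 1: prefetch the whole batch in parallel (network I/O overlaps).
        let prefetched: Vec<String> = thread::scope(|s| {
            let handles: Vec<_> = chunk
                .iter()
                .map(|&iid| s.spawn(move || fetch_discussions(iid)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        // Step 2: write serially -- the single "connection" is only touched here.
        written.extend(prefetched);
    }
    assert_eq!(written.len(), 7);
}
```

The real code gets the same overlap from `futures::future::join_all` on async fetch futures; the serial write loop is what respects rusqlite's single-connection constraint.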
/// Result of draining the resource events queue.
#[derive(Debug, Default)]
pub struct DrainResult {
pub fetched: usize,
@@ -493,21 +486,11 @@ pub struct DrainResult {
pub skipped_not_found: usize,
}
/// Enqueue resource_events jobs for all entities of a given type in a project.
///
/// Uses the pending_dependent_fetches queue. Jobs are deduplicated by the UNIQUE
/// constraint, so re-enqueueing the same entity is a no-op.
fn enqueue_resource_events_for_entity_type(
conn: &Connection,
project_id: i64,
entity_type: &str,
) -> Result<usize> {
// Clean up obsolete jobs: remove resource_events jobs for entities whose
// watermark is already current (updated_at <= resource_events_synced_for_updated_at).
// These are leftover from prior runs that failed after watermark-stamping but
// before job deletion, or from entities that no longer need syncing.
// We intentionally keep jobs for entities that still need syncing (including
// in-progress or failed-with-backoff jobs) to preserve retry state.
match entity_type {
"issue" => {
conn.execute(
@@ -536,10 +519,6 @@ fn enqueue_resource_events_for_entity_type(
_ => {}
}
// Enqueue resource_events jobs only for entities whose updated_at exceeds
// their last resource event sync watermark.
//
// Use separate hardcoded queries per entity type to avoid format!-based SQL.
let entities: Vec<(i64, i64)> = match entity_type {
"issue" => {
let mut stmt = conn.prepare_cached(
@@ -580,10 +559,6 @@ fn enqueue_resource_events_for_entity_type(
Ok(enqueued)
}
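The deleted comments above captured the enqueue rule: only entities whose `updated_at` has advanced past their per-entity watermark get a job, and the queue's UNIQUE constraint makes re-enqueueing the same entity a no-op. A std-only sketch of that selection logic, where the `Entity` struct and a `HashSet` are illustrative stand-ins for the SQL table and constraint:

```rust
use std::collections::HashSet;

// Illustrative row: local id, updated_at, and the last-synced watermark.
struct Entity {
    id: i64,
    updated_at: i64,
    synced_for_updated_at: i64,
}

// Enqueue only entities whose updated_at has advanced past the watermark.
// The HashSet models the queue's UNIQUE constraint: re-enqueueing is a no-op.
fn enqueue_pending(entities: &[Entity], queue: &mut HashSet<i64>) -> usize {
    entities
        .iter()
        .filter(|e| e.updated_at > e.synced_for_updated_at)
        .filter(|e| queue.insert(e.id))
        .count()
}

fn main() {
    let entities = vec![
        Entity { id: 1, updated_at: 20, synced_for_updated_at: 10 }, // needs sync
        Entity { id: 2, updated_at: 10, synced_for_updated_at: 10 }, // current: skipped
    ];
    let mut queue = HashSet::new();
    assert_eq!(enqueue_pending(&entities, &mut queue), 1);
    // A second pass is a no-op thanks to the uniqueness check.
    assert_eq!(enqueue_pending(&entities, &mut queue), 0);
}
```

In the real code the dedup happens in SQLite via the `pending_dependent_fetches` UNIQUE constraint rather than in memory, but the watermark comparison is the same.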
/// Drain pending resource_events jobs: claim, fetch from GitLab, store, complete/fail.
///
/// Processes jobs sequentially since `rusqlite::Connection` is not `Send`.
/// Uses exponential backoff on failure via `fail_job`.
#[instrument(
skip(conn, client, config, progress),
fields(project_id, gitlab_project_id, items_processed, errors)
@@ -599,16 +574,11 @@ async fn drain_resource_events(
let mut result = DrainResult::default();
let batch_size = config.sync.dependent_concurrency as usize;
// Reclaim stale locks from crashed processes
let reclaimed = reclaim_stale_locks(conn, config.sync.stale_lock_minutes)?;
if reclaimed > 0 {
info!(reclaimed, "Reclaimed stale resource event locks");
}
// Count only claimable jobs (unlocked, past retry backoff) for accurate progress.
// Using count_pending_jobs here would inflate the total with locked/backing-off
// jobs that can't be claimed in this drain run, causing the progress bar to
// never reach 100%.
let claimable_counts = count_claimable_jobs(conn, project_id)?;
let total_pending = claimable_counts
.get("resource_events")
@@ -638,14 +608,9 @@ async fn drain_resource_events(
break;
}
// Track whether any job in this batch was actually new. If every
// claimed job was already seen, break to avoid an infinite loop
// (can happen with clock skew or zero-backoff edge cases).
let mut any_new_in_batch = false;
for job in &jobs {
// Guard against re-processing a job that was failed and re-claimed
// within the same drain run.
if !seen_job_ids.insert(job.id) {
warn!(
job_id = job.id,
@@ -693,10 +658,6 @@ async fn drain_resource_events(
}
}
Err(e) => {
// Only 404 (not found) is truly permanent -- the resource
// events endpoint doesn't exist for this entity. Stamp the
// watermark so we skip it next run. All other errors
// (403, auth, network) get backoff retry.
if e.is_permanent_api_error() {
debug!(
entity_type = %job.entity_type,
@@ -731,7 +692,6 @@ async fn drain_resource_events(
});
}
// If every job in this batch was already seen, stop to prevent spinning.
if !any_new_in_batch {
warn!("All claimed jobs were already processed, breaking drain loop");
break;
@@ -757,9 +717,6 @@ async fn drain_resource_events(
Ok(result)
}
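The removed comments in this drain loop referenced two safeguards: `fail_job` schedules retries with exponential backoff, and a seen-job-ids set prevents respinning on jobs re-claimed within the same run. A sketch of a plausible backoff curve (the base and cap values here are illustrative, not the values `fail_job` actually uses):

```rust
// Illustrative exponential backoff: the delay doubles with each failed
// attempt and is capped so a flaky job never backs off indefinitely.
fn backoff_secs(attempts: u32, base_secs: u64, cap_secs: u64) -> u64 {
    base_secs
        .saturating_mul(1u64 << attempts.min(16)) // clamp the shift to avoid overflow
        .min(cap_secs)
}

fn main() {
    assert_eq!(backoff_secs(0, 30, 3600), 30);    // first retry: base delay
    assert_eq!(backoff_secs(1, 30, 3600), 60);    // doubles per attempt
    assert_eq!(backoff_secs(10, 30, 3600), 3600); // capped at one hour
}
```

The cap plus the `seen_job_ids` guard together keep the drain loop from spinning when clock skew or a zero backoff lets a failed job become claimable again immediately.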
/// Store fetched resource events in the database.
///
/// Wraps all three event types in a single transaction for atomicity.
fn store_resource_events(
conn: &Connection,
project_id: i64,
@@ -805,10 +762,6 @@ fn store_resource_events(
Ok(())
}
/// Update the resource event watermark for an entity after successful event fetch.
///
/// Sets `resource_events_synced_for_updated_at = updated_at` so the entity
/// won't be re-enqueued until its `updated_at` advances again.
fn update_resource_event_watermark(
conn: &Connection,
entity_type: &str,
@@ -832,6 +785,209 @@ fn update_resource_event_watermark(
Ok(())
}
fn enqueue_mr_closes_issues_jobs(conn: &Connection, project_id: i64) -> Result<usize> {
let mut stmt =
conn.prepare_cached("SELECT id, iid FROM merge_requests WHERE project_id = ?1")?;
let entities: Vec<(i64, i64)> = stmt
.query_map([project_id], |row| Ok((row.get(0)?, row.get(1)?)))?
.collect::<std::result::Result<Vec<_>, _>>()?;
let mut enqueued = 0;
for (local_id, iid) in &entities {
if enqueue_job(
conn,
project_id,
"merge_request",
*iid,
*local_id,
"mr_closes_issues",
None,
)? {
enqueued += 1;
}
}
Ok(enqueued)
}
#[instrument(
skip(conn, client, config, progress),
fields(project_id, gitlab_project_id, items_processed, errors)
)]
async fn drain_mr_closes_issues(
conn: &Connection,
client: &GitLabClient,
config: &Config,
project_id: i64,
gitlab_project_id: i64,
progress: &Option<ProgressCallback>,
) -> Result<DrainResult> {
let mut result = DrainResult::default();
let batch_size = config.sync.dependent_concurrency as usize;
let reclaimed = reclaim_stale_locks(conn, config.sync.stale_lock_minutes)?;
if reclaimed > 0 {
info!(reclaimed, "Reclaimed stale mr_closes_issues locks");
}
let claimable_counts = count_claimable_jobs(conn, project_id)?;
let total_pending = claimable_counts
.get("mr_closes_issues")
.copied()
.unwrap_or(0);
if total_pending == 0 {
return Ok(result);
}
let emit = |event: ProgressEvent| {
if let Some(cb) = progress {
cb(event);
}
};
emit(ProgressEvent::ClosesIssuesFetchStarted {
total: total_pending,
});
let mut processed = 0;
let mut seen_job_ids = std::collections::HashSet::new();
loop {
let jobs = claim_jobs(conn, "mr_closes_issues", project_id, batch_size)?;
if jobs.is_empty() {
break;
}
let mut any_new_in_batch = false;
for job in &jobs {
if !seen_job_ids.insert(job.id) {
warn!(
job_id = job.id,
"Skipping already-processed mr_closes_issues job"
);
continue;
}
any_new_in_batch = true;
match client
.fetch_mr_closes_issues(gitlab_project_id, job.entity_iid)
.await
{
Ok(closes_issues) => {
let store_result = store_closes_issues_refs(
conn,
project_id,
job.entity_local_id,
&closes_issues,
);
match store_result {
Ok(()) => {
complete_job(conn, job.id)?;
result.fetched += 1;
}
Err(e) => {
warn!(
entity_iid = job.entity_iid,
error = %e,
"Failed to store closes_issues references"
);
fail_job(conn, job.id, &e.to_string())?;
result.failed += 1;
}
}
}
Err(e) => {
if e.is_permanent_api_error() {
debug!(
entity_iid = job.entity_iid,
error = %e,
"Permanent API error for closes_issues, marking complete"
);
complete_job(conn, job.id)?;
result.skipped_not_found += 1;
} else {
warn!(
entity_iid = job.entity_iid,
error = %e,
"Failed to fetch closes_issues from GitLab"
);
fail_job(conn, job.id, &e.to_string())?;
result.failed += 1;
}
}
}
processed += 1;
emit(ProgressEvent::ClosesIssueFetched {
current: processed,
total: total_pending,
});
}
if !any_new_in_batch {
warn!("All claimed mr_closes_issues jobs were already processed, breaking drain loop");
break;
}
}
emit(ProgressEvent::ClosesIssuesFetchComplete {
fetched: result.fetched,
failed: result.failed,
});
if result.fetched > 0 || result.failed > 0 {
info!(
fetched = result.fetched,
failed = result.failed,
"mr_closes_issues drain complete"
);
}
tracing::Span::current().record("items_processed", result.fetched);
tracing::Span::current().record("errors", result.failed);
Ok(result)
}
fn store_closes_issues_refs(
conn: &Connection,
project_id: i64,
mr_local_id: i64,
closes_issues: &[crate::gitlab::types::GitLabIssueRef],
) -> Result<()> {
for issue_ref in closes_issues {
let target_local_id = resolve_issue_local_id(conn, project_id, issue_ref.iid)?;
let (target_id, target_path, target_iid) = if let Some(local_id) = target_local_id {
(Some(local_id), None, None)
} else {
let path = resolve_project_path(conn, issue_ref.project_id)?;
let fallback =
path.unwrap_or_else(|| format!("gitlab_project:{}", issue_ref.project_id));
(None, Some(fallback), Some(issue_ref.iid))
};
let ref_ = EntityReference {
project_id,
source_entity_type: "merge_request",
source_entity_id: mr_local_id,
target_entity_type: "issue",
target_entity_id: target_id,
target_project_path: target_path.as_deref(),
target_entity_iid: target_iid,
reference_type: "closes",
source_method: "api",
};
insert_entity_reference(conn, &ref_)?;
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
@@ -879,7 +1035,6 @@ mod tests {
#[test]
fn progress_event_resource_variants_exist() {
// Verify the new progress event variants are constructible
let _start = ProgressEvent::ResourceEventsFetchStarted { total: 10 };
let _progress = ProgressEvent::ResourceEventFetched {
current: 5,


@@ -1,8 +1,3 @@
//! Gitlore - Semantic search for GitLab issues, MRs, and discussions.
//!
//! A self-hosted CLI tool that syncs GitLab data to a local SQLite database
//! with fast querying and semantic search capabilities.
pub mod cli;
pub mod core;
pub mod documents;


@@ -1,5 +1,3 @@
//! Gitlore CLI entry point.
use clap::Parser;
use console::style;
use dialoguer::{Confirm, Input};
@@ -37,42 +35,30 @@ use lore::core::sync_run::SyncRunRecorder;
#[tokio::main]
async fn main() {
// Reset SIGPIPE to default behavior so piping (e.g. `lore issues | head`) doesn't panic
#[cfg(unix)]
unsafe {
libc::signal(libc::SIGPIPE, libc::SIG_DFL);
}
// Parse CLI first so we know verbosity settings before initializing the subscriber.
let cli = Cli::parse();
let robot_mode = cli.is_robot_mode();
// Try to load logging config for file layer settings.
// If config isn't available yet (e.g. during `lore init`), use defaults.
let logging_config = lore::Config::load(cli.config.as_deref())
.map(|c| c.logging)
.unwrap_or_default();
// Clean up old log files before initializing subscriber (so deleted handles aren't held open)
let log_dir = get_log_dir(logging_config.log_dir.as_deref());
if logging_config.file_logging && logging_config.retention_days > 0 {
logging::cleanup_old_logs(&log_dir, logging_config.retention_days);
}
// Build triple-layer subscriber:
// - stderr layer: human-readable or JSON, controlled by -v flags
// - file layer: always-on JSON to daily-rotated log files
// - metrics layer: captures span timing for robot-mode performance data
let stderr_filter = logging::build_stderr_filter(cli.verbose, cli.quiet);
let metrics_layer = MetricsLayer::new();
let registry = tracing_subscriber::registry();
// Hold the file writer guard at function scope so it flushes on exit.
// WorkerGuard::drop() flushes pending log entries — forgetting it loses them.
let _file_guard: Option<tracing_appender::non_blocking::WorkerGuard>;
// stderr layer: format depends on --log-format flag
if cli.log_format == "json" {
let stderr_layer = tracing_subscriber::fmt::layer()
.json()
@@ -131,11 +117,10 @@ async fn main() {
}
}
// Apply color settings (console crate handles NO_COLOR/CLICOLOR natively in "auto" mode)
match cli.color.as_str() {
"never" => console::set_colors_enabled(false),
"always" => console::set_colors_enabled(true),
"auto" => {} // console crate handles this natively
"auto" => {}
_ => unreachable!(),
}
@@ -193,7 +178,6 @@ async fn main() {
Commands::Health => handle_health(cli.config.as_deref(), robot_mode).await, Commands::Health => handle_health(cli.config.as_deref(), robot_mode).await,
Commands::RobotDocs => handle_robot_docs(robot_mode), Commands::RobotDocs => handle_robot_docs(robot_mode),
// --- Backward-compat: deprecated aliases ---
Commands::List { Commands::List {
entity, entity,
limit, limit,
@@ -296,7 +280,6 @@ async fn main() {
} }
} }
/// Fallback error output for non-LoreError errors in robot mode.
#[derive(Serialize)]
struct FallbackErrorOutput {
error: FallbackError,
@@ -309,15 +292,12 @@ struct FallbackError {
}
fn handle_error(e: Box<dyn std::error::Error>, robot_mode: bool) -> ! {
// Try to downcast to LoreError for structured output
if let Some(gi_error) = e.downcast_ref::<LoreError>() {
if robot_mode {
let output = RobotErrorOutput::from(gi_error);
// Use serde_json for safe serialization; fallback constructs JSON safely
eprintln!(
"{}",
serde_json::to_string(&output).unwrap_or_else(|_| {
// Fallback uses serde to ensure proper escaping
let fallback = FallbackErrorOutput {
error: FallbackError {
code: "INTERNAL_ERROR".to_string(),
@@ -338,7 +318,6 @@ fn handle_error(e: Box<dyn std::error::Error>, robot_mode: bool) -> ! {
}
}
// Fallback for non-LoreError errors - use serde for proper JSON escaping
if robot_mode {
let output = FallbackErrorOutput {
error: FallbackError {
@@ -359,10 +338,6 @@ fn handle_error(e: Box<dyn std::error::Error>, robot_mode: bool) -> ! {
std::process::exit(1);
}
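Both fallback branches above serialize through serde_json instead of interpolating the error message into a format string. A stdlib-only illustration of what goes wrong otherwise — `escape_json_string` here is a hypothetical stand-in for the escaping serde performs automatically, not a function from this codebase:

```rust
// Hypothetical helper illustrating why the fallback path uses serde rather
// than format!(): error messages may contain quotes, backslashes, or control
// characters that must be escaped before landing inside a JSON string literal.
fn escape_json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters get the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

fn main() {
    let msg = "unexpected \"token\" at line 3";
    // Naive interpolation of msg would emit invalid JSON; escaping keeps it parseable.
    println!("{{\"error\":{{\"message\":\"{}\"}}}}", escape_json_string(msg));
}
```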
// ============================================================================
// Primary command handlers
// ============================================================================
fn handle_issues(
config_override: Option<&str>,
args: IssuesArgs,
@@ -375,7 +350,6 @@ fn handle_issues(
let order = if asc { "asc" } else { "desc" };
if let Some(iid) = args.iid {
// Show mode
let result = run_show_issue(&config, iid, args.project.as_deref())?;
if robot_mode {
print_show_issue_json(&result);
@@ -383,7 +357,6 @@ fn handle_issues(
print_show_issue(&result);
}
} else {
// List mode
let filters = ListFilters {
limit: args.limit,
project: args.project.as_deref(),
@@ -424,7 +397,6 @@ fn handle_mrs(
let order = if asc { "asc" } else { "desc" };
if let Some(iid) = args.iid {
// Show mode
let result = run_show_mr(&config, iid, args.project.as_deref())?;
if robot_mode {
print_show_mr_json(&result);
@@ -432,7 +404,6 @@ fn handle_mrs(
print_show_mr(&result);
}
} else {
// List mode
let filters = MrListFilters {
limit: args.limit,
project: args.project.as_deref(),
@@ -481,7 +452,6 @@ async fn handle_ingest(
let force = args.force && !args.no_force;
let full = args.full && !args.no_full;
// Record ingest run lifecycle in sync_runs table
let entity_label = args.entity.as_deref().unwrap_or("all");
let command = format!("ingest:{entity_label}");
let db_path = get_db_path(config.storage.db_path.as_deref());
@@ -493,7 +463,6 @@ async fn handle_ingest(
let ingest_result: std::result::Result<(), Box<dyn std::error::Error>> = async {
match args.entity.as_deref() {
Some(resource_type) => {
// Single entity ingest
let result = run_ingest(
&config,
resource_type,
@@ -512,7 +481,6 @@ async fn handle_ingest(
}
}
None => {
// Ingest everything: issues then MRs
if !robot_mode && !quiet {
println!(
"{}",
@@ -571,7 +539,6 @@ async fn handle_ingest(
}
}
/// JSON output for combined ingest (issues + mrs).
#[derive(Serialize)]
struct CombinedIngestOutput {
ok: bool,
@@ -666,7 +633,6 @@ async fn handle_sync_status_cmd(
Ok(())
}
/// JSON output for init command.
#[derive(Serialize)]
struct InitOutput {
ok: bool,
@@ -725,7 +691,6 @@ async fn handle_init(
token_env_var_flag: Option<String>,
projects_flag: Option<String>,
) -> Result<(), Box<dyn std::error::Error>> {
// Robot mode: require all inputs via flags, skip interactive prompts
if robot_mode {
let missing: Vec<&str> = [
gitlab_url_flag.is_none().then_some("--gitlab-url"),
@@ -773,7 +738,6 @@ async fn handle_init(
return Ok(());
}
// Human mode: interactive prompts
let config_path = get_config_path(config_override);
let mut confirmed_overwrite = force;
@@ -903,7 +867,6 @@ async fn handle_init(
Ok(())
}
/// JSON output for auth-test command.
#[derive(Serialize)]
struct AuthTestOutput {
ok: bool,
@@ -953,7 +916,7 @@ async fn handle_auth_test(
} else {
eprintln!("{}", style(format!("Error: {e}")).red());
}
std::process::exit(5); // AUTH_FAILED exit code
}
}
}
@@ -977,7 +940,6 @@ async fn handle_doctor(
Ok(())
}
/// JSON output for version command.
#[derive(Serialize)]
struct VersionOutput {
ok: bool,
@@ -1071,7 +1033,6 @@ fn handle_reset(robot_mode: bool) -> Result<(), Box<dyn std::error::Error>> {
std::process::exit(1);
}
/// JSON output for migrate command.
#[derive(Serialize)]
struct MigrateOutput {
ok: bool,
@@ -1085,7 +1046,6 @@ struct MigrateData {
migrated: bool,
}
/// JSON error output with suggestion field.
#[derive(Serialize)]
struct RobotErrorWithSuggestion {
error: RobotErrorSuggestionData,
@@ -1125,7 +1085,7 @@ async fn handle_migrate(
style("Run 'lore init' first to create the database.").yellow()
);
}
std::process::exit(10); // DB_ERROR exit code
}
let conn = create_connection(&db_path)?;
@@ -1174,7 +1134,6 @@ async fn handle_stats(
robot_mode: bool,
) -> Result<(), Box<dyn std::error::Error>> {
let config = Config::load(config_override)?;
// Auto-enable --check when --repair is used
let check = (args.check && !args.no_check) || args.repair;
let result = run_stats(&config, check, args.repair)?;
if robot_mode {
@@ -1273,7 +1232,6 @@ async fn handle_sync_cmd(
robot_mode,
};
// Record sync run lifecycle in sync_runs table
let db_path = get_db_path(config.storage.db_path.as_deref());
let recorder_conn = create_connection(&db_path)?;
let run_id = uuid::Uuid::new_v4().simple().to_string();
@@ -1290,7 +1248,6 @@ async fn handle_sync_cmd(
+ result.documents_regenerated
+ result.documents_embedded;
let total_errors = result.resource_events_failed;
// Best-effort: don't fail the command if recording fails
let _ = recorder.succeed(&recorder_conn, &stages, total_items, total_errors);
if robot_mode {
@@ -1308,11 +1265,6 @@ async fn handle_sync_cmd(
}
}
// ============================================================================
// Health + Robot-docs handlers
// ============================================================================
/// JSON output for health command.
#[derive(Serialize)]
struct HealthOutput {
ok: bool,
@@ -1406,7 +1358,6 @@ async fn handle_health(
Ok(())
}
/// JSON output for robot-docs command.
#[derive(Serialize)]
struct RobotDocsOutput {
ok: bool,
@@ -1591,10 +1542,6 @@ fn handle_robot_docs(robot_mode: bool) -> Result<(), Box<dyn std::error::Error>>
Ok(())
}
// ============================================================================
// Backward-compat handlers (deprecated, delegate to new handlers)
// ============================================================================
#[allow(clippy::too_many_arguments)]
async fn handle_list_compat(
config_override: Option<&str>,

View File

@@ -5,14 +5,12 @@ use rusqlite::Connection;
const DEFAULT_LIMIT: usize = 20;
const MAX_LIMIT: usize = 100;
/// Path filter: exact match or prefix match (trailing `/`).
#[derive(Debug, Clone)]
pub enum PathFilter {
Exact(String),
Prefix(String),
}
/// Filters applied to search results post-retrieval.
#[derive(Debug, Clone, Default)]
pub struct SearchFilters {
pub source_type: Option<SourceType>,
@@ -26,7 +24,6 @@ pub struct SearchFilters {
}
impl SearchFilters {
/// Returns true if any filter (besides limit) is set.
pub fn has_any_filter(&self) -> bool {
self.source_type.is_some()
|| self.author.is_some()
@@ -37,7 +34,6 @@ impl SearchFilters {
|| self.path.is_some()
}
/// Clamp limit to [1, 100], defaulting 0 to 20.
pub fn clamp_limit(&self) -> usize {
if self.limit == 0 {
DEFAULT_LIMIT
@@ -47,17 +43,12 @@ impl SearchFilters {
}
}
/// Escape SQL LIKE wildcards in a string.
fn escape_like(s: &str) -> String {
s.replace('\\', "\\\\")
.replace('%', "\\%")
.replace('_', "\\_")
}
/// Apply filters to a ranked list of document IDs, preserving rank order.
///
/// Uses json_each() to pass ranked IDs efficiently and maintain ordering
/// via ORDER BY j.key.
pub fn apply_filters(
conn: &Connection,
document_ids: &[i64],
@@ -216,8 +207,6 @@ mod tests {
#[test]
fn test_empty_ids() {
// Cannot test apply_filters without DB, but we can verify empty returns empty
// by testing the early return path logic
let f = SearchFilters::default();
assert!(!f.has_any_filter());
}
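The `escape_like` helper above is self-contained and worth exercising: without it, user-supplied filter values containing `%` or `_` act as wildcards instead of matching literally. A runnable copy (pairing it with an `ESCAPE '\'` clause in the SQL is an assumption about how `apply_filters` consumes it — that part is outside the hunk):

```rust
// Mirrors the filter module's helper: escape SQL LIKE wildcards so user
// input matches literally. The backslash is replaced first; otherwise the
// backslashes introduced for % and _ would themselves get double-escaped.
fn escape_like(s: &str) -> String {
    s.replace('\\', "\\\\")
        .replace('%', "\\%")
        .replace('_', "\\_")
}

fn main() {
    // Intended for use with something like: WHERE path LIKE ?1 ESCAPE '\'
    assert_eq!(escape_like("50%_done"), "50\\%\\_done");
    println!("{}", escape_like("src\\main_%.rs"));
}
```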

View File

@@ -1,16 +1,12 @@
use crate::core::error::Result;
use rusqlite::Connection;
/// FTS query mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FtsQueryMode {
/// Safe mode: each token wrapped in quotes, trailing * preserved on alphanumeric tokens.
Safe,
/// Raw mode: query passed directly to FTS5 (for advanced users).
Raw,
}
/// A single FTS5 search result.
#[derive(Debug)]
pub struct FtsResult {
pub document_id: i64,
@@ -18,14 +14,6 @@ pub struct FtsResult {
pub snippet: String,
}
/// Convert raw user input into a safe FTS5 query.
///
/// Safe mode:
/// - Splits on whitespace
/// - Wraps each token in double quotes (escaping internal quotes)
/// - Preserves trailing `*` on alphanumeric-only tokens (prefix search)
///
/// Raw mode: passes through unchanged.
pub fn to_fts_query(raw: &str, mode: FtsQueryMode) -> String {
match mode {
FtsQueryMode::Raw => raw.to_string(),
@@ -38,16 +26,13 @@ pub fn to_fts_query(raw: &str, mode: FtsQueryMode) -> String {
let tokens: Vec<String> = trimmed
.split_whitespace()
.map(|token| {
// Check if token ends with * and the rest is alphanumeric
if let Some(stem) = token.strip_suffix('*')
&& !stem.is_empty()
&& stem.chars().all(|c| c.is_alphanumeric() || c == '_')
{
// Preserve prefix search: "stem"*
let escaped = stem.replace('"', "\"\"");
return format!("\"{}\"*", escaped);
}
// Default: wrap in quotes, escape internal quotes
let escaped = token.replace('"', "\"\"");
format!("\"{}\"", escaped)
})
@@ -58,10 +43,6 @@ pub fn to_fts_query(raw: &str, mode: FtsQueryMode) -> String {
}
}
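The safe-mode conversion above can be run in isolation. This sketch inlines the same tokenization with a nested `if` in place of the let-chain; joining the quoted tokens with a space (FTS5's implicit AND) is an assumption, since the final join falls outside the visible hunk:

```rust
// Standalone sketch of the Safe-mode FTS5 query builder: each whitespace
// token is quoted (internal quotes doubled), and a trailing * survives only
// on purely alphanumeric/underscore stems, preserving prefix search.
fn to_fts_query_safe(raw: &str) -> String {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return String::new();
    }
    let tokens: Vec<String> = trimmed
        .split_whitespace()
        .map(|token| {
            // "auth*" qualifies for prefix search; "C++*" does not and is
            // quoted whole, neutralizing the * inside the phrase.
            if let Some(stem) = token.strip_suffix('*') {
                if !stem.is_empty() && stem.chars().all(|c| c.is_alphanumeric() || c == '_') {
                    let escaped = stem.replace('"', "\"\"");
                    return format!("\"{}\"*", escaped);
                }
            }
            let escaped = token.replace('"', "\"\"");
            format!("\"{}\"", escaped)
        })
        .collect();
    // Assumption: tokens joined with spaces (implicit AND in FTS5).
    tokens.join(" ")
}

fn main() {
    assert_eq!(to_fts_query_safe("auth*"), "\"auth\"*");
    assert_eq!(to_fts_query_safe("C++*"), "\"C++*\"");
    println!("{}", to_fts_query_safe("say \"hi\""));
}
```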
/// Execute an FTS5 search query.
///
/// Returns results ranked by BM25 score (lower = better match) with
/// contextual snippets highlighting matches.
pub fn search_fts(
conn: &Connection,
query: &str,
@@ -97,14 +78,11 @@ pub fn search_fts(
Ok(results)
}
/// Generate a fallback snippet for results without FTS snippets.
/// Truncates at a word boundary and appends "...".
pub fn generate_fallback_snippet(content_text: &str, max_chars: usize) -> String {
if content_text.chars().count() <= max_chars {
return content_text.to_string();
}
// Collect the char boundary at max_chars to slice correctly for multi-byte content
let byte_end = content_text
.char_indices()
.nth(max_chars)
@@ -112,7 +90,6 @@ pub fn generate_fallback_snippet(content_text: &str, max_chars: usize) -> String
.unwrap_or(content_text.len());
let truncated = &content_text[..byte_end];
// Walk backward to find a word boundary (space)
if let Some(last_space) = truncated.rfind(' ') {
format!("{}...", &truncated[..last_space])
} else {
@@ -120,7 +97,6 @@ pub fn generate_fallback_snippet(content_text: &str, max_chars: usize) -> String
}
}
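`generate_fallback_snippet` above is fully visible in the hunk and runs standalone; the point of the `char_indices().nth(max_chars)` step is that slicing at a raw byte offset of `max_chars` would panic mid-codepoint on multi-byte content:

```rust
// Runnable copy of the fallback snippet logic: truncate at a char boundary,
// then walk back to the last space so the ellipsis lands on a word boundary.
fn generate_fallback_snippet(content_text: &str, max_chars: usize) -> String {
    if content_text.chars().count() <= max_chars {
        return content_text.to_string();
    }
    // nth(max_chars) yields the byte offset of the first char past the cut,
    // keeping the slice valid for multi-byte (non-ASCII) content.
    let byte_end = content_text
        .char_indices()
        .nth(max_chars)
        .map(|(i, _)| i)
        .unwrap_or(content_text.len());
    let truncated = &content_text[..byte_end];
    if let Some(last_space) = truncated.rfind(' ') {
        format!("{}...", &truncated[..last_space])
    } else {
        format!("{}...", truncated)
    }
}

fn main() {
    // Slicing this at byte 10 directly would split 'ö'; the char-aware path doesn't.
    println!("{}", generate_fallback_snippet("héllo wörld wíth áccents", 10));
}
```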
/// Get the best snippet: prefer FTS snippet, fall back to truncated content.
pub fn get_result_snippet(fts_snippet: Option<&str>, content_text: &str) -> String {
match fts_snippet {
Some(s) if !s.is_empty() => s.to_string(),
@@ -179,11 +155,9 @@ mod tests {
#[test]
fn test_prefix_only_alphanumeric() {
// Non-alphanumeric prefix: C++* should NOT be treated as prefix search
let result = to_fts_query("C++*", FtsQueryMode::Safe);
assert_eq!(result, "\"C++*\"");
// Pure alphanumeric prefix: auth* should be prefix search
let result = to_fts_query("auth*", FtsQueryMode::Safe);
assert_eq!(result, "\"auth\"*");
}
@@ -205,7 +179,7 @@ mod tests {
let content = "This is a moderately long piece of text that should be truncated at a word boundary for readability purposes";
let result = generate_fallback_snippet(content, 50);
assert!(result.ends_with("..."));
assert!(result.len() <= 55); // 50 + "..."
}
#[test]

View File

@@ -1,5 +1,3 @@
//! Hybrid search orchestrator combining FTS5 + sqlite-vec via RRF.
use rusqlite::Connection;
use crate::core::error::Result;
@@ -11,7 +9,6 @@ const BASE_RECALL_MIN: usize = 50;
const FILTERED_RECALL_MIN: usize = 200;
const RECALL_CAP: usize = 1500;
/// Search mode selection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchMode {
Hybrid,
@@ -38,7 +35,6 @@ impl SearchMode {
}
}
/// Combined search result with provenance from both retrieval lists.
pub struct HybridResult {
pub document_id: i64,
pub score: f64,
@@ -47,11 +43,6 @@ pub struct HybridResult {
pub rrf_score: f64,
}
/// Execute hybrid search, returning ranked results + any warnings.
///
/// `client` is `Option` to enable graceful degradation: when Ollama is
/// unavailable, the caller passes `None` and hybrid mode falls back to
/// FTS-only with a warning.
pub async fn search_hybrid(
conn: &Connection,
client: Option<&OllamaClient>,
@@ -62,7 +53,6 @@ pub async fn search_hybrid(
) -> Result<(Vec<HybridResult>, Vec<String>)> {
let mut warnings: Vec<String> = Vec::new();
// Adaptive recall
let requested = filters.clamp_limit();
let top_k = if filters.has_any_filter() {
(requested * 50).clamp(FILTERED_RECALL_MIN, RECALL_CAP)
@@ -159,7 +149,6 @@ pub async fn search_hybrid(
})
.collect();
// Apply post-retrieval filters and limit
let limit = filters.clamp_limit();
let results = if filters.has_any_filter() {
let all_ids: Vec<i64> = results.iter().map(|r| r.document_id).collect();
@@ -232,7 +221,7 @@ mod tests {
};
let requested = filters.clamp_limit();
let top_k = (requested * 50).clamp(FILTERED_RECALL_MIN, RECALL_CAP);
assert_eq!(top_k, RECALL_CAP); // 5000 capped to 1500
}
#[test]
@@ -243,6 +232,6 @@ mod tests {
};
let requested = filters.clamp_limit();
let top_k = (requested * 10).clamp(BASE_RECALL_MIN, RECALL_CAP);
assert_eq!(top_k, BASE_RECALL_MIN); // 10 -> 50
}
}
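The adaptive recall logic those tests exercise condenses into a standalone sketch. The constants mirror the ones in the hunk (`BASE_RECALL_MIN` 50, `FILTERED_RECALL_MIN` 200, `RECALL_CAP` 1500, limit clamped to [1, 100] with 0 defaulting to 20); the multiplier rationale in the comments is inferred from context, not stated in the code:

```rust
const DEFAULT_LIMIT: usize = 20;
const MAX_LIMIT: usize = 100;
const BASE_RECALL_MIN: usize = 50;
const FILTERED_RECALL_MIN: usize = 200;
const RECALL_CAP: usize = 1500;

// Mirrors SearchFilters::clamp_limit: 0 means "default", otherwise [1, 100].
fn clamp_limit(limit: usize) -> usize {
    if limit == 0 { DEFAULT_LIMIT } else { limit.clamp(1, MAX_LIMIT) }
}

// Filtered searches over-fetch 50x (post-retrieval filters may discard most
// candidates); unfiltered searches over-fetch 10x. Both are clamped so tiny
// limits still recall enough and huge limits don't explode the KNN query.
fn recall_top_k(requested: usize, has_filters: bool) -> usize {
    if has_filters {
        (requested * 50).clamp(FILTERED_RECALL_MIN, RECALL_CAP)
    } else {
        (requested * 10).clamp(BASE_RECALL_MIN, RECALL_CAP)
    }
}

fn main() {
    assert_eq!(recall_top_k(clamp_limit(100), true), 1500); // 5000 capped
    assert_eq!(recall_top_k(clamp_limit(0), false), 200);   // default 20 * 10
    assert_eq!(recall_top_k(clamp_limit(1), false), 50);    // floor applies
}
```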

View File

@@ -2,39 +2,24 @@ use std::collections::HashMap;
const RRF_K: f64 = 60.0;
/// A single result from Reciprocal Rank Fusion, containing both raw and
/// normalized scores plus per-list rank provenance for --explain output.
pub struct RrfResult {
pub document_id: i64,
/// Raw RRF score: sum of 1/(k + rank) across all lists.
pub rrf_score: f64,
/// Normalized to [0, 1] where the best result is 1.0.
pub normalized_score: f64,
/// 1-indexed rank in the vector results list, if present.
pub vector_rank: Option<usize>,
/// 1-indexed rank in the FTS results list, if present.
pub fts_rank: Option<usize>,
}
/// Combine vector and FTS retrieval results using Reciprocal Rank Fusion.
///
/// Input tuples are `(document_id, score/distance)` — already sorted by each retriever.
/// Ranks are 1-indexed (first result = rank 1).
///
/// Score = sum of 1/(k + rank) for each list containing the document.
pub fn rank_rrf(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Vec<RrfResult> {
if vector_results.is_empty() && fts_results.is_empty() {
return Vec::new();
}
// (rrf_score, vector_rank, fts_rank)
let mut scores: HashMap<i64, (f64, Option<usize>, Option<usize>)> = HashMap::new();
for (i, &(doc_id, _)) in vector_results.iter().enumerate() {
let rank = i + 1; // 1-indexed
let entry = scores.entry(doc_id).or_insert((0.0, None, None));
// Only count the first occurrence per list to prevent duplicates
// from inflating the score.
if entry.1.is_none() {
entry.0 += 1.0 / (RRF_K + rank as f64);
entry.1 = Some(rank);
@@ -42,7 +27,7 @@ pub fn rank_rrf(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Ve
}
for (i, &(doc_id, _)) in fts_results.iter().enumerate() {
let rank = i + 1; // 1-indexed
let entry = scores.entry(doc_id).or_insert((0.0, None, None));
if entry.2.is_none() {
entry.0 += 1.0 / (RRF_K + rank as f64);
@@ -55,16 +40,14 @@ pub fn rank_rrf(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Ve
.map(|(doc_id, (rrf_score, vector_rank, fts_rank))| RrfResult {
document_id: doc_id,
rrf_score,
normalized_score: 0.0, // filled in below
vector_rank,
fts_rank,
})
.collect();
// Sort descending by rrf_score
results.sort_by(|a, b| b.rrf_score.total_cmp(&a.rrf_score));
// Normalize: best = 1.0
if let Some(max_score) = results.first().map(|r| r.rrf_score).filter(|&s| s > 0.0) {
for result in &mut results {
result.normalized_score = result.rrf_score / max_score;
@@ -84,10 +67,8 @@ mod tests {
let fts = vec![(1, 5.0), (3, 3.0)];
let results = rank_rrf(&vector, &fts);
// Doc 1 appears in both lists, should rank highest
assert_eq!(results[0].document_id, 1);
// Doc 1 score should be higher than doc 2 and doc 3
let doc1 = &results[0];
let doc2_score = results
.iter()
@@ -121,10 +102,8 @@ mod tests {
let fts = vec![(1, 5.0), (3, 3.0)];
let results = rank_rrf(&vector, &fts);
// Best result should have normalized_score = 1.0
assert!((results[0].normalized_score - 1.0).abs() < f64::EPSILON);
// All scores in [0, 1]
for r in &results {
assert!(r.normalized_score >= 0.0);
assert!(r.normalized_score <= 1.0);
@@ -165,7 +144,6 @@ mod tests {
assert_eq!(results.len(), 1);
let r = &results[0];
// RRF score = 1/(60+1) + 1/(60+1) = 2/61
let expected = 2.0 / 61.0;
assert!((r.rrf_score - expected).abs() < 1e-10);
assert!((r.normalized_score - 1.0).abs() < f64::EPSILON);
@@ -177,7 +155,6 @@ mod tests {
let results = rank_rrf(&vector, &[]);
assert_eq!(results.len(), 2);
// Single result should still have normalized_score = 1.0
assert!((results[0].normalized_score - 1.0).abs() < f64::EPSILON);
}
}
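The fusion rule the removed doc comments spelled out — each list contributes 1/(k + rank) with k = 60, first occurrence per list only — can be sketched compactly. This version keeps just the scoring and sorting, dropping the per-list rank provenance and normalization that `RrfResult` carries:

```rust
use std::collections::{HashMap, HashSet};

const RRF_K: f64 = 60.0;

// Reciprocal Rank Fusion over two pre-ranked ID lists. A document found by
// both retrievers at rank 1 scores 2/61, beating any single-list document.
// Counting only the first occurrence per list stops duplicate chunk hits
// from inflating a document's score.
fn rank_rrf(vector_ids: &[i64], fts_ids: &[i64]) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in [vector_ids, fts_ids] {
        let mut seen: HashSet<i64> = HashSet::new();
        for (i, &doc_id) in list.iter().enumerate() {
            if seen.insert(doc_id) {
                // Ranks are 1-indexed: first result contributes 1/(k + 1).
                *scores.entry(doc_id).or_insert(0.0) += 1.0 / (RRF_K + (i + 1) as f64);
            }
        }
    }
    let mut out: Vec<(i64, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}

fn main() {
    // Doc 1 is rank 1 in both lists: score = 1/61 + 1/61 = 2/61.
    let results = rank_rrf(&[1, 2], &[1, 3]);
    assert_eq!(results[0].0, 1);
    assert!((results[0].1 - 2.0 / 61.0).abs() < 1e-12);
}
```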

View File

@@ -5,16 +5,13 @@ use rusqlite::Connection;
use crate::core::error::Result;
use crate::embedding::chunk_ids::decode_rowid;
/// A single vector search result (document-level, deduplicated).
#[derive(Debug)]
pub struct VectorResult {
pub document_id: i64,
pub distance: f64,
}
/// Query the maximum number of chunks per document for adaptive dedup sizing.
fn max_chunks_per_document(conn: &Connection) -> i64 {
// Fast path: stored chunk_count on sentinel rows (post-migration 010)
let stored: Option<i64> = conn
.query_row(
"SELECT MAX(chunk_count) FROM embedding_metadata
@@ -28,7 +25,6 @@ fn max_chunks_per_document(conn: &Connection) -> i64 {
return max;
}
// Fallback for pre-migration data: count chunks per document
conn.query_row(
"SELECT COALESCE(MAX(cnt), 1) FROM (
SELECT COUNT(*) as cnt FROM embedding_metadata
@@ -40,12 +36,6 @@ fn max_chunks_per_document(conn: &Connection) -> i64 {
.unwrap_or(1)
}
/// Search documents using sqlite-vec KNN query.
///
/// Over-fetches by an adaptive multiplier based on actual max chunks per document
/// to handle chunk deduplication (multiple chunks per document produce multiple
/// KNN results for the same document_id).
/// Returns deduplicated results with best (lowest) distance per document.
pub fn search_vector(
conn: &Connection,
query_embedding: &[f32],
@@ -55,7 +45,6 @@ pub fn search_vector(
return Ok(Vec::new());
}
// Convert to raw little-endian bytes for sqlite-vec
let embedding_bytes: Vec<u8> = query_embedding
.iter()
.flat_map(|f| f.to_le_bytes())
@@ -79,7 +68,6 @@ pub fn search_vector(
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
// Dedup by document_id, keeping best (lowest) distance
let mut best: HashMap<i64, f64> = HashMap::new();
for (rowid, distance) in rows {
let (document_id, _chunk_index) = decode_rowid(rowid);
@@ -92,7 +80,6 @@ pub fn search_vector(
.or_insert(distance);
}
// Sort by distance ascending, take limit
let mut results: Vec<VectorResult> = best
.into_iter()
.map(|(document_id, distance)| VectorResult {
@@ -110,29 +97,20 @@ pub fn search_vector(
mod tests {
use super::*;
// Note: Full integration tests require sqlite-vec loaded, which happens via
// create_connection in db.rs. These are basic unit tests for the dedup logic.
#[test]
fn test_empty_returns_empty() {
// Can't test KNN without sqlite-vec, but we can test edge cases
let result = search_vector_dedup(vec![], 10);
assert!(result.is_empty());
}
#[test]
fn test_dedup_keeps_best_distance() {
// Simulate: doc 1 has chunks at rowid 1000 (idx 0) and 1001 (idx 1)
let rows = vec![
(1000_i64, 0.5_f64), // doc 1, chunk 0
(1001, 0.3), // doc 1, chunk 1 (better)
(2000, 0.4), // doc 2, chunk 0
];
let results = search_vector_dedup(rows, 10);
assert_eq!(results.len(), 2);
assert_eq!(results[0].document_id, 1); // doc 1 best = 0.3
assert!((results[0].distance - 0.3).abs() < f64::EPSILON);
assert_eq!(results[1].document_id, 2); // doc 2 = 0.4
}
#[test]
@@ -142,7 +120,6 @@ mod tests {
assert_eq!(results.len(), 2);
}
/// Helper for testing dedup logic without sqlite-vec
fn search_vector_dedup(rows: Vec<(i64, f64)>, limit: usize) -> Vec<VectorResult> {
let mut best: HashMap<i64, f64> = HashMap::new();
for (rowid, distance) in rows {

View File

@@ -1,5 +1,3 @@
//! Tests for DiffNote position extraction in note transformer.
use lore::gitlab::transformers::discussion::transform_notes_with_diff_position;
use lore::gitlab::types::{
GitLabAuthor, GitLabDiscussion, GitLabLineRange, GitLabLineRangePoint, GitLabNote,
@@ -60,8 +58,6 @@ fn make_discussion(notes: Vec<GitLabNote>) -> GitLabDiscussion {
}
}
// === DiffNote Position Field Extraction ===
#[test]
fn extracts_position_paths_from_diffnote() {
let position = GitLabNotePosition {
@@ -174,7 +170,7 @@ fn line_range_uses_old_line_fallback_when_new_line_missing() {
line_code: None,
line_type: Some("old".to_string()),
old_line: Some(20),
new_line: None, // missing - should fall back to old_line
},
end: GitLabLineRangePoint {
line_code: None,
@@ -203,8 +199,6 @@ fn line_range_uses_old_line_fallback_when_new_line_missing() {
assert_eq!(notes[0].position_line_range_end, Some(25)); assert_eq!(notes[0].position_line_range_end, Some(25));
} }
// === Regular Notes (non-DiffNote) ===
#[test] #[test]
fn regular_note_has_none_for_all_position_fields() { fn regular_note_has_none_for_all_position_fields() {
let note = make_basic_note(1, "2024-01-16T09:00:00.000Z"); let note = make_basic_note(1, "2024-01-16T09:00:00.000Z");
@@ -224,8 +218,6 @@ fn regular_note_has_none_for_all_position_fields() {
assert_eq!(notes[0].position_head_sha, None); assert_eq!(notes[0].position_head_sha, None);
} }
// === Strict Timestamp Parsing ===
#[test] #[test]
fn returns_error_for_invalid_created_at_timestamp() { fn returns_error_for_invalid_created_at_timestamp() {
let mut note = make_basic_note(1, "2024-01-16T09:00:00.000Z"); let mut note = make_basic_note(1, "2024-01-16T09:00:00.000Z");
@@ -264,8 +256,6 @@ fn returns_error_for_invalid_resolved_at_timestamp() {
assert!(result.is_err()); assert!(result.is_err());
} }
// === Mixed Discussion (DiffNote + Regular Notes) ===
#[test] #[test]
fn handles_mixed_diffnote_and_regular_notes() { fn handles_mixed_diffnote_and_regular_notes() {
let position = GitLabNotePosition { let position = GitLabNotePosition {
@@ -286,16 +276,12 @@ fn handles_mixed_diffnote_and_regular_notes() {
let notes = transform_notes_with_diff_position(&discussion, 100).unwrap(); let notes = transform_notes_with_diff_position(&discussion, 100).unwrap();
assert_eq!(notes.len(), 2); assert_eq!(notes.len(), 2);
// First note is DiffNote with position
assert_eq!(notes[0].position_new_path, Some("file.rs".to_string())); assert_eq!(notes[0].position_new_path, Some("file.rs".to_string()));
assert_eq!(notes[0].position_new_line, Some(42)); assert_eq!(notes[0].position_new_line, Some(42));
// Second note is regular with None position fields
assert_eq!(notes[1].position_new_path, None); assert_eq!(notes[1].position_new_path, None);
assert_eq!(notes[1].position_new_line, None); assert_eq!(notes[1].position_new_line, None);
} }
// === Position Preservation ===
#[test] #[test]
fn preserves_note_position_index() { fn preserves_note_position_index() {
let pos1 = GitLabNotePosition { let pos1 = GitLabNotePosition {
@@ -330,11 +316,8 @@ fn preserves_note_position_index() {
assert_eq!(notes[1].position, 1); assert_eq!(notes[1].position, 1);
} }
// === Edge Cases ===
#[test] #[test]
fn handles_diffnote_with_empty_position_fields() { fn handles_diffnote_with_empty_position_fields() {
// DiffNote exists but all position fields are None
let position = GitLabNotePosition { let position = GitLabNotePosition {
old_path: None, old_path: None,
new_path: None, new_path: None,
@@ -351,7 +334,6 @@ fn handles_diffnote_with_empty_position_fields() {
let notes = transform_notes_with_diff_position(&discussion, 100).unwrap(); let notes = transform_notes_with_diff_position(&discussion, 100).unwrap();
// All position fields should be None, not cause an error
assert_eq!(notes[0].position_old_path, None); assert_eq!(notes[0].position_old_path, None);
assert_eq!(notes[0].position_new_path, None); assert_eq!(notes[0].position_new_path, None);
} }
@@ -376,6 +358,5 @@ fn handles_file_position_type() {
assert_eq!(notes[0].position_type, Some("file".to_string())); assert_eq!(notes[0].position_type, Some("file".to_string()));
assert_eq!(notes[0].position_new_path, Some("binary.bin".to_string())); assert_eq!(notes[0].position_new_path, Some("binary.bin".to_string()));
// File-level comments have no line numbers
assert_eq!(notes[0].position_new_line, None); assert_eq!(notes[0].position_new_line, None);
} }

View File

@@ -1,16 +1,8 @@
-//! Integration tests for embedding storage and vector search.
-//!
-//! These tests create an in-memory SQLite database with sqlite-vec loaded,
-//! apply all migrations through 010 (chunk config), and verify KNN search
-//! and metadata operations.
 use lore::core::db::create_connection;
 use rusqlite::Connection;
 use std::path::PathBuf;
 use tempfile::TempDir;
-/// Create a test DB on disk (required for sqlite-vec which needs the extension loaded).
-/// Uses create_connection to get the sqlite-vec extension registered.
 fn create_test_db() -> (TempDir, Connection) {
 let tmp = TempDir::new().unwrap();
 let db_path = tmp.path().join("test.db");
@@ -35,7 +27,6 @@ fn create_test_db() -> (TempDir, Connection) {
 .unwrap_or_else(|e| panic!("Migration {} failed: {}", version, e));
 }
-// Seed a project
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project')",
 [],
@@ -54,7 +45,6 @@ fn insert_document(conn: &Connection, id: i64, title: &str, content: &str) {
 .unwrap();
 }
-/// Create a 768-dim vector with a specific dimension set to 1.0 (unit vector along axis).
 fn axis_vector(dim: usize) -> Vec<f32> {
 let mut v = vec![0.0f32; 768];
 v[dim] = 1.0;
@@ -89,12 +79,10 @@ fn knn_search_returns_nearest_neighbors() {
 insert_document(&conn, 2, "Doc B", "Content about database optimization.");
 insert_document(&conn, 3, "Doc C", "Content about logging infrastructure.");
-// Doc 1: axis 0, Doc 2: axis 1, Doc 3: axis 2
 insert_embedding(&conn, 1, 0, &axis_vector(0));
 insert_embedding(&conn, 2, 0, &axis_vector(1));
 insert_embedding(&conn, 3, 0, &axis_vector(2));
-// Query vector close to axis 0 (should match doc 1)
 let mut query = vec![0.0f32; 768];
 query[0] = 0.9;
 query[1] = 0.1;
@@ -132,7 +120,6 @@ fn knn_search_deduplicates_chunks() {
 "Very long content that was chunked.",
 );
-// Same document, two chunks, both similar to query
 let mut v1 = vec![0.0f32; 768];
 v1[0] = 1.0;
 let mut v2 = vec![0.0f32; 768];
@@ -144,7 +131,6 @@ fn knn_search_deduplicates_chunks() {
 let results = lore::search::search_vector(&conn, &axis_vector(0), 10).unwrap();
-// Should deduplicate: same document_id appears at most once
 let unique_docs: std::collections::HashSet<i64> =
 results.iter().map(|r| r.document_id).collect();
 assert_eq!(
@@ -161,7 +147,6 @@ fn orphan_trigger_deletes_embeddings_on_document_delete() {
 insert_document(&conn, 1, "Will be deleted", "Content.");
 insert_embedding(&conn, 1, 0, &axis_vector(0));
-// Verify embedding exists
 let count: i64 = conn
 .query_row(
 "SELECT COUNT(*) FROM embeddings WHERE rowid = 1000",
@@ -171,11 +156,9 @@ fn orphan_trigger_deletes_embeddings_on_document_delete() {
 .unwrap();
 assert_eq!(count, 1, "Embedding should exist before delete");
-// Delete the document
 conn.execute("DELETE FROM documents WHERE id = 1", [])
 .unwrap();
-// Verify embedding was cascade-deleted via trigger
 let count: i64 = conn
 .query_row(
 "SELECT COUNT(*) FROM embeddings WHERE rowid = 1000",
@@ -188,7 +171,6 @@ fn orphan_trigger_deletes_embeddings_on_document_delete() {
 "Trigger should delete embeddings when document is deleted"
 );
-// Verify metadata was cascade-deleted via FK
 let meta_count: i64 = conn
 .query_row(
 "SELECT COUNT(*) FROM embedding_metadata WHERE document_id = 1",
@@ -207,19 +189,12 @@ fn empty_database_returns_no_results() {
 assert!(results.is_empty(), "Empty DB should return no results");
 }
-// --- Bug-fix regression tests ---
 #[test]
 fn overflow_doc_with_error_sentinel_not_re_detected_as_pending() {
-// Bug 2: Documents skipped for chunk overflow must record a sentinel error
-// in embedding_metadata so they are not re-detected as pending on subsequent
-// pipeline runs (which would cause an infinite re-processing loop).
 let (_tmp, conn) = create_test_db();
 insert_document(&conn, 1, "Overflow doc", "Some content");
-// Simulate what the pipeline does when a document exceeds CHUNK_ROWID_MULTIPLIER:
-// it records an error sentinel at chunk_index=0.
 let now = chrono::Utc::now().timestamp_millis();
 conn.execute(
 "INSERT INTO embedding_metadata
@@ -230,7 +205,6 @@ fn overflow_doc_with_error_sentinel_not_re_detected_as_pending() {
 )
 .unwrap();
-// Now find_pending_documents should NOT return this document
 let pending =
 lore::embedding::find_pending_documents(&conn, 100, 0, "nomic-embed-text").unwrap();
 assert!(
@@ -239,7 +213,6 @@ fn overflow_doc_with_error_sentinel_not_re_detected_as_pending() {
 pending.len()
 );
-// count_pending_documents should also return 0
 let count = lore::embedding::count_pending_documents(&conn, "nomic-embed-text").unwrap();
 assert_eq!(
 count, 0,
@@ -249,11 +222,8 @@ fn overflow_doc_with_error_sentinel_not_re_detected_as_pending() {
 #[test]
 fn count_and_find_pending_agree() {
-// Bug 1: count_pending_documents and find_pending_documents must use
-// logically equivalent WHERE clauses to produce consistent results.
 let (_tmp, conn) = create_test_db();
-// Case 1: No documents at all
 let count = lore::embedding::count_pending_documents(&conn, "nomic-embed-text").unwrap();
 let found =
 lore::embedding::find_pending_documents(&conn, 1000, 0, "nomic-embed-text").unwrap();
@@ -263,7 +233,6 @@ fn count_and_find_pending_agree() {
 "Empty DB: count and find should agree"
 );
-// Case 2: New document (no metadata)
 insert_document(&conn, 1, "New doc", "Content");
 let count = lore::embedding::count_pending_documents(&conn, "nomic-embed-text").unwrap();
 let found =
@@ -275,7 +244,6 @@ fn count_and_find_pending_agree() {
 );
 assert_eq!(count, 1);
-// Case 3: Document with matching metadata (not pending)
 let now = chrono::Utc::now().timestamp_millis();
 conn.execute(
 "INSERT INTO embedding_metadata
@@ -295,7 +263,6 @@ fn count_and_find_pending_agree() {
 );
 assert_eq!(count, 0);
-// Case 4: Config drift (chunk_max_bytes mismatch)
 conn.execute(
 "UPDATE embedding_metadata SET chunk_max_bytes = 999 WHERE document_id = 1",
 [],
@@ -314,14 +281,11 @@ fn count_and_find_pending_agree() {
 #[test]
 fn full_embed_delete_is_atomic() {
-// Bug 7: The --full flag's two DELETE statements should be atomic.
-// This test verifies that both tables are cleared together.
 let (_tmp, conn) = create_test_db();
 insert_document(&conn, 1, "Doc", "Content");
 insert_embedding(&conn, 1, 0, &axis_vector(0));
-// Verify data exists
 let meta_count: i64 = conn
 .query_row("SELECT COUNT(*) FROM embedding_metadata", [], |r| r.get(0))
 .unwrap();
@@ -331,7 +295,6 @@ fn full_embed_delete_is_atomic() {
 assert_eq!(meta_count, 1);
 assert_eq!(embed_count, 1);
-// Execute the atomic delete (same as embed.rs --full)
 conn.execute_batch(
 "BEGIN;
 DELETE FROM embedding_metadata;

View File
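The KNN assertions above lean on axis-aligned unit vectors being maximally separated, so a query weighted toward axis 0 must rank the axis-0 document first. A minimal sketch of that geometry, assuming a cosine distance for illustration (the actual metric is whatever lore configures in sqlite-vec):

```rust
// Mirror of the test helper: a 768-dim unit vector along one axis.
fn axis_vector(dim: usize) -> Vec<f32> {
    let mut v = vec![0.0f32; 768];
    v[dim] = 1.0;
    v
}

// Cosine distance: 1 - (a . b) / (|a| |b|). Zero for identical directions,
// one for orthogonal vectors.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

fn main() {
    // Same query the test builds: mostly axis 0, a little axis 1.
    let mut query = vec![0.0f32; 768];
    query[0] = 0.9;
    query[1] = 0.1;
    let d0 = cosine_distance(&query, &axis_vector(0));
    let d1 = cosine_distance(&query, &axis_vector(1));
    // Closer to axis 0 than axis 1, so doc 1 (axis 0) ranks first.
    assert!(d0 < d1);
}
```

Because the seeded embeddings are orthogonal unit vectors, any monotone distance metric (cosine, L2) produces the same ranking, which is what makes these fixtures deterministic.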

@@ -1,5 +1,3 @@
-//! Tests for test fixtures - verifies they deserialize correctly.
 use lore::gitlab::types::{GitLabDiscussion, GitLabIssue};
 use serde::de::DeserializeOwned;
 use std::path::PathBuf;
@@ -38,14 +36,11 @@ fn fixture_gitlab_issues_page_deserializes() {
 "Need at least 3 issues for pagination tests"
 );
-// Check first issue has labels
 assert!(!issues[0].labels.is_empty());
-// Check second issue has null description and empty labels
 assert!(issues[1].description.is_none());
 assert!(issues[1].labels.is_empty());
-// Check third issue has multiple labels
 assert!(issues[2].labels.len() >= 3);
 }
@@ -67,7 +62,6 @@ fn fixture_gitlab_discussions_page_deserializes() {
 "Need multiple discussions for testing"
 );
-// Check we have both individual_note=true and individual_note=false
 let has_individual = discussions.iter().any(|d| d.individual_note);
 let has_threaded = discussions.iter().any(|d| !d.individual_note);
 assert!(

View File

@@ -1,8 +1,3 @@
-//! Integration tests for FTS5 search.
-//!
-//! These tests create an in-memory SQLite database, apply migrations through 008 (FTS5),
-//! seed documents, and verify search behavior.
 use rusqlite::Connection;
 fn create_test_db() -> Connection {
@@ -28,7 +23,6 @@ fn create_test_db() -> Connection {
 .unwrap_or_else(|e| panic!("Migration {} failed: {}", version, e));
 }
-// Seed a project
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project')",
 [],
@@ -110,7 +104,6 @@ fn fts_stemming_matches() {
 "Deployment configuration for production servers.",
 );
-// "running" should match "runner" and "executing" via porter stemmer
 let results =
 lore::search::search_fts(&conn, "running", 10, lore::search::FtsQueryMode::Safe).unwrap();
 assert!(
@@ -157,11 +150,9 @@ fn fts_special_characters_handled() {
 "The C++ compiler segfaults on template metaprogramming.",
 );
-// Special characters should not crash the search
 let results =
 lore::search::search_fts(&conn, "C++ compiler", 10, lore::search::FtsQueryMode::Safe)
 .unwrap();
-// Safe mode sanitizes the query — it should still return results or at least not crash
 assert!(results.len() <= 1);
 }
@@ -169,7 +160,6 @@ fn fts_special_characters_handled() {
 fn fts_result_ordering_by_relevance() {
 let conn = create_test_db();
-// Doc 1: "authentication" in title and content
 insert_document(
 &conn,
 1,
@@ -177,7 +167,6 @@ fn fts_result_ordering_by_relevance() {
 "Authentication system redesign",
 "The authentication system needs a complete redesign. Authentication flows are broken.",
 );
-// Doc 2: "authentication" only in content, once
 insert_document(
 &conn,
 2,
@@ -185,7 +174,6 @@ fn fts_result_ordering_by_relevance() {
 "Login page update",
 "Updated the login page with better authentication error messages.",
 );
-// Doc 3: unrelated
 insert_document(
 &conn,
 3,
@@ -203,7 +191,6 @@ fn fts_result_ordering_by_relevance() {
 .unwrap();
 assert!(results.len() >= 2, "Should match at least 2 documents");
-// Doc 1 should rank higher (more occurrences of the term)
 assert_eq!(
 results[0].document_id, 1,
 "Document with more term occurrences should rank first"
@@ -246,7 +233,6 @@ fn fts_snippet_generated() {
 .unwrap();
 assert!(!results.is_empty());
-// Snippet should contain some text (may have FTS5 highlight markers)
 assert!(
 !results[0].snippet.is_empty(),
 "Snippet should be generated"
@@ -265,7 +251,6 @@ fn fts_triggers_sync_on_insert() {
 "This is test content for FTS trigger verification.",
 );
-// Verify FTS table has an entry via direct query
 let fts_count: i64 = conn
 .query_row(
 "SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'test'",
@@ -289,7 +274,6 @@ fn fts_triggers_sync_on_delete() {
 "This content will be deleted from the index.",
 );
-// Verify it's indexed
 let before: i64 = conn
 .query_row(
 "SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'deletable'",
@@ -299,11 +283,9 @@ fn fts_triggers_sync_on_delete() {
 .unwrap();
 assert_eq!(before, 1);
-// Delete the document
 conn.execute("DELETE FROM documents WHERE id = 1", [])
 .unwrap();
-// Verify it's removed from FTS
 let after: i64 = conn
 .query_row(
 "SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'deletable'",
@@ -318,7 +300,6 @@ fn fts_triggers_sync_on_delete() {
 fn fts_null_title_handled() {
 let conn = create_test_db();
-// Discussion documents have NULL titles
 conn.execute(
 "INSERT INTO documents (id, source_type, source_id, project_id, title, content_text, content_hash, url)
 VALUES (1, 'discussion', 1, 1, NULL, 'Discussion about API rate limiting strategies.', 'hash1', 'https://example.com/1')",

View File

@@ -1,7 +1,5 @@
-//! Tests for GitLab API response type deserialization.
 use lore::gitlab::types::{
-GitLabAuthor, GitLabDiscussion, GitLabIssue, GitLabLabelEvent, GitLabLabelRef,
+GitLabAuthor, GitLabDiscussion, GitLabIssue, GitLabIssueRef, GitLabLabelEvent, GitLabLabelRef,
 GitLabMergeRequest, GitLabMergeRequestRef, GitLabMilestone, GitLabMilestoneEvent,
 GitLabMilestoneRef, GitLabNote, GitLabNotePosition, GitLabReferences, GitLabReviewer,
 GitLabStateEvent,
@@ -212,7 +210,6 @@ fn handles_diffnote_type() {
 #[test]
 fn handles_missing_resolvable_field() {
-// GitLab API sometimes omits resolvable/resolved fields entirely
 let json = r#"{
 "id": 12345,
 "type": null,
@@ -229,7 +226,6 @@ fn handles_missing_resolvable_field() {
 let note: GitLabNote = serde_json::from_str(json).expect("Failed to deserialize note");
-// Should default to false when not present
 assert!(!note.resolvable);
 assert!(!note.resolved);
 }
@@ -258,7 +254,6 @@ fn deserializes_system_note() {
 #[test]
 fn deserializes_note_position_with_partial_fields() {
-// DiffNote position can have partial data (e.g., new file with no old_path)
 let json = r#"{
 "old_path": null,
 "new_path": "src/new_file.rs",
@@ -403,8 +398,6 @@ fn deserializes_gitlab_milestone() {
 assert_eq!(milestone.due_date, Some("2024-04-01".to_string()));
 }
-// === Checkpoint 2: Merge Request type tests ===
 #[test]
 fn deserializes_gitlab_merge_request_from_fixture() {
 let json = include_str!("fixtures/gitlab_merge_request.json");
@@ -449,7 +442,6 @@ fn deserializes_gitlab_merge_request_with_references() {
 #[test]
 fn deserializes_gitlab_merge_request_minimal() {
-// Test with minimal fields (no optional ones)
 let json = r#"{
 "id": 1,
 "iid": 1,
@@ -509,7 +501,6 @@ fn deserializes_gitlab_merge_request_with_draft() {
 #[test]
 fn deserializes_gitlab_merge_request_with_work_in_progress_fallback() {
-// Older GitLab instances use work_in_progress instead of draft
 let json = r#"{
 "id": 1,
 "iid": 1,
@@ -528,13 +519,11 @@ fn deserializes_gitlab_merge_request_with_work_in_progress_fallback() {
 let mr: GitLabMergeRequest = serde_json::from_str(json).expect("Failed to deserialize WIP MR");
 assert!(mr.work_in_progress);
-// draft defaults to false when not present
 assert!(!mr.draft);
 }
 #[test]
 fn deserializes_gitlab_merge_request_with_locked_state() {
-// locked is a transitional state during merge
 let json = r#"{
 "id": 1,
 "iid": 1,
@@ -640,8 +629,6 @@ fn deserializes_diffnote_position_with_line_range() {
 assert_eq!(range.end_line(), Some(15));
 }
-// === Resource Event type tests ===
 #[test]
 fn deserializes_state_event_closed_by_mr() {
 let json = r#"{
@@ -896,3 +883,60 @@ fn deserializes_milestone_ref() {
 assert_eq!(ms_ref.iid, 5);
 assert_eq!(ms_ref.title, "v1.0");
 }
+#[test]
+fn deserializes_gitlab_issue_ref() {
+let json = r#"{
+"id": 5001,
+"iid": 42,
+"project_id": 100,
+"title": "Fix authentication bug",
+"state": "opened",
+"web_url": "https://gitlab.example.com/group/project/-/issues/42"
+}"#;
+let issue_ref: GitLabIssueRef =
+serde_json::from_str(json).expect("Failed to deserialize issue ref");
+assert_eq!(issue_ref.id, 5001);
+assert_eq!(issue_ref.iid, 42);
+assert_eq!(issue_ref.project_id, 100);
+assert_eq!(issue_ref.title, "Fix authentication bug");
+assert_eq!(issue_ref.state, "opened");
+assert_eq!(
+issue_ref.web_url,
+"https://gitlab.example.com/group/project/-/issues/42"
+);
+}
+#[test]
+fn deserializes_gitlab_issue_ref_array() {
+let json = r#"[
+{
+"id": 5001,
+"iid": 42,
+"project_id": 100,
+"title": "Issue one",
+"state": "opened",
+"web_url": "https://gitlab.example.com/-/issues/42"
+},
+{
+"id": 5002,
+"iid": 43,
+"project_id": 200,
+"title": "Issue two from another project",
+"state": "closed",
+"web_url": "https://gitlab.example.com/-/issues/43"
+}
+]"#;
+let refs: Vec<GitLabIssueRef> =
+serde_json::from_str(json).expect("Failed to deserialize issue ref array");
+assert_eq!(refs.len(), 2);
+assert_eq!(refs[0].iid, 42);
+assert_eq!(refs[0].project_id, 100);
+assert_eq!(refs[1].iid, 43);
+assert_eq!(refs[1].project_id, 200);
+assert_eq!(refs[1].state, "closed");
+}

View File

@@ -1,9 +1,3 @@
-//! Golden query test suite.
-//!
-//! Verifies end-to-end search quality with known-good expected results.
-//! Uses a seeded SQLite DB with deterministic fixture data and no external
-//! dependencies (no Ollama, no GitLab).
 #![allow(dead_code)]
 use rusqlite::Connection;
@@ -12,7 +6,6 @@ use std::path::PathBuf;
 use lore::search::{FtsQueryMode, SearchFilters, SearchMode, apply_filters, search_fts};
-/// A golden query test case.
 #[derive(Debug, Deserialize)]
 struct GoldenQuery {
 query: String,
@@ -42,12 +35,10 @@ fn load_golden_queries() -> Vec<GoldenQuery> {
 .unwrap_or_else(|e| panic!("Failed to parse golden queries: {}", e))
 }
-/// Create an in-memory database with FTS5 schema and seed deterministic fixture data.
 fn create_seeded_db() -> Connection {
 let conn = Connection::open_in_memory().unwrap();
 conn.pragma_update(None, "foreign_keys", "ON").unwrap();
-// Apply migrations 001-008 (FTS5)
 let migrations_dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("migrations");
 for version in 1..=8 {
 let entries: Vec<_> = std::fs::read_dir(&migrations_dir)
@@ -65,7 +56,6 @@ fn create_seeded_db() -> Connection {
 .unwrap_or_else(|e| panic!("Migration {} failed: {}", version, e));
 }
-// Seed project
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url)
 VALUES (1, 100, 'group/project', 'https://gitlab.example.com/group/project')",
@@ -73,9 +63,7 @@ fn create_seeded_db() -> Connection {
 )
 .unwrap();
-// Seed deterministic documents
 let documents = vec![
-// id=1: Auth issue (matches: authentication, login, OAuth, JWT, token, refresh)
 (
 1,
 "issue",
@@ -86,7 +74,6 @@ fn create_seeded_db() -> Connection {
 Multiple users reported authentication failures across all OAuth providers.",
 "testuser",
 ),
-// id=2: User profile MR (matches: user, profile, avatar, upload)
 (
 2,
 "merge_request",
@@ -96,7 +83,6 @@ fn create_seeded_db() -> Connection {
 responsive design for mobile and desktop viewports.",
 "developer1",
 ),
-// id=3: Database migration issue (matches: database, migration, PostgreSQL, schema)
 (
 3,
 "issue",
@@ -106,7 +92,6 @@ fn create_seeded_db() -> Connection {
 rewritten to use the new schema modification syntax. All staging environments affected.",
 "dba_admin",
 ),
-// id=4: Performance MR (matches: performance, optimization, caching, query)
 (
 4,
 "merge_request",
@@ -116,7 +101,6 @@ fn create_seeded_db() -> Connection {
 to 180ms. Added connection pooling and prepared statement caching.",
 "senior_dev",
 ),
-// id=5: API rate limiting discussion (matches: API, rate, limiting, throttle)
 (
 5,
 "discussion",
@@ -127,7 +111,6 @@ fn create_seeded_db() -> Connection {
 Need to handle burst traffic during peak hours without throttling legitimate users.",
 "architect",
 ),
-// id=6: UI/CSS issue (matches: CSS, styling, frontend, responsive, UI)
 (
 6,
 "issue",
@@ -138,7 +121,6 @@ fn create_seeded_db() -> Connection {
 conflicting CSS specificity with the theme system.",
 "frontend_dev",
 ),
-// id=7: CI/CD MR (matches: CI, CD, pipeline, deployment, Docker)
 (
 7,
 "merge_request",
@@ -148,7 +130,6 @@ fn create_seeded_db() -> Connection {
 support for failed deployments. Pipeline runtime reduced from 45min to 12min.",
 "devops_lead",
 ),
-// id=8: Security issue (matches: security, vulnerability, XSS, injection)
 (
 8,
 "issue",
@@ -169,7 +150,6 @@ fn create_seeded_db() -> Connection {
 .unwrap();
 }
-// Seed labels for filtered queries
 conn.execute_batch(
 "INSERT INTO document_labels (document_id, label_name) VALUES (1, 'bug');
 INSERT INTO document_labels (document_id, label_name) VALUES (1, 'authentication');
@@ -212,7 +192,6 @@ fn golden_queries_all_pass() {
 for (i, gq) in queries.iter().enumerate() {
 let mode = SearchMode::parse(&gq.mode).unwrap_or(SearchMode::Lexical);
-// For lexical-only golden queries (no Ollama needed)
 assert_eq!(
 mode,
 SearchMode::Lexical,
@@ -221,11 +200,9 @@ fn golden_queries_all_pass() {
 gq.mode
 );
-// Run FTS search
 let fts_results = search_fts(&conn, &gq.query, 50, FtsQueryMode::Safe).unwrap();
 let doc_ids: Vec<i64> = fts_results.iter().map(|r| r.document_id).collect();
-// Apply filters if any
 let filters = build_search_filters(&gq.filters);
 let filtered_ids = if filters.has_any_filter() {
 apply_filters(&conn, &doc_ids, &filters).unwrap()
@@ -233,7 +210,6 @@ fn golden_queries_all_pass() {
 doc_ids.clone()
 };
-// Check min_results
 if filtered_ids.len() < gq.min_results {
 failures.push(format!(
 "FAIL [{}] \"{}\": expected >= {} results, got {} (description: {})",
@@ -246,13 +222,10 @@ fn golden_queries_all_pass() {
 continue;
 }
-// Check each expected doc_id is in top max_rank
 for expected_id in &gq.expected_doc_ids {
 let position = filtered_ids.iter().position(|id| id == expected_id);
 match position {
-Some(pos) if pos < gq.max_rank => {
-// Pass
-}
+Some(pos) if pos < gq.max_rank => {}
 Some(pos) => {
 failures.push(format!(
 "FAIL [{}] \"{}\": expected doc_id {} in top {}, found at rank {} (description: {})",

View File

@@ -1,8 +1,3 @@
-//! Integration tests for hybrid search combining FTS + vector.
-//!
-//! Tests all three search modes (lexical, semantic, hybrid) and
-//! verifies graceful degradation when embeddings are unavailable.
 use lore::core::db::create_connection;
 use lore::search::{FtsQueryMode, SearchFilters, SearchMode, search_fts, search_hybrid};
 use rusqlite::Connection;
@@ -89,7 +84,6 @@ fn lexical_mode_uses_fts_only() {
 assert!(!results.is_empty(), "Lexical search should find results");
 assert_eq!(results[0].document_id, 1);
-// Lexical mode should not produce Ollama-related warnings
 assert!(
 warnings.iter().all(|w| !w.contains("Ollama")),
 "Lexical mode should not warn about Ollama"
@@ -98,12 +92,10 @@ fn lexical_mode_uses_fts_only() {
 #[test]
 fn lexical_mode_no_embeddings_required() {
-// Use in-memory DB without sqlite-vec for pure FTS
 let conn = Connection::open_in_memory().unwrap();
 conn.pragma_update(None, "foreign_keys", "ON").unwrap();
 let migrations_dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("migrations");
-// Only apply through migration 008 (FTS5, no embeddings)
 for version in 1..=8 {
 let entries: Vec<_> = std::fs::read_dir(&migrations_dir)
 .unwrap()
@@ -159,7 +151,7 @@ fn hybrid_mode_degrades_to_fts_without_client() {
 let (results, warnings) = rt
 .block_on(search_hybrid(
 &conn,
-None, // No Ollama client
+None,
 "performance slow",
 SearchMode::Hybrid,
 &filters,
@@ -168,7 +160,6 @@ fn hybrid_mode_degrades_to_fts_without_client() {
 .unwrap();
 assert!(!results.is_empty(), "Should fall back to FTS results");
-// Should warn about missing Ollama client
 assert!(
 warnings.iter().any(|w| w.to_lowercase().contains("vector")
 || w.to_lowercase().contains("ollama")
@@ -184,14 +175,12 @@ fn hybrid_mode_degrades_to_fts_without_client() {
 fn rrf_ranking_combines_signals() {
 use lore::search::rank_rrf;
-// Two documents with different rankings in each signal
-let vector_results = vec![(1_i64, 0.1), (2, 0.5)]; // doc 1 closer
-let fts_results = vec![(2_i64, -5.0), (1, -3.0)]; // doc 2 higher BM25
+let vector_results = vec![(1_i64, 0.1), (2, 0.5)];
+let fts_results = vec![(2_i64, -5.0), (1, -3.0)];
 let rrf = rank_rrf(&vector_results, &fts_results);
 assert_eq!(rrf.len(), 2, "Should return both documents");
-// Both docs appear in both signals, so both get RRF scores
 for r in &rrf {
 assert!(r.rrf_score > 0.0, "RRF score should be positive");
 }
@@ -235,7 +224,6 @@ fn filters_by_source_type() {
 #[test]
 fn search_mode_variants_exist() {
-// Verify all enum variants compile and are distinct
 let hybrid = SearchMode::Hybrid;
 let lexical = SearchMode::Lexical;
 let semantic = SearchMode::Semantic;
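The RRF hunk above exercises reciprocal rank fusion over the vector and FTS result lists. As a hedged sketch only (the actual `rank_rrf` in `lore::search` and its constant may differ; `rank_rrf_sketch` is a hypothetical stand-in), the standard formula scores each document by summing 1/(k + rank) across the lists it appears in, which is why both documents in the test fixture end up with positive scores:

```rust
use std::collections::HashMap;

// Reciprocal rank fusion: score(d) = sum over result lists of 1 / (k + rank_d).
// k = 60 comes from the original RRF paper; lore may use a different constant.
fn rank_rrf_sketch(vector_results: &[(i64, f64)], fts_results: &[(i64, f64)]) -> Vec<(i64, f64)> {
    let k = 60.0;
    let mut scores: HashMap<i64, f64> = HashMap::new();
    // Both input lists are assumed already rank-ordered best-first
    // (vector: smaller distance first; FTS: better BM25 first).
    for list in [vector_results, fts_results] {
        for (rank, (doc_id, _signal)) in list.iter().enumerate() {
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut out: Vec<(i64, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    out
}

fn main() {
    // Same fixtures as the test: doc 1 ranks first in vector, doc 2 first in FTS.
    let vector_results = vec![(1_i64, 0.1), (2, 0.5)];
    let fts_results = vec![(2_i64, -5.0), (1, -3.0)];
    let rrf = rank_rrf_sketch(&vector_results, &fts_results);
    assert_eq!(rrf.len(), 2);
    // Each doc is rank 1 in one list and rank 2 in the other,
    // so both score 1/61 + 1/62 > 0, matching the test's assertions.
    for (_, score) in &rrf {
        assert!(*score > 0.0);
    }
}
```

Because the fused score depends only on rank positions, the incomparable raw signals (a vector distance and a negative BM25 score) never need to be normalized against each other.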

View File

@@ -1,5 +1,3 @@
-//! Tests for database migrations.
 use rusqlite::Connection;
 use std::path::PathBuf;
@@ -41,7 +39,6 @@ fn migration_002_creates_issues_table() {
 let conn = create_test_db();
 apply_migrations(&conn, 2);
-// Verify issues table exists with expected columns
 let columns: Vec<String> = conn
 .prepare("PRAGMA table_info(issues)")
 .unwrap()
@@ -124,13 +121,11 @@ fn migration_002_enforces_state_check() {
 let conn = create_test_db();
 apply_migrations(&conn, 2);
-// First insert a project so we can reference it
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project')",
 [],
 ).unwrap();
-// Valid states should work
 conn.execute(
 "INSERT INTO issues (gitlab_id, project_id, iid, state, created_at, updated_at, last_seen_at)
 VALUES (1, 1, 1, 'opened', 1000, 1000, 1000)",
@@ -143,7 +138,6 @@ fn migration_002_enforces_state_check() {
 [],
 ).unwrap();
-// Invalid state should fail
 let result = conn.execute(
 "INSERT INTO issues (gitlab_id, project_id, iid, state, created_at, updated_at, last_seen_at)
 VALUES (3, 1, 3, 'invalid', 1000, 1000, 1000)",
@@ -158,7 +152,6 @@ fn migration_002_enforces_noteable_type_check() {
 let conn = create_test_db();
 apply_migrations(&conn, 2);
-// Setup: project and issue
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project')",
 [],
@@ -169,14 +162,12 @@ fn migration_002_enforces_noteable_type_check() {
 [],
 ).unwrap();
-// Valid: Issue discussion with issue_id
 conn.execute(
 "INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
 VALUES ('abc123', 1, 1, 'Issue', 1000)",
 [],
 ).unwrap();
-// Invalid: noteable_type not in allowed values
 let result = conn.execute(
 "INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
 VALUES ('def456', 1, 1, 'Commit', 1000)",
@@ -184,7 +175,6 @@ fn migration_002_enforces_noteable_type_check() {
 );
 assert!(result.is_err());
-// Invalid: Issue type but no issue_id
 let result = conn.execute(
 "INSERT INTO discussions (gitlab_discussion_id, project_id, noteable_type, last_seen_at)
 VALUES ('ghi789', 1, 'Issue', 1000)",
@@ -198,7 +188,6 @@ fn migration_002_cascades_on_project_delete() {
 let conn = create_test_db();
 apply_migrations(&conn, 2);
-// Setup: project, issue, label, issue_label link, discussion, note
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project')",
 [],
@@ -229,11 +218,9 @@ fn migration_002_cascades_on_project_delete() {
 [],
 ).unwrap();
-// Delete project
 conn.execute("DELETE FROM projects WHERE id = 1", [])
 .unwrap();
-// Verify cascade: all related data should be gone
 let issue_count: i64 = conn
 .query_row("SELECT COUNT(*) FROM issues", [], |r| r.get(0))
 .unwrap();
@@ -265,8 +252,6 @@ fn migration_002_updates_schema_version() {
 assert_eq!(version, 2);
 }
-// === Migration 005 Tests ===
 #[test]
 fn migration_005_creates_milestones_table() {
 let conn = create_test_db();
@@ -331,7 +316,6 @@ fn migration_005_milestones_cascade_on_project_delete() {
 let conn = create_test_db();
 apply_migrations(&conn, 5);
-// Setup: project with milestone
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project')",
 [],
@@ -341,11 +325,9 @@ fn migration_005_milestones_cascade_on_project_delete() {
 [],
 ).unwrap();
-// Delete project
 conn.execute("DELETE FROM projects WHERE id = 1", [])
 .unwrap();
-// Verify milestone is gone
 let count: i64 = conn
 .query_row("SELECT COUNT(*) FROM milestones", [], |r| r.get(0))
 .unwrap();
@@ -357,7 +339,6 @@ fn migration_005_assignees_cascade_on_issue_delete() {
 let conn = create_test_db();
 apply_migrations(&conn, 5);
-// Setup: project, issue, assignee
 conn.execute(
 "INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project')",
 [],
@@ -373,10 +354,8 @@ fn migration_005_assignees_cascade_on_issue_delete() {
 )
 .unwrap();
-// Delete issue
 conn.execute("DELETE FROM issues WHERE id = 1", []).unwrap();
-// Verify assignee link is gone
 let count: i64 = conn
 .query_row("SELECT COUNT(*) FROM issue_assignees", [], |r| r.get(0))
 .unwrap();

View File

@@ -1,5 +1,3 @@
-//! Tests for MR discussion transformer.
 use lore::gitlab::transformers::discussion::transform_mr_discussion;
 use lore::gitlab::types::{GitLabAuthor, GitLabDiscussion, GitLabNote};
@@ -77,7 +75,7 @@ fn transform_mr_discussion_computes_resolvable_from_notes() {
 let result = transform_mr_discussion(&discussion, 100, 42);
 assert!(result.resolvable);
-assert!(!result.resolved); // resolvable but not resolved
+assert!(!result.resolved);
 }
 #[test]

View File

@@ -1,5 +1,3 @@
-//! Tests for MR transformer module.
 use lore::gitlab::transformers::merge_request::transform_merge_request;
 use lore::gitlab::types::{GitLabAuthor, GitLabMergeRequest, GitLabReferences, GitLabReviewer};
@@ -63,7 +61,7 @@ fn transforms_mr_with_all_fields() {
 assert_eq!(result.merge_request.gitlab_id, 12345);
 assert_eq!(result.merge_request.iid, 42);
-assert_eq!(result.merge_request.project_id, 200); // Local project ID, not GitLab's
+assert_eq!(result.merge_request.project_id, 200);
 assert_eq!(result.merge_request.title, "Add user authentication");
 assert_eq!(
 result.merge_request.description,
@@ -105,22 +103,17 @@ fn parses_timestamps_to_ms_epoch() {
 let mr = make_test_mr();
 let result = transform_merge_request(&mr, 200).unwrap();
-// 2024-01-15T10:00:00.000Z = 1705312800000 ms
 assert_eq!(result.merge_request.created_at, 1705312800000);
-// 2024-01-20T14:30:00.000Z = 1705761000000 ms
 assert_eq!(result.merge_request.updated_at, 1705761000000);
-// merged_at should also be parsed
 assert_eq!(result.merge_request.merged_at, Some(1705761000000));
 }
 #[test]
 fn handles_timezone_offset_timestamps() {
 let mut mr = make_test_mr();
-// GitLab can return timestamps with timezone offset
 mr.created_at = "2024-01-15T05:00:00-05:00".to_string();
 let result = transform_merge_request(&mr, 200).unwrap();
-// 05:00 EST = 10:00 UTC = same as original test
 assert_eq!(result.merge_request.created_at, 1705312800000);
 }
@@ -322,7 +315,6 @@ fn handles_closed_at_timestamp() {
 let result = transform_merge_request(&mr, 200).unwrap();
 assert!(result.merge_request.merged_at.is_none());
-// 2024-01-18T12:00:00.000Z = 1705579200000 ms
 assert_eq!(result.merge_request.closed_at, Some(1705579200000));
 }