gitlore

Author	SHA1	Message	Date
teernisse	bf977eca1a	refactor(structure): reorganize codebase into domain-focused modules	2026-03-06 15:24:09 -05:00
teernisse	9107a78b57	perf(ingestion): replace per-row INSERT loops with chunked batch INSERTs The issue and MR ingestion paths previously inserted labels, assignees, and reviewers one row at a time inside a transaction. For entities with many labels or assignees, this issued N separate SQLite statements where a single multi-row INSERT suffices. Replace the per-row loops with batch INSERT functions that build a single `INSERT OR IGNORE ... VALUES (?1,?2),(?1,?3),...` statement per chunk. Chunks are capped at 400 rows (BATCH_LINK_ROWS_MAX) to stay comfortably below SQLite's default 999 bind-parameter limit. Affected paths: - issues.rs: link_issue_labels_batch_tx, insert_issue_assignees_batch_tx - merge_requests.rs: insert_mr_labels_batch_tx, insert_mr_assignees_batch_tx, insert_mr_reviewers_batch_tx New tests verify deduplication (OR IGNORE), multi-chunk correctness, and equivalence with the old per-row approach. A perf benchmark (bench_issue_assignee_insert_individual_vs_batch) demonstrates the speedup across representative assignee set sizes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 13:36:26 -05:00
teernisse	9ec1344945	feat(surgical-sync): add per-IID surgical sync pipeline with preflight validation Add the ability to sync specific issues or merge requests by IID without running a full incremental sync. This enables fast, targeted data refresh for individual entities, useful for agent workflows, debugging, and real-time investigation of specific issues or MRs. Architecture: - New CLI flags: --issue <IID> and --mr <IID> (repeatable, up to 100 total) scoped to a single project via -p/--project - Preflight phase validates all IIDs exist on GitLab before any DB writes, with TOCTOU-aware soft verification at ingest time - 6-stage pipeline: preflight -> fetch -> ingest -> dependents -> docs -> embed - Each stage is cancellation-aware via ShutdownSignal - Dedicated SyncRunRecorder extensions track surgical-specific counters (issues_fetched, mrs_ingested, docs_regenerated, etc.) New modules: - src/ingestion/surgical.rs: Core surgical fetch/ingest/dependent logic with preflight_fetch(), ingest_issue_by_iid(), ingest_mr_by_iid(), and fetch_dependents_for_{issue,mr}() - src/cli/commands/sync_surgical.rs: Full CLI orchestrator with progress spinners, human/robot output, and cancellation handling - src/embedding/pipeline.rs: embed_documents_by_ids() for scoped embedding - src/documents/regenerator.rs: regenerate_dirty_documents_for_sources() for scoped document regeneration Database changes: - Migration 027: Extends sync_runs with mode, phase, surgical_iids_json, per-entity counters, and cancelled_at column - New indexes: idx_sync_runs_mode_started, idx_sync_runs_status_phase_started GitLab client: - get_issue_by_iid() and get_mr_by_iid() single-entity fetch methods Error handling: - New SurgicalPreflightFailed error variant with entity_type, iid, project, and reason fields. Shares exit code 6 with GitLabNotFound. Includes comprehensive test coverage: - 645 lines of surgical ingestion tests (wiremock-based) - 184 lines of scoped embedding tests - 85 lines of scoped regeneration tests - 113 lines of GitLab client single-entity tests - 236 lines of sync_run surgical column/counter tests - Unit tests for SyncOptions, error codes, and CLI validation	2026-02-18 16:28:21 -05:00
teernisse	eef73decb5	fix(cli): timeline tag width, test env isolation, and logging verbosity Miscellaneous fixes across CLI and core modules: - Timeline: widen TAG_WIDTH from 10 to 11 to accommodate longer event type labels without truncation - render.rs: save and restore LORE_ICONS env var in glyph_mode test to prevent interference from the test environment leaking into or from other tests that set LORE_ICONS - logging.rs: adjust verbose=1 to info level (was debug), verbose=2 to debug — this reduces noise at -v while keeping -vv as the full debug experience - issues.rs, merge_requests.rs: use infodebug! macro consistently for ingestion summary logging Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-14 11:25:42 -05:00
Taylor Eernisse	c6a5461d41	refactor(ingestion): compact log summaries and quieter shutdown messages Migrate all ingestion completion logs to use nonzero_summary() for compact, zero-suppressed output. Before: 8-14 individual key=value structured fields per completion message. After: a single summary field like '42 fetched · 3 labels · 12 notes' that only shows non-zero counters. Also downgrade all 'Shutdown requested...' messages from info! to debug!. These are emitted on every Ctrl+C and add noise to the partial results output that immediately follows. They remain visible at -vv for debugging graceful shutdown behavior. Affected modules: - issues.rs: issue ingestion completion - merge_requests.rs: MR ingestion completion, full-sync cursor reset - mr_discussions.rs: discussion ingestion completion - orchestrator.rs: project-level issue and MR completion summaries, all shutdown-requested checkpoints across discussion sync, resource events drain, closes-issues drain, and MR diffs drain Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 22:31:57 -05:00
Taylor Eernisse	7e0e6a91f2	refactor: extract unit tests into separate _tests.rs files Move inline #[cfg(test)] mod tests { ... } blocks from 22 source files into dedicated _tests.rs companion files, wired via: #[cfg(test)] #[path = "module_tests.rs"] mod tests; This keeps implementation-focused source files leaner and more scannable while preserving full access to private items through `use super::*;`. Modules extracted: core: db, note_parser, payloads, project, references, sync_run, timeline_collect, timeline_expand, timeline_seed cli: list (55 tests), who (75 tests) documents: extractor (43 tests), regenerator embedding: change_detector, chunking gitlab: graphql (wiremock async tests), transformers/issue ingestion: dirty_tracker, discussions, issues, mr_diffs Also adds conflicts_with("explain_score") to the --detail flag in the who command to prevent mutually exclusive flags from being combined. All 629 unit tests pass. No behavior changes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-13 10:54:02 -05:00
Taylor Eernisse	dfa44e5bcd	fix(ingestion): label upsert reliability, init idempotency, and sync health Label upsert (issues + merge_requests): Replace INSERT ... ON CONFLICT DO UPDATE RETURNING with INSERT OR IGNORE + SELECT. The prior RETURNING-based approach relied on last_insert_rowid() matching the returned id, which is not guaranteed when ON CONFLICT triggers an update (SQLite may return 0). The new two-step approach is unambiguous and correctly tracks created_count. Init: Add ON CONFLICT(gitlab_project_id) DO UPDATE to the project insert so re-running `lore init` updates path/branch/url instead of failing with a unique constraint violation. MR discussions sync: Reset discussions_sync_attempts to 0 when clearing a sync health error, so previously-failed MRs get a fresh retry budget after successful sync. Count: format_number now handles negative numbers correctly by extracting the sign before inserting thousand-separators. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-09 10:15:53 -05:00
Taylor Eernisse	d3306114eb	fix(ingestion): pass ShutdownSignal into issue and MR pagination loops The orchestrator already accepted a ShutdownSignal but only checked it between phases (after all issues fetched, before discussions). The inner loops in ingest_issues() and ingest_merge_requests() consumed entire paginated streams without checking for cancellation. On a large initial sync (thousands of issues/MRs), Ctrl+C could be unresponsive for minutes while the current entity type finished draining. Now both functions accept &ShutdownSignal and check is_cancelled() at the top of each iteration, breaking out promptly and committing the cursor for whatever was already processed. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-08 07:55:36 -05:00
Taylor Eernisse	65583ed5d6	refactor: Remove redundant doc comments throughout codebase Removes module-level doc comments (//! lines) and excessive inline doc comments that were duplicating information already evident from: - Function/struct names (self-documenting code) - Type signatures (the what is clear from types) - Implementation context (the how is clear from code) Affected modules: - cli/* - Removed command descriptions duplicating clap help text - core/* - Removed module headers and obvious function docs - documents/* - Removed extractor/regenerator/truncation docs - embedding/* - Removed pipeline and chunking docs - gitlab/* - Removed client and transformer docs (kept type definitions) - ingestion/* - Removed orchestrator and ingestion docs - search/* - Removed FTS and vector search docs Philosophy: Code should be self-documenting. Comments should explain "why" (business decisions, non-obvious constraints) not "what" (which the code itself shows). This change reduces noise and maintenance burden while keeping the codebase just as understandable. Retains comments for: - Non-obvious business logic - Important safety invariants - Complex algorithm explanations - Public API boundaries where generated docs matter Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 00:04:32 -05:00
Taylor Eernisse	ee5c5f9645	perf: Eliminate double serialization, add SQLite tuning, optimize hot paths 11 isomorphic performance fixes from deep audit (no behavior changes): - Eliminate double serialization: store_payload now accepts pre-serialized bytes (&[u8]) instead of re-serializing from serde_json::Value. Uses Cow<[u8]> for zero-copy when compression is disabled. - Add SQLite cache_size (64MB) and mmap_size (256MB) pragmas - Replace SELECT-then-INSERT label upserts with INSERT...ON CONFLICT RETURNING in both issues.rs and merge_requests.rs - Replace INSERT + SELECT milestone upsert with RETURNING - Use prepare_cached for 5 hot-path queries in extractor.rs - Optimize compute_list_hash: index-sort + incremental SHA-256 instead of clone+sort+join+hash - Pre-allocate embedding float-to-bytes buffer with Vec::with_capacity - Replace RandomState::new() in rand_jitter with atomic counter XOR nanos - Remove redundant per-note payload storage (discussion payload contains all notes already) - Change transform_issue to accept &GitLabIssue (avoids full struct clone) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 08:12:37 -05:00
Taylor Eernisse	a50fc78823	style: Apply cargo fmt and clippy fixes across codebase Automated formatting and lint corrections from parallel agent work: - cargo fmt: import reordering (alphabetical), line wrapping to respect max width, trailing comma normalization, destructuring alignment, function signature reformatting, match arm formatting - clippy (pedantic): Range::contains() instead of manual comparisons, i64::from() instead of `as i64` casts, .clamp() instead of .max().min() chains, let-chain refactors (if-let with &&), #[allow(clippy::too_many_arguments)] and #[allow(clippy::field_reassign_with_default)] where warranted - Removed trailing blank lines and extra whitespace No behavioral changes. All existing tests pass unmodified. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-03 13:01:59 -05:00
Taylor Eernisse	559f0702ad	feat(ingestion): Mark entities dirty on ingest for document regeneration Integrates the dirty tracking system into all four ingestion paths (issues, MRs, issue discussions, MR discussions). After each entity is upserted within its transaction, a corresponding dirty_queue entry is inserted so the document regenerator knows which documents need rebuilding. This ensures that document generation stays transactionally consistent with data changes: if the ingest transaction rolls back, the dirty marker rolls back too, preventing stale document regeneration attempts. Also updates GiError references to LoreError in these files as part of the codebase-wide rename, and adjusts issue discussion logging from info to debug level to reduce noise during normal sync runs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 15:46:51 -05:00
Taylor Eernisse	cd44e516e3	feat(ingestion): Implement MR sync with parallel discussion prefetch Adds complete merge request ingestion pipeline with a novel two-phase discussion sync strategy optimized for throughput. New modules: - merge_requests.rs: MR upsert with labels/assignees/reviewers handling, stale MR cleanup, and watermark-based incremental sync - mr_discussions.rs: Parallel prefetch strategy for MR discussions Two-phase MR discussion sync: 1. PREFETCH PHASE: Spawn concurrent tasks to fetch discussions for multiple MRs simultaneously (configurable concurrency, default 8). Transform and validate in parallel, storing results in memory. 2. WRITE PHASE: Serial database writes to avoid lock contention. Each MR's discussions written in a single transaction, with proper stale discussion cleanup. This approach achieves ~4-8x throughput vs serial fetching while maintaining database consistency. Transform errors are tracked per-MR to prevent partial writes from corrupting watermarks. Orchestrator updates: - ingest_merge_requests(): Coordinates MR fetch -> discussion sync flow - Progress callbacks emit MR-specific events for UI feedback - Respects --full flag to reset discussion watermarks for full resync The prefetch strategy is critical for MRs which typically have more discussions than issues, and where API latency dominates sync time. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 22:45:48 -05:00
Taylor Eernisse	cd60350c6d	feat(ingestion): Implement cursor-based incremental sync from GitLab Provides efficient data synchronization with minimal API calls. src/ingestion/issues.rs - Issue sync logic: - Cursor-based incremental sync using updated_at timestamp - Fetches only issues modified since last sync - Configurable cursor rewind for overlap safety (default 2s) - Batched database writes with transaction wrapping - Upserts issues, labels, milestones, and assignees - Maintains issue_labels and issue_assignees junction tables - Returns IngestIssuesResult with counts and issues needing discussion sync - Identifies issues where discussion count changed src/ingestion/discussions.rs - Discussion sync logic: - Fetches discussions for issues that need sync - Compares discussion count vs stored to detect changes - Batched note insertion with raw payload preservation - Updates discussion metadata (resolved state, note counts) - Tracks sync state per discussion to enable incremental updates - Returns IngestDiscussionsResult with fetched/skipped counts src/ingestion/orchestrator.rs - Sync coordination: - Two-phase sync: issues first, then discussions - Progress callback support for CLI progress bars - ProgressEvent enum for fine-grained status updates: - IssueFetch, IssueProcess, DiscussionFetch, DiscussionSkip - Acquires sync lock before starting - Updates sync watermark on successful completion - Handles partial failures gracefully (watermark not updated) - Returns IngestProjectResult with detailed statistics The architecture supports future additions: - Merge request ingestion (parallel to issues) - Full-text search indexing hooks - Vector embedding pipeline integration Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-26 11:28:34 -05:00

14 Commits
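
Commit 9107a78b57 describes building one `INSERT OR IGNORE ... VALUES (?1,?2),(?1,?3),...` statement per chunk, capped at 400 rows. A minimal sketch of just the statement-building step might look like the following. The function name `batch_insert_sql`, its signature, and the table and column names are assumptions for illustration and are not taken from the repository; only the constant name and the 400-row cap come from the commit message.

```rust
/// Row cap per statement, as named in the commit message.
const BATCH_LINK_ROWS_MAX: usize = 400;

/// Hypothetical helper: returns one `INSERT OR IGNORE` statement per chunk.
/// `?1` is the shared parent id; `?2..` are the child ids within that chunk,
/// so a full chunk binds 401 parameters, below SQLite's default limit of 999.
fn batch_insert_sql(
    table: &str,
    parent_col: &str,
    child_col: &str,
    child_count: usize,
) -> Vec<String> {
    let mut statements = Vec::new();
    let mut remaining = child_count;
    while remaining > 0 {
        let rows = remaining.min(BATCH_LINK_ROWS_MAX);
        // Placeholders restart at ?2 for every chunk.
        let values: Vec<String> = (0..rows).map(|i| format!("(?1,?{})", i + 2)).collect();
        statements.push(format!(
            "INSERT OR IGNORE INTO {} ({}, {}) VALUES {}",
            table,
            parent_col,
            child_col,
            values.join(",")
        ));
        remaining -= rows;
    }
    statements
}

fn main() {
    // Table and column names here are illustrative only.
    for sql in batch_insert_sql("issue_labels", "issue_id", "label_id", 3) {
        println!("{sql}");
    }
}
```

Restarting the placeholder numbering in each chunk keeps the parameter count bounded by the chunk size rather than by the total number of rows.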
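
Commit dfa44e5bcd notes that `format_number` handles negative numbers by extracting the sign before inserting thousand-separators. One way that idea could be sketched is shown below. The signature, the choice of `i64`, and the comma as separator are assumptions; the repository's actual implementation is not shown in the log.

```rust
/// Sketch: format an integer with thousand-separators (comma assumed).
fn format_number(n: i64) -> String {
    // Take the sign off first so the grouping loop only ever sees digits.
    let sign = if n < 0 { "-" } else { "" };
    // unsigned_abs avoids overflow on i64::MIN.
    let digits = n.unsigned_abs().to_string();
    let mut out = String::with_capacity(sign.len() + digits.len() + digits.len() / 3);
    out.push_str(sign);
    for (i, ch) in digits.chars().enumerate() {
        // Insert a separator before each complete group of three remaining digits.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

fn main() {
    println!("{}", format_number(-1234567));
}
```

Grouping the digits without the sign avoids results such as `-,999`, where the minus sign is counted as part of a three-character group.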
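
Commit c6a5461d41 replaces 8-14 key=value log fields with a single zero-suppressed summary such as '42 fetched · 3 labels · 12 notes'. A rough sketch of that behavior follows. Only the name `nonzero_summary` and the output format come from the commit message; the parameter type and the counter labels are assumptions.

```rust
/// Sketch: join "<count> <label>" pairs with " · ", skipping zero counters.
/// The slice-of-pairs signature is assumed, not taken from the repository.
fn nonzero_summary(counters: &[(&str, usize)]) -> String {
    counters
        .iter()
        .filter(|(_, count)| *count > 0)
        .map(|(label, count)| format!("{count} {label}"))
        .collect::<Vec<_>>()
        .join(" · ")
}

fn main() {
    let summary = nonzero_summary(&[("fetched", 42), ("labels", 3), ("skipped", 0), ("notes", 12)]);
    println!("{summary}");
}
```

In this sketch an all-zero input yields an empty string; whether the real function emits a placeholder in that case is not stated in the log.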