Two bug fixes:
1. extractor.rs: The content hash was computed on the pre-truncation
content, meaning the hash stored in the document didn't correspond
to the actual stored (truncated) content. This would cause change
detection to miss updates when content changed only within the
truncated portion. Hash is now computed after truncate_hard_cap()
so it always matches the persisted content.
2. dependent_queue.rs: claim_jobs() had a TOCTOU race between the
SELECT that found available jobs and the UPDATE that locked them.
Under concurrent callers, two drain runs could claim the same job.
Replaced with a single UPDATE ... RETURNING statement that
atomically selects and locks jobs in one operation.
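The ordering in fix 1 can be sketched as follows. This is a minimal illustration, not the actual extractor.rs code: the signature given to truncate_hard_cap(), the content_hash() and prepare_document() helpers, and the use of std's DefaultHasher are all assumptions, since only the truncate_hard_cap name appears in this log.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Cut `content` to at most `max_bytes`, backing up to a UTF-8 char boundary.
/// (Assumed signature; only the function name appears in the commit message.)
fn truncate_hard_cap(content: &str, max_bytes: usize) -> &str {
    if content.len() <= max_bytes {
        return content;
    }
    let mut end = max_bytes;
    while !content.is_char_boundary(end) {
        end -= 1;
    }
    &content[..end]
}

/// Stand-in hash for illustration only (hypothetical helper).
fn content_hash(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Truncate first, then hash exactly the bytes that will be persisted.
fn prepare_document(raw: &str, max_bytes: usize) -> (String, u64) {
    let stored = truncate_hard_cap(raw, max_bytes).to_string();
    let hash = content_hash(&stored);
    (stored, hash)
}
```

With this ordering, two inputs that differ only beyond the cap produce the same stored content and the same hash.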
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Three bugs fixed:
1. Early return in orchestrator when no discussions needed sync also
skipped resource event enqueue+drain. On incremental syncs (the most
common case), resource events were never fetched. Restructured to use
if/else instead of early return so Step 4 always executes.
2. Ingest command JSON and human-readable output silently dropped
resource_events_fetched/failed counts. Added to IngestJsonData and
print_ingest_summary.
3. Progress bar reuse after finish_and_clear caused indicatif to silently
ignore subsequent set_position/set_length calls. Added reset() call
before reconfiguring the bar for resource events.
Also removed stale comment referencing "unsafe" that didn't reflect
the actual unchecked_transaction approach.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrate resource event fetching as Step 4 of both issue and MR
ingestion, gated behind the fetch_resource_events config flag.
Orchestrator changes:
- Add ProgressEvent variants: ResourceEventsFetchStarted,
ResourceEventFetched, ResourceEventsFetchComplete
- Add resource_events_fetched/failed fields to IngestProjectResult
and IngestMrProjectResult
- New enqueue_resource_events_for_entity_type() queries all
issues/MRs for a project and enqueues resource_events jobs via
the dependent queue (INSERT OR IGNORE for idempotency)
- New drain_resource_events() claims jobs in batches, fetches
state/label/milestone events from GitLab API, stores them
atomically via unchecked_transaction, and handles failures
with exponential backoff via fail_job()
- Max-iterations guard prevents infinite retry loops within a
single drain run
- New store_resource_events() + per-type _tx helpers write events
using prepared statements inside a single transaction
- DrainResult struct tracks fetched/failed counts
CLI ingest changes:
- IngestResult gains resource_events_fetched/failed fields
- Progress bar repurposed for resource event fetch phase
(reuses discussion bar with updated template)
- Accumulates event counts from both issue and MR ingestion
CLI sync changes:
- SyncResult gains resource_events_fetched/failed fields
- Accumulates counts from both ingest stages
- print_sync() conditionally displays event counts
- Structured logging includes event counts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Automated formatting and lint corrections from parallel agent work:
- cargo fmt: import reordering (alphabetical), line wrapping to respect
max width, trailing comma normalization, destructuring alignment,
function signature reformatting, match arm formatting
- clippy (pedantic): Range::contains() instead of manual comparisons,
i64::from() instead of `as i64` casts, .clamp() instead of
.max().min() chains, let-chain refactors (if-let with &&),
#[allow(clippy::too_many_arguments)] and
#[allow(clippy::field_reassign_with_default)] where warranted
- Removed trailing blank lines and extra whitespace
No behavioral changes. All existing tests pass unmodified.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds two new categories of integrity checks to 'lore stats --check':
Event FK integrity (3 queries):
- Detects orphaned resource_state_events where issue_id or
merge_request_id points to a non-existent parent entity
- Same check for resource_label_events and resource_milestone_events
- Under normal CASCADE operation these should always be zero; non-zero
indicates manual DB edits, bugs, or partial migration state
Queue health diagnostics:
- pending_dependent_fetches counts: pending, failed, and stuck (locked)
- queue_stuck_locks: Jobs with locked_at set (potential worker crashes)
- queue_max_attempts: Highest retry count across all jobs (signals
permanently failing jobs when > 3)
New IntegrityResult fields: orphan_state_events, orphan_label_events,
orphan_milestone_events, queue_stuck_locks, queue_max_attempts.
New QueueStats fields: pending_dependent_fetches,
pending_dependent_fetches_failed, pending_dependent_fetches_stuck.
Human output shows colored PASS/WARN/FAIL indicators:
- Red "!" for orphaned events (integrity failure)
- Yellow "!" for stuck locks and high retry counts (warnings)
- Dependent fetch queue line only shown when non-zero
All new queries are guarded by table_exists() checks for graceful
degradation on databases without migration 011 applied.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends the count command to support "events" as an entity type,
displaying resource event counts broken down by event type (state,
label, milestone) and entity type (issue, merge request).
New functions in count.rs:
- run_count_events: Creates DB connection and delegates to
events_db::count_events for the actual queries
- print_event_count: Human-readable table with aligned columns
showing per-type breakdowns and row/column totals
- print_event_count_json: Structured JSON matching the robot mode
contract with ok/data envelope and per-type issue/mr/total counts
JSON output structure:
{"ok":true,"data":{"state_events":{"issue":N,"merge_request":N,
"total":N},"label_events":{...},"milestone_events":{...},"total":N}}
Updated exports in commands/mod.rs to expose the three new public
functions (run_count_events, print_event_count, print_event_count_json).
The "events" branch in handle_count (main.rs, committed earlier)
routes to these functions before the existing entity type dispatcher.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New module src/core/dependent_queue.rs provides job queue operations
against the pending_dependent_fetches table. Designed for second-pass
fetches that depend on primary entity ingestion (resource events,
MR close references, MR file diffs).
Queue operations:
- enqueue_job: Idempotent INSERT OR IGNORE keyed on the UNIQUE
(project_id, entity_type, entity_iid, job_type) constraint.
Returns bool indicating whether the row was actually inserted.
- claim_jobs: Two-phase claim — SELECT available jobs (unlocked,
past retry window) then UPDATE locked_at in batch. Orders by
enqueued_at ASC for FIFO processing within a job type.
- complete_job: DELETE the row on successful processing.
- fail_job: Increments attempts, calculates exponential backoff
(30s * 2^(attempts-1), capped at 480s), sets next_retry_at,
clears locked_at, and records the error message. Reads current
attempts via query with unwrap_or(0) fallback for robustness.
- reclaim_stale_locks: Clears locked_at on jobs locked longer than
a configurable threshold, recovering from worker crashes.
- count_pending_jobs: GROUP BY job_type aggregation for progress
reporting and stats display.
Registers both events_db and dependent_queue in src/core/mod.rs.
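The backoff schedule described for fail_job can be sketched as a pure function. The function and constant names below are hypothetical; only the formula (30s * 2^(attempts-1), capped at 480s) comes from the description above, and the treatment of attempts = 0 is an assumption.

```rust
const BASE_DELAY_SECS: u64 = 30;
const MAX_DELAY_SECS: u64 = 480;

/// Delay before the next retry, given the attempt count after incrementing.
/// attempts = 1 -> 30s, 2 -> 60s, 3 -> 120s, 4 -> 240s, 5 and above -> 480s.
fn backoff_secs(attempts: u32) -> u64 {
    // Treat 0 like 1 (assumption) and bound the exponent so the shift
    // cannot overflow before the cap is applied.
    let exponent = attempts.saturating_sub(1).min(16);
    BASE_DELAY_SECS
        .saturating_mul(1u64 << exponent)
        .min(MAX_DELAY_SECS)
}
```

next_retry_at would then be the current time plus this delay.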
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New module src/core/events_db.rs provides database operations for
resource events:
- upsert_state_events: Batch INSERT OR REPLACE for state change events,
keyed on UNIQUE(gitlab_id, project_id). Wraps in a savepoint for
atomicity per entity batch. Maps GitLabStateEvent fields including
optional user, source_commit, and source_merge_request_iid.
- upsert_label_events: Same pattern for label add/remove events,
extracting label.name for denormalized storage.
- upsert_milestone_events: Same pattern for milestone assignment events,
storing both milestone.title and milestone.id.
All three upsert functions:
- Take &mut Connection (required for savepoint creation)
- Use prepare_cached for statement reuse across batch iterations
- Convert ISO timestamps via iso_to_ms_strict for ms-epoch storage
- Propagate rusqlite errors via the #[from] LoreError::Database path
- Return the count of events processed
Supporting functions:
- resolve_entity_ids: Maps entity_type string to (issue_id, MR_id) pair
with exactly-one-non-NULL invariant matching the CHECK constraints
- count_events: Queries all three event tables with conditional COUNT
aggregations, returning EventCounts struct. Uses unwrap_or((0, 0))
for graceful degradation when tables don't exist (pre-migration 011).
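The exactly-one-non-NULL invariant behind resolve_entity_ids can be sketched like this. The signature, the String error type, and the local_id parameter are assumptions for illustration; the real function returns through the project's own error type.

```rust
/// Map an entity type to an (issue_id, merge_request_id) pair in which
/// exactly one side is Some, mirroring the CHECK constraint on the
/// event tables. (Assumed signature.)
fn resolve_entity_ids(
    entity_type: &str,
    local_id: i64,
) -> Result<(Option<i64>, Option<i64>), String> {
    match entity_type {
        "issue" => Ok((Some(local_id), None)),
        "merge_request" => Ok((None, Some(local_id))),
        other => Err(format!("unknown entity type: {other}")),
    }
}
```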
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends GitLabClient with methods for fetching resource events from
GitLab's per-entity API endpoints. Adds a new impl block containing:
- fetch_all_pages<T>: Generic paginated collector that handles
x-next-page header parsing with fallback to page-size heuristics.
Uses per_page=100 and respects the existing rate limiter via
request_with_headers. Terminates when: (a) x-next-page header is
absent/stale, (b) response is empty, or (c) page is not full.
- Six typed endpoint methods:
- fetch_issue_state_events / fetch_mr_state_events
- fetch_issue_label_events / fetch_mr_label_events
- fetch_issue_milestone_events / fetch_mr_milestone_events
- fetch_all_resource_events: Convenience method that fetches all three
event types for an entity (issue or merge_request) in sequence,
returning a tuple of (state, label, milestone) event vectors.
Routes to issue or MR endpoints based on entity_type string.
All methods follow the existing client patterns: path formatting with
gitlab_project_id and iid, error propagation via Result, and rate
limiter integration through the shared request_with_headers path.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a new boolean field to SyncConfig that controls whether resource
event fetching is performed during sync:
- SyncConfig.fetch_resource_events: defaults to true via serde
default_true helper, serialized as "fetchResourceEvents" in JSON
- SyncArgs.no_events: --no-events CLI flag that overrides the config
value to false when present
- SyncOptions.no_events: propagates the flag through the sync pipeline
- handle_sync_cmd: mutates loaded config when --no-events is set,
ensuring the flag takes effect regardless of config file contents
This follows the existing pattern established by --no-embed and
--no-docs flags, where CLI flags override config file defaults.
The config is loaded as mutable specifically to support this override.
Also adds "events" to the count command's entity type value_parser,
enabling `lore count events` (implementation in a separate commit).
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds six new types for deserializing responses from GitLab's three
Resource Events API endpoints (state, label, milestone):
- GitLabStateEvent: State transitions with optional user, source_commit,
and source_merge_request reference
- GitLabLabelEvent: Label add/remove events with nested GitLabLabelRef
- GitLabMilestoneEvent: Milestone assignment changes with nested
GitLabMilestoneRef
- GitLabMergeRequestRef: Lightweight MR reference (iid, title, web_url)
- GitLabLabelRef: Label metadata (id, name, color, description)
- GitLabMilestoneRef: Milestone metadata (id, iid, title)
All types derive Deserialize + Serialize and use Option<T> for nullable
fields (user, source_commit, color, description) to match GitLab's API
contract where these fields may be null.
Includes 8 new test cases covering:
- State events with/without user, with/without source_merge_request
- Label events for add and remove actions, including null color handling
- Milestone event deserialization
- Standalone ref type deserialization (MR, label, milestone)
Uses r##"..."## raw string delimiters where JSON contains hex color
codes (#FF0000) that would conflict with r#"..."# delimiters.
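A small illustration of the delimiter issue. The fixture and the color_value() helper are hypothetical (the real tests deserialize into the types above); the point is only that a quote followed by a hash inside the JSON would end an r#"..."# literal early.

```rust
/// JSON fixture with a hex color. The `"#` at the start of `"#FF0000"`
/// would close an `r#"..."#` literal, so two hashes delimit the string.
const LABEL_JSON: &str =
    r##"{"id": 1, "name": "bug", "color": "#FF0000", "description": null}"##;

/// Pull out the raw value of the "color" field with plain string search.
/// (Illustration only; not a JSON parser.)
fn color_value(json: &str) -> Option<&str> {
    let key = "\"color\": \"";
    let start = json.find(key)? + key.len();
    let end = json[start..].find('"')? + start;
    Some(&json[start..end])
}
```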
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces five new tables that power temporal queries (timeline,
file-history, trace) via GitLab Resource Events APIs:
- resource_state_events: State transitions (opened/closed/reopened/merged/locked)
with actor tracking, source commit, and source MR references
- resource_label_events: Label add/remove history per entity
- resource_milestone_events: Milestone assignment changes per entity
- entity_references: Cross-reference table (Gate 2 prep) linking
source/target entity pairs with reference type and discovery method
- pending_dependent_fetches: Generic job queue for resource_events,
mr_closes_issues, and mr_diffs with exponential backoff retry
All event tables enforce entity exclusivity via CHECK constraints
(exactly one of issue_id or merge_request_id must be non-NULL).
Deduplication handled via UNIQUE indexes on (gitlab_id, project_id).
FK cascades ensure cleanup when parent entities are removed.
The dependent fetch queue uses a UNIQUE constraint on
(project_id, entity_type, entity_iid, job_type) for idempotent
enqueue, with partial indexes optimizing claim and retry queries.
Registered as migration 011 in the embedded MIGRATIONS array in db.rs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --full / --no-full flag pair to EmbedArgs with overrides_with
semantics matching the existing flag pattern. When active, atomically
DELETEs all embedding_metadata and embeddings before re-embedding.
- Thread the full flag through run_embed -> run_sync so that
'lore sync --full' triggers a complete re-embed alongside the full
re-ingest it already performed.
- Add indicatif spinners to sync stages with dynamic stage numbering
that adjusts when --no-docs or --no-embed skip stages. Spinners are
hidden in robot mode.
- Update robot-docs manifest to advertise the new --full flag on the
embed command.
- Replace hardcoded schema version 9 in health check with the
LATEST_SCHEMA_VERSION constant from db.rs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reduces CHUNK_MAX_BYTES from 32KB to 6KB and CHUNK_OVERLAP_CHARS from
500 to 200 to stay within nomic-embed-text's 8,192-token context
window. This commit addresses all downstream consequences of that
reduction:
- Config drift detection: find_pending_documents and
count_pending_documents now take model_name and compare
chunk_max_bytes, model, and dims against stored metadata. Documents
embedded with stale config are automatically re-queued.
- Overflow guard: documents producing >= CHUNK_ROWID_MULTIPLIER chunks
are skipped with a sentinel error recorded in embedding_metadata,
preventing both rowid collision and infinite re-processing loops.
- Deferred clearing: old embeddings are no longer cleared before
attempting new ones. clear_document_embeddings is deferred until the
first successful chunk embedding, so if all chunks fail the document
retains its previous embeddings rather than losing all data.
- Savepoints: each page of DB writes is wrapped in a SQLite savepoint
so a crash mid-page rolls back atomically instead of leaving partial
state (cleared embeddings with no replacements).
- Per-chunk retry on context overflow: when a batch fails with a
context-length error, each chunk is retried individually so one
oversized chunk doesn't poison the entire batch.
- Adaptive dedup in vector search: replaces the static 3x over-fetch
multiplier with a dynamic one based on actual max chunks per document
(using the new chunk_count column with a fallback COUNT query for
pre-migration data). Also replaces partial_cmp with total_cmp for
f64 distance sorting.
- Stores chunk_max_bytes and chunk_count (on sentinel rows) in
embedding_metadata to support config drift detection and adaptive
dedup without runtime queries.
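The adaptive dedup idea can be sketched as below. The function names and the exact multiplier formula are assumptions; this log only states that the over-fetch factor is derived from the maximum chunk count per document and that distances are sorted with total_cmp.

```rust
use std::collections::HashSet;

/// Number of chunk rows to request so that `limit` distinct documents can
/// survive dedup even if every document contributes its maximum number of
/// chunks. (Assumed formula.)
fn overfetch_k(limit: usize, max_chunks_per_doc: usize) -> usize {
    limit.saturating_mul(max_chunks_per_doc.max(1))
}

/// Keep the closest chunk per document. `hits` holds (doc_id, distance).
fn dedup_by_document(mut hits: Vec<(i64, f64)>, limit: usize) -> Vec<(i64, f64)> {
    // total_cmp gives a total order over f64, so sorting never has to
    // handle the None case that partial_cmp returns for NaN.
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    let mut seen = HashSet::new();
    hits.into_iter()
        .filter(|(doc_id, _)| seen.insert(*doc_id))
        .take(limit)
        .collect()
}
```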
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add chunk_max_bytes and chunk_count columns to embedding_metadata to
support config drift detection and adaptive dedup sizing. Includes a
partial index on sentinel rows (chunk_index=0) to accelerate the drift
detection and max-chunk queries.
Also exports LATEST_SCHEMA_VERSION as a public constant derived from
the MIGRATIONS array length, replacing the previously hardcoded magic
number in the health check.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extend resolve_project() with a 4th cascade step: case-insensitive
substring match when exact, case-insensitive, and suffix matches all
fail. This allows shorthand like "typescript" to match
"vs/typescript-code" when unambiguous. Multi-match still returns an
error with all candidates listed.
Also change ambiguity errors from LoreError::Other to LoreError::Ambiguous
so they get the proper AMBIGUOUS error code (exit 18) instead of
INTERNAL_ERROR.
Includes tests for unambiguous substring, case-insensitive substring,
ambiguous substring, and suffix-preferred-over-substring ordering.
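The four-step cascade can be sketched over an in-memory list of project paths. This is an illustration under assumptions: the real resolve_project() queries the database and returns LoreError, whereas the ResolveError enum, the pick() helper, the slice-based signature, and the case-insensitive handling of the suffix step are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum ResolveError {
    NotFound,
    Ambiguous(Vec<String>),
}

/// None if a step matched nothing; otherwise the single match or an
/// ambiguity error listing every candidate.
fn pick(matches: Vec<&str>) -> Option<Result<String, ResolveError>> {
    match matches.len() {
        0 => None,
        1 => Some(Ok(matches[0].to_string())),
        _ => Some(Err(ResolveError::Ambiguous(
            matches.iter().map(|s| s.to_string()).collect(),
        ))),
    }
}

fn resolve_project(query: &str, paths: &[&str]) -> Result<String, ResolveError> {
    let q = query.to_lowercase();
    let suffix = format!("/{q}");
    // Steps in priority order: exact, case-insensitive exact,
    // suffix after '/', case-insensitive substring.
    let exact: Vec<&str> = paths.iter().copied().filter(|p| *p == query).collect();
    let ci: Vec<&str> = paths.iter().copied()
        .filter(|p| p.to_lowercase() == q).collect();
    let suf: Vec<&str> = paths.iter().copied()
        .filter(|p| p.to_lowercase().ends_with(&suffix)).collect();
    let sub: Vec<&str> = paths.iter().copied()
        .filter(|p| p.to_lowercase().contains(&q)).collect();
    for step in [exact, ci, suf, sub] {
        if let Some(result) = pick(step) {
            return result;
        }
    }
    Err(ResolveError::NotFound)
}
```

The first step that matches anything decides the outcome, which is why a suffix match wins over a substring match.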
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
Runtime setup:
- Reset SIGPIPE to SIG_DFL on Unix at the very start of main() so
piping to head/grep doesn't cause a panic.
- Apply --color flag to console::set_colors_enabled() after CLI parse.
- Extract quiet flag and thread it to handle_ingest.
Command dispatch:
- Add Completions match arm using clap_complete::generate().
- Resolve all --no-X negation flags in handlers: asc, has_due, open
(issues/mrs), force/full (ingest/sync), check (stats), explain
(search), retry_failed (embed).
- Auto-enable --check when --repair is used in handle_stats.
- Suppress deprecation warnings in robot mode for List, Show, AuthTest,
and SyncStatus deprecated aliases.
Stubs:
- Change handle_backup/handle_reset from ok:true to structured error
JSON on stderr with exit code 1. Remove unused NotImplementedOutput
and NotImplementedData structs.
Version:
- Include GIT_HASH env var in handle_version output (human and robot).
- Add git_hash field to VersionData with skip_serializing_if for None.
Robot-docs:
- Update exit code table with codes 14-18 (Ollama, NotFound, Ambiguous)
and code 20 (ConfigNotFound). Clarify code 1 and 2 descriptions.
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
Ingest:
- Introduce IngestDisplay struct with show_progress/show_text booleans
to decouple progress bars from text output. Replaces the robot_mode
bool parameter with explicit display control, enabling sync to show
progress without duplicating summary text (progress_only mode).
- Use resolve_project() for --project filtering instead of LIKE queries,
providing proper error messages for ambiguous or missing projects.
List:
- Add colored_cell() helper that checks console::colors_enabled() before
applying comfy-table foreground colors, bridging the gap between the
console and comfy-table crates for --color flag support.
- Use resolve_project() for project filtering (exact ID match).
- Improve since filter to return explicit errors instead of silently
ignoring invalid values.
- Improve format_relative_time for proper singular/plural forms.
Search:
- Validate --after/--updated-after with explicit error messages.
- Handle optional title field (Option<String>) in HydratedRow.
Show:
- Use resolve_project() for project disambiguation.
Sync:
- Thread robot_mode via SyncOptions for IngestDisplay selection.
- Use IngestDisplay::progress_only() in interactive sync mode.
GenerateDocs:
- Use resolve_project() for --project filtering.
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
Global flags:
- --color (auto|always|never) for explicit color control
- --quiet/-q to suppress non-essential output
- Hidden Completions subcommand for bash/zsh/fish/powershell
Flag negation (--no-X) with overrides_with for: has-due, asc, open
(issues/mrs), force/full (ingest/sync), check (stats), explain (search),
retry-failed (embed). Enables scripted flag composition where later flags
override earlier ones.
Validation:
- value_parser on search --mode, --type, --fts-mode for early rejection
- Remove requires="check" from --repair (auto-enabled in handler)
Polish:
- help_heading groups (Filters, Sorting, Output, Actions) on issues,
mrs, and search args for cleaner --help output
- Hide Backup, Reset, and Completions from --help
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
ConfigNotFound previously used exit code 2 which collides with clap's
usage error code. Remap it to exit 20 to avoid ambiguity. Also add
dedicated NotFound (exit 17) and Ambiguous (exit 18) error codes with
proper ErrorCode variants and Display implementations, replacing the
previous incorrect mapping of these errors to GitLabNotFound.
Co-Authored-By: Claude (us.anthropic.claude-opus-4-5-20251101-v1:0) <noreply@anthropic.com>
Extends the CLI with seven new commands that complete the search pipeline:
- lore search <QUERY>: Hybrid search with mode selection (lexical,
hybrid, semantic), rich filtering (--type, --author, --project,
--label, --path, --after, --updated-after), result limits, and
optional explain mode showing RRF score breakdowns. Safe FTS mode
sanitizes user input; raw mode passes through for power users.
- lore stats: Document and index statistics with optional --check
for integrity verification and --repair to fix inconsistencies
(orphaned documents, missing FTS entries, stale dirty queue items).
- lore embed: Generate vector embeddings via Ollama. Supports
--retry-failed to re-attempt previously failed embeddings.
- lore generate-docs: Drain the dirty queue to regenerate documents.
--full seeds all entities for complete rebuild. --project scopes
to a single project.
- lore sync: Full pipeline orchestration (ingest issues + MRs,
generate-docs, embed) with --no-embed and --no-docs flags for
partial runs. Reports per-stage results and total elapsed time.
- lore health: Quick pre-flight check (config exists, DB exists,
schema current). Returns exit code 1 if unhealthy. Designed for
agent pre-flight scripts.
- lore robot-docs: Machine-readable command manifest for agent
self-discovery. Returns all commands, flags, examples, exit codes,
and recommended workflows as structured JSON.
Also enhances lore init with --gitlab-url, --token-env-var, and
--projects flags for fully non-interactive robot-mode initialization.
Fixes init's force/non-interactive precedence logic and adds JSON
output for robot mode.
Updates all command files for the GiError -> LoreError rename.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrates the dirty tracking system into all four ingestion paths
(issues, MRs, issue discussions, MR discussions). After each entity
is upserted within its transaction, a corresponding dirty_queue entry
is inserted so the document regenerator knows which documents need
rebuilding.
This ensures that document generation stays transactionally consistent
with data changes: if the ingest transaction rolls back, the dirty
marker rolls back too, preventing stale document regeneration attempts.
Also updates GiError references to LoreError in these files as part
of the codebase-wide rename, and adjusts issue discussion logging
from info to debug level to reduce noise during normal sync runs.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the search module providing three search modes:
- Lexical (FTS5): Full-text search using SQLite FTS5 with safe query
sanitization. User queries are automatically tokenized and wrapped
in proper FTS5 syntax. Supports a "raw" mode for power users who
want direct FTS5 query syntax (NEAR, column filters, etc.).
- Semantic (vector): Embeds the search query via Ollama, then performs
cosine similarity search against stored document embeddings. Results
are deduplicated by doc_id since documents may have multiple chunks.
- Hybrid (default): Executes both lexical and semantic searches in
parallel, then fuses results using Reciprocal Rank Fusion (RRF) with
k=60. This avoids the complexity of score normalization while
producing high-quality merged rankings. Gracefully degrades to
lexical-only when embeddings are unavailable.
Additional components:
- search::filters: Post-retrieval filtering by source_type, author,
project, labels (AND logic), file path prefix, created_after, and
updated_after. Date filters accept relative formats (7d, 2w) and
ISO dates.
- search::rrf: Reciprocal Rank Fusion implementation with configurable
k parameter and optional explain mode that annotates each result
with its component ranks and fusion score breakdown.
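Reciprocal Rank Fusion itself is compact enough to sketch. The function name and signature are hypothetical and the explain annotations are omitted; the scoring rule is the standard RRF sum of 1 / (k + rank) over every list a document appears in, with ranks starting at 1.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids into one ranking.
/// Returns (doc_id, score) pairs, best first; ties break on doc_id.
fn rrf_fuse(rankings: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for ranking in rankings {
        for (index, doc_id) in ranking.iter().enumerate() {
            let rank = index as f64 + 1.0; // ranks are 1-based
            *scores.entry(*doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because only ranks enter the formula, FTS5 scores and vector distances never need to be put on a common scale. A document present in both lists accumulates two terms and tends to rise to the top.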
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the embedding module that generates vector representations
of documents using a local Ollama instance with the nomic-embed-text
model. These embeddings enable semantic (vector) search and the hybrid
search mode that fuses lexical and semantic results via RRF.
Key components:
- embedding::ollama: HTTP client for the Ollama /api/embeddings
endpoint. Handles connection errors with actionable error messages
(OllamaUnavailable, OllamaModelNotFound) and validates response
dimensions.
- embedding::chunking: Splits long documents into overlapping
paragraph-aware chunks for embedding. Uses a configurable max token
estimate (8192 default for nomic-embed-text) with 10% overlap to
preserve cross-chunk context.
- embedding::chunk_ids: Encodes chunk identity as
doc_id * 1000 + chunk_index for the embeddings table rowid. This
allows vector search to map results back to documents and
deduplicate by doc_id efficiently.
- embedding::change_detector: Compares document content_hash against
stored embedding hashes to skip re-embedding unchanged documents,
making incremental embedding runs fast.
- embedding::pipeline: Orchestrates the full embedding flow: detect
changed documents, chunk them, call Ollama in configurable
concurrency (default 4), store results. Supports --retry-failed
to re-attempt previously failed embeddings.
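The rowid encoding used by embedding::chunk_ids can be sketched as a pair of helpers. The function names are hypothetical, and the constant name CHUNK_ROWID_MULTIPLIER is borrowed from a later commit in this log; the formula doc_id * 1000 + chunk_index is the one stated above. Returning None for an out-of-range chunk index is an assumption about how overflow might be handled.

```rust
const CHUNK_ROWID_MULTIPLIER: i64 = 1000;

/// Pack (doc_id, chunk_index) into one rowid. Returns None when the chunk
/// index would spill into the next document's range or the product overflows.
fn encode_rowid(doc_id: i64, chunk_index: i64) -> Option<i64> {
    if !(0..CHUNK_ROWID_MULTIPLIER).contains(&chunk_index) {
        return None;
    }
    doc_id
        .checked_mul(CHUNK_ROWID_MULTIPLIER)?
        .checked_add(chunk_index)
}

/// Recover (doc_id, chunk_index) from a rowid produced by encode_rowid.
fn decode_rowid(rowid: i64) -> (i64, i64) {
    (rowid / CHUNK_ROWID_MULTIPLIER, rowid % CHUNK_ROWID_MULTIPLIER)
}
```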
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements the documents module that transforms raw ingested entities
(issues, MRs, discussions) into searchable document blobs stored in
the documents table. This is the foundation for both FTS5 lexical
search and vector embedding.
Key components:
- documents::extractor: Renders entities into structured text documents.
Issues include title, description, labels, milestone, assignees, and
threaded discussion summaries. MRs additionally include source/target
branches, reviewers, and approval status. Discussions are rendered
with full note threading.
- documents::regenerator: Drains the dirty_queue table to regenerate
only documents whose source entities changed since last sync. Supports
full rebuild mode (seeds all entities into dirty queue first) and
project-scoped regeneration.
- documents::truncation: Safety cap at 2MB per document to prevent
pathological outliers from degrading FTS or embedding performance.
- ingestion::dirty_tracker: Marks entities as dirty inside the
ingestion transaction so document regeneration stays consistent
with data changes. Uses INSERT OR IGNORE to deduplicate.
- ingestion::discussion_queue: Queue-based discussion fetching that
isolates individual discussion failures from the broader ingestion
pipeline, preventing a single corrupt discussion from blocking
an entire project sync.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two targeted fixes to the GitLab API client:
1. Pagination: When the x-next-page header is missing but the current
page returned a full page of results, heuristically advance to the
next page instead of stopping. This fixes silent data truncation
observed with certain GitLab instances that omit pagination headers
on intermediate pages. The existing early-exit on empty or partial
pages remains as the termination condition.
2. Rate limiter: Refactor the async acquire() method into a synchronous
check_delay() that computes the required sleep duration and updates
last_request time while holding the mutex, then releases the lock
before sleeping. This eliminates holding the Mutex<RateLimiter>
across an await point, which previously could block other request
tasks unnecessarily during the sleep interval.
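Both fixes can be sketched with the standard library alone. The names next_page, RateLimiter fields, and delay_for_next_request are assumptions, as is passing the current time in as a parameter (done here so the logic can be tested); the real client is async and sleeps on the returned duration after the lock is released.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Decide which page to request next, or None to stop.
/// `header` is the raw x-next-page value, if the server sent one.
fn next_page(current: u32, header: Option<&str>, items: usize, per_page: usize) -> Option<u32> {
    // Empty or partial page: this was the last page.
    if items == 0 || items < per_page {
        return None;
    }
    match header.and_then(|h| h.trim().parse::<u32>().ok()) {
        Some(next) if next > current => Some(next),
        // Header missing, blank, or stale, but the page was full: keep going.
        _ => Some(current + 1),
    }
}

struct RateLimiter {
    min_interval: Duration,
    last_request: Option<Instant>,
}

impl RateLimiter {
    /// Compute how long the caller must wait and reserve that slot.
    /// Synchronous, so it can run entirely under the mutex.
    fn check_delay(&mut self, now: Instant) -> Duration {
        let delay = match self.last_request {
            Some(last) => (last + self.min_interval).saturating_duration_since(now),
            None => Duration::ZERO,
        };
        self.last_request = Some(now + delay);
        delay
    }
}

/// The guard is dropped when this function returns, so the caller sleeps
/// without holding the lock.
fn delay_for_next_request(limiter: &Mutex<RateLimiter>) -> Duration {
    let mut guard = limiter.lock().unwrap();
    guard.check_delay(Instant::now())
}
```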
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mechanical rename of GiError -> LoreError across the core module to
match the project's rebranding from gitlab-inbox to gitlore/lore.
Updates the error enum name, all From impls, and the Result type alias.
Additionally introduces:
- New error variants for embedding pipeline: OllamaUnavailable,
OllamaModelNotFound, EmbeddingFailed, EmbeddingsNotBuilt. Each
includes actionable suggestions (e.g., "ollama serve", "ollama pull
nomic-embed-text") to guide users through recovery.
- New error codes 14-16 for programmatic handling of Ollama failures.
- Savepoint-based migration execution in db.rs: each migration now
runs inside a SQLite SAVEPOINT so a failed migration rolls back
cleanly without corrupting the schema_version tracking. Previously
a partial migration could leave the database in an inconsistent
state.
- core::backoff module: exponential backoff with jitter utility for
retry loops in the embedding pipeline and discussion queues.
- core::project module: helper for resolving project IDs and paths
from the local database, used by the document regenerator and
search filters.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaces the verb-first pattern ('lore list issues', 'lore show
issue 42') with noun-first subcommands that feel more natural:
lore issues # list issues
lore issues 42 # show issue #42
lore mrs # list merge requests
lore mrs 99 # show MR #99
lore ingest # ingest everything
lore ingest issues # ingest only issues
lore count issues # count issues
lore status # sync status
lore auth # verify auth
lore doctor # health check
Key changes:
- New IssuesArgs, MrsArgs, IngestArgs, CountArgs structs with
short flags (-n, -s, -p, -a, -l, -o, -f, -J, etc.)
- Global -J/--json flag as shorthand for --robot
- 'lore ingest' with no argument ingests both issues and MRs,
emitting combined JSON summary in robot mode
- --asc flag replaces --order=asc/desc for brevity
- Renamed flags: --has-due-date -> --has-due, --type -> --for,
--confirm -> --yes, --target-branch -> --target, etc.
Old commands (list, show, auth-test, sync-status) are preserved
as hidden backward-compat aliases that emit deprecation warnings
to stderr before delegating to the new handlers.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ingestion counters (discussions_upserted, notes_upserted,
discussions_fetched, diffnotes_count) were incremented before
tx.commit(), meaning a failed commit would report inflated
metrics. Counters now increment only after successful commit
so reported numbers accurately reflect persisted state.
Also simplifies the stale-removal guard in issue discussions:
the received_first_response flag was unnecessary since an empty
seen_discussion_ids list is safe to pass to remove_stale -- if
there were no discussions, stale removal correctly sweeps all
previously-stored discussions. The two separate code paths
(empty vs populated) are collapsed into a single branch.
Derives Default on IngestResult to eliminate verbose zero-init.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two SQL correctness issues fixed:
1. Project filter used LIKE '%term%' which caused partial matches
(e.g. filtering for "foo" matched "group/foobar"). Now uses
exact match OR suffix match after '/' so "foo" matches
"group/foo" but not "group/foobar".
2. GROUP_CONCAT used comma as delimiter for labels and assignees,
which broke parsing when label names themselves contained commas.
Switched to ASCII unit separator (0x1F) which cannot appear in
GitLab entity names.
Also adds a guard for negative time deltas in format_relative_time
to handle clock skew gracefully instead of panicking.
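The two SQL fixes can be mirrored in plain Rust to show the intended behavior. This is the same predicate and delimiter expressed outside SQL, with hypothetical function names; the actual change lives in the query text.

```rust
/// ASCII unit separator used as the GROUP_CONCAT delimiter.
const UNIT_SEP: char = '\u{1F}';

/// Exact match, or suffix match directly after a '/'.
fn project_matches(path: &str, term: &str) -> bool {
    path == term || path.ends_with(&format!("/{term}"))
}

/// Split a GROUP_CONCAT result back into individual names.
/// An empty aggregate yields no names rather than one empty name.
fn split_concat(raw: &str) -> Vec<&str> {
    if raw.is_empty() {
        Vec::new()
    } else {
        raw.split(UNIT_SEP).collect()
    }
}
```

A label such as "needs, review" survives the round trip because the comma is no longer the delimiter.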
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Error suggestions now include concrete CLI examples so users
(and robot-mode consumers) can act immediately without consulting
docs. For instance, ConfigNotFound now shows the expected path
and the exact command to run, TokenNotSet shows the export syntax,
and Ambiguous shows the -p flag with example project paths.
Also fixes the error code for Ambiguous errors: it now maps to
GitLabNotFound instead of InternalError, since the entity exists
but the user needs to disambiguate -- not an internal failure.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Duplicate ISO 8601 timestamp parsing functions existed in both
discussion.rs and merge_request.rs transformers. This extracts
iso_to_ms_strict() and iso_to_ms_opt_strict() into core::time
as the single source of truth, and updates both transformer
modules to use the shared implementations.
Also removes the private now_ms() from merge_request.rs in
favor of the existing core::time::now_ms(), and replaces the
local parse_timestamp_opt() in discussion.rs with the public
iso_to_ms() from core::time.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends all data commands to support merge requests alongside issues,
with consistent patterns and JSON output for robot mode.
List command (gi list mrs):
- MR-specific columns: branches, draft status, reviewers
- Filters: --state (opened|merged|closed|locked|all), --draft,
--no-draft, --reviewer, --target-branch, --source-branch
- Discussion count with unresolved indicator (e.g., "5/2!")
- JSON output includes full MR metadata
Show command (gi show mr <iid>):
- MR details with branches, assignees, reviewers, merge status
- DiffNote positions showing file:line for code review comments
- Full description and discussion bodies (no truncation in JSON)
- --json flag for structured output with ISO timestamps
Count command (gi count mrs):
- MR counting with optional --type filter for discussions/notes
- JSON output with breakdown by state
Ingest command (gi ingest --type mrs):
- Full MR sync with discussion prefetch
- Progress output shows MR-specific metrics (diffnotes count)
- JSON summary with comprehensive sync statistics
All commands respect global --robot mode for auto-JSON output.
The pattern "gi list mrs --json | jq '.mrs[] | .iid'" now works
for scripted MR processing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces a unified robot mode that enables JSON output across all
commands, designed for AI agent and script consumption.
Robot mode activation (any of):
- --robot flag: Explicit opt-in
- GI_ROBOT=1 env var: For persistent configuration
- Non-TTY stdout: Auto-detect when piped (e.g., gi list issues | jq)
Implementation:
- Cli::is_robot_mode(): Centralized detection logic
- All command handlers receive robot_mode boolean
- Errors emit structured JSON to stderr with exit codes
- Success responses emit JSON to stdout
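The detection logic could look roughly like the following sketch. It
assumes only the value "1" of GI_ROBOT enables robot mode, and the
helper robot_mode_from is hypothetical, split out so the decision can
be tested without touching the environment:

```rust
use std::io::IsTerminal;

/// Pure decision: any one of the three triggers enables robot mode.
fn robot_mode_from(flag: bool, gi_robot_env: Option<&str>, stdout_is_tty: bool) -> bool {
    flag || gi_robot_env == Some("1") || !stdout_is_tty
}

/// Gathers the real inputs: --robot flag, GI_ROBOT env var, TTY check.
fn is_robot_mode(flag: bool) -> bool {
    let env = std::env::var("GI_ROBOT").ok();
    robot_mode_from(flag, env.as_deref(), std::io::stdout().is_terminal())
}
```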
Behavior changes in robot mode:
- No color/emoji output (no ANSI escapes)
- No progress spinners or interactive prompts
- Timestamps as ISO 8601 strings (not relative "2 hours ago")
- Full content (no truncation of descriptions/notes)
- Structured error objects with code, message, suggestion
This enables reliable parsing by Claude Code, shell scripts, and
automation pipelines. The auto-detect on non-TTY means simple piping
"just works" without explicit flags.
Per-command --json flags remain for explicit control and override
robot mode when needed for human-friendly terminal + JSON file output.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Improves core infrastructure with robot-friendly error output and
faster lock release for better sync behavior.
Error handling improvements (error.rs):
- ErrorCode::exit_code(): Unique exit codes per error type (1-13)
for programmatic error handling in scripts/agents
- GiError::suggestion(): Helpful hints for common error recovery
- GiError::to_robot_error(): Structured JSON error conversion
- RobotError/RobotErrorOutput: Serializable error types with code,
message, and optional suggestion fields
Lock improvements (lock.rs):
- Heartbeat thread now polls every 100ms for release flag, only
updating database heartbeat at full interval (5s default)
- Eliminates 5-10s delay after sync completion when waiting for
heartbeat thread to notice release
- Reduces lock hold time after operation completes
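The polling pattern can be sketched with std threads. This is an
assumption-laden model, not the code in lock.rs: spawn_heartbeat is a
hypothetical name and the database heartbeat write is replaced by an
atomic counter.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

/// Spawns a heartbeat thread that checks the release flag every 100ms but
/// only records a heartbeat once per `heartbeat_interval`.
fn spawn_heartbeat(
    released: Arc<AtomicBool>,
    beats: Arc<AtomicU32>,
    heartbeat_interval: Duration,
) -> JoinHandle<()> {
    thread::spawn(move || {
        let poll = Duration::from_millis(100);
        let mut last_beat = Instant::now();
        loop {
            thread::sleep(poll);
            if released.load(Ordering::SeqCst) {
                break; // noticed within roughly one poll interval
            }
            if last_beat.elapsed() >= heartbeat_interval {
                // Stand-in for the database heartbeat update.
                beats.fetch_add(1, Ordering::SeqCst);
                last_beat = Instant::now();
            }
        }
    })
}
```

Setting the flag and joining the handle returns after about one poll
interval rather than after a full heartbeat interval.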
Database (db.rs):
- Bump expected schema version to 6 for MR migration
The exit code mapping enables shell scripts and CI/CD pipelines to
distinguish between configuration errors (2-4), GitLab API errors
(5-8), and database errors (9-11) for appropriate retry/alert logic.
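On the consumer side, the ranges above could be classified as in this
sketch. ErrorCategory and categorize_exit_code are hypothetical names;
only the numeric ranges come from the mapping described here.

```rust
#[derive(Debug, PartialEq)]
enum ErrorCategory {
    Config,
    GitLabApi,
    Database,
    Other,
}

/// Maps a process exit code to the category a script would branch on.
fn categorize_exit_code(code: i32) -> ErrorCategory {
    match code {
        2..=4 => ErrorCategory::Config,
        5..=8 => ErrorCategory::GitLabApi,
        9..=11 => ErrorCategory::Database,
        _ => ErrorCategory::Other,
    }
}
```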
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds complete merge request ingestion pipeline with a novel two-phase
discussion sync strategy optimized for throughput.
New modules:
- merge_requests.rs: MR upsert with labels/assignees/reviewers handling,
stale MR cleanup, and watermark-based incremental sync
- mr_discussions.rs: Parallel prefetch strategy for MR discussions
Two-phase MR discussion sync:
1. PREFETCH PHASE: Spawn concurrent tasks to fetch discussions for
multiple MRs simultaneously (configurable concurrency, default 8).
Transform and validate in parallel, storing results in memory.
2. WRITE PHASE: Serial database writes to avoid lock contention.
Each MR's discussions written in a single transaction, with
proper stale discussion cleanup.
This approach achieves ~4-8x throughput vs serial fetching while
maintaining database consistency. Transform errors are tracked per-MR
to prevent partial writes from corrupting watermarks.
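The two phases can be illustrated with a small std-only model. It uses
scoped OS threads as a stand-in for whatever task runtime the real code
uses, and fetch_discussions and sync_mr_discussions are hypothetical
names with a fake fetch that fails for MR 0:

```rust
use std::thread;

/// Hypothetical stand-in for the API call that fetches one MR's discussions.
fn fetch_discussions(mr_iid: u64) -> Result<Vec<String>, String> {
    if mr_iid == 0 {
        Err("transform error".to_string())
    } else {
        Ok(vec![format!("discussion for !{mr_iid}")])
    }
}

/// Returns (written, failed): MRs whose discussions were "written" and MRs
/// skipped because their fetch or transform failed.
fn sync_mr_discussions(
    mr_iids: &[u64],
    concurrency: usize,
) -> (Vec<(u64, Vec<String>)>, Vec<u64>) {
    let mut written = Vec::new();
    let mut failed = Vec::new();
    for batch in mr_iids.chunks(concurrency.max(1)) {
        // Prefetch phase: fetch the whole batch concurrently, hold in memory.
        let fetched: Vec<(u64, Result<Vec<String>, String>)> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|&iid| s.spawn(move || (iid, fetch_discussions(iid))))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("fetch thread panicked"))
                .collect()
        });
        // Write phase: strictly serial, one MR at a time.
        for (iid, result) in fetched {
            match result {
                Ok(discussions) => written.push((iid, discussions)),
                // A failed MR is tracked and not written at all.
                Err(_) => failed.push(iid),
            }
        }
    }
    (written, failed)
}
```

Tracking failures per MR keeps a bad fetch from advancing that MR's
watermark while the rest of the batch proceeds.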
Orchestrator updates:
- ingest_merge_requests(): Coordinates MR fetch -> discussion sync flow
- Progress callbacks emit MR-specific events for UI feedback
- Respects --full flag to reset discussion watermarks for full resync
The prefetch strategy is critical for MRs which typically have more
discussions than issues, and where API latency dominates sync time.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduces NormalizedMergeRequest transformer and updates discussion
normalization to handle both issue and MR discussions polymorphically.
New transformers:
- NormalizedMergeRequest: Transforms API MergeRequest to database row,
extracting labels/assignees/reviewers into separate collections for
junction table insertion. Handles draft detection, detailed_merge_status
preference over deprecated merge_status, and merge_user over merged_by.
Discussion transformer updates:
- NormalizedDiscussion now takes noteable_type ("Issue" | "MergeRequest")
and noteable_id for polymorphic FK binding
- normalize_discussions_for_issue(): Convenience wrapper for issues
- normalize_discussions_for_mr(): Convenience wrapper for MRs
- DiffNote position fields (type, line_range, SHA triplet) now extracted
from API position object for code review context
Design decisions:
- Transformer returns (normalized_item, labels, assignees, reviewers)
tuple for efficient batch insertion without re-querying
- Timestamps converted to ms epoch for SQLite storage consistency
- Optional fields use map() chains for clean null handling
The polymorphic discussion approach allows reusing the same discussions
and notes tables for both issues and MRs, with noteable_type + FK
determining the parent relationship.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends GitLabClient with endpoints for fetching merge requests and
their discussions, following the same patterns established for issues.
New methods:
- fetch_merge_requests(): Paginated MR listing with cursor support,
using updated_after filter for incremental sync. Uses 'all' scope
to include MRs where user is author/assignee/reviewer.
- fetch_merge_requests_single_page(): Single page variant for callers
managing their own pagination (used by parallel prefetch)
- fetch_mr_discussions(): Paginated discussion listing for a single MR,
returns full discussion trees with notes
API design notes:
- Uses keyset pagination (order_by=updated_at, keyset=true) for
consistent results during sync operations
- MR endpoint uses /merge_requests (not /mrs) per GitLab API naming
- Discussion endpoint matches issue pattern for consistency
- per_page defaults to 100 (GitLab max) for efficiency
The fetch_merge_requests_single_page method enables the parallel
prefetch strategy used in mr_discussions.rs, where multiple MRs'
discussions are fetched concurrently during the prefetch phase.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Extends GitLab type definitions with comprehensive merge request support,
matching the API response structure for /projects/:id/merge_requests.
New types:
- MergeRequest: Full MR metadata including draft status, branch info,
detailed_merge_status, merge_user (modern API fields replacing
deprecated alternatives), and references for cross-project support
- MrReviewer: Reviewer user info (MR-specific, distinct from assignees)
- MrAssignee: Assignee user info with consistent structure
- MrDiscussion: MR discussion wrapper for polymorphic handling
- DiffNotePosition: Rich position data for code review comments with
line ranges and SHA triplet for commit context
Design decisions:
- Use Option<T> for all nullable API fields to handle partial responses
- Include deprecated fields (merged_by, merge_status) alongside modern
alternatives for backward compatibility with older GitLab instances
- DiffNotePosition uses Option for all fields since different position
types (text/image/file) populate different subsets
These types enable type-safe deserialization of GitLab MR API responses
with full coverage of the fields needed for CP2 ingestion.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This is a P1 fix from the CP1-CP2 alignment audit. The --full flag was
designed to enable complete data re-synchronization, but it only reset
sync_cursors for issues -- it failed to reset the per-issue
discussions_synced_for_updated_at watermark.
The result was an inconsistent state: issues would be re-fetched from
GitLab (because sync_cursors were cleared), but their discussions would
NOT be re-synced (because the watermark comparison prevented it). This
was a subtle bug because the watermark check uses:
WHERE updated_at > COALESCE(discussions_synced_for_updated_at, 0)
When discussions_synced_for_updated_at is already set to the issue's
updated_at, the comparison fails and discussions are skipped.
Fix: Before clearing sync_cursors, set discussions_synced_for_updated_at
to NULL for all issues in the project. This makes COALESCE return 0,
ensuring all issues become eligible for discussion sync.
The ordering is important: watermarks must be reset BEFORE cursors to
ensure the full sync behaves consistently.
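The watermark comparison and the reset can be modeled in Rust, with
Option<i64> playing the role of the nullable column and unwrap_or(0)
the role of COALESCE. The Issue struct and helper names are
illustrative only:

```rust
#[derive(Debug, Clone)]
struct Issue {
    updated_at: i64,
    discussions_synced_for_updated_at: Option<i64>,
}

/// Mirrors: updated_at > COALESCE(discussions_synced_for_updated_at, 0)
fn needs_discussion_sync(issue: &Issue) -> bool {
    issue.updated_at > issue.discussions_synced_for_updated_at.unwrap_or(0)
}

/// Mirrors setting the watermark to NULL for every issue in the project.
fn reset_watermarks(issues: &mut [Issue]) {
    for issue in issues {
        issue.discussions_synced_for_updated_at = None;
    }
}
```

An issue whose watermark equals its updated_at is skipped before the
reset and eligible after it.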
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This is a P0 fix from the CP1-CP2 alignment audit. The original
NormalizedDiscussion struct had issue_id as a non-optional i64 and
hardcoded noteable_type to "Issue", making it incompatible with merge
request discussions even though the database schema already supports
both via nullable columns and a CHECK constraint.
Changes:
- Add NoteableRef enum with Issue(i64) and MergeRequest(i64) variants
to provide compile-time safety against mixing up issue vs MR IDs
- Change NormalizedDiscussion.issue_id from i64 to Option<i64>
- Add NormalizedDiscussion.merge_request_id: Option<i64>
- Update transform_discussion() signature to take NoteableRef instead
of local_issue_id, deriving issue_id/merge_request_id/noteable_type
from the enum variant
- Update upsert_discussion() SQL to include merge_request_id column
(now 12 parameters instead of 11)
- Export NoteableRef from transformers module
- Add test for MergeRequest discussion transformation
- Update all existing tests to use NoteableRef::Issue(id)
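A sketch of how the enum can drive the three derived values. The
parent_columns helper is a hypothetical name for illustration; the
variants and the derived fields follow the list above:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum NoteableRef {
    Issue(i64),
    MergeRequest(i64),
}

impl NoteableRef {
    /// Returns (issue_id, merge_request_id, noteable_type) for this parent.
    fn parent_columns(self) -> (Option<i64>, Option<i64>, &'static str) {
        match self {
            NoteableRef::Issue(id) => (Some(id), None, "Issue"),
            NoteableRef::MergeRequest(id) => (None, Some(id), "MergeRequest"),
        }
    }
}
```

Because exactly one ID is Some in every variant, a caller cannot set
both parent columns or mislabel the noteable_type.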
The database schema (migration 002) was forward-thinking and already
supports both issue_id and merge_request_id as nullable columns with
a CHECK constraint. This change prepares the application layer for
CP2 merge request support without requiring any migrations.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Provides a typed interface to the GitLab API with pagination support.
src/gitlab/types.rs - API response type definitions:
- GitLabIssue: Full issue payload with author, assignees, labels
- GitLabDiscussion: Discussion thread with notes array
- GitLabNote: Individual note with author, timestamps, body
- GitLabAuthor/GitLabUser: User information with avatar URLs
- GitLabProject: Project metadata from /api/v4/projects
- GitLabVersion: GitLab instance version from /api/v4/version
- GitLabNotePosition: Line-level position for diff notes
- All types derive Deserialize for JSON parsing
src/gitlab/client.rs - HTTP client with authentication:
- Bearer token authentication from config
- Base URL configuration for self-hosted instances
- Paginated iteration via keyset or offset pagination
- Automatic Link header parsing for next page URLs
- Per-page limit control (default 100)
- Methods: get_user(), get_version(), get_project()
- Async stream for issues: list_issues_paginated()
- Async stream for discussions: list_issue_discussions_paginated()
- Respects GitLab rate limiting via response headers
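Link header parsing for the next page could be sketched as below. This
is a simplified parser, not the client's actual code: next_page_url is
a hypothetical name, and splitting on ',' assumes the URLs themselves
contain no commas.

```rust
/// Extracts the URL tagged rel="next" from an HTTP Link header value.
fn next_page_url(link_header: &str) -> Option<&str> {
    link_header.split(',').find_map(|part| {
        // Each part looks like: <https://...>; rel="next"
        let (target, params) = part.split_once(';')?;
        let is_next = params.split(';').any(|p| p.trim() == "rel=\"next\"");
        if !is_next {
            return None;
        }
        target.trim().strip_prefix('<')?.strip_suffix('>')
    })
}
```

A header without a rel="next" entry yields None, which a pagination
loop can treat as the last page.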
src/gitlab/transformers/ - API to database mapping:
transformers/issue.rs - Issue transformation:
- Maps GitLabIssue to IssueRow for database insert
- Extracts milestone ID and due date
- Normalizes author/assignee usernames
- Preserves label IDs for junction table
- Returns IssueWithMetadata including label/assignee lists
transformers/discussion.rs - Discussion transformation:
- Maps GitLabDiscussion to NormalizedDiscussion
- Extracts thread metadata (resolvable, resolved)
- Flattens notes to NormalizedNote with foreign keys
- Handles system notes vs user notes
- Preserves note position for diff discussions
transformers/mod.rs - Re-exports all transformer types
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>