Enriched all per-note search beads (NOTE-0A through NOTE-2I) with:
- Corrected migration numbers (022, 024, 025)
- Verified file paths and line numbers from codebase
- Complete function signatures for referenced code
- Detailed approach sections with SQL and Rust patterns
- DocumentData struct field mappings
- TDD anchors with specific test names
- Edge cases from codebase analysis
- Dependency context explaining what each blocker provides
Add quick_start section with glab equivalents, lore-exclusive features,
and read/write split guidance. Add example_output to issues, mrs, search,
and who commands. Update strip_schemas to also strip example_output in
brief mode. Update beads tracking state.
Closes: bd-91j1
Make run_search async, replace the hardcoded lexical mode with
SearchMode::parse(), wire search_hybrid() to OllamaClient for semantic/hybrid
modes, and degrade gracefully when Ollama is unavailable.
Closes: bd-1ksf
Implement drift detection using cosine similarity between issue description
embedding and chronological note embeddings. Sliding window (size 3) identifies
topic drift points. Includes human and robot output formatters.
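
A minimal sketch of the sliding-window idea, not the code in drift.rs or similarity.rs. The function names, the 0.5 threshold, and the "window mean below threshold" rule are assumptions made for illustration; only the cosine similarity and the window size of 3 come from the description above.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the start index of every window of `window` consecutive notes
/// whose mean similarity to the description falls below `threshold`.
/// (Hypothetical rule: the real drift criterion may differ.)
fn drift_points(
    description: &[f32],
    notes: &[Vec<f32>],
    window: usize,
    threshold: f32,
) -> Vec<usize> {
    let sims: Vec<f32> = notes
        .iter()
        .map(|note| cosine_similarity(description, note))
        .collect();
    if window == 0 || sims.len() < window {
        return Vec::new();
    }
    sims.windows(window)
        .enumerate()
        .filter(|(_, w)| w.iter().sum::<f32>() / (window as f32) < threshold)
        .map(|(start, _)| start)
        .collect()
}
```

Averaging over a window rather than flagging single notes keeps one off-topic comment from being reported as drift.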
New files: drift.rs, similarity.rs
Closes: bd-1cjx
Add references_full, user_notes_count, merge_requests_count computed
fields to show issue. Add closed_at and confidential columns via
migration 023.
Closes: bd-2g50
TUI PRD v2 (frankentui): Rounds 10-11 feedback refining the hybrid
Ratatui terminal UI approach — component architecture, keybinding
model, and incremental search integration.
Time-decay expert scoring: Round 6 feedback on the weighted scoring
model for the `who` command's expert mode, covering decay curves,
activity normalization, and bot filtering thresholds.
Plan-to-beads v2: Draft specification for the next iteration of the
plan-to-beads skill that converts markdown plans into dependency-
aware beads with full agent-executable context.
Per-note search PRD: Comprehensive product requirements for evolving
the search system from document-level to note-level granularity.
Includes 6 rounds of iterative feedback refining scope, ranking
strategy, migration path, and robot mode integration.
User journeys: Detailed walkthrough of 8 primary user workflows
covering issue triage, MR review lookup, code archaeology, expert
discovery, sync pipeline operation, and agent integration patterns.
Excalidraw source files and PNG exports for 5 architectural diagrams:
01-human-flow-map: User journey through lore CLI commands
02-agent-flow-map: AI agent interaction patterns with robot mode
03-command-coverage: Matrix of CLI commands vs data entities
04-gap-priority-matrix: Feature gap analysis with priority scoring
05-data-flow-architecture: End-to-end data pipeline from GitLab
through ingestion, storage, indexing, and query layers
User-supplied project names containing `%` or `_` were passed directly
into LIKE patterns, causing unintended wildcard matching. For example,
`my_project` would match `my-project` because `_` is a single-char
wildcard in SQL LIKE.
Added escape_like() helper that escapes `\`, `%`, and `_` with
backslash, and added ESCAPE '\' clauses to both the suffix-match and
substring-match queries in resolve_project().
Includes two regression tests:
- test_underscore_not_wildcard: `_` in input must not match `-`
- test_percent_not_wildcard: `%` in input must not match arbitrary strings
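
For illustration, a helper with the behavior described above could look roughly like this. It is a sketch written from the commit description, not a copy of the function in the codebase, and the signature is an assumption.

```rust
/// Escapes the LIKE metacharacters `%` and `_`, plus the escape character
/// itself, so the result can be used with `LIKE ? ESCAPE '\'`.
/// (Signature assumed; the real helper may differ.)
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}
```

The backslash has to be escaped along with the two wildcards; otherwise an input that already contains `\` would change the meaning of the character after it.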
Query optimizer fixes for the `who` and `stats` commands based on
a systematic performance audit of the SQLite query plans.
who command (expert/reviews/detail modes):
- Add INDEXED BY idx_notes_diffnote_path_created hints to all DiffNote
queries. SQLite's planner was selecting idx_notes_system (38% of rows)
over the far more selective partial index (9.3% of rows). Measured
50-133x speedup on expert queries, 26x on reviews queries.
- Reorder JOIN clauses in detail mode's MR-author sub-select to match
the index scan direction (notes -> discussions -> merge_requests).
stats command:
- Replace 12+ sequential COUNT(*) queries with conditional aggregates
(COALESCE + SUM + CASE). The documents, dirty_sources,
pending_discussion_fetches, and pending_dependent_fetches tables are each
scanned once instead of 2-3 times. Measured 1.7x speedup (109ms -> 65ms,
warm cache).
- Switch FTS document count from COUNT(*) on the virtual table to
COUNT(*) on documents_fts_docsize shadow table (B-tree scan vs FTS5
virtual table overhead). Measured 19x speedup for that single query.
Database: 61652 docs, 282K notes, 211K discussions, 1.5GB.
Updates README.md to explain the new defaultProject behavior:
- Config example now shows the defaultProject field
- New row in the configuration reference table describing the field,
its type (optional string), default (none), and behavior (fallback
when -p omitted, must match a configured path, CLI always overrides)
- Project Resolution section updated to explain the cascading logic:
CLI flag > config default > all projects
- Init section notes the interactive prompt for multi-project setups
and the --default-project flag for non-interactive/robot mode
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrates the defaultProject config field across the entire CLI
surface so that omitting `-p` now falls back to the configured default.
Init command:
- New `--default-project` flag on `lore init` (and robot-mode variant)
- InitInputs.default_project: Option<String> passed through to run_init
- Validation in run_init ensures the default matches a configured path
- Interactive mode: when multiple projects are configured, prompts
whether to set a default and which project to use
- Robot mode: InitOutputJson now includes default_project (omitted when
null) for downstream automation
- Autocorrect dictionary updated with `--default-project`
Command handlers applying effective_project():
- handle_issues: list filters use config default when -p omitted
- handle_mrs: same cascading resolution for MR listing
- handle_ingest: dry-run and full sync respect the default
- handle_timeline: TimelineParams.project resolved via effective_project
- handle_search: SearchCliFilters.project resolved via effective_project
- handle_generate_docs: project filter cascades
- handle_who: falls back to config.default_project when -p omitted
- handle_count: both count subcommands respect the default
- handle_discussions: discussion count filters respect the default
Robot-docs:
- init command schema updated with --default-project flag and
response_schema showing default_project as string?
- New config_notes section documents the defaultProject field with
type, description, and example
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces a new optional `defaultProject` field on Config (and
MinimalConfig for init output) that acts as a fallback when the
`-p`/`--project` CLI flag is omitted.
Domain-layer changes:
- Config.default_project: Option<String> with camelCase serde rename
- Config::load validates that defaultProject matches a configured
project path (exact or case-insensitive suffix match), returning
ConfigInvalid on mismatch
- Config::effective_project(cli_flag) -> Option<&str>: cascading
resolver that prefers the CLI flag, then the config default, then None
- MinimalConfig.default_project with skip_serializing_if for clean
JSON output when unset
Tests added:
- effective_project: CLI overrides default, falls back to default,
returns None when both absent
- Config::load: accepts valid defaultProject, rejects nonexistent,
accepts suffix match
- MinimalConfig: omits null defaultProject, includes when set
- Helper write_config_with_default_project for parameterized tests
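
The cascading resolver can be sketched as below. The struct is cut down to the single field that matters here; everything else on the real Config is omitted, and the body is an assumption based on the description.

```rust
/// Reduced stand-in for the real Config (only the relevant field).
struct Config {
    default_project: Option<String>,
}

impl Config {
    /// Prefers the CLI flag, then the configured default, then None.
    fn effective_project<'a>(&'a self, cli_flag: Option<&'a str>) -> Option<&'a str> {
        cli_flag.or(self.default_project.as_deref())
    }
}
```

Returning `Option<&str>` borrows from either the flag or the config, so callers get the resolved name without an allocation.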
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add StatusEnrichmentStarted/PageFetched/Writing progress events so
sync no longer has a 45-60s silent gap during GraphQL status fetch
- Thread per-page callback into fetch_issue_statuses_with_progress
- Hide status_category from all human and robot output (keep in DB)
- Add meta.available_statuses to issues list JSON response for agent
self-discovery of valid --status filter values
- Update robot-docs with status filtering documentation
(Supersedes empty commit f3788eb — jj auto-snapshot race.)
Four related refinements to how work item status is presented:
1. available_statuses in meta (list.rs, main.rs):
Robot-mode issue list responses now include meta.available_statuses —
a sorted array of all distinct status_name values in the database.
Agents can use this to validate --status filter values or display
valid options without a separate query.
2. Hide status_category from JSON (list.rs, show.rs):
status_category is a GitLab internal classification that duplicates
the state field. Switched to skip_serializing so it never appears
in JSON output while remaining available internally.
3. Simplify human-readable status display (show.rs):
Removed the "(category)" parenthetical from the Status line.
4. robot-docs schema updates (main.rs):
Documented --status filter semantics and meta.available_statuses.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Establishes Jujutsu (jj) as the preferred VCS tool for this colocated
repo, matching the global Claude Code rules. Agents should use jj
equivalents for all git operations and only fall back to raw git for
hooks, LFS, submodules, or gh CLI interop.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Four related refinements to how work item status is presented:
1. available_statuses in meta (list.rs, main.rs):
Robot-mode issue list responses now include meta.available_statuses —
a sorted array of all distinct status_name values in the database.
Agents can use this to validate --status filter values, offer
autocomplete, or display valid options without a separate query.
2. Hide status_category from JSON (list.rs, show.rs):
status_category (e.g. "open", "closed") is a GitLab internal
classification that duplicates the state field and adds no actionable
signal for consumers. Switched from skip_serializing_if to
skip_serializing so it never appears in JSON output while remaining
available internally for future use.
3. Simplify human-readable status display (show.rs):
Removed the "(category)" parenthetical from the Status line in
lore show issue output. The category was noise — users care about
the board column label, not GitLab's internal taxonomy.
4. robot-docs schema updates (main.rs):
Documented the --status filter semantics and the new
meta.available_statuses field in the self-discovery manifest.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously the status enrichment phase (GraphQL work item status fetch)
ran silently — users saw no feedback between "syncing issues" and the
final enrichment summary. For projects with hundreds of issues and
adaptive page-size retries, this felt like a hang.
Changes across three layers:
GraphQL (graphql.rs):
- Extract fetch_issue_statuses_with_progress() accepting an optional
on_page callback invoked after each paginated fetch with the
running count of fetched IIDs
- Original fetch_issue_statuses() preserved as a zero-cost
delegation wrapper (no callback overhead)
Orchestrator (orchestrator.rs):
- Three new ProgressEvent variants: StatusEnrichmentStarted,
StatusEnrichmentPageFetched, StatusEnrichmentWriting
- Wire the page callback through to the new _with_progress fn
CLI (ingest.rs):
- Handle all three new events in the progress callback, updating
both the per-project spinner and the stage bar with live counts
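
A sketch of the callback-plus-wrapper shape, with the network layer replaced by an in-memory slice of pages. The parameter types and the generic callback are assumptions; the real functions are async and talk to GraphQL.

```rust
/// Collects IIDs page by page, invoking `on_page` (if any) with the running
/// count after each page. `pages` stands in for paginated GraphQL responses.
fn fetch_issue_statuses_with_progress<F: FnMut(usize)>(
    pages: &[Vec<u64>],
    mut on_page: Option<F>,
) -> Vec<u64> {
    let mut iids = Vec::new();
    for page in pages {
        iids.extend_from_slice(page);
        if let Some(callback) = on_page.as_mut() {
            callback(iids.len());
        }
    }
    iids
}

/// Delegating wrapper for callers that do not need progress updates.
fn fetch_issue_statuses(pages: &[Vec<u64>]) -> Vec<u64> {
    fetch_issue_statuses_with_progress(pages, None::<fn(usize)>)
}
```

With a generic callback the `None` path is monomorphized separately, which is one way to get a wrapper with no callback overhead.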
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The migration runner now inserts (OR REPLACE) the schema_version row
after each successful migration batch, regardless of whether the
migration SQL itself contains a self-registering INSERT. This prevents
version tracking gaps when a .sql migration omits the bookkeeping
statement, which would leave the schema at an unrecorded version and
cause re-execution attempts on next startup.
Legacy migrations that already self-register are unaffected thanks to
the OR REPLACE conflict resolution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Work item status integration across all CLI output:
Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
multiple values): lore issues --status "In progress" --status "To do"
- Status fields (name, category, color, icon_name, synced_at) in issue
list query and JSON output with conditional serialization
Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
using ANSI 256-color approximation from hex color values
- Status fields included in robot mode JSON with ISO timestamps
- IssueRow, IssueDetail, IssueDetailJson all carry status columns
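
One common way to approximate a hex color in the ANSI 256-color palette is to map each channel onto the 6x6x6 color cube (indices 16-231). The sketch below does that with linear rounding; the function name is hypothetical, and the real renderer may use a different mapping (for example the non-linear xterm cube levels or the grayscale ramp).

```rust
/// Maps "#rrggbb" (or "rrggbb") to an index in the ANSI 6x6x6 color cube.
/// Returns None for malformed input.
fn hex_to_ansi256(hex: &str) -> Option<u8> {
    let hex = hex.strip_prefix('#').unwrap_or(hex);
    if hex.len() != 6 || !hex.is_ascii() {
        return None;
    }
    let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).ok();
    let (r, g, b) = (channel(0)?, channel(2)?, channel(4)?);
    // Linearly scale 0..=255 to 0..=5, rounding to nearest.
    let level = |c: u8| ((c as u16 * 5 + 127) / 255) as u8;
    Some(16 + 36 * level(r) + 6 * level(g) + level(b))
}
```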
Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)
- strip_schemas() utility in robot.rs for --brief mode
- expand_fields_preset() extended for search, timeline, and all who modes
Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.
Note: replaces empty commit dcfd449 which lost staging during hook execution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
.claude/hooks/on-file-write.sh:
- Fix hook to read Claude Code context from JSON stdin (FILE_PATH and
CWD extracted via jq) instead of relying on environment variables
- Scan only the changed file instead of the entire project directory,
reducing hook execution from ~30s to <1s per save
.beads/:
- Sync issue tracker state
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three implementation plans with iterative cross-model refinement:
lore-service (5 iterations):
HTTP service layer exposing lore's SQLite data via REST/SSE for
integration with external tools (dashboards, IDE extensions, chat
agents). Covers authentication, rate limiting, caching strategy, and
webhook-driven sync triggers.
work-item-status-graphql (7 iterations + TDD appendix):
Detailed implementation plan for the GraphQL-based work item status
enrichment feature (now implemented). Includes the TDD appendix with
test-first development specifications covering GraphQL client, adaptive
pagination, ingestion orchestration, CLI display, and robot mode output.
time-decay-expert-scoring (iteration 5 feedback):
Updates to the existing time-decay scoring plan incorporating feedback
on decay curve parameterization, recency weighting for discussion
contributions, and staleness detection thresholds.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive product requirements document for the gitlore TUI built on
FrankenTUI's Elm architecture (Msg -> update -> view). The PRD (7800+
lines) covers:
Architecture: Separate binary crate (lore-tui) with runtime delegation,
Elm-style Model/Cmd/Msg, DbManager with closure-based read pool + WAL,
TaskSupervisor for dedup/cancellation, EntityKey system for type-safe
entity references, CommandRegistry as single source of truth for
keybindings/palette/help.
Screens: Dashboard, IssueList, IssueDetail, MrList, MrDetail, Search
(lexical/hybrid/semantic with facets), Timeline (5-stage pipeline),
Who (expert/workload/reviews/active/overlap), Sync (live progress),
CommandPalette, Help overlay.
Infrastructure: InputMode state machine, Clock trait for deterministic
rendering, crash_context ring buffer with redaction, instance lock,
progressive hydration, session restore, grapheme-safe text truncation
(unicode-width + unicode-segmentation), terminal sanitization (ANSI/bidi/
C1 controls), entity LRU cache.
Testing: Snapshot tests via insta, event-fuzz, CLI/TUI parity, tiered
benchmark fixtures (S/M/L), query-plan CI enforcement, Phase 2.5
vertical slice gate.
9 plan-refine iterations (ChatGPT review -> Claude integration):
Iter 1-3: Connection pool, debounce, EntityKey, TaskSupervisor,
keyset pagination, capability-adaptive rendering
Iter 4-6: Separate binary crate, ANSI hardening, session restore,
read tx isolation, progressive hydration, unicode-width
Iter 7-9: Per-screen LoadState, CommandRegistry, InputMode, Clock,
log redaction, entity cache, search cancel SLO, crash diagnostics
Also includes the original tui-prd.md (ratatui-based, superseded by v2).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md:
- Feature summary updated to mention work item status sync and GraphQL
- New config reference entry for sync.fetchWorkItemStatus (default true)
- Issue listing/show examples include --status flag usage
- Valid fields list expanded with status_name, status_category,
status_color, status_icon_name, status_synced_at_iso
- Database schema table updated for issues table
- Ingest/sync command descriptions mention status enrichment phase
- Adaptive page sizing and graceful degradation documented
AGENTS.md:
- Robot mode example shows --status flag usage
docs/robot-mode-design.md:
- Issue available fields list expanded with status fields
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Work item status integration across all CLI output:
Issue listing (lore list issues):
- New Status column appears when any issue has status data, with
hex-color rendering using ANSI 256-color approximation
- New --status flag for case-insensitive filtering (OR logic for
multiple values): lore issues --status "In progress" --status "To do"
Issue detail (lore show issue):
- Displays "Status: In progress (in_progress)" with color-coded output
- Status fields (name, category, color, icon, synced_at) included in
robot mode JSON with ISO timestamps
Robot mode field selection expanded to new commands:
- search: --fields with "minimal" preset (document_id, title, source_type, score)
- timeline: --fields with "minimal" preset (timestamp, type, entity_iid, detail)
- who: --fields with per-mode presets (expert_minimal, workload_minimal, etc.)
- robot-docs: new --brief flag strips response_schema from output (~60% smaller)
Robot-docs manifest updated with --status flag documentation, --fields
flags for search/timeline/who, fields_presets sections, and corrected
search response schema field names.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a "Phase 1.5" status enrichment step to the issue ingestion pipeline
that fetches work item statuses via the GitLab GraphQL API after the
standard REST API ingestion completes.
Schema changes (migration 021):
- Add status_name, status_category, status_color, status_icon_name, and
status_synced_at columns to the issues table (all nullable)
Ingestion pipeline changes:
- New `enrich_issue_statuses_txn()` function that applies fetched
statuses in a single transaction with two phases: clear stale statuses
for issues that no longer have a status widget, then apply new/updated
statuses from the GraphQL response
- ProgressEvent variants for status enrichment (complete/skipped)
- IngestProjectResult tracks enrichment metrics (seen, enriched, cleared,
without_widget, partial_error_count, enrichment_mode, errors)
- Robot mode JSON output includes per-project status enrichment details
Configuration:
- New `sync.fetchWorkItemStatus` config option (defaults to true); set it to
false to disable GraphQL status enrichment on instances without Premium/Ultimate
- `LoreError::GitLabAuthFailed` now treated as permanent API error so
status enrichment auth failures don't trigger retries
Also removes the unnecessary nested SAVEPOINT in store_closes_issues_refs
(already runs within the orchestrator's transaction context).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce a reusable GraphQL client (`src/gitlab/graphql.rs`) that handles
GitLab's GraphQL API with full error handling for auth failures, rate
limiting, and partial errors. Key capabilities:
- Adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL
complexity limits without hardcoding a single safe page size
- Paginated issue status fetching via the workItems GraphQL query
- Graceful detection of unsupported instances (missing GraphQL endpoint
or forbidden auth) so ingestion continues without status data
- Retry-After header parsing via the `httpdate` crate for rate limit
compliance
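
The page-size step-down can be pictured with the following sketch. It retries on any error, which is a simplification: the real client presumably steps down only on complexity errors. The function name, the String error type, and the closure-based fetch are all assumptions.

```rust
/// Page sizes tried in order, largest first.
const PAGE_SIZES: [usize; 4] = [100, 50, 25, 10];

/// Calls `fetch_page` with successively smaller page sizes until one succeeds.
/// Returns the page together with the size that worked, or the last error.
fn fetch_adaptive<T>(
    mut fetch_page: impl FnMut(usize) -> Result<T, String>,
) -> Result<(T, usize), String> {
    let mut last_err = String::from("no page size attempted");
    for &size in PAGE_SIZES.iter() {
        match fetch_page(size) {
            Ok(page) => return Ok((page, size)),
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```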
Also adds `WorkItemStatus` type to `gitlab::types` with name, category,
color, and icon_name fields (all optional except name) with comprehensive
deserialization tests covering all system statuses (TO_DO, IN_PROGRESS,
DONE, CANCELED) and edge cases (null category, unknown future values).
The `GitLabClient` gains a `graphql_client()` factory method for
ergonomic access from the ingestion pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
drain_mr_diffs in orchestrator.rs already wraps each MR diff store
in an unchecked_transaction (alongside job completion and watermark
update). upsert_mr_file_changes was also starting its own inner
transaction via conn.unchecked_transaction(), causing every call to
fail with "cannot start a transaction within a transaction".
Remove the inner transaction management from upsert_mr_file_changes
so it operates on whatever Connection (or Transaction deref'd to
Connection) the caller provides. The caller in drain_mr_diffs owns
the transaction boundary. Standalone callers (tests, future direct
use) auto-commit each statement, which is correct for their use case.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ideas catalog (docs/ideas/): 25 feature concept documents covering future
lore capabilities including bottleneck detection, churn analysis, expert
scoring, collaboration patterns, milestone risk, knowledge silos, and more.
Each doc includes motivation, implementation sketch, data requirements, and
dependencies on existing infrastructure. README.md provides an overview and
SYSTEM-PROPOSAL.md presents the unified analytics vision.
Plans (plans/): Time-decay expert scoring design with four rounds of review
feedback exploring decay functions, scoring algebra, and integration points
with the existing who-expert pipeline.
Issue doc (docs/issues/001): Documents the timeline pipeline bug where
EntityRef was missing project context, causing ambiguous cross-project
references during the EXPAND stage.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add rule/config files for Cursor, Cline, Codex, Gemini, Continue, and
OpenCode editors pointing them to project conventions, UBS usage, and
AGENTS.md. Add a Claude Code on-file-write hook that runs UBS on
supported source files after every save.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- regenerator: Include project_id in the ON CONFLICT UPDATE clause for
document upserts. Previously, if a document moved between projects
(e.g., during re-ingestion), the project_id would remain stale.
- truncation: Compute the omission marker ("N notes omitted") before
checking whether first+last notes fit in the budget. The old order
computed the marker after the budget check, meaning the marker's byte
cost was unaccounted for and could cause over-budget output.
- ollama: Tighten model name matching to require either an exact match
or a colon-delimited tag prefix (model == name or name starts with
"model:"). The prior starts_with check would false-positive on
"nomic-embed-text-v2" when looking for "nomic-embed-text". Tests
updated to cover exact match, tagged, wrong model, and prefix
false-positive cases.
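
The tightened model-name rule amounts to the check below. This is a sketch of the rule as described; the function name and argument order are assumptions.

```rust
/// True when `installed` is exactly `wanted` or `wanted` followed by a
/// colon-delimited tag (e.g. "nomic-embed-text:latest").
fn model_matches(installed: &str, wanted: &str) -> bool {
    installed == wanted
        || installed
            .strip_prefix(wanted)
            .map_or(false, |rest| rest.starts_with(':'))
}
```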
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Label upsert (issues + merge_requests): Replace INSERT ... ON CONFLICT DO
UPDATE RETURNING with INSERT OR IGNORE + SELECT. The prior RETURNING-based
approach relied on last_insert_rowid() matching the returned id, which is
not guaranteed when ON CONFLICT triggers an update (SQLite may return 0).
The new two-step approach is unambiguous and correctly tracks created_count.
Init: Add ON CONFLICT(gitlab_project_id) DO UPDATE to the project insert
so re-running `lore init` updates path/branch/url instead of failing with
a unique constraint violation.
MR discussions sync: Reset discussions_sync_attempts to 0 when clearing a
sync health error, so previously-failed MRs get a fresh retry budget after
successful sync.
Count: format_number now handles negative numbers correctly by extracting
the sign before inserting thousand-separators.
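
A sketch of sign-first formatting, assuming an i64 input and a comma as the thousands separator (both assumptions; the real format_number may differ):

```rust
/// Formats an integer with thousands separators, handling the sign
/// separately so it never counts toward a digit group.
fn format_number(n: i64) -> String {
    // unsigned_abs avoids overflow on i64::MIN.
    let digits = n.unsigned_abs().to_string();
    let mut out = String::new();
    if n < 0 {
        out.push('-');
    }
    for (i, ch) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}
```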
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace .unwrap_or(), .ok(), and .filter_map(|r| r.ok()) patterns with
proper error propagation using ? and rusqlite::OptionalExtension where
the query may legitimately return no rows.
Affected areas:
- events_db::count_events: three count queries now propagate errors
instead of defaulting to (0, 0) on failure
- note_parser::extract_refs_from_system_notes: row iteration errors
are now propagated instead of silently dropped via filter_map
- note_parser::noteable_type_to_entity_type: unknown types now log a
debug warning before defaulting to "issue"
- payloads::store_payload/read_payload: use .optional()? instead of
.ok() to distinguish "no row" from "query failed"
- backoff::compute_next_attempt_at: use .clamp(0, 30) to guard against
negative attempt_count, not just .min(30)
- search::vector::max_chunks_per_document: returns Result<i64> with
proper error propagation through .optional()?.flatten()
- embedding::chunk_ids::decode_rowid: promote debug_assert to assert
since negative rowids indicate data corruption worth failing fast on
- ingestion::dirty_tracker::record_dirty_error: use .optional()? to
handle missing dirty_sources row gracefully instead of hard error
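
To show why `.clamp(0, 30)` matters more than `.min(30)`, here is a hypothetical exponential backoff. The function name, the base of one second, and the power-of-two growth are assumptions, not the actual compute_next_attempt_at logic.

```rust
/// Hypothetical backoff delay in seconds: 2^attempt_count, with the
/// exponent clamped to 0..=30 so negative or huge counts stay in range.
fn backoff_seconds(attempt_count: i64) -> i64 {
    let exponent = attempt_count.clamp(0, 30) as u32;
    1_i64 << exponent
}
```

With only `.min(30)`, a negative attempt_count would pass through and become an out-of-range exponent after the cast.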
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expert mode now surfaces the specific MR references (project/path!iid) that
contributed to each expert's score, capped at 50 per user. A new --detail flag
adds per-MR breakdowns showing role (Author/Reviewer/both), note count, and
last activity timestamp.
Scoring weights (author_weight, reviewer_weight, note_bonus) are now
configurable via the config file's `scoring` section with validation that
rejects negative values. Defaults shift to author_weight=25, reviewer_weight=10,
note_bonus=1 — better reflecting that code authorship is a stronger expertise
signal than review assignment alone.
Path resolution gains suffix matching: typing "login.rs" auto-resolves to
"src/auth/login.rs" when unambiguous, with clear disambiguation errors when
multiple paths match. Project-scoping (-p) narrows the candidate set.
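
Suffix resolution can be sketched over an in-memory list of known paths. The real implementation queries the database and honors -p; the function name and the Result shape used here are assumptions.

```rust
/// Resolves `input` against `known_paths`: an exact match wins, otherwise a
/// unique path ending in "/<input>" is returned. On zero or multiple
/// matches the candidate list is returned as the error.
fn resolve_path_suffix(input: &str, known_paths: &[&str]) -> Result<String, Vec<String>> {
    if known_paths.contains(&input) {
        return Ok(input.to_string());
    }
    let suffix = format!("/{}", input);
    let mut candidates: Vec<String> = known_paths
        .iter()
        .filter(|path| path.ends_with(suffix.as_str()))
        .map(|path| (*path).to_string())
        .collect();
    if candidates.len() == 1 {
        Ok(candidates.remove(0))
    } else {
        Err(candidates)
    }
}
```

Matching on "/" plus the input keeps "login.rs" from resolving to an unrelated file such as "src/weblogin.rs".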
The MAX_MR_REFS_PER_USER constant is promoted to module scope for reuse
across expert and overlap modes. Human output shows MR refs inline and detail
sub-rows when requested. Robot JSON includes mr_refs, mr_refs_total,
mr_refs_truncated, and optional details array.
Includes comprehensive tests for suffix resolution, scoring weight
configurability, MR ref aggregation across projects, and detail mode.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vector search multiplier could grow unbounded on documents with
many chunks, producing enormous k values that cause SQLite to scan
far more rows than necessary. Clamp the multiplier to [8, 200] and
cap k at 10,000 to prevent degenerate performance on large corpora.
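
The bounds can be expressed as in the sketch below. How the multiplier is derived from the chunk count is an assumption (here it is simply the maximum chunks per document); only the [8, 200] clamp and the 10,000 cap come from this change.

```rust
const MIN_MULTIPLIER: usize = 8;
const MAX_MULTIPLIER: usize = 200;
const MAX_K: usize = 10_000;

/// Computes the k passed to the vector search for a requested result `limit`.
/// The multiplier derivation is hypothetical; the bounds are the point.
fn knn_k(limit: usize, max_chunks_per_document: usize) -> usize {
    let multiplier = max_chunks_per_document.clamp(MIN_MULTIPLIER, MAX_MULTIPLIER);
    limit.saturating_mul(multiplier).min(MAX_K)
}
```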
Also adds a debug_assert in decode_rowid to catch negative rowids
early — these indicate a bug in the encoding pipeline and should
fail fast rather than silently produce garbage document IDs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds mr_diffs_fetched and mr_diffs_failed fields to IngestResult and
SyncResult, threads them through the orchestrator aggregation, includes
them in the structured tracing span and human-readable sync summary.
Previously MR diff failures were silently swallowed — now they appear
alongside resource event counts for full pipeline observability.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, drain_resource_events, drain_mr_closes_issues, and
drain_mr_diffs each opened a transaction only for the job-complete +
watermark update, but the store operation ran outside that transaction.
If the process crashed between the store and the watermark update, data
would be persisted without the watermark advancing, causing silent
duplicates on the next sync.
Now each drain function opens the transaction before the store call and
commits it only after both the store and the watermark update succeed.
On error, the transaction is explicitly dropped so the connection is
not left in a half-committed state.
Also:
- store_resource_events no longer manages its own transaction; the caller
passes in a connection (which is actually the transaction)
- upsert_mr_file_changes wraps DELETE + INSERT in a transaction internally
- reset_discussion_watermarks now also clears diffs_synced_for_updated_at
- Orchestrator error span now includes closes_issues_failed + mr_diffs_failed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>