Commit Graph

7 Commits

Author SHA1 Message Date
Taylor Eernisse
a7f86b26e4 refactor(core): compact human log format, quieter lock lifecycle, nonzero_summary helper
Three quality-of-life improvements to reduce log noise and improve readability:

1. logging.rs: Add CompactHumanFormat for stderr tracing output. Replaces the
   default format with a minimal 'HH:MM:SS LEVEL  message key=value' layout —
   no span context, no full timestamps, no target module. The JSON file log
   layer is unaffected. This makes watching 'lore sync' output much cleaner.

2. lock.rs: Downgrade AppLock acquire/release messages from info! to debug!.
   Lock lifecycle events (acquired new, acquired existing, released) are
   operational bookkeeping that clutters normal output. They remain visible
   at -vv verbosity for troubleshooting.

3. ingestion/mod.rs: Add nonzero_summary() utility that formats named counters
   as a compact middle-dot-separated string, suppressing zero values. Produces
   output like '42 fetched · 3 labels · 12 notes' instead of verbose key=value
   structured fields. Returns 'nothing to update' when all values are zero.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 22:31:30 -05:00
Taylor Eernisse
cc11d3e5a0 fix: peer review — 5 correctness bugs across who, db, lock, embedding, main
Comprehensive peer code review identified and fixed the following:

1. who.rs: @-prefixed path routing used `target` (with @) instead of
   `clean` (stripped) when checking for '/' and passing to Expert mode,
   causing `lore who @src/auth/` to silently return zero results because
   the SQL LIKE matched against `@src/auth/%` which never exists.

2. db.rs: After ROLLBACK TO savepoint on migration failure, the savepoint
   was never RELEASEd, leaving it active on the connection. Fixed in both
   run_migrations() and run_migrations_from_dir().

3. lock.rs: Multiple acquire() calls (e.g. re-acquiring a stale lock)
   replaced the heartbeat_handle without stopping the old thread, causing
   two concurrent heartbeat writers competing on the same lock row. Now
   signals the old thread to stop and joins it before spawning a new one.

4. chunk_ids.rs: encode_rowid() had no guard for chunk_index >= 1000
   (CHUNK_ROWID_MULTIPLIER), which would cause rowid collisions between
   adjacent documents. Added range assertion [0, 1000).

5. main.rs: Fallback JSON error formatting in handle_auth_test
   interpolated LoreError Display output without escaping quotes or
   backslashes, potentially producing malformed JSON for robot-mode
   consumers. Now escapes both characters before interpolation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 08:07:59 -05:00
Taylor Eernisse
5786d7f4b6 fix: defensive hardening — lock release logging, SQLite param guard, vector cast
Three defensive improvements found via peer code review:

1. lock.rs: Lock release errors were silently discarded with `let _ =`.
   If the DELETE failed (disk full, corruption), the lock stayed in the
   database with no diagnostic. Next sync would require --force with no
   clue why. Now logs with error!() including the underlying error message.

2. filters.rs: Dynamic SQL label filter construction had no upper bound
   on bind parameters. With many combined filters, param_idx + labels.len()
   could exceed SQLite's 999-parameter limit, producing an opaque error.
   Added a guard that caps labels at 900 - param_idx.

3. vector.rs: max_chunks_per_document returned i64 which was cast to
   usize. A negative value from a corrupt database would wrap to a huge
   number, causing overflow in the multiplier calculation. Now clamped
   to .max(1) and cast via unsigned_abs().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 07:55:54 -05:00
Taylor Eernisse
65583ed5d6 refactor: Remove redundant doc comments throughout codebase
Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)

Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs

Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.

Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:32 -05:00
Taylor Eernisse
6e22f120d0 refactor(core): Rename GiError to LoreError and add search infrastructure
Mechanical rename of GiError -> LoreError across the core module to
match the project's rebranding from gitlab-inbox to gitlore/lore.
Updates the error enum name, all From impls, and the Result type alias.

Additionally introduces:

- New error variants for embedding pipeline: OllamaUnavailable,
  OllamaModelNotFound, EmbeddingFailed, EmbeddingsNotBuilt. Each
  includes actionable suggestions (e.g., "ollama serve", "ollama pull
  nomic-embed-text") to guide users through recovery.

- New error codes 14-16 for programmatic handling of Ollama failures.

- Savepoint-based migration execution in db.rs: each migration now
  runs inside a SQLite SAVEPOINT so a failed migration rolls back
  cleanly without corrupting the schema_version tracking. Previously
  a partial migration could leave the database in an inconsistent
  state.

- core::backoff module: exponential backoff with jitter utility for
  retry loops in the embedding pipeline and discussion queues.

- core::project module: helper for resolving project IDs and paths
  from the local database, used by the document regenerator and
  search filters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 15:45:54 -05:00
Taylor Eernisse
5fe76e46a3 fix(core): Add structured error handling and responsive lock release
Improves core infrastructure with robot-friendly error output and
faster lock release for better sync behavior.

Error handling improvements (error.rs):
- ErrorCode::exit_code(): Unique exit codes per error type (1-13)
  for programmatic error handling in scripts/agents
- GiError::suggestion(): Helpful hints for common error recovery
- GiError::to_robot_error(): Structured JSON error conversion
- RobotError/RobotErrorOutput: Serializable error types with code,
  message, and optional suggestion fields

Lock improvements (lock.rs):
- Heartbeat thread now polls every 100ms for release flag, only
  updating database heartbeat at full interval (5s default)
- Eliminates 5-10s delay after sync completion when waiting for
  heartbeat thread to notice release
- Reduces lock hold time after operation completes

Database (db.rs):
- Bump expected schema version to 6 for MR migration

The exit code mapping enables shell scripts and CI/CD pipelines to
distinguish between configuration errors (2-4), GitLab API errors
(5-8), and database errors (9-11) for appropriate retry/alert logic.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 22:46:08 -05:00
Taylor Eernisse
7aaa51f645 feat(core): Implement infrastructure layer for CLI operations
Establishes foundational modules that all other components depend on.

src/core/config.rs - Configuration management:
- JSON-based config file with Zod-like validation via serde
- GitLab settings: base URL, token environment variable
- Project list with paths to track
- Sync settings: backfill days, stale lock timeout, cursor rewind
- Storage settings: database path, payload compression toggle
- XDG-compliant config path resolution via dirs crate
- Loads GITLAB_TOKEN from configured environment variable

src/core/db.rs - Database connection and migrations:
- Opens or creates SQLite database with WAL mode for concurrency
- Embeds migration SQL as const strings (001-005)
- Runs migrations idempotently with checksum verification
- Provides thread-safe connection management

src/core/error.rs - Unified error handling:
- GiError enum with variants for all failure modes
- Config, Database, GitLab, Ingestion, Lock, IO, Parse errors
- thiserror derive for automatic Display/Error impls
- Result type alias for ergonomic error propagation

src/core/lock.rs - Distributed sync locking:
- File-based locks to prevent concurrent syncs
- Stale lock detection with configurable timeout
- Force override for recovery scenarios
- Lock file contains PID and timestamp for debugging

src/core/paths.rs - Path resolution:
- XDG Base Directory Specification compliance
- Config: ~/.config/gi/config.json
- Data: ~/.local/share/gi/gi.db
- Creates parent directories on first access

src/core/payloads.rs - Raw payload storage:
- Optional gzip compression for storage efficiency
- SHA-256 content addressing for deduplication
- Type-prefixed keys (issue:, discussion:, note:)
- Batch insert with UPSERT for idempotent ingestion

src/core/time.rs - Timestamp utilities:
- Relative time parsing (7d, 2w, 1m) for --since flag
- ISO 8601 date parsing for absolute dates
- Human-friendly relative time formatting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 11:28:07 -05:00