Commit Graph

6 Commits

Author SHA1 Message Date
teernisse
eac640225f feat(core): add cursor persistence module for session-based timestamps
Introduces a lightweight file-based cursor system for persisting
per-user timestamps across CLI invocations. This enables "since last
check" semantics where `lore me` can track what the user has seen.

Key design decisions:
- Per-user cursor files: ~/.local/share/lore/me_cursor_<username>.json
- Atomic writes via temp-file + rename pattern (crash-safe)
- Graceful degradation: missing/corrupt files return None
- Username sanitization: non-safe chars replaced with underscore

The cursor module provides three operations:
- read_cursor(username) -> Option<i64>: read last-check timestamp
- write_cursor(username, timestamp_ms): atomically persist timestamp  
- reset_cursor(username): delete cursor file (no-op if missing)

Tests cover: missing file, roundtrip, per-user isolation, reset
isolation, JSON validity after overwrites, corrupt file handling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-25 10:02:13 -05:00
teernisse
30ed02c694 feat(token): add stored token support with resolve_token and token_source
Introduce a centralized token resolution system that supports both
environment variables and config-file-stored tokens with clear priority
(env var wins). This enables cron-based sync which runs in minimal
shell environments without env vars.

Core changes:
- GitLabConfig gains optional `token` field and `resolve_token()` method
  that checks env var first, then config file, returning trimmed values
- `token_source()` returns human-readable provenance ("environment variable"
  or "config file") for diagnostics
- `ensure_config_permissions()` enforces 0600 on config files containing
  tokens (Unix only, no-op on other platforms)

New CLI commands:
- `lore token set [--token VALUE]` — validates against GitLab API, stores
  in config, enforces file permissions. Supports flag, stdin pipe, or
  interactive entry.
- `lore token show [--unmask]` — displays masked token with source label

Consumers updated to use resolve_token():
- auth_test: removes manual env var lookup
- doctor: shows token source in health check output
- ingest: uses centralized resolution

Includes 10 unit tests for resolve/source logic and 2 for mask_token.
2026-02-18 16:27:48 -05:00
Taylor Eernisse
65583ed5d6 refactor: Remove redundant doc comments throughout codebase
Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)

Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs

Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.

Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 00:04:32 -05:00
teernisse
329c8f4539 feat(observability): Add metrics, logging, and sync-run core modules
Introduce the foundational observability layer for the sync pipeline:

- MetricsLayer: Custom tracing subscriber layer that captures span timing
  and structured fields, materializing them into a hierarchical
  Vec<StageTiming> tree for robot-mode performance data output
- logging: Dual-layer subscriber infrastructure with configurable stderr
  verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily
  rotation and configurable retention (default 30 days)
- SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs
  table (start -> succeed|fail), with correlation IDs and aggregate counts
- LoggingConfig: New config section for log_dir, retention_days, and
  file_logging toggle
- get_log_dir(): Path helper for log directory resolution
- is_permanent_api_error(): Distinguish retryable vs permanent API failures
  (only 404 is truly permanent; 403/auth errors may be environmental)

Database changes:
- Migration 013: Add resource_events_synced_for_updated_at watermark columns
  to issues and merge_requests tables for incremental resource event sync
- Migration 014: Enrich sync_runs with run_id correlation ID, aggregate
  counts (total_items_processed, total_errors), and run_id index
- Wrap file-based migrations in savepoints for rollback safety

Dependencies: Add uuid (run_id generation), tracing-appender (file logging)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 13:38:29 -05:00
teernisse
55b895a2eb Update name to gitlore instead of gitlab-inbox 2026-01-28 15:49:14 -05:00
Taylor Eernisse
7aaa51f645 feat(core): Implement infrastructure layer for CLI operations
Establishes foundational modules that all other components depend on.

src/core/config.rs - Configuration management:
- JSON-based config file with Zod-like validation via serde
- GitLab settings: base URL, token environment variable
- Project list with paths to track
- Sync settings: backfill days, stale lock timeout, cursor rewind
- Storage settings: database path, payload compression toggle
- XDG-compliant config path resolution via dirs crate
- Loads GITLAB_TOKEN from configured environment variable

src/core/db.rs - Database connection and migrations:
- Opens or creates SQLite database with WAL mode for concurrency
- Embeds migration SQL as const strings (001-005)
- Runs migrations idempotently with checksum verification
- Provides thread-safe connection management

src/core/error.rs - Unified error handling:
- GiError enum with variants for all failure modes
- Config, Database, GitLab, Ingestion, Lock, IO, Parse errors
- thiserror derive for automatic Display/Error impls
- Result type alias for ergonomic error propagation

src/core/lock.rs - Distributed sync locking:
- File-based locks to prevent concurrent syncs
- Stale lock detection with configurable timeout
- Force override for recovery scenarios
- Lock file contains PID and timestamp for debugging

src/core/paths.rs - Path resolution:
- XDG Base Directory Specification compliance
- Config: ~/.config/gi/config.json
- Data: ~/.local/share/gi/gi.db
- Creates parent directories on first access

src/core/payloads.rs - Raw payload storage:
- Optional gzip compression for storage efficiency
- SHA-256 content addressing for deduplication
- Type-prefixed keys (issue:, discussion:, note:)
- Batch insert with UPSERT for idempotent ingestion

src/core/time.rs - Timestamp utilities:
- Relative time parsing (7d, 2w, 1m) for --since flag
- ISO 8601 date parsing for absolute dates
- Human-friendly relative time formatting

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-26 11:28:07 -05:00