Introduce a centralized token resolution system that supports both
environment variables and config-file-stored tokens with clear priority
(env var wins). This enables cron-based sync which runs in minimal
shell environments without env vars.
Core changes:
- GitLabConfig gains optional `token` field and `resolve_token()` method
that checks env var first, then config file, returning trimmed values
- `token_source()` returns human-readable provenance ("environment variable"
or "config file") for diagnostics
- `ensure_config_permissions()` enforces 0600 on config files containing
tokens (Unix only, no-op on other platforms)
New CLI commands:
- `lore token set [--token VALUE]` — validates against GitLab API, stores
in config, enforces file permissions. Supports flag, stdin pipe, or
interactive entry.
- `lore token show [--unmask]` — displays masked token with source label
Consumers updated to use resolve_token():
- auth_test: removes manual env var lookup
- doctor: shows token source in health check output
- ingest: uses centralized resolution
Includes 10 unit tests for resolve/source logic and 2 for mask_token.
Removes module-level doc comments (//! lines) and excessive inline doc
comments that were duplicating information already evident from:
- Function/struct names (self-documenting code)
- Type signatures (the what is clear from types)
- Implementation context (the how is clear from code)
Affected modules:
- cli/* - Removed command descriptions duplicating clap help text
- core/* - Removed module headers and obvious function docs
- documents/* - Removed extractor/regenerator/truncation docs
- embedding/* - Removed pipeline and chunking docs
- gitlab/* - Removed client and transformer docs (kept type definitions)
- ingestion/* - Removed orchestrator and ingestion docs
- search/* - Removed FTS and vector search docs
Philosophy: Code should be self-documenting. Comments should explain
"why" (business decisions, non-obvious constraints) not "what" (which
the code itself shows). This change reduces noise and maintenance burden
while keeping the codebase just as understandable.
Retains comments for:
- Non-obvious business logic
- Important safety invariants
- Complex algorithm explanations
- Public API boundaries where generated docs matter
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Introduce the foundational observability layer for the sync pipeline:
- MetricsLayer: Custom tracing subscriber layer that captures span timing
and structured fields, materializing them into a hierarchical
Vec<StageTiming> tree for robot-mode performance data output
- logging: Dual-layer subscriber infrastructure with configurable stderr
verbosity (-v/-vv/-vvv) and always-on JSON file logging with daily
rotation and configurable retention (default 30 days)
- SyncRunRecorder: Compile-time enforced lifecycle recorder for sync_runs
table (start -> succeed|fail), with correlation IDs and aggregate counts
- LoggingConfig: New config section for log_dir, retention_days, and
file_logging toggle
- get_log_dir(): Path helper for log directory resolution
- is_permanent_api_error(): Distinguish retryable vs permanent API failures
(only 404 is truly permanent; 403/auth errors may be environmental)
Database changes:
- Migration 013: Add resource_events_synced_for_updated_at watermark columns
to issues and merge_requests tables for incremental resource event sync
- Migration 014: Enrich sync_runs with run_id correlation ID, aggregate
counts (total_items_processed, total_errors), and run_id index
- Wrap file-based migrations in savepoints for rollback safety
Dependencies: Add uuid (run_id generation), tracing-appender (file logging)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>