docs: update README and beads tracker state

Update README with documentation for surgical sync, token management, code provenance tracing, file-level history, cron scheduling, and configurable icon system. Add usage examples and environment variables. Update beads issue tracker state.
2026-02-18 16:28:21 -05:00
parent 9ec1344945
commit 53b093586b
3 changed files with 132 additions and 27 deletions
--- a/.beads/issues.jsonl
+++ b/.beads/issues.jsonl
--- a/.beads/last-touched
+++ b/.beads/last-touched
@@ -1 +1 @@
-bd-1elx
+bd-9dd
--- a/README.md
+++ b/README.md
@@ -12,6 +12,9 @@ Local GitLab data management with semantic search, people intelligence, and temp
 - **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
 - **People intelligence**: Expert discovery, workload analysis, review patterns, active discussions, and code ownership overlap
 - **Timeline pipeline**: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
+- **Code provenance tracing**: Traces why code was introduced by linking files to MRs, MRs to issues, and issues to discussion threads
+- **File-level history**: Shows which MRs touched a file with rename-chain resolution and inline DiffNote snippets
+- **Surgical sync**: Sync specific issues or MRs by IID without running a full incremental sync, with preflight validation
 - **Git history linking**: Tracks merge and squash commit SHAs to connect MRs with git history
 - **File change tracking**: Records which files each MR touches, enabling file-level history queries
 - **Raw payload storage**: Preserves original GitLab API responses for debugging
@@ -21,9 +24,12 @@ Local GitLab data management with semantic search, people intelligence, and temp
 - **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
 - **Note querying**: Rich filtering over discussion notes by author, type, path, resolution status, time range, and body content
 - **Discussion drift detection**: Semantic analysis of how discussions diverge from original issue intent
+- **Automated sync scheduling**: Cron-based automatic syncing with configurable intervals (Unix)
+- **Token management**: Secure interactive or piped token storage with masked display
 - **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
 - **Error tolerance**: Auto-corrects common CLI mistakes (case, typos, single-dash flags, value casing) with teaching feedback
 - **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
+- **Icon system**: Configurable icon sets (Nerd Fonts, Unicode, ASCII) with automatic detection

 ## Installation

@@ -77,6 +83,15 @@ lore timeline "deployment"
 # Timeline for a specific issue
 lore timeline issue:42

+# Why was this file changed? (file -> MR -> issue -> discussion)
+lore trace src/features/auth/login.ts
+
+# Which MRs touched this file?
+lore file-history src/features/auth/
+
+# Sync a specific issue without full sync
+lore sync --issue 42 -p group/repo
+
 # Query notes by author
 lore notes --author alice --since 7d

@@ -190,6 +205,8 @@ Create a personal access token with `read_api` scope:
 | `XDG_DATA_HOME` | XDG Base Directory for data (fallback: `~/.local/share`) | No |
 | `NO_COLOR` | Disable color output when set (any value) | No |
 | `CLICOLOR` | Standard color control (0 to disable) | No |
+| `LORE_ICONS` | Override icon set: `nerd`, `unicode`, or `ascii` | No |
+| `NERD_FONTS` | Enable Nerd Font icons when set to a non-empty value | No |
 | `RUST_LOG` | Logging level filter (e.g., `lore=debug`) | No |

 ## Commands
@@ -353,12 +370,13 @@ Shows: total DiffNotes, categorized by code area with percentage breakdown.

 #### Active Mode

-Surface unresolved discussions needing attention.
+Surface unresolved discussions needing attention. By default, only discussions on open issues and non-merged MRs are shown.

 ```bash
 lore who --active                     # Unresolved discussions (last 7 days)
 lore who --active --since 30d        # Wider time window
 lore who --active -p group/repo      # Scoped to project
+lore who --active --include-closed   # Include discussions on closed/merged entities
 ```

 Shows: discussion threads with participants and last activity timestamps.
@@ -382,6 +400,7 @@ Shows: users with touch counts (author vs. review), linked MR references. Defaul
 | `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
 | `-n` / `--limit` | Max results per section (1-500, default 20) |
 | `--all-history` | Remove the default time window, query all history |
+| `--include-closed` | Include discussions on closed issues and merged/closed MRs (active mode) |
 | `--detail` | Show per-MR detail breakdown (expert mode only) |
 | `--explain-score` | Show per-component score breakdown (expert mode only) |
 | `--as-of` | Score as if "now" is a past date (ISO 8601 or duration like 30d, expert mode only) |
@@ -465,8 +484,6 @@ lore notes --contains "TODO"                 # Substring search in note body
 lore notes --include-system                  # Include system-generated notes
 lore notes --since 2w --until 2024-12-31     # Time-bounded range
 lore notes --sort updated --asc              # Sort by update time, ascending
-lore notes --format csv                      # CSV output
-lore notes --format jsonl                    # Line-delimited JSON
 lore notes -o                                # Open first result in browser

 # Field selection (robot mode)
@@ -493,9 +510,52 @@ lore -J notes --fields minimal               # Compact: id, author_username, bod
 | `--resolution` | Filter by resolution status (`any`, `unresolved`, `resolved`) |
 | `--sort` | Sort by `created` (default) or `updated` |
 | `--asc` | Sort ascending (default: descending) |
-| `--format` | Output format: `table` (default), `json`, `jsonl`, `csv` |
 | `-o` / `--open` | Open first result in browser |

+### `lore file-history`
+
+Show which merge requests touched a file, with rename-chain resolution and optional DiffNote discussion snippets.
+
+```bash
+lore file-history src/main.rs                         # MRs that touched this file
+lore file-history src/auth/ -p group/repo             # Scoped to project
+lore file-history src/foo.rs --discussions             # Include DiffNote snippets
+lore file-history src/bar.rs --no-follow-renames       # Skip rename chain resolution
+lore file-history src/bar.rs --merged                  # Only merged MRs
+lore file-history src/bar.rs -n 100                    # More results
+```
+
+Rename-chain resolution follows file renames through `mr_file_changes` so that querying a renamed file also surfaces MRs that touched previous names. Disable with `--no-follow-renames`.
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `-p` / `--project` | all | Scope to a specific project (fuzzy match) |
+| `--discussions` | off | Include DiffNote discussion snippets on the file |
+| `--no-follow-renames` | off | Disable rename chain resolution |
+| `--merged` | off | Only show merged MRs |
+| `-n` / `--limit` | `50` | Maximum results |
+
+### `lore trace`
+
+Trace why code was introduced by building provenance chains: file -> MR -> issue -> discussion threads.
+
+```bash
+lore trace src/main.rs                         # Why was this file changed?
+lore trace src/auth/ -p group/repo             # Scoped to project
+lore trace src/foo.rs --discussions             # Include DiffNote context
+lore trace src/bar.rs:42                        # Line hint (future Tier 2)
+lore trace src/bar.rs --no-follow-renames       # Skip rename chain resolution
+```
+
+Each trace chain links a file change to the MR that introduced it, the issue(s) that motivated it (via "closes" references), and the discussion threads on those entities. Line-level hints (`:line` suffix) are accepted but produce an advisory message until Tier 2 git-blame integration is available.
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `-p` / `--project` | all | Scope to a specific project (fuzzy match) |
+| `--discussions` | off | Include DiffNote discussion snippets |
+| `--no-follow-renames` | off | Disable rename chain resolution |
+| `-n` / `--limit` | `20` | Maximum trace chains to display |
+
 ### `lore drift`

 Detect discussion divergence from the original intent of an issue by comparing the semantic similarity of discussion content against the issue description.
@@ -506,9 +566,34 @@ lore drift issues 42 --threshold 0.6        # Higher threshold (stricter)
 lore drift issues 42 -p group/repo          # Scope to project
 ```

+### `lore cron`
+
+Manage cron-based automatic syncing (Unix only). Installs a crontab entry that runs `lore sync --lock -q` at a configurable interval.
+
+```bash
+lore cron install                     # Install cron job (every 8 minutes)
+lore cron install --interval 15       # Custom interval in minutes
+lore cron status                      # Check if cron is installed
+lore cron uninstall                   # Remove cron job
+```
+
+The `--lock` flag ensures that if a sync is already running, the cron invocation exits cleanly rather than competing for the database lock.
+
+### `lore token`
+
+Manage the stored GitLab token. Supports interactive entry with validation, non-interactive piped input, and masked display.
+
+```bash
+lore token set                         # Interactive token entry + validation
+lore token set --token glpat-xxx       # Non-interactive token storage
+echo glpat-xxx | lore token set        # Pipe token from stdin
+lore token show                        # Show token (masked)
+lore token show --unmask               # Show full token
+```
+
 ### `lore sync`

-Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings.
+Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings. Supports both incremental (cursor-based) and surgical (per-IID) modes.

 ```bash
 lore sync                    # Full pipeline
@@ -518,11 +603,29 @@ lore sync --no-embed         # Skip embedding step
 lore sync --no-docs          # Skip document regeneration
 lore sync --no-events        # Skip resource event fetching
 lore sync --no-file-changes  # Skip MR file change fetching
+lore sync --no-status        # Skip work-item status enrichment via GraphQL
 lore sync --dry-run          # Preview what would be synced
+lore sync --timings          # Show detailed timing breakdown per stage
+lore sync --lock             # Acquire file lock (skip if another sync is running)
+
+# Surgical sync: fetch specific entities by IID
+lore sync --issue 42 -p group/repo               # Sync a single issue
+lore sync --mr 99 -p group/repo                  # Sync a single MR
+lore sync --issue 42 --mr 99 -p group/repo       # Mix issues and MRs
+lore sync --issue 1 --issue 2 -p group/repo      # Multiple issues
+lore sync --issue 42 -p group/repo --preflight-only  # Validate without writing
 ```

 The sync command displays animated progress bars for each stage and outputs timing metrics on completion. In robot mode (`-J`), detailed stage timing is included in the JSON response.

+#### Surgical Sync
+
+When `--issue` or `--mr` flags are provided, sync switches to surgical mode, which fetches only the specified entities and their dependents (discussions, events, file changes) from GitLab. This is faster than a full incremental sync and useful for refreshing specific entities on demand.
+
+Surgical mode requires `-p` / `--project` to scope the operation. Each entity goes through preflight validation against the GitLab API, then ingestion, document regeneration, and embedding. Entities that haven't changed since the last sync are skipped (TOCTOU check).
+
+Use `--preflight-only` to validate that entities exist on GitLab without writing to the database.
+
 ### `lore ingest`

 Sync data from GitLab to local database. Runs only the ingestion step (no doc generation or embeddings). For issue ingestion, this includes a status enrichment phase that fetches work item statuses via the GitLab GraphQL API.
@@ -753,7 +856,7 @@ The CLI auto-corrects common mistakes before parsing, emitting a teaching note t
 |-----------|---------|------|
 | Single-dash long flag | `-robot` -> `--robot` | All |
 | Case normalization | `--Robot` -> `--robot` | All |
-| Flag prefix expansion | `--proj` -> `--project` (unambiguous only) | All |
+| Flag prefix expansion | `--proj` -> `--project`, `--no-color` -> `--color never` (unambiguous only) | All |
 | Fuzzy flag match | `--projct` -> `--project` | All (threshold 0.9 in robot, 0.8 in human) |
 | Subcommand alias | `merge_requests` -> `mrs`, `robotdocs` -> `robot-docs` | All |
 | Value normalization | `--state Opened` -> `--state opened` | All |
@@ -785,7 +888,7 @@ Commands accept aliases for common variations:
 | `stats` | `stat` |
 | `status` | `st` |

-Unambiguous prefixes also work via subcommand inference (e.g., `lore iss` -> `lore issues`, `lore time` -> `lore timeline`).
+Unambiguous prefixes also work via subcommand inference (e.g., `lore iss` -> `lore issues`, `lore time` -> `lore timeline`, `lore tra` -> `lore trace`).

 ### Agent Self-Discovery

@@ -840,6 +943,8 @@ lore --robot <command>                    # Machine-readable JSON
 lore -J <command>                         # JSON shorthand
 lore --color never <command>              # Disable color output
 lore --color always <command>             # Force color output
+lore --icons nerd <command>               # Nerd Font icons
+lore --icons ascii <command>              # ASCII-only icons (no Unicode)
 lore -q <command>                         # Suppress non-essential output
 lore -v <command>                         # Debug logging
 lore -vv <command>                        # More verbose debug logging
@@ -847,7 +952,7 @@ lore -vvv <command>                       # Trace-level logging
 lore --log-format json <command>          # JSON-formatted log output to stderr
 ```

-Color output respects `NO_COLOR` and `CLICOLOR` environment variables in `auto` mode (the default).
+Color output respects `NO_COLOR` and `CLICOLOR` environment variables in `auto` mode (the default). Icon sets default to `unicode` and can be overridden via the `--icons` flag or the `LORE_ICONS` and `NERD_FONTS` environment variables.

 ## Shell Completions

@@ -895,7 +1000,7 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
 | `embeddings` | Vector embeddings for semantic search |
 | `dirty_sources` | Entities needing document regeneration after ingest |
 | `pending_discussion_fetches` | Queue for discussion fetch operations |
-| `sync_runs` | Audit trail of sync operations |
+| `sync_runs` | Audit trail of sync operations (supports surgical mode tracking with per-entity results) |
 | `sync_cursors` | Cursor positions for incremental sync |
 | `app_locks` | Crash-safe single-flight lock |
 | `raw_payloads` | Compressed original API responses |