feat(who): expand expert + overlap queries with mr_file_changes and mr_reviewers
Chain: bd-jec (config flag) -> bd-2yo (fetch MR diffs) -> bd-3qn6 (rewrite who queries)

- Add fetch_mr_file_changes config option and --no-file-changes CLI flag
- Add GitLab MR diffs API fetch pipeline with watermark-based sync
- Create migration 020 for diffs_synced_for_updated_at watermark column
- Rewrite query_expert() and query_overlap() to use 4-signal UNION ALL:
  DiffNote reviewers, DiffNote MR authors, file-change authors, file-change reviewers
- Deduplicate across signal types via COUNT(DISTINCT CASE WHEN ... THEN mr_id END)
- Add insert_file_change test helper, 8 new who tests, all 397 tests pass
- Also includes: list performance migration 019, autocorrect module, README updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
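The deduplication idea named in the message can be illustrated with a small, self-contained sketch. It is not the project's actual query: the table names `diffnote_signals` and `file_change_signals`, their columns, and the helper `count_touches` are hypothetical, and only two tables stand in for the four signals. It shows how `COUNT(DISTINCT CASE WHEN ... THEN mr_id END)` counts each MR once per role even when several signals in a `UNION ALL` report the same user and MR.

```python
import sqlite3

# Hypothetical schema: each table holds (username, role, mr_id) rows
# produced by one family of signals.
SCHEMA = """
CREATE TABLE diffnote_signals (username TEXT, role TEXT, mr_id INTEGER);
CREATE TABLE file_change_signals (username TEXT, role TEXT, mr_id INTEGER);
"""

QUERY = """
WITH signals AS (
    SELECT username, role, mr_id FROM diffnote_signals
    UNION ALL
    SELECT username, role, mr_id FROM file_change_signals
)
SELECT username,
       COUNT(DISTINCT CASE WHEN role = 'author'   THEN mr_id END) AS author_mrs,
       COUNT(DISTINCT CASE WHEN role = 'reviewer' THEN mr_id END) AS review_mrs
FROM signals
GROUP BY username
ORDER BY username
"""


def count_touches(diffnote_rows, file_change_rows):
    """Return (username, author_mrs, review_mrs) with MRs counted once per role."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO diffnote_signals VALUES (?, ?, ?)", diffnote_rows)
        conn.executemany("INSERT INTO file_change_signals VALUES (?, ?, ?)", file_change_rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# alice reviewed MR 1 according to both signal families; it is counted once.
diffnotes = [("alice", "reviewer", 1), ("bob", "author", 1)]
file_changes = [("alice", "reviewer", 1), ("alice", "author", 2), ("bob", "author", 1)]
result = count_touches(diffnotes, file_changes)
```

Because the `CASE` expression yields `NULL` for rows of the other role and `COUNT(DISTINCT ...)` ignores `NULL`, a user with no reviews gets a review count of 0 rather than being dropped.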
194
README.md
@@ -1,6 +1,6 @@
# Gitlore

Local GitLab data management with semantic search and temporal intelligence. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, and chronological event reconstruction.
Local GitLab data management with semantic search, people intelligence, and temporal analysis. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, chronological event reconstruction, and expert discovery.

## Features
@@ -10,6 +10,7 @@ Local GitLab data management with semantic search and temporal intelligence. Syn
- **Multi-project**: Track issues and MRs across multiple GitLab projects
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
- **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
- **People intelligence**: Expert discovery, workload analysis, review patterns, active discussions, and code ownership overlap
- **Timeline pipeline**: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
- **Git history linking**: Tracks merge and squash commit SHAs to connect MRs with git history
- **File change tracking**: Records which files each MR touches, enabling file-level history queries
@@ -17,7 +18,7 @@ Local GitLab data management with semantic search and temporal intelligence. Syn
- **Discussion threading**: Full support for issue and MR discussions including inline code review comments
- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
- **Robot mode**: Machine-readable JSON output with structured errors and meaningful exit codes
- **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
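
The hybrid search feature listed above relies on Reciprocal Rank Fusion. As a rough illustration of that technique in general (not Gitlore's implementation), the sketch below merges two ranked lists of document IDs. The function name and the constant `k=60`, a commonly used default, are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; each list is ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


lexical_hits = ["a", "b", "c"]   # e.g. an FTS5 result order
vector_hits = ["b", "c", "d"]    # e.g. an embedding similarity order
fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
```

Documents that appear in both lists accumulate two contributions, so they tend to outrank documents that only one retriever found.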

## Installation
@@ -60,6 +61,15 @@ lore mrs 456
# Search across all indexed data
lore search "authentication bug"

# Who knows about this code area?
lore who src/features/auth/

# What is @asmith working on?
lore who @asmith

# Timeline of events related to deployments
lore timeline "deployment"

# Robot mode (machine-readable JSON)
lore -J issues -n 5 | jq .
```
@@ -256,8 +266,135 @@ lore search "deploy" --explain # Show ranking explanation per resu
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
```

The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. Use `raw` for advanced FTS5 query syntax (AND, OR, NOT, phrase matching, prefix queries).
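
One plausible way to sanitize free-form input for FTS5 is to quote every whitespace-separated token as an FTS5 string, so operators and punctuation lose their special meaning. The sketch below shows only that idea. The helper name `to_safe_fts5` is hypothetical, and the actual `safe` mode (including its fallback behavior) may work differently.

```python
def to_safe_fts5(user_input):
    """Quote each token so FTS5 treats it as a literal string.

    Inside an FTS5 string, an embedded double quote is escaped by doubling it.
    """
    tokens = user_input.split()
    quoted = ['"' + token.replace('"', '""') + '"' for token in tokens]
    # Space-separated strings are implicitly ANDed together by FTS5.
    return " ".join(quoted)


safe_query = to_safe_fts5("auth AND (bug")
```

Here `AND` and the stray parenthesis become ordinary search terms instead of query syntax, which is the kind of input that would otherwise raise a syntax error in raw mode.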

Requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.

### `lore who`

People intelligence: discover experts, analyze workloads, review patterns, active discussions, and code overlap.

#### Expert Mode

Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis).

```bash
lore who src/features/auth/ # Who knows about this directory?
lore who src/features/auth/login.ts # Who knows about this file?
lore who --path README.md # Root files need --path flag
lore who --path Makefile # Dotless root files too
lore who src/ --since 3m # Limit to recent 3 months
lore who src/ -p group/repo # Scope to project
```

The target is auto-detected as a path when it contains `/`. For root files without `/` (e.g., `README.md`), use the `--path` flag. Default time window: 6 months.
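
The detection rule can be sketched as a small classifier. This is an illustration only: `classify_target` and its return values are hypothetical, and treating a leading `@` as a username is inferred from the `lore who @asmith` examples in this README rather than from the implementation.

```python
def classify_target(target, force_path=False):
    """Guess how a `lore who` target would be interpreted.

    force_path mirrors passing the target via --path.
    """
    if force_path or "/" in target:
        return "path"
    if target.startswith("@"):
        return "user"
    # No slash, no @, no --path: this sketch does not guess further.
    return "unknown"


kinds = [
    classify_target("src/features/auth/"),
    classify_target("README.md"),
    classify_target("README.md", force_path=True),
    classify_target("@asmith"),
]
```

The second case shows why root files need `--path`: without a `/` there is nothing in the string itself that marks it as a path.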

#### Workload Mode

See what someone is currently working on.

```bash
lore who @asmith # Full workload summary
lore who @asmith -p group/repo # Scoped to one project
```

Shows: assigned open issues, authored MRs, MRs under review, and unresolved discussions.

#### Reviews Mode

Analyze someone's code review patterns by area.

```bash
lore who @asmith --reviews # Review activity breakdown
lore who @asmith --reviews --since 3m # Recent review patterns
```

Shows: total DiffNotes, categorized by code area with percentage breakdown.

#### Active Mode

Surface unresolved discussions needing attention.

```bash
lore who --active # Unresolved discussions (last 7 days)
lore who --active --since 30d # Wider time window
lore who --active -p group/repo # Scoped to project
```

Shows: discussion threads with participants and last activity timestamps.

#### Overlap Mode

Find who else is touching a file or directory.

```bash
lore who --overlap src/features/auth/ # Who else works here?
lore who --overlap src/lib.rs # Single file overlap
```

Shows: users with touch counts (author vs. review), linked MR references. Default time window: 6 months.

#### Common Flags

| Flag | Description |
|------|-------------|
| `-p` / `--project` | Scope to a project (fuzzy match) |
| `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
| `-n` / `--limit` | Max results per section (1-500, default 20) |
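
The `--since` formats in the table can be parsed along these lines. This is a sketch under stated assumptions, not the tool's parser: `parse_since` is a hypothetical name, and a month is approximated as 30 days here because the README does not say how `m` is converted.

```python
import re
from datetime import date, datetime, timedelta

# Assumption: months are approximated as 30 days in this sketch.
_UNIT_DAYS = {"d": 1, "w": 7, "m": 30}


def parse_since(value, today):
    """Turn '7d', '2w', '6m', or 'YYYY-MM-DD' into a cutoff date."""
    match = re.fullmatch(r"(\d+)([dwm])", value)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        return today - timedelta(days=count * _UNIT_DAYS[unit])
    # Anything else must be an absolute date; strptime raises ValueError otherwise.
    return datetime.strptime(value, "%Y-%m-%d").date()


reference_day = date(2024, 3, 15)
cutoffs = {text: parse_since(text, reference_day) for text in ("7d", "2w", "2024-01-31")}
```

Passing `today` explicitly keeps the function deterministic, which makes relative windows easy to test.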

### `lore timeline`

Reconstruct a chronological timeline of events matching a keyword query. The pipeline discovers related entities through cross-reference graph traversal and assembles a unified, time-ordered event stream.

```bash
lore timeline "deployment" # Events related to deployments
lore timeline "auth" -p group/repo # Scoped to a project
lore timeline "auth" --since 30d # Only recent events
lore timeline "migration" --depth 2 # Deeper cross-reference expansion
lore timeline "migration" --expand-mentions # Follow 'mentioned' edges (high fan-out)
lore timeline "deploy" -n 50 # Limit event count
lore timeline "auth" --max-seeds 5 # Fewer seed entities
```

#### Flags

| Flag | Default | Description |
|------|---------|-------------|
| `-p` / `--project` | all | Scope to a specific project (fuzzy match) |
| `--since` | none | Only events after this date (7d, 2w, 6m, YYYY-MM-DD) |
| `--depth` | `1` | Cross-reference expansion depth (0 = seeds only) |
| `--expand-mentions` | off | Also follow "mentioned" edges during expansion |
| `-n` / `--limit` | `100` | Maximum events to display |
| `--max-seeds` | `10` | Maximum seed entities from search |
| `--max-entities` | `50` | Maximum entities discovered via cross-references |
| `--max-evidence` | `10` | Maximum evidence notes included |

#### Pipeline Stages

1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents are ranked by BM25 relevance.
2. **HYDRATE** -- Evidence notes are extracted: the top FTS-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and optionally "mentioned" references up to the configured depth.
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking.
5. **RENDER** -- Events are formatted as human-readable text or structured JSON (robot mode).
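
The EXPAND stage can be pictured as a depth-limited breadth-first search. The sketch below is a simplified model, not the pipeline's code: the function `expand`, the dictionary-based graph, and the entity labels are hypothetical, and edges are followed in one direction only.

```python
from collections import deque


def expand(seeds, references, max_depth=1, follow_mentions=False):
    """Return {entity: depth} for everything reachable within max_depth hops.

    references maps an entity to a list of (target, reference_kind) pairs.
    """
    allowed = {"closes", "related"}
    if follow_mentions:
        allowed.add("mentioned")  # mirrors --expand-mentions
    depth_of = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        entity = queue.popleft()
        if depth_of[entity] >= max_depth:
            continue  # depth 0 therefore means seeds only
        for target, kind in references.get(entity, []):
            if kind in allowed and target not in depth_of:
                depth_of[target] = depth_of[entity] + 1
                queue.append(target)
    return depth_of


graph = {
    "issue#1": [("mr#2", "closes"), ("issue#9", "mentioned")],
    "mr#2": [("issue#3", "related")],
}
one_hop = expand(["issue#1"], graph, max_depth=1)
two_hops = expand(["issue#1"], graph, max_depth=2)
```

Recording the depth at which each entity was first seen guarantees every entity is visited once and makes the `--depth` cutoff a simple comparison.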

#### Event Types

| Event | Description |
|-------|-------------|
| `Created` | Entity creation |
| `StateChanged` | State transitions (opened, closed, reopened) |
| `LabelAdded` | Label applied to entity |
| `LabelRemoved` | Label removed from entity |
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
| `NoteEvidence` | Discussion note matched by FTS, with snippet |
| `CrossReferenced` | Reference to another entity |

#### Unresolved References

When graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the output. This enables discovery of external dependencies and can inform future sync targets.

### `lore sync`

Run the full sync pipeline: ingest from GitLab, generate searchable documents, and compute embeddings.
@@ -269,6 +406,7 @@ lore sync --force # Override stale lock
lore sync --no-embed # Skip embedding step
lore sync --no-docs # Skip document regeneration
lore sync --no-events # Skip resource event fetching
lore sync --dry-run # Preview what would be synced
```

The sync command displays animated progress bars for each stage and outputs timing metrics on completion. In robot mode (`-J`), detailed stage timing is included in the JSON response.
@@ -284,6 +422,7 @@ lore ingest mrs # MRs only
lore ingest issues -p group/repo # Single project
lore ingest --force # Override stale lock
lore ingest --full # Full re-sync (reset cursors)
lore ingest --dry-run # Preview what would change
```

The `--full` flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:
@@ -307,6 +446,7 @@ Generate vector embeddings for documents via Ollama. Requires Ollama running wit

```bash
lore embed # Embed new/changed documents
lore embed --full # Re-embed all documents (clears existing)
lore embed --retry-failed # Retry previously failed embeddings
```

@@ -322,6 +462,9 @@ lore count discussions --for issue # Issue discussions only
lore count discussions --for mr # MR discussions only
lore count notes # Total notes (system vs user breakdown)
lore count notes --for issue # Issue notes only
lore count events # Total resource events
lore count events --for issue # Issue events only
lore count events --for mr # MR events only
```

### `lore stats`
@@ -332,6 +475,7 @@ Show document and index statistics, with optional integrity checks.
lore stats # Document and index statistics
lore stats --check # Run integrity checks
lore stats --check --repair # Repair integrity issues
lore stats --dry-run # Preview repairs without saving
```

### `lore status`
@@ -357,6 +501,14 @@ lore init --force # Overwrite existing config
lore init --non-interactive # Fail if prompts needed
```

In robot mode, `init` supports non-interactive setup via flags:

```bash
lore -J init --gitlab-url https://gitlab.com \
  --token-env-var GITLAB_TOKEN \
  --projects "group/project,other/project"
```

### `lore auth`

Verify GitLab authentication is working.
@@ -392,7 +544,7 @@ lore migrate

### `lore health`

Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 1 if unhealthy.
Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 19 if unhealthy.

```bash
lore health
@@ -591,42 +743,6 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:

The database is stored at `~/.local/share/lore/lore.db` by default (XDG compliant).

## Timeline Pipeline

The timeline pipeline reconstructs chronological event histories for GitLab entities by combining full-text search, cross-reference graph traversal, and resource event aggregation. Given a search query, it identifies relevant issues and MRs, discovers related entities through their reference graph, and assembles a unified, time-ordered event stream.

### Stages

The pipeline executes in five stages:

1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents (issue bodies, MR descriptions, discussion notes) are ranked by BM25 relevance.

2. **HYDRATE** -- Evidence notes are extracted from the seed results: the top FTS-matched discussion notes with 200-character snippets that explain *why* each entity was surfaced.

3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities. Starting from seed entities, the pipeline follows "closes", "related", and optionally "mentioned" references up to a configurable depth, tracking provenance (which entity referenced which, via what method).

4. **COLLECT** -- Events are gathered for all discovered entities (seeds + expanded). Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking (timestamp, then entity ID, then event type).

5. **RENDER** -- Events are formatted for output as human-readable text or structured JSON.

### Event Types

| Event | Description |
|-------|-------------|
| `Created` | Entity creation |
| `StateChanged` | State transitions (opened, closed, reopened) |
| `LabelAdded` | Label applied to entity |
| `LabelRemoved` | Label removed from entity |
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
| `NoteEvidence` | Discussion note matched by FTS, with snippet |
| `CrossReferenced` | Reference to another entity |

### Unresolved References

When the graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the pipeline output. This enables discovery of external dependencies and can inform future sync targets.

## Development

```bash