diff --git a/README.md b/README.md
index 0297558..922243f 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Gitlore
 
-Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, and hybrid search.
+Local GitLab data management with semantic search and temporal intelligence. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, and chronological event reconstruction.
 
 ## Features
 
@@ -10,6 +10,9 @@ Local GitLab data management with semantic search. Syncs issues, MRs, discussion
 - **Multi-project**: Track issues and MRs across multiple GitLab projects
 - **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
 - **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
+- **Timeline pipeline**: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
+- **Git history linking**: Tracks merge and squash commit SHAs to connect MRs with git history
+- **File change tracking**: Records which files each MR touches, enabling file-level history queries
 - **Raw payload storage**: Preserves original GitLab API responses for debugging
 - **Discussion threading**: Full support for issue and MR discussions including inline code review comments
 - **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
@@ -518,7 +521,7 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
 |-------|---------|
 | `projects` | Tracked GitLab projects with metadata |
 | `issues` | Issue metadata (title, state, author, due date, milestone) |
-| `merge_requests` | MR metadata (title, state, draft, branches, merge status) |
+| `merge_requests` | MR metadata (title, state, draft, branches, merge status, commit SHAs) |
 | `milestones` | Project milestones with state and due dates |
 | `labels` | Project labels with colors |
 | `issue_labels` | Many-to-many issue-label relationships |
@@ -526,6 +529,7 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
 | `mr_labels` | Many-to-many MR-label relationships |
 | `mr_assignees` | Many-to-many MR-assignee relationships |
 | `mr_reviewers` | Many-to-many MR-reviewer relationships |
+| `mr_file_changes` | Files touched by each MR (path, change type, renames) |
 | `discussions` | Issue/MR discussion threads |
 | `notes` | Individual notes within discussions (with system note flag and DiffNote position data) |
 | `resource_state_events` | Issue/MR state change history (opened, closed, merged, reopened) |
@@ -545,6 +549,42 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
 
 The database is stored at `~/.local/share/lore/lore.db` by default (XDG compliant).
 
+## Timeline Pipeline
+
+The timeline pipeline reconstructs chronological event histories for GitLab entities by combining full-text search, cross-reference graph traversal, and resource event aggregation. Given a search query, it identifies relevant issues and MRs, discovers related entities through their reference graph, and assembles a unified, time-ordered event stream.
+
+### Stages
+
+The pipeline executes in five stages:
+
+1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents (issue bodies, MR descriptions, discussion notes) are ranked by BM25 relevance.
+
+2. **HYDRATE** -- Evidence notes are extracted from the seed results: the top FTS-matched discussion notes with 200-character snippets that explain *why* each entity was surfaced.
+
+3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities. Starting from seed entities, the pipeline follows "closes", "related", and optionally "mentioned" references up to a configurable depth, tracking provenance (which entity referenced which, via what method).
+
+4. **COLLECT** -- Events are gathered for all discovered entities (seeds + expanded). Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking (timestamp, then entity ID, then event type).
+
+5. **RENDER** -- Events are formatted for output as human-readable text or structured JSON.
+
+### Event Types
+
+| Event | Description |
+|-------|-------------|
+| `Created` | Entity creation |
+| `StateChanged` | State transitions (opened, closed, reopened) |
+| `LabelAdded` | Label applied to entity |
+| `LabelRemoved` | Label removed from entity |
+| `MilestoneSet` | Milestone assigned |
+| `MilestoneRemoved` | Milestone removed |
+| `Merged` | MR merged (deduplicated against state events) |
+| `NoteEvidence` | Discussion note matched by FTS, with snippet |
+| `CrossReferenced` | Reference to another entity |
+
+### Unresolved References
+
+When the graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the pipeline output. This enables discovery of external dependencies and can inform future sync targets.
+
 ## Development
 
 ```bash