docs(readme): add timeline pipeline documentation and schema updates

Documents the timeline pipeline feature in the README:
- New feature bullets: timeline pipeline, git history linking, file
  change tracking
- Updated schema table: merge_requests now includes commit SHAs,
  added mr_file_changes table
- New "Timeline Pipeline" section explaining the 5-stage architecture
  (SEED -> HYDRATE -> EXPAND -> COLLECT -> RENDER) with a table of all
  event types and a note on unresolved cross-project references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Taylor Eernisse
2026-02-06 08:38:48 -05:00
parent 03d9f8cce5
commit b005edb7f2


@@ -1,6 +1,6 @@
# Gitlore
Local GitLab data management with semantic search and temporal intelligence. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, and chronological event reconstruction.
## Features
@@ -10,6 +10,9 @@ Local GitLab data management with semantic search. Syncs issues, MRs, discussion
- **Multi-project**: Track issues and MRs across multiple GitLab projects
- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
- **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
- **Timeline pipeline**: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
- **Git history linking**: Tracks merge and squash commit SHAs to connect MRs with git history
- **File change tracking**: Records which files each MR touches, enabling file-level history queries
- **Raw payload storage**: Preserves original GitLab API responses for debugging
- **Discussion threading**: Full support for issue and MR discussions including inline code review comments
- **Cross-reference tracking**: Automatic extraction of "closes" and "mentioned" relationships between MRs and issues
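Reciprocal Rank Fusion merges the lexical and vector result lists by rank position rather than by their raw scores, which live on different scales. A minimal Python sketch, assuming each retriever returns an ordered list of document IDs. The function name `rrf_fuse`, the ID format, and the constant `k = 60` (the value from the original RRF paper) are illustrative assumptions, not Gitlore's implementation:

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. k=60 is an assumed constant, not Gitlore's setting.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["issue:12", "mr:7", "issue:3"]    # e.g. FTS5 order (hypothetical)
semantic = ["mr:7", "issue:40", "issue:12"]  # e.g. embedding order (hypothetical)
print(rrf_fuse([lexical, semantic]))  # ['mr:7', 'issue:12', 'issue:40', 'issue:3']
```

Because only ranks are used, the two retrievers' scores never need to be normalized onto a common scale; `mr:7` comes first here because it sits near the top of both lists.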
@@ -518,7 +521,7 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
|-------|---------|
| `projects` | Tracked GitLab projects with metadata |
| `issues` | Issue metadata (title, state, author, due date, milestone) |
| `merge_requests` | MR metadata (title, state, draft, branches, merge status, commit SHAs) |
| `milestones` | Project milestones with state and due dates |
| `labels` | Project labels with colors |
| `issue_labels` | Many-to-many issue-label relationships |
@@ -526,6 +529,7 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
| `mr_labels` | Many-to-many MR-label relationships |
| `mr_assignees` | Many-to-many MR-assignee relationships |
| `mr_reviewers` | Many-to-many MR-reviewer relationships |
| `mr_file_changes` | Files touched by each MR (path, change type, renames) |
| `discussions` | Issue/MR discussion threads |
| `notes` | Individual notes within discussions (with system note flag and DiffNote position data) |
| `resource_state_events` | Issue/MR state change history (opened, closed, merged, reopened) |
@@ -545,6 +549,42 @@ Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:
The database is stored at `~/.local/share/lore/lore.db` by default (XDG compliant).
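To illustrate the file-level history queries that `mr_file_changes` and the new commit SHA columns enable, here is a minimal sketch using Python's built-in `sqlite3`. All column names (`merge_request_id`, `old_path`, `new_path`, `change_type`, `merge_commit_sha`, `squash_commit_sha`) and the trimmed-down tables are assumptions for illustration, not Gitlore's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Hypothetical, trimmed-down tables; Gitlore's real columns may differ.
    CREATE TABLE merge_requests (
        id INTEGER PRIMARY KEY,
        iid INTEGER NOT NULL,
        title TEXT NOT NULL,
        merge_commit_sha TEXT,
        squash_commit_sha TEXT
    );
    CREATE TABLE mr_file_changes (
        merge_request_id INTEGER NOT NULL REFERENCES merge_requests(id),
        old_path TEXT,               -- differs from new_path only on renames
        new_path TEXT NOT NULL,
        change_type TEXT NOT NULL    -- e.g. 'added', 'modified', 'renamed'
    );
    """
)
conn.executemany(
    "INSERT INTO merge_requests VALUES (?, ?, ?, ?, ?)",
    [(1, 10, "Add parser", "aaa111", None), (2, 14, "Move parser", None, "bbb222")],
)
conn.executemany(
    "INSERT INTO mr_file_changes VALUES (?, ?, ?, ?)",
    [
        (1, "src/parser.rs", "src/parser.rs", "added"),
        (2, "src/parser.rs", "src/parse/mod.rs", "renamed"),
    ],
)


def mrs_touching(path):
    """Return (iid, title, change_type, sha) for every MR that touched `path`,
    matching either side of a rename. Prefers the merge commit SHA and falls
    back to the squash commit SHA."""
    return conn.execute(
        """
        SELECT mr.iid, mr.title, fc.change_type,
               COALESCE(mr.merge_commit_sha, mr.squash_commit_sha)
        FROM mr_file_changes AS fc
        JOIN merge_requests AS mr ON mr.id = fc.merge_request_id
        WHERE fc.new_path = ? OR fc.old_path = ?
        ORDER BY mr.iid
        """,
        (path, path),
    ).fetchall()


print(mrs_touching("src/parser.rs"))
# [(10, 'Add parser', 'added', 'aaa111'), (14, 'Move parser', 'renamed', 'bbb222')]
```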
## Timeline Pipeline
The timeline pipeline reconstructs chronological event histories for GitLab entities by combining full-text search, cross-reference graph traversal, and resource event aggregation. Given a search query, it identifies relevant issues and MRs, discovers related entities through their reference graph, and assembles a unified, time-ordered event stream.
### Stages
The pipeline executes in five stages:
1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents (issue bodies, MR descriptions, discussion notes) are ranked by BM25 relevance.
2. **HYDRATE** -- Evidence notes are extracted from the seed results: the top FTS-matched discussion notes, each with a 200-character snippet explaining *why* its entity was surfaced.
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities. Starting from seed entities, the pipeline follows "closes", "related", and optionally "mentioned" references up to a configurable depth, tracking provenance (which entity referenced which, via what method).
4. **COLLECT** -- Events are gathered for all discovered entities (seeds + expanded). Event types include creation, state changes, label additions and removals, milestone assignments and removals, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking (timestamp, then entity ID, then event type).
5. **RENDER** -- Events are formatted for output as human-readable text or structured JSON.
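The SEED and HYDRATE stages can be approximated with SQLite's FTS5 extension. The sketch below assumes a single hypothetical `documents` table that mixes issue bodies, MR descriptions, and notes, each tagged with an entity key such as `issue#12` (Gitlore's real tables are not described here). It ranks seeds with FTS5's built-in `bm25()` function and cuts a 200-character window starting half a snippet before the first matched term. It needs a Python build whose SQLite has FTS5 enabled, and it passes the query to `MATCH` unescaped, which a real implementation would have to handle:

```python
import sqlite3

SNIPPET_CHARS = 200  # evidence snippet width given in the README

conn = sqlite3.connect(":memory:")
# Hypothetical flat layout: one FTS5 row per document; only `body` is indexed.
conn.execute(
    "CREATE VIRTUAL TABLE documents USING fts5(entity UNINDEXED, kind UNINDEXED, body)"
)
conn.executemany(
    "INSERT INTO documents (entity, kind, body) VALUES (?, ?, ?)",
    [
        ("issue#12", "issue", "Login fails with a timeout behind the proxy"),
        ("mr!7", "mr", "Raise the proxy timeout and retry login"),
        ("issue#12", "note", "Confirmed: the timeout only happens when the proxy is enabled"),
        ("issue#30", "issue", "Update README badges"),
    ],
)


def seed(query, limit=10):
    """SEED: rank entities by their best-matching document.

    FTS5's bm25() returns lower (more negative) values for better matches,
    so ascending order puts the most relevant rows first.
    """
    rows = conn.execute(
        "SELECT entity FROM documents WHERE documents MATCH ? ORDER BY bm25(documents)",
        (query,),
    ).fetchall()
    ranked = list(dict.fromkeys(entity for (entity,) in rows))  # dedupe, keep best rank
    return ranked[:limit]


def hydrate(query, seeds, width=SNIPPET_CHARS):
    """HYDRATE: pair FTS-matched notes with their seed entity as evidence."""
    terms = [t.lower() for t in query.split()]
    evidence = []
    for entity, body in conn.execute(
        "SELECT entity, body FROM documents "
        "WHERE documents MATCH ? AND kind = 'note' ORDER BY bm25(documents)",
        (query,),
    ):
        if entity not in seeds:
            continue
        lowered = body.lower()
        hits = [lowered.find(t) for t in terms if t in lowered]
        # Start half a snippet before the earliest matched term, clamped at 0.
        start = max(0, min(hits, default=0) - width // 2)
        evidence.append((entity, body[start:start + width]))
    return evidence


seeds = seed("timeout proxy")
print(seeds)
print(hydrate("timeout proxy", seeds))
```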
### Event Types
| Event | Description |
|-------|-------------|
| `Created` | Entity creation |
| `StateChanged` | State transitions (opened, closed, reopened) |
| `LabelAdded` | Label applied to entity |
| `LabelRemoved` | Label removed from entity |
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
| `NoteEvidence` | Discussion note matched by FTS, with snippet |
| `CrossReferenced` | Reference to another entity |
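The ordering rule from COLLECT and the `Merged` deduplication can be sketched together. This assumes state events arrive as `(entity, timestamp, state)` tuples and that each MR also carries a `merged_at` timestamp; both inputs, the `collect` function, and the use of the table's declaration order as the event-type tiebreak are assumptions rather than Gitlore's documented behavior:

```python
from dataclasses import dataclass
from enum import IntEnum


class EventType(IntEnum):
    # Declaration order doubles as the event-type tiebreak (an assumption).
    Created = 0
    StateChanged = 1
    LabelAdded = 2
    LabelRemoved = 3
    MilestoneSet = 4
    MilestoneRemoved = 5
    Merged = 6
    NoteEvidence = 7
    CrossReferenced = 8


@dataclass(frozen=True)
class Event:
    timestamp: str  # ISO-8601 UTC in one fixed format, so string order == time order
    entity_id: str
    kind: EventType
    detail: str = ""


def collect(state_events, merged_at):
    """Build a timeline from (entity, timestamp, state) tuples plus a
    per-MR merged_at map, emitting at most one Merged event per MR."""
    events = []
    merged_seen = set()
    for entity, ts, state in state_events:
        if state == "merged":
            events.append(Event(ts, entity, EventType.Merged))
            merged_seen.add(entity)
        else:
            events.append(Event(ts, entity, EventType.StateChanged, state))
    for entity, ts in merged_at.items():
        if entity not in merged_seen:  # fall back only when no state event exists
            events.append(Event(ts, entity, EventType.Merged))
    # Deterministic order: timestamp, then entity ID, then event type.
    return sorted(events, key=lambda e: (e.timestamp, e.entity_id, e.kind))


timeline = collect(
    [
        ("mr!7", "2024-05-02T10:00:00Z", "merged"),
        ("issue#12", "2024-05-02T10:00:00Z", "closed"),
        ("mr!7", "2024-05-01T09:00:00Z", "opened"),
    ],
    {"mr!7": "2024-05-02T10:00:00Z", "mr!9": "2024-05-03T08:00:00Z"},
)
for event in timeline:
    print(event.timestamp, event.entity_id, event.kind.name, event.detail)
```

`mr!7` appears in both inputs but yields a single `Merged` event, and the two events sharing a timestamp fall back to entity ID order.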
### Unresolved References
When the graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the pipeline output. This enables discovery of external dependencies and can inform future sync targets.
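The EXPAND stage and the unresolved-reference bookkeeping fit in one breadth-first search. A minimal sketch, assuming the reference graph is an in-memory dict of `(target, ref_type, method)` edges instead of the `entity_references` table; the `expand` function, the default depth of 1, the method labels (`api`, `note_parse`), and entity keys such as `group/other#5` are all hypothetical:

```python
from collections import deque

# "closes" and "related" are always followed; "mentioned" is opt-in.
DEFAULT_EDGES = ("closes", "related")


def expand(seeds, references, local_entities, max_depth=1, include_mentioned=False):
    """Breadth-first expansion over an entity reference graph.

    references: dict mapping source entity -> list of (target, ref_type, method)
    local_entities: set of entities that are synced locally
    Returns (provenance, unresolved): provenance maps each discovered entity
    to (via_entity, ref_type, method, depth); unresolved lists
    (source, target, ref_type) for targets missing from the local database.
    """
    allowed = set(DEFAULT_EDGES) | ({"mentioned"} if include_mentioned else set())
    provenance = {s: (None, None, None, 0) for s in seeds}
    unresolved = []
    queue = deque((s, 0) for s in provenance)
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached; do not follow this entity's edges
        for target, ref_type, method in references.get(entity, []):
            if ref_type not in allowed:
                continue
            if target not in local_entities:
                unresolved.append((entity, target, ref_type))
                continue
            if target not in provenance:  # first (shortest) path wins
                provenance[target] = (entity, ref_type, method, depth + 1)
                queue.append((target, depth + 1))
    return provenance, unresolved


refs = {
    "issue#12": [("mr!7", "closes", "api"), ("group/other#5", "related", "note_parse")],
    "mr!7": [("issue#40", "mentioned", "note_parse"), ("issue#41", "related", "api")],
    "issue#41": [("issue#99", "related", "api")],
}
local = {"issue#12", "mr!7", "issue#40", "issue#41", "issue#99"}
prov, unresolved = expand(["issue#12"], refs, local, max_depth=2)
print(sorted(prov))  # ['issue#12', 'issue#41', 'mr!7']
print(unresolved)    # [('issue#12', 'group/other#5', 'related')]
```

Recording only the first path that reaches each entity keeps provenance to the shortest chain of references; whether Gitlore resolves ties the same way is not stated here.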
## Development
```bash