Phase B: Temporal Intelligence Foundation
Status: Draft
Prerequisite: CP3 Gates B+C complete (working search + sync pipeline)
Goal: Transform gitlore from a search engine into a temporal code intelligence system by ingesting structured event data from GitLab and exposing temporal queries that answer "why" and "when" questions about project history.
Motivation
gitlore currently stores snapshots — the latest state of each issue, MR, and discussion. But temporal queries need change history. When an issue's labels change from priority::low to priority::critical, the current schema overwrites the label junction. The transition is lost.
GitLab issues, MRs, and discussions contain the raw ingredients for temporal intelligence: state transitions, label mutations, assignee changes, cross-references between entities, and decision rationale in discussions. What's missing is a structured temporal index that makes these ingredients queryable.
The Problem This Solves
Today, when an AI agent or developer asks "Why did the team switch from REST to GraphQL?" or "What happened with the auth migration?", the answer is scattered across paginated API responses with no temporal index, no cross-referencing, and no semantic layer. Reconstructing a decision timeline manually takes 20+ minutes of clicking through GitLab's UI. This phase makes it take 2 seconds.
Forcing Function
This phase is designed around one concrete question: "What happened with X?" — where X is any keyword, feature name, or initiative. If lore timeline "auth migration" can produce a useful, chronologically-ordered narrative of all related events across issues, MRs, and discussions, the architecture is validated. If it can't, we learn what's missing before investing in deeper temporal features.
Executive Summary (Gated Milestones)
Five gates, each independently verifiable and shippable:
Gate 1 (Resource Events Ingestion): Structured event data from GitLab APIs → local event tables
Gate 2 (Cross-Reference Extraction): Entity relationship graph from structured APIs + system note parsing
Gate 3 (Decision Timeline): lore timeline command — keyword-driven chronological narrative
Gate 4 (File Decision History): lore file-history command — MR-to-file linking + scoped timelines
Gate 5 (Code Trace): lore trace command — file:line → commit → MR → issue → rationale chain
Key Design Decisions
- Structured APIs over text parsing. GitLab provides Resource Events APIs (`resource_state_events`, `resource_label_events`, `resource_milestone_events`) that return clean JSON. These are the primary data source for temporal events. System note parsing is a fallback for events without structured APIs (assignee changes, cross-references).
- Dependent resource pattern. Resource events are fetched per entity, triggered by the existing dirty source tracking. This is the same architecture as discussion fetching: queue-based, resumable, and incremental.
- Configurable event ingestion. A new config flag, `sync.fetchResourceEvents` (default `true`), controls whether the sync pipeline fetches event data. Users who don't need temporal features can set it to `false` and skip the additional API calls.
- Application-level graph traversal. Cross-reference expansion uses BFS in Rust, not recursive SQL CTEs, capped at a configurable depth (default 1) for predictable performance.
- Evolutionary library extraction. New commands are built with typed return structs from day one. Old commands are not retrofitted until a concrete consumer (MCP server, web UI) requires it.
- Phase A fields cherry-picked as needed. `merge_commit_sha` and `squash_commit_sha` are added in migration 015 and populated during MR ingestion. Remaining Phase A fields are handled in their own migration later.
Scope Boundaries
In scope:
- Batch temporal queries over historical data
- Structured event ingestion from GitLab APIs
- Cross-reference graph construction
- CLI commands with robot mode JSON output
Out of scope (future phases):
- Real-time monitoring / notifications ("alert me when my code changes")
- MCP server (Phase C — consumes the library API this phase produces)
- Web UI (Phase D — consumes the same library API)
- Pattern evolution / cross-project trend detection (Phase C)
- Library extraction refactor (happens organically as new commands are added)
Gate 1: Resource Events Ingestion
1.1 Rationale: Why Not Parse System Notes?
The original approach was to parse system note body text with regex to extract state changes and label mutations. Research revealed this is the wrong approach:
- Structured APIs exist. GitLab's Resource Events APIs return clean JSON with explicit `action`, `state`, and `label` fields. They are available on all tiers (Free, Premium, Ultimate).
- System notes are localized. A French GitLab instance says "ajouté l'étiquette ~bug" ("added label ~bug"), so regex breaks on non-English instances.
- Label events aren't in the Notes API. Per GitLab Issue #24661, label change system notes are not returned by the Notes API. The Resource Label Events API is the only reliable source.
- No versioned format spec. System note text has changed across GitLab 14.x–17.x with no documentation of format changes.
System note parsing is still used for events without structured APIs (see Gate 2), but with the explicit understanding that it's best-effort and fragile for non-English instances.
1.2 Schema (Migration 011)
File: migrations/011_resource_events.sql
-- State change events (opened, closed, reopened, merged, locked)
-- Source: GET /projects/:id/issues/:iid/resource_state_events
-- Source: GET /projects/:id/merge_requests/:iid/resource_state_events
CREATE TABLE resource_state_events (
id INTEGER PRIMARY KEY,
gitlab_id INTEGER NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
state TEXT NOT NULL, -- 'opened' | 'closed' | 'reopened' | 'merged' | 'locked'
actor_gitlab_id INTEGER, -- GitLab user ID (stable; usernames can change)
actor_username TEXT, -- display/search convenience
created_at INTEGER NOT NULL, -- ms epoch UTC
source_commit TEXT, -- commit SHA that caused this state change
source_merge_request_iid INTEGER, -- iid from source_merge_request ref
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL)
OR (issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
CREATE UNIQUE INDEX uq_state_events_gitlab ON resource_state_events(gitlab_id, project_id);
CREATE INDEX idx_state_events_issue ON resource_state_events(issue_id)
WHERE issue_id IS NOT NULL;
CREATE INDEX idx_state_events_mr ON resource_state_events(merge_request_id)
WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_state_events_created ON resource_state_events(created_at);
-- Label change events (add, remove)
-- Source: GET /projects/:id/issues/:iid/resource_label_events
-- Source: GET /projects/:id/merge_requests/:iid/resource_label_events
CREATE TABLE resource_label_events (
id INTEGER PRIMARY KEY,
gitlab_id INTEGER NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
label_name TEXT, -- nullable: GitLab returns null for deleted labels (see §1.2.1)
actor_gitlab_id INTEGER,
actor_username TEXT,
created_at INTEGER NOT NULL, -- ms epoch UTC
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL)
OR (issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
CREATE UNIQUE INDEX uq_label_events_gitlab ON resource_label_events(gitlab_id, project_id);
CREATE INDEX idx_label_events_issue ON resource_label_events(issue_id)
WHERE issue_id IS NOT NULL;
CREATE INDEX idx_label_events_mr ON resource_label_events(merge_request_id)
WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_label_events_created ON resource_label_events(created_at);
-- Note: idx_label_events_label was added in migration 015 (not in the original 011)
-- Milestone change events (add, remove)
-- Source: GET /projects/:id/issues/:iid/resource_milestone_events
-- Source: GET /projects/:id/merge_requests/:iid/resource_milestone_events
CREATE TABLE resource_milestone_events (
id INTEGER PRIMARY KEY,
gitlab_id INTEGER NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
milestone_title TEXT, -- nullable: GitLab returns null for deleted milestones (see §1.2.1)
milestone_id INTEGER,
actor_gitlab_id INTEGER,
actor_username TEXT,
created_at INTEGER NOT NULL, -- ms epoch UTC
CHECK (
(issue_id IS NOT NULL AND merge_request_id IS NULL)
OR (issue_id IS NULL AND merge_request_id IS NOT NULL)
)
);
CREATE UNIQUE INDEX uq_milestone_events_gitlab ON resource_milestone_events(gitlab_id, project_id);
CREATE INDEX idx_milestone_events_issue ON resource_milestone_events(issue_id)
WHERE issue_id IS NOT NULL;
CREATE INDEX idx_milestone_events_mr ON resource_milestone_events(merge_request_id)
WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_milestone_events_created ON resource_milestone_events(created_at);
1.2.1 Nullable Label and Milestone Fields (Migration 012)
GitLab returns null for label and milestone in Resource Events when the referenced label or milestone has been deleted from the project. This was discovered in production after the initial schema was deployed with NOT NULL constraints.
Migration 012 recreates resource_label_events and resource_milestone_events with nullable label_name and milestone_title columns. The table-swap approach (create new → copy → drop old → rename) is required because SQLite doesn't support ALTER COLUMN.
Timeline queries that encounter null labels/milestones display "[deleted label]" or "[deleted milestone]" in human output and omit the name field in robot JSON.
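A minimal sketch of that display rule, assuming hypothetical helper names (`human_label`, `robot_label`) and a `RefKind` enum that are not part of the codebase described here:

```rust
/// Which kind of reference a resource event carries.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RefKind {
    Label,
    Milestone,
}

/// Human output: substitute a placeholder when GitLab returned null
/// because the label or milestone was deleted.
pub fn human_label(kind: RefKind, name: Option<&str>) -> String {
    match (kind, name) {
        (_, Some(n)) => n.to_string(),
        (RefKind::Label, None) => "[deleted label]".to_string(),
        (RefKind::Milestone, None) => "[deleted milestone]".to_string(),
    }
}

/// Robot JSON: the key/value pair to emit, or None to omit the field
/// entirely (no placeholder, no `null`).
pub fn robot_label(kind: RefKind, name: Option<&str>) -> Option<(&'static str, String)> {
    let key = match kind {
        RefKind::Label => "label",
        RefKind::Milestone => "milestone",
    };
    name.map(|n| (key, n.to_string()))
}
```

Returning `None` from `robot_label` lets the serializer skip the key, which matches "omit the name field" rather than emitting `null`.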
1.2.2 Resource Event Watermarks (Migration 013)
To avoid re-fetching resource events for every entity on every sync, a watermark column records the updated_at value at the time of the last successful event fetch:
ALTER TABLE issues ADD COLUMN resource_events_synced_for_updated_at INTEGER;
ALTER TABLE merge_requests ADD COLUMN resource_events_synced_for_updated_at INTEGER;
Incremental behavior: During sync, only entities where updated_at > COALESCE(resource_events_synced_for_updated_at, 0) are enqueued for resource event fetching. On --full sync, these watermarks are reset to NULL, causing all entities to be re-enqueued.
This mirrors the existing discussions_synced_for_updated_at pattern and works in conjunction with the dependent fetch queue.
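The incremental check and the `--full` reset can be sketched in memory as below. `EntityRow`, `reset_watermarks`, `select_for_fetch`, and `mark_synced` are hypothetical names; the real pipeline presumably evaluates the same predicate in SQL (`updated_at > COALESCE(resource_events_synced_for_updated_at, 0)`).

```rust
/// Migration 013 predicate: fetch when `updated_at` is newer than the
/// watermark. `None` models SQL NULL, i.e. COALESCE(watermark, 0).
pub fn needs_event_fetch(updated_at: i64, synced_for_updated_at: Option<i64>) -> bool {
    updated_at > synced_for_updated_at.unwrap_or(0)
}

/// In-memory stand-in for an issue/MR row (hypothetical).
#[derive(Debug, Clone)]
pub struct EntityRow {
    pub iid: i64,
    pub updated_at: i64, // ms epoch UTC
    pub resource_events_synced_for_updated_at: Option<i64>,
}

/// `--full` sync: clear every watermark so all entities are re-enqueued.
pub fn reset_watermarks(rows: &mut [EntityRow]) {
    for row in rows.iter_mut() {
        row.resource_events_synced_for_updated_at = None;
    }
}

/// iids of the entities that should get a `resource_events` job.
pub fn select_for_fetch(rows: &[EntityRow]) -> Vec<i64> {
    rows.iter()
        .filter(|r| needs_event_fetch(r.updated_at, r.resource_events_synced_for_updated_at))
        .map(|r| r.iid)
        .collect()
}

/// After a successful fetch, record the `updated_at` value the fetch was
/// based on, so later syncs skip the entity until it changes again.
pub fn mark_synced(row: &mut EntityRow, observed_updated_at: i64) {
    row.resource_events_synced_for_updated_at = Some(observed_updated_at);
}
```

Recording the `updated_at` observed at enqueue time (rather than the wall-clock time of the fetch) means an update that lands while the fetch is in flight still produces a newer `updated_at` than the watermark, so the entity is picked up again on the next sync.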
1.3 Config Extension
File: src/core/config.rs
Add to SyncConfig:
/// Fetch resource events (state, label, milestone changes) during sync.
/// Increases API calls but enables temporal queries (lore timeline, etc.).
/// Default: true
#[serde(default = "default_true")]
pub fetch_resource_events: bool,
Config file example:
{
"sync": {
"fetchResourceEvents": true
}
}
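The `default_true` helper named in the serde attribute is not shown above. Below is a std-only sketch of it, plus a hypothetical `effective_fetch_events` that lets the `--no-events` flag (listed in §1.7) override the config for a single run. The JSON key is camelCase (`fetchResourceEvents`) while the Rust field is snake_case, so the real struct presumably carries `#[serde(rename_all = "camelCase")]` or an equivalent rename; that is an assumption, and serde is omitted here to keep the sketch self-contained.

```rust
/// Referenced by `#[serde(default = "default_true")]`; serde calls it
/// when the key is missing from the config file.
fn default_true() -> bool {
    true
}

/// Std-only stand-in for the relevant slice of `SyncConfig` (the real
/// struct derives `Deserialize`; serde is omitted to keep this runnable).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SyncConfig {
    pub fetch_resource_events: bool,
}

impl Default for SyncConfig {
    fn default() -> Self {
        // Keep the programmatic default identical to the serde default.
        SyncConfig { fetch_resource_events: default_true() }
    }
}

/// Hypothetical: a per-run `--no-events` flag overrides the config file.
pub fn effective_fetch_events(config: &SyncConfig, no_events_flag: bool) -> bool {
    config.fetch_resource_events && !no_events_flag
}
```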
1.4 GitLab API Client
New endpoints in src/gitlab/client.rs:
GET /projects/:id/issues/:iid/resource_state_events?per_page=100
GET /projects/:id/issues/:iid/resource_label_events?per_page=100
GET /projects/:id/merge_requests/:iid/resource_state_events?per_page=100
GET /projects/:id/merge_requests/:iid/resource_label_events?per_page=100
GET /projects/:id/issues/:iid/resource_milestone_events?per_page=100
GET /projects/:id/merge_requests/:iid/resource_milestone_events?per_page=100
All endpoints use standard pagination. Fetch all pages per entity.
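Fetching all pages can be expressed as a loop that follows GitLab's `X-Next-Page` response header, which is empty on the last page. In this sketch the HTTP call is abstracted as a `fetch_page` closure; `Page` and `fetch_all_pages` are hypothetical names, and the `max_pages` guard is an added safety assumption that the text does not specify.

```rust
/// One page of results plus the parsed `X-Next-Page` header
/// (`None` when the header is empty, i.e. on the last page).
pub struct Page<T> {
    pub items: Vec<T>,
    pub next_page: Option<u32>,
}

/// Collects every page for one entity's endpoint. `fetch_page` stands in
/// for the real HTTP call. Stops after `max_pages` pages even if more
/// remain, as a guard against a misbehaving server.
pub fn fetch_all_pages<T, E, F>(mut fetch_page: F, max_pages: u32) -> Result<Vec<T>, E>
where
    F: FnMut(u32) -> Result<Page<T>, E>,
{
    let mut all = Vec::new();
    let mut page = 1;
    for _ in 0..max_pages {
        let resp = fetch_page(page)?; // any error aborts this entity's fetch
        all.extend(resp.items);
        match resp.next_page {
            // Only move forward; a repeated page number would loop forever.
            Some(next) if next > page => page = next,
            _ => break,
        }
    }
    Ok(all)
}
```

Propagating the first error with `?` keeps the entity's job all-or-nothing, which fits the retry model of the dependent fetch queue: a failed job is retried in full rather than resumed mid-pagination.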
New serde types in src/gitlab/types.rs:
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct GitLabStateEvent {
pub id: i64,
pub user: Option<GitLabAuthor>,
pub created_at: String,
pub resource_type: String, // "Issue" | "MergeRequest"
pub resource_id: i64,
pub state: String, // "opened" | "closed" | "reopened" | "merged" | "locked"
pub source_commit: Option<String>,
pub source_merge_request: Option<GitLabMergeRequestRef>,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct GitLabLabelEvent {
pub id: i64,
pub user: Option<GitLabAuthor>,
pub created_at: String,
pub resource_type: String,
pub resource_id: i64,
pub label: Option<GitLabLabelRef>, // nullable: deleted labels return null
pub action: String, // "add" | "remove"
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct GitLabMilestoneEvent {
pub id: i64,
pub user: Option<GitLabAuthor>,
pub created_at: String,
pub resource_type: String,
pub resource_id: i64,
pub milestone: Option<GitLabMilestoneRef>, // nullable: deleted milestones return null
pub action: String, // "add" | "remove"
}
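These types carry `created_at` as an ISO 8601 string, while the event tables store `created_at` as ms epoch UTC, so ingestion has to convert. The sketch below is std-only and accepts only the UTC `...Z` form (e.g. `2024-03-15T10:00:00.077Z`); numeric offsets return `None`. A production version would more likely use a date/time crate, and the function names are hypothetical.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's `days_from_civil` algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // years start in March
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = if m > 2 { m - 3 } else { m + 9 }; // month index from March
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Parses a UTC timestamp like "2024-03-15T10:00:00.077Z" into ms since
/// the Unix epoch. Returns None for other shapes (including offsets).
pub fn parse_gitlab_timestamp_ms(s: &str) -> Option<i64> {
    let body = s.strip_suffix('Z')?;
    let (date, time) = body.split_once('T')?;

    let mut dp = date.split('-').map(|p| p.parse::<i64>().ok());
    let (y, mo, d) = (dp.next()??, dp.next()??, dp.next()??);
    if !(1..=12).contains(&mo) || !(1..=31).contains(&d) {
        return None;
    }

    let (hms, frac) = time.split_once('.').unwrap_or((time, ""));
    let mut tp = hms.split(':').map(|p| p.parse::<i64>().ok());
    let (h, mi, sec) = (tp.next()??, tp.next()??, tp.next()??);

    // Millisecond precision: ".5" -> 500, ".077" -> 77; extra digits dropped.
    let digits: String = frac.chars().take(3).collect();
    if !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let ms = if digits.is_empty() {
        0
    } else {
        digits.parse::<i64>().ok()? * 10_i64.pow(3 - digits.len() as u32)
    };

    let days = days_from_civil(y, mo, d);
    Some(((days * 24 + h) * 60 + mi) * 60_000 + sec * 1_000 + ms)
}
```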
1.5 Ingestion Pipeline
Architecture: Generic dependent-fetch queue, generalizing the pending_discussion_fetches pattern. A single queue table serves all dependent resource types across Gates 1, 2, and 4, avoiding schema churn as new fetch types are added.
New queue table (in migration 011):
-- Generic queue for all dependent resource fetches (events, closes_issues, diffs)
-- Replaces per-type queue tables with a unified job model
CREATE TABLE pending_dependent_fetches (
id INTEGER PRIMARY KEY,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
entity_type TEXT NOT NULL CHECK (entity_type IN ('issue', 'merge_request')),
entity_iid INTEGER NOT NULL,
entity_local_id INTEGER NOT NULL,
job_type TEXT NOT NULL CHECK (job_type IN (
'resource_events', -- Gate 1: state + label + milestone events
'mr_closes_issues', -- Gate 2: closes_issues API
'mr_diffs' -- Gate 4: MR file changes
)),
payload_json TEXT, -- job-specific params, e.g. {"event_types":["state","label","milestone"]}
enqueued_at INTEGER NOT NULL,
attempts INTEGER NOT NULL DEFAULT 0,
last_error TEXT,
next_retry_at INTEGER,
locked_at INTEGER, -- crash recovery: NULL = available, non-NULL = in progress
UNIQUE(project_id, entity_type, entity_iid, job_type)
);
The locked_at column provides crash recovery: if a sync process crashes mid-drain, stale locks (older than 5 minutes) are automatically reclaimed on the next lore sync run. This is intentionally minimal — full job leasing with locked_by and lease expiration is unnecessary for a single-process CLI tool.
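A minimal in-memory sketch of the reclaim-then-claim step, assuming a hypothetical `Job` struct that mirrors only the `locked_at` and `next_retry_at` columns. In SQL, reclamation would be an extra `UPDATE` that clears `locked_at` where it is older than 5 minutes, run before the claim query shown in the flow below.

```rust
/// Stale-lock threshold from the text: 5 minutes, in ms.
pub const STALE_LOCK_MS: i64 = 5 * 60 * 1000;

/// The columns of a `pending_dependent_fetches` row that the claim step
/// reads (hypothetical in-memory mirror).
#[derive(Debug, Clone)]
pub struct Job {
    pub id: i64,
    pub locked_at: Option<i64>,     // None = available
    pub next_retry_at: Option<i64>, // None = due immediately
}

/// A lock older than STALE_LOCK_MS is assumed to belong to a crashed run.
pub fn is_stale(job: &Job, now: i64) -> bool {
    matches!(job.locked_at, Some(t) if now - t > STALE_LOCK_MS)
}

/// Clears stale locks, then claims every unlocked job whose retry time has
/// arrived. Returns the ids claimed in this pass.
pub fn reclaim_and_claim(jobs: &mut [Job], now: i64) -> Vec<i64> {
    let mut claimed = Vec::new();
    for job in jobs.iter_mut() {
        if is_stale(job, now) {
            job.locked_at = None; // crash recovery
        }
        let due = job.next_retry_at.map_or(true, |t| t <= now);
        if job.locked_at.is_none() && due {
            job.locked_at = Some(now);
            claimed.push(job.id);
        }
    }
    claimed
}
```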
Flow:
- During issue/MR ingestion, when an entity is upserted (new or updated), enqueue jobs in `pending_dependent_fetches`:
  - For all entities: `job_type = 'resource_events'` (when `fetchResourceEvents` is true)
  - For MRs: `job_type = 'mr_closes_issues'` (always, for Gate 2)
  - For MRs: `job_type = 'mr_diffs'` (when `fetchMrFileChanges` is true, for Gate 4)
- After primary ingestion completes, drain the dependent fetch queue:
  - Claim jobs: `UPDATE ... SET locked_at = now WHERE locked_at IS NULL AND (next_retry_at IS NULL OR next_retry_at <= now)`
  - For each job, dispatch by `job_type` to the appropriate fetcher
  - On success: DELETE the job row
  - On transient failure: increment `attempts`, set `next_retry_at` with exponential backoff, clear `locked_at`

`lore sync` drains dependent jobs after the ingestion and discussion fetch steps.
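The enqueue rules and the transient-failure backoff can be sketched as below. `jobs_for_entity` and `next_retry_at` are hypothetical names, and `BASE_MS` (30 s) and `MAX_MS` (1 h) are placeholder values: the text specifies exponential backoff but not its parameters.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityType {
    Issue,
    MergeRequest,
}

/// Mirrors the values allowed by the `job_type` CHECK constraint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobType {
    ResourceEvents,
    MrClosesIssues,
    MrDiffs,
}

impl JobType {
    pub fn as_str(self) -> &'static str {
        match self {
            JobType::ResourceEvents => "resource_events",
            JobType::MrClosesIssues => "mr_closes_issues",
            JobType::MrDiffs => "mr_diffs",
        }
    }
}

/// Jobs to enqueue when an entity is upserted, per the flow above.
pub fn jobs_for_entity(
    entity: EntityType,
    fetch_resource_events: bool,
    fetch_mr_file_changes: bool,
) -> Vec<JobType> {
    let mut jobs = Vec::new();
    if fetch_resource_events {
        jobs.push(JobType::ResourceEvents);
    }
    if entity == EntityType::MergeRequest {
        jobs.push(JobType::MrClosesIssues); // always, for Gate 2
        if fetch_mr_file_changes {
            jobs.push(JobType::MrDiffs); // Gate 4
        }
    }
    jobs
}

/// Placeholder backoff parameters (not specified in the text).
pub const BASE_MS: i64 = 30_000;
pub const MAX_MS: i64 = 60 * 60 * 1000;

/// Doubles the delay per failed attempt, capped at MAX_MS.
/// `attempts` is the value after incrementing for this failure (>= 1).
pub fn next_retry_at(now: i64, attempts: u32) -> i64 {
    let exp = attempts.saturating_sub(1).min(20); // bound the shift
    let delay = BASE_MS.saturating_mul(1_i64 << exp).min(MAX_MS);
    now + delay
}
```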
Incremental behavior: Only entities that changed since last sync are enqueued. On --full sync, all entities are re-enqueued.
1.6 API Call Budget
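Per entity: 3 API calls (state + label + milestone), for both issues and MRs. The figures below assume each event list fits in one page (per_page=100); an entity with a longer history costs one extra call per additional page.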
| Scenario | Entities | API Calls | Time at 2k req/min |
|---|---|---|---|
| Initial sync, 500 issues + 200 MRs | 700 | 2,100 | ~1 min |
| Initial sync, 2,000 issues + 1,000 MRs | 3,000 | 9,000 | ~4.5 min |
| Incremental sync, 20 changed entities | 20 | 60 | <2 sec |
Acceptable for initial sync. Incremental sync adds negligible overhead.
Optimization (future): If milestone events prove low-value, make them opt-in to reduce calls by 1/3.
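The table's arithmetic, as a quick check: calls = entities × calls per entity, and time = calls ÷ rate. The sketch ignores request latency and extra pages; `event_budget` is a hypothetical helper.

```rust
/// Calls per entity for Gate 1: state + label + milestone endpoints,
/// assuming each event list fits in one page (per_page=100).
pub const CALLS_PER_ENTITY: u64 = 3;

/// Returns (total API calls, seconds needed) at a given rate limit,
/// ignoring latency and pagination.
pub fn event_budget(entities: u64, calls_per_entity: u64, req_per_min: u64) -> (u64, f64) {
    let calls = entities * calls_per_entity;
    let seconds = calls as f64 * 60.0 / req_per_min as f64;
    (calls, seconds)
}
```

For example, `event_budget(700, 3, 2000)` yields `(2100, 63.0)`, about 1 minute, matching the first row; with milestone events dropped (2 calls per entity) the same sync needs 1,400 calls.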
1.7 Acceptance Criteria
- Migration 011 creates all three event tables + generic dependent fetch queue
- `lore sync` fetches resource events for changed entities when `fetchResourceEvents` is true
- `lore sync --no-events` skips event fetching
- Event fetch failures are queued for retry with exponential backoff
- Stale locks (crashed sync) automatically reclaimed on next run
- `lore count events` shows event counts by type
- `lore stats --check` validates event table referential integrity
- `lore stats --check` validates dependent job queue health (no stuck locks, retryable jobs visible)
- Robot mode JSON for all new commands
1.8 Observability Infrastructure (Migration 014)
The sync pipeline includes lightweight observability via sync_runs enrichment. Migration 014 adds:
ALTER TABLE sync_runs ADD COLUMN run_id TEXT; -- correlation ID for log tracing
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
CREATE INDEX IF NOT EXISTS idx_sync_runs_run_id ON sync_runs(run_id);
Purpose: The run_id column correlates log entries (via tracing) with sync run records. total_items_processed and total_errors provide aggregate counts for lore sync-status and robot mode health checks without requiring log parsing.
This is separate from the event tables but supports the same operational workflow — answering "did the last sync succeed?" and "how many entities were processed?" programmatically.
Gate 2: Cross-Reference Extraction
2.1 Rationale
Temporal queries need to follow links between entities: "MR !567 closed issue #234", "issue #234 mentioned in MR !567", "#299 was opened as a follow-up to !567". These relationships are captured in two places:
- Structured API: `GET /projects/:id/merge_requests/:iid/closes_issues` returns the issues that close when the MR merges. Also, `resource_state_events` includes `source_merge_request_iid` for "closed by MR" events.
- System notes: Cross-references like "mentioned in !456" and "closed by !789" appear in system note body text.
2.2 Schema (in Migration 011)
-- Cross-references between entities
-- Populated from: closes_issues API, state events, system note parsing
--
-- Directionality convention:
-- source = the entity where the reference was *observed* (contains the note, or is the MR in closes_issues)
-- target = the entity being *referenced* (the issue closed, the MR mentioned)
-- This is consistent across all source_methods and enables predictable BFS traversal.
--
-- Unresolved references: when a cross-reference points to an entity in a project
-- that isn't synced locally, target_entity_id is NULL but target_project_path and
-- target_entity_iid are populated. This preserves valuable edges rather than
-- silently dropping them. Timeline output marks these as "[external]".
CREATE TABLE entity_references (
id INTEGER PRIMARY KEY,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
source_entity_type TEXT NOT NULL CHECK (source_entity_type IN ('issue', 'merge_request')),
source_entity_id INTEGER NOT NULL, -- local DB id
target_entity_type TEXT NOT NULL CHECK (target_entity_type IN ('issue', 'merge_request')),
target_entity_id INTEGER, -- local DB id (NULL when target is unresolved/external)
target_project_path TEXT, -- e.g. "group/other-repo" (populated for cross-project refs)
target_entity_iid INTEGER, -- GitLab iid (populated when target_entity_id is NULL)
reference_type TEXT NOT NULL CHECK (reference_type IN ('closes', 'mentioned', 'related')),
source_method TEXT NOT NULL CHECK (source_method IN ('api', 'note_parse', 'description_parse')),
created_at INTEGER NOT NULL -- ms epoch UTC
);
-- Unique constraint includes source_method: the same relationship can be discovered by
-- multiple methods (e.g., the closes_issues API and a system note), and we store both for provenance.
CREATE UNIQUE INDEX uq_entity_refs ON entity_references(
project_id, source_entity_type, source_entity_id, target_entity_type,
COALESCE(target_entity_id, -1), COALESCE(target_project_path, ''),
COALESCE(target_entity_iid, -1), reference_type, source_method
);
CREATE INDEX idx_entity_refs_source ON entity_references(source_entity_type, source_entity_id);
CREATE INDEX idx_entity_refs_target ON entity_references(target_entity_id)
WHERE target_entity_id IS NOT NULL;
CREATE INDEX idx_entity_refs_unresolved ON entity_references(target_project_path, target_entity_iid)
WHERE target_entity_id IS NULL;
source_method values:
| Value | Meaning |
|---|---|
| `'api'` | Populated from structured GitLab APIs (closes_issues, resource_state_events) |
| `'note_parse'` | Extracted from system note body text (best-effort, English only) |
| `'description_parse'` | Extracted from issue/MR description body text (future) |
The original design used more granular values ('api_closes_issues', 'api_state_event', 'system_note_parse'). In practice, the API-sourced references don't need sub-method distinction — the reference_type already captures the semantic relationship — so the implementation simplified to three values.
2.3 Population Strategy
Tier 1 — Structured APIs (reliable):
- `closes_issues` endpoint: After MR ingestion, fetch `GET /projects/:id/merge_requests/:iid/closes_issues`. Insert `reference_type = 'closes'`, `source_method = 'api'`. Source = MR, target = issue.
- State events: When `resource_state_events` contains `source_merge_request_iid`, insert `reference_type = 'closes'`, `source_method = 'api'`. Source = MR (referenced by iid), target = issue (that received the state change).
Tier 2 — System note parsing (best-effort):
Parse system notes where is_system = 1 for cross-reference patterns.
Directionality rule: Source = entity containing the system note. Target = entity referenced by the note text. This is consistent with Tier 1's convention.
mentioned in !{iid}
mentioned in #{iid}
mentioned in {group}/{project}!{iid}
mentioned in {group}/{project}#{iid}
closed by !{iid}
closed by #{iid}
Cross-project references: When a system note references {group}/{project}#{iid} and the target project is not synced locally, store with target_entity_id = NULL, target_project_path = '{group}/{project}', target_entity_iid = {iid}. These unresolved references are still valuable for timeline narratives — they indicate external dependencies and decision context even when we can't traverse further.
Insert with source_method = 'note_parse'. Accept that:
- This breaks on non-English GitLab instances
- Format may vary across GitLab versions
- Log parse failures at `debug` level for monitoring
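A best-effort parser for exactly the English patterns listed above can be written without a regex crate. `ParsedRef` and `parse_system_note` are hypothetical names; the caller would supply the source entity (the note's parent) per the directionality rule and insert with `source_method = 'note_parse'`. Anything that doesn't match returns `None`, which is where the `debug`-level logging would hook in.

```rust
/// One cross-reference recovered from a system note (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedRef {
    pub reference_type: &'static str,        // "mentioned" | "closes"
    pub target_entity_type: &'static str,    // "issue" | "merge_request"
    pub target_project_path: Option<String>, // Some(..) for cross-project refs
    pub target_iid: i64,
}

/// Matches "mentioned in" / "closed by" followed by an optional
/// "{group}/{project}" path, a sigil (`!` = MR, `#` = issue), and an iid.
pub fn parse_system_note(body: &str) -> Option<ParsedRef> {
    let body = body.trim();
    let (reference_type, rest) = if let Some(r) = body.strip_prefix("mentioned in ") {
        ("mentioned", r)
    } else if let Some(r) = body.strip_prefix("closed by ") {
        ("closes", r)
    } else {
        return None; // unknown or non-English note
    };

    // The last sigil separates the optional project path from the iid.
    let sigil_pos = rest.rfind(|c: char| c == '!' || c == '#')?;
    let (path, tail) = rest.split_at(sigil_pos);
    let target_entity_type = if tail.starts_with('!') { "merge_request" } else { "issue" };
    let iid_str = &tail[1..]; // sigils are ASCII, so byte index 1 is safe
    if iid_str.is_empty() || !iid_str.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let target_iid = iid_str.parse::<i64>().ok()?;
    let target_project_path = if path.is_empty() { None } else { Some(path.to_string()) };

    Some(ParsedRef { reference_type, target_entity_type, target_project_path, target_iid })
}
```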
Tier 3 — Description/body parsing (source_method = 'description_parse', deferred):
Issue and MR descriptions often contain #123 or !456 references. Parsing these is lower confidence (mentions != relationships) and is deferred to a future iteration. The source_method value 'description_parse' is reserved in the CHECK constraint for this future work.
2.4 Ingestion Flow
The closes_issues fetch uses the generic dependent fetch queue (job_type = 'mr_closes_issues'):
- After MR ingestion, an `mr_closes_issues` job is enqueued alongside `resource_events` jobs
- One additional API call per MR: `GET /projects/:id/merge_requests/:iid/closes_issues`
- Cross-reference parsing from system notes runs as a local post-processing step (no API calls) after all dependent fetches complete
Watermark pattern (migration 015): A closes_issues_synced_for_updated_at column on merge_requests tracks the last updated_at value at which closes_issues data was fetched. Only MRs where updated_at > COALESCE(closes_issues_synced_for_updated_at, 0) are enqueued for re-fetching. The watermark is updated after successful fetch or after a permanent API error (e.g., 404 for external MRs). On --full sync, the watermark is reset to NULL.
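The watermark rule can be sketched as a small decision function. `FetchOutcome`, `classify`, and `next_watermark` are hypothetical names; the text names 404 as a permanent error, and treating every other non-2xx status as transient is an assumption of this sketch.

```rust
/// Outcome of one closes_issues fetch (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FetchOutcome {
    Success,
    /// e.g. 404 for an external MR; retrying won't help.
    PermanentError,
    /// e.g. 429 or 5xx; worth retrying with backoff.
    TransientError,
}

/// Maps an HTTP status to an outcome.
pub fn classify(status: u16) -> FetchOutcome {
    match status {
        200..=299 => FetchOutcome::Success,
        404 => FetchOutcome::PermanentError,
        _ => FetchOutcome::TransientError, // assumption: everything else retries
    }
}

/// New value for `closes_issues_synced_for_updated_at`: advanced on success
/// or permanent error so the MR is not re-enqueued until it changes again;
/// left unchanged on transient error so the job is retried.
pub fn next_watermark(current: Option<i64>, observed_updated_at: i64, outcome: FetchOutcome) -> Option<i64> {
    match outcome {
        FetchOutcome::Success | FetchOutcome::PermanentError => Some(observed_updated_at),
        FetchOutcome::TransientError => current,
    }
}
```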
2.5 Acceptance Criteria
- `entity_references` table populated from `closes_issues` API for all synced MRs
- `entity_references` table populated from `resource_state_events` where `source_merge_request_iid` is present
- System notes parsed for cross-reference patterns (English instances)
- Cross-project references stored as unresolved when target project is not synced
- `source_method` column tracks provenance of each reference
- References are deduplicated on the `uq_entity_refs` key: the same relationship found by the same `source_method` is stored once, while different methods each keep a row for provenance
- Timeline JSON includes expansion provenance (`via`) for all expanded entities
Gate 3: Decision Timeline (lore timeline)
3.1 Command Design
# Basic: keyword-driven timeline
lore timeline "auth migration"
# Scoped to project
lore timeline "auth migration" -p group/repo
# Limit date range
lore timeline "auth migration" --since 6m
lore timeline "auth migration" --since 2024-01-01
# Control cross-reference expansion depth
lore timeline "auth migration" --depth 0 # No expansion (matched entities only)
lore timeline "auth migration" --depth 1 # Follow direct references (default)
lore timeline "auth migration" --depth 2 # Two hops
# Control which edge types are followed during expansion
lore timeline "auth migration" --expand-mentions # Also follow 'mentioned' edges (off by default)
# Default expansion follows 'closes' and 'related' edges only.
# 'mentioned' edges are excluded by default because they have high fan-out
# and often connect tangentially related entities.
# Limit results
lore timeline "auth migration" -n 50
# Robot mode
lore -J timeline "auth migration"
3.2 Query Flow
1. SEED: FTS5 keyword search → matched document IDs (issues, MRs, and notes/discussions)
↓
2. HYDRATE:
- Map document IDs → source entities (issues, MRs)
- Collect top matched notes as evidence candidates (bounded, default top 10)
These are the actual decision-bearing comments that answer "why"
↓
3. EXPAND: Follow entity_references (BFS, depth-limited)
→ Discover related entities not matched by keywords
→ Default: follow 'closes' + 'related' edges; skip 'mentioned' unless --expand-mentions
→ Unresolved (external) references included in output but not traversed further
↓
4. COLLECT EVENTS: For all entities (seed + expanded):
- Entity creation (created_at from issues/merge_requests)
- State changes (resource_state_events)
- Label changes (resource_label_events)
- Milestone changes (resource_milestone_events)
- Evidence notes: top FTS5-matched notes as discrete events (snippet + author + url)
- Merge events (merged_at from merge_requests)
↓
5. INTERLEAVE: Sort all events chronologically
↓
6. RENDER: Format as timeline (human or JSON)
Why evidence notes instead of "discussion activity summarized": The forcing function is "What happened with X?" A timeline entry that says "3 new comments" doesn't answer why — it answers how many. By including the top FTS5-matched notes as first-class timeline events, the timeline surfaces the actual decision rationale, code review feedback, and architectural reasoning that motivated changes. This uses the existing search infrastructure (CP3) with no new indexing required.
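Step 3 (EXPAND) is a depth-limited breadth-first search. The sketch below runs over an in-memory edge list with hypothetical types (`EntityKey`, `Edge`, `Via`) and treats edges as undirected, which is an assumption: the text does not say whether traversal honors source→target direction. It records the `via` provenance used in robot mode, skips unresolved targets, and follows `mentioned` edges only when `expand_mentions` is set.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// (entity_type, local DB id); a hypothetical simplification.
pub type EntityKey = (&'static str, i64);

#[derive(Debug, Clone)]
pub struct Edge {
    pub source: EntityKey,
    /// None = unresolved/external target: reported, never traversed.
    pub target: Option<EntityKey>,
    pub reference_type: &'static str, // "closes" | "mentioned" | "related"
}

/// How an expanded entity was reached (feeds the robot-mode `via` field).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Via {
    pub from: EntityKey,
    pub reference_type: &'static str,
    pub depth: u32,
}

/// Depth-limited BFS from the seeds. Follows 'closes' and 'related' edges,
/// plus 'mentioned' when `expand_mentions` is true. Seeds are not reported.
pub fn expand(
    seeds: &[EntityKey],
    edges: &[Edge],
    max_depth: u32,
    expand_mentions: bool,
) -> HashMap<EntityKey, Via> {
    let allowed =
        |t: &str| t == "closes" || t == "related" || (expand_mentions && t == "mentioned");
    let mut visited: HashSet<EntityKey> = seeds.iter().copied().collect();
    let mut expanded = HashMap::new();
    let mut queue: VecDeque<(EntityKey, u32)> = seeds.iter().map(|&s| (s, 0)).collect();

    while let Some((node, depth)) = queue.pop_front() {
        if depth >= max_depth {
            continue; // depth limit reached; don't look further from here
        }
        for e in edges.iter().filter(|e| allowed(e.reference_type)) {
            // The resolved entity on the other end of the edge, if any.
            let neighbor = match e.target {
                Some(t) if e.source == node => t,
                Some(t) if t == node => e.source,
                _ => continue,
            };
            if visited.insert(neighbor) {
                let via = Via { from: node, reference_type: e.reference_type, depth: depth + 1 };
                expanded.insert(neighbor, via);
                queue.push_back((neighbor, depth + 1));
            }
        }
    }
    expanded
}
```

The linear scan over `edges` per node keeps the sketch short; a real implementation would instead query `entity_references` through `idx_entity_refs_source` and `idx_entity_refs_target`.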
3.3 Event Model
The timeline doesn't store a separate unified event table. Instead, it queries across the existing tables at read time and produces a virtual event stream:
pub struct TimelineEvent {
pub timestamp: i64, // ms epoch
pub entity_type: String, // "issue" | "merge_request" | "discussion"
pub entity_iid: i64,
pub project_path: String,
pub event_type: TimelineEventType,
pub summary: String, // human-readable one-liner
pub actor: Option<String>, // username
pub url: Option<String>,
pub is_seed: bool, // matched by keyword (vs. expanded via reference)
}
pub enum TimelineEventType {
Created, // entity opened/created
StateChanged { state: String }, // closed, reopened, merged, locked
LabelAdded { label: String },
LabelRemoved { label: String },
MilestoneSet { milestone: String },
MilestoneRemoved { milestone: String },
Merged,
NoteEvidence { // FTS5-matched note surfacing decision rationale
note_id: i64,
snippet: String, // first ~200 chars of the matching note body
discussion_id: Option<i64>,
},
CrossReferenced { target: String },
}
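Step 5 (INTERLEAVE) amounts to concatenating the per-table event lists and sorting by timestamp. This sketch uses a trimmed-down, hypothetical `EventKind` rather than the full `TimelineEventType`; only `"created"` and `"note_evidence"` are confirmed as robot-mode `event_type` strings by the JSON example below, so the other names are assumptions.

```rust
/// Trimmed-down, hypothetical stand-in for TimelineEventType.
#[derive(Debug, Clone, PartialEq)]
pub enum EventKind {
    Created,
    StateChanged { state: String },
    LabelAdded { label: String },
    Merged,
    NoteEvidence { note_id: i64 },
}

impl EventKind {
    /// Robot-mode `event_type` string.
    pub fn as_str(&self) -> &'static str {
        match self {
            EventKind::Created => "created",
            EventKind::StateChanged { .. } => "state_changed", // assumed name
            EventKind::LabelAdded { .. } => "label_added",     // assumed name
            EventKind::Merged => "merged",                     // assumed name
            EventKind::NoteEvidence { .. } => "note_evidence",
        }
    }
}

#[derive(Debug, Clone)]
pub struct Event {
    pub timestamp: i64, // ms epoch
    pub entity_iid: i64,
    pub kind: EventKind,
}

/// INTERLEAVE: flatten the per-source lists and sort chronologically.
/// `sort_by_key` is stable, so same-timestamp events keep collection order.
pub fn interleave(sources: Vec<Vec<Event>>) -> Vec<Event> {
    let mut all: Vec<Event> = sources.into_iter().flatten().collect();
    all.sort_by_key(|e| e.timestamp);
    all
}
```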
3.4 Human Output Format
lore timeline "auth migration"
Timeline: "auth migration" (12 events across 3 entities)
───────────────────────────────────────────────────────
2024-03-15 CREATED #234 Migrate to OAuth2 @alice
Labels: ~auth, ~breaking-change
2024-03-18 CREATED !567 feat: add OAuth2 provider @bob
References: #234
2024-03-20 NOTE #234 "Should we support SAML too? I think @charlie
we should stick with OAuth2 for now..."
2024-03-22 LABEL !567 added ~security-review @alice
2024-03-24 NOTE !567 [src/auth/oauth.rs:45] @dave
"Consider refresh token rotation to
prevent session fixation attacks"
2024-03-25 MERGED !567 feat: add OAuth2 provider @alice
2024-03-26 CLOSED #234 closed by !567 @alice
2024-03-28 CREATED #299 OAuth2 login fails for SSO users @dave [expanded]
(via !567, closes)
───────────────────────────────────────────────────────
Seed entities: #234, !567 | Expanded: #299 (depth 1, via !567)
Entities discovered via cross-reference expansion are marked [expanded] with a compact provenance note showing which seed entity and edge type led to their discovery.
Evidence notes (NOTE events) show the first ~200 characters of FTS5-matched note bodies. These are the actual decision-bearing comments that answer "why" — not just activity counts.
3.5 Robot Mode JSON
{
"ok": true,
"data": {
"query": "auth migration",
"event_count": 12,
"seed_entities": [
{ "type": "issue", "iid": 234, "project": "group/repo" },
{ "type": "merge_request", "iid": 567, "project": "group/repo" }
],
"expanded_entities": [
{
"type": "issue",
"iid": 299,
"project": "group/repo",
"depth": 1,
"via": {
"from": { "type": "merge_request", "iid": 567, "project": "group/repo" },
"reference_type": "closes",
"source_method": "api"
}
}
],
"unresolved_references": [
{
"source": { "type": "merge_request", "iid": 567, "project": "group/repo" },
"target_project": "group/other-repo",
"target_type": "issue",
"target_iid": 42,
"reference_type": "mentioned"
}
],
"events": [
{
"timestamp": "2024-03-15T10:00:00Z",
"entity_type": "issue",
"entity_iid": 234,
"project": "group/repo",
"event_type": "created",
"summary": "Migrate to OAuth2",
"actor": "alice",
"url": "https://gitlab.com/group/repo/-/issues/234",
"is_seed": true,
"details": {
"labels": ["auth", "breaking-change"]
}
},
{
"timestamp": "2024-03-20T14:30:00Z",
"entity_type": "issue",
"entity_iid": 234,
"project": "group/repo",
"event_type": "note_evidence",
"summary": "Should we support SAML too? I think we should stick with OAuth2 for now...",
"actor": "charlie",
"url": "https://gitlab.com/group/repo/-/issues/234#note_12345",
"is_seed": true,
"details": {
"note_id": 12345,
"snippet": "Should we support SAML too? I think we should stick with OAuth2 for now..."
}
}
]
},
"meta": {
"search_mode": "lexical",
"expansion_depth": 1,
"expand_mentions": false,
"total_entities": 3,
"total_events": 12,
"evidence_notes_included": 4,
"unresolved_references": 1
}
}
3.6 Acceptance Criteria
- `lore timeline <query>` returns chronologically ordered events
- Seed entities found via FTS5 keyword search (issues, MRs, and notes)
- State, label, and milestone events interleaved from resource event tables
- Entity creation and merge events included
- Evidence-bearing notes included as `note_evidence` events (top FTS5 matches, bounded default 10)
- Cross-reference expansion follows `entity_references` to configurable depth
- Default expansion follows `closes` + `related` edges; `--expand-mentions` adds `mentioned` edges
- `--depth 0` disables expansion
- `--since` filters by event timestamp
- `-p` scopes to project
- Human output is colored and readable
- Robot mode returns structured JSON with expansion provenance (`via`) for expanded entities
- Unresolved (external) references included in JSON output
Gate 4: File Decision History (lore file-history)
4.1 Schema
Commit SHAs (Migration 015 — already applied):
merge_commit_sha and squash_commit_sha were added to merge_requests in migration 015. These are now populated during MR ingestion and available for Gate 4/5 queries.
File changes table (future migration — not yet created):
```sql
-- Files changed by each merge request
-- Source: GET /projects/:id/merge_requests/:iid/diffs
CREATE TABLE mr_file_changes (
    id INTEGER PRIMARY KEY,
    merge_request_id INTEGER NOT NULL REFERENCES merge_requests(id) ON DELETE CASCADE,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    old_path TEXT,            -- NULL for new files
    new_path TEXT NOT NULL,
    change_type TEXT NOT NULL CHECK (change_type IN ('added', 'modified', 'deleted', 'renamed')),
    UNIQUE(merge_request_id, new_path)
);

CREATE INDEX idx_mr_files_new_path ON mr_file_changes(new_path);
CREATE INDEX idx_mr_files_old_path ON mr_file_changes(old_path)
    WHERE old_path IS NOT NULL;
CREATE INDEX idx_mr_files_mr ON mr_file_changes(merge_request_id);
```
4.2 Config Extension
```json
{
  "sync": {
    "fetchMrFileChanges": true
  }
}
```
Opt-in. When enabled, the sync pipeline fetches GET /projects/:id/merge_requests/:iid/diffs for each changed MR and extracts file metadata. Diff content is not stored — only file paths and change types.
4.3 Ingestion
Uses the generic dependent fetch queue (job_type = 'mr_diffs'):
- After MR ingestion, if fetchMrFileChanges is true, enqueue an mr_diffs job in pending_dependent_fetches.
- Parse the response, a JSON array of diff objects, reading {old_path, new_path, new_file, renamed_file, deleted_file} from each entry.
- Derive change_type:
  - new_file == true → 'added'
  - renamed_file == true → 'renamed'
  - deleted_file == true → 'deleted'
  - else → 'modified'
- Upsert into mr_file_changes. On re-sync, DELETE existing rows for the MR and re-insert them (diffs can change if the MR is rebased).
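The change_type derivation above is a precedence chain: the first flag that is set wins. The following is a minimal Rust sketch of that mapping. The DiffEntry and ChangeType names and the old_path_column helper are hypothetical, not lore's actual types. The field names mirror the JSON keys of a diff entry.

```rust
/// Values allowed by the CHECK constraint on mr_file_changes.change_type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChangeType {
    Added,
    Modified,
    Deleted,
    Renamed,
}

impl ChangeType {
    /// String stored in the change_type column.
    pub fn as_str(self) -> &'static str {
        match self {
            ChangeType::Added => "added",
            ChangeType::Modified => "modified",
            ChangeType::Deleted => "deleted",
            ChangeType::Renamed => "renamed",
        }
    }
}

/// Hypothetical subset of one diff entry returned by the diffs endpoint.
pub struct DiffEntry {
    pub old_path: String,
    pub new_path: String,
    pub new_file: bool,
    pub renamed_file: bool,
    pub deleted_file: bool,
}

/// Precedence: new_file, then renamed_file, then deleted_file, else modified.
pub fn derive_change_type(d: &DiffEntry) -> ChangeType {
    if d.new_file {
        ChangeType::Added
    } else if d.renamed_file {
        ChangeType::Renamed
    } else if d.deleted_file {
        ChangeType::Deleted
    } else {
        ChangeType::Modified
    }
}

/// Value for the old_path column: NULL (None) for new files, per the schema comment.
pub fn old_path_column(d: &DiffEntry) -> Option<&str> {
    if d.new_file {
        None
    } else {
        Some(d.old_path.as_str())
    }
}
```

Checking new_file first means an entry with several flags set still receives exactly one classification.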
API call cost: at least 1 additional call per MR (more for MRs whose file list spans several result pages). Acceptable for incremental sync (10–50 MRs/day).
4.4 Command Design
```bash
# Show decision history for a file
lore file-history src/auth/oauth.rs

# Scoped to project (required if file path exists in multiple projects)
lore file-history src/auth/oauth.rs -p group/repo

# Include discussions on the MRs
lore file-history src/auth/oauth.rs --discussions

# Follow rename chains (default: on)
lore file-history src/auth/oauth.rs                      # follows renames automatically
lore file-history src/auth/oauth.rs --no-follow-renames  # disable rename chain resolution

# Limit results
lore file-history src/auth/oauth.rs -n 10

# Filter to merged MRs only
lore file-history src/auth/oauth.rs --merged

# Robot mode
lore -J file-history src/auth/oauth.rs
```
4.5 Query Logic
```sql
SELECT
    mr.iid,
    mr.title,
    mr.state,
    mr.author_username,
    mr.merged_at,
    mr.created_at,
    mr.web_url,
    mr.merge_commit_sha,
    mfc.change_type,
    mfc.old_path,
    (SELECT COUNT(*) FROM discussions d
        WHERE d.merge_request_id = mr.id) AS discussion_count,
    (SELECT COUNT(*) FROM notes n
        JOIN discussions d ON n.discussion_id = d.id
        WHERE d.merge_request_id = mr.id
          AND n.position_new_path = ?1) AS file_discussion_count
FROM mr_file_changes mfc
JOIN merge_requests mr ON mr.id = mfc.merge_request_id
WHERE mfc.new_path = ?1 OR mfc.old_path = ?1
ORDER BY COALESCE(mr.merged_at, mr.created_at) DESC;
```
For each MR, optionally fetch related issues via entity_references (Gate 2 data).
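The query above binds a single path as ?1. When the path set contains more than one path (for example, after rename resolution), the WHERE clause needs an IN list. The following is a minimal sketch of a hypothetical path_set_filter helper. It relies on SQLite's numbered ?NNN parameters, which lets the same bound values serve both the new_path and old_path comparisons. The same substitution would presumably also apply to the position_new_path comparison in the file_discussion_count subquery.

```rust
/// Build the WHERE clause for N paths using numbered placeholders ?1..?N.
/// SQLite binds every occurrence of ?k to the same value, so the paths only
/// need to be bound once.
pub fn path_set_filter(n_paths: usize) -> String {
    assert!(n_paths > 0, "path set must contain at least the queried path");
    let placeholders: Vec<String> = (1..=n_paths).map(|i| format!("?{i}")).collect();
    let list = placeholders.join(", ");
    format!("WHERE mfc.new_path IN ({list}) OR mfc.old_path IN ({list})")
}
```

With one path, this reduces to the single-path form used above, apart from the IN syntax.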
4.6 Rename Handling
File renames are tracked via old_path and resolved as bounded chains:
- Start with the query path in the path set: {src/auth/oauth.rs}
- Search mr_file_changes for rows where change_type = 'renamed' and either new_path or old_path is in the path set
- Add the other side of each rename to the path set
- Repeat until no new paths are discovered, up to a maximum of 10 hops (configurable)
- Use the full path set for the file history query
Safeguards:
- Hop cap (default 10) prevents runaway expansion
- Cycle detection: if a path is already in the set, skip it
- The unioned path set is used for matching MRs in the main query
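The following is a minimal sketch of the bounded rename expansion described above. It is a breadth-first search in which each loop iteration is one hop. It assumes, for illustration only, that all rename rows for the project are loaded into memory up front. The steps above instead describe one mr_file_changes query per hop. RenameEdge and resolve_rename_paths are hypothetical names.

```rust
use std::collections::{BTreeSet, HashSet};

/// One row from mr_file_changes with change_type = 'renamed'.
pub struct RenameEdge {
    pub old_path: String,
    pub new_path: String,
}

/// Expand `start` into every path linked to it by rename edges, following
/// renames in both directions for at most `max_hops` iterations. Paths already
/// in the set are never re-added, which also stops cycles.
pub fn resolve_rename_paths(start: &str, renames: &[RenameEdge], max_hops: usize) -> BTreeSet<String> {
    let mut paths: BTreeSet<String> = BTreeSet::new();
    paths.insert(start.to_string());
    let mut frontier: HashSet<String> = HashSet::new();
    frontier.insert(start.to_string());

    for _ in 0..max_hops {
        let mut next = HashSet::new();
        for edge in renames {
            // Add the other side of any rename that touches the current frontier.
            if frontier.contains(&edge.new_path) && !paths.contains(&edge.old_path) {
                next.insert(edge.old_path.clone());
            }
            if frontier.contains(&edge.old_path) && !paths.contains(&edge.new_path) {
                next.insert(edge.new_path.clone());
            }
        }
        if next.is_empty() {
            break; // no new paths discovered
        }
        paths.extend(next.iter().cloned());
        frontier = next;
    }
    paths
}
```

Only newly discovered paths are expanded on each hop. This keeps the work per hop proportional to the frontier rather than to the whole set.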
Output:
- Human mode annotates the rename chain: "src/auth/oauth.rs (renamed from src/auth/handler.rs ← src/auth.rs)"
- Robot mode JSON includes rename_chain: ["src/auth.rs", "src/auth/handler.rs", "src/auth/oauth.rs"]
- --no-follow-renames disables chain resolution (matches only the literal path provided)
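The human-mode annotation is the robot-mode rename_chain read backwards. The following is a sketch of a hypothetical format_rename_annotation helper. It assumes the chain is already ordered oldest to newest, as in the JSON example, and returns None when no rename occurred.

```rust
/// Render "current (renamed from previous ← older ...)" from a chain ordered
/// oldest to newest. Returns None for an empty or single-path chain.
pub fn format_rename_annotation(chain: &[&str]) -> Option<String> {
    let (current, older) = chain.split_last()?;
    if older.is_empty() {
        return None; // no renames: print the bare path
    }
    // Walk back in time: most recent predecessor first.
    let history: Vec<&str> = older.iter().rev().copied().collect();
    Some(format!("{current} (renamed from {})", history.join(" ← ")))
}
```

For ["src/auth.rs", "src/auth/handler.rs", "src/auth/oauth.rs"], this yields the annotation shown in the first bullet.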
4.7 Acceptance Criteria
- mr_file_changes table populated from GitLab diffs API
- merge_commit_sha and squash_commit_sha captured in merge_requests
- lore file-history <path> returns MRs ordered by merge/creation date
- Output includes: MR title, state, author, change type, discussion count
- --discussions shows inline discussion snippets from DiffNotes on the file
- Rename chains resolved with bounded hop count (default 10) and cycle detection
- --no-follow-renames disables chain resolution
- Robot mode JSON includes rename_chain when renames are detected
- Robot mode JSON output
- -p required when path exists in multiple projects (Ambiguous error)
Gate 5: Code Trace (lore trace)
5.1 Overview
lore trace answers "Why was this code introduced?" by tracing from a file (and optionally a line number) back through the MR and issue that motivated the change.
5.2 Two-Tier Architecture
Tier 1 — API-only (no local git required):
Uses merge_commit_sha and squash_commit_sha from the merge_requests table to link MRs to commits. Combined with mr_file_changes, this can answer "which MRs touched this file" and link to their motivating issues via entity_references.
This is equivalent to lore file-history enriched with issue context — effectively a file-scoped decision timeline.
Tier 2 — Git integration (requires local clone):
Uses git blame to map a specific line to a commit SHA, then resolves the commit to an MR via merge_commit_sha or squash_commit_sha lookup. This provides line-level precision.
Gate 5 ships Tier 1 only. Tier 2 (git integration via git2-rs) is a future enhancement.
5.3 Command Design
```bash
# Trace a file's history (Tier 1: API-only)
lore trace src/auth/oauth.rs

# Trace a specific line (Tier 2: requires local git)
lore trace src/auth/oauth.rs:45

# Robot mode
lore -J trace src/auth/oauth.rs
```
5.4 Query Flow (Tier 1)
```text
1. Find MRs that touched this file (mr_file_changes)
   ↓
2. For each MR, find related issues (entity_references WHERE reference_type = 'closes')
   ↓
3. For each issue, fetch discussions with rationale
   ↓
4. Build trace chain: file → MR → issue → discussions
   ↓
5. Order by merge date (most recent first)
```
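The following is a minimal sketch of steps 2, 4 and 5: attaching 'closes' edges to each MR and ordering by merge date, most recent first. Step 3 (fetching discussions) is omitted. MrHit, ClosesEdge, TraceEntry and build_trace are hypothetical shapes, not lore's schema. The sketch assumes MRs are distinct and that timestamps are ISO-8601 UTC strings in one fixed format, so they sort correctly as plain strings. Falling back to created_at mirrors the COALESCE in the file-history query.

```rust
use std::collections::HashMap;

/// An MR that touched the traced file (step 1 output).
pub struct MrHit {
    pub iid: i64,
    pub title: String,
    pub merged_at: Option<String>, // e.g. "2024-03-25T00:00:00Z"
    pub created_at: String,
}

/// A 'closes' edge from entity_references, reduced to MR and issue iids.
pub struct ClosesEdge {
    pub mr_iid: i64,
    pub issue_iid: i64,
}

pub struct TraceEntry {
    pub mr_iid: i64,
    pub title: String,
    pub closes: Vec<i64>,
    sort_key: String,
}

pub fn build_trace(mrs: Vec<MrHit>, closes: &[ClosesEdge]) -> Vec<TraceEntry> {
    // Group closed issues by MR iid.
    let mut by_mr: HashMap<i64, Vec<i64>> = HashMap::new();
    for edge in closes {
        by_mr.entry(edge.mr_iid).or_default().push(edge.issue_iid);
    }
    let mut entries: Vec<TraceEntry> = mrs
        .into_iter()
        .map(|mr| TraceEntry {
            mr_iid: mr.iid,
            closes: by_mr.remove(&mr.iid).unwrap_or_default(),
            // Merge date when merged, otherwise creation date.
            sort_key: mr.merged_at.unwrap_or(mr.created_at),
            title: mr.title,
        })
        .collect();
    // Most recent first.
    entries.sort_by(|a, b| b.sort_key.cmp(&a.sort_key));
    entries
}
```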
5.5 Output Format (Human)
```text
lore trace src/auth/oauth.rs

Trace: src/auth/oauth.rs
────────────────────────

!567 feat: add OAuth2 provider MERGED 2024-03-25
  → Closes #234: Migrate to OAuth2
  → 12 discussion comments, 4 on this file
  → Decision: Use rust-oauth2 crate (discussed in #234, comment by @alice)

!612 fix: token refresh race condition MERGED 2024-04-10
  → Closes #299: OAuth2 login fails for SSO users
  → 5 discussion comments, 2 on this file
  → [src/auth/oauth.rs:45] "Add mutex around refresh to prevent double-refresh"

!701 refactor: extract TokenManager MERGED 2024-05-01
  → Related: #312: Reduce auth module complexity
  → 3 discussion comments
  → Note: file was renamed from src/auth/handler.rs
```
5.6 Tier 2 Design Notes (Future — Not in This Phase)
When git integration is added:
- Add git2-rs dependency for native git operations
- Implement git blame -L <line>,<line> <file> to get the commit SHA for a specific line
- Look up the commit SHA in merge_requests.merge_commit_sha or merge_requests.squash_commit_sha
- If no match (the blamed SHA is an individual commit from an MR that was merged without squashing), search merge_commit_sha for the merge commit that introduced that commit
- Optional blame_cache table for performance (invalidated by content hash)
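The direct lookup step amounts to a single map from every known merge or squash SHA to its MR. The following is a minimal sketch of that step only; the fallback search is not covered. MrShas, build_sha_index and mr_for_blamed_sha are hypothetical names, and the sketch assumes full SHAs on both sides.

```rust
use std::collections::HashMap;

/// Hypothetical projection of the merge_requests columns needed for Tier 2.
pub struct MrShas {
    pub iid: i64,
    pub merge_commit_sha: Option<String>,
    pub squash_commit_sha: Option<String>,
}

/// Map every known merge or squash commit SHA to its MR iid.
pub fn build_sha_index(mrs: &[MrShas]) -> HashMap<String, i64> {
    let mut index = HashMap::new();
    for mr in mrs {
        for sha in mr.merge_commit_sha.iter().chain(mr.squash_commit_sha.iter()) {
            index.insert(sha.clone(), mr.iid);
        }
    }
    index
}

/// None means the blamed commit is neither a merge nor a squash commit,
/// so the fallback search is needed.
pub fn mr_for_blamed_sha(index: &HashMap<String, i64>, sha: &str) -> Option<i64> {
    index.get(sha).copied()
}
```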
Known limitation: Squash commits break blame-to-MR mapping for individual commits within an MR. The squash commit SHA maps to the MR, but all lines show the same commit. This is a fundamental Git limitation documented in GitLab Forum #77146.
5.7 Acceptance Criteria (Tier 1 Only)
- lore trace <file> shows MRs that touched the file with linked issues and discussion context
- Output includes the MR → issue → discussion chain
- Discussion snippets show DiffNote content on the traced file
- Cross-references from entity_references used for MR→issue linking
- Robot mode JSON output
- Graceful handling when no MR data found ("Run lore sync with fetchMrFileChanges: true")
Migration Strategy
Migration Numbering
Phase B uses migration numbers 011–015. The original plan assumed migration 010 was available, but chunk config (010_chunk_config.sql) was implemented first, shifting everything by +1.
| Migration | File | Content | Gate |
|---|---|---|---|
| 011 | 011_resource_events.sql | Resource event tables (state, label, milestone), entity_references, generic dependent fetch queue | Gates 1, 2 |
| 012 | 012_nullable_label_milestone.sql | Make label_name and milestone_title nullable for deleted labels/milestones | Gate 1 (fix) |
| 013 | 013_resource_event_watermarks.sql | Add resource_events_synced_for_updated_at to issues and merge_requests | Gate 1 (optimization) |
| 014 | 014_sync_runs_enrichment.sql | Observability: run_id, total_items_processed, total_errors on sync_runs | Observability |
| 015 | 015_commit_shas_and_closes_watermark.sql | merge_commit_sha, squash_commit_sha, closes_issues_synced_for_updated_at on merge_requests; idx_label_events_label index | Gates 2, 4 |
| TBD | (not yet created) | mr_file_changes table for MR diff data | Gate 4 |
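The watermark columns from migrations 013 and 015 let sync skip entities whose data has not changed since their last dependent fetch. The following is a minimal sketch of that filter for closes_issues jobs. Its comparison rule is an assumption: a row needs a job when the watermark is NULL or older than updated_at. It also assumes timestamps are epoch milliseconds; the real column types and comparison may differ. MrWatermarkRow, needs_closes_sync and mrs_to_enqueue are hypothetical names.

```rust
/// Hypothetical projection of a merge_requests row for the watermark check.
pub struct MrWatermarkRow {
    pub id: i64,
    pub updated_at: i64, // assumed epoch milliseconds
    pub closes_issues_synced_for_updated_at: Option<i64>,
}

/// Never synced, or updated since the updated_at value recorded at the last sync.
pub fn needs_closes_sync(row: &MrWatermarkRow) -> bool {
    match row.closes_issues_synced_for_updated_at {
        None => true,
        Some(synced_for) => row.updated_at > synced_for,
    }
}

/// IDs of MRs that should receive a closes_issues job in this sync.
pub fn mrs_to_enqueue(rows: &[MrWatermarkRow]) -> Vec<i64> {
    rows.iter().filter(|r| needs_closes_sync(r)).map(|r| r.id).collect()
}
```

After a job succeeds, the sync would record the MR's current updated_at as its new watermark, so an unchanged MR is skipped on the next run.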
Backward Compatibility
- All new tables are additive (no ALTER on existing data-bearing columns)
- lore sync works without event data; temporal commands gracefully report "No event data. Run lore sync to populate."
- Existing search, issues, and mrs commands are unaffected
Risks and Mitigations
Identified During Premortem
| Risk | Severity | Mitigation |
|---|---|---|
| API call volume explosion (3 event calls per entity) | Medium | Incremental sync limits to changed entities; opt-in config flag |
| System note parsing fragile for non-English instances | Medium | Used only for assignee changes and cross-refs; source_method tracks provenance |
| GitLab diffs API returns large payloads | Low | Extract file metadata only, discard diff content |
| Cross-reference graph traversal unbounded | Medium | BFS depth capped at configurable limit (default 1); mentioned edges excluded by default |
| Cross-project references lost when target not synced | Medium | Unresolved references stored with target_entity_id = NULL; still appear in timeline output |
| Phase A migration numbering conflict | Low | Resolved: chunk config took 010; Phase B shifted to 011-015 |
| Timeline output lacks "why" evidence | Medium | Evidence-bearing notes from FTS5 included as first-class timeline events |
| Squash commits break blame-to-MR mapping | Medium | Tier 2 (git integration) deferred; Tier 1 uses file-level MR matching |
Accepted Limitations
- No real-time monitoring. Phase B is batch queries over historical data. "Notify me when my code changes" requires a different architecture (webhooks, polling daemon) and is out of scope.
- No pattern evolution. Cross-project trend detection requires all of Phase B's infrastructure plus semantic clustering. Deferred to Phase C.
- English-only system note parsing. Cross-reference extraction from system notes works reliably only for English-language GitLab instances. Structured API data works for all languages.
- Bounded rename chain resolution. lore file-history resolves rename chains up to 10 hops with cycle detection. Pathological rename histories (>10 hops) are truncated.
- Evidence notes are keyword-matched, not summarized. Timeline evidence notes are the raw FTS5-matched note text, not AI-generated summaries. This keeps the system deterministic and avoids LLM dependencies.
Success Metrics
| Metric | Target |
|---|---|
| lore timeline query latency | < 200 ms for typical queries (< 50 seed entities) |
| Timeline event coverage | State + label + creation + merge + evidence note events for all synced entities |
| Timeline evidence quality | Top 10 FTS5-matched notes included per query; at least 1 evidence note for queries matching discussion-bearing entities |
| Cross-reference coverage | > 80% of "closed by MR" relationships captured via structured API |
| Unresolved reference capture | Cross-project references stored even when target project is not synced |
| Incremental sync overhead | < 5% increase in sync time for event fetching |
| lore file-history coverage | File changes captured for all synced MRs (when opt-in enabled) |
| Rename chain resolution | Multi-hop renames correctly resolved up to 10 hops |
Future Phases (Out of Scope)
Phase C: Advanced Temporal Features
- Pattern Evolution: cross-project trend detection via embedding clusters
- Git integration (Tier 2): git blame → commit → MR resolution
- MCP server: expose timeline, file-history, and trace as typed MCP tools
Phase D: Consumer Applications
- Web UI: separate frontend consuming lore's JSON API via lore serve
- Real-time monitoring: webhook listener or polling daemon for change notifications
- IDE integration: editor plugins surfacing temporal context inline