14 Commits

Author SHA1 Message Date
teernisse
3f38b3fda7 docs: add comprehensive command surface analysis
Deep analysis of the full `lore` CLI command surface (34 commands across
6 categories) covering command inventory, data flow, overlap analysis,
and optimization proposals.

Document structure:
- Main consolidated doc: docs/command-surface-analysis.md (1251 lines)
- Split sections in docs/command-surface-analysis/ for navigation:
  00-overview.md      - Summary, inventory, priorities
  01-entity-commands.md   - issues, mrs, notes, search, count
  02-intelligence-commands.md - who, timeline, me, file-history, trace, related, drift
  03-pipeline-and-infra.md    - sync, ingest, generate-docs, embed, diagnostics
  04-data-flow.md     - Shared data source map, command network graph
  05-overlap-analysis.md  - Quantified overlap percentages for every command pair
  06-agent-workflows.md   - Common agent flows, round-trip costs, token profiles
  07-consolidation-proposals.md  - 5 proposals to reduce 34 commands to 29
  08-robot-optimization-proposals.md - 6 proposals for --include, --batch, --depth
  09-appendices.md    - Robot output envelope, field presets, exit codes

Key findings:
- High overlap pairs: who-workload/me (~85%), health/doctor (~90%)
- 5 consolidation proposals to reduce command count by 15%
- 6 robot-mode optimization proposals targeting agent round-trip reduction
- Full DB table mapping and data flow documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 00:08:31 -05:00
teernisse
439c20e713 release: v0.9.1 2026-02-26 11:39:05 -05:00
teernisse
fd0a40b181 chore: update beads and GitLab TODOs integration plan
Update beads issue tracking state and expand the GitLab TODOs
notifications integration design document with additional
implementation details.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-26 11:07:04 -05:00
teernisse
b2811b5e45 fix(fts): remove NEAR from infix operator list
NEAR is an FTS5 function (NEAR(term1 term2, N)), not an infix operator like
AND/OR/NOT. Passing it through unquoted in Safe mode was incorrect - it would
be treated as a literal term rather than a function call.

Users who need NEAR proximity search should use FtsQueryMode::Raw which
passes the query through verbatim to FTS5.
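
Illustrative sketch of Safe-mode behavior after this change (not the actual to_fts_query() implementation): AND/OR/NOT pass through as bare infix operators, while every other token, now including NEAR, is quoted and matched literally.

```rust
fn safe_fts_query(input: &str) -> String {
    input
        .split_whitespace()
        .map(|tok| match tok {
            // Only true infix operators stay bare.
            "AND" | "OR" | "NOT" => tok.to_string(),
            // Everything else (including NEAR) is quoted as a literal term.
            other => format!("\"{}\"", other.replace('"', "\"\"")),
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```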

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-26 11:06:59 -05:00
teernisse
2d2e470621 refactor(orchestrator): consolidate stale lock reclamation and fix edge cases
Several improvements to the ingestion orchestrator:

1. Stale lock reclamation consolidation:
   Previously, reclaim_stale_locks() was called redundantly in multiple
   drain functions (drain_resource_events, drain_closes_issues, etc.).
   Now it's called once at sync entry points (ingest_project_issues,
   ingest_project_mrs) to reduce overhead and DB contention.

2. Fix status_enrichment_mode error values:
   - "fetched" -> "error" when project path is missing
   - "fetched" -> "fetch_error" when GraphQL fetch fails
   These values are used in robot mode JSON output and should accurately
   reflect the error condition.

3. Add batch_size zero guard:
   Added .max(1) to batch_size calculation to prevent panic in .chunks()
   when config.sync.dependent_concurrency is 0. This makes the code
   defensive against misconfiguration.

These changes improve correctness and reduce unnecessary DB operations
during sync, particularly beneficial for large projects with many entities.
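
Minimal sketch of the guard in (3) -- names here are assumptions, but the pattern is the fix, since slice::chunks panics when the chunk size is 0:

```rust
/// Split work items into batches, clamping a misconfigured concurrency of 0 up to 1.
fn batches<'a, T>(items: &'a [T], configured_concurrency: usize) -> std::slice::Chunks<'a, T> {
    items.chunks(configured_concurrency.max(1))
}
```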

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-26 11:06:44 -05:00
teernisse
23efb15599 feat(truncation): add pre-truncation for oversized descriptions
Add pre_truncate_description() to prevent unbounded memory allocation when
processing pathologically large descriptions (e.g., 500MB base64 blobs in
issue descriptions).

Previously, the document extraction pipeline would:
1. Allocate memory for the entire description
2. Append to content buffer
3. Only truncate at the end via truncate_hard_cap()

For a 500MB description, this would allocate 500MB+ before truncation.

New approach:
1. Check description size BEFORE appending
2. If over limit, truncate at UTF-8 boundary immediately
3. Add human-readable marker: "[... description truncated from 500.0MB to 2.0MB ...]"
4. Log warning with original size for observability

Also adds format_bytes() helper for human-readable byte sizes (B, KB, MB).

This is applied to both issue and MR document extraction in extractor.rs,
protecting the embedding pipeline from OOM on malformed GitLab data.
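
Hedged sketch of the new approach (the real pre_truncate_description() and format_bytes() signatures may differ):

```rust
fn pre_truncate(desc: &str, limit: usize) -> String {
    if desc.len() <= limit {
        return desc.to_string();
    }
    // Walk back from the byte limit to the nearest char boundary so the
    // truncated slice is valid UTF-8.
    let mut cut = limit;
    while !desc.is_char_boundary(cut) {
        cut -= 1;
    }
    format!(
        "{}\n[... description truncated from {} to {} ...]",
        &desc[..cut],
        format_bytes(desc.len()),
        format_bytes(cut)
    )
}

/// Human-readable byte sizes (B, KB, MB), as described above.
fn format_bytes(n: usize) -> String {
    match n {
        0..=1023 => format!("{n}B"),
        1024..=1_048_575 => format!("{:.1}KB", n as f64 / 1024.0),
        _ => format!("{:.1}MB", n as f64 / (1024.0 * 1024.0)),
    }
}
```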

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-26 11:06:32 -05:00
teernisse
a45c37c7e4 feat(timeline): add entity-direct seeding and round-robin evidence selection
Enhance the timeline command with two major improvements:

1. Entity-direct seeding syntax (bypass search):
   lore timeline issue:42    # Timeline for specific issue
   lore timeline i:42        # Short form
   lore timeline mr:99       # Timeline for specific MR
   lore timeline m:99        # Short form

   This directly resolves the entity and gathers ALL its discussions without
   requiring search/embedding. Useful when you know exactly which entity you want.

2. Round-robin evidence note selection:
   Previously, evidence notes were taken in FTS rank order, which could result
   in all notes coming from a single high-traffic discussion. Now we:
   - Fetch 5x the requested limit (or minimum 50)
   - Group notes by discussion_id
   - Select round-robin across discussions
   - This ensures diverse evidence from multiple conversations
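
An illustrative parser for the entity-direct syntax in (1), with prefix forms taken from the examples above (the real resolution lives in timeline.rs / timeline_seed.rs):

```rust
fn parse_entity_ref(query: &str) -> Option<(&'static str, u64)> {
    let (prefix, rest) = query.split_once(':')?;
    let kind = match prefix {
        "issue" | "i" => "issue",
        "mr" | "m" => "mr",
        _ => return None,
    };
    rest.parse().ok().map(|iid| (kind, iid))
}
```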

API changes:
- Renamed total_events_before_limit -> total_filtered_events (clearer semantics)
- Added resolve_entity_by_iid() in timeline.rs for IID-based entity resolution
- Added seed_timeline_direct() in timeline_seed.rs for search-free seeding
- Added round_robin_select_by_discussion() helper function
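
Hedged sketch of the round-robin step in (2), with types simplified (the real round_robin_select_by_discussion() may differ):

```rust
use std::collections::BTreeMap;

#[derive(Clone)]
struct Note {
    discussion_id: String,
    body: String,
}

/// Take one note per discussion per pass until `limit`, preserving the
/// incoming rank order within each discussion.
fn round_robin_select(notes: &[Note], limit: usize) -> Vec<Note> {
    let mut groups: BTreeMap<&str, Vec<&Note>> = BTreeMap::new();
    for n in notes {
        groups.entry(&n.discussion_id).or_default().push(n);
    }
    let mut out = Vec::new();
    let mut round = 0;
    while out.len() < limit {
        let mut took_any = false;
        for g in groups.values() {
            if let Some(n) = g.get(round) {
                out.push((*n).clone());
                took_any = true;
                if out.len() == limit {
                    break;
                }
            }
        }
        if !took_any {
            break; // all discussions exhausted
        }
        round += 1;
    }
    out
}
```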

The entity-direct mode uses search_mode: "direct" to distinguish from
"hybrid" or "lexical" search modes in the response metadata.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-26 11:06:23 -05:00
teernisse
8657e10822 feat(related): add semantic similarity discovery command
Implement `lore related` command for discovering semantically similar entities
using vector embeddings. Supports two modes:

Entity mode:
  lore related issues 42     # Find entities similar to issue #42
  lore related mrs 99        # Find entities similar to MR !99

Query mode:
  lore related "auth bug"    # Find entities matching free text query

Key features:
- Uses existing embedding infrastructure (nomic-embed-text via Ollama)
- Computes shared labels between source and results
- Shows similarity scores as percentage (0-100%)
- Warns when all results have low similarity (<30%)
- Warns for short queries (<=2 words) that may produce noisy results
- Filters out discussion/note documents, returning only issues and MRs
- Handles orphaned documents gracefully (skips if entity deleted)
- Robot mode JSON output with {ok, data, meta} envelope

Implementation details:
- distance_to_similarity() converts L2 distance to 0-1 score: 1/(1+distance)
- Uses saturating_add/saturating_mul for overflow safety on limit parameter
- Proper error handling for missing embeddings ("run lore embed first")
- Project scoping via -p flag with fuzzy matching
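
The similarity conversion is small enough to show inline; this matches the formula stated above:

```rust
/// Map an L2 distance to a 0-1 similarity: distance 0 -> 1.0, large -> ~0.
fn distance_to_similarity(l2_distance: f32) -> f32 {
    1.0 / (1.0 + l2_distance)
}
```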

CLI integration:
- Added to autocorrect.rs command registry
- Added Related variant to Commands enum in cli/mod.rs
- Wired into main.rs with handle_related()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-26 11:06:12 -05:00
teernisse
7fdeafa330 feat(db): add migration 028 for discussions.merge_request_id FK constraint
Add foreign key constraint on discussions.merge_request_id to prevent orphaned
discussions when MRs are deleted. SQLite doesn't support ALTER TABLE ADD CONSTRAINT,
so this migration recreates the table with:

1. New table with FK: REFERENCES merge_requests(id) ON DELETE CASCADE
2. Data copy with FK validation (only copies rows with valid MR references)
3. Table swap (DROP old, RENAME new)
4. Full index recreation (all 10 indexes from migrations 002-022)

The migration also includes a CHECK constraint ensuring mutual exclusivity:
- Issue discussions have issue_id NOT NULL and merge_request_id NULL
- MR discussions have merge_request_id NOT NULL and issue_id NULL

Also fixes run_migrations() to properly propagate query errors instead of
silently returning unwrap_or defaults, improving error diagnostics.
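
Hedged sketch of the recreate-and-swap pattern (rusqlite-style; the real migration carries the full column set plus the 10 index recreations):

```rust
fn migrate_028(conn: &rusqlite::Connection) -> rusqlite::Result<()> {
    conn.execute_batch(
        r#"
        CREATE TABLE discussions_new (
            id INTEGER PRIMARY KEY,
            -- columns abridged for illustration
            issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
            merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
            -- mutual exclusivity: exactly one parent is set
            CHECK ((issue_id IS NULL) <> (merge_request_id IS NULL))
        );
        INSERT INTO discussions_new
            SELECT id, issue_id, merge_request_id FROM discussions
            WHERE merge_request_id IS NULL
               OR merge_request_id IN (SELECT id FROM merge_requests);
        DROP TABLE discussions;
        ALTER TABLE discussions_new RENAME TO discussions;
        "#,
    )
}
```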

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-26 11:06:01 -05:00
teernisse
0fe3737035 docs(plan): add GitLab TODOs integration design document
Captures design decisions and acceptance criteria for adding GitLab
TODO support to lore. This plan was developed through user interview
to ensure the feature aligns with actual workflows.

Key design decisions:
- Read-only scope (no mark-as-done operations)
- Three integration points: --todos flag, activity enrichment, lore todos
- Account-wide: --project does NOT filter todos (unlike issues/MRs)
- Separate signal: todos don't affect attention state calculation
- Snapshot sync: missing todos = marked done elsewhere = delete locally

The plan covers:
- Database schema (todos table + indexes)
- GitLab API client extensions
- Sync pipeline integration
- Action type handling and grouping
- CLI commands and robot mode schemas
- Non-synced project handling with [external] indicator

Implementation is organized into 5 rollout slices:
A: Schema + Client, B: Sync, C: lore todos, D: lore me, E: Polish

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-25 10:02:55 -05:00
teernisse
87bdbda468 feat(status): add per-entity sync counts from migration 027
Enhances sync status reporting to include granular per-entity counts
that were added in database migration 027. This provides better
visibility into what each sync run actually processed.

New fields in SyncRunInfo and robot mode JSON:
- issues_fetched / issues_ingested: issue sync counts
- mrs_fetched / mrs_ingested: merge request sync counts
- skipped_stale: entities skipped due to staleness
- docs_regenerated / docs_embedded: document pipeline counts
- warnings_count: non-fatal issues during sync

Robot mode optimization:
- Uses skip_serializing_if = "is_zero" to omit zero-value fields
- Reduces JSON payload size for typical sync runs
- Maintains backwards compatibility (fields are additive)

SQL query now reads all 8 new columns from sync_runs table,
with defensive unwrap_or(0) for NULL handling.
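
Sketch of the zero-omission pattern (struct name hypothetical, field set abridged from the list above):

```rust
use serde::Serialize;

fn is_zero(n: &i64) -> bool {
    *n == 0
}

#[derive(Serialize)]
struct SyncRunCounts {
    #[serde(skip_serializing_if = "is_zero")]
    issues_fetched: i64,
    #[serde(skip_serializing_if = "is_zero")]
    mrs_ingested: i64,
    #[serde(skip_serializing_if = "is_zero")]
    warnings_count: i64,
}
```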

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-25 10:02:45 -05:00
teernisse
ed987c8f71 docs: update robot-docs manifest and agent instructions for since-last-check
Updates the `lore robot-docs` manifest with comprehensive documentation
for the new since-last-check inbox feature, enabling AI agents to
discover and use the functionality programmatically.

robot-docs manifest additions:
- since_last_check response schema with cursor_iso, groups, events
- --reset-cursor flag documentation
- Design notes: cursor persistence location, --project filter behavior
- Example commands in personal_dashboard section

Agent instruction updates (AGENTS.md, CLAUDE.md):
- Added --mrs, --project, --user flags to command examples
- Added --reset-cursor example
- Aligned both files for consistency

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-25 10:02:37 -05:00
teernisse
ce5621f3ed feat(me): add "since last check" cursor-based inbox to dashboard
Implements a cursor-based notification inbox that surfaces actionable
events from others since the user's last `lore me` invocation. This
addresses the core UX need: "what happened while I was away?"

Event Sources (three-way UNION query):
1. Others' comments on user's open issues/MRs
2. @mentions on ANY item (not restricted to owned items)
3. Assignment/review-request system notes mentioning user

Mention Detection:
- SQL LIKE pre-filter for performance, then regex validation
- Word-boundary-aware: rejects "alice" in "@alice-bot" or "alice@corp.com"
- Domain rejection: "@alice.com" not matched (prevents email false positives)
- Punctuation tolerance: "@alice," "@alice." "(@ alice)" all match
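
Simplified sketch of the boundary rules (the real validator also tolerates "(@ alice)" and runs only after the SQL LIKE pre-filter):

```rust
/// Accepts "@alice," and "@alice." but rejects "@alice-bot" and "@alice.com".
fn mentions_user(body: &str, username: &str) -> bool {
    let needle = format!("@{username}");
    let mut start = 0;
    while let Some(pos) = body[start..].find(&needle) {
        let at = start + pos;
        let before_ok = at == 0
            || !body[..at].ends_with(|c: char| c.is_alphanumeric() || c == '@');
        let after = &body[at + needle.len()..];
        let after_ok = match after.chars().next() {
            None => true,
            // '-' or '_' extends the username: "@alice-bot" is someone else.
            Some(c) if c.is_alphanumeric() || c == '-' || c == '_' => false,
            // '.' is rejected only when it starts a domain-like tail ("@alice.com").
            Some('.') => !after[1..].starts_with(|c: char| c.is_alphanumeric()),
            Some(_) => true,
        };
        if before_ok && after_ok {
            return true;
        }
        start = at + needle.len();
    }
    false
}
```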

Cursor Watermark Pattern:
- Global watermark computed from ALL projects before --project filtering
- Ensures --project display filter doesn't permanently skip events
- Cursor advances only after successful render (no data loss on errors)
- First run establishes baseline (no inbox shown), subsequent runs show delta

Output:
- Human: color-coded event badges, grouped by entity, actor + timestamp
- Robot: standard envelope with since_last_check object containing
  cursor_iso, total_event_count, and groups array with nested events

CLI additions:
- --reset-cursor flag: clears cursor (next run shows no new events)
- Autocorrect: --reset-cursor added to known me command flags

Tests cover:
- Mention with trailing comma/period/parentheses (should match)
- Email-like text "@alice.com" (should NOT match)  
- Domain-like text "@alice.example" (should NOT match)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-25 10:02:31 -05:00
teernisse
eac640225f feat(core): add cursor persistence module for session-based timestamps
Introduces a lightweight file-based cursor system for persisting
per-user timestamps across CLI invocations. This enables "since last
check" semantics where `lore me` can track what the user has seen.

Key design decisions:
- Per-user cursor files: ~/.local/share/lore/me_cursor_<username>.json
- Atomic writes via temp-file + rename pattern (crash-safe)
- Graceful degradation: missing/corrupt files return None
- Username sanitization: non-safe chars replaced with underscore

The cursor module provides three operations:
- read_cursor(username) -> Option<i64>: read last-check timestamp
- write_cursor(username, timestamp_ms): atomically persist timestamp  
- reset_cursor(username): delete cursor file (no-op if missing)
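
Hedged sketch of the atomic write (directory handling and JSON shape assumed):

```rust
use std::{fs, io::Write, path::Path};

fn write_cursor(dir: &Path, username: &str, timestamp_ms: i64) -> std::io::Result<()> {
    // Sanitize the username so it is safe in a file name.
    let safe: String = username
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect();
    let tmp = dir.join(format!("me_cursor_{safe}.json.tmp"));
    let dest = dir.join(format!("me_cursor_{safe}.json"));
    let mut f = fs::File::create(&tmp)?;
    f.write_all(format!("{{\"last_check_ms\":{timestamp_ms}}}").as_bytes())?;
    f.sync_all()?; // flush before the swap so a crash cannot leave a torn file
    drop(f);
    fs::rename(tmp, dest) // atomic on the same filesystem
}
```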

Tests cover: missing file, roundtrip, per-user isolation, reset
isolation, JSON validity after overwrites, corrupt file handling.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-25 10:02:13 -05:00
42 changed files with 6180 additions and 83 deletions

File diff suppressed because one or more lines are too long


@@ -1 +1 @@
-bd-1tv8
+bd-8con


@@ -626,8 +626,12 @@ lore --robot embed
 # Personal work dashboard
 lore --robot me
 lore --robot me --issues
+lore --robot me --mrs
 lore --robot me --activity --since 7d
+lore --robot me --project group/repo
+lore --robot me --user jdoe
 lore --robot me --fields minimal
+lore --robot me --reset-cursor
 # Agent self-discovery manifest (all commands, flags, exit codes, response schemas)
 lore robot-docs


@@ -645,8 +645,12 @@ lore --robot embed
 # Personal work dashboard
 lore --robot me
 lore --robot me --issues
+lore --robot me --mrs
 lore --robot me --activity --since 7d
+lore --robot me --project group/repo
+lore --robot me --user jdoe
 lore --robot me --fields minimal
+lore --robot me --reset-cursor
 # Agent self-discovery manifest (all commands, flags, exit codes, response schemas)
 lore robot-docs

Cargo.lock generated

@@ -1158,7 +1158,7 @@ checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
 [[package]]
 name = "lore"
-version = "0.9.0"
+version = "0.9.1"
 dependencies = [
  "async-stream",
  "charmed-lipgloss",

@@ -1,6 +1,6 @@
 [package]
 name = "lore"
-version = "0.9.0"
+version = "0.9.1"
 edition = "2024"
 description = "Gitlore - Local GitLab data management with semantic search"
 authors = ["Taylor Eernisse"]

File diff suppressed because it is too large


@@ -0,0 +1,92 @@
# Lore Command Surface Analysis — Overview
**Date:** 2026-02-26
**Version:** v0.9.1 (439c20e)
---
## Purpose
Deep analysis of the full `lore` CLI command surface: what each command does, how commands overlap, how they connect in agent workflows, and where consolidation and robot-mode optimization can reduce round trips and token waste.
## Document Map
| File | Contents | When to Read |
|---|---|---|
| **00-overview.md** | This file. Summary, inventory, priorities. | Always read first. |
| [01-entity-commands.md](01-entity-commands.md) | `issues`, `mrs`, `notes`, `search`, `count` — flags, DB tables, robot schemas | Need command reference for entity queries |
| [02-intelligence-commands.md](02-intelligence-commands.md) | `who`, `timeline`, `me`, `file-history`, `trace`, `related`, `drift` | Need command reference for intelligence/analysis |
| [03-pipeline-and-infra.md](03-pipeline-and-infra.md) | `sync`, `ingest`, `generate-docs`, `embed`, diagnostics, setup | Need command reference for data management |
| [04-data-flow.md](04-data-flow.md) | Shared data source map, command network graph, clusters | Understanding how commands interconnect |
| [05-overlap-analysis.md](05-overlap-analysis.md) | Quantified overlap percentages for every command pair | Evaluating what to consolidate |
| [06-agent-workflows.md](06-agent-workflows.md) | Common agent flows, round-trip costs, token profiles | Understanding inefficiency pain points |
| [07-consolidation-proposals.md](07-consolidation-proposals.md) | 5 proposals to reduce 34 commands to 29 | Planning command surface changes |
| [08-robot-optimization-proposals.md](08-robot-optimization-proposals.md) | 6 proposals for `--include`, `--batch`, `--depth`, etc. | Planning robot-mode improvements |
| [09-appendices.md](09-appendices.md) | Robot output envelope, field presets, exit codes | Reference material |
---
## Command Inventory (34 commands)
| Category | Commands | Count |
|---|---|---|
| Entity Query | `issues`, `mrs`, `notes`, `search`, `count` | 5 |
| Intelligence | `who` (5 modes), `timeline`, `related`, `drift`, `me`, `file-history`, `trace` | 7 (11 with who sub-modes) |
| Data Pipeline | `sync`, `ingest`, `generate-docs`, `embed` | 4 |
| Diagnostics | `health`, `auth`, `doctor`, `status`, `stats` | 5 |
| Setup | `init`, `token`, `cron`, `migrate` | 4 |
| Meta | `version`, `completions`, `robot-docs` | 3 |
---
## Key Findings
### High-Overlap Pairs
| Pair | Overlap | Recommendation |
|---|---|---|
| `who workload` vs `me` | ~85% | Workload is a strict subset of me |
| `health` vs `doctor` | ~90% | Health is a strict subset of doctor |
| `file-history` vs `trace` | ~75% | Trace is a superset minus `--merged` |
| `related` query-mode vs `search --mode semantic` | ~80% | Related query-mode is search without filters |
| `auth` vs `doctor` | ~100% of auth | Auth is fully contained within doctor |
### Agent Workflow Pain Points
| Workflow | Current Round Trips | With Optimizations |
|---|---|---|
| "Understand this issue" | 4 calls | 1 call (`--include`) |
| "Why was code changed?" | 3 calls | 1 call (`--include`) |
| "What should I work on?" | 4 calls | 2 calls |
| "Find and understand" | 4 calls | 2 calls |
| "Is system healthy?" | 2-4 calls | 1 call |
---
## Priority Ranking
| Pri | Proposal | Category | Effort | Impact |
|---|---|---|---|---|
| **P0** | `--include` flag on detail commands | Robot optimization | High | Eliminates 2-3 round trips per workflow |
| **P0** | `--depth` on `me` command | Robot optimization | Low | 60-80% token reduction on most-used command |
| **P1** | `--batch` for detail views | Robot optimization | Medium | Eliminates N+1 after search/timeline |
| **P1** | Absorb `file-history` into `trace` | Consolidation | Low | Cleaner surface, shared code |
| **P1** | Merge `who overlap` into `who expert` | Consolidation | Low | -1 round trip in review flows |
| **P2** | `context` composite command | Robot optimization | Medium | Single entry point for entity understanding |
| **P2** | Merge `count`+`status` into `stats` | Consolidation | Medium | -2 commands, progressive disclosure |
| **P2** | Absorb `auth` into `doctor` | Consolidation | Low | -1 command |
| **P2** | Remove `related` query-mode | Consolidation | Low | -1 confusing choice |
| **P3** | `--max-tokens` budget | Robot optimization | High | Flexible but complex to implement |
| **P3** | `--format tsv` | Robot optimization | Medium | High savings, limited applicability |
### Consolidation Summary
| Before | After | Removed |
|---|---|---|
| `file-history` + `trace` | `trace` (+ `--shallow`) | -1 |
| `auth` + `doctor` | `doctor` (+ `--auth`) | -1 |
| `related` query-mode | `search --mode semantic` | -1 mode |
| `who overlap` + `who expert` | `who expert` (+ touch_count) | -1 sub-mode |
| `count` + `status` + `stats` | `stats` (+ `--entities`, `--sync`) | -2 |
**Total: 34 commands -> 29 commands**


@@ -0,0 +1,308 @@
# Entity Query Commands
Reference for: `issues`, `mrs`, `notes`, `search`, `count`
---
## `issues` (alias: `issue`)
List or show issues from local database.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `[IID]` | positional | — | Omit to list, provide to show detail |
| `-n, --limit` | int | 50 | Max results |
| `--fields` | string | — | Select output columns (preset: `minimal`) |
| `-s, --state` | enum | — | `opened\|closed\|all` |
| `-p, --project` | string | — | Filter by project (fuzzy) |
| `-a, --author` | string | — | Filter by author username |
| `-A, --assignee` | string | — | Filter by assignee username |
| `-l, --label` | string[] | — | Filter by labels (AND logic, repeatable) |
| `-m, --milestone` | string | — | Filter by milestone title |
| `--status` | string[] | — | Filter by work-item status (COLLATE NOCASE, OR logic) |
| `--since` | duration/date | — | Filter by created date (`7d`, `2w`, `YYYY-MM-DD`) |
| `--due-before` | date | — | Filter by due date |
| `--has-due` | flag | — | Show only issues with due dates |
| `--sort` | enum | `updated` | `updated\|created\|iid` |
| `--asc` | flag | — | Sort ascending |
| `-o, --open` | flag | — | Open first match in browser |
**DB tables:** `issues`, `projects`, `issue_assignees`, `issue_labels`, `labels`
**Detail mode adds:** `discussions`, `notes`, `entity_references` (closing MRs)
### Robot Output (list mode)
```json
{
  "ok": true,
  "data": {
    "issues": [
      {
        "iid": 42, "title": "Fix auth", "state": "opened",
        "author_username": "jdoe", "labels": ["backend"],
        "assignees": ["jdoe"], "discussion_count": 3,
        "unresolved_count": 1, "created_at_iso": "...",
        "updated_at_iso": "...", "web_url": "...",
        "project_path": "group/repo",
        "status_name": "In progress"
      }
    ],
    "total_count": 150, "showing": 50
  },
  "meta": { "elapsed_ms": 40, "available_statuses": ["Open", "In progress", "Closed"] }
}
```
### Robot Output (detail mode — `issues <IID>`)
```json
{
  "ok": true,
  "data": {
    "id": 12345, "iid": 42, "title": "Fix auth",
    "description": "Full markdown body...",
    "state": "opened", "author_username": "jdoe",
    "created_at": "...", "updated_at": "...", "closed_at": null,
    "confidential": false, "web_url": "...", "project_path": "group/repo",
    "references_full": "group/repo#42",
    "labels": ["backend"], "assignees": ["jdoe"],
    "due_date": null, "milestone": null,
    "user_notes_count": 5, "merge_requests_count": 1,
    "closing_merge_requests": [
      { "iid": 99, "title": "Refactor auth", "state": "merged", "web_url": "..." }
    ],
    "discussions": [
      {
        "notes": [
          { "author_username": "jdoe", "body": "...", "created_at": "...", "is_system": false }
        ],
        "individual_note": false
      }
    ],
    "status_name": "In progress", "status_color": "#1068bf"
  }
}
```
**Minimal preset:** `iid`, `title`, `state`, `updated_at_iso`
---
## `mrs` (aliases: `mr`, `merge-request`, `merge-requests`)
List or show merge requests.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `[IID]` | positional | — | Omit to list, provide to show detail |
| `-n, --limit` | int | 50 | Max results |
| `--fields` | string | — | Select output columns (preset: `minimal`) |
| `-s, --state` | enum | — | `opened\|merged\|closed\|locked\|all` |
| `-p, --project` | string | — | Filter by project |
| `-a, --author` | string | — | Filter by author |
| `-A, --assignee` | string | — | Filter by assignee |
| `-r, --reviewer` | string | — | Filter by reviewer |
| `-l, --label` | string[] | — | Filter by labels (AND) |
| `--since` | duration/date | — | Filter by created date |
| `-d, --draft` | flag | — | Draft MRs only |
| `-D, --no-draft` | flag | — | Exclude drafts |
| `--target` | string | — | Filter by target branch |
| `--source` | string | — | Filter by source branch |
| `--sort` | enum | `updated` | `updated\|created\|iid` |
| `--asc` | flag | — | Sort ascending |
| `-o, --open` | flag | — | Open in browser |
**DB tables:** `merge_requests`, `projects`, `mr_reviewers`, `mr_labels`, `labels`, `mr_assignees`
**Detail mode adds:** `discussions`, `notes`, `mr_diffs`
### Robot Output (list mode)
```json
{
  "ok": true,
  "data": {
    "mrs": [
      {
        "iid": 99, "title": "Refactor auth", "state": "merged",
        "draft": false, "author_username": "jdoe",
        "source_branch": "feat/auth", "target_branch": "main",
        "labels": ["backend"], "assignees": ["jdoe"], "reviewers": ["reviewer"],
        "discussion_count": 5, "unresolved_count": 0,
        "created_at_iso": "...", "updated_at_iso": "...",
        "web_url": "...", "project_path": "group/repo"
      }
    ],
    "total_count": 500, "showing": 50
  }
}
```
### Robot Output (detail mode — `mrs <IID>`)
```json
{
  "ok": true,
  "data": {
    "id": 67890, "iid": 99, "title": "Refactor auth",
    "description": "Full markdown body...",
    "state": "merged", "draft": false, "author_username": "jdoe",
    "source_branch": "feat/auth", "target_branch": "main",
    "created_at": "...", "updated_at": "...",
    "merged_at": "...", "closed_at": null,
    "web_url": "...", "project_path": "group/repo",
    "labels": ["backend"], "assignees": ["jdoe"], "reviewers": ["reviewer"],
    "discussions": [
      {
        "notes": [
          {
            "author_username": "reviewer", "body": "...",
            "created_at": "...", "is_system": false,
            "position": { "new_path": "src/auth.rs", "new_line": 42 }
          }
        ],
        "individual_note": false
      }
    ]
  }
}
```
**Minimal preset:** `iid`, `title`, `state`, `updated_at_iso`
---
## `notes` (alias: `note`)
List discussion notes/comments with fine-grained filters.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `-n, --limit` | int | 50 | Max results |
| `--fields` | string | — | Preset: `minimal` |
| `-a, --author` | string | — | Filter by author |
| `--note-type` | enum | — | `DiffNote\|DiscussionNote` |
| `--contains` | string | — | Body text substring filter |
| `--note-id` | int | — | Internal note ID |
| `--gitlab-note-id` | int | — | GitLab note ID |
| `--discussion-id` | string | — | Discussion ID filter |
| `--include-system` | flag | — | Include system notes |
| `--for-issue` | int | — | Notes on specific issue (requires `-p`) |
| `--for-mr` | int | — | Notes on specific MR (requires `-p`) |
| `-p, --project` | string | — | Scope to project |
| `--since` | duration/date | — | Created after |
| `--until` | date | — | Created before (inclusive) |
| `--path` | string | — | File path filter (exact or prefix with `/`) |
| `--resolution` | enum | — | `any\|unresolved\|resolved` |
| `--sort` | enum | `created` | `created\|updated` |
| `--asc` | flag | — | Sort ascending |
| `--open` | flag | — | Open in browser |
**DB tables:** `notes`, `discussions`, `projects`, `issues`, `merge_requests`
### Robot Output
```json
{
  "ok": true,
  "data": {
    "notes": [
      {
        "id": 1234, "gitlab_id": 56789,
        "author_username": "reviewer", "body": "...",
        "note_type": "DiffNote", "is_system": false,
        "created_at_iso": "...", "updated_at_iso": "...",
        "position_new_path": "src/auth.rs", "position_new_line": 42,
        "resolvable": true, "resolved": false,
        "noteable_type": "MergeRequest", "parent_iid": 99,
        "parent_title": "Refactor auth", "project_path": "group/repo"
      }
    ],
    "total_count": 1000, "showing": 50
  }
}
```
**Minimal preset:** `id`, `author_username`, `body`, `created_at_iso`
---
## `search` (aliases: `find`, `query`)
Semantic + full-text search across indexed documents.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `<QUERY>` | positional | required | Search query string |
| `--mode` | enum | `hybrid` | `lexical\|hybrid\|semantic` |
| `--type` | enum | — | `issue\|mr\|discussion\|note` |
| `--author` | string | — | Filter by author |
| `-p, --project` | string | — | Scope to project |
| `--label` | string[] | — | Filter by labels (AND) |
| `--path` | string | — | File path filter |
| `--since` | duration/date | — | Created after |
| `--updated-since` | duration/date | — | Updated after |
| `-n, --limit` | int | 20 | Max results (max: 100) |
| `--fields` | string | — | Preset: `minimal` |
| `--explain` | flag | — | Show ranking breakdown |
| `--fts-mode` | enum | `safe` | `safe\|raw` |
**DB tables:** `documents`, `documents_fts` (FTS5), `embeddings` (vec0), `document_labels`, `document_paths`, `projects`
**Search modes:**
- **lexical** — FTS5 with BM25 ranking (fastest, no Ollama needed)
- **hybrid** — RRF combination of lexical + semantic (default)
- **semantic** — Vector similarity only (requires Ollama)
### Robot Output
```json
{
  "ok": true,
  "data": {
    "query": "authentication bug",
    "mode": "hybrid",
    "total_results": 15,
    "results": [
      {
        "document_id": 1234, "source_type": "issue",
        "title": "Fix SSO auth", "url": "...",
        "author": "jdoe", "project_path": "group/repo",
        "labels": ["auth"], "paths": ["src/auth/"],
        "snippet": "...matching text...",
        "score": 0.85,
        "explain": { "vector_rank": 2, "fts_rank": 1, "rrf_score": 0.85 }
      }
    ],
    "warnings": []
  }
}
```
**Minimal preset:** `document_id`, `title`, `source_type`, `score`
---
## `count`
Count entities in local database.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `<ENTITY>` | positional | required | `issues\|mrs\|discussions\|notes\|events\|references` |
| `-f, --for` | enum | — | Parent type: `issue\|mr` |
**DB tables:** Conditional aggregation on entity tables
### Robot Output
```json
{
  "ok": true,
  "data": {
    "entity": "merge_requests",
    "count": 1234,
    "system_excluded": 5000,
    "breakdown": { "opened": 100, "closed": 50, "merged": 1084 }
  }
}
```


@@ -0,0 +1,452 @@
# Intelligence Commands
Reference for: `who`, `timeline`, `me`, `file-history`, `trace`, `related`, `drift`
---
## `who` (People Intelligence)
Five sub-modes, dispatched by argument shape.
| Mode | Trigger | Purpose |
|---|---|---|
| **expert** | `who <path>` or `who --path <path>` | Who knows about a code area? |
| **workload** | `who @username` | What is this person working on? |
| **reviews** | `who @username --reviews` | Review pattern analysis |
| **active** | `who --active` | Unresolved discussions needing attention |
| **overlap** | `who --overlap <path>` | Who else touches these files? |
### Shared Flags
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `-p, --project` | string | — | Scope to project |
| `-n, --limit` | int | varies | Max results (1-500) |
| `--fields` | string | — | Preset: `minimal` |
| `--since` | duration/date | — | Time window |
| `--include-bots` | flag | — | Include bot users |
| `--include-closed` | flag | — | Include closed issues/MRs |
| `--all-history` | flag | — | Query all history |
### Expert-Only Flags
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `--detail` | flag | — | Per-MR breakdown |
| `--as-of` | date/duration | — | Score at point in time |
| `--explain-score` | flag | — | Score breakdown |
### DB Tables by Mode
| Mode | Primary Tables |
|---|---|
| expert | `notes` (INDEXED BY idx_notes_diffnote_path_created), `merge_requests`, `mr_reviewers` |
| workload | `issues`, `merge_requests`, `mr_reviewers` |
| reviews | `merge_requests`, `discussions`, `notes` |
| active | `discussions`, `notes`, `issues`, `merge_requests` |
| overlap | `notes`, `mr_file_changes`, `merge_requests` |
### Robot Output (expert)
```json
{
  "ok": true,
  "data": {
    "mode": "expert",
    "input": { "target": "src/auth/", "path": "src/auth/" },
    "resolved_input": { "mode": "expert", "project_id": 1, "project_path": "group/repo" },
    "result": {
      "experts": [
        {
          "username": "jdoe", "score": 42.5,
          "detail": { "mr_ids_author": [99, 101], "mr_ids_reviewer": [88] }
        }
      ]
    }
  }
}
```
### Robot Output (workload)
```json
{
  "data": {
    "mode": "workload",
    "result": {
      "assigned_issues": [{ "iid": 42, "title": "Fix auth", "state": "opened" }],
      "authored_mrs": [{ "iid": 99, "title": "Refactor auth", "state": "merged" }],
      "review_mrs": [{ "iid": 88, "title": "Add SSO", "state": "opened" }]
    }
  }
}
```
### Robot Output (reviews)
```json
{
  "data": {
    "mode": "reviews",
    "result": {
      "categories": [
        {
          "category": "approval_rate",
          "reviewers": [{ "name": "jdoe", "count": 15, "percentage": 85.0 }]
        }
      ]
    }
  }
}
```
### Robot Output (active)
```json
{
  "data": {
    "mode": "active",
    "result": {
      "discussions": [
        { "entity_type": "mr", "iid": 99, "title": "Refactor auth", "participants": ["jdoe", "reviewer"] }
      ]
    }
  }
}
```
### Robot Output (overlap)
```json
{
  "data": {
    "mode": "overlap",
    "result": {
      "users": [{ "username": "jdoe", "touch_count": 15 }]
    }
  }
}
```
### Minimal Presets
| Mode | Fields |
|---|---|
| expert | `username`, `score` |
| workload | `iid`, `title`, `state` |
| reviews | `name`, `count`, `percentage` |
| active | `entity_type`, `iid`, `title`, `participants` |
| overlap | `username`, `touch_count` |
---
## `timeline`
Reconstruct chronological event history for a topic/entity with cross-reference expansion.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `<QUERY>` | positional | required | Search text or entity ref (`issue:42`, `mr:99`) |
| `-p, --project` | string | — | Scope to project |
| `--since` | duration/date | — | Filter events after |
| `--depth` | int | 1 | Cross-ref expansion depth (0=none) |
| `--no-mentions` | flag | — | Skip "mentioned" edges, keep "closes"/"related" |
| `-n, --limit` | int | 100 | Max events |
| `--fields` | string | — | Preset: `minimal` |
| `--max-seeds` | int | 10 | Max seed entities from search |
| `--max-entities` | int | 50 | Max expanded entities |
| `--max-evidence` | int | 10 | Max evidence notes |
**Pipeline:** SEED -> HYDRATE -> EXPAND -> COLLECT -> RENDER
**DB tables:** `issues`, `merge_requests`, `discussions`, `notes`, `entity_references`, `resource_state_events`, `resource_label_events`, `resource_milestone_events`, `documents` (for search seeding)
### Robot Output
```json
{
  "ok": true,
  "data": {
    "query": "authentication", "event_count": 25,
    "seed_entities": [{ "type": "issue", "iid": 42, "project": "group/repo" }],
    "expanded_entities": [
      {
        "type": "mr", "iid": 99, "project": "group/repo", "depth": 1,
        "via": {
          "from": { "type": "issue", "iid": 42 },
          "reference_type": "closes"
        }
      }
    ],
    "unresolved_references": [
      {
        "source": { "type": "issue", "iid": 42, "project": "group/repo" },
        "target_type": "mr", "target_iid": 200, "reference_type": "mentioned"
      }
    ],
    "events": [
      {
        "timestamp": "2026-01-15T10:30:00Z",
        "entity_type": "issue", "entity_iid": 42, "project": "group/repo",
        "event_type": "state_changed", "summary": "Reopened",
        "actor": "jdoe", "is_seed": true,
        "evidence_notes": [{ "author": "jdoe", "snippet": "..." }]
      }
    ]
  },
  "meta": {
    "elapsed_ms": 150, "search_mode": "fts",
    "expansion_depth": 1, "include_mentions": true,
    "total_entities": 5, "total_events": 25,
    "evidence_notes_included": 8, "discussion_threads_included": 3,
    "unresolved_references": 1, "showing": 25
  }
}
```
**Minimal preset:** `timestamp`, `type`, `entity_iid`, `detail`
---
## `me` (Personal Dashboard)
Personal work dashboard with issues, MRs, activity, and since-last-check inbox.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `--issues` | flag | — | Open issues section only |
| `--mrs` | flag | — | MRs section only |
| `--activity` | flag | — | Activity feed only |
| `--since` | duration/date | `30d` | Activity window |
| `-p, --project` | string | — | Scope to one project |
| `--all` | flag | — | All synced projects |
| `--user` | string | — | Override configured username |
| `--fields` | string | — | Preset: `minimal` |
| `--reset-cursor` | flag | — | Clear since-last-check cursor |
**Sections (no flags = all):** Issues, MRs authored, MRs reviewing, Activity, Inbox
**DB tables:** `issues`, `merge_requests`, `resource_state_events`, `projects`, `issue_labels`, `mr_labels`
### Robot Output
```json
{
  "ok": true,
  "data": {
    "username": "jdoe",
    "summary": {
      "project_count": 3, "open_issue_count": 5,
      "authored_mr_count": 2, "reviewing_mr_count": 1,
      "needs_attention_count": 3
    },
    "since_last_check": {
      "cursor_iso": "2026-02-25T18:00:00Z",
      "total_event_count": 8,
      "groups": [
        {
          "entity_type": "issue", "entity_iid": 42,
          "entity_title": "Fix auth", "project": "group/repo",
          "events": [
            { "timestamp_iso": "...", "event_type": "comment",
              "actor": "reviewer", "summary": "New comment" }
          ]
        }
      ]
    },
    "open_issues": [
      {
        "project": "group/repo", "iid": 42, "title": "Fix auth",
        "state": "opened", "attention_state": "needs_attention",
        "status_name": "In progress", "labels": ["auth"],
        "updated_at_iso": "..."
      }
    ],
    "open_mrs_authored": [
      {
        "project": "group/repo", "iid": 99, "title": "Refactor auth",
        "state": "opened", "attention_state": "needs_attention",
        "draft": false, "labels": ["backend"], "updated_at_iso": "..."
      }
    ],
    "reviewing_mrs": [],
    "activity": [
      {
        "timestamp_iso": "...", "event_type": "state_changed",
        "entity_type": "issue", "entity_iid": 42, "project": "group/repo",
        "actor": "jdoe", "is_own": true, "summary": "Closed"
      }
    ]
  }
}
```
**Minimal presets:** Items: `iid, title, attention_state, updated_at_iso` | Activity: `timestamp_iso, event_type, entity_iid, actor`
---
## `file-history`
Show which MRs touched a file, with linked discussions.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `<PATH>` | positional | required | File path to trace |
| `-p, --project` | string | — | Scope to project |
| `--discussions` | flag | — | Include DiffNote snippets |
| `--no-follow-renames` | flag | — | Skip rename chain resolution |
| `--merged` | flag | — | Only merged MRs |
| `-n, --limit` | int | 50 | Max MRs |
**DB tables:** `mr_file_changes`, `merge_requests`, `notes` (DiffNotes), `projects`
### Robot Output
```json
{
  "ok": true,
  "data": {
    "path": "src/auth/middleware.rs",
    "rename_chain": [
      { "previous_path": "src/auth.rs", "mr_iid": 55, "merged_at": "..." }
    ],
    "merge_requests": [
      {
        "iid": 99, "title": "Refactor auth", "state": "merged",
        "author": "jdoe", "merged_at": "...", "change_type": "modified"
      }
    ],
    "discussions": [
      {
        "discussion_id": 123, "mr_iid": 99, "author": "reviewer",
        "body_snippet": "...", "path": "src/auth/middleware.rs"
      }
    ]
  },
  "meta": { "elapsed_ms": 30, "total_mrs": 5, "renames_followed": true }
}
```
---
## `trace`
File -> MR -> issue -> discussion chain to understand why code was introduced.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `<PATH>` | positional | required | File path (future: `:line` suffix) |
| `-p, --project` | string | — | Scope to project |
| `--discussions` | flag | — | Include DiffNote snippets |
| `--no-follow-renames` | flag | — | Skip rename chain |
| `-n, --limit` | int | 20 | Max chains |
**DB tables:** `mr_file_changes`, `merge_requests`, `issues`, `discussions`, `notes`, `entity_references`
### Robot Output
```json
{
  "ok": true,
  "data": {
    "path": "src/auth/middleware.rs",
    "resolved_paths": ["src/auth/middleware.rs", "src/auth.rs"],
    "trace_chains": [
      {
        "mr_iid": 99, "mr_title": "Refactor auth", "mr_state": "merged",
        "mr_author": "jdoe", "change_type": "modified",
        "merged_at_iso": "...", "web_url": "...",
        "issues": [42],
        "discussions": [
          {
            "discussion_id": 123, "author_username": "reviewer",
            "body_snippet": "...", "path": "src/auth/middleware.rs"
          }
        ]
      }
    ]
  },
  "meta": { "tier": "api_only", "total_chains": 3, "renames_followed": 1 }
}
```
---
## `related`
Find semantically related entities via vector search.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `<QUERY_OR_TYPE>` | positional | required | Entity type (`issues`, `mrs`) or free text |
| `[IID]` | positional | — | Entity IID (required with entity type) |
| `-n, --limit` | int | 10 | Max results |
| `-p, --project` | string | — | Scope to project |
**Two modes:**
- **Entity mode:** `related issues 42` — find entities similar to issue #42
- **Query mode:** `related "auth flow"` — find entities matching free text
**DB tables:** `documents`, `embeddings` (vec0), `projects`
**Requires:** Ollama running (for query mode embedding)
### Robot Output (entity mode)
```json
{
  "ok": true,
  "data": {
    "query_entity_type": "issue",
    "query_entity_iid": 42,
    "query_entity_title": "Fix SSO authentication",
    "similar_entities": [
      {
        "entity_type": "mr", "entity_iid": 99,
        "entity_title": "Refactor auth module",
        "project_path": "group/repo", "state": "merged",
        "similarity_score": 0.87,
        "shared_labels": ["auth"], "shared_authors": ["jdoe"]
      }
    ]
  }
}
```
---
## `drift`
Detect discussion divergence from original intent.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `<ENTITY_TYPE>` | positional | required | Currently only `issues` |
| `<IID>` | positional | required | Entity IID |
| `--threshold` | f32 | 0.4 | Similarity threshold (0.0-1.0) |
| `-p, --project` | string | — | Scope to project |
**DB tables:** `issues`, `discussions`, `notes`, `embeddings`
**Requires:** Ollama running
### Robot Output
```json
{
  "ok": true,
  "data": {
    "entity_type": "issue", "entity_iid": 42,
    "total_notes": 15,
    "detected_drift": true,
    "drift_point": {
      "note_index": 8, "similarity": 0.32,
      "author": "someone", "created_at": "..."
    },
    "similarity_curve": [
      { "note_index": 0, "similarity": 0.95, "author": "jdoe", "created_at": "..." },
      { "note_index": 1, "similarity": 0.88, "author": "reviewer", "created_at": "..." }
    ]
  }
}
```


@@ -0,0 +1,210 @@
# Pipeline & Infrastructure Commands
Reference for: `sync`, `ingest`, `generate-docs`, `embed`, `health`, `auth`, `doctor`, `status`, `stats`, `init`, `token`, `cron`, `migrate`, `version`, `completions`, `robot-docs`
---
## Data Pipeline
### `sync` (Full Pipeline)
Complete sync: ingest -> generate-docs -> embed.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `--full` | flag | — | Full re-sync (reset cursors) |
| `-f, --force` | flag | — | Override stale lock |
| `--no-embed` | flag | — | Skip embedding |
| `--no-docs` | flag | — | Skip doc generation |
| `--no-events` | flag | — | Skip resource events |
| `--no-file-changes` | flag | — | Skip MR file changes |
| `--no-status` | flag | — | Skip work-item status enrichment |
| `--dry-run` | flag | — | Preview without changes |
| `-t, --timings` | flag | — | Show timing breakdown |
| `--lock` | flag | — | Acquire file lock |
| `--issue` | int[] | — | Surgically sync specific issues (repeatable) |
| `--mr` | int[] | — | Surgically sync specific MRs (repeatable) |
| `-p, --project` | string | — | Required with `--issue`/`--mr` |
| `--preflight-only` | flag | — | Validate without DB writes |
**Stages:** GitLab REST ingest -> GraphQL status enrichment -> Document generation -> Ollama embedding
**Surgical sync:** `lore sync --issue 42 --mr 99 -p group/repo` fetches only specific entities.
### `ingest`
Fetch data from GitLab API only (no docs, no embeddings).
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `[ENTITY]` | positional | — | `issues` or `mrs` (omit for all) |
| `-p, --project` | string | — | Single project |
| `-f, --force` | flag | — | Override stale lock |
| `--full` | flag | — | Full re-sync |
| `--dry-run` | flag | — | Preview |
**Fetches from GitLab:**
- Issues + discussions + notes
- MRs + discussions + notes
- Resource events (state, label, milestone)
- MR file changes (for DiffNote tracking)
- Work-item statuses (via GraphQL)
### `generate-docs`
Create searchable documents from ingested data.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `--full` | flag | — | Full rebuild |
| `-p, --project` | string | — | Single project rebuild |
**Writes:** `documents`, `document_labels`, `document_paths`
### `embed`
Generate vector embeddings via Ollama.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `--full` | flag | — | Re-embed all |
| `--retry-failed` | flag | — | Retry failed embeddings |
**Requires:** Ollama running with `nomic-embed-text`
**Writes:** `embeddings`, `embedding_metadata`
---
## Diagnostics
### `health`
Quick pre-flight check (~50ms). Exit 0 = healthy, exit 19 = unhealthy.
**Checks:** config found, DB found, schema version current.
```json
{
  "ok": true,
  "data": {
    "healthy": true,
    "config_found": true, "db_found": true,
    "schema_current": true, "schema_version": 28
  }
}
```
### `auth`
Verify GitLab authentication.
**Checks:** token set, GitLab reachable, user identity.
### `doctor`
Comprehensive environment check.
**Checks:** config validity, token, GitLab connectivity, DB health, migration status, Ollama availability + model status.
```json
{
  "ok": true,
  "data": {
    "config": { "valid": true, "path": "~/.config/lore/config.json" },
    "token": { "set": true, "gitlab": { "reachable": true, "user": "jdoe" } },
    "database": { "exists": true, "version": 28, "tables": 25 },
    "ollama": { "available": true, "model_ready": true }
  }
}
```
### `status` (alias: `st`)
Show sync state per project.
```json
{
  "ok": true,
  "data": {
    "projects": [
      {
        "project_path": "group/repo",
        "last_synced_at": "2026-02-26T10:00:00Z",
        "document_count": 5000, "discussion_count": 2000, "notes_count": 15000
      }
    ]
  }
}
```
### `stats` (alias: `stat`)
Document and index statistics with optional integrity checks.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `--check` | flag | — | Run integrity checks |
| `--repair` | flag | — | Fix issues (implies `--check`) |
| `--dry-run` | flag | — | Preview repairs |
```json
{
  "ok": true,
  "data": {
    "documents": { "total": 61652, "issues": 5000, "mrs": 2000, "notes": 50000 },
    "embeddings": { "total": 80000, "synced": 79500, "pending": 500, "failed": 0 },
    "fts": { "total_docs": 61652 },
    "queues": { "pending": 0, "in_progress": 0, "failed": 0, "max_attempts": 0 },
    "integrity": {
      "ok": true, "fts_doc_mismatch": 0, "orphan_embeddings": 0,
      "stale_metadata": 0, "orphan_state_events": 0
    }
  }
}
```
---
## Setup
### `init`
Initialize configuration and database.
| Flag | Type | Default | Purpose |
|---|---|---|---|
| `-f, --force` | flag | — | Skip overwrite confirmation |
| `--non-interactive` | flag | — | Fail if prompts needed |
| `--gitlab-url` | string | — | GitLab base URL (required in robot mode) |
| `--token-env-var` | string | — | Env var holding token (required in robot mode) |
| `--projects` | string | — | Comma-separated project paths (required in robot mode) |
| `--default-project` | string | — | Default project path |
### `token`
| Subcommand | Flags | Purpose |
|---|---|---|
| `token set` | `--token <TOKEN>` | Store token (reads stdin if omitted) |
| `token show` | `--unmask` | Display token (masked by default) |
### `cron`
| Subcommand | Flags | Purpose |
|---|---|---|
| `cron install` | `--interval <MINUTES>` (default: 8) | Schedule auto-sync |
| `cron uninstall` | — | Remove cron job |
| `cron status` | — | Check installation |
### `migrate`
Run pending database migrations. No flags.
---
## Meta
| Command | Purpose |
|---|---|
| `version` | Show version string |
| `completions <shell>` | Generate shell completions (bash/zsh/fish/powershell) |
| `robot-docs` | Machine-readable command manifest (`--brief` for ~60% smaller) |


@@ -0,0 +1,179 @@
# Data Flow & Command Network
How commands interconnect through shared data sources and output-to-input dependencies.
---
## 1. Command Network Graph
Arrows mean "output of A feeds as input to B":
```
search ──iid──► issues / mrs ──iid──► related, drift
search ──topic──► timeline ──referenced iids──► issues / mrs (detail)
file-history ──entity refs──► timeline
file-history ──MR iids──► mrs
trace ──linked issue iids──► issues
trace ──file paths──► who (expert)
issues / mrs ──file paths──► notes
me ──dashboard iids──► issues, mrs
who workload ◄──(~same data)──► me
```
### Feed Chains (output of A -> input of B)
| From | To | What Flows |
|---|---|---|
| `search` | `issues`, `mrs` | IIDs from search results -> detail lookup |
| `search` | `timeline` | Topic/query -> chronological history |
| `search` | `related` | Entity IID -> semantic similarity |
| `me` | `issues`, `mrs` | IIDs from dashboard -> detail lookup |
| `trace` | `issues` | Linked issue IIDs -> detail lookup |
| `trace` | `who` | File paths -> expert lookup |
| `file-history` | `mrs` | MR IIDs -> detail lookup |
| `file-history` | `timeline` | Entity refs -> chronological events |
| `timeline` | `issues`, `mrs` | Referenced IIDs -> detail lookup |
| `who expert` | `who reviews` | Username -> review patterns |
| `who expert` | `mrs` | MR IIDs from expert detail -> MR detail |
---
## 2. Shared Data Source Map
Which DB tables power which commands. Higher overlap = stronger consolidation signal.
### Primary Entity Tables
| Table | Read By |
|---|---|
| `issues` | issues, me, who-workload, search, timeline, trace, count, stats |
| `merge_requests` | mrs, me, who-workload, search, timeline, trace, file-history, count, stats |
| `notes` | notes, issues-detail, mrs-detail, who-expert, who-active, search, timeline, trace, file-history |
| `discussions` | notes, issues-detail, mrs-detail, who-active, who-reviews, timeline, trace |
### Relationship Tables
| Table | Read By |
|---|---|
| `entity_references` | trace, timeline |
| `mr_file_changes` | trace, file-history, who-overlap |
| `issue_labels` | issues, me |
| `mr_labels` | mrs, me |
| `issue_assignees` | issues, me |
| `mr_reviewers` | mrs, who-expert, who-workload |
### Event Tables
| Table | Read By |
|---|---|
| `resource_state_events` | timeline, me-activity |
| `resource_label_events` | timeline |
| `resource_milestone_events` | timeline |
### Document/Search Tables
| Table | Read By |
|---|---|
| `documents` + `documents_fts` | search, stats |
| `embeddings` | search, related, drift |
| `document_labels` | search |
| `document_paths` | search |
### Infrastructure Tables
| Table | Read By |
|---|---|
| `sync_cursors` | status |
| `dirty_sources` | stats |
| `embedding_metadata` | stats, embed |
---
## 3. Shared-Data Clusters
Commands that read from the same primary tables form natural clusters:
### Cluster A: Issue/MR Entities
`issues`, `mrs`, `me`, `who workload`, `count`
All read `issues` + `merge_requests` with similar filter patterns (state, author, labels, project). These commands share the same underlying WHERE-clause builder logic.
### Cluster B: Notes/Discussions
`notes`, `issues detail`, `mrs detail`, `who expert`, `who active`, `timeline`
All traverse the `discussions` -> `notes` join path. The `notes` command does it with independent filters; the others embed notes within parent context.
### Cluster C: File Genealogy
`trace`, `file-history`, `who overlap`
All use `mr_file_changes` with rename chain BFS (forward: old_path -> new_path, backward: new_path -> old_path). Shared `resolve_rename_chain()` function.
### Cluster D: Semantic/Vector
`search`, `related`, `drift`
All use `documents` + `embeddings` via Ollama. `search` adds FTS component; `related` is pure vector; `drift` uses vector for divergence scoring.
### Cluster E: Diagnostics
`health`, `auth`, `doctor`, `status`, `stats`
All check system state. `health` < `doctor` (strict subset). `status` checks sync cursors. `stats` checks document/index health. `auth` checks token/connectivity.
---
## 4. Query Pattern Sharing
### Dynamic Filter Builder (used by issues, mrs, notes)
All three list commands use the same pattern: build a WHERE clause dynamically from filter flags with parameterized tokens. Labels use EXISTS subquery against junction table.
### Rename Chain BFS (used by trace, file-history, who overlap)
Forward query:
```sql
SELECT DISTINCT new_path FROM mr_file_changes
WHERE project_id = ?1 AND old_path = ?2 AND change_type = 'renamed'
```
Backward query:
```sql
SELECT DISTINCT old_path FROM mr_file_changes
WHERE project_id = ?1 AND new_path = ?2 AND change_type = 'renamed'
```
Cycle detection via `HashSet` of visited paths, `MAX_RENAME_HOPS = 10`.
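Illustrative sketch of that BFS (the shared `resolve_rename_chain()` signature may differ), with the two SQL queries above abstracted behind a closure:
```rust
use std::collections::{HashSet, VecDeque};

const MAX_RENAME_HOPS: usize = 10;

fn resolve_rename_chain(
    start: &str,
    renames_from: impl Fn(&str) -> Vec<String>, // forward or backward query
) -> Vec<String> {
    let mut visited: HashSet<String> = HashSet::new(); // cycle detection
    let mut queue = VecDeque::from([(start.to_string(), 0usize)]);
    let mut chain = Vec::new();
    while let Some((path, hops)) = queue.pop_front() {
        if hops > MAX_RENAME_HOPS || !visited.insert(path.clone()) {
            continue;
        }
        chain.push(path.clone());
        for next in renames_from(&path) {
            queue.push_back((next, hops + 1));
        }
    }
    chain
}
```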
### Hybrid Search (used by search, timeline seeding)
RRF ranking: `score = (60 / fts_rank) + (60 / vector_rank)`
FTS5 queries go through `to_fts_query()` which sanitizes input and builds MATCH expressions. Vector search calls Ollama to embed the query, then does cosine similarity against `embeddings` vec0 table.
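A sketch of the fusion step as stated (note this variant divides by rank directly, rather than the classic RRF `1/(k + rank)`):
```rust
use std::collections::HashMap;

/// Combine 1-based FTS and vector rankings; docs absent from a list contribute 0.
fn rrf_scores(fts_ranked: &[i64], vector_ranked: &[i64]) -> HashMap<i64, f64> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for ranked in [fts_ranked, vector_ranked] {
        for (i, doc_id) in ranked.iter().enumerate() {
            *scores.entry(*doc_id).or_insert(0.0) += 60.0 / (i as f64 + 1.0);
        }
    }
    scores
}
```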
### Project Resolution (used by most commands)
`resolve_project(conn, project_filter)` does fuzzy matching on `path_with_namespace` — suffix and substring matching. Returns `(project_id, path_with_namespace)`.


@@ -0,0 +1,170 @@
# Overlap Analysis
Quantified functional duplication between commands.
---
## 1. High Overlap (>70%)
### `who workload` vs `me` — 85% overlap
| Dimension | `who @user` (workload) | `me --user @user` |
|---|---|---|
| Assigned issues | Yes | Yes |
| Authored MRs | Yes | Yes |
| Reviewing MRs | Yes | Yes |
| Attention state | No | **Yes** |
| Activity feed | No | **Yes** |
| Since-last-check inbox | No | **Yes** |
| Cross-project | Yes | **Yes** |
**Verdict:** `who workload` is a strict subset of `me`. The only reason to use `who workload` is if you DON'T want attention_state/activity/inbox — but `me --issues --mrs --fields minimal` achieves the same thing.
### `health` vs `doctor` — 90% overlap
| Check | `health` | `doctor` |
|---|---|---|
| Config found | Yes | Yes |
| DB exists | Yes | Yes |
| Schema current | Yes | Yes |
| Token valid | No | **Yes** |
| GitLab reachable | No | **Yes** |
| Ollama available | No | **Yes** |
**Verdict:** `health` is a strict subset of `doctor`. However, `health` has unique value as a ~50ms pre-flight with clean exit 0/19 semantics for scripting.
### `file-history` vs `trace` — 75% overlap
| Feature | `file-history` | `trace` |
|---|---|---|
| Find MRs for file | Yes | Yes |
| Rename chain BFS | Yes | Yes |
| DiffNote discussions | `--discussions` | `--discussions` |
| Follow to linked issues | No | **Yes** |
| `--merged` filter | **Yes** | No |
**Verdict:** `trace` is a superset of `file-history` minus the `--merged` filter. Both use the same `resolve_rename_chain()` function and query `mr_file_changes`.
### `related` query-mode vs `search --mode semantic` — 80% overlap
| Feature | `related "text"` | `search "text" --mode semantic` |
|---|---|---|
| Vector similarity | Yes | Yes |
| FTS component | No | No (semantic mode skips FTS) |
| Filters (labels, author, since) | No | **Yes** |
| Explain ranking | No | **Yes** |
| Field selection | No | **Yes** |
| Requires Ollama | Yes | Yes |
**Verdict:** `related "text"` is `search --mode semantic` without any filter capabilities. The entity-seeded mode (`related issues 42`) is NOT duplicated — it seeds from an existing entity's embedding.
---
## 2. Medium Overlap (40-70%)
### `who expert` vs `who overlap` — 50%
Both answer "who works on this file" but with different scoring:
| Aspect | `who expert` | `who overlap` |
|---|---|---|
| Scoring | Half-life decay, signal types (diffnote_author, reviewer, etc.) | Raw touch count |
| Output | Ranked experts with scores | Users with touch counts |
| Use case | "Who should review this?" | "Who else touches this?" |
**Verdict:** Overlap is a simplified version of expert. Expert could include touch_count as a field.
### `timeline` vs `trace` — 45%
Both follow `entity_references` to discover connected entities, but from different entry points:
| Aspect | `timeline` | `trace` |
|---|---|---|
| Entry point | Entity (issue/MR) or search query | File path |
| Direction | Entity -> cross-refs -> events | File -> MRs -> issues -> discussions |
| Output | Chronological events | Causal chains (why code changed) |
| Expansion | Depth-controlled cross-ref following | MR -> issue via entity_references |
**Verdict:** Complementary, not duplicative. Different questions, shared plumbing.
### `auth` vs `doctor` — 100% of auth
`auth` checks: token set + GitLab reachable + user identity.
`doctor` checks: all of the above + DB + schema + Ollama.
**Verdict:** `auth` is completely contained within `doctor`.
### `count` vs `stats` — 40%
Both answer "how much data?":
| Aspect | `count` | `stats` |
|---|---|---|
| Layer | Entity (issues, MRs, notes) | Document index |
| State breakdown | Yes (opened/closed/merged) | No |
| Integrity checks | No | Yes |
| Queue status | No | Yes |
**Verdict:** Different layers. Could be unified under `stats --entities`.
### `notes` vs `issues/mrs detail` — 50%
Both return note content:
| Aspect | `notes` command | Detail view discussions |
|---|---|---|
| Independent filtering | **Yes** (author, path, resolution, contains, type) | No |
| Parent context | Minimal (parent_iid, parent_title) | **Full** (complete entity + all discussions) |
| Cross-entity queries | **Yes** (all notes matching criteria) | No (one entity only) |
**Verdict:** `notes` is for filtered queries across entities. Detail views are for complete context on one entity. Different use cases.
---
## 3. No Significant Overlap
| Command | Why It's Unique |
|---|---|
| `drift` | Only command doing semantic divergence detection |
| `timeline` | Only command doing multi-entity chronological reconstruction with expansion |
| `search` (hybrid) | Only command combining FTS + vector with RRF ranking |
| `me` (inbox) | Only command with cursor-based since-last-check tracking |
| `who expert` | Only command with half-life decay scoring by signal type |
| `who reviews` | Only command analyzing review patterns (approval rate, latency) |
| `who active` | Only command surfacing unresolved discussions needing attention |
---
## 4. Overlap Adjacency Matrix
Rows and columns are commands; values are estimated functional overlap percentages.
```
                import: matrix realigned below
```
**Highest overlap pairs (>= 75%):**
1. `health` / `doctor` — 90%
2. `who workload` / `me` — 85%
3. `related` query-mode / `search semantic` — 80%
4. `file-history` / `trace` — 75%

View File

@@ -0,0 +1,216 @@
# Agent Workflow Analysis
Common agent workflows, round-trip costs, and token profiles.
---
## 1. Common Workflows
### Flow 1: "What should I work on?" — 4 round trips
```
me → dashboard overview (which items need attention?)
issues <iid> -p proj → detail on picked issue (full context + discussions)
trace src/relevant/file.rs → understand code context (why was it written?)
who src/relevant/file.rs → find domain experts (who can help?)
```
**Total tokens (minimal):** ~800 + ~2000 + ~1000 + ~400 = ~4200
**Total tokens (full):** ~3000 + ~6000 + ~1500 + ~800 = ~11300
**Latency:** 4 serial round trips
### Flow 2: "What happened with this feature?" — 3 round trips
```
search "feature name" → find relevant entities
timeline "feature name" → reconstruct chronological history
related issues 42 → discover connected work
```
**Total tokens (minimal):** ~600 + ~1500 + ~400 = ~2500
**Total tokens (full):** ~2000 + ~5000 + ~1000 = ~8000
**Latency:** 3 serial round trips
### Flow 3: "Why was this code changed?" — 3 round trips
```
trace src/file.rs → file -> MR -> issue chain
issues <iid> -p proj → full issue detail
timeline "issue:42" → full history with cross-refs
```
**Total tokens (minimal):** ~800 + ~2000 + ~1500 = ~4300
**Total tokens (full):** ~1500 + ~6000 + ~5000 = ~12500
**Latency:** 3 serial round trips
### Flow 4: "Is the system healthy?" — 1-4 round trips
```
health → quick pre-flight (pass/fail)
doctor → detailed diagnostics (if health fails)
status → sync state per project
stats → document/index health
```
**Total tokens:** ~100 + ~300 + ~200 + ~400 = ~1000
**Latency:** 1-4 serial round trips (1 if `health` passes; up to 4 when drilling into failures)
### Flow 5: "Who can review this?" — 2 round trips
```
who src/auth/ → find file experts
who @jdoe --reviews → check reviewer's patterns
```
**Total tokens (minimal):** ~300 + ~300 = ~600
**Latency:** 2 serial round trips
### Flow 6: "Find and understand an issue" — 4 round trips
```
search "query" → discover entities (get IIDs)
issues <iid> → full detail with discussions
timeline "issue:42" → chronological context
related issues 42 → connected entities
```
**Total tokens (minimal):** ~600 + ~2000 + ~1500 + ~400 = ~4500
**Total tokens (full):** ~2000 + ~6000 + ~5000 + ~1000 = ~14000
**Latency:** 4 serial round trips
---
## 2. Token Cost Profiles
Measured typical response sizes in robot mode with default settings:
| Command | Typical Tokens (full) | With `--fields minimal` | Dominant Cost Driver |
|---|---|---|---|
| `me` (all sections) | 2000-5000 | 500-1500 | Open items count |
| `issues` (list, n=50) | 1500-3000 | 400-800 | Labels arrays |
| `issues <iid>` (detail) | 1000-8000 | N/A (no minimal for detail) | Discussion depth |
| `mrs <iid>` (detail) | 1000-8000 | N/A | Discussion depth, DiffNote positions |
| `timeline` (limit=100) | 2000-6000 | 800-1500 | Event count + evidence |
| `search` (n=20) | 1000-3000 | 300-600 | Snippet length |
| `who expert` | 300-800 | 150-300 | Expert count |
| `who workload` | 500-1500 | 200-500 | Open items count |
| `trace` | 500-2000 | 300-800 | Chain depth |
| `file-history` | 300-1500 | 200-500 | MR count |
| `related` | 300-1000 | 200-400 | Result count |
| `drift` | 200-800 | N/A | Similarity curve length |
| `notes` (n=50) | 1500-5000 | 500-1000 | Body length |
| `count` | ~100 | N/A | Fixed structure |
| `stats` | ~500 | N/A | Fixed structure |
| `health` | ~100 | N/A | Fixed structure |
| `doctor` | ~300 | N/A | Fixed structure |
| `status` | ~200 | N/A | Project count |
### Key Observations
1. **Detail commands are expensive.** `issues <iid>` and `mrs <iid>` can hit 8000 tokens due to discussions. This is the content agents actually need, but most of it is discussion body text.
2. **`me` is the most-called command** and ranges 2000-5000 tokens. Agents often just need "do I have work?" which is ~100 tokens (summary counts only).
3. **Lists with labels are wasteful.** Every issue/MR in a list carries its full label array. With 50 items x 5 labels each, that's 250 strings of overhead.
4. **`--fields minimal` helps a lot** — 50-70% reduction on list commands. But it's not available on detail views.
5. **Timeline scales linearly** with event count and evidence notes. The `--max-evidence` flag helps cap the expensive part.
---
## 3. Round-Trip Inefficiency Patterns
### Pattern A: Discovery -> Detail (N+1)
Agent searches, gets 5 results, then needs detail on each:
```
search "auth bug" → 5 results
issues 42 -p proj → detail
issues 55 -p proj → detail
issues 71 -p proj → detail
issues 88 -p proj → detail
issues 95 -p proj → detail
```
**6 round trips** for what should be 2 (search + batch detail).
### Pattern B: Detail -> Context Gathering
Agent gets issue detail, then needs timeline + related + trace:
```
issues 42 -p proj → detail
timeline "issue:42" -p proj → events
related issues 42 -p proj → similar
trace src/file.rs -p proj → code provenance
```
**4 round trips** for what should be 1 (detail with embedded context).
### Pattern C: Health Check Cascade
Agent checks health, discovers issue, drills down:
```
health → unhealthy (exit 19)
doctor → token OK, Ollama missing
stats --check → 5 orphan embeddings
stats --repair → fixed
```
**4 round trips**, but only 2 are strictly needed: `doctor` subsumes `health`, and `stats --repair` could fold in the `--check` pass.
### Pattern D: Dashboard -> Action
Agent checks dashboard, picks item, needs full context:
```
me → 5 open issues, 2 MRs
issues 42 -p proj → picked issue detail
who src/auth/ -p proj → expert for help
timeline "issue:42" -p proj → history
```
**4 round trips.** With `--include`, could be 2 (me with inline detail + who).
---
## 4. Optimized Workflow Vision
What the same workflows look like with proposed optimizations:
### Flow 1 Optimized: "What should I work on?" — 2 round trips
```
me --depth titles → 400 tokens: counts + item titles with attention_state
issues 42 --include timeline,trace → 1 call: detail + events + code provenance
```
### Flow 2 Optimized: "What happened with this feature?" — 1-2 round trips
```
search "feature" -n 5 → find entities
issues 42 --include timeline,related → everything in one call
```
### Flow 3 Optimized: "Why was this code changed?" — 1 round trip
```
trace src/file.rs --include experts,timeline → full chain + experts + events
```
### Flow 4 Optimized: "Is the system healthy?" — 1 round trip
```
doctor → covers health + auth + connectivity
# status + stats only if doctor reveals issues
```
### Flow 6 Optimized: "Find and understand" — 2 round trips
```
search "query" -n 5 → discover entities
issues --batch 42,55,71 --include timeline → batch detail with events
```

View File

@@ -0,0 +1,198 @@
# Consolidation Proposals
5 proposals to reduce 34 commands to 29 by merging high-overlap commands.
---
## A. Absorb `file-history` into `trace --shallow`
**Overlap:** 75%. Both do rename chain BFS on `mr_file_changes`, both optionally include DiffNote discussions. `trace` follows `entity_references` to linked issues; `file-history` stops at MRs.
**Current state:**
```bash
# These do nearly the same thing:
lore file-history src/auth/ -p proj --discussions
lore trace src/auth/ -p proj --discussions
# trace just adds: issues linked via entity_references
```
**Proposed change:**
- `trace <path>` — full chain: file -> MR -> issue -> discussions (existing behavior)
- `trace <path> --shallow` — MR-only, no issue following (replaces `file-history`)
- Move `--merged` flag from `file-history` to `trace`
- Deprecate `file-history` as an alias that maps to `trace --shallow`
**Migration path:**
1. Add `--shallow` and `--merged` flags to `trace`
2. Make `file-history` an alias with deprecation warning
3. Update robot-docs to point to `trace`
4. Remove alias after 2 releases
**Breaking changes:** Robot output shape differs slightly (`trace_chains` vs `merge_requests` key name). The `--shallow` variant should match `file-history`'s output shape for compatibility.
**Effort:** Low. Most code is already shared via `resolve_rename_chain()`.
---
## B. Absorb `auth` into `doctor`
**Overlap:** 100% of `auth` is contained within `doctor`.
**Current state:**
```bash
lore auth # checks: token set, GitLab reachable, user identity
lore doctor # checks: all of above + DB + schema + Ollama
```
**Proposed change:**
- `doctor` — full check (existing behavior)
- `doctor --auth` — token + GitLab only (replaces `auth`)
- Keep `health` separate (fast pre-flight, different exit code contract: 0/19)
- Deprecate `auth` as alias for `doctor --auth`
**Migration path:**
1. Add `--auth` flag to `doctor`
2. Make `auth` an alias with deprecation warning
3. Remove alias after 2 releases
**Breaking changes:** None for robot mode (same JSON shape). Exit code mapping needs verification.
**Effort:** Low. Doctor already has the auth check logic.
---
## C. Remove `related` query-mode
**Overlap:** 80% with `search --mode semantic`.
**Current state:**
```bash
# These are functionally equivalent:
lore related "authentication flow"
lore search "authentication flow" --mode semantic
# This is UNIQUE (no overlap):
lore related issues 42
```
**Proposed change:**
- Keep entity-seeded mode: `related issues 42` (seeds from existing entity embedding)
- Remove free-text mode: `related "text"` -> error with suggestion: "Use `search --mode semantic`"
- Alternatively: keep it as sugar, but document it as equivalent to `search --mode semantic`
**Migration path:**
1. Add deprecation warning when query-mode is used
2. After 2 releases, remove query-mode parsing
3. Entity-mode stays unchanged
**Breaking changes:** Agents using `related "text"` must switch to `search --mode semantic`. This is a strict improvement since search has filters.
**Effort:** Low. Just argument validation change.
---
## D. Merge `who overlap` into `who expert`
**Overlap:** 50% functional overlap, but `overlap` is a strict simplification of `expert`.
**Current state:**
```bash
lore who src/auth/ # expert mode: scored rankings
lore who --overlap src/auth/ # overlap mode: raw touch counts
```
**Proposed change:**
- `who <path>` (expert) adds `touch_count` and `last_touch_at` fields to each expert row
- `who --overlap <path>` becomes an alias for `who <path> --fields username,touch_count`
- Eventually remove `--overlap` flag
**New expert output:**
```json
{
"experts": [
{
"username": "jdoe", "score": 42.5,
"touch_count": 15, "last_touch_at": "2026-02-20",
"detail": { "mr_ids_author": [99, 101] }
}
]
}
```
**Migration path:**
1. Add `touch_count` and `last_touch_at` to expert output
2. Make `--overlap` an alias with deprecation warning
3. Remove `--overlap` after 2 releases
**Breaking changes:** Expert output gains new fields (non-breaking for JSON consumers). Overlap output shape changes if agents were parsing `{ "users": [...] }` vs `{ "experts": [...] }`.
**Effort:** Low. Expert query already touches the same tables; just need to add a COUNT aggregation.
---
## E. Merge `count` and `status` into `stats`
**Overlap:** `count` and `stats` both answer "how much data?"; `status` and `stats` both report system state.
**Current state:**
```bash
lore count issues # entity count + state breakdown
lore count mrs # entity count + state breakdown
lore status # sync cursors per project
lore stats # document/index counts + integrity
```
**Proposed change:**
- `stats` — document/index health (existing behavior, default)
- `stats --entities` — adds entity counts (replaces `count`)
- `stats --sync` — adds sync cursor positions (replaces `status`)
- `stats --all` — everything: entities + sync + documents + integrity
- `stats --check` / `--repair` — unchanged
**New `--all` output:**
```json
{
"data": {
"entities": {
"issues": { "total": 5000, "opened": 200, "closed": 4800 },
"merge_requests": { "total": 1234, "opened": 100, "closed": 50, "merged": 1084 },
"discussions": { "total": 8000 },
"notes": { "total": 282000, "system_excluded": 50000 }
},
"sync": {
"projects": [
{ "project_path": "group/repo", "last_synced_at": "...", "document_count": 5000 }
]
},
"documents": { "total": 61652, "issues": 5000, "mrs": 2000, "notes": 50000 },
"embeddings": { "total": 80000, "synced": 79500, "pending": 500 },
"fts": { "total_docs": 61652 },
"queues": { "pending": 0, "in_progress": 0, "failed": 0 },
"integrity": { "ok": true }
}
}
```
**Migration path:**
1. Add `--entities`, `--sync`, `--all` flags to `stats`
2. Make `count` an alias for `stats --entities` with deprecation warning
3. Make `status` an alias for `stats --sync` with deprecation warning
4. Remove aliases after 2 releases
**Breaking changes:** `count` output currently has `{ "entity": "issues", "count": N, "breakdown": {...} }`. Under `stats --entities`, this becomes nested under `data.entities`. Alias can preserve old shape during deprecation period.
**Effort:** Medium. Need to compose three query paths into one response builder.
---
## Summary
| Consolidation | Removes | Effort | Breaking? |
|---|---|---|---|
| `file-history` -> `trace --shallow` | -1 command | Low | Alias redirect, output shape compat |
| `auth` -> `doctor --auth` | -1 command | Low | Alias redirect |
| `related` query-mode removal | -1 mode | Low | Must switch to `search --mode semantic` |
| `who overlap` -> `who expert` | -1 sub-mode | Low | Output gains fields |
| `count` + `status` -> `stats` | -2 commands | Medium | Output nesting changes |
**Total: 34 commands -> 29 commands.** All changes use deprecation-with-alias pattern for gradual migration.

View File

@@ -0,0 +1,347 @@
# Robot-Mode Optimization Proposals
6 proposals to reduce round trips and token waste for agent consumers.
---
## A. `--include` flag for embedded sub-queries (P0)
**Problem:** The #1 agent inefficiency. Every "understand this entity" workflow requires 3-4 serial round trips: detail + timeline + related + trace.
**Proposal:** Add `--include` flag to detail commands that embeds sub-query results in the response.
```bash
# Before: 4 round trips, ~12000 tokens
lore -J issues 42 -p proj
lore -J timeline "issue:42" -p proj --limit 20
lore -J related issues 42 -p proj -n 5
lore -J trace src/auth/ -p proj
# After: 1 round trip, ~5000 tokens (sub-queries use reduced limits)
lore -J issues 42 -p proj --include timeline,related
```
### Include Matrix
| Base Command | Valid Includes | Default Limits |
|---|---|---|
| `issues <iid>` | `timeline`, `related`, `trace` | 20 events, 5 related, 5 chains |
| `mrs <iid>` | `timeline`, `related`, `file-changes` | 20 events, 5 related |
| `trace <path>` | `experts`, `timeline` | 5 experts, 20 events |
| `me` | `detail` (inline top-N item details) | 3 items detailed |
| `search` | `detail` (inline top-N result details) | 3 results detailed |
### Response Shape
Included data uses `_` prefix to distinguish from base fields:
```json
{
"ok": true,
"data": {
"iid": 42, "title": "Fix auth", "state": "opened",
"discussions": [...],
"_timeline": {
"event_count": 15,
"events": [...]
},
"_related": {
"similar_entities": [...]
}
},
"meta": {
"elapsed_ms": 200,
"_timeline_ms": 45,
"_related_ms": 120
}
}
```
### Error Handling
Sub-query errors are non-fatal. If Ollama is down, `_related` returns an error instead of failing the whole request:
```json
{
"_related_error": "Ollama unavailable — related results skipped"
}
```
### Limit Control
```bash
# Custom limits for included data
lore -J issues 42 --include timeline:50,related:10
```
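The `name:limit` syntax implies a small spec parser. A sketch of one possible shape (helper name and error handling are hypothetical, not from the codebase):
```rust
/// Parse an --include spec like "timeline:50,related:10" into
/// (name, optional limit) pairs; a bare name falls back to the
/// per-include default limit from the matrix above.
fn parse_include_spec(spec: &str) -> Result<Vec<(String, Option<usize>)>, String> {
    spec.split(',')
        .map(|part| {
            let part = part.trim();
            match part.split_once(':') {
                Some((name, limit)) => {
                    let n = limit
                        .parse::<usize>()
                        .map_err(|_| format!("invalid limit in '{part}'"))?;
                    Ok((name.to_string(), Some(n)))
                }
                None => Ok((part.to_string(), None)),
            }
        })
        .collect()
}
```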
### Round-Trip Savings
| Workflow | Before | After | Savings |
|---|---|---|---|
| Understand an issue | 4 calls | 1 call | **75%** |
| Why was code changed | 3 calls | 1 call | **67%** |
| Find and understand | 4 calls | 2 calls | **50%** |
**Effort:** High. Each include needs its own sub-query executor, error isolation, and limit enforcement. But the payoff is massive — this single feature halves agent round trips.
---
## B. `--depth` control on `me` (P0)
**Problem:** `me` returns 2000-5000 tokens. Agents checking "do I have work?" only need ~100 tokens.
**Proposal:** Add `--depth` flag with three levels.
```bash
# Counts only (~100 tokens) — "do I have work?"
lore -J me --depth counts
# Titles (~400 tokens) — "what work do I have?"
lore -J me --depth titles
# Full (current behavior, 2000+ tokens) — "give me everything"
lore -J me --depth full
lore -J me # same as --depth full
```
### Depth Levels
| Level | Includes | Typical Tokens |
|---|---|---|
| `counts` | `summary` block only (counts, no items) | ~100 |
| `titles` | summary + item lists with minimal fields (iid, title, attention_state) | ~400 |
| `full` | Everything: items, activity, inbox, discussions | ~2000-5000 |
### Response at `--depth counts`
```json
{
"ok": true,
"data": {
"username": "jdoe",
"summary": {
"project_count": 3,
"open_issue_count": 5,
"authored_mr_count": 2,
"reviewing_mr_count": 1,
"needs_attention_count": 3
}
}
}
```
### Response at `--depth titles`
```json
{
"ok": true,
"data": {
"username": "jdoe",
"summary": { ... },
"open_issues": [
{ "iid": 42, "title": "Fix auth", "attention_state": "needs_attention" }
],
"open_mrs_authored": [
{ "iid": 99, "title": "Refactor auth", "attention_state": "needs_attention" }
],
"reviewing_mrs": []
}
}
```
**Effort:** Low. The data is already available; just need to gate serialization by depth level.
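A sketch of that gating with deliberately simplified, hypothetical types (the real `MeDashboard` has more sections):
```rust
#[derive(Clone, Copy, PartialEq)]
enum MeDepth {
    Counts,
    Titles,
    Full,
}

struct Dashboard {
    summary: Vec<(String, usize)>, // survives every depth
    item_titles: Vec<String>,      // kept at `titles` and `full`
    activity: Vec<String>,         // kept only at `full`
}

/// Strip the sections the requested depth does not pay for, so the
/// serializer emits only what the agent asked for.
fn gate_by_depth(d: &mut Dashboard, depth: MeDepth) {
    if depth == MeDepth::Full {
        return;
    }
    d.activity.clear();
    if depth == MeDepth::Counts {
        d.item_titles.clear();
    }
}
```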
---
## C. `--batch` flag for multi-entity detail (P1)
**Problem:** After search/timeline, agents discover N entity IIDs and need detail on each. Currently N round trips.
**Proposal:** Add `--batch` flag to `issues` and `mrs` detail mode.
```bash
# Before: 3 round trips
lore -J issues 42 -p proj
lore -J issues 55 -p proj
lore -J issues 71 -p proj
# After: 1 round trip
lore -J issues --batch 42,55,71 -p proj
```
### Response
```json
{
"ok": true,
"data": {
"results": [
{ "iid": 42, "title": "Fix auth", "state": "opened", ... },
{ "iid": 55, "title": "Add SSO", "state": "opened", ... },
{ "iid": 71, "title": "Token refresh", "state": "closed", ... }
],
"errors": [
{ "iid": 99, "error": "Not found" }
]
}
}
```
### Constraints
- Max 20 IIDs per batch
- Individual errors don't fail the batch (partial results returned)
- Works with `--include` for maximum efficiency: `--batch 42,55 --include timeline`
- Works with `--fields minimal` for token control
**Effort:** Medium. Need to loop the existing detail handler and compose results.
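A sketch of that composition loop, with a stand-in closure for the existing single-entity handler (all names hypothetical):
```rust
const MAX_BATCH: usize = 20;

struct BatchItem {
    iid: i64,
    detail: String, // stands in for the full detail payload
}

struct BatchError {
    iid: i64,
    error: String,
}

/// Run the detail handler per IID and compose partial results;
/// individual failures land in `errors` instead of aborting the batch.
fn run_batch(
    iids: &[i64],
    fetch: impl Fn(i64) -> Result<String, String>,
) -> Result<(Vec<BatchItem>, Vec<BatchError>), String> {
    if iids.len() > MAX_BATCH {
        return Err(format!("batch limited to {MAX_BATCH} IIDs"));
    }
    let mut results = Vec::new();
    let mut errors = Vec::new();
    for &iid in iids {
        match fetch(iid) {
            Ok(detail) => results.push(BatchItem { iid, detail }),
            Err(error) => errors.push(BatchError { iid, error }),
        }
    }
    Ok((results, errors))
}
```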
---
## D. Composite `context` command (P2)
**Problem:** Agents need full context on an entity but must learn `--include` syntax. A purpose-built command is more discoverable.
**Proposal:** Add `context` command that returns detail + timeline + related in one call.
```bash
lore -J context issues 42 -p proj
lore -J context mrs 99 -p proj
```
### Equivalent To
```bash
lore -J issues 42 -p proj --include timeline,related
```
But with optimized defaults:
- Timeline: 20 most recent events, max 3 evidence notes
- Related: top 5 entities
- Discussions: truncated after 5 threads
- Non-fatal: Ollama-dependent parts gracefully degrade
### Response Shape
Same as `issues <iid> --include timeline,related` but with the reduced defaults applied.
### Relationship to `--include`
`context` is sugar for the most common `--include` pattern. Both mechanisms can coexist:
- `context` for the 80% case (agents wanting full entity understanding)
- `--include` for custom combinations
**Effort:** Medium. Thin wrapper around detail + include pipeline.
---
## E. `--max-tokens` response budget (P3)
**Problem:** Response sizes vary wildly (100 to 8000 tokens). Agents can't predict cost in advance.
**Proposal:** Let agents cap response size. Server truncates to fit.
```bash
lore -J me --max-tokens 500
lore -J timeline "feature" --max-tokens 1000
lore -J context issues 42 --max-tokens 2000
```
### Truncation Strategy (priority order)
1. Apply `--fields minimal` if not already set
2. Reduce array lengths (newest/highest-score items survive)
3. Truncate string fields (descriptions, snippets) to 200 chars
4. Omit null/empty fields
5. Drop included sub-queries (if using `--include`)
### Meta Notice
```json
{
"meta": {
"elapsed_ms": 50,
"truncated": true,
"original_tokens": 3500,
"budget_tokens": 1000,
"dropped": ["_related", "discussions[5:]", "activity[10:]"]
}
}
```
### Implementation Notes
Token estimation: rough heuristic based on JSON character count / 4. Doesn't need to be exact — the goal is "roughly this size" not "exactly N tokens."
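A sketch of the heuristic exactly as stated (serialized length divided by 4):
```rust
/// Rough token estimate: JSON character count / 4. Deliberately
/// imprecise; the budget means "roughly this size", not "exactly N".
fn estimate_tokens(json: &str) -> usize {
    json.len() / 4
}

/// Budget check; callers would walk the truncation priority list
/// (fields -> arrays -> strings -> nulls -> includes) until this
/// returns false.
fn over_budget(json: &str, budget_tokens: usize) -> bool {
    estimate_tokens(json) > budget_tokens
}
```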
**Effort:** High. Requires token estimation, progressive truncation logic, and tracking what was dropped.
---
## F. `--format tsv` for list commands (P3)
**Problem:** JSON is verbose for tabular data. List commands return arrays of objects with repeated key names.
**Proposal:** Add `--format tsv` for list commands.
```bash
lore -J issues --format tsv --fields iid,title,state -n 10
```
### Output
```
iid title state
42 Fix auth opened
55 Add SSO opened
71 Token refresh closed
```
### Token Savings
| Command | JSON tokens | TSV tokens | Savings |
|---|---|---|---|
| `issues -n 50 --fields minimal` | ~800 | ~250 | **69%** |
| `mrs -n 50 --fields minimal` | ~800 | ~250 | **69%** |
| `who expert -n 10` | ~300 | ~100 | **67%** |
| `notes -n 50 --fields minimal` | ~1000 | ~350 | **65%** |
### Applicable Commands
TSV works well for flat, tabular data:
- `issues` (list), `mrs` (list), `notes` (list)
- `who expert`, `who overlap`, `who reviews`
- `count`
TSV does NOT work for nested/complex data:
- Detail views (discussions are nested)
- Timeline (events have nested evidence)
- Search (nested explain, labels arrays)
- `me` (multiple sections)
### Agent Parsing
Most LLMs parse TSV naturally. Agents that need structured data can still use JSON.
**Effort:** Medium. Tab-separated serialization for flat structs is straightforward. Need to handle escaping for body text containing tabs/newlines.
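One way the escaping could work (a sketch, not a shipped serializer): backslash-escape the separator characters per cell, then join with tabs:
```rust
/// Escape a cell so embedded tabs and newlines cannot break the
/// row/column structure of the TSV output.
fn escape_tsv_cell(cell: &str) -> String {
    cell.replace('\\', "\\\\")
        .replace('\t', "\\t")
        .replace('\n', "\\n")
        .replace('\r', "\\r")
}

fn tsv_row(cells: &[&str]) -> String {
    cells
        .iter()
        .map(|c| escape_tsv_cell(c))
        .collect::<Vec<_>>()
        .join("\t")
}
```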
---
## Impact Summary
| Optimization | Priority | Effort | Round-Trip Savings | Token Savings |
|---|---|---|---|---|
| `--include` | P0 | High | **50-75%** | Moderate |
| `--depth` on `me` | P0 | Low | None | **60-80%** |
| `--batch` | P1 | Medium | **N-1 per batch** | Moderate |
| `context` command | P2 | Medium | **67-75%** | Moderate |
| `--max-tokens` | P3 | High | None | **Variable** |
| `--format tsv` | P3 | Medium | None | **65-69% on lists** |
### Implementation Order
1. **`--depth` on `me`** — lowest effort, high value, no risk
2. **`--include` on `issues`/`mrs` detail** — highest impact, start with `timeline` include only
3. **`--batch`** — eliminates N+1 pattern
4. **`context` command** — sugar on top of `--include`
5. **`--format tsv`** — nice-to-have, easy to add incrementally
6. **`--max-tokens`** — complex, defer until demand is clear

View File

@@ -0,0 +1,181 @@
# Appendices
---
## A. Robot Output Envelope
All robot-mode responses follow this structure:
```json
{
"ok": true,
"data": { /* command-specific */ },
"meta": { "elapsed_ms": 42 }
}
```
Errors (to stderr):
```json
{
"error": {
"code": "CONFIG_NOT_FOUND",
"message": "Configuration file not found",
"suggestion": "Run 'lore init'",
"actions": ["lore init"]
}
}
```
The `actions` array contains copy-paste shell commands for automated recovery. Omitted when empty.
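For illustration, an agent-side wrapper could deserialize the envelope and surface those actions. A sketch assuming `serde`/`serde_json` (type names hypothetical):
```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct ErrorEnvelope {
    error: ErrorBody,
}

#[derive(Deserialize)]
struct ErrorBody {
    code: String,
    message: String,
    suggestion: Option<String>,
    #[serde(default)]
    actions: Vec<String>, // the field is omitted when empty, so default to []
}

/// Parse stderr from a failed robot-mode call and return any
/// copy-paste recovery commands.
fn recovery_actions(stderr_json: &str) -> Result<Vec<String>, serde_json::Error> {
    let envelope: ErrorEnvelope = serde_json::from_str(stderr_json)?;
    Ok(envelope.error.actions)
}
```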
---
## B. Exit Codes
| Code | Meaning | Retryable |
|---|---|---|
| 0 | Success | N/A |
| 1 | Internal error / not implemented | Maybe |
| 2 | Usage error (invalid flags or arguments) | No (fix syntax) |
| 3 | Config invalid | No (fix config) |
| 4 | Token not set | No (set token) |
| 5 | GitLab auth failed | Maybe (token expired?) |
| 6 | Resource not found (HTTP 404) | No |
| 7 | Rate limited | Yes (wait) |
| 8 | Network error | Yes (retry) |
| 9 | Database locked | Yes (wait) |
| 10 | Database error | Maybe |
| 11 | Migration failed | No (investigate) |
| 12 | I/O error | Maybe |
| 13 | Transform error | No (bug) |
| 14 | Ollama unavailable | Yes (start Ollama) |
| 15 | Ollama model not found | No (pull model) |
| 16 | Embedding failed | Yes (retry) |
| 17 | Not found (entity does not exist) | No |
| 18 | Ambiguous match (use `-p` to specify project) | No (be specific) |
| 19 | Health check failed | Yes (fix issues first) |
| 20 | Config not found | No (run init) |
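The Retryable column translates directly into a retry predicate; a sketch keyed to the codes listed above:
```rust
enum Retry {
    Yes,
    Maybe,
    No,
}

/// Map an exit code from the table above to a retry decision.
fn retryable(code: i32) -> Retry {
    match code {
        0 => Retry::No,                         // success; nothing to retry
        7 | 8 | 9 | 14 | 16 | 19 => Retry::Yes, // rate limit, network, DB lock, Ollama, embedding, health
        1 | 5 | 10 | 12 => Retry::Maybe,        // internal, auth, database, I/O
        _ => Retry::No,                         // usage/config/not-found: fix the input instead
    }
}
```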
---
## C. Field Selection Presets
The `--fields` flag supports both presets and custom field lists:
```bash
lore -J issues --fields minimal # Preset
lore -J mrs --fields iid,title,state,draft # Custom comma-separated
```
| Command | Minimal Preset Fields |
|---|---|
| `issues` (list) | `iid`, `title`, `state`, `updated_at_iso` |
| `mrs` (list) | `iid`, `title`, `state`, `updated_at_iso` |
| `notes` (list) | `id`, `author_username`, `body`, `created_at_iso` |
| `search` | `document_id`, `title`, `source_type`, `score` |
| `timeline` | `timestamp`, `type`, `entity_iid`, `detail` |
| `who expert` | `username`, `score` |
| `who workload` | `iid`, `title`, `state` |
| `who reviews` | `name`, `count`, `percentage` |
| `who active` | `entity_type`, `iid`, `title`, `participants` |
| `who overlap` | `username`, `touch_count` |
| `me` (items) | `iid`, `title`, `attention_state`, `updated_at_iso` |
| `me` (activity) | `timestamp_iso`, `event_type`, `entity_iid`, `actor` |
---
## D. Configuration Precedence
1. CLI flags (highest priority)
2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`)
3. Config file (`~/.config/lore/config.json`)
4. Built-in defaults (lowest priority)
---
## E. Time Parsing
All commands accepting `--since`, `--until`, `--as-of` support:
| Format | Example | Meaning |
|---|---|---|
| Relative days | `7d` | 7 days ago |
| Relative weeks | `2w` | 2 weeks ago |
| Relative months | `1m`, `6m` | 1/6 months ago |
| Absolute date | `2026-01-15` | Specific date |
Internally converted to Unix milliseconds for DB queries.
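A sketch of that conversion, assuming `chrono` for the absolute-date case and approximating a month as 30 days (an assumption; the real parser may differ):
```rust
use chrono::NaiveDate;

/// Convert a --since/--until/--as-of value to Unix milliseconds.
fn parse_time_arg(raw: &str, now_ms: i64) -> Option<i64> {
    const DAY_MS: i64 = 24 * 60 * 60 * 1000;
    if let Some(d) = raw.strip_suffix('d').and_then(|n| n.parse::<i64>().ok()) {
        return Some(now_ms - d * DAY_MS);
    }
    if let Some(w) = raw.strip_suffix('w').and_then(|n| n.parse::<i64>().ok()) {
        return Some(now_ms - w * 7 * DAY_MS);
    }
    if let Some(m) = raw.strip_suffix('m').and_then(|n| n.parse::<i64>().ok()) {
        return Some(now_ms - m * 30 * DAY_MS); // month ~ 30 days (assumption)
    }
    // Absolute date: interpreted here as midnight UTC.
    let date = NaiveDate::parse_from_str(raw, "%Y-%m-%d").ok()?;
    Some(date.and_hms_opt(0, 0, 0)?.and_utc().timestamp_millis())
}
```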
---
## F. Database Schema (28 migrations)
### Primary Entity Tables
| Table | Key Columns | Notes |
|---|---|---|
| `projects` | `gitlab_project_id`, `path_with_namespace`, `web_url` | No `name` or `last_seen_at` |
| `issues` | `iid`, `title`, `state`, `author_username`, 5 status columns | Status columns nullable (migration 021) |
| `merge_requests` | `iid`, `title`, `state`, `draft`, `source_branch`, `target_branch` | `last_seen_at INTEGER NOT NULL` |
| `discussions` | `gitlab_discussion_id` (text), `issue_id`/`merge_request_id` | One FK must be set |
| `notes` | `gitlab_id`, `author_username`, `body`, DiffNote position columns | `type` column for DiffNote/DiscussionNote |
### Relationship Tables
| Table | Purpose |
|---|---|
| `issue_labels`, `mr_labels` | Label junction (DELETE+INSERT for stale removal) |
| `issue_assignees`, `mr_assignees` | Assignee junction |
| `mr_reviewers` | Reviewer junction |
| `entity_references` | Cross-refs: closes, mentioned, related (with `source_method`) |
| `mr_file_changes` | File diffs: old_path, new_path, change_type |
### Event Tables
| Table | Constraint |
|---|---|
| `resource_state_events` | CHECK: exactly one of issue_id/merge_request_id NOT NULL |
| `resource_label_events` | Same CHECK constraint; `label_name` nullable (migration 012) |
| `resource_milestone_events` | Same CHECK constraint; `milestone_title` nullable |
### Document/Search Pipeline
| Table | Purpose |
|---|---|
| `documents` | Unified searchable content (source_type: issue/merge_request/discussion) |
| `documents_fts` | FTS5 virtual table for text search |
| `documents_fts_docsize` | FTS5 shadow B-tree (19x faster for COUNT) |
| `document_labels` | Fast label filtering (indexed exact-match) |
| `document_paths` | File path association for DiffNote filtering |
| `embeddings` | vec0 virtual table; rowid = document_id * 1000 + chunk_index (encode/decode sketch after this table) |
| `embedding_metadata` | Chunk provenance + staleness tracking (document_hash) |
| `dirty_sources` | Documents needing regeneration (with backoff via next_attempt_at) |
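The `embeddings` rowid packing noted above is plain arithmetic; an encode/decode sketch:
```rust
/// Pack (document_id, chunk_index) into a vec0 rowid per the scheme
/// above. The multiplier caps chunks at 1000 per document.
fn embedding_rowid(document_id: i64, chunk_index: i64) -> i64 {
    debug_assert!((0..1000).contains(&chunk_index));
    document_id * 1000 + chunk_index
}

/// Invert the packing back into (document_id, chunk_index).
fn split_rowid(rowid: i64) -> (i64, i64) {
    (rowid / 1000, rowid % 1000)
}
```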
### Infrastructure
| Table | Purpose |
|---|---|
| `sync_runs` | Sync history with metrics |
| `sync_cursors` | Per-resource sync position (updated_at cursor + tie_breaker_id) |
| `app_locks` | Crash-safe single-flight lock |
| `raw_payloads` | Raw JSON storage for debugging |
| `pending_discussion_fetches` | Dependent discussion fetch queue |
| `pending_dependent_fetches` | Job queue for resource_events, mr_closes, mr_diffs |
| `schema_version` | Migration tracking |
---
## G. Glossary
| Term | Definition |
|---|---|
| **IID** | Issue/MR number within a project (not globally unique) |
| **FTS5** | SQLite full-text search extension (BM25 ranking) |
| **vec0** | SQLite extension for vector similarity search |
| **RRF** | Reciprocal Rank Fusion — combines FTS and vector rankings |
| **DiffNote** | Comment attached to a specific line in a merge request diff |
| **Entity reference** | Cross-reference between issues/MRs (closes, mentioned, related) |
| **Rename chain** | BFS traversal of mr_file_changes to follow file renames |
| **Attention state** | Computed field on `me` items: needs_attention, not_started, stale, etc. |
| **Surgical sync** | Fetching specific entities by IID instead of full incremental sync |

View File

@@ -0,0 +1,58 @@
-- Migration 028: Add FK constraint on discussions.merge_request_id
-- Schema version: 28
-- Fixes missing foreign key that causes orphaned discussions when MRs are deleted
-- SQLite doesn't support ALTER TABLE ADD CONSTRAINT, so we must recreate the table.
-- Step 1: Create new table with the FK constraint
CREATE TABLE discussions_new (
id INTEGER PRIMARY KEY,
gitlab_discussion_id TEXT NOT NULL,
project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE, -- FK was missing!
noteable_type TEXT NOT NULL CHECK (noteable_type IN ('Issue', 'MergeRequest')),
individual_note INTEGER NOT NULL DEFAULT 0,
first_note_at INTEGER,
last_note_at INTEGER,
last_seen_at INTEGER NOT NULL,
resolvable INTEGER NOT NULL DEFAULT 0,
resolved INTEGER NOT NULL DEFAULT 0,
raw_payload_id INTEGER REFERENCES raw_payloads(id), -- Added in migration 004
CHECK (
(noteable_type = 'Issue' AND issue_id IS NOT NULL AND merge_request_id IS NULL) OR
(noteable_type = 'MergeRequest' AND merge_request_id IS NOT NULL AND issue_id IS NULL)
)
);
-- Step 2: Copy data (only rows with valid FK references to avoid constraint violations)
INSERT INTO discussions_new
SELECT d.* FROM discussions d
WHERE (d.merge_request_id IS NULL OR EXISTS (SELECT 1 FROM merge_requests m WHERE m.id = d.merge_request_id));
-- Step 3: Drop old table and rename
DROP TABLE discussions;
ALTER TABLE discussions_new RENAME TO discussions;
-- Step 4: Recreate ALL indexes that were on the discussions table
-- From migration 002 (original table)
CREATE UNIQUE INDEX uq_discussions_project_discussion_id ON discussions(project_id, gitlab_discussion_id);
CREATE INDEX idx_discussions_issue ON discussions(issue_id);
CREATE INDEX idx_discussions_mr ON discussions(merge_request_id);
CREATE INDEX idx_discussions_last_note ON discussions(last_note_at);
-- From migration 003 (orphan detection)
CREATE INDEX idx_discussions_last_seen ON discussions(last_seen_at);
-- From migration 006 (MR indexes)
CREATE INDEX idx_discussions_mr_id ON discussions(merge_request_id); -- duplicate of idx_discussions_mr above; recreated verbatim for parity
CREATE INDEX idx_discussions_mr_resolved ON discussions(merge_request_id, resolved, resolvable);
-- From migration 017 (who command indexes)
CREATE INDEX idx_discussions_unresolved_recent ON discussions(project_id, last_note_at) WHERE resolvable = 1 AND resolved = 0;
CREATE INDEX idx_discussions_unresolved_recent_global ON discussions(last_note_at) WHERE resolvable = 1 AND resolved = 0;
-- From migration 019 (list performance)
CREATE INDEX idx_discussions_issue_resolved ON discussions(issue_id, resolvable, resolved);
-- From migration 022 (notes query optimization)
CREATE INDEX idx_discussions_issue_id ON discussions(issue_id); -- duplicate of idx_discussions_issue above; recreated verbatim for parity
-- Record migration
INSERT INTO schema_version (version, applied_at, description)
VALUES (28, strftime('%s', 'now') * 1000, 'Add FK constraint on discussions.merge_request_id');

View File

@@ -0,0 +1,652 @@
---
plan: true
title: "GitLab TODOs Integration"
status: proposed
iteration: 4
target_iterations: 4
beads_revision: 1
related_plans: []
created: 2026-02-23
updated: 2026-02-26
audit_revision: 4
---
# GitLab TODOs Integration
## Summary
Add GitLab TODO support to lore. Todos are fetched during sync, stored locally, and surfaced through a standalone `lore todos` command and integration into the `lore me` dashboard.
**Scope:** Read-only. No mark-as-done operations.
---
## Workflows
### Workflow 1: Morning Triage (Human)
1. User runs `lore me` to see personal dashboard
2. Summary header shows "5 pending todos" alongside issue/MR counts
3. Todos section groups items: 2 Assignments, 2 Mentions, 1 Approval Required
4. User scans Assignments — sees issue #42 assigned by @manager
5. User runs `lore todos` for full detail with body snippets
6. User clicks target URL to address highest-priority item
7. After marking done in GitLab, next `lore sync` removes it locally
### Workflow 2: Agent Polling (Robot Mode)
1. Agent runs `lore --robot health` as pre-flight check
2. Agent runs `lore --robot me --fields minimal` for dashboard
3. Agent extracts `pending_todo_count` from summary — if 0, skip todos
4. If count > 0, agent runs `lore --robot todos`
5. Agent iterates `data.todos[]`, filtering by `action` type
6. Agent prioritizes `approval_required` and `build_failed` for immediate attention
7. Agent logs external todos (`is_external: true`) for manual review
### Workflow 3: Cross-Project Visibility
1. User is mentioned in a project they don't sync (e.g., company-wide repo)
2. `lore sync` fetches the todo anyway (account-wide fetch)
3. `lore todos` shows item with `[external]` indicator and project path
4. User can still click target URL to view in GitLab
5. Target title may be unavailable — graceful fallback to "Untitled"
---
## Acceptance Criteria
Behavioral contract. Each AC is a single testable statement.
### Storage
| ID | Behavior |
|----|----------|
| AC-1 | Todos are persisted locally in SQLite |
| AC-2 | Each todo is uniquely identified by its GitLab todo ID |
| AC-3 | Todos from non-synced projects are stored with their project path |
### Sync
| ID | Behavior |
|----|----------|
| AC-4 | `lore sync` fetches all pending todos from GitLab |
| AC-5 | Sync fetches todos account-wide, not per-project |
| AC-6 | Todos marked done in GitLab are removed locally on next sync |
| AC-7 | Transient sync errors do not delete valid local todos |
| AC-8 | `lore sync --no-todos` skips todo fetching |
| AC-9 | Sync logs todo statistics (fetched, inserted, updated, deleted) |
### `lore todos` Command
| ID | Behavior |
|----|----------|
| AC-10 | `lore todos` displays all pending todos |
| AC-11 | Todos are grouped by action type: Assignments, Mentions, Approvals, Build Issues |
| AC-12 | Each todo shows: target title, project path, author, age |
| AC-13 | Non-synced project todos display `[external]` indicator |
| AC-14 | `lore todos --limit N` limits output to N todos |
| AC-15 | `lore --robot todos` returns JSON with standard `{ok, data, meta}` envelope |
| AC-16 | `lore --robot todos --fields minimal` returns reduced field set |
| AC-17 | `todo` and `td` are recognized as aliases for `todos` |
### `lore me` Integration
| ID | Behavior |
|----|----------|
| AC-18 | `lore me` summary includes pending todo count |
| AC-19 | `lore me` includes a todos section in the full dashboard |
| AC-20 | `lore me --todos` shows only the todos section |
| AC-21 | Todos are NOT filtered by `--project` flag (always account-wide) |
| AC-22 | Warning is displayed if `--project` is passed with `--todos` |
| AC-23 | Todo events appear in the activity feed for local entities |
### Action Types
| ID | Behavior |
|----|----------|
| AC-24 | Core actions are displayed: assigned, mentioned, directly_addressed, approval_required, build_failed, unmergeable |
| AC-25 | Niche actions are stored but not displayed: merge_train_removed, member_access_requested, marked |
### Attention State
| ID | Behavior |
|----|----------|
| AC-26 | Todos do not affect attention state calculation |
| AC-27 | Todos do not appear in "since last check" cursor-based inbox |
### Error Handling
| ID | Behavior |
|----|----------|
| AC-28 | 403 Forbidden on todos API logs warning and continues sync |
| AC-29 | 429 Rate Limited respects Retry-After header |
| AC-30 | Malformed todo JSON logs warning, skips that item, and disables purge for that sync |
### Documentation
| ID | Behavior |
|----|----------|
| AC-31 | `lore todos` appears in CLI help |
| AC-32 | `lore robot-docs` includes todos schema |
| AC-33 | CLAUDE.md documents the todos command |
### Quality
| ID | Behavior |
|----|----------|
| AC-34 | All quality gates pass: check, clippy, fmt, test |
---
## Architecture
Designed to fulfill the acceptance criteria above.
### Module Structure
```
src/
├── gitlab/
│ ├── client.rs # fetch_todos() method (AC-4, AC-5)
│ └── types.rs # GitLabTodo struct
├── ingestion/
│ └── todos.rs # sync_todos(), purge-safe deletion (AC-6, AC-7)
├── cli/commands/
│ ├── todos.rs # lore todos command (AC-10-17)
│ └── me/
│ ├── types.rs # MeTodo, extend MeSummary (AC-18)
│ └── queries.rs # query_todos() (AC-19, AC-23)
└── core/
└── db.rs # Migration 028 (AC-1, AC-2, AC-3)
```
### Data Flow
```
GitLab API Local SQLite CLI Output
─────────── ──────────── ──────────
GET /api/v4/todos → todos table → lore todos
(account-wide) (purge-safe sync) lore me --todos
```
### Key Design Decisions
| Decision | Rationale | ACs |
|----------|-----------|-----|
| Account-wide fetch | GitLab todos API is user-scoped, not project-scoped | AC-5, AC-21 |
| Purge-safe deletion | Transient errors should not delete valid data | AC-7 |
| Separate from attention | Todos are notifications, not engagement signals | AC-26, AC-27 |
| Store all actions, display core | Future-proofs for new action types | AC-24, AC-25 |
### Existing Code to Extend
| Type | Location | Extension |
|------|----------|-----------|
| `MeSummary` | `src/cli/commands/me/types.rs` | Add `pending_todo_count` field |
| `ActivityEventType` | `src/cli/commands/me/types.rs` | Add `Todo` variant |
| `MeDashboard` | `src/cli/commands/me/types.rs` | Add `todos: Vec<MeTodo>` field |
| `SyncArgs` | `src/cli/mod.rs` | Add `--no-todos` flag |
| `MeArgs` | `src/cli/mod.rs` | Add `--todos` flag |
---
## Implementation Specifications
Each IMP section details HOW to fulfill specific ACs.
### IMP-1: Database Schema
**Fulfills:** AC-1, AC-2, AC-3
**Migration 028:**
```sql
CREATE TABLE todos (
id INTEGER PRIMARY KEY,
gitlab_todo_id INTEGER NOT NULL UNIQUE,
project_id INTEGER REFERENCES projects(id) ON DELETE SET NULL,
gitlab_project_id INTEGER,
target_type TEXT NOT NULL,
target_id TEXT,
target_iid INTEGER,
target_url TEXT NOT NULL,
target_title TEXT,
action_name TEXT NOT NULL,
author_id INTEGER,
author_username TEXT,
body TEXT,
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
synced_at INTEGER NOT NULL,
sync_generation INTEGER NOT NULL DEFAULT 0,
project_path TEXT
);
CREATE INDEX idx_todos_action_created ON todos(action_name, created_at DESC);
CREATE INDEX idx_todos_target ON todos(target_type, target_id);
CREATE INDEX idx_todos_created ON todos(created_at DESC);
CREATE INDEX idx_todos_sync_gen ON todos(sync_generation);
CREATE INDEX idx_todos_gitlab_project ON todos(gitlab_project_id);
CREATE INDEX idx_todos_target_lookup ON todos(target_type, project_id, target_iid);
```
**Notes:**
- `project_id` nullable for non-synced projects (AC-3)
- `gitlab_project_id` nullable — TODO targets include non-project entities (Namespace, etc.)
- No `state` column — we only store pending todos
- `sync_generation` enables two-generation grace purge (AC-7)
---
### IMP-2: GitLab API Client
**Fulfills:** AC-4, AC-5
**Endpoint:** `GET /api/v4/todos?state=pending`
**Types to add in `src/gitlab/types.rs`:**
```rust
#[derive(Debug, Deserialize)]
pub struct GitLabTodo {
pub id: i64,
pub project: Option<GitLabTodoProject>,
pub author: Option<GitLabTodoAuthor>,
pub action_name: String,
pub target_type: String,
pub target: Option<GitLabTodoTarget>,
pub target_url: String,
pub body: Option<String>,
pub state: String,
pub created_at: String,
pub updated_at: String,
}
#[derive(Debug, Deserialize)]
pub struct GitLabTodoProject {
pub id: i64,
pub path_with_namespace: String,
}
#[derive(Debug, Deserialize)]
pub struct GitLabTodoTarget {
pub id: serde_json::Value, // i64 or String (commit SHA)
pub iid: Option<i64>,
pub title: Option<String>,
}
#[derive(Debug, Deserialize)]
pub struct GitLabTodoAuthor {
pub id: i64,
pub username: String,
}
```
**Client method in `src/gitlab/client.rs`:**
```rust
// Returns an iterator of per-item results so IMP-3 can stream pages and
// handle individual decode failures (`for result in client.fetch_todos()? { ... }`).
pub fn fetch_todos(&self) -> Result<impl Iterator<Item = Result<GitLabTodo>> + '_> {
    self.paginate("/api/v4/todos?state=pending")
}
```
---
### IMP-3: Sync Pipeline Integration
**Fulfills:** AC-4, AC-5, AC-6, AC-7, AC-8, AC-9
**New file: `src/ingestion/todos.rs`**
**Sync position:** Account-wide step after per-project sync and status enrichment.
```
Sync order:
1. Issues (per project)
2. MRs (per project)
3. Status enrichment (account-wide GraphQL)
4. Todos (account-wide REST) ← NEW
```
**Purge-safe deletion pattern:**
```rust
pub struct TodoSyncResult {
pub fetched: usize,
pub upserted: usize,
pub deleted: usize,
pub generation: i64,
pub purge_allowed: bool,
}
pub fn sync_todos(conn: &Connection, client: &GitLabClient) -> Result<TodoSyncResult> {
// 1. Get next generation
let generation: i64 = conn.query_row(
"SELECT COALESCE(MAX(sync_generation), 0) + 1 FROM todos",
[], |r| r.get(0)
)?;
let mut fetched = 0;
let mut purge_allowed = true;
// 2. Fetch and upsert all todos
for result in client.fetch_todos()? {
match result {
Ok(todo) => {
upsert_todo_guarded(conn, &todo, generation)?;
fetched += 1;
}
Err(e) => {
// Malformed JSON: log warning, skip item, disable purge
warn!("Skipping malformed todo: {e}");
purge_allowed = false;
}
}
}
// 3. Two-generation grace purge: delete only if missing for 2+ consecutive syncs
// This protects against pagination drift (new todos inserted during traversal)
let deleted = if purge_allowed {
conn.execute("DELETE FROM todos WHERE sync_generation < ? - 1", [generation])?
} else {
0
};
Ok(TodoSyncResult { fetched, upserted: fetched, deleted, generation, purge_allowed })
}
```
**Concurrent-safe upsert:**
```sql
INSERT INTO todos (..., sync_generation) VALUES (?, ..., ?)
ON CONFLICT(gitlab_todo_id) DO UPDATE SET
...,
sync_generation = excluded.sync_generation,
synced_at = excluded.synced_at
WHERE excluded.sync_generation >= todos.sync_generation;
```
**"Success" for purge (all must be true):**
- Every page fetch completed without error
- Every todo JSON decoded successfully (any decode failure sets `purge_allowed=false`)
- Pagination traversal completed (not interrupted)
- Response was not 401/403
- Zero todos IS valid for purge when the above conditions are met
**Two-generation grace purge:**
Todos are deleted only if missing for 2 consecutive successful syncs (`sync_generation < current - 1`).
This protects against false deletions from pagination drift (new todos inserted during traversal).
---
### IMP-4: Project Path Extraction
**Fulfills:** AC-3, AC-13
```rust
use once_cell::sync::Lazy;
use regex::Regex;
pub fn extract_project_path(url: &str) -> Option<&str> {
static RE: Lazy<Regex> = Lazy::new(|| {
Regex::new(r"https?://[^/]+/(.+?)/-/(?:issues|merge_requests|epics|commits)/")
.expect("valid regex")
});
RE.captures(url)
.and_then(|c| c.get(1))
.map(|m| m.as_str())
}
```
**Usage:** Prefer `project.path_with_namespace` from API when available. Fall back to URL extraction for external projects.
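As a quick check, the fixture URL from IMP-8 exercises the happy path (test sketch):
```rust
#[test]
fn extracts_project_path_from_issue_url() {
    let url = "https://gitlab.example.com/diaspora/client/-/issues/4";
    assert_eq!(extract_project_path(url), Some("diaspora/client"));
    // URLs without a /-/<entity>/ segment yield None.
    assert_eq!(extract_project_path("https://gitlab.example.com/"), None);
}
```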
---
### IMP-5: `lore todos` Command
**Fulfills:** AC-10, AC-11, AC-12, AC-13, AC-14, AC-15, AC-16, AC-17
**New file: `src/cli/commands/todos.rs`**
**Args:**
```rust
#[derive(Parser)]
#[command(alias = "todo")]
pub struct TodosArgs {
#[arg(short = 'n', long)]
pub limit: Option<usize>,
}
```
**Autocorrect aliases in `src/cli/mod.rs`:**
```rust
("td", "todos"),
("todo", "todos"),
```
**Action type grouping:**
| Group | Actions |
|-------|---------|
| Assignments | `assigned` |
| Mentions | `mentioned`, `directly_addressed` |
| Approvals | `approval_required` |
| Build Issues | `build_failed`, `unmergeable` |
**Robot mode schema:**
```json
{
"ok": true,
"data": {
"todos": [{
"id": 123,
"gitlab_todo_id": 456,
"action": "mentioned",
"target_type": "Issue",
"target_iid": 42,
"target_title": "Fix login bug",
"target_url": "https://...",
"project_path": "group/repo",
"author_username": "jdoe",
"body": "Hey @you, can you look at this?",
"created_at_iso": "2026-02-20T10:00:00Z",
"is_external": false
}],
"counts": {
"total": 8,
"assigned": 2,
"mentioned": 5,
"approval_required": 1,
"build_failed": 0,
"unmergeable": 0,
"other": 0
}
},
"meta": {"elapsed_ms": 42}
}
```
**Minimal fields:** `gitlab_todo_id`, `action`, `target_type`, `target_iid`, `project_path`, `is_external`
---
### IMP-6: `lore me` Integration
**Fulfills:** AC-18, AC-19, AC-20, AC-21, AC-22, AC-23
**Types to add/extend in `src/cli/commands/me/types.rs`:**
```rust
// EXTEND
pub struct MeSummary {
// ... existing fields ...
pub pending_todo_count: usize, // ADD
}
// EXTEND
pub enum ActivityEventType {
// ... existing variants ...
Todo, // ADD
}
// EXTEND
pub struct MeDashboard {
// ... existing fields ...
pub todos: Vec<MeTodo>, // ADD
}
// NEW
pub struct MeTodo {
pub id: i64,
pub gitlab_todo_id: i64,
pub action: String,
pub target_type: String,
pub target_iid: Option<i64>,
pub target_title: Option<String>,
pub target_url: String,
pub project_path: String,
pub author_username: Option<String>,
pub body: Option<String>,
pub created_at: i64,
pub is_external: bool,
}
```
**Warning for `--project` with `--todos` (AC-22):**
```rust
if args.todos && args.project.is_some() {
eprintln!("Warning: Todos are account-wide; project filter not applied");
}
```
---
### IMP-7: Error Handling
**Fulfills:** AC-28, AC-29, AC-30
| Error | Behavior |
|-------|----------|
| 403 Forbidden | Log warning, skip todo sync, continue with other entities |
| 429 Rate Limited | Respect `Retry-After` header using existing retry policy |
| Malformed JSON | Log warning with todo ID, skip item, set `purge_allowed=false`, continue batch |
**Rationale for purge disable on malformed JSON:** If we can't decode a todo, we don't know its `gitlab_todo_id`. Without that, we might accidentally purge a valid todo that was simply malformed in transit. Disabling purge for that sync is the safe choice.
---
### IMP-8: Test Fixtures
**Fulfills:** AC-34
**Location:** `tests/fixtures/todos/`
**`todos_pending.json`:**
```json
[
{
"id": 102,
"project": {"id": 2, "path_with_namespace": "diaspora/client"},
"author": {"id": 1, "username": "admin"},
"action_name": "mentioned",
"target_type": "Issue",
"target": {"id": 11, "iid": 4, "title": "Inventory system"},
"target_url": "https://gitlab.example.com/diaspora/client/-/issues/4",
"body": "@user please review",
"state": "pending",
"created_at": "2026-02-20T10:00:00.000Z",
"updated_at": "2026-02-20T10:00:00.000Z"
}
]
```
**`todos_empty.json`:** `[]`
**`todos_commit_target.json`:** (target.id is string SHA)
**`todos_niche_actions.json`:** (merge_train_removed, etc.)
---
## Rollout Slices
### Dependency Graph
```
Slice A ──────► Slice B ──────┬──────► Slice C
(Schema) (Sync) │ (`lore todos`)
└──────► Slice D
(`lore me`)
Slice C ───┬───► Slice E
Slice D ───┘ (Polish)
```
### Slice A: Schema + Client
**ACs:** AC-1, AC-2, AC-3, AC-4, AC-5
**IMPs:** IMP-1, IMP-2, IMP-4
**Deliverable:** Migration + client method + deserialization tests pass
### Slice B: Sync Integration
**ACs:** AC-6, AC-7, AC-8, AC-9, AC-28, AC-29, AC-30
**IMPs:** IMP-3, IMP-7
**Deliverable:** `lore sync` fetches todos; `--no-todos` works
### Slice C: `lore todos` Command
**ACs:** AC-10, AC-11, AC-12, AC-13, AC-14, AC-15, AC-16, AC-17, AC-24, AC-25
**IMPs:** IMP-5
**Deliverable:** `lore todos` and `lore --robot todos` work
### Slice D: `lore me` Integration
**ACs:** AC-18, AC-19, AC-20, AC-21, AC-22, AC-23, AC-26, AC-27
**IMPs:** IMP-6
**Deliverable:** `lore me --todos` works; summary shows count
### Slice E: Polish
**ACs:** AC-31, AC-32, AC-33, AC-34
**IMPs:** IMP-8
**Deliverable:** Docs updated; all quality gates pass
---
## Design Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Write operations | Read-only | Complexity; glab handles writes |
| Storage | SQLite | Consistent with existing architecture |
| Project filter | Account-wide only | GitLab API is user-scoped |
| Action type display | Core only | Reduce noise; store all for future |
| Attention state | Separate signal | Todos are notifications, not engagement |
| History | Pending only | Simplicity; done todos have no value locally |
| Grouping | By action type | Matches GitLab UI; aids triage |
| Purge strategy | Two-generation grace | Protects against pagination drift during sync |
---
## Out of Scope
- Write operations (mark as done)
- Done todo history tracking
- Filters beyond `--limit`
- Todo-based attention state boosting
- Notification settings API
---
## References
- [GitLab To-Do List API](https://docs.gitlab.com/api/todos/)
- [GitLab User Todos](https://docs.gitlab.com/user/todos/)

View File

@@ -183,6 +183,7 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--max-evidence",
],
),
("related", &["--limit", "--project"]),
(
"who",
&[
@@ -297,6 +298,7 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
"--all",
"--user",
"--fields",
"--reset-cursor",
],
),
];

View File

@@ -710,6 +710,131 @@ fn activity_review_request_system_note() {
assert_eq!(results[0].event_type, ActivityEventType::ReviewRequest);
}
// ─── Since-Last-Check Mention Tests ─────────────────────────────────────────
#[test]
fn since_last_check_detects_mention_with_trailing_comma() {
let conn = setup_test_db();
insert_project(&conn, 1, "group/repo");
insert_issue(&conn, 10, 1, 42, "someone");
let disc_id = 100;
insert_discussion(&conn, disc_id, 1, None, Some(10));
let t = now_ms() - 1000;
insert_note_at(
&conn,
200,
disc_id,
1,
"bob",
false,
"please review this @alice, thanks",
t,
);
let groups = query_since_last_check(&conn, "alice", 0).unwrap();
let total_events: usize = groups.iter().map(|g| g.events.len()).sum();
assert_eq!(total_events, 1, "expected mention with comma to match");
}
#[test]
fn since_last_check_ignores_email_like_text() {
let conn = setup_test_db();
insert_project(&conn, 1, "group/repo");
insert_issue(&conn, 10, 1, 42, "someone");
let disc_id = 100;
insert_discussion(&conn, disc_id, 1, None, Some(10));
let t = now_ms() - 1000;
insert_note_at(
&conn,
200,
disc_id,
1,
"bob",
false,
"contact alice at foo@alice.com",
t,
);
let groups = query_since_last_check(&conn, "alice", 0).unwrap();
let total_events: usize = groups.iter().map(|g| g.events.len()).sum();
assert_eq!(total_events, 0, "email text should not count as mention");
}
#[test]
fn since_last_check_detects_mention_with_trailing_period() {
let conn = setup_test_db();
insert_project(&conn, 1, "group/repo");
insert_issue(&conn, 10, 1, 42, "someone");
let disc_id = 100;
insert_discussion(&conn, disc_id, 1, None, Some(10));
let t = now_ms() - 1000;
insert_note_at(
&conn,
200,
disc_id,
1,
"bob",
false,
"please review this @alice.",
t,
);
let groups = query_since_last_check(&conn, "alice", 0).unwrap();
let total_events: usize = groups.iter().map(|g| g.events.len()).sum();
assert_eq!(total_events, 1, "expected mention with period to match");
}
#[test]
fn since_last_check_detects_mention_inside_parentheses() {
let conn = setup_test_db();
insert_project(&conn, 1, "group/repo");
insert_issue(&conn, 10, 1, 42, "someone");
let disc_id = 100;
insert_discussion(&conn, disc_id, 1, None, Some(10));
let t = now_ms() - 1000;
insert_note_at(
&conn,
200,
disc_id,
1,
"bob",
false,
"thanks (@alice) for the update",
t,
);
let groups = query_since_last_check(&conn, "alice", 0).unwrap();
let total_events: usize = groups.iter().map(|g| g.events.len()).sum();
assert_eq!(total_events, 1, "expected parenthesized mention to match");
}
#[test]
fn since_last_check_ignores_domain_like_text() {
let conn = setup_test_db();
insert_project(&conn, 1, "group/repo");
insert_issue(&conn, 10, 1, 42, "someone");
let disc_id = 100;
insert_discussion(&conn, disc_id, 1, None, Some(10));
let t = now_ms() - 1000;
insert_note_at(
&conn,
200,
disc_id,
1,
"bob",
false,
"@alice.com is the old hostname",
t,
);
let groups = query_since_last_check(&conn, "alice", 0).unwrap();
let total_events: usize = groups.iter().map(|g| g.events.len()).sum();
assert_eq!(
total_events, 0,
"domain-like text should not count as mention"
);
}
// ─── Helper Tests ──────────────────────────────────────────────────────────
#[test]
@@ -734,6 +859,7 @@ fn parse_attention_state_all_variants() {
#[test]
fn parse_event_type_all_variants() {
assert_eq!(parse_event_type("note"), ActivityEventType::Note);
assert_eq!(parse_event_type("mention_note"), ActivityEventType::Note);
assert_eq!(
parse_event_type("status_change"),
ActivityEventType::StatusChange

View File

@@ -9,14 +9,18 @@ use rusqlite::Connection;
use crate::Config;
use crate::cli::MeArgs;
use crate::core::cursor;
use crate::core::db::create_connection;
use crate::core::error::{LoreError, Result};
use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::parse_since;
use self::queries::{query_activity, query_authored_mrs, query_open_issues, query_reviewing_mrs};
use self::types::{AttentionState, MeDashboard, MeSummary};
use self::queries::{
query_activity, query_authored_mrs, query_open_issues, query_reviewing_mrs,
query_since_last_check,
};
use self::types::{AttentionState, MeDashboard, MeSummary, SinceLastCheck};
/// Default activity lookback: 1 day in milliseconds.
const DEFAULT_ACTIVITY_SINCE_DAYS: i64 = 1;
@@ -72,6 +76,20 @@ pub fn resolve_project_scope(
/// summary computation → dashboard assembly → rendering.
pub fn run_me(config: &Config, args: &MeArgs, robot_mode: bool) -> Result<()> {
let start = std::time::Instant::now();
let username = resolve_username(args, config)?;
// 0. Handle --reset-cursor early return
if args.reset_cursor {
cursor::reset_cursor(username)
.map_err(|e| LoreError::Other(format!("reset cursor: {e}")))?;
let elapsed_ms = start.elapsed().as_millis() as u64;
if robot_mode {
render_robot::print_cursor_reset_json(elapsed_ms)?;
} else {
println!("Cursor reset for @{username}. Next `lore me` will establish a new baseline.");
}
return Ok(());
}
// 1. Open DB
let db_path = get_db_path(config.storage.db_path.as_deref());
@@ -89,14 +107,11 @@ pub fn run_me(config: &Config, args: &MeArgs, robot_mode: bool) -> Result<()> {
));
}
// 3. Resolve username
let username = resolve_username(args, config)?;
// 4. Resolve project scope
// 3. Resolve project scope
let project_ids = resolve_project_scope(&conn, args, config)?;
let single_project = project_ids.len() == 1;
// 4. Parse --since (default 1d for activity feed)
let since_ms = match args.since.as_deref() {
Some(raw) => parse_since(raw).ok_or_else(|| {
LoreError::Other(format!(
@@ -106,13 +121,13 @@ pub fn run_me(config: &Config, args: &MeArgs, robot_mode: bool) -> Result<()> {
None => crate::core::time::now_ms() - DEFAULT_ACTIVITY_SINCE_DAYS * MS_PER_DAY,
};
// 5. Determine which sections to query
let show_all = args.show_all_sections();
let want_issues = show_all || args.issues;
let want_mrs = show_all || args.mrs;
let want_activity = show_all || args.activity;
// 6. Run queries for requested sections
let open_issues = if want_issues {
query_open_issues(&conn, username, &project_ids)?
} else {
@@ -137,7 +152,32 @@ pub fn run_me(config: &Config, args: &MeArgs, robot_mode: bool) -> Result<()> {
Vec::new()
};
// 6b. Since-last-check (cursor-based inbox)
let cursor_ms = cursor::read_cursor(username);
// Capture global watermark BEFORE project filtering so --project doesn't
// permanently skip events from other projects.
let mut global_watermark: Option<i64> = None;
let since_last_check = if let Some(prev_cursor) = cursor_ms {
let groups = query_since_last_check(&conn, username, prev_cursor)?;
// Watermark from ALL groups (unfiltered) — this is the true high-water mark
global_watermark = groups.iter().map(|g| g.latest_timestamp).max();
// If --project was passed, filter groups by project for display only
let groups = if !project_ids.is_empty() {
filter_groups_by_project_ids(&conn, &groups, &project_ids)
} else {
groups
};
let total = groups.iter().map(|g| g.events.len()).sum();
Some(SinceLastCheck {
cursor_ms: prev_cursor,
groups,
total_event_count: total,
})
} else {
None // First run — no section shown
};
// 7. Compute summary
let needs_attention_count = open_issues
.iter()
.filter(|i| i.attention_state == AttentionState::NeedsAttention)
@@ -171,7 +211,7 @@ pub fn run_me(config: &Config, args: &MeArgs, robot_mode: bool) -> Result<()> {
needs_attention_count,
};
// 8. Assemble dashboard
let dashboard = MeDashboard {
username: username.to_string(),
since_ms: Some(since_ms),
@@ -180,9 +220,10 @@ pub fn run_me(config: &Config, args: &MeArgs, robot_mode: bool) -> Result<()> {
open_mrs_authored,
reviewing_mrs,
activity,
since_last_check,
};
// 9. Render
let elapsed_ms = start.elapsed().as_millis() as u64;
if robot_mode {
@@ -200,9 +241,43 @@ pub fn run_me(config: &Config, args: &MeArgs, robot_mode: bool) -> Result<()> {
);
}
// 10. Advance cursor AFTER successful render (watermark pattern)
// Uses max event timestamp from UNFILTERED results so --project filtering
// doesn't permanently skip events from other projects.
let watermark = global_watermark.unwrap_or_else(crate::core::time::now_ms);
cursor::write_cursor(username, watermark)
.map_err(|e| LoreError::Other(format!("write cursor: {e}")))?;
Ok(())
}
/// Filter since-last-check groups to only those matching the given project IDs.
/// Used when --project narrows the display scope (cursor is still global).
fn filter_groups_by_project_ids(
conn: &Connection,
groups: &[types::SinceCheckGroup],
project_ids: &[i64],
) -> Vec<types::SinceCheckGroup> {
// Resolve project IDs to paths for matching
let paths: HashSet<String> = project_ids
.iter()
.filter_map(|pid| {
conn.query_row(
"SELECT path_with_namespace FROM projects WHERE id = ?1",
rusqlite::params![pid],
|row| row.get::<_, String>(0),
)
.ok()
})
.collect();
groups
.iter()
.filter(|g| paths.contains(&g.project_path))
.cloned()
.collect()
}
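Not part of the diff, but the display-filter contract above is easy to pin with a test sketch in this module's tests block; the in-memory schema below is a minimal stand-in for the real `projects` table:

#[test]
fn filter_groups_keeps_only_matching_projects() {
    let conn = Connection::open_in_memory().unwrap();
    conn.execute_batch(
        "CREATE TABLE projects (id INTEGER PRIMARY KEY, path_with_namespace TEXT);
         INSERT INTO projects VALUES (1, 'a/repo'), (2, 'b/repo');",
    )
    .unwrap();
    let group = |path: &str| types::SinceCheckGroup {
        entity_type: "issue".to_string(),
        entity_iid: 1,
        entity_title: "t".to_string(),
        project_path: path.to_string(),
        events: Vec::new(),
        latest_timestamp: 0,
    };
    let groups = vec![group("a/repo"), group("b/repo")];
    // Only project 1 ("a/repo") survives the display filter.
    let filtered = filter_groups_by_project_ids(&conn, &groups, &[1]);
    assert_eq!(filtered.len(), 1);
    assert_eq!(filtered[0].project_path, "a/repo");
}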
#[cfg(test)]
mod tests {
use super::*;
@@ -243,6 +318,7 @@ mod tests {
all: false,
user: user.map(String::from),
fields: None,
reset_cursor: false,
}
}


@@ -8,7 +8,13 @@ use rusqlite::Connection;
use crate::core::error::Result;
use regex::Regex;
use std::collections::HashMap;
use super::types::{
ActivityEventType, AttentionState, MeActivityEvent, MeIssue, MeMr, SinceCheckEvent,
SinceCheckGroup,
};
/// Stale threshold: items with no activity for 30 days are marked "stale".
const STALE_THRESHOLD_MS: i64 = 30 * 24 * 3600 * 1000;
@@ -464,6 +470,223 @@ pub fn query_activity(
Ok(events)
}
// ─── Since Last Check (cursor-based inbox) ──────────────────────────────────
/// Raw row from the since-last-check UNION query.
struct RawSinceCheckRow {
timestamp: i64,
event_type: String,
entity_type: String,
entity_iid: i64,
entity_title: String,
project_path: String,
actor: Option<String>,
summary: String,
body_preview: Option<String>,
is_mention_source: bool,
mention_body: Option<String>,
}
/// Query actionable events from others since `cursor_ms`.
/// Returns events from three sources:
/// 1. Others' comments on my open items
/// 2. @mentions on items not covered by source 1 (mentions on my open items already arrive as source-1 comments)
/// 3. Assignment/review-request system notes mentioning me
pub fn query_since_last_check(
conn: &Connection,
username: &str,
cursor_ms: i64,
) -> Result<Vec<SinceCheckGroup>> {
// Build the "my items" subquery fragments (reused from activity).
let my_issue_check = "EXISTS (
SELECT 1 FROM issue_assignees ia
JOIN issues i2 ON ia.issue_id = i2.id
WHERE ia.issue_id = {entity_issue_id} AND ia.username = ?1 AND i2.state = 'opened'
)";
let my_mr_check = "(
EXISTS (SELECT 1 FROM merge_requests mr2 WHERE mr2.id = {entity_mr_id} AND mr2.author_username = ?1 AND mr2.state = 'opened')
OR EXISTS (SELECT 1 FROM mr_reviewers rv
JOIN merge_requests mr3 ON rv.merge_request_id = mr3.id
WHERE rv.merge_request_id = {entity_mr_id} AND rv.username = ?1 AND mr3.state = 'opened')
)";
// Source 1: Others' comments on my open items
let source1 = format!(
"SELECT n.created_at, 'note',
CASE WHEN d.issue_id IS NOT NULL THEN 'issue' ELSE 'mr' END,
COALESCE(i.iid, m.iid),
COALESCE(i.title, m.title),
p.path_with_namespace,
n.author_username,
SUBSTR(n.body, 1, 200),
NULL,
0,
NULL
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN projects p ON d.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
WHERE n.is_system = 0
AND n.created_at > ?2
AND n.author_username != ?1
AND (
(d.issue_id IS NOT NULL AND {issue_check})
OR (d.merge_request_id IS NOT NULL AND {mr_check})
)",
issue_check = my_issue_check.replace("{entity_issue_id}", "d.issue_id"),
mr_check = my_mr_check.replace("{entity_mr_id}", "d.merge_request_id"),
);
// Source 2: @mentions on items NOT already matched by source 1 (the NOT
// clause below excludes my open items so a comment never appears twice).
// The LIKE is only a coarse prefilter; exact word-boundary matching is
// applied afterwards in Rust via contains_exact_mention().
let source2 = format!(
"SELECT n.created_at, 'mention_note',
CASE WHEN d.issue_id IS NOT NULL THEN 'issue' ELSE 'mr' END,
COALESCE(i.iid, m.iid),
COALESCE(i.title, m.title),
p.path_with_namespace,
n.author_username,
SUBSTR(n.body, 1, 200),
NULL,
1,
n.body
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN projects p ON d.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
WHERE n.is_system = 0
AND n.created_at > ?2
AND n.author_username != ?1
AND LOWER(n.body) LIKE '%@' || LOWER(?1) || '%'
AND NOT (
(d.issue_id IS NOT NULL AND {issue_check})
OR (d.merge_request_id IS NOT NULL AND {mr_check})
)",
issue_check = my_issue_check.replace("{entity_issue_id}", "d.issue_id"),
mr_check = my_mr_check.replace("{entity_mr_id}", "d.merge_request_id"),
);
// Source 3: Assignment/review-request system notes mentioning me
let source3 = "SELECT n.created_at,
CASE
WHEN LOWER(n.body) LIKE '%assigned to @%' THEN 'assign'
WHEN LOWER(n.body) LIKE '%unassigned @%' THEN 'unassign'
WHEN LOWER(n.body) LIKE '%requested review from @%' THEN 'review_request'
ELSE 'assign'
END,
CASE WHEN d.issue_id IS NOT NULL THEN 'issue' ELSE 'mr' END,
COALESCE(i.iid, m.iid),
COALESCE(i.title, m.title),
p.path_with_namespace,
n.author_username,
n.body,
NULL,
0,
NULL
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN projects p ON d.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
WHERE n.is_system = 1
AND n.created_at > ?2
AND n.author_username != ?1
AND (
LOWER(n.body) LIKE '%assigned to @' || LOWER(?1) || '%'
OR LOWER(n.body) LIKE '%unassigned @' || LOWER(?1) || '%'
OR LOWER(n.body) LIKE '%requested review from @' || LOWER(?1) || '%'
)"
.to_string();
let full_sql = format!(
"{source1}
UNION ALL {source2}
UNION ALL {source3}
ORDER BY 1 DESC
LIMIT 200"
);
let params: Vec<Box<dyn rusqlite::types::ToSql>> =
vec![Box::new(username.to_string()), Box::new(cursor_ms)];
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let mut stmt = conn.prepare(&full_sql)?;
let rows = stmt.query_map(param_refs.as_slice(), |row| {
Ok(RawSinceCheckRow {
timestamp: row.get(0)?,
event_type: row.get(1)?,
entity_type: row.get(2)?,
entity_iid: row.get(3)?,
entity_title: row.get::<_, Option<String>>(4)?.unwrap_or_default(),
project_path: row.get(5)?,
actor: row.get(6)?,
summary: row.get::<_, Option<String>>(7)?.unwrap_or_default(),
body_preview: row.get(8)?,
is_mention_source: row.get::<_, i32>(9)? != 0,
mention_body: row.get(10)?,
})
})?;
let mention_re = build_exact_mention_regex(username);
let raw_events: Vec<RawSinceCheckRow> = rows
.collect::<std::result::Result<Vec<_>, _>>()?
.into_iter()
.filter(|row| {
!row.is_mention_source
|| row
.mention_body
.as_deref()
.is_some_and(|body| contains_exact_mention(body, &mention_re))
})
.collect();
Ok(group_since_check_events(raw_events))
}
/// Group flat event rows by entity, sort groups newest-first, events within oldest-first.
fn group_since_check_events(rows: Vec<RawSinceCheckRow>) -> Vec<SinceCheckGroup> {
// Key: (entity_type, entity_iid, project_path)
let mut groups: HashMap<(String, i64, String), SinceCheckGroup> = HashMap::new();
for row in rows {
let key = (
row.entity_type.clone(),
row.entity_iid,
row.project_path.clone(),
);
let group = groups.entry(key).or_insert_with(|| SinceCheckGroup {
entity_type: row.entity_type.clone(),
entity_iid: row.entity_iid,
entity_title: row.entity_title.clone(),
project_path: row.project_path.clone(),
events: Vec::new(),
latest_timestamp: 0,
});
if row.timestamp > group.latest_timestamp {
group.latest_timestamp = row.timestamp;
}
group.events.push(SinceCheckEvent {
timestamp: row.timestamp,
event_type: parse_event_type(&row.event_type),
actor: row.actor,
summary: row.summary,
body_preview: row.body_preview,
});
}
let mut result: Vec<SinceCheckGroup> = groups.into_values().collect();
// Sort groups newest-first
result.sort_by_key(|g| std::cmp::Reverse(g.latest_timestamp));
// Sort events within each group oldest-first (read top-to-bottom)
for group in &mut result {
group.events.sort_by_key(|e| e.timestamp);
}
result
}
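A test sketch (not in the diff) that pins the ordering contract: groups sort newest-first by `latest_timestamp`, while events inside each group read oldest-first:

#[test]
fn grouping_orders_groups_newest_first_events_oldest_first() {
    let row = |ts: i64, iid: i64| RawSinceCheckRow {
        timestamp: ts,
        event_type: "note".to_string(),
        entity_type: "issue".to_string(),
        entity_iid: iid,
        entity_title: "t".to_string(),
        project_path: "a/repo".to_string(),
        actor: None,
        summary: "s".to_string(),
        body_preview: None,
        is_mention_source: false,
        mention_body: None,
    };
    // Issue 1 has events at t=300 and t=100; issue 2 has one at t=200.
    let groups = group_since_check_events(vec![row(300, 1), row(100, 1), row(200, 2)]);
    assert_eq!(groups[0].entity_iid, 1); // latest_timestamp 300 sorts first
    assert_eq!(groups[1].entity_iid, 2);
    assert_eq!(groups[0].events[0].timestamp, 100); // oldest-first within a group
    assert_eq!(groups[0].events[1].timestamp, 300);
}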
// ─── Helpers ────────────────────────────────────────────────────────────────
/// Parse attention state string from SQL CASE result.
@@ -482,6 +705,7 @@ fn parse_attention_state(s: &str) -> AttentionState {
fn parse_event_type(s: &str) -> ActivityEventType {
match s {
"note" => ActivityEventType::Note,
"mention_note" => ActivityEventType::Note,
"status_change" => ActivityEventType::StatusChange,
"label_change" => ActivityEventType::LabelChange,
"assign" => ActivityEventType::Assign,
@@ -492,6 +716,46 @@ fn parse_event_type(s: &str) -> ActivityEventType {
}
}
/// Build a case-insensitive `@username` matcher. Word-boundary handling is
/// done separately in `contains_exact_mention`, not in the pattern itself.
fn build_exact_mention_regex(username: &str) -> Regex {
let escaped = regex::escape(username);
let pattern = format!(r"(?i)@{escaped}");
Regex::new(&pattern).expect("mention regex must compile")
}
/// True when `body` contains `@username` as a standalone token: not embedded
/// in a longer username and not a domain-like continuation such as "@alice.com".
fn contains_exact_mention(body: &str, mention_re: &Regex) -> bool {
for m in mention_re.find_iter(body) {
let start = m.start();
let end = m.end();
let prev = body[..start].chars().next_back();
if prev.is_some_and(is_username_char) {
continue;
}
if let Some(next) = body[end..].chars().next() {
// Reject domain-like continuations such as "@alice.com"
if next == '.' {
let after_dot = body[end + next.len_utf8()..].chars().next();
if after_dot.is_some_and(is_username_char) {
continue;
}
}
if is_username_char(next) {
continue;
}
}
return true;
}
false
}
fn is_username_char(ch: char) -> bool {
ch.is_ascii_alphanumeric() || matches!(ch, '_' | '-')
}
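These boundary rules are what the integration tests earlier in this diff exercise through SQL; a direct unit sketch (not in the commit) makes the accepted and rejected shapes explicit:

#[test]
fn exact_mention_boundary_rules() {
    let re = build_exact_mention_regex("alice");
    // Trailing punctuation and surrounding parentheses still count.
    assert!(contains_exact_mention("thanks @alice.", &re));
    assert!(contains_exact_mention("thanks (@alice) for the update", &re));
    // Longer usernames, email-like prefixes, and domains do not.
    assert!(!contains_exact_mention("ping @alice_2", &re));
    assert!(!contains_exact_mention("mail bob@alice", &re));
    assert!(!contains_exact_mention("@alice.com is the old hostname", &re));
}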
/// Build a SQL clause for project ID filtering.
/// `start_idx` is the 1-based parameter index for the first project ID.
/// Returns empty string when no filter is needed (all projects).


@@ -2,6 +2,7 @@ use crate::cli::render::{self, Align, GlyphMode, Icons, LoreRenderer, StyledCell
use super::types::{
ActivityEventType, AttentionState, MeActivityEvent, MeDashboard, MeIssue, MeMr, MeSummary,
SinceLastCheck,
};
// ─── Layout Helpers ─────────────────────────────────────────────────────────
@@ -475,10 +476,113 @@ fn format_entity_ref(entity_type: &str, iid: i64) -> String {
}
}
// ─── Since Last Check ────────────────────────────────────────────────────────
/// Print the "since last check" section at the top of the dashboard.
pub fn print_since_last_check_section(since: &SinceLastCheck, single_project: bool) {
let relative = render::format_relative_time(since.cursor_ms);
if since.groups.is_empty() {
println!(
"\n {}",
Theme::dim().render(&format!(
"No new events since {} ({relative})",
render::format_datetime(since.cursor_ms),
))
);
return;
}
println!(
"{}",
render::section_divider(&format!("Since Last Check ({relative})"))
);
for group in &since.groups {
// Entity header: !247 Fix race condition...
let ref_str = match group.entity_type.as_str() {
"issue" => format!("#{}", group.entity_iid),
"mr" => format!("!{}", group.entity_iid),
_ => format!("{}:{}", group.entity_type, group.entity_iid),
};
let ref_style = match group.entity_type.as_str() {
"issue" => Theme::issue_ref(),
"mr" => Theme::mr_ref(),
_ => Theme::bold(),
};
println!();
println!(
" {} {}",
ref_style.render(&ref_str),
Theme::bold().render(&render::truncate(&group.entity_title, title_width(20))),
);
if !single_project {
println!(" {}", Theme::dim().render(&group.project_path));
}
// Sub-events as indented rows
let summary_max = title_width(42);
let mut table = Table::new()
.columns(3)
.indent(6)
.align(2, Align::Right)
.max_width(1, summary_max);
for event in &group.events {
let badge = activity_badge_label(&event.event_type);
let badge_style = activity_badge_style(&event.event_type);
let actor_prefix = event
.actor
.as_deref()
.map(|a| format!("@{a} "))
.unwrap_or_default();
let clean_summary = event.summary.replace('\n', " ");
let summary_text = format!("{actor_prefix}{clean_summary}");
let time = render::format_relative_time_compact(event.timestamp);
table.add_row(vec![
StyledCell::styled(badge, badge_style),
StyledCell::plain(summary_text),
StyledCell::styled(time, Theme::dim()),
]);
}
let rendered = table.render();
for (line, event) in rendered.lines().zip(group.events.iter()) {
println!("{line}");
if let Some(preview) = &event.body_preview
&& !preview.is_empty()
{
let truncated = render::truncate(preview, 60);
println!(
" {}",
Theme::dim().render(&format!("\"{truncated}\""))
);
}
}
}
// Footer
println!(
"\n {}",
Theme::dim().render(&format!(
"{} events across {} items",
since.total_event_count,
since.groups.len()
))
);
}
// ─── Full Dashboard ──────────────────────────────────────────────────────────
/// Render the complete human-mode dashboard.
pub fn print_me_dashboard(dashboard: &MeDashboard, single_project: bool) {
if let Some(ref since) = dashboard.since_last_check {
print_since_last_check_section(since, single_project);
}
print_summary_header(&dashboard.summary, &dashboard.username);
print_issues_section(&dashboard.open_issues, single_project);
print_authored_mrs_section(&dashboard.open_mrs_authored, single_project);
@@ -495,6 +599,9 @@ pub fn print_me_dashboard_filtered(
show_mrs: bool,
show_activity: bool,
) {
if let Some(ref since) = dashboard.since_last_check {
print_since_last_check_section(since, single_project);
}
print_summary_header(&dashboard.summary, &dashboard.username);
if show_issues {


@@ -5,6 +5,7 @@ use crate::core::time::ms_to_iso;
use super::types::{
ActivityEventType, AttentionState, MeActivityEvent, MeDashboard, MeIssue, MeMr, MeSummary,
SinceCheckEvent, SinceCheckGroup, SinceLastCheck,
};
// ─── Robot JSON Output (Task #18) ────────────────────────────────────────────
@@ -43,6 +44,27 @@ pub fn print_me_json(
Ok(())
}
/// Print `--reset-cursor` response using standard robot envelope.
pub fn print_cursor_reset_json(elapsed_ms: u64) -> crate::core::error::Result<()> {
let value = cursor_reset_envelope_json(elapsed_ms);
let json = serde_json::to_string(&value)
.map_err(|e| crate::core::error::LoreError::Other(format!("JSON serialization: {e}")))?;
println!("{json}");
Ok(())
}
fn cursor_reset_envelope_json(elapsed_ms: u64) -> serde_json::Value {
serde_json::json!({
"ok": true,
"data": {
"cursor_reset": true
},
"meta": {
"elapsed_ms": elapsed_ms
}
})
}
// ─── JSON Envelope ───────────────────────────────────────────────────────────
#[derive(Serialize)]
@@ -57,6 +79,8 @@ struct MeDataJson {
username: String,
since_iso: Option<String>,
summary: SummaryJson,
#[serde(skip_serializing_if = "Option::is_none")]
since_last_check: Option<SinceLastCheckJson>,
open_issues: Vec<IssueJson>,
open_mrs_authored: Vec<MrJson>,
reviewing_mrs: Vec<MrJson>,
@@ -69,6 +93,7 @@ impl MeDataJson {
username: d.username.clone(),
since_iso: d.since_ms.map(ms_to_iso),
summary: SummaryJson::from(&d.summary),
since_last_check: d.since_last_check.as_ref().map(SinceLastCheckJson::from),
open_issues: d.open_issues.iter().map(IssueJson::from).collect(),
open_mrs_authored: d.open_mrs_authored.iter().map(MrJson::from).collect(),
reviewing_mrs: d.reviewing_mrs.iter().map(MrJson::from).collect(),
@@ -197,6 +222,67 @@ impl From<&MeActivityEvent> for ActivityJson {
}
}
// ─── Since Last Check ────────────────────────────────────────────────────────
#[derive(Serialize)]
struct SinceLastCheckJson {
cursor_iso: String,
total_event_count: usize,
groups: Vec<SinceCheckGroupJson>,
}
impl From<&SinceLastCheck> for SinceLastCheckJson {
fn from(s: &SinceLastCheck) -> Self {
Self {
cursor_iso: ms_to_iso(s.cursor_ms),
total_event_count: s.total_event_count,
groups: s.groups.iter().map(SinceCheckGroupJson::from).collect(),
}
}
}
#[derive(Serialize)]
struct SinceCheckGroupJson {
entity_type: String,
entity_iid: i64,
entity_title: String,
project: String,
events: Vec<SinceCheckEventJson>,
}
impl From<&SinceCheckGroup> for SinceCheckGroupJson {
fn from(g: &SinceCheckGroup) -> Self {
Self {
entity_type: g.entity_type.clone(),
entity_iid: g.entity_iid,
entity_title: g.entity_title.clone(),
project: g.project_path.clone(),
events: g.events.iter().map(SinceCheckEventJson::from).collect(),
}
}
}
#[derive(Serialize)]
struct SinceCheckEventJson {
timestamp_iso: String,
event_type: String,
actor: Option<String>,
summary: String,
body_preview: Option<String>,
}
impl From<&SinceCheckEvent> for SinceCheckEventJson {
fn from(e: &SinceCheckEvent) -> Self {
Self {
timestamp_iso: ms_to_iso(e.timestamp),
event_type: event_type_str(&e.event_type),
actor: e.actor.clone(),
summary: e.summary.clone(),
body_preview: e.body_preview.clone(),
}
}
}
// ─── Helpers ─────────────────────────────────────────────────────────────────
/// Convert `AttentionState` to its programmatic string representation.
@@ -331,4 +417,12 @@ mod tests {
assert!(!json.is_own);
assert_eq!(json.body_preview, Some("This looks good".to_string()));
}
#[test]
fn cursor_reset_envelope_includes_meta_elapsed_ms() {
let value = cursor_reset_envelope_json(17);
assert_eq!(value["ok"], serde_json::json!(true));
assert_eq!(value["data"]["cursor_reset"], serde_json::json!(true));
assert_eq!(value["meta"]["elapsed_ms"], serde_json::json!(17));
}
}


@@ -86,6 +86,34 @@ pub struct MeActivityEvent {
pub body_preview: Option<String>,
}
/// A single actionable event in the "since last check" section.
#[derive(Clone)]
pub struct SinceCheckEvent {
pub timestamp: i64,
pub event_type: ActivityEventType,
pub actor: Option<String>,
pub summary: String,
pub body_preview: Option<String>,
}
/// Events grouped by entity for the "since last check" section.
#[derive(Clone)]
pub struct SinceCheckGroup {
pub entity_type: String,
pub entity_iid: i64,
pub entity_title: String,
pub project_path: String,
pub events: Vec<SinceCheckEvent>,
pub latest_timestamp: i64,
}
/// The complete "since last check" result.
pub struct SinceLastCheck {
pub cursor_ms: i64,
pub groups: Vec<SinceCheckGroup>,
pub total_event_count: usize,
}
/// The complete dashboard result.
pub struct MeDashboard {
pub username: String,
@@ -95,4 +123,5 @@ pub struct MeDashboard {
pub open_mrs_authored: Vec<MeMr>,
pub reviewing_mrs: Vec<MeMr>,
pub activity: Vec<MeActivityEvent>,
pub since_last_check: Option<SinceLastCheck>,
}


@@ -11,6 +11,7 @@ pub mod ingest;
pub mod init;
pub mod list;
pub mod me;
pub mod related;
pub mod search;
pub mod show;
pub mod stats;
@@ -48,6 +49,7 @@ pub use list::{
print_list_notes, print_list_notes_json, query_notes, run_list_issues, run_list_mrs,
};
pub use me::run_me;
pub use related::{RelatedResponse, print_related_human, print_related_json, run_related};
pub use search::{
SearchCliFilters, SearchResponse, print_search_results, print_search_results_json, run_search,
};

src/cli/commands/related.rs (new file, 637 lines)

@@ -0,0 +1,637 @@
//! Semantic similarity discovery: find related entities via vector search.
use std::collections::HashSet;
use rusqlite::Connection;
use serde::Serialize;
use crate::cli::render::{Icons, Theme};
use crate::cli::robot::RobotMeta;
use crate::core::config::Config;
use crate::core::db::create_connection;
use crate::core::error::{LoreError, Result};
use crate::core::paths::get_db_path;
use crate::core::project::resolve_project;
use crate::core::time::ms_to_iso;
use crate::embedding::ollama::{OllamaClient, OllamaConfig};
use crate::search::search_vector;
// ---------------------------------------------------------------------------
// Response types
// ---------------------------------------------------------------------------
#[derive(Debug, Serialize)]
pub struct RelatedResponse {
pub mode: String,
#[serde(skip_serializing_if = "Option::is_none")]
pub source: Option<RelatedSource>,
#[serde(skip_serializing_if = "Option::is_none")]
pub query: Option<String>,
pub results: Vec<RelatedResult>,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub warnings: Vec<String>,
}
#[derive(Debug, Serialize)]
pub struct RelatedSource {
pub source_type: String,
pub iid: i64,
pub title: String,
pub project_path: String,
}
#[derive(Debug, Serialize)]
pub struct RelatedResult {
pub source_type: String,
pub iid: i64,
pub title: String,
pub url: String,
pub similarity_score: f64,
pub project_path: String,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub shared_labels: Vec<String>,
pub author: Option<String>,
pub updated_at: String,
}
// ---------------------------------------------------------------------------
// Internal row types
// ---------------------------------------------------------------------------
struct DocumentRow {
id: i64,
source_type: String,
source_id: i64,
#[allow(dead_code)]
project_id: i64,
#[allow(dead_code)]
title: Option<String>,
url: Option<String>,
content_text: String,
label_names: Option<String>,
author_username: Option<String>,
updated_at: Option<i64>,
}
struct EntityInfo {
#[allow(dead_code)]
iid: i64,
title: String,
project_path: String,
}
// ---------------------------------------------------------------------------
// Main entry point
// ---------------------------------------------------------------------------
/// Run the related command.
///
/// Modes:
/// - Entity mode: `lore related issues 42` or `lore related mrs 99`
/// - Query mode: `lore related 'search terms'`
pub async fn run_related(
config: &Config,
query_or_type: &str,
iid: Option<i64>,
limit: usize,
project: Option<&str>,
) -> Result<RelatedResponse> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
// Check if embeddings exist
let embedding_count: i64 = conn
.query_row("SELECT COUNT(*) FROM embedding_metadata", [], |row| {
row.get(0)
})
.unwrap_or(0);
if embedding_count == 0 {
return Err(LoreError::Other(
"No embeddings found. Run 'lore embed' first to generate vector embeddings.".into(),
));
}
// Validate input
if query_or_type.trim().is_empty() {
return Err(LoreError::Other(
"Query cannot be empty. Provide an entity type (issues/mrs) and IID, or a search query.".into(),
));
}
// Determine mode: entity vs query
let entity_type = match query_or_type.to_lowercase().as_str() {
"issues" | "issue" | "i" => Some("issue"),
"mrs" | "mr" | "m" | "merge_request" => Some("merge_request"),
_ => None,
};
if let Some(etype) = entity_type {
// Entity mode
let iid = iid.ok_or_else(|| {
LoreError::Other("Entity mode requires an IID (e.g., 'lore related issues 42')".into())
})?;
run_related_entity(&conn, config, etype, iid, limit, project).await
} else {
// Query mode - treat query_or_type as free text
run_related_query(&conn, config, query_or_type, limit, project).await
}
}
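A usage sketch of the two modes (not in the commit; the `lore::` paths assume this diff's re-exports, and `Box<dyn Error>` stands in for the crate's own error type):

use lore::Config;
use lore::cli::commands::run_related;

// Assumes a synced DB with embeddings already generated (`lore embed`).
async fn demo(config: &Config) -> Result<(), Box<dyn std::error::Error>> {
    // Entity mode: the first positional is an entity type, so the IID is required.
    let by_entity = run_related(config, "issues", Some(42), 10, None).await?;
    assert_eq!(by_entity.mode, "entity");

    // Query mode: any other first argument is embedded as free text.
    let by_query = run_related(config, "authentication flow", None, 5, Some("group/repo")).await?;
    assert_eq!(by_query.mode, "query");
    Ok(())
}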
async fn run_related_entity(
conn: &Connection,
config: &Config,
entity_type: &str,
iid: i64,
limit: usize,
project_filter: Option<&str>,
) -> Result<RelatedResponse> {
// Find the source document
let source_doc = find_entity_document(conn, entity_type, iid, project_filter)?;
let source_info = get_entity_info(conn, entity_type, source_doc.source_id)?;
// Embed the source content
let embedding = embed_text(config, &source_doc.content_text).await?;
// Search for similar documents (limit + 1 to account for filtering self)
let vector_results = search_vector(conn, &embedding, limit.saturating_add(1))?;
// Filter out self and hydrate results
let source_labels = parse_label_names(&source_doc.label_names);
let mut results = Vec::new();
let mut warnings = Vec::new();
for vr in vector_results {
// Skip self
if vr.document_id == source_doc.id {
continue;
}
if let Some(result) = hydrate_result(conn, vr.document_id, vr.distance, &source_labels)? {
results.push(result);
}
if results.len() >= limit {
break;
}
}
// Check for low similarity
if !results.is_empty() && results.iter().all(|r| r.similarity_score < 0.3) {
warnings.push("No strongly related entities found (all scores < 0.3)".to_string());
}
Ok(RelatedResponse {
mode: "entity".to_string(),
source: Some(RelatedSource {
source_type: entity_type.to_string(),
iid,
title: source_info.title,
project_path: source_info.project_path,
}),
query: None,
results,
warnings,
})
}
async fn run_related_query(
conn: &Connection,
config: &Config,
query: &str,
limit: usize,
project_filter: Option<&str>,
) -> Result<RelatedResponse> {
let mut warnings = Vec::new();
// Warn if query is very short
if query.split_whitespace().count() <= 2 {
warnings.push("Short queries may produce noisy results".to_string());
}
// Embed the query
let embedding = embed_text(config, query).await?;
// Search for similar documents (fetch extra to allow for project filtering)
let vector_results = search_vector(conn, &embedding, limit.saturating_mul(2))?;
// Filter by project if specified and hydrate
let project_id = project_filter
.map(|p| resolve_project(conn, p))
.transpose()?;
let mut results = Vec::new();
let empty_labels: HashSet<String> = HashSet::new();
for vr in vector_results {
// Check project filter
if let Some(pid) = project_id {
let doc_project_id: Option<i64> = conn
.query_row(
"SELECT project_id FROM documents WHERE id = ?1",
[vr.document_id],
|row| row.get(0),
)
.ok();
if doc_project_id != Some(pid) {
continue;
}
}
if let Some(result) = hydrate_result(conn, vr.document_id, vr.distance, &empty_labels)? {
results.push(result);
}
if results.len() >= limit {
break;
}
}
// Check for low similarity
if !results.is_empty() && results.iter().all(|r| r.similarity_score < 0.3) {
warnings.push("No strongly related entities found (all scores < 0.3)".to_string());
}
Ok(RelatedResponse {
mode: "query".to_string(),
source: None,
query: Some(query.to_string()),
results,
warnings,
})
}
// ---------------------------------------------------------------------------
// DB helpers
// ---------------------------------------------------------------------------
fn find_entity_document(
conn: &Connection,
entity_type: &str,
iid: i64,
project_filter: Option<&str>,
) -> Result<DocumentRow> {
let table = match entity_type {
"issue" => "issues",
"merge_request" => "merge_requests",
_ => {
return Err(LoreError::Other(format!(
"Unknown entity type: {entity_type}"
)));
}
};
let (sql, params): (String, Vec<Box<dyn rusqlite::ToSql>>) = match project_filter {
Some(project) => {
let project_id = resolve_project(conn, project)?;
(
format!(
"SELECT d.id, d.source_type, d.source_id, d.project_id, d.title, d.url,
d.content_text, d.label_names, d.author_username, d.updated_at
FROM documents d
JOIN {table} e ON d.source_id = e.id
WHERE d.source_type = ?1 AND e.iid = ?2 AND e.project_id = ?3"
),
vec![
Box::new(entity_type.to_string()),
Box::new(iid),
Box::new(project_id),
],
)
}
None => (
format!(
"SELECT d.id, d.source_type, d.source_id, d.project_id, d.title, d.url,
d.content_text, d.label_names, d.author_username, d.updated_at
FROM documents d
JOIN {table} e ON d.source_id = e.id
WHERE d.source_type = ?1 AND e.iid = ?2"
),
vec![Box::new(entity_type.to_string()), Box::new(iid)],
),
};
let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
let mut stmt = conn.prepare(&sql)?;
let rows: Vec<DocumentRow> = stmt
.query_map(param_refs.as_slice(), |row| {
Ok(DocumentRow {
id: row.get(0)?,
source_type: row.get(1)?,
source_id: row.get(2)?,
project_id: row.get(3)?,
title: row.get(4)?,
url: row.get(5)?,
content_text: row.get(6)?,
label_names: row.get(7)?,
author_username: row.get(8)?,
updated_at: row.get(9)?,
})
})?
.collect::<std::result::Result<Vec<_>, _>>()?;
match rows.len() {
0 => Err(LoreError::NotFound(format!(
"{entity_type} #{iid} not found (run 'lore sync' first?)"
))),
1 => Ok(rows.into_iter().next().unwrap()),
_ => Err(LoreError::Ambiguous(format!(
"{entity_type} #{iid} exists in multiple projects. Use --project to specify."
))),
}
}
fn get_entity_info(conn: &Connection, entity_type: &str, entity_id: i64) -> Result<EntityInfo> {
let table = match entity_type {
"issue" => "issues",
"merge_request" => "merge_requests",
_ => {
return Err(LoreError::Other(format!(
"Unknown entity type: {entity_type}"
)));
}
};
let sql = format!(
"SELECT e.iid, e.title, p.path_with_namespace
FROM {table} e
JOIN projects p ON e.project_id = p.id
WHERE e.id = ?1"
);
conn.query_row(&sql, [entity_id], |row| {
Ok(EntityInfo {
iid: row.get(0)?,
title: row.get(1)?,
project_path: row.get(2)?,
})
})
.map_err(|e| LoreError::NotFound(format!("Entity not found: {e}")))
}
fn hydrate_result(
conn: &Connection,
document_id: i64,
distance: f64,
source_labels: &HashSet<String>,
) -> Result<Option<RelatedResult>> {
let doc: Option<DocumentRow> = conn
.query_row(
"SELECT d.id, d.source_type, d.source_id, d.project_id, d.title, d.url,
d.content_text, d.label_names, d.author_username, d.updated_at
FROM documents d
WHERE d.id = ?1",
[document_id],
|row| {
Ok(DocumentRow {
id: row.get(0)?,
source_type: row.get(1)?,
source_id: row.get(2)?,
project_id: row.get(3)?,
title: row.get(4)?,
url: row.get(5)?,
content_text: row.get(6)?,
label_names: row.get(7)?,
author_username: row.get(8)?,
updated_at: row.get(9)?,
})
},
)
.ok();
let Some(doc) = doc else {
return Ok(None);
};
// Skip discussion/note documents - we want entities only
if doc.source_type == "discussion" || doc.source_type == "note" {
return Ok(None);
}
// Get IID from the source entity
let table = match doc.source_type.as_str() {
"issue" => "issues",
"merge_request" => "merge_requests",
_ => return Ok(None),
};
// Get IID and title from the source entity - skip gracefully if not found
// (this handles orphaned documents where the entity was deleted)
let entity_info: Option<(i64, String, String)> = conn
.query_row(
&format!(
"SELECT e.iid, e.title, p.path_with_namespace
FROM {table} e
JOIN projects p ON e.project_id = p.id
WHERE e.id = ?1"
),
[doc.source_id],
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
)
.ok();
let Some((iid, title, project_path)) = entity_info else {
// Entity not found in database - skip this result
return Ok(None);
};
// Compute shared labels
let result_labels = parse_label_names(&doc.label_names);
let shared_labels: Vec<String> = source_labels
.intersection(&result_labels)
.cloned()
.collect();
Ok(Some(RelatedResult {
source_type: doc.source_type,
iid,
title,
url: doc.url.unwrap_or_default(),
similarity_score: distance_to_similarity(distance),
project_path,
shared_labels,
author: doc.author_username,
updated_at: doc.updated_at.map(ms_to_iso).unwrap_or_default(),
}))
}
// ---------------------------------------------------------------------------
// Embedding helper
// ---------------------------------------------------------------------------
async fn embed_text(config: &Config, text: &str) -> Result<Vec<f32>> {
let ollama = OllamaClient::new(OllamaConfig {
base_url: config.embedding.base_url.clone(),
model: config.embedding.model.clone(),
timeout_secs: 60,
});
let embeddings = ollama.embed_batch(&[text]).await?;
embeddings
.into_iter()
.next()
.ok_or_else(|| LoreError::EmbeddingFailed {
document_id: 0,
reason: "No embedding returned".to_string(),
})
}
// ---------------------------------------------------------------------------
// Utilities
// ---------------------------------------------------------------------------
/// Convert L2 distance to a 0-1 similarity score.
/// Uses inverse relationship: closer (lower distance) = higher similarity.
fn distance_to_similarity(distance: f64) -> f64 {
1.0 / (1.0 + distance)
}
fn parse_label_names(label_names_json: &Option<String>) -> HashSet<String> {
label_names_json
.as_deref()
.and_then(|s| serde_json::from_str::<Vec<String>>(s).ok())
.unwrap_or_default()
.into_iter()
.collect()
}
// ---------------------------------------------------------------------------
// Printers
// ---------------------------------------------------------------------------
pub fn print_related_human(response: &RelatedResponse) {
// Header
let header = match &response.source {
Some(src) => format!("Related to {} #{}: {}", src.source_type, src.iid, src.title),
None => format!(
"Related to query: \"{}\"",
response.query.as_deref().unwrap_or("")
),
};
println!("{}", Theme::bold().render(&header));
println!("{}", "-".repeat(header.len().min(70)));
println!();
if response.results.is_empty() {
println!("No related entities found.");
return;
}
for (i, result) in response.results.iter().enumerate() {
let type_icon = match result.source_type.as_str() {
"issue" => Icons::issue_opened(),
"merge_request" => Icons::mr_opened(),
_ => " ",
};
let score_bar_len = (result.similarity_score * 10.0) as usize;
let score_bar: String = "\u{2588}".repeat(score_bar_len);
println!(
"{:>2}. {} {} #{} ({:.0}%) {}",
i + 1,
type_icon,
result.source_type,
result.iid,
result.similarity_score * 100.0,
score_bar
);
println!(" {}", result.title);
println!(
" {} | @{}",
result.project_path,
result.author.as_deref().unwrap_or("?")
);
if !result.shared_labels.is_empty() {
println!(" Labels shared: {}", result.shared_labels.join(", "));
}
println!();
}
// Warnings
for warning in &response.warnings {
println!("{} {}", Theme::warning().render(Icons::warning()), warning);
}
}
pub fn print_related_json(response: &RelatedResponse, elapsed_ms: u64) {
let meta = RobotMeta { elapsed_ms };
let output = serde_json::json!({
"ok": true,
"data": response,
"meta": meta,
});
match serde_json::to_string(&output) {
Ok(json) => println!("{json}"),
Err(e) => eprintln!("Error serializing to JSON: {e}"),
}
}
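The envelope shape is easy to lock down with a test sketch in the tests module below (not in the commit); the `skip_serializing_if` attributes mean an empty response serializes without `source` or `warnings`:

#[test]
fn related_robot_envelope_shape() {
    let resp = RelatedResponse {
        mode: "query".to_string(),
        source: None,
        query: Some("auth".to_string()),
        results: Vec::new(),
        warnings: Vec::new(),
    };
    let value = serde_json::json!({
        "ok": true,
        "data": &resp,
        "meta": RobotMeta { elapsed_ms: 3 }
    });
    assert_eq!(value["ok"], serde_json::json!(true));
    assert_eq!(value["data"]["mode"], serde_json::json!("query"));
    // Skipped fields are absent, not null.
    assert!(value["data"].get("source").is_none());
    assert!(value["data"].get("warnings").is_none());
}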
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_distance_to_similarity_identical() {
assert!((distance_to_similarity(0.0) - 1.0).abs() < f64::EPSILON);
}
#[test]
fn test_distance_to_similarity_midpoint() {
assert!((distance_to_similarity(1.0) - 0.5).abs() < f64::EPSILON);
}
#[test]
fn test_distance_to_similarity_large() {
let sim = distance_to_similarity(2.0);
assert!(sim > 0.0 && sim < 0.5);
assert!((sim - 0.333_333_333_333_333_3).abs() < 0.001);
}
#[test]
fn test_distance_to_similarity_range() {
for d in [0.0, 0.1, 0.5, 1.0, 2.0, 5.0, 10.0] {
let sim = distance_to_similarity(d);
assert!(
sim > 0.0 && sim <= 1.0,
"score {sim} out of range for distance {d}"
);
}
}
#[test]
fn test_parse_label_names_valid() {
let json = Some(r#"["bug", "priority::high"]"#.to_string());
let labels = parse_label_names(&json);
assert!(labels.contains("bug"));
assert!(labels.contains("priority::high"));
assert_eq!(labels.len(), 2);
}
#[test]
fn test_parse_label_names_empty() {
let labels = parse_label_names(&None);
assert!(labels.is_empty());
}
#[test]
fn test_parse_label_names_invalid_json() {
let json = Some("not valid json".to_string());
let labels = parse_label_names(&json);
assert!(labels.is_empty());
}
#[test]
fn test_parse_label_names_empty_array() {
let json = Some("[]".to_string());
let labels = parse_label_names(&json);
assert!(labels.is_empty());
}
}


@@ -12,6 +12,10 @@ use crate::core::time::{format_full_datetime, ms_to_iso};
const RECENT_RUNS_LIMIT: usize = 10;
fn is_zero(value: &i64) -> bool {
*value == 0
}
#[derive(Debug)]
pub struct SyncRunInfo {
pub id: i64,
@@ -24,6 +28,15 @@ pub struct SyncRunInfo {
pub total_items_processed: i64,
pub total_errors: i64,
pub stages: Option<Vec<StageTiming>>,
// Per-entity counts (from migration 027)
pub issues_fetched: i64,
pub issues_ingested: i64,
pub mrs_fetched: i64,
pub mrs_ingested: i64,
pub skipped_stale: i64,
pub docs_regenerated: i64,
pub docs_embedded: i64,
pub warnings_count: i64,
}
#[derive(Debug)]
@@ -68,7 +81,9 @@ pub fn run_sync_status(config: &Config) -> Result<SyncStatusResult> {
fn get_recent_sync_runs(conn: &Connection, limit: usize) -> Result<Vec<SyncRunInfo>> {
let mut stmt = conn.prepare(
"SELECT id, started_at, finished_at, status, command, error,
run_id, total_items_processed, total_errors, metrics_json,
issues_fetched, issues_ingested, mrs_fetched, mrs_ingested,
skipped_stale, docs_regenerated, docs_embedded, warnings_count
FROM sync_runs
ORDER BY started_at DESC
LIMIT ?1",
@@ -91,6 +106,14 @@ fn get_recent_sync_runs(conn: &Connection, limit: usize) -> Result<Vec<SyncRunIn
total_items_processed: row.get::<_, Option<i64>>(7)?.unwrap_or(0),
total_errors: row.get::<_, Option<i64>>(8)?.unwrap_or(0),
stages,
issues_fetched: row.get::<_, Option<i64>>(10)?.unwrap_or(0),
issues_ingested: row.get::<_, Option<i64>>(11)?.unwrap_or(0),
mrs_fetched: row.get::<_, Option<i64>>(12)?.unwrap_or(0),
mrs_ingested: row.get::<_, Option<i64>>(13)?.unwrap_or(0),
skipped_stale: row.get::<_, Option<i64>>(14)?.unwrap_or(0),
docs_regenerated: row.get::<_, Option<i64>>(15)?.unwrap_or(0),
docs_embedded: row.get::<_, Option<i64>>(16)?.unwrap_or(0),
warnings_count: row.get::<_, Option<i64>>(17)?.unwrap_or(0),
})
})?
.collect();
@@ -198,6 +221,23 @@ struct SyncRunJsonInfo {
error: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
stages: Option<Vec<StageTiming>>,
// Per-entity counts
#[serde(skip_serializing_if = "is_zero")]
issues_fetched: i64,
#[serde(skip_serializing_if = "is_zero")]
issues_ingested: i64,
#[serde(skip_serializing_if = "is_zero")]
mrs_fetched: i64,
#[serde(skip_serializing_if = "is_zero")]
mrs_ingested: i64,
#[serde(skip_serializing_if = "is_zero")]
skipped_stale: i64,
#[serde(skip_serializing_if = "is_zero")]
docs_regenerated: i64,
#[serde(skip_serializing_if = "is_zero")]
docs_embedded: i64,
#[serde(skip_serializing_if = "is_zero")]
warnings_count: i64,
}
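The `is_zero` guard keeps robot output compact: counters that are zero vanish from the JSON instead of padding every run. A self-contained illustration of the mechanic (simplified struct, not the real `SyncRunJsonInfo`):

use serde::Serialize;

fn is_zero(value: &i64) -> bool {
    *value == 0
}

#[derive(Serialize)]
struct Counts {
    #[serde(skip_serializing_if = "is_zero")]
    issues_fetched: i64,
    #[serde(skip_serializing_if = "is_zero")]
    mrs_fetched: i64,
}

fn main() {
    let c = Counts { issues_fetched: 12, mrs_fetched: 0 };
    // Prints {"issues_fetched":12}; the zero counter is omitted entirely.
    println!("{}", serde_json::to_string(&c).unwrap());
}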
#[derive(Serialize)]
@@ -237,6 +277,14 @@ pub fn print_sync_status_json(result: &SyncStatusResult, elapsed_ms: u64) {
total_errors: run.total_errors,
error: run.error.clone(),
stages: run.stages.clone(),
issues_fetched: run.issues_fetched,
issues_ingested: run.issues_ingested,
mrs_fetched: run.mrs_fetched,
mrs_ingested: run.mrs_ingested,
skipped_stale: run.skipped_stale,
docs_regenerated: run.docs_regenerated,
docs_embedded: run.docs_embedded,
warnings_count: run.warnings_count,
}
})
.collect();


@@ -175,7 +175,7 @@ pub async fn run_timeline(config: &Config, params: &TimelineParams) -> Result<Ti
query: params.query.clone(),
search_mode: seed_result.search_mode,
events,
total_filtered_events: total_before_limit,
seed_entities: seed_result.seed_entities,
expanded_entities: expand_result.expanded_entities,
unresolved_references: expand_result.unresolved_references,
@@ -342,7 +342,7 @@ fn format_entity_ref(entity_type: &str, iid: i64) -> String {
/// Render timeline as robot-mode JSON in {ok, data, meta} envelope.
pub fn print_timeline_json_with_meta(
result: &TimelineResult,
total_filtered_events: usize,
depth: u32,
include_mentions: bool,
fields: Option<&[String]>,
@@ -355,7 +355,7 @@ pub fn print_timeline_json_with_meta(
expansion_depth: depth,
include_mentions,
total_entities: result.seed_entities.len() + result.expanded_entities.len(),
total_events: total_filtered_events,
evidence_notes_included: count_evidence_notes(&result.events),
discussion_threads_included: count_discussion_threads(&result.events),
unresolved_references: result.unresolved_references.len(),


@@ -293,6 +293,28 @@ pub enum Commands {
project: Option<String>,
},
/// Find semantically related entities via vector search
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore related issues 42 # Find entities related to issue #42
lore related mrs 99 -p group/repo # Related to MR #99 in specific project
lore related 'authentication flow' # Find entities matching free text query
lore --robot related issues 42 -n 5 # JSON output, limit 5 results")]
Related {
/// Entity type (issues, mrs) or free text query
query_or_type: String,
/// Entity IID (required when first arg is entity type)
iid: Option<i64>,
/// Maximum results
#[arg(short = 'n', long, default_value = "10")]
limit: usize,
/// Scope to project (fuzzy match)
#[arg(short, long)]
project: Option<String>,
},
/// Manage cron-based automatic syncing
#[command(after_help = "\x1b[1mExamples:\x1b[0m
lore cron install # Install cron job (every 8 minutes)
@@ -1095,6 +1117,10 @@ pub struct MeArgs {
/// Select output fields (comma-separated, or 'minimal' preset)
#[arg(long, help_heading = "Output", value_delimiter = ',')]
pub fields: Option<Vec<String>>,
/// Reset the since-last-check cursor (next run establishes a new baseline; no section shown)
#[arg(long, help_heading = "Output")]
pub reset_cursor: bool,
}
impl MeArgs {

src/core/cursor.rs (new file, 152 lines)

@@ -0,0 +1,152 @@
// ─── Me Cursor Persistence ──────────────────────────────────────────────────
//
// File-based cursor for the "since last check" section of `lore me`.
// Stores per-user timestamps in ~/.local/share/lore/me_cursor_<username>.json.
use std::io;
use std::io::Write;
use serde::{Deserialize, Serialize};
use super::paths::get_cursor_path;
#[derive(Serialize, Deserialize)]
struct CursorFile {
last_check_ms: i64,
}
/// Read the last-check cursor. Returns `None` if the file doesn't exist or is corrupt.
pub fn read_cursor(username: &str) -> Option<i64> {
let path = get_cursor_path(username);
let data = std::fs::read_to_string(path).ok()?;
let cursor: CursorFile = serde_json::from_str(&data).ok()?;
Some(cursor.last_check_ms)
}
/// Write the last-check cursor atomically.
pub fn write_cursor(username: &str, timestamp_ms: i64) -> io::Result<()> {
let path = get_cursor_path(username);
if let Some(parent) = path.parent() {
std::fs::create_dir_all(parent)?;
let cursor = CursorFile {
last_check_ms: timestamp_ms,
};
let json = serde_json::to_string(&cursor).map_err(io::Error::other)?;
let nonce = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_nanos())
.unwrap_or(0);
let file_name = path
.file_name()
.and_then(|name| name.to_str())
.unwrap_or("me_cursor.json");
let temp_path = parent.join(format!(".{file_name}.{nonce}.tmp"));
{
let mut temp_file = std::fs::File::create(&temp_path)?;
temp_file.write_all(json.as_bytes())?;
temp_file.sync_all()?;
}
std::fs::rename(&temp_path, &path)?;
return Ok(());
}
Err(io::Error::new(
io::ErrorKind::InvalidInput,
"cursor path has no parent directory",
))
}
/// Reset the cursor by deleting the file. No-op if it doesn't exist.
pub fn reset_cursor(username: &str) -> io::Result<()> {
let path = get_cursor_path(username);
match std::fs::remove_file(path) {
Ok(()) => Ok(()),
Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
Err(e) => Err(e),
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::sync::{Mutex, OnceLock};
fn env_lock() -> &'static Mutex<()> {
static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
LOCK.get_or_init(|| Mutex::new(()))
}
fn with_temp_xdg_data_home<T>(f: impl FnOnce() -> T) -> T {
let _guard = env_lock().lock().unwrap();
let previous = std::env::var_os("XDG_DATA_HOME");
let dir = tempfile::tempdir().unwrap();
// SAFETY: test-only scoped env override.
unsafe { std::env::set_var("XDG_DATA_HOME", dir.path()) };
let result = f();
match previous {
Some(value) => {
// SAFETY: restoring prior environment for test isolation.
unsafe { std::env::set_var("XDG_DATA_HOME", value) };
}
None => {
// SAFETY: restoring prior environment for test isolation.
unsafe { std::env::remove_var("XDG_DATA_HOME") };
}
}
result
}
#[test]
fn read_cursor_returns_none_when_missing() {
with_temp_xdg_data_home(|| {
assert_eq!(read_cursor("alice"), None);
});
}
#[test]
fn cursor_roundtrip() {
with_temp_xdg_data_home(|| {
write_cursor("alice", 1_700_000_000_000).unwrap();
assert_eq!(read_cursor("alice"), Some(1_700_000_000_000));
});
}
#[test]
fn cursor_isolated_per_user() {
with_temp_xdg_data_home(|| {
write_cursor("alice", 100).unwrap();
write_cursor("bob", 200).unwrap();
assert_eq!(read_cursor("alice"), Some(100));
assert_eq!(read_cursor("bob"), Some(200));
});
}
#[test]
fn reset_cursor_only_affects_target_user() {
with_temp_xdg_data_home(|| {
write_cursor("alice", 100).unwrap();
write_cursor("bob", 200).unwrap();
reset_cursor("alice").unwrap();
assert_eq!(read_cursor("alice"), None);
assert_eq!(read_cursor("bob"), Some(200));
});
}
#[test]
fn cursor_write_keeps_valid_json() {
with_temp_xdg_data_home(|| {
write_cursor("alice", 111).unwrap();
write_cursor("alice", 222).unwrap();
let data = std::fs::read_to_string(get_cursor_path("alice")).unwrap();
let parsed: CursorFile = serde_json::from_str(&data).unwrap();
assert_eq!(parsed.last_check_ms, 222);
});
}
#[test]
fn parse_corrupt_json_returns_none() {
let bad_json = "not json at all";
let parsed: Option<CursorFile> = serde_json::from_str(bad_json).ok();
assert!(parsed.is_none());
}
}

View File

@@ -93,6 +93,10 @@ const MIGRATIONS: &[(&str, &str)] = &[
"027",
include_str!("../../migrations/027_surgical_sync_runs.sql"),
),
(
"028",
include_str!("../../migrations/028_discussions_mr_fk.sql"),
),
];
pub fn create_connection(db_path: &Path) -> Result<Connection> {
@@ -130,21 +134,20 @@ pub fn create_connection(db_path: &Path) -> Result<Connection> {
}
pub fn run_migrations(conn: &Connection) -> Result<()> {
// Note: sqlite_master always exists, so errors here indicate real DB problems
// (corruption, locked, etc.) - we must not silently treat them as "fresh DB"
let has_version_table: bool = conn.query_row(
"SELECT COUNT(*) > 0 FROM sqlite_master WHERE type='table' AND name='schema_version'",
[],
|row| row.get(0),
)?;
let current_version: i32 = if has_version_table {
conn.query_row(
"SELECT COALESCE(MAX(version), 0) FROM schema_version",
[],
|row| row.get(0),
)?
} else {
0
};


@@ -2,6 +2,7 @@ pub mod backoff;
pub mod config;
#[cfg(unix)]
pub mod cron;
pub mod cursor;
pub mod db;
pub mod dependent_queue;
pub mod error;


@@ -40,6 +40,20 @@ pub fn get_log_dir(config_override: Option<&str>) -> PathBuf {
get_data_dir().join("logs")
}
pub fn get_cursor_path(username: &str) -> PathBuf {
let safe_username: String = username
.chars()
.map(|ch| {
if ch.is_ascii_alphanumeric() || matches!(ch, '_' | '-' | '.') {
ch
} else {
'_'
}
})
.collect();
get_data_dir().join(format!("me_cursor_{safe_username}.json"))
}
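A test sketch (not in the commit) for the sanitization rule: anything outside `[A-Za-z0-9_.-]` maps to `_`, so a username can never escape the data directory or yield an invalid file name:

#[test]
fn cursor_path_sanitizes_unusual_usernames() {
    let path = get_cursor_path("team/alice");
    assert_eq!(
        path.file_name().and_then(|n| n.to_str()),
        Some("me_cursor_team_alice.json")
    );
}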
pub fn get_backup_dir(config_override: Option<&str>) -> PathBuf {
if let Some(path) = config_override {
return PathBuf::from(path);


@@ -164,9 +164,10 @@ pub struct TimelineResult {
/// The search mode actually used for seeding (e.g. "hybrid", "lexical", "lexical (hybrid fallback)").
pub search_mode: String,
pub events: Vec<TimelineEvent>,
/// Total events after filters (e.g., --since) but before --limit was applied.
/// Use this to show "showing X of Y filtered events".
#[serde(skip)]
pub total_filtered_events: usize,
pub seed_entities: Vec<EntityRef>,
pub expanded_entities: Vec<ExpandedEntityRef>,
pub unresolved_references: Vec<UnresolvedRef>,


@@ -260,6 +260,9 @@ fn resolve_documents_to_entities(
}
/// Find evidence notes: FTS5-matched discussion notes that provide context.
///
/// Uses round-robin selection across discussions to ensure diverse evidence
/// rather than all notes coming from a single high-traffic discussion.
fn find_evidence_notes(
conn: &Connection,
fts_query: &str,
@@ -267,6 +270,10 @@ fn find_evidence_notes(
since_ms: Option<i64>,
max_evidence: usize,
) -> Result<Vec<TimelineEvent>> {
// Fetch extra rows to enable round-robin across discussions.
// We'll select from multiple discussions in rotation.
let fetch_limit = (max_evidence * 5).max(50);
let sql = r"
SELECT n.id AS note_id, n.body, n.created_at, n.author_username,
disc.id AS discussion_id,
@@ -286,7 +293,7 @@ fn find_evidence_notes(
let mut stmt = conn.prepare(sql)?;
let rows = stmt.query_map(
rusqlite::params![fts_query, project_id, since_ms, fetch_limit as i64],
|row| {
Ok((
row.get::<_, i64>(0)?, // note_id
@@ -331,7 +338,9 @@ fn find_evidence_notes(
}
};
events.push((
discussion_id,
TimelineEvent {
timestamp: created_at,
entity_type: parent_type,
entity_id: parent_entity_id,
@@ -346,10 +355,67 @@ fn find_evidence_notes(
actor: author,
url: None,
is_seed: true,
},
));
}
// Round-robin selection across discussions for diverse evidence
Ok(round_robin_select_by_discussion(events, max_evidence))
}
/// Round-robin select events across discussions to ensure diverse evidence.
///
/// Groups events by discussion_id, then iterates through discussions in order,
/// taking one event from each until the limit is reached.
fn round_robin_select_by_discussion(
events: Vec<(i64, TimelineEvent)>,
max_evidence: usize,
) -> Vec<TimelineEvent> {
use std::collections::HashMap;
if events.is_empty() || max_evidence == 0 {
return Vec::new();
}
// Group events by discussion_id, preserving order within each group
let mut by_discussion: HashMap<i64, Vec<TimelineEvent>> = HashMap::new();
let mut discussion_order: Vec<i64> = Vec::new();
for (discussion_id, event) in events {
if !by_discussion.contains_key(&discussion_id) {
discussion_order.push(discussion_id);
}
by_discussion.entry(discussion_id).or_default().push(event);
}
// Round-robin selection
let mut result = Vec::with_capacity(max_evidence);
let mut indices: Vec<usize> = vec![0; discussion_order.len()];
'outer: loop {
let mut made_progress = false;
for (disc_idx, &discussion_id) in discussion_order.iter().enumerate() {
let notes = by_discussion.get(&discussion_id).unwrap();
let note_idx = indices[disc_idx];
if note_idx < notes.len() {
result.push(notes[note_idx].clone());
indices[disc_idx] += 1;
made_progress = true;
if result.len() >= max_evidence {
break 'outer;
}
}
}
if !made_progress {
break;
}
}
result
}
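For intuition, the same selection pattern on simplified data (a standalone sketch, not the committed code): each pass takes one note per discussion until the cap is hit, so a single busy thread cannot crowd out the others:

use std::collections::HashMap;

// One item per source per pass, in first-seen source order, up to `max`.
fn round_robin(items: Vec<(i64, &str)>, max: usize) -> Vec<&str> {
    let mut by_source: HashMap<i64, Vec<&str>> = HashMap::new();
    let mut order: Vec<i64> = Vec::new();
    for (source, item) in items {
        if !by_source.contains_key(&source) {
            order.push(source);
        }
        by_source.entry(source).or_default().push(item);
    }
    let mut result = Vec::new();
    let mut idx = vec![0usize; order.len()];
    loop {
        let mut progressed = false;
        for (i, source) in order.iter().enumerate() {
            if result.len() >= max {
                return result;
            }
            if let Some(item) = by_source[source].get(idx[i]) {
                result.push(*item);
                idx[i] += 1;
                progressed = true;
            }
        }
        if !progressed {
            return result;
        }
    }
}

fn main() {
    let notes = vec![(1, "a1"), (1, "a2"), (1, "a3"), (2, "b1"), (3, "c1")];
    assert_eq!(round_robin(notes, 4), vec!["a1", "b1", "c1", "a2"]);
}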
#[cfg(test)]


@@ -6,10 +6,12 @@ use std::collections::{BTreeSet, HashMap};
use std::fmt::Write as _;
use super::truncation::{
MAX_DISCUSSION_BYTES, MAX_DOCUMENT_BYTES_HARD, NoteContent, pre_truncate_description,
truncate_discussion, truncate_hard_cap,
};
use crate::core::error::Result;
use crate::core::time::ms_to_iso;
use tracing::warn;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
@@ -158,7 +160,16 @@ pub fn extract_issue_document(conn: &Connection, issue_id: i64) -> Result<Option
if let Some(ref desc) = description {
content.push_str("\n--- Description ---\n\n");
// Pre-truncate to avoid unbounded memory allocation for huge descriptions
let pre_trunc = pre_truncate_description(desc, MAX_DOCUMENT_BYTES_HARD);
if pre_trunc.was_truncated {
warn!(
iid,
original_bytes = pre_trunc.original_bytes,
"Issue description truncated (oversized)"
);
}
content.push_str(&pre_trunc.content);
}
let labels_hash = compute_list_hash(&labels);
@@ -268,7 +279,16 @@ pub fn extract_mr_document(conn: &Connection, mr_id: i64) -> Result<Option<Docum
if let Some(ref desc) = description {
content.push_str("\n--- Description ---\n\n");
// Pre-truncate to avoid unbounded memory allocation for huge descriptions
let pre_trunc = pre_truncate_description(desc, MAX_DOCUMENT_BYTES_HARD);
if pre_trunc.was_truncated {
warn!(
iid,
original_bytes = pre_trunc.original_bytes,
"MR description truncated (oversized)"
);
}
content.push_str(&pre_trunc.content);
}
let labels_hash = compute_list_hash(&labels);


@@ -48,6 +48,56 @@ pub fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
&s[..end]
}
/// Result of pre-truncating a description to avoid unbounded memory allocation.
pub struct DescriptionPreTruncateResult {
pub content: String,
pub was_truncated: bool,
pub original_bytes: usize,
}
/// Pre-truncate a description to avoid allocating huge amounts of memory.
///
/// This is called BEFORE appending to the document content, so we don't
/// allocate memory for pathologically large descriptions (e.g., 500MB base64 blob).
///
/// Returns the (potentially truncated) description and whether truncation occurred.
pub fn pre_truncate_description(desc: &str, max_bytes: usize) -> DescriptionPreTruncateResult {
let original_bytes = desc.len();
if original_bytes <= max_bytes {
return DescriptionPreTruncateResult {
content: desc.to_string(),
was_truncated: false,
original_bytes,
};
}
// Truncate at UTF-8 boundary and add indicator
let truncated = truncate_utf8(desc, max_bytes.saturating_sub(50)); // Reserve space for marker
let mut content = truncated.to_string();
content.push_str("\n\n[... description truncated from ");
content.push_str(&format_bytes(original_bytes));
content.push_str(" to ");
content.push_str(&format_bytes(max_bytes));
content.push_str(" ...]");
DescriptionPreTruncateResult {
content,
was_truncated: true,
original_bytes,
}
}
fn format_bytes(bytes: usize) -> String {
if bytes >= 1_000_000 {
format!("{:.1}MB", bytes as f64 / 1_000_000.0)
} else if bytes >= 1_000 {
format!("{:.1}KB", bytes as f64 / 1_000.0)
} else {
format!("{}B", bytes)
}
}
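A test sketch (not in the commit) pinning both branches; the marker text follows the format built above:

#[test]
fn pre_truncate_caps_oversized_descriptions() {
    let small = pre_truncate_description("short", 1_000);
    assert!(!small.was_truncated);
    assert_eq!(small.content, "short");

    let big = "x".repeat(10_000);
    let cut = pre_truncate_description(&big, 1_000);
    assert!(cut.was_truncated);
    assert_eq!(cut.original_bytes, 10_000);
    // Body is cut near max_bytes and a human-readable marker is appended.
    assert!(cut.content.ends_with(" ...]"));
    assert!(cut.content.contains("truncated from 10.0KB to 1.0KB"));
}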
pub fn truncate_discussion(notes: &[NoteContent], max_bytes: usize) -> TruncationResult {
if notes.is_empty() {
return TruncationResult {


@@ -130,6 +130,12 @@ pub async fn ingest_project_issues_with_progress(
progress: Option<ProgressCallback>,
signal: &ShutdownSignal,
) -> Result<IngestProjectResult> {
// Reclaim stale locks once at entry, not per-drain-function
let reclaimed = reclaim_stale_locks(conn, config.sync.stale_lock_minutes)?;
if reclaimed > 0 {
debug!(reclaimed, "Reclaimed stale locks at issue sync start");
}
let mut result = IngestProjectResult::default();
let emit = |event: ProgressEvent| {
if let Some(ref cb) = progress {
@@ -176,7 +182,7 @@ pub async fn ingest_project_issues_with_progress(
None => {
warn!("Cannot enrich statuses: project path not found for project_id={project_id}");
result.status_enrichment_error = Some("project_path_missing".into());
result.status_enrichment_mode = "fetched".into();
result.status_enrichment_mode = "error".into();
emit(ProgressEvent::StatusEnrichmentComplete {
enriched: 0,
cleared: 0,
@@ -260,7 +266,7 @@ pub async fn ingest_project_issues_with_progress(
Err(e) => {
warn!("Status enrichment fetch failed: {e}");
result.status_enrichment_error = Some(e.to_string());
result.status_enrichment_mode = "fetched".into();
result.status_enrichment_mode = "fetch_error".into();
emit(ProgressEvent::StatusEnrichmentComplete {
enriched: 0,
cleared: 0,
@@ -460,7 +466,8 @@ async fn sync_discussions_sequential(
     progress: &Option<ProgressCallback>,
     signal: &ShutdownSignal,
 ) -> Result<Vec<super::discussions::IngestDiscussionsResult>> {
-    let batch_size = config.sync.dependent_concurrency as usize;
+    // Guard against batch_size == 0 which would panic in .chunks()
+    let batch_size = (config.sync.dependent_concurrency as usize).max(1);
     let total = issues.len();
     let mut results = Vec::with_capacity(issues.len());
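The guard above matters because slice::chunks asserts a non-zero chunk size, so a config with dependent_concurrency set to 0 would panic at runtime. A standalone reproduction (the config value here is hypothetical, not from this diff):

    fn main() {
        let dependent_concurrency: u32 = 0; // misconfigured
        let batch_size = (dependent_concurrency as usize).max(1);
        let issues = vec!["#1", "#2", "#3"];
        // Without .max(1) this would be issues.chunks(0), which panics with
        // a "chunk size must be non-zero" assertion.
        for batch in issues.chunks(batch_size) {
            println!("{batch:?}");
        }
    }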
@@ -531,6 +538,12 @@ pub async fn ingest_project_merge_requests_with_progress(
     progress: Option<ProgressCallback>,
     signal: &ShutdownSignal,
 ) -> Result<IngestMrProjectResult> {
+    // Reclaim stale locks once at entry, not per-drain-function
+    let reclaimed = reclaim_stale_locks(conn, config.sync.stale_lock_minutes)?;
+    if reclaimed > 0 {
+        debug!(reclaimed, "Reclaimed stale locks at MR sync start");
+    }
+
     let mut result = IngestMrProjectResult::default();
     let emit = |event: ProgressEvent| {
         if let Some(ref cb) = progress {
@@ -766,7 +779,8 @@ async fn sync_mr_discussions_sequential(
     progress: &Option<ProgressCallback>,
     signal: &ShutdownSignal,
 ) -> Result<Vec<super::mr_discussions::IngestMrDiscussionsResult>> {
-    let batch_size = config.sync.dependent_concurrency as usize;
+    // Guard against batch_size == 0 which would panic in .chunks()
+    let batch_size = (config.sync.dependent_concurrency as usize).max(1);
     let total = mrs.len();
     let mut results = Vec::with_capacity(mrs.len());
@@ -941,10 +955,7 @@ async fn drain_resource_events(
     let mut result = DrainResult::default();
     let batch_size = config.sync.dependent_concurrency as usize;
-    let reclaimed = reclaim_stale_locks(conn, config.sync.stale_lock_minutes)?;
-    if reclaimed > 0 {
-        debug!(reclaimed, "Reclaimed stale resource event locks");
-    }
+    // Note: stale locks are reclaimed once at sync entry point, not here
 
     let claimable_counts = count_claimable_jobs(conn, project_id)?;
     let total_pending = claimable_counts
@@ -1263,10 +1274,7 @@ async fn drain_mr_closes_issues(
     let mut result = DrainResult::default();
     let batch_size = config.sync.dependent_concurrency as usize;
-    let reclaimed = reclaim_stale_locks(conn, config.sync.stale_lock_minutes)?;
-    if reclaimed > 0 {
-        debug!(reclaimed, "Reclaimed stale mr_closes_issues locks");
-    }
+    // Note: stale locks are reclaimed once at sync entry point, not here
 
     let claimable_counts = count_claimable_jobs(conn, project_id)?;
     let total_pending = claimable_counts
@@ -1523,10 +1531,7 @@ async fn drain_mr_diffs(
     let mut result = DrainResult::default();
     let batch_size = config.sync.dependent_concurrency as usize;
-    let reclaimed = reclaim_stale_locks(conn, config.sync.stale_lock_minutes)?;
-    if reclaimed > 0 {
-        debug!(reclaimed, "Reclaimed stale mr_diffs locks");
-    }
+    // Note: stale locks are reclaimed once at sync entry point, not here
 
     let claimable_counts = count_claimable_jobs(conn, project_id)?;
     let total_pending = claimable_counts.get("mr_diffs").copied().unwrap_or(0);

View File
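Taken together, the two status_enrichment_mode fixes above give robot-mode consumers three distinguishable outcomes. A hedged summary in code form — the success branch is not shown in this diff, so "fetched" as the success value is inferred, and the Outcome enum is a hypothetical stand-in for the real control flow:

    enum Outcome {
        Enriched,           // GraphQL fetch succeeded and statuses were applied
        MissingProjectPath, // project path could not be resolved
        FetchFailed,        // GraphQL fetch returned an error
    }

    fn enrichment_mode(outcome: &Outcome) -> &'static str {
        match outcome {
            Outcome::Enriched => "fetched",
            Outcome::MissingProjectPath => "error",
            Outcome::FetchFailed => "fetch_error",
        }
    }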

@@ -18,15 +18,16 @@ use lore::cli::commands::{
     print_event_count, print_event_count_json, print_file_history, print_file_history_json,
     print_generate_docs, print_generate_docs_json, print_ingest_summary, print_ingest_summary_json,
     print_list_issues, print_list_issues_json, print_list_mrs, print_list_mrs_json,
-    print_list_notes, print_list_notes_json, print_search_results, print_search_results_json,
-    print_show_issue, print_show_issue_json, print_show_mr, print_show_mr_json, print_stats,
-    print_stats_json, print_sync, print_sync_json, print_sync_status, print_sync_status_json,
-    print_timeline, print_timeline_json_with_meta, print_trace, print_trace_json, print_who_human,
-    print_who_json, query_notes, run_auth_test, run_count, run_count_events, run_cron_install,
-    run_cron_status, run_cron_uninstall, run_doctor, run_drift, run_embed, run_file_history,
-    run_generate_docs, run_ingest, run_ingest_dry_run, run_init, run_list_issues, run_list_mrs,
-    run_me, run_search, run_show_issue, run_show_mr, run_stats, run_sync, run_sync_status,
-    run_timeline, run_token_set, run_token_show, run_who,
+    print_list_notes, print_list_notes_json, print_related_human, print_related_json,
+    print_search_results, print_search_results_json, print_show_issue, print_show_issue_json,
+    print_show_mr, print_show_mr_json, print_stats, print_stats_json, print_sync, print_sync_json,
+    print_sync_status, print_sync_status_json, print_timeline, print_timeline_json_with_meta,
+    print_trace, print_trace_json, print_who_human, print_who_json, query_notes, run_auth_test,
+    run_count, run_count_events, run_cron_install, run_cron_status, run_cron_uninstall, run_doctor,
+    run_drift, run_embed, run_file_history, run_generate_docs, run_ingest, run_ingest_dry_run,
+    run_init, run_list_issues, run_list_mrs, run_me, run_related, run_search, run_show_issue,
+    run_show_mr, run_stats, run_sync, run_sync_status, run_timeline, run_token_set, run_token_show,
+    run_who,
 };
 use lore::cli::render::{ColorMode, GlyphMode, Icons, LoreRenderer, Theme};
 use lore::cli::robot::{RobotMeta, strip_schemas};
@@ -225,6 +226,22 @@ async fn main() {
             )
             .await
         }
+        Some(Commands::Related {
+            query_or_type,
+            iid,
+            limit,
+            project,
+        }) => {
+            handle_related(
+                cli.config.as_deref(),
+                &query_or_type,
+                iid,
+                limit,
+                project.as_deref(),
+                robot_mode,
+            )
+            .await
+        }
         Some(Commands::Stats(args)) => handle_stats(cli.config.as_deref(), args, robot_mode).await,
         Some(Commands::Embed(args)) => handle_embed(cli.config.as_deref(), args, robot_mode).await,
         Some(Commands::Sync(args)) => {
@@ -1996,7 +2013,7 @@ async fn handle_timeline(
     if robot_mode {
         print_timeline_json_with_meta(
             &result,
-            result.total_events_before_limit,
+            result.total_filtered_events,
             params.depth,
             !params.no_mentions,
             args.fields.as_deref(),
@@ -2956,8 +2973,8 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
         }
     },
     "me": {
-        "description": "Personal work dashboard: open issues, authored/reviewing MRs, activity feed with computed attention states",
-        "flags": ["--issues", "--mrs", "--activity", "--since <period>", "-p/--project <path>", "--all", "--user <username>", "--fields <list|minimal>"],
+        "description": "Personal work dashboard: open issues, authored/reviewing MRs, activity feed, and cursor-based since-last-check inbox with computed attention states",
+        "flags": ["--issues", "--mrs", "--activity", "--since <period>", "-p/--project <path>", "--all", "--user <username>", "--fields <list|minimal>", "--reset-cursor"],
         "example": "lore --robot me",
         "response_schema": {
             "ok": "bool",
@@ -2965,6 +2982,7 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
"username": "string",
"since_iso": "string?",
"summary": {"project_count": "int", "open_issue_count": "int", "authored_mr_count": "int", "reviewing_mr_count": "int", "needs_attention_count": "int"},
"since_last_check": "{cursor_iso:string, total_event_count:int, groups:[{entity_type:string, entity_iid:int, entity_title:string, project:string, events:[{timestamp_iso:string, event_type:string, actor:string?, summary:string, body_preview:string?}]}]}?",
"open_issues": "[{project:string, iid:int, title:string, state:string, attention_state:string, status_name:string?, labels:[string], updated_at_iso:string, web_url:string?}]",
"open_mrs_authored": "[{project:string, iid:int, title:string, state:string, attention_state:string, draft:bool, detailed_merge_status:string?, author_username:string?, labels:[string], updated_at_iso:string, web_url:string?}]",
"reviewing_mrs": "[same as open_mrs_authored]",
@@ -2981,7 +2999,9 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
"event_types": "note | status_change | label_change | assign | unassign | review_request | milestone_change",
"section_flags": "If none of --issues/--mrs/--activity specified, all sections returned",
"since_default": "1d for activity feed",
"issue_filter": "Only In Progress / In Review status issues shown"
"issue_filter": "Only In Progress / In Review status issues shown",
"since_last_check": "Cursor-based inbox showing events since last run. Null on first run (no cursor yet). Groups events by entity (issue/MR). Sources: others' comments on your items, @mentions, assignment/review-request notes. Cursor auto-advances after each run. Use --reset-cursor to clear.",
"cursor_persistence": "Stored per user in ~/.local/share/lore/me_cursor_<username>.json. --project filters display only for since-last-check; cursor still advances for all projects for that user."
}
},
"robot-docs": {
@@ -3013,7 +3033,7 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
"embed: Generate vector embeddings for semantic search via Ollama",
"cron: Automated sync scheduling (Unix)",
"token: Secure token management with masked display",
"me: Personal work dashboard with attention states, activity feed, and needs-attention triage"
"me: Personal work dashboard with attention states, activity feed, cursor-based since-last-check inbox, and needs-attention triage"
],
"read_write_split": "lore = ALL reads (issues, MRs, search, who, timeline, intelligence). glab = ALL writes (create, update, approve, merge, CI/CD)."
});
@@ -3080,6 +3100,14 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
"lore --robot sync --issue 7 -p group/project",
"lore --robot sync --issue 7 --mr 10 -p group/project",
"lore --robot sync --issue 7 -p group/project --preflight-only"
],
"personal_dashboard": [
"lore --robot me",
"lore --robot me --issues",
"lore --robot me --activity --since 7d",
"lore --robot me --project group/repo",
"lore --robot me --fields minimal",
"lore --robot me --reset-cursor"
]
});
@@ -3245,6 +3273,28 @@ async fn handle_drift(
     Ok(())
 }
 
+async fn handle_related(
+    config_override: Option<&str>,
+    query_or_type: &str,
+    iid: Option<i64>,
+    limit: usize,
+    project: Option<&str>,
+    robot_mode: bool,
+) -> Result<(), Box<dyn std::error::Error>> {
+    let start = std::time::Instant::now();
+    let config = Config::load(config_override)?;
+    let effective_project = config.effective_project(project);
+    let response = run_related(&config, query_or_type, iid, limit, effective_project).await?;
+    let elapsed_ms = start.elapsed().as_millis() as u64;
+    if robot_mode {
+        print_related_json(&response, elapsed_ms);
+    } else {
+        print_related_human(&response);
+    }
+    Ok(())
+}
+
 #[allow(clippy::too_many_arguments)]
 async fn handle_list_compat(
     config_override: Option<&str>,

View File
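The cursor_persistence entry documented above pins down the file location and advancement rules, but the file's contents are not part of this diff. A minimal sketch of what per-user persistence along those lines could look like — the serde derive and struct shape are assumptions; only the path and the cursor_iso field name come from the docs:

    use std::path::PathBuf;

    use serde::{Deserialize, Serialize};

    // Hypothetical shape; only `cursor_iso` is documented above.
    #[derive(Serialize, Deserialize)]
    struct MeCursor {
        cursor_iso: String, // RFC 3339 timestamp of the last `me` run
    }

    // Mirrors the documented location: ~/.local/share/lore/me_cursor_<username>.json
    fn cursor_path(username: &str) -> PathBuf {
        let home = std::env::var("HOME").unwrap_or_default();
        PathBuf::from(home)
            .join(".local/share/lore")
            .join(format!("me_cursor_{username}.json"))
    }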

@@ -54,7 +54,9 @@ pub fn to_fts_query(raw: &str, mode: FtsQueryMode) -> String {
     // FTS5 boolean operators are case-sensitive uppercase keywords.
     // Pass them through unquoted so users can write "switch AND health".
-    const FTS5_OPERATORS: &[&str] = &["AND", "OR", "NOT", "NEAR"];
+    // Note: NEAR is a function NEAR(term1 term2, N), not an infix operator.
+    // Users who need NEAR syntax should use FtsQueryMode::Raw.
+    const FTS5_OPERATORS: &[&str] = &["AND", "OR", "NOT"];
     let mut result = String::with_capacity(trimmed.len() + 20);
     for (i, token) in trimmed.split_whitespace().enumerate() {
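With NEAR removed from the operator list, Safe mode now treats it like any other term, while Raw mode remains the escape hatch for FTS5's real NEAR function syntax. A hedged illustration — the Safe variant name and its exact quoting behavior are assumptions, since the rest of the function body is not shown in this hunk:

    // Assumed behavior: Safe mode quotes non-operator tokens.
    let safe = to_fts_query("switch NEAR health", FtsQueryMode::Safe);
    // NEAR is now quoted as a literal term instead of passing through bare,
    // e.g. something like: "switch" "NEAR" "health"

    // Proximity search still works via Raw mode, using FTS5's function form:
    let raw = to_fts_query("NEAR(switch health, 5)", FtsQueryMode::Raw);
    // Raw passes the query through verbatim to FTS5.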