feat(path): rename-aware ambiguity resolution for suffix probe

When a bare filename like 'operators.ts' matches multiple full paths,
check if they are the same file connected by renames (via BFS on
mr_file_changes). If so, auto-resolve to the newest path instead of
erroring. Also wires path resolution into file-history and trace
commands so bare filenames work everywhere.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
teernisse
2026-02-17 15:05:47 -05:00
parent 171260a772
commit 714c8c2623
7 changed files with 632 additions and 100 deletions


@@ -0,0 +1,140 @@
Your iteration 4 plan is already strong. The highest-impact revisions are around query shape, transaction boundaries, and contract stability for agents.
1. **Switch discussions query to a two-phase page-first architecture**
Analysis: Current `ranked_notes` runs over every filtered discussion before `LIMIT`, which can explode on project-wide queries. A page-first plan keeps complexity proportional to `limit`, improves tail latency, and reduces memory churn.
```diff
@@ ## 3c. SQL Query
-Core query uses a CTE + ranked-notes rollup (window function) to avoid per-row correlated
-subqueries.
+Core query is split into two phases for scalability:
+1) `paged_discussions` applies filters/sort/LIMIT and returns only page IDs.
+2) Note rollups and optional `--include-notes` expansion run only for those page IDs.
+This bounds note scanning to visible results and stabilizes latency on large projects.
-WITH filtered_discussions AS (
+WITH filtered_discussions AS (
...
),
-ranked_notes AS (
+paged_discussions AS (
+ SELECT id
+ FROM filtered_discussions
+ ORDER BY COALESCE({sort_column}, 0) {order}, id {order}
+ LIMIT ?
+),
+ranked_notes AS (
...
- WHERE n.discussion_id IN (SELECT id FROM filtered_discussions)
+ WHERE n.discussion_id IN (SELECT id FROM paged_discussions)
)
```
2. **Move snapshot transaction ownership to handlers (not query helpers)**
Analysis: This avoids nested transaction edge cases, keeps function signatures clean, and guarantees one snapshot across count + page + include-notes + serialization metadata.
```diff
@@ ## Cross-cutting: snapshot consistency
-Wrap `query_notes` and `query_discussions` in a deferred read transaction.
+Open one deferred read transaction in each handler (`handle_notes`, `handle_discussions`)
+and pass `&Transaction` into query helpers. Query helpers do not open/commit transactions.
+This guarantees a single snapshot across all subqueries and avoids nested tx pitfalls.
-pub fn query_discussions(conn: &Connection, ...)
+pub fn query_discussions(tx: &rusqlite::Transaction<'_>, ...)
```
3. **Add immutable input filter `--project-id` across notes/discussions/show**
Analysis: You already expose `gitlab_project_id` because paths are mutable; input should support the same immutable selector. This removes failure modes after project renames/transfers.
```diff
@@ ## 3a. CLI Args
+ /// Filter by immutable GitLab project ID
+ #[arg(long, help_heading = "Filters", conflicts_with = "project")]
+ pub project_id: Option<i64>,
@@ ## Bridge Contract
+Input symmetry rule: commands that accept `--project` should also accept `--project-id`.
+If both are present, return usage error (exit code 2).
```
4. **Enforce bridge fields for nested notes in `discussions --include-notes`**
Analysis: Current guardrail is entity-level; nested notes can still lose required IDs under aggressive filtering. This is a contract hole for write-bridging.
```diff
@@ ### Field Filtering Guardrail
-In robot mode, `filter_fields` MUST force-include Bridge Contract fields...
+In robot mode, `filter_fields` MUST force-include Bridge Contract fields at all returned levels:
+- discussion row fields
+- nested note fields when `discussions --include-notes` is used
+const BRIDGE_FIELDS_DISCUSSION_NOTES: &[&str] = &[
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
+ "gitlab_discussion_id", "gitlab_note_id",
+];
```
5. **Make ambiguity preflight scope-aware and machine-actionable**
Analysis: Current preflight checks only `gitlab_discussion_id`, which can produce false ambiguity when additional filters already narrow to one project. Also, agents need structured candidates, not only free-text.
```diff
@@ ### Ambiguity Guardrail
-SELECT DISTINCT p.path_with_namespace
+SELECT DISTINCT p.path_with_namespace, p.gitlab_project_id
FROM discussions d
JOIN projects p ON p.id = d.project_id
-WHERE d.gitlab_discussion_id = ?
+WHERE d.gitlab_discussion_id = ?
+ /* plus active scope filters: noteable_type, for_issue/for_mr, since/path when present */
LIMIT 3
-Return LoreError::Ambiguous with message
+Return LoreError::Ambiguous with structured details:
+`{ code, message, candidates:[{project_path, gitlab_project_id}], suggestion }`
```
6. **Add `--contains` filter to `discussions`**
Analysis: This is a high-utility agent workflow gap. Agents frequently need “find thread by text then reply”; forcing a separate `notes` search round-trip is unnecessary.
```diff
@@ ## 3a. CLI Args
+ /// Filter discussions whose notes contain text
+ #[arg(long, help_heading = "Filters")]
+ pub contains: Option<String>,
@@ ## 3d. Filters struct
+ pub contains: Option<String>,
@@ ## 3d. Where-clause construction
+- `path` -> EXISTS (...)
+- `contains` -> EXISTS (
+ SELECT 1 FROM notes n
+ WHERE n.discussion_id = d.id
+ AND n.body LIKE ?
+ )
```
7. **Promote two baseline indexes from “candidate” to “required”**
Analysis: These are directly hit by new primary paths; waiting for post-merge profiling risks immediate perf cliffs in real usage.
```diff
@@ ## 3h. Query-plan validation
-Candidate indexes (add only if EXPLAIN QUERY PLAN shows they're needed):
-- discussions(project_id, gitlab_discussion_id)
-- notes(discussion_id, created_at DESC, id DESC)
+Required baseline indexes for this feature:
+- discussions(project_id, gitlab_discussion_id)
+- notes(discussion_id, created_at DESC, id DESC)
+Keep other indexes conditional on EXPLAIN QUERY PLAN.
```
8. **Add schema versioning and remove contradictory rejected items**
Analysis: `robot-docs` contract drift is a long-term agent risk; explicit schema versions let clients fail safely. Also, rejected items currently contradict active sections, which creates implementation ambiguity.
```diff
@@ ## 4. Fix Robot-Docs Response Schemas
"meta": {"elapsed_ms": "int", ...}
+"meta": {"elapsed_ms":"int", ..., "schema_version":"string"}
+
+Schema version policy:
+- bump minor on additive fields
+- bump major on removals/renames
+- expose per-command versions in `robot-docs`
@@ ## Rejected Recommendations
-- Add `gitlab_note_id` to show-command note detail structs ... rejected ...
-- Add `gitlab_discussion_id` to show-command discussion detail structs ... rejected ...
-- Add `gitlab_project_id` to show-command discussion detail structs ... rejected ...
+Remove stale rejected entries that conflict with accepted workstreams in this plan iteration.
```
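A client-side check under this version policy might look like the sketch below. It assumes `schema_version` is a `major.minor` string, which the plan does not specify; `parse_version` and `is_compatible` are hypothetical names used only for illustration.

```rust
/// Parse a "major.minor" schema version string.
/// The "major.minor" format is an assumption; the plan only says "string".
fn parse_version(s: &str) -> Option<(u32, u32)> {
    let (major, minor) = s.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// A client built against `expected` can consume `actual` when the major
/// versions match and the producer's minor is at least the client's,
/// because minor bumps only add fields.
fn is_compatible(expected: &str, actual: &str) -> bool {
    match (parse_version(expected), parse_version(actual)) {
        (Some((e_major, e_minor)), Some((a_major, a_minor))) => {
            e_major == a_major && a_minor >= e_minor
        }
        // Unparseable versions fail safely.
        _ => false,
    }
}
```

Treating an unparseable version as incompatible matches the "fail safely" goal: an agent stops rather than guessing at a payload shape it does not recognize.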
If you want, I can produce a fully rewritten iteration 5 plan document that applies all of the above edits cleanly end-to-end.


@@ -2,7 +2,7 @@
plan: true
title: ""
status: iterating
iteration: 4
iteration: 5
target_iterations: 8
beads_revision: 0
related_plans: []
@@ -52,8 +52,9 @@ output.
### Field Filtering Guardrail
In robot mode, `filter_fields` **MUST** force-include Bridge Contract fields even when the
caller passes a narrower `--fields` list. This prevents agents from accidentally stripping
the identifiers they need for write operations.
caller passes a narrower `--fields` list. This applies at **all nesting levels**: both the
top-level entity fields and nested sub-entities (e.g., notes inside `discussions --include-notes`).
This prevents agents from accidentally stripping the identifiers they need for write operations.
**Implementation**: Add a `BRIDGE_FIELDS` constant map per entity type. In `filter_fields()`,
when operating in robot mode, union the caller's requested fields with the bridge set before
@@ -69,70 +70,127 @@ const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
"project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id",
];
// Applied to nested notes within discussions --include-notes
const BRIDGE_FIELDS_DISCUSSION_NOTES: &[&str] = &[
"project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id", "gitlab_note_id",
];
```
In `filter_fields`, when entity is `"notes"` or `"discussions"`, merge the bridge set into the
requested fields before filtering the JSON value. This is a ~5-line change to the existing
function.
requested fields before filtering the JSON value. For `"discussions"`, also apply
`BRIDGE_FIELDS_DISCUSSION_NOTES` to each element of the nested `notes` array. This is a ~10-line
change to the existing function.
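The union-then-filter step can be sketched with standard-library types only. This is a minimal illustration, not the actual `filter_fields` code: it models a JSON object as a `BTreeMap<String, String>`, and the `Record`, `Discussion`, `effective_fields`, `retain_fields`, and `filter_discussion_robot` names are hypothetical.

```rust
use std::collections::BTreeMap;

const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
    "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
    "gitlab_discussion_id",
];
const BRIDGE_FIELDS_DISCUSSION_NOTES: &[&str] = &[
    "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
    "gitlab_discussion_id", "gitlab_note_id",
];

/// Stand-in for a flat JSON object (assumption: string values only).
type Record = BTreeMap<String, String>;

/// Hypothetical model of one discussion with its nested notes.
struct Discussion {
    fields: Record,
    notes: Vec<Record>,
}

/// Union the caller's requested fields with a bridge set, without duplicates.
fn effective_fields(requested: &[&str], bridge: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = requested.iter().map(|s| s.to_string()).collect();
    for field in bridge {
        if !out.iter().any(|f| f.as_str() == *field) {
            out.push(field.to_string());
        }
    }
    out
}

/// Drop every key that is not in `keep`.
fn retain_fields(record: &mut Record, keep: &[String]) {
    record.retain(|key, _| keep.iter().any(|k| k == key));
}

/// Apply the guardrail at both nesting levels.
fn filter_discussion_robot(d: &mut Discussion, requested: &[&str]) {
    let top = effective_fields(requested, BRIDGE_FIELDS_DISCUSSIONS);
    retain_fields(&mut d.fields, &top);
    let nested = effective_fields(requested, BRIDGE_FIELDS_DISCUSSION_NOTES);
    for note in &mut d.notes {
        retain_fields(note, &nested);
    }
}
```

Requesting only `note_count` therefore still leaves `gitlab_discussion_id` on the discussion and `gitlab_note_id` on each nested note, while non-bridge fields such as `body` are dropped.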
### Snapshot Consistency (Cross-Cutting)
Multi-query commands (`handle_notes`, `handle_discussions`) **MUST** execute all their queries
within a single deferred read transaction. This guarantees snapshot consistency when a concurrent
sync/ingest is modifying the database.
**Transaction ownership lives in handlers, not query helpers.** Each handler opens one deferred
read transaction and passes it to query helpers. Query helpers accept `&Connection` (which
`Transaction` derefs to via `std::ops::Deref`) so they remain testable with plain connections
in unit tests. This avoids nested transaction edge cases and guarantees a single snapshot across
count + page + include-notes + serialization.
```rust
// In handle_notes / handle_discussions:
let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
let result = query_notes(&tx, &filters, &config)?;
// ... serialize ...
tx.commit()?; // read-only, but closes cleanly
```
Query helpers keep their `conn: &Connection` signature — `Transaction<'_>` implements
`Deref<Target = Connection>`, so `&tx` coerces to `&Connection` at call sites.
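The coercion this relies on can be shown without a database. The sketch below uses stand-in `Connection` and `Transaction` types (they are not the rusqlite types, and `count_rows` and `handle` are hypothetical) to illustrate how a helper that takes `&Connection` accepts `&tx` unchanged.

```rust
use std::ops::Deref;

/// Stand-in for a database connection (not the rusqlite type).
struct Connection {
    rows: Vec<i64>,
}

/// Stand-in for a transaction that borrows its connection.
struct Transaction<'conn> {
    conn: &'conn Connection,
}

impl Deref for Transaction<'_> {
    type Target = Connection;

    fn deref(&self) -> &Connection {
        self.conn
    }
}

/// Query helper: takes a plain connection reference and never opens
/// or commits a transaction itself.
fn count_rows(conn: &Connection) -> usize {
    conn.rows.len()
}

/// Handler: owns the transaction and lends it to the helper.
fn handle(conn: &Connection) -> usize {
    let tx = Transaction { conn };
    // `&tx` is `&Transaction`, which coerces to `&Connection` via `Deref`.
    count_rows(&tx)
}
```

Because the helper signature never mentions the transaction type, unit tests can call `count_rows` with a bare connection, while handlers get a single snapshot by passing `&tx`.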
### Ambiguity Guardrail
When filtering by `gitlab_discussion_id` (on either `notes` or `discussions` commands) without
`--project`, if the query matches discussions in multiple projects:
- Return an `Ambiguous` error (exit code 18, matching existing convention)
- Include matching project paths in the error message
- Include matching project paths **and `gitlab_project_id`s** in a structured candidates list
- Suggest retry with `--project <path>`
**Implementation**: Run a **preflight distinct-project check** before the main list query
executes its `LIMIT`. This is critical because a post-query check on the paginated result set
can silently miss cross-project ambiguity when `LIMIT` truncates results to rows from a single
project. The preflight query is cheap (hits the `gitlab_discussion_id` index, returns at most
a few rows) and eliminates non-deterministic write-targeting risk.
**Implementation**: Run a **scope-aware preflight distinct-project check** before the main list
query executes its `LIMIT`. The preflight applies active scope filters (noteable_type, since,
for_issue/for_mr) alongside the discussion ID check, so it won't produce false ambiguity when
other filters already narrow to one project. This is critical because a post-query check on the
paginated result set can silently miss cross-project ambiguity when `LIMIT` truncates results to
rows from a single project. The preflight query is cheap (hits the `gitlab_discussion_id` index,
returns at most a few rows) and eliminates non-deterministic write-targeting risk.
```sql
-- Preflight ambiguity check (runs before main query)
SELECT DISTINCT p.path_with_namespace
-- Preflight ambiguity check (runs before main query, includes active scope filters)
SELECT DISTINCT p.path_with_namespace, p.gitlab_project_id
FROM discussions d
JOIN projects p ON p.id = d.project_id
WHERE d.gitlab_discussion_id = ?
-- scope filters applied dynamically:
-- AND d.noteable_type = ? (when --noteable-type present)
-- AND d.merge_request_id = (SELECT ...) (when --for-mr present)
-- AND d.issue_id = (SELECT ...) (when --for-issue present)
LIMIT 3
```
If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with the
distinct project paths and suggestion to retry with `--project <path>`.
If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with structured
candidates for machine consumption:
```rust
// In query_notes / query_discussions, before executing the main query:
if let Some(ref disc_id) = filters.gitlab_discussion_id {
if filters.project.is_none() {
let distinct_projects: Vec<String> = conn
let candidates: Vec<(String, i64)> = conn
.prepare(
"SELECT DISTINCT p.path_with_namespace \
"SELECT DISTINCT p.path_with_namespace, p.gitlab_project_id \
FROM discussions d \
JOIN projects p ON p.id = d.project_id \
WHERE d.gitlab_discussion_id = ? \
LIMIT 3"
// Note: add scope filter clauses dynamically
)?
.query_map([disc_id], |row| row.get(0))?
.query_map([disc_id], |row| Ok((row.get(0)?, row.get(1)?)))?
.collect::<std::result::Result<Vec<_>, _>>()?;
if distinct_projects.len() > 1 {
if candidates.len() > 1 {
return Err(LoreError::Ambiguous {
message: format!(
"Discussion ID matches {} projects: {}. Use --project to disambiguate.",
distinct_projects.len(),
distinct_projects.join(", ")
"Discussion ID matches {} projects. Use --project to disambiguate.",
candidates.len(),
),
candidates: candidates.into_iter()
.map(|(path, id)| AmbiguousCandidate { project_path: path, gitlab_project_id: id })
.collect(),
});
}
}
}
```
In robot mode, the error serializes as:
```json
{
  "error": {
    "code": "AMBIGUOUS",
    "message": "Discussion ID matches 2 projects. Use --project to disambiguate.",
    "candidates": [
      {"project_path": "group/repo-a", "gitlab_project_id": 42},
      {"project_path": "group/repo-b", "gitlab_project_id": 99}
    ],
    "suggestion": "lore -J discussions --gitlab-discussion-id <id> --project <path>",
    "actions": ["lore -J discussions --gitlab-discussion-id <id> --project group/repo-a"]
  }
}
```
This gives agents machine-actionable candidates: they can pick a project and retry immediately
without parsing free-text error messages.
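The scope-aware check itself can be sketched over in-memory rows. This is an illustration under assumptions, not the planned SQL path: `DiscussionRow` and `preflight` are hypothetical, only the `noteable_type` scope filter is modeled, and the `LIMIT 3` cap is omitted.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, PartialEq, Eq)]
enum NoteableType {
    Issue,
    MergeRequest,
}

/// Hypothetical flattened row: a discussion joined with its project.
struct DiscussionRow {
    gitlab_discussion_id: String,
    noteable_type: NoteableType,
    project_path: String,
    gitlab_project_id: i64,
}

#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct AmbiguousCandidate {
    project_path: String,
    gitlab_project_id: i64,
}

/// Returns the distinct candidate projects as an error when the discussion ID,
/// narrowed by the active scope filter, still matches more than one project.
fn preflight(
    rows: &[DiscussionRow],
    disc_id: &str,
    noteable_type: Option<NoteableType>,
) -> Result<(), Vec<AmbiguousCandidate>> {
    let candidates: BTreeSet<AmbiguousCandidate> = rows
        .iter()
        .filter(|r| r.gitlab_discussion_id == disc_id)
        // Scope filter: only applied when the caller supplied it.
        .filter(|r| noteable_type.map_or(true, |t| r.noteable_type == t))
        .map(|r| AmbiguousCandidate {
            project_path: r.project_path.clone(),
            gitlab_project_id: r.gitlab_project_id,
        })
        .collect();

    if candidates.len() > 1 {
        Err(candidates.into_iter().collect())
    } else {
        Ok(())
    }
}
```

Collecting into a set mirrors `SELECT DISTINCT`: several discussions in the same project produce one candidate, so only a genuine cross-project match raises the error.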
#### 1h. Wrap `query_notes` in a read transaction
Wrap the count query and page query in a deferred read transaction per the Snapshot Consistency
cross-cutting requirement. See the Bridge Contract section for the pattern.
Per the Snapshot Consistency cross-cutting requirement, `handle_notes` opens a deferred read
transaction and passes it to `query_notes`. See the Snapshot Consistency section for the pattern.
### Tests
@@ -337,6 +395,7 @@ fn notes_ambiguous_gitlab_discussion_id_across_projects() {
// (this can happen since IDs are per-project)
// Filter by gitlab_discussion_id without --project
// Assert LoreError::Ambiguous is returned with both project paths
// Assert candidates include gitlab_project_id for machine consumption
}
```
@@ -352,6 +411,19 @@ fn notes_ambiguity_preflight_not_defeated_by_limit() {
}
```
#### Test 8: Ambiguity preflight respects scope filters (no false positives)
```rust
#[test]
fn notes_ambiguity_preflight_respects_scope_filters() {
    let conn = create_test_db();
    // Insert 2 projects, each with a discussion sharing the same gitlab_discussion_id
    // But one is Issue-type and the other MergeRequest-type
    // Filter by gitlab_discussion_id + --noteable-type MergeRequest (narrows to 1 project)
    // Assert NO ambiguity error — scope filters disambiguate
}
```
---
## 2. Add `gitlab_discussion_id` to Show Command Discussion Groups
@@ -644,6 +716,9 @@ lore -J discussions --gitlab-discussion-id 6a9c1750b37d
# List unresolved threads with latest 2 notes inline (fewer round-trips)
lore -J discussions --for-mr 99 --resolution unresolved --include-notes 2
# Find discussions containing specific text
lore -J discussions --for-mr 99 --contains "prefer the approach"
```
### Response Schema
@@ -801,6 +876,10 @@ pub struct DiscussionsArgs {
#[arg(long, value_enum, help_heading = "Filters")]
pub noteable_type: Option<NoteableTypeFilter>,
/// Filter discussions whose notes contain text (case-insensitive LIKE match)
#[arg(long, help_heading = "Filters")]
pub contains: Option<String>,
/// Include up to N latest notes per discussion (0 = none, default; clamped to 20)
#[arg(long, default_value = "0", help_heading = "Output")]
pub include_notes: usize,
@@ -925,7 +1004,7 @@ The `included_note_count` is set to `notes.len()` and `has_more_notes` is set to
`note_count > included_note_count` during the JSON conversion, providing per-discussion
truncation signals.
#### 3c. SQL Query
#### 3c. SQL Query — Two-Phase Page-First Architecture
**File**: `src/cli/commands/list.rs`
@@ -935,21 +1014,29 @@ pub fn query_discussions(
filters: &DiscussionListFilters,
config: &Config,
) -> Result<DiscussionListResult> {
// Wrap all queries in a deferred read transaction for snapshot consistency
let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
// NOTE: Transaction is managed by the handler (handle_discussions).
// This function receives &Connection (which Transaction derefs to via `std::ops::Deref`).
// Preflight ambiguity check (if gitlab_discussion_id without project)
// ... see Ambiguity Guardrail section ...
// Main query + count query ...
// ... note expansion query (if include_notes > 0) ...
tx.commit()?;
// Phase 1: Filter + sort + LIMIT to get page IDs
// Phase 2: Note rollups only for paged results
// Phase 3: Optional --include-notes expansion (separate query)
}
```
Core query uses a CTE + ranked-notes rollup (window function) to avoid per-row correlated
subqueries. The `ROW_NUMBER()` approach produces a single scan over the notes table, which
is more predictable than repeated LIMIT 1 sub-selects at scale (200K+ discussions):
The query uses a **two-phase page-first architecture** for scalability:
1. **Phase 1** (`paged_discussions`): Apply all filters, sort, and LIMIT to produce just the
discussion IDs for the current page. This bounds the result set before any note scanning.
2. **Phase 2** (`ranked_notes` + `note_rollup`): Run note aggregation only for the paged
discussion IDs. This ensures note scanning is proportional to `--limit`, not to the total
filtered discussion count.
This architecture prevents the performance cliff that occurs on project-wide queries with
thousands of discussions: instead of scanning notes for all filtered discussions (potentially
200K+), we scan only for the 50 (or whatever `--limit` is) that will actually be returned.
```sql
WITH filtered_discussions AS (
@@ -961,6 +1048,14 @@ WITH filtered_discussions AS (
JOIN projects p ON d.project_id = p.id
{where_sql}
),
-- Phase 1: Page-first — apply sort + LIMIT before note scanning
paged_discussions AS (
SELECT id
FROM filtered_discussions
ORDER BY COALESCE({sort_column}, 0) {order}, id {order}
LIMIT ?
),
-- Phase 2: Note rollups only for paged results
ranked_notes AS (
SELECT
n.discussion_id,
@@ -980,7 +1075,7 @@ ranked_notes AS (
n.created_at, n.id
) AS rn_first_position
FROM notes n
WHERE n.discussion_id IN (SELECT id FROM filtered_discussions)
WHERE n.discussion_id IN (SELECT id FROM paged_discussions)
),
note_rollup AS (
SELECT
@@ -1012,12 +1107,12 @@ SELECT
nr.position_new_path,
nr.position_new_line
FROM filtered_discussions fd
JOIN paged_discussions pd ON fd.id = pd.id
JOIN projects p ON fd.project_id = p.id
LEFT JOIN issues i ON fd.issue_id = i.id
LEFT JOIN merge_requests m ON fd.merge_request_id = m.id
LEFT JOIN note_rollup nr ON nr.discussion_id = fd.id
ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
LIMIT ?
```
**Dual window function rationale**: The `ranked_notes` CTE uses two separate `ROW_NUMBER()`
@@ -1028,12 +1123,11 @@ displacing the first human author/body, and prevents a non-positioned note from
the file location. The `MAX(CASE WHEN rn_xxx = 1 ...)` pattern extracts the correct value
from each independently-ranked sequence.
**Performance rationale**: The CTE pre-filters discussions before joining notes. The
`ranked_notes` CTE uses `ROW_NUMBER()` (a single pass over the notes index) instead of
correlated `(SELECT ... LIMIT 1)` sub-selects per discussion. For MR-scoped queries
(50-200 discussions) the performance is equivalent. For project-wide scans with thousands
of discussions, the window function approach avoids repeated index probes and produces a
more predictable query plan.
**Page-first scalability rationale**: The `paged_discussions` CTE applies LIMIT before note
scanning. For MR-scoped queries (50-200 discussions) the performance is equivalent to the
non-paged approach. For project-wide scans with thousands of discussions, the page-first
architecture avoids scanning notes for discussions that won't appear in the result, keeping
latency proportional to `--limit` rather than to the total filtered count.
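The same idea can be sketched in memory. This is an illustration only: `Discussion` and `page_first` are hypothetical, the sort is fixed to descending `last_note_at`, and a `HashMap` keyed by discussion ID stands in for an index on `notes(discussion_id)`.

```rust
use std::collections::HashMap;

/// Hypothetical minimal discussion row.
struct Discussion {
    id: i64,
    last_note_at: Option<i64>,
}

/// Phase 1 sorts and truncates to the page; phase 2 looks up notes only for
/// the page IDs. Returns `(discussion_id, note_count)` pairs in page order.
fn page_first(
    discussions: &[Discussion],
    notes_by_discussion: &HashMap<i64, Vec<String>>,
    limit: usize,
) -> Vec<(i64, usize)> {
    // Phase 1: ORDER BY COALESCE(last_note_at, 0) DESC, id DESC LIMIT ?
    let mut sorted: Vec<&Discussion> = discussions.iter().collect();
    sorted.sort_by(|a, b| {
        (b.last_note_at.unwrap_or(0), b.id).cmp(&(a.last_note_at.unwrap_or(0), a.id))
    });
    let page_ids: Vec<i64> = sorted.iter().take(limit).map(|d| d.id).collect();

    // Phase 2: note rollup touches only the discussions on the page.
    page_ids
        .into_iter()
        .map(|id| {
            let count = notes_by_discussion.get(&id).map_or(0, |n| n.len());
            (id, count)
        })
        .collect()
}
```

The number of note lookups equals the page size, whatever the size of the filtered discussion set, which is the property the `paged_discussions` CTE is meant to provide.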
**Note on ordering**: The `COALESCE({sort_column}, 0)` with tiebreaker `fd.id` ensures
deterministic ordering even when timestamps are NULL (partial sync states). The `id`
@@ -1042,6 +1136,10 @@ tiebreaker is cheap (primary key) and prevents unstable sort output.
**Note on SQLite FILTER syntax**: SQLite supports `COUNT(*) FILTER (WHERE ...)` on aggregate
functions only from version 3.30.0 onward. Use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` instead
(as shown above) so the query also runs on older SQLite builds.
**Count query**: The total_count query runs separately against `filtered_discussions` (without
the LIMIT) using `SELECT COUNT(*) FROM filtered_discussions`. This is needed for `has_more`
metadata. The count uses the same filter CTEs but omits notes entirely.
#### 3c-ii. Note expansion query (--include-notes)
When `include_notes > 0`, after the main discussion query, run a **single batched query**
@@ -1103,6 +1201,7 @@ pub struct DiscussionListFilters {
pub since: Option<String>,
pub path: Option<String>,
pub noteable_type: Option<NoteableTypeFilter>,
pub contains: Option<String>,
pub sort: DiscussionSortField,
pub order: SortDirection,
pub include_notes: usize,
@@ -1117,6 +1216,7 @@ Where-clause construction uses `match` on typed enums — never raw string inter
- `since` → `d.first_note_at >= ?` (using `parse_since()`)
- `path` → `EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id AND n.position_new_path LIKE ?)`
- `noteable_type` → match: `Issue` → `d.noteable_type = 'Issue'`, `MergeRequest` → `d.noteable_type = 'MergeRequest'`
- `contains` → `EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id AND n.body LIKE '%' || ? || '%')`
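A minimal sketch of this construction style follows, assuming a reduced `Filters` struct with only three of the fields and a hypothetical `build_where` helper. Enum filters select a fixed SQL fragment through `match`, and user-supplied text is only ever added to the parameter list.

```rust
#[derive(Clone, Copy)]
enum NoteableTypeFilter {
    Issue,
    MergeRequest,
}

/// Reduced stand-in for `DiscussionListFilters` (assumption: three fields only).
#[derive(Default)]
struct Filters {
    path: Option<String>,
    noteable_type: Option<NoteableTypeFilter>,
    contains: Option<String>,
}

/// Build the WHERE clause and its positional parameters.
/// Returns an empty string when no filter is active.
fn build_where(f: &Filters) -> (String, Vec<String>) {
    let mut clauses: Vec<&'static str> = Vec::new();
    let mut params: Vec<String> = Vec::new();

    if let Some(path) = &f.path {
        clauses.push(
            "EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id \
             AND n.position_new_path LIKE ?)",
        );
        params.push(path.clone());
    }
    if let Some(kind) = f.noteable_type {
        // Typed enum maps to a fixed literal; nothing is interpolated.
        clauses.push(match kind {
            NoteableTypeFilter::Issue => "d.noteable_type = 'Issue'",
            NoteableTypeFilter::MergeRequest => "d.noteable_type = 'MergeRequest'",
        });
    }
    if let Some(text) = &f.contains {
        clauses.push(
            "EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id \
             AND n.body LIKE '%' || ? || '%')",
        );
        params.push(text.clone());
    }

    let sql = if clauses.is_empty() {
        String::new()
    } else {
        format!("WHERE {}", clauses.join(" AND "))
    };
    (sql, params)
}
```

One caveat the sketch does not handle: `%` and `_` inside the `--contains` text still act as `LIKE` wildcards, so a literal match would need an `ESCAPE` clause or pre-escaped input.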
#### 3e. Handler wiring
@@ -1128,7 +1228,7 @@ Add match arm:
Some(Commands::Discussions(args)) => handle_discussions(cli.config.as_deref(), args, robot_mode),
```
Handler function:
Handler function (with transaction ownership):
```rust
fn handle_discussions(
@@ -1143,6 +1243,10 @@ fn handle_discussions(
let effective_limit = args.limit.min(500);
let effective_include_notes = args.include_notes.min(20);
// Snapshot consistency: one transaction across all queries
let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
let filters = DiscussionListFilters {
limit: effective_limit,
project: args.project,
@@ -1153,12 +1257,15 @@ fn handle_discussions(
since: args.since,
path: args.path,
noteable_type: args.noteable_type,
contains: args.contains,
sort: args.sort,
order: args.order,
include_notes: effective_include_notes,
};
let result = query_discussions(&conn, &filters, &config)?;
let result = query_discussions(&tx, &filters, &config)?;
tx.commit()?; // read-only, but closes cleanly
let format = if robot_mode && args.format == "table" {
"json"
@@ -1247,7 +1354,7 @@ CSV view: all fields, following same pattern as `print_list_notes_csv`.
.collect(),
```
#### 3h. Query-plan validation
#### 3h. Query-plan validation and indexes
Before merging the discussions command, capture `EXPLAIN QUERY PLAN` output for the three
primary query patterns:
@@ -1255,17 +1362,26 @@ primary query patterns:
- `--project <path> --since 7d --sort last-note`
- `--gitlab-discussion-id <id>`
If plans show table scans on `notes` or `discussions` for these patterns, add targeted indexes
to the `MIGRATIONS` array in `src/core/db.rs`:
**Required baseline index** (directly hit by `--include-notes` expansion, which runs a
`ROW_NUMBER() OVER (PARTITION BY discussion_id ORDER BY created_at DESC, id DESC)` window
on the notes table):
**Candidate indexes** (add only if EXPLAIN QUERY PLAN shows they're needed):
```sql
CREATE INDEX IF NOT EXISTS idx_notes_discussion_created_desc
ON notes(discussion_id, created_at DESC, id DESC);
```
This index is non-negotiable because the include-notes expansion query's performance is
directly proportional to how efficiently it can scan notes per discussion. Without it, SQLite
falls back to a full table scan of the 282K-row notes table for each batch.
**Conditional indexes** (add only if EXPLAIN QUERY PLAN shows they're needed):
- `discussions(project_id, gitlab_discussion_id)` — for ambiguity preflight + direct ID lookup
- `discussions(merge_request_id, last_note_at, id)` — for MR-scoped + sorted queries
- `notes(discussion_id, created_at DESC, id DESC)` — for `--include-notes` expansion
- `notes(discussion_id, is_system, created_at, id)` — for ranked_notes CTE ordering
This is a measured approach: profile first, add indexes only where the query plan demands them.
No speculative index creation.
This is a measured approach: one required index for the critical new path, remaining indexes
added only where the query plan demands them.
### Tests
@@ -1500,7 +1616,7 @@ fn discussions_ambiguous_gitlab_discussion_id_across_projects() {
};
let result = query_discussions(&conn, &filters, &Config::default());
assert!(result.is_err());
// Error should be Ambiguous with both project paths
// Error should be Ambiguous with both project paths and gitlab_project_ids
}
```
@@ -1579,6 +1695,99 @@ fn discussions_first_note_rollup_skips_system_notes() {
}
```
#### Test 15: --contains filter returns matching discussions
```rust
#[test]
fn query_discussions_contains_filter() {
    let conn = create_test_db();
    insert_project(&conn, 1);
    insert_mr(&conn, 1, 1, 99, "Test MR");
    insert_discussion(&conn, 1, "disc-match", 1, None, Some(1), "MergeRequest");
    insert_discussion(&conn, 2, "disc-nomatch", 1, None, Some(1), "MergeRequest");
    insert_note_in_discussion(&conn, 1, 500, 1, 1, "alice", "I really do prefer this approach");
    insert_note_in_discussion(&conn, 2, 501, 2, 1, "bob", "Looks good to me");
    let filters = DiscussionListFilters {
        contains: Some("really do prefer".to_string()),
        ..DiscussionListFilters::default_for_mr(99)
    };
    let result = query_discussions(&conn, &filters, &Config::default()).unwrap();
    assert_eq!(result.discussions.len(), 1);
    assert_eq!(result.discussions[0].gitlab_discussion_id, "disc-match");
}
```
#### Test 16: Nested note bridge fields survive --fields filtering in robot mode
```rust
#[test]
fn discussions_nested_note_bridge_fields_forced_in_robot_mode() {
    // When discussions --include-notes returns nested notes,
    // bridge fields on nested notes must survive --fields filtering
    let mut value = serde_json::json!({
        "data": {
            "discussions": [{
                "gitlab_discussion_id": "abc",
                "noteable_type": "MergeRequest",
                "parent_iid": 99,
                "project_path": "group/repo",
                "gitlab_project_id": 42,
                "note_count": 1,
                "notes": [{
                    "body": "test note",
                    "project_path": "group/repo",
                    "gitlab_project_id": 42,
                    "noteable_type": "MergeRequest",
                    "parent_iid": 99,
                    "gitlab_discussion_id": "abc",
                    "gitlab_note_id": 500
                }]
            }]
        }
    });
    // Agent requests only "note_count"; bridge fields on nested notes must still appear
    filter_fields_robot(
        &mut value,
        "discussions",
        &["note_count".to_string()],
    );
    let note = &value["data"]["discussions"][0]["notes"][0];
    assert!(note.get("gitlab_discussion_id").is_some());
    assert!(note.get("gitlab_note_id").is_some());
    assert!(note.get("gitlab_project_id").is_some());
}
```
#### Test 17: Ambiguity preflight respects scope filters (no false positives)
```rust
#[test]
fn discussions_ambiguity_preflight_respects_scope_filters() {
    let conn = create_test_db();
    insert_project(&conn, 1); // "group/repo-a"
    insert_project(&conn, 2); // "group/repo-b"
    // Same gitlab_discussion_id in both projects
    // But different noteable_types
    insert_discussion(&conn, 1, "shared-id", 1, Some(1), None, "Issue");
    insert_discussion(&conn, 2, "shared-id", 2, None, Some(1), "MergeRequest");
    // Filter by noteable_type narrows to one project — should NOT fire ambiguity
    let filters = DiscussionListFilters {
        gitlab_discussion_id: Some("shared-id".to_string()),
        noteable_type: Some(NoteableTypeFilter::MergeRequest),
        project: None,
        ..DiscussionListFilters::default()
    };
    let result = query_discussions(&conn, &filters, &Config::default());
    assert!(result.is_ok());
    assert_eq!(result.unwrap().discussions.len(), 1);
}
```
---
## 4. Fix Robot-Docs Response Schemas
@@ -1629,6 +1838,7 @@ With:
"--since <period>",
"--path <filepath>",
"--noteable-type <Issue|MergeRequest>",
"--contains <text>",
"--include-notes <N>",
"--sort <first-note|last-note>",
"--order <asc|desc>",
@@ -1831,14 +2041,13 @@ Changes 1 and 2 can be done in parallel. Change 4 must come last since it docume
final schema of all preceding changes.
**Cross-cutting**: The Bridge Contract field guardrail (force-including bridge fields in robot
mode) should be implemented as part of Change 1, since it modifies `filter_fields` in
`robot.rs` which all subsequent changes depend on. The `BRIDGE_FIELDS_*` constants are defined
once and reused by Changes 3 and 4.
mode, including nested notes) should be implemented as part of Change 1, since it modifies
`filter_fields` in `robot.rs` which all subsequent changes depend on. The `BRIDGE_FIELDS_*`
constants are defined once and reused by Changes 3 and 4.
**Cross-cutting**: The snapshot consistency pattern (deferred read transaction) should be
implemented in Change 1 for `query_notes` and carried forward to Change 3 for
`query_discussions`. This is a one-line wrapper that provides correctness guarantees with
zero performance cost.
**Cross-cutting**: The snapshot consistency pattern (deferred read transaction in handlers)
should be implemented in Change 1 for `handle_notes` and carried forward to Change 3 for
`handle_discussions`. Transaction ownership lives in handlers; query helpers accept `&Connection`.
---
@@ -1850,40 +2059,52 @@ After all changes:
`gitlab_discussion_id`, `gitlab_note_id`, and `gitlab_project_id` in the response
2. An agent can run `lore -J discussions --for-mr 3929 --resolution unresolved` to see all
open threads with their IDs
3. An agent can run `lore -J mrs 3929` and see `gitlab_discussion_id`, `resolvable`,
3. An agent can run `lore -J discussions --for-mr 3929 --contains "prefer the approach"` to
find threads by text content without a separate `notes` round-trip
4. An agent can run `lore -J mrs 3929` and see `gitlab_discussion_id`, `resolvable`,
`resolved`, and `last_note_at_iso` on each discussion group, plus `gitlab_note_id` on
each note within
4. `lore robot-docs` lists actual field names for all commands
5. All existing tests still pass
6. No clippy warnings (pedantic + nursery)
7. Robot-docs contract tests pass with field-set parity (not just string-contains), preventing
5. `lore robot-docs` lists actual field names for all commands
6. All existing tests still pass
7. No clippy warnings (pedantic + nursery)
8. Robot-docs contract tests pass with field-set parity (not just string-contains), preventing
future schema drift in both directions
8. Bridge Contract fields (`project_path`, `gitlab_project_id`, `noteable_type`, `parent_iid`,
9. Bridge Contract fields (`project_path`, `gitlab_project_id`, `noteable_type`, `parent_iid`,
`gitlab_discussion_id`, `gitlab_note_id`) are present in every applicable read payload
9. Bridge Contract fields survive `--fields` filtering in robot mode (guardrail enforced)
10. `--gitlab-discussion-id` filter works on both `notes` and `discussions` commands
11. `--include-notes N` populates inline notes on `discussions` output via single batched query
12. CLI-level contract integration tests verify bridge fields through the full handler path
13. `gitlab_note_id` is available in notes list output (alongside `gitlab_id` for back-compat)
10. Bridge Contract fields survive `--fields` filtering in robot mode (guardrail enforced),
including nested notes within `discussions --include-notes`
11. `--gitlab-discussion-id` filter works on both `notes` and `discussions` commands
12. `--include-notes N` populates inline notes on `discussions` output via single batched query
13. CLI-level contract integration tests verify bridge fields through the full handler path
14. `gitlab_note_id` is available in notes list output (alongside `gitlab_id` for back-compat)
and in show detail notes, providing a uniform field name across all commands
14. Ambiguity guardrail fires when `--gitlab-discussion-id` matches multiple projects without
15. Ambiguity guardrail fires when `--gitlab-discussion-id` matches multiple projects without
`--project` specified — **including when LIMIT would have hidden the ambiguity** (preflight
query runs before LIMIT)
15. Output guardrails clamp `--limit` to 500 and `--include-notes` to 20; `meta` reports
query runs before LIMIT). Error includes structured candidates with `gitlab_project_id`
for machine consumption
16. Ambiguity preflight is scope-aware: active filters (noteable_type, for_issue/for_mr) are
applied alongside the discussion ID check, preventing false ambiguity when scope already
narrows to one project
17. Output guardrails clamp `--limit` to 500 and `--include-notes` to 20; `meta` reports
effective values and `has_more` truncation flag
16. Discussion and show queries use deterministic ordering (COALESCE + id tiebreaker) to
18. Discussion and show queries use deterministic ordering (COALESCE + id tiebreaker) to
prevent unstable output during partial sync states
17. Per-discussion truncation signals (`included_note_count`, `has_more_notes`) are accurate
19. Per-discussion truncation signals (`included_note_count`, `has_more_notes`) are accurate
for `--include-notes` output
18. Multi-query commands (`query_notes`, `query_discussions`) use deferred read transactions
for snapshot consistency during concurrent ingest
19. Discussion filters (`resolution`, `noteable_type`, `sort`, `order`) use typed enums
20. Multi-query handlers (`handle_notes`, `handle_discussions`) open deferred read transactions;
query helpers accept `&Connection` for snapshot consistency and testability
21. Discussion filters (`resolution`, `noteable_type`, `sort`, `order`) use typed enums
with match-to-SQL mapping — no raw string interpolation in query construction
20. First-note rollup correctly handles discussions with leading system notes — `first_author`
22. First-note rollup correctly handles discussions with leading system notes — `first_author`
and `first_note_body_snippet` always reflect the first non-system note
21. Query plans for primary discussion query patterns (`--for-mr`, `--project --since`,
23. Query plans for primary discussion query patterns (`--for-mr`, `--project --since`,
`--gitlab-discussion-id`) have been validated via EXPLAIN QUERY PLAN; targeted indexes
added only where scans were observed
24. The `notes(discussion_id, created_at DESC, id DESC)` index is present for `--include-notes`
expansion performance
25. Discussion query uses page-first CTE architecture: note rollups scan only the paged result
set, not all filtered discussions, keeping latency proportional to `--limit`
26. `--contains` filter on `discussions` returns only discussions with matching note text
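
Criteria 17, 18, and 21 above can be illustrated with a minimal sketch. It is not the plan's actual code: the enum names, the column names `first_note_at` and `last_note_at`, and the `EffectiveLimits` struct are hypothetical and chosen only for illustration. The sketch shows typed sort enums mapped to SQL fragments through `match` (so no user-supplied string reaches the query text), an `ORDER BY` clause with a COALESCE plus `id` tiebreaker, and clamping of `--limit` and `--include-notes` to the stated maximums.

```rust
// Hypothetical names throughout; the plan does not show the real definitions.

/// Sort key for discussions. The column names below are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SortField {
    FirstNote,
    LastNote,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SortOrder {
    Asc,
    Desc,
}

impl SortField {
    /// Match-to-SQL mapping: only these fixed literals can enter the query.
    fn column(self) -> &'static str {
        match self {
            SortField::FirstNote => "first_note_at",
            SortField::LastNote => "last_note_at",
        }
    }
}

impl SortOrder {
    fn keyword(self) -> &'static str {
        match self {
            SortOrder::Asc => "ASC",
            SortOrder::Desc => "DESC",
        }
    }
}

/// Deterministic ordering: COALESCE guards against NULL sort values during
/// partial sync, and `id` breaks ties so repeated runs return the same order.
fn order_by_clause(field: SortField, order: SortOrder) -> String {
    let col = field.column();
    let dir = order.keyword();
    format!("ORDER BY COALESCE({col}, 0) {dir}, id {dir}")
}

const MAX_LIMIT: usize = 500;
const MAX_INCLUDE_NOTES: usize = 20;

/// Effective values after clamping, suitable for reporting in `meta`.
#[derive(Debug, PartialEq, Eq)]
struct EffectiveLimits {
    limit: usize,
    include_notes: usize,
}

fn clamp_limits(limit: usize, include_notes: usize) -> EffectiveLimits {
    EffectiveLimits {
        limit: limit.min(MAX_LIMIT),
        include_notes: include_notes.min(MAX_INCLUDE_NOTES),
    }
}

fn main() {
    println!("{}", order_by_clause(SortField::LastNote, SortOrder::Desc));
    println!("{:?}", clamp_limits(10_000, 50));
}
```

Because the sort direction and column come from exhaustive `match` arms, adding a new sort option forces a compile error until its SQL mapping is supplied, which is one way the "no raw string interpolation" criterion could be enforced.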
---
@@ -1902,6 +2123,6 @@ After all changes:
- **`--with-write-hints` flag for inline glab endpoint templates** — rejected because this couples lore's read surface to glab's API surface, violating the read/write split principle. The Bridge Contract gives agents the raw identifiers; constructing glab commands is the agent's responsibility. Adding endpoint templates would require lore to track glab API changes, creating an unnecessary maintenance burden.
- **Show-command note ordering change (`ORDER BY COALESCE(position, ...), created_at, id`)** — rejected because show-command note ordering within a discussion thread is out of scope for this plan. The existing ordering works correctly for present data; the defensive COALESCE pattern is applied to discussion-level ordering where it matters for agent workflows.
- **Query-plan validation as a separate numbered workstream** — rejected because it adds delivery overhead without proportional benefit. Query-plan validation is integrated into workstream 3 as a pre-merge validation step (section 3h), with candidate indexes listed but only added when EXPLAIN QUERY PLAN shows they're needed. This keeps the measured approach without inflating the workstream count.
- **Add `gitlab_note_id` to show-command note detail structs** — rejected because show-command note detail structs already have `gitlab_id` (same value as `id`). The field is unambiguous and consistent with the Bridge Contract. Adding `gitlab_note_id` would create a duplicate and increase payload size without benefit.
- **Add `gitlab_discussion_id` to show-command discussion detail structs** — rejected because show-command discussion detail structs already have `gitlab_discussion_id`. The field is unambiguous and consistent with the Bridge Contract. Adding `gitlab_discussion_id` would create a duplicate and increase payload size without benefit.
- **Add `gitlab_project_id` to show-command discussion detail structs** — rejected because show-command discussion detail structs already have `gitlab_project_id`. The field is unambiguous and consistent with the Bridge Contract. Adding `gitlab_project_id` would create a duplicate and increase payload size without benefit.
- **`--project-id` immutable input filter across notes/discussions/show** — rejected because this is scope creep that touches every command and changes CLI ergonomics. Agents already get `gitlab_project_id` in output to construct API calls; the input-side concern (project renames breaking `--project`) is theoretical and hasn't been observed in practice. The `--project` flag already supports fuzzy matching, which handles most rename scenarios. If real-world evidence surfaces, this can be added later without breaking changes.
- **Schema versioning in robot-docs (`schema_version` field + semver policy)** — rejected because this tool has zero external consumers beyond our own agents, and the contract tests (field-set parity assertions) catch drift whenever the test suite runs. Schema versioning adds bureaucratic overhead (version bumps, compatibility matrices, deprecation policies) without proportional benefit for an internal tool in early development. If lore gains external consumers, this can be reconsidered.
- **Remove "stale" rejected items that "conflict" with active sections** — rejected because the prior entries about show-command structs were stale from iteration 2 and have been cleaned up independently. The rejected section is cumulative by design — it prevents future reviewers from re-proposing changes that have already been evaluated.