feat(path): rename-aware ambiguity resolution for suffix probe

When a bare filename like 'operators.ts' matches multiple full paths, check if they are the same file connected by renames (via BFS on mr_file_changes). If so, auto-resolve to the newest path instead of erroring. Also wires path resolution into file-history and trace commands so bare filenames work everywhere. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-17 15:05:47 -05:00
parent 171260a772
commit 714c8c2623
7 changed files with 632 additions and 100 deletions
--- a/docs/plan-expose-discussion-ids.feedback-5.md
+++ b/docs/plan-expose-discussion-ids.feedback-5.md
@@ -0,0 +1,140 @@
+Your iteration 4 plan is already strong. The highest-impact revisions are around query shape, transaction boundaries, and contract stability for agents.
+
+1. **Switch discussions query to a two-phase page-first architecture**
+Analysis: Current `ranked_notes` runs over every filtered discussion before `LIMIT`, which can explode on project-wide queries. A page-first plan keeps complexity proportional to `limit`, improves tail latency, and reduces memory churn.
+```diff
+@@ ## 3c. SQL Query
+-Core query uses a CTE + ranked-notes rollup (window function) to avoid per-row correlated
+-subqueries.
+Core query is split into two phases for scalability:
+1) `paged_discussions` applies filters/sort/LIMIT and returns only page IDs.
+2) Note rollups and optional `--include-notes` expansion run only for those page IDs.
+This bounds note scanning to visible results and stabilizes latency on large projects.
+ 
+-WITH filtered_discussions AS (
+WITH filtered_discussions AS (
+   ...
+ ),
+-ranked_notes AS (
+paged_discussions AS (
+    SELECT id
+    FROM filtered_discussions
+    ORDER BY COALESCE({sort_column}, 0) {order}, id {order}
+    LIMIT ?
+),
+ranked_notes AS (
+   ...
+-    WHERE n.discussion_id IN (SELECT id FROM filtered_discussions)
+    WHERE n.discussion_id IN (SELECT id FROM paged_discussions)
+ )
+```
+
+2. **Move snapshot transaction ownership to handlers (not query helpers)**
+Analysis: This avoids nested transaction edge cases, keeps function signatures clean, and guarantees one snapshot across count + page + include-notes + serialization metadata.
+```diff
+@@ ## Cross-cutting: snapshot consistency
+-Wrap `query_notes` and `query_discussions` in a deferred read transaction.
+Open one deferred read transaction in each handler (`handle_notes`, `handle_discussions`)
+and pass `&Transaction` into query helpers. Query helpers do not open/commit transactions.
+This guarantees a single snapshot across all subqueries and avoids nested tx pitfalls.
+ 
+-pub fn query_discussions(conn: &Connection, ...)
+pub fn query_discussions(tx: &rusqlite::Transaction<'_>, ...)
+```
+
+3. **Add immutable input filter `--project-id` across notes/discussions/show**
+Analysis: You already expose `gitlab_project_id` because paths are mutable; input should support the same immutable selector. This removes failure modes after project renames/transfers.
+```diff
+@@ ## 3a. CLI Args
+    /// Filter by immutable GitLab project ID
+    #[arg(long, help_heading = "Filters", conflicts_with = "project")]
+    pub project_id: Option<i64>,
+@@ ## Bridge Contract
+Input symmetry rule: commands that accept `--project` should also accept `--project-id`.
+If both are present, return usage error (exit code 2).
+```
+
+4. **Enforce bridge fields for nested notes in `discussions --include-notes`**
+Analysis: Current guardrail is entity-level; nested notes can still lose required IDs under aggressive filtering. This is a contract hole for write-bridging.
+```diff
+@@ ### Field Filtering Guardrail
+-In robot mode, `filter_fields` MUST force-include Bridge Contract fields...
+In robot mode, `filter_fields` MUST force-include Bridge Contract fields at all returned levels:
+- discussion row fields
+- nested note fields when `discussions --include-notes` is used
+ 
+const BRIDGE_FIELDS_DISCUSSION_NOTES: &[&str] = &[
+    "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
+    "gitlab_discussion_id", "gitlab_note_id",
+];
+```
+
+5. **Make ambiguity preflight scope-aware and machine-actionable**
+Analysis: Current preflight checks only `gitlab_discussion_id`, which can produce false ambiguity when additional filters already narrow to one project. Also, agents need structured candidates, not only free-text.
+```diff
+@@ ### Ambiguity Guardrail
+-SELECT DISTINCT p.path_with_namespace
+SELECT DISTINCT p.path_with_namespace, p.gitlab_project_id
+ FROM discussions d
+ JOIN projects p ON p.id = d.project_id
+-WHERE d.gitlab_discussion_id = ?
+WHERE d.gitlab_discussion_id = ?
+  /* plus active scope filters: noteable_type, for_issue/for_mr, since/path when present */
+ LIMIT 3
+ 
+-Return LoreError::Ambiguous with message
+Return LoreError::Ambiguous with structured details:
+`{ code, message, candidates:[{project_path, gitlab_project_id}], suggestion }`
+```
+
+6. **Add `--contains` filter to `discussions`**
+Analysis: This is a high-utility agent workflow gap. Agents frequently need “find thread by text then reply”; forcing a separate `notes` search round-trip is unnecessary.
+```diff
+@@ ## 3a. CLI Args
+    /// Filter discussions whose notes contain text
+    #[arg(long, help_heading = "Filters")]
+    pub contains: Option<String>,
+@@ ## 3d. Filters struct
+    pub contains: Option<String>,
+@@ ## 3d. Where-clause construction
+- `path` -> EXISTS (...)
+- `path` -> EXISTS (...)
+- `contains` -> EXISTS (
+    SELECT 1 FROM notes n
+    WHERE n.discussion_id = d.id
+      AND n.body LIKE ?
+  )
+```
+
+7. **Promote two baseline indexes from “candidate” to “required”**
+Analysis: These are directly hit by new primary paths; waiting for post-merge profiling risks immediate perf cliffs in real usage.
+```diff
+@@ ## 3h. Query-plan validation
+-Candidate indexes (add only if EXPLAIN QUERY PLAN shows they're needed):
+-- discussions(project_id, gitlab_discussion_id)
+-- notes(discussion_id, created_at DESC, id DESC)
+Required baseline indexes for this feature:
+- discussions(project_id, gitlab_discussion_id)
+- notes(discussion_id, created_at DESC, id DESC)
+Keep other indexes conditional on EXPLAIN QUERY PLAN.
+```
+
+8. **Add schema versioning and remove contradictory rejected items**
+Analysis: `robot-docs` contract drift is a long-term agent risk; explicit schema versions let clients fail safely. Also, rejected items currently contradict active sections, which creates implementation ambiguity.
+```diff
+@@ ## 4. Fix Robot-Docs Response Schemas
+ "meta": {"elapsed_ms": "int", ...}
+"meta": {"elapsed_ms":"int", ..., "schema_version":"string"}
+
+Schema version policy:
+- bump minor on additive fields
+- bump major on removals/renames
+- expose per-command versions in `robot-docs`
+@@ ## Rejected Recommendations
+-- Add `gitlab_note_id` to show-command note detail structs ... rejected ...
+-- Add `gitlab_discussion_id` to show-command discussion detail structs ... rejected ...
+-- Add `gitlab_project_id` to show-command discussion detail structs ... rejected ...
+Remove stale rejected entries that conflict with accepted workstreams in this plan iteration.
+```
+
+If you want, I can produce a fully rewritten iteration 5 plan document that applies all of the above edits cleanly end-to-end.
--- a/docs/plan-expose-discussion-ids.md
+++ b/docs/plan-expose-discussion-ids.md
@@ -2,7 +2,7 @@
 plan: true
 title: ""
 status: iterating
-iteration: 4
+iteration: 5
 target_iterations: 8
 beads_revision: 0
 related_plans: []
@@ -52,8 +52,9 @@ output.
 ### Field Filtering Guardrail

 In robot mode, `filter_fields` **MUST** force-include Bridge Contract fields even when the
-caller passes a narrower `--fields` list. This prevents agents from accidentally stripping
-the identifiers they need for write operations.
+caller passes a narrower `--fields` list. This applies at **all nesting levels**: both the
+top-level entity fields and nested sub-entities (e.g., notes inside `discussions --include-notes`).
+This prevents agents from accidentally stripping the identifiers they need for write operations.

 **Implementation**: Add a `BRIDGE_FIELDS` constant map per entity type. In `filter_fields()`,
 when operating in robot mode, union the caller's requested fields with the bridge set before
@@ -69,70 +70,127 @@ const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
    "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
    "gitlab_discussion_id",
 ];
+// Applied to nested notes within discussions --include-notes
+const BRIDGE_FIELDS_DISCUSSION_NOTES: &[&str] = &[
+    "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
+    "gitlab_discussion_id", "gitlab_note_id",
+];
 ```

 In `filter_fields`, when entity is `"notes"` or `"discussions"`, merge the bridge set into the
-requested fields before filtering the JSON value. This is a ~5-line change to the existing
-function.
+requested fields before filtering the JSON value. For `"discussions"`, also apply
+`BRIDGE_FIELDS_DISCUSSION_NOTES` to each element of the nested `notes` array. This is a ~10-line
+change to the existing function.
+
+### Snapshot Consistency (Cross-Cutting)
+
+Multi-query commands (`handle_notes`, `handle_discussions`) **MUST** execute all their queries
+within a single deferred read transaction. This guarantees snapshot consistency when a concurrent
+sync/ingest is modifying the database.
+
+**Transaction ownership lives in handlers, not query helpers.** Each handler opens one deferred
+read transaction and passes it to query helpers. Query helpers accept `&Connection` (which
+`Transaction` derefs to via `std::ops::Deref`) so they remain testable with plain connections
+in unit tests. This avoids nested transaction edge cases and guarantees a single snapshot across
+count + page + include-notes + serialization.
+
+```rust
+// In handle_notes / handle_discussions:
+let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
+let result = query_notes(&tx, &filters, &config)?;
+// ... serialize ...
+tx.commit()?;  // read-only, but closes cleanly
+```
+
+Query helpers keep their `conn: &Connection` signature — `Transaction<'_>` implements
+`Deref<Target = Connection>`, so `&tx` coerces to `&Connection` at call sites.

 ### Ambiguity Guardrail

 When filtering by `gitlab_discussion_id` (on either `notes` or `discussions` commands) without
 `--project`, if the query matches discussions in multiple projects:
 - Return an `Ambiguous` error (exit code 18, matching existing convention)
- Include matching project paths in the error message
+- Include matching project paths **and `gitlab_project_id`s** in a structured candidates list
 - Suggest retry with `--project <path>`

-**Implementation**: Run a **preflight distinct-project check** before the main list query
-executes its `LIMIT`. This is critical because a post-query check on the paginated result set
-can silently miss cross-project ambiguity when `LIMIT` truncates results to rows from a single
-project. The preflight query is cheap (hits the `gitlab_discussion_id` index, returns at most
-a few rows) and eliminates non-deterministic write-targeting risk.
+**Implementation**: Run a **scope-aware preflight distinct-project check** before the main list
+query executes its `LIMIT`. The preflight applies active scope filters (noteable_type, since,
+for_issue/for_mr) alongside the discussion ID check, so it won't produce false ambiguity when
+other filters already narrow to one project. This is critical because a post-query check on the
+paginated result set can silently miss cross-project ambiguity when `LIMIT` truncates results to
+rows from a single project. The preflight query is cheap (hits the `gitlab_discussion_id` index,
+returns at most a few rows) and eliminates non-deterministic write-targeting risk.

 ```sql
-- Preflight ambiguity check (runs before main query)
-SELECT DISTINCT p.path_with_namespace
+-- Preflight ambiguity check (runs before main query, includes active scope filters)
+SELECT DISTINCT p.path_with_namespace, p.gitlab_project_id
 FROM discussions d
 JOIN projects p ON p.id = d.project_id
 WHERE d.gitlab_discussion_id = ?
+  -- scope filters applied dynamically:
+  -- AND d.noteable_type = ? (when --noteable-type present)
+  -- AND d.merge_request_id = (SELECT ...) (when --for-mr present)
+  -- AND d.issue_id = (SELECT ...) (when --for-issue present)
 LIMIT 3
 ```

-If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with the
-distinct project paths and suggestion to retry with `--project <path>`.
+If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with structured
+candidates for machine consumption:

 ```rust
 // In query_notes / query_discussions, before executing the main query:
 if let Some(ref disc_id) = filters.gitlab_discussion_id {
    if filters.project.is_none() {
-        let distinct_projects: Vec<String> = conn
+        let candidates: Vec<(String, i64)> = conn
            .prepare(
-                "SELECT DISTINCT p.path_with_namespace \
+                "SELECT DISTINCT p.path_with_namespace, p.gitlab_project_id \
                 FROM discussions d \
                 JOIN projects p ON p.id = d.project_id \
                 WHERE d.gitlab_discussion_id = ? \
                 LIMIT 3"
+                // Note: add scope filter clauses dynamically
            )?
-            .query_map([disc_id], |row| row.get(0))?
+            .query_map([disc_id], |row| Ok((row.get(0)?, row.get(1)?)))?
            .collect::<std::result::Result<Vec<_>, _>>()?;

-        if distinct_projects.len() > 1 {
+        if candidates.len() > 1 {
            return Err(LoreError::Ambiguous {
                message: format!(
-                    "Discussion ID matches {} projects: {}. Use --project to disambiguate.",
-                    distinct_projects.len(),
-                    distinct_projects.join(", ")
+                    "Discussion ID matches {} projects. Use --project to disambiguate.",
+                    candidates.len(),
                ),
+                candidates: candidates.into_iter()
+                    .map(|(path, id)| AmbiguousCandidate { project_path: path, gitlab_project_id: id })
+                    .collect(),
            });
        }
    }
 }
 ```

+In robot mode, the error serializes as:
+```json
+{
+  "error": {
+    "code": "AMBIGUOUS",
+    "message": "Discussion ID matches 2 projects. Use --project to disambiguate.",
+    "candidates": [
+      {"project_path": "group/repo-a", "gitlab_project_id": 42},
+      {"project_path": "group/repo-b", "gitlab_project_id": 99}
+    ],
+    "suggestion": "lore -J discussions --gitlab-discussion-id <id> --project <path>",
+    "actions": ["lore -J discussions --gitlab-discussion-id <id> --project group/repo-a"]
+  }
+}
+```
+
+This gives agents machine-actionable candidates: they can pick a project and retry immediately
+without parsing free-text error messages.
+
 #### 1h. Wrap `query_notes` in a read transaction

-Wrap the count query and page query in a deferred read transaction per the Snapshot Consistency
-cross-cutting requirement. See the Bridge Contract section for the pattern.
+Per the Snapshot Consistency cross-cutting requirement, `handle_notes` opens a deferred read
+transaction and passes it to `query_notes`. See the Snapshot Consistency section for the pattern.

 ### Tests

@@ -337,6 +395,7 @@ fn notes_ambiguous_gitlab_discussion_id_across_projects() {
    // (this can happen since IDs are per-project)
    // Filter by gitlab_discussion_id without --project
    // Assert LoreError::Ambiguous is returned with both project paths
+    // Assert candidates include gitlab_project_id for machine consumption
 }
 ```

@@ -352,6 +411,19 @@ fn notes_ambiguity_preflight_not_defeated_by_limit() {
 }
 ```

+#### Test 8: Ambiguity preflight respects scope filters (no false positives)
+
+```rust
+#[test]
+fn notes_ambiguity_preflight_respects_scope_filters() {
+    let conn = create_test_db();
+    // Insert 2 projects, each with a discussion sharing the same gitlab_discussion_id
+    // But one is Issue-type and the other MergeRequest-type
+    // Filter by gitlab_discussion_id + --noteable-type MergeRequest (narrows to 1 project)
+    // Assert NO ambiguity error — scope filters disambiguate
+}
+```
+
 ---

 ## 2. Add `gitlab_discussion_id` to Show Command Discussion Groups
@@ -644,6 +716,9 @@ lore -J discussions --gitlab-discussion-id 6a9c1750b37d

 # List unresolved threads with latest 2 notes inline (fewer round-trips)
 lore -J discussions --for-mr 99 --resolution unresolved --include-notes 2
+
+# Find discussions containing specific text
+lore -J discussions --for-mr 99 --contains "prefer the approach"
 ```

 ### Response Schema
@@ -801,6 +876,10 @@ pub struct DiscussionsArgs {
    #[arg(long, value_enum, help_heading = "Filters")]
    pub noteable_type: Option<NoteableTypeFilter>,

+    /// Filter discussions whose notes contain text (case-insensitive LIKE match)
+    #[arg(long, help_heading = "Filters")]
+    pub contains: Option<String>,
+
    /// Include up to N latest notes per discussion (0 = none, default; clamped to 20)
    #[arg(long, default_value = "0", help_heading = "Output")]
    pub include_notes: usize,
@@ -925,7 +1004,7 @@ The `included_note_count` is set to `notes.len()` and `has_more_notes` is set to
 `note_count > included_note_count` during the JSON conversion, providing per-discussion
 truncation signals.

-#### 3c. SQL Query
+#### 3c. SQL Query — Two-Phase Page-First Architecture

 **File**: `src/cli/commands/list.rs`

@@ -935,21 +1014,29 @@ pub fn query_discussions(
    filters: &DiscussionListFilters,
    config: &Config,
 ) -> Result<DiscussionListResult> {
-    // Wrap all queries in a deferred read transaction for snapshot consistency
-    let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
+    // NOTE: Transaction is managed by the handler (handle_discussions).
+    // This function receives &Connection (which Transaction derefs to via `std::ops::Deref`).

    // Preflight ambiguity check (if gitlab_discussion_id without project)
    // ... see Ambiguity Guardrail section ...

-    // Main query + count query ...
-    // ... note expansion query (if include_notes > 0) ...
-
-    tx.commit()?;
+    // Phase 1: Filter + sort + LIMIT to get page IDs
+    // Phase 2: Note rollups only for paged results
+    // Phase 3: Optional --include-notes expansion (separate query)
+}
 ```

-Core query uses a CTE + ranked-notes rollup (window function) to avoid per-row correlated
-subqueries. The `ROW_NUMBER()` approach produces a single scan over the notes table, which
-is more predictable than repeated LIMIT 1 sub-selects at scale (200K+ discussions):
+The query uses a **two-phase page-first architecture** for scalability:
+
+1. **Phase 1** (`paged_discussions`): Apply all filters, sort, and LIMIT to produce just the
+   discussion IDs for the current page. This bounds the result set before any note scanning.
+2. **Phase 2** (`ranked_notes` + `note_rollup`): Run note aggregation only for the paged
+   discussion IDs. This ensures note scanning is proportional to `--limit`, not to the total
+   filtered discussion count.
+
+This architecture prevents the performance cliff that occurs on project-wide queries with
+thousands of discussions: instead of scanning notes for all filtered discussions (potentially
+200K+), we scan only for the 50 (or whatever `--limit` is) that will actually be returned.

 ```sql
 WITH filtered_discussions AS (
@@ -961,6 +1048,14 @@ WITH filtered_discussions AS (
    JOIN projects p ON d.project_id = p.id
    {where_sql}
 ),
+-- Phase 1: Page-first — apply sort + LIMIT before note scanning
+paged_discussions AS (
+    SELECT id
+    FROM filtered_discussions
+    ORDER BY COALESCE({sort_column}, 0) {order}, id {order}
+    LIMIT ?
+),
+-- Phase 2: Note rollups only for paged results
 ranked_notes AS (
    SELECT
        n.discussion_id,
@@ -980,7 +1075,7 @@ ranked_notes AS (
                     n.created_at, n.id
        ) AS rn_first_position
    FROM notes n
-    WHERE n.discussion_id IN (SELECT id FROM filtered_discussions)
+    WHERE n.discussion_id IN (SELECT id FROM paged_discussions)
 ),
 note_rollup AS (
    SELECT
@@ -1012,12 +1107,12 @@ SELECT
    nr.position_new_path,
    nr.position_new_line
 FROM filtered_discussions fd
+JOIN paged_discussions pd ON fd.id = pd.id
 JOIN projects p ON fd.project_id = p.id
 LEFT JOIN issues i ON fd.issue_id = i.id
 LEFT JOIN merge_requests m ON fd.merge_request_id = m.id
 LEFT JOIN note_rollup nr ON nr.discussion_id = fd.id
 ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
-LIMIT ?
 ```

 **Dual window function rationale**: The `ranked_notes` CTE uses two separate `ROW_NUMBER()`
@@ -1028,12 +1123,11 @@ displacing the first human author/body, and prevents a non-positioned note from
 the file location. The `MAX(CASE WHEN rn_xxx = 1 ...)` pattern extracts the correct value
 from each independently-ranked sequence.

-**Performance rationale**: The CTE pre-filters discussions before joining notes. The
-`ranked_notes` CTE uses `ROW_NUMBER()` (a single pass over the notes index) instead of
-correlated `(SELECT ... LIMIT 1)` sub-selects per discussion. For MR-scoped queries
-(50-200 discussions) the performance is equivalent. For project-wide scans with thousands
-of discussions, the window function approach avoids repeated index probes and produces a
-more predictable query plan.
+**Page-first scalability rationale**: The `paged_discussions` CTE applies LIMIT before note
+scanning. For MR-scoped queries (50-200 discussions) the performance is equivalent to the
+non-paged approach. For project-wide scans with thousands of discussions, the page-first
+architecture avoids scanning notes for discussions that won't appear in the result, keeping
+latency proportional to `--limit` rather than to the total filtered count.

 **Note on ordering**: The `COALESCE({sort_column}, 0)` with tiebreaker `fd.id` ensures
 deterministic ordering even when timestamps are NULL (partial sync states). The `id`
@@ -1042,6 +1136,10 @@ tiebreaker is cheap (primary key) and prevents unstable sort output.
 **Note on SQLite FILTER syntax**: SQLite does not support `COUNT(*) FILTER (WHERE ...)`.
 Use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` instead (as shown above).

+**Count query**: The total_count query runs separately against `filtered_discussions` (without
+the LIMIT) using `SELECT COUNT(*) FROM filtered_discussions`. This is needed for `has_more`
+metadata. The count uses the same filter CTEs but omits notes entirely.
+
 #### 3c-ii. Note expansion query (--include-notes)

 When `include_notes > 0`, after the main discussion query, run a **single batched query**
@@ -1103,6 +1201,7 @@ pub struct DiscussionListFilters {
    pub since: Option<String>,
    pub path: Option<String>,
    pub noteable_type: Option<NoteableTypeFilter>,
+    pub contains: Option<String>,
    pub sort: DiscussionSortField,
    pub order: SortDirection,
    pub include_notes: usize,
@@ -1117,6 +1216,7 @@ Where-clause construction uses `match` on typed enums — never raw string inter
 - `since` → `d.first_note_at >= ?` (using `parse_since()`)
 - `path` → `EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id AND n.position_new_path LIKE ?)`
 - `noteable_type` → match: `Issue` → `d.noteable_type = 'Issue'`, `MergeRequest` → `d.noteable_type = 'MergeRequest'`
+- `contains` → `EXISTS (SELECT 1 FROM notes n WHERE n.discussion_id = d.id AND n.body LIKE '%' || ? || '%')`

 #### 3e. Handler wiring

@@ -1128,7 +1228,7 @@ Add match arm:
 Some(Commands::Discussions(args)) => handle_discussions(cli.config.as_deref(), args, robot_mode),
 ```

-Handler function:
+Handler function (with transaction ownership):

 ```rust
 fn handle_discussions(
@@ -1143,6 +1243,10 @@ fn handle_discussions(

    let effective_limit = args.limit.min(500);
    let effective_include_notes = args.include_notes.min(20);
+
+    // Snapshot consistency: one transaction across all queries
+    let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
+
    let filters = DiscussionListFilters {
        limit: effective_limit,
        project: args.project,
@@ -1153,12 +1257,15 @@ fn handle_discussions(
        since: args.since,
        path: args.path,
        noteable_type: args.noteable_type,
+        contains: args.contains,
        sort: args.sort,
        order: args.order,
        include_notes: effective_include_notes,
    };

-    let result = query_discussions(&conn, &filters, &config)?;
+    let result = query_discussions(&tx, &filters, &config)?;
+
+    tx.commit()?;  // read-only, but closes cleanly

    let format = if robot_mode && args.format == "table" {
        "json"
@@ -1247,7 +1354,7 @@ CSV view: all fields, following same pattern as `print_list_notes_csv`.
    .collect(),
 ```

-#### 3h. Query-plan validation
+#### 3h. Query-plan validation and indexes

 Before merging the discussions command, capture `EXPLAIN QUERY PLAN` output for the three
 primary query patterns:
@@ -1255,17 +1362,26 @@ primary query patterns:
 - `--project <path> --since 7d --sort last-note`
 - `--gitlab-discussion-id <id>`

-If plans show table scans on `notes` or `discussions` for these patterns, add targeted indexes
-to the `MIGRATIONS` array in `src/core/db.rs`:
+**Required baseline index** (directly hit by `--include-notes` expansion, which runs a
+`ROW_NUMBER() OVER (PARTITION BY discussion_id ORDER BY created_at DESC, id DESC)` window
+on the notes table):

-**Candidate indexes** (add only if EXPLAIN QUERY PLAN shows they're needed):
+```sql
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_created_desc
+ON notes(discussion_id, created_at DESC, id DESC);
+```
+
+This index is non-negotiable because the include-notes expansion query's performance is
+directly proportional to how efficiently it can scan notes per discussion. Without it, SQLite
+falls back to a full table scan of the 282K-row notes table for each batch.
+
+**Conditional indexes** (add only if EXPLAIN QUERY PLAN shows they're needed):
 - `discussions(project_id, gitlab_discussion_id)` — for ambiguity preflight + direct ID lookup
 - `discussions(merge_request_id, last_note_at, id)` — for MR-scoped + sorted queries
- `notes(discussion_id, created_at DESC, id DESC)` — for `--include-notes` expansion
 - `notes(discussion_id, is_system, created_at, id)` — for ranked_notes CTE ordering

-This is a measured approach: profile first, add indexes only where the query plan demands them.
-No speculative index creation.
+This is a measured approach: one required index for the critical new path, remaining indexes
+added only where the query plan demands them.

 ### Tests

@@ -1500,7 +1616,7 @@ fn discussions_ambiguous_gitlab_discussion_id_across_projects() {
    };
    let result = query_discussions(&conn, &filters, &Config::default());
    assert!(result.is_err());
-    // Error should be Ambiguous with both project paths
+    // Error should be Ambiguous with both project paths and gitlab_project_ids
 }
 ```

@@ -1579,6 +1695,99 @@ fn discussions_first_note_rollup_skips_system_notes() {
 }
 ```

+#### Test 15: --contains filter returns matching discussions
+
+```rust
+#[test]
+fn query_discussions_contains_filter() {
+    let conn = create_test_db();
+    insert_project(&conn, 1);
+    insert_mr(&conn, 1, 1, 99, "Test MR");
+    insert_discussion(&conn, 1, "disc-match", 1, None, Some(1), "MergeRequest");
+    insert_discussion(&conn, 2, "disc-nomatch", 1, None, Some(1), "MergeRequest");
+    insert_note_in_discussion(&conn, 1, 500, 1, 1, "alice", "I really do prefer this approach");
+    insert_note_in_discussion(&conn, 2, 501, 2, 1, "bob", "Looks good to me");
+
+    let filters = DiscussionListFilters {
+        contains: Some("really do prefer".to_string()),
+        ..DiscussionListFilters::default_for_mr(99)
+    };
+    let result = query_discussions(&conn, &filters, &Config::default()).unwrap();
+
+    assert_eq!(result.discussions.len(), 1);
+    assert_eq!(result.discussions[0].gitlab_discussion_id, "disc-match");
+}
+```
+
+#### Test 16: Nested note bridge fields survive --fields filtering in robot mode
+
+```rust
+#[test]
+fn discussions_nested_note_bridge_fields_forced_in_robot_mode() {
+    // When discussions --include-notes returns nested notes,
+    // bridge fields on nested notes must survive --fields filtering
+    let mut value = serde_json::json!({
+        "data": {
+            "discussions": [{
+                "gitlab_discussion_id": "abc",
+                "noteable_type": "MergeRequest",
+                "parent_iid": 99,
+                "project_path": "group/repo",
+                "gitlab_project_id": 42,
+                "note_count": 1,
+                "notes": [{
+                    "body": "test note",
+                    "project_path": "group/repo",
+                    "gitlab_project_id": 42,
+                    "noteable_type": "MergeRequest",
+                    "parent_iid": 99,
+                    "gitlab_discussion_id": "abc",
+                    "gitlab_note_id": 500
+                }]
+            }]
+        }
+    });
+
+    // Agent requests only "body" on notes — bridge fields must still appear
+    filter_fields_robot(
+        &mut value,
+        "discussions",
+        &["note_count".to_string()],
+    );
+
+    let note = &value["data"]["discussions"][0]["notes"][0];
+    assert!(note.get("gitlab_discussion_id").is_some());
+    assert!(note.get("gitlab_note_id").is_some());
+    assert!(note.get("gitlab_project_id").is_some());
+}
+```
+
+#### Test 17: Ambiguity preflight respects scope filters (no false positives)
+
+```rust
+#[test]
+fn discussions_ambiguity_preflight_respects_scope_filters() {
+    let conn = create_test_db();
+    insert_project(&conn, 1);  // "group/repo-a"
+    insert_project(&conn, 2);  // "group/repo-b"
+    // Same gitlab_discussion_id in both projects
+    // But different noteable_types
+    insert_discussion(&conn, 1, "shared-id", 1, Some(1), None, "Issue");
+    insert_discussion(&conn, 2, "shared-id", 2, None, Some(1), "MergeRequest");
+
+    // Filter by noteable_type narrows to one project — should NOT fire ambiguity
+    let filters = DiscussionListFilters {
+        gitlab_discussion_id: Some("shared-id".to_string()),
+        noteable_type: Some(NoteableTypeFilter::MergeRequest),
+        project: None,
+        ..DiscussionListFilters::default()
+    };
+    let result = query_discussions(&conn, &filters, &Config::default());
+    assert!(result.is_ok());
+    assert_eq!(result.unwrap().discussions.len(), 1);
+}
+```
+
 ---

 ## 4. Fix Robot-Docs Response Schemas
@@ -1629,6 +1838,7 @@ With:
        "--since <period>",
        "--path <filepath>",
        "--noteable-type <Issue|MergeRequest>",
+        "--contains <text>",
        "--include-notes <N>",
        "--sort <first-note|last-note>",
        "--order <asc|desc>",
@@ -1831,14 +2041,13 @@ Changes 1 and 2 can be done in parallel. Change 4 must come last since it docume
 final schema of all preceding changes.

 **Cross-cutting**: The Bridge Contract field guardrail (force-including bridge fields in robot
-mode) should be implemented as part of Change 1, since it modifies `filter_fields` in
-`robot.rs` which all subsequent changes depend on. The `BRIDGE_FIELDS_*` constants are defined
-once and reused by Changes 3 and 4.
+mode, including nested notes) should be implemented as part of Change 1, since it modifies
+`filter_fields` in `robot.rs` which all subsequent changes depend on. The `BRIDGE_FIELDS_*`
+constants are defined once and reused by Changes 3 and 4.

-**Cross-cutting**: The snapshot consistency pattern (deferred read transaction) should be
-implemented in Change 1 for `query_notes` and carried forward to Change 3 for
-`query_discussions`. This is a one-line wrapper that provides correctness guarantees with
-zero performance cost.
+**Cross-cutting**: The snapshot consistency pattern (deferred read transaction in handlers)
+should be implemented in Change 1 for `handle_notes` and carried forward to Change 3 for
+`handle_discussions`. Transaction ownership lives in handlers; query helpers accept `&Connection`.

 ---

@@ -1850,40 +2059,52 @@ After all changes:
   `gitlab_discussion_id`, `gitlab_note_id`, and `gitlab_project_id` in the response
 2. An agent can run `lore -J discussions --for-mr 3929 --resolution unresolved` to see all
   open threads with their IDs
-3. An agent can run `lore -J mrs 3929` and see `gitlab_discussion_id`, `resolvable`,
+3. An agent can run `lore -J discussions --for-mr 3929 --contains "prefer the approach"` to
+   find threads by text content without a separate `notes` round-trip
+4. An agent can run `lore -J mrs 3929` and see `gitlab_discussion_id`, `resolvable`,
   `resolved`, and `last_note_at_iso` on each discussion group, plus `gitlab_note_id` on
   each note within
-4. `lore robot-docs` lists actual field names for all commands
-5. All existing tests still pass
-6. No clippy warnings (pedantic + nursery)
-7. Robot-docs contract tests pass with field-set parity (not just string-contains), preventing
+5. `lore robot-docs` lists actual field names for all commands
+6. All existing tests still pass
+7. No clippy warnings (pedantic + nursery)
+8. Robot-docs contract tests pass with field-set parity (not just string-contains), preventing
   future schema drift in both directions
-8. Bridge Contract fields (`project_path`, `gitlab_project_id`, `noteable_type`, `parent_iid`,
+9. Bridge Contract fields (`project_path`, `gitlab_project_id`, `noteable_type`, `parent_iid`,
   `gitlab_discussion_id`, `gitlab_note_id`) are present in every applicable read payload
-9. Bridge Contract fields survive `--fields` filtering in robot mode (guardrail enforced)
-10. `--gitlab-discussion-id` filter works on both `notes` and `discussions` commands
-11. `--include-notes N` populates inline notes on `discussions` output via single batched query
-12. CLI-level contract integration tests verify bridge fields through the full handler path
-13. `gitlab_note_id` is available in notes list output (alongside `gitlab_id` for back-compat)
+10. Bridge Contract fields survive `--fields` filtering in robot mode (guardrail enforced),
+    including nested notes within `discussions --include-notes`
+11. `--gitlab-discussion-id` filter works on both `notes` and `discussions` commands
+12. `--include-notes N` populates inline notes on `discussions` output via single batched query
+13. CLI-level contract integration tests verify bridge fields through the full handler path
+14. `gitlab_note_id` is available in notes list output (alongside `gitlab_id` for back-compat)
    and in show detail notes, providing a uniform field name across all commands
-14. Ambiguity guardrail fires when `--gitlab-discussion-id` matches multiple projects without
+15. Ambiguity guardrail fires when `--gitlab-discussion-id` matches multiple projects without
    `--project` specified — **including when LIMIT would have hidden the ambiguity** (preflight
-    query runs before LIMIT)
-15. Output guardrails clamp `--limit` to 500 and `--include-notes` to 20; `meta` reports
+    query runs before LIMIT). Error includes structured candidates with `gitlab_project_id`
+    for machine consumption
+16. Ambiguity preflight is scope-aware: active filters (noteable_type, for_issue/for_mr) are
+    applied alongside the discussion ID check, preventing false ambiguity when scope already
+    narrows to one project
+17. Output guardrails clamp `--limit` to 500 and `--include-notes` to 20; `meta` reports
    effective values and `has_more` truncation flag
-16. Discussion and show queries use deterministic ordering (COALESCE + id tiebreaker) to
+18. Discussion and show queries use deterministic ordering (COALESCE + id tiebreaker) to
    prevent unstable output during partial sync states
-17. Per-discussion truncation signals (`included_note_count`, `has_more_notes`) are accurate
+19. Per-discussion truncation signals (`included_note_count`, `has_more_notes`) are accurate
    for `--include-notes` output
-18. Multi-query commands (`query_notes`, `query_discussions`) use deferred read transactions
-    for snapshot consistency during concurrent ingest
-19. Discussion filters (`resolution`, `noteable_type`, `sort`, `order`) use typed enums
+20. Multi-query handlers (`handle_notes`, `handle_discussions`) open deferred read transactions;
+    query helpers accept `&Connection` for snapshot consistency and testability
+21. Discussion filters (`resolution`, `noteable_type`, `sort`, `order`) use typed enums
    with match-to-SQL mapping — no raw string interpolation in query construction
-20. First-note rollup correctly handles discussions with leading system notes — `first_author`
+22. First-note rollup correctly handles discussions with leading system notes — `first_author`
    and `first_note_body_snippet` always reflect the first non-system note
-21. Query plans for primary discussion query patterns (`--for-mr`, `--project --since`,
+23. Query plans for primary discussion query patterns (`--for-mr`, `--project --since`,
    `--gitlab-discussion-id`) have been validated via EXPLAIN QUERY PLAN; targeted indexes
    added only where scans were observed
+24. The `notes(discussion_id, created_at DESC, id DESC)` index is present for `--include-notes`
+    expansion performance
+25. Discussion query uses page-first CTE architecture: note rollups scan only the paged result
+    set, not all filtered discussions, keeping latency proportional to `--limit`
+26. `--contains` filter on `discussions` returns only discussions with matching note text

 ---

@@ -1902,6 +2123,6 @@ After all changes:
 - **`--with-write-hints` flag for inline glab endpoint templates** — rejected because this couples lore's read surface to glab's API surface, violating the read/write split principle. The Bridge Contract gives agents the raw identifiers; constructing glab commands is the agent's responsibility. Adding endpoint templates would require lore to track glab API changes, creating an unnecessary maintenance burden.
 - **Show-command note ordering change (`ORDER BY COALESCE(position, ...), created_at, id`)** — rejected because show-command note ordering within a discussion thread is out of scope for this plan. The existing ordering works correctly for present data; the defensive COALESCE pattern is applied to discussion-level ordering where it matters for agent workflows.
 - **Query-plan validation as a separate numbered workstream** — rejected because it adds delivery overhead without proportional benefit. Query-plan validation is integrated into workstream 3 as a pre-merge validation step (section 3h), with candidate indexes listed but only added when EXPLAIN QUERY PLAN shows they're needed. This keeps the measured approach without inflating the workstream count.
- **Add `gitlab_note_id` to show-command note detail structs** — rejected because show-command note detail structs already have `gitlab_id` (same value as `id`). The field is unambiguous and consistent with the Bridge Contract. Adding `gitlab_note_id` would create a duplicate and increase payload size without benefit.
- **Add `gitlab_discussion_id` to show-command discussion detail structs** — rejected because show-command discussion detail structs already have `gitlab_discussion_id`. The field is unambiguous and consistent with the Bridge Contract. Adding `gitlab_discussion_id` would create a duplicate and increase payload size without benefit.
- **Add `gitlab_project_id` to show-command discussion detail structs** — rejected because show-command discussion detail structs already have `gitlab_project_id`. The field is unambiguous and consistent with the Bridge Contract. Adding `gitlab_project_id` would create a duplicate and increase payload size without benefit.
+- **`--project-id` immutable input filter across notes/discussions/show** — rejected because this is scope creep touching every command and changing CLI ergonomics. Agents already get `gitlab_project_id` in output to construct API calls; the input-side concern (project renames breaking `--project`) is theoretical and hasn't been observed in practice. The `--project` flag already supports fuzzy matching which handles most rename scenarios. If real-world evidence surfaces, this can be added later without breaking changes.
+- **Schema versioning in robot-docs (`schema_version` field + semver policy)** — rejected because this tool has zero external consumers beyond our own agents, and the contract tests (field-set parity assertions) catch drift at compile time. Schema versioning adds bureaucratic overhead (version bumps, compatibility matrices, deprecation policies) without proportional benefit for an internal tool in early development. If lore gains external consumers, this can be reconsidered.
+- **Remove "stale" rejected items that "conflict" with active sections** — rejected because the prior entries about show-command structs were stale from iteration 2 and have been cleaned up independently. The rejected section is cumulative by design — it prevents future reviewers from re-proposing changes that have already been evaluated.