Below are the highest-impact revisions I’d make to this plan. I excluded everything listed in your `## Rejected Recommendations` section. **1. Fix a correctness bug in the ambiguity guardrail (must run before `LIMIT`)** The current post-query ambiguity check can silently fail when `--limit` truncates results to one project even though multiple projects match the same `gitlab_discussion_id`. That creates non-deterministic write targeting risk. ```diff @@ ## Ambiguity Guardrail -**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project` -was provided, check if the result set spans multiple `project_path` values. +**Implementation**: Run a preflight distinct-project check when `gitlab_discussion_id` is set +and `--project` was not provided, before the main list query applies `LIMIT`. +Use: +```sql +SELECT DISTINCT p.path_with_namespace +FROM discussions d +JOIN projects p ON p.id = d.project_id +WHERE d.gitlab_discussion_id = ? +LIMIT 3 +``` +If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with project +paths and suggestion to retry with `--project `. ``` --- **2. Add `gitlab_project_id` to the Bridge Contract** `project_path` is human-friendly but mutable (renames/transfers). `gitlab_project_id` gives a stable write target and avoids path re-resolution failures. ```diff @@ ## Bridge Contract (Cross-Cutting) Every read payload that surfaces notes or discussions **MUST** include: - `project_path` +- `gitlab_project_id` - `noteable_type` - `parent_iid` - `gitlab_discussion_id` - `gitlab_note_id` @@ const BRIDGE_FIELDS_NOTES: &[&str] = &[ - "project_path", "noteable_type", "parent_iid", + "project_path", "gitlab_project_id", "noteable_type", "parent_iid", "gitlab_discussion_id", "gitlab_note_id", ]; const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[ - "project_path", "noteable_type", "parent_iid", + "project_path", "gitlab_project_id", "noteable_type", "parent_iid", "gitlab_discussion_id", ]; ``` --- **3. Replace stringly-typed filter/sort fields with enums end-to-end** Right now `sort`, `order`, `resolution`, `noteable_type` are mostly `String`. This is fragile and risks unsafe SQL interpolation drift over time. Typed enums make invalid states unrepresentable. ```diff @@ ## 3a. CLI Args - pub resolution: Option, + pub resolution: Option, @@ - pub noteable_type: Option, + pub noteable_type: Option, @@ - pub sort: String, + pub sort: DiscussionSortField, @@ - pub asc: bool, + pub order: SortDirection, @@ ## 3d. Filters struct - pub resolution: Option, - pub noteable_type: Option, - pub sort: String, - pub order: String, + pub resolution: Option, + pub noteable_type: Option, + pub sort: DiscussionSortField, + pub order: SortDirection, @@ +Map enum -> SQL fragment via `match` in query builder; never interpolate raw strings. ``` --- **4. Enforce snapshot consistency for multi-query commands** `discussions` with `--include-notes` does multiple reads. Without a single read transaction, concurrent ingest can produce mismatched `total_count`, row set, and expanded notes. ```diff @@ ## 3c. SQL Query -pub fn query_discussions(...) +pub fn query_discussions(...) { + // Run count query + page query + note expansion under one deferred read transaction + // so output is a single consistent snapshot. + let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?; ... + tx.commit()?; } @@ ## 1. Add `gitlab_discussion_id` to Notes Output +Apply the same snapshot rule to `query_notes` when returning `total_count` + paged rows. ``` --- **5. Correct first-note rollup semantics (current CTE can return null/incorrect `first_author`)** In the proposed SQL, `rn=1` is computed over all notes but then filtered with `is_system=0`, so threads with a leading system note may incorrectly lose `first_author`/snippet. Also path rollup uses non-deterministic `MAX(...)`. ```diff @@ ## 3c. SQL Query -ranked_notes AS ( +ranked_notes AS ( SELECT n.discussion_id, n.author_username, n.body, n.is_system, n.position_new_path, n.position_new_line, - ROW_NUMBER() OVER ( - PARTITION BY n.discussion_id - ORDER BY n.position, n.id - ) AS rn + ROW_NUMBER() OVER ( + PARTITION BY n.discussion_id + ORDER BY CASE WHEN n.is_system = 0 THEN 0 ELSE 1 END, n.created_at, n.id + ) AS rn_first_note, + ROW_NUMBER() OVER ( + PARTITION BY n.discussion_id + ORDER BY CASE WHEN n.position_new_path IS NULL THEN 1 ELSE 0 END, n.created_at, n.id + ) AS rn_first_position @@ - MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author, - MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body, - MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path, - MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line + MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN author_username END) AS first_author, + MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN body END) AS first_note_body, + MAX(CASE WHEN rn_first_position = 1 THEN position_new_path END) AS position_new_path, + MAX(CASE WHEN rn_first_position = 1 THEN position_new_line END) AS position_new_line ``` --- **6. Add per-discussion truncation signals for `--include-notes`** Top-level `has_more` is useful, but agents also need to know if an individual thread’s notes were truncated. Otherwise they can’t tell if a thread is complete. ```diff @@ ## Response Schema { "gitlab_discussion_id": "...", ... - "notes": [] + "included_note_count": 0, + "has_more_notes": false, + "notes": [] } @@ ## 3b. Domain Structs pub struct DiscussionListRowJson { @@ + pub included_note_count: usize, + pub has_more_notes: bool, #[serde(skip_serializing_if = "Vec::is_empty")] pub notes: Vec, } @@ ## 3c-ii. Note expansion query (--include-notes) -Group by `discussion_id` in Rust and attach notes arrays... +Group by `discussion_id` in Rust, attach notes arrays, and set: +`included_note_count = notes.len()`, +`has_more_notes = note_count > included_note_count`. ``` --- **7. Add explicit query-plan gate and targeted index workstream (measured, not speculative)** This plan introduces heavy discussion-centric reads. You should bake in deterministic performance validation with `EXPLAIN QUERY PLAN` and only then add indexes if missing. ```diff @@ ## Scope: Four workstreams, delivered in order: -4. Fix robot-docs to list actual field names instead of opaque type references +4. Add query-plan validation + targeted index updates for new discussion queries +5. Fix robot-docs to list actual field names instead of opaque type references @@ +## 4. Query-Plan Validation and Targeted Indexes + +Before and after implementing `query_discussions`, capture `EXPLAIN QUERY PLAN` for: +- `--for-mr --resolution unresolved` +- `--project --since 7d --sort last_note` +- `--gitlab-discussion-id ` + +If plans show table scans on `notes`/`discussions`, add indexes in `MIGRATIONS` array: +- `discussions(project_id, gitlab_discussion_id)` +- `discussions(merge_request_id, last_note_at, id)` +- `notes(discussion_id, created_at DESC, id DESC)` +- `notes(discussion_id, position, id)` + +Tests: assert the new query paths return expected rows under indexed schema and no regressions. ``` --- If you want, I can produce a single consolidated “iteration 4” version of the plan text with all seven revisions merged in place.