gitlore/docs/plan-expose-discussion-ids.feedback-4.md at embedding-title-normalization

Files

teernisse 171260a772 feat(cli): implement 'lore trace' command (bd-2n4, bd-9dd)

Gate 5 Code Trace - Tier 1 (API-only, no git blame).
Answers 'Why was this code introduced?' by building
file -> MR -> issue -> discussion chains.

New files:
- src/core/trace.rs: run_trace() query logic with rename-aware
  path resolution, entity_reference-based issue linking, and
  DiffNote discussion extraction
- src/core/trace_tests.rs: 7 unit tests for query logic
- src/cli/commands/trace.rs: CLI command with human output,
  robot JSON output, and :line suffix parsing (5 tests)

Human output shows full content (no truncation).
Robot JSON truncates discussion bodies to 500 chars for token efficiency.

Wiring:
- TraceArgs + Commands::Trace in cli/mod.rs
- handle_trace in main.rs
- VALID_COMMANDS + robot-docs manifest entry
- COMMAND_FLAGS autocorrect registry entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-17 14:57:21 -05:00

7.8 KiB

Raw Permalink Blame History

Below are the highest-impact revisions I’d make to this plan. I excluded everything listed in your ## Rejected Recommendations section.

1. Fix a correctness bug in the ambiguity guardrail (must run before LIMIT)

The current post-query ambiguity check can silently fail when --limit truncates results to one project even though multiple projects match the same gitlab_discussion_id. That creates non-deterministic write targeting risk.

@@ ## Ambiguity Guardrail
-**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project`
-was provided, check if the result set spans multiple `project_path` values.
+**Implementation**: Run a preflight distinct-project check when `gitlab_discussion_id` is set
+and `--project` was not provided, before the main list query applies `LIMIT`.
+Use:
+```sql
+SELECT DISTINCT p.path_with_namespace
+FROM discussions d
+JOIN projects p ON p.id = d.project_id
+WHERE d.gitlab_discussion_id = ?
+LIMIT 3
+```
+If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with project
+paths and suggestion to retry with `--project <path>`.

2. Add gitlab_project_id to the Bridge Contract

project_path is human-friendly but mutable (renames/transfers). gitlab_project_id gives a stable write target and avoids path re-resolution failures.

@@ ## Bridge Contract (Cross-Cutting)
 Every read payload that surfaces notes or discussions **MUST** include:
 - `project_path`
+- `gitlab_project_id`
 - `noteable_type`
 - `parent_iid`
 - `gitlab_discussion_id`
 - `gitlab_note_id`
@@
 const BRIDGE_FIELDS_NOTES: &[&str] = &[
-    "project_path", "noteable_type", "parent_iid",
+    "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
     "gitlab_discussion_id", "gitlab_note_id",
 ];
 const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
-    "project_path", "noteable_type", "parent_iid",
+    "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
     "gitlab_discussion_id",
 ];

3. Replace stringly-typed filter/sort fields with enums end-to-end

Right now sort, order, resolution, noteable_type are mostly String. This is fragile and risks unsafe SQL interpolation drift over time. Typed enums make invalid states unrepresentable.

@@ ## 3a. CLI Args
-    pub resolution: Option<String>,
+    pub resolution: Option<ResolutionFilter>,
@@
-    pub noteable_type: Option<String>,
+    pub noteable_type: Option<NoteableTypeFilter>,
@@
-    pub sort: String,
+    pub sort: DiscussionSortField,
@@
-    pub asc: bool,
+    pub order: SortDirection,
@@ ## 3d. Filters struct
-    pub resolution: Option<String>,
-    pub noteable_type: Option<String>,
-    pub sort: String,
-    pub order: String,
+    pub resolution: Option<ResolutionFilter>,
+    pub noteable_type: Option<NoteableTypeFilter>,
+    pub sort: DiscussionSortField,
+    pub order: SortDirection,
@@
+Map enum -> SQL fragment via `match` in query builder; never interpolate raw strings.

4. Enforce snapshot consistency for multi-query commands

discussions with --include-notes does multiple reads. Without a single read transaction, concurrent ingest can produce mismatched total_count, row set, and expanded notes.

@@ ## 3c. SQL Query
-pub fn query_discussions(...)
+pub fn query_discussions(...)
 {
+    // Run count query + page query + note expansion under one deferred read transaction
+    // so output is a single consistent snapshot.
+    let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
     ...
+    tx.commit()?;
 }
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+Apply the same snapshot rule to `query_notes` when returning `total_count` + paged rows.

5. Correct first-note rollup semantics (current CTE can return null/incorrect first_author)

In the proposed SQL, rn=1 is computed over all notes but then filtered with is_system=0, so threads with a leading system note may incorrectly lose first_author/snippet. Also path rollup uses non-deterministic MAX(...).

@@ ## 3c. SQL Query
-ranked_notes AS (
+ranked_notes AS (
     SELECT
         n.discussion_id,
         n.author_username,
         n.body,
         n.is_system,
         n.position_new_path,
         n.position_new_line,
-        ROW_NUMBER() OVER (
-            PARTITION BY n.discussion_id
-            ORDER BY n.position, n.id
-        ) AS rn
+        ROW_NUMBER() OVER (
+            PARTITION BY n.discussion_id
+            ORDER BY CASE WHEN n.is_system = 0 THEN 0 ELSE 1 END, n.created_at, n.id
+        ) AS rn_first_note,
+        ROW_NUMBER() OVER (
+            PARTITION BY n.discussion_id
+            ORDER BY CASE WHEN n.position_new_path IS NULL THEN 1 ELSE 0 END, n.created_at, n.id
+        ) AS rn_first_position
@@
-        MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
-        MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
-        MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
-        MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+        MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN author_username END) AS first_author,
+        MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN body END) AS first_note_body,
+        MAX(CASE WHEN rn_first_position = 1 THEN position_new_path END) AS position_new_path,
+        MAX(CASE WHEN rn_first_position = 1 THEN position_new_line END) AS position_new_line

6. Add per-discussion truncation signals for --include-notes

Top-level has_more is useful, but agents also need to know if an individual thread’s notes were truncated. Otherwise they can’t tell if a thread is complete.

@@ ## Response Schema
       {
         "gitlab_discussion_id": "...",
         ...
-        "notes": []
+        "included_note_count": 0,
+        "has_more_notes": false,
+        "notes": []
       }
@@ ## 3b. Domain Structs
 pub struct DiscussionListRowJson {
@@
+    pub included_note_count: usize,
+    pub has_more_notes: bool,
     #[serde(skip_serializing_if = "Vec::is_empty")]
     pub notes: Vec<NoteListRowJson>,
 }
@@ ## 3c-ii. Note expansion query (--include-notes)
-Group by `discussion_id` in Rust and attach notes arrays...
+Group by `discussion_id` in Rust, attach notes arrays, and set:
+`included_note_count = notes.len()`,
+`has_more_notes = note_count > included_note_count`.

7. Add explicit query-plan gate and targeted index workstream (measured, not speculative)

This plan introduces heavy discussion-centric reads. You should bake in deterministic performance validation with EXPLAIN QUERY PLAN and only then add indexes if missing.

@@ ## Scope: Four workstreams, delivered in order:
-4. Fix robot-docs to list actual field names instead of opaque type references
+4. Add query-plan validation + targeted index updates for new discussion queries
+5. Fix robot-docs to list actual field names instead of opaque type references
@@
+## 4. Query-Plan Validation and Targeted Indexes
+
+Before and after implementing `query_discussions`, capture `EXPLAIN QUERY PLAN` for:
+- `--for-mr <iid> --resolution unresolved`
+- `--project <path> --since 7d --sort last_note`
+- `--gitlab-discussion-id <id>`
+
+If plans show table scans on `notes`/`discussions`, add indexes in `MIGRATIONS` array:
+- `discussions(project_id, gitlab_discussion_id)`
+- `discussions(merge_request_id, last_note_at, id)`
+- `notes(discussion_id, created_at DESC, id DESC)`
+- `notes(discussion_id, position, id)`
+
+Tests: assert the new query paths return expected rows under indexed schema and no regressions.

If you want, I can produce a single consolidated “iteration 4” version of the plan text with all seven revisions merged in place.

7.8 KiB Raw Permalink Blame History Unescape Escape

7.8 KiB

Raw Permalink Blame History