feat(cli): implement 'lore trace' command (bd-2n4, bd-9dd)

Gate 5 Code Trace - Tier 1 (API-only, no git blame).
Answers 'Why was this code introduced?' by building
file -> MR -> issue -> discussion chains.

New files:
- src/core/trace.rs: run_trace() query logic with rename-aware
  path resolution, entity_reference-based issue linking, and
  DiffNote discussion extraction
- src/core/trace_tests.rs: 7 unit tests for query logic
- src/cli/commands/trace.rs: CLI command with human output,
  robot JSON output, and :line suffix parsing (5 tests)

Wiring:
- TraceArgs + Commands::Trace in cli/mod.rs
- handle_trace in main.rs
- VALID_COMMANDS + robot-docs manifest entry
- COMMAND_FLAGS autocorrect registry entry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
teernisse
2026-02-17 14:16:45 -05:00
parent a1bca10408
commit 415f7e69af
13 changed files with 1514 additions and 78 deletions


@@ -0,0 +1,147 @@
1. **Make `gitlab_note_id` explicit in all note-level payloads without breaking existing consumers**
Rationale: Your Bridge Contract already requires `gitlab_note_id`, but the current plan keeps only `gitlab_id` in the `notes` list and adds `gitlab_note_id` only in `show`. That forces agents to special-case commands. Add `gitlab_note_id` as an alias field everywhere note-level data appears, while keeping `gitlab_id` for compatibility.
```diff
@@ Bridge Contract (Cross-Cutting)
-Every read payload that surfaces notes or discussions MUST include:
+Every read payload that surfaces notes or discussions MUST include:
- project_path
- noteable_type
- parent_iid
- gitlab_discussion_id
- gitlab_note_id (when note-level data is returned — i.e., in notes list and show detail)
+ - Back-compat rule: note payloads may continue exposing `gitlab_id`, but MUST also expose `gitlab_note_id` with the same value.
@@ 1. Add `gitlab_discussion_id` to Notes Output
-#### 1c. Add field to `NoteListRowJson`
+#### 1c. Add fields to `NoteListRowJson`
+Add `gitlab_note_id` alias in addition to existing `gitlab_id` (no rename, no breakage).
@@ 1f. Update `--fields minimal` preset
-"notes" => ["id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
+"notes" => ["id", "gitlab_note_id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
```
2. **Avoid duplicate flag semantics for discussion filtering**
Rationale: `notes` already has `--discussion-id` and it already maps to `d.gitlab_discussion_id`. Adding a second independent flag/field (`--gitlab-discussion-id`) increases complexity and invites precedence bugs. Keep one backing filter field and make the new flag an alias.
```diff
@@ 1g. Add `--gitlab-discussion-id` filter to notes
-Allow filtering notes directly by GitLab discussion thread ID...
+Normalize discussion ID flags:
+- Keep one backing filter field (`discussion_id`)
+- Support both `--discussion-id` (existing) and `--gitlab-discussion-id` (alias)
+- If both are provided, clap should reject as duplicate/alias conflict
```
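A minimal sketch of the "one backing field" rule, written against the standard library only rather than clap. The function name `resolve_discussion_filter` and the `FlagError` type are hypothetical; in the real CLI the conflict would more likely be expressed through clap's own alias or conflict attributes.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum FlagError {
    /// Both spellings of the flag were supplied at once.
    Conflict,
}

/// Collapse the two flag spellings into the single backing filter value.
pub fn resolve_discussion_filter(
    discussion_id: Option<String>,
    gitlab_discussion_id: Option<String>,
) -> Result<Option<String>, FlagError> {
    match (discussion_id, gitlab_discussion_id) {
        // Supplying both is rejected instead of picking a winner silently.
        (Some(_), Some(_)) => Err(FlagError::Conflict),
        // Either spelling feeds the same backing field.
        (Some(id), None) | (None, Some(id)) => Ok(Some(id)),
        (None, None) => Ok(None),
    }
}
```

Because the query layer only ever sees one `Option<String>`, there is no precedence rule to document or test.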
3. **Add ambiguity guardrails for cross-project discussion IDs**
Rationale: `gitlab_discussion_id` is unique per project, not globally. Filtering by discussion ID without project can return multiple rows across repos, which breaks deterministic write bridging. Fail fast with an `Ambiguous` error and actionable fix (`--project`).
```diff
@@ Bridge Contract (Cross-Cutting)
+### Ambiguity Guardrail
+When filtering by `gitlab_discussion_id` without `--project`, if multiple projects match:
+- return `Ambiguous` error
+- include matching project paths in message
+- suggest retry with `--project <path>`
```
4. **Replace `--include-notes` N+1 retrieval with one batched top-N query**
Rationale: The current plan's per-discussion follow-up query scales poorly and creates latency spikes. Use a single window-function query over selected discussion IDs and group rows in Rust. This is both faster and more predictable.
```diff
@@ 3c-ii. Note expansion query (--include-notes)
-When `include_notes > 0`, after the main discussion query, run a follow-up query per discussion...
+When `include_notes > 0`, run one batched query:
+WITH ranked_notes AS (
+ SELECT
+ n.*,
+ d.gitlab_discussion_id,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY n.created_at DESC, n.id DESC
+ ) AS rn
+ FROM notes n
+ JOIN discussions d ON d.id = n.discussion_id
+ WHERE n.discussion_id IN ( ...selected discussion ids... )
+)
+SELECT ... FROM ranked_notes WHERE rn <= ?
+ORDER BY discussion_id, rn;
+
+Group by `discussion_id` in Rust and attach notes arrays without per-thread round-trips.
```
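The "group rows in Rust" step could look roughly like the sketch below. `RankedNote` is a hypothetical, trimmed-down stand-in for the real row type, and the rows are assumed to arrive already ordered by `discussion_id, rn` as in the query above.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down row returned by the batched query.
#[derive(Debug, Clone, PartialEq)]
pub struct RankedNote {
    pub discussion_id: i64,
    pub note_id: i64,
    pub rn: u32,
}

/// Group batched rows into one vector per discussion, preserving row order.
pub fn group_notes(rows: Vec<RankedNote>) -> BTreeMap<i64, Vec<RankedNote>> {
    let mut grouped: BTreeMap<i64, Vec<RankedNote>> = BTreeMap::new();
    for row in rows {
        grouped.entry(row.discussion_id).or_default().push(row);
    }
    grouped
}
```

Discussions with no notes simply have no entry in the map, so the caller can fall back to an empty `notes` array for them.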
5. **Add hard output guardrails and explicit truncation metadata**
Rationale: `--limit` and `--include-notes` are unbounded today. For robot workflows this can accidentally generate huge payloads. Cap values and surface effective limits plus truncation state in `meta`.
```diff
@@ 3a. CLI Args
- pub limit: usize,
+ pub limit: usize, // clamp to max (e.g., 500)
- pub include_notes: usize,
+ pub include_notes: usize, // clamp to max (e.g., 20)
@@ Response Schema
- "meta": { "elapsed_ms": 12 }
+ "meta": {
+ "elapsed_ms": 12,
+ "effective_limit": 50,
+ "effective_include_notes": 2,
+ "has_more": true
+ }
```
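As a rough illustration of the guardrail, the clamping and the `has_more` flag can be computed in one place. The caps (500 and 20) are the example values from the diff above; the `Meta` struct and `build_meta` function are hypothetical names for this sketch.

```rust
pub const MAX_LIMIT: usize = 500;
pub const MAX_INCLUDE_NOTES: usize = 20;

/// Hypothetical subset of the `meta` block (elapsed_ms omitted).
#[derive(Debug, PartialEq)]
pub struct Meta {
    pub effective_limit: usize,
    pub effective_include_notes: usize,
    pub has_more: bool,
}

/// Clamp the requested values and derive the truncation flag.
pub fn build_meta(limit: usize, include_notes: usize, total_count: usize, showing: usize) -> Meta {
    Meta {
        effective_limit: limit.min(MAX_LIMIT),
        effective_include_notes: include_notes.min(MAX_INCLUDE_NOTES),
        // Truncated whenever more rows matched than were returned.
        has_more: total_count > showing,
    }
}
```

Reporting the effective values lets an agent notice that a request for `--limit 1000` was served with 500 without parsing any warning text.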
6. **Strengthen deterministic ordering and null handling**
Rationale: `first_note_at`, `last_note_at`, and note `position` can be null/incomplete during partial sync states. Add null-safe ordering to avoid unstable output and flaky automation.
```diff
@@ 2c. Update queries to SELECT new fields
-... ORDER BY first_note_at
+... ORDER BY COALESCE(first_note_at, last_note_at, 0), id
@@ show note query
-ORDER BY position
+ORDER BY COALESCE(position, 9223372036854775807), created_at, id
@@ 3c. SQL Query
-ORDER BY {sort_column} {order}
+ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
```
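To make the intended ordering concrete, here is a small in-memory equivalent of `ORDER BY COALESCE(first_note_at, last_note_at, 0), id`. The `DiscussionRow` struct is a hypothetical reduction of the real row; timestamps are assumed to be integer milliseconds.

```rust
/// Hypothetical reduced discussion row with nullable timestamps.
#[derive(Debug, Clone, PartialEq)]
pub struct DiscussionRow {
    pub id: i64,
    pub first_note_at: Option<i64>,
    pub last_note_at: Option<i64>,
}

/// Mirror of COALESCE(first_note_at, last_note_at, 0) with `id` as tiebreaker.
pub fn sort_key(row: &DiscussionRow) -> (i64, i64) {
    (row.first_note_at.or(row.last_note_at).unwrap_or(0), row.id)
}

/// Sort rows deterministically even when both timestamps are NULL.
pub fn sort_discussions(rows: &mut [DiscussionRow]) {
    rows.sort_by_key(sort_key);
}
```

Rows with no timestamps at all sort first (key 0) and are ordered among themselves by `id`, so repeated runs over the same data produce the same output.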
7. **Make write-bridging more useful with optional command hints**
Rationale: Exposing IDs is necessary but not sufficient; agents still need to assemble endpoints repeatedly. Add optional `--with-write-hints` that injects compact endpoint templates (`reply`, `resolve`) derived from row context. This improves usability without bloating default output.
```diff
@@ 3a. CLI Args
+ /// Include machine-actionable glab write hints per row
+ #[arg(long, help_heading = "Output")]
+ pub with_write_hints: bool,
@@ Response Schema (notes/discussions/show)
+ "write_hints?": {
+ "reply_endpoint": "string",
+ "resolve_endpoint?": "string"
+ }
```
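One possible shape for deriving the hints from row context is sketched below. The endpoint paths follow the GitLab discussions REST API as I understand it and should be verified against the GitLab docs before use; the `WriteHints` struct, the `write_hints` function, and the choice to emit a resolve hint only for merge requests are assumptions of this sketch.

```rust
/// Hypothetical shape mirroring the proposed `write_hints` object.
#[derive(Debug, PartialEq)]
pub struct WriteHints {
    pub reply_endpoint: String,
    pub resolve_endpoint: Option<String>,
}

/// Build endpoint templates from the Bridge Contract fields of one row.
pub fn write_hints(
    project_path: &str,
    noteable_type: &str,
    parent_iid: i64,
    gitlab_discussion_id: &str,
) -> Option<WriteHints> {
    // Sketch-level encoding: only '/' is escaped; real code should percent-encode fully.
    let project = project_path.replace('/', "%2F");
    let (segment, resolvable) = match noteable_type {
        "Issue" => ("issues", false),
        "MergeRequest" => ("merge_requests", true),
        _ => return None,
    };
    let thread =
        format!("/projects/{project}/{segment}/{parent_iid}/discussions/{gitlab_discussion_id}");
    Some(WriteHints {
        reply_endpoint: format!("{thread}/notes"),
        resolve_endpoint: resolvable.then(|| format!("{thread}?resolved=true")),
    })
}
```

Returning `None` for unknown noteable types keeps the hint strictly optional, which matches the `write_hints?` marker in the schema.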
8. **Upgrade robot-docs/contract validation from string-contains to parity checks**
Rationale: `contains("gitlab_discussion_id")` catches very little and allows schema drift. Build field-set parity tests that compare actual serialized JSON keys to robot-docs declared fields for `notes`, `discussions`, and `show` discussion nodes.
```diff
@@ 4f. Add robot-docs contract tests
-assert!(notes_schema.contains("gitlab_discussion_id"));
+let declared = parse_schema_field_list(notes_schema);
+let sample = sample_notes_row_json_keys();
+assert_required_subset(&declared, &["project_path","noteable_type","parent_iid","gitlab_discussion_id","gitlab_note_id"]);
+assert_schema_matches_payload(&declared, &sample);
@@ 4g. Add CLI-level contract integration tests
+Add parity tests for:
+- notes list JSON
+- discussions list JSON
+- issues show discussions[*]
+- mrs show discussions[*]
```
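A parity check along these lines might be built from two small helpers, sketched here with the standard library only. `parse_schema_field_list` is named in the diff above but its body is not shown, so this implementation is an assumption; it handles flat schema strings such as the `notes` entry and would need extending for nested ones like `notes:[{...}]`. `schema_drift` is a hypothetical helper name.

```rust
use std::collections::BTreeSet;

/// Parse a flat robot-docs schema string such as "[{id:int, body:string?}]"
/// into its declared field names.
pub fn parse_schema_field_list(schema: &str) -> BTreeSet<String> {
    schema
        .trim()
        .trim_start_matches("[{")
        .trim_end_matches("}]")
        .split(',')
        .filter_map(|entry| entry.split(':').next())
        .map(|name| name.trim().to_string())
        .filter(|name| !name.is_empty())
        .collect()
}

/// Returns (declared but missing from payload, present in payload but undeclared).
/// Both vectors empty means the docs and the payload agree.
pub fn schema_drift(
    declared: &BTreeSet<String>,
    actual: &BTreeSet<String>,
) -> (Vec<String>, Vec<String>) {
    let missing_from_payload = declared.difference(actual).cloned().collect();
    let undocumented = actual.difference(declared).cloned().collect();
    (missing_from_payload, undocumented)
}
```

Unlike a `contains(...)` assertion, a failure here names exactly which fields drifted and in which direction.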
If you want, I can produce a full revised v3 plan text with these edits merged end-to-end so it's ready to execute directly.


@@ -0,0 +1,207 @@
Below are the highest-impact revisions I'd make to this plan. I excluded everything listed in your `## Rejected Recommendations` section.
**1. Fix a correctness bug in the ambiguity guardrail (must run before `LIMIT`)**
The current post-query ambiguity check can silently fail when `--limit` truncates results to one project even though multiple projects match the same `gitlab_discussion_id`. That creates non-deterministic write targeting risk.
```diff
@@ ## Ambiguity Guardrail
-**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project`
-was provided, check if the result set spans multiple `project_path` values.
+**Implementation**: Run a preflight distinct-project check when `gitlab_discussion_id` is set
+and `--project` was not provided, before the main list query applies `LIMIT`.
+Use:
+```sql
+SELECT DISTINCT p.path_with_namespace
+FROM discussions d
+JOIN projects p ON p.id = d.project_id
+WHERE d.gitlab_discussion_id = ?
+LIMIT 3
+```
+If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with project
+paths and suggestion to retry with `--project <path>`.
```
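The decision step that follows the preflight query could be as small as the sketch below. It takes the (at most three) distinct project paths the SQL returns; `PreflightError` and `check_ambiguity` are hypothetical names standing in for `LoreError::Ambiguous` and wherever the check ends up living.

```rust
/// Hypothetical stand-in for `LoreError::Ambiguous`.
#[derive(Debug, PartialEq)]
pub enum PreflightError {
    Ambiguous { message: String },
}

/// Decide whether the main query may run, given the preflight result.
pub fn check_ambiguity(
    project_filter: Option<&str>,
    matching_projects: &[String],
) -> Result<(), PreflightError> {
    // An explicit --project, or at most one matching project, is unambiguous.
    if project_filter.is_some() || matching_projects.len() <= 1 {
        return Ok(());
    }
    Err(PreflightError::Ambiguous {
        message: format!(
            "Discussion ID matches multiple projects: {}. Use --project <path> to disambiguate.",
            matching_projects.join(", ")
        ),
    })
}
```

Since the preflight is capped with `LIMIT 3`, the message says "multiple" instead of quoting an exact project count.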
---
**2. Add `gitlab_project_id` to the Bridge Contract**
`project_path` is human-friendly but mutable (renames/transfers). `gitlab_project_id` gives a stable write target and avoids path re-resolution failures.
```diff
@@ ## Bridge Contract (Cross-Cutting)
Every read payload that surfaces notes or discussions **MUST** include:
- `project_path`
+- `gitlab_project_id`
- `noteable_type`
- `parent_iid`
- `gitlab_discussion_id`
- `gitlab_note_id`
@@
const BRIDGE_FIELDS_NOTES: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id", "gitlab_note_id",
];
const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
- "project_path", "noteable_type", "parent_iid",
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
"gitlab_discussion_id",
];
```
---
**3. Replace stringly-typed filter/sort fields with enums end-to-end**
Right now `sort`, `order`, `resolution`, and `noteable_type` are mostly `String`. This is fragile and risks unsafe SQL interpolation creeping in over time. Typed enums make invalid states unrepresentable.
```diff
@@ ## 3a. CLI Args
- pub resolution: Option<String>,
+ pub resolution: Option<ResolutionFilter>,
@@
- pub noteable_type: Option<String>,
+ pub noteable_type: Option<NoteableTypeFilter>,
@@
- pub sort: String,
+ pub sort: DiscussionSortField,
@@
- pub asc: bool,
+ pub order: SortDirection,
@@ ## 3d. Filters struct
- pub resolution: Option<String>,
- pub noteable_type: Option<String>,
- pub sort: String,
- pub order: String,
+ pub resolution: Option<ResolutionFilter>,
+ pub noteable_type: Option<NoteableTypeFilter>,
+ pub sort: DiscussionSortField,
+ pub order: SortDirection,
@@
+Map enum -> SQL fragment via `match` in query builder; never interpolate raw strings.
```
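A sketch of the enum-to-SQL mapping, assuming two sort fields. The type names come from the diff above, but the variants (`FirstNote`, `LastNote`) and the column names are assumptions based on the queries elsewhere in the plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DiscussionSortField {
    FirstNote,
    LastNote,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SortDirection {
    Asc,
    Desc,
}

impl DiscussionSortField {
    /// Every variant maps to a fixed, known-safe column reference.
    pub fn as_sql(self) -> &'static str {
        match self {
            Self::FirstNote => "fd.first_note_at",
            Self::LastNote => "fd.last_note_at",
        }
    }
}

impl SortDirection {
    pub fn as_sql(self) -> &'static str {
        match self {
            Self::Asc => "ASC",
            Self::Desc => "DESC",
        }
    }
}

/// Build the ORDER BY clause from typed values only; no user string reaches the SQL.
pub fn order_by_clause(sort: DiscussionSortField, order: SortDirection) -> String {
    format!(
        "ORDER BY COALESCE({col}, 0) {dir}, fd.id {dir}",
        col = sort.as_sql(),
        dir = order.as_sql()
    )
}
```

Adding a new sort field then forces a compile error in `as_sql` until its SQL fragment is supplied, which is the property the recommendation is after.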
---
**4. Enforce snapshot consistency for multi-query commands**
`discussions` with `--include-notes` does multiple reads. Without a single read transaction, concurrent ingest can produce mismatched `total_count`, row set, and expanded notes.
```diff
@@ ## 3c. SQL Query
-pub fn query_discussions(...)
+pub fn query_discussions(...)
{
+ // Run count query + page query + note expansion under one deferred read transaction
+ // so output is a single consistent snapshot.
+ let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
...
+ tx.commit()?;
}
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
+Apply the same snapshot rule to `query_notes` when returning `total_count` + paged rows.
```
---
**5. Correct first-note rollup semantics (current CTE can return null/incorrect `first_author`)**
In the proposed SQL, `rn=1` is computed over all notes but then filtered with `is_system=0`, so threads with a leading system note may incorrectly lose `first_author`/snippet. Also, the path rollup uses a non-deterministic `MAX(...)`.
```diff
@@ ## 3c. SQL Query
-ranked_notes AS (
+ranked_notes AS (
SELECT
n.discussion_id,
n.author_username,
n.body,
n.is_system,
n.position_new_path,
n.position_new_line,
- ROW_NUMBER() OVER (
- PARTITION BY n.discussion_id
- ORDER BY n.position, n.id
- ) AS rn
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.is_system = 0 THEN 0 ELSE 1 END, n.created_at, n.id
+ ) AS rn_first_note,
+ ROW_NUMBER() OVER (
+ PARTITION BY n.discussion_id
+ ORDER BY CASE WHEN n.position_new_path IS NULL THEN 1 ELSE 0 END, n.created_at, n.id
+ ) AS rn_first_position
@@
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
- MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
- MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN author_username END) AS first_author,
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN body END) AS first_note_body,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_path END) AS position_new_path,
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_line END) AS position_new_line
```
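The intended `first_author` semantics can be illustrated in memory: rank non-system notes ahead of system notes, then by `created_at` and `id`, and only report an author when the winner is a real user note. The `Note` struct here is a hypothetical reduction of the notes row.

```rust
/// Hypothetical reduced note row.
#[derive(Debug, Clone, PartialEq)]
pub struct Note {
    pub id: i64,
    pub created_at: i64,
    pub is_system: bool,
    pub author_username: String,
}

/// Author of the earliest non-system note, if the thread has one.
pub fn first_author(notes: &[Note]) -> Option<&str> {
    notes
        .iter()
        // `false < true`, so non-system notes rank first, as in the CASE expression.
        .min_by_key(|n| (n.is_system, n.created_at, n.id))
        // A thread made only of system notes has no first author.
        .filter(|n| !n.is_system)
        .map(|n| n.author_username.as_str())
}
```

With the original single `rn`, a thread whose earliest note is a system note would have produced no author at all; here the later user note wins.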
---
**6. Add per-discussion truncation signals for `--include-notes`**
Top-level `has_more` is useful, but agents also need to know if an individual thread's notes were truncated. Otherwise they can't tell if a thread is complete.
```diff
@@ ## Response Schema
{
"gitlab_discussion_id": "...",
...
- "notes": []
+ "included_note_count": 0,
+ "has_more_notes": false,
+ "notes": []
}
@@ ## 3b. Domain Structs
pub struct DiscussionListRowJson {
@@
+ pub included_note_count: usize,
+ pub has_more_notes: bool,
#[serde(skip_serializing_if = "Vec::is_empty")]
pub notes: Vec<NoteListRowJson>,
}
@@ ## 3c-ii. Note expansion query (--include-notes)
-Group by `discussion_id` in Rust and attach notes arrays...
+Group by `discussion_id` in Rust, attach notes arrays, and set:
+`included_note_count = notes.len()`,
+`has_more_notes = note_count > included_note_count`.
```
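The two per-thread signals reduce to a small computation once the notes are grouped. In this sketch `TruncationSignals` and `truncation_signals` are hypothetical names, and `note_count` is assumed to be the full per-discussion count from the rollup.

```rust
/// Hypothetical pair of per-discussion truncation fields.
#[derive(Debug, PartialEq)]
pub struct TruncationSignals {
    pub included_note_count: usize,
    pub has_more_notes: bool,
}

/// Derive the signals from the total note count and the notes actually attached.
pub fn truncation_signals<T>(note_count: usize, notes: &[T]) -> TruncationSignals {
    let included_note_count = notes.len();
    TruncationSignals {
        included_note_count,
        has_more_notes: note_count > included_note_count,
    }
}
```

An agent that sees `has_more_notes: true` knows to fetch the full thread before replying, instead of assuming the inline notes are complete.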
---
**7. Add explicit query-plan gate and targeted index workstream (measured, not speculative)**
This plan introduces heavy discussion-centric reads. You should bake in deterministic performance validation with `EXPLAIN QUERY PLAN` and only then add indexes if missing.
```diff
@@ ## Scope: Four workstreams, delivered in order:
-4. Fix robot-docs to list actual field names instead of opaque type references
+4. Add query-plan validation + targeted index updates for new discussion queries
+5. Fix robot-docs to list actual field names instead of opaque type references
@@
+## 4. Query-Plan Validation and Targeted Indexes
+
+Before and after implementing `query_discussions`, capture `EXPLAIN QUERY PLAN` for:
+- `--for-mr <iid> --resolution unresolved`
+- `--project <path> --since 7d --sort last_note`
+- `--gitlab-discussion-id <id>`
+
+If plans show table scans on `notes`/`discussions`, add indexes in `MIGRATIONS` array:
+- `discussions(project_id, gitlab_discussion_id)`
+- `discussions(merge_request_id, last_note_at, id)`
+- `notes(discussion_id, created_at DESC, id DESC)`
+- `notes(discussion_id, position, id)`
+
+Tests: assert the new query paths return expected rows under indexed schema and no regressions.
```
---
If you want, I can produce a single consolidated “iteration 4” version of the plan text with all seven revisions merged in place.


@@ -2,7 +2,7 @@
plan: true
title: ""
status: iterating
iteration: 2
iteration: 3
target_iterations: 8
beads_revision: 0
related_plans: []
@@ -34,6 +34,11 @@ Every read payload that surfaces notes or discussions **MUST** include:
- `gitlab_discussion_id`
- `gitlab_note_id` (when note-level data is returned — i.e., in notes list and show detail)
**Back-compat rule**: Note payloads in the `notes` list command continue exposing `gitlab_id`
for existing consumers, but **MUST also** expose `gitlab_note_id` with the same value. This
ensures agents can use a single field name (`gitlab_note_id`) across all commands — `notes`,
`show`, and `discussions --include-notes` — without special-casing by command.
This contract exists so agents can deterministically construct `glab api` write calls without
cross-referencing multiple commands. Each workstream below must satisfy these fields in its
output.
@@ -64,6 +69,37 @@ In `filter_fields`, when entity is `"notes"` or `"discussions"`, merge the bridg
requested fields before filtering the JSON value. This is a ~5-line change to the existing
function.
### Ambiguity Guardrail
When filtering by `gitlab_discussion_id` (on either `notes` or `discussions` commands) without
`--project`, if the query matches discussions in multiple projects:
- Return an `Ambiguous` error (exit code 18, matching existing convention)
- Include matching project paths in the error message
- Suggest retry with `--project <path>`
**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project`
was provided, check if the result set spans multiple `project_path` values. If so, return
`LoreError::Ambiguous` with the distinct project paths. This is a post-query check (not a
pre-query reject) so it only fires when real ambiguity exists.
```rust
// In query_notes / query_discussions, after collecting results:
if filters.gitlab_discussion_id.is_some() && filters.project.is_none() {
let distinct_projects: HashSet<&str> = results.iter()
.map(|r| r.project_path.as_str())
.collect();
if distinct_projects.len() > 1 {
return Err(LoreError::Ambiguous {
message: format!(
"Discussion ID matches {} projects: {}. Use --project to disambiguate.",
distinct_projects.len(),
distinct_projects.into_iter().collect::<Vec<_>>().join(", ")
),
});
}
}
```
---
## 1. Add `gitlab_discussion_id` to Notes Output
@@ -175,13 +211,17 @@ etc.) which rusqlite's `row.get("name")` can resolve. This eliminates the fragil
column-index counting that has caused bugs in the past. If the conversion touches too many
lines, limit named lookup to just the new field and add a follow-up task.
#### 1c. Add field to `NoteListRowJson`
#### 1c. Add fields to `NoteListRowJson`
**File**: `src/cli/commands/list.rs` line ~1093
Add both `gitlab_discussion_id` and `gitlab_note_id` (alias for `gitlab_id`):
```rust
pub struct NoteListRowJson {
// ... existing fields ...
pub gitlab_id: i64, // KEEP — existing consumers
pub gitlab_note_id: i64, // ADD — Bridge Contract alias
pub project_path: String,
pub gitlab_discussion_id: String, // ADD
}
@@ -194,6 +234,8 @@ impl From<&NoteListRow> for NoteListRowJson {
fn from(row: &NoteListRow) -> Self {
Self {
// ... existing fields ...
gitlab_id: row.gitlab_id,
gitlab_note_id: row.gitlab_id, // ADD — same value as gitlab_id
project_path: row.project_path.clone(),
gitlab_discussion_id: row.gitlab_discussion_id.clone(), // ADD
}
@@ -205,7 +247,7 @@ impl From<&NoteListRow> for NoteListRowJson {
**File**: `src/cli/commands/list.rs` line ~1004
Add `gitlab_discussion_id` to the CSV header and row output.
Add `gitlab_discussion_id` and `gitlab_note_id` to the CSV header and row output.
#### 1e. Add to table display
@@ -218,13 +260,13 @@ Add a column showing a truncated discussion ID (first 8 chars) in the table view
**File**: `src/cli/robot.rs` line ~67
```rust
"notes" => ["id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
"notes" => ["id", "gitlab_note_id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
.iter()
.map(|s| (*s).to_string())
.collect(),
```
The discussion ID is critical enough for agent workflows that it belongs in `minimal`.
The discussion ID and note ID are critical for agent bridge workflows and belong in `minimal`.
#### 1g. Add `--gitlab-discussion-id` filter to notes
@@ -233,6 +275,10 @@ the internal integer). This enables one-hop note retrieval from external referen
that received a `gitlab_discussion_id` from another command or webhook can jump straight to
the relevant notes without knowing the internal discussion ID.
**Note**: This is distinct from the existing `--discussion-id` filter which takes the internal
integer ID. The two filters serve different use cases: internal cross-referencing vs. external
API bridging.
**File**: `src/cli/mod.rs` (NotesArgs)
```rust
@@ -286,6 +332,7 @@ fn note_list_row_json_includes_gitlab_discussion_id() {
let json_row = NoteListRowJson::from(&row);
assert_eq!(json_row.gitlab_discussion_id, "6a9c1750b37d");
assert_eq!(json_row.gitlab_note_id, 100); // alias matches gitlab_id
let serialized = serde_json::to_value(&json_row).unwrap();
assert!(serialized.get("gitlab_discussion_id").is_some());
@@ -293,6 +340,9 @@ fn note_list_row_json_includes_gitlab_discussion_id() {
serialized["gitlab_discussion_id"].as_str().unwrap(),
"6a9c1750b37d"
);
// Both gitlab_id and gitlab_note_id present with same value
assert_eq!(serialized["gitlab_id"], 100);
assert_eq!(serialized["gitlab_note_id"], 100);
}
```
@@ -436,6 +486,19 @@ fn notes_filter_by_gitlab_discussion_id() {
}
```
#### Test 6: Ambiguity guardrail fires for cross-project discussion ID matches
```rust
#[test]
fn notes_ambiguous_gitlab_discussion_id_across_projects() {
let conn = create_test_db();
// Insert 2 projects, each with a discussion sharing the same gitlab_discussion_id
// (this can happen since IDs are per-project)
// Filter by gitlab_discussion_id without --project
// Assert LoreError::Ambiguous is returned with both project paths
}
```
---
## 2. Add `gitlab_discussion_id` to Show Command Discussion Groups
@@ -534,18 +597,24 @@ pub struct MrDiscussionDetailJson {
**Issue discussions** (`show.rs:325`):
```sql
SELECT id, gitlab_discussion_id, individual_note, resolvable, resolved, last_note_at
SELECT id, gitlab_discussion_id, individual_note, resolvable, resolved,
COALESCE(last_note_at, first_note_at, 0) AS last_note_at
FROM discussions
WHERE issue_id = ? ORDER BY first_note_at
WHERE issue_id = ? ORDER BY COALESCE(first_note_at, last_note_at, 0), id
```
**MR discussions** (`show.rs:537`):
```sql
SELECT id, gitlab_discussion_id, individual_note, resolvable, resolved, last_note_at
SELECT id, gitlab_discussion_id, individual_note, resolvable, resolved,
COALESCE(last_note_at, first_note_at, 0) AS last_note_at
FROM discussions
WHERE merge_request_id = ? ORDER BY first_note_at
WHERE merge_request_id = ? ORDER BY COALESCE(first_note_at, last_note_at, 0), id
```
**Note on ordering**: The `COALESCE` with tiebreaker `id` ensures deterministic ordering even
when timestamps are NULL (possible during partial sync states). This prevents unstable output
that could confuse automated workflows.
#### 2d. Update query_map closures
The `disc_rows` tuple changes from `(i64, bool)` to a richer shape. Use named columns here
@@ -753,7 +822,12 @@ lore -J discussions --for-mr 99 --resolution unresolved --include-notes 2
"total_count": 15,
"showing": 15
},
"meta": { "elapsed_ms": 12 }
"meta": {
"elapsed_ms": 12,
"effective_limit": 50,
"effective_include_notes": 0,
"has_more": false
}
}
```
@@ -761,6 +835,10 @@ The `notes` array is empty by default (zero overhead). When `--include-notes N`
each discussion includes up to N of its most recent notes inline. This covers the common
agent pattern of "show me unresolved threads with context" in a single round-trip.
The `meta` block includes `effective_limit` and `effective_include_notes` (the clamped values
actually used) plus `has_more` (true when total_count > showing). This lets agents detect
truncation and decide whether to paginate or narrow their query.
### File Architecture
**No new files.** Follow the existing pattern:
@@ -789,7 +867,7 @@ Args struct:
```rust
#[derive(Parser)]
pub struct DiscussionsArgs {
/// Maximum results
/// Maximum results (clamped to 500)
#[arg(short = 'n', long = "limit", default_value = "50", help_heading = "Output")]
pub limit: usize,
@@ -833,7 +911,7 @@ pub struct DiscussionsArgs {
#[arg(long, value_parser = ["Issue", "MergeRequest"], help_heading = "Filters")]
pub noteable_type: Option<String>,
/// Include up to N latest notes per discussion (0 = none, default)
/// Include up to N latest notes per discussion (0 = none, default; clamped to 20)
#[arg(long, default_value = "0", help_heading = "Output")]
pub include_notes: usize,
@@ -847,6 +925,11 @@ pub struct DiscussionsArgs {
}
```
**Output guardrails**: The handler clamps `limit` to `min(limit, 500)` and `include_notes`
to `min(include_notes, 20)` before passing to the query layer. This prevents accidentally
huge payloads in robot mode. The clamped values are reported in `meta.effective_limit` and
`meta.effective_include_notes`.
#### 3b. Domain Structs
**File**: `src/cli/commands/list.rs`
@@ -981,8 +1064,8 @@ SELECT
COALESCE(nr.note_count, 0) AS note_count,
nr.first_author,
nr.first_note_body,
fd.first_note_at,
fd.last_note_at,
COALESCE(fd.first_note_at, fd.last_note_at, 0) AS first_note_at,
COALESCE(fd.last_note_at, fd.first_note_at, 0) AS last_note_at,
fd.resolvable,
fd.resolved,
nr.position_new_path,
@@ -992,7 +1075,7 @@ JOIN projects p ON fd.project_id = p.id
LEFT JOIN issues i ON fd.issue_id = i.id
LEFT JOIN merge_requests m ON fd.merge_request_id = m.id
LEFT JOIN note_rollup nr ON nr.discussion_id = fd.id
ORDER BY {sort_column} {order}
ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
LIMIT ?
```
@@ -1004,39 +1087,54 @@ of discussions, the window function approach avoids repeated index probes and pr
more predictable query plan. The `MAX(CASE WHEN rn = 1 ...)` pattern extracts first-note
attributes from the grouped output without additional lookups.
**Note on ordering**: The `COALESCE({sort_column}, 0)` with tiebreaker `fd.id` ensures
deterministic ordering even when timestamps are NULL (partial sync states). The `id`
tiebreaker is cheap (primary key) and prevents unstable sort output.
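**Note on SQLite FILTER syntax**: SQLite only supports `COUNT(*) FILTER (WHERE ...)` on aggregate functions from version 3.30.0 onward. Use `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` instead (as shown above), which works regardless of the bundled SQLite version.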
#### 3c-ii. Note expansion query (--include-notes)
When `include_notes > 0`, after the main discussion query, run a follow-up query per
discussion to fetch its N most recent notes:
When `include_notes > 0`, after the main discussion query, run a **single batched query**
using a window function to fetch the N most recent notes per discussion:
```sql
SELECT n.id, n.gitlab_id, n.author_username, n.body, n.note_type,
n.is_system, n.created_at, n.updated_at,
n.position_new_path, n.position_new_line,
n.position_old_path, n.position_old_line,
n.resolvable, n.resolved, n.resolved_by,
d.noteable_type,
COALESCE(i.iid, m.iid) AS parent_iid,
COALESCE(i.title, m.title) AS parent_title,
p.path_with_namespace AS project_path,
d.gitlab_discussion_id
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN projects p ON n.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
WHERE d.id = ?
ORDER BY n.created_at DESC
LIMIT ?
WITH ranked_expansion AS (
SELECT
n.id, n.gitlab_id, n.author_username, n.body, n.note_type,
n.is_system, n.created_at, n.updated_at,
n.position_new_path, n.position_new_line,
n.position_old_path, n.position_old_line,
n.resolvable, n.resolved, n.resolved_by,
d.noteable_type,
COALESCE(i.iid, m.iid) AS parent_iid,
COALESCE(i.title, m.title) AS parent_title,
p.path_with_namespace AS project_path,
d.gitlab_discussion_id,
n.discussion_id,
ROW_NUMBER() OVER (
PARTITION BY n.discussion_id
ORDER BY n.created_at DESC, n.id DESC
) AS rn
FROM notes n
JOIN discussions d ON n.discussion_id = d.id
JOIN projects p ON n.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
WHERE n.discussion_id IN ({placeholders})
)
SELECT * FROM ranked_expansion WHERE rn <= ?
ORDER BY discussion_id, rn
```
**Optimization**: If discussion count is small (<= 50), batch all discussion IDs into a
single `WHERE d.id IN (?, ?, ...)` query with a secondary partition to split by discussion.
For larger result sets, fall back to per-discussion queries to avoid huge IN clauses. This
matches the existing note-loading pattern in `show.rs`.
Group by `discussion_id` in Rust and attach notes arrays to the corresponding
`DiscussionListRowJson`. This avoids per-discussion round-trips entirely — one query
regardless of how many discussions are in the result set.
The `{placeholders}` expand to one `?` per discussion `id` returned by the main discussion
query, with those ids bound as parameters. Since the discussion count is already clamped by
`--limit` (max 500), the IN clause size is bounded and safe.
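Generating the placeholder list is a one-liner; a sketch is shown here with a hypothetical helper name, `in_placeholders`. It returns `None` for an empty id list so the caller can skip the expansion query entirely when the main query returned no discussions.

```rust
/// Build "?, ?, ?" for an IN clause with `count` bound parameters.
/// Returns None when there is nothing to expand.
pub fn in_placeholders(count: usize) -> Option<String> {
    if count == 0 {
        return None;
    }
    Some(vec!["?"; count].join(", "))
}
```

The ids themselves are then bound as parameters in the same order, followed by the `rn <= ?` bound for `include_notes`.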
The returned `NoteListRow` rows reuse the same struct and `NoteListRowJson` conversion from
workstream 1, ensuring identical note shape across all commands.
@@ -1093,8 +1191,10 @@ fn handle_discussions(
let conn = create_connection(&db_path)?;
let order = if args.asc { "asc" } else { "desc" };
let effective_limit = args.limit.min(500);
let effective_include_notes = args.include_notes.min(20);
let filters = DiscussionListFilters {
limit: args.limit,
limit: effective_limit,
project: args.project,
for_issue_iid: args.for_issue,
for_mr_iid: args.for_mr,
@@ -1105,7 +1205,7 @@ fn handle_discussions(
noteable_type: args.noteable_type,
sort: args.sort,
order: order.to_string(),
include_notes: args.include_notes,
include_notes: effective_include_notes,
};
let result = query_discussions(&conn, &filters, &config)?;
@@ -1122,6 +1222,8 @@ fn handle_discussions(
start.elapsed().as_millis() as u64,
args.fields.as_deref(),
robot_mode,
effective_limit,
effective_include_notes,
),
"jsonl" => print_list_discussions_jsonl(&result),
"csv" => print_list_discussions_csv(&result),
@@ -1144,9 +1246,17 @@ pub fn print_list_discussions_json(
elapsed_ms: u64,
fields: Option<&[String]>,
robot_mode: bool,
effective_limit: usize,
effective_include_notes: usize,
) {
let json_result = DiscussionListResultJson::from(result);
let meta = RobotMeta { elapsed_ms };
let has_more = result.total_count as usize > json_result.showing;
let meta = serde_json::json!({
"elapsed_ms": elapsed_ms,
"effective_limit": effective_limit,
"effective_include_notes": effective_include_notes,
"has_more": has_more,
});
let output = serde_json::json!({
"ok": true,
"data": json_result,
@@ -1325,7 +1435,7 @@ fn query_discussions_by_gitlab_id() {
}
```
#### Test 8: --include-notes populates notes array
#### Test 8: --include-notes populates notes array via batched query
```rust
#[test]
@@ -1381,6 +1491,69 @@ fn discussions_bridge_fields_forced_in_robot_mode() {
}
```
#### Test 10: Output guardrails clamp limit and include_notes
```rust
#[test]
fn discussions_output_guardrails() {
// Verify that limit > 500 is clamped to 500
// Verify that include_notes > 20 is clamped to 20
// These are handler-level tests (not query-level)
assert_eq!(1000_usize.min(500), 500);
assert_eq!(50_usize.min(20), 20);
assert_eq!(5_usize.min(20), 5); // below cap stays unchanged
}
```
#### Test 11: Ambiguity guardrail fires for cross-project discussion ID
```rust
#[test]
fn discussions_ambiguous_gitlab_discussion_id_across_projects() {
let conn = create_test_db();
insert_project(&conn, 1); // "group/repo-a"
insert_project(&conn, 2); // "group/repo-b"
// Insert discussions with same gitlab_discussion_id in different projects
insert_discussion(&conn, 1, "shared-id", 1, None, None, "Issue");
insert_discussion(&conn, 2, "shared-id", 2, None, None, "Issue");
let filters = DiscussionListFilters {
gitlab_discussion_id: Some("shared-id".to_string()),
project: None, // no project specified
..DiscussionListFilters::default()
};
let result = query_discussions(&conn, &filters, &Config::default());
assert!(result.is_err());
// Error should be Ambiguous with both project paths
}
```
#### Test 12: has_more metadata is accurate
```rust
#[test]
fn discussions_has_more_metadata() {
let conn = create_test_db();
insert_project(&conn, 1);
insert_mr(&conn, 1, 1, 99, "Test MR");
// Insert 5 discussions
for i in 1..=5 {
insert_discussion(&conn, i, &format!("disc-{i}"), 1, None, Some(1), "MergeRequest");
insert_note_in_discussion(&conn, i, 500 + i, i, 1, "alice", "note");
}
// Limit to 3 — should show has_more = true
let filters = DiscussionListFilters {
limit: 3,
..DiscussionListFilters::default_for_mr(99)
};
let result = query_discussions(&conn, &filters, &Config::default()).unwrap();
assert_eq!(result.discussions.len(), 3);
assert_eq!(result.total_count, 5);
// has_more = total_count > showing = 5 > 3 = true
}
```
---
## 4. Fix Robot-Docs Response Schemas
@@ -1410,7 +1583,7 @@ Replace:
With:
```json
"data": {
"notes": "[{id:int, gitlab_id:int, author_username:string, body:string?, note_type:string?, is_system:bool, created_at_iso:string, updated_at_iso:string, position_new_path:string?, position_new_line:int?, position_old_path:string?, position_old_line:int?, resolvable:bool, resolved:bool, resolved_by:string?, noteable_type:string?, parent_iid:int?, parent_title:string?, project_path:string, gitlab_discussion_id:string}]",
"notes": "[{id:int, gitlab_id:int, gitlab_note_id:int, author_username:string, body:string?, note_type:string?, is_system:bool, created_at_iso:string, updated_at_iso:string, position_new_path:string?, position_new_line:int?, position_old_path:string?, position_old_line:int?, resolvable:bool, resolved:bool, resolved_by:string?, noteable_type:string?, parent_iid:int?, parent_title:string?, project_path:string, gitlab_discussion_id:string}]",
"total_count": "int",
"showing": "int"
}
@@ -1442,11 +1615,11 @@ With:
"response_schema": {
"ok": "bool",
"data": {
"discussions": "[{gitlab_discussion_id:string, noteable_type:string, parent_iid:int?, parent_title:string?, project_path:string, individual_note:bool, note_count:int, first_author:string?, first_note_body_snippet:string?, first_note_at_iso:string, last_note_at_iso:string, resolvable:bool, resolved:bool, position_new_path:string?, position_new_line:int?, notes:[NoteListRowJson]?}]",
"discussions": "[{gitlab_discussion_id:string, noteable_type:string, parent_iid:int?, parent_title:string?, project_path:string, individual_note:bool, note_count:int, first_author:string?, first_note_body_snippet:string?, first_note_at_iso:string, last_note_at_iso:string, resolvable:bool, resolved:bool, position_new_path:string?, position_new_line:int?, notes:[{...NoteListRowJson fields...}]?}]",
"total_count": "int",
"showing": "int"
},
"meta": {"elapsed_ms": "int"}
"meta": {"elapsed_ms": "int", "effective_limit": "int", "effective_include_notes": "int", "has_more": "bool"}
}
}
```
@@ -1473,33 +1646,83 @@ notes within show discussions now include `gitlab_note_id`.
"discussions: Thread-level discussion listing with gitlab_discussion_id for API integration"
```
#### 4f. Add robot-docs contract tests
#### 4f. Add robot-docs contract tests (field-set parity)
**File**: `src/main.rs` (within `#[cfg(test)]` module)
Add lightweight tests that parse the robot-docs JSON output and assert required Bridge
Contract fields are present. This prevents schema drift — if someone adds a field to the
struct but forgets to update robot-docs, the test fails.
Add tests that parse the robot-docs JSON output and compare declared fields against actual
serialized struct fields. This is stronger than string-contains checks — it catches schema
drift in both directions (field added to struct but not docs, or field listed in docs but
removed from struct).
```rust
#[test]
fn robot_docs_notes_schema_includes_bridge_fields() {
let docs = get_robot_docs_json(); // helper that builds the robot-docs Value
let notes_schema = docs["commands"]["notes"]["response_schema"]["data"]["notes"]
.as_str().unwrap();
assert!(notes_schema.contains("gitlab_discussion_id"));
assert!(notes_schema.contains("project_path"));
assert!(notes_schema.contains("parent_iid"));
/// Parse compact schema string "field1:type, field2:type?" into a set of field names
fn parse_schema_fields(schema: &str) -> HashSet<String> {
// Strip leading "[{" and trailing "}]", split on ", ", extract field name before ":"
schema.trim_start_matches("[{").trim_end_matches("}]")
.split(", ")
.filter_map(|f| f.split(':').next())
.map(|f| f.to_string())
.collect()
}
/// Get the actual serialized field names from a sample JSON struct
fn sample_note_json_keys() -> HashSet<String> {
let row = NoteListRow { /* ... test defaults ... */ };
let json = NoteListRowJson::from(&row);
let value = serde_json::to_value(&json).unwrap();
value.as_object().unwrap().keys().cloned().collect()
}
#[test]
fn robot_docs_discussions_schema_includes_bridge_fields() {
fn robot_docs_notes_schema_matches_actual_fields() {
let docs = get_robot_docs_json();
let notes_schema = docs["commands"]["notes"]["response_schema"]["data"]["notes"]
.as_str().unwrap();
let declared = parse_schema_fields(notes_schema);
let actual = sample_note_json_keys();
// All bridge fields must be in both declared and actual
for bridge in &["gitlab_discussion_id", "project_path", "parent_iid", "gitlab_note_id"] {
assert!(declared.contains(*bridge), "robot-docs missing bridge field: {bridge}");
assert!(actual.contains(*bridge), "NoteListRowJson missing bridge field: {bridge}");
}
// Every declared field should exist in the actual struct (no phantom docs)
for field in &declared {
assert!(actual.contains(field),
"robot-docs declares '{field}' but NoteListRowJson doesn't serialize it");
}
// Every actual field should be declared in docs (no undocumented fields)
for field in &actual {
assert!(declared.contains(field),
"NoteListRowJson serializes '{field}' but robot-docs doesn't declare it");
}
}
#[test]
fn robot_docs_discussions_schema_matches_actual_fields() {
let docs = get_robot_docs_json();
let disc_schema = docs["commands"]["discussions"]["response_schema"]["data"]["discussions"]
.as_str().unwrap();
assert!(disc_schema.contains("gitlab_discussion_id"));
assert!(disc_schema.contains("project_path"));
assert!(disc_schema.contains("parent_iid"));
let declared = parse_schema_fields(disc_schema);
let actual = sample_discussion_json_keys();
for bridge in &["gitlab_discussion_id", "project_path", "parent_iid"] {
assert!(declared.contains(*bridge), "robot-docs missing bridge field: {bridge}");
assert!(actual.contains(*bridge), "DiscussionListRowJson missing bridge field: {bridge}");
}
for field in &declared {
assert!(actual.contains(field),
"robot-docs declares '{field}' but DiscussionListRowJson doesn't serialize it");
}
for field in &actual {
assert!(declared.contains(field),
"DiscussionListRowJson serializes '{field}' but robot-docs doesn't declare it");
}
}
#[test]
@@ -1536,6 +1759,7 @@ fn notes_handler_json_includes_bridge_fields() {
for note in value["notes"].as_array().unwrap() {
assert!(note.get("gitlab_discussion_id").is_some(), "missing gitlab_discussion_id");
assert!(note.get("gitlab_note_id").is_some(), "missing gitlab_note_id");
assert!(note.get("project_path").is_some(), "missing project_path");
assert!(note.get("parent_iid").is_some(), "missing parent_iid");
}
@@ -1591,7 +1815,7 @@ once and reused by Changes 3 and 4.
After all changes:
1. An agent can run `lore -J notes --for-mr 3929 --contains "really do prefer"` and get
`gitlab_discussion_id` in the response
`gitlab_discussion_id` and `gitlab_note_id` in the response
2. An agent can run `lore -J discussions --for-mr 3929 --resolution unresolved` to see all
open threads with their IDs
3. An agent can run `lore -J mrs 3929` and see `gitlab_discussion_id`, `resolvable`,
@@ -1600,19 +1824,28 @@ After all changes:
4. `lore robot-docs` lists actual field names for all commands
5. All existing tests still pass
6. No clippy warnings (pedantic + nursery)
7. Robot-docs contract tests pass, preventing future schema drift
7. Robot-docs contract tests pass with field-set parity (not just string-contains), preventing
future schema drift in both directions
8. Bridge Contract fields (`project_path`, `noteable_type`, `parent_iid`,
`gitlab_discussion_id`, `gitlab_note_id`) are present in every applicable read payload
9. Bridge Contract fields survive `--fields` filtering in robot mode (guardrail enforced)
10. `--gitlab-discussion-id` filter works on both `notes` and `discussions` commands
11. `--include-notes N` populates inline notes on `discussions` output
11. `--include-notes N` populates inline notes on `discussions` output via single batched query
12. CLI-level contract integration tests verify bridge fields through the full handler path
13. `gitlab_note_id` is available in notes list output (alongside `gitlab_id` for back-compat)
and in show detail notes, providing a uniform field name across all commands
14. Ambiguity guardrail fires when `--gitlab-discussion-id` matches multiple projects without
`--project` specified
15. Output guardrails clamp `--limit` to 500 and `--include-notes` to 20; `meta` reports
effective values and `has_more` truncation flag
16. Discussion and show queries use deterministic ordering (COALESCE + id tiebreaker) to
prevent unstable output during partial sync states
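
To illustrate criterion 16, here is a small in-memory analogue of the "COALESCE + id tiebreaker" ordering. It is only a sketch: the real ordering lives in SQL, and the `DiscussionRow` struct, the fallback chain (`last_note_at`, then `first_note_at`, then 0), and the descending direction are all assumptions made for the example.

```rust
/// Hypothetical row shape; timestamps may be missing during a partial sync.
#[derive(Debug, Clone, PartialEq)]
struct DiscussionRow {
    id: i64,
    first_note_at: Option<i64>,
    last_note_at: Option<i64>,
}

/// Analogue of `ORDER BY COALESCE(last_note_at, first_note_at, 0) DESC, id DESC`.
fn sort_deterministically(rows: &mut [DiscussionRow]) {
    rows.sort_by(|a, b| {
        // COALESCE: first non-null timestamp, falling back to 0.
        let key_a = a.last_note_at.or(a.first_note_at).unwrap_or(0);
        let key_b = b.last_note_at.or(b.first_note_at).unwrap_or(0);
        // Newest first; equal timestamps are ordered by id so output never flips.
        key_b.cmp(&key_a).then_with(|| b.id.cmp(&a.id))
    });
}
```

Because `id` is unique, the comparator defines a total order, so the result does not depend on the input order of rows with equal or missing timestamps.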
---
## Rejected Recommendations
- **Rename `id` → `note_id` and `gitlab_id` → `gitlab_note_id` in notes list output** — rejected because every existing consumer (agents, scripts, field presets) uses `id` and `gitlab_id`. The fields are unambiguous within the `notes` context. The show-command note structs are a different story (they have no IDs at all), so we add `gitlab_note_id` there where it's genuinely missing. Renaming established fields is churn without proportional benefit.
- **Rename `id` → `note_id` and `gitlab_id` → `gitlab_note_id` in notes list output** — rejected because every existing consumer (agents, scripts, field presets) uses `id` and `gitlab_id`. The fields are unambiguous within the `notes` context. The show-command note structs are a different story (they have no IDs at all), so we add `gitlab_note_id` there where it's genuinely missing. Renaming established fields is churn without proportional benefit. (Updated: we now ADD `gitlab_note_id` as an alias alongside `gitlab_id` per iteration 3 feedback.)
- **Keyset cursor-based pagination (`--cursor` flag)** — rejected because no existing lore command has pagination, agents use `--limit` effectively, and adding a cursor mechanism is significant scope creep. Tracked as potential future work if agents hit real pagination needs.
- **Split `note_count` into `user_note_count`/`total_note_count` and rename `first_author` to `first_user_author`** — rejected because `note_count` already excludes system notes by query design (the `WHERE is_system = 0` / `CASE WHEN` filter), and `first_author` already targets the first non-system note. The current naming is clear and consistent with how `notes --include-system` works elsewhere.
- **Match path filter on both `position_new_path` and `position_old_path`** — rejected because agents care about where code is *now* (new path), not where it was before a rename. Matching old paths adds complexity and returns confusing results for moved files.
@@ -1621,3 +1854,6 @@ After all changes:
- **Structured robot-docs schema (JSON objects instead of string blobs)** — rejected because the current compact string format is intentionally token-efficient for agent consumption. Switching to nested JSON objects per field would significantly bloat robot-docs output. The string-based contract tests are sufficient — they test what agents actually parse. Agents already work with the inline field listing format used by `issues` and `mrs`.
- **`bridge_contract` meta-section in robot-docs output** — rejected because agents don't need a separate meta-contract section; they need correct field listings per command, which we already provide. Adding a cross-cutting contract section to robot-docs adds documentation surface area without improving the agent workflow.
- **Performance regression benchmark test (ignored by default)** — rejected because timing-based assertions are inherently flaky across machines, CI environments, and load conditions. Performance is validated through query plan analysis (EXPLAIN) and manual profiling, not hard-coded elapsed-time thresholds.
- **Make `--discussion-id` and `--gitlab-discussion-id` aliases for the same backing filter** — rejected because they filter on different identifiers: `--discussion-id` takes the internal integer ID (existing behavior), while `--gitlab-discussion-id` takes the external string ID. These serve fundamentally different use cases (internal cross-referencing vs. external API bridging) and cannot be collapsed without breaking existing consumers.
- **`--with-write-hints` flag for inline glab endpoint templates** — rejected because this couples lore's read surface to glab's API surface, violating the read/write split principle. The Bridge Contract gives agents the raw identifiers; constructing glab commands is the agent's responsibility. Adding endpoint templates would require lore to track glab API changes, creating an unnecessary maintenance burden.
- **Show-command note ordering change (`ORDER BY COALESCE(position, ...), created_at, id`)** — rejected because show-command note ordering within a discussion thread is out of scope for this plan. The existing ordering works correctly for present data; the defensive COALESCE pattern is applied to discussion-level ordering where it matters for agent workflows.