feat(cli): implement 'lore trace' command (bd-2n4, bd-9dd)
Gate 5 Code Trace - Tier 1 (API-only, no git blame). Answers 'Why was this code introduced?' by building file -> MR -> issue -> discussion chains. New files: - src/core/trace.rs: run_trace() query logic with rename-aware path resolution, entity_reference-based issue linking, and DiffNote discussion extraction - src/core/trace_tests.rs: 7 unit tests for query logic - src/cli/commands/trace.rs: CLI command with human output, robot JSON output, and :line suffix parsing (5 tests) Human output shows full content (no truncation). Robot JSON truncates discussion bodies to 500 chars for token efficiency. Wiring: - TraceArgs + Commands::Trace in cli/mod.rs - handle_trace in main.rs - VALID_COMMANDS + robot-docs manifest entry - COMMAND_FLAGS autocorrect registry entry Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
147
docs/plan-expose-discussion-ids.feedback-3.md
Normal file
147
docs/plan-expose-discussion-ids.feedback-3.md
Normal file
@@ -0,0 +1,147 @@
|
||||
1. **Make `gitlab_note_id` explicit in all note-level payloads without breaking existing consumers**
|
||||
Rationale: Your Bridge Contract already requires `gitlab_note_id`, but current plan keeps `gitlab_id` only in `notes` list while adding `gitlab_note_id` only in `show`. That forces agents to special-case commands. Add `gitlab_note_id` as an alias field everywhere note-level data appears, while keeping `gitlab_id` for compatibility.
|
||||
|
||||
```diff
|
||||
@@ Bridge Contract (Cross-Cutting)
|
||||
-Every read payload that surfaces notes or discussions MUST include:
|
||||
+Every read payload that surfaces notes or discussions MUST include:
|
||||
- project_path
|
||||
- noteable_type
|
||||
- parent_iid
|
||||
- gitlab_discussion_id
|
||||
- gitlab_note_id (when note-level data is returned — i.e., in notes list and show detail)
|
||||
+ - Back-compat rule: note payloads may continue exposing `gitlab_id`, but MUST also expose `gitlab_note_id` with the same value.
|
||||
|
||||
@@ 1. Add `gitlab_discussion_id` to Notes Output
|
||||
-#### 1c. Add field to `NoteListRowJson`
|
||||
+#### 1c. Add fields to `NoteListRowJson`
|
||||
+Add `gitlab_note_id` alias in addition to existing `gitlab_id` (no rename, no breakage).
|
||||
|
||||
@@ 1f. Update `--fields minimal` preset
|
||||
-"notes" => ["id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
|
||||
+"notes" => ["id", "gitlab_note_id", "author_username", "body", "created_at_iso", "gitlab_discussion_id"]
|
||||
```
|
||||
|
||||
2. **Avoid duplicate flag semantics for discussion filtering**
|
||||
Rationale: `notes` already has `--discussion-id` and it already maps to `d.gitlab_discussion_id`. Adding a second independent flag/field (`--gitlab-discussion-id`) increases complexity and precedence bugs. Keep one backing filter field and make the new flag an alias.
|
||||
|
||||
```diff
|
||||
@@ 1g. Add `--gitlab-discussion-id` filter to notes
|
||||
-Allow filtering notes directly by GitLab discussion thread ID...
|
||||
+Normalize discussion ID flags:
|
||||
+- Keep one backing filter field (`discussion_id`)
|
||||
+- Support both `--discussion-id` (existing) and `--gitlab-discussion-id` (alias)
|
||||
+- If both are provided, clap should reject as duplicate/alias conflict
|
||||
```
|
||||
|
||||
3. **Add ambiguity guardrails for cross-project discussion IDs**
|
||||
Rationale: `gitlab_discussion_id` is unique per project, not globally. Filtering by discussion ID without project can return multiple rows across repos, which breaks deterministic write bridging. Fail fast with an `Ambiguous` error and actionable fix (`--project`).
|
||||
|
||||
```diff
|
||||
@@ Bridge Contract (Cross-Cutting)
|
||||
+### Ambiguity Guardrail
|
||||
+When filtering by `gitlab_discussion_id` without `--project`, if multiple projects match:
|
||||
+- return `Ambiguous` error
|
||||
+- include matching project paths in message
|
||||
+- suggest retry with `--project <path>`
|
||||
```
|
||||
|
||||
4. **Replace `--include-notes` N+1 retrieval with one batched top-N query**
|
||||
Rationale: The current plan’s per-discussion follow-up query scales poorly and creates latency spikes. Use a single window-function query over selected discussion IDs and group rows in Rust. This is both faster and more predictable.
|
||||
|
||||
```diff
|
||||
@@ 3c-ii. Note expansion query (--include-notes)
|
||||
-When `include_notes > 0`, after the main discussion query, run a follow-up query per discussion...
|
||||
+When `include_notes > 0`, run one batched query:
|
||||
+WITH ranked_notes AS (
|
||||
+ SELECT
|
||||
+ n.*,
|
||||
+ d.gitlab_discussion_id,
|
||||
+ ROW_NUMBER() OVER (
|
||||
+ PARTITION BY n.discussion_id
|
||||
+ ORDER BY n.created_at DESC, n.id DESC
|
||||
+ ) AS rn
|
||||
+ FROM notes n
|
||||
+ JOIN discussions d ON d.id = n.discussion_id
|
||||
+ WHERE n.discussion_id IN ( ...selected discussion ids... )
|
||||
+)
|
||||
+SELECT ... FROM ranked_notes WHERE rn <= ?
|
||||
+ORDER BY discussion_id, rn;
|
||||
+
|
||||
+Group by `discussion_id` in Rust and attach notes arrays without per-thread round-trips.
|
||||
```
|
||||
|
||||
5. **Add hard output guardrails and explicit truncation metadata**
|
||||
Rationale: `--limit` and `--include-notes` are unbounded today. For robot workflows this can accidentally generate huge payloads. Cap values and surface effective limits plus truncation state in `meta`.
|
||||
|
||||
```diff
|
||||
@@ 3a. CLI Args
|
||||
- pub limit: usize,
|
||||
+ pub limit: usize, // clamp to max (e.g., 500)
|
||||
|
||||
- pub include_notes: usize,
|
||||
+ pub include_notes: usize, // clamp to max (e.g., 20)
|
||||
|
||||
@@ Response Schema
|
||||
- "meta": { "elapsed_ms": 12 }
|
||||
+ "meta": {
|
||||
+ "elapsed_ms": 12,
|
||||
+ "effective_limit": 50,
|
||||
+ "effective_include_notes": 2,
|
||||
+ "has_more": true
|
||||
+ }
|
||||
```
|
||||
|
||||
6. **Strengthen deterministic ordering and null handling**
|
||||
Rationale: `first_note_at`, `last_note_at`, and note `position` can be null/incomplete during partial sync states. Add null-safe ordering to avoid unstable output and flaky automation.
|
||||
|
||||
```diff
|
||||
@@ 2c. Update queries to SELECT new fields
|
||||
-... ORDER BY first_note_at
|
||||
+... ORDER BY COALESCE(first_note_at, last_note_at, 0), id
|
||||
|
||||
@@ show note query
|
||||
-ORDER BY position
|
||||
+ORDER BY COALESCE(position, 9223372036854775807), created_at, id
|
||||
|
||||
@@ 3c. SQL Query
|
||||
-ORDER BY {sort_column} {order}
|
||||
+ORDER BY COALESCE({sort_column}, 0) {order}, fd.id {order}
|
||||
```
|
||||
|
||||
7. **Make write-bridging more useful with optional command hints**
|
||||
Rationale: Exposing IDs is necessary but not sufficient; agents still need to assemble endpoints repeatedly. Add optional `--with-write-hints` that injects compact endpoint templates (`reply`, `resolve`) derived from row context. This improves usability without bloating default output.
|
||||
|
||||
```diff
|
||||
@@ 3a. CLI Args
|
||||
+ /// Include machine-actionable glab write hints per row
|
||||
+ #[arg(long, help_heading = "Output")]
|
||||
+ pub with_write_hints: bool,
|
||||
|
||||
@@ Response Schema (notes/discussions/show)
|
||||
+ "write_hints?": {
|
||||
+ "reply_endpoint": "string",
|
||||
+ "resolve_endpoint?": "string"
|
||||
+ }
|
||||
```
|
||||
|
||||
8. **Upgrade robot-docs/contract validation from string-contains to parity checks**
|
||||
Rationale: `contains("gitlab_discussion_id")` catches very little and allows schema drift. Build field-set parity tests that compare actual serialized JSON keys to robot-docs declared fields for `notes`, `discussions`, and `show` discussion nodes.
|
||||
|
||||
```diff
|
||||
@@ 4f. Add robot-docs contract tests
|
||||
-assert!(notes_schema.contains("gitlab_discussion_id"));
|
||||
+let declared = parse_schema_field_list(notes_schema);
|
||||
+let sample = sample_notes_row_json_keys();
|
||||
+assert_required_subset(&declared, &["project_path","noteable_type","parent_iid","gitlab_discussion_id","gitlab_note_id"]);
|
||||
+assert_schema_matches_payload(&declared, &sample);
|
||||
|
||||
@@ 4g. Add CLI-level contract integration tests
|
||||
+Add parity tests for:
|
||||
+- notes list JSON
|
||||
+- discussions list JSON
|
||||
+- issues show discussions[*]
|
||||
+- mrs show discussions[*]
|
||||
```
|
||||
|
||||
If you want, I can produce a full revised v3 plan text with these edits merged end-to-end so it’s ready to execute directly.
|
||||
207
docs/plan-expose-discussion-ids.feedback-4.md
Normal file
207
docs/plan-expose-discussion-ids.feedback-4.md
Normal file
@@ -0,0 +1,207 @@
|
||||
Below are the highest-impact revisions I’d make to this plan. I excluded everything listed in your `## Rejected Recommendations` section.
|
||||
|
||||
**1. Fix a correctness bug in the ambiguity guardrail (must run before `LIMIT`)**
|
||||
|
||||
The current post-query ambiguity check can silently fail when `--limit` truncates results to one project even though multiple projects match the same `gitlab_discussion_id`. That creates non-deterministic write targeting risk.
|
||||
|
||||
```diff
|
||||
@@ ## Ambiguity Guardrail
|
||||
-**Implementation**: After the main query, if `gitlab_discussion_id` is set and no `--project`
|
||||
-was provided, check if the result set spans multiple `project_path` values.
|
||||
+**Implementation**: Run a preflight distinct-project check when `gitlab_discussion_id` is set
|
||||
+and `--project` was not provided, before the main list query applies `LIMIT`.
|
||||
+Use:
|
||||
+```sql
|
||||
+SELECT DISTINCT p.path_with_namespace
|
||||
+FROM discussions d
|
||||
+JOIN projects p ON p.id = d.project_id
|
||||
+WHERE d.gitlab_discussion_id = ?
|
||||
+LIMIT 3
|
||||
+```
|
||||
+If more than one project is found, return `LoreError::Ambiguous` (exit code 18) with project
|
||||
+paths and suggestion to retry with `--project <path>`.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**2. Add `gitlab_project_id` to the Bridge Contract**
|
||||
|
||||
`project_path` is human-friendly but mutable (renames/transfers). `gitlab_project_id` gives a stable write target and avoids path re-resolution failures.
|
||||
|
||||
```diff
|
||||
@@ ## Bridge Contract (Cross-Cutting)
|
||||
Every read payload that surfaces notes or discussions **MUST** include:
|
||||
- `project_path`
|
||||
+- `gitlab_project_id`
|
||||
- `noteable_type`
|
||||
- `parent_iid`
|
||||
- `gitlab_discussion_id`
|
||||
- `gitlab_note_id`
|
||||
@@
|
||||
const BRIDGE_FIELDS_NOTES: &[&str] = &[
|
||||
- "project_path", "noteable_type", "parent_iid",
|
||||
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
|
||||
"gitlab_discussion_id", "gitlab_note_id",
|
||||
];
|
||||
const BRIDGE_FIELDS_DISCUSSIONS: &[&str] = &[
|
||||
- "project_path", "noteable_type", "parent_iid",
|
||||
+ "project_path", "gitlab_project_id", "noteable_type", "parent_iid",
|
||||
"gitlab_discussion_id",
|
||||
];
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**3. Replace stringly-typed filter/sort fields with enums end-to-end**
|
||||
|
||||
Right now `sort`, `order`, `resolution`, `noteable_type` are mostly `String`. This is fragile and risks unsafe SQL interpolation drift over time. Typed enums make invalid states unrepresentable.
|
||||
|
||||
```diff
|
||||
@@ ## 3a. CLI Args
|
||||
- pub resolution: Option<String>,
|
||||
+ pub resolution: Option<ResolutionFilter>,
|
||||
@@
|
||||
- pub noteable_type: Option<String>,
|
||||
+ pub noteable_type: Option<NoteableTypeFilter>,
|
||||
@@
|
||||
- pub sort: String,
|
||||
+ pub sort: DiscussionSortField,
|
||||
@@
|
||||
- pub asc: bool,
|
||||
+ pub order: SortDirection,
|
||||
@@ ## 3d. Filters struct
|
||||
- pub resolution: Option<String>,
|
||||
- pub noteable_type: Option<String>,
|
||||
- pub sort: String,
|
||||
- pub order: String,
|
||||
+ pub resolution: Option<ResolutionFilter>,
|
||||
+ pub noteable_type: Option<NoteableTypeFilter>,
|
||||
+ pub sort: DiscussionSortField,
|
||||
+ pub order: SortDirection,
|
||||
@@
|
||||
+Map enum -> SQL fragment via `match` in query builder; never interpolate raw strings.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**4. Enforce snapshot consistency for multi-query commands**
|
||||
|
||||
`discussions` with `--include-notes` does multiple reads. Without a single read transaction, concurrent ingest can produce mismatched `total_count`, row set, and expanded notes.
|
||||
|
||||
```diff
|
||||
@@ ## 3c. SQL Query
|
||||
-pub fn query_discussions(...)
|
||||
+pub fn query_discussions(...)
|
||||
{
|
||||
+ // Run count query + page query + note expansion under one deferred read transaction
|
||||
+ // so output is a single consistent snapshot.
|
||||
+ let tx = conn.transaction_with_behavior(rusqlite::TransactionBehavior::Deferred)?;
|
||||
...
|
||||
+ tx.commit()?;
|
||||
}
|
||||
@@ ## 1. Add `gitlab_discussion_id` to Notes Output
|
||||
+Apply the same snapshot rule to `query_notes` when returning `total_count` + paged rows.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**5. Correct first-note rollup semantics (current CTE can return null/incorrect `first_author`)**
|
||||
|
||||
In the proposed SQL, `rn=1` is computed over all notes but then filtered with `is_system=0`, so threads with a leading system note may incorrectly lose `first_author`/snippet. Also path rollup uses non-deterministic `MAX(...)`.
|
||||
|
||||
```diff
|
||||
@@ ## 3c. SQL Query
|
||||
-ranked_notes AS (
|
||||
+ranked_notes AS (
|
||||
SELECT
|
||||
n.discussion_id,
|
||||
n.author_username,
|
||||
n.body,
|
||||
n.is_system,
|
||||
n.position_new_path,
|
||||
n.position_new_line,
|
||||
- ROW_NUMBER() OVER (
|
||||
- PARTITION BY n.discussion_id
|
||||
- ORDER BY n.position, n.id
|
||||
- ) AS rn
|
||||
+ ROW_NUMBER() OVER (
|
||||
+ PARTITION BY n.discussion_id
|
||||
+ ORDER BY CASE WHEN n.is_system = 0 THEN 0 ELSE 1 END, n.created_at, n.id
|
||||
+ ) AS rn_first_note,
|
||||
+ ROW_NUMBER() OVER (
|
||||
+ PARTITION BY n.discussion_id
|
||||
+ ORDER BY CASE WHEN n.position_new_path IS NULL THEN 1 ELSE 0 END, n.created_at, n.id
|
||||
+ ) AS rn_first_position
|
||||
@@
|
||||
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN author_username END) AS first_author,
|
||||
- MAX(CASE WHEN rn = 1 AND is_system = 0 THEN body END) AS first_note_body,
|
||||
- MAX(CASE WHEN position_new_path IS NOT NULL THEN position_new_path END) AS position_new_path,
|
||||
- MAX(CASE WHEN position_new_line IS NOT NULL THEN position_new_line END) AS position_new_line
|
||||
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN author_username END) AS first_author,
|
||||
+ MAX(CASE WHEN rn_first_note = 1 AND is_system = 0 THEN body END) AS first_note_body,
|
||||
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_path END) AS position_new_path,
|
||||
+ MAX(CASE WHEN rn_first_position = 1 THEN position_new_line END) AS position_new_line
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**6. Add per-discussion truncation signals for `--include-notes`**
|
||||
|
||||
Top-level `has_more` is useful, but agents also need to know if an individual thread’s notes were truncated. Otherwise they can’t tell if a thread is complete.
|
||||
|
||||
```diff
|
||||
@@ ## Response Schema
|
||||
{
|
||||
"gitlab_discussion_id": "...",
|
||||
...
|
||||
- "notes": []
|
||||
+ "included_note_count": 0,
|
||||
+ "has_more_notes": false,
|
||||
+ "notes": []
|
||||
}
|
||||
@@ ## 3b. Domain Structs
|
||||
pub struct DiscussionListRowJson {
|
||||
@@
|
||||
+ pub included_note_count: usize,
|
||||
+ pub has_more_notes: bool,
|
||||
#[serde(skip_serializing_if = "Vec::is_empty")]
|
||||
pub notes: Vec<NoteListRowJson>,
|
||||
}
|
||||
@@ ## 3c-ii. Note expansion query (--include-notes)
|
||||
-Group by `discussion_id` in Rust and attach notes arrays...
|
||||
+Group by `discussion_id` in Rust, attach notes arrays, and set:
|
||||
+`included_note_count = notes.len()`,
|
||||
+`has_more_notes = note_count > included_note_count`.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
**7. Add explicit query-plan gate and targeted index workstream (measured, not speculative)**
|
||||
|
||||
This plan introduces heavy discussion-centric reads. You should bake in deterministic performance validation with `EXPLAIN QUERY PLAN` and only then add indexes if missing.
|
||||
|
||||
```diff
|
||||
@@ ## Scope: Four workstreams, delivered in order:
|
||||
-4. Fix robot-docs to list actual field names instead of opaque type references
|
||||
+4. Add query-plan validation + targeted index updates for new discussion queries
|
||||
+5. Fix robot-docs to list actual field names instead of opaque type references
|
||||
@@
|
||||
+## 4. Query-Plan Validation and Targeted Indexes
|
||||
+
|
||||
+Before and after implementing `query_discussions`, capture `EXPLAIN QUERY PLAN` for:
|
||||
+- `--for-mr <iid> --resolution unresolved`
|
||||
+- `--project <path> --since 7d --sort last_note`
|
||||
+- `--gitlab-discussion-id <id>`
|
||||
+
|
||||
+If plans show table scans on `notes`/`discussions`, add indexes in `MIGRATIONS` array:
|
||||
+- `discussions(project_id, gitlab_discussion_id)`
|
||||
+- `discussions(merge_request_id, last_note_at, id)`
|
||||
+- `notes(discussion_id, created_at DESC, id DESC)`
|
||||
+- `notes(discussion_id, position, id)`
|
||||
+
|
||||
+Tests: assert the new query paths return expected rows under indexed schema and no regressions.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
If you want, I can produce a single consolidated “iteration 4” version of the plan text with all seven revisions merged in place.
|
||||
File diff suppressed because it is too large
Load Diff
Reference in New Issue
Block a user