Files
gitlore/docs/plan-expose-discussion-ids.feedback-4.md.bak
teernisse a1bca10408 feat(cli): implement 'lore file-history' command (bd-z94)
Adds file-history command showing which MRs touched a file, with:
- Rename chain resolution via BFS (resolve_rename_chain from bd-1yx)
- DiffNote discussion snippets with --discussions flag
- --merged filter, --no-follow-renames, -n limit
- Human output with styled MR list and rename chain display
- Robot JSON output with {ok, data, meta} envelope
- Autocorrect registry and robot-docs manifest entry
- Fixes pre-existing --no-status missing from sync autocorrect registry
2026-02-17 12:57:56 -05:00

160 lines
6.9 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
I reviewed the plan end-to-end and focused only on new improvements (none of the items in `## Rejected Recommendations` are re-proposed).
1. Add direct `--discussion-id` retrieval paths
Rationale: This removes a full discovery hop for the exact workflow that failed (replying to a known thread). It also reduces ambiguity and query cost when an agent already has the thread ID.
```diff
@@ Core Changes
| 7 | Fix robot-docs to list actual field names | Docs | Small |
+| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |
@@ Change 3: Add Standalone `discussions` List Command
lore -J discussions --for-mr 99 --cursor <token> # keyset pagination
+lore -J discussions --discussion-id 6a9c1750b37d... # direct lookup
@@ 3a. CLI Args
+ #[arg(long, conflicts_with_all = ["for_issue", "for_mr"], help_heading = "Filters")]
+ pub discussion_id: Option<String>,
@@ Change 1: Add `gitlab_discussion_id` to Notes Output
+Add `--discussion-id <hex>` filter to `notes` for direct note retrieval within one thread.
```
2. Add a shared filter compiler to eliminate count/query drift
Rationale: The plan currently repeats filters across data query, `total_count`, and `incomplete_rows` count queries. That is a classic reliability bug source. A single compiled filter object makes count semantics provably consistent.
```diff
@@ Count Semantics (Cross-Cutting Convention)
+## Filter Compiler (NEW, Cross-Cutting Convention)
+All list commands must build predicates via a shared `CompiledFilters` object that emits:
+- SQL predicate fragment
+- bind parameters
+- canonical filter string (for cursor hash)
+The same compiled object is reused by:
+- page data query
+- `total_count` query
+- `incomplete_rows` query
```
3. Harden keyset pagination semantics for `DESC`, limits, and client ergonomics
Rationale: `(sort_value, id) > (?, ?)` is only correct for ascending order. Descending sort needs `<`. Also add explicit `has_more` so clients dont infer from cursor nullability.
```diff
@@ Keyset Pagination (Cross-Cutting, Change B)
-```sql
-WHERE (sort_value, id) > (?, ?)
-```
+Use comparator by order:
+- ASC: `(sort_value, id) > (?, ?)`
+- DESC: `(sort_value, id) < (?, ?)`
@@ 3a. CLI Args
+ #[arg(short = 'n', long = "limit", default_value = "50", value_parser = clap::value_parser!(usize).range(1..=500), help_heading = "Output")]
+ pub limit: usize,
@@ Response Schema
- "next_cursor": "aW...xyz=="
+ "next_cursor": "aW...xyz==",
+ "has_more": true
```
4. Add DB-level entity integrity invariants (not just response invariants)
Rationale: Response-side filtering is good, but DB correctness should also be guarded. This prevents silent corruption and bad joins from ingestion or future migrations.
```diff
@@ Contract Invariants (NEW)
+### Entity Integrity Invariants (DB + Ingest)
+1. `discussions` must belong to exactly one parent (`issue_id XOR merge_request_id`).
+2. `discussions.noteable_type` must match the populated parent column.
+3. Natural-key uniqueness is enforced where valid:
+ - `(project_id, gitlab_discussion_id)` unique for discussions.
+4. Ingestion must reject/quarantine rows violating invariants and report counts.
@@ Supporting Indexes (Cross-Cutting, Change D)
+CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
+ ON discussions(project_id, gitlab_discussion_id);
```
5. Switch bulk note loading to streaming grouping (avoid large intermediate vecs)
Rationale: Current bulk strategy still materializes all notes before grouping. Streaming into the map cuts peak memory and improves large-MR stability.
```diff
@@ Change 2e. Constructor — use bulk notes map
-let all_note_rows: Vec<MrNoteDetail> = ... // From bulk query above
-let notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> =
- all_note_rows.into_iter().fold(HashMap::new(), |mut map, note| {
- map.entry(note.discussion_id).or_insert_with(Vec::new).push(note);
- map
- });
+let mut notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> = HashMap::new();
+for row in bulk_note_stmt.query_map(params, map_note_row)? {
+ let note = row?;
+ notes_by_discussion.entry(note.discussion_id).or_default().push(note);
+}
```
6. Make freshness tri-state (`fresh|stale|unknown`) and fail closed on unknown with `--require-fresh`
Rationale: `stale: bool` alone cannot represent “never synced / unknown project freshness.” For write safety, unknown freshness should be explicit and reject under freshness constraints.
```diff
@@ Freshness Metadata & Staleness Guards
pub struct ResponseMeta {
pub elapsed_ms: i64,
pub data_as_of_iso: String,
pub sync_lag_seconds: i64,
pub stale: bool,
+ pub freshness_state: String, // "fresh" | "stale" | "unknown"
+ #[serde(skip_serializing_if = "Option::is_none")]
+ pub freshness_reason: Option<String>,
pub incomplete_rows: i64,
@@
-if sync_lag_seconds > max_age_secs {
+if freshness_state == "unknown" || sync_lag_seconds > max_age_secs {
```
7. Tune indexes to match actual ORDER BY paths in window queries
Rationale: `idx_notes_discussion_position` is likely insufficient for the two window orderings. A covering-style index aligned with partition/order keys reduces random table lookups.
```diff
@@ Supporting Indexes (Cross-Cutting, Change D)
--- Notes: window function ORDER BY (discussion_id, position) for ROW_NUMBER()
-CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
- ON notes(discussion_id, position);
+-- Notes: support dual ROW_NUMBER() orderings and reduce table lookups
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_window
+ ON notes(discussion_id, is_system, position, created_at, gitlab_id);
```
8. Add a phased rollout gate before strict exclusion becomes default
Rationale: Enforcing `gitlab_* IS NOT NULL` immediately can hide data if existing rows are incomplete. A short observation gate prevents sudden regressions while preserving the end-state contract.
```diff
@@ Delivery Order
+Batch 0: Observability gate (NEW)
+- Ship `incomplete_rows` and freshness meta first
+- Measure incomplete rate across real datasets
+- If incomplete ratio <= threshold, enable strict exclusion defaults
+- If above threshold, block rollout and fix ingestion quality first
+
Change 1 (notes output) ──┐
```
9. Add property-based invariants for pagination/count correctness
Rationale: Your current tests are scenario-based and good, but randomized property tests are much better at catching edge-case cursor/count bugs.
```diff
@@ Tests (Change 3 / Change B)
+**Test 12**: Property-based pagination invariants (`proptest`)
+```rust
+#[test]
+fn prop_discussion_cursor_no_overlap_no_gap_under_random_data() { /* ... */ }
+```
+
+**Test 13**: Property-based count invariants
+```rust
+#[test]
+fn prop_total_count_and_incomplete_rows_match_filter_partition() { /* ... */ }
+```
```
If you want, I can now produce a fully consolidated “Plan v4” that applies these diffs cleanly into your original document so it reads as a single coherent spec.