gitlore/docs/plan-expose-discussion-ids.feedback-4.md.bak

I reviewed the plan end-to-end and focused only on new improvements (none of the items in `## Rejected Recommendations` are re-proposed).

1. Add direct `--discussion-id` retrieval paths
Rationale: This removes a full discovery hop for the exact workflow that failed (replying to a known thread). It also reduces ambiguity and query cost when an agent already has the thread ID.

```diff
@@ Core Changes
 | 7 | Fix robot-docs to list actual field names | Docs | Small |
+| 8 | Add direct `--discussion-id` filter to notes/discussions/show | Core | Small |

@@ Change 3: Add Standalone `discussions` List Command
 lore -J discussions --for-mr 99 --cursor <token>       # keyset pagination
+lore -J discussions --discussion-id 6a9c1750b37d...    # direct lookup

@@ 3a. CLI Args
+    #[arg(long, conflicts_with_all = ["for_issue", "for_mr"], help_heading = "Filters")]
+    pub discussion_id: Option<String>,

@@ Change 1: Add `gitlab_discussion_id` to Notes Output
+Add `--discussion-id <hex>` filter to `notes` for direct note retrieval within one thread.
```

2. Add a shared filter compiler to eliminate count/query drift
Rationale: The plan currently repeats filters across data query, `total_count`, and `incomplete_rows` count queries. That is a classic reliability bug source. A single compiled filter object makes count semantics provably consistent.

```diff
@@ Count Semantics (Cross-Cutting Convention)
+## Filter Compiler (NEW, Cross-Cutting Convention)
+All list commands must build predicates via a shared `CompiledFilters` object that emits:
+- SQL predicate fragment
+- bind parameters
+- canonical filter string (for cursor hash)
+The same compiled object is reused by:
+- page data query
+- `total_count` query
+- `incomplete_rows` query
```

3. Harden keyset pagination semantics for `DESC`, limits, and client ergonomics
Rationale: `(sort_value, id) > (?, ?)` is only correct for ascending order. Descending sort needs `<`. Also add explicit `has_more` so clients don’t infer from cursor nullability.

```diff
@@ Keyset Pagination (Cross-Cutting, Change B)
-```sql
-WHERE (sort_value, id) > (?, ?)
-```
+Use comparator by order:
+- ASC: `(sort_value, id) > (?, ?)`
+- DESC: `(sort_value, id) < (?, ?)`

@@ 3a. CLI Args
+    #[arg(short = 'n', long = "limit", default_value = "50", value_parser = clap::value_parser!(usize).range(1..=500), help_heading = "Output")]
+    pub limit: usize,

@@ Response Schema
-    "next_cursor": "aW...xyz=="
+    "next_cursor": "aW...xyz==",
+    "has_more": true
```

4. Add DB-level entity integrity invariants (not just response invariants)
Rationale: Response-side filtering is good, but DB correctness should also be guarded. This prevents silent corruption and bad joins from ingestion or future migrations.

```diff
@@ Contract Invariants (NEW)
+### Entity Integrity Invariants (DB + Ingest)
+1. `discussions` must belong to exactly one parent (`issue_id XOR merge_request_id`).
+2. `discussions.noteable_type` must match the populated parent column.
+3. Natural-key uniqueness is enforced where valid:
+   - `(project_id, gitlab_discussion_id)` unique for discussions.
+4. Ingestion must reject/quarantine rows violating invariants and report counts.

@@ Supporting Indexes (Cross-Cutting, Change D)
+CREATE UNIQUE INDEX IF NOT EXISTS idx_discussions_project_gitlab_discussion_id
+    ON discussions(project_id, gitlab_discussion_id);
```

5. Switch bulk note loading to streaming grouping (avoid large intermediate vecs)
Rationale: Current bulk strategy still materializes all notes before grouping. Streaming into the map cuts peak memory and improves large-MR stability.

```diff
@@ Change 2e. Constructor — use bulk notes map
-let all_note_rows: Vec<MrNoteDetail> = ...  // From bulk query above
-let notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> =
-    all_note_rows.into_iter().fold(HashMap::new(), |mut map, note| {
-        map.entry(note.discussion_id).or_insert_with(Vec::new).push(note);
-        map
-    });
+let mut notes_by_discussion: HashMap<i64, Vec<MrNoteDetail>> = HashMap::new();
+for row in bulk_note_stmt.query_map(params, map_note_row)? {
+    let note = row?;
+    notes_by_discussion.entry(note.discussion_id).or_default().push(note);
+}
```

6. Make freshness tri-state (`fresh|stale|unknown`) and fail closed on unknown with `--require-fresh`
Rationale: `stale: bool` alone cannot represent “never synced / unknown project freshness.” For write safety, unknown freshness should be explicit and reject under freshness constraints.

```diff
@@ Freshness Metadata & Staleness Guards
 pub struct ResponseMeta {
     pub elapsed_ms: i64,
     pub data_as_of_iso: String,
     pub sync_lag_seconds: i64,
     pub stale: bool,
+    pub freshness_state: String, // "fresh" | "stale" | "unknown"
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub freshness_reason: Option<String>,
     pub incomplete_rows: i64,
@@
-if sync_lag_seconds > max_age_secs {
+if freshness_state == "unknown" || sync_lag_seconds > max_age_secs {
```

7. Tune indexes to match actual ORDER BY paths in window queries
Rationale: `idx_notes_discussion_position` is likely insufficient for the two window orderings. A covering-style index aligned with partition/order keys reduces random table lookups.

```diff
@@ Supporting Indexes (Cross-Cutting, Change D)
--- Notes: window function ORDER BY (discussion_id, position) for ROW_NUMBER()
-CREATE INDEX IF NOT EXISTS idx_notes_discussion_position
-    ON notes(discussion_id, position);
+-- Notes: support dual ROW_NUMBER() orderings and reduce table lookups
+CREATE INDEX IF NOT EXISTS idx_notes_discussion_window
+    ON notes(discussion_id, is_system, position, created_at, gitlab_id);
```

8. Add a phased rollout gate before strict exclusion becomes default
Rationale: Enforcing `gitlab_* IS NOT NULL` immediately can hide data if existing rows are incomplete. A short observation gate prevents sudden regressions while preserving the end-state contract.

```diff
@@ Delivery Order
+Batch 0: Observability gate (NEW)
+- Ship `incomplete_rows` and freshness meta first
+- Measure incomplete rate across real datasets
+- If incomplete ratio <= threshold, enable strict exclusion defaults
+- If above threshold, block rollout and fix ingestion quality first
+
 Change 1 (notes output)       ──┐
```

9. Add property-based invariants for pagination/count correctness
Rationale: Your current tests are scenario-based and good, but randomized property tests are much better at catching edge-case cursor/count bugs.

```diff
@@ Tests (Change 3 / Change B)
+**Test 12**: Property-based pagination invariants (`proptest`)
+```rust
+#[test]
+fn prop_discussion_cursor_no_overlap_no_gap_under_random_data() { /* ... */ }
+```
+
+**Test 13**: Property-based count invariants
+```rust
+#[test]
+fn prop_total_count_and_incomplete_rows_match_filter_partition() { /* ... */ }
+```
```

If you want, I can now produce a fully consolidated “Plan v4” that applies these diffs cleanly into your original document so it reads as a single coherent spec.