gitlore/docs/who-command-design.feedback-7.md at d701b1f977d196b4ca6c1290875274261297c4c9

Files

Taylor Eernisse 8dc479e515 docs: add lore who command design plan with 8 iterations of review feedback

Design document for `lore who` — a people intelligence query layer over
existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53
participants). Answers five collaboration questions: expert lookup by
file/path, workload summary, review pattern analysis, active discussion
tracking, and file overlap detection.

Key design decisions refined across 8 feedback iterations:
- All SQL is fully static (no format!()) with prepare_cached() throughout
- Exact vs prefix path matching via PathQuery struct (two static SQL variants)
- Self-review exclusion (author != reviewer) on all DiffNote branches
- Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers
- Bounded payloads with *_total/*_truncated metadata for robot consumers
- Truncation transparency via LIMIT+1 overflow detection pattern
- Robot JSON includes resolved_input for reproducibility (since_mode tri-state)
- Multi-project correctness with project-qualified entity references
- Composite migration indexes designed for query selectivity on hot paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-07 21:35:05 -05:00

12 KiB

Raw Blame History

Below are the highest-leverage revisions I’d make to iteration 6 to improve correctness (multi-project edge cases), robot-mode reliability (bounded payloads + truncation), and signal quality—without changing the fundamental scope (still pure SQL over existing tables).

Make build_path_query project-aware and two-way probe (exact and prefix) Why

Your DB probe currently answers: “does this exact file exist anywhere in DiffNotes?” That can misclassify in a project-scoped run:

Path exists as a dotless file in Project A → probe returns true

User runs -p Project B where the path is a directory (or different shape) → you switch to exact, return empty, and miss valid prefix hits.

Also, you still have a minor heuristic fragility for dot directories when the user omits trailing / (e.g., .github/workflows): last segment has a dot → you treat as file unless forced dir.

Revision

Thread project_id into build_path_query(conn, path, project_id)

Probe exact first (scoped), then probe prefix (scoped)

Only fall back to heuristics if both probes fail

This keeps “static SQL, no dynamic assembly,” and costs at most 2 indexed existence queries per invocation.

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@

fn build_path_query(conn: &Connection, path: &str) -> Result {

fn build_path_query(conn: &Connection, path: &str, project_id: Option) -> Result { let trimmed = path.trim_end_matches('/'); let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed); let is_root = !trimmed.contains('/'); let forced_dir = path.ends_with('/');

let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));

// Heuristic is now only a fallback; probes decide first.
let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));

let exact_exists = if !looks_like_file && !forced_dir {
```
   conn.query_row(
```
```
       "SELECT 1 FROM notes
```
```
        WHERE note_type = 'DiffNote'
```
```
          AND is_system = 0
```
```
          AND position_new_path = ?1
```
```
        LIMIT 1",
```
```
       rusqlite::params![trimmed],
```
```
       |_| Ok(()),
```
```
   )
```
```
   .is_ok()
```
} else {
```
   false
```
};

// Probe 1: exact file exists (scoped)
let exact_exists = conn.query_row(
```
   "SELECT 1 FROM notes
```
```
    WHERE note_type = 'DiffNote'
```
```
      AND is_system = 0
```
```
      AND position_new_path = ?1
```

      AND (?2 IS NULL OR project_id = ?2)

```
    LIMIT 1",
```

   rusqlite::params![trimmed, project_id],

```
   |_| Ok(()),
```
).is_ok();
// Probe 2: directory prefix exists (scoped)
let prefix_exists = if !forced_dir {
```
   let escaped = escape_like(trimmed);
```
```
   let pat = format!("{escaped}/%");
```
```
   conn.query_row(
```
```
       "SELECT 1 FROM notes
```
```
        WHERE note_type = 'DiffNote'
```
```
          AND is_system = 0
```

          AND position_new_path LIKE ?1 ESCAPE '\\'

          AND (?2 IS NULL OR project_id = ?2)

```
        LIMIT 1",
```

       rusqlite::params![pat, project_id],

```
       |_| Ok(()),
```
```
   ).is_ok()
```
} else { false };

let is_file = looks_like_file || exact_exists;

// Forced directory always wins; otherwise: exact > prefix > heuristic
let is_file = if forced_dir { false }

             else if exact_exists { true }

             else if prefix_exists { false }

```
             else { looks_like_file };
```
if is_file { Ok(PathQuery { value: trimmed.to_string(), is_prefix: false }) } else { let escaped = escape_like(trimmed); Ok(PathQuery { value: format!("{escaped}/%"), is_prefix: true }) } } @@

let pq = build_path_query(conn, path)?;

let pq = build_path_query(conn, path, project_id)?;

Add test coverage for the multi-project misclassification case:

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ #[test] fn test_build_path_query_dotless_subdir_file_uses_db_probe() { @@

   let pq = build_path_query(&conn, "src/Dockerfile").unwrap();

   let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();

   let pq2 = build_path_query(&conn2, "src/Dockerfile").unwrap();

   let pq2 = build_path_query(&conn2, "src/Dockerfile", None).unwrap();

}

#[test]
fn test_build_path_query_probe_is_project_scoped() {

   // Path exists as a dotless file in project 1; project 2 should not

   // treat it as an exact file unless it exists there too.

```
   let conn = setup_test_db();
```
```
   insert_project(&conn, 1, "team/a");
```
```
   insert_project(&conn, 2, "team/b");
```

   insert_mr(&conn, 1, 1, 10, "author_a", "opened");

   insert_discussion(&conn, 1, 1, Some(1), None, true, false);

   insert_diffnote(&conn, 1, 1, 1, "rev", "infra/Makefile", "note");

   let pq_scoped = build_path_query(&conn, "infra/Makefile", Some(2)).unwrap();

   assert!(pq_scoped.is_prefix); // should fall back to prefix in project 2

}

Bound robot payload sizes for participants and mr_refs (with totals + truncation) Why

mr_refs and participants can become unbounded arrays in robot mode, which is a real operational hazard:

huge JSON → slow, noisy diffs, brittle downstream pipelines

potential SQLite group_concat truncation becomes invisible (and you can’t distinguish “no refs” vs “refs truncated”)

Revision

Introduce hard caps and explicit metadata:

participants_total, participants_truncated

mr_refs_total, mr_refs_truncated

This is not scope creep—it’s defensive output hygiene.

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ pub struct ActiveDiscussion { @@ pub participants: Vec,

pub participants_total: u32,
pub participants_truncated: bool, } @@ pub struct OverlapUser { @@ pub mr_refs: Vec,
pub mr_refs_total: u32,
pub mr_refs_truncated: bool, }

Implementation sketch (Rust-side, deterministic):

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ fn query_active(...) -> Result {

const MAX_PARTICIPANTS: usize = 50; @@

```
       participants.sort();
```

```
       participants.sort();
```

       let participants_total = participants.len() as u32;

       let participants_truncated = participants.len() > MAX_PARTICIPANTS;

```
       if participants_truncated {
```

           participants.truncate(MAX_PARTICIPANTS);

```
       }
```

@@ Ok(ActiveDiscussion { @@ participants,

```
           participants_total,
```

           participants_truncated,
       })

@@ fn query_overlap(...) -> Result {

const MAX_MR_REFS_PER_USER: usize = 50; @@ .map(|a| { let mut mr_refs: Vec = a.mr_refs.into_iter().collect(); mr_refs.sort();

       let mr_refs_total = mr_refs.len() as u32;

       let mr_refs_truncated = mr_refs.len() > MAX_MR_REFS_PER_USER;

```
       if mr_refs_truncated {
```

           mr_refs.truncate(MAX_MR_REFS_PER_USER);

```
       }
       OverlapUser {
```

@@ mr_refs,

```
           mr_refs_total,
```

           mr_refs_truncated,
       }
   })

Update robot JSON accordingly:

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ fn active_to_json(r: &ActiveResult) -> serde_json::Value { @@ "participants": d.participants,

       "participants_total": d.participants_total,

       "participants_truncated": d.participants_truncated,
   }))

@@ fn overlap_to_json(r: &OverlapResult) -> serde_json::Value { @@ "mr_refs": u.mr_refs,

       "mr_refs_total": u.mr_refs_total,

       "mr_refs_truncated": u.mr_refs_truncated,
   }))

Also update robot-docs manifest schema snippet for who.active.discussions[] and who.overlap.users[].

Add truncation metadata to Workload sections (same LIMIT+1 pattern) Why

Workload is the mode most likely to be consumed by agents, and right now it has silent truncation (each section is LIMIT N with no signal). Your plan already treats truncation as a first-class contract elsewhere; Workload should match.

Revision

For each workload query:

request LIMIT + 1

set *_truncated booleans

trim to requested limit

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ pub struct WorkloadResult { pub username: String, pub assigned_issues: Vec, pub authored_mrs: Vec, pub reviewing_mrs: Vec, pub unresolved_discussions: Vec,

pub assigned_issues_truncated: bool,
pub authored_mrs_truncated: bool,
pub reviewing_mrs_truncated: bool,
pub unresolved_discussions_truncated: bool, }

And in JSON include the booleans (plus you already have summary.counts).

This is mechanically repetitive but extremely valuable for automation.

Rename “Last Active” → “Last Seen” for Expert/Overlap Why

For “author” rows, the timestamp is derived from review activity on their MR (via MAX(n.created_at)), not necessarily that person’s direct action. Calling that “active” is semantically misleading. “Last seen” is accurate across both reviewer+author branches.

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ pub struct Expert { @@

pub last_active_ms: i64,

pub last_seen_ms: i64, } @@ pub struct OverlapUser { @@

pub last_touch_at: i64,

pub last_seen_at: i64, @@ fn print_expert_human(...) { @@

```
   style("Last Active").bold(),
```

```
   style("Last Seen").bold(),
```

   style(format_relative_time(expert.last_active_ms)).dim(),

   style(format_relative_time(expert.last_seen_ms)).dim(),

(Keep internal SQL aliases consistent: last_seen_at everywhere.)

Make MR state filtering consistent in Expert/Overlap reviewer branches Why

You already restrict Overlap author branch to opened|merged, but reviewer branches can include closed/unmerged noise. Consistency improves signal quality and can reduce scan churn.

Low-risk revision: apply the same state filter to reviewer branches (Expert + Overlap). You can keep “closed” excluded by default without adding new flags.

diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ WHERE n.note_type = 'DiffNote' AND n.is_system = 0 @@

AND n.created_at >= ?2

AND m.state IN ('opened','merged')
AND n.created_at >= ?2

This is a semantic choice; if you later want archaeology across closed/unmerged, that belongs in a separate mode/flag, but I would not add it now.

Add a design principle for bounded outputs (aligns with robot-first reproducibility) diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@

Truncation transparency. Result types carry a truncated: bool flag... +11. Bounded payloads. Robot JSON must never emit unbounded arrays (participants, refs).

Large list fields are capped with *_total + *_truncated so agents can page/retry.

Consolidated plan metadata bump (Iteration 7) diff Copy code diff --git a/who-command-design.md b/who-command-design.md @@ -iteration: 6 +iteration: 7 updated: 2026-02-07

Net effect (what you get)

Correct path classification under -p scoping (no cross-project probe leakage)

Deterministic + bounded robot payloads (no giant JSON surprises)

Uniform truncation contract across all modes (Workload no longer silently truncates)

Clearer semantics (“Last Seen” avoids misinterpretation)

Cleaner signals (reviewer branches ignore closed/unmerged by default)

If you want, I can also produce a second diff that updates the robot-docs schema block and the Verification EXPLAIN expectations to reflect the new probe queries and the state filter.

12 KiB Raw Blame History Unescape Escape

12 KiB

Raw Blame History