gitlore/docs/who-command-design.feedback-5.md at perf-audit

Files

Taylor Eernisse 8dc479e515 docs: add lore who command design plan with 8 iterations of review feedback

Design document for `lore who` — a people intelligence query layer over
existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53
participants). Answers five collaboration questions: expert lookup by
file/path, workload summary, review pattern analysis, active discussion
tracking, and file overlap detection.

Key design decisions refined across 8 feedback iterations:
- All SQL is fully static (no format!()) with prepare_cached() throughout
- Exact vs prefix path matching via PathQuery struct (two static SQL variants)
- Self-review exclusion (author != reviewer) on all DiffNote branches
- Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers
- Bounded payloads with *_total/*_truncated metadata for robot consumers
- Truncation transparency via LIMIT+1 overflow detection pattern
- Robot JSON includes resolved_input for reproducibility (since_mode tri-state)
- Multi-project correctness with project-qualified entity references
- Composite migration indexes designed for query selectivity on hot paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-07 21:35:05 -05:00

12 KiB

Raw Permalink Blame History

Below are the highest-leverage revisions I’d make. They’re tightly scoped (no new tables/APIs), but fix a few real correctness issues and make the outputs more actionable.

Fix a correctness bug in PathQuery: don’t escape for =, and make --path Makefile actually work Why

Bug: build_path_query() currently runs escape_like() even when is_prefix = false (exact match). That will break exact matches for paths containing _, %, or \ because = does not treat those as metacharacters (so the escaped string won’t equal the stored path).

UX mismatch: The plan says --path handles dotless root files (Makefile/LICENSE), but the current logic still treats them as directory prefixes (Makefile/%) → zero results.

Change

Only escape for LIKE.

Treat root paths (no /) passed via --path as exact matches by default (unless they end with /).

diff Copy code diff --git a/plan.md b/plan.md @@ -/// Build a path query from a user-supplied path. -/// -/// Rules: -/// - If the path ends with /, it's a directory prefix -> escaped_path% (LIKE) -/// - If the last path segment contains ., it's a file -> exact match (=) -/// - Otherwise, it's a directory prefix -> escaped_path/% (LIKE) +/// Build a path query from a user-supplied path. +/// +/// Rules: +/// - If the path ends with /, it's a directory prefix -> escaped_path/% (LIKE) +/// - If the path is a root path (no /) and does NOT end with /, treat as exact (=) +/// (this makes --path Makefile and --path LICENSE work as intended) +/// - Else if the last path segment contains ., treat as exact (=) +/// - Otherwise, treat as directory prefix -> escaped_path/% (LIKE) @@ -fn build_path_query(path: &str) -> PathQuery { +fn build_path_query(path: &str) -> PathQuery { let trimmed = path.trim_end_matches('/'); let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);

let is_file = !path.ends_with('/') && last_segment.contains('.');
let escaped = escape_like(trimmed);

let is_root = !trimmed.contains('/');
let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));

if is_file { PathQuery {

```
       value: escaped,
```

       // IMPORTANT: do NOT escape for exact match (=)

       value: trimmed.to_string(),
       is_prefix: false,
   }

} else {

```
   let escaped = escape_like(trimmed);
   PathQuery {
       value: format!("{escaped}/%"),
       is_prefix: true,
   }
```
} } @@ -/// Known limitation: Dotless root files (LICENSE, Makefile, Dockerfile) -/// without a trailing / will be treated as directory prefixes. Use --path -/// for these — the --path flag passes through to Expert mode directly, -/// and the build_path_query output for "LICENSE" is a prefix LICENSE/% -/// which will simply return zero results (a safe, obvious failure mode that the -/// help text addresses). +/// Note: Root file paths passed via --path (including dotless files like Makefile/LICENSE) +/// are treated as exact matches unless they end with /.

Also update the --path help text to be explicit:

diff Copy code diff --git a/plan.md b/plan.md @@

/// Force expert mode for a file/directory path (handles root files like
/// README.md, LICENSE, Makefile that lack a / and can't be auto-detected)

/// Force expert mode for a file/directory path.
/// Root files (README.md, LICENSE, Makefile) are treated as exact matches.
/// Use a trailing / to force directory-prefix matching.

Fix Active mode: your note_count is currently counting participants, and the CTE scans too broadly Why

In note_agg, you do SELECT DISTINCT discussion_id, author_username and then COUNT(*) AS note_count. That’s participant count, not note count.

The current note_agg also builds the DISTINCT set from all notes then joins to picked. It’s avoidable work.

Change

Split into two aggregations scoped to picked:

note_counts: counts non-system notes per picked discussion.

participants: distinct usernames per picked discussion, then GROUP_CONCAT.

diff Copy code diff --git a/plan.md b/plan.md @@

```
   note_agg AS (
```
```
       SELECT
```
```
           n.discussion_id,
```
```
           COUNT(*) AS note_count,
```

           GROUP_CONCAT(n.author_username, X'1F') AS participants

```
       FROM (
```

           SELECT DISTINCT discussion_id, author_username

```
           FROM notes
```

           WHERE is_system = 0 AND author_username IS NOT NULL

```
       ) n
```

       JOIN picked p ON p.id = n.discussion_id

```
       GROUP BY n.discussion_id
```
```
   )
```

```
   note_counts AS (
```
```
       SELECT
```
```
           n.discussion_id,
```
```
           COUNT(*) AS note_count
```
```
       FROM notes n
```

       JOIN picked p ON p.id = n.discussion_id

```
       WHERE n.is_system = 0
```
```
       GROUP BY n.discussion_id
```
```
   ),
```
```
   participants AS (
```
```
       SELECT
```
```
           x.discussion_id,
```

           GROUP_CONCAT(x.author_username, X'1F') AS participants

```
       FROM (
```

           SELECT DISTINCT n.discussion_id, n.author_username

```
           FROM notes n
```

           JOIN picked p ON p.id = n.discussion_id

           WHERE n.is_system = 0 AND n.author_username IS NOT NULL

```
       ) x
```
```
       GROUP BY x.discussion_id
```
```
   )
```

   LEFT JOIN note_agg na ON na.discussion_id = p.id

   LEFT JOIN note_counts nc ON nc.discussion_id = p.id

   LEFT JOIN participants pa ON pa.discussion_id = p.id

       COALESCE(na.note_count, 0) AS note_count,

       COALESCE(na.participants, '') AS participants

       COALESCE(nc.note_count, 0) AS note_count,

       COALESCE(pa.participants, '') AS participants

Net effect: correctness fix + more predictable perf.

Add a test that would have failed before:

diff Copy code diff --git a/plan.md b/plan.md @@ #[test] fn test_active_query() { @@

   insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");

   insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");

   insert_diffnote(&conn, 2, 1, 1, "reviewer_b", "src/foo.rs", "follow-up");

   assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);

   assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);

   assert_eq!(result.discussions[0].note_count, 2);

Index fix: idx_discussions_unresolved_recent won’t help global --active ordering Why

Your index is (project_id, last_note_at) with WHERE resolvable=1 AND resolved=0.

When --active is not project-scoped (common default), SQLite can’t use (project_id, last_note_at) to satisfy ORDER BY last_note_at DESC efficiently because project_id isn’t constrained.

This can turn into a scan+sort over potentially large unresolved sets.

Change

Keep the project-scoped index, but add a global ordering index (partial, still small):

diff Copy code diff --git a/plan.md b/plan.md @@ CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent ON discussions(project_id, last_note_at) WHERE resolvable = 1 AND resolved = 0; + +-- Active (global): unresolved discussions by recency (no project scope). +-- Supports ORDER BY last_note_at DESC LIMIT N when project_id is unconstrained. +CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global

ON discussions(last_note_at)
WHERE resolvable = 1 AND resolved = 0;

Make Overlap “touches” coherent: count MRs for reviewers, not DiffNotes Why

Overlap’s question is “Who else has MRs touching my files?” but:

reviewer branch uses COUNT(*) (DiffNotes)

author branch uses COUNT(DISTINCT m.id) (MRs)

Those are different units; summing them into touch_count is misleading.

Change

Count distinct MRs on the reviewer branch too:

diff Copy code diff --git a/plan.md b/plan.md @@

```
           COUNT(*) AS touch_count,
```

           COUNT(DISTINCT m.id) AS touch_count,
           MAX(n.created_at) AS last_touch_at,
           GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs

Also update human output labeling:

diff Copy code diff --git a/plan.md b/plan.md @@

```
   style("Touches").bold(),
```

```
   style("MRs").bold(),
```

(You still preserve “strength” via mr_refs and last_touch_at.)

Make outputs more actionable: add a canonical ref field (group/project!iid, group/project#iid) Why

You already do this for Overlap (mr_refs). Doing the same for Workload and Active reduces friction for both humans and agents:

humans can copy/paste a single token

robots don’t need to stitch project_path + iid + prefix

Change (Workload structs + SQL) diff Copy code diff --git a/plan.md b/plan.md @@ pub struct WorkloadIssue { pub iid: i64,

pub ref_: String, pub title: String, pub project_path: String, pub updated_at: i64, } @@ pub struct WorkloadMr { pub iid: i64,
pub ref_: String, pub title: String, pub draft: bool, pub project_path: String, @@

let issues_sql =

   "SELECT i.iid, i.title, p.path_with_namespace, i.updated_at

let issues_sql =
```
   "SELECT i.iid,
```

           (p.path_with_namespace || '#' || i.iid) AS ref,

           i.title, p.path_with_namespace, i.updated_at

```
           iid: row.get(0)?,
```
```
           title: row.get(1)?,
```
```
           project_path: row.get(2)?,
```
```
           updated_at: row.get(3)?,
```

```
           iid: row.get(0)?,
```
```
           ref_: row.get(1)?,
```
```
           title: row.get(2)?,
```
```
           project_path: row.get(3)?,
```

           updated_at: row.get(4)?,
       })

let authored_sql =

   "SELECT m.iid, m.title, m.draft, p.path_with_namespace, m.updated_at

let authored_sql =
```
   "SELECT m.iid,
```

           (p.path_with_namespace || '!' || m.iid) AS ref,

           m.title, m.draft, p.path_with_namespace, m.updated_at

```
           iid: row.get(0)?,
```
```
           title: row.get(1)?,
```

           draft: row.get::<_, i32>(2)? != 0,

```
           project_path: row.get(3)?,
```

```
           iid: row.get(0)?,
```
```
           ref_: row.get(1)?,
```
```
           title: row.get(2)?,
```

           draft: row.get::<_, i32>(3)? != 0,

           project_path: row.get(4)?,
           author_username: None,

```
           updated_at: row.get(4)?,
```

           updated_at: row.get(5)?,
       })

Then use ref_ in human output + robot JSON.

Reviews mode: tolerate leading whitespace before prefix Why

Many people write " suggestion: ...". Current LIKE '%%' misses that.

Change

Use ltrim(n.body) consistently:

diff Copy code diff --git a/plan.md b/plan.md @@

```
      AND n.body LIKE '**%**%'
```

```
      AND ltrim(n.body) LIKE '**%**%'
```

       SUBSTR(n.body, 3, INSTR(SUBSTR(n.body, 3), '**') - 1) AS raw_prefix,

       SUBSTR(ltrim(n.body), 3, INSTR(SUBSTR(ltrim(n.body), 3), '**') - 1) AS raw_prefix,

Add two small tests that catch the above regressions Why

These are exactly the kind of issues that slip through without targeted tests.

diff Copy code diff --git a/plan.md b/plan.md @@ #[test] fn test_escape_like() { @@ } +

#[test]
fn test_build_path_query_exact_does_not_escape() {
```
   // '_' must not be escaped for '='
```

   let pq = build_path_query("README_with_underscore.md");

   assert_eq!(pq.value, "README_with_underscore.md");

```
   assert!(!pq.is_prefix);
```
}
#[test]
fn test_path_flag_dotless_root_file_is_exact() {

   let pq = build_path_query("Makefile");

```
   assert_eq!(pq.value, "Makefile");
```
```
   assert!(!pq.is_prefix);
```
}

Summary of net effect

Correctness fixes: exact-path escaping bug; Active.note_count bug.

Perf fixes: global --active index; avoid broad note scans in Active.

Usefulness upgrades: coherent overlap “touch” metric; canonical refs everywhere; reviews prefix more robust.

If you want one extra “stretch” that still isn’t scope creep: add an unscoped warning line in human output when project_id == None (e.g., “Aggregated across projects; use -p to scope”) for Expert/Overlap/Active. That’s pure presentation, but prevents misinterpretation in multi-project DBs.

12 KiB Raw Permalink Blame History Unescape Escape

12 KiB

Raw Permalink Blame History