Files
gitlore/docs/who-command-design.feedback-5.md
Taylor Eernisse 8dc479e515 docs: add lore who command design plan with 8 iterations of review feedback
Design document for `lore who` — a people intelligence query layer over
existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53
participants). Answers five collaboration questions: expert lookup by
file/path, workload summary, review pattern analysis, active discussion
tracking, and file overlap detection.

Key design decisions refined across 8 feedback iterations:
- All SQL is fully static (no format!()) with prepare_cached() throughout
- Exact vs prefix path matching via PathQuery struct (two static SQL variants)
- Self-review exclusion (author != reviewer) on all DiffNote branches
- Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers
- Bounded payloads with *_total/*_truncated metadata for robot consumers
- Truncation transparency via LIMIT+1 overflow detection pattern
- Robot JSON includes resolved_input for reproducibility (since_mode tri-state)
- Multi-project correctness with project-qualified entity references
- Composite migration indexes designed for query selectivity on hot paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-07 21:35:05 -05:00

12 KiB
Raw Permalink Blame History

Below are the highest-leverage revisions Id make. Theyre tightly scoped (no new tables/APIs), but fix a few real correctness issues and make the outputs more actionable.

  1. Fix a correctness bug in PathQuery: dont escape for =, and make --path Makefile actually work Why

Bug: build_path_query() currently runs escape_like() even when is_prefix = false (exact match). That will break exact matches for paths containing _, %, or \ because = does not treat those as metacharacters (so the escaped string wont equal the stored path).

UX mismatch: The plan says --path handles dotless root files (Makefile/LICENSE), but the current logic still treats them as directory prefixes (Makefile/%) → zero results.

Change

Only escape for LIKE.

Treat root paths (no /) passed via --path as exact matches by default (unless they end with /).

diff Copy code diff --git a/plan.md b/plan.md @@ -/// Build a path query from a user-supplied path. -/// -/// Rules: -/// - If the path ends with /, it's a directory prefix -> escaped_path% (LIKE) -/// - If the last path segment contains ., it's a file -> exact match (=) -/// - Otherwise, it's a directory prefix -> escaped_path/% (LIKE) +/// Build a path query from a user-supplied path. +/// +/// Rules: +/// - If the path ends with /, it's a directory prefix -> escaped_path/% (LIKE) +/// - If the path is a root path (no /) and does NOT end with /, treat as exact (=) +/// (this makes --path Makefile and --path LICENSE work as intended) +/// - Else if the last path segment contains ., treat as exact (=) +/// - Otherwise, treat as directory prefix -> escaped_path/% (LIKE) @@ -fn build_path_query(path: &str) -> PathQuery { +fn build_path_query(path: &str) -> PathQuery { let trimmed = path.trim_end_matches('/'); let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);

  • let is_file = !path.ends_with('/') && last_segment.contains('.');
  • let escaped = escape_like(trimmed);
  • let is_root = !trimmed.contains('/');

  • let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));

    if is_file { PathQuery {

  •        value: escaped,
    
  •        // IMPORTANT: do NOT escape for exact match (=)
    
  •        value: trimmed.to_string(),
           is_prefix: false,
       }
    
    } else {
  •    let escaped = escape_like(trimmed);
       PathQuery {
           value: format!("{escaped}/%"),
           is_prefix: true,
       }
    
    } } @@ -/// Known limitation: Dotless root files (LICENSE, Makefile, Dockerfile) -/// without a trailing / will be treated as directory prefixes. Use --path -/// for these — the --path flag passes through to Expert mode directly, -/// and the build_path_query output for "LICENSE" is a prefix LICENSE/% -/// which will simply return zero results (a safe, obvious failure mode that the -/// help text addresses). +/// Note: Root file paths passed via --path (including dotless files like Makefile/LICENSE) +/// are treated as exact matches unless they end with /.

Also update the --path help text to be explicit:

diff Copy code diff --git a/plan.md b/plan.md @@

  • /// Force expert mode for a file/directory path (handles root files like
  • /// README.md, LICENSE, Makefile that lack a / and can't be auto-detected)
  • /// Force expert mode for a file/directory path.
  • /// Root files (README.md, LICENSE, Makefile) are treated as exact matches.
  • /// Use a trailing / to force directory-prefix matching.
  1. Fix Active mode: your note_count is currently counting participants, and the CTE scans too broadly Why

In note_agg, you do SELECT DISTINCT discussion_id, author_username and then COUNT(*) AS note_count. Thats participant count, not note count.

The current note_agg also builds the DISTINCT set from all notes then joins to picked. Its avoidable work.

Change

Split into two aggregations scoped to picked:

note_counts: counts non-system notes per picked discussion.

participants: distinct usernames per picked discussion, then GROUP_CONCAT.

diff Copy code diff --git a/plan.md b/plan.md @@

  •    note_agg AS (
    
  •        SELECT
    
  •            n.discussion_id,
    
  •            COUNT(*) AS note_count,
    
  •            GROUP_CONCAT(n.author_username, X'1F') AS participants
    
  •        FROM (
    
  •            SELECT DISTINCT discussion_id, author_username
    
  •            FROM notes
    
  •            WHERE is_system = 0 AND author_username IS NOT NULL
    
  •        ) n
    
  •        JOIN picked p ON p.id = n.discussion_id
    
  •        GROUP BY n.discussion_id
    
  •    )
    
  •    note_counts AS (
    
  •        SELECT
    
  •            n.discussion_id,
    
  •            COUNT(*) AS note_count
    
  •        FROM notes n
    
  •        JOIN picked p ON p.id = n.discussion_id
    
  •        WHERE n.is_system = 0
    
  •        GROUP BY n.discussion_id
    
  •    ),
    
  •    participants AS (
    
  •        SELECT
    
  •            x.discussion_id,
    
  •            GROUP_CONCAT(x.author_username, X'1F') AS participants
    
  •        FROM (
    
  •            SELECT DISTINCT n.discussion_id, n.author_username
    
  •            FROM notes n
    
  •            JOIN picked p ON p.id = n.discussion_id
    
  •            WHERE n.is_system = 0 AND n.author_username IS NOT NULL
    
  •        ) x
    
  •        GROUP BY x.discussion_id
    
  •    )
    

@@

  •    LEFT JOIN note_agg na ON na.discussion_id = p.id
    
  •    LEFT JOIN note_counts nc ON nc.discussion_id = p.id
    
  •    LEFT JOIN participants pa ON pa.discussion_id = p.id
    

@@

  •        COALESCE(na.note_count, 0) AS note_count,
    
  •        COALESCE(na.participants, '') AS participants
    
  •        COALESCE(nc.note_count, 0) AS note_count,
    
  •        COALESCE(pa.participants, '') AS participants
    

Net effect: correctness fix + more predictable perf.

Add a test that would have failed before:

diff Copy code diff --git a/plan.md b/plan.md @@ #[test] fn test_active_query() { @@

  •    insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
    
  •    insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
    
  •    insert_diffnote(&conn, 2, 1, 1, "reviewer_b", "src/foo.rs", "follow-up");
    

@@

  •    assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
    
  •    assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
    
  •    assert_eq!(result.discussions[0].note_count, 2);
    
  1. Index fix: idx_discussions_unresolved_recent wont help global --active ordering Why

Your index is (project_id, last_note_at) with WHERE resolvable=1 AND resolved=0.

When --active is not project-scoped (common default), SQLite cant use (project_id, last_note_at) to satisfy ORDER BY last_note_at DESC efficiently because project_id isnt constrained.

This can turn into a scan+sort over potentially large unresolved sets.

Change

Keep the project-scoped index, but add a global ordering index (partial, still small):

diff Copy code diff --git a/plan.md b/plan.md @@ CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent ON discussions(project_id, last_note_at) WHERE resolvable = 1 AND resolved = 0; + +-- Active (global): unresolved discussions by recency (no project scope). +-- Supports ORDER BY last_note_at DESC LIMIT N when project_id is unconstrained. +CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global

  • ON discussions(last_note_at)
  • WHERE resolvable = 1 AND resolved = 0;
  1. Make Overlap “touches” coherent: count MRs for reviewers, not DiffNotes Why

Overlaps question is “Who else has MRs touching my files?” but:

reviewer branch uses COUNT(*) (DiffNotes)

author branch uses COUNT(DISTINCT m.id) (MRs)

Those are different units; summing them into touch_count is misleading.

Change

Count distinct MRs on the reviewer branch too:

diff Copy code diff --git a/plan.md b/plan.md @@

  •            COUNT(*) AS touch_count,
    
  •            COUNT(DISTINCT m.id) AS touch_count,
               MAX(n.created_at) AS last_touch_at,
               GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
    

Also update human output labeling:

diff Copy code diff --git a/plan.md b/plan.md @@

  •    style("Touches").bold(),
    
  •    style("MRs").bold(),
    

(You still preserve “strength” via mr_refs and last_touch_at.)

  1. Make outputs more actionable: add a canonical ref field (group/project!iid, group/project#iid) Why

You already do this for Overlap (mr_refs). Doing the same for Workload and Active reduces friction for both humans and agents:

humans can copy/paste a single token

robots dont need to stitch project_path + iid + prefix

Change (Workload structs + SQL) diff Copy code diff --git a/plan.md b/plan.md @@ pub struct WorkloadIssue { pub iid: i64,

  • pub ref_: String, pub title: String, pub project_path: String, pub updated_at: i64, } @@ pub struct WorkloadMr { pub iid: i64,
  • pub ref_: String, pub title: String, pub draft: bool, pub project_path: String, @@
  • let issues_sql =
  •    "SELECT i.iid, i.title, p.path_with_namespace, i.updated_at
    
  • let issues_sql =
  •    "SELECT i.iid,
    
  •            (p.path_with_namespace || '#' || i.iid) AS ref,
    
  •            i.title, p.path_with_namespace, i.updated_at
    

@@

  •            iid: row.get(0)?,
    
  •            title: row.get(1)?,
    
  •            project_path: row.get(2)?,
    
  •            updated_at: row.get(3)?,
    
  •            iid: row.get(0)?,
    
  •            ref_: row.get(1)?,
    
  •            title: row.get(2)?,
    
  •            project_path: row.get(3)?,
    
  •            updated_at: row.get(4)?,
           })
    

@@

  • let authored_sql =
  •    "SELECT m.iid, m.title, m.draft, p.path_with_namespace, m.updated_at
    
  • let authored_sql =
  •    "SELECT m.iid,
    
  •            (p.path_with_namespace || '!' || m.iid) AS ref,
    
  •            m.title, m.draft, p.path_with_namespace, m.updated_at
    

@@

  •            iid: row.get(0)?,
    
  •            title: row.get(1)?,
    
  •            draft: row.get::<_, i32>(2)? != 0,
    
  •            project_path: row.get(3)?,
    
  •            iid: row.get(0)?,
    
  •            ref_: row.get(1)?,
    
  •            title: row.get(2)?,
    
  •            draft: row.get::<_, i32>(3)? != 0,
    
  •            project_path: row.get(4)?,
               author_username: None,
    
  •            updated_at: row.get(4)?,
    
  •            updated_at: row.get(5)?,
           })
    

Then use ref_ in human output + robot JSON.

  1. Reviews mode: tolerate leading whitespace before prefix Why

Many people write " suggestion: ...". Current LIKE '%%' misses that.

Change

Use ltrim(n.body) consistently:

diff Copy code diff --git a/plan.md b/plan.md @@

  •       AND n.body LIKE '**%**%'
    
  •       AND ltrim(n.body) LIKE '**%**%'
    

@@

  •        SUBSTR(n.body, 3, INSTR(SUBSTR(n.body, 3), '**') - 1) AS raw_prefix,
    
  •        SUBSTR(ltrim(n.body), 3, INSTR(SUBSTR(ltrim(n.body), 3), '**') - 1) AS raw_prefix,
    
  1. Add two small tests that catch the above regressions Why

These are exactly the kind of issues that slip through without targeted tests.

diff Copy code diff --git a/plan.md b/plan.md @@ #[test] fn test_escape_like() { @@ } +

  • #[test]
  • fn test_build_path_query_exact_does_not_escape() {
  •    // '_' must not be escaped for '='
    
  •    let pq = build_path_query("README_with_underscore.md");
    
  •    assert_eq!(pq.value, "README_with_underscore.md");
    
  •    assert!(!pq.is_prefix);
    
  • }
  • #[test]
  • fn test_path_flag_dotless_root_file_is_exact() {
  •    let pq = build_path_query("Makefile");
    
  •    assert_eq!(pq.value, "Makefile");
    
  •    assert!(!pq.is_prefix);
    
  • }

Summary of net effect

Correctness fixes: exact-path escaping bug; Active.note_count bug.

Perf fixes: global --active index; avoid broad note scans in Active.

Usefulness upgrades: coherent overlap “touch” metric; canonical refs everywhere; reviews prefix more robust.

If you want one extra “stretch” that still isnt scope creep: add an unscoped warning line in human output when project_id == None (e.g., “Aggregated across projects; use -p to scope”) for Expert/Overlap/Active. Thats pure presentation, but prevents misinterpretation in multi-project DBs.