docs: add lore who command design plan with 8 iterations of review feedback

Design document for `lore who` — a people intelligence query layer over
existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53
participants). Answers five collaboration questions: expert lookup by
file/path, workload summary, review pattern analysis, active discussion
tracking, and file overlap detection.

Key design decisions refined across 8 feedback iterations:
- All SQL is fully static (no format!()) with prepare_cached() throughout
- Exact vs prefix path matching via PathQuery struct (two static SQL variants)
- Self-review exclusion (author != reviewer) on all DiffNote branches
- Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers
- Bounded payloads with *_total/*_truncated metadata for robot consumers
- Truncation transparency via LIMIT+1 overflow detection pattern
- Robot JSON includes resolved_input for reproducibility (since_mode tri-state)
- Multi-project correctness with project-qualified entity references
- Composite migration indexes designed for query selectivity on hot paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Taylor Eernisse
2026-02-07 21:35:05 -05:00
parent 3e7fa607d3
commit 8dc479e515
9 changed files with 6194 additions and 0 deletions

View File

@@ -0,0 +1,356 @@
Below are the highest-leverage revisions Id make. Theyre tightly scoped (no new tables/APIs), but fix a few real correctness issues and make the outputs more actionable.
1) Fix a correctness bug in PathQuery: dont escape for =, and make --path Makefile actually work
Why
Bug: build_path_query() currently runs escape_like() even when is_prefix = false (exact match). That will break exact matches for paths containing _, %, or \ because = does not treat those as metacharacters (so the escaped string wont equal the stored path).
UX mismatch: The plan says --path handles dotless root files (Makefile/LICENSE), but the current logic still treats them as directory prefixes (Makefile/%) → zero results.
Change
Only escape for LIKE.
Treat root paths (no /) passed via --path as exact matches by default (unless they end with /).
diff
Copy code
diff --git a/plan.md b/plan.md
@@
-/// Build a path query from a user-supplied path.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix -> `escaped_path%` (LIKE)
-/// - If the last path segment contains `.`, it's a file -> exact match (=)
-/// - Otherwise, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// Build a path query from a user-supplied path.
+///
+/// Rules:
+/// - If the path ends with `/`, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// - If the path is a root path (no `/`) and does NOT end with `/`, treat as exact (=)
+/// (this makes `--path Makefile` and `--path LICENSE` work as intended)
+/// - Else if the last path segment contains `.`, treat as exact (=)
+/// - Otherwise, treat as directory prefix -> `escaped_path/%` (LIKE)
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(path: &str) -> PathQuery {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
- let is_file = !path.ends_with('/') && last_segment.contains('.');
- let escaped = escape_like(trimmed);
+ let is_root = !trimmed.contains('/');
+ let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
if is_file {
PathQuery {
- value: escaped,
+ // IMPORTANT: do NOT escape for exact match (=)
+ value: trimmed.to_string(),
is_prefix: false,
}
} else {
+ let escaped = escape_like(trimmed);
PathQuery {
value: format!("{escaped}/%"),
is_prefix: true,
}
}
}
@@
-/// **Known limitation:** Dotless root files (LICENSE, Makefile, Dockerfile)
-/// without a trailing `/` will be treated as directory prefixes. Use `--path`
-/// for these — the `--path` flag passes through to Expert mode directly,
-/// and the `build_path_query` output for "LICENSE" is a prefix `LICENSE/%`
-/// which will simply return zero results (a safe, obvious failure mode that the
-/// help text addresses).
+/// Note: Root file paths passed via `--path` (including dotless files like Makefile/LICENSE)
+/// are treated as exact matches unless they end with `/`.
Also update the --path help text to be explicit:
diff
Copy code
diff --git a/plan.md b/plan.md
@@
- /// Force expert mode for a file/directory path (handles root files like
- /// README.md, LICENSE, Makefile that lack a / and can't be auto-detected)
+ /// Force expert mode for a file/directory path.
+ /// Root files (README.md, LICENSE, Makefile) are treated as exact matches.
+ /// Use a trailing `/` to force directory-prefix matching.
2) Fix Active mode: your note_count is currently counting participants, and the CTE scans too broadly
Why
In note_agg, you do SELECT DISTINCT discussion_id, author_username and then COUNT(*) AS note_count. Thats participant count, not note count.
The current note_agg also builds the DISTINCT set from all notes then joins to picked. Its avoidable work.
Change
Split into two aggregations scoped to picked:
note_counts: counts non-system notes per picked discussion.
participants: distinct usernames per picked discussion, then GROUP_CONCAT.
diff
Copy code
diff --git a/plan.md b/plan.md
@@
- note_agg AS (
- SELECT
- n.discussion_id,
- COUNT(*) AS note_count,
- GROUP_CONCAT(n.author_username, X'1F') AS participants
- FROM (
- SELECT DISTINCT discussion_id, author_username
- FROM notes
- WHERE is_system = 0 AND author_username IS NOT NULL
- ) n
- JOIN picked p ON p.id = n.discussion_id
- GROUP BY n.discussion_id
- )
+ note_counts AS (
+ SELECT
+ n.discussion_id,
+ COUNT(*) AS note_count
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0
+ GROUP BY n.discussion_id
+ ),
+ participants AS (
+ SELECT
+ x.discussion_id,
+ GROUP_CONCAT(x.author_username, X'1F') AS participants
+ FROM (
+ SELECT DISTINCT n.discussion_id, n.author_username
+ FROM notes n
+ JOIN picked p ON p.id = n.discussion_id
+ WHERE n.is_system = 0 AND n.author_username IS NOT NULL
+ ) x
+ GROUP BY x.discussion_id
+ )
@@
- LEFT JOIN note_agg na ON na.discussion_id = p.id
+ LEFT JOIN note_counts nc ON nc.discussion_id = p.id
+ LEFT JOIN participants pa ON pa.discussion_id = p.id
@@
- COALESCE(na.note_count, 0) AS note_count,
- COALESCE(na.participants, '') AS participants
+ COALESCE(nc.note_count, 0) AS note_count,
+ COALESCE(pa.participants, '') AS participants
Net effect: correctness fix + more predictable perf.
Add a test that would have failed before:
diff
Copy code
diff --git a/plan.md b/plan.md
@@
#[test]
fn test_active_query() {
@@
- insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+ insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+ insert_diffnote(&conn, 2, 1, 1, "reviewer_b", "src/foo.rs", "follow-up");
@@
- assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+ assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+ assert_eq!(result.discussions[0].note_count, 2);
3) Index fix: idx_discussions_unresolved_recent wont help global --active ordering
Why
Your index is (project_id, last_note_at) with WHERE resolvable=1 AND resolved=0.
When --active is not project-scoped (common default), SQLite cant use (project_id, last_note_at) to satisfy ORDER BY last_note_at DESC efficiently because project_id isnt constrained.
This can turn into a scan+sort over potentially large unresolved sets.
Change
Keep the project-scoped index, but add a global ordering index (partial, still small):
diff
Copy code
diff --git a/plan.md b/plan.md
@@
CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent
ON discussions(project_id, last_note_at)
WHERE resolvable = 1 AND resolved = 0;
+
+-- Active (global): unresolved discussions by recency (no project scope).
+-- Supports ORDER BY last_note_at DESC LIMIT N when project_id is unconstrained.
+CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global
+ ON discussions(last_note_at)
+ WHERE resolvable = 1 AND resolved = 0;
4) Make Overlap “touches” coherent: count MRs for reviewers, not DiffNotes
Why
Overlaps question is “Who else has MRs touching my files?” but:
reviewer branch uses COUNT(*) (DiffNotes)
author branch uses COUNT(DISTINCT m.id) (MRs)
Those are different units; summing them into touch_count is misleading.
Change
Count distinct MRs on the reviewer branch too:
diff
Copy code
diff --git a/plan.md b/plan.md
@@
- COUNT(*) AS touch_count,
+ COUNT(DISTINCT m.id) AS touch_count,
MAX(n.created_at) AS last_touch_at,
GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
Also update human output labeling:
diff
Copy code
diff --git a/plan.md b/plan.md
@@
- style("Touches").bold(),
+ style("MRs").bold(),
(You still preserve “strength” via mr_refs and last_touch_at.)
5) Make outputs more actionable: add a canonical ref field (group/project!iid, group/project#iid)
Why
You already do this for Overlap (mr_refs). Doing the same for Workload and Active reduces friction for both humans and agents:
humans can copy/paste a single token
robots dont need to stitch project_path + iid + prefix
Change (Workload structs + SQL)
diff
Copy code
diff --git a/plan.md b/plan.md
@@
pub struct WorkloadIssue {
pub iid: i64,
+ pub ref_: String,
pub title: String,
pub project_path: String,
pub updated_at: i64,
}
@@
pub struct WorkloadMr {
pub iid: i64,
+ pub ref_: String,
pub title: String,
pub draft: bool,
pub project_path: String,
@@
- let issues_sql =
- "SELECT i.iid, i.title, p.path_with_namespace, i.updated_at
+ let issues_sql =
+ "SELECT i.iid,
+ (p.path_with_namespace || '#' || i.iid) AS ref,
+ i.title, p.path_with_namespace, i.updated_at
@@
- iid: row.get(0)?,
- title: row.get(1)?,
- project_path: row.get(2)?,
- updated_at: row.get(3)?,
+ iid: row.get(0)?,
+ ref_: row.get(1)?,
+ title: row.get(2)?,
+ project_path: row.get(3)?,
+ updated_at: row.get(4)?,
})
@@
- let authored_sql =
- "SELECT m.iid, m.title, m.draft, p.path_with_namespace, m.updated_at
+ let authored_sql =
+ "SELECT m.iid,
+ (p.path_with_namespace || '!' || m.iid) AS ref,
+ m.title, m.draft, p.path_with_namespace, m.updated_at
@@
- iid: row.get(0)?,
- title: row.get(1)?,
- draft: row.get::<_, i32>(2)? != 0,
- project_path: row.get(3)?,
+ iid: row.get(0)?,
+ ref_: row.get(1)?,
+ title: row.get(2)?,
+ draft: row.get::<_, i32>(3)? != 0,
+ project_path: row.get(4)?,
author_username: None,
- updated_at: row.get(4)?,
+ updated_at: row.get(5)?,
})
Then use ref_ in human output + robot JSON.
6) Reviews mode: tolerate leading whitespace before **prefix**
Why
Many people write " **suggestion**: ...". Current LIKE '**%**%' misses that.
Change
Use ltrim(n.body) consistently:
diff
Copy code
diff --git a/plan.md b/plan.md
@@
- AND n.body LIKE '**%**%'
+ AND ltrim(n.body) LIKE '**%**%'
@@
- SUBSTR(n.body, 3, INSTR(SUBSTR(n.body, 3), '**') - 1) AS raw_prefix,
+ SUBSTR(ltrim(n.body), 3, INSTR(SUBSTR(ltrim(n.body), 3), '**') - 1) AS raw_prefix,
7) Add two small tests that catch the above regressions
Why
These are exactly the kind of issues that slip through without targeted tests.
diff
Copy code
diff --git a/plan.md b/plan.md
@@
#[test]
fn test_escape_like() {
@@
}
+
+ #[test]
+ fn test_build_path_query_exact_does_not_escape() {
+ // '_' must not be escaped for '='
+ let pq = build_path_query("README_with_underscore.md");
+ assert_eq!(pq.value, "README_with_underscore.md");
+ assert!(!pq.is_prefix);
+ }
+
+ #[test]
+ fn test_path_flag_dotless_root_file_is_exact() {
+ let pq = build_path_query("Makefile");
+ assert_eq!(pq.value, "Makefile");
+ assert!(!pq.is_prefix);
+ }
Summary of net effect
Correctness fixes: exact-path escaping bug; Active.note_count bug.
Perf fixes: global --active index; avoid broad note scans in Active.
Usefulness upgrades: coherent overlap “touch” metric; canonical refs everywhere; reviews prefix more robust.
If you want one extra “stretch” that still isnt scope creep: add an unscoped warning line in human output when project_id == None (e.g., “Aggregated across projects; use -p to scope”) for Expert/Overlap/Active. Thats pure presentation, but prevents misinterpretation in multi-project DBs.