Design document for `lore who` — a people intelligence query layer over existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53 participants). Answers five collaboration questions: expert lookup by file/path, workload summary, review pattern analysis, active discussion tracking, and file overlap detection. Key design decisions refined across 8 feedback iterations: - All SQL is fully static (no format!()) with prepare_cached() throughout - Exact vs prefix path matching via PathQuery struct (two static SQL variants) - Self-review exclusion (author != reviewer) on all DiffNote branches - Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers - Bounded payloads with *_total/*_truncated metadata for robot consumers - Truncation transparency via LIMIT+1 overflow detection pattern - Robot JSON includes resolved_input for reproducibility (since_mode tri-state) - Multi-project correctness with project-qualified entity references - Composite migration indexes designed for query selectivity on hot paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
12 KiB
Below are the highest-leverage revisions I’d make. They’re tightly scoped (no new tables/APIs), but fix a few real correctness issues and make the outputs more actionable.
- Fix a correctness bug in PathQuery: don’t escape for =, and make --path Makefile actually work Why
Bug: build_path_query() currently runs escape_like() even when is_prefix = false (exact match). That will break exact matches for paths containing _, %, or \ because = does not treat those as metacharacters (so the escaped string won’t equal the stored path).
UX mismatch: The plan says --path handles dotless root files (Makefile/LICENSE), but the current logic still treats them as directory prefixes (Makefile/%) → zero results.
Change
Only escape for LIKE.
Treat root paths (no /) passed via --path as exact matches by default (unless they end with /).
diff
Copy code
diff --git a/plan.md b/plan.md
@@
-/// Build a path query from a user-supplied path.
-///
-/// Rules:
-/// - If the path ends with /, it's a directory prefix -> escaped_path% (LIKE)
-/// - If the last path segment contains ., it's a file -> exact match (=)
-/// - Otherwise, it's a directory prefix -> escaped_path/% (LIKE)
+/// Build a path query from a user-supplied path.
+///
+/// Rules:
+/// - If the path ends with /, it's a directory prefix -> escaped_path/% (LIKE)
+/// - If the path is a root path (no /) and does NOT end with /, treat as exact (=)
+/// (this makes --path Makefile and --path LICENSE work as intended)
+/// - Else if the last path segment contains ., treat as exact (=)
+/// - Otherwise, treat as directory prefix -> escaped_path/% (LIKE)
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(path: &str) -> PathQuery {
let trimmed = path.trim_end_matches('/');
let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
- let is_file = !path.ends_with('/') && last_segment.contains('.');
- let escaped = escape_like(trimmed);
-
let is_root = !trimmed.contains('/');
-
let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
if is_file { PathQuery {
-
value: escaped,
-
// IMPORTANT: do NOT escape for exact match (=) -
} else {value: trimmed.to_string(), is_prefix: false, } -
} } @@ -/// Known limitation: Dotless root files (LICENSE, Makefile, Dockerfile) -/// without a trailinglet escaped = escape_like(trimmed); PathQuery { value: format!("{escaped}/%"), is_prefix: true, }/will be treated as directory prefixes. Use--path-/// for these — the--pathflag passes through to Expert mode directly, -/// and thebuild_path_queryoutput for "LICENSE" is a prefixLICENSE/%-/// which will simply return zero results (a safe, obvious failure mode that the -/// help text addresses). +/// Note: Root file paths passed via--path(including dotless files like Makefile/LICENSE) +/// are treated as exact matches unless they end with/.
Also update the --path help text to be explicit:
diff Copy code diff --git a/plan.md b/plan.md @@
- /// Force expert mode for a file/directory path (handles root files like
- /// README.md, LICENSE, Makefile that lack a / and can't be auto-detected)
- /// Force expert mode for a file/directory path.
- /// Root files (README.md, LICENSE, Makefile) are treated as exact matches.
- /// Use a trailing
/to force directory-prefix matching.
- Fix Active mode: your note_count is currently counting participants, and the CTE scans too broadly Why
In note_agg, you do SELECT DISTINCT discussion_id, author_username and then COUNT(*) AS note_count. That’s participant count, not note count.
The current note_agg also builds the DISTINCT set from all notes then joins to picked. It’s avoidable work.
Change
Split into two aggregations scoped to picked:
note_counts: counts non-system notes per picked discussion.
participants: distinct usernames per picked discussion, then GROUP_CONCAT.
diff Copy code diff --git a/plan.md b/plan.md @@
-
note_agg AS ( -
SELECT -
n.discussion_id, -
COUNT(*) AS note_count, -
GROUP_CONCAT(n.author_username, X'1F') AS participants -
FROM ( -
SELECT DISTINCT discussion_id, author_username -
FROM notes -
WHERE is_system = 0 AND author_username IS NOT NULL -
) n -
JOIN picked p ON p.id = n.discussion_id -
GROUP BY n.discussion_id -
)
-
note_counts AS ( -
SELECT -
n.discussion_id, -
COUNT(*) AS note_count -
FROM notes n -
JOIN picked p ON p.id = n.discussion_id -
WHERE n.is_system = 0 -
GROUP BY n.discussion_id -
), -
participants AS ( -
SELECT -
x.discussion_id, -
GROUP_CONCAT(x.author_username, X'1F') AS participants -
FROM ( -
SELECT DISTINCT n.discussion_id, n.author_username -
FROM notes n -
JOIN picked p ON p.id = n.discussion_id -
WHERE n.is_system = 0 AND n.author_username IS NOT NULL -
) x -
GROUP BY x.discussion_id -
)
@@
-
LEFT JOIN note_agg na ON na.discussion_id = p.id
-
LEFT JOIN note_counts nc ON nc.discussion_id = p.id -
LEFT JOIN participants pa ON pa.discussion_id = p.id
@@
-
COALESCE(na.note_count, 0) AS note_count, -
COALESCE(na.participants, '') AS participants
-
COALESCE(nc.note_count, 0) AS note_count, -
COALESCE(pa.participants, '') AS participants
Net effect: correctness fix + more predictable perf.
Add a test that would have failed before:
diff Copy code diff --git a/plan.md b/plan.md @@ #[test] fn test_active_query() { @@
-
insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
-
insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work"); -
insert_diffnote(&conn, 2, 1, 1, "reviewer_b", "src/foo.rs", "follow-up");
@@
-
assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
-
assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]); -
assert_eq!(result.discussions[0].note_count, 2);
- Index fix: idx_discussions_unresolved_recent won’t help global --active ordering Why
Your index is (project_id, last_note_at) with WHERE resolvable=1 AND resolved=0.
When --active is not project-scoped (common default), SQLite can’t use (project_id, last_note_at) to satisfy ORDER BY last_note_at DESC efficiently because project_id isn’t constrained.
This can turn into a scan+sort over potentially large unresolved sets.
Change
Keep the project-scoped index, but add a global ordering index (partial, still small):
diff Copy code diff --git a/plan.md b/plan.md @@ CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent ON discussions(project_id, last_note_at) WHERE resolvable = 1 AND resolved = 0; + +-- Active (global): unresolved discussions by recency (no project scope). +-- Supports ORDER BY last_note_at DESC LIMIT N when project_id is unconstrained. +CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global
- ON discussions(last_note_at)
- WHERE resolvable = 1 AND resolved = 0;
- Make Overlap “touches” coherent: count MRs for reviewers, not DiffNotes Why
Overlap’s question is “Who else has MRs touching my files?” but:
reviewer branch uses COUNT(*) (DiffNotes)
author branch uses COUNT(DISTINCT m.id) (MRs)
Those are different units; summing them into touch_count is misleading.
Change
Count distinct MRs on the reviewer branch too:
diff Copy code diff --git a/plan.md b/plan.md @@
-
COUNT(*) AS touch_count,
-
COUNT(DISTINCT m.id) AS touch_count, MAX(n.created_at) AS last_touch_at, GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
Also update human output labeling:
diff Copy code diff --git a/plan.md b/plan.md @@
-
style("Touches").bold(),
-
style("MRs").bold(),
(You still preserve “strength” via mr_refs and last_touch_at.)
- Make outputs more actionable: add a canonical ref field (group/project!iid, group/project#iid) Why
You already do this for Overlap (mr_refs). Doing the same for Workload and Active reduces friction for both humans and agents:
humans can copy/paste a single token
robots don’t need to stitch project_path + iid + prefix
Change (Workload structs + SQL) diff Copy code diff --git a/plan.md b/plan.md @@ pub struct WorkloadIssue { pub iid: i64,
- pub ref_: String, pub title: String, pub project_path: String, pub updated_at: i64, } @@ pub struct WorkloadMr { pub iid: i64,
- pub ref_: String, pub title: String, pub draft: bool, pub project_path: String, @@
- let issues_sql =
-
"SELECT i.iid, i.title, p.path_with_namespace, i.updated_at
- let issues_sql =
-
"SELECT i.iid, -
(p.path_with_namespace || '#' || i.iid) AS ref, -
i.title, p.path_with_namespace, i.updated_at
@@
-
iid: row.get(0)?, -
title: row.get(1)?, -
project_path: row.get(2)?, -
updated_at: row.get(3)?,
-
iid: row.get(0)?, -
ref_: row.get(1)?, -
title: row.get(2)?, -
project_path: row.get(3)?, -
updated_at: row.get(4)?, })
@@
- let authored_sql =
-
"SELECT m.iid, m.title, m.draft, p.path_with_namespace, m.updated_at
- let authored_sql =
-
"SELECT m.iid, -
(p.path_with_namespace || '!' || m.iid) AS ref, -
m.title, m.draft, p.path_with_namespace, m.updated_at
@@
-
iid: row.get(0)?, -
title: row.get(1)?, -
draft: row.get::<_, i32>(2)? != 0, -
project_path: row.get(3)?,
-
iid: row.get(0)?, -
ref_: row.get(1)?, -
title: row.get(2)?, -
draft: row.get::<_, i32>(3)? != 0, -
project_path: row.get(4)?, author_username: None,
-
updated_at: row.get(4)?,
-
updated_at: row.get(5)?, })
Then use ref_ in human output + robot JSON.
- Reviews mode: tolerate leading whitespace before prefix Why
Many people write " suggestion: ...". Current LIKE '%%' misses that.
Change
Use ltrim(n.body) consistently:
diff Copy code diff --git a/plan.md b/plan.md @@
-
AND n.body LIKE '**%**%'
-
AND ltrim(n.body) LIKE '**%**%'
@@
-
SUBSTR(n.body, 3, INSTR(SUBSTR(n.body, 3), '**') - 1) AS raw_prefix,
-
SUBSTR(ltrim(n.body), 3, INSTR(SUBSTR(ltrim(n.body), 3), '**') - 1) AS raw_prefix,
- Add two small tests that catch the above regressions Why
These are exactly the kind of issues that slip through without targeted tests.
diff Copy code diff --git a/plan.md b/plan.md @@ #[test] fn test_escape_like() { @@ } +
- #[test]
- fn test_build_path_query_exact_does_not_escape() {
-
// '_' must not be escaped for '=' -
let pq = build_path_query("README_with_underscore.md"); -
assert_eq!(pq.value, "README_with_underscore.md"); -
assert!(!pq.is_prefix); - }
- #[test]
- fn test_path_flag_dotless_root_file_is_exact() {
-
let pq = build_path_query("Makefile"); -
assert_eq!(pq.value, "Makefile"); -
assert!(!pq.is_prefix); - }
Summary of net effect
Correctness fixes: exact-path escaping bug; Active.note_count bug.
Perf fixes: global --active index; avoid broad note scans in Active.
Usefulness upgrades: coherent overlap “touch” metric; canonical refs everywhere; reviews prefix more robust.
If you want one extra “stretch” that still isn’t scope creep: add an unscoped warning line in human output when project_id == None (e.g., “Aggregated across projects; use -p to scope”) for Expert/Overlap/Active. That’s pure presentation, but prevents misinterpretation in multi-project DBs.