docs: add lore who command design plan with 8 iterations of review feedback
Design document for `lore who` — a people intelligence query layer over existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53 participants). Answers five collaboration questions: expert lookup by file/path, workload summary, review pattern analysis, active discussion tracking, and file overlap detection. Key design decisions refined across 8 feedback iterations: - All SQL is fully static (no format!()) with prepare_cached() throughout - Exact vs prefix path matching via PathQuery struct (two static SQL variants) - Self-review exclusion (author != reviewer) on all DiffNote branches - Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers - Bounded payloads with *_total/*_truncated metadata for robot consumers - Truncation transparency via LIMIT+1 overflow detection pattern - Robot JSON includes resolved_input for reproducibility (since_mode tri-state) - Multi-project correctness with project-qualified entity references - Composite migration indexes designed for query selectivity on hot paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
434
docs/who-command-design.feedback-1.md
Normal file
434
docs/who-command-design.feedback-1.md
Normal file
@@ -0,0 +1,434 @@
|
||||
Below are the highest-leverage revisions I’d make to this plan. I’m focusing on correctness pitfalls, SQLite gotchas, query performance on 280K notes, and reducing “dynamic SQL + param juggling” complexity—without turning this into a new ingestion project.
|
||||
|
||||
Change 1 — Fix a hard SQLite bug in --active (GROUP_CONCAT DISTINCT + separator)
|
||||
Why
|
||||
|
||||
SQLite does not allow GROUP_CONCAT(DISTINCT x, sep). With DISTINCT, SQLite only permits a single argument (GROUP_CONCAT(DISTINCT x)). Your current query will error at runtime in many SQLite versions.
|
||||
|
||||
Revision
|
||||
|
||||
Use a subquery that selects distinct participants, then GROUP_CONCAT with your separator.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ fn query_active(...)
|
||||
- (SELECT GROUP_CONCAT(DISTINCT n.author_username, X'1F')
|
||||
- FROM notes n
|
||||
- WHERE n.discussion_id = d.id
|
||||
- AND n.is_system = 0
|
||||
- AND n.author_username IS NOT NULL) AS participants
|
||||
+ (SELECT GROUP_CONCAT(username, X'1F') FROM (
|
||||
+ SELECT DISTINCT n.author_username AS username
|
||||
+ FROM notes n
|
||||
+ WHERE n.discussion_id = d.id
|
||||
+ AND n.is_system = 0
|
||||
+ AND n.author_username IS NOT NULL
|
||||
+ ORDER BY username
|
||||
+ )) AS participants
|
||||
|
||||
Change 2 — Replace “contains('.') => exact file match” with segment-aware path classification
|
||||
Why
|
||||
|
||||
path.contains('.') misclassifies directories like:
|
||||
|
||||
.github/workflows/
|
||||
|
||||
src/v1.2/auth/
|
||||
|
||||
It also fails the “root file” case (README.md) because your mode discriminator only treats paths as paths if they contain /.
|
||||
|
||||
Revision
|
||||
|
||||
Add explicit --path to force Expert mode (covers root files cleanly).
|
||||
|
||||
Classify file-vs-dir by checking last path segment for a dot, and whether the input ends with /.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ pub struct WhoArgs {
|
||||
- /// Username or file path (path if contains /)
|
||||
- pub target: Option<String>,
|
||||
+ /// Username or file path shorthand (ambiguous for root files like README.md)
|
||||
+ pub target: Option<String>,
|
||||
+
|
||||
+ /// Force expert mode for a file/directory path (supports root files like README.md)
|
||||
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["active", "overlap", "reviews"])]
|
||||
+ pub path: Option<String>,
|
||||
@@ fn resolve_mode<'a>(args: &'a WhoArgs) -> Result<WhoMode<'a>> {
|
||||
- if let Some(target) = &args.target {
|
||||
+ if let Some(p) = &args.path {
|
||||
+ return Ok(WhoMode::Expert { path: p });
|
||||
+ }
|
||||
+ if let Some(target) = &args.target {
|
||||
let clean = target.strip_prefix('@').unwrap_or(target);
|
||||
if args.reviews {
|
||||
return Ok(WhoMode::Reviews { username: clean });
|
||||
}
|
||||
- // Disambiguation: if target contains '/', it's a file path.
|
||||
- // GitLab usernames never contain '/'.
|
||||
- if target.contains('/') {
|
||||
+ // Disambiguation:
|
||||
+ // - treat as path if it contains '/'
|
||||
+ // - otherwise treat as username (root files require --path)
|
||||
+ if target.contains('/') {
|
||||
return Ok(WhoMode::Expert { path: target });
|
||||
}
|
||||
return Ok(WhoMode::Workload { username: clean });
|
||||
}
|
||||
|
||||
|
||||
And update the path pattern logic used by Expert/Overlap:
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ fn query_expert(...)
|
||||
- // Normalize path for LIKE matching: add trailing % if no extension
|
||||
- let path_pattern = if path.contains('.') {
|
||||
- path.to_string() // Exact file match
|
||||
- } else {
|
||||
- let trimmed = path.trim_end_matches('/');
|
||||
- format!("{trimmed}/%")
|
||||
- };
|
||||
+ // Normalize:
|
||||
+ // - if ends_with('/') => directory prefix
|
||||
+ // - else if last segment contains '.' => file exact match
|
||||
+ // - else => directory prefix
|
||||
+ let trimmed = path.trim_end_matches('/');
|
||||
+ let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
|
||||
+ let is_file = !path.ends_with('/') && last.contains('.');
|
||||
+ let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
|
||||
|
||||
Change 3 — Stop building dynamic SQL strings for optional filters; always bind params
|
||||
Why
|
||||
|
||||
Right now you’re mixing:
|
||||
|
||||
dynamic project_clause string fragments
|
||||
|
||||
ad-hoc param vectors
|
||||
|
||||
placeholder renumbering by branch
|
||||
|
||||
That’s brittle and easy to regress (especially when you add more conditions later). SQLite/rusqlite can bind Option<T> to NULL, which enables a simple pattern:
|
||||
|
||||
sql
|
||||
Copy code
|
||||
AND (?3 IS NULL OR n.project_id = ?3)
|
||||
|
||||
Revision (representative; apply to all queries)
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ fn query_expert(...)
|
||||
- let project_clause = if project_id.is_some() {
|
||||
- "AND n.project_id = ?3"
|
||||
- } else {
|
||||
- ""
|
||||
- };
|
||||
-
|
||||
- let sql = format!(
|
||||
+ let sql = format!(
|
||||
"SELECT username, role, activity_count, last_active_at FROM (
|
||||
@@
|
||||
FROM notes n
|
||||
WHERE n.position_new_path LIKE ?1
|
||||
AND n.is_system = 0
|
||||
AND n.author_username IS NOT NULL
|
||||
AND n.created_at >= ?2
|
||||
- {project_clause}
|
||||
+ AND (?3 IS NULL OR n.project_id = ?3)
|
||||
@@
|
||||
WHERE n.position_new_path LIKE ?1
|
||||
AND m.author_username IS NOT NULL
|
||||
AND m.updated_at >= ?2
|
||||
- {project_clause}
|
||||
+ AND (?3 IS NULL OR n.project_id = ?3)
|
||||
GROUP BY m.author_username
|
||||
- )"
|
||||
+ ) t"
|
||||
);
|
||||
-
|
||||
- let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
|
||||
- params.push(Box::new(path_pattern.clone()));
|
||||
- params.push(Box::new(since_ms));
|
||||
- if let Some(pid) = project_id {
|
||||
- params.push(Box::new(pid));
|
||||
- }
|
||||
- let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
|
||||
+ let param_refs = rusqlite::params![path_pattern, since_ms, project_id];
|
||||
|
||||
|
||||
Notes:
|
||||
|
||||
Adds required derived-table alias t (some SQLite configurations are stricter).
|
||||
|
||||
Eliminates the dynamic param vector and placeholder gymnastics.
|
||||
|
||||
Change 4 — Filter “path touch” queries to DiffNotes and escape LIKE properly
|
||||
Why
|
||||
|
||||
Only DiffNotes reliably have position_new_path; including other note types can skew counts and harm performance.
|
||||
|
||||
LIKE treats % and _ as wildcards—rare in file paths, but not impossible (generated files, templates). Escaping is a low-cost robustness win.
|
||||
|
||||
Revision
|
||||
|
||||
Add note_type='DiffNote' and LIKE ... ESCAPE '\' plus a tiny escape helper.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ fn query_expert(...)
|
||||
- FROM notes n
|
||||
- WHERE n.position_new_path LIKE ?1
|
||||
+ FROM notes n
|
||||
+ WHERE n.note_type = 'DiffNote'
|
||||
+ AND n.position_new_path LIKE ?1 ESCAPE '\'
|
||||
AND n.is_system = 0
|
||||
@@
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ Helper Functions
|
||||
+fn escape_like(input: &str) -> String {
|
||||
+ input.replace('\\', "\\\\").replace('%', "\\%").replace('_', "\\_")
|
||||
+}
|
||||
|
||||
|
||||
And when building patterns:
|
||||
|
||||
diff
|
||||
Copy code
|
||||
- let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
|
||||
+ let base = escape_like(trimmed);
|
||||
+ let path_pattern = if is_file { base } else { format!("{base}/%") };
|
||||
|
||||
|
||||
Apply the same changes to query_overlap and any other position_new_path LIKE ....
|
||||
|
||||
Change 5 — Use note timestamps for “touch since” semantics (Expert/Overlap author branch)
|
||||
Why
|
||||
|
||||
In Expert/Overlap “author” branches you filter by m.updated_at >= since. That answers “MR updated recently” rather than “MR touched at this path recently”, which can surface stale ownership.
|
||||
|
||||
Revision
|
||||
|
||||
Filter by the note creation time (and use it for “last touch” where relevant). You can still compute author activity, but anchor it to note activity.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ fn query_overlap(...)
|
||||
- WHERE n.position_new_path LIKE ?1
|
||||
+ WHERE n.note_type = 'DiffNote'
|
||||
+ AND n.position_new_path LIKE ?1 ESCAPE '\'
|
||||
AND m.state IN ('opened', 'merged')
|
||||
AND m.author_username IS NOT NULL
|
||||
- AND m.updated_at >= ?2
|
||||
+ AND n.created_at >= ?2
|
||||
AND (?3 IS NULL OR m.project_id = ?3)
|
||||
|
||||
|
||||
Same idea in Expert mode’s “MR authors” branch.
|
||||
|
||||
Change 6 — Workload mode: apply --since consistently to unresolved discussions
|
||||
Why
|
||||
|
||||
Workload’s unresolved discussions ignore since_ms. That makes --since partially misleading and can dump very old threads.
|
||||
|
||||
Revision
|
||||
|
||||
Filter on d.last_note_at when since_ms is set.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ fn query_workload(...)
|
||||
- let disc_sql = format!(
|
||||
+ let disc_since = if since_ms.is_some() {
|
||||
+ "AND d.last_note_at >= ?2"
|
||||
+ } else { "" };
|
||||
+ let disc_sql = format!(
|
||||
"SELECT d.noteable_type,
|
||||
@@
|
||||
WHERE d.resolvable = 1 AND d.resolved = 0
|
||||
AND EXISTS (
|
||||
@@
|
||||
)
|
||||
{disc_project_filter}
|
||||
+ {disc_since}
|
||||
ORDER BY d.last_note_at DESC
|
||||
LIMIT {limit}"
|
||||
);
|
||||
@@
|
||||
- // Rebuild params for discussion query (only username + optional project_id)
|
||||
- let mut disc_params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
|
||||
- disc_params.push(Box::new(username.to_string()));
|
||||
- if let Some(pid) = project_id {
|
||||
- disc_params.push(Box::new(pid));
|
||||
- }
|
||||
+ // Params: username, since_ms, project_id (NULLs ok)
|
||||
+ let disc_param_refs = rusqlite::params![username, since_ms, project_id];
|
||||
|
||||
|
||||
(If you adopt Change 3 fully, this becomes very clean.)
|
||||
|
||||
Change 7 — Make Overlap results represent “both roles” instead of collapsing to one
|
||||
Why
|
||||
|
||||
Collapsing to a single role loses valuable info (“they authored and reviewed”). Also your current “prefer author” rule is arbitrary for the “who else is touching this” question.
|
||||
|
||||
Revision
|
||||
|
||||
Track role counts separately and render as A, R, or A+R.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ pub struct OverlapUser {
|
||||
pub username: String,
|
||||
- pub role: String,
|
||||
- pub touch_count: u32,
|
||||
+ pub author_touch_count: u32,
|
||||
+ pub review_touch_count: u32,
|
||||
+ pub touch_count: u32,
|
||||
pub last_touch_at: i64,
|
||||
pub mr_iids: Vec<i64>,
|
||||
}
|
||||
@@ fn query_overlap(...)
|
||||
- let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
|
||||
+ let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
|
||||
username: username.clone(),
|
||||
- role: role.clone(),
|
||||
+ author_touch_count: 0,
|
||||
+ review_touch_count: 0,
|
||||
touch_count: 0,
|
||||
last_touch_at: 0,
|
||||
mr_iids: Vec::new(),
|
||||
});
|
||||
entry.touch_count += count;
|
||||
+ if role == "author" { entry.author_touch_count += count; }
|
||||
+ if role == "reviewer" { entry.review_touch_count += count; }
|
||||
@@ human output
|
||||
- println!(
|
||||
- " {:<16} {:<8} {:>7} {:<12} {}",
|
||||
+ println!(
|
||||
+ " {:<16} {:<6} {:>7} {:<12} {}",
|
||||
...
|
||||
);
|
||||
@@
|
||||
- user.role,
|
||||
+ format_roles(user.author_touch_count, user.review_touch_count),
|
||||
|
||||
Change 8 — Add an “Index Audit + optional migration” step (big perf win, low blast radius)
|
||||
Why
|
||||
|
||||
With 280K notes, the path/timestamp queries will degrade quickly without indexes. This isn’t “scope creep”; it’s making the feature usable.
|
||||
|
||||
Revision (plan-level)
|
||||
|
||||
Add a non-breaking migration that only creates indexes if missing.
|
||||
|
||||
Optionally add a runtime check: if EXPLAIN QUERY PLAN indicates full table scan on notes, print a dim warning in human mode.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ Implementation Order
|
||||
-| Step | What | Files |
|
||||
+| Step | What | Files |
|
||||
| 1 | CLI skeleton: `WhoArgs` + `Commands::Who` + dispatch + stub | `cli/mod.rs`, `commands/mod.rs`, `main.rs` |
|
||||
+| 1.5 | Index audit + add `CREATE INDEX IF NOT EXISTS` migration for who hot paths | `migrations/0xx_who_indexes.sql` |
|
||||
@@
|
||||
|
||||
|
||||
Suggested indexes (tune names to your conventions):
|
||||
|
||||
notes(note_type, position_new_path, created_at)
|
||||
|
||||
notes(discussion_id, is_system, author_username)
|
||||
|
||||
discussions(resolvable, resolved, last_note_at, project_id)
|
||||
|
||||
merge_requests(project_id, state, updated_at, author_username)
|
||||
|
||||
issue_assignees(username, issue_id)
|
||||
|
||||
Even if SQLite can’t perfectly index LIKE, these still help with join and timestamp filters.
|
||||
|
||||
Change 9 — Make robot JSON reproducible by echoing the effective query inputs
|
||||
Why
|
||||
|
||||
Agent workflows benefit from a stable “query record”: what mode ran, what path/user, resolved project, effective since, limit.
|
||||
|
||||
Revision
|
||||
|
||||
Include an input object in JSON output.
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ struct WhoJsonData {
|
||||
mode: String,
|
||||
+ input: serde_json::Value,
|
||||
#[serde(flatten)]
|
||||
result: serde_json::Value,
|
||||
}
|
||||
@@ pub fn print_who_json(...)
|
||||
- let output = WhoJsonEnvelope {
|
||||
+ let input = serde_json::json!({
|
||||
+ "project": /* resolved or raw args.project */,
|
||||
+ "since": /* resolved since ISO */,
|
||||
+ "limit": /* args.limit */,
|
||||
+ });
|
||||
+ let output = WhoJsonEnvelope {
|
||||
ok: true,
|
||||
data: WhoJsonData {
|
||||
mode: mode.to_string(),
|
||||
+ input,
|
||||
result: data,
|
||||
},
|
||||
meta: RobotMeta { elapsed_ms },
|
||||
};
|
||||
|
||||
Change 10 — Tighten clap constraints so invalid combinations never reach resolve_mode
|
||||
Why
|
||||
|
||||
Right now conflicts are enforced manually (or not at all). Clamp the invalid combos at the CLI layer:
|
||||
|
||||
--active should conflict with target, --overlap, --reviews, --path
|
||||
|
||||
--reviews should require a username (and should conflict with Expert path modes)
|
||||
|
||||
diff
|
||||
Copy code
|
||||
diff --git a/Plan.md b/Plan.md
|
||||
@@ pub struct WhoArgs {
|
||||
- pub active: bool,
|
||||
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "overlap", "reviews", "path"])]
|
||||
+ pub active: bool,
|
||||
@@
|
||||
- pub overlap: Option<String>,
|
||||
+ #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "active", "reviews", "path"])]
|
||||
+ pub overlap: Option<String>,
|
||||
@@
|
||||
- pub reviews: bool,
|
||||
+ #[arg(long, help_heading = "Mode", requires = "target", conflicts_with_all = ["active", "overlap", "path"])]
|
||||
+ pub reviews: bool,
|
||||
|
||||
Summary of what I’d definitely change
|
||||
|
||||
If you do nothing else, do these first:
|
||||
|
||||
Fix GROUP_CONCAT(DISTINCT ..., sep) in Active mode (runtime error).
|
||||
|
||||
Path classification: add --path, and stop using contains('.') globally.
|
||||
|
||||
Remove dynamic SQL + param vectors: always bind project_id as nullable and use (? IS NULL OR ...).
|
||||
|
||||
Filter to DiffNotes + LIKE escaping for correctness and fewer rows scanned.
|
||||
|
||||
Optional index migration: otherwise this will feel slow/non-deterministically slow depending on local DB state.
|
||||
|
||||
If you want, I can also provide a consolidated “v2 plan” as a single unified patch (one diff) rather than per-change snippets.
|
||||
Reference in New Issue
Block a user