docs: add lore who command design plan with 8 iterations of review feedback

Design document for `lore who` — a people intelligence query layer over
existing GitLab data (280K notes, 210K discussions, 33K DiffNotes, 53
participants). Answers five collaboration questions: expert lookup by
file/path, workload summary, review pattern analysis, active discussion
tracking, and file overlap detection.

Key design decisions refined across 8 feedback iterations:
- All SQL is fully static (no format!()) with prepare_cached() throughout
- Exact vs prefix path matching via PathQuery struct (two static SQL variants)
- Self-review exclusion (author != reviewer) on all DiffNote branches
- Deterministic output: sorted GROUP_CONCAT results, stable tie-breakers
- Bounded payloads with *_total/*_truncated metadata for robot consumers
- Truncation transparency via LIMIT+1 overflow detection pattern
- Robot JSON includes resolved_input for reproducibility (since_mode tri-state)
- Multi-project correctness with project-qualified entity references
- Composite migration indexes designed for query selectivity on hot paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Taylor Eernisse
2026-02-07 21:35:05 -05:00
parent 3e7fa607d3
commit 8dc479e515
9 changed files with 6194 additions and 0 deletions

@@ -0,0 +1,303 @@
Below are the highest-leverage revisions I'd make to iteration 1 to tighten correctness, performance, and “agent usefulness” without blowing up scope. Each change has (1) a rationale and (2) a focused unified diff against the plan you pasted.
Change 1 — Make robot “input echo” actually resolved (project_id, project_path, since_ms/iso, mode)
Why
Your Design Principle #5 says the robot envelope should echo resolved inputs (“effective since, resolved project”), but the current input object echoes only raw CLI strings. Agents can't reliably reproduce or compare runs (e.g., fuzzy project resolution may map differently over time).
This is also a reliability improvement: “what ran” should be computed once and propagated, not recomputed in output.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-5. **Robot-first reproducibility.** Robot JSON output includes an `input` object echoing the resolved query parameters (effective since, resolved project, limit) so agents can trace exactly what ran.
+5. **Robot-first reproducibility.** Robot JSON output includes a `resolved_input` object (mode, since_ms + since_iso, resolved project_id + project_path, limit, db_path) so agents can trace exactly what ran.
@@
-/// Main entry point. Resolves mode from args and dispatches.
-pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoResult> {
+/// Main entry point. Resolves mode + resolved inputs once, then dispatches.
+pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoRun> {
let db_path = get_db_path(config.storage.db_path.as_deref());
let conn = create_connection(&db_path)?;
- let project_id = args
+ let project_id = args
.project
.as_deref()
.map(|p| resolve_project(&conn, p))
.transpose()?;
+ let project_path = project_id
+ .map(|id| lookup_project_path(&conn, id))
+ .transpose()?;
let mode = resolve_mode(args)?;
match mode {
WhoMode::Expert { path } => {
let since_ms = resolve_since(args.since.as_deref(), "6m")?;
let result = query_expert(&conn, path, project_id, since_ms, args.limit)?;
- Ok(WhoResult::Expert(result))
+ Ok(WhoRun::new("expert", &db_path, project_id, project_path, since_ms, args.limit, WhoResult::Expert(result)))
}
@@
}
}
+
+/// Wrapper that carries resolved inputs for reproducible output.
+pub struct WhoRun {
+ pub mode: String,
+ pub resolved_input: WhoResolvedInput,
+ pub result: WhoResult,
+}
+
+pub struct WhoResolvedInput {
+ pub db_path: String,
+ pub project_id: Option<i64>,
+ pub project_path: Option<String>,
+ pub since_ms: i64,
+ pub since_iso: String,
+ pub limit: usize,
+}
@@
-pub fn print_who_json(result: &WhoResult, args: &WhoArgs, elapsed_ms: u64) {
- let (mode, data) = match result {
+pub fn print_who_json(run: &WhoRun, args: &WhoArgs, elapsed_ms: u64) {
+ let (mode, data) = match &run.result {
WhoResult::Expert(r) => ("expert", expert_to_json(r)),
@@
- let input = serde_json::json!({
+ let input = serde_json::json!({
"target": args.target,
"path": args.path,
"project": args.project,
"since": args.since,
"limit": args.limit,
});
+
+ let resolved_input = serde_json::json!({
+ "mode": run.mode,
+ "db_path": run.resolved_input.db_path,
+ "project_id": run.resolved_input.project_id,
+ "project_path": run.resolved_input.project_path,
+ "since_ms": run.resolved_input.since_ms,
+ "since_iso": run.resolved_input.since_iso,
+ "limit": run.resolved_input.limit,
+ });
@@
- data: WhoJsonData {
- mode: mode.to_string(),
- input,
- result: data,
- },
+ data: WhoJsonData { mode: mode.to_string(), input, resolved_input, result: data },
meta: RobotMeta { elapsed_ms },
};
@@
struct WhoJsonData {
mode: String,
input: serde_json::Value,
+ resolved_input: serde_json::Value,
#[serde(flatten)]
result: serde_json::Value,
}
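The "compute once, then propagate" idea behind `resolved_input` can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the plan's implementation: `ResolvedInput`, `resolve_input`, and `ms_to_iso_utc` are hypothetical names, and the real code would presumably reuse whatever date/time helper the crate already has instead of the hand-rolled conversion shown here.

```rust
/// Hypothetical stand-in for the plan's `WhoResolvedInput` (subset of fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedInput {
    pub project_id: Option<i64>,
    pub since_ms: i64,
    pub since_iso: String,
    pub limit: usize,
}

/// Convert Unix epoch milliseconds to an ISO 8601 UTC timestamp with
/// second precision, using the usual days-to-civil-date arithmetic.
pub fn ms_to_iso_utc(ms: i64) -> String {
    let secs = ms.div_euclid(1000);
    let days = secs.div_euclid(86_400);
    let sod = secs.rem_euclid(86_400); // seconds of day, 0..=86_399

    // Shift the epoch to 0000-03-01 so the leap day falls at the end of a year.
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097); // day of 400-year era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };

    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year, month, day, sod / 3_600, (sod % 3_600) / 60, sod % 60
    )
}

/// Resolve every derived value exactly once; output code only reads the result.
pub fn resolve_input(project_id: Option<i64>, since_ms: i64, limit: usize) -> ResolvedInput {
    ResolvedInput {
        project_id,
        since_ms,
        since_iso: ms_to_iso_utc(since_ms),
        limit,
    }
}
```

Because `since_iso` is derived from `since_ms` at resolution time, the human and robot printers cannot disagree about the effective cutoff.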
Change 2 — Remove dynamic SQL format!(..LIMIT {limit}) and parameterize LIMIT everywhere
Why
You explicitly prefer static SQL ((?N IS NULL OR ...)) to avoid subtle bugs, but Workload/Active use format! for LIMIT. Even though limit is typed, it's an inconsistency that complicates statement caching and encourages future string assembly creep.
SQLite supports LIMIT ? with bound parameters; rusqlite can bind an i64.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
- let issues_sql = format!(
- "SELECT ...
- ORDER BY i.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&issues_sql)?;
+ let issues_sql =
+ "SELECT ...
+ ORDER BY i.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(issues_sql)?;
let assigned_issues: Vec<WorkloadIssue> = stmt
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let authored_sql = format!(
- "SELECT ...
- ORDER BY m.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&authored_sql)?;
+ let authored_sql =
+ "SELECT ...
+ ORDER BY m.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(authored_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let reviewing_sql = format!(
- "SELECT ...
- ORDER BY m.updated_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&reviewing_sql)?;
+ let reviewing_sql =
+ "SELECT ...
+ ORDER BY m.updated_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(reviewing_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let disc_sql = format!(
- "SELECT ...
- ORDER BY d.last_note_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&disc_sql)?;
+ let disc_sql =
+ "SELECT ...
+ ORDER BY d.last_note_at DESC
+ LIMIT ?4";
+ let mut stmt = conn.prepare(disc_sql)?;
@@
- .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+ .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
- let sql = format!(
- "SELECT ...
- ORDER BY d.last_note_at DESC
- LIMIT {limit}"
- );
- let mut stmt = conn.prepare(&sql)?;
+ let sql =
+ "SELECT ...
+ ORDER BY d.last_note_at DESC
+ LIMIT ?3";
+ let mut stmt = conn.prepare(sql)?;
@@
- .query_map(rusqlite::params![since_ms, project_id], |row| {
+ .query_map(rusqlite::params![since_ms, project_id, limit as i64], |row| {
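One detail worth considering once LIMIT is a bound parameter: `limit as i64` wraps for values above `i64::MAX`, and SQLite treats a negative LIMIT as "no upper bound". A minimal sketch of a saturating conversion follows; `limit_param` is a hypothetical helper name, not something in the plan.

```rust
use std::convert::TryFrom;

/// Convert a CLI `usize` limit into the `i64` bound to `LIMIT ?N`.
/// Hypothetical helper; the plan itself writes `limit as i64`.
pub fn limit_param(limit: usize) -> i64 {
    // A wrapping cast could produce a negative value, which SQLite
    // interprets as an unbounded LIMIT. Saturate at i64::MAX instead.
    i64::try_from(limit).unwrap_or(i64::MAX)
}
```

In practice the CLI probably caps `limit` far below this range, so this is defensive rather than required.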
Change 3 — Fix path matching for dotless files (LICENSE/Makefile) via “exact OR prefix” (no new flags)
Why
Your improved “dot only in last segment” heuristic still fails on dotless files (LICENSE, Makefile, Dockerfile), which are common, especially at repo root. Right now they'll be treated as directories (LICENSE/%) and silently return nothing.
Best minimal UX: if the user provides a path that's ambiguous (no trailing slash), match either the exact file OR the directory prefix.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-/// Build a LIKE pattern from a user-supplied path, with proper LIKE escaping.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix → `escaped_path%`
-/// - If the last path segment contains `.`, it's a file → exact match
-/// - Otherwise, it's a directory prefix → `escaped_path/%`
+/// Build an exact + prefix match from a user-supplied path, with proper LIKE escaping.
+///
+/// Rules:
+/// - If the path ends with `/`, treat as directory-only (prefix match)
+/// - Otherwise, treat as ambiguous: exact match OR directory prefix
+/// (fixes dotless files like LICENSE/Makefile without requiring new flags)
@@
-fn build_path_pattern(path: &str) -> String {
+struct PathMatch {
+ exact: String,
+ prefix: String,
+ dir_only: bool,
+}
+
+fn build_path_match(path: &str) -> PathMatch {
let trimmed = path.trim_end_matches('/');
- let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
- let is_file = !path.ends_with('/') && last_segment.contains('.');
let escaped = escape_like(trimmed);
-
- if is_file {
- escaped
- } else {
- format!("{escaped}/%")
- }
+ PathMatch {
+ // Unescaped: `exact` is compared with `=`, where LIKE escapes would be taken literally.
+ exact: trimmed.to_string(),
+ prefix: format!("{escaped}/%"),
+ dir_only: path.ends_with('/'),
+ }
}
@@
- let path_pattern = build_path_pattern(path);
+ let pm = build_path_match(path);
@@
- AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND (
+ (?4 = 1 AND n.position_new_path LIKE ?2 ESCAPE '\\')
+ OR (?4 = 0 AND (n.position_new_path = ?1 OR n.position_new_path LIKE ?2 ESCAPE '\\'))
+ )
@@
- let rows: Vec<(String, String, u32, i64)> = stmt
- .query_map(rusqlite::params![path_pattern, since_ms, project_id], |row| {
+ let rows: Vec<(String, String, u32, i64)> = stmt
+ .query_map(rusqlite::params![pm.exact, pm.prefix, since_ms, i32::from(pm.dir_only), project_id], |row| {
Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
})?
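(Apply the same pattern to Overlap mode. With this parameter order, the remaining placeholders shift as well: since_ms becomes ?3 and project_id becomes ?5.)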
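The exact-or-prefix rule can be exercised in isolation. The following is a minimal sketch, assuming the plan's `escape_like` escapes `\`, `%`, and `_` with a backslash (its body is not shown in the plan, so the version here is a guess). The sketch keeps `exact` unescaped because it is compared with `=`, and escapes only the LIKE prefix.

```rust
/// Escape LIKE metacharacters, assuming `\` is the ESCAPE character.
/// Assumption: this mirrors the behavior of the plan's `escape_like`.
pub fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

#[derive(Debug, PartialEq)]
pub struct PathMatch {
    /// Raw path for `position_new_path = ?` (no LIKE escaping).
    pub exact: String,
    /// Escaped directory prefix for `LIKE ? ESCAPE '\'`.
    pub prefix: String,
    /// True when the user wrote a trailing slash (directory-only match).
    pub dir_only: bool,
}

pub fn build_path_match(path: &str) -> PathMatch {
    let trimmed = path.trim_end_matches('/');
    PathMatch {
        exact: trimmed.to_string(),
        prefix: format!("{}/%", escape_like(trimmed)),
        dir_only: path.ends_with('/'),
    }
}
```

With this shape, `LICENSE` yields both an exact candidate (`LICENSE`) and a prefix candidate (`LICENSE/%`), so a dotless root file no longer falls through to an empty directory match.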
Change 4 — Consistently exclude system notes in all DiffNote-based branches (Expert/Overlap author branches currently don't)
Why
You filter n.is_system = 0 for reviewer branches, but not in the author branches of Expert/Overlap. That can skew “author touch” via system-generated diff notes or bot activity.
Consistency here improves correctness and also enables more aggressive partial indexing.
Plan diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND n.is_system = 0
AND m.author_username IS NOT NULL
AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
@@
- WHERE n.note_type = 'DiffNote'
+ WHERE n.note_type = 'DiffNote'
AND n.position_new_path LIKE ?1 ESCAPE '\\'
+ AND n.is_system = 0
AND m.state IN ('opened', 'merged')
AND m.author_username IS NOT NULL
AND n.created_at >= ?2
AND (?3 IS NULL OR m.project_id = ?3)
Change 5 — Rework Migration 017 indexes to match real predicates + add one critical notes index for discussion participation
Why
(a) idx_notes_diffnote_path_created currently leads with note_type even though it's constant via the partial index. You want the leading columns to match your most selective predicates: position_new_path prefix + created_at range, with optional project_id.
(b) Active + Workload discussion participation repeatedly hits notes by (discussion_id, author_username); you only guarantee notes(discussion_id) is indexed. Adding a narrow partial composite index pays off immediately for both “participants” and “EXISTS user participated” checks.
(c) The discussions index should focus on (project_id, last_note_at) with a partial predicate; resolvable/resolved a_